Java 7 Working With Directories: DirectoryStream, Filter and PathMatcher
My previous post was about Java 7's new java.nio.file.Path class. This time I want to continue and explore some of the other new file-system-related APIs in Java 7.
java.nio.file.DirectoryStream
DirectoryStream provides an easy way to iterate over a directory's content, but more than that, it solves the long-standing problem of listing very large directories. To list folder entries in previous Java versions I had to use one of java.io.File's overloaded list() or listFiles() methods:

```java
// Pre-Java 7 directory listing example (requires: import java.io.File;)
File f = new File("c:/tmp");
String[] names = f.list();
// At this point Java has listed all files in c:/tmp and loaded their names into an array of Strings
for (String name : names) {
    System.out.println(name);
}
```
The problem with the old API is that once I asked a File object to list its entries, it would immediately scan the folder and create an array of strings (or File objects if listFiles() was used) with one element for each entry in the folder. This might take some time when scanning very large folders, but more important is the memory overhead: the old API pre-fetches and pre-allocates all entries in the folder even if, for example, all I wanted to do was print the names of the first five files found in the folder. Java 7 introduces the new DirectoryStream interface, which can be used to iterate over a directory without preloading its content into memory. First, here is a basic usage example:
```java
// Requires: import java.io.IOException; import java.nio.file.*;
// Creating a DirectoryStream inside a try-with-resources block
try (DirectoryStream<Path> ds = Files.newDirectoryStream(FileSystems.getDefault().getPath("c:/tmp"))) {
    for (Path p : ds) {
        // Iterate over the paths in the directory and print filenames
        System.out.println(p.getFileName());
    }
} catch (IOException e) {
    e.printStackTrace();
}
```
The example above is pretty straightforward: it creates a DirectoryStream<Path> instance using the static Files.newDirectoryStream() method, then iterates over the stream and prints the filename of each Path element. Since DirectoryStreams obtained using Files.newDirectoryStream() are generic over java.nio.file.Path, and I want the output of both code examples above to be identical, I invoke getFileName() on each Path instance (otherwise the output would include the full path of each file). Besides the syntactic differences between the two examples, the big difference lies in the underlying behavior of File vs. DirectoryStream: while the first creates an array of all filenames in the folder, the second loads each filename (or a limited-size group of cached filenames) when it encounters it during iteration.
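To make the "first five files" scenario concrete, here is a minimal sketch that stops iterating once enough entries have been collected. The class name FirstEntries and the helper firstNames are hypothetical names chosen for illustration, and how many entries the JDK buffers internally is implementation-specific.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FirstEntries {

    // Hypothetical helper: returns the file names of at most 'limit' entries
    // of 'dir', leaving the loop as soon as enough names were collected.
    static List<String> firstNames(Path dir, int limit) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                if (names.size() >= limit) {
                    break; // our code never asks for the remaining entries
                }
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // "c:/tmp" is the folder used throughout this post; pass another one as an argument
        Path dir = Paths.get(args.length > 0 ? args[0] : "c:/tmp");
        for (String name : firstNames(dir, 5)) {
            System.out.println(name);
        }
    }
}
```

The order in which entries are returned is not specified, so "first five" here simply means the first five the stream happens to deliver.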
More About DirectoryStream
DirectoryStream needs to be closed. The example above uses a try-with-resources block (DirectoryStream extends java.io.Closeable), but if it is used within a plain try-catch-finally block it must be closed explicitly. DirectoryStream also extends java.lang.Iterable, but being a stream forces a "nontraditional" implementation of this interface. As the javadoc says: "While DirectoryStream extends Iterable, it is not a general-purpose Iterable as it supports only a single Iterator; invoking the iterator method to obtain a second or subsequent iterator throws IllegalStateException". This means that we can obtain only one iterator from the stream; a second attempt to iterate the stream will result in the following exception:
Exception in thread "main" java.lang.IllegalStateException: Iterator already obtained
at sun.nio.fs.WindowsDirectoryStream.iterator(WindowsDirectoryStream.java:117)
at com.eyallupu.blog.jse7.nio.filesystem.DirectoryStreamExample.main(DirectoryStreamExample.java:31)
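The single-iterator rule can be reproduced with a small sketch like the one below. The class and method names (SingleIteratorDemo, secondIteratorRejected, countTwice) are hypothetical; the second helper shows one way to iterate a directory twice, namely by opening a fresh stream for each pass.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SingleIteratorDemo {

    // Returns true if asking the same stream for a second iterator is rejected.
    static boolean secondIteratorRejected(Path dir) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            ds.iterator(); // first iterator: allowed
            try {
                ds.iterator(); // second iterator on the same stream
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }

    // Iterates the directory twice by opening a new stream for each pass
    // and returns the total number of entries seen over both passes.
    static int countTwice(Path dir) throws IOException {
        int count = 0;
        for (int pass = 0; pass < 2; pass++) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
                for (Path ignored : ds) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("single-iterator-demo");
        Files.createFile(dir.resolve("a.txt"));
        System.out.println("second iterator rejected: " + secondIteratorRejected(dir));
        System.out.println("entries seen in two passes: " + countTwice(dir));
    }
}
```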
Unlike the iterators used by most Java collections, DirectoryStream's iterators tolerate changes to the iterated directory, so no ConcurrentModificationException is thrown. The iterator's hasNext() method is guaranteed to read ahead by at least one element from the directory. If this read-ahead buffer is not empty, hasNext() returns true, which guarantees that the following invocation of next() will succeed (even if the stream has been closed, or the file returned by this call to next() has already been deleted; since the iterator keeps a read-ahead buffer, it might not reflect changes made to the underlying file system during iteration). Two other characteristics of this iterator: it is read-only (the remove() operation is not supported), and it filters out the links to the iterated directory and its parent (the '.' and '..' entries).
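The read-ahead guarantee can be observed with a sketch along these lines. ReadAheadDemo and nextAfterClose are hypothetical names; the example also shows the explicit close in a finally block that is needed when try-with-resources is not used.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public class ReadAheadDemo {

    // Calls hasNext(), closes the stream, and only then calls next().
    // Returns the entry that was read ahead, or null for an empty directory.
    static Path nextAfterClose(Path dir) throws IOException {
        DirectoryStream<Path> ds = Files.newDirectoryStream(dir);
        Iterator<Path> it = ds.iterator();
        boolean hasEntry;
        try {
            hasEntry = it.hasNext(); // reads ahead by at least one element
        } finally {
            ds.close(); // explicit close, since no try-with-resources is used here
        }
        // hasNext() returned true before the close, so this next() must still succeed
        return hasEntry ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("read-ahead-demo");
        Files.createFile(dir.resolve("only.txt"));
        System.out.println(nextAfterClose(dir).getFileName());
    }
}
```

Only the element that was already read ahead is covered by the guarantee; once the stream is closed, later calls to hasNext() should not be expected to deliver further entries.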
Filtering
DirectoryStream entries can be filtered. Here is the simplest example, which iterates over all filenames in the c:\tmp folder ending with '.exe':

```java
// Creating a DirectoryStream which accepts only filenames ending with '.exe'
Path dir = FileSystems.getDefault().getPath("c:/tmp");
try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.exe")) {
    for (Path p : ds) {
        // Iterate over the paths in the directory and print filenames
        System.out.println(p.getFileName());
    }
} catch (IOException e) {
    e.printStackTrace();
}
```
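The newDirectoryStream(Path, String) method in the example above uses an instance of java.nio.file.DirectoryStream.Filter to filter directory entries. The Filter interface provides an API to accept or reject entries while iterating the stream which, in conjunction with the extended file system support in JDK 7 (such as file attributes and path matchers), can easily be leveraged to build complex filters. The following example creates a filter which uses each entry's FileOwnerAttributeView instance to ensure the file is owned by user 'eyal':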
```java
// Requires: import java.io.IOException; import java.nio.file.*;
//           import java.nio.file.attribute.FileOwnerAttributeView;
Path folderToIterate = FileSystems.getDefault().getPath("c:/tmp");

// Creating the filter
DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
    @Override
    public boolean accept(Path entry) throws IOException {
        FileOwnerAttributeView ownerAttrs = Files.getFileAttributeView(entry, FileOwnerAttributeView.class);
        return "eyal".equals(ownerAttrs.getOwner().getName());
    }
};

try (DirectoryStream<Path> ds = Files.newDirectoryStream(folderToIterate, filter)) {
    for (Path p : ds) {
        // Iterate over the paths in the directory and print filenames
        System.out.println(p.getFileName());
    }
} catch (IOException e) {
    e.printStackTrace();
}
```
There are various types of file attribute views, such as BasicFileAttributeView, FileOwnerAttributeView, AclFileAttributeView and others, which can be used for all kinds of file metadata operations and filtering (obviously the filter itself can be implemented in any way we like, with or without the use of file attributes).
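As another illustration of an attribute-based filter, the sketch below accepts only regular files of at least a given size, reading the basic attributes of each entry through Files.readAttributes(). The names SizeFilterDemo, atLeast and namesAtLeast are hypothetical; the results are sorted only to make the output predictable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SizeFilterDemo {

    // Builds a filter that accepts regular files of at least minBytes bytes.
    static DirectoryStream.Filter<Path> atLeast(final long minBytes) {
        return new DirectoryStream.Filter<Path>() {
            @Override
            public boolean accept(Path entry) throws IOException {
                BasicFileAttributes attrs = Files.readAttributes(entry, BasicFileAttributes.class);
                return attrs.isRegularFile() && attrs.size() >= minBytes;
            }
        };
    }

    // Lists the (sorted) names of the entries in 'dir' accepted by the size filter.
    static List<String> namesAtLeast(Path dir, long minBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, atLeast(minBytes))) {
            for (Path p : ds) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("size-filter-demo");
        Files.write(dir.resolve("small.txt"), new byte[3]);
        Files.write(dir.resolve("big.txt"), new byte[100]);
        Files.createDirectory(dir.resolve("subdir"));
        System.out.println(namesAtLeast(dir, 10));
    }
}
```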
PathMatcher
The filename is probably the most commonly used file attribute, which makes the new java.nio.file.PathMatcher interface another useful tool for developers. The PathMatcher interface defines a single method (boolean matches(Path path)) which matches paths against patterns. PathMatcher instances are obtained using the getPathMatcher(syntaxAndPattern) method of the FileSystem class. This method takes one argument, composed of a syntax and a pattern separated by a colon (syntax:pattern). Currently the FileSystem implementation supports two syntaxes, 'glob' and 'regex':
The 'glob' Syntax
The glob (short for globbing) syntax is a 'simplified' form of regular expressions with awareness of path components (directories). The syntax is composed of the following syntactic tokens:
- The '*' character matches zero or more characters of a path element without crossing directory boundaries (unlike in regular expressions, this is not a Kleene star and it has nothing to do with the preceding part of the expression)
- The '**' characters match zero or more characters crossing directory boundaries
- The '?' character matches exactly one character of a name component
- '[' and ']' can be used to match a single character in the path name from a set of characters
- '-' (hyphen) can be used to specify a range of characters. If a hyphen has to be included in the character set, it must be the first character in the set
- '!' as the first character in the set can be used as a negation expression
- '{' and '}' can group sub-patterns; the group matches if any of the sub-patterns matches (a comma separates the sub-patterns)
- The dot '.' character represents a dot (unlike regular expressions in which a dot is a replacement for any character)
- and finally: special characters escaping is done using backslash
```java
// Create a Path object
Path path = FileSystems.getDefault().getPath("eyal/blog/my-workspace5/jse7/src/main/java/com/eyallupu/blug/jse7/Paths.java");
// Create a matcher and match
FileSystems.getDefault().getPathMatcher("glob:**/*").matches(path);
```

The following is the output of applying some patterns to the Path created above.

Matching Patterns
| Pattern | Comment |
|---|---|
| glob:** | Matches all. |
| glob:**/jse7/** | Matches any path as long as 'jse7' is one of its components. |
| glob:eyal/{jse7,main,blog}/** | The path instance starts with 'eyal/blog', which satisfies the pattern. |
| glob:**/my-workspace[0-5]/** | Must include an element named my-workspaceX where X is between 0 and 5. |
| glob:**/*.java | The path's last element must end with '.java'. |
Non-Matching Patterns

| Pattern | Comment |
|---|---|
| glob:* | Doesn't match since '*' does not cross directory boundaries, so it covers only a single-level path. |
| glob:**/jse7 | Doesn't match since the last component in the path is not 'jse7'. |
| glob:eyal/{jse7,main}/** | Requires the path to start with 'eyal' followed immediately by either 'main' or 'jse7'. |
| glob:**/my-workspace[6-9]/** | Must include an element named my-workspaceX where X is between 6 and 9. |
| glob:**/*..java | Unlike in regular expressions, a dot represents itself, so two literal dots are required. |
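The patterns from the tables above can be checked with a short sketch like the following. GlobTableDemo, SAMPLE and globMatches are hypothetical names; the sample path is the one used in the glob example, and the program simply prints whether each pattern matches it.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;

public class GlobTableDemo {

    // The sample path used in the glob example of this post
    static final String SAMPLE =
            "eyal/blog/my-workspace5/jse7/src/main/java/com/eyallupu/blug/jse7/Paths.java";

    // Matches 'path' against 'glob' using the default file system's glob syntax.
    static boolean globMatches(String glob, String path) {
        FileSystem fs = FileSystems.getDefault();
        return fs.getPathMatcher("glob:" + glob).matches(fs.getPath(path));
    }

    public static void main(String[] args) {
        String[] globs = {
            "**", "**/jse7/**", "eyal/{jse7,main,blog}/**", "**/my-workspace[0-5]/**", "**/*.java",
            "*", "**/jse7", "eyal/{jse7,main}/**", "**/my-workspace[6-9]/**", "**/*..java"
        };
        for (String glob : globs) {
            System.out.println("glob:" + glob + " -> " + globMatches(glob, SAMPLE));
        }
    }
}
```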
Connecting the dots: the newDirectoryStream(Path, String) method (illustrated in the filtering example above) uses a Filter which in turn uses a PathMatcher with a globbing pattern. Here is the code from the JDK:
```java
public static DirectoryStream<Path> newDirectoryStream(Path dir, String glob) throws IOException {
    // create a matcher and return a filter that uses it.
    FileSystem fs = dir.getFileSystem();
    final PathMatcher matcher = fs.getPathMatcher("glob:" + glob);
    DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
        @Override
        public boolean accept(Path entry) {
            return matcher.matches(entry.getFileName());
        }
    };
    return fs.provider().newDirectoryStream(dir, filter);
}
```
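One consequence of the JDK code above is that the glob is applied to getFileName() only, not to the full path of each entry. The sketch below illustrates this with a brace group; GlobListingDemo, list and matchesGlob are hypothetical names, and the listing is sorted only to make the result predictable.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GlobListingDemo {

    // Lists the (sorted) names of the entries in 'dir' whose file name matches 'glob'.
    static List<String> list(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, glob)) {
            for (Path p : ds) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Applies a glob matcher to the given path exactly as passed in.
    static boolean matchesGlob(Path path, String glob) {
        return path.getFileSystem().getPathMatcher("glob:" + glob).matches(path);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("glob-listing-demo");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.log"));
        Files.createFile(dir.resolve("c.exe"));

        System.out.println(list(dir, "*.{txt,log}"));

        Path full = dir.resolve("a.txt");
        // '*' does not cross directory boundaries, so the full path is not matched...
        System.out.println(matchesGlob(full, "*.txt"));
        // ...while the file name alone is matched
        System.out.println(matchesGlob(full.getFileName(), "*.txt"));
    }
}
```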
The 'regex' Syntax
The 'regex' syntax is the traditional java.util.regex.Pattern syntax, so there is not too much to say about it. Here is a short example:

```java
// ... (the path object is created as in the glob example)
// Create a matcher and match
FileSystems.getDefault().getPathMatcher("regex:.*").matches(path);
```

- regex:.* (match): this is a match for zero or more characters
- regex:* (exception, a java.util.regex.PatternSyntaxException): when using the regex syntax, '*' is a Kleene operator which requires a preceding expression
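The two regex cases above can be tried with a sketch like this one. RegexMatcherDemo, regexMatches and isValidRegex are hypothetical names; the relative sample paths are made up for illustration.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.util.regex.PatternSyntaxException;

public class RegexMatcherDemo {

    // Matches 'path' against 'regex' using the default file system's regex syntax.
    static boolean regexMatches(String regex, String path) {
        FileSystem fs = FileSystems.getDefault();
        return fs.getPathMatcher("regex:" + regex).matches(fs.getPath(path));
    }

    // Returns false if the file system rejects the expression as an invalid pattern.
    static boolean isValidRegex(String regex) {
        try {
            FileSystems.getDefault().getPathMatcher("regex:" + regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(regexMatches(".*\\.java", "src/Paths.java"));
        System.out.println(regexMatches(".*\\.java", "src/Paths.class"));
        System.out.println(isValidRegex(".*"));
        System.out.println(isValidRegex("*")); // a lone '*' has nothing to repeat
    }
}
```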
Comments
Why use FileSystems.getDefault().getPath("c:/tmp") instead of Paths.get("c:/tmp")?