Java 7 Working With Directories: DirectoryStream, Filter and PathMatcher

My previous post was about Java 7's new java.nio.file.Path class. This time I want to continue and explore some of the other new file-system-related APIs in Java 7.

java.nio.file.DirectoryStream

DirectoryStream provides an easy way to iterate over a directory's content, but more than that, it solves the long-standing problem of listing very large directories. If I wanted to list folder entries using previous Java versions, I had to use one of java.io.File's overloaded list() or listFiles() methods:


// Pre Java 7 Directory Listing Example
File f = new File("c:/tmp");
String[] names = f.list();

// At this point Java listed all files in c:/tmp and loaded their names into an array of Strings
for (String name : names) {
 System.out.println(name);
}

The problem with the old API is that once I asked a File object to list its entries, it would immediately scan the folder and create an array of strings (or File objects, if listFiles() was used) with one element per entry. This can take some time for very large folders, but more important is the memory overhead: the old API pre-fetches and pre-allocates all entries in the folder even if, for example, all I wanted was to print the names of the first five files found. Java 7 introduces the new DirectoryStream interface, which can be used to iterate over a directory without preloading its content into memory. First, here is a basic usage example:

// Creating a DirectoryStream inside a try-with-resources block
try (DirectoryStream<Path> ds =
  Files.newDirectoryStream(FileSystems.getDefault().getPath("c:/tmp"))) {
  for (Path p : ds) {
    // Iterate over the paths in the directory and print filenames
    System.out.println(p.getFileName());
  }
} catch (IOException e) {
  e.printStackTrace();
}

The example above is pretty straightforward: it creates a DirectoryStream<Path> instance using the static Files.newDirectoryStream() method. Then the stream is iterated and the filename of each Path element is printed. Since DirectoryStreams obtained from Files.newDirectoryStream() are generic over java.nio.file.Path, and I want the output of both examples to be identical, I invoke getFileName() on each Path instance (otherwise my output would include a full path for each file). Besides the syntactic differences between the two examples, the big difference lies in the underlying behavior of File vs. DirectoryStream: while the first creates an array of all filenames in the folder, the second loads each filename (or a limited-size group of cached filenames) as it encounters it during iteration.
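To make the "first five files" scenario concrete, here is a minimal sketch that stops iterating as soon as enough entries were collected. The class name FirstEntries and the firstEntries helper are made up for this illustration, and the iteration order of a DirectoryStream is unspecified, so "first" simply means "first encountered".

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class FirstEntries {

    // Hypothetical helper: returns the file names of at most 'limit' entries of 'dir'.
    public static List<String> firstEntries(Path dir, int limit) throws IOException {
        List<String> names = new ArrayList<>();
        if (limit <= 0) {
            return names;
        }
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                names.add(p.getFileName().toString());
                if (names.size() >= limit) {
                    break; // stop early instead of materializing every entry
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // The current directory is only a default; pass another one as the first argument
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : firstEntries(dir, 5)) {
            System.out.println(name);
        }
    }
}
```

With File.list() the same task would still allocate one String per entry in the folder before the first name could be printed.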

More About DirectoryStream
DirectoryStream needs to be closed. The example above uses a try-with-resources statement (DirectoryStream extends java.io.Closeable), but if the stream is used within a plain try-catch-finally block it must be closed explicitly. DirectoryStream also extends java.lang.Iterable, but being a stream makes it a "nontraditional" implementation of this interface. As the Javadoc says: "While DirectoryStream extends Iterable, it is not a general-purpose Iterable as it supports only a single Iterator; invoking the iterator method to obtain a second or subsequent iterator throws IllegalStateException". This means that we can obtain only one iterator from the stream; a second attempt to iterate the stream results in the following exception:



Exception in thread "main" java.lang.IllegalStateException: Iterator already obtained
 at sun.nio.fs.WindowsDirectoryStream.iterator(WindowsDirectoryStream.java:117)
 at com.eyallupu.blog.jse7.nio.filesystem.DirectoryStreamExample.main(DirectoryStreamExample.java:31)
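The exception above can be reproduced with a few lines. This is only a sketch: the class name SingleIteratorDemo and its helper method are invented for the illustration, and a temporary directory is used so it runs anywhere.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class SingleIteratorDemo {

    // Hypothetical helper: returns true if asking the same stream for a
    // second iterator fails with IllegalStateException.
    public static boolean secondIteratorRejected(Path dir) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            ds.iterator();      // first iterator: allowed
            try {
                ds.iterator();  // second iterator: expected to fail
                return false;
            } catch (IllegalStateException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("single-iterator-demo");
        System.out.println("second iterator rejected: " + secondIteratorRejected(dir));
        Files.delete(dir);
    }
}
```

Note that a for-each loop calls iterator() implicitly, so two for-each loops over the same stream trigger the same failure.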

Unlike the iterators used by most Java collections, DirectoryStream's iterator tolerates changes to the iterated directory: it is weakly consistent, so no ConcurrentModificationException is thrown. The iterator's hasNext() implementation is guaranteed to read ahead by at least one element from the directory. If hasNext() returns true, the following invocation of next() is guaranteed to succeed, even if the stream has been closed in the meantime or the file that next() returns has already been deleted; because the iterator keeps a read-ahead buffer, it might not reflect changes made to the underlying file system during iteration. Two other characteristics of this iterator: it is read-only (the remove() operation is not supported), and it filters out the links to the iterated directory and its parent (the '.' and '..' entries).
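The last two characteristics are easy to check. The following sketch (the class and method names are invented for this illustration) verifies that remove() is rejected and that neither '.' nor '..' shows up during iteration:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

public class IteratorTraitsDemo {

    // Returns true if the iterator rejects remove(); 'dir' must contain at least one entry.
    public static boolean removeIsUnsupported(Path dir) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            Iterator<Path> it = ds.iterator();
            if (!it.hasNext()) {
                throw new IllegalArgumentException("directory must not be empty");
            }
            it.next();
            try {
                it.remove();
                return false;
            } catch (UnsupportedOperationException expected) {
                return true;
            }
        }
    }

    // Returns true if the stream reports an entry named "." or "..".
    public static boolean containsDotEntries(Path dir) throws IOException {
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) {
                String name = p.getFileName().toString();
                if (name.equals(".") || name.equals("..")) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("iterator-traits-demo");
        Files.createFile(dir.resolve("a.txt"));
        System.out.println("remove() unsupported: " + removeIsUnsupported(dir));
        System.out.println("contains '.' or '..': " + containsDotEntries(dir));
    }
}
```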


Filtering

DirectoryStream entries can be filtered. Here is the simplest example, which iterates over all filenames in the c:\tmp folder ending with '.exe':



// Creating a DirectoryStream which accepts only filenames ending with '.exe'
Path dir = FileSystems.getDefault().getPath("c:/tmp");
try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, "*.exe")) {
 for (Path p : ds) {
  // Iterate over the paths in the directory and print filenames
  System.out.println(p.getFileName());
 }
} catch (IOException e) {
 e.printStackTrace();
}


The newDirectoryStream(Path, String) method in the example above uses an instance of java.nio.file.DirectoryStream.Filter to filter directory entries. The Filter interface provides an API to accept or reject entries while iterating the stream. In conjunction with the extended file system support in JDK 7, such as file attributes and path matchers, it can easily be leveraged to build complex filters. The following example creates a filter which uses each entry's FileOwnerAttributeView to ensure the file is owned by user 'eyal':



Path folderToIterate = FileSystems.getDefault().getPath("c:/tmp");

// Creating the filter
DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {

 @Override
 public boolean accept(Path entry) throws IOException {
   // The view may be null if the file system does not support owner attributes
   FileOwnerAttributeView ownerAttrs = Files.getFileAttributeView(entry, FileOwnerAttributeView.class);
   return ownerAttrs != null && "eyal".equals(ownerAttrs.getOwner().getName());
 }
};

try (DirectoryStream<Path> ds = Files.newDirectoryStream(folderToIterate, filter)) {
 for (Path p : ds) {
  // Iterate over the paths in the directory and print filenames
  System.out.println(p.getFileName());
 }
} catch (IOException e) {
 e.printStackTrace();
}

There are various types of file attribute views, such as BasicFileAttributeView, FileOwnerAttributeView, AclFileAttributeView and others, which can be used for all kinds of file metadata operations and filtering (obviously the filter itself can be implemented in any way we like, with or without file attributes).
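As another illustration of attribute-based filtering, here is a sketch of a filter that accepts only regular files above a given size. It reads the basic attributes in bulk through Files.readAttributes() rather than through a BasicFileAttributeView; the class name SizeFilterDemo, the largerThan factory and the 1024-byte threshold are assumptions made up for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

public class SizeFilterDemo {

    // Hypothetical factory: a filter accepting regular files strictly larger than minBytes.
    public static DirectoryStream.Filter<Path> largerThan(final long minBytes) {
        return new DirectoryStream.Filter<Path>() {
            @Override
            public boolean accept(Path entry) throws IOException {
                BasicFileAttributes attrs = Files.readAttributes(entry, BasicFileAttributes.class);
                return attrs.isRegularFile() && attrs.size() > minBytes;
            }
        };
    }

    // Collects the names of all entries of 'dir' accepted by the size filter.
    public static List<String> list(Path dir, long minBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir, largerThan(minBytes))) {
            for (Path p : ds) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // The current directory and the threshold are arbitrary defaults
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (String name : list(dir, 1024)) {
            System.out.println(name);
        }
    }
}
```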

PathMatcher

The filename is probably the most commonly used attribute of a file, which makes the new java.nio.file.PathMatcher interface another useful tool for developers. The PathMatcher interface defines a single method, boolean matches(Path path), which matches paths against patterns. PathMatcher instances are obtained from the getPathMatcher(syntaxAndPattern) method of the FileSystem class. This method takes one argument composed of a syntax and a pattern separated by a colon (syntax:pattern). Currently the default FileSystem implementation supports two syntaxes: 'glob' and 'regex'.

The 'glob' Syntax
The glob (short for globbing) syntax is a 'simplified' form of regular expressions with awareness of path components (directories). The syntax is composed of the following tokens:
  • The '*' character matches zero or more characters from the path elements without crossing directory boundaries (unlike regular expression this is not a Kleene star and it has nothing to do with the preceding part of the expression)
  • The '**' characters match zero or more characters crossing directory boundaries
  • The '?' character matches exactly one character of a name component
  • '[' and ']' can be used to match a single character in the path name from a set of characters
    • '-' (hyphen) can be used to specify a range of characters. If a hyphen has to be included in the character set itself, it must be the first character in the set (or the first after a leading '!')
    • '!' as the first character in the set can be used as a negation expression
  • '{' and '}' can group sub-patterns; the group matches if any of the sub-patterns matches (a comma separates the sub-patterns)
  • The dot '.' character represents a dot (unlike regular expressions in which a dot is a replacement for any character)
  • and finally: special characters escaping is done using backslash
The '**' expression is the only one that crosses directory boundaries; all other expressions are bound within a single element (either a directory or a filename). Below is a sample usage of PathMatcher followed by a few pattern examples:



// Create a Path object
Path path = FileSystems.getDefault().
   getPath("eyal/blog/my-workspace5/jse7/src/main/java/com/eyallupu/blug/jse7/Paths.java");

// Create a matcher and match
FileSystems.getDefault().getPathMatcher("glob:**/*").matches(path);

// The following is an output of invoking some patterns using the Path created above

Matching Patterns

Pattern                            Comment
glob:**                            Matches all.
glob:**/jse7/**                    Matches any path as long as 'jse7' is one of its components.
glob:eyal/{jse7,main,blog}/**      The path instance starts with 'eyal/blog', which satisfies the pattern.
glob:**/my-workspace[0-5]/**       Must include an element named my-workspaceX where X is between 0 and 5.
glob:**/*.java                     The path's last element must end with '.java'.

Non-Matching Patterns

Pattern                            Comment
glob:*                             Doesn't match since '*' means only one directory level.
glob:**/jse7                       Doesn't match since the last component in the path is not 'jse7'.
glob:eyal/{jse7,main}/**           Requires the path to start with 'eyal' followed immediately by either 'main' or 'jse7'.
glob:**/my-workspace[6-9]/**       Must include an element named my-workspaceX where X is between 6 and 9.
glob:**/*..java                    Unlike regular expressions: dot represents itself.
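The patterns listed above can be checked with a small program. This is a sketch only: the class name GlobTableDemo and the matches helper are made up for the illustration, while the path and the patterns are the ones used in this post.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class GlobTableDemo {

    // Hypothetical helper: matches 'path' against a "syntax:pattern" string.
    public static boolean matches(String syntaxAndPattern, Path path) {
        return FileSystems.getDefault().getPathMatcher(syntaxAndPattern).matches(path);
    }

    public static void main(String[] args) {
        Path path = Paths.get(
            "eyal/blog/my-workspace5/jse7/src/main/java/com/eyallupu/blug/jse7/Paths.java");

        String[] patterns = {
            // expected to match
            "glob:**",
            "glob:**/jse7/**",
            "glob:eyal/{jse7,main,blog}/**",
            "glob:**/my-workspace[0-5]/**",
            "glob:**/*.java",
            // expected not to match
            "glob:*",
            "glob:**/jse7",
            "glob:eyal/{jse7,main}/**",
            "glob:**/my-workspace[6-9]/**",
            "glob:**/*..java"
        };

        for (String pattern : patterns) {
            System.out.println(pattern + " -> " + matches(pattern, path));
        }
    }
}
```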

Connecting the dots: the newDirectoryStream(Path, String) method (illustrated in the filtering example above) uses a Filter which in turn uses a PathMatcher with a globbing pattern. Here is the code from the JDK:


public static DirectoryStream<Path> newDirectoryStream(Path dir, String glob) throws IOException
{
    // create a matcher and return a filter that uses it.
    FileSystem fs = dir.getFileSystem();
    final PathMatcher matcher = fs.getPathMatcher("glob:" + glob);
    DirectoryStream.Filter<Path> filter = new DirectoryStream.Filter<Path>() {
        @Override
        public boolean accept(Path entry)  {
            return matcher.matches(entry.getFileName());
        }
    };
    return fs.provider().newDirectoryStream(dir, filter);
}

The 'regex' Syntax
The 'regex' syntax is the traditional java.util.regex.Pattern syntax, so there is not much to say about it. Here is a short example:


.....
// Create a matcher and match
FileSystems.getDefault().getPathMatcher("regex:.*").matches(path);


- regex:.* (match) Matches zero or more characters
- regex:* (exception) Throws a PatternSyntaxException: in regex syntax '*' is the Kleene star and requires a preceding expression
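Both cases can be exercised with the following sketch. The class and helper names are invented for the illustration, and the '.*\.java' pattern is an extra example that does not appear in the list above.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.PatternSyntaxException;

public class RegexMatcherDemo {

    // Hypothetical helper: matches 'path' against a regular expression.
    public static boolean matchesRegex(String regex, Path path) {
        return FileSystems.getDefault().getPathMatcher("regex:" + regex).matches(path);
    }

    // Hypothetical helper: returns false if the expression is rejected when creating the matcher.
    public static boolean isValidRegex(String regex) {
        try {
            FileSystems.getDefault().getPathMatcher("regex:" + regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(
            "eyal/blog/my-workspace5/jse7/src/main/java/com/eyallupu/blug/jse7/Paths.java");

        System.out.println(matchesRegex(".*", path));         // true
        System.out.println(matchesRegex(".*\\.java", path));  // true
        System.out.println(isValidRegex("*"));                // false
    }
}
```

The regular expression is applied to the whole path string, so an expression such as 'Paths\.java' on its own would not match the path above.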

Comments

Java Developer said…
The new approach with path matcher makes it more neat. I guess these improvements are going to significantly change the way IO has been performed in Java so far. Thanks for illustrating it in a simple code snippet here. It helps.
crescent said…
Very Nice Sharing with this blog.
Anonymous said…
Very informative and delivered in a nice. Keep up the good work.
Bluecubeit said…
Very good information thanks to sharing information.
Anonymous said…
The contents of your article are really good, but you didn't test the code. There are a plethora of silly errors, such as a parenthesis instead of a curly brace.
Hi there! glad to drop by your page and found these very interesting and informative stuff. Thanks for sharing, keep it up!
Unknown said…
why
FileSystems.getDefault().getPath("c:/tmp")
instead
Paths.get("c:/tmp")
?
