File system traversal with Node.js and node-walk

Traverse & Read Files in the File System using Node.js and node-walk

Recently I had to build a static site generator using Node. My build script traversed, sorted and filtered files, and ran them through my templating system. For this task, I chose an npm package called walk, which is a port of Python’s os.walk. In this post, I give a bit more detail on how to get started with node-walk.

For this example, we will be playing with a file hierarchy that has both files and several layers of folders. Also, I want to be able to control what files and folders to include in the traversal.

Here is a file hierarchy I created for this example (the files we’ll be operating on are in the files folder):

Initial Setup

  1. In your terminal, navigate to your working directory (I am using node-walk-test in this example). If you are starting from scratch, run npm init to create package.json:
    cd path/to/dir 
    npm init
  2. Choose your main file. In this example, I am using main.js.
  3. Install the walk package from npm:
    npm install --save walk
  4. We will use two Node modules: fs (File System) and Path. They come with your Node installation, and to use them you just need to require them in your main.js:
    var fs = require("fs");
    var path = require("path");
  5. Finally, we will require and initiate the walk object:
    var walk = require("walk");
    var pathToFiles = "files";
    var options = {
      followLinks: false
    }
    var walker = walk.walk(pathToFiles, options);

At this point, this is what your main.js file should contain:

"use strict";

var fs = require("fs");
var path = require("path");
var walk = require("walk");

var pathToFiles = "files";

var options = {
    followLinks: false
}

var walker = walk.walk(pathToFiles, options);

List all files and directories, recursively

While the walker “walks” your file system, it will fire a number of events. You can select what events you want to listen to.

The first event we will test is called names. It will give us an array of strings containing names of all files and directories:

walker.on("names", function (root, nodeNamesArray) {
    console.log("Files & folders in the " + root + " folder: " + nodeNamesArray);
});

Here is what the output of the above function will look like:

Files & folders in the files folder: .DS_Store,dir1,dir2,donotread.md,exclude,file1.md,file2.md,file3.md
Files & folders in the files/exclude folder: exclude.md
Files & folders in the files/dir2 folder: file1.md
Files & folders in the files/dir1 folder: .DS_Store,file1.md,subdir1
Files & folders in the files/dir1/subdir1 folder: file1.md

As you can see, the walker traverses your target folder, prints the names of files and directories in that folder, then moves on to the subfolders, and repeats the process for each subfolder, recursively.

You can use it to sort or filter files before performing more costly file operations.
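As a rough sketch of that idea, the helper below takes the kind of array the names event hands you and returns a sorted copy without hidden entries. The name visibleNamesSorted is made up for this post and is not part of walk:

```javascript
// Hypothetical helper (not part of walk): returns a sorted copy of the
// names array with dot-prefixed entries such as .DS_Store removed.
function visibleNamesSorted(nodeNamesArray) {
    return nodeNamesArray
        .filter(function (name) {
            return name.charAt(0) !== ".";
        })
        .sort();
}

// Names taken from the files/dir1 listing above, in shuffled order.
var sample = ["subdir1", ".DS_Store", "file1.md"];
console.log(visibleNamesSorted(sample));
```

Inside the names callback you would call visibleNamesSorted(nodeNamesArray). Because the helper works on a copy, it leaves the array the walker passed in untouched.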

Get an array of directory stat objects

The next event we’ll look at is directories. This event will be fired after the walker has processed all the files in the current folder. It will give you an array of stat objects for all the directories in your target folder:

walker.on("directories", function (root, dirStatsArray, next) {

    console.log('Current directory root: ' + root);
    console.log(dirStatsArray);

    next();
});

Your callback receives three arguments: root, dirStatsArray and next.

  • root: path to the current directory
  • dirStatsArray: array of stat objects. Each object has a name and type attribute. In this case, the type will be ‘directory’. Here is an example of a stat object:
    { dev: 16777221,
      mode: 16877,
      nlink: 3,
      uid: 501,
      gid: 20,
      rdev: 0,
      blksize: 4096,
      ino: 91318178,
      size: 102,
      blocks: 0,
      atime: Wed Feb 08 2017 09:29:38 GMT-0500 (EST),
      mtime: Wed Feb 08 2017 08:56:54 GMT-0500 (EST),
      ctime: Wed Feb 08 2017 08:56:54 GMT-0500 (EST),
      birthtime: Wed Feb 08 2017 08:56:45 GMT-0500 (EST),
      name: 'subdir1',
      type: 'directory' }
    
  • next: callback function for the next iteration

Keep in mind that the directories event only fires for folders that contain subfolders, so that is the only time your callback (and its next() call) will run. In our example, the files folder has 3 subfolders: dir1, dir2 and exclude. However, only dir1 contains a child folder (subdir1). In this case, there will be only two iterations: first, the starting files folder, and second, the dir1 folder.

If you modify this array in any way – sort or remove a node, for example – you will affect the rest of the traversing. Use the names event to get a list of all files and folders, modify it as necessary, and then proceed to perform file operations.
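To make the in-place part concrete, here is a minimal sketch of removing entries from an array of stat objects by name. The helper pruneByName is hypothetical, and the sample array only imitates the name and type attributes described above. I am assuming, as noted, that the walker picks up changes made to the array it passed in:

```javascript
// Hypothetical helper (not part of walk): removes stat objects whose
// name is in namesToSkip, mutating the array that was passed in.
function pruneByName(statsArray, namesToSkip) {
    // Walk backwards so that splice() does not shift unvisited items.
    for (var i = statsArray.length - 1; i >= 0; i--) {
        if (namesToSkip.indexOf(statsArray[i].name) !== -1) {
            statsArray.splice(i, 1);
        }
    }
    return statsArray;
}

// Stand-in for the dirStatsArray of the files folder in this example.
var dirStats = [
    { name: "dir1", type: "directory" },
    { name: "dir2", type: "directory" },
    { name: "exclude", type: "directory" }
];
pruneByName(dirStats, ["exclude"]);
console.log(dirStats.map(function (s) { return s.name; }));
```

Note that filter() would not do the same job here, since it returns a new array and leaves the original as it was.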

Get an array of file stat objects

The files event works the same way as directories: it fires for each folder the walker visits, recursively, and gives you an array of file stat objects of type “file”:

walker.on("files", function (root, fileStatsArray, next) {

    console.log('Current directory root: ' + root);
    console.log(fileStatsArray);

    next();
});

This event will be fired after all the files in the current folder have been processed by the walker.
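Since you get the whole batch at once, this event is handy for per-folder summaries. Below is a small sketch that totals the size field and lists names from largest to smallest. The helper summarizeFiles and the sizes in the sample are invented for illustration:

```javascript
// Hypothetical helper (not part of walk): adds up the size field of each
// stat object and lists the names, largest file first.
function summarizeFiles(fileStatsArray) {
    // slice() copies the array so the walker's own array is not reordered.
    var sorted = fileStatsArray.slice().sort(function (a, b) {
        return b.size - a.size;
    });
    return {
        totalBytes: sorted.reduce(function (sum, s) { return sum + s.size; }, 0),
        names: sorted.map(function (s) { return s.name; })
    };
}

// Made-up sizes, for illustration only.
var sampleStats = [
    { name: "file1.md", type: "file", size: 120 },
    { name: "file2.md", type: "file", size: 300 },
    { name: "file3.md", type: "file", size: 80 }
];
console.log(summarizeFiles(sampleStats));
```

In the files callback you would pass fileStatsArray to the helper and then call next() as usual.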

Read each file during traversal

Now, this is probably one of the most used events: file. As the walker traverses the file system, it will fire a file event every time it encounters a – you guessed it! – file:

walker.on("file", function (root, fileStats, next) {
    fs.readFile(path.join(root, fileStats.name), function (err, data) {
        if (err) {
            // report the problem, but keep walking
            console.error(err.message);
        } else {
            console.log(fileStats.name);
        }
        next();
    });
});

Here you can use Node’s file operations to work on your file.

The output of the above example will look like this:

.DS_Store
donotread.md
file1.md
file2.md
file3.md
exclude.md
.DS_Store
file1.md
.DS_Store
file1.md
file1.md

As you can see, the walker reads all files, including hidden files. This may not be exactly what you want. You may want to tell walker what kind of files you want it to skip.

Filter out files you don’t want to process

Walker provides an option to skip directories you don’t want to process. You can add a filters array containing a list of folders to exclude, like so:

var options = {
    followLinks: false,
    filters: ['exclude']
}

In the above example, I am telling walker to skip the exclude folder. Here is the outcome after this modification:

.DS_Store
donotread.md
file1.md
file2.md
file3.md
.DS_Store
file1.md
.DS_Store
file1.md
file1.md

As you can see, the exclude.md file from the exclude folder is not on the list.

To filter out certain files or file types, you have several options:

  • use the names event to generate a list of all files and folders, and apply a filter to that array
  • add a check in the file callback

Here is an example of the latter:

walker.on("file", function (root, fileStats, next) {

    // skip file names starting with '.'
    if (fileStats.name.charAt(0) === '.' ) {
        next();
        return;
    }

    // skip file donotread.md
    if (fileStats.name === 'donotread.md' ) {
        next();
        return;
    }

    fs.readFile(path.join(root, fileStats.name), function (err) {
        if (err) {
            console.error(err.message);
        } else {
            console.log(path.join(root, fileStats.name));
        }
        next();
    });
    
});

Your implementation will depend on exactly what you want to filter out: you can use RegEx to apply rules.

The outcome of the above example:

files/file1.md
files/file2.md
files/file3.md
files/dir2/file1.md
files/dir1/file1.md
files/dir1/subdir1/file1.md

As you can see, the hidden system files are gone, as well as donotread.md.
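The same two checks could also be expressed as regular expressions, which scales better once you have more than a couple of rules. This is only a sketch: skipPatterns and shouldSkip are names I made up, and the patterns mirror the two conditions from the callback above:

```javascript
// Hypothetical rule list; adjust the patterns to your own project.
var skipPatterns = [
    /^\./,             // hidden files such as .DS_Store
    /^donotread\.md$/  // one specific file
];

// Returns true when the file name matches any of the patterns.
function shouldSkip(fileName) {
    return skipPatterns.some(function (pattern) {
        return pattern.test(fileName);
    });
}

console.log(shouldSkip(".DS_Store"));
console.log(shouldSkip("donotread.md"));
console.log(shouldSkip("file1.md"));
```

In the file callback, a single if (shouldSkip(fileStats.name)) { next(); return; } would then replace the two separate checks.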

Do something when the walker has finished traversing

It can be useful to know when the walker has completed the ‘walk’. You can listen for the end event:

walker.on("end", function () {
    console.log("all done");
});
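A common pattern is to collect results in the file callback and only act on them once end fires. Here is a minimal sketch of that. The helper collectPaths is hypothetical, and to keep the snippet runnable without the walk package I feed it a plain Node EventEmitter that imitates the file and end events described in this post:

```javascript
var path = require("path");
var EventEmitter = require("events");

// Hypothetical helper: gathers file paths from any emitter that fires
// "file" and "end" events the way the walker in this post does.
function collectPaths(walker, done) {
    var found = [];
    walker.on("file", function (root, fileStats, next) {
        found.push(path.join(root, fileStats.name));
        next();
    });
    walker.on("end", function () {
        done(found);
    });
}

// Stand-in emitter so the sketch runs without the walk package.
var fakeWalker = new EventEmitter();
var result = null;
collectPaths(fakeWalker, function (paths) {
    result = paths;
});
fakeWalker.emit("file", "files", { name: "file1.md" }, function () {});
fakeWalker.emit("file", "files/dir1", { name: "file1.md" }, function () {});
fakeWalker.emit("end");
console.log(result);
```

With the real package you would pass in the walker created by walk.walk(pathToFiles, options) instead of fakeWalker.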

***

I hope these examples helped you get a feel for what walker has to offer. For more features, including how to run walker synchronously, head over to the official documentation page.
