Recursively Scanning a Path with Filters under Python

Python has a great function to walk a tree called os.walk(). It’s a simple generator (meaning that you just enumerate it), and, at each node (a specific child path) it gives you 1) the current path, 2) a list of child directories, and 3) a list of child files. You can even use it in such a way that you can adjust what child directories it will walk on-the-fly. However, it doesn’t take any filters. What if you just want to give it inclusion/exclusion rules and then see the matching results?

Enter pathscan. This library will silently start a background-worker (as a process) to scan the directory structure in parallel while forwarding results to the foreground. To install, just install the pathscan library. It requires Python 3.4.

The library runs as a generator:

import fss.constants
import fss.config.log
import fss.orchestrator

root_path = '/etc'

filter_rules = [
    (fss.constants.FT_DIR, fss.constants.FILTER_INCLUDE, 'init'),
    (fss.constants.FT_FILE, fss.constants.FILTER_INCLUDE, 'net*'),
    (fss.constants.FT_FILE, fss.constants.FILTER_EXCLUDE, 'networking.conf'),

o = fss.orchestrator.Orchestrator(root_path, filter_rules)
for (entry_type, entry_filepath) in o.recurse():
    if entry_type == fss.constants.FT_DIR:
        print("Directory: [%s]" % (entry_filepath,))
    else: # entry_type == fss.constants.FT_FILE:
        print("File: [%s]" % (entry_filepath,))

# Directory: [/etc/init]
# File: [/etc/networks]
# File: [/etc/netconfig]
# File: [/etc/init/network-interface-container.conf]
# File: [/etc/init/networking.conf]
# File: [/etc/init/network-interface-security.conf]
# File: [/etc/init/network-interface.conf]

A command-line tool is also included:

$ pathscan -i "i*.h" -id php /usr/include 
F /usr/include/iconv.h
F /usr/include/ifaddrs.h
F /usr/include/inttypes.h
F /usr/include/iso646.h
D /usr/include/php