Testing File I/O

A common problem in unit testing is interaction with the file system: scanning directories, creating, opening, reading and writing files, creating symbolic links and so on. Unit tests that touch the file system directly tend to be fragile and hard to repeat, because their results depend on whatever happens to be on disk. We also want our unit tests to be fast; a unit test that takes 100 milliseconds to execute is considered slow, and direct file system access can easily push a test past that.

The simplest approach is to wrap file system operations in an interface and perform all operations through the interface. Consider the following example of a function that returns a vector of filenames ending in ".txt" in a given directory:

extern std::vector<std::string> text_files(std::string const& directory);

How can we unit test text_files without relying on the actual contents of the file system? We can decouple text_files from the file system by introducing an interface:

class directory_scanner;

extern std::vector<std::string> text_files(
    std::string const& directory,
    directory_scanner& scanner);

We've used a forward declaration of directory_scanner where text_files is declared. The interface looks like this:

class directory_scanner
{
public:
    virtual ~directory_scanner() {}

    virtual void begin(std::string const &directory) = 0;

    virtual bool has_next() const = 0;

    virtual std::string next() const = 0;
};

The interface directory_scanner is used to isolate the free function text_files from directly interacting with the file system. In the unit tests, we use an implementation of directory_scanner that senses how text_files uses the interface and allows us to control the data made available to text_files. The tests can use a hand-crafted fake implementation or an implementation from a mock library. The fake might look something like this:

class fake_directory_scanner : public directory_scanner
{
public:
    fake_directory_scanner()
        : begin_called(false),
        has_next_called(false),
        next_called(false),
        next_call_count(0U)
    {}

    virtual ~fake_directory_scanner() {}

    virtual void begin(std::string const &directory)
    {
        begin_called = true;
        begin_last_directory = directory;
    }
    bool begin_called;
    std::string begin_last_directory;

    virtual bool has_next() const
    {
        has_next_called = true;
        // True only while unread fake results remain.
        return next_call_count < next_fake_results.size();
    }
    mutable bool has_next_called;

    virtual std::string next() const
    {
        next_called = true;
        // Count every call, but only index into the results while in range.
        std::size_t const index = next_call_count++;
        return index < next_fake_results.size() ? next_fake_results[index] : "";
    }
    mutable bool next_called;
    mutable std::size_t next_call_count;
    std::vector<std::string> next_fake_results;
};

The tests for text_files use the fake_directory_scanner to specify the configuration of the file system for the different test cases:

static std::string const ARBITRARY_DIRECTORY_NAME("foo");
static std::string const ARBITRARY_NON_TEXT_FILE_NAME("foo.foo");
static std::string const ARBITRARY_TEXT_FILE_NAME("foo.txt");
static std::string const ARBITRARY_OTHER_TEXT_FILE_NAME("bar.txt");

struct text_files_fixture
{
    void expect_enumerate_non_text_file()
    {
        scanner.next_fake_results.push_back(ARBITRARY_NON_TEXT_FILE_NAME);
    }

    void expect_enumerate_text_file(std::string const& file_name = ARBITRARY_TEXT_FILE_NAME)
    {
        scanner.next_fake_results.push_back(file_name);
        expected.push_back(file_name);
    }

    fake_directory_scanner scanner;
    std::vector<std::string> empty;
    std::vector<std::string> expected;
};

BOOST_FIXTURE_TEST_SUITE(test_text_files, text_files_fixture);

BOOST_AUTO_TEST_CASE(returns_empty_for_empty_directory)
{
    std::vector<std::string> files = text_files(ARBITRARY_DIRECTORY_NAME, scanner);

    BOOST_REQUIRE_EQUAL_COLLECTIONS(empty.begin(), empty.end(), files.begin(), files.end());
}

BOOST_AUTO_TEST_CASE(returns_empty_for_no_text_files)
{
    expect_enumerate_non_text_file();

    std::vector<std::string> files = text_files(ARBITRARY_DIRECTORY_NAME, scanner);

    BOOST_REQUIRE_EQUAL_COLLECTIONS(empty.begin(), empty.end(), files.begin(), files.end());
}

BOOST_AUTO_TEST_CASE(returns_file_for_text_file)
{
    expect_enumerate_text_file();

    std::vector<std::string> files = text_files(ARBITRARY_DIRECTORY_NAME, scanner);

    BOOST_REQUIRE_EQUAL_COLLECTIONS(expected.begin(), expected.end(), files.begin(), files.end());
}

BOOST_AUTO_TEST_CASE(returns_only_text_file_for_mixed_files)
{
    expect_enumerate_text_file();
    expect_enumerate_non_text_file();

    std::vector<std::string> files = text_files(ARBITRARY_DIRECTORY_NAME, scanner);

    BOOST_REQUIRE_EQUAL_COLLECTIONS(expected.begin(), expected.end(), files.begin(), files.end());
}

BOOST_AUTO_TEST_CASE(returns_all_text_files)
{
    expect_enumerate_text_file();
    expect_enumerate_non_text_file();
    expect_enumerate_text_file(ARBITRARY_OTHER_TEXT_FILE_NAME);

    std::vector<std::string> files = text_files(ARBITRARY_DIRECTORY_NAME, scanner);

    BOOST_REQUIRE_EQUAL_COLLECTIONS(expected.begin(), expected.end(), files.begin(), files.end());
}

BOOST_AUTO_TEST_SUITE_END();
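
The body of text_files itself is not shown here, so the following is only a minimal sketch of one implementation that would satisfy tests like those above. The helper has_txt_extension and the class canned_scanner are hypothetical names introduced for illustration; canned_scanner is a cut-down stand-in for fake_directory_scanner that also records the directory passed to begin, which lets a test sense how text_files uses the interface. The sketch assumes that any name ending in ".txt" counts as a text file.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same shape as the directory_scanner interface shown earlier.
class directory_scanner
{
public:
    virtual ~directory_scanner() {}
    virtual void begin(std::string const &directory) = 0;
    virtual bool has_next() const = 0;
    virtual std::string next() const = 0;
};

// Hypothetical helper: true when name ends with ".txt".
static bool has_txt_extension(std::string const &name)
{
    static std::string const suffix(".txt");
    return name.size() >= suffix.size()
        && name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// One possible text_files: enumerate through the scanner and keep
// only the names that end in ".txt".
std::vector<std::string> text_files(
    std::string const &directory,
    directory_scanner &scanner)
{
    std::vector<std::string> result;
    scanner.begin(directory);
    while (scanner.has_next())
    {
        std::string const name = scanner.next();
        if (has_txt_extension(name))
        {
            result.push_back(name);
        }
    }
    return result;
}

// Hypothetical stand-in for the fake: serves canned names in order
// and records the directory passed to begin().
class canned_scanner : public directory_scanner
{
public:
    canned_scanner() : index(0U) {}

    virtual void begin(std::string const &directory)
    {
        begin_last_directory = directory;
        index = 0U;
    }

    virtual bool has_next() const { return index < names.size(); }

    // Callers are expected to check has_next() first.
    virtual std::string next() const { return names[index++]; }

    std::string begin_last_directory;
    std::vector<std::string> names;

private:
    mutable std::size_t index;
};
```

With a scanner like this, a test can check both the filtered result and that text_files forwarded the directory name to begin, by comparing begin_last_directory against the argument it passed.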

In production code, we use an implementation of directory_scanner that interacts directly with the file system using Boost.Filesystem:

class filesystem_directory_scanner : public directory_scanner
{
public:
    filesystem_directory_scanner() {}
    virtual ~filesystem_directory_scanner() {}

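    virtual void begin(std::string const &directory)
    {
        current = boost::filesystem::directory_iterator(directory);
        // Skip leading entries that are not regular files, as next() does
        // for the entries that follow.
        while (has_next() && !boost::filesystem::is_regular_file((*current).path()))
        {
            ++current;
        }
    }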

    virtual bool has_next() const
    {
        return current != end;
    }

    virtual std::string next() const
    {
        if (has_next())
        {
            std::string const result = (*current).path().filename().string();
            do
            {
                ++current;
            }
            while (has_next() && !boost::filesystem::is_regular_file((*current).path()));
            return result;
        }
        throw std::runtime_error("no next path");
    }

private:
    mutable boost::filesystem::directory_iterator current;
    boost::filesystem::directory_iterator end;
};

We've wrapped just enough of Boost.Filesystem for our needs. Boost.Filesystem has a very large surface area, and we don't need to put an interface around the entire thing, only around the part that text_files uses.

If we don't want production code to have to worry about supplying an instance of filesystem_directory_scanner to text_files, we can overload text_files and use simple delegation to supply the dependency:

std::vector<std::string>
text_files(std::string const& directory)
{
    filesystem_directory_scanner scanner;
    return text_files(directory, scanner);
}

Seeing this, you might wonder whether we need to unit test the overload we just created. Because it uses only simple delegation, it isn't complex enough to warrant a unit test. However, we have no unit tests for filesystem_directory_scanner, which does contain control structures, so we will want some sort of automated test around that code to verify that it functions properly. We can use acceptance tests to verify the system as a whole, exercising all the components end to end and not just in isolation.
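
Short of a full acceptance test, one option is a small integration test that runs the real scanner against a scratch directory. The sketch below is an illustration under several assumptions: it uses C++17 std::filesystem in place of Boost.Filesystem so that it is self-contained, std_directory_scanner is a hypothetical stand-in for filesystem_directory_scanner, and scan_scratch_directory and the scratch directory name are invented for the example. A fixed directory name is not safe if several test runs execute concurrently.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Same shape as the directory_scanner interface shown earlier.
class directory_scanner
{
public:
    virtual ~directory_scanner() {}
    virtual void begin(std::string const &directory) = 0;
    virtual bool has_next() const = 0;
    virtual std::string next() const = 0;
};

// Assumption: a std::filesystem stand-in for filesystem_directory_scanner.
class std_directory_scanner : public directory_scanner
{
public:
    virtual void begin(std::string const &directory)
    {
        current = fs::directory_iterator(directory);
        skip_non_files();
    }

    virtual bool has_next() const { return current != end; }

    virtual std::string next() const
    {
        if (!has_next())
        {
            throw std::runtime_error("no next path");
        }
        std::string const result = current->path().filename().string();
        ++current;
        skip_non_files();
        return result;
    }

private:
    // Advance past directories and other entries that are not regular files.
    void skip_non_files() const
    {
        while (current != end && !current->is_regular_file())
        {
            ++current;
        }
    }

    mutable fs::directory_iterator current;
    fs::directory_iterator end;
};

// Hypothetical helper: builds a scratch directory holding two files and one
// subdirectory, scans it, removes it, and returns the sorted file names.
std::vector<std::string> scan_scratch_directory()
{
    fs::path const root = fs::temp_directory_path() / "directory_scanner_sketch";
    fs::remove_all(root);                     // start from a known state
    fs::create_directories(root / "subdir");  // should be skipped by the scanner
    std::ofstream(root / "a.txt") << "text";
    std::ofstream(root / "b.dat") << "data";

    std::vector<std::string> names;
    {
        std_directory_scanner scanner;
        scanner.begin(root.string());
        while (scanner.has_next())
        {
            names.push_back(scanner.next());
        }
    }   // scanner is destroyed here, before the directory is removed

    fs::remove_all(root);
    std::sort(names.begin(), names.end());    // iteration order is unspecified
    return names;
}
```

A test of this kind touches the disk, so it belongs in a slower integration suite rather than alongside the unit tests for text_files.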

Example Source Code

System under test:
Tests:
- file_system.cpp