A few times now, I've discovered code that uses tokio File I/O in a way that is unreliable. There's a bit of a footgun in how File writes are implemented in the tokio runtime.

If you use tokio::fs::File in your code, you should probably know about this hazard, to avoid some unpleasant surprises.

Part 1: How things work in std

Let's say we have a Rust module that defines a config file, with some Config struct:

use serde::{Deserialize, Serialize};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
pub struct Config {
    pub enabled: bool,
    pub target: String,
}

And we have a function that can read the config file from the filesystem.

use std::fs::File;
use std::io;
use std::path::Path;

impl Config {
    /// Read configuration data from a file.
    pub fn read_from_file(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let config = serde_json::from_reader(file)?;
        Ok(config)
    }
}

I might want to write a unit test to verify the functionality of read_from_file. So I decide to write a write_to_file function as well, so that I can write an end-to-end test.

impl Config {
    /// Write configuration data to a file.
    pub fn write_to_file(&self, path: &Path) -> io::Result<()> {
        let file = File::create(path)?;
        serde_json::to_writer(file, self)?;
        Ok(())
    }
}

#[cfg(test)]
mod tests {

    use super::*;

    #[test]
    fn config_writer() {
        let config = Config {
            enabled: true,
            target: "test123".to_string(),
        };
        // Don't do this; use the tempfile crate instead.
        let test_path = Path::new("test_config.json");

        config.write_to_file(test_path).unwrap();
        let config2 = Config::read_from_file(test_path).unwrap();

        assert_eq!(config, config2);
    }
}

All is well. The unit test passes, and I can move on with life.

Part 2: Trying the same thing with Tokio

This could have played out in a slightly different way. Maybe I found myself in a different module, full of async code, and I needed to write out a Config. Being a good async citizen, I decide to add an async config writer (called async_write_to_file, since Config already has a blocking write_to_file).

use std::io;
use std::path::Path;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

impl Config {
    /// Write configuration data to a file.
    pub async fn async_write_to_file(&self, path: &Path) -> io::Result<()> {
        let mut file = File::create(path).await?;
        let raw_data = serde_json::to_vec(self)?;
        file.write_all(&raw_data).await?;
        Ok(())
    }
}

I want to write a unit test for this function. I'm a little bit lazy, so I decide that my unit test will use the async writer along with a blocking reader. It's just a unit test, right? A little bit of blocking shouldn't hurt anyone.

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn config_writer() {
        let config = Config {
            enabled: true,
            target: "test123".to_string(),
        };
        // Don't do this; use the tempfile crate instead.
        // If you're wondering why, try running this test and
        // the previous one in parallel, using the same filename.
        let test_path = Path::new("test_config.json");

        config.async_write_to_file(test_path).await.unwrap();
        let config2 = Config::read_from_file(test_path).unwrap();

        assert_eq!(config, config2);
    }
}

All is... hang on a moment.

---- tests::config_writer stdout ----
thread 'tests::config_writer' panicked at src/lib.rs:30:57:
called `Result::unwrap()` on an `Err` value: Custom {
    kind: UnexpectedEof, error: Error("EOF while parsing a value",
    line: 1, column: 0) }

Part 3: What happened?

If you haven't encountered this issue before, you may be a bit surprised to learn that Tokio file writes are not guaranteed to have written any data when the future completes.

In the unit test we wrote, this means that we are reading the file before the data has been written. This is a race condition: sometimes the data may have arrived; sometimes not.

By adding or removing a time delay between the writer and the reader, you can make this test fail most of the time, or succeed most of the time.
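For example, a variant of the failing test with an arbitrary 100ms delay between the write and the read (an experiment, not a fix, and the filename here is made up) passes most of the time:

#[tokio::test]
async fn config_writer_with_delay() {
    let config = Config {
        enabled: true,
        target: "test123".to_string(),
    };
    let test_path = Path::new("test_config_delay.json");

    config.async_write_to_file(test_path).await.unwrap();
    // Give the runtime's background write a chance to finish.
    // This usually hides the race; it does not remove it.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;
    let config2 = Config::read_from_file(test_path).unwrap();

    assert_eq!(config, config2);
}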

In our second unit test, we assume that when async_write_to_file completes successfully, it means that the file data has been written. Well, "written" to the extent that the filesystem and kernel received all of the data, and other processes will be able to see the full contents of the file.

This assumption is incorrect. Tokio File method names mirror those from the Rust standard library. But they don't work the same way, and they don't promise the same things.

When a tokio File write future completes, the data has been delivered only to some location inside the tokio runtime. A successful Future does not mean that the data is visible to anyone else. The data hasn't reached the kernel, so it isn't visible to the standard library, or other processes. If the process terminates unexpectedly, it might disappear entirely.

Could this affect code outside of unit tests? Unfortunately, there are a lot of real-world situations where this might be a problem:

  • Writing a file, then calling into some non-tokio code that opens the file itself and reads the data. This might happen if we link against a C library that is the consumer of the config file we are writing.
  • Writing a file, then spawning (or signalling) another process that opens the file and reads the data. Like the previous example, perhaps the config file we are writing is for some daemon process that will reload its config when signalled.
  • Writing a file, then moving it into a directory that is watched via inotify, triggering a function that reads the data. Daemon processes may implement this as a strategy for automatically reloading config files, picking up new jobs from a spool directory, or triggering a project rebuild.
  • Writing to a temporary path, then (atomically) renaming it to its final path in an effort to guarantee that no reader will ever see a partially written file. For example, our config file could be used by every ssh client on the system, and we don't control the timing of all those processes.

If you're doing anything like this within tokio, then it's critical that you call flush() on the File, which will force writes to actually complete. If you do this before other code attempts to read the file, then the reader should see the results of the write.

        file.write_all(&raw_data).await?;
        file.flush().await?;
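Putting it together (with the same imports as the Part 2 module), the async writer looks like this with the flush added:

impl Config {
    /// Write configuration data to a file, and flush it so that the
    /// bytes are visible to other readers before this function returns.
    pub async fn async_write_to_file(&self, path: &Path) -> io::Result<()> {
        let mut file = File::create(path).await?;
        let raw_data = serde_json::to_vec(self)?;
        file.write_all(&raw_data).await?;
        // Without this, the data may still be sitting inside the
        // tokio runtime when the future completes.
        file.flush().await?;
        Ok(())
    }
}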

This is documented in the tokio::fs module, which says:

Note: It is important to use flush when writing to a Tokio File. This is because calls to write will return before the write has finished, and flush will wait for the write to finish. (The write will happen even if you don’t flush; it will just happen later.) This is different from std::fs::File, and is due to the fact that File uses spawn_blocking behind the scenes.

This explanation is a little bit confusing, because later on in the same documentation there is an example of using spawn_blocking with some std File writes, with a comment explaining that no flush is necessary.
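For comparison, here is a sketch of that spawn_blocking approach; the method name is my own invention. Because the write happens via std::fs on the blocking thread pool, the data has already reached the kernel by the time the spawned task completes, so no flush is needed:

use std::io::{self, Write};
use std::path::PathBuf;

impl Config {
    /// Hypothetical alternative: do the blocking write on tokio's
    /// blocking thread pool. When the spawned task completes, the
    /// std write has already returned, so the data is visible to
    /// other readers without a flush.
    pub async fn write_to_file_blocking(&self, path: PathBuf) -> io::Result<()> {
        let raw_data = serde_json::to_vec(self)?;
        tokio::task::spawn_blocking(move || {
            let mut file = std::fs::File::create(path)?;
            file.write_all(&raw_data)
        })
        .await
        .expect("blocking write task panicked")
    }
}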

A user would also be forgiven for not reading the fs module documentation. Tokio I/O methods follow the semantics of the standard library so closely that a casual user might assume the write-completion guarantees from std are upheld by the tokio methods of the same name.

I should also note that I only tested Linux. I don't know whether Tokio's I/O implementation works differently on other operating systems.

Part 4: Runtime boundaries

In this situation, the updates to the file haven't reached the kernel yet. Are they visible to readers in the same runtime that try to read the data using tokio::fs::File?

As far as I can tell, the answer is yes: if we issue a tokio File write, followed by tokio File read, everything works as expected.
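Here's a minimal example of the kind of round trip I mean (the path is a throwaway name). In my testing this works, though I haven't found it documented as a guarantee:

use tokio::io::AsyncWriteExt;

// Observed behavior, not a documented guarantee: within a single
// runtime, a tokio read issued after a tokio write sees the data,
// even without a flush.
async fn same_runtime_roundtrip() -> std::io::Result<()> {
    let mut file = tokio::fs::File::create("scratch.txt").await?;
    file.write_all(b"hello world").await?;
    drop(file);

    let contents = tokio::fs::read("scratch.txt").await?;
    assert_eq!(contents, b"hello world");
    Ok(())
}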

There are a few other situations we might imagine:

  • What if the tokio write occurs on a throwaway Runtime? If you just call Runtime::block_on, the runtime may still be running after the call, so this might have the same problem: writes may not complete when we expect them to. (See the sketch after this list.)
  • What if we shut down the Runtime? The shutdown does complete the task within Runtime::block_on before the runtime exits, provided you give it a sufficient timeout. But if you spawn your writing task into the runtime, then the problem is still present: spawned tasks don't necessarily finish before the runtime exits.
  • What if we drop the Runtime? In my limited testing, this behaves the same as a shutdown: block_on work completes, but spawned work might not.
  • What about a tokio File write, followed by a tokio File read from a different runtime? This seems to work consistently. Perhaps the two runtimes arrange to share a single reactor?
  • What if the File write and read don't refer to the same path? E.g. what if the read is via a symlink or hard link? Both of these seem to work, though I'm not sure why. Is this because tokio inspects the underlying inode? Maybe it enforces some total ordering over all file I/O? Or something else entirely? I'd love to know the answer.
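For the curious, here is the shape of the experiment I have in mind for the runtime scenarios above. The helper names and payload are my own invention, and your results may vary:

use tokio::io::AsyncWriteExt;

// Write on a throwaway runtime without flushing, then read with std
// and see whether the bytes made it.
async fn write_unflushed(path: &str) -> std::io::Result<()> {
    let mut file = tokio::fs::File::create(path).await?;
    file.write_all(b"{\"enabled\":true}").await?;
    Ok(()) // note: no flush
}

fn experiment() -> std::io::Result<()> {
    let rt = tokio::runtime::Runtime::new()?;
    // block_on drives the write future to completion, but the data
    // may still be inside the runtime when this call returns.
    rt.block_on(write_unflushed("scratch.json"))?;
    // Variants to try: leave `rt` alive, drop(rt), or
    // rt.shutdown_timeout(std::time::Duration::from_secs(1)).
    drop(rt);
    let data = std::fs::read("scratch.json")?;
    println!("read {} bytes", data.len());
    Ok(())
}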

Part 5: Similar-looking but different problems

I should note that this issue doesn't have anything to do with the kernel caching writes. You may be aware that any file I/O may fail to be durable if the kernel crashes or we lose power. Calling e.g. sync_all may help, depending on the filesystem and the hardware involved.

Had the file writes been delivered to the kernel (e.g. by calling the write syscall and waiting for it to complete), the written bytes might not be durable, but they would be visible to other processes. Since that didn't happen in our experiments above, we can rule out issues with the filesystem, kernel caches, storage device caches, etc. Those are all interesting issues that are good to know about, but they aren't part of the picture here.

Most engineers know that you may lose data during a kernel crash or power loss, and countermeasures to this problem are widely known. It's not very often that you encounter something that looks like a standard library write but has weaker guarantees that are hard to discover. Unless you deliberately race writers and readers in testing, you may not be aware that your software is relying on an ordering (when we see the write complete, the kernel cache has received the data) that is just luck.

These problems also resemble the issues you might get with userspace caching layers, e.g. std::io::BufWriter. Those issues are very similar, in that an explicit flush step forces the write to complete. But tokio is different in two ways: first, there is no async Drop in Rust, so an explicit flush is a requirement (BufWriter, by contrast, flushes on Drop); second, in tokio there is no lower level you can drop down to if you want to opt out of the buffering behavior.
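To make the analogy concrete, here is the BufWriter version of the same hazard:

use std::fs::File;
use std::io::{BufWriter, Write};

// The std analogue: BufWriter holds data in userspace until it is
// flushed. The difference is that BufWriter flushes on Drop, and you
// can always skip the buffering layer by writing to the File directly.
fn buffered_write() -> std::io::Result<()> {
    let file = File::create("buffered.txt")?;
    let mut writer = BufWriter::new(file);
    writer.write_all(b"hello")?;
    // Until this flush (or the Drop), another process may see an
    // incomplete file.
    writer.flush()?;
    Ok(())
}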

Part 6: What now?

A few reporters have argued that this is a bug. The relevant issues have been closed, without any changes to tokio's behavior.

If this won't be fixed in Tokio 1.x, what can users do? For the moment, it seems like a hazard that all users of tokio File will need to be aware of. As countermeasures, you can take care not to depend on the timing of write completions, or take extra steps to ensure that writes have completed before depending on the results.

You can work around the issue in tokio by calling flush and awaiting the result. As far as I can gather, this doesn't complete until the bytes have been delivered to the kernel. So write followed by flush is the equivalent of write in std.
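As an example, here is how the atomic-rename scenario from Part 3 might look with the workaround applied. This is a sketch: the helper name and the .tmp suffix are my own, and a real implementation would also need to think about durability (sync_all):

use std::io;
use std::path::Path;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

async fn write_file_atomically(path: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut file = File::create(&tmp).await?;
    file.write_all(data).await?;
    // Make sure the bytes have reached the kernel before the rename;
    // otherwise a reader could open the final path and see an empty
    // or partial file.
    file.flush().await?;
    // Atomically move the file to its final path (same filesystem).
    tokio::fs::rename(&tmp, path).await?;
    Ok(())
}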

Unfortunately, if you are dealing with an external crate that drops the tokio File without calling flush, you may be out of luck. Drop does not guarantee that the data has been delivered to the kernel.

Unfortunately, this means that the semantics of std::fs::File and tokio::fs::File have diverged. You can't make the same assumptions about tokio Files based on your understanding of std (or of libc, or any other language's standard library).

Maybe someday, tokio will switch over to something like tokio-uring on Linux and the semantics will converge with std. There are a lot of issues that need to be resolved first (particularly around cancellation), but uring-based approaches have nice performance benefits and might be a great fit for I/O-heavy async runtimes.


Cheers! Good luck with your Rust projects.