A few times I've discovered code that uses tokio File I/O in a way that is unreliable. This is because there's a bit of a footgun in the implementation of how File writes occur in the tokio runtime.
If you use tokio::fs::File in your code, you should probably know about this hazard, to avoid some unpleasant surprises.
Part 1: How things work in std. §
Let's say you have a Rust module that defines a config file. We have some Config struct:
use serde::{Deserialize, Serialize};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
pub struct Config {
    pub enabled: bool,
    pub target: String,
}
And we have a function that can read the config file from the filesystem.
use std::fs::File;
use std::io;
use std::path::Path;

impl Config {
    /// Read configuration data from a file.
    pub fn read_from_file(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let config = serde_json::from_reader(file)?;
        Ok(config)
    }
}
I might want to write a unit test to verify the functionality of read_from_file. So I decide to write a write_to_file function as well, so that I can write an end-to-end test.
impl Config {
    /// Write configuration data to a file.
    pub fn write_to_file(&self, path: &Path) -> io::Result<()> {
        let file = File::create(path)?;
        serde_json::to_writer(file, self)?;
        Ok(())
    }
}
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn config_writer() {
        let config = Config {
            enabled: true,
            target: "test123".to_string(),
        };
        // Don't do this; use the tempfile crate instead.
        let test_path = Path::new("test_config.json");
        config.write_to_file(test_path).unwrap();
        let config2 = Config::read_from_file(test_path).unwrap();
        assert_eq!(config, config2);
    }
}
All is well. The unit test passes, and I can move on with life.
Part 2: Trying the same thing with Tokio. §
This could have played out in a slightly different way. Maybe I found myself in a different module, full of async code, and I needed to write out a Config. Being a good async citizen, I decide to add an async config writer.
use std::io;
use std::path::Path;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

impl Config {
    /// Write configuration data to a file, asynchronously.
    pub async fn async_write_to_file(&self, path: &Path) -> io::Result<()> {
        let mut file = File::create(path).await?;
        let raw_data = serde_json::to_vec(self)?;
        file.write_all(&raw_data).await?;
        Ok(())
    }
}
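The test below also reads the config back asynchronously, via an async_read_from_file method. I won't dwell on it, but a minimal version might look something like this (a sketch that reads the whole file into memory with tokio's AsyncReadExt before deserializing):

use tokio::io::AsyncReadExt;

impl Config {
    /// Read configuration data from a file, asynchronously.
    /// (A minimal async counterpart to read_from_file.)
    pub async fn async_read_from_file(path: &Path) -> io::Result<Self> {
        let mut file = File::open(path).await?;
        let mut raw_data = Vec::new();
        file.read_to_end(&mut raw_data).await?;
        let config = serde_json::from_slice(&raw_data)?;
        Ok(config)
    }
}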
With both halves in place, an all-async end-to-end test looks like this:

#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn config_writer() {
        let config = Config {
            enabled: true,
            target: "test123".to_string(),
        };
        // Don't do this; use the tempfile crate instead.
        // If you're wondering why, try running this test and
        // the previous one in parallel, using the same filename.
        let test_path = Path::new("test_config.json");
        config.async_write_to_file(test_path).await.unwrap();
        let config2 =
            Config::async_read_from_file(test_path).await.unwrap();
        assert_eq!(config, config2);
    }
}
Or maybe I'm a little bit lazy, and I decide that my unit test will use the async writer along with the blocking reader from Part 1 instead. It's just a unit test, right? A little bit of blocking shouldn't hurt anyone.
#[cfg(test)]
mod tests {
    use super::*;

    #[tokio::test]
    async fn config_writer() {
        let config = Config {
            enabled: true,
            target: "test123".to_string(),
        };
        let test_path = Path::new("test_config.json");
        config.async_write_to_file(test_path).await.unwrap();
        let config2 = Config::read_from_file(test_path).unwrap();
        assert_eq!(config, config2);
    }
}
All is... hang on a moment.
---- tests::config_writer stdout ----
thread 'tests::config_writer' panicked at src/lib.rs:30:57:
called `Result::unwrap()` on an `Err` value: Custom {
kind: UnexpectedEof, error: Error("EOF while parsing a value",
line: 1, column: 0) }
Part 3: What happened? §
If you haven't encountered this issue before, you may be a bit surprised to learn that Tokio file writes are not guaranteed to have written any data when the future completes.
In the unit test we wrote, this means that we are reading the file before the data has been written. This is a race condition: sometimes the data may have arrived; sometimes not.
By adjusting the timing between the writer and the reader (for example, by adding a delay), you can make this test fail a lot or succeed most of the time.
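For example, here's a variant of the failing test, dropped into the same tests module, with an artificial delay (my own illustration; the sleep is a diagnostic, not a fix):

#[tokio::test]
async fn config_writer_with_delay() {
    let config = Config {
        enabled: true,
        target: "test123".to_string(),
    };
    let test_path = Path::new("test_config_delayed.json");
    config.async_write_to_file(test_path).await.unwrap();

    // Give the background write time to finish before reading. With the
    // sleep in place this usually passes; remove it and it fails a lot.
    tokio::time::sleep(std::time::Duration::from_millis(100)).await;

    let config2 = Config::read_from_file(test_path).unwrap();
    assert_eq!(config, config2);
}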
In the mixed unit test (async writer, blocking reader), we assume that when async_write_to_file completes successfully, it means that the file data has been written. Well, "written" to the extent that the filesystem and kernel received all of the data, and other processes will be able to see the full contents of the file.
This assumption is incorrect. Tokio File method names mirror those from the Rust standard library. But they don't work the same way, and they don't promise the same things.
When a tokio File write future completes, the data has been delivered only to some location inside the tokio runtime. A successful Future does not mean that the data is visible to anyone else. The data hasn't reached the kernel, so it isn't visible to the standard library, or other processes. If the process terminates unexpectedly, it might disappear entirely.
Could this affect code outside of unit tests? Unfortunately, there are a lot of real-world situations where this might be a problem:
- Writing a file, then calling in to some non-tokio code that opens the file itself and reads the data. This might happen if we need to link to a C library that is the consumer of the config file we are writing.
- Writing a file, then spawning (or signalling) another process that opens the file and reads the data. Like the previous example, perhaps the config file we are writing is for some daemon process that will reload its config when signalled.
- Writing a file, then moving it into a directory that is watched via inotify, triggering a function that reads the data. Daemon processes may implement this as a strategy for automatically reloading config files, picking up new jobs from a spool directory, or triggering a project rebuild.
- Writing a temporary file, then (atomically) moving it to its final path in an effort to guarantee that no reader will ever see a partially written file. For example, our config file could be used by every ssh client on the system, and we don't control the timing of all those processes.
If you're doing anything like this within tokio, then it's critical that you call flush() on the File, which will force writes to actually complete. If you do this before other code attempts to read the file, then the reader should see the results of the write.
file.write_all(&raw_data).await?;
file.flush().await?;
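Folded into the writer from Part 2, the fixed version might look like this:

impl Config {
    /// Write configuration data to a file, and don't return until the
    /// bytes have actually reached the kernel.
    pub async fn async_write_to_file(&self, path: &Path) -> io::Result<()> {
        let mut file = File::create(path).await?;
        let raw_data = serde_json::to_vec(self)?;
        file.write_all(&raw_data).await?;
        // Without this flush, the write may still be sitting inside the
        // tokio runtime when this future completes.
        file.flush().await?;
        Ok(())
    }
}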
This is documented in the tokio::fs module, which says:

"Note: It is important to use flush when writing to a Tokio File. This is because calls to write will return before the write has finished, and flush will wait for the write to finish. (The write will happen even if you don’t flush; it will just happen later.) This is different from std::fs::File, and is due to the fact that File uses spawn_blocking behind the scenes."
This explanation is a little bit confusing, because later on in the same documentation there is an example of using spawn_blocking with some std File writes, with a comment explaining that no flush is necessary.
A user would also be forgiven for not reading the fs module documentation. Tokio I/O methods follow the semantics of the standard library so closely that a casual user might assume the write completion guarantees from std are upheld by the tokio methods of the same name.
I should also note that I only tested Linux. I don't know if Tokio's I/O implementation works differently on other operating systems.
Part 4: Runtime boundaries §
In this situation the updates to the file haven't reached the kernel yet. Are they visible to readers in the current runtime that try to read the data using tokio::fs::File?
As far as I can tell, the answer is yes: if we issue a tokio File write, followed by a tokio File read, everything works as expected.
There are a few other situations we might imagine:
- What if the tokio write occurs on a throwaway Runtime? If you just call Runtime::block_on, the runtime may still be running after the call, so this might have the same problem: writes may not complete when we expect them to.
- What if we shut down the Runtime? The shutdown does complete the task within Runtime::block_on before the runtime exits, provided you give it a sufficient timeout. But if you spawn your writing task into the runtime, then the problem is still present: spawned tasks don't necessarily finish before the runtime exits. (See the sketch after this list.)
- What if we drop the Runtime? In my limited testing, this behaves the same as a shutdown: block_on work completes, but spawned work might not.
- What about a tokio File write, followed by a tokio File read from a different runtime? This seems to work consistently. Perhaps the two runtimes arrange to share a single reactor?
- What if the File write and read don't refer to the same path? E.g. what if the read is via a symlink or hard link? Both of these seem to work, though I'm not sure why. Is this because tokio inspects the underlying inode? Maybe it enforces some total ordering over all file I/O? Or something else entirely? I'd love to know the answer.
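Here is a small illustration of the block_on-versus-spawn difference (my own sketch, not one of the experiments above; the filenames are made up):

use tokio::io::AsyncWriteExt;

fn main() -> std::io::Result<()> {
    let rt = tokio::runtime::Runtime::new()?;

    // Work driven by block_on is polled to completion before block_on
    // returns, so by the time we get past this call the flush has happened.
    rt.block_on(async {
        let mut file = tokio::fs::File::create("from_block_on.json").await?;
        file.write_all(b"{}").await?;
        file.flush().await?;
        Ok::<_, std::io::Error>(())
    })?;

    // A spawned task is only scheduled; nothing forces it to finish before
    // the runtime is shut down or dropped.
    let _handle = rt.spawn(async {
        let mut file = tokio::fs::File::create("from_spawn.json").await.unwrap();
        file.write_all(b"{}").await.unwrap();
        file.flush().await.unwrap();
    });

    drop(rt); // the spawned write may be abandoned here
    Ok(())
}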
Part 5: Similar-looking but different problems §
I should note that this issue doesn't have anything to do with the kernel caching writes. You may be aware that any file I/O may fail to be durable if the kernel crashes or we lose power. Calling e.g. sync_all may help, depending on the filesystem and the hardware involved.
Had the file writes been delivered to the kernel (e.g. by calling the write syscall and waiting for it to complete), then the written bytes might not be durable, but they would be visible to other processes. Since that didn't happen in our experiments above, we can rule out issues with the filesystem, kernel caches, storage device caches, etc. Those are all interesting issues that are good to know about, but they aren't part of the picture here.
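To make the distinction concrete, here's how the two guarantees look side by side in tokio (an illustration; the helper name is mine):

use std::path::Path;
use tokio::fs::File;
use tokio::io::AsyncWriteExt;

// Illustrative helper: the different levels of "written".
async fn write_bytes(path: &Path, raw_data: &[u8]) -> std::io::Result<()> {
    let mut file = File::create(path).await?;
    file.write_all(raw_data).await?; // may still be buffered inside the tokio runtime
    file.flush().await?;             // handed to the kernel: visible to other processes
    file.sync_all().await?;          // pushed to stable storage: durable (a separate concern)
    Ok(())
}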
Most engineers know that you may lose data during a kernel crash or power loss, and countermeasures to this problem are widely known. It's not very often that you encounter something that looks like a standard library write but has weaker guarantees that are hard to discover. Unless you deliberately race writers and readers in testing, you may not be aware that your software is relying on an ordering (when we see the write complete, the kernel cache has received the data) that is just luck.
These problems also resemble the issues you might get with userspace caching layers such as std::io::BufWriter. Those issues are very similar, in that an explicit flush step can force the write to complete. BufWriter will also flush data on Drop. Tokio is different in two ways: first, there is no async Drop in Rust, so an explicit flush is a requirement; and second, in tokio there is no lower level you can drop down to if you want to opt out of the buffering behavior.
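For comparison, the familiar BufWriter pattern (my own minimal example):

use std::fs::File;
use std::io::{BufWriter, Write};

fn main() -> std::io::Result<()> {
    let file = File::create("out.json")?;
    let mut writer = BufWriter::new(file);
    writer.write_all(b"{}")?; // may only land in BufWriter's userspace buffer
    writer.flush()?;          // forces the buffered bytes into the kernel
    // BufWriter also flushes on Drop (ignoring errors). Tokio's File can't,
    // because Rust has no async Drop, so the explicit flush is mandatory there.
    Ok(())
}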
Part 6: What now? §
A few reporters have argued that this is a bug. The relevant issues have been closed, without any changes to tokio's behavior.
If this won't be fixed in Tokio 1.x, what can users do? For the moment, it seems like a hazard that all users of tokio File will need to be aware of. As countermeasures, a designer can take care not to depend upon the timing of write completions, or can take extra steps to ensure that writes have completed before depending on the results.
You can work around the issue in tokio by calling flush and awaiting the result. As far as I can gather, this doesn't complete until the bytes have been delivered to the kernel. So write followed by flush is the equivalent of write in std.
Unfortunately, if you are dealing with an external crate that drops the tokio File without calling flush, you may be out of luck. Drop does not guarantee that the data has been delivered to the kernel.
This also means that the semantics of std::fs::File and tokio::fs::File have diverged. You can't make the same assumptions about tokio Files based on your understanding of std (or of libc, or any other language's standard library).
Maybe someday, tokio will switch over to something like tokio-uring on Linux, and the semantics will converge with std. There are a lot of issues that need to be resolved first (particularly around cancellation), but uring-based approaches have nice performance benefits and might be a great fit for I/O-heavy async runtimes.
Cheers! Good luck with your Rust projects.