I really hate unix signals. So I agree wholeheartedly with the sentiment here:
One of the nastiest bugs I've seen in recent memory was a memory corruption that took months to track down. We finally found that the root cause was an SG_IO ioctl getting interrupted by a signal and then scribbling into some other process's memory. Thankfully, the data was recognizable as the response to a SCSI INQUIRY command; otherwise we might never have figured out what was going on. The fact that the process getting whacked never called SG_IO, and should never have seen the results of an INQUIRY, was a bit of a giveaway.
So even though we were handling signals correctly in userspace, we got screwed by a kernel bug where a particular syscall didn't handle interruptions correctly.
Why? Why is SG_IO interruptible in the first place? The SCSI command in question already went out, so you can't exactly call it back (SCSI abort handling is its own happy can of worms, but in this case no abort is even attempted). So I'm a little sad that the fix wasn't just "make SG_IO uninterruptible".
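
For what it's worth, the closest thing userspace has to "uninterruptible" is blocking signals around the call. Here is a minimal sketch of that idea in Python, assuming a POSIX system and Python 3.8+. The helper name signals_blocked is made up for illustration, and this is not what we actually ran. It also would not have fixed the kernel bug itself; it only keeps blockable signals from landing on the calling thread while the syscall is in flight (SIGKILL and SIGSTOP can't be blocked).

```python
import signal
from contextlib import contextmanager


@contextmanager
def signals_blocked(signums=None):
    """Block signals for the calling thread for the duration of the block.

    Hypothetical helper for illustration. By default it blocks every
    signal the platform allows; SIGKILL and SIGSTOP cannot be blocked.
    """
    if signums is None:
        signums = signal.valid_signals()
    # pthread_sigmask returns the previous mask so it can be restored later.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, signums)
    try:
        yield
    finally:
        # Restoring the old mask lets any signals that arrived in the
        # meantime be delivered now, after the critical call has finished.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

You would wrap the sensitive call in "with signals_blocked():" so that nothing blockable interrupts it on that thread. Signals sent while the mask is in place stay pending and get delivered once the block exits, so handlers still run, just later.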