
Chaos Monkey Bit Me - A short story about transactionality

posted Nov 26, 2013, 7:33 AM by Sami Lehtinen   [ updated Dec 3, 2013, 8:20 AM ]
Today a Chaos Monkey bit me. Luckily it wasn't serious this time and I survived, but it could have been much worse.

What happened? Well, I failed to fulfill the requirement I have been asking of everybody else: proper use of transactionality. Basically I'm doing everything right, but there was one minor glitch. It doesn't happen often and I didn't expect it to happen any time soon, even though I had acknowledged the possibility earlier. So, if you think that something can fail and you don't handle it properly in your code, it's going to bite you at some point.

How bad was my code, then? It wasn't very bad at all, and I have checked that the operation is very quick, usually much less than 100 ms even with traditional disks.

Original implementation:
  1. open file for (over)write
  2. write new content
  3. close file
Should be quite OK. Of course I handle exceptions, but how could this fail badly? Well, in this case it turned out that the server's file system performance was catastrophically bad due to underlying technical reasons.
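A minimal sketch of the original pattern (the function and file names are mine, for illustration). The weak point: opening a file in "w" mode truncates the old content immediately, before any new data reaches the disk.

```python
# Naive overwrite: the old content is truncated the moment the file is
# opened for writing, so a crash or hang between open and write loses
# both the old and the new version.
def overwrite(path, data):
    with open(path, "w") as f:   # old content is already gone here
        f.write(data)            # a hang/crash here leaves an empty file
```

This is exactly the window the server failure hit: the open succeeded, the write never made it to disk.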

How did it affect my program? The write hung, and the server was shut down (hard) to fix the issue. After fixing everything the system was up and running again, but my app wasn't working correctly: it was looking for the content of that file. The disk write had failed completely, but because the file had already been opened for writing, the old content was gone. In this case I was lucky, it only prevented the task from running. In some other cases, to make testing easier, I have used a method where I create file keys with default values if content is lost. In such a situation things could get a lot worse, if transaction pointers are unexpectedly reversed back to some old state.

How to fix the problem? I thought for a while about what the clearest solution would be, and I decided to use the same approach SQLite3 uses by default, where deleting the rollback journal marks the commit to the database.

New implementation:
  1. read file 1
  2. if file 1 doesn't exist, do step 9 and repeat step 1
  3. do stuff
  4. write file 2
  5. flush file 2
  6. fsync file 2
  7. close file 2
  8. delete file 1 (commit)
  9. rename file 2 to file 1

I think this should work pretty well. Even the original simple solution worked well, except in the situation where the system fails between (original) steps 1 and 2.
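The numbered steps above can be sketched in Python like this. This is my sketch, not the actual code from the post; the function and file names are illustrative. Deleting file 1 is the commit point, and the recovery rule from step 2 finishes an interrupted rename on the next run.

```python
import os

def commit_write(path, tmp_path, data):
    # Steps 4-9: write the new version to a temporary file, force it to
    # disk, then delete the old file (the commit point) and rename.
    with open(tmp_path, "w") as f:
        f.write(data)
        f.flush()                # push Python's buffer to the OS
        os.fsync(f.fileno())     # force the OS to put the data on disk
    os.remove(path)              # step 8: commit, old version is gone
    os.rename(tmp_path, path)    # step 9: new version gets the final name

def recover_read(path, tmp_path):
    # Steps 1-2: if the final file is missing, the crash happened between
    # delete and rename; finish the rename, then read normally.
    if not os.path.exists(path):
        os.rename(tmp_path, path)
    with open(path) as f:
        return f.read()
```

At every possible interruption point there is a complete copy of either the old or the new content on disk, which is what the original open-for-overwrite version could not guarantee.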

Should I now expect some other horrible problems arising from this new approach? - Why fsync? It's very important. By default, Windows syncs file data to disk when the file is closed, but that might not always be the case. Some Linux systems use data=writeback mode, which basically means that file 2 could exist and file 1 be deleted while the data of file 2 isn't on disk yet.
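A related detail worth knowing on Linux: fsyncing the file makes its data durable, but the delete and rename are directory metadata, so to be thorough you can also fsync the containing directory. A sketch (the helper name is mine; this works on POSIX systems, not on Windows):

```python
import os

def fsync_dir(dirpath):
    # On Linux, renames and deletes are changes to the directory itself.
    # Fsyncing the directory makes the name change durable, not just the
    # file data. Opening a directory read-only like this is POSIX-only.
    fd = os.open(dirpath, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
```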

This is only a small part of the whole process; of course there should be other journal files, which can be used to update "all related information" in case something isn't fully completed. I usually prefer to use UUIDs for data entries, which allows me to roll back transaction pointers and just "recheck" that all data exists, in cases where it's unclear which parts of a larger data set / transaction were properly processed.

Yes, of course major parts of the program are protected by mutexes etc. to prevent all other kinds of problems. But this was such a small problem, with such low probability, that I just assumed it would be OK. The amount of data being written is less than 4 kB, and this code has been running millions of times without any problems on multiple servers and in different environments. Until there was an error which stopped processing at the critical point.

What did we learn from this? Even if your program has been working for several years on several servers and for many customers, that still doesn't mean it's solid, nor does it prove it's working. It can still be fatally flawed. All programs should be written so that they can be aborted at any point and still work on the next run. Yet as we all know, we have seen tons of violations of this rule due to developer laziness.

Why am I not using a database? Well, it's just so handy to use props files; modifying and updating data in a database is much more complex in these cases and less admin / user friendly, or it requires additional admin / user tools and user interfaces.

The new implementation isn't anything new. I've always done this with SFTP, FTPS etc.: first upload the file using a temporary file name, then check the source and destination file sizes, and rename the file to the final file name. This is a great way to prevent partial files from being processed at the other end. But I didn't think it was necessary to do all these steps on the local disk.
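The upload pattern can be sketched against any SFTP client that offers `put`, `stat` and `rename` (Paramiko's SFTPClient has exactly these methods). The helper and the `.part` suffix are my choices, not from the post:

```python
import os

def safe_upload(sftp, local_path, remote_path):
    # Upload under a temporary name, verify sizes match, then rename to
    # the final name so the receiver never picks up a partial file.
    tmp_remote = remote_path + ".part"      # temporary name, my convention
    sftp.put(local_path, tmp_remote)
    local_size = os.path.getsize(local_path)
    remote_size = sftp.stat(tmp_remote).st_size
    if local_size != remote_size:
        raise IOError("size mismatch, refusing to commit upload")
    sftp.rename(tmp_remote, remote_path)    # commit: file appears complete
```

If the receiving end only processes files matching the final naming pattern, an interrupted upload simply leaves an ignored `.part` file behind.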

If you want to make SQLite3 faster, you can do it by disabling synchronous disk I/O and SQLite's own journaling. Then it's faster, and totally unprotected against the very things that journaling and a transactional database should protect you from. When you combine that with write nobarriers and writeback or no journaling at the file system level, you can get interesting results after a random OS crash.
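These are real SQLite pragmas, so the trade-off is two statements away (the database path here is illustrative):

```python
import sqlite3, tempfile, os

db_path = os.path.join(tempfile.mkdtemp(), "fast_but_fragile.db")
conn = sqlite3.connect(db_path)
# Both pragmas defeat the protections SQLite's journaling normally gives.
conn.execute("PRAGMA synchronous = OFF")   # no fsync on commit
conn.execute("PRAGMA journal_mode = OFF")  # no rollback journal at all
conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
conn.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")
conn.commit()
```

Benchmarks run noticeably faster this way, and a power loss mid-write can leave the database file corrupted with no journal to recover from.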

As a user of USB flash memories, I guess everyone is familiar with the situation where you copy stuff to a stick and then remove it. After that, some files are completely missing and some files can be partial or zero size, depending on the file system being used.

Further reading: How To Corrupt An SQLite Database