External SMR drive worst-case write performance test
It took about 90 minutes to completely fill the write queue, at which point performance collapsed by nearly 99%. After the test was stopped, it took nine hours to fully recover from that torture.
Background and details
Random 4 KB writes to a Toshiba Canvio 3 TB (USB 3) SMR drive pre-filled with random data. The point? To see how performance behaves before the buffer is full, what happens once it fills up, and how long it takes to recover from that situation.
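The write side of the test can be sketched roughly like this in Python. The path, sizes, and use of O_SYNC are my assumptions, not the exact harness used; in the real test the target file lives on the SMR drive's filesystem.

```python
import os
import random
import time

BLOCK = 4096  # 4 KiB write size, as in the test

def random_write_test(path, file_size, num_writes, seed=42):
    """Issue num_writes random 4 KiB synchronous writes within the first
    file_size bytes of `path` and return the measured host-side IOPS.
    Path and sizes are placeholders for the SMR drive under test."""
    rng = random.Random(seed)
    payload = os.urandom(BLOCK)
    # O_SYNC (Linux/macOS) forces each write to be acknowledged by the
    # device, so the measured rate reflects what the drive absorbs.
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_SYNC)
    try:
        os.truncate(fd, file_size)
        start = time.monotonic()
        for _ in range(num_writes):
            # Random offset, aligned to the 4 KiB block size.
            offset = rng.randrange(file_size // BLOCK) * BLOCK
            os.pwrite(fd, payload, offset)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return num_writes / elapsed
```

Because each O_SYNC write must be acknowledged before the next one is issued, the buffer-full collapse becomes directly visible as a drop in the host-side write rate.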
My guess: writes are fast until the buffer is full, then dead slow. Depending on the buffer size, recovery might take several days.
After the buffer is full, a secondary test protocol is engaged: 20 random reads every minute, to measure read latency. Twenty reads per minute should be a suitable load for this, and it lets us see the difference once the SMR write buffer has drained again; there should be some change in drive access latency.
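A minimal sketch of that per-minute read probe, assuming a plain file on the drive. On a real run the page cache would have to be bypassed (O_DIRECT, or reading from the raw device), which is omitted here for simplicity.

```python
import os
import random
import statistics
import time

BLOCK = 4096  # 4 KiB read size

def read_latency_probe(path, file_size, reads=20, seed=1):
    """Time `reads` random 4 KiB reads within file_size bytes of `path`
    and return (mean, max) latency in seconds. In the test this would
    run once a minute; cache bypass (O_DIRECT) is omitted in this sketch."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    latencies = []
    try:
        for _ in range(reads):
            offset = rng.randrange(file_size // BLOCK) * BLOCK
            t0 = time.monotonic()
            os.pread(fd, BLOCK, offset)
            latencies.append(time.monotonic() - t0)
    finally:
        os.close(fd)
    return statistics.mean(latencies), max(latencies)
```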
Baseline: 2,785 I/O operations per minute (roughly 46 IOPS).
Then we'll just wait until the write buffer is full; that should take around 5 million operations, roughly estimated.
It took longer than expected before the performance fell, but when it did, it fell just as badly as expected.
After 92 minutes of writing, the IOPS dropped from 120 to, wait for it... 1.5 IOPS. Yes, one and a half IOPS. That's the "SMR" effect I was looking for. A single operation could take 8 seconds; that 1.5 is an average. I'll let this situation persist for 30 minutes or so, just to see whether there is any jitter or the situation remains stable. There does seem to be some jitter: performance varies between 1 and 20 IOPS, but mostly stays at the lower end of that range. To the application writing the data, the write rate is around 15 to 130 4 KB blocks per minute.
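For scale, the relationship between a sustained IOPS figure and the implied average per-operation latency is just the reciprocal:

```python
def iops_to_latency_ms(iops):
    """Average time per operation (in milliseconds) implied by a
    sustained IOPS figure."""
    return 1000.0 / iops

# Before the buffer filled: 120 IOPS is about 8.3 ms per 4 KiB write.
# After: 1.5 IOPS is about 667 ms per write on average, with single
# operations observed taking up to 8 seconds.
```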
That IOPS rate means the I/O latency is huge. Any interactive user waiting for the I/O to finish is going to give up, get bored, get enraged, or something like that. The system is very unresponsive.
Next: after 120 minutes of 4 KB random writes, let's stop the write process and just keep reading the drive every minute to see if anything changes later.
As expected, the drive continues draining the write queue in the background. The test reads are very easy to hear, because they are quick random reads, while the normal queue drain is a slow process due to the SMR shingles.
As expected, an hour (60 minutes) later the DM-SMR (drive-managed) drive is still processing the write backlog. Same situation two... five... hours later. 24 hours later, the queue is empty. Let's check the logs for any latency change when the queue emptied. Based on the logs, the queue drain took 9 hours. But the most interesting part is that when the queue was empty, latency didn't go down; it went up by a factor of four on average. That's probably due to head parking or something similar, because the drive was only being accessed for around ~300 ms every minute.
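One way to spot the queue-empty point in the logs is to look for a sustained shift in the per-minute read latency. The log format and thresholds here are hypothetical, not the ones actually used:

```python
import statistics

def find_latency_shift(samples, window=30, factor=2.0):
    """Given per-minute (minute, mean_read_latency) samples, return the
    first minute where the rolling median moves by `factor` relative to
    the preceding window -- a crude way to spot when the drive's state
    changed (e.g. the write queue finished draining). Returns None if
    no such shift is found."""
    for i in range(window, len(samples) - window):
        before = statistics.median(l for _, l in samples[i - window:i])
        after = statistics.median(l for _, l in samples[i:i + window])
        if after >= factor * before or before >= factor * after:
            return samples[i][0]
    return None
```

Using the median rather than the mean keeps single outlier reads (like the occasional 8-second operation) from triggering a false detection.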
During the testing, 120 write data points and 1,331 read data points were recorded and graphed for detailed analysis of this drive. Four other drives (different makes and models) were also analyzed and their results graphed to characterize typical SMR drives, but that's outside the scope of this blog post. Based on the results, it is reasonable to conclude that this test represents an average SMR drive well. The point of this post is to make clear to the average user how SMR drives behave in practical terms, and what kind of performance to expect if the very worst case is hit.
During normal larger file writes (1 GB or larger) it's reasonable to expect a 0-30% performance drop, with the average around 15%. So there are some drawbacks, but most users won't notice them. The only major annoyance is interactive tasks, where the high I/O latency might frustrate the user; that's the access pattern to avoid when using SMR drives. If you're looking for good random write performance, get a good SSD, that's a no-brainer. SMR drives are good for backups and media storage. This of course assumes the write loads are sanely designed, like not copying small files (MP3s etc.) one by one directly to the backup drive.
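As a sketch of what "sanely designed" means: bundling thousands of small files into a single archive first turns the workload into one large sequential write, which is exactly the pattern SMR drives handle well. The paths and helper name here are illustrative only.

```python
import pathlib
import tarfile

def bundle_for_smr(src_dir, archive_path):
    """Pack a directory of small files into one tar archive, so the SMR
    drive sees a single large sequential write instead of thousands of
    small random ones. Write archive_path directly onto the SMR drive."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(src_dir, arcname=pathlib.Path(src_dir).name)
```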
Ref: Shingled magnetic recording (SMR) & Input/output operations per second (IOPS) @ Wikipedia