posted Oct 26, 2013, 9:39 AM by Sami Lehtinen
updated Jan 5, 2014, 11:39 PM
Here's something I've been working on & studying lately.
(Btw. most of this stuff is from my backlog, which is now about 6 months long.)
- Played with a new computer, Python 3 & multiprocessing, and different implementations of my ERP/ETL solutions. A few wrong data types, and locks related to those, are enough to completely ruin multicore performance. But that's nothing new at all. Also studied Python 3.4 asynchronous I/O from excellent PyCon 2013 documentation, which unfortunately isn't available anymore.
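As a tiny illustration of the lock angle (a sketch of my own, not code from my ERP/ETL work): passing plain, pickleable values to pure worker functions lets every core run freely, whereas shared objects guarded by locks serialize the workers again.

```python
from multiprocessing import Pool

def square(n):
    # Pure function: no shared state, no locks,
    # so worker processes never wait on each other.
    return n * n

if __name__ == "__main__":
    with Pool(4) as pool:
        total = sum(pool.map(square, range(1000)))
    print(total)  # prints 332833500
```

The moment workers start updating a shared, locked structure instead, they spend their time queuing for the lock and the multicore benefit evaporates.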
- SSD related stuff:
- Played with Ubuntu dm-cache on an old computer with a small setup (64 GB SSD + 500 GB HDD).
- Nice article about the anatomy of an SSD drive.
- Another great post about flash card design.
- Studied OCZ's super fast HSDL interface.
- Studied the coding theory of BCH codes. Btw. that's not light reading. ;) But it's very relevant to SSD drives and flash bit errors. Related topics: LDPC (low-density parity-check), FTL (flash translation layer), LFS (log-structured file system), Linux I/O schedulers (deadline, NOOP, CFQ), Linux disk I/O re-ordering, elevator. Interestingly, I did write a few posts about this on a Linux forum, but I didn't blog those.
- Shannon's theorem
- Turbo codes
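BCH codes are a generalization of Hamming codes, so as a much lighter warm-up, here's a Hamming(7,4) sketch of my own that corrects any single flipped bit. Real SSD controllers use far stronger BCH/LDPC codes; the function names are mine.

```python
def hamming74_encode(d):
    # d: four data bits; returns the 7-bit codeword
    # [p1, p2, d1, p3, d2, d3, d4] with even-parity checks.
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    # Recompute the three parity checks; the syndrome bits spell out
    # the 1-based position of a single-bit error (0 = no error).
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # covers positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # covers positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # covers positions 4,5,6,7
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1              # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]  # extract the data bits
```

Flash ECC does the same job at scale: store parity alongside data, and when a cell reads back wrong, the syndrome points at which bit to fix.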
- Reading all this error correction code stuff reminded me once again about modem times: v.42bis, LAPM/SREJ (selective reject) etc. I also hated regular Zmodem's go-back-N ARQ, which always made the transfer stop for a while, because redundant data was sent again due to buffering. gsz/dsz MobyTurbo Zmodem was much better: it didn't pause when it encountered an error, had smaller overhead, and supported smaller block sizes for noisy lines.
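A deliberately simplified model (my own sketch, not Zmodem code) of why go-back-N hurts: each lost frame forces the whole in-flight window to be resent, while selective reject resends just the one missing frame.

```python
def count_transmissions(n_frames, lost_once, window, selective):
    """Toy ARQ overhead model: each frame in lost_once is lost on its
    first transmission only. Go-back-N resends the lost frame plus every
    frame already sent behind it (up to the window size); selective
    reject resends only the lost frame itself."""
    sends = n_frames  # every frame goes out at least once
    for f in sorted(lost_once):
        if selective:
            sends += 1
        else:
            sends += min(window, n_frames - f)
    return sends
```

With 100 frames, two losses and a window of 8, this model gives 116 transmissions for go-back-N versus 102 for selective reject — the extra 14 frames are exactly the redundant data that made plain Zmodem stall.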
- Studied Finnish regulation about virtual currencies.
- Studied The Design Of SQLite4.
- Studied difference between Blue Ocean and Red Ocean strategies. Nothing new really, but reminds you what you really need to focus on.
- Found out the hard way that Python shelve is totally useless with a larger number of keys. It's not a database, it's just a datastore for small amounts of data. Write performance gets super poor after about a million keys or so. I used SQLite3 instead. With quite small data sets, I also use configparser instead of SQLite3. Why? Because files written by configparser are directly human readable and editing data in those files is trivial.
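A minimal sketch of the kind of swap I mean: a dict-like wrapper over SQLite3 (the class name and schema are my own invention, not the code I actually used).

```python
import sqlite3

class SQLiteKV:
    """Dict-style key-value store on top of SQLite3: shelve-like API,
    but the B-tree index keeps writes sane at millions of keys."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def __setitem__(self, k, v):
        self.conn.execute(
            "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (k, v))

    def __getitem__(self, k):
        row = self.conn.execute(
            "SELECT v FROM kv WHERE k = ?", (k,)).fetchone()
        if row is None:
            raise KeyError(k)
        return row[0]

    def commit(self):
        self.conn.commit()
```

Batching many `__setitem__` calls per `commit()` is what makes the difference in bulk loads; committing per key throws the performance away again.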
- Played with entity group metadata. (Google App Engine Datastore)
It allows an easy and reliable way to check the entity group version. This is an especially
good way to check the version before writing. I prefer to run longer
running tasks outside the transaction:
A) read data
B) process it
C) start transaction
D) check version
When running your own functions which read and write data, and fail if the data has been modified (version changed during processing), it's very important to try the commit with zero retries. Otherwise you'll actually try to commit three times and it'll fail every time anyway, because the data is still changed compared to the original version.
db.run_in_transaction_custom_retries(0, decrement, counter.key())
Also check the entity group version. Of course it's optimal, where possible, to reread the record if it has been modified, check whether the field you're updating has changed, and if so just update the suitable parts of the record without reprocessing and write it back. Another way to deal with this is to simply accept that the object has been updated and restart the whole process from the beginning. But this can be very problematic with long running transactions and optimistic concurrency control -> long running parallel tasks usually fail if you can't update the data quickly enough after reading it.
- Google Cloud Storage API
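The read/process/transact/check pattern above can be sketched generically — plain SQLite3 with a version column here, not App Engine code; the table and function names are mine:

```python
import sqlite3

def optimistic_update(conn, key, process):
    # A) read the value and its version outside any transaction
    value, version = conn.execute(
        "SELECT value, version FROM items WHERE key = ?", (key,)).fetchone()
    # B) process it (possibly slowly), still outside the transaction
    new_value = process(value)
    # C) start the transaction, D) re-check the version before writing
    with conn:
        current = conn.execute(
            "SELECT version FROM items WHERE key = ?", (key,)).fetchone()[0]
        if current != version:
            return False  # modified meanwhile: give up, zero retries
        conn.execute(
            "UPDATE items SET value = ?, version = version + 1 "
            "WHERE key = ?", (new_value, key))
    return True
```

Returning False immediately instead of retrying is the point made above: retrying the same commit against a version that has already moved on just fails again.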
- Used about 6 hours to read through the Redis documentation. No, I didn't try it. I wanted to read the documentation to understand the internals and how it works. Well, it's a memory database which writes backups to disk. Nothing especially interesting there. It's a much simpler approach than most other databases I've been working with. More like a dictionary dump plus a change log: when the change log grows large enough, the latest dictionary is dumped again and changes are recorded after that. Yes, it's useful in some situations and totally unsuitable for others. (RDB, AOF)
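The dump + change-log idea, as I understood it from the docs, in a toy in-memory sketch (my own names and thresholds; obviously not real Redis internals):

```python
class TinyStore:
    """Toy snapshot + change-log persistence in the RDB/AOF spirit:
    keep a full dump plus the changes since, and rewrite the dump
    whenever the log grows too long."""

    def __init__(self, rewrite_at=4):
        self.data = {}        # live in-memory dictionary
        self.snapshot = {}    # last full dump (RDB-style)
        self.log = []         # changes since the dump (AOF-style)
        self.rewrite_at = rewrite_at

    def set(self, k, v):
        self.data[k] = v
        self.log.append((k, v))
        if len(self.log) >= self.rewrite_at:
            # Log got long: dump the latest dictionary, truncate the log.
            self.snapshot = dict(self.data)
            self.log = []

    def recover(self):
        # Crash recovery: replay the snapshot plus the logged changes.
        state = dict(self.snapshot)
        for k, v in self.log:
            state[k] = v
        return state
```

Recovery cost stays bounded because the log is never allowed to grow without a fresh dump behind it.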
- Studied InterBase and Firebird Multi-Generational Architecture. Those (sweeping, sweep, garbage collection) were a bit different from the MVCC being used with PostgreSQL (postgres).
- If you didn't know, SQLite3 PRAGMA auto_vacuum doesn't actually invoke VACUUM; it's a totally different way of dealing with things. Also it doesn't reduce database fragmentation, and can actually make it much worse. So even if you're using auto_vacuum, it's still a good idea to run an actual VACUUM on the database every now and then.
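A quick sketch of what I mean (an in-memory database here for simplicity; use a file path for a real one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA auto_vacuum = FULL")  # must be set before any tables exist
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, blob TEXT)")
conn.executemany("INSERT INTO t (blob) VALUES (?)",
                 [("x" * 1000,) for _ in range(1000)])
conn.commit()
# Deleting every other row frees pages (auto_vacuum hands them back),
# but the surviving rows stay scattered across the file.
conn.execute("DELETE FROM t WHERE id % 2 = 0")
conn.commit()
conn.execute("VACUUM")  # rebuilds the whole database file compactly
```

auto_vacuum only truncates freed pages off the end of the file by moving pages around, which is exactly how it can worsen fragmentation; VACUUM rewrites everything in order.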
- Studied a quite nice NoSQL article. You should read it too.
I think I blogged less than I added to my blog-about list during the last week. But so what, here's something interesting I've been doing, and there will be more posts in the future. If you have any questions, don't hesitate to contact me. - Thanks!