Multilayered caching, asynchronous I/O, App Engine, Python, 9ox.net

Post date: Dec 17, 2011 2:03:25 PM

I finally managed to write this short description of how 9ox.net handles data caching on Google App Engine, and what that means for application efficiency and performance.

Here are the layers I'm using, from bottom to top: slowest first, with each layer adding caching on top of the previous one. By reducing datastore access and other slow tasks, the layers make the application both faster and more cost effective.

Datastore

Datastore transactions are slow and expensive, so it's important to avoid them whenever possible. Records in the datastore are read or updated only when it's actually required: access timestamps for URLs are updated at most once every 24 hours. There is no expiry time on datastore records; instead the application expires records based on their creation (1 year) and access (31 days) timestamps when it's looking for expired entries to replace. All datastore access is purely index-based, so there's no need to read full records just to check whether they contain the required content.
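As an illustration of the index-only expiry check and the throttled access timestamp, here is a minimal sketch. The UrlRecord model and its property names are hypothetical; the real 9ox.net schema isn't shown in this post.

    import datetime
    from google.appengine.ext import db

    class UrlRecord(db.Model):
        # Hypothetical schema, for illustration only.
        target = db.StringProperty()
        created = db.DateTimeProperty()   # record expires 1 year after creation
        accessed = db.DateTimeProperty()  # record expires 31 days after last access

    def find_expired_keys(limit=100):
        # Keys-only query: served straight from the index, no records are read.
        cutoff = datetime.datetime.utcnow() - datetime.timedelta(days=31)
        return UrlRecord.all(keys_only=True).filter('accessed <', cutoff).fetch(limit)

    def touch_access(record):
        # Update the access timestamp at most once per 24 hours to save writes.
        now = datetime.datetime.utcnow()
        if now - record.accessed > datetime.timedelta(hours=24):
            record.accessed = now
            record.put()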

Memcache

Expensive tasks are cached for 24 hours (86400 seconds) using memcache. I currently consider the following tasks slow: DNS lookups, WOT ratings, SURBL information, spam/flood protection data and all data in the datastore, including datastore lookup misses. Cache records are set to expire in 24 hours to force a cache refresh.
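A minimal sketch of the pattern with the App Engine memcache API; note the sentinel value, which lets lookup misses be cached too. The helper names and the WOT stub are my own, not actual 9ox.net code.

    from google.appengine.api import memcache

    CACHE_SECONDS = 86400   # 24 hours
    MISS = 'CACHE_MISS'     # sentinel so negative results get cached as well

    def cached(key, compute):
        # Return a cached value, computing and caching it on a miss.
        value = memcache.get(key)
        if value is not None:
            return None if value == MISS else value
        value = compute()
        memcache.set(key, MISS if value is None else value, time=CACHE_SECONDS)
        return value

    def fetch_wot_rating(host):
        # Placeholder for the real (slow) WOT API call.
        return None

    def get_wot_rating(host):
        return cached('wot:' + host, lambda: fetch_wot_rating(host))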

Instance memory store cache

Memcache is slower than in-instance memory, but instance memory isn't distributed. In this case instance memory is used to deal with the worst spikes. Instance memory is also cleared if the instance is shut down or restarted. I have implemented the instance store using exactly the same API as memcache. It's currently restricted to 64 megabytes of cached data (including keys), due to the instance memory limit of 128 megabytes. I decided to use a CAR (Clock with Adaptive Replacement) cache, because it's the most advanced yet still very lightweight caching solution. Now all the most used records are stored directly in application memory. The CAR implementation is also scan-resistant and self-tuning, so it's much better than an LRU cache in this case. ARC caching would have given practically the same performance, but it causes more memory updates, although at this scale the difference is almost meaningless.
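The shape of the idea, as a rough sketch: an in-process store exposing the same get/set signature as memcache, so it can sit transparently in front of it. The class below uses a crude eviction stand-in instead of real CAR logic, and a simple item cap instead of byte accounting; it's illustrative only.

    import time as _time

    class InstanceCache(object):
        # In-process cache with a memcache-compatible get/set interface.
        # Illustrative sketch: the real store evicts with CAR and counts the
        # memory used by keys and values against a 64 MB budget.

        def __init__(self, max_items=10000):
            self._data = {}  # key -> (expires_at, value)
            self._max = max_items

        def get(self, key):
            entry = self._data.get(key)
            if entry is None:
                return None
            expires_at, value = entry
            if expires_at and _time.time() > expires_at:
                del self._data[key]
                return None
            return value

        def set(self, key, value, time=0):
            # 'time' matches the memcache argument name (TTL in seconds).
            if key not in self._data and len(self._data) >= self._max:
                self._data.popitem()  # naive eviction; the real cache uses CAR
            self._data[key] = (_time.time() + time if time else 0, value)
            return True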

Front end cache

All redirect pages served by the application are public and can be cached for 86400 seconds = 24 hours. This public reverse cache drastically boosts efficiency, because most requests never even reach the application level. Unfortunately the front-end cache isn't available for free apps. Setting the cache headers still helps, though, because any public proxy along the way can cache the pages.
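Marking a response as publicly cacheable just means setting the right headers. A minimal webapp sketch of a redirect handler (the handler name, route and target resolution are made up):

    from google.appengine.ext import webapp
    from google.appengine.ext.webapp.util import run_wsgi_app

    class RedirectHandler(webapp.RequestHandler):
        def get(self, code):
            target = resolve_target(code)  # hypothetical lookup through the cache layers
            # 'public' + max-age lets the front-end cache and any intermediate
            # proxy serve this response for 24 hours without hitting the app.
            self.response.headers['Cache-Control'] = 'public, max-age=86400'
            self.redirect(target)

    def resolve_target(code):
        return 'http://example.com/'  # placeholder

    application = webapp.WSGIApplication([(r'/(\w+)', RedirectHandler)])

    if __name__ == '__main__':
        run_wsgi_app(application)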

Several caching layers, each making the system faster. Of course it would be possible to skip all of this and look the data up from the datastore on every request before serving it to the end user, but that would be much slower (about 100 times!) and therefore also more expensive.

Asynchronous I/O

I'm also using asynchronous I/O. Whenever the slow tasks mentioned above have to be executed, they are initiated asynchronously and the results are collected afterwards. I initiate all slow tasks in parallel immediately after checking all the caching layers (the datastore API, the URL Fetch API, etc.).
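A sketch of the pattern with the URL Fetch API: start all the RPCs first, then block on each result. The 5 second deadline and the error handling are my choices, not necessarily those of 9ox.net.

    from google.appengine.api import urlfetch

    def fetch_all(urls):
        # Start every fetch in parallel before waiting for any result.
        rpcs = []
        for url in urls:
            rpc = urlfetch.create_rpc(deadline=5)
            urlfetch.make_fetch_call(rpc, url)
            rpcs.append((url, rpc))

        results = {}
        for url, rpc in rpcs:
            try:
                results[url] = rpc.get_result()  # blocks only on this one RPC
            except urlfetch.Error:
                results[url] = None
        return results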

I'm also using eventual consistency when reading data from the database to refresh caches. Eventually consistent reads allow data to be read from a database slave in case the master server isn't available, so delays can be avoided.
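With the old db API an eventually consistent read looks like this (the deadline value is my own choice):

    from google.appengine.ext import db

    def read_for_cache_refresh(key):
        # read_policy=db.EVENTUAL_CONSISTENCY allows the read to be served by
        # a slave replica, so it won't block if the master is unavailable.
        return db.get(key, read_policy=db.EVENTUAL_CONSISTENCY, deadline=5)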

Database concurrency handling

The application uses a sliding allocation window to deal with sudden load spikes. My first implementation had a severe problem: records weren't allocated randomly. That caused a bottleneck as soon as several threads were updating the database constantly. To fix the issue I added a sliding window of 100 records: now 100 suitable records are looked up and only one of them is used. The same method is used when allocating new records, except that there has to be a solid block of three contiguous records at the bottom. The allocation probability is weighted so that records with a lower id are slightly more likely to be allocated than records with a higher id. This relative probability should prevent the situation where all threads are forced to fight over a single record because no other records can be updated before the sliding window moves forward.
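A simplified sketch of the allocation step: fetch up to 100 candidate keys and pick one at random, with linearly decreasing weights so lower ids are somewhat more likely. The exact weighting used by 9ox.net isn't described in this post, so the numbers here are illustrative.

    import random

    WINDOW = 100

    def pick_candidate(keys_query):
        # keys_query is assumed to be a keys-only query ordered by id,
        # e.g. in the style of find_expired_keys() above.
        candidates = keys_query.fetch(WINDOW)
        if not candidates:
            return None
        # Linearly decreasing weights: the lowest-id candidate is roughly
        # twice as likely to be picked as the highest-id one.
        n = len(candidates)
        weights = [2.0 * n - i for i in range(n)]
        r = random.uniform(0, sum(weights))
        for key, weight in zip(candidates, weights):
            r -= weight
            if r <= 0:
                return key
        return candidates[-1]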

Further development

When I'm in the right mood, I'll upgrade the UI to HTML5 standards and implement the Google Safe Browsing API.

Remarks

I'm well aware that Python 2.7 would allow me to use multithreading, which would help a lot with this kind of lightweight application. I'll be implementing it as soon as 2.7 is out of experimental status. The tips about using eventually consistent reads are only valid with the old Master/Slave datastore; the HRD (High Replication Datastore) has a different ruleset. My application already checks for stale index entries, even though those should be extremely rare with the M/S datastore.