Google App Engine (GAE) Gotchas and How To Avoid Them

Post date: Jan 13, 2013 8:49:18 AM

I cover here a few common gotchas which snag new developers on the Google App Engine PaaS platform.

Limited processing time

Every request gets a strictly limited amount of processing time. I have seen many web apps run user requests lasting tens of minutes or even hours; that is a very bad approach. Return user requests quickly and, if required, poll for the results later. Also see Task Queues and Database update frequency limitations.

Can't write to local file system

Even though files uploaded with the service are accessible, you can't write anything to the local file system. You have to use alternative storage such as Google Cloud Storage, the Google App Engine Blobstore or something else. If you're used to writing local temp files or modifying static files, that won't work anymore.
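
As a minimal sketch, writing a file to Google Cloud Storage with the App Engine GCS client library (appengine-gcs-client) could look roughly like this; the bucket and object names are placeholders:

# Sketch: writing to Google Cloud Storage instead of the local file system.
# Assumes the appengine-gcs-client library is bundled with the app;
# the bucket and object names below are hypothetical.
import cloudstorage as gcs

def write_report(data):
    filename = '/my-example-bucket/reports/daily.txt'
    gcs_file = gcs.open(filename, 'w', content_type='text/plain')
    gcs_file.write(data)
    gcs_file.close()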

No database locks

The Google High Replication Datastore does not provide any kind of database locking; it provides only transactions. Lock-free operation is one way to make sure that no process is holding locks and blocking the application's global progress. It simply means that your transaction either succeeds or fails; you can't do traditional locking. This is also beneficial, because no single process can lock the database for extended periods of time. I give every entity a version id, and in most cases I run most of the processing outside the transaction: read the data, process it, then start a transaction, check the data version and commit. If the data version is no longer what you started with, restart the process. To update data in multiple entity groups, use cross-group (XG) transactions.
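
A minimal sketch of that read-process-verify pattern with the ndb API; the model, the version property and the helper names are made up for illustration:

from google.appengine.ext import ndb

# Sketch of optimistic concurrency on the HRD: do the heavy processing
# outside the transaction, then re-check a version id inside it.
class Document(ndb.Model):
    body = ndb.TextProperty()
    version = ndb.IntegerProperty(default=0)

def expensive_processing(body):
    return body.upper()  # placeholder for the real work

@ndb.transactional
def _commit_if_unchanged(key, expected_version, new_body):
    doc = key.get()
    if doc.version != expected_version:
        return False               # someone else updated it, caller retries
    doc.body = new_body
    doc.version += 1
    doc.put()
    return True

def update_document(key):
    while True:
        doc = key.get()            # read outside the transaction
        new_body = expensive_processing(doc.body)
        if _commit_if_unchanged(key, doc.version, new_body):
            return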

Database index latency

Database indexes (across separate entity groups / models) are not guaranteed to be up to date. This means that whenever you run a query, you have to check the values again when processing the data: if you ask for records where x=1, the query might also return records where x!=1, so verify each record before actually processing it. This also means that you can't run cross-entity-group queries in transactions. To avoid this problem, store data in larger entity groups and use ancestor queries. That strategy has its own drawbacks, which are covered a bit later.
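
As a small illustration of that double check, assuming a hypothetical ndb model with an integer property x:

from google.appengine.ext import ndb

# Sketch: results of an eventually consistent (non-ancestor) query are
# re-checked before processing. Model and property names are made up.
class Record(ndb.Model):
    x = ndb.IntegerProperty()

def handle(record):
    pass  # placeholder for the real processing

def process_matching_records():
    for record in Record.query(Record.x == 1):
        if record.x != 1:
            continue      # the index was stale, skip this record
        handle(record)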

Ancestor queries

Ancestor queries are queries that are run within a single entity group and can therefore be run inside a transaction. These queries do not return stale data. The main problem with using larger entity groups is that the entity group is the atomic processing unit of the datastore (see the update frequency limitations below).
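
A short sketch of an ancestor query inside a transaction, with made-up model and key names:

from google.appengine.ext import ndb

# Sketch: all of a user's tasks live in one entity group under the user
# key, so the ancestor query below is strongly consistent and can run
# inside a transaction.
class Task(ndb.Model):
    done = ndb.BooleanProperty(default=False)

@ndb.transactional
def count_open_tasks(user_key):
    return Task.query(Task.done == False, ancestor=user_key).count()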

Database update frequency limitations

Because the HRD datastore is distributed across several data centers, there are internal latencies within the datastore. In practice this means that each entity group can only be updated about once per second. If you have one entity group containing a single item such as visitor_counter and you try to increment it on every page load, it is going to fail for sure: all your processes are stuck in "run in transaction" mode, which by default is retried three times if the transaction fails, and because roughly only one of the parallel tasks can successfully commit that transaction per second, all the others are doomed to fail. To avoid this problem you need to use sharding. If transactions keep failing, you simply add a new shard to the counter, so you end up with visitor_counter_# entities where # is the shard number. When you need to update the visitor counter, you update a random shard in a transaction. Whenever you need to read the counter, you read all entities from the model and sum them. For better results you can cache the sum in memcache for one second, or a bit longer if your approach allows that. Whenever run_in_transaction fails, meaning three failed update attempts, just add a new entity group (shard) to the model and your app is once again able to handle more traffic. Not handling this case properly is very common and a sure way to fail.
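
A minimal sketch of such a sharded counter with ndb; the model, the visitor_counter_# naming scheme and the fixed shard count are assumptions, not a drop-in implementation:

import random
from google.appengine.api import memcache
from google.appengine.ext import ndb

NUM_SHARDS = 20  # grow this when transactions start failing

class CounterShard(ndb.Model):
    count = ndb.IntegerProperty(default=0)

@ndb.transactional
def increment_visitor_counter():
    # Update one random shard per request so writes are spread out.
    shard_id = 'visitor_counter_%d' % random.randint(0, NUM_SHARDS - 1)
    shard = CounterShard.get_by_id(shard_id)
    if shard is None:
        shard = CounterShard(id=shard_id)
    shard.count += 1
    shard.put()

def read_visitor_counter():
    # Sum all shards; cache the result briefly to avoid re-reading them.
    total = memcache.get('visitor_counter_total')
    if total is None:
        total = sum(s.count for s in CounterShard.query())
        memcache.set('visitor_counter_total', total, time=1)
    return total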

Query result limit

Any query will return a maximum of 1000 records, so you might need to repeat the query several times. Using an offset is problematic because non-ancestor queries can't be run inside a transaction, so you might miss a few records or get the same records several times if entries are inserted or deleted during processing.
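
One way to repeat the query in batches without offsets is to use datastore query cursors; a rough sketch with a made-up model (note that this still does not make the overall result set transactional):

from google.appengine.ext import ndb

# Sketch: fetching a large result set in batches with query cursors
# instead of offsets. Model name and batch size are illustrative.
class LogEntry(ndb.Model):
    message = ndb.StringProperty()

def fetch_all_entries(batch_size=1000):
    query = LogEntry.query()
    results, cursor, more = query.fetch_page(batch_size)
    while more:
        page, cursor, more = query.fetch_page(batch_size, start_cursor=cursor)
        results.extend(page)
    return results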

Inequality filter limitations

You can't have inequality filters on more than one property in a single query. Queries like select * from mytable where property_a>1 and property_b<100 order by property_c; simply can't be done. Composite indexes can solve some related problems, but usually it's not as simple as it is with most SQL databases.
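
A common workaround, sketched here with hypothetical model and property names, is to apply one inequality filter in the query and the remaining condition in application code:

from google.appengine.ext import ndb

# Sketch: only property_a gets an inequality filter in the query; the
# property_b condition is applied in code. Names are made up.
class MyTable(ndb.Model):
    property_a = ndb.IntegerProperty()
    property_b = ndb.IntegerProperty()
    property_c = ndb.IntegerProperty()

def find_rows():
    # With an inequality filter the first sort order must be on that same
    # property, so any ordering by property_c has to happen in code.
    query = MyTable.query(MyTable.property_a > 1).order(MyTable.property_a)
    return [row for row in query if row.property_b < 100]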

Task Queues

If possible, do not update all data during the client request. Record only the absolute minimum required to execute the task successfully later, then spool the rest of the updates and processing to a task queue. This allows user requests to return quickly.
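
A minimal sketch of that hand-off with the taskqueue API; the worker URL, parameter names and handlers are placeholders:

import webapp2
from google.appengine.api import taskqueue

# Sketch: the user-facing handler only enqueues work and returns; a
# separate worker handler does the heavy processing later.
def process_order(order_id):
    pass  # placeholder for the slow processing

class SubmitHandler(webapp2.RequestHandler):
    def post(self):
        order_id = self.request.get('order_id')
        taskqueue.add(url='/tasks/process_order',
                      params={'order_id': order_id})
        self.response.write('queued')  # return to the user immediately

class ProcessOrderHandler(webapp2.RequestHandler):
    def post(self):
        process_order(self.request.get('order_id'))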

Slow database access

The High Replication Datastore is "really slow" compared to traditional local data stores. Therefore it's really important to cache data and avoid traditional database normalization. Whenever possible you should get all data from the cache, or in the worst case pick it up by running just one query and reading a few records. See caching.
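
A rough sketch of the usual read-through pattern for entity data with memcache; the model, cache key and expiry time are assumptions:

from google.appengine.api import memcache
from google.appengine.ext import ndb

# Sketch: read-through caching of datastore data in memcache.
class Profile(ndb.Model):
    display_name = ndb.StringProperty()

def get_display_name(user_id):
    cache_key = 'display_name:%s' % user_id
    name = memcache.get(cache_key)
    if name is None:
        profile = Profile.get_by_id(user_id)
        name = profile.display_name if profile else ''
        memcache.set(cache_key, name, time=60)  # cache for a minute
    return name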

Caching data & output

If possible, store the full output of the request in memcache, so that no output processing is required at all. Simply check whether the page URL is in the cache and return the stored result. This is a very beneficial method, especially for public pages, even if the output is only cached for a short time interval. Also don't forget to utilize the browser cache.
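
A minimal sketch of such full-output caching in a webapp2 handler, keyed by the request path; the render function and the 30 second expiry are assumptions:

import webapp2
from google.appengine.api import memcache

def render_page(path):
    return '<html><body>Rendered %s</body></html>' % path  # placeholder

# Sketch: cache the fully rendered page body in memcache, keyed by URL,
# and let the browser cache it as well.
class CachedPageHandler(webapp2.RequestHandler):
    def get(self):
        cache_key = 'page:%s' % self.request.path
        body = memcache.get(cache_key)
        if body is None:
            body = render_page(self.request.path)
            memcache.set(cache_key, body, time=30)
        self.response.headers['Cache-Control'] = 'public, max-age=30'
        self.response.write(body)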

See my older post about Google App Engine and Caching.

Vendor lock-in

Beware of the vendor lock-in trap. When you create something on Google App Engine, always use your own (or some other) abstraction layer (API) between your application and the actual Google APIs. This allows you to use alternative services for data and communication if and when required. Without this abstraction layer you need to modify almost all of your code if you stop running your services on the Google Cloud platform. With the layer, you only need to modify the layer itself, for example to start using an SQL database instead of the Google High Replication Datastore.
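
A tiny sketch of what such a layer might look like; the interface and class names are entirely made up:

from google.appengine.ext import ndb

# Sketch: the application codes against a small storage interface rather
# than the datastore API directly, so the backend can be swapped later.
class KeyValueStore(object):
    def get(self, key):
        raise NotImplementedError

    def set(self, key, value):
        raise NotImplementedError

class _KVEntry(ndb.Model):
    value = ndb.TextProperty()

class DatastoreStore(KeyValueStore):
    # HRD-backed implementation; an SQL-backed class with the same
    # interface could replace it without touching application code.
    def get(self, key):
        entry = _KVEntry.get_by_id(key)
        return entry.value if entry else None

    def set(self, key, value):
        _KVEntry(id=key, value=value).put()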

Scalability, performance, reliability

There are also inherent benefits to the Google App Engine solution: it allows virtually unlimited scalability on demand without any changes, provided that your app is designed and implemented correctly. This means it must not create unnecessary bottlenecks, like updating the same record on every request, which is a 100% sure way to cause a failure. Another benefit is that your servers and platform are run by excellent people.

I personally would select Google App Engine especially for fairly simple programs that require a reliable database, no data loss after commits and high scalability on demand. Workloads with a continuous, fixed capacity are cheaper to run elsewhere.

kw: appengine, developer, developing, programming, python, java, platform, PaaS, cloud service, cloud platform, data, database, Google App Engine (GAE) Gotchas and How To Avoid Them
