How to improve SQL synchronization queue table performance

Post date: Dec 16, 2014 4:28:17 PM

As far as I can see, I would prefer a shared queue or a limited queue instead of per-client individual queues. This strategy can be used for anything whose data is shared with multiple users and requires synchronization.

The first things I usually look at when optimizing are:

  1. Amount of data
  2. Size of indexes
  3. How often the data is being accessed

Repeated reads and writes to data with huge indexes are especially bad for the server. Data which isn't indexed at all and triggers a full table scan is even worse, but hopefully nobody writes code that bad.

The goal of all this is to minimize the load on the server. I've often seen applications with large queues generate significant load on the server even when everything is idle. That load comes from repeated checks against large tables without indexes, or with indexes that are too large. In these cases the data should be partitioned better, or the whole queueing strategy should be changed.
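
As a minimal illustration of why this matters, here's a sketch using Python's built-in sqlite3 module and a hypothetical queue table. EXPLAIN QUERY PLAN shows whether each polling query is a full table scan or a cheap index lookup:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (client_id INTEGER, message_id INTEGER)")

# Without an index, every poll scans the entire table.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT message_id FROM queue WHERE client_id = ?", (42,)
).fetchall()
print(plan)  # typically reports a full scan of "queue"

# With an index, the same poll becomes a cheap index lookup.
con.execute("CREATE INDEX idx_queue_client ON queue (client_id)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT message_id FROM queue WHERE client_id = ?", (42,)
).fetchall()
print(plan)  # typically reports a search using idx_queue_client
```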

How can we reduce the amount of data? Deduplication, aka normalizing the data. In some rare cases this is a bad idea, because the required joins can be even worse for the server than just scanning huge tables. But usually it's a good idea, especially if we can separate the "small queue" from the large data it refers to.
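
Here's a minimal sketch of that separation, again with Python's sqlite3; the schema and names are made up for illustration. The large payload is stored once, and the queue rows are just small references to it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- The large payload is stored exactly once.
CREATE TABLE message (
    id      INTEGER PRIMARY KEY,
    payload BLOB NOT NULL
);
-- The queue holds only small references: who should receive which message.
CREATE TABLE queue (
    client_id  INTEGER NOT NULL,
    message_id INTEGER NOT NULL REFERENCES message(id)
);
""")

# One 1 MB payload...
con.execute("INSERT INTO message (id, payload) VALUES (1, ?)", (b"x" * 2**20,))
# ...and 100 tiny queue rows pointing at it, instead of 100 full copies.
con.executemany(
    "INSERT INTO queue (client_id, message_id) VALUES (?, 1)",
    [(c,) for c in range(100)],
)
```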

Some queue systems store the same data for every client in data structures without deduplication. If the queues are memory based, this can cause the system to run out of memory or trigger memory page swapping, aka excessive swapping, aka disk thrashing.

  1. How do we limit the amount of data in the queue? The first solution is to stop queueing for "dead clients". At some point the change log for those clients is cleared, and a simple message tells them to refresh everything because the change log queue is no longer available for them. This procedure usually gets rid of the worst offenders, the clients that cause most of the queue table bloat: the ones that never actually fetch anything from the queue.
  2. Then we can improve this solution by separating the queue and the data. Now we have only one queue table, which tells what should be sent where, and the data is stored separately. Only the queue table entries are replicated per recipient. So instead of queueing 100x 1 MB messages, we store something like 100x 24 bytes + 1x 1 MB message (as in the normalization sketch above).
  3. If step 4 isn't implemented, this solution can be further improved with a per-client status flag in a separate status table, which tells whether there's anything in the queue table for that client. This is again an optimization that makes writes heavier but constant reads much lighter: instead of scanning the queue table index on every polling request, we can simply check this smaller table. Of course, whenever the queue table is updated this status table also needs to be updated, making updates a bit slower.
  4. Improve the queue table so that it has a monotonically increasing ID counter. This is a very simple and RESTful way to track, for each client, what should be refreshed. Because we have this ID, it can be used as a pointer (see the sketch after this list). This also allows us to form groups, because the server no longer needs to know which data has been fetched by which clients. Groups can be things like all clients, or all clients with a certain status flag.
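
To make step 4 concrete, here's a minimal sketch of the ID-pointer polling, using Python's built-in sqlite3 module. The table, column, and function names are my own for illustration, not from any particular product:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE queue (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonically increasing counter
    group_id   INTEGER NOT NULL,                   -- e.g. all clients, or one status group
    message_id INTEGER NOT NULL                    -- reference to the deduplicated data
);
CREATE INDEX idx_queue_group ON queue (group_id, id);
""")

# Server side: enqueue a change once for the whole group.
con.execute("INSERT INTO queue (group_id, message_id) VALUES (1, 42)")

def poll(last_seen_id, group_id):
    """The client sends the last queue ID it has seen; the server returns
    only newer entries and keeps no per-client state at all."""
    return con.execute(
        "SELECT id, message_id FROM queue WHERE group_id = ? AND id > ? ORDER BY id",
        (group_id, last_seen_id),
    ).fetchall()

# Client side: poll with the pointer, then advance it.
last_seen = 0
for queue_id, message_id in poll(last_seen, group_id=1):
    last_seen = queue_id  # fetch message_id's payload, then move the pointer
```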

Before these changes the queue system was really taxing the servers, with database tables multiple gigabytes in size and being accessed all the time.

After these changes, the queue system became virtually free to run. It now consumes 1000x less disk space, and the server basically doesn't even need to know about all the clients: the implementation is fully RESTful, so no state needs to be kept on the server side. All the frequent polls hit the same portion of the queue table instead of triggering separate disk accesses to fetch per-client data, which also minimizes the amount of data that needs to stay cached in RAM for performance reasons.

Depending on the database server and its data structures, this method can also be implemented so that the queue table is a circular log structure, where older records are rewritten with newer ones. This prevents data fragmentation very efficiently, because the queue always stays a clean, continuous structure on disk as well. But this is a complex topic and requires detailed knowledge of the internals of the database system being used. In this case, the circular logging worked beautifully. As a counterexample, SQLite3 uses dynamic record sizes, and replacing an older record with a slightly larger one can lead to fragmentation. Some databases also provide special queue-type tables, but you can abuse those into bad results too if you don't consider these matters when designing the solution.
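
Here's a hedged sketch of the circular idea with SQLite3, keeping the payload fixed-size so its dynamic record sizes don't cause the fragmentation mentioned above; the slot count and names are my own assumptions:

```python
import sqlite3

SLOTS = 4096  # fixed ring size; the oldest entries get overwritten in place

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE ring (
    slot    INTEGER PRIMARY KEY,  -- 0 .. SLOTS-1, preallocated once
    seq     INTEGER NOT NULL,     -- monotonically increasing queue ID
    payload BLOB NOT NULL         -- keep this fixed-size to avoid fragmentation
)
""")
# Preallocate every slot, so later writes are pure in-place updates.
con.executemany(
    "INSERT INTO ring (slot, seq, payload) VALUES (?, 0, zeroblob(24))",
    [(s,) for s in range(SLOTS)],
)

def enqueue(seq, payload):
    """Newer records overwrite the oldest slot; payload must stay 24 bytes."""
    con.execute(
        "UPDATE ring SET seq = ?, payload = ? WHERE slot = ?",
        (seq, payload, seq % SLOTS),
    )
```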

I'm also very happy to receive any feedback; you'll find my contact info on the home page.

-Thanks