Cloudflare CDN + API, Self-hosting, Technology, Thunderbird, NTFS Extents Write-Amplification

Post date: Jan 9, 2017 3:31:56 AM

Had a long discussion with a friend who's running a tech startup about whether it's better for clients to connect directly to the server's API or for API requests to pass through a CDN. That's a very good question, and I'm afraid there's no simple or right answer to it. So many factors affect it. But here's my reasoning:

  • One project I've been developing provides a JSON RESTful API. After long testing and experimenting I decided to pass the API traffic through the Cloudflare CDN.
  • The main reason for that is DDoS protection, and the fact that API latency doesn't really matter here. The API is low traffic and it's bots that call it, so there isn't even a user waiting for an instant interactive response.

Other factors which also affect this discussion are:

  • HTTP/2 (H2) - It depends on how the API is used: whether clients are utilizing H2 and efficiently reusing existing connections. If the API users are mostly bots and other backend systems, those can keep a few H2 connections open all the time and pass asynchronous traffic over them. In this case it's probable that a CDN will make performance worse, since it adds an extra hop without saving any connection setup.
  • Mobile Devices & Packet Loss - On mobile devices, high latency and possible packet loss turn the tables. Now it's important that potential packet loss is detected quickly, and using a CDN will probably make the end user experience better. It's also a great question how smart the CDN is. If the CDN maintains a few H2 connections to the origin server and aggregates traffic from clients over them, then the CDN can provide a great benefit in overall latency and experience. But this is an extremely complex question and probably requires a lot of experimenting and fine tuning.
  • API response cacheability - Are the API calls / responses such that they can be cached? If so, a CDN should provide great benefits. Let's say we have a service which gives you the Bitcoin exchange rate, updated every minute or so. All the calls from clients to the API during that minute can get the cached response, that's great.
  • TLS / SSL - It's nice if the CDN can handle the TLS/SSL connection, which requires a few round trips, at least with older implementations. I think the zero round trip implementation is coming, but it only helps with repeated connections and with clients which are smart enough to utilize it. The problem with these optimizations is that many of them are very complex, which means that many people won't be using them. It's like any other complex optimization: it gets done when it's absolutely required. Doing it from the very beginning can divert resources from more important things and is just premature optimization. Of course, using a suitable library which handles all the complexity seamlessly can help.
  • DDoS protection - If you're trusting the CDN for DDoS protection and don't want to reveal key subsystem IP addresses, then passing through the CDN is a great idea. Without a CDN, even if you use a load balancer outside the backend's address space, a DDoS will at least take down the API, even if the rest of the site remains running.
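
To make the cacheability point concrete, here's a minimal Python sketch of the exchange rate example. Everything in it is hypothetical and only for illustration: `EdgeCache` is a toy stand-in for a CDN edge, not how Cloudflare or any other CDN is actually implemented, and `origin_headers` just shows the kind of standard `Cache-Control` header an origin could send to allow one minute of caching. Whether a given CDN actually caches API responses based on that header depends on its configuration.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy stand-in for a CDN edge cache (hypothetical, illustration only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self.entries: Dict[str, Tuple[float, str]] = {}  # path -> (stored_at, body)
        self.origin_hits = 0  # how many requests actually reached the origin

    def get(self, path: str, fetch_from_origin: Callable[[], str]) -> str:
        now = self.clock()
        entry = self.entries.get(path)
        # Serve from cache while the stored response is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]
        # Cache miss or expired entry: go to the origin and store the result.
        self.origin_hits += 1
        body = fetch_from_origin()
        self.entries[path] = (now, body)
        return body


def origin_headers(max_age_seconds: int = 60) -> Dict[str, str]:
    """Response headers an origin could send to permit shared caching."""
    return {
        "Content-Type": "application/json",
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
```

With a 60 second TTL, any number of client calls within the same minute result in a single request to the origin. The next call after the TTL expires triggers one more, and so on.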

AFAIK these are the most important points we discussed. I'm sure there are plenty of other aspects. This is a deep topic.

Other stuff gathered into this post:

  • A friend's self-hosted email server had extensive downtime. Yeah, that happens. That's one of the reasons why I actually stopped running my own mail server, even if I naturally could. As we all know, it breaks down just when you are unavailable to do anything about it, even if you normally could react swiftly and get it fixed in no time, or even completely reinstalled in a few hours.
  • Some technology seems to go forward. But some doesn't. It seems that new washing machines generally have much louder electric motors than the old ones. It's kind of annoying.
  • Thunderbird, bad code, bad code. It seems that the IMAP message downloading code is broken. It hangs forever while fetching a message from a mail folder. I've been wondering why some messages are downloaded in real time, even though I've got the download option enabled. The reason seems to be that the downloader isn't working. I tried to look in the configuration for a downloader timeout, but there is no such option. Great, just great. So the message downloader will remain stuck until Thunderbird is restarted. - Seems to confirm the software industry norm: broken software is the norm. If something actually works, it's a rare exception, and you've probably just been very lucky.
  • Also noticed that this blog post entry is now over 4k bytes long and is being stored in two separate NTFS extents. So when I save this file, with write amplification, at least 16 megabytes gets written to the flash. Neat. No wonder it takes a while. No, size alone doesn't cause it to be stored like that. It's just that on this file system, with this file and the current allocation situation, it happens to be split.
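
The write amplification arithmetic can be sketched roughly like this. Note the assumptions: the post doesn't state the flash erase block size, so the 8 MiB used here is a guess picked only because two such blocks add up to the 16 megabytes mentioned, and the model is a worst case where each extent sits in its own erase block and the whole block gets rewritten. The function names are made up for this example.

```python
MIB = 1024 * 1024


def flash_bytes_written(extents: int, erase_block_bytes: int) -> int:
    """Worst case: every extent lives in a different erase block,
    and each touched erase block is rewritten in full."""
    return extents * erase_block_bytes


def write_amplification(logical_bytes: int, physical_bytes: int) -> float:
    """Ratio of bytes physically written to bytes the application saved."""
    return physical_bytes / logical_bytes


# Assumed values, not measurements: a roughly 4 KiB file in two extents,
# on flash with an assumed 8 MiB erase block.
file_bytes = 4 * 1024
physical = flash_bytes_written(extents=2, erase_block_bytes=8 * MIB)
print(physical // MIB, "MiB written for", file_bytes, "bytes saved")
print("write amplification:", write_amplification(file_bytes, physical))
```

Under those assumptions a 4 KiB save turns into 16 MiB of flash writes, an amplification factor of 4096. A flash controller with a smarter translation layer could do a lot better, so treat this as an upper-bound illustration rather than a measurement.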