WGET, Write Amplification, SATNAV, Twit 2FA, AI, OpenBazaar, DCIE

posted Dec 4, 2016, 6:28 AM by Sami Lehtinen   [ updated Dec 4, 2016, 6:28 AM ]
  • Found an issue in wget: it seems to waste a ton of resources. I was wondering what was taxing my system, and it's wget. It's a classic example of bad code. Totally horrible. I thought that downloading files over TCP / HTTP wouldn't max out the CPU, but it does. Classic. This happens when you combine the -r and -O parameters. Something goes wrong and the process gets seriously slow after a couple of hundred pages and hundreds of megabytes have been downloaded. If the -O parameter isn't used, this won't happen. Yet -O single_file_name should only append, so what's the source of the problem? I didn't bother to look at the actual source code, but there's something wrong there. Like looping over the growing output file or such. Maybe they don't seek, maybe they read and skip the already saved data, or do something similarly silly which "works well" as long as you're not gathering too much data. Tried several times and confirmed. Based on my observations it also stores everything it downloads in memory. Why? I really don't know. - Confirmed, it got ridiculously slow. It feels like exponential slowness or something like that. wget without -O used about 5 seconds of CPU time, while the copy with the -O option had used 55 minutes of CPU time and more than 500 megs of RAM when I decided it was slow enough. - This is again an example where the software probably hasn't been designed for copying such large sites, so the recursive parameter isn't implemented efficiently for larger data sets.
  • An actual example of practical write amplification on an SSD in normal desktop use: Erased GiB 4332, Written GiB 2838. The number of erased bytes is larger than the number of written bytes (a ratio of roughly 1.5), because data has been relocated. Yet many consumer SSDs use quite small erase blocks. I've got a USB flash stick which uses an 8 MB erase block. It makes random writes really slow. Or not actually slow, it just means that writing even one byte, then seeking to another location and writing again, triggers an 8 MB write if the new location is outside that erase block range. Many cheap flash drives don't use an advanced controller with a complex FTL and optimizations.
  • Reminded myself about basic stuff like Real Time Kinematic (RTK) satellite navigation, aka Carrier-Phase Enhancement (CPGPS), the European Geostationary Navigation Overlay Service (EGNOS), and a more generic article about GNSS enhancement.
  • 2FA is nice if it works. Twitter's 2FA is malfunctioning again. I can't get the 2FA code, so I can't log in. Well well, I guess that's pretty normal. Several other services using SMS 2FA have very similar issues repeatedly. I've got a friend who works at a company using SMS 2FA, and it's a constant struggle to get it to work.
  • Why are all IM apps just so full of useless features and plain bloat? I'm looking for a simple, light and secure IM app. WhatsApp, Skype, etc.? No thank you. I like lean and functional software, but it seems that most engineers simply aren't able to produce such, even if it should be pretty trivial. Stick to the basics, and boot the junk.
  • Sam Harris: Can we build AI without losing control over it? - Excellent TED talk. Of course it didn't contain anything new, but it's still a good presentation.
  • OpenBazaar isn't getting explosive growth. Does it mean it has failed? I think people have strange illusions about explosive growth in everything. If someone has worked as an entrepreneur for years and made millions over 30 years, with only slow linear growth over that time, does it mean he has failed? I don't know. But it seems to be so. If it took you more than a month to make a million, you have to be a failure: that seems to be the current trend. Based on that, I need to remember in the next salary negotiation that my salary should double every month. Why? I've learned new things, that's why.
  • Reminded myself about Data Center Infrastructure Efficiency (DCIE) metrics: PUE, GEC, ERF, CUE. Because PUE can't be below 1, there has to be another metric for really efficient data centers, one which accounts for energy reuse. Energy Reuse Factor (ERF) is for that. So if PUE were 1 and ERF 0.5, then 50% of the power used by the data center would be recycled. One of the world's leading data centers in energy reuse is the Yandex data center in Finland. It's much more efficient than Google's data centers. Yandex is aiming for 0.5 ERF, which would be great. The vast majority of data centers have an ERF near zero, even if some of them reach a Renewable Energy Factor (REF) of 1.0. Of course Water Usage Effectiveness (WUE) is also a factor. It's not very important in Finland, but in some areas like California it might be really important.
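The two bits of arithmetic above (the SSD erase counters and the energy reuse figures) can be sketched in a few lines of Python. This is only a rough illustration with some assumptions: the function names are made up for this post, the erased-to-written ratio is just a crude proxy for write amplification (the drive reports GiB erased, not actual NAND writes), and the second function uses Energy Reuse Effectiveness (ERE), a related Green Grid metric that isn't named in the list above, defined as ERE = (1 - ERF) * PUE.

```python
def erased_to_written_ratio(erased_gib: float, written_gib: float) -> float:
    """Crude write amplification proxy: GiB erased per GiB written."""
    if written_gib <= 0:
        raise ValueError("written_gib must be positive")
    return erased_gib / written_gib


def energy_reuse_effectiveness(pue: float, erf: float) -> float:
    """ERE = (1 - ERF) * PUE. Unlike PUE, this can drop below 1."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1")
    if not 0.0 <= erf <= 1.0:
        raise ValueError("ERF must be between 0 and 1")
    return (1.0 - erf) * pue


if __name__ == "__main__":
    # Counters quoted in the SSD item: Erased GiB 4332, Written GiB 2838.
    print(f"erased/written: {erased_to_written_ratio(4332, 2838):.2f}")
    # The PUE 1 / ERF 0.5 case from the data center item.
    print(f"ERE: {energy_reuse_effectiveness(pue=1.0, erf=0.5):.2f}")
```

With PUE at its floor of 1, an ERF of 0.5 gives an ERE of 0.5, which is one way to express that half of the energy going into the data center gets reused.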