2013 assorted stuff

Post date: Jan 6, 2013 3:56:32 PM

To celebrate the beginning of the year 2013 I have this little assorted post. This is not even nearly everything. I could have written several posts about each topic covered here, I'm just sorry I don't have time for that. This is just a web log of stuff I do, so I really can't cover all the details.

  • Finished reading The Innovator's Dilemma. I can see many similarities in the business I'm currently in: how things change, how everything new is seen as impossible or hard, and how things should be done the way they have always been done. Well, maybe it's time to change things, even for the better?
  • The Veikkaus.fi web site is finally allowing longer passwords, up to 32 chars including special characters. Earlier their policy only allowed 8 chars from a-z, A-Z, 0-9, which is a bit outdated policy afaik.
  • Reminded myself about HTTP header fields (RFC 2616). In the process I found out that really many sites give out conflicting-looking information in the Cache-Control header. Private means that the response may be cached, but only in the user's own (non-shared) cache. No-cache is usually read as "don't cache", although strictly per RFC 2616 it means that a stored copy must not be reused without revalidating it with the origin server; no-store is the directive that forbids storing at all. So why does a site then define private and no-cache simultaneously? Also some browsers seem to require must-revalidate. Why? If the intent is that nothing gets served from cache in the first place, adding must-revalidate on top of no-cache looks redundant. Of course we have the Expires field and the old Pragma and all the tricks involved with those. But basically Cache-Control: no-cache should be enough, and the Expires field can be set to 0 or point to a past point in time, but it's not required.
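To make the directive mix-ups easier to spot, here is a minimal Python sketch that splits a Cache-Control value into directives and gives a rough reading of it. The function names `parse_cache_control` and `describe` are made up for this example, and the interpretation is deliberately simplified (it ignores quoted arguments containing commas, s-maxage, and so on).

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into a {directive: argument} dict.

    Simplification: a plain comma split, so quoted arguments that contain
    commas (e.g. private="a, b") are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def describe(value):
    """Return a rough, simplified reading of what the header allows."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "must not be stored at all"
    if "no-cache" in d:
        return "may be stored, but must be revalidated before every reuse"
    scope = "private caches only" if "private" in d else "any cache"
    if "max-age" in d:
        return "reusable for %s seconds by %s" % (d["max-age"], scope)
    return "no explicit freshness given, %s" % scope


for header in ("max-age=86400",
               "private, no-cache",
               "no-cache, no-store, max-age=0, must-revalidate"):
    print("%-50s %s" % (header, describe(header)))
```

The three sample values are the ones that show up in this post: my own server's header, the private plus no-cache combination, and the one Google Sites sends for images.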
  • Read a hugely long discussion about CloudFlare. The point of the discussion was whether CloudFlare makes your site faster or not. Well, basically CloudFlare works as a caching proxy for resources which are cacheable, but for non-cacheable resources it just adds delay. Therefore it's really important to be smart about what is cacheable and what's not, and especially not to declare resources that are cacheable to be non-cacheable like Google does.
  • Worked hard on one minor project to get it off the blacklists. Let's say that the abuse control policy wasn't the best possible. The domain got blacklisted, so I had to first fix the app and then get it removed from the lists. It can be very tedious work.
  • Limited the number of parallel (concurrent) HTTP connections in my browser to improve performance. I'm now living in a rented flat and using a ridiculous (compared to my normal fiber connection) 3G connection. After limiting the number of parallel connections everything seems to be working much better, because there aren't too many TCP connections competing for this bandwidth. I also might tune down my TCP stack settings, which are absolutely maxed out for the fiber connection (like initial TCP RWIN 64k etc.).
  • Read a few posts wondering when IPv6 will really kick off. Here's one. Well, I have to say that I have fully working native IPv6 connectivity, and I'm happy with it. I'm really not using NAT. It seems that many people do not understand network interfaces at all. They also do not understand the benefits of binding services to only selected interfaces.
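What binding a service to a selected interface means in practice can be sketched with Python's standard socket module. The helper name `listen_on` is made up for this example, and it uses the IPv4 loopback address only because that exists on practically every machine; the same idea applies to an IPv6 address with `socket.AF_INET6`.

```python
import socket


def listen_on(address, port=0):
    """Open a TCP listening socket bound to one specific local address.

    Port 0 asks the operating system to pick any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))
    sock.listen(5)
    return sock


# Bound to loopback: reachable only from this machine.
# Binding to "0.0.0.0" instead would accept connections on every interface.
loopback_only = listen_on("127.0.0.1")
print(loopback_only.getsockname())
loopback_only.close()
```

A service bound this way simply isn't there for the outside network, so there is nothing for a firewall or NAT to "protect".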
  • Just found out that my 3G internet provider doesn't block outgoing TCP port 25 at all. That's quite interesting. I assumed that all consumer grade network connections block port 25 access, because even many business network connections block it unless you make it clear to the provider that you don't want it to be blocked.
  • Encountered an SQL gotcha when wondering why one app didn't behave like it should have. It seems that SELECT key,value FROM data WHERE value!=0; leads to interesting results. It returns only rows where the value is non-integer. So from 0 1 1.1 2 2.1 3 3.3 only values 1.1, 2.2 and 3.3 get selected. If I want it to return all non-zero values, it requires adding a decimal to the query. So let's put 0.0 instead of 0 there, and then it behaves as expected. It's always good to know your systems thoroughly, otherwise you'll find yourself having these kinds of strange problems.
  • Reminded myself about WiFi (WLAN) stuff: WDS, roaming (controlled by client), SSID, frequency allocation, encryption modes, etc. Especially for situations where you have a larger set of base stations. One of the main points was not to use WDS unless it's absolutely required, it'll just make your network slow, or superslow in case of several base stations.
  • We all read the news about fake google.com SSL certificates. As I have said, users should first authenticate the site before they authenticate to the site. There's no reason to trust broken SSL at all.
  • Read an interesting article about Tux3 FS. They sure have some interesting ways of solving things. It reminds me of database transaction journaling and SQLite3 WAL mode.
  • Wondered about SLIC BIOS, Sysprep and cloning Windows 7 and Windows 8 machines which use SLIC BIOS and special Windows licenses. Found one Russian hacker app which can read and decrypt all required data from the BIOS, allowing cloning of those machines and even re-entering the product key. Basically I don't like the concept of embedding licensing information in the BIOS. This is just like Secure Boot, which could prevent loading any operating system other than MS(tm) ones or so.
  • Studied Windows server / workstation Sysprep for cloning in a little more detail.
  • Updated passwords and email addresses for most sites. It's quite a painful process. You never know which site will accept what kind of new password etc. Some sites allow only short passwords, some sites allow any kind of UTF-8 passwords without any problems etc. It's also hard to find where to change the password on some sites, and some sites simply do not properly confirm whether the new password was accepted or not. Overall quite a crappy user experience. One of the crappiest sites (a bank btw) didn't accept a password that is non-numeric or longer than 6 digits. (Duh)
  • Read one nice article about Bluetooth security. I just wonder what kind of fails there will be with the Near Field Communication (NFC) stuff.
  • Read a nice article, How I run my own DNS. Nothing new really, it was just fun to read it. I just wonder why he's using Postfix to forward mail to Gmail. I run my own Postfix because I don't want my mail to be forwarded to Outlook or Gmail or any other huge data warehouse.
  • HTTP headers from my web-server:
    • HTTP/1.1 200 OK
    • Accept-Ranges: bytes
    • Cache-Control: max-age=86400
    • Content-Length: 79318
    • Content-Type: image/png
    • Date: Mon, 31 Dec 2012 20:35:59 GMT
    • Etag: "1ff11-135d6-4bfd1068fed7c"
    • Expires: Tue, 01 Jan 2013 20:35:59 GMT
    • Last-Modified: Sat, 12 May 2012 06:33:06 GMT
    • Server: Apache/2.2.22 (Ubuntu)
    • Strict-Transport-Security: max-age=86400
    • x-mod-spdy: 0.9.3.3-386
    • X-Firefox-Spdy: 3
  • Closed a few (small traffic) web forums and replaced those with newly created Google+ Communities.
  • NitpickerTool.com, finally a proper spellchecker. Maybe I should use it too? Just for lulz, they fail... The image on the front page, http://nitpickertool.com/resources/img/overview.png, is sent with the HTTP response header Cache-Control: no-cache. It's totally crazy and pointless. Usually I don't even notice these things, but I'm now using narrowband 3G instead of a 1 Gigabit/s full duplex fiber connection. It's so very easy to tell crappy web sites from the sites where admins know what they're doing. This is a perfect example. (Btw. Google engineers don't know what they're doing either, I'll get back to this topic.) Btw. they have now fixed it.
  • Reminded myself about current Secure Boot UEFI issues with Free Software, and how that's going to possibly affect several Linux distributions, unless SB can be disabled by the user.
  • Studied and tested mod_pagespeed, even though I didn't leave it enabled.
  • Wondered why FB sends B*S* information about users to other users. I'm 99% sure that the user isn't following the pages FB claims her to follow. I also later confirmed my suspicion and I was right. The information which FB sent was based on my profile, but they claimed it was based on her profile. I think that's just plain deception.
  • Checked out and played with Ninchat. Didn't like their Flash video chat, but otherwise it's a nice web chat.
  • Watched Indie Game: The Movie. When the stakes are high, it seems that people can get a bit messed up in their thoughts. Luckily all the businesses covered in that documentary did well after all.
  • Military stuff, read whole articles from Wikipedia: NASAMS 2, AIM-120 AMRAAM, MBDA Meteor, Iranian submarine force (Kilo (Russian) & Quadir/Ghadir (Iran) & Yono (North Korea) class), supercavitating torpedoes (Hoot & Shkval) and Iron Dome (Israel) antimissile systems.
  • Studied Missile Development project control goals. Strict requirements for features that have to be demonstrated before proceeding any further with the project. A great way to get a strict Proof of Concept.
  • Studied the pricing of one service provider. I got really upset, because their pricing was really confusing. It didn't cover edge cases at all, and the interactions of different mix and match solutions were also missing. People who make price lists should really think through the cases instead of just listing prices. It's just like doing my daily ERP integrations: way too often I get a technical description of some API and then the question of how much it would cost to do this. Well, it's great, we now know that we can REST, but all the details required to generate the actual payload data are still totally missing. "Hi guys, how much does a car cost?"
  • Laughed at the Jysk Xmas website. They are using Azure but it doesn't help at all. They send daily emails asking quiz questions and then users rush to the site to answer those questions. Guess what, the site is absolutely jammed (over 30 seconds / page load) when those mails arrive. At least they were smart enough not to run this xmas campaign on their main web servers, because I assume the situation would have been even worse in that case. If Google App Engine had been used, I'm sure it would have performed well. (I'll be posting a Google App Engine gotchas post a bit later.)
  • Other stuff I have read about lately: Big Data, MapReduce, Hadoop. Well, isn't that just tabulating data? Makes me smile! Check out Tabulating Machine. Overall System Security, Data Erasure Procedures, Enterprise License Management, Startup Tech Companies, Hyper convergence products: Nutanix Complete Cluster, Scale Computing HC^3 (Hyper-Converged Compute Cluster) and SimpliVity OmniCube. Storage systems: Hitachi Data Systems, NetApp, Cisco Systems, EMC, VMware, VCE Vblock. Storage Networks (SAN), Storage Computing: Nimble Storage, Tintri and Astute Networks. Cognos, Sales, CRM, ERP, Data Analysis, ePOS, POS, mPOS, Point of Sale, Customer Loyalty Systems, Store Chain, Big Box Super Market. BigQuery, Data to Intelligence (D2I)
    • "Volume, Variety, Velocity", Open Data, Stream Computing, A/B testing, Continuous Deployment, Software Defined Network (SDN), OpenFlow, European Public Sector Information (ePSI) Platform
    • Public Sector Information (PSI), Open Knowledge, PoS CRM sale data analysis on customer level.
  • I'm mentoring one really young but still promising nerd. He has done all the server stuff I have done (VPS, Linux installations, web servers, DNS, mail server (smtps, imaps, webmail), STARTTLS with authentication, earlyssh, full disk encryption, SSL / HTTPS), some PHP code etc., and he's only 13 yo.
  • IPTV service TVkaista is under threat in Finland, they're facing charges.
  • Needed to check out an interesting Linux distribution called Mageia. I think Ubuntu is going in the wrong direction; that's also the reason why I gave up Windows years ago.
  • I visited The Next Web / ArcticStartup Meetup at Teatteri Club, Helsinki (13.12.2012).
  • Read article: 14 big trends for 2013: Preemptive health care, Predictive data analytics, Algorithmic censorship and algorithmic transparency, Social coding, Liquid data and Personal data ownership.
  • Well, one company I have been visiting works in just this "Choice Engines" sector, and they're currently having problems with system performance and managing all the data they're collecting from users. Here's a nice article about the topic. The question remains, how can you avoid being included in these huge datasets?
  • I was in Egypt, and there I learned what it is to have super narrow bandwidth. This current 3G connection is a luxury network connection compared to that. Very early in the morning it was possible to get 100 kbit/s (about 10 kbytes/s data rates), in the afternoon not even half of that. Usually packet loss was so high that it was totally impossible to use any services, because the timeout hit before requests got processed. This is a major dilemma. Many servers in countries with good network connectivity assume that these "slow connections" are some kind of DDoS attack, as if I just opened sockets and kept those reserved without doing anything. But it's not true. Getting the SSL (HTTPS) negotiation done just might take 30 seconds or more. If the server isn't receiving a request in 30 seconds, it doesn't mean there is anything wrong with the client. On my servers I have set many timeouts relatively low, because I have full 1 Gbit/s connectivity between home and server and less than 1 ms latency. Then it's a huge surprise to find out that there is 500 ms round trip latency and the bandwidth is, well, a bit less than what I'm used to. Timeouts and limits are a really tricky issue.
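To put the numbers above into perspective, here is a small back-of-the-envelope sketch in Python. The function names are made up for this example, and it ignores packet loss and protocol overhead completely, so the real situation in Egypt was worse than what it prints.

```python
def transfer_seconds(size_bytes, rate_kbit_s):
    """Ideal time to move size_bytes over a link of rate_kbit_s (1 kbit = 1000 bits)."""
    return size_bytes * 8 / (rate_kbit_s * 1000.0)


def window_limited_kbit_s(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: at most one window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1000.0


# The 119297 byte uncacheable Google Sites image at 100 kbit/s and at 50 kbit/s.
print("%.1f s" % transfer_seconds(119297, 100))
print("%.1f s" % transfer_seconds(119297, 50))

# A 64 KiB receive window with 500 ms round trip time.
print("%.0f kbit/s" % window_limited_kbit_s(64 * 1024, 0.5))
```

So a single image that could have been cached costs roughly ten seconds on a good morning and twenty in the afternoon, and a 30 second server side timeout is not generous at all on a link like that.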
  • Read a long and excellent posting about atomic transactions and what could go wrong. It was hilarious; getting things right is much harder than anyone could even imagine before trying it in reality. No, it's not ok to think only about your own layer. There are multiple layers involved: disk drives, controllers, drivers, operating system, several layers of caching etc. It's really hard to make sure that an atomic transaction is really committed successfully. In some cases it's even impossible, because hardware, file system or operating system could hide from your app that the data isn't actually permanently persisted yet. So you think your system is working? Just run high write transaction loads and then randomly yank the power cord. Did something get messed up, is the system state still consistent? If not, well, now you have a failure to fix.
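As an illustration of how much ceremony even one "atomic" file update needs at the application layer, here is a minimal Python sketch of the common write-to-temp-file, fsync, rename pattern. The function name `atomic_write` is made up for this example. It is only the application's share of the work: as noted above, drives, controllers and caches below can still lie about what has really reached stable storage.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace path with data so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same file system) as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # ask the OS to push the file data to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory too, so that the rename itself is persisted (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

The yank-the-power-cord test is still the only real proof; this just removes the most obvious ways to end up with a half written file.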
  • Finished reading the Blog Hypnosis for Beginners book... Very important psychological stuff for sales guys. Yes attitude to lower resistance etc. It seems that many Egyptian sales guys follow these basic rules.
  • Google Sites seems to use no-cache and no-store (as well as noarchive) headers for hosted images. It's absolutely painful to see every image loaded again on every page refresh. I sent them feedback and I'm now really hoping that they will fix this, because the way they're doing it now is totally crazy. I also keep wondering why they provide an ETag for content which is not allowed to be stored or cached: Cache-Control: "no-cache, no-store, max-age=0, must-revalidate", ETag: "1355247003982", X-Robots-Tag: noarchive. Google also seems to be very worried about web robots archiving their content. Well, I guess they know how much content from other sites they cache.
    • Full set of headers from my web page, which I wasn't too happy about:
    • HTTP/1.1 200 OK
    • Content-Type: image/png
    • X-Robots-Tag: noarchive
    • Cache-Control: no-cache, no-store, max-age=0, must-revalidate
    • Pragma: no-cache
    • Expires: Fri, 01 Jan 1990 00:00:00 GMT
    • Date: Wed, 26 Dec 2012 11:09:25 GMT
    • Last-Modified: Mon, 09 Aug 2010 16:50:23 GMT
    • ETag: "1281372623009"
    • Content-Length: 119297
    • X-Content-Type-Options: nosniff
    • X-XSS-Protection: 1; mode=block
    • Server: GSE
  • Studied features of the Linux 3.7 kernel: ARM 64-bit support, nice. Server side TCP Fast Open, way cool. Btrfs hole punching is a really nice feature. Now you can "resparse" files by deallocating segments of a file when space inside the file is freed, without the slow and resource consuming copying of data to a new sparse file. The option to disable Copy-On-Write (COW) is also nice, even if it might weaken data durability a bit (?).
    • Supervisor Mode Access Prevention (SMAP), nice. Yet another layer in the layered security approach. I just wonder how many layers we actually need. See Intel documentation. JFS also got TRIM support. Many changes to IPv6, NAT and Netfilter. I really hope that nobody wants to use NAT with IPv6.
  • Had some light vacation reading, finished: Getting Things Done For Hackers (GTD guide), The Trading Profits of High Frequency Traders (High Frequency Trading, HFT) - a highly profitable and secure business with a Sharpe ratio of 9.2.
  • I'm using multiqueue GTD and it works very well. I never "forget" anything. Then it's just another question how to get the tasks on the list done, especially the tasks I hate.
  • Well, a long day: we left for Cairo at 4:20 and got back to the hotel at 21:34. Now I have seen the great Pyramids and the Sphinx, while getting sand blasted. I have been inside one pyramid too. We ate at the Blue Nile river barge restaurant. A friend of mine bought a painting of a Scarab (The Beetle) on papyrus. Street sales guys are incredible pests. I'm not buying anything, I don't have money, and yet they continue bombarding you with offers getting more and more ridiculous. I guess you have to have a really dodgy car to drive in Cairo, traffic was quite funny compared to ours. I'm sure insurance companies are happy to provide full CDW cheaply, NOT. Even boarding procedures were completely strange. It just seems that they're not able to organize things efficiently there.
  • Studied more messaging, authentication and encryption(?). Is chaffing and winnowing encryption at all? No, I think it's a message authentication scheme which allows you to just send tons of messages so that only the recipient knows which messages are fakes and which ones are real. So using this method you can communicate, in a way, confidentially without using any kind of encryption. See: Null cipher, Steganography (Concealed messaging).
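The idea can be sketched in a few lines of Python using only message authentication (HMAC) from the standard library. This is a toy illustration, not a vetted implementation: the packet layout (serial number, one bit, MAC tag) and the function names `chaff` and `winnow` are choices made for this example.

```python
import hashlib
import hmac
import os
import random


def mac(key, serial, bit):
    """Authentication tag for one (serial, bit) packet."""
    msg = ("%d:%d" % (serial, bit)).encode("ascii")
    return hmac.new(key, msg, hashlib.sha256).digest()


def chaff(key, bits):
    """For every real bit, also send the opposite bit with a random bogus tag."""
    packets = []
    for serial, bit in enumerate(bits):
        wheat = (serial, bit, mac(key, serial, bit))
        fake = (serial, 1 - bit, os.urandom(32))
        pair = [wheat, fake]
        random.shuffle(pair)  # hide which of the two was the real one
        packets.extend(pair)
    return packets


def winnow(key, packets):
    """Keep only the packets whose tag verifies, ordered by serial number."""
    good = [(serial, bit) for serial, bit, tag in packets
            if hmac.compare_digest(tag, mac(key, serial, bit))]
    return [bit for serial, bit in sorted(good)]


key = os.urandom(16)
message = [1, 0, 1, 1, 0, 0, 1]
assert winnow(key, chaff(key, message)) == message
```

Nothing is encrypted here: every bit goes over the wire in the clear, together with its opposite. Without the key an eavesdropper just can't tell which tags are valid.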
  • Added to Kindle: The Checklist Manifesto: How to Get Things Right by Atul Gawande, The Signal and the Noise: Why So Many Predictions Fail-but Some Don't by Nate Silver, Automate This: How Algorithms Came to Rule Our World by Christopher Steiner
  • Played with Snappy compression. It seems to be working quite well with the huge XML data files which I'm dealing with daily. See Google Snappy code page.
  • Quickly checked out BLAKE2 hashing and BLAKE2*p, which is the fully parallelized version.
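For playing with BLAKE2 today, Python's standard hashlib (3.6 and later, so newer than this post) ships `blake2b` and `blake2s`. As far as I know it does not include the parallel BLAKE2bp / BLAKE2sp variants. The wrapper name `blake2_hex` below is made up for this example.

```python
import hashlib


def blake2_hex(data, digest_size=32, key=b""):
    """Hex digest of data using BLAKE2b, optionally keyed (keyed mode works as a MAC)."""
    return hashlib.blake2b(data, digest_size=digest_size, key=key).hexdigest()


print(blake2_hex(b"2013 assorted stuff"))
print(blake2_hex(b"2013 assorted stuff", digest_size=16))
print(blake2_hex(b"2013 assorted stuff", key=b"secret"))
```

Adjustable digest size and built-in keying are the handy parts compared to the older hash functions.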
  • ROR Sitemap, ROR (Resources of a Resource) is a rapidly growing independent XML format for describing any object of your content in a generic fashion, so any search engine can better understand that content. RORweb.com is the official ROR website. - I really dunno what I would do with this, a traditional sitemap or url list is just good enough.
  • In Egypt they honestly called it a 3.75G network; in Finland mobile operators claim DC is 4G even if it isn't.
  • Lightly studied CDN networks: CloudFront, ChinaCache, CacheFly, and of course we're all familiar with the old big ones like Akamai, LimeLight, Level3 and EdgeCast. For some reason CacheFly often seems to be pretty slow to Finland.
  • Studied efficient email server design: maildir, mbox, procmail, and dovecot caching and data storage solutions in detail. Btw. afaik dovecot is doing a good job with caching and storage. Sdbox, mdbox, dbox, imap cache, automatically updating caching options per mailbox based on client requirements etc. Afaik this is stuff done right. Utilizing memory cache, cache file, two tier storage, automatic cache optimization. Locking minimization, fsync options, disk io minimization, efficient storage file size (dbox), not too small, not too large etc.
  • Checked: Sieve script mail filtering with Dovecot
  • Read a nice crypto article, 7 codes you'll never ever break. Afaik, it's bad that there are mistakes in the Kryptos statue, but of course it's also possible in the real world that when something is encoded, it's not getting encoded (encrypted) correctly. I just obfuscated my master key passwords using light paper & pen crypto and I have to say, I had to verify encryption & decryption three times / passwd, so I'm sure it's correctly encrypted. I do not ever store plain text passwords anywhere. The level of obfuscation / encryption depends on the security level requirement for that password. Because passwords are random nonsense to begin with, those won't give you easy hints when it's correctly deciphered, so the encryption doesn't usually need to be very strong. Basic ECB works quite well, even if that's not what I'm using (maybe).
  • These Egyptian guys at the hotel didn't know how to use a sauna properly. They didn't throw water on the hot stones at all. The sauna was a totally incorrect and unbearably dry, hot room, until we fixed a few things.
  • A friend of mine made an interesting project using MongoDB. This is yet another temporary email service, but it was mostly a study project. See: my10minutemail.com I just wish he would blog more about the stuff he does. He has really thought about details: how to prevent message loss, how to handle message bounces correctly, how to store data to the database etc. I have seen so many services which are in production but haven't really paid enough attention to technical details. One of these services which fail is boun.cr. Their outgoing mail server doesn't have a proper reverse DNS record. That's a major fail and leads to a situation where many receiving mail servers reject forwarded messages.
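What a "proper reverse DNS record" means can be sketched with Python's standard library. The function names are made up for this example, the addresses in the printed examples come from the reserved documentation ranges, and `has_matching_reverse_dns` needs a working resolver, so it's only defined here and not called. Receiving mail servers do checks along these lines, though the exact policy varies per server.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Name of the PTR record that should exist for this address."""
    return ipaddress.ip_address(ip).reverse_pointer


def has_matching_reverse_dns(ip):
    """Forward-confirmed reverse DNS: ip -> hostname -> addresses must include ip."""
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        forward = socket.getaddrinfo(hostname, None)
    except (socket.herror, socket.gaierror):
        return False
    # Drop any IPv6 scope id ("%eth0") before comparing addresses.
    addresses = {ipaddress.ip_address(info[4][0].split("%")[0]) for info in forward}
    return ipaddress.ip_address(ip) in addresses


print(ptr_name("192.0.2.25"))
print(ptr_name("2001:db8::25"))
```

If the PTR record is missing, or points to a name that doesn't resolve back to the same address, many mail servers treat the sender as suspicious or reject it outright.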
  • Played with QR codes and also studied the encoding method, error correction etc. I have to say that I don't like many things I see, but QR codes are well done.
  • Reminded myself about the NUXI problem and the differences between little and big endian systems.
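The NUXI problem is easy to reproduce with Python's struct module: read the text as 16-bit words in one byte order and write the same words back in the other. The helper name `swap_16bit_words` is made up for this example, and it assumes the input length is even.

```python
import struct
import sys


def swap_16bit_words(data):
    """Swap the two bytes inside every 16-bit word of data (length must be even)."""
    count = len(data) // 2
    words = struct.unpack("<%dH" % count, data)  # read as little endian words
    return struct.pack(">%dH" % count, *words)   # write back as big endian words


print(sys.byteorder)                             # byte order of this machine
print(swap_16bit_words(b"UNIX"))                 # b'NUXI'
print(struct.pack("<I", 0x0A0B0C0D).hex())       # 0d0c0b0a
print(struct.pack(">I", 0x0A0B0C0D).hex())       # 0a0b0c0d
```

The same 32-bit integer ends up as two different byte sequences, which is why file formats and network protocols have to state their byte order explicitly.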
  • Checked out release documentation of App Engine 1.7.4: Expanded EU Support, Task Queue statistics, Traffic splitting (between application versions), LogsReader and Logs API, makes analyzing logs a lot easier.
    • Expanded Datastore query support, DISTINCT queries. DISTINCT returns a distinct set of results, just like you can do in Python by forming a set from a list: set([1,2,3,2,4,3]) returns {1,2,3,4}, or simply in the Python shell {1,2,3,2,4,3} returns {1,2,3,4}. Distinct queries are based on an index, so it's a pretty good way to get the results. Without the distinct feature it would be better to normalize the data. Except that the datastore doesn't support joins, so that might actually make things pretty slow, unless you're caching all normalized fields.
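The set comparison above in runnable form, plus an order preserving variant, since a plain set forgets the order in which values were first seen. This is plain Python and only mimics what a DISTINCT query on one property returns; it doesn't use the App Engine API at all, and the helper name and sample rows are made up for this example.

```python
def distinct_in_order(items):
    """Return the distinct items, keeping the order of first appearance."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


values = [1, 2, 3, 2, 4, 3]
print(set(values) == {1, 2, 3, 4})   # True
print(distinct_in_order(values))     # [1, 2, 3, 4]

# Hypothetical rows: distinct values of one "column", like a projection query.
rows = [("fi", "Helsinki"), ("fi", "Espoo"), ("se", "Stockholm")]
print(distinct_in_order(country for country, city in rows))   # ['fi', 'se']
```

Doing this in application code means fetching every row first, which is exactly what an index based DISTINCT query avoids.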
  • Just some fun:
    • JavaScript:
    • 0==false // true
    • 0===false // false
    • 1=="1" // true
    • 1==="1" // false
    • Python:
    • 1==True # True
    • 1 is True # False
    • type(True) # bool
    • True is bool # False
    • That's all basic stuff, we should all know it.
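The Python half of the list as a runnable sketch. The function name `comparisons` is made up for this example, and the value 1 goes through a variable because newer Python versions print a SyntaxWarning for `1 is True` written with a literal.

```python
def comparisons():
    """Evaluate the equality and identity checks and return them by label."""
    one = 1
    return {
        "1 == True": one == True,                   # equality: bool is a subclass of int
        "1 is True": one is True,                   # identity: two different objects
        "type(True) is bool": type(True) is bool,
        "True is bool": True is bool,               # an instance is not its class
        "isinstance(True, int)": isinstance(True, int),
        "True + True": True + True,
    }


for expression, result in comparisons().items():
    print("%-22s -> %r" % (expression, result))
```

The `isinstance(True, int)` line is the reason behind the first one: True really is the integer 1 wearing a different type, which is also why True + True gives 2.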
  • Python performance play:
    • Each if statement is executed 10 million times.
    • if True == True : 1.78s
    • if True is True : 1.64s
    • if None == None : 1.76s
    • if None is None : 1.66s
    • if True: : 0.93s
    • if 1: : 0.93s
    • pass only : 0.93s
    • I have been doing quite many tests like this for my PCP caching class. I just hope I'll get it released soon. It needs a little final touch and it requires just the right mood to dig into the details.
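A measurement like the one above can be repeated with the standard timeit module. This is a sketch of one way to do it, not the exact script behind the numbers in the list; the names `STATEMENTS` and `measure` are made up for this example. Absolute times depend entirely on the machine and the Python version, and newer interpreters optimize constant conditions like `if True:` away, so the differences may look nothing like the ones above.

```python
import timeit

# One statement per line of the list above; "pass" is the empty baseline.
STATEMENTS = [
    "if True == True: pass",
    "if True is True: pass",
    "if None == None: pass",
    "if None is None: pass",
    "if True: pass",
    "if 1: pass",
    "pass",
]


def measure(number=1000000):
    """Time each statement `number` times and return {statement: seconds}."""
    return {stmt: timeit.timeit(stmt, number=number) for stmt in STATEMENTS}


if __name__ == "__main__":
    for stmt, seconds in measure().items():
        print("%-24s %.3fs" % (stmt, seconds))
```

Raise `number` to 10000000 to match the 10 million iterations used in the list.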