
Home Lab, UX design, Web Performance, Startup, Data Discovery, Cloud Security & Platforms

posted Feb 17, 2013, 5:59 AM by Sami Lehtinen   [ updated Mar 25, 2015, 10:41 AM ]
  • Why am I doing all this? I'm no different from other IT guys, afaik. Here's an excellent story: "Why you need a home lab to keep your job".
  • UX design for startups. This is something that every developer should know. It's very important to give users a pleasurable UX. I have seen far too many bad user interfaces and far too complex user interface logic, even for completing very simple basic tasks. Direct link to PDF.
  • Web performance crash course (PDF). This is also something that every web developer should know, even when suddenly woken up in the middle of the night. I'm actually quite happy that nothing except a few JS tricks was new to me.
  • After reading the previous document, I modified my test JavaScripts to load asynchronously. This clearly makes page rendering look faster to the user, even if some content is filled in later.
  • Updated the HTML code with the correct HTML5 DOCTYPE definition.
  • Finished reading (yet another) Finnish startup guide. Creating a company, from idea to business idea, business plan, SWOT analysis, risk assessment and protection against the risks, sales, marketing, marketing tools, funding, pricing, pricing models, start-up money (from the government), types of business entities, official business registration, bookkeeping and financial statements, taxation, taxation with different business entities, insurance, business pensions, hiring employees, environmental responsibility, 10 steps to success, checklists for business owners. What is a startup, how to grow your business, key personnel, team, shareholders' agreement, competitive advantages, other important things to know, list of common business vocabulary. I have earlier read a whole book about this topic, but this guide was a compact one, only 60 pages.
  • Studied the cloud provider CloudSigma and the virtual server provider Digital Ocean, as well as front-end caching using aiCache.
  • Checked out the Python Wheel Binary Package Format 1.0 (PEP 427).
  • Announced that the Off-The-Record service will be closed; I don't have time to maintain older projects. It will probably die when I need to migrate it from the Master/Slave (M/S) datastore to the High Replication Datastore (HRD). Time will tell if I'll have the time / motivation to upgrade it when that moment comes.
  • As for privacy policies and security, studied the Lavabit secure mail service's security and privacy documentation. (Secure, Privacy Policy)
  • Quickly checked out Amazon Redshift.
  • Steven Gibson recommended storing passwords securely with scrypt, tuned so that one hash takes about 1 second. Depending on the site and the attacker, that might make the site vulnerable to a Layer 7 DDoS attack (Application Layer Attacks & Protection): a large number of users simply try to log in simultaneously, overloading the system. Because the password check is so slow and ties up a lot of CPU and RAM, it could easily cause a denial of service. Of course there must be sane limits on how many password checks can be triggered and how often, but with a large botnet it could be possible to bring the site down at some important time, or at least prevent users from logging in, simply by exploiting the slow login process.
  • Tried Tableau and QlikView quickly with one large data set. I might like to try SpotFire, Catavolt, BellaDati and Talgraf too. At least the two tools I tried were quite awesome. Tableau seemed to be better (simpler / faster to use) based on really quick testing. I got nice results from the dataset I used with Tableau. I'm sure many people do not realize the difference between Business Intelligence, "Reports" and Data Discovery tools like these. Checking the data from a different perspective is just so incredibly easy. I have just one tip: please make the data available in tables without too many joins, simply de-normalise it, and don't use cryptic integer references and flags in the data. Each column should be simple and straightforward to process. Too complex data structures kill the (easy) usability of these great tools. - Afaik, all software vendors should provide a free trial for competent people. Why would I buy something, unless I have been able to play with the product first?
  • Studied BYOD best practices and mobile device management.
  • Thoroughly studied Check Point's DDoS whitepaper. DoS Attacks, Response Planning and Mitigation.
  • Wondered if sites protected by CloudFlare are vulnerable to a RUDY (R-U-Dead-Yet) attack from a large botnet. Played with PyLoris and Python raw sockets. Fine-tuned sshd cipher and authentication parameters.
  • I didn't watch these, but if you're interested, there are now video tutorials that teach you how to use Google App Engine.
  • Studied ENISA CIIP documentation for Cloud Services: Cloud computing is critical, Cloud computing and natural disasters, Cloud computing and DDoS attacks, Cyber attacks, Relevant Threats, Different scenarios, Infrastructure and platform as a Service the most critical, Administrative and legal disputes, Risk assessment, Security measures, Logical redundancy, best practices, Monitoring, Audits, tests, exercises, Incident reporting.
  • Read this nice write-up about Heroku and how important it is to route requests to the right handlers. There have been follow-ups to it. I really wouldn't have believed that Heroku would use such bad request routing.
  • Checked out Google's new App Engine (also SDK 1.7.5) instance classes with high memory options: F4_1G and B4_1G, which allocate 1 GB of memory for every instance. Previously only 512 MB options were available. Also, mail bounces are finally getting processed properly. No more sending mail blindly.
  • DDoS @ Wikipedia: Well, back in the good old days, when many P2P protocols didn't have too many checks, it was really easy to feed false source information to the network, making all clients connect to one desired server at once. Many P2P networks are now much stricter about sources and often won't pass information forward without verifying it, effectively limiting the damage caused by false source / tracker / peer information. The first eDonkey2000 versions with DHT were really dangerous. You just gave an IP and port, and then a lot of traffic was directed to that address. There was also some kind of flaw in how the DHT looked up its peers. If you made sure that the client ID was the lowest or highest in the network, it seemed that those IPs got tons of traffic when other clients updated their DHT peer information. When I played with that, I had to stop, because even back then it was able to saturate a 10 Mbit/s connection, which was super fast, because most people were using 33.6k or 56k modems. Changing the client DHT address was trivial using a hex editor and, what's even worse, the "outside" IP and port reported to the network were stored directly in the configuration file as plaintext. Yes, it was useful if you used NAT and the client couldn't figure out your public IP. But it also allowed targeting any other IP/port combination on the Internet.
  • Amazon Route 53 DNS now supports primary and secondary servers, so you can have your traffic routed to another server in case the primary server(s) fail. A really easy way to add simple failover redundancy or simple, pretty error pages.
  • Played with Bing Webmaster Tools. It's really annoying that Bing considers the lack of a sitemap an error and nags about it every week. Well, now I have a sitemap for every site.
  • A nice article about A/B testing. Should be pretty clear to every web developer too. I personally might prefer a multi-armed bandit approach instead of plain A/B testing. It would allow testing any number of combinations simultaneously. Of course that requires adequate traffic to analyze. The same methods mentioned in that post can be applied to it.
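    As a rough illustration of the multi-armed bandit idea, here is a minimal epsilon-greedy sketch in Python. Epsilon-greedy is just one simple bandit strategy that I picked for the example; the article may discuss other methods, and the class name, variant names and the 10% exploration rate are all assumptions.

```python
import random


class EpsilonGreedyBandit:
    """Pick the best-converting variant most of the time, explore otherwise."""

    def __init__(self, variants, epsilon=0.1, rng=None):
        self.epsilon = epsilon  # assumed exploration rate, 10% by default
        self.rng = rng if rng is not None else random.Random()
        self.shows = {v: 0 for v in variants}
        self.wins = {v: 0 for v in variants}

    def rate(self, variant):
        """Observed conversion rate; 0.0 until the variant has been shown."""
        shows = self.shows[variant]
        return self.wins[variant] / shows if shows else 0.0

    def choose(self):
        # Explore: with probability epsilon show a random variant.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.shows))
        # Exploit: otherwise show the variant with the best observed rate.
        return max(self.shows, key=self.rate)

    def record(self, variant, converted):
        self.shows[variant] += 1
        if converted:
            self.wins[variant] += 1


def simulate(true_rates, rounds, seed=0):
    """Run the bandit against made-up conversion rates; return show counts."""
    rng = random.Random(seed)
    bandit = EpsilonGreedyBandit(list(true_rates), epsilon=0.1, rng=rng)
    for _ in range(rounds):
        variant = bandit.choose()
        bandit.record(variant, rng.random() < true_rates[variant])
    return bandit.shows
```

    Calling simulate({"A": 0.05, "B": 0.10, "C": 0.02}, 10000) returns how often each variant was shown. Unlike a fixed A/B split, traffic shifts toward whichever variant currently looks best, which is why any number of variants can be tested at once as long as there is enough traffic.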
  • Studied the Fatcache documentation. What is it? Well, it's "memcache" for SSD drives. No, not memcache in front of SSD drives; it's an SSD cache for any slower subsystem. I still think that I have to study memcache's eviction policies more carefully, as I don't actually know exactly how memcache's cache eviction works. Otherwise Fatcache looks like a great idea, and it's just yet another cache tier in a multi-layered cache / data storage system.
  • Many people don't seem to get how caching works at all. They always claim that it's important to have a large SSD drive, blah blah, etc. I personally think you don't need a large SSD: if you simply use the SSD as a write-back block cache and use SARC for reads, you'll get really effective block caching in front of your primary data storage. And you'll save a ton of money. For most normal desktops, a 128 GB SSD and a 3 TB storage disk is enough. Blocks which are regularly used are stored on the SSD and the rest on the storage disk. I'm 100% sure that automated caching does a better job at this level of detail than any nerd who tries to manually optimize which data is on the SSD and which is on the secondary disk. It would actually be fun to see how great the difference between an "intelligent human" and automated caching would be. The only risk is huge reads, which might completely flush the SSD cache if pure LRU is used. Like repeatedly reading a 1 TB file several times from the storage disk.
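    The pure LRU risk mentioned above is easy to reproduce with a toy model. The sketch below is a made-up block cache in Python (not SARC and not any real caching product); the capacity of 100 blocks and the block numbers are arbitrary assumptions.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy pure-LRU block cache that only counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.blocks[block_id] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used


def replay(cache, block_ids):
    """Read every block once; return (hits, misses) for this pass only."""
    hits_before, misses_before = cache.hits, cache.misses
    for block_id in block_ids:
        cache.read(block_id)
    return cache.hits - hits_before, cache.misses - misses_before


cache = LRUBlockCache(capacity=100)
hot_set = range(50)            # small, frequently used working set
big_file = range(1000, 2000)   # sequential read 10x larger than the cache

replay(cache, hot_set)          # warm-up pass, all misses
print(replay(cache, hot_set))   # (50, 0): the working set is cached
print(replay(cache, big_file))  # (0, 1000)
print(replay(cache, big_file))  # (0, 1000): re-reading never hits
print(replay(cache, hot_set))   # (0, 50): the hot set was flushed
```

    The second pass over the big file gets no hits at all, because each block is evicted before it is needed again, and the useful hot set is gone afterwards. Scan-resistant policies exist to avoid exactly this pattern.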
  • TorBirdy, a TorButton for Thunderbird. - Nice.
  • Firefox OS, Tizen, Jolla, Ubuntu Mobile OS and Bada are coming? Who's going to develop native apps for every environment? Here's the Promise of FFOS. It remains to be seen if end users really care about new alternatives.
  • Article about pen testing and security drills. (Penetration testing, Zero-day exploits, backdoor code, drill, hackers, cyber security, CERT, Integrity testing, Stress testing) Basic stuff: systems can be attacked on multiple levels. Can anyone actually protect their systems from hackers anymore? Things are simply too complex for anything to be called secure.
  • The Mega vulnerability reward program has quite a nice list of security severity classes:
    Severity class VI: Fundamental and generally exploitable cryptographic design flaws.
    Severity class V: Remote code execution on core MEGA servers (API/DB/root clusters) or major access control breaches.
    Severity class IV: Cryptographic design flaws that can be exploited only after compromising server infrastructure (live or post-mortem).
    Severity class III: Generally exploitable remote code execution on client browsers (cross-site scripting).
    Severity class II: Cross-site scripting that can be exploited only after compromising the API server cluster or successfully mounting a man-in-the-middle attack (e.g. by issuing a fake SSL certificate + DNS/BGP manipulation).
    Severity class I: All lower-impact or purely theoretical scenarios.
  • Confirmed that X-Frame-Options and X-XSS-Protection are both correctly configured on my sites.
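    A check like this can be automated. The following Python sketch is not the method I used, just one way to do it; the function name and the set of accepted values are assumptions (DENY or SAMEORIGIN for X-Frame-Options, "1; mode=block" for X-XSS-Protection), so adjust them to your own policy.

```python
# Assumed policy: which values count as "correctly configured".
EXPECTED = {
    "x-frame-options": {"deny", "sameorigin"},
    "x-xss-protection": {"1; mode=block"},
}


def check_security_headers(headers):
    """Return a list of problems found in a mapping of response headers."""
    # Header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    problems = []
    for name, allowed in EXPECTED.items():
        value = normalized.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    return problems


print(check_security_headers({
    "X-Frame-Options": "SAMEORIGIN",
    "X-XSS-Protection": "1; mode=block",
}))  # []
```

    To test a live site, the headers can be fetched with the standard library, for example dict(urllib.request.urlopen(url).headers.items()), and passed to the function.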
  • There's a strange bug, or maybe it's a feature, in Dolphin Browser. I have selected that I want new tabs to be opened in the background. Earlier this meant that whenever a new tab was opened, it was opened in the background. But now something has changed, and every link (even without a request to open it in a new tab) is opened in a new background tab, even if it was a regular link that should have been followed in the current tab. Let's see what other users think about that and if they'll change it back in the future. It was super confusing: I clicked several links repeatedly and wondered why nothing was happening, until I found out that I had tons of new tabs open in the background.
  • Tested with one target domain of my own and about 50 clients running bots with different request types. How fast are requests blocked, what kinds of requests, and when? What are the criteria for blocking? What if the requests come from a large set of individual IPs? How well does known-good request filtering work if there is a large number of slow requests with some previously unknown characteristics? What if GET requests contain additional parameters to bypass the site's page / request caching, or if existing parameters are modified to cause a continuous stream of failed requests? On some platforms even these failed requests consume quite a lot of resources, as long as the same failing requests aren't repeated all the time. After some testing it seems that it's possible to generate permutations which change the requests over time so that they won't get caught easily. Traditional passive overloading, where you just flood similar requests from multiple clients at one target, doesn't work very well. It seems that if the origin host for the site does not have adequate resource reserves, it's easy to make the site slow or even cause a denial of service. With sites that have reserves it's considerably harder, and at the very least you need a larger pool of clients so that none of the clients trips the protection. For some sites, making legitimate requests for older, infrequently accessed data seems to work very well, because the server caches can't satisfy these requests. In these cases the only difference between these requests and other requests is that I know the information isn't cached on the origin server or in the CloudFlare network. It's quite hard to detect this kind of attack as an attack, because it's just a larger number of clients accessing non-cached data at relatively slow request rates. Of course some sites can help CloudFlare by telling in the response whether the data was cached or not. This helps to identify clients which bombard the site with requests that have non-typical data access patterns.
    X-Cache: MISS from
    X-Cache-Lookup: HIT from
    Age: 14
    X-Cache: HIT from
    X-Cache-Lookup: HIT from
    X-Cache: MISS from
    X-Cache-Lookup: HIT from
    Via: 1.1 (squid/2.7.STABLE9), 1.0 (squid/2.7.STABLE9), 1.0 (squid/2.7.STABLE9)
  • Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS), Advanced Evasion Techniques (AET), intrusion detection system evasion techniques. It all seems simple and straightforward at the theory level. Of course, knowing all the details and how each individual device and operating system works is a completely different story. But that information is available through searches and plain research, if you know which systems you're targeting.
  • Played with Python and raw sockets, which allowed me to open tons of simple TCP sessions without loading my own system with sockets. Because we all know that a plain SYN flood isn't going to work, the sessions just have to be handled a bit further, until a suitable break point is found. Or the packets can simply be handled asynchronously on an event basis, so that a huge number of connections can be kept open without loading the local system too much.
  • Checked out the Star Wars route. I immediately wondered why they needed two routers to do that. Any hardware / software can be simulated using raw sockets, so a Raspberry Pi would have worked just as well as two Cisco firewalls.
  • Playing with reverse DNS names (rDNS) is also fun. Some badly designed services trust reverse names, which allows all kinds of nice tricks, as does providing fake ident service (identd) data.
  • Studied Continuous integration (CI)
  • Found out that ZyNOS has issues with DNS caching and round robin. If I have configured 10 A records for one DNS name, the Zyxel will only store the first of those in its cache and return it to all clients, therefore breaking the round robin and whatever load balancing or redundancy it might have been used for.
  • Found out that some Telewell models have IPsec related memory leaks: renegotiating the IPsec connection over and over again leads to an out of memory situation.
  • I have watched many Air Crash (Mayday) shows. It's interesting to see how minor mistakes mount up and cause a major catastrophe. I'm also very curious about the usability issues, like how hard it is to read some meters or to see if a switch is on or off. So that TV show is important even for IT guys. Don't design programs which confuse users. Make user interfaces clear, present all vital information correctly so that it is easy to understand, and don't flood users with useless or false information.
  • Google+ Platform: Especially the Pages API seems to be interesting; otherwise read-only APIs are quite useless. It seems that it requires whitelisting, meaning access is specially granted after an application has been submitted and hopefully accepted.
  • One guy lost his disk encryption key... This is exactly why I always keep a paper backup of the master passkey. But the paper backup is encrypted with light encryption. Why not use a strong one? It really doesn't matter: the master password is a random string, 16 chars long. It's then encrypted with a simple phrase, using substitution, partitioning and transposition. After those steps, I'm confident that the password on paper is utterly useless to anyone who doesn't know how it is encrypted and what the simple pass phrase is. The backup key is also hidden outside any reasonable search area.
    You could also use very simple methods like reversing the case of a random password, swapping parts, or adding or removing something you know. Take a prefix to strengthen the password, for example: you just always write passwordpassword (or something similar) and then add your real password. Without the attackers knowing it, your 6 chars long random password f8Snb3 is now 22 chars long. Don't use any of the schemes mentioned here; make up your own.
    The password container software is configured to run about 10 million strengthening iterations on the password before it's used. This means that it takes about two seconds to verify one password. (Yeah, of course depending on many factors.) - Password strengthening can be done using memory hard problems, like scrypt, which is way better than options which only consume pure processing power. (Read about memory hard problems)
    You should also be aware of the corruption risk of encrypted data. Therefore it's better to always have an off-site backup set with different encryption key(s). I usually do not renew both keys simultaneously, so I can reasonably recover from the backup even if I lose the master key.
    Of course you can also use an indirect method, where you map numbers and letters to pages, rows and character positions, so the password on the paper has absolutely nothing to do directly with your password. Do the mapping so that the distribution is even and it's not obvious that the values are offset references. Then you just know that when you pick (PDF / book / file / source code) X and start applying your code, you'll get your password.
    Generally my absolute minimum length for passwords is 12 random chars, and for master keys I prefer 20. For keys that I don't need to remember, I use 32 random chars. If you're using AES-256 and prefer to have 256 bits of entropy in your password, use a fully random password of 40 characters (including a large set of special characters) or more.
    Giving the password to a lawyer is a good idea if you want someone to have your password in case something bad happens to you. Otherwise it's totally pointless. If I'm gone, my (private) data is gone, and that's it.
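    The password lengths above can be checked with a bit of arithmetic: a fully random password of length L over an alphabet of N symbols carries L * log2(N) bits of entropy. The Python sketch below assumes the 94 printable ASCII characters (letters, digits and punctuation) as the "large set of special characters"; the function names are made up for this example.

```python
import math
import secrets
import string

# Assumed alphabet: 52 letters + 10 digits + 32 punctuation marks = 94 symbols.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def entropy_bits(length, alphabet_size=len(ALPHABET)):
    """Entropy of a uniformly random password of the given length."""
    return length * math.log2(alphabet_size)


def min_length_for(bits, alphabet_size=len(ALPHABET)):
    """Shortest random password that reaches the requested entropy."""
    return math.ceil(bits / math.log2(alphabet_size))


def random_password(length):
    """Generate a password with a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


for length in (12, 20, 32, 40):
    print(length, "chars:", round(entropy_bits(length), 1), "bits")
print("chars needed for 256 bits:", min_length_for(256))
```

    With 94 symbols each character is worth about 6.55 bits, so 40 characters is the first length that reaches 256 bits. With letters and digits only (62 symbols), min_length_for(256, 62) gives 43 characters instead. This only holds for truly random passwords; the prefix and case tricks above add obscurity, not entropy that can be calculated this way.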
  • Finished reading Christensen - Innovator's Solution.
  • Google Cloud Platform: Studied BigQuery Best Practices, Tutorials. Read BigQuery Cookbook. Uncompressed vs Compressed data formats. Data de-normalization. Schema, data conversion, xml, json, csv, query basics. API V2 overview, Google Compute Engine, Overview, Main Concepts, Instances, Images, Networks and Firewalls, Disks.
  • Why are people upset about Facebook's Graph Search? Isn't it clear that whatever they send to Facebook should be considered public information anyway? So what's the news?
  • More security reading, Lucky Thirteen: Breaking the TLS and DTLS Record Protocols. The attack doesn't seem quite feasible in practice, but who knows, maybe it can be improved.
  • Found the performance limits of one firewall, which is quite important. For some reason the vendor promises 50 Mbit/s of IPsec throughput, but with small packets it stalls much earlier (CPU maxed out). I didn't find any way to work around it. It seems to be a good time to replace the firewall with a new one, which would also support OSPF, SSL tunneling and naturally IPv6, as well as intrusion detection features.
  • Studied: Ulteo Open Virtual Desktop, rdesktop, xrdp, Ericom, TS RemoteApp, RDP, Citrix, 2X, ThinStuff - Tested the solutions with Windows 2008 R2 Datacenter and Windows 7 Professional (64-bit) - ThinStuff is excellent for light remote desktop virtualization. I'm just not 100% sure whether it fully complies with Microsoft TS licensing.
  • Finished reading the Spanner: Google's Globally-Distributed Database whitepaper. (PDF) (TrueTime, NewSQL, Paxos, NoSQL, MapReduce)
  • From developer magazines I read the Cloud Services and Strong Identity article, as well as Possibilities of Future Internet Payments. 4G, the next generation of wireless communication. A long BIG DATA article. A hacker's view: how to utilize open public data. From wage slave to millionaire: making your startup business successful.

I think this is a summary of a bit more than two weeks.