
Topic mega dump 2014 (2 of 3)

posted Jan 31, 2015, 9:29 AM by Sami Lehtinen   [ updated Mar 2, 2015, 7:52 AM ]

Dumping more, as fast as I can.

  • Discussed with a few guys team formation, company formation, initial fundraising, the initial business plan, technology and customer validation, key idea development and early prototype development; used a lean canvas for that. Also product requirement specification, architecture and product design, and customer validation.
  • Went through Windows 2012 Datacenter server audit settings.
  • Understanding the customer's point of view, end-user experience and product usability. Technical sales support tasks. Key customer responsibility. Customer contacts. Service desk experience. Wide-ranging experience in different roles. A let's-get-this-done attitude. I don't spend days wondering why we can't do something if there's no real reason. Just do it. Self-guided: I'll study whatever I think I need to know to get the job done.
  • Read: Pathak, Hu, Zhang (2012). Where is the energy spent inside my app? - The study shows that the advertising components of free applications use a surprisingly large share of a smartphone's power.
  • Read a long Finnish article about cloud service security myths. (Sytyke 4/2013) - I agree with them. In the private cloud meeting, I did mention that it really doesn't matter whether a public or private cloud is being used, because by far the highest real risk comes from extremely insecure configuration mistakes on production systems. Like using bad passwords or, even worse, not enabling authentication or encryption at all. Or using administrator accounts with ridiculous passwords. Or major design flaws in the application. - Using a private cloud doesn't change a thing.
  • Global IPv6 statistics - IPv6 is coming, slowly...
  • Bluetooth Low Energy (BLE)
  • Wireless power consumption study by Microsoft - BLE, ZigBee and ANT being compared.
  • Technology is getting cheaper all the time. This should enable individual secure authentication tokens for consumers. Maybe using an e-ink display with a fingerprint reader and so on. Having separate individual hardware as a trusted computing platform could bring that security to computers, phones and tablets.
  • Read about: UWB, ZigBee and Scatternet, 802.11ac
  • Note about BLE: "Bluetooth LE comes with very low, and in many cases broken security. BTLE or Bluetooth Smart, is a new modulation mode and link layer packet format targeting low powered devices and is found in recent high-end smart phones, sports devices, sensors, and will soon appear in many medical devices. Unfortunately the security implementation is broken with the encryption of any Bluetooth LE Energy link easily being rendered useless. This flaw in security can allow any device nearby to eavesdrop on Bluetooth Low Energy conversations. This includes packets being intercepted and reassembled into connection streams, as well as injection attacks." - Ouch!
  • One way to fix the mobile authentication security issues would be having a separate device with a display, a crypto engine and a Bluetooth LE connection via the mobile phone. It would be quite secure, yet fully connected to the internet via a device which takes care of the networking, which requires more energy than Bluetooth LE.
  • Event-based One-Time Password (OTP), Time-based One-Time Password (TOTP), Transaction Authentication Number (TAN), Two-step verification, PIN control mandatory/optional, Security Question, Challenge-Response, Mutual Authentication. (Multi-factor, multifactor, twofactor, two-factor, authentication) - Man-In-The-Middle, DNS Cache Poisoning, Trojans, Man-In-The-Phone, Browser Poisoning - Excellent! Finally a solution which actually addresses these issues. Many 2FA alternatives claim to fix them, but it's not true. As long as the user authenticates "something" without knowing exactly what is being authenticated - that is, a lack of transaction data verification and signing - most MitM attacks (however they are technically executed) do work. Their solution is also OATH compliant.
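The TOTP scheme in that list is simple enough to sketch in a few lines; this is a minimal version of RFC 6238 over HMAC-SHA1, not any particular product's implementation:

```python
import hashlib
import hmac
import struct
import time

def totp(secret, timestep=30, digits=6, now=None):
    """Time-based One-Time Password (RFC 6238, built on RFC 4226 HOTP)."""
    counter = int((time.time() if now is None else now) // timestep)
    msg = struct.pack(">Q", counter)                  # 8-byte big-endian counter
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: secret "12345678901234567890", T = 59 s -> "94287082"
print(totp(b"12345678901234567890", digits=8, now=59))
```

Note that this only proves possession of the secret within a time window; exactly as the entry says, without transaction data signing a MitM can still relay a freshly generated, valid code.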
  • SQLite3 page size - 1k, 4k vs 64k depends on usage. Note that it also grows the memory requirement of the cache unless that is adjusted, because the default cache is 2000 pages.
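The page size / cache interaction is easy to see from Python's sqlite3. A sketch (the 2000-page default mentioned above is what SQLite shipped with at the time; newer versions express the default cache in KiB instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA page_size = 4096")   # must be set before the first write
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v TEXT)")

page_size = conn.execute("PRAGMA page_size").fetchone()[0]
cache = conn.execute("PRAGMA cache_size").fetchone()[0]
# With a page-count cache, RAM use is roughly page_size * cache_size, so going
# from 4k to 64k pages grows the cache 16x unless cache_size is lowered too.
print(page_size, cache)

conn.execute("PRAGMA cache_size = -16000")  # negative value = cap in KiB, not pages
```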
  • My journey into FM-RDS [30c3] - Well, I assume everyone has seen this already. But in case you haven't, it's worth checking out. I think it shows great attitude - how hackers can get things done. I'm sure it won't impress the designers of 4G LTE, future 5G or massive-MIMO radio engineers. But it's just a great combination of hardware, programming and analysis.
  • Quickly checked out Coreboot
  • Using synthetic aperture radar techniques for high bandwidth directional communication from point to multipoint.
  • People really should think twice before sending any private information to Dropbox. And come to the very clear conclusion that it shouldn't be done! Btw, the same rule applies to email and many other cloud services.
  • "Disabling inbound SSH has just been a way for me to stop myself cheating with automation (Oh, I'll just SSH in and fix this one thing)." - This is the attitude I love! Just don't fix it temporarily. Fix it for good!
  • Excellent post about AWS Tips - Many of the things mentioned are generic best practices so shouldn't be news. Security, Automation, Monitoring, CloudWatch, Scale horizontally, Naming convention.
  • Why PATCH is good for your HTTP API
  • IPv6 NAT (NAT66) could help solve a big problem: cheap multihoming for outward connections. The case where a business / individual needs a redundant, load-balanced connection that costs only twice as much as a normal connection and a fraction of what a BGP-multihomed setup would cost.
  • Lack of encryption isn't even the big fail. Much worse fails are services that are publicly and totally freely accessible via the Internet without any authentication. One IT guy told me their remote access system was working very well when I told them they really should fix it. He didn't realize it was working too well: anyone in the world could access their servers using plain standard VNC without any restrictions or passwords. Well, that's the state of default security. Often these remote access tools are installed under the system account, so you can actually use them to reset the administrator password too.
  • Studied Visa's Verified by Visa and MasterCard's SecureCode, which both add an extra authentication layer when a credit card is used. In many cases this means TUPAS authentication in Finland, which is considered very secure due to random OTP verification codes.
  • Indirect attacks, where secondary targets with lower security are attacked first and used to gain access to the primary target via VPN, software updates or other alternate interfaces which aren't directly available.
  • Studied Python Scrapy and Pandas
  • Sleeping on it before deploying into production is a good idea. It seems that my integration software had a bug which wasn't caught by the automated tests. But during the night I had a dream about a bug. First thing in the morning I checked whether the bug really was there. It was! So I fixed it first thing in the morning, before pushing the code into production. - Dreams can solve problems.
  • A Smarter Bear - Visualizing the Interactions Between CAC, Churn and LTV  -  Absolutely marvellous post!
  • HTML5, WebRTC, STUN, TURN, Data channels - WebRTC can be used for so many different purposes. Especially to build in browser P2P / Mesh and CDN networks.
  • Studied PeerCDN - but now, in 2015, it seems to have been acquired by Yahoo.
  • Checked out MsgPack even though I don't have a real use case for it right now. - I'm pretty happy with JSON, and if I need to store something in minimum space, then liblzma helps.
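The JSON-plus-liblzma combination from that entry looks roughly like this in Python (a sketch with made-up sample data; liblzma is what backs the stdlib lzma module):

```python
import json
import lzma

record = {"id": 12345, "name": "example", "tags": ["alpha", "beta", "gamma"] * 40}
raw = json.dumps(record).encode("utf-8")
packed = lzma.compress(raw)               # liblzma, stdlib lzma module
restored = json.loads(lzma.decompress(packed))

print(len(raw), "->", len(packed))        # repetitive JSON compresses very well
assert restored == record
```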
  • Had problems with a few servers for several hours due to massive DDoS attacks. The attacks weren't actually targeted at any of my servers, but they did flood the incoming connections of the whole data center.
  • Mbox - lightweight sandboxing - Nice!
  • Only 90s Web Developers - Excellent technical examples of the web history. Made me smile!
  • MongoDB 2.6 highlights. - I really like the min, max and index improvements.
  • Read about and played with Telegram and Threema
  • Wanted to write some thoughts about system security and corporate politics, but I guess I've made my point already.
  • Checked out service provider Otaverkko.
  • Checked out SecureDrop
  • Read about network intrusion detection systems (NIDS), host intrusion detection systems (HIDS), network-based intrusion prevention system (NIPS), intrusion detection and prevention systems (IDPS)
  • Cryptographic software obfuscation
  • Even with technically educated people, it's really important not to forget to keep things simple and get the default settings correct for applications. - KISS! Lack of sane default settings can lead to serious configuration and security issues.
  • F-Secure Whitepapers
  • Tried to study Boneh–Lynn–Shacham (BLS) - Too deep stuff for me.
  • Database trickery continues; it's just so wonderful. There are two versions of the same table in the database, old and new. One is called customer and the other customers. The customer table is an old copy of the customers table. It's just so wonderful that misleading and confusing naming is used throughout the project. In some cases tables are dbo.tablename and in some cases company.tablename, and the differences in naming conventions go on. All of the references don't indicate whether it's the dbo or company version of the table they're referring to, etc. I just can't stop loving deliberately (?) confusing solutions. Reminds me of one funny article about how to write absolutely unmaintainable code by using confusing naming for everything on purpose.
    What is this code for?
    def do_very_important_things( input, output ):
      important_data = get_data( output, input )
      return important_data
    Now of course this stores input to the database using output as the key, and important_data isn't data at all; it's the boolean status telling whether saving the data was successful. Are we having fun yet?
  • Wondered how engineers and testing can produce such an outcome:
    Basically, if we have transaction IDs like 1, 2, 3 ... 10 with a monotonically increasing counter, and there's a parallel task exporting the data in those transactions, they manage to skip one transaction after every export. Yes, it's a classic off-by-one, but how is it possible that nobody notices it for ages?
    As an example, if we generated transactions 1..10, after exporting those during the day we might actually end up exporting only transaction IDs 2, 5, 8, 9, 10. Ugh! Ok, clear fail.
    The thing gets even more curious when you ask whether this stuff should be working. Everyone says it's working, tested and being used in production, so it must work. But no, it's not working! Close examination reveals that it hasn't actually ever worked.
    AFAIK, this isn't how systems should work. Nor should anyone claim that such systems are working, production quality stuff.
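The skip pattern is trivial to reproduce. A hypothetical sketch (not the actual system's code) where the export cursor resumes one ID too far:

```python
def export_new(transactions, last_exported):
    """Buggy exporter: the comparison should be `t > last_exported`,
    but the `+ 1` silently drops the first new transaction of every run."""
    return [t for t in transactions if t > last_exported + 1]

exported, last = [], 0
for days_batch in ([1, 2, 3], [4, 5, 6], [7, 8, 9, 10]):
    batch = export_new(days_batch, last)
    exported.extend(batch)
    if batch:
        last = batch[-1]

print(exported)   # [2, 3, 5, 6, 8, 9, 10] -- IDs 1, 4 and 7 are gone for good
```

The tell is exactly what the entry describes: the number of gaps grows with the number of export runs, so casual testing with a single run "mostly works".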
  • Well, excellent question. Actually I've been planning to blog about this, but it seems it's still in my blog backlog. So I'll give you a short write-up here. I've been working around it using several different methods. Some are more successful than others, but each one has its own place when required.
    What would you consider to be a good method?
    Method 1. "Walk the IDs".
    SELECT * FROM Table WHERE Key=12345678;
    If data is returned, I run the query again with Key += 1.
    Method 2. "Maxing it out, why?"
    SELECT * FROM Table WHERE Key BETWEEN 12345678 AND (SELECT max(Key) FROM Table);
    Nice idea, really nice idea. But this is going to trigger a full table scan. Twice! So if you use this to get data with a seemingly smart range select, you're going to have a really bad time, triggering two full table scans per case. Yes, someone really smart used this SQL code. Leading to catastrophic performance results.
    Method 3. "Use some other field with high cardinality and a similar kind of ordering"
    SELECT * FROM Table WHERE SomeOtherKey>=Something AND Key>=12345678;
    This method works well and allows me to get new data. But it doesn't actually allow tailing the table without being sure that SomeOtherKey tracks the database keys closely enough. Basically this method requires me to maintain a secondary table where I keep the keys of this table which I have already processed, so I don't end up processing data twice. If SomeOtherKey doesn't track the table IDs closely enough, there might be a situation where some data is unfortunately left out from processing. So using this method requires you to know your data and the related system functionality pretty well.
    Method 4. "Optimizing method 1."
    SELECT * FROM Table WHERE Key=12345678 UNION SELECT * FROM Table WHERE Key=12345679 UNION ...
    If I get data with that method 1 query and I don't want to make as many query round trips to the server, I can get data more efficiently by using this absolutely beautiful query, just adding 100 ... 1000 UNION statements and seeing what's returned after that. Then I just record the last record returned, and next time I can query it using method 1 again.
    Now I would really like to get some feedback. Did you cry, facepalm, smile or laugh when you were reading this post?
    Depending on the situation I'm using methods 3 and/or 4. I know it's not optimal, but it's what works best in the current situation. Unfortunately I can't affect which columns are indexed and which aren't, nor add new columns / indexes for optimization purposes.
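For what it's worth, here's a runnable toy of the method 1 / method 4 walk against sqlite3 (the schema and names are invented; in the real case above the key column and indexes are outside my control):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (key INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO txns VALUES (?, ?)",
                 [(k, "row-%d" % k) for k in range(1, 251)])

def fetch_window(conn, start_key, size=100):
    """One round trip probes a whole window of keys, like the 100...1000
    UNIONed point queries, but as a single bounded index range scan."""
    return conn.execute(
        "SELECT key, payload FROM txns WHERE key BETWEEN ? AND ? ORDER BY key",
        (start_key, start_key + size - 1)).fetchall()

last, seen = 0, []
while True:
    rows = fetch_window(conn, last + 1)
    if not rows:
        break
    seen.extend(k for k, _ in rows)
    last = rows[-1][0]       # remember the last record, resume from there

print(len(seen), seen[0], seen[-1])   # 250 1 250
```

The bounded upper limit is what separates this from method 2: there's no max(Key) subquery, so an indexed key never forces a full table scan.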
  • Had some fun while configuring a program to print graphics with a receipt printer (without a driver).
    Their documentation is horrible. But as an old binary / serial data fox, I managed to get things done even with bad documentation. I just can't imagine how many "IT guys" would have claimed that it doesn't work - when it does, exactly as they say in the manual, when you just know how to do it. It seems that many people are clueless about binary protocols and serial communication, even though TCP/IP is also serial communication.
    The manual specifies the command needed to print a logo, where pL,pH is the data length in little-endian format. In this case it should read 0600 (in hex), which basically means that I'll be giving 6 bytes of parameters. Next there are the op codes 48,69, which are actually in decimal. Ok, convert those into hex, 0x30,0x45, and then comes the most interesting and worst documented part. The document simply says that you have to fill the kc1 and kc2 bytes with the graphics ID you're going to print.
    The program used to store the graphics to the printer's flash memory only shows bitmap image IDs like 10 and 11 when storing two images to internal flash memory. But the documentation says that kc1 and kc2 have to be in the range 32<=X<=126 ... Hmm, that's strange! I tried all kinds of stuff and, alas, didn't produce any kind of result. Then I got a bit upset and said that I'll make this enraging printer sing, whatever it takes.
    Then I wrote a little program which simply goes through all possible logo IDs. Two debug printout receipt rolls later, I knew the answer. The right code for printing the logo in the first logo position (10) is 31,30. AFAIK that doesn't make any sense, because the lowest possible address would of course be hex 20,20, and I found the logo graphics at position 31,30. What kind of logic have these engineers got? I still don't know why kc1 has to be 31, but it's also possible that the address 10 they're reporting for the first logo is actually 0x10, which added to 0x20 makes sense as 0x30.
    The final magic touch? How easy is it to print a small image at the start of a receipt? Well, that's really easy.
    This is the final hex string: "1D284C0600304531300101", and it works just as expected. Who needs Windows printer drivers? Smile.
    As usual, this was only the technical part. There were many other interesting and not so interesting things, like... running out of receipt paper, the Windows printer spooler not allowing the print job to be canceled and even restarting it from the very beginning, Windows 8.1 usability issues, USB virtual COM problems, etc.
    Also, in the documentation the fields x and y are narrower than other fields, and the x and y values can only be 1 or 2 (a boolean with strange values?). Does this mean that x is a byte? Or maybe it's one hex char? Because in some cases there are fields which are bitmapped, like command 1B,21, where the next parameter byte actually is bitmapped and uses all 8 bits. So it would also be reasonable to expect that x,y could be given either as 00,01,10,11 or 0-3 (decimal/hex) in binary, or as single hex signs. But after I studied the manual hard and looked for other examples, it became clear that those are full bytes, so only the values hex 01 and hex 02 are allowed, and x,y is two bytes. This of course also affected the value for pL,pH.
    I just so much wish that the documentation would make it clear when they're talking about bytes, bits, decimal or hex. Or at least provide working example source code in almost any language, which makes it easy to check from the code how things should work when the documentation is bad.
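The whole command can be rebuilt from those pieces. A sketch that reproduces the final hex string above (the leading bytes 1D 28 4C are the ESC/POS "GS ( L" function header; the parameter meanings follow the manual as described):

```python
def print_logo(kc1=0x31, kc2=0x30, x=1, y=1):
    """Build the 'print stored logo' command: GS ( L pL pH m fn kc1 kc2 x y."""
    params = bytes([48, 69, kc1, kc2, x, y])       # m=48, fn=69, given in decimal
    pL, pH = len(params) & 0xFF, len(params) >> 8  # length, little-endian: 06 00
    return bytes([0x1D, 0x28, 0x4C, pL, pH]) + params

cmd = print_logo()
print(cmd.hex().upper())   # 1D284C0600304531300101
```

Sending `cmd` raw to the serial / USB-COM port is all the "driver" a receipt printer needs.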
  • I tried doing one very simple thing: printing wide pages on a label printer. And it seems completely impossible with Office and the printer driver. Probably some engineer has implemented too-high-level logic and automation, which unfortunately screws up even a very simple configuration, completely preventing correct printing.
    It's impossible to configure Office to understand that a page could be wider than it is long. It always corrects this 'error' by using landscape mode and flipping length and width. But that's totally wrong, because in this case the pages actually are wider than long. I never asked for any darn landscape. But alas, it doesn't work; I have tried multiple tricks but nothing works with Office. It's likely the printer driver causing the problem, because printing from other programs works just as expected.
    Actually, after playing again with this "#%"#, I came to the conclusion that the printer driver is utter "#x"#. Why? Well, in some cases when I print out labels and start a print job, it gets rotated 90 degrees to "landscape", as I mentioned earlier. Then I just hit print again, and it comes out ok. And hit print again, and it's again rotated 90 degrees. And the next one. Aww. But if I choose to print copies or just print multiple labels in a batch, those are all correctly aligned. So what if I need to print 1000 labels, all as portrait? That's easy: I just print the first page of that batch out, and it comes out as landscape. Then I print out the whole job, or just set 1000 copies of some other label, and those all come out ok. As long as they are all printed as one single print job. Hail drivers, hail Windows, hail Office! If you're asking whether it ever rotates 90 degrees more? No, it's landscape, portrait, landscape, portrait; it's never upside down or rotated 270 degrees.
  • I've been planning to create some kind of web service. I'm just not yet 100% sure what it should be. Something quite simple, but something customers are ready to pay for. Something which wouldn't require a lot of resources or development effort, so I could run it myself.
    Maybe it should be some kind of specialized B2B software which isn't obvious to everyone but still solves a problem for small business customers.
    One business I've been thinking about would be some kind of calendar integration software for the major calendar platforms. But I'm quite sure that need has already been fulfilled. A really easy way to integrate and send calendar reservation information. I know there are plenty of calendar apps, but most of them aren't integrated. Maybe building a simple API to deliver that calendar information would be nice?
    Actually it's part of the other plan, which is simplifying a complex process. If some businesses provide APIs which are really complex, I could simplify them so that they're easy to use for everyone. That's what the N+1 SMS sending businesses have done. Basically anyone could build an SMS gateway, but it's so complex that it's good business to provide a simplified version of it for a fee.
    That's why I started to think about a Mobile IPv6 gateway. It would require setting up a few primary components, basically a home agent (HA), and after that users would use whatever connection they have to use the IP addresses assigned by me. So the actual data flow wouldn't pass via my service, making it easier and cheaper to provide than a VPN. But of course there can still be multiple restrictions on which networks support it, and even the protocols aren't widely supported.
    Would you happen to have any ideas? Just asking. Is there a problem which hasn't already been solved, which is semi-complicated and could be simplified? Just mail me your ideas.
  • Had multiple meetings with one old friend in Finland. Would Finland / the EU area need another SaaS service which would generate a quite simple discounted cash flow analysis? Why? Well, he's an economics teacher and found out that there are tons of small businesses which don't do it, and many businesses fail each year because they haven't done such an analysis. If they see money in the bank account, they think it can be spent, even if there are still unaccounted costs to be paid. After thinking this proposition over, I concluded that a good implementation would require too many resources and forming a proper startup team. In this case the idea isn't the key, as usual. Great execution, contacts and the right marketing channels are the trick. As well as absolutely excellent usability and user experience on all kinds of devices, which just happens to be quite a way outside my HTML5 skill set.
  • Long discussions about how much and what data to log. I've heard so many times that it's working and we can't fix what isn't broken. That conclusion is only possible because the production environment isn't logging enough data. So even if things fail, nobody actually knows it. Only when things are really bad does someone start to investigate what went wrong, and then, unable to reproduce the problem, this analysis is used to conclude that there wasn't a problem in the first place.
    As an example, one project only made crash dumps. But when it crashed badly enough, it didn't create the dump. Conclusion? It must be working, because there aren't any dump files. Sigh. How about logging and persisting to disk, and then deleting the log if everything goes through successfully?
    Maybe logging is just rocket science? - No, it really isn't, but it still seems to be quite hard.
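The "persist, then delete on success" idea could look like this (a sketch; the function and file names are mine):

```python
import logging
import os

def run_job(work, log_path="job.log"):
    """Log everything to disk during the run; keep the file only if the
    job fails, so even a hard crash mid-run leaves evidence behind."""
    handler = logging.FileHandler(log_path)
    log = logging.getLogger("job")
    log.addHandler(handler)
    log.setLevel(logging.DEBUG)
    try:
        log.debug("job starting")
        result = work()
        log.debug("job finished ok")
    except Exception:
        log.exception("job failed, keeping %s for post-mortem", log_path)
        raise
    finally:
        handler.close()
        log.removeHandler(handler)
    os.remove(log_path)   # success: the log has served its purpose
    return result
```

The crucial difference from crash dumps is that the log is written before anything goes wrong, so "no file" can only mean success, never a failed dump.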
  • Data Exchange Layer X-Road - Estonian Information System Authority - Watched a Finnish documentary about X-Road, using BizTalk and an internal data structure & the national service channel (palveluväylä in Finnish).
  • Played a little with Google App Engine's EdgeCache. Seems to be working. Unfortunately free apps can't utilize it. Basically what it requires is Pragma: Public and Cache-Control: public, max-age=86400 (or whatever you want it to be) headers. So it's similar to common caching, or utilizing CloudFlare or many other CDN networks which work by caching HTTP documents.
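In a handler that just means setting two response headers; sketched here as a plain function (the header values are the ones from the note above, the function name is mine):

```python
def cacheable_headers(max_age=86400):
    """Response headers that let EdgeCache / CloudFlare-style CDNs
    cache the response for max_age seconds."""
    return {
        "Pragma": "Public",
        "Cache-Control": "public, max-age=%d" % max_age,
    }

print(cacheable_headers()["Cache-Control"])   # public, max-age=86400
```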
  • Encountered one file format which was actually ASCII delimited text, not CSV or TSV. It used ASCII codes like 31 Unit Separator, 30 Record Separator and 29 Group Separator, but it didn't use 28 File Separator, because the data for each individual file was written into its own file. Of course you could also use something like a record type as the first column, so you can easily mix data in an ASCII file just as you could in XML or JSON.
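Reading and writing that format needs nothing beyond the separator bytes themselves; a small sketch with invented sample data:

```python
US, RS, GS = "\x1f", "\x1e", "\x1d"   # ASCII 31 / 30 / 29: unit, record, group

def dumps(groups):
    """groups -> records -> fields, joined with the ASCII separators."""
    return GS.join(RS.join(US.join(rec) for rec in grp) for grp in groups)

def loads(text):
    return [[rec.split(US) for rec in grp.split(RS)] for grp in text.split(GS)]

data = [[["order", "1001", "EUR"], ["line", "widget, 3mm", "3"]],
        [["order", "1002", "USD"]]]
assert loads(dumps(data)) == data     # commas and newlines need no quoting
```

Unlike CSV, there's no quoting or escaping layer at all, because the separator codes never appear in normal text.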
  • Played with custom URL schemes, WebSockets, local storage, Brython and x-callback-url. - I still haven't used Brython for anything in production. But after playing with it, I could. The only thing I don't like is the long initialization time for mobile apps. It adds extra JavaScript loading and naturally adds a lot of computational overhead, as well as consuming memory and bandwidth, aka makes the page slow. Maybe I'll use it only for pages which can be slow, like posting new messages or something like that. But for normal "browse it quickly" pages it seems like a bad idea. Especially if the page can't be shown before Brython is fully running.
  • Open Big Data - utilizing global open data resources, Creative Commons CC 4.0, JulkICT, API Strategies, 6AIKA, OpenData Globe, Helsinki Region Infoshare, Open Ahjo, Open Knowledge Foundation Data Ecosystem, Real Time Economy, "My Data" / "Personal Data", Vendor Relationship Management (VRM)
  • Back in the old days I actually logged IRC chats during one Assembly. I think it was Assembly '94 or so. After putting the log on the net, it got an insane amount of traffic, because it contained a lot of talk about all kinds of topics which interested nerds. I should have monetized that somehow.
  • Wondered again why Google Sites leaks draft posts even though they haven't been published yet. There's a workaround: create pages which have access restrictions and save those. Only when the posts are ready, move and publish the pages. But I still don't get why drafts aren't private by default.
  • There's a fail in Pizza Worm. It's actually one of the reasons why it doesn't run well on modern operating systems. Wait, what's that? It's the method I used to protect the high score file from modification. First of all there's a kind of checksum, but there's one additional factor affecting the calculation of the checksum which isn't visible in the data: the last modification timestamp of the file. When the game saves the high score file, it computes the checksum using a time which is on purpose a bit off from the current time, by 4 seconds. After saving the file, it sets the file's last modified timestamp 4 seconds into the past. Unfortunately most modern operating systems prevent this modification step, and therefore the next time the high score file is read, the game rejects it because it detects that it has been tampered with. A great example of how clever tricks can nastily backfire. But it's still a good example of how data can be carried in such a place that many people fail to notice it. A kind of covert channel communication.
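A rough reconstruction of the trick as described (CRC32 stands in for whatever checksum the game actually used; the point is the hidden timestamp input):

```python
import os
import time
import zlib

SKEW = 4  # seconds the checksum timestamp is shifted into the past

def save_highscore(path, data):
    """Checksum mixes in a timestamp 4 s in the past, then the file's
    mtime is set to that same past moment -- the invisible checksum input."""
    stamp = int(time.time()) - SKEW
    checksum = zlib.crc32(data + str(stamp).encode())
    with open(path, "wb") as f:
        f.write(checksum.to_bytes(4, "little") + data)
    os.utime(path, (stamp, stamp))      # the step a locked-down OS may block

def load_highscore(path):
    stamp = int(os.stat(path).st_mtime)
    with open(path, "rb") as f:
        stored = int.from_bytes(f.read(4), "little")
        data = f.read()
    if zlib.crc32(data + str(stamp).encode()) != stored:
        raise ValueError("highscore tampered (or the mtime was not preserved)")
    return data
```

Any edit to the file, even just touching it, changes the mtime and breaks the checksum; and exactly as the entry says, if the OS refuses the backdated utime, the game's own save is rejected too.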
  • Some keywords in ultra compact format: Functional encryption, Identity based encryption, Attribute based encryption, Garbled circuits, Software obfuscation
  • Received some feedback from Pierre Quentel, the Brython developer. Talked about some optimization and comparison things. The main topic was that some comparisons are really simplified and might give a totally wrong impression. Current Brython works like CPython. Writing simple code in JavaScript versus Brython doesn't actually do the same stuff, because Brython does everything CPython does. I personally often like "cutting the corners" optimizations, but I know there are cases where those aren't actually an option. Unfortunately exactly this kind of bad comparison was used in the PyCon 2014 Brython presentation.
  • Refreshed my memory about PostgreSQL Write Ahead Log details.
  • Had multiple long discussions with several friends about Net Neutrality. There doesn't seem to be a simple solution for this matter. I personally like freedom, but I also would like to believe in free markets, which means that if something needs to be priced differently, it should be possible. Either the pricing model is viable or it isn't; if it isn't, competition should take care of the situation. Ok, I know the broadband market isn't nearly liquid enough for this to happen in reality. But that's just what I want to believe in general.
  • In one interesting integration case, we got a great consultant on the team. We had very carefully analyzed the current application features (for several applications), the available APIs and how to utilize existing features to fulfill the customer's needs with a reasonable amount of work. The consultant checked out our plan and wondered for a while. Then he took a clean flipchart sheet, redrew everything and said this is how it should be done. Ok, yes, that's a very good plan in general. But it doesn't work together with the existing applications, and it might require 25x the work of the current plan. To make things even worse, the customer was the typical type: we assume this is ready tomorrow and shouldn't cost more than 1000€. Uuaaah. Creating a new communication hub, a central master data system and new communication protocols, and adding a few data tables, user interfaces and tons of logic to four different applications from different software companies... Well, it doesn't fit that schedule and budget, it really doesn't. But I believe we've all been in such a situation.
  • Lightly checked out Hoodie - Didn't see any use case for it so far.
  • Lightly tried out Firebase. - Didn't see any use cases for it either. The reason? Current systems require massive database access. If the database is hosted "over the internet somewhere", it will mean at least bad round trip times and responsiveness, and even worse, massive bandwidth bills. There are reasons why database servers are usually quite heavily connected to the rest of the server cluster. Those connections are as important for performance as SAN network resources.
  • My experience with Google App Engine tells me that, as usual: cache everything you can, use auto scaling and shut down instances which aren't required. Optimizing queries and database structure based on needs is obvious, of course. Using asynchronous methods whenever possible releases workers for other tasks. If you use NDB, it does internal caching, which the old DB interface doesn't. Keeping a minimum number of workers trades cost for latency; for the highest possible performance, keep spare workers, which of course adds to the bill. Do not index stuff you don't use in your queries; this should be obvious to everyone, but it isn't always. Delete data, or move it to alternate storage, when it isn't needed anymore - I wish many other applications would also do this, because it's useless to keep data which isn't used at all in expensive high-performance storage. Optimize slow requests and, if possible, make your application request multiple resources at once, using the async database API as well as separate iframes or JavaScript data retrieval.

Phew, that's it! The last part is coming soon. For it I still have about 300 entries in the backlog. I just wish I could be more verbose on each topic, but I can't. Sorry.

Edit: See Dump 3 of 3 or Dump 1 of 3.