My personal blog about things I do, like, and am interested in. If you have any questions, feel free to mail me! My views and opinions are naturally my own and do not represent any other person or organization.

[ Full list of blog posts ]

Python, Cloud, Neural Networks, Deep Learning, 2FA, JSON API, Startups, Mobile Web, Kafka, Samza

posted Mar 29, 2015, 12:56 AM by Sami Lehtinen   [ updated Mar 29, 2015, 12:57 AM ]

  • Wondered in a meeting how many different viewpoints can be used to approach the same matter: Marketing, Technology, Helpdesk, End Users, Invoicing, Server Management, Software Developers, System Peripherals, Customer Sectors, and so on. The result? A lot of semi-great discussion, but very few concrete results.
  • Something different: G6, PzH-2000, M109 Paladin, B-1 Lancer
  • Read: 10 Myths of Enterprise Python
  • Checked out: Apache Flink - It surely looks pretty interesting and is something I would use if I needed it. I'll keep this one in mind. - Kw: apacheflink, flink, apache, bigdata, hadoop, inmemory, java, cluster, yarn, hdfs, mapreduce, hbase.
  • Checked out: Google Cloud Dataflow - Hmm, all the GitHub examples are in Java; checked them out but didn't bother to run them. For now I don't see any use for this in the near future.
  • Read: Hackers guide to Neural Networks
  • Read: Deep Learning vs Machine Learning vs Pattern Recognition - kw: Convolutional Neural Nets, ConvNets Big-data, PaaS, Artificial Intelligence.
  • Read: Why does Deep Learning work?
  • Plan: I'm going to focus my studies on Data Analytics and Open Data for a while.
  • Had some fun, as usual, with different kinds of combination, package and recipe articles while doing CRM, BI and ERP integrations. It's one of my favorite topics, because really simple things get so complicated after all. Especially in situations where external data gets referenced by parts of the package but must not be visible to the end user, and so on. Of course, if everything were done from scratch it wouldn't be so hard. But usually there are pre-existing complex systems, and you just have to find the simplest possible way to work around their restrictions while still delivering what the client requires. That's my daily stuff.
  • Took a deeper look at Ansible and SaltStack (Salt) to evaluate them again, and tried a few things in a test environment. Windows Remote Management (WinRM), PowerShell (ps) - kw: configuration management and orchestration tools, playbook, execution module, state module, Python, SLS, YAML, ZeroMQ, SSH, inventory, orchestration, Vagrant, mocking, testing, scripting, modules, events, reactors, JSON, Linux, Windows, Tower, CLI, templates, master, minions, PyDSL, Jinja, desired state configuration (DSC), PowerShell.
  • Read: Moving away from Puppet, SaltStack or Ansible?
  • Read a bit more about Vagrant - Why do they say "development environments"? Do development and production environments have some kind of difference?
  • Lol, one guy said in his blog, "I spent my vacation reading computer literature and documentation". That someone is just like me. ;)
  • Configuration Management (CM) requires quite a bit of server fu, duh. There are just so many dumb ways to fail. Is Ansible or Salt even required? PowerShell is the core of Windows configuration; does DSC do everything required?
  • Checked out Azure Regions and Zones as well as Microsoft Azure Regions and Azure RemoteApp. - Again, costs are a major factor with this technology.
  • PaaS doesn't relieve you from software maintenance. Technologies used in your apps will be shut down, and you'll need to migrate your apps to newer solutions. The Google App Engine Master/Slave datastore will be shut down on July 6, 2015. Applications must be migrated to a newer database (NoSQL, HRD or NDB) before that. I think I'll just shut down my old projects because of this; I have no interest in migrating them. kw: datastore, Paxos, eventually consistent.
  • Checked out a bunch of HTML5 front-end frameworks. It's horrible; there are just so many options, easily in the hundreds.
  • finally closed, because Google deprecated and shut down the database it was using.
  • Had a long discussion about different webshop platforms with a friend who works in the field. - Prestashop, Drupal, Magento, Virtuemart, ZenCart, osCommerce, Wordpress and OpenCart.
  • Had some discussion about developing electronic receipt formats further.
  • Watched Haaga-Helia Future Forum marketing and system integration videos, including the TARU videos, which are about future digitalized and integrated business systems. WhatsApp marketing. XBRL, Real-Time Economy (RTE).
  • Enjoyed wondering different Microsoft Windows CAL options.
  • A great SlideShare post, The Emerging Global Web - how the Internet is changing global trade. There are winners and there are losers.
  • Read: Building Two-factor Authentication
  • JSON API - A standard for building APIs in JSON. - Nothing new at all, but it's nice that there are good examples for people making their first JSON APIs.
  • I was asked if I want to join a SIG developing standards for business message formats for Open Data (sorry, no details at this point) and integration APIs. Well, that's most interesting, and yes, I'm naturally interested. I just believe my approach to many problems is really pragmatic. Let's see how that works out with people who have a much more theoretical or academic approach. I guess it's good; it brings out differences and can lead to a valuable outcome.
  • Read: Startup advice briefly by Sam Altman - Too short? Maybe it's better to check out the long version.
  • Radical Statements about the Mobile Web - More web vs. native apps discussion. I personally don't want to install any apps or crap on my phone or computer unless I know it's absolutely vital software. I prefer web over native apps in most cases. But that's just me.
  • I got involved in a discussion about whether using 2FA will stop hacking or drastically improve security. Well, it will protect from SOME attack scenarios, but it surely won't stop hacking. Usually site hacking is done by exploiting some vulnerability in the site or system. At that point attackers can usually steal anything they wish from the system, so 2FA won't actually protect against that. Likewise, if they have truly rooted the system, they can do whatever they want: circumvent password protection measures and access all data, source code, and so on. At that point I really don't care whether you have my password or not; that's the least of my worries. My old password to Slack was: Q-CfK4h1H_bB0mN7PPvD - I guess nobody cares about that fact at all. More important than Slack passwords is the data in the system, like credentials to other systems that people have shared during chat conversations. From the end user's point of view, if they have rooted your device, 2FA won't help either. They can steal an already authenticated cookie, route traffic via your device so the IP won't seem strange, and basically do whatever they want. So yes, 2FA protects from some threats, but it really won't protect you from hacking at all. I could go on about this much longer, but I believe I've made the point clear.
  • Lightly checked out: Apache Kafka, Apache Samza
  • Turning the database inside-out with Apache Samza - Aww, so much talk about things which are obvious to everyone: replication, caching, indexing, transactions, race conditions, locking issues, materialized views, data transformation, replication streams, transaction log / write-ahead log, immutable facts, immutable events, better data, analytics, fully precomputed caches. It's better to skip straight to the "Let's rethink materialized views!" section. Other keywords: HTML DOM, JSON, CSS, React, Angular, Ember, Functional Reactive Programming (FRP), Elm, Publish, Subscribe, Notify, Request, Response, Meteor, Firebase, Subscribers, RethinkDB, Designing Data-Intensive Applications, stateful stream processing.
  • Created: Google+ Brython Users Community
  • Studied several posts from the Open Knowledge Blog.

URL shortener closed

posted Mar 25, 2015, 10:39 AM by Sami Lehtinen   [ updated Mar 25, 2015, 10:48 AM ]

Here's just the about page from the site, kept as a memory. credits, about, info


Do you hate long short urls, or urls which are short in theory
but totally impossible to remember? This service solves your problem:
we provide really tiny urls. We can keep urls short because our urls expire
30 days after the last visit or 365 days after creation.
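The expiry rule above could be sketched roughly like this - a hypothetical helper for illustration, not the service's actual code:

```python
from datetime import datetime, timedelta

def is_expired(created: datetime, last_visit: datetime, now: datetime) -> bool:
    """Expire a url 30 days after the last visit, or 365 days after
    creation, whichever limit is hit first."""
    return (now - last_visit > timedelta(days=30)
            or now - created > timedelta(days=365))
```

So a url created 400 days ago is expired even if it was visited yesterday, and an unvisited url expires after a month.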


Domain by SpamGourmet
Minimalistic URL shortener by Sami Lehtinen.
Powered by Google App Engine.


Free service, no guarantees or warranties whatsoever.

Change log

2015.03.25    Service closed down because Google closed the Master/Slave datastore
2013.12.22     Version 4.10 released, service disabled
Service is disabled for non-onion urls; .onion urls still work.
This is because I simply don't have time to maintain this
project and deal with potential abuse etc. Otherwise I have been very
happy with Google App Engine and its reliability and performance.
It was good as long as it lasted. - Thank you.
2013.02.16     Version 4.9 released
Minor internal restructuring & HTML structure improvements.
2012.12.01     Version 4.8 released
Added support for .onion domains.
2012.10.13     Version 4.7 released
Updated DNS proxies (IPv6 support dropped temporarily).
Now two primary DNS servers are hosted on my own servers.
Only the two backups are located on web hosting services.
Primary servers are in the EU & US, and backup servers are in the US.
Failed server information is cached for one hour for
improved performance.
2011.12.18     Version 4.6 released
Google Safe Browsing API implemented.
Internal Memcaching improved.
2011.12.03     Version 4.5 released
Unicode URLs are now working perfectly.
Further improved caching. Now all redirects are public
and cacheable for 1 day.
2011.11.05     Version 4.4 released
A lot of tuning of Unicode urls; the solution still isn't
perfect, but it works most of the time.
Added a second primary web-DNS service server. It's hosted on
another of my virtual servers, using uWSGI and Nginx.
2011.10.30     Version 4.3 released
Added two more backup DNS servers, used when the primary server
is unreachable or times out. The service now also supports
sites (domains) which have only AAAA DNS records (IPv6).
2011.10.06     Version 4.2 released
SURBL checking using secondary Linux server added.
2011.10.01     Version 4.1 released
SURBL based spam checking activated. Now every link submitted
to the service is checked against SURBL reputation data.
2011.09.07     Version 4.0 released
JavaScript based spam checking activated. It also frees
server resources, because if JS is not evaluated on the create
page, the database won't be accessed at all.
The frontpage is now fully cacheable. Earlier GAE sent a permanent
redirect but with a no-cache tag, which is quite pointless afaik.
Now it's fixed: the frontpage is browser cacheable and there is
no additional redirect anymore.
2011.09.05     Version 3.1 released
Spam controls tightened further. Now all URLs (domains)
in the database are rechecked daily.
2011.09.03     Version 3.0 released
Database logic changed: now shorter urls are always preferred
over the oldest urls. Lots of internal database access related improvements.
2011.07.07     Version 2.1 released
The database now uses faster eventually consistent reads.
Memcache is used for caching data in memory.
2011.06.18     Version 2.0 released
Implemented three independent spam protection mechanisms.
One of them is WOT, which should prevent creation of links
to spam-advertised sites.
Unicode URL handling fixed. URLs like http://ä.Ö/€ now work.
Exception handling, database transactionality and concurrency
improved further.
Added changelog.
2011.04.11     Version 1.0 released
Database transactionality improvements.
Added Memcaching.
2011.03.24     Version 0.9 released
Let's see if this works out. - URL Shortener

OVH, uWSGI, PostgreSQL, NoSQL, GQL, CM, Chef, Puppet, Ansible, Salt, PaaS, BitTorrent

posted Mar 22, 2015, 6:50 AM by Sami Lehtinen   [ updated Mar 25, 2015, 10:49 AM ]

  • Something different: Chinese DH-10 Cruise missile, Computer Algorithms @ Khan Academy
  • OVH Classic servers have strange lag bursts? I assume the host system is running out of memory and swapping stuff out. So even if the VM doesn't show anything being swapped out, it actually is swapped out by the host OS. This leads to situations where access to memory areas which haven't been accessed lately can be very slow. It's a strange feeling when there seems to be plenty of memory, but in actuality it behaves as if it's swapped out. On the network side there are also some strange things. I'm not sure if they're directly related to this, or if some other kind of network traffic throttling or prioritization is being used. In general, network connectivity seems great or nearly perfect - low latency, no packet loss - but in reality, when transferring data, speeds aren't always what you would expect. Maybe they're limiting rwin or something else; I don't know. But that's what I'm experiencing. Compared to the OVH Cloud servers, these OpenVZ boxes clearly get lower priority, on the CPU as well as the networking side.
  • Wondered how badly Outlook 365 is developed. It seems to be a horrible mess because it's partially a local email application and partially a webmail application. Some features work only in the desktop client and some only in the webapp. The most interesting result of this mess is that the desktop app doesn't combine cloud data and local data as it should - and would, if it weren't designed so badly. Other email clients like Thunderbird work just so much better. If you don't have a local copy of some message, it shouldn't mean the application is unable to show the message. Thunderbird caches messages locally - some messages are available locally, others aren't - which is perfect: I can still see all messages. But with Outlook, fail. You can see that a folder has 400 messages, but you can only see 100 of them or so. There's no way to see the rest of the messages without synchronizing everything locally, which is simply a really bad implementation as far as I can tell.
  • Studied a bit more about the uWSGI Python Module - Now the stuff using it is working perfectly.
  • Finally managed to configure the uWSGI fastrouter-subscription-server so I can easily run load balancing and other stuff with it. What was my problem? I didn't realize that when using ports other than 80 you HAVE to enter the port number in the subscription key, and when using port 80 you MUST NOT enter it. Unfortunately there are no messages whatsoever to help with this, so you don't get any kind of hints; you just have to find the problem via trial and error, or by reading the source code as I did. The documentation is good, but it might take a while to digest.
  • PostgreSQL vs MySQL / MariaDB
  • 7 PostgreSQL data migration hacks
  • Launched one ug project using Google App Engine (GAE) - Platform as a Service (PaaS) - Seems to be working fine. I would just love GAE so much if it supported Python 3.4. I also like the Jinja2 template engine very much when using 'alternate platforms', like Linux or Windows servers. Currently I'm using App Engine's own template engine with it.
  • Fine-tuned my PostgreSQL RDBMS slightly for performance when using it with the peewee ORM. Got a nice 25% performance gain by just changing a few lines in a query; now I'm using a lateral join.
  • I'm also using a few less well known SQL databases via ODBC (pyodbc).
  • I just have to say I kind of hate the NoSQL term, because it doesn't actually mean anything at all. There have always been different object storages and solutions without transactional features and so on. Even GQL accesses a NoSQL database using SQL-like statement syntax, but it's drastically more limited in features.
  • I think I might need to study more Docker and OpenStack. But summer is coming, maybe next fall.
  • I took a look at Chef and Puppet, but I think I'll prefer Ansible right now. With the current number of servers I'm administering, it's just on the edge whether I should use an advanced configuration management (CM) system or not. Setting up such a system takes considerable effort, and it's just smart not to invest heavily in tech that might not be needed or doesn't produce meaningful profits or cost savings. Salt also seems pretty interesting.
  • Even if big data is in such huge demand, it's always a good question whether the data is reliable and what you use it for. Having just data is utterly meaningless; if data quality is bad, the results, even if technically correct, can be seriously misleading. Being a data scientist or big data specialist also requires a wide set of business and management knowledge. Doing technically correct things without understanding what you're doing can lead to extremely bad results. On the other hand, if you use the right tools and methods, big data isn't any different from any other data - just the data set itself is larger. Basic analytics and statistics skills are still needed, as is common logic to verify the results: can these even be right, even in theory? I've seen so many times that people generate a report or do something, say this is the result, and when you take a look at it, it's immediately clear that it can't be true nor done correctly. The question is why they didn't realize that when handling the data. Common sense and knowing your data are really important for basic reality checks.
  • Studied the Vuze BitTorrent client, which got a new Swarm Merging feature. After reading the specification carefully, I personally don't believe it's going to be a meaningful feature. It's a nice idea, but in reality it's not that useful. On the other hand, systems like Freenet and GNUnet have 'always' shared data blocks between different downloads, and do it much more efficiently than at the file level. Not exactly the same, but it reminds me of eMule's Advanced Intelligent Corruption Handling (AICH) feature.
  • The Economist - it's just great stuff to read. Even though I linked to the web site, I recommend reading the full version.
  • Is PaaS a perfect solution? Nope, it isn't. PaaS isn't a silver bullet, nor does it guarantee any portability between platforms. Actually, it can tie you to one platform extremely tightly. Of course you can make an application which isn't tied to the platform, but that adds overhead, affects performance, and so on. For some tasks PaaS is great, but in some cases working around PaaS issues can hinder the whole project or just make running the systems very expensive compared to alternatives. It's just like using mobile frameworks which promise write-once cross-platform applications: it can be great, but it can also make things hard or nearly impossible, add a lot of overhead, and cause total failure to reach the promised goals. Whenever something is "a perfect solution", I instantly get highly skeptical. Either the speaker is doing pure marketing, or doesn't know what they're talking about.
  • Users asked if I can relaunch - of course I can. But I just need to fix a few things with it. I'll also launch it on a new server which will provide vastly superior performance. Still got a few JavaScript (ugh) kinks to figure out. Everything on the uWSGI, PostgreSQL, Python and server side (Ubuntu Server 64-bit) is already working perfectly. This is hobby stuff; I'll only code it at home when I'm in the right mood, so things might not happen as quickly as they otherwise could. My main goal now is to vastly improve the user experience and clarify a few things, even though everything is already technically working. I hate it when techies say it's OK even if the user experience is absolutely horrible. As an example, I don't validate forms right now; I just tell the user FAIL if there's anything wrong with the content. It might technically be a working solution, but it surely annoys users.
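A minimal sketch of the fastrouter subscription setup described above - the hostnames and ports here are made-up examples, split into two separate ini files:

```ini
; router.ini - the fastrouter accepts client traffic and subscription announcements
[uwsgi]
fastrouter = :8080
fastrouter-subscription-server = 127.0.0.1:7000
```

```ini
; app.ini - the application instance announces itself to the router.
; NOTE the undocumented gotcha: the subscription key MUST include the port
; when the fastrouter listens on a port other than 80, and MUST NOT
; include it when the fastrouter listens on port 80.
[uwsgi]
socket = 127.0.0.1:3031
subscribe-to = 127.0.0.1:7000:example.com:8080
```

With port 80, the last line would instead read `subscribe-to = 127.0.0.1:7000:example.com`.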

Privacy, Etags, TLS 1.3, Nearline, Distributed, SSDs, BI, FIDO U2F, Python, Subspace, CMMI

posted Mar 14, 2015, 11:25 PM by Sami Lehtinen   [ updated Mar 14, 2015, 11:26 PM ]

  • Android privacy: email address autocomplete privacy horror. Why are programs so often written by such poor programmers? Why is everything the application sees stored, without even a way to remove that information?
  • For some reason it seems that CloudFlare converts all Etags to weak Etags. So even if I set a strong Etag, when the browser returns the query via CloudFlare it contains the W/ prefix. If I drop CloudFlare and do the same directly, the Etag is strong, i.e. without the W/ prefix.
  • My brain hurts; had again a few discussions about the difference between GiB and GB, what a Gbit is, and the difference between bit and byte.
  • Had a few discussions about cost based vs value based pricing. Cost based model is always bad.
  • Reminded myself about TLS 1.3 and AEAD - KW: TLS handshake, TLS session resumption, TLS False Start, TLS record size optimization, Early termination, OCSP stapling, HTTP Strict Transport Security (HSTS), CCM, EAX, OCB, GCM, AE, MAC
  • Signs that you're a bad programmer - I really liked this article, and found many of the issues in myself. Smile. Especially being in a hurry, as well as avoiding overhead by putting something where it doesn't really belong. Whether that's OK-ish or really bad depends on project size and scope; of course, if required, you can come back later and clean it up. There's also a clear difference between temporary prototype code and something planned to be more permanent. Especially "do whatever it takes to make it work right now, without researching the topic" sounded pretty familiar for quick prototype testing code. Why write perfect prototype code? If it doesn't work out, it gets discarded anyway. Likewise adding temporary unrelated features to an existing application to avoid the overhead of starting a new project. Smile. It's important to know how these things should be done, if it were worth it. The Pinball Programming part also made me smile a lot, but that's one of the symptoms of the previous stuff. If you're writing a program that will probably run only once in production, how much time should you waste documenting and fine-tuning it? If it runs once and produces the required results, that's it. Yet similar code in more widely used programs is 'fatally flawed' and makes everyone else's life hard. Anyway, the Symptoms list made me smile so hard; there's just tons of stuff there which we've all seen several times - awesome! I don't want to say anything about the "Unfamiliar with the principles of security" section, because it would be just way too horrible. The good thing is that the whole list was completely familiar and held no surprises. It's just that in some cases there's a decision to knowingly ignore some rules for temporary stuff.
  • People who claim this temporary ignorance is bad are doing it wrong themselves. Why make a mold and cast something from bronze, if quickly cutting it from styrofoam with a knife does practically the same job? Cutting corners when suitable is also a form of perfection; over-engineering can be really expensive. Actually, the Mythbusters are also pretty good at cutting corners - making something that works with really limited resources, even if lately they have been using pretty large resources.
  • Something different: BrahMos, Durandal, NRO, DigitalGlobe, Gravitational Wave, BICEP and Keck Array, Disc tumbler lock, Lockwiki, KW: Keyhole, Hexagon, Topaz, Crystal
  • Reviewed the source code of one open source project and immediately found two serious bugs. Well, it's good that open source code can be reviewed by anyone. kw: code review, bugs, fixed, python, reviews, commit, patch, fix, bug, git, github, pythonista.
  • Google Cloud Storage Nearline -
  • Checked my SSD's wear leveling data, block erasure information, total amount of data written, and health. Now that the SSD has been in use for 1.5 years, its life status is about 99% left, which means I don't need to be particularly worried about 'burning out' my SSD. At this wear rate it would take something like 100 years, so the drive will be obsolete long before it wears out. Of course there's the little problem of me personally expiring before that happens too. ;)
  • Found a nice trap in one Python project (not my own this time): they used the 'is' operator to compare two values. There's a big trap with that in Python. 1 is 1? OK. 1 is 1+1-1 also works out. But the code used 'is' in totally wrong places. Using 'is' instead of == is a really bad habit, because when the values get large enough, it's going to fail - and "large" doesn't mean large on Python's int scale at all.
  • There's no now for distributed systems KW: Google Spanner, FLP, CAP theorem, GPS, NTP, Paxos, ACID, Strong Eventual Consistency, Apache Zookeeper
  • Goodbye MongoDB hello PostgreSQL - Key Value storage, JSON indexing, performance, reliability, correctness, consistency, sql, nosql, schemaless, replication, sharding, sharded, distributed.
  • Lol, one unannounced organization got Cryptowall on their server, and it also encrypted the backups. So backing up to media which is connected to the system all the time isn't a great idea either. Like I have said, there should be no option to delete or access earlier backups - just send more delta data.
  • Yet another SSD endurance test - I'm a heavy user and I've been writing about 1 TB / year to my SSD. So again, the drive will become obsolete long before it wears out, so the actual endurance doesn't matter. Some of the tested drives survived over 1 PB of writes.
  • Checked out payment & identification solutions: RuPay, Aadhaar, China UnionPay, JCB, American Express, Diners Club
  • Re-read: ext4 and btrfs man pages, studied Bluetooth 4.2 smart.
  • Several BI articles, kw: data virtualization, etl, web services, soa, esb, information as a service, CIO, CDO, nosql, hadoop, sql, Gartner, SAS Institute Federation Server, SAP Hana, SQL Server Integration Services, IBM InfoSphere, JBoss, Composite Software, Informatica, Cisco Data Virtualization Platform, Denodo, Denodo Express. Thoughts: Maybe I should try Denodo Express to see what it really can do.
  • I've been wondering why payments and identity businesses are considered separate businesses. Basically, payments are just an application of identification. Technically all this stuff is just so simple when you get the primitives right. By primitives I mean public key stuff, which already exists in easy-to-use libraries like NaCl. When you can identify the user using public keys, users can sign tokens using their private keys, and you can verify those using their public keys - what's so hard? It should all be technically trivial. The whole problem comes from the ecosystem: are the solutions supported? How easy are they to use? Are there any transaction fees? Who manages the trust network, and so on. Is it easy to use without a mobile phone, easy to use without a computer, can it be used without the user's authorization, and so on. So after all, it turns out there's no simple, universal, easy-to-use, or cheap solution. That's the reason the market is so extremely fragmented. The worst part is getting national laws to accept an authentication solution: if this solution was used to make a contract, can it be enforced legally?
  • Checked out the Hypersecu FIDO U2F Security Key - and their blog
  • Python internals and things you just need to know.
    >>> a = 256; b = int("256"); a is b   # True: CPython caches small ints (-5..256)
    True
    >>> a = 257; b = int("257"); a is b   # False: outside the cache, distinct objects
    False
  • If you don't know the environment you're developing for, your code can contain very serious bugs which are hard to spot, because you don't understand the mechanism causing them. Just like the case where ASP's int() worked like floor() in most languages, always rounding "down" even with negative numbers, so -1.1 becomes -2. As well as my own fail with Peewee SQL, where I didn't realize that 'not True' doesn't match None.
  • Let's see, there's an updated version of the Subspace documentation - I also chatted with the author 'ctpacia' about this topic.
  • Had training about CMMI project management, product-managing a web sales channel, Kanban, Lean, Scrum, Business Model Canvas, KanScrum, and analyzing program usage situations and documentation and using this information to improve software products.
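The Peewee/SQL gotcha mentioned above - 'not True' not matching None - is really SQL's three-valued logic, and can be demonstrated with plain sqlite3 from the standard library. A minimal sketch; the table and column names are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, active INTEGER)")  # nullable flag
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(1, 1), (2, 0), (3, None)])

# Naive query: NULL is neither equal nor unequal to 1,
# so the row with active = NULL silently drops out.
naive = conn.execute(
    "SELECT id FROM items WHERE active != 1 ORDER BY id").fetchall()
print(naive)   # [(2,)]

# Correct query: handle NULL explicitly.
fixed = conn.execute(
    "SELECT id FROM items WHERE active != 1 OR active IS NULL "
    "ORDER BY id").fetchall()
print(fixed)   # [(2,), (3,)]
```

The same applies through an ORM: a `field != True` filter quietly excludes NULL rows unless you ask for them.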

My personal vision about future of authentication

posted Mar 14, 2015, 6:09 AM by Sami Lehtinen   [ updated Mar 15, 2015, 3:17 AM ]

I'm thinking about a credit-card-format device which would have a full-surface touch screen and a slidable USB connector for charging and connecting to a computer. Wireless charging would be nice too. It would naturally also feature Bluetooth and NFC connectivity, as well as a very small CCD camera to read QR codes in cases where no other communication channel is available.

A few use cases:
  1. Pure NFC authentication to open doors, or to log in to web sites or whatever mobile applications
  2. PIN code to activate a high security identity, then NFC authentication to open high security doors or to log in to a bank, the tax authorities, and so on
  3. Bluetooth authentication for low security seamless proximity locking, like car doors, or to log in to generic low security web sites where you usually use crappy passwords
  4. Challenge-response using the number keyboard or QR codes for medium security identification, e.g. over the telephone or in any other situation. If a high security identity is used, the PIN code is required before getting the response code.
  5. Receiving a signing request over Bluetooth, NFC, USB or QR code from a web site or application to sign high security transactions. In this case the card will display information about the transaction you're going to sign. This prevents scams by malware where you think you're paying 5€ to charity, but in reality you're transferring half a million to some random Nigerian bank.
  6. Biometric identification could also be used for low security purposes instead of a PIN code, but currently that's a technical problem. Also, if true physical presence is required, it's better to use on-location sensors; the card could provide the information required to identify the user to the sensor. Of course all these measures can be combined with other things, like the card plus an on-door biometric detector, which makes sure you're really there. If identification happens only over NFC, there could be relay attacks, so it doesn't mean you're really there - it just means your identification information is available right now. Of course in some cases this could also be used as a feature: it makes sure someone authorized you right now. And the card can display (once again) exactly what you're authorizing the other person to do.
So the card can have multiple identities; those identities can be 'remotely readable', require NFC or USB mode, as well as a PIN before activation for higher security purposes. The display can be used to show signing requests as well as to confirm the identity of the service which is requesting authentication. For the tinfoil-hat guys, the display can naturally show key fingerprints, so it's possible to confirm that nobody's playing with the keys.
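Use case 4 above - a short numeric response code derived from a challenge - could be sketched roughly like this. This is an illustration only, using an HMAC-based stand-in (HOTP-style truncation in the spirit of RFC 4226); a real card would use public-key signatures and a secure element, and the secret, challenge, and digit count here are made-up assumptions:

```python
import hashlib
import hmac

def response_code(secret: bytes, challenge: str, digits: int = 6) -> str:
    """Derive a short numeric response from a challenge:
    HMAC the challenge, then truncate the digest to `digits` decimal digits."""
    digest = hmac.new(secret, challenge.encode(), hashlib.sha256).digest()
    # Dynamic truncation similar to RFC 4226: pick a 4-byte window
    # based on the low nibble of the last digest byte.
    offset = digest[-1] & 0x0F
    number = int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(number % (10 ** digits)).zfill(digits)

# The verifier shares the secret, computes the same code, and compares.
code = response_code(b"demo-card-secret", "challenge-123456")
```

The point of the sketch is the protocol shape: the verifier sends a fresh challenge, the card shows a short code the user can read back over the phone, and the verifier recomputes and compares it.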

Naturally the authentication database and service should be open for anyone, so it can actually be used. One of the showstoppers is that using an authentication solution has been made such a hard process that it's virtually impossible for everyone except companies that specialize in authentication, or very large players who spend huge amounts of money on this kind of stuff. Using electronic authentication should be trivial - just as easy as checking a password or any other official identification document.

I know, I know: mobile phones can do all this, at least in theory. The problem is that mobile phones are nowadays full computers, and therefore it's probably possible to mess with them using malware. And it's certainly possible if the device is rooted.

Originally in Finnish - what follows is a translation, not exact, and it contains a lot of generic blah blah.

I once wrote about this topic at length. Most of these authentication methods share the flaw that identification is then tied to the device. FIDO U2F tokens, for example, have no lock at all; anyone who gets hold of the device can misuse it. That's a pretty significant risk. Of course it can be mitigated on the software side by requiring additional identification. Another problem is that in banking use, for example, you authenticate with the device, but after that the playing field is wide open again. So it would be seriously great to have a solution which also confirms what you want to do with the token, not just whether you have the token. As far as I remember such solutions have been rare, but somebody has implemented them; I understand a few German commercial banks have something like this in use. Nowadays this could be implemented with a mobile app, but then the problem is: what if the service is already being used on mobile? Then the trustworthiness drops again. Of course a fully dedicated device could be built for this, but that raises costs substantially. Still, a dedicated device is in my opinion the most secure option when the use case justifies the cost. Nowadays Bluetooth 4.2 or BLE also means that such a dedicated device doesn't have to be terribly expensive. Likewise CCD cameras and so on are cheap, so the device could also be built so that it works without a mobile phone. On the other hand, if you want strong identification with long keys, data transfer can become a challenge. No doubt a solution like this will appear as technology develops.
My vision? A credit-card-sized device with a touch screen and keypad covering the whole card; for connecting, a USB connector can be popped out from the edge, or you can use Bluetooth, NFC, or the device's CCD camera to transfer data via QR codes. Depending on the application, it could work as follows:

  1. Card alone over NFC (doors in buildings); for a higher-security door, the card's keypad can be required in addition, so the card alone isn't enough.
  2. Card alone over Bluetooth (say, car proximity locking).
  3. Enter a challenge code on the keypad -> it gives the response -> e.g. phone authentication.
  4. High-security login: connect over USB, see which service you're logging into, enter the PIN code -> logged in.
  5. Authorizing a mobile application over Bluetooth, with approval on the device.
  6. Logging into some random web service while traveling, say from a 'web kiosk': read the QR code challenge from the login page and enter the response.
  7. Approving a larger payment: the device shows the payment details on its display, you enter the PIN code, and the signed transaction is returned to the bank.
Naturally there's also support for multiple fully independent identities, which can be deleted and generated as needed. Yeah, the list isn't exhaustive, but the concept should be clear. Naturally this service also includes a so-called national identity service, through which anyone can perform the identification. Too often things are just made difficult, and authentication can only be used by the parties who specifically grind through the painful setup. Things are made deliberately difficult and expensive, and then people wonder why nobody uses them; VETUMA is a good example.

Btw. I don't know whether the operators have realized this, but actually anyone can use the mobile certificate (Mobiilivarmenne) right now without any contracts. Huh? How so? Well, they offer a so-called test service through which this works. It's a really handy trick once you realize it, thanks to the fact that the test's success acknowledgement message shows all the identification details you need.

Sales arguments, P2P, Cloud Stuff, OVH, Hetzner, Profit, MongoDB, CDMA, VPN / IPSec

posted Mar 7, 2015, 9:51 PM by Sami Lehtinen   [ updated Mar 22, 2015, 6:46 AM ]

  • Some old P2P stuff: you completely forgot eDonkey 2000 (ed2k) and eMule when saying that BitTorrent replaced Gnutella. eMule got a perfectly working DHT implementation, including serverless file search. Technically it was much more advanced than BitTorrent: it didn't require .torrent files, and it fetched AICH data from other peers. The only thing it lacked was efficient coordination between download and upload. Overnet (from the ED2K developers) tried to fix that with Horde mode, but it failed because everyone was already using eMule. Horde tried to pick 5 fast peers for mutual trading. The requirement for trackers and .torrent files felt really backwards after using ed2k. The Gnutella variant with super hubs was called the G2 protocol. Gnutella also utilized GWebCaches for retrieving bootstrap information about other active nodes.
  • uWSGI is just an incredibly versatile web server with tons of built-in features. uWSGI Python Decorators
  • HTTP compression using the Nginx gzip module or uWSGI Transformations.
    Different: ASDS, RSM-56 Bulava
  • More great stuff by Charles Leifer - Querying the top N objects per group with Peewee ORM and comparing SQLite and PostgreSQL performance
  • Guys at a major telco don't even know the difference between megabits and gigabytes. OMFG! I'm still laughing at them. Does 2 Mbit/s of constant traffic total up to 168.75 GBytes/day as they claim? Hire someone who knows even the basics of this stuff. Pretty please? - Thank you! I guess they don't even realize how ridiculous their offer looks with a totally incorrectly calculated competitor comparison chart. They also presented various lies about routing and other stuff. They assume that buyers don't understand anything at all and that they can openly lie to their clients. They're incompetent enough to present 'facts' that, when you check things yourself, turn out to have been lies all along. Get the darn facts straight and stop telling stories that are just nice to hear. I think this also unfortunately shows what bad purchasers many companies are: they just don't have anybody who understands what is being bought, so they happily believe all the lies from sellers and carry on.
    I also had a meeting with one company which offers private cloud services. They were competent and the stuff was good; the only problem is that I don't want to buy stuff I don't need. There's no need for expensive private cloud solutions, because public cloud can provide the same stuff cheaper. After the meeting I told them that if they really think they can make a reasoned offer which clearly shows any benefits for us, I'm interested in seeing it. But as I guessed, I never received the offer.
  • Nokia Siemens Networks (NSN) presents HSPA+ Multiflow as new technology? If I don't remember completely wrong, there's nothing new about this. All the original sales material from Qualcomm about CDMA mentioned that one of its important benefits is soft handover, which prevents dropped calls because the handset can be connected to three base stations simultaneously. And that was surely before any 3G time. Or has it really taken this long to become reality? I would guess I saw this stuff around the year 2000 or so.
  • Modified my Bottle project so that it now works seamlessly with Apache2 as CGI (yuck, but possible), as a standalone uWSGI application, or using Bottle's own internal dev/testing HTTP server.
  • Nice reminder from F-Secure - the Internet of Things will realize all security nightmares; especially hacking someone's smart home can be really fun. Loud music, blinking lights, all night long!
  • Got again several phishing emails trying to trick me into installing malware on my computer using various system exploits. You've gotta be really careful nowadays. But I guess even that's not enough; a persistent attacker will succeed at some point.
  • Checked out Google Cloud Platform available Regions and Zones. US, Europe, Asia.
  • Decided to launch server performance tests using OVH and Hetzner servers and comparing those against Amazon AWS, Google Compute Engine and Microsoft Azure. I think the bang for buck ratio will be a lot better using these awesome European competitors.
  • Quickly checked out MongoDB 3.0, no I didn't install it. Just read the documentation.
    kw: Pyramid Profit, Multi-Component Profit, Switchboard Profit, Blockbuster Profit, Profit-Multiplier Profit, Specialist Profit, Installed Base Profit, Specialty Product Profit, Local Leadership Profit, Relative Market Share Profit (Scale Profit).
  • That's right! - "If you understand a problem better than anyone else, you'll be able to create better products, and customers will pay a premium to work with you."
    Just once again reminding everyone about cloud service providers: it doesn't matter what their policy says. Policy is just some text which is supposed to make customers feel comfortable. It doesn't practically mean anything at all. They can still do whatever they wish with your data, even if it's against the policy. All crimes are ok as long as you don't get caught. So citing the policy blah blah whatever is absolutely pointless. Many of the official documents and guidelines are exactly that: official papers to make things look good; nobody actually does things according to the papers. Yet some people just don't get that those papers do not matter in reality. Fact is that you can't ever delete anything from the cloud; it might be hidden from you, but you don't know if it's deleted. Nor can you ever know where that data finally ends up.
  • Laughed again about the Facebook privacy news. A user was surprised that administrators were able to see their data. Of course they are. If you have something so secret you don't trust them with it, don't use Facebook to transmit it. At times it really makes me wonder how little people think about anything at all.
  • Finally fixed one unreliable IPsec VPN. The key to fixing it? Disabling Dead Peer Detection (DPD). Some manufacturers let you configure DPD options, like how many checks to make and how long to wait for a response before disconnecting. But in this case DPD had no configuration options, and that was the source of the problems. Whenever DPD was on, the VPN disconnected repeatedly as soon as the connection was under any load that made latency grow. Extremely annoying, especially combined with transfers whose resume chunk size was up in the gigabytes: whenever the connection resets, the transfer of a 2 GB chunk just restarts from the beginning. Wonderful, just so wonderful. Why was DPD enabled in the first place? Because the IPsec VPN state machine is what it is. When the tunnel is open, it's open, even if the other side has been reset or disconnected, often causing situations where one end thinks the tunnel is open and refuses to reconnect it while the other end thinks the tunnel is down and continuously tries to reconnect. - Duh! Before dropping the DPD option, the VPN could get stuck and recover about 200 times a day, which was naturally extremely annoying for everyone.
  • Removal of SD card software support, and now with the latest Galaxy phones removal of the whole SD card slot, is a typical marketing scam. They claimed the reason to remove the SD card slot is that internal memory is faster. Lol, what if that speed difference doesn't matter at all? It's just like one server provider claiming that all cloud data archive servers need SSD disks because those are faster. Unfortunately the sales guys didn't understand the customer's needs. I know what we need, and they don't. So don't even start arguing with me about it. You're going to fail and just look really bad. - Thank you.
  • Unfortunately it seems that most sales guys don't actually know anything about the stuff they're selling. They're just using high-level marketing hype and hoping that customers are senile and believe whatever marketing lies are fed to them. It's almost fun to check how incompetent these guys are. To be honest, I don't have anything against their enterprise SSDs. The only problem is that they're asking way too much money. If they offered their storage space at the same per-GB price as 6 TB archival drives, I'd love it. But I don't want to pay an enterprise premium for stuff I don't really need.
  • As final words, I have to say that there are also positive exceptions. I had a really nice meeting with the EverCloud guys and I really liked them. They knew their stuff and didn't try to feed me irrelevant lies and arguments.
  • It feels like they're trying to sell a Ferrari to a pizza delivery company. I'll buy it when it's better than the el-cheapo Ford or whatever other city car. Feeding me arguments about how good and great the Ferrari is just wastes the time of both parties. How about delivering realistic calculations about initial cost, reliability, service cost, additional costs from potential vandalism, winter capabilities, fuel consumption and so on? It's almost guaranteed that the el-cheapo Ford is much better than the 'so great and awesome' Ferrari. The dumb sales pushers keep pushing with absolutely irrelevant, silly arguments. It's red, it's cool; so darn what? Does it save me money, deliver pizzas during a nasty winter storm, and help my company generate profits?
  • I've heard that there are many very large companies doing exactly this kind of marketing. No real arguments, just high-level values and other hype kind of crap. Did the tech guys who really get it give approval and confirm the sellers' lies? Or was it some kind of decision made on a golf course by a totally incompetent manager or boss? Yeah, this stuff is straight from Dilbert.
  • PostgreSQL pgbench results from Postgres 7.4 to 9.4 - really interesting charts. If someone tells you it doesn't matter which database is being used (like I do at times), it only tells you that the performance requirements usually aren't very high. Differences can actually be quite drastic between different kinds of solutions, query types and server configurations. Internal optimizations, how the code is organized, and how well it deals internally with locking and so on can make a huge difference. - So when they claim database X is so much better than database Y, don't believe it; try it out with your own workloads and hardware and see the actual results. If they don't let you try it before buying, don't buy it.
  • The previous statement actually sounded good. I would like to have a brand-new Ferrari delivered to me for free, so I'll test it for one year. I'll drive it and then write a cost comparison report. I hope you'll provide me a credit card which I can use to pay all the costs of the car, because I assume it'll be more expensive than the small city car I'm comparing it to. Luckily where I live, the winter road maintenance really sucks, so it'll be interesting to see how many days of the year the car is totally unusable because it's stuck on ice bumps. 4WD and studded tires are a good option around here.
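
For the record, the bits-versus-bytes arithmetic from the telco item above is easy to check. This is just my own back-of-the-envelope calculation, but it also shows where a figure like 168.75 most likely comes from:

```python
# 2 Mbit/s of constant traffic for one day, computed both ways.
SECONDS_PER_DAY = 86400

# Correct: megabits -> gigabytes means dividing by 8 bits per byte.
mbit_per_s = 2
gbytes_per_day = mbit_per_s * SECONDS_PER_DAY / 8 / 1000   # MB -> GB
print(gbytes_per_day)        # 21.6 GB/day

# A 168.75 figure only appears if you treat 2 Mbit/s as 2 MBYTE/s
# and then divide by 1024 to get "GiB" - nearly 8x too much:
telco_math = 2 * SECONDS_PER_DAY / 1024
print(telco_math)            # 168.75
```

So 2 Mbit/s around the clock is about 21.6 GB/day, not 168.75.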

Thoughts about Seagate 8TB HDD drive review

posted Mar 7, 2015, 8:39 PM by Sami Lehtinen   [ updated Mar 7, 2015, 10:26 PM ]

Seagate 8TB HDD drive review

I didn't like the review. No, I didn't say I didn't like the drive. It seems that the tests they ran weren't designed for this kind of usage, and some very interesting key data was completely missing. They should have noticed that, as the article says, the SMR is drive managed, so they can't test it like traditional drives are tested. The complex internal state of the drive affects the situation seriously. With that drive capacity, the internal garbage collection (GC), data compaction, re-ordering, processing, and releasing of space for writes will take hours, maybe even days if there's heavy load on the drive.

So if they did the tests as those are usually done, in one quite short batch, that's one of the mistakes. You should run a test, then run the next test tomorrow. Also the total amount of data written, and especially modified, on the drive affects the state a lot. Incredibly fast random 4 KB writes mean that the data needs to be rearranged on disk later. I would be very interested to see even a simple chart of measured random 4 KB writes, as well as a latency & MB/s chart for a steadily growing write data set, because I assume it changes drastically at some point. This test should be run up to 24 TB of writes.
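
To make the idea concrete, here's a rough sketch of the kind of staged test I mean. It's my own illustration, sized so it runs anywhere; on the real drive you'd scale total_bytes toward 24 TB and space the stages hours or days apart so the drive's internal GC gets a chance to run between them:

```python
import os, random, time

def staged_random_write_test(path, file_size, total_bytes, block_size=4096, stages=4):
    """Write random 4 KB blocks at random aligned offsets, in stages, and
    report MB/s per stage. On a drive-managed SMR disk a performance cliff
    (fast zone / persistent cache filling up) would show as a drop between stages."""
    with open(path, "wb") as f:
        f.truncate(file_size)               # preallocate the target file
    per_stage = total_bytes // stages
    results = []
    with open(path, "r+b") as f:
        for _ in range(stages):
            start = time.perf_counter()
            written = 0
            while written < per_stage:
                # One block-aligned random write.
                offset = random.randrange(file_size // block_size) * block_size
                f.seek(offset)
                f.write(os.urandom(block_size))
                written += block_size
            os.fsync(f.fileno())            # don't let the OS cache hide the drive
            results.append(written / (time.perf_counter() - start) / 1e6)
    return results
```

Note this goes through the filesystem; for serious numbers you'd hit the raw block device instead, but the staged shape of the test is the point here.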

My guess is that there's a quite limited amount which is written extremely quickly. Then there's some amount that's written with higher performance into some kind of write-ahead log (WAL), and at some point before the 8 TB limit the performance drops even further. I would also like to see the same test with larger blocks, and then finally with a linear write where the drive is just written over and over in order. But they didn't test those; they tested it as if it were a traditional disk, and it isn't.

This kind of drive could also benefit from a discard/trim feature, which would tell it that it can ignore the data in certain existing blocks when doing garbage collection, writing those directly over and avoiding the read-modify-write cycle.

Also the 70% read / 30% write test was a bit strange. They didn't mention how it was done. Because they got so much higher read IOPS compared to write IOPS, it makes me think the test was run using a setup with 7 threads reading and 3 threads writing. If the drive and OS prioritized reads, this is what I would expect to see. It's really easy to forget how big a part the operating system plays in drive tests, unless tests are run on the raw drive interface. In this case the results they got are possible. But we could run a similar test using a slightly different setup: just run 10 threads which each read 7 times and write 3 times in a cycle, or something similar. In this case performance could also be greatly affected by whether the 3 writes hit some of the same blocks as the seven reads or not. If I had one of these Seagate drives, I would run quite a different test set than they did, instead of this kind of synthetic test which doesn't reflect reality in any way.
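
The ambiguity is easy to show. This is my own sketch (not their benchmark) of the two setups: both produce exactly a 70/30 operation mix, but the first queues reads and writes on separate threads while the second interleaves them within every thread, which can stress the drive quite differently:

```python
import threading

BLOCK = 4096
FILE_BLOCKS = 64   # tiny test file: 64 blocks

def worker(path, ops, pattern, counters, lock):
    """Run `ops` file operations, following `pattern`, a string of 'r'/'w'."""
    with open(path, "r+b") as f:
        for i in range(ops):
            op = pattern[i % len(pattern)]
            f.seek((i % FILE_BLOCKS) * BLOCK)
            if op == "r":
                f.read(BLOCK)
            else:
                f.write(b"x" * BLOCK)
            with lock:
                counters[op] += 1

def run_setup(path, thread_patterns, ops_per_thread=100):
    counters, lock = {"r": 0, "w": 0}, threading.Lock()
    threads = [threading.Thread(target=worker,
                                args=(path, ops_per_thread, p, counters, lock))
               for p in thread_patterns]
    for t in threads: t.start()
    for t in threads: t.join()
    return counters

# Setup A: 7 dedicated reader threads + 3 dedicated writer threads.
setup_a = ["r"] * 7 + ["w"] * 3
# Setup B: 10 threads, each cycling 7 reads then 3 writes.
setup_b = ["rrrrrrrwww"] * 10
```

Run both against a preallocated file and the totals come out identical (700 reads, 300 writes), yet the I/O pattern the drive sees is not the same, which is exactly why the review should have said which setup they used.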

This is a good example of how people mischaracterized SSHD hybrid drives using random reads, and some consumer SSD drives with prolonged exhaustive write tests, causing the drive to jam with slow block erasure and GC, which won't happen in normal usage. In the case of hybrid drives, random read didn't really give the right picture of performance in daily usage; likewise, in the case of those SSDs, sustained high-speed write didn't give the right picture. Synthetic tests like these can give a bad impression, because they don't reflect reality in any way. They're only good for measuring the performance of disks without complex internal state.

Of course during these tests it's also easy to forget that, at least in desktop usage, write performance isn't that important. Why? Because the OS can buffer writes just like the Seagate Archive HDD does. I can 'write' 1 GB even to a slow USB stick instantly, and the OS just writes the data to the stick in the background. And since they say it's an archival drive, the write process to the disk is buffered using alternate storage anyway. Nobody's actually waiting for it to complete; it just trickles in the background, probably freeing space on alternate storage like SSD or SAS drives when completed.

ETag, gzip, HTTP, NSA, DHT, Integration, Specification, Bad code, Python 3.4 libs

posted Mar 2, 2015, 8:09 AM by Sami Lehtinen   [ updated Mar 2, 2015, 8:10 AM ]

New stuff:
  • NSA's firmware hacking - It seems that they're missing the fact that hard drives have much more space than the 'service area' alone, which is reserved for the drive's own operations. Spare sectors, the SSD wear leveling area and all "free space" can be used to store a lot of data on the disk, especially if it's not completely full. Even then, everything except the free space remains fully usable before the drive starts to fail. Also, I didn't like the claim that they store data unencrypted. I'm sure if they bother to go that far, they'll also encrypt the data when storing it, just to be sure; not encrypting it would be just silly. Ok, often obfuscating data is enough; it's much lighter and faster, yet it makes the data such that it's not immediately obvious what it is. There's no reason why the ROM alone should be used to store documents. Even code which doesn't fit in the ROM could be stored on disk and loaded on demand, so the code base for this kind of application can be larger than the storage space in ROM. Did somebody forget the dynamically loaded segments which were used with .exe apps a long time ago? The same address space is just swapped with different code loaded from disk if there's not enough room to load everything at once.
  • "One of the five machines they found hit with the firmware flasher module had no internet connection and was used for special secure communications." - This part reminded me of my secure system with an RS-232 connection. When I said that it's being used over a low-speed serial link, I did mean it. Also the out-of-band attack channels are disconnected: DCD, DSR, RTS, CTS, DTR, RI. Only transmit data, receive data and ground are connected, and the RD/SD lines are controlled using a switch on the cable, which makes the cable always unidirectional. If the other pins were connected, it would be possible to carry data over the CTS, RTS, DSR, DTR and RI pins without the LEDs indicating it. I'm using the DB9 pinout; if the DB25 pinout were used, there would be many other pins which could be used to relay out-of-band data. As said, it's important to make relaying data in or out as hard as possible.
  • "The attack works because firmware was never designed with security in mind" - Made me smile. Well, that's true. In most cases, software is barely working. Who would want to spend additional resources making a program secure when adding those features could also make the system brittle and harder to manage & maintain? 'Security isn't a priority' is the norm when creating software. There are much more important things to consider, like whether the program works at all and doesn't crash all the time. Anyway, aren't applications and security only for honest people? If somebody really wants to get in, they will.
  • Got a bit bored at home and wrote decorators for ETag and gzip handling.
  • Enjoyed installing a few Fujitsu Server PRIMERGY RX200 systems with LSI MegaRAID CacheCade Pro 2.0 SSD caching solution. 
  • It seems that many storage solution sellers don't even understand the meaning of hot-warm-cool-cold tiered storage. There's no reason whatsoever to store archival data on expensive SLC RAID arrays. Only a small amount of hot data should be on the fastest possible disk system; the rest can be stored on slower tiers.
  • Wondered how some server dealers try to sell you tons of stuff you don't need, included in a package, while leaving the stuff that isn't included openly priced. I think this kind of pricing model is just annoying and wastes everybody's time. Just give me a clear price list which includes everything, and I can draw my own conclusions from it. I don't want to waste time negotiating stuff which doesn't really matter. If prices are too high or the service isn't what I'd like it to be, then I'm not buying. Multiple negotiation rounds just waste everybody's time. Another thing which is pretty ridiculous nowadays is long contracts, like demanding a 36-month commitment. Ok, fine. If prices are lowered during the contract, do those price cuts also apply to existing contracts? No? Ok fine, I don't want that kind of deal. Some service providers do cut prices for existing contracts too; others don't. If you offer a backup solution, I'm interested to know whether it's an off-site backup. I would prefer an option where the backups can be fetched at any time without any assistance from the service provider, so that if required, I can even keep my own copies. How do we gain access to the backups in case of total data center loss? Yes, I know it's rare, but it has happened before and it will happen in the future too. Is invoicing clear and correct? Some service providers send horrible invoices with mistakes and unclear lines; others deliver clear invoices which are always right. Some provide a clear invoice every three months or so; others require advance payment per contract per month, which is horrible. Questions about how refunds are handled if the service is cancelled are always interesting too. Does the service provider offer flexible contracts where you can modify system resources as needed? If I need extra CPUs or memory for some heavy batch run, is that possible? Many service providers also offer SSD storage.
Well, nice deal, but what if you don't need it? The same goes for tons of bandwidth included in the package which isn't needed either. I don't care if it's included; it's nice, but including bandwidth in the package shouldn't push its price to the pain point. I just wonder how many server resources are sold using these kinds of Dilbert deals. Lots of BS talk, few facts, and then let's just roll the monthly billing. Does anyone even know what they really need and what they're buying? Nope? I guess that's unfortunately true in many cases. Clueless customers and managers are truly clueless, and they're also the customers who keep these kinds of service providers running.
  • Quickly read through the Flux article - it's yet another pub/sub messaging/queue solution.
  • Had long discussions with friends about DHT, STUN, TURN, how to know if peers are alive, how ping and pong should actually work, and how often. How to prevent reflection and amplification attacks with UDP-based solutions. How to manage peer information in a sane way. Listing 10k nodes is of no use if there's really high churn; keeping a list of a smallish number of known reliable nodes is a good idea. That way, if the bootstrap / seed / initial fixed-list nodes are under attack, the network won't fully collapse, because peers can still find information about currently active peers and join the network. The list also shouldn't contain too many peers which are unreliable or short-term nodes. In most P2P networks it's really common to have extremely high churn rates; some peers might run just a few minutes in a month, so looking for those at a random time is quite unlikely to succeed. For example: the client pings the server every 900 seconds. If there's no reply, it enters test mode and sends pings 6 times, every 10 seconds. If no pong is received, it considers the server dead. And if the server doesn't receive any pings from the client in 1000 seconds, it considers the peer dead. Of course this is only for times when the state is idle; during normal operation there's constant bidirectional communication as well as ACK/NACK packet traffic. Plus software engineering aspects and integration architecture consulting: lots of debugging, hanging threads, non-blocking socket I/O and all the general stuff. Also a lot of discussion about NAT loopback: some devices support it, others don't, and some allow it to be configured freely. It's also known as NAT hairpinning or NAT reflection. I'm hosting several networks which do support loopback, but a few networks don't. It's really annoying, because then services can't be accessed using the name or public IP; you have to know the private IP address to reach the service.
Sometimes the NAT is also doing NAPT and translating the port number, so even the port number might be different on the LAN than for the "rest of the world".
  • Firefox 36.0 release notes - adds support for HTTP/2. After using it for a while, I don't know what they got wrong. This is just like what I cursed about a few posts ago. Shit code is shit and you'll notice it. Firefox totally freezes and hangs while seemingly nothing is happening: network idle, CPU idle, plenty of RAM and disk I/O capacity, etc. But alas, nothing happens. Why? Why? WHY?!
  • Zoompf guys wrote that they doubled their database performance by using multirow SQL inserts.
  • The first thing to remember with UDP is that its addresses can be spoofed, so data shouldn't be sent to a recipient without first verifying that the request is valid. This is exactly what TCP does by its nature. If this step is skipped, it's very easy to make such a program amplify and reflect attacks. I'd just send packets to all OB nodes claiming that some random IP and port requested that huge image. It's very usual to measure the amplification factor: if one 512-byte UDP packet can trigger sending 100 MB, then the amplification factor is roughly 200,000. If there are no measures whatsoever to prevent this (I know there are already at least some window limits), I could use my 1 Gbit/s connection to trigger a 200 Tbit/s DDoS attack easily, and the targets wouldn't know it's me even if I did it from home. This is just a theoretical example, of course. Sometimes even no amplification is enough for attackers; they're happy with just the masking features, so they can use a few servers with high bandwidth to attack a site indirectly, making attack detection and mitigation harder. It's important that the recipient validation is done in a way that can't be spoofed either.
  • Attended Retail and Café & Restaurant 2015 expo / convention / fair / conference. Same stuff as always, self service, mobile apps, RFID, digital signage, loyalty programs and retail analytics. 
  • Had once again an interesting discussion about customer data retention. Whatever information is received will be stored indefinitely and won't ever be removed. So when you use cloud storage, have you ever considered the fact that whatever you store there, you can't ever remove? Did you understand that? Maybe not? But you should really think about it. Yes, there might be a "delete button", but it's just a scam. Nothing is ever removed; it's just hidden from YOUR view. It's still there. These are very common practices and there's nothing new or special about this. Even all the temporary files from back in 2013 are stored. When asked if those could be deleted, the answer was nope, we never delete any data we have once gained access to.
  • Enjoyed configuring Office 365 for one business & domain + installing Office 365 clients as well as configuring email accounts and SharePoint.
  • Replaced CRC32 ETags with ETags based on Python's own hash() using base64 encoding. Computing it is about 7 times faster, and there are plenty more bits available to avoid collisions.
  • Adding HTTP xz (LZMA) content-encoding compression support would also be trivial, but currently no browsers support it.
  • Requirements specification, all that joy. Fixed a few things for an old project. No, 'fixing' is the wrong term; there wasn't anything wrong to begin with. The program worked exactly as specified. But after it had been in production for six months, the customer hit an unexpected situation which created NEW requirements. Then there's all that age-old and boring discussion: should they pay extra, because the integration isn't working? But they just don't get what's causing it not to work. In this case it was an especially boring case. Data is transported over HTTP as XML to another system. The structure is really simple and clear, and there are three systems interoperating via message passing. The problem? Well.
  • For some reason one system, let's call it N, doesn't accept messages from system S which are generated by system W. And the reason? Well, for some undefined reason system N can't handle a tag T whose data contains information for several days, even though there's no reason whatsoever for that restriction.
      <day date="1">
      <day date="2">
    They insisted that there has to be msg for each day, even if there's no technical reason for it and no documentation requires it. Of course this situation creates a problem only when there's data for several days to be delivered.
    So who made the mistake? Me? Them? Nobody? And who's going to pay for it? - All just so typical integration stuff.
    Well, I 'fixed it'. It was naturally trivial to fix, even if I still say that I didn't fix anything, because there was no mistake to begin with. I just close and reopen the message between days. Totally pointless and practically doesn't change a thing, except that now it works.
    The funny thing about these cases is that sometimes it takes months of pointless discussion about how it should be fixed, even if fixing it would take just 5 minutes. Some companies and integrators just seem to be much more capable than others.
    In another case the situation was quite similar, but instead of the date it was the profit center. One major ERP vendor said that it's impossible to handle transactions from multiple profit centers in the same message, even though there's no technical limitation for it. In that case it wasn't even my app generating the data. I wrote a simple proxy which received one mixed message, split it per profit center and then sent the per-profit-center messages forward. Totally insane stuff, but it works. Both parties said that it's impossible to fix such complex things, which made me laugh. One party couldn't generate per-profit-center data and the other party couldn't receive mixed data. I think they both have pretty bad coders. Well, luckily there was someone able to deal with this 'impossible to solve' technical problem in a few hours.
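Both fixes boil down to the same pattern: splitting one mixed message into one message per grouping element. A minimal sketch with xml.etree (the element names here are made up, not from the actual integration):

```python
import xml.etree.ElementTree as ET

MIXED = ('<msg>'
         '<day date="1"><sale id="a"/></day>'
         '<day date="2"><sale id="b"/></day>'
         '</msg>')

def split_per_child(xml_text: str, tag: str) -> list:
    # Emit one message per <tag> child, keeping the envelope element intact,
    # so a receiver that only accepts single-group messages is satisfied.
    root = ET.fromstring(xml_text)
    out = []
    for child in root.findall(tag):
        envelope = ET.Element(root.tag, root.attrib)
        envelope.append(child)
        out.append(ET.tostring(envelope, encoding='unicode'))
    return out

messages = split_per_child(MIXED, 'day')
```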
  • Studied the Bitcoin Subspace anonymous messaging protocol for direct P2P communication. I also wrote about it.
  • CBS got the same problem PBS had earlier:
    "Unfortunately at this time we do no accept any foreign credit cards. In order for you to make a donation you will have to use a bank issued credit card from the U.S."
  • Read about Payment Services Directive (PSD2)
  • Python: Problem Solving with Algorithms and Data Structures - Just read it.
  • Noticed that SecureVNC allows cipher called 3AES-CFB. Yay, AES256 isn't enough? Do we need 3AES already? What about using ThreeFish with 1024 bit keys? 
  • Checked out twister which is distributed p2p implementation of Twitter.
  • Checked out Transip servers in EU - Excellent hosting option like Digital Ocean, Vultr, OVH, Linode and so on.
  • Quru wrote about Stockmann's webshop. - It just sucks. Actually I proved it just yesterday. My friend couldn't make her purchases from the store. I had to make the purchases, because the payment solution was so broken that a standard MasterCard didn't work with it. Nothing happened after entering the credit card information, nothing at all.
  • Now it's clear, the Samsung S6 doesn't even have an SD card slot. This was a really expected move, because even the old phones with an SD card slot were crippled by firmware updates so that the card couldn't practically be used with applications. I just wonder why nobody made a bigger fuss about this. Devices which you have already bought get downgraded via software 'updates', duh! 
  • Python 3.4.3 released
  • Python 3.4 statistics library
  • Studied the Opus audio format - because the latest VLC 2.2.0 supports it. 
  • Peewee ORM ala Charles Leifer - Techniques for querying list of objects and determining the top related item
  • dataset - Super lightweight ORM for Python
  • Python 3.4 tracemalloc - Track application memory allocation in detail 
  • Python PEP 448 - Additional Unpacking Generalizations
  • Google PerfKitExplorer - Track performance across clouds  
  • For some reason the multi-threaded version of par2 seems to crash on my new 16-core system: *** Error in `par2': 6429 Segmentation fault (core dumped) par2 c -u -r10 recovery.par2 *** Most interestingly the crash happens after the Reed-Solomon matrix construction is completed, so there's probably some kind of simple addressing failure somewhere. I'm pretty sure it's a simple bug and not a hardware related issue. It also seems to happen quite often. 
Phew, now my backlog is gone. I did it. Hooray!

Topic mega dump 2014 (3 of 3)

posted Mar 2, 2015, 7:50 AM by Sami Lehtinen   [ updated Mar 2, 2015, 7:54 AM ]

  • Backblaze Hard Drive Reliability Update
  • One investment company didn't use HTTPS for their customer pages in 2014. That's incredible. Many forms were served over plain HTTP and only the results were submitted over HTTPS. When I complained they said it's OK, because the information is submitted over HTTPS. But no user knows or notices if someone edits the page and removes the HTTPS. Nor do you really know where the data is being submitted without checking the source. A MitM attack would easily allow modifying the form sent over HTTP, freely choosing the questions as well as the destination for the form content.
  • Read TorCoin plan.
  • Read a good long post about Amazon's Elasticsearch. - Unfortunately I don't have real use cases for such a system right now. I also consider many Amazon services to be actually quite expensive compared to the competition.
  • Had a discussion about how to learn stuff. My view: "I think it would be better to learn the same skills on something concrete, and be productive while learning, not spending resources only on learning. That's one of the reasons why I now study programming by deciding on a project which requires a certain skill set and level. Then I proceed to build it to at least an alpha or MVP level, which allows me to create something useful while learning the required skills. Yes, this takes more effort than only 'skimming' a book on some specific topic, but then I know the topic a bit more deeply and have hopefully produced something useful while learning."
  • Something different: Semi-automatic transmission, Canard, Tricycle landing gear, Free-piston engine, Wave disk engine - A free-piston engine can be used as a linear generator. In such an engine the only moving parts would be the valves and the piston, with no crankshaft or physical output axle. - Iron Dome, Skyshield, Depleted uranium, Quad TiltRotor, Supervolcano, MANTIS, AMOS, CV90, Rutherford, V-3 cannon, Psychological Warfare, Ballista, Catapult, Trebuchet, Hall effect thruster, VASIMR, Inertial Navigation System, Anti-aircraft warfare
  • Also reminded myself about: Counterintelligence, Countersurveillance, Computer forensics, Forensics data analysis, Distraction, Cover-up, Disinformation.
  • Did a few short tests using Google Cloud Messaging and my phone. I had one specific project in mind, and I found out that the delivery latency as well as the latency jitter were totally unsuitable for the potential requirements of the project. But in general I really like the concept of one messaging solution which can be used to trigger events and so on, which naturally saves a lot of energy compared to running tons of different applications constantly polling something, or even keeping idle TCP connections open (with repeated ping/idle/alive messages), consuming CPU, bandwidth, memory and battery resources.
  • Facebook data center concepts Wedge and FBOSS as well as the disaggregated network. - Kw: switch, configuration management, statistics packages, environmental handling, microserver, modular enclosure, control logic, switching module, Open Compute Project (OCP).
  • Reminded myself about graph databases. - But the definition made me smile: "A graph database is any storage system that provides index-free adjacency." - Hmm? Index: that's a tough definitional question. I would say a "direct pointer" to data is what doesn't require an index. But with current complex systems that definition gets really blurry, because any lookup table could be considered an index, and therefore I believe most current systems simply can't provide index-free solutions: there are already so many layers of indexing on modern systems. If we return to legacy systems, an in-RAM graph database could be something where record A has direct pointers, i.e. memory addresses, to the other records. Inodes in file systems are a good comparison: if the record contains a filename, that's a fail, because looking up an inode by filename requires an index, and if no index is used it means scanning the list of filenames in the directory, which is even worse. In a way I really like legacy programming and C, because many high level systems hide from developers what's really happening; a really simple, naive legacy implementation is much cleaner. Dictionary, hashtable or whatever = index = fail; a direct memory pointer or disk address doesn't require an index. Except that if the data is stored on an SSD or any modern system, there are already multiple layers of lookup tables and indexes, and the same applies to modern operating systems, paged memory and so on. Actually, when you listen to how high level developers describe these modern systems, it might sound like they don't know computing at all, and they might not be able to describe at a low level what's happening and how.
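A toy in-memory illustration of index-free adjacency as described above: each node holds direct references to its neighbours, so traversal follows pointers with no lookup table in between (contrast with an adjacency dictionary, where every hop is a hash-table, i.e. index, lookup):

```python
class Node:
    # Each record holds direct references -- the Python analogue of raw
    # memory addresses -- to its neighbouring records.
    def __init__(self, name: str):
        self.name = name
        self.neighbours = []

a, b, c = Node('a'), Node('b'), Node('c')
a.neighbours = [b, c]
b.neighbours = [c]

# Two hops, zero index lookups; with {'a': ['b', 'c']} each hop
# would instead be a dictionary (index) lookup.
reachable = [n.name for n in a.neighbours[0].neighbours]
```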
  • Read documentation about PostgreSQL / FreeBSD performance and scalability on a 40-core machine
  • Checked out Google Cloud Save, a cloud data storage for Android devices.
  • Google Cloud Platform - Cloud Endpoints. Just an additional layer making App Engine easier to use with Android for developers.
  • Checked out Google Cloud Monitoring - Would this be what's needed for future monitoring of cloud based services? Seems like a bit lighter solution than what I'm currently running. But I liked the way they provide ready installation using Puppet, Chef and Ansible.
  • Google Dataflow - This is something I could use for my ETL tasks if required. Most of those tasks currently run locally on the primary application server, but if there's too much data for that server to process, relocating the whole system into the cloud should be a future proof option. It provides a data pipeline and data transformation layers, which I've currently implemented in my own integration module. Yet I don't really like the fact that it's Java only. I've written all of my latest code using Python 3 and left Java gathering dust where it belongs.
  • Lightly checked out Google Polymer, Web Components, Meteor, and Mozilla X-Tags - This is something I could love: something quite simple to use which makes web UI and application development much simpler. Current solutions with Angular plus web server side stuff plus tons of different JS frameworks combined for the UI side make development a quite complex mess. You have to know many different technologies well, as well as exactly how they fit together. On the other hand, those high level frameworks can add considerable load on the server as well as on the client side. Just like the guys mentioned in the Don't Use Angular post. - No link to a single post because there are multiple good posts on this topic. If I were a JavaScript programmer I might like the concept of Meteor a lot.
    It's a bit like the situation with cross-platform mobile application development: use something like Intel XDK and you'll get one slow, bloated application which performs poorly on all platforms.
    If you got interested, also check out MEAN.
  • HTML Include - This is something I've been wanting since the early 90's. iframe came, but it's not the same as a simple include. Of course there were solutions for server side includes, and template engines do nested includes and stuff like that. But it's not the same as a simple include on the browser side. 
  • It seems that someone else came to exactly the same conclusion as I did: why isn't Brython served by a global CDN, and why isn't it using (even optional) HTTPS? Delivering a JavaScript library to the whole world from one server at OVH, France isn't an optimal solution. The lack of HTTPS and IPv6 isn't so great either. My personal suggestion for a CDN would be cdnjs.
  • Digital Panopticon - You're being watched. What will the future be like?
  • Most popular programming languages 2014. - Python is strong, as is Java, even if I don't love it anymore. - Java seems simple, but you'll end up generating a lot of bloat code, diminishing development joy and efficiency.
  • I finally figured out why some of the stuff I was battling with Peewee ORM, PostgreSQL and Python didn't work at all. The reason? It's very simple and quite a traditional trap with ORMs, especially with dynamic programming languages and databases.
    Peewee ORM - Oh joy. It took me a while to figure out that Python's Peewee ORM handles default and None differently than Python usually does.
    Usually None != True and None != False are True, but with Peewee ORM those won't be True, because None only equals None; for example, None == None is True. Now it's finally clear. It also seems that even if a default value is defined for the Model, it isn't used when the referenced Foreign Key is missing. So you'll need to write X == None or X == False, and only then is it about the same as X != True, even if the default value for X is False. This is especially important to remember when doing outer joins.
    Did I feel stupid after this? Yes I did. It's just like SELECT * FROM table WHERE data = 0 and then you finally figure out that it returns a completely different number of records than SELECT * FROM table WHERE data = 0.0 - isn't that fun? This is exactly why you should know your tools well, or otherwise you'll end up with really nasty surprises. Even basic unit testing won't catch these unless you specifically acknowledge that you should test for such cases. I assume part of the problem is the fact that Peewee ORM doesn't have an exact NOT operator; the ~ used by Peewee is only about the same.
    Of course there are silly workarounds for the previous problem: I could ask for the count of matching records, and if it's 0 then it's the same as == None, but that's silly. As is compiling a sublist of potential join entries and then asking if key not in (sublist), which also excludes records that don't have references. Both of these solutions do work, but are quite non-optimal. Isn't this just what normal programmers do? Now it works, fine, let's continue. Even if the solution is slow, crazy and doesn't make any sense.
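The root of this trap is SQL's three-valued logic, which the ORM inherits: NULL never matches ordinary comparisons, so "not equal" silently skips NULL rows. A quick sqlite3 demonstration (table and column names are just for illustration):

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE t (flag)')
con.executemany('INSERT INTO t VALUES (?)', [(1,), (0,), (None,)])

# "flag <> 1" skips the NULL row entirely -- only the 0 row matches.
ne = con.execute('SELECT COUNT(*) FROM t WHERE flag <> 1').fetchone()[0]

# To include NULLs you must ask for them explicitly, just like writing
# X == None alongside X == False in the ORM.
ne_or_null = con.execute(
    'SELECT COUNT(*) FROM t WHERE flag <> 1 OR flag IS NULL').fetchone()[0]
```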
  • Reminded my self about Enterprise Service Bus (ESB) stuff. I'm actually quite glad that many customers select simple, lightweight and more efficient integration methods. Some customers even clearly say that we have that ESB but well, let's just make this work and not use it. Smile. That fits quite well to my current view of avoiding bloat and overhead when it isn't absolutely required.
  • Tor exit node operator prosecuted in Austria. - This battle with Internet freedom and Surveillance will be long, we're living interesting times.
  • hubiC - Excellent European cloud file storage service with a bit better pricing than Box, Dropbox and many other alternatives provide. Data is also stored in three separate data centers for storage reliability and availability.
  • I thought that email would soon be a thing of the past. But it doesn't seem to be that way. New email clients are popping up all the time. Mailpile is one of those.
  • Python 3.4 asyncio with examples - A nice post about the new features. This is also one of the reasons I'm not using the (whatever) pipe / queue solutions from (whatever) providers. When servers are clustered together with great interconnectivity, it's pointless to pass data via the cloud, adding bandwidth costs and latency. And because it's so simple using Python alone, I don't want to mess up my projects with additional, needless dependencies. Those should be brought in only if they offer some killer advantage over the existing solution, which they currently do not. This is exactly the reason why most of my projects also use SQLite3 and only some use PostgreSQL.
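As a reminder of why plain Python often suffices here, a minimal asyncio sketch (using the modern asyncio.run syntax, which is newer than the 3.4 style the post discusses; fetch is a made-up stand-in for real I/O):

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Simulated I/O-bound task: sleeping stands in for a network call.
    await asyncio.sleep(delay)
    return name

async def main():
    # Run both "requests" concurrently on a single thread.
    return await asyncio.gather(fetch('a', 0.01), fetch('b', 0.01))

results = asyncio.run(main())
```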
  • Whoa, Hotmail and Outlook finally support SMTPS (TLS/SSL) SMTP transport. - I wonder why it took so long.
  • Google Compute Engine is providing persistent SSD disk storage as well as global load balancing. - Which is nice.
  • Vultr seems like a good competitor for Digital Ocean. Based on quick tests they provide an even better cost/performance ratio than Digital Ocean.
  • NSA targets the privacy-conscious - An even more interesting development; maybe we do have something to hide? But who's "we"? Maybe NSA will find out, maybe not.
  • I thought about messaging client which would use DHT for data storage. Everything in the DHT storage would be encrypted and all data would pass via DHT storage using pseudo-random data access patterns. In some cases even the encryption key itself could be used for covert messaging. The payload is basically meaningless, it's all about the key which could be used to decrypt it successfully.
  • PyCon 2014 - Multi-factor authentication, Postgres Performance for Humans
  • One guy said in a tech talk that his job is to do all the tasks that the engineers can't get done. - Made me lol so much - I don't know why this sounds so familiar. - My work is to be a kind of SWAT team or special unit for when the other departments just can't handle it. - It's good and bad, because you're going to get all the very problematic cases to solve, which might require long monitoring, deep analysis, extensive logging and so on. (I'm actually working on one such complex case right now (Feb 2015). The issue has been analyzed for several months by others, but without any real results. I guess I'll have to dig deeper than that.)
  • Checked out Google Drive Pricing and compared it to hubiC - Yes there are price differences.
  • COMSEC - Communications Security
  • It's important to have certain arrangements made beforehand, allowing you to maintain the capability to communicate securely even in a real major crisis: private out-of-band communications using multiple separate channels, without the need to rely on existing networks like mobile phones or Internet connectivity. - It's also a good idea to have a few anonymous Internet connections using 4G data.
  • I guess people with COMSEC, INFOSEC, privacy, covert communication, system administration and good general IT knowledge and skills can be dangerous... if there's just a motivation for nefarious intent. But why bother if there's no good enough reason?
  • Cheap cloud services and optimized code could be easily used to generate such a flood of messages to Bitmessage system that it would overwhelm most of network peers. I don't know if proof of work is the right way to securing and limiting network resources in the future.
  • NSA classifies Linux Journal readers, Tor and Tails Linux users as "extremists" - Are Linux users really that dangerous?
  • Maintaining covert identities is hard, really hard. It doesn't require anything else than a simple habit based fail to ruin it all. It's something that needs to be practiced a lot to learn. If you just read about it and try it, you're going to fail, badly.
  • Actually I came up with this before the "Lorem Ipsum" stuff came out. My plan? A simple application which generates ciphertext which is then translated into viable looking normal plain text, so it wouldn't trigger "encrypted communication" alarms. The program should have pluggable dictionaries and language modules so it could be used with multiple languages. It's a kind of steganography. The first point of the whole thing is not to trigger any suspicion at all.
  • Turned on NLA, TLS and 128-bit encryption for all systems when using RDP. - For some strange reason this prevents Remmina from connecting. I guess it has something to do with the high cryptography requirement, because Remmina does support TLS and NLA.
  • Are privacy enhancing tools a pro or a con? Maybe using some simple basics could keep you off the radar, instead of well known yet efficient tools which arouse suspicion? I was thinking of building a really simple text steganography tool just for fun: embedding messages in text using the c&w method with compression, encryption and ECB. The result is text which doesn't seem suspicious but still contains a strongly secured message. Depending on the fill-in text, statistical analysis would of course pretty easily reveal that something is going on. These questions apply to any privacy tools: if you're trying to keep things private and secret, you must have something to hide, right? Especially when privacy tools aren't so commonly used, it really sticks out when someone is using high grade privacy tools.
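Not the c&w method itself, but to illustrate how simple text steganography can be: a toy sketch hiding one bit per word gap (single vs. double space) - exactly the kind of scheme that trivial statistical analysis would reveal:

```python
import re

def hide(words: list, bits: str) -> str:
    # '0' -> single space, '1' -> double space between cover words.
    gaps = ['  ' if b == '1' else ' ' for b in bits]
    gaps += [' '] * (len(words) - 1 - len(bits))  # normal spacing for the rest
    return ''.join(w + g for w, g in zip(words, gaps)) + words[-1]

def reveal(text: str, nbits: int) -> str:
    # Read the gap widths back out as bits.
    gaps = re.findall(r' +', text)
    return ''.join('1' if len(g) == 2 else '0' for g in gaps[:nbits])

cover = ['the', 'quick', 'brown', 'fox', 'jumps']
stego = hide(cover, '101')
```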
  • Stego - Text Steganography tool.
  • *** different attacks and stuff like that... False Flag strikes? Who gets the blame game?
  • Subliminal channels - A way to pass communication over unencrypted links. Just like the time stamp modifications with PW.
  • Canary Trap - Creating different documents for different recipients to see which one leaked.
  • Charge Cycle - Battery tech, how many charge cycles can your batteries take?
  • KW: EDI envelope, SOAP envelope and Finvoice envelope.
  • JSON Resume standard - Nice way for hackers to represent data in consistent way?
  • Tried the Windows IP Ban service, but it didn't work out as well as it should have. Didn't like it.
  • Xiki - Improved (Amazing?) shell - Had to play with it, but didn't see a need for it being used for daily operations.
  • Credit Card Skimming - List of different kinds of modern (?) skimmers. It's so silly that the magstripe is still being used.
  • Parsing the Accept-Language header using Python. I didn't use that one, I wrote my own version: it takes the list, sorts it by preference and then finds the first match in my available languages list.
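A sketch of that approach (the function name and exact parsing details are mine, not the code I actually shipped):

```python
def pick_language(accept_language, available):
    # Parse "fi;q=0.8, en;q=0.9" style headers, sort by q-value
    # (default weight 1.0), return the first servable language.
    prefs = []
    for part in accept_language.split(','):
        piece = part.strip()
        if not piece:
            continue
        lang, _, q = piece.partition(';q=')
        try:
            weight = float(q) if q else 1.0
        except ValueError:
            weight = 0.0
        prefs.append((weight, lang.strip().lower()))
    for _, lang in sorted(prefs, reverse=True):
        if lang in available:
            return lang
    return None
```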
  • Python 2.x vs 3.x survey results
  • It's known that comparing cloud service pricing is really hard, sometimes nearly impossible. Some providers charge a lower price and yet provide 10x the performance. It's interesting to notice how bad the performance AWS is actually providing is. If you compare AWS prices to Hetzner prices, the difference is mind blowing. 
  • It's just horrible how many people won't take proper care of their PGP/GPG keys; when the hard drive crashes they just generate new keys and assume that everyone should trust those right away. Sounds like a really bad practice.
  • Hacking Government Surveillance Malware - Totally awesome story including technical details!
  • Storing personal names - First name, last name: a good idea? Well, it isn't. That's why I use only a single Unicode field for the name.
  • SSL CA information shouldn't be trusted - No news here
  • Reminded myself about Kaizen - That's something everyone should follow automatically.
  • Kaikaku - Disruptive innovation and change / pivot
  • XG-Fast - 10 Gigabit links over copper. But as logical drawback distances are getting quite small.
  • KW: Enterprise Resource Management (ERM)
  • Open Data - Simply put, "personal data should belong to the people". If I store my data in some service, why can't I download it all easily?
  • Python is now the most popular introductory programming language in top universities
  • Yet another file storage service. Amazon WorkDocs.
  • PyCon Taiwan @ YouTube
  • Amazon Cognito - Similar service compared to Google Save. Easily store application data for users in the cloud.
  • There is a clear bug in the Deluge BitTorrent client: the per-file connection limit doesn't work properly.
  • OSPFv3 vs OSPFv2, what's different? - Really nice post. I haven't used OSPFv3 yet, but this was a good intro; it's important to know that there are new LSA types and the possibility of multiple instances over the same link.
  • SQLite: Small. Fast. Reliable. Choose any three - Excellent post about SQLite3
  • Google Noto fonts for all languages.
  • Studying lossy image compression efficiency - One of my favorite topics. It remains to be seen if JPEG finally gets a viable alternative. I've also read about the JPEG patent fights; some open source projects are worried about JPEG patents. Well, I wouldn't miss JPEG, and there are already better options like WebP and BPG, which unfortunately just haven't received wide adoption. Here's an excellent image compression comparison site.
  • Is your application ready to handle CJK chars? Should be if it's UTF-8 compatible and uses right fonts, but there might be some traps. Like string length limits and so on.
  • In Finland we also see mojibake often, because some systems print UTF-8 ÖÄÅöäå chars as ASCII, leading to interesting results. Anyway, the post offices are really good at deciphering those.
  • Shift JIS - Luckily we're not using anything like that in Finland. But this actually reminds me of the time when I wrote a Code-128 barcode encoder. Code-39 and Code-128 both allow (and require, for efficiency) shifting between different encoding modes called A, B and C. Basically there's a shift for a single capital letter and then a 'caps lock' style mode switch which stays in the other mode until told otherwise. Modes include lower case, upper case and a double-digit mode for compression, which allows encoding two digits per barcode symbol.
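The double-digit mode C compression mentioned above can be sketched like this (symbol values only; a real Code-128 encoder also needs start codes, mode shifts and a checksum symbol):

```python
def code128_c_pairs(digits: str) -> list:
    # Mode C packs two decimal digits into each barcode symbol,
    # halving the symbol count for numeric data.
    if len(digits) % 2:
        raise ValueError('mode C needs an even number of digits')
    return [int(digits[i:i + 2]) for i in range(0, len(digits), 2)]
```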
  • Unihan Han Unification - A way to get slightly different Asian symbols to use the same code point and visual representation instead of having different symbols for each.
  • A bit faster SSDs from Fusion-io: ioDrive Octal drives - Made me smile. Yet I don't have any use for such high end stuff.
  • iosnoop - Excellent tool for snooping disk I/O latencies per process. I've been using this on some servers whenever I suspect I/O related issues. Especially when using VPS servers, disk I/O can really tank from the level you'd expect.
  • Got a bunch of GTIN codes for one project.
  • Everyone uses ISBN-13 nowadays, but it wasn't always like that. Back in the day I had to write encoders/decoders converting EAN-13 to ISBN and back.
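The ISBN-10 to EAN-13 direction is a nice small exercise: prefix with 978, drop the old check digit and recompute the EAN-13 check digit. A sketch (function name is mine):

```python
def isbn10_to_ean13(isbn10: str) -> str:
    # "Bookland" conversion: 978 + first 9 ISBN digits + new check digit.
    body = '978' + isbn10.replace('-', '')[:9]
    # EAN-13 check digit: weights alternate 1, 3 from the left.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(body))
    check = (10 - total % 10) % 10
    return body + str(check)
```

For example, the well-known test ISBN 0-306-40615-2 converts to 9780306406157.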
  • How to be happy - I hope you're already happy, so you don't need to read this.
  • I'm very used to databases which provide full MVCC / snapshot isolation. It was very good that I always test all critical sections separately: I found out that some older and simpler databases require additional LOCK TABLE statements. Without those, simply starting a transaction doesn't provide any protection from other committing transactions. So the database doesn't provide read repeatability without additional locking.
    Actually, read repeatability is not even the same as snapshot isolation, because it only locks the rows you have read so far. So if your transaction consists of multiple separate reads, it's possible that those reads do not give you a uniform image of the database as it was when the transaction started.
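With such databases the workaround is to take the lock up front, before reading. A sqlite3 sketch of the idea (SQLite's BEGIN IMMEDIATE plays the role of the extra LOCK TABLE statements; the table is made up):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so we
# control transaction boundaries explicitly.
con = sqlite3.connect(':memory:', isolation_level=None)
con.execute('CREATE TABLE balance (amount INTEGER)')
con.execute('INSERT INTO balance VALUES (100)')

con.execute('BEGIN IMMEDIATE')  # reserve the write lock before reading
amount = con.execute('SELECT amount FROM balance').fetchone()[0]
con.execute('UPDATE balance SET amount = ?', (amount - 10,))
con.execute('COMMIT')

result = con.execute('SELECT amount FROM balance').fetchone()[0]
```

Because the lock is taken before the SELECT, no other writer can commit between the read and the update.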
  • Canvas Fingerprinting - Almost impossible to stop network tracking. Well, it is possible to block it: don't run the scripts in the first place.
  • Terminal - Yet another Linux virtualization management tool
  • Reminded myself about protobuf, even if I don't have use cases for it. Also checked out Transit, which can encode/decode MessagePack or JSON formats. The world is so full of these 'solutions'.
  • Why blurring sensitive information is a bad idea. - This should be also quite obvious to everyone.
  • StartUp mistakes you shouldn't ever make.
  • hubiC finally fixed their upload speeds. I've been avoiding using it because upload speeds have been so lousy, less than 1 Mbit/s, but now I'm uploading at 100 Mbit/s, which is good enough. 
  • Ekahau Spectrum Analyzer - Yes, it's just as cool as it sounds, and it does the job. Most guides on avoiding WLAN / WiFi congestion and interference are quite bad, because most people don't realize there are many sources other than WLAN networks. Also, one heavily used network can be much worse than 10 lightly used ones. Or there might be a reason why there aren't WLAN devices on the channels used by a wireless video surveillance system, and so on.
  • One project was designed to use WebServices a long time ago. But back then it was concluded to be hard, nearly impossible. What resulted was that the project did silly things. It dumped the changes to be replicated to other databases into one table. Then this table was dumped as XML files on disk. Then one client compressed these XML files into a ZIP file. Then there was a client which polled bidirectionally for these ZIP files and transferred them over an encrypted (of course DIY encryption and implementation) TCP connection. On the other end basically everything happened in reverse. When you think about this complex chain, and the somewhat bad code which doesn't lock files properly, doesn't check file integrity and randomly fails, you've got an excellent and reliable data transfer solution. Ehh, let's say NOT. All this because directly transferring the data would have been 'too complex and unreliable'. It just managed to add 10x overhead and even more unreliability. But we all know this is business as usual, and this kind of stuff happens over and over again.
  • Planned Obsolescence - Great for consumerism, but bad for the environment. It's also a good policy for the software business. It could be hard to charge high maintenance fees unless customers feel that maintenance is needed. If everything worked without continuous manual fixing, customers might feel that it just doesn't make sense to pay maintenance fees.
  • Finished watching lecture series Thinking Like an Economist (TTC).
  • Reminded myself about Markov chains. The finite-state machine is also closely related. - Sometimes some programs just seem to feel more like infinite-state machines. Wait, what? That's because there's a nearly infinite number of different ways to fail.
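A minimal Markov chain sketch, since it's such a small thing to write down (the states and transition probabilities are invented for illustration):

```python
import random

# Transition table: current state -> {next state: probability}.
TRANSITIONS = {
    'sunny': {'sunny': 0.8, 'rainy': 0.2},
    'rainy': {'sunny': 0.4, 'rainy': 0.6},
}

def walk(start: str, steps: int, rng: random.Random) -> list:
    # Memoryless walk: each step depends only on the current state.
    state, path = start, [start]
    for _ in range(steps):
        states = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in states]
        state = rng.choices(states, weights=weights)[0]
        path.append(state)
    return path

path = walk('sunny', 5, random.Random(42))
```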
  • One integrator has the lamest debugging tools I've seen so far. They used a program to dump communications in hex, but then: no automatic extraction or analysis at all. They had printed papers with the packet formats, and then he used a calculator to manually convert between hex, dec and bin. Debugging took long and their team seemed frustrated. I don't have anything else to say about this, but I was a bit aghast. As you can see, there are different levels: some things seem just bad, and then some cases are actually insanely bad. 
  • Decentralization, I want to believe - It has been seen over and over again that people don't want and don't care about decentralized systems. A major problem is that decentralized systems are basically mobile hostile. Some companies have used these solutions to limit the load on their servers, pushing the burden to clients, which are then unhappy about it. Clients can consume a lot of CPU time, memory, disk space and disk access, cause a lot of network traffic, and potentially be used to generate DDoS attacks or malicious traffic etc. All are great reasons not to use decentralized solutions. People also seem to totally forget that things like email are already decentralized!
    Zero configuration is also basically impossible, because you have to bootstrap the network somehow. Fully decentralised solutions still require bootstrap information, which is unfortunately hard enough for many users and therefore works as an efficient show stopper.
    The last nail in the coffin is that most people really do not care about security at all. The user base is, after all, just a small bunch of delusional geeks.
    Otherwise, if people really preferred decentralization and secure communication, something like RetroShare and Bitmessage would be widely used.
  • Telehash - Yet another decentralization protocol 
  • Tor Traffic Confirmation Attack - Carefully studied the article
  • Remy - Even more TCP congestion control, except this one is so complex it's not actually viable. But it's interesting to see that really complex computer generated rules can outperform simpler solutions.
  • Read about QUIC. But no time for this kind of stuff now. Hopefully it will be out in the future.
  • Internet censorship is progressing, Russia passed new laws. No link, you'll find it if you're interested.
  • Some D-Link firewalls forward WAN UDP DNS queries to the ISP. Really nice: works well for DNS DDoS amplification attacks, even with spoofed addresses. No wonder some ISPs have been complaining about this. The devices are really easy to exploit.
  • IBM is building brain-like CPUs with 4096 cores.
  • IBM Research Report - Comparison of Virtual machines and Linux Containers (Native, LXC, Docker, KVM) - Yeah, virtualization is expensive. Yet another reason NOT to run "cloud" at all, if it's not required. It's better to run full servers with your software and proper automation and configuration management. Adding virtualization to this mess just lowers performance and adds costs.
  • Windows 8.1 tablets with InstantGo are really annoying if you're trying to save power. Sleep and Hibernation do have real role even with tablets.
  • What happens if you write TCP stack in Python - Nothing to add, except it seems that he wasn't quite up to the task.
  • How to validate your business idea by testing
  • Is there an anonymous call forwarding service which could use prepaid from multiple operators? You (A) call number B (forwarding service), and the call is forwarded via C (outbound forwarding service) and finally to D (final destination). This would make tracing calls much harder, especially because you can switch A-B and C-D independently. But because this is near real-time forwarding, it would have similar traffic confirmation characteristics to a VPN provider or Tor relay. Even if you can't directly link the A-B call to the C-D call, you can do it via statistical analysis of calls and timestamps.
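    The statistical linking is simple to sketch: pair each incoming leg with the outgoing leg that starts right after it. All names, timestamps and the delay window below are invented for illustration.

```python
def correlate_calls(inbound, outbound, max_delay=5.0):
    """Pair each inbound leg with the outbound leg starting soonest after it."""
    pairs = []
    for caller, t_in in inbound:
        candidates = [(callee, t_out) for callee, t_out in outbound
                      if 0 <= t_out - t_in <= max_delay]
        if candidates:
            callee, t_out = min(candidates, key=lambda c: c[1])
            pairs.append((caller, callee))
    return pairs

# The forwarder adds only ~1 s of switching delay, so the legs line up
# even though the A-B and C-D phone numbers look completely unrelated:
inbound = [('A1', 100.0), ('A2', 250.0)]
outbound = [('D1', 101.2), ('D2', 251.1)]
print(correlate_calls(inbound, outbound))  # [('A1', 'D1'), ('A2', 'D2')]
```

    With more traffic the matching gets noisier, but over enough calls the pairing still falls out statistically, which is exactly the traffic confirmation problem.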
  • Tor relay proxy with intentional latency? Would this be a good idea? At least it could be used with Tor SMTP: a store and forward service which adds delay on purpose to skew statistical traffic confirmation analysis, and could also alter the message size by expanding it with padding or by dropping extra padding.
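    A minimal sketch of such a store-and-forward mix, assuming a simple length-prefixed padding scheme of my own invention (not any real Tor feature): every message is padded to a fixed block size and held for a random delay before forwarding.

```python
import random

BLOCK = 1024  # pad every message up to a multiple of this many bytes

def pad(message: bytes) -> bytes:
    """Length-prefix the message, then zero-pad to the next BLOCK boundary."""
    body = len(message).to_bytes(4, 'big') + message
    padded_len = -(-len(body) // BLOCK) * BLOCK  # ceiling division
    return body + b'\x00' * (padded_len - len(body))

def unpad(padded: bytes) -> bytes:
    """Recover the original message from its padded form."""
    n = int.from_bytes(padded[:4], 'big')
    return padded[4:4 + n]

def forwarding_delay(rng: random.Random, mean=300.0) -> float:
    """Random store-and-forward hold time (seconds) to blur timing analysis."""
    return rng.expovariate(1.0 / mean)

msg = pad(b'ASCII armored PGP message here')
print(len(msg), unpad(msg))  # 1024 b'ASCII armored PGP message here'
```

    Uniform message sizes remove the size signal and random hold times smear the timing signal, which is what defeats the simple timestamp correlation above.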
  • How hackers hid a botnet in Amazon - Well, if there are free resources, even small ones, which can be automatically harvested, it creates great potential for abuse. That should be pretty clear.
  • Watched two documentaries, one about Israeli intelligence services and another about Ukraine and Syria.
  • In one security audit for someone: 1/4 (25%) of database servers facing the Internet used the default login & password. Was direct database access blocked by firewall? Of course the answer is no.
  • Studied Netvisor Web Service REST API for system integration.
  • OFF System (OFFSystem) - Anonymous P2P - Storing only non-copyrightable data - I actually studied this years ago; I just forgot to write about it. It raises interesting questions, especially: if I XOR two movies together and release the diff, what exactly am I releasing? This blog post actually contains several high security EC keys. What? Yeah, you'll just have to XOR it with a 'random set of bits'. Lol.
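    The XOR trick is trivial to demonstrate: the "diff" of two files looks like noise on its own, yet XORing it with either input reproduces the other. The byte strings below are just stand-ins for the movies.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

movie_a = b'movie one bytes!'          # stand-in for copyrighted work A
movie_b = b'movie two bytes!'          # stand-in for copyrighted work B
diff = xor_bytes(movie_a, movie_b)     # meaningless noise by itself

# Whoever holds the diff plus either file can reconstruct the other:
assert xor_bytes(diff, movie_b) == movie_a
assert xor_bytes(diff, movie_a) == movie_b
```

    So the diff block is simultaneously "part of" both works and neither, which is exactly the legal ambiguity the OFF System plays on.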
  • Microsoft is going to give data to US agencies, even if the users are foreign and the data is not stored in the US. So if you think using MS European data center(s) provides privacy, you've got it wrong. This is going to set the lines for how much US cloud service companies can be trusted in the future. Trust is already very weak.
    It's quite likely that the same unfortunate rules apply to Google, Yahoo, Twitter and Facebook as well. The great question is whether it's enough that the company hosting the servers is American. If a small European business uses Amazon servers in Europe, is all its data still US fair game?
    It became evident from the news that Google scans emails and attachments really carefully and reports to authorities. Can that also be extended to other programs? Technically, sure. Wouldn't it be great if the operating system, anti-virus tools, NAS devices, etc. directly reported pirated content to the RIAA; it would save them a lot of trouble.
  • About some of the Tor node busts - So many fails. First they failed to use Whonix or similar separation which forces all traffic to go only through Tor. Secondly, you shouldn't ever mix normal (daily use), secure (secret / confidential), and anonymous (no identifying information whatsoever) systems. All of those should be completely separate, down to the hardware level, preferably with individual Internet connections. For secure systems it's a good idea to use separation with extremely limited connectivity (an RS serial cable in my case); it's enough to pass ASCII armored PGP messages. For anonymous systems you'll use prepaid data with a burner phone and also replace the hardware from time to time. You'll also boot the system from read-only media, so whenever you reset it, it'll be clean again. But if you're lazy and don't care, it's easy to fail. If you use your normal system for all three settings, the results will probably be pretty bad. Also always check signatures; without signature checking it's trivial to hand you a version which contains, well, whatever.
  • Google play gives really bad UX when some updates keep getting installed automatically even if you try to uninstall and disable those apps. 
  • Quickly checked out Azure DocumentDB
  • Technology 2014 Hype Cycle Map
  • Sonera (Finnish telco) failed basic access control when providing free benefits. They sent a text message (without any code) saying that with this text message you'll get a 10€ benefit. Great, but since they didn't include any unique code in the message, the same benefit could be claimed multiple times.
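    What was missing can be sketched in a few lines: issue an unguessable per-message code and burn it on first redemption. The in-memory set is just for illustration; a real system would persist the codes.

```python
import secrets

issued = set()  # in-memory for illustration; persist this in a real system

def issue_code() -> str:
    """Generate an unguessable one-time code to embed in the message."""
    code = secrets.token_hex(8)
    issued.add(code)
    return code

def redeem(code: str) -> bool:
    """True only the first time a valid code is presented; then it's burned."""
    if code in issued:
        issued.remove(code)
        return True
    return False

c = issue_code()
print(redeem(c), redeem(c))  # True False
```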
  • Cloudflare now supports WebSockets - Yay!
  • Offline first is the new mobile first? - This is good and bad development. Some offline-first sites are actually quite OK to use after the initial loading, but using such technology could make the initial load ridiculously slow for visitors who aren't using the site daily. Been there and seen that happening.
  • Optional static typing for Python? - Is it worth it for the speed benefit? I guess it is in some cases, because the benefits can be drastic where it matters.
  • I thought I would write more about DNS-Based Authentication of Named Entities (DANE) - It seems that not everyone is happy with DANE. Anyway, I have a good friend who's able to do DANE stuff; if you need such services, let me know. Yes, I did read RFC 6698.
  • Something different: Active Protection System [1, 2], Optimal Control, Sliding Mode Control, VA-111 Shkval (Supercavitating torpedo), Chemically strengthened glass
  • I thought I would write more about LclBd scoring logic, but there's nothing to add. It factors in location, time, tags and users, weighted by a Naive Bayesian implementation. Based on that it picks the latest local news you're probably interested in, due to the tags used in the post or because you've previously liked posts from the poster. Negative weights are also available, so you can dislike stuff; that's something Google+, Facebook and Twitter won't allow you to do, and I don't like that. I want to be able to not like things. ;) Actually the currently deployed version has such bad usability issues that they cast a serious shadow over the whole thing. Maybe I'll be in the right mood to fix those some day when it's raining and dark. Is it worth it? Well, most probably it isn't. It's just some hobby tinkering.
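    The weighting idea, negative weights included, can be sketched with a toy scorer. The tags, weights and posts below are made up; the real LclBd implementation is Bayesian and more involved.

```python
# Hypothetical per-user tag weights learned from earlier likes/dislikes;
# a negative weight means the user disliked posts carrying this tag.
tag_weights = {'opendata': 1.2, 'python': 0.9, 'sports': -0.8}

def score_post(tags, poster_weight=0.0):
    """Sum the per-tag weights plus a per-poster weight; higher ranks first."""
    return poster_weight + sum(tag_weights.get(t, 0.0) for t in tags)

posts = [
    ('match results', ['sports']),
    ('local hackathon', ['python', 'opendata']),
]
ranked = sorted(posts, key=lambda p: score_post(p[1]), reverse=True)
print([title for title, _ in ranked])  # ['local hackathon', 'match results']
```

    The dislike case is just a tag whose weight went negative, which is exactly the knob the big social networks don't expose.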
  • Cell phone guide for US protesters updated 2014 edition - It's all there, how to use your mobile phone. 
  • It seems that Skype dropped support for old non-cloud-based clients and is now forcing everyone to use their cloud storage and relay services. They also forced Ubuntu & Linux users to update Skype.
  • Submarine Cable Map - A great resource if you want to know where Internet is flowing under the sea. 
  • Intel released the first 8-core desktop CPU with 16 threads and DDR4 in the i7 series.
  • What are UTM tracking codes
  • Seamless WiFi roaming - Nice! You can configure many end devices to scan more often so they roam sooner; of course this doesn't make roaming seamless, but it's good enough. I'm actually curious whether this seamless roaming is actually seamless. It could be, but I don't see any proof, except that they claim it's genuine seamless roaming, which would be pretty cool. Genuine seamless roaming would mean there's no hiccup of any kind when switching base stations: the user wouldn't notice anything at all. Most often that's not required, but it can be beneficial if it's available.
  • The Skyline Problem - Yet another seemingly very simple programming challenge to tackle.
  • Mobile Privacy: Use only clean prepaid phones, and do not call outside the closed circle of those clean, unidentified phones. It would be interesting to analyze such data and correlate it with other calls and phones active in parallel. I guess even in this kind of situation you could detect users who carry those anonymous phones together with other, identified phones, and spot where they're relaying information when using the alternate phone. So even this isolation trick won't provide you real privacy.
  • Similarly, many people seem to think that SSL provides security. But no, it doesn't. It only encrypts message content; it doesn't hide communication patterns. So when you open a web page and its resources are downloaded, all of those downloads can be monitored. When you then compare them to the different possibilities available, it's quite possible to know exactly what page you opened, even if the encryption wasn't broken.
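    A toy version of that traffic analysis: even with TLS, the sequence of response sizes leaks which page was fetched. The page profiles and sizes below are invented numbers.

```python
# Pre-built "fingerprints": resource-size sequences observed per page.
page_profiles = {
    '/home':    [14200, 3100, 880],
    '/login':   [5200, 880],
    '/article': [22100, 3100, 45000],
}

def guess_page(observed_sizes, tolerance=100):
    """Return the page whose resource-size sequence matches the observation."""
    for page, sizes in page_profiles.items():
        if len(sizes) == len(observed_sizes) and all(
                abs(a - b) <= tolerance for a, b in zip(sizes, observed_sizes)):
            return page
    return None

print(guess_page([5230, 850]))  # '/login'
```

    The observer never decrypts anything; the sizes and ordering alone identify the page, which is why padding and multiplexing matter.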
  • Higher level dynamic programming, generic specific code, which prevents reuse of code, Non-uniform Memory Access (NUMA), multiprocessing, multithreading, shared nothing and so on. - I was supposed to write about this, but no can do right now. 
  • Absolutely great post: Visualizing Garbage Collection Algorithms - You just gotta check it out. 
  • I've been writing a lot of stuff using MarkDown (MD) lately. See CommonMark
  • BankAPI - A secure solution specification for delivering messages between banks and other type of financial institutions. 
  • I would love to write about the early days of the Internet, when I used Windows 3.11 and Trumpet Winsock and stuff like: SLiRP, SLIP, PPP, packet traces, TCP flags, TCP window, RST, ACK, SYN, and other stuff I learned already back then. I miss my 14.4k modem. No, not really.
  • Samsung Galaxy email app gets ridiculously slow at times. Deleting cache helps. Bad code doesn't show up, until it does.  
  • Actually I don't know why, but running par2 on my computer for some reason makes it incredibly slow, even though there seems to be no reason for that. Maybe it's memory contention? But AFAIK that should show up as CPU time; maybe it just doesn't on my current platform.
  • Boxcryptor pre-encrypts data before transferring it to the cloud - Now, I've just used 7-zip and GnuPG for this very successfully earlier, without any problems.
  • Studied Universal Description, Discovery and Integration (UDDI) - No, I'm not currently using it, nor do I see it being needed in the future.
  • Sovereign @ Github - Tested it and I can't say it better than they do: "A set of Ansible playbooks to build and maintain your own private cloud: email, calendar, contacts, file sync, IRC bouncer, VPN, and more." 
  • Mail-in-a-Box - Yet another alternative, if you don't mind configuring everything by yourself (as I did).
  • My generic guidelines for my own code: reusable, simple, use pre-existing solutions, Keep It Simple Stupid (KISS), only make optimizations when actually required. Aka focus on what really matters. Keep the project profitable and relatively cheap. - I know a couple of guys who can spend months optimizing code that gets run monthly and takes 5 minutes to run. Is that wise?
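    The monthly-job case is easy to put numbers on. The hourly rates below are illustrative assumptions, not real figures.

```python
def payback_years(dev_hours, runs_per_year, minutes_saved_per_run,
                  dev_rate=80.0, machine_rate=0.50):
    """Years until saved machine time pays back the optimization effort."""
    cost = dev_hours * dev_rate                      # one-off labour cost
    yearly_saving = runs_per_year * (minutes_saved_per_run / 60) * machine_rate
    return cost / yearly_saving

# Two months (~320 h) of work to shave 4 minutes off a job that runs monthly:
print(round(payback_years(320, 12, 4)))  # 64000 -> tens of thousands of years
```

    Even with generous assumptions, the break-even point is absurd; that's what "only optimize when actually required" means in practice.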
  • Studied BtrFS wiki - It seems that I just like reading it over and over again.
  • WPS Wi-Fi router security is in some cases ridiculously bad. OK, WPS security is always bad; don't use it. This is nothing new. The whole protocol has been broken all the time since its very beginning.
  • I'm not providing you enough interesting links? OK, you asked for it. Here's a great list of Recommended Reading - compiled by someone else. Just enjoy reading all that stuff.
  • Checked out: The Payment Application Data Security Standard (PA-DSS), formerly referred to as the Payment Application Best Practices (PABP)
  • Was the Silk Road bust assisted by NSA? Maybe? Who knows. Where are the packet logs? - And story goes on: FBI's explanation
  • JSON Web Algorithms (JWA, JWS, JWE, JWK) - Standards for JSON encryption and signing. - JSON Web Tokens (JWT)
  • Just my short thoughts: "transport sftp, ftp, sftp, RESTful, HTTPS and data sources xml json csv sql mongodb key value storage or any other. Data source is just data source and I'm sure I can deal with it." 
  • DevOps Days Helsinki - Like I've written earlier, DevOps isn't the ultimate solution, because DevOps engineers lack the set of other skills needed to sell, define, offer, build, maintain and support systems.
  • This is exactly what I've been writing. Poor UI Design Can Kill
  • Just quoting some DevOps stuff (translated from Finnish): 'The basic idea is to bring the traditionally separate developers and system administrators into close cooperation. It's a major shift, thanks to which the pace of software production accelerates, quality improves and costs drop. "The essential thing is to understand that DevOps is above all about the business process. The fundamental idea of DevOps is to turn an idea into a product as efficiently and quickly as possible", Stordell describes his own view of the nature of DevOps.' - Good question, all that high-level blah blah. Does it really make any practical difference? What if the same guys were responsible for everything: software developers covering everything from start to the very end?
  • I like concept of chaos engineers. But what if everything is pure chaos even without them?
  • The laws of shitty dashboards - Just so true.
  • I was asked if I'm interested in a major project using distributed WebRTC with HTML5. Well, this time I wasn't available. Yet the project sounded interesting.
  • Open Data Finland (Avoindata in Finnish)
  • Why is Google hurrying to kill SHA-1?
  • A great post about Wifi beamforming
  • Some TLDs still don't support IPv6; one of those is the .at TLD. Nor do they support DNSSEC.
  • Great post The Art of Profitability
  • How handy would an e-receipt be? Our tax authorities remind everyone to issue receipts.
  • Had interesting discussions with friends about whether a database should contain all available information or only the required information. This is actually quite a good question, because it depends on so many factors. In some cases it's really handy to have everything in the database. But from the performance point of view, having everything in the database is really bad, especially if the whole record gets updated due to a bad database engine. It can drastically increase memory and disk I/O requirements due to database size growth.
  • More closed source solutions: Google deprecates OpenID 2.0 and forces users to use Google+.
  • Latest OpenID specifications
  • Finland is planning to strengthen national cyber warfare unit and preparedness for hybrid wars. 
  • Making sure crypto stays insecure - An absolute must-read article. This is how things are ruined behind the scenes and the odds are set against you.
  • A great TED talk: Big data is better data by Kenneth Cukier

Not enough? See parts 1 and 2.

ETag and gzip decorators for

posted Mar 2, 2015, 7:32 AM by Sami Lehtinen   [ updated Mar 14, 2015, 12:00 AM ]

from base64 import b64encode
from functools import wraps
from gzip import compress
from hashlib import sha1

from bottle import request, response  # the snippet uses Bottle-style globals


def gzipped():
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwds):
            rsp_data = func(*args, **kwds)
            if 'gzip' in request.headers.get('Accept-Encoding', ''):
                response.headers['Content-Encoding'] = 'gzip'
                rsp_data = compress(rsp_data.encode('utf-8'))
            return rsp_data
        return wrapper
    return decorator


def etagged():
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwds):
            rsp_data = func(*args, **kwds)
            # Use a stable digest: Python 3's built-in hash() is randomized
            # per process, which would change ETags on every server restart.
            etag = '"%s"' % b64encode(
                sha1(rsp_data.encode('utf-8')).digest()[:8]).decode()[:11]
            if etag == request.headers.get('If-None-Match', '').lstrip('W/'):
                response.status = 304
                return ''
            response.headers['ETag'] = etag
            return rsp_data
        return wrapper
    return decorator

I'm very aware that this ETag handling won't make things lighter for the server. But it still makes getting the response faster, especially for mobile clients.
If you have an easy way of validating the ETag before actually generating the content on the server, just move the ETag check above the rsp_data = func(...) line, so the call to the decorated function is completely avoided when you return the result at that stage. Both of these options are designed to be used only with fully dynamic content. Both work well with templates and stuff.
It's recommended to use max-age=0 instead of no-cache for stuff which should be cached but could still get invalidated quickly. ETags help with that.
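The 200/304 round trip those decorators implement can be sketched standalone. The function names here are mine, and I use a stable digest since Python 3's built-in hash() is randomized per process:

```python
from base64 import b64encode
from hashlib import sha1

def make_etag(body: str) -> str:
    """Short, stable ETag derived from the response body."""
    return '"%s"' % b64encode(
        sha1(body.encode('utf-8')).digest()[:8]).decode()[:11]

def respond(body: str, if_none_match: str = ''):
    """Return (status, payload): 304 with an empty payload on a tag match."""
    etag = make_etag(body)
    if if_none_match.lstrip('W/') == etag:
        return 304, ''
    return 200, body

page = '<html>hello</html>'
first_status, payload = respond(page)                    # cold cache
second_status, _ = respond(page, make_etag(page))        # revalidation
print(first_status, second_status)  # 200 304
```

The second request carries the tag back in If-None-Match, so the server can skip sending the body entirely.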

I know, if you're using uWSGI or Nginx you can use their internal gzip support.

kw: uwsgi, python, programming, webdevelopment, websites, webdeveloper, http, header, headers, content compression, deflate, last modified.
