My personal blog about things I do, like, and am interested in. If you have any questions, feel free to mail me! My views and opinions are naturally my own and do not represent anyone else or any other organization.

[ Full list of blog posts ]

More OpenBazaar networking thoughts

posted Jun 28, 2015, 10:02 AM by Sami Lehtinen   [ updated Jun 28, 2015, 10:06 AM ]

Pretty random stuff about OpenBazaar networking...

  • Some protocol considerations for OpenBazaar: UDP and rUDP vs SCTP.
  • Had a chat with friends about whether anyone has really used SCTP (RFC 3286) in production.
  • My comments on it: I guess the answer is nope. I guess everybody knows about it, but nobody has ever used it, because there's no native support for it. Of course it's possible to implement SCTP over UDP, but that's one of the complex things I mentioned earlier. This is just where programming as a hobby, and merit, come into play. From a commercial point of view, implementing SCTP would sound crazy.
  • But from the hobby-programming and merit point of view, writing an SCTP implementation from scratch could be worth it for the fun, and of course the merit. I did it, have you?
  • What's the benefit? It would have more advanced flow control. Message-based communication is nice, and optional ordering is also really nice. TCP's head-of-line blocking is a well-known problem, and the HTTP/2 (h2 / h2c) protocol won't fix it, because it currently runs over TCP.
  • Someone suggested using a timestamp for packet ordering. I think it's a horrible idea, because of high-speed networks and bad clock resolution on many platforms. (Off-topic, I know, but that's what first came to mind.) SCTP's 4-way handshake requires a lot of round trips -> high handshake latency -> really bad for DHT-related operations. I assume rUDP is lighter on this side. I don't know the exact details of our rUDP implementation: what's the latency when, after some inactivity, it requires a ping plus some simple authentication / handshake, making it a 2- to 4-way handshake before actual data gets transmitted?
  • HTTP/2 implements very similar stuff inside a TCP connection compared to SCTP. It also has streams, and congestion windows per connection and per stream, inside a TCP stream. This is just a random thought from reading the SCTP packet format stuff in detail.
  • See the 'Data chunk' section. Ordering can be implemented using flags if and when required. There's a transmission sequence number (TSN), which can be used for fragmentation reassembly, and a per-stream stream sequence number (SSN) for ordering within a stream. These are multi-layered protocols, which just adds tons of complexity if we can't name specific benefits for this use case.
  • Sometimes adding tons of complexity can improve performance, but in general I just love KISS (Keep It Simple, Stupid), because complex stuff is, well, just complex, and you'll end up hunting bugs. In that sense rUDP is really nice: it's naively simple and light, yet not technically optimal (throughput in case of packet loss, lack of a sliding window and lack of SACK).
  • Would I use SCTP instead of rUDP, especially now that we already have a working rUDP implementation which has required considerable effort? Well, only if there were a really well-working pure Python implementation available. It would then run on all platforms and hopefully be really easy to use and maintain, compared to having our own rUDP implementation. And if the situation is really such, why was time wasted implementing rUDP in the first place?
  • Are there some serious problems with rUDP which I'm not aware of? If not, I think we're focusing on the wrong things, at least from the point of view that OpenBazaar should be made mainstream and usable for normal users. Usability and reliable functionality are much more important than some minor protocol details. I know engineers, and I like perfection too, but in most cases it's not actually worth it.
  • What about support for dual stack, aka using IPv4 and IPv6 in parallel? I know, this is yet another nerdy view; IPv6 isn't that common (yet), but since we're talking about these topics I'll bring it up. This isn't important, but it's more important than the SCTP question. Peers should be able to advertise and use two addresses, of which one can be behind NAT and the other not. Like the case with my 4G LTE data: IPv6 is totally open, but IPv4 uses CG-NAT.
  • Related reading for networking nerds: HTTP/2, h2 / h2c, RFC 7540.
  • Is there a need to shut down connections? Couldn't they just 'time out', so that after a certain time some kind of new handshake is required? This kind of 'dynamic' closing of connections allows you to keep many connections "open" when the situation allows, as well as close them dynamically when load requires getting rid of connections.
  • I've deeply hated the way Apache handles keep-alives, because it handles them statically. I would personally set a connection limit, like 64 connections. If there's room to keep connections alive, they could remain alive for hours; if there's a load spike and keep-alive can't be afforded, connections should be closed after a very short wait for the next request, or rotated after N requests anyway. Of course the exact use case affects things, but this is how the HTTP/2 RFC recommends doing it.
  • Final note: these notes were written before version 0.5 was released. The OpenBazaar project is currently making some changes to its networking stack.
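The TSN vs SSN split in the 'Data chunk' section above is easier to see in code. Here's a minimal sketch parsing the fixed 16-byte header of an SCTP DATA chunk as laid out in RFC 4960; this is purely an illustration, not part of any real SCTP implementation, and the sample bytes are made up.

```python
import struct

def parse_data_chunk(chunk: bytes) -> dict:
    # RFC 4960, section 3.3.1: type, flags, length, TSN,
    # stream identifier, stream sequence number, payload protocol id.
    ctype, flags, length, tsn, stream_id, ssn, ppid = struct.unpack(
        "!BBHIHHI", chunk[:16])
    assert ctype == 0, "not a DATA chunk"
    return {
        "unordered": bool(flags & 0x04),  # U bit: skip SSN ordering
        "begin": bool(flags & 0x02),      # B bit: first fragment
        "end": bool(flags & 0x01),        # E bit: last fragment
        "length": length,
        "tsn": tsn,            # transmission sequence number: reliability / reassembly
        "stream_id": stream_id,
        "ssn": ssn,            # stream sequence number: ordering within one stream
        "ppid": ppid,
        "payload": chunk[16:length],
    }

# A made-up unordered, unfragmented chunk carrying b"hi":
raw = struct.pack("!BBHIHHI", 0, 0x07, 18, 42, 7, 0, 0) + b"hi"
info = parse_data_chunk(raw)
print(info["tsn"], info["ssn"], info["unordered"], info["payload"])
# 42 0 True b'hi'
```

Note how reliability (TSN) and per-stream ordering (SSN) live in separate fields; that separation is exactly the multi-layering complexity discussed above.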
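The 'dynamic' keep-alive idea above (connections simply time out, and under load the longest-idle ones go first) can be sketched in a few lines. All names and numbers here are my own illustrative assumptions, not any real server's API.

```python
import time

class DynamicPool:
    """Sketch: keep connections 'open' while there's room; expire idle
    ones, and under load evict the longest-idle survivors first."""

    def __init__(self, limit=64, idle_timeout=300.0):
        self.limit = limit
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # conn id -> timestamp of last activity

    def touch(self, conn, now=None):
        now = time.monotonic() if now is None else now
        self.last_seen[conn] = now
        self._evict(now)

    def _evict(self, now):
        # Drop connections that have been idle past the timeout...
        for conn, ts in list(self.last_seen.items()):
            if now - ts > self.idle_timeout:
                del self.last_seen[conn]
        # ...then, if still over the limit, evict longest-idle first.
        while len(self.last_seen) > self.limit:
            oldest = min(self.last_seen, key=self.last_seen.get)
            del self.last_seen[oldest]

pool = DynamicPool(limit=2, idle_timeout=10.0)
pool.touch("a", now=0.0)
pool.touch("b", now=1.0)
pool.touch("c", now=2.0)       # over limit: 'a' (longest idle) is evicted
print(sorted(pool.last_seen))  # ['b', 'c']
```

With this scheme there's no explicit shutdown message at all: quiet connections age out on their own, which matches the Apache keep-alive complaint above.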

IPv6, SSL/TLS, CNC, CDN, Encryption, Authentication, Mobile Broadband, NFV, SDN, CN, NFC

posted Jun 28, 2015, 9:24 AM by Sami Lehtinen   [ updated Jun 28, 2015, 9:24 AM ]

Some light stuff during summer vacation...

  • Just tested IPv6 with my mobile 4G LTE, and it works perfectly. Now all of my systems are running full dual stack (IPv6 + IPv4). How long are we going to need IPv4 if all systems are using IPv6? Supporting both, plus legacy systems, is still overhead. Is it time to get rid of them? I guess many of the systems I'll be configuring from this point on for 'internal / servers only / personal' use will be IPv6 only, because I don't see any particular reason to support IPv4 anymore. IPv6 only might be coming soon. When firewalling systems, I'll probably whitelist only IPv6 addresses and leave IPv4 blocked completely.
  • Confirmation from the phone: my Android phone is now fully working with native IPv6 and IPv4 CG-NAT dual stack... Nice! Using G+ over IPv6. IPv6 only also seems to work perfectly, without IPv4 support.
  • About SSL certs, Let's Encrypt, and certificate & key trust: AFAIK, for high-security purposes, self-signed is the option available. This of course means that you'll have to personally confirm the certificate fingerprint and add it to your lists. But this is always better than trusting any third party, even if it requires work from you. Isn't this how SSH and OpenPGP keys have worked for ages? I meet you, I get your key, and I know it's your key. It's not some shady government agency or corrupt money-making organization telling me that I can now trust my secrets to 'you'.
  • Checked out superlubricity, because the MIT news post about vanishing friction was interesting.
  • China Net Center (CNC) fixed the CDN speed issues I reported in my last post, based on my comments. Their CDN delivery time to Finland improved by a whopping 15x. That's a good result. It's not "great" yet, but it's okay-ish considering that it's a Chinese caching CDN, like CloudFlare, and it fetches the source data from China if it isn't locally cached already. It was just funny how the CDN network's own pages were extremely slow before they fixed it. I might even choose to statically edge-cache my own pages, because they wouldn't consume a lot of resources and it would give a good impression to potential customers. This isn't the first time I've encountered a similar situation: CDN providers don't see their own pages as important enough to be cached or delivered using their own CDN. I don't remember anymore which provider it was I encountered earlier, but they had similar issues. Maybe their internal invoicing for such services is prohibitively high. Lol. It's always nice to notice that reporting issues does sometimes make a difference, yet I often feel like reporting problems is a waste of time. Some companies do not understand that feedback, even negative feedback, is really important; it helps to fix issues, if only they mind about their customers and the customer experience. It's actually really nice to see companies which do value negative feedback and fix the issues. You wouldn't believe how often that doesn't happen.
  • All of my email systems are configured to use opportunistic encryption. There's a list of domains which require encryption, a list of domains which require a secure connection (certificate verified), and a list of domains which require a specific certificate fingerprint. I'm using all of these levels. To be honest, over 99% of my email goes to domains which use either a secure connection or a fingerprint. A fingerprint can be used in cases where high security is required. But none of this does anything to 'secure the chain': anyone can forward email that was delivered securely to a non-secure destination. And even with a fully secure system, they can just photograph the message and share it publicly on Facebook or Twitter. There's no easy way of stopping that from happening.
  • Identification and trust issues: when two banks set up a VPN between their data centers, do they use a Comodo (or any other) SSL certificate and externalize key trust to Comodo, or do they trust their own IT departments? I personally prefer managing the most important keys myself, and I believe many do so. Yet I can see why externalizing trust could be very handy at times. And that's exactly what I asked in my authentication post: why does your data center's software-defined network management console require anything other than a Facebook or Twitter login? It's just so handy when you can link roles to existing identities, and by using these services you get free 2FA too. No need for expensive RSA key generators.
  • See my Google+ random thoughts post about having multiple authentication systems. When I'm entering the US and the border guard asks me for ID, why can't I just show that I'm logged in to Facebook? It would actually be handy, and that's what I would like to see: a universally trusted strong online identity, just like passports are right now. Except, again, I'm sure some instances won't trust the passport but still want to implement a parallel strong identity solution for their own employees, for whatever reason. Estonia is trying to fix that (see the e-Estonia reference). Finland issued strong PKI identity cards a long time ago, but nobody wanted to use them. Now Yle is trying to launch a (weak) identity service for Finland and Finnish users, to compete with Facebook, Twitter, Instagram and Google+. The whole identity and trust management field is one huge mess. Why do we need so many parallel identities? Especially if we don't want to be pseudonymous or anonymous and we've got nothing to hide? (Let's not discuss how ridiculous that last claim is.) There's constant competition in Finland for the official strong online identity system. So far banks have been dominating the market, but telecom operators want their share via mobile authentication. The new mobile authentication in Finland is actually so strong that if the police ask you for ID, you could technically use it, and it should count as highly trusted personal identification.
  • Gofore is going to build the national service channel for Finland (link in Finnish), which is based on the ESB concept; Estonia uses a similar model under the name X-Road.
  • Read an excellent write-up about PHP 7.0's new hash table implementation.
  • Checked Blancco's new SSD eraser program. AFAIK, that means they have to certify it separately for each SSD, firmware version and so on, so it really does its job. Yet, as mentioned, advanced malware could still tweak the drive to lie about being clean.
  • Mobile broadband prices in Finland? 150 Mbit/s costs 24.90 €/mo (incl. VAT), 50 Mbit/s would be 15 €/mo, and it seems that 29.90 €/mo would now buy you 300 Mbit/s 4G LTE, but my phone doesn't support it. Also, no operator has limited tethering in Finland so far; whenever I hear discussion about doing that, it makes me laugh / cry. AFAIK, in most cases anything faster than 21 Mbit/s on mobile is most probably a waste of money, unless you're using tethering and torrents or repeatedly downloading something else large from the network. WiFi / WLAN is nominally 300 Mbit/s, but of course that varies wildly, as we know, due to radio issues. Fiber is quite common: 10 Mbit/s is often free in newer buildings, 100 Mbit/s costs something like 5-19 €/mo depending on the situation, and 1 Gbit/s is often available in buildings where FTTH / Ethernet is installed, because it can't be delivered over VDSL2. Finland is a large country, so 4G is actually important, because there are large areas (the majority of the country, of course) where you can't get fiber. So the availability of fast 4G is really great compared to fiber, and it's usually much faster than what could be delivered using VDSL2 or ADSL2+ in those areas, due to the distances involved. One operator, TeliaSonera (Telia, Sonera), has a 10 GB/mo data cap, but its competitors don't.
  • Performance and load testing of LclBd has shown that even with the current super el-cheapo server, the site can handle more than 4 million hits / day and deliver dynamically generated content at a constant 50+ Mbit/s rate; all static content can easily be delivered at up to 500 Mbit/s, which is the maximum bandwidth currently available for the server. I'm really happy with that result; it's more than I expected. Linux, uWSGI and Python are doing a great job. And this is the ultimate low-end server, so anything defined even as a normal desktop could easily do 10x that, and real servers 50x. This result is without CDN or content caching; if required, tuning those would further improve performance a lot. The most annoying problem with the current service provider is random and potentially too long (15+ second) freeze-ups. I told them they have to get the issue fixed ASAP, or I'll have to vote with my money. Yet, as usual, I assume they don't care a bit and I'll have to relocate my servers to a new service provider. I've been planning to use Vultr or RamNode next.
  • Watched the IPv6now Finland seminar video stream. As I've said, all of my systems (at home and at work) are already fully IPv6 compliant and actually using IPv6. IPv6 RIPEness - IPv6 capability at the ASN level: Russia is lagging, Norway is leading by a lot, and Sweden is a bit ahead of Finland. 6rd (RFC 5969) à la Sonera; native IPv6 without NAT with DNA.
  • Thoughts for one project: I could add [b]bold[/b], [i]italic[/i], [q]blockquote[/q] and [c]monospace aka code[/c] modes if required. Or of course whatever markup could be used for those, like *bold*, /italic/, """blockquote""" and ~code~. Maybe =Header= is required too? Btw, it's just wonderful how many different competing markup standards there are. Also, one user sent me an email asking about the possibility of adding a 'user signature' feature. I personally don't see a signature as necessary. It was used with Usenet news and BBS systems before there were user profiles. Now you can dump your 'signature' stuff into the introduction box on your user page, link to your alternate profiles and so on. Therefore I don't see a need for a signature feature, which on many forums is often used to spam unrelated, repeated stuff onto posts. Not mentioning any right now, but I guess you know what I mean.
  • Checked out the Thuraya SatSleeve, which turns your cellular phone (smartphone) into a full-blown satellite phone working across 161 countries without roaming charges.
  • Veikkaus, the Finnish betting company, now accepts a driver's licence or social security card as its loyalty card. That's great. I've always hated the way companies want to issue loyalty cards which I should carry with me. No, I don't want to do that. If I can use an existing card, that's great. Just like I wrote about identity systems: why does every company want to manage its own identity system? And why are email addresses being used as identities, etc.? Annoying.
  • Checked out network functions virtualization (NFV), software-defined networking (SDN), carrier SDN and cognitive networks (CN).
  • NFC credit card relay attack - yep, that's what I've been thinking for a long time. Such attacks could be partially limited using tight latency controls. It's easy to make things slower, but sometimes it's really hard to make them a lot faster. Relay attacks add considerable latency, and radio waves travel at light speed, so using latency to stop this kind of attack could be one way of prevention. Of course there are situations where relaying the identification can be greatly beneficial. Just as Steve Gibson demonstrated on the Security Now show: if I use an SQRL lock on my door, you can just send me an image of it, and I'll open the door for you. Also, an HTML5-based web application could be interesting to play with, and if it works well, maybe even for actual use.
  • I've been seeing some traffic from Applebot/0.1, which powers the Siri and Spotlight search engines for Apple. Yet those crawlers don't seem to crawl sites extensively. It seems likely that they pick data from the Twitter firehose and crawl all URLs mentioned on Twitter.
  • Checked out the world democracy index and world competitiveness rankings for countries. Many countries I like rank really high on those lists: Canada, New Zealand, Australia, Singapore, Hong Kong, Belgium, Germany.
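The [b]/[i]/[q]/[c] markup idea from the project-thoughts bullet above is tiny to prototype. This is a minimal sketch; the tag set and the HTML mapping are my own assumptions for illustration, not the project's actual markup.

```python
import re

# Bracket tag -> HTML element, per the modes mentioned above.
TAGS = {"b": "b", "i": "i", "q": "blockquote", "c": "code"}

def render(text: str) -> str:
    """Convert [b]...[/b]-style tags to HTML (non-greedy, no nesting)."""
    for tag, html in TAGS.items():
        text = re.sub(
            r"\[%s\](.*?)\[/%s\]" % (tag, tag),
            r"<%s>\1</%s>" % (html, html),
            text, flags=re.DOTALL)
    return text

print(render("[b]bold[/b] and [c]code[/c]"))
# <b>bold</b> and <code>code</code>
```

A real implementation would also need HTML-escaping of the surrounding text and a decision about nested tags, which this sketch deliberately skips.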
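The per-domain email encryption levels described above (opportunistic by default, then required encryption, verified certificate, and pinned fingerprint) boil down to a policy lookup keyed by recipient domain. Here's a sketch of that idea; the domain names, fingerprint and level names are made up for illustration, not my actual configuration.

```python
# Policy levels, strictest last: 'may' (opportunistic), 'encrypt'
# (TLS required), 'secure' (certificate verified), 'fingerprint'
# (only a pinned certificate is accepted).
POLICY = {
    "bank.example":    ("fingerprint", "AA:BB:CC:DD"),
    "partner.example": ("secure", None),
    "shop.example":    ("encrypt", None),
}

def policy_for(recipient: str):
    """Return (level, pinned_fingerprint) for a recipient address."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    return POLICY.get(domain, ("may", None))  # default: opportunistic

print(policy_for("alice@bank.example"))   # ('fingerprint', 'AA:BB:CC:DD')
print(policy_for("bob@unknown.example"))  # ('may', None)
```

In a real mail server this table would live in something like a Postfix TLS policy map; the point of the sketch is just the tiered fallback, with opportunistic encryption as the floor.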

HTTP/2, OpenPGP, Authentication, Identity, Bitium, SSO, 2FA, CRM, CDN, BI, China

posted Jun 6, 2015, 10:01 PM by Sami Lehtinen   [ updated Jun 6, 2015, 10:01 PM ]

  • Checked out Python 3.5.0b1 - more features, newer versions. I'm still using 3.4 in production, but as soon as the new version comes out I'll be upgrading most of my systems.
  • Data Analytics & Statistics using Python - a nice tutorial on the topic.
  • How to optimize your site for HTTP/2. It's also a good reminder of why you should use a CDN: fast Internet means low latency, not high bandwidth. It's amazing how many people just don't get the difference, even people who are professionally involved with this stuff. I'm glad to say the document didn't contain anything new at all; it's all basic stuff that everyone should already be aware of. I personally think that stream weight (relative multiplexed priority) is quite useless; an absolute dependency tree is a much better way to deal with this. Priority would also work if it were an absolute priority instead of a weight. It seems that my earlier reading about HTTP/1.1, my own thoughts and my thoughts about HTTP/2 are all OK; actually I know a lot about it. And I haven't just been reading about it, I've also been thinking deeply about it.
  • What are the use cases for #h2 / #http2 stream weights? I personally see stream dependencies as much more useful, and I would have preferred absolute priorities over weighted priorities.
  • A good post on how HTTP/2 is much faster than SPDY - the key to it is dependency-based prioritization.
  • 0MQ ZeroMQ tutorial - Nothing to add, just read it.
  • Watched Google I/O 2015 Keynote
  • LclBd now shows most popular tags by user. I'll do minor changes and improvements whenever I'm feeling like it.
  • Thunderbird - more broken and bad software. The Thunderbird email application crashes when moving a large number of messages from one folder to another. ALL of my friends using Thunderbird are suffering from this SAME bug. It's really annoying, but it's treated more like a feature than a bug. It happens on Windows as well as on Linux platforms. Restarting the application fixes the issue. Usually the CPU is also hogged during the task, which seems like a clear bug too.
  • Again wondered why Remmina (Remote Desktop Client) is unable to connect to servers which are configured to use NLA / TLS / High security connection settings. It's quite annoying.
  • Google is launching push notifications for web apps. This is important, because I really love web apps over native apps. This is something I have to check out, because push notifications are one of the primary reasons to use a mobile app over a web app.
  • Read a long article about business management, big data analysis and data analytics in general, and how managers should deal with them. Lol, buzzwords: data analytics and hype mentioned.
  • Checked out how to configure a captive portal for a wireless dead drop.
  • Facebook is using OpenPGP to secure emails - that's really great. Yet I suspect that fewer users than, well, it's quite hard to come up with a low enough estimate, are going to use that feature.
  • Some people claim that anonymous messages sent with an OpenPGP signature are useless. No, not really. The point of the signature is to allow me to prove that I wrote those messages to some party at some point, if required. Yet keeping the identity (or, more likely, identities) and private key safely stored allows you to remain anonymous and hidden from everyone else. If I want to, I can post a message signed with my official key plus the anonymous key to prove my identity. Or I can sign a nonce sent by the party I want to identify myself to, using the anonymous key. So they know I'm the one, but they still don't know who I am. This provides really strong pseudonymity over any channel.
  • Reminded myself that QUIC is ~HTTP/2 over UDP. Yes, that ~ is there on purpose, because it's an inaccurate description. Related: µTP (Micro Transport Protocol), LEDBAT and SCTP, and RFC 6951, a way to encapsulate SCTP packets in UDP for end-host to end-host communication.
  • SSDs: A gift and a curse - Nice article about using SSD with servers and what kind of problems you might encounter.
  • Had annoying problems with xfce4-indicator-plugin version 2.3.2; it clearly seems to be buggy. What did I say about software earlier? Also, the requirement to restart the panel for changes to take effect is so... some decades-old stuff. Yuck.
  • Checked out Solar sail including types Magnetic sail and Electric sail.
  • Checked out the FOAF (Friend of a Friend) protocol, which is an extension of F2F (Friend to Friend).
  • GAE Python Google scalability course - great stuff: how to build cloud services which seamlessly scale to even the greatest demands you can imagine.
  • Added support for PhishTank API to one project.
  • Checked out Transatomic - Molten Salt reactor design, nuclear reactor blueprints and thorium reactors.
  • Studied models like drop shipping and store dropping. Both are concepts that have been used with our customers for a long time; at least one advanced large European customer has been using both models for almost a decade. I designed their in-store ICT / POS and stock system to support their concept and processes, including product life cycle, seasons and dynamic pricing, as well as automatic stock replenishment, which is of course quite trivial when you have all the required data at hand. The Post of Finland is now marketing these concepts in Finland, but they seem to have forgotten that most stores moved away from Finland several years ago. Cost efficiency and scale benefits are really important in this business. It also seems that the Swedish market adopted the drop shipping concept much earlier than Finnish businesses did.
  • FBI says that encryption should be prevented - nice; I think we have suffered enough with export keys and other such stuff. Yet if encryption is banned, then only criminals will be using it. There's no way to practically get rid of encryption. It might be combined with steganography, or with chaffing and winnowing, which isn't encryption; it just helps you to know which parts are required and which aren't. It's just a high-tech stencil / mask which can be applied to data to figure out what the real message is.
  • Fixed some annoying jQuery Mobile related issues with the LclBd project. There are still a few known annoying things, but I'll fix those as soon as I feel like it. There aren't any users anyway, so why worry about it? I know how it works, even if the UX can be bad when the user doesn't do exactly as expected.
  • Let's Encrypt - New free SSL certificates for everyone. Here's the new CA root certificate being used.
  • Checked out Bitium - an SSO / 2FA authentication service provider offering cloud-based identity and access management solutions.
  • Dumping some authentication and identification related thoughts and comments, not edited for blog: I'm personally wondering why there's a need for so many authentication systems. Why doesn't just ONE authentication system do the job? I would personally love to see that kind of solution for cases where high to medium security is required. Of course it won't work for cases where really high security is required and governments can't be trusted, but for all other cases it would be ideal. AFAIK, it's silly that we need all kinds of identities when we should actually need just one identity, with all our roles, access tokens and other stuff simply linked to it. That being said, the official ID card (which you can use to identify yourself basically anywhere) could as well work as a universal key, access token and identity, as well as a 2FA provider. SAML. This would neatly separate identity management from authorization. Some people seem to confuse the two, but they are technically completely separate things. Of course identification is an important part of authorization, but they can be federated together. Well, the problem is that there are tons of solutions, but none of them is widely recognized, and that doesn't seem to change anytime soon. I somehow drifted back to this topic after checking out Bitium. It's also a problem that I often prefer not to be officially identifiable in many circumstances, which of course invalidates a 'well known and good strong identity system'.
    I'm well aware of #sqrl and I've thoroughly studied it, as well as sent feedback about it to Steve.
    Just to add, I don't personally believe that current #sqrl implementations are nearly secure enough. There's no #HSM module for the SQRL private key, AFAIK. Please correct me if I'm wrong. Of course some company could provide a #yubikey kind of solution which would be SQRL compatible. But any application with 'regular' data storage is always insecure. That's why all those "authentication" apps just won't fly. Believe me, I've checked many of them.
    I'm currently a user of "mobiilivarmenne", which is one of the best mobile authentication schemes so far. I also think it's secure enough for most clients, except for cases where an extremely high level of security is required. But that's a concern for only a very small group of people. Of course the identification trusts a third party, and that's exactly why it won't work for really high security requirement cases.
    Mobiilivarmenne uses an #HSM and requires a PIN code to activate it. Also, if you're not logging in on mobile, you get private key authentication using #2FA: desktop login using #mobile #authentication. The website also clearly tells you what information is being passed to whom, so that part is pretty secure too. Of course it could be subverted by advanced malware. That's why I would love to see the information about what's being signed / authorized shown on the mobile screen as well, using a low-level method which would require firmware modifications on the mobile device. Then even if the desktop AND the mobile device were both infected with "normal" malware (not rooted), it wouldn't be able to affect the key parts of the process.
    When things are as advanced as I think they should be, try entering countries like the US without a passport, using only SQRL or mobile authentication. If they accept that as your identity, then things are working out. There are already passports with chips, so it would actually be a natural step not to require the paper document part anymore.
    Here's the description and flow of the current mobile authentication I'm using.
  • Read extensive review of CRM systems including Microsoft Dynamics CRM, Oracle Sales Cloud, Salesforce Sales Cloud, PipeDrive, Nimble and SuiteCRM.
  • Also read one long CDN article, but in reality it didn't contain anything which I wouldn't already know in extensive detail. It didn't go into the details of many networks, but it seems that most CDN networks focus on western markets, while South America, Africa and India have very small numbers of POPs, if any. I guess TATA or Akamai are the primary CDN options for India. CDN77 has presence everywhere except Africa. It's also easy to notice that many CDNs don't have Russian POPs, and Russia alone is an absolutely huge country measured by area. China Net Center seems to be the "Akamai of China". (Yes, I do hate it when someone says Yandex is the Google of Russia, lol. But I just did it! Yet I'm sure that most people don't know what Akamai is.) They have 400+ POPs in China. It's also really funny how CDN Planet shows France as being "Europe", I mean the whole of Europe. Don't know what's wrong with China Net Center, but its own pages were ridiculously slow to Helsinki. Here's their network map. I sent them feedback about the problem. TCP connection and traceroute seem to go somewhere in Central Europe with okay-ish latency, so the problem is somewhere else. I'm curious to hear what they say. If I were running such a CDN, I would keep permanent connections open due to the speed restrictions caused by the Great Firewall.
  • Read an article about enterprise data storage solutions by CGOC and what all the storage space is being used for. I've seen it over and over again: ROT (Redundant, Obsolete or Trivial). Yet that's what most data is. Records management is an important part of that process.
  • Now the largest cities are listed as "pre-customized" LclBd local locations. Users can view on LclBd whatever is being discussed right now in the following megacities: Bangkok, Beijing, Buenos Aires, Cairo, Delhi, Dhaka, Guangzhou, Istanbul, Jakarta, Karachi, Kinshasa, Kolkata, Lagos, Lahore, London, Los Angeles, Manila, Mexico City, Moscow, Mumbai, New York City, Osaka, Paris, Rhine-Ruhr, Rio de Janeiro, Sao Paulo, Seoul, Shanghai, Shenzhen, Teheran, Tokyo. The cities are in alphabetical order. I've also been considering listing smaller places, just because of the unique locations of cities like: Johannesburg, Sydney, Santiago, San Francisco, Winnipeg, Anchorage, Novosibirsk, Reykjavik, Ponta Delgada, Georgetown, St. Helena, Honolulu. Maybe I'll just limit the search result depth for those locations. All other locations list the top 1000 posts, but these could as well just list the top 100.
  • Electronic prescriptions are now widely used in Finland. They are really handy: no need to shuffle papers around, and all information is always up to date. It's just like banking and online trade when they moved to the Internet. But simultaneously it makes society more vulnerable to different cyber attacks, because many important things now rely on the Internet.
  • It seems that the Great Firewall is making TCP connections and DNS lookups really slow. Many sites are much slower from China than from Hong Kong. Also, it's a trap to speed test your website from Hong Kong and assume that's the speed Chinese users would be getting.
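The preference for HTTP/2 dependency-based prioritization over stream weights, mentioned above, is easy to sketch: given a parent-to-children dependency tree, a parent's response is served strictly before its dependents. The stream ids and tree here are made up for illustration; real HTTP/2 priority handling is of course more involved.

```python
from collections import deque

def serve_order(tree, root=0):
    """Breadth-first walk of a stream dependency tree: parents strictly
    before children, siblings in insertion order. `tree` maps a stream
    id to the list of streams that depend on it; 0 is the virtual root."""
    order, queue = [], deque(tree.get(root, []))
    while queue:
        stream = queue.popleft()
        order.append(stream)
        queue.extend(tree.get(stream, []))
    return order

# html (1) -> css (3) and js (5); css (3) -> image (7)
deps = {0: [1], 1: [3, 5], 3: [7]}
print(serve_order(deps))  # [1, 3, 5, 7]
```

Compare this with weights, where everything is multiplexed proportionally: here the ordering is absolute and deterministic, which is exactly the property argued for above.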
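The pseudonymous nonce-signing flow described above (sign a verifier's nonce with the anonymous key, proving key control without revealing who you are) can be sketched as a challenge-response. To keep the sketch dependency-free, HMAC-SHA256 stands in for a real OpenPGP detached signature; a real setup would use an OpenPGP private/public key pair instead of a shared secret, so verification wouldn't require the signing key.

```python
import hashlib
import hmac
import secrets

# The pseudonymous signing key. In the real flow this would be an
# OpenPGP private key kept safely stored, as described above.
anon_key = secrets.token_bytes(32)

def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for producing a detached signature over `message`."""
    return hmac.new(key, message, hashlib.sha256).digest()

# Verifier challenges the pseudonym with a fresh nonce:
nonce = secrets.token_bytes(16)
response = sign(anon_key, nonce)

# The verifier checks the response against the same key material that
# signed earlier messages: same signer, still no idea who they are.
assert hmac.compare_digest(response, sign(anon_key, nonce))
print("pseudonym verified")
```

The fresh nonce is what prevents replay: an old signature over an old message proves nothing about who is answering right now.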

OpenBazaar related random thoughts

posted Jun 6, 2015, 9:42 PM by Sami Lehtinen   [ updated Jun 6, 2015, 9:43 PM ]

Checked out some implementation proposals for OpenBazaar (OB @ GitHub), some of which are still secret. About identity reputations, ratings and reviews, web-of-trust, and long term storage (LTS) issues. Blockstore DHT, OpenBazaar DHT. How to store long term data, who's going to host it, how to republish it, how to prevent abuse? HTTP hosting? Alternative protocols? Alternative payment methods other than Bitcoin. Who cares about negative ratings at all, if new identities are free? Then only positive reviews can give you any practical gain. A blockchain is bad for data that is often refreshed; on the other hand, a DHT is optimal for data being regularly refreshed, but bad for long term static data (due to refresh requirements). Sybil attacks. A reliable global DHT is great, but what's the incentive to run one? Blockstore, IPFS, Coral CDN, Namecoin. For ratings, I think the seller with good ratings has the most incentive to keep the ratings alive, and is responsible for storing the data for them. Decentralized reputation system. I personally believe there's a good chance that there will be a few businesses running professional notary services and charging for them. Content cryptographically signed in financially meaningful ways. BitRated, double-deposit escrow, mutually assured destruction (MAD), transactions, disincentives, cheating, risk, redundancy, capital investment. Low barrier, easy entry, competitive market for services. Self regulation. Certification standards for independent groups with auditing. A disruptive effect on on-line trading. Role reputation, global reputation and network reputation. Signed valid interactions with a context rating and scoring system, including 'mine' reputation and OpenBazaar reputation. Role reputations might include merchant reputation, notary reputation and lender reputation. Context ratings might cover item description, shipping time and customer service. This could be implemented using the Blockchain Name System (BNS).
Actual rating values could be positive +1, neutral 0 and negative -1, stored with the merchant, notary and buyer Global Unique Identifier (GUID) or Item Unique Identifier (IUID) information and the contract SHA256. I didn't like the potential concept of paying for the right to write a review. I think it causes a kind of negative incentive. If I'm happy, I don't have any interest in writing a review. If I get scammed, I might write a negative review, but the market just acquires a new identity and my review (which I paid extra for) is practically worthless anyway. Since practically only positive reviews matter, I don't see a need to pay for review storage separately. The merchant should keep its own review history; let's just chain those together so that you can't remove negative reviews. Thoughts about using notaries by taoeffect are interesting. This needs deeper thought. Notaries can solve disputes (possibly), but who wins from that? Let's play some game theory. Yet notaries are in control of escrows with the current implementation. Talks about a drug bazaar or platform for illicit goods. All technology can be used for good or evil. I do support the concept where notaries are also being rated. Otherwise being a malicious notary would just be a benefit. Project FreeKarma mentioned; it's a decentralized open source reputation system for the web. Electronic cryptographic contract signing process and protocol. Semantic data, trading, trade. Selling physical goods at a fixed price. Contract signing is agnostic. Smart Contracts. Market developer category for software developers. This will allow auctions, services, lending, crowdfunding, crowdlending, prediction markets, financial securities, P2P lending, bonds, options, swaps, stock, insurances, currency exchange, barter and all kinds of other deals. Reputation is an easy way to estimate a trading partner's trustworthiness and counterparty risk. Sybil sockpuppet identities. 
Proof of trade, privacy questions, notary validation, anonymous ratings, discouraging fraud. Currently there has been discussion about using the Bitcoin blockchain, Blockstore, ChainDB, Namecoin or other similar solutions, and of course Subspace or OpenBazaar's own DHT storage. Legitimate transaction and rating validation process. Then checking the OpenBazaar trading platform and the vision of what it could be. One of the project goals is to disintermediate middlemen from the trading process. If you're wondering why you haven't yet heard about all this stuff, it's because this is still WIP. The main components of the project are the Kademlia based DHT, the Trade Protocol and the Client Application. The system is planned to use minimal resources as well as to be scalable to a large number of users. No need for distributed consensus. OpenBazaar uses Bitcoin for transactions and payments. Multisignature is important for many of the escrow concepts, aka multiparty escrow. I personally would like to see a "pluggable" payment module interface, which would allow using any payment method. Options include altcoins and plenty of other payment methods which can be verified using an API. Of course there could also be payment methods which do not have direct integration; there have been tons of successful on-line trading sites without any payment integration. OpenBazaar uses Ricardian Contracts and cryptography to make the contracts tamper-proof, with a JSON schema to make contracts machine readable. Contract parties are pseudonymous / anonymous when required; new identities are free. A meatspace (IRL) identity can also be provided, as well as an Onename ID. Of course some disputes may end up being resolved in court. Contract images might be stored remotely, but the image hash will be stored inside the contract to make it impossible to change the image later without breaking the link to the hash. An invitation to tender is one way to request goods for sale and advertise what kind of services / goods you're willing to buy. 
Of course in these cases it's very important to format the tender so that it's clear what is being bought. Technically it's a reverse auction where sellers are competing with the lowest bid to offer the service to the potential buyer. Technically OB will allow fully automated P2P exchange, which could become the backbone of the future of on-line trade. Optional surety bonds can be written.

There's discussion about what should be stored in the DHT and what shouldn't. It's a good question, actually a really good question. Because there's also the question of what data should remain in the network for nodes that don't exist or aren't online anymore. Currently the DHT is mostly used as an address database, and data is fetched directly from the 'responsible' peer. Is this the optimal way to deal with it? Can't the merchant self-host positive (and negative) reviews, which clients then just validate using cryptographic methods and chaining?

Issue: 1309 - Ratings and Reviews Proposal - My take on it: Who's got more incentive to maintain reviews and reputation than the merchant, notary or buyer? Because new identities are totally free, there's nobody else who would have any interest in maintaining reputation data. Reputation data should just be chained so that if you've got 10 positive reviews and one negative, you can't drop the negative one from the chain. Storing reputation externally in this case doesn't make sense to me? Maybe I'm missing something? Distributed storage is one of the things I really love thinking and talking about. The DHT is a distributed key value datastore. Using JSON data would also allow the protocol to make partial key value fetches possible. Let's say the value is large, but there are individual pieces of information which need to be fetched; then it's not necessary to fetch the whole value of that key from the DHT. I personally don't see the OB DHT as Long Term Storage (LTS), but it should be in someone's interest to republish the data, which solves the problem. Who benefits from the availability of the data? Other solutions require yet another independent network to run in parallel, which I generally dislike. Also, Bitcoin blockchain based solutions provide really small storage space. The same problem applies to Subspace: why pay for additional Subspace storage when you could use the OB DHT for free, with the party who benefits from the data availability responsible for republishing it? If a blockchain is used, what's the overhead for that, and who's going to provide a light API which allows fetching just the required data, instead of fetching / storing tons of data just to find the data being looked for?
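The "chain those together" idea can be sketched as a minimal hash chain. This is my own illustration of the general technique, not OpenBazaar's actual review format; the field names are made up:

```python
import hashlib
import json

def add_review(chain, review):
    """Append a review to a merchant's review chain.
    Each entry embeds the SHA-256 of the previous entry, so
    dropping a negative review from the middle breaks verification."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {"review": review, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps({"review": review, "prev": prev_hash},
                   sort_keys=True).encode()).hexdigest()
    chain.append(entry)
    return chain

def verify(chain):
    """Walk the chain, recomputing each hash from the previous one."""
    prev_hash = "0" * 64
    for entry in chain:
        expected = hashlib.sha256(
            json.dumps({"review": entry["review"], "prev": prev_hash},
                       sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

chain = []
for rating in (+1, +1, -1, +1):
    add_review(chain, {"rating": rating})

assert verify(chain)
# Dropping the single negative review breaks verification:
tampered = chain[:2] + chain[3:]
assert not verify(tampered)
```

In a real system each entry would also carry the reviewer's signature, but the chaining alone is enough to make selective deletion detectable.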

Issue 1317 - Smart Ricardian contracts - My take on it: If you're unemployed right now, my personal tip would be starting to establish a reputable notary service on OpenBazaar. You never know if that's going to make you millions later. This is the trick with seed investments in new technology. The cost is really low, but nobody's doing it, because it doesn't matter. But it really might matter a lot later. Just like nobody wanted to have any Bitcoins. I personally discarded 50 Bitcoins, because those weren't worth anything. Just leave your computer on for one night, and you'll get 50 more. Well, everything is just so obvious in hindsight. Technically storing the wallet would have been trivial; I maintain long term records all the time. But I didn't do it, because it was only a test and didn't matter at all. So I didn't see them as even worth the storage they were consuming, which of course isn't a lot.

LXD, Ricochet, OB, RNN, IPv6, ECC, Logjam, Rlite, UnQlite, ChaCha20, Dedup, Cheetah, Bots, 464XLAT

posted May 31, 2015, 3:36 AM by Sami Lehtinen   [ updated May 31, 2015, 3:37 AM ]

Summer vacation weekly post is a bit longer than usual.
  • Checked out LXD - It's faster than LXC, OpenVZ or KVM - And finally a good comparison of what it is all about - Finally we see Linux Containers
  • Checked out Ricochet - Anonymous metadata-resistant instant messaging that just works
  • Reported a few bugs I found from OpenBazaar project.
  • Fixed over 5000 lines of my old experimental Python 3 code, where I updated whitespace for 1000+ lines to meet PEP8.
  • Great article about the unreasonable effectiveness of Recurrent Neural Networks (RNN) - KW: Recurrent Neural Networks (RNN) - Long Short-Term Memory (LSTM) - Convolutional neural network
  • Another article about the unreasonable effectiveness of Character-level Language Models
  • Just read an article about writing unmaintainable code. - But in reality, it's a really marginal part of users who have to deal with code, so I don't see a real point in making an effort to write unmaintainable code. Unmaintainability - naming variables? Just write 'prototype code', whatever comes into mind, and go with it; there's no need for extra effort. Usually it's writing maintainable code which requires the effort. What's more important? Of course runtime error handling. That's what the end users might have to deal with from time to time, especially when changing system configuration or whenever something changes. Also, using a generic error handler can save a lot of time. I'm sure you know how annoying and time consuming it is to handle all exceptions and give proper indicative error messages which might even guide the user to the actual source of the problem and hint how to fix it. That takes a lot of your time and also saves a lot of the end users' time. If you want to save your time and waste users' time, just use this great generic Python handler. I assume you'll get the point quickly, and porting this excellent handler to other languages is trivial. The article also forgot to mention the awesomeness of Unicode. It provides a lot of great variable names like: ڔ ڕ ږ and ϒ ϓ ϔ or maybe you prefer these? ն մ վ և կ Վ Մ Կ Of course it should be obvious that all of those are separate characters. But my personal favorite might be these: ু ূ ৃ ৄ Of course the problem here is that after using these Unicode characters, at least find and replace is going to be complicated, compared to always just using single letter variables. Back to the topic, here's the 'catastrophic failure' project dump page.
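In that spirit, a generic "save your time, waste the user's" handler might look like this. This is my own tongue-in-cheek sketch, not the handler linked above:

```python
import sys
import traceback

def generic_handler(exc: Exception) -> str:
    """Log the real traceback where only the developer will see it,
    then hand the user a maximally unhelpful one-size-fits-all message."""
    traceback.print_exception(type(exc), exc, exc.__traceback__,
                              file=sys.stderr)
    return "Catastrophic failure. Please try again later."

try:
    {}["missing"]                  # any error whatsoever...
except Exception as e:
    message = generic_handler(e)   # ...becomes the same vague message
```

Whether the configuration file is missing, the database is down or a key lookup fails, the user learns exactly nothing, which is precisely the anti-pattern being mocked.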
  • Excellent post about Native Mobile Apps versus Mobile Web Sites. - I fully agree that websites should not try to emulate the native experience; those are inherently different, and trying to imitate something poorly is just a poor experience. The quote starting "You destroy basic usability" is just excellent; yes, that's what I'm thinking too. But support for all platforms is just great. I'm also wondering why so many mobile sites are absolutely full of cruft, making site loading a lot slower than it would have to be. Also 99,9%+ of apps are junk and I don't have the slightest interest in installing those. If I can occasionally visit their website, which isn't too full of junk, that's great. KW: UX, Mobile, Apps, Applications, Software development, website design, frameworks.
  • I tried to pay for my summer vacation trip using a credit card. Yet the site's credit card payment system didn't work at all; I didn't even get to the point where I should give my credit card number, their SSL cert was broken, and that wasn't the only problem, and so on. I really wonder how online stores and services can afford this kind of super crappy experience. I wonder if they realize at all how much money they're losing? I've seen this happening over and over again, and I'm pretty sure this isn't going to be the last time.
  • Ext4 encryption - Native disk encryption for Linux? Without eCryptfs or dm-crypt. Cipher mode: AES-256-XTS, AES-256-CBC+CTS, F2FS flash-oriented filesystem.
  • Elliptic curve cryptography finite field and discrete logarithms - Excellent stuff in the Gentle Introduction to ECC series of posts. Also see: Cloud Flares ECC primer.
  • Rust for Python Programmers - A great article about Rust and Python differences and how to quickly get familiar with Rust if you're already a Python programmer.
  • Studied topics of Peer-to-peer lending - and Crowdfunding
  • - Ready secure configurations for many apps like Dovecot, Apache, Nginx, MySQL, Postfix, Exim, PostgreSQL, OpenSSH
  • Logjam - TLS vulnerability explained by CloudFlare, and another example of how abundant serious bugs are.
  • UnQLite - is an in-process software library which implements a self-contained, serverless, zero-configuration, transactional NoSQL database engine. UnQLite is a document store database similar to MongoDB, Redis, CouchDB etc., as well as a standard Key/Value store similar to BerkeleyDB, LevelDB, etc.
  • Rlite - Self-contained, serverless, zero-configuration, transactional redis-compatible database engine. rlite is to Redis what SQLite is to SQL.
  • ChaCha20 Cipher @ RFC7539 and Salsa20 - Only a quick check; fast, quite simple ciphers. A great alternative to AES. I just wonder why GCM is often available ONLY with AES; it should of course be possible to use it with other 128-bit block ciphers too, it's not AES specific.
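To show just how simple the ChaCha20 core is, here's its quarter round in plain Python, checked against the test vector from RFC 7539 section 2.1.1:

```python
def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def quarter_round(a, b, c, d):
    """The ChaCha20 quarter round (RFC 7539, section 2.1):
    four add-rotate-xor steps on 32-bit words."""
    a = (a + b) & 0xFFFFFFFF; d = rotl32(d ^ a, 16)
    c = (c + d) & 0xFFFFFFFF; b = rotl32(b ^ c, 12)
    a = (a + b) & 0xFFFFFFFF; d = rotl32(d ^ a, 8)
    c = (c + d) & 0xFFFFFFFF; b = rotl32(b ^ c, 7)
    return a, b, c, d

# Test vector from RFC 7539, section 2.1.1:
out = quarter_round(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
assert out == (0xea2a92f4, 0xcb1cf8ce, 0x4581472e, 0x5881c4bb)
```

The full cipher is just 20 rounds of this applied to a 16-word state, which is why it's such a pleasant implementation exercise compared to AES.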
  • Noticed that SeznamBot/3.2 (Seznam @ Wikipedia) is using IPv6 protocol address 2a02:598:2:0:0:0:0:1032 to crawl websites. As far as I know, that's the first bot I've ever seen using IPv6 when both IPv4 and IPv6 are available. kw: robots.txt / SEO / bots
  • Friends warned me about spam on all kinds of projects. I had to tell them that I'm aware of that problem. Not just junk posts, but PURE spam, generated by swarms of bots. I know that, because I've had to close down some of my test projects, because fighting spam efficiently might require more resources than everything else together. I want to avoid CAPTCHAs as long as possible, because complex registration makes site usability much worse.
  • OpenDedup - Checked out the project as a follow-up to the log compression work (deduplicating long messages and compressing those efficiently). Yet this project is larger and heavier than what I needed when deduplicating logs. I'll keep this one in mind if I need serious deduplication at some point.
  • NVM Express - SATA is slow for fastest SSD disks and PCIe disks are usually expensive. Does NVM Express, NVMe, or Non-Volatile Memory Host Controller Interface Specification (NVMHCI) replace it?
  • I wonder if there's a way to transfer a user profile easily between GNU Social hubs? Because Load Average is so out-dated, I might like to move my profile to another hub. Is there a simple way of doing that? Afaik, there should be.
  • I'm glad to notice that 7-Zip is now running at Digital Ocean and FileZilla at Hetzner. I honestly can't recommend downloading anything from malware sites like SourceForge. I'm actually so glad that I made a donation to the 7-Zip project.
  • I once dropped a camera; it seemed to be broken, but I wasn't sure. Actually it seemed to operate quite normally, but the display was blank all the time. How do I know if the camera works or not, without the display working? My solution was to take two photos, one of clear sky and another of complex patterns. Because the photo of complex patterns took longer to process and save, I knew the camera was working. If the image processing and saving had taken about the same time, I would have known that the image was totally out of focus or the image sensor had been damaged. So I continued to take photos, even if the screen wasn't working. My conclusions also turned out to be right later.
  • Checked out CloudFlare CDN Knowledge Base and CORS @ Wikipedia
  • Reminded myself about Linux File Descriptors
  • LclBd is now using uWSGI with a larger backlog, multiple processes and multi-threading, including offload threads, as well as the HTTP router to allow clustered backends and load balancing (if ever required). The site is also using CloudFlare, but not because it would require additional caching for performance reasons. The primary reason is to provide better performance for mobile users via lower round trip times (RTT) if packet loss occurs on their mobile data connection.
  • During summer, when you do everything on mobile, you get new views on how well sites are designed. Some are slow, some totally broken, and some try to offer some BS app which nobody wants to install. It's easy to see which sites fail and which ones rock.
  • Imgur image upload from device failed multiple times, even if other image hosting sites are working perfectly. So much fail!
  • Found two separate bugs from Dolphin Browser. It's just awesome how many buggy applications are out there.
  • Checked out hosting company VersaWeb - They seem to provide really cheap dedicated servers. Unfortunately all of their cloud servers were sold out. Due to bad website design, even the prices weren't visible for sold-out services. Bandwidth seems to be really cheap too. Not as cheap as at OVH, but ridiculously cheap if you compare to Amazon or Google. There's huge variance in how much bandwidth costs depending on where you buy it.
  • About creating new social networks, even if the market is clearly full: Often it seems that these platforms will find their own (small) circle of people using it. Just like millions of different web forums have done so far. Creating one can be also great opportunity to learn how to build things, even if it's not intended to make revenue directly. That's one of the reasons why I wrote one more experimental project. Just to see how it works out. It might get other users or might not, I don't really care. I'm also often annoyed by the fact how heavy many sites are. Is it really so hard to keep to the bare essentials? Isn't Google+ even considered being a "ghost town" on modern "popular site" standards?
  • Alternative Redis-Like Databases with Python - Excellent post about new Redis like databases which can be easily used with Python.
  • Learning Robots - This is real stuff, not some BS 'smart' things which many companies are selling. When a system truly learns, you don't need to program it at all. So the 'learning' statistical component isn't just a small part of the program; it's the whole program, and it doesn't need to be written. Only the goal needs to be set, which tells whether an attempt is a success or not and whether it is better than some other attempt. Like training a dog, you know. Try until it works and then give a reward, repeat; maybe a bigger reward for better performance, or no reward until the performance is better than last time.
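The "try, reward, repeat" loop described above is the core of reinforcement learning. A toy epsilon-greedy sketch of it (my own illustration, nothing from the linked material; the reward values are made up):

```python
import random

def train(true_rewards, episodes=5000, epsilon=0.1, seed=42):
    """Learn which action pays best purely from reward feedback.
    No behavior is programmed in; only the reward signal is given."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)   # learned value per action
    counts = [0] * len(true_rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:                # sometimes explore
            action = rng.randrange(len(true_rewards))
        else:                                     # otherwise exploit
            action = estimates.index(max(estimates))
        reward = true_rewards[action] + rng.gauss(0, 0.1)  # noisy reward
        counts[action] += 1
        # incremental running average of observed rewards
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Action 2 truly pays best; the loop discovers that from rewards alone.
est = train([0.1, 0.5, 0.9])
assert est.index(max(est)) == 2
```

Setting the goal here means choosing the reward function; everything else the "program" figures out by trial, exactly as in the dog-training analogy.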
  • OVH 'ping bump', Around 50 ms is normal from Finland:
    64 bytes from vps: icmp_seq=3584 ttl=53 time=44.6 ms
    64 bytes from vps: icmp_seq=3585 ttl=53 time=199 ms
    64 bytes from vps: icmp_seq=3586 ttl=53 time=223 ms
    64 bytes from vps: icmp_seq=3587 ttl=53 time=1364 ms
    64 bytes from vps: icmp_seq=3588 ttl=53 time=359 ms
    64 bytes from vps: icmp_seq=3589 ttl=53 time=12766 ms !
    64 bytes from vps: icmp_seq=3590 ttl=53 time=11921 ms !
    64 bytes from vps: icmp_seq=3592 ttl=53 time=10045 ms !
    64 bytes from vps: icmp_seq=3591 ttl=53 time=11053 ms !
    64 bytes from vps: icmp_seq=3593 ttl=53 time=9054 ms
    64 bytes from vps: icmp_seq=3594 ttl=53 time=8046 ms
    64 bytes from vps: icmp_seq=3595 ttl=53 time=7081 ms
    64 bytes from vps: icmp_seq=3596 ttl=53 time=6073 ms
    64 bytes from vps: icmp_seq=3597 ttl=53 time=5066 ms
    64 bytes from vps: icmp_seq=3602 ttl=53 time=54.1 ms
    64 bytes from vps: icmp_seq=3598 ttl=53 time=4079 ms
    64 bytes from vps: icmp_seq=3599 ttl=53 time=3071 ms
    64 bytes from vps: icmp_seq=3603 ttl=53 time=52.5 ms
    64 bytes from vps: icmp_seq=3604 ttl=53 time=44.9 ms
    The most interesting part is that no packets got lost; the replies just came a bit slowly. Ehh...
  • Optimizing your Python programs for speed - Nothing new I could say. 
  • ITU ICT Facts & Figures [PDF] - Less than 50% of people are using the Internet. There's a great potential for growth.
  • MIT Cheetah - Jumping over obstacles - It's great how robotics is finally advancing. Robotics + Neural Networks (real ones) will get these things far in the future.
  • What's the point of 'fake followers'? I've got two types of followers here. Somewhat nerdy ICT professionals: programming, data centers, ML, AI, robotics, DB, software developers. And then I've got the second type: usually young girls posting celebrity junk and viral cute kitten videos. I just personally don't see the benefit / point of those fake profiles following me. It's also really unlikely that I would follow them back. So what's the gain; how are they making money out of that?
  • Yep, if something is really important, it shouldn't use the #internet in the first place. But I have to admit that #netneutrality is a really hard concept. Because we all know that even if there should be #neutrality and #equality in many things, that won't be happening anytime soon, if ever. Also, in some sense neutrality and equality are discrimination against something else. Some people suggest quotas for equality, but that's discrimination against the people who are on the 'wrong side of the quota', even if they would be better suited for the task. Also, people thinking about only one side of a problem often haven't formed a complete picture of what the thing they're supporting really means. Or they're only incorrectly supporting the cause partially and promoting only that part.
  • Just random thought dump from my comments to one other discussion. This kind of very limited biased thinking is clearly visible with #pirates and many other 'one thing only matters to us' groups.
  • How are these 'killer bots' different from mines? If there is a 'no go' zone, everything moving or living in that zone will be targeted. I'm sure it can be clearly announced, so you know you're taking your chances if you decide to enter the zone. A bot could also tase you and wait for backup, if that's a more appropriate approach in the situation.
  • GitTorrent, a decentralized GitHub - Lacks access control, issues, discussions, wiki, pull requests, comments. Technically it's quite similar to how Freenet works, especially the DHT / key handling part with immutable and mutable keys. All the questions of "distributed consensus" are way too familiar to me.
  • 464XLAT - 464XLAT in mobile networks strategic white paper - IPv6 only could be here much faster than you think. When IPv6 makes breakthrough, who wants to maintain any legacy IPv4 networks? #cgnat #464xlat #clat #mobilenetworks #ipv6migration #dualstack #ipv6only #nat64 #cgnat64 related keyword acronyms: KW: 3GPP 3rd Generation Partnership Project, ALG Application Layer Gateway, CAPEX Capital Expenses, CG-NAT Carrier Grade–Network Address Translation, CLAT Customer-side translator (XLAT), DNS Domain Name System, EPC Evolved Packet Core, E-UTRA Evolved-UMTS Terrestrial Radio Access, FTP File Transfer Protocol, GGSN Gateway GPRS Support Node, GPRS General Packet Radio Service, IPFIX IP Flow Information eXport, LTE Long Term Evolution, MNO Mobile Network Operator, OPEX Operational Expenses, PD Prefix Delegation, PDP Packet Data Protocol, PGW Packet Gateway, PLAT Provider-side translator (XLAT), PPTP Point-to-Point Tunneling Protocol, RA Router Advertisement, RS Router Solicitation, RTSP Real Time Streaming Protocol, SAE System Architecture Evolution, SGSN Serving GPRS Support Node, SIP Session Initiation Protocol, SLAAC Stateless Address Auto-Configuration, SNMP Simple Network Management Protocol, UE User Equipment, UMTS Universal Mobile Telecommunications System


posted May 23, 2015, 11:00 PM by Sami Lehtinen   [ updated May 24, 2015, 12:34 AM ]

  • Checked out AgileZen - Yet another Kanban style project management software, with configurable tabs etc.
  • Had some (not so) fun maintaining age old, randomly coded and 'grown' ASP projects which haven't been properly refactored. - Sigh. Well, I got the job done. I was just going mad with the absolutely inconsistent use of white space again. TABs, spaces, random indentation and so on. But I guess that's the "normal way" with source code managed by multiple persons. More fun! It seems that some of the emails the system sends use templates stored in a database, some of the messages are hard coded in source and some are in individual files. Actually, after seeing the template code, I could write messages which cause the system to totally fail during template processing, because the code just replaces [FieldName] with the field value. It's nice to read random code. Also SQL injections as well as XSS seem to be working well. All the classic fails.
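The template weakness is easy to demonstrate with a minimal Python stand-in for that naive replace-based templating (the field names here are made up, not from the actual project):

```python
def render(template: str, fields: dict) -> str:
    """Naive templating: blindly replace [FieldName] with the value.
    If a value itself contains a placeholder, it gets treated as
    markup on a later replacement pass - the classic injection flaw."""
    out = template
    for name, value in fields.items():
        out = out.replace(f"[{name}]", str(value))
    return out

template = "Dear [Name], your order [OrderId] has shipped."
# A user-supplied name containing a placeholder corrupts the output:
result = render(template, {"Name": "[OrderId]", "OrderId": "1234"})
assert result == "Dear 1234, your order 1234 has shipped."
```

Exactly the same confusion of data with markup underlies the SQL injection and XSS issues mentioned above: user input is spliced into a language that interprets it.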
  • Also had some fun with an age old MS SQL related project which uses ADO.
  • Studied Monotone Span Program (or MSP) which belongs into Secret Sharing Schemes.
  • Had long discussions about how to utilize the DHT for the most efficient data distribution, using a Merkle tree and inode-like chaining of blocks until the final payload blocks have been located.
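A minimal sketch of the Merkle tree part, as my own illustration of the general technique rather than the exact scheme we discussed:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks: list) -> bytes:
    """Hash each payload block, then hash pairwise up the tree.
    The single root hash lets a peer verify any block it receives."""
    level = [sha256(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last hash if odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

blocks = [b"block-0", b"block-1", b"block-2", b"block-3"]
root = merkle_root(blocks)
# Corrupting any block changes the root:
assert merkle_root([b"block-X", b"block-1", b"block-2", b"block-3"]) != root
```

With the root stored in the DHT, a downloader can fetch blocks from untrusted peers and still detect any corrupted block, which ties in with the chained-block lookup idea above.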
  • Spent several days in continuous integration planning meetings.
  • More or less interesting discussions with cloud service providers. What kind of payment methods do they support, do they provide a monthly invoice, does money / service need to be prepaid? Does every service renewal produce an invoice which needs to be processed by accounting? If you pre-pay something, do they require credit card information to allow automatic service renewal, and why can't automatic service renewals be paid from the prepaid account, and so on. Tons of questions. If I were their product manager, I would fix these issues more or less immediately. When you pay for something, you get an invoice, then you get a payment confirmation, and finally the paid invoice, which could be called a receipt. This invoice contains the VAT information. This is ok-ish if you're managing a few servers, but if you've got hundreds or more servers to manage, this causes a significant administrative burden. Some service providers issue a separate invoice for each server every month. That's not a great way to deal with it either.
  • I'm also wondering why the service provider assigns a DNS name to every server; no, that's not a problem, it's a nice thing. But they only configure an A record (IPv4) for it, even if the server naturally has IPv6 too. Why don't they automatically assign an AAAA (IPv6) record for the server name at the same time? They also set a reverse PTR DNS record for IPv4, but for IPv6 you need to add it manually. Or of course you can use the API for this. Yet you can't add an IPv6 PTR record without an AAAA record, and therefore you have to have a separate domain for servers (again cheap, but still annoying). All this because they just won't assign an IPv6 address directly to the server's DNS. It would be so nice to get AAAA and PTR automatically assigned without additional work.
    Their key account manager said that they're looking into these issues. But I wonder if they're really interested. Even if some of those problems are pretty silly.
  • Checked out SIIT translator (CLAT) and related Android CLAT - aka 464XLAT RFC 6877 and related NAT64, TAYGA
  • 5 of 5 stars IPv6 test
  • Found multiple bugs after late quick changes to the LclBd project. I'll be fixing those on my summer vacation. It's great, because I'll get a week's job done in a day during rainy days.
  • Found interesting screen update bugs in the Notepad++ project. I keep wondering how broken ALL software can be.
  • How Google smears out leap seconds
  • When doing prototypes, I usually start with the hardest part, the one I might be worried about getting to work at all. Because when that's done, the rest of the project is just "the usual boring work", and all the challenge was in the key part which was tested first.
  • Some popular torrent downloads are being clearly disturbed using Sybil attacks and fake peers, as well as really slow peers delivering intentionally corrupted data. Yes, all the good old known ways to disturb the DHT and inject fake peers which clients try to connect to. As well as corrupting downloads by delivering corrupted data, which causes the hash check to fail, makes the user feel the download is slow, and causes the block to be re-downloaded (hopefully from some other peer which doesn't deliver a corrupted version).
  • Reminded myself about BlueJay - The Law enforcement crime scanner
  • Tested ImDisk with a Windows 2012 R2 server to create a large ramdisk for an old application which utilizes disk a lot and doesn't use RAM / disk cache properly (due to commits / flushes). Tasks which can be safely run as a batch in RAM now work beautifully and really quickly, much faster than with an SSD. I've personally always disliked the way Windows flushes files to disk when the file handle is closed. It makes many tasks much slower on Windows than when running 100% identical code on Linux. If I want to fsync, I can call it. If I choose to leave the data to be flushed to disk by the OS later, whenever that is, then I just don't call fsync and deal with it.
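The explicit-flush approach in Python looks like this; the point is that durability is the application's choice, not something the OS should force at every file close:

```python
import os
import tempfile

def write_durably(path: str, data: bytes) -> None:
    """Write data and explicitly force it to stable storage.
    flush() empties Python's userspace buffer; os.fsync() asks the
    OS to push the file's pages to disk - called only by choice."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

path = os.path.join(tempfile.gettempdir(), "fsync-demo.bin")
write_durably(path, b"batch result")
assert open(path, "rb").read() == b"batch result"
```

Omitting the `os.fsync()` call leaves flushing to the OS writeback machinery, which is exactly what makes batch workloads fast on a ramdisk or on Linux's page cache.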
  • Watched a few episodes of CSI Cyber in background just listening. It's quite entertaining... But just because it's so incredibly bad.
  • Internet of Things, aka Internet of Targets, will be an absolute security nightmare, for sure. There's no way to avoid that as far as I know or can see right now. We've already seen this happening with WiFi networks, wireless cameras and webcams, so it shouldn't be a surprise to anyone. Many clueless people will almost completely lose their privacy. The smarthome will be controlled, watched, monitored and used as a relay to attack others, by anyone, anywhere. That's the future.
  • Something different? Boeing X-37B, AEHF 
  • I'm wondering when uWSGI and Apache will get HTTP/2 support. It would be a nice feature now when it's standardized. Current mod_spdy implementation is not same as HTTP/2 as well as it's not being developed anymore. (see mod_h2)
  • Quickly checked out the SIMON and Speck ciphers, which are pretty simple and based on Feistel-like structures; both are fast and pretty simple ciphers to implement.
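To show the structure these designs build on, here's a generic balanced Feistel network with a toy round function. This is an illustration of the Feistel construction itself, not actual SIMON or Speck:

```python
def feistel_encrypt(left, right, keys, f):
    """Balanced Feistel network: each round XORs one half with
    F(other half, round key), then swaps the halves."""
    for k in keys:
        left, right = right, left ^ f(right, k)
    return left, right

def feistel_decrypt(left, right, keys, f):
    """Decryption runs the same rounds with the key order reversed."""
    for k in reversed(keys):
        left, right = right ^ f(left, k), left
    return left, right

# Toy round function - the Feistel structure is invertible even
# though F itself need not be (F is never inverted, only re-applied).
def f(x, k):
    return ((x * 0x9E3779B1) ^ k) & 0xFFFFFFFF

keys = [0xA5A5A5A5, 0x3C3C3C3C, 0x0F0F0F0F, 0xF0F0F0F0]
ct = feistel_encrypt(0x12345678, 0x9ABCDEF0, keys, f)
assert feistel_decrypt(*ct, keys, f) == (0x12345678, 0x9ABCDEF0)
```

This invertibility-for-free property is a big part of why Feistel-style designs are so pleasant to implement: encryption and decryption share nearly all their code.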
  • Checked out Python HTML5Lib
  • When I studied backgrounds of the Reliable UDP, it was clear that RUDP was designed to be 'less effective and lighter' networking implementation than TCP. So in general it will perform worse than TCP.  Many people saying that TCP sucks, either don't understand TCP or have used huge amounts of time to fine tune their own UDP implementations. Like the stuff mentioned below with RFC 7323. It's not simple or easy task at all.
  • I really love DRY. Now when people ask my opinions, I can often just quote from my G+ posts or from my blog. No need to re-explain stuff. I just wish that instead of using email, phone or IM, people would update wiki articles and so on. If there are questions related to some topic, just update the documentation. Do not individually answer the person who asked for more information; just refer to the updated documentation. If you start answering people individually, it will be a bad choice and a never ending task, and the documentation is still going to suck in the future, not answering the FAQ questions.
  • Read RFC 7323 - TCP Extensions for High Performance - WS, TS, PAWS, RTTM, replaces RFC 1323, Window Scale, Timestamps, round-trip time measurement, protection against wrapped sequences, header prediction, IP fragmentation, duplicates, security, privacy.
  • Future of website optimization? (See: My quick thought dump about HTTP/2 specification) It's going to be even more complex than so far. What are you going to push to clients, in what specific order, and how do different browsers, JavaScript libraries and HTTP/2 implementations handle that, and so on. If you've been doing website optimization, you know it's already infinitely complex, and this is just going to add one more layer to it. How many streams should you use, and what's the optimum window size? Don't you know? How are your streams dependent on each other, and how are those prioritized? Did you know that you can send a response before even receiving a request? You don't need to tell the client to request something; you can just push it. Do you separate static cookies from volatile cookies into separate header fields? Doing so could improve compression. Did you know that data which you push_promise must be cacheable and safe, otherwise it's a protocol error? Also, the client can refuse your push attempts, even if they would be allowed by the connection settings agreed with the client.
    It's really worth of checking out and thinking deeply.  Or is it just so, that it's way too complex for most site administrators and they'll choose to use something like PageSpeed or Rocket Loader.
    Simply put, the webserver needs to know about your site a lot. It's just like optimizing CSS, JS and loading of other resources and executing those efficiently without blocking DOM parsing etc.
  • Also checked out hyper (HTTP/2, h2), a client library for Python, and mod_h2, which is an HTTP/2 module for Apache2. I tested both quickly and both seem to be doing their job well. The Python library was really easy to use, and configuring the Apache HTTP/2 module didn't take too long, because I already had a 'perfect' HTTPS/SSL/TLS configuration on my server and had used SPDY with it earlier.

HTTP/2 - My thoughts while reading RFC7540

posted May 23, 2015, 10:57 PM by Sami Lehtinen   [ updated May 23, 2015, 10:58 PM ]

This is just a quick thought dump written while reading the HTTP/2 RFC 7540.
  • It's nice that they included an h2c (cleartext) mode, because many people got upset about mandatory encryption. Some data can be pre-encrypted or is just delivered in bulk and doesn't require encryption at all, so encrypting it would only introduce unnecessary overhead.
  • 0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a Hmm, fixed long data structures; I would simply have preferred something shorter.
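Decoding that hex shows it's just the fixed 24-byte connection preface every HTTP/2 client sends first, readable ASCII dressed up as a fake request:

```python
# The magic bytes from above are plain ASCII: a bogus "PRI" request plus "SM".
preface_hex = "505249202a20485454502f322e300d0a0d0a534d0d0a0d0a"
preface = bytes.fromhex(preface_hex)
print(preface)  # → b'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'
```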
  • "a variety of flow-control algorithms", even more complexity.
  • "how a receiver decides when to send this frame or the value that it sends, nor does it specify how a sender chooses to send packets. Implementations are able to select any algorithm that suits their needs." That makes optimization easy, when you don't even know how stuff is going to behave.
  • I don't actually know if I like the proportional weighting, but on the other hand, using dependencies to build the tree is a better solution. These different methods make implementations more flexible and simultaneously even more complex. I would personally have preferred a quite basic and simple prioritization system which would have done the job. But this is more advanced stuff.
  • "Streams can be prioritized by marking them as dependent on the completion of other streams (Section 5.3.1).  Each dependency is assigned a relative weight, a number that is used to determine the relative proportion of available resources that are assigned to streams dependent on the same stream." This offers infinite possibilities for optimization as well as for failing. Yet an explicit priority doesn't guarantee anything; it just works as a hint.
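As a quick sketch of that rule (my own illustration, not code from the RFC): sibling streams depending on the same parent get resources in proportion to their weights.

```python
def proportional_shares(weights):
    """Resource share for sibling streams, given their RFC 7540 weights (1-256)."""
    total = sum(weights.values())
    return {stream: weight / total for stream, weight in weights.items()}

# Hypothetical example: stream 3 weighted 16 and stream 5 weighted 8
# under the same parent -> stream 3 should get 2/3, stream 5 gets 1/3.
shares = proportional_shares({3: 16, 5: 8})
print(shares)
```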
  • Really loved this one: "Implementations MUST ignore unknown or unsupported values in all extensible protocol elements.  Implementations MUST discard frames that have unknown or unsupported types.  This means that any of these extension points can be safely used by extensions without prior arrangement or negotiation." I've seen so many extensible protocols which immediately break if you extend them. Which is of course ridiculous.
  • An option to add padding? Nice; yes, I can see this is required at times so you can change the size of the objects being delivered to make passive monitoring harder. Also, the return to a binary protocol feels like a flashback; I've dealt with too many legacy binary protocols in the past. XML or JSON feels so fresh compared to those. But I have to say that binary protocols are more efficient and therefore preferable in this kind of use case, where implementations are going to be widely used.
  • "An endpoint that receives a SETTINGS frame with any unknown or unsupported identifier MUST ignore that setting." Perfection again, true extendability is here.
  • What? No PONG frames? Only PING ACKs. No more world wide PING PONG.
  • The push promise system and the ability to refuse pushed data etc. also add an incredible number of new optimization possibilities.
  • As if website optimization weren't hard enough already. Streams with push promises add a nearly infinite number of different optimization strategies even at the HTTP/2 level. Of course all of this has to relate to how web browsers parse and render the page and how the JavaScript interacts with it. Sounds like there are new jobs in website speed optimization, because the work is getting deeper and deeper. Can you use one strategy for all browsers, or should you have an optimized strategy for every browser? Some use a different rendering engine than others. What if some pages served by your site use a different JavaScript framework than others? This is going to be fun. You can spend months of full-time work just optimizing a site, and then you notice that hey, Firefox changed something, as they most probably will. The full-stack view of how a web site works is going to be amazingly complex, and HTTP/2 is just adding one new important part to it. I guess something like CloudFlare's Rocket Loader and Google's PageSpeed are needed as middleware, because getting this stuff right otherwise is just way too hard.
  • At this point it's easy to forget that performance-optimizing middleware again adds points where code is modified by very complex rules and can introduce tons of more or less interesting bugs. Also, the original code might not be fully compatible with these optimizations, as I've already noticed: some sites show up completely 'empty' as soon as CF Rocket Loader is enabled. I know, I'm not a JavaScript specialist and the code is probably broken on my side, but all I know is that everything works perfectly as soon as I disable Rocket Loader (even if it isn't directly their fault). Maybe machine learning will fix this? There are so many combinations to analyze, but it could automate the process of deciding which resources should be pushed, when, and for which platforms and screen configurations and so on.
  • My favorite frame type is GOAWAY.
  • Even more questions: what window size to use, and how many parallel streams are optimal. Using too many concurrent streams with similar priority just delays completion of the more important streams if prioritization is done incorrectly.
  • I also like the enhance your calm (0xb) error code; no need to send 429 / 420 errors anymore.
  • I also liked the fact that a response can be sent before the request, if the response doesn't require any data from the request.
  • Header keys are handled as case-insensitive, but keys containing uppercase must be treated as malformed. Nice, but also a bit confusing. Why bother doing case-insensitive comparison (which is slower than case-sensitive) if case doesn't matter?
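A tiny sketch of that rule (my own helper, not from any library): RFC 7540 section 8.1.2 says header field names must be lowercase, and a field name containing uppercase characters makes the message malformed.

```python
def valid_h2_field_name(name: str) -> bool:
    # RFC 7540 section 8.1.2: header field names MUST be lowercase in HTTP/2;
    # any uppercase character makes the request or response malformed.
    return name == name.lower()

assert valid_h2_field_name("content-type")
assert not valid_h2_field_name("Content-Type")  # fine in HTTP/1.1, malformed in h2
```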
  • The CONNECT method basically allows using any other protocol inside HTTP/2. Is HTTP/2 the new TCP? The Host header field is not used anymore; the authority field replaces it for HTTP/2 requests.
  • Separating static cookies from volatile cookies into different header fields when compressing, wow. One way to avoid compression dictionary misses. Again one more thing added to the vast sea of optimization options.
  • Push promises must be cacheable, ok, nice.
  • Compression must be disabled when using TLS 1.2.
  • Must support GCM, nice: "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", as well as the P-256 elliptic curve. The TLS 1.2 cipher suite black list was impressively, err, let's say ridiculously long. I guess most current HTTPS sites wouldn't work if I just went and blacklisted all of those in my browser settings. Wouldn't it be easier to list the allowed cipher suites than to maintain that massive black list? TLS / SSL is just one huge cipher suite mess. GCM suites are going to replace the older suites anyway, so I guess 128 / 256-bit AES-GCM is the way to go. No more DHE / CBC, or even worse, suites without FS.
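For example, with Python's ssl module you can restrict a context to that mandatory suite (OpenSSL calls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 "ECDHE-RSA-AES128-GCM-SHA256"); a sketch, and exact availability depends on your OpenSSL build:

```python
import ssl

# Restrict a client context to the one suite HTTP/2 requires implementations
# to support; newer Pythons may additionally list TLS 1.3 suites, which
# cannot be disabled through set_ciphers().
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.set_ciphers("ECDHE-RSA-AES128-GCM-SHA256")
names = [c["name"] for c in ctx.get_ciphers()]
print(names)
```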
  • It's also interesting to see how many new attack vectors like Slowloris (and countless other HTTP DoS attacks) HTTP/2 will introduce; I guess pretty many. As soon as HTTP/2 clients are out there, it would be fun to run a crashyourhttp2 site which attempts multiple methods of crashing the client.
  • All this sounds so complex that we're going to see a pretty big mess before things really start working.
  • Challenge? Who's going to write a fully HTTP/2 compliant server alone in a weekend? It would be interesting. Or is the technology just getting way too(?) complex?
  • Ok, maybe it's quite possible after suitable libraries become available. I guess this leads to a situation where there are fewer independent implementations and everyone just uses the well-known libraries. Pretty much the case with SSL/TLS, SSH and IPsec: there just aren't too many implementations to choose from.
Related keywords: HTTP/2 some keywords: client, connection, endpoint, stream, frame, peer, receiver, sender, server, gateway, intermediary, proxy, tunnel, payload, http, https, request, response, implementation, resource, URIs, SSL, TLS, security, semantics, HTTP2, h2, h2c, stream identifier, flags, frame size, error, block, push_promise, continuation, settings, identifier, header compression, fragments, sequences, octets, end headers, stateful, context, decompression, discrete, sequence, interleaved, multiplexing, bidirectional, exchange, characteristics, concurrency, concurrently, unilaterally, endpoint, significant, initiating, open, reserved, local, remote, idle, closed, send, recv, state, rst stream, priority, window update, half-closed, flow-control, flow-controlled, prioritize, establish, settings max concurrent streams, refused stream, protocol error, stream error, scheme, interfere, contention, proxy, algorithm, interact, blocking, capability, advertise, operating, deadlock, bandwidth-delay product, transmission, guarantee, processing, particular, suggestion, dependent, dependencies, dependency, parent stream, exclusive, dependency tree, weighting, state management, extending, extension, extensions, services, scope, padding, defines, identifier, identified, establishment, synchronized, synchronization, communication, comprehension, application, padded, consecutive, frame size error, conveys, configuration, acknowledges, peer, receipt, settings header table size, settings enable push, settings max concurrent streams, settings initial window size, settings max frame size, settings max header list size, settings ack flag, push promise, promised, context, octets, associated, opaque, ping, measure round-trip time, determining, initiated, debug, immediately, precede, get, put, delete, activity, gracefully, maintenance, shut down, standards track, circumstances, no error, retry, cleanly, minimally, similarly, counted, unsynchronized, persistently, unauthorized, 
safeguards, logged, sensitive, privacy, throttling, indirectly, propagation, flow control error, size increment, window update, end stream, flow control window, indicates, permitted, transmit, buffering, capacity, applicable, exceeds, available, asynchronous, aggressively, initial window size, observers, violation, no error, protocol error, internal error, flow control error, settings timeout, stream closed, frame size error, refused stream, cancel, compression error, connect error, enhance your calm, inadequate security, http 1 1 required, semantics, conditional, range, caching, authentication, syntax, routing, request, response, chunked transfer encoding, trailing, carried, blocks, terminate, 101 switching protocols removed, registry, key-value, listing, case-insensitive, malformed, connection specific, header fields, metadata, keep-alive, transfer-encoding, method, scheme, authority, subcomponent, query, target, copying, absolute, optionally, contributing, allocation, trailing, trailers, reliability mechanisms, silently, bindings, alive, active, cacheable, push promise, rst stream, referencing, connect, proxy, mapping, requirements, considerations, keying material, identifier space, transport-layer, gracefully, certificate, resources, URI, TLS, wildcards, misdirected request 421, TLS 1.2, 1.3, Server Name Indication (SNI), generic, unnecessary, context aware, appropriate, performance, security, renegotiation, long-lived, unusable, confidentiality, credentials, handshake, immediately, implementations ephemeral key exchange sizes, ephemeral finite field Diffie-Hellman (DHE), ephemeral elliptic curve Diffie-Hellman (ECDHE), DHE key sizes up to 4096 bits, black-listed suites, appendix, deployment, triggering, reaction, attacks, cross-protocol, server authority, determining, capabilities, combination, difficult, attackers, cleartext, intermediary encapsulation, field-content ABNF, cacheability, cache-control, denial of service considerations, commitment, storing, 
state, strictly, waste of resources, abused, expend, processing time, processor, memory, storage, numbers, empty, legitimate, abuses, excess, implementation, commit, critical, correctness, ensuring, Request field too large, disproportionate, load, inexpensive, secret, recover, secrets, attacker, control, demonstrable, exploit, BREACH, settings, client, fingerprinting, user, opportunity, handling, Application-layer Protocol Negotiation (ALPN), protocol IDs, Hypertext Transfer Protocol version 2, PRI, data, header table size, enable push, max concurrent streams, initial window size, max frame size, max header list size, idempotent, compression, cookie, fips186, rfc2119, rfc2818, rfc3986, rfc4648, rfc5226, rfc5234, rfc7230, rfc7231, rfc7232, rfc7233, rfc7234, rfc7235, tcp, tls-alpn, tls-ecdhe, tls-ext, tls12, alt-svc, bcp90, breach, html5, rfc3749, rfc4492, rfc6585, rfc7323, talking, tlsbcp, h2-14, h2-16.

2015 summer keyword dump

posted May 22, 2015, 8:54 AM by Sami Lehtinen   [ updated May 22, 2015, 8:54 AM ]

Just a little keyword dump from the ARTS standard

Consumer promotion delivery, marketing and merchandising, proof of purchase, returns, rebates, and manufacturer registration, payment dispute resolution, issuer, issuers, merchants, financial institutions, manufacturers, provide the digital purchase receipts, customers, mobile operators, offline and online trading, purchaser, authorizes, issuing, receiving, retail, consumer, business, organizations, vendors, agents, expenses, expense, taxes, reimbursement, credit card, debit card, dispute resolution, rebates, warranties, interfaces, model, architectural model, business process model, business scope, out of scope, changes, payments, pays, RFID, coupon, survey, exchange, exchanged, disputes, amount, bank, accounting, expenses, invoice, travel, manufacturer, redeem, retailer, extended, targeted, targeting, ticket, ticketing, tickets, proximity, financial applications, redemption, shopping list, settlement, green, audit, candidate, inventory, attendance, promotions, promotion, loyalty, security, data warehousing, forecasting, XML, GITN, expense management, data mining, end user, notification, notifications, mobile, application, BPMN, CRM, POSlog, UUID, format, store, identify, tender, restock, SGML, OFX, data validation, XSLT, SOA, best practices, data dictionary, common data, data mapping, transactions, header, representation, brief description, scenario, instance, transaction, transactions, UPC, EAN, quantity, barcode, QR code, process flow, flowchart, data flow, diagram, business process mapping, use case, use cases.

Some startup related background information and stuff I've done

  • Did some product planning back in the day. This stuff is already outdated, so I can write a bit about it. Actually I'm just listing keywords, because I can't go into any details anyway.
  • Business Plan, Initial Market Study, Concept, Vision, Mission, Market Research, Operating Strategy, Target Market, Competitors, Promotion, Financial Viability, Budget, Team, Background. Technology considerations: Native, HTML5, Java, JavaScript, Node.js and other alternatives, Proof Of Concept (POC). Where development should be done, what kind of team and developers could be used, where funding would come from, and so on. Had a few meetings with potential developers and so on.
  • Hosting costs calculation matrix including several options, AWS, Azure, GCE, Hetzner, UpCloud, OVH, Nebula, CapNova, Sigmatic, etc. Competitor analysis including review of 10+ competitors. 
  • Customer Segments, Value Propositions, Channels, Customer Relationships, Revenue streams, Key Resources, Key Activities, Key Partnerships, Cost Structure.
If you want to know more, just mail me. - Thanks

Log deduplication and compression = 73,62% saving vs 7-zip ultra compression

posted May 22, 2015, 8:38 AM by Sami Lehtinen   [ updated May 23, 2015, 7:10 AM ]

I had some problems storing my long-term log data. Earlier it was just LZMA compressed using 7-Zip ultra compression, but I got sick'n'tired of those logs being so large (yes, that's quite a relative term). Yet the logs are so important that they have to be stored for seven years and in at least two physically separated locations, and there are secondary off-site backups for even these primary storages. So what to do? I decided to write a slightly better solution for handling this kind of log data.

I created three database tables
1) Timeline: timestamp, message_id
2) Messages: id, hash, block_id
3) Blocks: id, datablob

Now the timeline contains only timestamps and an id referencing the messages table, which contains the hash of the data record being stored and a reference to the block id, which tells which block contains the message.

I collect messages until I've got 32 MB of NEW data. After that the data is LZMA compressed into one message block, and the message hashes are written and linked to the timeline.

During playback I read the required blocks, decompress the messages from those and then cache the decompressed messages by message id using CLOCK-Pro caching (PyClockPro), so I can use memory efficiently and don't need to decompress blocks all the time. Efficient caching like CLOCK-Pro also causes unneeded parts of decompressed blocks to be quickly discarded from memory.
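The write path described above can be sketched roughly like this (a minimal illustration of my own; table layout matches the three tables above, but the function names, hash choice and block format are assumptions, and playback/caching is left out):

```python
import hashlib
import lzma
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE timeline (timestamp REAL, message_id INTEGER);
    CREATE TABLE messages (id INTEGER PRIMARY KEY, hash BLOB UNIQUE, block_id INTEGER);
    CREATE TABLE blocks   (id INTEGER PRIMARY KEY, datablob BLOB);
""")

pending = []                    # (message_id, payload) awaiting compression
BLOCK_LIMIT = 32 * 1024 * 1024  # flush after 32 MB of NEW data

def flush():
    """LZMA-compress pending payloads into one block and link the messages."""
    if not pending:
        return
    blob = lzma.compress(b"\n".join(p for _, p in pending))
    cur = con.execute("INSERT INTO blocks (datablob) VALUES (?)", (blob,))
    con.executemany("UPDATE messages SET block_id = ? WHERE id = ?",
                    [(cur.lastrowid, mid) for mid, _ in pending])
    pending.clear()

def log(line):
    payload = line.encode()
    h = hashlib.sha256(payload).digest()
    row = con.execute("SELECT id FROM messages WHERE hash = ?", (h,)).fetchone()
    if row is None:             # deduplication: only NEW payloads get stored
        cur = con.execute("INSERT INTO messages (hash) VALUES (?)", (h,))
        row = (cur.lastrowid,)
        pending.append((row[0], payload))
        if sum(len(p) for _, p in pending) >= BLOCK_LIMIT:
            flush()
    con.execute("INSERT INTO timeline VALUES (?, ?)", (time.time(), row[0]))
```

A repeated message costs only one timeline row pointing at the existing message id, which is where the big savings over plain compression come from.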

Why is this so effective? Because most log messages are repeated a lot, especially once you take the timestamp part out of the message and store it separately. Let's see the statistics on how much more efficient this method is compared to plain old yet 'very good' 7-zip LZMA ultra compression.

Log entries for 2014 compressed using this method:
413GB old 7-zip data
 61GB My format

Year 2015 so far (about 4 months worth of data):
124GB old 7-zip data
 28GB My format

Writing the required code to make all this happen took just about 3 hours, but processing the old data took a bit longer. In the future the data processing will be faster (less data needs to be read and stored) thanks to this efficient compression and deduplication.

Space savings: 86GB vs 326GB = 73,62% saving compared to 7-zip ultra compression. At this point it's easy to forget that the 7-zip ultra compressed data is already over 80% reduced in size.
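The saving figure checks out:

```python
# Recomputing the claimed saving from the sizes above.
deduped, seven_zip = 86, 326  # GB
saving = (1 - deduped / seven_zip) * 100
print(f"{saving:.2f}% saved")  # → 73.62% saved
```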

Let's just say: a few hours of work and these results? Yes, I'm really happy with this. You might ask why the logs are only 80% compressed; isn't the usual ratio more like 95%? Yes, that's right, the usual ratio is something like 95%, but these log messages contain some rather long encrypted entries, which benefit a lot from deduplication but won't compress as text. Also, storing these messages repeatedly whenever they're missed by the compression window is quite expensive, even if the 7-zip ultra window is formidable in size compared to almost all other compression methods. As expected, context-aware compression can do a better job than generic compression.

I know there are existing solutions, but I was a bit bored during my summer vacation, and this looked like a nice test to do.

AI, ML, DB, Work, Profit, Agile, Grows, Python, JS, Project management, Software development, etc

posted May 17, 2015, 12:44 AM by Sami Lehtinen   [ updated May 17, 2015, 12:53 AM ]

  • Really nice writing about databases: CP or AP? - Yes, many systems contain many different locking modes and replication options, and some parameters can be tuned per request and so on. Writes might fail but reads still work (with stale data). Been there, done that. Especially NOT properly understanding your data store will lead to amazingly hard-to-debug problems at some point.
  • Created related tags and tag cloud features for one project.
  • Created a full-text search engine, including an adaptive-rate crawler, for one of my friends' projects. It also supports refresh pings, especially for data sources which have been off-line for an extended period, where it could otherwise take quite a while before the adaptive-rate crawler would hit them again. It was fun stuff to do. Most of it works using fully asynchronous JSON message queues where requests and responses are linked using a UUID. This message queue design allows easy horizontal scaling.
  • Something was seriously broken in the Sonera network - huge packet loss and broken DNS. I guess the packet loss was causing DNS lookups to fail totally. Ping works well to most national servers, but to Amsterdam / London / Frankfurt it works really badly, with 50+% packet loss. DDoS? Interesting. Let's see if something shows up in the news later today. Update: the situation persisted for about 20 minutes and seems to be mostly resolved now. DNS works and the packet loss is gone.
  • Crate - Yet another distributed and scalable database with NoSQL goodies. Didn't go into details, but I like the single-technology approach instead of a zoo of technologies. I often prefer not to keep adding new tools to a project and, if possible, to deal with new problems using the existing tools. Otherwise things turn into a monstrous mess of different technologies which aren't even known well and can cause serious production issues due to 'unexpected' things happening. Unexpected because we just don't know how they behave in certain situations. Also available on Google Cloud Platform with simple and easy deployment.
  • Reminded myself about video compression frame types.
  • IoT? Internet of Things? Nope, wrong. It actually stands for Internet of Targets. Smile. That's something I completely agree with. It's absolutely unavoidable that this will get much worse before it might get better.
  • Once again had to deal with OVH and server freeze-ups. It's really annoying when the system just freezes for two minutes; everything else is fine, except nothing happens for a long while, and that's really something which can't be tolerated. Yet it's still better than data loss or an extended outage. But it's an absolute no-go for 'normal' operation.
  • Watched the Open Networking Summit 2014 (ONS2014) keynote by Google's Amin Vahdat. Good stuff.
  • Watched an organization using 'copy paste' system integration, where they open data source A at regular intervals and then just copy-paste new content from it to system B. That's nice and efficient. Lol. I know, I know, this is nothing new at all, business as usual, but it still makes me smile widely.
  • Why I won't be switching to Disque - That's very well said. Many projects are fun for a while, but especially the technically demanding ones can turn into a horrible burden rather quickly. Things which "mostly work" are fun to make. Things which work darn well and are reliable, fast and all the good things... well, not so fun. Those require a team of smart guys willing to tune the code for years. There will be times of frustration and sheer despair.
  • Bluefish programming editor - Yep, clearly made by programmers and developers. I've reported multiple bugs in it earlier, like file saving working differently when using keyboard shortcuts versus the mouse. Now I found yet another bug: when I open a document and try to use syntax highlighting with it, I have to enable syntax highlighting, then select language x and then switch back to language y, even if language y was already selected when I enabled the syntax highlight option. Even hitting manual syntax highlight rescan won't work before those steps have been completed. Also, doing that a few times seems to crash the editor. Yep, that's the usual state of software. Anyway, I'm pretty happy with Python and some of the other tools I'm using, because those seem to be very robust and I rarely (almost never!) need to waste my time fighting a broken platform or working around it with horrible kludges.
  • The latest jQuery Mobile, 1.4.5, again contains the classic bug where stuff goes under the header bar. Aww. I've seen tons of discussions about this issue, and there are plenty of "more or less" silly sledgehammer work-arounds to just make it work, but none of those are actually pretty solutions at all. Some force CSS to insert empty space at the top of the content, which is silly. Some trigger the browser window resize event, which is silly. Etc. All of these do work, which simply proves that the original problem is a clear bug and I'm not just using the framework incorrectly. Btw, with older versions like 1.4.2 there was no issue like this. It's also silly that when you open the page, everything is ok; open the first link, stuff gets broken; go back and open the same link again, stuff works again. This is exactly the kind of 'feature' I deeply hate about frameworks, web development and developer tools in general. That's why I like Python so much: if I do something wrong, it doesn't work, or at least produces conclusively and repeatably similar errors. Except ... I think I just found an example of the opposite.
  • Actually I think this is just a case where the ordering of events is random, but I don't know why. I would prefer a consistent way of repeatedly showing this error:
    File "", line 3507, in __new__
      field.add_to_class(cls, name)
    File "", line 1073, in add_to_class,, self.related_name))
    1st run: AttributeError: Foreign key: ***keyN*** related name "***nameN***" collision with foreign key using same related_name.
    2nd run: AttributeError: Foreign key: ***keyY*** related name "***nameY***" collision with foreign key using same related_name.
    When I run that script it shows more or less random related-name errors, even if there's no collision with that particular key; instead it just reports a related-name collision with some of the items being created with a foreign related key. I would prefer it to always show the first collision it encounters instead of interestingly randomizing the order in which it shows the errors.
  • It seems that some of the page rendering & JavaScript problems were caused by CloudFlare. No, I'm not now referring to the situation where the page goes under the header, but to the situation where the page content flashes only briefly without formatting, and when the JavaScript-based page formatting code should format the page, the end result is just an empty page. I'm sure everyone has encountered this situation at times. Solution? Disabling CloudFlare's Rocket Loader feature. After that everything works perfectly. I'm not sure what kind of tricks CF uses to decide when to use Rocket Loader and when not to, but the most annoying part of this problem was that it was hard to debug: at times there were no problems at all, and with some browsers and after a full reload everything might or might not work, etc. So there could be hidden 'ordering' issues where something tries to execute before something it requires has finished loading, and boom.
  • SSD drives might not provide extended data retention when powered off. Some drives lose data in one year, some in 3 months and some even faster. I personally would say that some of those times are much shorter than I expected, so it's not a good idea to buy an external SSD for extended data storage. That's a good example of where you still should use a traditional HDD.
  • VENOM - Funny, modern virtual servers are hackable via the floppy disk controller. Yes, that's right. Bugs can be and are lurking just about everywhere, especially in places where nobody bothers to look for them. Are you using Xen, KVM or QEMU on your servers? Have you already patched against VENOM? Afaik, this is one of the examples where cloud service providers offer better security than self-hosted systems: they have a real priority on keeping systems secure. When systems are just "run" as a side business or business enabler, but not as the primary focus, things like this can easily go unnoticed and such efficient high-priority measures wouldn't be taken. VENOM: "An out-of-bounds memory access flaw was found in the way QEMU's virtual Floppy Disk Controller (FDC) handled FIFO buffer access while processing certain FDC commands. A privileged guest user could use this flaw to crash the guest or, potentially, execute arbitrary code on the host with the privileges of the hosting QEMU process."
  • Read: Final HTTP/2 RFC 7540 specification - I need to write a separate post about what I really think of it, evaluating (my personal opinions of course) the choices they have made. Based on the first read-through, my personal favorite is the GOAWAY frame. I also still agree with the stuff I've written earlier. HTTP/2 is now so complex that implementing it would be a nightmare; it's just better to use a pre-existing HTTP/2 library than to try to make a compatible implementation. This will lead to the situation which has already happened with SSL, where there aren't actually too many options to choose from. I guess most web servers won't even bother to write an HTTP/2 implementation completely from the ground up. Maybe only some large ambitious projects like Apache, Nginx and IIS will do it; others will just pass, because it's not worth it. I'm interested to see what kind of approach the uWSGI guys will take with HTTP/2. They seem to be able to tackle all kinds of complex stuff quite easily; I guess they've got a great and really competent team working on it.
  • Something different? Reminded myself about Kilo-class attack submarines and especially about the Russian Lada-class submarines (Project 677).
  • Does Googlebot run and index JavaScript? - Yes it does. I guess this will be one of the things that again makes a difference between search engines: some process dynamic JavaScript-generated content and some won't, which of course could lead to massively better results from the search engines which do process it.
  • Microsoft investing in global submarine cables and dark fiber capacity? Doesn't really surprise anyone. I think it's quite a clear investment when you're a large enough player. It's better to own than rent, but only when scale truly allows it.
  • Writing responsive and fluid web pages with HTML5 and CSS using the new picture element, without requiring JavaScript to select right images for the page.
  • Why you shouldn't use MySQL (or a database in general) as a queue. - Yet it depends so much on the environment you're coding for and also on the performance requirements. I generally prefer NOT to add new technologies and dependencies as long as I can deal well with the existing ones. Why use MySQL if SQLite3 will do the job? Why add a message broker if an SQL database does the job well enough? I personally prefer to use well-known technologies instead of new ones, because the new ones will also bite you. You'll make some kind of naive implementation using the new solution, not test it properly, and after a while you find all kinds of race conditions and other more or less interesting "surprises", because you just didn't know how the technology you're using behaves. As an example, I was quite surprised at one point that select * from table where valuefield=0 produced totally different results than the same query with valuefield = 0.0. Yep, I just didn't know what I was doing, and you're going to hit that kind of surprise several times whenever you start using something you don't know well. I can program in Python, so I assume I can trivially port my programs to JavaScript at any time and run those in every browser. Well, yes and no; it will be pure pain before it works, especially if I don't bother to read the basics and just make a hasty implementation that "works". But if there are tables which are used as some kind of queue with a status flag, at least it makes sense to use partial indexes! This is one point I've mentioned several times in my blog already.
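That last point can be sketched with SQLite (partial indexes are available since SQLite 3.8.0; the table and column names here are hypothetical): only the rows still waiting in the queue get indexed, so the index stays tiny no matter how many finished rows pile up in the table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, "
            "done INTEGER NOT NULL DEFAULT 0)")
# Partial index: covers only queue rows where done = 0, so completed rows
# never bloat it and "fetch the next pending job" stays fast.
con.execute("CREATE INDEX pending ON jobs(id) WHERE done = 0")

con.executemany("INSERT INTO jobs (payload, done) VALUES (?, ?)",
                [("a", 1), ("b", 0), ("c", 1), ("d", 0)])
pending = con.execute(
    "SELECT payload FROM jobs WHERE done = 0 ORDER BY id").fetchall()
print(pending)  # → [('b',), ('d',)]
```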
  • Did you know that Windows already contains TCP / UDP port forwarding / relaying as a basic feature? That's nice, and really useful at times: "netsh interface portproxy"
  • Started to use atexit from the Python standard library for one project which requires quite a lot of cleanup when it exits. Very useful. In many cases I've used a try/finally construction instead.
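A minimal sketch of the atexit pattern (the scratch-file cleanup is a made-up example, not the actual project):

```python
import atexit
import os
import tempfile

# Hypothetical resource that must be cleaned up no matter which
# code path ends the program.
scratch = tempfile.NamedTemporaryFile(delete=False)
scratch.write(b"work in progress")
scratch.close()

def cleanup():
    # Idempotent: safe to call more than once.
    if os.path.exists(scratch.name):
        os.remove(scratch.name)

# Registered functions run on normal interpreter exit,
# in reverse order of registration.
atexit.register(cleanup)
```

Compared to try/finally, this decouples the cleanup from any single call site, which helps when there are many places the program can exit from.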
  • The uWSGI background worker thread test was successful. So all requests which I assume can be a little slow to handle, due to external API dependencies or reading tons of foreign keys from the database, will be executed in the background while showing the user a message that the data is being processed on the server side. It works. This also prevents CloudFlare timeouts and of course follows the best practice that the request itself shouldn't take too long to handle. That's also something you really have to do with Google App Engine, because user-facing request processing time is so limited. But it's not a bad thing at all.
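The pattern can be sketched in plain Python threads, leaving uWSGI itself out of it (the job registry and the slow task below are hypothetical stand-ins, not the actual application code):

```python
import threading
import time
import uuid

# In-memory job registry; a real app would use a shared store
# visible to all workers.
jobs = {}

def slow_task(job_id):
    time.sleep(0.1)          # stand-in for a slow external API call
    jobs[job_id] = "done"

def handle_request():
    """Return immediately with a job id; heavy work runs in background."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = "processing"
    worker = threading.Thread(target=slow_task, args=(job_id,), daemon=True)
    worker.start()
    return job_id, worker

job_id, worker = handle_request()
# The request handler returns right away; the client can poll the
# job id and show "data is being processed" until the status flips.
worker.join()
print(jobs[job_id])  # → done
```

The same idea, done with uWSGI's own facilities, keeps the user-facing request well under any proxy or platform timeout.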
  • The latest The Economist (May 9th, 2015) also has an interesting article about Artificial Intelligence (AI). It's a really hot topic. It's much more than the current "statistical machine learning models" like pattern recognition. Yet deep learning still doesn't amount to deep AI or full artificial intelligence. Intriguing concept of an artificial brain.
  • Reminded myself about multi-version concurrency control (MVCC) and TIMESTAMP / ROWVERSION optimistic concurrency control, which allow you to read data, process it, and then simply compare-and-swap (CAS) it back into the database very quickly when done.
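A minimal compare-and-swap sketch using an explicit version column in SQLite (a hypothetical account table; SQL Server's ROWVERSION bumps the version automatically, here it's done by hand in the UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account "
             "(id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100, 1)")

def deposit(conn, account_id, amount):
    # 1) Read the row together with its current version.
    balance, version = conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()
    # 2) Process the data outside the database (may take arbitrarily long).
    new_balance = balance + amount
    # 3) Compare-and-swap: succeeds only if nobody changed the row meanwhile.
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, version),
    )
    return cur.rowcount == 1   # False means a concurrent writer won; retry

print(deposit(conn, 1, 50))  # → True
```

No locks are held while the data is processed; a lost race simply shows up as rowcount 0, and the caller re-reads and retries.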
  • Checked out - That's great. I'm not surprised it's happening. That's something I would also do if I were in the right position.
  • Don't drown in IPv6 addresses. Hmm, not such an interesting post. As far as I can see, everything in it was absolutely obvious. The main question is whether you want to provide a reverse name for all IP addresses or just for those which are actually in use. That's also an interesting question. As said, the address space is huge and it's very likely going to be mostly or nearly completely unoccupied.
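To get a feel for the scale question, Python's standard ipaddress module can generate the reverse pointer for a single address; a /64 alone holds 2**64 of these names, which is why pre-generating reverse records for the whole space (rather than only for addresses in use, or synthesizing them on the fly) is hopeless:

```python
import ipaddress

# Every IPv6 address expands to 32 nibbles under ip6.arpa.
addr = ipaddress.ip_address("2001:db8::1")
print(addr.reverse_pointer)
# → "1.0.0.0." + "0." * 20 + "8.b.d.0.1.0.0.2.ip6.arpa"
```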
  • Robots and AI are going to replace many jobs, but is that a problem? It's clear that there will be drastic changes in the future. Many jobs will be replaced by robots, and supporting businesses will go too. Yet this will free people for more productive and better jobs. A bit more writing about self-driving cars in the US.
  • Because my test project is now pretty much working as I wanted it to, I'll focus next on machine learning Python libraries. The only things which remain to be fine-tuned are a few JavaScript things and some heavy background processing of data, which are technically trivial tasks but just require the right mood. To produce nice visualizations from the data I'll be using Tableau, which is one of the data discovery and visualization tools I really love using. I've also got some pretty nice data sets which I can use for testing; unfortunately they're such that I can't publish the results directly. But I think I can then utilize the lessons learned and the resulting models for something else later.
  • I've always been wondering why high-end vending machines aren't more popular. Wouldn't they be optimal at least in cities, where square meters are expensive? You could simply order what you want, and when you're at the store the goods are there ready and waiting for pickup, instead of you waiting. Of course the next step would be automating the delivery from that point on. Having a picnic in a park? Running out of wine and cheese? A few clicks on mobile and the stuff will be landing there in two minutes. Even without this delivery method, I thought this kind of fully automated pickup point would be nice near metro stations and so on. There was a story about a small village which isn't large enough to run a profitable store; they replaced the store with fully automated vending machines. So this even works in cases where there are too few customers for a traditional store with personnel. Is it time to create a container system for this kind of vending system? Refilling, transport and everything could be fully automated. In many European countries you can just pick up your pre-paid mobile data SIM card from a vending machine. I was disappointed to notice it wasn't that easy when I traveled to the US for the first time.
  • Is blocking ICMP on the firewall a bad idea? - Been there, done that. I've also tried blocking protocol 41 and DHCP, and yes, you'll end up breaking the network. I've also seen tons of networks where DNS is more or less broken.
  • Checked out Vortex Bladeless vertical wind generators - I wonder if you can call it a wind turbine, because it isn't a turbine at all.
  • SSL Labs now rates my SSL/TLS configuration as A+.
  • Failure of Agile? - I really liked the GROWS method. - Who said that the agile process itself shouldn't be agile? Of course, if you see a need for modifications, you'll adapt it to your needs. Shouldn't that be clear to everyone? I personally think that the failure starts at the point where talking about process politics becomes much more important than what we're actually trying to get done. It's an absolute loss of focus. I think it partially belongs in the category of analysis paralysis. Next time you need to carry something from the till to your car, hire a personal trainer, a process consultant, a physiotherapist, an environmentalist and a few other guys, and have meetings for a few months to figure out the best way of getting your groceries to the car. Are you really using plastic bags? Have you researched the environmental effects of plastic bags? No, I think we should form a new team to research that topic too. - So much fail. Yet I've seen that happening over and over again. I personally know several engineers who always seem to prefer this way. Whatever tasks need to be done, however simple, they can spend months on them producing absolutely nothing of value, just because it's "important to research this topic". How about just getting it done in a few hours instead of taking months? I just wonder why "smart programmers and engineers" often seem to have a total lack of common sense as well as an extremely poor or even non-existent understanding of return on investment (ROI). - - Where's the practical approach? If a customer is paying N units for stuff that does X, do you get the customer to pay 20x the price if you do it "stupidly well", researching everything and writing "so great code"? All that it takes is that it does the job and works reliably. Everything else, all kinds of coolness and research junk, is just an absolute waste of time and reduces the profit from the job. 
Sometimes I even see people involved in sales doing similar kinds of things without properly considering it. It's horrible; if anyone should be very knowledgeable about the costs and profitability of the stuff being sold, it's them. It's one thing to do something as a hobby and another to do it as a profitable business. I don't care if you've been building that absolutely picture-perfect WWII battleship model in your cellar for the last 5 years, but I do wish you very good luck trying to sell it at a good hourly rate to someone. I'm also often wondering about employees who seem to be absolutely clueless about profitability. Isn't it everyone's job to take care that they work profitably for the company? Of course, one factor affecting this is that many people are absolutely clueless about what their work is worth. Also, focusing on tasks which will produce long-term savings (which increase profit too) or direct profits can be a very good investment. Unfortunately, many times the guys investigating something 'cool' or 'perfect' do not focus on those aspects. Things like renegotiating or changing the service provider for network connections or servers, reducing licensing costs, automating processes, system integrations, and so on can easily generate savings which can be counted as 'passive income' and be several orders of magnitude larger than your yearly salary, even on a monthly level. That's what makes you valuable. Over-engineering something for 'one-off' cases just makes you a drag on the whole organization. A self-guided attitude also helps a lot, but only if you can smartly figure out what you should be doing, so that it's most beneficial for the whole organization. I always remember what one consultant working for a large ERP company said: "My salary is so high that if I'm not always invoicing all the work I do, I'll get fired pretty quickly. I just need to make sure that my work is worth it to my employer and its customers." - Yet it seems that it's quite a small percentage of the workforce who get that. 
You'll get agile when you put a small team of competent guys with the right focus on getting the job done, and don't give them any unnecessary rules about how to organize things. They'll figure it out almost immediately, without wasting time on creating some kind of more or less useless rulebook. This makes common sense: these are your strengths, you'll do that, let's run iterations, ask for opinions when needed, let's just get this done and delivered. Most likely the result is good enough and it gets done pretty quickly. Also, whenever something is unclear, iterate really quickly using light drafts and then code it. When you compare this to the 'other teams', they might still be discussing initial topics like who should be put on the team and what kind of external consulting offers should be requested. (Yawn)
  • This also reminds me about Parkinson's law of triviality aka bikeshedding.
  • Also see the GROWS method website for the future of discussing topics and wasting our time instead of getting something else done. Yeah, this is kind of a joke; something like GROWS is a good idea, but it's a completely different question whether I personally need it. I think I've got enough experience in this business to draw my own lines and optimize per case quickly, instead of using some more or less suitable fixed rule sets. When someone says "do X", I always ask why; if that seems reasonable, can this be done better, and then the next question is whether it's even worth planning how to do it better. Very often the answer is no, it isn't.
