Blog

My personal blog about stuff I do, like and am interested in. If you have any questions, feel free to mail me! My views and opinions are naturally my own and do not represent anyone else or other organizations.


OpenBazaar more thoughts & ideas

posted by Sami Lehtinen   [ updated ]

My comments & questions about Ricardian Contract JSON format


OpenBazaar Ricardian Contract Template.md

Please note that the comments are based on an early version which has since been updated. The comments are also really quick draft thoughts based on a single reading pass, so they're not going to be flawless.

Numbering

I assume the zero lead int (len 2) prefixes for keys are there to order stuff for human readability, aka keeping entries in a certain order. Are those going to be included in the final format? I don't know if it matters. As a nerd I would say ditch those, but using those could still improve readability during the specification phase.

Numbering is tight, what about new inserts? If numbers are included changing numbers would break stuff. Or should code strip numbers before processing the contract even if those are in JSON or what? I guess everyone remembers renum in BASIC.

Using tight numbering leads to similar problems to traditional CSV columns (without header row) or length delimited data (without header row), where new additions are only practical to the END of whatever 'set' there is.

 - Numbers got ditched.

Category

Can category contain a list of category information including sub categories? Transportation -> Cars -> Cabriolet or whatever, using some separator. Or does it need to be one 'non-hierarchical' string? How will that work for UIs? I know there's category_sub, but it clearly has a different meaning.

 - This one got answered, there will be mapping to numbered category lists with hierarchy.

Consistency

If other tags got a prefix number, why don't the 'inner layers' have one? Like in 02_pubkeys there's Bitcoin and PGP, but those do not have a prefix number? I'm a tech nerd, I hate things which don't seem to be consistent. Consistency is the key for 100% automated processing, any exceptions to 'generality' will suck.

 - Like I already mentioned, numbers got ditched.

Extensions

Official and non-official extensions? Is there an official way to separate those? Standards like SMTP and HTTP/2 (and many others) contain a clear definition of how you should deal with such cases. Of course some 'non-official' extensions are pretty standard too, like some X- headers with SMTP. I would like to see ways to do both, official and non-official extensions. Non-official extensions would be included in the contract but ignored by clients which do not understand what those mean. Like I said, I've seen so many extendable formats which get totally broken if you go and extend those. I would hate to see it happening again.

 - No official comments to this, so far, afaik.
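What I mean by "ignored by clients which do not understand what those mean" can be sketched roughly like this. It's only a minimal sketch: the key names, the `KNOWN_KEYS` set and the `x-` prefix are my own made-up examples, not anything from the actual contract spec.

```python
# Hypothetical subset of keys a client understands; not from the real spec.
KNOWN_KEYS = {"category", "keywords", "price"}


def split_contract(contract: dict):
    """Separate keys this client understands from extension keys it should ignore."""
    understood = {}
    ignored = {}
    for key, value in contract.items():
        (understood if key in KNOWN_KEYS else ignored)[key] = value
    return understood, ignored


# A contract carrying one made-up, non-official extension key.
contract = {"category": "Cars", "price": "1.5", "x-warranty-months": 12}
understood, ignored = split_contract(contract)
print(sorted(understood))
print(ignored)
```

The point is that the unknown key doesn't break parsing. The client just carries it along untouched, which also matters if the contract is signed as a whole.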

Currency units

I really would like to see a possibility for fiat currencies, or let's say "other than Bitcoin" currencies. Are those mutually exclusive? | or & ? Based on rest of contract structure I assume it's | aka or.

 - Alternate currencies are available for denominating the price, but not as a payment method. Payment is still fixed to Bitcoin.

Images

I assume that hashes and URLs both need to be filled. Of course JSON is very flexible, but I would personally prefer lists in lists or dictionaries in a list over two separate "parallel" lists. I know that zipping two lists works, but it still can be a pain for someone to parse that stuff. So [[url,hash],...] or [{'url':'','hash':'',...}]. Of course this is just my humble opinion based on long experience.
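Here's a small sketch of the difference. The key names `image_urls` and `image_hashes` and the values are made up for illustration, they are not the real contract keys.

```python
import json

# Hypothetical contract fragment using two parallel lists (names are made up).
parallel = {
    "image_urls": ["http://example.com/a.png", "http://example.com/b.png"],
    "image_hashes": ["hash-a", "hash-b"],
}


def pair_images(fragment):
    """Turn the two parallel lists into one list of {'url', 'hash'} dicts."""
    urls = fragment["image_urls"]
    hashes = fragment["image_hashes"]
    if len(urls) != len(hashes):
        # Parallel lists can silently drift apart; paired entries can't.
        raise ValueError("URL and hash lists have different lengths")
    return [{"url": u, "hash": h} for u, h in zip(urls, hashes)]


images = pair_images(parallel)
print(json.dumps({"images": images}, indent=2))
```

With the paired form the length check isn't needed at all, because a URL without its hash simply can't be expressed.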

No more images encoded in the contract or delivered via the OB network? Maybe there should be some URL format for images delivered 'via' the OB network (no HTTP), which could be used here?

 - If I understand the situation right, images will be delivered via OB in future.

Keywords

Is there a maximum number of keywords? Otherwise I could form a set("wikipedia_text_dump".split()) to get a few hits for my contract as well as flooding DHT with absolutely tons of keywords.

 - If I understand the current situation right, there's now a limit of 5 keywords / item.
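A keyword limit like that is trivial to enforce on the receiving side. A minimal sketch, assuming the limit is 5 as mentioned above; the function name and the normalization rules (strip, lower case, de-duplicate) are my own assumptions, not how the OB client actually does it.

```python
MAX_KEYWORDS = 5  # limit mentioned above; real clients may enforce it differently


def normalize_keywords(raw_keywords, limit=MAX_KEYWORDS):
    """Strip, lower-case and de-duplicate keywords, then enforce the limit."""
    seen = []
    for keyword in raw_keywords:
        keyword = keyword.strip().lower()
        if keyword and keyword not in seen:
            seen.append(keyword)
    if len(seen) > limit:
        raise ValueError(f"{len(seen)} keywords given, at most {limit} allowed")
    return seen


print(normalize_keywords(["Car", "car ", "cabriolet"]))
```

Feeding it something like set("wikipedia_text_dump".split()) with a real dump would just raise ValueError instead of flooding the DHT.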

Order Processing time

Whenever there are values or fields, I would prefer to see a unit, at least in the documentation. Or is it truly an open text field? When standardizing something I would prefer uniform presentation with units. If it's 'open text' it can't be parsed reliably by software. I suggest either using a unit like days, or then open text, so the user can write that stuff will be shipped every Monday or something like that.

 - Now units are always in days.

Passcard

We've talked about so many different technologies. Does the Passcard need to be there? In general, I like this kind of thing kept open. Like a dictionary where there is a key and a value, and the key can be Passcard. Or just two keys, like method and value etc.

 - Now Passcard is called Blockchain ID and can be used to map unique username and subdomain to OpenBazaar GUID.

Bitcoin

Bitcoin is there in several places. If the value is denominated in fiat, does it mean that the payment can still be made only using Bitcoin? If not, there should be a place for an 'alternate payment' channel, which should also be encrypted similarly to the shipping address. (Whatever the method will be.)

 - Like I said, it's only Bitcoin so far if I got it right.

Shipping address nonce hash

What's the use of sha256(nonce)? Just to verify that the nonce has been correctly delivered if it needs to be delivered to the moderator? I don't see any other use for it. Also applies to other places where similar construction is used.

 - Yep, it's for confirmation purposes only. It's easy to confirm against that hash if the nonce given is one used with contract.
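That confirmation step is basically a one-liner. A minimal sketch, assuming the published value is the hex digest of sha256 over the raw nonce bytes; the encoding and the function names are my assumptions, the contract may serialize it differently.

```python
import hashlib
import hmac
import os


def nonce_commitment(nonce: bytes) -> str:
    """Hash that would be published in the contract (hex encoding is an assumption)."""
    return hashlib.sha256(nonce).hexdigest()


def nonce_matches(nonce: bytes, published_hash: str) -> bool:
    """Check that a nonce handed over later is the one committed to in the contract."""
    return hmac.compare_digest(nonce_commitment(nonce), published_hash)


nonce = os.urandom(32)
published = nonce_commitment(nonce)
print(nonce_matches(nonce, published))
print(nonce_matches(os.urandom(32), published))
```

So the moderator (or anyone else who gets the nonce) can check it against the contract without the hash revealing anything useful about the nonce itself.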

Content source URL & Password - No separate nonce?!?

Now we're encrypting TWO separate strings with XOR using the SAME nonce. Fail, that's a guaranteed fail. I could easily assume that the URL begins with http:// or https:// (or upper case) and that gives me the first 7-8 characters of the nonce straight away, allowing me to extract the beginning of the URL related password. Knowing how bad passwords people often use, that's a clear and straight FAIL. Of course one nonce field could be used in theory, if it's as long as those two together and split somehow using some length information or so. Well, this is the place where the theory of OTP being either perfect or a total failure becomes immediately true. Classic key reuse mega fail.

 - Well, you've guessed it right. It got fixed asap.
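For anyone who hasn't seen key reuse break XOR in practice, here's a tiny demo of the attack described above. The URL and the password are made-up values and the helper name is mine. The only thing the attacker uses is the two ciphertexts plus the guess that the URL starts with https://.

```python
import os


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with key byte by byte (output is as long as the shorter input)."""
    return bytes(d ^ k for d, k in zip(data, key))


# The flawed scheme: one nonce used as XOR key for two different fields.
nonce = os.urandom(32)
url = b"https://example.com/file.zip"  # made-up example value
password = b"hunter2-secret"           # made-up example value
enc_url = xor_bytes(url, nonce)
enc_password = xor_bytes(password, nonce)

# The attacker sees only enc_url and enc_password and guesses the URL prefix.
crib = b"https://"
recovered_key_prefix = xor_bytes(enc_url[:len(crib)], crib)
leaked = xor_bytes(enc_password[:len(crib)], recovered_key_prefix)
print(leaked)  # first 8 bytes of the password, recovered without knowing the nonce
```

Every additional byte of the URL that can be guessed (domain, path, file extension) leaks one more byte of the password.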

Rating

I see strings there. I assume those could and should be mostly int values (?), except the review. I know open discussion is going on, but if there is anything other than pre-agreed values it totally foils automated processing.

I think the quality was 'Gggrr8e8ttt!' and delivery time was "I'mmmm just soo happyyy!! - Love u!". Yep, that's it, no other questions or comments so far. I'm sorry, everyone, if I'm being so darn critical and maybe a bit sarcastic, but that's just how I am. Things are either done right, or not.

 - Now ratings are integers and reviews will be character limited. So things are getting so g0000d! ;)

Schema rigidity / structure / flexibility / documentation

The only thing which is really important after all is the clarity of documentation, whatever style, format or structure is being used for contracts. I often tend to say that I do not really care about the format. I just need to know exactly what it is and I can deal with it. Yet, it's much easier to deal with a somewhat sane format than with some more or less insane formats. If some field is directly related to another field and provides only redundant data, it should not exist at all. I know that there are cases which counter this style and there are also several other points on how to do it and why. But the reason why I prefer it is flexibility and simplicity when using modern tools.

If a contract got the key auction, does it require another field which says that the contract sub type is auction? If it's not an auction, then the absence of the auction key should already indicate it. But this is just my personal opinion. The target is to make things as flexible and dynamic as possible. Rigid schema guys (usually enterprise XML / XSD, Java, C++ etc.) will hate me for such thinking, like the standards boards I've dealt with. Redundant information is ok if it's in completely different sections of the contract which are being handled by completely different subsystems. Yet even then it's a good question whether the redundancy should be in the document or whether it should be 'injected' at the point when data is split for the subsystems. Where should the processing logic be? These are really great questions when things start becoming more complex. Yet if there is a library and API, it's often good to implement as much logic as possible in it and try to keep the rest of the message handling as just passing of bytes with minimal routing information. (Example: EDI envelope.) I wrote this stuff at about 4:50 am, so it might be a bit repetitive and isn't properly formatted. But I hope it brings the point out somehow sensibly.

Field usage and relations in details in documentation, integrations

This is the most common issue with data structure documentation. I can see the structure but I don't know how fields are related to each other. The content of some fields can also set requirements for other fields having some value, or make those unnecessary. This should be very clearly marked, for example: "These fields are only used when sub type is auction."

I've already had this exact problem when parsing the current JSON messages sent via WS. Yet this is very common problem in industry and documentation. Format is described in detail but values, meanings and relations between fields and different tables aren't.

It's so common that it's more like business as usual. Especially finding out the details of the relations and meanings without a working test environment and the customer's dataset can require never ending test and failure iterations. If iterations are slow, things can easily end up taking months or even years. I guess we've all been there with larger integration projects.

More semi-random comments & thoughts


Category: Google's Product and Services taxonomy. That's great choice for preexisting extensive set!

Extensions: Using an alternate obcv version wasn't my original goal. The contract is crafted according to 2.0 etc, but it could contain additional tags, which will be ignored by clients which do not understand those. But if there's a 'rigid strong schema', I understand that this might be an unwanted feature. How about then adding an optional key like 'ext' which would allow 'any additional data' to be carried with the contract? Or is the plan that every system requiring changes to the standard schema will use its own completely separate schema?

Currency / Bitcoin: Why two fields, if there's only one active price? If price is in BTC why not just use corresponding currency code?

Images: Very happy, that's the way to go. Fully utilize what is already done and doesn't require even additional external resources and dependencies.

Keywords: My personal opinion is that 5-10 should be enough(?). It's enough to clearly tell what it is about, but doesn't allow extensive keyword spamming.

Shipping address nonce hash: Very valid point, yes it's important to verify that nonce is the right one.

After the revisions already made only extensions and currency questions are open.

Distributed network thinking: full mobile peers, mobile peers using a relay to access the network, or a mobile thin client where basically everything is hosted on a server. All of these models have their own pros and cons which have to be carefully thought through and considered.

Some random OpenBazaar DHT related ideas and views


Sorry, this text could seem to be out of context, because I can only quote my own text from private discussions.

My point was that store / market doesn't need to be same thing as node. One node should be able to run multiple stores/markets, afaik. Hence multi-tenancy.

The reason for a DHT being 'unreliable' is usually that DHT data is rarely stored persistently, and network node churn will get rid of data even if its TTL doesn't expire. If DHT data is persisted for an extended TTL, then there needs to be republishing to replace the replicas lost due to node churn. Even that doesn't make it reliable, but makes it much less likely that there aren't any copies around.

This method allows the network to survive 'reasonable' churn and retain 'acceptable' reliability. Yeah, I know, I used weasel words on purpose. I don't have any facts. It's just like any other storage system which should retain N+1 replicas: if replicas are lost, then you'll need to re-replicate the data to maintain the minimum number of copies. If that fails, well, then it fails and data is lost, at least until some peers might come back with the data.

Technically all the DHT does in such a case is 'distribute data blocks randomly across nodes, because the data address is based on a hash', with replicas usually on nodes with neighbouring IDs, which usually is also random in real world terms.
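To make the hash based placement and the re-replication idea a bit more concrete, here's a toy sketch. It assumes Kademlia-style XOR distance and SHA-1 based IDs, which is purely my assumption for illustration. The node names, the function names and the replica count of 3 are made up too.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a numeric node ID from a name (hypothetical ID scheme)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def closest_nodes(key: bytes, nodes, copies=3):
    """Pick the nodes whose IDs are nearest (XOR distance) to the hash of the key."""
    target = int.from_bytes(hashlib.sha1(key).digest(), "big")
    return sorted(nodes, key=lambda name: node_id(name) ^ target)[:copies]


nodes = [f"node-{i}" for i in range(10)]
holders = closest_nodes(b"some contract", nodes)

# Churn: the closest holder disappears. Republishing against the nodes that are
# still alive selects a replacement so the number of copies stays at three.
alive = [name for name in nodes if name != holders[0]]
new_holders = closest_nodes(b"some contract", alive)
print(holders, new_holders)
```

The two surviving holders stay in the set and only one new node needs to receive a copy, which is why republishing is fairly cheap as long as churn stays 'reasonable'.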

Some networks do use supernodes, but I personally don't like that idea too much. Yet especially when bootstrapping the network, it's good that there's some kind of Node Reliability Indicator value which lets the clients know if the node should be expected to be on-line or not after some downtime.

Afaik, this is best approach if we exclude the other options:
A) Official paid central storage.
B) Nodes and moderators hosting their own content and being unavailable when nodes are unavailable.

XXX - I liked your model. I guess you've been studying email (SMTP / POP / IMAP / WebMail) or NNTP. Smile. I also loved the model how XXX wrote it. That's exactly the model why I love Freenet: it replicates and generates copies of data much faster than BitTorrent or similar P2P networks, because any node handling the data will cache it. So sources aren't limited to seeds and peers of that particular swarm alone. Yet with network churn, and depending on availability and TTL, this could cause a situation where it's hard to find a working source.

The Auction Haggling is also related to this. Afaik, I haven't seen it documented how you can delegate handling of an auction to another node. This is similar: the contract is created by some other entity than the one hosting it. Basically it's the same question. Except with auction haggling it's a bit more complex, because when bids are received the hosting node would need to update the contract, and that can't happen if it's signed by the original creator's private key. I've asked this earlier, but so far I haven't seen any solution. All of these scenarios are indirectly linked to multi-tenancy. Of course there can be different kinds of multi-tenancy, one where the contracts are just hosted by the node. And in some other case there could be 'full stores' hosted by the hosting company, just like in the case of on-line Bitcoin wallets, where the hosting authority also got full access to private keys.

XXX - That option to include list of listings is nice. Yet, I would prefer node information instead of IP address. IP addresses are and will be very volatile in future. So IP address can change very often, yet the node using different IP will remain the same.

Separating server / core-node code from client code would potentially allow making a JavaScript only OB client, which would then use a standardized 'proxy' to talk with the OB network. This kind of solution has been seen with all kinds of tech, like the good old NNTP, IRC, Email and so on. Very old model. It's also very cool for mobile clients, which generally don't like full P2P networks. Who runs a full Bitcoin client on mobile?

I've also talked with people about using WebRTC which would allow direct JS / HTML5 browser 2 browser communication. Using it for globally distributed CDN networks, gaming, VOIP and so on.

XXX - Don't talk about IPs, it's a very bad way in general to refer to a P2P node's identity on the Internet. Some other identifier like node id should be used, which then can be mapped using DHT or other methods to an IP address and port. But the IP address isn't a node's identity.

XXX - I really love your thoughts and that abstract thinking.

XXX - I was referring to loading of information when browsing store / items. It doesn't matter if it's standalone app or a website. It still can be slow to browse. Most annoying example of this is image viewer apps which do not pre-decode the next image. With high resolution art decoding of next image could take 250+ ms which is enraging if you just want to pretty smoothly scan through the images.

XXX - That DHT data storage with automated replication would solve DDoS problem, because you would need to DDoS the whole network to bring it down, at least in theory. If some nodes go off-line or are inaccessible, data should be just replicated to more working nodes if the re-publishing / automatic 3rd party replication is implemented.

XXX - DDoS / Flooding / SPAM on DHT is quite hard to manage. Those good old times, when playing with the ED2K network. Flooding with fake peers and making all P2P network nodes connect to an IRC server to bring it down etc. So much fun. In the early days many protection systems were next to none. Very simple tricks to trigger DDoS worked very nicely.

For the users which are looking for #3, I can assume that someone will be providing that service anyway. If it won't be directly available, someone will implement it anyway sooner or later.
#1 Won't work well, because so many users already have limited Internet connectivity on mobile -> leads to #2 indirectly anyway. Being a full peer also demands much more bandwidth, storage, CPU resources and so on. This is one of the reasons why distributed isn't cool on mobile in practice, even if it might at first sound cool to nerds. Skype was like #1 for the most part, but now they've reverted all the way back to #3 mode. From the buyer's point of view #2 is the only option, because in that case the gateway being used can change at any time. So the gateway must just work as a 'passive relay'. IMHO, AFAIK.

Network churn isn't (directly) the main reason for battery drain. It's the generic network coordination traffic which causes the battery drain, because full nodes are involved in many kinds of coordination traffic as well as working as DHT nodes and so on. Network churn is just a small part of that, yet it's the reason why the coordination traffic has to be quite 'active' and peers have to be checked for being alive more often. Of course it can be optimized, so there's no need to ping if there's other traffic. Peer pings need to be generated as keep-alive only if there hasn't been any other traffic.
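That keep-alive optimization can be sketched in a few lines. The class name, the 30 second interval and the injectable clock are my own assumptions for illustration, not anything from the OB code base.

```python
import time


class PeerLink:
    """Tracks traffic with one peer and decides when a keep-alive ping is due."""

    def __init__(self, keepalive_interval=30.0, clock=time.monotonic):
        self.keepalive_interval = keepalive_interval  # hypothetical default
        self.clock = clock
        self.last_traffic = clock()

    def note_traffic(self):
        """Call for any message sent to or received from the peer."""
        self.last_traffic = self.clock()

    def needs_ping(self):
        """True only if the link has been idle for a full keep-alive interval."""
        return self.clock() - self.last_traffic >= self.keepalive_interval
```

A node would call note_traffic() for every message exchanged with the peer and send a ping only when needs_ping() returns True, so a busy link never generates extra keep-alive packets.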

Karma points? Without any references? Can I launch instant karma site? It would auto generate a new store every 15 seconds, give all my subscriber clients a karma point, rinse and repeat. 10€ / month to receive ~173k karma / month? I've got a long history and plenty of experience in different spamming and DoSing networks and playing with systems utilizing different attack / spoof techniques and this one seems really easy to make. If that works out well I can ramp it up to 10 - 100x rate using parallel instances. Fully automated. Very classic problem with many trust systems. Simply a massive automated Sybil attack.
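The ~173k figure is just simple arithmetic, assuming a 30 day month and one karma point per subscriber for every generated store. The function name and the parameters are mine:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def karma_per_month(seconds_per_store=15, days=30, parallel_instances=1):
    """Karma points one subscriber collects if every new store gives one point."""
    stores = SECONDS_PER_DAY * days // seconds_per_store
    return stores * parallel_instances


print(karma_per_month())                        # 172800
print(karma_per_month(parallel_instances=100))  # 17280000
```

So a single instance gives 172,800 points a month, and the 100x ramp-up would land at about 17 million.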

Some deep discussion about different distributed user trust systems considerations, but because I referenced so much to private documentation / talks, it doesn't make any sense. Removed.

I also studied OpenBazaar Product Specification in detail. Unfortunately no public comments, because the document isn't public yet.

Btw... OB team, I would also love to see IPv6 networking.

Python IPv4/IPv6 TCP timeout, CloudFlare, Zerto, BI / SQL / ETL, VDSL2, Linux Sec, Pen

posted Sep 1, 2015, 7:29 AM by Sami Lehtinen   [ updated Sep 1, 2015, 7:30 AM ]

  • Slow TCP connection, time exceeds timeout but works(!) - Today I encountered a really interesting issue with Python networking. I tried to connect to a server which has both A and AAAA records. Yet the software running on the server is so silly that it provides the service on a different TCP port depending on whether it's being connected to using IPv4 or IPv6. What's even stranger? I noticed in many logs that the connection time to the server was like 5 seconds 16 milliseconds. But wait, didn't I configure a 5 second timeout? How is it possible that it's 5 seconds and 16 milliseconds? In many cases the normal time for non-IPv6 servers was around 16 milliseconds. So I noticed the pattern immediately: 5 seconds + 16 milliseconds. But does that make any sense? I was trying to connect to port X, yet over IPv6 (which is preferred) the server uses only port Y. It seems that after the connection to X failed using IPv6, Python tried to connect to the server using IPv4 and port X. Ok? That's nice, now it's working? Potential trap: if you set the timeout to 10 seconds, it's possible that the connection attempt times out only after 20 seconds. Why? Well, the first 10 seconds were used trying IPv6 and the second 10 seconds trying to connect to the server using IPv4. It's nice to have IPv6 -> IPv4 fallback, but it can surprise you at times. I guess this is documented, but I just haven't happened to read such documentation.
  • CloudFlare just keeps adding PoPs at an incredible pace.
  • Checked out Zerto - Virtual Replication Business continuity and Disaster recovery (BCDR) solutions, which is hypervisor level replication. Currently I didn't find a need for it due to how many systems are designed to work. But it's good to be aware of available solutions if and when those are required.
  • BI consultants making extremely bad / heavy SQL queries bringing the system down. Nothing new, they don't bother to think what's the smartest way to get the data. They just try to pull everything out potentially causing huge lock contention.
  • Again encountered some admins that seem to be unaware that Windows contains a proxy server by default. netsh interface portproxy.
  • VDSL2, FEC, CRC, HEC, Interleave, Latency - Reminded myself about error correction (Reed-Solomon), DMT modulation and interleaving & latency things.
  • Linux workstation security checklist - A good read.
  • Some history, how ball point pen killed cursive? - Also checked out gel pen and rollerball pen.
There's a lot more. I'm just posting smaller chunks now. And not more than one / day.
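About the timeout trap in the first bullet above: Python's socket.create_connection() tries the addresses returned by getaddrinfo() one after another and applies the timeout to each attempt separately, which would explain the 5 s + 16 ms pattern. If you want one overall deadline instead, something like this works. It's a minimal sketch: the function name is mine and the DNS lookup itself isn't covered by the deadline.

```python
import socket
import time


def connect_with_deadline(host, port, total_timeout):
    """Try every address of host, but spend at most total_timeout seconds overall."""
    deadline = time.monotonic() + total_timeout
    last_error = None
    # Note: name resolution happens here and is not limited by the deadline.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as exc:  # e.g. address family not supported on this host
            last_error = exc
            continue
        try:
            sock.settimeout(remaining)  # each attempt only gets what's left
            sock.connect(sockaddr)
            return sock  # still carries the 'remaining' timeout, adjust if needed
        except OSError as exc:
            last_error = exc
            sock.close()
    raise last_error or socket.timeout("no time left to try another address")


# Usage (needs network access):
# sock = connect_with_deadline("example.com", 80, 5.0)
```

With this, a 5 second budget stays a 5 second budget, even if the IPv6 attempt eats most of it before the IPv4 fallback gets its turn.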

Fallacy, Bitcoin, DDoS, Tools, CDN, OVH, OB, DB, RND, etc

posted Aug 31, 2015, 10:52 AM by Sami Lehtinen   [ updated Aug 31, 2015, 10:52 AM ]

Once again, weekly random stuff...

  • Informal fallacy - Yep, that happens. Even better, it's good to read this list every year or so. List of fallacies.
  • Great post why Bitcoin is forking, and good discussion about digital currencies and scalability as well as dispute resolution process.
  • Internet is Hostile / No Internet Security - Good writing about Internet & Networking security. Yet, it didn't contain anything new at all.
  • Intermediate Python - Excellent Python e-book.
  • Read Portmapper a new DDoS reflection Attack by Level3 - A new vector for reflection and amplification DDoS attacks across the Internet.
  • Also checked out CDNetworks.com and their Russian and Chinese acceleration plans. It seems that China and Russia are completely separate entities from the rest of the global CDN market, which is dominated by well known major players.
  • CloudHarmony - Is a great way to compare cloud services.
  • Just listing some other tools I'm often using:
    • Super-Ping is excellent global PING tool.
    • What's My DNS is excellent DNS propagation / state checking tool.
    • TestURI is excellent tool which allows you to test any URL (HTTP/HTTPS) and see related headers.
    • Of course everyone knows dig, curl, mtr etc, but you can't always do it locally, nor do you always have suitable servers to do it from.
  • OpenBazaar is now using three separate git repositories for installer, client and server. I can't wait to check it out.
  • CacheFly fail. Their portal. I tried their new free tier plan just to check out what kind of control panel and settings they got. Yet they managed to fail this simple test. - Simply doesn't work at all. Just shows a blank page. Way to go. This is just like web shops. First they complain that business is bad. Yet they deliver such devastatingly bad UX and service that I wonder how and why anybody ever wants to do any business with them. This is just a generic statement based on so much absolutely broken stuff I've been seeing all the time. - Yep, confirmed, won't work with Firefox on Linux nor Chrome on Windows 10. So much lulz. - Their portal was probably broken, it seems to be working now.
  • Really wondered how long one domain transfer can take. In total it took 6 weeks to transfer .dk domain from one place to another. That's incredibly slow.
  • I assume that redirect.ovh.net is a load balanced cluster of servers behind one IP address. Why? It seems that about 10% of the requests I make to that address redirect to imp.ovh.net instead of the right destination. Maybe there are just some servers in the cluster which do not refresh the redirect data as often as they should. Strange. I'll check this again a bit later and if it's still malfunctioning I'll make a ticket about it. - Yep, it got fixed over time.
    $ curl -I http://domain.pointing.to.redirector.ovh.net
    HTTP/1.1 302 Moved Temporarily
    Set-Cookie: rd=R3047010670; path=/; expires=Fri, 21-Aug-2015 06:07:01 GMT
    Server: nginx
    Date: Tue, 18 Aug 2015 17:52:06 GMT
    Content-Type: text/html
    Content-Length: 154
    Connection: close
    Location: http://imp.ovh.net
    $ curl -I http://domain.pointing.to.redirector.ovh.net
    HTTP/1.1 301 Moved Permanently
    Set-Cookie: rd=R3047011759; path=/; expires=Fri, 21-Aug-2015 05:58:27 GMT
    Server: nginx
    Date: Tue, 18 Aug 2015 17:52:06 GMT
    Content-Type: text/html
    Content-Length: 178
    Connection: close
    Location: http://other-domain
  • Enjoyed company of advanced web developers, who do not understand concepts of keep-alive, session, streams, TCP connection, etc. Web API rate limits (what?) Aww... Hmm... So nice... Well, this isn't first and surely not last time this happens.
  • Dolphin Browser for Android is just so full of bugs. At one point it crashed constantly, now it doesn't crash. But sometimes it's totally unresponsive and you can't open new tabs. And at times it does open links in new tabs, but doesn't load those. When you open the tab, it's empty and the address bar is empty too. But when you click edit on the address bar, it shows the page address it was supposed to load. Then you just click go and it loads the page. Who writes these kinds of bleep applications? Fail, fail and fail. Well, this is better than crashing, but it's still quite lame.
  • Postgres (PostgreSQL) Guide - A really nice compact PostgreSQL tutorial.
  • Once again had long discussions about an automated store with friends. I personally think it would be really great for busy people. You'll get all you need in a minute and that's it. You can pre-order stuff, you'll get resupply automatically and so on. Absolute perfection for busy people. I've got many friends who deeply hate shopping. Standing in never ending queues, wandering around huge halls full of everything you don't need and so on. That automated store can naturally be combined with delivery, if required. But it works just as well as pickup, in city centers near subway stations and so on, and outside cities with car loading docks. Combine this with IoT and you'll get automatic replenishment of the stuff you always want to have on hand. I've been wondering for a long time why nobody offers this kind of service. I do know that in some areas this kind of service is being supplied, but not directly to consumers. They're selling the service to delivery companies and for industrial use. Some companies have been using such technology for over 10 years now. These stores would also possibly require only one employee, taking care of replenishing the delivery system as well as helping with malfunctions if there are such. If the concept is a hit, the replenishments could also be modular, automated and centralized. So you would just 'load in' the system from an automated loading cartridge. I guess we'll be seeing stores like this at some point, because at least for me, this seems all too obvious. This is how I would like to shop. Always get what I want and get it in seconds. Naturally this should be combined with different kinds of recipe services and other stuff, which have become popular in some cases. Want to get Stroganoff today? Ok, you've got all you need, we checked your home supplies and added what's required to your next pickup / delivery. Just like luggage handling or automated stocks work at many places.
  • Excellent post, How Does Relational Databases Work - Recommended reading for every nerd, even if you're already familiar with the stuff. If you liked that one, you'll probably like this one too. 119 pages of light reading about databases. Architecture of a Database System.
  • Finnish-Swedish Ice Class (Shipping / Ships)
  • WiGig 802.11ad
  • Have you turned it off and on again? Excellent list of the power of power cycling fixes.
  • I've been wondering how many companies there actually are producing 'global forecast data' and gathering the data required to produce it. How tightly are those companies linked? Do they trade regional data, and so on? It would be interesting to read an industry insider article about the practices being used. I can naturally guess some aspects, but my guesses are probably pretty off.
  • CloudFlare is also adding new features at an incredible pace. The latest addition is the purge cache by cache tag feature, using the Cache-Tag HTTP header.
  • How to Fix Windows 10 data leaks. - Windows configuration guide & tutorial.

CF, eMail, Databases, Google, Programming, Performance, Fusion, Ciphers, Windows 10, etc

posted Aug 28, 2015, 7:40 AM by Sami Lehtinen   [ updated Aug 28, 2015, 7:40 AM ]

  • CloudFlare keeps pushing perfection. New features like Cache-Tag and new PoPs all the time. I can see why this feature would be very valuable for several sites. It allows high cache hit rates and long data caching without repeated freshness checks. Yet data can be expired on demand quickly when required. Perfection!
  • Checked out hMailServer - It's probably the next mail server I'm going to use. I'm not exactly happy with the 'current solution' and this option is easily the best I could manage to find.
  • Did my computers' yearly vacuuming. Smile. Who's doing it out there? It's so classic that a computer breaks down because it's just too full of dust and the heat sinks are more like wool jackets.
  • Read: IPv6 Industry Survey Report - IPv6 is picking up pace, but it's about time to do so. So no news there. I wonder how long it really takes before people start saying, aww IPv4? Who does that anyway? That time will come, that's for sure. I'm a bit sad that I can't find NT drivers for my latest display adapter. Smile. But it will take a while. Yet I personally do have a working NT workstation, just as a museum piece, so nobody can claim that the time of Windows NT has passed. Smile. There are still users using NT. Lol.
  • Excellent post, How Does Relational Databases Work. Recommended reading for every nerd, even if you're already familiar with the stuff. If you liked that one, you'll probably like this one too. 119 pages of light reading about databases. Architecture of a Database System see architecture.pdf @ berkeley.edu
  • Windows already contains proxy, you don't need to look for one: netsh interface portproxy - Just as a pro-tip I've used it many times. It can also relay using same protocol, to work around firewalls etc.
  • Google Container Engine - I guess this is future. Many products aren't yet designed to utilize microservices, but I can see just so much benefit from this. It's just like well, Unix pipes. Combine things, process data, use clear 'modules' to get stuff done. Yet without proper monitoring system, it can be hard to debug what's failing when performance starts to suddenly and unexpectedly tank.
  • Chebyshev Approximation - Very nice post indeed. Well, in the good old days it was obvious. If you wrote bad code, it was too slow. But nowadays it isn't clear to many developers at all. Here's a good guide how to make stuff faster. I personally made a clock, which tried to use sin and cos to draw second hand on screen every second, and of course it didn't work out, because I used so expensive math functions to calculate coordinates. I just had to use lookup tables.
  • How to fix Windows 10 security and privacy issues. (At least on some level)
  • What has become known as the Dunning-Kruger effect is an example of what psychologists call metacognition – thinking about thinking. - I hope I'm not in that group. Because all the time I feel that I don't know anything at all and that I'm totally incompetent. ICT is such a complex field that nobody's competent or good. They're just bad or really bad. It's like helpdesk: you should know everything about everything, never fail and always deliver, as if you were an expert in everything.
  • Read about WU-14 Chinese HGV, YJ-18 and CJ-10.
  • Finnish e-receipt standardization is progressing slowly. It's also unfortunately(?) going to be a national standard, mostly based on the Finvoice e-invoice standard. At this early stage it also looks like the 'my data' and 'consumer' aspects of e-receipts have been largely forgotten. The current version will mostly serve only large corporations whose employees are paying using corporate credit cards. Well, not exactly what I personally hoped for. But use cases can be extended outside the original range once the standardization is ready and the format is proven. My data thinking would allow many additional services which would utilize and refine that information for consumers.
  • Reminded myself about the Kalman Filter. Yet I don't see any use for those right now. But it's good to know what's available when you might need something. Here's an excellent example with images.
  • Read about MSI and NVMe.
  • Google Datacenter networking - Google's Software Defined Networking Fabric. It's awesome. Allows 'data center wide cluster computing' by providing enough bandwidth between all computers. No need for localized groups.
  • Found out that Python's time.time() isn't accurate enough for me at all (on Windows). Replaced it with time.clock() in some cases where it's used for timing. Btw. time.time() is much more accurate on Linux than it is on Windows.
  • Read again about Fusion Power.
  • Even on the latest version of Thunderbird, the message handling thread hangs when moving / deleting a large number of messages with IMAP, and requires an app restart. Really annoying bug. I guess nobody cares enough to fix it.
  • Reminded myself about usage of the Bacon cipher. Afaik it's technically more like encoding and steganography than a cipher.
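
A small addition to the time.time() vs time.clock() note above: time.clock() was deprecated in Python 3.3 and removed in Python 3.8, and time.perf_counter() is the documented clock for measuring short durations. Here's a minimal sketch, assuming Python 3.3 or newer. The helper names time_call and clock_resolution are hypothetical, just for illustration.

```python
import time


def time_call(func, *args, repeat=5):
    """Return the best duration in seconds over `repeat` runs of func(*args).

    Uses time.perf_counter(), which is meant for short interval timing,
    instead of time.time(), which is a wall clock.
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def clock_resolution(name):
    """Return the resolution in seconds that Python reports for a named clock.

    Valid names include 'time', 'perf_counter' and 'monotonic'.
    """
    return time.get_clock_info(name).resolution
```

Comparing clock_resolution('time') with clock_resolution('perf_counter') on Windows and on Linux is a quick way to check the difference mentioned above on your own system. No numbers are quoted here, because they depend on the OS and the Python version.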

Finnish web hosting market share study

posted Aug 18, 2015, 9:28 AM by Sami Lehtinen   [ updated Aug 18, 2015, 11:23 AM ]

I checked out the Finnish web hosting market share information at Host Advice.

Hosting country market share

Some examples:
Finland 74%  (26% outside Finland)

Other 8%
US 7%
United Kingdom 3%
The Netherlands 3%
Germany 3%
Ireland 3%

Hosting company market share

Some examples:
Nebula 20%+
(strong market leader)
Amazon 4%
OVH 3%
Hetzner 2%
LeaseWeb 2%

For full details you should check out the full information page linked at the top of page.

They also told me that it's not usual to see such strong national players as Nebula in many markets. Except of course OVH and Hetzner in France and Germany respectively.
 
Here's the global web hosting market study article and its infographic about market shares by Host Advice, which I found and started wondering and asking about. - Thanks Lurie and Ariel.

Personal thoughts

The only real surprise for me personally in this infographic was how strong some players are in their home countries. OVH in France and Hetzner in Germany are totally dominating their own markets.

As expected, a large number of Finnish web sites (.fi TLD domain) aren't hosted in Finland. That isn't a surprise to me personally at all. I do know that hosting in Finland is seriously overpriced. Service providers always ask for a 'Finland extra' when hosting in Finland. The same company can even offer hosting elsewhere in Europe for much cheaper, but when the same service is actually hosted in Finland it will just cost more. Service providers like UpCloud.com aren't shy about it.

I just have a dream that when Hetzner Online gets their Finnish data center it would change this setting. Maybe, maybe not. I also don't know if it came as a surprise to Hetzner that Russia changed their legislation about 'personally identifying information' (PII), and now it needs to be hosted in Russia. Maybe this is a factor, maybe it isn't. I don't know. If you're working at Hetzner I would like to know more. We can naturally agree if it's public or not. I do of course honor any requests not to publish something.

I've also noticed that many Finnish sites are hosted at Amazon in Ireland. If the target market is Finland, then hosting in Frankfurt would be much better in latency and availability terms.

kw: web hosting, web, website, hosting, vps, servers, marketshare, market share, hosting company, hosting country, nebula, amazon, ovh, hetzner, leaseweb, infographic, host advice, information, Finland, study, information, markets, global, information, data, chart, percent, percentage, distribution, clients, domains, domain, planeetta, webhotelli, sigmatic, neutech, hosting palvelu, louhi, mediam, capnova.

Jupiter Rising, Python Wheels, OVH, SQLite4, PostgreSQL, Batteries, CloudFlare, NVMe, etc

posted Aug 17, 2015, 8:02 AM by Sami Lehtinen   [ updated Aug 17, 2015, 8:02 AM ]

  • Reminded myself about TCP connection listen backlog on Linux.
  • Studied different car door designs. Don't ask why. IDK. It just happened to be interesting.
  • It seems that the game industry also uses all kinds of heh, not so clean, methods to fix their code. Why bother to fix a bug, if you can just quickly write some new code to work around it? I guess everyone's code is like that, they just don't want to admit it. Ha!
  • Dolphin Browser for Android is just so full of bugs. At one point it crashed constantly, now it doesn't crash. But sometimes it's totally unresponsive and you can't open new tabs. And at times it does open links in new tabs, but doesn't load those. When you open the tab, it's empty and the address bar is empty too. But when you click edit on the address bar, it shows the page address it was supposed to load. Then you just click go and it loads the page. Who writes this kind of bleep applications. Fail, fail and fail. Well, this is better than crashing, but it's still quite lame.
  • Excellent Postgres Guide (PostgreSQL Guide) - This version is also delightfully compact, so you can browse through it pretty quickly.
  • Checked out 24M lithium-ion (Li-Ion) batteries, those should be cheaper to manufacture and contain more power than previous generation lithium-ion batteries. It's interesting to see if that's true, because we've all seen so many new great ideas which will change the world and then... nothing happens.
  • Upgraded multiple systems to use Windows 10, without problems so far. Phew. It's just like major distribution upgrades with Ubuntu. You hope everything will go well, and it probably will. But you never know when you'll end up in really deep hmm, water? The only problem so far has been that with Windows 7 IPv6 worked well, but with Windows 10 IPv6 isn't working by default. That's strange. It worked on some workstations, which probably means that it didn't work on workstations that had 'more tuning' done. If those had been using clean default settings, those probably would have worked. I don't remember anymore what I did to get IPv6 to work, but it was more like duh, of course. Yet it wasn't anything like enabling / disabling / resetting settings, routing table or so. Maybe I wrote it up somewhere, maybe I didn't. Yet it took quite a while to figure it out.
  • Also found really fun stuff: if you open cmd.exe and then close it using the close button, the whole Windows 10 will crash. That's priceless. BTW. This only happens if you have BitDefender installed, but it was still really fun. Also if you write exit and hit enter, the window closes without any problems. Then I hit the usual problem: it seems that Windows 10 has problems with drives larger than 2 TB, wonderful. Let's move data off the drives and reformat those. Yep, it takes more time than you might think. Also SMB sharing over Internet failed due to some messy firewall settings, and it didn't start to work after 'normal' configuration. Then I just went through all the settings, switched things on and off, and after that it started to work. Yet I don't have a clue which was the actual trigger. Business as usual, unfortunately.
  • Windows 10 adds tons of new features and protocols. So it's safe to assume that it's guaranteed that there will be serious security issues in future.
  • Emailrelay - Yet another annoyingly buggy and unreliable piece of software. It labels messages as bad and stops forwarding those due to an 'error', even when there's no error. I just can't stop hating annoyingly unreliable software. When you manually reset those messages, those are sent out just fine, even if there was an earlier fatal error. Duh! Also the software doesn't support IPv6 on Windows.
  • Usually it feels like most of software is buggy and bad. If it doesn't give you directly that impression, just wait for a while and try doing something bit more complex and boom, there it is. It doesn't work and requires fixing.
  • I'm not sure if it's my WinFileLock which is buggy or the software using it, but I encountered one annoying bug today. I'll need to check out what the problem is. Or maybe I'm just misusing it and assuming that locking was successful even if the locking failed. The bug is either in the routine calling the locking or in the locking itself after it times out with failure. Shouldn't be too hard to find out.
  • Checked out Borei-class submarines (Project 955). Which are capable of burping out aka launching RSM-56 Bulava missiles.
  • Read several articles and studies about LTE-U, whether it should use 3.5GHz instead of 5GHz, and how it might affect WiFi and other ISM band users. Ericsson calls this technology License Assisted Access.
  • Updated systems to use LibreOffice 5, it's just awesome.
  • Reminded myself about BWR, ESBWR, PWR and EPR nuclear reactor types and differences.
  • I've been reading more and more stuff about Ethereum. It's such complex stuff that it's a good study. Makes you really think. Tons of stuff and more stuff, and then you need to link it all together in your mind after getting the subcomponents ingested first. That's a good learning job to keep your mind fresh. After every section read, just stop and think hard, what does this really mean for the whole Ethereum system and ecosystem. So far based on what I've read, it allows really complex contracts, but scalability might be the issue. I haven't so far found any good description of how they are planning to make it actually scalable. With the current design, it just gets worse when there are more users, contracts, network latency and peers (nodes), miners, etc. Also contracts which run out of gas can cause interesting situations, yet that's predictable for people who can predict it. But I guess some users will be surprised that their contract got reverted and they still paid the fees. Just because they didn't provide enough gas to fulfill the contract processing needs.
  • Are daily backups required? Sure, Microsoft Excel just corrupted one important table, and I had to restore it from backup. Yawn.
  • It's funny to notice how bad flaws some engineers leave in software. One project had multithreaded processing for clients, but the login process was single threaded and blocking. So if a client opened a TCP connection and then just waited, it prevented all other users from logging into the service. Only when the login was completed or rejected was a new thread started for that client. These kinds of flaws easily go unnoticed in a small local network where all clients are official and well behaving. But when you enter the wonderful world of Internet, it's guaranteed that you'll be getting a lot of trouble.
  • Read: Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network - Really much stuff there. I'm quite happy, not a single part of it went over my head. Most people just don't need to deal with networks like that anyway.
  • Now all new servers are equipped with NVMe (SSD) drives. Nice! Works beautifully in production and, depending on the use case, makes I/O tasks roughly 10x - 100x faster.
  • Read: Cold fusion - Someday it's coming, or not? Who knows. Hopefully ITER will work out.
  • It seems that OVH data centers in Roubaix start to get full? Now they're offering new servers at Gravelines and Strasbourg. For international traffic Gravelines is optimal. But for traffic from Finland it's bit hard to decide. Because some operators route traffic to Amsterdam and some others to Frankfurt. So either of those two locations can be 10+ms faster than other, depending what operator is being used in Finland. - Just as funny addition it seemed that last Sunday almost all services on OVH network went down for more than one hour. So far I haven't been getting any information what went wrong. And I couldn't find any report about it from their status service.
  • Wow! CloudFlare is adding new datacenters and cache POPs at a really incredible pace. Now they're extending to the MENA area.
  • DeepSound Audio Steganography software.
  • Reminded myself about crowdsourcing and crowd work and about On-Demand-Economy.
  • Reminded myself about SQLite4 LSM features and usage.
  • It's nice to promise ultimate new technology for VOIP telephony. Zyptonite promises: "Zyptonite keeps your calls connected even in the elevator and when you have no reception." Just for some strange reason that really doesn't convince me.
  • Python Wheels - Wheels are the new standard of python distribution and are intended to replace eggs. - That's really nice. Somehow I really haven't liked eggs so far. Yet PIP is awesome when you want to install some libraries which require compiling and aren't pure python libs.
  • Had plenty of problems with Remote Desktop Protocol (Terminal Services, Remote Desktop Connection). It seems that it's just very vulnerable to all kinds of Denial of Service attacks (DoS). Basically anyone can bring the services down easily on purpose or even purely accidentally. The RDP host seems to be really bad at handling these issues. It's a great question if many of the attacks against SSH, HTTP or TCP can be counted as attacks. Those are just utilizing existing features and causing Denial of Service via the known standard methods. Is that a problem or not? Depends how it's being viewed. It's clear that Microsoft thinks it's not a problem if a system is trivially DoSable.
  • In SSD security discussions we had a long discussion about whether overwriting an SSD is actually required. If the drive supports RZAT there's no easy way to get around that. Unless the drive contains something really valuable, it's highly likely that nobody bothers to work around that. At least on consumer level.
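
About the blocking login flaw mentioned in the list above: the usual way around it is to start the per-client thread right after accept() and do the login inside that thread with a timeout, so one silent client can't hold up the others. This is only a minimal sketch under my own assumptions and not the code of the project in question. The port, the one-line 'secret' login and the names handle_client, serve and start_server are all hypothetical.

```python
import socket
import threading

LOGIN_TIMEOUT = 5.0  # seconds; hypothetical value


def handle_client(conn):
    """Run the login (and later the session) in the client's own thread."""
    with conn:
        conn.settimeout(LOGIN_TIMEOUT)
        try:
            line = conn.recv(64)
        except OSError:
            # Covers timeouts and resets: a silent client only loses its own thread.
            return
        if line.strip() == b"secret":  # placeholder for a real login check
            conn.sendall(b"OK\n")
        else:
            conn.sendall(b"DENIED\n")


def serve(listener):
    """Accept loop: it never reads from a client, so it can't be stalled by one."""
    while True:
        try:
            conn, _addr = listener.accept()
        except OSError:
            return  # listener was closed
        threading.Thread(target=handle_client, args=(conn,), daemon=True).start()


def start_server():
    """Listen on an ephemeral localhost port and return the listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(16)
    threading.Thread(target=serve, args=(listener,), daemon=True).start()
    return listener
```

A thread per connection still has its own limits on the open Internet, because somebody can open thousands of idle connections. The login timeout keeps that bounded in time, but connection limits would be needed too.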

That's not all.

Do things that won't scale

posted Aug 17, 2015, 7:33 AM by Sami Lehtinen   [ updated Aug 17, 2015, 8:03 AM ]

Just some highlights from Paul Graham's post with the same title.

-- Quote --

The mistake they make is to underestimate the power of compound growth.

Marketplaces are so hard to get rolling that you should expect to take heroic measures at first.

Almost all startups are fragile initially.

The big danger is that you'll dismiss your startup yourself.

Get some initial set of users by doing a comparatively untargeted launch, and then to observe which kind seem most enthusiastic, and seek out more like them.

It's not the product that should be insanely great, but the experience of being your user.

By building something you yourself need, the first thing you build is never quite right.

It's often better not to aim for perfection initially.

Sometimes the right unscalable trick is to focus on a deliberately narrow market.

Any startup that could be described as a marketplace usually has to start in a subset of the market, but this can work for other startups as well. It's always worth asking if there's a subset of the market in which you can get a critical mass of users quickly.

Among companies, the best early adopters are usually other startups.

B2B startups now have an instant market of hundreds of other startups ready at hand.

Sometimes we advise founders of B2B startups to take over-engagement to an extreme, and to pick a single user and act as if they were consultants building something just for that one user.

When you only have a small number of users, you can sometimes get away with doing by hand things that you plan to automate later. This lets you launch faster, and when you do finally automate yourself out of the loop, you'll know exactly what to build because you'll have muscle memory from doing it yourself.

Some startups could be entirely manual at first. If you can find someone with a problem that needs solving and you can solve it manually, go ahead and do that for as long as you can, and then gradually automate the bottlenecks.

It's not enough just to do something extraordinary initially. You have to make an extraordinary effort initially.


Aurora, Ethereum, Tor, 3DXM, Security, TKIP/RC4, Hosting, OpSec, QUIC, FireChat

posted Aug 1, 2015, 9:45 PM by Sami Lehtinen   [ updated Aug 1, 2015, 9:45 PM ]

  • TeliaSonera (Sonera, Telia) confirmed that they'll be building Finland's largest data center at Pitäjänmäki. Which is open to customers for co-location, etc. It'll be located about 200 meters from where I'm currently working.
  • Checked out Amazon Aurora - Amazon's blog post about Amazon Aurora - Yet I don't have currently any use for it.
  • Checked out Ethereum and read Ethereum White Paper and Ethereum Developer Tutorial - I've been having bit similar thoughts about OpenBazaar and Smart Contracts.
  • Checked out Augur - A decentralized future of prediction markets? At one point I was very interested about prediction markets.
  • Checked out EtherX - A fully decentralized cryptocurrency exchange, of course based on Ethereum.
  • Checked out OnionCat - It's an IPv6 VPN over the Tor or I2P network, allowing location privacy and strong security and defeating IP spoofing. It's an excellent tool for maintaining anonymous servers on the net, which are hard to track back to the real administrators.
  • Reminded myself about some Tor stuff: node types, directory authority, guard, middle, exit, relays, fast, stable, hsdir, v2dir, valid, flags, consensus algorithm, authority operators, padding, 3des, cipher suites, AES, padding, cells, certs, authenticate, authorize, TAP, ntor, curve25519, ECDH, pluggable transports, signatures, usage statistics, qunite items, GeoIP digest, bandwidth and stream counts, keepalives, path selection, rendezvous point relay, attacker, probability, random, math, network and user traffic profiling, fingerprint attacks, traffic correlation and confirmation attacks, countermeasures, bandwidth scanner, load balancing, proportional-integral-derivative controller, bridges, censorship, recurring, obfs3, obfs4, scramblesuit, fte, meek, bananaphone, stegotorus, skypemorph, dust, dust2, dlopd, sshproxy, git, generates random bytes and traffic patterns, randomizes packet sizes & timings, Format Transforming Encryption (FTE), Deep Packet Inspection (DPI) evasion, Markov Chains, maps data to text, StegoTorus, splits data over multiple paths and makes those look like HTML/JS/PDF etc, collateral freedom (meek), Flash Proxies, facilitator, intermediator, middle man, hidden services, introduction points, random value, nonce, cookie, CTR, Public Key, Encrypted, ephemeral single-use public key, traffic correlation, recognize traffic signature, HTTPS Everywhere, NoScript, Reproducible Builds, protocol improvements, directory mirrors, Post-Quantum Key Exchange, revocation keys. Hidden Services 2.0 will implement new much longer .onion addresses, that's wonderful, ring location randomization, directory authority voting, correlation attacks, entrance traffic and exit traffic, dragnet data collection. It was a good read.
  • What was new to me in that latest Tor spec? The new longer addresses, Post-Quantum Key Exchange and BananaPhone were new stuff to me. Otherwise everything was pretty much old stuff and known or 'obvious' development, like kicking DHE and replacing it with ECC. It seems that I'll have to read this separately. Actually BananaPhone was something I've been thinking about too. Hiding encrypted data in English text, so it's text steganography.
  • I used hidden service to access some servers (administrative) at one point. But after all I felt it's not a good idea and dropped that project.
  • Just as a general comment. Much of this tech stuff is getting really deep. Unless you study it continuously and update your information monthly, it can take months or years to catch up!
  • 3D Xpoint memory - Yet another storage layer to be added to multi-tiered storage system. So cpu registers, cpu cache (multiple layers, at least 2x), ram, ramdisk, xpoint memory, ssd, hdd. That's quite a chain of different technology layers for data to flow through.
  • Feeling so tired about how bad Microsoft Server operating systems are. Those got constant DoS (Denial of Service) issues with Remote Desktop Service / Protocol (RDS/RDP) and they're doing nothing to fix it. Issue has persisted for several years with multiple Windows Server versions. Extremely annoying, causing unplanned random system boots because Windows is just so much fail. - This is my personal honest opinion.
  • "No one can hack my mind" Comparing expert and non-expert security practices [PDF] - This is just so awesome. It clearly shows how differently experts and normal users think about security.
  • "All Your Biases Belong To Us" Breaking RC4 in WPA-TKIP and TLS [PDF] - Excellent paper about WiFi (RC4/TKIP) hacking and fails of RC4 + IVs.
  • A few sites I want to share with you: https://www.privacytools.io/ and http://www.infosecindustry.com/ those are excellent information and news sources.
  • I did read more stuff about Docker and played a bit with it. Yet as said, I don't (yet) see any use for it. But it's just like ready-made VirtualBox images, there might be a use for it. It's probably not needed in daily use. We also discussed the issue of development, staging and production environment differences and how Docker could help in that field.
  • Actually some of the interesting projects overlap nicely. Outernet (Satellites) will be complemented and overlapped by Project Loon (Balloons), which will be overlapped by the Titan platform (Planes), and when there's network coverage then Internet.org by Facebook can be delivered to users for free. Awesome and nice. I completely agree with this stuff: if it's free, then you can't whine about net neutrality. If you want free access to all content, feel free to pay for it.
  • Studied Google QUIC Experiments [PDF] document - Providing 0 RTT and 1 RTT at times (~25%) connectivity. Also reminded myself about RENO and CUBIC differences. TCP congestion-avoidance algorithm - https://en.wikipedia.org/wiki/TCP_congestion-avoidance_algorithm - Also reminded myself about TCP timestamps and PAWS (TCP Sequence number wrapping).
  • Played a little with crunch, airmon-ng, airodump-ng, aircrack-ng, reaver and other standard WIFI / WLAN hacking & cracking stuff.
  • Studied High-speed Onion Routing at the Network Layer (HORNET) [PDF] - After a quick reading, some of the claims sound a bit far fetched without technical proof. Also it doesn't protect against confirmation and correlation attacks, duh. And 'high speed' is more linked to node speed than to the actual platform. Coded in Python, hmm. Isn't that 'computationally' expensive? It depends, there are so many things you could speculate about based on that paper alone.
  • Have been doing some comparisons between OVH, Hetzner, UpCloud, Sigmatic and Capnova about hosting solutions. I'll write more about this bit later. Google Compute Engine (GCE) also offers three zones in St. Ghislain, Belgium.
  • Had long discussions with friends about how beneficial IPv6 is compared to IPv4. Without NAT there's no more need for constant keep-alive traffic, and things work as they were supposed to work before the Internet got broken. True stateless connectivity is available and so on. That's wonderful!
  • OpSec is really hard for most people. It's practically impossible to get them to follow any reasonable OpSec procedures. As an example: What kind of moron first creates a message draft on Gmail, writes it there, then encrypts it using PGP and sends it? Aww double Aww... Didn't he/she realize that Gmail is going to store the unencrypted draft version too?
  • Checked out Helion Energy - Hmm, lots of promises, light on details, but where's the deliverable? Yep, it would be nice to have a fusion reactor in a mobile phone so it wouldn't run out of power in the next 10 years. Somehow reminds me of SCRAM jet engines. What could be simpler than a SCRAM jet? Yet it seems to be pretty hard.
  • Wondered about the new version of FireChat. My thoughts are: I would prefer combining different networking technologies, because mesh and flood casts have serious inherent problems. I would only relay messages on the mesh network to reach "a better connected node" and try to optimize routing. So use Internet if available, and if not, then try to find a path to the recipient or to the Internet. Both ends could of course have a 'mesh' relay network, but the primary path between relay networks could be Internet. This would help in many cases, if one operator is out or so on. There's still somebody with connectivity which you can use to piggyback. Yet keeping the system efficient without using too much storage and messaging for updating forwarding & routing tables can be quite an interesting challenge. -> Leads to a lot of 'administrative / management / control traffic' -> consumes resources -> Not something you want to run on mobile. - Yet we often do something similar when traveling in a group. We get one local prepaid with a data plan, and then just tether the rest of the users to it. Computing an OSPF tree isn't a light task for a mobile device in a large network. Using Internet gateways would also limit the size of the mesh network that needs to be kept known and routable.
  • SMTP was great when it was open, nowadays it seems that email deliverability is really sucking. There are so many systems which refuse to handle email based on multiple reasons. Basically email isn't a generally working solution any more.
  • Windows 10 is taking the snooping of users to new levels or should we just say to the norm of today. All your data are belong to us.
  • This Akamai GNET CDN interactive map is just beautiful.
  • Studied DO-178B standard. - Gives great example how software can be more reliable. But usually customers don't want good software, they want cheap software and fast.
  • Read A look inside Google's Data Center Networks - They're using Jupiter Network with Jupiter Fabrics. Software Defined Networking (SDN), Andromeda.
  • Checked out meta coin and colored coins. These can open so many interesting possibilities in future. Yet I don't like some examples. Like in the case of Namecoin, they give the example that people could get names like 'George' on a first come, first served basis. Lol. Everyone knows where that leads. As soon as names are globally shared and unique, all the good ones are taken. So instead of 'George' you'll end up with 'georgeb-882' or something similar, which isn't so fun anymore. There have been long discussions about why people utilize such lame and limited name spaces.
  • Studied VVER reactor design and benefits of heavy water reactors.
  • Studied Bitcoin Thin Client Security and Simple Payment Verification (SPV) protocol.
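
Since SPV came up at the end of the list above, here's the core idea as I understand it: a thin client doesn't download whole blocks, it only checks that a transaction hash plus a short list of sibling hashes leads to the Merkle root stored in the block header. This is a simplified sketch and not a real SPV client. It uses Bitcoin-style double SHA-256 and the 'duplicate the last hash on odd levels' rule, but it ignores byte order conventions, the network messages and header validation. The function names are hypothetical and the leaf list is assumed to be non-empty.

```python
import hashlib


def dsha256(data: bytes) -> bytes:
    """Double SHA-256, as used for Bitcoin transaction and block hashes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for the leaf at `index`.

    `proof` is a list of (sibling_hash, sibling_is_right) pairs, bottom up.
    """
    level = list(leaves)
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last hash on odd levels
        sibling = index ^ 1  # neighbour in the same pair
        proof.append((level[sibling], sibling > index))
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf, proof, root):
    """Recompute the path from `leaf` up to the root using only the proof."""
    h = leaf
    for sibling, sibling_is_right in proof:
        h = dsha256(h + sibling) if sibling_is_right else dsha256(sibling + h)
    return h == root
```

The nice part is the size: the proof needs only about log2(n) hashes for n transactions, so the client checks a handful of hashes instead of the full block.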

DNS-SD, Learn, SSBJ, Security, RA/DHCPv6, 5GHz, IPAM, gRPC, IV, RC4NOMORE, Data Retention

posted Jul 28, 2015, 7:20 AM by Sami Lehtinen   [ updated Jul 28, 2015, 7:21 AM ]

  • DNS based Service Discovery - DNS-SD, RFC 6763
  • Finnish recommendation for Internet Service Providers ISP to deliver IPv6 connectivity to end users [PDF, Finnish].
  • How to actually learn data science. That's what I do. I usually like to set up a project which requires a certain skill set. The skills I don't have, I have to study and learn and then execute the stuff. Creating an actually working implementation will give you a much better insight into problems than just reading about those.
  • Hunted for hot spots in VMware ESXi environment. Some tasks which should take only seconds, can now seemingly take 10 minutes. Ehh, that's not really an optimal situation. Need to investigate more. After some hunting found a memory leak in one application, which reserved huge number of small memory segments and then caused those to be swapped out. Actually that creation didn't cause the problem. Problem was only caused when that huge swap was getting released suddenly on several parallel servers causing absolutely unacceptable amount of disk I/O and 'freezing' the host in process.
  • Youtube documentary Zero days - security leaks for sale. Hacker / hacking / Internet / security documentary.
  • Checked out: Textron AirLand Scorpion and Supersonic Business Jets (SSBJ), and many similar concept designs, but it's really hard to know from limited information resources which of those projects are pure fiction and if some are actually making some progress.
  • It also seems to be really complex to tell if a system is using DHCPv6 or RA / SLAAC. Microsoft Windows gives very confusing and misleading information about that. Sometimes addresses are labeled as Public or DHCP, but who says that you can't get a public address via DHCP? Also DHCP based entries do not show lifetime but SLAAC based entries do, and so on. I'm sure that if there are problems, it's going to be horrible to provide customer support, because everything is so messed up and it's hard to get reliable information. In some cases it seems that the only way to get reliable information about what's actually happening is to dump the network traffic and analyze it. Tools, logs and user interfaces are so badly designed and confusing that you can't really trust those. Yep, this isn't the first time nor the last. IPsec is similar. There's no way to trust the user interfaces or logs, everything can be more or less wrong. Well, after playing with this stuff for a long time, you'll find out which are the places you can trust and which provide conflicting or wrong information. But it's always so annoying when things are inconsistent. It's just like bad or misleading documentation, which makes troubleshooting a real nightmare, because you can't trust any information. You simply have to go through all possibilities and try to find some reliable source of information (like packet dumps) when you can't trust anything else.
  • Also had separate problems at one hosting company. Well you'll get what you pay for. More money = More dedicated resources. Yet it seems that things turned good. After I made the complaint and clearly said, I'll move all my systems out if this happens again. It hasn't been happening again. Luck or did they actually change something? Sounds pretty unlikely that they would really care. Or maybe I'm underestimating their interest to customer satisfaction, which also sounds unlikely.
  • Noticed that some teams aren't using automated monitoring for their production systems. That's really bad. If you don't monitor service quality & availability, it's highly likely there will be extended downtime.
  • Once again ended up in a discussion where I had to remind myself about OTP, OFB, LRW, XTS, XEX. For simplicity of implementation the team decided to use standard CTR with AES-128. I really like asking things from a cryptology professor, it helped to deepen my understanding of a few things. I already knew how those should be done, but I really didn't understand why. Now I know that too.
  • OWASP Cryptographic Storage Cheat Sheet
  • Hackers remotely kill a Jeep on the highway. This is the future, everything is connected to the Internet, remote controllable and of course hackable. kw: uconnect, CAN bus, remote, exploit
  • Reminded myself about why and when an Initialization Vector (IV) is needed.
  • Studied Python types library, "Dynamic type creation and names for built-in types". - It allows you to generate new classes dynamically. Yep, full classes, not only instances as usual.
  • gRPC: Google's Remote Procedure Call system, which multiplexes bidirectional RPC over a single HTTP/2 connection. Really a nice way to utilize HTTP/2.
  • Checked Charles Leifer's post about Python UnQLite bindings. Looks really interesting. I have to check out if I could use unqlite instead of SQLite for some of my projects. Answer is most probably yes. Yet I'm familiar with SQLite3 and if there's no reason to switch, there's no reason to switch. Peewee already offers dictionary like interface for SQLite3 which I'm using with some key, value tables.
  • OpenBazaar project started weekly progress updates in their Blog.
  • Checked out a few IPAM products, yet I believe I don't have any need for those in future either. Managing just a few networks is trivial, and even more trivial when IPv6 comes along, because you can easily allocate its own /64 for every subnet required. Currently my ISPs are offering /48 for businesses and /56 for home users. It's interesting that the Wikipedia article says that IPAM is more in demand for IPv6, I personally think that it's less required. Also firewalling becomes much easier when you can refer directly to the required subnet level. Or if you just want to coarsely restrict traffic, you can easily whitelist a whole ISP, instead of going through tens or even hundreds of different IP subnets they're using. This can naturally be combined with DDNS when required.
  • Quickly tried HaCi and Netmagis - Which just confirmed what I thought earlier. Using one smallish spreadsheet for the required data is an OK way to manage all I need to manage.
  • Checked out WLAN (Wifi) 5 GHz channels in Europe. I'll need to setup one network and wanted to be informed about channel usage. Ok, I wanted to see also the international differences, I'm curious so I did read it too.
  • Had a long discussion with a friend about 'academic research' versus 'efficient execution'. There's a huge difference in how things can be done.
  • At one salary comparison site I really wondered about the lack of units and definitions. They just had a question like, what's your salary, with a drop-down box containing several ranges from 500 to 200k+. But salary, in which currency? For which period? Weekly, daily, hourly, monthly, yearly? I really tried to look for the definition on the site and I couldn't find one. Also in heavily taxed countries there are big differences depending on whether you get paid vacations or not and whether the salary is before or after taxes. Does it include potential bonuses, extras or overtime or not, and so on.
  • HTML5 can be used to hide malware. Surprise? No.
  • Still had strange problems with IPv6 and one Linux server. It's probably related to IPv6 and UFW configuration. Yet I'm not exactly aware what's causing the problem. I changed some settings and if the problem reoccurs then I'll have to do larger changes. I just prefer not to change too many things at once, because then there's no way to tell which particular setting fixed the issue.
  • RC4NOMORE - Yep, RC4 shouldn't be used. As they say, attacks only get better. Here's improved and further developed clever attack against RC4.
  • Actually the DHCPv6 vs SLAAC poll is interesting, because even if the address is assigned using SLAAC, the DNS and other information can be delivered with the RA O flag using the DHCPv6-Lite protocol, which does not require the M flag. So the host IP address is autoconfigured using SLAAC but the DNS information is still fetched over DHCPv6. This makes the question of whether you're using SLAAC or DHCPv6 quite confusing. The options should rather be which of the three flags, A, M, O, are being used, or whether the address is being configured manually.
  • Studied Veeam backup & replication for VMware or Hyper-V, yet I concluded that I don't have a use case for it right now.
  • Carefully studied and commented OpenBazaar's upcoming contract schema. I'll be blogging more about my findings. The schema version which I commented is still under construction and so far 'lightly discussed', so there are many things to fix. But I'll be posting about my OpenBazaar related observations later, and it will be a long post.
  • Studied unqlite-python documentation. - https://unqlite-python.readthedocs.org/en/latest/api.html#Collection - Nice, I like it. It fits very well with Pythonic design. Iterables, lists and dictionaries.
  • Data retention, privacy, law and leaks / data theft: What's the problem? Everyone is talking about big data and stuff. Isn't one key factor of that, that any data ever obtained by whatever means won't be deleted, ever? You never know when you might need it. Yet if it leaks, too bad. It wasn't necessarily 'our' data in the first place, we just happened to have it.
    This can be a problem, because some corporations have data retention policies which explicitly forbid deletion of any data, even if it would be required by law. Who's going to audit that anyway.
    Just as an example: If Gmail, Facebook or Dropbox leaks all customer data, including your private messages, chats, email attachments, anything you ever synced (photos, Excel sheets) to the service in the past 10 years, they can just say s*t happens. Not our problem. This came as a complete surprise and we'll be making some improvements in future. Sorry.
    If that happens in future, don't feel bad. You should have been expecting this to happen when you sent your stuff to the 'cloud'. So there's nothing to whine about.
    Why so? Because data isn't properly classified when it's generated / received, it leads to a situation where there's so much 'random' data that nobody wants to go through it and decide what should be removed. Therefore it's just much simpler to keep everything forever. Also, many developers are lazy. Inserting data into a relational database is really easy, but nobody bothers to build the data structures so that data could be removed from the database in some sane way without breaking relations, and this leads to a situation where nothing ever gets deleted.
  • Finally something light, it's a cloud story time! What's the silliest thing you've encountered with cloud stuff? Here's my story.
    Once upon a time, one customer had an advanced, awesome private cloud. It was really top notch. When we needed resources from that cloud, it became clear to the project managers that getting resources from the private cloud would require so much bureaucracy, paperwork and meetings that we would do it some other way. We just ordered a few physical servers and installed those in the corner of the office. This was cheap, fast and efficient.
    Isn't flexible cloud stuff awesome or what? Nowadays it would be just as simple to get the servers from UpCloud or similar service provider, but the company's own cloud was a joke. Shadow IT working hard!
    Got any juicy stories to tell? I got tons of those! Share on G+ with me.
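
About the A, M and O flags mentioned in the DHCPv6 vs SLAAC notes above: here's a minimal Python sketch of how the flag combinations map to what a host typically does. This is my own illustration, not any tool's real API. The function name `describe_ipv6_config` is made up, and it ignores details such as DNS delivered directly in the RA.

```python
def describe_ipv6_config(a_flag: bool, m_flag: bool, o_flag: bool) -> str:
    """Describe the configuration method implied by RA flags.

    a_flag: Autonomous flag of the prefix option (SLAAC allowed).
    m_flag: Managed flag (addresses come from stateful DHCPv6).
    o_flag: Other flag (other settings, like DNS, come from DHCPv6).
    Hypothetical helper, for illustration only.
    """
    parts = []
    if a_flag:
        parts.append("SLAAC address")
    if m_flag:
        # With M set, O is redundant: DHCPv6 delivers everything anyway.
        parts.append("stateful DHCPv6 address")
    if o_flag and not m_flag:
        parts.append("other settings (e.g. DNS) via stateless DHCPv6")
    if not parts:
        return "manual / static configuration"
    return " + ".join(parts)


# The confusing case from the poll: SLAAC address, DNS over DHCPv6.
print(describe_ipv6_config(a_flag=True, m_flag=False, o_flag=True))
```

So the same host can honestly answer both "SLAAC" and "DHCPv6" to the poll, which is why asking about the flags gives a clearer picture.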

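The note above about the Python types library and dynamic class creation can be illustrated with a small sketch. It uses `types.new_class` from the standard library. The helper `make_record_class` and the `Contract` example are my own hypothetical names, just to show that a full class gets created at runtime.

```python
import types


def make_record_class(name, field_names):
    """Build a brand new class (not just an instance) at runtime."""

    def init(self, **kwargs):
        # Every declared field becomes an attribute, defaulting to None.
        for field in field_names:
            setattr(self, field, kwargs.get(field))

    def exec_body(namespace):
        # Fill the class namespace, like the body of a class statement.
        namespace["__init__"] = init
        namespace["fields"] = tuple(field_names)

    return types.new_class(name, (), {}, exec_body)


# Hypothetical example class with two fields.
Contract = make_record_class("Contract", ["title", "price"])
item = Contract(title="Bike", price=100)
print(Contract.__name__, item.title, item.price)
```
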
No Estimates, Eddystone, AltBeacon Schema, Modulation, OpenBazaar, DNS, DR, DISM

posted Jul 20, 2015, 10:17 AM by Sami Lehtinen   [ updated Jul 20, 2015, 10:44 AM ]

  • I very much agree with this No Estimates concept. The truth is that estimates are usually horribly wrong and fail to account for multiple factors. Also, so many of the details are missing that an estimate is really a complete guess.
  • UK is again considering banning of encryption. This is strange trend. Don't they realize how much it can harm economy? Yet it won't be a problem for people who are willing to use encryption even if it's illegal. You'll just need to camouflage it so it isn't obvious. Crypto Wars are back - Should all encryption contain backdoor so it can easily be decrypted if required?
  • Had extensive discussions about international trade and business arrangements with a few friends.
  • Telegram was under a massive 200 Gbit/s DDoS attack. Attackers were using a Tsunami SYN Flood.
  • Checked out new contract schema drafts for OpenBazaar.
  • Also studied pre-existing schemas at schema.org - I love standards, but I always want the standard to be extensible. Most standards really aren't, at least not in any easy way. Does an unknown field cause an error or is it silently ignored? Well, if it causes the process to fail, it's not extensible, because you're creating a new standard just for adding something simple into the old standard.
  • I really do like standards, but I also acknowledge the need for extensible standards. Cases where quite simple things are being done using some heavy standard are a good example of when I don't like standards too much. In such a case studying the standard can require a lot of time, there can be several complex traps in the standard, and the implementation being built probably shortcuts most of the standard. Then you have a 'standard' solution with extremely limited functionality, which causes errors when anyone with a fuller implementation tries to talk with it.
  • OpenBazaar DHT and long term data storage: All data stored in a distributed network / DHT should have a TTL, as well as most probably re-balancing (republishing) at quite rare intervals. These are the things I was tuning with the GNUnet guys a long time ago. Originally they didn't have any expiry and it was a bit strange, only new nodes stored new data as old nodes were full of old data. Duh! Yet this is the case where potential spam / flooding can get really dangerous and problematic, potentially hindering functionality of the whole network for an extended period.
  • What's new in uWSGI 2.0.11 - No HTTP/2 support yet. I guess they haven't figured out what's the best way of doing Server Push.
  • Firefox starts to block Flash by default (until the most serious vulnerabilities are fixed). Yay! It has been causing so much security trouble. Now it's a must to start using HTML5 instead of Flash. Everyone has recommended this for years, but well, u know, people and organizations are really slow making changes until they have to.
  • Reminded myself about QAM, OFDM and SSB. Interested? See modulation @ Wikipedia, that's a good starting point.
  • Checked mobile power consumption 3G vs 4G in my typical usage environment. The difference is really small, and 4G speed and low latency make things nicer, so it's a win for 4G (no surprise there).
  • Frowned once again about security procedures (total lack of those). Everything is installed and configured randomly and not even fixed in case there are reports of serious misconfiguration.
  • Well how's that different from Adobe Flash issues? Well it isn't. Who cares if there are serious exploits or bugs. If there's no widely used exploit for those, it doesn't matter. It only matters when it's actually happening, before that it's only theoretical threat.
  • Reminded myself about Paravirtualization.
  • Studied Google's Eddystone and their Blog entry about it. It's a flexible iBeacon replacement. Also see Electric beacon. This is also a concept which could bring new business to small startups dealing with those. The Eddystone's telemetry frame (Eddystone-TLM) is also very interesting from this aspect when combined with Beacons Diagnostics. It's really nice that Eddystone supports URL beacons instead of UUIDs alone. The problem with a UUID is that for most people it really doesn't mean or represent anything at all. A UUID is about as useful as the MAC address of a WiFi base station. It can be meaningful to you, but in most cases it just doesn't mean anything at all. There are also some encrypted frame types like Ephemeral Identifiers (EIDs). It's also good to know related technologies like Weave, Thread and Brillo, all of this also relates to the Internet of Things (IoT).
  • Checked out AltBeacon. Read the AltBeacon protocol specification and frame type. Yet AltBeacon is super simple and only sends really short UUID making it also as useless as iBeacon is without external database. Useless? Well, I just now got 6415712610302 in my hand. Of course you should know what it is! 
  • Reminded myself about Bluetooth Low Energy (BLE).
  • Also checked out Google's Physical Web project. Yet it's merging to use Eddystone technology. I also love the concept, because I personally would prefer almost always HTML5 application over native application. I just hate installing tons of junk on my phone, when I really rarely need those. Using a properly designed HTML5 website, when I need one would be a lot better option.
  • Frowned at Microsoft, I guess they're working hard to make things as annoying as possible. Running CleanMgr.exe is really annoying on 2008 R2 or 2012 R2. I think Windows is even harder to use than Linux. There's absolutely no user friendliness whatsoever, they've made it about as annoying and complex as it can get. I just posted one solution to the problem here.
  • Read some deliciously enjoyable stuff like: Potato paradox, Ham sandwich theorem, Pizza theorem, Pancake sorting, Fair cake cutting
  • Checked out Socket.io and PeerJS for efficient direct in-browser P2P JSON communication utilizing WebRTC, without needing to pass data via a server.
  • Checked out OpenBazaar contract types: Physical Goods, Digital Content, Services and process flow charts for Physical goods (flow), Digital content (flow), Services (flow) - Getting a contract expiry is a great thing. There's also a new way to host images at the vendor's node. Which probably means that there will be some kind of new API call to fetch data in case data can't be fetched directly over HTTP. I also want to get the data so that it doesn't need to be refetched when the contract is refreshed, so the image data can remain static, even if other parts of the contract get changed. Also the process used to encrypt the address using XOR and a nonce makes me think, but no conclusion yet. I have to find out why this is being done. I heard that they got a cryptography professor, I hope it helps!
  • OpenBazaar is generally a very interesting project. Networking, P2P, DHT, Reputation management, Transaction Ratings, Python, OpenPGP (PGP), E-Commerce, Encryption, Digital Standardized Ricardian Contracts using contract type based schema, Bitcoin, Multisignature (multisig), Escrow, Moderators, ECC, Cryptography, Semantic data, Digital signatures, Cryptographic hash, JSON, databases and all that stuff, Financial Power combined with global free P2P trade! Connecting vendors and buyers around the world. Minimizes personally identifiable information (PII) leaks yet provides strong identity using GUIDs, metadata, network data. This is exactly the kind of project I've been looking for for several years and I have been wondering why nobody sees the potential for it!
  • Checked out a Passcard - a Bitcoin based identity and authentication solution. Ok, I had to register too. Here's my Onename profile.
  • Reminded myself about DNS Glue Records and circular dependencies.
  • Had not so much fun with DISM and Windows Servers. It's a huge mess with bad instructions & documentation. I would really like to clean up winsxs from all unnecessary junk. With Windows 2012 R2 it's reasonable, but with 2008 R2 I can't find similar commands? It seems that things work differently with every Windows version, how annoying is that?
  • Reported a few seriously bad IPv6 routing issues to the corresponding NOCs (Funet.fi, Nordu.net, OVH.net)
  • Studied Google's Disaster Recovery (DR) Planning Guide and Cookbook.
  • Now when IPv4 addresses are running out it's interesting to see traffic from IPv4 /8 blocks where you never used to see traffic earlier, like 1., 2. and 5. I actually got servers myself in 5., which used to be 'used by Hamachi' because nobody used it. Smile.
  • Launched a poll in IT Professionals group, if you use SLAAC, DHCPv6, Static/Manual or some other method to configure IPv6 addresses.
  • It seems that it's hard for some people to get that when IPv6 starts to be used, and no IPv4 is being used, they have to start using IPv6 too, there's no other way to get things to work. Even if they still got 'enough addresses' behind their NAT. Smile.
  • I don't know if it's really necessary so often, but it seems that my home network triggers ICMP6 neighbor solicitation and ICMP6 neighbor advertisement for all IPv6 addresses every minute.
  • From my G+ post: " Well DHCPv6 doesn't always help with audit, because in some cases it won't help compared to SLAAC at all. Unless there's some additional authentication layer, it's really hard to get any information who's using which address and logs won't provide enough information. Even if logs would contain MAC you can naturally trivially change it.
    This is the area where many things need to be changed before things work out really well out of the box. Well, ok, not all DHCPv4 servers log mappings or traffic either, so that's not a new issue.
    You can also log NDP traffic when using SLAAC and gain basically same information you would get from (working) DHCPv6 logging. "
  • And " Full port security was also the first thing that came to my mind, but that's a pretty expensive solution. Most networks do not require that kind of security. It's enough that there's some way to detect users. It's also interesting to see what kind of problems arise from network filtering or lack of it. I've already noticed that filtering MLD causes loss of connectivity in some cases, and of course not filtering some messages has similar results if someone purposefully injects those to the network, like rogue RAs.
  • I remember good times when you could bring major systems down by hijacking just IP on switched large network or running rogue DHCP server. Smile. "
  • What's the difference between LAN and WAN in future, none? 'LAN, service provider is often responsible for WAN. But because we're talking about the Internet, why you should have lan, you can just bridge WAN to make it a LAN. same stuff, no router needed, just a switch. In many environments I don't have separate 'LAN' at all, it's just switched Internet and depending where packets are going those go to LAN or WAN.'
  • Studied UHV power transmission in China.
  • Debugging one network with tcpdump required me to refresh my memory about the RA M and O flags.
  • Quite a nice and simple post about how backpropagation works in neural networks. A good read if you haven't ever really thought about it.
  • Glanced at the OpenBazaar Docs Documentation site. - There's a ton of stuff which I have to study later, it's all such good stuff.
  • A nice infographic by BBC about Artificial Intelligence.
  • Once again thought about why we do not yet have a universal strong identity for ehh, for lifeforms (I said universal). Ok, let's say for humans. Many people are using IBAN, so it shouldn't be impossible to provide a global strong identity for people. Issued by governments.
  • Just a post how to learn data science. It's a guide basically, how to get started. I personally couldn't agree more. That's how I often get things done. I pick interesting topic and then I create related project. To get my stuff done, I'll have to learn how to get it done. keyword: learn by doing.
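
Back to the extensible standards question above (does an unknown field cause an error or is it silently ignored?): here's a minimal Python sketch of the two behaviors. The field names and both function names are hypothetical, made up for illustration. This is not the OpenBazaar schema or any real parser.

```python
import json

# Hypothetical set of fields that 'version 1' of some format knows about.
KNOWN_FIELDS = {"title", "price", "currency"}


def parse_strict(raw):
    """Non-extensible: any unknown field makes the whole message fail."""
    data = json.loads(raw)
    unknown = set(data) - KNOWN_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return data


def parse_tolerant(raw):
    """Extensible: known fields are used, unknown ones are kept aside."""
    data = json.loads(raw)
    known = {k: v for k, v in data.items() if k in KNOWN_FIELDS}
    extras = {k: v for k, v in data.items() if k not in KNOWN_FIELDS}
    return known, extras


# A newer sender adds 'expiry', which the old reader has never heard of.
message = '{"title": "Bike", "price": 100, "currency": "EUR", "expiry": "2016-01-01"}'
known, extras = parse_tolerant(message)
print(known, extras)
```

With the strict version the old reader breaks as soon as anyone adds a field, which is exactly the 'new standard for adding something simple' problem.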

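The DHT note above about TTL and republishing can be sketched like this. It's a minimal single-node illustration under my own assumptions. The class name `ExpiringStore` and its methods are hypothetical, and this is not how OpenBazaar or GNUnet actually implement storage.

```python
import time


class ExpiringStore:
    """Key-value store where every entry has a TTL (hypothetical sketch)."""

    def __init__(self, default_ttl=3600.0, clock=time.monotonic):
        self._items = {}  # key -> (value, expires_at)
        self._default_ttl = default_ttl
        self._clock = clock  # injectable clock, handy for testing

    def put(self, key, value, ttl=None):
        """Store or republish a value; republishing restarts the TTL."""
        ttl = self._default_ttl if ttl is None else ttl
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        """Return the value, or None if it's missing or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at <= self._clock():
            del self._items[key]
            return None
        return value

    def purge_expired(self):
        """Drop all expired entries so old data can't fill the node."""
        now = self._clock()
        expired = [k for k, (_, exp) in self._items.items() if exp <= now]
        for key in expired:
            del self._items[key]
        return len(expired)
```

The owner of the data republishes with `put` before the TTL runs out. Data nobody republishes simply disappears, so old nodes don't stay full of old data.
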
Backlog is still building up. I'll really need to create one 'what I did during the summer' dump.

