Blog

My personal blog is about stuff I do, like and dislike. If you have any questions, feel free to contact me. My views and opinions are naturally my own personal thoughts and do not represent my employer or any other organization.


CSIRT, Security, Suomi.fi, Searx, Documentaries, Obvious things, Cloudflare IP ranges

posted Aug 14, 2018, 3:29 AM by Sami Lehtinen   [ updated Aug 14, 2018, 3:30 AM ]

  • CSIRT Maturity - Evaluation process - The good thing was that the list didn't contain anything we hadn't already thought about and discussed with friends and colleagues in general. Of course there are alternate networking / communication backup channels, etc.
  • Once again with one random project: why do developers put computers accessible over the Internet? And of course, as expected, they use the well-known default administrator credentials for all systems and services... Security, what security? They really don't care. They don't bother taking care of even the most primitive security aspects. Once again a 'security theater fantasy show': all the ridiculous talk about public / private key asymmetric encryption, etc. But in the actual reality show, it's way too much to wish for something extremely simple like a plain username and a strong static (random) password. I mean values that wouldn't be the defaults. It's true that it's unauthorized access and requires a hacker or criminal to access a system where the username is like usr and the password is like pwd. Yes, it's a crime. So there's nothing we can do to defend against the all-empowered darkness of evil Internet forces? Right? It's almost futile to even try. - No news of course. We know how many MongoDB and memcached instances there are and were accessible over the Internet with critical data and no authentication whatsoever.
  • Suomi.fi - The Finnish citizens' electronic service portal is developing fast. This will also replace many letters sent by authorities, officials and agencies. A national digital secure communication channel for public services and local counties / municipalities. Also check out eSuomi.
  • searx.me - Yet another privacy-respecting, hackable metasearch engine. I actually like it. The only thing I don't like about the default settings is the strange idea of having Bing and Yahoo enabled, but not Yandex. It seems that some people are missing the fact that Bing and Yahoo currently provide basically the same results, so it's overlap. But Yandex is actually fully independent and a good search engine. Unsurprisingly the site is hosted in Germany, like so many other privacy-focused and European search engines.
  • Watched an hour-long lesson about Conscious Capitalism. It was a good talk. Good shared values, everyone wins. No rip-off; good for clients, business, employees and the environment. As one example they used the Whole Foods chain.
  • Watched another documentary about healthy diet. That's a hard question. Everything seems to be at least somewhat dangerous, yet some stuff you have to have, or you'll get sick, and so on. This is a very complex topic. I guess nobody's eating better than the people at the International Space Station? Because they must have a very carefully planned, complete yet healthy diet.
  • Watched another documentary about the creator economy. Great talk about the economy, business and what people want, etc. Producer economy, consumer economy. Creating demand, producing required goods, etc. A good talk about how people already have everything they need. That's the situation I've been in for a long time: I can go shopping, but it's extremely hard to find anything to buy. Gamification, participation, personal experience. Robotic car subscription services; owning a car doesn't make sense. Everything is tracked and there's a digital tail and trail of everything. Robots taking jobs, etc. It's all about technological transformation: how technology, society and privacy work out. People love sharing valuable information for free, etc. People don't actually care about privacy.
  • Also watched many TED talks about interesting future topics. Yet those are always awfully light and don't go deep into topics due to time limits.
  • Jvns: Being on call. That's exactly what I've been saying. Nothing new, nothing to add. Very good writing, but unfortunately not obvious to all developers. kw: software reliability, code quality, exception handling, being responsible, design choices, distributed systems, consistency, race conditions, stretch and learn, complex failure conditions, "being responsible for my programs' operations makes me a better developer" - that's very true!
  • Read a few cloud migration articles. Nothing new, it was all obvious: data transfer times, data migration, delta syncs, updating data in smaller chunks. Checking connection upload speed, etc. DNS, TTL, IP address changes, firewalls. Long file names and deeply nested folders, hahah. Been there, done that. Access policies and restrictions, secure configuration, etc. All the basic fun.
  • Cloudflare has added the following new IP ranges during the last six months. I've got a script which automatically monitors the CF ranges and updates firewalls, etc. (a sketch below). IPv4: 199.27.128.0/21 and IPv6: 2c0f:f248::/32, 2a06:98c0::/29
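
For reference, a minimal sketch of such a monitor in Python (the list URLs are Cloudflare's published ones; the firewall update hook is a hypothetical placeholder):

# Minimal sketch: detect changes in Cloudflare's published IP ranges.
import urllib.request

URLS = ('https://www.cloudflare.com/ips-v4', 'https://www.cloudflare.com/ips-v6')
STATE = 'cf_ranges.txt'

def fetch_ranges():
    ranges = set()
    for url in URLS:
        with urllib.request.urlopen(url) as response:
            ranges.update(response.read().decode().split())
    return ranges

current = fetch_ranges()
try:
    with open(STATE) as f:
        previous = set(f.read().split())
except FileNotFoundError:
    previous = set()

added, removed = current - previous, previous - current
if added or removed:
    print('added:', sorted(added), 'removed:', sorted(removed))
    # update_firewall(added, removed)  # hypothetical hook, environment-specific
    with open(STATE, 'w') as f:
        f.write('\n'.join(sorted(current)))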

My site was down - Google Account Disabled

posted Aug 14, 2018, 12:01 AM by Sami Lehtinen   [ updated Aug 14, 2018, 12:08 AM ]


2018-08-01 Google Account Disabled

My site was down, because Google suddenly, without any prior warning, closed my Google Account, just sending an email stating "Google Account disabled".
After I got the Google Account re-enabled, the Blog site on sites.google.com still remained closed stating: "Site Disabled - The Google Account of a site owner has been disabled because of a perceived violation of the Terms of Service. The site owner needs to restore their Google Account before this site can be viewed. Learn more".
This is a good reminder of why you shouldn't trust cloud services. And if you do, at least have several backups, using different technologies, in alternate clouds & physical locations, without a single administration. That's one of the reasons why I laugh when some provider says they provide backups. I guess Google provides backups too, but it's utterly meaningless if you can't access the backups. So please, never use only the provider's own backup service. It's silly and pointless.

2018-08-02 Google Account Re-enabled

My account is finally accessible again. But Google Sites still says that the account is suspended, and I can't access the site, nor is it publicly visible. As usual, they didn't give any information about why they took the site down, nor did they provide any information after the incident. Just as usual. Kind of BOFHish. If I did something wrong, wouldn't it be right to tell me what it was? Nor did they ask me to change anything. Strange.

2018-08-03 Google Sites working again

Finally the website is also accessible to the public.

Conclusions

Do not trust the cloud! I do have backups of everything; I could have relocated the site with just a few hours of work. But I'm pretty sure there are many people out there who don't have proper backups that they are actually in control of. What if the cloud provider just throws you out, without giving any reason? As stated, one provider already gave only two weeks' notice of a system shutdown. But that's plenty of time compared to a sudden loss of all access. I've also been worried that people buy services from provider X and then use the same provider's backup and all other "availability" services. Seriously? Never trust only one provider; keep at least two totally independent providers if possible, which allows automatic fallback or a quick switch from one provider to another if there are serious issues. Also, from a disaster recovery point of view, it's important to remember that a provider can go totally missing overnight.

Keep multiple backups and control your systems! Don't trust the cloud. 

I do have multiple backups, on multiple locations, using multiple technologies.

It's kind of scary to think how many people might have basically all of their data in the cloud. Which, based on my experience, will be lost sooner or later. (Geocities, MySpace, etc... It might just take some time, or you're having issues with your credit card or whatever.)

KW: cloud, fail, risk, data loss, loss of control, IT, security, risks, backups, disaster recovery.

MAPtool, Badblocks, IoT, Politics, Thunderbird, IMAP, Firefox, Mobile Software Engineering

posted Jul 28, 2018, 11:31 PM by Sami Lehtinen   [ updated Jul 28, 2018, 11:35 PM ]

  • Microsoft Assessment and Planning Toolkit (MAPtool) is so full of bleep that phew. It's a perfect example of totally over-engineered, complex piece-of-bleep software which just seriously annoys users. It's a perfect example of how software should not be planned and implemented. - That's pretty much it. - Windows System Information is a much more usable tool, but that toolkit is just horrible. If you run it as Administrator, why does it ask for credentials? Why doesn't it use the default user, why doesn't it allow .\username as the username, etc.? Why does it require Active Directory (AD) / Domain, even if those features wouldn't be in use at all? I would personally prefer a light standalone program which can be run with elevated administrator privileges without any configuration or installation: just run the binary and extract the required system information in dozens of seconds. That would be something I could use, actually like, and which wouldn't deeply annoy me.
  • Ran badblocks again on all drives and checked SMART data. Only one drive is in slightly bad shape, but that one holds only temporary data which isn't unique and is easily reproducible / available.
  • Watched yet another documentary about IoT dangers, by the BBC. It was a very good documentary. Everything is now hackable, and unexpected things can happen, because people aren't accustomed to all the trouble malware, hacking, viruses and worms can cause in the Internet of Things (IoT) world. Did you know that your children's toys, your kitchen kettle and your home thermostat are all very hackable now or in the near future? As well as your AC / HVAC units, etc.
  • Politics, security, costs, work, efficiency, productivity, balance. Well, it's hard to get that right. Just as the NHS documentary showed. It's easy to laugh at them getting hacked. But maybe the laughers don't know what kind of ridiculous resources the NHS IT staff got for the task? That's one of the places where reality and fantasy collide. It's also easy to wonder why buildings collapse in earthquake zones. Sure, it would be possible to bulldoze the buildings and replace them with state-of-the-art earthquake-proof buildings. Personally I'm waiting for California's big one. The USA is a rich country, and all the required technology is available. It should be just as safe as Japan's new skyscrapers, right? Sad to say, but let's just watch the body count when the quake actually hits.
  • Also watched yet another documentary about current information availability (information overload) and the brain overload it causes. Including: cognitive load, multitasking, task switching cost, stress, weakened concentration.
  • Does anyone know why Thunderbird + IMAP sometimes ends up in the kind of loop where messages are duplicated over and over again? I've heard a similar report from one other user, but I'm just wondering if this is more widespread? I guess it has something to do with the combination: a large number of messages, and really slow and crappy Outlook servers. With this combination, it's possible that the process times out, hangs, or hits some kind of rate limit, is then automatically retried from the beginning, and therefore never completes. - This is totally and absolutely just a guess, but it might still make some sense.
  • Firefox 54 is using a multi-process & many-threads approach. Just as I do, and many others. Of course the number of threads could be reduced if the program code were written perfectly using async code alone. This also brings the expected downside, which is potentially huge additional memory consumption. Haven't measured it yet.
  • Still wondering how badly mobile phone display brightness control (Android / Samsung) and its modifiers are engineered. If the sensor range is 0 - 10, why do they first read the sensor and then fit the reading into a range like 3 - 7? Then the user modifier can adjust that within a range of -3 to +3. I would very much like a solution where the user modifier is applied directly to the sensor reading and the result then used. The current configuration limits either dimness or brightness; I don't want that. I just want to set the screen a bit dimmer or brighter than the sensor reading. But that seems to be once again way too complex for the engineers. I do assume that the sensor range is actually larger than the screen brightness range, and therefore just adding the modifier to the sensor reading would be perfect. It would also be the simplest thing to do (a sketch below). Maybe I should make a slide show presentation about this. But I'm pretty sure nobody cares. We're just so used to living with bad engineering.
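
A minimal sketch of the preferred logic, with made-up value ranges just to illustrate the idea:

# Preferred model: apply the user offset directly to the sensor reading,
# then clamp to the panel's supported brightness range.
def brightness(sensor_reading, user_offset, lo=0, hi=10):
    return max(lo, min(hi, sensor_reading + user_offset))

print(brightness(5, +2))   # 7: a bit brighter than the sensor suggests
print(brightness(9, +2))   # 10: clamped at the panel maximum
print(brightness(1, -3))   # 0: clamped at the panel minimum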

Python 3 vs 2, Ipaddress, Programming, Let's Encrypt ACMEv2

posted Jul 28, 2018, 11:17 PM by Sami Lehtinen   [ updated Jul 28, 2018, 11:17 PM ]

  • What you can't do with Python 2, but can do with Python 3. Still wondering why some people refuse to use Python 3. Let's see if that list has anything I didn't know but need at some level. I'm naturally using advanced unpacking and keyword-only arguments, because those are just too useful to be ignored (see the first sketch after this list). Some people argue that it's better to have different functions / methods for different parameters. But I don't personally like that too much, unless it's something obvious. Even then I prefer to have the main code and then just 'handles' for the rest of the stuff. Chained exceptions are also way too useful to ignore; all of my code naturally uses them. OSError and other exception subclasses. Yet at times it might bite you, if you've defined exception handling for a few common subclasses and then you get something you didn't expect to happen. Well, that's also one of the reasons why I write a generic exception handler, which fails and rolls back whatever was happening in case of an unhandled exception. Yet of course it's bad code if exceptions aren't handled properly. In this way, there's always a rollback catch-all fallback. It's quite rare that there's an unknown exception which still means that everything is ok and the process should be continued. I don't remember having that kind of failure in production for several years. It's also obvious that all this stuff gets logged in detail. Everything is an iterator, and no more comparison of everything to everything, is pretty obvious. Yield from is something I didn't actually know. There's a subtle but important difference between append and extend on lists. I'm almost always using append, because I'm often having lists of lists. Coroutines and asyncio, I'm naturally aware of them, but I haven't written any production code with them yet. Faulthandler is something I haven't yet needed, but it seems to be really useful when needed, a key tool like the signal lib. Ipaddress is something I've used. It's especially handy when dealing with subnets and IPv6 addresses in different formats. functools.lru_cache - Sure, but I've got my pyclockpro, which utilizes the CLOCK-Pro eviction algorithm, which is better than the LRU obviously used by lru_cache. Enum is quite handy.
  • As a bonus, the Python ipaddress library does compression and exploding correctly (see the ipaddress sketch after this list). Unlike the ridiculously bad, clearly custom code which OVH used in their control panel, which I blogged about, and which did the thing incorrectly. It was like fail, fail, fail, burn with fire. Function annotations, that's sometimes very useful; I've used that a lot. Pathlib is also a pretty obvious choice.
  • I'm currently working on code where it's hard to decide if it's one item and many process steps, or if it's a list of data which then goes through a set of steps. Haven't yet decided which one is best for this specific case. Often I prefer by-item processing, but if the items have interlinked meaning, then it becomes much easier and clearer to handle them as a group / list which passes through multiple steps. This is also important because otherwise maintaining inter-item state might become a horrible mess. If it gets complex enough, then naturally there will be classes for each item type and data structure, instead of simple lists, tuples, dicts or named tuples. Using lists also easily leads to 'magic numbers', which are naturally very hard for anyone else, and for yourself too, to handle after a while, when stuff isn't properly labeled or named. It's easy to come up with numbers, but it's hard to maintain such code in the future, especially if those field numbers get changed at some point. Numbering itself isn't a bad thing: if you're talking about field 132, that's just as good as talking about EBELP or MOA. Of course position enumerators can and should be used to add names, when applicable (see the enum sketch after this list). About coroutines and asyncio: because networking is often handled by something else, and I'm handling already parsed messages, I'll do just fine with multiprocessing & multi-threading. Yet the 'slow' database access could probably be written using asyncio, but it might make the code a bit more complex than just using imap. Because performance (so far) is hardly a problem, I haven't really put thought into that. Simplicity, clarity and reliability are much more important. I haven't completely internalized what the benefit of enumerations over dictionaries is, except of course improving code readability in many cases, which is an important factor alone. Enumerators fall into the category of things you could use just so you can say you're using them, but the actual benefits should be pretty slim. One of the things which can be solved in just so many different ways; which way is preferred depends. Mostly I've used enumerate with for loops.
  • ACME v2 Let's Encrypt protocol - Awesome, it's becoming an IETF RFC standard.
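
First sketch: a few of the Python-3-only items mentioned above, in one runnable snippet (advanced unpacking, keyword-only arguments, chained exceptions, yield from):

# Advanced unpacking.
first, *rest = [1, 2, 3, 4]            # first = 1, rest = [2, 3, 4]

# Keyword-only arguments: dry_run can only be passed by name.
def transfer(amount, *, dry_run=False):
    return 0 if dry_run else amount

# Chained exceptions: the original cause stays in the traceback.
def load(path):
    try:
        return open(path).read()
    except OSError as e:
        raise RuntimeError('loading %s failed' % path) from e

# yield from: delegate to a sub-generator.
def flatten(lists):
    for sub in lists:
        yield from sub

print(list(flatten([[1, 2], [3]])))    # [1, 2, 3]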
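
Second sketch: the ipaddress compression and exploding mentioned above, done correctly by the standard library:

import ipaddress

addr = ipaddress.ip_address('2001:0db8:0000:0000:0000:0000:0000:0001')
print(addr.compressed)   # 2001:db8::1
print(addr.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001

net = ipaddress.ip_network('2001:db8::/48')
print(ipaddress.ip_address('2001:db8::42') in net)   # True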
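
Third sketch: using an enum to name field positions instead of magic numbers (the record layout here is made up for illustration):

from enum import IntEnum

class Field(IntEnum):      # hypothetical record layout
    ID = 0
    NAME = 1
    AMOUNT = 2

row = ['A123', 'Widget', '9.95']
print(row[Field.AMOUNT])   # '9.95' - no bare magic number 2 in the code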

Isolation, Problem Solving, SEPA, Bloat, Thunderbird+Outlook, DDoS Mitigation, Gartner's Cloud, Roche Limit

posted Jul 14, 2018, 11:40 PM by Sami Lehtinen   [ updated Jul 14, 2018, 11:40 PM ]

  • Many databases don't provide snapshot isolation. That's kind of problematic for some long-running transactions. Of course you can read only the most vital data with read committed and the rest of the data with nolock. But the problem with nolock is that it also reads uncommitted data. I don't actually care about getting a fully consistent snapshot, but I generally don't want to read uncommitted data, which the nolock option triggers. Using read committed snapshot would be ideal, because I usually don't write to the database at all. I would just prefer getting a static snapshot with committed data to work with. - Snapshot Isolation in SQL Server. Btw. SQLite (SQLite3), when used in WAL mode, provides the READ COMMITTED SNAPSHOT isolation level by default (see the SQLite sketch at the end of this list).
  • I really appreciate people who provide blatant disinformation. One claims that there's a problem with credentials, which is a blatant lie. If DNS doesn't resolve, or you're not getting a SYN-ACK back when trying to connect, there's absolutely nothing wrong with your credentials, because credentials are absolutely irrelevant at that stage in most cases. So, please, don't claim there's a problem with the credentials when there obviously isn't. And even if there were, you wouldn't even know it. - This is why it's a good idea to describe issues generically on a higher level, without drawing too many conclusions.
  • SEPA Instant Payments / Instant Credit Transfer - This is going to be something big. We're already using similar national services, but this will enable EU-wide instant transactions. Which will naturally be awesome. KW: fintech
  • Sometimes things are done in a pretty inefficient way. Today I downloaded a 24 GB zip packet, which contained one ISO file, which contained a Clonezilla live system with a disk image, which contained a live Windows with an NTFS partition, and then that running system contained the actual files which I needed. The size of those files was 48 megabytes. Sigh, it's nice to get stuff wrapped in several obfuscation layers. - Thank you so much for delivering such bloat. I would like to think that they were just trolling me. But sometimes things are made 'easy' by making them absolutely horribly overly complex. Maybe they thought that sending a whole live disk image would somehow make things simpler. Or I could have just as well used Clonezilla without the live anything.
  • Thunderbird + Microsoft Outlook. This seems to be an invincible combination of problems, because Thunderbird works randomly and Microsoft Outlook works randomly; when you combine the two pieces you'll get infinitely many problems. Everything fails, and is slow, and then fails again. Now they claim I've got 12 gigabytes of mail and 200k emails in the trash folder to be downloaded. Nope, I don't have those kinds of loads of email, but Microsoft seems to think so. The system is running amok in an infinite loop. Even closing and restarting won't help; it might actually be making the situation worse. I also cleared the trash, but did it change anything? Nope. Reset the local trash, did it change anything? Nope. And as a bonus to all this, Microsoft cloud services are so inhumanly slow that downloading those duplicate messages would take nearly forever. Btw. I never had these problems when I ran my own server with Postfix and Dovecot. I guess the process where I deleted a bunch of messages is in some kind of loop, which doesn't finish and starts all over again. Quite a classic state management problem. But how do you clear that task queue? I've got no idea.
  • How to choose authenticated encryption. Reminded myself about the basics again (see the AESGCM sketch at the end of this list). kw: CBC, CTR, GCM, AE, AEAD, OCB, EAX, CCM
  • It seems that DDoS and generic system attacks are getting worse and worse again. It's funny how attacks get worse when attackers assume that key personnel are on vacation; they know it might take longer to respond. The distributed (but centrally managed) attack prevention system has over 50,000 attacking IPv4 addresses blocked right now. That's much more than I expected to see before implementing it.
  • Read Gartner's Cloud Infrastructure as a Service report. IaaS, PaaS, aPaaS, Cloud Service Brokerage (CSB), Cloud-Enabled System Infrastructure (CESI), High-Performance Computing (HPC), IT operations management (ITOM), managed service provider (MSP), system integrator (SI), Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, provider maturity model.
  • Something different? Roche limit. Astronomy, orbiting mass, planets, planetary rings.
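
The SQLite sketch referred to in the first bullet: in WAL mode a reader with an open read transaction keeps seeing its snapshot even while another connection commits new rows.

import sqlite3

DB = '/tmp/wal_demo.sqlite'
writer = sqlite3.connect(DB)
writer.execute('PRAGMA journal_mode=WAL')
writer.execute('CREATE TABLE IF NOT EXISTS t (v INTEGER)')
writer.execute('DELETE FROM t')
writer.execute('INSERT INTO t VALUES (1)')
writer.commit()

reader = sqlite3.connect(DB, isolation_level=None)
reader.execute('BEGIN')                                   # snapshot starts with the first read
count_before = reader.execute('SELECT COUNT(*) FROM t').fetchone()[0]

writer.execute('INSERT INTO t VALUES (2)')                # concurrent committed write
writer.commit()

count_during = reader.execute('SELECT COUNT(*) FROM t').fetchone()[0]
print(count_before, count_during)                         # 1 1: reader still sees its snapshot
reader.execute('COMMIT')
print(reader.execute('SELECT COUNT(*) FROM t').fetchone()[0])  # 2: a new transaction sees the new row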
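
And the AESGCM sketch for the authenticated encryption bullet, a minimal AEAD example using the cryptography package (note: a nonce must never be reused with the same key):

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)
nonce = os.urandom(12)              # 96-bit nonce, unique per message
aad = b'header-v1'                  # authenticated but not encrypted

ciphertext = aead.encrypt(nonce, b'secret payload', aad)
plaintext = aead.decrypt(nonce, ciphertext, aad)   # raises InvalidTag if tampered
assert plaintext == b'secret payload'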

GnuPG, Heisenbug, TFFS, UUID, Security, Log-structured storage

posted Jul 7, 2018, 8:46 PM by Sami Lehtinen   [ updated Jul 7, 2018, 8:47 PM ]

  • Advanced intro to GnuPG - Info: RFC7748 - Recommends using Curve25519 and Curve448. Also check out RFC4880bis - the latest OpenPGP Message Format draft.
  • Btw. that comment regarding SSH, that nobody checks keys, is not true. Of course keys are checked, if it's about something important. Afaik, relying on valid certificates with TLS is worse than actually checking the keys. That's why we're using plenty of self-signed keys, with fingerprint verification. Of course you can also use a valid cert and still do fingerprint verification separately.
  • Heisenbug. Have you ever encountered a really elusive bug? Check this out.
  • The new Finnish surveillance law is progressing. It's interesting what kind of data sets they'll be collecting. Why collecting? Well, there's no reason for the US to ask for email information from visa applicants, unless they have the information and access to it. So that pretty much proves that they do have the data and access.
  • Security best practices? Always carry all confidential files on a USB stick in your shirt pocket. Encryption? What encryption? Come on, don't be so ridiculous. It's always nice and interesting to notice the differences between reality and the illusion show running in the security theater.
  • Tuxera Flash File System (TFFS) - Checked it out quickly. It's like all the other flash file systems (?). They don't provide nearly enough technical information to make any difference. They claim it's designed for UFS, eMMC, MMC, SSD and SD. Support for HD video, lifetime extension, write amplification / erase cycles / wear-leveling. Built-in check & repair + no data loss. And of course, high performance is mentioned several times. Anything can be 'high performance' when no details are provided. POSIX file system compatible. More details (PDF). But that's also an extremely technically disappointing document. Anyone can make those claims in a document. As we know, there are a few common techniques used for flash storage, and those might be combined to produce 'optimal' results. Different allocation for small random and large contiguous data, etc.
  • Had a long discussion with colleagues about UUID as a primary key. I'm opposing that, because it's simply inefficient in many use cases, especially if it's stored as a string. A UUID is just a blob of bits with a certain structure, which can have that all-so-familiar presentation form (see the UUID sketch after this list).
  • Security as usual. All doors open or unlocked and no one's watching. Does it matter how elite a VPN tech or in-transit encryption you're using, if anyone can walk in and pick up all the systems physically with the data? Or simply copy the data directly from the systems, without leaving any warning signs? That's the way to go. Of course we can safely assume that nobody would do it; it would still be a crime. But where does the line of criminal negligence go?
  • Log-structured storage - Nice summary, nothing new. I think the post is so short that it seriously confuses, mixes and matches things. SQL vs not-SQL doesn't have anything to do with segmenting. Nor replicated vs not replicated, etc. That's always a problem: if you go and summarize or simplify things, it's almost guaranteed to be wrong (not just inaccurate) on some level.
  • Log-structured storage is one of the simplest structures of all. That's why I'm very often using it, especially for data which is stored but very likely won't ever be accessed again. That's why I used log-structured compressed block storage. It's a compromise between block storage and log storage: there's an index file which tells which data block hash is in which storage block, and the compressed storage blocks are stored separately (a minimal sketch after this list). This is for data which is likely to be accessed "sometimes", like once a month or so. If there's data which is even less likely to be needed, I also often skip the indexing part, so it's just a compressed stream of data, like a compressed log file. And getting data from it is very much like grepping for the required message(s) and just finding the last one of those. If required, log-structured storage can be optimized, but there's no point in doing that if it's not required. Combining logs and blocks also allows garbage collection and so on to be implemented if required. Yet in my case, I just start a new log every N megabytes, or N time units, and the old logs are deleted when not required anymore. So the log exists for debugging purposes, and because all data in a log expires at the same time, there's no need to compact or garbage collect it. That would usually mean saving data partially, compacting data when the log is cycled, or copying non-expired data and so on. That can be really inefficient when there's a lot of non-expired data: write performance will suffer greatly.
  • Something different? - Gerald R. Ford-class aircraft carrier.
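
The UUID sketch mentioned above: the familiar text form is just a presentation of 16 bytes, and storing it as a string more than doubles the key size:

import uuid

u = uuid.uuid4()
print(str(u), len(str(u)))              # 'xxxxxxxx-xxxx-....' , 36 characters as text
print(len(u.bytes))                     # 16 bytes as a raw blob
print(uuid.UUID(bytes=u.bytes) == u)    # True: lossless round-trip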
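
And a minimal sketch of the indexed, compressed block storage described above (the file names and JSON index format are illustrative only; a real implementation would batch blocks and index updates):

# Append-only compressed block store with a hash -> (offset, length) index.
import hashlib, json, os, zlib

DATA, INDEX = 'store.log', 'store.idx'

def put(block: bytes) -> str:
    key = hashlib.sha256(block).hexdigest()
    idx = json.load(open(INDEX)) if os.path.exists(INDEX) else {}
    if key not in idx:
        compressed = zlib.compress(block)
        with open(DATA, 'ab') as f:
            offset = f.tell()
            f.write(compressed)
        idx[key] = (offset, len(compressed))
        json.dump(idx, open(INDEX, 'w'))
    return key

def get(key: str) -> bytes:
    offset, length = json.load(open(INDEX))[key]
    with open(DATA, 'rb') as f:
        f.seek(offset)
        return zlib.decompress(f.read(length))

k = put(b'rarely needed payload')
assert get(k) == b'rarely needed payload'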

Integrity, Python 3.6, Azure Cosmos, CockroachDB, Web Development, Viper, App Engine, Network Protocols

posted Jun 30, 2018, 10:44 PM by Sami Lehtinen   [ updated Jun 30, 2018, 10:45 PM ]

  • Integrity in messages and files: there should be a clear trailer which optionally contains a checksum, hash, or signature. It's always good to have some kind of batch, message and integrity indicators, like a data-level checksum or even a final sum and count of rows in a batch, in a distinct format. That makes it clear that the whole message was received and the content rows match the trailer, at least on some crude level (see the sketch after this list). In the worst case a partial message could get through and part of the data be lost due to lack of proper checks; for example, assuming that the whole message was processed when it was only partially processed. Of course this shouldn't be a problem if other things are done correctly. But layered checking is at times a very good thing to do, because you never know if someone has skipped some checks in the process stack, because "some other layer" will probably take care of that. And when everybody starts thinking like that... yep, you already know what the end result will be.
  • Optimizations which made Python 3.6 faster than Python 3.5 (video). It's very nice that Decimal is 40x faster in Python 3.6 than it was in Python 2.7, because I'm naturally using a lot of Decimals. It's also nice that ElementTree is 2x faster than it used to be. But what about JSON? Probably not, because they didn't cover that in the talk.
  • Azure Cosmos Database - Quickly checked its documentation out, without actually running anything on it. Does seem pretty similar to Google Spanner and many other distributed storages, like CockroachDB.
  • CockroachDB - Very similar to Azure Cosmos and Google Spanner. It's mostly interesting how they can manage a distributed database with consistency and acceptable performance. Of course the performance for repeated actions on the same objects is extremely poor compared to non-distributed databases, but that's an apples-to-oranges comparison. Here's a reminder of how they can do it without atomic clocks; Google Spanner uses TrueTime, which requires globally synchronized atomic clocks. This is also where sharding becomes important, because you can't quickly increment something like a counter in a distributed database: there's a certain time during which the object is locked with every write. -> A loop which increments a counter in the database will be extremely slow compared to local implementations. But that is of course the totally wrong way of doing it, and should be avoided in these distributed cases. High availability is of course an extremely welcome feature for many environments. Distributed SQL is here to stay. It's of course also possible to do eventually consistent reads whenever consistency isn't required, which allows very fast local reads without synchronization.
  • Web Developer Security Checklist. A nice short checklist on how to develop secure and robust web applications. Encryption, minimal privilege, key store, SQL injection, prepared statements, vulnerability scanning, secure development environments, MFA, DoS and DDoS protection, rate limits, API protection, TLS, httpOnly, HSTS, CSP, X-Frame-Options, X-XSS-Protection, CSRF, API authentication and authorization, input validation, system separation and isolation, database, logical services, etc. Only whitelist a small set of carefully selected hosts, restrict outgoing IPs and protect traffic, minimal access privilege for staff, rotate passwords and keys on a schedule, centralized logging, IDS, no unused services or servers, audit and design, penetration testing, threat model, practice for security incidents.
  • Viper - a new Ethereum programming language. I'm familiar with the concept, but I haven't really looked into it. Maybe I could during the summer vacation or so. Or maybe not.
  • Python 3.6 support on Google App Engine - Finally! This is one of the platforms which I really like, if I need it. For most of my personal needs, a cheap VPS is much more cost-efficient. It's also awesome that Google App Engine (GAE) flexible environments are available in the Europe-West region. Which is of course part of Google Cloud Platform (GCP).
  • Network protocols - A very nice networking article, if you're not already familiar with Internet (IP) networking basics. Yet there was nothing new for me.
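
A small sketch of the trailer idea from the first bullet: a batch that carries its own row count and checksum, verified on receive (the TRAILER format is made up for illustration):

import hashlib

def make_batch(rows):
    body = '\n'.join(rows)
    digest = hashlib.sha256(body.encode()).hexdigest()
    return body + '\nTRAILER|%d|%s' % (len(rows), digest)

def parse_batch(batch):
    *rows, trailer = batch.split('\n')
    tag, count, digest = trailer.split('|')
    body = '\n'.join(rows)
    if (tag != 'TRAILER' or int(count) != len(rows)
            or hashlib.sha256(body.encode()).hexdigest() != digest):
        raise ValueError('partial or corrupted batch')
    return rows

assert parse_batch(make_batch(['row1', 'row2'])) == ['row1', 'row2']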

China, Spanner, SeaGlass, JSON-LD, Optimization, Benchmarks

posted Jun 23, 2018, 11:34 PM by Sami Lehtinen   [ updated Jun 23, 2018, 11:35 PM ]

  • Read about China's new Cyber-Security Law. It's interesting to see how it will work out in practice. But it's clear that it's giving foreign companies a hard time. Bei'An license laws. kw: China, Cyber, Border, Protection.
  • It seems that Microsoft has lately been pushing SPLA licensing checks on multiple cloud providers, using their selected collaborator companies in each jurisdiction. Anyway, with the current development on Microsoft's side, if someone mentions data security & privacy in connection with anything to do with Microsoft or Windows, it should trigger serious meme faces and then hysterical laughter. As mentioned, it's totally clear that a computer running Windows isn't technically your or your company's computer. It's Microsoft's computer and part of their world's largest botnet. There's no need to worry about ransomware, malware, spyware, bloatware, remote backdoor installation / droppers, or anything like that. Why? Because your system is inherently infected and compromised with Windows (tm) products, which fit into all of the previously mentioned categories. kw: SPLA, License Agreement, Enforcement, Audit, Microsoft, Security, Privacy.
  • How Spanner became a global, mission-critical database - Very nice post by Google. Strong consistency, ACID transactions. Nice. It's always interesting to follow database and big data development, even if I don't personally have any need for such technology right now.
  • Reminded myself once again about Python iterators and generators. I'm not using those too often in my code, mostly because I've got data to process which can't be generated.
  • SeaGlass - A way interesting project to monitor and track IMSI catcher (Stingray) usage in the USA. Had to read all about the project.
  • JSON-LD - A JSON-based format for Linked Data. Pretty simple stuff, just a standardized format. I'll use that in the future whenever it's required and suitable.
  • An optimization guide for assembly programmers and compiler makers (PDF). I'll scan that through quickly and see if it contains anything essential which I didn't know. Probably not, but it's always nice to be surprised by new important information you didn't know. kw: caching, registers, instruction decoding, pipelining, branch prediction, cache coherency, speculative execution, branch misprediction. Branch Target Buffer (BTB). So many different predictors and algorithms, nice. Indirect calls and jumps. Cache bank conflicts. These are generally good to know, even if I don't personally have any use for them. Register renaming was a new concept to me; I never thought that stuff like that would be required. After all, it's more efficient to rename a register than to actually transfer the data. Also the concept of using XOR EAX,EAX or SUB EBX,EBX was kind of funny. I know coders are creative sometimes, but why not just store zero? I do remember the story of x86 memory segmentation where jumping 64K-2 forward was faster than jumping two instructions backwards. These sound like technically similar quirks. Of course the Intel Skylake and AMD Ryzen sections were the most interesting ones.
  • Benchmarks between common programming languages - Really nice site. Yet as has been said, C vs Python performance suddenly becomes totally irrelevant when you've got a single bad SQL query in your program. That's the primary reason I'm using Python: it allows me to write reliable programs which do the required stuff pretty quickly without going into too many nitty-gritty details. I know performance could be better, but when profiling, the Python code usually runs less than 10% of the time and the rest of the time is spent fetching data from different sources, even when using efficient in-process caching (see the profiling sketch after this list). This is also the reason why I often use processes + threads: to improve CPU utilization and shorten execution time.
  • Something different? Checked out the Dongfeng-5C (DF-5C) - a missile with nuclear MIRV capability.
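
The profiling sketch referred to above: with the standard library's cProfile it's quick to check how the time splits between your own bytecode and I/O waits (the workload here is a made-up example):

import cProfile, pstats, urllib.request

def workload():
    # Most of the wall time goes to network I/O, not Python bytecode.
    data = urllib.request.urlopen('https://example.com/').read()
    return sum(len(line) for line in data.splitlines())

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)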

Google Cloud Platform (GCP) Finland, no national connectivity

posted Jun 23, 2018, 11:28 PM by Sami Lehtinen   [ updated Jun 23, 2018, 11:29 PM ]

Just a few random traceroutes; these are actually horrible. Around 8 times worse than what you'll get with UpCloud or Hetzner Finland locations.

user@instance-1:~$ traceroute www.funet.fi
traceroute to www.funet.fi (193.166.3.7), 30 hops max, 60 byte packets
1 209.85.246.60 (209.85.246.60) 10.181 ms 10.242 ms 209.85.241.33 (209.85.241.33) 10.655 ms
2 108.170.254.39 (108.170.254.39) 9.985 ms 108.170.254.40 (108.170.254.40) 10.203 ms 108.170.254.55 (108.170.254.55) 10.197 ms
3 se-tug.nordu.net (109.105.98.5) 9.722 ms 9.687 ms *
4 se-tug.nordu.net (109.105.97.46) 10.219 ms se-fre.nordu.net (109.105.97.68) 9.962 ms 109.105.97.24 (109.105.97.24) 11.085 ms
5 helsinki6-rtr.funet.fi (109.105.102.103) 16.170 ms 15.774 ms 109.105.97.24 (109.105.97.24) 12.191 ms
6 csc6-et-8-1-0-1.ip.funet.fi (193.166.255.36) 17.319 ms helsinki6-rtr.funet.fi (109.105.102.103) 17.064 ms csc6-et-8-1-0-1.ip.funet.fi (193.166.255.36) 16.667 ms
7 csc2-ae0-1.ip.csc.fi (193.166.255.15) 15.294 ms csc6-et-8-1-0-1.ip.funet.fi (193.166.255.36) 16.725 ms 16.721 ms
8 csc3-xe-0-0-0-0.ip.csc.fi (193.166.187.179) 17.314 ms 17.327 ms 16.618 ms
9 * * csc3-xe-0-0-0-0.ip.csc.fi (193.166.187.179) 16.681 ms
...
user@instance-1:~$ traceroute www.ficix.fi
traceroute to www.ficix.fi (37.233.94.160), 30 hops max, 60 byte packets
1 209.85.246.60 (209.85.246.60) 10.261 ms 209.85.241.33 (209.85.241.33) 10.366 ms 72.14.234.106 (72.14.234.106) 10.176 ms
2 108.170.253.178 (108.170.253.178) 10.449 ms 10.378 ms 108.170.253.162 (108.170.253.162) 10.665 ms
3 * * *
4 ae-125-3515.bar1.Helsinki1.Level3.net (4.69.203.26) 16.628 ms ae-124-3514.bar1.Helsinki1.Level3.net (4.69.203.22) 16.162 ms ae-125-3515.bar1.Helsinki1.Level3.net (4.69.203.26) 16.737 ms
5 212.73.248.186 (212.73.248.186) 16.494 ms 16.790 ms 16.235 ms
6 hn01.fi.cloudplatform.fi (37.233.89.11) 16.927 ms 16.586 ms 16.047 ms
7 37-233-94-160.planeetta.com (37.233.94.160) 17.422 ms 16.720 ms 17.389 ms
user@instance-1:~$ traceroute www.trex.fi
traceroute to www.trex.fi (195.140.195.51), 30 hops max, 60 byte packets
1 209.85.241.33 (209.85.241.33) 10.518 ms 10.722 ms 72.14.234.106 (72.14.234.106) 9.913 ms
2 108.170.254.35 (108.170.254.35) 9.556 ms 108.170.254.51 (108.170.254.51) 9.300 ms 9.434 ms
3 213.192.185.92 (213.192.185.92) 9.755 ms 9.504 ms 9.578 ms
4 ge2-0-0-0.bbr2.hel1.fi.eunetip.net (213.192.184.81) 16.722 ms 16.607 ms 16.635 ms
5 trex1.unicast.trex.fi (195.140.192.10) 42.175 ms 42.164 ms 42.134 ms
...
user@instance-1:~$ traceroute www.trex.fi
traceroute to www.trex.fi (195.140.195.51), 30 hops max, 60 byte packets
1 209.85.241.33 (209.85.241.33) 10.462 ms 209.85.246.60 (209.85.246.60) 10.239 ms 209.85.241.33 (209.85.241.33) 10.444 ms
2 108.170.253.184 (108.170.253.184) 9.791 ms 108.170.254.51 (108.170.254.51) 9.806 ms 108.170.254.35 (108.170.254.35) 10.182 ms
3 213.192.185.92 (213.192.185.92) 9.825 ms 9.890 ms 9.803 ms
4 ge2-0-0-0.bbr2.hel1.fi.eunetip.net (213.192.184.81) 16.815 ms 15.562 ms 16.792 ms
5 trex1.unicast.trex.fi (195.140.192.10) 20.199 ms 20.144 ms 20.131 ms
...

Having a data center in Finland doesn't mean that you would route anything via Finnish IXs like FICIX.

What should be expected? A rough estimate is that something around 3 ms should be acceptable latency.

Another interesting observation is that Cloudflare serves Google VPS servers in Finland from Frankfurt, adding even more extra networking latency. That's strange.

At least some services are well reachable with low latency. As expected.

user@instance-1:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=0.590 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=52 time=0.337 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=52 time=0.216 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=52 time=0.319 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=52 time=0.224 ms
64 bytes from 8.8.8.8: icmp_seq=6 ttl=52 time=0.369 ms
--- 8.8.8.8 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5107ms
rtt min/avg/max/mdev = 0.216/0.342/0.590/0.125 ms
user@instance-1:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=56 time=10.6 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=56 time=10.0 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=56 time=10.0 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=56 time=10.0 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=56 time=10.1 ms
--- 1.1.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 10.058/10.209/10.629/0.224 ms


As well as this blog, because it's hosted on Google Sites.

user@instance-1:~$ ping -c 5 www.sami-lehtinen.net
PING ghs.google.com (209.85.233.121) 56(84) bytes of data.
64 bytes from lr-in-f121.1e100.net (209.85.233.121): icmp_seq=1 ttl=52 time=0.638 ms
64 bytes from lr-in-f121.1e100.net (209.85.233.121): icmp_seq=2 ttl=52 time=0.348 ms
64 bytes from lr-in-f121.1e100.net (209.85.233.121): icmp_seq=3 ttl=52 time=0.261 ms
64 bytes from lr-in-f121.1e100.net (209.85.233.121): icmp_seq=4 ttl=52 time=0.247 ms
64 bytes from lr-in-f121.1e100.net (209.85.233.121): icmp_seq=5 ttl=52 time=0.259 ms
--- ghs.google.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4093ms
rtt min/avg/max/mdev = 0.247/0.350/0.638/0.149 ms

CPU info: seems to be an older Intel Xeon. cpuid level: 13

Traceroute to UpCloud Finland

 1  72.14.239.166 (72.14.239.166)  32.190 ms 108.170.226.2 (108.170.226.2)  32.244 ms 209.85.241.70 (209.85.241.70)  32.439 ms
 2  108.170.251.137 (108.170.251.137)  32.214 ms 108.170.252.75 (108.170.252.75)  32.302 ms 108.170.251.205 (108.170.251.205)  32.401 ms
 3  xe-1-0-0.decix-fr-mx1.ip.fne.fi (80.81.192.180)  32.809 ms  32.659 ms  32.707 ms
 4  87.236.158.142 (87.236.158.142)  52.528 ms  52.615 ms  52.533 ms
 5  87.236.158.145 (87.236.158.145)  52.307 ms  52.234 ms  52.365 ms
 6  87.236.158.164 (87.236.158.164)  52.554 ms  57.828 ms  52.507 ms
 7  87.236.154.1 (87.236.154.1)  52.467 ms  52.451 ms  52.411 ms
 8  87.236.154.23 (87.236.154.23)  52.745 ms  52.527 ms  52.899 ms
 9  r2-hel2-et4.fi.net.upcloud.com (94.237.0.54)  52.417 ms  52.644 ms  52.446 ms

Traceroute to Hetzner Finland

 1  72.14.239.166 (72.14.239.166)  36.262 ms 209.85.241.70 (209.85.241.70)  32.451 ms 108.170.236.248 (108.170.236.248)  32.916 ms
 2  108.170.251.204 (108.170.251.204)  32.099 ms 108.170.251.140 (108.170.251.140)  32.471 ms 108.170.252.78 (108.170.252.78)  32.117 ms
 3  * * *
 4  core8.fra.hetzner.com (213.239.252.9)  32.471 ms core9.fra.hetzner.com (213.239.224.178)  85.374 ms core8.fra.hetzner.com (213.239.252.9)  32.393 ms
 5  core31.hel1.hetzner.com (213.239.224.165)  35.569 ms core32.hel1.hetzner.com (213.239.224.154)  35.192 ms  35.177 ms
 6  ex9k1.dc2.hel1.hetzner.com (213.239.224.138)  35.059 ms ex9k1.dc2.hel1.hetzner.com (213.239.224.134)  35.060 ms ex9k1.dc2.hel1.hetzner.com (213.239.224.138)  35.035 ms

I guess that's it for now. Everyone can draw their own conclusions. I've personally been waiting to see Google as a FICIX member ever since they announced the Finland DC, but probably FICIX is just way too small for them to care?


Data Integrity, 3D libraries, LetsEncrypt, Payments, Duplicati, Telia

posted Jun 16, 2018, 11:02 PM by Sami Lehtinen   [ updated Jun 16, 2018, 11:03 PM ]

  • Data integrity, that's a topic which could go on and on. I've seen so many cases where something is insisted on being used in production, and even years after launch they're unsure if the data is even nearly correct, or if it's absolutely wrong. It's something brown in and something brown out. Yep, it's awesome to have real-time dashboards and advanced analytics and all that, but nobody seems to care if the information shown is right at all? Yep, that's funny. And I'm pretty sure I'm not the only one who has been observing these kinds of projects. Actually it only gets funnier when, at some point, someone who accepted that for production finally realizes that none of the data makes any sense and it's seriously misleading. Honestly, I would like to just laugh at that point, but it's also so sad. Well, what did you expect? Why did nobody care about those important things when it was time to take care of those aspects and verify data integrity and correct values with all kinds of use cases?
  • 3D printed multiple pieces of different useful everyday items, which have actual utility value. So many 3D library sites are full of useless stuff.
  • I've been planning the move of some sites to LetsEncrypt. I've done all the required experiments so far using IIS, Apache, Nginx, uWSGI, and Python 3 SimpleHTTP with an SSL wrapper. So whenever I feel like it, it's trivial to transfer the sites to a (valid) HTTPS certificate. Currently all the sites are providing HTTPS, but with an expired certificate. Which of course doesn't matter for certain use cases, where the server is authenticated using a public key fingerprint (a sketch after this list). But for some people it seems hard to grasp that the hash of the public key is good enough; there's no need for anyone other than me or us to say it's valid.
  • Finally there's a nice real-time mobile payment method in Finland which works well. And without credit cards. As I've written several times, I don't really get the credit card fetish, because in most cases it adds running extra costs. Unless you've very carefully selected a provider without extra fees. There are a few providers which give you a free credit card without any running extra fees. Yet you have to change provider at times, because usually those are campaigns which only last a few years or so.
  • Tainted leaks - Always mix correct information and disinformation so nobody knows what's true. That's the way to push an agenda. It's harder to find and dismiss the disinformation in a document which actually contains 80% totally correct information.
  • Again a faulty Duplicati 2 backup update. The default compression module is zip, and then the application claims that an invalid compression module has been defined. Awesome! On top of that, a downgrade installation ends up in a state where the required primary binary files are missing. My guess is that the developer is going to release a patch for that pretty quickly. Sigh! "Fatal error => Unsupported compression module: zip" Windows version: 2.0.1.61. Anyway, uninstall, reboot and reinstall fixed the issue on all platforms. Strange; why the update fails, nobody seems to know.
  • Telia and Sonera merger. I terminated one random corporate service via their own online service form, which is designed exactly for this purpose; it wasn't any random feedback form. When I called two weeks later to confirm it, they didn't have any clue about it. Well, I'm an IT guy and I've seen some $hit software and crappy organizations, so I naturally had all the screenshots for them. But well, this is just as usual. Nothing works, and everything fails. Always use at least three methods and always get confirmation that things are as agreed. Sometimes it feels so stupid to verify something repeatedly, but way too often it has turned out to be actually a very good idea. They changed the termination terms too. This is so typical: first they provide conflicting disinformation and then they leverage the disinformation they've provided against the customer. My honest opinion is that it's a total a$s move. Yet I personally see it as kind of a good thing, because it allows all the utterly useless and incompetent staff to be replaced by automation in the future. I've said this earlier: the only thing that totally doesn't suck at Telia is their IP network. It's nice. But everything else, especially customer service, is absolutely horrible. Every time, repeatedly, over the years, the same thing just happens. Whether it's business or personal service doesn't make any difference.
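
The fingerprint sketch mentioned in the LetsEncrypt bullet: fetch the server certificate and compare its SHA-256 fingerprint against a pinned value. This pins the whole certificate rather than just the public key, which is a close variant of the same idea; the hostname and pin are placeholders:

import hashlib, ssl

EXPECTED = 'put-your-pinned-sha256-hex-here'        # placeholder pin

def cert_fingerprint(host, port=443):
    pem = ssl.get_server_certificate((host, port))  # works for self-signed certs too
    der = ssl.PEM_cert_to_DER_cert(pem)
    return hashlib.sha256(der).hexdigest()

fingerprint = cert_fingerprint('example.com')       # placeholder host
print('match' if fingerprint == EXPECTED else 'MISMATCH: ' + fingerprint)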
