Blog

My personal blog is about stuff I do, like and dislike. If you have any questions, feel free to contact me. My views and opinions are naturally my own personal thoughts and do not represent my employer or any other organization.

[ Full list of blog posts ]

Microsoft Outlook 2FA, Python For Else, Hyperloop, For Fun

posted Dec 10, 2017, 12:18 AM by Sami Lehtinen   [ updated Dec 10, 2017, 12:18 AM ]

Microsoft Outlook 2FA

It repeatedly fails with MSSPError -2147195113 until it just happens to work. I guess there's something wrong with their login / authentication process. - Fail. - Some people wanted extra information, so here's some: I enter my login and password, and after giving the TOTP / OTP code, it gives me that error message. After reloading the page, I can enter the very same code again and it might, or might not, work. I've made sure it's not about TOTP clock drift; it doesn't matter if the code is 'fresh' or 'just expiring'. I've confirmed that this happens with Firefox on Linux and on Windows. I've also tried Chrome on Windows once, and it didn't happen there. So I assume there's some set of circumstances which triggers this problem, but I'm not aware what that set is. Googling didn't turn up any other reports either. Therefore I'm asking if anyone knows what this is about, because I don't. I didn't mention those facts earlier, because they still don't change the fact that the error message has been popping up for months when logging in with Firefox.

Additional information provided by the 'show more information' link:
request-id: 5a5a2918-f954-41e9-9464-5a6a5830fb23
X-Auth-Error: MSSPErrorException
X-OWA-Version: 15.1.947.19
X-FEServer: AM3PR04CA0097
X-BEServer: HE1PR04MB1498
Date: 3/6/2017 6:15:44 AM

Seems to be quite helpful. Afaik, this is information which is provided for 'competent' people, and I'm pretty sure we're lacking the competence here. That's exactly why I'm asking if there happen to be competent people around. I'm sure that in this case the number of people who can actually deal with that message is quite small. Maybe a dozen people?

kw: Microsoft, Outlook, MSSPError, MSSPErrorException, OWA, 2FA, fail

Python - For Else

Did you know that you can use an else clause with for and while loops in Python too? It's very handy in some cases. If the loop exits normally, the else block will run, but if break is used to exit the loop, the else block won't run. - Let's time stamp this, because this might have been fixed before this post comes out. These remarks were jotted down on 2017-03-05 (YYYY-MM-DD).

for i in range(10):
    print(i)
    if i == 8:
        break  # break is hit, so the else block below won't run
else:
    print('Loop finished without break')  # not printed in this example
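
The more common use for for/else is searching: the else branch runs only when the loop finishes without finding anything. A minimal sketch of that idiom (the values here are just made up):

# for/else as a search idiom: else runs only if the loop did not break.
haystack = [3, 7, 11, 42, 5]
needle = 42

for value in haystack:
    if value == needle:
        print('Found', needle)
        break  # skips the else block
else:
    print(needle, 'not found')  # runs only when the loop completed without break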

Windows Resource Monitor File Path - Strangeness

What's this strange Microsoft Windows Resource Monitor file path? Why does it show c:3\$LogFile being accessed, and not f:\$LogFile? That's just strange, because there is of course a drive letter mapped for that drive. It seems silly to refer to c: when it's not about the c: drive at all. I did search for an answer, but most of the answers were just 'yeah, it happens', and nobody explained exactly why it happens.

Hyperloop

Crazyloop, eh, no, Hyperloop. It's totally crazy to talk about Helsinki-Tallinn or Helsinki-Stockholm Hyperloop routes. Even when the technology is actually proven (if that ever happens), it might still be prohibitively expensive. Even comparatively simple projects like the Channel Tunnel manage to fail financially; it's just business as usual. So afaik, even talking about Hyperloop is crazy. Yes, it's nice if you're enjoying the weekend and want some fun topics to talk about, like superstition, UFOs, gnomes, dragons, etc. But talking about it seriously, as a business project, is just project-loon. ;) - I'm not saying it wouldn't work. It could, and when that's proven, and there's clear practical information about costs and execution requirements, then it might, just might, be economically viable. Even very simple projects like the Helsinki subway extension to Espoo and the Olkiluoto 3 nuclear power plant have had dramatic schedule and cost overruns. But that's nothing, those will probably be in production use in the future and will run for decades. What about projects which exceed their cost estimates by over an order of magnitude, and won't ever be in production use? That's a very real possibility for the first Hyperloop project(s). History is full of failures like that.

For fun

Just for fun checked out the Tachyonic Antitelephone. Neat tech, at least in pure theory.

Just tried alternate formatting, because this post has just a few longer write-ups.

OVH IPv6, RAM vs Disk, Repetier, 10x Programmer, Singularity, WhatsApp, Maintenance & Cleanup

posted Dec 2, 2017, 11:41 PM by Sami Lehtinen   [ updated Dec 10, 2017, 12:20 AM ]

  • OVH Management Console IPv6 zero compression is still working incorrectly. I'm laughing so much. Once again I don't know which is harder: understanding how to compress the addresses, or writing working code. And when they're unable to produce a correctly working program, how hard is it to fix? This is ridiculous, but just so common that it's business as usual. I can also confirm that this bug only affects the management console; the initial email + the API do return the correct uncompressed address. (from backlog) - There's a small compression sketch after this list.
  • RAM vs Disk - Which is faster for a cache? Might not be that obvious. It would be nice to hear guesses, speculation & reasoning about why either option would be faster or not, and by how much.
    Let's assume that the cache server is used for the following setup: 1 KB - 1 MB contiguous data blocks, with a short 64 bit key or something. The data is cache / temporary data, so there's no requirement for data persistence. It's ok to lose all data if the process is restarted. The server has 2 GB of RAM, and let's say that the cached data is limited to 64 GB.
    Now, is it faster to allocate memory to store the cached data and let the system swap it out as required? Or is it faster to save the data blobs as files and let the system cache the file system into RAM? Which option is faster and why? - Is there an obvious answer? The next interesting question could be whether, if the test is actually run on Linux and Windows systems, the operating system makes any difference when exactly the same Python, Go or C code is run. Data reads and writes would follow a random pattern with a Pareto address access distribution, 90% reads, 10% writes. Blocks would be randomly sized between 1 KB and 1 MB with evenly distributed sizes. Would it change anything if a Pareto distribution preferring small blocks were used? Would it change anything if the reads and writes were balanced with a 50% / 50% ratio? Thoughts? Guesses? This is just a thought play, which could be tested trivially.
  • Some 3D printing stuff. Repetier on Linux (Mono, System.Windows.Forms 4.0) & Windows. FreeCAD, CuraEngine, Slic3r, 3D printing, PLA, ABS. It took quite a while to get everything tuned. But now things are working pretty nicely.
  • I totally agree with this post. Yes, there are 'mythical' 10x programmers. They are 10x faster and more productive than others. I'm also sure there are 0.1x programmers too. I've seen some; even extremely simple basic tasks end up taking weeks and still failing.
  • The singularity in the toilet stall - The article contradicts itself in several places. Maybe that's just to show that these things aren't simple? But yes, generally I agree that some low level things work extremely well and reliably. Over-engineering some stuff just makes it unstable, fragile and brittle. This is just the elegance in programming. It's easy to make something extremely cool, which is actually something horrible. Keeping it simple, functional and reliable is the way to go. - The point I especially started wondering about was: 'Is online news full of falsehoods? Add machine-learning AI to separate the wheat from the chaff'. Wow. It seems that we don't need courts any more. We just need an AI which tells us what the truth is. That's neat. It's impossible to tell what's a falsehood and what isn't when we're talking about something which isn't exact science. Actually, why isn't there a global 'the truth' site, which would make sense of all the scandals and wars, and tell us the actual honest truth about the situation? It would be so nice.
  • WhatsApp and storage space management. It seems that they've got fail programmers there. How about deleting non-referenced lingering junk from the storage media? Nope? Too hard for you guys? - This is the exact opposite of the ridiculous database vacuuming by Spotify. - I just deeply hate applications and programs which do not remove all the crap, junk and $hit they scatter around the file system. Yuk! So when something becomes unnecessary, delete it immediately, or at least clean it up in some kind of cleanup batch run a few hours or days later.
  • I always add automated maintenance tasks to my projects: taking care of junk, logs, temp files and unnecessary database records, and compacting the database at some sane interval. It might be daily, weekly, monthly or yearly, depending on the task at hand. But it's always included. There's a minimal sketch of such a cleanup run after this list.
    Btw. Just as bad are some system admins and operators who just seem to poop around the system without any kind of logic.
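
On the OVH IPv6 item above: correct zero compression (RFC 5952 style) is trivial to get right with standard tooling, which makes the bug even funnier. A minimal sketch using Python's ipaddress module and a documentation-prefix example address:

# IPv6 zero compression with the standard library.
import ipaddress

addr = ipaddress.IPv6Address('2001:0db8:0000:0000:0000:0000:0000:0001')  # example address
print(addr.compressed)  # 2001:db8::1 - the longest zero run collapsed once
print(addr.exploded)    # the full uncompressed form, like the initial email / API returns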
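
And on the maintenance point above, this is roughly the kind of cleanup batch I mean. A minimal sketch, with a hypothetical temp directory and retention period:

# Minimal cleanup batch: delete temp files older than the retention period.
import os
import time

TEMP_DIR = '/var/tmp/myapp'  # hypothetical path
MAX_AGE = 7 * 24 * 3600      # one week; tune per task

cutoff = time.time() - MAX_AGE
for name in os.listdir(TEMP_DIR):
    path = os.path.join(TEMP_DIR, name)
    if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
        os.remove(path)  # when something becomes unnecessary, get rid of it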

Some thoughts about SSD speed, file systems and fragmentation

posted Nov 25, 2017, 10:10 PM by Sami Lehtinen   [ updated Nov 25, 2017, 10:10 PM ]

Had long discussions about the SSD fragmentation topic with one researcher. Here's a very short summary. I used small files to reserve disk space, then deleted those to free disk space and grow one larger file. This is all done on a very high level. I used the filefrag tool to confirm proper, I mean, almost total, fragmentation. It wasn't perfect, but at least 99% of the blocks were non-contiguous. The read and write tests used 4 KB blocks and seeks within the file to read data. I could have overridden / bypassed the file system and accessed the storage media directly, but then it would be just a seek test and have nothing to do with file fragmentation.
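
Roughly, the read side of the test was a loop like the one below. This is just a simplified sketch with a hypothetical file name; the real runs also covered sequential and reversed access order.

# Simplified sketch of the 4 KB seek-and-read test over a fragmented file.
import os
import random
import time

PATH = 'fragmented.bin'  # hypothetical test file grown into the freed gaps
BLOCK = 4 * 1024         # 4 KB reads, as in the tests

size = os.path.getsize(PATH)
offsets = [random.randrange(0, size // BLOCK) * BLOCK for _ in range(10000)]

start = time.monotonic()
with open(PATH, 'rb', buffering=0) as f:  # unbuffered, so reads go straight to the OS
    for offset in offsets:
        f.seek(offset)
        f.read(BLOCK)
print('random 4 KB reads took', time.monotonic() - start, 'seconds')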

Actually my tests don't directly measure the effect of file system overhead. It's just one part of the whole. Defining what's non-negligible is extremely relative. Of course, it would be interesting to measure the file system overhead separately, using some underlying device, or virtual device, which always provides constant random / linear access latency.
The whole point was that even with modern file systems, fragmentation does play a role in decreasing read / write performance, on the file system level as well as on the storage media level.

Since doing those SSD tests I've also gotten a bunch of cheap high capacity flash drives. With these the difference is even more drastic. The reason is that the drives do not use an advanced FTL, which means that when data in a block is changed, it's always a read-modify-write operation. Doesn't sound bad yet. But when you hear that the data block size is 8 megabytes, you'll realize that any 'random' or fragmented non-contiguous writes will make writing to the flash memory very slow.

When the drive fills up and the remaining space between extents gets filled, it takes ages to get data stored. The drive writes 40 MB/s, but after serious fragmentation only about 4 KB out of every 8192 KB written is useful payload (8192 / 4 = 2048), resulting in roughly a 2000x slowdown at the very end, when the last totally fragmented voids in the available space are being filled.

When people say that a full flash drive is slow: it's not, as such. The slowness with full flash (without FTL / trim) comes only from file system + fragmentation overhead.

Actually the 4 KB read requests I've used in the tests are probably extended to larger read requests by the OS anyway. It also uses read-ahead caching. If that didn't happen, the random read and reversed read should take the same time. But because the random read was faster than the reversed read, it means that the random reads hit data already read into the RAM cache by previous requests.

This is exactly why it makes sense to do high level tests: there are several complex underlying layers which affect the overall system results.

With the high end SSD, I guess the FTL works so fast that it doesn't play a real role here. I don't know the exact details of high end SSDs, but I assume they support extreme internal fragmentation without any visible performance loss, at least on sequential reads. Allowing a high degree of internal fragmentation also drastically reduces the amount of write amplification, especially when the device starts to become nearly full.

From the speed aspect, maybe there's something like the TLB cache issues with the FTL too. So some reads might hit FTL caches and be faster than others, etc. Just like with CPU RAM caches and so on. But that is implementation specific and therefore really hard to predict. It's also highly likely that sequential reads do work pretty well with the FTL, because it's also doing look-ahead processing, getting the internally fragmented data ready and available for subsequent requests. This benefit is lost when the data is logically fragmented on the file system level. - Therefore file system level fragmentation also reduces read speed on SSD devices. It's just like RAM fragmentation: it's often not that bad, but in some specific cases it can make things a lot slower.

OVH Strasbourg (SBG) Outage - #OVHgate

posted Nov 19, 2017, 2:33 AM by Sami Lehtinen   [ updated Nov 19, 2017, 2:35 AM ]

Power outage, worst thing that could happen?

I don't think their disaster recovery (DR) is really up to the task. Why? Because they claimed that a "power outage" is "The worst scenario that can happen to us".

No, it isn't. Let's change that scenario to situations like:

  1. Over voltage, all hardware fried - Solar flare / EMP - Widely fried electronics - Some data centers are hardened against this
  2. Flood, all hardware flushed off, or massive fire
  3. Direct air cooling and nice amount of volcanic ash or something, like corrosive chemical leak
  4. Fertilizer ship / train going past the DC explodes / leaks
  5. Nation state (or any other competent party) gets pissed off at them and wipes their systems, totally hijacking all control of the systems after monitoring operations for months, and they really know what they're doing
  6. Internal sabotage, where key systems are targeted either in software or physical attack
In these scenarios the whole site is more or less physically wiped out or damaged seriously. Recovering from that is a lot more demanding than the seemingly easy and trivial job of restoring power.

This is the reason why I always keep full off-site remote backups, just in case. You never know, anything could happen. It doesn't matter who the hosting provider is; you can never trust an external party enough. These procedures are used for all data, no matter the project or system. Always keep data secure.

About costs of downtime

Well, costs can be indirect. It's hard to even estimate the costs of downtime and/or recovery. If it's downtime alone, it might not be that expensive. But if the situation had been worse, and systems had needed to be restored from off-site backups, yes, it would have been several thousands of euros immediately and directly. And even more indirectly, when users require compensation, data is lost, there's extra data synchronization, restoring data lost due to restoring a potentially day-old backup, and so on.

Time to recover

In that situation we're easily talking about tens of thousands. It would have meant basically redirecting all resources to system restoration (probably on another service provider) and ... lots of work before everything is working again. Probably one week before the most important stuff is working, and restoring everything would have taken around one month. Also, if a big provider goes down, it might lead to sudden high resource demand on alternative providers, which would probably run out of resources as soon as people start realizing that the outage might take a very long time to recover from.

Yes, of course everyone has considered these things when making their Disaster Recovery Plan (DRP). The good thing: stuff can be restored. The bad thing: it would take a lot of time, cost a lot, and probably cause indirect costs in lost customers, tons of ill will, and so on.

There's also some stuff which is considered not worth backing up daily to an off-site location, because that data isn't "critical". But it's still something which would essentially have to be recreated in case of total loss of the DC & storage. This can be covered by server snapshots, but those are taken weekly or monthly depending on the setup, or not at all. If not at all, then usually the configuration and related stuff is backed up daily instead.

From some of the posts complaining about the situation, it seems that the users / clients hadn't made proper DR preparations. Providers like UpCloud clearly state that clients are required to have off-site backups of all critical data.

High Availability

Also, if uptime is so important, then in these kinds of situations a restore to another provider / location should be launched immediately, as soon as the issue is detected. Or there should already be alternate replicated sites where your systems can fail over automatically. - These are the discussions which always pop up when there's an issue with Amazon. If the service is important, you shouldn't trust only one Availability Zone or Region, nor should you trust even one cloud provider. - These are the topics I always bring up when someone says they need a system with high availability.

That's why there are solutions like Google Spanner, if the data is actually that important, so it can be replicated to multiple locations in real time.

But as we all know, when absolute high availability isn't required, cost is usually the reason why such solutions aren't implemented or used.

CRDT, SSE, Hash tables, HTTP Headers, IIS, NTFS, Navigation, Hash functions, PowerShell

posted Nov 18, 2017, 10:32 PM by Sami Lehtinen   [ updated Nov 18, 2017, 10:32 PM ]

  • Conflict-free replicated data type (CRDT) - kw: Eventual consistency, Strong eventual consistency, Operation-based CmRDTs, State-based CvRDTs, Sequence CRDT.
  • Fingerprinting Firefox users with cached intermediate CA certificates - Well, isn't it pretty clear that any identifier which is stored and re-used / returned to a server can be used to track users? That's why it's important to discard all data received from the web from time to time. It doesn't matter if it's certificates, cookies or anything else which creates a unique repeatable pattern / identifier. But still a good find; it's important that these things get noticed, even if this probably doesn't matter at all to normal users. Just like the HSTS super cookie. This is also one of the reasons why I've been asking for per tab security. Just close the tab, and all the junk related to that tab / session is gone.
  • Found out that SSE 4.2 adds a new instruction for computing CRC32.
  • A really nice technical post about fast hash tables. Also very nice charts showing how hash table load affects performance.
  • OVH IPv6 address compression fail is still there, a full week later.
  • Had more long discussions about the expires, last-modified, etag and cache-control max-age HTTP headers. There's a clear use for each of those, and they can be combined efficiently to improve performance and load times and to reduce required bandwidth. There's a small conditional request sketch after this list.
  • Had more fun configuring IIS and NTFS access rights, etc. It's obvious when you know what you're doing, but it might take a while to figure out what exactly is wrong. Once you know, it's obvious that it was incorrectly configured. Also configured the system to use detailed error messages, which makes debugging early issues much easier.
  • Reminded myself about old stuff, like aircraft navigation technologies - ILS (RDF, VOR, G/S, DME, Beacons) - from when I was using Microsoft Flight Simulator on an 8086 with a Hercules display (higher resolution than CGA). Yet as expected, GPS boosted with WAAS, EGNOS, LAAS, D-GPS etc. is taking over these legacy technologies. For many applications Internet based Differential GPS also sounds like a great option, like DGPS RTCM. In Finland the visibility of geostationary satellites is quite low, so mobile network + Internet will provide a more available option, especially in the northern parts.
  • he.net Network Tools - for Android is pretty nice app. I've been using it for a long time. I also like how they've included HOTP TOTP OTP generators. Even if some people seem to confuse this with Google Authenticator.
  • Lifetimes of cryptographic hash functions - A very nice chart indeed, from theory to practice. I especially loved the "slashdotter" / "hacker news" / "reddit" approach column. Very entertaining and accurate. What did I just mention in my last post? This really hit the sweet spot. Excellent humor with hard facts.
  • Checked out LTE Advanced (LTE-A, 4G+, LTE CAT 6, LTE CAT 9, LTE CAT 12, LTE CA, LTE 3CA), because operators are actually rolling it out in volume now. This allows customers to use a 300 Mbit/s Internet connection on their mobile phones. Yet the real speeds users are getting can be drastically lower.
  • Converted some of my old basic Python management scripts on Windows Server 2016 to use PowerShell. It's just very handy for small tasks on Windows Servers.
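
On the caching header discussion above: the combination pays off through conditional requests, where the client revalidates its cached copy with If-None-Match / If-Modified-Since and the server answers 304 Not Modified without resending the body. A minimal sketch with the standard library and a hypothetical URL:

# Conditional GET: revalidate a cached copy using ETag / Last-Modified.
import urllib.error
import urllib.request

URL = 'https://www.example.com/resource'  # hypothetical URL

first = urllib.request.urlopen(URL)
etag = first.headers.get('ETag')
modified = first.headers.get('Last-Modified')
body = first.read()

req = urllib.request.Request(URL)
if etag:
    req.add_header('If-None-Match', etag)
if modified:
    req.add_header('If-Modified-Since', modified)
try:
    body = urllib.request.urlopen(req).read()  # 200: content changed, refresh the cache
except urllib.error.HTTPError as e:
    if e.code != 304:
        raise
    # 304 Not Modified: the cached body is still valid, nothing was re-transferred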

Cloud hosting, IIS, PS, Cloudbleed, CDN, Hashing, 2G networks

posted Nov 11, 2017, 7:28 PM by Sami Lehtinen   [ updated Nov 11, 2017, 7:29 PM ]

  • Don't put everything in one basket. It seems that many of the cloud providers say that it's best if you only use their proprietary platforms. I find that horrible. What if something happens and you've got a code base which you can't run elsewhere? You're basically seriously vendor locked. What if that happens suddenly and by accident and you have to move NOW to another platform, because the platform you used to use is now dead? That's why I like a bit more versatile setups and having preexisting agreements with several cloud providers. If provider A has issues, we can relocate everything quickly to provider B or C. We're familiar with how providers B and C work, we've got agreements ready, and everything can be done easily in less than 24 hours. It would be nice to see how long it would take for some provider with serious Amazon lock-in to transfer everything from Amazon to Google Cloud Platform or Azure. It should be just switching service provider, not more complex than that.
  • Enjoyed IIS fun: no support for SNI. Awesome, so much for using Let's Encrypt properly for several sites. Sigh. Well, maybe I should just update that system to Windows Server 2016. Otherwise Let's Encrypt worked really nicely when configured using default paths and no access restrictions. Of course sites which require basic auth didn't work out of the box; it was necessary to configure anonymous access for the .well-known path.
  • Had more fun with PowerShell and PowerShell ISE + different execution policies and so on. But after a bit of tuning and playing I got what I wanted working as a scheduled task. I'm not fluent with PowerShell, but with dedication and sweat I'll get done what's required. Actually I'm pretty sure that PowerShell could be used in most cases for the stuff I mostly use Python for.
  • CloudBleed, the Cloudflare memory leak. Lots of upset people. I don't know; it sounds like such a common programming fail that there's nothing special about it. The only problem is that some people clearly chose to transmit confidential data through a large shared cache, with bugs. Well, what did you expect? We can check history, and this is nothing 'new' or 'special'. The same applies to the SHA1 case. I might sound pretty cynical, but that's just the truth. Things work when they work, and sometimes, even often, they don't. That's how it is. Of course, fixing the issue in this case won't erase the leaked data from search engines and other services which store data from web sites, like different kinds of archives, and data in users' browser caches, etc. So definitely not a nice thing, but nothing special in a generic security context. - I noticed someone else has been having similar thoughts, and those generally weren't really liked. Well, the truth isn't always nice. - Had a chat with one friend about this. He immediately said that Cloudflare is only used for static assets; all PII is passed directly to the system, bypassing Cloudflare's caches. That is naturally exactly what everyone should have been doing. There's no point in using a CDN for user-specific non-cacheable data.
  • What are the real pros and cons of using something like Cloudflare? Early SSL termination, nice. Shorter round trips to end users (helps with packet loss recovery), quicker TCP window growth, etc. Should APIs pass through Cloudflare, or should clients access the API servers directly? Is there any benefit from caching? A Content Delivery Network (CDN) can validate an object every now and then and let the client know that it's still valid, e.g. using last-modified, expires and etag.
  • SHA1 & hash & security related topics, revisited: Birthday attack, Collision attack, Preimage attack and of course the 2nd preimage attack.
  • Time passes, and 2G networks are being shut down around the world. Some are already gone, and many are going down in 2017 and the years soon after that. This opens great chances for Sigfox and other technologies like LoRaWAN. Even old 2G systems need to be upgraded to use new 'modems' for signaling.

IPv6, OpenBazaar, Swapping, JSP+, TLB, WLAN / WiFi

posted Nov 5, 2017, 1:37 AM by Sami Lehtinen   [ updated Nov 5, 2017, 1:37 AM ]

  • The Register writes - No more IPv4. The IPv4 address shortage is getting worse. No news? That's hardly unexpected. Yet what makes the news really funny is the fact that The Register itself doesn't yet support IPv6. That makes them an awesome false prophet. Do as I say, not as I do. Hahah.
  • OpenBazaar 2.0 is coming, with Tor support. That's awesome. Another cool feature is encrypted secure chat over IPFS and Tor.
  • The Windows Security Update Guide (SUG) is here. No more traditional security bulletins. No more individual patches, just monolithic 'updates' to the 'latest' version. No way to select which updates will be installed and which ones won't. Some friends hated this, but I can also see benefits. Having messy, partially updated systems can be a source of many strange problems. If you've got a single 'solid' version, it's much better.
  • Julia Evans posts about swapping. These are the reasons why these topics are a constant source of argument. There are always lots of people who don't know about these things and are still learning them. Some things might seem more surprising than others, even if the logic behind memory usage / allocation is, well, totally logical. Things are weird and confusing just because you don't understand them. But it's still a very nice post. I personally always recommend using swap, because reserving valuable RAM for 'stale' stuff just doesn't make sense.
  • Just wondering when Suomi Communications / SuomiCom would provide IPv6 for VDSL2 customers. It seems that routing using 6to4 is extremely bad at times. I'm pretty sure that more attention has been paid to native IPv6 address routing, naturally.
  • Can't stop loving cheap electronics. One LED light power supply just short-circuited and blew a fuse. Well, that's actually ok. At least it blew the fuse and didn't actually start burning. That would have been a way worse outcome.
  • JPS+: Over 100x Faster than A* - Techniques: JPS+, "Avoid redundant paths on grids". Goal bounding, "Avoid wrong directions on any kind of map". NavMesh. Dijkstra. Sub Goal Bounding. Comments: Efficient route planning is quite an interesting science, because there's no single right solution. There are several separate approaches depending on multiple factors. At a quick look, JPS+ with Goal Bounding seems extremely efficient. - Very nice examples. There are several ways to get exactly the same job done; some are just a bit more efficient than others. It's also a big question whether something can be pre-processed or not. Pre-processing is again all about different trade-offs.
  • TLB - Long discussions with fellow developers about the TLB and problems caused by cache flushes on lookups, as well as MMU-related performance issues.
  • Had an endlessly long and semi pointless discussion about WLAN / WiFi reliability and how much better 5 GHz is, etc. Well, as said, it depends. In some cases where 2.4 GHz is crowded and 5 GHz isn't, it might make a big difference. On the other hand, if you don't have a good 5 GHz router with DFS/TPC, you might be locked to just four 20 MHz channels (in Europe), which are probably getting crowded quickly. It's also good to remember that there are serious signal penetration issues with 5 GHz. That can be a pro, since it makes the range much shorter. But if people think they'll get better WiFi using 5 GHz when the 2.4 GHz signal is weak, they're probably getting it totally wrong. - Sometimes people just get the impression that something is greatly superior to something else, without considering the related facts. There are multiple factors to consider, and overall consideration is the only good way to go. Also had a very long discussion about the Fragmentation Threshold for splitting large packets into smaller ones and the usage of RTS / CTS, etc. The truth is that those values are 'easy to tune' based on on-site tests. But there's no way to tell the 'best settings' based on 'wifi is bad'. Duh! - It's also easy to forget that the situation is dynamic, it changes all the time, so there's no best static configuration for any site.

Falsehood, IoT, Abstractions, Refactoring, Cloud Spanner, Tools, Fuchsia, HEAD, Sigfox

posted Oct 28, 2017, 3:08 AM by Sami Lehtinen   [ updated Oct 28, 2017, 3:09 AM ]

  • List of falsehoods. Nice! CSVs and RESTful APIs. And of course the classic fact that companies like Microsoft have got nobody competent enough to even validate email addresses; I'm referring to Outlook rejecting totally valid and working addresses. - The CSVs list made me really happy.
  • Changed a lot of internal stuff to use IPv6 alone. Global addressing is just so much nicer than the good old horrible NAT mess with port and IP forwarding. So even if the public facing services are also available over IPv4, many backend services are actually available only over IPv6. Firewalls, networking and everything are easier to manage with IPv6, even if some seem to claim otherwise.
  • A lot of the Python code I write uses abstractions so that it's basically trivial to port it to the cloud or to run it locally. There are just handlers like get or set data, which can then be trivially mapped to any kind of blob / indexed data storage: Firebase, RDS, PostgreSQL, MySQL, MS SQL, SQLite3 or flat files, blobs in a database, or S3 / B2 buckets. There's a small sketch of the idea after this list.
  • Wondered how many times some basics need to be refactored, like basic project login & authentication. First something basic, then something cool, and then people get annoyed by the cool thing being too complex and fall back to basics. Sigh. Well, yeah, that's it. Over-engineering sucks. Another funky thing is asynchronous requests and responses: do they require a tracking ID and does it really matter at all? Should it be random or sequential, etc., or is it just utterly meaningless?
  • Cloud Spanner @ Google Cloud Platform - JDBC, ok. I haven't needed JDBC so far, but I assume it's not a big deal, pretty similar to other connectors. I did read several papers about databases, and so far it seems very likely that Cloud Spanner must have serious rate limits, due to latency and strong consistency. That's just the way it has to be. The same restrictions apply even to NoSQL databases if strong consistency is required. Without breaking CAP, it's impossible to make distributed databases fast with strong consistency. Of course these limitations can be alleviated on the software level, using different fallback logic, but not on the database level. It's a good idea to read the full white paper - it's only 7 pages long. The white paper just confirmed what I thought: very serious rate limits, which of course need to be considered when developing software using the database. kw: TrueTime, Google, Spanner, Cloud, CAP, geographically distributed, Chubby.
  • Had a very long discussion with one team about options. Should we use existing tools and suffer from potential limitations, or build something new and suffer from a bad implementation and constant problems? - It's not an easy trade-off. - It's nice to be in control, but it can also be a very bad thing, depending on the aspect. - As said earlier, there's no answer to this question. It needs to be evaluated case by case. Even then there's a large possibility that the chosen option isn't the right one.
  • Read an article about Fuchsia / Mojo and the merging of Android & Chrome OS. Which AFAIK makes perfect sense.
  • Noticed with one particular web app that curl -I and curl -i actually make different requests. -I doesn't just show the headers, it makes a HEAD request, whereas -i makes a GET and includes the headers in the output. For many of my apps that doesn't really make a difference, because a HEAD request gets the same reply as a GET, but for some apps it does make a difference and can mislead you. There's a second sketch about this after this list.
  • Once again, I don't want to discuss vague matters and generic blah blah. I prefer clear proposals, with facts. Not some ambiguous "We're so awesome, our blah blah is better than blah." - Very simple, measurable / fact based arguments, please. I really don't know how silly some customers clearly are, because it seems that a large part of vendor marketing totally underestimates their customers.
  • Registered an account for the LoRaWAN / LPWAN Loriot.io service and checked it out. Looks good. Sigfox coverage isn't yet as good in Finland as I thought. It seems that several providers are betting on LoRaWAN / LPWAN in Finland. Sigfox provides longer range and cheaper hardware, but it's a closed system.
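
On the abstraction item above, the idea is just a thin get / set interface that hides the backing store. A minimal sketch with made-up class names and two interchangeable backends:

# Thin storage abstraction: the same get/set interface over different backends.
import os
import sqlite3


class FileStore:
    """Flat-file backend: one file per key under a directory."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def set(self, key, data):
        with open(os.path.join(self.root, key), 'wb') as f:
            f.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), 'rb') as f:
            return f.read()


class SQLiteStore:
    """SQLite3 backend with the same interface."""
    def __init__(self, path):
        self.db = sqlite3.connect(path)
        self.db.execute('CREATE TABLE IF NOT EXISTS blobs (key TEXT PRIMARY KEY, data BLOB)')

    def set(self, key, data):
        self.db.execute('REPLACE INTO blobs VALUES (?, ?)', (key, data))
        self.db.commit()

    def get(self, key):
        row = self.db.execute('SELECT data FROM blobs WHERE key = ?', (key,)).fetchone()
        return row[0] if row else None


# Application code only sees get/set, so swapping the backend is trivial.
store = SQLiteStore(':memory:')
store.set('greeting', b'hello')
print(store.get('greeting'))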
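
And on the curl -I versus -i note above, the same difference reproduced with the standard library against a hypothetical URL: -I corresponds to an actual HEAD request, -i to a GET that merely prints the headers too.

# HEAD vs GET: same URL, different request method.
import urllib.request

URL = 'https://www.example.com/'  # hypothetical URL

head = urllib.request.urlopen(urllib.request.Request(URL, method='HEAD'))
print(head.status, len(head.read()))  # HEAD: headers only, empty body

get = urllib.request.urlopen(URL)
print(get.status, len(get.read()))    # GET: headers plus the full body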

AMP, asyncpg, Python Secrets, WA 2FA, LAN protocols, LinkedIn, Crypto, PostgreSQL, Ubuntu 17.10

posted Oct 20, 2017, 8:09 AM by Sami Lehtinen   [ updated Oct 20, 2017, 8:11 AM ]

  • Several articles about AMP. Well, AMP is slower than a fast website with a CDN. So why would anyone use AMP? I personally don't like it much. I believe in lightweight, bleep-free websites.
  • Played a little with asyncpg. That's fast. I'll use it if and when required with suitable setup / configuration / use case.
  • Tested and played with the Python 3.6.0 secrets module. A few example calls follow after this list.
  • Am I the only one who finds the constant WhatsApp two-factor authentication (2FA) reminders extremely annoying? Of course I've got the 2FA key stored safely. It doesn't mean that I would need to remember it.
  • Reminded myself about a few things before a local area network (LAN) management meeting:
    MSRPC, mDNS, WMI, IPC, SMB, and a neat tutorial about calling RPC functions over SMB.
  • Whoa, LinkedIn SMS 2FA is working again. It was broken, and they claimed it was my fault. It took them a week to admit and fix the issue, but now it's working again. I'm not happy about the initial response, but on average the end result is better than usual. Most helpdesks just feed you FAQ lies and don't even bother to look at the problem; getting the problem actually fixed is quite rare. I'm still asking LinkedIn to add backup codes to 2FA and to allow TOTP as an alternative 2FA method.
  • A person being held in prison for not divulging a password? You'll just need to XOR these together: random: ea74f9e9db9c514f data? : 8c019a82fbe53e3a Got my point? The random can be anything, just as the data can be anything. So it's possible to produce whatever 'evidence' is required. Often when talking about encryption, the random is a pseudo random stream derived using some algorithm and then used with a cipher. But it might or might not be that way. Everything is possible. Mixing bits around is trivial; doing it in a secure way is harder. In some cases there's no password at all. The data can be on one device and the 'random' on another device. Also, either the random or the data part can itself be encrypted. - This is actually where 'standard encryption' is bad, because it makes it pretty easy to know whether it's been cracked or not. Everyone also says that using standard crypto is a good idea and doing something non-standard is a bad idea. But it's not always that simple. Using non-standard crypto makes many things very much harder, and at least the attacker needs to use valuable resources like crypto experts to try to decrypt the data. That's why sometimes bad crypto might actually be better than high end standard crypto. There's a tiny XOR sketch after this list.
  • Is PostgreSQL good enough? - I believe that the answer is usually yes. I'm also often thinking that SQLite3 is good enough for most cases. Of course it's possible to mix dozens of technologies and then spend months or years having issues with those, as well as complicating setup, configuration, installation, version management, etc. with excess complexity, when you could just use a simpler approach. Some projects are large and require different technologies. But for many projects, using a mess of technologies is a guaranteed way to hinder development. You'll end up tinkering with different cool tech toys instead of getting done the job which pays the bills. We're making products / services to solve customer problems, not doing academic research on different ways to accomplish the same thing using other neat technologies. - It's easy to forget how much research and study is required to make any new tool actually usable in production, and so that everyone understands how it works. - It's often a real challenge to get that done even with one technology. - Afaik, it's a newbie mistake to try to mix every possible design, paradigm, library, framework and something cool and new in a single project. - That's why you should have separate study projects and learning time when you can play with those, instead of trying to push all that stuff into production projects.
  • Ubuntu 17.10 release notes - GNOME, Kubernetes, Linux, Visual Studio, Snapcraft.io - Snaps with delta updates, Robot Operating System (ROS), OPAL storage - Quite a nice list.
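
A few of the secrets module calls I tried (the token lengths here are just examples):

# Python 3.6+ secrets module: cryptographically strong random values.
import secrets
import string

print(secrets.token_hex(16))      # 32 hex characters, e.g. for a session key
print(secrets.token_urlsafe(16))  # URL-safe token
alphabet = string.ascii_letters + string.digits
print(''.join(secrets.choice(alphabet) for _ in range(12)))  # random password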
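
For the XOR point above, here are the mechanics as a tiny sketch: XOR the 'random' and 'data' hex strings byte by byte and you get whatever message the pad was chosen to produce.

# One-time-pad style XOR of the two hex strings from the item above.
pad = bytes.fromhex('ea74f9e9db9c514f')
data = bytes.fromhex('8c019a82fbe53e3a')

plain = bytes(a ^ b for a, b in zip(pad, data))
print(plain)  # the pad can be picked so that this prints any 8-byte 'evidence'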

Future Retail, CF, IP Spoofing, Pigz / GZip, Browsers / Apps, Qubes OS, Projects, XLSX / CSV

posted Oct 15, 2017, 12:06 AM by Sami Lehtinen   [ updated Oct 15, 2017, 12:08 AM ]

  • Watched a set of future retail store videos on YouTube. Some of those videos were kind of funny. I've got a very clear vision of a fully automated retail store. But in some of the videos there were extra manual steps which mostly required additional labor and made the process awkward.
  • What's my vision for future retail? Afaik, the perfect concept would be 100% automation. So basically it's like shopping at any web shop. Some Finnish chains are already offering a 'pick-up' service, but it still requires someone to pick up the goods. In my concept, the store is fully automated and there's no precollection of goods based on orders. An order is only automatically collected when you arrive or are arriving, so if you can't come for the pickup, the goods won't get collected. If the collection system is fast enough, and not the bottleneck of the system, precollection isn't required at all. A precollected storage / delivery unit could be added if the collection process is the bottleneck. Let's say that the peak load of the system is at 17:00, after people leave work. In that case the collection and temporary storage process could be started a few hours earlier. Minimizing precollection allows keeping the products and goods at the right temperature as long as possible. Of course the option currently provided in Finland is a great precursor for this, because the basic concept for the end customer remains the same: web / mobile order & pickup / delivery. If people love and use that, then it's easy to calculate where the break-even point for full automation is. Because it's going to be expensive, really expensive. That delivery option is also a perfect fit for future automated cars / delivery vehicles. These things can be easily integrated into any existing web shop.
  • CloudFlare servers getting MITMed? Yep, so it seems, by Airtel. This is just another reason why the links from CDN edge nodes back to the origin should also be encrypted. It's also good to read the implications section of this post.
  • Strange Loop - IP Spoofing - Awesome talk about DDoS IP spoofing and attacks. It would make so much sense to filter spoofed traffic, but there are still a lot of ISPs not doing it. Nothing new in this post, but it's a great overall summary.
  • Pigz, which stands for Parallel Implementation of GZip. That's nice. I usually opt for 7-zip LZMA2 compression when compressing files. Totally unrelated, but the multi-threaded parallel PAR2 I'm using seems to segfault often on smaller files.
  • Browsers, not apps - I prefer the web browser over apps, especially for services I'm not using too often. I don't want too many BS apps. No, I won't install your app, go away. Unfortunately many websites try to force users to use their BS app, for no reason whatsoever. I'm sorry that you've got such incompetent web developers, and no, I still won't install your crap app.
  • Played with Qubes OS. I really like the application isolation. I would love to get that kind of tab isolation in browsers too. But currently I don't have use cases for that technology.
  • Watched a documentary about Google Moonshots aka Google X projects. I liked their approach, it's very similar to mine. Let's take the hardest problem first; if that's not resolved, then we know that the project is done, aka canceled. I think that's the best way of taking on projects.
  • Had long discussions with one team about XLSX and CSV. They wanted to use XLS / XLSX. I said XLS sucks, let's use CSV. It took quite a while to make them realize that MS Excel files are a horrible mess, whereas UTF-8 CSV is something beautiful which just works and is compact. It's possible to turn CSV data into XML bloat, but it doesn't actually bring any extra value. It just makes parsing a lot slower and might expand the number of bytes (not data) by an order of magnitude or even more. Why bytes, not data? Well, that's just because the data is actually exactly the same as what would have been provided in CSV format. So, don't ever ask me for Excel import or export, I'm gonna hate it. Yes, I can do it, I've naturally got preselected and tested libraries for that, but it's not a really good thing to begin with. As far as I remember there has been only one project so far which has selected XLSX as the official data transport format for integration. There's a tiny CSV sketch after this list.
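
A minimal sketch of why UTF-8 CSV 'just works' with the standard library (the file name and fields are made up):

# UTF-8 CSV with the standard library: compact, and trivial to write and read back.
import csv

rows = [['sku', 'name', 'price'], ['1001', 'Kahvi 500 g', '4.95']]

with open('export.csv', 'w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerows(rows)

with open('export.csv', newline='', encoding='utf-8') as f:
    for row in csv.reader(f):
        print(row)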
