My personal blog is about stuff I do, like and dislike. If you have any questions, feel free to contact me. My views and opinions are naturally my own personal thoughts and do not represent my employer or any other organization.


Copy Paste Code, Implementations, Clock Skew, Time Zones, Network Intelligence

posted by Sami Lehtinen   [ updated ]

  • Some developers seem to like copy-paste code, but I just can't stop hating it. For example, writing tons of code like: if something != something_else: something = something_else and modified = True, and then if modified: save the object. And copy-pasting that for hundreds of lines with different variables. I converted that horrible copy-paste bloat into simple code like update([(list, of), (variables, to), (be, updated), (with, data), (in, tuples)]). It got much cleaner, and it's more reliable and a lot faster than the simplest implementation. By the way, there are three typical levels of how this kind of method exists in code. Some code just updates the variables and always saves. But that's horrible: it's really slow and consumes lots of resources, slowing down the whole system. Do programmers get that? Nope. Copy-paste code is a huge improvement over that in terms of performance, but it's really error prone to write. You quickly grab some snippets and add more code; maybe an if statement is wrong, maybe an update is wrong, maybe you didn't set the modified flag and then everything is wrong. If you save without checking the modified flag, it's just slow. Simple is beautiful, but it often delivers really bad performance. The best option is to write code which takes care of the complexity and makes the rest of the development work easy. Of course it's nice to say that you wrote 2000 lines of code in a day instead of 6, if you just copy-paste and edit some lines, generating bugs in the meantime. This is actually one of the low-hanging fruits of optimization. A program can run several orders of magnitude faster if useless transactions and writes are dropped. The cost is of course checking whether an update is required and keeping state. But that can be written into a library so it happens completely transparently. Yet when layers start to overlap, it can make code really hard to understand.
This applies to many existing tightly coupled libraries which are really hard to understand unless you start reading the whole code and experimenting with individual functions. Yes, it might be easy to use the library, but when you want to modify how its internals work, extend it, or add external calls doing something the original writer never thought the library would be used for... That can be quite hard, and it easily triggers unexpected side effects by breaking internal relations, references and/or state. Yet not modifying the library might lead to tons of bad code, bugs and poor performance.
  • More fun with complex implementations and data which has an absolutely stunning number of different mathematical cross-references. It's funny when one part has a bug and produces invalid data, and nobody notices it in cases A, B and C. Why? Because those implementations use different parts of the data to get what they need. Everything seems to work well until there's implementation D, which does exactly the same thing as A, B and C but uses a different criterion, and then boom, we have a problem. For example, one program checks that 100 - 20% = 80. Another program uses those numbers to derive the percentage, but then yet another program uses the first pair or the last pair or whatever, and then rounds it and... Yeah, you get the picture. Business as usual. Cross-checking data could be really fun in larger organizations and could reveal really interesting data quality problems. Take three independent or loosely integrated, totally different data sources and start cross-checking. If you get the same result, that's absolutely amazing. I'm pretty sure you won't. You'll probably find that the data quality is so bad that it's amazing nobody has noticed it earlier.
  • Clock skew exists? Yeah, it does. Today I spent quite a while fixing a few servers with serious clock issues. Yet the most common big fails are due to invalid time zone configuration at some level: VM BIOS clock, RTC settings in the Windows registry, etc. In Finland it usually leads to a 2-3 hour difference (EET, EEST) because UTC gets mixed up with local time.
  • Watched a documentary about Finnish network and communications intelligence gathering. The only thing in that documentary I hadn't heard earlier was that Sweden has been spying for Finland. Based on earlier documents, that could easily have been assumed. It's also a known way of working around national laws to let 'others' outside the law do the dirty work and then cooperate with them.
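The update([(variable, data), ...]) idea from the first bullet can be sketched roughly like this. It's a minimal sketch, assuming plain Python objects where fields are addressed by attribute name; update_fields, update_and_save and the Product class are hypothetical names for illustration, not the code from the actual project. The point is that the "did anything change?" bookkeeping lives in one place, so a save only happens when needed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Tuple


def update_fields(obj: Any, changes: Iterable[Tuple[str, Any]]) -> bool:
    """Assign only attributes whose value actually differs; report whether anything changed."""
    modified = False
    for name, new_value in changes:
        if getattr(obj, name) != new_value:
            setattr(obj, name, new_value)
            modified = True
    return modified


def update_and_save(obj: Any, changes: Iterable[Tuple[str, Any]],
                    save: Callable[[Any], None]) -> bool:
    """Write to storage only when update_fields reports a real change."""
    if update_fields(obj, changes):
        save(obj)
        return True
    return False


@dataclass
class Product:  # hypothetical record type, stands in for any ORM object
    name: str
    price: float
    stock: int


if __name__ == "__main__":
    saved = []  # stand-in for database writes
    p = Product("tent", 99.0, 5)
    update_and_save(p, [("price", 99.0), ("stock", 5)], saved.append)  # no change, no write
    update_and_save(p, [("price", 89.0), ("stock", 5)], saved.append)  # price changed, one write
    print(len(saved))  # 1
```

With an ORM, save would be the ORM's own save or commit call; the helper itself doesn't care.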

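The 2-3 hour EET/EEST error from the clock skew bullet is easy to reproduce when a Helsinki wall-clock time gets labeled as UTC somewhere along the way. A minimal sketch, assuming Python 3.9+ with the zoneinfo module and an installed time zone database (on Windows that means the tzdata package); the function names are hypothetical. The usual cure is to store UTC only and convert to local time just for display.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

HELSINKI = ZoneInfo("Europe/Helsinki")


def error_hours_if_local_taken_as_utc(wall_clock: datetime) -> float:
    """Hours of error when a naive Helsinki wall-clock time is wrongly treated as UTC."""
    as_local = wall_clock.replace(tzinfo=HELSINKI)   # what it really was
    as_utc = wall_clock.replace(tzinfo=timezone.utc)  # what a misconfigured system assumes
    return (as_utc - as_local).total_seconds() / 3600


def local_display(utc_time: datetime) -> str:
    """Convert a stored UTC timestamp to Helsinki time for display only."""
    if utc_time.tzinfo is None:
        raise ValueError("naive datetime: refusing to guess its time zone")
    return utc_time.astimezone(HELSINKI).isoformat()


if __name__ == "__main__":
    print(error_hours_if_local_taken_as_utc(datetime(2016, 1, 15, 12)))  # 2.0 (EET)
    print(error_hours_if_local_taken_as_utc(datetime(2016, 7, 15, 12)))  # 3.0 (EEST)
```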
Cheap Computers, Know your tools, Python, Databases, testing, CBCrypt, SSH best practices

posted Apr 26, 2016, 7:58 AM by Sami Lehtinen   [ updated Apr 26, 2016, 7:59 AM ]

  • Quickly checked a list of cheap computer hardware. But the cloud is so cheap that running your own hardware just doesn't make sense, unless it's a hobby.
  • Wow, once again: it's extremely important to know your tools in every detail. I almost committed a massive fail into one project. There was a bad bug, which I fixed with an even worse one. Never read documentation and implement something based on it alone. First write experimental code and verify how things ACTUALLY work, because most documentation doesn't cover all possible edge cases. It's very important to know what happens in special cases which aren't fully covered by documentation. In just a few months I've fixed tons of issues like that, which could easily have passed into production without extensive testing. This is one of the reasons why these kinds of fails are so common in software. Also, because many of these fails are connected to locking and complex transactions, it's quite possible that all normal unit tests pass with flying colors. Problems only arise when transactions start to take a long time, due to excess load on the system etc.
  • Nice article, Python is the new BASIC (2008). Yep, that's exactly why I like it.
  • Excellent presentation: Torturing Databases for Fun and Profit | USENIX. Highly recommended reading if you're into databases. Don't trust your database(s), they're broken. Yes, you just haven't noticed it yet. ACID? Yeah, so they claim, but that's not the truth. Transaction isolation violations, durability and transaction recovery failures, ordering failures, etc. It's nice that they optimized error spotting based on predictions instead of doing an exhaustive search. A transaction is partially committed, great? What's the point of transactions then?
  • The previous point also matches pretty much the rule I've found: if you claim it's working, it's probably because you haven't tested it. If you start testing things properly, you'll find such a staggering number of issues that you might not want to do it after all. So don't test, and you can honestly claim that it's working and there's no proof otherwise. - Traditional testing methodology may not be enough for today's complex storage systems. Thorough testing requires purpose-built workloads and intelligent fault injection techniques. Well, well. This applies directly to the second bullet point of this post. Everyone assumed that the code was working, and it almost was. But it could still overwrite already committed transactions with old data. Of course this only comes up very rarely and can be hard to spot in most cases.
  • CBCrypt - Encrypt passwords on the client side. There's no point in sending clear-text passwords to the server. It also works for authentication secrets and encryption keys. Yet CBCrypt 2.0 isn't anything special: it just uses PBKDF2 with a salt and SHA-256. That's basically what anyone should do anyway; it's just a standardized way of doing it.
  • SSH best practices - Nice post. Nothing new.
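Client-side key derivation in the spirit of the CBCrypt bullet is only a few lines with the standard library. This is a minimal sketch, not CBCrypt's actual specification: the salt construction (SHA-256 of service name plus username, so the client can recompute it without a server round trip), the iteration count and the function name are all my assumptions for illustration.

```python
import hashlib


def derive_client_secret(password: str, username: str, service: str,
                         iterations: int = 100_000) -> bytes:
    """Derive a 32-byte secret on the client so the plain password never leaves it.

    Assumption: salt = SHA-256(service + NUL + username). This is an illustrative
    choice, NOT the exact CBCrypt construction.
    """
    salt = hashlib.sha256(f"{service}\x00{username}".encode("utf-8")).digest()
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               iterations, dklen=32)


if __name__ == "__main__":
    secret = derive_client_secret("correct horse", "sami", "example.com")
    print(secret.hex())  # this, not the password, is what gets sent to the server
```

Whoever holds the derived secret can still log in with it, so the server should treat it like a password and hash it again before storing.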

Capacity planning, utilization, Big Data, Oscobo, MemComp, ECDH, IPv6, RDBMS, IPFS, Hybrid War

posted Apr 23, 2016, 11:55 PM by Sami Lehtinen   [ updated Apr 23, 2016, 11:55 PM ]

  • Lots of discussion about capacity planning and target capacity utilization.
  • Big Data, Privacy and The Dark Side of Big Data. Nothing new, but a good compilation of current issues & topics. Isn't it classic: "Anything you say can and will be used against you". In the era of big data, anything you do or have done can and will be used against you. Do you know who has your digital fingerprints? Anyway, great article. I don't have anything to add. I really liked the observation that even if I don't personally leak my info, someone else will probably do it for me. - Thank you for doing that.
  • Article about Oscobo, Britain's first anonymous search engine.
  • Checked out something different: ACTUV.
  • SafeCurves - Excellent site about ECC security and known issues with different curves.
  • Watched the 32c3 talk about IPFS and also wrote comments about P2P (every peer directly to every peer) vs decentralized (federated model) vs mesh (distributed) networking.
  • Windows 10 memory compression - No surprises there, basic stuff.
  • Something different: Brahmos, LRASM, Klub (3M-54E1), XASM-3, AirMule, S-500 Missile System.
  • Watched a documentary about underground Internet gambling in dark nets, and what kinds of businesses revolve around dark nets and Bitcoin.
  • ECDH Key-Extraction via Low-Bandwidth Electromagnetic Attacks - Classic TEMPEST attack. There's a reason why key systems should be shielded; no news in that sense. kw: EMSEC, COMSEC, side-channel attack.
  • Had several new integration project meetings again. I don't get what the problem with integrations is. If there's clear logic and the customer is able to pay, I don't see any problems with any integration. Then it's just some work that needs to be done. As far as I can see, there hasn't been a single failed integration project. In some cases making an integration isn't feasible for economic or political reasons, and that's not a failure. It's just a clear nope, can't do.
  • An interesting post about IPv6 address formats.
  • Using an RDBMS as a queue or for messaging is ok - I agree with that. You should consider the project as a whole, and adding new technology adds complexity. If you know how X works, and you add a new Y and assume it works similarly, you're quite likely going to fail. Then you quickly add some patch code, which also fails, etc. Sometimes it's just amazing how many bugs you can fit into a small amount of code when using additional libraries or technologies which you don't really know or understand. I just assumed how it works, boom. That's it, massive fail.
  • Great discussion about IPFS on Hacker News. It seems that most of the people interested in IPFS don't get what it is, nor do they get that it's nothing new. Content addressable networks are not a new invention at all, and there are many networks which use content addressing, for example Freenet and GNUnet. There are pros and cons with content addressing. I've written a lot about content addressable networks and there are aspects which I really like. What I do not like is the bleep bleep hype bleep factor which IPFS is riding on. I just hate all kinds of bleep projects. How about just telling what it is, without adding bleep loads of bleep as well as misleading lies and false claims. There's also Tahoe-LAFS out there. The main problem with these systems is that 'everyone expects someone to host the content', which of course often won't happen, and well, that's pretty much the end of the story. ED2K is / was also a distributed content addressing solution, just like magnets are for BitTorrent files. I still have the old Sharereactor dump. But guess what, none of those links would have any value to you, because the content just isn't available anymore. The same thing is very evident with torrents and so many other similar platforms. - My comments on that discussion: "Just pointing out that GNUnet and Freenet both allow a pretty similar feature set. I've studied both extensively, and after checking out IPFS, I don't get what's new, except all the 'hype' around it, which is generally something I as a tech nerd dislike. Another problem with distributed solutions is often performance; some tasks just become surprisingly expensive."
  • Some notes about hybrid war:
    • Cyber Attacks
    • Pressuring using several methods
    • Surprising on all fronts
    • Politics
    • Economy
    • Covert operations
    • Random independent happenings
    • Quick strategic attacks
    • Hiding attacker identity using indirect methods
    • Hard to know if events are independent and random
    • Using refugees as a weapon
    • Disturbing internal integrity
    • Buying land and property near strategic targets where equipment and special troops can live ordinarily right before attack
    • Strategic targets: traffic, communications, water, military, electric network, key decision makers and key system maintenance and administration
    • Repeated testing of preparedness and response times
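On the RDBMS-as-a-queue point from this post: the core trick is claiming a job atomically so two workers never take the same row. A minimal sketch, assuming sqlite3 stands in for whatever database is already in the project; the table layout and function names are hypothetical. BEGIN IMMEDIATE takes SQLite's write lock before the read, so the select-then-update pair can't interleave with another worker's claim.

```python
import sqlite3
from typing import Optional, Tuple


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, we issue BEGIN/COMMIT ourselves
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " state TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    return conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,)).lastrowid


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Atomically mark the oldest pending job as claimed and return (id, payload)."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE state = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE jobs SET state = 'claimed' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row
    except Exception:
        conn.execute("ROLLBACK")
        raise


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    conn.execute("DELETE FROM jobs WHERE id = ?", (job_id,))
```

A real deployment would also need a way to requeue claimed jobs whose worker died, which this sketch leaves out.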

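Content addressing, as discussed in the IPFS bullet, boils down to "the address is the hash of the data". A minimal in-memory sketch, assuming SHA-256 hex digests as addresses; ContentStore is a hypothetical name and this isn't how IPFS, Freenet or GNUnet actually encode their identifiers. It also shows the hosting problem: a perfectly valid address is worthless if nobody holds the bytes.

```python
import hashlib
from typing import Dict, Optional


class ContentStore:
    """In-memory content-addressable store: the key is the SHA-256 of the bytes."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # identical content always maps to the same key
        return key

    def get(self, key: str) -> Optional[bytes]:
        data = self._blobs.get(key)
        if data is None:
            return None  # valid address, but nobody is hosting the content
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("content does not match its address")
        return data
```

Because the address verifies the content, data fetched from untrusted peers can be checked locally, which is one of the parts of content addressing I like.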
Dependencies, Risk Assessment, Threats, Security, CDN Helsinki, LIGO, Virgo, GW150914, OpenBazaar

posted Apr 22, 2016, 11:37 PM by Sami Lehtinen   [ updated Apr 22, 2016, 11:38 PM ]

  • Kill your dependencies - I so much agree with this! I've seen some developers add heavyweight dependencies to implement even the simplest tasks. One project contains 20k of source code and 100 megs of other libraries. My apps usually run so fast and light because instead of using bloated libs, I often implement a simplified solution for what I need. Yes, there's a time for libraries, but often I can do without. Does one project need urllib, http.request, requests, urllib2, urllib3 and some other fancy ways to make HTTP requests? I also agree that using several different JSON libraries is silly. One project used JSON for everything, except one interface which uses protobuf, just because it's cool. Hmm. Ok. Also, using many different solutions for the same thing increases the attack surface; it's enough that one of the libraries you use is seriously broken. That's why I mostly use standard library stuff in Python, and sometimes write very simplified versions of things which aren't in the stdlib as my own functions, instead of importing yet another external library with potentially bloated requirements, pip, compilation and compile tools. When you start compiling stuff, you probably need more libraries for that, and then those libraries require libraries and... Yep, all of us doing this stuff know that.
  • Short comments about the national risk assessment by the Ministry of the Interior. Internal threats. Just a short list of keywords (translated from Finnish): Energy security, Cyber security, Digitalization and related risks. Espionage. Sabotage. Vulnerabilities. Probability, Impact. I excluded things like transportation, chemical and explosives related incidents, geopolitical risks, pandemic infectious diseases, floods, solar storms, extreme thunderstorms, terrorism and immigration (refugees).
  • For personal home security, do you have proper locking, monitoring and alarm systems? A safe for valuables? Fire alarm, extinguisher(s), adequate personal and close-range protection (weapons)? Proficiently practiced skills to use those devices? Cash available? Money, gold and goods for trade? Food and water storage / availability / filtering / melting capability? Heat is also very important in areas like Finland where you can literally freeze to death. Proper clothing? Some (city) people don't even have clothing that would let them survive extended periods without heating or power.
  • Microsoft now seems to be serving stuff from (even in Finland), which provides really fast downloads and low latency. That's nice. It's present at multiple Internet Exchange Points (IX) around the world.
  • Sometimes I wonder why Finnish sites, whose user base is basically only Finnish users due to the locality of the service and language, choose international CDN services which then serve content from Stockholm, London or Frankfurt. If the site itself is hosted in Finland, serving the rest of the content from there would be faster than using the 'mighty' CDNs for a market they don't really care about. There are many CDNs which have no presence / POP in Helsinki, and several important ones which have no presence / POP in Stockholm either. Even Moscow would be faster than Frankfurt or London. Let's see if C-Lion1 changes that. If things go well, latency could be almost the same. St. Petersburg would be even closer than Stockholm. Yet routing to Russia varies wildly between operators. Some route directly from Helsinki to St. Petersburg, in some cases traffic loops via Stockholm, and in some cases even via Amsterdam or Frankfurt. So latency can be anything from 5 ms to 80 ms, depending on things which are quite hard to tell without checking routing and testing.
  • Checked out LIGO, Enhanced LIGO, Advanced LIGO and the European Virgo.
  • Some seem to be worried about building P2P systems. P2P is just like distributed, but everything is slower, more unreliable, and you can't trust almost any (non-signed) data. But after all, it isn't that different if you've been dealing with distributed systems before.
  • Intro to #Python #Signal #Processing using GW150914 Open Data. - kw: iPython, numpy, datascience, dataanalytics, scipy, matplotlib, h5py, hdf5, opendata, ligo.
  • Who Controls OpenBazaar? - Lots of discussion about whether P2P networks are somehow inherently evil. Well, read this. It's how it is and always has been, but some people just refuse to get it. - Shouldn't your ISP be responsible for all Internet content? After all, they're delivering it to you.
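As a small illustration of the "kill your dependencies" bullet: for simple JSON-over-HTTP calls the standard library is usually enough. A minimal sketch using only urllib.request and json; get_json is a hypothetical helper name, and real code would add retries or authentication as needed. HTTP error statuses surface as urllib.error.HTTPError.

```python
import json
import urllib.request
from typing import Any


def get_json(url: str, timeout: float = 10.0) -> Any:
    """Fetch and decode a JSON document using only the standard library."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())  # json.loads accepts bytes directly
```

Usage is simply get_json("https://example.com/api/status") against whatever endpoint you need (that URL is a placeholder). No requests, no urllib3, nothing to compile.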

My quickly jotted thoughts about being an all-rounder, jack of all trades

posted Apr 18, 2016, 8:45 PM by Sami Lehtinen   [ updated Apr 18, 2016, 8:46 PM ]

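  • Am I a Full Stack Developer / Administrator / Business Guy? It also means being a jack of all trades. But I can buy, build, set up, configure, develop, run and manage a whole system, the business processes and customer support, everything alone from scratch. It requires a lot of studying all the time, but I think it's worth it. The only thing limiting these activities is time. Some of the Full Stack Developer articles seem to think that there is no hardware, no networks and no data centers, and they also forget that there are end users, customer support, usability issues, end user process flows, agreements, taking care of the end user experience, etc. It's not enough to know how to boot up an operating system, install a 'stack' and then write node.js. There's much more on both ends. You might also need a company, usually a good idea, as well as agreements with customers. And customer support, etc. I've noticed that some developers are extremely bad with networks, hardware or customer support. Those are also part of the true Full Stack. Yet I haven't gone into hardware design yet, sigh. I'm sure there are hackers out there who think buying a server or server components is cheating and not true Full Stack. Can you be a full stack guy if you can't properly fill in the business model canvas for the business you're running? I've got a few friends who also cover this whole scope. They usually run small technology / software companies with at most a handful of employees. They can do everything required to run a business with that small staff or alone.
  • So if you're running a successful app / web business without much outsourcing, with just a few people, then you're probably pretty much a full stack person, taking care of everything the business needs to run.
  • Full stack people might not be fully valued by larger companies, but startups and new internal projects in larger companies with limited resources do require full stack people. If you hire a team for everything mentioned here, your funding is going to burn fast and your runway is going to be pretty short.
  • The good thing about full stack guys is that they understand everything from beginning to end, the full stack. That can help enormously when troubleshooting or providing customer care. All that silly 'what did they mean' or 'how does it work' stuff goes away, which can reach extremely ridiculous levels with some layered teams.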
  • So if you're running a successful App / Web Business without using too much outsourced resources, and just a few persons then you're probably pretty much full stack person, taking care of everything that business needs to run.
  • Full stack persons might not be actually valued by larger companies to full extent, but startups and new internal projects in larger companies with limited resources actually do require full stack people. If you get team for everything mentioned here, your funding is going to burn fast and runway is going to be pretty short.
  • Good thing about full stack guys is that they do understand everything from the beginning to the end, the full stack. That can help enormously when trouble shooting or providing customer care. All that silly what did they mean, or how does it work, stuff goes away. Which can get to extremely ridiculous levels with some layered teams.

I wrote this post a long time ago, but reading Rework reminded me of it, and now it's here. I think I pretty much agree with the lines they set out. So this post was written before reading the book.

shutdown announcement

posted Apr 18, 2016, 9:11 AM by Sami Lehtinen   [ updated Apr 18, 2016, 9:12 AM ]

Just renewed the domain for 12 months. But it's highly likely that this project will be shut down in 12 months, when the domain is up for renewal again. There's no user base. Yet this was a great experiment from a technological & software development point of view.

Things can easily change if users pop up somewhere, which is highly unlikely.

Yet many parts of the code of this project will be migrated into other projects, so it's not going to be a wasted experiment.

See post @

NaCl, Cyber Crime, Compiling, Dependencies, Breaking Changes, Error Messages, Software Fail, CIIP

posted Apr 18, 2016, 8:33 AM by Sami Lehtinen   [ updated Apr 18, 2016, 8:33 AM ]

  • Had some issues using NaCl; had to recompile it after trying to upgrade the Python version and installing pynacl using pip.
  • Europol Cyber Crime Prevention Advice - Good basic list of things to do. The National Cyber Security Centre Finland (NCSC) also has its own list of recommendations. European Cybercrime Centre (EC3) - Combating Cybercrime in a Digital Age.
  • Thank you LinkedIn for the LinkedIn Premium spam. - No thanks, where can I click to stop receiving any more spam?
  • NTFS - Still like, aww, why no file-level snapshotting? It would solve some of my problems.
  • Snort - Network IDS or NIDS, and at Wikipedia for a lighter approach.
  • Had my fair share of fun recompiling Python and libsodium and resolving all the dependencies, also finding out that some pip installations fail because bzip2 isn't compiled in, and it isn't compiled in because libbz2-dev is missing, and... Yeah. Got it done, but it was a mess. I wonder why some other developers & admins have also been having some fun with Linux dependencies. There's even XKCD 754 about this mess. To make that work, you need Clojure and then you need Python and Ruby and Perl and oops, wrong version, and then some Lisp and... What? We don't have Rust installed yet, ok... What next, let's add some Java and Go. Actually I'm still missing at least Fortran and Pascal. Ok, now we've only got UCS2 support, let's add UCS4 support and recompile. Hmm, joy. But nothing beats the one and only POSIX C. Actually none of my Linux servers have Mono installed yet; I definitely need to add some .NET code. - One project also needed a newer Python, so I had to compile 2.7.11 from scratch. It also had some dependency issues, but I got it all sorted out. So much fun, oh boy.
  • I just love projects which make more or less random breaking changes weekly. Then when you need to update something, you end up with a totally debilitating number of things to fix. In the best case, you do this daily and spend a few hours / day fixing new issues. But sometimes the issues created are so bad that you can't fix your things fast enough to keep up with the current version. This is what I've heard from a few guys working with WordPress. Things keep seriously breaking faster than you can fix them. Growing maintenance is just like churn in a SaaS business (or any other business). You're not growing even if you get a million new customers per day, because you're losing as many or even more. So when software maintenance takes more resources than you've got, developing anything new becomes quite hard. You're lucky if what you've already got barely (or somewhat) works.
  • About error messages: if I disconnect a network drive or USB stick, or remove an external hard disk, Notepad++ says 'file is open in another program', which is a lie. When will developers learn to give informative error messages which aren't misleading lies? The art of giving error messages is, well, an art. Usually error messages are just exceedingly bad or outright misleading lies. But I'm sure that's not news. Just take a look at my Python recompilation whine a few bullets above this one. 'Not found', but the 'why' is missing.
  • Sometimes it feels like the world is full of *t software. And yeah, that's pretty much true. Everything is broken, or horribly broken, or absolutely catastrophically failing. In that sense all those security talks I've been listening to sound pretty ridiculous. If the question is whether it works at all, it's kind of silly to ask how secure it is. It's just like going to a war hospital and asking a guy whose legs have both just been amputated, and who narrowly escaped death, whether he's going to run the hurdles at the next Olympics. I'd say that's pretty distasteful trolling.
  • Hmm. This post seems to be quite negative. Of course there are great things too; when you get stuff to work after all this trouble, it's great. But the question is why there was all this trouble in the first place.
  • Carefully studied the national critical infrastructure protection (CIIP) and risk report 2015. I'll blog more about that a bit later.
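Tying together the recompilation mess and the "'Not found' but 'why' is missing" complaint from this post: a freshly built Python can check which optional C-backed modules actually got compiled in, and say why one is missing. A minimal sketch; the helper names are hypothetical, and the package names in BUILD_HINTS are the usual Debian/Ubuntu -dev packages, which is an assumption that won't match every distribution.

```python
import importlib
import sys
from typing import Dict, Iterable, List

# Assumed Debian/Ubuntu build dependencies for optional C-backed stdlib modules.
BUILD_HINTS: Dict[str, str] = {
    "bz2": "libbz2-dev",
    "lzma": "liblzma-dev",
    "zlib": "zlib1g-dev",
    "ssl": "libssl-dev",
    "sqlite3": "libsqlite3-dev",
    "ctypes": "libffi-dev",
    "readline": "libreadline-dev",
}


def missing_modules(names: Iterable[str] = BUILD_HINTS) -> List[str]:
    """Return the module names that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


def explain(name: str) -> str:
    """An error message that says 'why', not just 'not found'."""
    hint = BUILD_HINTS.get(name)
    fix = f"install {hint} and rebuild Python" if hint else "no build hint known"
    return f"module {name!r} not found: it was not compiled into this interpreter ({fix})"


if __name__ == "__main__":
    print(sys.version)
    # Narrow (UCS2) vs wide (UCS4) only varies on Python 2 builds; 3.3+ always reports 0x10FFFF.
    print("unicode:", "wide/UCS4" if sys.maxunicode > 0xFFFF else "narrow/UCS2")
    for name in missing_modules():
        print(explain(name))
```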

Robo-Advisers, Oscobo, Databases, Mr.Robot, CPUs, Distributed Locking, Duplicati, Statistical Analysis

posted Mar 29, 2016, 9:31 AM by Sami Lehtinen   [ updated Mar 29, 2016, 9:32 AM ]

  • Investing using robo-advisers and automated trading & investment portfolio allocation. An interesting trend, yet nothing new. It's obvious that this kind of technology will emerge, because it's just silly to pay extra for funds and investments which basically just follow an index.
  • I waited for a month and, unsurprisingly, Oscobo never answered my questions about how the data they receive and search is handled. Their website doesn't provide detailed information, nor do they answer questions about how they do it. The only sane assumption is that they do forward the data somewhere, and therefore do not handle it as privately, in their own systems alone, as they claim. Somehow when I have an awkward question to ask, I almost always assume that they don't want to answer it, because I already know the answer. As was also the case here. - This isn't a surprise, this is the norm when I get through with questions. - But if you want to answer, please just write a blog post about how you deal with the data. It's better than replying to me individually.
  • Notes on sandboxing and privilege separation: Check your system privileges. Make sure your code has no bugs! Reduce code exposure. Make exploitation harder. Drop privileges, wear a straitjacket. The less the code can do, the less can be gained by attacking it. Process confinement using jail / container / VM. Applications should confine themselves, like a "werewolf chains itself". Firefox is a good example of how this isn't being done. Access control via a broker service. Split the "work horse" and the "access controller / broker". Prohibit ptrace! Dropping privileges. Privilege separation. Limit access. Use namespaces, and a fake file system. Jails / chroot. Prison & guard approach using a broker. Descriptor passing. Restrict file system access. Escaping chroot. Capsicum, tame, capabilities.
  • Reminded myself again about the How databases work article. I'm glad to say that even rereading it didn't bring any new information; I already knew everything in it.
  • Hacker tools used by Mr. Robot. Hmm, yeah, TV shows are TV shows and pretty bad. But it's still interesting to see what they come up with.
  • Another really nice post: What's new in CPUs since the 80s, and how does it affect programmers. Very nice post; it didn't contain anything I didn't already know. Yay, I'm actually quite happy. I wasn't sure if I'm up to date with low level stuff, because I don't usually do it at all. But it's the basics after all.
  • I keep learning & studying more stuff every day, several hours / day. Just watched a few great TED talks about technology startups, experimenting and collaborating in globally distributed groups.
  • Very nice article about distributed locking - Redlock (Redis locking). Lots of basic statements about correctness. Yes, code is either correct or it isn't. That's a good point to remember. The chances of something might be slight, but it will happen sooner or later. I've often seen a mentality where everything is assumed to work unless proven otherwise. I think it should be vice versa: your code is broken unless proven correct. I also liked the fencing / token / versioning approach. That's what I've been using for a long time. It's especially good for locks which might be held for a long time. In one case I'm actually using an SQL database for such tokens, with a data expiry time of 24 hours. That's also the same approach as Google App Engine Datastore's optimistic concurrency control, as well as the approach I've used for many of the RESTful APIs I've written. It's one way to guarantee correctness without per-client active state on the server side. Of course network latency and the number of parallel workers can make this extremely inefficient, as is easy to notice with App Engine's datastore. When running tasks in parallel, there's always global progress, but adding more workers doesn't improve performance at all. Adding way too many workers actually makes it much worse, depending on multiple factors. Yet for 'untrusted' clients I never use monotonically increasing tokens, because they might want to cheat on purpose, which would ruin everything. It's the same reason why you'd declare anything private in Java. Well, well, of course you assume that everyone is doing it right and doesn't want to mess with your code. Using 'random' or 'hash' tokens makes it impossible to overwrite already committed parallel transactions on purpose. With a monotonically increasing counter this is naturally trivial, and it could cause some committed changes to be overwritten. Use compare-and-set (CAS) and an asynchronous model with unreliable failure detectors.
In some cases 'transactions / atomic updates' over a RESTful API are very convenient. But as said, with certain network & processing latencies it can also be a huge drag. Another problem with this kind of 'non-locking synchronization' without wait queues is that it can very easily lead to unbalanced resource sharing and priority inversion. Instead of providing the ability to modify records directly, it would be a much better option to provide an API which deals with the transaction complexities internally on the server side, like move N from A to B. Instead of reading A and B and then sending an update for A and B, with version / token information and the updated data, back to the server, you make just one call, which doesn't require several round trips. Sometimes people say that the cloud is awesome. Yeah, it is, but having a database or REST API or stuff with several round trips and a 200+ ms round trip time is... well, it is what it is. Don't expect great performance, especially in cases where strict ordering prevents parallelization completely. The server-side approach also prevents all kinds of issues caused in distributed systems by network delay, process pauses and clock errors.
  • I was reminded of this because I just had annoying issues with Duplicati and its stale lock file without any data, which prevents it from starting unless you manually go and delete that lock file.
  • Some statistical analysis (analytics) on social network and 'similar' interest / usage pattern data, like collaborative filtering. How to build product recommendations for user groups, fully automatic target demographic & individual target determination based on existing usage pattern & interest data, etc. Yes, this can be used for so many different purposes: fraud detection, marketing, classification, the options are endless.
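The random-token compare-and-set approach and the server-side "move N from A to B" call from the distributed locking bullet can be sketched together. A minimal sketch, assuming sqlite3 as the server-side store; the accounts table and all function names are hypothetical. Tokens are random (secrets.token_hex) rather than a counter, so a client can't guess the next valid token, and an update carrying a stale token simply matches zero rows.

```python
import secrets
import sqlite3
from typing import Optional, Tuple


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY,"
                 " balance INTEGER NOT NULL, token TEXT NOT NULL)")
    return conn


def create(conn: sqlite3.Connection, name: str, balance: int) -> str:
    token = secrets.token_hex(16)
    conn.execute("INSERT INTO accounts VALUES (?, ?, ?)", (name, balance, token))
    return token


def read(conn: sqlite3.Connection, name: str) -> Tuple[int, str]:
    row = conn.execute("SELECT balance, token FROM accounts WHERE name = ?",
                       (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0], row[1]


def compare_and_set(conn: sqlite3.Connection, name: str, new_balance: int,
                    expected_token: str) -> Optional[str]:
    """Client-driven CAS: succeeds only if nobody changed the row since it was read."""
    new_token = secrets.token_hex(16)
    cur = conn.execute("UPDATE accounts SET balance = ?, token = ?"
                       " WHERE name = ? AND token = ?",
                       (new_balance, new_token, name, expected_token))
    return new_token if cur.rowcount == 1 else None


def move(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Server-side 'move N from A to B': one call, one transaction, no client round trips."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        src_balance, _ = read(conn, src)
        read(conn, dst)  # raises KeyError if the destination does not exist
        if amount <= 0 or src_balance < amount:
            raise ValueError("invalid or insufficient amount")
        for name, delta in ((src, -amount), (dst, amount)):
            conn.execute("UPDATE accounts SET balance = balance + ?, token = ?"
                         " WHERE name = ?", (delta, secrets.token_hex(16), name))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The CAS path needs a read plus a write per change, so every retry costs extra round trips; move does the same work in a single request, which is the whole point when latency is 200+ ms.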

Keybase FS, IPX, Zero Knowledge Proofs, Security Systems sensors psychology, CPU Load

posted Mar 19, 2016, 11:10 PM by Sami Lehtinen   [ updated Mar 19, 2016, 11:14 PM ]

  • Checked out Keybase Filesystem. Pretty neat stuff. Nothing really new, but the tight integration of encryption and identity management is new.
  • Quickly reminded myself about IPX when talking with friends about legacy stuff.
  • Nice post, Top 10 Python idioms I wish I'd learned earlier.
  • Zero Knowledge Proofs: An illustrated primer - Yet another awesome post link from my backlog.
  • Reread Wikipedia articles: Computational creativity, Automated reasoning, Decision support system, Evolutionary computation, Cognitive Network, Security Information and Event Management (SIEM).
  • Security vs Surveillance by Bruce Schneier - This is an interesting topic. I can't guess what's coming; it remains to be seen. Attitudes in Europe have also been changing lately for multiple reasons.
  • Played a little with friends with ultrasonic, radar and infrared motion detectors in different combinations, looking at what kind of measures can be used for detection evasion, as well as which measures could be used to trigger false alarms remotely, without entering the monitored perimeter, to desensitize the target. Play, study or experimentation? All the same stuff. Basically it means knowing exactly how things actually work in different kinds of situations, instead of just reading the usually bad documentation, as well as knowing the timings and signals so well that you know when there's an anomaly. Most sensors do not send back raw sensor data, which means that most of the information which would be useful for later analysis isn't available. Which is sad or great, depending on which side you're looking at it from. Having full sensor data could easily reveal that the system is being manipulated by some external energy source triggering it. Of course the most advanced devices could also be protected against these kinds of attacks. Many of the attacks can also be used to blind the sensor: even if it triggers, it keeps triggering. Depending on how interested the security staff are, they might even leave the whole system disabled, because they can't disable individual sensors or get the system to work due to invisible remote triggering. Unfortunately I've seen that happen too. Nobody bothers to troubleshoot a malfunctioning system in the middle of the night, especially if it happens repeatedly and the staff debugging it during the day can't find anything wrong with it. How unsurprising is that? Ha. Do they realize that they're being played? Most likely not. Do they realize that if the security system isn't working, they should get extra staff on hand and do continuous patrols? Most likely not. Or maybe they do, but do they actually do it? Nope.
  • I guess you've noticed that some posts are seriously out of order in my blog, and some stuff has been delayed by months or years if it has some 'timely meaning'. I might write something down when it happens, but publish it only months or years later, when it's already general and published information.
  • Caught up on a few issues of The Economist. Great stuff, over and over again!
  • Something different: more stealth fighters, Mitsubishi X-2 Shinshin, KAI KF-X, Boeing F-15SE Silent Eagle, JL-2. Defense systems: Terminal High Altitude Area Defense (THAAD).
  • Julia Evans' post about CPU Load Averages - Well, I don't completely agree. These are very complex topics, and whenever you write anything shorter than a book about them, it's probably more or less wrong. That's the problem with simplifications. Everyone naturally loves them, because getting to the root of complex matters is very time consuming and requires superb care; otherwise it's just an estimate and more or less wrong. This is the issue I've been bringing up with almost every ICT related article. Measuring things like memory or other resource consumption is an inherently complex matter. Take a CPU with 8 threads: when you run 4 threads and CPU load is "50%" across all threads, doubling the workload doesn't actually bring it to the 100% level. It probably brings it near 100%, but the truth is that the amount of work the CPU should get done in that time is already way over 100%, because the remaining 4 threads aren't nearly as efficient at executing the tasks as the first 4 threads. And so on: memory bus blocking, shared caches, etc. There are multiple reasons why simple kitchen math just won't do. In some cases adding tasks can also drastically drop performance, like with HDDs. Reading one file gets you 100MB/s; if I add a second reader, should I get 100MB/s, 200MB/s or 50MB/s? The reality is that you could be getting something like 30MB/s in some cases, which means that adding more parallelism just made the performance much worse.
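On the motion detector experiments: if you did have raw sensor output, even a very simple statistical baseline would make externally induced triggering visible. This is a sketch, assuming access to raw numeric readings (which, as noted, most sensors don't expose) and a quiet-period baseline; the `anomalies` function, the 3-sigma threshold and the sample values are all hypothetical.

```python
from statistics import mean, stdev


def anomalies(baseline, readings, k=3.0):
    """Return indexes of readings more than k standard deviations from the baseline mean.

    `baseline` is raw sensor output recorded during known-quiet periods;
    `readings` is new raw output to check against it.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    return [i for i, x in enumerate(readings) if abs(x - mu) > k * sigma]


quiet = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]   # hypothetical quiet-period levels
live = [10, 11, 48, 10, 9, 52, 10]                # spikes with no visible intruder
print(anomalies(quiet, live))                     # [2, 5]
```

A plain z-score like this catches crude spikes. Spotting a subtle remote trigger would need the timing and signal-shape knowledge described above, which is exactly what a triggered / not-triggered output throws away.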
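The HDD example can be reproduced with a back-of-the-envelope model. It uses the 100MB/s sequential rate from the text. The 0.5 MB chunk size and 10 ms seek time are illustrative assumptions, and the model ignores caching, readahead and command queuing. With two or more readers, the disk head has to alternate between files, so every chunk pays for a seek.

```python
def hdd_throughput_mb_s(readers, seq_mb_s=100.0, chunk_mb=0.5, seek_s=0.010):
    """Aggregate MB/s when `readers` sequential streams share one spinning disk.

    Simplified model: one reader streams with no seeks. With two or more
    readers the disk alternates between streams and pays one seek before
    every chunk. Caching, readahead and command queuing are ignored.
    """
    if readers <= 1:
        return seq_mb_s
    transfer_s = chunk_mb / seq_mb_s              # time to read one chunk
    return chunk_mb / (transfer_s + seek_s)       # every chunk also pays a seek


for n in (1, 2, 4):
    total = hdd_throughput_mb_s(n)
    print(f"{n} reader(s): {total:.1f} MB/s total, {total / n:.1f} MB/s each")
```

Under these assumptions, two readers get about 33 MB/s combined instead of 100, which matches the "something like 30MB/s" above. The chunk size matters a lot: with 1 MB chunks the same model gives 50 MB/s. That's why bigger readahead or serializing the readers can beat naive parallelism.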

32c3 comments, random ramblings, thoughts, notes, dump part VIII

posted Mar 19, 2016, 6:30 AM by Sami Lehtinen   [ updated Mar 19, 2016, 6:36 AM ]

Online banking and TAN numbers & security. NONCE / OTP code. Banking transaction MitM weaknesses. I've blogged about this several times. Security measures: anti-debugging, device-fingerprinting, string-encryption, packed, library, anti-hooking. Root kit detection. High level system programming. Automatic Optimizations. Quantum Cryptography. Conspiracy Theories. Quantum Mechanics. Key Distribution. Position-Based Cryptography. Quantum Bit. Polarization of a Photon. Qubit. Rectilinear and Hadamard bases. Measuring Collapses the Quantum State. Wonderland of Quantum Mechanics. Quantum random number generator. Quantum Communication vs Quantum Computation. Efficient classical attack or efficient quantum attack against AES, SHA, discrete logs, hash-based signatures, McEliece, lattice-based crypto and Quantum Key Distribution (QKD). Quantum Hacking. Position-Based Cryptography. Position Verification. Distance Bouncing. Attacking Game. No-Cloning. EPR Pairs, entangled Qubits. Or "spooky action at a distance" as they say. Quantum Teleportation. No-Go Theorem. Technology and Mass Atrocity Prevention, Automatic Social Network Analysis. The Ultimate Amiga 500 talk... Love. Really great talk about Amiga 500 hardware and design. Even if it was really retro stuff, that was a time when things were relatively simple. The architecture of a street level panopticon. Street level surveillance is mass surveillance. Prevent, Expose, Empower. Centrally collecting data from private networked security cameras. Do you want cloud video monitoring for your home? Especially one which stores all video data in the 'official service provider's cloud', where you don't have control of it. Automated mobile fingerprinting programs and biometric retina & iris scans with small devices comparable to current cell phones, all in the name of public safety. Hand telemetry, scar recognition, tattoo recognition, etc. Voice & face recognition. Automated License Plate Readers (ALPR), which also photograph the driver, passengers and the car itself. 
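The quantum notes list qubits, rectilinear and Hadamard bases, measurement collapsing the state, and QKD. BB84 is the classic key distribution protocol built from exactly those pieces. The notes don't name it, so treat it as an illustrative pick. Below is a toy classical simulation: bits and bases are plain Python values ("+" rectilinear, "x" Hadamard), no real qubits are involved, and `bb84` and its parameters are hypothetical. Measuring in the preparation basis returns the encoded bit, and measuring in the other basis returns a random bit. An intercept-resend eavesdropper therefore leaves errors in the sifted key, and in theory the error rate should come out near 25%.

```python
import random


def bb84(n_bits, eavesdrop=False, seed=1):
    """Toy BB84 sifting with rectilinear (+) and Hadamard (x) bases.

    Returns (number of sifted bits, error rate in the sifted key).
    """
    rng = random.Random(seed)

    def measure(bit, prep_basis, meas_basis):
        # Same basis: deterministic result. Other basis: random outcome,
        # and the state collapses into the measured basis.
        return bit if prep_basis == meas_basis else rng.randint(0, 1)

    alice_bits = [rng.randint(0, 1) for _ in range(n_bits)]
    alice_bases = [rng.choice("+x") for _ in range(n_bits)]
    bob_bases = [rng.choice("+x") for _ in range(n_bits)]

    bob_bits = []
    for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases):
        basis = a_basis
        if eavesdrop:                          # intercept-resend attack
            e_basis = rng.choice("+x")
            bit = measure(bit, basis, e_basis)  # Eve's measurement collapses the state
            basis = e_basis                     # Eve resends in her own basis
        bob_bits.append(measure(bit, basis, b_basis))

    # Sifting: bases are compared publicly, only matching positions are kept.
    keep = [i for i in range(n_bits) if alice_bases[i] == bob_bases[i]]
    errors = sum(alice_bits[i] != bob_bits[i] for i in keep)
    return len(keep), errors / len(keep)


for eve in (False, True):
    kept, qber = bb84(10_000, eavesdrop=eve)
    print(f"eavesdropper={eve}: {kept} sifted bits, error rate {qber:.1%}")
```

About half of the bits survive sifting. Without an eavesdropper the sifted key has no errors; with one, about a quarter of the bits disagree. That's the measurable disturbance that QKD relies on, and what quantum hacking attacks try to hide.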
Centralized video surveillance intelligence centers. ALPRs are also used to collect information about who goes where and when, where they stay, etc. There are cars equipped with mobile ALPRs, so they can easily collect information about who's attending certain events and so on. Many video surveillance companies hand over video data to law enforcement on a voluntary basis. Cell site simulators like StingRay II can be used for extensive surveillance operations based on mobile phones, and also to inject malware into mobile phones. Anti Facial Recognition Makeup. Ha. Dazzle camouflage for your face. WiFi & Bluetooth device address identifier collection (BD_ADDR / Bluetooth MAC Address). Prediction and Control by Jennifer Helsby, Watching Algorithms. Human decision making is slow, biased and not transparent. Algorithmic machine learning, automated tasks, distilling huge volumes of data. Algorithms and machine learning can have serious implications. Collect: financial records, criminal history, driver's license history, driving history, travel history, medical history, purchase history, social media posts and likes, social network analysis and so on. The list goes on. It's also easy to forget that these activities are usually highly interlinked, so even if you don't have all the data, you can infer some areas pretty reliably from other data. This can be used to build a citizen score, insurance score, credit score and so on. Employability score? Hidden imprinting of people. How do we assure privacy, fairness and transparency? Predicting actions from historical data on an individual level. Nothing new: if data is available, it will be analyzed and utilized. Whether it's about health, insurance, crime, jet engine or building maintenance makes basically no difference whatsoever. Of course there are data quality risks when systems are being trained. Cross-correlation can lead to sensitive data being learned indirectly, even if it isn't actually on the learning task list. False positive flagging issues. 
Of course data can be used for less vague purposes like targeted advertising etc. Who's in control of machine learning systems? Nobody? De-anonymizing 'anonymous' users. Filter bubble, where everything is personalized for everyone, but is that right? What are the downsides? Do you vote for something because algorithms thought it would be the right thing to do and fed you related propaganda or 'selective information'? - Say hi to your new boss: How algorithms might soon control our lives. Theory, Algorithms, Machine Learning, Big Data, Consequences for Machine Learning, Use of Algorithms Today and in the Future. Discriminating people with machine learning & algorithms. Creating persistent user identities by (accidental) de-anonymization. Strategies for Handling Data Responsibly. Data vs Model. Handling model discrepancy aka epsilon. Systematic errors, signal noise, hidden unknown variables not getting analyzed. Data Volume. Incorporate variables from hidden data into the model, reducing error. Decision Tree Classifier vs Neural Network Classifier. Low risk uses like personalization of services and recommendation engines, or individualized ad targeting, customer rating / profiling and consumer demand prediction. There are also medium risk uses like personalized health, person classification (crime, terrorism and other 'features'), autonomous cars / planes / machines, automated trading, service requests etc. And high risk cases like military intelligence / intervention, political oppression, critical infrastructure services and life-changing decisions (about health). Examples of data "mishaps". Discriminating people with algorithms. Humans can be prejudiced, but are algorithms better? Protected attributes like ethnicity, gender, sexual orientation, religion. Replacing a manual hiring process with an automated one? It would save a lot of time screening CVs by hand and probably result in improved candidate choice, basically a decreased error rate. Training a predictor. 
Support Vector Machine. Test Sample Data. Automatic vs automatic classification. Information leak. Using machine learning to de-anonymize data. Data bucket analysis. Using 75% of the data as the training set and 25% as the test set. Measure prediction success probability and identify users. All of this using very naive and simple approaches without fine tuning or optimization. Grouping similar users, and so on. Where do you work, who do you hang out with in your free time? The more data we have, the more difficult it is to keep algorithms from directly learning and using object identities instead of attributes. Our data follows us around! It's really hard to create a new identity which wouldn't be linked in some way to the old one, because you'll most probably have some attributes which still uniquely identify you even if you're using a new ID. Data Scientists / Analysts / Programmers. Train data scientists in the safety and risks of data analysis. Do not blindly trust decisions made by algorithms. Collect and analyze algorithm-based decisions using collaborative approaches. Create better regulations for algorithms and their use. Force companies / organizations to open up black boxes. Overtraining and overfitting let you waste a lot of resources while making the results only insignificantly better.
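The de-anonymization point above (75% training data, 25% test data, naive approach, identify users) can be shown with a deliberately naive sketch. It assumes hypothetical per-user event lists (think location check-ins) and a chronological split. The `identify` function and the data are made up for illustration, not taken from the talk. Each user's first 75% of events forms a frequency profile, the last 25% is treated as an "anonymous" trace, and each trace is assigned to the profile it overlaps with most.

```python
from collections import Counter


def identify(events_by_user, train_fraction=0.75):
    """Naively re-identify users from their later behaviour.

    Returns the fraction of 'anonymous' traces matched to the right user.
    """
    profiles, traces = {}, {}
    for user, events in events_by_user.items():
        cut = int(len(events) * train_fraction)
        profiles[user] = Counter(events[:cut])   # "known" history
        traces[user] = Counter(events[cut:])     # "anonymous" later activity

    hits = 0
    for true_user, trace in traces.items():
        # Counter & Counter keeps the minimum count of each shared event.
        guess = max(profiles, key=lambda u: sum((profiles[u] & trace).values()))
        hits += guess == true_user
    return hits / len(traces)


data = {                                          # made-up check-in history
    "user1": ["office_a", "gym", "cafe_x"] * 8,
    "user2": ["office_b", "pub", "cafe_x"] * 8,
    "user3": ["office_a", "library", "park"] * 8,
}
print(identify(data))                             # 1.0
```

Even with shared places like `office_a` and `cafe_x`, the combination of where you work and where you spend free time is enough to pick out every user. That's the "our data follows us around" problem in miniature.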

Yeah, this was finally the last post about 32c3. Phew.
