My personal blog about stuff I do, like, and am interested in. If you have any questions, feel free to mail me!

Peewee: Joining two tables without subquery. Is it possible?

posted Jul 15, 2014, 8:58 AM by Sami Lehtinen   [ updated Jul 16, 2014, 9:00 AM ]

I just got a very simple question: how do I rewrite this statement so that it uses only a join, without a separate subquery? Should be easy? But I'm not experienced enough with peewee to get it done trivially. I've tried multiple times, and I just can't seem to figure out how it should be done. That's why I want to know if it's possible at all. It should be?

read = Status_t.select( Status_t.thread ).where( Status_t.usr == current_user )
query = ( Thread_t.select( Thread_t, Status_t )
          .order_by( Thread_t.time.desc() )
          .join( Status_t, JOIN_LEFT_OUTER )
          .where( ( Status_t.thread == None ) |
                  ~( Status_t.thread << read ) |
                  ( ( Status_t.last_read != Thread_t.comment ) &
                    ( Status_t.uid == current_user ) ) ) )

Should be trivial, right? Somehow I just can't seem to quickly figure out how it's done. Actually, creating that horrible statement took quite a while. Without the peewee ORM this would of course have been trivial, but I wanted to use the ORM in the first place.
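For reference, at the plain SQL level the shape I'm after would be folding the per-user subquery into the LEFT JOIN's ON clause. A minimal sqlite3 sketch (the table layout and column names are simplified from the actual models, so they're illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE thread (id INTEGER PRIMARY KEY, comment INTEGER, time REAL);
    CREATE TABLE status (thread INTEGER, usr INTEGER, last_read INTEGER);
    INSERT INTO thread VALUES (1, 10, 1.0), (2, 20, 2.0), (3, 30, 3.0);
    -- user 1 has fully read thread 1 and partially read thread 2
    INSERT INTO status VALUES (1, 1, 10), (2, 1, 15);
""")

# Moving the per-user condition into the join's ON clause replaces the
# separate "read" subquery: unread threads either have no matching status
# row at all (status.thread IS NULL) or a stale last_read pointer.
rows = con.execute("""
    SELECT thread.id
    FROM thread
    LEFT JOIN status ON status.thread = thread.id AND status.usr = ?
    WHERE status.thread IS NULL OR status.last_read != thread.comment
    ORDER BY thread.time DESC
""", (1,)).fetchall()
print([r[0] for r in rows])  # [3, 2]: thread 3 never read, thread 2 stale
```

In peewee terms this would correspond to passing the extra condition via the join's `on` argument instead of a separate `where`-subquery, if the installed version supports it.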

KW: Python, sql, database, databases, tables, table, query, queries, join, joining, joins.


posted Jul 12, 2014, 3:45 AM by Sami Lehtinen   [ updated Jul 12, 2014, 3:46 AM ]

Yet another authentication service. The world is full of these services, but I haven't yet found one I would really love. Let's see if this could be the solution I could use for sites with lower security requirements. Why did I say lower security? The main problem, as I have described earlier, is that a mobile phone actually can't be used for high-security authentication, because it's a programmable computer itself. Especially in cases where keys are stored in the phone itself and are (directly) accessible via the operating system.

First impression: the QR code login immediately reminds me of SQRL (Secure Quick Reliable Login), which has been developed a lot lately. Basically this is an okay-ish solution. It works fine if you're not truly paranoid and you're using a desktop to access systems and a mobile phone for two-factor authentication. But there's an immediate problem when the mobile phone itself is used to access these services: the benefit of two-factor authentication is instantly lost, because the same device is used both for authentication and for accessing the sites. Because mobile users are taking over the web, this is a more and more likely scenario. And after all, it's only as good as any 'authentication application' on the same device.

This list is taken directly from SQRL's page. (20140703)

Among the problems we have solved to create a practical solution, are:

 * How are identities backed up and/or cloned to other devices?
 * What about logging into a website displayed on the smart phone's own browser?
 * What if the smart phone that contains my identity is lost or stolen?
 * What about password protecting logins on the phone?
 * What if the phone is hacked?
 * What about different people (and identities) sharing one phone?
 * What about having multiple identities for the same website?

I don't have great answers for those, except that password-protecting logins on the phone will make the experience even worse. First a huge password to unlock the phone, then an even bigger password to unlock the passwords, ehh, logins / authentication information, etc. I guess this is just the reason why most people don't care about security at all. They just use the same simple password on every site, or don't use any passwords / PINs on the device at all, if possible. On the other hand, logging in to a site from an authentication application directly with a single click is a very user-friendly method. So in this sense they're right for sure; Saaspass is naturally a lot more secure than passwords. Although almost any solution which uses non-static random passwords is very secure compared to static and non-random passwords (the usual reference case). Getting rid of password resets is just great as well. Many sites seem to have high security, yet they still allow password resets using email, which is naturally a major fail.

So I can agree with this quote from their web pages: "This is the one authentication system that is actually easier to use than traditional login / password conventions and much more secure." When that great feature is combined with the management portal, it's an absolutely great solution. Everyone at home is suffering from logins & passwords, but businesses are suffering a lot more. Basically you have tens of different credentials, and to cut down the work of maintaining them there are usually shared accounts, which leads to a near-total loss of auditing and security. Because passwords might be widely shared and learned, changing such passwords is painful, because everyone will be complaining afterwards. I've blogged about this earlier. In some cases the credential issuer doesn't even know which entities are using the password, so if you go and change it, everyone will be very unhappy. In some cases this might even break automated integrations, etc. So having an efficient way to manage personal credentials is the way to go.

I personally think that a 4-digit PIN code is way too short. As far as I know, the data isn't protected by something like the SIM card, so it's entirely possible to copy the credential storage and run an off-line attack against it. In such a case 4 digits is 'nothing', even with key stretching. Due to the limited processing power of mobile phones, heavy key stretching isn't a great option either, so a much longer password is required in such cases to derive a cryptographically feasible key. Of course, if we can assume there won't be off-line attacks, 4 digits is still a bit so-so. It's worth noting that this isn't only the default setting; they do not allow a stronger password than the 4-digit PIN.
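To illustrate the off-line attack: a 4-digit PIN keyspace is only 10,000 values, so an attacker who has copied the credential storage can simply try them all. A hedged Python sketch (the salt, iteration count and stored-key format here are made up for illustration, not Saaspass's actual scheme):

```python
import hashlib

def derive_key(pin: str, salt: bytes) -> bytes:
    """Stretch a PIN into a key with PBKDF2 (parameters are illustrative)."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode(), salt, 50_000)

salt = b"example-salt"                 # would be read from the stolen storage
stored_key = derive_key("0042", salt)  # the victim's "unknown" PIN, for demo

# Off-line attack: the whole 4-digit keyspace is just 10,000 candidates,
# so even heavy key stretching only slows this down to minutes at worst.
cracked = next(f"{i:04d}" for i in range(10_000)
               if derive_key(f"{i:04d}", salt) == stored_key)
print(cracked)  # prints "0042"
```

With a long passphrase instead of 4 digits, the same exhaustive search becomes computationally infeasible, which is the whole point of the complaint above.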

I personally don't like the profiles feature. I think an authentication application should be used to authenticate users, not to manage (any) other user data. Yet I can see situations where people would find this feature beneficial. The great thing is that nobody forces you to use it. The Android mobile app didn't allow me to delete profiles, or I just couldn't find the option. Anyway, they also provide a link to the web portal, where deleting profiles is trivial. Unfortunately the web portal isn't mobile optimized, which was a surprise to me. I would have expected a light, fast-to-use and naturally mobile-optimized site.

So, back to Saaspass. They provide an application for Mac & PC as well as mobile authentication applications for iPhone (iOS), Android phones and Windows Phone. The first impression of the actual application is: wow, they have done so much work to get everything to this point. I also liked very much that the application didn't require excessively wide access rights (permissions). The list of supported authenticators is awesome too, so there's no need for the user to fine-tune TOTP / HOTP / OATH / RFC 6238 parameters. - I just wish more sites would actually support the QR code based login.
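Those authenticator parameters boil down to a small algorithm: TOTP (RFC 6238) is just HOTP (RFC 4226) driven by a 30-second time counter, and can be sketched with the Python standard library alone:

```python
import hmac
import struct
import time

def totp(secret, for_time=None, step=30, digits=6, digestmod="sha1"):
    """RFC 6238 TOTP: HMAC over the time-step counter, dynamically truncated."""
    now = time.time() if for_time is None else for_time
    counter = struct.pack(">Q", int(now // step))
    mac = hmac.new(secret, counter, digestmod).digest()
    offset = mac[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 Appendix B test vector: SHA-1, T=59 -> 94287082
print(totp(b"12345678901234567890", for_time=59, digits=8))  # 94287082
```

Any authenticator app implementing this correctly will produce the same codes, which is exactly why the "supported authenticators" list can be so long.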

I tried SaasPass with Facebook just to see how things work out. But I assume this is a very nice solution for securing google apps (drive, etc) as well as Dropbox business logins as well as office 365, which has taken many business environments by storm. I almost forgot Salesforce, but I haven't ever used it, so it's easy to forget.

*) See next entry. As for SMS PIN two-factor authentication, I find it quite annoying, so there's naturally room for improvement. Using mobile authentication in general won't solve this problem, because mobile security itself partly gets in the way. Now I first need to open the password container, then look up the login name & password. After that I have to fill in the login form. Then I'll receive the SMS. Then I have to enter a complex password to unlock the mobile phone, look up the two-factor password, enter that password on the computer, see that the login is successful, and then delete the SMS message. - That's very annoying. I often wonder whether logging in is really worth it, because logging in itself is so annoying. Yes, security might be high, but especially for sites you might like to log in to often, it's not fun at all. Actually this is one of the reasons I try to log in to some sites only weekly or on weekends.

But I can confess I'm using Mobiilivarmenne. And I think they have done very much work to make it actually as secure as possible on modern mobile phones. Yet its usability is almost as bad as the previous item. First you'll need to enter your phone number on the web site, then you'll have to unlock your phone. Yes, that's hard work if you use proper passwords, not four-digit PINs or some silly shapes or so. You'll need to wait for the authentication token. Then you'll need to give the private key unlock PIN code to sign the token, and then wait for the signed token to get delivered back to the server. Then the server acknowledges to your browser that it has received the token. Then the browser asks whether you're sure you want to pass this authentication token on to this service; then click yes. And yeah, now you're done! Very simple, right? Well, not at all; slow and annoying. But at least it's very safe, as far as I know, because Mobiilivarmenne is a lot safer than 99% of so-called mobile authentication systems. It stores your private key on the SIM card and requires a PIN (not the same as the regular SIM PIN) for access. - I'm only curious whether it's still possible to steal the PIN with modified firmware on the phone. That would also probably allow signing requests so that the user doesn't know about it at all. It's using SIM Application Toolkit features.

They also provide a simple integration API for logging in, with a login URL (POST) and instant registration. This is where the prefilled profile information comes in handy. Many sites could provide easy login / account creation, but it's actually the collection of user data which makes registration so painful. When the data already exists in the authentication application, the registration process can be shortened greatly or, in the best case, completely automated.

See: Saaspass FAQ. I liked their FAQ because it doesn't contain bogus claims and also includes information about potential, though not so likely, downsides.
Also see: Developer page - there's just what I wanted to know: what kind of data is passed on when you register or log in / sign in / sign on. I didn't try to create a service which would use Saaspass, but integrating it should be pretty trivial if required.

I personally do like very compact, straight-to-the-point documentation, but that alone isn't enough. Basically with that documentation you'll just have to try and see what the output is. I guess it's not hard to get the thing working properly from that, but having it explicitly stated, instead of guessing from field names and data, is always better. (Although I'm way too used to guessing.)

After using Mobiilivarmenne, it became clear that they had thought about many things that weren't clearly stated in the documentation. In the final user experience you'll notice that many things I had earlier (before actually using the application) speculated could lack some vital element were in fact covered.

More sites should support the QR code login. The TOTP authenticator solution can be used with 'any other similar applications'.

Last question: would I use it? Yeah, why not. Looks good. Chances are I'm not going to use it, because I don't like 'extra apps'. But there isn't any particularly good reason why I wouldn't recommend this application to businesses and individuals looking for an authentication solution with medium security requirements. And let me point out that this is a high-security solution for normal users; my personal high-security rating means something that is tinfoil paranoid and NSA proof. ;)

Tags: Saaspass, OTP, TOTP, SSO (single sign-on), login, log-in, passwordless, secure, authentication, review, just a few of my thoughts.

PyPy3, X-Road Europe, Google Dataflow, PyData, Google I/O, VLIV

posted Jul 12, 2014, 3:40 AM by Sami Lehtinen   [ updated Jul 12, 2014, 3:40 AM ]

Facebook, (FRA, NSA, BND), Sea cable, Algorithms, Stale lock, SSL/TLS, BI, OpenStreetMap/Nominatim

posted Jun 23, 2014, 12:09 AM by Sami Lehtinen   [ updated Jun 25, 2014, 12:06 AM ]

  • One research study claimed that people don't especially trust Facebook and many other web services. That's exactly true. It's one of the reasons why the LclBd service is NOT going to require a user account or any user-identifying information, except a cookie / randomly generated user id. We're not asking for email, name or anything else. We think it's a fair trade-off, because in exchange we're asking for your location information to serve you better with local content. Ars Technica writes that it's possible to snoop network traffic. Well, I did this trivially back in 1995 and nothing has changed since, except the use of encryption. Yet during that time the number of different applications in use has exploded, and it's always possible to find a loophole to slip through.
  • Plan for Finland to build a sea cable to Germany to avoid FRA (+NSA) snooping seems to be pretty much a complete failure on some levels. NSA is also spying in Germany. Great plan! Ok, it was trivial to guess this already. See: BND
  • Decision tree - Checked it out, while implementing Bayesian filtering for LclBd. Bayes Theorem.
  • Carefully studied quarterly cyber security review by Finnish Cyber Security Bureau.
  • Encountered a stale lock with the Deluge BitTorrent client. The client didn't start before I manually deleted the lock file. This is one of the reasons why I implemented my own locking lib: I'm sick'n'tired of stale locks as well as issues that require manual intervention on servers. Things should just work, even if something hasn't worked exactly as assumed.
  • As we know, SSL/TLS certificates are a huge mess. Maybe the Online Certificate Status Protocol could solve the problem by validating certificates online. The current problem is that nobody checks for certificate revocations, and doing so is quite pointless because an attacker can block the checks when that seems to be the appropriate action.
  • Software providers want to push BI services to smaller companies. I think that selling BI systems alone isn't going to do it; there should be clear use cases where there are benefits to be gained. Adopting technology X won't bring anything but costs if it isn't carefully thought out what it will be used for. If suitable data sources are available and there's even one competent, analytical person, it's highly probable that using something simple and efficient like Tableau will bring information insights to the organization. Implementing something rigid and expensive is quite a bad plan.
  • OpenStreetMap/Nominatim reverse geocoding gives strange results at times. Note how this query returned only the house number, with no information about street, city, country, etc. - Actually they deleted the object it was referring to before this post got published, so now it returns the expected information.
  • Something different: Low probability intercept radar, Computer generated holography, Stealth technology, Passive radar, Multistatic radar.

Vacation fun, lot of mixed stuff links I've been reading and studying

posted Jun 17, 2014, 2:03 AM by Sami Lehtinen   [ updated Jun 18, 2014, 2:29 AM ]

BSBTC, ROTACS, Java 8, GAE, Birthday Paradox, Ubuntu, DDoS Mitigation

posted Jun 13, 2014, 4:20 AM by Sami Lehtinen   [ updated Jun 13, 2014, 4:23 AM ]

Summer vacation fun:

The blog backlog is still just growing. I'll try to work through it some rainy day, when I'm in the right mood for it. It now contains 612 entries, aww.

SSDs are immune to file fragmentation? Myth debunked

posted Jun 9, 2014, 10:36 PM by Sami Lehtinen   [ updated Jun 10, 2014, 1:25 AM ]

There was a heated debate on one forum about whether defragging affects SSD file system performance. Because there wasn't any kind of evidence or conclusion, I had to test it.

Each read test was repeated 10 times and the median was picked, and the whole test set was repeated 10 times, including reformatting the NTFS partition and deliberately fragmenting it again for the test. All file system caches were flushed between tests. For the tests I used an 8 GB partition and a 7 GB test file. Read times are in seconds.
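The timing harness itself is simple; here's a hedged Python sketch of the methodology (sequential forward reads only; the cache-flush step is OS specific and shown as a comment):

```python
import statistics
import time

def timed_read(path, block=1024 * 1024):
    """Sequentially read the whole file and return the elapsed seconds."""
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while f.read(block):
            pass
    return time.perf_counter() - start

def median_read_time(path, repeats=10):
    """Median of several runs, as used for the numbers in the table below."""
    times = []
    for _ in range(repeats):
        # Between runs the page cache must be flushed so we measure the
        # drive, not RAM. On Linux (as root): sync; echo 3 > /proc/sys/vm/drop_caches
        times.append(timed_read(path))
    return statistics.median(times)
```

The reversed and random patterns would read the same file block by block in reversed or shuffled offset order via `f.seek()`.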

Let's get to the results. Lower (time) is better.

           Fragmented   Contiguous
Forward      34.7 s       14.5 s
Reversed     35.7 s       31.2 s
Random       36.1 s       19.0 s

And here's a chart for easier readability.

[Chart: SSD file fragmentation effects]

As you can see, SSDs clearly aren't immune to the effects of file fragmentation. As far as I can see, this should have been obvious to everyone immediately, but now I have actual proof. I didn't know how large the performance difference was going to be, but I basically knew there would be one. In this case the performance loss turned out to be over 50%, so defragging can double the read performance of heavily fragmented files.

The file system used for testing was NTFS. But because this comes down to very basic data access pattern principles, I strongly assume similar results can be achieved with any file system.

Discussion as well as some background information is here, see the Google+ post.

KW: SSD, fragmentation, defragmentation, defragment, defrag, file system, ntfs, effects, performance, read speed, read performance, does SSD need, should I, recommended, fact, myth, myths, ext4, data, mb/s, gb/s, megabytes, gigabytes, per second, contiguous, sequential, extents, continuous, discontinuity, filesystem, blocksize, blocks, clusters, sectors, cluster, sector, block, extent, btrfs, zfs, xfs, filefrag, e4defrag, defraggler, solid state drive, drives, flash, nand, mmc, emmc, contig, defragmenter, free space fragmentation, allocation unit, units, seek, crystal disk, performance testing, test, tests, review, reviews, benchmark, benchmarks, endurance, experiment, experiments, diagnostic, diagnostics, tool, tools, toolbox, usb, stick, sticks, raid, tester, utility, software, benchmarking, memory, storage, rate, rated, devices, database, databases, related, startup, optimization, optimized, comparison, compare, electronic disk, solid state disk, harddisk, hard disk, harddrive, harddrives, hard disks, hard drives, false, claims, bogus, true, really, sas, storage system, difference, faster, slower, better, best, HDD.

SSD LBA, SQLite4, Cache contention, Tails mirrors, Chosen key encryption, Stale locks

posted May 25, 2014, 12:39 AM by Sami Lehtinen   [ updated May 26, 2014, 8:57 AM ]

  • An article about SSD LBA - no, it doesn't mean the same thing as traditional LBA on hard drives. This new technology should in some cases boost SSD (write) performance by up to 300% when the drive is quite full. It's done simply by writing data using a different write pattern than before. Based on the paper it's nothing special: they just increase fragmentation and decrease the amount of data written to disk. These are very traditional trade-offs, especially with database WAL and LSFS. They didn't say whether it decreases read speed; even if SSD 'seek times' are low, most drives read random blocks much more slowly than linear data, which means that fragmenting data, especially into very small blocks, will probably reduce read speeds. Just as the fastest database for writes is an ordinary log file - but it's not a database at all. The only problem is that reading from that file is very slow, because you have to scan the whole file to find what you're looking for, or in the worst case several such files.
  • Reminded myself about the SQLite4 design principles. The only thing that makes me a bit worried is that they prefer a single key space, because with SQLite3 it's usually preferable to split data into separate key spaces to improve performance. Instead of putting a billion records in one table, it's much better to split them chronologically, so that three years of data goes into monthly tables. Now we get an average of 27.7 million keys per table, which is a much more manageable amount than a billion keys in one. B-trees tend to get slower when you put a lot of stuff in the same table, so this is a clearly beneficial approach. Total and final performance death hits you hard at the point when the primary key index doesn't fit into system memory anymore; after that, everything is going to be extremely slow.
  • Excellent post about cache contention - those are things most engineers never even consider thinking about.
  • Actually, the Tails project was asking for people to set up mirrors. This is one of the reasons you should ALWAYS check the download signature: nothing prevents a mirror maintainer from delivering you something other than what you were expecting to get.
  • Simple stuff, but nice to actually see it applied: how to encrypt data using a chosen key to get a chosen result when decrypting.
  • When I said that I wanted to use my own locking library because I'm sick'n'tired of stale locks, I wasn't kidding. A few days ago I started to wonder why my Deluge wasn't working anymore. After rebooting and checking logs, I found a stale lock file in the app path. Rebooting or restarting the app didn't fix the problem; it required manual intervention, and that's exactly what I hate so much. It might not sound bad, but when you have tons of servers and just a few of them fail daily with some kind of random stuff, you get really tired of it quickly. Everything should work, and if it doesn't work, a reboot should fix it. If that doesn't fix it, it's absolutely totally broken and I don't want to use it.
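The monthly table split mentioned in the SQLite4 note above can be sketched like this with stdlib sqlite3 (the `events_YYYY_MM` naming scheme is just an example):

```python
import sqlite3
from datetime import date

def monthly_table(d):
    """Key-space split: one table per month, e.g. 'events_2014_06'."""
    return f"events_{d.year:04d}_{d.month:02d}"

def insert(con, d, payload):
    # Each month lands in its own table, keeping every B-tree small enough
    # that its primary key index stays comfortably in memory.
    table = monthly_table(d)
    con.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                "(id INTEGER PRIMARY KEY, day TEXT, payload TEXT)")
    con.execute(f"INSERT INTO {table} (day, payload) VALUES (?, ?)",
                (d.isoformat(), payload))

con = sqlite3.connect(":memory:")
insert(con, date(2014, 6, 13), "a")
insert(con, date(2014, 7, 15), "b")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['events_2014_06', 'events_2014_07']
```

Queries spanning several months then UNION over the relevant tables, and dropping old data becomes a cheap `DROP TABLE` instead of a huge `DELETE`.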

"Random" passwords using DuckDuckGo

posted May 21, 2014, 8:50 AM by Sami Lehtinen   [ updated May 21, 2014, 9:11 AM ]

I was creating some random passwords when it suddenly hit me: hey, these random passwords don't look too random at all. After a little checking, it turned out to be complete snake oil.

Example passwords: [A-Za-z0-9], 12 characters. Should be pretty random, right? Impossible to guess? Well, that isn't true at all.

Assumed amount of entropy and combinations according to GRC's Password Haystacks.

Screenshot with one of the passwords.

So if we have only 3,279,156,381,453,603,096,810 possible combinations, it should be hard to get collisions? Not true again. Here are the results from DuckDuckGo's random password function, using two different computers.

A screenshot of the random password generator on two different computers and browsers.

Here's a sample set of passwords from the service; after all, it's not as random as you might expect.
A8EodAKtoypU, oypUDZk5ugf2, A8EodAKtoypU, DZk5ugf2Xfct, A8EodAKtoypU, 0YNYDsvBLBrL, A8EodAKtoypU, XfctNY4YDsvB, A8EodAKtoypU, A8EodAKtoypU, DZk5ugf2Xfct, 0YNYDsvBLBrL, A8EodAKtoypU, A8EodAKtoypU, A8EodAKtoypU, DZk5ugf2Xfct, A8EodAKtoypU, DZk5ugf2Xfct, A8EodAKtoypU, DZk5ugf2Xfct, oypUDZk5ugf2, XfctNY4YDsvB, 0YNYDsvBLBrL, A8EodAKtoypU.

No, it's not a mistake. It's the real output just as I received it from the service. I guess these are NSA-approved passwords or something.

Tip: always use whatever external random source only as a seed and mutate the result yourself, so the original data isn't used directly. That's one of the reasons Linux doesn't use Intel's random number generator alone.
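As a sketch of that tip: fold the external 'random' value together with local OS entropy, so the external source alone can never determine the final password (the character mapping uses rejection sampling to stay unbiased; names and parameters are illustrative):

```python
import hashlib
import os
import string

ALPHABET = string.ascii_letters + string.digits   # [A-Za-z0-9], 62 symbols

def mixed_password(external, length=12):
    """Derive a password from untrusted external bytes mixed with local entropy."""
    # Hash the untrusted input together with 32 bytes from the OS CSPRNG;
    # even a fully attacker-controlled 'external' can't weaken the result.
    seed = hashlib.sha256(external + os.urandom(32)).digest()
    out, counter = [], 0
    while len(out) < length:
        # Expand the seed into a byte stream with a counter.
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
        for b in block:
            if len(out) == length:
                break
            if b < 248:                 # 248 = 4 * 62: rejection sampling
                out.append(ALPHABET[b % 62])
    return "".join(out)

pw = mixed_password(b"A8EodAKtoypU")    # the 'random' string from the service
```

Calling it twice with the same external input yields different passwords, because the local `os.urandom` contribution changes every time.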

It's time for final Internet wisdom, XKCD, XKCD and Dilbert.

Discussion at Google+.

Failures, Cloud Disaster Recovery, New services, NAT pass-through stuff

posted May 18, 2014, 1:54 AM by Sami Lehtinen   [ updated May 18, 2014, 1:55 AM ]

  • Massive IT fail, and my own thoughts and personal confession. I'm so glad that I haven't done anything like that.
    Well, I have had one very close incident, but I recovered from it so that the customer didn't even notice.
    Once I screwed up production data in one database table. But that's just because I was sick at home, feverish and almost like drunk. Then they applied horrible pressure from the customer's side to make immediate changes straight into production. Well, technically the screwup was really small, only one newline missing. But it got replicated to tons of computers, and the data being collected was affected by it. Of course it was possible to clean that up afterwards, and it didn't stop production, but the cleanup was painful, as usual. The most annoying thing was that I actually noticed my mistake during the run, just by checking the data being processed, and I tried to stop the update at that point, but it was already partially done and had started to get replicated. I just wish I had reversed the order: first check the data and then process it, instead of first starting the process and then checking the data while waiting for it to be processed. A clear fail. There's no reason why I couldn't have done that before actually updating the tables, because I was of course able to run the process in steps. After all, that's the thing that really bugged me personally: a clear failure to verify a key thing. But there's one more key point: the fail I experienced didn't actually have anything to do with the change I made for the customer. It was a change which had been made earlier and just wasn't in production yet. So I did check the things I expected to be worth checking, meaning the changes I made and the things affected by those. But I stepped into a secondary trap, laid in the code base several weeks earlier, which clearly wasn't properly checked at the time that change was made. After all, I could have avoided the problem very easily by checking all data in the processing steps and verifying it before running the final update to production. So this error is very human: hurry, pressure, not feeling well, so let's just get it done quickly and that's it.
- This could be a perfect example for the Mayday / Air Crash Investigation TV show of how to make things fail catastrophically. - Fixing the data issue in the database on the primary server took only about 15 minutes. I'm still quite sure there were hidden ripple effects from this event, which probably meant losing about two days of work indirectly. Having a database backup would have been one solution, or using a test environment, but neither was available due to time pressure and me being at home. Because the production system was live, a backup would have been worthless anyway, because restoring it would have 'rolled back' way too many transactions.
    Yet another really dangerous way of doing things is to remote-connect to a workstation and then open database management software and connect it back to the server. In such a situation it's so easy to accidentally give commands to the server while thinking you're commanding the workstation. Luckily I haven't ever failed with that, but I have often recognized the risk of major failure, and so have my colleagues.
  • Cloud DR and RTO:
    In many cases having the data isn't the problem. If it's in some application-specific format, accessing it can be the real problem when the primary system isn't working. Let's say you're using accounting system XYZ. They provide you an off-line backup option where you get all the data. Something very bad happens and the company / their systems disappear. Great, now you have the data, but accessing and using it is a whole other story. Let's say they used something semi-common, like MSSQL Server or PostgreSQL, and you got gigabytes of schema dump. Nothing is lost, but it's basically inaccessible to everyone. If you have an escrow arrangement, great. Then starts the very slow and painful process of rebuilding a system which can utilize that data. Of course, if you have competent IT staff, they can probably hand-pick the "most important vital records" from the data, but that's nowhere near the level needed for normal operations. So the RTO can be very long, as I said earlier. I'm sure most small customers don't have their own data at all, nor do they have escrow to gain access to the application in case of a major fail.
    Let's just all hope that nothing bad like that happens, because it'll be painful even if you're well prepared. I have several systems where I do have the data and escrow, or even the source. But I assume setting up the system would take at least several days, even in the cases where I do have the source code for the project(s). In some cases the situation could be even much worse. Let's say the service provider was using PaaS, and the PaaS failed and caused the problem. Now you have software based on AWS, App Engine, Heroku or something similar, but the primary platform to run the system isn't available anymore. Yet again, you can expect a very long RTO. But competent staff will get it going at some point, assuming that you have the code and data.
  • Checked out services like: Pandoo "Web operating system", Digital Shadows "Digital attack protection & detection", Wallarm "Threat and attack detection & protection", ThetaRay "Hyper-dimensional Big Data Threat Detection", Divide "BYOD", Cyber Ghost "VPN", and Lavaboom "private email".
  • Studied a few more protocols: PCP and NAT-PMP. Then again, IPv6 should make all these workaround protocols unnecessary. I hope nobody is really going to use NPTv6.
