
Data Protection and Security, PostgreSQL & other databases, etc.

posted Mar 17, 2013, 12:49 PM by Sami Lehtinen   [ updated Nov 23, 2013, 6:49 AM ]
Stuff I have done lately, in random order:
  • Browser cache stats: Hits: 70374 Bytes: 881510044. I think those are remarkable results! Even if I estimate the time using the average round trip and maximum bandwidth, the cache is still saving a lot of time, all of which would have been wasted if I had disabled the browser cache.
  • Studied: Data Retention Directive, Data Protection Directive, ENISA, HIPAA, PCI SSC / DSS / PDA / Cloud and Mobile standards (boring stuff, all of these carry the same message). Maintain secure systems and procedures, monitor and log everything. The documentation doesn't actually tell you how to do it, it simply says you have to do it so that it will pass audits. FIPS does go into much more detail about how things must be done, which cipher suites must be used, etc. In general, these are good guides for waking people up to think about security issues and how bad security often is.
  • Linux 3.8 change log: F2FS was especially interesting, because they have finally got the point that current SSD drives (with static wear leveling) already do block mapping, so a log structure isn't required for that specific reason. NUMA improvements were good stuff too. They also have a sense of humor, because the old NUMA balancing system was called MORON. Removing support for 386 CPUs made me really sad, all the good times. Actually I never owned a 386, I had a 286 and a 486DX. SCTP and B.A.T.M.A.N. stuff was semi-interesting. I don't know if anyone is actually using mesh networking.
  • Wondered how bad the Windows 2008 R2 internal tools are, because you really can't clearly see which part of memory is being used by memory mapped files. It's confusing when the server says that all system memory is used, but you can't see what's using it. With rammap.exe from Sysinternals it becomes immediately clear: memory mapped files are using 'all' memory. Yet, because it's cache, it's not a problem, it can be downsized when required. It's just interesting that memory mapped files aren't shown as cache, as they used to be in older server versions.
  • Fraud profiling: Studied web shop and card payment / invoicing fraud detection theory. How to use Bayesian classification to detect fraudulent transactions.
  • Tested TS seamless RemoteApp support, and it seemed to work fine. It's just funny that it requires modifying .rdp files. I wonder why they haven't put a checkbox for it in the mstsc.exe UI.
  • Wondered about this blog post, which described how Google's two-factor authentication can be bypassed. It seems that security is too hard even for large companies. I would have assumed that this password isn't an alternate password for the whole account, but that it could be restricted to a very small subset of features, like just checking mail or updating Google App Engine applications. This is one of the reasons why I have a separate account with separate credentials and 2FA for every service, even if it makes things a bit slower and more complicated at times.
  • Nice post, how to hire a product manager.
  • Finished reading the PostgreSQL 9.2 manual. Yes, there was quite a lot of reading, but it's a quick task when you know all the background and can just scan through it. There are some areas which I haven't worked with, namely GiST indexes and the queries related to them. But as far as I know, I won't be needing those skills anytime soon. I was most interested in indexing, transactions, data persistence (WAL), the query optimizer and database maintenance tasks. I have seen some very bad SQL query optimizers, and compared to those PostgreSQL was really awesome. The rest of the features unsurprisingly reminded me of SQLite (Vacuum) and MongoDB (Compact), which I have studied earlier. Google App Engine's database doesn't require any maintenance. I also studied Python's shelve library and tested it with about 5 million keys. I found that it becomes very slow after about two million keys, so it's not an option for replacing SQLite3 in local apps. I decided that I have to play with Postgres a bit more, so I now have a test server with PostgreSQL 9.2, node.js, golang and python3 which I use to play with it. Something I haven't used so far is full text search indexes. That's one of the things I think I will need in the near future, so I will continue studying it. After I'm done with this, I'll try Apache Cassandra. I'm not sure if I need Cassandra class databases (yet) for anything. Keywords: eventual consistency, Paxos algorithm, MVCC; PostgreSQL vacuum, vacuum full, and analyze, which updates the statistics used by the query optimizer.
  • Quickly studied LevelDB and found out that I don't have any use for it right now. SQLite3 and shelve can get the job done for me with sufficient speed without installing additional stuff on production servers.
  • Quickly studied the basics of CloudFlare's Railgun. It seems to be a pretty straightforward solution, nothing special after all.
  • Quickly studied IPv6 geolocation databases: IP2Location as well as MaxMind.
  • Studied the Linux kernel 3.9 Cache Target feature, which allows using an SSD drive as a block device cache for other block devices like spinning disks. This is the approach which I like. I don't like higher level solutions where whole files have to be placed either on the SSD or on the traditional disk. What if I have a 30 terabyte file and only a 512 gigabyte SSD drive? It's highly likely that only parts of that 30 terabyte file are regularly accessed, as we all know from the 80/20 rule.
  • Read load balancing without load balancers by CloudFlare and this interesting post about Linode network overhaul.
  • Quickly tried Chef and its recipes for mass Linux server management.
  • Tried a virtual server at DigitalOcean. They seem to provide great bang for the buck, but their web pages do not properly describe security and reliability matters. What kind of disaster recovery plans do they have, are backups stored in the same data center, etc.?
  • Added OpenStack, Apache Cassandra and Apache Hadoop books to my Kindle
  • I think I'm studying the right things: the top three trends for 2013 were mobile devices, HTML5 (single page) applications and cloud solutions.
  • I really need to take my JavaScript skills up a notch. I have a few books on Kindle, but so far other tasks have had higher priority.
  • Quickly checked out the Silent Circle and RedPhone secure phone applications and reminded myself about the current solutions.
  • Read about Carrier grade NAT (CGN) and its shared address space, RFC 6598.
  • To avoid getting overloaded with IT stuff, I read about 9K720 Iskander missiles.
  • I really did think that TLS would discard a few hundred bytes from the start of the RC4 stream, but it doesn't. So it allows attacking RC4 encryption when the same data is sent over and over again many times. Because this RC4 vulnerability has been known for ages, it's quite funny that it's still a problem with TLS. (article) Also see: Lucky13, BEAST and CRIME attacks.
  • Checked out the Salsa20 stream cipher as well as its ChaCha variant and Curve25519 ECC. The Salsa design seems to be very simple. I guess one of the most important things is the values used in the different rounds. Are block ciphers in Output Feedback (OFB) mode safer? OFB mode allows block ciphers to be effectively used as stream ciphers.
  • DRY, Don't repeat yourself. I have seen tons of documentation, bad documentation, which didn't answer essential questions. Why is documentation so crappy? Because integrators often answer the specific questions using private email. Afaik, the only correct method would be updating the documentation to answer all possible questions. Then they wouldn't need to answer the same questions over and over again. I personally hate it.
  • I have seen incredible server admins. Does this sound like a plan to you? First make sure that you have a public IP. Then disable all possible firewalls. Then enable the guest account and make all printers and network shares accessible using the now enabled guest account. Perfect Windows server and networking security, right? Ehh.
  • The Internet is a surveillance state, by Bruce Schneier. Are you surprised? You really shouldn't be, this has been known for a long time. I personally don't like any system which leaks any information which I didn't ask to be shared. It's hard to find applications which do not leak information. Almost all high level applications leak more or less. Modern operating systems can "leak" data at any time by storing it to swap etc., where it can later be discovered even if you used encryption when saving the file. The list goes on: browser headers leak information. My desktop computer has always been unique whenever I have run any browser fingerprinting tests. So that's also a serious leak, as are super cookies, etc. The list is endless.
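
The browser cache estimate from the first item can be sketched as simple arithmetic. The hit and byte counts are the ones quoted above. The 50 ms round trip and the 10 Mbit/s link are purely hypothetical values picked for illustration, so plug in your own numbers.

```python
# Figures quoted in the browser cache item above.
CACHE_HITS = 70374
CACHE_BYTES = 881510044


def time_saved_seconds(hits, total_bytes, rtt_s, bandwidth_bps):
    """Optimistic estimate of time saved by the cache.

    Assumes exactly one round trip per cache hit and that every byte
    would have been transferred at the full link bandwidth.
    """
    latency_cost = hits * rtt_s
    transfer_cost = total_bytes * 8 / bandwidth_bps
    return latency_cost + transfer_cost


# Hypothetical link: 50 ms average round trip, 10 Mbit/s downstream.
saved = time_saved_seconds(CACHE_HITS, CACHE_BYTES,
                           rtt_s=0.050, bandwidth_bps=10_000_000)
print(f"Estimated time saved: {saved / 60:.1f} minutes")
```

With these assumed values the round trips dominate: latency accounts for about five times as much of the estimate as the actual transfer.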
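
For the fraud profiling item, here is a minimal sketch of what a naive Bayes classifier for transactions could look like. The feature names and the tiny training set are invented for illustration only. A real system would need far more data and properly chosen features.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny categorical naive Bayes classifier with Laplace smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # label -> (name, value) counts
        self.values = defaultdict(set)              # feature name -> values seen

    def train(self, features, label):
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[label][(name, value)] += 1
            self.values[name].add(value)

    def log_scores(self, features):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            for name, value in features.items():
                seen = self.feature_counts[label][(name, value)]
                distinct = len(self.values.get(name, ()))
                # Laplace smoothing avoids log(0) for unseen combinations.
                score += math.log((seen + 1) / (count + distinct))
            scores[label] = score
        return scores

    def classify(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


def tx(country_match, amount, new_account):
    """Build a hypothetical transaction feature dict."""
    return {"country_match": country_match, "amount": amount,
            "new_account": new_account}


# Invented toy training data, not real fraud statistics.
model = NaiveBayes()
for features in [tx("yes", "low", "no"), tx("yes", "high", "no"),
                 tx("yes", "low", "yes"), tx("no", "low", "no")]:
    model.train(features, "ok")
for features in [tx("no", "high", "yes"), tx("no", "high", "no"),
                 tx("yes", "high", "yes")]:
    model.train(features, "fraud")

print(model.classify(tx("no", "high", "yes")))
print(model.classify(tx("yes", "low", "no")))
```

Working in log space keeps the scores numerically stable when many features are multiplied together.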
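
The shelve test from the PostgreSQL item could be reproduced with something like the sketch below. This is not the original test script: the key format, the batch size and the small key count are assumptions. The results also depend heavily on which dbm backend Python picks on your system, so raise the key count gradually.

```python
import os
import shelve
import tempfile
import time


def benchmark_shelve(total_keys, batch_size):
    """Insert keys into a temporary shelve file and time each batch.

    Returns a list of (keys_written_so_far, seconds_for_this_batch).
    """
    timings = []
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "bench")) as db:
            for start in range(0, total_keys, batch_size):
                end = min(start + batch_size, total_keys)
                t0 = time.perf_counter()
                for n in range(start, end):
                    db[f"key{n}"] = n  # hypothetical key format
                timings.append((end, time.perf_counter() - t0))
    return timings


# Small demo run; millions of keys can take a long time.
for written, seconds in benchmark_shelve(10_000, 2_500):
    print(f"{written:>8} keys written, last batch took {seconds:.3f} s")
```

If insert time per batch keeps growing as the file gets bigger, that is the kind of slowdown described above.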
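
To illustrate the RC4 item: discarding the first bytes of the keystream is commonly called RC4-drop[n]. The sketch below is a plain textbook RC4 with an optional drop parameter, written for illustration only. It is not how any TLS library implements it, and RC4 should not be used for protecting real data.

```python
def rc4_keystream(key: bytes, length: int, drop: int = 0) -> bytes:
    """Return `length` RC4 keystream bytes after discarding the first `drop`."""
    # Key-scheduling algorithm (KSA).
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Pseudo-random generation algorithm (PRGA).
    i = j = 0
    out = bytearray()
    for n in range(drop + length):
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        k = s[(s[i] + s[j]) % 256]
        if n >= drop:  # the first `drop` bytes are thrown away
            out.append(k)
    return bytes(out)


def rc4_crypt(key: bytes, data: bytes, drop: int = 0) -> bytes:
    """Encrypt or decrypt (same operation) by XOR with the keystream."""
    keystream = rc4_keystream(key, len(data), drop)
    return bytes(a ^ b for a, b in zip(data, keystream))


# Plain RC4 (no drop) versus RC4-drop[768] with the same key.
plain = rc4_crypt(b"Key", b"Plaintext")
dropped = rc4_crypt(b"Key", b"Plaintext", drop=768)
print(plain.hex(), dropped.hex())
```

Both sides must of course agree on the same drop value, otherwise decryption produces garbage.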