ENISA, Caching, TAN, Keccak, Garbage Collection, Integrations, Home improvement & networking

posted Oct 18, 2012, 7:10 AM by Sami Lehtinen   [ updated Mar 28, 2014, 11:33 PM ]
  • Sorry, this blog post contains stuff from two weeks; I was busy during the last weekend.
  • Studied the ENISA website thoroughly.
  • One of the readers asked if I'm using CAR caching in production environments. No, I'm not, due to IBM's patent. I just want to mention that my dictionary-based CAR cache is over 7 times faster (87%) than the lru_cache implementation. It's also scan resistant, which an LRU cache isn't. The CAR cache is as slow as LRU when doing replacement, due to the pure Python implementation, but lookups are super fast because most updates can be omitted. Inserting is also very fast until bucket replacement is required. An update is done only if a record is untouched when it's being accessed; it gets touched once per cache clock cycle. After that it remains read only until it needs to be touched again on the next clock cycle, or it is replaced by a new record. Cache misses are also a bit slower than with an LRU cache, because the evicted-key list needs to be maintained. The basic goal is to get over a 90% cache hit rate, so cache replacements (updates, touches) don't happen too often, which combined with the super fast reads is a nice thing.
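CAR builds on the CLOCK (second-chance) algorithm, so a minimal CLOCK sketch is enough to show why hits are so cheap: a hit only sets a reference bit, with none of the list reordering an LRU cache does on every access. This is just an illustrative sketch, not my actual implementation (it has no ghost lists or the two-clock structure of real CAR):

```python
class ClockCache:
    """Minimal second-chance (CLOCK) cache sketch.

    A hit is O(1) and only sets a reference bit; eviction sweeps the
    clock hand until it finds an unreferenced victim.
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = {}    # key -> value
        self.ref = {}     # key -> reference bit
        self.ring = []    # circular buffer of keys
        self.hand = 0     # clock hand position

    def get(self, key):
        if key in self.data:
            self.ref[key] = True   # cheap hit: just touch the bit
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data[key] = value
            self.ref[key] = True
            return
        if len(self.ring) >= self.capacity:
            # Give referenced entries a second chance, clear their bit
            while self.ref[self.ring[self.hand]]:
                self.ref[self.ring[self.hand]] = False
                self.hand = (self.hand + 1) % len(self.ring)
            victim = self.ring[self.hand]
            del self.data[victim]
            del self.ref[victim]
            self.ring[self.hand] = key
            self.hand = (self.hand + 1) % len(self.ring)
        else:
            self.ring.append(key)
        self.data[key] = value
        self.ref[key] = False
```

Scan resistance follows from the same idea: a one-time sequential scan never gets its reference bits set again, so those entries are evicted first.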
  • Fixed Transaction Authentication Numbers (TAN/eTAN) do not solve the issue of confirming the message payload, unless the codes are delivered out of band together with the transaction information. In that case they are much better than pure TOTP/HOTP solutions, which remain static even if the transaction information is changed. ChipTAN is a great alternative because the amount and account number are used as parameters when the TAN code is generated.
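The idea of binding a code to the transaction details can be sketched by MACing the amount and account number. This is a simplified illustration only, not any bank's actual ChipTAN algorithm; the truncation step just mimics HOTP (RFC 4226):

```python
import hashlib
import hmac

def transaction_tan(secret: bytes, amount: str, account: str,
                    digits: int = 6) -> str:
    """Derive a TAN bound to the transaction details (sketch only)."""
    msg = f"{amount}|{account}".encode()
    digest = hmac.new(secret, msg, hashlib.sha256).digest()
    # HOTP-style dynamic truncation to a short decimal code
    offset = digest[-1] & 0x0F
    code = int.from_bytes(digest[offset:offset + 4], "big") & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

The point is simply that changing the amount or the destination account changes the code, which a static TOTP/HOTP value cannot do.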
  • I'm using Sonera's Home IP-TV service and constantly finding bugs, poor performance and many other issues. Bad or very bad software seems to be the problem, always. They manage to fail even at simple things, like keyword search of TV programs using the web UI; searches often take more than 30 seconds to complete. I don't even have words for that. With the IP-TV box, screen updates fail in menus etc., all the basic stuff you'll find in badly written and untested software. Btw, the box is made by Motorola. I always thought that my earlier digital TV set-top box had the worst software & usability ever, but it seems Sonera was able to surpass that crappiness.
  • I have lately been maintaining projects with IIS, ASP and MSSQL, and with Apache Tomcat, Java and MySQL.
  • Studied alternative currencies, work time banks, Bitcoin and precious metals. I have studied this subject multiple times before; I just checked whether there has been any new development.
  • Checked out and liked new refreshed Bitbucket service. I have my private projects stored there.
  • Studied known Near Field Communication (NFC) security issues. It seems that especially with smartphones there are many gaps in the software, even if NFC technology itself isn't harmful. Currently it seems like a good idea to disable NFC if you aren't using it daily.
  • Checked out a Keccak Python implementation.
  • I learned the importance of caching at a very early stage. I had a Casio FP-200 "laptop", which was really a programmable calculator with an exceptionally nice keyboard. Its CPU performance was so poor that if I tried to solve sin(x) and cos(x) once per second, it couldn't get it done in time. I had to modify the code so that I would precompute sin(x) only for 1/8 of the circle (a sector of 3.75 minutes/seconds). After that I updated my on-screen clock hands using those precomputed values with phase shifting, so I could get my second hand updated in time. Naturally this means that the hour and minute hands were updated only when required, because doing more was again too slow and caused second-hand update skips. I later utilized the same routine in the Pizza Worm game when drawing pizzas, because using sin/cos for every operation was once again way too slow, even on an 80286/12 MHz computer. (Yeah, it was supposed to be fast!)
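The same precomputation trick looks roughly like this in modern Python. I used 1/8 of the circle back then; a quarter-circle table keeps the symmetry logic simpler for this sketch:

```python
import math

# Precompute sin for 0..90 degrees once; derive the rest by symmetry.
TABLE = [math.sin(math.radians(d)) for d in range(91)]

def fast_sin(deg: int) -> float:
    """Table lookup plus quadrant symmetry instead of computing sin()."""
    deg %= 360
    if deg <= 90:
        return TABLE[deg]            # first quadrant: direct lookup
    if deg <= 180:
        return TABLE[180 - deg]      # mirror around 90 degrees
    if deg <= 270:
        return -TABLE[deg - 180]     # third quadrant: negated
    return -TABLE[360 - deg]         # fourth quadrant: negated mirror
```

Phase shifting then reduces cos to the same table, since cos(x) = sin(x + 90°).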
  • Parallella would be an interesting playground for testing purposes. I just would like it to have multiple general purpose cores rather than a completely separate instruction set. Checked out the http://www.kickstarter.com/projects/adapteva/parallella-a-supercomputer-for-everyone project; I was already familiar with Tilera chips. http://eclipse.sys-con.com/node/2381006
  • Configured AppArmor to protect server resources and file system paths. All services reachable from the network are now access limited with AppArmor, as well as via file system access rights using per-service user accounts.
  • Studied garbage collection solutions and their pros and cons: mark & sweep, reference counting based solutions, and optimizations for those. This is one of those subjects which is either simple or super complex, depending on the study level. Guys, don't forget that weak references can be very useful at times, especially with caching.
  • Three years after the release of OAuth WRAP, OAuth 2.0 is finally an official standard as IETF RFC 6749 and RFC 6750.
  • Added three lines of code to one unnamed ERP integration project. Result? Total runtime decreased by 28%. I simply added caching at a few key points. I would say that's time well spent.
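The kind of memoization meant here can literally be a few lines with Python's functools.lru_cache; the lookup function below is purely hypothetical, standing in for any repeated expensive ERP query:

```python
from functools import lru_cache

calls = 0  # counts how often the expensive body actually runs

@lru_cache(maxsize=256)
def lookup(code):
    """Hypothetical expensive ERP lookup, memoized by its argument."""
    global calls
    calls += 1
    return code.upper()  # stand-in for the real expensive work

# A thousand calls with the same argument hit the real work only once.
for _ in range(1000):
    lookup("abc123")
```

When the same keys are requested over and over inside a run, a decorator at a few hot call sites is often all the "three lines of code" you need.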
  • Checked out the theory of Flowlang; sounds pretty interesting: Flow Manifesto, Solving the Multicore Dilemma.
  • A good write-up about real-world MongoDB issues and experiences. Nothing really new; the same basics apply to most other databases. Recommended reading.
  • One Business Intelligence (BI) integration requires a fairly complete Extract Transform Load (ETL) approach. It's going to handle tens of gigabytes of data daily, near real time, from several sources, and finally store the refined data into the BI system's DB. I'm still using SQLite3 for the staging tables. Why? It's really fast as long as there aren't concurrent writes. I'm also very happy that I was able to create a design which doesn't require any additional mapping tables to be maintained.
    Python 3.3, a process pool, a synchronized priority queue (heap queue), SQLite3 for local staging tables and for semi-permanent caching; it can connect to MSSQL, MySQL, PostgreSQL, SQLite3, MongoDB, ODBC, JDBC and Raima RDM Server. Alternative data formats: CSV, XML, JSON (with almost any structure). SQLite3 is run in WAL mode to provide better write concurrency. All writes are done in big chunks from a thread-safe queue using one process. About 7+ gigabytes will be processed daily; that's not really much, but it can still be very slow if things are done inefficiently.
    Using a priority queue allows easy insertion of tasks to be executed as sub-processes in the right order, although tasks still have to yield at times when the required data isn't available, and then need to be requeued. That could be solved with better chaining logic, so that tasks without fulfilled requirements wouldn't be in the queue in the first place. Actually, I have already fixed that issue.
    I avoid overloading the production systems that data is fetched from by doing reads as lightly as possible. I have disabled transaction isolation and repeatable reads for my data fetches. Because the data being fetched is either old or a snapshot, it really doesn't matter if it's not 100% consistent. I don't require read repeatability because I have no intention of reading the same data several times; that's exactly why I have my own caching layers.
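The single-writer, chunked-commit pattern described above can be sketched like this. The table, batch size and row counts are made up for illustration, and the WAL pragma is shown for the pattern even though it has no effect on an in-memory database:

```python
import queue
import sqlite3

# One writer process/thread drains a thread-safe queue and commits
# rows to the SQLite3 staging table in big chunks.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA journal_mode=WAL")  # WAL for better write concurrency
conn.execute("CREATE TABLE staging (k TEXT, v TEXT)")

q = queue.Queue()
for i in range(1000):                    # pretend producers filled this
    q.put((f"k{i}", f"v{i}"))

BATCH = 500
while not q.empty():
    rows = []
    while not q.empty() and len(rows) < BATCH:
        rows.append(q.get())
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)
    conn.commit()                        # one commit per chunk, not per row
```

Committing per chunk instead of per row is what keeps SQLite3 fast here; per-row commits would force an fsync on every insert.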
  • Tried Netflix. - It requires Silverlight? - Sorry, fail, no thanks.
  • Planning a proper home fiber network installation. We already have fiber to the building and the apartments, but currently the fiber is surface-mounted inside the apartment. I would prefer a proper embedded installation with several sockets in every room.
  • Spent quite a lot of time thinking about indirect LED lighting, kitchen interior design alternatives, bathroom designs, removing partition walls, and sliding mirror doors, plus prices and suppliers. The plan is to renew the interior completely.