
Robo-Advisers, Oscobo, Databases, Mr.Robot, CPUs, Distributed Locking, Duplicati, Statistical Analysis

posted Mar 29, 2016, 9:31 AM by Sami Lehtinen   [ updated Mar 29, 2016, 9:32 AM ]
  • Investing using robo-advisers and automated trading & investment portfolio allocation. Interesting trend, yet nothing new. It's obvious that this kind of technology would emerge, because it's just silly to pay extra for funds and investments that basically just follow an index.
  • I waited for a month and, unsurprisingly, Oscobo never answered my questions about how the data they receive and search is handled. Their website doesn't provide detailed information, nor do they answer questions about how they do it. The only sane assumption is that they do forward the data somewhere, and therefore do not handle it as privately within their own systems alone as they claim. Somehow, when I have an awkward question to ask, I almost always assume they won't want to answer it, because I already know the answer. So it was in this very case too. - This isn't a surprise, it's the norm when I get through with questions. - But if you do want to answer, please just post a blog post about how you deal with the data. That's better than replying to me individually.
  • Notes on sandboxing and privilege separation: Check your system privileges. Make sure your code has no bugs! Reduce code exposure and make exploitation harder. Drop privileges, wear a straitjacket: the less the code can do, the less can be gained by attacking it. Confine processes using a jail / container / VM. Applications should confine themselves, like a "werewolf chaining itself"; Firefox is a good example of how this isn't being done. Use access control via a broker service: split the "work horse" from the "access controller / broker". Prohibit ptrace! Drop privileges, separate privileges, limit access. Use namespaces and a fake file system, jails / chroot, the prison & guard approach using a broker, and descriptor passing. Restrict file system access and watch out for chroot escapes: capsicum, tame, capabilities.
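The broker / descriptor-passing idea above can be sketched in a few lines. This is a minimal single-process illustration (not how any particular sandbox implements it): the "broker" side opens a file it alone knows the path of, then hands only the open descriptor to the "worker" side over a Unix socket pair using SCM_RIGHTS (Python 3.9+ exposes this as `socket.send_fds` / `socket.recv_fds`).

```python
import os
import socket
import tempfile

# Sketch of the "prison & guard" broker pattern via descriptor passing.
# The broker opens the file; the confined worker never sees the path,
# only the descriptor it was handed.
broker_sock, worker_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"secret config")
    secret_path = f.name  # only the broker-side code knows this path

# Broker side: open read-only, pass the descriptor, close our copy.
fd = os.open(secret_path, os.O_RDONLY)
socket.send_fds(broker_sock, [b"cfg"], [fd])  # Python 3.9+
os.close(fd)

# Worker side: receive the descriptor and read through it, needing no
# file system access of its own.
msg, fds, flags, addr = socket.recv_fds(worker_sock, 1024, 1)
content = os.read(fds[0], 1024)
```

In a real design the two sides are separate processes, and the worker additionally drops privileges or is chrooted so the passed descriptor is the only way it can touch the file.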
  • Reminded myself again with the How Databases Work article. Yet I'm glad to say that even rereading it didn't bring any new information; I already knew everything in it.
  • Hacker tools used by Mr. Robot. Hmm, yeah, TV shows are TV shows, and pretty bad. But it's still interesting to see what they can come up with.
  • Another really nice post: What's new in CPUs since the 80s, and how it affects programmers. Very nice post; it didn't contain anything I didn't already know. Yay. I'm actually quite happy. I wasn't sure if I was up to date with low-level stuff, because I don't usually do it at all. But it's the basics after all.
  • I keep learning & studying more stuff every day, several hours a day. Just watched a few great TED talks about technology startups, and about experimenting and collaborating in globally distributed groups.
  • Very nice article about distributed locking - Redlock (Redis locking). Lots of basic statements about correctness. Yes, code is either correct or it isn't; that's a good point to remember. The chance of something going wrong might be slight, but it will happen sooner or later. I've often seen the mentality where everything is assumed to work unless proven otherwise. I think it should be vice versa: your code is broken unless proven correct. I also liked the fencing / token / versioning approach. That's what I've been using for a long time, and it's especially good for locks which might be held for a long time. In one case I'm actually using an SQL database for such tokens, with a data expiry time of 24 hours. It's also the same approach as Google App Engine Datastore's optimistic concurrency control, as well as the one I've used for many of the RESTful APIs I've been writing. It's one way to guarantee correctness without keeping per-client active state on the server side. Of course network latency and the number of parallel workers can make this extremely inefficient, as is easy to notice with App Engine's datastore. When running tasks in parallel there's always global progress, but adding more workers doesn't improve performance at all, and adding way too many workers actually makes it much worse, depending on multiple factors. Yet for 'untrusted' clients I never use monotonically increasing tokens, because they might want to cheat on purpose, which would ruin everything. It's the same reason you declare anything private in Java. Well, well, of course you could assume that everyone is doing it right and doesn't want to mess up your code. Using 'random' or 'hash' tokens makes it impossible to deliberately overwrite already committed parallel transactions; with a monotonically increasing counter this is naturally trivial and could cause some committed changes to be overwritten. Use compare-and-set (CAS) and an asynchronous model with unreliable failure detectors.
In some cases 'transactions / atomic updates' over a RESTful API are very convenient. But as said, with certain network & processing latencies they can also be a huge drag. Another problem with this kind of 'non-locking synchronization' without wait queues is that it can very easily lead to unbalanced resource sharing and priority inversion. Instead of providing the possibility to modify records directly, it would be a much better option to provide an API which deals with the transaction complexities internally on the server side. Like "move N from A to B": instead of reading A and B and then sending an update of A and B with version / token information back to the server, you can do just one call, which doesn't require several round trips. Sometimes people say that cloud is awesome. Yeah it is, but having a database or REST API with several round trips and a 200+ ms round-trip time is... well, it is what it is. Don't expect drastic performance, especially in cases where strict ordering prevents parallelization completely. This also prevents all kinds of issues caused in a distributed system by network delay, process pauses, and clock errors.
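The token-based CAS idea and the server-side "move N from A to B" endpoint can be sketched together. This is a toy in-memory store of my own construction, not any real API: writes must present the random token returned by the read they were based on, and `move` shows how pushing the read-modify-write onto the server collapses several round trips into one call.

```python
import secrets

class Store:
    """Toy optimistic-concurrency store (illustrative sketch)."""

    def __init__(self, **initial):
        # key -> (version token, value)
        self._data = {k: (secrets.token_hex(8), v) for k, v in initial.items()}

    def read(self, key):
        return self._data[key]  # client gets (token, value)

    def compare_and_set(self, key, expected_token, value):
        token, _ = self._data[key]
        if token != expected_token:
            return False  # stale token: someone committed first, caller retries
        # Random tokens (rather than a monotonic counter) mean a cheating
        # client cannot forge a "newer" version to clobber committed writes.
        self._data[key] = (secrets.token_hex(8), value)
        return True

    def move(self, src, dst, n):
        # Server-side "move N from A to B": one API call from the client.
        # In a real server this would run inside a single transaction.
        ts, vs = self.read(src)
        td, vd = self.read(dst)
        assert self.compare_and_set(src, ts, vs - n)
        assert self.compare_and_set(dst, td, vd + n)

store = Store(a=100, b=0)
store.move("a", "b", 30)  # one round trip instead of two reads plus a write
```

With the client-driven variant, each retry after a lost CAS race costs another full round trip, which is exactly where the 200+ ms latencies mentioned above start to hurt.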
  • I got reminded of this because I just had annoying issues with Duplicati and its stale lock file without any data, which prevents it from starting unless you manually go and delete that lock file.
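One common way to avoid exactly this failure mode is a lock file that records the owner's PID, so a new instance can detect that the owner is gone and reclaim the lock instead of refusing to start. A hypothetical helper along those lines (POSIX-flavoured, and not Duplicati's actual mechanism):

```python
import os

def acquire_lock(path):
    """Sketch: PID-bearing lock file with stale-lock reclamation."""
    while True:
        try:
            # O_EXCL makes creation atomic: only one process can win.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return True
        except FileExistsError:
            try:
                with open(path) as lock:
                    pid = int(lock.read())
            except (ValueError, FileNotFoundError):
                pid = -1  # empty or vanished lock file counts as stale
            if pid > 0:
                try:
                    os.kill(pid, 0)   # signal 0: probe whether owner is alive
                    return False      # a live process really holds the lock
                except ProcessLookupError:
                    pass              # owner died without cleaning up
            try:
                os.unlink(path)       # stale lock: remove and retry
            except FileNotFoundError:
                pass
```

A lock file that is empty (like the one Duplicati left behind) is treated as stale here, since there is no owner to probe.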
  • Some statistical analysis (analytics) on social network and 'similar' interest / usage-pattern data, like collaborative filtering. How to build product recommendations for user groups, fully automatic target demographics & individual target determination based on existing usage-pattern & interest data, etc. Yes, this can be used for just so many different purposes: fraud detection, marketing, classification; the options are endless.
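The core of user-based collaborative filtering fits in a short sketch. The data below is made up for illustration: compare users' interest vectors with cosine similarity, then score items a user hasn't seen by the similarity-weighted ratings of other users.

```python
from math import sqrt

# Illustrative ratings data: user -> {item: rating}.
ratings = {
    "alice": {"book": 5, "film": 3},
    "bob":   {"book": 4, "film": 3, "game": 4},
    "carol": {"game": 5, "film": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user):
    """Items the user hasn't rated, scored by similar users' ratings."""
    scores = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        for item, rating in their_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)
```

Here `recommend("alice")` surfaces "game", mostly via bob, whose tastes overlap alice's; the same scoring skeleton underlies recommendation, demographic targeting, and anomaly/fraud scoring, just with different input signals.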