AI, ML, DB, Work, Profit, Agile, Grows, Python, JS, Project management, Software development, etc

posted May 17, 2015, 12:44 AM by Sami Lehtinen   [ updated May 17, 2015, 12:53 AM ]
  • Really nice writing about databases: CP or AP? - Yes, many systems offer several locking modes and replication options, and some parameters can even be tuned on a per-request basis. Writes might fail while reads still work (with stale data). Been there, done that. Not properly understanding your data store will lead to amazing, hard-to-debug problems at some point.
  • Created related tags and tag cloud features for one project.
  • Created a full text search engine, including an adaptive rate crawler, for one of my friend's projects. It also supports refresh pings, which matter especially for data sources that have been off-line for an extended period, because otherwise it could take quite a while before the adaptive rate crawler hits those again. It was fun stuff to do. Most of it works using fully asynchronous JSON message queues where requests and responses are linked using a UUID. This message queue design allows easy horizontal scaling.
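To make the UUID linking concrete, here's a minimal sketch using asyncio queues from the Python standard library as a stand-in for a real message queue. The function names (worker, dispatcher, call, demo) and the upper-casing "work" are assumptions made up for this example, not the actual project code.

```python
import asyncio
import json
import uuid


async def worker(requests: asyncio.Queue, responses: asyncio.Queue) -> None:
    """Consume JSON requests and publish JSON responses carrying the same id."""
    while True:
        msg = json.loads(await requests.get())
        # Hypothetical "work": upper-case the query string.
        reply = {"id": msg["id"], "result": msg["query"].upper()}
        await responses.put(json.dumps(reply))


async def dispatcher(responses: asyncio.Queue, pending: dict) -> None:
    """Hand each response to the future that is waiting on its UUID."""
    while True:
        msg = json.loads(await responses.get())
        future = pending.pop(msg["id"], None)
        if future is not None and not future.done():
            future.set_result(msg["result"])


async def call(query: str, requests: asyncio.Queue, pending: dict) -> str:
    """Send one request and wait for the response with the matching UUID."""
    request_id = str(uuid.uuid4())
    future = asyncio.get_running_loop().create_future()
    pending[request_id] = future
    await requests.put(json.dumps({"id": request_id, "query": query}))
    return await asyncio.wait_for(future, timeout=5)


async def demo(queries: list) -> list:
    requests, responses, pending = asyncio.Queue(), asyncio.Queue(), {}
    # Scaling out means starting more workers; callers don't change.
    tasks = [asyncio.create_task(worker(requests, responses)) for _ in range(3)]
    tasks.append(asyncio.create_task(dispatcher(responses, pending)))
    try:
        return await asyncio.gather(*(call(q, requests, pending) for q in queries))
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    print(asyncio.run(demo(["alpha", "beta", "gamma"])))
```

Because a response is matched by id and not by arrival order, it doesn't matter which worker handles which request or in what order the answers come back.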
  • Something was seriously broken with the Sonera network - huge packet loss and broken DNS. I guess the packet loss is what makes the DNS lookups fail completely. Ping works well to most national servers, but to Amsterdam / London / Frankfurt it works really badly, with 50+% packet loss. DDoS? Interesting. Let's see if there's something in the news later today. Update: The situation persisted for about 20 minutes and seems to be mostly resolved now. DNS works and the packet loss is gone.
  • Crate - Yet another distributed and scalable database with NoSQL goodies. I didn't go into the details, but I like the one-tool approach instead of a zoo of technologies. I often prefer not to add new tools to a project all the time, and to deal with new problems using the existing tools if possible. Otherwise things turn into a monstrous mess of different technologies which aren't even known well and which can cause serious production issues due to 'unexpected' things happening. Unexpected because we just don't know how the tool behaves in certain situations. Also available on Google Cloud Platform with simple and easy deployment.
  • Reminded myself about video compression frame types (I-, P- and B-frames).
  • IoT? Internet of Things? Nope, wrong. It actually stands for Internet of Targets. Smile. That's something I completely agree with. It's absolutely unavoidable that this will get much worse before it might get better.
  • Once again had to deal with OVH and server freeze-ups. It's really annoying when the system just freezes for two minutes. Everything else is fine, except that nothing happens for a long while, and that really can't be tolerated. It's still better than data loss or an extended outage, but it's an absolute no-go for 'normal' operation.
  • Watched the Open Networking Summit 2014 (ONS2014) keynote by Google's Amin Vahdat. Good stuff.
  • Watched an organization using 'copy paste' system integration, where they open data source A at regular intervals and then just copy and paste the new content from it into system B. That's nice, and efficient. Lol. I know, I know, this is nothing new at all. Business as usual. But it still makes me smile widely.
  • Why I won't be switching to Disque - That's very well said. Many projects are fun for a while, but especially the technically demanding ones can turn into a horrible burden rather quickly. Things which "mostly work" are fun to make. Things which work darn well, are reliable, fast and all the good things... Well, not so fun... Those require a team of smart guys willing to tune the code for years. There will be times of frustration and sheer despair.
  • Bluefish programming editor - Yep, clearly made by programmers and developers. I've reported multiple bugs in it earlier, like file saving working differently when using keyboard shortcuts versus the mouse. Now I found yet another bug. When I open a document and try to use syntax highlighting with it, I have to enable syntax highlighting, then select language x and then switch back to language y, even if language y was already selected when I enabled the syntax highlight option. Even hitting the manual syntax highlight rescan won't work before those steps have been completed. Doing that a few times also seems to crash the editor. Yep, that's the usual state of software. Anyway, I'm pretty happy with Python and some of the other tools I'm using, because those seem to be very robust and I rarely (almost never!) need to waste my time fighting a broken platform or working around it with some horrible kludges.
  • The latest jQuery Mobile 1.4.5 again contains the classic bug where stuff goes under the header bar. Aww. I've seen tons of discussions about this issue, and there are plenty of more or less silly sledgehammer workarounds to just make it work. But none of those is actually a pretty solution. Some use CSS to force empty space at the top of the content, which is silly. Some trigger a browser window resize event, which is silly. Etc. All of these do work, which simply proves that the original problem is a clear bug and that I'm not just using the framework incorrectly. Btw, with older versions like 1.4.2 there was no issue like this. It's also silly that when you open the page, everything is ok; open the first link, stuff gets broken; go back and open the same link again, stuff is working again. This is exactly the kind of 'feature' I deeply hate about frameworks, web development and developer tools in general. That's why I like Python so much. If I do something wrong, it doesn't work, or at least it produces the same error consistently and repeatably. Except ... I think I just had an example of this.
  • Actually I think this is just a case where the ordering of events is random, but I don't know why. I would prefer a consistent, repeatable way of showing this error.
    File "", line 3507, in __new__
      field.add_to_class(cls, name)
    File "", line 1073, in add_to_class
      [...], self.related_name))
    1st run: AttributeError: Foreign key: ***keyN*** related name "***nameN***" collision with foreign key using same related_name.
    2nd run: AttributeError: Foreign key: ***keyY*** related name "***nameY***" collision with foreign key using same related_name.
    When I run that script, it shows more or less random related name errors, even when there's no collision on that particular key. It just reports a related name collision for some of the items being created with a foreign key. I would prefer it to always show the first collision it encounters instead of interestingly randomizing the order in which it shows the errors.
  • It seems that some of the page rendering & JavaScript problems were caused by CloudFlare. No, I'm not referring to the situation where the page goes under the header, but to the situation where the page content flashes only briefly without formatting, and when the JavaScript based formatting code should format the page, the end result is just an empty page. I'm sure everyone has encountered this at times. Solution? Disabling CloudFlare's Rocket Loader feature. After that everything works perfectly. I'm not sure what kind of tricks CF uses to decide when to use Rocket Loader and when not to. The most annoying part of this problem was that it was hard to debug, because at times there were no problems at all, and with some browsers and after a full reload everything might or might not work, etc. So there could be hidden 'ordering' issues where something tries to execute before something it requires has finished loading, and boom.
  • SSD drives might not provide extended data retention when powered off. - Some drives lose data in one year, some in 3 months and some even faster. I'd say some of those times are much shorter than I expected. So it's not a good idea to buy an external SSD for extended data storage. That's a good example of a case where you should still use a traditional HDD.
  • VENOM - Funny, modern virtual servers hackable via the floppy disk controller. Yes, that's right. Bugs can lurk, and are lurking, just about everywhere. Especially in places where nobody bothers to look for them. Are you using Xen, KVM or QEMU on your servers? Have you already patched against VENOM? Afaik, this is one of the examples where cloud service providers offer better security than self-hosted systems. They have a real priority on keeping systems secure. When systems are just "run" as a side business or business enabler but aren't the primary focus, things like this could easily go unnoticed and such efficient high priority measures wouldn't be taken. VENOM: "An out-of-bounds memory access flaw was found in the way QEMU's virtual Floppy Disk Controller (FDC) handled FIFO buffer access while processing certain FDC commands. A privileged guest user could use this flaw to crash the guest or, potentially, execute arbitrary code on the host with the privileges of the hosting QEMU process."
  • Read: Final HTTP/2 RFC7540 specification - I need to write a separate post about what I really think of it, evaluating (my personal opinions, of course) the choices they have made. Based on the first read-through, my personal favorite is the GOAWAY frame. I also still agree with the stuff I've written earlier. HTTP/2 is now getting so complex that implementing it would be a nightmare. It's just better to use a pre-existing HTTP/2 library than to try to write a compatible implementation. This will lead to the situation which has already happened with SSL, where there aren't actually too many options to choose from. I guess most web servers won't even bother to write an HTTP/2 implementation completely from the ground up. Maybe only some large, ambitious projects like Apache, Nginx and IIS might do it. Others will just pass, because it's not worth it. I'm interested to see what kind of approach the uWSGI guys will take with HTTP/2. They seem to be able to tackle all kinds of complex stuff quite easily. I guess they've got a great and really competent team working on it.
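For a feel of what the framing looks like, here's a minimal sketch that packs and unpacks a GOAWAY frame following the layout in RFC 7540 (a 9-byte frame header with type 0x7 on stream 0, then the last stream id, an error code and optional debug data). The helper names build_goaway and parse_goaway are made up for this example, and it's only an illustration of one frame, not a replacement for a real HTTP/2 library.

```python
import struct

GOAWAY_TYPE = 0x7  # frame type, RFC 7540 section 6.8
NO_ERROR = 0x0     # error code, RFC 7540 section 7


def build_goaway(last_stream_id: int, error_code: int = NO_ERROR,
                 debug: bytes = b"") -> bytes:
    """Hypothetical helper: serialize one GOAWAY frame."""
    # Payload: reserved bit + 31-bit last stream id, 32-bit error code, debug data.
    payload = struct.pack(">II", last_stream_id & 0x7FFFFFFF, error_code) + debug
    length = len(payload)
    # Header: 24-bit length, 8-bit type, 8-bit flags, 32-bit stream id.
    # GOAWAY applies to the whole connection, so the stream id is 0.
    header = struct.pack(">BHBBI", length >> 16, length & 0xFFFF, GOAWAY_TYPE, 0, 0)
    return header + payload


def parse_goaway(frame: bytes) -> dict:
    """Hypothetical helper: parse one GOAWAY frame back into its fields."""
    if len(frame) < 17:
        raise ValueError("frame too short for GOAWAY")
    length_hi, length_lo, frame_type, _flags, stream_id = struct.unpack(
        ">BHBBI", frame[:9])
    length = (length_hi << 16) | length_lo
    if frame_type != GOAWAY_TYPE or stream_id & 0x7FFFFFFF != 0:
        raise ValueError("not a GOAWAY frame on stream 0")
    last_stream_id, error_code = struct.unpack(">II", frame[9:17])
    return {
        "last_stream_id": last_stream_id & 0x7FFFFFFF,
        "error_code": error_code,
        "debug": frame[17:9 + length],
    }


if __name__ == "__main__":
    print(parse_goaway(build_goaway(5, NO_ERROR, b"bye")))
```

Even this single frame needs bit masking and length bookkeeping, which gives a hint of why a full implementation with streams, flow control and header compression is a lot of work.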
  • Something different? Reminded myself about Kilo class attack submarines and especially about the Russian Lada class submarines (Project 677).
  • Does Googlebot run and index JavaScript? - Yes it does. I guess this will again be one of the things that makes a difference between search engines. Some process dynamic, JavaScript-generated content and others don't, which of course could lead to massively better results from the search engines which do process it.
  • Microsoft investing in global submarine cables and dark fiber capacity? Doesn't really surprise anyone. I think it's quite a clear investment when you're a large enough player. It's better to own than rent, but only when scale truly allows it.
  • Writing responsive and fluid web pages with HTML5 and CSS using the new picture element, without requiring JavaScript to select the right images for the page.
  • Why you shouldn't use MySQL (or a database in general) as a queue. - Yet it depends so much on the environment you're coding for, and also on the performance requirements. I generally prefer NOT TO add new technologies and dependencies as long as I can deal well with the existing ones. Why use MySQL if SQLite3 will do the job? Why add a message broker if an SQL database does the job well enough? I personally prefer well-known technologies over new ones, because the new ones will also bite you. You'll make some kind of naive implementation using the new solution, won't test it properly, and after a while you'll find all kinds of race conditions and other more or less interesting "surprises", just because you didn't know how the technology you're using behaves. As an example, I was quite surprised at one point that select * from table where valuefield=0 produced totally different results than the same query with valuefield = 0.0. Yep. I just didn't know what I was doing, and you're going to hit that kind of surprise several times whenever you start using something you don't know well. I can program in Python, so I assume I can trivially port my programs to JavaScript at any time and run them in every browser. Well, yes and no. It will be pure pain before it works, especially if I don't bother to read the basics and just make a hasty implementation which works. But if there are tables which are used as some kind of queue with a status flag, at least it makes sense to use partial indexes! This is a point I've mentioned several times in my blog already.
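As a rough illustration of the partial index idea, here's a minimal sketch with SQLite via Python's sqlite3 module. The jobs table, its columns and the helper names are assumptions invented for this example, and it assumes a single consumer; several consumers would need stricter locking.

```python
import sqlite3


def make_queue_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " status INTEGER NOT NULL DEFAULT 0)"  # 0 = pending, 1 = done
    )
    # Partial index: only pending rows are indexed, so the index stays
    # small no matter how many finished rows pile up in the table.
    conn.execute("CREATE INDEX jobs_pending ON jobs (id) WHERE status = 0")
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    with conn:  # commits on success, rolls back on error
        return conn.execute(
            "INSERT INTO jobs (payload) VALUES (?)", (payload,)
        ).lastrowid


def take_next(conn: sqlite3.Connection):
    """Mark the oldest pending job as done and return (id, payload), or None."""
    with conn:
        # The literal 0 (not a bound parameter) matches the index's
        # WHERE clause, which is what lets the planner consider it.
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 1 WHERE id = ?", (row[0],))
        return row


if __name__ == "__main__":
    db = make_queue_db()
    enqueue(db, "first")
    enqueue(db, "second")
    print(take_next(db), take_next(db), take_next(db))
```

Running EXPLAIN QUERY PLAN on the SELECT is an easy way to check whether your SQLite version actually picks the partial index.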
  • Did you know that Windows already contains port forwarding / TCP relay as a basic feature? That's nice. It's really useful at times. "netsh interface portproxy" (as far as I know it relays TCP only, not UDP)
  • Started to use atexit from the Python standard library for one project which requires quite a lot of clean-up when it exits. Very useful. In many cases I've used a try ... finally construct instead.
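A minimal sketch of the pattern, assuming the clean-up is just removing a scratch file. The function names cleanup and open_scratch_file are made up for this example.

```python
import atexit
import os
import tempfile


def cleanup(path: str) -> None:
    """Remove the scratch file; safe to call more than once."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def open_scratch_file() -> str:
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    # Registered callbacks run at normal interpreter exit, in reverse
    # order of registration. They do not run after os._exit() or when
    # the process is killed by an unhandled signal.
    atexit.register(cleanup, path)
    return path


if __name__ == "__main__":
    scratch = open_scratch_file()
    print("working with", scratch)
    # No explicit clean-up here: atexit removes the file on exit.
```

Compared to try ... finally, the registration sits right next to the place where the resource is created, so the clean-up doesn't have to wrap the whole main program.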
  • The uWSGI background worker threads test was successful. This means that all requests which I assume can be a little slow to handle, due to external API dependencies or reading tons of foreign keys from the database, will be executed in the background while the user is shown a message that the data is being processed on the server side. It works. This also prevents CloudFlare timeouts, and of course it follows the best practice that the request itself shouldn't take too long to handle. That's also something you really have to do with Google App Engine, because user-facing request processing time is so limited. But it's not a bad thing at all.
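The idea can be sketched with plain threading from the standard library as a stand-in; this is not uWSGI's own API, and start_job, job_status and slow_task are names invented for the example. The in-memory jobs dict is also an assumption that only holds within a single process.

```python
import threading
import time
import uuid

jobs = {}                     # job id -> state dict, this process only
jobs_lock = threading.Lock()


def slow_task(n: int) -> int:
    """Hypothetical slow work, e.g. an external API call."""
    time.sleep(0.1)
    return n * 2


def _run(job_id: str, func, args: tuple) -> None:
    try:
        state = {"status": "done", "result": func(*args)}
    except Exception as exc:  # report the failure instead of losing it
        state = {"status": "failed", "error": str(exc)}
    with jobs_lock:
        jobs[job_id] = state


def start_job(func, *args) -> str:
    """Start func in the background and return an id the client can poll."""
    job_id = str(uuid.uuid4())
    with jobs_lock:
        jobs[job_id] = {"status": "processing"}
    threading.Thread(target=_run, args=(job_id, func, args), daemon=True).start()
    return job_id


def job_status(job_id: str) -> dict:
    with jobs_lock:
        return dict(jobs.get(job_id, {"status": "unknown"}))


if __name__ == "__main__":
    jid = start_job(slow_task, 21)
    print(job_status(jid))    # the request handler can answer right away
    time.sleep(0.5)
    print(job_status(jid))    # a later poll sees the finished result
```

The request handler returns the job id immediately with a "processing" message, and the page polls job_status until the work is finished. With several server processes the job state would have to live somewhere shared, such as the database.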
  • The latest The Economist (May 9th, 2015) also has an interesting article about Artificial Intelligence (AI). It's a really hot topic. It's about much more than the current "statistical machine learning models" like pattern recognition. Yet deep learning doesn't yet match Deep AI or full artificial intelligence. Intriguing concept of an artificial brain.
  • Reminded myself about multiversion concurrency control (MVCC) and TIMESTAMP / ROWVERSION optimistic concurrency control, which allows you to read data, process it, and then simply compare-and-swap (CAS) the result into the database very quickly when things are done.
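A minimal sketch of the row version approach, using SQLite through Python's sqlite3 module with a plain integer version column standing in for a ROWVERSION type. The accounts table and the helper names are assumptions made up for this example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL,"
        " version INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.commit()
    return conn


def read_account(conn: sqlite3.Connection, account_id: int):
    """Return (balance, version) without taking any long-lived lock."""
    return conn.execute(
        "SELECT balance, version FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()


def compare_and_swap(conn, account_id, new_balance, expected_version) -> bool:
    """Write only if nobody changed the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_balance, account_id, expected_version),
        )
    return cur.rowcount == 1  # 0 rows updated means we lost the race


def add_funds(conn, account_id: int, amount: int, retries: int = 5) -> bool:
    for _ in range(retries):
        balance, version = read_account(conn, account_id)
        # Slow processing could happen here, outside any transaction.
        if compare_and_swap(conn, account_id, balance + amount, version):
            return True
    return False


if __name__ == "__main__":
    db = make_db()
    print(add_funds(db, 1, 25), read_account(db, 1))
```

The write itself is a single short UPDATE, so the time spent processing never holds a lock; a stale version just means reading again and retrying.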
  • Checked out - That's great. I'm not surprised it's happening. That's something I would also do if I were in the right position.
  • Don't drown in IPv6 addresses. Hmm, not such an interesting post. As far as I can see, everything in it was absolutely obvious. The main question is whether you want to provide a reverse name for all IP addresses or just for those which are in use. That's also an interesting question. As said, the address space is huge and it's very likely going to be mostly or nearly completely unoccupied.
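To put numbers on "huge", here's a small sketch with Python's ipaddress module. The addresses come from the 2001:db8::/32 documentation prefix and the function names are made up for this example.

```python
import ipaddress


def ptr_name(address: str) -> str:
    """Return the reverse DNS (PTR) name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def subnet_size(network: str) -> int:
    """Return how many addresses a network contains."""
    return ipaddress.ip_network(network).num_addresses


if __name__ == "__main__":
    # One IPv6 address becomes 32 nibble labels under ip6.arpa.
    print(ptr_name("2001:db8::1"))
    # A single /64 holds 2**64 addresses, far too many to pre-create
    # PTR records for, so reverse names have to be limited to used
    # addresses or generated on the fly.
    print(subnet_size("2001:db8::/64"))
```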
  • Robots and AI are going to replace many jobs, but is that a problem? It's clear that there will be drastic changes in the future. Many jobs will be replaced by robots, and the businesses supporting those jobs will go as well. Yet this will free people for more productive and better jobs. Also a bit more writing about self-driving cars in the US.
  • Because my test project is now pretty much working as I wanted it to, I'll focus next on Python machine learning libraries. The only things which remain to be fine-tuned are a few JavaScript things and some heavy background processing of data. Those are technically trivial tasks, but they just require the right mood. To produce nice visualizations from the data I'll be using Tableau, which is one of the data discovery and visualization tools I really love using. I've also got some pretty nice data sets which I can use for testing; unfortunately those are such that I can't publish the results directly. But I think I can then utilize the lessons learned and the resulting models for something else later.
  • I've always been wondering why high-end vending machines aren't more popular. Wouldn't they be optimal, at least in cities where square meters are expensive? You could simply order what you want, and when you're at the store you can just pick up the goods, which are ready and waiting for you, instead of waiting yourself. Of course the next step would be automating the delivery from this point on. Having a picnic in a park? Running out of wine and cheese? A few clicks on mobile and the stuff will be landing there in two minutes. Even without this delivery method, I think this kind of fully automated pickup point would be nice near metro stations and so on. There was a story about a small village which isn't large enough to support a profitable store. They replaced the store with fully automated vending machines. So this even works in cases where there are too few customers for a traditional store with personnel. Is it time to create a container system for this kind of vending system? Refilling, transport and everything could be fully automated. In many European countries you can just pick up your mobile data pre-paid SIM card from a vending machine. I was disappointed to notice that it wasn't that easy when I traveled to the US for the first time.
  • Is blocking ICMP on a firewall a bad idea? - Been there, done that. I've also tried blocking protocol 41 and DHCP, and yes, you'll end up breaking the network. I've also seen tons of networks where DNS is more or less broken.
  • Checked out Vortex Bladeless vertical wind generators - I wonder if you can call it a wind turbine, because it isn't a turbine at all.
  • SSL Labs now rates my SSL/TLS configuration as A+.
  • Failure of Agile? - I really liked the GROWS method. - Who said that the agile process itself wouldn't be agile? Of course, if you see a need for modifications, you'll adapt it to your needs. Shouldn't that be clear to everyone? I personally think the failure starts at the point where talking about process politics becomes much more important than what we're actually trying to get done. It's an absolute loss of focus. I think it partially belongs in the category of Analysis Paralysis. Next time you need to carry something from the till to your car, hire a personal trainer, a process consultant, a physiotherapist, an environmentalist and a few other guys, and have meetings for a few months to figure out the best way of getting your groceries to the car. Are you really using plastic bags? Have you researched the environmental effects of plastic bags? No? I think we should set up one new team to research that topic too. - So much fail. Yet I've seen that happen over and over again. I personally know several engineers who always seem to prefer this way. Whatever the task and however simple it is, they can spend months on it producing absolutely nothing of value, just because it's important to research the topic. How about just getting it done in a few hours instead of months? I just wonder why "smart programmers and engineers" often seem to have a total lack of common sense, as well as an extremely poor or even non-existent understanding of Return on Investment (ROI). - - Where's the practical approach? If the customer is paying N units for stuff that does X, do you get the customer to pay 20x the price if you do it stupidly well, researching everything and writing "so great code"? All it takes is that it does the job and works reliably. Everything else, all kinds of coolness and research junk, is just an absolute waste of time and reduces the profit from the job.
Sometimes I even see people involved in sales doing similar kinds of stuff without properly considering it. It's horrible; they, if anyone, should be very knowledgeable about the costs and profitability of the stuff being sold. It's different to do something as a hobby than to do it as a profitable business. I don't care if you've been building that absolutely picture-perfect WWII battleship model in your cellar for the last 5 years. But I do wish you very good luck trying to sell it at a good hourly price to someone. I also often wonder about employees who seem to be absolutely clueless about profitability. Isn't it everyone's job to take care that they work profitably for the company? Of course, one factor affecting this is that many people are absolutely clueless about what their work is worth. Also, focusing on tasks which will produce long-term savings (which increase profit too) or direct profits might be a very good investment. Unfortunately, many times these guys investigating something 'cool' or 'perfect' do not focus on those aspects. Things like renegotiating or changing the service provider for network connections and servers, reducing licensing costs, automating processes, system integrations, and so on can easily generate savings which can be counted as 'passive income' and be several orders of magnitude larger than your yearly salary, even on a monthly level. That's what makes you valuable. Over-engineering something for 'one-off' cases just makes you a drag on the whole organization. A self-guided attitude also helps a lot, but only if you can smartly figure out what you should be doing so that it's most beneficial for the whole organization. I always remember when one consultant working for a large ERP company said something like: "My salary is so high that if I'm not always invoicing all the work I do, I'll get fired pretty quickly. I just need to make sure that my work is worth it to the employer and its customers." - Yet it seems that only quite a small percentage of the workforce gets that.
You'll get agile when you put together a small team of competent guys with the right focus on getting the job done, and don't give them any unnecessary rules about how to organize the thing. They'll figure it out almost immediately, without wasting time creating some kind of more or less useless rulebook. This makes common sense, these are your strengths, you'll do that, let's run iterations, ask for opinions when needed. Let's just get this done and delivered. Most likely the result is good enough and it gets done pretty quickly. Also, whenever something is unclear, iterate really quickly using light drafts and then code it. Compare that to the 'other teams': they still might be discussing some initial topics, like who should be put on this team and what kind of external consulting offers we should ask for. (Yawn)
  • This also reminds me of Parkinson's law of triviality, aka bikeshedding.
  • Also see the GROWS method website for more discussion of topics and wasting our time instead of getting something else done. Yeah, this is kind of a joke; something like GROWS is a good idea. But it's a completely different question whether I personally need it. I think I've got enough experience in this business to draw my own lines and optimize quickly per case, instead of using some more or less suitable fixed rule sets. When someone says do X, I always ask why and whether it seems reasonable. Can this be done better? And then the next question is whether it's worth planning how to do it better. Very often the answer is no, it isn't.