Modern SQL, ALPN, uWSGI, Malware, WiFi, Robots, AI, Blind Trust, Python, Big Data

posted Feb 15, 2015, 12:25 AM by Sami Lehtinen   [ updated Feb 15, 2015, 1:05 AM ]
  • Virtual Memory Intro - Nothing new here, but if you're not familiar with this stuff, it's still a good intro.
  • Something different: Auto GCAS, Beriev Be-200, Steyr AUG, ADS rifle, FAMAS, FN P90.
  • Played a bit with the Mapillary service and uploaded about 200 photos from around Helsinki.
  • The Cloud Conspiracy @ 31c3 - It's clear that they're doing whatever they want, and whatever they can, behind our backs. As the EFF has pointed out, they have no intention of disclosing any guidelines they might have, so we must assume there are none.
  • Things in Pandas I Wish I'd Known Earlier - Nice post, good reading if you're interested in Pandas.
  • 25 tips for intermediate Git users - More good reading, with some useful tips that you don't often encounter even if you use Git daily.
  • Modern SQL in PostgreSQL - I really loved this presentation. It's a great example of how things can be made drastically faster if developers bother to think about (or even understand) how their programs should work. Of course this requires that you bother to do that and that you know the features of the database engine you're using.
  • Checked out sqlcipher - For if and when encrypted databases are needed. Some suggested that there are newer object databases, but so what? SQLite3 is excellent, and I guess it's one of the most widely used databases worldwide. That's why the main benefit of SQLite is wide compatibility. If you want to use objects, there are plenty of ORM solutions available which work with SQLite without breaking generic file & SQL compatibility.
  • Once again reviewed the backup procedures and made sure that they work as intended and that automated monitoring is also working without problems.
  • Artificial Intelligence (AI) part 2 - kw: ASI, AGI, AI, Singularity, Kurzweil, AI Takeoff. - This is surely something to think about.
  • Got sick'n'tired of hopeless programmers, developers and engineers again. Let's see, we have a problem X which happens semi-randomly. Their fix? Well, you should manually rename the files and move them to another directory, then monitor whether new files appear and repeat the steps if they do, and so on. - Well, what about fixing the program so that it works and there's no need for continuous manual monitoring and manual steps? Have you ever considered that? Well, no. They don't get that it's crappy programming and bad design causing these problems. The problem isn't that the helpdesk or operations aren't continuously checking for "problem" files, restarting services, and moving and renaming files to other path(s). Root cause analysis, anyone?
  • Once again got very frustrated by how slow Windows is. Some things are just ridiculously slow on Windows compared to similar tasks on Linux. Luckily I've got three workstations, so when one Windows box is lagged out I can use the others to keep going. Even multitasking works so badly that it's better to have several physical workstations.
  • Studied ALPN (RFC 7301), which should replace NPN for HTTP/2.
  • Boston Dynamics Spot - Our robotic overlords have arrived? - Nah, not yet, but progress seems to be pretty impressive.
  • Once again had not so interesting discussions about data quality. People checking statistics and analyzing data complain that the data is useless. Yes, that's true, if just about anything gets entered into the system. I've been thinking about multiple big data projects, and even if there's plenty of data, it's pointless to analyze it if the input is garbage quality. So only automated and well thought out data sources should be used, and everything should be fully automated. If there are manual steps, like in some cases reading a visitor counter every hour and submitting the data manually, it's going to fail so hard that it's pointless to even plan something silly like that. Yes yes, in this case the data is logged and you could enter the data for a whole week or month at once retroactively, but it doesn't matter. It's just not happening reliably and correctly. Another thing which should be really simple and straightforward is doing store / stock location inventory. In theory it's a simple task: just enter the correct values and that's it. But there are just so many dumb ways to fail. Using RFID to read stock data might be expensive, but at least it would be up to date. Sorry for the consultant who's claiming that to be the perfect solution. It isn't. First of all, there might be tags which aren't readable for some reason. And even if the RFID tags were read 100% correctly, it doesn't guarantee correctness. How so? Well, people might have tagged the products incorrectly. I've seen it happen over and over again. So that's not the magic bullet we're all looking for.
  • Btw. One of the major problem cases I remember was one of the big Finnish tobacco companies printing incorrect barcodes on their products. Did they pull back the batch? Nope, they didn't. Then we received countless calls about the POS system malfunctioning and incorrectly identifying products. Which of course was complete BS from the customers. Our system read and matched the barcodes exactly right. But it seemed to be quite hard for people to grasp that products could have invalid barcodes. It would be so much fun if Coca-Cola shipped millions of 0.5 litre bottles with Pepsi 1.5 litre bottle barcodes. I can just imagine how confused people would be about that, even though it's a really simple thing. Incorrect code and that's it, there's nothing to wonder about. - Well, that's just life.
  • Another thing I'm worried about is that people sometimes seem to blindly trust their systems. They just can't get the point that their system is malfunctioning and they shouldn't trust it. If I take the license plates from an expensive sports car, put them on some old junk and take it to the dealer, do they just read the license plate and decide that they're giving me 200k€ for it? I guess not, but in some cases people really are that silly. The system says that this is an expensive sports car, so it has to be.
  • What reminded me about all this stuff? Well, I tried to order stuff from a local web shop. I wanted it delivered to my local post office, but their website offered only somewhat distant alternatives. So I sent email to their customer support asking: if you're using the standard national mail, why can't you deliver to my local post office? They replied that they can deliver to the following post offices. The list in the email was exactly the same list of somewhat distant options which I had already seen. Then I was like, again, WTF, but why can't you deliver to the post office near me? It's the same network, there's no sane reason not to deliver. Is there some specific reason why you're pissing people off on purpose and discriminating against people living in certain places, or are you just playing dumb, or are you really that dumb? I included a link to the official list of post offices in that mail, highlighting the local post office. Then they replied: Oooh, sorry, we didn't know that there is a post office out there. And then they added the post office to the option list of their delivery system. Why is nobody responsible for keeping the information up to date? Why did customer service give me a pointless reply the first time? Are they really so incompetent that they can't check the list of post offices before telling me that there's no such post office? ... Or is this just typical customer service from people who blindly trust their malfunctioning and badly configured systems? - After one week it turned out that they can't deliver to that post office after all, because their current system doesn't support it. I've got no (positive) words for their great achievement.
  • Encountered yet another web trading place with totally incompetent and lame developers. Their fail? Well, this is the same fail I've reported earlier, but now it's with the password instead of the username. So when I create a user account I give a password like [x7/9-8'%O*(jPkQe7+y . Everything goes well, but when I try to log in they claim that the password is invalid. Then I use password recovery and they send me my password by mail. Guess what, my password is x798 . WTF again? First of all, where are the special characters? And what happened to the rest of the password after the ' sign? Is it really so hard to scrypt a Unicode string using a random salt? That task seems to be quite impossible for some elite developers. Also, there shouldn't be password recovery at all, at least not the kind which returns the given password. Of course no user is so stupid that they would reuse the same password for multiple sites. But the whole point of email based password recovery seems to be that it makes account hijacking just so much easier, so it shouldn't be available at all.
  • Played a bit more with MemSQL. The truth is that I got everything to work pretty well with Python 3.4 64 bit (Ubuntu). I don't have any other feedback than that with current dataset sizes I'm totally happy with SQLite3 using ':memory:' storage. When SQLite won't cut it, there's PostgreSQL. For simpler memory based key value tasks, I just use a standard Python dictionary. There's nothing about MemSQL that I'd be unhappy with. The main fact remains: I just don't have any real use cases for MemSQL right now with current data sources and systems.
  • Once again wondered about engineering, server and network monitoring best practices. I just don't get how ... people are. Why do they install a system using the default configuration and then disable the firewall before configuring any of the security or authentication options? When I told them that they're doing things really wrong, they just explained that they're installing the system. Well, that doesn't cut it. It's BS. How about first configuring the system correctly and THEN opening the firewall? These kinds of things seem to be incredibly hard for engineers, administrators and most people to get. Yes, I might sound a bit annoyed, because I am. I simply can't stand people who don't make any sense at all. But this is only a very, very small tip of the iceberg. Most best practices seem to be common jokes: everybody knows them, but nobody cares or actually follows any rules. Funniest of all are the excuses, which don't make any sense at all, for why the rules can't be followed. There's no reason whatsoever why things have to be done incorrectly. It doesn't provide any benefit or save any time. It's just that nobody cares at all... Like in this example case. The funniest part is that they then start claiming that there's nothing wrong with what they're doing. Yep, it only further proves the point that they're clueless about what they're doing.
  • It's like the standard procedure when lighting a fireplace or a wood stove in a sauna or whatever. First you light it up... When the smoke alarm goes off, you start to think about whether you should open the flue damper. Really recommended and smart move. As is closing it as soon as there's no visible flame. What's wrong with that? The damper works just fine if we do so. Well, that wasn't the point, but these guys are just way too clueless to even get it.
  • Let's play the so many dumb ways to die here. - Nothing important, it's just so funny. If you fail hard, there's only one final resting place. The only good thing is that when you fail hard enough, it's not your problem anymore.
  • This also proves the point about using a private cloud. Well, you can use a large public cloud provider, which hopefully follows at least some of the best practices. Or you can be a fool and think that a private cloud is much more secure, even if everything is actually done with the attitude that if it barely works, it's more than good enough. So is the private cloud a smart move? It only leads to a situation where you can be pretty darn sure that everything is more or less insecure, but it just seems to barely work somehow.
  • For a private cloud, you probably won't even use SSL certs for administration panels and stuff like that, purely because it costs money and it isn't reasonable to go through such hassle for small projects.
  • The Top Mistakes Developers Make When Using Python for Big Data Analytics - The usual stuff. Luckily my data isn't so big that I would especially need to focus on speed. But I very much acknowledge the problem with experimentation if things are running slowly. The article also explains the repeated failure patterns really nicely, like lack of understanding of timezones, as well as lack of automation as a source of constant semi-random problems due to human error. I've personally been doing these fully automated "hybrid" solutions. Just as she said: mixed original data source, Java filtering & preprocessing, and Python finalization & coordination, but I really prefer to avoid that if possible. I always do it so that the Python task is launched first and it then automatically calls and controls the other subsystems. The problem is that when such a system fails, it might take a while to find where the problem is. Lack of data provenance made me smile. It's such a usual story. Happens all the time. The customer asks for immediate changes to be made to production, and then at the end of the month they complain that something should be checked. But the truth is that the monthly figures can be mixed results from multiple software versions and processes. What's worse? Even recreating the data with the latest version won't solve the problem, because the data entry processes and stuff could have changed at the same time. This happens over and over again in hasty and badly managed projects. Customers who require only monthly aggregate data make this even worse. With daily aggregate data we could iterate daily. But with monthly in-production testing, it can easily take up to a year to get things even close to working. And to make matters worse, the customer can require additional changes and change their processes during this time, making the task even harder. Lack of regression testing? Been there, done that. The problem is that nobody knows what the potential input values can be. You'll find out later. It's extremely rare to get proper examples beforehand. And as mentioned, even if things worked with the perfect examples, things change in reality and you'll end up with a more or less broken program anyway. Just did that, well, not today, but yesterday. I did exactly what the customer asked, but I'm quite sure they're not happy with it, and they'll soon be making more requests and asking if old data can be reprocessed according to the new rules and so on. Of course reinventing the wheel seems to be very common in many companies. Due to reinventing the wheel, none of the best practices and commonly used methods are being properly applied.
  • Checked out luigi - Couldn't summarize it better than they do: 'Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.'
  • I have to quote Karolina Alexiou on this: 'Working well with any tool comes down to knowing its shortcomings (and especially the ones that most affect your situation).' - I personally just couldn't agree more.
  • Read a few issues of The Economist - I just can't stop loving it.
  • Finland is leading in credit card payments - Do you always pay using a credit card? Do you even carry cash? I personally don't even remember when I last used cash. I have only a few 50€ notes in my wallet for situations when cards won't work (extremely rare, even though I use cards several times daily). Basically I'm not using cash at all. I guess I didn't withdraw cash even once during 2014 (in Finland). But when I'm travelling, I inconveniently have to use cash. Except when I visited Sweden: I didn't have any kronor because I always used a credit card.
  • Had annoying problems with the Windows clock. It's easy to fix such problems on networked computers using w32tm or net time, but in this case the computers aren't connected to a network, yet their real time clocks return semi bad time. I've only received random suggestions from HP on how to fix it. It seems that they don't even know how to get the right time. So sad, modern computers yet no reliable time. And I'm not talking about ns / ms resolution now. I'm talking about getting the time within +/- 1 hour!!!
  • Checked out WebRTC 1.0 draft - This is awesome HTML5 development and could potentially allow fully browser based peer-to-peer & mesh solutions.
  • Dark Patterns - How to trick users. It's annoying when mistakes happen due to bad or missing UI design. But it's even worse when sites are designed to deceive you.
  • Checked out GnuPG 2.1.2 release notes - No Windows binaries for latest versions yet. But soon ECC will take over old DSA/RSA keys.
  • Tinkered with Twitter integration APIs. Nothing more to say, but got it working pretty nicely.
  • Had my first major Python namespace collision problem. It took me something like 15 minutes to figure out why the logger module did almost what I wanted, but not exactly. Why did it take so long? Well, the new logger module inherited from the old logger module, so it only slightly modified the old functionality, yet the new logger module came from a completely different project, so I didn't even know it was there. Great example of the dangers of import * from stuff which you really don't know well enough. Pretty classic fail.
  • Antivirus tools won't block malware efficiently - Sure. That's exactly why whitelisting is a much better approach than blacklisting. You can use AppLocker to lock the system down so that only a small number of known and actually required binaries can run. All other files are blocked.
  • Really neat project on how to Visualize Wifi (WLAN) Signals - It's not exactly news that signals vary greatly based on location. But this guy went a bit further than the usual simple wifi signal testing. I've talked a lot about Wifi signals at work and in general, and I know the factors that affect them, but most people don't seem to realize how complex Wifi is. There are no simple answers to Wifi reliability matters, channel selection and so on. - Signal variance is nothing new. I remember personally noticing in the late 90's that moving my GSM (900MHz) phone just about 5 centimeters on the table could bring it down from full signal to no reception at all. Of course with analog radios you also notice how easily the signal changes with location, if you've ever played with a TV antenna in bad reception and so on. Moving your hand in another room might block the TV signal or make it crystal clear, even if you'd think there's no connection whatsoever. With 2.4GHz people often forget that interference from other sources can contribute significantly. So signal quality and signal strength aren't the same thing at all. Getting to the root of all these things requires a professional, which I'm not. So one type of measurement defined as "signal strength" probably misleads you badly. Is it a good idea to select a wifi channel that doesn't have any other wifi boxes on it? Well, the reason might be that the channel is totally overpowered by local wireless CCTV or phones. That's the reason why nobody's using it for WiFi, and then you think it's a great idea to select a free Wifi channel?
  • Radio stuff is (truly) really tricky. With higher frequencies it's just like light. Why are some things in the shadows and some things well lit?
  • Hackers steal millions using Malware - What do these guys use their skills for? Well, this is pretty bad, and it clearly shows how dangerous APT threats are. Even if 'the database would be encrypted', it wouldn't make any difference in this kind of case. I've lately blogged about how dangerous this kind of threat is, especially when it's related to computer systems which people blindly trust. I've often seen unbelievable levels of trust in systems. People just can't get their head around the fact that computer systems can tell them whatever lies, and they shouldn't believe them. Anyone thinking of SCADA systems or any other fully computerized systems like ATC? (scada, atc, ics, control systems, control network)
  • My site had some downtime due to uWSGI changes. Suddenly it couldn't find libraries from the standard path and the python34_plugin.so file wasn't loaded. The only quick fix I found was to place the library in the start path of the project. Which is silly. I really don't know why the --plugin-dir parameter didn't work, nor why it couldn't find the plugin from the standard path or even from a custom path given with the parameter. So annoying. At least my server wasn't in debug mode, leaking all kinds of stuff. I'll need to explore this problem a bit more later, but I can't let it cause more downtime than it already has.
    Exact error message: open("./python34_plugin.so"): No such file or directory [core/utils.c line 3675]
  • DownOrNot down? - It seems that they're using App Engine with Debug mode on and dumping some config information... Lol...
"Traceback (most recent call last):
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 715, in __call__
    handler.get(*groups)
  File "/base/data/home/apps/wmc/12.348534319863004081/don.py", line 306, in get
    self.work()
  File "/base/data/home/apps/wmc/12.348534319863004081/don.py", line 407, in work
    cloud = dc.fetch(wordcount, offset)
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 2160, in fetch
    return list(self.run(limit=limit, offset=offset, **kwargs))
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 2329, in next
    return self.__model_class.from_entity(self.__iterator.next())
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/datastore/datastore_query.py", line 3389, in next
    next_batch = self.__batcher.next_batch(Batcher.AT_LEAST_OFFSET)
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/datastore/datastore_query.py", line 3275, in next_batch
    batch = self.__next_batch.get_result()
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2973, in __query_result_hook
    self._batch_shared.conn.check_rpc_success(rpc)
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1340, in check_rpc_success
    rpc.check_success()
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 579, in check_success
    self.__rpc.CheckSuccess()
  File "/base/data/home/runtimes/python/python_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 134, in CheckSuccess
    raise self.exception
OverQuotaError: The API call datastore_v3.RunQuery() required more quota than is available.
"