posted May 17, 2015, 12:44 AM by Sami Lehtinen
updated May 17, 2015, 12:53 AM
- Really nice writing about databases: CP or AP?
- Yes, many systems have several locking modes and replication options,
and some parameters can be tuned on a per-request basis and so on.
Writes might fail, but reads still work (with stale data). Been there,
done that. In particular, NOT properly understanding your data store
will lead to amazing, hard-to-debug problems at some point.
- Created related tags and tag cloud features for one project.
- Created a full text search engine, including an adaptive rate crawler, for one of my friend's projects. It also supports refresh pings, especially for data sources which have been off-line for an extended period, where it could otherwise take quite a while before the adaptive rate crawler would hit them again. It was fun stuff to do. Most of the stuff works using fully asynchronous JSON message queues where requests and responses are linked using a UUID. This message queue design allows easy horizontal scaling.
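A minimal sketch of the UUID-linked request / response idea, not the actual project code. Two in-process queues stand in for a real message broker, and the names (send, collect, worker, the "query" field) are made up for illustration. Responses may arrive in any order; the UUID is what ties each one back to its request.

```python
import json
import queue
import threading
import uuid

# Two in-process queues stand in for a real message broker (assumption).
requests = queue.Queue()
responses = queue.Queue()


def worker():
    """Consume JSON requests and publish JSON responses carrying the same id."""
    while True:
        raw = requests.get()
        if raw is None:  # sentinel: stop this worker
            break
        msg = json.loads(raw)
        # Placeholder "work": upper-case the query string.
        reply = {"id": msg["id"], "result": msg["query"].upper()}
        responses.put(json.dumps(reply))


def send(query):
    """Queue a request and return the UUID used to match the response later."""
    request_id = str(uuid.uuid4())
    requests.put(json.dumps({"id": request_id, "query": query}))
    return request_id


def collect(expected):
    """Read responses in arrival order and index them by request id."""
    by_id = {}
    for _ in range(expected):
        msg = json.loads(responses.get(timeout=5))
        by_id[msg["id"]] = msg["result"]
    return by_id


def demo():
    """Send three requests to three workers and match replies by UUID."""
    workers = [threading.Thread(target=worker) for _ in range(3)]
    for t in workers:
        t.start()
    ids = [send(q) for q in ("python", "sqlite", "uwsgi")]
    by_id = collect(len(ids))
    for _ in workers:
        requests.put(None)  # one stop sentinel per worker
    for t in workers:
        t.join()
    return [by_id[i] for i in ids]
```

Adding more workers (or more machines behind a real broker) needs no change on the sending side, which is where the easy horizontal scaling comes from.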
was seriously broken with Sonera network - Huge packet loss and DNS
broken. I guess the packet loss is causing DNS lookups to fail
completely. Ping works well to most national servers, but to Amsterdam /
London / Frankfurt it works really badly, with 50+% packet loss. DDoS?
Interesting. Let's see if we see something later today in news. Update:
This situation persisted for about 20 minutes, and seems to be mostly
resolved now. DNS works and packet loss is gone.
- Crate - Yet another distributed and scalable database with
NoSQL goodies. Didn't go into details, but I like the one-tool approach
instead of a zoo of technologies. I often prefer not to add new tools
to a project all the time and, if possible, to deal with new problems
using the existing tools. Otherwise things turn into a monstrous mess of
different technologies which aren't even known well and can cause
serious production issues due to 'unexpected' things happening.
Unexpected because we just don't know how it behaves in certain
situations. Also available on Google Cloud Platform with simple and easy
- Reminded myself about video compression frame types
- Internet of Things? Nope, wrong. It actually stands for Internet of
Targets. Smile. That's something I completely agree with. It is absolutely unavoidable that this will get much worse before it might get better.
- Once again
had to deal with OVH and server freeze-ups. It's really annoying when
the system just freezes for two minutes. Everything else is fine,
except that nothing happens for a long while, and that's really
something which can't be tolerated. Yet it's still better than data loss
or an extended outage. But that's an absolute no-go for 'normal'
operation.
- Watched Open Networking Summit 2014 - ONS2014 - keynote by Google's Amin Vahdat. Good stuff.
- Watched on
organization using 'copy paste' system integration, where they open
data source A at regular intervals and then just copy paste new content
from it to system B. That's nice, and efficient. Lol. I know, I know,
this is nothing new at all. Business as usual. But it still makes me
- Why I won't be switching to Disque - That's very well said. Many projects are fun for a while. But
especially the technically demanding ones can turn into a
horrible burden rather quickly. Things which "mostly work" are fun to
make. Things which work darn well, are reliable, fast and all the good
things... Well, not so fun... Those require a team of smart guys willing
to tune the code for years. There will be times of frustration and sheer
despair.
- Bluefish programming editor - Yep, clearly made by
programmers and developers. I've reported multiple bugs in it
earlier, like file saving working differently when using keyboard
shortcuts versus the mouse. Now I found yet another bug. When I open a
document and try to use syntax highlighting with it, I have to enable
syntax highlighting, then select language x and then switch back to
language y, even if language y was already selected when I enabled the
syntax highlight option. Even hitting manual syntax highlight rescan
won't work before those steps have been completed. Doing that a few
times also seems to cause the editor to crash. Yep, that's the usual
state of software. Anyway, I'm pretty happy with Python and some of the
other tools which I'm using, because those seem to be very robust and I
rarely (almost never!) need to waste my time fighting with a broken
platform or working around it with horrible kludges.
- The latest jQuery
Mobile 1.4.5 again contains the classic bug where stuff goes under the
header bar. Aww. I've seen tons of discussions about this issue, and
there are plenty of more or less silly sledgehammer workarounds to
just make it work. But none of those are actually pretty solutions.
Some use CSS to force empty space at the top of the content, which is
silly. Some trigger a browser window resize event, which is silly.
Etc. All of these do work, which simply proves that the original
problem is a clear bug and I'm not just using the framework
incorrectly. Btw. with older versions like 1.4.2 there was no issue like
this. It's also silly that when you open the page, everything is ok;
open the first link, stuff gets broken; go back and open the same link
again, stuff is working again. This is exactly the kind of 'feature' I
deeply hate about frameworks, web development and developer tools in
general. That's why I like Python so much. If I do something wrong, it
doesn't work, or at least it produces the same errors consistently and
repeatedly. Except ... I think I just had an example of this.
I think this is just a case where the ordering of events is random, but I
don't know why. I would prefer a consistent way of repeatedly showing this
File "peewee.py", line 3507, in __new__
File "peewee.py", line 1073, in add_to_class
self.model_class._meta.name, self.name, self.related_name))
1st run: AttributeError: Foreign key: ***keyN*** related name "***nameN***" collision with foreign key using same related_name.
2nd run: AttributeError: Foreign key: ***keyY*** related name "***nameY***" collision with foreign key using same related_name.
When I run that script it shows more or less random related name errors,
even if there's no collision with that particular key. Instead it just
reports a related name collision for one of the items being created
with a foreign key. I would prefer that it always showed the
first collision it encounters instead of interestingly randomizing the
order in which it shows the errors.
- It seems that some of the page rendering
referring to situation where page goes under header. But to the
situation where page content flashes only briefly without formatting and
the end result is just an empty page. I'm sure everyone has encountered
this situation at times. Solution? Disabling CloudFlare's Rocket Loader
feature. After that everything works perfectly. I'm not sure what
kind of tricks CF uses to decide when to use Rocket Loader
and when not to. But the most annoying part of this problem was that it
was hard to debug, because at times there were no problems at all, and
with some browsers, after a full reload, everything might or might not
work, etc. So there could be hidden 'ordering' issues where something
tries to execute before something it requires has finished loading, and
boom.
- SSD drives might not provide extended data retention when powered off. - Some drives lose data in one year, some drives in 3 months and some
even faster. I personally would say that some of those times are much
shorter than I expected. So it's not a good idea to buy an external SSD
for extended data storage. That's a good example of where you should
still use a traditional HDD.
- VENOM - Funny, modern
virtual servers hackable via the floppy disk controller. Yes, that's
right. Bugs can lurk, and are lurking, just about everywhere. Especially
in places where nobody bothers to look for them. Are you using Xen, KVM
or QEMU on your servers? Have you already patched against VENOM? Afaik,
this is one of the examples where cloud service providers offer better
security than self-hosted systems. They have a real priority on keeping
systems secure. When systems are just "run" as a side business or
business enabler but not as the primary focus, things like this could
easily go unnoticed and such efficient high priority measures wouldn't
be taken. VENOM: "An
out-of-bounds memory access flaw was found in the way QEMU's virtual
Floppy Disk Controller (FDC) handled FIFO buffer access while processing
certain FDC commands. A privileged guest user could use this flaw to
crash the guest or, potentially, execute arbitrary code on the host with
the privileges of the hosting QEMU process."
- Read: Final HTTP/2 RFC7540 specification - I need to write a separate post about what I really think about it, evaluating (my personal opinions, of course) the choices they have made. Based on the first read-through, my personal favorite is the GOAWAY frame. I also still agree with what I've written earlier: HTTP/2 is now getting so complex that implementing it would be a nightmare. It's just better to use a pre-existing HTTP/2 library than to try to make a compatible implementation. This will lead to the situation which has already happened with SSL, where there aren't actually too many options to choose from. I guess most web servers won't even bother to write an HTTP/2 implementation completely from the ground up. Maybe only some large, ambitious projects like Apache, Nginx and IIS might do it. Others will just pass, because it's not worth it. I'm interested to see what kind of approach the uWSGI guys will take with HTTP/2. They seem to be able to tackle all kinds of complex stuff quite easily. I guess they've got a great and really competent team working on it.
different? Reminded myself about Kilo class attack submarines and
especially about the Russian Lada class submarines (Project 677)
content and others won't. Which of course could lead to massively
better results by the search engines which do process it.
- Microsoft investing in global submarine cables and dark fiber capacity? Doesn't
really surprise anyone. I think it's quite a clear investment when
you're a large enough player. It's better to own than rent, but only
when scale truly allows it.
- Why you shouldn't use MySQL (or database in general) as queue.
- Yet, it depends so much on the environment which you're coding for
and also on the performance requirements. I generally prefer NOT TO add
new technologies and dependencies as long as I can deal well with the
existing ones. Why use MySQL if SQLite3 will do the job? Why add a
message broker if an SQL database does the job well enough? I personally
prefer to use well known technologies instead of new ones, because
new ones will also bite you. You'll make some kind of naive
implementation using the new solution, won't test it properly, and after
a while you'll find all kinds of race conditions and other more or less
interesting "surprises" because you just didn't know how the technology
you're using behaves. As an example, I was quite surprised at one point
that select * from table where valuefield=0 will produce totally
different results than the same query where valuefield = 0.0. Yep. I just
didn't know what I was doing, and you're going to hit that kind of
surprise several times whenever you start using something you don't
know well. I can program using Python, so I assume I can trivially port my
Well, yes and no. It will be pure pain before it works. Especially if I
don't bother to read the basics and just make a hasty implementation
which works. But if there are tables which are used as some kind of queue
with a status flag, it at least makes sense to use partial indexes! This
is one point I've mentioned several times in my blog already.
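A small sketch of the queue table with a status flag and a partial index, using SQLite3 from the Python standard library. The table and column names (jobs, payload, done) are made up for this example. The partial index only covers rows still waiting, so it stays small even when the table is full of processed rows. Partial indexes need SQLite 3.8.0 or newer.

```python
import sqlite3


def make_queue_db():
    """Create an in-memory queue table with a partial index on pending rows."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        " id INTEGER PRIMARY KEY,"
        " payload TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    # Partial index: only rows with done = 0 are included in the index.
    db.execute("CREATE INDEX jobs_pending ON jobs(id) WHERE done = 0")
    db.commit()
    return db


def enqueue(db, payload):
    """Add a new pending job to the queue."""
    with db:  # commits on success, rolls back on error
        db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))


def dequeue(db):
    """Claim the oldest pending job, or return None if the queue is empty."""
    with db:
        row = db.execute(
            "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (row[0],))
        return row[1]
```

This version assumes a single consumer on a single connection. With several consumers the select and update would need to run inside one write transaction, otherwise you get exactly the kind of race condition "surprise" mentioned above.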
- Did you
know that Windows already contains port forwarding / TCP / UDP relay as
a basic feature? That's nice. It's really useful at times. "netsh
- Started to use atexit from the Python standard
library for one project which requires quite a lot of cleanup when it
exits. Very useful.
In many cases I've used the try ... finally construction for this.
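A minimal atexit sketch, assuming the cleanup is just removing a scratch file. The function names here are invented for the example and are not from the project in question.

```python
import atexit
import os
import tempfile


def cleanup(path):
    """Remove the scratch file; safe to call more than once."""
    if os.path.exists(path):
        os.remove(path)


def open_workspace():
    """Create a scratch file and register its removal for interpreter exit."""
    fd, path = tempfile.mkstemp(suffix=".tmp")
    os.close(fd)
    # The handler runs at normal interpreter shutdown, wherever the exit happens.
    atexit.register(cleanup, path)
    return path
```

Compared to try ... finally, the handler is registered right where the resource is created, so the cleanup doesn't need to wrap the whole main program. Note that atexit handlers do not run if the process is killed by a signal Python doesn't handle or exits via os._exit().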
- uWSGI background
worker threads test was successful. This means that all requests which
I assume can be a little slow to handle, due to external API
dependencies or reading tons of foreign keys from the database, will be
executed in the background while showing the user a message that data is
being processed on the server side. It works. This is also to prevent
CloudFlare timeouts, and of course it follows the best practice that
the request itself shouldn't take too long to handle. That's also
something you really have to do with Google App Engine, because user
facing request processing time is so limited. But it's not a bad thing
at all.
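The general pattern can be sketched with plain standard library threads. This is not uWSGI specific code and not the actual implementation; start_job, poll_job and slow_report are hypothetical names, and the job table is just an in-memory dict which is lost on restart.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)
jobs = {}  # job id -> Future; in-memory only (assumption for this sketch)


def slow_report(customer):
    """Placeholder for a slow external API call or a heavy database read."""
    return f"report for {customer}"


def start_job(customer):
    """Request handler, part 1: queue the work and answer immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(slow_report, customer)
    return {"status": "processing", "job": job_id}


def poll_job(job_id):
    """Request handler, part 2: report progress or hand back the result."""
    future = jobs[job_id]
    if not future.done():
        return {"status": "processing", "job": job_id}
    return {"status": "done", "result": future.result()}
```

The first request returns right away with the "data is being processed" message, and the page can then poll with the job id until the result is ready. No single request stays open long enough to hit a proxy timeout.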
- The latest The
Economist (May 9th, 2015) also has an interesting article about
Artificial Intelligence (AI). It's a really hot topic. It's much more
than the current "statistical machine learning models" like pattern
recognition. Yet deep learning is not yet the same thing as deep AI or
full artificial intelligence. Intriguing concept of an artificial brain.
- Reminded
myself about multiversion concurrency control (MVCC) and TIMESTAMP /
ROWVERSION optimistic concurrency control, which allows you to read
data, process it, and then simply compare-and-swap (CAS) the data into
the database very quickly when things are done.
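A sketch of the read, process, compare-and-swap cycle with a plain integer version column in SQLite3, standing in for TIMESTAMP / ROWVERSION. The account table and the function names are made up for the example.

```python
import sqlite3


def make_db():
    """Create a demo table with one row at version 1."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    db.execute("INSERT INTO account VALUES (1, 100, 1)")
    db.commit()
    return db


def read(db, account_id):
    """Return (balance, version) without taking any long-lived lock."""
    return db.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()


def compare_and_swap(db, account_id, new_balance, seen_version):
    """Write only if nobody has changed the row since we read it."""
    with db:
        cur = db.execute(
            "UPDATE account SET balance = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_balance, account_id, seen_version),
        )
    # rowcount is 0 when the version no longer matches: somebody else won.
    return cur.rowcount == 1
```

When compare_and_swap returns False the caller re-reads the row and retries. The slow processing happens with no lock held, and only the final update touches the database.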
- Checked out Kensho.com - That's great.
I'm not surprised it's happening. That's something I would also do
if I were in the right position.
- Don't drown in IPv6 addresses. Hmm, not so
interesting post. As far as I can see, everything in this post was
absolutely obvious. The main question is whether you want to provide
reverse names for all IP addresses or just for those which are being
used. That's also an interesting question. As said, the address space is
huge and it's very likely going to be mostly or nearly completely
unoccupied.
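To get a feeling for the scale, here is a quick sketch using the ipaddress module from the Python standard library. The addresses are from the 2001:db8::/32 documentation prefix, and the helper names are only for illustration.

```python
import ipaddress


def reverse_name(address):
    """Return the ip6.arpa (or in-addr.arpa) name used for the PTR record."""
    return ipaddress.ip_address(address).reverse_pointer


def addresses_in(prefix):
    """Return how many addresses a prefix contains."""
    return ipaddress.ip_network(prefix).num_addresses


if __name__ == "__main__":
    print(reverse_name("2001:db8::1"))
    print(addresses_in("2001:db8::/64"))
```

Every IPv6 reverse name is 32 nibble labels plus ip6.arpa, and a single /64 subnet already holds 2**64 addresses, so pre-populating reverse names for everything is not realistic. Records would have to be generated on the fly or created only for addresses actually in use.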
- Robots and AI are going to replace many jobs, but is that a problem? It's clear that there will be drastic changes in the future. Many
jobs will be replaced by robots, and the supporting businesses will be
out too. Yet this will free people for more productive and better jobs.
A bit more writing about self-driving cars in the US.
my LclBd.com test project is pretty
much working now as I wanted it to. I'll focus next on machine learning
Python libraries. Only things which remain to be fine tuned is a few
are technically trivial tasks, but just require the right mood. To
produce nice visualizations from the data I'll be using Tableau,
which is one of the data discovery and visualization tools I really
love using. I've also got some pretty nice data sets which I can use for
testing. Unfortunately those are such that I can't publish the results
directly. But I think I can then utilize the lessons learned and the
resulting models for something else later.
- I've always been wondering
why high end vending machines aren't more popular. Wouldn't they be
optimal, at least in cities where square meters are expensive and so on?
You could simply order what you want, and when you're at the store you
can just pick up the goods, which are there ready for you, instead of
waiting. Of course the next step would be automating the delivery from
this point on. Having a picnic in a park? Running out of wine and
cheese? A few clicks on mobile and stuff will be landing there in two
minutes. Even without this delivery method, I think this kind of fully
automated pickup point would be nice near metro stations and so on.
There was a story about a small village which isn't large enough to run
a profitable store. They replaced the store with fully automated vending
machines. So this even works in cases where there are too few
customers for a traditional store with personnel. Is it time to create
a container system for this kind of vending system? Refilling, transport
and everything could be fully automated. In many European countries you
can just pick up your pre-paid mobile data SIM card from a vending
machine. I was disappointed to notice that it wasn't that easy when I
traveled to the US for the first time.
- Is blocking ICMP on firewall a bad idea? - Been there, done that.
I've also tried blocking protocol 41 and DHCP, and yes, you'll end up
breaking the network. I've also seen tons of networks where DNS is more
or less broken.
- Checked out Vortex Bladeless Vertical Wind generators
- I wonder
if you can call it a wind turbine, because it isn't a turbine at all.
- SSL Labs now rates my SSL/TLS configuration as A+
- Failure of Agile? - I really liked the GROWS method. - Who said that the agile process
itself wouldn't be agile? Of course, if you see a need for modifications
you'll adapt it to your needs. Shouldn't that be clear to everyone? I
personally think that the failure starts at the point where talking
about process politics becomes much more important than what we're
actually trying to get done. It's an absolute loss of focus. I think it
partially belongs in the category of Analysis Paralysis. Next time you
need to carry something from the till to your car, hire a personal
trainer, a process consultant, a physiotherapist, an environmentalist
and a few other guys, and have meetings for a few months to figure out
the best way of getting your groceries to the car. Are you really using
plastic bags? Have you researched the environmental effects of plastic
bags? No, I think we should form one new team to research that topic
too. - So much fail.
Yet, I've seen that happening over and over again. I personally know
several engineers who always seem to prefer this way. Whatever needs to
be done, however simple the task, they can spend months on it producing
absolutely nothing of value, just because it's important to research
the topic. How about just getting it done in a few hours instead of
using months? I just wonder why "smart programmers and engineers" so
often seem to have a total lack of common sense as well as an extremely
poor or even non-existent understanding of Return on investment (ROI).
- https://en.wikipedia.org/wiki/Return_on_investment - Where's the
practical approach to it? If the customer is paying N units for stuff
that does X, do you get the customer to pay 20x the price if you do it
stupidly well, researching everything and writing "so great code"? All
it takes is that it does the job and works reliably. Everything else,
all kinds of coolness and research junk, is just an absolute waste of
time and reduces the profits from the job. Sometimes I even see people
involved in sales doing similar stuff without properly considering it.
It's horrible; they, if anyone, should be very knowledgeable about the
costs and profitability of the stuff being sold. It's different to do
something as a hobby than to do it as a profitable business. I don't
care if you've been building that absolutely picture perfect WWII
battleship model in your cellar for the last 5 years. But I do wish you
very good luck trying to sell it at a good hourly price to someone. I
also often wonder about employees who seem to be absolutely clueless
about profitability. Isn't it everyone's job to take care that they work
profitably for the company? Of course, one factor affecting this is that
many people are absolutely clueless about what their work is worth.
Focusing on tasks which produce long term savings (which increase
profit too) or direct profits might also be a very good investment.
Unfortunately, the guys investigating something 'cool' or 'perfect'
often do not focus on those aspects. Things like renegotiating or
changing the service provider for network connections or servers,
reducing licensing costs, automating processes, system integrations and
so on can easily generate savings which can be counted as 'passive
income' and be several orders of magnitude larger than your yearly
salary, even on monthly level.
That's what makes you valuable. Over-engineering something for 'one off'
cases just makes you a drag on the whole organization. A self-guided
attitude also helps a lot, but only if you can smartly figure out what
you should be doing so that it's most beneficial for the whole
organization. I always remember when one consultant working for a large
ERP company said something like: "My salary is so high that if I'm not
always invoicing all the work I do, I'll get fired pretty quickly. I
just need to make sure that my work is worth it to the employer and its
customers." - Yet it seems that it's quite a small percentage of the
work force who get that. You'll get agile when you put together a small
team of competent guys with the right focus to get the job done, and
don't give them any unnecessary rules on how to organize the thing.
They'll figure it out almost immediately, without wasting time creating
some kind of more or less useless rulebook. This makes common sense,
these are your strengths, you'll do that, let's run iterations, ask for
opinions when needed. Let's just get this done and delivered. And most
likely the result is good enough and it gets done pretty quickly. Also,
whenever something is unclear, iterate really quickly using light drafts
and then code it. When you compare that to the 'other teams', they might
still be discussing some initial topics like who should be put on the
team and what kind of external consulting offers we should ask for.
(Yawn)
also reminds me about Parkinson's law of triviality aka bikeshedding.
- Also see the GROWS method website for the future of discussing topics and wasting our time instead of getting something else done. Yeah, this is kind of a joke; something like GROWS is a good idea. But it's a completely different question whether I personally need it. I think I've got enough experience in this business to draw my own lines and optimize per case quickly instead of using some more or less suitable fixed rule sets. When someone says do X, I always ask why. If it seems reasonable, the next question is whether it can be done better, and the one after that is whether it's worth planning how to do it better. Very often the answer is no, it isn't.