Blog

My personal blog about stuff I do, like and am interested in. If you have any questions, feel free to mail me! My views and opinions are naturally my own and do not represent anyone else or any other organization.

[ Full list of blog posts ]

2015 summer keyword dump

posted by Sami Lehtinen   [ updated ]

Just a little keyword dump from the ARTS standard.

Consumer promotion delivery, marketing and merchandising, proof of purchase, returns, rebates, and manufacturer registration, payment dispute resolution, issuer, issuers, merchants, financial institutions, manufacturers, provide the digital purchase receipts, customers, mobile operators, offline and online trading, purchaser, authorizes, issuing, receiving, retail, consumer, business, organizations, vendors, agents, expenses, expense, taxes, reimbursement, credit card, debit card, dispute resolution, rebates, warranties, interfaces, model, architectural model, business process model, business scope, out of scope, changes, payments, pays, RFID, coupon, survey, exchange, exchanged, disputes, amount, bank, accounting, expenses, invoice, travel, manufacturer, redeem, retailer, extended, targeted, targeting, ticket, ticketing, tickets, proximity, financial applications, redemption, shopping list, settlement, green, audit, candidate, inventory, attendance, promotions, promotion, loyalty, security, data warehousing, forecasting, XML, GITN, expense management, data mining, end user, notification, notifications, mobile, application, BPMN, CRM, POSlog, UUID, format, store, identify, tender, restock, SGML, OFX, data validation, XSLT, SOA, best practices, data dictionary, common data, data mapping, transactions, header, representation, brief description, scenario, instance, transaction, transactions, UPC, EAN, quantity, barcode, QR code, process flow, flowchart, data flow, diagram, business process mapping, use case, use cases.

Some startup related background information and stuff I've done

  • Did some product planning back in the day. This stuff is already outdated, so I can write a bit about it. But actually I'm just listing keywords, because I can't go into any details anyway. 
  • Business Plan, Initial Market Study, Concept, Vision, Mission, Market Research, Operating Strategy, Target Market, Competitors, Promotion, Financial Viability, Budget, Team, Background. Technology considerations: Native, HTML5, Java, JavaScript, Node.js and other alternatives, Proof of Concept (POC). Where development should be done, what kind of team and developers could be used, where funding would come from, and so on. Had a few meetings with potential developers as well. 
  • Hosting costs calculation matrix including several options, AWS, Azure, GCE, Hetzner, UpCloud, OVH, Nebula, CapNova, Sigmatic, etc. Competitor analysis including review of 10+ competitors. 
  • Customer Segments, Value Propositions, Channels, Customer Relationships, Revenue streams, Key Resources, Key Activities, Key Partnerships, Cost Structure.
If you want to know more, just mail me. - Thanks

Log deduplication and compression = 73,62% saving vs 7-zip ultra compression

posted by Sami Lehtinen   [ updated ]

I had some problems storing my long-term log data. Earlier it was just LZMA compressed using 7-Zip ultra compression, but I got sick'n'tired of those logs being so large (yes, that's quite a relative term). Yet logs are so important that they have to be stored for seven years in at least two physically separated locations, and there are secondary off-site backups for even these primary storages. So what to do? I decided to write a slightly better solution for handling this kind of log data.

I created three database tables:
1) Timeline: timestamp, message_id
2) Messages: id, hash, block_id
3) Blocks: id, datablob
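
A minimal sketch of that schema using Python's sqlite3 (column types are my assumption; the real implementation may well differ):

    import sqlite3

    db = sqlite3.connect('logstore.db')
    db.executescript('''
    CREATE TABLE IF NOT EXISTS timeline (timestamp INTEGER, message_id INTEGER);
    CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, hash BLOB UNIQUE, block_id INTEGER);
    CREATE TABLE IF NOT EXISTS blocks (id INTEGER PRIMARY KEY, datablob BLOB);
    ''')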

Now the timeline contains only timestamps and an id reference to the messages table, which contains the hash of the data record being stored plus a reference to the block id, which tells which block contains the message.

I collect messages until I have 32MB of NEW data. That data is then LZMA compressed into one message block, and the message hashes are written and linked to the timeline.
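
A rough sketch of that write path (the lookup/insert helpers are hypothetical placeholders for the database layer, and the message separator is simplified):

    import hashlib, lzma

    BLOCK_LIMIT = 32 * 1024 * 1024   # flush threshold: 32 MB of NEW data
    buffer, buffer_size = [], 0

    def store(timestamp, message):
        global buffer_size
        h = hashlib.sha256(message).digest()
        msg_id = lookup_message_id(h)          # hypothetical: hash already known?
        if msg_id is None:                     # NEW data goes into the next block
            buffer.append(message)
            buffer_size += len(message)
            msg_id = insert_message(h)         # hypothetical: block_id set at flush
            if buffer_size >= BLOCK_LIMIT:
                flush_block()
        insert_timeline(timestamp, msg_id)     # timeline always gets an entry

    def flush_block():
        global buffer, buffer_size
        blob = lzma.compress(b'\n'.join(buffer))    # one compressed message block
        link_buffered_messages(insert_block(blob))  # hypothetical: set block_id
        buffer, buffer_size = [], 0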

During playback I read the required blocks, decompress the messages from them, and then cache the decompressed messages by message id using CLOCK-Pro caching (PyClockPro), so I can utilize memory efficiently and don't need to decompress blocks all the time. Efficient caching like CLOCK-Pro also means unneeded parts of decompressed blocks are quickly discarded from memory.
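
A playback sketch; a plain dict stands in for the CLOCK-Pro cache here, since the point is just the message-id level caching (the helpers are again hypothetical):

    import lzma

    cache = {}   # message_id -> message; PyClockPro would evict the cold ones

    def get_message(msg_id):
        if msg_id not in cache:
            block_id = block_of(msg_id)                   # hypothetical lookup
            data = lzma.decompress(read_block(block_id))  # hypothetical block read
            for mid, msg in parse_block(data):            # hypothetical parser
                cache[mid] = msg                          # whole block becomes hot
        return cache[msg_id]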

Why is this so effective? Because most log messages are repeated a lot, especially when you take the timestamp part out of the message and separate it. Let's see the statistics on how much more efficient this method is compared to the plain old yet 'very good' 7-zip LZMA ultra compression.

Log entries for 2014 compressed using this method:
413GB old 7-zip data
 61GB My format

Year 2015 so far (about 4 months worth of data):
124GB old 7-zip data
 28GB My format

Writing the required code to make all this happen took just about 3 hours, but processing the old data took a bit longer. In the future the data processing is faster (less data needs to be read and stored) due to this efficient compression and deduplication.

Space savings: 86GB vs 326GB = 73,62% saving compared to 7-zip ultra compression. At this point it's easy to forget that the 7-zip ultra compressed data is already over 80% reduced in size.

Let's just say that a few hours of work and these results? Yes, I'm really happy with this. You might ask why the logs are only 80% compressed; isn't the usual ratio more like 95%? Yes, that's right, the usual ratio is something like 95%. But these log messages contain some rather long encrypted entries, which won't compress as text but do make deduplication more effective. Also, storing those messages repeatedly whenever they're missed by the compression window is quite expensive, even if the 7-zip ultra window is formidable in size compared to almost all other compression methods.

I know there are existing solutions, but I was a bit bored during my summer vacation, and this looked like a nice test to do.

AI, ML, DB, Work, Profit, Agile, Grows, Python, JS, Project management, Software development, etc

posted May 17, 2015, 12:44 AM by Sami Lehtinen   [ updated May 17, 2015, 12:53 AM ]

  • Really nice writing about databases: CP or AP?  - Yes, many systems contain many different locking modes and replication options, and some parameters can be tuned on a per-request basis and so on. Writes might fail, but reads still work (with stale data). Been there, done that. Especially NOT properly understanding your data store will lead to amazing hard-to-debug problems at some point.
  • Created related tags and tag cloud features for one project.
  • Created a full text search engine, including an adaptive rate crawler, for one of my friends' projects. It also supports refresh pings, especially for data sources which have been off-line for an extended period, where it could otherwise take quite a while before the adaptive rate crawler would hit those again. It was fun stuff to do. Most of it works using fully asynchronous JSON message queues where requests and responses are linked using a UUID. This message queue design allows easy horizontal scaling; see the sketch below.
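    The core idea in a toy in-process form (the real thing runs the queues over the network; all names here are illustrative):
    import json, queue, threading, uuid

    requests, responses = queue.Queue(), queue.Queue()

    def worker():                                  # crawler side of the queue
        while True:
            req = json.loads(requests.get())
            responses.put(json.dumps({'id': req['id'], 'body': 'crawled ' + req['url']}))

    def ask(url):                                  # client side: tag request with UUID
        rid = str(uuid.uuid4())
        requests.put(json.dumps({'id': rid, 'url': url}))
        while True:
            resp = json.loads(responses.get())
            if resp['id'] == rid:                  # response linked by the same UUID
                return resp['body']

    threading.Thread(target=worker, daemon=True).start()
    print(ask('http://example.com/'))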
  • Something was seriously broken with the Sonera network - huge packet loss and broken DNS. I guess the packet loss was causing DNS lookups to fail totally. Ping works well to most national servers, but to Amsterdam / London / Frankfurt it works really badly, with 50+% packet loss? DDoS? Interesting. Let's see if we see something later today in the news. Update: This situation persisted for about 20 minutes, and seems to be mostly resolved now. DNS works and the packet loss is gone.
  • Crate - Yet another distributed and scalable database with NoSQL goodies. Didn't go into details, but I like the one-system approach instead of a zoo of technologies. I often prefer not to include new tools in the project all the time and, if possible, to deal with new problems using the existing tools. Otherwise things turn into a monstrous mess of different technologies which aren't even known well and can cause serious production issues due to 'unexpected' things happening. Unexpected because we just don't know how they behave under certain situations. Also available on Google Cloud Platform with simple and easy deployment.
  • Reminded myself about video compression frame types
  • IoT? Internet of Things? Nope, wrong. It actually stands for Internet of Targets. Smile. That's something I completely agree about. It will be absolutely unavoidable that this will get much worse before it might get better.
  • Once again had to deal with OVH and server freeze-ups. It's really annoying when the system just freezes for two minutes. Everything otherwise is fine, except nothing happens for a long while, and that's really something which can't be tolerated. Yet it's still better than data loss or an extended outage. But it's an absolute no-go for 'normal' operation.
  • Watched Open Networking Summit 2014 - ONS2014 - keynote by Google's Amin Vahdat. Good stuff.
  • Watched one organization using 'copy paste' system integration, where they open data source A at regular intervals and then just copy-paste new content from it to system B. That's nice and efficient. Lol. I know, I know, this is nothing new at all. Business as usual. But it still makes me smile widely.
  • Why I won't be switching to Disque - That's very well said. Many projects are fun for a while, but especially the technically demanding ones can turn into a horrible burden rather quickly. Things which "mostly work" are fun to make. Things which work darn well, are reliable, fast and all the good things... Well, not so fun... Those require a team of smart guys willing to tune the code for years. There will be times of frustration and sheer despair.
  • Bluefish programming editor - Yep, clearly made by programmers and developers. I've reported multiple bugs with it earlier, like file saving working differently when using keyboard shortcuts versus the mouse. Now I found yet another bug. When I open a document and try to use syntax highlighting with it, I have to enable syntax highlighting, then select language x and then switch back to language y, even if language y was already selected when I selected the syntax highlight option. Even hitting manual syntax highlight rescan won't work before those steps have been completed. Also, doing that a few times seems to cause the editor to crash. Yep, that's the usual state of software. Anyway, I'm pretty happy with Python and some of the other tools which I'm using, because those seem to be very robust and I rarely (almost never!) need to waste my time fighting with a broken platform or working around it with horrible kludges.
  • The latest jQuery Mobile 1.4.5 again contains the classic bug where stuff goes under the header bar. Aww. I've seen tons of discussions about this issue. And there are plenty of "more or less" silly sledgehammer work-arounds to just make it work, but none of those are actually pretty solutions at all. Some force CSS to insert empty space at the top of the content, which is silly. Some trigger the browser window resize event, which is silly. Etc. All of these do work, which simply proves that the original problem is a clear bug and I'm not just using the framework incorrectly. Btw. with older versions like 1.4.2 there was no issue like this. It's also silly that when you open the page, everything is ok; open the first link, stuff gets broken; go back and open the same link again, stuff is working again. This is exactly the kind of 'feature' I deeply hate about frameworks, web development and developer tools in general. That's why I like Python so much. If I do something wrong, it doesn't work, or at least it produces similar errors conclusively and repeatedly. Except ... I think I just had an example about this.
  • Actually I think this is just a case where the ordering of events is random, but I don't know why. I would prefer a consistent way of repeatedly showing this error.
    File "peewee.py", line 3507, in __new__
      field.add_to_class(cls, name)
    File "peewee.py", line 1073, in add_to_class
      self.model_class._meta.name, self.name, self.related_name))
    1st run: AttributeError: Foreign key: ***keyN*** related name "***nameN***" collision with foreign key using same related_name.
    2nd run: AttributeError: Foreign key: ***keyY*** related name "***nameY***" collision with foreign key using same related_name.
    When I run that script it shows more or less random related name errors, even if there's no collision with that particular key. Instead it just gives you a related name collision for some of the items being created with a foreign related key. I would prefer it always showed the first collision it encounters instead of interestingly randomizing the order in which it shows the errors.
  • It seems that some of the page rendering & JavaScript problems were caused by CloudFlare. No, I'm not now referring to the situation where the page goes under the header, but to the situation where the page content flashes only briefly without formatting, and when the JavaScript-based page formatting code should format the page, the end result is just an empty page. I'm sure everyone has encountered this situation at times. Solution? Disabling CloudFlare's Rocket Loader feature. After that everything is working perfectly. I'm not sure what kind of tricks CF uses to decide when to use the Rocket Loader and when not to. But the most annoying part of this problem was that it was hard to debug, because at times there were no problems at all, and with some browsers and after a full reload everything might or might not work, etc. So there could be hidden 'ordering' issues where something tries to execute before something it requires has finished loading, and boom.
  • SSD drives might not provide extended data retention when powered off. - Some drives lose data in one year, some drives in 3 months and some even faster. I personally would say that some of those times are much shorter than I expected. So it's not a good idea to buy an external SSD for extended data storage. That's a good example where you still should use a traditional HDD.
  • VENOM - Funny, modern virtual servers hackable via the floppy disk controller. Yes, that's right. Bugs can be and are lurking just about everywhere, especially in places where nobody bothers to look for them. Are you using Xen, KVM or QEMU on your servers? Have you already patched against VENOM? Afaik, this is one of the examples where cloud service providers offer better security than self-hosted systems. They have a real priority on keeping systems secure. When systems are just "run" as a side business or business enabler, but not the primary focus, things like this could easily go unnoticed and such efficient high-priority measures wouldn't be taken. VENOM: "An out-of-bounds memory access flaw was found in the way QEMU's virtual Floppy Disk Controller (FDC) handled FIFO buffer access while processing certain FDC commands. A privileged guest user could use this flaw to crash the guest or, potentially, execute arbitrary code on the host with the privileges of the hosting QEMU process."
  • Read: Final HTTP/2 RFC 7540 specification - I need to write a separate post about what I really think of it, evaluating (my personal opinions of course) the choices they have made. Based on the first read-through, my personal favorite is the GOAWAY frame. I also still agree with the stuff I've been writing earlier. HTTP/2 is now starting to be so complex that implementing it would be a nightmare. It's just better to use a pre-existing HTTP/2 library than to try to make a compatible implementation. This will lead to the situation which has already happened with SSL, where there aren't actually too many options to choose from. I guess most web servers won't even bother to write an HTTP/2 implementation completely from the ground up. Only maybe some large ambitious projects might do it, like Apache, Nginx, IIS. Others will just pass, because it's not worth it. I'm interested to see what kind of approach the uWSGI guys will take with HTTP/2. They seem to be able to tackle all kinds of complex stuff quite easily. I guess they've got a great and really competent team working on it.
  • Something different? Reminded myself about Kilo class attack submarines and especially about the Russian Lada class submarines (Project 677)
  • Does Google Bot run and index JavaScript? - Yes it does. I guess this will again be one of the things making a difference between search engines: some process dynamic JavaScript-generated content and others won't. Which of course could lead to massively better results for the search engines which do process it.
  • Microsoft investing in global submarine cables and dark fiber capacity? Doesn't really surprise anyone. I thought it's quite a clear investment when you're a large enough player. It's better to own than rent, but only when scale truly allows it. 
  • Writing responsive and fluid web pages with HTML5 and CSS using the new picture element, without requiring JavaScript to select the right images for the page.
  • Why you shouldn't use MySQL (or a database in general) as a queue. - Yet it depends so much on the environment you're coding for and on the performance requirements. I generally prefer NOT TO add new technologies and dependencies as long as I can deal well with the existing ones. Why use MySQL if SQLite3 will do the job? Why add a message broker if an SQL database does the job well enough? I personally prefer to use well-known technologies instead of new ones, because the new ones will also bite you. You'll make some kind of naive implementation using the new solution, won't test it properly, and after a while you'll find all kinds of race conditions and other more or less interesting "surprises" because you just didn't know how the technology you're using behaves. As an example, I was quite surprised at one point that select * from table where valuefield=0 produced totally different results than the same query with valuefield = 0.0. Yep, I just didn't know what I was doing, and you're going to hit that kind of surprise several times whenever you start using something you don't know well. I can program using Python, so I assume I can trivially port my programs to JavaScript at any time and run those in every browser. Well, yes and no. It will be pure pain before it works, especially if I don't bother to read the basics and just make a hasty implementation which 'works'. But if there are tables which are used as some kind of queue with a status flag, at least it makes sense to use partial indexes! This is one point I've mentioned several times in my blog already; see the example below.
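    A partial index sketch with SQLite (3.8.0+; PostgreSQL supports these too):
    import sqlite3

    db = sqlite3.connect(':memory:')
    db.execute('CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT, status INTEGER)')
    # Only the pending rows get indexed, so the index stays tiny even when
    # the table accumulates millions of already-processed rows.
    db.execute('CREATE INDEX pending_jobs ON jobs(id) WHERE status = 0')
    job = db.execute('SELECT id, payload FROM jobs WHERE status = 0 LIMIT 1').fetchone()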
  • Did you know that Windows already contains port forwarding / TCP relaying as a basic feature? That's nice. It's really useful at times. "netsh interface portproxy"
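    For example (from memory, so check the built-in help with netsh interface portproxy /?; portproxy relays TCP connections, addresses here are illustrative):
    netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=8080 connectaddress=192.168.1.10 connectport=80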
  • Started to use atexit from the Python standard library for one project which requires quite a lot of cleanup when it exits. Very useful. In many cases I've used the try/finally construction.
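    A minimal example:
    import atexit

    def cleanup():
        print('cleaning up')   # close files, remove temp data, release locks...

    atexit.register(cleanup)   # runs on normal interpreter exit (not on a hard kill)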
  • The uWSGI background worker threads test was successful. So this means that all requests which I assume can be a little slow to handle, due to external API dependencies or requiring reading tons of foreign keys from the database, will be executed in the background while showing a message to the user that data is being processed on the server side. It works. This also prevents CloudFlare timeouts, and of course it follows the best practice that the request itself shouldn't take too long to handle. That's also something you really have to do with Google App Engine, because user-facing request processing time is so limited. But it's not a bad thing at all. The general pattern is sketched below.
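    The pattern stripped of uWSGI specifics (expensive_processing is a hypothetical slow call):
    import threading, uuid

    results = {}                               # job_id -> result when finished

    def run_job(job_id, data):
        results[job_id] = expensive_processing(data)

    def handle_request(data):
        job_id = str(uuid.uuid4())
        threading.Thread(target=run_job, args=(job_id, data)).start()
        return 'Processing on server side, job ' + job_id   # respond immediately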
  • The latest The Economist (9th May, 2015) also has an interesting article about Artificial Intelligence (AI). It's a really hot topic. It's much more than the current "statistical machine learning models" like pattern recognition. Yet deep learning still doesn't match Deep AI or full artificial intelligence. An intriguing concept of an artificial brain.
  • Reminded myself about multiversion concurrency control (MVCC) and TIMESTAMP / ROWVERSION optimistic concurrency control, which lets you read data, process it, and then simply compare-and-swap (CAS) the data into the database very quickly when things are done. A sketch below.
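    The compare-and-swap step, assuming an open sqlite3 connection db and an illustrative accounts table with a version column:
    row = db.execute('SELECT balance, version FROM accounts WHERE id = ?', (1,)).fetchone()
    new_balance = row[0] + 100                       # read, then process freely
    cur = db.execute('UPDATE accounts SET balance = ?, version = version + 1 '
                     'WHERE id = ? AND version = ?', (new_balance, 1, row[1]))
    if cur.rowcount == 0:
        pass   # somebody else changed the row meanwhile: re-read and retry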
  • Checked out Kensho.com - That's great. I'm not surprised it's happening. That's something I would also do if I were in the right position.
  • Don't drown in IPv6 addresses. Hmm, not such an interesting post. As far as I can see, everything in this post was absolutely obvious. The main question is whether you want to provide a reverse name for all IP addresses or just for those which are being used. That's also an interesting question. As said, the address space is huge and it's very likely going to be mostly or nearly completely unoccupied.
  • Robots and AI are going to replace many jobs, but is that a problem? It's clear that there will be drastic changes in the future. Many jobs will be replaced by robots, and supporting businesses will go away as well. Yet this will free people for more productive and better jobs. A bit more writing about self-driving cars in the US.
  • Because my LclBd.com test project is now pretty much working as I wanted it to, I'll focus next on machine learning Python libraries. The only things which remain to be fine-tuned are a few JavaScript things and some heavy background processing of data, which are technically trivial tasks but just require the right mood. To produce nice visualizations from the data I'll be using Tableau, which is one of the data discovery and visualization tools I really love using. I've also got some pretty nice data sets which I can use for testing; unfortunately they're such that I can't publish the results directly. But I think I can then utilize the learned lessons and resulting models for something else later.
  • I've always been wondering why high-end vending machines aren't more popular. Wouldn't it be optimal, at least in cities where you have expensive square meters and so on? You could simply order what you want, and when you're at the store you can just pick up the goods which are there ready and waiting for you, instead of waiting around. Of course the next step would be automating the delivery from that point on. Having a picnic in a park? Running out of wine and cheese? A few clicks on mobile and stuff will be landing there in two minutes. Even without this delivery method, I thought this kind of fully automated pickup point would be nice near metro stations and so on. There was a story about a small village which isn't large enough to run a profitable store. They replaced the store with fully automated vending machines. So this even works in cases where there are too few customers for a traditional store with personnel. Is it time to create a container system for this kind of vending system? Refilling, transport and everything could be fully automated. In many European countries you can just pick up your mobile data pre-paid SIM card from a vending machine. I was disappointed to notice that it wasn't that easy when I traveled to the US for the first time.
  • Is blocking ICMP on the firewall a bad idea? - Been there, done that. I've also tried blocking protocol 41 and DHCP, and yes, you'll end up breaking the network. I've also seen tons of networks where DNS is more or less broken.
  • Checked out Vortex Bladeless vertical wind generators - I wonder if you can call it a wind turbine, because it isn't a turbine at all.
  • SSL Labs now rates my SSL/TLS configuration as A+
  • Failure of Agile? - I really liked the GROWS method. - Who said that the agile process itself wouldn't be agile? Of course if you see a need for modifications you'll adapt it to your needs. Shouldn't that be clear to everyone? I personally think that the failure starts at the point where talking about process politics becomes much more important than what we're actually trying to get done. It's an absolute loss of focus. I think it partially belongs in the category of Analysis Paralysis. Next time you need to carry something from the till to your car, hire a personal trainer, a process consultant, a physiotherapist, an environmentalist and a few other guys, and hold meetings for a few months to figure out the best way of getting your groceries to the car. Are you really using plastic bags? Have you researched the environmental effects of plastic bags? No, I think we should set up one new team to research that topic too. - So much fail. Yet I've seen that happening over and over again. I personally know several engineers who always seem to prefer this way. Whatever the task, and however simple it is, they can spend months on it, producing absolutely nothing of value, just because it's 'important to research this topic'. How about just getting it done in a few hours instead of using months? I just wonder why "smart programmers and engineers" often seem to have a total lack of common sense as well as extremely poor or even non-existent understanding of Return on Investment (ROI). - https://en.wikipedia.org/wiki/Return_on_investment - Where's the practical approach to it? If the customer is paying N units for stuff that does X, do you get the customer to pay 20x the price if you do it 'stupidly well', researching everything and writing "so great code"? All that it takes is that it does the job and works reliably. Everything else, all kinds of coolness and research junk, is just an absolute waste of time and reduces profits from the job. Sometimes I even see people involved in sales doing similar kinds of stuff without properly considering it. It's horrible; they, if anyone, should be very knowledgeable about the costs and profitability of the stuff being sold. It's different to do something as a hobby and to do it as a profitable business. I don't care if you've been building that absolutely picture-perfect WWII battleship model in your cellar for the last 5 years. But I do wish you very good luck trying to sell it at a good hourly price to someone. I'm also often wondering about employees who seem to be absolutely clueless about profitability. Isn't it everyone's job to take care that they work profitably for the company? Of course one factor affecting this is that many people are absolutely clueless about what their work is worth. Also, focusing on tasks which produce long-term savings (which increase profit too) or direct profits can be a very good investment. Unfortunately, many times the guys investigating something 'cool' or 'perfect' do not focus on those aspects. Things like renegotiating or changing service providers for network connections or servers, reducing licensing costs, automating processes, system integrations and so on can easily generate savings which can be counted as 'passive income' and be several orders of magnitude larger than your yearly salary, even on a monthly level. That's what makes you valuable. Over-engineering something for 'one off' cases just makes you a drag on the whole organization. A self-guided attitude also helps a lot. 
But only if you can smartly figure out what you should be doing, so that it's most beneficial for the whole organization. I always remember what one consultant working for a large ERP company said: "My salary is so high that if I'm not always invoicing all the work I do, I'll get fired pretty quickly. I just need to make sure that my work is worth it to the employer and its customers." - Yet it seems that it's quite a small percentage of the workforce who get that. You'll get agile when you put a small team of competent guys with the right focus to get the job done, and don't give them any unnecessary rules for how to organize the thing. They'll figure it out almost immediately, without wasting time by starting to create some kind of more or less useless rulebook. This makes common sense: these are your strengths, you'll do that, let's run iterations, ask for opinions when needed. Let's just get this done and delivered. And most likely the result is good enough and it gets done pretty quickly. Also, whenever something is unclear, iterate really quickly using light drafts and then code it. When you compare that to the 'other teams', they still might be discussing some initial topics, like who should be put on this team and what kind of external consulting offers we should ask for. (Yawn)
  • This also reminds me about Parkinson's law of triviality aka bikeshedding.
  • Also see the GROWS method website for the future of discussing topics and wasting our time instead of getting something else done. Yeah, this is kind of a joke; something like GROWS is a good idea. But it's a completely different question whether I personally need it. I think I've got enough experience in this business to draw my own lines and optimize per case quickly, instead of using some more or less suitable fixed rule sets. When someone says do X, I'm always asking why; if it seems reasonable, can this be done better; and then the next question is whether it's worth planning how to do it better. Very often the answer is no, it isn't.

IPv6, vCPU vs CPU, Duplicati, VAT, Profiling, Cold Storage, Mobile Retail, Customer Service, Collections

posted May 10, 2015, 12:55 AM by Sami Lehtinen   [ updated May 10, 2015, 12:56 AM ]

  • DNA announced that it's going to provide IPv6; it's the first major operator in Finland to do that. Sonera has been offering limited 6rd so far. DNA IPv6 info.
  • Reminded myself about the Finnish Communications Regulatory Authority's (FICORA) IPv6 regulations and recommendations (200/2014 S, TR-177, RIPE-554, 28 H/2010 M, MPS 28, 13 B/2011, RFC 7084, RFC 4192, RFC 7010, SLAAC, DHCPv6, RA, DHCPv6-PD, Prefix Delegation, DNS, NTP, MRO flags, Managed Configuration, Other Configuration, RFC 4890, ICMPv6, 3GPP R 8, 9, 10, LTE, UMTS, RFC 5969)
  • In Finland it's recommended that everybody gets at least a /56 IPv6 network prefix. It's also recommended that users receive "long term addresses", meaning that the customer will be using the same addressing for extended periods (years); the recommended valid address duration is 30 days, which allows the address to be refreshed at any point during that time. Long-term addressing also makes privacy addressing more or less useless, since you can track users based on the /56 subnet level alone.
  • IPv6 is an important part of Net Neutrality, because without IPv6 there's clear discrimination towards users and operators who do not happen to have IPv4 addresses.
  • Duplicati claims that the remote directory does not exist, without even connecting to the server. A classical trap. The error message is 100% misleading and contains pure lies, and it doesn't have ANYTHING at all to do with the actual problem. It's just so annoying to see programs do stuff like this. On the other hand, when you know how to create absolutely misleading error messages, you can place those wherever you want in case it's required. Investigating a problem which has nothing to do with the underlying issue will surely waste a lot of time for the whole team. After having lunch and thinking about it again, I found the problem. Problem? The IP address isn't whitelisted on the FTP server level. This causes the FTP server to completely reject the binding attempt -> no log entry about the rejected connection. And Duplicati gives the totally misleading message that the remote directory doesn't exist. It should say connection refused. Yet it just doesn't do that. Giving false information and misleading error messages just wastes everybody's time, like it did in this case.
  • Another favorite type of error is when getting error 666 one second after starting means something different than getting it 5 seconds after starting, and so on. When you run the program in an unfamiliar environment and get error 666 after 3 seconds, which of the 666 errors is it? Have fun!
  • How to make bullet-proof Python programs which never crash? Just use a totally generic catch-all error handler with nice descriptive error messages and your program won't ever crash. Btw. this awesome error handling module is so generic that it works for every application. Intuitive and informative error messages also make all other documentation of the project unnecessary, saving a huge amount of time when developing new applications and software. Don't ever waste your time writing man pages or command line help. Even source code comments are now a thing of the past.
    Just add to your program:
    try:
      code_here()
    except:
      print('Error')
    # Now you can handle all exceptions and your program never crashes! It's always ehh, controlled termination.

  • This is funny: people often say more cores is better. But that's not the way it goes. Like in a case where there's an option to choose 100% of one Xeon core, or an option to choose 8x vCPU, which shows up as 8 cores to the virtual machine. It might well be that even all of those 8 vCPUs combined don't produce the same performance as the one real Xeon core of whatever model is being used. So don't get confused about "getting so many cores"; it means nothing. And having more cores can be significantly worse than having fewer, especially in cases where the applications won't utilize the cores fully. Of course this helps in situations which I've described earlier, where really bad code is run at high priority and then "everything is jammed" and adding more cores solved the problem. But uuh, how about just fixing that crappy code or running it at lower priority? Having more, slower CPUs can also bring up many kinds of locking issues which you wouldn't have encountered otherwise: race conditions and stuff like that. It's so common that code isn't perfect and there are flaws, but with a few fast cores it's much less likely to hit these than with tons of really slow CPUs. With systems using traditional locking this also leads to worse lock contention, because the time spent processing data while the lock is held is longer.
  • Had an interesting meeting about mobile payments integration. It's interesting to see what will follow. I know there are projects which are full of hype, where lots of resources are spent and nothing practical comes out. But let's see and hope that in this case there will be some kind of useful end product. At least they have good prospects for getting this stuff to work. But isn't that what everybody says?
  • There's so much to do with server subcontracts: security issues, invoicing procedures and modes, how invoices get handled correctly and the correct VAT paid, and so on. Some service providers charge every service individually, which is bad, because it produces so many invoices. Then there are service providers which charge, say, monthly, or once a certain amount of services has accumulated; when that happens they can provide the VAT calculation for the whole sum at once. This is of course very efficient, but requires a credit account. Then there's the same model just pre-paid: you pay something beforehand, and when invoices come, they are charged from that account. But now it's a good question whether that account is with or without VAT. In some cases it's with VAT, and that can be a problem, because the account charges do not separate the charge sum from the VAT; the VAT should be individually picked from every service charged from that account. It's quite a problematic way to deal with it, because if I pay, let's say, 10000€ including VAT, there's no VAT being paid yet. The VAT separation happens later, from the invoices paid from that account. It's a horror for bookkeeping. Some other providers do it so that when you charge the account with 10000€ they actually invoice 12400€, and the VAT separation happens at that stage. I personally think that's a better way to deal with VAT, because it makes things much simpler than the solution where VAT is separated later. The only case where that won't work is of course when the services being charged from the account can contain multiple different VAT classes. That, if anything, is going to be interesting. So, four different methods: direct invoicing, batch invoicing based on invoices (via credit), paying from an account where VAT is separated when charging the account, and paying from an account where VAT is separated when using the account. It's just wonderful how much mess and discussion you can get from simple topics with the accounting department.
  • Why is performance profiling so important? Well, fixing just a few methods / functions in the whole program can easily make it 90% faster. This was the case once again with one project: over 90% of CPU, memory and database access was caused by a single 42-line function. Fixing that made everything run so much faster and lighter.
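    Finding those few functions is the easy part with the standard library profiler (main() stands for your own entry point):
    import cProfile, pstats

    cProfile.run('main()', 'run.prof')              # profile the whole run
    stats = pstats.Stats('run.prof')
    stats.sort_stats('cumulative').print_stats(10)  # the top 10 usually tells the story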
  • Got a small headache with JavaScript, but via trial and error, reading and retrying you'll finally learn. Yes, JavaScript isn't Pythonic, but I'll survive. It just takes a bit more time.
  • Excellent analysis of the Aerospike database. This is exactly why you need to know the implementation details well enough. If you just go and assume things, you're going to fail badly.
  • Was Python 3.X a mistake? I personally think it certainly wasn't. Here's a nice comparison of the situation with 'new' versions of Perl and PHP.
  • How to connect IPv4 systems to an IPv6 network. That's a bad one, because every system should support IPv6. Yeah, I know, we're all going to deal with that pain at some point in the future. The simplest solution would be a TCP relay proxy afaik. - The rest of my thoughts at LinkedIn: That's totally application dependent. Operating systems and networks also affect it. If there are some heavy legacy reasons why this can't be done, I would use a separate proxy. Of course this assumes you can freely control the connection points. So instead of connecting to the IPv6-only server, connect to an IPv4 server which can relay the connection to the IPv6 server. That's the easiest thing I would use if I encountered such requirements with legacy hardware which can't use IPv6 at all. Of course there are several ways to do this, but the simplest setup would be a normal workstation with proxy / relay software, which is pretty trivial to set up: when connection X comes in, connect to Y and relay all data back and forth. Been there, done that, several times. (Just like ssh port forwarding, or a Tor hidden service, nothing special there.) You aren't providing any useful information with your questions, so I don't really know what the preferred solution would be. Technically this has nothing to do with NAT. Run one box which connects to both networks (v4 / v6). Then you make clients connect to this relay on port X, which in turn connects to the server on port X using TCPv6, forwarding all traffic. Potential problems? Security, authentication, and of course if the server limits connections by IP or so, all IPv4 connections now seem to come from one IP. But I assume this is an industrial setup or something, so I'm sure you can work around both of those limits in limited / private network(s). These are really hard to judge when you don't know what the connection(s) are being used for. A toy relay sketch below.
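    A toy version of such a relay (listens on IPv4, forwards to an IPv6-only server; 2001:db8::1 is a placeholder documentation address):
    import socket, threading

    def pump(src, dst):                  # shovel bytes one way until EOF
        while True:
            data = src.recv(65536)
            if not data:
                break
            dst.sendall(data)
        dst.close()

    listener = socket.socket(socket.AF_INET)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(('0.0.0.0', 8000))
    listener.listen(5)
    while True:
        client, _ = listener.accept()
        server = socket.create_connection(('2001:db8::1', 8000))
        threading.Thread(target=pump, args=(client, server)).start()
        threading.Thread(target=pump, args=(server, client)).start()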
  • Under the hood - Facebook's cold storage system. Really nice article. I have to say they sure did choose a no-frills approach, because there was nothing particularly interesting or new; that's just how data is being stored. I've also blogged about erasure coding and Reed-Solomon error correction several times. And as they say, it's ages old and well-known stuff. So I was a bit disappointed after reading the article and finding out that there really wasn't anything new. Only the 1 exabyte scale they're doing it at is of course something amazing. But that's just scaling up the concept.
  • Retailer Mobile Strategies Start With Driving In-Store Sales - What might I like? After all these years and QR code hype, I might be ready to see QR codes on shelf labels which would offer extensive product information on the customer's own mobile device. Stores should also provide tablets which you can use to get the same information if you don't want to use your own device. It's amazing how often product information, which is a key factor in selling, is bad or nearly non-existent. One Finnish on-line store which is "leading the market" is selling backpacks online, yet they fail to tell the size of the backpack. They just mention the height of the person the backpack is designed for. What kind of ... Yeah, that's just so much fail. This same chain specifically says that they prefer to provide perfect product information to customers, because it makes the sales transaction efficient and quick, as there's no need to talk with customer service. Yet they leave key details out of their descriptions.
  • The customer service approach relates to so many aspects of on-line trade. I often hear people saying that they got such good customer service. But my personal opinion is that the company is doing something really wrong if customers need to contact customer service in the first place. I've encountered so many customer services, especially with larger service providers, which are extremely nice, polite, whatever. But there's just one fail: how about getting the thing just done, and not wasting time talking about it? So companies boasting about their customer service probably suck. If things work out well without ever contacting customer service, that's the way things should work.
  • Checked out some RIPE IPv6 address allocation stuff. It seems that the smallest IPv6 block they're willing to allocate is a /32 address space. If you need anything smaller than that, you should just contact your ISP.
  • My thoughts about the new Google+ Collections feature. "Google+ Collections? Well, that's a nice way to group posts. Because I've always hated the way you need to follow a "person" or "page" instead of a "topic". This helps a lot. Let's say someone posts 99% cat pictures and funny theme stuff. But the 1% that's left is absolutely amazing original technology-related posts which I really don't want to miss. Should I see all that 99%? No thank you, please. So now, if the poster uses collections correctly, this will fix the major issue I was experiencing. I've got a few people in mind whom I really hope will start using collections properly asap. I also hope that collections could be applied to posts made to communities. Aka traditional cross-posting or labeling."
  • Nice basic TED talk about AI: what happens when it gets smarter than us? Nothing new there. It's just good to remind yourself about this topic every now and then.

Ricardian Contract, SIGAINT, NoTCP, PEP484, RUDP, PM, Business Culture, Efficiency, DBaaS, Hyper-Convergence

posted May 3, 2015, 9:46 AM by Sami Lehtinen   [ updated May 4, 2015, 10:07 AM ]

  • Studied Bounded Futures Ricardian Contract for OpenBazaar
  • Docker without Docker - Docker is just a standardized bunch of techniques which already existed
  • Studied: CopSSH and played with it to familiarize myself with the product. Simple and easy to use and understand, which is generally a good thing.
  • Once again wondered how 'efficiently' software engineers handle data transfers. Their way of doing it? Opening remote desktop connections to two servers and then using the clipboard to transfer ~200 GB of files, and then complaining it's slow and often unreliable, especially when there are individual files which are like 50GB in size. They spent two days trying to transfer that data from one server to another. I just couldn't stand it. Solution? Compress the data using 7-zip and transfer it using SSH. Result? Compression time ~40 minutes, transfer time ~20 minutes and decompression time ~30 minutes. That's it. That's more like 1,5 hours than several days. Even if you think people can do stuff, it still seems to matter who's getting it done. There are huge efficiency benefits gained from doing things efficiently. Yes, no surprise there. I've earlier witnessed cases where engineers try to move stuff from one computer to another using a USB stick. It's also laughable. Usually they transfer a lot of stuff that doesn't need to be transferred; secondly, they transfer tens or hundreds of thousands of small files, which is for sure totally inefficient. They might try to transfer larger files using FAT32, which of course fails (FAT32 can't hold files over 4 GB). In this case efficiency can be hugely improved, and when you do it inefficiently enough it's not going to work at all, which is a bonus on top of it being just darn slow. So please: only transfer what's needed, compress it, move it efficiently (USB or network) and decompress. It's not that hard, yet it seems to be really hard for many. Something along the lines of the commands below.
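    For example (paths and hosts are illustrative):
    7z a -mx=9 data.7z /srv/only/the/needed/files
    scp data.7z user@target:/tmp/
    ssh user@target '7z x /tmp/data.7z -o/srv/restore'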
  • Privacy is really hard. You should train for it, so you have the capability if and when needed. All this surveillance forces journalists to think and act like spies.
  • I was really excited about future prediction markets at one time. OpenBazaar allows easy P2P contracts directly between individuals, and that could bring a totally new kind of financial and predictive speculation (aka betting) market. One interesting field of development would be fully automated notaries, which would read data from reliable sources and automatically, quickly and efficiently settle any payment contracts which are linked to the outcome of those events. Like sports, politics, stock prices and so on.
  • Studied: General Data Protection Regulation (GDPR). Related: Data Protection Officer (DPO), Data Protection Authorities (DPA), EU, European Union.
  • The SIGAINT project was targeted via Tor exit nodes. No news there; it's a well-known point of interception. It would be much smarter to use encrypted communication if you're worried about security. Or just run a hidden service to begin with.
  • Type Hints: PEP-484 Adding Gradual Typing to Python 3.5.
  • Refreshed memory and current status about Nofollow usage.
  • I've greatly improved my CSS skills. I get things done reasonably quickly. I'm not an expert, but getting stuff done is no longer pure swearing, reading more and more documentation and experimenting for hours over why this ... just doesn't work out. I've got a clear idea of how to accomplish what I want.
  • One of the major challenges I handled was a large database migration script for one project. Tons of tables getting renamed; fields added, removed and, most importantly, remapped to a new structure using an additional conversion program (Python) which I had to write. I got it done. Of course it was kind of trivial in my mind, but making sure that everything goes absolutely right when the script is run during a short service break got me a bit stressed at first. But after testing and fixing stuff I got confident about getting it done. I also found a few bugs in the main application I wrote the script for, which also got fixed during the same update.
  • E-receipt and m-receipt points: fragmented environment, investment efficiency (investment / benefit), comfort and usability (off-line environment in e-society), accounting and reporting, analytics and cost management.
  • Used findstr on Windows Server for the first time. It's always nice to learn something new.
  • NoTCP - Nothing new there. TCP works well enough and is easy to use. Implementing data transfer over UDP in a sane way is a monumental task, and some of the benefits are lost with it. The fact that TCP uses a three-way handshake prevents spoofing; with UDP you have to do something similar, otherwise your system is just going to be one huge DDoS amplifier, which is, well, not a good idea. I've seen some projects use stuff like RUDP and I can say the communication over UDP was a lot worse than over TCP. I commented on one discussion: "Tons of discussion and nobody mentioned this: https://en.wikipedia.org/wiki/Reliable_User_Datagram_Protoco... Reason for using TCP is that using UDP requires application specific implementation and making it good and efficient would require more work than making rest of the software. That's why the applications only use UDP where it really does make difference. Others just won't bother because it isn't worth of it.."
  • Reminded myself about ARQ and NAK; these are familiar from good old phone line modems, and of course PAR, which is used by TCP. Related: NACK, SACK, RFC 4077
  • Microsoft launched .NET for Linux. That's nice. No need to fight with old Mono versions and software compatibility problems. (Hopefully)
  • Microsoft added PackageManagement to Windows 10. That's absolutely great news. I've been using Chocolatey, but hopefully in the future it's possible to use the native package manager. 
  • I keep wondering why so many Finnish sites are hosted at Amazon EU West, even if it's clearly slower than Amazon EU Central when accessed from Finland. If the options were Frankfurt and Amsterdam, that would be a tough call, because it varies from operator to operator. But choosing between Frankfurt and Dublin is not; latency is always worse from Finland to Dublin. Some routes from Sonera Finland to OVH Roubaix seem to fluctuate between Frankfurt and Amsterdam. Whenever running a trace, you don't know which path the packets are going to take.
  • Some stuff in this post is really surprisingly negative. But actually we should learn from it. It is what it is, but how can we fix it?
  • Lol, some say programming is hard. But it's the same thing with everything. I was really laughing when one of my friends had a huge struggle with GIMP and LibreOffice Writer to get things right. He kept cursing for hours because things simply wouldn't work out. I told him that, duh, that's just like programming. But eventually you will learn, just keep banging your head. GIMP layers and adding text, as well as page numbering in LibreOffice documents, can be a real pain in the ... if you don't know how to do the thing. How about using paragraphs which reset the page number, or using title pages and a page number offset? But if you don't know that, it's rage and quite a good while spent reading tens of different more or less incorrect instructions on how to accomplish it. Yes, it's exactly like tuning Linux, Windows or programming. Lol again. Just look up how to do X using CSS or how to configure the Linux network stack to ... and you're going to have plenty of fun. Or like in my case, joining stuff using Peewee in a very specific way where I also want to get the rows which do not have joinable references. (A sketch of that below.)
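    The Peewee trick in question is a LEFT OUTER JOIN plus an IS NULL filter; a sketch with hypothetical models (Peewee 2.x style):
    from peewee import SqliteDatabase, Model, CharField, ForeignKeyField, JOIN

    db = SqliteDatabase(':memory:')

    class Post(Model):
        title = CharField()
        class Meta:
            database = db

    class Comment(Model):
        post = ForeignKeyField(Post, related_name='comments')
        body = CharField()
        class Meta:
            database = db

    # Posts that have no comments at all still come back from this query.
    query = (Post.select()
             .join(Comment, JOIN.LEFT_OUTER)
             .where(Comment.id.is_null(True)))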
  • The world is so full of *t software... I'm still highly amused about PDF files. First of all, people seem to think that you can fill PDF forms; nope, you can't. It sucks. Another really amusing misunderstanding is that PDF is somehow a signed and encrypted format and you can't modify it, which is yet another purely delusional belief. The world is just so full of crap software. Even basic things like office applications are just so hopelessly badly coded. Due to these limits I always keep all of my stuff on my own computer, in pdf, odf, docx, doc and other formats, and I have the working application to show it, etc. Some formats embed fonts and others won't, and some systems render unknown fonts really funnily; like Windows shows the missing Ubuntu font as Script, etc. So if you save a document as docx or odf and the recipient doesn't have the font installed, the document or presentation gets totally screwed up, and so on. It's just baffling to notice how full of bad and really bad software the world is, and how much suffering, pain and loss of productivity all that crap causes. Yet another thing is how overly complicated software is "cool". I think that software which is fast, light, simple and does what's essential is the coolest thing ever. Any complex standards, or really super complex standards, are horrible, making everyone suffer: users, developers, just everyone. It also takes resources away from the truly profitable stuff. So why is anyone making such horrible software? Maybe there's just some kind of need for it? Because we're paid to do it? Yuck! Lean, mean and working. That's the way.
  • I've seen projects which have used millions of euros to produce, well, absolutely nothing but confusing and poor very high level documentation without any practical details. The worst part of all this is that the solution is so obvious that just a few nerds meeting for half a day would have solved it. But that would be just so wrong, getting something done instead of holding useless meetings where discussion starts from absolute zero over and over again and nothing is actually produced. The good thing is that the catering service was good. I think this is an excellent concept for the pointy-haired boss from Dilbert. But I don't personally like it; I would get the thing done and that's just it. Actually, at one point I was totally shocked at how accurately Dilbert describes the software and IT industry. Smile.
  • Yet another thing I really hate is "let's just discuss this matter". Well, does the discussion change any of the facts? No? Does it help in any way? No? What about going straight to the facts and looking for a real solution, instead of wasting our time over dinner on some high level talks? It doesn't solve the problem and won't be helpful in any other meaningful way. Like: show me the API documentation and tell me what it costs. Talking about organizational history or how cool the tech you're using is won't help. It won't make things happen, like fixing the problem, creating the integration, or making stuff work like the customer wants it to work. This is not personal, this is business, and getting things done, delivered and solved as fast and efficiently as possible is preferred. Please send a clear agenda and documentation before the meeting, describing what is being offered, what the plan is, what it costs and what the benefits are. When you do that, I can decline before the meeting and we both save time. It's just pointless to have pointless meetings.
  • It's interesting to notice that in Estonia they do the things which Finnish people talk about. And in Sweden, aww, they're still planning to have a meeting about preparing the meeting and thinking about what should be discussed at the meeting. (When the Estonians already got the specifications done and they're now actually working hard on the product.) There's a huge difference between looking like you're working and busy, and actually getting the thing done. 
  • Just be straight and honest and go straight to the topic; if you can't get it done, just say so. If you say you will get it done, try really hard to get it done. If you don't, well, that's too bad, you look bad, and if you do it a couple of times, or when things are important enough, you won't be trusted anymore. I value being straight and honest more than being polite or politically correct. I really don't like hedging around a topic; just say how things are and that's it. We save time and can proceed to fixing the potential problem, instead of trying to guess what the problem is.
  • I've seen some open source projects make absolutely great progress. A team of 5-10 competent and efficient people who want to get the thing done can achieve miracles. Compare that to organizations where all kinds of 'overhead' are being done, and years pass, and stuff which would already be completed is still being considered: should we do it, can it be done, and how. How about ceasing to wonder about things and getting it done? It's a kind of analysis paralysis. Which I admit I'm at times suffering from personally, especially when talking about the stock market or investing.
  • Some people say it's good to be always happy and always smiling. But it's not, when things won't work out. I'm happy and smiling when the results are excellent. In larger organizations I've noticed that people are totally disconnected from the reality of what's productive and what's not. It would be interesting to see those guys as independent entrepreneurs (www.entrepreneur.com) working alone. I've heard a few complaining about the salary and how 'easy' it would be to make millions as a lone developer. Smile. Especially if your focus and productivity are, ahem, a bit lost. And that's putting it mildly.
  • The strange system freezes at OVH ended when I complained about them. It might be that the HOST was running out of memory and swapping. Yes, that's also a reasonable optimization when trying to run as many servers as possible on the cheapest possible hardware. Even if the VM isn't out of memory, the HOST could still swap memory out and the VM might not reveal it. A totally reasonable optimization, because systems do reserve and keep a lot of memory which isn't usually being used. And now it's usable; things are just a bit slower when that memory is actually needed.
  • Studied EU Data Retention Directive (DRD)
  • Tried Debian GNU/Hurd in a VM. Yes, it works, and at the bash and application level it looks just like any Linux. Of course there are major gaps. I assigned 8 gigs of memory for it and on boot it said it was limiting memory to ~1.9 gigabytes. Etc. But getting even this far is already a really great thing.
  • Grooveshark shut down. That's sad. I really liked Grooveshark; it was the best music site on the net, afaik. So sad to see it gone. I also recommended it to all of my friends, instead of Spotify. I also liked Pandora, but it doesn't work in my country.
  • 6 Tips for Google App Engine from Streak. No non-obvious surprises there.
  • Mozilla wants to deprecate HTTP and make the whole web 'secure'. The whole point of certs is to verify site ownership. If getting certs is too easy, well, then they're worthless. As already happens to be the case. Email verification of domain ownership isn't good verification at all. Such certs, even if trusted, are no different from self-signed certs. IMHO.
  • Pentagon announces a new strategy for cyberwarfare. MAD for the Internet, cool?
  • It seems that my personal email server is handling about 6k mails / month. That's like 190 / day. That's actually quite horrible when you think about it.
  • Checked out EmDrive and Cannae Drive. Related: RF resonant cavity thruster, spacecraft propulsion, microwaves, magnetron, reaction massless propulsion.
  • Some analysis on WhatsApp security. As usual and expected, it seems to be weak.
  • Rehearsed my Morse code and reminded myself about tap code messaging protocols.
  • The data center and server arrangements cut purchase and production costs for many services by over 80%. That's immense! Think how much it adds to the profit margin if customer prices aren't cut. Also a good article on how reducing costs possibly saved YouTube.
  • What we can learn from history: if there's a way to communicate at all, then there's also a way to covertly communicate over that channel. It's just a matter of encoding, and it's nearly impossible to prevent. Camp X, Clandestine Warfare, Covert Sabotage, Guerilla Warfare.
  • DataBase-as-a-Service (DBaaS) aka Cloud Database. That surely fits some purposes great, and others very badly. But it's good to have the option available for cases where it's suitable and usable.
  • I still can't stop deeply hating bad code, documentation and error messages. It's so extremely frustrating when things which are really trivial and should take less than a minute end up taking days or weeks, or never get solved at all and require some kind of complex, obscure workaround which poses tons of new potential for risk and catastrophic failure. I guess we all know that. It's just like selling cheap crap devices which are unreliable: you'll end up wasting tons of time trying to get the stuff to work, and it might even randomly work before once again failing and causing even more loss of resources, time, money and mental energy.
  • Peewee add_column works well, but drop_column fails with an OperationalError message which doesn't give any hint about what's wrong. Given how simple the command is, it's quite clear that there's some kind of annoying problem somewhere, and it's better to add and drop the columns using pure SQL than to waste your time playing with an ORM which obfuscates things and generally makes everything extremely painful. - After researching this deeper and trying all kinds of combinations, I think the problem was that some of the libraries I imported for the migration connected to the database, so getting an exclusive lock was impossible. Still, it would be really nice to get something better than a bare OperationalError; "Unable to exclusively lock database" would have been a nice message. But now everything works again. See the sketch below.
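    Roughly what I was doing, as a minimal sketch (SQLite, with hypothetical table and column names; playhouse.migrate is peewee's migration helper):

    from peewee import SqliteDatabase, CharField
    from playhouse.migrate import SqliteMigrator, migrate

    db = SqliteDatabase('app.db')  # hypothetical database file
    migrator = SqliteMigrator(db)

    # Adding a column works fine...
    migrate(migrator.add_column('article', 'notes', CharField(default='')))

    # ...but dropping one may raise a bare OperationalError if some other
    # imported module has already opened the database and holds a lock.
    migrate(migrator.drop_column('article', 'notes'))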
  • This of course seems to be pretty normal in business and in managing things in general. Competent developers or managers won't bother to do something which is clearly required and useful. They push the problem downward to other people. The one guy who could have done it quickly and efficiently, with good knowledge, and shared the results with the rest of the organization, just won't bother to do it. What's the way to get it done then? Just pass the problem to be solved by 1000 incompetent guys who don't talk to each other. The win? Now we as an organization managed to waste about 100000x the resources that would have been required to solve the problem, compared to the situation where it would have been taken care of efficiently. As a bonus, while the incompetent guys tried to solve the problem they possibly seriously messed things up and caused further customer dissatisfaction, bad PR, arguments, crisis meetings with customers and a general huge loss of energy, mood, money, time and resources in all departments. (Hopefully not loss of life.) Yes, that's just how things seem to work.
  • This applies just as much to stuff like software installation and upgrade scripts as it does to manufacturing in general. Yes yes, I know the quality of the cement or steel used to build the bridge was inadequate. But no problem, the bridge won't fall apart before we can get rid of this fake company and run with the money. It's then someone else's problem to deal with the emerging catastrophe. I guess this is pretty much the mentality when constructing buildings around the world.
  • PSYOP - They're manipulating you! Don't listen!
  • Read about Saab Kockums A26 submarine
  • Read a book about the Industrial Internet: Data Analytics, Data Collection, Data Discovery, IoT, Sensor Networks, Remote Monitoring, pre-emptive service, pre-failure alerts, digitalization, business models, Fiware, Data Visualization, fragmented standards, user interfaces, usability, efficiency, cost benefits, productivity, investments. Message Queue Telemetry Transport (MQTT), AMQP, CAP, XMPP.
  • Smile. Some say that IoT needs to be on the same level as M2M to break through. That's never going to happen. Why? Well, it's just like how your random desktop application isn't on the same level as the flight management system of a modern airliner. The resources needed are quite different. For IoT it's mostly enough if it works (if it even does that!), while for M2M / industrial internet the software requirements are absolutely different. This somehow reminds me of the Airbus overflow problem which could cut power from the whole airliner and prevent flight controls from working. But that's something which happens with consumer products all the time. I assume you've got hard drives? When have you updated the firmware on those? Do you know which version you've got? Do you even know if there's a new version available and what the changelog says? Of course you don't. And this is just how things are going to be in the future too. Do you know what firmware version your digital camera is using? No? Do you know the firmware of your toaster, fridge or... You get the point. Most don't know, and even those who know mostly don't care. Not before things actually stop working. Have you checked if there's a firmware update for your monitor or TV? Yes, a botnet of appliances will be here; it seems just inevitable. Lol, one article said that securing networks can be done using 20 usd/eur firewalls. Did you btw know that those often have security issues too, and the devices require firmware updates and so on.
  • Hyper-Convergence - It's kind of cool to re-invent things. Getting rid of the SAN. Who said there should have been a SAN in the first place? Just install servers and use software to coordinate everything, including replication to different disks. SAN was a bad solution because in most cases the disk space prices were way too high compared to other kinds of solutions. (Software-defined storage, software-defined networking), data center, rack, servers.
  • LclBd.com now uses a fully threaded discussion model, making it much friendlier for long and deep conversations. The next step is to optimize the per-user customized views, because currently compiling those from the database can take too long for a web app. I guess some kind of "data is being processed" message is the first step, and the second is storing intermediate results in a cache to improve performance, so the most computationally or I/O (database) expensive steps don't need to be recomputed on every reload. The idea in miniature below.
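    A sketch with made-up names (not the actual LclBd code): cache the expensive per-user view with a TTL so reloads don't recompute it.

    import time

    _view_cache = {}   # user_id -> (timestamp, rendered_view)
    CACHE_TTL = 60     # seconds; hypothetical tuning value

    def get_user_view(user_id, build_view):
        """Return a cached view if it's fresh, otherwise rebuild and cache it."""
        now = time.time()
        hit = _view_cache.get(user_id)
        if hit and now - hit[0] < CACHE_TTL:
            return hit[1]
        view = build_view(user_id)  # the expensive database / compute step
        _view_cache[user_id] = (now, view)
        return view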

PEP 492, KaPA, XBRL, Loon, Tableau, Pwnie Express, Cloud Prices, MongoDB, Great Cannon, Containers

posted May 2, 2015, 9:00 PM by Sami Lehtinen   [ updated May 2, 2015, 9:00 PM ]

  • Checked out: PEP 492 - Coroutines with async and await syntax for Python 3.5
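    The new syntax in a nutshell (a trivial runnable example of my own for Python 3.5 + asyncio, not taken from the PEP itself):

    import asyncio

    async def fetch(name, delay):
        # 'await' suspends this coroutine without blocking the event loop.
        await asyncio.sleep(delay)
        return name

    async def main():
        # Run both coroutines concurrently; gather returns results in order.
        print(await asyncio.gather(fetch('a', 0.1), fetch('b', 0.2)))  # ['a', 'b']

    asyncio.get_event_loop().run_until_complete(main())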
  • It's just wonderful how 'simple' the MS guys have made freeing up disk space on a server. How hard can you make managing disk space? At least Microsoft seems to be trying hard to make it as complex and annoying as possible. I would call it user experience, but in a really bad way. Btw. the 2012 version keeps complaining that the binaries are 16-bit, which is clearly a lie when the binaries are from the wow64 or amd64 directories. But no can do! Of course it will eventually work out, but this is again an example of a situation where a thing that should take less than a minute ends up being an annoying struggle. Just as a remark, I know you can get the cleanup tools by installing the Desktop Experience package, but I don't want to install the whole package which includes tons of stuff I don't need.
  • I still think that India dropping out of Internet.org is a bad thing. Why? Because it's better to have even limited Internet access than no access at all. Access to Wikipedia alone can literally be a life saver! It's easy to forget that there are 4 billion people without any kind of Internet access. SpaceX, OneWeb, Facebook, Google, Qualcomm, latency, Intelsat, Project Loon, Ascenta, graveyard orbit.
  • Actually this is a great question: what is net neutrality, and is it discrimination or freedom? With some topics it sounds like freedom is actually used as a basis for discrimination. Isn't forcing quotas also discrimination, even if some people see it as gender neutrality? Afaik true gender neutrality means that gender doesn't matter; if a quota is assigned by some statistical distribution and enforced to achieve "neutrality", isn't that actually discrimination? Hmm, great questions. Why is there food aid? Why are some people forced to pay for their food when others get it for free? Is that discrimination? No, I'm not actually trying to take any kind of stand here. I'm just asking questions and wondering "what's right and why", and whether there's even an absolute truth to such questions.
  • Checked out Kansallinen Palveluarkkitehtuuri (KaPA) - in English, the National Service Architecture - which includes the national service bus and its data exchange layer.
  • It's also great that project Loon is progressing.
  • Reminded myself about XBRL and XBRL @ Wikipedia.
  • Tableau 9.0 again brings huge performance increases. Version 8.0 was already just blazingly fast, and 9.0 further drastically improves the speed of data visualization. It's just wonderful to notice how fast things can be made via proper analysis and optimization. If you compare how quickly you can pull GROUP BY results from an average SQL server versus Tableau, the difference is just huge.
  • Made an extensive investing cost comparison analysis, including all trading fees, work time, taxation and other factors: whether it's better to invest using direct stock purchases (and with which stock broker), low cost index funds or ETFs. Currently I'm utilizing all three options, so I also had a track record of actual costs I could check, as well as compare potential differences where possible. The only thing guaranteed to happen when investing is costs; potential profits and related taxes may or may not come.
  • OpenBazaar 0.4 released with its new rUDP transport. Yet it seems that after startup the program consumes more and more CPU and memory as time passes; some cleanup routines clearly aren't working correctly. After enough time has passed, the process silently crashes.
  • Pwnie Express - It detects active attacks, but doesn't detect passive attacks. So it won't help with many tracking threats.
  • There are stunning price differences between computing and storage service providers like Amazon AWS, Microsoft Azure, Google Compute Engine, UpCloud, Hetzner and OVH.
    Here's an AWS vs UpCloud comparison by UpCloud. As said, there are huge differences in "cloud service provider" pricing. UpCloud is clearly a lot cheaper than AWS, Azure or GCE. But there are even cheaper options if that's all you're looking for. No wonder OVH is Europe's largest service provider. It seems that some of the service provider market is really illiquid, and some service providers charge ridiculous fees as well as provide exceptionally bad service even while charging a lot. That's the segment where many of the traditional telcos are. You'll end up paying 10-20x the market price for the service. No wonder they've got these 'sales guys' talking a lot of trash and then charging a lot. In many cases you won't even get full console or management card access to the servers, nor a proper management console for the systems. I've been talking with several of these providers and it's hard to believe how much effort they put into sales, which should instead be put into automation and cost efficient processes.
  • Time for a To-Do list cleanup. Let's see how many tasks I've completed but just forgotten to mark as done, how many I can now discard as 'expired' because time has just gone past them, and how many relevant tasks are left. This is going to be interesting. It's usually very nice to notice that tons of tasks have been completed without the To-Do list ever being updated.
  • It's interesting to see how often nothing actually new pops up. Google Project Fi is just old stuff. It's just like fully automated traffic (not only cars) or personal flight: ideas so old that you wouldn't believe it, and immediately obvious to everyone. The only problem is actually making it practical and cheap enough. So idea is nothing, execution is everything. All parts of this concept have been widely used, but have so far failed to gain widespread long term use.
  • A pretty nice post comparing MongoDB and Azure DocumentDB. The blob storage part was quite compact. Great questions about blob storage would be whether it's efficient and how it handles transactions: is it fully transactional with the main object data? If blobs are external data, stored outside the main document, then they're clearly a separate data structure, and it gets even worse if they aren't handled transactionally. In many cases where I know there's large blob data and I don't need transactionality, I immediately choose not to store large blobs in the database at all. I prefer storing a file id or hash instead of the data. If large blobs are stored with the rest of the data, every update to the document most probably leads to a situation where the blob data is also rewritten to disk, making updates really expensive and slow. A sketch of the hash approach follows.
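    What I mean in practice, as a minimal sketch (my own convention, nothing MongoDB or DocumentDB specific; BLOB_DIR is a hypothetical directory): write the blob to a content-addressed file and store only the hash in the database.

    import hashlib
    import os

    BLOB_DIR = 'blobs'

    def store_blob(data):
        """Write blob under its SHA-256 name; return the hash to store in the DB."""
        digest = hashlib.sha256(data).hexdigest()
        os.makedirs(BLOB_DIR, exist_ok=True)
        path = os.path.join(BLOB_DIR, digest)
        if not os.path.exists(path):  # content-addressed, so duplicates are free
            with open(path, 'wb') as f:
                f.write(data)
        return digest

    def load_blob(digest):
        with open(os.path.join(BLOB_DIR, digest), 'rb') as f:
            return f.read()

    As a bonus, document updates never rewrite the blob, and identical blobs are stored only once.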
  • Security now 504 and TrueCrypt security audit, China Great Firewall and Great Cannon.
  • Just as I've said earlier, filtering DDoS is extremely hard. In the case of the Great Cannon, they did just what I've described: run the attack at a high level, from distributed sources, making completely valid requests. Then it's really hard or impossible to know whether it's an attack or just a flood of actual users. An attacker can even simulate completely valid traffic. Yes, it's harder than just flooding packets, but it's also much more effective. Also, everything they call a new attack is, blah blah, decades old. As old as IP and TCP. There's nothing new there. Intercepting and modifying traffic has been going on for a long time. As I said, I did it at one network monitoring company back in early 1996. Why? Well, just because I could and I had good pre-existing tools. That makes me absolutely sure that many, many researchers and tool developers had thought about that stuff several years earlier. Even Trumpet Winsock allowed you to dump packets, and as soon as you can easily see what's there, it becomes immediately clear that it's possible to modify the content too. I remember that some early VoIP apps allowed masquerading their traffic as ICMP pings and multiple other message types when required. Anything over anything isn't anything special, because it's just bits. You can wrap or convert those into other forms, like light, radio waves and so on. As well as wrap data into existing formats, just like text TV in a TV signal or stereo in FM transmissions. It's obvious.
  • Yet another programmer encountered 'strange issues' with floating point arithmetic. Well, there's nothing strange about it; it's just how it is. And it has been standardized. See: IEEE 754 / ISO/IEC/IEEE 60559:2011.
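    The classic demonstration in a Python shell; nothing strange, just binary fractions that can't represent 0.1 exactly:

    >>> 0.1 + 0.2
    0.30000000000000004
    >>> 0.1 + 0.2 == 0.3
    False
    >>> from decimal import Decimal          # use Decimal when you need exact
    >>> Decimal('0.1') + Decimal('0.2') == Decimal('0.3')  # decimal arithmetic
    True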
  • Read article about DARPA's Memex search technology
  • Whenever you're talking with someone about a project, it's good to get an immediate picture in mind: this is the user interface they need, these are the integrations, messages and message formats they'll be needing, and these are the APIs that should be used. As well as what kind of hardware the project requires and what the database tables and schema look like. Knowing it all is a great approach!
  • I want to do and accomplish things. It's always important to find a team who loves their job so much they would be doing it even without getting paid. So if it makes good money on top, that's all just a great bonus!
  • Checked out yet another e-receipt product. It was nice and technically working. Yet, what's the point? Who gets the benefit? The customer? The reseller (receipt issuer)? The bookkeeping office possibly receiving the receipt as part of traveling expenses? Technically it's trivial to do, as soon as there's demand. And there won't be demand before there are users, and there won't be users as long as nothing is done with the e-receipt. This is the traditional new technology problem. I remember USB: it was a joke for several years. Also, everyone had their own implementation.
  • Reported a few OpenBazaar issues: [ 1244, 1245 ]. It's good to notice that commit 58463c2 fixed the issues.
  • Checked out a new CDN provider, CDNsun. They've got a really dense network. But I guess Akamai has plenty more POPs than they do.
  • Cinia's project C-Lion now has its own pages: connecting Finland and Germany with a direct high capacity fiber, without passing via Sweden and Denmark.
  • OVH doesn't seem to automatically assign an IPv6 address for Windows 2012 R2 servers. I'm wondering if this can be a security issue. They let you know the IPv6 address, and you have to enter it manually. When you enter it, Windows defaults to a /64 network, but OVH is actually using /56 networks. After a quick check it seems that there might possibly be a security issue and a MITM possibility here?
    Traceroute with incorrect /64 prefix:
    1    <1 ms    <1 ms    <1 ms  vps3.ip42.eu [2001:db8:2152:7001:d3m0::2265]
    2     1 ms     1 ms    <1 ms  2001:41d0:52:7ff:ffff:ffff:ffff:ff7e
    With correct /56 prefix the first hop is different:
    1     1 ms    <1 ms    <1 ms  2001:41d0:52:7ff:ffff:ffff:ffff:ff7e
    Is this a security problem or not? I'm not 100% sure, but it very potentially could be. It depends on what the role of vps3 in the traceroute is. I guess I have to ask them. I checked that they're using manual configuration for IPv4 and IPv6 on Linux boxes, and there aren't any router advertisement (RA) packets being delivered on the network which could serve as a good hint about where to forward packets.
  • Watched Atari Game Over documentary.
  • Containerization can bring new blind spots to IT security. Fire-and-forget containers can cause serious security problems, just like obsolete network hardware. Excellent blog post about this topic. How does this differ from most installations? People just set up something semi-randomly and when it seems to work, that's it. Then they forget it, until it stops working and they have to do something to fix it. No news there. Unfortunately.

ZyWall 50 USG IPv6, Bloom Filters, Tableau, DE-CIX, Game-Theory, Hash, Pycon, BitHalo

posted May 1, 2015, 9:39 PM by Sami Lehtinen   [ updated May 1, 2015, 9:58 PM ]

  • Studied X-Road protocol version 6.0 - a governmental and national ESB integration solution which allows basically anyone to join in 'easily' and 'cheaply' compared to, well, many more traditional solutions.
  • Reminded myself about the basic details of the Magnet URI scheme - I think OpenBazaar should use something similar.
  • Created a few information flow diagrams (IFD) - to visualize how processes work between departments and to describe business process information flows.
  • Checked out: Google network edge, Google Cloud DNS, Google Carrier Interconnect
  • Watched tons of PyCon 2015 videos; Guido van Rossum's Python Type Hints talk was especially interesting. Including Python static checking, linting, type checking and static code analysis. A taste of the syntax below.
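    The gist of the proposed type hints (a toy function of my own; annotations aren't enforced at runtime, tools like mypy check them statically):

    from typing import List, Optional

    def mean(values: List[float]) -> Optional[float]:
        # The annotations document intent: a list of floats in, a float
        # (or None for an empty list) out.
        if not values:
            return None
        return sum(values) / len(values)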
  • We all should be very familiar with this stuff already: Hash Functions and You - Curtis Lassam. Btw. excellent talk! Not too deep, but if you're not familiar with the topic, watch it.
  • REST API Descriptive Language (API DL) Related: RAML, SWAGGER, API Blueprint, WADL, SOAP, WSDL, JSON, XML
  • Checked release notes and toyed around with Tableau 9.0
  • Reminded myself about Bloom filters. I just needed them in one project to reduce database lookups, and the great hash talks at PyCon reminded me to use them when needed. The idea in miniature below.
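    A sketch of my own (not the project code): ask the filter first and only hit the database when it answers "maybe there".

    import hashlib

    class BloomFilter:
        """Probabilistic set: no false negatives, tunable false positive rate."""
        def __init__(self, size_bits=1 << 20, num_hashes=5):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            # Derive several bit positions by salting one strong hash.
            for seed in range(self.num_hashes):
                h = hashlib.sha256(str(seed).encode() + b':' + item.encode()).digest()
                yield int.from_bytes(h[:8], 'big') % self.size

        def add(self, item):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def __contains__(self, item):
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    seen = BloomFilter()
    seen.add('user:42')
    if 'user:42' in seen:
        pass  # only now query the database; a hit can still be a false positive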
  • Watched the SpaceX CRS-6 first stage landing video, I guess everyone did. They were pretty close to being successful, but it still failed. Also checked out SpaceX rocket engines - as well as refreshed my memory about rocket propellants.
  • Had long talks with consultants about project communication, documentation, management, scrum, agile methods, continuous integration, software quality management, version control, Trello, Sharepoint, Kanban and so on. All business as usual.
  • Studied LightSail energy storage technology. So they don't waste the heat energy from compressing air, nice? It's then a secondary question where and how this thermal energy is stored. This is also the key thing which has made most earlier compressed air solutions so inefficient: energy is lost as heat during compression and decompression if it's not stored somewhere and restored later.
  • Reminded myself about the operators which connect directly to DE-CIX @ Frankfurt.
  • Thoroughly studied the ARTS XML Digital Receipt Technical Specification Version 2.0 (April 21, 2001 - Candidate Recommendation) - related to the e-receipt stuff.
  • It seems that Thunderbird has some kind of bug when updating folders. At times it just hogs the CPU and doesn't do anything at all; a clear infinite loop somewhere. After restarting the email client, everything works perfectly again. I don't know if this is a known issue, but I just didn't bother to make a ticket about it. I've experienced it so many times now that I'm sure there's a bug somewhere. Usually it's related to cases where a large number of messages have been added to or deleted from a folder other than Inbox. I've experienced the hang with the Sent and Deleted (Trash) folders.
  • The project to build a new direct undersea fiber from Finland to Germany seems to be progressing. Currently they're mapping the seabed in detail and making plans for laying the cable, which will hopefully follow in about 4 months.
  • Checked out yet another "secure" email service provider Tutanota. It's just like so many others like it: Hushmail, Safe-mail, Protonmail, and so many others. HTTPS Webmail sending out links is kind of kludge, because it doesn't mean that the messages would be transported over SMTPS and that the SMTP server certificates would be used and verified. But certificates can be practically meaningless if it's too easy to obtain those. More about that later.
  • Gorilla Glass might seem durable, but it's not shock resistant. I just dropped my phone about 3 centimeters onto my granite table and the glass got multiple fractures. It's clear that it's hard, but it's too brittle to handle shocks when colliding with other hard and heavy objects.
  • Telegram Fist - reveals who's sending telegrams using statistical analysis, even if you would think that Morse code would be pretty anonymous. There's just so much information leakage in many mediums. It's even possible to tell whether a radio is remotely or locally operated from the characteristics of the transmission when sending analog messages. Yes, all of this is old stuff. But it just shows how much side channel information even much simpler systems leak.
  • OpenBazaar Threat Model Analysis by Dionysis Zindros. Assumed adversaries and malicious groups, game-theory, incentives, censorship, eavesdropper, PKI, RSA 1024, Tor, GPG, HTTPS, CA, DNS, Bitcoin, RSA, ECDSA, SHA256, AES, Python, Javascript, Angular, Developers.
  • Studied BitHalo. Yet another Bitcoin and BlackCoin related trading platform, a bit like OpenBazaar.
  • Watched an AirAsia plane crash documentary going through the events which led to the unfortunate situation.
  • Why blog if nobody's reading? Well, that's a good question. I've often used my own (web) log aka blog as a "things I've done" log. So it's easy to visit and check what I did and when, even if I'm not usually writing complex or deep articles.
  • Checked tons of different developing market ETFs: Africa, Asia, India, China, Saudi-Arabia, Russia. I'm also following the Greece situation, as everybody else is too (?).
  • Bandwidth Place is a nice HTML5 bandwidth tester like Speedtest. It doesn't require Flash or any software installation. It's also interesting to see how connections from Finland are wired to neighbouring countries, because depending on which operator you're using, the fastest server can be in Amsterdam, Frankfurt or Vilnius. The new Sea Lion cable straight to DE-CIX (?) sites can change this even further. I guess some operators will be utilizing it and others won't. The same thing applies here: some operators route directly to Vilnius via Tallinn, some route via Moscow or St. Petersburg, and in some cases data takes a trip from Helsinki to Amsterdam and then back to Vilnius.
  • Swarming flying robot drone bots are here, under the LOCUST project (Sensintel Coyote). Straight out of the movies where a mothership comes and drops swarms of smaller fighters.
  • Ymail.com (Yahoo Mail) email delivery is just unacceptably slow. It took 5 minutes for Yahoo to deliver an email out, even though other service providers do the same in under a second. No go; that's a show stopper in the modern world.
  • Once again checked the IPv6 configuration stuff; now everything is working as expected. Yet I've got a few wishes about the IPv6 logging on the ZyWall USG 50.
    When using IPv6 DHCP:
    > netsh int ipv6 show int 11
    Router Discovery                   : enabled
    Managed Address Configuration      : enabled

    An interesting point is that you can't manually set both of those options on simultaneously in Windows. You either have Router Discovery or Managed Address Configuration enabled. But when RD is on and the RA message announces DHCPv6, then MAC is automatically enabled too. It's kind of confusing at the user interface level.
    When using Windows you can check neighbourhood cache using:
    netsh interface ipv6 show neighbors
    If you mess up your IPv6 configuration you can completely reset it.
    netsh interface ipv6 reset
    A reboot is required after reset.
  • ZyWall USG 50, IPv6 DHCP DHCPv6 Logging
    Yet I'm not entirely happy with the logging details when using ZyWall USG 50 DHCPv6 with IPv6:
    IPv4 DHCP log snippet:
    10   2015-04-14 01:54:07
         info                dhcp                   DHCP Request
         Requested 172.23.130.9 from RANDOM(F4:58:D2:E9:09:F8)
    11   2015-04-14 01:54:07
         info                dhcp                   DHCP ACK
         DHCP server assigned 172.23.130.9 to RANDOM(F4:58:D2:E9:09:F8)

    There you can see the MAC vs IP relation very nicely. But when using IPv6...
    IPv6 DHCPv6 log snippet:
    11   2015-04-13 15:57:00
         info                dhcp                   DHCPv6
         DHCPv6 [solicit] Destination ff02::1:2 from fe80::9bf2:488d:34c2:a2c7
    12   2015-04-13 15:57:01
         info                dhcp                   DHCPv6
     DHCPv6 [request] Destination ff02::1:2 from fe80::9bf2:488d:34c2:a2c7
    It's nice; I know that 'they' now have an IPv6 address. But what's the global IPv6 address being assigned? Nobody knows. What IPv6 address was assigned to the requester via DHCPv6? No information about that whatsoever is stored in the logs. Great, just great. Is DHCP better than SLAAC with the privacy extension? No, it's not. It doesn't provide any additional information, at least on the logging / access level, in this case. Of course it's possible to manually assign IPv6 addresses using DUIDs and fixed address lists, but that's not exactly what I had in mind for most of the networks. Without that, and with this bad logging, DHCPv6 is just as good as SLAAC as far as I can see. When using SLAAC without privacy extensions, each computer gets an address which contains the MAC of the NIC. Then it's easy to control per-machine outbound and inbound traffic using the firewall, because you know exactly which address is being used by which computer, without any manual configuration. So in that sense it's on the same line as DHCP. Machines which have 'unknown addresses' are of course fully blocked by the firewall. Using DHCPv6 does allow smaller than /64 subnets if required, yet many discussion forums mentioned that there "could be" potential problems, especially with clients that have a not-so-great IPv6 stack and are potentially missing the DHCPv6 support required in that case. (See the EUI-64 sketch below.)
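    Since SLAAC without privacy extensions just embeds the MAC, mapping between address and machine is mechanical. A quick sketch of the modified EUI-64 derivation (my own helper function): flip the universal/local bit of the first octet and insert ff:fe in the middle.

    def mac_to_eui64_suffix(mac):
        """Derive the 64-bit SLAAC interface identifier from a MAC address."""
        octets = [int(part, 16) for part in mac.split(':')]
        octets[0] ^= 0x02  # flip the universal/local bit
        eui = octets[:3] + [0xff, 0xfe] + octets[3:]
        return ':'.join('%02x%02x' % (eui[i], eui[i + 1]) for i in range(0, 8, 2))

    # The MAC from the DHCP log above; prepend the /64 network prefix
    # to get the full SLAAC address.
    print(mac_to_eui64_suffix('f4:58:d2:e9:09:f8'))  # f658:d2ff:fee9:09f8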
  • I made a ticket about this logging issue to ZyXEL. I also got confirmation from ZyXEL that this is known issue with ZyWall USG 50 model (and presumably with other USG models too). The current firmware doesn't simply log enough information even if debug mode would be enabled. This issue will be fixed in future updates to the firmware.
    Related RFCs: RFC 1256, RFC 2461, RFC 2463, RFC 2473, RFC 2710, RFC 2765, RFC 3122, RFC 3315, RFC 4291, RFC 4443, RFC 4861, RFC 5237, RFC 5969, RFC 6106, RFC 6343, RFC 7113. kw: Router Solicitation Message, Router Advertisement Message, Neighbor Solicitation Message, Neighbor Advertisement Message, full dual stack and native IPv6.

Show - don't say, PostgreSQL, VP9, IPv6, Hashes, DHCPv6, DNSWL, Azure Nano Server, Great Cannon

posted Apr 25, 2015, 10:02 AM by Sami Lehtinen   [ updated Apr 25, 2015, 10:03 AM ]

  • A few tricks for handling things more efficiently using PostgreSQL and its arrays. One example of the kind of trick I mean is sketched below.
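    A sketch via psycopg2, with a hypothetical articles table using a text[] tags column: arrays let you skip a separate join table for simple tag filtering.

    import psycopg2

    conn = psycopg2.connect('dbname=test')  # hypothetical connection string
    cur = conn.cursor()

    # A text[] column instead of a separate article_tags join table.
    cur.execute("""CREATE TABLE IF NOT EXISTS articles (
                       id serial PRIMARY KEY, title text, tags text[])""")
    cur.execute("INSERT INTO articles (title, tags) VALUES (%s, %s)",
                ('Hello', ['python', 'postgres']))  # lists adapt to arrays

    # ANY() matches a scalar against the array elements.
    cur.execute("SELECT title FROM articles WHERE %s = ANY(tags)", ('postgres',))
    print(cur.fetchall())
    conn.commit()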
  • YouTube starts using the VP9 video codec to improve video quality and save a ton of bandwidth.
  • It seems that during holidays there are tons of attacks on public internet facing servers. I guess the 'bad guys' know that if they hack the servers during holidays, nobody's going to do anything to fix the situation for several days. Especially if they do it so that they don't disturb normal operation and just run their own additional tasks at low priority: nobody notices or cares at all.
  • Configured ALL servers to use IPv6, as well as configured the corporate network to use IPv6. The servers, software & firewalls were easy to configure, but the corporate network configuration took some pain, because there was a different router / firewall which I hadn't had to deal with earlier. Had to use a similar test system to test and troubleshoot everything before moving the working configuration into production. It took quite a while to get everything confirmed and cross checked and so on. Also had fun with DUID fields, SLAAC, DHCPv6 and the dhcpv6-slaac-problem draft by the IETF. But it was worth the fun I had while doing it.
  • All of my servers now use only SHA-256 based SSL certificates. Yes, including the full certificate chain, so there aren't any intermediate SHA-1 hashes. But the funny thing is that Google says SHA-1 is obsolete cryptographic technology, yet they're using it all the time, as in the case of Gmail's certificates.
  • I just can't stop 'loving' projects which are an absolute mess and extremely badly documented. Yeah, you can get things working, if you are brave enough to think and go through all possible configuration options, as well as read the source code if it's available. It's just so annoying. But well, things get done when you put enough effort into it. Sometimes some key information is simply assumed, and if you don't know what it is, you're pretty much failing hard for a long time.
  • Added my servers' IPv6 addresses to the DNSWL whitelist to ensure email deliverability.
  • Studied tons of stuff about DHCPv6 and the Router Advertisement flags (M, R, O). I had to, because I'm planning to use it. Yet it doesn't seem to deliver (with the ZyWall) some of the key benefits I assumed I'd get, like knowing who's using which IP address and when, unless full manual configuration with DHCPv6 DUIDs is used.
  • Microsoft "Docker" style servers. Aka containerization with minimal over head. They call it Nano Server on Azure platform.
  • Finally, after a weekend, it seems that systems are now getting 100% correct IPv6 configuration, including Windows & Linux systems as well as mobile devices. That's just awesome. Now everything is 100% dual stack, allowing IPv4 and/or IPv6 traffic. Also, many services which earlier used NAT or port redirection are now directly reachable. It's better than using constant port mappings, or a horrible in-protocol ALG in some cases (FTP). Only one workstation is failing: the workstation I've been using to test everything. So there's some kind of configuration issue somewhere. Let's see how this plays out.
  • It's good to disable ISATAP, 6to4 and Teredo, which are all enabled in the default Windows configuration. Just enter these commands in an elevated shell:
    netsh int ipv6 isatap set state disabled
    netsh int ipv6 6to4 set state disabled
    netsh interface teredo set state disabled
  • I liked this approach mentioned in one blog: About Startups - Show, don't tell. "I'm going to build this amazing thing" is a LOT less interesting than "I've built this slightly crappy thing that actually does something". EVERYONE is GOING to build something; most people never do...
  • Amazon EFS did look interesting, yet it's extremely light on details, which really do matter in these cases. It's just so easy to make bold bogus claims like "low latency" or "high IOPS". Those are very relative terms.
  • Tons of configuration work with DNS and IPv6 stuff, but it's pretty much done now. Phew! Now even the visitor WiFi network provides full IPv6 connectivity, allowing both options: SLAAC + DHCPv6.
  • Checked out Call for bids - while wondering what kind of new features OpenBazaar could implement.
  • China's Great Cannon - new stuff? I think the attacks mentioned here have been known for well over 20 years. Shocking news, right? It was in '95 when I was at an office doing IP networking, and back then it was already trivial to monitor and modify packets, messages and content on the fly. Yet people still don't seem to realize that email and other stuff are just "post cards" whizzing by, which can be modified at will when and if required.
  • Also checked out: Data Analytics using Pandas and SQLite and Python Boltons

Peewee ORM, IPv6, OpenBazaar, Hosting OVH, GNU Social, CupCake, Diaspora, IPFS, Apache Samza

posted Apr 24, 2015, 7:48 AM by Sami Lehtinen   [ updated Apr 24, 2015, 8:47 AM ]

My weekly web log dump:

  • Read about: 802.3bz and 802.3by, faster Ethernet networking standards (802.3bz being the NBASE-T one).
  • Checked out a few discussions about how to update components inside Docker containers to patch vulnerabilities. It seems there's currently no good way of doing it.
  • Yet another trap. I was going mad because I couldn't get peewee to make the query I wanted to make. Or actually it did run my query, but the results weren't exactly what I was expecting to see. There's a table which can contain five different categories for an article. I wanted to join all articles belonging to a selected category and then sum the sales for those. I did the usual join stuff, added a where statement and used the sum function. But alas, the results were way too small (as a sales amount), and I could clearly see that the number of result rows wasn't what I was expecting either. After jumping through different hoops I found the problem. When I used pure SQL, writing the query took me about 30 seconds; with peewee it felt hopeless. After some time I decided to debug this deeper. Since a traditional ORM is just an obfuscation layer over working queries, I pulled out the SQL query that peewee was generating. And there it was, the trap: when I wrote the SQL query I just used a pure join and then where statements, but peewee helpfully, automatically and seamlessly added a JOIN ON, and the ON statement referred to only one of the category fields. I added join(articles, on=( cat1 | cat2 | cat3 | cat4 | cat5 )) and the problem was solved. Uaahh. Once again, pure SQL was so easy compared to the hidden traps of an ORM. Of course that automatic join-on can be beneficial at times; if there's a shared foreign key it's enough to join tables without additional statements. A rough reconstruction follows.
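    A reconstruction with hypothetical models (not the original code); the point is that giving join() an explicit on= expression stops peewee from guessing a single-column JOIN ON:

    from peewee import SqliteDatabase, Model, CharField, DecimalField, fn

    db = SqliteDatabase('sales.db')

    class Article(Model):
        code = CharField()
        cat1 = CharField(); cat2 = CharField(); cat3 = CharField()
        cat4 = CharField(); cat5 = CharField()
        class Meta:
            database = db

    class Sale(Model):
        article_code = CharField()
        amount = DecimalField()
        class Meta:
            database = db

    category = 'FOO'
    query = (Sale
             .select(fn.SUM(Sale.amount).alias('total'))
             # Explicit join condition; without on=..., peewee guesses one.
             .join(Article, on=(Sale.article_code == Article.code))
             .where((Article.cat1 == category) | (Article.cat2 == category) |
                    (Article.cat3 == category) | (Article.cat4 == category) |
                    (Article.cat5 == category)))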
  • I actually did get a reply from Charles Leifer, and I haven't yet said thank you very much, because I've been busy with other projects.
  • Somehow I understand programmers who create useless extra rows in databases, just to make querying much easier. In the case of peewee, all the trouble starts as soon as the tables being joined don't contain the data being joined on. I've always said that programmers who add pointless data to a database aren't really doing a smart thing. But in this case, everything gets just so much easier if you submit to doing that: suddenly everything works straight out of the box, instead of causing continuous problems with queries. Another incredibly simple but slightly funny way is to run two queries: first get the stuff which can be joined, then additionally fetch the stuff which can't, and then merge and sort. Or use a union of two statements where the second part doesn't contain results from the first part. I've seen that happening over and over again. It's sometimes really funny to see tens of millions of totally useless rows in a database. But they're there to make things simpler: you don't need to handle special cases and build complex queries and code to work around missing information, even if the data is redundant and useless. I've seen cases where tens of gigabytes of useless data are stored in tables just to simplify queries. Now I can see why.
  • There's also some bad documentation in peewee. There's a difference between JOIN.LEFT_OUTER and JOIN_LEFT_OUTER, yet the documentation mixes them up. Likewise fn.Count(), fn.COUNT() and fn.count() aren't the same thing at all.
  • UpCloud started to offer IPv6 free of charge for their IaaS servers. I've already configured my servers to fully utilize it. 
  • Some WiFi thoughts: it depends so much on the environment. I would use only WPA2. DTIM can be 3-10x the beacon interval depending on the use case; for a laptop network I would use 3 and for mobile devices 10. RTS/fragmentation is also very site specific: sometimes smaller values bring better results, but generally RTS can be disabled and fragmentation can be off (maximum frame size as the threshold). In congested areas a smaller fragmentation threshold + RTS can bring better results. If that even matters; in most cases it doesn't. Depending on device quality, auto channel selection can be preferred.
  • Tested dedicated cloud SSD ARM servers from Scaleway. - Liked it; excellent performance / price ratio. Yet the storage is virtual, meaning it's stored on shared infrastructure. So even if the server is dedicated, shared storage can cause "noisy neighbour" performance problems. Their approach is a bit different: "The craziest cloud technology is out! Welcome on board! Say good bye to virtual servers, we have defined the next computing standard: the physical cloud!"
  • Tested the OpenBazaar 0.4 version even more, using several virtual machines. There are some issues, but commits are flowing in at a good pace. The 0.3 network seems to be practically dead. I hope the release of 0.4 boosts the network to new heights. Even 0.3 had over 1200 users, mostly testing the network and not actually using it for anything yet. I guess the 0.4 version will reach 10x that easily.
  • Something different: T-14 (Armata), Sub-caliber round, Ula Class Submarine
  • One test project is currently hosted at OVH, but I do have servers at DigitalOcean, Vultr, Hetzner and UpCloud. I like OVH for my personal small projects, because it's reliable and dirt cheap. For more business critical stuff I prefer Hetzner. They and soyoustart (OVH) provide crazy performance per buck. Links: www.hetzner.de ovh.com www.soyoustart.com also online.net is worth checking out, or if you're looking for plenty of storage space at a cheap price then kimsufi.com vultr.com. My personal test servers are running at UpCloud; they provide hourly billing and great performance, but at a clearly higher cost. (Still considerably cheaper than Amazon AWS, Google Compute Engine or MS Azure.) One pro for services which have both active and passive data is UpCloud's MaxIOPS storage, which is a combination of RAM, SSD and cheap SATA storage. Data which is updated or read often is cached, and stuff which is rarely accessed rests on SATA. It relieves the developer from dealing with that and still gives an affordable per-GB price. I actually built such systems at one time using bcache and dm-cache. But that won't fly when some of the production servers run Windows.
    I also love getting good throughput: 2015-04-03 18:34:05 (68.5 MB/s) - ‘1GB-testfile.dat’ saved [1073741824/1073741824] - just yesterday I tested it out with wget.
    I did consider Google App Engine for the front end for a while. But the problem with GAE is that if you get kicked out for some reason, there's no good alternative platform to run the app on without extensive porting. So for this kind of test project it wasn't really a viable option after all.
  • Just checking out GNU Social: what kind of stuff does it have that's similar to Local Board (LclBd.com), and what's different? Is it better than Twitter and, if so, how? It's good to learn new stuff all the time.
  • Tried GNU Social at Load Average to see how it differs from this and Twitter. Well, there are plenty of similar projects, like CupCake. Users are free to select which ones to use. Others provide better privacy and features than Twitter. With the largest networks there's the big problem that they're tightly monitored, which doesn't necessarily apply to smaller networks like CupCake, GNU Social or this Local Board (lclbd). My Load Average profile and my CupCake.io profile. Also tried the latest version of Diaspora to see what they've come up with. My Diaspora profile. To be honest, it looks good, and again much better than Ello. Finally, my Ello.co profile.
  • Finished reading the latest issues of the Economist (I just love that stuff) as well as Finnish CTO and system integrator magazines. Long articles about transmedia, where the same product makes money in multiple fields; I guess Angry Birds is a quite good example of that. Kw (the Kolb-style learning cycle): concrete experience (feeling), diverging (feel and watch), reflective observation (watching), continuum, assimilating (think and watch), abstract conceptualisation (thinking), perception, converging (think and do), active experimentation (doing), processing, accommodating (feel and do). Problem and project based learning. Learning & Awareness. The Economist also had a long article about data protection and how the rules differ in the US and Europe.
  • Estonia's e-residency program expands abroad; the official strong digital identity can now be applied for from 34 countries.
  • Once again wondered how multithreaded par2 can hog the system so badly; I guess it's related to its disk IO somehow. After the actual massive, CPU & memory intensive Reed-Solomon matrix computation starts, the system runs fine again.
  • Checked out IPFS - nothing new; Content Addressable Storage (CAS) / Content Based Addressing (CBA) is nothing new at all. - Content Centric Networking - Named Data Networking - A lot of the IPFS talks are pretty much off topic; they don't describe the project well, it's just generic promotional marketing blah blah. Many of the related facts are totally hidden under this marketing hype. I made a separate post about the IPFS topic. Sorry, posts are again out of order; I often queue stuff in the backlog and release out of order. Some stuff may just get logged in the yearly mega dump.
  • But as I've written earlier, Named Data Networking and Content Addressable Storage aren't new concepts. Actually, at one point people tried to hire me for a somewhat similar project, where a small JavaScript library would create a host-based swarm and load content from peers, using the primary server as backup only if no fast enough peers were available.
  • Got bored with how badly the Thunderbird networking stuff is written. At times it just hangs and requires a restart of the whole system. It really annoys me. I've checked that what it claims is a pure lie; it's just that the internal state of the app sucks, because I can access the same resources using other applications and other networks without any problems. Only Thunderbird fails hard. Of course after rebooting the workstation the issues 'at the server' miraculously get fixed. Yeah, right.
  • Earlier it was recommended to shard images into multiple pieces, split a site over multiple domains and so on. Now with HTTP/2 a single TCP stream is preferred. That's kind of a strange reversal...
  • For one project which handles tons of messages asynchronously I've implemented a "replay solution", which is excellent for testing and development as well as for situations where the database needs to be reconstructed. It actually follows Apache Samza's ideas quite well. All messages are stored exactly as received from the network into a data feed storage, and only local "derived" data is processed from that for end users. When something needs to be fixed, tested, developed or changed, I just make the changes and replay the whole feed storage into the system. At that point it's easy to see if everything goes through well and if there are any exceptions raised. And if some messages are incorrectly handled for one reason or another, it doesn't matter: I'll just fix the code and run the replay again. So handy. This also allows fully stateless and asynchronous processing; there's technically no correspondence between other parts of the program and the receiver / handler module. No state needs to be maintained whatsoever, so I'm using a fully stateless message passing implementation. See the sketch below.
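    In outline, the replay mechanism is really simple (a sketch with hypothetical names; the real system obviously has more error handling):

    import json

    FEED_FILE = 'feed.log'  # append-only raw message store

    def record(raw_message):
        """Append every incoming message untouched, before any processing."""
        with open(FEED_FILE, 'a') as feed:
            feed.write(raw_message.rstrip('\n') + '\n')

    def replay(handler):
        """Rebuild all derived data by re-running the handler over the raw feed."""
        with open(FEED_FILE) as feed:
            for line in feed:
                try:
                    handler(json.loads(line))
                except Exception as exc:
                    # Bad messages just get reported; fix the handler, replay again.
                    print('failed:', exc, line.strip())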
  • I don't like Skype at all. Its delivery status information is so archaic: it only lets you know if the message you sent was delivered to the "Skype cloud", but it won't tell you if the recipient has received or read the message. Other, newer IM systems handle these things much better!
  • One article said that there will be huge demand for ICT guys in Sweden. Especially the Internet of Things and Big Data will increase the need for competent techies. It's also important to know the whole system architecture and the integrations well, as well as project management, and then things will work out smoothly.
  • A Finnish quote from an ICT mag: "Lisää osaajia tarvitaan tulevina vuosina etenkin tietoturvan pariin. Esineiden internetin kasvun ja big datan myötä myös muun muassa järjestelmäarkkitehtuuriosaajille tulee kysyntää." In English: more experts will be needed in the coming years, especially in information security; with the growth of the Internet of Things and big data there will also be demand for system architecture experts, among others. - So there will be jobs for ICT guys in the future too, for those who are passionate, ready to learn and work hard.

UpCloud IPv6 network configuration for Ubuntu Server

posted Apr 20, 2015, 9:23 AM by Sami Lehtinen   [ updated Apr 20, 2015, 9:26 AM ]

UpCloud uses SLAAC for IPv6 configuration. But they explicitly allow only traffic from specified addresses, and therefore privacy addressing / extensions must be disabled. They do provide instructions on how to get this done, but I found the instructions to be "non-optimal".
 
*** sysctl.conf additions for Ubuntu ***
 
# IPv6 additions from UpCloud documentation
 
net.ipv6.conf.all.use_tempaddr = 0
net.ipv6.conf.default.use_tempaddr = 0
 
*** /etc/network/interfaces additions ***
 
auto eth2
iface eth2 inet6 auto
 
*** Shell ***
 
sudo ifdown eth2
sudo ifup eth2
 
*** Verify output and address ***
 
ifconfig
 
eth2      Link encap:Ethernet  HWaddr aa:aa:aa:80:47:0d
          inet6 addr: 2a04:3540:1000:310:a8aa:aaff:fe80:470d/64 Scope:Global
          inet6 addr: fe80::a8aa:aaff:fe80:470d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:36 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3868 (3.8 KB)  TX bytes:7270 (7.2 KB)
 
*** What wasn't optimal ***
 
https://www.upcloud.com/support/allocating-new-ip-addresses/
 
Well, these settings didn't seem to matter at all, so I didn't set these values. I also don't know if they're really required, at least if you're not planning to set up a router, which I wasn't.
 
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.default.rp_filter=0
net.ipv4.ip_forward=1
 
And the tempaddr settings weren't mentioned at all. I double checked: without disabling temporary privacy addressing, things just won't work out.

*** Windows 2012 R2 configuration ***

With Windows 2012 R2 servers the guide was all good; disabling privacy addressing immediately provided a static IPv6 address for the server:
 
netsh interface ipv6 set global randomizeidentifiers=disabled store=active 
netsh interface ipv6 set global randomizeidentifiers=disabled store=persistent 
netsh interface ipv6 set privacy state=disabled store=active 
netsh interface ipv6 set privacy state=disabled store=persistent
