Peewee ORM, IPv6, OpenBazaar, Hosting OVH, GNU Social, CupCake, Diaspora, IPFS, Apache Samza

posted Apr 24, 2015, 7:48 AM by Sami Lehtinen   [ updated Apr 24, 2015, 8:47 AM ]

My weekly web log dump:

  • Read about 802.3bz and 802.3by, which are upcoming faster Ethernet standards: 802.3bz covers NBASE-T style 2.5 and 5 Gbit/s links and 802.3by covers 25 Gbit/s.
  • Checked out a few discussions about how to update components inside Docker containers to patch vulnerabilities. It seems that there's currently no good way of doing it.
  • Yet another trap. I was going mad because I couldn't get Peewee to make the query I wanted. Or actually it did run my query, but the results weren't what I was expecting to see. There's a table which can hold up to five different categories for an article. I wanted to join all articles belonging to the selected category and then sum the sales for those. I did the usual join, added a where clause and used the sum function. But alas, the sales total was way too small, and I could clearly see that the number of result rows wasn't what I expected either. Writing the query in pure SQL took me about 30 seconds. With Peewee it felt hopeless. After jumping through a few different hoops I decided I needed to debug this deeper. Since a traditional ORM is just an obfuscation layer over working queries, I pulled out the SQL query Peewee was generating. And there was the trap. In my hand-written SQL I used a plain join and then where conditions. But Peewee had kindly, automatically and seamlessly added a JOIN ... ON clause, and that ON clause referred to only one of the category fields. I added join(articles, on=( cat1  | cat2 | cat3 | cat4 | cat5 )) and the problem was solved. Uaahh. Once again, pure SQL was so easy compared to the hidden traps of an ORM. Of course the automatic join condition can be beneficial at times: if there's one shared foreign key, it's enough to join the tables without additional conditions.
  • I actually did get a reply from Charles Leifer. I haven't yet said thank you very much, because I've been busy with other projects.
  • Somehow I now understand programmers who insert useless extra rows into databases just to make querying easier. With Peewee all the trouble starts as soon as the tables being joined don't contain the data being joined on. I've always said that programmers who add pointless data to a database aren't doing a smart thing. But in this case everything gets so much easier if you give in and do that. Suddenly everything works straight out of the box, instead of you having continuous problems with queries. Another incredibly simple but slightly funny way is to run two queries: first fetch the stuff which can be joined, then fetch the stuff which can't be joined, and finally merge and sort the results. Or use a union to merge two statements, where the second part doesn't contain results from the first part. I've seen that happening over and over again. It's sometimes really funny to see tens of millions of totally useless rows in a database. But those are there to make things simpler: you don't need to handle special cases and build complex queries and code to work around missing information, even if the extra data is redundant and useless. I've seen cases where tens of gigabytes of useless data are stored in tables just to simplify queries. Now I can see why.
  • There's also some bad documentation in Peewee. There's a difference between JOIN.LEFT_OUTER and JOIN_LEFT_OUTER, yet the documentation mixes those up. Likewise fn.Count(), fn.COUNT() and fn.count() aren't the same thing at all.
  • UpCloud started to offer IPv6 free of charge for their IaaS servers. I've already configured my servers to fully utilize it. 
  • Some WiFi thoughts: it all depends so much on the environment. I would use only WPA2. DTIM can be 3-10 beacon intervals depending on the use case; for a laptop network I would use 3 and for mobile devices 10. RTS / fragmentation is also very site specific. Sometimes smaller values bring better results, but generally RTS can be disabled and fragmentation turned off (maximum frame size as the threshold). In congested areas a smaller fragmentation threshold + RTS can bring better results, if that even matters; in most cases it doesn't. Depending on device quality, auto channel can be preferred.
  • Tested dedicated cloud SSD ARM servers from Scaleway. Liked them, excellent performance / price ratio. Yet the storage is virtual, which means it's shared. So even if the server is dedicated, the shared storage can cause "noisy neighbours" performance problems. Their approach is a bit different: "The craziest cloud technology is out! Welcome on board! Say good bye to virtual servers, we have defined the next computing standard: the physical cloud!"
  • Tested the OpenBazaar 0.4 version some more using several virtual machines. There are some issues, but commits are flowing in at a good pace. The 0.3 network seems to be practically dead. I hope the release of the 0.4 version boosts the network to new heights. Even 0.3 had over 1200 users, mostly testing the network and not yet actually using it for anything. I guess the 0.4 version will easily reach 10x that.
  • Something different: T-14 (Armata), Sub-caliber round, Ula Class Submarine
  • One test project is currently hosted at OVH. But I also have servers at DigitalOcean, Vultr, Hetzner and UpCloud. I like OVH for my small personal projects, because it's reliable and dirt cheap. For more business critical stuff I prefer Hetzner. They and soyoustart (OVH) provide crazy performance per buck. Links: www.hetzner.de, ovh.com, www.soyoustart.com. Also online.net is worth checking out, or if you're looking for plenty of storage space at a cheap price, then kimsufi.com and vultr.com. My personal test servers are running at UpCloud. They provide hourly billing and great performance, but at a clearly higher cost (though still considerably cheaper than Amazon AWS, Google Compute Engine or MS Azure). One pro for services which have both active and passive data is UpCloud MaxIOPS storage, which is a combination of RAM, SSD and cheap SATA storage. Data which is updated or read often is cached, and stuff which is rarely accessed rests on SATA. It relieves the developer from dealing with that and still gives an affordable per GB price. I actually built such systems myself at one time using bcache and dm-cache. But that won't fly when some of the production servers run Windows.
    I also love getting good throughput. Just yesterday I tested it out with wget: 2015-04-03 18:34:05 (68.5 MB/s) - ‘1GB-testfile.dat’ saved [1073741824/1073741824]
    I did consider Google App Engine for the front end for a while. But the problem with GAE is that if you get kicked out for some reason, there's no good alternative platform to run the app on without extensive porting. So for this kind of test project it wasn't really a viable option after all.
  • Just checking out GNU Social: what it has in common with Local Board (LclBd.com) and what's different. Is it better than Twitter and, if so, how? It's good to learn new stuff all the time.
  • Tried GNU Social at Load Average to see how it differs from this and from Twitter. Well, there are plenty of similar projects, like CupCake. Users are free to select which ones to use. Others provide better privacy and features than Twitter. With the largest networks there's the big problem that they are tightly monitored, which doesn't necessarily apply to smaller networks like CupCake, GNU Social or this Local Board (lclbd). My Load Average profile and my CupCake.io profile. Also tried the latest version of Diaspora to see what they've come up with. My Diaspora profile. To be honest, it looks good and again much better than Ello. Finally my Ello.co profile.
  • Finished reading the latest issues of the Economist again (I just love that stuff) as well as the Finnish CTO and System Integrator magazines. Long articles about transmedia, where the same product makes money in multiple fields. I guess Angry Birds is quite a good example of that. Kw: classic; concrete experience (feeling), reflective observation (watching), abstract conceptualisation (thinking), active experimentation (doing); processing continuum, perception continuum; diverging (feel and watch), assimilating (think and watch), converging (think and do), accommodating (feel and do). Problem and project based learning. Learning & Awareness. The Economist also had a long article about data protection and how the rules differ between the US and Europe.
  • Estonia's e-residency program expands abroad: the official strong digital identity can now be applied for from 34 countries.
  • Once again wondered how multithreaded par2 can hog the system so badly. I guess it's somehow related to its disk I/O. Once the actual massive, CPU and memory intensive Reed-Solomon matrix computation starts, the system runs fine again.
  • Checked out IPFS. Content Addressable Storage (CAS) / Content Based Addressing (CBA) is nothing new at all; see also Content Centric Networking and Named Data Networking. A lot of the IPFS talks are quite off topic and don't describe the project well; it's just generic promotional marketing blah blah. Many of the relevant facts are totally hidden under the marketing hype. I made a separate post about IPFS. Sorry, posts are again out of order. I often queue stuff in a backlog and release it out of order. Some stuff may just get logged in the yearly mega dump.
  • But as I've written earlier, Named Data Networking and Content Addressable Storage aren't new concepts. Actually at one point people tried to hire me for a somewhat similar project, where a small JavaScript library would create a host based swarm, load content from peers and use the primary server as a backup only if no fast enough peers were available.
  • Got fed up with how badly the Thunderbird networking stuff is written. At times it just hangs and requires a restart of the whole system. It really annoys me. I've checked that the server problems it reports are a pure lie and it's just the internal state of the app that's broken, because I can access the same resources using other applications and other networks without any problems. It's only Thunderbird that fails hard. Of course after rebooting the workstation the issues at the server miraculously get fixed. Yeah, right.
  • Earlier the recommendation was to shard images into multiple pieces, split the site across multiple domains and so on. Now with HTTP/2 a single TCP connection is preferred. That's kind of strange...
  • For one project which handles tons of messages asynchronously I've implemented a "replay solution", which is excellent for testing and development as well as for situations where the database needs to be reconstructed. It actually follows the Apache Samza ideas quite well. All messages are stored in the data feed storage exactly as they were received from the network, and only local "derived" data is processed from that for end users. When something needs to be fixed, tested, developed or changed, I just make the changes and replay the whole feed storage into the system. Then it's easy to see if everything goes through well and if any exceptions are raised. And if some messages are handled incorrectly for one reason or another, it doesn't matter. I'll just fix the code and run the replay again. So handy. This also allows fully stateless and asynchronous processing: technically there's no coupling between the other parts of the program and the receiver / handler module. No state needs to be maintained whatsoever, so I'm using a fully stateless message passing implementation.
  • I don't like Skype at all. Its delivery status information is so archaic. It only lets you know if the message you sent was delivered to the "Skype cloud", but it won't tell you if the recipient has received or read the message. Other, newer IM systems handle these things much better!
  • One article said that there will be huge demand in Sweden for ICT guys. Especially the Internet of Things and Big Data will add to the need for competent techies. It's also important to know the whole system architecture and its integrations well, as well as project management, and then things will work out smoothly.
  • Quote from a Finnish ICT mag (translated from Finnish): "More skilled people will be needed in the coming years, especially in information security. With the growth of the Internet of Things and big data there will also be demand for, among others, system architecture experts." - There will be jobs for ICT guys even in the future, for those who are passionate, ready to learn and willing to work hard.
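
A minimal sketch of the join trap from the Peewee entry above, written with Python's built-in sqlite3 instead of Peewee so that the generated SQL is visible. The schema (category, article, cat1..cat5, sales) and the data are made up for illustration; the real tables surely look different. The first query joins on a single category column, like the automatically added ON clause did, and the second one matches any of the five columns.

```python
import sqlite3

# Hypothetical schema: every article has up to five category slots.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY, sales INTEGER,
        cat1 INTEGER, cat2 INTEGER, cat3 INTEGER, cat4 INTEGER, cat5 INTEGER
    );
    INSERT INTO category VALUES (1, 'books'), (2, 'games');
    INSERT INTO article VALUES
        (1, 100, 1, NULL, NULL, NULL, NULL),
        (2,  50, 2, 1,    NULL, NULL, NULL),
        (3,  25, 2, NULL, NULL, NULL, NULL);
""")

# Join condition on one category column only (the trap).
narrow_sql = """
    SELECT SUM(a.sales), COUNT(*) FROM category c
    JOIN article a ON a.cat1 = c.id
    WHERE c.name = ?
"""
# Join condition covering all five category columns.
wide_sql = """
    SELECT SUM(a.sales), COUNT(*) FROM category c
    JOIN article a ON c.id IN (a.cat1, a.cat2, a.cat3, a.cat4, a.cat5)
    WHERE c.name = ?
"""

narrow_total, narrow_rows = conn.execute(narrow_sql, ("books",)).fetchone()
wide_total, wide_rows = conn.execute(wide_sql, ("books",)).fetchone()
print("single column:", narrow_total, narrow_rows)
print("all columns:  ", wide_total, wide_rows)
```

With this data the single column join silently misses article 2, which has 'books' only in its second slot, so both the sum and the row count come out too small.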
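
The "missing rows" entry above can be sketched the same way. This is only an illustration with Python's sqlite3 and invented tables (article, sale); it is not the real schema. It shows a plain join dropping articles without sales, the union workaround described above, and a left outer join as one alternative that avoids both the filler rows and the second statement.

```python
import sqlite3

# Hypothetical tables: 'ink' has no sale rows at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (article_id INTEGER, amount INTEGER);
    INSERT INTO article VALUES (1, 'pen'), (2, 'ink');
    INSERT INTO sale VALUES (1, 4), (1, 6);
""")

# Plain inner join: articles without sales vanish from the result.
inner = conn.execute("""
    SELECT a.name, SUM(s.amount) FROM article a
    JOIN sale s ON s.article_id = a.id
    GROUP BY a.id ORDER BY a.name
""").fetchall()

# Union workaround: joined rows first, then the rows that cannot be joined.
union = conn.execute("""
    SELECT a.name, SUM(s.amount) FROM article a
    JOIN sale s ON s.article_id = a.id
    GROUP BY a.id
    UNION ALL
    SELECT a.name, 0 FROM article a
    WHERE a.id NOT IN (SELECT article_id FROM sale)
    ORDER BY 1
""").fetchall()

# Left outer join: keeps every article, missing sums become 0.
left = conn.execute("""
    SELECT a.name, COALESCE(SUM(s.amount), 0) FROM article a
    LEFT JOIN sale s ON s.article_id = a.id
    GROUP BY a.id ORDER BY a.name
""").fetchall()

print(inner)
print(union)
print(left)
```

The union and the left outer join return the same rows here; which one is easier to express through an ORM is a separate question.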
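
The replay idea from the message handling entry above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the message format, the handler names handle_v1 / handle_v2 and the in-memory list standing in for the data feed storage. The point is only that raw messages are kept untouched and all derived state is rebuilt from scratch on every replay.

```python
def handle_v1(state, message):
    """First handler version: assumes the amount is always a number."""
    account = message["account"]
    state[account] = state.get(account, 0) + message["amount"]


def handle_v2(state, message):
    """Fixed handler: also accepts amounts that arrive as strings."""
    account = message["account"]
    state[account] = state.get(account, 0) + int(message["amount"])


def replay(feed, handler):
    """Rebuild derived state by feeding every stored raw message to handler."""
    state = {}
    failures = []
    for offset, message in enumerate(feed):
        try:
            handler(state, message)
        except Exception as exc:
            # Record the failure and keep going; the raw feed is never modified.
            failures.append((offset, repr(exc)))
    return state, failures


# Raw messages exactly as received (stand-in for the feed storage).
feed = [
    {"account": "a", "amount": 10},
    {"account": "b", "amount": "5"},  # a string amount trips the first handler
    {"account": "a", "amount": -3},
]

state_v1, failures_v1 = replay(feed, handle_v1)
state_v2, failures_v2 = replay(feed, handle_v2)
print(state_v1, failures_v1)
print(state_v2, failures_v2)
```

The first replay reports which stored message failed, and after fixing the handler the second replay rebuilds the complete derived state from the same untouched feed.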