My personal blog about stuff I do, like, and am interested in. If you have any questions, feel free to mail me! My views and opinions are naturally my own and do not represent anyone else or any other organization. [ Full list of blog posts ]
You can use 7-Zip (7z, 7zip, 7z.exe) to get file hashes easily using the new h parameter. For example, 7z h -scrcsha256 produces SHA-256 hashes of the files in the directory and its subdirectories. That's a really easy and secure way to verify that the files are the same on the recipient's and the sender's side. This won't work if you're using too old a version, so upgrade to the latest one. If you don't specify any parameter, CRC32 hashes are produced; in most cases that's easier to compare and good enough, as it still allows almost 4.3 billion different values. When using the UI (7zFM File Manager) you'll find the file hashes in the right-click menu. 7-Zip also supports other hash types: CRC-32 (CRC32), CRC-64 (CRC64), SHA-256 (SHA256) and SHA-1 (SHA1, 160 bits), which you can select using the -scrc parameter on the command line, in both the Linux and Windows versions.
$ 7z h *.txt
7-Zip  9.35 beta Copyright (c) 1999-2014 Igor Pavlov 2014-12-07
CRC32 Size Name
-------- ------------- ------------
CDF68C0E 40158 History.txt
FDDB6E75 1927 License.txt
7ED55530 1683 readme.txt
-------- ------------- ------------
CRC32 for data: 4AA74FB3
CRC32 for data and names: 2F50438F
Now it's really easy to see the individual file hashes as well as the final hash for data and names. A hash checksum allows really easy comparison of even large directory and file structures, including the data content. You can easily verify whole paths over chat, telephone or whatever, so you know nothing has changed due to virus infection, malicious modification or just old-fashioned bit rot, aka plain corruption, and that the stuff you got really is what it's supposed to be. I know many fellow administrators just use the plain file size to check file content, but that's not a really good way of doing it, because the file content might not be what it seems.
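If 7-Zip isn't available on some machine, the same directory-level check can be sketched in a few lines of Python with the standard library's hashlib. The function names here are my own invention, not anything 7-Zip provides:

```python
import hashlib
from pathlib import Path

def sha256_of_file(path, chunk_size=65536):
    """Stream the file through SHA-256 so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_tree(root):
    """Return {relative path: SHA-256 hex digest} for every file under root."""
    root = Path(root)
    return {str(p.relative_to(root)): sha256_of_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}
```

Comparing the two dictionaries on the sender's and recipient's machines then catches both changed content and missing or renamed files.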
- Played with full text indexes (FTS) in SQLite3. This simple virtual table solution allowing MATCH searches seems to work really well and provides a lot faster search results than the traditional LIKE search which many programmers unfortunately use. Based on my tests, FTS4 indexing works really well and quickly, at least with databases containing just a few gigabytes of text.
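A minimal FTS4 example with Python's built-in sqlite3 module (assuming your SQLite build has FTS4 compiled in, which most do; table and column names are made up for the sketch):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A virtual table: content is tokenized and indexed for MATCH queries.
con.execute("CREATE VIRTUAL TABLE docs USING fts4(body)")
con.executemany("INSERT INTO docs (body) VALUES (?)", [
    ("full text search is fast",),
    ("a traditional LIKE scan reads every row",),
])
# MATCH uses the full-text index instead of scanning the whole table.
rows = con.execute("SELECT body FROM docs WHERE body MATCH 'fast'").fetchall()
```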
- Checked out Tribler project. I don't really like their design. They're just copying how Tor works, even if everybody knows really well that Tor isn't designed for file sharing. Much older designs like Freenet and GNUnet would provide a lot better performance.
- A nice post about code quality. Do it using a quick fix or do it properly? Cost of quick fixes can be huge in the aftermath. This post compares software design & quality to financial markets.
- "NSA Has Undercover Operatives in Foreign Companies" - No surprise; it should be obvious. Many things are being lied about. I guess many anonymous VPN providers also lie about the level of anonymity they're providing. What if the Sony hack had been done via a commercial VPN provider's "anonymous" service? Would you believe that they would preserve the anonymity? I wouldn't.
- Yandex Disk provides 10GB of free backup space with WebDAV support.
- Also checked out a few Google services: Managed VMs, Google Cloud Interconnect, Google Cloud Direct Peering, Google Cloud Trace and Google Container Engine, which is a fully-managed cluster manager for Docker containers using Google's Kubernetes. The network is managed by Andromeda, a Software Defined Network (SDN). Using Docker & Kubernetes also really nicely avoids the big bad thing of Google App Engine: vendor lock-in.
- Read one article about companies with 100 specialized security engineers. Well, what about targets which have NONE, where systems are inherently misconfigured and there's no hardening whatsoever?
- Other stuff: Cloud Security Assessment, Firebase, Security Architecture, Zero Day Attack, Automatic Forensics, Adaptive Deception & Defence, measuring application performance, 50th and 95th percentile latency, Shingled Magnetic Recording, software abstraction layers, internal APIs, cloud services, software design, etc.
Seagate has released SMR drives to the market, which use Shingled Magnetic Recording. It's excellent for large, slowly written storage drives. I'm just very curious how they fix the random write performance. Some people claim the write performance isn't that bad, but how do they do it? SMR might require several read-write passes for one sector, and unless they're using some clever tricks, that easily leads to super bad performance. Maybe they're doing something like Flash Translation Layer (FTL) style tricks inside the track / shingle batch, which basically allows them to read and write the sector without waiting for full rotations of the disk. Maybe the sector contains some free space and they're using logging-style updates, where even random writes (to the same track/shingle region) are neatly coalesced and written at once, with separate garbage collection / compaction during idle times. That kind of stuff would for sure work at least for desktop use. Any ideas how they really do it?
Moving between service providers is not impossible; it can even be trivial. But it all depends on how you abstract your system. Do you use Docker, LXC, or pure IaaS? If you have the required abstraction layers in your system, you can basically map calls to anything you desire. It also matters how complex those abstraction layers are, and so on. In some cases moving between service providers is much harder, for example if you're using Google App Engine.
It's hard to move to another cloud unless you've specifically built the app so that there's an abstraction layer which allows you to map the calls to GAE to any other solution; then you just swap the data store to MongoDB and so on. If a system is built and grown without strict control to utilize "all kinds of" PaaS features, then it might be really hard to move out.
If you're using a pure IaaS platform and you have automated Linux deploy scripts or use OpenStack, then moving from one cloud provider to another with a fairly simple application can be really trivial.
Something simple like SurDoc can be implemented on almost any platform, and the code behind such a service isn't too complex. It's not hard to move it, or even to recreate a similar service.
All this comes down to the final question: costs and time. Is it worth it? If a startup starts using GAE, is it worth creating a system that can be run on any platform? Every abstraction layer might require a lot of extra work, change the system design and complicate it. On the other hand, using the required APIs and layers might also make the project actually much simpler, because you can abstract all the complex service-provider-related issues away. With something like storefile(key, data) it doesn't matter what OS or cloud provider you're using, or whether the backend is S3, Google Blob Store, MongoDB or a flat file system; it just works.
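As a concrete sketch of the storefile(key, data) idea (class and method names invented for illustration): the application only ever sees the tiny interface, and each provider gets its own backend behind it.

```python
import os

class BlobStore:
    """Tiny storage interface the application codes against."""
    def store(self, key, data):
        raise NotImplementedError
    def fetch(self, key):
        raise NotImplementedError

class FlatFileStore(BlobStore):
    """Local-disk backend; an S3 or Blob Store backend would expose the same two calls."""
    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def store(self, key, data):
        with open(os.path.join(self.root, key), "wb") as f:
            f.write(data)

    def fetch(self, key):
        with open(os.path.join(self.root, key), "rb") as f:
            return f.read()
```

Swapping providers later then means writing one new subclass, not touching the application code.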
Lightly checked out projects like Kubernetes, Maestro-ng, Mesos and Fleet. This relates very much to my previous post. Moving from cloud to cloud could be really trivial and quick; for small projects it could take just minutes using high-level cloud orchestrators like TOSCA.
As far as I can see, I would prefer a shared queue or a limited queue instead of per-client individual queues. This strategy can be used for anything that has data shared between multiple users and requiring synchronization.
The first optimization things I usually take a look at are:
- Amount of data
- Size of indexes
- How often the data is being accessed
Especially repeated reads / writes to data which has huge indexes are bad for the server. OK, data which isn't indexed and triggers a full table scan is even worse, but hopefully nobody writes code that bad.
The goal of these things is to minimize the load on the server. I've often seen applications with large queues generate significant load on the server even when everything is idle. The load is generated by repeated checks against large tables without indexes, or with too-large indexes. In these cases the data should be partitioned better, or the whole queueing strategy should be changed.
How can we reduce the amount of data? Deduplication, aka normalizing the data. In some rare cases this is a bad idea, because the joins required could be even worse for the server than just scanning huge tables. But usually it's a good idea, especially if we can separate the "small queue" from the large data it's referring to.
Some queue systems store the same data for every client in data structures without deduplication. If the queues are memory based, this can cause the system to run out of memory or start swapping memory pages, aka excessive swapping, aka disk thrashing.
- How do we limit the amount of data in the queue? The first solution is to stop queueing for "dead clients": at some point the change log for such clients is cleared, and a simple message tells them to refresh everything, because the change log queue isn't available for that client anymore. This procedure usually gets rid of the worst clients, which cause most of the queue table bloat; these are the clients which never actually fetch any data from the queue.
- Then we can improve this solution by separating the queue and the data. Now we have only one queue table, which tells what should be sent where, and the data is stored separately. We only replicate the queue table entries for recipients, so instead of queueing 100 x 1 MB messages we store something like 100 x 24 bytes + 1 x 1 MB message.
- If step 4 isn't implemented, this solution can be further improved with a per-client status flag in a separate status table, which tells whether there's anything in the queue table for that client. This is again an optimization which makes writes heavier but the constant reads much lighter: instead of scanning the queue table index on every polling request, we simply check this smaller table. Of course, whenever the queue table is updated this status table also needs to be updated, making updates a bit slower.
- Improve the queue table so that there is a monotonically increasing ID counter. This is a very simple and RESTful way to track, for each client, what should be refreshed. Because we have this ID, it can be used as a pointer. This also allows us to form groups, because the server no longer needs to know which data has been fetched by which clients. Groups can be things like all clients, or clients with a certain status flag.
Before these changes the queue system was really taxing the servers, with database tables multiple gigabytes in size being accessed all the time.
After these changes, the queue system became virtually free to run. Now it consumes 1000x less disk space, and the server basically doesn't even need to know about all clients, because the implementation is fully RESTful; no state needs to be kept on the server side. All frequent polls check the same portion of the queue table instead of triggering separate disk accesses to fetch "per client data", which also minimizes the amount of data that needs to be cached in RAM all the time for performance reasons.
Depending on the database server and its data structures, this method can also be written so that the queue table is a circular log structure, rewriting older records with newer ones. This prevents data fragmentation very efficiently, because the queue is always a clean, continuous structure on disk as well. But this is a complex topic and requires detailed knowledge of the internals of the database system being used. In this case, the circular logging worked beautifully. But as an example, SQLite3 uses dynamic record sizes, and replacing an older record with a slightly larger one could lead to fragmentation. Some databases also provide special queue-type tables, but you can also abuse those and get bad results if you don't consider these matters when designing the solution.
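The monotonically increasing ID idea can be sketched with SQLite (table and function names are my own): the server keeps no per-client state, each client just remembers the last ID it processed.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")

def publish(payload):
    # Writers only append; the AUTOINCREMENT id is the monotonic counter.
    con.execute("INSERT INTO queue (payload) VALUES (?)", (payload,))

def poll(last_seen_id):
    """Client sends the last id it processed; server returns everything newer."""
    rows = con.execute("SELECT id, payload FROM queue WHERE id > ? ORDER BY id",
                       (last_seen_id,)).fetchall()
    new_last = rows[-1][0] if rows else last_seen_id
    return new_last, [payload for _, payload in rows]

publish("change 1")
publish("change 2")
last, items = poll(0)     # items == ["change 1", "change 2"]
last, items = poll(last)  # items == [] until something new is published
```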
I'm also very happy to receive any feedback; you'll find my contact info on the home page.
- Watched: Mikko Hyppönen's TEDx Brussels 2014 talk Internet is on Fire.
- Finnish businesses are looking for multiple methods to avoid taxes. This is a situation where taxation is so high that it's really worth all the effort of trying to find any way to avoid it. Tax avoidance is good business.
- Maybe 'Finnish businesses' should be located in Panama or Hong Kong? There have been multiple discussions about whether running a business in Estonia gives true tax benefits the way those two alternatives do.
- Reminded developers once again about Semantic Versioning, which does make sense, instead of versioning everything totally randomly or not changing version numbers at all when releasing new versions. I've seen so many projects fail miserably even with simple versioning.
- Watched pyvideo Set your code free releasing and maintaining an open-source Python project.
- I'm just wondering why some firewalls reserve NAT table entries for UDP ports which have been forwarded to the internal network. Because there's a static mapping in place, there's no reason whatsoever to replicate these static mappings as temporary mappings, which then just consume resources in the temporary mapping table. In the case of many DHT network implementations, which quickly communicate with a large number of peers, this could easily reserve thousands of entries and basically swamp the router's NAT table. If I wrote such firmware or software, I simply wouldn't store the static mappings in the temporary NAT table at all.
- Alternate view: I hate sites which require a login & password, especially just to view content. I try to ignore every such site, because they're designed really badly. I also don't get the silly correlation between email and registration. I don't want to give my email, nor do I want to register. If I need to give an email, I can give any random temporary email. Signup procedures on many sites are horrible. The best sites allow interaction without these hindrances.
- Watched pyvideo So You Want to Build an API. Nothing new, but otherwise a good talk including best practices and patterns of API building. Handled caching, headers, authentication, resource urls, rate limiting, JSON data structures, and other regular stuff.
- Checked out new Docker features. Docker Engine, Docker Machine, Docker Swarm, Docker Compose.
- Watched: Introduction to Twisted, which goes through the differences between synchronous and asynchronous code.
- Watched: Machine Learning with Scikit-learn.
- uWSGI's Emperor mode is as wonderful as many other uWSGI features. Configuration data can be given on the command line or from a directory of config files (XML, INI, JSON, YAML, SQLite3), or from PostgreSQL, MongoDB, AMQP, ZeroMQ, Zookeeper or LDAP. Wonderful; you won't run out of options.
- Wondered about the standard style of network admins to screw things up. They take any random network device: uuggh, I can't log in. Then they reset it to factory settings and start tuning things. Don't they realize how massively they just f..d up? Nothing to say about these so-called 'field best practices', duh.
- National Service Channel (Finland) - Technology review - in Finnish: "An excellent summary of the technologies and techniques of the national service channel; it will probably be one of the biggest integration initiatives and projects in Finland in the coming years."
For English information check out X-Road Europe.
If anyone is interested in these integration projects or related services, that's great, because at least I'm interested in these new services and opportunities. The stuff they're doing is also very familiar to me.
kw: data transfer protocol, technical requirements, schedule, further development, general overview, service architecture, message bus, service process, standardization, distributed system.
- Also check out (in Finnish) the National Service Channel experiences by the City of Espoo. No surprises there either.
- Checked out VersaWeb. It seems that they provide "Hetzner"-like prices in the US. Really nice and competitive pricing, though they only have US servers right now.
If you're looking for super cheap VPS servers to work as a VPN gateway, or just to monitor things or run really light code on the web all the time, it's a good idea to check out Atlantic.net, starting from less than 1 USD / month. I also wrote a long post about the differences between VPS and cloud (IaaS, SaaS) alternatives, but that's in Finnish.
- Something different? Checked out Cruise Missiles, History of Nuclear Power and Area 51 documentaries, which tell a lot about the history of the inventions made and tested, and what kinds of failures there were along the way.
Final comments: Some SSD drives are just so horrible. Worst SSD performance ever? OK, OK, 10 MB/s read / 4 MB/s write USB 2.0 sticks are worse, and the random 4KB access speed is still much better than with traditional HDDs.
-----------------------------------------------------------------------
CrystalDiskMark 3.0.3 (C) 2007-2013 hiyohiyo
Crystal Dew World : Crystal Dew World
* MB/s = 1,000,000 byte/s [SATA/300 = 300,000,000 byte/s]
Sequential Read : 82.151 MB/s
Sequential Write : 31.920 MB/s
Random Read 512KB : 79.885 MB/s
Random Write 512KB : 15.408 MB/s
Random Read 4KB (QD=1) : 7.955 MB/s [ 1942.1 IOPS]
Random Write 4KB (QD=1) : 1.954 MB/s [ 477.1 IOPS]
Random Read 4KB (QD=32) : 8.360 MB/s [ 2041.0 IOPS]
Random Write 4KB (QD=32) : 1.908 MB/s [ 465.9 IOPS]
Test : 100 MB (x3)
Date : 2014/12/08
- IPv6 adoption globally is now at 5%, according to Google's IPv6 adoption statistics site: https://www.google.com/intl/en/ipv6/statistics.html Belgium leads with a 28% adoption rate.
- Studied the new image (compression) format BPG, which looks really good, even better than WebP. The only question is whether it's partially patented or not. I've personally been very disappointed by how slowly (if at all) things like JPEG 2000 (.JP2) and JPEG XR (.JXR) have been adopted, even though they provide an advantage over older formats.
- Latency changed between my servers (in a datacenter) and home: the latency is now about 1.3 ms when about a year ago it was only 0.9 ms. That's a horrible ping!
- Multi is a new(?) elevator concept where there can be multiple elevator cars in the same shaft. Personally I find it funny that it wasn't introduced a lot earlier. Their current design is also bad, because the single shaft still blocks traffic when elevators stop. I would prefer an option with, let's say, six shafts, of which only some have doors, so that cars which are stopping won't block other traffic, just like taxis operate at airport drop-off / pick-up points. When this is computer controlled, it's much faster and safer than with cars. Six shafts could allow operating tens of elevator cars in the same space. Some floors, like the ground floor, could have six doors while other floors usually have two, leaving shafts free for passing stopped cars.
- Played a little with UpCloud's new MaxIOPS, which provides better-than-SSD IOPS performance in the cloud.
- Nice post: Make Your Program Slower With Threads. Nothing new there. As my database tests showed earlier, context switches and locking can really kill performance. That's why I prefer the 'shared nothing model' and, if that's not possible, communicating between threads using only locking queues with a proper "chunk size", so the inner loop doesn't pop items from the queue one at a time but instead takes anything between 100 and 1M entries depending on the situation, and locking doesn't start to dominate the process. Still, it's interesting to see how people (all people) hit the same performance, synchronization and similar problems over and over again.
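The chunked-pop pattern can be sketched with Python's standard queue module (the function name is mine): the consumer blocks for the first item, then greedily drains up to the batch limit, so locking overhead is paid once per batch instead of once per item.

```python
import queue

def pop_batch(q, batch_size=1000):
    items = [q.get()]  # block until at least one item is available
    try:
        while len(items) < batch_size:
            items.append(q.get_nowait())  # drain the rest without blocking
    except queue.Empty:
        pass
    return items
```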
- The hobby project we've been working on during weekends with my friends currently uses threads very sparingly. We prefer fully asynchronous message queues and just a few threads instead of launching a large number of threads for processing data from multiple sources. The current design has worked very well, performs well, and is memory and CPU friendly too. Basically there are just three threads + a few workers: an inbound message handler, an outbound message handler, a controller thread, and one worker process per CPU core, run as processes due to CPython's GIL limitations.
The project also uses the old tricks I've used before: basically tapping into existing program code with minimal modifications. I'll just launch my own parasite thread inside the main program and then add a few asynchronous queue calls wherever required for I/O operations. Sometimes I also directly call the main program's methods or create my own instances of objects available in the main program. Whatever is required to reach the goal with minimal work. It's not always beautiful, but it's an easy, dirty way to plug into systems which don't provide proper plug-in / integration features.
- It seems that Finland is going to get a lot better Internet connectivity toward central Europe via Cinia Group's new Operation Sea Lion.
- When data gets creepy. - I have nothing to add, this shouldn't be news to anyone. Everything you do is being monitored, stored, archived, indexed and analyzed.
- E-residency launched on 1 Dec 2014. Now you get access to digital signing, online business registration and banking.
- Still wondering why I just can't get my uWSGI to listen on an IPv6 interface, even though everything is just fine with the LXC container where I'm running it. If I just start listening for connections using Python 3 sockets, there's no problem with IPv6 whatsoever. The uWSGI documentation didn't shed any light on this matter. I guess I'll have to post a Stack Overflow question.
I think I found an FTS-related bug.
Afaik, both code snippets should only delete the records from the FTS table where the key matches.
# Code 1 - For some reason this seems to delete all records
fts_instance = FTS.get(FTS.key == key)
fts_instance.delete_instance()
# Code 2 - This works as expected and deletes only the matching key
FTS().delete().where(FTS.key == key).execute()
Any ideas why delete_instance deletes everything?
uWSGI is a complete web server
uWSGI virtual hosts (multiple applications) on the same or separate hosts, etc.
So it seems that most uWSGI guides are at least semi-outdated, and the rest instruct you to use Nginx, even though it's not required anymore because uWSGI has its own efficient web request router. Now I'm just asking: what's the most efficient way to route queries to different apps, and how exactly do I do it? Most guides I've seen actually run separate instances of uwsgi for each app and use separate ports or sockets, but with the internal router this shouldn't be required. Has anyone done it? I would prefer a minimal setup where two scripts are loaded and served based on the host name (the HTTP request header). I know there are just so many ways to do it, but I would prefer doing it using plain uwsgi and its parameters. Of course I can load one app which imports the other apps and then internally routes the requests between them based on the headers and so on, but doing it in uWSGI itself would be the cleanest solution afaik.
The latest uWSGI can also handle HTTP and HTTPS requests using pre-processing thread(s), as well as do content off-loading after the request. So it can basically (at least in theory) handle requests as efficiently as running a separate HTTP server like Nginx or Apache in front of it. Workers won't be reserved before the data is ready for processing, and workers won't be stuck delivering the data after processing. It also handles static files efficiently.
Some of the guides are based on really old versions which had to reserve a thread for the whole processing time, from opening the connection to finishing it, just like the good old Apache pre-fork systems did. But times have changed since then. It seems many people still think Apache works that way, too.
Currently I'm looking for a simple solution for doing vhosts using uWSGI only. Of course I can handle it at the application level, but how to configure uWSGI itself to route requests based on the Host header is still a bit open.
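The application-level fallback I mentioned can be sketched as a tiny WSGI dispatcher that picks an app based on the Host header. The host names and apps here are made up for illustration; uWSGI would simply load this application callable:

```python
def blog_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"blog"]

def shop_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"shop"]

VHOSTS = {"blog.example.com": blog_app, "shop.example.com": shop_app}

def application(environ, start_response):
    # WSGI servers put the request's Host header in HTTP_HOST; strip any port.
    host = environ.get("HTTP_HOST", "").split(":")[0]
    app = VHOSTS.get(host)
    if app is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown host"]
    return app(environ, start_response)
```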
Nice post: 7 principles of Single Page Applications (SPA)
- Which is basically SPDY over UDP. You'll find the reasons and benefits in the documentation.
Studied IPv6 Rapid Deployment (6rd). Why? One of the ISPs I'm using now provides IPv6 addresses with it. This is basically along the lines I was thinking earlier: if ISPs won't provide native IPv6, every operator could still provide their own 6to4-style gateway to guarantee service levels. 6rd is basically just that.
Wondered about some database administrators' "best practices", which are quite crazy afaik. What to do if the database engine is shutting down too slowly? Well, let's kill the process, delete the journal and restart it. Why delete the commit journal? Well, because moving committed transactions from the journal to the database files is a slow operation, so a normal shutdown, or starting the system with the uncommitted journal, is slow. It's "better" to kill the database engine, delete the journal and restart it, since that guarantees a swift database service restart. - Uh, arf, om(f)g.
Studied: REST best practices. - Quite a short and obvious list if you've been doing this stuff for years.
Thought about stuff like the digital economy and the concept of a real-time economy, and which things will dominate future trading on the net. They talked a lot about this at the Slush event too, #Slush14.
Watched Mikko Hyppönen's talk at #Slush14 about the Internet's future, startups, security threats, etc. Also watched Harri Hursti's talk on how many different kinds of attacks can be mounted over USB (sticks, mobile phones, etc.). Basically USB devices can infect, or be infected by, any other device they get connected to. Don't share your USB devices, don't use other people's USB devices, and remember to use a USB condom.
Checked out: Arctic Fibre project. - This is a great competitor to ROTACS, and it could be routed through Finland. It seems that Russia and Canada are also very interested in getting the cables deployed, for multiple political reasons.
Hobby project: Also programmed a lot. I'm helping my friends launch their hobby project. A lot of work: git commits, pushes, pulls, Docker containers, virtual servers, etc. Working with JSON, DHT, jQuery, peewee, Python 3, Bottle, Angular, Tornado, and coordinating the whole project globally with a small yet very agile and productive team. I don't know when the project will be ready. Let's say it's 90% ready, so finishing the last 10% will take about 100% more work. Eh, as usual. No, really: there are just a few minor showstoppers which make the end-user web app usability so bad that it's not ready for release. Technically everything on the server side is already fully working. Soon, maybe. The project has technically been running for two weeks already, but under wraps, in a closed private beta for feedback and testing on mobile devices and browsers on different platforms.
- Thoroughly studied the OpenBazaar documentation and source code. Also see the marketing site OpenBazaar.org
It's really painful: many things are so badly named that it's very hard to find any real patterns without extensive study and testing. Well, I made it, but there are many things to fix. The code also seems to be leaking, and for some strange reason it hangs so that peers can't connect anymore and contracts won't get stored. This means the process requires frequent restarts, yay! Otherwise you'll run out of memory, file handles, etc. - As they say, naming things is hard, and the current terminology is an absolutely horrible mess. It took me several hours to figure out some of the basics, because totally different naming conventions are used in separate parts of the program, with mixed snake_case and CamelCase even in the JSON messages. Some messages (with practically similar content) use lists, some use dictionaries for the same purposes, and so on. As an example, the result message types: "store", "store_contract", "peers" (a list of dictionaries), "foundNodes" (a list of lists). Some parts of the documentation use XML even though the program itself only uses JSON; the list goes on. Sometimes stuff is called contracts, products, listings or items, and in some cases peers, nodes, markets, stores, pages. Ugh, are we having fun yet? JSON messages have linefeeds (\n) in the middle of the data, which need to be stripped (field: PGPPubKey), etc.
- Operation Onymous
- Swedish Visby class corvettes
- WorldVu Satellite Constellation - Because SpaceX is also talking about a dense worldwide low-orbit satellite network.
- Lightly studied Google Container Engine - Seems to be a great solution. I just wonder when OpenBazaar will be available so you can easily "drop" your "shop" into any Docker hosting. Btw. the OpenBazaar project already contains a directory with Docker stuff, so it's quite clear they've thought about it.
- Telegram cryptanalysis. It has great examples of what not to do. The current $300,000 Cracking Telegram Encryption contest is also quite interesting.
- A great example of how Bitcoin's DoS protection can actually make you vulnerable, by allowing you to connect only to an attacker's own nodes.
- Designing secure P2P networks is really hard, because every decision has its own pros and cons, which may not be clear at all.
- NVM Express SSD interface
- Quicky checked out Amazon AWS Lambda and Amazon EC2 Container Service (ECS)
- CDN services by CDNetworks - They have a really dense network of hubs, even though the service seems to be quite unknown compared to other players in the market. It's one of the CDN networks which also covers Africa, South America and China using several hubs.
GET /ws HTTP/1.1
Websockets.py (websocket-client-py3 0.10.0): fixed several things which were more or less broken.
Fixed the hostport parameter value, which is sent as the Origin header value. Basically removed the port number:
if True: # Was: port == 80:
At least the Tornado web server doesn't want to see the port number in the Host header when using web sockets. This hasn't been fixed in the latest version, or maybe it's a bug on Tornado's side? Don't know, didn't check; it works now. It seems that a nonstandard port should be reported in the Host header, so the fail isn't actually in websockets.py; it's on Tornado's side?
logger.debug("send: " + repr(data))
l = self.sock.send(data)
data = data[l:]
Swapped the places of lines 1 and 2, because in the original code the trace statement was after the while loop, and naturally the data field was empty at that point. It seems this fail has been fixed in version 0.15.0 of the websocket client py3 library. Currently using version 0.21.
- One of the reasons why I hate .NET: installing .NET on Windows servers requires that you run an IIS web server on that server. Afaik, that's not a great way to reduce the attack surface on servers. All servers then run publicly reachable web servers with default settings. All this because the .NET installation requires it. - Arf!
- Selecting non-referenced entries, using Peewee ORM
It seems that I'm having a hard time doing one kind of data lookup with Peewee ORM.
Table A contains "id"; table B contains foreign key fields fk1, fk2, ...
How do I find all A's which aren't referenced by B, and what's the most efficient way of doing it? Of course I can pull the list of all keys in B and then select from A those which are not ["in" / "<<"] that set, but that's hardly the smartest way of doing it. It gets horrible if I have 100k keys: first I have to dig up a list of 500k keys and then compare those to the 100k keys, etc. The worst part is that I'm actually building an object list, which naturally can't be optimized any further by the SQL query engine.
Or maybe I should run five separate queries collecting fk1, fk2 and so on, and then query where not in the set of five lists, or make a separate OR statement for each list. I haven't tried that yet. It would still allow the mess to be optimized by the SQL query engine, because I haven't executed any of those statements separately.
I've tried many things, and joins don't seem to work well in cases where there's no foreign key reference. But is there a perfect solution for what I'm asking? Maybe I'm just not finding the right mindset, or I'm approaching this case from some kind of weird or restricted angle?
Yes, and yes. If I were using pure SQL, I could do this easily using a left outer join. But in this case I'm not; I'm using Peewee ORM, which sets its own limitations on joining.
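For reference, this is the plain-SQL shape of the query I'm after, demonstrated with Python's sqlite3 (table and column names invented for the sketch): one left outer join per fk column, keeping the A rows where every join came up empty. Whether Peewee can express this directly depends on the version, so check its join documentation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, fk1 INTEGER, fk2 INTEGER);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO b (fk1, fk2) VALUES (1, 3);   -- rows 1 and 3 are referenced
""")
# Left outer join against every fk column; keep A rows with no match anywhere.
rows = con.execute("""
    SELECT a.id FROM a
    LEFT JOIN b b1 ON b1.fk1 = a.id
    LEFT JOIN b b2 ON b2.fk2 = a.id
    WHERE b1.id IS NULL AND b2.id IS NULL
""").fetchall()
```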
- Created Google+ Peewee ORM Users Community
- Interesting article about Machine Learning, Pattern Recognition and Deep Neural Networks
As we know, the results we get from Big Data and neural networks are often true black-box results. Nobody really knows why we get what we get.
- Read Hacker's guide to Neural Networks
- Studied the Crypto101.io document. I'm glad it didn't contain anything that would be new to me. Yes, I don't remember / know all the exact minor details, like the exact differences between different pseudorandom generator implementations. But in general, it didn't bring anything new to the table.
- Finally GnuPG 2.1 has been released with Elliptic Curve Cryptography (ECC) support! But there aren't any other OpenPGP-compatible implementations with ECC features out yet, so ECC keys will have very limited use for a while. When it's delivered via the standard Ubuntu repositories, I'll generate my new ECC keys using Brainpool P-512 (brainpoolP512r1).
- OpenBazaar is a distributed online trading platform, which also makes it a censorship-resistant marketplace. No fees, and everything goes. This is the future, as far as I can see. If you read my older posts, I've been wondering why The Pirate Bay (TPB) continues to operate a website. Wouldn't it have been much better to create a fully distributed, encrypted, peer-to-peer network where you simply can't track who's sharing and downloading what, just as Freenet does, but especially designed for peer-to-peer distribution & potential trading? It could have contained features for digital trading, reputation systems, etc. But this is it: basically the same concept, but not tied only to file trading. In that sense, this is a better, more generalized solution.