The Cloud, Query Optimization, Other stuff
Post date: Dec 8, 2013 2:21:55 PM
Cloud discussion @ LinkedIn, sorry, no link. It's a closed group.
Attended a very long discussion in a Cloud professionals group. Many, many aspects were covered, mainly from the viewpoint of enterprise clients. I'm not going to summarise the whole long discussion here, but here are the keywords from the topics which were covered.
Security, Risks, Accessibility, Connectivity, Cost, Geographical distribution, Disaster Recovery, Data Replication, Bandwidth, Performance, Private Cloud, Hybrid Cloud, Redundancy, Backups, Off-site, Reliability, Uptime, Service Level Agreement (SLA), Vendor Lock-In, Are simple IaaS virtualization providers Cloud service providers at all, SaaS updates as surprise, Business size and usability of cloud services, Own data centers, ROI, OPUD (over-promise and under-deliver), contracts, projects, marketing, data ownership, confidential, confidentiality, dedicated, physical possession, control, Disaster Recovery as a Service (DRaaS), overselling, under-resourcing, dynamic clustering, scalability, high availability, multitenancy, single sign on, remote host, networked hosting, web applications, redundancy, message-passing, scheduling, partitioning, deployment management.
It's also funny to notice how differently marketing, CEOs, CTOs, product managers, network engineers, server admins, programmers and end users all see the cloud. They all have a totally different view of the same thing.
Are VMs over the Internet actually a cloud service at all? There was a long discussion about whether providers like Digital Ocean are "fake" cloud providers. It just makes it so clear that there are no clear facts about what a cloud service is. NIST has produced this document, but those requirements are easy to fulfil. - The NIST Definition of Cloud Computing. (PDF) - Even I could easily provide those from my home: one server, RAID 1 or 5, my fiber connection and a light web management console for LXC containers.
Database Query Optimization
Utilizing pre-aggregated data: why is it so hard for engineers? We have pre-aggregated data for year, month, day and hour. For simplicity of this example, let's assume that every month has 30 days. Now if I want to run a report for 3 months, wouldn't it be smart to produce the report by summing up the three monthly aggregates? Yet if the user selects the date range with a calendar, the system sums up 90 days instead of 3 months. I just hate it. Same thing if we want a report for 10 months: the system sums up 300 days, instead of taking 1 year and deducting two months from it. Engineers say that this kind of logic is too complex for computers to handle. Using 10 monthly aggregates to produce the report can be done, but it requires the user to specifically define that monthly data should be used as the source. I personally think this is just a perfect example of bad engineering. Shouldn't the system use the optimal method to deliver the data the user requested? Why is it up to the user to optimize report generation? I just don't get it, but this really isn't the first time this has happened.
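To show that this logic is not too complex for a computer, here is a minimal Python sketch of a planner that picks the aggregates for a date range. It is only an illustration, not how any real reporting system works. The names plan, build_aggregates and evaluate are made up for this example, days are plain integers counted from zero, hourly buckets are left out, and the 360-day year is an assumption that follows from the 30-day month simplification above.

```python
# Bucket sizes in days, largest first. The 360-day year is an assumption that
# follows from the simplified 30-day month; hourly buckets are left out.
UNITS = (("year", 360), ("month", 30), ("day", 1))


def plan(start, end, units=UNITS):
    """Return signed aggregate reads covering the day range [start, end).

    Each read is (sign, unit, index); sign -1 means "subtract this bucket".
    """
    if start >= end:
        return []
    (name, size), finer = units[0], units[1:]
    if size == 1:
        return [(1, name, day) for day in range(start, end)]

    first, last = start // size, (end - 1) // size
    if first == last:
        # Range lies inside one bucket: either add up finer buckets, or take
        # the whole bucket and subtract what falls outside the range.
        lo, hi = first * size, (first + 1) * size
        additive = plan(start, end, finer)
        outside = plan(lo, start, finer) + plan(end, hi, finer)
        subtractive = [(1, name, first)] + [(-s, u, i) for s, u, i in outside]
        return min(additive, subtractive, key=len)

    # Range spans several buckets: partial head, whole middle, partial tail.
    head = plan(start, (first + 1) * size, units)
    middle = [(1, name, i) for i in range(first + 1, last)]
    tail = plan(last * size, end, units)
    return head + middle + tail


def build_aggregates(daily):
    """Pre-aggregate a list of daily values into every bucket in UNITS."""
    totals = {}
    for day, value in enumerate(daily):
        for name, size in UNITS:
            key = (name, day // size)
            totals[key] = totals.get(key, 0) + value
    return totals


def evaluate(reads, totals):
    """Combine the signed reads into the report total."""
    return sum(sign * totals[(unit, index)] for sign, unit, index in reads)
```

With this sketch, plan(0, 90) returns three monthly reads, and plan(0, 300) returns year 0 minus months 10 and 11, so three reads instead of 300. The user just picks dates from the calendar and never has to say which aggregation level to use.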
Studied Microsoft Azure pricing & concepts, and compared those with Hetzner, AWS / EC2, UpCloud, Sigmatic, etc.
Data as a Service (DaaS)
Disaster Recovery as a Service (DRaaS)
Database as a Service (DBaaS) / Cloud Database like Amazon Redshift, Google Cloud SQL, App Engine DB Store, Amazon RDS
Software Defined Storage (SDS)
Service Level Agreement (SLA)
Wondered why PowerShell starts so slowly, much more slowly than Python. Then it became evident: PowerShell is almost 10 times as heavy as Python 3.3. It requires more disk I/O to launch and also consumes a lot more memory (shared & private). Of course this isn't usually a problem, but if there are many parallel tasks running, some resources are wasted. I'm now talking about multi-tenant environments where there will be 1k-10k processes running. In the usual cases where a script runs hourly or daily, this of course doesn't make any difference. That's also where the startup time is longest, because everything isn't already cached.
Checked out a few posts about recursion. It reminded me of the times when I was learning it. It's very useful when doing something like flood filling or path traversal. Storing state in the stack, instead of using a separate list to store it, is just so nice. Same results, but with much simpler & more elegant code. Somehow this just reminds me of the differential, which is also a quite simple yet interesting solution. As well as more complex devices where stuff is rotating relative to something else etc.
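As a small reminder of how little code the recursive version needs, here is a minimal flood fill sketch in Python. The function name flood_fill and the list-of-lists grid are just assumptions for this example. The call stack remembers which cells still have neighbours to visit, so there is no separate work list.

```python
def flood_fill(grid, row, col, new_value):
    """Recursively replace the connected region containing (row, col)."""
    old_value = grid[row][col]
    if old_value == new_value:
        return grid  # nothing to do, and recursing would never terminate

    def fill(r, c):
        # Stop at the grid edges and at cells outside the region.
        if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
            return
        if grid[r][c] != old_value:
            return
        grid[r][c] = new_value
        # Visit the four direct neighbours; pending ones wait on the stack.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            fill(r + dr, c + dc)

    fill(row, col)
    return grid
```

The downside of keeping the state in the stack is that Python's default recursion limit is about 1000 frames, so a large region would need a raised limit or an explicit list after all.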
CoreOS - Linux for Massive Server Deployments, sounds like a great product, if you just happen to need one.
A nice comparison of HTML5 frameworks. I was just hoping for a longer and more in-depth article.
Read a nice article about Using Google App Engine, Google Compute Engine and Amazon Redshift for ETL processing.
NSA tracking phone locations globally? Hmm? No news. I personally think that if something is technically doable, someone is going to do it anyway. It doesn't matter if it's illegal, doesn't follow good ethics or even if it doesn't make money for you. It can be done for several reasons. This is why most data shouldn't be collected in the first place, because all collected data can be abused later, even by a single worker with extensive access to systems. Get the data, do the dirty work, and sell or report it to management etc. Just make it clear: I got the results you asked for, don't ask how I got them. That way you can claim later that it was a complete surprise that thing X happened.
Being anonymous is hard, really hard: I do know that, because I have tried it. It's quite complex stuff, but it is achievable when required. I've been practicing it for several years. It requires a certain mindset and abandoning all your habits. And it's hard for sure. It's so darn easy to fail if you follow familiar patterns or habits. The hardest and most complex thing is to make your peers understand that they really, really have to follow this procedure, without any exceptions. So, unless you have practiced a lot and it has 100% of your focus, you're going to fail.
Checked out IPython, even though I don't have any use for it right now.
Checked out Micro Python, it's a great concept. If I worked with microcontrollers, I would use it for sure when it's ready.
Played with StartMail Beta. Nice service, but unfortunately they still seem to have a completely wrong price point. They haven't yet released official prices, but if the comparison is a Digital Ocean server for $5 / mo with 20GB of storage space, it's hard to beat that. At least Hushmail is really expensive compared to Digital Ocean's pricing. Of course I can use a Digital Ocean server for a lot more than just email.
SMS4TOR seems to be basically the same service as my Off-The-Record was. It provides encrypted, secure, self-destructing messaging over the Tor network using URLs.
Totally different just for fun:
Retrovirus, Virus, Gene Therapy, Bacteriophage. I'm personally starting to think that this could be something even bigger than Nanotechnology or Computer development. The combination of these technologies will change the world beyond our current imagination and understanding. Just as an example, think about a device which will create (like 3D print) any kind of virus, bacteria or cells you ever need, for whatever purpose. Automated analysis, computation, and manufacturing of customized personal vaccines for just the exact problem you're having, etc. It's going to be something truly wonderful, or horrible, depending on the purpose the technology is used for. As we all know, one of the topmost human interests has for centuries been killing each other. That has also been the driver of new technologies. Either you develop something or die. Well, that's quite a good concept for medicines for serious diseases too, but it doesn't touch as many people as war does. - Who's going to work in the field of Genetic Engineering in the future? When are we going to be able to build completely new life forms, viruses etc., not only combining or modifying existing ones, but building stuff from scratch?