IoT, Python Big Data, Shingled Magnetic Recording, Cloud Platform Abstraction

Post date: Dec 16, 2014 4:33:52 PM

Watched and read a lot about the Internet of Things (IoT), Big Data, Big Data analytics using Python, embedded systems and so on.
For example: The Top Mistakes Developers Make When Using Python for Big Data Analytics. Nothing new, business as usual.
Watched several WIRED2014 talks
Watched: The four pillars of a decentralized society
Once again read about SABRE, A2, SSTO, Venture Star X-33. Hmm, doesn't seem to be a working concept. This topic just pops up over and over again as some kind of techno sci-fi fantasy

Shingled Magnetic Recording

Seagate has released drives which use Shingled Magnetic Recording (SMR). It's excellent for large storage drives that are written slowly. I'm just very curious how they fix the random write performance. Some people claim that the write performance isn't that bad, but how do they do it? Because tracks overlap, updating one sector with SMR might require several read and write passes, and unless they're using some clever tricks, that easily leads to really bad performance.

Maybe they're using Flash Translation Layer (FTL) style tricks inside the track / shingle batch, which would basically allow them to read and write a sector without waiting for full rotations of the disk. Maybe each region contains some free space and they use logging style updates, where even random writes (to the same track / shingle region) are neatly collected and written in one go. And then there's separate garbage collection / compaction during idle times. That kind of approach would surely work at least for desktop use. Any ideas how they really do it?
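To make the logging idea above a bit more concrete, here is a minimal Python sketch of it. This is purely my guess at the general technique, not how Seagate's firmware actually works. The class `ShingledZoneLog` and all of its methods are hypothetical names made up for illustration: random writes are appended sequentially to a zone, a map remembers where the newest copy of each logical sector lives, and compaction drops the stale copies.

```python
class ShingledZoneLog:
    """Toy model of log-style updates inside one shingled zone (hypothetical)."""

    def __init__(self, zone_capacity):
        self.zone_capacity = zone_capacity
        self.log = []      # append-only list of (logical_sector, data)
        self.mapping = {}  # logical_sector -> index of its newest copy in the log

    def write(self, sector, data):
        # Random writes never overwrite in place; they are appended sequentially.
        if len(self.log) >= self.zone_capacity:
            self.compact()
            if len(self.log) >= self.zone_capacity:
                raise OSError("zone is full of live data")
        self.mapping[sector] = len(self.log)
        self.log.append((sector, data))

    def read(self, sector):
        # The map points straight at the newest copy of the sector.
        return self.log[self.mapping[sector]][1]

    def compact(self):
        # Garbage collection: keep only the newest copy of each sector,
        # preserving write order. A real drive would do this while idle.
        newest = sorted(self.mapping.items(), key=lambda item: item[1])
        self.log = [(sector, self.log[index][1]) for sector, index in newest]
        self.mapping = {sector: i for i, (sector, _) in enumerate(self.log)}


zone = ShingledZoneLog(zone_capacity=4)
for sector, data in [(7, b"a"), (3, b"b"), (7, b"c"), (3, b"d"), (9, b"e")]:
    zone.write(sector, data)  # the fifth write triggers a compaction first
```

In this toy model the compaction happens only when the zone fills up; the trade-off is that writes stay sequential and cheap, while the cleanup cost is deferred to a later moment.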

Software abstraction layers, internal APIs, Cloud Services, software design, etc.

Moving between service providers is not impossible; it can even be trivial. But it all depends on how you abstract your system. Do you use Docker, LXC or pure IaaS? If you have the required abstraction layers in your system, you can basically map calls to anything you desire. It also depends on how complex those abstraction layers are. In some cases moving between service providers is much harder, for example if you're using Google App Engine (GAE).

It's hard to move to another cloud unless you've specifically built the app with an abstraction layer which allows you to map the GAE calls to any other solution. Then you just swap the data store to MongoDB and so on. If the system has been built and grown without strict control, using all kinds of PaaS features, then it might be really hard to move out.

If you're using a pure IaaS platform and you have automated Linux deploy scripts or use OpenStack, then moving a fairly simple application from one cloud provider to another can be really trivial.

Something simple like SurDoc can be implemented on almost any platform, as the code behind the service isn't too complex. It's not hard to move or even recreate a similar service.

All this comes down to the final question: cost and time. Is it worth it? If a startup starts out on GAE, is it worth creating a system that can be run on any platform? Every abstraction layer might require a lot of extra work, change the system design and complicate it. On the other hand, using the required APIs and layers might actually make the project much simpler, because you can abstract all the complex service provider related issues out. Take storefile(key,data): it doesn't matter what OS or what cloud provider you have, or whether you're using S3, Google Blob Store, MongoDB or a flat file system. It just works.
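A minimal Python sketch of what such a storefile(key,data) layer could look like. Only the `storefile` name comes from the text above; `loadfile`, `use_backend`, `MemoryBackend` and `FlatFileBackend` are hypothetical names invented for this example. An S3 or MongoDB backend would just be one more class with the same `put` / `get` methods.

```python
import os


class MemoryBackend:
    """Keeps blobs in a dict; handy for tests (hypothetical example backend)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]


class FlatFileBackend:
    """Stores each blob as one file under a root directory (hypothetical)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        # Hex-encode the key so any string becomes a safe file name.
        return os.path.join(self.root, key.encode("utf-8").hex())

    def put(self, key, data):
        with open(self._path(key), "wb") as handle:
            handle.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as handle:
            return handle.read()


_backend = MemoryBackend()


def use_backend(backend):
    """Swap the storage provider; application code does not change."""
    global _backend
    _backend = backend


def storefile(key, data):
    _backend.put(key, data)


def loadfile(key):
    return _backend.get(key)
```

The application only ever calls `storefile("reports/2014.csv", data)` and `loadfile(...)`. Moving to another provider then means writing one new backend class and calling `use_backend()` once at startup.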

Lightly checked out projects like Kubernetes, Maestro-ng, Mesos and Fleet. This relates very much to my previous post. Moving from cloud to cloud could be really trivial and quick. For small projects, it could take just minutes to move from cloud to cloud using high level cloud orchestrators built on standards like TOSCA.
