posted Mar 2, 2015, 7:04 AM by Sami Lehtinen
updated Mar 2, 2015, 7:09 AM
I'm going to completely ignore the presented use cases and write from a totally generic perspective, as if this were any other distributed storage system, thinking about the selected technical choices and their potential problems.
- Store-and-Forward - Potentially higher latency compared to low-latency solutions, which means that it's harder to link sender and recipient, at least in theory. It also allows communication when one of the parties isn't available, which is often a nice and desired feature.
- The idea of tagging and anonymity sets is something I like as a concept. The possibility of a custom length parameter is especially interesting. But it can also lead to many problems which aren't immediately obvious unless you really start thinking deeper. I'm covering a few of those in this post.
- Lack of Forward Secrecy - The same encryption keys are used all the time; there's no PFS ratcheting.
- Store-and-Forward - Potentially much slower than low-latency solutions, depending on the implementation used.
- Server load - If the server does all the message fetching, it could be seriously loaded. Caching is the key, but how often does the information need to be refreshed, and how heavy is that operation?
- Caching - Should the server store fetched messages so it can deliver them to other clients too? If I were writing the server code, I would naturally cache stuff: disk space is cheap, and caching would drastically improve the user experience thanks to much lower latency. DHT lookups can easily take several seconds, and if a larger key space is requested, it can take basically anything up to an hour. Depending on the sets used, the client distribution and the values used for the length parameter, this could easily lead to a situation where all servers hold most of the messages in cache anyway.
- Timing questions - Will the messages be fetched by the server as soon as the notification protocol announces that messages are available? Could the server also prefetch stuff? If stuff is fetched only when a client requests it, what are the performance and privacy implications? Prefetching of course improves performance. Without prefetching, it might be easier to confirm a client / data correlation (?), and latency-wise the performance would be really poor.
- Federation / notification protocol design and routing - How would the notification protocol work? There has been some discussion that servers should be notified about new data, but a DHT isn't great for that. Is there a parallel notification / gossip protocol available? There wasn't any description of one. This is something which easily becomes the next bottleneck. Even if full messages aren't being tossed around, the control traffic of the notification protocol could be really heavy if the network actually gets used, so it's not as distributed as it first sounds. Without such a protocol, requests for some keys (or key spaces) could be repeated over and over again, which isn't good either. This is especially true if the protocol doesn't have packet types saying "no change, nothing new, don't ask": an ETag-style solution where a server can easily tell that there's nothing new without sending the actual data.
- Can clients have multiple key requests hanging open using one long-poll request? What about requests that are really slow to fulfill?
- What about traditional DHT poisoning and flooding attacks, which could be used to cripple the network? On servers without adequate control code, this could also lead to out-of-disk-space situations and huge consumption of other system resources.
- Actually, the design makes indirect centralization highly probable - "Servers with low latency, high up-time, and robust anti-DDoS measures will attract more traffic" - which could quite easily lead to one service basically working as the backbone of the system. A server which fetches all messages and serves them instantly from cache would be the preferred solution for almost everyone. Nobody wants to use a naive implementation which checks the DHT and then fetches the messages and passes them to clients. That could take a long time, easily several hours, and of course some messages might not be available after all, for multiple reasons. If Facebook allowed a federation protocol, what percentage of users would be using servers other than Facebook's? Similar examples are Hotmail, Gmail, and so on. Even though email is actually fully distributed, I guess that the vast majority of email is handled by just a few of the largest providers. Or what about the Bitcoin mining pools, one of which gathered over 50% of the network's mining power? That's a better example than any other I could imagine: a fully decentralized system with one entity controlling more than 50%. Is it really distributed anymore, when that breaks all the basic governance rules of a distributed system?
- Amount of information stored per client on the server - What kind of information needs to be stored on clients and servers? I do have a clear picture in my mind of how the server part would work. Clients only need to know which keys to request, so that's more up to the final client application. But the server part could be pretty heavy. Long-polling requests from clients also take their toll if the service scales up.
- Expiry - How long will the data be kept on the original server or in caches? OK, caches can figure out how to get rid of data, but is there any official policy for the "authoritative" server?
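To make the ETag-style "nothing new" idea from the notification bullet concrete, here is a minimal Python sketch of a conditional poll. Everything in it is hypothetical: `KeySpaceStore`, `poll` and the `if_none_match` parameter are names made up for illustration and are not part of the Subspace documentation. The ETag is simply a SHA-256 digest over the messages held for a key space, so a client (or a peer server) that already has the current state gets a tiny "not modified" reply instead of the data.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class PollResult:
    """Reply to a poll: either 'not modified' or the messages plus a fresh ETag."""
    not_modified: bool
    etag: str
    messages: list[bytes] = field(default_factory=list)


class KeySpaceStore:
    """Hypothetical server-side cache grouping messages by key space."""

    def __init__(self) -> None:
        self._messages: dict[str, list[bytes]] = {}

    def add(self, key_space: str, message: bytes) -> None:
        self._messages.setdefault(key_space, []).append(message)

    def etag(self, key_space: str) -> str:
        # Digest over the digests of all messages held for this key space,
        # so the ETag changes whenever a message is added.
        h = hashlib.sha256()
        for msg in self._messages.get(key_space, []):
            h.update(hashlib.sha256(msg).digest())
        return h.hexdigest()

    def poll(self, key_space: str, if_none_match: str | None = None) -> PollResult:
        current = self.etag(key_space)
        if if_none_match == current:
            # Nothing new: answer with a small control reply, no payload.
            return PollResult(not_modified=True, etag=current)
        return PollResult(
            not_modified=False,
            etag=current,
            messages=list(self._messages.get(key_space, [])),
        )


if __name__ == "__main__":
    store = KeySpaceStore()
    store.add("a3", b"hello")
    first = store.poll("a3")
    second = store.poll("a3", if_none_match=first.etag)
    print(first.not_modified, len(first.messages))
    print(second.not_modified, len(second.messages))
```

For simplicity the sketch returns the whole key space once something has changed; a real server would more likely return only the messages newer than the client's ETag. The same check could be used between servers on the federation level, which is where the repeated-polling traffic would hurt most.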
Just confusing stuff
- "The messages would not be stored by the entire network, but rather only by the server to which they were uploaded." - Afaik, this is a problematic design that reduces network resiliency. It would also create clear centralized targets for bringing the network (functionally) down.
- "but would operate as a node in the DHT, randomly download and store a configurable amount of messages from the network ― inserting its IP address into the DHT accordingly ― and serve them when asked." - What? Download messages? Weren't messages stored only by the servers they were directly uploaded to? This contradicts the previous statement.
- "With enough nodes, each storing only a fraction of the total messages, we can guarantee all messages will remain reachable at all times, even if a server goes down." - Again, this doesn't match the first statement. It would also be smart to mention that the fraction is an overlapping fraction, not a clear-cut slice of the message pie.
- "Because messages are not stored by each node (like Bitmessage) we do not need to use a proof-of-work to prevent the network from being overrun with spam messages." - Flooding is fun and trivial if the network is incorrectly designed. I'm waiting for the challenge.
- "Because the network is server based, servers can implement traditional anti-spam measures without harming user experience." - Yes and no. Who says you would attack this kind of implementation at the "client level"? You would naturally do it at the inter-server level, aka on the federation protocol.
- "The protocol supports lightweight queries allowing the user to make the anonymity set/bandwidth trade-off." - This is a nice feature.
- "The protocol supports market based rationing mechanisms where necessary." - Much more detail please; this easily gets really complex.
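The tagging / length-parameter trade-off, and the "lightweight queries" quoted above, can be illustrated with a rough Python sketch. It assumes, purely for illustration, that each message is filed under a 256-bit tag derived by hashing the recipient's key, and that a client asks for every message whose tag shares its first `length_bits` bits. `tag_for`, `prefix` and `query` are hypothetical helpers, not Subspace's actual API or tag derivation.

```python
import hashlib


def tag_for(recipient_key: bytes) -> int:
    """Hypothetical 256-bit tag derived from a recipient's key."""
    return int.from_bytes(hashlib.sha256(recipient_key).digest(), "big")


def prefix(tag: int, length_bits: int) -> int:
    """Keep only the top `length_bits` bits (0..256) of a 256-bit tag."""
    return tag >> (256 - length_bits)


def query(stored_tags: list[int], my_tag: int, length_bits: int) -> list[int]:
    """Server-side lookup: every stored tag sharing the requested prefix."""
    wanted = prefix(my_tag, length_bits)
    return [t for t in stored_tags if prefix(t, length_bits) == wanted]


if __name__ == "__main__":
    # Synthetic data: one message each for 10,000 made-up recipients.
    stored = [tag_for(f"recipient-{i}".encode()) for i in range(10_000)]
    mine = stored[0]
    for bits in (0, 4, 8, 12, 16):
        hits = query(stored, mine, bits)
        # The client must download every hit; the recipient hides among them.
        print(f"prefix {bits:2d} bits -> {len(hits):5d} messages returned")
```

Each extra prefix bit roughly halves both the number of messages returned and the crowd the recipient hides in. It also shows the caching concern above: clients choosing short prefixes pull large, overlapping slices of the key space, so a server serving many of them ends up holding most messages anyway.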
My related post about decentralized systems.
KW: Bitcoin, OpenBazaar, P2P, IM, secure, anonymous, messaging, protocol, Subspace, federation protocol, server, servers, message, storage, distributed, messages, mesh, network, notification, announce, announcement.
Btw. I know there have already been multiple discussions about Subspace, but unfortunately I didn't have time to study them thoroughly, so my conclusions are based only on the original documentation provided. There has also been some preliminary discussion of whether Subspace could be used as a storage system for OpenBazaar.