
OpenBazaar more thoughts & ideas

posted Sep 2, 2015, 8:12 AM by Sami Lehtinen   [ updated Sep 2, 2015, 8:13 AM ]

My comments & questions about Ricardian Contract JSON format


OpenBazaar Ricardian Contract Template.md

Please note that the comments are based on an early version which has since been updated. The comments are also really quick draft thoughts based on a single reading pass, so they're not going to be flawless.

Numbering

I assume the zero-padded two-digit integer prefixes on keys are there to order stuff for human readability, i.e. to keep entries in a certain order. Are those going to be included in the final format? I don't know if it matters. As a nerd I would say ditch those, but using those could still improve readability during the specification phase.

Numbering is tight, so what about new inserts? If numbers are included, changing numbers would break stuff. Or should code strip the numbers before processing the contract even if those are in the JSON, or what? I guess everyone remembers renum in BASIC.

Using tight numbering leads to similar problems to traditional CSV columns (without header row) or length delimited data (without header row), where new additions are only practical to the END of whatever 'set' there is.

 - Numbers got ditched.
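For reference, the "strip numbers before processing" option mentioned above could have looked something like the sketch below. The prefix pattern (two digits and an underscore, as in 02_pubkeys) is taken from the draft, but the helper name and the 01_category key are only my assumptions for illustration.

```python
import re

# Assumed prefix format: two digits followed by an underscore, e.g. "02_pubkeys".
_PREFIX = re.compile(r"^\d{2}_")

def strip_prefixes(node):
    """Recursively remove ordering prefixes from dictionary keys (hypothetical helper)."""
    if isinstance(node, dict):
        return {_PREFIX.sub("", key): strip_prefixes(value) for key, value in node.items()}
    if isinstance(node, list):
        return [strip_prefixes(item) for item in node]
    return node

# Made-up draft fragment; only "02_pubkeys" appears in the text above.
draft = {"01_category": "physical goods", "02_pubkeys": {"bitcoin": "<btc key>", "pgp": "<pgp key>"}}
clean = strip_prefixes(draft)
```

With this kind of normalization the numbers could be renumbered freely without breaking the code that reads the contract.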

Category

Can category contain a list of category information including sub categories? Transportation -> Cars -> Cabriolet or whatever, using some separator. Or does it need to be one 'non-hierarchical' string? How will that work for UIs? I know there's category_sub, but it clearly has a different meaning.

 - This one got answered, there will be mapping to numbered category lists with hierarchy.

Consistency

If other tags got a prefix number, why don't the 'inner layers' have one? Like in 02_pubkeys there's Bitcoin and PGP, but those do not have a prefix number? I'm a tech nerd, I hate things which don't seem to be consistent. Consistency is the key for 100% automated processing, any exceptions to 'generality' will suck.

 - Like I already mentioned, numbers got ditched.

Extensions

Official and non-official extensions? An official way to separate those? Standards like SMTP and HTTP/2 (and many others) contain a clear definition of how you should deal with such cases. Of course some 'non-official' extensions are pretty standard too, like some X- headers with SMTP. I would like to see ways to do both, official and non-official extensions. Non-official extensions would be included in the contract but ignored by clients which do not understand what those mean. Like I said, I've seen so many extendable formats which get totally broken if you go and extend those. I would hate to see it happening again.

 - No official comments to this, so far, afaik.
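What I mean by "ignored by clients which do not understand those" is roughly the following. This is only a sketch: the set of known keys and the x_gift_wrap extension key are made up by me, nothing like that is in the contract template.

```python
import json

# Hypothetical set of keys this client version understands.
KNOWN_KEYS = {"category", "keywords", "price"}

def split_contract(raw):
    """Return (understood, unknown) parts of a contract; nothing is thrown away."""
    contract = json.loads(raw)
    understood = {k: v for k, v in contract.items() if k in KNOWN_KEYS}
    unknown = {k: v for k, v in contract.items() if k not in KNOWN_KEYS}
    return understood, unknown

# "x_gift_wrap" is an invented non-official extension key.
raw = '{"category": "books", "price": 1, "x_gift_wrap": true}'
understood, unknown = split_contract(raw)
```

The client processes only the understood part, but carries the unknown part along untouched, so a contract with extensions doesn't break older clients.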

Currency units

I really would like to see a possibility for fiat currencies, or let's say "other than Bitcoin" currencies. Are those mutually exclusive? | or & ? Based on rest of contract structure I assume it's | aka or.

 - Alternate currencies are available as nomination, but not as payment method. It's still fixed to Bitcoin.

Images

I assume that hashes and URLs both need to be filled. Of course JSON is very flexible, but I would personally prefer lists in lists or dictionaries in a list over two separate "parallel" lists. I know that zipping two lists works, but it still can be a pain for someone to parse that stuff. So [[url,hash],...] or [{'url':'','hash':'',...}]. Of course this is just my humble opinion based on long experience.
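To show the difference between the two layouts, here's a small sketch. The URLs and hash strings are placeholders, not real values from any contract.

```python
# Placeholder data for illustration only.
urls = ["http://example.com/a.png", "http://example.com/b.png"]
hashes = ["hash-of-a", "hash-of-b"]

# Parallel lists: the reader has to check the lengths and pair the entries up.
if len(urls) != len(hashes):
    raise ValueError("url and hash lists differ in length")
paired = list(zip(urls, hashes))

# List of dictionaries: every image is self-contained and can carry extra keys later.
images = [{"url": url, "hash": digest} for url, digest in paired]
```

With the list of dictionaries there is no way for the two pieces of information to get out of sync, and adding a third attribute per image doesn't require a third parallel list.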

No more contract encoded or via OB network delivered images? Maybe there should be some URL format for 'via' OB network images (no HTTP), which could be used here?

 - If I understand the situation right, images will be delivered via OB in future.

Keywords

Is there a maximum number of keywords? Otherwise I could form a set("wikipedia_text_dump".split()) to get a few hits for my contract as well as flooding DHT with absolutely tons of keywords.

 - If I understand current situation right, there's now limit of 5 keywords / item.
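A sketch of the flooding trick and of a simple cap check. The limit of 5 is the one mentioned above. Where and how OB actually enforces it I don't know, so the function below is just my assumption of what a receiving node could do.

```python
MAX_KEYWORDS = 5  # limit mentioned above; the enforcement point is an assumption

def accept_keywords(keywords):
    """Deduplicate case-insensitively (keeping order) and reject oversized lists."""
    unique = list(dict.fromkeys(k.lower() for k in keywords))
    if len(unique) > MAX_KEYWORDS:
        raise ValueError(f"too many keywords: {len(unique)} > {MAX_KEYWORDS}")
    return unique

# The flooding trick: every distinct word of some large text becomes a keyword.
flood = set("the quick brown fox jumps over the lazy dog".split())
```

Calling accept_keywords(sorted(flood)) raises ValueError, because even this short sentence already produces more than five distinct words.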

Order Processing time

Whenever there are values or fields, I would prefer to see a unit, at least in the documentation. Or is it truly an open text field? When standardizing something I would prefer uniform presentation with units. If it's 'open text' it can't be parsed reliably by software. I suggest using a unit like days, or else keeping it as open text, so the user can write that stuff will be shipped every Monday or something like that.

 - Now units are always in days.

Passcard

We've talked about so many different technologies. Does the Passcard need to be there? In general, I like this kind of thing kept open. Like a dictionary where there is a key and a value, and the key can be Passcard. Or just two keys, like method and value etc.

 - Now Passcard is called Blockchain ID and can be used to map unique username and subdomain to OpenBazaar GUID.

Bitcoin

Bitcoin is there in several places. If the value is nominated in fiat, does it mean that the payment can still be made only using Bitcoin? If not, there should be a place for an 'alternate payment' channel, which should also be encrypted similarly to the shipping address. (Whatever the method will be.)

 - Like I said, it's only Bitcoin so far if I got it right.

Shipping address nonce hash

What's the use of sha256(nonce)? Just to verify that the nonce has been correctly delivered if it needs to be delivered to the moderator? I don't see any other use for it. Also applies to other places where similar construction is used.

 - Yep, it's for confirmation purposes only. It's easy to confirm against that hash if the nonce given is one used with contract.
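The confirmation step is trivial to do. A minimal sketch, assuming the contract carries the hex digest of sha256(nonce) (the exact encoding used in the contract is my assumption):

```python
import hashlib
import hmac
import secrets

def nonce_commitment(nonce: bytes) -> str:
    """Hash that would be stored in the contract (hex encoding is an assumption)."""
    return hashlib.sha256(nonce).hexdigest()

def nonce_matches(nonce: bytes, committed_hex: str) -> bool:
    """Check that a nonce handed over later is the one used with the contract."""
    return hmac.compare_digest(nonce_commitment(nonce), committed_hex)

nonce = secrets.token_bytes(32)      # random nonce chosen when the contract is made
committed = nonce_commitment(nonce)  # value visible in the contract
```

The moderator (or anyone else receiving the nonce) just calls nonce_matches with the value from the contract. A wrong nonce fails the check, and the hash itself doesn't reveal the nonce.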

Content source URL & Password - No separate nonce?!?

Now we're XOR encrypting TWO separate strings using the SAME nonce. Fail, that's a guaranteed fail. I could easily assume that the URL begins with http:// or https:// (or upper case), and that gives me the first 7-8 characters of the nonce straight away, allowing me to extract the beginning of the URL related password. Knowing how bad passwords people often use, that's a clear and straight FAIL. Of course one nonce field could be used in theory, if it's as long as those two together and split somehow using some length information or so. Well, this is the place where the theory of OTP being either perfect or a total failure becomes immediately true. Classic key reuse mega fail.

 - Well, you've guessed it right. It got fixed asap.
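For anyone who doesn't see why the shared nonce was a problem, here's a small demonstration of the known-plaintext trick described above. The URL and the password are made up, and the XOR helper is just my own minimal version, not OB code.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings; the output is as long as the shorter input."""
    return bytes(d ^ k for d, k in zip(data, key))

# Made-up secrets, both encrypted with the SAME nonce (the original mistake).
url = b"https://example.com/file.zip"
password = b"hunter2-secret"
nonce = secrets.token_bytes(32)
enc_url = xor_bytes(url, nonce)
enc_password = xor_bytes(password, nonce)

# Attacker side: only enc_url and enc_password are known, plus a guess.
guess = b"https://"
recovered_key = xor_bytes(enc_url, guess)        # first 8 bytes of the nonce
leaked = xor_bytes(enc_password, recovered_key)  # first 8 bytes of the password
```

After this, leaked contains b"hunter2-", i.e. the first eight characters of the password, without the attacker ever knowing the nonce. With a separate nonce for each string the guess reveals nothing about the password.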

Rating

I see strings there. I assume those could and should be mostly int values (?), except the review. I know open discussion is going on, but if there is anything other than pre-agreed values, it totally foils automated processing.

I think the quality was 'Gggrr8e8ttt!' and delivery time was "I'mmmm just soo happyyy!! - Love u!". Yep, that's it, no other questions or comments so far. I'm sorry, everyone, if I'm being so darn critical and maybe a bit sarcastic, but that's just how I am. Things are either done right, or not.

 - Now ratings are integers and reviews will be character limited. So things are getting so g0000d! ;)
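With integer ratings the validation becomes a few lines. In this sketch the field names, the 1 to 5 range and the 500 character review limit are all my own assumptions, since I don't know the final values.

```python
# All of these names and limits are assumptions for illustration.
RATING_FIELDS = ("quality", "delivery_time")
RATING_RANGE = range(1, 6)   # 1..5
MAX_REVIEW_CHARS = 500

def validate_rating(rating: dict) -> dict:
    """Raise ValueError unless ratings are in-range integers and the review is short."""
    for field in RATING_FIELDS:
        value = rating.get(field)
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value not in RATING_RANGE:
            raise ValueError(f"{field} must be an integer from 1 to 5, got {value!r}")
    if len(rating.get("review", "")) > MAX_REVIEW_CHARS:
        raise ValueError("review is too long")
    return rating

ok = validate_rating({"quality": 5, "delivery_time": 4, "review": "Fast shipping."})
```

A 'Gggrr8e8ttt!' quality value gets rejected with ValueError instead of ending up in the statistics.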

Schema rigidity / structure / flexibility / documentation

The only thing which is really important after all is the clarity of documentation, whatever style, format or structure is being used for contracts. I often tend to say that I do not really care about the format. I just need to know exactly what it is and I can deal with it. Yet, it's much easier to deal with a somewhat sane format than with some more or less insane one. If some field is directly related to another field and provides only redundant data, it shouldn't exist at all. I know that there are cases which counter this style, and there are also several other views on how to do it and why. But the reason why I prefer it is flexibility and simplicity when using modern tools.

If a contract has the key auction, does it require another field saying that the contract sub type is auction? If it's not an auction, then the absence of the auction key should already indicate it. But this is just my personal opinion. The target is to make things as flexible and dynamic as possible. Rigid schema guys (usually enterprise XML / XSD, Java, C++ etc.) will hate me for such thinking, like I said about the standards board I've dealt with. Redundant information is ok if it's in completely different sections of the contract which are being handled by completely different subsystems. Yet even then it's a good question whether the redundancy should be in the document or whether it should be 'injected' at the point when data is split for the subsystems. Where should the processing logic be? These are really great questions when things start becoming more complex. Yet if there is a library and API, it's often good to implement as much logic as possible in it and try to keep the rest of message handling just as passing of bytes with minimal routing information. (Example: EDI envelope.) I wrote this stuff at about 4:50 am, so it might be a bit repetitive and isn't properly formatted. But I hope it brings the point out somehow sensibly.

Field usage and relations in details in documentation, integrations

This is the most common issue with data structure documentation. I can see the structure, but I don't know how the fields are related to each other. Also, the content of some fields can set requirements for other fields, requiring them to have some value or making them unnecessary. This should be very clearly marked, for example: these fields are only used when sub type is auction.

I've already had this exact problem when parsing the current JSON messages sent via WS. Yet this is a very common problem in industry and documentation. The format is described in detail, but the values, meanings and relations between fields and different tables aren't.

It's so common that it's more like business as usual. Especially finding out the details of the relations and meanings without a working test environment and the customer's dataset can require never-ending test and failure iterations. If iterations are slow, things can easily end up taking months or even years. I guess we've all been there with larger integration projects.

More semi-random comments & thoughts


Category: Google's Product and Services taxonomy. That's a great choice for a preexisting extensive set!

Extensions: Using an alternate obcv version wasn't my original goal. A contract is crafted according to 2.0 etc., but it could contain additional tags, which will be ignored by clients which do not understand those. But if there's a 'rigid strong schema', I understand that this might be an unwanted feature. How about then adding an optional key like 'ext' which would allow 'any additional data' to be carried with the contract? Or is the plan that every system requiring changes to the standard schema will use its own, completely separate schema?

Currency / Bitcoin: Why two fields, if there's only one active price? If price is in BTC why not just use corresponding currency code?

Images: Very happy, that's the way to go. Fully utilize what is already done and doesn't require even additional external resources and dependencies.

Keywords: My personal opinion is that 5-10 should be enough(?). It's enough to clearly tell what it is about, but doesn't allow extensive keyword spamming.

Shipping address nonce hash: Very valid point, yes it's important to verify that nonce is the right one.

After the revisions already made, only the extensions and currency questions are open.

Distributed network thinking: full mobile peers (#1), mobile peers using a relay to access the network (#2), or a mobile thin client where basically everything is hosted on a server (#3). All of these models have their own pros and cons, which have to be carefully thought through and considered.

Some random OpenBazaar DHT related ideas and views


Sorry, this text could seem to be out of context, because I can only quote my own text from private discussions.

My point was that a store / market doesn't need to be the same thing as a node. One node should be able to run multiple stores / markets, afaik. Hence multi-tenancy.

The reason for DHT being 'unreliable' is usually that DHT data is rarely stored persistently, and network node churn will get rid of data even if its TTL doesn't expire. If DHT data is persisted for an extended TTL, then there needs to be republishing to replace the replicas lost due to node churn. Even that doesn't make it reliable, but it makes it much less likely that there aren't any copies around.

This method allows network to survive with 'reasonable' churn and retaining 'acceptable' reliability. Yeah, I know, I used weasel words on purpose. I don't have any facts. It's just like any other storage system which should retain N+1 replicas, if replicas are lost then you'll need to re-replicate the data to maintain minimum number of copies. If that fails, well, then it fails and data is lost at least until some peers might come back with the data.

Technically all DHT does in such a case is 'distribute data blocks randomly across nodes, because the data address is based on a hash', with replicas usually on nodes with neighbouring IDs, which is usually also random in real world terms.

Some networks do use supernodes, but I personally don't like that idea too much. Yet especially when bootstrapping a network, it's good that there's some kind of Node Reliability Indicator value which lets the clients know whether the node should be expected to be on-line or not after some downtime.
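The hash based placement and the republishing described above can be sketched in a few lines. This is only an illustration under my own assumptions: SHA-256 for IDs, XOR distance for 'neighbouring' (Kademlia style), and three replicas. I'm not claiming this is how the OB DHT does it.

```python
import hashlib

REPLICAS = 3  # hypothetical replication factor

def dht_id(data: bytes) -> int:
    """Map a node name or a data block to a position in the ID space."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def closest_nodes(target: int, node_ids, count=REPLICAS):
    """Nodes whose IDs are nearest to the target, using XOR distance."""
    return sorted(node_ids, key=lambda node: node ^ target)[:count]

def republish(target: int, holders: set, live_nodes: set, count=REPLICAS) -> set:
    """Return the nodes that should hold the data after one republish round."""
    if not holders & live_nodes:
        return set()  # every copy is gone; data is lost until a holder comes back
    return set(closest_nodes(target, live_nodes, count))

nodes = {dht_id(name) for name in (b"node-a", b"node-b", b"node-c", b"node-d", b"node-e", b"node-f")}
key = dht_id(b"some listing")
holders = set(closest_nodes(key, nodes))
```

When one holder drops out, republish picks the next closest live node as a replacement, so the number of copies stays at three as long as at least one copy survives the churn.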

Afaik, this is best approach if we exclude the other options:
A) Official paid central storage.
B) Nodes and moderators hosting their own content and being unavailable when nodes are unavailable.

XXX - I liked your model. I guess you've been studying email (SMTP / POP / IMAP / WebMail) or NNTP. Smile. I also loved the model the way XXX wrote it. That's exactly the model which makes me love Freenet: it replicates and generates copies of data much faster than BitTorrent or similar P2P networks, because any node handling the data will cache it. So sources aren't limited to seeds and peers of that particular swarm alone. Yet with network churn, and depending on availability and TTL, this could cause a situation where it's hard to find a working source.

The Auction Haggling is also related to this. Afaik, I haven't seen it documented how you can delegate handling of an auction to another node. This is similar: the contract is created by some other entity than the one which is hosting it. Basically it's the same question. Except with auction haggling it's a bit more complex, because when bids are received the hosting node would need to update the contract, and that can't happen if it's signed by the original creator's private key. I've asked this earlier, but so far I haven't seen any solution. All of these scenarios are indirectly linked to multi-tenancy. Of course there can be different kinds of multi-tenancy: one where the contracts are just hosted by the node, and another where 'full stores' are hosted by the hosting company. In the latter case, just like with on-line Bitcoin wallets, the hosting authority also has full access to the private keys.

XXX - That option to include list of listings is nice. Yet, I would prefer node information instead of IP address. IP addresses are and will be very volatile in future. So IP address can change very often, yet the node using different IP will remain the same.

Separating server / core-node code from client code would potentially allow making a JavaScript-only OB client, which would then use a standardized 'proxy' to talk with the OB network. These kinds of solutions have been seen with all kinds of tech, like the good old NNTP, IRC, Email and so on. Very old model. It's also very cool for mobile clients, which generally don't like full P2P networks. Who runs a full Bitcoin client on mobile?

I've also talked with people about using WebRTC which would allow direct JS / HTML5 browser 2 browser communication. Using it for globally distributed CDN networks, gaming, VOIP and so on.

XXX - Don't talk about IPs, it's a very bad way in general to refer to a P2P node's identity on the Internet. Some other identifier like a node ID should be used, which can then be mapped using DHT or other methods to an IP address and port. But the IP address isn't a node's identity.

XXX - I really love your thoughts and that abstract thinking.

XXX - I was referring to loading of information when browsing store / items. It doesn't matter if it's standalone app or a website. It still can be slow to browse. Most annoying example of this is image viewer apps which do not pre-decode the next image. With high resolution art decoding of next image could take 250+ ms which is enraging if you just want to pretty smoothly scan through the images.

XXX - That DHT data storage with automated replication would solve DDoS problem, because you would need to DDoS the whole network to bring it down, at least in theory. If some nodes go off-line or are inaccessible, data should be just replicated to more working nodes if the re-publishing / automatic 3rd party replication is implemented.

XXX - DDoS / Flooding / SPAM on DHT is quite hard to manage. Those good old times, when playing with the ED2K network. Flooding with fake peers and making all P2P network nodes connect to an IRC server to bring it down etc. So much fun. In the early days many protection systems were next to nonexistent. Very simple tricks to trigger DDoS worked very nicely.

For the users who are looking for #3, I can assume that someone will be providing that service anyway. If it won't be directly available, someone will implement it sooner or later.
#1 won't work well, because so many users already have limited Internet connectivity on mobile, which leads to #2 indirectly anyway. Being a full peer also demands much more bandwidth, storage, CPU resources and so on. This is one of the reasons why distributed isn't cool on mobile in practice, even if it might at first sound cool to nerds. Skype was like #1 for the most part, but now they've reverted all the way back to #3 mode. From the buyer's side #2 is the only option, because in that case the gateway being used can change at any time. So the gateway must just work as a 'passive relay'. IMHO, AFAIK.

Network churn isn't (directly) the main reason for battery drain. It's the generic network coordination traffic which causes the battery drain, because full nodes are involved in many kinds of coordination traffic as well as working as DHT nodes and so on. Network churn is just a small part of that, yet it's the reason why the coordination traffic has to be quite 'active' and peers have to be checked for being alive more often. Of course it can be optimized, so there's no need to ping if there's other traffic. Peer pings need to be generated as keep-alive only if there hasn't been any other traffic.
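The keep-alive optimization mentioned above is simple to express. A minimal sketch, where the 30 second interval and the class itself are my own assumptions (the optional now parameter is there only to make the logic easy to test):

```python
import time

KEEPALIVE_INTERVAL = 30.0  # seconds; hypothetical value

class Peer:
    """Tracks the last traffic seen with one peer (illustrative only)."""

    def __init__(self, now=None):
        self.last_traffic = time.monotonic() if now is None else now

    def note_traffic(self, now=None):
        """Call on ANY message to or from the peer, not just on pings."""
        self.last_traffic = time.monotonic() if now is None else now

    def needs_ping(self, now=None):
        """A ping is needed only if the link has been silent for the whole interval."""
        now = time.monotonic() if now is None else now
        return now - self.last_traffic >= KEEPALIVE_INTERVAL

peer = Peer(now=0.0)
```

As long as normal traffic keeps calling note_traffic, needs_ping stays False and no extra packets (or radio wake-ups on mobile) are generated.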

Karma points? Without any references? Can I launch instant karma site? It would auto generate a new store every 15 seconds, give all my subscriber clients a karma point, rinse and repeat. 10€ / month to receive ~173k karma / month? I've got a long history and plenty of experience in different spamming and DoSing networks and playing with systems utilizing different attack / spoof techniques and this one seems really easy to make. If that works out well I can ramp it up to 10 - 100x rate using parallel instances. Fully automated. Very classic problem with many trust systems. Simply a massive automated Sybil attack.
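The ~173k figure above comes from simple arithmetic, assuming a 30 day month and one karma point per subscriber for each generated store:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30   # assumption: a 30 day month
STORE_INTERVAL = 15   # seconds between auto-generated stores

# One karma point per subscriber per generated store.
karma_per_month = SECONDS_PER_DAY * DAYS_PER_MONTH // STORE_INTERVAL

# The same rate with 10 and 100 parallel instances.
karma_10x = karma_per_month * 10
karma_100x = karma_per_month * 100
```

That gives 172,800 karma points per month for a single instance, 1,728,000 with 10 parallel instances and 17,280,000 with 100.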

There was some deep discussion about considerations for different distributed user trust systems here, but because I referenced private documentation / talks so much, it wouldn't make any sense. Removed.

I also studied OpenBazaar Product Specification in detail. Unfortunately no public comments, because the document isn't public yet.

Btw... OB team, I would also love to see IPv6 networking.