32c3 comments, random ramblings, thoughts, notes, dump part VII

posted Feb 21, 2016, 6:48 AM by Sami Lehtinen   [ updated Feb 21, 2016, 6:49 AM ]
VOID LINUX. fail0verflow on PS4.
IPFS, The Permanent Web. Decentralized (federated model) / distributed. Yet I personally wonder why so many charts depict a distributed network as a MESH network. A direct peer-to-peer network, where every node is connected directly to every other node, isn't actually a mesh network. Of course this is partly about the semantics of how the diagram is presented, but many people don't seem to get the difference, even though it's essential. Basically, a P2P network with advanced internal routing and a limited number of connections per node forms a distributed mesh network, just like Bitmessage does. Terms: the distributed, permanent, Merkle web. Safe, faster, offline. Location addressing vs. content addressing.
Hacker spaces. Wikidata, a machine-readable Wikipedia. Genesis.re, an operating system for a new world. Business and entrepreneurship.
Encrypted walkie-talkie communication with strong encryption. An implementation that needs no external infrastructure. Real-time communication. Group communication. Passive listening without transmitting, so your location isn't revealed. Push to talk. Tech: Codec2, ChaCha20/Poly1305 (RFC 7539), GMSK modulation.
The possibility of an army. DullTech is good: it just works and is dull.
Code Stylometry & Programmer De-Anonymization by Princeton University. Alternative non-stylometric detection methods. Stylometry and machine learning continued. More privacy concerns for programmers. Stylistic fingerprints: style expressed in code can be quantified and characterized. Supervised stylometry allows classifying documents of unknown authorship based on documents of known authorship. Text alone can identify you; no names or other explicit references are required. De-Anonymizing Programmers via Code Stylometry, USENIX Security Symposium 2015. Source code stylometry: everyone learns to code on an individual basis and as a result codes in a unique style, which makes de-anonymization possible.
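Back to the mesh network point above: a minimal Python sketch of why the distinction matters. A fully connected network needs n(n-1)/2 links, while a limited-connection network keeps each node's link count small and relies on routing (relaying) to reach distant nodes. The ring-lattice topology and all function names here are my own hypothetical choices for illustration; this is not how Bitmessage or IPFS actually pick their peers.

```python
from collections import deque


def full_mesh_links(n):
    """Links needed when every node connects directly to every other node."""
    return n * (n - 1) // 2


def ring_lattice(n, k):
    """Hypothetical limited-connection topology: each node links to its k
    nearest neighbours on each side of a ring, so every node has degree 2k."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


def link_count(graph):
    """Each undirected link appears in two adjacency sets."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


def hops(graph, src, dst):
    """Breadth-first search: minimum number of hops from src to dst (-1 if unreachable)."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1
```

With 100 nodes, `full_mesh_links(100)` is 4950 links, whereas `ring_lattice(100, 2)` has only 200 links in total, but reaching the opposite side of the ring (`hops(graph, 0, 50)`) takes 25 hops. Fewer connections cost more hops, which is why routing is what turns a limited-connection P2P network into a usable mesh.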
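On location addressing vs. content addressing: with location addressing you ask a specific server for a path, while with content addressing the address is derived from the data itself, so any peer can serve it and the receiver can verify it. Here is a minimal Python sketch using plain SHA-256 and a simple binary Merkle tree. It only illustrates the idea; IPFS itself uses its own content identifier (CID) and Merkle DAG formats, which this code does not reproduce, and the `store`, `put`, `get` and `merkle_root` names are hypothetical.

```python
import hashlib

# Hypothetical in-memory store: content address -> bytes.
store = {}


def content_address(data):
    """The address is derived from the content itself (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


def put(data):
    addr = content_address(data)
    store[addr] = data
    return addr


def get(addr):
    data = store[addr]
    # Whoever served the data, it can be verified: it must hash to its address.
    if content_address(data) != addr:
        raise ValueError("content does not match its address")
    return data


def merkle_root(chunks):
    """Hash each chunk, then hash pairs of child hashes level by level.
    An odd hash at the end of a level is carried up unchanged."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                next_level.append(pair[0])
        level = next_level
    return level[0].hex()
```

Changing any single chunk changes the root, so one short hash is enough to verify a large file split into many chunks, which is the property the Merkle web idea builds on.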
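The walkie-talkie project lists ChaCha20/Poly1305 from RFC 7539. The core of ChaCha20 is the "quarter round", which uses only 32-bit addition, XOR and rotation. Below is a minimal Python sketch of just that building block, checked against the quarter-round test vector in RFC 7539 section 2.1.1. It is not the full cipher, and I don't know how the project itself implements it; for real use, take a vetted AEAD implementation, such as `ChaCha20Poly1305` in the Python `cryptography` package, rather than hand-rolled code.

```python
MASK32 = 0xFFFFFFFF  # keep every result within 32 bits


def rotl32(value, shift):
    """Rotate a 32-bit word left by `shift` bits."""
    return ((value << shift) & MASK32) | (value >> (32 - shift))


def quarter_round(a, b, c, d):
    """ChaCha quarter round (RFC 7539, section 2.1): add, XOR, rotate."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d
```

The full ChaCha20 block function applies this alternately to the columns and diagonals of a 4x4 matrix of 32-bit words, 20 rounds in total, and Poly1305 then authenticates the ciphertext.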
Programmer style changes while implementing sophisticated functionality. Differences in the coding styles of programmers with different skill sets. Identifying malicious programmers. There are several scenarios where the question "who wrote this code?" is interesting. For example, someone wrote a library containing malicious code, and they want to locate the adversary. Unfortunately, there's no technical difference between security-enhancing and privacy-infringing technology. Lexical, layout and syntactic analysis, using different machine learning and classification methods such as case-based reasoning, nearest neighbor, C4.5 decision trees, genetic algorithms and random forests. So there are really many ways to get the analysis done. Naturally this can be done for any language, like C++, Java, Python, etc. Workflow: get the code, preprocess it, parse it with a fuzzy abstract syntax tree (AST) parser, extract features, then classify with a random forest and majority vote. This allows fully automated analysis based on the layout, lexical and syntactic features of the code. A random forest avoids over-fitting and is by nature an efficient multi-class classifier. The results are checked with k-fold cross-validation, and the method can also be validated on a different dataset. Authorship attribution: who is this 'anonymous' programmer? 94% accuracy. How do you obfuscate code and become unrecognizable? Common off-the-shelf source code obfuscators still leave identifying information in the code, for example STUNNIX for C++ or TIGRESS for C. Coding style is preserved to some degree over the years, so code you wrote years ago can identify you in the future. A program's syntactic features alone reveal a lot about the programmer. Machine learning can be used for mass analysis and for reducing the suspect set, even if it isn't totally accurate. Obfuscation is not the answer or solution to this problem. Stylometry of executable binaries is an interesting question. A compiler transforms code much like an obfuscator does, but the binary still reveals a lot about the original source code.
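The three feature families (layout, lexical, syntactic) can be sketched with Python's standard library. This is a toy under stated assumptions: it analyzes Python source with the built-in `tokenize` and `ast` modules, whereas the work described above targeted languages like C++ using a fuzzy AST parser, and the specific features below (indentation style, identifier length, AST node type counts, maximum AST depth) are my own illustrative picks.

```python
import ast
import io
import keyword
import tokenize
from collections import Counter


def layout_features(src):
    """Layout: whitespace and line-length habits."""
    lines = src.splitlines() or [""]
    indented = [ln for ln in lines if ln[:1] in (" ", "\t")]
    tab_lines = sum(1 for ln in indented if ln.startswith("\t"))
    return {
        "avg_line_len": sum(len(ln) for ln in lines) / len(lines),
        "blank_line_ratio": sum(1 for ln in lines if not ln.strip()) / len(lines),
        "tab_indent_ratio": tab_lines / len(indented) if indented else 0.0,
    }


def lexical_features(src):
    """Lexical: token-level choices such as keywords, comments and identifier length."""
    tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
    names = [t.string for t in tokens if t.type == tokenize.NAME]
    keywords = [n for n in names if keyword.iskeyword(n)]
    identifiers = [n for n in names if not keyword.iskeyword(n)]
    comments = [t for t in tokens if t.type == tokenize.COMMENT]
    total = max(len(tokens), 1)
    return {
        "keyword_ratio": len(keywords) / total,
        "comment_ratio": len(comments) / total,
        "avg_identifier_len": (
            sum(len(n) for n in identifiers) / len(identifiers) if identifiers else 0.0
        ),
    }


def syntactic_features(src):
    """Syntactic: AST node type counts and the maximum depth of the tree."""
    tree = ast.parse(src)
    node_types = Counter(type(node).__name__ for node in ast.walk(tree))

    def depth(node):
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(child) for child in children), default=0)

    return node_types, depth(tree)
```

Concatenating these numbers (with node-type counts normalized by file size) into one vector per file gives the kind of input a classifier such as a random forest expects.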
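The classification step can be sketched as well. The talk used a random forest; to keep this dependency-free I've swapped in k-nearest neighbours (nearest neighbour was among the listed methods) with a majority vote among the neighbours, evaluated with k-fold cross-validation. The function names are hypothetical, and the toy data in the usage note is made up, not results from the paper.

```python
import math
from collections import Counter


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_predict(train, x, n_neighbors=3):
    """train: list of (feature_vector, author) pairs.
    Majority vote among the n_neighbors closest samples; on a tie,
    Counter.most_common prefers the author encountered first (the closer one)."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], x))[:n_neighbors]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(samples, n_folds=3, n_neighbors=3):
    """Round-robin k-fold split: every sample is tested exactly once,
    using a model built from the other folds. Returns accuracy in [0, 1]."""
    correct = 0
    for fold in range(n_folds):
        test = [s for i, s in enumerate(samples) if i % n_folds == fold]
        train = [s for i, s in enumerate(samples) if i % n_folds != fold]
        correct += sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
    return correct / len(samples)
```

For example, with two well-separated toy authors, `samples = [((0, 0), "alice"), ((10, 10), "bob"), ((0, 1), "alice"), ((10, 11), "bob"), ((1, 0), "alice"), ((11, 10), "bob")]`, `cross_validate(samples, n_folds=3, n_neighbors=3)` returns 1.0. Real stylometric data is far noisier, which is where an ensemble like a random forest earns its keep.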
Basically, coding style survives compilation, which makes de-anonymizing programmers from executable binaries possible. Binary extraction, disassembly, control flow graph extraction and instruction feature extraction. Building the dataset from stylistic features selected using information gain. Filtering via support vector machine. Control-flow graph (CFG) features. Top-n relaxed classification: analysis of the relaxation factor vs. correct classification accuracy. Reconstructing the original source features, using cosine similarity between the original and reconstructed feature vectors. More advanced programmers are easier to de-anonymize than beginners. calaylin, jstylo, anonymouth. It's good to remember that Tor or other anonymity tools won't make your writing or other content anonymous. For example, a single photo can reveal an extensive amount of information. These analytic methods can also be used for malware family classification. Future work also included de-anonymizing collaboratively written binaries.
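Information gain, used above to choose which stylistic features make it into the dataset, measures how much knowing a feature's value reduces uncertainty about the author: IG(Y; F) = H(Y) - sum over f of P(F = f) * H(Y | F = f). A minimal Python sketch, assuming the features have already been discretized (real stylometric features are often continuous and would need binning first); `select_features` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; F) = H(Y) - sum_f P(F=f) * H(Y | F=f) for a discrete feature F."""
    groups = defaultdict(list)
    for f, y in zip(feature_values, labels):
        groups[f].append(y)
    n = len(labels)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, top_k):
    """columns: feature name -> list of values, one per sample.
    Keep the names of the top_k features with the highest information gain."""
    ranked = sorted(columns, key=lambda name: information_gain(columns[name], labels), reverse=True)
    return ranked[:top_k]
```

With labels `["a", "a", "b", "b"]`, a feature `[1, 1, 0, 0]` that perfectly separates the two authors has an information gain of 1.0 bit, while `[1, 0, 1, 0]` has 0.0 and would be dropped.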
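Cosine similarity compares the direction of two feature vectors while ignoring their length: 1.0 means the reconstructed features point the same way as the originals, and 0.0 means they have nothing in common. A minimal Python sketch of the measure itself; how the original and reconstructed vectors are produced is outside its scope.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (|u| * |v|) for two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Because only direction matters, a reconstructed vector that is a scaled copy of the original, such as `[2, 4, 6]` vs. `[1, 2, 3]`, still scores 1.0.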
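Top-n relaxed classification counts an attribution as correct when the true author is among the n highest-ranked candidates, with n as the relaxation factor. This fits the earlier point that machine learning can narrow a suspect set even when its top guess isn't accurate. A minimal Python sketch, assuming the classifier outputs one score per candidate author; the function name and the scores in the usage note are invented.

```python
def top_n_accuracy(score_rows, true_authors, n):
    """score_rows: one dict per sample mapping candidate author -> classifier score.
    A sample counts as correct if its true author is among the n top-scoring candidates."""
    hits = 0
    for scores, truth in zip(score_rows, true_authors):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:n]:
            hits += 1
    return hits / len(true_authors)
```

With `rows = [{"alice": 0.6, "bob": 0.3, "carol": 0.1}, {"alice": 0.5, "bob": 0.4, "carol": 0.1}, {"alice": 0.2, "bob": 0.3, "carol": 0.5}]` and true authors `["alice", "bob", "alice"]`, accuracy is 1/3 at n = 1, 2/3 at n = 2 and 1.0 at n = 3: relaxing n trades precision for a smaller, but still useful, suspect list.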