Blog

My personal blog is about stuff I do, like and dislike. If you have any questions, feel free to contact me. My views and opinions are my own personal thoughts and do not represent my employer or any other organization.


AI Boom Bust, Social Networking, My passion, Baltic security review

posted Apr 23, 2017, 10:04 AM by Sami Lehtinen   [ updated Apr 23, 2017, 10:05 AM ]

  • This AI Boom Will Also Bust - This article is pretty much along the same lines as my own thinking. Many organizations have a lot of data which isn't being utilized in any way. A lot more data could be gathered very easily, if required. Using state of the art machine learning on that data would be computationally too intensive and wouldn't produce worthwhile results. The best bang for the buck would be using some very simple charts, KPIs and linear regression on that data. Aka basic traditional BI stuff, or Data Science if done in a bit more advanced form. That would reveal where there are inefficiencies and which units / items / personnel etc. are doing better than others, and help the organization find out why. Thinking that you'll just forward data from your "data lake" to an "AI" which would then run your business more efficiently is pretty much fantasy, at least so far. How about replacing the investors on Shark Tank / Dragons' Den with AIs, wouldn't that sound like a great plan? At least the AI could properly analyze the business background and wouldn't be as "limited" in available resources as the busy investors currently are. Wouldn't an advanced AI automatically clean and merge the data, as well as figure out what really matters in that data? That's the dream of AI. But as this article makes clear, it's not the truth yet. So-called AI is often just a (at times very complex) statistical model, nothing more, currently. That's also why some of the simpler models can produce similar or better results with much less computational resources. This is also one of the primary reasons why I haven't yet applied real "machine learning" to anything. Most commonly I use simple statistical models like Bayesian Filtering or Collaborative Filtering. Even those simple methods can require a lot of computation resources on larger data sets, requiring multiple optimizations to make the actual processing simpler and several shortcuts to reduce the amount of data being processed.
Of course there are cases where accurate prediction does make all the difference, but as usual, I like balancing resource consumption with required output. Shouldn't a smart AI (or Strong AGI) do that completely automatically? These are the things that matter and they matter this much, and this is how we efficiently allocate resources to produce the best results from this data. This data is ignored, this data is lightly analyzed to see if it raises any alarms, and this data is the key data which will be very carefully studied and analyzed using the most advanced methods. You don't need a great AGI to make business predictions for the board, it should be able to replace the board. ;) - But as said, this is something which will probably be overestimated in the short term and underestimated in the long term. There are real benefits coming and AI research is progressing. But you can't yet replace a full business department with a rack of servers, at least in many business sectors. Things like AlphaGo and Watson are going to change things in the future, but those are still limited and aren't running in your pocket. kw: deep learning, Monte Carlo method, inefficiency, inefficient
  • Social Networking: Show Passion, Show Expertise, Signal Flexibility, Ask for Advice -> Good!
  • My passion is to make reliable, efficient software without wasting resources and to execute projects which truly benefit the end customer. Without hype or false expectations. Sometimes taking perspective can be seen as a negative thing, and I'm taking a lot of perspective in this blog. But I think some of the perspectives also just reflect my experience and views, boosting my expertise. And sometimes I'm just plain wrong, but that's just very human. I might have some bad background data or experience, but if something is wrong, I'm glad to review my opinions when new data or perspectives are brought to light / to my knowledge. So as said, my opinion might be a strong A, but in light of certain circumstances it's totally natural to choose option B. Everything in this blog is mostly generalized, so it doesn't reflect an "absolute opinion" by any means. - Thanks for the feedback.
  • Read dozens of pages of a Baltic Sea security review: Finland (Åland), Sweden (Gotland), Denmark (Bornholm), Estonia, Latvia, Lithuania, Russia (Kaliningrad, St. Petersburg), etc.
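As a rough illustration of the "simple charts, KPI and linear regression" point in the first item above, here is a minimal sketch of an ordinary least squares fit in plain Python. The function name, the KPI and all the figures are made up for illustration only.

```python
# Ordinary least squares fit of y = slope * x + intercept, no external libraries.
# The sample data below is hypothetical.

def linear_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical KPI: staff hours (x) versus orders handled (y) per unit.
hours = [10, 20, 30, 40]
orders = [25, 45, 65, 85]
slope, intercept = linear_fit(hours, orders)
print(f"orders ~= {slope:.2f} * hours + {intercept:.2f}")
```

Units that sit clearly above or below the fitted line are then the ones worth a closer look, which is about as far as the "basic BI" approach needs to go.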

Economic Rational Decision Making, Game theory, Life satisfaction and Links

posted Apr 22, 2017, 10:08 AM by Sami Lehtinen   [ updated Apr 22, 2017, 10:11 AM ]

I was in a meeting and now I've got a reason to make sure I'm making good rational decisions. That's why I decided to re-listen to the talks Thinking Like an Economist. Loved it, pure logic of making money and evaluating deals. Well said, it's all just common sense. Let's see if there's anything new which wouldn't be obvious with that common sense and logic. Many things sound like stereotypes, because well, those are absolutely true. You can't even argue against those. Spent a weekend reminding myself about the basics.

Talks of course had several very good real life examples.

Principles:
  • People respond to incentives
  • There is no such thing as a free lunch
  • No thing is just one thing - There are always (at least) two sides to every interaction
  • Law of unanticipated influences
  • Law of unintended consequences
  • No one is in control

Core concepts:

  • Rationality
  • Marginal analysis
  • Optimization
  • Efficiency
Some related links, which I also visited and read in addition to listening to the lessons / lectures / course + exams.
After finishing all that, I'm actually quite happy. These talks contained words and concepts in English that I wasn't fully familiar with. But in the end the concepts and models were all very familiar. It's just common sense, like I stated earlier. Considering options and making rational decisions, aka sane choices. This also demonstrates diminishing returns on learning more about economic thinking. If I take a two day course on a topic and don't learn anything new, it's all time & money wasted. Some of these areas are just pure math and logic, but some are, ahem, a bit more complex. Once again, everything is a trade-off.

I also considered reading these Finnish books:
  • Lamantaittaja
  • Välähdyksiä pimeässä ja pimeitä välähdyksiä
  • Väärää talouspolitiikkaa
  • Mahdoton menestys
  • Kaikki oikein
  • Erinlainen ote omaan talouteen
  • Raha ja Onni
The only funny thing is that when I asked a few friends who clearly would benefit from the thinking like an economist concepts, they all showed no interest in them. I guess that's what makes the difference.

AWS, Future, Data Structures, Microservices, Process Memory, World Views

posted Apr 15, 2017, 9:56 PM by Sami Lehtinen   [ updated Apr 15, 2017, 9:57 PM ]

  • Read about new AWS services: Lambda@Edge, AWS CodeBuild, AWS Greengrass, AWS Athena and AWS Shield. Greengrass especially is an interesting platform for processing data on IoT devices using MQTT rules. Lambda@Edge provides distributed computing in the cloud using the CloudFront edge servers. Yet those can't actually call other resources, so it's really just cloud based processing of data, using RAM. There's also a tight limit on processing time, which is just 50 milliseconds, and memory is limited to 128 megabytes.
  • Read long article about future visions and technologies. Not surprisingly it listed: Unlimited free cloud storage, robotic service-oriented jobs, trillions of IoT devices, lot of 3D printed machines, medical parts, implants and even intestines etc, Internet of Things (IoT) communication implants, big data managed democratic countries, lots of artificial intelligence (AI) usage, driverless cars, no need for traffic lights.
  • Data structures for external memory - A very nice article. Nothing new, but it's nice to see graphs and get some background information, if you didn't know it already. - I've written so much about inefficiency, so it's good to check that your code isn't horrible. (In case performance is required)
  • Read a few articles and discussed Microservice Architecture (MSA) with friends & colleagues, for cases where the system(s) have been designed from scratch for microservices. This is a great shift, because I've seen many projects aiming for hugely complex monolithic All In One (AIO) solutions, where one system does it all, using extensive standards and logic and staggeringly massive interfaces. One interesting solution is Micro Service Oriented Architecture (μSOA) - http://baiy.cn/doc/byasp/mSOA_en.htm -, where you get some of the microservice benefits from an architectural standpoint but with much less overhead. I personally really like the fact that with this model you can avoid networking and IPC communication overhead, which is a common reason for smaller projects not to use microservices.
  • Nice post by Julia Evans - Why can't I just see how much memory my process is using? - Yes, and that's just a small generic part of it. The truth is much more complex, with things like memory compression and other stuff. Yet all this complexity and "relativity" causes some funny effects. You use (or write) a program which 'allocates and accesses' memory, to shrink other programs' memory usage? What? Or you can even manage that by just repeatedly accessing a suitably sized set of files stored on disk. - Yes, that's very true. Because your application (or the disk cache) puts pressure on memory management on purpose, it leads to the disk cache evicting pages as well as other processes being swapped out or compressed. So, if you think you have "too little free memory", you can run a very simple program to allocate N gigabytes of memory and access it repeatedly for a few minutes. After that you simply release that memory, and now you've got plenty of free memory. - This is exactly why, when people say "their system is out of memory", it (usually) just indicates that they completely lack understanding of what memory even is. Coming up with the 'right amount of memory' is a science so complex that at least I can honestly say I haven't mastered it. It would require a lot of case specific testing, and even then it would most likely come down to the question of what's acceptable performance for that specific task / system. One way to detect that is of course to watch performance, the amount of page faults and the amount of disk I/O / active disk time on Windows. But this is also one of my favorite topics, so I've written about it earlier and most likely will write about it in the future too.
  • Views of the World - Amazing site. When I wrote that geolocation aware web app, and also thought about CDN networks, I encountered a few of the matters handled on the Views of the World site, like population distribution on the planet, etc. But this is a much wider topic and there's so much more data available nowadays. So just take a look and learn. Accessing the site was enlightening, at least to me. Nothing really new, but a lot of aha moments, things I had thought of or hadn't thought of. Now I've got the facts. - Discussed the world market for business sector X with a few guys, and it was clear that when they were talking about the market, they meant the EU & US. It seemed that they completely ignored Asia. So they had a good view of their market, but it was clearly lacking a global view. So there's clearly potential for such a product or service in areas which they weren't considering.
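The "allocate N gigabytes and access it repeatedly" trick from the process memory item above could look roughly like the sketch below. The function name, the page size constant and the demo size are assumptions for illustration; on a real system the size would be much larger, and the effect depends entirely on the operating system's memory management.

```python
# Allocate a block of memory and touch every page a few times, so the OS
# has to keep it resident and push other pages out (cache eviction / swap).
# PAGE_SIZE is an assumption; 4096 bytes is common but not universal.

PAGE_SIZE = 4096

def pressure_memory(size_bytes, passes=3):
    block = bytearray(size_bytes)            # zero-filled allocation
    touched = 0
    for p in range(passes):
        for offset in range(0, size_bytes, PAGE_SIZE):
            block[offset] = (p + 1) & 0xFF   # writing forces the page to be resident
            touched += 1
    return touched                           # block is released when the function returns

if __name__ == "__main__":
    # Small demo size; raise this to several gigabytes for a real effect.
    pages = pressure_memory(64 * 1024 * 1024)
    print(f"touched {pages} pages")
```

Touching one byte per page is enough here, since memory is managed in pages and not in individual bytes.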

33C3 notes & keywords part 6

posted Apr 15, 2017, 9:53 PM by Sami Lehtinen   [ updated Apr 15, 2017, 9:53 PM ]


  • Corporate surveillance, digital tracking, big data & privacy - About: Surveillance, corporate surveillance, big data, digital tracking, data brokers, user privacy, private personal data. Analyzing and utilizing valuable data without consent. Corporate malpractices, what could possibly go wrong? Marketing bullshit terms. Hahah! Google, Facebook, Instagram and Twitter spying on users. Ad networks, analytic services, social network sites, etc. Smartphone = Person Tracker. Collecting and selling personal information. Data collection services from Oracle: BlueKai, Datalogix, AddThis. Behavioral Data, Social Data, Purchase Data. Used for targeting, personalization and measuring. - That's one of the reasons why I personally prefer using AddThis in a way that doesn't leak your data, unless of course you use the share feature. But by default, it's not being shared. Creating an efficient and accurate One Addressable Consumer Profile. Using Cloud Data Directory. Sources like: Acxiom, Infogroup, Neustar, Experian, TransUnion, IXI / Equifax, Lotame, VisualDNA. Also Visa and MasterCard leak your purchase data. Not of course forgetting Google, Facebook, etc. Building Rich User Profiles. Allows excluding ethnic groups etc. How do they know and is it accurate? As an example, people are easy to target by: Ethnicity, Gender, Sexual Orientation, Political Views, Religion, Nicotine Usage, Alcohol Usage, Relationship, Drug Usage and whether their Parents are Divorced. It's just data, it's not discrimination. Can we price your car insurance based on what you've liked on Facebook? Facebook credit score? WhatsApp shares data with Facebook. VisualDNA automatic Psychological Profiling on-line. (Personal note: Ouch, I wonder what they're saying about me. Hahah. Honestly, I really don't care.) Is it bad if the line between marketing and risk analysis is being crossed? Interlinked databases form Networks of Corporate Surveillance.
Spying data gets collected covertly, so users don't even know it's being collected. Types of data being collected: financial, contact, socio-demographic, transaction, contractual, locational, behavioral, technical, communications, social relationships. Collecting location data is especially popular. Different data collection levels on mobile providers: Application level real time location. Interval based location snapshots. Emergency Services Location. Granular network-based. Coarse cell-level network-based. And of course high resolution indoor information based on beacons, WiFi, etc. Surveillance of the real-world behavior of mobile users based on an observation graph. Matching users with Points of Interest. - Obvious question: how interested are intelligence services and law enforcement in this? - Highly accurate audience profiles. - Also the Stasi would have loved this kind of technology. - Separating first-party data from third-party data. Marketing Segmentation. Credit Scoring. FICO Score (ficoscore). TransUnion Trustev. ID check, Address, Email, Telephone, Financial History, Behavioral, Location, Device, Mobile, Machine learning, Identity, Digital Data, Fraud Scoring. Typical Patterns, Past Reputation, Cross Merchant History, Network deep location. OS, Browser, VM, etc. White, gray, black lists for the identity. - Afaik, that's all pretty obvious. I'm doing all that in some projects, yet using pretty simple algorithms. But the source data is there. - Four weeks of your call history is enough to tell your credit history (?), pretty interesting claim. Predicting user character traits from smartphone metadata: user neuroticism, extraversion, openness, conscientiousness, agreeableness, etc. With of course varying prediction accuracy. Data Mining, Mathematics, Statistics, Machine Learning. Zest Finance uses thousands of data elements to calculate credit scores. They co-operate with Baidu. More data is always better. All data is also credit data. Infobase-X.
Basic data like: Credit History, Driving History, Criminal History, Residential History, Employment History, Education History, Income History, Credit Cards, Property Data, Vehicle Data, Purchase Behavior, Life Events, Voter Party, Health Information, Personal Interests. Name, Address, Phone, Email, Birth date, Gender, Ethnic Code, Marital Status, Children. - Yep, sounds like pretty public basic data, ahem. - LexisNexis Risk Solutions. Health care solutions where Social Network Analytics Reveal Hidden Relationships. Delinquency prediction. Palantir. PayPal Fraud Detection. SCL Group. Targeting and data-driven communication. Defence/Intelligence: Information Operations. Elections: Microtargeting for political campaigns. Classifying users on political views like: Pro-Life, Environment, Gun Rights, National Security, Immigration. Marketing Technology Landscape. Mobile Marketing, Digital Asset Management, Display Ad Management, E-Commerce, Loyalty management, Marketing Automation, CRM, Email Marketing, Social Media, Data Analytics, Business Intelligence (BI), Multi-channel marketing. Agile project management. TellApart - Predictive Marketing Platform. Customer Value Score. Allowing Dynamic Promotions. Right person at the right time. Personalized Pricing. Price Discrimination. "Rich see a different Internet than the poor" - Michael Fertik. Filter Bubbles. Anonymous identifiers derived from email addresses, phone numbers and credit card numbers. Pseudonym != Anonymous. Google Advertising ID, Apple IFA / DFA, Microsoft ID, Oracle Identity Graph, Acxiom AbiliTec Link, Verizon Precision ID, Experian AdTruth ID. Data Management Platform (DMP) = real-time online data marketplace. Providing a central hub for data aggregation, integration, management and deployment of disparate sources of data. Collecting new data using tags and web bugs. Analyzing and categorizing people into different segments and audiences. Grouping people into lookalikes.
DMP providers: Liveramp, BlueKai, eXelate, Krux, Lotame, Adobe Audience Manager, Rocket Fuel, Turn DMP, etc. What should we do about all this? DataDealer.com - An online game that explores the personal data ecosystem on the Internet. Networks of Control is a report on corporate surveillance, digital tracking, big data & privacy. Tracking the trackers about health, discrimination, data brokers, tracking, algorithmic decisions, apps, employment, profiling, IoT, privacy, big data, personal data, surveillance, wearables, smartphone, credit scoring, analytics and regulation. Data collection and analytics should be transparent. There should be strong regulation to protect privacy. Support for decentralized, privacy-aware technology. Privacy-aware open source components & business models. (Nuff said!) - This was an awesome talk, and it provided so much information. Which of course wasn't anything new to me. But it's good to know that the 'predictions' are all becoming reality.
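The "Pseudonym != Anonymous" point in the notes above can be illustrated with a small sketch. It assumes, hypothetically, that a tracker derives its identifier by hashing a normalized email address with SHA-256. Under that assumption two independent parties end up with the same ID for the same person, so their data sets can be joined without ever exchanging the address itself. The normalization rule, the function name and the addresses are illustrative assumptions, not any vendor's actual method.

```python
import hashlib

def pseudonymous_id(email):
    # Assumed normalization: trim whitespace and lowercase.
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two hypothetical companies see the same address written differently.
id_shop = pseudonymous_id("Alice@Example.com")
id_adnet = pseudonymous_id("  alice@example.com ")

# The "anonymous" IDs match, so the two profiles can be linked.
print(id_shop == id_adnet)
```

The identifier hides the address from a casual reader, but it is stable and shared, which is exactly what makes it a pseudonym instead of anonymity.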

.NET Strings, Python, Lists, LeftPop, Deque, Shenzhen I/O, Comfort Zone, Open Media, Databases

posted Apr 8, 2017, 9:30 PM by Sami Lehtinen   [ updated Apr 8, 2017, 9:30 PM ]

  • Had a long talk with an elite .NET coder with decades of experience. He didn't know that strings are immutable in .NET. This also means that he didn't know what kind of performance problems this causes if you're unaware of the immutability. The classic example: let's create a one GB string by concatenating characters in a loop. That always creates a nice CPU & RAM bandwidth hog and takes a long time. No wonder programs perform so badly at times, the truth is that many programmers don't have a clue what they're actually doing. But this doesn't only apply to strings, it applies to so many other things too. If a 'higher level programming language' is being used, the risk of not understanding what happens under the hood is even higher.
  • About those 'stupid fails', I thought about some of my projects which use leftpop (pop(0)) with a list. It's a very bad design choice. Why? Here's a benchmark:
    List creation and popping with 2 ** 20 (pow(2, 20), 1024 * 1024) items:
    Deque create time, using append:   0.07 s
    Queue create time, using append:   0.16 s
    Deque popleft extraction time  :   0.21 s
    Queue pop(0) extraction time   : 227    s
    Ouch! As you can see, pop(0) / popleft / FIFO is a very bad idea with plain lists in Python; use deque instead.
    If you use the regular pop (right) / FILO there's still a noticeable difference in popping, but it's not that radical:
    Deque create time, using append: 0.07 s
    Queue create time, using append: 0.16 s
    Deque pop extraction time      : 0.21 s
    Queue pop extraction time      : 0.29 s
    So when modifying any longer lists, use deque. A list performs okishly in most normal conditions, but it can be a real nightmare with larger lists. This is just here to remind me not to do this silly fail again. Gotta fix a few projects using these. I'll also be replacing the "magic numbers" in code with named constants. In this case I mean magic numbers as defined by Wikipedia: "Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants". Yet Python doesn't actually support constants, so the variable lookup does degrade performance. These are just examples of optimization problems. Messy code can be much faster than clearer code, because the messy code is designed to be fast and to minimize lookups, jumps, function calls etc. Just like unrolling short loops etc. The same problem applies to so many optimizations. Those might grow the code base a lot, make stuff much more complex, etc. Yet 'smarter' processing of data might also bring drastic performance improvements. - It's always a trade-off, like I said. Either make simple code which works reliably, or do a lot of interesting stuff to improve performance while growing the code base and adding complex logic which might have multiple hidden serious flaws in some edge cases. In that sense really naive code is often a very good choice.
  • Read an article about the Shenzhen I/O game, which challenges you to optimize code to the max. Which of course is a very serious task. I'm just looking for easy, low hanging fruit, the kind which, if not thought about, are more like failures than real optimizations. Unfortunately I don't have time for games, but you can still learn from the mentality of beating the odds. And from knowing that if you think something is optimal, you're most likely very wrong about it.
  • Going outside your comfort zone. Some projects have been on the very edge lately. But hey, that's just the case where you're forced to evolve and develop fast. Therefore people actually should aim to go outside their comfort zone quite often. It also makes you judge your own actions: have I done this well? Repeatedly rethink the situations, ask how you could have done better, engage in quite harsh self-assessment and so on. As well as rehearse the situations you're going into, so you can be relaxed and confident in them. I've done this dozens of times earlier, no problem. I'll take care of it.
  • Studied Alliance for Open Media, AV1 Codec and AOMedia Video 1 (AV1) on Wikipedia. Other nice video compression and codec related articles. H265 Part I Technical Overview and H265 Part II Considerations on quality and "state of the art". Nice article about bit older tech H.264 is Magic. Well, I wouldn't personally define magic so, but it's still very neat piece of technology and code. With video codecs efficiency and performance are very important points, because there's plenty of data to process.
  • With databases I usually use the Serializable isolation level, but for long running / large data set reads I use read committed to make processing faster. Of course it depends on the database engine how much of a problem long running read queries cause. For obvious reasons I almost never use read uncommitted. A nice post by Elliot Chance about SQL transaction isolation levels. Another very important thing is to use locks correctly, so you're not causing deadlocks.
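The list versus deque numbers in the benchmark item above can be reproduced with something like the following sketch. It uses a smaller default item count than the 2 ** 20 in the original run so that the list pop(0) case finishes quickly. The function names and the demo count are assumptions, and absolute timings will differ by machine and Python version.

```python
import time
from collections import deque

def drain_list_fifo(n):
    items = list(range(n))
    start = time.perf_counter()
    while items:
        items.pop(0)          # O(n) each time: all remaining items shift left
    return time.perf_counter() - start

def drain_deque_fifo(n):
    items = deque(range(n))
    start = time.perf_counter()
    while items:
        items.popleft()       # O(1) at either end
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 2 ** 16               # assumed demo size, raise to 2 ** 20 to match the text
    print(f"list.pop(0)     : {drain_list_fifo(n):.3f} s")
    print(f"deque.popleft() : {drain_deque_fifo(n):.3f} s")
```

Because each pop(0) shifts the whole remaining list, draining the list this way is quadratic in total, which is why the gap grows so dramatically with size.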

Presentation, US LTE, Cloud Storage, OpenBazaar, mPOS, Cyber Attacks, Privacy

posted Apr 1, 2017, 8:27 PM by Sami Lehtinen   [ updated Apr 1, 2017, 8:27 PM ]

  • Such a traditional fail: leaving everything visible on the desktop and in the email client when sharing a screen to give a presentation. No, you really can't do that. I wouldn't buy anything from a provider which does that. It's such a massive security fail that it can't be tolerated. Yet this happens quite often. Every time someone makes this silly mistake, I'll immediately take a screenshot, so I can review in peace what information got revealed. - A very easy way of collecting information which you weren't authorized to access, or which wasn't meant for you to see / know. OpSec Fail.
  • Compared US 4G / LTE operators for a friend in Los Angeles. It seems that the network performance there isn't often nearly as good as it is in Finland. Also available options are much more expensive. Especially if you're looking for unlimited data.
  • Had long discussion how Google Cloud Storage buckets and B2 storage buckets can be mounted as file system on Linux using FUSE and suitable software. Tried both and it works well. Yet it's good to acknowledge that there's major overhead and performance isn't going to be great.
  • Interesting article: OpenBazaar Truly Free Trade Through Crypto. It's awesome that there are more and more distributed p2p platforms which do not rely on central server(s) and aren't controlled by one closed profit making organization. Blockstack mentioned, I'll need to read more about this subject. Another interesting post Can Bitcoin and Multisig Reduce Identity Theft and Fraud?
  • Still about flash and random write. I cloned one git repository with around 10k objects, in total only 2 megs to flash stick, and sync operation took more than 6 minutes after git had finished. Write caching often masks the true latency on flash writes.
  • A good joke, the S in IoT stands for Security. Hahahah.
  • Had long discussions about project management, coordination, team communications, etc. It's always as fun as ever. Same topics, over and over again. Nothing new.
  • One application was crashing so often that it was easier to create another application which restarted the first application in case it crashed. - Wow, that's really state of the art development. - Yet this seems to be an industry wide standard solution.
  • Studied mPOS tracker by Pymnts. Very nice issues and good reading.
  • Had more interesting discussions about helpdesk. Should helpdesk care about matters which are deeper or systemic problems? Or do they only deal with fire department matters? Let's say that system X is crashing several times a week. Is it ok if helpdesk just restarts the system, and then it's "fixed"? We all know darn well that it's not a fix or a solution. It'll happen again soon. What about thinking a bit deeper, at least acknowledging that this is not a fix, and trying to find the real problem? And I'm not talking about digging really deep for the true root cause. Just looking a bit deeper to see what's behind the issue, and not completely ignoring the fact that there is a deeper issue somewhere. - Just a generic question: who's responsible for what, and how are things escalated forward correctly?
  • Unencrypted doesn't mean unauthenticated, those are separate things. It seems that people often confuse these. Also encrypted doesn't mean authenticated. Also authentication can be just connection authentication and it doesn't mean that the payload would be authenticated using MAC, signature or similar cryptographic means.
  • Bruce Schneier: 'The internet era of fun and games is over' - IoT is and will be major security headache in future. It's also nice to watch Mr. Schneier when Dr. Fu is talking. Understanding the Role of Connected Devices in Recent Cyber Attacks.
  • The UK uses funny terms like "Equipment interference". I'm sure you can guess what that means in practice. Anyway, they managed to create a nice storm and titles like: - "How can I protect myself from government snoopers?" - "Everyone who can now see your entire Internet history, including the taxman" - "Your entire Internet history to be viewable by many agencies" - "The most extreme surveillance law ever passed in a democracy" - That's nice. Most of the population won't care, and those who have the required knowledge don't care either, in a way, because they can circumvent the monitoring using different technical means. Like VPNs and systems which do not reveal metadata and communication patterns and use high grade encryption.
  • Helped a friend to choose a perfect European VPN service provider for personal user specific needs.
  • Something different? Mine-Resistant Ambush Protected (MRAP) - Specialized protected military vehicles, which often topple over easily.
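The point above that unencrypted doesn't mean unauthenticated can be shown with a short sketch: the message travels in plain text, but an HMAC tag lets the receiver detect tampering. The key and the messages are made-up examples, and a real system would need proper key management and replay protection on top of this.

```python
import hashlib
import hmac

SHARED_KEY = b"example-shared-secret"   # hypothetical key, for illustration only

def sign(message: bytes) -> bytes:
    # HMAC-SHA256 over the plaintext payload.
    return hmac.new(SHARED_KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(message), tag)

message = b"restart service X at 02:00"  # sent unencrypted, readable by anyone
tag = sign(message)

print(verify(message, tag))                         # True
print(verify(b"restart service Y at 02:00", tag))   # False
```

The reverse also holds: encrypting the same message without any MAC or signature would hide its content, but would not by itself prove who sent it or that it arrived unmodified.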

Discrimination, Statistics, Bayesian, Data, Analytics

posted Mar 26, 2017, 9:28 AM by Sami Lehtinen   [ updated Mar 26, 2017, 9:30 AM ]

Warning, this is an n/n (nothing / nothing) post. This doesn't actually provide any value for me, nor does it provide any value for anyone else. So only read it if you're really bored and want to think about complex issues where there's no right answer.
  • Just some random bits of deeper thoughts, or actually mostly questions. I don't have an answer or a 'right' opinion on any of the issues(?) mentioned in this post.
  • The final question is how it can be discriminatory if it's based on data. It's not something that would be inherently designed to be discriminatory, it's just something which reflects reality. Afaik, I wouldn't define that as discrimination. So many things are related and work as a proxy for something else. So even if the results might look discriminatory, they're not. It's just based on how things are. I don't personally like it when people say that you can't say that, it's not the right thing, or it's wrong or something else. No, it's not wrong if it's based on real facts and data. Even if it would be wrong by some ethical values or hurt someone's feelings. It's still fact based and true and therefore not discrimination or whatever you want to call it in different situations. Yet all this technology allows people to be processed (hah, what a horrible term) as individuals, instead of as larger aggregate groups. Yet, as said, people are still often profiled very much like their aggregate groups. But of course it's possible to be an individual without the typical grouping, and current big data technology makes it possible. It's another question whether the default grouping, when there isn't enough data, can discriminate against someone. Probably, but isn't this the current case in the US with the credit score stuff? So what if I don't have any credit score. It doesn't mean I wouldn't be credit worthy. Is that discrimination?
  • It's also funny if there's group A and B. B is the group which commits 90% of crimes. It doesn't mean everyone in group B would be criminal. But still, if there are "random" checks on street to see if they've got stolen goods. I would say it's totally fair and purely logical to focus 90% of the search effort on people belonging to group B. That A / B grouping could be anything, sex, religion, wearing a rabbi hat, having specific brand of jeans, having a gang symbol, specific country's passport etc. It's all just Bayesian math. Naturally all of these features can be combined in overall estimate. Is that discrimination?
  • Sometimes it sounds like 'anti-discrimination' laws are more like laws to discriminate against others in favor of some small minorities. I don't know? But this is the feeling I get at times. Also a lot of effort might be 'wasted?' on fitting some minorities into some situations, etc. This of course purely in the sense of efficiency and generalization. When making business decisions, there's usually careful consideration. It's not based on discrimination or not, it's consideration to get the best end result. It might seem like discrimination at times, but it's not. That's not the key factor. Is it discrimination against an operating system / vendor if another operating system, database, server platform, cloud service, etc. provides a more cost efficient and overall better solution? Of course they claim that they've got a generic market share of N % in this market, and that we should have a quota of N % of our servers utilizing their technology? Wouldn't that sound crazy? Sometimes vendors seem to think like this. But I can assure you, I don't care about the vendor, I care about the overall package. Also unions sometimes seem to have interesting views about salaries etc. Same job, same salary. But what if one employee is 5x more productive than others? Etc. These are complex and generally hard topics to talk about. I don't have any kind of 'right' or fixed view on this. But it all comes down to a fair overall consideration of what's the best way to deal with the situation.
  • Similar questions go about countries and communities. Should you have freedom of speech? What if you're speaking against the 'organization' you belong to? Yes, it could be a country as well as company or some other community. Would it be just better to leave and than trying to undermine the organization? If Singapore is kind of police state so what, nobody can deny they're being highly successful. - These are all extremely complex questions, but still worth of asking. This is also relevant question according to the minorities and ethnicity. "When in Rome, do as the Romans do" What if they come here, and want us to do as they please? Is that right then?
  • Yet we've seen some stories where the input data for this statistical analysis (sometimes hyped as machine learning) can be highly misleading. In one story which I couldn't find right now, they had system to do medical analysis. The dataset they used was from single stage of the process, analysis made by doctor. But the results were highly skewed because in very serious cases the paramedics already took the patient to the next step passing by the doctor. Therefore the machine learning process didn't recognize the very serious cases, because the data set created from the doctor analyses lacked the samples. Unfortunately this is totally normal 'process' failure, which I've described happening over and over again in this blog. Then nerds claim, it's working, even if in reality it's killing people. - But that's just matter of perspective.
  • I think I've actually written about every topic mentioned in tis post earlier, so nothing new. No great conclusions to be made, just questions without any answers. Also all of these things are very fungible depending on perspective and position being viewed from.
  • Just as intentionally provocative setup. Everyone's got right for food. But is it right, that poor people get it free, and rich people have to pay for it? Is that fair then? - Yes, I know, this is intentionally provocative setup. But provides one of the complex questions which arise when reading election related questions and answers. - I'm also aware how socialist some European countries are. - Is that a bad thing? Can't say, too complex situation.
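Going back to the skewed medical data set mentioned above, the failure is easy to sketch in a few lines. This is only a toy illustration with made-up data: the patient list, the route names and both helper functions are my own assumptions, not anything from the original story. It just shows that a data set collected at one stage of a process can be missing a whole class of cases.

```python
from collections import Counter

# Hypothetical patient stream: (true severity, route the patient took).
# Critical cases are taken straight onward by paramedics, bypassing the doctor.
PATIENTS = [
    ("mild", "doctor"),
    ("mild", "doctor"),
    ("moderate", "doctor"),
    ("moderate", "doctor"),
    ("critical", "paramedics_direct"),
    ("critical", "paramedics_direct"),
]


def doctor_data_set(patients):
    """Keep only the cases that passed through the doctor's analysis step."""
    return [severity for severity, route in patients if route == "doctor"]


def unseen_classes(patients):
    """Severity classes that exist in reality but never reach the data set."""
    everything = Counter(severity for severity, _ in patients)
    seen = Counter(doctor_data_set(patients))
    return sorted(set(everything) - set(seen))


if __name__ == "__main__":
    print("training data:", doctor_data_set(PATIENTS))
    print("never seen by the model:", unseen_classes(PATIENTS))
```

Any model trained on `doctor_data_set(PATIENTS)` has simply never seen a critical case, no matter how clever the algorithm is.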

Airport ID check fail, Integrity Testing, Cryptographic Signatures, Fake News

posted Mar 19, 2017, 1:10 AM by Sami Lehtinen   [ updated Mar 19, 2017, 1:11 AM ]

  • Airport passport & identity checks were a joke. There was a separate person checking the ID & boarding pass, and then there was the computerized boarding system check. This would have allowed me to show one boarding pass to the ID checker and a different one when actually boarding the plane. Basically the ID check was totally useless. Another funny thing was that I had a mobile boarding pass, so the image they saw was a screenshot. Naturally I could have presented whatever information to them. - This is just the problem with people and rules. When they apply the rules, they often make stupid decisions about how the measures should be implemented and the original purpose gets totally lost. So, they never checked that the name on the ID and on the ticket was the same. They did check "some document" and the ID, and then they checked me in using the QR code on the real boarding card, but at that stage the previous verification step was already meaningless. Ha ha ha. I thought that people at airports got at least basic security training, but nope. It was a perfect example of how they follow rules and at the same time make the rules utterly meaningless through a stupid implementation. - Greetings for this individual case go to Lufthansa (LH) & Frankfurt (FRA).
  • One of my friends (who's retired now) used to do all kinds of 'integrity testing' of personnel in different businesses. It was great and he had awesome stories. His job was usually to be the outsider who does something stupid, opening an opportunity for staff to do the wrong thing, and then burn them for that. So if an opportunity seems really good and worth taking, it might be a trap set up just to burn you. Sometimes I also do this on purpose: I'll let a small mismanagement slip and then make a major s*t storm about it. A hotel room cleaner found a diamond ring in the room. Didn't report it as found. - Burned. Inventory got too much of something, you forgot to report the extra. - Burned. Too much cash at the end of the day? Didn't report that? - Burned. 'Stupid customer' overpaid for something? - Burned. These tricks are especially useful when there's preexisting suspicion and the setup is there just to confirm it. This is just the simplest form. There are many much more advanced tactics. Yet at times, it's hard to tell if they're that devious or just stupid. In either case, the thing they did wasn't the right thing to do.
  • It's just like the many signature verification issues we've seen. The programmer DOES verify that the cryptographic signature is valid. But they don't verify that the signature belongs to the request maker or an authorized user, etc. It's just stupid. Like that classic XKCD: to verify that an email is authentic, check that it begins with -----BEGIN PGP SIGNED MESSAGE-----. If so, it's all good. It would also be very interesting to test the security checks above. What kind of documents would they allow to pass? Would it matter if the passport / ID card was expired, and so on? I'm pretty sure the checks suck on multiple levels. Unfortunately I don't think it would be a good idea to test these checks with forged documents, even if I had valid documents on hand. - But it would still be interesting. Oh, you didn't like my fake passport (this time), here's the real one.

    -----BEGIN PGP SIGNED MESSAGE-----
    Hash: SHA256

    Is it really so hard to check a signature?
    -----BEGIN PGP SIGNATURE-----
    Version: GnuPG v2

    iF4EARYIAAYFAlgxcwkACgkQ/NTQawK41Cox+wEA025HgvdwFpk1XP0h1ytAj7aO
    V0VkBEJMDEAHcFrCmkQBAPWOZsP1ylxxozWYp0nNrGjwAODdy8A/LZmEZWGGupsI
    =d0vt
    -----END PGP SIGNATURE-----

  • The post about fake news on Facebook. So what? If people like and share fake news, it's good business for Facebook. Who really cares if that news is true or not? If they're a tech company and not a media company, why would they care about that? It's a bit like drugs: people claim that drugs are bad and dangerous. Well? If that's true, why do people then go and get more? I'm pretty sure it's not because the drugs are bad and dangerous. If they were so bad, people wouldn't do such stuff ever again. - This is just my "tech company" view of the "drug problem" and information. People love fake news, and that's how businesses make money. Isn't it as simple as that?
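Going back to the signature verification point above: the difference between "the signature is valid" and "the signature is valid AND belongs to someone allowed to make this request" can be sketched roughly like this. It's a minimal sketch, not a real PGP setup. HMAC from the Python standard library stands in for a real signature scheme, and the key store, the policy table and all function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key store: every known party has its own signing key.
KEYS = {"alice": b"alice-secret", "mallory": b"mallory-secret"}
# Hypothetical policy: who is allowed to issue which request.
AUTHORIZED = {"transfer_funds": {"alice"}}


def sign(key_owner: str, message: bytes) -> str:
    """Stand-in for a real signature: HMAC-SHA256 with the owner's key."""
    return hmac.new(KEYS[key_owner], message, hashlib.sha256).hexdigest()


def naive_check(message: bytes, signature: str) -> bool:
    """Broken: accepts a signature made with ANY known key."""
    return any(
        hmac.compare_digest(sign(owner, message), signature) for owner in KEYS
    )


def proper_check(action: str, claimed_sender: str,
                 message: bytes, signature: str) -> bool:
    """Valid signature, made by the claimed sender, who is authorized."""
    if claimed_sender not in KEYS:
        return False
    if not hmac.compare_digest(sign(claimed_sender, message), signature):
        return False
    return claimed_sender in AUTHORIZED.get(action, set())


if __name__ == "__main__":
    msg = b"transfer_funds: 100 EUR"
    forged = sign("mallory", msg)  # perfectly valid signature, wrong signer
    print("naive check passes:", naive_check(msg, forged))
    print("proper check passes:",
          proper_check("transfer_funds", "alice", msg, forged))
```

The forged request carries a completely valid signature, it's just not from anyone who is allowed to make the request. That's exactly the same failure as the airport check: "some document" was verified, but never tied to the thing it was supposed to authorize.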

33C3 notes & keywords part 5

posted Mar 19, 2017, 1:05 AM by Sami Lehtinen   [ updated Mar 19, 2017, 1:06 AM ]

  • No USB? No problem. - Sounds interesting. Software based USB stack. Basics of USB. This is nice, I haven't really bothered. Bit stuffing. keepalive. Slew rate. I really like this talk. Grainuum. Also loved many of the approaches where they mentioned, this is what should be done. But we can just go and ignore it. Nice.
  • Copywrongs 2.0 - Let's see what kind of stupid laws are being crafted. Yep, just as crazy as you could imagine. Well, I won't even comment on these.
  • Quickly glanced at BearSSL. No, I don't have any use for it right now. It's also alpha software. Alpha + Security = Not a good match for production. ;) But I admit there are use cases where lighter and smaller code is very useful. Getting rid of all that bloat, and doing only the essential things is sometimes very beneficial.
  • It seems that the time of wonders is never over. When watching 33c3 videos, suddenly there started to be really annoying audio artifacts. At first I thought that the video stream's audio was bad. But then I noticed that when pausing and unpausing the video stream, the audio artifacts lagged by about 100 - 150 milliseconds. I don't know what caused the problem. It probably had something to do with too many different audio sources when I had multiple video stream tabs open. After rebooting the system, everything worked normally again. - So annoying, so strange, yet so normal.
  • 3 Years After Snowden: Is Germany fighting State Surveillance? - This should be interesting. Well, it's pretty good. So far nothing I wouldn't have expected. Wiretapping Internet exchanges (IX), etc. - Nice ending, they got Snowden talking. Nice applause too.
  • On the Security and Privacy of Modern Single Sign-On in the Web - SSO. This topic is also interesting. But I'm sure there are widely different implementations. Some are actually secure and others are guaranteed to be totally insecure. OAuth, OpenAuth, OpenID Connect, IdP, Mozilla Persona, BrowserID. Lack of privacy. Single point of failure. kw: token, authentication, authorization. OAuth 2.0 is not compatible with OAuth 1.0, yet OAuth 2.0 is much nicer. Session integrity. Authentication. Authorization Code. Redirect Attack. Identity Bridge. Identity Forgery. Privacy Attacks. So much fail, unsigned parts, etc. Lack of signature checking, and other 'obvious' fails. BrowserID privacy broken. Spresso. Identity Provider. Subresource Integrity.
  • A world without blockchain - Complexity of cross bank money transfers. Interbank messaging. ECB. SWIFT Communication network. XML SCLSCT BBkSCF. Yep, nothing special. It looks just like any other XML integration. TARGET2 / RTGS. Netting Batches. Low value transactions. Money Clearing House. Beneficiary Accounts. International cross currency payments.
  • Stopping law enforcement hacking - Stingray, IMSI Catcher. Military Surveillance Technology. Remote Operations Unit (ROU). Cross Border Hacking. Power Abuse. Lack of Firefox security sandbox. Qubes OS, Subgraph OS. Generally new high level talk. Surprisingly little news / technical facts in this talk. "Are Linux users safer, because of being a minority?" - "No", was the answer. Technical debt mentioned. Zero Day Exploits. Government and Law Enforcement make mistakes. - Sure, everyone does. Network Investigative Technique, it's just FBI issued malware. Twisting words.
  • The Untold Story of Edward Snowden’s Escape from Hong Kong - Refugees saved his life? This should be interesting. Lame fail with the presenter, that's unfortunate. When giving a presentation, if anything goes wrong, it's your fault. The presenter needs to make everything as ready and sure as possible, because it's your presentation which will get ruined by any fault whatsoever. Very bad start for that presentation. Well, the story got much better after that major blunder at the beginning. Fund raising platform taking a 20% cut, ouch! That's bad.

A System Integration Checklist

posted Mar 16, 2017, 9:53 PM by Sami Lehtinen   [ updated Mar 16, 2017, 10:05 PM ]

Just a basic checklist of things to go through when building an integration:
  • Generic purpose of integration
  • Is all the required source data even available
  • Cost benefit analysis
  • Test cases
  • Transfer Triggers / Interval / Schedule
  • Data sources
  • Data mapping (Avoid if possible)
  • Transfer format
  • Data validation
  • Destination
  • Transport method
  • Data Security / Authentication / Signing / Encryption
  • Timeouts / Retransmissions / Other exceptions
  • Logging & Reporting
  • Monitoring & Alerts 
  • Key responsible contacts
That's just what I do, every time when I need to build one.

If the data mapping gets very complex, it's highly likely that nobody in the client organization will be able to maintain it in the future. If it requires stretching to get it right during development, it's almost guaranteed that it'll fail in the future. And then they'll be blaming the integration.
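As a rough illustration of three of the checklist items above (data validation, timeouts / retransmissions and logging), here's a minimal Python sketch. It's not the code I actually use. The field names, the retry counts and the `send` callback are all assumptions made up for the example, and a real integration would plug in its own transport method.

```python
import logging
import time
from typing import Callable

log = logging.getLogger("integration")

# Hypothetical required fields for one record in the transfer format.
REQUIRED_FIELDS = ("id", "amount")


def validate(record: dict) -> list[str]:
    """Return a list of validation errors, empty if the record is fine."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in record]
    if "amount" in record and not isinstance(record["amount"], (int, float)):
        errors.append("amount is not numeric")
    return errors


def send_with_retry(send: Callable[[dict], None], record: dict,
                    attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to send a record, retrying on timeouts and connection errors."""
    for attempt in range(1, attempts + 1):
        try:
            send(record)
            log.info("sent record %s on attempt %d", record.get("id"), attempt)
            return True
        except (TimeoutError, ConnectionError) as exc:
            log.warning("attempt %d failed for record %s: %s",
                        attempt, record.get("id"), exc)
            if attempt < attempts:
                # Exponential backoff: 0.5 s, 1 s, 2 s, ...
                sleep(base_delay * 2 ** (attempt - 1))
    log.error("giving up on record %s after %d attempts",
              record.get("id"), attempts)
    return False
```

Records which fail `validate` should be logged and reported instead of being sent, and a `False` from `send_with_retry` is the kind of event the monitoring & alerts item is there for.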

Suggestions, Questions, Feedback? Contact information is on home page.
