.NET Strings, Python, Lists, LeftPop, Deque, Shenzhen I/O, Comfort Zone, Open Media, Databases

posted Apr 8, 2017, 9:30 PM by Sami Lehtinen   [ updated Apr 8, 2017, 9:30 PM ]
  • Had a long talk with an elite .NET coder with decades of experience. He didn't know that strings are immutable in .NET, which also means he didn't know what kind of performance problems immutability causes if you're unaware of it. The classic example: building a one-gigabyte string by concatenating characters in a loop, which creates a nice CPU & RAM bandwidth hog and takes a very long time. No wonder programs sometimes perform so badly; the truth is that many programmers don't have a clue what they're actually doing. But this doesn't only apply to strings, it applies to so many other things too. If a 'higher level programming language' is being used, the risk of not understanding what happens under the hood is even higher.
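    Python strings are immutable too, so the same trap can be sketched there. A minimal illustration (note: CPython happens to optimize simple `+=` on strings in place when the reference count allows it, so the measured gap varies; in runtimes without that optimization, such as naive .NET string concatenation, the quadratic cost is the rule):

```python
import timeit

N = 100_000

def concat_loop():
    # Each += may copy the whole string built so far: O(n^2) worst case.
    s = ""
    for _ in range(N):
        s += "x"
    return s

def join_list():
    # Collect the pieces and join once at the end: O(n).
    return "".join("x" for _ in range(N))

assert concat_loop() == join_list()
print("concat:", timeit.timeit(concat_loop, number=5), "s")
print("join  :", timeit.timeit(join_list, number=5), "s")
```

    In .NET the equivalent fix is StringBuilder instead of repeated string concatenation.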
  • About those 'stupid fails': I remembered that some of my projects pop from the left of a plain list. That's also a very bad design choice. Why? Here's a benchmark:
    List creation and popping with 2 ** 20 (1024 * 1024 = 1,048,576) items:
    Deque create time, using append:   0.07 s
    Queue create time, using append:   0.16 s
    Deque popleft extraction time  :   0.21 s
    Queue pop(0) extraction time   : 227    s
    Ouch! As you can see, pop(0) / popleft / FIFO is a very bad idea with lists in Python when not using deque.
    If you use the regular pop() from the right end (LIFO), there's still a significant difference in popping, but it's not that radical:
    Deque create time, using append: 0.07 s
    Queue create time, using append: 0.16 s
    Deque pop extraction time      : 0.21 s
    Queue pop extraction time      : 0.29 s
    So if you're modifying any longer lists, use deque: a list performs okay-ish in most normal conditions, but it can be a real nightmare with larger lists. This is just here to remind me not to make this silly mistake again; I've got a few projects to fix. I'll also be replacing the "magic numbers" in my code with named constants. By magic numbers I mean, per Wikipedia: "Unique values with unexplained meaning or multiple occurrences which could (preferably) be replaced with named constants". Yet Python doesn't actually enforce constants, so the name lookup does degrade performance slightly. These are just examples of optimization trade-offs. Messy code can be much faster than cleaner code, because the messy code is designed to be fast and to minimize lookups, jumps, function calls, etc. Just like unrolling short loops. The same problem applies to so many optimizations: they can grow the code base a lot and make things much more complex, yet 'smarter' processing of data can also bring drastic performance improvements. It's always a trade-off, like I said: make simple code that works reliably, or do a lot of interesting stuff to improve performance while growing the code base and adding complex logic that might have multiple hidden serious flaws in some edge cases. In that sense really naive code is often a very good choice.
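    The gap above can be reproduced with a minimal sketch along these lines (shrunk to 2 ** 14 items so it finishes quickly; the figures above used 2 ** 20). list.pop(0) has to shift every remaining element down by one slot, so it's O(n) per pop and O(n^2) for draining the whole list, while deque.popleft() is O(1):

```python
import time
from collections import deque

N = 2 ** 14  # the original benchmark used 2 ** 20

def drain(container, pop):
    """Pop until the container is empty, return elapsed seconds."""
    start = time.perf_counter()
    while container:
        pop()
    return time.perf_counter() - start

dq = deque(range(N))
lst = list(range(N))

print("deque popleft:", drain(dq, dq.popleft), "s")
print("list pop(0)  :", drain(lst, lambda: lst.pop(0)), "s")
```

    With the full 2 ** 20 items the list version degrades to minutes while the deque stays well under a second, matching the table above.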
  • Read an article about the Shenzhen I/O game, which challenges you to optimize code to the max. That of course is a very serious task; I'm just looking for easy, low-hanging fruit, which, if not thought about, is more of a failure than a real optimization. Unfortunately I don't have time for games, but you can still learn from the mentality of beating the odds, and from knowing that if you think something is optimal, you're most likely very wrong about it.
  • Going outside your comfort zone. Some projects have been on the very edge lately. But hey, that's exactly the case where you're forced to evolve and develop fast. Therefore people actually should aim to go outside their comfort zone quite often. It also makes you judge your own actions: have I done this well? Repeatedly rethink situations, consider how you could have done better, engage in quite harsh self-assessment, and so on. As well as rehearse the situations you're going into, so you can be relaxed and confident in them: I've done this dozens of times before, no problem, I'll take care of it.
  • Studied the Alliance for Open Media, the AV1 codec and AOMedia Video 1 (AV1) on Wikipedia, plus other nice video compression and codec related articles: H.265 Part I Technical Overview and H.265 Part II Considerations on quality and "state of the art". Also a nice article about slightly older tech, H.264 is Magic. Well, I personally wouldn't define magic that way, but it's still a very neat piece of technology and code. With video codecs, efficiency and performance are very important points, because there's plenty of data to process.
  • With databases I usually use the Serializable isolation level, but for long-running / large data set reads I use Read Committed to make processing faster. Of course it depends on the database engine how much of a problem long-running read queries cause. For obvious reasons I almost never use Read Uncommitted. A nice post by Elliot Chance about SQL transaction isolation levels. Another very important thing is to use locks correctly, so you don't cause deadlocks.
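    As a sketch, the per-transaction choice looks roughly like this in standard SQL (PostgreSQL-style syntax shown; the exact statement and the default level vary by engine):

```sql
-- Default for normal transactional read-modify-write work: full isolation.
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- ... short transactional work ...
COMMIT;

-- Long-running / large data set reads: cheaper isolation, fewer conflicts.
BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- ... big reporting query ...
COMMIT;
```

    On engines with serializable conflict detection, the serializable transaction may need retry logic on serialization failure, which is one more reason to keep such transactions short.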