30 years in software development - my comments
Excellent post: Experience of 30 years in software development. Here are my comments and thoughts about it.
Specs first, then code. Well, yeah. So true. But that seems to be impossible for some people to understand. It also leads to hacked-on extensions, kludges, bad code and even worse refactoring to stay within budget and schedule. No different from building a house on an ad-hoc basis. Yet that happens all the freaking time.
Layers, yes, of course. Unless it's something really small where they would just add overhead. Of course automated, scripted tests. TDD + bad specs -> sure, there's a lot of code to be thrown away. A development project can be technically "ready" when they decide to change it radically. It depends on your point of view whether that is an extremely frustrating waste of time and energy or just a natural step in discovery and agile development. It's just like when some old legacy technology is required: it's lots of work to get it done, but was it really worth learning? Maybe. It's sometimes really hard to always see the whole project and not focus on technical details.
Future thinking, so true. That's why I usually do only the stuff that's needed. Yet modules and libraries are usually written to be as versatile as possible, but without extra cruft.
Documentation, short and to the point. That's what I like. The documentation problem, ehh. Yeah, it of course depends on how detailed the documentation needs to be. With some integrations the documentation is basically written directly from the source code, because it has to cover everything. It's like writing the same thing twice. But of course in that case, if the documentation comes first, then coding is easy.
Single-use functions and statements. Well, that comes down to the details again. If the description is ambiguous enough, then you don't need the "and". Sometimes it's really hard to decide whether to create a new function or not. I'm personally not a real fan of ravioli code. Also, during development, thanks to organic code growth, it's easy to recognize patterns that are good to move into functions. But you might not know that when you encounter the need for the very first time. Knowing the scope of the function is also hard: do you create a function inside a function, do you need to re-use it, should it be in some generic library, etc.? The same challenge applies to classes and inheritance. I'm not a fan of those either, unless there's an obvious need or benefit. Often if the project's complexity grows, then I'll do the splitting of the code. Keep it simple.
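A minimal Python sketch of the "extract once the pattern repeats" idea above. The function name and data are invented for illustration: the cleanup logic appeared inline more than once during organic growth, so it earned a named helper.

```python
# Hypothetical example: a repeated cleanup pattern extracted into a
# helper only after it had appeared in several places.

def normalize_name(raw: str) -> str:
    """Trim surrounding whitespace and collapse internal runs of spaces."""
    return " ".join(raw.split())

records = ["  Alice   Smith ", "Bob  Jones"]
cleaned = [normalize_name(r) for r in records]
print(cleaned)  # ['Alice Smith', 'Bob Jones']
```

Whether this belongs inside the calling function, in the module, or in a shared library is exactly the scoping question raised above; starting local and promoting it later keeps it simple.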
Boolean parameters. That's very clear. I personally prefer keyword arguments in such cases. That also prevents the classic failure where parameters are passed in the incorrect order, or something similarly silly. The values can also be constant keywords, which seems to be really common, or the parameters themselves can be strings, but I personally slightly dislike those options with Python. Yet they are important for avoiding "magic values" like object, 6, where 6 is "operation mode" or something similar.

Interface changes: I really love it when some project I'm widely using does that. Why do people still complain about Python 2 -> 3, isn't it trivial? Integrated documentation is the way to go; otherwise it's way too common to forget to update the documentation.

About exception handling, totally agreed. Yet that leaves the classic problem: should we let the system blow up, or just slightly misbehave? That's often very hard to answer. Just last week I added such an exception handler, because the problem was that the program didn't work at all. Now it "works", but it doesn't work correctly. - That's life. But that approach was approved by the customer. It was simple and quick.

Type-related traps: not always simple. Some say it's silly to check whether X is True in Python. But what if X is 'false' (which is True as a Boolean in Python)? Another trap is [False], which is also True as a Boolean.
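A short Python sketch of both points above; the function name and parameters are invented for illustration. Keyword-only arguments (everything after `*`) make the positional mix-up a hard error, and the truthiness traps behave exactly as described.

```python
# Keyword-only parameters guard against the "parameters in the wrong
# order" failure: they simply cannot be passed positionally.
def connect(host: str, *, retries: int = 3, verify_tls: bool = True):
    return (host, retries, verify_tls)

assert connect("db", retries=6, verify_tls=False) == ("db", 6, False)

# Passing them positionally is rejected outright:
try:
    connect("db", 6, False)
    positional_rejected = False
except TypeError:
    positional_rejected = True
assert positional_rejected

# The classic truthiness traps mentioned above:
assert bool('false') is True   # any non-empty string is truthy
assert bool([False]) is True   # any non-empty list is truthy
x = 'false'
assert (x is True) is False    # 'x is True' checks identity, not truthiness
```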
Schema - Here's what I expected to see. Use a class, always. I already commented on this one: I'll do it if it provides any extra value. I still haven't used Python data classes in production, but that's the right tool for this job, when required.
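A minimal sketch of a dataclass used as a lightweight schema; the class and field names are invented for illustration. `frozen=True` also makes the instances immutable, which fits the dislike of mutable internal state mentioned later.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    id: int
    name: str
    active: bool = True

c = Customer(id=42, name="Acme Oy")
print(c)  # Customer(id=42, name='Acme Oy', active=True)
```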
Right tool stuff, agreed.
Patching things outside your project - I know to avoid it; it's a really bad way to go. But sometimes some things aren't possible without it. As an example, you may need access to some data which simply isn't accessible via any existing APIs. Sure, it's guaranteed to be a nightmare to maintain. But if the extension / change is properly made, you can post a pull request. - This is kind of related to any dependencies which change over time; not nearly as bad, of course, but still the same: it requires maintenance in the future. That's why I avoid adding dependencies if possible.
Data flow patterns, agreed.
Design patterns, been there done that. It's ugly.
Functional programming approach ♥, agreed. I do hate complex internal state; so many bugs ooze out of it.
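A tiny sketch of the point, with invented names: a pure function's output depends only on its inputs, so there is no hidden state for bugs to ooze out of, and it is trivial to test in isolation.

```python
# Pure function: no globals, no mutation, same inputs -> same output.
def total_price(prices, vat=0.24):
    return round(sum(prices) * (1 + vat), 2)

assert total_price([10.0, 5.0]) == 18.6
assert total_price([10.0, 5.0]) == 18.6  # repeatable, nothing accumulated
```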
Cognitive cost, this is well said. Actually this is exactly why I dislike ravioli code: it adds tons of indirection. Simple code versus code art and trickery. Even if it's not mentioned in the post, this is a great question: is really powerful, complex syntax better than slightly more sparse code without that complex syntax? Hard to say. I personally often prefer the sparse code, but it doesn't make you look like a guru. Figuring out the complex hacker syntax can take considerable time if you're unfamiliar with the construction. Yet the example of using sum to count Boolean values in a list is a perfectly working hacker solution. Another thing is performance: one syntax might be 10 or 100 times slower than another solution. Does readability / performance matter? Depends on the project. And there are personal preferences as well.
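The sum-over-Booleans trick mentioned above works because `bool` is a subclass of `int` in Python (`True == 1`). Here it is next to the sparser spelling of the same thing:

```python
flags = [True, False, True, True]

# The compact "hacker" version:
assert sum(flags) == 3

# The sparse, explicit version; longer, but nothing to decode:
count = 0
for f in flags:
    if f:
        count += 1
assert count == 3
```

Both are correct; which one costs the reader less depends on whether they already know the idiom.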
The Magical Number Seven, or Four: this is exactly the reason why I dislike using functions and classes in simple cases. They just add something you'll need to look up, instead of seeing the code right there.
Shortcuts and the temptation of "easy", fully agreed. - Generally it would be very helpful to know what's going on behind the curtains. That's usually where I've found the worst performance traps ever.
Timezones, UTF-8. Fully agreed. If the timezone doesn't matter, then I'll just use UTC time without a timezone, so the timezone is always UTC unless separately mentioned. I'll never use local time. Yet I've heard that it's really confusing. Many coders like to use local time without a timezone. Sigh.
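A minimal sketch of the "always UTC" convention in Python: store and log timezone-aware UTC timestamps, and convert to local time only at the presentation edge, if ever.

```python
from datetime import datetime, timezone

# Timezone-aware UTC timestamp; never naive local time.
now = datetime.now(timezone.utc)
assert now.tzinfo is timezone.utc

# ISO 8601 with an explicit +00:00 offset, so nobody has to guess:
print(now.isoformat())
```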
Start stupid, agreed.
Logs, sure. I personally prefer writing three logs: a generic log, which describes what happened and when; an error log, which logs all errors; and a debug log, which contains debug data. Why? Well, I prefer to retain the debug log for a shorter period of time, and the generic log makes it really quick and easy to see that everything is operating normally. The error log is hopefully empty.
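A minimal sketch of that three-log split using Python's standard `logging` module; the file names are invented. One logger, three handlers filtering by level.

```python
import logging

log = logging.getLogger("app")
log.setLevel(logging.DEBUG)

generic = logging.FileHandler("app.log")     # what happened and when
generic.setLevel(logging.INFO)

errors = logging.FileHandler("error.log")    # hopefully stays empty
errors.setLevel(logging.ERROR)

debug = logging.FileHandler("debug.log")     # retained for a shorter period
debug.setLevel(logging.DEBUG)

fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
for h in (generic, errors, debug):
    h.setFormatter(fmt)
    log.addHandler(h)

log.info("service started")       # app.log and debug.log
log.debug("next: open database")  # debug.log only
log.error("connection failed")    # all three
```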
Debuggers, so much agreed. "We can't reproduce that bug, there's nothing to fix." Oh well. Such a classic story. Logging is the way to go, and adding extra debug logs is the usual way. Another classic case is where the debug logs are written only when there's an error. Well, that doesn't work out. Why? If the system truly crashes, it won't write those logs. That's why I prefer writing to the debug log what's going to happen next. If the system then blows up, I'll know which steps preceded the event. It's the same reason a watchdog is used which is reset by the process itself. "Can't we just make it so that if the process crashes, it'll restart?" Eh. - Logs are also awesome when people say something isn't working in a long chain of microservices. You can just check the debug log and ask if they would stop lying: this is what came in, at this time; this is what went out, at this time. All statuses are good, the outbound data was confirmed to be received correctly by the next module in the chain at this time, and, well, what's the problem? That has saved me so many times from bogus accusations.
Version Control System, so much agreed!
One commit per change, fully agreed. But only when the code is somewhat in order. Before that I usually prefer session commits; otherwise there would be an absolute flood of commits without any meaningful content. For sure use -p to select hunks for commits if the changes do not belong in one commit.
Organize projects by data/type, not functionality. Fully agreed. The data / type hierarchy is exactly what I've been using, because the data1 is often customer specific. Which naturally means I'll usually ship the deliverable in a package which contains only the generic + single customer specific code and leave all the rest out.
Create libraries, this one is an interesting one. Well, it seems that my approach is spot-on, because the generic part is a set of libraries which I import and which contain the most commonly required components. Of course those libraries contain classes which can be inherited, and on purpose sometimes even empty functions which can be overridden to extend the functionality of the library easily. Classic question again: inherit and override, or simply pass handles to instances providing specific interfaces to be used? I've used both approaches on different projects. Inherit, extend and override works better with really complex stuff, while just passing handles to instances is preferred in simple cases. Same thing, slightly different approach. The library approach is best for light projects, where all the inheritance and class stuff is just excess. Of course it can be done if required, but it doesn't add any production value. - Honestly, with Python projects I rarely do this. With Java projects it was the norm.
Monitoring, sure, business as usual. The monitoring data is usually split between the generic & debug logs.
Config file, agreed. Basically all of my projects have config file(s). Except the simplest ones.
Command line options, almost always. Unless the config file is used for everything required. Yet often the command line option is limited to a single-letter mode selector. This is again one case which divides people: should we use argparse or not? Of course it'll be used if the command line options are more complex than that. But checking whether the argument is a, b or c doesn't AFAIK require an option parser library.
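A sketch of that dividing line, with the argument vectors faked so it runs standalone, and the flag names invented. For a single-letter mode selector a manual check is enough; once options grow, argparse starts paying for itself.

```python
import argparse

# Manual check, fine for a single-letter mode selector.
argv = ["prog", "b"]           # stand-in for sys.argv
mode = argv[1] if len(argv) > 1 else "a"
if mode not in ("a", "b", "c"):
    raise SystemExit(f"unknown mode: {mode}")

# argparse, once there are real options to validate and document.
parser = argparse.ArgumentParser(prog="prog")
parser.add_argument("-m", "--mode", choices=["a", "b", "c"], default="a")
parser.add_argument("-c", "--config", default="app.conf")
args = parser.parse_args(["-m", "b"])
print(args.mode, args.config)  # b app.conf
```

argparse also gives -h/--help and error messages for free, which is the usual tipping point.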
Function and application composition, sure. That's always the question. In some cases it can end up as a hard-to-manage jungle, when a bunch of components or applications or microservices are just glued together to form a brittle chain. Why bother using an HTTP client library when you can just call curl? - As an example, I've even seen many companies recommending that. One pro is that you can then update the client by simply updating cURL, instead of "ouch, how do we get this project to support TLSv1.3?". - So many factors.
App composition, start stupid, agreed! I got sick and tired of the Microsoft COM Interface and Python, when the command line or any file would have been more than enough to pass the required information. The only good thing is that now I can say: it's working, I've done it, I know how it works, and it can be done in the future. But I really hope I never need to do it again. Yet of course it'll be much easier next time. This is a bit like complex interfaces and classes: it can be done, it uses a lot of time, and the end functionality will be exactly the same to the end user. But at least we can say it's done much better. - I'm also worried that the dependencies the COM Interface adds might actually negatively affect maintainability in the long term. - Many coders seem to forget that projects can be used in production for a decade or two. How are your dependencies doing during that time?
Optimization, kind of agreed. This is an extremely complex topic. I wonder what the performance of, say, a video codec would be if it were written in the simplest possible naive way, doing only what's required, completely forgetting all the design aspects and optimizations required to make it run fast. The difference can be absolutely huge. So after all I do not agree. Yet the code would be guaranteed to be simpler and easier to read. Simple, naive and easy can be nice, but it can perform extremely badly. It's also another question when optimization / performance matters. It doesn't, until it does.
Lazy evaluation, agreed. Yet again, for clarity, in some cases it's clearer to process data in distinct steps instead of passing it through a chain. As mentioned, the chain approach is usually the option I prefer when possible, especially when working with non-minimal data sets. Yet sometimes processing data from a database via a long, slow chain can cause locks to be held for a long time. Processing in blocks and buffering in memory can relieve lock contention.
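A sketch of the two styles above, with a generator standing in for a database cursor. The lazy chain materializes nothing until consumed; the buffered version reads everything up front so the source (and its locks) can be released early.

```python
def rows():
    # Stand-in for a slow database cursor.
    yield from range(10)

# Lazy generator chain: rows are pulled one at a time while consuming,
# which keeps memory flat but holds the source open the whole time.
chain = (r * 2 for r in rows() if r % 2 == 0)
assert list(chain) == [0, 4, 8, 12, 16]

# Block processing: buffer in memory first, release the source, then work.
buffered = list(rows())
result = [r * 2 for r in buffered if r % 2 == 0]
assert result == [0, 4, 8, 12, 16]
```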
Explicit is better than implicit, agreed.
The best secure way to deal with user data is not to capture it. YEAH!
Keep a record of "stupid errors that took me more than 1 hour to solve" - Hahah. - With Python it would definitely be accidentally reusing some short variable name and changing its value to something unexpected. If that happens in some function / method with complex internal state, it can take quite a while to find, and usually at the beginning you're completely baffled how the bleep this is happening; this code isn't doing that. It usually happens in some experimental code which is growing organically, adding more and more logic. But because it's experimental, it doesn't yet make sense to make it proper.
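A minimal, invented reproduction of that exact trap: a short name is set early, then a later loop quietly reuses it, so the final value is whatever the loop left behind.

```python
def report(items):
    n = len(items)       # n: the total count, needed at the end
    total = 0
    for n in items:      # bug: the loop clobbers 'n' with each item
        total += n
    # n is now the LAST item, not the count:
    return f"{n} items, total {total}"

# Expected "3 items, total 60", but the shadowed 'n' produces:
assert report([10, 20, 30]) == "30 items, total 60"
```

In a long function with lots of state, nothing here looks wrong at a glance, which is why it can eat more than an hour.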
When it's time to stop. Sure, after a certain point you'll just be doing "random things" and probably making the situation worse. Anyway, it might be fun to look at the code later and wonder why that was being done; it doesn't make any sense at all. This is exactly the state where the errors mentioned in the previous paragraph start happening and become impossible to find. If you're forced to do something with production in that state, it's extremely hazardous. Being sick at home and quickly fixing this one thing in production is where I've made my worst errors. I've had two such mishaps in my career, yet both times I was able to fix up the situation so that nobody knew it happened. But personally I was extremely alarmed, because I knew how close to a major mishap it was.
"It's really therapeutic to see that someone else also struggles with your problem, and that's not just you." - One of best statements ever. So much joy!
Hero projects, nothing new at all. They can also be a great learning experience. At work you can't build the same project designed in 10 different ways, because it would take so much time. But with your own project at home, you can. Then you can experiment with all the approaches and find the good and bad sides of every design / approach, as well as do things much better than they need to be done, just to make it work. - I'm working on one right now, upgrading the PCP project to the PCPP project. And that's not all: I'll also design and test it in at least two different ways, to see whether either design is better than the other. Purely as a hobby and out of general interest.
Blogging stupid solutions. Sure, it's the same when you look at your own decade-old code: why is this so bad? Well, see it in a positive light: the new code is much better than the old.
"my first Python project that looks like I just translated Java into Python, without the Pythonic part." - This one made me laugh again. Because I've blogged about exactly same thing, my Python looking like Java long time ago. Sometimes I feel like Pythonic part just means more complex syntax than Java's.
Finally there's the final distinction: do we need to get this done, or is this project 'academic research' where we can spend 10 to 100x the time it would take to get the job done, in order to do it in some overly complicated and technically perfect way with perfect documentation? Is the customer ready to pay for the extra days? - This is kind of related to the article's comments about cargo cult.