Friday, November 17, 2006

Enforcing Good Behavior

I hate tools that force you down a particular path and fight you all the way if you want to do something different. Every development environment and framework has its own path of least resistance, and various punishments for those who wander from it. For example, Visual Studio encourages poorly designed applications by making it trivially easy to drop database components onto a web page, wire everything on that page together via properties, and just click Run. No real-world application should be written like that: it is a maintenance nightmare. Of course, you can write well-structured applications in .NET (we do it all the time), but you have to work around some of the designers and other affordances.

I really like tools that encourage good behavior and punish bad behavior. For example, Subversion is almost perfect for Agile projects because it strongly encourages you to check in early and often. Because it doesn't lock files by default (it uses a copy-modify-merge model), any file you are working on is subject to change by another developer. If you wait too long to check in, you are punished with Merge Hell, where you have to reconcile your changes with the ones someone else made to the same file. The easiest way to avoid Merge Hell is to check in very frequently: the shorter the window between your check-ins, the less chance someone else has changed the same file in the meantime, so statistically you are much less likely to bump into merge conflicts.
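To make the "statistically" claim concrete, here is a toy model; it is an assumption for illustration, not anything Subversion itself computes. Suppose each of n other developers independently edits a given file on any given day with probability p. The chance that someone else has touched your file before you check in after d days is 1 - (1 - p)^(n·d). That is an upper bound on real conflicts, since Subversion merges non-overlapping edits automatically, but it shows how fast the risk grows with the time between check-ins. The team size and probability below are made up.

```python
def conflict_probability(teammates: int, p_per_day: float, days: int) -> float:
    """Probability that at least one teammate edits the same file within `days`.

    Toy model (an assumption, not Subversion behavior): every teammate
    independently edits the file on each day with probability p_per_day.
    """
    # Chance that nobody touches the file on any of the teammate-days
    untouched = (1.0 - p_per_day) ** (teammates * days)
    return 1.0 - untouched


if __name__ == "__main__":
    # Hypothetical team: 4 other developers, 5% chance per person per day
    for days in (1, 3, 7, 14):
        print(f"{days:2d} days: {conflict_probability(4, 0.05, days):.0%}")
```

With those made-up numbers the printed risk climbs from roughly 19% after one day to 46% after three, 76% after a week and 94% after two weeks.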

A framework that encourages good behavior is Ruby on Rails. It builds layered applications by default. In fact, you would have to fight Rails hard to build a highly coupled application. Similarly, if you don't write good unit and functional tests in Rails, you are in great danger of building a very fragile application.

Both Subversion and RoR have the right priorities: reward the Right Thing and punish the Wrong Thing.

Thursday, November 02, 2006

Entropic Software

I've reached a bit of an epiphany lately about software and complexity. "Epiphany" is probably too strong a word because it implies a sudden revelation. My thoughts about software complexity crept up on me rather than hitting me over the head. But no matter how it got here, I'm convinced that software breeds entropy. And I have examples.

If you look at information theory (the mathematics of information itself, not "information technology" in the sense of electronics), you might be startled to discover that the definitions of "entropy" and "information" are essentially the same. Both measure the relative complexity of systems. Here's an example. Compare a glass of water to a glass filled with the makings of a mud pie. Which has more information? Clearly the mud pie glass does, because it is much more difficult to describe exactly. Water is easy: "a glass full of water". A glass full of mud pie material is much harder. You have dirt, which is itself rich in information (composition, density, etc.), plus rocks and twigs (what type of rocks, what kind of twigs, etc.). From an information standpoint, the glass of mud pie has much more information. The same is true of entropy: more entropic systems have greater information density than less entropic ones. If you think of "entropy" as the movement from structure to chaos, you can see why chaotic systems, like the mud-pie glass, carry more information than structured, less chaotic ones.
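One way to put a number on the water-versus-mud-pie comparison is Shannon entropy, H = Σ p_i · log2(1/p_i) bits per symbol, where p_i is the frequency of each distinct symbol. The sketch below measures the entropy of the characters in a description, which is only a crude stand-in for "how hard something is to describe"; that proxy and the two sample strings are assumptions made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of `text` in bits per character: H = sum(p * log2(1/p))."""
    if not text:
        return 0.0
    total = len(text)
    # Each distinct character contributes p * log2(1/p), where p is its frequency
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


# Hypothetical "descriptions": one uniform substance vs. a heterogeneous mix
water = "w" * 64
mud_pie = "dirt rocks twigs clay sand leaves pebbles roots"

print(f"water:   {shannon_entropy(water):.2f} bits/char")
print(f"mud pie: {shannon_entropy(mud_pie):.2f} bits/char")
```

A uniform string scores exactly zero bits per character, because every symbol is perfectly predictable; the more varied string scores several bits, which mirrors the point that the mud pie takes more information to pin down.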

Given all that, let's talk about software. I've come to the conclusion that software wants to be complex. In other words, it tends towards entropy unless someone takes active measures to stop it. I see examples of this every day, both building and using software. Software wants, needs, strives to be complicated. I don't know if it's something inherent in having an ultimately flexible palette upon which to build things (i.e., general-purpose programming languages), something about the nature of engineering, or something about the people who are drawn to building software. Whatever causes this tendency, it must be fought assiduously at every turn.

Here's a concrete example from the recent past. The designers of Unix, a group of very smart people, had observed this tendency towards complexity and diligently fought it down. When designing the operating system's commands, they made everything as simple as they could and established simple rules about how different utilities talk to one another: everyone consumes plain text, and everyone produces plain text. Simple as it is, this is a very effective way to create modular little programs that play nicely with a whole host of other simple programs. This one idea has spawned many useful applications (built by combining simple parts) beyond what the designers anticipated. Another example of the value of simplicity is the HTTP protocol: so simple you can understand it in an afternoon, yet sophisticated enough to underpin the largest distributed environment in the universe (as far as we know), the World Wide Web.
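The plain-text-in, plain-text-out rule is what makes composition cheap: any filter can feed any other, and nobody has to anticipate the combination in advance. Here is a minimal Python analogue, assuming each "tool" is just a function from an iterable of lines to an iterable of lines. The names `grep`, `sort_lines` and `uniq_count` only mimic `grep`, `sort` and `uniq -c`; they are hypothetical illustrations, not reimplementations of those commands, and the access log is invented.

```python
from functools import reduce
from itertools import groupby
from typing import Callable, Iterable, Iterator

# A "tool": lines of text in, lines of text out
Filter = Callable[[Iterable[str]], Iterable[str]]


def grep(pattern: str) -> Filter:
    """Keep only lines containing `pattern` (plain substring match)."""
    return lambda lines: (line for line in lines if pattern in line)


def sort_lines(lines: Iterable[str]) -> Iterable[str]:
    """Emit the lines in sorted order."""
    return sorted(lines)


def uniq_count(lines: Iterable[str]) -> Iterator[str]:
    """Collapse runs of identical adjacent lines into 'count line'."""
    for line, run in groupby(lines):
        yield f"{sum(1 for _ in run)} {line}"


def pipeline(*filters: Filter) -> Filter:
    """Chain filters left to right, like `a | b | c` in a shell."""
    return lambda lines: reduce(lambda acc, f: f(acc), filters, lines)


# Hypothetical access log; the filters know nothing about each other
log = """GET /index.html
GET /about.html
POST /login
GET /index.html""".splitlines()

count_gets = pipeline(grep("GET"), sort_lines, uniq_count)
for line in count_gets(log):
    print(line)
```

None of the three filters was written with the others in mind; the shared line-of-text interface is the only contract, which is the design choice the Unix designers made.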

Here's a counterexample. When designing Office and Windows, Microsoft bumped into the same problem: we need to allow applications to talk to one another. Recreating the simple mechanism of Unix didn't seem like enough: Windows applications were event-driven, graphical, multi-threaded beasts that couldn't be bothered with simple command lines. Thus DDE (Dynamic Data Exchange) was born. DDE was a way for one binary hairball to talk to another binary hairball, so Word and Excel could send information back and forth. But, as it turns out, DDE was fragile. Both applications had to be running, and in the correct mode, to talk to one another. DDE was all about sending information, not driving the other application, and thus it was considered not robust enough. So, let's add more complexity: OLE (Object Linking and Embedding) was born. OLE allowed two things. The first was embedding one application inside another, so that the user could interact with a spreadsheet embedded in a Word document. This, by the way, is why Office document formats are so obtuse: each Office document must act as a container for any other OLE object that might be embedded in it. The second feature of OLE was the ability for one application to drive another through background commands. This aspect of OLE was split off and became COM (and its distributed cousin, DCOM). That wasn't sufficient for a variety of reasons, so we got COM+. Then .NET Remoting. Which leads us back around to Monad (or whatever Microsoft is calling it now that it's official: Windows PowerShell). Monad is a way for...wait for it...a command-line script (or batch file) to make two applications interact with one another, through COM interfaces. The idea is that you can pump some rows from an Excel spreadsheet into Outlook as email addresses and tell Outlook to send some files to the recipients.

But what is the problem we're trying to solve? Getting applications to talk to one another. I could do the same thing in Unix with a few of its tools, without all the intervening complexity. Building small modular parts with clean interfaces (the Unix way) means that I get to pick and choose the combinations I want. With Monad, the designers of the binary hairballs I need to connect must have anticipated what I want to do before I can use their hairball to do it. In other words, you cannot use Monad in any way that isn't already supported by the huge binary behemoths whose communication it facilitates.
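For comparison with the spreadsheet-to-Outlook scenario, here is how the same job might decompose the small-parts way. It is a sketch under assumptions: the spreadsheet has been exported as CSV with a column named `email`, and the column name, addresses, sender and file contents are all hypothetical. One step turns CSV text into plain addresses; a separate step turns one address plus one file into a MIME message using Python's standard `csv` and `email` modules. Actually sending the messages (for example with `smtplib.SMTP.send_message`) is left out so the sketch runs without a mail server.

```python
import csv
import io
from email.message import EmailMessage
from typing import Iterator


def addresses(csv_text: str, column: str = "email") -> Iterator[str]:
    """Step 1: read CSV text and emit one non-empty address per row."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip()
        if value:
            yield value


def with_attachment(to: str, filename: str, data: bytes) -> EmailMessage:
    """Step 2: wrap one recipient and one file into a MIME message."""
    msg = EmailMessage()
    msg["To"] = to
    msg["From"] = "me@example.com"  # hypothetical sender
    msg["Subject"] = f"File: {filename}"
    msg.set_content("Attached as requested.")
    # Adding an attachment turns the message into multipart/mixed
    msg.add_attachment(data, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


# Hypothetical spreadsheet export; the last row has no address
sheet = """name,email
Ada,ada@example.com
Linus,linus@example.com
Nobody,
"""

messages = [with_attachment(addr, "report.txt", b"quarterly numbers")
            for addr in addresses(sheet)]
for m in messages:
    print(m["To"], [part.get_filename() for part in m.iter_attachments()])
```

Neither step knows about the other, so either one can be swapped out (a different data source, a different delivery mechanism) without the other's designer having anticipated it.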

This is a good example of the way software has of becoming highly entropic. The problem is that I need two applications to send information back and forth. The simple way is the Unix way. The entropic, highly complex, fragile, limited way is to build great, complex edifices with lots of opaque moving parts. If we're ever going to produce really great software, we have to avoid entropic software like the plague it is.