Thursday, November 02, 2006

Entropic Software

I've reached a bit of an epiphany lately about software and complexity. "Epiphany" is probably too strong a word because it implies a sudden revelation. My thoughts about software complexity have more crept up than hit me over the head. But no matter how it got here, I'm convinced that software breeds entropy. And I have examples.

If you look at information theory (the mathematics behind information itself, not "information technology", like electronics), you might be startled to discover that the definitions of "entropy" and "information" are essentially the same. Both measure the relative complexity of systems. Here's an example. Compare a glass of water to a glass filled with the makings of a mud pie. Which has more information? Clearly, the mud pie glass does because it is much more difficult to describe exactly. Water is easy: "A glass full of water". But a glass full of mud pie material is much more difficult. You have dirt, which is itself rich in information (composition, density, etc.), plus rocks and twigs (what type of rocks, twigs, etc.). From an information standpoint, the glass of mud pie has much more information. The same is true of entropy. More entropic systems have more information density than less entropic ones. If you think of "entropy" as the movement from structure to chaos, you can see that chaotic systems have more information, just as the mud-pie glass has more information. The information density of highly entropic systems is greater than that of structured, less chaotic systems.
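To make the water-versus-mud comparison concrete, here is a minimal sketch of Shannon entropy, the standard information-theory measure of average bits per symbol. The two strings are made-up stand-ins for the two glasses (one repeated symbol for water, sixteen different symbols for mud), and the function name `shannon_entropy` is chosen purely for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Average information per symbol, in bits: H = sum(p * log2(1/p))."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Each distinct symbol contributes p * log2(1/p), where p is its frequency.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical stand-ins: "water" is one symbol repeated,
# "mud" is sixteen different symbols.
water = "w" * 16
mud = "abcdefghijklmnop"

print(shannon_entropy(water))  # 0.0 bits per symbol
print(shannon_entropy(mud))    # 4.0 bits per symbol
```

The uniform string needs no bits per symbol to describe once you know what it is made of, while the mixed string needs four, which is the mud pie glass being "more difficult to describe exactly" in numeric form.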

Given all that, let's talk about software. I've come to the conclusion that software wants to be complex. In other words, it tends towards entropy unless someone takes active measures to stop it. I see examples of this every day, both building and using software. Software wants, needs, strives to be complicated. I don't know if it's something inherent in having an ultimately flexible palette upon which to build things (i.e., general purpose programming languages), something about the nature of engineering, or something about the people who really want to build software. Whatever causes this tendency, it must be assiduously fought at every turn.

Here's a concrete example from the recent past. During the design of Unix, lots of smart guys observed this tendency towards complexity and fought it down diligently. To design the commands of the operating system, they decided to make everything as simple as they could, and establish simple rules about how different utilities talked to one another: everyone consumes plain text, and everyone produces plain text. While simple, this is a very effective way to create modular little programs that play nicely with a whole host of other simple programs. This simple idea has spawned many useful applications (by combining simple parts) beyond what the designers anticipated. Another example of the value of simplicity is the HTTP protocol. So simple you can understand it in an afternoon, yet sophisticated enough to create the largest distributed environment in the universe (as far as we know), the Web.
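The "everyone consumes plain text, and everyone produces plain text" rule can be sketched in a few lines. This is only an illustration of the idea, not how Unix itself works: the stage names mirror the familiar `grep`, `sort`, and `uniq -c` tools, but the functions, the `pipeline` helper, and the sample log lines are all hypothetical.

```python
from functools import reduce
from itertools import groupby
from typing import Callable, Iterable, Iterator

# Every stage obeys the same contract: lines of text in, lines of text out.
Stage = Callable[[Iterable[str]], Iterator[str]]


def grep(word: str) -> Stage:
    """Build a stage that keeps only the lines containing `word`."""
    def run(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if word in line)
    return run


def sort_lines(lines: Iterable[str]) -> Iterator[str]:
    """Emit the lines in sorted order."""
    return iter(sorted(lines))


def uniq_count(lines: Iterable[str]) -> Iterator[str]:
    """Collapse adjacent duplicate lines, prefixing each with its count."""
    for line, group in groupby(lines):
        yield f"{sum(1 for _ in group)} {line}"


def pipeline(lines: Iterable[str], *stages: Stage) -> Iterable[str]:
    """Feed the output of each stage into the next, like a shell pipe."""
    return reduce(lambda stream, stage: stage(stream), stages, lines)


# Made-up input for the example.
log = ["error: disk full", "ok", "error: timeout", "error: disk full"]
result = list(pipeline(log, grep("error"), sort_lines, uniq_count))
print(result)  # ['2 error: disk full', '1 error: timeout']
```

Because every stage shares one plain-text contract, none of them needs to know the others exist, and any of them can be dropped, reordered, or swapped without touching the rest.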

Here's a counterexample. When designing Office and Windows, Microsoft bumped into the same problem: we need all applications to talk to one another. Recreating the simple mechanism of Unix didn't seem enough: applications in Windows were event-driven, graphical, multi-threaded beasts that couldn't be bothered with simple command lines. Thus, DDE was born (Dynamic Data Exchange). DDE was a way for one binary hairball to talk to another binary hairball. Thus, Word and Excel could send information back and forth. But, as it turns out, DDE was fragile. Both applications had to be running, and in the correct mode, to be able to talk to one another. DDE was all about sending information, not driving the other application. And thus it was considered not robust enough. So, let's add more complexity. OLE was born (Object Linking and Embedding). This allowed two things. The first was embedding one application inside another, so that the user could interact with a spreadsheet embedded in a Word document. This, by the way, is why Office document formats are so obscure. Each of the Office documents must act as a container for any other OLE object that might be embedded. The other feature of OLE was the ability for one application to drive another through background commands. This aspect of OLE was split off and became COM (and its distributed cousin, DCOM). That wasn't sufficient for a variety of reasons, so we got COM+. Then .NET Remoting. Which leads us back around to Monad (or whatever Microsoft is calling it now that it's official: Windows PowerShell). Monad is a way for...wait for it...a command line script (or batch file) to make two applications interact with one another, through COM+ interfaces. The idea is that you can pump some rows from an Excel spreadsheet into Outlook as email addresses and tell Outlook to send some files to the recipients.

But what is the problem we're trying to solve? Getting applications to talk to one another. I could do the same thing in Unix, with several of its tools, without all the intervening complexity. Building small modular parts with clean interfaces (the Unix way) means that I get to pick and choose what combinations I want. Using the Monad way, the designers of the binary hairballs that I need to get talking must have anticipated what I want to do before I can use their hairball to do it. In other words, you cannot use Monad in a way unsupported by the huge binary behemoths whose communication it facilitates.

This is a good example of the way software has of becoming highly entropic. The problem is that I need to have 2 applications send information back and forth. The simple way is the Unix way. The entropic, highly complex, fragile, limited way is to build great complex edifices, with lots of opaque moving parts. If we're ever going to produce really great software, we have to avoid entropic software like the plague that it is.


Jeff Santini said...

If one can be given an epiphany, you just gave me one.

PK said...

It is for this reason that I have the following quote taped to my monitor: "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -Antoine de Saint-Exupery

nixnixnix said...

Don't get me wrong, I like RoR. But I can smell your RoR 'tude coming through on this one, and I think your reach exceeds your grasp: I think you might want to check your assumptions about complex systems. Even closed complex systems often emerge from few and very simple inputs. Wolfram's work with Cellular Automata is all about this. Similarly, knowing the rules of a system backwards and forwards in a few minutes does not necessarily mean you can grasp their interactions even in an entire lifetime. The games of Chess and Go are perfect examples of this. And while you're lumping "complexity" with "entropy", you might as well lump "interesting" in there too. A glass of water is not as interesting as a glass of mud pie, considering that the mud pie glass contains orders of magnitude more organic, heterogeneous material than the water one will.

Neal Ford said...

You say "RoR 'tude" like it's a bad thing!

Don't make the mistake of thinking that I mean that simple inputs don't generate complex systems. I, too, am familiar with Wolfram, Conway, and other fascinating work in Chaos and Complexity. I'm simply stating that software tends towards complexity all by itself, and we shouldn't shove it more in that direction when it isn't warranted. Certainly richer information sources are more interesting...if that's what you need. Complexity for the sake of complexity isn't a good thing, and it seems that software wants to go that way.

David Levin said...

I think Java (JDK) is a perfect illustration of "entropy" in software.

It started as a language with fairly simple syntax. Instead of fixing about half a million bugs and reducing complexity, Sun gave us generics, annotations, etc.