Friday, November 07, 2008

Comments == Code Smell

I am sometimes asked about my position on code comments, and, like most things, I have strong opinions about it. Two kinds of comments exist:

  • JavaDoc-style comments (which encompasses JavaDoc, XMLDoc, RDoc, etc), which are designed to produce developer documentation at a high level (class and method names and what they do)

  • In-line comments, generally scattered around the code to indicate a note from developer to developer

Both kinds of comments represent different smells, each with different odors depending on the target.

What makes comments so smelly in general? In some ways, they represent a violation of the DRY (Don't Repeat Yourself) principle, espoused by the Pragmatic Programmers. You've written the code, now you have to write about the code. In a perfect world, you'd never hove to write comments for this purpose: the code will be expressive enough that someone who reads it will understand it. Two things help achieve this result: expressive, readable languages and the composed method pattern.

The language makes a big difference as to the readability of the code. If you write in assembly language, you're pretty much forced to write comments; no one on earth can read the code directly. As languages have matured, you can get much closer to the ideal of self-documenting code (which was explicitly attempted in Literate Programming). Especially in the modern wave of non-ceremonious languages (like Ruby, Groovy, Scala, etc), you can craft extremely readable code. To this end, ThoughtWorks projects generally avoid the more magical features of languages (like the implicit global variables in Ruby, for example) because it hurts readability. One of the side effects of dynamic typing is outstanding method names. The method name is the only vector of information about what the method does (you can't crutch on return or parameter types), so methods tend to be named much better on the dynamic language projects upon which I've worked.

The second key to readable code is not to have too much of it, especially at the method level. In Smalltalk Best Practice Patterns, Kent Beck defines composed method (which I write about extensively in The Productive Programmer). Composed method encapsulates the idea that all your methods should do one and only one thing, making them as small as possible. The side effect of this discipline leads to public methods that mainly consist of calls to a large number of very small private methods, each of which do only one thing. Composed method has lots of beneficial side effects on your code: small methods are easier to test, you end up with really low cyclomatic complexity for your methods, you discover and harvest reusable code chunks more easily, and your code is readable. This last one brings us back to the topic of comments. If you use composed method, you'll find much less need to have comments to delineate sections of code within methods (actual method calls do that), and you'll find that you use better method names.

Now let's talk about how this applies to the two types of comments. First, where are comments indeed useful (and less smelly)? If you are writing an API, you need some level of generated documentation so that people can use your API. JavaDoc style comments do this job well because they are generated from code and have a fighting chance of staying in sync with the actual code. However, tests make much better documentation than comments. Comments always lie (maybe not now, but on a long enough timeline, all comments will become outdated). Tests can't lie or they fail. When I'm looking at work in progress on projects, I always go to the tests first. The comments may or may not be there, but the tests define what's now done and working.

We were on a project where the client insisted on Javadoc comments for every public class and method. We started the project adding those comments, but eventually stopped. When doing agile development, you don't want anything that hampers refactoring. Having comments in place caused a dilemma: do I refactor the method and change the comment (the most work), refactor the method and leave the old comment (with the theory being that I'll change it again later, and would rather just have to update the comment once), or not refactor? No good options here. Having pervasive comments discourages refactoring because it adds significant extra friction. On this project, we abandoned commenting as we went along. The last week of the project we literally did nothing but go back and add comments to the code, which worked well because the code base had settled down by that point. But notice what's lurking in wait: the client wanted all the comments there to make it easier to maintain in the future. But who's to say that whoever maintains that code will keep the comments up to date? You have the same DRY violation as before. If they maintain the tests, they have to change as code changes because they are executable.

The new wave of Behavior Driven Development tools (like JBehave, RSpec, and easyb make this "executable specification" style of comment + test feasible. I expect to see the usage of these tools skyrocket because they give you documentation that has a fighting chance of staying up to date.

Inline comments are almost always a smell. The only legitimate use of inline comments is when you have some very complex algorithm that you need to have some thoughts about beside the code. Otherwise, the presence of inline comments indicates that you've written code that needs explanation, meaning that it cries out for refactoring. I frequently troll code bases upon which I'm working to look for inline comments so that I can refactor the code to eliminate the need for them.

Comments are a great example of something that seems like a Good Thing, but turn out to cause more harm than good. Fortunately, we've figured out how to achieve the same benefits that comments allegedly provide with tests, particularly BDD-style tests

22 comments:

Giles Bowkett said...

This is a great post and I agree with all of it, except the idea that dynamic languages encourage better method names. That feels accurate but I can't totally get behind it because I can't articulate why it feels accurate and I'm not sure you articulate that either. Can you say more about that? Is it just because the cleaner, more flexible syntax encourages people to use more natural language?

Chrigel said...

This or course implies no comments in the tests either, eh? :)

felix said...

I generally agree with your reasoning.

However I think some comments are very useful. Class-level comments can provide a short description of the concept behind that class of objects. In a truly DDD style your build can update the project glossary regularly.

Also you might use annotation style comments on methods to produce user documentation (i.e. annotate actions with tool-tips).

I also disagree about the having to keep things up-to-date argument. If you want to produce good documentation you have to keep it up-to-date. While the fact that it's compiled and executed helps, we at ThoughtWorks have recently had a lot of arguments about acceptance test automation (the kind of test that seems well suited to serve as documentation). Bottom line being that you have to keep your acceptance test up-to-date as the system evolves.

I am not so sure about the dynamic languages thing either. One thing that is cool however is the smalltalk way of having multiple selectors between the parameters. This is the thing to strive for. The difference between the static languages like java and c# and the dynamic ones could boil down to camel case versus underscores. The underscores are easier to read for sure. However we still have a single identifier. In smalltalk we could nicely decompose the identifier.

Mark Turner said...

Giles, I think he was referring to duck typing or dynamic types that dynamic languages allow us to use. It allows you to be more descriptive with your method and variable declarations.

I suppose you could be pretty descriptive in most languages but most developers that come from C or Java tend to name their methods and variables in a very utilitarian manner, i.e. 'intData1' or 'strData2'... and of course the dreaded '$data'.

I think it comes down to the developers using more natural language like you said.

salient1 said...

I mostly agree with what you say, however, I disagree a bit in a few of instances:

1) I like class level comments. I believe in responsiblity based programming where a class has a given responsibility and frankly it's sometimes very difficult to come up with a class name that makes it perfectly clear (yes, I know...that's a smell because of course it should always be easy to come up with a name, right?). In any event, I've found people co-opting classes for purposes other than what was intended. A class comment could prevent that.

2) Javadoc comments make sense for libraries but they should only be written at the end of the coding cycle.

3) I use inline comments frequently to think through problems. In some cases I will delete the comments after I finish coding but sometimes not. It can be helpful for another developer who comes later to understand the design or flow by reading such comments. It doesn't mean they should trust the comments as gospel but it also doesn't make them worthless or not useful. I don't believe in dogma about software and unfortunately simple, absolute rules become some kind of religion regardless of whether they are always appropriate. However, for junior programmers LCD rules are the way to go. For experienced developers, being pragmatic is the only rule that matters.

p3t0r said...

I concur, but think there is one area where Javadoc/Rdoc is hugely important: Frameworks. And yes, even in that case it is mostly due to code smells and is not DRY.

sven said...

Literate Programming as I understood the concept is NOT about avoiding documentation. It is about putting documentation FIRST and code second, because making code readable for humans is the more difficult and more important thing.

It should be possible to structure your program in such a way, that a narrative appears and that the program is presented to a reader in the way that is easiest to understand. This should not be limited by the available means of the programming language.

Obviously this is more important when writing software in languages that impose structure (e.g. "declare all variables at the top of your function") than in languages that allow more freedom.

Also it is a more document-oriented approach, where it is assumed, that the narrative is created by one authorative voice and not the collaborative fashion in which programs are created nowadays.

sven said...

You state, that code should be written in such a way, that comments are not needed.
I translate this to "code should be as clear and simple as it can". But this only obviates comments that explain WHAT the code does. The more valuable comments are about WHY the code does what it does;
why did the developer choose this algorithm over that one.

And this kind of comment IMHO should not be invalidated by refactoring. Refactoring (as I understand it) only changes notation without changing visible behaviour. Thus, the design decisions should not be changed by refactoring.

Of course, if the design decisions change, the comment needs updates. But the alternative is, to NOT record the design decision.

André Faria said...

No comments. :.)
humm... just one: Very Nice!

Andy Carlson said...

On the whole I agree with the posting. There is a situation, howveer, where I find inline comments valuable. The idea isn't my own - I think it was from Steve McConnell's 'Code Complete'.

When starting work on a method with a moderately complex algorithm, I start with a completely empty method and just outline my thoughts as a sequence of inline comments. Then I go through the comments in whatever order feels right adding the code to implement them.

When finished, I usually leave the comments in because they are usually useful to me if I ever have to go back to the method later when I've forgotten exactly how it works.

Michael Speer said...

Neil using comments to sudo code a method is a great idea ... in fact I find it valuable b/c it often represents the simple step of write the code once and then re-write it cleaner. I know you support this methodology. The comments left need to be up-to-date but I find the end result of sudo coding 1st via comments and then refactoring that sudo code into real code == better code for most programmers.

That said most methods that are built will not require any sudo code comments b/c they only do one simple thing. However ultimately there is always one place in any class that requires a more complex interaction and those methods need sudo code comments.

Andreas Krey said...

@Andy: As I understand this code smell, your stepwise comments should actually be method calls so that they are no longer comments but method names.

What I don't like about this is that the methods necessarily become visible for the whole class and not only for the only user.

Howard said...

I write a lot of comments in my code. Most of it is JavaDoc (on interfaces) ... I have a lot of implementation classes that are very bare. My emphasis on comments, especially JavaDoc comments, its why a method is called and under what conditions, not particularily what it does (read the code and look at the return type, as you say).

However, when writing code that executes in a multi-threaded environment, some amount of docs need to address those circumstances.

Finally, I've recently started to use documentation annotations (rather than the more active annocations I used elsewhere in Tapestry) as a kind of "documentation macro". For example, in Tapestry IoC services have extensible configurations and there's three different types of such configurations and for all three you need to know the type of object contributed, so I know have @UsesConfiguration, @UsesOrderedConfiguration and @UsesMappedConfiguration annotations that I place on the interfaces, to guide developers who wish to make contributions to the matching services.

Nate said...

I really like the theory behind the post, however in practice I've seen a lot of things that would prevent it from working.

If we had a choice of having only well written tests or well written documentation, I would go for the tests every time. The crux being well written.

I find that most of the issues that come up maintaining code is that the previous person believes they have written code that works as intended, tests that verify this and documentation that explains the method. So now with tests and javadoc we have now repeated ourselves twice!

Original code is written to accomplish some job, and thus is the 'how' we do something. By breaking down the code to use sensible method names, we can imply a lot of 'what' we are doing, but often we are forced to use third party libraries or legacy code that doesn't conform to this.

Javadoc and Tests are both different ways of showing 'what' we are doing, but from two totally different approaches. This forces people to think 'how would I explain this to someone using human readable language' and 'how do I show someone what I am doing using code'. Just as people make mistakes with code, there is a chance that someone will write a bad test or a bad comment.

Now someone else is maintaining code and comes across a comment that doesn't seem to reflect what a method does. They have a fallback in that they can verify what they think it is really doing with the test cases, and update the comment accordingly.

In the same way, if a test starts breaking, the comment on a method can be checked to verify whether the test failed because the method has changed from its documented behaviour.

With modern IDE's, I find that people wanting to use a method will first look at the 'javadoc' that is supplied with a method, and secondly look at the code to see what something does. Very rarely does someone attempt to find the test case for that method and run it to verify the behaviour. So in this scenario comments are quite valuable.

I do agree there in an inherit problem with comments in that there is nothing to enforce them staying in sync with the code, unlike tests which enforce both compile and runtime checks. I also believe that as coders, we are better equipped to write a comprehensive unit test then comprehensive human readable documentation.

We try to get the developers in the mindset that unit tests and javadoc must be reworked in tandem. Peer reviews help to enforce this, but it's still a long way from optimal.

willCode4Beer said...

I have to disagree with you on javadocs (even some on inline comments).

Generally, (inline) comments should say why, not what. The code itself says what. It's really hard to create method and variable names that explain *why* something is done. If something is weird or squirrelly and I don't see why I'll probably change/delete the code. Communication is a good thing.

While unit tests ARE more informative than javadocs, it also takes longer to find/read them. Sometimes a one or two sentence javadoc is all that is really needed to get the job done.

Finally, I'm going to scold a little bit about the issue of refactoring and dealing with javadocs.

Following agile practices does not remove the need to do some design in the application. Even when following an agile practice, great care should be taken when creating interfaces and methods with public access.

Once something is public, you lose the ability to control it and others may be writing code that depends upon it. Refactoring these can be much more difficult because you may not always have access to the client code.

Adam Fisk said...

I like the general theory behind this post, but it's wrong. Why? Hard code happens. If you're not writing hard code, you better hope you're technology isn't your primary competitive advantage, or someone else will eat your lunch.

You mention this briefly regarding inline comments only being necessary when/if you're writing a very complex algorithm. I'd expand that to "if you're writing something complex." If you're not writing something complex, of course you don't need to comment it. Somewhere in your app, though, there sure as hell better be something complex (if you want to have a business at all), and it better be commented well (if you want it to work with each subsequent change).

It sounds like you work on a lot of contract jobs, and that's a totally different story when it comes to code maintenance. Incentives are completely different.

Mike said...

I only read the first few lines and formed a quick opinion. In terms of javadocs, they provide a means to learn about the code without reading the code. That is to say, I don't need the source to learn what a method or class does if I have the javadocs. This is obviously important when it comes to compiled code, right?

I also think javadocs have been important in the success of java, especially in the early years.

Comments as a code smell is an interesting topic that I had not considered or read about before. I plan to read the rest of the post tomorrow. Cheers!

Tor Hershman said...

I just read that you hate Xmas music and you don't like 'happy' music - you MUST visit moi's blog and music sites.

http://www.soundlift.com/band/music.php?song_id=82930

Russ Gray said...

"Inline comments are almost always a smell. The only legitimate use of inline comments is when you have some very complex algorithm that you need to have some thoughts about beside the code."

I just can't understand this fairly extreme viewpoint. Inline comments are not a violation of DRY - if they are, you're doing it wrong. As a few others have said, comments explain the why, not the how, and therefore are not a repetition.

It's also untrue to say that inline comments are only justifiable with extremely complex code. Take this very simple snippet, for instance:

request.Description = description.Substring(0, 30);

You don't need a comment to explain what that's doing, but without a comment you can't see why. Maybe it's to conform with the structure of a wire protocol. Maybe it's because the data must be valid for an old legacy binary storage format with a 30-char limit on fields. Maybe it's because of a display limitation. I can't take seriously the suggestion that this be wrapped up in a method called TrimDescriptionTo30CharsToConformToLegacyBinaryStorageFormatX, so what else are you going to do?

Moandji Ezana said...

@russ grey:
"I can't take seriously the suggestion that this be wrapped up in a method called TrimDescriptionTo30CharsToConformToLegacyBinaryStorageFormatX, so what else are you going to do?"

You could get your code to look something like this:

request.description = description.toFormatX()

Roberto Iván Bravo said...

@"Moandji Ezana"
Do you mean to code only for documentation thus get code without real functionality?
"Substring" method exists before. Again, do you mean encapsulate it only because you need to document something?

Cheers

Moandji Ezana said...

@Roberto Iván Bravo: no, I meant adding a method called toFormatX(). If that method can't be added to description, you could have a method like toFormatX(description).