The other week Jeff Atwood posted a blog entry named Coding Without Comments that stated... well, I'm actually not sure. It started by making the valid point that comments should explain why your code works the way it does and shouldn't be needed to explain what it does, but then it went entirely off track and ended up pretty much stating that all comments are detrimental to code quality (emphasis Jeff's):
While comments are neither inherently good or bad, they are frequently used as a crutch. You should always write your code as if comments didn't exist. This forces you to write your code in the simplest, plainest, most self-documenting way you can humanly come up with.
The example used in the blog post was refactoring some code that calculated a square root using the Newton-Raphson approximation, to make it clearer how it worked, which is all very useful. The problem is that there are a lot of different ways to calculate a square root, each making trade-offs in speed, accuracy, memory usage and simplicity, and by the end of the post we still didn't know why the Newton-Raphson method was chosen in the first place, because no comment carried that information. And I really doubt this is something you want to encode in the method name:
var sqrt = NewtonRaphsonApproximationBecauseItConvergesQuicklyWithAGoodInitialGuess(n);
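By contrast, here is a minimal sketch of how that reasoning could sit in a comment while the method name stays short. This is not the code from Jeff's post: the class name, method name, tolerance and choice of starting guess are all assumptions made for illustration, and the rationale in the comment is simply the one spelled out in the joke method name above.

```csharp
using System;

// Hypothetical names throughout; a sketch, not the code from the original post.
public static class SquareRoot
{
    // WHY Newton-Raphson: it converges quickly given a good initial guess,
    // and it needs no lookup tables. (Illustrative rationale only; other
    // methods trade speed, accuracy, memory usage and simplicity differently.)
    public static double Approximate(double n, double tolerance = 1e-12)
    {
        if (double.IsNaN(n) || double.IsInfinity(n) || n < 0)
            throw new ArgumentOutOfRangeException(nameof(n), "n must be finite and non-negative.");
        if (tolerance <= 0)
            throw new ArgumentOutOfRangeException(nameof(tolerance), "tolerance must be positive.");
        if (n == 0)
            return 0;

        // Assumed starting guess: any value at or above the true root, so the
        // iteration approaches the root from above and only ever decreases.
        double guess = n >= 1 ? n : 1;
        while (true)
        {
            double next = 0.5 * (guess + n / guess);

            // Stop when the step is within the relative tolerance, or when
            // rounding means the estimate can no longer decrease.
            if (next >= guess || guess - next <= tolerance * next)
                return next;

            guess = next;
        }
    }
}
```

Calling `SquareRoot.Approximate(n)` reads just as plainly as the refactored version, and the reason for choosing the algorithm is still there for whoever maintains it next.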
Clearly the claim that you should write code as if comments didn't exist is a fallacy, because comments are necessary to explain why you wrote it that way (or, indeed, why you didn't write it another way). But what about my claim that code without comments doesn't even work? It comes down to the probability that code without any comments has never been used in the real world.
When you start writing code, you write it to meet a specification, using the available documentation for any APIs you're calling, and it all looks right. Everything has been code reviewed by at least one other person, you've added argument checking and assertions about your current state to every method, and run plenty of unit and system tests against it, so you know that when it goes into production everything will work perfectly.
Except it doesn't.
One example at blinkBox is the frame-grabbing code, which uses DirectShow to grab a frame and then loads the resultant byte array into a Bitmap class. This worked perfectly during testing, and then suddenly in the live environment we kept getting an error with one particular file, where the Bitmap constructor claimed the bytes were invalid. Just to see what the bits would look like if we could load them into the image, I tried using LockBits and an unsafe memory copy into the bitmap's memory area; the result was a skewed black-and-white image. Noticing that the stride of the image was not native-pointer aligned, and knowing that many APIs prefer aligned data, my colleague Jon and I set about padding the data to an aligned stride and then trying to use that as the image data. The constructor still wouldn't accept it, but using the unsafe memory copy and then trimming the width to the original size produced a perfect image. I still don't know why the data is invalid, or why this hack works, but we've found that we need it for about 1 in 500 source files.
There's simply no way that you could predict this, or even find it without huge quantities of test data. Putting that sort of hack in the exception handling block without a comment would be inexcusable: the comment needs to explain the empirical reasoning behind it, and the fact that it may not be perfect because we don't even know why it really works. I think the comment came out at around 20 lines, including ASCII art.
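To give a flavour of the padding step, here is a rough sketch of copying rows into a buffer with an aligned stride, with the kind of comment I mean. It is not the actual blinkBox code: the class and method names are hypothetical, the alignment is passed in as a parameter rather than assumed, and the DirectShow, LockBits and width-trimming parts are left out entirely.

```csharp
using System;

// Hypothetical helper; a sketch of the padding step only.
public static class FrameBuffer
{
    // Rounds a row stride (in bytes) up to the next multiple of the alignment.
    public static int AlignStride(int stride, int alignment)
    {
        if (stride < 0) throw new ArgumentOutOfRangeException(nameof(stride));
        if (alignment <= 0) throw new ArgumentOutOfRangeException(nameof(alignment));
        return checked((stride + alignment - 1) / alignment * alignment);
    }

    // WHY this exists: for roughly 1 in 500 source files the grabbed frame has
    // a stride that is not aligned, and the image only loads correctly once
    // each row is padded out to an aligned stride. We do not know why the
    // original data is rejected or why padding fixes it; this is an empirical
    // workaround and may not be complete.
    public static byte[] PadRows(byte[] source, int sourceStride, int height, int alignment)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (sourceStride <= 0) throw new ArgumentOutOfRangeException(nameof(sourceStride));
        if (height < 0) throw new ArgumentOutOfRangeException(nameof(height));
        if (source.Length < checked(sourceStride * height))
            throw new ArgumentException("source is too short for the given stride and height.", nameof(source));

        int paddedStride = AlignStride(sourceStride, alignment);
        var padded = new byte[checked(paddedStride * height)];

        // Copy each row to the start of its aligned slot; the padding bytes
        // at the end of each row stay zero.
        for (int row = 0; row < height; row++)
        {
            Buffer.BlockCopy(source, row * sourceStride, padded, row * paddedStride, sourceStride);
        }

        return padded;
    }
}
```

For example, two 3-byte rows padded to a 4-byte alignment come out as two 4-byte rows, each ending in a single zero byte. Without the comment, the next person to read this would quite reasonably delete it as pointless copying.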
As our codebase matures, there are more and more parts of it that have long comments explaining why things are implemented the way they are, what alternatives were considered and discounted, what circumstances were observed in the live environment that cause this edge case or that edge case or the other edge case, and why the code looks wrong but needs to be that way. Some particularly tortuous parts have significantly more comments than code itself.
Code without this type of comment is code that has never been hardened by real-world use; code that has never hit the 0.1% edge cases; code that simply doesn't work properly. I'm sure Jeff will find this out once he's had StackOverflow up and running for a few months.
Posted Aug 04 2008, 08:58 PM by Greg Beech