Greg Beech's Website

Even good Hungarian notation is still bad

I was bored this weekend, so I ended up trawling through a bunch of blog archives and came across posts from some well-respected people about why they believe Hungarian notation (in its originally intended form) is a good thing. There are entries from Eric Lippert, Joel Spolsky and Larry Osterman, which all make essentially the same point: Hungarian notation was originally intended to reflect the purpose of the variable, not the underlying data type, and in this form it is a very valuable naming convention. I’d have to agree. Sort of.

One of the examples cited in all three of these entries (in different forms) is concerned with ensuring that two integers with different purposes are never added. For example, if you have an integer that represents a byte array size then it would begin with “cb” (count of bytes), whereas if it was an index into an array it would begin with “i” (index). So if you ever saw “cbValue = cbMyArray + iMyArray” then it would be instantly obvious it was wrong, as there’s no good reason to add an index to an array size.

But the problem is that it isn’t instantly obvious; it’s only instantly obvious if you’ve learned the conventions that make it obvious. At least the prefixes in the array example are fairly widely used, but an example quoted by Joel for websites advocates prefixing strings which haven’t been HTML-encoded with “us” (unsafe) and those which have with “s” (safe) to ensure that unsafe strings aren’t inadvertently written to the page output. Anyone who hasn’t read the documentation for these prefixes (because everybody reads documentation, right?) will have no idea what they mean and will probably just assume that “s” stands for “string” and that “us” stands for some kind of unsigned string, whatever that might be!

So that’s why I sort of agree. I completely agree with the sentiment that the naming convention needs to take into account the purpose of the variable, but it needs to be done in a transparent way that doesn’t require the developer to learn any number of arcane prefixes. If you are looking at somebody else’s code to try and fix a bug, which of these two equivalent lines of code adding integers is more obviously wrong and what is the error?

cbFoo = cchBar + cbBlah;

bytesInFoo = charsInBar + bytesInBlah;

In the first line if you have learnt the convention that “cb” means count of bytes, and “cch” means count of characters then it’s instantly obvious that something is wrong as the two are not equivalent in this modern age of multi-byte character sets. But it’s only obvious if you have learned the convention. In the second line, you don’t need to learn any conventions to see the error because it’s spelled out for you in plain English. Sure, it’s a little wordier, but you’ve got an auto-complete editor so you only need to type the first couple of characters in either case.
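To make the mismatch concrete, here is a minimal C++ sketch, assuming the text is UTF-8. The helper names countUtf8Chars and bytesInConcatenation are made up for illustration and don't come from any library. It shows that the same string can have different byte and character counts, which is why adding one to the other is suspect.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes, which always have the bit pattern 10xxxxxx.
std::size_t countUtf8Chars(const std::string& utf8Text) {
    std::size_t charCount = 0;
    for (unsigned char byte : utf8Text) {
        if ((byte & 0xC0) != 0x80) {
            ++charCount;
        }
    }
    return charCount;
}

// Hypothetical helper: bytes needed to hold both strings back to back.
std::size_t bytesInConcatenation(const std::string& bar,
                                 const std::string& blah) {
    const std::size_t bytesInBar = bar.size();    // std::string::size() is in bytes
    const std::size_t bytesInBlah = blah.size();
    // Bytes plus bytes: the units match. Writing
    // countUtf8Chars(bar) + bytesInBlah here would mix units, and the
    // verbose names would make that visible without any prefix table.
    return bytesInBar + bytesInBlah;
}
```

For a string holding "naïve" encoded as UTF-8, size() reports 6 bytes while countUtf8Chars reports 5 characters, so a buffer sized from the character count would come up one byte short.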

So we’ve established that we do need a coding practice of embedding the purpose of variables in their names, but I believe it’s better to do it in a plain-text verbose way rather than with arcane prefixes. And I’ve got the .NET Framework naming guidelines’ backing here:

Do choose easily readable identifier names. For example, a property named HorizontalAlignment is more readable in English than AlignmentHorizontal.

Do favor readability over brevity. The property name CanScrollHorizontally is better than ScrollableX (an obscure reference to the X-axis).

Do not use Hungarian notation.
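Joel's safe/unsafe string example can follow the same guidelines. Here is a rough C++ sketch of how that might look. The htmlEncode helper is hypothetical, written out only to keep the example self-contained (a real site would use its framework's encoder), and the variable names are just one possible choice.

```cpp
#include <string>

// Hypothetical helper: replaces the characters that are significant in HTML
// with their entity equivalents.
std::string htmlEncode(const std::string& rawText) {
    std::string encodedText;
    encodedText.reserve(rawText.size());
    for (char character : rawText) {
        switch (character) {
            case '&':  encodedText += "&amp;";  break;
            case '<':  encodedText += "&lt;";   break;
            case '>':  encodedText += "&gt;";   break;
            case '"':  encodedText += "&quot;"; break;
            case '\'': encodedText += "&#39;";  break;
            default:   encodedText += character; break;
        }
    }
    return encodedText;
}

// Builds a greeting fragment. The parameter name states that the input has
// not been encoded yet, so concatenating it straight into the markup would
// look wrong to a reader who has never seen a prefix table.
std::string buildGreetingHtml(const std::string& unencodedUserName) {
    const std::string htmlEncodedUserName = htmlEncode(unencodedUserName);
    return "<p>Hello, " + htmlEncodedUserName + "</p>";
}
```

Nobody is likely to read unencodedUserName as an unsigned string.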

The problem is that this verbose style of naming needs a name of its own. “Hungarian” was chosen as the original designation for the prefixed naming convention because its inventor Charles Simonyi was Hungarian and because it looked a bit like a foreign language with all the random letters. I propose “German” notation for the new verbose method, as the German language tends to concatenate a lot of words to form a single one. Unfortunately, I didn't invent this form of notation, so I doubt the name will stick...


Posted Feb 03 2007, 02:38 PM by Greg Beech

Comments

Peter Ritchie wrote re: Even good Hungarian notation is still bad
on 05-23-2007 2:16 AM

Agreed, I prefer barByteCount over cbBar. Add to that the fact that if you work on a team with components written in VB and C++, on the VB side "cb" might mean combobox, whereas in C++ it might mean count-of-bytes.

Back when C first came out there was an initial limitation that only the first 8 chars of an identifier were used to uniquely identify it (a limit that was expanded many times over the years), so Hungarian notation made some sense back then.

For some people it's become habit, especially the folk with a C++ background. In C# there's no need to use identifier prefixes; in C++ this isn't necessarily the case. In C# you can write something like this:

internal class Entity {
    private int value;

    internal Entity(int value) {
        this.value = value;
    }
}

...using the this keyword to distinguish the member from the parameter. C++ isn't as flexible: if you want the freedom to give parameters any name you like, you'd run into a conflict in this case. Hence prefixes like "m_" were conceived, and you get something like:

class Entity {
private:
    int m_value;

public:
    Entity(const int value) : m_value(value) {}
};

...to avoid parameter name clashes with member names.

Hungarian notation is fraught with problems: it's subjective, it's impossible to enforce, it makes refactoring more complex, it has been made obsolete by contemporary IDEs, and it's completely unstandardized (VB programmers use different, conflicting prefixes from C programmers).

Mitch wrote re: Even good Hungarian notation is still bad
on 07-05-2007 9:10 PM

"Anyone who hasn’t read the documentation for these prefixes (because everybody reads documentation right?) will have no idea what they mean and probably just assume that “s” stands for “string” and that “us” stands for some kind of unsigned string, whatever that might be!"

Wow, that's horrible. What is a programmer doing reading code first without the documentation? The line "cbFoo = cchBar + cbBlah;" looks wrong even if you don't know what cb or cch means. At the very least it will prompt one to go look for documentation or ask someone.

That's like saying the word "is" shouldn't be used because very few people can explicitly and concisely define the word. Usage is what's important.

I'm talking about the real world though, where people don't always read documentation, and where you have pressure from management to get things done rather than going off to find documentation about the naming conventions - which a lot of the time doesn't exist anyway.

Not sure I understand your analogy? What I'm saying is that it's better to use full words to denote purpose than acronyms or abbreviations that may not be widely known, or may be misinterpreted. For example, is "cb" count of bytes, or a combo box? It means different things to different people - Greg

Greg Beech's Tech Blog wrote To var or not to var, implicit typing is the question
on 03-24-2008 10:56 PM

The introduction of the var keyword in C# 3.0 was required to support anonymous types, however it may

Copyright (C) Greg Beech. All rights reserved.