mamamatison Jul 15 2021 at 10:29

Guide to naming in code




Summary

We present a guide to naming entities in code, based on viewing naming from the perspectives of semantic space, design, and readability. The main idea is that naming should be considered not as the creation of tags but as a fundamental part of the design process, which implies an integral and consistent vocabulary. We discuss the naming process and naming formalism from these perspectives and provide guidance for practical use. The work is based on 15 years of experience in engineering, coding, and development management in high-tech industries.

Names in Software Engineering
Principles of Name Design
Bad Naming
Naming Process
Fast Method
Long Method
Formatting
Details & Examples
Final Remarks

Names in Software Engineering

Let us consider what a programmer does. He models a real-life situation to create a program that adds value to it.

While doing this, he extracts some things from the situation, conceptualizes them, and names them. For example, we see a wooden construction, form the concept of a table, and name it with the word “table”. Note that all three (the thing, the concept, and the word) are different.

In all this work, a programmer uses some language with a certain vocabulary to describe the situation. This vocabulary defines and reflects the meaning we put into our program, so it also determines whether the program models the situation well enough to add value to it. The vocabulary, in turn, can be a source for building the names of the parts of a program.

We see three dimensions meeting here: the world, thinking, and the artifact. These dimensions are pairwise connected.

We also see two levels of abstraction meeting here: the low level, dealing with a part, and the high level, dealing with the whole. These levels are also interdependent.

This can be visualized as a triangle model in which the entities are pairwise connected.

A name is a reflection of meaning and design. When you name properly, you think properly, design properly, and use properly. Bad names lead to improper usage, bad design, and bugs.

Programming is also the creation of texts. Other programmers, or the same programmer in the future, read the text and decode its concepts. Names are an essential part of conveying the proper meaning, so that the actions of future maintainers do not harm the integrity of the program.

This shows that naming is an essential part of the whole software engineering process, and the design of names impacts everything else in it.

Principles of Name Design

Thus, the design of names, the design of the vocabulary, and the design of the model should be aligned, and so should the principles we use for them.

In software engineering, as in any engineering discipline, in most cases a model is mechanical, i.e.

a system is a set of separate parts, each having its own concrete function.

An engineer needs to limit the number of entities he works with simultaneously because of the limits of the human brain, so he usually creates generalizations and a hierarchy in a model, which implies:

one entity has one name,
vocabulary has hierarchical structure,
no overlaps between names at the same hierarchy level.

Usually, an engineer works with other engineers, so

the vocabulary should be shared between all the engineers.

Try to apply these general principles of software design to naming:

consistency between parts of an integral system;
proper level of specification/generalization;
proper level of modularity/coupling;
maintainability and cognitive-friendly interface;
future-awareness – the need to design with the evolution of the system in mind;
support for various scenarios and corner cases.

As a program is a text, we need to make names as easy to read as normal text, which improves understanding while reading. This means we need to prefer English over formal language, and full words over excessive abbreviation.

Bad Naming

Just as it is usually not easy to fix an architecture, it is not easy to fix a vocabulary. In most cases, the terms you use to describe your system cannot be fixed by a simple Find/Replace. They affect maintainers and lead to wrong derivative changes in other names and in the structure.

Names can be bad if they mislead or can mislead, are ambiguous, are too specific or too general, take time to understand, are inconsistent with the context, other names, or a convention, or describe too little or too much.

Names can also be bad because of bad design, for example, if the same entity has different meanings in different contexts. In this case, to fix the name you need to redesign the entity.

Small pieces of code are no exception: bad names are harmful there too. It is better to speak the same language in all parts of the integral space of your program.

Naming Process

Naming can be approached as a process of finding the meaning of entities and expressing it clearly within a set of other names.

Since you spend time on design in general, you need to make time for name design as well. Reluctance to do the latter is an inconsistency that has no excuse, because it harms the design and the semantic space.

Engineering work consists of iterations. The knowledge of what you work with and what is to be done increases gradually.

Speculative (mental) experiments are naturally used more for naming, since they are faster and precede more expensive physical experimentation. Also, the impact of names is long-term and cannot always be easily tested right away.

Names are derived from the terms in the vocabulary you use to describe your model. So the naming process starts with understanding which terms are most appropriate for your model to reflect the real-life situation you work with.

Terms are more heavyweight than names, so more time should be spent on them. This is when you create the value of an insight into the problem and its solution.

Terms also define the ability of a solution to evolve when the real-life situation evolves. Matching the key things of the situation with the terms you use in your solution makes the solution robust.

Concrete names should be built from terms in a more formal way, using conventions.

Fast Method

The fast method is for writing a draft or a small part of already established code.

Describe the entity “using your own words”
What is your intention? How would you use it?
Use terms from the context/existing vocabulary to compile a derivative name
Do not create synonyms and do not use different wording for the same thing.
Check if the entity can be properly understood from the same line it will be used in
Do not assume that the reader has to go to the definition or somewhere else to understand it correctly. Switching between contexts is for finding details, not for finding the idea. Switching to another context makes you lose the original one.
Ask yourself: can others misinterpret the name?
Even if misinterpretation happens in only 5% of cases, it may lead to hours of debugging and worse maintainability.
Check readability; prefer English over formal language.
A text that can be easily read is easier to understand.
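To make the fast method concrete, here is a minimal Python sketch of it applied to one hypothetical helper. The intention, stated in our own words, is "the total price of the items in the cart, including tax". The existing vocabulary is assumed to already contain `cart`, `item`, and `tax_rate`, so the name is compiled from those terms instead of introducing synonyms such as `basket` or `vat`. All names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical domain type; "item" is assumed to be an existing term.
    price: float
    quantity: int


# Draft name before applying the fast method: calc(c, t)
# Name after applying it: compiled from the existing terms "cart" and "tax".
def cart_total_with_tax(cart: list[Item], tax_rate: float) -> float:
    """Total price of all items in the cart, including tax."""
    subtotal = sum(item.price * item.quantity for item in cart)
    return subtotal * (1 + tax_rate)


cart = [Item(price=10.0, quantity=2), Item(price=5.0, quantity=1)]
# The call site can be understood from this line alone.
total = cart_total_with_tax(cart, tax_rate=0.5)
```

The last line passes the "same line" check: a reader does not need to open the definition to guess what `total` holds.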

Long Method

Hard cases usually mean there are flaws in the design and in the understanding of how the system works (or should work). Instead of thinking “let us leave it as is and do some real work”, treat spending extra time here as a natural need.

Create a long description of the entity “using your own words”
What is your intention? How would you use it?
Write the long description down, look at it.
Make the description precise, i.e. create a definition
What is the entity’s place and scope in the system? How is it different from the other entities in the same system, and from the entities in the context where the system is used (if it is visible outside)? Do you put in more specifics than needed? Do you generalize what you should not? Usually you raise the level of abstraction above the concrete values of the entity.
Write the precise description down, look at it.
Create variants of long names
Try to put the essence of the definition into a few words. Use English, not formal language, here. Create multiple variants.
Write the list down, look at it.
Put the names in their usage context in code, look at the code.
Check vocabulary
Check consistency with other parts of your code. Do you create a synonym? Do you use terms that may not be understood by a newcomer reading your code 10 years later? Is it easy to read? Can it mislead someone even if it is formally perfect? How much time is required to figure out the correct meaning of the variable? A reader should be able to read your code as fast as normal text.
Consider context to shorten names
What will be understood from the usage and so can be omitted? What can be omitted without losing the ability to distinguish the entity from other entities?
Several words are OK if needed: if you move 30 lines into a new function, it is OK to spend a few more words to describe it in its name.
Put the shorter names in their usage context in code, look at the code.
Run a survey to decide on a winner
The survey can be a mental experiment or a physical one. For the mental experiment, you need to compare the names as other code users would. Both ways have limitations. If you ask people, they will be unable to tell in a short time whether what you suggest is aligned with your design and purpose. Mental experimentation requires skill and is usually biased by your own opinion.
Sometimes naming analysis leads to redesign; do it
This is a natural consequence of you getting more knowledge of the system you work with, while finding meaning of entities.
A single entity with multiple responsibilities is often hard to name and maintain. This may happen if your entity is not logically integral. In this case, you may split it into parts.
Create an appropriate comment for complex entities
It may happen that a name cannot hold everything you want to share with a reader. Add a comment with a full description.
Non-descriptive names should be rare, negotiated conventions
You may agree with your colleagues to use a name for an entity even if it cannot be understood without additional knowledge. Abbreviations, for example, fall into this category.
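Splitting an entity that resists naming, as suggested in the list above, might look like the following minimal Python sketch. The function names and the `key=value` config format are hypothetical; the point is that a name like `parse_and_validate_config` needs an "and", which hints at two responsibilities.

```python
# Before: one function with two responsibilities, hard to name without "and".
# def parse_and_validate_config(text): ...


def parse_config(text: str) -> dict[str, str]:
    """Parse hypothetical 'key=value' lines into a dictionary."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def missing_required_keys(config: dict[str, str], required_keys: list[str]) -> list[str]:
    """Return the required keys that are absent from the config."""
    return [key for key in required_keys if key not in config]
```

Each part now has one responsibility and a name that describes it completely.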

Formatting

As code is a text read by people, we may reuse the principles of writing natural-language text for code. In natural-language text, words in a line are separated by spaces of the same width, a comma is followed by a space and has no space before it, and so on, and this applies to all paragraphs. It is better to do the same in code.

The broken windows principle applies to name formatting. If a window of a car parked in an unsafe district is broken, after some time the car will be left without its wheels as well. Something done improperly sets a culture of negligence and pollutes everything around it.

Stick to the style preferred in the code you change. If you begin a new module, you have more freedom of choice. The main name formatting styles are:

snake_case – fluently readable as underscore is very similar to a space;
CamelCase – readable, capital letters naturally attract attention;
camelCase – readable, can be used to distinguish from CamelCase.

Convention and consistency in naming open up a very useful possibility: encoding more meaning in names using styles. Styles can be used to differentiate entities by their kind, which increases readability a lot, e.g.:

snake_case for variables, CamelCase for classes and functions;
snake_case for variables, CamelCase for classes, camelCase for functions.

camelCase for variables, CamelCase for classes and functions.
snake_case for variables and functions, CamelCase for classes in Python
snake_case for almost everything in C++ STL
QCamelCase for classes and camelCase for methods in Qt
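As an illustration of encoding the kind of entity in its style, the Python convention listed above (snake_case for variables and functions, CamelCase for classes, as recommended by PEP 8) might look like this minimal sketch; all names are hypothetical.

```python
class JobQueue:  # CamelCase: a class
    def __init__(self):
        self.pending_jobs = []  # snake_case: a variable (attribute)

    def add_job(self, job_name):  # snake_case: a function (method)
        self.pending_jobs.append(job_name)


job_queue = JobQueue()  # snake_case instance of a CamelCase class
job_queue.add_job("render_report")
```

Because of the style alone, a reader can tell at a glance that `JobQueue` is a type and `job_queue` is a value.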

Hungarian notation puts the variable type into every variable name. If you create a new style, avoid that: names are for meaning, not for implementation. However, if a type is part of the meaning, it can be in the name. For example, if you create an instance of a class, the class name is usually already abstracted from the implementation well enough to be used in the instance name.

Some styles use _ prefixes and suffixes to mark non-public members of classes. This is useful because it shows the scope in which a name is used, which helps to understand its impact and dependencies:

_private_function # in Python and Dart
int private_member_; // Google C++ Style Guide
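In Python, a single leading underscore is only a convention: the interpreter does not enforce privacy, but readers and linters treat the name as internal. A minimal sketch with a hypothetical `TemperatureLog` class:

```python
class TemperatureLog:
    def __init__(self):
        self._readings = []  # non-public: only methods of this class should touch it

    def record(self, celsius: float) -> None:
        self._readings.append(celsius)

    def average(self) -> float:
        return self._sum() / len(self._readings)

    def _sum(self) -> float:  # non-public helper, not part of the class interface
        return sum(self._readings)


log = TemperatureLog()
log.record(20.0)
log.record(22.0)
```

A reader who sees `_readings` knows that checking this class alone is enough to find every place that changes it.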

Details & Examples

Meaning should be understood from the same line. Switching contexts takes time and distracts from thinking about the topic. Do not expect the reader's eyes to jump back and forth as the standard way to understand your code. Do not make the correct meaning understandable only from the definition: people are unlikely to go to the definition and will work with their guess instead.
If the meaning cannot be understood from the same line, make it explicitly visible. Place a comment describing what is happening and why, and provide a reference for further reading if needed.
Do not use abbreviations of one or two letters per word, even in a very local scope. Eventually the scope may grow, and even if it does not, such code is hard to understand from the line alone: you need to go back to the definitions of these variables and spend extra time to understand correctly what is going on. Abbreviations that are agreed conventions are an exception, but these should be rare.
Bad: wcf.add(cr);
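The same contrast in Python, as a minimal sketch. The expansions of `wcf` and `cr` below are hypothetical guesses, which is exactly the problem: the abbreviated line does not say what it means.

```python
# Bad: the reader has to find the definitions to guess what this does.
wcf = set()
cr = "settings.ini"
wcf.add(cr)

# Good: the same line is understandable on its own.
# (Hypothetical expansions; the original abbreviations do not reveal their meaning.)
watched_config_files = set()
config_path = "settings.ini"
watched_config_files.add(config_path)
```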
Prefer English over formal notation (but keep it formally correct). Name things so they can be read naturally in English. Avoid the compound noun form if it can be misleading.
Bad: bool selection_all() const;
Good: bool all_selected() const;
Bad: file_unable_to_parse
Good: unable_to_parse_file
Bad: DesignTypeNeedToOpen();
Good: TypeOfDesignToOpen();
Bad: SetAbstractGeometryStatus(); // What is an abstract geometry status? The status of an abstract geometry? An abstract status of a geometry? Something else? Just by reading the name you cannot understand what it really is and what it is for. In most cases this means the name is wrong.
Narrow the scope of usage and visibility. Create variables as close to their usage as possible. This improves readability and decreases coupling in the code. It also allows simpler names, since a narrow usage context requires fewer specifics to distinguish entities from each other, and it tells the reader that there is no need to check whether the variable is used anywhere else. Use code blocks and namespaces for that.
Violating this rule harms code understanding and design, since others may use the variables in unintended ways, creating more coupling between modules than needed.
Do not create synonyms: use one term for everything related to what this term means. Name one entity the same way everywhere. This way people will understand the code faster and will pay more attention to differences in names, expecting the meaning to differ as well.
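The `all_selected` predicate from the examples above translates directly to Python. In this minimal sketch (the `Selection` class is hypothetical), the call site reads as an English phrase: `if selection.all_selected():`.

```python
class Selection:
    def __init__(self, item_count: int):
        self.item_count = item_count
        self.selected_items = set()

    def select(self, index: int) -> None:
        self.selected_items.add(index)

    def all_selected(self) -> bool:
        # Reads naturally at the call site: "if selection.all_selected(): ..."
        return len(self.selected_items) == self.item_count


selection = Selection(item_count=2)
selection.select(0)
```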
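Python has no block scope (a variable assigned inside a loop is still visible after it), so the closest equivalent of a narrow code block is a small helper function. A minimal sketch with hypothetical names:

```python
def long_word_count(line: str, min_length: int) -> int:
    # `words` exists only inside this helper, so nothing else can depend on it,
    # and a short name is enough to distinguish it.
    words = line.split()
    return sum(1 for word in words if len(word) >= min_length)


def total_long_word_count(lines: list[str], min_length: int) -> int:
    return sum(long_word_count(line, min_length) for line in lines)
```

A reader of `total_long_word_count` does not need to check whether `words` is reused later, because it cannot be.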
Bad: char console[] = "STD_OUT";
Good: char std_out_name[] = "STD_OUT";
Bad:
QLineEdit* rundir_editor_ = nullptr;
QPushButton* select_dir_ = nullptr;  // is dir different from rundir?
Good:
QLineEdit* rundir_editor_ = nullptr;
QPushButton* select_rundir_ = nullptr;
Bad: return RecentMenuItemTextAndTooltip(mru_text, menu_item_tooltip); // all names refer to one entity using different wording
Do not use shortened words as the main method. It leads to an overly relaxed attitude to naming and indulges the reluctance to find simpler and shorter terms. It also adds cognitive complexity, because it leads to non-intuitive abbreviations.
Remove unnecessary prefixes and postfixes.
Bad: bool is_visual_mode() const { return visual_mode_; }
Bad: bool get_visual_mode() const { return visual_mode_; }
Good: bool visual_mode() const { return visual_mode_; }
An object should be named after its purpose and data, not after its type.
Bad: Coloring coloring_;
Good: Coloring custom_colors_;
Use an inline comment to identify the meaning of arguments passed as literal values, so the line can be understood without checking the function definition.
Good: traits.set_color(color, true /*custom*/);
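Python offers an alternative to the inline comment: keyword-only arguments, declared after `*`, force the call site to spell out what a literal means. This swaps the comment for a language feature. The names `set_color` and `custom` mirror the example above, but the `Traits` class is a hypothetical sketch.

```python
class Traits:
    def __init__(self):
        self.color = None
        self.custom_color = False

    def set_color(self, color: str, *, custom: bool) -> None:
        # `custom` is keyword-only, so callers must write custom=True;
        # a bare positional True would raise TypeError.
        self.color = color
        self.custom_color = custom


traits = Traits()
traits.set_color("red", custom=True)  # readable without opening the definition
```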
If a function performs an action, it should be named with a verb.
Bad: void JobsQueueUpdate();
Good: void UpdateJobsQueue();
If a function is a simple getter of a parameter or a property, its name should be a noun or an adjective. To emphasize this, some conventions name simple getters in the same style as variables, e.g. in snake_case().
Bad: string GetName() const;
Good: string name() const;
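In Python, a simple getter maps naturally onto a read-only `property`, which also removes `get_`/`is_` prefixes from call sites, while actions stay verbs. A minimal sketch with a hypothetical `JobScheduler`:

```python
class JobScheduler:
    def __init__(self, name: str):
        self._name = name
        self._jobs_queue = []

    @property
    def name(self) -> str:  # simple getter: a noun, no "get" prefix
        return self._name

    @property
    def queued_job_count(self) -> int:  # derived value, still a noun
        return len(self._jobs_queue)

    def update_jobs_queue(self, new_jobs: list[str]) -> None:  # action: a verb
        self._jobs_queue.extend(new_jobs)


scheduler = JobScheduler("nightly")
scheduler.update_jobs_queue(["backup", "cleanup"])
```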
A function name should generally not contain the argument type, because each function call already mentions its argument.
Bad: bool ValidateRectangle(const Rectangle&) const;
Good: bool Validate(const Rectangle&) const;
A function name should not mislead about what the function does.
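Python has no overloading by parameter type, but `functools.singledispatch` lets a single `validate` name serve several argument types, so the type stays in the call rather than in the name. A minimal sketch; `Rectangle`, `Circle`, and the validity rules are hypothetical.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class Rectangle:
    width: float
    height: float


@dataclass
class Circle:
    radius: float


@singledispatch
def validate(shape) -> bool:
    # Fallback for types without a registered implementation.
    raise TypeError(f"cannot validate {type(shape).__name__}")


@validate.register
def _(rectangle: Rectangle) -> bool:  # dispatched on the annotation
    return rectangle.width > 0 and rectangle.height > 0


@validate.register
def _(circle: Circle) -> bool:
    return circle.radius > 0
```

A call such as `validate(Rectangle(2, 3))` already names the rectangle, so `validate_rectangle` would only repeat it.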
Bad:
bool FilesAreAvailable(const std::vector<std::string>& files) {  // may set the expectation that all files will be checked
  for (const auto& file : files) {
    if (reader.open_existing(file)) return true;  // but it only checks whether at least one file is available
  }
  return false;
}
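One way to remove the ambiguity is to put the quantifier into the name and let Python's built-in `any` and `all` match it. In this minimal sketch, `os.path.exists` stands in for the hypothetical `reader.open_existing`.

```python
import os


def any_file_is_available(paths: list[str]) -> bool:
    # True if at least one path exists; stops at the first match.
    return any(os.path.exists(path) for path in paths)


def all_files_are_available(paths: list[str]) -> bool:
    # True only if every path exists (note: also True for an empty list).
    return all(os.path.exists(path) for path in paths)
```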
Name a function by its resulting effect, not by one of its possible applications.
Bad:
void RunVisualDebugger(char* argv[], int* exit_status) {
  forksys(argv, exit_status);
}
Good:
void Fork(char* argv[], int* exit_status) {
  forksys(argv, exit_status);
}
Avoid putting the implementation in the name.
Bad:
void CallLsfork(char* argv[], int* exit_status) {
  lsforksys(argv, exit_status);  // can change in the future
}
Good:
void Fork(char* argv[], int* exit_status) {
  lsforksys(argv, exit_status);
}
Predicates should be named so that they are read as predicates: they have only true/false states rather than multiple states, and they cannot be confused with an instance of a class.
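A Python analogue of the good variants above, assuming the standard `subprocess` module in place of the hypothetical `forksys`/`lsforksys`: the function is named after its effect (running a command and returning its exit status), so it stays accurate whether the caller launches a visual debugger or anything else, and whether the implementation later changes.

```python
import subprocess
import sys


def run_command(argv: list[str]) -> int:
    """Run a command and return its exit status."""
    # The implementation (subprocess.run today) is not part of the name,
    # so it can change without renaming every caller.
    completed = subprocess.run(argv)
    return completed.returncode


# The caller's intention lives at the call site, not in the helper's name.
exit_status = run_command([sys.executable, "-c", "raise SystemExit(3)"])
```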
Bad: failed_state = True # name assumes multiple failed values; also, is this about politics and sociology?
Good: failed = True # `if failed:` is perfectly readable
Bad: <instance>.worker_ = True # member 'worker_' can be read as instance of Worker class, not as predicate in other contexts
Good: <instance>.is_worker_ = True # no misinterpretation is possible
Bad: is_active_user = True # if this is a boolean local variable rather than a property
Bad: active_user = True # like a variable with User instance
Good: user_is_active = True
Bad: at_least_one_running_instance = True
Good: at_least_one_instance_is_running = True
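The good predicate names above read as sentences inside an `if`. A minimal Python sketch; the `Instance` class and the status messages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    is_running: bool  # a member predicate, not an instance of some Running class


instances = [Instance("a", False), Instance("b", True)]

# A predicate named as a statement that is either true or false.
at_least_one_instance_is_running = any(instance.is_running for instance in instances)

if at_least_one_instance_is_running:  # reads as an English sentence
    status_message = "some instances are running"
else:
    status_message = "all instances are stopped"
```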
Do not sacrifice meaning for beauty.
Bad:
from enum import Enum

# note that all names are rather short
class Status(Enum):
    SENT = 0
    ACCEPTED = 1
    REFUSED = 2
    STARTED = 3
    COMPLETED = 4  # successfully?
    FAILED = 5
    IN_PROGRESS = 6
    ALIVE = 7
Better:
from enum import Enum

class Status(Enum):
    SENT = 0
    ACCEPTED = 1
    REFUSED = 2
    STARTED = 3
    SUCCESSFULLY_COMPLETED = 4  # long, but clear
    FAILED = 5
    IN_PROGRESS = 6
    ALIVE = 7
The meaning of a name should be understood without knowing the rest of the associated names, and the classification should be non-overlapping. This is because such names are widely used alone in other contexts, where they would mislead readers about their meaning.
Bad:
FINISHED = 5  # successfully? prematurely?
FAILED = 6  # only this line helps to understand the previous one, and we are still not sure whether something FINISHED can also be FAILED
Good:
SUCCEEDED = 5
FAILED = 6
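With non-overlapping members, each value can be read alone wherever it appears. A minimal sketch using Python's `enum` module; the `Outcome` name and the `needs_retry` helper are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = 5
    FAILED = 6


def needs_retry(outcome: Outcome) -> bool:
    # Read alone, FAILED cannot be confused with a successful finish.
    return outcome is Outcome.FAILED
```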
Singularities and inconsistencies in style and formatting attract attention and slow down reading. Formatting rules are created partly for faster reading. If you do not have a special reason, do not break them.
Bad: breaking the rule of one blank line between function bodies (if such a rule is set) when the functions are similar
Bad:
void StartUp();
void PROCESS_DATA();  // why capitals?
Avoid misleading terms.
Bad: QCheckBox* check_geometry_; // using “check” to show that it is a checkbox can mislead a reader into thinking it means “test geometry for errors”; however, it just enables showing the geometry
Do not mix more and less general terms as synonyms.
Bad: DesignLoadStatus error; // a status can be an error, a success, or something else; in practice, the error variable may hold a success status
Name associative containers so that their two-fold (key and value) nature is clear. Some conventions use “to” for that: key_to_value
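The `key_to_value` convention in a minimal Python sketch with hypothetical data: the name tells the reader what is looked up and what comes back, and the inverted mapping gets the mirrored name.

```python
# "<key>_to_<value>": what you look up, and what you get back.
user_id_to_name = {101: "alice", 102: "bob"}

for user_id, name in user_id_to_name.items():
    print(f"{user_id} -> {name}")

# The inverse mapping is named in the opposite direction.
name_to_user_id = {name: user_id for user_id, name in user_id_to_name.items()}
```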
Compound nouns are good if they are not ambiguous.
Good?: id_of_task_of_parent_stage
Good: parent_stage_task_id

Final Remarks

All of the above is an attempt to formulate a consistent methodology of thinking while coding and designing. I hope it may also help development managers to facilitate interns' progress. Thank you for your time!
