ACM

Communications of the ACM

Home/Opinion/Articles/Sanity vs. Invisible Markings/Full Text

Kode Vicious

Sanity vs. Invisible Markings

By George V. Neville-Neil
Communications of the ACM, October 2020, Vol. 63 No. 10, Pages 28-29
10.1145/3417099
Comments (1)

View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share:

programming code surrounded by white space

Dear KV,

My team resurrected some old Python code and brought it up to version 3. The process was made worse by the new restriction of not mixing tabs and spaces in the source code. An automatic cleanup that allowed the code to execute by replacing the tabs with spaces caused a lot of havoc with the comments at the ends of lines. Why does anyone make a language in which white space matters this much?

White Out

Dear White,

Ever edited a Makefile? Although there is a long tradition of the significant use of white space in programming languages, all traditions need to change. In Python, many people have taken issue with the choice to have white space—and not braces—to indicate the limits of blocks of code, but since the developers did not change their minds on this with version 3 of Python, I suspect we are all stuck with it for quite a bit longer, and I am quite sure that there will be other languages, big and small, where white space remains significant.

If I could change one thing in the minds of all programming language designers, it would be to impress upon them—forcefully—the idea that anything that is significant to the syntactic or structural meaning of a program must be easily visible to the human reader, as well as easily understood by the systems used by developers.

Let's deal with that last point first. Making it easy for tools to understand the structure of software is one of the keys to having tools that help programmers prepare proper programs for computers. Since the earliest days of software development, programmers have tried to build tools that show them—before the inevitable edit-compile-test-fail-edit endless loop—where there might be issues in the program text. Code editors have added colorization, syntax highlighting, folding, and a host of other features in a desperate, and some might say fruitless, attempt to improve the productivity of programmers.

When a new language comes along, it is important for these signifiers in the code to be used consistently; otherwise your editor of choice has little or no ability to deploy these helpful hints to improve productivity. Allowing any two symbols to represent the same concept, for example, is a definite no-no. Imagine if you could have two types of braces to delineate blocks of code, just because two different parts of the programming community wanted them, or if there were multiple syntactic ways to dereference a variable. The basic idea is there must be one clear way to do each thing that a language must do, both for human understanding and for the sanity of editor developers. Thus, the use of invisible, or near-invisible, markings in code, especially tabs and spaces, to indicate structure or syntax.

Invisible and near-invisible markings bring us to the human part of the problem—not that code editor authors are not human, but most of us will not write new editors, though all of us will use editors. As we all know, once upon a time computers had small memories and the difference between a tab, which is a single byte, and a corresponding number of spaces (8) could be a significant difference between the size of source code stored on a precious disk, and also transferred, over whatever primitive and slow bus, from storage into memory.

Changing the coding standard from eight spaces to four might improve things, but let's face it, none of this has mattered for several decades. Now, the only reason for the use of these invisible markings is to clearly represent the scope of a piece of code relative to the pieces of code around it.

In point of fact, it would be better to pick a single character that is not a tab and not a space and not normally used in a program—for example, Unicode code point U+1F4A9—and to use that as the universal indentation character. Editors would then be free to indent code in any consistent way based on the user's preferences. The user could have any number of blank characters used per indent character—8, 4, 2, some prime number, whatever they like—and programmers could choose their very own personal views of the scope. On disk, this format would cost only one character (two bytes) per indent, and if you wanted to see the indent characters, a common feature of modern editors, you flip a switch, and voila, there they all are. Everyone would be happy, and we would finally have solved the age-old conundrum of tabs vs. spaces.

Author

George V. Neville-Neil ([email protected]) is the proprietor of Neville-Neil Consulting and co-chair of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.

Comments

Gunnar Wolf

October 27, 2020 01:50

You say in your article, Allowing any two symbols to represent the same concept, for example, is a definite no-no. This brought up a memory of a case where contradicting your advice was actually an improvement.
Many years ago, I learnt a Lisp dialect where the syntactical meaning of parentheses, brackets and brackets was identical they only were required to match (that is, no bracket could close a parenthesis). This _did_ improve on readability/maintainability of programs, as it allows to better spot where in a deep s-expression you were working.
Of course, bringing Lisp to any discussion on readability feels like cheating. In Lisp it's too common to drown in the layers upon layers of "toenail clippings".
The only worse offender than Lisp would be, of course, the "Whitespace" programming language, which introduces the novelty that any non-whitespace character is considered to be a comment, and the language definition consists of combinations of spaces, tabs and newlines. Yes, naturally, Whitespace proudly occupies a spot as an "esoteric programming language", useful for proving a point and setting up programming challenges, but nothing beyond that.

Displaying 1 comment

Sanity vs. Invisible Markings

Author

Comments

Gunnar Wolf

Article Contents: