Communications of the ACM

Kode Vicious

Sanity vs. Invisible Markings



Dear KV,

My team resurrected some old Python code and brought it up to version 3. The process was made worse by the new restriction of not mixing tabs and spaces in the source code. An automatic cleanup that allowed the code to execute by replacing the tabs with spaces caused a lot of havoc with the comments at the ends of lines. Why does anyone make a language in which white space matters this much?

White Out

Dear White,

Ever edited a Makefile? Although there is a long tradition of the significant use of white space in programming languages, all traditions need to change. In Python, many people have taken issue with the choice to have white space—and not braces—to indicate the limits of blocks of code, but since the developers did not change their minds on this with version 3 of Python, I suspect we are all stuck with it for quite a bit longer, and I am quite sure that there will be other languages, big and small, where white space remains significant.
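To make White Out's complaint concrete, here is a minimal Python sketch of how a tab cleanup can disturb end-of-line comments. It assumes the comments were aligned with tabs at 8-column tab stops and that the cleanup blindly swapped every tab for four spaces. The sample lines and the helper name `comment_column` are invented for illustration and are not taken from the reader's code.

```python
# Two source lines whose trailing comments line up at column 16
# when tabs advance to 8-column tab stops (hypothetical sample).
lines = [
    "x = 1\t\t# short name",
    "longer = 2\t# long name",
]


def comment_column(line: str) -> int:
    """Return the column at which the trailing comment starts."""
    return line.index("#")


# Naive cleanup: every tab becomes a fixed run of four spaces.
naive = [line.replace("\t", " " * 4) for line in lines]

# Tab-stop-aware cleanup: str.expandtabs pads each tab to the next stop.
aware = [line.expandtabs(8) for line in lines]

naive_columns = [comment_column(line) for line in naive]
aware_columns = [comment_column(line) for line in aware]

print(naive_columns)  # the columns differ, so the comments no longer line up
print(aware_columns)  # the columns match, so the alignment is preserved
```

A fixed-width substitution ignores where each tab sat on the line, which is why the comments drift. Expanding to tab stops keeps them aligned, at the price of committing everyone to one tab width.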

If I could change one thing in the minds of all programming language designers, it would be to impress upon them—forcefully—the idea that anything that is significant to the syntactic or structural meaning of a program must be easily visible to the human reader, as well as easily understood by the systems used by developers.

Let's deal with that last point first. Making it easy for tools to understand the structure of software is one of the keys to having tools that help programmers prepare proper programs for computers. Since the earliest days of software development, programmers have tried to build tools that show them—before the inevitable edit-compile-test-fail-edit endless loop—where there might be issues in the program text. Code editors have added colorization, syntax highlighting, folding, and a host of other features in a desperate, and some might say fruitless, attempt to improve the productivity of programmers.
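As a small illustration of a tool that reports a problem before the program runs, the sketch below asks Python 3 to compile a piece of source text without executing it, and reports the `TabError` raised for inconsistent tabs and spaces. The function name `find_tab_error` and the sample sources are assumptions made for this example. Other syntax errors are deliberately not handled here.

```python
def find_tab_error(source: str, filename: str = "<example>"):
    """Compile source without running it; return the TabError, or None."""
    try:
        compile(source, filename, "exec")
    except TabError as err:
        return err
    return None


# Hypothetical sample: the first body line is indented with eight spaces,
# the second with a single tab, which Python 3 refuses to reconcile.
mixed = "if True:\n        x = 1\n\ty = 2\n"
clean = "if True:\n    x = 1\n    y = 2\n"

problem = find_tab_error(mixed)
if problem is not None:
    print(f"line {problem.lineno}: {problem.msg}")

print(find_tab_error(clean))  # None
```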

When a new language comes along, it is important for these signifiers in the code to be used consistently; otherwise your editor of choice has little or no ability to deploy these helpful hints to improve productivity. Allowing any two symbols to represent the same concept, for example, is a definite no-no. Imagine if you could have two types of braces to delineate blocks of code, just because two different parts of the programming community wanted them, or if there were multiple syntactic ways to dereference a variable. The basic idea is that there must be one clear way to do each thing that a language must do, both for human understanding and for the sanity of editor developers. That rules out using invisible, or near-invisible, markings in code, especially tabs and spaces, to indicate structure or syntax.

Invisible and near-invisible markings bring us to the human part of the problem (not that code editor authors are not human, but most of us will not write new editors, though all of us will use editors). As we all know, once upon a time computers had small memories, and the difference between a tab, which is a single byte, and the corresponding number of spaces (8) could make a significant difference to the size of source code stored on a precious disk and transferred, over whatever primitive and slow bus, from storage into memory.
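The arithmetic behind that old concern is easy to sketch in Python. The file below is hypothetical (1,000 lines, each nested two levels deep), and `indentation_bytes` is a name invented for this example. It counts only the bytes spent on indentation.

```python
def indentation_bytes(depths, unit: str) -> int:
    """Bytes spent on indentation when each level is written as `unit`."""
    return sum(len((unit * depth).encode("utf-8")) for depth in depths)


# Hypothetical file: 1,000 lines, each nested two levels deep.
depths = [2] * 1000

tab_bytes = indentation_bytes(depths, "\t")
space_bytes = indentation_bytes(depths, " " * 8)

print(tab_bytes, space_bytes)  # 2000 16000
```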

Changing the coding standard from eight spaces to four might improve things, but let's face it, none of this has mattered for several decades. Now, the only reason for the use of these invisible markings is to clearly represent the scope of a piece of code relative to the pieces of code around it.

In point of fact, it would be better to pick a single character that is not a tab and not a space and not normally used in a program (for example, Unicode code point U+1F4A9) and to use that as the universal indentation character. Editors would then be free to indent code in any consistent way based on the user's preferences. The user could have any number of blank characters used per indent character (8, 4, 2, some prime number, whatever they like), and programmers could choose their very own personal views of the scope. On disk, this format would cost only one character (four bytes in UTF-8) per indent, and if you wanted to see the indent characters, a common feature of modern editors, you flip a switch, and voilà, there they all are. Everyone would be happy, and we would finally have solved the age-old conundrum of tabs vs. spaces.
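A rough Python sketch of what such an editor view might do: the file stores one indentation character per level, and each user picks how wide a level appears on screen. The `render` function, the stored snippet, and the chosen widths are assumptions for illustration only. The last line prints the size of the character in UTF-8, which is four bytes.

```python
INDENT = "\U0001F4A9"  # the code point the column proposes


def render(stored: str, width: int) -> str:
    """Expand each leading INDENT into `width` spaces for display."""
    shown = []
    for line in stored.splitlines():
        body = line.lstrip(INDENT)
        depth = len(line) - len(body)  # number of leading indent characters
        shown.append(" " * (width * depth) + body)
    return "\n".join(shown)


# Hypothetical stored text: one INDENT per nesting level.
stored = f"if ready:\n{INDENT}for item in queue:\n{INDENT}{INDENT}handle(item)"

print(render(stored, 2))
print(render(stored, 4))
print(len(INDENT.encode("utf-8")))  # 4
```

The stored text never changes; only the width passed to `render` differs from one reader to the next.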

KV

Related articles on queue.acm.org

File-System Litter
Kode Vicious
https://queue.acm.org/detail.cfm?id=2003323

A Generation Lost in the Bazaar
Poul-Henning Kamp
https://queue.acm.org/detail.cfm?id=2349257

Demo Data as Code
Thomas A. Limoncelli
https://queue.acm.org/detail.cfm?id=3355565


Author

George V. Neville-Neil ([email protected]) is the proprietor of Neville-Neil Consulting and co-chair of the ACM Queue editorial board. He works on networking and operating systems code for fun and profit, teaches courses on various programming-related subjects, and encourages your comments, quips, and code snips pertaining to his Communications column.


Copyright held by author.
Request permission to (re)publish from the owner/author

The Digital Library is published by the Association for Computing Machinery. Copyright © 2020 ACM, Inc.


Comments


Gunnar Wolf

You say in your article, "Allowing any two symbols to represent the same concept, for example, is a definite no-no." This brought up a memory of a case where contradicting your advice was actually an improvement.
Many years ago, I learnt a Lisp dialect where the syntactical meaning of parentheses, brackets and braces was identical; they were only required to match (that is, no bracket could close a parenthesis). This _did_ improve the readability and maintainability of programs, as it made it easier to spot where in a deep s-expression you were working.
Of course, bringing Lisp to any discussion on readability feels like cheating. In Lisp it's too common to drown in the layers upon layers of "toenail clippings".
The only worse offender than Lisp would be, of course, the "Whitespace" programming language, which introduces the novelty that any non-whitespace character is considered to be a comment, and the language definition consists of combinations of spaces, tabs and newlines. Yes, naturally, Whitespace proudly occupies a spot as an "esoteric programming language", useful for proving a point and setting up programming challenges, but nothing beyond that.
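The matching rule described in the comment above can be sketched in Python as a stack check: the three delimiter styles carry the same meaning, but each closer must pair with an opener of its own kind. The name `brackets_match` and the sample expressions are hypothetical and do not come from any particular Lisp dialect.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def brackets_match(text: str) -> bool:
    """True if every closer pairs with the most recent opener of its kind."""
    stack = []
    for ch in text:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return not stack


print(brackets_match("(let [(x 1)] {+ x 2})"))  # True
print(brackets_match("(let [x 1) ]"))           # False
```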

