One of the naughty details of my Varnish software is that the configuration is written in a domain-specific language that is converted into C source code, compiled into a shared library, and executed at hardware speed. That obviously makes me a programming language syntax designer, and just as obviously I have started to think more about how we express ourselves in these syntaxes.
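To make that pipeline concrete, here is a hedged sketch in C of the kind of function such a translator might emit for a single configuration rule. The identifiers, the enum, and the rule itself are invented for illustration and do not match Varnish's actual generated code.

    #include <string.h>

    /* Hypothetical actions a generated policy function may select. */
    enum action { PASS, LOOKUP };

    /*
     * Invented sketch of translator output for a rule along the lines of:
     *     if (req.url ~ "^/admin") { return (pass); }
     * The real compiler emits calls into the Varnish runtime instead.
     */
    enum action
    vcl_recv(const char *req_url)
    {
        if (strncmp(req_url, "/admin", 6) == 0)
            return (PASS);      /* bypass the cache for admin URLs */
        return (LOOKUP);        /* default: consult the cache */
    }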
Rob Pike recently said some very pointed words about the Java programming language, which if you think about it, sounded a lot like the pointed words James Gosling had for C++, and remarkably similar to what Bjarne Stroustrup said about good ol' C.
I have always admired Pike. He was already a giant in the field when I started, and his ability to foretell the future has been remarkably consistent.[1] In front of me I have a tough row to hoe, but I will attempt to argue that this time Pike is merely rearranging the deck chairs on the Titanic and that he missed the next big thing by a wide margin.
Pike got fed up with C++ and Java and did what any self-respecting hacker would do: he created his own language, better than Java, better than C++, better than C, and he called it Go.
But did he go far enough?
The Go language does not in any way look substantially different from any of the other programming languages. Fiddle a couple of glyphs here and there and you have C, C++, Java, Python, Tcl, or whatever.
Programmers are a picky bunch when it comes to syntax, and it is a sobering thought that one of the most rapidly adopted programming languages of all time, Perl, barely had one for the longest time. Ironically, what syntax designers are really fighting about is not so much the proper and best syntax for the expression of ideas in a machine-understandable programming language as it is the proper and most efficient use of the ASCII table real estate.
There used to be a programming language called ALGOL, the lingua franca of computer science back in its heyday. ALGOL was standardized around 1960 and dictated about a dozen mathematical glyphs such as ×, ÷, ¬, and the very readable subscripted 10 symbol, for use in what today we call scientific notation. Back then computers were built by hand and had one-digit serial numbers. Having a teletypewriter customized for your programming language was the least of your worries.
A couple of years later came the APL programming language, which included an extended character set containing a lot of math symbols. I am told that APL still survives in certain obscure corners of insurance and economics modeling.
Then ASCII happened around 1963, and ever since, programming languages have been trying to fit into it. (Wikipedia claims that ASCII grew the backslash [\] specifically to support ALGOL's /\ and \/ Boolean operators. No source is provided for the claim.)
The trouble probably started for real with the C programming language's need for two kinds of "and" and "or" operators. It could have used just "or" and "bitor", but | and || saved one and three characters, which on an ASR-33 teletype amounts to 1/10 and 3/10 second, respectively.
It was certainly a fair trade-off (just think about how fast you type yourself), but the price for this temporal frugality was a whole new class of hard-to-spot bugs in C code.
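A minimal sketch of that bug class, assuming nothing beyond standard C: the bitwise operator compiles silently where the logical one was meant. Ironically, C95's <iso646.h> later supplied the spelled-out names anyway.

    #include <stdio.h>
    #include <iso646.h>   /* C95: defines "and", "or", "bitand", "bitor", ... */

    int main(void)
    {
        int flags = 4;    /* binary 100 */
        int ready = 1;    /* binary 001 */

        if (flags && ready)                   /* logical AND: both nonzero, true */
            printf("logical &&: reached\n");

        if (flags & ready)                    /* bitwise AND: 4 & 1 == 0, false */
            printf("bitwise &: never reached\n");

        if (flags and ready)                  /* the word was affordable after all */
            printf("iso646 and: reached\n");

        return 0;
    }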
Niklaus Wirth tried to undo some of the damage in Pascal, but amid the bickering over begin and end, the fix would not take.
C++ is probably the language that milks the ASCII table most by allowing templates and operator overloading. Until you have inspected your data types, you have absolutely no idea what + might do to them (which is probably why there never was enough interest to stage an International Obfuscated C++ Code Contest, parallel to the IOCCC for the C language).
C++ stops short of allowing the programmer to create new operators. You cannot define :-: as an operator; you have to stick to the predefined set. If Bjarne Stroustrup had been more ambitious on this aspect, C++ could have beaten Perl by 10 years to become the world's second write-only programming language, after APL.
How desperate the hunt for glyphs is in syntax design is exemplified by how Guido van Rossum did away with the canonical scope delimiters in Python, relying instead on indentation for this purpose. What could possibly be of such high value that a syntax designer would brave the controversy this caused? A high-value pair of matching glyphs, { and }, freed for other uses in his syntax. (This decision also made it impossible to write Fortran programs in Python, a laudable achievement in its own right.)
The best example of what happens if you do the opposite is John Ousterhout's Tcl programming language. Despite all its desirable properties (such as being created as a language to be embedded in tools), it has been widely spurned, often with arguments about excessive use of, or difficult-to-figure-out placement of, {} and [].
My disappointment with Rob Pike's Go language is that the rest of the world has moved on from ASCII, but he did not. Why keep trying to cram an expressive syntax into the straitjacket of the 95 glyphs of ASCII when Unicode has been the new black for most of the past decade?
Unicode has the entire gamut of Greek letters, mathematical and technical symbols, brackets, brockets, sprockets, and weird and wonderful glyphs such as "Dentistry symbol light down and horizontal with wave" (0x23c7). Why do we still have to name variables OmegaZero when our computers now know how to render 0x03a9+0x2080 properly?
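In fairness to ISO C, and in keeping with this article's vi(1)-imposed ASCII diet, C99 and later do let you smuggle such names in as universal character names; whether a compiler accepts the raw UTF-8 glyph directly varies by compiler and standard revision. A minimal sketch:

    #include <stdio.h>

    int main(void)
    {
        /* \u03A9 is the universal character name for U+03A9 (capital omega). */
        double \u03A9 = 1.0e-5;

        printf("Omega = %g\n", \u03A9);
        return 0;
    }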
The most recent programming language syntax development that had anything to do with character sets apart from ASCII was when the ISO-C standard committee adopted trigraphs to make it possible to enter C source code on computers that do not have ASCII's 95 characters available, a bold and decisive step in the wrong direction.
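For readers who never met a trigraph: each is a three-character ASCII sequence that is replaced before tokenization, even inside string literals. A minimal demonstration (modern compilers typically need a flag such as GCC's -trigraphs, and C23 finally removed the feature):

    ??=include <stdio.h>                /* ??= stands for # */

    int main(void)
    ??<                                 /* ??< stands for { */
        printf("hello, world??/n");     /* ??/ stands for backslash, so this is \n */
        return 0;
    ??>                                 /* ??> stands for } */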
While we are at it, have you noticed that screens are getting wider and wider these days, and that today's text processing programs have absolutely no problem with multiple columns, insert displays, and hanging enclosures being placed in that space?
But programs are still decisively vertical, to the point of being horizontally challenged. Why can't we pull minor scopes and subroutines out in that right-hand space and thus make them supportive to the understanding of the main body of code?
And need I remind anybody that you cannot buy a monochrome screen anymore? Syntax-coloring editors are the default. Why not make color part of the syntax? Why not tell the compiler about protected code regions by putting them on a framed light gray background? Or provide hints about likely and unlikely code paths with a green or red background tint?
For some reason computer people are so conservative that we still find it more uncompromisingly important for our source code to be compatible with a Teletype ASR-33 terminal and its 1963-vintage ASCII table than for us to be able to express our intentions clearly.
And, yes, me too: I wrote this in vi(1), which is why the article does not have all the fancy Unicode glyphs in the first place.
[1] Pike, R. Systems software research is irrelevant; http://herpolhode.com/rob/utah2000.pdf.
For a reference regarding the original purpose of the ASCII backslash character (to form /\ and \/ in ALGOL), see Robert W. Bemer, "A View of the History of the ISO Character Code," Honeywell Computer Journal 6(4), pp. 274-286, 1972.
While I sympathize with your quarrel, the real culprits surely are not character sets but keyboard standards. They have yet to shed their mechanical typewriter past and catch up with digital typography.
Go allows identifiers constructed from Unicode characters, and the language defines source files to be encoded in UTF-8, not ASCII. The tool chain that 6g was built from has dealt with UTF-8 for close to 20 years (6g is built on top of the machinery from the Plan 9 compiler suite).
Perhaps I'm just missing the point of the article. On the face, however, it seems to be based on an incorrect premise.
The following letter was published in the Letters to the Editor in the March 2011 CACM (http://cacm.acm.org/magazines/2011/3/105325).
--CACM Administrator
Poul-Henning Kamp's attack on ASCII as the basis of modern programming languages in "Sir, Please Step Away from the ASR-33!" was somewhat misplaced. While, as Kamp said, most operating systems support Unicode, a glance at the keyboard shows that users are stuck with an ASCII subset (or regional equivalent).
It was my dubious honor to learn and use APL* while at university in the 1970s, which required a special "golf ball" and stick-on key labels for the IBM Selectric terminals supporting it. A vexing challenge in using the language was finding one of the many Greek or other special characters required to write even the simplest code.
Also, while Kamp mentioned Perl, he failed to mention that the regular expressions made popular by that language, employing many special characters as operators, are virtually unintelligible to all but the most diehard fans. The prospect of a programming language making extensive use of the Unicode character set is a frightening proposition.
William Hudson
Abingdon, U.K.
FOOTNOTES
* APL stands for "A Programming Language," so "the APL programming language" deconstructs as "the a programming language programming language."
The following letter was published in the Letters to the Editor in the February 2011 CACM (http://cacm.acm.org/magazines/2011/2/104382).
--CACM Administrator
In his article "Sir, Please Step Away from the ASR-33!" (Nov. 2010), Poul-Henning Kamp was deliberately provocative regarding programming language syntax, but his arguments were confused and off the mark. To call attention to new directions available to language designers, he focused on syntax and, surprisingly, complained about the "straitjacket" imposed by ASCII, but also made a good point regarding the importance of "expressing our intentions clearly." Still, he distorted the role of "syntax," which involves critical goals besides "expressivity," including machine interpretation, amenability to formal analysis, efficiency (in many dimensions), and persistence over time. A computer language also concerns communicating with the computer.
Kamp seemed to conflate the formal syntax of a language with a variety of presentation and communication issues in programming environments, rather than with the language itself. His examples even demonstrated my point; no matter what the formal syntax, contemporary tools can overlay useful semantics, making it much easier for humans to express their ideas. Why in the world would we want to enshrine the vagaries of human perception and cognition in the syntax of a computer language?
I also have reservations about many of Kamp's suggested improvements. He clearly privileges "expression" over "communication," and his reference to using color and multi-column layouts is highly problematic. These concepts make assumptions about the technical capabilities available to users that are as likely to change as the perceived technical constraints that led to the syntax of C. Despite his intention to be provocative, Kamp was also quite conservative in his technological assumptions, staying with two dimensions, eschewing sound, ignoring handheld technology, and generally expecting WIMP interfaces on commodity PCs.
I find the biggest problem with programming languages involves understanding, not expression. I'm probably typical in that I've read far more code than I've written, most of it by strangers in contexts I can only guess at. Many of my toughest challenges involve unraveling the thoughts of these other programmers based on limited evidence in the code. For the reader, adding color coding, complex nonlinear texts, and thousands of glyphs means only more cognitive load. It's difficult enough to "execute C/Java/etc. in my head"; mapping complex, multicolored hypertext to formal logic only makes it more difficult.
Kamp made a great case for why humans should never see programming languages, just like they should never see XML or RDF. They should express themselves through environments that promote communication by humans, with the added ability for the machine to generate correct code (in multiple languages) as needed. While such communication could be implemented through a programming language, the language itself would need features not generally found in conventional languages, including semantics (not easy to define or express) that model displays and layouts.
All in all, I thank Kamp for his comments, but ASCII isn't the heart of the matter. The real heart is the design of programming environments.
Robert E. McGrath
Urbana, IL
-----------------------------------------------
AUTHOR'S RESPONSE:
As desirable as McGrath's vision might be, in a trade where acrophobia is the norm, giant leaps may be ineffective. So while others work on the far future, I am happy to start the stairs that will eventually get us there.
Poul-Henning Kamp
Slagelse, Denmark