ACM

Communications of the ACM

Home/Magazine Archive/April 2012 (Vol. 55, No. 4)/Why LINQ Matters: Cloud Composability Guaranteed/Full Text

Practice

Why LINQ Matters: Cloud Composability Guaranteed

By Brian Beckman
Communications of the ACM, April 2012, Vol. 55 No. 4, Pages 38-44
10.1145/2133806.2133820
Comments (2)

View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share:

Why LINQ Matters, illustration — Credit: Alex Williamson

Composability is an aspect of component design that addresses the freedom to select and combine generic components in nearly arbitrary configurations to support a wide variety of applications, even applications that were not anticipated by the designers of the components. For example, electrical components are routinely designed so that switches, junctions, and loads can be configured in almost any order to fulfill an enormous variety of applications. Component designers do not need to know the specifics of an application, and component users are not overly constrained by arbitrary choices made by designers. Similarly, in mechanical engineering many applications can be built entirely of generic brackets, hinges, fasteners, and so on, designed at the outset to be configured in any order and layout made possible by standardization of sizes, screw pitches, strength-of-materials, among others.

In this article we use Language-integrated Query (LINQ) as the guiding example of composability. LINQ is a specification of higher-order operators designed specifically to be composable. This specification is broadly applicable over anything that fits a loose definition of "collection," from objects in memory to asynchronous data streams to resources distributed in the cloud. With such a design, developers build up complexity by chaining together transforms and filters in various orders and by nesting the chainsthat is, by building expression trees of operators.

Encoding and transmitting such trees of operators across tiers of a distributed system has many specific benefits, most notably:

Bandwidth savings from injecting filters closer to producers of data and streams, avoiding transmission of unwanted data back to consumers.
Computational efficiency from performing calculations in the cloud, where available computing power is much greater than in clients.
Programmability from offering generic transform and filter services to data consumers, avoiding the need for clairvoyant precanning of queries and data models at data-producer sites.

This article also covers lifting composability from programs into the cloud via REST (representational state transfer), mapping between expressions in programs and resource specifications in URIs; and it addresses the subtle hazards of designing for composability: seemingly unimportant, arbitrary choices can render a design fundamentally uncomposable.

Meta-Design Principle

Designing for composability is not merely separating interface and implementation. It also means a certain uniformity in the interfaces: a meta-design principle. It wouldn't do much good if every electrical socket-and-plug design were of a custom and idiosyncratic shape, even if the design were sufficient to carry the required current. It is precisely because sockets and plugs are standardized in shape that we get the flexibility to combine them in nearly arbitrary configurations.

The physical engineering disciplines (for example, electrical and mechanical) have a well-established history of designing for composability because the benefits are so obvious and overwhelming, but the benefits are becoming increasingly clear in software engineering as well. Software designers have long accepted blackboxing: the need to encapsulate implementation details inside interfaces that expose minimal, precise, complete contracts. Designers are now beginning to abstract over the interfaces themselves and to realize the combinatorial benefits of composability.

The discipline of pure functional programming, exemplified by the programming language Haskell (http://www.haskell.org) and its predecessor Miranda (http://en.wikipedia.org/wiki/Miranda_programming_language), brought the absolute composability guarantees of mathematical functions into the world of programming. Their influence, combined with imperative and object-oriented programming traditions, represents a recent, deep, and important consolidation: one big concept, function, incorporates several smaller concepts as cases. Relations, objects, states, and streams all have natural representations in functional style. The increasing favor of this style is evidenced by the following:

The incorporation of higher-order functions (http://en.wikipedia.org/wiki/First-class_function; http://en.wikipedia.org/wiki/Higher-order_function) for transforming, filtering, and aggregating data into workaday imperative programming languages, most notably C# and JavaScript.
The general recognition of immutability as a benefit, especially in concurrent and distributed systems; see Eric Lippert's comments (http://blogs.msdn.com/b/ericlippert/archive/2007/11/13/immutability-in-c-part-one-kinds-of-immutability.aspx) and a relevant thread in the Python user's forum (http://groups.google.com/group/comp.lang.python/browse_thread/thread/29c62cbee7a6b598/df5b676f6f695eb9).
The spread of libraries such as LINQ (http://en.wikipedia.org/wiki/Language_Integrated_Query) and underscore.js (https://github.com/documentcloud/underscore).
Whole languages such as Scala, F#, and Clojure.
Language overlays such as coffee-script (https://github.com/jashkenas/coffee-script).

LINQ as a Multidomain Composability Pattern

LINQ is a case study in taking the composability of functions one step further by specifying a collection of composable higher-order functionsthe SQOs (standard query operators; http://en.wikipedia.org/wiki/Language_Integrated_Query#Standard_Query_Operators). This specific design generalizes over a loose abstraction of a collection and covers a comprehensive variety of domains:

Data structures in memory (for example, lists, trees, graphs, queues).
Tables in a database (even with primary- and foreign-key constraints).
Asynchronous data streams (RX; http://msdn.microsoft.com/en-us/data/gg577609).
Slash-separated terms in URIs; thus, resources in the cloud.
Signal-processing primitives (convolutions and Fourier transforms).
CodeDOMs (code document object models); expression trees; ASTs (abstract syntax trees)that is, over programming languages themselves.
HTML, XML, and JSON (JavaScript Object Notation) documents.
Continuations, exceptions, alternatives, and more.

This design brings the same thorough collection of composable operators to all these domains. You could say that LINQ brings composability itself to these domains. If you implement the SQOs, then you are guaranteed composability. LINQ was not the first and certainly is not the only way to do it, but it is exemplary enough to support the other observations in this article, namely:

Lifting composability in programs to composability in the cloud.
Illustrating the hazards facing would-be composable designs.

From Programs to the Cloud

LINQ's SQOs, then, serve as our exemplar of the composability-in-itself meta-design principle. As typically implemented in programs, the SQOs apply functions to collections in various ways, recognizing functions as the natural interaction convention between composable components in programs.

Leaping from composability within programs to composability in the cloud, you must "implement" SQOs over the natural interaction convention between composable components in the cloud, whatever that may be. This is just another domain for LINQ, viewed as a specification for composable operators over anything that satisfies LINQ's loose abstraction of a collection. The cloud is just a loose collection of resources specified by URIs.

What is "the natural interaction convention between composable components in the cloud?" Perhaps the most important such interaction convention today is REST (http://en.wikipedia.org/wiki/REST), in which client and server interact via HTTP, by opaque URIs that represent resources. REST specifies no semantics on URIs, only a very light postfix syntax of slash-separated terms. Protocols such as OData (http://www.odata.org/) use URI encoding and REST to provide data frameworks to the cloud, specifically by empowering a client to specify the result of a desired computation or query as a resource. To this let's add the following ideas:

Radical composability. Users build complexity via chaining sequences of SQO expressions in arbitrary order rather than via a rich syntax over a presumed data model, as in OData.
Bidirectional mapping. This is mapping between expression chains in programs and resource specifications in URIs. Given such a mapping, you may reason over expressions in programs and resources in cloud interactions as if they were the same.
Injectability. The receiver of an expression chain may maintain standing operations over multiple communications so as to save bandwidth. A client requesting, say, a stream of "Chinese restaurants near me," may inject a filter one time to the server, which then avoids sending other irrelevant business records to the client over future sequences of push notifications. A server, flooded by too-frequent location updates from a client, may inject a filter into the client that says "only every tenth update," which the client applies just before performing physical communication.
Broad applicability. The same composable design works over static data resources and time-dependent, asynchronous data streams (and other domains).

At this point, assume a natural correspondence between composable operator-chain expressions in programs and composable resource specifications in the cloud. A specific way to mechanize that correspondence is presented in the following sections, but the correspondence itself allows you to reason the same way over what works in programs and what works in the cloud.

Embedding LINQ in URI Syntax

The typical implementation of LINQ is embedded in a programming language that supports higher-order functions. Adopting LINQ, however, only means accepting the specification of the SQOs. LINQ, as a design, does not need an ordinary programming language at all. By noting that a nested tree of operators can be converted to a pure-postfix chain of operators; and a chain of operators separated by dots in a programming language is formally isomorphic to a chain of terms separated by slashes in a URI, you can embed any LINQ chain in URI syntax and use HTTP as an "expression-embedding language" in RESTful Web services.

For example, imagine the expression illustrated in Figure 1, in pseudo C#, that represents the phone numbers of customers with recent purchases:

It says, "Keep the customers Where the difference between the date and time now and the date and time of the last purchase is fewer than 365 days, then Select the primary phone contact number from each customer in the resulting collection of customers." Where and Select are both LINQ SQOs. Each one takes two argumentsa data collection argument to the left of each dot and a higher-order function inside parenthesesand each produces a new collection. That is the essence of their chain composability: each SQO expression represents a data collection ready to be dotted into the next SQO down the chain in left-to-right order.

The higher-order-function argument of the Where SQO is a predicate written as a lambda expression:

Read this as "the function of customer that performs the computation specified by the function-body expression and produces the resulting value of the expression." Since this particular lambda expression represents a predicate, it should produce a Boolean value. In JavaScript, you would write exactly the same thing as

The higher-order-function argument of the Select SQO is another lambda expression that just picks out the phone-number field from a customer record, but it could do arbitrary computation.

Most languages or libraries call this Select operator map, but LINQ's designers chose the name Select to appeal to SQL developers. In some theoretical discussions, the operator is also called Project. Let's stick with Select for now.

Since the expression chain contains no nested operators, it is already in shallow postfix form if we consider the lambda expressions atomic:

Just prepend a protocol and domain, and replace the dots with slashes:

Then URL-encode the lambdas, ending up with something like this:

Thus, you have a representation of the entire expression in a URI. A RESTful server can interpret expressions such as this in a security sandbox and provide a very general query service without having built-in, precanned queries.

If you cannot or do not want to consider the lambdas as atomic, then you can go all the way to deep postfix form, writing the entire AST in postfix and encoding it in a URI.

You can do this in two stages: first keep the lambdas imploded so you can see the difference between deep prefix and shallow prefix; then explode the lambdas. Begin by rewriting the query first in prefix form, dragging everything that was on the left of a query-operator dot over to the right inside the parentheses of the operator in first position. This turns the expression inside out like the sleeve of a sweater, as follows:

Now, it is easy to see that each level is a binary operator with a data collection in the first argument slot and a lambda in the second slot. This corresponds to the AST in Figure 2. Without exploding the lambdas there, left-to-right, depth-first traversal yields a deep postfix form, still with lambdas imploded:

It looks similar to the shallow postfix form, just with the SQOs post-lambda, as it were. The SQOs appear after the lambdas instead of hosting lambdas inside their parentheses. In deep postfix form, there are never any parentheses, and that's the whole point. This should remind you of PostScript or Forth or RPN (reverse Polish notation) calculators, because they are all deep postfix (http://en.wikipedia.org/wiki/Concatenative_programming_language).

Now, explode out the lambdas into the AST shown in Figure 3. With no parentheses, its RESTful encoding in a URI is trivial:

The remaining dots denote property-accessor calls, system calls, or lambda-variable declarations, as opposed to SQO calls. URI encoding replaced all dots preceding SQO calls with slashes.

Note that there are other options for encoding lambdas in postfix. This example uses a representation in which variables appear before they are declared. This requires an execution model that saves variables and dependent operations symbolically on the stack, to be resolved later when the lambda term declaring the variable arrives. Other competent choices abound. For example, PostScript and Factor encode lambda expressions in arrays where the variable declarations precede the function bodies. This amounts to a cheater prefix notation embedded in otherwise postfix languages.

We have departed the realm of human-readable syntax (except for lovers of Forth) but have found a URI encoding of ASTs that is trivial for machines to read and write. We probably do not need to go quite this far, even for nested LINQ. Parenthesis counting can just as easily manage nesting while retaining human readability.

For an example with nested SQOs, imagine the expression in Figure 4, again in pseudo C#, that represents the line items for high-value orders from a list of customers.

Hazards in Designing for Composability

Whereas LINQ is a fine example of a composable design that works equally as well over objects in memory as over resources in the cloud, it's certainly not the only one. Designers who embrace radical composability, however, will confront some hazards. If they get something just slightly wrong in the design, then users can no longer specify what they need within the design.

In a typical example, suppose you need a collection of all orders from big-spending customers. Your library provides a way to filter out low-spending customers, and it provides a composable way to get the orders for each of those customers. So far, so good. In pseudo-C#, you might try the example shown in Figure 5.

This does not yield the required collection of orders, however; instead it yields a collection of collections of ordersone internal collection of orders for each customer. You need to flatten the top-level collection. You must leave the library and write custom code.

In a cloud setting, leaving the library means proliferating custom code for every uncovered case, restricting the ability to build novel expressions just from composable building blocks. Anywhere you have an expression like the one here, you must do something out-of-band to remove the unwanted extra structure.

In a program setting, leaving the library means risking insertion of brittle workaround code in various places in the system. Perhaps the programmer knows that Orders are arrays, so writes block-copy operations to flatten them. Later, when Orders changes implementation from array to gzipped XML, this code fails unexpectedly.

It is somewhat amazing that a single specification of operators such as LINQ's SQOs applies equally as well to asynchronous data streams as to objects in memorybut it does.

If, however, the library had provided the required SelectMany in the first place, then the programmer would have written as shown in Figure 6, which has only one difference from the previous code (underlined).

Breadth and Depth of Composability

It is somewhat amazing that a single specification of operators such as LINQ's SQOs applies equally as well to asynchronous data streams as to objects in memorybut it does. Users can specify arbitrary transformations not only on pulled data resources, but also on pushed asynchronous data streams using the same set of SQOs. In both cases, expression components can be transparently injected upstream to the data-producer machines, whether client or server, transforming and filtering data at the headwaters for bandwidth savings and reduction of "chattiness."

In a kind of self-referential joke, however, the same set of SQOs that works over functions in a program works over resources distributed in the cloud. Thus, there is an octuplet application of the same SQO design:

SQOs operating on data within programs or on data over the cloud.
Data distributed in space or with the data distributed in time.
Transforms and filters executed at the data-producer site versus at the data-consumer site, respecting concerns such as bandwidth, latency, and security.

LINQ borrowed its essential concepts once again from Haskell, specifically from its initialization library called the Prelude. The essence of the technique is a loose abstraction of a collection of thingsfor example (don't yawn), customers, orders, items. Each customer has a number of orders, and each order has a number of items. From the point of view of the pattern, it does not matter whether these customers are in memory, or delivered n at a time to callbacks, or threaded through continuations, or in offline document stores accessible by Web-service calls. What matters is just that there is an abstraction representing them somewhere in your system where you want to manipulate them. The required composable operators fall into categories (with examples from LINQ and the Haskell Prelude):

INSERT elements into collections (for example, new List(...) / return). This is how you get into a collection. For example, convert a customer record into a one-record table; make a singleton list (containing a single customer object).
TRANSFORM elements and produce a new collection (such as, .Select / map). For example: produce a list of customers' primary phone numbers from a list of customers, one phone number for each customer.
FILTER elements out of the collection based on a predicate (such as, .Where / filter). For example, produce a list of customers with big orders; there will be fewer of these, in general, than in the original collection.
EXPAND elements from one collection into new subcollections and combine or concatenate the new collections (such as,.SelectMany / concatMap, bind, >>=). For example, get all first-degree friends of the customers; get all orders from all customers; get all line items from all orders.
AGGREGATE elements and produce a result that depends on all the elements (such as,.Aggregate / fold; and .Take and .Skip / take, drop also fall in this category). This is how you get out of the collection (technically, this is a co-monadic operator, EXTRACT). For example, get the average spending over all customers.

These are fundamental. If a collection abstraction supports appropriate primitives in each of these categories, then composability is guaranteed.

How to Ruin a Library

You do not have to write exactly the LINQ SQOs; there is plenty of freedom if you understand the fundamentals. One big way to blow it, however, is to overlook the EXPAND category, which designers often do because it does not fit the map/filter/fold mantra familiar in some quarters. The EXPAND category is essential; in fact, it is the most important one. Without it, the library can not represent the most general kind of collection and thus can not be extended to new settings such as asynchronous data streams or resources in the cloud. With it (and with INSERT), all the other categories can be simulated exactly (for more information, see http://channel9.msdn.com/Shows/Going+De-Smet-MinLINQ-The-Essence-of-LINQ).

The diagrams in Figure 7 demonstrate the categories of operator. Together, the diagrams show why the EXPAND category is most essential. It is the necessary twin of FILTER, even though it is more general.

Another way to ruin a library is to get the order of arguments wrong. To chain operators, let alone to embed them in URIs and REST, you want the collection argument in first position, always. Traditionally, dialects of Lisp put the function argument first, at least for map, because it makes expressions read, "Map this function over that collection." Library designers with Lisp backgrounds might just reflexively write map that way. This minor syntactic lapse means that map, a.k.a. Select, disrupts the pattern and cannot be composed in dotted chains except at the beginning.

Conclusion

LINQ's pattern is bulletproof and pervasive, applying to a surprisingly broad array of software domains such as asynchronous streams, stateful computations, I/O, exceptions, alternativesanything that fits a loose abstraction of "collection of things." The pattern is always composable in expression trees, where dependent operators follow each other in nested sequences. It can be encoded equally as well in ordinary programming-language syntax as in pure-postfix notations such as URIs.

Because the cloud fits the loose definition of a "collection of things," LINQ's pattern of composable transforms covers distributed resources in the cloud as well as it covers locally addressable resources in programs. Packaging subexpressions and injecting them closer to data-production sites allows you to gain bandwidth, chattiness, and security benefits in RESTful Web services.

Author

Brian Beckman currently works in Microsoft's Bing on Maps and Signals. He has held many positions at Microsoft since 1992, from Crypto (SET) to Biztalk to research in functional programming. He wrote the first version of the Time Warp Operating System on the Caltech Hypercube 19841989.

Figures

Figure 1. LINQ expressions including a filter and a projection.

Figure 2. AST showing imploded lambdas.

Figure 3. AST with exploded lambdas.

Figure 4. Example LINQ expression with nested operators.

Figure 5. Example that produces residual unwanted nesting.

Figure 6. Example of SelectMany that avoids unwanted nesting.

Figure 7. Categories of composable operators.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

Comments

CACM Administrator

April 02, 2014 11:34

The following letter was published in the Letters to the Editor in the August 2012 CACM (http://cacm.acm.org/magazines/2012/8/153804).
--CACM Administrator

I concur wholeheartedly with the composability benefits Brian Beckman outlined in his article "Why LINQ Matters: Cloud Composability Guaranteed" (Apr. 2012) due to my experience using composability principles to design and implement the message-dissemination mechanism for a mobile ad hoc router in a proprietary network. In it, the message-dissemination functionality of the router emerges from the aggregation of approximately 1,500 nodes in a composable tree that resembles a large version of the lambda-tree diagrams in the article. However, instead of being LINQ-based, each node represents a control element (such as if/else, for-loop, and Boolean operations nodes), as well as nodes that directly access message attributes. Each incoming message traverses the composable tree, with control nodes directing it through pertinent branches based on message attributes (such as message type, timestamp, and sender's location) until the message reaches processing nodes that complete the dissemination.

Since assembling and maintaining a 1,500-node tree within the code base would be daunting, a parser assembles the tree from a 1,300-line routing-rule specification based on a domain-specific language (DSL).a Defining the routing rules through this DSL-assembled composable tree also provides these additional benefits:

Nodes verified independently. Verifying the if/else, message-timestamp, and other nodes can be done in isolation;

Routing rules modified for unit testing. As the routing rules mature, their execution requires a full lab- or field-configuration environment, making it difficult to test new features; a quick simplification of a local copy of the DSL specification defines routing rules that bypass irrelevant lab/field constraints while focusing on the feature being tested on the developer's desktop;

Scalable and robust. New routing rules can be added to DSL specification; new routing concepts can be added through the definition of new node types; and new techniques can be added to the overall design; and

Each message traversal recorded by the composable tree. Each node in the composable tree logs a brief one-line statement describing what it was doing and why the message chose a particular traversal path; the aggregation of these statements provides an itinerary describing the journey of each message traversal through the composable tree for confirmation or debugging.

My experience with composable trees defined through a DSL has been so positive I would definitely consider using the technique again to solve problems that are limited in scope but unlimited in variation.

Jim Humelsine
Neptune, NJ

CACM Administrator

April 02, 2014 11:46

The following letter was published in the Letters to the Editor in the July 2012 CACM (http://cacm.acm.org/magazines/2012/7/151227).
--CACM Administrator

The sole purpose of the article "Why LINQ Matters: Cloud Composability Guaranteed" (Apr. 2012) by Brian Beckman seemed to be to try to show that LINQ is buzzword-compatible, as in "cloud" and "RESTful Web services." While I agree with Beckman's main pointthat composability and designing for it are important and usefulI object to the article's exposition for the following reasons:

First, it claimed LINQ is applicable to graphs (among other things). If a graph is to be treated as a collection, it must first be linearized (such as through a topological sort), but what about cycles? Moreover, not at all obvious was that the result of a query is independent of the chosen linearization. Without concrete examples, the article was rather unconvincing when stating that, say, convolutions, continuations, and exceptions can be viewed as sequences usable for LINQ operators.

Second, the article was largely devoted to convincing the reader that a LINQ query can be represented as an abstract syntax tree (AST) traversable in postorder and encodable in a uniform resource identifier (URI)which is kind of a boring, well-known fact. As an engineer-practitioner, I object to the idea of encoding queries in URIs though accept it for the sake of the intellectual exercise.

Third, its description of the EXPAND category of operators, claimed as crucial for composability, was lacking. For example, for the SelectMany operator, I had to infer from nearby text what SelectMany is supposed to do, but the article's other "examples" (one sentence and Figure 7) contributed nothing to my understanding of what the category is supposed to do. Why would "all orders from all customers" require flattening/SelectMany when only a single collectionordersis involved?

Finally, I disagree with the claim that orders of arguments can ruin the property of operator composability. Unless Beckman expects developers to (manually?) encode queries in URIs by pasting in text fragments, a machine might reconstruct an AST from a representation with the "wrong" order of arguments and use it as a starting point for constructing the AST of another query. Not clear is how this might affect embedding in URIs, since the AST must still be reconstructed, modified, and reencoded.

Zeljko Vrba
Oslo, Norway

-------------------------------------------

AUTHOR'S RESPONSE

LINQ's foundation is the monad type class borrowed from the Haskell programming language. A monad is an interface for managing monoidal container structures, including lists, sets, bags, and permutations. LINQ does not presuppose or guarantee order. No matter how one might linearize a graph into, say, an ordered list or unordered bag, LINQ will not change the given order.

As for convolutions, continuations, and exceptions, I need only show that a collection is monadic to show it is also LINQable. For convolutions, under the Fourier representation, LINQ operates on vector spaces, which are monads. For the continuation and exception monads (called the Maybe monad), see Haskell documentation http://www.haskell.org/haskellwiki/Haskell.

Vrba and I will respectfully disagree on the order of arguments. I grant that machine reordering is a functional fix, but it also introduces an extra step at the call site I would rather avoid as a point of style.

Brian Beckman
Redmond, WA

Displaying all 2 comments

Why LINQ Matters: Cloud Composability Guaranteed

Meta-Design Principle

LINQ as a Multidomain Composability Pattern

From Programs to the Cloud

Embedding LINQ in URI Syntax

Hazards in Designing for Composability

Breadth and Depth of Composability

How to Ruin a Library

Conclusion

Author

Figures

Comments

CACM Administrator

CACM Administrator

Article Contents: