Communications of the ACM

Research highlights

Technical Perspective: Portraiture in the Age of Big Data


"I have never been aware before how many faces there are.
There are quantities of human beings, but there are many more faces, for each person has several."
Rainer Maria Rilke

How many faces does a person possess? That is, how much does a face vary in its appearance over the lifetime of a given individual? Aging, of course, produces the greatest changes in facial structure, as anyone who has ever tried to pick out an adult friend from his first-grade class photo can attest. This is why many official ID documents require their holders to update their photograph every 5–10 years. But even at shorter temporal scales (days or weeks) there can be significant variations due, for instance, to changes in hairstyle, eyewear, facial hair, or makeup. Add to that the changing pose (full face, profile, three-quarter view) and the constant parade of momentary changes in facial expression: happy, amused, content, angry, pensive ... there are literally hundreds of words for describing the nuances of the human face.

This, of course, poses a great problem for portraitists, for how can a single portrait, even the most masterful one, ever come close to capturing the full gestalt of a living face? Indeed, many great artists have been obsessively preoccupied with this very question. Rembrandt painted over 60 self-portraits over his lifetime, a monumental study of his own face. Da Vinci, a master of visual illusion, purposefully blurred the corners of Mona Lisa's mouth and eyes, perhaps in an effort to transcend the immediacy of the moment and let the observer mentally "paint in" the missing details. The cubists argued that truly seizing the essence of a person requires forgoing the traditional single-perspective 2D pictorial space and instead capturing the subject from several viewpoints simultaneously, fusing them into a single image. Cinema, of course, has helped redefine portraiture as something beyond a single still image: the wonderful "film portraits" of the likes of Charlie Chaplin or Julie Andrews capture so much more than the still-life versions. Yet even cinema places strict limits on the temporal dimension, since filming a movie rarely takes longer than a few months, which is only a small fraction of a person's life.

The following paper is, in some sense, part of this grand tradition—the perpetual quest to capture the perfect portrait. Its principal contribution is in adapting this age-old problem to our post-modern, big data world. The central argument is that there already exist thousands of photographs of any given individual, so there is no need to capture more. Rather, the challenge is in organizing and presenting the vast amount of visual data that is already there. But how does one display thousands of disparate portraits in a human-interpretable form? Show them all on a large grid, à la Warhol? Each will be too small to see. Play them one after another in a giant slideshow? The visual discontinuities will soon make the viewer nauseated.


The solution presented by these authors is wonderfully simple: first, they represent all photos as nodes in a vast graph, with edges connecting portraits that have a high degree of visual similarity (in pose and facial expression, among other attributes); then, they compute a smooth path through the graph and turn it into a slideshow. The surprising outcome is that, typically, there are enough photographs available for a given individual that the resulting slideshow appears remarkably smooth, with each photo becoming but a frame in a continuous movie, making these "moving portraits" beautifully mesmerizing.
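
To make the graph-and-path idea concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' implementation. Each photo is assumed to be summarized by a small numeric feature vector (a hypothetical stand-in for pose and expression descriptors), photos are linked to their nearest neighbors, and a standard shortest-path search (Dijkstra's algorithm, swapped in here for whatever the paper actually uses) picks the sequence. The function names, the parameter `k`, and the `power` exponent that penalizes large visual jumps are all assumptions made for this example.

```python
import heapq
import math


def build_similarity_graph(features, k=2):
    """Link each photo to its k nearest neighbours (hypothetical descriptors).

    Returns an undirected graph as {node: {neighbour: distance}}.
    """
    n = len(features)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Distances from photo i to every other photo, closest first.
        dists = sorted(
            (math.dist(features[i], features[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # keep the graph undirected
    return graph


def smooth_path(graph, start, goal, power=2.0):
    """Dijkstra's algorithm on edge weights raised to `power`.

    With power > 1, one large visual jump costs more than several small
    ones, so the path prefers many gradual transitions.
    Returns a list of node indices, or None if goal is unreachable.
    """
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == goal:
            break
        for nbr, w in graph[node].items():
            new_cost = cost + w ** power
            if new_cost < best.get(nbr, math.inf):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    if goal not in best:
        return None
    # Walk the predecessor links back from goal to start.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


# Five "photos" whose single feature could stand for, say, head angle.
photos = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,)]
graph = build_similarity_graph(photos, k=2)
print(smooth_path(graph, 0, 4))  # [0, 1, 2, 3, 4]
```

In this toy example the graph also contains the shortcut 0 → 2 → 4, but squaring the edge weights makes the four small steps cheaper than the two large ones, so the slideshow visits every intermediate photo. That is the intuition behind a "smooth" path: with enough photos, consecutive frames differ only slightly.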

This type of graph representation betrays another intellectual lineage that goes back to Vannevar Bush and his article "As We May Think" (The Atlantic, 1945). Bush proposed the concept of the Memex (Memory Extender), a device that would organize information not by categories, but via direct associations between instances, using a vast graph. This idea has been influential in the development of hypertext, but the original Memex concept is actually much broader, encompassing data types beyond text (for example, photographs, sketches, video, audio), and describing paths through the instance graph (Bush called them "associative trails"). So, the following paper could be thought of as a type of Visual Memex specialized for faces.

In many ways this work signifies the coming of age of computer vision as a practical discipline. The work is one of the first instances in which a fairly complex computer vision system (itself utilizing several nontrivial components such as face detection and face alignment) has become a "button" in a mainstream software product (Google Picasa) used by millions of people. So, read the paper and try the software; I do not think you will be disappointed.

Author

Alexei A. Efros is an associate professor of electrical engineering and computer science at the University of California, Berkeley.

Footnotes

To view the accompanying paper, visit doi.acm.org/10.1145/2647750


Copyright held by author.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2014 ACM, Inc.


 
