
Communications of the ACM

ICT Results

Listen, Watch, Read



European researchers have created the first integrated semantic search platform that combines text, video and audio. The system can 'watch' films, 'listen' to audio and 'read' text to find relevant responses to semantic search terms. At last, computers are able to look for meaning in our multimedia searches.

There is a phenomenal amount of content out there on the Internet, but therein lies a problem. Sure, text content can be skimmed or glanced at, but audiovisual content has to be viewed in linear time. Searching inside a film or audio recording for relevant information is very difficult.

But European researchers in the MESH project have developed an integrated platform which they say, for the first time, can combine semantic search (search by the meaning of the words) with a host of associated tools to deliver more relevant information from a wide variety of sources, all accessible to an individual user.

The platform can search any type of media that has been annotated, including photographs, videos, sound recordings, text and document scans. It uses a host of techniques, including optical character recognition, automated speech recognition and automatic annotation of movies and photographs that tracks salient concepts.

Semantic Shift

This represents an emerging paradigm shift in search technology. Right now, text in computing is represented as a series of numbers, most commonly following the Unicode standard. Each number signifies a particular character, and computers can scan these codes very quickly. But when you enter a search term, the machine has no idea what those letters signify. It simply looks for the pattern and has no inkling of the concept behind it.
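To make the point about pattern matching concrete, here is a minimal Python sketch. It is an illustration only, not code from the MESH project; the function names and the sample sentences are assumptions chosen for the example. It shows that a word is stored as a list of Unicode code points, and that a plain keyword search succeeds only when those exact characters appear in the text.

```python
def code_points(text):
    """Return the Unicode code point of each character in the text."""
    return [ord(ch) for ch in text]


def keyword_match(query, document):
    """Pure pattern matching: succeeds only if the query's characters
    appear, in order, somewhere in the document (case-insensitive)."""
    return query.lower() in document.lower()


# The word "flood" is just a sequence of numbers to the machine.
print(code_points("flood"))  # [102, 108, 111, 111, 100]

# A sentence about a flood that never uses the word is missed.
print(keyword_match("flood", "Flood warning issued for the coast"))         # True
print(keyword_match("flood", "Rivers burst their banks after heavy rain"))  # False
```

The second sentence is clearly about flooding, yet the search misses it because the character pattern is absent.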

But in semantic search, every bit of information is defined by potentially dozens of meaningful concepts. When a copywriter invoices for his or her work, for example, the date could be defined in terms of calendar, invoice, billing period, and so on. All these definitions for one piece of information are called 'metadata,' or information about information.

Collections of agreed metadata terms for a particular field or task, like medicine or accounting, are called ontologies.

So the computer not only searches for the term, it searches for related metadata that defines types of information in specific ways. In reality, the computer still does not 'understand' a concept in its semantic search — it continues to look for patterns of letters. But because the concepts behind the search terms are included, it can return results based on concepts as well as text patterns.
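The idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the MESH implementation: the ontology, the annotated media items and the function names are all hypothetical. Each item carries concept metadata, and the search returns an item if either its text matches the query or its concepts overlap with the concepts the query maps to in the ontology.

```python
# Hypothetical mini-ontology: each concept maps to terms that express it.
ONTOLOGY = {
    "flood": {"flood", "inundation", "river overflow"},
    "earthquake": {"earthquake", "tremor", "seismic"},
}

# Hypothetical annotated media items of different types.
MEDIA = [
    {"id": "clip-1", "type": "video",
     "text": "Rivers burst their banks after heavy rain",
     "concepts": {"flood", "natural disaster"}},
    {"id": "doc-2", "type": "text",
     "text": "Flood warning issued for the coast",
     "concepts": {"flood"}},
    {"id": "audio-3", "type": "audio",
     "text": "Tremors were felt across the city",
     "concepts": {"earthquake", "natural disaster"}},
]


def keyword_search(query, items):
    """Return ids of items whose text contains the query string."""
    q = query.lower()
    return [item["id"] for item in items if q in item["text"].lower()]


def semantic_search(query, items, ontology):
    """Return ids of items matching by text pattern or by concept metadata."""
    q = query.lower()
    # Concepts the query refers to, according to the ontology.
    concepts = {c for c, terms in ontology.items() if q == c or q in terms}
    return [
        item["id"]
        for item in items
        if q in item["text"].lower() or concepts & item["concepts"]
    ]


print(keyword_search("flood", MEDIA))                  # ['doc-2']
print(semantic_search("flood", MEDIA, ONTOLOGY))       # ['clip-1', 'doc-2']
print(semantic_search("inundation", MEDIA, ONTOLOGY))  # ['clip-1', 'doc-2']
```

Note that the program still only compares strings and sets. The apparent understanding comes entirely from the metadata attached to each item, which is the point the article makes.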

Extending Domains

These technologies are becoming common in particular knowledge domains, and more are emerging every day, but most relate to the concepts behind text-based documents. The MESH platform sought to use semantic search for every type of media.

On the way, it created some cutting-edge technology. "Our automatic annotation for video, for example, is state of the art," says Pedro Concejero, coordinator of the MESH project.

"The annotation system is capable of identifying the general scene setting, such as whether a video is a studio shot or a shot recorded on location. With adequate training, it can also detect (within some error margins) the general topic of the video, such as a scene about an earthquake or a flood. It can also find a number of salient objects within the scene, such as persons or fire, but cannot yet identify consistently objects with great variations in shape or aspect."

One of the major challenges of the project was a product of its own success: It annotated too much information!

"This is good — it is what we wanted the system to do — but the quantity of data was vast, too much to handle, so we had to find ways to cut down on the amount of metadata," Concejero tells ICT Results.

Manual Override

So the project developed a manual annotation tool that can, with a little training, be used by non-technical people. "It is a very powerful, very advanced professional program. There are other manual annotation tools available commercially, but we have developed a strong and user-friendly program that could probably compete very successfully with what is currently available."

For the project, the platform was developed to search video news sources relating to civil unrest and street violence, and natural disasters like earthquakes, forest fires and floods.

"We had to focus the demonstrator because there is a lot of work involved in developing ontologies for specific news topics. You would need to develop a very detailed ontology for politics, or crime and so on. We have designed the system so that it can accept ontologies from elsewhere, but for the demonstrator we reserved our work to these two domains," says Concejero.

The Beginning of the End?

The technology will not be challenging the industry-leading search engines any time soon. This project does not necessarily mark the end of the type of keyword-based search that we use every day.

But it could well be the beginning of the end. In the meantime, the work of the MESH project will find a happy home in a number of standalone commercial applications, and the development of new applications will, in one way or another, continue.

The MESH integrated project received funding from the ICT strand of the EU's Sixth Framework Program for research.

From ICT Results
 


 
