
Communications of the ACM

ACM TechNews

Searching Deep and Dark: Building a Google For the Less Visible Parts of the Web


A map depicting hotbeds of dark web activity related to illegal products. Larger circles indicate more activity.

Credit: Christian Mattmann

The Apache Tika content analysis toolkit could help with the effort to teach computers to recognize, index, and search all the different types of material available online, writes Christian Mattmann, director of the University of Southern California's Information Retrieval and Data Science Group and principal data scientist at the U.S. National Aeronautics and Space Administration.

Mattmann says the tool enables users to understand any file and the information contained within it.
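As a rough illustration of what "understanding a file" can involve, the sketch below guesses a file's type from its leading bytes. It is not Tika's API or its detection method: the `MAGIC_SIGNATURES` table and the `detect_type` function are hypothetical names introduced here, and a real toolkit relies on a much larger signature database and on other signals such as file names and container structure.

```python
# Hypothetical signature table: leading "magic" bytes mapped to a media type.
# This is an illustrative subset, not the list any real toolkit uses.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
]


def detect_type(data: bytes) -> str:
    """Guess a media type from the first bytes of a file's content."""
    for magic, media_type in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return media_type
    # No signature matched: treat content that decodes as UTF-8 as plain text.
    try:
        data.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        # Unknown binary content.
        return "application/octet-stream"


if __name__ == "__main__":
    print(detect_type(b"%PDF-1.7\n"))
    print(detect_type(b"plain words"))
```

Once the type is known, a content analysis tool can hand the file to a parser suited to that format to pull out its text and metadata, which is the step that makes the material indexable and searchable.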

He notes that improvements to Tika made during the U.S. Defense Advanced Research Projects Agency's (DARPA) Memex project, launched in 2014, made it even better at handling multimedia and other content found on the deep and dark Web.

Memex sought to create a search index that would help law enforcement identify human trafficking operations online, in particular by mining the deep and dark Web. Tika, which Mattmann co-developed, can now process and identify images with common human trafficking themes, and additional software can help it find automatic weapons and identify a weapon's serial number.

However, Mattmann says more work is needed to achieve Memex's goals. The tool is part of an open source software library available on DARPA's Open Catalog.

From The Conversation

Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA