The Apache Tika content analysis toolkit could help with the effort to teach computers to recognize, index, and search all the different types of material that is available online, writes Christian Mattmann, director of the University of Southern California's Information Retrieval and Data Science Group and principal data scientist at the U.S. National Aeronautics and Space Administration.
Mattmann says the tool enables users to understand any file and the information contained within it.
He notes that improvements made to Tika during the U.S. Defense Advanced Research Projects Agency's (DARPA) Memex project, launched in 2014, made it even better at handling multimedia and other content found on the deep and dark Web.
Memex sought to create a search index that would help law enforcement identify human trafficking operations online, in particular by mining the deep and dark Web. Tika, which Mattmann co-developed, can now process and identify images with common human trafficking themes, and additional software can help it find automatic weapons and identify a weapon's serial number.
However, Mattmann says more work is needed to achieve Memex's goals. The tool is part of an open source software library available on DARPA's Open Catalog.
From The Conversation
Abstracts Copyright © 2017 Information Inc., Bethesda, Maryland, USA