Karl Moritz Hermann and his team at Google DeepMind say the particular way some news websites display online articles facilitates the creation of a database computers can use to learn.
The Daily Mail, MailOnline, and CNN websites present news stories with the main points of the story displayed as bullet points that are written independently of the text. "Of key importance is that these summary points are abstractive and do not simply copy sentences from the documents," the researchers note. An annotated database can be formed by taking the news articles as the texts and the bullet-point summaries as the annotation.
Hermann and his team posed the task as Cloze queries, fill-in-the-blank problems that machine-learning algorithms are often used to solve, and anonymized the dataset by replacing the actors in sentences with generic descriptions. The resulting database is substantial, consisting of 110,000 articles from CNN and 218,000 articles from the Daily Mail website. The researchers are using the database to compare conventional natural-language processing techniques, such as measuring the distance between combinations of words, with more modern neural network approaches.
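As a rough illustration of the anonymization and Cloze steps described above, the following minimal sketch replaces named actors with generic markers and then blanks out one marker at a time in a summary point. It is not the researchers' code: the function names, the `@entityN` and `@placeholder` marker strings, and the assumption that a list of entity names is already available (a real pipeline would need entity detection) are all hypothetical choices made for this example.

```python
import re


def anonymize(article, summary, entities):
    """Replace each entity name with a generic marker such as @entity0.

    `entities` is assumed to be supplied by the caller; detecting the
    entities is outside the scope of this sketch.
    """
    if not entities:
        return article, summary, {}
    # Longest names first, so a full name is matched before any shorter
    # name it happens to contain.
    mapping = {}
    for name in sorted(entities, key=len, reverse=True):
        mapping[name] = f"@entity{len(mapping)}"
    pattern = re.compile("|".join(re.escape(name) for name in mapping))

    def swap(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return swap(article), swap(summary), mapping


def make_cloze_queries(anonymized_summary):
    """Build one (query, answer) pair per entity marker in the summary."""
    queries = []
    for m in re.finditer(r"@entity\d+", anonymized_summary):
        query = (anonymized_summary[:m.start()]
                 + "@placeholder"
                 + anonymized_summary[m.end():])
        queries.append((query, m.group(0)))
    return queries


# Toy data invented for illustration only.
article = "Alice Smith met Bob in Paris."
summary = "Bob visits Paris"
anon_article, anon_summary, mapping = anonymize(
    article, summary, ["Alice Smith", "Bob", "Paris"]
)
queries = make_cloze_queries(anon_summary)
```

Each query pairs an anonymized article with a summary point that has one marker hidden, so a system has to find the answer in the article text rather than rely on prior knowledge about the named actors.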
Hermann and his team say the best neural nets can answer 60 percent of the queries put to them, and suggest these machines have trouble mainly with queries that have complex grammatical structures.
From Technology Review
Abstracts Copyright © 2015 Information Inc., Bethesda, Maryland, USA