Researchers at the Indian Institute of Technology Bombay led by Aditya Joshi report developing a new strategy to help computers better detect sarcasm in sentences. Their algorithm analyzes the similarity of words by examining how they relate to each other in Word2Vec, a set of word vectors trained on Google News text that covers about 3 million words and phrases. Joshi says the vectors come from extensive analysis of how frequently words appear next to each other, which lets each word be represented as a vector in a high-dimensional space. Similar words are represented by similar vectors, which allows vector-space mathematics to capture simple relationships between the words.
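To illustrate how vector-space mathematics can capture relationships between words, here is a minimal Python sketch. The three-dimensional vectors are hand-made toy values chosen for illustration; they are not real Word2Vec embeddings, and the names `TOY_VECTORS` and `cosine_similarity` are assumptions rather than anything from the researchers' work.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors, which have no direction
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors (NOT real Word2Vec values).
# Dimension 1 is shared by all four words, dimension 2 separates
# man/king from woman/queen, dimension 3 separates commoner from royal.
TOY_VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
}

# Vector arithmetic: king - man + woman should land near queen.
analogy = [
    k - m + w
    for k, m, w in zip(TOY_VECTORS["king"], TOY_VECTORS["man"], TOY_VECTORS["woman"])
]

analogy_score = cosine_similarity(analogy, TOY_VECTORS["queen"])
baseline_score = cosine_similarity(TOY_VECTORS["man"], TOY_VECTORS["queen"])
```

With these toy values the analogy vector lies closer to "queen" than "man" does, which is the kind of simple relationship the abstract describes.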
The researchers say sentences that contrast similar and dissimilar concepts are more likely to be sarcastic, and they tested this theory using a database of 3,629 quotes, 759 of which people had tagged as sarcastic. By comparing the word vectors in each quote and looking for similarities and dissimilarities, the researchers found the algorithm improved sarcasm detection. Joshi says the algorithm's errors likely arise because many words have multiple meanings that Word2Vec does not capture, or because the word pairs have high similarity scores. He also notes that in some cases the algorithm incorrectly identified sentences as sarcastic.
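The idea of comparing word vectors within a quote can be sketched roughly as follows. This is an illustrative guess at the kind of features involved, not the researchers' implementation: the function `similarity_features`, the feature names, and the two-dimensional toy vectors are all assumptions made for this example.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_features(words, vectors):
    """Score every pair of known words and summarize the extremes.

    Returns None when fewer than two words have a vector, because
    no pair can be compared in that case.
    """
    known = [w for w in words if w in vectors]
    scores = [
        cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(known, 2)
    ]
    if not scores:
        return None
    return {
        "max_similarity": max(scores),  # most similar word pair
        "min_similarity": min(scores),  # most dissimilar word pair
        "spread": max(scores) - min(scores),
    }

# Hypothetical toy vectors (NOT real Word2Vec values).
TOY_VECTORS = {
    "love":    [0.9, 0.1],
    "adore":   [0.8, 0.2],
    "ignored": [-0.7, 0.6],
}

# Words without a vector ("i", "and", "being") are skipped.
quote = ["i", "love", "and", "adore", "being", "ignored"]
features = similarity_features(quote, TOY_VECTORS)
```

A large spread between the most similar and most dissimilar pair suggests a quote that contrasts concepts. In practice such numbers would more plausibly be fed to a trained classifier alongside other features than used as a standalone rule.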
From Technology Review
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA