Chinese Academy of Sciences researchers have designed a novel mutual attention inception network (MAIN) and a remote sensing visual question answering (RSIVQA) dataset.
RSIVQA aims to add objectivity and interactivity to the semantic understanding of remote sensing images (RSIs); most existing techniques are limited because they disregard both the spatial information of RSIs and the word-level semantic information of questions.
MAIN combines a representation module and a fusion module: the former acquires image and question features that provide better representations, while the latter strengthens the discriminative power of those representations by reinforcing image-question alignments, which helps yield correct answers.
Experimental results indicated that the method can identify image-question alignments under multiple evaluation metrics.
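The mutual-attention idea the summary describes can be sketched roughly as follows. This is an illustrative assumption of how image-region features and question-word features might cross-attend and fuse, not the authors' exact MAIN architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(img_feats, word_feats):
    """Cross-attend R image regions and W question words (hypothetical sketch).

    img_feats:  (R, D) region features; word_feats: (W, D) word features.
    """
    # Affinity between every (region, word) pair: shape (R, W).
    affinity = img_feats @ word_feats.T
    # Each region attends over words; each word attends over regions.
    img_ctx = softmax(affinity, axis=1) @ word_feats    # (R, D)
    word_ctx = softmax(affinity, axis=0).T @ img_feats  # (W, D)
    # Fuse attended context with the original features and pool.
    img_repr = (img_feats * img_ctx).mean(axis=0)       # (D,)
    word_repr = (word_feats * word_ctx).mean(axis=0)    # (D,)
    return np.concatenate([img_repr, word_repr])        # (2D,)

rng = np.random.default_rng(0)
fused = mutual_attention(rng.standard_normal((5, 8)), rng.standard_normal((4, 8)))
print(fused.shape)  # (16,)
```

The fused vector would then feed an answer classifier; the element-wise fusion and mean pooling here are placeholder choices.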
From Chinese Academy of Sciences
Abstracts Copyright © 2021 SmithBucklin, Washington, DC, USA