Communications of the ACM

ACM News

The View from On High

By Sandrine Ceurstemont
Commissioned by CACM Staff
February 23, 2021

Earth's western hemisphere, as seen from space. Satellite imagery could address the limitations in traditional sources of data.

Credit: U.S. National Aeronautics and Space Administration

Societal problems such as poverty and disease outbreaks have typically been monitored by gathering data from people on the ground. However, views of the Earth captured by satellites are now being investigated as alternative sources of data that can be analyzed using machine learning (ML).

"The big jump was the development of new satellites that provided high-resolution imagery where you can see more things," says Stefano Ermon, assistant professor in the Department of Computer Science at Stanford University, "and recent developments in computer vision and machine learning that enabled us to actually make sense of these images."

Using satellite images could help address limitations in traditional sources of data. When monitoring the spread of a disease, such as during the current pandemic, for example, test results are typically the main source of information, but accessing results quickly can be difficult. "Sometimes test results lag behind, or people get tested but the transmission of that information to relevant authorities can take time," says Elaine Nsoesie, assistant professor at the Boston University School of Public Health in Massachusetts.

Information extracted from satellite images can be available within a few days to help spot an emerging outbreak before traditional data is available. In a feasibility study conducted several years ago, Nsoesie and her colleagues wanted to assess whether high-resolution images of hospital parking lots captured by satellites could help estimate cases of respiratory virus illnesses, where there are seasonal surges, by showing related changes in traffic levels. "We were thinking that if we were in a situation where we didn't really have good data on an outbreak, could we look at how people were using the hospital parking lot as a way to inform us," says Nsoesie.

To test their idea, the team used 2,890 satellite images of 15 healthcare facilities in Argentina, 26 in Mexico, and 13 in Chile, taken between January 2010 and May 2013. They focused on specific features of the images, such as the number of cars in the parking lot relative to the number of spots available and the number of occupied parking spaces surrounding the hospital, to train a simple ML model called Elastic Net to estimate the percentage of flu-like illnesses each week in each country. This data was sourced from the Pan American Health Organization (PAHO).
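For readers unfamiliar with Elastic Net, the following is a minimal sketch of the technique: a linear regression penalized by a mix of L1 and L2 terms, fitted here by coordinate descent. It is not the team's pipeline. The data is synthetic, and the feature names (`lot_occupancy`, `street_occupancy`), the penalty settings (`alpha`, `l1_ratio`), and the helper `fit_elastic_net` are all assumptions made for illustration.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_elastic_net(rows, targets, alpha=0.01, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for (1/2n)*||y - Xw - b||^2
    + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1 - l1_ratio)*||w||^2."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in rows]
    target_mean = sum(targets) / n
    residual = [t - target_mean for t in targets]  # all weights start at zero
    weights = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            column = [row[j] for row in centered]
            # Covariance of feature j with the residual that ignores feature j.
            rho = sum(x * (r + x * weights[j]) for x, r in zip(column, residual)) / n
            scale = sum(x * x for x in column) / n
            new_weight = soft_threshold(rho, alpha * l1_ratio) / (scale + alpha * (1 - l1_ratio))
            # Keep the residual in sync with the updated weight.
            residual = [r - x * (new_weight - weights[j]) for x, r in zip(column, residual)]
            weights[j] = new_weight
    intercept = target_mean - sum(w * m for w, m in zip(weights, means))
    return intercept, weights

# Synthetic weekly observations (hypothetical, not PAHO data).
random.seed(0)
rows, targets = [], []
for _ in range(60):
    lot_occupancy = random.uniform(0.3, 0.9)     # cars / available spots
    street_occupancy = random.uniform(0.0, 1.0)  # share of nearby street spaces in use
    rows.append([lot_occupancy, street_occupancy])
    # Assumed relationship: illness percentage rises with lot occupancy only.
    targets.append(2.0 + 5.0 * lot_occupancy + random.gauss(0.0, 0.1))

intercept, weights = fit_elastic_net(rows, targets)

def predict(row):
    """Estimated percentage of flu-like illness for one week's features."""
    return intercept + sum(w * x for w, x in zip(weights, row))

print("weights:", weights)
print("estimate for an 80%-full lot:", predict([0.8, 0.5]))
```

On this toy data, the penalty shrinks the weight on the uninformative street feature toward zero while keeping a clearly positive weight on lot occupancy, which is the behavior that makes Elastic Net attractive when only some image features carry signal.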

The team created models using satellite photos captured at different times, ranging from 52 weeks before a week of interest to four weeks afterward. They also looked at whether weather, social unrest, or natural disasters influenced hospital parking lot occupancy, and therefore could skew their results for influenza-type conditions.
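The idea of comparing images taken at different offsets from a week of interest can be illustrated with a simpler stand-in for the team's fitted models: a lagged correlation between parking-lot occupancy and illness rates. In the sketch below, both weekly series are synthetic and noise-free, occupancy is assumed to lead illness by two weeks, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def lagged_correlation(occupancy, illness, lag):
    """Correlate occupancy observed `lag` weeks before each week of interest
    (a negative lag means the image was taken after that week)
    with that week's illness rate."""
    pairs = [(occupancy[t - lag], illness[t])
             for t in range(len(illness))
             if 0 <= t - lag < len(occupancy)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)

# Three years of synthetic weekly data with a 52-week seasonal cycle.
n_weeks = 156
illness = [5.0 + 3.0 * math.sin(2 * math.pi * t / 52) for t in range(n_weeks)]
# Assumption: parking-lot occupancy rises two weeks before reported illness.
occupancy = [0.6 + 0.1 * math.sin(2 * math.pi * (t + 2) / 52) for t in range(n_weeks)]

# Same window as described above: 52 weeks before to four weeks after.
correlations = {lag: lagged_correlation(occupancy, illness, lag)
                for lag in range(-4, 53)}
best_lag = max(correlations, key=correlations.get)
print("most informative lag (weeks before):", best_lag)
```

With real imagery the curve would be noisier, and confounders such as weather or unrest would need to be checked separately, as the team did.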

Nsoesie and her colleagues found their models could predict respiratory illness trends quite well using recent satellite observations, such as hospital parking lot images captured a few weeks before a week of interest. Although there was a significant correlation between civil unrest in Mexico and hospital parking lot data, the other variables didn't seem to play a role.

However, information from these models is not likely to be enough to predict an outbreak. According to Nsoesie, they would act more as a prompt that should be followed up with further investigation. "There would be a second step where there is testing to confirm that actually there is an outbreak happening in that place," she says. "It could be that people are reporting that there is a potential new disease because symptoms are different, but we still need laboratory confirmation of that."

Using multiple sources of data can help provide additional clues. In follow-up work, Nsoesie and her team have been investigating whether satellite imagery of hospital parking lots could help reveal whether the Covid-19 virus was circulating in China prior to November 2019, when the first case was detected, as some evidence suggests. They also are examining search queries from a Chinese search engine to see whether certain virus symptoms were also being searched for earlier on. That would show whether the same trend is apparent in different data sources, says Nsoesie.

A lack of data is also driving the use of satellite images to help alleviate poverty. Asset wealth is typically assessed through surveys, which involve visiting households and filling out questionnaires. However, because the process is expensive, data for many African countries is sparse: surveys are usually more than four years apart and, in some countries, have never been carried out at all. Assessing changes in wealth to meet targets, such as the United Nations' sustainable development goal to eradicate extreme poverty by 2030, can therefore be difficult, says Ermon. "How do you measure whether you're making progress towards that?" he says. "How do you figure out whether interventions are working and whether aid is deployed in the way it should be?"

Since satellite images are updated frequently, and are often publicly available and therefore inexpensive, they are promising as an alternative data source. In recent work, Ermon and his colleagues used deep learning convolutional neural networks (CNNs), which are well-suited for processing images, to develop models that could predict asset wealth based on satellite images.

To train their models, algorithms learned to associate patterns in satellite images with asset wealth values from survey data from 19,669 villages in 23 African countries. Images captured both during the day and at night were used since they reveal different features relevant to wealth: houses or agricultural land can be seen in daytime views, for example, whereas light intensity at night can help assess access to electricity.

The models then were tested by predicting the asset wealth in places where survey data was available, but had not been used in training. They performed well and were able to account for about 70% of the variation between different regions seen in survey data they hadn't been trained on. Discrepancies between survey estimates conducted by different organizations were also of a similar order of magnitude to differences in their predictions. "We believe that there is noise in the ground truth data, just because of the way it's collected, so we actually think that we are fairly close to what can be done," says Ermon.
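One common way to quantify how much variation a model accounts for is the coefficient of determination, R², computed on held-out data. The sketch below shows that calculation on a handful of made-up wealth-index values. The article does not say exactly which statistic the team reported, so treat the choice of R² and every number here as an assumption.

```python
def r_squared(observed, predicted):
    """Share of the variance in `observed` accounted for by `predicted`."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical asset-wealth index values for villages held out of training.
held_out_survey = [1.0, 2.0, 3.0, 4.0, 5.0]
model_predictions = [1.5, 1.5, 3.5, 3.5, 5.5]

score = r_squared(held_out_survey, model_predictions)
print(f"R^2 on held-out villages: {score:.3f}")
```

A score of 0.70 on this measure would correspond to the "about 70% of the variation" described above. Because the survey values themselves are noisy, a perfect score of 1.0 is not attainable in practice.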

Interpretability is an issue, though. Since CNNs learn on their own, Ermon and his colleagues are not exactly certain which features their models were learning to recognize. Using visualization techniques, they were able to get some clues, which suggest they were picking out meaningful features related to asset wealth such as urban areas, farmland, and desert. In follow-up work, however, the team used a more transparent approach where CNNs were trained to recognize relevant objects in an image, such as buildings and vehicles, to assess poverty in Uganda.

Jonathan Hersh, assistant professor of economics and management science at the Chapman University Argyros School of Business in Orange County, CA, who was involved in similar work to determine economic well-being in Sri Lanka, thinks using interpretable models is a good approach partly because it helps with model trust. "Performance should be higher with non-interpretable features, however in many examples there doesn't appear to be a clear diminution of performance from using interpretable features," he says.

Ermon thinks his team's models could be improved by using higher-resolution satellite images, in which more details of a location can be seen. However, these images are not publicly available, could be expensive to acquire and also could require more computational resources to analyze.

Hersh also thinks there are additional risks when moving away from open data. "Once the data pipeline is established the imagery provider has pricing power to increase the cost of imagery," he says. "Their pricing power is decreased with an outside option that works well (as is the case with freely-available imagery)."

Using additional sources of data is another avenue to pursue. Crowdsourced images collected on the ground from GoPro cameras, for example, could be incorporated with information from satellite images to improve accuracy. "There's a lot of information there that you can't see from space," says Ermon.

Thanks to interest from a wide range of organizations, Ermon and his colleagues launched a start-up a few years ago that is applying their algorithms to inform a wider range of societal issues. Called Atlas AI, the company has been involved in projects such as using satellite imagery for estimating crop areas and yields in Malawi and Ethiopia, and maize yields in Kenya during the Covid-19 pandemic. "There are all kinds of things you can do with this technology," says Ermon. "We're pretty excited about the response that we got in the market."

Sandrine Ceurstemont is a freelance science writer based in London, U.K.
