
Communications of the ACM

BLOG@CACM

Human Subjects Research For the Twenty-First Century, Or, What Can We Learn from the Facebook Mood Study?


Carnegie Mellon Associate Professor Jason Hong

By now, everyone online has heard about the Facebook mood study. The researchers selected a small subset of Facebook users and ran an A/B study, with one group seeing posts that were more positive and the other seeing posts that were more negative. The study evaluated how seeing more positive or more negative posts would affect people's moods. The researchers found a very small increase in positive words in the status updates of people in the positive condition, and vice versa.

The resulting media storm and discussion on social media forums have been fierce, to say the least. The concerns raised include the lack of explicit informed consent, people saying that they felt manipulated, and worries that the study may have led to harm for some individuals. There are even reports of politicians and regulators investigating the study.

My stance is that there is tremendous value to society in doing these kinds of large-scale A/B studies, and the current framework we have for human subjects research is a poor match for the 21st century. Furthermore, the Facebook mood study is just one of many examples of challenges we as a community will face in the future as increasingly powerful tools for understanding and influencing human behavior are developed and deployed. How can we make sure that these studies are done ethically, in a way that offers substantive protections to people, and in a manner that scales?

I should also point out for disclosure purposes that I currently have two papers under review with researchers at Facebook.

Why So Many Negative Reactions?

One thing I find very confusing is how the concerns about the mood study are narrowly targeted at scientists publishing research in a public forum. For example:

  • If a politician or pollster sought to influence and study people's moods, mark the checkbox, this is ok. 
  • If an advertising agency did the same, check, this is also ok.
  • If a news web site did the same, check, this is ok. If it bleeds, it leads.
  • If a designer at a company creates some user interfaces intended to influence people's moods, check, this is ok.
  • If a developer at a company wanted to influence people's moods, check, this is ok. (And oftentimes they inadvertently do, by creating frustratingly difficult-to-use interfaces that cause people to yell at their computers.)
  • If an analyst wanted to measure if the designer's or developer's changes had any effect on people's moods, check, this is ok.
  • If a scientist at a company wanted to rigorously experiment and measure changes, but framed it in terms of improving a product and didn't aim to publish it, check, this is ok.
  • But, if a scientist wants to share the results of their measurements and contribute to the body of knowledge in a scientific venue... whoa, hold on a second, is this ok?

This does not make sense. It seems strange that this last case is viewed as problematic, given that many of the cases above are ok. Two possible counter-arguments are (a) the cases above are not ok (which seems pretty hard to argue, given how widespread the first few cases are), and (b) scientists should be held to a higher standard due to their trusted position in society (which is a good argument).

So let's go with (b), that scientists should be held to a higher standard. How might these requirements be operationalized in practice? 

The Scale of A/B Studies Makes the Current Form of Informed Consent Infeasible

A common issue people have raised is that participants in the Facebook mood study should have given informed consent. However, there are a lot of pragmatic issues regarding the effectiveness of informed consent. In particular, it doesn't seem that people have thought through the issues of scale.

More specifically, if companies like Facebook, Google, Yahoo, Twitter, eBay, and others are doing hundreds of A/B tests per month, it is not very practical to get informed consent for every single study a person might be in. First, it would ruin the user experience. Imagine getting a dozen popups when you go to these sites asking if you want to participate in a study. Second, it would lead to habituation, with people just ignoring or blocking the informed consent notices. Furthermore, if the default is that people are in the study and can opt-out, most people would probably not notice that they agreed to be in a study (how often do you just swat away popups?). If the default is opt-in, researchers would be unlikely to get enough representative participants to do an effective study.

Even if the notices were somehow integrated into the site itself, I seriously doubt they would be more effective. Today's informed consent notifications on the web assume that people are fully rational creatures with a lot of free time on their hands. They are about as effective in informing people as web privacy policies and end-user license agreements (EULAs).

Some counter-arguments here are (a) people shouldn't be in these kinds of studies at all, (b) people should still have some kind of choice in what studies they participate in, or (c) there should be general informed consent, via terms-of-use policies, stating that people may be included in research studies.

I think few people would argue for (a), because there is tremendous potential for improving services and science in general with these kinds of A/B studies. Also, as Ed Felten has noted, very few people would find things like A/B testing of link colors objectionable. Informed consent for studies like these doesn't really make sense, given that there is no potential for harm beyond what people experience in everyday life.

The third case (c) of general informed consent is a tempting option, but isn't substantive in protecting people. It's just marking off the checkbox. A faculty member in my department at Carnegie Mellon University commented that there is a well-known web site developed by researchers that tells people when they sign up that they may be put in studies, and that this was sufficient for their Institutional Review Board (IRBs review human subjects studies for organizations receiving federal funding). This approach reminds me of something a privacy analyst once told me about smartphone apps. He half-joked that, to pass legal muster, developers should just say "we collect any data we want, and we will use it for whatever purposes we want." These kinds of statements may satisfy lawyers, but they offer only a veneer of protection.

Using Big Data Approaches to Solve Big Data Problems

The second case (b), that people should still have some kind of choice, is the most interesting. How might we offer people choice and informed consent in a manner that also scales? I think this direction is worth serious consideration.

The most obvious approach is to establish an equivalent of an IRB in companies, to govern scientific research focusing on generalizable knowledge, and possibly studies aimed just at improving the product. This is an idea that many have advocated, including danah boyd and Matthew Salganik.

It's also worth thinking creatively about informed consent and assessing the level of harm. Review boards could easily become bottlenecks if hundreds of A/B studies are being run at any given time. One option might be to have general user profiles that let people say what classes of studies are ok for them to participate in, though this leads to issues of opt-in versus opt-out, as well as knowing what the categories are upfront.
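One way to make such profiles concrete is as a small, default-deny record of the study categories a person has opted into, consulted before enrollment. The sketch below is purely illustrative; the category names, the StudyPreferences class, and the default-deny policy are assumptions for the sake of the example, not any company's actual system.

    # Illustrative sketch only: a per-user profile of study categories a person
    # has opted into, checked before enrolling them in an experiment.
    # Category names and the default-deny policy are assumptions, not a real system.

    class StudyPreferences:
        def __init__(self, opted_in=None):
            # Default-deny: the user participates only in categories they opted into.
            self.opted_in = set(opted_in or [])

        def allows(self, study_category):
            return study_category in self.opted_in

    # Usage: a user comfortable with interface tweaks but not mood experiments.
    prefs = StudyPreferences(opted_in={"usability", "ranking"})
    print(prefs.allows("usability"))  # True
    print(prefs.allows("emotion"))    # False -> would need explicit consent or exclusion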

Another option might be to map out characteristics of studies that don't need approval, and those that do. For example, studies dealing with basic usability issues - such as color, layout, or text labels - should very likely be exempt. Studies that touch on issues of sexuality or grieving should probably be reviewed very closely.

Can we use big data approaches to solve big data problems? Crowdsourcing approaches might actually be effective in getting large quantities of data about what kinds of studies people generally feel are safe and innocuous, versus those that are not. As one example, Olson, Grudin, and Horvitz had 30 participants fill out forms to gauge how sensitive various pieces of information were, for example age, marital status, salary, health status, email, and so on. Using this data, they were able to create models showing different levels of sensitivity to different classes of information. In some work with my colleagues, we used crowdsourcing approaches to assess people's level of comfort regarding the use of smartphone data for particular purposes (e.g. location for ads, or contact list for social networks). Using this data, we have been looking at how to create predictive models of people's privacy concerns.
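As a toy illustration of the general idea (this is not the procedure used by Olson, Grudin, and Horvitz, nor our own predictive models), one could average crowdsourced comfort ratings per study category and flag the low-comfort categories for closer review. The categories, ratings, and threshold below are invented for the sketch.

    # Toy sketch: aggregate crowdsourced comfort ratings (1 = very uncomfortable,
    # 5 = very comfortable) per study category and flag categories whose mean
    # comfort falls below a threshold. All data and the threshold are made up.

    from collections import defaultdict

    ratings = [
        ("link color",   5), ("link color",   4), ("link color",   5),
        ("news ranking", 3), ("news ranking", 4),
        ("mood",         2), ("mood",         1), ("mood",         2),
    ]
    REVIEW_THRESHOLD = 3.5

    by_category = defaultdict(list)
    for category, score in ratings:
        by_category[category].append(score)

    for category, scores in by_category.items():
        mean = sum(scores) / len(scores)
        verdict = "likely exempt" if mean >= REVIEW_THRESHOLD else "needs close review"
        print(f"{category}: mean comfort {mean:.1f} -> {verdict}")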

A third option might be to use crowdsourcing to ask a small slice of the population what they think about a study before rolling it out. In some ways, this approach would mimic IRB structures, which often include a person from the local community. A variant would be to make public a list of what studies a company is running, which could improve transparency. However, companies would likely be reluctant to share anything that might tip off competitors on possible upcoming features or give them ideas of things that they could also be testing.

A fourth option might be to do small-scale studies with full informed consent first, along with a debriefing afterward. If participants report only minimal potential for harm, then the study could get the green light to be scaled up.
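To make that staged approach concrete, here is a minimal sketch of the gating step, assuming a hypothetical debriefing survey with a felt_harmed question and an arbitrary 5% threshold; neither reflects any actual review policy.

    # Minimal sketch of a staged-rollout gate: run a small pilot with full informed
    # consent and a debriefing survey, then scale up only if the fraction of
    # participants reporting harm stays below a threshold. The survey field name
    # ("felt_harmed") and the 5% threshold are assumptions for illustration.

    def approve_scale_up(debrief_responses, harm_threshold=0.05):
        """debrief_responses: list of dicts such as {"felt_harmed": False}."""
        if not debrief_responses:
            return False  # no pilot data, no green light
        harmed = sum(1 for r in debrief_responses if r.get("felt_harmed"))
        return harmed / len(debrief_responses) <= harm_threshold

    # Usage with made-up pilot data: 1 of 50 participants reported feeling harmed.
    pilot = [{"felt_harmed": False}] * 49 + [{"felt_harmed": True}]
    print(approve_scale_up(pilot))  # True -> 2% is below the assumed 5% threshold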

These last two options of involving participants in what studies are conducted could also have other benefits. Rather than features and changes being pushed top-down, people might feel they have more ownership in the direction of their community. Rather than just being consumers, people might also feel like active and engaged citizens. (As an amateur historian, I am reminded of how the ancient Greeks used to fill government offices by randomly selecting people from the general population. See Wikipedia's article on sortition for fun details.)

Circling Back: What Really Matters for Human Subjects Research

I strongly believe that the underlying rationale behind IRB and human subjects still makes a lot of sense. The principles of respect for persons, beneficence, and justice, as outlined in the Belmont Report, still matter. However, these principles should not be confused with the mechanisms in place today, as these mechanisms are becoming increasingly mismatched with emerging tools and techniques.

It's also worth considering whether new principles are needed to guide human subjects research, both federally funded and corporate. The kinds of challenges we are seeing with large-scale A/B testing are just the tip of the iceberg. We really are on the cusp of a scientific revolution: the widespread adoption of social media and smartphones, the coming deployment of wearable technologies and sensor networks, and the rise of the quantified self are making it possible to understand human behavior at a fidelity and scale we have never had before in history.

The potential of this new kind of science for bettering humanity is tremendous. But there are also legitimate concerns about privacy, self-determination, autonomy, who wins, and who loses.

There are also pragmatic concerns about scale, information overload, and expectations that are not addressed very well today. Things that might work for a single study in isolation may be less effective if people are part of dozens of studies.

We also need substantive and practical protections for people, rather than simply checking boxes. However, these protections need to be proportionate and not overly burdensome on designers, developers, and researchers. A mistake that some IRBs have made is lumping medical, social, ethnographic, and HCI research all together, burdening some researchers with processes and requirements that don't make sense for their methods and their context. Helping to map out levels of acceptability and potential harm could go a long way toward shedding light on this issue.

I don't have the answers. It's likely that we will be debating these issues for many years. I'll end with a question instead: how can we create a connected world that we would all want to live in?


 
