
Communications of the ACM

Opinion

Generative AI Degrades Online Communities


[Illustration: a balloon with a question mark above three figures holding their hands toward a digital face. Credit: Andrij Borys Associates, Shutterstock.AI]

Imagine you are at a crossroads in a complex project and need quick answers on how to grapple with a problem. You might well turn to an online knowledge community, one hosted by your company, or perhaps Stack Overflow, Quora, or Reddit. These communities have come to play a central role in knowledge exchange across many corners of the economy and society, but they depend on voluntary participation from users just like you and me.

Our recent research indicates an intriguing shift is now taking place: generative AI technologies, such as OpenAI's large language model (LLM) ChatGPT, are disrupting the status quo. Increasingly, users are gravitating toward these new AI tools to obtain answers, bypassing traditional knowledge communities. In this column, we delve into recent work documenting ChatGPT's influence on user participation in online communities. A key insight we offer is that communities lacking social fabric are suffering most. We then propose a research agenda for better understanding these evolving impacts of generative AI.


(Some) Online Knowledge Communities Are Struggling

In our recent research, we estimate that, by late March 2023, ChatGPT had driven an approximate 12% reduction in average daily Web visits to StackOverflow.com, the world's largest online knowledge community for software developers. Further, among the 50 most popular topics on Stack Overflow, we estimate the average volume of questions posted per week had declined by more than 10%, per topic, and we find that the declines in community participation have in turn led to a significant degradation in the quality of answers the community provides. This combination of findings raises the prospect of a vicious cycle, with negative implications for the long-term health and sustainability of online knowledge communities.4,5 These concerns are not necessarily limited to StackOverflow.com; the potential exists for a similar dynamic to play out in any online knowledge community, including those that are private and firm-hosted, catering to employees.a That said, we have also found that ChatGPT's negative effects depend crucially on context. So, when do these negative consequences emerge and what can we do about them?
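The column does not detail how these estimates were obtained, but a standard approach for questions of this kind is a difference-in-differences comparison of a treated platform against a plausible control. The following minimal sketch in Python illustrates the idea; the data file, column names, and choice of control platform are all assumptions for illustration, not the authors' actual data or code.

# Hypothetical difference-in-differences sketch: weekly posting volume on a
# treated platform (for example, Stack Overflow) versus a control platform,
# before and after ChatGPT's release on November 30, 2022.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("weekly_activity.csv")  # assumed columns: platform, week, posts
df["treated"] = (df["platform"] == "stackoverflow").astype(int)
df["post_chatgpt"] = (pd.to_datetime(df["week"]) >= "2022-11-30").astype(int)

# With a logged outcome, the interaction coefficient approximates the
# percentage change in posting volume attributable to ChatGPT.
model = smf.ols("np.log(posts) ~ treated * post_chatgpt", data=df).fit()
print(model.params["treated:post_chatgpt"])  # the DiD estimate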


ChatGPT Excels when It Has Training Data

ChatGPT generates believable text about nearly any subject, but there is a big difference between "believable" and "correct." ChatGPT, like other LLMs, is trained on large swaths of publicly available data, in large part scraped from online forums such as Stack Overflow and Reddit. Given differences in the volume of available data, ChatGPT's performance naturally varies by topic and may in turn affect communities to different degrees.

We observed that ChatGPT's impact on Stack Overflow participation varies significantly across topics, in line with its expected performance given the training data available for each. Topics related to open-source tools and general-purpose programming languages (for example, Python, R) appeared to experience larger declines in participation and contribution than proprietary and closed technologies, such as those employed for enterprise server-side development (for example, Spring Framework, AWS, Azure).

For better or worse, the quality of output LLMs can produce based on such publicly available data appears to be peaking. Recent work has documented that GPT, for example, has begun to exhibit declines in the quality of its output.3 It has been suggested this decline in performance is the expected result of a feedback loop, wherein data collected for training is increasingly contaminated by GPT itself, as users leverage the technology to produce and post content online.12 In an ironic twist, this suggests that the incentives of generative AI companies may be aligned with those of society more broadly; they have a vested interest in encouraging users to continue contributing organic, unadulterated content.
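To see why such a feedback loop degrades output, consider a toy simulation of what Shumailov et al.12 call model collapse: each "generation" of a model is fit only to samples produced by the previous generation. The sketch below is a deliberately simplified illustration (a Gaussian stands in for a language model, and the tail truncation mimics a model under-representing rare content); it is not the cited paper's experimental setup.

# Toy sketch of model collapse: refitting a simple model to its own
# synthetic output. Tail information is lost each round, so the fitted
# spread of the data shrinks generation by generation.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=10_000)  # stand-in for organic human content

for generation in range(1, 6):
    mu, sigma = data.mean(), data.std()
    synthetic = rng.normal(loc=mu, scale=sigma, size=10_000)
    # drop extreme samples to mimic under-sampling of rare content
    data = synthetic[np.abs(synthetic - mu) < 2.0 * sigma]
    print(f"generation {generation}: fitted sigma = {data.std():.3f}")

Run it and the fitted sigma falls steadily (roughly 12% per generation under these assumptions), a crude analogue of a model's outputs growing blander as organic content is displaced by synthetic content.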


ChatGPT Still Does Not Substitute for Human Social Connections

LLMs are better suited to some tasks than others. Users engage in online knowledge communities for a variety of reasons, beyond the simple desire to obtain information. While generative AI can often serve as a useful source of information, its capacity to substitute for human social connections is much weaker. Many communities manage to foster a sense of solidarity and peer attachment among members.6,8 Consider that, whereas Stack Overflow is notorious for its focus on pure information exchange,b Reddit is comparatively social in nature.1,10 Repeating our analysis with data from Reddit communities that focus on the same sets of technology topics we considered at Stack Overflow, we found virtually no evidence of any decline in participation following ChatGPT's emergence (we depict these divergent effects graphically in the accompanying figure). It therefore appears that a robust social fabric will be crucial to the health and sustainability of online knowledge communities going forward.

Figure. Estimates of ChatGPT's effect on Stack Overflow weekly question volumes (left) versus Reddit posting volumes (right).


Research Agenda: Knowledge Management in the Era of Generative AI

Our findings raise several important, open questions and issues social and computer scientists can and should look to address going forward. These questions collectively relate to the role of individuals in knowledge production and sharing, and how those roles change in the face of advancing AI.

Social interaction is key. While our recent work suggests social connection can provide some protection against the eroding influence of generative AI on online knowledge communities, how one achieves social connection is rather open-ended.7 So, in the presence of generative AI, how can online communities be redesigned to facilitate an increased focus on social interaction, while maintaining the quality and efficiency of knowledge provision and search? How might platform features be adjusted to encourage users to engage with each other instead of, or as a complement to, AI-generated content? One useful prospect to consider is that peer experts can provide a helpful point of verification, ensuring the information supplied by an LLM is accurate and optimal. More generally, how can AI be leveraged to enhance, rather than replace, human interactions in online communities?


An important approach to consider is incorporating LLMs directly into the community interface, a prospect that requires thoughtful design. Indeed, Stack Overflow has recently announced OverflowAI, an in-house LLM that integrates with the Stack Overflow user interface.c However, this strategy is likely to be less successful in an open setting, because other LLM alternatives exist for users outside Stack Overflow's domain. By contrast, the strategy is likely to be more successful inside a firm-hosted online knowledge community, if employees lack access to outside alternatives as a matter of policy (for example, in the presence of employer bans on employees' use of third-party generative AI).
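Mechanically, such an integration could pair LLM drafting with the peer-verification step discussed earlier: the model proposes an answer, and a human expert approves or rejects it before publication. The sketch below is a hypothetical illustration using the OpenAI Python SDK; the review workflow, function names, and prompt are our assumptions, not OverflowAI's actual design.

# Hypothetical "draft, then peer-verify" flow for an LLM embedded in a
# knowledge community. Only the chat-completions call is a real API; the
# review step is an assumed stand-in for human expert moderation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def draft_answer(question: str) -> str:
    """Ask the model for a draft answer to a community question."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Draft a concise answer for a programming Q&A site. "
                        "Flag any claims you are unsure about."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

def publish_if_approved(question: str, peer_review) -> str | None:
    """Route the draft to a human expert; publish only on approval."""
    draft = draft_answer(question)
    return draft if peer_review(draft) else None  # human stays in the loop

A design like this keeps peer experts in the role our findings suggest matters most: as the social, verifying layer on top of machine-generated information.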

Generative AI usage policies. What other strategies might be pursued to encourage continued user engagement and knowledge sharing in an online community? As suggested in this column, many organizations are presently considering acceptable use policies or outright bans on the use of generative AI in the workplace. Those initiatives have largely been motivated by information security concerns, but they nonetheless have the potential to help ensure sustained knowledge sharing. Samsung, as one example, has recently banned employees' use of LLMs after discovering confidential company data had been ingested by GPT.d That said, anecdotal media reports indicate that employees continue to employ ChatGPT, despite workplace bans.e A lengthy literature on employee policy compliance speaks to whether and when employees abide by workplace regulations.9 Future work might explore these generative AI usage bans and acceptable use policies, to understand compliance and impacts.




Rewarding users for their contributions. Users participate in online communities voluntarily and they face challenges internalizing the value of the content they contribute.2 As a result, content is often scarce and "under-provided." Generative AI tools are exacerbating the problem by shrinking the audience a user can expect their contributions to reach. Further, generative AI tools are trained on data scraped from the Internet, which has prompted negative reactions from online community members. Several artists have recently filed class-action lawsuits against Stability AI and Midjourney,f and writers of popular fan fiction have begun to withhold their contributions, to prevent AI companies from profiting off their work.g So, how can we compensate users for their content? Reddit and Twitter have begun charging for API use, in part hoping to obtain payment from AI companies for their users' data, a change that has only made matters worse, driving contributors to exit.h Reddit moderators recently went on strike, protesting the impact of the new payment policy on accessibility and moderation tools built around the free API, and raising questions about whether Reddit should be passing some revenues on to users.

Redesigning approaches to training and education. In addition to devising interventions and strategies for managing employees, knowledge sharing, and online knowledge communities in the age of generative AI, scholars must also attend to the implications of generative AI tools as a novel source of information, in lieu of peers, as this has potentially far-reaching implications for student and employee education and training. Online communities typically provide a rich source of learning opportunities, with users able to learn not only from the answers to their own questions but also from the questions and answers of peers. If such opportunities begin to diminish and users begin to rely increasingly on generative AI tools in isolation, this raises the question of how new knowledge will be produced, documented, and shared. A lengthy literature on knowledge hiding considers the antecedents and consequences of knowledge withholding in organizations,11 and how to manage the impediments that give rise to it. Future work can explore the design of explicit policies, processes, and incentives around knowledge sharing, accounting for generative AI use, as a means of sustaining knowledge flows. So, how might organizations incorporate generative AI tools into existing training and retraining efforts?


Conclusion

Generative AI is having large, negative impacts on user participation and contributions to online knowledge communities. As users depart, the average quality of contributions is also beginning to decline, raising the prospect of a vicious cycle. As we continue to navigate this new landscape, it is crucial that we develop an understanding of the consequences of generative AI. We must work to identify strategies and information system designs that can ensure the health and sustainability of online knowledge communities, and of knowledge sharing more broadly.


References

1. Antelmi, A. et al. The age of snippet programming: Toward understanding developer communities in Stack Overflow and Reddit. In Companion Proceedings of the ACM Web Conf. 2023 (Apr. 2023), 1218–1224.

2. Burtch, G. et al. How do peer awards motivate creative content? Experimental evidence from Reddit. Management Science 68, 5 (2022), 3488–3506.

3. Chen, L. et al. How is ChatGPT's behavior changing over time? (2023); arXiv preprint arXiv:2307.09009.

4. Faraj, S. et al. Special section introduction—Online community as space for knowledge flows. Information Systems Research 27, 4 (2016), 668–684.

5. Hwang, E.H. et al. Knowledge sharing in online communities: Learning to cross geographic and hierarchical boundaries. Organization Science 26, 6 (2015), 1593–1611.

6. Katz, J.E. and Rice, R.E. Social Consequences of Internet Use: Access, Involvement, and Interaction. MIT Press (2002).

7. Kraut, R.E. and Resnick, P. Building Successful Online Communities: Evidence-Based Social Design. MIT Press (2012).

8. Ren, Y. et al. Applying common identity and bond theory to design of online communities. Organization Studies 28, 3 (2007), 377–408.

9. Sarkar, S. et al. The influence of professional subculture on information security policy violations: A field study in a healthcare context. Information Systems Research 31, 4 (2020), 1240–1259.

10. Sengupta, S. Learning to code in a virtual world: A preliminary comparative analysis of discourse and learning in two online programming communities. In Conf. Companion Publication of the 2020 on Computer Supported Cooperative Work and Social Computing (Oct. 2020), 389–394.

11. Serenko, A. and Bontis, N. Understanding counterproductive knowledge behavior: antecedents and consequences of intra-organizational knowledge hiding. J. Knowledge Management 20, 6 (2016), 1199–1224.

12. Shumailov, I. et al. The curse of recursion: Training on generated data makes models forget. (2023); arXiv preprint arXiv:2305.17493.


Authors

Gordon Burtch ([email protected]) is Kelli Questrom Chair and Associate Professor in Information Systems and Fellow of the Digital Business Institute at the Questrom School of Business, Boston University, Boston, MA, USA.

Dokyun Lee ([email protected]) is Kelli Questrom Associate Professor of Information Systems at the Questrom School of Business, Boston University, Boston, MA, USA.

Zhichen Chen ([email protected]) is a Ph.D. student in Information Systems at the Questrom School of Business, Boston University, Boston, MA, USA.


Footnotes

a. The Stack Overflow community infrastructure is also used by thousands of private organizations: https://try.stackoverflow.co/get-teams

b. https://bit.ly/3HCvHum

c. https://bit.ly/3u4yxFH

d. https://bit.ly/42ivR3I

e. https://cnb.cx/3UkEAQQ

f. https://bit.ly/3uf10sg

g. https://bit.ly/3HCvEia

h. https://bit.ly/47YODOR


© 2024 Copyright held by the owner/author(s).


Comments


Christopher Rousseau

The decline in contributions does not surprise me, and it is worrying. LLMs do not create knowledge. Their very nature is derivative: they depend on their training data to generate responses. LLMs are valuable because they can draw from much larger datasets than a human can, but they are still derivative. One example dataset they draw from is Stack Overflow. If people stop contributing to Stack Overflow, LLM training data will stagnate and the quality of LLM responses will degrade. This would create another vicious cycle, with LLMs generating increasingly poor results. Perhaps that will drive people back to online communities, but the long-term damage to those communities is hard to forecast, and they may not return to their current vibrancy.

