
Communications of the ACM

Review articles

Rethinking Search Engines and Recommendation Systems: A Game Theoretic Perspective


[Figure: maze extends to horizon. Credit: Swill Klitch]

In her popular book, Weapons of Math Destruction, data scientist Cathy O'Neil elegantly describes to the general public the danger of the data science revolution in decision making. She describes how the US News ranking of universities, which orders universities based on 15 measured properties, created new dynamics in university behavior, as universities adapted to these measures, ultimately resulting in decreased social welfare. Unfortunately, the phenomenon she describes, in which data science-related algorithms such as ranking cause changes in behavior, and in which these dynamics may lead to socially inferior outcomes, is pervasive in our new online economy.


Ranking also plays a crucial role in search engines and recommendation systems—two prominent data science applications that we focus on in this article. Search engines (for example, Google and Bing) rank Web pages, images, and other items in response to a query. Recommendation systems endorse items by ranking them using information induced from some context—for example, the Web page a user is currently browsing, a specific application the user is running on her mobile phone, or the time of day.

In the industrial search and information retrieval community, the practice of adapting content to satisfy a search engine's ranking function or that of a recommendation engine has typically been referred to as search engine optimization (SEO), which has become one of the most profitable professions in the data science era. Accordingly, there has been much work on devising algorithms that block spammers and guarantee that content presented to users is of high quality. Yet, virtually all retrieval and recommendation algorithms ignore post-ranking/recommendation effects on the corpus (that is, manipulations applied to items in response to rankings). As it turns out, this reality results in underoptimized fundamental paradigms for devising search and recommendation algorithms, as we discuss here. For example, a publisher of a Web page can try to mimic other Web pages to promote her page in rankings, thereby potentially causing content changes that are not for the better—for example, reducing content breadth in terms of topical coverage.

These observations call for action. We believe the design of search and recommendation methods, which are central to the Web and to online media services, must be revolutionized to account for the strategic incentives of the various parties that affect (or are affected by) the system. The parties can be publishers/content providers, the systems themselves (for example, search engines) that might be in competition with each other, and/or users of the systems. We are in the process of establishing an entirely new repertoire of incentive-compatible search and recommendation algorithms, devised by pioneering the application of game-theoretic mechanism design to these tasks.

Game theory is the branch of mathematics that models multiagent interactions. Mechanism design is the part of game theory that designs protocols/algorithms for environments consisting of self-motivated participants. Mechanism design has been central in bridging computer science and game theory,29 including wide application to electronic commerce (for example, Cramton10), advertising (for example, Varian38) and routing networks (Roughgarden34).

In this article, we survey our recent work on creating theoretical foundations and developing empirically evaluated algorithms to build a fundamental bridge between game theory, and more specifically mechanism design, and search and recommendation. We argue it is only natural to model the competitive search and recommendation settings using game theory. For example, considering the system (search engine or recommendation system) as a mediator and those who produce items to be searched or recommended as players entails a suite of specific game settings. Having an item highly ranked or recommended amounts to a high "payoff" for the owner of the item. Game-theoretic modeling allows us to reason about the strategic behavior of players and the situations the setting can converge to (namely, equilibria). Furthermore, the task of devising search or recommendation algorithms becomes a mechanism design (or more specifically, mediator design) problem—a completely new view of these two important tasks.

The incentivized players we mainly focus on here are content providers/publishers who create and manipulate the retrieved/recommended items. In addition, we describe our work on addressing the competition between search engines—that is, where the engines are the incentivized players—and more generally, competition between prediction algorithms, focusing on regression; and addressing some of the implications of the strategic behavior of users of recommendation systems in social networks.

We discuss some of the fundamental, and far reaching, modeling and algorithmic implications of the competitive and incentivized settings in which search engines operate. Specifically, we examine post-ranking corpus effects and survey our work on a game-theoretic analysis of the strategic behavior of publishers (content providers) and on addressing ranking robustness. We also briefly describe the challenges of empirical evaluation and analysis in competitive search settings. Later, we focus on our theoretical work on the suboptimality in competitive settings of the most basic search (ranking) and recommendation principle and we survey our work on settings where incentivized parties are ranking algorithms and prediction algorithms (regression) that compete with each other.


Search Engines

The main goal of search engines is to rank documents in a corpus by their presumed relevance to the information need expressed by a query posted by a user. There has been a huge body of work on devising relevance estimates for documents. For example, the similarity between term-based vectors representing documents and queries serves as a relevance estimate in the well-known vector space model.35 Modern relevance ranking functions are learned (trained) using a training set which contains queries and corresponding relevance judgments of documents.25,28 These approaches make it possible to integrate various types of relevance estimates.

The theoretical foundation of virtually all ad hoc retrieval paradigms, classical and modern, is the probability ranking principle (PRP):33 documents are ranked by the probability that they are relevant to the query, wherein the probability is estimated using all the information available to the search system. The PRP is the direct basis of the probabilistic retrieval approach20 and was shown to underlie the basic language modeling approach.23 Feature-based learning-to-rank methods can also be viewed as obeying the PRP, even if relevance probability is not directly estimated. That is, these approaches are trained to minimize ranking loss with respect to ground-truth rankings, and they essentially integrate various relevance estimates so as to improve the overall relevance estimation that guides ranking. Neural-network-based retrieval methods estimate relevance (often probability) by learning document and query representations and/or integrating multiple relevance signals.28 As with feature-based approaches, they can be viewed as obeying the PRP. The PRP was shown to be optimal under some assumptions; for example, the independence of a document's relevance from that of others.33
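To make the principle concrete, here is a minimal sketch in Python (our illustration; the probability estimates and document names are hypothetical placeholders). Once per-document relevance-probability estimates are given, a PRP-based ranker is simply a deterministic sort:

    # Minimal PRP-style ranker: sort documents by estimated relevance probability.
    # In practice the estimates come from a probabilistic retrieval model or a
    # learned ranking function; here they are hard-coded placeholders.

    def prp_rank(relevance_probs):
        """Return document ids ordered by decreasing estimated P(relevant | query)."""
        return sorted(relevance_probs, key=relevance_probs.get, reverse=True)

    estimates = {"doc_a": 0.82, "doc_b": 0.35, "doc_c": 0.64}
    print(prp_rank(estimates))  # ['doc_a', 'doc_c', 'doc_b']

Note the ranking is fully determined by the estimates; this determinism is exactly what the post-ranking analysis that follows calls into question.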


We believe the design of search and recommendation methods must be revolutionized to account for the strategic incentives of the various parties that affect (or are affected by) the system.


Post-ranking dynamics. Careful examination of the PRP, and retrieval methods based on the PRP, reveals a major gap: post-ranking effects are not accounted for; specifically, changes in the corpus documents made by incentivized publishers that respond to the induced ranking so as to improve the ranking of their documents. We focus on publishers (content providers) as those affected by, and affecting, the dynamics. We do not consider the dynamics driven by users of the systems, for example, via clickthrough actions. Later in the article, we discuss the importance, challenges, and potential future directions of accounting simultaneously for publisher and user-driven dynamics.

To highlight the importance of accounting for post-ranking effects on corpus content consider the following example.2 A publisher writes a document that contains both common information and information which is unique to the document; that is, this unique information cannot be found in any other document in the corpus. The publisher, for some reason, is more interested in her document being highly ranked for queries that touch on the non-unique information. However, the ranking function "penalizes" documents having a mixture of information (that is, not being focused on the information sought for in these queries). In terms of the PRP and current retrieval paradigms the approach of the ranking function is justifiable: one can assume that the user who posted the query would prefer reading documents that are focused on information related to the query rather than having to wade through potentially long documents and distill the sought information. Now, suppose that the publisher decides to remove the unique information from the document so as to obtain higher ranking for queries pertaining to the common information. This means that valuable information in the corpus is lost, and search effectiveness for queries which target the unique information will consequently degrade. Overall, the potential satisfaction of users' information need with respect to the corpus will decrease.

Indeed, we showed, using a game-theoretic analysis, that the PRP is suboptimal as it does not promote content breadth in the corpus in terms of topical coverage.2 In other words, retrieval methods based on the PRP do not provide a strong enough incentive for publishers to produce diversified content. It also turns out that introducing randomness to ranking functions can help to promote content breadth in the corpus, and hence increase the overall attainable utility for search engine users. We will discuss the suboptimality of the PRP and the merits of using nondeterministic ranking functions later.

Given the significant impact of induced rankings on the content in the corpus, due to the actions employed by incentivized publishers who respond to these rankings, the natural challenge that arises is analyzing the strategic behavior of publishers. Previous work has characterized search engine optimization (SEO) techniques intended to promote documents in rankings.15 We also note that content dynamics on the Web has been studied and analyzed (for example, Raifer et al.32 and Santos et al.36). However, the post-ranking perspective has not been actually modeled or analyzed; that is, the specific types of responses of publishers to rankings, and more generally, their ranking-motivated strategies in terms of document manipulation, have not been studied.

Strategic publishers. Since the early days of the Web, different types of SEO techniques have been identified.15 For example, publishers can "stuff" keywords into their Web pages so as to promote them in rankings induced for queries that include these keywords. The underlying (often correct) assumption is that increased similarity between the query and the Web page increases the retrieval score of the page, and hence improves its potential rank position. On the other hand, compromising the quality of the page by filling it with potentially query-relevant keywords can also reduce the retrieval score, as ranking functions often utilize page-quality estimates.6

There is a fundamentally important question that goes beyond the actual general actions that publishers use as part of their SEO efforts: What is the strategic behavior of the publishers with respect to induced rankings? In other words, given that they do not know what the ranking function is,a but they can observe past rankings induced for queries of interest, what would be an effective response strategy to rankings?b We have recently addressed this question using a game-theoretic analysis.32 Our main theoretical result was that a "worthwhile" strategy for publishers who want to promote their documents in rankings induced for a specific query is to make their documents more similar to those highly ranked in the past for the query. By "worthwhile" we refer to the fact that this strategy results in a certain equilibrium in a game-theoretic modeling of the retrieval setting, wherein publishers are players and the ranking function, which is not exposed to the players, serves as a mediator that induces rankings for queries.
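The following sketch illustrates the flavor of this strategy (our illustration, not the experimental code of Raifer et al.32): a publisher shifts her document's unigram term distribution toward that of the previously top-ranked document, with a mixing weight alpha that is a free parameter of the illustration.

    from collections import Counter

    def mimic_winner(my_doc, winner_doc, alpha=0.3):
        """Shift my document's term distribution toward the previous winner's.

        my_doc, winner_doc: lists of terms. alpha: fraction of probability
        mass moved toward the winner. Returns the new target distribution.
        """
        def dist(doc):
            counts = Counter(doc)
            total = sum(counts.values())
            return {t: c / total for t, c in counts.items()}

        mine, win = dist(my_doc), dist(winner_doc)
        return {t: (1 - alpha) * mine.get(t, 0.0) + alpha * win.get(t, 0.0)
                for t in set(mine) | set(win)}

    print(sorted(mimic_winner(["solar", "panels", "cost"],
                              ["solar", "energy", "energy"]).items()))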

The intuitive theoretical finding that mimicking the "winner" from previous rankings is worthwhile was supported by analyzing the strategic publishing behavior of students who served as publishers in a content-based ranking competition we organized. We provide details of this competition later.

Addressing unwarranted effects of the dynamics. As discussed, a major part of the dynamics of a corpus in competitive retrieval settings is driven by incentivized publishers. The dynamics can have undesirable effects, specifically, in terms of degrading retrieval effectiveness. For example, some Web pages could be spam that is intended to be promoted in rankings and to attract clicks—that is, black-hat SEO.15 At the same time, corpus dynamics is driven to a major extent by white-hat SEO efforts, which need not necessarily hurt retrieval effectiveness; these are legitimate actions applied to Web pages so as to promote them in rankings. However, even white-hat SEO can lead to undesirable effects; for example, rapid changes to the relative ranking of documents due to indiscernible document manipulation. As a result, users might respond by consistently reformulating their queries or simply losing faith in the search engine.

Ranking robustness: A blessing or a curse? Following the arguments just posed, one should presumably opt for ranking robustness; that is, small indiscernible changes of documents should not result in major changes to induced rankings.14 Some support for this argument can be drawn from one of the most fundamental hypotheses in the field of information retrieval, namely, the cluster hypothesis,18 as we recently described.14 Additional support can be drawn from arguments made in work on adversarial classification, specifically in the vision domain. The main premise in this line of work was that small indiscernible (adversarial) changes of objects should not result in changes to classification decisions.11,13 Thus, while changes of rankings in our setting that reflect discernible changes of, or differences between, documents are naturally very important, changes of rankings that are "harder to explain" (cf. explainable IR) are less warranted.

On the other hand, there are arguments for hurting ranking robustness, to some extent, for special purposes. For example, recent work showed that introducing randomization to induced rankings along time, which hurts robustness, can increase fairness with respect to publishers of Web pages.7 Specifically, given that users browsing the search results page mainly pay attention to the top-ranked results,19 allowing Web pages to be positioned at the highest ranks even if they are not among the most relevant can increase the attention given to other publishers and hence promote fairness. Another justification for hurting ranking robustness is promoting content breadth in the corpus, specifically by introducing randomization to ranking functions, as we recently showed using a game-theoretic analysis.2

Considering the findings described here, there is a trade-off between ranking robustness and content breadth and/or ranking fairness. Accounting for all these aspects simultaneously is an intriguing avenue for future work.

Robustness of ranking functions: Regularization and beyond. In recent work,14 we presented initial theoretical foundations for the analysis of the robustness of ranking functions to (adversarial) document manipulations intended for promoting the documents in rankings. While the goal was indeed to study adversarial effects on robustness, the formal treatment was more general and accounted for any type of a change.

An important theoretical result to which we provided empirical support was the connection between regularization—a suite of approaches intended to improve the effectiveness of a ranking function over queries not seen during the training phase—and ranking robustness. We showed that the "stronger" the regularization, the more robust the induced rankings are to document manipulations.
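A minimal way to quantify this notion of robustness (our illustration, not the formal treatment of Goren et al.14) is to apply small random perturbations to document features and measure the rank correlation, for example Kendall's tau, between the original and perturbed rankings; repeating the measurement for rankers trained under different regularization strengths then traces the effect described above.

    import itertools
    import random

    def kendall_tau(rank_a, rank_b):
        """Kendall rank correlation between two rankings of the same items."""
        pos_a = {d: i for i, d in enumerate(rank_a)}
        pos_b = {d: i for i, d in enumerate(rank_b)}
        conc = disc = 0
        for x, y in itertools.combinations(rank_a, 2):
            if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
                conc += 1
            else:
                disc += 1
        return (conc - disc) / (conc + disc)

    def rank(docs, weights):
        """Rank documents by a linear score; docs maps id -> feature vector."""
        return sorted(docs, reverse=True,
                      key=lambda d: sum(w * x for w, x in zip(weights, docs[d])))

    random.seed(0)
    docs = {f"d{i}": [random.random() for _ in range(4)] for i in range(20)}
    weights = [0.9, 0.5, 0.3, 0.1]  # stand-in for a trained linear ranker
    noisy = {d: [x + random.gauss(0, 0.02) for x in f] for d, f in docs.items()}
    # Robustness of this ranker to small manipulations; to study regularization,
    # retrain `weights` at several regularization strengths and repeat.
    print("tau:", round(kendall_tau(rank(docs, weights), rank(noisy, weights)), 3))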

Controlling ranking robustness by tuning regularization is only a first step. Exploring other fundamental approaches to improving, or more generally, controlling ranking robustness is a completely open research avenue. While existing relevance ranking functions are trained to improve effectiveness or minimize reductions in effectiveness with respect to that of known ranking functions,39 ranking robustness as a manifestation of post-ranking effects is a missing component in such approaches. The treatment of robustness of ranking functions is also important, as mentioned earlier, given the emergence of recent work on improving ranking fairness by hurting robustness.7 One would strive to simultaneously maximize fairness and minimize ranking instability.

Empirical analysis and evaluation. The Web is the canonical example of a competitive retrieval setting. Specifically, many publishers of Web pages have an incentive (for example, financial) to have their pages highly ranked for queries they care about. Isolating Web dynamics associated with incentive-driven changes to documents is difficult since it occurs against a backdrop of constant Web page changes and dynamics that are not driven by ranking incentives.

To allow the study of corpus dynamics that results from the strategic behavior of publishers in response to rankings, as well as to devise new retrieval paradigms that account for this type of dynamics and its effects, controlled experiments are called for. For example, we recently reported a ranking competition held between students in a class.32 The students were instructed to write plain-text documents with the aim of having them highly ranked for given queries. That is, the competition focused on content-based manipulation. The students' incentive to take part in the competition and produce documents that would presumably be ranked high was bonus points for the course grade if they won many rounds of the competition; that is, if the documents they created were highly ranked. At the end of each round of the competition, the students were shown the rankings induced for the queries over the documents they had created or changed. That is, the only information available to "publishers" in this competition, as is the case over the Web, is the query, the induced ranking, and the ranked documents.

The competition was not large scale and focused on a specific (yet important) aspect of manipulation—namely, content manipulation. Nevertheless, as mentioned earlier, it provided empirical support for a game-theoretic result we presented about the strategic behavior of publishers in response to induced rankings:32 publishers make their documents similar to those that were highly ranked for the same queries in previous rounds of the competition.

Organizing ranking competitions at large scale is extremely difficult, specifically if various aspects are to be studied simultaneously—for example, the manipulation of both content and hyperlink structure. Furthermore, to develop novel ranking functions, one must run A/B testing between old and new functions in real time—that is, have publishers participating in the game respond, by manipulating their documents, to the ranker which is applied. Thus, analysis of ranking competitions between incentivized publishers, as well as evaluation of novel ranking methods that address the resulting corpus dynamics, is a challenge in its own right.


Non-Optimality of Classical Search and Recommendation Methods

As noted, the probability ranking principle (PRP) is the theoretical foundation of most retrieval approaches that rank documents in response to a query. It is, in essence, also the theoretical foundation of recommendation methods that rank and recommend items. Accordingly, aside from potential exploration intended to collect data for training, search and recommendation methods based on the PRP are deterministic. As it turns out, when the publishers of items to be searched or recommended are strategic, this classical deterministic approach has to be replaced. Here, we consider the PRP and approaches devised based on it for search (retrieval) and recommendation, respectively, in settings with strategic publishers.

Beyond PRP: Randomness increases social welfare in search. One of the important goals of an analysis of a competitive retrieval setting with corpus dynamics driven by incentivized (strategic) selfish players—publishers, in our case—is to derive insightful statements about the steady states that the setting can converge to, if at all. It is important to note that here, dynamics refers to that of documents in the corpus upon which search is performed. The dynamics of clickthrough patternsc and of users' queries40 are outside the scope of this article. To address this goal, it is only natural to take a game-theoretic approach that allows reasoning about the different equilibria the search setting can converge to. An equilibrium is a steady state wherein players (publishers) have no incentive to deviate from the current actions (strategies) they employ.d The fundamental question then becomes how to model a retrieval setting as a game.

We have recently modeled competitive retrieval settings as games.2,3 Publishers are players whose strategies are the types of documents they can write: either single-topic or multi-topic documents. The search engine's ranking function is the mediator that induces a ranking. A publisher is rewarded if her document is the highest ranked for a query. To simplify the analysis, we assumed each query is about a single topic and the distribution over queries is known to the publishers. We then examined two settings: all publishers have the same writing quality across all topics, and publishers have differential writing quality for different topics.

The type of documents publishers can write and the assumption regarding their writing quality entail a specific game. We assumed the ranking function has full knowledge of whether a document is relevant to a query (specifically, by topic matching) and analyzed the equilibria of the game, that is, publishers' behavior from which no unilateral deviation is beneficial. We then analyzed the social welfare attained in the game, specifically, in various equilibria. Social welfare was defined as the sum of utilities attained by publishers. We assumed users' utilities and publishers' utilities were aligned, and both were determined by whether the highest-ranked document was relevant. A simple utility function was used: 1 if the highest-ranked document is relevant and 0 otherwise.2,3 This utility function corresponds to a setting where a user examines only the top-ranked document and her utility solely depends on the relevance of this document. The alignment between publishers' and users' utilities facilitates the formal treatment and corresponds to the assumption that if a user is satisfied by seeing a relevant document then the publisher of the document is rewarded to the same extent. In practice, however, publishers' and users' utilities are not necessarily aligned, and more elaborate models are thereby called for.2,3

The games were further analyzed by computing the price of anarchy:21,34 the ratio between the maximal social welfare that can be attained in a game and the minimal social welfare attained in any equilibrium of the game. In other words, the price of anarchy is the price the ecosystem "pays" for the selfish behavior of players. The accompanying sidebar describes an example of a game and its analysis.

We found that if publishers write single-topic documents and they have equal writing qualities for all topics, then the PRP is indeed an optimal criterion for ranking.2,3 But if publishers write multi-topic documents or have differential writing qualities, the PRP becomes suboptimal in terms of the price of anarchy. The sidebar provides a simple example that demonstrates the suboptimality of the PRP. Indeed, the PRP motivates publishers to write single-topic documents, or documents only on the topics they are most qualified to write about, as their resultant ranking would be higher. Thus, the PRP essentially does not promote content breadth in the corpus—in terms of topical coverage—in these natural settings. Previously, we described a canonical example of this situation: the publisher of a multi-topic document, with information unique to the document, is driven by the PRP to remove the unique information so as to better focus the document for higher ranking for queries of interest.

It turns out that introducing randomization to the ranking function can improve the price of anarchy with respect to that attained by using the PRP as a ranking criterion, as randomization can help to promote content diversity.2,3 A case in point: promoting a multi-topic document over a single-topic document whose retrieval score is slightly higher does not significantly harm per-query retrieval effectiveness, but does result in an improved price of anarchy (and social welfare) due to increased content breadth in the corpus.
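The following toy instance, in the spirit of the analysis2,3 but with our own simplified ranking rule and parameters, makes the effect concrete. Two publishers each write a single-topic document, a query's topic is drawn from a known distribution, a publisher's utility is her expected exposure (probability of being ranked first), and user welfare is the probability that the top document is relevant. With probability eps the ranker places a uniformly random document on top; eps=0 is the deterministic PRP-style rule.

    import itertools

    P_TOPIC = {1: 0.7, 2: 0.3}  # hypothetical query-topic distribution

    def win_prob(me, other, topic, eps):
        """Probability my single-topic document is ranked first for `topic`.

        If no document is relevant, nothing is shown; otherwise the top slot
        goes to a random relevant document (prob. 1-eps) or to a uniformly
        random document (prob. eps).
        """
        rel_me, rel_other = me == topic, other == topic
        if not (rel_me or rel_other):
            return 0.0
        p_rel = 0.5 if (rel_me and rel_other) else (1.0 if rel_me else 0.0)
        return (1 - eps) * p_rel + eps * 0.5

    def utility(me, other, eps):  # publisher utility: expected exposure
        return sum(p * win_prob(me, other, t, eps) for t, p in P_TOPIC.items())

    def welfare(profile, eps):    # user welfare: P(top document is relevant)
        a, b = profile
        return sum(p * (win_prob(a, b, t, eps) * (a == t) +
                        win_prob(b, a, t, eps) * (b == t))
                   for t, p in P_TOPIC.items())

    def pure_equilibria(eps):
        return [(a, b) for a, b in itertools.product([1, 2], repeat=2)
                if utility(a, b, eps) >= max(utility(d, b, eps) for d in (1, 2))
                and utility(b, a, eps) >= max(utility(d, a, eps) for d in (1, 2))]

    for eps in (0.0, 0.3):
        eqs = pure_equilibria(eps)
        best = max(welfare(pr, eps) for pr in itertools.product([1, 2], repeat=2))
        worst = min(welfare(e, eps) for e in eqs)
        print(f"eps={eps}: equilibria {eqs}, price of anarchy {best / worst:.2f}")

In this instance the deterministic rule admits a single equilibrium in which both publishers write on the popular topic (price of anarchy of about 1.43), whereas under eps=0.3 the only equilibria cover both topics and the price of anarchy drops to 1.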

These findings are especially important because they imply the most fundamental principle used by existing retrieval methods and search engines for ranking (the PRP) is suboptimal in competitive retrieval settings. Furthermore, these findings motivate a new view of how ranking functions should be learned: not (only) for minimizing a "local" loss function but rather for maximizing the resultant social welfare attained in equilibria. This is a brand new challenge for the information retrieval community, and other communities interested in ranking, as the task involves estimating post-ranking effects. Our findings about the merits of applying nondeterministic ranking functions are a first small step in this direction.

Characterizing a desired fair recommendation system. Heretofore, we have mainly focused on search (retrieval) systems. Recommendation systems (RSs hereinafter) have also rapidly developed over the past decade. By predicting a user's preference for an item, RSs have been successfully applied in a variety of applications. However, in order to address significant applications, where information is integrated from many sources, such systems have to generate recommendations that satisfy the needs of both the end users and other parties, in particular content providers (publishers). Many of the content providers provide sponsored or partially sponsored content that may be relevant to a user, and an RS that is not fair to content providers will not survive.

Hence, an RS should be able to deal in a fair way with strategic content providers, each of whom aims at maximizing her exposure. In addition, if an RS is not designed properly, then content providers may reach a fluctuating system in which they rapidly change the type of content they provide only to defeat their competitors. Therefore, the RS must not only be fair but also yield stability, that is, convergence to an equilibrium of content providers' strategies. These RS requirements were only recently formulated in a rigorous manner.5 It has been shown that classical RSs, which operate in the PRP style (that is, always show the most relevant content), fail to satisfy the requirements noted here (that is, will be unfair or lead to fluctuations). Indeed, the latter work also shows a particular RS, selecting among content providers in a well-defined probabilistic manner, that does satisfy the related requirements, and in a sense is the only recommendation system that satisfies them. Technically, the results use a classical concept in cooperative game theory, the Shapley value,37 which measures the contribution of parties to a cooperative act, and adapt it to serve as an efficient mechanism for probabilistic selection among content providers.
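The following sketch conveys the flavor of such a mechanism (our simplification; the characteristic function and the exact mechanism of Ben-Porat and Tennenholtz5 differ): compute the Shapley value of each content provider with respect to a coverage-style coalition value, then select providers with probabilities proportional to their Shapley values.

    import math
    from itertools import permutations

    def shapley_values(players, value):
        """Exact Shapley values: average marginal contribution over all orders."""
        shap = {p: 0.0 for p in players}
        for order in permutations(players):
            coalition = set()
            for p in order:
                shap[p] += value(coalition | {p}) - value(coalition)
                coalition.add(p)
        n_orders = math.factorial(len(players))
        return {p: v / n_orders for p, v in shap.items()}

    # Hypothetical characteristic function: a set of providers is worth the
    # number of distinct user needs their combined content covers.
    COVERAGE = {"A": {"news", "sports"}, "B": {"sports"}, "C": {"finance"}}
    value = lambda s: len(set().union(*(COVERAGE[p] for p in s))) if s else 0

    shap = shapley_values(list(COVERAGE), value)
    total = sum(shap.values())
    print({p: round(v / total, 3) for p, v in shap.items()})  # selection probs.

Here provider B, whose content is fully redundant given A's, receives a low but nonzero selection probability, which conveys the fairness flavor such a mechanism is after.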

These findings complement the observation about the failure of the most fundamental principle used by existing retrieval methods and search engines for ranking (the PRP), by showing that the corresponding method also fails in the context of recommendation systems. It is suboptimal when considering strategic content providers. Indeed, well-defined and well-grounded probabilistic substitutes are offered.


Dueling

So far we have dealt with the design of systems, a.k.a. mechanisms or mediators, while taking into account that participants in these systems (specifically, publishers/content generators) are self-motivated. Indeed, the role of a search engine, as well as that of an RS, both of which integrate information from different content providers, is to serve as a mediator among selfish participants. However, such mechanisms may also face competition from other mechanisms, as we will discuss. Thus, we now describe models that address such competitions (specifically, between search engines/ranking functions and between prediction algorithms). In these settings, the systems themselves can be thought of as the players, in contrast to the settings discussed thus far where publishers were the players.

Consider two competing search engines. A user thinks of a desired Web page and submits a query intended to target that page to both search engines. The engine that ranks the desired page higher is chosen by the user as the "winner." Given a probability distribution over users' desired pages for that query, the optimal algorithm ranks the pages ω1, ω2, ..., ωn in decreasing order of the corresponding probabilities; this is essentially the probability ranking principle (PRP). However, in a competitive setting the ranking ω2, ω3, ..., ωn, ω1 wins (assuming the proportion of users interested in ω1 is less than 0.5) on every item that may be searched except for ω1. This is a highly simplified example of the fact that different data science mechanisms are in competition with one another, and therefore the optimal way to design them should be reconsidered.
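The arithmetic of this example is easy to verify in code (the user distribution over four pages below is hypothetical):

    # Toy duel between two rankings over pages w1..w4. p[w] is the fraction
    # of users whose desired page is w (hypothetical values, p[w1] < 0.5).
    p = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}

    prp = ["w1", "w2", "w3", "w4"]          # PRP: decreasing probability
    challenger = ["w2", "w3", "w4", "w1"]   # the shifted ranking from the text

    mass = sum(prob for page, prob in p.items()
               if challenger.index(page) < prp.index(page))
    print(f"{mass:.0%} of users prefer the challenger")  # 60% > 50%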

Such a setting was first formalized in Immorlica et al.,16 under the title dueling algorithms. The most basic building block in such settings is to consider a state-of-the-art algorithm (for example, a ranker in the information retrieval setting, or a regressor in a prediction setting) and see if it can be defeated when put in a competitive setting; for example, whether one can provide an algorithm that most users will find more compelling than the classical one when they have to choose. We now discuss two achievements that have been obtained in this regard.

Best-response regression. An interesting extension of this general discussion is in the context of machine learning, and in particular regression tasks. In a regression task, a predictor is given a set of instances (points) along with a real value for each point. Subsequently, she must predict the value of a new instance as accurately as possible. Ben-Porat and Tennenholtz4 considered a regression task tackled by two players, where the payoff of each player is the proportion of the points she predicts more accurately than the other player. On the technical side, the authors revise the probably approximately correct (PAC) learning framework to deal with the case of a duel between two predictors. They then devise an algorithm that finds a linear regression predictor that is a best response to any (not necessarily linear) regression algorithm. This algorithm has linearithmic sample complexity, and polynomial time complexity when the dimension of the instance domain is fixed. The approach has also been tested in a high-dimensional setting, and it has been shown to significantly defeat classical regression algorithms in the prediction duel.
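The following crude sketch conveys the idea (our illustration; the algorithm of Ben-Porat and Tennenholtz4 computes an exact best response rather than performing a random search). Against an ordinary-least-squares incumbent fit on data containing a minority of outliers, a linear challenger can predict more accurately on well over half of the points.

    import random

    random.seed(1)
    # Hypothetical 1-D data: y is roughly 2x, with a large offset on ~30% of points.
    xs = [random.uniform(0, 1) for _ in range(200)]
    data = [(x, 2 * x + (5 if random.random() < 0.3 else random.gauss(0, 0.1)))
            for x in xs]

    def lstsq(points):
        """Ordinary least squares for y = a*x + b."""
        n = len(points)
        sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
        sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
        a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        return a, (sy - a * sx) / n

    def win_rate(challenger, incumbent, points):
        """Fraction of points the challenger predicts strictly more accurately."""
        return sum(abs(y - (challenger[0] * x + challenger[1])) <
                   abs(y - (incumbent[0] * x + incumbent[1]))
                   for x, y in points) / len(points)

    incumbent = lstsq(data)  # the "classical" regressor, skewed by the outliers
    challenger = max(((random.uniform(-1, 5), random.uniform(-3, 3))
                      for _ in range(5000)),
                     key=lambda h: win_rate(h, incumbent, data))
    print(f"challenger win rate: {win_rate(challenger, incumbent, data):.2f}")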

Responding to a strong ranker. Returning to the search and ranking motivation, a fundamental question arises. Say a search engine has a relatively weak (somewhat ineffective) relevance ranking function, specifically with respect to that of another (strong) search engine. This could be, for example, due to a relatively small user base: indeed, user engagement signals are important features in relevance-ranking functions.

Thus, an interesting duel between the two search engines emerges. The duel entails an interesting bimodal utility function for the weak search engine: the goal is to produce, in response to a query, a document result list whose effectiveness is not significantly lower than that of the strong search engine, while the result list should also be quite different from that produced by the strong engine. In Izsak et al.,17 we presented a per-query algorithmic approach that leverages fundamental retrieval principles such as pseudo-feedback-based relevance modeling.24 That is, the weak ranker observes the documents most highly ranked by the strong ranker, uses them to induce a model of relevance for the query, and uses this model for ranking. We demonstrated the merits of our approach in improving the search effectiveness of the weak ranker using TREC datae—a suite of datasets used for evaluation of retrieval methods.
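A minimal sketch of the pseudo-feedback step (our simplification; an RM1-style unigram estimate with uniform document weights, rather than the exact method of Izsak et al.17): the weak ranker builds a relevance model from the strong ranker's top documents and scores its own candidates against it.

    from collections import Counter

    def relevance_model(top_docs):
        """Unigram relevance model from the strong ranker's top documents.

        Uniform weight per document; real relevance models weight documents,
        for example, by their query likelihood.24
        """
        model = Counter()
        for doc in top_docs:
            counts = Counter(doc)
            total = sum(counts.values())
            for term, c in counts.items():
                model[term] += (c / total) / len(top_docs)
        return model

    def score(doc, model):
        """Score a candidate document by similarity to the relevance model."""
        counts = Counter(doc)
        total = sum(counts.values())
        return sum(p * counts[t] / total for t, p in model.items())

    strong_top = [["solar", "energy", "cost"], ["solar", "panels", "energy"]]
    model = relevance_model(strong_top)
    candidates = {"d1": ["solar", "energy", "storage"], "d2": ["football", "scores"]}
    print(sorted(candidates, key=lambda d: score(candidates[d], model), reverse=True))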


Strategic Users

The previous sections emphasized the need to account for the incentives of strategic parties when devising major data science applications such as search (ranking), recommendation, and prediction (specifically, regression). The strategic parties were associated with stakeholders such as content providers or prediction experts. The users of the systems were not strategic; they were the ones posting a query, asking for a recommendation, or interested in a prediction. In a sense, the users were the products the strategic parties were aiming to attract or to serve better than their competitors. However, users may have their own incentives that might cause them not to follow the system's recommendations or to provide dishonest inputs to it. Next, we briefly describe one of our recent works that addresses such aspects.

Incentive-compatible explore and exploit. Recommendation systems based on the paradigm of interleaving exploration and exploitation have a two-way relationship with their customers—on the one hand they provide recommendations, while on the other hand they use customers as their source of information. This reality leads to an important challenge: a user might not accept a recommendation if he/she believes the recommendation is intended to benefit exploration rather than being the optimal one given current information. The challenge is to devise a system that will behave close to optimally, but will also be incentive compatible, that is, rational users who care only about their expected utility will accept the system's recommendations.
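A stylized two-arm illustration of the incentive-compatibility constraint at the heart of this literature (in the spirit of Kremer et al.,22 not their exact scheme): the mediator recommends the a-priori inferior arm 2 to the second agent only in the event that arm 1's revealed value is low, pooling exploitation-like realizations with exploration ones; conditioned on receiving that recommendation, following it is rational. A Monte Carlo check of the constraint:

    import random

    random.seed(2)
    THETA = 0.7  # recommend arm 2 to agent 2 only when arm 1's value is below this

    def run_once():
        v1 = random.random()   # arm 1's value, revealed once agent 1 pulls it
        v2 = random.random()   # arm 2's value, unknown until someone pulls arm 2
        explored = v1 < THETA  # the event in which agent 2 is recommended arm 2
        return v1, v2, explored

    # Incentive compatibility for agent 2: conditioned on being recommended
    # arm 2, arm 2 must be at least as good in expectation as arm 1, so a
    # rational, expectation-maximizing agent follows the recommendation.
    samples = [run_once() for _ in range(200_000)]
    cond = [(v1, v2) for v1, v2, explored in samples if explored]
    e_v1 = sum(v1 for v1, _ in cond) / len(cond)
    e_v2 = sum(v2 for _, v2 in cond) / len(cond)
    print(f"E[v1 | rec 2] = {e_v1:.3f} <= E[v2 | rec 2] = {e_v2:.3f}")

From the third agent on, both arm values are known and the mediator simply recommends the better arm, which every agent trivially follows.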


A recommender system should be able to deal in a fair way with strategic content providers, each of which aims at maximizing its exposure.


We have recently addressed this challenge in designing recommendation systems, specifically for social networks, where users can (partly) observe each other.1,f In particular, we investigated the conditions on the social network that allow for asymptotically optimal outcomes. Our results show that for reasonable networks, where each user can observe many of the other users, but where still most users cannot see most of the other users, an incentive compatible and approximately optimal recommendation system does exist.

The literature on incentivizing exploration relevant to our study can be viewed as an extension of the celebrated multi-armed bandit problems8 to settings where exploration can only be done by self-motivated myopic agents and a central planner must incentivize exploration. The tension between the objective of the mediator (the recommendation engine) and the individual agents in a multi-armed bandit context was first introduced in Kremer et al.,22 who study it in a very simple setting. They identify an incentive-compatible scheme with which a central mediator with commitment power can asymptotically steer the users toward taking the optimal action. This exciting news has been extended in Mansour et al.26,27 to several more elaborate bandit settings and to additional optimization criteria such as regret minimization. Whereas both of these papers account for agents' incentives, and in particular the misalignment of the incentives of the agents and the mediator, they ignore other societal aspects. In particular, these papers make an implicit assumption that agents cannot see nor communicate with any other agent. Needless to say, the assumption that agents have no knowledge of each other, and cannot see the actions chosen by their neighbors, is unrealistic. Che and Horner9 also study mechanisms for social learning through a mediator. Similar to Kremer et al.,22 they assume agents are short-lived and do not observe each other, and the mechanism must incentivize them to explore. In contrast, the model in Che and Horner9 is one of continuous time.

Unfortunately, the results obtained in Kremer et al.22 and Mansour et al.26,27 are not robust to changes in such network assumptions, and their implicit assumption of no visibility turns out to be critical. In fact, even with very little observability, for example when each agent just sees the action chosen by his immediate predecessor, the schemes proposed in both works cease to be incentive compatible and lead to market failure. The challenge of approximately optimal incentive-compatible schemes subject to partial observability among the agents was addressed in our work.1 In that paper, it is assumed agents can view actions chosen by some of their predecessors. We characterize structural properties that allow for the design of optimal schemes and provide explicit constructions. In contrast with the recent literature relevant to search and recommendation, where payments are avoided, other works study payments to agents as a means to incentivize exploration (for example, Frazier et al.12).


Conclusion and Looking Forward

Today, many search engines and recommendation systems operate in dynamic settings wherein dynamics is driven by interested parties. The parties can be those that generate items to be retrieved or recommended, the users themselves or competing systems.

It is thus somewhat surprising there has been little to no work on modeling the incentives of the parties involved, and more generally, on devising algorithms that account for the incentive-driven dynamics of the ecosystem they operate in. We discussed our results regarding the consequences of ignoring the dynamics in several cases: the attained solutions are, by design, suboptimal. In particular, we showed that the most basic principle underlying most query-based ranking (retrieval) algorithms and recommendation methods is suboptimal in competitive settings.

Accordingly, we argued for a need to revolutionize the way search and recommendation algorithms are devised. We suggested game theory, and more specifically mechanism design, as a framework for devising the algorithms. The basic idea is to treat interested/incentivized parties as players and the data system/algorithm as a mediator. The challenge is that of mechanism (mediator) design: devising an algorithm that accounts for the environment which consists of self-motivated parties.


Our ongoing and ultimate operational goal is to devise new and improved search and recommendation algorithms via mechanism design.


An interesting question is whether the game-theoretic models constitute effective approximations of real user behavior. In our recent work we demonstrated such an alignment:32 the strategies publishers employed matched the game-theoretic results, namely, mimicking documents that were highly ranked in the past. Studying the robustness of game-theoretic solutions in the face of irrational behavior of users is a direction we intend to explore in future work.

Another important aspect is the system and algorithmic complexity overhead when accounting for incentives and applying game-theoretic principles. We argue that the overhead need not be significant. For example, the nondeterministic ranking functions we proposed2,3 improve social welfare with respect to the PRP and pose practically no algorithmic or system-implementation overhead. Similarly, our approaches for improving ranking robustness14 and for dueling of search engines17 incur no significant overhead.

Our ongoing and ultimate operational goal is to devise new and improved search and recommendation algorithms via mechanism design. We surveyed a few such examples, but there is much more work to be done in that respect. We envision a modular development of game-theory-based search and recommendation solutions, as was and is the case for current solutions that ignore incentives. First, fundamental relevance estimates for search and recommendation will be improved by mechanism design, and these can be integrated—for example, via learning-to-rank—with other estimates for which incentive issues are not accounted for. At the macro level, one can devise loss functions for search and recommendation functions that account for incentives. These can be devised in some cases, as we recently showed for ranking robustness,14 regardless of the features used. Overall, such modular development makes it possible, in many cases, to separate the development of non-game-theoretic aspects from those which rely on game theory.

The work we surveyed (for example, on nondeterministic ranking functions2,3) provides some initial guidelines about how one would devise solutions that account for incentives. We believe the suite of guidelines will broaden with more work in these directions, as is the case, for example, for neural IR—current state-of-the-art neural IR algorithms are significantly different from those proposed a few years ago given the experience that has been gained.

The spectrum of challenges that remain to be addressed is huge. One example is accounting for the interactions between the different types of incentivized players involved—for example, publishers and users of a search engine. Indeed, users of the search engine are not only incentivized consumers of retrieved items, but also interested parties that affect the search engine's behavior (specifically, its ranking function) via their interactions. That is, in commercial search engines, users' clicks on retrieved results are treated as implicit relevance feedback.19 Query reformulations are an additional implicit relevance-feedback signal. Thus, modeling user interactions with the search system is an important part of the analysis of any search engine and the design of retrieval algorithms. In our setting, the emerging challenge is modeling simultaneously, using a game-theoretic approach, the effects of the strategic behavior of interested users together with that of interested publishers. For example, to contrast ranking mechanisms in terms of utility and social welfare, we can potentially use Bayesian games, rather than the perfect-information games used in our initial work,2,3 wherein clicks and their dynamics are employed to devise relevance estimates.

There are additional "micro-level" challenges; examples include devising content-based and other (for example, hyperlink-based) relevance estimates that account for incentivized publishers; moving from list-based effectiveness evaluation to amortized and equilibrium-based evaluation3,7 given the dynamics of the retrieval setting; and, adapting specialized retrieval approaches (for example, those based on fusion or federation) to incentives-based settings.

Our study is part of a more general effort to apply mechanism design to data science (see http://mdds.net.technion.ac.il). Indeed, while algorithmic game theory has had remarkable success in addressing incentives in a variety of settings, such as fundamental resource allocation settings (for example, auctions) and congestion control,29 among others, both the repertoire of games and the techniques to be employed need to be expanded in order to deal with data science algorithms. While we referred mainly to search and recommendation systems in this article, the site mentioned here shows applications to prediction, regression, segmentation, diffusion, and others. Two of the major techniques used revolve around approximate mechanism design without money30 and dueling algorithms,16 both of which are beyond the scope of this article.


References

1. Bahar, G., Smorodinsky, R., and Tennenholtz, M. Economic recommendation systems: One page abstract. In Proceedings of EC, 2016, 757–757.

2. Ben-Basat, R., Tennenholtz, M., and Kurland, O. The probability ranking principle is not optimal in adversarial retrieval settings. In Proceedings of ICTIR, 2015, 51–60.

3. Ben-Basat, R., Tennenholtz, M., and Kurland, O. A game theoretic analysis of the adversarial retrieval setting. J. Artificial Intelligence Research 60 (2017), 1127–1164.

4. Ben-Porat, O. and Tennenholtz, M. Best response regression. In Proceedings of NIPS, 2017, 1498–1507.

5. Ben-Porat, O. and Tennenholtz, M. A Game-theoretic approach to recommendation systems with strategic content providers. In Proceedings of NeurIPS, 2018, 1118–1128.

6. Bendersky, M., Croft, W.B., and Diao, Y. Quality-biased ranking of Web documents. In Proceedings of WSDM, 2011, 95–104.

7. Biega, A.J., Gummadi, K.P., and Weikum, G. Equity of attention: Amortizing individual fairness in rankings. In Proceedings of SIGIR, 2018, 405–414.

8. Bubeck, S. and Cesa-Bianchi, N. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning 5, 1 (2012), 1–122; https://doi.org/10.1561/2200000024

9. Che, Y.K. and Horner, J.O. Optimal Design for Social Learning. Cowles Foundation Discussion Paper No. 2000, 2015.

10. Cramton, P., Shoham, Y., and Steinberg, R. An overview of combinatorial auctions. SIGecom Exchanges 7, 1 (2007), 3–14.

11. Dalvi, N., Domingos, P., Mausam, Sanghai, S., and Verma, D. Adversarial classification. In Proceedings of KDD, 2004, 99–108.

12. Frazier, P.I., Kempe, D., Kleinberg, J.M., and Kleinberg, R. Incentivizing exploration. In Proceedings of EC, 2014, 5–22.

13. Goodfellow, I.J., Shlens, J., and Szegedy, C. Explaining and harnessing adversarial examples. In Proceedings of ICLR, 2015.

14. Goren, G., Kurland, O., Tennenholtz, M., and Raiber, F. Ranking robustness under adversarial document manipulations. In Proceedings of SIGIR, 2018, 395–404.

15. Gyöngyi, Z. and Garcia-Molina, H. Web spam taxonomy. In Proceedings of AIRWeb, 2005, 39–47.

16. Immorlica, N., Kalai, A.T., Lucier, B., Moitra, A., Postlewaite, A., and Tennenholtz, M. Dueling algorithms. In Proceedings of STOC, 2011, 215–224.

17. Izsak, P., Raiber, F., Kurland, O., and Tennenholtz, M. The search duel: A response to a strong ranker. In Proceedings of SIGIR, 2014, 919–922.

18. Jardine, N. and van Rijsbergen, C.J. The use of hierarchic clustering in information retrieval. Information Storage and Retrieval 7, 5 (1971), 217–240.

19. Joachims, T., Granka, L., Pan, B., Hembrooke, H., and Gay, G. Accurately interpreting clickthrough data as implicit feedback. In Proceedings of SIGIR, 2005, 154–161.

20. Jones, K.S., Walker, S., and Robertson, S.E. A probabilistic model of information retrieval: development and comparative experiments—Part 1. Information Processing and Management 36, 6 (2000), 779–808.

21. Koutsoupias, E. and Papadimitriou, C. Worst-case equilibria. In Proceedings of STACS, 1999.

22. Kremer, I., Mansour, Y., and Perry, M. Implementing the wisdom of the crowd. J. Political Economy 122 (2014), 988–1012.

23. Lafferty, J. and Zhai, C. Probabilistic relevance models based on document and query generation. Language Modeling and Information Retrieval. Kluwer Academic Publishers, 2003, 1–10.

24. Lavrenko, V. and Croft, W.B. Relevance-based language models. In Proceedings of SIGIR, 2001, 120–127.

25. Liu, T. Learning to Rank for Information Retrieval. Springer, 2011.

26. Mansour, Y., Slivkins, A., and Syrgkanis, V. Bayesian incentive–compatible bandit exploration. In Proceedings of EC, 2015.

27. Mansour, Y., Slivkins, A., Syrgkanis, V., and Wu, Z.S. Bayesian Exploration: Incentivizing Exploration in Bayesian Games. CoRR (2016); http://arxiv.org/abs/1602.07570

28. Mitra, B. and Craswell, N. Neural Models for Information Retrieval. CoRR (2017); http://arxiv.org/abs/1705.01509

29. Nisan, N., Roughgarden, T., Tardos, E., and Vazirani, V.V. Algorithmic Game Theory. Cambridge University Press, 2007.

30. Procaccia, A. and Tennenholtz, M. Approximate mechanism design without Money. In Proceedings of EC, 2009.

31. Radinsky, K. and Bennett, P.N. Predicting content change on the Web. In Proceedings of WSDM, 2013, 415–424.

32. Raifer, N., Raiber, F., Tennenholtz, M., and Kurland, O. Information retrieval meets game theory: The ranking competition between documents' authors. In Proceedings of SIGIR, 2017, 465–474.

33. Robertson, S.E. The Probability Ranking Principle in IR. J. Documentation (1977), 294–304. Reprinted in Readings in Information Retrieval, K. Sparck Jones and P. Willett (eds), 1997, 281–286.

34. Roughgarden, T. and Tardos, E. How bad is selfish routing? JACM 49, 2 (April 2002), 236–259.

35. Salton, G., Wong, A., and Yang, C.S. A vector space model for automatic indexing. Commun. ACM 18, 11 (Nov. 1975), 613–620.

36. Santos, A.S.R., Pasini, B., and Freire, J. A first study on temporal dynamics of topics on the Web. In Proceedings of WWW, 2016, 849–854.

37. Shapley, L.S. A value for n-person games. Contributions to the Theory of Games II, Harold W. Kuhn and Albert W. Tucker, (eds.). Princeton University Press, Princeton, NJ, 1953, 307–317.

38. Varian, H.R. Online ad auctions. American Economic Review 99, 2 (2009).

39. Wang, L., Bennett, P.N., and Collins-Thompson, K. Robust ranking models via risk-sensitive optimization. In Proceedings of SIGIR, 2012, 761–770.

40. Yang, G.H., Sloan, M., and Wang, J. Dynamic Information Retrieval Modeling. Morgan & Claypool Publishers, 2016.


Authors

Moshe Tennenholtz ([email protected]) is a professor in the Faculty of Industrial Engineering and Management at Technion, Haifa, Israel, where he holds the Sondheimer Technion Academic Chair.

Oren Kurland ([email protected]) is a professor in the Faculty of Industrial Engineering and Management at Technion, Haifa, Israel.


Footnotes

a. There are efforts to reverse engineer ranking functions, but these often achieve quite limited success.

b. A response in terms of manipulating document content can be, for example, at the "micro-level:" selecting the terms to add or remove, or at the "macro-level:" making a document more similar to another document.32

c. Clickthrough refers to clicks of users on the results presented by the search engine.

d. These strategies can be probability distributions (a.k.a. mixed strategies) over pure strategies.

e. https://trec.nist.gov/

f. The pioneering work on this challenge assumed no user communication.9,22



©2019 ACM  0001-0782/19/12

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from [email protected] or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2019 ACM, Inc.


 
