
Advances in Economics and Econometrics

This is the first of three volumes containing edited versions of papers and commentaries presented at invited symposium sessions of the Eighth World Congress of the Econometric Society held in Seattle, WA, in August 2000. The papers summarize and interpret recent key developments, and they discuss future directions for a wide range of topics in economics and econometrics. The papers cover both theory and applications. Written by leading specialists in their fields, these volumes provide a unique survey of progress in the discipline.

Mathias Dewatripont is Professor of Economics at the Université Libre de Bruxelles where he was the founding Director of the European Centre for Advanced Research in Economics (ECARE). Since 1998, he has been Research Director of the London-based CEPR (Centre for Economic Policy Research) network. In 1998, he received the Francqui Prize, awarded each year to a Belgian scientist below the age of 50.

Lars Peter Hansen is Homer J. Livingston Distinguished Service Professor of Economics at the University of Chicago. He was a co-winner of the Frisch Prize Medal in 1984. He is also a member of the National Academy of Sciences.

Stephen J. Turnovsky is Castor Professor of Economics at the University of Washington and recently served as an Editor of the Journal of Economic Dynamics and Control. He is an Associate Editor and is on the Editorial Board of four other journals in economic theory and international economics.

Professors Dewatripont, Hansen, and Turnovsky are Fellows of the Econometric Society and were Program Co-Chairs of the Eighth World Congress of the Econometric Society, held in Seattle, WA, in August 2000.

Econometric Society Monographs No. 35

Editors:
Andrew Chesher, University College London
Matthew Jackson, California Institute of Technology

The Econometric Society is an international society for the advancement of economic theory in relation to statistics and mathematics. The Econometric Society Monograph Series is designed to promote the publication of original research contributions of high quality in mathematical economics and theoretical and applied econometrics.

Other titles in the series:
G. S. Maddala, Limited dependent and qualitative variables in econometrics, 0 521 33825 5
Gerard Debreu, Mathematical economics: Twenty papers of Gerard Debreu, 0 521 33561 2
Jean-Michel Grandmont, Money and value: A reconsideration of classical and neoclassical monetary economics, 0 521 31364 3
Franklin M. Fisher, Disequilibrium foundations of equilibrium economics, 0 521 37856 7
Andreu Mas-Colell, The theory of general economic equilibrium: A differentiable approach, 0 521 26514 2, 0 521 38870 8
Truman F. Bewley, Editor, Advances in econometrics – Fifth World Congress (Volume I), 0 521 46726 8
Truman F. Bewley, Editor, Advances in econometrics – Fifth World Congress (Volume II), 0 521 46725 X
Hervé Moulin, Axioms of cooperative decision making, 0 521 36055 2, 0 521 42458 5
L. G. Godfrey, Misspecification tests in econometrics: The Lagrange multiplier principle and other approaches, 0 521 42459 3
Tony Lancaster, The econometric analysis of transition data, 0 521 43789 X
Alvin E. Roth and Marilda A. Oliveira Sotomayor, Editors, Two-sided matching: A study in game-theoretic modeling and analysis, 0 521 43788 1
Wolfgang Härdle, Applied nonparametric regression, 0 521 42950 1
Jean-Jacques Laffont, Editor, Advances in economic theory – Sixth World Congress (Volume I), 0 521 48459 6
Jean-Jacques Laffont, Editor, Advances in economic theory – Sixth World Congress (Volume II), 0 521 48460 X
Halbert White, Estimation, inference and specification, 0 521 25280 6, 0 521 57446 3
Christopher Sims, Editor, Advances in econometrics – Sixth World Congress (Volume I), 0 521 56610 X
Christopher Sims, Editor, Advances in econometrics – Sixth World Congress (Volume II), 0 521 56609 6
Roger Guesnerie, A contribution to the pure theory of taxation, 0 521 23689 4, 0 521 62956 X
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume I), 0 521 58011 0, 0 521 58983 5
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume II), 0 521 58012 9, 0 521 58982 7
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume III), 0 521 58013 7, 0 521 58981 9
Donald P. Jacobs, Ehud Kalai, and Morton I. Kamien, Editors, Frontiers of research in economic theory: The Nancy L. Schwartz Memorial Lectures, 1983–1997, 0 521 63222 6, 0 521 63538 1
A. Colin Cameron and Pravin K. Trivedi, Regression analysis of count data, 0 521 63201 3, 0 521 63567 5
Steinar Strøm, Editor, Econometrics and economic theory in the 20th century: The Ragnar Frisch Centennial Symposium, 0 521 63323 0, 0 521 63365 6
Eric Ghysels, Norman R. Swanson, and Mark Watson, Editors, Essays in econometrics: Collected papers of Clive W.J. Granger (Volume I), 0 521 77297 4, 0 521 80401 8, 0 521 77496 9, 0 521 79697 0
Eric Ghysels, Norman R. Swanson, and Mark Watson, Editors, Essays in econometrics: Collected papers of Clive W.J. Granger (Volume II), 0 521 79207 X, 0 521 80401 8, 0 521 79649 0, 0 521 79697 0
Cheng Hsiao, Analysis of panel data, second edition, 0 521 81855 9, 0 521 52271 4
Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky, Editors, Advances in economics and econometrics – Eighth World Congress (Volume II), 0 521 81873 7, 0 521 52412 1
Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky, Editors, Advances in economics and econometrics – Eighth World Congress (Volume III), 0 521 81874 5, 0 521 52413 X

Advances in Economics and Econometrics
Theory and Applications, Eighth World Congress, Volume I

Edited by

Mathias Dewatripont Université Libre de Bruxelles and CEPR, London

Lars Peter Hansen University of Chicago

Stephen J. Turnovsky University of Washington

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo

Cambridge University Press
The Edinburgh Building, Cambridge, United Kingdom

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9780521818728

© Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky 2003

This book is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published in print format 2003

isbn-13 978-0-511-06989-5 eBook (EBL)
isbn-10 0-511-06989-8 eBook (EBL)
isbn-13 978-0-521-81872-8 hardback
isbn-10 0-521-81872-9 hardback
isbn-13 978-0-521-52411-7 paperback
isbn-10 0-521-52411-3 paperback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this book, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

List of Contributors
Preface
1. Auctions and Efficiency
   Eric Maskin
2. Why Every Economist Should Learn Some Auction Theory
   Paul Klemperer
3. Global Games: Theory and Applications
   Stephen Morris and Hyun Song Shin
4. Testing Contract Theory: A Survey of Some Recent Work
   Pierre-Andre Chiappori and Bernard Salanié
5. The Economics of Multidimensional Screening
   Jean-Charles Rochet and Lars A. Stole
A Discussion of the Papers by Pierre-Andre Chiappori and Bernard Salanié and by Jean-Charles Rochet and Lars A. Stole
   Patrick Legros
6. Theories of Fairness and Reciprocity: Evidence and Economic Applications
   Ernst Fehr and Klaus M. Schmidt
7. Hyperbolic Discounting and Consumption
   Christopher Harris and David Laibson
A Discussion of the Papers by Ernst Fehr and Klaus M. Schmidt and by Christopher Harris and David Laibson
   Glenn Ellison
8. Agglomeration and Market Interaction
   Masahisa Fujita and Jacques-François Thisse
9. Nonmarket Interactions
   Edward Glaeser and José A. Scheinkman
Index

Contributors

Pierre-Andre Chiappori
University of Chicago

Glenn Ellison
Massachusetts Institute of Technology

Ernst Fehr
University of Zurich and CEPR

Masahisa Fujita
Kyoto University

Edward Glaeser
Harvard University

Christopher Harris
University of Cambridge

Paul Klemperer
Oxford University

David Laibson
Harvard University

Patrick Legros
Université Libre de Bruxelles

Eric Maskin
Institute for Advanced Study and Princeton University

Stephen Morris
Yale University

Jean-Charles Rochet
GREMA and IDEI-R, Université des Sciences Sociales, Toulouse, France

Bernard Salanié
CREST, CNRS, and CEPR, Paris

José A. Scheinkman
Princeton University

Klaus M. Schmidt
University of Munich and CEPR

Hyun Song Shin
London School of Economics

Lars A. Stole
University of Chicago

Jacques-François Thisse
Université Catholique de Louvain, Ecole Nationale des Ponts et Chaussées, and CEPR

Preface

These volumes contain the papers of the invited symposium sessions of the Eighth World Congress of the Econometric Society. The meetings were held at the University of Washington, Seattle, in August 2000; we served as Program Co-Chairs. The book also contains an invited address, the “Seattle Lecture,” given by Eric Maskin. This address was in addition to other named lectures that are typically published in Econometrica. Symposium sessions had discussants, and about half of them wrote up their comments for publication. These remarks are included in the book after the session papers they comment on. The book chapters explore and interpret recent developments in a variety of areas in economics and econometrics. Although we chose topics and authors to represent the broad interests of members of the Econometric Society, the selected areas were not meant to be exhaustive. We deliberately included some new active areas of research not covered in recent Congresses. For many chapters, we encouraged collaboration among experts in an area. Moreover, some sessions were designed to span the econometrics–theory separation that is sometimes evident in the Econometric Society. We followed the lead of our immediate predecessors, David Kreps and Ken Wallis, by including all of the contributions in a single book edited by the three of us. Because of the number of contributions, we have divided the book into three volumes; the topics are grouped in a manner that seemed appropriate to us. We believe that the Eighth World Congress of the Econometric Society was very successful, and we hope that these books serve as suitable mementos of that event. We are grateful to the members of our Program Committee for their dedication and advice, and to Scott Parris at Cambridge University Press for his guidance and support during the preparation of these volumes. 
We also acknowledge support from the officers of the Society – Presidents Robert Lucas, Jean Tirole, Robert Wilson, Elhanan Helpman, and Avinash Dixit – as well as the Treasurer, Robert Gordon, and Secretary, Julie Gordon. Finally, we express our gratitude to the Co-Chairs of the Local Organizing Committee, Jacques Lawarree and Fahad Khalil, for a smoothly run operation.

CHAPTER 1

Auctions and Efficiency
Eric Maskin

1. INTRODUCTION

The allocation of resources is an all-pervasive theme in economics. Furthermore, the question of whether there exist mechanisms ensuring efficient allocation (i.e., mechanisms that ensure that resources end up in the hands of those who value them most) is of central importance in the discipline. Indeed, the very word “economics” connotes a preoccupation with the issue of efficiency. But economists’ interest in efficiency does not end with the question of existence. If efficient mechanisms can be constructed, we want to know what they look like and to what extent they might resemble institutions used in practice.

Understandably, the question of what will constitute an efficient mechanism has been a major concern of economic theorists going back to Adam Smith. But the issue is far from just a theoretical one. It is also of considerable practical importance. This is particularly clear when it comes to privatization, the transfer of assets from the state to the private sector. In the last 15 years or so, we have seen a remarkable flurry of privatizations in Eastern Europe, the former Soviet Union, China, and highly industrialized Western nations, such as the United States, the United Kingdom, and Germany. An important justification for these transfers has been the expectation that they will improve efficiency. But if efficiency is the rationale, an obvious leading question to ask is: “What sorts of transfer mechanisms will best advance this objective?”

One possible and, of course, familiar answer is “the Market.” We know from the First Theorem of Welfare Economics (see Debreu, 1959) that, under certain conditions, the competitive mechanism (the uninhibited exchange and production of goods by buyers and sellers) results in an efficient allocation. A major constraint on the applicability of this result to the circumstances of privatization, however, is the theorem’s hypothesis of large numbers.
For the competitive mechanism to work properly – to avoid the exercise of monopoly power – there must be sufficiently many buyers and sellers so that no single agent has an appreciable effect on prices. But privatization often entails small numbers. In the recent U.S. “spectrum” auctions – the auctions in which the government sold rights (in the form of licenses) to use certain radio frequency bands for telecommunications – there were often only two or three serious bidders for a given license. The competitive model does not seem readily applicable to such a setting.

An interesting alternative possibility was raised by William Vickrey (1961) 40 years ago. Vickrey showed that, if a seller has a single indivisible good for sale, a second-price auction (see Section 3) is an efficient mechanism – i.e., the winner is the buyer whose valuation of the good is highest – in the case where buyers have private values (“private values” mean that no buyer’s private information affects any other buyer’s valuation). This finding is rendered even more significant by the fact that it can be readily extended to the sale of multiple goods,¹ as shown by Theodore Groves (1973) and Edward Clarke (1971).

Unfortunately, once the assumption of private values is dropped and thus buyers’ valuations do depend on other buyers’ information (i.e., we are in the world of common² or interdependent values), the second-price auction is no longer efficient, as I will illustrate later by means of an example. Yet, the common-values case is the norm in practice. If, say, a telecommunications firm undertakes a market survey to forecast demand for cell phones in a given region, the results of the survey will surely be of interest to its competitors and thus turn the situation into one of common values.

Recently, a literature has developed on the design of efficient auctions in common-values settings. The time is not yet ripe for a survey; the area is currently evolving too rapidly for that. But I would like to take this opportunity to discuss a few of the ideas from this literature.

2. THE BASIC MODEL

Because it is particularly simple, I will begin with the case of a single indivisible good. Later, I will argue that much (but not all) of what holds in the one-good case extends to multiple goods. Suppose that there are $n$ potential buyers. It will be simplest to assume that they are risk-neutral (however, we can accommodate any other attitude toward risk if the model is specialized to the case in which there is no residual uncertainty about valuations when all buyers’ information is pooled). Assume that each buyer $i$’s private information about the good can be summarized by a real-valued signal. That is, buyer $i$’s information is reducible to a one-dimensional parameter.³ Formally, suppose that each buyer $i$’s signal $s_i$ lies in an interval $[\underline{s}_i, \bar{s}_i]$. The joint prior distribution of $(s_1, \ldots, s_n)$ is given by the c.d.f. $F(s_1, \ldots, s_n)$. Buyer $i$’s valuation for the good (i.e., the most he would be willing to pay for it) is given by the function $v_i(s_1, \ldots, s_n)$. I shall suppose (with little loss of generality) that higher values of $s_i$ correspond to higher valuations, i.e.,

$$\frac{\partial v_i}{\partial s_i} > 0. \tag{2.1}$$

¹ Vickrey himself also treated the case of multiple units of the same good.
² I am using “common values” in the broad sense to cover any instance where one agent’s payoff depends on another’s information. The term is sometimes used narrowly to mean that all agents share the same payoff.
³ Later on, I will examine the case of multidimensional signals. As with multiple goods, much will generalize. As we will see, the most problematic case is that in which there are both multiple goods and multidimensional signals.

Let us examine two illustrations of this model.

Example 2.1. Suppose that $v_i(s_1, \ldots, s_n) = s_i$. In this case, we are in the world of private values, not the interesting setting from the perspective of this lecture, but a valid special case.

A more pertinent example is:

Example 2.2. Suppose that the true value of the good to buyer $i$ is $y_i$, which, in turn, is the sum of a value component that is common to all buyers and a component that is peculiar to buyer $i$. That is, $y_i = z + z_i$, where $z$ is the common component and $z_i$ is buyer $i$’s idiosyncratic component. Suppose, however, that buyer $i$ does not actually observe $y_i$, but only a noisy signal

$$s_i = y_i + \varepsilon_i, \tag{2.2}$$

where $\varepsilon_i$ is the noise term, and all the random variables – $z$, the $z_i$s, and the $\varepsilon_i$s – are independent. In this case, every buyer $j$’s signal $s_j$ provides information to buyer $i$ about his valuation, because $s_j$ is correlated [via (2.2)] with the common component $z$. Hence, we can express $v_i(s_1, \ldots, s_n)$ as

$$v_i(s_1, \ldots, s_n) = E[y_i \mid s_1, \ldots, s_n], \tag{2.3}$$

where the right-hand side of (2.3) denotes the expectation of $y_i$ conditional on the signals $(s_1, \ldots, s_n)$.

This second example might be kept in mind as representative of the sort of scenario that the analysis is intended to apply to.

3. AUCTIONS

An auction in the model of Section 2 is a mechanism (alternatively termed a “game form” or “outcome function”) that, on the basis of the bids submitted, determines (i) who wins (i.e., who – if anyone – is awarded the good), and (ii) how much each buyer pays.⁴ Let us call an auction efficient provided that, in equilibrium, buyer $i$ is the winner if and only if

$$v_i(s_1, \ldots, s_n) \ge \max_{j \ne i} v_j(s_1, \ldots, s_n) \tag{3.1}$$

(this definition is slightly inaccurate because of the possibility of ties for highest valuation, an issue that I shall ignore). In other words, efficiency demands that, in an equilibrium of the auction, the winner be the buyer with the highest valuation, conditional on all available information (i.e., on all buyers’ signals). This notion of efficiency is sometimes called ex post efficiency. It assumes implicitly that the social value of the good being sold equals the maximum of the potential buyers’ individual valuations. This assumption would be justified if, for example, each buyer used the good (e.g., a spectrum license) to produce an output (e.g., telecommunication service) that is sold in a competitive market without significant externalities (market power or externalities might drive a wedge between individual and social values).

The reader may wonder why, even if one wants efficiency, it is necessary to insist that the auction itself be efficient. After all, the buyers could always retrade afterward if the auction resulted in a winner with less than the highest valuation. The problem with relying on postauction trade, however, is much the same as that plaguing competitive exchange in the first place: These mechanisms do not, in general, work efficiently when there are only a few traders. To see this, consider the following example:⁵

Example 3.1. Suppose that there are two buyers. Assume that buyer 1 has won the auction and has a valuation of 1. If the auction is not guaranteed to be efficient, then there is some chance that buyer 2’s valuation is higher. Suppose that, from buyer 1’s perspective, buyer 2’s valuation is distributed uniformly in the interval $[0, 2]$. Now, if there is to be further trade after the auction, someone has to initiate it. Let us assume that buyer 1 does so by proposing a trading price to buyer 2. Presumably, buyer 1 will propose a price $p^*$ that maximizes his expected payoff, i.e., that solves

$$\max_{p} \; \tfrac{1}{2}(2 - p)(p - 1). \tag{$*$}$$
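The maximization in (∗) can be checked numerically. The sketch below is illustrative only: the function name `expected_gain`, the variable names, and the price grid are assumptions introduced here, and the objective is read as the acceptance probability $(2 - p)/2$ multiplied by buyer 1's gain $p - 1$.

```python
from fractions import Fraction


def expected_gain(p):
    """Buyer 1's expected gain from offering price p to buyer 2.

    Assumes, as in the example, that buyer 1's own valuation is 1 and that
    buyer 2's valuation is uniform on [0, 2], so the offer is accepted with
    probability (2 - p) / 2.
    """
    accept_prob = (2 - p) / 2
    return accept_prob * (p - 1)


# Hypothetical price grid on [1, 2] in steps of 1/1000, kept exact with Fraction.
grid = [Fraction(k, 1000) for k in range(1000, 2001)]
p_star = max(grid, key=expected_gain)

# Chance that buyer 2 values the good above 1 but below the asking price,
# computed from the same uniform assumption.
stuck_prob = (p_star - 1) / 2

print(p_star, expected_gain(p_star), stuck_prob)
```

Exact fractions are used so that the grid maximizer can be compared with the analytic answer without rounding noise. Under the uniform assumption, `stuck_prob` is the probability that the good stays with buyer 1 even though buyer 2 values it more.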

[To understand (∗), note that $\tfrac{1}{2}(2 - p)$ is the probability that the proposal is accepted – since it is the probability that buyer 2’s valuation is at least $p$ – and that $p - 1$ is buyer 1’s net gain in the event of acceptance.] But the solution to (∗) is $p^* = \tfrac{3}{2}$. Hence, if buyer 2’s valuation lies between 1 and $\tfrac{3}{2}$, the allocation, even after allowing for ex post trade, will remain inefficient, because buyer 2 will reject 1’s proposal.

⁴ For some purposes – e.g., dealing with risk-averse buyers (see Maskin and Riley, 1984), liquidity constraints (see Che and Gale, 1996, or Maskin, 2000), or allocative externalities (see Jehiel and Moldovanu, 2001) – one must consider auctions in which buyers other than the winner also make payments. In this lecture, however, I will not have to deal with this possibility.
⁵ In this example, buyers have private values, but, as Fieseler, Kittsteiner, and Moldovanu (2000) show, resale can become even more problematic when there are common values.

I will first look at efficiency in the second-price auction. This auction form (often called the Vickrey auction) has the following rules: (i) each bidder $i$ makes a (sealed) bid $b_i$, which is a nonnegative number; (ii) the winner is the bidder who has made the highest bid (again ignoring the issue of ties); (iii) the winner pays the second-highest bid, $\max_{j \ne i} b_j$. As I have already noted and will illustrate explicitly in Section 6, this auction can readily be extended to multiple goods.

The Vickrey auction is efficient in the case of private values.⁶ To see this, note first that it is optimal – in fact, a dominant strategy – for buyer $i$ to set $b_i = v_i$ (i.e., to bid his true valuation). In particular, bidding below $v_i$ does not affect buyer $i$’s payment if he wins (because his payment does not depend on his own bid); it just reduces his chance of winning – and so is not a good strategy. Bidding above $v_i$ raises buyer $i$’s probability of winning, but the additional events in which he wins are precisely those in which someone else has bid higher than $v_i$. In such events, buyer $i$ pays more than $v_i$, also not a desirable outcome. Thus, it is indeed optimal to bid $b_i = v_i$, which implies that the winner is the buyer with the highest valuation, the criterion for efficiency.

Unfortunately, the Vickrey auction does not remain efficient once we depart from private values. To see this, consider the following example.

Example 3.4. Suppose that there are three buyers with valuation functions

$$\begin{aligned}
v_1(s_1, s_2, s_3) &= s_1 + \tfrac{2}{3} s_2 + \tfrac{1}{3} s_3, \\
v_2(s_1, s_2, s_3) &= s_2 + \tfrac{1}{3} s_1 + \tfrac{2}{3} s_3, \\
v_3(s_1, s_2, s_3) &= s_3.
\end{aligned}$$

Notice that buyers 1 and 2 have common values (i.e., their valuations do not depend only on their own signals). Assume that it happens that $s_1 = s_2 = 1$ (of course, buyers 1 and 2 would not know that their signal values are equal, because signals are private information), and suppose that buyer 3’s signal value is either slightly below or slightly above 1. In the former case, it is easy to see that

$$v_1 > v_2 > v_3,$$

and so, for efficiency, buyer 1 ought to win. However, in the latter case

$$v_2 > v_1 > v_3,$$

and so buyer 2 is the efficient winner. Thus, the efficient allocation between buyers 1 and 2 turns on whether $s_3$ is below or above 1. But, in a Vickrey auction, the bids made by buyers 1 and 2 cannot incorporate information about $s_3$, because that signal is private information to buyer 3. Thus, the outcome of the auction cannot in general be efficient.

⁶ It is easy to show that the “first-price” auction – the auction in which each buyer makes a bid, the high bidder wins, and the winner pays his bid – is a nonstarter as far as efficiency is concerned. Indeed, even in the case of private values, the first-price auction is never efficient, except when buyers’ valuations are symmetrically distributed (see Maskin, 1992).
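The second-price rules and the three-buyer example can be put side by side in a short sketch. The function names and the sample bids are assumptions made here for illustration. The valuation functions are those of the three-buyer example, with buyers indexed from 0 in the code.

```python
def second_price_auction(bids):
    """Return (winner_index, payment) under the second-price rules.

    The highest bidder wins and pays the second-highest bid. Ties are
    ignored, as in the text.
    """
    ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return ranked[0], bids[ranked[1]]


def valuations(s1, s2, s3):
    """Valuations of the three buyers as functions of all three signals."""
    v1 = s1 + (2 / 3) * s2 + (1 / 3) * s3
    v2 = s2 + (1 / 3) * s1 + (2 / 3) * s3
    v3 = s3
    return [v1, v2, v3]


def efficient_winner(s1, s2, s3):
    """Index (0, 1, or 2) of the buyer with the highest valuation."""
    v = valuations(s1, s2, s3)
    return max(range(3), key=lambda i: v[i])


# Private values: with truthful bids the highest-value buyer wins.
print(second_price_auction([0.7, 1.2, 0.9]))

# Interdependent values: s1 = s2 = 1, and s3 just below or just above 1.
print(efficient_winner(1.0, 1.0, 0.9), efficient_winner(1.0, 1.0, 1.1))
```

The efficient winner switches between the first and second buyer as `s3` crosses 1, although nothing that those two buyers observe has changed. Any bids they submit as functions of their own signals alone are therefore the same in both cases.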

4. AN EFFICIENT AUCTION

How should we respond to the shortcomings of the Vickrey auction as illustrated by Example 3.4? One possible reaction is to appeal to classical mechanism-design theory. Specifically, we could have each buyer $i$ announce a signal value $\hat{s}_i$, award the good to the buyer $i$ for whom $v_i(\hat{s}_1, \ldots, \hat{s}_n)$ is highest, and choose the winner’s payment to evoke truth-telling in buyers (i.e., to induce each buyer $j$ to set $\hat{s}_j$ equal to his true signal value $s_j$). This approach is taken in Crémer and McLean (1985) and Maskin (1992).

The problem with such a “direct revelation” mechanism is that it is utterly unworkable in practice. In particular, notice that it requires the mechanism designer to know the physical signal spaces $S_1, \ldots, S_n$, the functional forms $v_i(\cdot)$, and the prior distributions of the signals – an extraordinarily demanding constraint. Now, the mechanism designer could attempt to elicit this information from the buyers themselves using the methods of the implementation literature (see Palfrey, 1993). For example, to learn the signal spaces, he could have each buyer announce a vector $(\hat{S}_1, \ldots, \hat{S}_n)$ and assign suitable penalties if the announcements did not match up appropriately. A major difficulty with such a scheme, however, is that in all likelihood the signal spaces $S_i$ are themselves private information. For analytic purposes, we model $S_i$ as simply an interval of numbers. But this abstracts from the reality that buyer $i$’s signal corresponds to some physical entity – whatever it is that buyer $i$ observes. Indeed, the signal may well be a sufficient statistic for data from a variety of different informational sources, and there is no reason why other buyers should know just what this array of sources is.

To avoid these complications, I shall concentrate on auction rules that do not make use of such details as signal spaces, functional forms, and distributions.
Indeed, I will be interested in auctions that work well irrespective of these details; that is, I will adhere to the “Wilson Doctrine” (after Robert Wilson, who has been an eloquent proponent of the view that auction institutions should be “detail-free”).

It turns out that a judicious modification of the Vickrey auction will do the trick. Before turning to the modification, however, I need to introduce a restriction on valuation functions that is critical to the possibility of constructing efficient auctions. Let us assume that for all $i$ and $j \ne i$ and all $(s_1, \ldots, s_n)$,

$$v_i(s_1, \ldots, s_n) = v_j(s_1, \ldots, s_n) \;\Rightarrow\; \frac{\partial v_i}{\partial s_i}(s_1, \ldots, s_n) > \frac{\partial v_j}{\partial s_i}(s_1, \ldots, s_n).^{7} \tag{4.1}$$

⁷ This condition was introduced by Gresik (1991).

In other words, condition (4.1) says that buyer $i$’s signal has a greater marginal effect on his own valuation than on that of any other buyer $j$ (at least at points where buyer $i$’s and buyer $j$’s valuations are equal). Notice that, in view of (2.1), condition (4.1)⁸ is automatically satisfied by Example 2.1 (the case of private values): the right-hand side of the inequality then simply vanishes. Condition (4.1) also holds for Example 2.2. This is because, in that example, $s_i$ conveys relevant information to buyer $j$ ($\ne i$) about the common component $z$, but tells buyer $i$ not only about $z$ but also about his idiosyncratic component $z_i$. Thus, $v_i$ will be more sensitive than $v_j$ to variations in $s_i$.

But whether or not condition (4.1) is likely to be satisfied, it is, in any event, essential for efficiency. To see what can go wrong without it, consider the following example.

Example 4.5. Suppose that the owner of a tract of land wishes to sell off the rights to drill for oil on her property. There are two potential drillers who are competing for this right. Driller 1’s fixed cost of drilling is 1, whereas his marginal cost is 2. In contrast, driller 2 has fixed and marginal costs of 2 and 1, respectively. Assume that driller 1 observes how much oil is underground. That is, $s_1$ equals the quantity of oil. Driller 2 obtains no private information. Then, if the price of oil is 4, we have

$$v_1(s_1) = (4 - 2)s_1 - 1 = 2s_1 - 1,$$
$$v_2(s_1) = (4 - 1)s_1 - 2 = 3s_1 - 2.$$

Observe that $v_1(s_1) > v_2(s_1)$ if and only if $s_1 < 1$. Thus, for efficiency, driller 1 should be awarded drilling rights provided that $\tfrac{1}{2} < s_1 < 1$ (for $s_1 < \tfrac{1}{2}$, there is not enough oil to justify drilling at all). Driller 2, by contrast, should get the rights when $s_1 > 1$. In this example, there is no way (either through a modified Vickrey auction or otherwise) of inducing driller 1 to reveal the true value $s_1$ to allocate drilling rights efficiently.
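The drilling example can be tabulated directly. In the sketch below the function names, the finite-difference helper `slope`, and the sample quantities are assumptions made for illustration. The cost and price figures are those of the example.

```python
def v1(s1):
    """Driller 1: oil price 4, marginal cost 2, fixed cost 1."""
    return (4 - 2) * s1 - 1


def v2(s1):
    """Driller 2: oil price 4, marginal cost 1, fixed cost 2."""
    return (4 - 1) * s1 - 2


def efficient_allocation(s1):
    """Return 1 or 2 for the driller who should get the rights, or None
    when neither valuation is positive (not enough oil to justify drilling)."""
    if max(v1(s1), v2(s1)) <= 0:
        return None
    return 1 if v1(s1) > v2(s1) else 2


def slope(f, s, h=1e-6):
    """Central finite-difference estimate of df/ds at s."""
    return (f(s + h) - f(s - h)) / (2 * h)


for quantity in (0.25, 0.75, 1.5):
    print(quantity, efficient_allocation(quantity))

# Condition (4.1) compares the two slopes where the valuations are equal (s1 = 1).
print(slope(v1, 1.0), slope(v2, 1.0))
```

At the crossing point the signal moves driller 2's valuation faster than driller 1's own, which is the reverse of what condition (4.1) requires of the buyer who holds the signal.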
To see this, consider, without loss of generality, a direct revelation mechanism and let $t_1(\hat{s}_1)$ be a monetary transfer (possibly negative) to driller 1 if he announces signal value $\hat{s}_1$. Let $s_1'$ and $s_1''$ be signal values such that

$$\tfrac{1}{2} < s_1' < 1 < s_1''. \tag{4.2}$$

Then, for driller 1 to have the incentive to announce truthfully when $s_1 = s_1''$, we must have

$$t_1(s_1'') \ge 2s_1'' - 1 + t_1(s_1') \tag{4.3}$$

(the left-hand side is his payoff when he is truthful, whereas the right-hand side is his payoff when he pretends that $s_1 = s_1'$). Similarly, the incentive constraint corresponding to $s_1 = s_1'$ is

$$2s_1' - 1 + t_1(s_1') \ge t_1(s_1''). \tag{4.4}$$

Subtracting (4.4) from (4.3), that is, chaining the two inequalities so that the transfers cancel, we obtain $2s_1' - 1 \ge 2s_1'' - 1$, i.e., $2(s_1' - s_1'') \ge 0$, a contradiction of (4.2). Hence, there exists no efficient mechanism.

The feature that interferes with efficiency in this example is the violation of condition (4.1), i.e., the fact that $0 < \partial v_1 / \partial s_1 < \partial v_2 / \partial s_1$.

⁸ Notice that the strictness of the inequality in (4.1) rules out the case of “pure common values,” where all buyers share the same valuation. However, in that case, the issue of who wins does not matter for efficiency.
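The incompatibility of (4.3) and (4.4) can also be seen by treating them as bounds on a single number, the transfer difference $t_1(s_1'') - t_1(s_1')$. This is a minimal sketch with hypothetical function and argument names: `s_low` stands for $s_1'$ and `s_high` for $s_1''$.

```python
def transfer_gap_bounds(s_low, s_high):
    """Bounds on d = t1(s_high) - t1(s_low) implied by the two constraints.

    (4.3) requires d >= 2 * s_high - 1 (truth-telling at the high signal).
    (4.4) requires d <= 2 * s_low - 1  (truth-telling at the low signal).
    """
    lower = 2 * s_high - 1
    upper = 2 * s_low - 1
    return lower, upper


def truthful_revelation_possible(s_low, s_high):
    """True only if some transfer difference satisfies both constraints."""
    lower, upper = transfer_gap_bounds(s_low, s_high)
    return lower <= upper


# Any pair with 1/2 < s_low < 1 < s_high, as in (4.2), leaves no feasible d.
print(transfer_gap_bounds(0.75, 1.25), truthful_revelation_possible(0.75, 1.25))
```

Because the lower bound rises with `s_high` and the upper bound rises with `s_low`, the feasible interval is empty whenever `s_low < s_high`, whatever transfer schedule is tried.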

[...] $v_2^o$. (4.7)

To understand the rationale for (4.6) and (4.7), imagine that buyers bid truthfully. Because signals are private information and thus buyer 1 will not in general know his own valuation, truthful bidding means that, if his signal value is $s_1$, he submits a schedule $\hat{b}_1(\cdot) = b_1(\cdot)$ such that

$$b_1(v_2(s_1, s_2)) = v_1(s_1, s_2) \quad \text{for all } s_2.^{9} \tag{4.8}$$

That is, whatever $s_2$ (and hence $v_2$) turns out to be, buyer 1 bids his true valuation for that signal value. Similarly, truthful bidding for buyer 2 with signal value $s_2$ means reporting schedule $\hat{b}_2(\cdot) = b_2(\cdot)$, such that

$$b_2(v_1(s_1, s_2)) = v_2(s_1, s_2) \quad \text{for all } s_1. \tag{4.9}$$
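To make (4.8) and (4.9) concrete, consider a purely illustrative linear specification that is not part of the lecture: $v_1 = s_1 + \alpha s_2$ and $v_2 = s_2 + \beta s_1$, with hypothetical constants $0 \le \alpha, \beta < 1$. Under that assumption the truthful schedules can be written in closed form by solving the rival's valuation for the rival's signal and substituting:

```latex
\begin{align*}
v_2 = s_2 + \beta s_1 \;&\Longrightarrow\; s_2 = v_2 - \beta s_1, \\
b_1(v_2) &= s_1 + \alpha\,(v_2 - \beta s_1) = (1 - \alpha\beta)\,s_1 + \alpha\,v_2, \\
v_1 = s_1 + \alpha s_2 \;&\Longrightarrow\; s_1 = v_1 - \alpha s_2, \\
b_2(v_1) &= s_2 + \beta\,(v_1 - \alpha s_2) = (1 - \alpha\beta)\,s_2 + \beta\,v_1 .
\end{align*}
```

Each schedule depends only on the bidder's own signal and on the rival's valuation, so in this sketch neither buyer needs to know the other's signal, or the set of values it can take, in order to submit it.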

Observe that if buyers bid according to (4.8) and (4.9), then the true valuations $(v_1(s_1, s_2), v_2(s_1, s_2))$ constitute a fixed point in the sense of (4.6).¹⁰ In view of (4.6) and (4.7), this means that if buyers are truthful, the auction will result in an efficient allocation. Thus, the remaining critical issue is how to get buyers to bid truthfully.

For this purpose, it is useful to recall the device that the Vickrey auction exploits to induce truthful bidding, viz., to make the winner’s payment equal, not to his own bid, but to the lowest possible bid he could have made and still have won the auction. This trick cannot be exactly replicated in our setting because buyers are submitting schedules rather than single bids. But let us try to take it as far as it will go. Suppose that when buyers report the schedules $(\hat{b}_1(\cdot), \hat{b}_2(\cdot))$, the resulting fixed point $(v_1^o, v_2^o)$ satisfies $v_1^o > v_2^o$. Then, according to our rules, buyer 1 should win. But rather than having him pay $v_1^o$, we will have buyer 1 pay $v_1^*$, where

$$v_1^* = \hat{b}_2(v_1^*). \tag{4.10}$$

This payment rule, I maintain, is the common-values analog of the Vickrey trick in the sense that $v_1^*$ is the lowest constant bid (i.e., the lowest uncontingent bid) that buyer 1 could make and still win (or tie for winning) given buyer 2's bid $\hat b_2(\cdot)$. The corresponding payment rule for buyer 2 should he win is $v_2^*$ such that

$$v_2^* = \hat b_1(v_2^*). \tag{4.11}$$

I claim that, given the payment rules (4.10) and (4.11), it is an equilibrium for buyers to bid truthfully. To see this most easily, let us make use of a strengthened version of (4.1):

$$\frac{\partial v_i}{\partial s_i} > \frac{\partial v_j}{\partial s_i}. \tag{4.12}$$

[9] I noted in my arguments against direct revelation mechanisms that buyer 1 most likely will not know buyer 2's signal space $S_2$. But this in no way should prevent him from understanding how his own valuation is related to that of buyer 2, which is what (4.8) is really expressing [i.e., (4.8) still makes sense even if buyer 1 does not know what values $s_2$ can take].

[10] Without further assumptions on valuation functions, there could be additional – nontruthful – fixed points. Dasgupta and Maskin (2000) and Eso and Maskin (2000a) provide conditions to rule such fixed points out. But even if they are not ruled out, the auction rules can be modified so that, in equilibrium, the truthful fixed point results (see Dasgupta and Maskin, 2000).

Let us suppose that buyer 2 is truthful, i.e., he bids $b_2(\cdot)$ satisfying (4.9). I must show that it is optimal for buyer 1 to bid $b_1(\cdot)$ satisfying (4.8). Notice first that if buyer 1 wins, his payoff is

$$v_1(s_1, s_2) - v_1^*, \quad \text{where } v_1^* = b_2(v_1^*), \tag{4.13}$$

regardless of how he bids (because neither his valuation nor his payment depends on his bid). I claim that if buyer 1 bids truthfully, then he wins if and only if (4.13) is positive. Observe that if this claim is established, then I will in fact have shown that truthful bidding is optimal; because buyer 1's bid does not affect (4.13), the most he can possibly hope for is to win precisely in those cases where the net payoff from winning is positive. To see that the claim holds, let us first differentiate (4.9) with respect to $s_1$ to obtain

$$\frac{db_2}{dv_1}\bigl(v_1(s_1, s_2)\bigr)\,\frac{\partial v_1}{\partial s_1}(s_1, s_2) = \frac{\partial v_2}{\partial s_1}(s_1, s_2) \quad \text{for all } s_1.$$

This identity, together with (2.1) and (4.12), implies that

$$\frac{db_2}{dv_1}(v_1) < 1 \quad \text{for all } v_1. \tag{4.14}$$

But, from (4.14), (4.13) is positive if and only if

$$v_1(s_1, s_2) - v_1^* > \frac{db_2}{dv_1}(v_1')\,\bigl(v_1(s_1, s_2) - v_1^*\bigr) \quad \text{for all } v_1'. \tag{4.15}$$

Now, from the mean value theorem, there exists $v_1' \in [v_1^*, v_1(s_1, s_2)]$ such that

$$b_2(v_1(s_1, s_2)) - b_2(v_1^*) = \frac{db_2}{dv_1}(v_1')\,\bigl(v_1(s_1, s_2) - v_1^*\bigr).$$

Hence (4.13) is positive if and only if

$$v_1(s_1, s_2) - v_1^* > b_2(v_1(s_1, s_2)) - b_2(v_1^*),$$

which, because

$$v_1^* = b_2(v_1^*) \tag{4.16}$$

and, by (4.9), $b_2(v_1(s_1, s_2)) = v_2(s_1, s_2)$, is equivalent to

$$v_1(s_1, s_2) > v_2(s_1, s_2). \tag{4.17}$$

Now suppose that buyer 1 is truthful. Because $(v_1(s_1, s_2), v_2(s_1, s_2))$ is then a fixed point, 1 wins if and only if (4.17) holds. So, we can conclude that, when buyer 1 is truthful, his net payoff from winning is positive [i.e., (4.13) is positive] if and only if he wins, which is what I claimed. That is, the modified Vickrey auction is efficient. (This analysis ignores the possible costs to buyers of acquiring signals; once such costs are incorporated, the modified Vickrey auction is no longer efficient in general – see Maskin, 1992 and Bergemann and Välimäki, 2000.)

An attractive feature of the Vickrey auction in the case of private values is that bidding one's true valuation is optimal regardless of the behavior of other buyers (i.e., it is a dominant strategy). Once we abandon private values, however, there is no hope of finding an efficient mechanism with dominant strategies (this is because, if my payoff depends on your signal, then my optimal strategy necessarily depends on the way that your strategy reflects your signal value, and so is not independent of what you do). Nevertheless, equilibrium in our modified Vickrey auction has a strong robustness property. In particular, notice that although, technically, truthful bidding constitutes only a Bayesian (rather than dominant-strategy) equilibrium, equilibrium strategies are independent of the prior distribution of signals F. That is, regardless of buyers' prior beliefs about signals, they will behave the same way in equilibrium. In particular, this means that the modified Vickrey auction will be efficient even in the case in which buyers' signals are believed to be independent of one another.[11] It also means that truthful bidding will remain an equilibrium even after buyers learn one another's signal values; i.e., truthful bidding constitutes an ex post Nash equilibrium. Finally, Chung and Ely (2001) show that, at least in the two-buyer case, the modified Vickrey auction is dominant solvable.

One might complain that having a buyer make his bid a function of the other buyer's valuation imposes a heavy informational burden on him – what if he does not know anything about the connection between the other's valuation and his own? I would argue, however, that the modified Vickrey auction should be viewed as giving buyers an additional opportunity rather than as setting an onerous requirement. After all, the degree to which a buyer makes his bid contingent is entirely up to him.
In particular, he always has the option of bidding entirely uncontingently (i.e., of submitting a constant function). Thus, contingency is optional (but, of course, the degree to which the modified Vickrey auction will be more efficient than the ordinary Vickrey will turn on the extent to which buyers are prepared to bid contingently).

I have explicitly illustrated how the modified Vickrey auction works only in the case of two bidders, but the logic extends immediately to larger numbers. For the case of n buyers, the rules become:

1. Each buyer i submits a contingent bid schedule $\hat b_i(\cdot)$, which is a function of $v_{-i}$, the vector of valuations excluding that of buyer i.
2. The auctioneer computes a fixed point $(v_1^o, \ldots, v_n^o)$, where $v_i^o = \hat b_i(v_{-i}^o)$ for all i.
3. The winner is the buyer i for whom $v_i^o \ge v_j^o$ for all $j \ne i$.
4. The winner pays $\max_{j \ne i} \hat b_j(v_{-j}^*)$, where, for all $j \ne i$, $v_j^*$ satisfies $v_j^* = \hat b_j(v_{-j}^*)$.

[11] Crémer and McLean (1985) exhibit a mechanism that attains efficiency if the joint distribution of signals is common knowledge (including to the auction designer) and exhibits correlation. R. McLean and A. Postlewaite (2001) show how this sort of mechanism can be generalized to the case where the auction designer himself does not know the joint distribution.

Under conditions (2.1) and (4.1), an argument similar to the two-buyer demonstration establishes that it is an equilibrium in this auction for each buyer to bid truthfully (see Dasgupta and Maskin, 2000).[12] That is, if buyer i's signal value is $s_i$, he should set $\hat b_i(\cdot) = b_i(\cdot)$ such that

$$b_i(v_{-i}(s_i, s_{-i})) = v_i(s_i, s_{-i}) \quad \text{for all } s_{-i}.^{[13]} \tag{4.18}$$
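As a rough illustration of these rules in the two-buyer case, the following Python sketch computes the fixed point and the winner's payment when both submitted schedules happen to be affine, $\hat b_i(v_j) = a_i + c_i v_j$. The affine restriction, the function names, and the tie-breaking in favor of buyer 1 are assumptions made for this example only. The truthful schedules used are the ones implied by (4.8) and (4.9) for the valuations $v_1 = 2 + s_1 - 2s_2$ and $v_2 = 2 + s_2 - 2s_1$ that appear in Example 5.6 of Section 5.

```python
from fractions import Fraction as F

def truthful_schedule(s_own):
    # Hypothetical helper: for v_i = 2 + s_i - 2*s_j, truthful bidding in the
    # sense of (4.8)/(4.9) gives b_i(v_j) = 6 - 3*s_i - 2*v_j.
    # A schedule is stored as (intercept, slope).
    return (6 - 3 * s_own, F(-2))

def fixed_point(b1, b2):
    # Solve v1 = a1 + c1*v2 and v2 = a2 + c2*v1 simultaneously.
    (a1, c1), (a2, c2) = b1, b2
    det = 1 - c1 * c2
    if det == 0:
        raise ValueError("schedules have no unique fixed point")
    return (a1 + c1 * a2) / det, (a2 + c2 * a1) / det

def winner_payment(rival_schedule):
    # Lowest constant bid that still ties with the rival: v* = a + c*v*,
    # which is the payment rule (4.10)/(4.11) for an affine rival schedule.
    a, c = rival_schedule
    if c == 1:
        raise ValueError("payment equation has no unique solution")
    return a / (1 - c)

def modified_vickrey(b1, b2):
    v1, v2 = fixed_point(b1, b2)
    if v1 >= v2:  # assumption: ties are resolved in favor of buyer 1
        return 1, winner_payment(b2)
    return 2, winner_payment(b1)

s1, s2 = F(3, 4), F(1, 4)
b1, b2 = truthful_schedule(s1), truthful_schedule(s2)
valuations = fixed_point(b1, b2)
winner, payment = modified_vickrey(b1, b2)
```

With $s_1 = 3/4$ and $s_2 = 1/4$, the fixed point reproduces the true valuations $(9/4, 3/4)$, buyer 1 wins, and his payment of $7/4$ is determined entirely by buyer 2's schedule.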

Furthermore, it is easy to see that, if buyers bid truthfully, the auction results in an efficient allocation.

One drawback of the modified Vickrey auction that I have exhibited is that a buyer must report quite a bit of information (this is an issue distinct from that of the buyer's having to know a great deal, discussed previously) – a bid for each possible vector of valuations that others may have. Perry and Reny (1999a) have devised an alternative modification of the Vickrey auction that considerably reduces the complexity of the buyer's report. Specifically, the Perry–Reny auction consists of two rounds of bidding. This means that a buyer can make his second-round bid depend on whatever he learned about other buyers' valuations from their first-round bids, and so the auction avoids the need to report bid schedules. In the first round, each buyer i submits a bid $b_i \ge 0$. In the second round, each buyer i submits a bid $b_i^j$ for each buyer $j \ne i$. If some buyer submits a bid of zero in the first round, then the Vickrey rules apply: the winner is the high bidder, and he pays the second-highest bid. If all first-round bids are strictly positive, then the second-round bids determine the outcome. In particular, if there exists a buyer i such that

$$b_i^j \ge b_j^i \quad \text{for all } j \ne i, \tag{4.19}$$

then buyer i wins and pays $\max_{j \ne i} b_j^i$. If there exists no i satisfying (4.19), then the good is allocated at random. Perry and Reny show that, under conditions (2.1) and (4.1) and provided that the probability a buyer has a zero valuation is zero, there exists an efficient equilibrium of this auction. They also demonstrate that the auction can be readily extended to the case in which multiple identical goods are sold, provided that a buyer's marginal utility from additional units is declining.

[12] The reader may wonder whether, when (4.1) is not satisfied and so an efficient auction may not be possible, the efficiency of the final outcome could be enhanced by allowing buyers to retrade after the auction is over. However, any postauction trading episode could alternatively be viewed as part of a single mechanism that embraces both it and the auction proper. That is, in our search for efficient auctions, we need not consider postauction trade, because such activity could always be folded into the auction itself. Indeed, permitting trade after an auction can, in principle, distort buyers' bidding in the same way that the prospect of renegotiation can distort parties' behavior in the execution of a contract (see Dewatripont, 1989). Ausubel and Cramton (1999) argue that only an efficient auction is exempt from such distortion.

[13] It is conceivable – although unlikely – that for a given vector $v_{-i}$ there could exist two different signal vectors $s_{-i}'$ and $s_{-i}''$, such that $v_{-i}(s_i, s_{-i}') = v_{-i}(s_i, s_{-i}'') = v_{-i}$, but $v_i(s_i, s_{-i}') \ne v_i(s_i, s_{-i}'')$, in which case (4.18) is not well defined. To see how to handle that possibility, see Dasgupta and Maskin (2000).

5. THE ENGLISH AUCTION

The reader may wonder why, in my discussion of efficiency, I have not brought up the English auction, the familiar open format in which (i) buyers call out bids publicly (with the proviso that each successive bid exceed the one before), (ii) the winner is the last buyer to make a bid, and (iii) the winner pays his bid. After all, the opportunity to observe other buyers' bids in the English auction would seem to allow a buyer to make a conditional bid in the same way that the modified Vickrey auction does. However, as shown in Maskin (1992), Eso and Maskin (2000b), and Krishna (2000), the English auction is not efficient in as wide a class of cases as the modified Vickrey auction. To see this, let us consider a variant of the English auction, sometimes called the "Japanese" auction (see Milgrom and Weber, 1982), which is particularly convenient analytically:

1. All buyers are initially in the auction.
2. The auctioneer raises the price continuously starting from zero.
3. A buyer can drop out (publicly) at any time.
4. The last buyer remaining wins.
5. The winner pays the price prevailing when the penultimate buyer dropped out.
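The five rules above can be sketched as a small simulation. This is only an illustrative sketch, not an equilibrium computation: each buyer's strategy is assumed to be a hypothetical function that maps the public history of drop-outs to a planned drop-out price, ties are broken by the lowest buyer label, and all numbers are made up.

```python
def japanese_auction(strategies):
    """Simulate the ascending-clock rules for given drop-out strategies.

    strategies maps a buyer label to a function history -> planned drop-out
    price, where history is a dict {buyer who dropped out: drop-out price}.
    """
    active = set(strategies)
    history = {}
    price = 0
    while len(active) > 1:
        # A buyer cannot leave below the current price, so clamp each plan.
        plans = {i: max(price, strategies[i](history)) for i in active}
        # Assumption: simultaneous drop-outs are ordered by buyer label.
        leaver = min(sorted(active), key=lambda i: plans[i])
        price = plans[leaver]
        history[leaver] = price
        active.remove(leaver)
    (winner,) = active
    # The winner pays the price at which the penultimate buyer dropped out.
    return winner, price, history

# Hypothetical strategies: buyer 3 revises his plan once buyer 1 has left.
strategies = {
    1: lambda history: 10,
    2: lambda history: 30,
    3: lambda history: 20 + 0.5 * history[1] if 1 in history else 20,
}
winner, price, history = japanese_auction(strategies)
```

In this run buyer 1 leaves at 10, buyer 3 then raises his planned exit from 20 to 25, and buyer 2 wins at a price of 25. The history argument is what lets a strategy condition on buyers who have already dropped out, and only on those.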

Now, in this auction, a buyer can indeed condition his drop-out point according to when other buyers have dropped out, allowing bids in effect to be conditional on other buyers' valuations. However, a buyer can condition only on buyers who have already dropped out. Thus, for efficiency, buyers must drop out in the "right" order in the equilibrium. That this might not happen is illustrated by the following example from Eso and Maskin (2000a):

Example 5.6. Suppose there are two buyers, where $v_1(s_1, s_2) = 2 + s_1 - 2s_2$ and $v_2(s_1, s_2) = 2 + s_2 - 2s_1$, and $s_1$ and $s_2$ are distributed uniformly on [0, 1]. Notice first that conditions (2.1) and (4.1) hold, so that the modified Vickrey auction results in an efficient equilibrium allocation. Indeed, buyers' equilibrium contingent bids are

$$b_1(v_2) = 6 - 3s_1 - 2v_2$$

and

$$b_2(v_1) = 6 - 3s_2 - 2v_1.$$

Now, consider the English auction. For i = 1, 2, let $p_i(s_i)$ be the price at which buyer i drops out if his signal value is $s_i$. If the English auction were efficient, then we would have

$$s_1 > s_2 \quad \text{if and only if} \quad p_1(s_1) > p_2(s_2). \tag{$\blacklozenge$}$$

From symmetry, if $s_1 = s_2 = s$, then

$$p_1(s_1) = p_2(s_2). \tag{$\blacklozenge\blacklozenge$}$$

But from ($\blacklozenge$) and ($\blacklozenge\blacklozenge$),

$$p_i(s + \Delta s) > p_i(s), \text{ and so } p_i(\cdot) \text{ is strictly increasing in } s_i. \tag{$\blacklozenge\blacklozenge\blacklozenge$}$$

Thus, $p_1(s) = v_1(s, s)$ and $p_2(s) = v_2(s, s)$ [if $v_1(s, s) > p_1(s)$ and $s_1 = s_2 = s$, then buyer 1 drops out before the price reaches his valuation and so would do better to stay in a bit longer; if $v_1(s, s) < p_1(s)$, then buyer 1 stays in for prices above his valuation, and so would do better to drop out earlier]. But $v_1(s, s) = 2 + s - 2s = 2 - s$, which is decreasing in s, violating our finding that $p_1(\cdot)$ is increasing. In short, efficiency demands that a buyer with a lower signal value drop out first. But, if buyer i's signal value is s, he has the incentive to drop out when the price equals $v_i(s, s)$, and this function is decreasing in s. So, in equilibrium, buyers will not drop out in the right order. We conclude that the English auction does not have an efficient equilibrium in this example.

In Example 5.6, each buyer's valuation is decreasing in the other buyer's signal. Indeed, this feature is important: as Maskin (1992) shows, the English and Vickrey auctions are efficient in the case n = 2 when valuations are nondecreasing functions of signals [and conditions (2.1) and (4.1) hold]. However, examples due to Perry and Reny (1999b), Krishna (2000), and Eso and Maskin (2000b) demonstrate that this result does not extend to more than two buyers. Nevertheless, Krishna (2000) provides some interesting conditions [considerably stronger than the juxtaposition of (2.1) and (4.1)] under which the English auction is efficient with three or more buyers (see also Eso and Maskin, 2000b). Moreover, Izmalkov (2001) shows that these conditions can be relaxed considerably when reentry in the English auction is permitted. Finally, Perry and Reny (1999b) show that the English auction can be modified [in a way analogous to their (1999a) alteration of the Vickrey auction] so that it is efficient under the same conditions as the modified Vickrey auction. In fact, this modified English auction extends to multiple (identical) units, as long as buyers' marginal valuations are decreasing in the number of units consumed [in the multiunit case, the Perry–Reny auction is actually a modification of the Ausubel (1997) generalization of the English auction].

6. MULTIPLE GOODS

In the same way that the ordinary Vickrey auction extends to multiple goods via the Groves–Clarke mechanism, so our modified Vickrey auction can be extended to handle more than one good. It is simplest to consider the case of two buyers, 1 and 2, and two goods, A and B. If there were private values, the pertinent information about buyer i would consist of three numbers ($v_{iA}$, $v_{iB}$, and $v_{iAB}$), his valuations, respectively, for good A, good B, and both goods together. Efficiency would then mean allocating the goods to maximize the sum of valuations. For example, it would be efficient to allocate both goods to buyer 1 provided that

$$v_{1AB} \ge \max\{v_{1A} + v_{2B},\; v_{1B} + v_{2A},\; v_{2AB}\}.$$

The Groves–Clarke mechanism is the natural generalization of the Vickrey auction to a multigood setting. In this mechanism, buyers submit valuations (in our two-good, private-values model, each buyer i submits $\hat v_{iA}$, $\hat v_{iB}$, and $\hat v_{iAB}$); the goods are allocated in the way that maximizes the sum of the submitted valuations; and each buyer makes a payment equal to his marginal impact on the other buyers (as measured by their submitted valuations). Thus, in the private-values model, if buyer 1 is allocated good A, then he should pay

$$\hat v_{2AB} - \hat v_{2B}, \tag{6.1}$$

because $\hat v_{2AB}$ would be buyer 2's payoff were buyer 1 absent, $\hat v_{2B}$ is his payoff given buyer 1's presence, and so the difference between the two – i.e., (6.1) – is buyer 1's marginal effect on buyer 2.

Given private values, bidding one's true valuation is a dominant strategy in the Vickrey auction, and the same is true in the Groves–Clarke mechanism. Hence, in view of its allocative rule, the mechanism is efficient in the case of private values. But, as with the Vickrey auction, the Groves–Clarke mechanism is not efficient when there are common values. Hence, I shall examine a modification of Groves–Clarke analogous to that for Vickrey.

As in the one-good case, assume that each buyer i (i = 1, 2) observes a private real-valued signal $s_i$. Buyer i's valuations are functions of the two signals: $v_{iA}(s_1, s_2)$, $v_{iB}(s_1, s_2)$, $v_{iAB}(s_1, s_2)$. The appropriate counterpart to condition (2.1) is the requirement that if $H$ and $H'$ are two bundles of goods for which, given $(s_1, s_2)$, buyer i prefers $H$, then the intensity of that preference rises with $s_i$. That is, for all i = 1, 2 and for any two bundles $H, H' = \emptyset, A, B, AB$,

$$v_{iH}(s_1, s_2) - v_{iH'}(s_1, s_2) > 0 \;\Rightarrow\; \frac{\partial}{\partial s_i}\bigl(v_{iH}(s_1, s_2) - v_{iH'}(s_1, s_2)\bigr) > 0. \tag{6.2}$$

Notice that if, in particular, $H = A$ and $H' = \emptyset$, then (6.2) just reduces to the requirement that if $v_{iA}(s_1, s_2) > 0$, then $\partial v_{iA}/\partial s_i\,(s_1, s_2) > 0$, i.e., to (2.1). Similarly, the proper generalization of (4.1) is the requirement that if, for given signal values, two allocations of goods are equally efficient (i.e., give rise to the same sum of valuations), then an increase in $s_i$ leads to the allocation that buyer i prefers to become the more efficient. That is, for all i = 1, 2, and any two allocations $(H_1, H_2)$, $(H_1', H_2')$, if

$$\sum_{j=1}^{2} v_{jH_j}(s_1, s_2) = \sum_{j=1}^{2} v_{jH_j'}(s_1, s_2) \quad \text{and} \quad v_{iH_i}(s_1, s_2) > v_{iH_i'}(s_1, s_2),$$

then

$$\frac{\partial}{\partial s_i} \sum_{j=1}^{2} v_{jH_j}(s_1, s_2) > \frac{\partial}{\partial s_i} \sum_{j=1}^{2} v_{jH_j'}(s_1, s_2). \tag{6.3}$$

Notice that, if just one good A was being allocated and the two allocations were $(H_1, H_2) = (A, \emptyset)$ and $(H_1', H_2') = (\emptyset, A)$, then, when i = 1, condition (6.3) would reduce to the requirement

$$\text{if } v_{1A}(s_1, s_2) = v_{2A}(s_1, s_2) \text{ and } v_{1A}(s_1, s_2) > 0, \text{ then } \frac{\partial v_{1A}}{\partial s_1}(s_1, s_2) > \frac{\partial v_{2A}}{\partial s_1}(s_1, s_2), \tag{6.4}$$

which is just (4.1). An auction is efficient in this setting if, for all $(s_1, s_2)$, the equilibrium allocation $(H_1^o, H_2^o)$ solves

$$\max_{(H_1, H_2)} \sum_{i=1}^{2} v_{iH_i}(s_1, s_2).$$
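For concreteness, the private-values Groves–Clarke mechanism described earlier in this section can be sketched for two buyers and two goods as follows. The sketch covers only the private-values benchmark, not the modified mechanism with contingent bids; the function names and the submitted valuations are hypothetical, and the empty string stands for the empty bundle.

```python
from itertools import product

BUNDLES = ("", "A", "B", "AB")  # "" stands for the empty bundle

def feasible_allocations():
    # All pairs (H1, H2) in which no good is given to both buyers.
    return [(h1, h2) for h1, h2 in product(BUNDLES, repeat=2)
            if not set(h1) & set(h2)]

def efficient_allocation(reports):
    # reports[i][H] is buyer i's submitted valuation for bundle H;
    # choose the allocation maximizing the sum of submitted valuations.
    return max(feasible_allocations(),
               key=lambda alloc: reports[0][alloc[0]] + reports[1][alloc[1]])

def clarke_payment(reports, alloc, i):
    # Buyer i pays his marginal impact on the other buyer j: j's best payoff
    # if i were absent minus j's payoff at the chosen allocation.
    j = 1 - i
    return max(reports[j].values()) - reports[j][alloc[j]]

reports = [
    {"": 0, "A": 5, "B": 1, "AB": 6},  # buyer 1 (hypothetical numbers)
    {"": 0, "A": 2, "B": 4, "AB": 7},  # buyer 2 (hypothetical numbers)
]
alloc = efficient_allocation(reports)
payments = [clarke_payment(reports, alloc, i) for i in (0, 1)]
```

Here buyer 1 receives A and buyer 2 receives B. Buyer 1's payment is $\hat v_{2AB} - \hat v_{2B} = 7 - 4 = 3$, which is expression (6.1), and buyer 2's payment is $6 - 5 = 1$.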

Under assumptions (6.2) and (6.3), the following rules constitute an efficient auction:

1. Buyer i submits schedules $\hat b_{iA}(\cdot)$, $\hat b_{iB}(\cdot)$, $\hat b_{iAB}(\cdot)$, where for all $H = A, B, AB$ and all $v_j$, $\hat b_{iH}(v_j)$ = buyer i's bid for H if buyer j's ($j \ne i$) valuations are $v_j = (v_{jA}, v_{jB}, v_{jAB})$.
2. The auctioneer computes a fixed point $(v_1^o, v_2^o)$ such that, for all i and H, $v_{iH}^o = \hat b_{iH}(v_j^o)$.
3. Goods are divided according to allocation $(H_1^o, H_2^o)$, where

$$(H_1^o, H_2^o) = \arg\max_{(H_1, H_2)} \sum_{i=1}^{2} v_{iH_i}^o.$$

4. Suppose that buyer 1 is allocated good A (i.e., H1o = A); if (i) there exists v 1∗ such that ∗ v 1A + bˆ 2B (v 1∗ ) = bˆ 2AB (v 1∗ ), (6.5) then buyer 1 pays bˆ 2AB (v 1∗ ) − bˆ 2B (v 1∗ ) ; ∗ if instead of (6.5), (ii) there exist vˆ 1∗ (with vˆ 1A ∗ ∗ vˆ 1A + bˆ 2B (ˆv 1∗ ) = vˆ 1B + bˆ 2A (ˆv 1∗ )

(6.6)

s2 + βs11 + γ s12

(7.1)

+ γ s12 , s1

+ s2

+ αs2 < s2 + βs11

(7.2)

but

so that, with $(s_{11}, s_{12}) = (s_{11}', s_{12}')$, the good should be allocated to buyer 1 and, with $(s_{11}, s_{12}) = (s_{11}'', s_{12}'')$, it should be allocated to buyer 2 [if $\beta = \gamma$, this conflict does not arise; the directions of the inequalities in (7.1) and (7.2) must be the same]. Hence, an efficient auction is impossible when $\beta \ne \gamma$.

However, because buyer 1 cares only about the sum $s_{11} + s_{12}$, it is natural to define $r_1 = s_{11} + s_{12}$ and set

$$w_1(r_1, s_2) = r_1 + \alpha s_2$$

and

$$w_2(r_1, s_2) = E_{s_{11}, s_{12}}\bigl[s_2 + \beta s_{11} + \gamma s_{12} \mid s_{11} + s_{12} = r_1\bigr].$$

Notice that we have reduced the two-dimensional signal $s_1$ to the one-dimensional signal $r_1$. Furthermore, provided that $\alpha$, $\beta$, and $\gamma$ are all less than 1 [so that condition (4.1) holds], our modified Vickrey auction is efficient with respect to the "reduced" valuation functions $w_1(\cdot)$ and $w_2(\cdot)$ (because all the analysis of Section 4 applies). Hence, a moment's reflection should convince the reader that, although full efficiency is impossible for the valuation functions $v_1(\cdot)$ and $v_2(\cdot)$, the modified Vickrey auction is constrained efficient, where "constrained" refers to the requirement that buyer 1 must behave the same way for any pair $(s_{11}, s_{12})$ summing to the same $r_1$ (in the terminology of Holmstrom and Myerson, 1983, the auction is "incentive efficient").

Unfortunately, as Jehiel and Moldovanu (2001) show in their important paper, this trick of reducing a multidimensional signal to one dimension no longer works in general if there are multiple goods. To see the problem, suppose that, as in Section 6, there are two goods, A and B, but that now a buyer i (i = 1, 2, 3) receives two signals – one for each good. Specifically, let $s_{iA}$ and $s_{iB}$ be buyer i's signals for A and B, respectively, and let his valuation functions be

$$v_{iA}(s_{1A}, s_{2A}, s_{3A}) \quad \text{and} \quad v_{iB}(s_{1B}, s_{2B}, s_{3B}).$$

Assume that each buyer wants to buy at most one good. Let us first fix the signal values of buyers 2 and 3 at levels such that, as we vary $s_{1A}$ and $s_{1B}$, either (i) it is efficient to allocate good A to buyer 1 and B to 2, or (ii) it is efficient to allocate good A to 2 and B to 3. In case (i), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) > v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}),$$

that is,

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) > v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B}), \tag{7.3}$$

whereas in case (ii), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) < v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B}). \tag{7.4}$$

Notice that buyer 1's objective function does not depend on $s_{1B}$ [$s_{1B}$ affects only buyer 1's valuation for good B, but buyer 1 is not allocated B in either case (i) or (ii)]. Hence, the equilibrium outcome of any auction cannot turn on the value of this parameter. But this means that, if an auction is efficient, which of case (i) or (ii) [i.e., which of (7.3) or (7.4)] holds cannot depend on $s_{1B}$. We conclude, from the right-hand sides of (7.3) and (7.4), that

$$v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B})$$

must be independent of $s_{1B}$. Expressed differently, we have

$$\frac{\partial}{\partial s_{1B}} v_{3B}(s_{1B}, s_{2B}, s_{3B}) = \frac{\partial}{\partial s_{1B}} v_{2B}(s_{1B}, s_{2B}, s_{3B}).$$

Repeating the argument for all other pairs of buyers and for good B, we have

$$\frac{\partial v_{jH}}{\partial s_{iH}} = \frac{\partial v_{kH}}{\partial s_{iH}} \quad \text{for all } j \ne i \ne k \text{ and } H = A, B. \tag{7.5}$$

Next, let us fix the signal values of buyers 2 and 3 at levels such that, as we vary $s_{1A}$ and $s_{1B}$, either (iii) it is efficient to allocate A to buyer 1 and B to 2 or (iv) it is efficient to allocate B to buyer 1 and A to 2. In case (iii), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) > v_{1B}(s_{1B}, s_{2B}, s_{3B}) + v_{2A}(s_{1A}, s_{2A}, s_{3A}), \tag{7.6}$$

and in case (iv),

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) < v_{1B}(s_{1B}, s_{2B}, s_{3B}) + v_{2A}(s_{1A}, s_{2A}, s_{3A}). \tag{7.7}$$

To simplify matters, let us assume that valuation functions are linear:

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) = s_{1A} + \alpha_{12} s_{2A} + \alpha_{13} s_{3A}, \tag{7.8}$$

$$v_{1B}(s_{1B}, s_{2B}, s_{3B}) = s_{1B} + \beta_{12} s_{2B} + \beta_{13} s_{3B}, \tag{7.9}$$

and similarly for buyers 2 and 3. Then (7.6) and (7.7) can be rewritten as

$$s_{1A} - s_{1B} > \alpha_{21} s_{1A} + \alpha_{22} s_{2A} + \alpha_{23} s_{3A} - \beta_{21} s_{1B} - \beta_{22} s_{2B} - \beta_{23} s_{3B} \tag{7.10}$$

and

$$s_{1A} - s_{1B} < \alpha_{21} s_{1A} + \alpha_{22} s_{2A} + \alpha_{23} s_{3A} - \beta_{21} s_{1B} - \beta_{22} s_{2B} - \beta_{23} s_{3B}. \tag{7.11}$$

Now (because we have fixed 2's and 3's signal values), buyer 1's objective function depends only on $s_{1A} - s_{1B}$. That is, for any value of $\Delta$, buyer 1 will behave the same way for signal values $(s_{1A}, s_{1B})$ as for $(s_{1A} + \Delta, s_{1B} + \Delta)$. Hence, in any auction, the equilibrium outcome must be the same for any value of $\Delta$. In particular, if the auction is efficient, whether (7.10) or (7.11) applies cannot depend on $\Delta$'s value. But, from the right-hand sides of (7.10) and (7.11), this can be the case only if $\alpha_{21} = \beta_{21}$, i.e., only if

$$\frac{\partial v_{2A}}{\partial s_{1A}} = \frac{\partial v_{2B}}{\partial s_{1B}}.$$

Repeating the argument for the other buyers, we have

$$\frac{\partial v_{jA}}{\partial s_{iA}} = \frac{\partial v_{jB}}{\partial s_{iB}} \quad \text{for all } i \text{ and } j \ne i. \tag{7.12}$$
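For linear valuations such as (7.8) and (7.9), conditions (7.5) and (7.12) are restrictions on the coefficient matrices alone, so they are easy to check mechanically. The sketch below assumes a hypothetical encoding in which `alpha[j][i]` is the coefficient of $s_{iA}$ in $v_{jA}$ and `beta[j][i]` is the coefficient of $s_{iB}$ in $v_{jB}$ (buyers indexed from 0); the function name, tolerance, and numbers are illustrative only.

```python
def satisfies_necessary_conditions(alpha, beta, tol=1e-12):
    """Check (7.5) and (7.12) for linear valuations.

    alpha[j][i] = d v_jA / d s_iA and beta[j][i] = d v_jB / d s_iB.
    """
    n = len(alpha)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for coeff in (alpha, beta):
            # (7.5): buyer i's signal for a good must shift every other
            # buyer's valuation of that good by the same amount.
            first = coeff[others[0]][i]
            if any(abs(coeff[j][i] - first) > tol for j in others):
                return False
        # (7.12): buyer i's A-signal must affect buyer j's valuation of A
        # exactly as his B-signal affects buyer j's valuation of B.
        if any(abs(alpha[j][i] - beta[j][i]) > tol for j in others):
            return False
    return True

# Hypothetical coefficients: column i holds the effects of buyer i's signal.
coeffs_ok = [[1.0, 0.3, 0.2],
             [0.5, 1.0, 0.2],
             [0.5, 0.3, 1.0]]
coeffs_bad = [[1.0, 0.3, 0.2],
              [0.4, 1.0, 0.2],   # buyer 1's signal now affects buyers 2 and 3 unequally
              [0.5, 0.3, 1.0]]
both_hold = satisfies_necessary_conditions(coeffs_ok, coeffs_ok)
violated = satisfies_necessary_conditions(coeffs_ok, coeffs_bad)
```

The first call passes because each off-diagonal column is constant and the two matrices coincide; the second fails because the coefficients for good B no longer satisfy (7.5).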

The necessary conditions (7.5) and (7.12), due to Jehiel and Moldovanu (2001), are certainly restrictive. Nevertheless, as shown in Eso and Maskin (2000a), there is a natural class of cases in which they are automatically satisfied. Specifically, suppose that in our two-good model, each buyer wants at most one good (this is not essential). Assume that the true value of good A to buyer i, $y_{iA}$, is the sum of a component $z_A$ common to all buyers and a component $z_{iA}$ that is idiosyncratic to him. That is, $y_{iA} = z_A + z_{iA}$. Similarly, assume that buyer i's true valuation of good B, $y_{iB}$, satisfies $y_{iB} = z_B + z_{iB}$. Suppose, however, that buyer i does not directly observe his true valuations, but only noisy signals of them. That is, he observes $s_{iA}$ and $s_{iB}$, where $s_{iA} = y_{iA} + \varepsilon_{iA}$ and $s_{iB} = y_{iB} + \varepsilon_{iB}$.

It can be shown (see Eso and Maskin, 2000a) that, if the random variables $z_H$, $z_{iH}$, $\varepsilon_{iH}$, i = 1, 2, 3, H = A, B, are independent normal random variables and if the variances of $\varepsilon_{iH}$ and $z_{iH}$ are proportional to that of $z_H$, i.e., for all i, there exist $k_{i\varepsilon}$ and $k_{iz}$ such that

$$\operatorname{var} \varepsilon_{iH} = k_{i\varepsilon} \operatorname{var} z_H \quad \text{and} \quad \operatorname{var} z_{iH} = k_{iz} \operatorname{var} z_H, \quad H = A, B,$$

then (7.5) and (7.12) are automatically satisfied and the modified Groves–Clarke mechanism discussed in Section 6 is an efficient auction.

8. FURTHER WORK

There is clearly a great deal of work remaining to be done on efficient auctions, including dealing with the multiple good/multidimensional problem in cases where (7.5) and (7.12) do not hold. I would like to simply underscore one issue: finding an open auction counterpart to the modified Groves–Clarke mechanism in the case of multiple goods. The task of submitting contingent bids is considerable even for a single good. For multiple goods, it could be formidable. For this reason, as I have already discussed, researchers have sought open auctions – variants of the English auction – as desirable alternatives. Perry and Reny (1999b) have exhibited a lovely modification of the Ausubel (1997) auction (which, in turn, elegantly extends the English auction to multiple identical goods). However, efficiency in that auction obtains only when all goods are identical and buyers' marginal valuations are declining. It would be an important step, in my judgment, to find a similar result without such restrictions on goods or preferences.

ACKNOWLEDGMENTS

I thank the National Science Foundation and the Beijer International Institute for research support and S. Baliga, S. Izmalkov, P. Jehiel, V. Krishna, and B. Moldovanu for helpful comments. Much of my research on efficient auctions – and much of the work reported here – was carried out with my long-time collaborator and friend P. Dasgupta. More recently, I have had the pleasure of working with P. Eso. Others whose research figures prominently in the recent literature – and to whom I owe a considerable intellectual debt – include L. Ausubel, P. Jehiel, V. Krishna, B. Moldovanu, M. Perry, A. Postlewaite, R. McLean, and P. Reny.

APPENDIX: BUYER 1'S PAYMENT WHEN ALLOCATED BOTH GOODS IN A TWO-GOOD, TWO-BUYER AUCTION

If (a) there exists $v_1^*$ such that

$$v_{1AB}^* = \hat b_{2AB}(v_1^*),$$

then buyer 1 pays $\hat b_{2AB}(v_1^*)$; if (a) does not hold and instead

(b) there exists $\hat v_1^*$ such that

$$\hat v_{1AB}^* = \hat v_{1A}^* + \hat b_{2B}(\hat v_1^*),$$

then if

(b1) there exists $v_1^{**}$ such that

$$v_{1A}^{**} + \hat b_{2B}(v_1^{**}) = \hat b_{2AB}(v_1^{**}),$$

buyer 1 pays

$$\hat b_{2B}(\hat v_1^*) + \bigl(\hat b_{2AB}(v_1^{**}) - \hat b_{2B}(v_1^{**})\bigr);$$

and if instead

(b2) there exist $\hat v_1^{**}$ and $\hat v_1^{***}$ such that

$$\hat v_{1A}^{**} + \hat b_{2B}(\hat v_1^{**}) = \hat v_{1B}^{**} + \hat b_{2A}(\hat v_1^{**})$$

and

$$v_{1B}^{***} + \hat b_{2A}(v_1^{***}) = \hat b_{2AB}(v_1^{***}),$$

then buyer 1 pays

$$\hat b_{2B}(\hat v_1^*) + \bigl(\hat b_{2A}(\hat v_1^{**}) - \hat b_{2B}(\hat v_1^{**})\bigr) + \bigl(\hat b_{2AB}(v_1^{***}) - \hat b_{2A}(v_1^{***})\bigr);$$

finally, if

(c) there exists $\hat{\hat v}_1^*$ such that

$$\hat{\hat v}_{1AB}^* = \hat{\hat v}_{1B}^* + \hat b_{2A}(\hat{\hat v}_1^*),$$

then if

(c1) there exists $v_1^{**}$ such that

$$v_{1B}^{**} + \hat b_{2A}(v_1^{**}) = \hat b_{2AB}(v_1^{**}),$$

buyer 1 pays

$$\hat b_{2A}(\hat{\hat v}_1^{**}) + \bigl(\hat b_{2AB}(v_1^{**}) - \hat b_{2A}(v_1^{**})\bigr);$$

and if instead

(c2) there exist $\hat{\hat v}_1^{**}$ and $\hat v_1^{***}$ such that

$$\hat{\hat v}_{1B}^{**} + \hat b_{2A}(\hat{\hat v}_1^{**}) = \hat{\hat v}_{1A}^{**} + \hat b_{2B}(\hat{\hat v}_1^{**})$$

and

$$v_{1A}^{***} + \hat b_{2B}(v_1^{***}) = \hat b_{2AB}(v_1^{***}),$$

then buyer 1 pays

$$\hat b_{2A}(\hat{\hat v}_1^*) + \bigl(\hat b_{2B}(\hat{\hat v}_1^{**}) - \hat b_{2A}(\hat{\hat v}_1^{**})\bigr) + \bigl(\hat b_{2AB}(v_1^{***}) - \hat b_{2B}(v_1^{***})\bigr).$$

References

Ausubel, L. (1997), An Efficient Ascending-Bid Auction for Multiple Objects, mimeo.
Ausubel, L. and P. Cramton (1999), The Optimality of Being Efficient, mimeo.
Bergemann, D. and J. Välimäki (2001), Information Acquisition and Efficient Mechanism Design, mimeo.
Che, Y. K. and I. Gale (1996), Expected Revenue of the All-Pay Auctions and First-Price Sealed-Bid Auctions with Budget Constraints, Economics Letters, 50, 373–380.
Chung, K.-C. and J. Ely (2001), Efficient and Dominant Solvable Auction with Interdependent Valuations, mimeo.
Clarke, E. (1971), Multipart Pricing of Public Goods, Public Choice, 11, 17–33.
Crémer, J. and R. McLean (1985), Optimal Selling Strategies Under Uncertainty for a Discriminating Monopolist When Demands Are Interdependent, Econometrica, 53, 345–362.
Dasgupta, P. and E. Maskin (2000), Efficient Auctions, Quarterly Journal of Economics, 115, 341–388.
Debreu, G. (1959), Theory of Value, New Haven, CT: Yale University Press.
Dewatripont, M. (1989), Renegotiation and Information Revelation Over Time: The Case of Optimal Labor Contracts, Quarterly Journal of Economics, 104, 589–619.
Eso, P. and E. Maskin (2000a), Multi-Good Efficient Auctions with Multidimensional Information, mimeo.
Eso, P. and E. Maskin (2000b), Notes on the English Auction, mimeo.
Fieseler, K., T. Kittsteiner, and B. Moldovanu (2000), Partnerships, Lemons, and Efficient Trade, mimeo.
Gresik, T. (1991), Ex Ante Incentive Efficient Trading Mechanisms without the Private Valuation Restriction, Journal of Economic Theory, 55, 41–63.
Groves, T. (1973), Incentives in Teams, Econometrica, 41, 617–631.
Holmstrom, B. and R. Myerson (1983), Efficient and Durable Decision Rules with Incomplete Information, Econometrica, 51, 1799–1819.
Izmalkov, S. (2001), English Auctions with Reentry, mimeo.
Jehiel, P. and B. Moldovanu (2001), Efficient Design with Interdependent Values, Econometrica, 69, 1237–1260.
Krishna, V. (2000), Asymmetric English Auctions, mimeo.
McLean, R. and A. Postlewaite (2001), Efficient Auction Mechanisms with Interdependent Signals, mimeo.
Maskin, E. (1992), Auctions and Privatization, in Privatization (ed. by H. Siebert), Tübingen: J. C. B. Mohr, 115–136.
Maskin, E. (2000), Auctions, Development and Privatization: Efficient Auctions with Liquidity-Constrained Buyers, European Economic Review, 44(4–6), 667–681.
Maskin, E. and J. Riley (1984), Optimal Auctions with Risk-Averse Buyers, Econometrica, 52, 1473–1518.
Milgrom, P. and R. Weber (1982), A Theory of Auctions and Competitive Bidding, Econometrica, 50, 1081–1122.
Palfrey, T. (1993), Implementation in Bayesian Equilibrium, in Advances in Economic Theory (ed. by J. J. Laffont), Cambridge, U.K.: Cambridge University Press.
Perry, M. and P. Reny (1999a), An Ex Post Efficient Auction, mimeo.
Perry, M. and P. Reny (1999b), An Ex Post Efficient Ascending Auction, mimeo.
Vickrey, W. (1961), Counterspeculation, Auctions, and Competitive Sealed Tenders, Journal of Finance, 16, 8–37.

CHAPTER 2

Why Every Economist Should Learn Some Auction Theory Paul Klemperer

Figure 2.1. Disclaimer: We don’t contend that the following ideas are all as important as the one illustrated, merely that those who haven’t imbibed auction theory are missing out on a potent brew! This chapter discusses the strong connections between auction theory and “standard” economic theory; we show that situations that do not at ﬁrst sight look like auctions can be recast to use auction-theoretic techniques; and we argue that auction-theoretic tools and intuitions can provide useful arguments and insights in a broad range of mainstream economic settings. We also discuss some more obvious applications, especially to industrial organization.


Klemperer

1. INTRODUCTION

Auction theory has attracted enormous attention in the last few years.¹ It has been increasingly applied in practice, and this has generated a new burst of theory. It has also been extensively used, both experimentally and empirically, as a testing ground for game theory.² Furthermore, by carefully analyzing very simple trading models, auction theory is developing the fundamental building blocks for our understanding of more complex environments. But some people still see auction theory as a rather specialized field, distinct from the main body of economic theory, and as an endeavor for management scientists and operations researchers rather than as a part of mainstream economics. This paper aims to counter that view.

This view may have arisen in part because auction theory was substantially developed by operational researchers, or in operations research journals,³ and using technical mathematical arguments rather than standard economic intuitions. But it need not have been this way. This paper argues that the connections between auction theory and “standard” economic theory run deeper than many people realize; that auction-theoretic tools provide useful arguments in a broad range of contexts; and that a good understanding of auction theory is valuable in developing intuitions and insights that can inform the analysis of many mainstream economic questions. In short, auction theory is central to economics.

We pursue this agenda in the context of some of the main themes of auction theory: the revenue equivalence theorem, marginal revenues, and ascending vs. (first-price) sealed-bid auctions.

To show how auction-theoretic tools can be applied elsewhere in economics, Section 2 exploits the revenue equivalence theorem to analyze a wide range of applications that are not, at first sight, auctions, including litigation systems, financial crashes, queues, and wars of attrition.
To illustrate how looser analogies can usefully be made between auction theory and economics, Section 3 applies some intuitions from the comparison of ascending and sealed-bid auctions to other economic settings, such as rationing and e-commerce.

To demonstrate the deeper connections between auction theory and economics, Section 4 discusses and applies the close parallel between the optimal auction problem and that of the discriminating monopolist; both are about maximizing marginal revenues.

Furthermore, auction-theoretic ways of thinking are also underutilized in more obvious areas of application, for instance, price-setting oligopolies we discuss in Section 5.⁴ Few non-auction theorists know, for example, that marginal-cost pricing is not always the only equilibrium when identical firms with constant marginal costs set prices, or know the interesting implications of this fact. Section 6 briefly discusses direct applications of auction theory to markets that are literally auction markets, including electricity markets, treasury auctions, spectrum auctions, and internet markets, and we conclude in Section 7.

¹ See Klemperer (1999a) for a review of auction theory; many of the most important contributions are collected in Klemperer (2000). See Figure 2.1.
² Kagel (1995) and Laffont (1997) are excellent recent surveys of the experimental and empirical work, respectively. Section 6 of this paper and Klemperer (2002a) discuss practical applications.
³ The earliest studies appear in the operations research literature, for example, Friedman (1956). Myerson’s (1981) breakthrough article appeared in Mathematics of Operations Research, and Rothkopf’s (1969) and Wilson’s (1967, 1969) classic early papers appeared in Management Science. Ortega’s (1968) pathbreaking models of auctions, including a model of signaling that significantly predated Spence (1972), remain relatively little known by economists, perhaps because they formed an operations research Ph.D. thesis.

2. USING AUCTION-THEORETIC TOOLS IN ECONOMICS: THE REVENUE EQUIVALENCE THEOREM

Auction theory’s most celebrated theorem, the Revenue Equivalence Theorem (RET), states conditions under which different auction forms yield the same expected revenue, and also allows revenue rankings of auctions to be developed when these conditions are violated.⁵ Our purpose here, however, is to apply it in contexts where the use of an auction model might not seem obvious.

Revenue Equivalence Theorem. Assume each of a given number of risk-neutral potential buyers has a privately known valuation independently drawn from a strictly increasing atomless distribution, and that no buyer wants more than one of the k identical indivisible prizes. Then, any mechanism in which (i) the prizes always go to the k buyers with the highest valuations and (ii) any bidder with the lowest feasible valuation expects zero surplus, yields the same expected revenue (and results in each bidder making the same expected payment as a function of her valuation).⁶

More general statements are possible, but are not needed for the current purpose. Our first example is very close to a pure auction.
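The Revenue Equivalence Theorem stated above can be checked numerically in a simple case. The sketch below is illustrative only and is not taken from the chapter: it assumes a single prize, valuations drawn independently from the uniform distribution on [0, 1], and the textbook symmetric equilibrium bid functions for that case. The function name `simulate_revenues` and all parameter names are hypothetical.

```python
import random


def simulate_revenues(n_bidders=3, n_auctions=100_000, seed=0):
    """Average seller revenue in three auction formats.

    Assumption (not from the chapter): valuations are i.i.d. U[0, 1] and
    bidders use the standard symmetric equilibrium bids for that case.
    """
    rng = random.Random(seed)
    n = n_bidders
    totals = {"second_price": 0.0, "first_price": 0.0, "all_pay": 0.0}
    for _ in range(n_auctions):
        values = sorted(rng.random() for _ in range(n))
        # Ascending / second-price: the winner pays the second-highest valuation.
        totals["second_price"] += values[-2]
        # First-price sealed bid: assumed bid (n-1)/n * v; only the winner pays.
        totals["first_price"] += (n - 1) / n * values[-1]
        # All-pay: assumed bid (n-1)/n * v**n; every bidder pays her bid.
        totals["all_pay"] += sum((n - 1) / n * v ** n for v in values)
    return {fmt: total / n_auctions for fmt, total in totals.items()}


revenues = simulate_revenues()
theory = (3 - 1) / (3 + 1)  # (n-1)/(n+1) with n = 3 bidders
for fmt, revenue in revenues.items():
    print(f"{fmt:>12}: {revenue:.3f} (theory {theory:.3f})")
```

In all three formats the highest-valuation bidder wins and a bidder with valuation zero pays nothing, so conditions (i) and (ii) hold and the three averages should agree up to sampling noise.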

2.1. Comparing Litigation Systems

In 1991, U.S. Vice President Dan Quayle suggested reforming the U.S. legal system in the hope, in particular, of reducing legal expenditures. One of his proposals was to augment the current rule according to which parties pay their own legal expenses, by a rule requiring the losing party to pay the winner an amount equal to the loser’s own expenses. Quayle’s intuition was that if spending an extra $1 on a lawsuit might end up costing you $2, then less would be spent. Was he correct?⁷

⁴ Of course, standard auction models form the basic building blocks of models in many contexts. See, for example, Stevens’ (1994, 2000) models of wage determination in oligopsonistic labor markets, Bernheim and Whinston (1986), Feddersen and Pesendorfer (1996, 1998), Persico (2000) and many others’ political economy models, and many models in finance (including, of course, takeover battles, to which we give an application in Section 4). Another major area we do not develop here is the application of auction theorists’ understanding of the winner’s curse to adverse selection more generally.
⁵ For example, Klemperer’s (1999a) survey develops a series of revenue rankings starting from the RET.
⁶ See Klemperer (1999a, Appendix A) for more general statements and an elementary proof. The theorem was first derived in an elementary form by Vickrey (1961, 1962) and subsequently extended to greater generality by Myerson (1981), Riley and Samuelson (1981), and others.

A simple starting point is to assume each party has a privately known value of winning the lawsuit relative to losing, independently drawn from a common, strictly increasing, atomless distribution;⁸ that the parties independently and simultaneously choose how much money to spend on legal expenses; and that the party who spends the most money wins the “prize” (the lawsuit).⁹ It is not too hard to see that both the existing U.S. system and the Quayle system satisfy the assumptions of the RET, so the two systems result in the same expected total payments on lawyers.¹⁰ Thus Quayle was wrong (as usual); his argument is precisely offset by the fact that the value of winning the lawsuit is greater when you win your opponent’s expenses.¹¹

Ah, Quayle might say, but this calculation has taken as given the set of lawsuits that are contested. Introducing the Quayle scheme will change the “bidding functions,” that is, change the amount any given party spends on litigation, and also change who decides to bring suits. Wrong again, Dan! Although it is correct that the bidding functions change, the RET also tells us (in its parenthetical remark) that any given party’s expected payoffs from the lawsuit are unchanged, so the incentives to bring lawsuits are unchanged.

What about other systems, such as the typical European system in which the loser pays a fraction of the winner’s expenses? This is a trick question: It is no longer true that a party with the lowest possible valuation can spend nothing and lose nothing. In this case, this party always loses in equilibrium and must pay a fraction of the winner’s expenses, and so makes negative expected surplus. Thus, condition (ii) of the RET now fails. Thinking through the logic of the proof of the RET makes clear that all the players are worse off than under the previous systems.¹² Thus, legal bills are higher under the European rule. The reason is that the incentives to win are greater than in the U.S. system, and there is no offsetting effect. Here, of course, the issue of who brings lawsuits is important because low-valuation parties would do better not to contest suits in this kind of system; consistent with our theory, there is empirical evidence (e.g., Hughes and Snyder, 1995) that the American system leads to more trials than, for example, the British system.

⁷ This question was raised and analyzed (although not by invoking the RET) by Baye, Kovenock, and de Vries (1997). The ideas in this section, except for the method of analysis, are drawn from them. See also Baye, Kovenock, and de Vries (1998).
⁸ For example, a suit about which party has the right to a patent might fit this model. The results extend easily to common-value settings, e.g., contexts in which the issue is the amount of damages that should be transferred from one party to another.
⁹ American seminar audiences typically think this is a natural assumption, but non-Americans often regard it as unduly jaundiced. Of course, we use it as a benchmark only, to develop insight and intuition (just as the lowest price does not win the whole market in most real “Bertrand” markets, but making the extreme assumption is a common and useful starting point). Extensions are possible to cases in which with probability (1 − λ) the “most deserving” party wins, but with probability λ > 0, the biggest spender wins.
¹⁰ The fact that no single “auctioneer” collects the players’ payments as revenues, but that they are instead dissipated in legal expenses in competing for the single available prize (victory in the lawsuit), is of course irrelevant to the result. Formally, checking our claims requires confirming that there are equilibria of the games that satisfy the RET’s assumptions. The assumption we made that the parties make a one-shot choice of legal expenses is not necessary, but makes confirming this relatively easy. See Baye, Kovenock, and de Vries (1997) for explicit solutions.
¹¹ Some readers might argue they could have inferred the effectiveness of the proposal from the name of the proponent, without need of further analysis. In fact, however, this was one of Dan Quayle’s policy interventions that was not subject to immediate popular derision.

This last extension demonstrates that even where the RET in its simplest form fails, it is often possible to see how the result is modified; Appendix 1 shows how to use the RET to solve for the relative merits of a much broader class of systems in which those we have discussed are special cases. We also show there that a system that might be thought of as the exact opposite of Quayle’s system is optimal in this model. Of course, many factors are ignored (e.g., asymmetries); the basic model should be regarded as no more than a starting point for analysis.
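The comparison of the U.S. and Quayle rules can be illustrated with a small numerical sketch. Everything specific in it is an assumption made for illustration, not something given in the chapter: two parties, values of winning drawn independently from the uniform distribution on [0, 1], and candidate symmetric spending functions obtained by requiring each type's expected net payment to equal v²/2, the RET benchmark for this case. The names `spend_us`, `spend_quayle`, and the other helpers are hypothetical.

```python
def spend_us(v):
    """Assumed equilibrium legal spending under the U.S. rule (each side pays its own costs)."""
    return v * v / 2


def spend_quayle(v):
    """Assumed equilibrium spending when the loser also pays the winner the loser's own costs."""
    return v * v * (3 - v) / (3 * (2 - v) ** 2)


def integrate(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))


def expected_total_spending(spend):
    """Expected legal bills of both parties combined, values i.i.d. U[0, 1]."""
    return 2 * integrate(spend, 0.0, 1.0)


def quayle_net_payment(v):
    """Expected net payment of a type-v party under the Quayle rule.

    She wins against lower types: pays her own costs and collects theirs.
    She loses with probability 1 - v: pays her own costs twice over.
    """
    own = spend_quayle(v)
    collected = integrate(spend_quayle, 0.0, v)
    return v * own - collected + (1 - v) * 2 * own


total_us = expected_total_spending(spend_us)
total_quayle = expected_total_spending(spend_quayle)
print(f"U.S. rule:   expected total spending {total_us:.4f}")
print(f"Quayle rule: expected total spending {total_quayle:.4f}")
print(f"Quayle net payment of type 0.6: {quayle_net_payment(0.6):.4f} (RET benchmark {0.6 ** 2 / 2:.4f})")
```

Under these assumptions the spending functions differ (high-value parties spend more under the Quayle rule, low-value parties less), yet expected total spending is the same under both rules, in line with the argument in the text.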

2.2. The War of Attrition

Consider a war of attrition in which N players compete for a prize. For example, N firms compete to be the unique survivor in a natural monopoly market, or N firms each hold out for the industry to adopt the standard they prefer.¹³ Each player pays costs of 1 per unit time until she quits the game. When just one player remains, that player also stops paying costs and wins the prize. There is no discounting.

The two-player case, where just one quit is needed to end the game, has been well analyzed.¹⁴ Does the many-player case yield anything of additional interest?

Assume players’ values of winning are independently drawn from a common, strictly increasing, atomless distribution, and the game has an equilibrium satisfying the other conditions of the RET. Then the RET tells us that, in expectation, the total resources spent by the players in the war of attrition equal those paid by the players in any other mechanism satisfying the RET’s conditions – e.g., a standard ascending auction in which the price rises continuously until just one player remains and (only) the winner pays the final price. This final price will equal the second-highest actual valuation, so the expected total resources dissipated in the war of attrition are the expectation of this quantity.

¹² As Appendix 1 discusses, every type’s surplus is determined by reference to the lowest valuation type’s surplus [see, also, Klemperer (1999a, Appendix A)], and the lowest type is worse off in the European system. Again, our argument depends on condition (i) of the RET applying. See Appendix 1 and Baye et al. (1997).
¹³ Another related example analyzed by Bulow and Klemperer (1999) is that of N politicians, each delaying in the hope of being able to avoid publicly supporting a necessary but unpopular policy that requires the support of N − 1 to be adopted.
¹⁴ See, for example, Maynard Smith (1974) and Riley (1980) who discuss biological competition, Fudenberg and Tirole (1986) who discuss industrial competition, Abreu and Gul (2000), Kambe (1999), and others who analyze bargaining, and Bliss and Nalebuff (1984) who give a variety of amusing examples. Bliss and Nalebuff note that extending to K + 1 players competing for K prizes does not change the analysis in any important way, because it remains true that just one quit is needed to end the game.

Now imagine the war of attrition has been under way long enough that just the two highest-valuation players remain. What are the expected resources that will be dissipated by the remaining two players, starting from this time on? The RET tells us that they equal the auctioneer’s expected revenue if the war of attrition was halted at this point and the objects sold to the remaining players by an ascending auction, that is, the expected second-highest valuation of these two remaining players. This is the same quantity, on average, as before!¹⁵ Thus the expected resources dissipated, and hence the total time taken until just two players remain, must be zero; all but the two highest-valuation players must have quit at once.

Of course, this conclusion is, strictly speaking, impossible; the lowest-valuation players cannot identify who they are in zero time. However, the conclusion is correct in spirit, in that it is the limit point of the unique symmetric equilibria of a sequence of games that approaches this game arbitrarily closely (and there is no symmetric equilibrium of the limit game).¹⁶ Here, therefore, the role of the RET is less to perform the ultimate analysis than it is to show that there is an interesting and simple result to be obtained.¹⁷ Of course by developing intuition about what the result must be, the RET also makes proving it much easier. Furthermore, the RET was also useful in the actual analysis of the more complex games that Bulow and Klemperer (1999) used to approximate this game. In addition, anyone armed with a knowledge of the RET can simplify the analysis of the basic two-player war of attrition.

¹⁵ Of course, the expectation of the second-highest valuation of the last two players is computed when just these two players remain, rather than at the beginning of the war of attrition as before. But, on average, these two expectations must be the same.
¹⁶ Bulow and Klemperer (1999) analyze games in which each player pays costs at rate 1 before quitting, but must continue to pay costs even after quitting at rate c per unit time until the whole game ends. The limit c → 0 corresponds to the war of attrition discussed here. (The case c = 1 corresponds, for example, to “standards battles” or political negotiations in which all players bear costs equally until all have agreed on the same standard or outcome; this game also has interesting properties; see Bulow and Klemperer.) Other series of games, for example games in which being kth to last to quit earns a prize of ε^(k−1) times one’s valuation, with ε → 0, or games in which players can quit only at the discrete times 0, ε, 2ε, . . . , with ε → 0, also yield the same outcome in the limit.
¹⁷ It was the RET that showed Bulow and Klemperer that there was an analysis worth doing. Many people, and some literature, had assumed the many-player case would look like the two-player case, but with more complicated expressions, although Fudenberg and Kreps (1987) and Haigh and Cannings (1989) observed a similar result to ours in games without any private information and in which all players’ values are equal. However, an alternative way to see the result in our war of attrition is to imagine the converse, but that a player is within ε of her planned quit time when n > 1 other players remain. Then, the player’s cost of waiting as planned is of the order ε, but her benefit is of the order ε^n, because only when all n other players are within ε of giving up will she ultimately win. So, for small ε, she will prefer to quit now rather than wait; but, in this case, she should, of course, have quit ε earlier, and so on. So, only when n = 1 is delay possible.
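As an illustration of the closing remark about the two-player war of attrition, the derivation below sketches how the RET delivers the symmetric equilibrium quit time in a few lines. The notation is assumed here rather than taken from the chapter: values are i.i.d. with distribution function F and density f on [0, v̄], T(v) is the (increasing) time at which a player with value v plans to quit, and e(v) is her expected payment.

```latex
% Assumptions (for illustration): two players, i.i.d. values with c.d.f. F and
% density f, cost 1 per unit time, symmetric increasing quit-time function T.
\begin{align*}
e(v) &= \int_0^v x f(x)\,dx
  && \text{(RET: expected payment as in an ascending auction)} \\
e(v) &= \int_0^v T(x) f(x)\,dx + \bigl(1 - F(v)\bigr)\,T(v)
  && \text{(war of attrition: pay until the first quit)} \\
v f(v) &= \bigl(1 - F(v)\bigr)\,T'(v)
  && \text{(differentiate both expressions and equate)} \\
T(v) &= \int_0^v \frac{x f(x)}{1 - F(x)}\,dx .
\end{align*}
% Uniform values on [0,1]: T(v) = -v - \ln(1 - v).
```

With uniform values on [0, 1], both players pay until the lower-value player quits, so expected total resources are 2 E[T(min(v₁, v₂))] = 1/3, which equals the expected second-highest valuation E[min(v₁, v₂)] = 1/3, as the RET requires.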

2.3. Queueing and Other “All-Pay” Applications

The preceding applications have both been variants of “all-pay” auctions. As another elementary example of this kind, consider different queueing systems (e.g., for tickets to a sporting event). Under not unreasonable assumptions, a variety of different rules of queue management (e.g., making the queue more or less comfortable, informing or not informing people whether the number queueing exceeds the number who will receive a ticket, etc.) will make no difference to the social cost of the queueing mechanism. As in our litigation example (Section 2.1), we think of these results as a starting point for analysis rather than as final conclusions.¹⁸

Many other issues – such as lobbying battles, political campaigns,¹⁹ tournaments in firms, contributions to public goods,²⁰ patent races, and some kinds of price-setting oligopoly (see Section 5.2) – can be modeled as all-pay auctions and may provide similar applications.

2.4. Solving for Equilibrium Behavior: Market Crashes and Trading “Frenzies”

The examples thus far have all proceeded by computing the expected total payments made by all players. But, the RET also states that each individual’s expected payment must be equal across mechanisms satisfying the assumptions. This fact can be used to infer what players’ equilibrium actions must be in games that would be too complex to solve by any direct method of computing optimal behavior.²¹

Consider the following model. The aim is to represent, for example, a financial or housing market and show that trading “frenzies” and price “crashes” are the inevitable outcome of rational strategic behavior in a market that clears through a sequence of sales rather than through a Walrasian auctioneer.

¹⁸ Holt and Sherman (1982) compute equilibrium behavior and hence obtain these results without using the RET.
¹⁹ See, especially, Persico (2000).
²⁰ Menezes, Monteiro, and Temimi (2000) use the RET in this context.
²¹ The same approach is also an economical method of computing equilibrium bids in many standard auctions. For example, in an ascending auction for a single unit, the expected payment of a bidder equals her probability of winning times the expected second-highest valuation among all the bidders conditional on her value being higher. So, the RET implies that her equilibrium bid in a standard all-pay auction equals this quantity. Similarly, the RET implies that her equilibrium bid in a first-price, sealed-bid auction equals the expected second-highest valuation among all the bidders, conditional on her value being higher. See Klemperer (1999a, Appendix A) for more details and discussion.

There are N potential buyers, each of whom is interested in securing one of K available units. Without fully modeling the selling side of the market, we assume it generates a single asking price at each instant of time according to some given function of buyer behavior to date. Each potential buyer observes all prices and all past offers to trade, and can accept the current asking price at any instant, in which case, supply permitting, the buyer trades at that price. Thus traders have to decide both whether and when to offer to buy, all the while conditioning their strategies on the information that has been revealed in the market to date.

Regarding the function generating the asking prices, we specify only that (i) if there is no demand at a price, then the next asking price is lower, and (ii) if demand exceeds remaining supply at any instant, then no trade actually takes place at that time but the next asking price is higher and only those who attempted to trade are allowed to buy subsequently.²² Note, however, that even if we did restrict attention to a specific price-setting process, the direct approach of computing buyers’ optimal behavior using first-order conditions as a function of all prior behavior to solve a dynamic program would generally be completely intractable.

To use the RET, we must first ensure that the appropriate assumptions are satisfied. We assume, of course, that buyers’ valuations are independently drawn from a common, strictly increasing, atomless distribution, and that there is no discounting during the time the mechanism takes. Furthermore, the objects do eventually go to the highest-valuation buyers, and the lowest-possible-valuation buyer makes zero surplus in equilibrium, because of our assumption that if demand ever exceeds remaining supply, then no trade takes place and nondemanders are henceforth excluded. So, the RET applies, and it also applies to any subgame of the whole game.²³

Under our assumptions, then, starting from any point of the process, the remainder of the game is revenue equivalent to what would result if the game were halted at that point and the remaining k objects were sold to the remaining buyers using a standard ascending auction [which sells all k objects at the (k + 1)st-highest valuation among the remaining buyers]. At any point of our game, therefore, we know the expected payment of any buyer in the remainder of our game, and therefore also the buyer’s expected payment conditional on winning.²⁴ But any potential buyer whose expected payment conditional on winning equals or exceeds the current asking price will attempt to buy at the current price.²⁵ This allows us to completely characterize buyer behavior, so fully characterizes the price path for any given rule generating the asking prices.

²² Additional technical assumptions are required to ensure that all units are sold in finite time. See Bulow and Klemperer (1994) for full details.
²³ If, instead, excess demand resulted in random rationing, the highest-valuation buyers might not win, violating the requirements of the RET; so, even if we thought this was more natural, it would make sense to begin with our assumption to be able to analyze and understand the process using the RET. The effects of the alternative assumption could then be analyzed with the benefit of the intuitions developed using the RET. Bulow and Klemperer (1994) proceed in exactly this way.

It is now straightforward to show (see Bulow and Klemperer, 1994) that potential buyers are extremely sensitive to the new information that the price process reveals. It follows that almost any seller behavior – e.g., starting at a very high price and slowly lowering the price continuously until all the units are sold or there is excess demand – will result in “frenzies” of trading activity in which many buyers bid simultaneously, even though there is zero probability that two buyers have the same valuation.²⁶ Furthermore, these frenzies will sometimes lead to “crashes” in which it becomes common knowledge that the market price must fall a substantial distance before any further trade will take place.²⁷ Bulow and Klemperer also show that natural extensions to the model (e.g., “common values,” the possibility of resale, or an elastic supply of units) tend to accentuate frenzies and crashes.

Frenzies and crashes arise precisely because buyers are rational and strategic; by contrast, buyer irrationality might lead to “smoother” market behavior. Of course, our main point here is not the details of the process, but rather that the RET permits the solution and analysis of the dynamic price path of a market that would otherwise seem completely intractable to solve for.

²⁴ Specifically, if k objects remain, the buyer’s expected payment conditional on winning will be the expected (k + 1)st-highest valuation remaining conditional on the buyer having a valuation among the k-highest remaining, and conditional on all the information revealed to date. This is exactly the buyer’s expected payment conditional on winning an object in the ascending auction, because in both cases only winners pay and the probability of a bidder winning is the same.
²⁵ The marginal potential buyer, who is just indifferent about bidding now, either will win now or will never win an object. (If bidding now results in excess demand, this bidder will lose to inframarginal current bidders, because there is probability zero that two bidders have the same valuation.) So, conditional on winning, this bidder’s actual payment is the current price. Inframarginal bidders, whose expected payment conditional on winning exceeds the current price, may eventually end up winning an object at above the current price.
²⁶ To see why a frenzy must arise if the price is lowered continuously, note that, for it to be rational for any potential buyer to jump in and bid first, there must be positive probability that there will be a frenzy large enough to create excess demand immediately after the first bid. Otherwise, the strategy of waiting to bid until another player has bid first would guarantee a lower price. For more general seller behavior, the point is that while buyers’ valuations may be very dispersed, higher-valuation buyers are all almost certainly inframarginal in terms of whether to buy and are therefore all solving virtually identical optimization problems of when to buy. So, a small change in asking price, or a small change in market conditions (such as the information revealed by a single trade) at a given price, can make a large number of buyers change from being unwilling to trade to wanting to trade. The only selling process that can surely avoid a frenzy is a repeated Dutch auction.
²⁷ The price process is also extremely sensitive to buyer valuations; an arbitrarily small change in one buyer’s value can discontinuously and substantially change all subsequent trading prices.
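The buying rule described in this section (attempt to buy once the expected payment conditional on winning reaches the asking price) can be sketched numerically. The code below is a simplified, static illustration and not the Bulow and Klemperer (1994) model: it assumes rivals' valuations are i.i.d. uniform on [0, 1] and ignores the information revealed by the price path to date. The function names and parameters are hypothetical.

```python
import random


def expected_payment_if_winning(value, n_buyers, k_units, draws=200_000, seed=0):
    """Monte Carlo estimate of E[(k+1)st-highest valuation | this buyer is among the k highest].

    Assumption (for illustration): rivals' valuations are i.i.d. U[0, 1] and
    there are at least k_units rivals.
    """
    rng = random.Random(seed)
    total, wins = 0.0, 0
    for _ in range(draws):
        rivals = sorted((rng.random() for _ in range(n_buyers - 1)), reverse=True)
        kth_rival = rivals[k_units - 1]  # price in a k-unit ascending auction if we win
        if kth_rival < value:            # we hold one of the k highest valuations
            total += kth_rival
            wins += 1
    return total / wins if wins else float("nan")


def wants_to_buy(value, asking_price, n_buyers, k_units):
    """Cutoff rule: buy once the expected payment conditional on winning reaches the asking price."""
    return expected_payment_if_winning(value, n_buyers, k_units) >= asking_price


estimate = expected_payment_if_winning(0.8, n_buyers=3, k_units=1)
print(f"Expected payment conditional on winning: {estimate:.3f}")
print("Buys at asking price 0.50:", wants_to_buy(0.8, 0.50, 3, 1))
print("Buys at asking price 0.60:", wants_to_buy(0.8, 0.60, 3, 1))
```

With one unit and three buyers, the conditional expected payment of a buyer with valuation 0.8 is about (2/3) × 0.8, so under these assumptions she would accept an asking price of 0.50 but wait at 0.60.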


3. TRANSLATING LOOSER ANALOGIES FROM AUCTIONS INTO ECONOMICS: ASCENDING VS. (FIRST-PRICE) SEALED-BID AUCTIONS

A major focus of auction theory has been contrasting the revenue and efficiency properties of “ascending” and “sealed-bid” auctions.²⁸ Ideas and intuitions developed in these comparisons have wide applicability.

3.1. Internet Sales vs. Dealer Sales

There is massive interest in the implications of e-commerce and internet sales. For example, the advent of internet sales in the automobile industry as a partial replacement for traditional methods of selling through dealers has been widely welcomed in Europe;29 the organization of the European automobile market is currently a major policy concern in both ofﬁcial circles and the popular press, and the internet sales are seen as increasing “transparency.” But is transparency a good thing? Auction theory shows that internet sales need not be good for consumers. Clearly, transparent prices beneﬁt consumers if they reduce consumers’ search costs so that, in effect, there are more competitors for every consumer,30 and internet sales may also lower prices by cutting out the ﬁxed costs of dealerships, albeit by also cutting out the additional services that dealers provide. But, transparency also makes internet sales more like ascending auctions, by contrast with dealer sales that are more like (ﬁrst-price) sealed-bid auctions, and we will show this is probably bad for consumers. Transparent internet prices are readily observable by a ﬁrm’s competitors and therefore result, in effect, in an “ascending” auction; a ﬁrm knows if and when its offers are being beaten and can rapidly respond to its competitors’ offers if it wishes. Viewing each car sale as a separate auction, the price any consumer faces falls until all but one ﬁrm quits bidding to sell to him. (The price is, of course, descending because ﬁrms are competing to sell, but the process corresponds exactly to the standard ascending auction among bidders competing to buy an object, and we therefore maintain the standard “ascending” terminology.) On the other hand, shopping to buy a car from one of the competing dealers is very much like procuring in a (ﬁrst-price) “sealed-bid” auction. It is typically impossible to credibly communicate one dealer’s offer to another. (Car dealers 28

29

30

By “sealed-bid,” we mean standard, ﬁrst-price, sealed-bid auctions. “Ascending” auctions have similar properties to second-price, sealed-bid auctions. See Klemperer (1999a) for an introduction to the different types of auctions. See, for example, “May the Net Be with You,” Financial Times, October 21, 1999, p. 22. In the UK, Vauxhaul began selling a limited number of special models over the Internet late in 1999, while Ford began a pilot project in Finland. There may be both a direct effect (that consumers can observe more ﬁrms) and an indirect effect (that new entry is facilitated). See Baye and Morgan (2001) and K¨uhn and Vives (1994) for more discussion.

Auction Theory

35

often deliberately make this hard by refusing to put an offer in writing.) From the buyer’s perspective, it is as if sellers were independently making sealed-bid offers in ignorance of the competition. Of course, the analogies are imperfect,31 but they serve as a starting point for analysis. What, therefore, does auction theory suggest? Because, under the conditions of the revenue equivalence theorem, there is no difference between the auction forms for either consumer or producer welfare, we consider the implications of the most important violations of the conditions. First, market demand is downward sloping, not inelastic.32 Hansen (1988) showed that this means consumers always prefer the sealed-bid setting, and ﬁrms may prefer it also; the sum of producer and consumer surpluses is always higher in a sealed-bid auction.33 The intuition is that, in an “ascending” auction, the sales price equals the runner-up’s cost, and is therefore less reﬂective of the winner’s cost than is the sealed-bid price. So, the sealed-bid auction is more productively efﬁcient (the quantity traded better reﬂects the winner’s cost) and provides greater incentive for aggressive bidding (a more aggressive sealed bid not only increases the probability of winning, but also increases the quantity traded contingent on winning). Second, we need to consider the possibilities for collusion, implicit or explicit. The general conclusion is that ascending auctions are more susceptible to collusion, and this is particularly the case when, as in our example, many auctions of different car models and different consumers are taking place simultaneously.34 As has been observed in the United States and German auctions of radiospectrum, for example, bidders may be able to tacitly coordinate on dividing up the spoils in a simultaneous ascending auction. 
Bidders can use the early rounds when prices are still low35 to signal their views about who should win which objects, and then, when consensus has been reached, tacitly agree

31 The analogies are less good for many other products. For lower-value products than cars, internet sales are less like an “ascending” auction because search costs will allow price dispersion, while traditional sales through posted prices in high-street stores are more like “ascending” auctions than are dealer sales of cars. Note also that the outcomes of the two auction types differ most when competitors have private information about their costs, which is more likely when competitors are original manufacturers than when competitors are retailers selling goods bought at identical prices from the same wholesaler.

32 For an individual consumer, demand might be inelastic for a single car up to a reservation price. From the point of view of the sellers who do not know the consumer’s reservation price, the expected market demand is downward sloping.

33 Of course, Hansen is maintaining the other important assumptions of the revenue equivalence theorem.

34 See Robinson (1985) and Milgrom (1987) for discussion of the single-unit case. See Ausubel and Schwartz (1999), Brusco and Lopomo (1999), Cramton and Schwartz (2000), Engelbrecht-Wiggans and Kahn (1998), Menezes (1996), and Weber (1997), for the multi-unit case. Klemperer (2002a) reviews these arguments and gives many examples.

35 Bidders are competing to buy rather than sell spectrum, so prices are ascending rather than descending.


to stop pushing prices up.36 The same coordination cannot readily be achieved in simultaneous sealed-bid auctions, in which there is neither the opportunity to signal, nor the ability to retaliate against a bidder who fails to cooperate.37 The conclusion is less stark when there are many repetitions over time, but it probably remains true that coordination is easier in ascending auctions. Furthermore, as is already well understood in the industrial organization literature,38 this conclusion is strengthened by the different observabilities of internet and dealer sale prices that make mutual understanding of firms’ strategies, including defections from “agreements,” far greater in the internet case. Thus selling over the internet probably makes it easier for firms to collude.

A third important issue is that bidders may be asymmetric. Then “ascending” auctions are generally more efficient (because the lowest-cost bidders win39), but sealed-bid auctions typically yield lower consumer prices.40 In this case economists generally favor ascending auctions, but competition-policy practitioners should usually prefer sealed-bid auctions because most competition regimes concentrate on consumer welfare.

Furthermore, this analysis ignores the impact of auction type on new entry in the presence of asymmetries. Because an “ascending” auction is generally efficient, a potential competitor with even a slightly higher cost (or lower quality) than an incumbent will see no point in entering the auction. However, the same competitor might enter a sealed-bid auction, which gives a weaker bidder a shot

36 For example, in a 1999 German spectrum auction, Mannesman bid a low price for half the licenses and a slightly lower price for the other half. Here is what one of T-Mobil’s managers said: “There were no agreements with Mannesman. But [T-Mobil] interpreted Mannesman’s first bid as an offer.” T-Mobil understood that it could raise the bid on the other half of the licenses slightly, and that the two companies would then “live and let live,” with neither company challenging the other on “their” half. Just that happened. The auction closed after just two rounds, with each of the bidders having half the licenses for the same low price. See Jehiel and Moldovanu (2000) and Grimm et al. (2001). In U.S. FCC auctions, bidders have used the final three digits of multimillion dollar bids to signal the market id codes of the areas they coveted, and a 1997 auction that was expected to raise $1,800 million raised less than $14 million. See Cramton and Schwartz (2001), and “Learning to Play the Game,” The Economist, May 17, 1997, p. 120. Klemperer (2002a) gives many more examples.

37 The low prices in the ascending auction are supported by the threat that, if a bidder overbids a competitor anywhere, then the competitor will retaliate by overbidding the first bidder on markets where the first bidder has the high bids.

38 At least since Stigler (1964).

39 To the extent that the auctions for individual consumers are independent single-unit auctions, an ascending auction is efficient under a broad class of assumptions if bidders’ private signals are single-dimensional, even with asymmetries among bidders and common-value components to valuations. See Maskin (1992).

40 A price-minimizing auction allocates the object to the bidder with the lowest “virtual cost,” rather than to the one with the lowest actual cost. (See Section 4; virtual cost is the analogous concept to marginal revenue for an auction to buy an object.) Compared with an ascending auction, a sealed-bid auction discriminates in favor of selling to “weaker” bidders, whose costs are drawn from higher distributions, because they bid more aggressively (closer to their actual costs) than stronger ones. But, for a given cost, a weaker bidder has a lower virtual cost than a stronger one. So, the sealed-bid auction often, but not always, yields lower prices. See Section 7.1 of Klemperer (1999a).


at winning. The extra competition may lower prices substantially. Of course, the entry of the weaker competitor may also slightly reduce efficiency, but if competition is desirable per se, or if competition itself improves efficiency, or if the objective is consumer welfare rather than efficiency, then the case for sealed-bid auctions is very strong (see next subsection and Klemperer, 2002a).

Although there are other dimensions in which our setting fails the revenue equivalence assumptions, they seem less important.41 It follows that the transparency induced between firms that makes internet sales more like ascending auctions than sealed-bid auctions is probably bad for consumers. Although gains from lower consumer search costs and dealer costs could certainly reverse this conclusion, auction-theoretic considerations mount a strong case against “transparent” internet sales.42

In another application of auction-theoretic insights to e-commerce, Bulow and Klemperer (2002b) apply Milgrom and Weber’s (1982) celebrated linkage principle to show when the price discrimination that internet markets make possible helps consumers.

3.2.

Anglo-Dutch Auctions, a Theory of Rationing, and Patent Races

The last disadvantage of ascending auctions discussed earlier – the dampening effect on entry – has been very important in practical auction contexts (see Klemperer 2002a). For example, in the main (1995) auction of U.S. mobile-phone licenses, some large potential bidders such as MCI, the U.S.’s third-largest phone company, failed to enter at all, and many other bidders were deterred from competing seriously for particular licenses such as the Los Angeles and New York licenses, which therefore sold at very low prices.43 Entry was therefore a prominent concern when the UK planned an auction of four UMTS “third-generation” mobile-phone licenses in 1998 for a market in which four companies operated mobile telephone services and therefore had clear advantages over any new entrant.44 In this case, the design chosen was an “Anglo-Dutch” auction as first proposed in Klemperer (1998):45 in an Anglo-Dutch auction for four licenses, the

41 Other violations of the revenue equivalence assumptions may include buyer and seller risk aversion that both favor sealed-bid auctions and affiliation of costs that favors ascending auctions.

42 Empirical evidence is limited. Lee (1998) and Lee et al. (1999) find electronic markets yield higher prices than conventional markets for cars. Scott Morton et al. (2001) find that California customers get lower prices if they use automobile internet sites, but this is unsurprising because these sites merely refer customers to dealers for price quotes, so behave more like traditional dealers than like the “transparent” sites that we have described and that are being promised in Europe.

43 See Klemperer and Pagnozzi (2002) for econometric evidence of these kinds of problems in U.S. spectrum auctions, Bulow and Klemperer (2000) and Klemperer (1998) for extensive discussion, and Bulow, Huang, and Klemperer (1999) for related modeling.

44 Bidders could not be allowed to win more than one license each.

45 See Klemperer (1998, 2002a) and Radiocommunications Agency (1998a, 1998b) for more details and for variants on the basic design. (The Agency was advised by Binmore, Klemperer, and others.)


price rises continuously until ﬁve bidders remain (the “English” stage), after which the ﬁve survivors make sealed bids (required to be no lower than the current price level) and the four winners pay the fourth-highest bid (the “Dutch” stage). Weak bidders have an incentive to enter such an auction because they know they have a chance of winning at the sealed-bid stage if they can survive to be among the ﬁve ﬁnalists. The design accepts some risk of an ex post inefﬁcient allocation to increase the chance of attracting the additional bidders that are necessary for a successful auction and reasonable revenues.46,47 Translating this idea into a more traditional economics context suggests a theory of why ﬁrms might ration their output at prices in which there is excess demand as, for example, microprocessor manufacturers routinely do after the introduction of a new chip. Raising the price to clear the market would correspond to running an ascending auction. It would be ex post efﬁcient and ex post proﬁt maximizing, but would give poor incentives for weaker potential customers who fear being priced out of the market to make the investments necessary to enter the market (such as the product design necessary to use the new chip). Committing to rationing at a ﬁxed price at which demand exceeds supply is ex post inefﬁcient,48 but may encourage more entry into the market and so improve ex ante proﬁts. Details and more examples are in Gilbert and Klemperer (2000). A similar point is that a weaker ﬁrm may not be willing to enter a patent race in which all parties can observe others’ progress. Such a race is akin to an ascending auction in which a stronger rival can always observe and overtake a weaker ﬁrm, which therefore has no chance of winning.49 A race in which rivals’ progress cannot be monitored is more akin to a sealed-bid auction and may attract more entry.
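The allocation and pricing rules of the Anglo-Dutch design described above can be sketched in a few lines of Python. This is only an illustration of the mechanics, not a model of equilibrium bidding: the function name `anglo_dutch`, the bidder labels, and all the numbers are hypothetical, each bidder's behaviour in the English stage is summarized by an assumed drop-out price, and ties are broken arbitrarily.

```python
def anglo_dutch(dropout_prices, sealed_bids, licenses=4):
    """Illustrative Anglo-Dutch auction for `licenses` identical licenses.

    dropout_prices: bidder -> price at which the bidder would quit the
                    ascending ("English") stage (an assumed summary of behaviour).
    sealed_bids:    bidder -> sealed bid the bidder would submit if a finalist.
    Returns (winners, price_paid, english_stage_price).
    """
    n_finalists = licenses + 1
    if len(dropout_prices) <= n_finalists:
        raise ValueError("the English stage needs more bidders than finalists")

    # English stage: the price rises until only licenses + 1 bidders remain,
    # i.e. it stops at the drop-out price of the next bidder in line.
    ranked = sorted(dropout_prices, key=dropout_prices.get, reverse=True)
    finalists = ranked[:n_finalists]
    english_price = dropout_prices[ranked[n_finalists]]

    # Sealed-bid stage: bids may not be lower than the current price level.
    final_bids = {b: max(sealed_bids[b], english_price) for b in finalists}
    order = sorted(finalists, key=final_bids.get, reverse=True)
    winners = order[:licenses]

    # All winners pay the lowest winning (here: fourth-highest) sealed bid.
    price_paid = final_bids[winners[-1]]
    return winners, price_paid, english_price


# Hypothetical example: E is the weakest finalist in the English stage
# but outbids D at the sealed-bid stage and wins a license.
dropout = {"A": 90, "B": 80, "C": 70, "D": 60, "E": 50, "F": 40, "G": 30}
sealed = {"A": 75, "B": 72, "C": 68, "D": 55, "E": 58}
winners, price_paid, english_price = anglo_dutch(dropout, sealed)
print(winners, price_paid, english_price)
```

In the example the weaker bidder E wins at the sealed-bid stage, which is the possibility that gives weak bidders a reason to enter in the first place.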

46 The additional bidders might yield a higher price even after the English stage, let alone after the final stage, than in a pure ascending auction.

47 The design performed very successfully in laboratory testing, but the auction was delayed until 2000, and technological advances made it possible to offer five licenses, albeit of different sizes. The additional license resolved the problem of attracting new entrants, and because collusion was not a serious problem in this case (bidders were not allowed to win more than one license each), it was decided to switch to a simultaneous ascending design. The actual UK auction was very successful, but the wisdom of the UK decision not to run an ascending auction when the number of strong bidders equaled the number of licenses was confirmed when the Netherlands did just this three months later, and raised little more than one-quarter of the per capita revenue raised by the UK. In large part, the Netherlands’ problem was that their ascending auction deterred entry. Denmark also had the same number of strong bidders as licenses, and (successfully) used a sealed-bid auction for similar reasons that the UK would have run an Anglo-Dutch auction in this context. (In Denmark it was clear that there were too few potential bidders to make an Anglo stage worthwhile.) See Klemperer (2002a, 2002b, 2002c) for more detail.

48 We assume any resale is inefficient. But see Cramton, Gibbons, and Klemperer (1987).

49 Of course, this point is closely related to the idea of “ε-preemption” in R&D races with observability that has already been well discussed in the standard industrial organization literature (Fudenberg et al. 1983).


These analogies illustrate how an insight that is routine in auction theory may help develop ideas in economics more broadly.

4. EXPLOITING DEEPER CONNECTIONS BETWEEN AUCTIONS AND ECONOMICS: MARGINAL REVENUES

The previous sections showed how a variety of economic problems can be thought of in auction-theoretic terms, allowing us to use tools such as the revenue equivalence theorem and intuitions such as those from the comparison of ascending and sealed-bid auctions. This section explains that the connections between auction theory and standard economic theory run much deeper.

Much of the analysis of optimal auctions can be phrased, like the analysis of monopoly, in terms of “marginal revenues.” Imagine a firm whose demand curve is constructed from an arbitrarily large number of bidders whose values are independently drawn from a bidder’s value distribution. When bidders have independent private values, a bidder’s “marginal revenue” is defined as the marginal revenue of this firm at the price that equals the bidder’s actual value (see Figure 2.2).50

Although it had been hinted at before,51 the key point was first explicitly drawn out by Bulow and Roberts (1989), who showed that under the assumptions of the revenue equivalence theorem the expected revenue from an auction equals the expected marginal revenue of the winning bidder(s). The new results in the article were few – the paper largely mimicked Myerson (1981), while renaming Myerson’s concept of “virtual utility” as “marginal revenue”52,53 – but their contribution was nevertheless important. Once the connection had been made, it was possible to take ways of thinking that are second nature to economists from the standard theory of monopoly pricing and apply them to auction theory.

50 The point of this construction is particularly clear when a seller faces a single bidder whose private value is distributed according to F(v). Then, setting a take-it-or-leave-it price of v yields expected sales, or “demand,” q = 1 − F(v), expected revenue of v(1 − F(v)), and expected marginal revenue d(qv)/dq = v − (1 − F(v))/f(v). See Appendix B of Klemperer (1999a).

51 For example, Mussa and Rosen’s (1978) analysis of monopoly and product quality contained expressions for “marginal revenue” that look like Myerson’s (1981) analysis of optimal auctions.

52 Myerson’s results initially seemed unfamiliar to economists, in part because his basic analysis (although not all his expressions) expressed virtual utilities as a function of bidders’ values, which correspond to prices, and so computed revenues by integrating along the vertical axis, whereas we usually solve monopoly problems by expressing marginal revenues as functions of quantities and integrating along the horizontal axis of the standard (for monopoly) picture.

53 Bulow and Roberts emphasize the close parallel between a monopolist third-degree price-discriminating across markets with different demand curves, and an auctioneer selling to bidders whose valuations are drawn from different distributions. (In each braced pair below, the first entry applies to the monopolist and the second to the auctioneer.) For the {monopolist | auctioneer}, {revenue | expected revenue} is maximized by selling to the {consumers | bidder} with the highest marginal revenue(s), not necessarily the highest value(s), subject to never selling to a {consumer | bidder} with marginal revenue less than the {monopolist’s marginal cost | auctioneer’s own valuation}, assuming (i) resale can be prohibited, (ii) credible commitment can be made to {no future sales | sticking to any reserve price}, and (iii) {marginal revenue curves are all downward sloping | higher “types” of any bidder have higher marginal revenues than lower “types” of the same bidder}, etc.


Figure 2.2. Construction of marginal revenue of bidder with value ṽ drawn from distribution F(v) on [v̲, v̄].
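A minimal numerical sketch of the marginal-revenue construction follows, under assumptions that are ours rather than the text's: values are drawn independently from the uniform distribution on [0, 1], so F(v) = v, f(v) = 1, and marginal revenue is v − (1 − F(v))/f(v) = 2v − 1. The simulation compares the average price in an ascending auction without a reserve (the second-highest value) with the average marginal revenue of the winning bidder; if the Bulow and Roberts result applies, the two should agree up to sampling error. The function names, the seed, and the number of draws are illustrative.

```python
import random


def marginal_revenue(v, F, f):
    """Marginal revenue of a bidder with value v: v - (1 - F(v)) / f(v)."""
    return v - (1.0 - F(v)) / f(v)


def uniform_cdf(v):
    """F(v) for the uniform distribution on [0, 1] (an assumed example)."""
    return v


def uniform_pdf(v):
    """f(v) for the uniform distribution on [0, 1]."""
    return 1.0


def simulate(n_bidders, draws=100_000, seed=0):
    """Return (average sale price, average marginal revenue of the winner)
    for an ascending auction with no reserve and uniform[0, 1] values."""
    rng = random.Random(seed)
    total_price = 0.0
    total_mr = 0.0
    for _ in range(draws):
        values = sorted(rng.random() for _ in range(n_bidders))
        total_price += values[-2]  # the winner pays the runner-up's value
        total_mr += marginal_revenue(values[-1], uniform_cdf, uniform_pdf)
    return total_price / draws, total_mr / draws


if __name__ == "__main__":
    price, mr = simulate(n_bidders=3)
    print(f"average price: {price:.3f}, average winner's MR: {mr:.3f}")
```

With three bidders both averages should be close to one half, the expected second-highest of three uniform draws.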

For example, once the basic result (that an auction’s expected revenue equals the winning bidder’s expected marginal revenue) was seen, Bulow and Klemperer (1996) were able to use a simple monopoly diagram to derive it more simply and under a broader class of assumptions than had previously been done by Myerson or Bulow and Roberts.54 Bulow and Klemperer also used standard monopoly intuition to derive additional results in auction theory.

The main benefits from the marginal-revenue connection come from translating ideas from monopoly analysis into auction analysis, because most economists’ intuition for and understanding of monopoly is much more highly developed than for auctions. But, it is possible to go in the other direction, too, from auction theory to monopoly theory. Consider, for example, the main result of Bulow and Klemperer (1996):

Proposition 4.1 (Auction-Theoretic Version). An optimal auction of K units to Q bidders earns less profit than a simple ascending auction (without a reserve price) of K units to Q + K bidders, assuming (a) bidders are symmetric, (b) bidders are serious (i.e., their lowest possible valuations exceed the seller’s supply cost), and (c) bidders with higher valuations have higher marginal revenues.55

Proof. See Bulow and Klemperer (1996).

54 See Appendix B of Klemperer (1999a) for an exposition.

55 See Bulow and Klemperer (1996) for a precise statement. We do not require bidders’ valuations to be private, but do place some restrictions on the class of possible mechanisms from which the “optimal” one is selected, if bidders are not risk neutral or their signals are not independent. We assume bidders demand a single unit each.
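Proposition 4.1 can be illustrated numerically for K = 1. The sketch below is a simulation under assumptions that are ours, not the text's: bidders' private values are independent uniform draws on [0, 1] and the seller's supply cost is zero (a boundary case of the "serious bidders" condition). In that example marginal revenue is 2v − 1, so the optimal auction is taken to be an ascending auction with reserve price 1/2, where marginal revenue is zero. The function name `expected_revenue` and its parameters are hypothetical.

```python
import random


def expected_revenue(n_bidders, reserve=0.0, draws=100_000, seed=0):
    """Simulated expected revenue from selling one unit in an ascending
    auction with the given reserve price; values are uniform on [0, 1]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        values = sorted(rng.random() for _ in range(n_bidders))
        highest = values[-1]
        # With a single bidder there is no runner-up to set the price.
        second = values[-2] if n_bidders >= 2 else 0.0
        if highest >= reserve:
            # The winner pays the larger of the runner-up's value and the reserve.
            total += max(second, reserve)
    return total / draws


if __name__ == "__main__":
    for q in (1, 2, 5):
        optimal = expected_revenue(q, reserve=0.5)  # optimal auction, Q bidders
        extra = expected_revenue(q + 1)             # no reserve, Q + 1 bidders
        print(f"Q={q}: optimal auction {optimal:.3f}, one more bidder {extra:.3f}")
```

In this example the no-reserve auction with one extra bidder should earn more in every row, although the gap narrows as Q grows.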


Application. One application is to selling a firm (so, K = 1). Because the seller can always resort to an ascending auction, attracting a single additional bidder is worth more than any amount of negotiating skill or bargaining power against an existing bidder or bidders, under reasonable assumptions. Thus, there is little justification for, for example, accepting a “lock-up” bid for a company without fully exploring the interest of alternative possible purchasers.

The optimal auction translates, for large Q and K, to the monopolist’s optimum. An ascending auction translates to the competitive outcome, in which price-taking firms make positive profits only because of the fixed supply of units. (An ascending auction yields the K + 1st-highest value among the bidders; in a perfectly competitive market, an inelastic supply of K units is in equilibrium with demand at any price between the Kth and K + 1st-highest value, but the distinction is unimportant for large K.) So, one way of expressing the result in the market context is:

Proposition 4.2 (Monopoly-Theoretic Version). A perfectly competitive industry with (fixed) capacity K and Q consumers would gain less by fully cartelizing the industry (and charging the monopoly price) than it would gain by attracting K new potential customers into the industry with no change in the intensity of competition, assuming (a′) the K new potential consumers have the same distribution of valuations as the existing consumers, (b′) all consumers’ valuations for the product exceed sellers’ supply costs (up to sellers’ capacity), and (c′) the marginal-revenue curve constructed from the market-demand curve is downward sloping.56

Proof. No proof is required – the proposition is implied by the auction-theoretic version – but once we know the result we are looking for and the necessary assumptions, it is very simple to prove it directly using introductory undergraduate economics. We do this in a brief Appendix 2.

Application. One application is that this provides conditions under which a joint-marketing agency does better to focus on actually marketing rather than (as some of the industrial organization literature suggests) on facilitating collusive practices.57

5. APPLYING AUCTION THEORY TO PRICE-SETTING OLIGOPOLIES

We have stressed the applications of auction theory to contexts that might not be thought of as auctions, but even though price-setting oligopolies are obviously

56 We are measuring capacity in units such that each consumer demands a single unit of output. Appendix 2 makes it clear how the result generalizes.

57 Of course, the agency may wish to pursue both strategies in practice.


auctions, the insights that can be obtained by thinking of them in this way are often passed by.

5.1.

Marginal-Cost Pricing Is Not the Unique Bertrand Equilibrium

One of the most famous results in economics is the “Bertrand paradox,” that with just two firms with constant and equal marginal costs in a homogeneous products industry, the unique equilibrium is for both firms to set price equal to marginal cost and firms earn zero profit. This “theorem” is widely quoted in standard texts. But, it is false. There are other equilibria with large profits, for some standard demand curves, a fact that seems until recently to have been known only to a few auction theorists.58

Auction theorists are familiar with the fact that a boundary condition is necessary to solve a sealed-bid auction. Usually, this is imposed by assuming no bidder can bid less than any bidder’s lowest possible valuation, but there is generally a continuum of equilibria if arbitrarily negative bids are permitted.59 Exactly conversely, with perfectly inelastic demand for one unit and, for example, two risk-neutral sellers with zero costs, it is a mixed-strategy equilibrium for each firm to bid above any price p with probability k/p, for any fixed k. (Each firm therefore faces expected residual demand of constant elasticity −1, and is therefore indifferent about mixing in this way; profits are k per firm.) It is not hard to see that a similar construction is possible with downward-sloping demand, for example, standard constant-elasticity demand, provided that monopoly profits are unbounded. [See, especially, Baye and Morgan (1999a) and Kaplan and Wettstein (2000).]

One point of view is that the nonuniqueness of the “Bertrand paradox” equilibrium is a merely technical point, because it requires “unreasonable” (even though often assumed60) demand. However, the construction immediately suggests another more important result: quite generally (including for demand which becomes zero at some finite choke price), there are very profitable mixed-strategy ε-equilibria to the Bertrand game, even though there are no pure-strategy ε-equilibria. That is, there are mixed strategies that are very different from marginal-cost pricing

58 We assume firms can choose any prices. It is well known that if prices can be quoted only in whole pennies, there is an equilibrium with positive (but small) profits in which each firm charges one penny above cost. (With perfectly inelastic demand, there is also an equilibrium in which each firm charges two pennies above cost.)

59 For example, if each of two risk-neutral bidders’ private values is independently drawn from a uniform distribution on the open interval (0, 1), then for any nonnegative k there is an equilibrium in which a player with value v bids v/2 − k/v. If it is common knowledge that both bidders have value zero, there is an equilibrium in which each player bids below any price −p with probability k/p, for any fixed nonnegative k.

60 This demand can, for example, yield unique and finite-profit Cournot equilibrium.


in which no player can gain more than a very small amount, ε, by deviating from the strategies.61 (There are also “quantal response” equilibria with a similar ﬂavor.) Experimental evidence suggests that these strategies may be empirically relevant (see Baye and Morgan, 1999b).62
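The indifference behind the mixed-strategy Bertrand equilibrium of this subsection (two zero-cost sellers, perfectly inelastic demand for one unit, each pricing above p with probability k/p) can be checked directly. The sketch below assumes, as our reading of the construction, that prices are supported on [k, ∞), so that the probability of pricing above p is min(1, k/p); the function names are hypothetical. Because the price distribution has an infinite mean, the code checks expected profit analytically rather than by averaging simulated profits, and uses sampling only to confirm the price distribution itself.

```python
import random


def rival_price_exceeds(p, k):
    """Probability that the rival's price is above p: min(1, k / p)."""
    return 1.0 if p <= k else k / p


def expected_profit(p, k):
    """Expected profit from charging p with zero cost and one unit demanded:
    the firm sells only if it undercuts its rival (ties have probability zero)."""
    return p * rival_price_exceeds(p, k)


def draw_price(k, rng):
    """Sample a price with Pr(price > p) = k / p for p >= k (inverse transform)."""
    return k / (1.0 - rng.random())


if __name__ == "__main__":
    k = 2.0
    for p in (1.0, 2.0, 5.0, 100.0):
        print(f"price {p:6.1f}: expected profit {expected_profit(p, k):.3f}")
```

Every price at or above k earns expected profit exactly k, and any lower price earns less, so no deviation from the mixing distribution is profitable in this example.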

5.2.

The Value of New Consumers

The revenue equivalence theorem (RET) can, of course, be applied to price-setting oligopolies.63 For example: what is the value of new consumers in a market with strong brand loyalty? If firms can price discriminate between new uncommitted consumers and old “locked-in” consumers, Bertrand competition for the former will mean their value is low, but what if price discrimination is impossible? In particular, it is often argued that new youth smokers are very valuable to the tobacco industry because brand loyalty (as well as loyalty to the product) is very high (only about 10 percent of smokers switch brands in any year), so price-cost margins on all consumers are very high. Is there any truth to this view?

The answer, of course, under appropriate assumptions, is that the RET implies that the ability to price discriminate is irrelevant to the value of the new consumers (see the discussion in Section 2). With price discrimination, we can model the oligopolists as acting as monopolists against their old customers, and as being in an “ascending”64 price auction for the uncommitted consumers with the firm that is prepared to price the lowest selling to all these consumers at the cost of the runner-up firm. Alternatively, we can model the oligopolists as making sealed bids for the uncommitted consumers, with the lowest bidder selling to these consumers at its asking price. The expected profits are the same under the RET assumptions. Absent price discrimination, a natural model is the latter one, but in addition each oligopolist must discount its price to its own locked-in customers down to the price it bids for the uncommitted consumers. The RET tells us that the total cost to the industry of these “discounts” to old consumers will, on average, precisely compensate the higher

61 Of course, the concept of mixed-strategy ε-equilibrium used here is even more contentious than either mixed-strategy (Nash) equilibria or (pure-strategy) ε-equilibrium. The best defense for it may be its practical usefulness.

62 Spulber (1995) uses the analogy with a sealed-bid auction to analyze a price-setting oligopoly in which, by contrast with our discussion, firms do not know their rivals’ costs. For a related application of auction theory to price-setting oligopoly, see Athey et al. (2000).

63 As another example, Vives (1999) uses the revenue equivalence theorem to compare price-setting oligopoly equilibria with incomplete and complete (or shared) information about firms’ constant marginal costs, and so shows information sharing is socially undesirable in this context.

64 The price is descending because the oligopolists are competing to sell rather than buy, but it corresponds to an ascending auction in which firms are competing to buy, and we stick with this terminology as in Section 3.1.


sale price achieved on new consumers.65 That is, the net value to the industry of the new consumers is exactly as if there was Bertrand competition for them, even when the inability to price discriminate prevents this. Thus, Bulow and Klemperer (1998) argue that the economic importance to the tobacco companies of the youth market is actually very tiny, even though from an accounting perspective new consumers appear as valuable as any others.66 Similarly, applying the same logic to an international trade question, the value of a free-trading market to firms, each of which has a protected home market, is independent (under appropriate assumptions) of whether the firms can price discriminate between markets.67 Section 3.1’s discussion of oligopolistic e-competition develops this kind of analysis further by considering implications of failures of the RET.

5.3.

Information Aggregation in Perfect Competition

Although the examples cited previously, and in Section 3,68 suggest auction theory has been underused in analyzing oligopolistic competition, it has been very important in influencing economists’ ideas about the limit as the number of firms becomes large. An important strand of the auction literature has focused on the properties of pure-common-value auctions as the number of bidders becomes large, and asked: does the sale price converge to the true value, thus fully aggregating all of the economy’s information even though each bidder has only partial information? Milgrom (1979) and Wilson (1977) showed assumptions under which the answer is “yes” for a first-price, sealed-bid auction. Milgrom (1981) obtained similar results for a second-price auction [or for a (k + 1)th-price auction for k objects].69 These models justify some of our ideas about perfect competition.

65 Specifically let n “old” consumers be attached to each firm i, and firms’ costs c_i be independently drawn from a common, strictly increasing, atomless distribution. There are m “new” consumers who will buy from the cheapest firm. All consumers have reservation price r. Think of firms competing for the prize of selling to the new consumers, worth m(r − c_i) to firm i. Firms set prices p_i = r − d_i to “new” consumers; equivalently, they set “discounts” d_i to consumers’ reservation prices. If price discrimination is feasible, the winner pays m d_i for the prize and all firms sell to their old consumers at r. Absent price discrimination, the prices p_i apply to all firms’ sales, so relative to selling just to old consumers at price r, the winner pays (m + n) d_i for the prize and the losers pay n d_i each. For the usual reasons, the two sets of payment rules are revenue equivalent. For more discussion of this result, including its robustness to multiperiod contexts, see Bulow and Klemperer (1998); if the total demand of new consumers is more elastic, their economic value will be somewhat less than our model suggests; for a fuller discussion of the effects of “brand loyalty” or “switching costs” in oligopoly, see, especially, Beggs and Klemperer (1992) and Klemperer (1987a, 1987b, 1995).

66 If industry executives seem to value the youth segment, it is probably due more to concern for their own future jobs than concern for their shareholders.

67 See also Rosenthal (1980).

68 Bulow and Klemperer (2002b) provides an additional example.

69 Matthews (1984), on the other hand, showed that the (first-price) sale price does not in general converge to the true value when each bidder can acquire information at a cost. Pesendorfer and


6. APPLYING AUCTION THEORY (AND ECONOMICS) TO AUCTION MARKETS

Finally, although it has not always been grasped by practitioners, some markets are literally auctions. The increasing recognition that many real markets are best understood through the lens of auction theory has stimulated a burst of new theorizing,70 and created the new subject of market design that stands in similar relation to auction theory as engineering does to physics.

6.1.

Important Auction Markets

It was not initially well understood that deregulated electricity markets, such as in the United Kingdom, are best described and analyzed as auctions of inﬁnitely divisible quantities of homogeneous units.71 Although much of the early analysis of the UK market was based on Klemperer and Meyer (1989), which explicitly followed Wilson’s (1979) seminal contribution to multiunit auctions, the Klemperer and Meyer model was not thought of as an “auctions” paper, and only recently received much attention among auction theorists.72 Indeed, von der Fehr and Harbord (1993) were seen as rather novel in pointing out that the new electricity markets could be viewed as auctions. Now, however, it is uncontroversial that these markets are best understood through auction theory, and electricity market design has become the province of leading auction theorists, such as Wilson, who have been very inﬂuential. Treasury bill auctions, like electricity markets, trade a divisible homogeneous good; but, although treasury auctions have always been clearly understood to be “auctions,” and the existing auction theory is probably even more relevant to treasury markets than to electricity markets,73 auction theorists have never been as inﬂuential as they are now in energy markets. In part, this is

Swinkels (1997) recently breathed new life into this literature, by showing convergence under weaker assumptions than previously if the number of objects for sale, as well as the number of bidders, becomes large. See also Kremer (2000), Swinkels (2001), and Pesendorfer and Swinkels (2000).

70 Especially on multiunit auctions in which bidders are not restricted to winning a single unit each, because most markets are of this kind.

71 von der Fehr and Harbord (1998) provide a useful overview of electricity markets.

72 Klemperer and Meyer (1989) was couched as a traditional industrial organization study of the question of whether competition is more like Bertrand or Cournot, following Klemperer and Meyer (1986).

73 Non-auction-theoretic issues that limit the direct application of auction theory to electricity markets include the very high frequency of repetition among market participants who have stable and predictable requirements, which makes the theory of collusion in repeated games also very relevant; the nature of the game the major electricity suppliers are playing with the industry regulator who may step in and attempt to change the rules (again) if the companies are perceived to be making excessive profits; the conditions for new entry; and the effects of vertical integration of industry participants. On the other hand, the interaction of a treasury auction with the financial markets for trading the bills both before and after the auction complicates the analysis of that auction.


Klemperer

because the treasury auctions predated any relevant theory,74 and the auctions did not seem to have serious problems. In part it may be because no clear view has emerged about the best form of auction to use; indeed, one possibility is that the differences between the main types of auction may not be too important in this context – see Klemperer (2002a).75 Academics were involved at all stages of the radiospectrum auctions, from suggesting the original designs to advising bidders on their strategies. The original U.S. proponents of an auction format saw it as a complex environment that needed academic input, and a pattern of using academic consultants was set in the U.S. and spread to other countries.76 Many other new auction markets are currently being created using the internet, such as the online consumer auctions run by eBay, Amazon, and others that have more than 10 million customers, and the business-to-business auto parts auctions being planned by General Motors, Ford, and Daimler-Chrysler that are expected to handle $250 million in transactions a year. Here, too, auction theorists have been in heavy demand, and there is considerable ongoing

74 By contrast, the current U.K. government sales of gold are a new development, and government agencies have now consulted auction theorists (including myself) about the sale method.

75 In a further interesting contrast, the U.K. electricity market – the first major market in the world to be deregulated and run as an auction – was set up as a uniform price auction, but its perceived poor performance has led to a planned switch to an exchange market, followed by a discriminatory auction (see Klemperer 2002a; Office of Gas and Electricity Markets 1999; Newbery 1998, Wolfram 1998, 1999). Meanwhile, the vast majority of the world’s treasury bill markets have until recently been run as discriminatory auctions (see Bartolini and Cottarelli 1997), but the U.S. switched to uniform price auctions in late 1998, and several other countries have been experimenting with these. In fact, it seems unlikely that either form of auction is best either for all electricity markets or for all treasury markets (see, e.g., Klemperer 1999b, Federico and Rahman 2000, McAdams 1998, Nyborg and Sundaresan 1996).

76 Evan Kwerel was especially important in promoting the use of auctions. The dominant design has been the simultaneous ascending auction sketched by Vickrey (1976), and proposed and developed by McAfee, Milgrom, and Wilson for the U.S. auctions. (See McMillan 1994, McAfee and McMillan 1996, and especially Milgrom forthcoming.) Although some problems have emerged, primarily its susceptibility to collusion and its inhospitability to entry (see Section 3.2), it has generally been considered a success in most of its applications (see, e.g., Board 1999, Cramton 1997, Plott 1997, Salant 1997, Weber 1997, and Zheng 1999). A large part of the motivation for the U.S. design was the possibility of complementarities between licenses (see Ausubel et al. 1997), although it is unproven either that the design was especially helpful in allowing bidders to aggregate efficient packages, or that it would work well if complementarities had been very significant. Ironically, the simultaneous ascending auction is most attractive when each of an exogenously fixed number of bidders has a privately known value for each of a collection of heterogeneous objects, but (contrary to the U.S. case) is restricted to buying at most a single license. In this case, entry is not an issue, collusion is very unlikely, and the outcome is efficient. For this reason a version of the simultaneous ascending auction was designed by Binmore and Klemperer for the U.K. 3G auction (in which each bidder was restricted to a single license) after concerns about entry had been laid to rest. A sealed-bid design was recently used very successfully in Denmark where attracting entry was a serious concern. See Section 3.2, see Binmore and Klemperer (2002) for discussion of the U.K. auction, and see Klemperer (2002a, 2002b, 2002c, 2002d, 2002e) for a discussion of the recent European spectrum auctions.


experimentation with different auction forms.77 Furthermore, we have already argued that internet markets that are not usually thought of as auctions can be illuminated by auction theory [see Section 3.1 and Bulow and Klemperer (2002b)].

6.2. Applying Economics to Auction Design

Although many economic markets are now fruitfully analyzed as auctions, the most significant problems in auction markets and auction design are probably those with which industry regulators and competition authorities have traditionally been concerned – discouraging collusive, predatory, and entry-deterring behavior, and analyzing the merits of mergers or other changes to market structure. This contrasts with most of the auction literature, which focuses on Nash equilibria in one-shot games with a fixed number of bidders, and emphasizes issues such as the effects of risk aversion, correlation of information, budget constraints, complementarities, asymmetries, etc. Certainly these are also important topics – and auction theorists have made important progress on them that other economic theory can learn from – but they are probably not the main issues. Although the relative thinness of the auction-theoretic literature on collusion and entry deterrence may be defensible to the extent general economic principles apply, there is a real danger that auction theorists will underemphasize these problems in applications. In particular, ascending, second-price, and uniform-price auction forms, although attractive in many auction theorists’ models, are more vulnerable to collusive and predatory behavior than (first-price) sealed-bid and hybrid forms, such as the Anglo-Dutch auction described in Section 3.2. Klemperer (2002a) provides an extensive discussion of these issues. Although auction theorists are justly proud of how much they can teach economics, they must not forget that the classical lessons of economics continue to apply.

7. CONCLUSIONS

Auction theory is a central part of economics and should be a part of every economist’s armory; auction theorists’ ways of thinking shed light on a whole range of economic topics. We have shown that many economic questions that do not at first sight seem related to auctions can be recast to be solvable using auction-theoretic techniques, such as the revenue equivalence theorem. The close parallels between auction theory and standard price theory – such as those between the theories of optimal auctions and of price discrimination – mean ideas can be arbitraged

77 See, e.g., Hall (2001). The UK government recently used the internet to run the world’s first auction for greenhouse gas emissions reductions. (Peter Cramton, Eric Maskin, and I advised on the design, and Larry Ausubel and Jeremy Bulow also helped with the implementation.)


from auction theory to standard economics, and vice versa. The insights and intuitions that auction theorists have developed in comparing different auction forms can ﬁnd fertile application in many other contexts. Furthermore, although standard auction theory models already provide the basis of much work in labor economics, political economy, ﬁnance, and industrial organization, we have used the example of price-setting oligopoly to show that a much greater application of auction-theoretic thinking may be possible in these more obvious ﬁelds. “Heineken refreshes the parts other beers cannot reach” was recently voted one of the top advertising campaigns of all time, worldwide. The moral of this paper is that, “Auction theory refreshes the parts other economics cannot reach.” Like Heineken, auction theory is a potent brew that we should all imbibe.

ACKNOWLEDGMENTS

Susan Athey was an excellent discussant of this paper. I have also received extremely helpful comments and advice from many other friends and colleagues, including Larry Ausubel, Mike Baye, Alan Beggs, Simon Board, Jeremy Bulow, Peter Cramton, Joe Farrell, Giulio Federico, Nils Hendrik von der Fehr, Dan Kovenock, David McAdams, Peter McAfee, Flavio Menezes, Meg Meyer, Jonathan Mirrlees-Black, John Morgan, Marco Pagnozzi, Nicola Persico, Eric Rasmussen, David Salant, Margaret Stevens, Rebecca Stone, Lucy White, Mark Williams, Xavier Vives, Caspar de Vries, and Charles Zheng.

APPENDIX 1: COMPARING LITIGATION SYSTEMS

Assume that after transfers between the parties, the loser ends up paying fraction α ≥ 0 of his own expenses and fraction β ≤ 1 of his opponent’s. (The winner pays the remainder.)78 The American system is α = 1, β = 0; the British system is α = β = 1; the Netherlands system is, roughly, α = 1, 0 < β < 1; and Quayle’s is α = 2, β = 0. It is also interesting to consider a “reverse-Quayle” rule α = 1, β < 0 in which both parties pay their own expenses, but the winner transfers an amount proportional to her own expenses to the loser. Let L be the average legal expenses spent per player. The following slight generalization of the RET is the key: assuming the conditions of the RET all hold except for assumption (ii) (i.e., the expected surplus of a bidder with the lowest feasible valuation, say S, may not be zero), it remains true that the expected surplus of any other type of bidder is a fixed amount above S. [See, e.g., Klemperer (1999a; Appendix A); the fixed amount

78 As in the main text, we assume a symmetric equilibrium with strictly increasing bidding functions. For extreme values of α and β, this may not exist (and we cannot then use the RET directly). See Baye, Kovenock, and de Vries (1997) for explicit solutions for the equilibria for different α and β.


depends on the distribution of the parties’ valuations, but unlike S and L does not depend on the mechanism {α, β}.] It follows that the average bidder surplus is S plus a constant. But the average bidder surplus equals the average lawsuit winnings (expectation of {probability of winning} × {valuation}) minus L, equals a constant minus L by assumption (i) of the RET. So, S = K − L in which K is a constant independent of α and β. Because the lowest valuation type always loses in equilibrium [by assumption (i) of the RET], she bids zero so S = −β L, because in a one-shot game her opponent, on average, incurs expenses of L. Solving, L = K /(1 − β) and the expected surplus of any given party is a constant minus β K /(1 − β). It follows that both expected total expenses and any party’s expected payoff are invariant to α; hence the remarks in the text about the Quayle proposal. But legal expenses are increasing in β, indeed become unbounded in the limit corresponding to the British system. The mechanism that minimizes legal expenses taking the set of lawsuits as given is the reverse Quayle. The intuition is that it both increases the marginal cost of spending on a lawsuit and reduces the value of winning the suit. On the other hand, of course, bringing lawsuits becomes more attractive as β falls.
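The closed-form result above, L = K/(1 − β) with S = −βL, can be tabulated for the fee-shifting rules listed in this appendix. The following Python sketch is illustrative only: the function names, the normalization K = 1, and the particular β values used for the Netherlands and reverse-Quayle rules are assumptions made for the example, not values from the text. It applies only where β < 1 and a symmetric equilibrium of the kind assumed in the appendix exists.

```python
def average_legal_expenses(K, beta):
    """Average legal expenses per player, L = K / (1 - beta).

    K is the mechanism-independent constant from the appendix; beta is the
    fraction of the opponent's expenses paid by the loser. Requires beta < 1.
    """
    if beta >= 1:
        raise ValueError("L is unbounded as beta approaches 1 (British system)")
    return K / (1.0 - beta)


def lowest_type_surplus(K, beta):
    """Expected surplus of the lowest-valuation type, S = -beta * L."""
    return -beta * average_legal_expenses(K, beta)


# (alpha, beta) pairs; the 0.5 and -0.5 values are hypothetical illustrations.
systems = {
    "American": (1.0, 0.0),
    "Netherlands (beta = 0.5 assumed)": (1.0, 0.5),
    "Quayle": (2.0, 0.0),
    "reverse-Quayle (beta = -0.5 assumed)": (1.0, -0.5),
}

K = 1.0  # normalization chosen for the example
for name, (alpha, beta) in systems.items():
    L = average_legal_expenses(K, beta)
    S = lowest_type_surplus(K, beta)
    # alpha is listed for reference only: it does not enter either formula.
    print(f"{name}: alpha={alpha}, beta={beta}, L={L:.3f}, S={S:.3f}")
```

Because α never enters either formula, the American and Quayle rules produce the same expenses in this sketch, while expenses rise with β and fall below the American level when β is negative.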

APPENDIX 2: DIRECT PROOF OF MONOPOLY-THEORETIC VERSION OF PROPOSITION IN SECTION 4

The proof rests precisely on the assumptions (a′), (b′), and (c′). Without loss of generality, let firms’ marginal costs be flat up to capacity,79 and consider what would be the marginal revenue curve for the market if the K new consumers were attracted into it (see Figure 2.3). A monopolist on this (expanded) market would earn area A in profits (i.e., the area between the marginal revenue and marginal cost curves up to the monopoly point, M). The perfectly competitive industry in the same (expanded) market would earn Π_c = A − B, that is, the integral of marginal revenue less marginal cost up to industry capacity, K. By assumption (a′), a monopolist (or fully cartelized industry) in the original market would earn Π_M = [Q/(Q + K)]A. Now, the average marginal revenue up to quantity Q + K equals the price at demand Q + K (because total marginal revenue = price × quantity), which exceeds marginal cost by assumption (b′), so B + C ≤ A. Furthermore, by assumption (c′) and elementary geometry, B ≤ [(K − M)/((Q + K) − M)](B + C). So, B ≤ [(K − M)/(Q + K − M)]A, and therefore Π_c = A − B ≥ [Q/(Q + K − M)]A ≥ Π_M, as required.

79 If the industry cost curve is not flat up to the capacity, then use the argument in the text to prove the result for a cost curve that is flat and everywhere weakly above the actual cost curve. A fortiori, this proves the result for the actual curve, because a monopoly saves less from a lower cost curve than a competitive industry saves from the lower cost curve.


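[Figure 2.3 (image not reproduced). Vertical axis: marginal revenue and marginal cost. Horizontal axis: quantity, with M, K, and Q + K marked. Curves: marginal revenue, marginal cost. Labeled areas: A, B, C.]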

Figure 2.3. Marginal revenue if demand is expanded.
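The chain of inequalities in Appendix 2 can be checked arithmetically for any particular set of areas and quantities. The Python sketch below does this for one hypothetical set of numbers: the function name and the values of A, B, C, Q, K, and M are assumptions chosen for illustration and are not derived from an actual demand curve. It only confirms that, when the two premises B + C ≤ A and B ≤ [(K − M)/(Q + K − M)](B + C) hold, the competitive profit A − B is at least [Q/(Q + K − M)]A, which in turn is at least the monopoly profit [Q/(Q + K)]A.

```python
def appendix2_chain(A, B, C, Q, K, M):
    """Evaluate the premises and bounds used in the Appendix 2 argument.

    A, B, C are the areas in Figure 2.3; Q, K, M are quantities with
    0 <= M <= K. All inputs here are hypothetical illustration values.
    """
    share = (K - M) / (Q + K - M)
    premises_hold = (B + C <= A) and (B <= share * (B + C))
    competitive = A - B                  # competitive profit in expanded market
    lower_bound = Q / (Q + K - M) * A    # intermediate bound in the proof
    monopoly = Q / (Q + K) * A           # monopoly profit in original market
    return {
        "premises_hold": premises_hold,
        "competitive": competitive,
        "lower_bound": lower_bound,
        "monopoly": monopoly,
    }


result = appendix2_chain(A=100.0, B=8.0, C=50.0, Q=10.0, K=6.0, M=4.0)
print(result)
```

With these numbers both premises hold, and the three quantities come out in the order the proof predicts.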

References Abreu, D. and F. Gul (2000), “Bargaining and Reputation,” Econometrica, 68, 85–117. Athey, S., K. Bagwell, and C. Sanchirico (2000), “Collusion and Price Rigidity,” mimeo, MIT. Ausubel, L. M., P. Cramton, R. P. McAfee, and J. McMillan (1997), “Synergies in Wireless Telephony: Evidence from the Broadband PCS Auctions,” Journal of Economics and Management Strategy, 6, 497–527. Ausubel, L. M. and J. A. Schwartz (1999), “The Ascending Auction Paradox,” Working Paper, University of Maryland. Bartolini, L. and C. Cottarelli (1997), “Designing Effective Auctions for Treasury Securities,” Current Issues in Economics and Finance, 3, 1–6. Baye, M. R., D. Kovenock, and C. de Vries (1997), “Fee Allocation of Lawyer Services in Litigation,” mimeo, Indiana University, Purdue University, and Tinbergen Institute, Erasmus University. Baye, M. R., D. Kovenock, and C. de Vries (1998, February), “A General Linear Model of Contests,” Working Paper, Indiana University, Purdue University, and Tinbergen Institute, Erasmus University. Baye, M. R. and J. Morgan (1999a), “A Folk Theorem for One-Shot Bertrand Games,” Economics Letters, 65, 59–65.


Baye, M. R. and J. Morgan (1999b), “Bounded Rationality in Homogeneous Product Pricing Games,” Working Paper, Indiana University and Princeton University. Baye, M. R. and J. Morgan (2001), “Information Gatekeepers on the Internet and the Competitiveness of Homogeneous Product Markets,” American Economic Review, 91, 454–474. Beggs, A. W. and P. D. Klemperer (1992), “Multi-Period Competition with Switching Costs,” Econometrica, 60(3), 651–666. Bernheim, B. D. and M. D. Whinston (1986), “Menu Auctions, Resource Allocation, and Economic Inﬂuence,” Quarterly Journal of Economics, 101, 1–31. Binmore, K. and P. D. Klemperer (2002), “The Biggest Auction Ever: The Sale of the British 3G Telecom Licenses,” Economic Journal, 112(478), C74–C96. Bliss, C. and B. Nalebuff (1984), “Dragon-Slaying and Ballroom Dancing: The Private Supply of a Public Good,” Journal of Public Economics, 25, 1–12. Board, S. A. (1999), “Commitment in Auctions,” M. Phil. Thesis, Nufﬁeld College, Oxford University. Brusco, S. and G. Lopomo (1999), “Collusion via Signalling in Open Ascending Auctions with Multiple Objects and Complementarities,” Working Paper, Stern School of Business, New York University. Bulow, J. I., M. Huang, and P. D. Klemperer (1999), “Toeholds and Takeovers,” Journal of Political Economy, 107, 427–454. Bulow, J. I. and P. D. Klemperer (1994), “Rational Frenzies and Crashes,” Journal of Political Economy, 102, 1–23. Bulow, J. I. and P. D. Klemperer (1996), “Auctions vs. Negotiations,” American Economic Review, 86, 180–194. Bulow, J. I. and P. D. Klemperer (1998), “The Tobacco Deal,” Brookings Papers on Economic Activity (Microeconomics), 323–394. Bulow, J. I. and P. D. Klemperer (1999), “The Generalized War of Attrition,” American Economic Review, 89, 175–189. Bulow, J. I. and P. D. Klemperer (2002a), “Prices and the Winner’s Curse,” Rand Journal of Economics, 33(1), 1–21. Bulow, J. I. and P. D. 
Klemperer (2002b), “Privacy and Prices,” Nufﬁeld College, Oxford University Discussion Paper, available at www.paulklemperer.org. Bulow, J. I. and D. J. Roberts (1989), “The Simple Economics of Optimal Auctions,” Journal of Political Economy, 97, 1060–1090. Cramton, P. (1997), “The FCC Spectrum Auctions: An Early Assessment,” Journal of Economics and Management Strategy, 6(3), 431–495. Cramton, P., R. Gibbons, and P. D. Klemperer (1987), “Dissolving a Partnership Efﬁciently,” Econometrica, 55(3), 615–632. Cramton, P. and J. A. Schwartz (2001), “Collusive Bidding: Lessons from the FCC Spectrum Auctions,” Journal of Regulatory Economics, 18, 187–205. Engelbrecht-Wiggans, R. and C. M. Kahn (1998), “Low Revenue Equilibria in Simultaneous Auctions,” Working Paper, University of Illinois. Feddersen, T. J. and W. Pesendorfer (1996), “The Swing Voter’s Curse,” American Economic Review, 86(3), 408–424. Feddersen, T. J. and W. Pesendorfer (1998), “Convicting the Innocent: The Inferiority of Unanimous Jury Verdicts under Strategic Voting,” American Political Science Review, 92(1), 23–35. Federico, G. and D. Rahman (2000), “Bidding in an Electricity Pay-as-Bid Auction,” Working Paper, Nufﬁeld College.


von der Fehr, N.-H. and D. Harbord (1993), “Spot Market Competition in the UK Electricity Industry,” Economic Journal, 103, 531–546. von der Fehr, N.-H. and D. Harbord (1998), “Competition in Electricity Spot Markets: Economic Theory and International Experience,” Memorandum No. 5/1998, Department of Economics, University of Oslo. Friedman, L. (1956), “A Competitive Bidding Strategy,” Operations Research, 4, 104– 112. Fudenberg, D., and D. M. Kreps (1987), “Reputation in the Simultaneous Play of Multiple Opponents,” Review of Economic Studies, 54, 541–568. Fudenberg, D. and J. Tirole (1986), “A Theory of Exit in Duopoly,” Econometrica, 54, 943–960. Fudenberg, D., R. Gilbert, J. Stiglitz and J. Tirole (1983), “Preemption, Leapfrogging, and Competition in Patent Races,” European Economic Review, 22, 3–31. Gilbert, R. and P. D. Klemperer, (2000), “An Equilibrium Theory of Rationing,” Rand Journal of Economics, 31(1), 1–21. Grimm, V., F. Riedel, and E. Wolfstetter (2001), “Low Price Equilibrium in MultiUnit Auctions: The GSM Spectrum Auction in Germany,” Working Paper, Humboldt Universit¨at zu Berlin. Haigh, J. and C. Cannings, (1989), “The n-Person War of Attrition,” Acta Applicandae Mathematicae, 14, 59–74. Hall, R. E. (2001), Digital Dealing. New York: W. W. Norton. Hansen, R. G. (1988), “Auctions with Endogenous Quantity,” Rand Journal of Economics, 19, 44–58. Holt, C. A. Jr. and R. Sherman (1982), “Waiting-Line Auctions.” Journal of Political Economy, 90, 280–294. Hughes, J. W. and E. A. Snyder (1995), “Litigation and Settlement under the English and American Rules: Theory and Evidence.” Journal of Law and Economics, 38, 225–250. Jehiel, P. and B. Moldovanu (2000), “A Critique of the Planned Rules for the German UMTS/IMT-2000 License Auction,” Working Paper, University College London and University of Mannheim. Kagel, J. H. (1995), “Auctions: A Survey of Experimental Research,” in The Handbook of Experimental Economics (ed. by J. H. Kagel and A. E. 
Roth), Princeton, NJ: Princeton University Press, 501–586. Kambe, S. (1999), “Bargaining with Imperfect Commitment,” Games and Economic Behavior, 28(2), 217–237. Kaplan, T. and D. Wettstein (2000), “The Possibility of Mixed-Strategy Equilibria with Constant-Returns-to-Scale Technology under Bertrand Competition,” Spanish Economic Review, 2(1), 65–71. Klemperer, P. D. (1987a), “Markets with Consumer Switching Costs,” Quarterly Journal of Economics, 102(2), 375–394. Klemperer, P. D. (1987b), “The Competitiveness of Markets with Switching Costs,” Rand Journal of Economics, 18(1), 138–150. Klemperer, P. D. (1995), “Competition When Consumers Have Switching Costs: An Overview with Applications to Industrial Organization, Macroeconomics, and International Trade,” Review of Economic Studies, 62(4), 515–539. Klemperer, P. D. (1998), “Auctions with Almost Common Values,” European Economic Review, 42, 757–769. Klemperer, P. D. (1999a), “Auction Theory: A Guide to the Literature,” Journal of


Economic Surveys, 13(3), 227–286. [Also reprinted in The Current State of Economic Science, 2, 711–766, (ed. by S. Dahiya), 1999.] Klemperer, P. D. (1999b), “Applying Auction Theory to Economics,” Working Paper, Nuffield College, Oxford. Klemperer, P. D. (Ed.) (2000), The Economic Theory of Auctions. Cheltenham, UK: Edward Elgar. Klemperer, P. D. (2002a), “What Really Matters in Auction Design,” Journal of Economic Perspectives, 16(1), 169–189. Klemperer, P. D. (2002b), “How (Not) to Run Auctions: The European 3G Telecom Auctions,” European Economic Review, 46(4–5), 829–845. Klemperer, P. D. (2002c), “Using and Abusing Economic Theory,” 2002 Marshall Lecture to the European Economic Association. Forthcoming at www.paulklemperer.org. Klemperer, P. D. (2002d), “Some Observations on the British 3G Telecom Auction,” ifo Studien, 48(1), forthcoming, and at www.paulklemperer.org. Klemperer, P. D. (2002e), “Some Observations on the German 3G Telecom Auction,” ifo Studien, 48(1), forthcoming, and at www.paulklemperer.org. Klemperer, P. D. and M. A. Meyer (1986), “Price Competition vs. Quantity Competition: The Role of Uncertainty,” Rand Journal of Economics, 17(4), 618–638. Klemperer, P. D. and M. A. Meyer (1989), “Supply Function Equilibria in Oligopoly under Uncertainty,” Econometrica, 57, 1243–1277. Klemperer, P. D. and M. Pagnozzi (2002), “Advantaged Bidders and Spectrum Prices: An Empirical Analysis,” Discussion Paper, Nuffield College, Oxford University, available at www.paulklemperer.org. Kremer, I. (2000), “Information Aggregation in Common Value Auctions,” Working Paper, Northwestern University. Kühn, K.-U. and X. Vives (1994), “Information Exchanges among Firms and Their Impact on Competition,” Working Paper, Institut d’Anàlisi Econòmica (CSIC) Barcelona. Laffont, J.-J. (1997), “Game Theory and Empirical Economics: The Case of Auction Data,” European Economic Review, 41, 1–35. Lee, H. G. (1998), “Do Electronic Marketplaces Lower the Price of Goods?” Communications of the ACM, 41, 73–80. Lee, H. G., J. C. Westland, and S. Hong (1999), “The Impact of Electronic Marketplaces on Product Prices: An Empirical Study of AUCNET,” International Journal of Electronic Commerce, 4-2, 45–60. Maskin, E. S. (1992), “Auctions and Privatization,” in Privatization: Symposium in Honour of Herbert Giersch (ed. by H. Siebert), Tübingen: Mohr, 115–136. Matthews, S. A. (1984), “Information Acquisition in Discriminatory Auctions,” in Bayesian Models in Economic Theory (ed. by M. Boyer and R. E. Kihlstrom), New York: North-Holland, 181–207. Maynard Smith, J. (1974), “The Theory of Games and the Evolution of Animal Conflicts,” Journal of Theoretical Biology, 47, 209–219. McAdams, D. (1998), “Adjustable Supply and ‘Collusive-Seeming Equilibria’ in the Uniform-Price Share Auction,” Working Paper, Stanford University. McAfee, R. P. and J. McMillan (1996), “Analyzing the Airwaves Auction,” Journal of Economic Perspectives, 10, 159–175. McMillan, J. (1994), “Selling Spectrum Rights,” Journal of Economic Perspectives, 8, 145–162.


Menezes, F. (1996), “Multiple-Unit English Auctions,” European Journal of Political Economy, 12, 671–684. Menezes, F., P. K. Monteiro, and A. Temimi (2000), “Discrete Public Goods with Incomplete Information,” Working Paper, EPGE/FGV. Milgrom, P. R. (1979), “A Convergence Theorem for Competitive Bidding with Differential Information,” Econometrica, 47, 679–688. Milgrom, P. R. (1981), “Rational Expectations, Information Acquisition, and Competitive Bidding,” Econometrica, 49, 921–943. Milgrom, P. R. (1985), “The Economics of Competitive Bidding: A Selective Survey,” in Social Goals and Social Organization: Essays in Memory of Elisha Pazner, (ed. by L. Hurwicz, D. Schmeidler, and H. Sonnenschein), Cambridge: Cambridge University Press. Milgrom, P. R. (1987), “Auction Theory,” in Advances in Economic Theory–Fifth World Congress, (ed. by T. F. Bewley), Cambridge:Cambridge University Press. Milgrom, P. R. (forthcoming), Putting Auction Theory to Work. Cambridge:Cambridge University Press. Milgrom, P. R. and R. J. Weber (1982), “A Theory of Auctions and Competitive Bidding,” Econometrica, 50, 1089–1122. Mussa, M. and S. Rosen (1978), “Monopoly and Product Quality,” Journal of Economic Theory, 18, 301–317. Myerson, R. B. (1981), “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73. Newbery, D. M. (1998), “Competition, Contracts, and Entry in the Electricity Spot Market,” Rand Journal of Economics, 29(4), 726–749. Nyborg, K. and S. Sundaresan (1996), “Discriminatory Versus Uniform Treasury Auctions: Evidence from When-Issued Transactions,” Journal of Financial Economics, 42, 63–104. Ofﬁce of Gas and Electricity Markets (1999), The New Electricity Trading Arrangements, July, available at www.open.gov.uk/offer/reta.htm. Ortega-Reichert, A. (1968), Models for Competitive Bidding under Uncertainty. Stanford University Ph.D. Thesis (and Technical Report No. 8, Department of Operations Research, Stanford University). [Chapter 8 reprinted with foreword by S. A. 
Board and P. D. Klemperer, in P. D. Klemperer (Ed.) (2000), The Economic Theory of Auctions, Cheltenham, UK: Edward Elgar.] Persico, N. (2000), “Games of Redistribution Politics are Equivalent to All-Pay Auctions with Consolation Prizes,” Working Paper, University of Pennsylvania. Pesendorfer, W. and J. M. Swinkels (1997), “The Loser’s Curse and Information Aggregation in Common Value Auctions,” Econometrica, 65, 1247–1281. Pesendorfer, W. and J. M. Swinkels (2000), “Efﬁciency and Information Aggregation in Auctions,” American Economic Review, 90(3), 499–525. Plott, C. (1997), “Laboratory Experimental Testbeds: Application to the PCS Auction,” Journal of Economics and Management Strategy, 6(3), 605–638. Radiocommunications Agency (1998a), “UMTS Auction Design.” UMTS Auction Consultative Group Report, 98, 14, available at www.spectrumauctions.gov.uk. Radiocommunications Agency. (1998b), “UMTS Auction Design 2.” UMTS Auction Consultative Group Report, 98, 16, available at www.spectrumauctions.gov.uk. Riley, J. G. (1980), “Strong Evolutionary Equilibrium and the War of Attrition,” Journal of Theoretical Biology, 82, 383–400. Riley, J. G. and W. F. Samuelson (1981), “Optimal Auctions,” American Economic Review, 71, 381–392.


Robinson, M. S. (1985), “Collusion and the Choice of Auction,” Rand Journal of Economics, 16, 141–145. Rosenthal, R. W. (1980), “A Model in Which an Increase in the Number of Sellers Leads to a Higher Price,” Econometrica, 48(6), 1575–1579. Rothkopf, M. H. (1969), “A Model of Rational Competitive Bidding,” Management Science, 15, 362–373. Salant, D. (1997), “Up in the Air: GTE’s Experience in the MTA Auction for Personal Communication Services Licenses,” Journal of Economics and Management Strategy, 6(3), 549–572. Scott Morton, F., F. Zettelmeyer, and J. Silva Risso (2001), “Internet Car Retailing,” Working Paper, Yale University. Spence, M. A. (1972), “Market Signalling: The Informational Structure of Job Markets and Related Phenomena,” Ph.D. Thesis, Harvard University. Spulber, D. F. (1995), “Bertrand Competition When Rivals’ Costs Are Unknown, ” Journal of Industrial Economics, 43, 1–12. Stevens, M. (1994), “Labour Contracts and Efﬁciency in On-the-Job Training,” Economic Journal, March, 104(423), 408–419. Stevens, M. (2000), “Reconciling Theoretical and Empirical Human Capital Earnings Functions,” Working Paper, Nufﬁeld College, Oxford University. Stigler, G. J. (1964), “A Theory of Oligopoly,” Journal of Political Economy, 72, 44–61. Swinkels, J. M. (2001), “Efﬁciency of Large Private Value Auctions,” Econometrica, 69 37–68. Vickrey, W. (1961), “Counterspeculation, Auctions, and Competitive Sealed Tenders,” Journal of Finance, 16, 8–37. Vickrey, W. (1962), “Auction and Bidding Games,” in Recent Advances in Game Theory, Princeton, NJ: The Princeton University Conference, 15–27. Vickrey, W. (1976), “Auctions Markets and Optimum Allocations,” in Bidding and Auctioning for Procurement and Allocation: Studies in Game Theory and Mathematical Economics, (ed. by Y. Amihud), New York: New York University Press, 13–20. Vives, X. 
(1999), “Information Aggregation, Strategic Behavior, and Efﬁciency in Cournot Markets,” Discussion Paper, Institut d’An`alisi Econ`omica (CSIC, Barcelona). Weber, R. J. (1997), “Making More from Less: Strategic Demand Reduction in the FCC Spectrum Auctions,” Journal of Economics and Management Strategy, 6(3), 529–548. Wilson, R. (1967), “Competitive Bidding with Asymmetric Information,” Management Science, 13, A816–A820. Wilson, R. (1969), “Competitive Bidding with Disparate Information,” Management Science, 15, 446–448. Wilson, R. (1977), “A Bidding Model of Perfect Competition,” Review of Economic Studies, 44, 511–518. Wilson, R. (1979), “Auctions of Shares,” Quarterly Journal of Economics, 93, 675–689. Wolfram, C. D. (1998), “Strategic Bidding in a Multiunit Auction: An Empirical Analysis of Bids to Supply Electricity in England and Wales,” Rand Journal of Economics, 29(4), 703–725. Wolfram, C. D. (1999), “Measuring Duopoly Power in the British Electricity Spot Market,” American Economic Review, 89, 805–826. Zheng, C. (1999), “High Bids and Broke Winners,” mimeo, University of Minnesota.

CHAPTER 3

Global Games: Theory and Applications

Stephen Morris and Hyun Song Shin

1. INTRODUCTION

Many economic problems are naturally modeled as a game of incomplete information, where a player’s payoff depends on his own action, the actions of others, and some unknown economic fundamentals. For example, many accounts of currency attacks, bank runs, and liquidity crises give a central role to players’ uncertainty about other players’ actions. Because other players’ actions in such situations are motivated by their beliefs, the decision maker must take account of the beliefs held by other players. We know from the classic contribution of Harsanyi (1967–1968) that rational behavior in such environments not only depends on economic agents’ beliefs about economic fundamentals, but also depends on beliefs of higher-order – i.e., players’ beliefs about other players’ beliefs, players’ beliefs about other players’ beliefs about other players’ beliefs, and so on. Indeed, Mertens and Zamir (1985) have shown how one can give a complete description of the “type” of a player in an incomplete information game in terms of a full hierarchy of beliefs at all levels. In principle, optimal strategic behavior should be analyzed in the space of all possible infinite hierarchies of beliefs; however, such analysis is highly complex for players and analysts alike and is likely to prove intractable in general. It is therefore useful to identify strategic environments with incomplete information that are rich enough to capture the important role of higher-order beliefs in economic settings, but simple enough to allow tractable analysis. Global games, first studied by Carlsson and van Damme (1993a), represent one such environment. Uncertain economic fundamentals are summarized by a state θ and each player observes a different signal of the state with a small amount of noise. Assuming that the noise technology is common knowledge among the players, each player’s signal generates beliefs about fundamentals, beliefs about other players’ beliefs about fundamentals, and so on.
Our purpose in this paper is to describe how such models work, how global game reasoning can be applied to economic problems, and how this analysis relates to more general analysis of higher-order beliefs in strategic settings.

Global Games


One theme that emerges is that taking higher-order beliefs seriously does not require extremely sophisticated reasoning on the part of players. In Section 2, we present a benchmark result for binary action continuum player games with strategic complementarities where each player has the same payoff function. In a global games setting, there is a unique equilibrium where each player chooses the action that is a best response to a uniform belief over the proportion of his opponents choosing each action. Thus, when faced with some information concerning the underlying state of the world, the prescription for each player is to hypothesize that the proportion of other players who will opt for a particular action is a random variable that is uniformly distributed over the unit interval and choose the best action under these circumstances. We dub such beliefs (and the actions that they elicit) as being Laplacian, following Laplace’s (1824) suggestion that one should apply a uniform prior to unknown events from the “principle of insufﬁcient reason.” A striking feature of this conclusion is that it reconciles Harsanyi’s fully rational view of optimal behavior in incomplete information settings with the dissenting view of Kadane and Larkey (1982) and others that rational behavior in games should imply only that each player chooses an optimal action in the light of his subjective beliefs about others’ behavior, without deducing his subjective beliefs as part of the theory. If we let those subjective beliefs be the agnostic Laplacian prior, then there is no contradiction with Harsanyi’s view that players should deduce rational beliefs about others’ behavior in incomplete information settings. The importance of such analysis is not that we have an adequate account of the subtle reasoning undertaken by the players in the game; it clearly does not do justice to the reasoning inherent in the Harsanyi program. 
Rather, its importance lies in the fact that we have access to a form of short-cut, or heuristic device, that allows the economist to identify the actual outcomes in such games, and thereby open up the possibility of systematic analysis of economic questions that may otherwise appear to be intractable. One instance of this can be found in the debate concerning self-fulfilling beliefs and multiple equilibria. If one set of beliefs motivates actions that bring about the state of affairs envisaged in those beliefs, while another set of self-fulfilling beliefs brings about quite different outcomes, then there is an apparent indeterminacy in the theory. In both cases, the beliefs are logically coherent, consistent with the known features of the economy, and are borne out by subsequent events. However, we do not have any guidance on which outcome will transpire without an account of how the initial beliefs are determined. We have argued elsewhere (Morris and Shin, 2000) that the apparent indeterminacy of beliefs in many models with multiple equilibria can be seen as the consequence of two modeling assumptions introduced to simplify the theory. First, the economic fundamentals are assumed to be common knowledge. Second, economic agents are assumed to be certain about others’ behavior in equilibrium. Both assumptions are made for the sake of tractability, but they do much more besides.


Morris and Shin

They allow agents’ actions and beliefs to be perfectly coordinated in a way that invites multiplicity of equilibria. In contrast, global games allow theorists to model information in a more realistic way, and thereby escape this straitjacket. More importantly, through the heuristic device of Laplacian actions, global games allow modelers to pin down which set of self-fulﬁlling beliefs will prevail in equilibrium. As well as any theoretical satisfaction at identifying a unique outcome in a game, there are more substantial issues at stake. Global games allow us to capture the idea that economic agents may be pushed into taking a particular action because of their belief that others are taking such actions. Thus, inefﬁcient outcomes may be forced on the agents by the external circumstances even though they would all be better off if everyone refrained from such actions. Bank runs and ﬁnancial crises are prime examples of such cases. We can draw the important distinction between whether there can be inefﬁcient equilibrium outcomes and whether there is a unique outcome in equilibrium. Global games, therefore, are of more than purely theoretical interest. They allow more enlightened debate on substantial economic questions. In Section 2.3, we discuss applications that model economic problems using global games. Global games open up other interesting avenues of investigation. One of them is the importance of public information in contexts where there is an element of coordination between the players. There is plentiful anecdotal evidence from a variety of contexts that public information has an apparently disproportionate impact relative to private information. Financial markets apparently “overreact” to announcements from central bankers that merely state the obvious, or reafﬁrm widely known policy stances. But a closer look at this phenomenon with the beneﬁt of the insights given by global games makes such instances less mysterious. 
If market participants are concerned about the reaction of other participants to the news, the public nature of the news conveys more information than simply the “face value” of the announcement. It conveys important strategic information on the likely beliefs of other market participants. In this case, the “overreaction” would be entirely rational and determined by the type of equilibrium logic inherent in a game of incomplete information. In Section 3, these issues are developed more systematically. Global games can be seen as a particular instance of equilibrium selection through perturbations. The set of perturbations is especially rich because it turns out that they allow for a rich structure of higher-order beliefs. In Section 4, we delve somewhat deeper into the properties of general global games, not merely those whose action sets are binary. We discuss how global games are related to other notions of equilibrium refinements and what the nature of the perturbation implicit in global games is. The general framework allows us to disentangle two properties of global games. The first property is that a unique outcome is selected in the game. A second, more subtle, question is how such a unique outcome depends on the underlying information structure and the noise in the players’ signals. Although in some cases the outcome is sensitive to the details of the information structure, there are cases where a particular outcome is selected and where this outcome turns out to be robust to the form of the noise in the players’ signals. The theory of “robustness to incomplete information” as developed by Kajii and Morris (1997) holds the key to this property. We also discuss a larger theoretical literature on higher-order beliefs and the relation to global games. In Section 5, we show how recent work on local interaction games and dynamic games with payoff shocks uses a similar logic to global games in reaching unique predictions.

2. SYMMETRIC BINARY ACTION GLOBAL GAMES

2.1. Linear Example

Let us begin with the following example taken from Carlsson and van Damme (1993a). Two players are deciding whether to invest. There is a safe action (not invest); there is a risky action (invest) that gives a higher payoff if the other player invests. Payoffs are given in Table 3.1:

Table 3.1. Payoffs of leading example

               Invest         NotInvest
Invest         θ, θ           θ − 1, 0
NotInvest      0, θ − 1       0, 0
                                              (2.1)

If there were complete information about θ, there would be three cases to consider:

• If θ > 1, each player has a dominant strategy to invest.
• If θ ∈ [0, 1], there are two pure strategy Nash equilibria: both invest and both not invest.
• If θ < 0, each player has a dominant strategy not to invest.

But there is incomplete information about θ. Player i observes a private signal $x_i = \theta + \varepsilon_i$. Each $\varepsilon_i$ is independently normally distributed with mean 0 and standard deviation σ. We assume that θ is randomly drawn from the real line, with each realization equally likely. This implies that a player observing signal x considers θ to be distributed normally with mean x and standard deviation σ. This in turn implies that he thinks his opponent’s signal $x'$ is normally distributed with mean x and standard deviation $\sqrt{2}\sigma$. The assumption that θ is uniformly distributed on the real line is nonstandard, but presents no technical difficulties. Such “improper priors” (with an infinite mass) are well behaved, as long as we are concerned only with conditional beliefs. See Hartigan (1983) for a discussion of improper priors. We will also see later that an improper prior can be seen as a limiting case either as the prior distribution of θ becomes diffuse or as the standard deviation of the noise σ becomes small.

A strategy is a function specifying an action for each possible private signal; a natural kind of strategy we might consider is one where a player takes the risky action only if he observes a private signal above some cutoff point, k:

$$s(x) = \begin{cases} \text{Invest}, & \text{if } x > k \\ \text{NotInvest}, & \text{if } x \le k. \end{cases}$$

We will refer to this strategy as the switching strategy around k. Now suppose that a player observed signal x and thought that his opponent was following such a “switching” strategy with cutoff point k. His expectation of θ will be x. He will assign probability $\Phi\left(\frac{1}{\sqrt{2}\sigma}(k - x)\right)$ to his opponent observing a signal less than k [where Φ(·) is the c.d.f. of the standard normal distribution]. In particular, if he has observed a signal equal to the cutoff point of his opponent (x = k), he will assign probability 1/2 to his opponent investing. Thus, there will be an equilibrium where both players follow switching strategies with cutoff 1/2. In fact, a switching strategy with cutoff 1/2 is the unique strategy surviving iterated deletion of strictly interim-dominated strategies. To see why,¹ first define b(k) to be the unique value of x solving the equation

$$x - \Phi\left(\frac{k - x}{\sqrt{2}\sigma}\right) = 0. \qquad (2.2)$$

The function b(·) is plotted in Figure 3.1. There is a unique such value because the left-hand side is strictly increasing in x and strictly decreasing in k. These properties also imply that b(·) is strictly increasing. So, if your opponent is following a switching strategy with cutoff k, your best response is to follow a switching strategy with cutoff b(k). We will argue that if a strategy s survives n rounds of iterated deletion of strictly dominated strategies, then

$$s(x) = \begin{cases} \text{Invest}, & \text{if } x > b^{n-1}(1) \\ \text{NotInvest}, & \text{if } x < b^{n-1}(0). \end{cases} \qquad (2.3)$$

Figure 3.1. Function b(k). (Horizontal axis: k, from 0 to 1; vertical axis: b(k), from 0 to 1.)

¹ An alternative argument follows Milgrom and Roberts (1990): if a symmetric game with strategic complementarities has a unique symmetric Nash equilibrium, then the strategy played in that unique Nash equilibrium is also the unique strategy surviving iterated deletion of strictly dominated strategies.
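The bounds appearing in (2.3) can be traced numerically. The following Python sketch is an illustration only and not part of the original analysis: it solves (2.2) for b(k) by bisection and then iterates b starting from the dominance bounds 0 and 1. The function names (`payoff_gain`, `b`, `iterate_bounds`), the bracketing interval [−1, 2], and the value σ = 0.5 are assumptions made for the example.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # c.d.f. of the standard normal distribution


def payoff_gain(x, k, sigma):
    """Left-hand side of (2.2): expected gain from investing at signal x
    when the opponent uses the switching strategy around k."""
    return x - PHI((k - x) / (math.sqrt(2) * sigma))


def b(k, sigma, lo=-1.0, hi=2.0, tol=1e-12):
    """Best-response cutoff b(k): the unique root in x of (2.2).

    The left-hand side is strictly increasing in x, negative at x = -1
    and positive at x = 2, so plain bisection is enough.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if payoff_gain(mid, k, sigma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterate_bounds(sigma, rounds):
    """Return [(b^n(0), b^n(1)) for n = 0, ..., rounds]."""
    lower, upper = 0.0, 1.0  # round-one dominance bounds
    history = [(lower, upper)]
    for _ in range(rounds):
        lower, upper = b(lower, sigma), b(upper, sigma)
        history.append((lower, upper))
    return history


if __name__ == "__main__":
    for n, (lower, upper) in enumerate(iterate_bounds(sigma=0.5, rounds=10)):
        print(n, round(lower, 6), round(upper, 6))
```

In this sketch the lower sequence rises and the upper sequence falls toward the common limit 1/2. A smaller σ makes b(·) steeper near its fixed point, so the sequences converge more slowly, but the limit is unchanged.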

We argue the second clause by induction (the argument for the first clause is symmetric). The claim is true for n = 1, because as we noted previously, NotInvest is a dominant strategy if the expected value of θ is less than 0. Now, suppose the claim is true for arbitrary n. If a player knew that his opponent would choose action NotInvest if he had observed a signal less than $b^{n-1}(0)$, his best response would always be to choose action NotInvest if his signal was less than $b(b^{n-1}(0)) = b^{n}(0)$. Because b(·) is strictly increasing and has a unique fixed point at 1/2, $b^{n}(0)$ and $b^{n}(1)$ both tend to 1/2 as n → ∞. The unique equilibrium has both players investing only if they observe a signal greater than 1/2. In the underlying symmetric payoff complete information game, investing is a risk dominant action (Harsanyi and Selten, 1988), exactly if θ ≥ 1/2; not investing is a risk dominant action exactly if θ ≤ 1/2. The striking feature of this result is that no matter how small σ is, players’ behavior is influenced by the existence of the ex ante possibility that their opponent has a dominant strategy to choose each action.² The probability that either individual invests is

$$1 - \Phi\left(\frac{\frac{1}{2} - \theta}{\sigma}\right);$$

conditional on θ, their investment decisions are independent.

² Thus, a “grain of doubt” concerning the opponent’s behavior has large consequences. This element has been linked by van Damme (1997) to the classic analysis of surprise attacks of Schelling (1960), Chapter 9.

The previous example and analysis are due to Carlsson and van Damme (1993a). There is a many-players analog of this game, whose solution is no more difficult to arrive at. A continuum of players are deciding whether to invest. The payoff to not investing is 0. The payoff to investing is θ − 1 + l, where l is the proportion of other players choosing to invest. The information structure is as before, with each player i observing a private signal $x_i = \theta + \varepsilon_i$, where the $\varepsilon_i$ are normally distributed in the population with mean 0 and standard deviation σ. Also in this case, the unique strategy surviving iterated deletion of strictly dominated strategies has each player investing if they observe a signal above 1/2 and not investing if they observe a signal below 1/2. We will briefly sketch why this is the case.

Consider a player who has observed signal x and thinks that all his opponents are following the “switching” strategy with cutoff point k. As before, his expectation of θ will be x. As before, he will assign probability $\Phi((k - x)/(\sqrt{2}\sigma))$ to any given opponent observing a signal less than k. But, because the realizations of the signals are independent conditional on θ, his expectation of the proportion of players who observe a signal less than k will be exactly equal to the probability he assigns to any one opponent observing a signal less than k. Thus, his expected payoff to investing will be $x - \Phi((k - x)/(\sqrt{2}\sigma))$, as before, and all the previous arguments go through.

This argument shows the importance of keeping track of the layers of beliefs across players, and as such may seem rather daunting from the point of view of an individual player. However, the equilibrium outcome is also consistent with a procedure that places far fewer demands on the capacity of the players, and that seems to be far removed from equilibrium of any kind. This procedure has the following three steps.

• Estimate θ from the signal x.
• Postulate that l is distributed uniformly on the unit interval [0, 1].
• Take the optimal action.

Because the expectation of θ conditional on x is simply x itself, the expected payoff to investing if l is uniformly distributed is x − 1/2, whereas the expected payoff to not investing is zero. Thus, a player following this procedure will choose to invest or not depending on whether x is greater or smaller than 1/2, which is identical to the unique equilibrium strategy previously outlined. The belief summarized in the second bullet point is Laplacian in the sense introduced in the introductory section. It represents a “diffuse” or “agnostic” view on the actions of other players in the game. We see that an apparently naive and simplistic strategy coincides with the equilibrium strategy. This is not an accident. There are good reasons why the Laplacian action is the correct one in this game, and why it turns out to be an approximately optimal action in many binary action global games. The key to understanding this feature is to consider the following question asked by a player in this game.
“My signal has realization x. What is the probability that proportion less than z of my opponents have a signal higher than mine?”

The answer to this question would be especially important if everyone is using the switching strategy around x, since the proportion of players who invest is equal to the proportion whose signal is above x. If the true state is θ, the proportion of players who receive a signal higher than x is given by $1 - \Phi((x - \theta)/\sigma)$. So, this proportion is less than z if the state θ is such that $1 - \Phi((x - \theta)/\sigma) \le z$. That is, when $\theta \le x - \sigma\Phi^{-1}(1 - z)$. The probability of this event conditional on x is

$$\Phi\left(\frac{x - \sigma\Phi^{-1}(1 - z) - x}{\sigma}\right) = z. \qquad (2.4)$$
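The calculation in (2.4) can be checked numerically. The Python sketch below is offered as an illustration under the normal-noise assumptions of this example: it evaluates the probability in (2.4) directly and also simulates the posterior over θ to check that the implied proportion is spread evenly over [0, 1]. The function names, the seed, and the parameter values are hypothetical choices for the demonstration.

```python
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def prob_share_at_most(z, x, sigma):
    """Probability, conditional on own signal x, that the share of
    opponents with a signal above x is at most z (0 < z < 1)."""
    # The share is at most z exactly when theta <= x - sigma * Phi^{-1}(1 - z).
    theta_cut = x - sigma * STD.inv_cdf(1 - z)
    # Under the improper uniform prior, theta given x is N(x, sigma^2).
    return NormalDist(mu=x, sigma=sigma).cdf(theta_cut)


def simulate_shares(x, sigma, draws, seed=0):
    """Draw theta from the posterior N(x, sigma^2) and return, for each
    draw, the share of players whose signal exceeds x."""
    rng = random.Random(seed)
    shares = []
    for _ in range(draws):
        theta = rng.gauss(x, sigma)
        shares.append(1 - STD.cdf((x - theta) / sigma))
    return shares


if __name__ == "__main__":
    for z in (0.1, 0.5, 0.9):
        print(z, prob_share_at_most(z, x=0.3, sigma=0.2))
    shares = simulate_shares(x=0.3, sigma=0.2, draws=20000)
    print(sum(shares) / len(shares))
```

Neither x nor σ affects the answer in this sketch, which is the sense in which the signal carries no information about a player's rank in the population.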


In other words, the cumulative distribution function of z is the identity function, implying that the density of z is uniform over the unit interval. If x is to serve as the switching point of an equilibrium switching strategy, a player must be indifferent between choosing to invest and not to invest given that the proportion who invest is uniformly distributed on [0, 1]. More importantly, even away from the switching point, the optimal action motivated by this belief coincides with the equilibrium action, even though the (Laplacian) belief may not be correct. Away from the switching point, the density of the random variable representing the proportion of players who invest will not be uniform. However, as long as the payoff advantage to investing is increasing in θ, the Laplacian action coincides with the equilibrium action. Thus, the apparently naive procedure outlined by the three bulleted points gives the correct prediction as to what the equilibrium action will be. In the next section, we will show that the lessons drawn from this simple example extend to cover a wide class of binary action global games.

We will focus on the continuum player case in most of this paper. However, as suggested by this example, the qualitative analysis is very similar irrespective of the number of players. In particular, the analysis of the continuum player game with linear payoffs applies equally well to any finite number of players (where each player observes a signal with an independent normal noise term). Independent of the number of players, the cutoff signal in the unique equilibrium is 1/2. However, a distinctive implication of the infinite player case is that the outcome is a deterministic function of the realized state. In particular, once we know the realization of θ, we can calculate exactly the proportion of players who will invest. It is

$$\xi(\theta) = 1 - \Phi\left(\frac{\frac{1}{2} - \theta}{\sigma}\right).$$

With a finite number of players (I), we write $\xi_{\lambda,I}(\theta)$ for the probability that at least proportion λ out of the I players invest when the realized state is θ:

$$\xi_{\lambda,I}(\theta) = \sum_{n \ge \lambda I} \binom{I}{n} \left[\Phi\left(\frac{\frac{1}{2} - \theta}{\sigma}\right)\right]^{I-n} \left[1 - \Phi\left(\frac{\frac{1}{2} - \theta}{\sigma}\right)\right]^{n}.$$

Observe, however, that the many finite player case converges naturally to the continuum model: by the law of large numbers, as I → ∞,

$$\xi_{\lambda,I}(\theta) \to 1 \quad \text{if } \lambda < \xi(\theta)$$

and

$$\xi_{\lambda,I}(\theta) \to 0 \quad \text{if } \lambda > \xi(\theta).$$
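The convergence of the finite player case to the continuum model can be illustrated numerically. The following Python sketch is an illustration only: it evaluates the binomial sum for $\xi_{\lambda,I}(\theta)$ in log space, which is an implementation choice made here to avoid overflow for large I. The function names and the parameter values θ = 0.6, σ = 0.2 are assumptions for the example.

```python
from math import ceil, exp, lgamma, log
from statistics import NormalDist

PHI = NormalDist().cdf  # c.d.f. of the standard normal distribution


def xi(theta, sigma):
    """Continuum case: share of players whose signal exceeds 1/2."""
    return 1 - PHI((0.5 - theta) / sigma)


def xi_finite(lam, n_players, theta, sigma):
    """Probability that at least share lam of n_players invest at state theta.

    Each player invests independently with probability xi(theta, sigma),
    which is assumed to lie strictly between 0 and 1.
    """
    p = xi(theta, sigma)
    first = ceil(lam * n_players)  # smallest n with n >= lam * I
    total = 0.0
    for n in range(first, n_players + 1):
        # log of C(I, n) * p^n * (1 - p)^(I - n)
        log_term = (lgamma(n_players + 1) - lgamma(n + 1)
                    - lgamma(n_players - n + 1)
                    + n * log(p) + (n_players - n) * log(1 - p))
        total += exp(log_term)
    return total


if __name__ == "__main__":
    theta, sigma = 0.6, 0.2
    print("xi(theta) =", xi(theta, sigma))
    for n_players in (10, 100, 2000):
        print(n_players,
              xi_finite(0.6, n_players, theta, sigma),
              xi_finite(0.8, n_players, theta, sigma))
```

With these values ξ(θ) lies between 0.6 and 0.8, so the first column of probabilities climbs toward 1 and the second falls toward 0 as the number of players grows.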


2.2. Symmetric Binary Action Global Games: A General Approach

Let us now take one step in making the argument more general. We deal first with the case where there is a uniform prior on the initial state, and each player’s signal is a sufficient statistic for how much they care about the state (we call this the private values case). In this case, the analysis is especially clean, and it is possible to prove a uniqueness result and characterize the unique equilibrium independent of both the structure and size of the noise in players’ signals. We then show that the analysis can be extended to deal with general priors and payoffs that depend on the realized state.

2.2.1. Continuum Players: Uniform Prior and Private Values

There is a continuum of players. Each player has to choose an action a ∈ {0, 1}. All players have the same payoff function, u : {0, 1} × [0, 1] × ℝ → ℝ, where u(a, l, x) is a player’s payoff if he chooses action a, proportion l of his opponents choose action 1, and his “private signal” is x. Thus, we assume that his payoff is independent of which of his opponents choose action 1. To analyze best responses, it is enough to know the payoff gain from choosing one action rather than the other. Thus, the utility function is parameterized by a function π : [0, 1] × ℝ → ℝ with π(l, x) ≡ u(1, l, x) − u(0, l, x). Formally, we say that an action is the Laplacian action if it is a best response to a uniform prior over the opponents’ choice of action. Thus, action 1 is the Laplacian action at x if

$$\int_{l=0}^{1} u(1, l, x)\,dl > \int_{l=0}^{1} u(0, l, x)\,dl,$$

or, equivalently,

$$\int_{l=0}^{1} \pi(l, x)\,dl > 0;$$

action 0 is the Laplacian action at x if

$$\int_{l=0}^{1} \pi(l, x)\,dl < 0.$$

Generically, a continuum player, symmetric payoff, two-action game will have exactly one Laplacian action. A state θ ∈ ℝ is drawn according to the (improper) uniform density on the real line. Player i observes a private signal $x_i = \theta + \sigma\varepsilon_i$, where σ > 0. The noise terms $\varepsilon_i$ are distributed in the population with continuous density f(·), with support on the real line.³ We note that this density need not be symmetric around the mean, nor even have zero mean. The uniform prior on the real line is “improper” (i.e., has infinite probability mass), but the conditional probabilities are well defined: a player observing signal $x_i$ puts density $(1/\sigma) f((x_i - \theta)/\sigma)$ on state θ (see Hartigan 1983). The example of the previous section fits this setting, where f(·) is the standard normal density and π(l, x) = x + l − 1. We will initially impose five properties on the payoffs:

A1: Action Monotonicity: π(l, θ) is nondecreasing in l.
A2: State Monotonicity: π(l, θ) is nondecreasing in θ.
A3: Strict Laplacian State Monotonicity: There exists a unique θ* solving $\int_{l=0}^{1} \pi(l, \theta^*)\,dl = 0$.
A4: Limit Dominance: There exist $\underline{\theta} \in \mathbb{R}$ and $\bar{\theta} \in \mathbb{R}$, such that [1] π(l, x) < 0 for all l ∈ [0, 1] and $x \le \underline{\theta}$; and [2] π(l, x) > 0 for all l ∈ [0, 1] and $x \ge \bar{\theta}$.
A5: Continuity: $\int_{l=0}^{1} g(l)\,\pi(l, x)\,dl$ is continuous with respect to signal x and density g.

Condition A1 states that the incentive to choose action 1 is increasing in the proportion of other players who choose action 1; thus there are strategic complementarities between players’ actions (Bulow, Geanakoplos, and Klemperer, 1985). Condition A2 states that the incentive to choose action 1 is increasing in the state; thus a player’s optimal action will be increasing in the state, given the opponents’ actions. Condition A3 introduces a further strengthening of A2 to ensure that there is at most one crossing for a player with Laplacian beliefs. Condition A4 requires that action 0 is a dominant strategy for sufficiently low signals, and action 1 is a dominant strategy for sufficiently high signals. Condition A5 is a weak continuity property, where continuity in g is with respect to the weak topology. Note that this condition allows for some discontinuities in payoffs. For example,

$$\pi(l, x) = \begin{cases} 0, & \text{if } l \le x \\ 1, & \text{if } l > x \end{cases}$$

satisfies A5 as for any given x, it is discontinuous at only one value of l.

We denote by $G^*(\sigma)$ this incomplete information game, with the uniform prior and satisfying A1 through A5. A strategy for a player in the incomplete information game is a function s : ℝ → {0, 1}, where s(x) is the action chosen if a player observes signal x. We will be interested in strategy profiles, $s = (s_i)_{i \in [0,1]}$, that form a Bayesian Nash equilibrium of $G^*(\sigma)$. We will show not merely that there is a unique Bayesian Nash equilibrium of the game, but that a unique strategy profile survives iterated deletion of strictly (interim) dominated strategies.

³ With small changes in terminology, the argument will extend to the case where f(·) has support on some bounded interval of the real line.
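The Laplacian action and the cutoff θ* of A3 are straightforward to compute for a given payoff gain function. The Python sketch below is a minimal illustration, assuming a payoff gain π(l, x) that satisfies A1 through A3 and is supplied as an ordinary function. The helper names (`laplacian_gain`, `laplacian_action`, `laplacian_cutoff`), the midpoint rule, and the bisection bracket are choices made for the example and are not taken from the text. The linear payoff gain π(l, x) = x + l − 1 from the previous section is used as the test case.

```python
def laplacian_gain(pi, x, steps=2000):
    """Midpoint-rule approximation of the integral of pi(l, x) over l in [0, 1]."""
    h = 1.0 / steps
    return sum(pi((i + 0.5) * h, x) for i in range(steps)) * h


def laplacian_action(pi, x):
    """Best response to a uniform belief over the share l playing action 1.

    Returns 1 if the integrated gain is positive and 0 otherwise
    (an exact tie is resolved in favor of action 0 in this sketch).
    """
    return 1 if laplacian_gain(pi, x) > 0 else 0


def laplacian_cutoff(pi, lo, hi, tol=1e-9):
    """Bisection for theta* in A3.

    Assumes the integrated gain is negative at lo, positive at hi,
    and crosses zero exactly once in between.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if laplacian_gain(pi, mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linear_gain(l, x):
    """Payoff gain of the linear example: pi(l, x) = x + l - 1."""
    return x + l - 1


if __name__ == "__main__":
    print(laplacian_action(linear_gain, 0.7), laplacian_action(linear_gain, 0.3))
    print(laplacian_cutoff(linear_gain, -1.0, 2.0))
```

For the linear example the integrated gain is x − 1/2, so the computed cutoff should sit at 1/2, matching the equilibrium cutoff derived in the previous section.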


Proposition 2.1. Let θ* be defined as in (A3). The essentially unique strategy surviving iterated deletion of strictly dominated strategies in $G^*(\sigma)$ satisfies s(x) = 0 for all x < θ* and s(x) = 1 for all x > θ*.

The “essential” qualification arises because either action may be played if the private signal is exactly equal to θ*. The key idea of the proof is that, with a uniform prior on θ, observing $x_i$ gives no information to a player on his ranking within the population of signals. Thus, he will have a uniform prior belief over the proportion of players who will observe higher signals.

Proof. Write $\pi^*_\sigma(x, k)$ for the expected payoff gain to choosing action 1 for a player who has observed a signal x and knows that all other players will choose action 0 if they observe signals less than k:

$$\pi^*_\sigma(x, k) \equiv \int_{\theta=-\infty}^{\infty} \frac{1}{\sigma} f\left(\frac{x - \theta}{\sigma}\right) \pi\left(1 - F\left(\frac{k - \theta}{\sigma}\right), x\right) d\theta,$$

where F denotes the cumulative distribution function of the noise density f. First, observe that $\pi^*_\sigma(x, k)$ is continuous in x and k, increasing in x, and decreasing in k, $\pi^*_\sigma(x, k) < 0$ if $x \le \underline{\theta}$ and $\pi^*_\sigma(x, k) > 0$ if $x \ge \bar{\theta}$. We will argue by induction that a strategy survives n rounds of iterated deletion of strictly interim dominated strategies if and only if

$$s(x) = \begin{cases} 0, & \text{if } x < \underline{\xi}_n \\ 1, & \text{if } x > \bar{\xi}_n, \end{cases}$$

where $\underline{\xi}_0 = -\infty$ and $\bar{\xi}_0 = +\infty$, and $\underline{\xi}_n$ and $\bar{\xi}_n$ are defined inductively by

$$\underline{\xi}_{n+1} = \min\{x : \pi^*_\sigma(x, \underline{\xi}_n) = 0\} \quad \text{and} \quad \bar{\xi}_{n+1} = \max\{x : \pi^*_\sigma(x, \bar{\xi}_n) = 0\}.$$

Suppose the claim was true for n. By strategic complementarities, if action 1 were ever to be a best response to a strategy surviving n rounds, it must be a best response to the switching strategy with cutoff $\underline{\xi}_n$; $\underline{\xi}_{n+1}$ is defined to be the lowest signal where this occurs. Similarly, if action 0 were ever to be a best response to a strategy surviving n rounds, it must be a best response to the switching strategy with cutoff $\bar{\xi}_n$; $\bar{\xi}_{n+1}$ is defined to be the highest signal where this occurs.

Now note that $\underline{\xi}_n$ and $\bar{\xi}_n$ are increasing and decreasing sequences, respectively, because $\underline{\xi}_0 = -\infty < \underline{\theta} < \underline{\xi}_1$, $\bar{\xi}_0 = \infty > \bar{\theta} > \bar{\xi}_1$, and $\pi^*_\sigma(x, k)$ is increasing in x and decreasing in k. Thus, $\underline{\xi}_n \to \underline{\xi}$ and $\bar{\xi}_n \to \bar{\xi}$ as n → ∞. The continuity of $\pi^*_\sigma$ and the construction of $\underline{\xi}$ and $\bar{\xi}$ imply that we must have $\pi^*_\sigma(\underline{\xi}, \underline{\xi}) = 0$ and $\pi^*_\sigma(\bar{\xi}, \bar{\xi}) = 0$. Thus, the second step of our proof is to show that θ* is the unique solution to the equation $\pi^*_\sigma(x, x) = 0$.

To see this second step, write $\Psi^*_\sigma(l; x, k)$ for the probability that a player assigns to proportion less than l of the other players observing a signal greater than k, if he has observed signal x. Observe that if the true state is θ, the proportion of players observing a signal greater than k is $1 - F((k - \theta)/\sigma)$. This proportion is less than l if $\theta \le k - \sigma F^{-1}(1 - l)$. So,

$$\Psi^*_\sigma(l; x, k) = \int_{\theta=-\infty}^{k - \sigma F^{-1}(1 - l)} \frac{1}{\sigma} f\left(\frac{x - \theta}{\sigma}\right) d\theta$$
$$= \int_{z = \frac{x - k}{\sigma} + F^{-1}(1 - l)}^{\infty} f(z)\,dz, \quad \text{changing variables to } z = \frac{x - \theta}{\sigma},$$
$$= 1 - F\left(\frac{x - k}{\sigma} + F^{-1}(1 - l)\right). \qquad (2.6)$$

Also observe that if x = k, then $\Psi^*_\sigma(\cdot; x, k)$ is the identity function [i.e., $\Psi^*_\sigma(l; x, k) = l$], so it is the cumulative distribution function of the uniform density. Thus,

$$\pi^*_\sigma(x, x) = \int_{l=0}^{1} \pi(l, x)\,dl.$$

Now by A3, $\pi^*_\sigma(x, x) = 0$ implies x = θ*. ∎

2.2.2. Continuum Players: General Prior and Common Values

Now suppose instead that θ is drawn from a continuously differentiable strictly positive density p(·) on the real line and that a player’s utility depends on the realized state θ, not his signal of θ. Thus, u(a, l, θ) is his payoff if he chooses action a, proportion l of his opponents choose action 1, and the state is θ, and as before, π(l, θ) ≡ u(1, l, θ) − u(0, l, θ). We must also impose two extra technical assumptions.

A4*: Uniform Limit Dominance: There exist $\underline{\theta} \in \mathbb{R}$, $\bar{\theta} \in \mathbb{R}$, and $\varepsilon \in \mathbb{R}_{++}$, such that [1] π(l, θ) ≤ −ε for all l ∈ [0, 1] and $\theta \le \underline{\theta}$; and [2] π(l, θ) > ε for all l ∈ [0, 1] and $\theta \ge \bar{\theta}$.

Property A4* strengthens property A4 by requiring that the payoff gain to choosing action 0 is uniformly positive for sufficiently low values of θ, and the payoff gain to choosing action 1 is uniformly positive for sufficiently high values of θ.

A6: Finite Expectations of Signals: $\int_{z=-\infty}^{\infty} z f(z)\,dz$ is well defined.

Property A6 requires that the distribution of noise is integrable. We will denote by G(σ) this incomplete information game, with prior p(·) and satisfying A1, A2, A3, A4*, A5, and A6.

Proposition 2.2. Let θ* be defined as in A3. For any δ > 0, there exists $\bar{\sigma} > 0$ such that for all $\sigma \le \bar{\sigma}$, if strategy s survives iterated deletion of strictly dominated strategies in the game G(σ), then s(x) = 0 for all x ≤ θ* − δ, and s(x) = 1 for all x ≥ θ* + δ.


We will sketch here why this general prior, common values, game G(σ) becomes like the uniform prior, private values, game $G^*(\sigma)$ as σ becomes small. A more formal proof is relegated to Appendix A. Consider $\Psi_\sigma(l; x, k)$, the probability that a player assigns to proportion less than or equal to l of the other players observing a signal greater than or equal to k, if he has observed signal x:

$$\Psi_\sigma(l; x, k) = \frac{\int_{\theta=-\infty}^{k - \sigma F^{-1}(1 - l)} p(\theta)\, f\left(\frac{x - \theta}{\sigma}\right) d\theta}{\int_{\theta=-\infty}^{\infty} p(\theta)\, f\left(\frac{x - \theta}{\sigma}\right) d\theta} = \frac{\int_{z = \frac{x - k}{\sigma} + F^{-1}(1 - l)}^{\infty} p(x - \sigma z)\, f(z)\,dz}{\int_{z=-\infty}^{\infty} p(x - \sigma z)\, f(z)\,dz},$$

changing variables to $z = (x - \theta)/\sigma$. For small σ, the shape of the prior will not matter and the posterior beliefs over l will depend only on (x − k)/σ, the normalized difference between x and k. Formally, setting κ = (x − k)/σ, we have

$$\Psi_\sigma(l; x, x - \sigma\kappa) = \frac{\int_{z = \kappa + F^{-1}(1 - l)}^{\infty} p(x - \sigma z)\, f(z)\,dz}{\int_{z=-\infty}^{\infty} p(x - \sigma z)\, f(z)\,dz},$$

so that as σ → 0,

$$\Psi_\sigma(l; x, x - \sigma\kappa) \to \int_{z = \kappa + F^{-1}(1 - l)}^{\infty} f(z)\,dz = 1 - F(\kappa + F^{-1}(1 - l)), \qquad (2.7)$$

which is the uniform prior expression (2.6) evaluated at k = x − σκ.
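The limit in (2.7) can be illustrated with a simple numerical integration. The Python sketch below is a rough illustration under assumptions that are not in the text: the noise density f is taken to be standard normal, the prior p is taken to be a standard normal density, and the integrals are approximated by a Riemann sum on a truncated grid. The function names and all parameter values are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # assumed noise distribution: f = STD.pdf, F = STD.cdf


def limit_belief(l, kappa):
    """Right-hand side of (2.7): 1 - F(kappa + F^{-1}(1 - l))."""
    return 1 - STD.cdf(kappa + STD.inv_cdf(1 - l))


def posterior_belief(l, x, kappa, sigma, prior_pdf, grid=20001, z_max=10.0):
    """Riemann-sum approximation of the general prior belief at k = x - sigma * kappa.

    Numerator and denominator share the same grid, so the step size cancels.
    """
    lower = kappa + STD.inv_cdf(1 - l)
    step = 2 * z_max / (grid - 1)
    numerator = denominator = 0.0
    for i in range(grid):
        z = -z_max + i * step
        weight = prior_pdf(x - sigma * z) * STD.pdf(z)
        denominator += weight
        if z >= lower:
            numerator += weight
    return numerator / denominator


if __name__ == "__main__":
    prior = NormalDist(0.0, 1.0).pdf  # assumed prior density p
    print("limit:", limit_belief(0.4, 0.5))
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, posterior_belief(0.4, 0.3, 0.5, sigma, prior))
```

As σ shrinks, the prior is nearly flat over the range of states consistent with the signal, and the computed belief moves toward the uniform prior value.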

In other words, for small σ, posterior beliefs concerning the proportion of opponents choosing each action are almost the same as under a uniform prior. The formal proof of proposition 2.2 presented in Appendix A consists of showing, first, that convergence of posterior beliefs described previously is uniform; and, second, that the small amount of uncertainty about payoffs in the common value case does not affect the analysis sufficiently to matter.

2.2.3. Discussion

The proofs of propositions 2.1 and 2.2 follow the logic of Carlsson and van Damme (1993) and generalize arguments presented in Morris and Shin (1998). The technique of analyzing the uniform prior private values game, and then showing continuity with respect to the general prior, common values game, follows Frankel, Morris, and Pauzner (2000). (This paper is discussed further in Section 4.1.) Carlsson and van Damme (1993b) showed a version of the uniform prior result (proposition 2.1) in the ﬁnite player case (see also Kim, 1996). We brieﬂy discuss the relation to the ﬁnite player case in Appendix B.


How do these propositions make use of the underlying assumptions? First, note that assumptions A1 and A2 represent very strong monotonicity assumptions: A1 requires that each player’s utility function is supermodular in the action proﬁle, whereas A2 requires that each player’s utility function is supermodular in his own action and the state. Vives (1990) showed that the supermodularity property A2 of complete information game payoffs is inherited by the incomplete information game. Thus, the existence of a largest and smallest strategy proﬁle surviving iterated deletion of dominated strategies when payoffs are supermodular, noted by Milgrom and Roberts (1990), can be applied also to the incomplete information game. The ﬁrst step in the proof of proposition 2.1 is a special case of this reasoning, with the state monotonicity assumption A2 implying, in addition, that the largest and smallest equilibria consist of strategies that are monotonic with respect to type (i.e., switching strategies). Once we know that we are interested in monotonic strategies, the very weak assumption A3 is sufﬁcient to ensure the equivalence of the largest and smallest equilibria and thus the uniqueness of equilibrium. Can one dispense with the full force of the supermodular payoffs assumption A1? Unfortunately, as long as A1 is not satisﬁed at the cutoff point θ ∗ [i.e., π(l, θ ∗ ) is decreasing in l over some range], then one can ﬁnd a problematic noise distribution f (·) such that the symmetric switching strategy proﬁle with cutoff point θ ∗ is not an equilibrium, and thus there is no switching strategy equilibrium. To obtain positive results, one must either impose additional restrictions on the noise distribution or relax A1 only away from the cutoff point. We discuss both approaches in turn. 
Athey (2002) provides a general description of how monotone comparative static results can be preserved in stochastic optimization problems, when supermodular payoff conditions are weakened to single crossing properties, but signals are assumed to be sufficiently well behaved (i.e., satisfy a monotone likelihood ratio property). Athey (2001) has used such techniques to prove existence of monotonic pure strategy equilibria in a general class of incomplete information games, using weaker properties on payoffs, but substituting stronger restrictions on signal distribution. We can apply her results to our setting as follows. Consider the following two new assumptions.

A1∗: Action Single Crossing: For each θ ∈ R, there exists l∗ ∈ R ∪ {−∞, ∞} such that π(l, θ) < 0 if l < l∗ and π(l, θ) > 0 if l > l∗.

A7: Monotone Likelihood Ratio Property: If x′ > x, then f(x′ − θ)/f(x − θ) is increasing in θ.

Assumption A1∗ is a significant weakening of assumption A1 to a single crossing property. Assumption A7 is a new restriction on the distribution of the noise. Recall that we earlier made no assumptions on the distribution of the noise. Denote by G(σ) the incomplete information game with a uniform prior satisfying A1∗, A2, A3, A4, A5, and A7.


Lemma 2.3. Let θ∗ be defined as in A3. The game G(σ) has a unique (symmetric) switching strategy equilibrium, with s(x) = 0 for all x < θ∗ and s(x) = 1 for all x > θ∗.

The proof is in Appendix C. An analog of proposition 2.2 could be similarly constructed. Notice that this result does not show the nonexistence of other, nonmonotonic, equilibria. Additional arguments are required to rule out nonmonotonic equilibria. For example, in Goldstein and Pauzner (2000a) – an application to bank runs discussed in the next section – noise is uniformly distributed (and thus satisfies A7) and payoffs satisfy assumption A1∗. They show that (1) there is a unique symmetric switching strategy equilibrium and that (2) there is no other equilibrium. Lemma 2.3 could be used to extend the former result to all noise distributions satisfying the MLRP (assumption A7), but we do not know if the latter result extends beyond the uniform noise distribution.

Proposition 2.1 can also be weakened by allowing assumption A1 to fail away from θ∗. We will report one weakening that is sufficient. Let g(·) and h(·) be densities on the interval [0, 1]; g stochastically dominates h (g ≽ h) if

$$\int_{z=0}^{l} g(z)\,dz \le \int_{z=0}^{l} h(z)\,dz \quad \text{for all } l \in [0, 1].$$

We write ḡ(·) for the uniform density on [0, 1], i.e., ḡ(l) = 1 for all l ∈ [0, 1]. Now consider

A8: There exists θ∗ which solves $\int_{l=0}^{1} \pi(l, \theta^*)\,dl = 0$ such that [1] $\int_{l=0}^{1} g(l)\,\pi(l, x)\,dl \ge 0$ for all x ≥ θ∗ and g ≽ ḡ, with strict inequality if x > θ∗; and [2] $\int_{l=0}^{1} g(l)\,\pi(l, x)\,dl \le 0$ for all x ≤ θ∗ and g ≼ ḡ, with strict inequality if x < θ∗.

We can replace A1–A3 with A8 in propositions 2.1 and 2.2, and all the arguments and results go through. Observe that A1–A3 straightforwardly imply A8. Also, observe that A8 implies that π(l, θ∗) is nondecreasing in l [suppose that l′ > l and π(l′, θ∗) < π(l, θ∗); now start with the uniform distribution ḡ and shift mass from l to l′]. But, A8 allows some failure of A1 away from θ∗.

Propositions 2.1 and 2.2 deliver strong negative conclusions about the efficiency of noncooperative outcomes in global games. In the limit, all players will be choosing action 1 when the state is θ if $\int_{l=0}^{1} \pi(l, \theta)\,dl > 0$. However, it is efficient to choose action 1 at state θ if u(1, 1, θ) > u(0, 0, θ). These conditions will not coincide in general. For example, in the investment example, we had u(1, l, θ) = θ + l − 1, u(0, l, θ) = 0 and thus π(l, θ) = θ + l − 1. So in the limiting equilibrium, both players will be investing if the state θ is at least 1/2, although it is efficient for them to be investing if the state is at least 0. The analysis of the unique noncooperative equilibrium serves as a benchmark describing what will happen in the absence of other considerations. In practice, repeated play or other institutions will often allow players to do better. We will briefly consider what happens in the game if players were allowed to make


cheap talk statements about the signals that they have observed in the investment example (for this exercise, it is most natural to consider a finite player case; we consider the two-player case). The arguments here follow Baliga and Morris (2000). The investment example as formulated has a nongeneric feature, which is that if a player plans not to invest, he is exactly indifferent about which action his opponent will take. To make the problem more interesting, let us perturb the payoffs to remove this tie:

Table 3.2. Payoffs for cheap talk example

             Invest          NotInvest
Invest       θ + δ, θ + δ    θ − 1, δ
NotInvest    δ, θ − 1        0, 0

Thus, each player receives a small payoff δ (which may be positive or negative) if the other player invests, independent of his own action. This change does not influence each player's best responses, and the analysis of this game in the absence of cheap talk is unchanged by the payoff change. But, observe that if δ ≤ 0, there is an equilibrium of the game with cheap talk, where each player truthfully announces his signal, and invests if the (common) expectation of θ conditional on both announcements is greater than −δ (this gives the efficient outcome). On the other hand, if δ > 0, then each player would like to convince the other to invest even if he does not plan to do so. In this case, there cannot be a truth-telling equilibrium where the efficient equilibrium is achieved, although there may be equilibria with some partially revealing cheap talk that improves on the no cheap talk outcome.

2.3. Applications

We now turn to applications of these results and describe models of pricing debt (Morris and Shin, 1999b), currency crises (Morris and Shin, 1998), and bank runs (Goldstein and Pauzner, 2000a).4 Each of these papers makes specific assumptions about the distribution of payoffs and signals. But, if one is interested only in analyzing the limiting behavior as noise about θ becomes small, the results of the previous section imply that we can identify the limiting behavior independently of the prior beliefs and the shape of the noise.5 In each example, we describe one comparative static exercise changing the payoffs of the game, illustrating how changing payoffs has a direct effect on outcomes and an indirect, strategic effect via the impact on the cutoff point of the unique equilibrium. We emphasize that it is also interesting in the applications to study behavior away from the limit; indeed, the focus of the analysis in Morris and Shin (1999b) is on comparative statics away from the limit. More assumptions on the shape of the prior and noise are required in this case. We study behavior away from the limit in Section 3.

4 See Fukao (1994) for an early argument in favor of using global game reasoning in applied settings. Other applications include Karp's (2000) noisy version of Krugman's (1991) multiple equilibrium model of sectoral shifts; Scaramozzino and Vulkan's (1999) noisy model of Shleifer's (1986) multiple equilibrium model of implementation cycles; and Dönges and Heinemann's (2000) model of competition between dealer markets and crossing networks in financial markets.

2.3.1. Pricing Debt

In Morris and Shin (1999b), we consider a simple model of debt pricing. In period 1, a continuum of investors hold collateralized debt that will pay 1 in period 2 if it is rolled over and if an underlying investment project is successful; the debt will pay 0 in period 2 if the project is not successful. If an investor does not roll over his debt, he receives the value of the collateral, κ ∈ (0, 1). The success of the project depends on the proportion of investors who do not roll over and the state of the economy, θ. Specifically, the project is successful if the proportion of investors not rolling over is less than θ/z. Writing 1 for the action “roll over” and 0 for the action “do not roll over,” payoffs can be described as follows:

$$u(1, l, \theta) = \begin{cases} 1, & \text{if } z(1 - l) \le \theta \\ 0, & \text{if } z(1 - l) > \theta, \end{cases} \qquad u(0, l, \theta) = \kappa.$$

So

$$\pi(l, \theta) \equiv u(1, l, \theta) - u(0, l, \theta) = \begin{cases} 1 - \kappa, & \text{if } z(1 - l) \le \theta \\ -\kappa, & \text{if } z(1 - l) > \theta. \end{cases}$$

Now

$$\int_{l=0}^{1} \pi(l, \theta)\,dl = \begin{cases} -\kappa, & \text{if } \theta \le 0 \\ \dfrac{\theta}{z} - \kappa, & \text{if } 0 \le \theta \le z \\ 1 - \kappa, & \text{if } z \le \theta. \end{cases}$$

Thus, θ∗ = zκ. In other words, if private information about θ among the investors is sufficiently accurate, the project will collapse exactly if θ ≤ zκ. We can now ask how debt would be priced ex ante in this model (before anyone observed private signals about θ). Recalling that p(·) is the density of the prior on θ, and writing P(·) for the corresponding cdf, the value of the collateralized debt will be

$$V(\kappa) \equiv \kappa P(z\kappa) + 1 - P(z\kappa) = 1 - (1 - \kappa)P(z\kappa),$$

and

$$\frac{dV}{d\kappa} = P(z\kappa) - z(1 - \kappa)\,p(z\kappa).$$

Thus, increasing the value of collateral has two effects: first, it increases the value of debt in the event of default (the direct effect). But, second, it increases the range of θ at which default occurs (the strategic effect). For small κ, the strategic effect outweighs the direct effect, whereas for large κ, the direct effect outweighs the strategic effect. Figure 3.2 plots V(·) for the case where z = 10 and p(·) is the standard normal density. Morris and Shin (1999b) study the model away from the limit and argue that taking the strategic, or liquidity, effect into account in debt pricing can help explain anomalies in empirical implementation of the standard debt pricing theory of Merton (1974). Brunner and Krahnen (2000) present evidence of the importance of debtor coordination in distressed lending relationships in Germany [see also Chui, Gai, and Haldane (2000) and Hubert and Schäfer (2000)].

[Figure 3.2. Function V(κ). Horizontal axis: κ from 0.0 to 1.0; vertical axis: V(κ) from 0.0 to 1.0.]

5 The model in Goldstein and Pauzner (2000a) fails the action monotonicity property (A1) of the previous section, but they are nonetheless able to prove the uniqueness of a symmetric switching equilibrium, exploiting their assumption that noise terms are distributed uniformly. However, their game satisfies assumptions A1∗ and A2, and therefore whenever there is a unique equilibrium, it must satisfy the Laplacian characterization with the cutoff point θ∗ defined as in A3.
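The debt value formula and its derivative can be checked numerically. The following is a minimal sketch for the case used in Figure 3.2 (z = 10, standard normal prior); the function names are ours and purely illustrative, and the finite set of κ values is an arbitrary choice.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cdf P(x), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x: float) -> float:
    """Standard normal density p(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def debt_value(kappa: float, z: float = 10.0) -> float:
    """V(kappa) = 1 - (1 - kappa) * P(z * kappa)."""
    return 1.0 - (1.0 - kappa) * normal_cdf(z * kappa)


def debt_value_slope(kappa: float, z: float = 10.0) -> float:
    """dV/dkappa = P(z kappa) - z (1 - kappa) p(z kappa).

    The first term is the direct effect, the second the strategic effect.
    """
    return normal_cdf(z * kappa) - z * (1.0 - kappa) * normal_pdf(z * kappa)


# Tabulate V and its slope on a coarse grid of collateral values.
for kappa in (0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0):
    print(f"kappa={kappa:.1f}  V={debt_value(kappa):.4f}  "
          f"dV/dkappa={debt_value_slope(kappa):+.4f}")
```

Under these assumptions the slope is negative at κ = 0 and positive for larger κ, which is the pattern described in the text: the strategic effect dominates for small κ and the direct effect for large κ.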


2.3.2. Currency Crises

In Morris and Shin (1998), a continuum of speculators must decide whether to attack a fixed–exchange rate regime by selling the currency short. Each speculator may short only a unit amount. The current value of the currency is e∗; if the monetary authority does not defend the currency, the currency will float to the shadow rate ζ(θ), where θ is the state of fundamentals. There is a fixed transaction cost t of attacking. This can be interpreted as an actual transaction cost or as the interest rate differential between currencies. The monetary authority defends the currency if the cost of doing so is not too large. Assuming that the costs of defending the currency are increasing in the proportion of speculators who attack and decreasing in the state of fundamentals, there will be some critical proportion of speculators, a(θ), increasing in θ, who must attack in order for a devaluation to occur. Thus, writing 1 for the action “not attack” and 0 for the action “attack,” payoffs can be described as follows:

$$u(1, l, \theta) = 0, \qquad u(0, l, \theta) = \begin{cases} e^* - \zeta(\theta) - t, & \text{if } l \le 1 - a(\theta) \\ -t, & \text{if } l > 1 - a(\theta), \end{cases}$$

where ζ(·) and a(·) are increasing functions, with ζ(θ) ≤ e∗ − t for all θ. Now

$$\pi(l, \theta) = \begin{cases} \zeta(\theta) + t - e^*, & \text{if } l \le 1 - a(\theta) \\ t, & \text{if } l > 1 - a(\theta). \end{cases}$$

If θ were common knowledge, there would be three ranges of parameters. If θ < a⁻¹(0), each player has a dominant strategy to attack. If a⁻¹(0) ≤ θ ≤ a⁻¹(1), then there is an equilibrium where all speculators attack and another equilibrium where all speculators do not attack. If θ > a⁻¹(1), each player has a dominant strategy not to attack. This tripartite division of fundamentals arises in a range of models in the literature on currency crises (see Obstfeld, 1996). However, if θ is observed with noise, we can apply the results of the previous section, because π(l, θ) is weakly increasing in l, and weakly increasing in θ:

$$\int_{l=0}^{1} \pi(l, \theta)\,dl = (1 - a(\theta))(\zeta(\theta) + t - e^*) + a(\theta)\,t = t - (1 - a(\theta))(e^* - \zeta(\theta)).$$

Thus, θ∗ is implicitly defined by (1 − a(θ∗))(e∗ − ζ(θ∗)) = t. Theorem 2 in Morris and Shin (1998) gave an incorrect statement of this condition. We are grateful to Heinemann (2000) for pointing out the error and giving a correct characterization. Again, we will describe one simple comparative statics exercise. Consider a costly ex ante action R for the monetary authority that lowered their costs of defending the currency. For example, R might represent the value of foreign currency reserves or (as in the recent case of Argentina) a line of credit with foreign banks to provide credit in the event of a crisis. Thus, the critical proportion of speculators for which an attack occurs becomes a(θ, R), where a(·) is increasing in R. Now, write θ∗(R) for the unique value of θ solving (1 − a(θ, R))(e∗ − ζ(θ)) = t. The ex ante probability that the currency will collapse is P(θ∗(R)). So, the reduction in the probability of collapse resulting from a marginal increase in R is

$$-p(\theta^*(R))\,\frac{d\theta^*}{dR} = p(\theta^*(R))\,\frac{\dfrac{\partial a}{\partial R}}{\dfrac{\partial a}{\partial \theta} + \dfrac{1 - a(\theta, R)}{e^* - \zeta(\theta)}\,\dfrac{d\zeta}{d\theta}}.$$
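As a numerical illustration of the implicit condition (1 − a(θ, R))(e∗ − ζ(θ)) = t, the sketch below solves for θ∗(R) by bisection. The functional forms for a(θ, R) and ζ(θ) and the parameter values are hypothetical choices of ours, made only so that the example runs; they are not taken from Morris and Shin (1998).

```python
E_STAR = 1.0   # pegged value of the currency e* (hypothetical)
T_COST = 0.1   # transaction cost t (hypothetical)


def zeta(theta: float) -> float:
    """Hypothetical shadow rate: increasing in theta, below e* - t on the
    range of theta searched below."""
    return 0.5 * theta


def a(theta: float, reserves: float) -> float:
    """Hypothetical critical proportion of attackers: increasing in theta
    and in reserves R, clipped to [0, 1]."""
    return min(1.0, max(0.0, theta + reserves))


def attack_gain(theta: float, reserves: float) -> float:
    """(1 - a(theta, R)) * (e* - zeta(theta)) - t; zero at the cutoff."""
    return (1.0 - a(theta, reserves)) * (E_STAR - zeta(theta)) - T_COST


def theta_star(reserves: float, tol: float = 1e-12) -> float:
    """Bisection on [-R, 1 - R], where a rises from 0 to 1 and attack_gain
    falls monotonically from a positive to a negative value."""
    lo, hi = -reserves, 1.0 - reserves
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if attack_gain(mid, reserves) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for reserves in (0.0, 0.1, 0.2, 0.3):
    print(f"R={reserves:.1f}  theta*={theta_star(reserves):.4f}")
```

With these illustrative forms the cutoff θ∗(R) falls as R rises, so the ex ante collapse probability P(θ∗(R)) falls as well, in line with the sign of the expression above.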

This comparative static refers to the limit (as noise becomes very small), and the effect is entirely strategic [i.e., the increased value of R reduces the probability of attack only because it influences speculators' equilibrium strategies (“builds confidence”) and not because the increase in R actually prevents an attack in any relevant contingency]. In Section 4.1, we very briefly discuss Corsetti, Dasgupta, Morris, and Shin (2000), an extension of this model of currency attacks where a large speculator is added to the continuum of small traders [see also Chan and Chiu (2000), Goldstein and Pauzner (2000b), Heinemann and Illing (2000), Hellwig (2000), Marx (2000), Metz (2000), and Morris and Shin (1999a)].

2.3.3. Bank Runs

We describe a model of Goldstein and Pauzner (2000a), who add noise to the classic bank runs model of Diamond and Dybvig (1983). A continuum of depositors (with total deposits normalized to 1) must decide whether to withdraw their money from a bank or not. If the depositors withdraw their money in period 1, they will receive r > 1 (if there are not enough resources to fund all those who try to withdraw, then the remaining cash is divided equally among early withdrawers). Any remaining money earns a total return R(θ) > 0 in period 2 and is divided equally among those who chose to wait until period 2 to withdraw their money. Proportion λ of depositors will have consumption needs only in period 1 and will thus have a dominant strategy to withdraw. We will be concerned with the game among the proportion 1 − λ of depositors who have consumption needs in period 2. Consumers have utility U(y) from consumption y, where the relative risk aversion coefficient of U is strictly greater than 1. They note that if R(θ) were greater than 1 and θ were common knowledge, the ex ante optimal choice of r maximizing

$$\lambda U(r) + (1 - \lambda)\,U\!\left(\frac{1 - \lambda r}{1 - \lambda}\,R(\theta)\right)$$

would be strictly greater than 1. But, if θ is not common knowledge, we have a global game. Writing 1 for the action “withdraw in period 2” and 0 for the action “withdraw in period 1,” and l for the proportion of late consumers who do not withdraw early, the money payoffs in this game can be summarized in Table 3.3:

Table 3.3. Payoffs in bank run game

                         l ≤ (r − 1)/(r(1 − λ))        l ≥ (r − 1)/(r(1 − λ))
Early Withdrawal (0)     (1 − λr)/((1 − λ)(1 − l)r)    r
Late Withdrawal (1)      0                             (r − (r − 1)/(l(1 − λ))) R(θ)

Observe that, if θ is sufﬁciently small [and so R(θ ) is sufﬁciently small], all players have a dominant strategy to withdraw early. Goldstein and Pauzner assume that, if θ is sufﬁciently large, all players have a dominant strategy to withdraw late (a number of natural economic stories could justify this variation in the payoffs). Thus, the payoffs in the game among late consumers are

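$$u(1, l, \theta) = \begin{cases} U(0), & \text{if } l \le \dfrac{r - 1}{r(1 - \lambda)} \\[2ex] U\!\left(\left(r - \dfrac{r - 1}{l(1 - \lambda)}\right) R(\theta)\right), & \text{if } l \ge \dfrac{r - 1}{r(1 - \lambda)} \end{cases}$$

$$u(0, l, \theta) = \begin{cases} U\!\left(\dfrac{1}{1 - l(1 - \lambda)}\right), & \text{if } l \le \dfrac{r - 1}{r(1 - \lambda)} \\[2ex] U(r), & \text{if } l \ge \dfrac{r - 1}{r(1 - \lambda)} \end{cases}$$

so that

$$\pi(l, \theta) = \begin{cases} U(0) - U\!\left(\dfrac{1}{1 - l(1 - \lambda)}\right), & \text{if } l \le \dfrac{r - 1}{r(1 - \lambda)} \\[2ex] U\!\left(\left(r - \dfrac{r - 1}{l(1 - \lambda)}\right) R(\theta)\right) - U(r), & \text{if } l \ge \dfrac{r - 1}{r(1 - \lambda)}. \end{cases}$$

The threshold state θ∗ is implicitly defined by

$$\int_{l=0}^{\frac{r-1}{r(1-\lambda)}} \left[U(0) - U\!\left(\frac{1}{1 - l(1 - \lambda)}\right)\right] dl + \int_{l=\frac{r-1}{r(1-\lambda)}}^{1} \left[U\!\left(\left(r - \frac{r - 1}{l(1 - \lambda)}\right) R(\theta^*)\right) - U(r)\right] dl = 0.$$

The ex ante welfare of consumers as a function of r (as noise goes to zero) is

$$W(r) = P(\theta^*(r))\,U(1) + \int_{\theta=\theta^*(r)}^{\infty} p(\theta)\left[\lambda U(r) + (1 - \lambda)\,U\!\left(\frac{1 - \lambda r}{1 - \lambda}\,R(\theta)\right)\right] d\theta.$$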


There are two effects of increasing r: the direct effect on welfare is the increased value of insurance in the case where there is not a bank run. But, there is also the strategic effect that an increase in r will lower θ∗(r). Morris and Shin (2000) examine a stripped down version of this model, where alternative assumptions on the investment technology and utility functions imply that payoffs reduce to those of the linear example in Section 2.1 [see also Boonprakaikawe and Ghosal (2000), Dasgupta (2000b), Goldstein (2000), and Rochet and Vives (2000)].

3. PUBLIC VERSUS PRIVATE INFORMATION

The analysis so far has all been concerned with behavior when either there is a uniform prior or the noise is very small. In this section, we look at the behavior of the model with large noise and nonuniform priors. There are three reasons for doing this. First, we want to understand how extreme the assumptions required for uniqueness are. We will provide sufficient conditions for uniqueness depending on the relative accuracy of private and public (or prior) signals. Second, away from the limit, prior beliefs play an important role in determining outcomes. In particular, we will see how even with a continuum of players and a unique equilibrium, public information contained in the prior beliefs plays a significant role in determining outcomes, even controlling for beliefs concerning the fundamentals. Finally, by seeing how and when the model jumps from having one equilibrium to multiple equilibria, it is possible to develop a better intuition for what is driving results.

We return to the linear example of Section 2.1: there is a continuum of players, the payoff to not investing is 0, and the payoff to investing is θ + l − 1, where θ is the state and l is the proportion of the population investing. It may help in following the analysis to recall that, with linear payoffs, the exact number of players is irrelevant in identifying symmetric equilibrium strategies (and we will see that symmetric equilibrium strategies will naturally arise). Thus, the analysis applies equally to a two-player game. Now assume that θ is normally distributed with mean y and standard deviation τ. The mean y is publicly observed. As before, each player observes a private signal xi = θ + εi, where the εi are distributed normally in the population with mean 0 and standard deviation σ. Thus, each player i observes a public signal y ∈ R and a private signal xi ∈ R. To analyze the equilibria of this game, first fix the public signal y. Suppose that a player observed private signal x. His expectation of θ is

$$\bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2}.$$

It is useful to conduct analysis in terms of these posterior expectations of θ. In particular, we may consider a switching strategy of the following form:

$$s(\bar\theta) = \begin{cases} \text{Invest}, & \text{if } \bar\theta > \kappa \\ \text{NotInvest}, & \text{if } \bar\theta \le \kappa. \end{cases}$$


If the standard deviation of players' private signals is sufficiently small relative to the standard deviation of the public signal in the prior, then there is a unique strategy surviving iterated deletion of strictly dominated strategies. Specifically, let

$$\gamma \equiv \gamma(\sigma, \tau) \equiv \frac{\sigma^2}{\tau^4}\left(\frac{\sigma^2 + \tau^2}{\sigma^2 + 2\tau^2}\right).$$

Now we have

Proposition 3.1. The game has a symmetric switching strategy equilibrium with cutoff κ if κ solves the equation

$$\kappa = \Phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right); \tag{3.1}$$

if γ(σ, τ) ≤ 2π, then there is a unique value of κ solving (3.1) and the strategy with that trigger is the essentially unique strategy surviving iterated deletion of strictly dominated strategies; if γ(σ, τ) > 2π, then (for some values of y) there are multiple values of κ solving (3.1) and multiple symmetric switching strategy equilibria.

Figure 3.3 plots the regions in σ²–τ² space where uniqueness holds. In Morris and Shin (2000), we gave a detailed version of the uniqueness part of this result in Appendix A. Here, we sketch the idea.

[Figure 3.3. Parameter range for unique equilibrium. Horizontal axis: τ² from 0.0 to 0.8; vertical axis: σ² from 0 to 5. Curves labeled σ²/τ⁴ = 4π, (σ²/τ⁴)((σ² + τ²)/(σ² + 2τ²)) = 2π, and σ²/τ⁴ = 2π; regions labeled “multiplicity” and “uniqueness.”]

Consider a player who has observed private signal x. By standard properties of the normal distribution

(see DeGroot, 1970), his posterior beliefs about θ would be normal with mean

$$\bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2}$$

and standard deviation

$$\sqrt{\frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}}.$$

He knows that any other player's signal, x′, is equal to θ plus a noise term with mean 0 and standard deviation σ. Thus, he believes that x′ is distributed normally with mean θ̄ and standard deviation

$$\sqrt{\frac{2\sigma^2 \tau^2 + \sigma^4}{\sigma^2 + \tau^2}}.$$

Now suppose he believed that all other players will invest exactly if their expectation of θ is at least κ [i.e., if their private signals x′ satisfy (σ²y + τ²x′)/(σ² + τ²) ≥ κ, or x′ ≥ κ + (σ²/τ²)(κ − y)]. Thus, he assigns probability

$$1 - \Phi\!\left(\frac{\kappa - \bar\theta + \frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2 \tau^2 + \sigma^4}{\sigma^2 + \tau^2}}}\right) \tag{3.2}$$

to any particular opponent investing. But his expectation of the proportion of his opponents investing must be equal to the probability he assigns to any one opponent investing. Thus, (3.2) is also equal to his expectation of the proportion of his opponents investing. Because his payoff to investing is θ + l − 1, his expected payoff to investing is θ̄ plus expression (3.2) minus one, i.e.,

$$v(\bar\theta, \kappa) \equiv \bar\theta - \Phi\!\left(\frac{\kappa - \bar\theta + \frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2 \tau^2 + \sigma^4}{\sigma^2 + \tau^2}}}\right).$$

His payoff to not investing is 0. Because v(θ̄, κ) is increasing in θ̄, we have that there is a symmetric equilibrium with switching point κ exactly if v∗(κ) ≡ v(κ, κ) = 0. But

$$v^*(\kappa) \equiv v(\kappa, \kappa) = \kappa - \Phi\!\left(\frac{\frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2 \tau^2 + \sigma^4}{\sigma^2 + \tau^2}}}\right) = \kappa - \Phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right).$$

Figure 3.4 plots the function v∗(κ) for y = 1/2 and γ = 1,000, 10, 5, and 0.1, respectively.
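The equilibrium condition v∗(κ) = κ − Φ(√γ(κ − y)) = 0 can also be explored numerically. The sketch below, with function names and grid settings that are our own illustrative choices, scans a grid for sign changes of v∗ and refines each one by bisection, so that the number of symmetric switching equilibria can be counted for a given y and γ.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and standard deviation 1


def v_star(kappa: float, y: float, gamma: float) -> float:
    """v*(kappa) = kappa - Phi(sqrt(gamma) * (kappa - y))."""
    return kappa - STD_NORMAL.cdf(math.sqrt(gamma) * (kappa - y))


def equilibrium_cutoffs(y, gamma, lo=-0.5, hi=1.5, steps=4000):
    """Return the zeros of v* on [lo, hi], found by grid scan plus bisection.

    Any zero must lie in [0, 1] because Phi takes values in [0, 1], so the
    default search interval is wide enough.
    """
    roots = []
    width = (hi - lo) / steps
    for i in range(steps):
        a = lo + i * width
        b = lo + (i + 1) * width
        fa, fb = v_star(a, y, gamma), v_star(b, y, gamma)
        if fa == 0.0:
            roots.append(a)          # grid point is itself a zero
        elif fa * fb < 0.0:
            for _ in range(60):      # bisection keeps the sign of fa at a
                mid = 0.5 * (a + b)
                if fa * v_star(mid, y, gamma) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append(0.5 * (a + b))
    return roots


for gamma in (1000.0, 10.0, 5.0, 0.1):
    cutoffs = equilibrium_cutoffs(0.5, gamma)
    print(f"gamma={gamma:g}: {len(cutoffs)} cutoff(s)",
          [round(k, 4) for k in cutoffs])
```

For y = 1/2 this sketch should report three cutoffs when γ exceeds 2π and a single cutoff at κ = 1/2 otherwise, matching the threshold in proposition 3.1.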

[Figure 3.4. Function v∗(κ). Horizontal axis: κ from −0.5 to 1.5; vertical axis: v∗(κ) from −0.5 to 1.0. Curves labeled γ = 1000, γ = 10, γ = 5, and γ = 0.1.]

The intuition for these graphs is the following. If public information is relatively large (i.e., σ ≫ τ and thus γ is large), then a player with posterior expectation κ less than y = 1/2 confidently expects that his opponent will have observed a higher signal, and therefore will be investing. Thus, his expected utility is (about) κ. But, as κ moves above y = 1/2, he rapidly becomes confident that his opponent has observed a lower signal and will not be investing. Thus, his expected utility drops rapidly, around y, to (about) κ − 1. But, if public information is relatively small (i.e., σ ≪ τ and γ is small), then a player with κ not too far above or below y = 1/2 attaches probability (about) 1/2 to his opponent observing a higher signal. Thus, his expected utility is (about) κ − 1/2. We can identify analytically when there is a unique solution. Observe that

$$\frac{dv^*}{d\kappa} = 1 - \sqrt{\gamma}\,\phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right).$$

Recall that φ(x), the density of the standard normal, attains its maximum of 1/√(2π) at x = 0. Thus, if γ ≤ 2π, dv∗/dκ is greater than or equal to zero always, and strictly greater than zero, except when κ = y. So, (3.1) has a unique solution. But, if γ > 2π and y = 1/2, then setting κ = 1/2 solves (3.1), but dv∗/dκ evaluated at κ = 1/2 is negative, so (3.1) has two other solutions. Throughout the remainder of this section, we assume that there is a unique equilibrium [i.e., that γ(σ, τ) ≤ 2π]. Under this assumption, we can invert the equilibrium condition (3.1) to show in (θ̄, y) space what the unique equilibrium

looks like:

$$y = h_\gamma(\bar\theta) = \bar\theta - \frac{1}{\sqrt{\gamma}}\,\Phi^{-1}(\bar\theta). \tag{3.3}$$

[Figure 3.5. Investment takes place above and to the right of the line. Horizontal axis labeled κ, from 0.0 to 0.8; vertical axis: y from −2 to 2. Curves labeled γ = 0.001 and γ = 5; regions labeled “invest” and “not invest.”]

Figure 3.5 plots this for γ = 5 and γ = 1/1,000. The picture has an elementary intuition. If θ̄ < 0, it is optimal to not invest (independent of the public signal). If θ̄ > 1, it is optimal to invest (independent of the public signal). But, if 0 < θ̄ < 1, there is a trade-off. The higher y is (for a given θ̄), the more likely it is that the other player will invest. Thus, if 0 < θ̄ < 1, the player will always invest for sufficiently high y, and not invest for sufficiently low y. This implies in particular that changing y has a larger impact on a player's action than changing his private signal (controlling for the informativeness of the signals). We next turn to examining this “publicity” effect.
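The boundary in equation (3.3) is straightforward to evaluate. The following sketch, whose function names are hypothetical and ours, computes the critical public signal h_γ(θ̄) = θ̄ − Φ⁻¹(θ̄)/√γ and the resulting investment decision, assuming we are in the uniqueness region γ ≤ 2π.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_public_signal(theta_bar: float, gamma: float) -> float:
    """h_gamma(theta_bar) = theta_bar - Phi^{-1}(theta_bar) / sqrt(gamma).

    Defined for 0 < theta_bar < 1, where the player faces a trade-off.
    """
    if not 0.0 < theta_bar < 1.0:
        raise ValueError("theta_bar must lie strictly between 0 and 1")
    return theta_bar - STD_NORMAL.inv_cdf(theta_bar) / math.sqrt(gamma)


def invests(theta_bar: float, y: float, gamma: float) -> bool:
    """Decision in the unique equilibrium, assuming gamma <= 2 * pi."""
    if theta_bar <= 0.0:
        return False   # not investing is optimal whatever the public signal
    if theta_bar >= 1.0:
        return True    # investing is optimal whatever the public signal
    return y > critical_public_signal(theta_bar, gamma)


for theta_bar in (0.2, 0.4, 0.5, 0.6, 0.8):
    print(f"theta_bar={theta_bar:.1f}  "
          f"h_5={critical_public_signal(theta_bar, 5.0):+.4f}  "
          f"h_0.001={critical_public_signal(theta_bar, 0.001):+.4f}")
```

For γ = 1/1,000 the critical public signal moves far from θ̄ as soon as θ̄ departs from 1/2, which corresponds to the nearly vertical line in Figure 3.5.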

3.1. The Publicity Multiplier

To explore the strategic impact of public information, we examine how much a player's private signal must adjust to compensate for a given change in the public signal. Equation (3.1) can be written as

$$\frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} - \Phi\!\left(\sqrt{\gamma}\left(\frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} - y\right)\right) = 0.$$

Totally differentiating with respect to y gives

$$\frac{dx}{dy} = -\frac{\frac{\sigma^2}{\tau^2} + \sqrt{\gamma}\,\phi(\cdot)}{1 - \sqrt{\gamma}\,\phi(\cdot)}.$$

This measures how much the private signal would have to change to compensate for a change in the public signal (and still leave the player indifferent between investing or not investing). We can similarly see how much the private signal would have to change to compensate for a change in the public signal, if there were no strategic effect. Totally differentiating

$$\bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} = k,$$

we obtain

$$\frac{dx}{dy} = -\frac{\sigma^2}{\tau^2}.$$

Define the publicity multiplier as the ratio of these two:

$$\zeta = \frac{1 + \frac{\tau^2}{\sigma^2}\sqrt{\gamma}\,\phi(\cdot)}{1 - \sqrt{\gamma}\,\phi(\cdot)}.$$

Thus, suppose a player's expectation of θ is θ̄ and he has observed the public signal that makes him indifferent between investing and not investing [y = θ̄ − (1/√γ)Φ⁻¹(θ̄)]; the publicity multiplier evaluated at this point will be

$$\zeta = \frac{1 + \frac{\tau^2}{\sigma^2}\sqrt{\gamma}\,\phi\!\left(\Phi^{-1}(\bar\theta)\right)}{1 - \sqrt{\gamma}\,\phi\!\left(\Phi^{-1}(\bar\theta)\right)}.$$

Notice that (for any given σ and τ) the publicity multiplier is maximized when θ̄ = 1/2, and thus the critical public signal y = 1/2. Thus, it is precisely when there is no conflict between private and public signals that the multiplier has its biggest effect. Here, the publicity multiplier equals

$$\zeta^* = \frac{1 + \frac{\tau^2}{\sigma^2}\sqrt{\frac{\gamma}{2\pi}}}{1 - \sqrt{\frac{\gamma}{2\pi}}}.$$

Notice that, when γ is small (i.e., σ/τ² is small), the publicity multiplier is very small. The multiplier is biggest just before we hit the multiplicity zone of the parameter space (i.e., when γ ≈ 2π).

There is plentiful anecdotal evidence that in settings where coordination is important, public signals play a role in coordinating outcomes that exceeds the information content of those announcements. For example, financial markets apparently “overreact” to announcements from the Federal Reserve Board and public announcements in general. If market participants are concerned about the reaction of other participants to the news, the “overreaction” may be rational and determined by the type of equilibrium logic of our example. Further evidence for this is briefings on market conditions by key players in financial markets using conference calls with hundreds of participants. Such public briefings have a larger impact on the market than bilateral briefings with the same information, because they automatically convey to participants not only information about market conditions, but also valuable information about the beliefs of the other participants.


Urban renewal also has a coordination aspect. Private firms' incentives to invest in a run-down neighborhood depend partly on exogenous characteristics of the neighborhood, but they also depend to a great extent on whether other firms are investing. A well-publicized investment in the neighborhood might be expected to have an apparently disproportionate effect on the probability of ending in the good equilibrium. The willingness of public authorities to subsidize football stadiums and conference centers is consistent with this view. An indirect econometric test of the publicity effect is performed by Chwe (1998). Chwe observes that the per viewer price of advertising during the Super Bowl is exceptionally high (i.e., the price of advertising increases more than linearly in the number of viewers). The premium price is explained by the fact that any information conveyed by those advertisements becomes not merely known to the wide audience, but also common knowledge among them. The value of this common knowledge to advertisers should depend on whether there is a significant coordination problem in consumers' decisions whether to purchase the product. Chwe makes some plausible ex ante guesses about when coordination is an important issue because of network externalities (e.g., the Apple Macintosh) or social consumption (e.g., beer) and when it is not (e.g., batteries). He then confirms econometrically that it is the advertisers of coordination goods who pay a premium for large audiences. In Morris and Shin (1999b), we use the publicity effect to explain an anomaly in the pricing of debt. Empirically, the option pricing model of debt due to Merton (1974) underestimates the yield on debt (i.e., underestimates the empirical default rate). This deviation from theory is largest for low-grade (high-risk) bonds. 
A deterioration in public signals for low-grade bonds generates a large publicity effect: the deterioration makes investors more pessimistic about default for any given strategies of the other players, but, more importantly, the deterioration makes investors more pessimistic about other players’ strategies.

3.2. Limiting Behavior

If we increase the precision of public signals, while holding the precision of private signals fixed (i.e., let τ → 0 for fixed σ), then we clearly exit the unique equilibrium zone.6 If we increase the precision of private signals, while holding the precision of public signals fixed (i.e., let σ → 0 for fixed τ), then we return to the uniform prior setting of Section 2.1. But we can also examine what happens to the unique equilibrium as the precision of both signals increases in such a way that uniqueness is maintained. Specifically, let τ → 0 and let σ² → cτ⁴, where c < 4π. In this case,

\[
\gamma(\sigma,\tau) = \frac{\sigma^{2}}{\tau^{4}}\,\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}+2\tau^{2}}
\;\to\; \frac{c\tau^{4}}{\tau^{4}}\,\frac{c\tau^{4}+\tau^{2}}{c\tau^{4}+2\tau^{2}}
\;\to\; \frac{c}{2} < 2\pi .
\]

Thus

\[
h_{\gamma(\sigma,\tau)}(\bar{\theta}) \;\to\; \bar{\theta} - \sqrt{\frac{2}{c}}\;\Phi^{-1}(\bar{\theta}).
\]

6 For sufficiently small τ, either action is rationalizable as long as y ∈ (0, 1) and θ ∈ (0, 1). If either θ ≥ 1 or θ > 0 and y ≥ 1, then only investing is rationalizable. If either θ ≤ 0 or θ < 1 and y ≤ 0, then only not investing is rationalizable.

This result says that, even though the public signal becomes irrelevant to a player’s expected value of θ in the limit, it continues to have a large impact on the outcome. For example, suppose c = 1 and y = 1/3 (i.e., public information looks bad). Each player will invest only if $\bar{\theta}$ ≥ 0.7 (i.e., they will be very conservative). This is true even as they ignore y (i.e., $\bar{\theta}$ → x).

The intuition for this result is the following. Suppose public information looks bad (y < 1/2). If each player’s private information is much more accurate than the public signal, each player will mostly ignore the public signal in forming his own expectation of θ. But each will nonetheless expect the other to have observed a somewhat worse signal than themselves. This pessimism about the other’s signal makes it very hard to support an investment equilibrium.
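As a numerical check on the limit of γ described in this subsection, the short sketch below evaluates γ(σ, τ) = (σ²/τ⁴)(σ² + τ²)/(σ² + 2τ²) along the path σ² = cτ⁴ and shows it approaching c/2 as τ → 0. The function names are illustrative assumptions, not notation from the text.

```python
from math import pi, sqrt


def gamma(sigma, tau):
    """gamma(sigma, tau) = (sigma^2 / tau^4) * (sigma^2 + tau^2) / (sigma^2 + 2 tau^2)."""
    s2, t2 = sigma ** 2, tau ** 2
    return (s2 / t2 ** 2) * (s2 + t2) / (s2 + 2.0 * t2)


def gamma_along_path(c, tau):
    """Evaluate gamma on the path sigma^2 = c * tau^4 (hypothetical helper)."""
    sigma = sqrt(c) * tau ** 2
    return gamma(sigma, tau)


if __name__ == "__main__":
    c = 1.0
    for tau in (0.5, 0.1, 0.01, 0.001):
        # The printed values approach c / 2, which stays below 2*pi when c < 4*pi.
        print(tau, gamma_along_path(c, tau), gamma_along_path(c, tau) < 2.0 * pi)
```

Because c < 4π keeps the limit c/2 below 2π, the uniqueness condition continues to hold along the whole path for small τ.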

3.3. Sufficient Conditions for Uniqueness

We derived a very simple necessary and sufficient condition for uniqueness in the linear example, depending only on the precision of public and private signals. In this section, we briefly demonstrate that a similar sufficient condition works for general payoff functions. In particular, we will show that there is always a unique equilibrium if σ²/τ⁴ is sufficiently small.7 We will show this in a simple setting, although the argument can be extended. We maintain the normal distribution assumptions on the prior and signals, but let the payoffs be as in Section 2.2, so that π(l, θ) is the payoff gain from choosing action 1 instead of action 0. Furthermore, we will focus on the continuum player case, where π(l, θ) is differentiable and strictly increasing in l and θ, with dπ/dl(l, θ) ≤ K and dπ/dθ(l, θ) ≥ ε for all l and θ. Under these assumptions, we may look at the expected gain to choosing action 1 rather than action 0 if your expectation of θ is $\bar{\theta}$ and you think that others follow a switching strategy at κ:

\[
\begin{aligned}
V(\bar{\theta},\kappa)
&= \int_{\theta=-\infty}^{\infty}
\sqrt{\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}\tau^{2}}}\;
\phi\!\left(\frac{\theta-\bar{\theta}}{\sqrt{\sigma^{2}\tau^{2}/(\sigma^{2}+\tau^{2})}}\right)
\pi\!\left(1-\Phi\!\left(\frac{\kappa-\theta+\frac{\sigma^{2}}{\tau^{2}}(\kappa-y)}{\sigma}\right),\,\theta\right) d\theta \\
&= \int_{\theta'=-\infty}^{\infty}
\sqrt{\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}\tau^{2}}}\;
\phi\!\left(\frac{\theta'}{\sqrt{\sigma^{2}\tau^{2}/(\sigma^{2}+\tau^{2})}}\right)
\pi\!\left(1-\Phi\!\left(\frac{-\theta'+\kappa-\bar{\theta}+\frac{\sigma^{2}}{\tau^{2}}(\kappa-y)}{\sigma}\right),\,\bar{\theta}+\theta'\right) d\theta' .
\end{aligned}
\]

7 Hellwig (2000) performs a related exercise in a version of our currency attacks model (Morris and Shin, 1998).

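Now to apply our earlier argument for uniqueness, it is enough to show that this expression is increasing in $\bar{\theta}$ and that V(κ, κ) = 0 has a unique solution. The former is clearly true; to show the latter, observe that

\[
V(\kappa,\kappa) = \int_{\theta'=-\infty}^{\infty}
\sqrt{\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}\tau^{2}}}\;
\phi\!\left(\frac{\theta'}{\sqrt{\sigma^{2}\tau^{2}/(\sigma^{2}+\tau^{2})}}\right)
\pi\!\left(1-\Phi\!\left(\frac{-\theta'+\frac{\sigma^{2}}{\tau^{2}}(\kappa-y)}{\sigma}\right),\,\theta'+\kappa\right) d\theta' ,
\]

so

\[
\begin{aligned}
\frac{dV(\kappa,\kappa)}{d\kappa}
&= \int_{\theta'=-\infty}^{\infty}
\sqrt{\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}\tau^{2}}}\;
\phi\!\left(\frac{\theta'}{\sqrt{\sigma^{2}\tau^{2}/(\sigma^{2}+\tau^{2})}}\right)
\left[\frac{d\pi(\cdot)}{d\theta} - \frac{d\pi(\cdot)}{dl}\,\phi(\cdot)\,\frac{\sigma}{\tau^{2}}\right] d\theta' \\
&= \int_{\theta'=-\infty}^{\infty}
\sqrt{\frac{\sigma^{2}+\tau^{2}}{\sigma^{2}\tau^{2}}}\;
\phi\!\left(\frac{\theta'}{\sqrt{\sigma^{2}\tau^{2}/(\sigma^{2}+\tau^{2})}}\right)
\frac{d\pi(\cdot)}{d\theta}
\left[1 - \frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta}\,\phi(\cdot)\,\frac{\sigma}{\tau^{2}}\right] d\theta' .
\end{aligned}
\qquad (3.4)
\]

If this expression is always positive, then there is a unique value of κ solving V(κ, κ) = 0, and the unique strategy surviving iterated deletion of strictly dominated strategies is the switching strategy with that cutoff. Because φ(·) is at most $1/\sqrt{2\pi}$, the expression in square brackets within equation (3.4) is positive as long as

\[
\frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta} < \frac{\tau^{2}\sqrt{2\pi}}{\sigma};
\]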

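since

\[
\frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta} \le \frac{K}{\varepsilon},
\]

this will be true as long as

\[
\frac{K}{\varepsilon} < \frac{\tau^{2}\sqrt{2\pi}}{\sigma},
\]

i.e.,

\[
\frac{\sigma^{2}}{\tau^{4}} < 2\pi \left(\frac{\varepsilon}{K}\right)^{2}.
\]

4. THEORETICAL UNDERPINNINGS

4.1. General Global Games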

All the analysis thus far has dealt with symmetric payoff games. The analysis of Carlsson and van Damme (1993a) in fact provided a remarkably general result for two-player, two-action games, even with asymmetric payoffs. Let the payoffs of a two-player, two-action game be given by Table 3.4:

Table 3.4. Payoffs for general 2 × 2 global game

            1            0
    1    θ1, θ2       θ3, θ4
    0    θ5, θ6       θ7, θ8

Thus, a vector θ ∈ R^8 describes the payoffs of the game. Each player i observes a signal xi = θ + σεi, where the εi are eight-dimensional noise terms. This setup describes an incomplete information game parameterized by σ. Under mild technical assumptions,8 as σ → 0, any sequence of strategy profiles surviving iterated deletion of strictly dominated strategies converges to a unique limit. Moreover, that limit is independent of the distribution of the noise: it has the unique Nash equilibrium of the underlying complete information game being played (if there is one), and it has the risk-dominant Nash equilibrium being played (if there are two strict Nash equilibria).

8 The following technical conditions are sufficient (Carlsson and van Damme’s actual setup is a little more general): payoff vector θ is drawn according to a strictly positive, continuously differentiable, bounded density on R^8; and the noise terms (ε1, ε2) are drawn according to a continuous density with bounded support, independently of θ.

To understand if and when this remarkable result might extend to many players and many action games, it is useful to first observe that there are two independent things being proved here. First, there is a limit uniqueness result. As the noise goes to zero, there is a unique strategy profile surviving iterated deletion of strictly dominated strategies. Given that with no noise we know that there are multiple equilibria, this is a striking result by itself. Second, there is a noise-independent selection result. We can characterize behavior in that unique limit as a function of the complete information payoffs in the limit, and thus independently of the shape of the prior beliefs on θ and the distribution of noise. Thus, Carlsson and van Damme’s two-player, two-action analysis combines separate limit uniqueness and noise-independent selection results. Similarly, the results in Section 2 for continuum player, symmetric binary action games simultaneously showed that there was a unique strategy surviving iterated deletion of strictly dominated strategies in the limit (a limit uniqueness result) and characterized behavior in the limit (the Laplacian action) independent of the structure of the noise (a noise-independent selection result).

Frankel, Morris, and Pauzner (2000) (hereafter, FMP) examine global games with many players, asymmetric payoffs, and many actions. They show that a limit uniqueness result holds quite generally, as long as some monotonicity properties are satisfied. They consider the following environment. Each player has an ordered set of actions (finite or continuum); his payoff depends on the action profile played and a payoff parameter θ ∈ R; he observes a signal xi = θ + σεi, where σ > 0, and εi is an independently distributed noise term. For sufficiently low values of θ, each player has a dominant strategy to choose his lowest action, and for sufficiently high values of θ, each player has a dominant strategy to choose his highest action. Each player’s payoffs are supermodular in the action profile, implying that each player’s best response is increasing in others’ actions (for any θ).
Each player’s payoffs are supermodular in his own action and the state, implying that his best response is increasing in the payoff parameter θ (for any given actions of his opponents). Under these substantive assumptions, and additional technical assumptions,9 FMP show a limit uniqueness result. The proof uses the technique, also followed in Section 2.2, of first analyzing the uniform prior, private values game and showing a uniqueness result independent of the size of the noise; and then showing that, if the noise is small, all equilibria of the game with a general prior and common values are close to the unique equilibrium of the uniform prior, private values game.

9 Payoffs are continuous with respect to actions and θ, and there is a Lipschitz bound on the sensitivity of payoffs to changes in own and others’ actions. The state is drawn according to a continuous and positive density, and signals are drawn according to a continuous and positive density with bounded support.

The limit uniqueness result of FMP provides a natural many-player, many-action generalization of Carlsson and van Damme (1993a). It is true that Carlsson and van Damme required no strategic complementarity or other monotonicity properties. But, when a two-player, two-action game has multiple Nash equilibria (the interesting case for Carlsson and van Damme’s analysis), there are automatically strategic complementarities. FMP’s limit uniqueness results could presumably be extended straightforwardly to many-dimensional payoff parameters and signals, if the relevant monotonicity conditions were suitably adjusted.10

Within this class of monotonic global games where limit uniqueness holds, FMP also provide sufficient conditions for noise-independent selection. They generalize the notion of a potential maximizing action, due to Monderer and Shapley (1996). We will discuss these generalized potential conditions in more detail in Section 4.4, because they are also sufficient for the (more demanding) property of being robust to incomplete information. The sufficient conditions for noise-independent selection encompass two classes of games already discussed in this survey: many-player, two-action, symmetric payoff games (where the Laplacian action is played); and two-player, two-action games, with possibly asymmetric payoffs (where the risk dominant equilibrium is played). They also encompass two-player, three-action games with symmetric payoffs. They encompass the minimum effort game of Bryant (1983).11

FMP also provide an example of a two-player, four-action, symmetric payoff game where noise-independent selection fails. Thus, there is a unique limit as the noise goes to zero, but the nature of the limit depends on the exact distribution of the noise. Carlsson (1989) gave a three-player, two-action example in which noise-independent selection failed. Corsetti, Dasgupta, Morris, and Shin (2000) describe a global games model of currency crises, where there is a continuum of small traders and a single large trader. This is thus a many-player, two-action game with asymmetric payoffs. We show that the equilibrium selected as noise goes to zero depends on the relative informativeness of the large and small traders’ signals. This is thus an application where noise-independent selection fails.

We conclude this brief summary by noting one consequence of FMP for the earlier analysis in this paper.
10 The conditions for limit uniqueness in FMP could also presumably be weakened in a number of directions. For example, with additional restrictions on the noise structure, one could perhaps use the monotone comparative statics under uncertainty techniques of Athey (2001, 2002), as in lemma 2.3.

11 Carlsson and Ganslandt (1998) show the potential maximizing action is selected in the minimum effort game when players’ continuous actions are perturbed.

In Section 2.2, it was shown that the Laplacian action was selected in symmetric binary action global games. The argument exploited the fact that players observed signals with iid noise in that class of games. But FMP show noise-independent selection of the Laplacian action independent of the distribution of noise. If the distribution of noise is very different for different players, we surely cannot guarantee that each player has a uniform belief over the proportion of his opponents taking each action. Nonetheless, the Laplacian action must be played in the limit.

We can illustrate this implication with a simple example. Consider a three-player game, with binary action set {0, 1}. The payoff to action 1 is θ if both of the other players choose action 1, θ − z if one other player chooses action 1, and θ − 1 if neither player chooses action 1 (where 0 < z < 1). The payoff to action 0 is zero. State θ is uniformly distributed on the real line. Observe that the Laplacian action is 1 if (1/3)θ + (1/3)(θ − z) + (1/3)(θ − 1) > 0 [i.e., θ > (1/3)(z + 1)]. Let ε1, ε2, and ε3 be i.i.d. with symmetric c.d.f. F(·), let δ be a very small positive number, and let σ be a parameter describing the size of the noise. The players’ signals x1, x2, and x3 are given by

\[
x_{1} = \theta + \sigma\delta\varepsilon_{1}, \qquad
x_{2} = \theta + \sigma\delta\varepsilon_{2}, \qquad
x_{3} = \theta + \sigma\varepsilon_{3}.
\]

Thus, players 1 and 2 observe much more informative signals. We will look for a switching strategy equilibrium, where players 1 and 2 use cutoff $\underline{x}_{\sigma}$ and player 3 uses cutoff $\bar{x}_{\sigma}$. Let

\[
\lambda_{\sigma} = F\!\left(\frac{\bar{x}_{\sigma} - \underline{x}_{\sigma}}{\sigma}\right).
\]

We are interested in what happens in the limit as first we take δ → 0, and then take the limit as σ → 0. As δ becomes very small, if player 1 or 2 observes signal $\underline{x}_{\sigma}$, he will assign probability (about) (1/2)(1 − λσ) to both players choosing action 1, probability (about) 1/2 to one player choosing action 1, and probability (about) (1/2)λσ to neither player choosing action 1; whereas, if player 3 observes signal $\bar{x}_{\sigma}$, he will assign probability λσ to both players choosing action 1, probability 0 to one player choosing action 1, and probability 1 − λσ to neither player choosing action 1. Thus, we must have:

\[
\begin{aligned}
\tfrac{1}{2}(1-\lambda_{\sigma})\,\underline{x}_{\sigma} + \tfrac{1}{2}(\underline{x}_{\sigma} - z) + \tfrac{1}{2}\lambda_{\sigma}(\underline{x}_{\sigma} - 1) &= 0, \\
\lambda_{\sigma}\,\bar{x}_{\sigma} + 0\,(\bar{x}_{\sigma} - z) + (1-\lambda_{\sigma})(\bar{x}_{\sigma} - 1) &= 0.
\end{aligned}
\]

Rearranging gives:

\[
\underline{x}_{\sigma} = \tfrac{1}{2}z + \tfrac{1}{2}\lambda_{\sigma}, \qquad
\bar{x}_{\sigma} = 1 - \lambda_{\sigma}.
\]

As σ → 0, we must have $\underline{x}_{\sigma} \to \bar{x}_{\sigma}$ and thus λσ → (2/3)(1 − z/2) [so, $(\bar{x}_{\sigma} - \underline{x}_{\sigma})/\sigma \to F^{-1}(\lambda_{\sigma})$]. Thus, $\underline{x}_{\sigma}$ and $\bar{x}_{\sigma}$ must both converge to (1/3)(z + 1). But this gives the result that the Laplacian action is played by all players in the limit, independent of the shape of F.
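The three-player example can be checked numerically. The sketch below takes the δ → 0 indifference conditions as given (cutoff z/2 + λ/2 for players 1 and 2, cutoff 1 − λ for player 3), solves the fixed point λ = F((gap between cutoffs)/σ) by bisection for two different noise distributions, and compares both cutoffs with the Laplacian threshold (z + 1)/3. The function names and the choice of normal and logistic noise are illustrative assumptions.

```python
from math import erf, exp, sqrt


def normal_cdf(t):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def logistic_cdf(t):
    """Logistic c.d.f., written to avoid overflow for large |t|."""
    if t >= 0:
        return 1.0 / (1.0 + exp(-t))
    e = exp(t)
    return e / (1.0 + e)


def cutoffs(z, sigma, cdf, iters=200):
    """Solve lambda = F((x_bar - x_low) / sigma) by bisection, where
    x_low = z/2 + lambda/2 (players 1 and 2) and x_bar = 1 - lambda (player 3).
    Returns (x_low, x_bar, lambda)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        gap = (1.0 - lam) - (0.5 * z + 0.5 * lam)
        # lam - F(gap / sigma) is increasing in lam, so move toward the root.
        if lam < cdf(gap / sigma):
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return 0.5 * z + 0.5 * lam, 1.0 - lam, lam


if __name__ == "__main__":
    z = 0.4
    print("Laplacian threshold:", (z + 1.0) / 3.0)
    for name, cdf in (("normal", normal_cdf), ("logistic", logistic_cdf)):
        for sigma in (0.1, 0.01, 0.0001):
            x_low, x_bar, lam = cutoffs(z, sigma, cdf)
            print(name, sigma, x_low, x_bar, lam)
```

For small σ both cutoffs sit close to (z + 1)/3 under either distribution, which is the noise-independence the example is meant to illustrate.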

4.2. Higher-Order Beliefs

In global games, the importance of the noisy observation of the underlying state lies in the fact that it generates strategic uncertainty, that is, uncertainty about others’ behavior in equilibrium. That strategic uncertainty is generated by
players’ uncertainty about other players’ payoffs. Thus, understanding global games involves understanding how equilibria depend on players’ uncertainty about other players’ payoffs. But, clearly, it is not going to be enough to know each player’s beliefs about other players’ payoffs. We must also take into account each player’s beliefs about other players’ beliefs about his payoffs, and further such higher-order beliefs. Players’ payoffs and higher-order beliefs about payoffs are the true primitives of a game of incomplete information, not the asymmetric information structure. In earlier sections, we told an asymmetric information story about how there is a true state of fundamentals θ drawn from some prior and each player observes a signal of θ generated by some technology. But, our analysis of the resulting game implicitly assumes that there is common knowledge of the prior distribution of θ and the signaling technologies. It is hard to defend this assumption literally when the original purpose was to get away from the unrealistic assumption that there is common knowledge of the realization of θ . The classic arguments of Harsanyi (1967–1968) and Mertens and Zamir (1985) tell us that we can assume common knowledge of some state space without loss of generality. But such a common knowledge state space makes sense with an incomplete information interpretation (a player’s “type” is a description of his higher-order beliefs about payoffs), but not with an asymmetric information interpretation (a player’s “type” is a signal drawn according to some ex ante ﬁxed distribution); see Battigalli (1999) and Dekel and Gul (1996) for forceful defenses of this position. Thus, we believe that the noise structures analyzed in global games are interesting because they represent a tractable way of generating a rich structure of higher-order beliefs. 
The analysis of global games represents a natural vehicle to illustrate the power of higher-order beliefs at work in applications.12 But, then, the natural way to understand the “trick” to global games analysis is to go back and understand what is going on in terms of higher-order beliefs.

12 For work on higher-order beliefs not using the global games technology, see Townsend (1983); Allen, Morris, and Postlewaite (1993); Shin (1996); and the discussion of Section 4.1 of Allen and Morris (2000).

Even if one is uninterested in the philosophical distinction between incomplete information and asymmetric information, there is a second reason why the higher-order beliefs literature may contribute to our understanding of global games. Even keeping a pure asymmetric information interpretation, we can calculate (from the prior distribution over θ and the signal technologies) the players’ higher-order beliefs about payoffs. Statements about higher-order beliefs about payoffs turn out to represent a natural mathematical way of characterizing which properties of the prior distribution and signal technologies matter for the results.

The pedagogical risk of emphasizing higher-order beliefs is that readers may conclude that playing in the uniquely rational way in a global game requires fancy powers of reasoning, some kind of hyperrationality that allows them to reason to an arbitrarily high number of levels. We emphasize that the fact that either the analyst or a player expresses information about the game in terms of higher-order beliefs does not make standard equilibrium concepts any less compelling and does not suggest any particular view about how equilibrium behavior might be arrived at. In particular, recall that there is a very simple heuristic that will generate equilibrium behavior in symmetric binary action games. If there is not common knowledge of the environment you are in, you should hold diffuse beliefs about others’ behavior. In particular, if you are on the margin between your two actions, it seems reasonable to take the agnostic view that you are equally likely to hold any rank in the population concerning your evaluation of the desirability of the two actions. Thus, if other people behave like you, you should make your decision on the assumption that the proportion of other players choosing each action is uniformly distributed. This reasoning may sound naive, but it actually generates a very simple heuristic for behavior that is consistent with the unique rational behavior.

In the remainder of this section, we first informally discuss the role of higher-order beliefs in a global game example. Then, we review briefly the theoretical literature on higher-order beliefs in games.13 Finally, we show how results from that literature can be taken back to the analysis of global games.

Monderer and Samet (1989) introduced a natural language for characterizing players’ higher-order beliefs. Fix a probability p ∈ (0, 1]. Let Ω be a set of possible states, and let E be any subset of Ω. The event E is p-believed at state ω among some fixed group of individuals if everyone believes that it is true with probability at least p (and we write $B^{p}E$ for the set of states where event E is p-believed). The event E is common p-belief at state ω if it is p-believed, it is p-believed that it is p-believed, and so on, up to an arbitrary number of levels [and we write $C^{p}(E)$ for the set of states where event E is common p-belief].
The event E is p-evident if whenever it is true, it is p-believed (i.e., $E \subseteq B^{p}E$). Monderer and Samet proved the following result:

Proposition 4.1. Event E is common p-belief at ω [i.e., $\omega \in C^{p}(E)$] if and only if there exists a p-evident event F such that $\omega \in F \subseteq B^{p}E$.

This result provides a fixed-point characterization (i.e., using the p-evident property) of an iterative definition of common p-belief. It thus generalizes Aumann’s classic characterization of common knowledge (Aumann, 1976).

We will illustrate these properties of higher-order beliefs in the global games setting.14 So, consider again the two-player example of Section 2.1: θ is drawn uniformly from the real line and players i = 1, 2 each observe a signal xi = θ + εi, where εi is distributed normally with mean 0 and standard deviation σ. Thus, the relevant state space is R^3, with typical element (θ, x1, x2). Fix the payoff relevant event $E_{k} = \{(\theta, x_{1}, x_{2}) : \theta \ge k\}$; this is the set of states where the true θ is at least k. If player i observes signal xi, he will assign probability $\Phi((x_{i}-k)/\sigma)$ to the event $E_{k}$ being true. Thus, he will assign probability at least p to the event $E_{k}$ exactly if $x_{i} \ge k + \sigma\Phi^{-1}(p)$ (which is at least k when p ≥ 1/2). Thus

\[
B^{p}E_{k} = \{(\theta, x_{1}, x_{2}) : x_{i} \ge k + \sigma\Phi^{-1}(p),\ i = 1, 2\}.
\]

13 Our review of this literature is much abbreviated and highly selective. See Fudenberg and Tirole (1991) Chapter 14; Osborne and Rubinstein (1994) Chapter 5; Geanakoplos (1994); and Dekel and Gul (1996) for more background on this material. Morris and Shin (1997) survey the higher-order beliefs in game theory literature with a focus on the relationship to related literatures in philosophy and computer science. Kajii and Morris (1997c) survey this literature with a focus on the relation to the standard refinements literature in game theory.

14 Monderer and Samet (1989) characterized common p-belief for discrete state spaces, but Kajii and Morris (1997b) show the straightforward extension to continuum state spaces.

Now, if player i observes xi, he assigns probability $\Phi((x_{i}-\kappa)/(\sqrt{2}\sigma))$ to player j observing a signal above κ, and he assigns probability at least p to that event exactly if $x_{i} \ge \kappa + \sqrt{2}\sigma\Phi^{-1}(p)$. In addition, player i knows for sure whether xi is greater than κ. Thus

\[
B^{p}B^{p}E_{k} = \{(\theta, x_{1}, x_{2}) : x_{i} \ge k + \sigma\Phi^{-1}(p) + \max\{0, \sqrt{2}\sigma\Phi^{-1}(p)\},\ \text{for } i = 1, 2\}
\]

and, by induction,

\[
[B^{p}]^{n}E_{k} = \{(\theta, x_{1}, x_{2}) : x_{i} \ge k + \sigma\Phi^{-1}(p) + (n-1)\max\{0, \sqrt{2}\sigma\Phi^{-1}(p)\},\ \text{for } i = 1, 2\}. \qquad (4.1)
\]

So

\[
C^{p}E_{k} = \bigcap_{n \ge 1} [B^{p}]^{n}E_{k} =
\begin{cases}
\emptyset, & \text{if } p > 1/2, \\
\{(\theta, x_{1}, x_{2}) : x_{i} \ge k + \sigma\Phi^{-1}(p),\ \text{for } i = 1, 2\}, & \text{if } p \le 1/2.
\end{cases}
\]

Thus, a remarkable feature of this simple example is that for any p > 1/2, there is never common p-belief that θ is greater than k, for any k. We could also have shown this using the characterization of common p-belief described in proposition 4.1. For any k, event $E_{k}$ is p-evident only if p ≤ 1/2. This is because a player observing signal k will always assign probability 1/2 to his opponent observing a signal less than k. A key property of global games is that they fail to deliver nontrivial common p-belief and p-evident events (for high p). As we will see, the existence of such events is key to supporting multiple equilibria in incomplete information games.

Combining this information structure with the payoffs from the two-player example of Section 2.1, we can illustrate the extreme sensitivity of strategic outcomes to players’ higher-order beliefs. Recall that each player had to choose between not investing (with payoff 0) and investing (with payoff θ if the other player invests, and payoff θ − 1 otherwise). The unique equilibrium involved each player i investing if his signal xi was greater than 1/2 and not otherwise. This result was independent of σ (the scale variable of the noise). Now observe that if

\[
\sigma \le \frac{1}{5\,(1 + (n-1)\sqrt{2})\,\Phi^{-1}(p)},
\]

then [by equation (4.1)] for all θ,

\[
\left(\theta, \tfrac{2}{5}, \tfrac{2}{5}\right) \in [B^{p}]^{n}E_{1/5}.
\]

In words, suppose that each player observed signal 2/5. If we fix any integer n and any p < 1, we may choose σ sufficiently small such that it is p-believed that it is p-believed that (n times) . . . that θ is greater than 1/5. If it was common knowledge that θ was greater than 1/5, it would clearly be rational for both players to invest. But the unique rational behavior has each player not investing.

Rubinstein (1989) used his electronic mail game to illustrate this sensitivity of strategic outcomes to common knowledge. Monderer and Samet (1989) showed why n levels of p-belief or even knowledge was not enough to approximate common knowledge in strategic settings, and common p-belief (i.e., an infinite number of levels) is required. The idea behind this observation is illustrated in the next section. Morris, Rob, and Shin (1995) showed why only some Nash equilibria (e.g., risk-dominated equilibria) were sensitive to higher-order beliefs and not others, and provided a characterization, related to the lack of common p-belief events, of which (discrete state) information systems displayed an extreme sensitivity to higher-order beliefs (see also Sorin, 1998). Kajii and Morris (1997a) introduced a notion of robustness to incomplete information to characterize equilibria that are not sensitive to higher-order beliefs. This work is reviewed and related back to global games in Sections 4.4 and 4.5.
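The signal cutoffs in equation (4.1) are easy to evaluate directly. The sketch below computes the cutoff k + σΦ⁻¹(p) + (n − 1) max{0, √2 σΦ⁻¹(p)} and the bound on σ under which signals of 2/5 lie in the n-th iterate for k = 1/5. It uses Python's standard-library `statistics.NormalDist` for Φ⁻¹; the function names and the illustrative values p = 0.9 and n = 10 are assumptions made here, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def belief_threshold(k, sigma, p, n):
    """Cutoff from equation (4.1): both signals must be at least this value
    for the state to lie in the n-fold iterated p-belief of E_k (n >= 1)."""
    z = NormalDist().inv_cdf(p)  # Phi^{-1}(p)
    return k + sigma * z + (n - 1) * max(0.0, sqrt(2.0) * sigma * z)


def sigma_bound(p, n):
    """Largest sigma for which signals of 2/5 meet the cutoff for k = 1/5.
    Requires p > 1/2 so that Phi^{-1}(p) > 0."""
    z = NormalDist().inv_cdf(p)
    return 1.0 / (5.0 * (1.0 + (n - 1) * sqrt(2.0)) * z)


if __name__ == "__main__":
    p, n = 0.9, 10
    sigma = 0.99 * sigma_bound(p, n)  # slightly inside the bound
    # With this sigma, the cutoff after n levels is still below 2/5 ...
    print(belief_threshold(0.2, sigma, p, n))
    # ... but the cutoff grows without bound in the number of levels,
    # so no signal supports common p-belief when p > 1/2.
    for levels in (10, 100, 1000):
        print(levels, belief_threshold(0.2, sigma, p, levels))
```

For any fixed n the bound can be met by shrinking σ, while for fixed σ the cutoff grows linearly in n; this is the gap between finitely many levels of p-belief and common p-belief that the example turns on.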

4.3. Common p-Belief and Game Theory

Fix a finite set of players 1, . . . , I and a finite action set $A_{i}$ for each player i. A complete information game is then a vector of payoff functions, $g \equiv (g_{1}, \ldots, g_{I})$, where each $g_{i} : A \to R$. A (discrete state) incomplete information game is then a collection $\{\Omega, \pi, (P_{i})_{i=1}^{I}, (u_{i})_{i=1}^{I}\}$, where Ω is a countable state space, $\pi \in \Delta(\Omega)$ is a prior probability on that state space, $P_{i}$ is the partition of the state space of player i, and $u_{i} : A \times \Omega \to R$ is the payoff function of player i. For any given incomplete information game $\{\Omega, \pi, (P_{i})_{i=1}^{I}, (u_{i})_{i=1}^{I}\}$, we may write |g| for the set of states in the incomplete information game where payoffs are given by g. Thus,

\[
|g| = \{\omega \in \Omega \mid u_{i}(a, \omega) = g_{i}(a) \text{ for all } a \in A \text{ and } i = 1, \ldots, I\}.
\]

Using this language, we can summarize some key observations from the theoretical literature on higher-order beliefs in game theory. A pure strategy Nash equilibrium $a^{*}$ of a complete information game, g, is said to be a p-dominant equilibrium (Morris, Rob, and Shin, 1995) if each player’s action is a best response whenever he assigns probability at least p to his opponents choosing according to $a^{*}$, i.e.,

\[
\sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\, g_{i}(a_{i}^{*}, a_{-i}) \;\ge\; \sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\, g_{i}(a_{i}, a_{-i})
\]

for all i = 1, . . . , I, $a_{i} \in A_{i}$, and $\lambda \in \Delta(A_{-i})$ such that $\lambda(a_{-i}^{*}) \ge p$.
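The p-dominance condition can be checked mechanically in small games. Because expected payoffs are linear in the belief λ, it is enough to test beliefs that put weight p on the opponent's equilibrium action and the remaining 1 − p on a single (worst-case) action. The sketch below does this for two-player finite games and applies it to the investment example of Section 2.1 at a fixed θ; the function names and the payoff-array layout are assumptions made for illustration.

```python
def is_p_dominant(g, a_star, p, tol=1e-12):
    """Check whether a_star is a p-dominant equilibrium of a two-player game.

    g[i][own][other] is player i's payoff from action `own` when the opponent
    plays `other`. For each deviation, the worst admissible belief puts weight
    p on the opponent's equilibrium action and 1 - p on one other action.
    """
    for i in (0, 1):
        own_star, opp_star = a_star[i], a_star[1 - i]
        n_opp = len(g[i][own_star])
        for dev in range(len(g[i])):
            gain_at_star = g[i][own_star][opp_star] - g[i][dev][opp_star]
            worst_gain = min(g[i][own_star][b] - g[i][dev][b] for b in range(n_opp))
            if p * gain_at_star + (1.0 - p) * worst_gain < -tol:
                return False
    return True


def investment_game(theta):
    """Two-player investment example: action 1 = invest, action 0 = not invest.
    Investing pays theta if the other invests and theta - 1 otherwise."""
    payoffs = [[0.0, 0.0], [theta - 1.0, theta]]  # payoffs[own][other]
    return [payoffs, payoffs]


if __name__ == "__main__":
    game = investment_game(0.6)
    print(is_p_dominant(game, (1, 1), 0.45))  # True: both invest
    print(is_p_dominant(game, (0, 0), 0.45))  # False: neither invests
```

In this example investing by both players is p-dominant exactly when p ≥ 1 − θ, so for θ > 1/2 it is p-dominant for some p below 1/2, in line with it being the risk-dominant equilibrium.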

Lemma 4.2. If $a^{*}$ is a p-dominant equilibrium of complete information game g, then every incomplete information game $\{\Omega, \pi, (P_{i})_{i=1}^{I}, (u_{i})_{i=1}^{I}\}$ has an equilibrium where $a^{*}$ is played with probability 1 on the event $C^{p}(|g|)$.

The proof of this result is straightforward. The event $C^{p}(|g|)$ is itself a p-evident event. Consider the modified incomplete information game where each player is constrained to choose according to $a^{*}$ when he p-believes the event $C^{p}(|g|)$. Find an equilibrium of that modified game. By construction, $a^{*}$ is played with probability 1 on the event $C^{p}(|g|)$. But the equilibrium of the modified game is also an equilibrium of the original game. If a player i p-believes the event $C^{p}(|g|)$, then he p-believes that other players are choosing $a_{-i}^{*}$. But, because his payoffs are given by g and $a^{*}$ is a p-dominant equilibrium, $a_{i}^{*}$ must be a best response for player i.

Because every strict Nash equilibrium is a p-dominant equilibrium for some p < 1, we immediately have:

Corollary 4.3. If $a^{*}$ is a strict Nash equilibrium of complete information game g, then there exists p < 1, such that every incomplete information game $\{\Omega, \pi, (P_{i})_{i=1}^{I}, (u_{i})_{i=1}^{I}\}$ has an equilibrium where $a^{*}$ is played on the event $C^{p}(|g|)$.

Thus, if we took a sequence of incomplete information games where in the limit payoffs are common knowledge, and close to the limit they are common p-belief (with p close to 1) with ex ante probability close to 1, then payoffs from equilibria of that sequence of incomplete information games must converge to payoffs in the limit game. Monderer and Samet (1989) proved such a lower hemicontinuity result. One can also ask a converse question: what is the relevant topology on information systems, such that information systems close to common knowledge information systems deliver outcomes that are close to common knowledge outcomes? Monderer and Samet (1996) and Kajii and Morris (1998) characterize such topologies (for different kinds of information system).

4.4. Robustness to Incomplete Information

Let a ∗ be a pure strategy Nash equilibrium of complete information game g; a ∗ is robust to incomplete information if every incomplete information game where payoffs are almost always given by g has an equilibrium where players

Global Games

95

almost always choose a ∗ [Kajii and Morris (KM), 1997a)].15 More precisely, a ∗ is robust to incomplete information if, for all δ > 0, there exists ε > 0, such that every incomplete information game where π(|g|) ≥ 1 − ε has an equilibrium where a ∗ is played by all players on an event with probability at least 1 − δ. Robustness (to incomplete information) can be seen as a very strong reﬁnement of Nash equilibrium. Kajii and Morris (1997b) provide a detailed account of the relation between robustness and the existing reﬁnements literature, which we brieﬂy summarize here. The reﬁnements literature examines what happens to a given Nash equilibrium in perturbed versions of the complete information game. A weak class of reﬁnements requires only that the Nash equilibrium continues to be equilibrium in some nearby perturbed game [Selten’s (1975) notion of perfect equilibrium is the leading example of this class]; a stronger class requires that the Nash equilibrium continues to be played in all perturbed nearby games [Kohlberg and Mertens’ (1986) notion of stable equilibria is the leading example of this class]. Robustness belongs to the latter, stronger class of reﬁnements. Moreover, robustness to incomplete information allows an extremely rich set of “perturbed games.” In particular, while Kohlberg and Mertens allow only independent action trembles across players, the deﬁnition of robustness leads to highly correlated trembles and thus an even stronger reﬁnement. Indeed, KM construct an example in the spirit of Rubinstein (1989) to show that even a game with a unique Nash equilibrium, which is strict, may fail to have any robust equilibrium. Yet it turns out that a large set of games do have robust equilibria. KM provided two sufﬁcient conditions. The ﬁrst is that if a ∗ is the unique correlated equilibrium of g, then a ∗ is robust. The second sufﬁcient condition comes from a generalization of the notion of p-dominance. Fix a vector of probabilities, p = ( p1 , . . . 
, p I ), one for each player. Action proﬁle a ∗ is a p-dominant equilibrium if each player i’s action is a best response whenever he assigns probability at least pi to his opponents choosing according to a ∗ , i.e., λ(a−i )gi (ai∗ , a−i ) ≥ λ(a−i )gi (ai , a−i ) a−i ∈Ai

a−i ∈Ai

∗ ) such that λ(a−i ) ≥ pi . If a ∗ is for all i = 1, . . . , I , ai ∈ Ai , and λ ∈ (A−i I a p-dominant equilibrium for some p with i=1 pi ≤ 1, then a ∗ is robust to incomplete information. This property is a many-player, many-action generalization of risk dominance. KM proved this result by showing a surprising property of higher-order beliefs. Say that an event is p-believed (for some vector of probabilities p) if each player i believes it with probability at least pi ; and the event is common p-belief if it is p-believed, is p-believed that it is it I p-believed, etc. KM show that if vector p satisﬁes i=1 pi ≤ 1, and an event

15. KM define the property of robustness to incomplete information for mixed strategy equilibria also, but most of the sufficient conditions described previously apply only to pure strategy profiles. For this reason, we focus on pure strategy profiles in the discussion that follows.


has a high probability, then with high probability that event is common p-belief. A generalization of lemma 4.2 then proves the robustness result.

Further sufficient conditions for robustness exploit the idea of potential games due to Monderer and Shapley (1996). A function v : A → R is a potential function for complete information game g if

$$v(a_i, a_{-i}) - v(a_i', a_{-i}) = g_i(a_i, a_{-i}) - g_i(a_i', a_{-i})$$

for all i = 1, ..., I, $a_i, a_i' \in A_i$, and $a_{-i} \in A_{-i}$. This property implies that the game g has identical mixed strategy best response correspondences to the common interest game with common payoff function v. Observe that a* is thus a Nash equilibrium of g if it is a local maximizer of v (i.e., it is not possible to increase v by changing one player’s action). Monderer and Shapley suggested that if a game has multiple Nash equilibria, the global maximizer of v (which must of course be a local maximizer and thus a Nash equilibrium) is a natural candidate for selection. If action profile a* is the strict maximum of a potential function v for complete information game g, we say that a* is a potential maximizer of g. Ui (2001) shows that a potential maximizing action profile is necessarily robust to incomplete information.16 Many-player, two-action, symmetric payoff games are potential games, so this result provides a proof that the strategy profile where all players choose the Laplacian action is robust to incomplete information.17

The p-dominance sufficient conditions and potential game sufficient conditions for robustness can be unified and generalized. We very briefly sketch the main ideas and refer the reader to Morris (1999) for more details. Action profile a* is a characteristic potential maximizer of the complete information game g if there exists a function $v : 2^{\{1,\dots,I\}} \to \mathbb{R}$ with $v(\{1,\dots,I\}) > v(S)$ for all $S \ne \{1,\dots,I\}$, and $\mu_i : A_i \to \mathbb{R}_+$ such that for all i, $a_i \in A_i$, and $a_{-i} \in A_{-i}$,

$$v(\{j : a_j = a_j^*\}) - v(\{j : a_j = a_j^*\} \cup \{i\}) \ge \mu_i(a_i)\,\bigl(g_i(a_i, a_{-i}) - g_i(a_i^*, a_{-i})\bigr).$$

Here, v(·) is a potential function that depends only on the set of players choosing according to a*. In this sense, the characteristic potential maximizer condition strengthens the potential maximizer condition. But the earlier equalities are replaced with inequalities, and the constants µ_i also add extra degrees of freedom. So, the characteristic potential maximizer condition neither implies nor is implied by the potential maximizer condition. Any characteristic potential maximizing action profile is robust to incomplete information. One can use duality arguments to show that if a* is a p-dominant equilibrium for some p with $\sum_{i=1}^{I} p_i \le 1$, then a* is a characteristic potential maximizer.18

16. Ui uses a slightly weaker version of robustness to incomplete information, where all types in the perturbed game either have payoffs given exactly by the complete information game g or have a dominant strategy to choose some action.

17. Morris (1997) previously provided an independent argument showing the robustness of the Laplacian strategy profile.

18. Ui (2000) extends these ideas with a set-based notion of robustness to incomplete information.
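The potential function condition stated earlier in this section can be checked mechanically for a small game. The sketch below is an illustration only, not taken from the chapter: it assumes a hypothetical two-player investment game (investing pays x if the opponent invests and x − 1 otherwise, not investing pays 0) and a candidate potential v that we propose for it. The function names `is_potential`, `investment_game`, `investment_potential`, and `potential_maximizer` are ours.

```python
from itertools import product


def is_potential(g, v, action_sets, tol=1e-9):
    """Check the difference condition: for every player i and every unilateral
    change of i's action, the change in v equals the change in g_i."""
    for a in product(*action_sets):
        for i, actions in enumerate(action_sets):
            for alt in actions:
                b = a[:i] + (alt,) + a[i + 1:]
                if abs((v[a] - v[b]) - (g[a][i] - g[b][i])) > tol:
                    return False
    return True


def investment_game(x):
    """Hypothetical example game: action 1 (invest) pays x if the opponent
    invests and x - 1 otherwise; action 0 (not invest) pays 0."""
    g = {}
    for a in product((0, 1), repeat=2):
        g[a] = tuple((x - 1 + a[1 - i]) if a[i] == 1 else 0.0 for i in range(2))
    return g


def investment_potential(x):
    """Candidate potential for the example game (an assumption to be checked)."""
    return {(1, 1): 2 * x - 1, (1, 0): x - 1, (0, 1): x - 1, (0, 0): 0.0}


def potential_maximizer(v):
    """Action profile with the largest value of v."""
    return max(v, key=v.get)


if __name__ == "__main__":
    actions = [(0, 1), (0, 1)]
    for x in (0.3, 0.7):
        v = investment_potential(x)
        print(x, is_potential(investment_game(x), v, actions), potential_maximizer(v))
```

In this example the candidate v passes the check, and its maximizer switches from (0, 0) to (1, 1) as x crosses 1/2, which is where both-invest becomes the risk-dominant profile of the example game.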


Let the actions of each player be ordered, and for any action $a_i \in A_i$, write $a_i^-$ for the action below $a_i$ and $a_i^+$ for the action above $a_i$. Action profile a* is a local potential maximizer of the complete information game g if there exists a local potential function v : A → R with v(a*) > v(a) for all a ≠ a* and, for each i, $\mu_i : A_i \to \mathbb{R}_+$, such that for all i = 1, ..., I and $a_{-i} \in A_{-i}$,

$$v(a_i, a_{-i}) - v(a_i^-, a_{-i}) \ge \mu_i(a_i)\,\bigl[g_i(a_i, a_{-i}) - g_i(a_i^-, a_{-i})\bigr] \quad \text{if } a_i > a_i^* \tag{4.2}$$

and

$$v(a_i, a_{-i}) - v(a_i^+, a_{-i}) \ge \mu_i(a_i)\,\bigl[g_i(a_i, a_{-i}) - g_i(a_i^+, a_{-i})\bigr] \quad \text{if } a_i < a_i^*.$$

One can show that if a* is a potential maximizer or a characteristic potential maximizer, then a* is a local potential maximizer. Thus, the local potential maximizer condition generalizes both conditions. If a* is a local potential maximizer of g, and g satisfies strategic complementarities and each g_i(a_i, a_{-i}) is concave with respect to a_i, then a* is robust to incomplete information. The following two-player, three-action, symmetric payoff game satisfies the strategic complementarity and concavity conditions, and one can show that (0, 0) is the local potential maximizer and thus robust (the earlier conditions do not help to characterize robustness in this example; see Table 3.5):

Table 3.5. Payoffs in three-action example

          0          1          2
0       4, 4       0, 0      −6, −3
1       0, 0       1, 1       0, 0
2      −3, −6      0, 0       2, 2
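To illustrate why the p-dominance condition does not settle robustness in this example, the following sketch tests p-dominance of a symmetric profile directly. It rests on two assumptions that are ours: Table 3.5 is read with the row player's payoff first and rows indexed by the row player's action, and the matrix `U` holds the row player's payoffs `U[own action][opponent action]`. Because the p-dominance inequality is linear in the belief λ, it is enough to test the extreme beliefs that put weight p on the opponent's equilibrium action and 1 − p on a single action. The name `is_p_dominant` is hypothetical.

```python
from fractions import Fraction

# Row player's payoffs U[own action][opponent action], as we read Table 3.5.
U = [[4, 0, -6],
     [0, 1, 0],
     [-3, 0, 2]]


def is_p_dominant(U, a_star, p):
    """Check whether a_star is a best response to every belief that puts
    probability at least p on the opponent playing a_star.

    Only extreme beliefs are tested: weight p on a_star and 1 - p on one
    action b (b = a_star gives the point belief on a_star)."""
    n = len(U)
    for b in range(n):
        belief = [Fraction(0)] * n
        belief[a_star] += p
        belief[b] += 1 - p
        expected = [sum(belief[c] * U[a][c] for c in range(n)) for a in range(n)]
        if any(expected[a_star] < expected[a] for a in range(n)):
            return False
    return True


if __name__ == "__main__":
    for p in (Fraction(1, 2), Fraction(3, 5)):
        print(p, [is_p_dominant(U, a, p) for a in range(3)])
```

Under these assumptions the check fails at p = 1/2 for all three symmetric pure profiles, while (0, 0) passes at p = 3/5. Since the game is symmetric, the same threshold applies to both players, so p_1 + p_2 would exceed 1 and the p-dominance sufficient condition is silent here.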

In fact, the local potential maximizer condition can be used to characterize the unique robust equilibrium in generic two-player, three-action, symmetric payoff games.

4.5. Noise-Independent Selection

If an action proﬁle is robust to incomplete information, we know that – roughly speaking – any way that a “small” amount of incomplete information is added cannot prevent that action proﬁle being played in equilibrium. This observation has important implications for global games. Consider a global game where payoffs depend continuously on a random parameter θ (which could be multidimensional), and each player observes a noisy signal xi = θ + σ εi . If a ∗ is a robust equilibrium of the game being played at θ ∗ , then there will always be an equilibrium of the global game (for small σ ) where action proﬁle a ∗ is


almost always played whenever all players observe signals close to θ*. In other words, there will be no way of adding noise that will prevent action profile a* being played in the neighborhood of θ* in some equilibrium. Thus, if there is limit uniqueness [say, because there are strategic complementarities and the other assumptions of Frankel, Morris, and Pauzner (2000) are satisfied], then a* must be played in the unique limit for every noise distribution. In the language of Section 4.1, a* must be the noise-independent selection.

Here is a heuristic argument for this claim. Fix θ* and let a* be a Nash equilibrium of the complete information game at θ* that is robust to incomplete information. By definition, if a* is robust to incomplete information in game u(·, θ*), every incomplete information game where payoffs are almost always given by u(·, θ*) has an equilibrium where a* is almost always played. Generically, it will also be true that every incomplete information game where payoffs are almost always close to u(·, θ*) will have an equilibrium where a* is almost always played. But now consider an incomplete information game where some types of each player have payoffs close to u(·, θ*) (“sane” types), although some types may have very different payoffs (“crazy” types). Suppose that conditional on any player being sane, with probability close to 1, he assigns probability close to 1 to all other players being sane. Now, the robustness arguments described previously could be adapted to show that this incomplete information game has an equilibrium where, conditional on all players being sane, a* is almost always played.

Now, return to the global game and write B(θ*, δ) for a δ ball around θ* (i.e., the set of θ within Euclidean distance δ of θ*). For a generic choice of θ*, a* will remain robust to incomplete information close to θ* [i.e., at all θ ∈ B(θ*, δ) for some sufficiently small δ > 0].
Now, consider a sequence of global games where we let the noise go to zero (i.e., σ → 0). For fixed δ and fixed q < 1, we can choose σ sufficiently small such that conditional on a player observing a signal in B(θ*, δ), with probability at least q, he will assign probability at least q to all other players observing signals within B(θ*, δ). Labeling the types who observe signals in B(θ*, δ) “sane” and types who observe signals not in B(θ*, δ) “crazy,” this argument shows that there is an equilibrium where a* is almost always played in a neighborhood of θ*.19

5. RELATED MODELS: LOCAL HETEROGENEITY AND UNIQUENESS

There are a number of ways that adding local heterogeneity to a population of players can remove multiplicity. In this section, we will attempt to give some intuition for a general logic at work. We start with a familiar example.

19. There is a technical problem formalizing this argument. The robustness analysis described in Section 4.4 was carried out in discrete state spaces, where existence of equilibrium in incomplete information games is never a problem. In the uncountable state space setting of global games, it would be necessary to impose extra assumptions to ensure existence.


There are two players, 1 and 2, and each player i has a payoff parameter x_i. Expected payoffs are given by Table 3.6:

Table 3.6. Payoffs in private value example

              Invest        NotInvest
Invest        x1, x2        x1 − 1, 0
NotInvest     0, x2 − 1     0, 0
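The payoffs in Table 3.6 can be encoded directly to enumerate the strict pure-strategy equilibria for given payoff parameters. This is a minimal sketch under our own naming (`payoff`, `strict_pure_equilibria`, and the constants `INVEST` and `NOT_INVEST` are hypothetical), meant only as a companion to the table.

```python
from itertools import product

INVEST, NOT_INVEST = 1, 0


def payoff(own_action, other_action, x_own):
    """Payoff from Table 3.6: investing pays x if the opponent invests and
    x - 1 otherwise; not investing pays 0."""
    if own_action == NOT_INVEST:
        return 0.0
    return x_own if other_action == INVEST else x_own - 1.0


def strict_pure_equilibria(x1, x2):
    """Pure profiles in which each player strictly prefers his action to the
    alternative, given the other player's action."""
    equilibria = []
    for a1, a2 in product((INVEST, NOT_INVEST), repeat=2):
        strict1 = payoff(a1, a2, x1) > payoff(1 - a1, a2, x1)
        strict2 = payoff(a2, a1, x2) > payoff(1 - a2, a1, x2)
        if strict1 and strict2:
            equilibria.append((a1, a2))
    return equilibria


if __name__ == "__main__":
    for x in (-0.5, 0.5, 1.5):
        print(x, strict_pure_equilibria(x, x))
```

With a common parameter x strictly between 0 and 1, both (Invest, Invest) and (NotInvest, NotInvest) are returned; outside that interval only one profile survives.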

If there was common knowledge that x1 = x2 = x ∈ (0, 1), then there would be multiple strict Nash equilibria of the complete information game. Because both pure strategy equilibria are strict, they seem quite stable. It seems surprising that an apparently “small” perturbation could remove either equilibrium. But, now let x be a publicly observed random variable and let x1 = x2 = x. Let players be restricted to switching strategies, so that player i will invest if his payoff parameter exceeds some cutoff k_i and not invest otherwise. Thus, player i’s strategy is parameterized by a number k_i. Because the game is symmetric, we can write b*(k) for the optimal cutoff of any player if he expects his opponent to choose cutoff k. Clearly, we have

$$b^*(k) = \begin{cases} 0, & \text{if } k \le 0 \\ k, & \text{if } 0 \le k \le 1 \\ 1, & \text{if } 1 \le k. \end{cases}$$

This function is plotted in Figure 3.6.

Figure 3.6. Function b*(k). [Axes: k (horizontal) and b*(k) (vertical), with ticks at 0 and 1.]

Symmetric equilibria will exist when this best response function crosses the 45° line. So, there is a continuum of equilibria: for any x ∈ [0, 1], there is an equilibrium where each player follows a switching strategy with cutoff x. If we perturb this best response function, we would expect there to be a finite number of equilibria (i.e., a finite number of points where the function b* crosses the 45° line). Given the shape of the best response function, it does not


seem surprising that there might be natural ways of perturbing the best response function so that there is a unique equilibrium.

The two-player example of Section 2.1 represented one way of carrying out such a perturbation. There, it was assumed that there was a payoff parameter θ, and each player i observed a noisy signal x_i = θ + σε_i. The payoffs in Table 3.6 then represent the expected payoffs of the players, given their signals. Recall that a player observing signal x_i will believe that his opponent’s signal x_j is distributed normally with mean x_i and standard deviation √2σ. If σ = 0 in that example, so there is no noise in the signal, we have exactly the scenario described previously with best response function b*. But, if σ > 0, then the best response function rotates clockwise a little bit and crosses the 45° line only at 1/2 (see Figure 3.1) and there is a unique equilibrium.

However, this argument does not really rely on the incomplete information interpretation. The important feature of the argument is the local heterogeneity in payoffs: a player with payoff parameter x_i knows that he is interacting with other player(s) who have some perhaps different, but nearby, payoff parameters; and he knows that those other player(s) in turn know that they are interacting with other player(s) who have some perhaps different, but nearby, payoff parameters. In the remainder of this section, we will see how a similar logic to the global game argument can arise when players are interacting not with unknown types of an opponent, but with (known) opponents at different locations or at different points in time.20,21

5.1. Local Interaction Games

A continuum of players are evenly distributed on the real line. If a player does not invest, his payoff is 0. If he invests, his payoff is x + l − 1, where x is his location and l is a weighted average of the proportion of his neighbors investing. In particular, let f(·) be the density of a normal distribution with mean 0 and standard deviation √2σ; a player puts weight f(z) on the actions of players at location x + z. This setup describes a game among a continuum of players. The analysis of this game is identical to the analysis of the continuum player example of Section 2.1. In particular, players at locations less than 1/2 will not invest, and

20. This logic also emerges in the models of Carlsson (1991) and Carlsson and Ganslandt (1998), where players’ continuous action choice is subject to a small heterogeneous tremble. The exact connection to global games is not known.

21. A distinctive feature of these arguments relying on local heterogeneity is that a very small amount of heterogeneity is sufficient to imply unique equilibrium in environments where there are multiple strict equilibria without heterogeneity. One can also sometimes obtain uniqueness results assuming global, not local, heterogeneity (i.e., assuming that each player or type has the same, but sufficiently diffuse, beliefs about other players’ or types’ payoff parameters). Such global heterogeneity uniqueness arguments rely on the existence of a sufficiently large amount of heterogeneity. See Baliga and Sjöström (2001) in an incomplete information context (where global heterogeneity corresponds to independent types); Herrendorf, Valentinyi, and Waldmann (2000) and Glaeser and Scheinkman (2000) in models of large population interactions; and Frankel (2000b) in the context of a dynamic model with payoff shocks.


players at locations above 1/2 will invest. This is despite the fact that, if players were interacting only with people at the exact same location (i.e., σ = 0), there would be multiple equilibria at all locations between 0 and 1.

This rather stylized game illustrates the possibility that in local interaction games, play at some locations may be influenced by play at distant locations via the structure of local interaction. A literature on local interaction games has examined this type of effect.22 To understand the connection a little better, imagine a local interaction game where payoffs depend in a nonlinear way on location. Thus, let the payoff to investing be ψ(x) + l − 1 (instead of x + l − 1). Furthermore, suppose that ψ(x) < 1/2 for all x and that ψ(x) < 0 for some open interval of values of x. For small σ, this game will have a unique equilibrium where no player ever invests. To see why, note that for sufficiently small σ, players inside the open interval where ψ(x) < 0 will have a dominant strategy to not invest. But, now players close to the edge of that interval will have about 1/2 of their neighbors within that interval, and thus [since ψ(x) < 1/2 always] will not invest in equilibrium. This argument will iterate to ensure that no investment takes place anywhere.

This argument has very much the flavor of the contagion argument developed by Ellison (1993) and others. There, a population with constant payoffs interacts with near neighbors on a line. Players choose best responses to some average behavior of their neighbors. But, a low rate of mutations ensures that small neighborhoods where each action is played will periodically arise randomly. Once a risk-dominant action is played in a small neighborhood, it will tend to spread to the whole population under the best response dynamics. The initial mutant region where the risk-dominant action is played plays much the same role as the dominant strategy region in the story described previously. In this setting with strategic complementarities, best response dynamics mimic iterated deletion of strictly dominated strategies. Morris (1997) describes more formally an exact relationship between a version of Rubinstein’s (1989) e-mail game and a version of Ellison’s contagion effect, and describes more generally an exact equivalence between games of incomplete information and local interaction games.23

The connection between games of incomplete information and local interaction games can be exploited. In evolutionary models, local interaction leads to much faster convergence to stochastically stable states than global interaction, because of the contagious dynamics. But, there is a very close connection between which action will spread contagiously in a local interaction game and which action will be played in the limit in a global game. In particular, recall from Section 4.1 that some games have a noise-independent selection (i.e., an action profile played in the limit of a global game, independent of the noise

22. For example, Blume (1995), Ellison (1993), and Young (1998). See Glaeser and Scheinkman (2000) for a recent survey.

23. Hofbauer (1998, 1999) introduces an approach to equilibrium selection in a local interaction environment. His “spatially dominant equilibria” seem to coincide with those that are robust to incomplete information.


structure); whereas in other games, the action played in the limit depends on the noise structure. Translated to a local interaction setting, this result implies that in some games the same action tends to spread contagiously, independent of the structure of interaction, whereas in other games fine details of the local interaction structure will determine which action is contagious [see Morris (1999) for details]. Thus, local interaction may not just speed up convergence to stochastically stable states, but may change the stochastically stable states in subtle ways.24

5.2. Dynamic Games

5.2.1. Dynamic Payoff Shocks

A continuum of players each live for an instant of time. If a player does not invest, his payoff is 0. If he invests, his payoff is x + l − 1, where x is the date at which he lives and l is a weighted average of the proportion of players investing at other points in time. In particular, let f(·) be the density of a normal distribution with mean 0 and standard deviation √2σ; a player puts weight f(z) on the actions of players living at date x + z. This setup describes a game among a continuum of players. The analysis of this game is identical to the analysis of the continuum player example of Section 2.1 and thus also the local interaction example of the previous section. In particular, players will not invest before date 1/2 and will invest after date 1/2. This is despite the fact that, if players were interacting only with people making contemporaneous choices (i.e., σ = 0), there would be multiple equilibria at all dates between 0 and 1.

This was a very stylized example. But, the logic is quite general. In many dynamic strategic environments where choices are made at different points in time, a player’s payoff may depend not only on contemporaneous choices, but also on choices made by other players at other times. Payoff conditions may be varying through time. Thus, players’ optimal choices may depend indirectly on environments where payoffs are very different from what they are now. These features may allow us to identify a unique equilibrium. We discuss two approaches that exploit this logic.25

One approach has been developed recently in Burdzy, Frankel, and Pauzner (2001), Frankel and Pauzner (1999), and Frankel (2000a).26 A continuum of players are periodically randomly matched in a two-player, two-action game.

24. Morris (2000) also exploits techniques from the higher-order beliefs literature to prove new results about local interaction.

25. Morris (1995) describes a third approach. Suppose that players are deciding whether to invest or not invest at different points in time, but they make their decisions in private and their watches are not synchronized. Thus, each player will believe that the time on any other player’s watch is close to his own, but not identical. Risk-dominant play may result even when perfect synchronization would have allowed multiple equilibria.

26. See also Frankel and Pauzner (2000) and Levin (2000a) for applications following this approach.


For simplicity, we can think of them playing the investment game described in matrix (2.1). But assume that the publicly observed common payoff parameter θ evolves through time according to some random process [a random walk in Burdzy, Frankel, and Pauzner (2001), a continuous Brownian motion in Frankel and Pauzner (1999)]. Furthermore, suppose that each player can only occasionally alter his behavior: Revision opportunities arrive according to a Poisson process and arrive slowly relative to changes in the game’s payoffs. Under certain conditions on the noise process (roughly equivalent to the sufficiently uniform prior conditions in global games), there is a unique equilibrium where each player invests when θ exceeds 1/2 and not when θ is less than 1/2. This description considerably oversimplifies the analysis. For example, it is natural to assume that players observe the public evolution of θ, so they will be able to infer at any point in time (even if they cannot observe) the proportion of players taking each action. This creates an extra state variable (relative to the global games analysis), and the resulting asymmetry between the past and future complicates the analysis. Nonetheless, the logic is similar to the stylized example previously described. In particular, note how the friction in revision opportunities exactly ensures that a player making a choice given some publicly observed θ will take into account the choices that others will make at different times with different publicly observed θ.27

Levin (2000a) describes another approach that is closer to the stylized example previously described. At discrete time t, player t chooses an action. His payoff may depend on the actions of players choosing before him or the player choosing after him, but also depends on a payoff parameter θ. The payoff parameter is publicly observed and evolves according to a random walk. If players act as if they cannot influence or do not care about the action of the decision maker in the next period, then under weak monotonicity conditions (a player’s best response is increasing in others’ actions and the payoff parameter) and limit dominance conditions [the highest (lowest) action is a dominant strategy for sufficiently high (low) values of θ], there is a unique equilibrium. The no influence assumption makes sense if there are in fact a continuum of players at each date or if actions are observed only with a sufficiently long lag. In Matsui’s (1999) currency crisis model, there are overlapping generations of players, but there is a natural reason why players do not care about the actions of players preceding them.28

27. Matsui and Matsuyama (1995) earlier analyzed a model with Poisson revision opportunities. However, they assumed that the same game was being played through time (i.e., θ was constant), but examined the stability of different population states. The state where the whole population plays the risk-dominant action can be reached in equilibrium from the state where the whole population plays the risk-dominated action, but not vice versa. Hofbauer and Sorger (1999) show that the potential maximizing action of (many-action) symmetric potential games tends to be played in the Matsui-Matsuyama environment. Oyama (2000) shows that the 1/2-dominant equilibrium is selected in this context. In a private communication, Hofbauer has reported that it also selects the “local potential maximizing action” (see Section 4.4) in two-player, three-action games with strategic complementarities and symmetric payoffs.

28. See also Frankel (2000b) on the relationship between some of these models.
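The cutoff at 1/2 in the stylized example that opens this subsection can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis. It uses the normal weights with standard deviation √2σ from the text and supposes that everyone at a date above some cutoff k invests, so a player at date x faces l = 1 − Φ((k − x)/(√2σ)) and the best-response cutoff b(k) solves b = Φ((k − b)/(√2σ)). The function names and the value σ = 0.1 are our own choices. Iterating b(·) from k = 1 and from k = 0, the edges of the regions where one action is dominant, mimics iterated deletion of dominated strategies.

```python
import math


def normal_cdf(t):
    """Standard normal c.d.f., written with math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def best_response_cutoff(k, sigma):
    """Cutoff b at which a player is indifferent when all players above k invest.

    The payoff gain from investing at x is x - Phi((k - x) / (sqrt(2) * sigma)),
    which is increasing in x, negative at x = 0 and positive at x = 1, so
    bisection on [0, 1] finds its unique root."""
    s = math.sqrt(2.0) * sigma
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid - normal_cdf((k - mid) / s) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iterate_cutoff(k0, sigma, rounds=200):
    """Apply the best-response map repeatedly, starting from cutoff k0."""
    k = k0
    for _ in range(rounds):
        k = best_response_cutoff(k, sigma)
    return k


if __name__ == "__main__":
    sigma = 0.1  # hypothetical noise scale
    print(iterate_cutoff(1.0, sigma), iterate_cutoff(0.0, sigma))
```

Both sequences approach 1/2, which is the only fixed point of the map since b = k requires k = Φ(0). With smaller σ the map is closer to the 45° line, so more rounds are needed before the two sequences meet.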


5.2.2. Recurring Incomplete Information

Let θt follow a random walk, with θt = θt−1 + ηt , where each ηt is independently normally distributed with mean 0 and standard deviation τ . In period t, θt−1 is publicly observed, but θt is observed only with noise. In particular, each player i observes xit = θt + εit , where each εit is independently normally distributed with mean 0 and standard deviation σ . In each period, a continuum of players decide whether to invest with linear payoffs depending on θt (the payoff to not investing is 0, and the payoff to investing is θt + l − 1, where l is the proportion of the population investing). This dynamic game represents a crude way of embedding the static global games analysis in a dynamic setting. In particular, each period’s play of this dynamic game can be analyzed independently and is exactly equivalent to the public signals model of Section 3. In particular, θt−1 is the public signal about θt , whereas xit is player i’s private signal. A unique equilibrium will exist in this dynamic game exactly if γ (σ, τ ) ≤ 2π (i.e., σ is small relative to τ ). In Morris and Shin (2000), we sketch a continuous time version of this recurring incomplete information model and derive the continuous time sufﬁcient conditions for uniqueness. In Morris and Shin (1999a), we discuss such a recurring incomplete information model of currency crises. One distinctive implication of that analysis is that by the publicity effect, the previous period’s fundamentals may be expected to have a disproportionate inﬂuence on current outcomes. Thus, for any given actual level of fundamentals, an attack on the exchange rate is more likely when the fundamentals have just risen. Chamley (1999) considers a richer global game model with recurring incomplete information. A large population of players play a coordination game in each period, but each player has a private cost of taking a risky action that evolves through time. 
There is correlation in private costs and dominance regions, so that each period’s coordination game has the structure of a global game. But past actions convey information about other players’ private costs and thus (because of persistence) their current costs. Chamley identifies sufficient conditions for uniqueness in all periods and discusses a variety of applications.

5.2.3. Herding

In the herding models of Banerjee (1992) and Bikhchandani, Hirshleifer, and Welch (1992), players sequentially make some discrete choice. Players do not care about each other’s actions directly, but players have private information, and so each player may partially learn the information of players who choose before him. But, if a number of early-moving players happen to observe signals favoring one action, late-moving players may start ignoring their own private information, leading to inefﬁcient herding because of the negative informational externality. Herding models share with global game models the feature that outcomes are highly sensitive to ﬁne details of the information structure. However, it is


important to note that the mechanisms are quite different. The global games analysis is driven by strategic complementarities and the highly correlated signals generated by the noisy observations technology. However, sensitivity to the information structure arises in a purely static setting. The herding stories have no payoff complementarities and simple information structures, but rely on sequential choice.

Dasgupta (2000a) analyzes a simple model where it is possible to see both kinds of effects at work. A finite set of players decide sequentially (in an exogenous order) whether to invest or not. Investment conditions are either bad (when each player has a dominant strategy to not invest) or good (in which case it pays to invest if all other players invest). Each player observes a signal from a continuum, with high signals implying a higher probability that investment conditions are good. All equilibria in this model are switching equilibria: each player invests only if all previous players invested and his private signal exceeds some cutoff. Such equilibria encompass herding effects: previous players’ decisions to invest convey positive information to later players and make it more likely that they will invest. They also encompass higher-order belief effects: an increase in a player’s signal makes it more likely that he will invest both because he thinks it more likely that investment conditions are good and because he thinks it more likely that later players will observe high signals and choose to invest.29

6. CONCLUSIONS

Global games rest on the premise that the information received by economic agents is informative, but not so informative as to achieve common knowledge of the underlying fundamentals. Indeed, as the information concerning the fundamentals becomes more and more accurate, the actions elicited in equilibrium resemble behavior when the uncertainty concerning the actions of other agents becomes more and more diffuse. This points to the potential pitfalls if we rely too much on our intuitions that are based on complete information games that allow perfectly coordinated switching of beliefs and actions. Decentralized decision making in market environments cannot be relied on to rule out inefficient outcomes, so that there may be room for policies that mitigate the inefficiencies. The analysis of economic problems using the methods from global games is in its infancy, but the method seems promising.

Global games also present a “user-friendly” face of games with incomplete information in the tradition of Harsanyi. The potentially daunting task of forming an infinite hierarchy of beliefs over the actions of all players in the game can be given a representation in terms of beliefs (and the behavior that they elicit) that are simple to the point of being naive. Global games go some

29. For other models combining elements of payoff complementarities and herding, see Chari and Kehoe (2000), Corsetti, Dasgupta, Morris, and Shin (2000), Jeitschko and Taylor (2001), and Marx (2000).


way to bridging the gap between those who believe that rigorous game theory has a role in economics (as we do) and those who insist on tractable and usable tools for applied economic analysis.

ACKNOWLEDGMENTS

This study was supported by the National Science Foundation Grant 9709601 (to S.M.). Section 3 incorporates work circulated earlier under the title “Private Versus Public Information in Coordination Problems.” We thank Hans Carlsson, David Frankel, Josef Hofbauer, Jonathan Levin, and Ady Pauzner for valuable comments on this paper, and Susan Athey for her insightful remarks. Morris would like to record an important intellectual debt in this area to Atsushi Kajii, through joint research and long discussions.

APPENDIX A: PROOF OF PROPOSITION 2.2

We will prove the first half of the result [s(x) = 0 for all x ≤ θ* − δ]. The second half [s(x) = 1 for all x ≥ θ* + δ] follows by a symmetric argument. For any given strategy profile s = {s_i}_{i∈[0,1]}, we write ζ(x) for the proportion of players observing signal x who choose action 1; ζ(·) will always be a continuous function of x. Write π_σ(x, k) for the highest possible expected payoff gain to choosing action 1 for a player who has observed a signal x and knows that all other players will choose action 0 if they observe signals less than k: π_σ(x, k) ≡ max {ζ : ζ(x) = 0 for all x […] 0 for the required values of x and ξ. Because we are interested in values of x in the closed interval [x, θ*] and because varying ξ generates a compact set of distributions over l, convergence is uniform. ∎

APPENDIX B: THE FINITE PLAYER CASE

As we noted in the linear example of Section 2.1, analysis of the continuum and finite player cases can follow similar methods. Here, we briefly note how to extend the uniform prior private values analysis of proposition 2.1 to the finite player case. The extension of the general prior common values analysis of proposition 2.2 is then straightforward.

The setting is as in Section 2.2.1, except that there are now $I \ge 2$ players, and the noise terms in the private signals are identically and independently distributed according to the density $f(\cdot)$. As before, $\pi(l, x)$ is the payoff gain to choosing action 1 rather than action 0, if you have observed signal $x$ and proportion $l$ of your opponents choose action 1. Of course, now (because you have $I - 1$ opponents) $l$ will always be an element of the set $\{0, 1/(I-1), 2/(I-1), \ldots, 1\}$. Property A3 becomes:

A3(I): I-Player Single Crossing: There exists a unique $\theta_I^*$ solving $\sum_{k=0}^{I-1} (1/I)\, \pi(k/(I-1), \theta_I^*) = 0$.

Observe that, as $I \to \infty$, $\theta_I^* \to \theta^*$ (i.e., the $\theta^*$ of assumption A3). In the special case where $I = 2$, this reduces to $\tfrac{1}{2}\pi(0, \theta_2^*) + \tfrac{1}{2}\pi(1, \theta_2^*) = 0$; in other words, $\theta_2^*$ is the point where the risk-dominant action (Harsanyi and Selten 1988) switches from 0 to 1.

Proposition 2.1 remains true as stated for the finite player game, with $\theta_I^*$ replacing $\theta^*$. This was essentially shown by Carlsson and van Damme (1993b). The key step in the proof is showing that, in a symmetric strategy profile, each player has uniform beliefs over the proportion of players observing a higher signal. To see why this is true, note that the probability that a player observing signal $x$ assigns to exactly proportion $n/(I-1)$ of his opponents observing a signal greater than $k$ is

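$$\int_{\theta=-\infty}^{\infty} \frac{1}{\sigma} f\!\left(\frac{x-\theta}{\sigma}\right) \binom{I-1}{I-1-n} \left[F\!\left(\frac{k-\theta}{\sigma}\right)\right]^{I-1-n} \left[1 - F\!\left(\frac{k-\theta}{\sigma}\right)\right]^{n} d\theta,$$

where $F(\cdot)$ is the c.d.f. of $f(\cdot)$. Letting $x = k - \sigma z$ and carrying out the change of variables $\xi = (k - \theta)/\sigma$, this expression becomes

$$\int_{\xi=-\infty}^{\infty} \binom{I-1}{I-1-n} f(\xi - z)\, [F(\xi)]^{I-1-n}\, [1 - F(\xi)]^{n}\, d\xi.$$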

This expression is now independent of $\sigma$ and $k$, so we may denote this expression by $\psi_I(n/(I-1); z)$. For the same argument to work as in the continuum case, it is enough to show that $\psi_I(\cdot\,; 0)$ is the uniform distribution. But, integration by parts gives

$$\begin{aligned}
\psi_I\!\left(\frac{n}{I-1}; 0\right) &= \int_{\xi=-\infty}^{\infty} \binom{I-1}{I-1-n} f(\xi)\, [F(\xi)]^{I-1-n}\, [1 - F(\xi)]^{n}\, d\xi \\
&= \int_{\xi=-\infty}^{\infty} \binom{I-1}{I-n} f(\xi)\, [F(\xi)]^{I-n}\, [1 - F(\xi)]^{n-1}\, d\xi \\
&= \cdots \\
&= \int_{\xi=-\infty}^{\infty} f(\xi)\, [F(\xi)]^{I-1}\, d\xi \\
&= \frac{1}{I}.
\end{aligned}$$
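The uniform-beliefs property can also be checked numerically. The following is a minimal sketch rather than part of the original argument: it assumes standard normal noise (the text only requires some density f), and the function name `psi`, the integration range, and the grid size are hypothetical choices made for illustration. It evaluates the integral defining ψ_I(n/(I − 1); z) by the trapezoidal rule.

```python
from math import comb
from statistics import NormalDist


def psi(I, n, z=0.0, lo=-10.0, hi=10.0, steps=4000):
    """Approximate psi_I(n/(I-1); z): the probability that exactly n of the
    I-1 opponents observe a signal above k, for a player whose own signal is
    x = k - sigma*z. Assumes standard normal noise (an illustrative choice)."""
    noise = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps + 1):
        xi = lo + j * h
        F = noise.cdf(xi)
        value = comb(I - 1, n) * noise.pdf(xi - z) * F ** (I - 1 - n) * (1.0 - F) ** n
        # Trapezoidal rule: the two endpoints get half weight.
        total += value * (0.5 if j in (0, steps) else 1.0)
    return total * h


if __name__ == "__main__":
    I = 5
    for n in range(I):
        # At z = 0 every value should be close to 1/I.
        print(n, round(psi(I, n), 6))
```

At z = 0 the computed probabilities should all be close to 1/I, whatever the value of n; for z ≠ 0 they still sum to one but are in general no longer equal.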

APPENDIX C: PROOF OF LEMMA 2.3

Recall the following expression for a player’s expected payoff gain to choosing action 1 for a player who has observed a signal $x$ and knows that all other players will choose action 0 if they observe signals less than $k$:

$$\pi^*_\sigma(x, k) \equiv \int_{\theta=-\infty}^{\infty} \frac{1}{\sigma} f\!\left(\frac{x-\theta}{\sigma}\right) \pi\!\left(1 - F\!\left(\frac{k-\theta}{\sigma}\right), x\right) d\theta.$$

With a change of variables [setting $z = (\theta - k)/\sigma$], this expression becomes

$$\pi^*_\sigma(x, k) = \int_{z=-\infty}^{\infty} f\!\left(\frac{x-k}{\sigma} - z\right) \pi(1 - F(-z), x)\, dz.$$

We can rewrite this expression as $\pi^*_\sigma(x, k) = h(x, k, x)$, where

$$h(x, k, x') \equiv \int_{z=-\infty}^{\infty} \tilde{f}(x, z)\, g(z, x')\, dz, \qquad \tilde{f}(x, z) \equiv f\!\left(\frac{x-k}{\sigma} - z\right),$$

and $g(z, x') \equiv \pi(1 - F(-z), x')$. Now observe that, by A7, $\tilde{f}(x, z)$ satisfies a monotone likelihood ratio property [i.e., if $\bar{x} > x$, then $\tilde{f}(\bar{x}, z)/\tilde{f}(x, z)$ is increasing in $z$]; also observe that, by A1*, $g(\cdot, x')$ satisfies a single crossing property: there exists $z^* \in \mathbb{R} \cup \{-\infty, \infty\}$ such that $g(z, x') < 0$ if $z < z^*$ and $g(z, x') > 0$ if $z > z^*$. Now lemma 5 in Athey (2000b) implies that $h(\cdot, k, x')$ satisfies a single crossing property: there exists $x^*(k, x')$ such that $h(x, k, x') < 0$ for all $x < x^*(k, x')$, and $h(x, k, x') > 0$ for all $x > x^*(k, x')$. But by A2, we know that $h(x, k, x')$ is strictly increasing in $x'$.

Now suppose $h(x, k, x) = 0$. If $x' < x$, then

$$\begin{aligned}
h(x', k, x') &< h(x', k, x), && \text{by A2} \\
&< h(x, k, x), && \text{by the single crossing property of } h.
\end{aligned}$$

By a symmetric argument, we have $x' > x \Rightarrow h(x', k, x') > h(x, k, x)$. Thus, there exists $\beta : \mathbb{R} \to \mathbb{R}$ such that

$$\pi^*_\sigma(x, k) \begin{cases} < 0 & \text{if } x < \beta(k) \\ = 0 & \text{if } x = \beta(k) \\ > 0 & \text{if } x > \beta(k). \end{cases}$$

Thus, if a player thinks that others are following a strategy with cutoff $k$, a player’s best response is to follow a switching strategy with cutoff $\beta(k)$. But, by A3, we know that there exists exactly one value of $k$ such that

$$\pi^*_\sigma(k, k) = \int_{l=0}^{1} \pi(l, k)\, dl = 0.$$
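As a numerical illustration of the cutoff β(k) and of this fixed-point condition, the sketch below evaluates the change-of-variables expression for the expected payoff gain by quadrature and finds its sign change by bisection. It rests on assumptions that are not taken from the text: a linear payoff gain π(l, x) = x + l − 1, standard normal noise, σ = 0.1, and the hypothetical function names `payoff_gain`, `pi_star`, and `beta`. Under this assumed payoff, the integral of π(l, k) over l in [0, 1] equals k − 1/2, so the unique fixed point should be k = 1/2.

```python
from statistics import NormalDist

NOISE = NormalDist()  # assumed standard normal noise (illustrative choice)


def payoff_gain(l, x):
    """Assumed linear payoff gain pi(l, x) = x + l - 1 (not from the text)."""
    return x + l - 1.0


def pi_star(x, k, sigma=0.1, steps=2000, half_width=10.0):
    """Trapezoidal approximation of the expected payoff gain for a player with
    signal x when all others use cutoff k, written in the variable z."""
    a = (x - k) / sigma
    lo = a - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        z = lo + j * h
        value = NOISE.pdf(a - z) * payoff_gain(1.0 - NOISE.cdf(-z), x)
        total += value * (0.5 if j in (0, steps) else 1.0)
    return total * h


def beta(k, sigma=0.1, lo=-5.0, hi=5.0, iters=50):
    """Best-response cutoff: the signal at which pi_star changes sign,
    located by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if pi_star(mid, k, sigma) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0):
        print(k, round(beta(k), 4))  # beta(k) = k only at k = 0.5
```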

Thus, there is a unique symmetric switching strategy equilibrium.

References

Allen, F. and S. Morris (2001), “Finance Applications of Game Theory,” in Advances in Business Applications of Game Theory, (ed. by K. Chatterjee and W. Samuelson), Boston, MA: Kluwer Academic Press.
Allen, F., S. Morris, and A. Postlewaite (1993), “Finite Bubbles with Short Sales Constraints and Asymmetric Information,” Journal of Economic Theory, 61, 209–229.
Athey, S. (2001), “Single Crossing Properties and the Existence of Pure Strategy Equilibria in Games of Incomplete Information,” Econometrica, 69, 861–889.
Athey, S. (2002), “Monotone Comparative Statics under Uncertainty,” Quarterly Journal of Economics, 117(1), 187–223.
Aumann, R. (1976), “Agreeing to Disagree,” Annals of Statistics, 4, 1236–1239.
Baliga, S. and S. Morris (2000), “Coordination, Spillovers and Cheap Talk,” Journal of Economic Theory.
Baliga, S. and T. Sjöström (2001), “Arms Races and Negotiations,” Northwestern University.
Banerjee, A. (1992), “A Simple Model of Herd Behavior,” Quarterly Journal of Economics, 107, 797–818.
Battigalli, P. (1999), “Rationalizability in Incomplete Information Games,” available at http://www.iue.it/Personal/Battigalli.
Bikhchandani, S., D. Hirshleifer, and I. Welch (1992), “A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades,” Journal of Political Economy, 100, 992–1026.
Blume, L. (1995), “The Statistical Mechanics of Best-Response Strategy Revision,” Games and Economic Behavior, 11, 111–145.
Boonprakaikawe, J. and S. Ghosal (2000), “Bank Runs and Noisy Signals,” University of Warwick.

Brunner, A. and J. Krahnen (2000), “Corporate Debt Restructuring: Evidence on Coordination Risk in Financial Distress,” Center for Financial Studies, Frankfurt.
Bryant, J. (1983), “A Simple Rational Expectations Keynes Type Model,” Quarterly Journal of Economics, 98, 525–529.
Bulow, J., J. Geanakoplos, and P. Klemperer (1985), “Multimarket Oligopoly: Strategic Substitutes and Complements,” Journal of Political Economy, 93, 488–511.
Burdzy, K., D. Frankel, and A. Pauzner (2001), “Fast Equilibrium Selection by Rational Players Living in a Changing World,” Econometrica, 69, 163–189.
Carlsson, H. (1989), “Global Games and the Risk Dominance Criterion,” University of Lund.
Carlsson, H. (1991), “A Bargaining Model where Parties Make Errors,” Econometrica, 59, 1487–1496.
Carlsson, H. and E. van Damme (1993a), “Global Games and Equilibrium Selection,” Econometrica, 61, 989–1018.
Carlsson, H. and E. van Damme (1993b), “Equilibrium Selection in Stag Hunt Games,” in Frontiers of Game Theory, (ed. by K. Binmore, A. Kirman, and A. Tani), Cambridge, MA: MIT Press.
Carlsson, H. and M. Ganslandt (1998), “Noisy Equilibrium Selection in Coordination Games,” Economics Letters, 60, 23–34.
Chamley, C. (1999), “Coordinating Regime Switches,” Quarterly Journal of Economics, 114, 817–868.
Chan, K. and Y. Chiu (2000), “The Role of (Non)Transparency in a Currency Crisis Model,” McMaster University.
Chari, V. and P. Kehoe (2000), “Financial Crises as Herd Behavior,” Working Paper 600, Federal Reserve Bank of Minneapolis.
Chui, M., P. Gai, and A. Haldane (2000), “Sovereign Liquidity Crises: Analytics and Implications for Public Policy,” International Finance Division, Bank of England.
Chwe, M. (1998), “Believe the Hype: Solving Coordination Problems with Television Advertising,” available at http://chwe.net/michael.
Corsetti, G., A. Dasgupta, S. Morris, and H. S. Shin (2000), “Does One Soros Make a Difference? The Role of a Large Trader in Currency Crises,” Review of Economic Studies.
Dasgupta, A. (2000a), “Social Learning and Payoff Complementarities,” available at http://aida.econ.yale.edu/~amil.
Dasgupta, A. (2000b), “Financial Contagion Through Capital Connections: A Model of the Origin and Spread of Bank Panics,” available at http://aida.econ.yale.edu/~amil.
DeGroot, M. (1970), Optimal Statistical Decisions. New York: McGraw-Hill.
Dekel, E. and F. Gul (1996), “Rationality and Knowledge in Game Theory,” in Advances in Economic Theory–Seventh World Congress of the Econometric Society, (ed. by D. Kreps and K. Wallace), Cambridge: Cambridge University Press.
Diamond, D. and P. Dybvig (1983), “Bank Runs, Deposit Insurance, and Liquidity,” Journal of Political Economy, 91, 401–419.
Dönges, J. and F. Heinemann (2000), “Competition for Order Flow as a Coordination Game,” Center for Financial Studies, Frankfurt, Germany.
Ellison, G. (1993), “Learning, Local Interaction, and Coordination,” Econometrica, 61, 1047–1071.
Frankel, D. (2000a), “Determinacy in Models of Divergent Development and Business Cycles,” available at www.tau.ac.il/~dfrankel.
Frankel, D. (2000b), “Noise versus Shocks,” seminar notes, University of Tel Aviv.

Frankel, D., S. Morris, and A. Pauzner (2000), “Equilibrium Selection in Global Games with Strategic Complementarities,” Journal of Economic Theory.
Frankel, D. and A. Pauzner (1999), “Expectations and the Timing of Neighborhood Change,” available at www.tau.ac.il/~dfrankel.
Frankel, D. and A. Pauzner (2000), “Resolving Indeterminacy in Dynamic Settings: The Role of Shocks,” Quarterly Journal of Economics, 115, 285–304.
Fudenberg, D. and J. Tirole (1991), Game Theory. Cambridge, MA: MIT Press.
Fukao, K. (1994), “Coordination Failures under Incomplete Information and Global Games,” Discussion Paper Series A 299, The Institute of Economic Research, Hitotsubashi University, Kunitachi, Tokyo, Japan.
Geanakoplos, J. (1994), “Common Knowledge,” in Handbook of Game Theory, Chapter 40 of Volume 2, (ed. by R. Aumann and S. Hart), New York: Elsevier Science.
Glaeser, E. and J. Scheinkman (2000), “Non-Market Interactions,” prepared for the Eighth World Congress of the Econometric Society.
Goldstein, I. (2000), “Interdependent Banking and Currency Crises in a Model of Self-Fulfilling Beliefs,” University of Tel Aviv.
Goldstein, I. and A. Pauzner (2000a), “Demand Deposit Contracts and the Probability of Bank Runs,” available at http://www.tau.ac.il/~pauzner.
Goldstein, I. and A. Pauzner (2000b), “Contagion of Self-Fulfilling Currency Crises,” University of Tel Aviv.
Harsanyi, J. (1967–1968), “Games with Incomplete Information Played by ‘Bayesian’ Players, Parts I–III,” Management Science, 14, 159–182, 320–334, and 486–502.
Harsanyi, J. and R. Selten (1988), A General Theory of Equilibrium Selection in Games. Cambridge, MA: MIT Press.
Hartigan, J. (1983), Bayes Theory. New York: Springer-Verlag.
Heinemann, F. (2000), “Unique Equilibrium in a Model of Self-Fulfilling Currency Attacks: Comment,” American Economic Review, 90, 316–318.
Heinemann, F. and G. Illing (2000), “Speculative Attacks: Unique Sunspot Equilibrium and Transparency,” Center for Financial Studies, Frankfurt.
Hellwig, C. (2000), “Public Information, Private Information, and the Multiplicity of Equilibria in Coordination Games,” London School of Economics.
Herrendorf, B., Á. Valentinyi, and R. Waldmann (2000), “Ruling out Multiplicity and Indeterminacy: The Role of Heterogeneity,” Review of Economic Studies, 67, 295–308.
Hofbauer, J. (1998), “Equilibrium Selection in Travelling Waves,” in Game Theory, Experience, Rationality: Foundations of Social Sciences, Economics and Ethics, (ed. by W. Leinfellner and E. Köhler), Boston, MA: Kluwer.
Hofbauer, J. (1999), “The Spatially Dominant Equilibrium of a Game,” Annals of Operations Research, 89, 233–251.
Hofbauer, J. and G. Sorger (1999), “Perfect Foresight and Equilibrium Selection in Symmetric Potential Games,” Journal of Economic Theory, 85, 1–23.
Hubert, F. and D. Schäfer (2000), “Coordination Failure with Multiple Source Lending,” available at http://www.wiwiss.fu-berlin.de/~hubert.
Jeitschko, T. and C. Taylor (2001), “Local Discouragement and Global Collapse: A Theory of Coordination Avalanches,” American Economic Review, 91, 208–224.
Kadane, J. and P. Larkey (1982), “Subjective Probability and the Theory of Games,” Management Science, 28, 113–120.
Kajii, A. and S. Morris (1997a), “The Robustness of Equilibria to Incomplete Information,” Econometrica, 65, 1283–1309.

Kajii, A. and S. Morris (1997b), “Common p-Belief: The General Case,” Games and Economic Behavior, 18, 73–82.
Kajii, A. and S. Morris (1997c), “Refinements and Higher Order Beliefs in Game Theory,” available at http://www.econ.yale.edu/~smorris.
Kajii, A. and S. Morris (1998), “Payoff Continuity in Incomplete Information Games,” Journal of Economic Theory, 82, 267–276.
Karp, L. (2000), “Fundamentals Versus Beliefs under Almost Common Knowledge,” University of California at Berkeley.
Kim, Y. (1996), “Equilibrium Selection in n-Person Coordination Games,” Games and Economic Behavior, 15, 203–227.
Kohlberg, E. and J.-F. Mertens (1986), “On the Strategic Stability of Equilibria,” Econometrica, 54, 1003–1038.
Krugman, P. (1991), “History Versus Expectations,” Quarterly Journal of Economics, 106, 651–667.
Laplace, P. (1824), Essai Philosophique sur les Probabilités. New York: Dover (English translation).
Levin, J. (2000a), “Collective Reputation,” Stanford University.
Levin, J. (2000b), “A Note on Global Equilibrium Selection in Overlapping Generations Games,” Stanford University.
Marx, R. (2000), “Triggers and Equilibria in Self-Fulfilling Currency Collapses,” University of California at Berkeley.
Matsui, A. (1999), “Multiple Investors and Currency Crises,” University of Tokyo.
Matsui, A. and K. Matsuyama (1995), “An Approach to Equilibrium Selection,” Journal of Economic Theory, 65, 415–434.
Mertens, J.-F. and S. Zamir (1985), “Formulation of Bayesian Analysis for Games with Incomplete Information,” International Journal of Game Theory, 10, 619–632.
Merton, R. (1974), “On the Pricing of Corporate Debt: The Risk Structure of Interest Rates,” Journal of Finance, 29, 449–470.
Metz, C. (2000), “Private and Public Information in Self-Fulfilling Currency Crises,” University of Kassel.
Milgrom, P. and J. Roberts (1990), “Rationalizability, Learning, and Equilibrium in Games with Strategic Complementarities,” Econometrica, 58, 1255–1277.
Monderer, D. and D. Samet (1989), “Approximating Common Knowledge with Common Beliefs,” Games and Economic Behavior, 1, 170–190.
Monderer, D. and D. Samet (1996), “Proximity of Incomplete Information in Games with Common Beliefs,” Mathematics of Operations Research, 21, 707–725.
Monderer, D. and L. Shapley (1996), “Potential Games,” Games and Economic Behavior, 14, 124–143.
Morris, S. (1995), “Cooperation and Timing,” available at http://www.econ.yale.edu/~smorris.
Morris, S. (1997), “Interaction Games,” available at http://www.econ.yale.edu/~smorris.
Morris, S. (1999), “Potential Methods in Interaction Games,” available at http://www.econ.yale.edu/~smorris.
Morris, S. (2000), “Contagion,” Review of Economic Studies, 67, 57–78.
Morris, S., R. Rob, and H. S. Shin (1995), “p-Dominance and Belief Potential,” Econometrica, 63, 145–157.
Morris, S. and H. S. Shin (1997), “Approximate Common Knowledge and Coordination: Recent Lessons from Game Theory,” Journal of Logic, Language, and Information, 6, 171–190.

Morris, S. and H. S. Shin (1998), “Unique Equilibrium in a Model of Self-Fulfilling Currency Attacks,” American Economic Review, 88, 587–597.
Morris, S. and H. S. Shin (1999a), “A Theory of the Onset of Currency Attacks,” in Asian Financial Crisis: Causes, Contagion and Consequences, (ed. by P.-R. Agenor, D. Vines, and A. Weber), Cambridge: Cambridge University Press.
Morris, S. and H. S. Shin (1999b), “Coordination Risk and the Price of Debt,” available at http://www.econ.yale.edu/~smorris.
Morris, S. and H. S. Shin (2000), “Rethinking Multiple Equilibria in Macroeconomic Modelling,” NBER Macroeconomics Annual 2000, (ed. by B. Bernanke and K. Rogoff), Cambridge, MA: MIT Press.
Obstfeld, M. (1996), “Models of Currency Crises with Self-Fulfilling Features,” European Economic Review, 40, 1037–1047.
Osborne, M. and A. Rubinstein (1994), A Course in Game Theory. Cambridge, MA: MIT Press.
Oyama, D. (2000), “p-Dominance and Equilibrium Selection under Perfect Foresight Dynamics,” University of Tokyo, Tokyo, Japan.
Rochet, J.-C. and X. Vives (2000), “Coordination Failures and the Lender of Last Resort: Was Bagehot Right after All?” Universitat Autonoma de Barcelona.
Rubinstein, A. (1989), “The Electronic Mail Game: Strategic Behavior under Almost Common Knowledge,” American Economic Review, 79, 385–391.
Scaramozzino, S. and N. Vulkan (1999), “Noisy Implementation Cycles and the Informational Role of Policy,” University of Bristol.
Schelling, T. (1960), Strategy of Conflict. Cambridge, MA: Harvard University Press.
Selten, R. (1975), “Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games,” International Journal of Game Theory, 4, 25–55.
Shin, H. S. (1996), “Comparing the Robustness of Trading Systems to Higher Order Uncertainty,” Review of Economic Studies, 63, 39–60.
Shleifer, A. (1986), “Implementation Cycles,” Journal of Political Economy, 94, 1163–1190.
Sorin, S. (1998), “On the Impact of an Event,” International Journal of Game Theory, 27, 315–330.
Townsend, R. (1983), “Forecasting the Forecasts of Others,” Journal of Political Economy, 91, 546–588.
Ui, T. (2000), “Generalized Potentials and Robust Sets of Equilibria,” University of Tsukuba.
Ui, T. (2001), “Robust Equilibria of Potential Games,” Econometrica, 69, 1373–1380.
van Damme, E. (1997), “Equilibrium Selection in Team Games,” in Understanding Strategic Interaction: Essays in Honor of Reinhard Selten, (ed. by W. Albers et al.), New York: Springer-Verlag.
Vives, X. (1990), “Nash Equilibrium with Strategic Complementarities,” Journal of Mathematical Economics, 19, 305–321.
Young, P. (1998), Individual Strategy and Social Structure. Princeton, NJ: Princeton University Press.

CHAPTER 4

Testing Contract Theory: A Survey of Some Recent Work

Pierre-André Chiappori and Bernard Salanié

1. INTRODUCTION

It is a capital mistake to theorise before one has data.
Arthur Conan Doyle, A Scandal in Bohemia.

Since the early seventies, the development of the theoretical literature on contracts has been nothing short of explosive. The study of more and more sophisticated abstract models has gone hand in hand with the use of the tools of the theory to better understand many ﬁelds of economics, such as industrial organization, labor economics, taxation, insurance markets, or the economics of banking. However, it is only fair to say that the empirical validation of the theory has long lagged behind the theoretical work. Many papers consist of theoretical analyses only, with little attention to the facts. Others state so-called stylized facts often based on fragile anecdotal evidence and go on to study a model from which these stylized facts can be derived. Until the beginning of the eighties, empirical tests using actual data and econometric methods were very rare, even though the theoretical literature had by then given birth to a large number of interesting testable predictions. Although such a long lag is not untypical in economics, it is clearly unfortunate, especially when one compares our practice to that of other scientists. Even without fully sharing the somewhat extreme methodological views expressed above by Sherlock Holmes, one can hardly dispute that interactions between theory and reality are at the core of any scientiﬁc approach. To give only one example, the models of insurance markets under asymmetric information developed at the beginning of the seventies were extensively tested (and found to lack empirical support) only in the middle of the nineties. If this had been done earlier, the 20-year period could have been used to devise better models. Fortunately, a number of empirical researchers have turned their attention to the theory of contracts in recent years, so that such long lags should become less common. This survey will present a panorama of this burgeoning literature. 
Because new papers are appearing every week in this field, we cannot claim to be exhaustive. We just hope that we can convey to the reader both a sense of excitement at these recent developments and an understanding of the specific econometric problems involved in taking contract theory to the data.

A unifying theme of our survey is the necessity of controlling adequately for unobserved heterogeneity in this literature. If this is not done properly, then the combination of unobserved heterogeneity and endogenous matching of agents to contracts is bound to create selection biases on the parameters of interest.

This is given a striking illustration in a recent contribution by Ackerberg and Botticini (2002). They consider the choice between sharecropping and fixed-rent contracts in a tenant–landlord relationship. Standard moral hazard models stress the trade-off between incentives and risk-sharing in the determination of contractual forms. Fixed-rent contracts are very efficient from the incentives viewpoint, since the tenant is both the main decision maker and the residual claimant. However, they also generate a very inefficient allocation of risk, in which all the risk is borne by one agent, the tenant, who is presumably more risk averse. When uncertainty is small, risk-sharing matters less, and fixed-rent contracts are more likely to be adopted. On the contrary, in a very uncertain environment, risk-sharing is paramount, and sharecropping is the natural contractual form. This prediction can readily be tested from data on existing contracts, provided that a proxy for the level of risk is available. For instance, if some crops are known to be more risky than others, the theory predicts that these crops are more likely to be associated with sharecropping contracts. A number of papers have tested this prediction by regressing contract choice on crop riskiness.

The underlying argument, however, has an obvious weakness: it takes contracts as exogenously given, and disregards any possible endogeneity in the matching of agents to contracts.
In other words, the theoretical prediction described holds only for given characteristics of the landlord and the agents. It can be taken to the data only to the extent that this “everything equal” assumption is satisfied, so that agents facing different contracts do not differ by some otherwise relevant characteristic. Assume, on the contrary, that agents exhibit ex ante heterogeneous degrees of risk aversion. To keep things simple, assume that a fraction of the agents are risk neutral, whereas the rest are risk averse. Different agents will be drawn to different crops; efficiency suggests that risk-neutral agents should specialize in the more risky crops. But, note that risk-neutral agents should also be offered fixed-rent contracts, because risk-sharing is not an issue for them. Thus, given heterogeneous risk aversions, fixed-rent contracts are associated with the more risky crops, and the standard prediction is reversed.

Clearly, the core of the difficulty lies in the fact that, although risk aversion plays a crucial role in the story, it is not directly observable. Conditional on risk aversion, the initial theoretical argument remains valid: more risk makes fixed-rent contracts look less attractive. This prediction can in principle be tested, but it requires that differences in risk aversion be controlled for in the estimation or that the resulting endogeneity bias be corrected in some way.

The paper is divided in two parts. In Section 2, we study the effect of contractual forms on behavior. This obviously comprises the measure of the so-called “incentive effect” (i.e., the increase in productivity generated by moving to a higher-powered incentive contract), but we adopt a more general approach here. Thus, we consider that the decision to participate in a relationship and the choice of a contract in a menu of contracts are all effects of contractual forms on behavior. Section 3 turns to the optimality of observed contracts. The central question here can be stated as follows: does the theory predict well the contractual forms that we actually observe? Section 4 provides a brief conclusion.

Contract theory encompasses a very large body of literature, and we had to make choices to keep a manageable length for this survey. First, we consider only situations in which contracts are explicit and the details of the contractual agreement are available to the econometrician. In particular, we do not cover the literature on optimal risk-sharing within a group, which has rapidly developed since the initial contributions of Cochrane (1991) and Townsend (1994).1 There are also areas where excellent surveys of the empirical literature have been written recently. Thus, we will not mention any work on auctions in this survey, and we refer the reader to Laffont (1997). Similarly, we will only briefly touch on the provision of incentives in firms, which is discussed by Gibbons and Waldman (1998) and Prendergast (1999).

2. CONTRACTS AND BEHAVIOR

Circumstantial evidence is a very tricky thing.
Arthur Conan Doyle, The Boscombe Valley Mystery.

Several papers aim at analyzing the links between the form of existing contracts and observed behavior. A recurrent problem of this literature is related to selection issues. Empirical observation provides direct evidence of correlations between contracts and behavior. Theoretical predictions, on the other hand, are concerned with causality relationships. Assessing causality from correlations is an old problem in economics, and indeed in all of science; but the issue is particularly important in our context. Typically, one can observe that different contracts are associated with different behaviors, as documented by a large number of contributions. But, the interpretation of the observed correlations is not straightforward. One explanation is that contracts induce the corresponding behavior through their underlying incentive structure; this defines the so-called incentive effect of contracts. However, an alternative, and often just as convincing, story is that differences in behavior simply reflect some unobserved heterogeneity across agents and that this heterogeneity is also responsible for the variation in contract choices.

Interestingly enough, this distinction is familiar to both theorists and econometricians, although the vocabulary may differ. Econometricians have for a long time stressed the importance of endogenous selection. In the presence of unobserved heterogeneity, the matching of agents to contracts must be studied with care. If the outcome of the matching process is related to the unobserved heterogeneity variable (as one can expect), then the choice of the contract is endogenous. In particular, any empirical analysis taking contracts as given will be biased. Contract theory, on the other hand, systematically emphasizes the distinction between adverse selection (whereby unobserved heterogeneity preexists the contractual relationship and constrains its form) and moral hazard (whereby behavior directly responds to the incentive structure created by the contract).

1 See the contribution by Attanasio and Rios-Rull in this volume.

As an illustration, consider the literature on automobile insurance contracts. The idea, here, is to test a standard prediction of the theory: Everything being equal, people who face contracts entailing more comprehensive coverage should exhibit a larger accident probability. Such a pattern, if observed, can however be given two different interpretations. One is the classical adverse selection effect à la Rothschild–Stiglitz: high-risk agents, knowing they are more likely to have an accident, self-select by choosing contracts entailing a more comprehensive coverage. Alternatively, one can invoke moral hazard. If some agents, for exogenous reasons (say, picking the insurance company located around the corner), end up facing a contract with only partial coverage, they will be highly motivated to adopt a more cautious behavior, which may result in lower accident rates. In practice, the distinction between adverse selection and moral hazard may be crucial, especially from a normative viewpoint.2 But it is also very difficult to implement empirically, especially on cross-sectional data.

Most empirical papers relating contracts and behavior face, at least implicitly, a selection problem of this kind. Various strategies can be adopted to address it. Some papers explicitly recognize the problem and merely test for the presence of asymmetric information without trying to be specific about its nature. In other cases, however, the available data make it possible to disentangle selection and incentives.
Such is the case, in particular, when the allocation of agents to contracts is exogenous, either because it results from explicit randomization or because some “natural experiment” has modified the incentive structure without changing the composition of the population. In some cases, explicit modeling of the economic and/or econometric structure at stake leads to simultaneous estimation of selection and incentive effects. Finally, a promising direction relies on the use of panel data, the underlying intuition being that the dynamics of behavior exhibit specific features under moral hazard.

2.1. Testing for Asymmetric Information

Several papers have recently been devoted to the empirical analysis of insurance contracts and insurees’ behavior.3 Following initial contributions by Dahlby 2

3

One of the most debated issues regarding health insurance is the impact of deductible on consumption. It is a well-established fact that, in cross-sectional data, better coverage is correlated with higher expenditure levels. But the welfare implications are not straightforward. If incentives are the main explanation, deductibles or copayments are likely to be useful, because they reduce overconsumption. However, should selection be the main driving force, then limits on the coverage level can only reduce the insurance available to risk averse agents with no gain in terms of expenditure. The result is an unambiguous welfare loss. See Chiappori (2000) for a recent overview.

Testing Contract Theory

119

(1983), Boyer and Dionne (1987), and Puelz and Snow (1994), a (nonexhaustive) list includes Chiappori and Salani´e (1997, 2000), Gouri´eroux (1999), Bach (1998), Cawley and Philipson (1999), Dionne, Gouri´eroux, and Vanasse (2001), and Richaudeau (1999).4 In most cases, the nature of the test is straightforward: conditionally on all information that is available to the insurance company, is the choice of a particular contract correlated to risk, as proxied ex post by the occurrence of an accident? This idea can be given a very simple illustration. Consider an automobile insurance context, where insurees choose between two types of coverage (say, comprehensive versus liability only). Then they may or may not have an accident during the subsequent period. The simplest representation of this framework relies on two probit equations. One describes the choice of a contract, and takes the form yi = I [X i β + εi > 0] ,

(2.1)

where yi = 1 when the insuree chose the full coverage contract at the beginning of the period, 0 otherwise; here, the X i are exogenous covariates that control for all the information available to the insurer, and β is a vector of parameters to be estimated. The second equation relates to the occurrence of an accident: z i = I [X i γ + ηi > 0] ,

(2.2)

where $z_i = 1$ when the insuree had an accident during the contract period, 0 otherwise, and $\gamma$ is a vector of parameters to be estimated.5 In this context, asymmetric information should result in a positive correlation between $y_i$ and $z_i$ conditional on $X_i$, which is equivalent to a positive correlation between $\varepsilon_i$ and $\eta_i$. This can be tested in a number of ways; for instance, Chiappori and Salanié (2000) propose two parametric tests and a nonparametric test.6 Interestingly enough, none of these tests can reject the null hypothesis of zero correlation (corresponding to the absence of asymmetric information). These results are confirmed by most studies on automobile insurance7; similarly, Cawley and Philipson (1997) find no evidence of asymmetric information in life insurance. However, Bach (1998), analyzing mortgage-related unemployment insurance contracts, finds that insurees who choose contracts with better (in her case earlier) coverage are more likely to become unemployed.

4 A related reference is Toivanen and Cressy (1998), who consider credit contracts.

5 An additional problem is that, typically, claims, not accidents, are observed. The decision to file a claim is obviously influenced by many factors, including the form of the contract, which may induce spurious correlations. For that reason, most studies concentrate on accidents involving several vehicles and/or bodily injuries. See Dionne and Gagné (2001) for a careful investigation of these issues.

6 One parametric test is based on a computation of generalized residuals from independent estimations of the two probits, whereas the other requires a simultaneous estimation of the two probits using a general covariance matrix for the residuals. The nonparametric approach relies on the construction of “cells” of identical profiles, followed by a series of $\chi^2$ tests.

7 One notable exception is the initial paper by Puelz and Snow (1994). However, subsequent studies strongly suggest that their result may be due to a misspecification of the model [see Chiappori and Salanié (2000) and Dionne et al. (2001)].

Evidence of adverse selection has also been repeatedly found in annuity markets. Following earlier work by Friedman and Warshawski (1990), Brugiavini (1993) shows that, controlling for age and gender (the two variables used for pricing), annuity buyers have a longer life expectancy than the rest of the population. Recently, Finkelstein and Poterba (2000) have studied the annuity policies sold by a large UK insurance company since the early 1980s. Again, the systematic and significant relationships they find between ex post mortality and some relevant characteristics of the policies suggest that adverse selection may play an important role in that market. For instance, individuals who buy more backloaded annuities are found to be longer lived, whereas policies involving payment to the estate in the event of an early death are preferred by customers with shorter life expectancy.

This empirical literature on asymmetric information in insurance suggests a few general insights. One is that asymmetric information may be an important issue in some insurance markets, but not in others. Ultimately, this is an empirical question, and the last word should be given to empirical analysis instead of theoretical speculations. From a more methodological perspective, the treatment of the information available to both the insuree and the insurer appears as a key issue. Correctly controlling for this information is a crucial, but quite delicate, task. It may be, for instance, that the linear forms used are not flexible enough, in the sense that they omit relevant nonlinearities or cross-effects.8 Should this be the case, the resulting omitted-variable bias will produce a spurious correlation between contract choices and risk that could mistakenly be interpreted as evidence of asymmetric information. A last conclusion is that static models may miss important dimensions of the problem.
In automobile insurance, for instance, experience rating is known to play an important role. Insurers typically observe past driving records; these are highly informative on accident probabilities, and, as such, are used for pricing. Again, omitting history in the probit regressions will generate a bias toward overestimating the importance of asymmetric information. However, in the presence of unobserved heterogeneity, the introduction of variables reflecting past behavior raises complex endogeneity problems. In many cases, an explicit model of the dynamics of the relationship will be required.

2.2. Experiments

The most natural way to overcome selection problems is to make sure that the allocation of people to contracts is fully exogenous. Assume that different people are assigned to different contracts in a purely random way; then differences in observed behavior can safely be analyzed as responses to the different incentive structures at stake. Random assignment may be seen as an ideal situation, a kind of “first-best” context for testing contract theory. Such situations, although infrequent, can however be found; their analysis generates some of the most interesting and robust conclusions of the literature.

8 Chiappori and Salanié argue that the use of simple, linear functional forms (such as logit or probit) should be restricted to homogeneous populations, such as “young” drivers. An additional advantage of this approach is that it avoids the problems raised by experience rating.

The best example of a random experiment of this kind certainly is the celebrated Rand Health Insurance Experiment (HIE).9 Between November 1974 and February 1977, the HIE enrolled families in six sites in the United States. Families participating in the experiment were randomly assigned to one of 14 different insurance plans, involving different coinsurance rates and different upper limits on annual out-of-pocket expenses. In addition, lump-sum payments were introduced to guarantee that no family would lose by participating in the experiment.

The HIE has provided extremely valuable information about the sensitivity of the demand for health services to out-of-pocket expenditures under a number of different schemes. The use of medical services was found to respond to changes in the amount paid by the insuree. The largest decrease in the use of outpatient services occurs between a free plan and a plan involving a 25% copayment rate; larger rates did not significantly affect expenditures. The impact of the various features of the different plans could be estimated, as well as their interaction with such family characteristics as income or number of children. Also, it is possible, using the regression results, to estimate “pure coinsurance elasticities” (i.e., the elasticity of expenditures to coinsurance rates in the absence of ceilings on out-of-pocket expenses). It is fair to say that the results of the HIE study have been extremely influential in the subsequent discussions on health plan reforms. The HIE will probably remain one of the best empirical studies ever made in that field, a “Rolls Royce” of empirical contract theory. However, quality comes at a cost.
That of the HIE (130 million 1984 dollars) may not be totally prohibitive, but is high enough to severely hamper the repetition of such experiments in the future.

Fortunately, not only academics (or government agencies) are willing to run experiments of this kind. Knowledge about the incentive effects of contractual forms is valuable for firms as well; as a consequence, they may be eager to invest in acquiring such relevant information, in particular through experiments. In a recent contribution, Shearer (1999) studies the case of a tree-planting firm that randomly allocated workers to plant under piece rate and fixed-wage contracts under a subset of planting conditions. Daily productivities were recorded for each worker and are used to measure the percentage difference in average productivity under both types of payment. A simple analysis of variance suggests an incentive effect of piece rates of about 20 percent. In addition, Shearer estimates a structural econometric model of worker behavior. This enables him to take into account nonexperimental data as well, to impose nonlinear restrictions on the analysis of variance model, and finally to extend his conclusions to a larger set of planting conditions. The estimates appear to be very robust: Shearer finds a lower bound of 17 percent for the incentive effect.

9 See Manning et al. (1987).

Ausubel (1999) analyzes the market for bank credit cards. A substantial portion of bank credit card marketing today is done via direct-mailed preapproved solicitations; furthermore, several card issuers decide on the terms of the solicitations by conducting large-scale randomized trials. Ausubel uses the outcomes of such a trial to test for a standard prediction of adverse selection theory, namely that high-risk agents are more willing to accept less favorable deals.10 The trial is conducted by generating a mailing list of 600,000 customer names and randomly assigning them among equal market cells. The market cells are mailed solicitations that vary in the introductory interest rate, in the duration of the introductory offer, and in the postintroductory interest rate.

Three tests can be conducted on these data. The first test relates to a “winner’s curse” prediction: respondents should be worse borrowers than nonrespondents. Ausubel indeed finds that respondents have, on average, shorter credit histories and inferior credit ratings, and are more borrowed-up than nonrespondents. Second, respondents to inferior offers (i.e., offers displaying a higher introductory interest rate, a shorter duration of the introductory period, or a higher postintroductory interest rate) are also worse borrowers on average, in the sense that they exhibit lower incomes, inferior credit records, lower balances on other credit cards, and higher utilization rates of credit lines on other credit cards. Note, however, that these two tests involve characteristics that are observable by the bank and hence do not correspond to adverse selection in the usual sense.
On the other hand, a third test looks for hidden information by checking whether, even after controlling for the observable characteristics of respondents to inferior offers, the latter still yield a customer pool that is more likely to default. The answer is an unambiguous yes, which provides very convincing evidence supporting the existence of adverse selection in the credit card market.

2.3. Natural Experiments

Selection issues arise naturally in a cross-sectional context: if different people are involved in different contracts, the mechanism that allocates contracts to people deserves close scrutiny. Assume, however, that the same people successively face different contracts. Then, selection is no longer a problem; in particular, any resulting change of behavior can safely be attributed to the variation of incentives, at least to the extent that no other significant factor has changed during the same period. This is the basic insight of natural experiments: incentive effects are easier to assess when they stem from some exogenous change in the incentive structure.

10 Technically, the market for credit cards exhibits nonexclusive contracts. In particular, the relevant theoretical reference is Stiglitz and Weiss (1981) rather than Rothschild and Stiglitz (1976) as in automobile insurance. Also, Ausubel (1999) focuses on testing for adverse selection, but he argues that moral hazard cannot explain his findings.

Changes in regulations constitute an obvious source of natural experiments. For instance, the automobile insurance regulation in Québec was modified in 1978 by the introduction of a “no fault” system, which in turn was deeply restructured in 1992. Dionne and Vanasse (1996) provide a careful investigation of the effects of these changes. They show in particular that the average accident frequency dropped significantly after strong incentives to increase prevention efforts were restored in 1992. They conclude that changes in agents’ behavior, as triggered by new incentives, did have a significant effect on accident probabilities.11

Another illustration is provided by the study of tenancy reform in West Bengal by Banerjee, Gertler, and Ghatak (2002). The reform, which took place in 1976, entitled tenants, upon registration with the Department of Land Revenue, to permanent and inheritable tenure on the land they sharecropped so long as they paid the landlord at least 25 percent of output as rent. The incentive impact of the reform is rather complex, because it changes the respective bargaining powers of the parties and the tenant’s incentives to invest while reducing the set of incentive devices available for the landlord. To test for the impact of the reform, the authors use two methods. One is to use neighboring Bangladesh as a control; the fact that the reform was implemented in West Bengal, but not in Bangladesh, the authors argue, was to a large extent due to an exogenous political shock. The second method compares changes in productivity across districts with different registration rates. Again, endogeneity might be a problem here; the authors carefully discuss this issue. They find that the reform significantly increased productivity.

Regulation is not the only cause of changes in incentive structures. Periodically, firms modify their incentive schemes, introduce new rewards, or restructure their wage schedules.
Natural experiments of this kind have been repeatedly analyzed. To take only one example, Lazear (2000) uses data from a large auto glass company that changed its compensation structure from hourly wages to piece rates. He finds that, in accordance with the theoretical predictions, productivity increased sharply, and that half of the increase can be attributed to existing workers producing more.

A first potential limitation of any work of this kind is that, strictly speaking, it establishes a simultaneity rather than a causality. What the studies by Dionne and Vanasse or Lazear show is that, over a given period, outcomes have changed significantly, and that this evolution immediately followed a structural change in incentives. But the two phenomena might stem from simultaneous and independent (or correlated) causes. The lower rate of accidents following the 1992 Québec reform may be due, say, to milder climatic conditions. Such a “coincidence” may be more or less plausible, but it is difficult to discard totally.

A second and related problem is that the change in the incentive structure may well fail to be exogenous. This is particularly true for firms, which are supposed to adopt optimal contracts. If the switch from fixed wages to piece rates indicates that, for some reason, fixed wages were the best scheme before the reform but ceased to be by the time the reform was implemented, then a direct regression will provide biased estimates, at least to the extent that the factors affecting the efficiency of fixed wages had an impact on productivity. Again, this type of explanation may be difficult to discard.12

11 See Browne and Puelz (1999) for a similar study on U.S. data.

The “coincidence” problem can be overcome when the experiment provides a “control” sample that is not affected by the change, so that the effects can be estimated in differences (or more precisely differences of differences). In two recent papers, Chiappori, Durand, and Geoffard (1998) and Chiappori, Geoffard, and Kyriazidou (2000) use such data on health insurance. Following a change in regulation in 1993, French health insurance companies modified the coverage offered by their contracts in a nonuniform way. Some of them increased the level of deductible, whereas others did not. The tests use a panel of clients belonging to different companies, who were faced with different changes in coverage, and whose demand for health services is observed before and after the change in regulation. To concentrate on those decisions that are essentially made by consumers themselves (as opposed to those partially induced by the physician), the authors study the number of physician visits, distinguishing between general practitioner office visits, general practitioner home visits, and specialist visits. They find that the number of home visits significantly decreased for the “treatment” group (i.e., agents who experienced a change of coverage), but not for the “control” group (for which the coverage remained constant). They argue that this difference is unlikely to result from selection, because the two populations are employed by similar firms, they display similar characteristics, and participation in the health insurance scheme was mandatory.
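As a purely illustrative sketch of the differences-of-differences logic just described, the Python snippet below compares the change observed in a treatment group with the change observed in a control group. The visit counts are invented for the example, and `diff_in_diff` is a hypothetical helper rather than the estimator used in the papers cited above.

```python
from statistics import mean

def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Change in the treatment group minus change in the control group."""
    treat_change = mean(treat_after) - mean(treat_before)
    control_change = mean(control_after) - mean(control_before)
    return treat_change - control_change

# Hypothetical yearly home visits per client (made-up numbers).
treat_before = [4.0, 5.0, 6.0]    # clients whose coverage is about to change
treat_after = [3.0, 4.0, 5.0]
control_before = [4.0, 6.0]       # clients whose coverage stays constant
control_after = [3.5, 5.5]

effect = diff_in_diff(treat_before, treat_after, control_before, control_after)
print(effect)
```

In this made-up example both groups experience a common drop of 0.5 visits, which is differenced out; only the additional change in the treatment group is attributed to the change in coverage. This is how a control sample addresses the “coincidence” problem, provided the two groups would otherwise have evolved in parallel.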
A paper by Dionne and St.-Michel (1991) provides another illustration of these ideas. They study the impact of a regulatory variation of the coinsurance level in the Québec public insurance plan on the demand for days of compensation. The main methodological contribution of the paper is to introduce a distinction between injuries, based on the type of diagnosis; it reflects the fact that it is much easier for a physician to detect a fracture than, say, lower back pain. In the first case, moral hazard (interpreted, in the ex post sense, as the tendency to cheat on the true severity of the accident) can play only a minor role, whereas it may be prevalent when the diagnosis is more difficult. In a sense, the easy diagnoses play the role of a control group, although in a specific way: they represent situations where the moral hazard problem does not exist. Theory predicts that the regulatory change will have more significant effects on the number of days of compensation for those cases where the diagnosis is more problematic. This prediction is clearly confirmed by empirical evidence. A more generous insurance coverage, resulting from an exogenous regulatory change, is found to increase the number of days on compensation, but only for the cases of difficult diagnoses. Note that the effect thus identified is ex post moral hazard. The reform is unlikely to have triggered significant changes in prevention; and, in any case, such changes would have affected all types of accidents.

12 This remark illustrates a general phenomenon: if contracts are always optimal, then contract changes should always be taken as endogenous. In real life, however, (at least temporarily) inefficient contracts can hardly be assumed away, which, paradoxically, may greatly simplify the task of the econometrician!

Another natural experiment based on reforms of public programs is studied by Fortin, Lanoie, and Laporte (1995), who examine how the Canadian Worker’s Compensation (WC) and the Unemployment Insurance (UI) programs interact to influence the duration of workplace accidents. They show that an increase in the generosity of WC in Québec leads to an increase in the duration of accidents. In addition, a reduction in the generosity of UI is, as in Dionne and St.-Michel, associated with an increase in the duration of accidents that are difficult to diagnose. The underlying intuition is that worker’s compensation can be used as a substitute for UI. When a worker goes back to the labor market, he may be unemployed and entitled to UI payments for a certain period. Whenever worker’s compensation is more generous than UI, there will be strong incentives to delay the return to the market. In particular, the authors show that the hazard of leaving WC is 27 percent lower when an accident occurs at the end of the construction season, when unemployment is at its seasonal maximum.13

Finally, an interesting situation is when the changes in the incentive structure are random but endogenous. Take the example of mutual fund managers, as studied by Chevalier and Ellison (1997). The basic assumption of the paper is that fund companies have an incentive to increase the inflow of investments. That, in turn, depends on the fund’s performance in an implicit contract between fund companies and their customers. The authors estimate the shape of the flow–performance relationship for a sample of funds observed over the 1982–1992 period, and find that it is highly nonlinear.
Such a nonlinear shape, in turn, creates incentives for fund managers to alter the riskiness of their portfolios, and these incentives vary with time and past performance. Examining portfolio holdings, the authors find that risk levels are changed toward the end of the year in a manner consistent with these incentives. For instance, the flow–performance relationship is convex for funds that are ahead of the market; and, as expected, these tend to gamble so as to increase their expected inflow of investment.14 In a similar vein, Oyer (1998) remarks that compensation contracts for salespersons and executives are typically nonlinear in firm revenues, which creates incentives for these agents to manipulate prices, vary effort, and influence the timing of customer purchases. Using an extensive data set (gathering firm revenue and cost of goods sold for 31,936 quarterly observations covering 981 manufacturers), Oyer finds evidence of business seasonality patterns that fully support the theoretical predictions.

13 See also Fortin and Lanoie (1992), Bolduc et al. (1997), and the survey by Fortin and Lanoie (1998).

14 Chevalier and Ellison (1999) extend this approach to study the impact of career concerns on the investment decisions of mutual fund managers. For another, recent work on the incentive impact of managerial contracts, see Lemmon, Schallheim, and Zender (2000).
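To illustrate why a convex flow–performance relationship rewards risk taking, the sketch below compares the expected inflow under a safe and a risky portfolio that have the same mean excess return. The kinked `inflow` schedule and all numbers are hypothetical; they are chosen only to make the convexity argument concrete and are not estimates from Chevalier and Ellison.

```python
def inflow(excess_return):
    """Hypothetical convex flow-performance schedule, kinked at zero."""
    return excess_return + max(excess_return, 0.0)

def expected_inflow(outcomes):
    """Expected inflow over equally likely excess-return outcomes."""
    return sum(inflow(r) for r in outcomes) / len(outcomes)

safe = [0.02, 0.02]      # excess return of 2 percent for sure
risky = [0.12, -0.08]    # same mean excess return, but with a gamble

print(expected_inflow(safe), expected_inflow(risky))
```

Because the assumed schedule is steeper above zero than below, the gamble raises the expected inflow even though it leaves the expected return unchanged. A concave schedule would reverse the comparison.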


2.4. Explicit Modeling

2.4.1. Econometric Tools

In the absence of (natural) experiments, the endogenous matching problem is pervasive. Adequate theoretical tools may, however, allow it to be tackled in a satisfactory way. From the econometric perspective, much attention has been devoted to exogeneity tests, which find a natural application in our context. An illustration is provided by Laffont and Matoussi (1995), who study a model of sharecropping with moral hazard. The main prediction of this class of models is that production increases with the share of the product kept by the tenant. Laffont and Matoussi use data collected in 1986 on contracts and production in a Tunisian village to test that sharecropping indeed reduces production. To do this, they estimate augmented Cobb–Douglas production functions, adding contract dummy variables as explanatory variables. They find that moving from a sharecropping contract to a rental contract increases production by 50 percent on average. However, longer-term sharecropping relationships, which allow for delayed retaliation, tend to be much more efficient, as one would expect from the repeated moral hazard literature in a context of missing credit markets (see Chiappori, Macho, Rey, and Salanié, 1994).

As presented, the Laffont–Matoussi approach seems very sensitive to the criticism of selection bias: if they find higher production in plots with rental contracts, it may simply be that rental contracts are more often adopted for more fertile plots. Their answer to this criticism is to test for exogeneity of the contract-type variables in production functions. This they do, and they do not reject exogeneity, which validates their approach. One problem with exogeneity tests is that they may not be very powerful. As we will see, another solution to the selection bias problem is to use instruments. In fact, the exogeneity test used by Laffont–Matoussi assumes that some variables (such as the tenant’s age, his wealth, and working capital) are valid instruments for the contract variables in the production function.
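One common way to implement an exogeneity test that relies on instruments is the regression-based (Durbin–Wu–Hausman style) version: regress the suspect variable on the instrument, add the first-stage residual to the main equation, and test whether its coefficient is zero. The sketch below does this on simulated data with one continuous regressor and one instrument. It is a toy under those assumptions, with hypothetical function names, and is not claimed to reproduce the Laffont–Matoussi procedure, whose contract variables are dummies.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def ols(rows, y):
    """OLS via the normal equations; returns (coefficients, standard errors, residuals)."""
    k, n = len(rows[0]), len(y)
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * x for c, x in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    se = [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, se, resid

def exogeneity_check(y, d, z):
    """Return (t-statistic on the first-stage residual, coefficient on d)."""
    _, _, v_hat = ols([[1.0, zi] for zi in z], d)      # first stage: d on the instrument z
    beta, se, _ = ols([[1.0, di, vi] for di, vi in zip(d, v_hat)], y)
    return beta[2] / se[2], beta[1]

def simulate(rho, n=2000, seed=0):
    """Toy data: y = 1.0 * d + u, d = z + v, u = rho * v + noise (rho != 0 means endogeneity)."""
    rng = random.Random(seed)
    y, d, z = [], [], []
    for _ in range(n):
        zi, vi, ei = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        di = zi + vi
        y.append(1.0 * di + rho * vi + ei)
        d.append(di)
        z.append(zi)
    return y, d, z

t_endog, slope = exogeneity_check(*simulate(rho=0.8))
print(round(t_endog, 1), round(slope, 2))
```

A large t-statistic on the residual leads to rejecting exogeneity of `d`. As the text notes, the conclusion is only as good as the instrument, and a failure to reject may simply reflect low power.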

2.4.2. Structural Models of Regulation under Adverse Selection

Often, however, identification requires a full-grown structural model. Wolak (1994) pioneered the estimation of structural models with adverse selection. His paper is set within the context of the regulator–utility relationship for California water companies. However, it is simpler to present it for a price discriminating monopoly (the principal) facing consumers (agents) with an unknown taste $\theta$ for the good. Let $X$ be the vector of exogenous variables that are observed by both parties and by the econometrician, $\alpha$ be the vector of parameters we want to estimate, and let $q$ be the quantity traded as per the contract.

The observational status of $\theta$ depends on our assumptions. First, consider model S (for symmetric information), in which both Principal and Agent observe $\theta$. Then, we obtain by maximizing the total surplus15 a likelihood function $l^S(q, X, \alpha; \theta)$. Note that this is conditional on $\theta$. Now consider the more interesting model A (for asymmetric information) in which only the Agent knows $\theta$ and the Principal has a prior given by a probability distribution function $f$ and a cumulative distribution function $F$. In that case, we know from the theoretical literature that under the appropriate hazard rate condition, the solution is given by maximizing the virtual surplus, which generates a likelihood function

$$l^A\left(q, X, \alpha; \theta, \frac{1 - F(\theta)}{f(\theta)}\right).$$

Note that the latter is conditional both on $\theta$ and on $(1 - F(\theta))/f(\theta)$.

Assume that we have data on $n$ relationships between Principals and Agents that are identical except for the exogenous variables $X$, so that our sample is $(q_i, X_i)_{i=1}^{n}$. The difficulty here is that we do not know $\theta$ or $f$, even in model S, in which both parties observe $\theta$. In econometric terms, $\theta$ is an unobserved heterogeneity parameter and we must integrate over it. To do this, we must find a functional form for $f$ that is flexible enough, given that we have very little idea of what the Principal’s prior may look like. Let $(f_\gamma)$ be such a parameterized family. We can now estimate all parameters of model S by maximizing over $\alpha$ and $\gamma$ the log-likelihood

$$\sum_{i=1}^{n} \log \int l^S(q_i, X_i, \alpha; \theta)\, f_\gamma(\theta)\, d\theta.$$

To estimate model A, we must first integrate $f_\gamma$ to get $F_\gamma$; then, we maximize

$$\sum_{i=1}^{n} \log \int l^A\left(q_i, X_i, \alpha; \theta, \frac{1 - F_\gamma(\theta)}{f_\gamma(\theta)}\right) f_\gamma(\theta)\, d\theta.$$

These log-likelihood functions are obviously highly nonlinear and also require a numerical integration in both models; however, modern computers make it quite feasible to maximize them. As pointed out before, Wolak (1994) introduced this approach to study the regulation of water utilities in California in the eighties. He found that nonnested tests à la Vuong (1989) favor model A over model S, indicating that asymmetric information is relevant in this regulation problem. Wolak also noted that using model S instead of model A may lead the analyst to conclude wrongly that returns are increasing, whereas they are estimated to be constant in model A. Finally, he was able to evaluate the underproduction that is characteristic of adverse selection models to about 10 percent in the middle of the $\theta$ range.

One difficulty with Wolak’s method is that the econometrician observes only the conditional distribution of $q$, given $X$; thus, identification of the preferred model heavily relies on functional form assumptions. Without them, it is easy to find examples in which model S with parameters $(\alpha, F)$ yields exactly the same likelihood function as model A with parameters $(\alpha', F')$, so that there is no way to discriminate between these two models on the basis of data. Of course, this problem is not specific to Wolak’s model; it is just the usual identification problem in structural models, with the new twist that the parameter $F$ is really infinite-dimensional.16

15 Assuming that utilities are quasilinear.

Ivaldi and Martimort (1994) have used a similar approach in a model that has both market power and asymmetric information. They study competition through supply schedules in an oligopoly, where two suppliers of differentiated goods do not know the consumers’ valuations for the two goods. They model this situation as a multiprincipals game where the suppliers are the principals and the consumers are the agents. Assuming supply schedules to be quadratic, they derive the perfect Bayesian equilibrium in supply schedules and the corresponding segmentation of the market according to the valuations of consumers for the two goods. Ivaldi and Martimort apply this theoretical model to study energy supply to the French dairy industry. The first supplier is the public sector monopoly on gas and electricity, EDF-GDF. The second supplier consists of oil firms, who are assumed to act as a cartel. Oil firms maximize profit, but EDF-GDF maximizes social welfare. The authors use pseudo–maximum likelihood (Gouriéroux, Monfort, and Trognon, 1984) to estimate the structural equations derived from their theoretical model. They find that the estimated variance of suppliers’ priors on the valuations of consumers is significantly positive, so that there is evidence of asymmetric information in this market. Obviously, our remark on identification in Wolak’s model also applies here.17
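The integrated log-likelihood used for model S above can be sketched numerically as follows. The conditional likelihood here is a deliberately simple stand-in: q is taken to be normal with mean αX + θ and unit variance, and the prior on θ is normal with mean zero and standard deviation γ. These choices are assumptions made for illustration, picked because the integral then has a closed form to check against; they are not Wolak's specification, and the function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def integrated_loglik(data, alpha, gamma, grid_points=2001, width=8.0):
    """Sum over i of log( integral of l(q_i, X_i, alpha; theta) f_gamma(theta) dtheta ),
    with the integral approximated by the trapezoid rule on [-width*gamma, width*gamma]."""
    lo = -width * gamma
    step = 2.0 * width * gamma / (grid_points - 1)
    total = 0.0
    for q, x in data:
        integral = 0.0
        for k in range(grid_points):
            theta = lo + k * step
            weight = 0.5 if k in (0, grid_points - 1) else 1.0
            cond_lik = normal_pdf(q, alpha * x + theta, 1.0)   # toy l(q, X, alpha; theta)
            integral += weight * cond_lik * normal_pdf(theta, 0.0, gamma)
        total += math.log(integral * step)
    return total

def closed_form_loglik(data, alpha, gamma):
    """In this toy model q given X is normal with mean alpha*X and variance 1 + gamma**2."""
    sd = math.sqrt(1.0 + gamma ** 2)
    return sum(math.log(normal_pdf(q, alpha * x, sd)) for q, x in data)

data = [(1.2, 1.0), (0.3, 0.5), (2.5, 2.0)]   # hypothetical (q_i, X_i) pairs
numeric = integrated_loglik(data, alpha=1.0, gamma=0.7)
exact = closed_form_loglik(data, alpha=1.0, gamma=0.7)
print(numeric, exact)
```

In an application, a function of this kind would be handed to a numerical optimizer over α and γ. For model A the same integration loop would apply, with the hazard-rate term $(1 - F_\gamma(\theta))/f_\gamma(\theta)$ computed on the grid and passed to the conditional likelihood.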

2.4.3. Structural Models Involving Moral Hazard and Selection

Structural models can be used in a more specific way to disentangle selection from incentive effects. Paarsch and Shearer (2000) analyze data from a tree-planting firm, where some workers receive a piece rate, whereas others are paid a fixed wage. In their model, the decision to adopt a piece rate or a fixed wage is modeled as resulting from the observation of the planting conditions by the firm. The endogeneity problem arises from the fact that neither the planting conditions nor the individual-specific cost of effort is observed by the econometrician. According to the structural model developed in the paper, fixed wages are efficient under poor planting conditions and for less productive employees, whereas piece rates work well in more favorable contexts. A direct comparison of observed productivities under each type of contract is thus biased, because the estimated difference results partly from the incentive effect of piece rates and partly from the selection effect. Hence observed discrepancies in productivity provide an upper bound of the incentive effect. Conversely, differences in real earnings provide a lower bound for the incentive effect. This simple idea can be taken to the data quite easily; the authors find an upper (respectively lower) bound of 60 percent (respectively 8 percent). Finally, a parametric version of the structural model is estimated. The authors conclude that about half of the difference in productivity is due to incentive effects and half to selection. Interestingly enough, these nonexperimental findings are fully compatible with the experimental results in Shearer (1999).18

16 Wolak also assumes that the regulator maximizes social welfare. Timmins (2000) relaxes this assumption and estimates the relative weights of consumers’ surplus and firms’ profits in the regulator’s objective function. Gagnepain and Ivaldi (2001) take the existing regulatory framework as given; they estimate the structural parameters of supply and demand and use them to simulate the optimal contracts.

17 See also Lavergne and Thomas (2000).

A related approach is adopted by Cardon and Hendel (2001), who consider employer-provided health insurance. As argued here, a contract that involves a larger copayment rate is likely to correspond to smaller health expenditures, either because of the incentive impact of the copayment rate or because high-risk agents self-select by choosing contracts entailing more coverage. The main identifying assumption used by Cardon and Hendel is that agents do not choose their employer on the basis of the health insurance coverage. A consequence is that whereas the allocation of individuals among the various options of a given plan typically reflects adverse selection, the differences in behavior across plans must stem from incentive effects. Again, a structural model is needed to disentangle the two effects; the authors find that selection effects are negligible, whereas incentives matter.19

2.5. Using Behavioral Dynamics

If selection and moral hazard are difficult to disentangle in a static context, a natural response is to turn to dynamic data.20 Adverse selection and moral hazard indeed induce different behavioral dynamics, which provides a new source of identification. An illustration of this line of research is provided by recent work by Chiappori, Abbring, Heckman, and Pinquet (2001). They consider a French database provided by an automobile insurer. A particular feature of automobile insurance in France is that pricing relies on experience rating (i.e., the premium associated with any particular contract depends, among other things, on the past history of the relationship), but the particular form experience rating may take is strongly regulated. All companies must apply the same “bonus/malus” system, according to which the premium is decomposed as the product of a “basis” premium, freely set by the insurer but independent of past history, and a bonus coefficient, the dynamics of which are imposed by law. Specifically, the coefficient is multiplied by a factor µ < 1 after each year without an accident and by a factor λ > 1 after each year with an accident.21 The authors show that this scheme has a very general property, namely that each accident increases the marginal cost of (future) accidents. Under moral hazard, any accident thus increases prevention efforts and reduces accident probability. The conclusion is that for any given individual, moral hazard induces a negative contagion phenomenon: the occurrence of an accident in the past reduces accident probability in the future.

The tricky part, however, is that this prediction holds only conditional on individual characteristics, whether observable or unobservable. As is well known, unobserved heterogeneity induces an opposite, positive contagion mechanism: past accidents are typical of bad drivers, and hence are a good predictor of a higher accident rate in the future. The problem thus is to control for unobserved heterogeneity. This problem is fairly similar to an old issue of the empirical literature on dynamic data, namely the distinction between pure heterogeneity and state dependence. The authors show that nonparametric identification can actually be achieved under mild identifying restrictions, even when the history available to the econometrician about each driver consists only of the number of years of presence and the total number of accidents during this period. Using a proportional hazard duration model on French data, they cannot reject the null of no moral hazard.

18 Paarsch and Shearer (1999) use a similar model, where the firm, having observed the planting conditions, chooses a specific piece rate. Again, the structural model allows the endogeneity of the rate to be taken into account.
19 Other references include, among others, Holly et al. (1998) and Ferrall and Shearer (1999).
20 A different but related idea is that the use of panel data allows control of unobserved heterogeneity and selection issues in a much more satisfactory way than in cross-sectional analysis. See, for instance, MacLeod and Parent (1999).

3. ARE CONTRACTS OPTIMAL?

We now turn to tests of contract optimality.
The papers we are going to survey all focus on the same question: do observed contracts have the properties predicted by contract theory? There is a sense in which the answer is always positive: given any contract, a theorist with enough ingenuity may be able to build an ad hoc theory that “explains” it. The operative word here is “ad hoc.” Clearly, there is no precise definition of what constitutes an ad hoc assumption, but there may be accepted standards. So, we can rephrase the optimality question thus: do the properties of observed contracts correspond to those that the currently standard models of contract theory predict? This new formulation makes it clear that a negative answer may be only temporary, as better models with new predictions are developed (ideally, in response to such rejections of currently standard models).

21 Currently, µ = .95 and λ = 1.25. In addition, the coefficient at any time is capped and floored (at 3.5 and .5, respectively). Note that the strict regulation avoids selection problems, because the insuree cannot choose between menus involving different bonus/malus coefficients, as is often the case in other countries.
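The bonus/malus rule of Section 2.5, with the parameter values given in footnote 21, can be sketched in a few lines of Python. This is only an illustration: the function names, the starting coefficient of 1.0, and the treatment of a year as simply "with" or "without" an accident are assumptions made here, not details taken from the regulation or from the paper discussed.

```python
# Illustrative sketch of the bonus/malus coefficient dynamics.
# Parameter values come from footnote 21; everything else is hypothetical.
MU = 0.95              # multiplier after a year without an accident
LAMBDA = 1.25          # multiplier after a year with an accident
FLOOR, CAP = 0.5, 3.5  # bounds on the coefficient


def next_coefficient(coefficient, had_accident):
    """Apply one year of the rule, then clamp the result to [FLOOR, CAP]."""
    factor = LAMBDA if had_accident else MU
    return min(CAP, max(FLOOR, coefficient * factor))


def premium_path(base_premium, history, start=1.0):
    """Return the premium charged in each year of `history`.

    `history` is a sequence of booleans (True = accident in that year).
    The premium is the base premium times the current coefficient;
    the starting coefficient of 1.0 is an assumption for illustration.
    """
    coefficient = start
    premiums = []
    for had_accident in history:
        premiums.append(base_premium * coefficient)
        coefficient = next_coefficient(coefficient, had_accident)
    return premiums


if __name__ == "__main__":
    # One claim-free year, one accident year, one claim-free year.
    print(premium_path(100.0, [False, True, False]))
```

Because each accident multiplies the current coefficient, the extra premium triggered by a further accident is larger the higher the coefficient already is (until the cap binds), which is the sense in which each accident raises the marginal cost of future accidents.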

3.1. Static, Complete Contracts

3.1.1. Managerial Pay

The standard model of moral hazard implies that managers’ pay should be sensitive to their firms’ performance. The “pay-performance sensitivity” has been estimated by many papers [for a recent survey of the evidence, see Murphy (1999)]. The seminal contribution is that of Jensen and Murphy (1990); using data on CEOs of U.S. firms from 1969 to 1983, they obtained what seemed to be very low estimates of the elasticity of executive compensation to firm performance. Their oft-quoted result was that when the firm’s value increases by $1,000, the (mean) manager’s wealth increases only by $3.25. The early reaction to Jensen and Murphy’s result was that it indicated inefficiently low incentives for top management (see, e.g., Rosen 1992). However, Haubrich (1994) showed that even fairly low levels of manager’s risk aversion (such as a relative index of risk aversion of about 5) were consistent with this empirical result. The intuition is that for large companies, changes in firm value can be very large and imply large swings in CEO wealth even for such low pay-performance sensitivity levels.

Moreover, more recent estimates point to much higher elasticities. Thus, Hall and Liebman (1998) use a more recent data set (1980–1994). They show that the spectacular increase in the stock options component of managers’ pay has made their pay much more sensitive to firm performance. Their mean (respectively median) estimate of the change in CEO wealth (salary, bonus, and the change in value of stocks and stock options) linked to a $1,000 increase in firm value indeed is about $25 (respectively $5.3). Much of it is due to the change in value of stocks and stock options.

Another testable implication of the moral hazard model is that pay-performance sensitivity should be inversely related to the variance of the measure of performance used (typically firm value for managers).
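One way to see where this inverse relation comes from is the textbook linear-contract model, which is used here purely as an illustrative assumption (the chapter does not commit to a specification). Output is effort plus normal noise with variance sigma^2, the manager has constant absolute risk aversion r and effort cost c e^2 / 2, and pay is a fixed part plus a share b of output. The surplus-maximizing share is then b* = 1 / (1 + r c sigma^2), which the following sketch evaluates; the function and parameter names are hypothetical.

```python
def optimal_slope(risk_aversion, effort_cost, variance):
    """Optimal pay-performance slope b* = 1 / (1 + r * c * sigma^2).

    Assumes the standard linear-contract setting: CARA utility with
    coefficient r, normal noise with variance sigma^2, and quadratic
    effort cost c * e**2 / 2.
    """
    return 1.0 / (1.0 + risk_aversion * effort_cost * variance)


if __name__ == "__main__":
    # Holding r and c fixed, a noisier performance measure lowers the slope.
    for variance in (0.0, 1.0, 4.0):
        slope = optimal_slope(risk_aversion=1.0, effort_cost=1.0, variance=variance)
        print(variance, slope)
```

With r = c = 1, the slope falls from 1 to 0.5 to 0.2 as the variance rises from 0 to 1 to 4. The comparative static holds only with risk aversion and effort cost held fixed, which is the "everything equal" qualification that makes the empirical test delicate.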
Aggarwal and Samwick (1999) show that, indeed, CEO pay is much less sensitive to performance for firms whose stock returns are more volatile.22 This result, however, may itself be sensitive to the choice of covariates.23 This illustrates a problem frequently encountered in this literature. Theory predicts the form of optimal contracts within simplified models, where comparative statics are easy to work out (one can change the level of uncertainty within a moral hazard model by varying one parameter). Taking such predictions to data typically requires some very strong “everything equal” qualification. In practice, firms differ by the uncertainty they face, but also by their size, market share, relationship to the clients, technology, internal organization, and other characteristics, all of which may moreover be correlated in various ways. In this context, sorting out one particular type of causality is a difficult task indeed.

22 Aggarwal and Samwick use panel data and include fixed CEO effects that allow them to control for CEO risk aversion.
23 For instance, Core and Guay (2000) find that the sign of the relationship is reversed when controlling for firm size.

Other models relate the use of a particular form of compensation to the characteristics of the task to be performed. Using various data sets, MacLeod and Parent (1999) find, for instance, that jobs using high-power incentives are associated with more autonomy on the job, and that a high local rate of unemployment results in less discretion in pay or promotion, confirming standard conclusions of incomplete contract theory.

Finally, one empirical puzzle in this literature is that firms do not seem to use relative performance evaluation of managers very much.24 The theory indeed predicts that managers should not be paid for performance that is due to “observable luck,” such as a favorable industrywide exchange rate shock or a change in input prices. Bertrand and Mullainathan (2001) revisit this issue of “pay for luck”; they find that manager pay in fact reacts about as much to performance changes that are predictable from observable luck measures as to unpredictable changes in performance. This clearly contradicts the theoretical prediction. However, Bertrand and Mullainathan also find that better-governed firms (such as those with large shareholders) give less pay for luck, as one would expect.

3.1.2. Sharecropping

Many papers have tested the moral hazard model of sharecropping, and we will quote only a few recent examples.25 Ackerberg and Botticini (2002) regress the type of contract (rental or sharecropping) on crop riskiness and tenant’s wealth. As explained previously, theory predicts that more risky crops are more likely to be grown under sharecropping contracts. If wealth is taken to be a proxy for risk aversion, we would also expect that richer (and presumably less risk averse) tenants are more likely to be under a rental contract. Now wealth is only an imperfect proxy for risk aversion, and as explained earlier, the unobserved component of risk aversion is likely to be correlated with crop riskiness. This implies that the error in the contract choice equation is correlated with one of the explanatory variables, and the estimators of such a naive regression are biased. To remedy this endogenous matching problem, Ackerberg and Botticini instrument the crop riskiness variable, using geographical variables as instruments. They find that the results are more compatible with theory than a naive regression would suggest. Moreover, the implicit bias in the naive estimators goes in the direction implied by a matching of more risk-averse tenants with less risky crops: it leads to overestimating the effect of crop risk and underestimating the effect of wealth.26

24 Gibbons and Murphy (1990) argue that they do.
25 Other recent works include, in particular, a series of papers by Allen and Lueck (1992, 1993, 1998, 1999).
26 An alternative strategy used by Dubois (2000a, 2000b) is to independently estimate individual risk aversion (as a function of the available covariates) from panel data on consumption (in line with the consumption-smoothing literature), then include the estimated parameter of risk aversion among the explanatory variables for the contract choice equation.

Laffont and Matoussi (1995) test a different variant of the moral hazard sharecropping model. In their story, tenants are risk neutral, but they face financial constraints that limit how much risk they may take. This model predicts that tenants with less working capital tend to work under sharecropping or even wage contracts. They find that their Tunisian data support this prediction.

In either of these variants, the theory used is drastically simplified. Empirical work must often extend the theory to take into account features of real-world applications. Dubois (1999) takes a step in that direction by taking into account landlords’ concerns that tenant effort may exhaust the soil and reduce future land fertility and hence future profits. This is a problem because contracts are incomplete: they cannot be made contingent on land fertility. Moreover, many contracts extend over only one season, so long-term contracts are not feasible. Then, sharecropping may be optimal even with risk-neutral tenants, as it improves future land fertility by reducing tenant effort. This “extended model” of sharecropping has some predictions that differentiate it from the “canonical model” of Stiglitz (1974) and that seem to fit Dubois’ Philippines data set better. For instance, the data show that incentives are lower powered for more valuable plots of land. This is incompatible with most versions of the canonical model; on the other hand, it is quite possible under the extended model. Moreover, observed incentives are lower powered for crops such as corn that tend to exhaust the soil, as the extended model predicts.

The theory also predicts that a technological shock that makes the effort of the tenant less crucial should increase the share of the landlord at the optimal contract. Hanssen (2001) argues that this is exactly what happened in the movie industry with the coming of sound in 1927.
When ﬁlms were silent, the exhibitor was expected to provide musical background and other live acts. With sound ﬁlms, all of this was incorporated in the movie itself, making the receipts less sensitive to the exhibitor’s effort. Hanssen shows that, as we would expect, contracts between ﬁlm companies and exhibitors rapidly moved from ﬂat-fee rentals to the revenue-sharing agreements that now dominate the industry. Finally, when long-term contracts are available, they are effective in providing incentives for noncontractible investment. If incentive provision is costly because of information rents, long-term contracts will be used only when maintenance beneﬁts are large enough. This idea is exploited by Bandiera (2001) in her study of agricultural contracts in nineteenth century Sicily. She ﬁnds that long-term contracts were indeed used for crops requiring higher maintenance efforts. There are still some features of sharecropping contracts that are harder to explain. One of them is that the share of output that goes to the tenant is not as responsive to economic fundamentals as theory predicts it should be. Young and Burke (2001) show that, in their sample of Illinois farms, almost all contracts have the same tenant share for all types of crops, and this share is one-half for 80 percent of the contracts. They argue that such inﬂexible terms are due to local custom: whereas shares do vary across regions, they are almost constant within regions. Young and Burke put this down to fairness concerns.

3.2. Multitasking

Both the managerial pay and the sharecropping literatures test traditional versions of the moral hazard model, but more recent variants have also been tested. Slade (1996) tests the multitask agency model of Holmstrom and Milgrom (1991) on contracts between oil firms and their service stations in the Vancouver area. Service stations not only deliver gasoline but also may act as convenience stores and/or repair cars. In multitask models, the form of the optimal contract crucially depends on complementarity patterns between tasks: incentives should be lower powered when tasks are more complementary. Slade argues that the convenience store task is likely to be more complementary to the gasoline task than the repairs task. Thus, the theory predicts that service stations that also do repairs should face higher-powered incentives than those that run convenience stores. Slade tests this prediction by running probits for contract type: service station operators may be lessee dealers (with high-powered incentives) or commissioned agents (with low-powered incentives). She finds that, as predicted by the theory, doing repairs increases the probability of running a lessee dealership, while having a convenience store reduces it.

3.3. Incomplete Contracts/Transaction Costs

The formal literature on incomplete contracts is still rather young, and to the best of our knowledge, it has not yet been subjected to econometric testing.27 On the other hand, a number of papers have tested the main intuitions from the transactions cost literature as developed by Williamson (1975, 1985, 1996). We will give only a few examples; the reader can refer to more detailed surveys such as Shelanski and Klein (1995).

Perhaps the best-known result from the transactions cost literature, following Williamson, is that, when relationship-specific investments matter more, contracts will have a longer duration (so as to avoid hold-up problems). This has been tested by Joskow (1987). He studies the relationship between coal suppliers and electric plants that burn coal in the United States in 1979. Williamson distinguishes four types of specificity. Joskow uses three of them to construct testable predictions:

• site specificity: some electric plants are “mine-mouth” (i.e., located close to the coal mine that supplies them)
• physical asset specificity: electric plants are designed to burn a specific type of coal (but not necessarily from a specific supplier); Joskow argues that this consideration matters most in the West, less in the Midwest, and least in the East
• dedicated asset specificity: this holds when large annual quantities are contracted for

27 We will discuss a descriptive study by Kaplan and Strömberg (1999).

Thus, transaction cost theory predicts that contracts should have longer duration when they involve mine-mouth plants, when the ﬁrms are in the West, and when large annual quantities are contracted for. Joskow runs a simple regression of contract duration on the three speciﬁcity variables and ﬁnds that all three hypotheses are robustly validated by the data. Crocker and Masten (1988) also test whether the determinants of contract duration conform to what transactions cost theory predicts, with one interesting twist. This goes back to the difﬁculty for the analyst to know whether actual contracts optimally maintain incentives for efﬁcient adaptation, while minimizing need for costly enforcement. Crocker and Masten argue that sometimes there is external interference from courts or government that makes contract terms deviate from the optimal trade-off in predictable ways, and this can be used by the econometrician. They use the example of natural gas, where wellhead regulation at times imposed price ceilings at the producer level. When such a price ceiling is binding, contracts should stipulate higher damages or take-or-pay rates to protect producers. Then, the contract is less efﬁcient, and the contract duration will be shorter – unless the seller fears that the next renegotiation will lead to much lower prices. Crocker and Masten indeed ﬁnd that when the price ceiling is much lower than the notional price (estimated as the latent variable in a probit model), contracts have a shorter duration. This effect is highly significant and matters a lot: price regulation may have shortened contract duration by half. Crocker and Reynolds (1993) look at the determinants of the degree of contract incompleteness itself. They argue that this results from a trade-off between the ex ante costs of crafting more detailed arrangements and the ex post costs of inefﬁciencies. 
Because the former increase with uncertainty and complexity and the latter increase with the likelihood of opportunistic behavior, one expects that contracts will be less complete when the environment is more uncertain and complex and when opportunistic behavior is less likely. Crocker and Reynolds test these predictions on a sample of U.S. Air Force procurement contracts. They run an ordered probit for the type of the contract on variables that proxy for uncertainty and the reputation of the supplier for opportunistic behavior. Their results support the theoretical prediction. Transactions cost theory also predicts that when quasi-rents are large, sometimes even long-term contracts will not sufﬁce, and vertical integration will take place. A number of papers have tested this prediction and generally found good support for it. An early example is Monteverdi and Teece (1982). They looked at the “make-or-buy” decision in the automobile industry: should components be produced in-house or should they be obtained from outside suppliers? They argued that the answer depends on whether making a particular component involves much or little engineering-speciﬁc knowledge. Then, they ran a probit of the make-or-buy decision on a set of variables that included a measure of engineering-speciﬁc knowledge provided to them by an independent engineer. They found that, as predicted by the theory, components tend to be made in-house when they involve more speciﬁc knowledge.


Some less obvious results from transactions cost theory have also been tested. Thus, Crocker and Masten (1991) look at the provisions for adjusting prices during the lifetime of contracts. Some contracts rely on “redetermination provisions”: price adjustment is predetermined through a more or less contingent price adjustment formula. Others emphasize renegotiation provisions, which more or less structure the process of renegotiating prices. Crocker and Masten argue that renegotiation provisions are more useful when the environment is more uncertain or the contract has a longer duration. To test this, they examine a 1982 sample of natural gas contracts in the United States. The observed price adjustment provisions are very diverse, but a probit model for renegotiation vs. redetermination validates the predictions of the theory.

Transactions cost theory has also been tested against other theories. For instance, Hubbard and Weiner (1991) use natural gas contracts in the United States in the 1950s to examine whether considerations of market power or efficient contracting matter most. Market power is often invoked in this market, because switching contracting parties is difficult and thus there is an element of bilateral monopoly. A linear regression for contract prices (paid by the pipeline to the gas producer) indeed appears to show some evidence for pipeline monopsony power: prices are higher in regions with more pipelines. However, Hubbard and Weiner show that this is due to a spurious correlation: growing markets have more pipelines, but they also exhibit larger quasi-rents. The existence of these quasi-rents motivates the use of a most-favored-nation clause, according to which a pipeline that has a contract with producer A and signs a new contract with producer B at a higher price must grant that new price to producer A.
Because the most-favored-nation clause tends to be associated with higher prices, this generates the positive correlation between prices and the number of pipelines. That correlation thus appears to be due to efﬁcient contracting considerations and not to market power on either side. Most of the empirical tests of transactions cost theory have been implemented on data from relatively thin markets, where quasi-rents are large. An interesting question is whether these intuitions extend to thicker markets. This has been studied by Hubbard (1999) for the trucking industry. This is an industry in which assets are not very speciﬁc, even less so when local markets are thick. Still, there is some variation on how thick local markets are, and transactions cost theory then predicts that spot arrangements should be more likely when the local market is thicker. Hubbard runs an ordered logit on the various contractual forms in the industry that conﬁrms this prediction. It is fair to say that most of the empirical literature has been supportive of the basic ideas of transactions cost theory. Nevertheless, it is hard to feel completely satisﬁed with the methodology of these studies. One ﬁrst problem is a consequence of the somewhat vague character of some of the concepts in the theory: because quasi-rents and uncertainty are such broad categories, it is very difﬁcult to ﬁnd good proxies for them. Besides, it is not always clear what the observability/veriﬁability status of these variables is. Consider uncertainty, for instance; in this literature, it is often proxied by the volatility of a price


index. But, this is certainly veriﬁable information, so one still has to explain why the contract is not made contingent on the value of that price index. A second problem with this literature is that it usually does not control for the possible endogeneity of right-hand-side variables. Consider, for instance, Joskow’s (1987) study. One of the right-hand-side variables is a dummy variable for a mine-mouth location. But, we are not given any evidence on the determinants of the decision to site a plant mine-mouth; and that may certainly depend on unobserved factors that also inﬂuence contract duration, making the mine-mouth variable endogenous in the regression of contract duration. Because Joskow does not attempt to correct for endogeneity or to test for it, the estimates may be biased. A related point is that Joskow does not condition on the fact that these ﬁrms are not vertically integrated,28 whereas the decision to not vertically integrate again may be correlated with contract duration. Clearly, these two points exemplify the endogenous matching problem that we mentioned repeatedly: regressions of contract variables on characteristics of the parties are fraught with selection bias and endogeneity problems. Finally, what does this tell us about the more recent theory of incomplete contracts, as exposited in Hart’s (1995) book? Because many of the underlying ideas started with transactions cost theory, one might think that the relative empirical success of the older theory somehow validates the newer one. However, this would certainly be premature, as argued by Whinston (2000) for theories of vertical integration. One ﬁrst point is that, because incomplete contracts theory is more formalized, it has a much richer set of predictions than transactions cost theory does. By implication, it exposes itself more to empirical refutation. A second point is that testing incomplete contracts theory is bound to be a much more demanding task. 
Although we have argued that transactions cost theory relies on quasi-rents that may be difficult to proxy properly, the situation is even worse for incomplete contracts theory, because predictions depend rather precisely on how the marginal returns to noncontractible investments are distributed among the parties. Measuring these marginal returns reliably enough to test the predictions of the theory will require much more detailed information on contracting environments than is usually present in our data sets.29

28 In a separate paper, Joskow (1985) explores the determinants of vertical integration for this same sample; but what we would want is a joint modeling of contract duration and the decision to integrate vertically.
29 Whinston (2001) and Baker and Hubbard (2001) also discuss this issue.

Of course, one may forgo econometrics for the moment and take a more descriptive look at the data. A first attempt to do this is the work by Kaplan and Strömberg (1999), who analyze a large number of venture capital contracts. The authors argue that venture capitalists (VCs) are the real-world entities that most closely approximate the investors of the theory; hence, relating theoretical predictions to real-life VC contracts will provide valuable insights about the relevance of theory. Indeed, some of their findings tend to support standard predictions of the incomplete contract literature. Separate allocation of cash flow and control rights is a standard feature of VC contracts. The allocation of rights is contingent on observed measures of financial and nonfinancial performance, especially at early stages of the relationship. Existing contracts are consistent with a basic prediction of the theory, namely that control should be left to the manager in case of success (then the VC keeps cash flow rights only), whereas it shifts to the VC when the firm’s performance is poor. Finally, the importance of noncompete and vesting provisions suggests that imperfect commitment and hold-up problems are indeed an important aspect of VC contracts.

However, some theories seem to fare less well than others. “Stealing” theories à la Hart and Moore (1998) or Gale and Hellwig (1982), for instance, rely on the impossibility of making contracts contingent on profits (or other measures of financial performance), an assumption that is not supported by the data. Finally, several problems are left open by the empirical investigation. For instance, existing theories cannot explain why we observe in these contracts that control rights are allocated across a number of dimensions, such as voting rights, board rights, or liquidation rights. Similarly, the variety and the complexity of the financial tools used to allocate rights (convertible securities with specific strikes, common and preferred stocks, and so on) go well beyond the simple settings (typically, debt vs. equity) considered so far.

Finally, some recent studies usefully remind us that there may be more to incomplete contracting than transactions cost theory or the more recent approach. Banerjee and Duflo (1999) focus on the Indian customized software industry, which writes specialized software for (usually) foreign clients. In this industry, the product is very difficult to describe ex ante; the client writes a vague description of what he wants, software firms bid by announcing a price and a time schedule, and the client chooses whom he will contract with.
Much of the process of describing the functions of the software is interactive and takes place after the contract is signed. Therefore, the contracts are highly incomplete and cost overruns are frequent: three-quarters of the contracts have cost overruns, averaging 25 percent of planned costs. Because the initial description of the software is so vague, it would be impossible for a court to decide in what proportions the overruns are due to the firm or to the client. In practice, the contracts are often renegotiated in case of cost overruns to increase the price the software firm is paid. Banerjee and Duflo find that the client is more generous in these renegotiations when he faces an older firm, especially if he has already contracted with that firm in the past. Banerjee and Duflo put it down to reputation effects: they argue that older firms have shown in the past that they were reliable, all the more so if the client has already dealt with them. They show that alternative explanations fit the data less well.30

30 In particular, this cannot be due to optimal risk-sharing, because younger firms tend to be smaller than older firms.

McMillan and Woodruff (1999) use a survey of private firms in Vietnam to investigate the determinants of trade credit. Vietnam does not have a reliable legal system, so trust matters a great deal. McMillan and Woodruff indeed find that a firm tends to grant more trade credit to its customers when these have no alternative supplier, when the supplier has more information about the customer’s reliability, and when the supplier belongs to a business or social network that makes information available and/or makes it easier to enforce sanctions.

Baker and Hubbard (2000a) investigate the impact on asset ownership of technological changes that modify the contractibility of actions. They consider the U.S. trucking industry, where the introduction, in the late 1980s, of on-board computers (OBCs) allowed contracts to be made contingent on various operating parameters of trucks (speed, etc.). Because of the exogenous enlargement of the space of feasible contracts, suboptimal behavior becomes monitorable, and the need for powerful incentive schemes (such as ownership by drivers) is reduced. Using a survey of the U.S. trucking fleet, they actually find that OBC adoption leads to less driver ownership. Not all OBCs are equal, however: some improve the monitoring of drivers and others improve the coordination of the fleets. Baker and Hubbard (2000b) argue that this distinction is relevant to the make-or-buy decision (whether the shipper should use an internal or an external fleet): equipment that improves monitoring (respectively coordination) should lead to more (respectively less) integration. Using the same survey, they find supporting evidence for this prediction.

3.4. Dynamics of Contracts

Finally, a few papers have tried to take the predictions of dynamic contract theory to data. This is a difficult task, if only because the theory is often inconclusive or relies on very strong assumptions that are difficult to maintain within an applied framework.31 Still, interesting insights have emerged from this line of work. Three types of models have been considered in the literature. One is the pure model of repeated adverse selection; a second considers repeated moral hazard; finally, a couple of papers have recently been devoted to empirical testing of models entailing symmetric learning.

3.4.1. Dynamic Models of Asymmetric Information

An important contribution is due to Dionne and Doherty (1994), whose model of repeated adverse selection with one-sided commitment transposes previous work by Laffont and Tirole (1990) to a competitive framework. The key testable prediction is that, in a repeated adverse selection framework of this kind, whenever commitment is possible for the insurer, optimal contracts entail experience rating and exhibit a “highballing” property (i.e., the insurance company makes positive profits in the first period, compensated by low, below-cost second-period prices). Dionne and Doherty test this property on Californian automobile insurance data. According to the theory, when contracts with and without commitment (from the insurer) are simultaneously available, contracts entailing commitments will typically attract low-risk agents. The presence of highballing is empirically characterized by the fact that the loss to premium ratio should rise with the cohort age. If insurance companies are classified according to their average loss per vehicle (which reflects the “quality” of their portfolio of insurees), one expects the premium growth to be negative for the best-quality portfolios; in addition, the corresponding slope should be larger for firms with higher average loss ratios. This prediction is confirmed by the data. Insurance companies are classified into three subgroups. The slope coefficient is negative and significant for the first group (with lowest average loss), positive and significant for the third group, and nonsignificant for the intermediate group. Dionne and Doherty conclude that the “highballing” prediction is not rejected.

31 For instance, most papers in the field assume that agents cannot freely save or borrow, so that the dynamics of their consumption can be fully monitored by the principal (whether the latter is an employer, a landlord, or an insurance company). When this assumption is relaxed, the models typically use very specific preferences (such as constant absolute risk aversion with monetary cost of effort) to guarantee that income effects do not matter. For a detailed discussion in a moral hazard context, see Chiappori et al. (1994).

In a recent contribution, Margiotta and Miller (2000) analyze a dynamic model of managerial compensation under moral hazard. Their framework is reminiscent of that introduced by Fudenberg, Holmstrom, and Milgrom (1990): the manager’s utility function exhibits constant absolute risk aversion, so that wealth effects do not make the analysis intractable. They estimate the model from longitudinal data on returns to firms and managerial compensation. Obviously, the dynamic nature of the data introduces more robustness into the estimations, compared with simple cross-sectional analysis. In particular, it allows mitigation of an obvious selection problem with cross-sectional data: the level of incentives provided by the manager’s contract should be endogenous to the firm’s situation, and the latter may impact the outcome in a nonobservable way. The conclusions drawn by Margiotta and Miller are particularly interesting in view of the Jensen–Murphy controversy. They find that, although the benefits of providing incentives are large, the costs are small, in the sense that even a relatively small fraction of the firm’s shares is generally sufficient to induce the required level of effort.

3.4.2. Symmetric Learning

Finally, several works test a model of symmetric but incomplete information and learning. The basic reference, here, is the labor contract paper by Harris and Holmstrom (1982), in which the employer and the employee have identical priors about the employee’s ability and learn at the same pace from the employee’s performance. This setting has been applied with success to labor contracts, but also to long-term insurance relationships.


An application to internal labor markets is proposed by Chiappori, Salanié, and Valentin (1999). Their model borrows the two main ingredients of the Harris and Holmstrom framework, namely symmetric learning and downward rigidity of wages (the latter being explained either by risk-sharing considerations as in the initial model or by hold-up problems and contractual incompleteness). They show that optimal contracts should then exhibit a “late beginner” effect: if two agents, A and B, are at the same wage level at date 0 and at date 2, but A’s wage at date 1 was higher, then B has better future prospects for date 3 and later. They test this prediction on data on contracts and careers within a French public firm. Interestingly enough, careers, in this context, must be analyzed as sequences of discrete promotions – a feature that requires specific econometric tools. The results very strongly confirm the predictions: the “late beginner” effect appears to be a crucial feature of careers in the context under consideration.

Recently, the same type of model has been applied to life insurance contracts by Hendel and Lizzeri (2000). They exploit an interesting database of contracts that includes information on the entire profile of future premiums. Some contracts involve commitment from the insurer, in the sense that the evolution of premiums will not be contingent on the insuree’s health status, whereas under the other contracts future premiums are increased if the insuree’s health condition deteriorates. According to the theory, commitment implies front loading (initial premiums should be higher than without commitment, because they include an insurance premium against the reclassification risk) and a lower lapsation rate (a fraction of the agents whose health has actually deteriorated would be strictly worse off if they were to change company). These predictions are satisfied by existing contracts. Even more interesting is the fact that this confirmation obtains only for general life insurance. Accidental death contracts exhibit none of these features, as one would expect, given that learning considerations are much less prominent.

Finally, in such a context, any friction that limits the agent’s mobility between contracts is welfare-improving, because the precommitment of insurees to stay in the pool helps mitigate the uninsurability of the reclassification risk. This idea is exploited by Crocker and Moran (1997) in a study of employer-provided health insurance contracts, for which precommitment is proxied by the difficulty for workers of switching jobs. They show that when employers must offer the same contract to all of their workers, then the optimal contract exhibits a coverage limitation that is inversely proportional to the degree of employee job lock. If, on the other hand, employers are able to offer multiple contracts that experience-rate the insurees, then the optimal contract exhibits full coverage of medical expenditures, albeit at second-period premiums that partially reflect each individual’s observable health status. Crocker and Moran confirm these predictions on insurance coverage data, using proxies for job lock: the insurance contracts associated with firms that offer a single policy exhibit coverage limitations that are decreasing in the amount of employee job lock, and those firms offering multiple plans to their workforce have higher levels of coverage that are insensitive to the degree of job lock.
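The front-loading and lapsation predictions discussed above for life insurance can be illustrated with a stylized two-period calculation. The sketch below is not the model of Hendel and Lizzeri; it assumes zero-profit insurers, no discounting, only two second-period health states, no attrition of the pool between periods, and that a healthy insuree can always lapse to an actuarially fair spot contract. All function names, parameter names, and numbers are hypothetical.

```python
"""Stylized two-period illustration of front loading under insurer commitment.

Assumptions (not taken from the surveyed papers): zero-profit insurers,
no discounting, two second-period health states, no attrition of the pool
between periods, and one-sided commitment (the insurer is bound, while the
insuree may lapse to the spot market).
"""


def spot_premiums(face, p1, p_low, p_high):
    """Actuarially fair premiums when each period is priced separately."""
    return {
        "period1": p1 * face,
        "period2_healthy": p_low * face,
        "period2_sick": p_high * face,
    }


def committed_premiums(face, p1, p_low, p_high, q_sick):
    """Premiums when the insurer commits not to reclassify the insuree.

    The second-period premium is capped at the healthy spot price, so a
    healthy insuree has no reason to lapse. The expected shortfall on
    insurees whose health deteriorates is prepaid in period 1 (front load).
    """
    front_load = q_sick * (p_high - p_low) * face
    return {
        "period1": p1 * face + front_load,
        "period2": p_low * face,
        "front_load": front_load,
    }


def insurer_expected_profit(face, p1, p_low, p_high, q_sick, premiums):
    """Expected two-period profit of the committed contract when nobody lapses."""
    expected_claims = p1 * face + (q_sick * p_high + (1 - q_sick) * p_low) * face
    return premiums["period1"] + premiums["period2"] - expected_claims


if __name__ == "__main__":
    params = dict(face=100_000, p1=0.001, p_low=0.002, p_high=0.010)
    spot = spot_premiums(**params)
    committed = committed_premiums(q_sick=0.1, **params)
    print("spot premiums:", spot)
    print("committed premiums:", committed)
    # An insuree whose health deteriorated would pay the sick spot price
    # after lapsing, so staying in the committed contract is cheaper.
    print("saving from not lapsing when sick:",
          spot["period2_sick"] - committed["period2"])
```

Under these assumptions the committed contract costs more in the first period and, for an insuree whose health has deteriorated, less in the second period than the spot sequence, which is the qualitative pattern the text associates with commitment. Setting `p_high` equal to `p_low` removes the front load, loosely mirroring the case where learning about risk plays little role.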


4. CONCLUSIONS

“Data! data! data!” he cried impatiently. “I can’t make bricks without clay.”
Arthur Conan Doyle, The Adventure of the Copper Beeches.

We hope this survey has shown that the econometrics of contracts is a very promising and burgeoning field. Although empirical testing of the theory of contracts started in the eighties, most of the papers we have surveyed were indeed written in the last 5 years. For a long time, econometricians could be heard echoing Sherlock Holmes’s complaint about lack of data on contracts. It is true that some researchers have gone far to find their data [as far as Renaissance Tuscany for Ackerberg and Botticini (2002)]. Still, it has proven much less difficult than expected to find data amenable to econometric techniques. In fact, we draw the impression from Bresnahan’s (1997) earlier World Congress survey that the situation is somewhat worse in industrial organization. It is still true that many papers in this field use similar data and/or focus on similar problems, as shown by the number of papers on sharecropping or natural gas we surveyed. We would certainly want to see wider-ranging empirical work in the future. Insurance data are very promising in that respect, because they are fairly standardized, come in large data sets, and can be used to test many different theories. It can also be hoped that, in the future, firms will be less averse to opening their personnel data to researchers, as they did to Baker, Gibbs, and Holmstrom (1994a, 1994b).

Our conclusion on the importance of incentive effects echoes that of Prendergast (1999) for incentives in firms: the recent literature, as surveyed in Section 2, provides very strong evidence that contractual forms have large effects on behavior. As the notion that “incentives matter” is one of the central tenets of economists of every persuasion, this should be comforting to the community. On the other hand, it raises an old puzzle: if contractual form matters so much, why do we observe such a prevalence of fairly simple contracts?
More generally, the question asked in Section 3 is whether observed contracts take the form predicted by the theory. As we have seen, the evidence is more mixed in that regard. However, it is reassuring to see that papers that control adequately for selection and endogeneity bias have generally been more supportive of the theory. Throughout this survey, we emphasized the crucial role of the selection, matching, and contract endogeneity issues. These problems are prevalent in the two approaches we distinguish (i.e., whether one is testing for the optimality of contracts or for the behavioral impact of given contractual forms). It can be argued that selection issues are probably even more difficult to address in the first case, because our theoretical understanding of situations involving “realistic” forms of unobserved heterogeneity is often very incomplete. To take but one example, Rothschild and Stiglitz’s (1976) celebrated model of insurance under adverse selection assumes identical preferences across agents. Should risk aversion differ across insurees as well, the shape of the equilibrium contract is not fully known for the moment.32

It is safe, however, to predict that where the theory cannot be reconciled with the facts, new and improved models will emerge. Thus we hope that some econometricians will be inspired by this survey to contribute to the growing literature on testing of contract theory, while negative empirical findings may prompt some theorists to improve the theory itself. As an example of this potentially fruitful dialog between theorists and econometricians, the empirical findings by Chiappori and Salanié (1997, 2000) and others that the standard models of insurance do not fit the data well in some insurance markets have led Chassagnon and Chiappori (1997), Jullien, Salanié, and Salanié (2000), and de Meza and Webb (2001) to propose new models of insurance that are based on a combination of moral hazard and adverse selection. Similarly, new tools have recently been developed that allow tackling the possible coexistence of several types of unobserved heterogeneity.33 We hope to see more of this interplay between theory and testing in the future.

ACKNOWLEDGMENTS

We thank our discussant Patrick Legros and Jeff Campbell, Pierre Dubois, Phillippe Gagnepain, Lars Hansen, Jim Heckman, Patrick Legros, Bruce Shearer, Steve Tadelis, and Rob Townsend for their comments. This paper was written while Salanié was visiting the University of Chicago, which he thanks for its hospitality.

32 See Landsberger and Meilijson (1999).
33 See Rochet and Stole in this volume (pp. 150–197).

References

Ackerberg, D. and M. Botticini (2002), “Endogenous Matching and the Empirical Determinants of Contract Form,” Journal of Political Economy, 110(3), 564–91.
Aggarwal, R. and A. Samwick (1999), “The Other Side of the Trade-off: The Impact of Risk on Executive Compensation,” Journal of Political Economy, 107, 65–105.
Akerlof, G. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” Quarterly Journal of Economics, 84, 488–500.
Allen, D. W. and D. Lueck (1992), “Contract Choice in Modern Agriculture: Cash Rent Versus Cropshare,” Journal of Law and Economics, 35, 397–426.
Allen, D. W. and D. Lueck (1993), “Transaction Costs and the Design of Cropshare Contracts,” Rand Journal of Economics, 24(1), 78–100.
Allen, D. W. and D. Lueck (1998), “The Nature of the Farm,” Journal of Law and Economics, 41, 343–386.
Allen, D. W. and D. Lueck (1999), “The Role of Risk in Contract Choice,” Journal of Law, Economics and Organization, 15(3), 704–736.
Ausubel, L. (1999), “Adverse Selection in the Credit Card Market,” mimeo, University of Maryland.


Bach, K. (1998), Negativauslese und Tarifdifferenzierung im Versicherungssektor. DUV, Schesslitz.
Baker, G., M. Gibbs, and B. Holmstrom (1994a), “The Internal Economics of the Firm: Evidence from Personnel Data,” Quarterly Journal of Economics, 109, 881–919.
Baker, G., M. Gibbs, and B. Holmstrom (1994b), “The Wage Policy of a Firm,” Quarterly Journal of Economics, 109, 921–955.
Baker, G. and T. Hubbard (2000a), “Contractibility and Asset Ownership: On-Board Computers and Governance in U.S. Trucking,” NBER Working Paper 7634.
Baker, G. and T. Hubbard (2000b), “Make vs. Buy in Trucking: Asset Ownership, Job Design, and Information,” mimeo, Harvard University.
Baker, G. and T. Hubbard (2001), “Empirical Strategies in Contract Economics: Information and the Boundary of the Firm,” American Economic Review, 91, 189–194.
Bandiera, O. (2001), “On the Structure of Tenancy Contracts: Theory and Evidence from 19th Century Rural Sicily,” CEPR Working Paper 3032.
Banerjee, A. and E. Duflo (1999), “Reputation Effects and the Limits of Contracting: A Study of the Indian Software Industry,” mimeo, MIT.
Banerjee, A., P. Gertler, and M. Ghatak (2002), “Empowerment and Efficiency: Tenancy Reform in West Bengal,” Journal of Political Economy, 110, 239–280.
Bertrand, M. and S. Mullainathan (2001), “Are CEOs Rewarded for Luck? The Ones without Principals are,” Quarterly Journal of Economics, 116, 901–932.
Bolduc, D., B. Fortin, F. Labrecque, and P. Lanoie (1997), “Workers’ Compensation, Moral Hazard and the Composition of Workplace Injuries,” mimeo, HEC, Montreal.
Boyer, M. and G. Dionne (1989), “An Empirical Analysis of Moral Hazard and Experience Rating,” Review of Economics and Statistics, 71, 128–134.
Bresnahan, T. (1997), “Testing and Measurement in Competition Models,” in Advances in Economics and Econometrics–Theory and Applications, Volume 3, (ed. by D. Kreps and K. Wallis), Econometric Society Monographs, 28, Cambridge University Press, pp. 61–81.
Browne, M., and R. Puelz (1999), “The Effect of Legal Rules on the Value of Economic and Non-Economic Damages and the Decision to File,” Journal of Risk and Uncertainty, 18, 189–213.
Brugiavini, A. (1993), “Uncertainty Resolution and the Timing of Annuity Purchase,” Journal of Public Economics, 50, 31–62.
Cardon, J. and I. Hendel (2001), “Asymmetric Information in Health Insurance: Evidence from the National Health Expenditure Survey,” Rand Journal of Economics, 32, 408–427.
Cawley, J. and T. Philipson (1999), “An Empirical Examination of Information Barriers to Trade in Insurance,” American Economic Review, 89, 827–846.
Chevalier, J. and G. Ellison (1997), “Risk Taking by Mutual Funds as a Response to Incentives,” Journal of Political Economy, 105, 1167–1200.
Chevalier, J. and G. Ellison (1999), “Career Concerns of Mutual Fund Managers,” Quarterly Journal of Economics, 114, 389–432.
Chassagnon, A. and P. A. Chiappori (1997), “Insurance under Moral Hazard and Adverse Selection: The Competitive Case,” mimeo, DELTA.
Chiappori, P. A. (2000), “Econometric Models of Insurance under Asymmetric Information,” Handbook of Insurance, (ed. by G. Dionne), Amsterdam: North-Holland.
Chiappori, P. A., J. Abbring, J. Heckman, and J. Pinquet (2001), “Testing for Adverse Selection Versus Moral Hazard from Dynamic Data,” mimeo, University of Chicago.


Chiappori, P. A., F. Durand, and P. Y. Geoffard (1998), “Moral Hazard and the Demand for Physician Services: First Lessons from a French Natural Experiment,” European Economic Review, 42, 499–511.
Chiappori, P. A., P. Y. Geoffard, and E. Kyriazidou (2000), “Cost of Time, Moral Hazard, and the Demand for Physician Services,” mimeo, University of Chicago.
Chiappori, P. A., I. Macho, P. Rey, and B. Salanié (1994), “Repeated Moral Hazard: The Role of Memory and Commitment, and the Access to Credit Markets,” European Economic Review, 38, 1527–1553.
Chiappori, P. A. and B. Salanié (1997), “Empirical Contract Theory: The Case of Insurance Data,” European Economic Review, 41, 943–950.
Chiappori, P. A. and B. Salanié (2000), “Testing for Asymmetric Information in Insurance Markets,” Journal of Political Economy, 108, 56–78.
Chiappori, P. A., B. Salanié, and J. Valentin (1999), “Early Starters versus Late Beginners,” Journal of Political Economy, 107, 731–760.
Cochrane, J. (1991), “A Simple Test of Consumption Insurance,” Journal of Political Economy, 99, 957–976.
Core, J. and W. Guay (2000), “The Other Side of the Trade-off: The Impact of Risk on Executive Compensation: a Comment,” mimeo, Wharton School.
Crocker, K. and S. Masten (1988), “Mitigating Contractual Hazards: Unilateral Options and Contract Length,” Rand Journal of Economics, 19, 327–343.
Crocker, K. and S. Masten (1991), “Pretia Ex Machina? Prices and Process in Long-Term Contracts,” Journal of Law and Economics, 34, 69–99.
Crocker, K. and J. Moran (1997), “Commitment and the Design of Optimal Agreements: Evidence from Employment-Based Health Insurance Contract,” mimeo, University of Michigan.
Crocker, K. and K. Reynolds (1993), “The Efficiency of Incomplete Contracts: An Empirical Analysis of Air Force Engine Procurement,” Rand Journal of Economics, 24, 126–146.
Dahlby, B. (1983), “Adverse Selection and Statistical Discrimination: An Analysis of Canadian Automobile Insurance,” Journal of Public Economics, 20, 121–130.
de Meza, D. and D. Webb (2001), “Advantageous Selection in Insurance Markets,” Rand Journal of Economics, 32, 249–262.
Dionne, G. and N. Doherty (1994), “Adverse Selection, Commitment and Renegotiation: Extension to and Evidence from Insurance Markets,” Journal of Political Economy, 102(2), 210–235.
Dionne, G. and R. Gagné (2001), “Deductible Contracts against Fraudulent Claims: Evidence from Automobile Insurance,” Review of Economics and Statistics, 83, 290–301.
Dionne, G., C. Gouriéroux, and C. Vanasse (2001), “Testing for Adverse Selection in the Automobile Insurance Market: A Comment,” Journal of Political Economy, 109, 444–453.
Dionne, G. and P. St-Michel (1991), “Worker’s Compensation and Moral Hazard,” Review of Economics and Statistics, 73, 236–244.
Dionne, G., and C. Vanasse (1996), “Une évaluation empirique de la nouvelle tarification de l’assurance automobile au Québec,” mimeo, Montreal.
Dubois, P. (1999), “Moral Hazard, Land Fertility, and Sharecropping in a Rural Area of the Philippines,” CREST Working Paper 9930.
Dubois, P. (2000a), “Assurance complète, hétérogénéité des préférences et métayage au Pakistan,” Annales d’Économie et de Statistiques, 59, 1–36.


Dubois, P. (2000b), “Consumption Insurance with Heterogeneous Preferences: Can Sharecropping Help Complete Markets?,” mimeo, INRA, Toulouse.
Ferrall, C. and B. Shearer (1999), “Incentives and Transactions Cost within the Firm: Estimating an Agency Model Using Payroll Records,” Review of Economic Studies, 66, 309–338.
Finkelstein, A. and J. Poterba (2000), “Adverse Selection in Insurance Markets: Policyholder Evidence from the U.K. Annuity Market,” NBER Working Paper W8045.
Fortin, B. and P. Lanoie (1992), “Substitution between Unemployment Insurance and Workers’ Compensation,” Journal of Public Economics, 49, 287–312.
Fortin, B. and P. Lanoie (1998), “Effects of Workers’ Compensation: A Survey,” CIRANO Scientific Series, Montréal, 98s–104s.
Fortin, B., P. Lanoie, and C. Laporte (1995), “Is Workers’ Compensation Disguised Unemployment Insurance?” CIRANO Scientific Series, Montréal, 95s-148s.
Friedman, B. M. and M. J. Warshawski (1990), “The Cost of Annuities: Implications for Savings Behavior and Bequests,” Quarterly Journal of Economics, 105, 135–154.
Fudenberg, D., B. Holmstrom, and P. Milgrom (1990), “Short-Term Contracts and Long-Term Agency Relationships,” Journal of Economic Theory, 51, 1–31.
Gagnepain, P. and M. Ivaldi (2001), “Incentive Regulatory Policies: The Case of Public Transit Systems in France,” mimeo.
Gale, D. and M. Hellwig (1985), “Incentive-Compatible Debt Contracts: The One-Period Problem,” Review of Economic Studies, 52, 647–663.
Gibbons, R. and K. Murphy (1990), “Relative Performance Evaluation of Chief Executive Officers,” Industrial and Labor Relations Review, 43, S30–S51.
Gibbons, R. and M. Waldman (1999), “Careers in Organizations: Theory and Evidence,” Handbook of Labor Economics, Volume 3b (ed. by O. Ashenfelter and D. Card), North-Holland, Amsterdam, 2373–2437.
Gouriéroux, C. (1999), “The Econometrics of Risk Classification in Insurance,” Geneva Papers on Risk and Insurance Theory, 24, 119–139.
Gouriéroux, C., A. Monfort, and A. Trognon (1984), “Pseudo-Maximum Likelihood Methods: Theory,” Econometrica, 52, 681–700.
Hall, J. and J. Liebman (1998), “Are CEOs Really Paid Like Bureaucrats?” Quarterly Journal of Economics, 113, 653–691.
Hanssen, R. (2001), “The Effect of a Technological Shock on Contract Form: Revenue Sharing in Movie Exhibition and the Coming of Sound,” mimeo.
Harris, M. and B. Holmstrom (1982), “A Theory of Wage Dynamics,” Review of Economic Studies, 49, 315–333.
Hart, O. (1995), Firms, Contracts, and Financial Structure. London: Oxford University Press.
Hart, O. and J. Tirole (1988), “Contract Renegotiation and Coasian Dynamics,” Review of Economic Studies, 55, 509–540.
Haubrich, J. (1994), “Risk Aversion, Performance Pay, and the Principal-Agent Model,” Journal of Political Economy, 102, 258–276.
Hendel, I. and A. Lizzeri (2000), “The Role of Commitment in Dynamic Contracts: Evidence from Life Insurance,” Working Paper, Princeton University.
Holly, A., L. Gardiol, G. Domenighetti, and B. Bisig (1998), “An Econometric Model of Health Care Utilization and Health Insurance in Switzerland,” European Economic Review, 42, 513–522.
Holmstrom, B. and P. Milgrom (1991), “Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership and Job Design,” Journal of Law, Economics and Organization, 7, 24–51.
Hubbard, T. (1999), “How Wide Is the Scope of Hold-Up-Based Theories? Contractual Form and Market Thickness in Trucking,” mimeo, UCLA.
Hubbard, R. and R. Weiner (1991), “Efficient Contracting and Market Power: Evidence from the U.S. Natural Gas Industry,” Journal of Law and Economics, 34, 25–67.
Ivaldi, M. and D. Martimort (1994), “Competition under Nonlinear Pricing,” Annales d’Economie et de Statistiques, 34, 72–114.
Jensen, M. and K. Murphy (1990), “Performance Pay and Top-Management Incentives,” Journal of Political Economy, 98, 225–264.
Joskow, P. (1985), “Vertical Integration and Long-Term Contracts: The Case of Coal-Burning Electric Generation Plants,” Journal of Law, Economics, and Organization, 1, 33–80.
Joskow, P. (1987), “Contract Duration and Relationship-Specific Investments: Empirical Evidence from Coal Markets,” American Economic Review, 77, 168–185.
Jullien, B., B. Salanié, and F. Salanié (2000), “Screening Risk-Averse Agents under Moral Hazard,” mimeo.
Kaplan, S. and P. Strömberg (1999), “Financial Contracting Theory Meets the Real World: Evidence from Venture Capital Contracts,” mimeo.
Laffont, J. J. (1997), “Game Theory and Empirical Economics: The Case of Auction Data,” European Economic Review, 41, 1–35.
Laffont, J.-J. and M. Matoussi (1995), “Moral Hazard, Financial Constraints and Sharecropping in El Oulja,” Review of Economic Studies, 62, 381–399.
Laffont, J.-J. and J. Tirole (1990), “Adverse Selection and Renegotiation in Procurement,” Review of Economic Studies, 57(4), 597–625.
Landsberger, M. and I. Meilijson (1999), “A General Model of Insurance under Adverse Selection,” Economic Theory, 14, 331–352.
Lavergne, P., and A. Thomas (2000), “Semiparametric Estimation and Testing in a Model of Environmental Regulation with Adverse Selection,” mimeo, INSEE, Paris.
Lazear, E. (2000), “Performance Pay and Productivity,” American Economic Review, 90, 1346–1361.
Lemmon, M., J. Schallheim, and J. Zender (2000), “Do Incentives Matter? Managerial Contracts for Dual-Purpose Funds,” Journal of Political Economy, 108, 273–299.
MacLeod, B., and D. Parent (1999), “Job Characteristics and the Form of Compensation,” Research in Labor Economics, 18, 177–242.
Manning, W., J. Newhouse, N. Duan, E. Keeler, and A. Leibowitz (1987), “Health Insurance and the Demand for Medical Care: Evidence from a Randomized Experiment,” American Economic Review, 77, 251–277.
Margiotta, M. and R. Miller (2000), “Managerial Compensation and the Cost of Moral Hazard,” International Economic Review, 41, 669–719.
Masten, S. and K. Crocker (1985), “Efficient Adaptation in Long-Term Contracts: Take-or-Pay Provisions for Natural Gas,” American Economic Review, 75, 1083–1093.
McMillan, J. and C. Woodruff (1999), “Interfirm Relationships and Informal Credit in Vietnam,” Quarterly Journal of Economics, 114, 1285–1320.
Monteverdi, K. and D. Teece (1982), “Supplier Switching Costs and Vertical Integration in the Automobile Industry,” Bell Journal of Economics, 13, 206–213.


Murphy, K. (1999), “Executive Compensation,” in Handbook of Labor Economics, Vol. 3, (ed. by O. Ashenfelter and D. Card), Amsterdam: North-Holland.
Oyer, P. (1998), “Fiscal Year End and Nonlinear Incentive Contracts: The Effect on Business Seasonality,” Quarterly Journal of Economics, 113, 149–185.
Paarsch, H. and B. Shearer (1999), “The Response of Worker Effort to Piece Rates: Evidence from the British Columbia Tree-Planting Industry,” Journal of Human Resources, 34, 643–667.
Paarsch, H. and B. Shearer (2000), “Piece Rates, Fixed Wages, and Incentive Effects: Statistical Evidence from Payroll Records,” International Economic Review, 41, 59–92.
Prendergast, C. (1999), “The Provision of Incentives in Firms,” Journal of Economic Literature, 37, 7–63.
Puelz, R. and A. Snow (1994), “Evidence on Adverse Selection: Equilibrium Signaling and Cross-Subsidization in the Insurance Market,” Journal of Political Economy, 102, 236–257.
Richaudeau, D. (1999), “Automobile Insurance Contracts and Risk of Accident: An Empirical Test Using French Individual Data,” Geneva Papers on Risk and Insurance Theory, 24, 97–114.
Rosen, S. (1992), “Contracts and the Market for Executives,” in Contract Economics, (ed. by L. Werin and H. Wijkander), Oxford, UK: Basil Blackwell.
Rothschild, M. and J. Stiglitz (1976), “Equilibrium in Competitive Insurance Markets,” Quarterly Journal of Economics, 90, 629–649.
Shearer, B. (1999), “Piece Rates, Fixed Wages and Incentives: Evidence from a Field Experiment,” mimeo, Laval University.
Shelanski, H. and P. Klein (1995), “Empirical Research in Transaction Cost Economics: A Review and Assessment,” Journal of Law, Economics and Organization, 11, 335–361.
Slade, M. (1996), “Multitask Agency and Contract Choice: An Empirical Exploration,” International Economic Review, 37, 465–486.
Stiglitz, J. (1974), “Incentives and Risk Sharing in Sharecropping,” Review of Economic Studies, 41, 219–255.
Stiglitz, J. and A. Weiss (1981), “Credit Rationing in Markets with Imperfect Information,” American Economic Review, 71, 393–410.
Timmins, C. (2002), “Measuring the Dynamic Efficiency Costs of Regulators’ Preferences: Municipal Water Utilities in the Arid West,” Econometrica, 70(2), 603–29.
Toivanen, O. and R. Cressy (1998), “Is There Adverse Selection on the Credit Market?” mimeo, Warwick.
Townsend, R. (1994), “Risk and Insurance in Village India,” Econometrica, 62, 539–591.
Vuong, Q. (1989), “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses,” Econometrica, 57, 307–334.
Whinston, M. (2000), “On the Transaction Cost Determinants of Vertical Integration,” mimeo, Northwestern University.
Whinston, M. (2001), “Assessing the Property Rights and Transaction Cost Theories of Firm Scope,” American Economic Review, 91, 184–188.
Williamson, O. (1975), Markets and Hierarchies: Analysis and Antitrust Implications. New York: The Free Press.
Williamson, O. (1985), The Economic Institutions of Capitalism: Firms, Markets, Relational Contracting. New York: The Free Press.


Williamson, O. (1996), The Mechanisms of Governance. London: Oxford University Press.
Wolak, F. (1994), “An Econometric Analysis of the Asymmetric Information, Regulator-Utility Interaction,” Annales d’Economie et de Statistiques, 34, 13–69.
Young, P. and M. Burke (2001), “Competition and Custom in Economic Contracts: A Case Study of Illinois Agriculture,” American Economic Review, 91, 559–573.

CHAPTER 5

The Economics of Multidimensional Screening

Jean-Charles Rochet and Lars A. Stole

1. MOTIVATION AND INTRODUCTION Since the late 1970s, the theory of optimal screening contracts has received considerable attention. The analysis has been usefully applied to such topics as optimal taxation, public good provision, nonlinear pricing, imperfect competition in differentiated industries, regulation with information asymmetries, government procurement, and auctions, to name a few prominent examples.1 The majority of these applications have made the assumption that preferences can be ordered by a single dimension of private information, largely to facilitate ﬁnding the optimal solution of the design problem. However, in most cases that we can think of, a multidimensional preference parameterization seems critical to capturing the basic economics of the environment. For example, consider the case of duopolists in a market where each ﬁrm competes with nonlinear pricing over its product line. In many examples of nonlinear pricing (e.g., Mussa and Rosen 1978 and Maskin and Riley 1984), it is natural to think of consumers’ preferences being ordered by the willingness to pay for additional units of quantity or quality. But, if we believe that competition between duopolists is imperfect in the horizontal dimension as suggested, for example, by models such as Hotelling’s (1929), then we need to introduce a form of horizontal heterogeneity as well. As a consequence, a minimally accurate model of imperfect competition between duopolists suggests including two dimensions of heterogeneity – vertical and horizontal. There are several additional economic applications that naturally lend themselves to multidimensional heterogeneity. r General models of pricing. In some instances, a ﬁrm may offer a single product over which the preferences of the consumer may depend 1

Among the seminal contributions, we can cite Mirrlees (1971, 1976) for optimal taxation, Green and Laffont (1977) for public good provision, Spence (1980) and Goldman, Leland, and Sibley (1984) for nonlinear pricing, Mussa and Rosen (1978) for imperfect competition in differentiated industries, Baron and Myerson (1982), Baron and Besanko (1984), McAfee and McMillan (1987), and Laffont and Tirole (1986, 1993) for regulation, and Myerson (1981) for auctions.

Multidimensional Screening


importantly on several dimensions of uncertainty (e.g., tastes or the marginal utility of income). In other instances, a firm may be selling an array of distinct products, of which consumers may desire any subset of the total bundle of goods. In this latter case, the dimension of heterogeneity of consumers' preferences for the firm's products will be at least as large as the number of distinct products.

• Regulation under richer asymmetries of information. As noted in the seminal article by Baron and Myerson (1982) on regulation under private information, at least two dimensions of private cost information naturally arise: fixed and marginal costs. Another example is studied by Lewis and Sappington (1988), in which the regulator is simultaneously uncertain about cost and demand. If we wish to take the normative consequences of asymmetric information models of regulation seriously, we should check the robustness of the results to such reasonable bidimensional private information.

• Income effects and related phenomena. It often makes sense to think of two-dimensional information when privately known budget constraints or other forms of limited liability are present. For example, how should a seller design a price schedule when customers have random valuations and simultaneously random budget constraints?

• Auctions. Similar to the aforementioned problem, we may suppose that multiple buyers bid for a single item, but their preferences depend on a privately known budget constraint in addition to a private valuation for the good (as in Che and Gale, 1998, 1999, 2000). Or, in another important auction setting, suppose (as in Jehiel, Moldovanu, and Stacchetti 1999) that a buyer's preferences depend not only on his own valuation of the good, but also on the privately known externality from someone else getting the good instead (e.g., two downstream firms bid for an exclusive franchise and the loser must compete against the winner with an inferior product).
Although we do not consider the auction literature in depth in this paper, the techniques of optimal contract design in multidimensional environments are clearly relevant.2 Unfortunately, the techniques for confronting multidimensional settings are far less straightforward than in the one-dimensional paradigm. This difficulty has meant that the bulk of applied theory papers in the self-selection literature are based on one-dimensional models of heterogeneity. As a consequence, the results of these economic applications remain uncomfortably restrictive and possibly inaccurate (or at least nonrobust) in their conclusions. In this sense, we have been searching under the proverbial street lamp, looking for our lost keys, not because that is where we believe them to lie, but because it is apparently the only place where we can see. This survey is an attempt to catalog and

2 Other multidimensional auction problems are studied by Gal, Landsberger, and Nemirovski (1999) and Zheng (2000).


Rochet and Stole

explain the terrain that has been discovered in the brief forays away from the one-dimensional street lamp, indicating both what we have learned and how light or dark the night sky actually is.

In Section 2, we review the one-dimensional paradigm, emphasizing those aspects that will generate problems as we extend the analysis to multiple dimensions. In Section 3, the general multidimensional paradigm is explained for both the discrete and continuous settings. We illustrate the concepts in a simple two-type "multidimensional" model, explaining how the multidimensionality of types introduces new economic and mathematical aspects of the screening problem. In Sections 4–9, we specialize our discussion to specific classes of multidimensional models that have proven successful in the applied literature. Section 4 presents results on separation and aggregation that greatly simplify multidimensional screening. Section 5 considers environments in which there is a single, nonmonetary contracting variable, but multiple dimensions of type, a scenario that also frequently gives rise to explicit solutions. Section 6 looks at a further specialized subset of models (from Section 5) that are economically important and mathematically tractable: bidimensional private information settings in which one dimension of information enters the agent's utility function additively. Section 7 considers a series of multidimensional models that have been successfully applied to competitive environments. Section 8 considers a distinct set of multidimensional environments in which information is revealed over time. Finally, Section 9 considers the more subtle problems inherent in general models of multiple instruments and multidimensional preferences; here, most papers written to date have considered the scenario of multiproduct monopoly bundling, so we study this model in some detail. Section 10 concludes.

2.
A REVIEW OF THE ONE-DIMENSIONAL PREFERENCE MODEL

Although it is often recognized that agents typically have several characteristics and that principals typically have several instruments, the screening problem has most often been examined under the assumption of a single characteristic and a single instrument (in addition to monetary transfers). In this case, several qualitative results can be obtained with some generality:

1. When the single-crossing condition is satisfied, only local (first- and second-order) incentive compatibility constraints can be binding.
2. In most problems, the second-order (local) incentive compatibility constraints can be ignored, provided that the distribution of types is not too irregular.
3. If bunching is ruled out, then the principal's optimal mechanism is found in two steps:
(a) First, compute the minimum expected rent of the agent as a function of the allocation of (nonmonetary) goods.


(b) Second, find the allocation of goods that maximizes the surplus of the principal, net of the expected rent computed in (a).

To understand the difficulties inherent in designing optimal screening contracts when preferences are multidimensional, it is useful to first review this basic one-dimensional paradigm. This will serve both as a building block for the multidimensional extensions and as an illustration of how one-dimensional preferences generate simplicity and recursion in the optimization program.

We will use a simple nonlinear pricing framework similar to Mussa and Rosen (1978) as our basic screening environment, elaborating as appropriate. Suppose that a monopolist sells its products using a nonlinear tariff, $P(q)$, where $q$ is the quantity chosen by the consumer and $P(q)$ is the associated price. The potential consumers of the firm's good have preferences that can be indexed by a single-dimensional parameter, $\theta \in \Theta \equiv [\underline{\theta}, \bar{\theta}]$, which is distributed in the population according to the absolutely continuous distribution function $F(\theta)$, where $f(\theta) \equiv F'(\theta)$ represents the associated density. Let each consumer's preferences for consuming $q \in Q \equiv [0, \bar{q}]$ for a price of $P$ be given by $u = v(q, \theta) - P$. Note that preferences are linear in money. To place some additional structure on the effect of $\theta$, we assume the well-known single-crossing property that $v_{q\theta}$ has a constant sign; in this paper, we will associate higher types with higher marginal valuations of consumption; hence, $v_{q\theta} > 0$. This condition makes the one-dimensional assumption restrictive.3 It is worth noting that this condition has two equivalent implications: (i) the indifference curves of any two types of consumers cross at most once in price-quantity space, and (ii) the associated demand curves do not intersect and are completely ordered as a family of curves given by $p = v_q(q, \theta)$. We will begin with the even simpler linear-quadratic setting in which $v(q, \theta) = \theta q - \tfrac{1}{2} q^2$.
In this case, the associated demand curves are parallel lines, $p = \theta - q$.

There are two methodologies used to solve one-dimensional screening problems: what we refer to as the parametric-utility approach and the demand-profile approach. The former has been more commonly used in the applied theory literature, but the latter provides useful conceptual insights, particularly in the multidimensional context, that are easily overlooked in the former methodology. For completeness, we will briefly present both here.4

3 In a discrete setting, for example, multidimensional types can always be reassigned to a one-dimensional parameter, but the single-crossing property is not always preserved.

4 Most recent methodological treatments of the screening problem use the parametric-utility approach, referred to by Wilson (1993a) as the "disaggregated-type" approach. See, for example, the article by Guesnerie and Laffont (1984), and the relevant sections in Fudenberg and Tirole (1991), Myerson (1991), Mas-Colell, Whinston, and Green (1995), and Stole (1997). The demand-profile approach is thoroughly expounded in Wilson (1993a). Brown and Sibley (1986) and Wilson (1993a) discuss both approaches.


2.1.

The Parametric-Utility Approach

The basic methodology we follow here was initially developed by Mirrlees (1971) and applied to nonlinear pricing by Mussa and Rosen (1978). The firm in our setting cares only about expected profit and so seeks to maximize

$$E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ P(q(\theta)) - c\,q(\theta) \right] dF(\theta),$$

where $q(\theta)$ is a parametric representation of the choice of type-$\theta$ consumers, and $c$ is the constant marginal cost, so that producing $q$ units for a given consumer costs $cq$. Suppose that our monopolist offers a nonlinear, lower-semi-continuous pricing schedule $P(q)$ defined over the compact domain $Q$. Then, we can define a type-$\theta$ consumer's indirect utility under this scheme as

$$u(\theta) \equiv \max_{q \in Q} \{ v(q, \theta) - P(q) \}.$$

Provided that the derivatives of $v$ are bounded, $u(\theta)$ is absolutely continuous. Applying the revelation principle, we can reparameterize our problem and focus on maximizing expected profits over all incentive-compatible and individually rational mechanisms, $\{p(\theta), q(\theta)\}_{\theta \in \Theta}$. As is well known, a mechanism in this context is incentive-compatible if and only if, for almost all $\theta$, we have $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ and $q(\theta)$ is nondecreasing.5 The former condition is equivalent to the local first-order condition and arises as a natural analog of the envelope condition in Roy's identity; the latter is equivalent to the local second-order condition. When preferences satisfy the single-crossing property, the local second-order condition implies a global condition as well. Hence, our monopolist firm can maximize expected profits subject to $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ and the monotonicity of $q$.

Given that our incentive compatibility conditions are stated in terms of $u$ and $q$, it is useful to transform our monopolist's program from price-quantity space to utility-quantity space. Because $S(q, \theta) \equiv v(q, \theta) - cq$ represents joint surplus from producing $q$ units of output to be consumed by a type-$\theta$ consumer, the firm's expected profit can be restated as

$$E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ S(q(\theta), \theta) - u(\theta) \right] dF(\theta). \tag{2.1}$$

Hence, the monopolist maximizes (2.1) over $\{q(\theta), u(\theta)\}_{\theta \in \Theta}$ subject to $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$, $q(\theta)$ nondecreasing, and individual rationality. Note that this program is entirely defined by the social surplus function $S(q, \theta)$ and the partial derivative of the consumer's utility function with respect to $\theta$. For example, the setting in which utility over quantity is $v(q, \theta) = \theta q - \tfrac{1}{2} q^2$ and cost is $cq$ is formally equivalent to the setting in which a monopolist sells a

5 Throughout, we use the notation $\dot{x}(y)$ to represent the derivative of $x$ with respect to $y$.


product line with various qualities, in which the consumer's value of consuming one unit of quality $q$ is given by $\tilde{v}(q, \tilde{\theta}) = \tilde{\theta} q$ and the cost of producing such a unit is $\tfrac{1}{2} q^2$, where $\tilde{\theta} = \theta - c$. Both settings give rise to identical surplus functions and partial derivatives with respect to type, and hence have identical optimal price schedules. In this sense, there is little to distinguish the use of quality [as in Mussa and Rosen's (1978) seminal paper] from quantity [as in Maskin and Riley's (1984) generalization of this model]. Fortunately, in both cases, the operative instruments begin with the letter q and, as a pleasant historical accident, we can refer to this second-best allocation as the MR allocation. We will nonetheless focus our attention on the quantity variation of this model.

As a technical simplification, we use the local first-order condition for truthtelling to replace $u$ in the firm's objective via integration by parts. The result is an objective function that is maximized over $\{\{q(\theta)\}_{\theta \in \Theta}, u(\underline{\theta})\}$ subject to $q$ nondecreasing and the individual rationality constraint:

$$E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ S(q(\theta), \theta) - \frac{1 - F(\theta)}{f(\theta)}\, v_\theta(q(\theta), \theta) - u(\underline{\theta}) \right] dF(\theta).$$

This objective function has been usefully termed the monopolist's "virtual surplus" function by Myerson (1991); it includes the total surplus generated by the monopolist's production less the information rents that must be left to the consumers as a function of their type.

Because $\dot{u}(\theta) = v_\theta(q, \theta) \ge 0$, individual rationality is equivalent to requiring $u(\underline{\theta}) \ge 0$. Thus, we choose $u(\underline{\theta}) = 0$ as a corner solution in this program, guaranteeing participation at the least possible cost. Note that, in this simple program, it is never profitable to leave excess rents to consumers. Hence, we are left with an objective function that can be maximized pointwise in $q(\theta)$ if we ignore the monotonicity condition. Provided that the virtual surplus

$$\Phi(q, \theta) \equiv S(q, \theta) - \frac{1 - F(\theta)}{f(\theta)}\, v_\theta(q, \theta)$$

is quasi-concave in $q$ and satisfies a cross-partial condition, $\Phi_{q\theta} \ge 0$, the solution $\{q(\theta)\}_{\theta \in \Theta}$, which is defined by the pointwise first-order condition $\Phi_q(q(\theta), \theta) = 0$, maximizes expected profit and is nondecreasing as required. This solution satisfies

$$S_q(q(\theta), \theta) = \frac{1 - F(\theta)}{f(\theta)}\, v_{q\theta}(q(\theta), \theta) \ge 0.$$

Hence, we have the familiar result that $q(\theta)$ is distorted downward relative to the social optimum, everywhere but at the "top" (i.e., at $\theta = \bar{\theta}$). If $\Phi_{q\theta}$ is not everywhere nonnegative, it is possible that an ironing procedure needs to be used to constrain $q(\theta)$ to be nondecreasing.6 Such a procedure typically requires that we utilize more general control-theoretic techniques and depart from our

6 See, for example, Fudenberg and Tirole (1991) for details.


simple pointwise maximization program. However, in the single-dimensional setting, a mild set of regularity conditions on $v$ and $F$ guarantees us the simple case.7

Note that because profit per customer, $\pi = S - u$, is linear in utility, we are able to use integration by parts to eliminate the utility function from the objective function, except for the requirement that $u(\underline{\theta}) \ge 0$. This allows us to maximize profits pointwise in $q$; i.e., we do not have to concern ourselves simultaneously with the value of $u(\theta)$. In this sense, the program is block recursive: first, the optimum can be found for each $q(\theta)$ and for $u(\underline{\theta})$ in isolation; then, using the resulting function $q(\theta)$ and $u(\underline{\theta})$, $u(\theta)$ can be determined via integration. The resulting utility schedule can then be combined with $q(\theta)$ to determine the total type-specific transfer, $p(\theta) = v(q(\theta), \theta) - u(\theta)$. Given $\{p(\theta), q(\theta)\}_{\theta \in \Theta}$, the price schedule can be constructed by inverting the function $q(\theta)$: $P(q) = p(q^{-1}(q))$, where $q^{-1}(\cdot)$ denotes the inverse of $q(\theta)$.

A second inherent simplicity in the one-dimensional model is that the incentive compatibility conditions are determined by a simple differential equation and a monotonicity condition. Whether we use integration by parts or the maximum principle to solve the program, in both instances we make important use of this fact: without it, we also lose the recursive nature of the problem. In the multidimensional setting, if we are uncertain as to which constraints bind, we will generally be forced to maximize profits subject to a far larger set of global constraints.

To this end, it is useful to briefly consider the discrete setting. Suppose that $\theta$ is distributed discretely on $\underline{\theta} = \theta_1 < \theta_2 < \cdots < \theta_I = \bar{\theta}$, with respective probabilities $f_i > 0$, $i = 1, \ldots, I$, and cumulative distribution function $F_k \equiv \sum_{i=1}^{k} f_i$. A direct mechanism is a menu of $I$ price-quantity pairs, where the $i$th indexed pair is given to consumers who report that they are of the $i$th type: $\{q_i, p_i\}_{i=1,\ldots,I}$.
Given that the single-crossing property is satisfied, it is straightforward to show that, if adjacent incentive compatibility constraints are satisfied, then global incentive compatibility is satisfied. The adjacent constraints are typically referred to as the downward-local and upward-local incentive constraints:

$$v(q_i, \theta_i) - p_i \ge v(q_{i-1}, \theta_i) - p_{i-1}, \quad \text{for } i = 2, \ldots, I, \tag{IC$_{i,i-1}$}$$

$$v(q_i, \theta_i) - p_i \ge v(q_{i+1}, \theta_i) - p_{i+1}, \quad \text{for } i = 1, \ldots, I - 1. \tag{IC$_{i,i+1}$}$$
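The claim that adjacent constraints suffice can be checked by brute force in a small case. The following Python sketch assumes the linear-quadratic utility $v(q, \theta) = \theta q - \tfrac{1}{2} q^2$, three hypothetical type values, and prices built so that each downward-local constraint holds with equality and the lowest type earns zero utility. The function names are illustrative and not part of the original text.

```python
from itertools import permutations

def v(q, theta):
    """Gross utility in the linear-quadratic case: theta*q - q^2/2."""
    return theta * q - 0.5 * q * q

def prices_from_downward_constraints(theta, q):
    """Prices that make every downward-local constraint hold with equality,
    holding the lowest type to zero utility (an assumption of this sketch)."""
    p = [v(q[0], theta[0])]
    for i in range(1, len(theta)):
        p.append(p[i - 1] + v(q[i], theta[i]) - v(q[i - 1], theta[i]))
    return p

def violated_constraints(theta, q, p, tol=1e-12):
    """All ordered pairs (i, j) such that type i strictly prefers option j."""
    return [
        (i, j)
        for i, j in permutations(range(len(theta)), 2)
        if v(q[i], theta[i]) - p[i] < v(q[j], theta[i]) - p[j] - tol
    ]

theta = [1.0, 2.0, 3.0]          # hypothetical type values
monotone_q = [0.5, 1.5, 3.0]     # nondecreasing allocation
nonmonotone_q = [1.0, 0.5, 2.0]  # violates monotonicity

ok = violated_constraints(
    theta, monotone_q, prices_from_downward_constraints(theta, monotone_q))
bad = violated_constraints(
    theta, nonmonotone_q, prices_from_downward_constraints(theta, nonmonotone_q))
print(ok)   # no violated constraint among all I(I-1) pairs
print(bad)  # at least one constraint fails once q is not monotone
```

With the monotone allocation, all six type-report pairs are satisfied even though only the two downward-local constraints were imposed; with the nonmonotone allocation, the lowest type prefers the second option.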

Furthermore, assuming that it is always profitable to transfer rents from the consumer to the firm, one can easily demonstrate that the downward constraints are binding. In addition, provided that the resulting quantity allocation, $\{q_i\}_{i=1,\ldots,I}$, is monotonic, one can show that the upward constraints must be slack and consequently incentive compatibility is global. This set of results is typically used to solve the relaxed program with only the downward constraints. In this sense, the sequence of binding downward-local incentive constraints (and the difference

7 The commonly made assumptions that preferences are quadratic and that $\theta$ has a log-concave distribution are sufficient for $\Phi_{q\theta} \ge 0$.


equation that they imply) is analogous to the ordinary differential equation $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ in the continuous setting. Not surprisingly, the solution to the relaxed program (ignoring monotonicity constraints) satisfies an analogous condition:

$$S_q(q_i, \theta_i) = \frac{1 - F_i}{f_i} \left\{ v_q(q_i, \theta_{i+1}) - v_q(q_i, \theta_i) \right\}, \quad i = 1, \ldots, I - 1.$$

In the discrete setting, it is perhaps easier to see the importance of focusing on the local constraints, and in particular on the downward-local constraints. Without such a simplification, we would have to introduce a Lagrange multiplier for every type-report pair, $(i, j)$, resulting in $I(I - 1)$ total constraints rather than simply $I - 1$. Not only does the single-crossing property in tandem with a one-dimensional type space allow us to reduce the set of potential constraints by a factor of $I$, it also renders these local constraints in a tractable fashion: a simple first-order difference equation. The absence of such a convenient ordering is the source of much difficulty in the multiple-dimension setting.

2.2.

The Demand-Proﬁle Approach

An alternative to modeling optimal screening contracts in parametric-utility space is to work with a less primitive and perhaps more economically salient structure: demand curves ordered by type and then aggregated into a demand profile.8 Because demand curves entirely capture consumers' preferences, there is no informational loss from restricting our attention to demand profiles. Given that they are generally easier to estimate empirically, this primitive arguably has more practical appeal. For our purposes, however, the demand-profile approach is also useful in that it more clearly illustrates the simplicity and recursiveness of the single-type framework, and also underscores the aspects of the multiple-type framework that will lead to genuine economic difficulties rather than merely technical concerns.

Consider first the continuous parameterization of demand curves that we will index by $\theta$: an individual of type $\theta$ has a demand curve given by $p = v_q(q, \theta)$. The single-crossing property is equivalent to the requirement that these demand curves do not intersect. In the parametric-utility approach where $v(q, \theta) = \theta q - \tfrac{1}{2} q^2$, this generates a simple family of parallel demand curves: $p = \theta - q$. The primitive object on which we will work, however, is the aggregate demand profile generated by calculating the measure of consumers who consume $q$ or

8 An interested reader is urged to consult Wilson (1993a) for a wealth of examples and insights into this approach. Wilson (1993a) builds on the work of Brown and Sibley (1986), who provide an earlier treatment of this approach.


more units of output under a price schedule, $P(q)$. Formally, we characterize this "cumulative" aggregate demand functional as

$$M[P(\cdot), q] = \mathrm{Prob}\left[ \theta \in \Theta \;\middle|\; \arg\max_{x} \{ v(x, \theta) - P(x) \} \ge q \right].$$

If the consumer's program is quasi-concave [which is equivalent to the requirement that the marginal price schedule, $p(q) \equiv P'(q)$, intersects the consumer's demand curve once from below], then consumer $\theta$ will demand $q$ or more units if and only if $v_q(q, \theta)$ is not less than the marginal price, $p(q)$, which implies that the cumulative aggregate demand functional has a very simple form:

$$M[P(\cdot), q] = \mathrm{Prob}\left[ v_q(q, \theta) \ge p(q) \right] \equiv N(p(q), q).$$

In this case, the problem is fully decomposable: the seller's program is to choose optimal marginal prices, $p(q)$, to maximize $N(p, q)[p - c]$ pointwise for each $q$. Assuming that the monopolist's local first-order necessary condition is also sufficient, we can characterize the solution by

$$N(p(q), q) + \frac{\partial N(p(q), q)}{\partial p}\,[p(q) - c] = 0,$$

or, in a more familiar inverse-elasticity formula,

$$\frac{p(q) - c}{p(q)} = \frac{1}{\eta(p(q), q)}, \quad \text{where } \eta(p, q) \equiv \frac{-p}{N} \frac{\partial N}{\partial p}.$$
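To see that the demand-profile condition reproduces the parametric-utility solution, the sketch below works through one special case in Python. It assumes the linear-quadratic utility $v(q, \theta) = \theta q - \tfrac{1}{2} q^2$, a type $\theta$ uniformly distributed on $[0, 1]$ (an assumption made only for this illustration), and a hypothetical marginal cost $c = 0.1$. Under these assumptions $N(p, q) = 1 - p - q$, and pointwise maximization of the virtual surplus gives $q(\theta) = 2\theta - 1 - c$.

```python
def demand_profile(p, q):
    """N(p, q) = Prob[theta - q >= p] when theta ~ U[0, 1] and v_q = theta - q."""
    return min(1.0, max(0.0, 1.0 - p - q))

def price_demand_profile(q, c, n_grid=20001):
    """Grid search for the marginal price maximizing N(p, q) * (p - c)."""
    grid = [k / (n_grid - 1) for k in range(n_grid)]
    return max(grid, key=lambda p: demand_profile(p, q) * (p - c))

def price_parametric(q, c):
    """Marginal price implied by the parametric-utility solution.
    Virtual surplus (theta - c)*q - q^2/2 - (1 - theta)*q is maximized at
    q(theta) = 2*theta - 1 - c, so the type buying q is theta = (q + 1 + c)/2
    and the marginal price is v_q(q, theta) = theta - q."""
    theta = (q + 1.0 + c) / 2.0
    return theta - q

c = 0.1  # hypothetical marginal cost
for q in (0.0, 0.2, 0.5):
    print(q, price_demand_profile(q, c), price_parametric(q, c))
```

Both routes give $p(q) = (1 + c - q)/2$ up to the grid resolution, which also satisfies the inverse-elasticity formula because here $\eta = p/N$.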

Provided that the resulting marginal price schedule, $p(q)$, cuts each parameterized demand curve only once from below, this solution to the relaxed program will satisfy the agent's global incentive compatibility constraints. The resulting nonlinear price schedule in this case is $P(q) = P(0) + \int_0^q p(s)\,ds$, where the fixed fee is chosen optimally to induce participation for all consumers who generate nonnegative virtual surplus.

When the monopolist's program is not quasi-concave over $p$ for all $q$, the solution is still given by the maximization over $p$ of $N(p, q)(p - c)$, but the resulting marginal price schedule $p(q)$ may fail to be continuous in $q$, which gives rise to kinks in the price function $P$. This situation corresponds to the cases where $\Phi_{q\theta} < 0$ and $q(\theta)$ is not strictly monotonic (bunching arises). Notice that in this case, too, the demand-profile approach is less difficult than the parametric-utility approach, which must resort to an ironing procedure.

The demand-profile approach does not work well when the resulting price schedule cuts some demand curve twice. In this case, the expression of the aggregate demand function $M$ cannot be simplified, because it depends on the whole function $P$.

As an illustration, consider the following numerical example. Suppose that there are three types of consumers with demand curves for the quantities given in the first three numeric columns (we normalize


Table 5.1. The demand-profile approach: a numerical example

Unit    θ1    θ2    θ3    p(q)    N(p(q), q)    R(q)
1st      7     9    11      7          3          21
2nd      5     7     9      5          3          15
3rd      3     5     7      5          2          10
4th      1     3     5      3          2           6
5th      0     1     3      3          1           3
6th      0     0     1      1          1           1
Total                                             56

marginal cost to zero; see Table 5.1): The fourth numeric column represents the pointwise optimal price $p(q)$, obtained by maximizing revenue $p\,N(p, q)$ for the $q$th unit. The fifth column is the number of consumers purchasing that unit (we have normalized the population to an average of one consumer of each type), and the final column represents the revenue attributed to that unit. Total revenue using nonlinear pricing is equal to 56, whereas a uniform-pricing monopolist would choose a price of 5 per unit, sell 9 units, and make a total revenue of 45.

The simplicity of this method for finding the optimal price schedule is worth noting. The local demand-profile representation sometimes falls short, however. If the zeros in the $\theta_1$ type's demand curve were replaced by $1 - 2\varepsilon$, and the zero in the $\theta_2$ type's demand curve was replaced by $1 - \varepsilon$, the maximum revenue for the 6th unit would be obtained by selling to all types, whereas it would still be optimal to sell the 5th unit only to $\theta_3$ types. Thus, we would generate gaps in consumption choices for types $\theta_1$ and $\theta_2$ when we maximized $p(q)$ pointwise by $q$. Specifically, types $\theta_1$ and $\theta_2$ would each be directed to choose only units 1–4 and unit 6 (but to skip unit 5), which is not feasible. This candidate solution represents the failure of the local representation; specifically, the marginal demand profile $N(p, q)$ does not capture the consumer's true preferences, which are instead characterized by the full demand profile, $M[P(\cdot), q]$.

3. THE GENERAL MULTIDIMENSIONAL SCREENING PROGRAM

3.1.

A General Discrete Formulation
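As a brief aside before the general formulation, the arithmetic behind Table 5.1 in Section 2.2 can be reproduced in a few lines. The Python sketch below takes the three demand columns of that table as given, assumes zero marginal cost as in the text, and restricts candidate prices to the valuations that actually occur (an assumption of this sketch, harmless here because revenue can only be maximized at such a price). Function and variable names are illustrative.

```python
# Marginal valuations of units 1..6 for the three types in Table 5.1.
valuations = {
    "theta1": [7, 5, 3, 1, 0, 0],
    "theta2": [9, 7, 5, 3, 1, 0],
    "theta3": [11, 9, 7, 5, 3, 1],
}

def pointwise_optimum(unit_values):
    """Price p maximizing p * N(p, q) for one unit, where N counts the
    types whose valuation of that unit is at least p."""
    def buyers(p):
        return sum(value >= p for value in unit_values)
    best = max(set(unit_values), key=lambda p: p * buyers(p))
    return best, buyers(best), best * buyers(best)

# One (p(q), N, R(q)) triple per unit, read column-wise across the types.
schedule = [pointwise_optimum(column) for column in zip(*valuations.values())]
nonlinear_revenue = sum(revenue for _, _, revenue in schedule)

def uniform_revenue(p):
    """Revenue from a single per-unit price p: each type buys every unit
    it values at p or more."""
    units = sum(value >= p for values in valuations.values() for value in values)
    return p * units

candidates = sorted({x for values in valuations.values() for x in values if x > 0})
best_uniform = max(candidates, key=uniform_revenue)

print(schedule)
print(nonlinear_revenue, best_uniform, uniform_revenue(best_uniform))
```

The pointwise prices, buyer counts, and revenues match the last three columns of the table, and the uniform-price comparison recovers the price of 5 and revenue of 45 quoted in the text.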

We begin with the discrete setting because it is perhaps the easiest to follow, relying on simple techniques of summation and optimization for the characterization of an optimum, unlike its continuous-type counterpart, which makes use of more complex techniques in vector calculus and differential forms. Nonetheless, both approaches are closely related, and the conditions in the discrete setting have smooth analogs in the continuous setting. More importantly for the purposes of this survey, the rough equivalence between the two settings


allows us to understand what is difficult about "multiple dimensions." To be precise, the problems arise not because of multiple dimensionality itself, but because of a commonly associated lack of exogenous type-ordering in multiple-dimensional environments. This source of the problem is clearest in the discrete setting, where it makes no sense to speak about dimensionality without simultaneously imposing structure on preferences.9

Let us consider now a more general version of the discrete model, where there are $I$ distinct consumer types, and the monopolist produces $n$ different goods: $q \in \mathbb{R}^n$. Hence, we can speak of there being $n$ available instruments (i.e., varieties of goods exchanged for money). We make no assumptions on preferences, except for linearity in money. For the sake of consistency with the rest of the paper, we still parameterize gross utilities in the form $v(q, \theta_i)$ (where $\theta_i$ is the consumer type, $i = 1, \ldots, I$), but we make no assumption on $v$. By convention, $q = 0$ represents "no consumption," and we normalize utility such that $v(0, \theta) = 0$ for all $\theta$. We denote the allocation for consumer $\theta_i$ by the vector $q_i = q(\theta_i)$ and the associated utility by the scalar $u_i = u(\theta_i)$. We will use $q$ to denote the $n \times I$ matrix $(q_1, \ldots, q_I)$, and $u$ to denote the $I$-length row vector $(u_1, \ldots, u_I)$. Using the parametric-utility approach, we represent the firm's expected profit as

$$E[\pi] = \sum_{i=1}^{I} f_i \left\{ S(q_i, \theta_i) - u_i \right\},$$

to be maximized under the discrete incentive compatibility constraints, IC$_{i,j}$, as defined previously in the one-dimensional case, and individual rationality constraints:

$$\forall i, \quad u_i \equiv v(q_i, \theta_i) - P(q_i) \ge 0. \tag{IR$_i$}$$

The individual rationality constraints can be considered as a particular case of incentive compatibility constraints by defining a "dummy" type $\theta_0$, such that $v(q, \theta_0) \equiv 0$, which implies that it will always be optimal to choose $q_0 = 0$ and $P(0) = 0$.10 The firm's problem is thus to maximize its expected profit under implementability (i.e., IC and IR) constraints:

$$\forall i, j \in \{0, \ldots, I\}, \quad u_i \ge v(q_j, \theta_i) - P(q_j),$$

or, equivalently,

$$\forall i, j \in \{0, \ldots, I\}, \quad u_i - u_j \ge v(q_j, \theta_i) - v(q_j, \theta_j). \tag{IC$_{ij}$}$$

9 For example, whether we index preferences using two dimensions, $v(q, \theta_{1i}, \theta_{2j})$ where $i, j = 1, \ldots, I/2$, or a single dimension, $v(q, \theta_k)$ with $k = 1, \ldots, I$, is immaterial by itself. Fundamentally, any difficulty in extending single-dimensional models to multidimensional models must arise from a lack of ordering among the types rather than any primitive notions of dimensionality.

10 This is just a convention. It is compatible with fixed fees, because $P$ can be discontinuous at 0.


Following Spence (1980), it is natural to decompose the firm's problem into two subproblems:

1. Minimize expected utility for fixed $q = (q_1, \ldots, q_I)$.
2. Choose $q$ to maximize expected surplus minus expected utility.

It is remarkable that the first subproblem has a general solution that can be found by a relatively simple algorithm. Let us denote by $U(q)$ the set of utility vectors that implement $q$. That is,

$$U(q) = \{ (u_1, \ldots, u_I) \text{ such that IC}_{ij} \text{ is satisfied for all } i, j = 0, \ldots, I \text{ and } u_0 = 0 \}.$$

In what follows, it will be useful to consider arbitrary paths in the set $\Theta = \{\theta_1, \ldots, \theta_I\}$. We will denote such a path from type $\theta_i$ to $\theta_j$ by the function $\gamma$. We denote the "length" of $\gamma$ by $\ell$; i.e., $\ell$ is the number of segments used to connect $\theta_i = \gamma(0)$ to $\theta_j = \gamma(\ell)$. Hence, $\gamma$ is a mapping, $\gamma : \{0, 1, \ldots, \ell\} \to \Theta$. Finally, we say that a path of length $\ell$ is "closed" if $\gamma(0) = \gamma(\ell)$. With this notation for discrete paths, the following characterization of $U(q)$ can be stated. A proof is found in Rochet (1987).

Lemma 3.1. $U(q)$ is nonempty if and only if, for every closed $\ell$-length path $\gamma$,

$$\sum_{k=0}^{\ell - 1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right] \le 0. \tag{3.1}$$
To provide an intuition for condition (3.1), define the incremental utility between type $i$ and type $j$ as the difference between the utility of type $i$ and the utility of type $j$ when consuming the bundle assigned to type $j$. Condition (3.1) means that, for all closed paths $\gamma$ in the set of types, the sum of incremental utilities along $\gamma$ is nonpositive. Consider, for example, a closed path of length $\ell$. Incentive compatibility requires, for each $k = 0, \ldots, \ell - 1$,

$$u_{\gamma(k+1)} - u_{\gamma(k)} \ge v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}).$$

By summing over these inequalities, the left-hand sides cancel along a closed path, and we see that condition (3.1) is implied by incentive compatibility for any closed path. Lemma 3.1 says that the converse is true: condition (3.1) implies incentive compatibility, as well.11 The proof is constructive: Lemma 3.2 gives an algorithm for constructing the minimal element of $U(q)$.

11 The reader versed in vector calculus will recognize this as a discrete variation of the requirement that $v_\theta(q(\theta), \theta)$ is a conservative field, $\oint_C v_\theta(q(\theta), \theta)\, d\theta = 0$, where $C$ represents an arbitrary closed path in $\Theta$.
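Condition (3.1) can be tested directly when the number of types is small. The Python sketch below enumerates simple closed paths, which suffices because any closed path decomposes into simple cycles. The two-good utility $v(q, \theta) = \theta \cdot q - \tfrac{1}{2}\|q\|^2$, the three type vectors, and the function names are assumptions made for illustration only; index 0 stands for the dummy type with zero utility and the null bundle.

```python
from itertools import permutations

def incremental_utilities(theta, q):
    """w[j][i] = v(q_j, theta_i) - v(q_j, theta_j), with
    v(q, theta) = theta.q - |q|^2 / 2 for real types. Index 0 is the dummy
    type, whose utility is identically zero and whose bundle is q_0 = 0."""
    def v(j, i):
        if i == 0 or j == 0:
            return 0.0
        qj, ti = q[j - 1], theta[i - 1]
        return sum(a * b for a, b in zip(ti, qj)) - 0.5 * sum(a * a for a in qj)
    n = len(theta) + 1
    return [[v(j, i) - v(j, j) for i in range(n)] for j in range(n)]

def cycle_sum(w, cycle):
    """Sum of incremental utilities along cycle[0] -> ... -> cycle[0]."""
    closed = list(cycle) + [cycle[0]]
    return sum(w[closed[k]][closed[k + 1]] for k in range(len(cycle)))

def satisfies_condition_31(w, tol=1e-12):
    """Brute-force check that no simple closed path has a positive sum."""
    n = len(w)
    for length in range(2, n + 1):
        for cycle in permutations(range(n), length):
            if cycle_sum(w, cycle) > tol:
                return False
    return True

theta = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical two-dimensional types
efficient_q = list(theta)                       # q_i = theta_i (zero cost)
swapped_q = [theta[1], theta[0], theta[2]]      # first two bundles exchanged

print(satisfies_condition_31(incremental_utilities(theta, efficient_q)))
print(satisfies_condition_31(incremental_utilities(theta, swapped_q)))
```

The first allocation passes, so some utility vector implements it; exchanging the bundles of the first two types creates a two-step closed path with a positive sum, so no transfers can implement the second allocation. The enumeration grows factorially in the number of types and is meant only for small examples.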


Lemma 3.2. When (3.1) is satisfied, $U(q)$ has a unique minimal element, $u$, characterized for $i = 0, \ldots, I$ by

$$u_i \equiv \sup_{\gamma} \sum_{k=0}^{\ell - 1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right], \tag{3.2}$$

where the sup is taken over all open paths from 0 to $i$, and $u_0 \equiv 0$.

Condition (3.2) means that agent $i$ is guaranteed a utility level, $u_i$, at least equal to the sum of the incremental utilities along any path connecting $\theta_0$ to $\theta_i$. We will refer to this $i$th element of the minimum of $U(q)$ as the informational rent of agent $i$. Note that this rent does not depend on the frequencies $\{f_1, \ldots, f_I\}$ of the distribution of types, but only on the support, $\Theta = \{\theta_1, \ldots, \theta_I\}$, of this distribution.

Formula (3.2) shows that the informational rent of each agent can be computed by a recursive algorithm. Intuitively, it is as if each type $i$ chooses the path from $\theta_0$ to $\theta_i$ that maximizes the sum of incremental utilities. Denote by $u_i^{\ell}$ the maximum of formula (3.2) over all paths of length less than or equal to $\ell$ from 0 to $i$. Then $u_i^{\ell}$ can be computed recursively by the Bellman-type formula:

$$u_i^{\ell + 1} = \max_{j} \left\{ u_j^{\ell} + v(q_j, \theta_i) - v(q_j, \theta_j) \right\}.$$

Condition (3.1) implies that this algorithm has no cycles. The set of types being finite, $u_i^{\ell}$ converges to the rent of agent $i$ in a finite number of steps as $\ell$ is increased to $I$. For any allocation $q$, the dynamic programming principle implies that if $j$ belongs to the optimal path $\gamma$ from 0 to $i$, the truncation of $\gamma$ to the path between 0 and $j$ defines the optimal path from 0 to $j$. This allows us to define a partial ordering $\prec$ on types:

$$j \prec i \iff j \text{ belongs to one of the optimal paths from 0 to } i.$$

For generic12 allocations, there is a unique optimal path $\gamma$ [with $\gamma(0) = 0$ and $\gamma(\ell) = i$] from 0 to $i$, and the rent of $i$ is easily computed:

$$u_i(q, \prec) = \sum_{k=1}^{\ell - 1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right].$$

Graphically, the collection of optimal paths comprises a “tree” (i.e., a connected graph without cycles such that, from the “root” vertex 0, there is a unique path to any other point in the graph); we use # to represent such a tree. We can therefore represent the binding incentive constraints by such a tree emanating from the type, θ0 . One can also deﬁne for all i, j, such that i ≺ j, the “immediate successor” s(i, j) of i in the direction of j by the formula s(i, j) = min{k | i ≺ k ≺ j, k = i}. 12

However, the optimal allocation q may be such that there are several optimal paths. We give an example of such a case in Section 3.3.


Then, it is easy to see that the agent's expected rent can be written as$^{13}$
$$ER(q, \prec) = \sum_{i=1}^{I} \sum_{j \succ i} f_j \left[ v(q_i, \theta_{s(i,j)}) - v(q_i, \theta_i) \right].$$

In the classic one-dimensional case when the single-crossing condition holds, condition (3.1) reduces to the well-known monotonicity condition $q_1 \leq q_2 \leq \cdots \leq q_I$, and $\prec$ always consists of the complete ordering $\theta_1 < \theta_2 < \cdots < \theta_I$; the associated tree is a single connected branch. In this case
$$ER(q, \prec) = \sum_{i=1}^{I} (1 - F_i) \left[ v(q_i, \theta_{i+1}) - v(q_i, \theta_i) \right],$$
and as previously shown in Section 2, subproblem 2 is easily solved by maximizing the virtual surplus
$$\Phi(q_i, \theta_i) = S(q_i, \theta_i) - \frac{1 - F_i}{f_i} \left[ v(q_i, \theta_{i+1}) - v(q_i, \theta_i) \right].$$
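The one-dimensional rent formula is a summation by parts, which can be checked numerically. The sketch below is only illustrative: the function names, the quadratic utility, and the three-type data are assumptions, and $F_i$ is taken to be the cumulative frequency $f_1 + \cdots + f_i$.

```python
from typing import Callable, List


def chain_rents(v: Callable[[float, float], float],
                q: List[float], theta: List[float]) -> List[float]:
    """Rents along the single branch theta_1 < ... < theta_I, with u_1 = 0."""
    u = [0.0]
    for i in range(len(theta) - 1):
        u.append(u[-1] + v(q[i], theta[i + 1]) - v(q[i], theta[i]))
    return u


def expected_rent(v: Callable[[float, float], float],
                  q: List[float], theta: List[float], f: List[float]) -> float:
    """ER = sum_i (1 - F_i) [v(q_i, theta_{i+1}) - v(q_i, theta_i)]."""
    total, cdf = 0.0, 0.0
    for i in range(len(theta) - 1):  # the last term vanishes since F_I = 1
        cdf += f[i]
        total += (1.0 - cdf) * (v(q[i], theta[i + 1]) - v(q[i], theta[i]))
    return total


def v_quadratic(q: float, theta: float) -> float:
    """Hypothetical utility v(q, theta) = theta q - q^2 / 2."""
    return theta * q - 0.5 * q * q


# Illustrative data: three ordered types and a nondecreasing allocation.
theta = [1.0, 2.0, 3.0]
f = [0.5, 0.3, 0.2]
q = [0.5, 1.5, 3.0]

u = chain_rents(v_quadratic, q, theta)
direct = sum(fi * ui for fi, ui in zip(f, u))      # sum_i f_i u_i
by_parts = expected_rent(v_quadratic, q, theta, f)  # the formula in the text
```

Both computations give the same expected rent (0.55 with these numbers), which is why the rent term can be attached type by type to $q_i$ and folded into the virtual surplus.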

In the general case, the binding IC constraints (corresponding to the agent's optimal paths defining the tree $\Gamma$) depend on the allocation $q$, which means that the virtual surplus does not in general have a simple expression. As will be illustrated later, the virtual surplus approach works only when one can anticipate a priori the optimal paths $\gamma \in \Gamma$, i.e., which IC constraints will be binding.

To summarize, from this discussion of the general discrete formulation, two conclusions emerge that are inherent in all multidimensional problems. First, and most significantly, multiple-dimension models are difficult precisely when they give rise to an endogenous ordering over the types of $\Theta$ (i.e., the set of binding IC constraints is endogenous to the choice of $q$). Second, and closely related, the incentive compatibility conditions are frequently binding not only among local types, and hence the discrete analog of the first-order approach is not generally valid and a form of an integrability condition, (3.1), must necessarily be satisfied. We will see a similar structure in the continuous-type setting.

3.2. The Continuous Case

In the continuous case, the implementability condition (3.1) translates into two necessary conditions. The first is an integrability condition that requires, for every closed path $\gamma : [0, 1] \to \Theta$, that
$$\int_0^1 v_\theta(q(\gamma(s)), \gamma(s)) \cdot d\gamma(s) = 0. \tag{$3.3'$}$$

13. When $i$ is a maximal element, the set $\{j \mid j \succ i\}$ is empty and $ER$ does not depend on $q_i$.


This is equivalent to saying that $v_\theta(q(\theta), \theta)$ is the gradient$^{14}$ of some function $u(\theta)$. The second condition is a set of inequalities:
$$\forall \theta \quad D^2 u(\theta) \geq v_{\theta\theta}(q(\theta), \theta), \tag{$3.3''$}$$
where $D^2 u$ is the Hessian matrix of any function $u$ such that $\nabla u = v_\theta$ and the inequality is taken in the sense of matrices (i.e., $D^2 u - v_{\theta\theta}$ is positive semidefinite). The trouble is that these conditions are not sufficient, except when $v_{\theta\theta} \equiv 0$ (the linear parameterization), in which case ($3.3'$) and ($3.3''$) are necessary and sufficient for implementability by Fenchel's duality theorem$^{15}$ (Rochet, 1987).

The continuous equivalent of Lemma 3.2 is somewhat trivial. This is because the integrability condition ($3.3'$) implies that, for any path $\gamma$ connecting $\gamma(0) = \theta_0$ to $\gamma(1) = \theta$, we have
$$u(\theta) = \int_0^1 v_\theta(q(\gamma(s)), \gamma(s)) \cdot d\gamma(s).$$
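Path independence is easy to probe numerically in the linear parameterization, where $v_\theta(q(\theta), \theta) = q(\theta)$. The sketch below is illustrative only: the helper name, the two candidate allocation rules, and the two paths are assumptions chosen to contrast a gradient field with a non-gradient one.

```python
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]


def line_integral(field: Callable[[Vector], Vector], start: Vector, end: Vector,
                  via: Optional[Vector] = None, steps: int = 1000) -> float:
    """Midpoint-rule approximation of the integral of field . d(gamma)
    along the piecewise-linear path start -> (via) -> end."""
    points: List[Vector] = [start] + ([via] if via is not None else []) + [end]
    total = 0.0
    for a, b in zip(points, points[1:]):
        dx = [(bk - ak) / steps for ak, bk in zip(a, b)]
        for s in range(steps):
            mid = [ak + (s + 0.5) * d for ak, d in zip(a, dx)]
            total += sum(fk * d for fk, d in zip(field(mid), dx))
    return total


def gradient_rule(theta: Vector) -> Vector:
    """q(theta) = theta, the gradient of |theta|^2 / 2: integrable."""
    return (theta[0], theta[1])


def rotational_rule(theta: Vector) -> Vector:
    """q(theta) = (-theta_2, theta_1): not a gradient, so not implementable."""
    return (-theta[1], theta[0])


origin, target, corner = (0.0, 0.0), (1.0, 1.0), (1.0, 0.0)
u_direct = line_integral(gradient_rule, origin, target)
u_detour = line_integral(gradient_rule, origin, target, via=corner)
bad_direct = line_integral(rotational_rule, origin, target)
bad_detour = line_integral(rotational_rule, origin, target, via=corner)
```

For the gradient rule both paths give the same rent $u(\theta) = 1$; for the rotational rule the two paths disagree (0 versus 1), so no rent function $u$ is consistent with it.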

Expected surplus can be computed using the divergence theorem:$^{16}$
$$\int_\Theta u(\theta) f(\theta)\, d\theta = \int_\Theta \lambda(\theta) \cdot v_\theta(q(\theta), \theta) f(\theta)\, d\theta - \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) f(\theta) u(\theta)\, d\sigma(\theta),$$
where $\lambda$ is any solution of the partial differential equation
$$\operatorname{div}(\lambda(\theta) f(\theta)) + f(\theta) = 0, \tag{3.4}$$
$n(\theta)$ is the outward normal to the boundary $\partial\Theta$ of $\Theta$, and the notation $\int_{\partial\Theta} W(\theta)\, d\sigma(\theta)$ represents the integral of some function $W$ along the boundary $\partial\Theta$.

14. As noticed by several authors, this is also equivalent to a set of partial differential equations reminiscent of Slutsky equations:
$$\forall n, m \quad \frac{\partial}{\partial\theta_n}\left[\frac{\partial v}{\partial\theta_m}(q(\theta), \theta)\right] = \frac{\partial}{\partial\theta_m}\left[\frac{\partial v}{\partial\theta_n}(q(\theta), \theta)\right].$$

15. McAfee and McMillan (1988) define a Generalized Single-Crossing condition that slightly generalizes the linear case: it amounts to assuming that, for any nonlinear price, the set of types who choose the same allocation is a linear subspace. They use it to generalize the results of Laffont, Maskin, and Rochet (1987). They also find a necessary and sufficient condition for implementability.

16. The divergence theorem is the multidimensional analog of the integration-by-parts formula. It asserts that, under regularity conditions,
$$-\int_\Theta u(\theta) \operatorname{div}[\lambda(\theta) f(\theta)]\, d\theta = \int_\Theta \lambda(\theta) \cdot \nabla u(\theta) f(\theta)\, d\theta - \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) u(\theta) f(\theta)\, d\sigma(\theta).$$


Now, the expected profit of the firm can be written as
$$E[\pi] = \int_\Theta \{ S(q(\theta), \theta) - \lambda(\theta) \cdot v_\theta(q(\theta), \theta) \} f(\theta)\, d\theta + \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) u(\theta) f(\theta)\, d\sigma(\theta),$$
which has to be maximized under the implementability conditions ($3.3'$) and ($3.3''$). When these constraints are not binding, this problem can, in principle, be solved by point-wise maximization of virtual surplus:
$$\Phi(q, \theta) = S(q, \theta) - \lambda(\theta) \cdot v_\theta(q, \theta).$$
The trouble is that, as in the discrete case, $\lambda$ is not known explicitly. It is defined as the unique solution of partial differential equation (3.4) that satisfies the boundary condition
$$u(\theta)[\lambda(\theta) \cdot n(\theta)] = 0 \quad \text{for all } \theta \text{ on } \partial\Theta.$$
It can be proved that the general solution to equation (3.4) can be computed by integrating the density, $f$, along arbitrary paths $\gamma$:
$$\lambda(\theta) = \int_{\gamma^{-1}(\theta)}^{1} f(\gamma(s))\, d\gamma(s).$$

Therefore, the optimal $u$ is characterized by two objects:

• A partition of the boundary of $\Theta$ into two regions: the “lower boundary” $\partial_0\Theta$, where the participation constraint is binding [$u(\theta) = 0$], and the “upper boundary” $\partial_1\Theta$, where $\lambda(\theta) \cdot n(\theta) = 0$, which means that there is no distortion along the normal to the boundary;
• A family of paths connecting the lower boundary [where $u(\theta) = 0$] to the upper boundary (where there is no distortion).

This is the continuous equivalent of the pattern found in the discrete case: a partial ordering of types along paths connecting the region where the participation constraint binds to the region where there is no distortion. As in the discrete setting, two ideas again emerge that are distinct to the multidimensional case: (i) the set of paths connecting the lower and upper boundaries of $\Theta$ is endogenous to the choice of allocation $\{q(\theta)\}_{\theta \in \Theta}$, and (ii) an integrability condition must necessarily be satisfied.

3.3. Tractable Discrete Models

To illustrate the different patterns that can arise in multidimensional screening models and how our conclusions affect our results, we consider here a very simple example of a nonlinear pricing problem, inspired by Sibley and Srinagesh (1997) and Armstrong and Rochet (1999).$^{17}$ In those examples, a monopolist firm produces two goods $j = 1, 2$ at a constant marginal cost (normalized to zero). There are two types of consumers, characterized by independent linear inverse demands
$$p_{ij}(q_{ij}) = \theta_{ij} - q_{ij}, \quad j = 1, 2, \quad i = 1, 2.$$
Thus types are bidimensional: $\theta_i = (\theta_{i1}, \theta_{i2})$, $i = 1, 2$. Linear demands are equivalent to quadratic utilities:
$$v(\theta_i, q_i) = \sum_{j=1}^{2} \left\{ \theta_{ij} q_{ij} - \tfrac{1}{2} q_{ij}^2 \right\},$$
where $q_i$ is the vector $q_i = (q_{i1}, q_{i2})$. The first-best efficient allocation is characterized by the vector $q_i^* = \theta_i$ and surplus by the scalar $S_i^* = \tfrac{1}{2}(\theta_{i1}^2 + \theta_{i2}^2)$. Following lemma 3.1,$^{18}$ the implementability condition reduces to
$$(\theta_1 - \theta_2) \cdot q_2 + (\theta_2 - \theta_1) \cdot q_1 \leq 0. \tag{3.5}$$

Providing this condition is satisfied, lemma 3.2 implies that, at the optimum, the rents to the types are given by
$$u_1 = \max(0, (\theta_1 - \theta_2) \cdot q_2), \qquad u_2 = \max(0, (\theta_2 - \theta_1) \cdot q_1).$$
The implementability condition then implies that either $u_1$ or $u_2$ equals 0 (i.e., the IR constraint binds somewhere). To fix ideas, we assume that $S_1^* < S_2^*$. By analogy with the unidimensional case, one may conjecture that the second-best allocation is then characterized by $u_1 = 0$ (binding IR “at the bottom”) and $q_2 = q_2^*$ (efficiency “at the top”). This is indeed one possible regime, illustrated by Figure 5.1. In this first case, $u_2 = (\theta_2 - \theta_1) \cdot q_1$ and
$$q_1 = \theta_1 - \frac{f_2}{f_1}(\theta_2 - \theta_1).$$
This allocation can be implemented by a menu of two-part tariffs: tariff 1 has a low fixed fee $T_1 = \tfrac{1}{2} q_1^2$ and a unit price vector $p_1 = (f_2/f_1)(\theta_2 - \theta_1)$; tariff 2 has a high fixed fee $T_2 = S_2^* - u_2$ and a zero unit price vector. Note that unit prices are not necessarily above marginal costs (which have been normalized to zero), because we did not assume$^{19}$ $\theta_2 > \theta_1$. Apart from this feature, the completely ordered case is analogous to the unidimensional case. It corresponds to the solution of the monopoly problem whenever
$$u_2 = (\theta_2 - \theta_1) \cdot q_1 \geq 0 = u_1 \geq (\theta_1 - \theta_2) \cdot q_2.$$
The second inequality is implied by the first, given the implementability condition in (3.5), whereas the first inequality is equivalent to $f_1 (\theta_2 - \theta_1) \cdot \theta_1 \geq f_2 (\theta_2 - \theta_1)^2$, or
$$\theta_1 \cdot \theta_2 \geq \frac{\theta_1^2 + f_2 \theta_2^2}{1 + f_2}. \tag{3.6}$$

Figure 5.1. First regime: the completely ordered case. Arrows indicate the direction of the binding incentive constraints; e.g., an arrow from $\theta_2$ to $\theta_1$ represents type $\theta_2$'s indifference between their own allocation and that meant for $\theta_1$. [Diagram omitted: arrows run from $\theta_2$ to $\theta_1$ and from $\theta_1$ to $\theta_0$.]

17. Dana (1992) and Armstrong (1999a) also provide related examples of tractable discrete-type models.

18. The only relevant closed path to consider is the cycle from $\theta_1$ to $\theta_2$.

19. The case $\theta_2 > \theta_1$ corresponds to what Sibley and Srinagesh (1997) have called uniformly ordered demands.

When this condition is not satisfied, a second possible regime corresponds to the case where there is no interaction between types; we call it the separable case (see Figure 5.2). In this second case, there are no distortions: $q_1 = \theta_1$ and $q_2 = \theta_2$, and all the surplus is captured by the seller: $u_1 = u_2 = 0$. Following (3.5), this allocation is implementable if and only if
$$(\theta_2 - \theta_1) \cdot \theta_1 \leq 0, \qquad (\theta_1 - \theta_2) \cdot \theta_2 \leq 0.$$
Given our assumption that $\theta_1^2 \leq \theta_2^2$, this is equivalent to
$$\theta_1 \cdot \theta_2 \leq \theta_1^2. \tag{3.7}$$

Figure 5.2. Second regime: the separable case. [Diagram omitted: separate arrows run from $\theta_2$ to $\theta_0$ and from $\theta_1$ to $\theta_0$.]

Figure 5.3. Third regime: the mixed case. [Diagram omitted: arrows run from $\theta_2$ to $\theta_1$ (weight $\mu_{21}$), from $\theta_2$ to $\theta_0$ (weight $\mu_{20}$), and from $\theta_1$ to $\theta_0$ (weight $\mu_{10}$).]

Finally, when neither (3.6) nor (3.7) is satisfied, there is an intermediate case that combines the features of the two regimes (see Figure 5.3). In this third and final case, the firm is still able to capture all the surplus, but this is at the cost of a distortion on $q_1$, designed in such a way that type $\theta_2$ is just indifferent between the efficient bundle $q_2 = \theta_2$ at a total tariff $T_2 = S_2^*$ and bundle $q_1$ at tariff $T_1 = \tfrac{1}{2} q_1^2$. Notice that there are two optimal paths connecting $\theta_2$ to $\theta_0$, corresponding to two different trees, $\Gamma_1$ and $\Gamma_2$. The weight $\mu_{21}$ put on this second path is determined by this indifference condition:
$$u_2 = 0 = (\theta_2 - \theta_1) \cdot q_1, \quad \text{where} \quad q_1 = \theta_1 - \frac{\mu_{21}}{f_1}(\theta_2 - \theta_1).$$
This gives
$$\mu_{21} = f_1 \frac{(\theta_2 - \theta_1) \cdot \theta_1}{(\theta_2 - \theta_1)^2},$$
which has to be between 0 and $f_2$. These conditions determine the boundary of this regime in the parameter space: $0 \leq (\theta_2 - \theta_1) \cdot \theta_1 \leq f_2 (\theta_2 - \theta_1) \cdot \theta_2$, or
$$\theta_1^2 \leq \theta_1 \cdot \theta_2 \leq \frac{\theta_1^2 + f_2 \theta_2^2}{1 + f_2}.$$


Notice that, in this case, we have that $u_1 = u_2 = 0$ but $q_1 \neq q_2$, which cannot arise in dimension 1. The three cases (completely ordered, separable, and mixed) illustrate the three settings that generally arise in multidimensional models. When we place significant restrictions on preferences and heterogeneity, we can frequently obtain simpler solutions that correspond to the first two cases. We discuss these in the following section, and then consider variations on these themes in Sections 5–8. The mixed case corresponds to the more general and difficult setting we discuss in Section 9.
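The three regimes of the two-type example can be sketched as a small classifier. This is a hedged illustration, not the authors' procedure: the function name and test data are assumptions, frequencies are assumed to satisfy $f_1 + f_2 = 1$, the ordering $S_1^* < S_2^*$ is imposed, and the separable regime is identified directly from the implementability inequality $(\theta_2 - \theta_1) \cdot \theta_1 \leq 0$.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_type_solution(theta1: Vector, theta2: Vector,
                      f1: float) -> Tuple[str, List[float], List[float], float]:
    """Return (regime, q1, q2, u2); u1 = 0 and q2 = theta2 in every regime.

    Assumes f1 + f2 = 1 and |theta1|^2 < |theta2|^2 (i.e., S1* < S2*).
    """
    f2 = 1.0 - f1
    d = [b - a for a, b in zip(theta1, theta2)]  # theta2 - theta1
    cross = dot(theta1, theta2)
    n1, n2 = dot(theta1, theta1), dot(theta2, theta2)
    if n1 >= n2:
        raise ValueError("expected S1* < S2*")
    if cross >= (n1 + f2 * n2) / (1.0 + f2):     # condition (3.6)
        regime, weight = "completely ordered", f2
    elif cross <= n1:                            # (theta2 - theta1) . theta1 <= 0
        regime, weight = "separable", 0.0
    else:                                        # mu_21 from the indifference condition
        regime, weight = "mixed", f1 * dot(d, theta1) / dot(d, d)
    q1 = [t - (weight / f1) * dk for t, dk in zip(theta1, d)]
    u2 = max(0.0, dot(d, q1))
    return regime, q1, list(theta2), u2


ordered = two_type_solution((1.0, 1.0), (2.0, 2.0), f1=0.75)
separable = two_type_solution((1.0, 0.0), (0.0, 2.0), f1=0.5)
mixed = two_type_solution((1.0, 2.0), (3.0, 1.5), f1=0.25)
```

In the mixed example the distorted bundle $q_1$ is orthogonal to $\theta_2 - \theta_1$, so the high type earns no rent even though $q_1 \neq q_2$.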

4. AGGREGATION AND SEPARABILITY

In this section, we explore two cases where multidimensional problems can be effectively reduced to unidimensional problems: the case of aggregation, where a one-dimensional sufficient statistic can be found for representing unobservable preference heterogeneity, and the case of separability, where the set of types can be partitioned a priori into one-dimensional subsets. In the former setting, the binding IC constraints necessarily lie in a completely ordered graph, which is known a priori and corresponds to the completely ordered case, discussed in the previous section. In the latter setting, the incentive constraints can be partitioned into an exogenously given tree that is known a priori, which corresponds to the separable case.

4.1. Aggregation

A family of multidimensional screening problems that effectively reduce to one-dimensional problems is characterized by the existence of a sufficient statistic of dimension 1 that summarizes all relevant information on unobservable heterogeneity of types and that has an exogenously given distribution. Let us start with a trivial example where the sufficient statistic can be found immediately. Suppose that only one good is sold ($n = 1$), but types are bidimensional ($m = 2$) and social surplus is given by
$$S(\theta, q) = (\theta_1 + \theta_2) q - \tfrac{1}{2} q^2.$$
It is then obvious that $\hat\theta \equiv \theta_1 + \theta_2$ is a one-dimensional sufficient statistic for the consumer's preferences, and the monopolist's problem can be solved by applying the usual techniques to the distribution of $\hat\theta$.

Even in this simple transformable setting, however, we can see that not everything is the same as in the canonical one-dimensional model. The primary difference is the exclusion property discovered by Armstrong (1996). Suppose, indeed, that $\theta = (\theta_1, \theta_2)$ has a bounded density on $\mathbb{R}^2_+$, or on a rectangle $[\underline\theta_1, \bar\theta_1] \times [\underline\theta_2, \bar\theta_2]$ (or any domain with a “southwest” corner). Then, it is easy to see that the density of $\hat\theta$, obtained by convolution of the marginals$^{20}$ of $\theta_1$ and $\theta_2$, tends to zero when $\hat\theta$ tends to the lower bound of its support. As a result, the inverse hazard rate tends to infinity, which implies the existence of an exclusion region at the bottom that would not necessarily emerge if either $\theta_1$ or $\theta_2$ was observable and contractible. There is an associated intuition that relates to the envelope theorem: raising prices by $\varepsilon$ raises revenues from inframarginal buyers by a first-order amount at a loss of a second-order measure, $\varepsilon^2$, of consumers in the southwest corner of the support of types. We will see that this insight extends to more general settings; for example, Armstrong (1996) originally demonstrates this result for the separable setting discussed in the following section.$^{21}$

It is worth noting that, whereas the aggregation technique appears trivial in our toy example, it is often more subtle and arises from a property of the market setting. For example, Biais, Martimort, and Rochet (2000) consider a market maker who sells a risky asset to a population of potential investors, characterized by two dimensions of adverse selection, $\theta = (\theta_1, \theta_2)$ (using our notation): $\theta_1$ corresponds to the investor's fundamental information, i.e., his evaluation of the asset's liquidation value [the true liquidation value is $\theta_1 + \tilde\varepsilon$, where $\tilde\varepsilon$ is $N(0, \sigma^2)$ and independent of $\theta_2$]; $\theta_2$ corresponds to a sort of personal taste variable, namely, the initial position of the investor in the risky asset (his hedging needs). If he buys $q$ units of the asset for a total price $P(q)$, the investor's final wealth is
$$\tilde W(q) = W_0 - P(q) + (\theta_1 + \tilde\varepsilon)(\theta_2 + q),$$
where $W_0$ denotes his initial endowment of money. Assuming that the investor has constant absolute risk aversion preferences [$u(W) = -e^{-\rho W}$], the certainty equivalent of trading $q$ units is
$$V(q) = W_0 - P(q) + \theta_1(\theta_2 + q) - \tfrac{1}{2}\rho\sigma^2(\theta_2 + q)^2.$$
Thus, the net utility of trading $q$ is given by
$$U = V(q) - V(0) = (\theta_1 - \rho\sigma^2\theta_2) q - \tfrac{1}{2}\rho\sigma^2 q^2 - P(q).$$
Even though the initial screening problem is bidimensional, the simplified version of the problem reduces to a one-dimensional screening problem with a sufficient statistic $\hat\theta = \theta_1 - \rho\sigma^2\theta_2$ that aggregates the two motives for trade.

Other examples of this sort appear in Laffont and Tirole (1993) and Ivaldi and Martimort (1994). Ivaldi and Martimort (1994) study a model of competition with two dimensions of preference heterogeneity, which, given their assumptions about distributions and preferences, aggregates into a model with a one-dimensional statistic. Laffont and Tirole (1993) study regulation of multidimensional firms in a model combining adverse selection and moral hazard. By assuming that costs are observable to the regulator, they effectively transform their problem into a pure screening model, amenable to the technique presented here. In particular, when the unobservable technological parameters of the firms (their “types”) are multidimensional, Laffont and Tirole find conditions, inspired by the aggregation theorems of Blackorby and Schworm (1984), under which the type vectors can be aggregated into a single number.

20. As noticed by Miravete (1996), the monotone hazard-rate property is preserved by convolution.

21. Armstrong (1996) shows that the exclusion property is also true when $\Theta$ is strictly convex. However, suppose that $\theta$ is uniformly distributed on a rectangle that has been rotated 45 degrees: $\Theta = \{\theta \in \mathbb{R}^2 : \underline\theta \leq \theta_1 + \theta_2 \leq \bar\theta,\ -d \leq \theta_1 - \theta_2 \leq d\}$. Then, it is easy to see that $\hat\theta$ has a uniform distribution on $[\underline\theta, \bar\theta]$, which implies that $q^*(\theta) = 2\hat\theta - \bar\theta$ and that the exclusion region vanishes when $2\underline\theta > \bar\theta$. This shows that the exclusion property discovered by Armstrong (1996) is not intrinsically related to multidimensionality, but rather to the properties of the distribution of types.
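The exclusion property in the toy aggregation example can be made concrete under one illustrative assumption that is not in the text: $\theta_1$ and $\theta_2$ independent and uniform on $[0, 1]$, with zero production cost. The density of $\hat\theta = \theta_1 + \theta_2$ is then triangular and vanishes at the southwest corner. All function names below are hypothetical.

```python
import math


def cdf_sum(s: float) -> float:
    """CDF of theta_hat = theta_1 + theta_2 for independent U[0, 1] components."""
    if s <= 0.0:
        return 0.0
    if s <= 1.0:
        return 0.5 * s * s
    if s < 2.0:
        return 1.0 - 0.5 * (2.0 - s) ** 2
    return 1.0


def pdf_sum(s: float) -> float:
    """Triangular density of theta_hat; it vanishes at the lower bound s = 0."""
    if 0.0 < s <= 1.0:
        return s
    if 1.0 < s < 2.0:
        return 2.0 - s
    return 0.0


def optimal_quantity(s: float) -> float:
    """Pointwise maximizer of [s - (1 - G(s)) / g(s)] q - q^2 / 2 over q >= 0."""
    if not 0.0 < s < 2.0:
        raise ValueError("s must lie strictly inside the support (0, 2)")
    virtual_type = s - (1.0 - cdf_sum(s)) / pdf_sum(s)
    return max(0.0, virtual_type)


exclusion_threshold = math.sqrt(2.0 / 3.0)  # root of 1.5 s - 1 / s = 0 on (0, 1]
excluded_mass = cdf_sum(exclusion_threshold)
```

Pointwise maximization is legitimate here because the virtual type is increasing in $\hat\theta$. Under these assumptions, types with $\theta_1 + \theta_2$ below roughly 0.82, one third of the population, are excluded.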

4.2. Separability

Wilson (1993a, 1993b) and Armstrong (1996) were the first to provide closed-form solutions to multidimensional screening models. These solutions are all of the separable type. An illustration can be given in our framework by assuming linear parametrization of surplus with respect to types,
$$S(\theta, q) = \theta \cdot q - W(q),$$
where $W$ is convex such that $\nabla W(0) = 0$, and a density $f$ of types that depends only on $\|\theta\|$. Consider, for example, the case where there are two goods ($m = 2$), and $f$ is the density of a truncated normal on a quarter of a circle of center 0 and radius $R > 1$. Wilson (1993a) and Armstrong (1996) find conditions under which the solution to the monopolist problem depends only on the distribution of types along the “rays” (i.e., the straight lines through the origin). In other words, they look for cases where the only binding IC constraints are “radial” (see Figure 5.4).

Figure 5.4. Radial incentive compatibility constraints. [Diagram omitted: in the $(\theta_1, \theta_2)$ plane, arrows labeled “radial IC constraints” point along rays toward the origin.]


If this is the case, the solution can be determined by computing the conditional distribution of types along the rays. This is done by introducing the change of variable $\theta = t\,(\cos\alpha, \sin\alpha)$, with $t \in [0, R]$ and $\alpha \in [0, \pi/2]$. The change of variable formula for multivariate densities gives the conditional density along the rays:
$$g(t) = t \exp\left(-\frac{t^2}{2}\right),$$
which does not depend on $\alpha$. The virtual surplus is easily computed as
$$\Phi(\theta, q) = S(\theta, q) - \frac{1 - G(t)}{g(t)}\, \frac{\theta \cdot q}{\|\theta\|},$$
which gives, after easy computations:
$$\Phi(\theta, q) = \left(1 - \frac{1}{\|\theta\|^2}\right) \theta \cdot q - W(q).$$
This virtual surplus is maximized for $q^*(\theta)$ defined implicitly by $\nabla W(q) = (1 - 1/\|\theta\|^2)\theta$ for $\|\theta\| \geq 1$, and $q = 0$ for $\|\theta\| < 1$. If we use the indirect surplus function $S^*(\theta) = \max_q \{\theta \cdot q - W(q)\}$, this is equivalent to
$$q^*(\theta) = \nabla S^*\left([1 - 1/\|\theta\|^2]_+ \cdot \theta\right),$$
where $[x]_+$ denotes $\max(0, x)$.

We now have to check whether this function $q^*$ satisfies the necessary conditions of the monopoly problem, namely boundary conditions and implementability conditions. The boundary conditions require that the boundary of $\Theta = \mathbb{R}^2_+$ be partitioned into two regions:

• $\partial_0\Theta$, where $u(\theta) \equiv 0$ (no rent);
• $\partial_1\Theta$, where the gradient of the surplus is tangent to the boundary (no distortion at the boundary).

These two regions are represented in Figure 5.5. Notice that the boundary condition is satisfied in $\partial_1\Theta$ only because the extreme rays are tangent to the boundary. This property would not be satisfied if the support of $\theta$ was shifted by an arbitrarily small vector. This more complex case is discussed in Section 9.2. On the other hand, Armstrong (1996) discovered a robust property of the solution, namely the existence of an exclusion region (where $u \equiv 0$): in our example, it corresponds to the region $\|\theta\| \leq 1$. This is explained by the fact that, for “regular” distributions on $\mathbb{R}^2_+$ (similar properties hold for many other domains), the conditional densities along the rays tend to zero when $\|\theta\|$ tends to zero, which implies that inverse hazard rates tend to infinity, as discussed in Section 4.1.

It remains to check that implementability conditions are satisfied. Due to the linearity of preferences with respect to $\theta$, these implementability conditions are equivalent to saying that $q^*$ is the gradient of a convex function (i.e., that $Dq^*$ is a symmetric, positive definite matrix).
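To make the requirement on $Dq^*$ concrete, the sketch below works through one special case chosen purely for illustration: $W(q) = \tfrac{1}{2}\|q\|^2$, so that $\nabla W(q) = q$ and $q^*(\theta) = [1 - 1/\|\theta\|^2]_+\,\theta$. The truncation at $R$ is ignored when computing the inverse hazard rate, and the finite-difference Jacobian is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def inverse_hazard(t: float) -> float:
    """(1 - G(t)) / g(t) for g(t) = t exp(-t^2 / 2), ignoring truncation at R."""
    survival = math.exp(-0.5 * t * t)
    density = t * math.exp(-0.5 * t * t)
    return survival / density  # equals 1 / t


def q_star(theta: Sequence[float]) -> Tuple[float, float]:
    """q*(theta) = [1 - 1/|theta|^2]_+ theta, assuming W(q) = |q|^2 / 2."""
    r2 = theta[0] ** 2 + theta[1] ** 2
    if r2 == 0.0:
        return (0.0, 0.0)
    scale = max(0.0, 1.0 - 1.0 / r2)
    return (scale * theta[0], scale * theta[1])


def jacobian(theta: Sequence[float], h: float = 1e-6) -> List[List[float]]:
    """Central finite-difference Jacobian; entry [i][k] is d q*_i / d theta_k."""
    cols = []
    for k in range(2):
        up, down = list(theta), list(theta)
        up[k] += h
        down[k] -= h
        f_up, f_down = q_star(up), q_star(down)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f_up, f_down)])
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]


J = jacobian((1.2, 0.9))
asymmetry = abs(J[0][1] - J[1][0])
determinant = J[0][0] * J[1][1] - J[0][1] * J[1][0]
```

In this quadratic case the Jacobian is symmetric with positive trace and determinant outside the exclusion region, so $q^*$ is the gradient of a convex function there.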
Easy computations show that symmetry is equivalent to saying that $S^*(\theta)$ depends only on $\|\theta\|$; that is, it possesses the same type of symmetry as the density of types. If this property is satisfied, the second-order conditions for implementability (i.e., the fact that $Dq^*$ is positive definite) will be automatically satisfied. When this is not the case, the solution is much more complex to characterize, because bunching necessarily appears. We study such an example in Section 9.2.

Figure 5.5. Exclusion region and boundaries. [Diagram omitted: the $(\theta_1, \theta_2)$ plane, showing the exclusion region $u \equiv 0$ near the origin (with the value 1 marked on each axis), the boundary region $\partial_0\Theta$, and the boundary regions $\partial_1\Theta$ labeled “no distortion at the boundary”.]

5. ENVIRONMENTS WITH ONE-DIMENSIONAL INSTRUMENTS

In many multidimensional screening problems, there are more dimensions of heterogeneity than instruments available to the principal ($n < m$). Here, we turn attention to the case of screening problems with one instrument ($n = 1$), but several parameters of adverse selection ($m > 1$) in which, even though a univariate sufficient statistic exists, its distribution is endogenous, depending on the pricing schedule chosen by the firm. Typically, the set of instruments may be limited either for exogenous reasons [see, e.g., the justifications given by Rochet and Stole (2002) for ruling out stochastic contracts] or because the principal restricts herself to a subclass of all possible instruments. For example, Armstrong (1996) focuses on cost-based tariffs in his search for optimal nonlinear prices for a monopolist.$^{22}$

22. Similarly, several authors, e.g., Zheng (2000) and Che and Gale (1996a, 1996b), have studied score auctions, a particular subclass of multidimensional auctions in which the auctioneer aggregates bids using a prespecified scoring rule. As another example, Armstrong and Vickers (2000) consider price-cap regulation under the restriction of no lump-sum transfers.

Using our notation, the monopolist problem in Armstrong (1996) can then be simplified by computing indirect utilities
$$V(y, \theta) = \max_q \{ v(q, \theta) \mid C(q) \leq y \},$$

representing the maximum utility attained by a consumer of type $\theta$ who gets a bundle of total cost less than or equal to $y$. The problem then reduces to finding the best one-dimensional schedule $T(y)$ ($n = 1$) for screening a multidimensional distribution of buyers ($m > 1$).

As in the one-dimensional case, there are two approaches available for this class of problems: the parametric-utility approach and the demand-profile approach. The demand-profile approach is typically far easier to implement, provided that the consumer's preferences can be accurately summarized by a demand profile that depends only on the marginal prices.

Laffont, Maskin, and Rochet (1987) solved such a problem using the parametric-utility approach. Consider the scenario in which a monopolist sells only one good ($n = 1$) to buyers differing by two characteristics: the intercept $\theta_1$ and the slope $-\theta_2$ of their (individual) inverse demand curves. This corresponds to the following parameterization of preferences:
$$v(q, \theta) = \theta_1 q - \tfrac{1}{2}\theta_2 q^2.$$
If we want to apply the parametric-utility methodology, we are confronted with the problem that implementability of an indirect utility function $u(\cdot)$ is more complex to characterize. Indeed, let $P(q)$ be a given price schedule. The corresponding indirect utility $u$ and allocation rule $q$ satisfy
$$u(\theta) = \max_q \left\{ \theta_1 q - \tfrac{1}{2}\theta_2 q^2 - P(q) \right\},$$
where the maximum is attained for $q = q(\theta)$. By the envelope principle, we have that $u$ is again a convex function such that
$$\nabla u(\theta) = \begin{pmatrix} q(\theta) \\ -\tfrac{1}{2} q^2(\theta) \end{pmatrix} \quad \text{for a.e. } \theta.$$
This shows that $u$ necessarily satisfies a nonlinear partial differential equation:
$$\frac{\partial u}{\partial\theta_2} + \frac{1}{2}\left(\frac{\partial u}{\partial\theta_1}\right)^2 = 0. \tag{5.1}$$
The monopolist's problem can then be transformed as before into a calculus of variations problem in $u$ and $\nabla u$, but with the additional constraint (5.1) that makes the program difficult. Interestingly, Wilson's demand-profile approach works very well in this case.
Let us define the demand profile for quantity $q$ at marginal price $p$ as
$$N(p, q) = \text{Prob}[v_q(q, \theta) \geq p] = \text{Prob}[\theta_1 - \theta_2 q \geq p].$$
Assuming a constant marginal cost $c$, the optimal marginal price $p(q) = P'(q)$ can be obtained by maximizing $(p - c) N(p, q)$ with respect to $p$. If $\theta_1$ and $\theta_2$ are distributed independently according to cumulative distributions $F_1$ and $F_2$ (and densities $f_1$ and $f_2$), we obtain
$$N(p, q) = \int_0^{+\infty} \{1 - F_1(p + \theta_2 q)\} f_2(\theta_2)\, d\theta_2.$$
The optimal marginal price is defined implicitly by
$$p(q) = c - \frac{N(p(q), q)}{N_p(p(q), q)} = c + \frac{\int_0^{+\infty} \{1 - F_1(p(q) + \theta_2 q)\} f_2(\theta_2)\, d\theta_2}{\int_0^{+\infty} f_1(p(q) + \theta_2 q) f_2(\theta_2)\, d\theta_2},$$
which generalizes the classical formula obtained when $\theta_2$ is nonstochastic:
$$p(q) = c + \frac{1 - F_1(p(q) + \theta_2 q)}{f_1(p(q) + \theta_2 q)}.$$
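A minimal numerical sketch of the demand-profile computation follows. It is only an illustration under stated assumptions: $\theta_2$ is replaced by a discrete distribution so that the integral becomes a finite sum, the maximization is a plain grid search, and the exponential survival function is used because the answer can then be checked by hand. All names and parameter values are hypothetical.

```python
import math
from typing import Callable, Sequence


def demand_profile(p: float, q: float, survival_1: Callable[[float], float],
                   support_2: Sequence[float], weights_2: Sequence[float]) -> float:
    """N(p, q) = Prob[theta_1 - theta_2 q >= p], with theta_2 discrete and
    independent of theta_1; survival_1(x) = 1 - F_1(x)."""
    return sum(w * survival_1(p + t2 * q) for t2, w in zip(support_2, weights_2))


def optimal_marginal_price(q: float, c: float, survival_1: Callable[[float], float],
                           support_2: Sequence[float], weights_2: Sequence[float],
                           p_max: float = 10.0, grid: int = 20000) -> float:
    """Grid search on [c, p_max] for the p maximizing (p - c) N(p, q)."""
    best_p, best_value = c, 0.0
    for k in range(grid + 1):
        p = c + (p_max - c) * k / grid
        value = (p - c) * demand_profile(p, q, survival_1, support_2, weights_2)
        if value > best_value:
            best_p, best_value = p, value
    return best_p


def exponential_survival(rate: float) -> Callable[[float], float]:
    """Survival function of an exponential distribution with the given rate."""
    return lambda x: math.exp(-rate * x) if x > 0.0 else 1.0


price = optimal_marginal_price(
    q=1.0, c=0.5, survival_1=exponential_survival(2.0),
    support_2=(0.5, 1.0, 1.5), weights_2=(0.2, 0.5, 0.3),
)
```

With an exponential $\theta_1$ of rate 2 and $c = 0.5$, the search returns a price within grid resolution of 1.0, whatever the distribution assumed for $\theta_2$.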

For example, when $\theta_1$ is exponentially distributed (i.e., $f_1(\theta_1) = \lambda_1 e^{-\lambda_1\theta_1}$), the mark-up is constant and the two formulas coincide: $p(q) = c + 1/\lambda_1$. Notice also that $\hat\theta = \theta_1 - \theta_2 q(\theta)$ is a univariate sufficient statistic, but unlike the case considered in Section 4.1, its distribution depends on $q(\theta)$ and thus on the price schedule chosen by the monopolist. We now turn to a subset of these models with a single instrument, in which one dimension of type enters utilities additively.

6. ENVIRONMENTS WITH RANDOM PARTICIPATION

6.1. A General Framework

We consider a class of environments in which $n = 1$, but in which a particular additivity assumption provides sufficient structure to produce some general economic conclusions. Specifically, suppose that $n = 1$ and $m = 2$, but utility of the agent is restricted to the form
$$u = v(q, \theta_1) - \theta_2 - P,$$
where $\Theta_1 = [\underline\theta_1, \bar\theta_1]$ and $\Theta_2 = \mathbb{R}_+$. Several interesting economic settings can be studied within this model.

First, we can think of the $\theta_2$ parameter as capturing a type-dependent participation constraint. Previous work on type-dependent participation has assumed that $\theta_2$ is a deterministic function of $\theta_1$ (e.g., they are perfectly correlated).$^{23}$ In this sense, the framework generalizes the previous one-dimensional literature, although many of the more interesting results rely on independent distributions of $\theta_1$ and $\theta_2$.

23. See, for example, Maggi and Rodriguez-Clare (1995), Lewis and Sappington (1989a, 1989b), and Jullien (2000).


Second, one can think of $\theta_2$ as capturing a “locational cost” in a discrete-choice model of consumer behavior.$^{24}$ This allows one to extend the nonlinear pricing model of Mussa and Rosen (1978) to a more general setting, which may be important to obtain a more realistic model of consumer behavior. As an illustration, consider the predicted consumer behavior of the standard, one-dimensional model following a uniform price increase from $P(q)$ to $P(q) + \delta$: the units sold at every quality level except the lowest should remain unchanged. This is because a shift in $P(q)$ has no effect on any of the incentive compatibility conditions, since the shift occurs on both sides of the constraints. By adding the stochastic utility effect of $\theta_2$, predicted market shares would smoothly change for all types, although perhaps more dramatically for lower types.

Third, consider the regulatory setting first discussed in Baron and Myerson (1982). There, a regulator designs an optimal mechanism for regulating a monopoly with unknown marginal cost. Suppose that, in addition, fixed costs are also private information: i.e., $C(q) = \theta_1 q + \theta_2$. Profit for the regulated firm that receives $T(q)$ as a transfer from the regulator for producing $q$ units is $\pi = T(q) - \theta_1 q - \theta_2$, which has a one-to-one correspondence with the previous monopoly setting.$^{25}$

Other closely related examples that we discuss in more detail include selling to liquidity-constrained buyers, where $\theta_2$ captures the buyer's available budget, regulation of a firm in an environment with demand and cost heterogeneity, competition between oligopolists selling differentiated products with nonlinear pricing, and competition among sellers providing goods via auctions. The key simplifications in all of these settings are twofold. First, one dimension of information enters additively. As such, $q$ is unavailable for direct screening on this additive attribute. Second, attention is limited to deterministic$^{26}$ price schedules, $P(q)$.

24. See Anderson, de Palma, and Thisse (1992) for a review of this large literature, and Berry, Levinsohn, and Pakes (1995) for an econometric justification of the additive specification.

25. Rochet (1984) first solved this problem on an example with general mechanisms that rely on randomization. Applying Rochet and Stole's (2002) results to this context is appropriate in the restricted setting in which the price schedule is deterministic. In this case, Rochet and Stole (2002) show that the presence of uncertainty over fixed costs causes the optimal regulation to reduce the extent of the production distortion.

26. Given the relevance of deterministic contracts, this may seem a reasonable restriction, a priori. In general, however, the principal may be able to do better by introducing a second screening instrument, $\phi$, which represents the probability that the agent is turned away with $q = 0$. In this case, utility becomes $\phi(v(q, \theta_1) - \theta_2 - P)$ and $\phi$ can be used to screen different values of $\theta_2$. On the other hand, it is without loss of generality to rule out such random mechanisms when either (i) the value $\theta_2$ is lost by participating in the mechanism (i.e., even if $\phi = 0$), which eliminates the possibility to screen over $\theta_2$; or (ii) the agent can anonymously return to the principal until $\phi = 1$ is realized, in which case the problem is stationary and the agent will continue to return until $q > 0$, so there is no benefit to the randomization. We leave the discussion of stochastic mechanisms unresolved and simply restrict attention to deterministic price schedules, remaining agnostic about the reasons.


We take the joint density to be f (θ1 , θ2 ) > 0 on 1 × 2 , the marginal distribution of θ1 as f 1 (θ

θ1 ), and the conditional cumulative distribution function for θ2 as G(θ2 | θ1 ) ≡ θ 2 f (θ1 , t) dt. Deﬁne the indirect utility function 2

u(θ1 ) ≡ max v(q, θ1 ) − P(q). q∈Q

This indirect utility is independent of the additive component, θ2 , because it does not affect the optimal choice of q, conditional on q > 0. Net utility is given by u(θ1 ) − θ2 . Note that the agent’s participation contains an additional random component: i.e., the agent participates iff u(θ1 ) ≥ θ2 . Hence, an agent with type θ1 participates with probability G(u(θ1 ) | θ1 ), and the expected proﬁt of a mechanism that generates {q(θ1 ), u(θ1 )} for all participating agents is G(u(θ1 ) | θ1 ) (S(q(θ1 ), θ1 ) − u(θ1 )) f (θ1 ) dθ1 . 1

This is maximized subject to the standard one-dimensional incentive compatibility conditions: $\dot{u}(\theta_1) = v_{\theta_1}(q(\theta_1), \theta_1)$ and $q(\theta_1)$ nondecreasing. In short, we have removed the typical corner condition that would require the utility of the lowest type, which we denote $\underline{u} \equiv u(\underline{\theta}_1)$, to be zero, and instead introduced an endogenous determination of $\underline{u}$.

The endogeneity of $\underline{u}$ poses some difficulties that were not present in the one-dimensional setting. First and foremost, part of the block-recursive structure is now lost: There is a nonrecursive aspect to the problem as the entire function $q(\theta_1)$ and the initial condition $u(\underline{\theta}_1)$ must be jointly determined. Given that a purchasing consumer’s preferences are ordered by a single-crossing property in $(\theta_1, q)$, the general problem of global vs. local incentive constraints is not present; incentive constraints are still recursive in their structure, although we may have to restrict $q$ to a nondecreasing allocation. The problem is that the first-order condition determining the optimal utility for the lowest type, $\underline{u}$, depends on the optimal quantity schedule, $\{q(\theta_1)\}_{\theta_1 \in \Theta_1}$, and the first-order equation for the latter (specifically, the Euler equation) depends on the value of the former. Thus, although the resulting system of equations is not a system of partial differential equations as is common in the general multidimensional continuous type setting, but rather a second-order boundary-value problem, it is still more complicated than the standard initial-value first-order problem that arises in the canonical class of one-dimensional models. Finding general characteristics of the solution is difficult without imposing some additional structure.
A convenient restriction used in Rochet and Stole (2002) is to focus attention on independent distributions of $\theta_1$ and $\theta_2$, requiring that the former is distributed uniformly on $\Theta_1$ and that the latter have a log-concave conditional cumulative distribution function.27 Even with these distributional simplifications, the additional effect on market share still

27 In Rochet and Stole (2002), some general results are nonetheless available in the two-type setting, providing that $G(\theta_2 \mid \theta_1)$ is log-concave in $\theta_2$.


provides substantial difficulty. The primary cause of the difficulties is that the relaxed program (without monotonicity imposed) frequently generates nonmonotonic solutions. Hence, pooling occurs even with nonpathological distributions. Nonetheless, as a first result, one can show that if pooling occurs, it occurs only for a lower interval on $\Theta_1$ and that otherwise efficiency occurs on the boundaries of $\Theta_1$. This already is a substantial departure from the one-dimensional setting, and shares many similarities with the work of Rochet and Choné (1998) (see Section 9.2), especially the general presence of bunching and the efficiency on the boundaries. Several results emerge beyond pooling or efficiency at the bottom. First, as the distribution on $\Theta_2$ converges to an atom at $\theta_2 = 0$, the optimal allocation converges to that of the standard one-dimensional setting. Second, one can demonstrate that the optimal solution is always bounded above by the first-best allocation and below by the MR allocation. This last result has a clear economic intuition behind it. Under the standard one-dimensional setting, there is no reason not to extract the maximal amount of rent from the agents. This is the reason for distorting output downward; it allows the principal to extract greater rents from the higher types without completely shutting off the lower types from the market. When participation depends monotonically on the amount of rent left to the agent, it seems natural to leave more rents to the agent on the margin, and therefore to reduce the magnitude of the distortions. The argument is a bit more involved than this, because the presence of pooling eliminates these simple envelope-style arguments.

These results can be illustrated with a numerical example. Suppose that $\Theta_1 = [4, 5]$ and $G(\theta_2) = 1 - e^{-\theta_2/\sigma}$. Here, we use $\sigma$ as a crude measure of the amount of noise in the participation constraint. As $\sigma$ goes to zero, the exponential distribution converges to an atom on zero.
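The role of $\sigma$ as a noise parameter can be seen by tabulating the participation probability $G(u) = 1 - e^{-u/\sigma}$ for a few rent levels. This is only an illustrative sketch; the function name `participation_prob` and the sample values of $u$ and $\sigma$ are assumptions chosen for the example.

```python
import math

def participation_prob(u, sigma):
    """G(u) = 1 - exp(-u / sigma) for u > 0, and 0 otherwise (exponential CDF)."""
    if u <= 0:
        return 0.0
    return 1.0 - math.exp(-u / sigma)

if __name__ == "__main__":
    rents = (0.05, 0.5, 2.0)          # hypothetical rent levels u
    for sigma in (1.0, 0.25, 0.01):   # smaller sigma means less participation noise
        row = [round(participation_prob(u, sigma), 4) for u in rents]
        print(f"sigma={sigma}: {row}")
```

As $\sigma$ shrinks, any strictly positive rent induces participation with probability close to one, which is the sense in which the distribution collapses to an atom at zero.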
In the example, as $\sigma$ becomes small, the optimal allocation converges pointwise to the MR allocation, although pooling emerges at the bottom. For $\sigma$ sufficiently large, the allocation becomes efficient on the boundaries of $\Theta_1$ (see Figure 5.6). Returning to our previous discussion of other applications, it should be clear that these results immediately extend to the regulatory environment of Baron and Myerson (1982), where marginal and fixed costs are represented by $\theta_1$ and $\theta_2$, respectively, and the regulator is restricted to offering a deterministic, nonlinear transfer schedule. Other settings fit into this class of models in a less obvious manner. For example, consider the papers of Lewis and Sappington (1988) and Armstrong (1999a), which look at regulation of a firm in an environment of two-dimensional private information: demand is $q = x - p$, marginal cost is $c$, and the firm’s private information is $(x, c)$. The regulator observes only the price of the firm’s output and offers a transfer that depends on price, $T(p)$. The firm’s payoff is $u = (x - p)(p - c) + T(p)$; the regulator maximizes consumer surplus less transfer, $W = \tfrac{1}{2}(x - p)^2 - T(p)$. Redefine the private information as $\theta_1 = x + c$ and $\theta_2 = xc$. Following similar arguments as previously described after substituting for the demand function,


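[Figure 5.6 omitted: quality $q(\theta_1)$ plotted against type $\theta_1 \in [4, 5]$, showing the first-best schedule $q^{FB}(\theta_1)$, the MR schedule $q^{MR}(\theta_1)$, and the optimal schedules for several values of $\sigma$.]

Figure 5.6. The monopoly solution with a uniform distribution of $\theta_1$ on [4, 5] and an exponential distribution of $\theta_2$: $G(u) = 1 - e^{-u}$.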

we can define $u(\theta_1) \equiv \max_p\; \theta_1 p - p^2 + T(p)$ and $p(\theta_1)$ to be the corresponding maximizer. Note that the local IC constraint requires that $\dot{u}(\theta_1) = p(\theta_1) \ge 0$; second-order conditions require that $u$ is convex in $\theta_1$ (i.e., $p$ is nondecreasing in $\theta_1$). The firm will participate if and only if $u(\theta_1) \ge \theta_2$. The regulator’s program can then be written as

$$\max_{\{p(\theta_1),\, u(\theta_1)\}} \int_{\Theta_1} G(u(\theta_1) \mid \theta_1) \times \Bigl[\gamma_1(\theta_1, u(\theta_1)) + p(\theta_1)\,\gamma_2(\theta_1, u(\theta_1)) - \tfrac{1}{2}\, p(\theta_1)^2 - u(\theta_1)\Bigr]\, d\theta_1,$$

subject to $\dot{u}(\theta_1) = p(\theta_1)$ and $\ddot{u}(\theta_1) \ge 0$, where $\gamma_1(\theta_1, \theta_2) = E[\tfrac{1}{2} x^2 \mid \theta_1, \tilde{\theta}_2 \le \theta_2]$ and $\gamma_2(\theta_1, \theta_2) = E[c \mid \theta_1, \tilde{\theta}_2 \le \theta_2]$.

As another example, the work on optimal taxation is frequently concerned about leaving rents to agents that is characterized as part of the principal’s objective function. Here, there is a natural connection to this class of models.

As a last example, it is worth noting the recent work of Che and Gale (2000) on two-dimensional screening when one of the dimensions is the budget constraint of the buyer. In their framework, the monopolist is selling a good to a consumer with preferences $u = \theta_1 q - P$, but with a budget constraint given by $\theta_2$. Hence, the indirect utility function is necessarily two-dimensional:

$$u(\theta_1, \theta_2) \equiv \max_{\{q \,\mid\, P(q) \le \theta_2\}}\; \theta_1 q - P(q).$$
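The budget-constrained indirect utility just defined can be illustrated by brute force on a grid. The sketch below is only an example under stated assumptions: the price schedule $P(q) = q^2$ is hypothetical (chosen because it is increasing, convex, and passes through the origin), and the function name `budget_constrained_utility` is invented for the illustration. Exact rational arithmetic avoids rounding issues at the budget boundary.

```python
from fractions import Fraction

def budget_constrained_utility(theta1, theta2, P, q_grid):
    """max of theta1 * q - P(q) over grid points q with P(q) <= theta2."""
    feasible = [q for q in q_grid if P(q) <= theta2]
    return max(theta1 * q - P(q) for q in feasible)

if __name__ == "__main__":
    P = lambda q: q * q                                # hypothetical schedule
    grid = [Fraction(k, 100) for k in range(101)]      # q in [0, 1]
    theta1 = Fraction(1)
    for budget in (Fraction(1, 25), Fraction(1, 4), Fraction(1)):
        print(budget, budget_constrained_utility(theta1, budget, P, grid))
```

With $\theta_1 = 1$ the unconstrained choice is $q = 1/2$ at a payment of $1/4$; a budget below $1/4$ forces a smaller purchase, so two buyers with the same $\theta_1$ obtain different utilities, which is why the indirect utility depends on both dimensions.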


This is a departure from the basic model presented in that θ2 does not enter utility linearly, and monetary payments can be directly informative about the buyer’s budget θ2 , because a buyer cannot pay more than he has available. Although this problem looks more complicated than the previous setting, the authors demonstrate that an optimal nonlinear pricing schedule is increasing, convex, and goes through the origin. This pins down the utility of the lowest type, u; efﬁciency at the top determines the other boundary. Although the resulting Euler equation generates a second-order differential equation, the solution can be found analytically in many simple examples. Formally, this setting differs from the previous in that the variable θ2 represents dissipated surplus in the case of Rochet and Stole (2002), but θ2 represents a constraint on how much money can be transferred to the principal in Che and Gale (2000). This minor difference nonetheless translates into a signiﬁcant effect on the nature of the solution: in Rochet and Stole (2002) the determination of the participation region is more difﬁcult than in Che and Gale’s (2000) setting, where the latter are able to demonstrate that the optimal tariff goes through the origin and generates full participation, albeit with distorted consumption.28 Finally, it is worth pointing out that the general class of problems contained in this section are closely related to models of nonlinear pricing with income effects. As Wilson (1993a) has noted in his discussion of income effects (i.e., in models in which the marginal utility of money is not constant, but varies either with wealth levels or with some related parameterization), in general the Euler conditions for optimality will consist of second-order differential equations (rather than ﬁrst-order in the canonical case) and ﬁxed fees may be part of an optimal pricing schedule. 
Using the demand-profile approach, suppose that the income effect is modeled by a nonlinearity in money:

$$N[P, p, q] = \operatorname{Prob}\bigl[\theta \in \Theta \mid \mathrm{MRS}(q, I - P(q), \theta) \ge p(q)\bigr].$$

28 One may be tempted to solve the budget-constrained class of problems in Che and Gale (2000) by appealing to the aggregation results presented earlier. In particular, a natural candidate for a sufficient statistic when there are unit demands, $q \in [0, 1]$, is $\theta = \min\{\theta_1, \theta_2\}$. This line of reasoning is flawed because $\min\{\theta_1, \theta_2\}$ is not a sufficient statistic for the consumer’s marginal rate of substitution between money and $q$. A simple example from Che and Gale (2000) demonstrates this most clearly. Suppose first that the consumer’s valuation for the good is distributed uniformly on $\Theta_1 = [0, 1]$ and the consumer’s wealth is nonstochastic and equal to $\theta_2 = 2$. The revenue-maximizing unit price is $P(1) = 1/2$ and expected revenues are $1/4$. Utilizing a price-quantity schedule cannot further increase revenues. Now, suppose instead that the consumer’s valuation is fixed at $\theta_1 = 2$, but wealth is a random variable distributed uniformly on $\Theta_2 = [0, 1]$. In this case, $\min\{\theta_1, \theta_2\}$ is distributed exactly as in the former setting, but now the monopolist can raise expected revenues by charging the price schedule $P(q) = 2q$ for $q \in [0, 1]$. Each consumer of type $\theta_2$ purchases the fraction $q = \theta_2/2$, and expected revenues are $1/2$. Aggregation fails because the marginal rates of substitution differ across the two settings and are not functions of the same aggregate statistic. In the first, the marginal rate of substitution of $q$ for money is $\theta_1$ if the total purchase price is less than or equal to 2 and 0 if the total price is greater than 2. In the second setting, the marginal rate of substitution is 2 if the total price is less than or equal to $\theta_2$, and 0 otherwise.
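The two revenue figures in the Che and Gale example of footnote 28 can be checked with exact arithmetic. This is a small verification sketch; the function names and the grid resolution are assumptions made for the illustration, while the purchasing behavior is taken from the footnote.

```python
from fractions import Fraction

def revenue_unit_price(p):
    """Setting 1: valuation uniform on [0, 1], wealth 2 (never binding).
    The consumer buys the unit iff valuation >= p, so revenue is p * (1 - p)."""
    p = Fraction(p)
    return p * (1 - p)

def revenue_linear_schedule(n=1000):
    """Setting 2: valuation 2, wealth uniform on [0, 1], schedule P(q) = 2q.
    A consumer with wealth theta2 buys q = theta2 / 2 and pays theta2, so
    expected revenue is E[theta2], computed here by a midpoint average."""
    return sum(Fraction(2 * k + 1, 2 * n) for k in range(n)) / n

if __name__ == "__main__":
    grid = [Fraction(k, 100) for k in range(101)]
    best = max(grid, key=revenue_unit_price)
    print("setting 1:", best, revenue_unit_price(best))
    print("setting 2:", revenue_linear_schedule())
```

The same minimum statistic therefore supports revenue of $1/4$ in one setting and $1/2$ in the other, which is the failure of aggregation the footnote describes.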


Here, $I$ represents income and the demand profile depends on the marginal price, $p(q)$, and the total price level, $P(q)$, since the latter affects the marginal rate of substitution of $q$ for money. The Euler equation is

$$\frac{\partial N}{\partial P}\,[p(q) - C'(q)] - \frac{d}{dq}\left\{ N + \frac{\partial N}{\partial p}\,[p(q) - C'(q)] \right\} = 0.$$

Because the second component is totally differentiated by $q$, a second-order differential equation arises. The problem loses much of its tractability because $N$ now depends on the total price level $P$ as well as marginal price, $p$. Economically, the problem is complicated because the choice of a marginal price for some $q$ will shift the consumer’s demand curve via an income effect, which will affect the optimality of other marginal prices. Hence, the program is no longer block-recursive in structure as in Rochet and Stole (2002). As one raises the marginal price of a given level of output, one also lowers the participation rate for all consumers who consume that margin or greater. It is not a coincidence that, in some models of self-selection, private information over income, and exponential utility, the nature of the optimal allocation resembles that of the allocations in the nonlinear pricing context with random participation, as in Salanié (1990) and Laffont and Rochet (1998).

7. COMPETITIVE ENVIRONMENTS

This section builds on the previous sections by applying various models to study the effects of competition on the design of screening contracts. There have been some limited attempts to model imperfect competition between firms competing with nonlinear prices within a one-dimensional framework. This, for example, is the approach taken in the papers by Spulber (1989), Ivaldi and Martimort (1994), and Stole (1995). Similarly, in most work on common agency in screening environments (e.g., Stole 1991 and Martimort 1992, 1996), the agent’s private information is of one dimension.
Unfortunately, as argued previously, competitive models naturally suggest at least two dimensions of heterogeneity; so, the robustness of these approaches may be called into question. Several papers have considered competitive nonlinear pricing using a variety of methodologies. We briefly survey a few papers using the demand-profile methodology with some limited success. We then present a specific form of bidimensional heterogeneity that has been successful in applied work.

7.1. A Variety of Demand-Profile Approaches

Wilson (1993a, Chapter 12) surveys the basic economics of ﬁrms competing with nonlinear prices, outlining two general classes of models. The ﬁrst category supposes that there is some product differentiation between the ﬁrms. As before, an aggregate demand proﬁle can be constructed that measures the proportion of consumers who buy from ﬁrm i at least q units when the marginal price is p;


this demand profile obviously depends on the nonlinear price schedules offered by the other firms. The first-order conditions for optimality now include terms capturing the flux of consumer purchases on the boundaries, but also isolate a competitive externality. Wilson numerically solves two models of this sort. A second category of models discussed by Wilson (1993a) assumes that products are homogeneous. Now, to avoid the outcome of zero-profit, marginal-cost pricing between the competing firms, one has to assume some sort of extensive form game (e.g., a Cournot game where output is brought to market and then is subsequently priced with nonlinear price schedules, etc.). Several games are considered with a variety of strategic restrictions and results in Oren, Smith, and Wilson (1982) using the demand-profile approach.

7.2. A Specific Approach: Location Models (Hotelling Type)

The third, more recent, approach has been to model competition in multidimensional environments in which simple aggregation is not available by introducing one dimension of uncertainty to handle the differentiation between firms (e.g., brand location and, more generally, “horizontal” heterogeneity) and another dimension to capture important characteristics of consumer tastes that may be similar in effect across all firms (e.g., marginal willingness to pay for quantity/quality and, more generally, “vertical” heterogeneity). Recent papers that take this approach include Armstrong and Vickers (1999), Biglaiser and Mezzetti (1999), Rochet and Stole (2002), and Schmidt-Mohr and Villas-Boas (1999), among others. We briefly survey the model and results in Rochet and Stole (1997, 2002) before remarking on the similar treatments by other authors. As we suggest, this framework for modeling oligopoly markets is quite general; we need only posit some distribution of horizontal preferences.29 What is fundamental is that our proposed model affords a vertical preference parameter along the lines of Mussa and Rosen (1978), while also incorporating a measure of imperfect competition by allowing for distinct horizontal preferences.30 For simplicity, consider the case of two firms competing at either end of a market with unit length and transportation cost $\sigma$. We will let $\theta_2^L \equiv \theta_2$ denote the distance from a consumer located at $\theta_2$ to the left firm and $\theta_2^R \equiv 1 - \theta_2$ denote the distance from the same consumer to the right firm. Preferences are as before: For a consumer of type $(\theta_1, \theta_2)$ consuming from firm $j$ an amount $q_j$ at a price

29 Such a framework has been usefully employed recently by Laffont, Rey, and Tirole (1998a, 1998b) and Dessein (1999) for studying competition between telecommunications networks.

30 This modeling of competition is in the spirit of some recent empirical work on price discrimination. Leslie (1999), for example, in his study of Broadway theater ticket pricing finds it useful to incorporate heterogeneous valuations of outside alternatives to capture the presence of competing firms while maintaining a distinct form of vertical heterogeneity (in this case, income) to capture variation in preferences over quality. Because Leslie (1999) takes the quality of theater seats as fixed, he does not solve for the optimal quality-price schedule. Similarly, Ginsburgh and Weber (1996) use a Hotelling-type model to study price discrimination in the European car market.


of $P_j$, the consumer obtains utility of $\theta_1 q_j - \theta_2^j - P_j$. We further assume that $\theta_1$ is distributed independently of $\theta_2$, with $F(\theta_1)$ and $G(\theta_2)$ representing the distribution of types, respectively. Each firm simultaneously posts a publicly observable price schedule, $P_j(q_j)$, after which each consumer decides which firm (if any) to visit and which price-quality pair to select. The market share of firm $j$ among consumers of type $(\theta_1, \theta_2)$ can be computed easily:

$$M_j(u_j, u_k) = G_j\!\left(\min\left\{\frac{u_j}{\sigma},\; \frac{1}{2} + \frac{u_j - u_k}{2\sigma}\right\}\right). \tag{7.1}$$

This comes from the fact that the marginal consumer of firm $j$ is located at a distance that is the minimum of $u_j(\theta_1)/\sigma$ (which occurs when the total market shares are less than one, the local monopoly regime) and $\tfrac{1}{2} + (u_j - u_k)/2\sigma$ (which occurs when all the market is served, the competitive regime). Again, using the dual approach, we can write the total expected profit of firm $j$ as a functional involving the consumers’ rents $u_j(\cdot)$ and $u_k(\cdot)$ taken as the strategic variables of the two firms:

$$u_j(\theta_1) \equiv \max_q\; \theta_1 q - P_j(q),$$

where $P_j$ is the price schedule chosen by firm $j$. We obtain

$$B_j(u_j, u_k) = \int_{\underline{\theta}_1}^{\bar{\theta}_1} \{S(t, q_j(t)) - u_j(t)\}\, M_j(u_j(t), u_k(t))\, dt, \tag{7.2}$$

where $q_j$ is again related to $u_j$ by the first-order differential equation $\dot{u}_j(\theta_1) = q_j(\theta_1)$. We now look for a Nash equilibrium of the normal form game defined by (7.1) and (7.2), where the strategy spaces of the firms have been restricted to $u_j$ consistent with nondecreasing quality allocations. This turns out to be a difficult task in general, because of the monotonicity conditions [remember that $q_L(\cdot)$ and $q_R(\cdot)$ have to be nondecreasing]. However, if we neglect these monotonicity conditions (which can be checked ex post), competitive nonlinear prices can be characterized by a straightforward set of Hamiltonian equations.

A numerical example is illustrative. Consider, for example, the case when $\theta_1$ is uniformly distributed on [4, 5], which is shown in Figure 5.7. For $\sigma$ sufficiently large (i.e., $\sigma > 14.8$), the market shares of the two firms do not adjoin: each firm is in a (local) monopoly situation and the quality allocation is exactly the same as in our previously analyzed monopoly setting.31 Interestingly, when the market shares are adjoining for high $\theta_1$ (i.e., $u_L(\bar{\theta}_1) + u_R(\bar{\theta}_1) \ge \sigma$), but not all $\theta_1$ (i.e., $u_L(\underline{\theta}_1) + u_R(\underline{\theta}_1) < \sigma$), the qualitative pattern of the solution remains identical (cf. Figure 5.7 below for $\sigma = 10$). However, when $u_L(\underline{\theta}_1) + u_R(\underline{\theta}_1) \ge \sigma$ (the fully competitive regime), it turns out that quality distortions disappear completely (cf. Figure 5.7 below, $\sigma < 16/3$). In this particular case, the equilibrium pricing schedules are cost-plus-fee schedules.

31 It can be proved that this local monopoly solution involves full separation.
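The market-share formula (7.1) and its two regimes can be sketched in a few lines. This is an illustration under an explicit assumption: the distance to each firm is taken to be uniform on $[0, 1]$, as in the numerical example, so the distribution function is a clipped identity. The names `uniform_cdf`, `market_share`, and `is_local_monopoly` are hypothetical.

```python
def uniform_cdf(x):
    # Assumed distribution of distance to a firm: uniform on [0, 1].
    return min(max(x, 0.0), 1.0)

def market_share(u_j, u_k, sigma, G=uniform_cdf):
    """Share of firm j among consumers with rents (u_j, u_k), as in (7.1)."""
    local_monopoly = u_j / sigma                        # marginal consumer gets zero net utility
    competitive = 0.5 + (u_j - u_k) / (2.0 * sigma)     # marginal consumer is indifferent
    return G(min(local_monopoly, competitive))

def is_local_monopoly(u_j, u_k, sigma):
    # The first term is the smaller one exactly when u_j + u_k <= sigma.
    return u_j + u_k <= sigma

if __name__ == "__main__":
    sigma = 10.0
    for u_left, u_right in ((2.0, 2.0), (6.0, 6.0), (7.0, 5.0)):
        left = market_share(u_left, u_right, sigma)
        right = market_share(u_right, u_left, sigma)
        print(u_left, u_right, left, right, is_local_monopoly(u_left, u_right, sigma))
```

When the rents sum to less than $\sigma$ the two shares add up to less than one and some consumers in the middle are unserved; once the rents sum to at least $\sigma$ the shares add up to one, matching the adjoining-market condition in the text.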


[Figure 5.7 omitted: quality $q(\theta_1)$ plotted against type $\theta_1 \in [4, 5]$, with curves labeled $q^{FB}(\theta_1)$ ($\sigma < 5.33$), $\sigma = 10.0$, and $\sigma > 14.8$.]

Figure 5.7. Quality choices in the oligopoly equilibrium for three regimes: fully competitive (σ < 16/3), mixed (σ = 10), and local monopoly (σ > 14.8). We assume that θ2 is uniform on [0, 1]; θ1 is uniform on [4, 5].

As demonstrated by Armstrong and Vickers (1999) and Rochet and Stole (2002), the result that efficiency in $q$ emerges for a fully covered market is somewhat general, extending well beyond this simple example. Formally, if $\sigma$ is sufficiently small so as to guarantee that every consumer in $\Theta_1 \times \Theta_2$ purchases, in equilibrium each firm offers a cost-plus-fee pricing schedule, $P_j(q) = C(q) + F_j$, and each customer consumes the efficient allocation $q^{fb}(t)$ from one of the two firms. Fundamentally, this result relies on full coverage and the requirement that the inverse hazard rate is constant over $\theta_1$ in equilibrium for each firm. More generally, we could think about an $N$-firm oligopoly with a joint distribution of $(\theta_2^1, \ldots, \theta_2^N)$. Formally, let $G_i(u_1, \ldots, u_N) \equiv \operatorname{Prob}[u_i - \theta_2^i \ge \max_{j \ne i}(u_j - \theta_2^j)]$, and let the inverse hazard rate be given by

$$H_i(u_1, \ldots, u_N) = \frac{G_i(u_1, \ldots, u_N)}{\dfrac{\partial}{\partial u_i} G_i(u_1, \ldots, u_N)}.$$

Then, if $\frac{d}{du} H_i(u, \ldots, u) = 0$ for each $i$, cost-plus-fixed-fee pricing is an equilibrium outcome. Biglaiser and Mezzetti (1999), in a different context, consider the case of auctions for incentive contracts of a restricted form. Because sellers have heterogeneity over their ability to provide quality, each seller’s objective function takes a similar form as in Armstrong and Vickers (1999) and Rochet and Stole


(2002). Nonetheless, because of the structure of preferences and contracts, efficient cost-based pricing emerges only in the limit as preferences become homogeneous.

8. SEQUENTIAL SCREENING MODELS

A common setting in which multidimensional screening is particularly important is when information evolves over time. For example, an important paper by Baron and Besanko (1984) considers the environment in which a regulated firm learns information about its marginal cost over time, and the regulator sets prices over time as a function of the firm’s sequential choices.32 As another example, consider the problem of refund pricing studied by Courty and Li (2000). Here, a consumer purchases an airline ticket knowing that, with some probability, the value of the trip may change. The initial purchase price may depend on the refund price, particularly when marginal return to the ticket may be positively correlated with the likelihood of a change in plans or a high second-period valuation. As a third important example, Clay, Sibley, and Srinagesh (1992) and Miravete (1997) provide convincing evidence that individuals choose purchase plans for services such as telephone and electricity that turn out ex post to be suboptimal, suggesting that the consumer is uncertain about his final needs at the time of contracting. Miravete (1997) goes on to analyze this optimal two-stage tariff problem theoretically.

In these settings, the agent learns at $t = 1$ an initial one-dimensional type parameter, $\theta_1$, distributed according to $F_1(\theta_1)$ on $\Theta_1$, and enters into a contractual relationship with this private information. Making an appeal to the revelation principle, without loss of generality it is assumed that the firm offers a menu of nonlinear price schedules, $\{P(q, \hat{\theta}_1)\}_{\hat{\theta}_1 \in \Theta_1}$, which we index by first-period report, $\hat{\theta}_1$.
Later, at $t = 2$, additional information is revealed to the agent that is economically relevant and that affects the agent’s marginal utility of the contractual activity in the final period. We denote this second-period information as $\theta_2$, which is conditionally distributed according to $F_2(\theta_2 \mid \theta_1)$ on $\Theta_2$ with density $f_2(\theta_2 \mid \theta_1)$. After the realization of $\theta_2$, the consumer chooses a particular quantity from the schedule. Assume that the consumer’s final utility is given by33

$$u = \theta_2 q - \tfrac{1}{2} q^2 - P.$$

32 Baron and Besanko (1984) also address issues of moral hazard and optimal production choice over time in the context of their model.

33 Note that $\theta_2$ can directly depend on $\theta_1$ in this setting. For example, it can be the case that $\theta_2 = \theta_1 + x$, where $x$ is independently distributed from $\theta_1$ and is learned at $t = 2$; this is the setting studied in Miravete (1996). For conciseness, we do, however, require that the support $\Theta_2$ is independent of $\theta_1$.


Given the specific utility of the agent in this setting, we know that $\theta_2$ would be a sufficient statistic for the agent’s preferences, and therefore at date $t = 2$, in the absence of a prior contract, there would be a simple one-dimensional problem to solve: $\theta_1$ is payoff irrelevant conditional on $\theta_2$.34 This class of models differs from previous formulations of the multidimensional problem in that the sequential nature of the information revelation restricts the agent’s ability to lie. Nonetheless, the recurring theme of this survey (that multidimensional models generally pose difficulties in determining the “tree” of binding incentive constraints) reappears in the sequential context as the single-crossing property is once again endogenous. Although the papers written to date on sequential screening have typically imposed sufficient conditions to guarantee a complete ordering (i.e., the single-crossing condition), the source of the problem is still the familiar one.

To see this clearly, consider the second stage of the relationship: incentive compatibility for any given schedule chosen at $t = 1$ is guaranteed by the standard methods. Specifically, defining second-period indirect utility and optimal choice by

$$u(\hat{\theta}_1, \theta_2) \equiv \max_{q \in Q}\; \theta_2 q - \tfrac{1}{2} q^2 - P(q \mid \hat{\theta}_1),$$

$$q(\hat{\theta}_1, \theta_2) \equiv \arg\max_{q \in Q}\; \theta_2 q - \tfrac{1}{2} q^2 - P(q \mid \hat{\theta}_1),$$

second-period incentive compatibility is equivalent to $\partial u(\theta_1, \theta_2)/\partial \theta_2 = q(\theta_1, \theta_2)$ and $q(\theta_1, \theta_2)$ nondecreasing in $\theta_2$. The approach is standard because preferences satisfy a single-crossing property in the second period. First-period incentive compatibility is more difficult to address because single crossing in $(q, \theta_1)$ is not exogenously given. Assuming second-period incentive compatibility, first-period indirect utility as a function of true type $\theta_1$ and reported type $\hat{\theta}_1$ can be defined as

$$\tilde{u}(\hat{\theta}_1 \mid \theta_1) \equiv \int_{\underline{\theta}_2}^{\bar{\theta}_2} u(\hat{\theta}_1, \theta_2)\, f_2(\theta_2 \mid \theta_1)\, d\theta_2.$$

The relevant first-order local condition for truth-telling at $t = 1$ requires that

$$\frac{d}{d\theta_1}\, \tilde{u}(\theta_1 \mid \theta_1) = \int_{\underline{\theta}_2}^{\bar{\theta}_2} u(\hat{\theta}_1, \theta_2)\, \frac{\partial f_2(\theta_2 \mid \theta_1)}{\partial \theta_1}\, d\theta_2,$$

with the right-hand side evaluated at the truthful report $\hat{\theta}_1 = \theta_1$. In the standard setting, the local second-order condition $\partial^2 \tilde{u}(\theta_1 \mid \theta_1)/\partial \theta_1 \partial \hat{\theta}_1 \ge 0$, in tandem with a monotonicity condition, yields a sufficient condition for

34 With more general preferences, however, we could remove the presence of a one-dimensional aggregate and still find that the sequential mechanism restricts the manner in which the agent can misreport, thereby simplifying the set of binding incentive constraints.


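global incentive compatibility, namely the global single-crossing property:

$$\frac{\partial^2 \tilde{u}(\theta_1 \mid \hat{\theta}_1)}{\partial \theta_1\, \partial \hat{\theta}_1} \ge 0 \quad \forall\, \theta_1, \hat{\theta}_1 \in \Theta_1. \tag{scp}$$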

In the present setting, this argument is not available. Instead, the focus is on maximizing the relaxed program (i.e., the program with only local first-order incentive conditions imposed), and then checking ex post that the second-order conditions are satisfied. Substituting the agent’s first-order condition into the principal’s program and integrating by parts twice, we obtain the following virtual surplus for the sequential design program:

$$\Phi(q, \theta_1, \theta_2) = \theta_2 q - \tfrac{1}{2} q^2 + \alpha(\theta_1, \theta_2)\, q - \tilde{u}(\underline{\theta}_1 \mid \underline{\theta}_1),$$

where

$$\alpha(\theta_1, \theta_2) \equiv \frac{\partial F_2(\theta_2 \mid \theta_1)/\partial \theta_1}{f_2(\theta_2 \mid \theta_1)} \cdot \frac{1 - F_1(\theta_1)}{f_1(\theta_1)}.$$
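Because the virtual surplus above is concave in $q$, its pointwise maximizer satisfies the first-order condition $q = \theta_2 + \alpha(\theta_1, \theta_2)$. The sketch below works this out for one special case, which is an assumption made for illustration rather than a case analyzed in the text: the additive structure $\theta_2 = \theta_1 + x$ with $x$ independent of $\theta_1$, and $\theta_1$ uniform on $[0, 1]$. Then $\partial F_2/\partial \theta_1 = -f_2$ and $(1 - F_1)/f_1 = 1 - \theta_1$, so $\alpha = -(1 - \theta_1)$. Truncating quantity at zero is a further assumption, and the function names are hypothetical.

```python
def alpha_additive_uniform(theta1, theta2=None):
    """alpha for theta2 = theta1 + x (x independent) and theta1 ~ U[0, 1]:
    (dF2/dtheta1) / f2 = -1 and (1 - F1) / f1 = 1 - theta1."""
    return -(1.0 - theta1)

def virtual_surplus(q, theta1, theta2, alpha=alpha_additive_uniform):
    # theta2*q - q^2/2 + alpha*q; the constant rent of the lowest type is dropped.
    return theta2 * q - 0.5 * q * q + alpha(theta1, theta2) * q

def relaxed_quantity(theta1, theta2, alpha=alpha_additive_uniform):
    """Pointwise maximizer of the virtual surplus over q >= 0."""
    return max(theta2 + alpha(theta1, theta2), 0.0)

if __name__ == "__main__":
    for theta1 in (0.0, 0.5, 1.0):
        print(theta1, [round(relaxed_quantity(theta1, t2), 3) for t2 in (0.5, 1.0, 1.5)])
```

In this special case $\alpha \le 0$, so quantity is distorted downward from the full-information level $q = \theta_2$ for every first-period type except the highest, and the resulting allocation is nondecreasing in both arguments.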

Given that expected profit is the expectation of this virtual surplus over $(\theta_1, \theta_2)$ and the IR constraint binds only for the lowest type $\theta_1 = \underline{\theta}_1$, profit is maximized by choosing $q$ to maximize the virtual surplus pointwise over $\Theta_1 \times \Theta_2$. In general, the nature of the distortion will depend on the nature of the conditional distribution, $F_2(\theta_2 \mid \theta_1)$. Baron and Besanko (1984) and Courty and Li (2000) consider the case of first-order stochastic dominance (FSD): i.e., $\theta_1$ represents an FSD shift in the distribution of $\theta_2$. Both demonstrate that, under FSD, the IR constraint will bind for the lowest type because: (i) utility in the second period is nondecreasing in $\theta_2$, and (ii) $\theta_1$ shifts the distribution toward higher types. Hence, this relaxed program is cast in the appropriate form, and the principal will choose $\tilde{u}(\underline{\theta}_1 \mid \underline{\theta}_1) = 0$. Examining the relaxed program, we see that in the FSD case, $\alpha < 0$, and the distortion in $q$ is downward, away from the full-information solution. The intuition is by now familiar: By distorting consumption downward by a small amount, $\Delta q$, only a second-order loss in surplus arises, but a first-order gain in rent reduction is obtained as captured by $-\alpha \Delta q$. The difference in the sequential screening FSD model is that the dependence of future rents on $\theta_1$ depends on the informativeness function, $\alpha$. Baron and Besanko (1984) note the importance of commitment in this context, because the second-period allocation will be constrained efficient only if $\alpha(\theta_1, \theta_2)$ happens to equal 1 − F2(θ2 | θ1)/f2(θ2 | θ1). When are the global incentive constraints satisfied by the $q$ that solves the relaxed program? Baron and Besanko (1984) are silent on the sufficient conditions for global incentive compatibility, providing instead an example in which the global constraints are satisfied. Courty and Li (2000) demonstrate that if the resulting $q(\theta_1, \theta_2)$ allocation is nondecreasing in both arguments, then a price


schedule exists that implements the allocation and satisfies global incentive compatibility.35 Courty and Li (2000) also consider the case in which $\theta_1$ parameterizes the distribution of $\theta_2$ in terms of a mean-preserving spread. Again, they demonstrate that the IR constraint will bind only for the lowest type, so the relaxed program is appropriate.36 Global incentive compatibility in the first stage is again difficult to assess, but Courty and Li provide a useful sufficient condition to this end.37 Interestingly, this incentive problem shares many similarities with the one-dimensional problem in which the cross-partial of expected utility with respect to $q$ and $\theta_1$ changes sign as one varies $(q, \theta_1) \in Q \times \Theta_1$; see Araujo and Moreira (2000) for a discussion.38 Taking the distributional assumption of Courty and Li, the solution to the relaxed program has a simple economic description. For all stage-one types, $\theta_1 < \bar{\theta}_1$, the principal introduces a distortion in the second-period adjustment. One can think of the final price as a markup over cost that depends on the difference between the final consumption and its expected value. The lower the initial type (i.e., the lower the noise in the second-stage marginal utility of consumption), the less valuable is the option to change consumption plans in the future. Note that variability creates higher value in expected consumption (which is why the IR constraint binds only for the lowest type, $\underline{\theta}_1$) and hence the monopolist will offer a lower price to this less consumption-valuing segment. The high types have high variability and also

35 This follows immediately from the global sufficient condition for incentive compatibility,

$$\frac{\partial^2 \tilde{u}(\theta_1 \mid \hat{\theta}_1)}{\partial \theta_1\, \partial \hat{\theta}_1} = \int_{\underline{\theta}_2}^{\bar{\theta}_2} \frac{\partial u(\hat{\theta}_1, \theta_2)}{\partial \hat{\theta}_1}\, \frac{\partial f_2(\theta_2 \mid \theta_1)}{\partial \theta_1}\, d\theta_2 \ge 0.$$

Given that $q(\theta_1, \theta_2)$ is nondecreasing in the first argument, the first term in the integrand is a nondecreasing function of $\theta_2$. Because $\theta_1$ represents an FSD improvement in $\theta_2$, this integral must be nonnegative. Moreover, the fact that $q(\theta_1, \theta_2)$ is nondecreasing in its second argument guarantees second-period global incentive compatibility. Because global incentive compatibility does not require that $q(\theta_1, \theta_2)$ be nondecreasing in the first argument, when the relaxed solution is neither monotone nor globally incentive compatible, the solution to the unrelaxed program is more complex than simply using an ironing procedure on $q$ in the relaxed program.

36 This follows from the necessary condition for second-stage incentive compatibility: $q(\theta_1, \theta_2)$ is nondecreasing in $\theta_2$. This (with the local first-order condition) implies that $u(\theta_1, \theta_2)$ is convex in $\theta_2$, and hence a mean-preserving spread must increase utility; hence the IR constraint can bind only for the lowest type, $\underline{\theta}_1$.

37 One useful simplifying assumption used by Courty and Li is that the class of distribution functions passes through the same point $\theta_2 = z$ for all $\theta_1$. This assumption guarantees that $\alpha(\theta_1, \theta_2)$ is negative (positive) for all $\theta_2 < z$ (resp., $\theta_2 > z$). Providing that the resulting allocation $q(\theta_1, \theta_2)$ from the relaxed program is nondecreasing in each argument, global incentive compatibility is satisfied at both stages. This will be the case, for example, whenever $(\partial F_2/\partial \theta_1)/f_2$ does not vary much in $\theta_1$.

38 Araujo and Moreira (2000) study a one-dimensional screening model where the single-crossing condition is relaxed. As a result, nonlocal incentive constraints can be binding. They derive optimal contracts within the set of piecewise continuous contracts, and apply their techniques to bidimensional models with (perfect) negative correlation between the two dimensions of individual heterogeneity.


high option value from altering future consumption. Hence, the firm can screen these customers from the low types by charging a premium for the initial ticket, but allowing a low-cost variation in the level of final consumption. It is also the case that any $\theta_1$ type that draws $\theta_2 = z$ in the second period will consume the efficient allocation; in our present setting, this is $q = \theta_2 = z$. The actual allocation will rotate through this point, departing from the first-best allocation $q = \theta_2$ increasingly as $\theta_1$ decreases. Although the final allocation may have some individuals consuming above the first-best level of output, this should not be considered an upward distortion. Rather, the distortion is in the amount of allowed stage-two adjustment; the principal optimally distorts this adjustment downward from the efficient level.39

9. PRODUCT BUNDLING

In the previous discussions, we have largely focused on a variety of models that are tractable at some expense in generality. Providing, for example, that either simple aggregation or separability exists, the type space is small and discrete, or n = 1, we can deal with multidimensional environments with some success. We now turn to a set of models in which multidimensional screening poses the most difficult problems: n > 1 and m > 1 with nonseparable and nonaggregatable preferences. The most well-studied version of this problem is the problem of commodity bundling by a multiproduct monopolist. We will consider the papers in the literature in this context.

9.1. Some Simple Bundling Environments

We begin with the simplest linear n-product monopolist bundling environment, where m = n. Consumer preferences are given by

$$u = \sum_{i=1}^{n} \theta_i q_i - P,$$

where each $\theta_i$ is independently and identically distributed according to the distribution function $F(\theta_i)$ on $\Theta_i$. (Below, we extend this model to quadratic preferences.) The cost of production is assumed to be zero, but demands are for at most one unit of each product; hence without loss of generality $q_i \in [0, 1]$. The monopolist’s space of contracts is assumed to be a price schedule $P(q_1, \ldots, q_n)$ defined on the domain $[0, 1]^n$. Given that preferences are linear in money and consumption, we can think of $q_i \in (0, 1)$ as representing either a lottery over unit consumption or partial (but deterministic) consumption. We seek to find the optimal price schedule. Nonetheless, even in this simplified setting, we are still looking for a collection of $2^n - 1$ prices.

39 This effect is similar to that which arises in signaling models in which agents desire to signal variance to the market. See, e.g., Prendergast and Stole (1996).
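To make the size of this pricing problem concrete, the following minimal Python sketch enumerates the $2^n - 1$ nonempty bundles and finds a consumer's preferred bundle under the linear preferences above. It assumes deterministic unit purchases only (no lotteries). The function names, the example valuations, and the size-based price schedule are hypothetical illustrations, not taken from the text.

```python
from itertools import combinations


def nonempty_bundles(n):
    """All nonempty subsets of goods {0, ..., n-1}, as sorted tuples."""
    goods = range(n)
    return [c for k in range(1, n + 1) for c in combinations(goods, k)]


def best_bundle(theta, prices):
    """Bundle maximizing sum(theta_i) - P(bundle); () means no purchase."""
    best, best_u = (), 0.0
    for bundle, price in prices.items():
        u = sum(theta[i] for i in bundle) - price
        if u > best_u:
            best, best_u = bundle, u
    return best, best_u


n = 3
bundles = nonempty_bundles(n)          # 2**n - 1 = 7 bundles to price

# Hypothetical symmetric schedule: the price depends on bundle size only.
size_price = {1: 0.6, 2: 1.0, 3: 1.2}
prices = {b: size_price[len(b)] for b in bundles}

# Hypothetical consumer with valuations (0.9, 0.5, 0.1).
choice, surplus = best_bundle((0.9, 0.5, 0.1), prices)
print(len(bundles), choice, round(surplus, 3))
```

With symmetric goods the schedule collapses to n size-dependent prices, which is the simplification the two-good case below exploits.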


Unlike the full one-dimensional (n = m = 1) setting in which the economics of the downward distortion is well understood, it is difficult to see the economics behind the optimal screening contract in multidimensional environments. This is in part because the multidimensional bundling environment is mathematically more complex, but also because there are at least two distinct economic effects. The first is a familiar sorting effect in which consumption is distorted downward to reduce the rents to “higher” types; the second effect arises because if demand parameters are independently distributed, a law-of-large-numbers argument shows that multigoods have a “homogenizing” effect on consumer heterogeneity. To illustrate these effects, we will present two extreme forms of this model: when n = 2 and when n → ∞.

9.1.1. The Case of n = m = 2: Similarities with the One-Dimensional Paradigm

When n = 2, given the symmetry of the problem, we are looking for two marginal prices, p(1) and p(2); i.e., the price for one good, and the price for a second good, having already purchased the first good. The key insight is that even though the marginal values are independently distributed, the order statistics are positively correlated. This positive correlation makes the bundling environment akin to the classic one-dimensional paradigm. In short, provided that the first-order statistic of one consumer is greater than the first-order statistic of another, it is more likely than not that the second-order statistics are similarly ordered. Hence, it is probable that the two-good demand curves of any two consumers are nested. In this sense, a single-crossing property is present in a stochastic fashion.

To demonstrate this more precisely, denote consumer θ’s first- and second-order statistics as $\theta^{(1)}$ and $\theta^{(2)}$, and refer to the corresponding first and second units of consumption as $q^{(1)}$ and $q^{(2)}$. Considering that it is physically possible to consume the second unit only after having consumed the first unit, the firm could think of this as a simple one-dimensional problem and construct the demand profile as follows:

$$N(p, q^{(i)}) = \text{Prob}[\theta^{(i)} \ge p].$$

One could then apply the one-dimensional paradigm for the demand profile to this function to obtain the optimal marginal prices. This procedure, although possibly profitable, will not obtain the maximum possible revenue. The reason why it may work in a crude sense is that a large subset of the type space will have nested demand curves (hence the one-dimensional single-crossing property will hold). Because not all types are so ordered, however, this procedure will fail to maximize revenue. To return to the intuition for why a large subset of types are ordered as if they had one dimension, think about two consumers, where consumer A has a higher first-order statistic than consumer B.
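Conditional on this fact, it is also likely that consumer A will have a higher second-order statistic than B. In the case of uniformly distributed types on $\Theta_i = [0, 1]$, a rank-counting argument (four of the six equally likely ways of assigning two of the four ranked valuations to consumer A yield nested curves) shows that there is a two-thirds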


probability that such a nesting of demand curves will emerge between any two consumers. If the types were perfectly positively correlated, then demand curves over $\{q^{(1)}, q^{(2)}\}$ would always be nested, and we would be in the equivalent of a one-dimensional world. Because some types will have nonnested demand, a one-dimensional single-crossing property will not hold, and hence the simple demand-profile procedure will not maximize profits. Mathematically, the firm needs to account for the possibility of nonnested curves, and this alters the optimal price. A firm following the simple demand-profile procedure incorrectly perceives its profit to be

$$\pi = p(1)\,\text{Prob}[\theta^{(1)} \ge p(1)] + p(2)\,\text{Prob}[\theta^{(2)} \ge p(2)],$$

when in fact its profit is given by

$$\pi = p(1)\Big(\text{Prob}[\theta^{(1)} \ge p(1)] + \text{Prob}[\theta^{(1)} < p(1) \ \&\ \theta^{(1)} + \theta^{(2)} > p(1) + p(2)]\Big) + p(2)\Big(\text{Prob}[\theta^{(2)} \ge p(2)] - \text{Prob}[\theta^{(2)} \ge p(2) \ \&\ \theta^{(1)} + \theta^{(2)} < p(1) + p(2)]\Big).$$

There is an adjustment that must be made to the demand for each product that is not noted by the naive seller. Nonetheless, to the extent that these second terms are small, the naive one-dimensional screening approach does well in approximating the optimal solution.

This simple example makes two points. First, the well-known economic principle behind nonlinear pricing in the one-dimensional model is still present in the two-dimensional model, albeit obscured. Second, as n becomes large, the likelihood that any two consumers will have ordered demand curves decreases to zero, suggesting that the one-dimensional intuition begins to wane as n increases. Although it is difficult to make this second idea precise, we will see that a homogenizing effect along the lines of the law of large numbers removes most of the value for sorting as n increases, suggesting that the one-dimensional intuition is less appropriate for larger n.

9.1.2. The Case of n = m → ∞: The Homogenizing Effect of the Law of Large Numbers
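Before moving to the many-good limit, it may help to check the two-good profit decomposition of the preceding subsection numerically. The sketch below is a Monte Carlo illustration under assumptions that are not in the text: valuations are i.i.d. uniform on [0, 1], the marginal prices 0.6 and 0.3 are arbitrary, and the function name `two_good_profits` is hypothetical. It compares the revenue from consumers' actual surplus-maximizing choices, the naive demand-profile revenue, and the decomposed formula with its two adjustment terms.

```python
import random


def two_good_profits(p1, p2, draws=100_000, seed=0):
    """Per-consumer revenue for marginal prices p1 (first unit), p2 (second unit)."""
    rng = random.Random(seed)
    direct = naive = decomposed = 0.0
    for _ in range(draws):
        a, b = rng.random(), rng.random()
        hi, lo = max(a, b), min(a, b)   # first- and second-order statistics

        # Direct revenue: the consumer buys 0, 1, or 2 units to maximize surplus.
        u1 = hi - p1
        u2 = hi + lo - p1 - p2
        if u2 >= u1 and u2 >= 0:
            direct += p1 + p2
        elif u1 >= 0:
            direct += p1

        # Naive demand profile: each unit is priced against its own order statistic.
        naive += p1 * (hi >= p1) + p2 * (lo >= p2)

        # Decomposition: naive demand plus/minus the adjustment terms.
        total = hi + lo
        d1 = (hi >= p1) + (hi < p1 and total > p1 + p2)
        d2 = (lo >= p2) - (lo >= p2 and total < p1 + p2)
        decomposed += p1 * d1 + p2 * d2
    return direct / draws, naive / draws, decomposed / draws


direct, naive, decomposed = two_good_profits(0.6, 0.3)
print(f"direct={direct:.4f}  naive={naive:.4f}  decomposed={decomposed:.4f}")
```

In this sketch the direct and decomposed figures coincide, while the naive figure differs from them whenever the second unit is priced below the first, which is when both adjustment regions have positive probability.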

It has been noted in a few papers that an increase in n with independently distributed types has the effect of allowing a firm to capture most of the consumer surplus.40 The idea is simple: selling a single aggregate bundle (again assuming marginal cost is zero) can extract most of the consumer’s rents, because as n becomes large the per-unit value of this bundle converges to the sample mean. Using the argument in Armstrong (1999b), let $s_i(\theta_i) \equiv \theta_i q_i^*(\theta_i) - C(q_i^*(\theta_i))$ represent the social surplus generated by a consumer of type θ who consumes the full-information efficient allocation, $q_i^*(\theta_i)$. Suppose that the distribution of $s_i(\theta_i)$ [derived from $F(\theta_i)$] has mean µ and standard deviation σ. Then, a firm that offers cost-plus-fee pricing, $P(q) = (1 - \varepsilon)\mu + C(q)$, where $\varepsilon = 2^{1/3} (\sigma/\mu)^{2/3}$, will obtain expected profits that converge to the full-information profit level as n approaches infinity.41 Armstrong demonstrates that this result easily extends to a setting with a particular form of positive correlation:

$$u = \theta_0 \sum_{i=1}^{n} \theta_i q_i - P.$$

Now, $\theta_0$ is a multiplicative shock, common to all n products, but independently distributed across consumers. Armstrong shows that as n increases, the firm’s profit approaches that of a monopolist with uncertainty only over the common component.

40 Schmalensee (1984), Armstrong (1999b), and Bakos and Brynjolfsson (1996).

9.2. General Results on Product Bundling
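Before turning to the general model, the convergence claim of Section 9.1.2 can be illustrated with a small simulation. The sketch below rests on assumptions that are not in the text: valuations are i.i.d. uniform on [0, 1], cost is zero, the firm sells only the full bundle of n goods, and µ and σ are read as the mean and standard deviation of the aggregate surplus (n/2 and the square root of n/12 here). The function name `bundle_profit_ratio` is hypothetical.

```python
import random


def bundle_profit_ratio(n, draws=2000, seed=0):
    """Return (eps, profit / full-information profit) for a pure n-good bundle."""
    rng = random.Random(seed)
    mu = n * 0.5                      # mean of aggregate surplus, U[0,1] valuations
    sigma = (n / 12.0) ** 0.5         # standard deviation of aggregate surplus
    eps = 2 ** (1 / 3) * (sigma / mu) ** (2 / 3)
    price = (1 - eps) * mu            # zero cost, so the fee is the whole price
    # Count consumers whose total valuation covers the bundle price.
    sales = sum(
        sum(rng.random() for _ in range(n)) >= price
        for _ in range(draws)
    )
    return eps, price * sales / draws / mu


for n in (10, 100, 1000):
    eps, ratio = bundle_profit_ratio(n)
    print(f"n={n:5d}  eps={eps:.3f}  profit share={ratio:.3f}")
```

Under these assumptions ε shrinks at rate $n^{-1/3}$, and the share of full-information profit captured by the single bundle price rises toward one as n grows.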

In this section [based on Rochet and Choné (1998)], we generalize the bundling model presented to allow for multiple-unit demands. We come back to the general framework of nonlinear pricing by a multiproduct monopolist already studied by Wilson (1993a, 1993b) and Armstrong (1996) and presented in our Section 3.2. However, we do not assume the particular homogeneity properties of costs and type distributions that have allowed these authors to find explicit solutions using the separability property. In other words, we consider the most general multidimensional screening model in which binding IC constraints are unknown a priori. For simplicity, we assume linear-quadratic preferences:

$$u = \sum_{i=1}^{n} \left( \theta_i q_i - \tfrac{1}{2} q_i^2 \right) - P.$$

As before, production costs are assumed to be constant and normalized to zero, but contrary to the simple bundling model presented previously, demands for each good are not restricted to be 0 or 1. Types θ are distributed on some convex domain $\Theta$, in accord with a continuous and positive density f(θ). Building on our previous discussions, we want to characterize the optimal pricing policy of a monopolist, using the parametric-utility approach. The problem is thus to find the function $u^*$ that maximizes expected profit

$$E[\pi] = \int_{\Theta} \{ S(\theta, \nabla u(\theta)) - u(\theta) \}\, f(\theta)\, d\theta,$$

over all convex, nonnegative functions u. When the second-order condition is not binding (i.e., when $u^*$ is strictly convex), we already saw that $u^*$ is characterized by two elements:

1. a partition of the boundary $\partial\Theta$ of $\Theta$ into two subsets:
   • $\partial_0\Theta$, where $u^* = 0$ (binding participation constraint), and
   • $\partial_1\Theta$, where $\frac{\partial S}{\partial q}(\theta, \nabla u(\theta))$ is orthogonal to the boundary of $\Theta$ (no distortion along the boundary);
2. a set of paths γ connecting $\partial_0\Theta$ to $\partial_1\Theta$, along which $u^*$ is computed by integrating $q^*(\theta) = \nabla u^*(\theta)$.

41 More specifically, as shown in Armstrong (1999b), let $\pi^*$ be the full-information expected profit level and let $\tilde\pi$ be the expected profit from the cost-plus-fee price schedule. Then, $\tilde\pi/\pi^*$ converges to 1 at speed $n^{-1/3}$.

As proved by Armstrong (1996), the nonparticipation region $\Omega_0$ (where $u^* = 0$) typically has a nonempty interior, and $u^*$ can be computed numerically by solving a free-boundary problem; that is, finding the curve $\Gamma_0$ that partitions $\Theta$ into two regions: $\Omega_0$ (where $u^* = 0$), and $\Omega_1$, where $u^* > 0$, and, in the latter region, $u^*$ satisfies the Euler equation:

$$\operatorname{div}\left[ \frac{\partial S}{\partial q}(\theta, \nabla u(\theta))\, f(\theta) \right] = -f(\theta),$$

together with the boundary condition stated previously. The problem is that, for most distributions of types [for details, see Rochet and Choné (1998)], the solution of this free-boundary problem violates the second-order conditions. The economic intuition behind this result is the presence of a strong conflict between the desire of the monopolist to limit the nonparticipation region (by pushing $\Gamma_0$ toward the lower boundary of $\Theta$) and “transverse” incentive compatibility constraints (that force $\Omega_0$ and thus $\Gamma_0$ to be convex). By trading off these two effects, the typical shape of $\Gamma_0$ will be linear, which means that, in the region immediately above it, $u^*$ will depend only on a linear combination of $\theta_1$ and $\theta_2$. This is a robust property of multidimensional screening problems: even with log concave distributions of types, bunching cannot be ruled out, and typically occurs in the “southwest” part of $\Theta$ (i.e., for consumers with low valuations in all dimensions). From an economic viewpoint, it means that “pure bundling” (i.e., an inefficient limitation of the choice set of consumers with low valuations) is a general pattern. Rochet and Choné (1998) consider, for example, the case where θ is exponentially distributed on $[a, +\infty)^2$, with $f(\theta) = \exp(2a - \theta_1 - \theta_2)$ and $a > 1$.
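Under this density, each good taken in isolation is a textbook one-dimensional screening problem with linear-quadratic preferences. As a sketch, the single-good optimum can be worked out with the standard virtual-surplus argument, which is assumed here rather than taken from the text; the per-good symbols $f_i$, $F_i$, $u_i$, and $T_i$ are introduced only for this illustration.

```latex
% Single-good benchmark; assumes the standard one-dimensional virtual-surplus argument.
\begin{align*}
f_i(\theta_i) &= e^{a-\theta_i}, \qquad \frac{1-F_i(\theta_i)}{f_i(\theta_i)} = 1
  && \text{(exponential on } [a,\infty)\text{)} \\
q_i(\theta_i) &= \arg\max_{q}\Big\{(\theta_i - 1)\,q - \tfrac12 q^2\Big\} = \theta_i - 1
  && \text{(pointwise maximization of virtual surplus)} \\
u_i(\theta_i) &= \int_a^{\theta_i} q_i(s)\,ds = \tfrac12\big[(\theta_i-1)^2 - (a-1)^2\big]
  && \text{(envelope condition with } u_i(a)=0\text{)} \\
T_i &= \theta_i q_i - \tfrac12 q_i^2 - u_i(\theta_i) = q_i + \tfrac12 (a-1)^2
  && \text{(payment as a function of } q_i\text{)}
\end{align*}
```

Summing the two single-good tariffs gives a fixed fee of $(a-1)^2$ plus a unit marginal price on each good, and $a > 1$ guarantees that every type consumes a positive quantity of each good.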
Because $\theta_1$ and $\theta_2$ are independently distributed and demands are separable, a natural candidate for the optimal price schedule is the best separable price, which can easily be computed:

$$P(q_1, q_2) = q_1 + q_2 + (a - 1)^2,$$

giving rise to demands $q_i(\theta) = \theta_i - 1$, i = 1, 2. However, this cannot be the solution, because the nonparticipation region would be empty. In fact, the true solution has the characteristic pattern of multidimensional screening models, whereby $\Theta$ is partitioned into three regions:

• the nonparticipation region $\Omega_0$, delimited by a first boundary $\Gamma_0$ (of equation $\theta_1 + \theta_2 = \tau_0$), in which $u^* = 0$,
• the pure bundling region $\Omega_1$, delimited by a second boundary $\Gamma_1$ (of equation $\theta_1 + \theta_2 = \tau_1$), in which consumers are forced to buy a bundle with identical quantities of the two goods (thus $u^*$ is not strictly convex, because it depends only on $\theta_1 + \theta_2$), and finally
• the fully separating region, where consumers have a complete choice and $u^*$ can only be determined numerically.

Rochet and Choné (1998) design a specific technique, the sweeping procedure, which generalizes the ironing procedure of Mussa and Rosen (1978), for dealing with this new form of bunching that is specific to multidimensional screening problems.

10. CONCLUDING REMARKS

In this survey, we have emphasized one general theme – that in models with multidimensional heterogeneity over preferences, the ordering of the binding incentive constraints is endogenous. Because the resulting endogenous ordering also is a source of our economic predictions, the difficulty in finding general, tractable mathematical models is particularly significant. Notwithstanding this pessimistic appraisal, we have also emphasized in these pages that several solutions to this problem of endogenous ordering exist, all of which shed light on this issue. The simple discrete model we presented, together with a sketch of the algorithm for determining the endogenous ordering and a solution for the simple two-type case, is helpful in illustrating the economics of multidimensional screening contracts – one of our primary goals in this paper. In addition, we present a variety of classes of restricted models that make the modeling tractable although still allowing sufficient theoretical degrees of freedom for interesting economics to come out of the analysis. We are particularly heartened by the recent results applied to auctions, bundling, and other rich economic settings, especially competitive environments.

ACKNOWLEDGMENTS

We are grateful to Mark Armstrong and Patrick Legros for helpful comments. The first author thanks CNRS for financial support. The second author thanks the National Science Foundation for financial support through the PFF Program. Any errors are our own.

References

Anderson, S., A. de Palma, and J.-F. Thisse (1992), Discrete Choice Theory of Product Differentiation. Cambridge, MA: MIT Press. Araujo, A. and H. Moreira (2000), “Adverse Selection Problems without the Spence-Mirrlees Condition,” mimeo, IMPA, Rio de Janeiro, Brazil. Armstrong, M. (1996), “Multiproduct Nonlinear Pricing,” Econometrica, 64(1), 51–75. Armstrong, M. (1999a), “Optimal Regulation with Unknown Demand and Cost Functions,” Journal of Economic Theory, 84(2), 196–215. Armstrong, M. (1999b), “Price Discrimination by a Many-Product Firm,” Review of Economic Studies, 66(1), 151–168. Armstrong, M. and J.-C. Rochet (1999), “Multi-dimensional Screening: A User’s Guide,” European Economic Review, 43(4–6), 959–979.


Armstrong, M. and J. Vickers (1999), “Competitive Price Discrimination,” mimeo. Armstrong, M. and J. Vickers (2000), “Multiproduct Price Regulation under Asymmetric Information,” Journal of Industrial Economics, 48(2), 137–159. Bakos, Y. and E. Brynjolfsson (1996), “Bundling Information Goods: Pricing, Proﬁts and Efﬁciency,” Discussion Paper, MIT. Baron, D. P. and D. Besanko (1984), “Regulation and Information in a Continuing Relationship,” Information Economics and Policy, 1(3), 267–302. Baron, D. and R. Myerson (1982), “Regulating a Monopolist with Unknown Costs,” Econometrica, 50(4), 911–930. Berry, S., J. Levinsohn, and A. Pakes (1995), “Automobile Prices in Market Equilibrium,” Econometrica 63(4), 841–890. Biais, B., D. Martimort, and J.-C. Rochet (2000), “Competing Mechanisms in a Common Value Environment,” Econometrica, 68(4), 799–838. Biglaiser, G. and C. Mezzetti (1999), “Incentive Auctions and Information Revelation,” mimeo, University of North Carolina. Blackorby, C. and W. Schworm (1984), “The Structure of Economies with Aggregate Measures of Capital: A Complete Characterization,” Review of Economic Studies, 51, 633–650. Brown, S. and D. Sibley (1986), The Theory of Public Utility Pricing. New York: Cambridge University Press. Che, Y.-K. and I. Gale (1996a), “Expected Revenue of All-Pay Auctions and First-Price Sealed-Bid Auctions with Budget Constraints,” Economics Letters, 50(3), 373–379. Che, Y.-K. and I. Gale (1996b), “Financial Constraints in Auctions: Effects and Antidotes,” in Advances in Applied Microeconomics, Volume 6: Auctions, (ed. by M. R. Baye), Greenwich, CT: JAI Press, 97–120. Che, Y.-K. and I. Gale (1998), “Standard Auctions with Financially Constrained Bidders,” Review of Economic Studies, 65(1), 1–21. Che, Y.-K. and I. Gale (1999), “Mechanism Design with a Liquidity Constrained Buyer: The 2 × 2 Case,” European Economic Review, 43(4–6), 947–957. Che, Y.-K. and I. 
Gale (2000), “The Optimal Mechanism for Selling to a Budget-Constrained Buyer,” Journal of Economic Theory, 92(2), 198–233. Clay, K., D. Sibley, and P. Srinagesh (1992), “Ex Post vs. Ex Ante Pricing: Optional Calling Plans and Tapered Tariffs,” Journal of Regulatory Economics, 4(2), 115–138. Courty, P. and H. Li (2000), “Sequential Screening,” Review of Economic Studies, 67(4), 697–718. Dana, J. (1993), “The Organization and Scope of Agents: Regulating Multiproduct Industries,” Journal of Economic Theory, 59(2), 288–310. Dessein, W. (1999), “Network Competition in Nonlinear Pricing,” mimeo, ECARE, Brussels. Fudenberg, D. and J. Tirole (1991), Game Theory. Cambridge, MA: MIT Press, Chapter 7. Gal, S., M. Landsberger and A. Nemirouski (1999), “Costly Bids, Rebates and a Competitive Environment,” mimeo, University of Haifa. Ginsburgh, V. and S. Weber (1996), “Product Lines and Price Discrimination in the European Car Market,” mimeo. Goldman, M. B., H. Leland, and D. Sibley (1984), “Optimal Nonuniform Prices,” Review of Economic Studies, 51, 305–319. Green, J. and J.-J. Laffont (1977), “Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public Goods,” Econometrica, 45, 427–438.


Guesnerie, R. and J.-J. Laffont (1984), “A Complete Solution to a Class of Principal-Agent Problems with an Application to the Control of a Self-Managed Firm,” Journal of Public Economics, 25, 329–369. Hotelling, H. (1929), “Stability in Competition,” Economic Journal, 39, 41–57. Ivaldi, M. and D. Martimort (1994), “Competition under Nonlinear Pricing,” Annales d’Economie et de Statistique, 34, 71–114. Jehiel, P., B. Moldovanu, and E. Stacchetti (1999), “Multidimensional Mechanism Design for Auctions with Externalities,” Journal of Economic Theory, 85(2), 258–293. Jullien, B. (2000), “Participation Constraints in Adverse Selection Models,” Journal of Economic Theory, 93(1), 1–47. Laffont, J.-J., E. Maskin, and J.-C. Rochet (1987), “Optimal Nonlinear Pricing with Two-Dimensional Characteristics,” in Information, Incentives, and Economic Mechanisms, (ed. by T. Groves, R. Radner, and S. Reiter), Minneapolis, MN: University of Minnesota Press. Laffont, J.-J., P. Rey, and J. Tirole (1998a), “Network Competition: Overview and Nondiscriminatory Pricing,” Rand Journal of Economics, 29(1), 1–37. Laffont, J.-J., P. Rey, and J. Tirole (1998b), “Network Competition: Price Discrimination,” Rand Journal of Economics, 29(1), 38–56. Laffont, J.-J. and J.-C. Rochet (1998), “Regulation of a Risk-Averse Firm,” Games and Economic Behavior, 25, 149–173. Laffont, J.-J. and J. Tirole (1986), “Using Cost Observation to Regulate Firms,” Journal of Political Economy, 94, 614–641. Laffont, J.-J. and J. Tirole (1993), A Theory of Incentives in Regulation and Procurement. Cambridge, MA: MIT Press. Leslie, P. (1999), “Price Discrimination in Broadway Theatre,” mimeo, UCLA. Lewis, T. and D. Sappington (1988), “Regulating a Monopolist with Unknown Demand and Cost Functions,” Rand Journal of Economics, 19(3), 438–457. Lewis, T. and D. Sappington (1989a), “Inflexible Rules in Incentive Problems,” American Economic Review, 79(1), 69–84. Lewis, T. and D.
Sappington (1989b), “Countervailing Incentives in Agency Problems,” Journal of Economic Theory, 49, 294–313. Maggi, G. and A. Rodriguez-Clare (1995), “On Countervailing Incentives,” Journal of Economic Theory, 66(1), 238–263. Martimort, D. (1992), “Multi-principaux avec Anti-selection,” Annales d’Economie et de Statistique, 28, 1–37. Martimort, D. (1996), “Exclusive Dealing, Common Agency, and Multiprincipals Incentive Theory,” Rand Journal of Economics, 27(1), 1–31. Mas-Colell, A., M. Whinston and J. Green (1995), Microeconomic Theory. New York: Oxford University Press. Maskin, E. and J. Riley (1984), “Monopoly with Incomplete Information,” Rand Journal of Economics, 15, 171–196. McAfee, R. P. and J. McMillan (1987), “Competition for Agency Contracts,” Rand Journal of Economics, 18(2), 296–307. McAfee, R. P. and J. McMillan (1988), “Multidimensional Incentive Compatibility and Mechanism Design,” Journal of Economic Theory, 46(2), 335–354. Miravete, E. (1996), “Screening Consumers Through Alternative Pricing Mechanisms,” Journal of Regulatory Economics, 9(2), 111–132. Miravete, E. (1997), “Estimating Demand for Local Telephone Service with Asymmetric Information and Optimal Calling Plans,” Working Paper, INSEAD.


Mirrlees, J. (1971), “An Exploration in the Theory of Optimum Income Taxation,” Review of Economic Studies, 38(114), 175–208. Mirrlees, J. (1976), “Optimal Tax Theory: A Synthesis,” Journal of Public Economics, 6(4), 327–358. Mussa, M. and S. Rosen (1978), “Monopoly and Product Quality,” Journal of Economic Theory, 18, 301–317. Myerson, R. (1981), “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73. Myerson, R. (1991), Game Theory. Cambridge, MA: Harvard University Press. Oren, S., S. Smith, and R. Wilson (1983), “Competitive Nonlinear Tariffs,” Journal of Economic Theory, 29(1), 49–71. Prendergast, C. and L. Stole (1996), “Impetuous Youngsters and Jaded Oldtimers: Acquiring a Reputation for Learning,” Journal of Political Economy, 104(6), 1105–1134. Rochet, J.-C. (1984), “Monopoly Regulation with Two Dimensional Uncertainty,” mimeo, Université Paris 9. Rochet, J.-C. (1987), “A Necessary and Sufficient Condition for Rationalizability in a Quasi-linear Context,” Journal of Mathematical Economics, 16(2), 191–200. Rochet, J.-C. and P. Choné (1998), “Ironing, Sweeping, and Multidimensional Screening,” Econometrica, 66(4), 783–826. Rochet, J.-C. and L. Stole (1997), “Competitive Nonlinear Pricing,” mimeo. Rochet, J.-C. and L. Stole (2000), “Nonlinear Pricing with Random Participation,” Review of Economic Studies 69(1), 277–311. Salanié, B. (1990), “Selection Adverse et Aversion pour le Risque,” Annales d’Economie et de Statistique, 18, 131–149. Schmalensee, R. (1984), “Gaussian Demand and Commodity Bundling,” Journal of Business, 57(1), Part 2, S211–S230. Schmidt-Mohr, U. and M. Villas-Boas (1999), “Oligopoly with Asymmetric Information: Differentiation in Credit Markets,” Rand Journal of Economics, 30(3), 375–396. Sibley, D. and P. Srinagesh (1997), “Multiproduct Nonlinear Pricing with Multiple Taste Characteristics,” Rand Journal of Economics, 28(4), 684–707. Spence, M.
(1980), “Multi-Product Quantity Dependent Prices and Proﬁtability Constraints,” Review of Economic Studies, 47, 821–841. Spulber, D. (1989), “Product Variety and Competitive Discounts,” Journal of Economic Theory, 48, 510–525. Stole, L. (1991), “Mechanism Design and Common Agency,” mimeo, MIT. Stole, L. (1995), “Nonlinear Pricing and Oligopoly,” Journal of Economics and Management Strategy, 4(4), 529–562. Stole, L. (1997), “Lectures on the Theory of Contracts”, mimeo, University of Chicago. Wilson, R. (1993a), Nonlinear Pricing. Oxford, UK: Oxford University Press. Wilson, R. (1993b), “Design of Efﬁcient Trading Procedures,” in The Double Auction Market: Institutions, Theories and Evidence, Chapter 5, Santa Fe Institute Studies, Volume 14, (ed. by D. Friedman and J. Rust), Reading, MA: Addison Wesley, 125–152. Wilson, R. (1996), “Nonlinear Pricing and Mechanism Design,” in Handbook of Computational Economics. Volume 1. Handbooks in Economics, Volume 13, (ed. by H. Amman, D. Kendricks, and J. Rust), New York: Elsevier Science, 253–293. Zheng, C. G. (2000), “Optimal Auction in a Multidimensional World,” Discussion Paper, Northwestern University.

A Discussion of the Papers by Pierre-André Chiappori and Bernard Salanié and by Jean-Charles Rochet and Lars A. Stole

Patrick Legros

Each of these surveys is a “must read”: anyone who wants to analyze multidimensional screening models should start by reading Rochet and Stole (RS), and anyone who wants to do empirical work on contracts should begin with Chiappori and Salanié (CS). I will start this discussion (Section 1) with what I perceive to be the main message of each survey. Although the two papers are quite different in nature and in focus, they both remind us why we should be interested in contracts and organizations: when markets are incomplete or imperfect, contracts and organizations are the relevant allocation devices and are not neutral from an “efficiency” point of view. Therefore, if we want to understand the effects of economic policies, macroeconomic shocks, or technological shocks on the performance of firms or the economy, we are bound first to answer two questions.

1. What are the effects of contractual and organizational choices on behavior and economic performance?
2. What are the determinants of contractual choices?

RS and CS show how answers to these questions can be enhanced by theoretical and empirical work in contract theory. Reading these surveys and the literature, it seems fair to acknowledge two tendencies: first, that empirical work has been an active consumer of theory, but that theory has been a more timid consumer of empirical work and, second, that we seem to have many answers to (1), but fewer answers to (2). I will therefore develop two themes in my discussion: the necessity of a constructive dialogue between theory and empirical work, and the necessity to provide theoretical models that will more accurately capture market forces. Although the first theme is clearly present in RS and CS, the second theme is less present in these surveys, but is a logical consequence of the agendas described in RS and CS. Section 2 develops the two themes, and Section 3 illustrates these themes with some examples taken from CS.

1. ROCHET-STOLE AND CHIAPPORI-SALANIÉ

1.1. RS: Multidimensional Screening

The difﬁculty in multidimensional screening models is the lack of a natural order on types. The problem is not so much one of feasibility, because RS show an algorithm by which the solution can be computed. The problem is rather the possibility to obtain robust qualitative results (similar, e.g., to the “no inefﬁciency at the top” result in the one dimension). RS provide a useful classiﬁcation of the multidimensional models into three categories. They show that, for two of them (aggregation and separability), such robust results can be obtained. The properties of the solution in the aggregation case (i.e., when the multidimensionality can be reduced to one dimension by using an aggregator) are (obviously) related to the distribution of the aggregator. RS footnote 21 nicely illustrates this point. More important differences arise in the separability case (when transversality conditions can be ignored): bundling at the bottom and the possibility of efﬁciency at the top and at the bottom when one looks at one dimension only. RS convincingly show that a rich new set of economic problems can be studied by going from one to two (or more) dimensions. Budgetary constraints, sequential screening, and multiple product purchase are naturally modeled as multidimensional screening problems that can be analyzed at times as simply as in the one-dimensional case. Because in practice not all dimensions can be quantiﬁed or instrumented, a challenge faced by theory is to provide results like those in Figure 5.6 of RS (i.e., to establish a relationship between the endogenous variable and the quantiﬁable dimension). Figure 5.6 summarizes the relationship between the noise in the distribution of outside options and the quantity schedule contingent on the ﬁrst dimension in a parametric example. Because outside options are not observable, the relevant exogenous variable in a regression would indeed be the ﬁrst dimension only (the residual would then be the noise in outside options). 
We observe that all solutions are increasing in the ﬁrst dimension, and that the schedule becomes “ﬂatter” as the noise in outside options becomes larger. Note also that there is a U-shaped relationship between the noise and the size of the bundling region at the bottom. The comparative static results in Figure 5.6 are therefore quite useful from a theoretical perspective, since they tell us how noise in outside options yields different quality-price schedules than the ﬁxed (and uniform) outside option case. However, it is not clear how easy it will be to identify these results. For instance, a change in the ﬂatness of the optimal schedule could be obtained


in the one-dimensional case by changing the distribution function (since the flatness is related to the hazard rate). It is not clear at this point how one can empirically distinguish a multidimensional model in which the second dimension is a (random) outside option from a one-dimensional model. There is a sense, however, in which this difficulty is also a strength, because the interpretation of the residual as unobserved outside options might be more satisfying than the interpretation in terms of measurement error.

1.2. CS: Capturing (Endogenous) Heterogeneity

CS’s survey covers a lot of ground. They identify early on the main challenge that empirical work must face: controlling for heterogeneity and endogeneity of the contractual relationships. If agents self-select into firms or contracting relationships, the outcome of the relation, as well as the contract itself, are explained by the characteristics of the agents, whereas the modeler would be tempted to see the contract as the endogenous variable and the characteristics of the agents in the relationship as the exogenous variables. Their warning should also resonate with theorists. CS show that it is possible to create or find good data sets to test a variety of important questions: incentive effects of compensation schemes, relative importance of adverse selection and moral hazard to explain behavior in markets, role of reputation, and effects of contractual instruments (e.g., insurance deductible, technology that makes contracts more complete). At the same time CS make clear the difficulties in meeting their challenge: controlling for the selection effect, distinguishing between the available theoretical models, and controlling for quasi-rents. The task of identifying the incentive effect is already daunting. Trying to identify whether the form of contracting is “optimal,” as they set out to do in their Section 3, is certainly even more daunting. For instance, principal-agent theory simply tells us that, for a given outside option of the agent, there exists a second-best optimal contract that maximizes the level of utility of the principal. Changing the outside option might – and often will – also change the form of the second-best contract. Hence, unless there is a good way to proxy for the outside option, or for the market forces that affect this outside option, it is not clear how one can answer the question, “Are contracts optimal?” This problem is even more severe when other organizational instruments such as monitoring, auditing, size of the hierarchy, etc. define the form of the contract.

2. TOWARD A CONSTRUCTIVE DIALOGUE

2.1. More Theory or More Facts? Necessity of a Dialogue

Research in contract theory has proceeded like most other scientific endeavors: one step at a time. It has isolated sources of imperfections and has analyzed the consequences of these imperfections for contracts, prices, or organizations. This
literature has generated a large “toolbox” consisting of a host of models (e.g., adverse selection, moral hazard, multitasks, teams, principal-agent, principal-agents, principals-agent, principals-agents, additive noise, multiplicative noise, complete contracting, incomplete contracting, dynamic contracting, career concerns). Do we now have an embarrassment of riches? To paraphrase the title of a recent paper,¹ is contract theory plagued by too many theories and too few facts? I will argue in fact that we need more facts and more theory. A dialogue between theory and empirical work is necessary to identify the relevant omitted variables in theoretical and empirical research. Omitted variables are usually associated with econometric analysis. Theory is useful because it helps the econometrician pinpoint the relevant omitted variables, and how these variables affect the observed outcome. Here, the “embarrassment of riches” becomes the solution. Less appreciated perhaps is the fact that theoretical work also faces (by nature?) a problem of omitted variables. An analysis based on a moral hazard model will fail if the essence of the imperfection is adverse selection. If both moral hazard and adverse selection are important, a new model combining the two effects might be necessary if one expects new effects to emerge when the two imperfections are taken simultaneously into account. Here, empirical work helps by providing a “sanity check” on the relevance of a model in a given situation and by suggesting new avenues for research. Now, it is easy to make a model “more general”: generalize some assumptions. Ignoring issues of tractability, such generalizations seem to be useful for the dialogue with empirical work if they yield robust theoretical results that are qualitatively different from the simpler case and if these differences can be identified in empirical work. CS and RS are excellent illustrations of the benefits of such a dialogue between theory and empirical work.
The main focus of RS is on ﬁnding robust theoretical results and the main focus of CS is on identifying theoretical results in the data. However, and this is another theme of this discussion, while existing theoretical and empirical work can generate a dialogue to answer (1) – do incentives matter and how? – the theoretical literature uses a modeling paradigm that will eventually limit the possibility to pursue the dialogue successfully and answer (2) – what determines contracts? This modeling paradigm is the use of outside options for capturing market effects (i.e., forces external to the contract or the organization). Outside options capture the underlying market forces at play in the economy. The question is then which outside options correctly capture market forces. As I argue in the next section, there is a need for theoretical constructs that “bypass” the outside options and that capture directly the relationship between observable data and market forces. This would facilitate, for instance, the identiﬁcation of the effects in the random outside options model of RS, or the completion of the agenda set forth in Section 3 of CS.

¹ Baker and Holmström (1995).

2.2. Omitted Variables

Contracts are shaped by a variety of forces or variables. Contract theory has mainly focused on the internal forces that shape an organization or a contract but has been relatively silent on the external forces that shape an organization.² Examples of “internal” variables are:

• Agents’ characteristics: risk aversion, talent, productivity, ...
• Contractual instruments: monitoring, auditing instruments, delegation rights, screening devices, compensation schemes, ...

whereas examples of “external” variables are:

• Policy variables: competition policy, regulation, ...
• Market variables: distribution of characteristics, product market competition, process of matching, interest rate, market imperfections, ...

For instance, we understand quite well that monitoring will reduce the cost of inducing productive effort and that more monitoring will be associated with flatter compensation schemes. We understand less well why seemingly identical economies have firms with different monitoring intensities. We understand that an entrepreneur with more liquidity will need to forfeit less control to finance an investment. We understand less well the effects of economywide changes in liquidity on control structures. Interestingly, CS emphasize as one of the main sources of bias in empirical work on contracts the endogeneity of the match between contracting parties (i.e., an illustration of how market forces influence the characteristics of contracting parties, a question on which theory is most silent).

Now, market forces are already taken into account in most models in contract theory, albeit in a shortcut sort of way. The “optimal contract” in a principal-agent model is the solution to a constrained Pareto problem: maximize the welfare of the principal subject to a set of incentive constraints and subject to giving the agent his outside option (the participation constraint).
The outside option of the agent captures his “market value.” By changing the outside option, one changes the nature of the optimal contract. Here is already a sense in which market forces matter for organizations. There is also a sense in which it is futile to test for the efficiency of contracting by using this type of model: by changing the outside option, we can generate as optimal contracts a large set of contracts. The fact that we do not observe directly outside options is another impediment. I will come back to this point in the examples at the end of this discussion.³

² Nevertheless, a small, and growing, theoretical literature exists [e.g., Fershtman and Judd (1987), Legros and Newman (1996), Schmidt (1997), and Aghion, Dewatripont, and Rey (1999)].

³ What about many agents or many types of agents? Most of the literature has been developed under the assumption of a unique outside option. More generally, the outside option of a type can vary with the type (as in the countervailing incentive literature) or can be a random variable (as in RS). In each case, the relationship between types and outside options is quite important for the qualitative properties of the optimal contract. Because we do not observe directly the distribution of outside options, it is not clear how the new effects from these generalizations can be identified.
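For readers who prefer a formal statement, the constrained Pareto problem described in this section can be written schematically as below. This is only a sketch of the textbook moral hazard program, not a formulation taken from CS or RS: the output $x$, the wage schedule $w(\cdot)$, the effort $e$ with cost $c(e)$, the agent's utility $u$, and the outside option $\bar{u}$ are notation assumed here for illustration.

```latex
% Schematic principal-agent program (illustrative notation, assumed here).
\begin{align*}
\max_{w(\cdot),\, e} \quad & \mathbb{E}\left[\, x - w(x) \mid e \,\right] \\
\text{subject to} \quad
 & e \in \arg\max_{e'} \; \mathbb{E}\left[\, u(w(x)) \mid e' \,\right] - c(e')
   && \text{(incentive constraint)} \\
 & \mathbb{E}\left[\, u(w(x)) \mid e \,\right] - c(e) \;\ge\; \bar{u}
   && \text{(participation constraint)}
\end{align*}
```

Market forces enter this program only through the single number $\bar{u}$ on the right-hand side of the participation constraint, which is why an unobserved outside option makes the optimality of an observed contract hard to test.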

Therefore, the outside option is a convenient theoretical shortcut, but is not an instrument that can be directly used in empirical work for capturing market forces. What seems needed is a theoretical apparatus that will articulate how outside options are determined. Such a mechanism will then link directly some observable and hopefully quantifiable variables to contractual or organizational forms, bypassing the outside options. Some of the work cited in footnote 2 goes in this direction, but much remains to be done on this front.

3. REVISITING SOME EXAMPLES

Two of the examples in CS enable me to illustrate the benefits of a dialogue between theory and empirical work, and the need to instrument for external forces. The first example is about the role of risk in shaping contracts. The second example is about the form of compensation schemes between fixed wage and piece rate.

3.1. The Risk-Incentive Trade-off

Independently of the risk attitude of the agent, the creation of incentives requires variations in output-based compensation. The cost-minimizing schedule for a risk-neutral principal who wants to give a risk-averse agent his outside option is a perfectly flat compensation schedule. Because a fixed compensation is incompatible with incentives, some variation in compensation characterizes the second-best contract. If the risk inherent in production increases, two effects come into play: first, more insurance should be provided to the agent to meet his outside option (for a given effort level); second, because the marginal expected return from effort changes, the incentive compatible level of effort changes (for a given compensation scheme). How the two effects interact is ambiguous; what is not ambiguous is that risk in production will have an effect on contracting.

Some models⁴ predict a negative correlation between risk in production and variation in output-based compensation. A natural place to test this prediction is contracts for sharecropping: lands with more risky crops should be associated with sharecropping (tenant shares the risk with the owner), whereas lands with less risky crops should be associated with rental contracts (tenant faces all the risk). The empirical literature has shown that there is no such positive relationship between risk and sharecropping. CS cite the explanation of Ackerberg and Botticini (2002). Let us embed the basic sharecropping model in a two-sided matching model where one side, the workers, is differentiated by their risk attitude, and the other side, the crops, is differentiated by their riskiness. In a competitive equilibrium, more risk-averse agents will be assigned to less risky crops, whereas risk-neutral agents will be assigned to more risky crops. Risk-neutral agents are willing to bear all the risk (i.e., we should observe rental contracts for risky crops and sharecropping for less risky crops, which is consistent with stylized facts, but is the opposite of what a model with homogeneous workers would predict). Hence, theory omits both an “internal variable,” the heterogeneity in workers’ risk attitude, and an “external variable,” the competitive determination of the assignment of workers to crops. Here, “facts” force theory to identify relevant omitted variables.

However, this is not the end of the dialogue. Imagine that workers indeed have the same risk attitude and that crops have different riskiness. Can theory still make sense of “the facts”? If yes, what are the relevant omitted variables? We can follow here an early work of Rao (1971).⁵ Because the ability to contract on output is linked to its verifiability, riskier crops prevent the use of output contingent contracts, absent technologies that make output verifiable. Hence, a profit-maximizing land owner who can allocate resources between technologies that make input verifiable and technologies that make output verifiable will tend to favor output monitoring when crops are risky and to favor input monitoring when crops are less risky. Now, if there is input monitoring, it is easier to contract directly on the worker effort, and the contract should reflect first-best risk-sharing arrangements, whereas if there is output monitoring, incentives will be created by having the worker bear more risk. Here, again, we obtain a negative correlation between riskiness of crop and sharecropping, absent heterogeneity in risk attitudes. Theory therefore points out an omitted internal variable, the ability to monitor (or measure) input and output,⁶ and emphasizes the trade-off between rent extraction and incentives.

⁴ For example, the normal noise model with constant absolute risk aversion utility functions and linear-sharing rules.
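As a worked illustration of the class of models mentioned in footnote 4, the following sketch derives the slope of a linear sharing rule in a normal-noise, constant-absolute-risk-aversion setting. The quadratic effort cost, the unit productivity of effort, and the symbols $a$, $b$, $r$, $c$, and $\sigma^2$ are assumptions introduced here for illustration; they do not appear in the original discussion.

```latex
% Assumed setup (illustrative): output x = e + \varepsilon with \varepsilon \sim N(0, \sigma^2),
% linear wage w = a + b x, agent utility -\exp\{-r (w - c e^2 / 2)\}, risk-neutral principal.
\begin{align*}
\text{Agent's certainty equivalent:}\quad
  & a + b e - \tfrac{c}{2} e^{2} - \tfrac{r}{2} b^{2} \sigma^{2} \\
\text{Incentive compatibility:}\quad
  & e(b) = \frac{b}{c} \\
\text{Total certainty-equivalent surplus:}\quad
  & S(b) = \frac{b}{c} - \frac{b^{2}}{2c} - \frac{r}{2} b^{2} \sigma^{2} \\
\text{First-order condition:}\quad
  & \frac{1}{c} - \frac{b}{c} - r \sigma^{2} b = 0
  \;\Longrightarrow\;
  b^{*} = \frac{1}{1 + r c \sigma^{2}}
\end{align*}
```

Under these assumptions $b^{*}$ falls as $\sigma^{2}$ rises, which is the negative correlation between production risk and output-based pay that such models predict, and that the sharecropping evidence fails to confirm.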

3.2. From Fixed Wages to Piece Rates

3.2.1. Incentives Matter

CS cite the papers by Lazear (1999) and by Paarsch and Shearer (1999), who show how going from fixed wage to piece rates will generate (large) productivity gains. For those of us who are interested in incentive theory, this is good news indeed. In the case of Paarsch and Shearer, the firm uses both piece rate and wage contracts, whereas in the case of Lazear there was a change in management that coincided with a change of compensation scheme from fixed wage to piece rate. In the first case, the observed productivity reflects both the contractual terms and the land condition (piece rate is associated with good planting conditions).⁷ In the second case the observed productivity seems to reflect only the contractual change. Both studies are related to question (1). Paarsch and Shearer also partially answer question (2) because they see as a possible source of contractual choice the quality of the land. Lazear is more silent on (2). For both situations, outside options are not taken into account. This raises a natural question in the case of Lazear: Why did we observe the contractual change following the change of management? There are at least three possible answers.

• Is it because there was some type of organizational innovation? This is not likely given the prevalence of piece-rate contracts elsewhere.
• Is it because the previous management did not realize the productivity benefits of using piece rates? Possibly (and this could explain why the previous management was replaced). In this case, the contractual change generates sorting effects: high types are paid more and therefore will tend to “flow” toward the firm more than before.⁸
• Or is it because the change of management coincided with a change in outside options (or other market conditions) of the workers?⁹ In this case, sorting effects generate the contractual change. It is because high types have a relatively larger outside option than low types that the contract must be piece-rate in order to minimize the cost to the firm of giving each type of agent his outside option. Here the omitted variable is external.

⁵ See also Allen and Lueck (1995), Newman (1999), and Prendergast (2000). Leffler and Rucker (1991) show also that contractual choices are best explained by variables like enforcement costs or measurement costs rather than differences in risk attitudes. Interestingly, Ackerberg and Botticini (2000) conclude that there is no empirical support for the risk-sharing hypothesis, but that there is empirical support for the moral hazard and the imperfect capital market hypotheses.

⁶ A corollary of this story is that riskier crops should also be correlated with more delegation of authority to the worker. See Rao (1971) or Prendergast (2000).

⁷ Note the parallel with the previous explanation for correlation between sharecropping and riskiness of crop.

⁸ This is the observation of Lazear (1999).

⁹ Think of a situation in which the type of a worker affects his private cost of production, but not the level of production. It is easy to show that, if there is any cost to implementing menu contracts, we will observe for relatively equal outside options a unique wage-effort contract, although if the outside options are more unequal, we will observe a menu contract that can be implemented by a piece-rate contract.

3.2.2. Outside Options Matter

In the work cited by RS and by CS, moral hazard or asymmetric information was key to explaining the performance and nature of the contracts. As I have argued, external variables are also important. Here, I would like to propose a simple example showing how external variables could be sufficient to explain, for instance, the choice of piece-rate versus wage contracts. Consider a risk-neutral principal who has limited liability (this is the first “market variable”; there is a missing insurance market) and who contracts with a risk-averse worker. Assume that output is verifiable and that effort is contractable. To simplify, assume that there is a unique level of effort consistent with production and that there is an equal probability that a low output $R_0$ and a high output $R_1$ are realized. The principal will therefore choose a contingent contract $(w_0, w_1)$ that minimizes the expected wage bill subject to two constraints: (i) the limited liability constraint that wages cannot exceed available output and (ii) the participation constraint that the expected utility of the agent is at least his outside option $\bar{u}$ (this is our second “market variable”). The principal solves the problem

$$\min_{w_0,\, w_1} \; w_0 + w_1 \quad \text{subject to} \quad u(w_0) + u(w_1) \ge 2\bar{u}, \qquad w_i \in [0, R_i], \; i = 0, 1.$$

It is straightforward to show that the cost-minimizing schedule is of the form $w_1 = w_0 + b(R_1 - R_0)$ and that there exists a cutoff level $\bar{u}_0 = u(R_0)$ such that when the outside option is smaller than $\bar{u}_0$, the optimal $b$ is equal to zero (wage contract) and when the outside option is greater than $\bar{u}_0$, the optimal $b$ is positive and increases in the outside option (piece-rate contract). This is also simply illustrated in the Edgeworth box diagram (Figure 1) where the contract curve corresponding to the previous problem is the thick line. For low values of the outside option, the contract curve is the full insurance line. For high values of the outside option, the limited liability constraint of the principal binds and prevents full insurance. A change from wage contracting to piece-rate contracting is therefore directly due to an increase in outside options, absent any agency problem.

Figure 1. Piece-rate contracts are optimal when outside option is large.
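The cutoff result can be checked numerically. The sketch below is a minimal illustration, assuming a square-root utility function and made-up output levels; the function name `optimal_contract` and all parameter values are hypothetical and are not part of the original example.

```python
import math


def optimal_contract(u_bar, R0, R1, u=math.sqrt, u_inv=lambda v: v * v):
    """Cost-minimizing contract (w0, w1) and piece-rate slope b.

    Minimizes w0 + w1 subject to u(w0) + u(w1) >= 2 * u_bar and
    0 <= w_i <= R_i. The utility u (square root by default) is an
    assumption made for illustration; u_inv must be its inverse.
    """
    if u(R0) + u(R1) < 2 * u_bar:
        raise ValueError("outside option too large: no feasible contract")
    if u_bar <= u(R0):
        # Full insurance is feasible: a flat wage meets the outside option.
        w = u_inv(u_bar)
        return w, w, 0.0
    # Limited liability binds in the low state: pay all of R0 there and
    # top up the high-state wage until the participation constraint holds.
    w0 = R0
    w1 = u_inv(2 * u_bar - u(R0))
    b = (w1 - w0) / (R1 - R0)
    return w0, w1, b


if __name__ == "__main__":
    # Hypothetical outputs R0 = 4 and R1 = 16, so the cutoff is u(R0) = 2.
    for u_bar in (1.5, 2.5, 3.0):
        print(u_bar, optimal_contract(u_bar, 4.0, 16.0))
```

With these assumed numbers the slope is zero below the cutoff and rises with the outside option above it, so the switch from a wage contract to a piece rate is produced by the participation and limited liability constraints alone.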

References

Ackerberg, D. and M. Botticini (2002), “Endogenous Matching and the Empirical Determinants of Contract Form,” Journal of Political Economy, 110(3), 564–591.
Ackerberg, D. and M. Botticini (2000), “The Choice of Agrarian Contracts in Early Renaissance Tuscany: Risk Sharing, Moral Hazard, or Capital Market Imperfections?” Explorations in Economic History, 37(3), 241–257.
Aghion, P., M. Dewatripont, and P. Rey (1999), “Competition, Financial Discipline and Growth,” Review of Economic Studies, 66, 825–852.
Allen, D. W. and D. Lueck (1995), “Risk Preferences and the Economics of Contracts,” American Economic Review, 85, 447–451.
Baker, G. and B. Holmström (1995), “Internal Labor Markets: Too Many Theories, Too Few Facts,” American Economic Review, 85(2), 255–259.
Fershtman, C. and K. Judd (1987), “Equilibrium Incentives in Oligopoly,” American Economic Review, 77, 927–940.
Lazear, E. (1999), “Performance Pay and Productivity,” mimeo (revised version of NBER W5672, 1996, with the same title).
Leffler, K. B. and R. R. Rucker (1991), “Transaction Costs and the Organization of Production: Evidence from Timber Sales Contracts,” Journal of Political Economy, 99(5), 1060–1087.
Legros, P. and A. Newman (1996), “Wealth Effects, Distribution and the Theory of Organization,” Journal of Economic Theory, 70, 312–341.
Newman, A. (1999), “Risk-Bearing, Entrepreneurship and the Theory of Moral Hazard,” mimeo.
Prendergast, C. (2000), “Uncertainty and Incentives,” mimeo.
Rao, C. H. H. (1971), “Uncertainty, Entrepreneurship, and Sharecropping in India,” Journal of Political Economy, 79(3), 578–595.
Schmidt, K. (1997), “Managerial Incentives and Product Market Competition,” Review of Economic Studies, 64, 191–213.

CHAPTER 6

Theories of Fairness and Reciprocity: Evidence and Economic Applications

Ernst Fehr and Klaus M. Schmidt

1. INTRODUCTION

Most economic models are based on the self-interest hypothesis that assumes that all people are exclusively motivated by their material self-interest. Many influential economists, including Adam Smith (1759), Gary Becker (1974), Kenneth Arrow (1981), Paul Samuelson (1993), and Amartya Sen (1995), pointed out that people often do care for the well-being of others and that this may have important economic consequences. Yet, so far, these opinions have not had much of an impact on mainstream economics.

In recent years, experimental economists have gathered overwhelming evidence that systematically refutes the self-interest hypothesis. The evidence suggests that many people are strongly motivated by other-regarding preferences, and that concerns for fairness and reciprocity cannot be ignored in social interactions. Moreover, several theoretical papers have been written showing that the observed phenomena can be explained in a rigorous and tractable manner. Some of these models shed new light on problems that have puzzled economists for a long time (e.g., the persistence of noncompetitive wage premia, the incompleteness of contracts, the allocation of property rights, the conditions for successful collective action, and the optimal design of institutions). These theories in turn induced a new wave of experimental research offering additional exciting insights into the nature of preferences and into the relative performance of competing theories of fairness. The purpose of this paper is to review these recent developments, to point out open questions, and to suggest avenues for future research. Furthermore, we will argue that it is not only necessary, but also very promising for mainstream economics to take the presence of other-regarding preferences into account.

Why are economists so reluctant to give up the self-interest hypothesis? One reason is that this hypothesis has been quite successful in providing accurate predictions in some economic domains.
For example, models based on the self-interest hypothesis make very good predictions for competitive markets with standardized goods. This has been shown in many carefully conducted market experiments. However, a large amount of economic activity is taking
place outside of competitive markets – in markets with a small number of traders, in markets with informational frictions, in ﬁrms and organizations, and under incompletely speciﬁed and incompletely enforceable contracts. In these environments, models based on the self-interest assumption frequently make very misleading predictions. An important insight provided by some of the newly developed fairness models is that they show why, in competitive environments with standardized goods, the self-interest model is so successful and why, in other environments, it is refuted. In this way, the new models provide fresh and experimentally conﬁrmed insights into important phenomena (e.g., nonclearing markets or the widespread use of incomplete contracts). We consider it important to stress that the available experimental evidence also suggests that many subjects behave quite selﬁshly even when they are given a chance to affect other peoples’ well-being at a relatively small cost. However, there are also many people who are strongly motivated by fairness and reciprocity and who are willing to reward or punish other people at a considerable cost to themselves. One of the exciting insights of some of the newly developed theoretical models is that the interaction between fair and selﬁsh individuals is key to the understanding of the observed behavior in strategic settings. These models explain why, in some strategic settings, almost all people behave as if they are completely selﬁsh, whereas in others the same people will behave as if they are driven by fairness. A second reason for the reluctance to give up the self-interest hypothesis is methodological. There is a strong convention in economics of not explaining puzzling observations by changing assumptions on preferences. Changing preferences is said to open Pandora’s box, because everything can be explained by assuming the “right” preferences. 
We believe that this convention made sense in the past when economists did not have sophisticated tools to examine the nature of preferences in a scientifically rigorous way. However, because of the development of experimental techniques, this is no longer true. In fact, one purpose of this paper is to show that much progress and fascinating new insights into the nature of fairness preferences have been made in the past decade. Although there is still much to be done, this research clearly shows that it is possible to discriminate between theories based on different preference assumptions. Therefore, in view of the facts, the new theoretical developments, the importance of fairness concerns in many economic domains, and in view of the existence of rigorous experimental techniques that allow us to examine hitherto unsolvable problems in a scientific manner, we believe that it is time to recognize that a substantial fraction of the people is also motivated by fairness concerns. People do not differ only in their tastes for chocolate and bananas, but also along a more fundamental dimension. They differ with regard to how selfish or fair-minded they are, and this does have important economic consequences.

The rest of this paper is organized as follows. Section 2 provides many real-life examples indicating the relevance of fairness considerations and reviews the experimental evidence. It shows that the self-interest model is refuted in many important situations and that a substantial number of people seem to be strongly concerned about fairness and behave reciprocally. Section 3 surveys different theoretical approaches that try to explain the observed phenomena. In the meantime, there is also a large and growing literature on the evolutionary origins of reciprocity (see, e.g., Bowles and Gintis 1999; Gintis 2000; Sethi and Somananthan 2000, 2001). We do not discuss and review this literature in our paper. Section 4 discusses the wave of new experiments that have been conducted to discriminate between these theories. Section 5 explores the implications of fairness-driven behavior in various economic applications and offers some directions for future research. Section 6 concludes. In view of the length of our paper, it is also possible to read the paper selectively. For example, readers who are already familiar with the basic evidence and the different fairness theories may go directly to the new evidence in Section 4 and the economic applications in Section 5.

2. EMPIRICAL FOUNDATIONS OF FAIRNESS AND RECIPROCITY

2.1. Where Does Fairness Matter?

The notion of fairness is frequently invoked in families, at the workplace, and in people’s interactions with neighbors, friends, and even strangers. For instance, our spouse becomes sour if we do not bear a fair share of family responsibilities. Our children are extremely unhappy and envious if they receive less attention and gifts than their brothers and sisters. We do not like those among our colleagues who persistently escape doing their share of important, yet inconvenient, departmental activities.

Fairness considerations are, however, not restricted to our personal interactions with others. They shape the behavior of people in important economic domains. For example, employee theft and the general work morale of employees are affected by the perceived fairness of the firm’s policy (Greenberg, 1990; Bewley, 1999). The impact of fairness and equity norms may render direct wage cuts unprofitable (Kahneman, Knetsch, and Thaler, 1986; Agell and Lundborg, 1995). Firms may, therefore, be forced to cut wages in indirect ways (e.g., by outsourcing activities). Fairness concerns may thus influence decisions about the degree of vertical integration. They may also severely affect the hold-up problem as demonstrated by Ellingsen and Johannesson (2000). Debates about the appropriate income tax schedule are strongly affected by notions of merit and fairness (Seidl and Traub, 1999). The amount of tax evasion is likely to be affected by the perceived fairness of the tax system (Frey and Weck-Hanneman, 1984; Alm, Sanchez, and de Juan, 1995; Andreoni, Erard, and Feinstein, 1998). Public support for the regulation of private industries depends on the perceived fairness of the firms’ policies (Zajac, 1995). Compliance with contractual obligations, with organizational rules, and with the law in general is strongly shaped by the perceived fairness of the allocation of material benefits and by issues of procedural justice (Lind and Tyler, 1988; Fehr, Gächter, and Kirchsteiger, 1997). The functioning of incentive-compatible mechanisms has been shown to depend on fairness considerations (Andreoni and Varian, 1999). The solution of collective action problems (e.g., rules regulating the access to common pool resources) critically depends on the fairness of the allocation of the costs and benefits of the rules (Ostrom, 1990, 2000; Falk, Fehr, and Fischbacher, 2000c). The erosion of public support for the welfare state in the United States in the last two decades probably has much to do with deeply entrenched notions of reciprocal fairness (Bowles and Gintis, 2000). Many people cease to support public programs that help the poor if they have the impression that the poor do not attempt to bear their share of a society’s obligations.

Thus, real-world examples in which fairness concerns are likely to matter abound. Nevertheless, in the following, we concentrate on clean experimental studies, because in most real-life situations, it is impossible to unambiguously isolate the impact of fairness motives. A skeptic may always argue that the notion of fairness is used only for rhetorical purposes that disguise purely self-interested behavior in an equilibrium of a repeated game. Therefore, we rely on experimental evidence of human decision making. In these experiments, real subjects make decisions with real monetary consequences in carefully controlled laboratory settings. In particular, the experimenter can implement one-shot interactions between the subjects so that long-term self-interest can be ruled out as an explanation for what we observe. As we will see, in some experiments, the monetary stakes involved are quite high, amounting up to the income of three months’ work. In the experiments reviewed, subjects do not know each other’s identity, they interact anonymously, and, sometimes, even the experimenter cannot observe their individual choices.

2.2. Experimental Evidence

In hindsight, it is a bit ironic that experiments have proved to be critical for the discovery and the understanding of fairness-driven behavior, because for several decades, experimental economists were firmly convinced that fairness motives would not matter much. At best, fair behavior was viewed as a temporary deviation from the strong forces of self-interest. In the 1950s, Vernon Smith discovered that, under relatively weak conditions, experimental markets quickly converge to the competitive equilibrium.¹ Since then, the remarkable convergence properties of experimental markets have been confirmed by hundreds of experiments (see, e.g., Davis and Holt, 1993). For these experiments, the equilibrium is computed under the assumption that all players are exclusively self-interested. Therefore, the quick convergence to equilibrium has been interpreted as a confirmation of the self-interest hypothesis. We will see later in this paper that this conclusion was premature because, as the newly developed models of fairness show (see Section 3 and Section 5.1), convergence to standard competitive predictions can occur even if agents are very strongly concerned about fairness.

¹ Smith’s results were eventually published in 1962 in the Journal of Political Economy after time-consuming debates with the referees. It is also ironic that Smith’s initial aim was “to do a more credible job of rejecting competitive price theory” than Chamberlin (1948).

This strong commitment to the self-interest hypothesis slowly weakened in the 1980s when experimental economists started to study bilateral bargaining games and interactions in small groups in controlled laboratory settings (see, e.g., Roth, Malouf, and Murningham, 1981; Güth et al., 1982). One of the important experimental games that ultimately led many people to realize that the self-interest hypothesis is problematic was the so-called Ultimatum Game (UG) invented by Güth, Schmittberger, and Schwarze (1982). In addition, the Gift Exchange Game (GEG), the Trust Game (TG), the Dictator Game (DG), and Public Good Games (PGGs) played an important role in weakening the exclusive reliance on the self-interest hypothesis. All these games share the feature of simplicity. Because they are so simple, they are easy to understand for the experimental subjects, and this makes inferences about subjects’ motives more convincing.

In the UG, a pair of subjects has to agree on the division of a fixed sum of money. Person A, the Proposer, can make one proposal of how to divide the amount. Person B, the Responder, can accept or reject the proposed division. In the case of rejection, both receive nothing; in the case of acceptance, the proposal is implemented.
Under the standard assumptions that (i) both the Proposer and the Responder are rational and care only about how much money they get and (ii) the Proposer knows that the Responder is rational and selfish, the subgame perfect equilibrium prescribes a rather extreme outcome: The Responder accepts any positive amount of money and, hence, the Proposer gives the Responder the smallest money unit, ε, and keeps the rest. A robust result in the UG, across hundreds of experiments, is that proposals offering the Responder less than 20 percent of the available surplus are rejected with probability 0.4–0.6. In addition, the probability of rejection is decreasing in the size of the offer (see, e.g., Güth et al., 1982; Camerer and Thaler, 1995; Roth, 1995, and the references therein). Apparently, many Responders do not behave in a self-interest maximizing manner. In general, the motive indicated for the rejection of positive, yet “low,” offers is that subjects view them as unfair.

A further robust result is that many Proposers seem to anticipate that low offers will be rejected with a high probability. This is suggested, for example, by the comparison of the results of DGs and UGs. In a DG, the Responder’s option to reject is removed – the Responder must accept any proposal. Forsythe et al. (1994) were the first to compare the offers in UGs and DGs. They report that offers are substantially higher in the UG, which suggests that many Proposers do apply backward induction. This interpretation is also supported by the surprising observation of Roth, Prasnikar, Okuno-Fujiwara, and Zamir


(1991), who showed that the modal offer in the UG tends to maximize the expected income of the Proposer.2

The UG shows that a sizeable fraction of Responders is willing to punish behavior that is perceived as unfair. In contrast, the GEG indicates that a substantial fraction of the Responders is willing to reward actions that are perceived as generous or fair. The first GEG was conducted by Fehr, Kirchsteiger, and Riedl (1993). In the GEG, the Proposer offers an amount of money $w \in [\underline{w}, \bar{w}]$, $\underline{w} \ge 0$, which can be interpreted as a wage payment, to the Responder. The Responder can accept or reject $w$. In case of a rejection, both players receive zero payoff; in case of acceptance, the Responder has to make a costly “effort” choice $e \in [\underline{e}, \bar{e}]$, $\underline{e} > 0$. The monetary payoff for the Proposer is $x_P = ve - w$, whereas the Responder’s payoff is $x_R = w - c(e)$, where $v$ denotes the marginal value of effort for the Proposer and $c(e)$ the strictly increasing effort cost schedule.3 Under the standard assumptions (i) and (ii), the Responder will always choose the lowest feasible effort level $\underline{e}$ and will, in equilibrium, never reject any $w$. Therefore, the subgame perfect proposal is the lowest feasible wage level $\underline{w}$. The GEG captures a principal–agent relation with highly incomplete contracts in a stylized way. Variants of the GEG have been conducted by several authors.4 All of these studies report that the mean effort is, in general, positively related to the offered wage, which is consistent with the interpretation that the Responders, on average, reward generous wage offers with generous effort choices. However, as in the case of the UG, there are considerable individual differences among the Responders.
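The self-interest prediction for the GEG can be illustrated with a minimal Python sketch. The payoff formulas are the ones given in the text; the value of v, the effort grid, and the linear cost schedule are hypothetical choices made only for this illustration and are not taken from any of the cited experiments.

```python
def geg_payoffs(w, e, v, cost):
    """Return (x_P, x_R) once the Responder has accepted wage w and chosen effort e."""
    x_p = v * e - w      # Proposer: value of effort minus the wage paid
    x_r = w - cost(e)    # Responder: wage minus effort cost
    return x_p, x_r


def selfish_effort(w, efforts, cost):
    """Effort picked by a purely self-interested Responder for a given wage w."""
    return max(efforts, key=lambda e: w - cost(e))


# Hypothetical parameters (assumptions, not experimental values).
V = 10                          # marginal value of effort to the Proposer
EFFORTS = list(range(1, 11))    # feasible effort levels; the lowest is 1


def cost(e):
    """Assumed strictly increasing effort cost schedule c(e)."""
    return 2 * (e - 1)


if __name__ == "__main__":
    for w in (5, 20, 40):
        e_star = selfish_effort(w, EFFORTS, cost)
        print(w, e_star, geg_payoffs(w, e_star, V, cost))
```

Because the cost schedule is strictly increasing, the selfish Responder in this sketch picks the lowest effort whatever the wage, so a Proposer who expects this has no reason to pay more than the minimum wage.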
Although there typically is a sizeable fraction of Responders (frequently roughly 40 percent, sometimes more than 50 percent) who exhibit a reciprocal effort pattern, there is also a substantial fraction of Responders who always make purely selfish effort choices or whose choices seem to deviate randomly from the self-interested action. Despite the presence of selfish Responders, the relation between average effort and wages is in general sufficiently steep to render a high wage policy profitable. This induces Proposers to pay wages far above $\underline{w}$. Evidence for this interpretation comes from Fehr, Kirchsteiger, and Riedl, who embedded the GEG into an experimental market.

2 Suleiman (1996) reports the results of UGs with varying degrees of veto power. In these games, a rejection meant that a fraction λ of the cake was destroyed. For example, if λ = 0.8, and the Proposer offered a 9:1 division of $10, a rejection implied that the Proposer received $1.8, whereas the Responder received $0.2. Suleiman reports that Proposers’ offers are strongly increasing in λ.

3 In some applications of this game, the Proposer’s payoff was given by $x_P = (v - w)e$. This formulation rules out that Proposers can make losses when they offer generously high wages. Likewise, in some applications of the GEG, the Responder did not have the option to reject $w$. Thus, the Proposer just sent $w$, whereas the Responder chose an effort level. Under the standard assumptions of rationality and selfishness, the subgame perfect equilibrium is, however, not affected by these differences.

4 See, e.g., Fehr, Kirchsteiger, and Riedl (1993, 1998), Charness (1996, 2000), Brandts and Charness (1999), Falk, Gächter, and Kovacs (1999), Fehr and Falk (1999), Gächter and Falk (1999), and Hannan, Kagel, and Moser (1999).


In addition to the embedded GEG, there was a control condition in which the effort level was exogenously fixed by the experimenter. Note that, in the control condition, the Responders can no longer reward generous wages with high effort levels. It turns out that the average wage is substantially reduced when the effort is exogenously fixed.

Another important game that did much to change the exclusive reliance on the self-interest hypothesis was the TG, first studied by Berg, Dickhaut, and McCabe (1995). In a TG, a Proposer receives an amount of money $y$ from the experimenter, and then can send between zero and $y$ to the Responder. The experimenter then triples the amount sent, which we term $z$, so that the Responder has $3z$. The Responder is then free to return anything between zero and $3z$ to the Proposer. It turns out that many Proposers send money and that many Responders give back some money. Moreover, there is frequently a strong correlation between $z$ and the amount sent back at the individual, as well as at the aggregate, level (see, e.g., Miller 1997, Cox 2000, Fahr and Irlenbusch, 2000).

Finally, we briefly consider the evidence on PGGs. Like the GEG, the PGG is important because it not only provides interesting insights into the nature of nonpecuniary motivations, but it also captures the essence of numerous real-world situations. There is by now a huge experimental literature on PGGs (see, for surveys, Dawes and Thaler, 1988, Ledyard, 1995). In the typical experiment, there are $n$ players who simultaneously decide how much of their endowment to contribute to a public good. Player $i$’s monetary payoff is given by $x_i = y_i - g_i + m \sum_{j=1}^{n} g_j$, where $y_i$ is player $i$’s endowment, $g_i$ her contribution, $m$ the monetary payoff per unit of the public good, and $\sum_{j=1}^{n} g_j$ the amount of the public good provided by all players. The unit payoff $m$ obeys $m < 1 < nm$.
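As a hedged illustration of the PGG payoff function and of the condition m < 1 < nm, the following Python sketch evaluates payoffs for a hypothetical group of four players with an endowment of 20 each and m = 0.5. These numbers are assumptions chosen only so that the condition holds; they are not taken from a specific experiment.

```python
def pgg_payoff(i, endowments, contributions, m):
    """Monetary payoff of player i: endowment minus own contribution plus
    m times the total amount contributed by all players."""
    return endowments[i] - contributions[i] + m * sum(contributions)


# Hypothetical parameters: n = 4 and m = 0.5, so that m < 1 < n * m.
N_PLAYERS = 4
M = 0.5
ENDOWMENTS = [20] * N_PLAYERS

if __name__ == "__main__":
    nobody = [0, 0, 0, 0]
    everybody = [20, 20, 20, 20]
    one_free_rider = [0, 20, 20, 20]
    print(pgg_payoff(0, ENDOWMENTS, nobody, M))          # 20.0
    print(pgg_payoff(0, ENDOWMENTS, everybody, M))       # 40.0
    print(pgg_payoff(0, ENDOWMENTS, one_free_rider, M))  # 50.0
```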
This ensures that it is a dominant strategy to contribute nothing to the public good, although the total surplus would be maximized if all players contributed their whole endowment.5 In many experiments, the PGG is repeated for about 10 periods, with the group composition changing randomly in each period. If we restrict attention to behavior in the final period (to abstract from repeated games or learning effects), it turns out that roughly 75 percent of all subjects contribute nothing to the public good and the rest contribute very little.6

If one adds to the PGG the opportunity to punish other group members, the contribution pattern changes radically (Fehr and Gächter 2000). In a PGG with a punishment option, there are two stages. Stage 1 is identical to the previously described PGG. At stage 2, after every player in the group has been informed

5 Typically, endowments are identical and n ≤ 10, but there are also experiments with a group size of 40 and 100 (Isaac, Walker, and Williams, 1994).

6 At the beginning of a repeated PGG, subjects contribute on average between 40 and 60 percent of their endowment; but, toward the end, contributions are typically very low. This pattern may be due to repeated game effects. Another plausible reason for the decay of cooperation is that many subjects are conditional cooperators, as shown by Croson (1999), Fischbacher, Gächter, and Fehr (1999), and Sonnemans, Schram, and Offerman (1999). Conditional cooperators cease to cooperate once they notice that selfish subjects take advantage of their cooperation.


about the contributions of each group member, each player can assign up to ten punishment points to each of the other players. The assignment of one punishment point reduces the ﬁrst-stage income of the punished subject by three points on average, but it also reduces the income of the punisher according to a strictly increasing and convex cost schedule. Note that because punishment is costly for the punisher, the self-interest hypothesis predicts zero punishment. Moreover, because rational players will anticipate this, the self-interest hypothesis predicts that nobody will contribute (i.e., there should be no difference in the contribution behavior between the usual PGG and a PGG with a punishment opportunity). The experimental evidence is, however, completely at odds with this prediction. Although, in the usual PGG, cooperation is close to zero in the ﬁnal period, the punishment opportunity causes, on average, stable cooperation rates around 75 percent of subjects’ endowment.7 The reason for these huge differences in contribution behavior is that in the punishment condition many cooperators punish the free riders. The more a subject deviates from the average contribution of the other group members, the more it is punished. Thus, the willingness to punish “unfair” behavior is not restricted to the UG. The above-mentioned facts in the UG, the GEG, the TG, and the PGG are now well established, and there is little disagreement about them. But there are, of course, questions about which factors change the behavior in these games. For example, a question that routinely comes up in discussions with economists is whether a rise in the stake level will eventually induce subjects to behave in a self-interested manner. There are several papers examining this question (Hoffman, McCabe, and Smith 1996; Fehr and Tougareva 1995; Slonim and Roth 1998; Cameron 1999). 
The surprising answer is that relatively large increases in the monetary stakes did little or nothing to change behavior. Hoffman et al. could not detect any effect of the stake level in their UGs. Fehr and Tougareva conducted GEGs (embedded in a competitive experimental market) in Moscow. In one condition, the subjects earned, on average, the equivalent of 1 week’s income in the experiment. In another condition, they earned the equivalent of 10 weeks’ income. Despite this large difference in the stake size, there are no significant differences across conditions in the behavior of both the Proposers and the Responders. Slonim and Roth conducted UGs in Slovakia. They found a small interaction effect between experience and the stake level. In the final period of a series of one-shot UGs, the Responders in the high-stake condition (with a tenfold increase in the stake level relative to the low-stake condition) seem to be willing to reject a bit less frequently. Fehr and Tougareva also allowed subjects to repeat the game (with randomly matched partners). They found no such interaction effects. Cameron conducted UGs in Indonesia and – in the high-stake condition – subjects could earn

7 If the same subjects are allowed to stay together for 10 periods, the cooperation rate even climbs to 90 percent of subjects’ endowments in the final period. In Fehr and Gächter (2000), the group size was n = 4. Recently, Carpenter (2000) showed that, with a group size of n = 10, subjects achieve almost full cooperation, even with a random group composition over time.


the equivalent of three months’ income in her experiment. She observed no effect of the stake level on Proposers’ behavior and a slight reduction of the rejection probability when stakes were high. Of course, it is still possible that, in the presence of extremely high stakes, there may be a shift toward more selfish behavior. However, for large segments of the population, this is not the economically relevant question. For almost all people, the vast majority of their decisions involve stake levels well below three months’ income. Thus, even if fairness-driven behavior played no role at all at stake levels above that size, fairness concerns would still play a major role in many economically important domains.

2.3. Interpretation of the Evidence

Although there is now little disagreement regarding the facts, there is still disagreement about the interpretation of these facts. In Section 3, we will describe several recently developed theories of fairness that maintain the rationality assumption, but change the assumption of purely selﬁsh preferences. Some researchers have, however, reservations about changes in the motivational assumptions and prefer, instead, to interpret the behavior in these games as elementary forms of bounded rationality. For example, Binmore, Gale, and Samuelson (1995) and Roth and Erev (1995) try to explain the presence of fair offers and rejections of low offers in the UG by learning models that are based on purely pecuniary preferences. These models are based on the idea that the rejection of low offers is not very costly for the Responder and, therefore, the Responders learn only very slowly not to reject such offers. The rejection of offers is, however, quite costly for the Proposers. Therefore, Proposers learn more quickly that it does not pay to make low offers. Moreover, because Proposers quickly learn to make fair offers, the pressure on the Responders to learn accepting low offers is greatly reduced. This gives rise to very slow convergence to the subgame perfect equilibrium – if there is convergence at all. The simulations of Binmore et al. and Roth and Erev show that it often takes thousands of iterations until play comes close to the standard prediction. In our view, there can be little doubt that learning processes are important in real life, as well as in laboratory experiments. There are numerous examples where the behavior of subjects changes over time, and it seems clear that learning models are prime candidates to explain such dynamic patterns. We believe, however, that attempts to explain the basic facts in such simple games as the UG, the GEG, and the TG in terms of learning models that assume completely selﬁsh preferences are misplaced. 
The decisions of the Responders, in particular, are so simple in these games that it is difﬁcult to believe that they make systematic mistakes and reject money or reward generous offers, although their true preferences would require them not to do so. Moreover, the previously cited evidence from Roth et al. (1991), Forsythe et al. (1994), Suleiman (1996), and Fehr et al. (1998) suggests that many Proposers do anticipate Responders’


actions surprisingly well. Thus, at least in these simple two-stage games, many Proposers seem to be quite rational and forward-looking. Sometimes it is also argued that the behavior in these games is because of a social norm (see, e.g., Binmore 1998). In real life, so the argument goes, experimental subjects make the bulk of their decisions in repeated interactions. It is well known that, in repeated interactions, the rejection of unfair offers or the rewarding of generous offers can be sustained as an equilibrium. According to this argument, notions of fairness perform the function of selecting a particular equilibrium among the inﬁnitely many equilibria that typically exist in long-term interactions. Subjects’ behavior is, therefore, adapted to repeated interactions, and they tend to apply behavioral rules that are appropriate in the context of repeated interactions erroneously to laboratory one-shot games. This argument essentially boils down to the claim that subjects cannot rationally distinguish between one-shot and repeated interactions. One problem with this argument – apart from claiming that subjects make systematic mistakes – is that it cannot explain the huge behavioral variations across one-shot games. Why do, in Forsythe et al. (1994), the Proposers give so much less in the DG compared with the UG? Why do the Proposers in the control condition with exogenously ﬁxed effort (Fehr et al., 1998) make so low wage offers? Why is there so much defection in the ﬁnal round of PGGs, whereas in the presence of a punishment opportunity, a high level of cooperation can be achieved? Invoking some kind of social norm cannot explain this behavior unless one is willing to assume that different social norms apply to these different situations. A second problem with this argument is that there is compelling evidence that, in repeated interactions, experimental subjects do behave very differently compared with one-shot situations. 
In Gächter and Falk (1999), it is shown that the Responders in GEGs choose much higher effort levels if they can stay together with the same Proposer.8 In fact, experimental subjects who participate in one-shot GEGs frequently complain after the experiment that the experimenter ruled out repeated interactions because that would have enabled them, so the subjects claim, to develop a much more trustful and efficient relation with their partner. All this indicates that experimental subjects are well aware of the difference between one-shot interactions and repeated interactions.

These arguments suggest that an approach that combines bounded rationality with purely selfish preferences does not provide a satisfactory explanation of the facts observed in UGs, GEGs, TGs, and PGGs. In our view, there remain two plausible approaches to account for the facts. One approach is to maintain the assumption of rationality at least for the analysis of these simple games and to assume, in addition, that some players are not only motivated by pecuniary forces. The other approach is to combine models of learning with models that take into account nonselfish motives. In the following we focus on the first

8 Andreoni and Miller (1993) also report that, in prisoner’s dilemmas, increases in the probability of staying together or meeting the same partner again increase cooperation rates.


approach because there has been much progress in this area in recent years, whereas the second approach is still in its infancy.9

3. THEORIES OF FAIRNESS AND RECIPROCITY

This section surveys the most prominent recent attempts to explain the experimental evidence sketched in Section 2 within a rational choice framework. Two main approaches can be distinguished. The first approach assumes that at least some agents have “social preferences” (i.e., the utility function of these agents depends not only on their own material payoff, but also on how much the other players receive). Given these social preferences, all agents are assumed to behave perfectly rationally, and the well-known concepts of traditional utility and game theory can be applied to analyze optimal behavior and to characterize equilibrium outcomes in experimental games. The second approach focuses on “intention-based reciprocity.” This approach assumes that a player cares about the intentions of her opponent. If she feels treated kindly, she wants to return the favor and be nice to her opponent. If she feels treated badly, she wants to hurt her opponent. Thus, in this approach, it is crucial how a player interprets the behavior of the other players. This cannot be captured by traditional game theory but requires the framework of psychological game theory.

The starting point of both of these approaches is to make rather specific assumptions on the utility functions of the players. Alternatively, one could start from a general preference relation and ask what kind of axioms are necessary and sufficient to generate utility functions with certain properties. Axiomatic approaches are discussed at the end of this section.

3.1. Social Preferences

Classical utility theory assumes that a decision maker has preferences over allocations of material outcomes (e.g., goods) and that these preferences satisfy some “rationality” or “consistency” requirements, such as completeness and transitivity. However, in almost all applications, this fairly general framework is interpreted much more narrowly by implicitly assuming that the decision maker cares about only one aspect of an allocation, namely the material resources that are allocated to her. Models of social preferences assume, in contrast, that the decision maker may also care about how much material resources are allocated to others. Somewhat more formally, let $\{1, 2, \ldots, N\}$ denote a set of individuals and $x = (x_1, x_2, \ldots, x_N)$ denote an allocation of physical resources out of some set $X$ of feasible allocations, where $x_i$ denotes the material resources allocated to person $i$. The self-interest hypothesis says that the utility of individual $i$

9 Exceptions are the recent paper by Cooper and Stockman (1999), which combines reinforcement learning with a model of social preferences, and the paper by Costa-Gomes and Zauner (1999).


depends on $x_i$ only. We will say that individual $i$ has social preferences if for any given $x_i$ person $i$’s utility is affected by variations of $x_j$, $j \neq i$. Of course, simply assuming that the utility of individual $i$ may be any function of the total allocation is too general, because it does not yield any empirically testable restrictions on observed behavior. In the following, we will discuss several models of social preferences, each of which assumes that the preferences of an individual depend on $x_j$, $j \neq i$, in a different way.

3.1.1. Altruism

A person is altruistic if the first partial derivatives of $u(x_1, \ldots, x_N)$ with respect to $x_1, \ldots, x_N$ are strictly positive (i.e., if her utility increases with the wellbeing of other people).10 The hypothesis that people are altruistic has a long tradition in economics and has been used to explain charitable donations and the voluntary provision of public goods (see, e.g., Becker, 1974). Clearly, the simplest game to elicit altruistic preferences is the DG. Andreoni and Miller (2000) conducted a series of DG experiments in which one agent could allocate “tokens” between herself and another agent for a series of different budgets. The tokens were exchanged into money at different rates for the two agents and the different budgets. Let $U_i(x_1, x_2)$ denote subject $i$’s utility function representing her preferences over monetary allocations $(x_1, x_2)$. In a first step, Andreoni and Miller check for violations of the Generalized Axiom of Revealed Preference and find that almost all subjects behaved consistently and passed this basic rationality check. Then they classify the subjects into three main groups. They find that about 30 percent of the subjects give tokens to the other party in a fashion that equalizes the monetary payoffs between players. The behavior of 20 percent of the subjects can be explained by a utility function in which $x_1$ and $x_2$ are perfect substitutes [i.e., these subjects seem to have maximized the (weighted) sum of the monetary payoffs]. However, almost 50 percent of the subjects behaved “selfishly” and did not give any significant amounts to the other party. Andreoni and Miller (2000, p. 23) conclude that altruistic behavior exists and that it is consistent with rationality, but also that individuals are heterogeneous. Charness and Rabin (2000) consider a specific form of altruism that they call quasi-maximin preferences.
They start from a “disinterested social welfare function,” which is a convex combination of Rawls’ maximin criterion and a utilitarian welfare function:

$$W(x_1, x_2, \ldots, x_N) = \delta \cdot \min\{x_1, \ldots, x_N\} + (1 - \delta) \cdot (x_1 + \cdots + x_N),$$

10 The Encyclopaedia Britannica (1998, 15th edition) defines an altruistic agent as someone who feels the obligation “to further the pleasures and alleviate the pains of other people.” Note that our definition of altruism differs somewhat from the definition used in moral philosophy, where “altruism” requires a moral agent to be concerned only about the welfare of others and not about his own happiness.


where $\delta \in (0, 1)$ is a parameter reflecting the weight that is put on the maximin criterion. The utility function of an individual is then given by a convex combination of his own monetary payoff and the above social welfare function:11

$$U_i(x_1, x_2, \ldots, x_N) = (1 - \gamma)\, x_i + \gamma\, [\delta \cdot \min\{x_1, \ldots, x_N\} + (1 - \delta) \cdot (x_1 + \cdots + x_N)].$$

In the two-player case, this boils down to

$$U_i(x_1, x_2) = \begin{cases} x_i + \gamma (1 - \delta)\, x_j & \text{if } x_i < x_j \\ (1 - \gamma\delta)\, x_i + \gamma\, x_j & \text{if } x_i \ge x_j. \end{cases}$$
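The reduction from the general quasi-maximin utility to the two-player piecewise form can be checked numerically. The Python sketch below is only an illustration: it takes the own-payoff term to be the payoff of individual i, and the function names and the parameter values used in the check are hypothetical.

```python
import math


def quasi_maximin_utility(i, x, gamma, delta):
    """General form: convex combination of own payoff and the
    disinterested social welfare function W."""
    welfare = delta * min(x) + (1 - delta) * sum(x)
    return (1 - gamma) * x[i] + gamma * welfare


def two_player_form(x_i, x_j, gamma, delta):
    """Piecewise-linear two-player expression stated in the text."""
    if x_i < x_j:
        return x_i + gamma * (1 - delta) * x_j
    return (1 - gamma * delta) * x_i + gamma * x_j


def forms_agree(gamma, delta, payoffs=range(0, 11)):
    """Compare both expressions on a grid of two-player allocations."""
    return all(
        math.isclose(
            quasi_maximin_utility(0, (a, b), gamma, delta),
            two_player_form(a, b, gamma, delta),
            abs_tol=1e-12,
        )
        for a in payoffs
        for b in payoffs
    )


if __name__ == "__main__":
    print(forms_agree(gamma=0.4, delta=0.5))
```

The weight on the other player's payoff is γ(1 − δ) when she is ahead and γ when she is behind, which is the asymmetry discussed in the surrounding text.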

Note that the marginal rate of substitution between $x_i$ and $x_j$ is smaller if $x_i < x_j$. Hence, the decision maker cares about the well-being of the other person, but less so if the other person is better off than she is. Altruism in general, and quasi-maximin preferences in particular, can explain positive acts toward other players, such as giving in DGs, voluntary contributions in PGGs, and the kind behavior of Responders in TGs and GEGs;12 but it is clearly inconsistent with the fact that, in some experiments, subjects try to retaliate and hurt other subjects, even if this is costly for them (as in the UG or a PGG with punishments). This is why Charness and Rabin augment quasi-maximin preferences by incorporating reciprocity (see Section 3.2.3).

3.1.2. Relative Income and Envy

An alternative hypothesis is that subjects are concerned not only about the absolute amount of money they receive, but also about their relative standing compared with others. This “relative income hypothesis” has a long tradition in economics and goes back at least to Veblen (1922). Bolton (1991) formalized this idea in the context of an experimental bargaining game between two players and assumed that $U_i(x_i, x_j) = u_i(x_i, x_i/x_j)$, where $u_i(\cdot, \cdot)$ is strictly increasing in its first argument and where the partial derivative with respect to $x_i/x_j$ is strictly positive for $x_i < x_j$ and equal to 0 for $x_i \ge x_j$. Thus, agent $i$ suffers if she gets less than player $j$, but she does not care about player $j$ if she is better off herself. Note that this utility function implies that $\partial U_i/\partial x_j \le 0$, just the opposite of altruism. Hence, whereas this utility function is consistent with the behavior in the bargaining games considered by Bolton, it fails to explain

11 Note that Charness and Rabin do not normalize payoffs with respect to N. Thus, if the group size changes, and the parameters δ and γ are assumed to be constant, the importance of the maximin term in relation to the player’s own material payoff changes.

12 However, even in these games, altruism has some implausible implications. For example, in a public good context, altruism implies that if the government provides part of the public good (financed by taxes), then every dollar provided by the government “crowds out” one dollar of private, voluntary contributions. This “neutrality property” holds quite generally (Bernheim, 1986). However, it is in contrast to the empirical evidence reporting that the actual crowding out is rather small. This has led some researchers to include the pleasure of giving (a “warm glow effect”) in the utility function (Andreoni, 1989).


giving in DGs, GEGs, and TGs or voluntary contributions in PGGs. The same problem arises in the envy approach of Kirchsteiger (1994).

3.1.3. Inequity Aversion

The preceding approaches assumed that utility is either monotonically increasing or monotonically decreasing in the well-being of other players. Fehr and Schmidt (1999) assume that a player is altruistic toward other players if their material payoffs are below an equitable benchmark, but she feels envy when the material payoffs of the other players exceed this level.13 In most experiments, it is natural to assume that an equitable allocation is an equal monetary payoff for all players. Fehr and Schmidt consider the simplest utility function capturing this idea:

$$U_i(x_1, \ldots, x_N) = x_i - \frac{\alpha_i}{N - 1} \sum_{j \neq i} \max\{x_j - x_i, 0\} - \frac{\beta_i}{N - 1} \sum_{j \neq i} \max\{x_i - x_j, 0\},$$

with $\beta_i \le \alpha_i$ and $\beta_i \le 1$. Note that $\partial U_i/\partial x_j \ge 0$ if and only if $x_i \ge x_j$. Note also that the disutility from inequality is larger if another person is better off than player $i$ than if another person is worse off ($\alpha_i \ge \beta_i$). This utility function can rationalize positive and negative actions toward other players. It is consistent with giving in DGs, GEGs, and TGs, and with the rejection of low offers in UGs. It can also explain voluntary contributions in PGGs and the costly punishment of free riders.

A second important ingredient of this model is the assumption that individuals are heterogeneous. If all people were alike, it would be difficult to explain why we observe that people sometimes resist “unfair” outcomes or manage to cooperate even though it is a dominant strategy for a selfish person not to do so, whereas in other environments fairness concerns or the desire to cooperate do not seem to have much of an effect. Fehr and Schmidt show that the interaction of the distribution of types with the strategic environment explains why, in some situations, very unequal outcomes are obtained, whereas in other situations very egalitarian outcomes prevail. For example, in certain competitive environments (see, e.g., the UG with Proposer competition in Section 5.1), even a population that consists of only very fair types (high αs and βs) cannot prevent very uneven outcomes. The reason is that none of the inequity-averse players can enforce a more equitable outcome through her own actions. In contrast, in a PGG with punishment, a small fraction of inequity-averse players is sufficient to threaten credibly that free riders will be punished, which induces selfish players to contribute to the public good.

13 Daughety (1994) and Fehr et al. (1998) also assume that a player values the payoff of reference agents positively, if she is relatively better off, whereas she values the others’ payoff negatively, if she is relatively worse off.
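To make the inequity-aversion utility concrete, the following Python sketch evaluates it for a Responder in a UG, reading the two inequality terms as sums over the other players j ≠ i. The pie size, the offers, the parameter values, and the rule that an indifferent Responder accepts are all illustrative assumptions, not calibrated values from Fehr and Schmidt.

```python
def fehr_schmidt_utility(i, x, alpha, beta):
    """Own payoff minus disutility from disadvantageous inequality
    (weight alpha) and from advantageous inequality (weight beta)."""
    n = len(x)
    behind = sum(max(xj - x[i], 0) for j, xj in enumerate(x) if j != i)
    ahead = sum(max(x[i] - xj, 0) for j, xj in enumerate(x) if j != i)
    return x[i] - alpha / (n - 1) * behind - beta / (n - 1) * ahead


def responder_accepts(offer, pie, alpha, beta):
    """UG Responder (player index 1) compares accepting (pie - offer, offer)
    with rejecting (0, 0). Accepting when indifferent is an assumption."""
    u_accept = fehr_schmidt_utility(1, (pie - offer, offer), alpha, beta)
    u_reject = fehr_schmidt_utility(1, (0, 0), alpha, beta)
    return u_accept >= u_reject


if __name__ == "__main__":
    # Hypothetical example: a pie of 10 and an offer of 2 to the Responder.
    print(responder_accepts(2, 10, alpha=0.0, beta=0.0))  # selfish type
    print(responder_accepts(2, 10, alpha=1.0, beta=0.5))  # inequity-averse type
    print(responder_accepts(5, 10, alpha=1.0, beta=0.5))  # equal split
```

In this example the selfish type accepts the low offer, whereas the inequity-averse type with α = 1 obtains 2 − 1·6 = −4 from accepting and therefore rejects; both types accept the equal split.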


Using data that are available from many experiments on the UG, Fehr and Schmidt calibrate the distribution of α and β in the population. Keeping this distribution constant, they show that their model yields quantitatively accurate predictions across many bargaining, market, and cooperation games.14 Neilson (2000) provides an axiomatic characterization of the Fehr and Schmidt (1999) model of inequity aversion. He introduces the axiom of “selfreferent separability,” which requires that if the payoff differences between player i and any subset of all other players remain constant, then the preferences of player i should not be affected by the magnitude of these differences. Neilson shows that this axiom is equivalent to having a utility function that is additively separable in the individual’s own material payoff and the payoff differences to his opponents, which is an essential feature of the Fehr–Schmidt model. Neilson also offers a full axiomatic characterization of the more speciﬁc functional form used by Fehr and Schmidt. Bolton and Ockenfels (2000) independently developed a similar model of inequity aversion. They also show that their model can explain a wide variety of seemingly puzzling evidence (e.g., giving in DGs and GEGs and rejections in UGs). In their model, the utility function is given by Ui = Ui (xi , σi ), where

\[
\sigma_i =
\begin{cases}
x_i \Big/ \sum_{j=1}^{N} x_j & \text{if } \sum_{j=1}^{N} x_j \neq 0, \\
1/N & \text{if } \sum_{j=1}^{N} x_j = 0.
\end{cases}
\]
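As a minimal illustration of this definition, the following Python sketch computes player i’s share of total income, including the special case in which total income is zero. The function name `bo_share` and the example payoffs are hypothetical; Bolton and Ockenfels do not commit to a particular utility function, so only the share σ_i itself is computed here.

```python
def bo_share(x, i):
    """Relative share sigma_i of player i in total income for the payoff vector x."""
    total = sum(x)
    if total == 0:
        # By definition the share is 1/N when total income is zero.
        return 1.0 / len(x)
    return x[i] / total


# Hypothetical examples: an 80:20 split, and payoffs that sum to zero.
print(bo_share([80, 20], 1))
print(bo_share([5, -5], 0))
```

In the second example the share equals 1/2 regardless of how the payoffs are divided, because the case distinction in the definition applies whenever the payoffs sum to zero.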

For any given σ_i, the utility function is assumed to be weakly increasing and concave in player i’s own material payoff x_i. Furthermore, for any given x_i, the utility function is strictly concave in player i’s share of total income, σ_i, and obtains a maximum at σ_i = 1/N.15 Bolton and Ockenfels do not pin down a specific functional form, so their utility function is more flexible. However, this also makes it more difficult to get closed-form solutions and quantitative predictions for the outcomes of many experiments. It also imposes less discipline on the researcher not to adjust the utility function to a specific set of data.

For two-player games, Fehr and Schmidt and Bolton and Ockenfels often yield qualitatively similar results. With more than two players, there are some interesting differences. In this case, Fehr and Schmidt assume that a player compares herself with each of her opponents separately. This implies that her behavior toward an opponent depends on the income difference toward this person. In contrast, Bolton and Ockenfels assume that the decision maker is not concerned about each individual opponent, but only about the average income of all players. Thus, whether ∂U_i/∂x_j is positive or negative in the Bolton–Ockenfels model does not depend on j’s relative position toward i, but rather on how well i does, compared with the average. If x_i is below the average, then i would like to reduce j’s income even if j has a much lower income than i herself. On the other hand, if i is doing better than the average, then she is prepared to give to j even if j is much better off than i.16

14 One drawback of the piecewise linear utility function used by Fehr and Schmidt is that it implies corner solutions for some games where interior solutions are frequently observed. For example, in the DG, a decision maker with a Fehr–Schmidt utility function would either give nothing (if her β < 0.5) or share the pie equally (if β > 0.5). Giving away a fraction that is strictly in between 0 and 0.5 is optimal only in the nongeneric case, where β = 0.5. However, this problem can be avoided by assuming nonlinear inequity aversion.

15 This specification of the utility function has the disadvantage that it is not independent of a shift in payoffs. Consider, for example, a DG in which the dictator has to divide X dollars. Note that this is a constant sum game, because x_1 + x_2 ≡ X. If we reduce the sum of payoffs by X (i.e., if the dictator can take away money from her opponent or give to him out of her own pocket), then x_1 + x_2 = 0 for any decision of the dictator and thus we always have σ_1 = σ_2 = 1/2. Therefore, the theory makes the implausible prediction that, in contrast to the game where x_1 + x_2 = X > 0, all dictators should take as much money from their opponent as possible. A related problem has been noted by Camerer (1999, p. 61). Suppose that the UG is modified as follows: If the Responder rejects a proposal, the Proposer receives a small amount ε > 0 while the Responder receives zero. In this game, the rejection of a positive offer implies σ = 0, whereas acceptance implies σ > 0. Thus, the Responder never rejects any positive offer, no matter how small ε > 0.

3.1.4. Altruism and Spitefulness

Levine (1998) offers a different solution to explain giving in some games and punishing in others. Consider the utility function

\[
U_i = x_i + \sum_{j \neq i} x_j \, \frac{a_i + \lambda a_j}{1 + \lambda},
\]

where 0 ≤ λ ≤ 1 and −1 < a_i < 1 for all i ∈ {1, . . . , N}. Suppose first that λ = 0. In this case, the utility function reduces to U_i = x_i + a_i Σ_{j≠i} x_j. If a_i > 0, then person i is an altruist who wants to promote the well-being of other people; if a_i < 0, then player i is spiteful. Although this utility function would be able to explain why some people contribute in PGGs and why some (other) people reject positive offers in the UG, it cannot explain why the same person who is altruistic in one setting is spiteful in another. To deal with this problem, suppose that λ > 0. In this case, an altruistic player i (with a_i > 0) feels more altruistic toward another altruist than toward a spiteful person. In fact, if −λa_j > a_i, player i may behave spitefully herself.

In most experiments, where there is anonymous interaction, the players do not know the parameter a_j of their opponents and have to form beliefs about them. Thus, any sequential game becomes a signaling game in which beliefs about the other players’ types are crucially important to determine optimal strategies. This may give rise to a multiplicity of signaling equilibria. Levine uses the data from the UG to calibrate the distribution of a and to estimate λ (which is assumed to be the same for all players). He shows that, with these parameters, the model can reasonably fit the data on centipede games, market games, and PGGs. However, because a_i < 1, the model cannot explain positive giving in the dictator game.

16 See Camerer (1999) and Section 4.1 for a more extensive comparison of these two approaches.

3.2. Models of Intention-Based Reciprocity

Models of social preferences share a common weakness. They assume that players are concerned only about the distributional consequences of their acts but not about the intentions that lead their opponents to choose these acts. To see that this may be a problem, consider the following two “mini-UGs” in which the strategy set of the Proposer is restricted. In the first condition, the Proposer can choose between a 50:50 and an 80:20 split. In the second condition, the Proposer must choose between an 80:20 and a 20:80 division of the pie. All theories that look only at the distributional consequences must predict that, if a Responder rejects the 80:20 split in the first condition, then she must also reject this offer in the second condition. However, in the second condition, a fair division of the pie was not feasible, and so the Responder may be more inclined to accept this offer, compared with the first treatment in which the Proposer could have split the pie evenly, but chose not to do so. In fact, Falk, Fehr, and Fischbacher (2000a) report that the 80:20 split is rejected significantly less often under the second condition.17 This is inconsistent with any theory of social preferences that relies only on preferences over income distributions.

3.2.1. Fairness Equilibrium

In a pioneering article, Rabin (1993) starts from the observation that our behavior is often a reaction to the (expected) intentions of other people. If we feel that another person has been kind to us, we often have a desire to be kind as well. If we feel that somebody wanted to hurt us, we often have the desire to retaliate, even if this is personally costly. To model intentions explicitly, Rabin departs from traditional game theory and adopts the concept of “psychological game theory” that had been introduced by Geanakoplos, Pearce, and Stacchetti (1989). In psychological game theory, utilities depend not only on terminal-node payoffs, but also on players’ beliefs. Rabin restricts attention to two-player, normal-form games. Let A_1 and A_2 denote the (mixed) strategy sets for players 1 and 2, respectively, and let x_i : A_1 × A_2 → R be player i’s material payoff function.

17 This criticism does not necessarily apply to Levine (1998). In his model, offering 80:20 may be interpreted as a signal that the Proposer is spiteful if the 50:50 split was available, and may be differently interpreted if the 50:50 split was not available. However, if a player knows the type of her opponent, her behavior is independent of what the opponent does to her and of why he does it to her.


We now have to define (hierarchies of) beliefs over strategies. Let a_i ∈ A_i denote a strategy of player i. When i chooses her strategy, she must have some belief about the strategy to be chosen by player j. In all of the following, i ∈ {1, 2} and j = 3 − i. Let b_j denote player i’s belief about what player j is going to do. Furthermore, to rationalize her expectation b_j, player i must have some belief about what player j believes that player i is going to do. This belief about beliefs is denoted by c_i. The hierarchy of beliefs could be continued ad infinitum, but the first two levels of beliefs are sufficient to define reciprocal preferences.

Rabin starts with a “kindness function,” f_i(a_i, b_j), which measures how kind player i is to player j. If player i believes that her opponent chooses strategy b_j, then she effectively chooses her opponent’s payoff out of the set [x_j^l(b_j), x_j^h(b_j)], where x_j^l(b_j) (x_j^h(b_j)) is the lowest (highest) payoff of player j that can be induced by player i if j chooses b_j. According to Rabin, a “fair” or “equitable” payoff for player j, x_j^f(b_j), is just the average of the lowest and highest payoffs (excluding Pareto-dominated payoffs, however). Note that this “fair” payoff is independent of the payoff of player i. The kindness of player i toward player j is measured by the difference between the actual payoff she gives to player j and the “fair” payoff, relative to the whole range of feasible payoffs:18

\[
f_i(a_i, b_j) \equiv \frac{x_j(b_j, a_i) - x_j^f(b_j)}{x_j^h(b_j) - x_j^l(b_j)},
\]

with j = 3 − i and f_i(a_i, b_j) = 0 if x_j^h(b_j) − x_j^l(b_j) = 0. Note that f_i(a_i, b_j) > 0 if and only if player i gives player j more than the “fair” payoff.

Finally, we have to define player i’s belief about how kindly she is being treated by player j. This is defined in exactly the same manner, but beliefs have to move up one level. Thus, if player i believes that player j chooses b_j and if she believes that player j believes that i chooses c_i, then player i perceives player j’s kindness as given by

\[
f_j(b_j, c_i) \equiv \frac{x_i(c_i, b_j) - x_i^f(c_i)}{x_i^h(c_i) - x_i^l(c_i)},
\]

with j = 3 − i and f_j(b_j, c_i) = 0 if x_i^h(c_i) − x_i^l(c_i) = 0. These kindness functions can now be used to define a player’s utility function:

\[
U_i(a, b_j, c_i) = x_i(a, b_j) + f_j(b_j, c_i)\,[1 + f_i(a_i, b_j)],
\]

where a = (a_1, a_2). Note that if player j is perceived to be unkind (f_j(·) < 0), player i wants to be as unkind as possible, too. On the other hand, if f_j(·) is positive, player i gets some additional utility from being kind to player j as well. Note also that the kindness terms have no dimension and that they must lie in the interval [−1, 0.5]. Thus, the utility function is sensitive to positive affine transformations. Furthermore, the kindness term becomes less and less important the higher the material payoffs are.

A “fairness equilibrium” is an equilibrium in a psychological game with these payoff functions [i.e., a pair of strategies (a_1, a_2) that are mutually best responses to each other and a set of rational expectations b = (b_1, b_2) and c = (c_1, c_2) that are consistent with equilibrium play].

Rabin’s theory is important because it was the first contribution that made the notion of reciprocity precise and explored the consequences of reciprocal behavior. The model provides several interesting insights, but it is not well suited for predictive purposes. It is consistent with rejections in the UG, but there exist many other unreasonable equilibria, including equilibria in which the Responder receives more than 50 percent of the pie. The multiplicity of equilibria is a general feature of Rabin’s model. If material payoffs are sufficiently small so that psychological payoffs matter, then there are always multiple equilibria. In particular, there is one equilibrium in which both players are nice to each other and one in which they are nasty. Both equilibria are supported by self-fulfilling prophecies, so it is difficult to predict which equilibrium is going to be played.

The theory also predicts that players do not undertake kind actions unless others have shown their kind intentions. Suppose, for example, that in the prisoner’s dilemma, player 2 has no choice but is forced to cooperate. If player 1 knows this, then, according to Rabin’s theory, she will interpret player 2’s cooperation as “neutral” (f_2(·) = 0). Thus, she will look at only her material payoffs and will defect. This contrasts with models of inequity aversion in which player 1 would cooperate, irrespective of the reason for player 2’s cooperation. We will discuss the experimental evidence that can be used to discriminate between the different approaches in Section 4.

18 A disturbing feature of Rabin’s formulation is that he excludes Pareto-dominated payoffs in the definition of the “fair” payoff, but not in the denominator of the kindness term. Thus, adding a Pareto-dominated strategy for player j would not affect the fair payoff, but it would reduce the kindness term.

3.2.2. Intentions in Sequential Games

Rabin’s theory has been defined only for two-person, normal-form games. If the theory is applied to the normal form of simple sequential games, some very implausible equilibria may arise. For example, in the sequential prisoner’s dilemma, unconditional cooperation of the second player is part of a “fairness” equilibrium. The reason is that Rabin’s equilibrium notion does not force player 2 to behave optimally off the equilibrium path. In a subsequent paper, Dufwenberg and Kirchsteiger (1998) generalized Rabin’s theory to N-person extensive form games, for which they introduce the notion of a “Sequential Reciprocity Equilibrium” (SRE). The main innovation is to keep track of beliefs about intentions as the game evolves. In particular, it has to be specified how beliefs about intentions are formed off the equilibrium path. Given this system of beliefs, strategies have to form a fairness equilibrium in every proper subgame.19

Applying their model to several examples, Dufwenberg and Kirchsteiger show that conditional cooperation in the prisoner’s dilemma is an SRE. They also show that, in the UG, it can be an SRE for the Proposer to make an offer that is rejected by the Responder with certainty. This is an equilibrium because both players believe that the other party wants to hurt them. However, even in these extremely simple sequential games, the equilibrium analysis is fairly complex, and there are typically many equilibria with different equilibrium outcomes due to different self-fulfilling beliefs about intentions.

3.2.3. Merging Intentions and Social Preferences

Falk and Fischbacher (1999) also generalize Rabin (1993). They consider N-person extensive form games and allow for the possibility of incomplete information. Furthermore, they measure “kindness” in terms of inequity aversion. A strategy of player j is perceived to be kind by player i if it gives rise to a payoff for player i that is higher than the payoff of player j. Note that this is fundamentally different from Rabin and Dufwenberg and Kirchsteiger, who define “kindness” in relation to the feasible payoffs of player i and not in relation to the payoff that player j gets. Furthermore, Falk and Fischbacher distinguish whether an unequal distribution could have been altered by player j or whether player j was a “dummy player” who is unable to affect the distribution by his actions. In the former case, the kindness term gets a higher weight than in the latter. However, even if player j is a dummy player who has no choice to make, the kindness term (which now reflects pure inequity aversion) gets a positive weight. Thus, Falk and Fischbacher merge intention-based reciprocity and inequity aversion.

Their model is quite complex. At every node where player i has to move, she has to evaluate the kindness of player j that depends on the expected payoff difference between the two players and on what player j could have done about this difference. This “kindness term” is multiplied by a “reciprocation term,” which is positive if player i is kind to player j and negative if i is unkind. The product is further multiplied by an individual reciprocity parameter that measures the weight of player i’s desire to reciprocate, compared with his desire to get a higher material payoff. These preferences, together with the underlying game form, define a psychological game à la Geanakoplos et al. (1989). A subgame perfect psychological Nash equilibrium of this game is called a “reciprocity equilibrium.”

Falk and Fischbacher show that there are parameter constellations for which their model is consistent with the stylized facts of the UG, the GEG, the DG, the PGG, and the prisoner’s dilemma game. Furthermore, there are parameter constellations that can explain the difference in outcomes if one player moves intentionally and if she is a dummy player. Because their model contains variants of a pure intention-based reciprocity model (e.g., Rabin) and a pure inequity aversion model (e.g., Fehr and Schmidt or Bolton and Ockenfels) as special cases, it is possible to get a better fit of the data, but at a significant cost in terms of the complexity of the model.

Another attempt to combine social preferences with intention-based reciprocity is due to Charness and Rabin (2000). We described their model of quasi-maximin preferences in Section 3.1.1. In a second step, they augment these preferences by introducing a demerit profile ρ ≡ (ρ_1, . . . , ρ_N), where ρ_i ∈ [0, 1] is a measure of how much player i deserves from the point of view of all other players. The smaller the ρ_i, the more does player i count in the utility function of the other players. Given a demerit profile ρ, player i’s utility function is given by

\[
U_i(x_1, x_2, \ldots, x_N \mid \rho) = (1 - \gamma)\, x_i + \gamma \Big[ \delta \cdot \min\Big\{ x_i,\ \min_{j \neq i} \{ x_j + d\rho_j \} \Big\} + (1 - \delta) \cdot \Big( x_i + \sum_{j \neq i} \max\{ 1 - k\rho_j,\ 0 \} \cdot x_j \Big) - f \sum_{j \neq i} \rho_j x_j \Big],
\]

where d, k, f ≥ 0 are three new parameters of the model. If d = k = f = 0, this boils down to the quasi-maximin preferences described previously. If d and k are large, then player i does not want to promote the well-being of player j. If f is large, player i may actually want to hurt player j.

The crucial step is to endogenize the demerit profile ρ. Charness and Rabin do this by comparing player j’s strategy to a unanimously agreed-upon, exogenously given “selfless standard” of behavior. The more player j falls short of this standard, the higher is his demerit factor ρ_j. A “reciprocal fairness equilibrium” (RFE) is a strategy profile and a demerit profile such that each player is maximizing his utility function given other players’ strategies and given the demerit profile that is itself consistent with the profile of strategies. This definition implicitly corresponds to a Nash equilibrium of a psychological game as defined by Geanakoplos et al. (1989).

The notion of RFE has several drawbacks that make it almost impossible to use it for the analysis of even the simplest experimental games. First of all, the model is incomplete because preferences are defined only in equilibrium (i.e., for an equilibrium demerit profile ρ), and it is unclear how to evaluate outcomes out of equilibrium or if there are multiple equilibria. Second, it requires that all players have the same utility functions and agree on a “quasi-maximin” social welfare function to determine the demerit profile ρ. Finally, the model is so complicated and involves so many free parameters that it would be very difficult to test it empirically.

Charness and Rabin show that if the “selfless standard” is sufficiently small, then every RFE corresponds to a Nash equilibrium of the game in which players simply maximize their quasi-maximin utility functions. Therefore, in the analysis of the experimental evidence, they restrict attention to the much simpler model of quasi-maximin preferences that we discussed in Section 3.1.1.

19 Dufwenberg and Kirchsteiger also suggest several other deviations from Rabin’s model. In particular, they measure kindness “in proportion to the size of the gift” (i.e., in monetary units). This has the advantage that reciprocity does not disappear as the stakes become larger, but it also implies that the kindness term in the utility function has the dimension of “money squared,” which again makes the utility function sensitive to linear transformations. Furthermore, they define “inefficient strategies” (which play an important role in the definition of the kindness term) as strategies that yield a weakly lower payoff for all players than some other strategy for all subgames. Rabin (1993) defines inefficient strategies as those that yield weakly less on the equilibrium path. However, with more than two players in Dufwenberg and Kirchsteiger (1998), the problem arises that an additional dummy player may render an inefficient strategy efficient and might thus affect the size of the kindness term.

3.3. Axiomatic Approaches

The models considered so far assume very specific utility functions that are defined either on (lotteries over) material payoff vectors and/or on beliefs about other players’ strategies and other players’ beliefs. These utility functions are based on psychological plausibility, yet most of them lack an axiomatic foundation. Segal and Sobel (1999) take the opposite approach and ask what kind of axioms generate preferences that can reflect fairness and reciprocity. Their starting point is to assume that players have preferences over strategy profiles rather than over material allocations.

Consider a given two-player game and let Σ_i, i ∈ {1, 2}, denote the space of (mixed) strategies of player i. For any strategy profile (σ_1, σ_2) ∈ Σ_1 × Σ_2, let v_i(σ_1, σ_2) denote player i’s material payoff function, assuming that these “selfish preferences” satisfy the von Neumann–Morgenstern axioms. However, the actual preferences of player i are given by a preference relation ≽_{i,σ_j} over her own strategies. Note that this preference relation depends on the strategy chosen by player j. Segal and Sobel show that if the preference relation ≽_{i,σ_j} satisfies the independence axiom and if, for a given σ_j, player i prefers to get a higher material payoff for herself if the payoff of player j is held constant (self-interest), then the preferences ≽_{i,σ_j} over Σ_i can be represented by a utility function of the form20

\[
u_i(\sigma_i, \sigma_j) = v_i(\sigma_i, \sigma_j) + a_{i,\sigma_j}\, v_j(\sigma_i, \sigma_j).
\]

In standard game theory, a_{i,σ_j} ≡ 0. Positive values of this coefficient mean that player i has altruistic preferences; negative values of a_{i,σ_j} mean that she is spiteful. Note that the coefficient a_{i,σ_j} depends on σ_j. Therefore, whether a player is altruistic or spiteful may depend on the strategy chosen by her opponent, so there is scope to model reciprocity. To do so, Segal and Sobel introduce an additional axiom, called “reciprocal altruism.” This axiom requires that, when player j chooses a strategy σ_j that player i likes better than some other strategy σ_j′, then player i prefers strategies that give a higher payoff to player j. Segal and Sobel show that this axiom implies that the coefficient a_{i,σ_j} varies with σ_j such that (other things being equal) the coefficient increases if and only if player j chooses a “nicer” strategy.

The models of social preferences that we discussed at the beginning of this chapter (in particular the models of altruism, relative income, inequity aversion, quasi-maximin preferences, and altruism and spitefulness) can all be seen as special cases of a Segal–Sobel utility function. Segal and Sobel can also capture some, but not all, aspects of intention-based reciprocity. For example, in Rabin’s (1993) model, a player’s utility depends not only on the strategy chosen by her opponent, but also on why he has chosen this strategy. This can be illustrated in the “Battle of the Sexes” game. Player 1 may go to boxing because she expects player 2 to go to boxing, too (which is kind of player 2, given that he believes player 1 to go to boxing). Yet, she may also go to boxing because she expects player 2 to go to ballet (which is unkind of player 2 if he believes player 1 to go to boxing), and this is punished by the boxing strategy of player 1. This effect cannot be captured by Segal and Sobel, because in their framework preferences are defined on strategies only.

20 The construction resembles that of Harsanyi’s (1955) “utilitarian” social welfare function Σ α_i u_i. Note, however, that Harsanyi’s axiom of Pareto efficiency is stronger than the axiom of self-interest used here. Therefore, the a_{i,σ_j} in Segal and Sobel may be negative.

4. DISCRIMINATING BETWEEN THEORIES OF FAIRNESS

Most theories discussed in Section 3 have been developed during the last few years, and the evidence to discriminate between these theories is still limited. As we will show, however, the available data do exhibit some clear qualitative regularities that give a first indication of the advantages and disadvantages of the different theories.21

4.1. Who Are the Relevant Reference Actors?

All theories of fairness and reciprocity are based on the idea that actors compare themselves with a set of reference actors. To whom do people compare themselves? In bilateral interactions, there is no ambiguity about who the relevant reference actor is. In multiperson interactions, however, the answer is less clear. Most of the theories that are applicable in the N-person context assume that players make comparisons with all other N − 1 players in the game. The only exception is the theory of Bolton and Ockenfels (BO). They assume that players compare themselves only with the “average” player in the game and do not care about inequities between the other players. In this regard, the BO approach is inspired by the data of Selten and Ockenfels (1998) and Güth and van Damme (1998), which seem to suggest that actors do not care for inequities among the other reference agents. It would greatly simplify matters if this aspect of the BO theory were correct.

21 This section rests to a large extent on joint work of one of the authors with Armin Falk and Urs Fischbacher (Falk, Fehr, and Fischbacher, 2000a, 2000b, henceforth FFF). In particular, the organization of this section according to the questions herein and many of the empirical results emerged from this joint project.


One problem with this aspect of the BO approach is that it renders the theory unable to explain the punishment pattern in the PGG with punishment. Remember that, in this experiment, the assignment of one punishment point reduces the income of the punished member by 3 points. The theory of BO predicts that punishing subjects are indifferent between punishing a free rider and punishing a cooperator. All that matters is whether punishment brings the income of the punishing subject closer to the average income in the group and, for this purpose, the punishment of a cooperator is equally good as the punishment of a defector. Yet, in contrast to this indifference prediction, the cooperators predominantly punish the defectors.

To further test the BO model, Fehr and Fischbacher (2000) conducted the following Third-Party Punishment Game. There are three players: A, B, and C. Player A is endowed with 100 experimental currency units and must decide how much of the 100 units to give to B, who has no endowment. Player B is just a dummy player and has no decision power. Player C has an endowment of 50 units and can spend this money on the punishment of A after he observes how much A gave to B. For any money unit player C spends on punishment, the payoff of player A is reduced by 3 units.22 Note that without punishment, player C is certain to get her fair share of the total surplus (50 of 150 units). Therefore, BO predict that C will never punish. In contrast to this prediction, player A is, however, punished a lot. The less player A gives to B, the more C punishes A. For example, if A gives nothing, his income is reduced by roughly 30 percent. This indicates that many players do care about inequities among other players.

Further support for this hypothesis comes from Charness and Rabin (2000), who offered player C the choice between payoff allocations (575, 575, 575) and (900, 300, 600). Because both allocations give player C the fair share of one-third of the surplus, BO predict that player C will choose the second allocation that gives him a higher absolute payoff. However, 54 percent of the subjects preferred the first allocation. Note that the self-interest hypothesis also predicts the second allocation, so one cannot conclude that the other 46 percent of the subjects have BO preferences. A recent paper by Zizzo and Oswald (2000) also strongly suggests that subjects care about the inequities among the set of reference agents.

It is important to note that theories in which fair-minded subjects have multiple reference agents do not necessarily imply that fair subjects take actions in favor of all other reference agents. To illustrate this, consider the following three-person UG (Güth and van Damme, 1998). In this game, there is a Proposer, a Responder who can reject or accept the proposal, and a passive Receiver who can do nothing but collect the amount of money allocated to him. The Proposer proposes an allocation (x_1, x_2, x_3), where x_1 is the Proposer’s payoff, x_2 the Responder’s payoff, and x_3 the Receiver’s payoff. If the Responder rejects, all three players get nothing; otherwise, the proposed allocation is implemented.

22 In the experimental instructions, the value-laden term “punishment” was not used. The punishment option of player C was described in neutral terms by telling subjects that player C could “assign points” to player A that reduced the incomes of A and C in the way described previously.
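The share arithmetic behind the BO predictions in the Third-Party Punishment Game and in the Charness and Rabin allocation choice can be checked with a few lines of Python. The sketch below uses exact fractions. The function names are hypothetical, and restricting punishment to levels that keep A’s payoff non-negative is an assumption made here for simplicity, not a rule taken from the experiment.

```python
from fractions import Fraction


def share(payoffs, i):
    """Player i's share of total income (1/N if total income is zero)."""
    total = sum(payoffs)
    if total == 0:
        return Fraction(1, len(payoffs))
    return Fraction(payoffs[i], total)


def third_party_payoffs(transfer, punishment):
    """Payoffs (A, B, C): A gives `transfer` to B, C spends `punishment` points."""
    a = 100 - transfer - 3 * punishment  # each point spent by C costs A three units
    b = transfer
    c = 50 - punishment                  # C pays one unit per punishment point
    return (a, b, c)


C = 2  # index of player C
# Without punishment, C receives exactly one-third of the surplus for any transfer.
print(share(third_party_payoffs(0, 0), C))
# Assumed range: punishment levels that leave A with a non-negative payoff.
for p in (1, 10, 33):
    payoffs = third_party_payoffs(0, p)
    print(p, payoffs, share(payoffs, C))

# Charness and Rabin's two allocations give C the same share.
print(share((575, 575, 575), C), share((900, 300, 600), C))
```

Any positive punishment lowers C’s own payoff and moves her share away from one-third, which is why a player who cares only about her own payoff and her relative share would not punish under these assumptions.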


In this game, the Proposer allocates substantial fractions of the surplus to the Responder, but little or nothing to the Receiver. Moreover, G¨uth and van Damme (p. 230) report that, “there is not a single rejection that can clearly be attributed to a low share for the dummy (i.e., the Receiver, FS).” BO take this as evidence in favor of their approach because the Proposer and the Responder apparently do not take the Receiver’s interest into account. However, this conclusion is premature, because it is easy to show that approaches with multiple reference agents are fully consistent with the G¨uth and van Damme data. The point can be demonstrated in the context of the Fehr–Schmidt model. Assume for simplicity that the Proposer makes an offer of x1 = x2 = x, whereas the Receiver gets x3 < x. It is easy to show that a Responder with FS preferences will never (!) reject such an allocation, even if x3 = 0 and even if he is very fair-minded (i.e., has a high β-coefﬁcient). To see this, note that the utility of the Responder if he accepts is given by U2 = x − (β/2)(x − x3 ), which is positive for all β ≤ 1 and thus higher than the rejection payoff of zero. A similar calculation shows that it takes implausibly high β-values to induce a Proposer to take the interests of the Receiver into account.23 4.2.

Equality Versus Efﬁciency

Many models of fairness are based on the definition of a fair or equitable outcome to which people compare the available payoff allocations. In experimental games, a natural first approximation for the relevant reference outcome is the equality of material payoffs. The quasi-maximin theory of Charness and Rabin assumes instead that subjects care for the total surplus accruing to the group. A natural way to study whether there are subjects who want to maximize the total surplus is to construct experiments in which the predictions of both theories of inequality aversion (BO and FS) are in conflict with surplus maximization. This has been done by Bolle and Kritikos (1998), Andreoni and Miller (2000), Andreoni and Vesterlund (2001), Charness and Rabin (2000), Cox (2000), and Güth, Kliemt, and Ockenfels (2000). Except for the Güth et al. paper, these papers indicate that, in DG situations, a nonnegligible fraction of the subjects is willing to give up some of their own money to increase total surplus, even if this implies that they generate inequality that is to their disadvantage. Andreoni and Miller and Andreoni and Vesterlund, for example, conducted DGs with varying prices for transferring money to the Receiver. In some conditions, the Allocator had to give up less than a dollar to give the Receiver a dollar; in some conditions, the exchange ratio was 1:1; and in some other conditions, the Allocator had to give up more than one dollar. In the usual DGs the exchange

23 The Proposer’s utility is given by $U_1 = x_1 - (\beta/2)[(x_1 - x_2) + (x_1 - x_3)]$. If we normalize the surplus to one and take into account that $x_1 + x_2 + x_3 = 1$, we obtain $U_1 = (\beta/2) + (3/2)x_1[(2/3) - \beta]$; thus, the marginal utility of $x_1$ is positive unless β exceeds 2/3. This means that Proposers with β < 2/3 will give the Responders just enough to prevent rejection. Because the Responders neglect the interests of the Receivers, nothing is given to the Receivers.

Fairness and Reciprocity


ratio is 1:1, and there are virtually no cases in which an Allocator transfers more than 50 percent of the surplus. In contrast, in DGs with an exchange ratio of 1:3 (or 1:2), a nonnegligible number of subjects make transfers such that they end up with less money than the Receiver. This contradicts BO, FS, and Falk and Fischbacher because in these models fair subjects never take actions that give the other party more than they get. It is, however, consistent with altruistic preferences or quasi-maximin preferences. What is the relative importance of this kind of behavior? Andreoni and Vesterlund are able to classify subjects in three distinct classes. They report that 44 percent of their subjects (N = 141) are completely selfish, 35 percent exhibit egalitarian preferences (i.e., they tend to equalize payoffs), and 21 percent of the subjects can be classified as surplus maximizers. Charness and Rabin report similar results with regard to the fraction of egalitarian subjects in a simple DG, where the Allocator had to choose between (own, other) allocations of (400, 400) and (400, 750). Thirty-one percent of the subjects preferred the egalitarian and 69 percent the surplus-maximizing allocation. Among the 69 percent, there may, however, also be many selfish subjects who no longer choose the surplus-maximizing allocation when this decreases their payoff only slightly. This is suggested by the DG where the Allocator had to choose between (400, 400) and (375, 750). Here, only 49 percent of surplus-maximizing choices were observed. Charness and Rabin also present questionnaire evidence indicating that, when the income disparities are greater, the egalitarian motive gains weight at the cost of the surplus maximization motive. When the Allocator faces a choice between (400, 400) and (400, 2,000), 62 percent prefer the egalitarian allocation. The evidence cited in the described papers indicates that surplus maximization is a relevant motive in DGs.
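The conflict between inequity aversion and surplus maximization in the Charness–Rabin choices can be made concrete with a small sketch. It assumes the two-player Fehr–Schmidt utility with illustrative parameter values (alpha = 0.5 and beta = 0.25 are hypothetical, not estimates from any of the cited studies), and it models a surplus maximizer simply as someone who picks the allocation with the largest sum of payoffs.

```python
def fs_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility: own payoff minus inequity penalties."""
    envy = max(other - own, 0)    # disadvantageous inequality
    guilt = max(own - other, 0)   # advantageous inequality
    return own - alpha * envy - beta * guilt


def total_surplus(option):
    """Sum of the (own, other) payoffs in an allocation."""
    own, other = option
    return own + other


def choose(options, utility):
    """Return the option with the highest utility (the first one wins ties)."""
    return max(options, key=utility)


# (own, other) allocations taken from the text.
menus = {
    "costless": [(400, 400), (400, 750)],
    "costly": [(400, 400), (375, 750)],
}

for name, options in menus.items():
    # alpha and beta below are hypothetical illustration values.
    fair = choose(options, lambda o: fs_utility(o[0], o[1], alpha=0.5, beta=0.25))
    efficient = choose(options, total_surplus)
    print(name, "| inequity-averse:", fair, "| surplus maximizer:", efficient)
```

Under these assumptions the inequity-averse Allocator picks (400, 400) in both menus, whereas the surplus maximizer picks the unequal allocation in both. A purely selfish Allocator (alpha = beta = 0) is indifferent in the first menu and strictly prefers (400, 400) in the second, which is why the second menu helps to separate selfish choices from genuine surplus maximization.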
This motive has not been included in the prevailing models of inequity aversion, but it would be straightforward to do this. It should also be remembered that any positive transfer in DGs is incompatible with intention-based reciprocity models, irrespective of the exchange rate. We would like to stress, however, that the DG is different from many economically important games and real-life situations, because in economic interactions it is rarely the case that one player is at the complete mercy of another player. It may well be that, in situations where both players have some power to affect the outcome, the surplus maximization motive is less important than in DGs. The gift exchange experiments by Fehr et al. (1993, 1998) are telling in this regard because they embed a situation that is like a DG into an environment with competitive and strategic elements. These experiments exhibit a competitive element because the GEG is embedded into a competitive experimental market. The experiments also exhibit a strategic element because the Proposers are wage setters and have to take into account the likely effort responses of the Responders. Yet, once the Responder has accepted a wage offer, the experiments are similar to a DG because, for a given wage, the Responder essentially determines the income distribution and the total surplus by his choice of the effort level. The gift exchange experiments


Fehr and Schmidt

are an ideal environment to check the robustness of the surplus maximization motive, because an increase in the effort cost by one unit increases, on average, the total surplus by ﬁve units. Therefore, the maximal feasible effort level is, in general, also the surplus-maximizing effort level. If surplus maximization is a robust motive capable of overturning inequity aversion, one would expect that many Responders choose effort levels that give the Proposer a higher monetary payoff than the Responder.24 Moreover, surplus maximization also means that we should not observe a positive correlation between effort and wages because, for a given wage, the maximum feasible effort always maximizes the total surplus.25 However, neither of these implications is supported by the data. Effort levels that give the Proposer a higher payoff than the Responder are virtually nonexistent. In the overwhelming majority of the cases effort is substantially below the maximally feasible level, and in less than 2 percent of the cases the Proposer earns a higher payoff than the Responder.26 Moreover, almost all subjects who regularly chose nonminimal effort levels exhibited a reciprocal effort–wage relation. These numbers are in sharp contrast to the 49 percent of the Allocators in Charness and Rabin who preferred the (375, 750) allocation over the (400, 400) allocation. One reason for the difference across studies is perhaps the fact that it was much cheaper to increase the surplus in the Charness–Rabin example. Although the surplus increases in the gift exchange experiments on average by ﬁve units, if the Responder sacriﬁces one payoff unit, the surplus increases by 14 units per payoff unit sacriﬁced in the Charness–Rabin case. This suggests that surplus maximization gives rise to a violation of the equality constraint only if surplus increases are extremely cheap. 
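The cost comparison for the Charness–Rabin example can be checked with a few lines of arithmetic. The sketch below uses only the (400, 400) and (375, 750) allocations quoted in the text; the function name is hypothetical. It reports two ratios, because the gain can be measured either as the other player's gain or as the gain in joint payoff net of the Allocator's own loss.

```python
def gains_per_unit_sacrificed(baseline, alternative):
    """Return (other player's gain, joint-surplus gain) per unit of own payoff given up.

    Allocations are (own, other) pairs.
    """
    own_cost = baseline[0] - alternative[0]
    if own_cost <= 0:
        raise ValueError("the alternative must cost the decision maker something")
    other_gain = alternative[1] - baseline[1]
    surplus_gain = sum(alternative) - sum(baseline)
    return other_gain / own_cost, surplus_gain / own_cost


other_rate, surplus_rate = gains_per_unit_sacrificed((400, 400), (375, 750))
print(other_rate)    # gain of the other player per unit sacrificed
print(surplus_rate)  # gain in joint payoff per unit sacrificed
```

The figure of 14 corresponds to the other player's gain per unit sacrificed (350/25); net of the Allocator's own loss, the joint payoff rises by 13 units per unit sacrificed (325/25). On either measure, raising the surplus is much cheaper here than the average of five units in the gift exchange experiments.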
A second reason for the behavioral difference may be that, when both players have some power to affect the outcome, the motive to increase the surplus is quickly crowded out by other considerations. This reason is quite plausible, insofar as the outcomes in DGs themselves are notoriously nonrobust. Although the experimental results on UGs, GEGs, or PGGs are fairly robust, the DG seems to be a rather fragile situation in which minor factors can have large effects. Cox (2000), for example, reports that, in his DGs, 100 percent of all subjects transferred positive amounts.27 This result contrasts sharply with many other games, including the games in Charness and Rabin and many other DGs. To indicate the other extreme, Hoffman, McCabe, Shachat, and Smith

24 The Responder’s effort level may, of course, also be affected by the intentions of the Proposer. For example, paying a high wage may signal fair intentions that may increase the effort level. Yet, because this tends to raise effort levels, we would have even stronger evidence against the surplus maximization hypothesis, if we observe little or no effort choices that give the Proposer a higher payoff than the Responder.
25 There are degenerate cases in which this is not true.
26 The total number of effort choices is N = 480 in these experiments (i.e., the results are not an artifact of a low number of observations).
27 In Cox’s experiment, both players had an endowment of 10 and the Allocator could transfer his endowment to the Receiver where the transferred amount was tripled by the experimenter.


(1994), Eichenberger and Oberholzer-Gee (1998), and List and Cherry (2000) report on DGs with extremely low transfers.28 Likewise, in the Impunity Game of Bolton and Zwick (1995), which is very close but not identical to a DG, the vast majority of Proposers did not shy away from making very unfair offers. The Impunity Game differs from the DG only insofar as the Responder can reject an offer; however, the rejection destroys only the Responder’s, but not the Proposer’s, payoff. The notorious nonrobustness of outcomes in situations resembling the DG indicates that one should be very careful in generalizing the results found in these situations to other games. Testing theories of social preferences in DGs is a bit like testing the law of gravity with a table tennis ball. In both situations, minor unobserved distortions can have large effects. Therefore, we believe that it is necessary to show that the same motivational forces that are inferred from DGs are also behaviorally relevant in economically more important games. One way to do this is to apply the theories that have been constructed on the basis of DG experiments to predict outcomes in other games. With the exception of Andreoni and Miller (2000), this has not yet been done. Andreoni and Miller (2000) estimate utility functions based on the results of their DG experiments and use them to predict cooperation behavior in a standard PGG. They predict behavior in period 1 of these games, in which cooperation is often quite high, rather well. However, their predictions are far away from final period outcomes, where cooperation is typically very low. In our view, the low cooperation rates in the final period of repeated PGGs constitute a strong challenge for models that rely exclusively on altruistic or surplus-maximizing preferences. Why should a subject with a stable preference for the payoff of others or the payoff of the whole group contribute much less in the final period, compared with the first period?
Models of inequity aversion and intention-based or type-based reciprocity models provide a plausible explanation for this behavior. All of these models predict that fair subjects make their cooperation contingent on the cooperation of others. Thus, if the fair subjects realize that there are sufficiently many selfish decisions in the course of a PGG experiment, they cease to cooperate as well.

4.3. Revenge Versus Inequity Reduction

Subjects with altruistic and quasi-maximin preferences do not take actions that reduce other subjects’ payoffs. Yet, this is frequently observed in many important games. Models of inequity aversion account for this by assuming that the payoff reduction is motivated by a desire to reduce disadvantageous inequality. In intention-based reciprocity models and in Levine (1998), subjects punish

28 In Eichenberger and Oberholzer-Gee (1998), almost 90 percent of the subjects gave nothing. In Hoffman et al. (1994), 64 percent gave nothing, and 19 percent gave between 1 percent and 10 percent. In List and Cherry, subjects earned their endowment in a quiz. Then they played the DG. Roughly 90 percent of the Allocators transferred nothing to the Receivers.


if they observe an action that is perceived to be unfair or that reveals that the opponent is spiteful. In these models, players want to reduce the opponent’s payoff irrespective of whether they are better or worse off than the opponent, and irrespective of whether they can change income shares or income differences. Furthermore, intention-based theories predict that, in games in which no intention can be expressed, there will be no punishment. Therefore, a clean way to test for the relevance of intentions is to conduct control treatments in which choices are made through a random device or through some neutral and disinterested third party. Blount (1995) was the ﬁrst who applied this idea to the UG. Blount compared the rejection rate in the usual UG to the rejection rates in UGs in which either a computer generated a random offer or a third party made the offer. Because, in the random offer condition and the third-party condition, a low offer cannot be attributed to the greedy intentions of the Proposer, intention-based theories predict a rejection rate of zero in these conditions, whereas theories of inequity aversion still allow for positive rejection rates. Levine’s theory is also consistent with positive rejection rates in these conditions, but his theory predicts a decrease in the rejection rate relative to the usual condition, because low offers made by humans reveal that the type who made the offer is spiteful, which can trigger a spiteful response. Blount, indeed, observes a signiﬁcant and substantial reduction in the acceptance thresholds of the Responders in the random offer condition, but not in the third-party condition. Thus, the result of the random offer condition is consistent with intention- and type-based models, whereas the result of the third-party condition is inconsistent with the motives captured by these models. 
Yet, these puzzling results may stem from some problematic features in Blount’s experiments.29 Subsequently, Offerman (1999) and FFF (2000b) conducted further experiments with computerized offers, but without the other worrisome features in Blount. In particular, in these experiments, the Responders knew that a rejection affects the payoff of a real, human “Proposer.” Offerman finds that subjects are 67 percent more likely to reduce the opponent’s payoff when the opponent made an intentionally hurtful choice, compared with a situation in which a computer made the hurtful choice. FFF (2000b) conducted an experiment (invented by Abbink, Irlenbusch, and Renner 2000) that simultaneously allows for the examination of positive and negative reciprocity. In this game, player A can give player B any integer amount of money g ∈ [0, 6] or, alternatively, she can take away from player B any integer amount of money t ∈ [1, 6]. In case of g > 0, the experimenter triples g so that player B receives 3g. If player A takes away t, player A gets

29 Blount’s results may be affected by the fact that subjects (in two of three treatments) had to make decisions as a Proposer and as a Responder before they knew their actual roles. After subjects had made their decisions in both roles, the role for which they received payments was determined randomly. In one of Blount’s treatments, deception was involved. Subjects believed that there were Proposers, although in fact the experimenters made the proposals. All subjects in this condition were “randomly” assigned to the Responder role. In this treatment, subjects also were not paid according to their decisions, but they received a flat fee instead.


t and player B loses t. After player B observes g or t, she can pay player A an integer reward r ∈ [0, 18] or she can reduce player A’s income by making an investment i ∈ [1, 6]. A reward transfers one money unit from player B to player A. An investment i costs player B exactly i, but reduces player A’s income by 3i. This game was played in a random choice condition and in a human choice condition. It turns out that when the choices are made by a human player A, player B invests significantly more into payoff reductions for all t ∈ [1, 6]. However, as in Blount and Offerman, payoff reductions also occur when the computer makes a hurtful choice. Kagel, Kim, and Moser (1996) provide further support that intentions play a role for payoff-reducing behavior. In their experiments, subjects bargained over 100 chips in a UG. They conducted several treatments that varied the money value of the chips and the information provided about the money value. For example, in one treatment, the Proposers received three times more money per chip than the Responders (i.e., the equal money split requires that the Responders receive 75 chips). If the Responders know that the Proposers know the different money values of the chips, they reject unequal money splits much more frequently than if the Responders know that the Proposers do not know the different money values of the chips. Thus, knowingly unequal proposals were rejected at higher rates than unintentionally unequal proposals. Another way to test for the relevance of intention-based or type-based punishments is to examine situations in which the subjects cannot increase their relative share or decrease payoff differences. FFF (2000a) report the results of UGs and PGGs with punishment that have this feature. In the first (standard) treatment of the UG, the Proposers could propose a (5, 5) or an (8, 2) split of the surplus (the first number represents the Proposer’s payoff). In case of rejection, both players received zero.
In the second treatment, the Proposers had the same options, but a rejection now meant that the payoff was reduced for both players by two units, so that a rejection leaves the payoff difference unchanged and lowers the Responder’s relative share. The BO model, as well as the FS model, therefore predicts that there will be no rejections in the second treatment, whereas intention-based and type-based models predict that punishments will occur. It turns out that the rejection rate of the (8, 2) offer is 56 percent in the first and 19 percent in the second treatment. Thus, roughly one-third (19/57) of the rejections are consistent with a pure taste for punishment, as conceptualized in intention- and type-based models.30 FFF (2000a) also report the results of PGGs with punishment in which the punishing subjects could not change the payoff difference between themselves and the punished subject. In one of their treatments, subjects had to pay one money unit to reduce the payoff of another group member by one unit. Thus, BO and FS both predict that there will be no punishment at all in this condition.

30 Ahlert, Crüger, and Güth (1999) also report a significant amount of punishment in UGs, in which the Responders cannot change the payoff difference. However, because they do not have a control treatment, it is not possible to say something about the relative importance of this kind of punishment.
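The prediction of no rejections in the modified UG can be verified directly for the FS model. The following sketch assumes the two-player Fehr–Schmidt utility; the function names and the grid of alpha values are hypothetical and serve only as an illustration. Payoff pairs are written as (Proposer, Responder), as in the text.

```python
def fs_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility: own payoff minus inequity penalties."""
    envy = max(other - own, 0)    # disadvantageous inequality
    guilt = max(own - other, 0)   # advantageous inequality
    return own - alpha * envy - beta * guilt


def responder_rejects(offer, rejection_outcome, alpha, beta):
    """True if the Responder strictly prefers the rejection outcome to the offer.

    Both arguments are (Proposer payoff, Responder payoff) pairs.
    """
    accept_u = fs_utility(offer[1], offer[0], alpha, beta)
    reject_u = fs_utility(rejection_outcome[1], rejection_outcome[0], alpha, beta)
    return reject_u > accept_u


offer = (8, 2)
standard = (0, 0)           # first treatment: rejection wipes out both payoffs
reduced = (8 - 2, 2 - 2)    # second treatment: both payoffs fall by two units

# Hypothetical envy parameters; beta plays no role for the Responder here.
for alpha in (0.0, 0.5, 1.0, 4.0):
    print(alpha,
          responder_rejects(offer, standard, alpha, beta=0.25),
          responder_rejects(offer, reduced, alpha, beta=0.25))
```

In the standard treatment, the Responder rejects whenever alpha exceeds 1/3, because accepting yields 2 − 6·alpha and rejecting yields zero. In the second treatment, rejecting yields −6·alpha, which is always two units below the payoff from accepting, so no value of alpha generates a rejection. The same logic applies to the PGG treatment in which punishment costs one unit per unit of payoff reduction.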


In a second treatment, investing one unit into punishment reduced the payoff of the punished group member by three units. In the first treatment, FFF report that 51 percent of all subjects (N = 93) cooperate, which is still compatible with both BO and FS. However, 51 percent of these cooperators punish the defectors. They invest, on average, 4.8 money units into punishment. Thus, 25 percent of the subjects punish free-riding, which is incompatible with BO and FS. To evaluate the relative importance of this amount of punishment, we have to compare these results with the results of the second condition. In the second condition, 61 percent of all subjects (N = 120) cooperate, and 59 percent of them punish the defectors (by imposing a punishment of 5.7 on average). Thus, the overall percentage of subjects who punish the defectors in the second condition is 36 percent. This suggests that a rather large fraction (i.e., 25/36) of the overall amount of punishment is not consistent with BO and FS. Taken together, the evidence from Blount (1995), Offerman (1999), and FFF (2000b) indicates that the motive to punish unfair intentions or unfair types plays an important role. Although the evidence provided by the initial study of Blount was mixed, the subsequent studies indicate a clear role of these motives. However, the evidence also suggests that inequity aversion plays an additional, nonnegligible role. The evidence from the experiments in FFF (2000a) suggests that many subjects who reduce the payoff of other players do not have the desire to change the equitability of the payoff allocation. Instead, a large fraction of these subjects seems to be driven by the desire to punish (i.e., a desire to hurt the other player). It is worthwhile to point out that this desire to hurt the other players, although consistent with intention- and type-based models of reciprocity, does not necessarily constitute evidence in favor of these models. The reason is that the desire to reduce the payoff of other players may also be triggered by an unfair payoff allocation per se.31

4.4. Does Kindness Trigger Rewards?

Do intention- and type-based theories of fairness do equally well in the domain of rewarding behavior? Evidence in this domain is much more mixed. Some experimental results suggest that rewarding behavior is almost unaffected by these motives. Other results indicate some minor role, and only one paper finds an unambiguous positive effect of intention- or type-based reciprocity.

31 Assume that fair subjects have the following utility function: $u_i = x_i + \alpha_i \frac{1}{n-1} \sum_{j \neq i} \beta(x_i - x_j)\, v(x_j)$, where $\alpha_i$ measures the strength of player i’s nonpecuniary preference, and $v(x_j)$ is an increasing function of player j’s material payoff. $\beta(x_i - x_j)$ is positive if $x_i - x_j > 0$ and negative if $x_i - x_j < 0$. Thus, a state of inequality triggers the desire to reduce or increase the other players’ payoff. In this regard, the utility function is similar to the preference assumption in FS. Yet, in contrast to FS, the aim of player i is no longer the reduction of the payoff difference. Instead, player i just wants to reduce or increase the other player’s payoff, depending on the sign of β.
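The utility function in the footnote above can be sketched in code to show how an unfair allocation per se may trigger payoff reductions, even when the payoff difference cannot be changed. The functional forms are assumptions made only for this illustration: the weight β(·) is taken to be the sign of the payoff difference, and v(·) is taken to be the identity. The function name and the alpha values are likewise hypothetical.

```python
def sign(d):
    """Return +1, 0, or -1 according to the sign of d."""
    return (d > 0) - (d < 0)


def payoff_targeting_utility(payoffs, i, alpha, weight=sign, v=lambda x: x):
    """Utility of player i: own payoff plus a weighted valuation of others' payoffs.

    `weight` stands in for beta(x_i - x_j) and `v` for the increasing function
    v(x_j); both defaults are hypothetical choices for illustration.
    """
    n = len(payoffs)
    own = payoffs[i]
    social = sum(weight(own - x_j) * v(x_j)
                 for j, x_j in enumerate(payoffs) if j != i)
    return own + alpha * social / (n - 1)


# Responder (index 1) in the UG where a rejection lowers both payoffs by two units.
accept = (8, 2)
reject = (6, 0)
for alpha in (0.5, 1.5):
    u_accept = payoff_targeting_utility(accept, 1, alpha)
    u_reject = payoff_targeting_utility(reject, 1, alpha)
    print(alpha, u_accept, u_reject, u_reject > u_accept)
```

With these assumed forms, a Responder with alpha = 1.5 prefers to reject the (8, 2) offer although the rejection leaves the payoff difference at six units, whereas a Responder with alpha = 0.5 accepts. An FS Responder would never reject in this situation, which is the contrast drawn in the footnote.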


Intention-based theories predict that people are generous only if they have been treated kindly (i.e., if the first mover has signaled a fair intention). Levine’s theory is similar in this regard, because generous actions are more likely if the first mover reveals that she is an altruistic type. However, in contrast to the intention-based approaches, Levine’s approach is also compatible with unconditional giving, if it is sufficiently surplus-enhancing. Neither intention- nor type-based reciprocity can explain positive transfers in the DG. Moreover, Charness (1996), Bolton, Brandts, and Ockenfels (1998), Offerman (1999), Cox (2000), and Charness and Rabin (2000) provide further evidence that intentions do not play a big role for rewarding behavior. Charness (1996) conducted GEGs in a random choice condition and a human choice condition. Intention-based theories predict that, in the random choice condition, the Responders will not put forward more than the minimal effort level irrespective of the wage level, because high wage offers are due to chance and not to kind intentions. In the human choice condition, higher wages indicate a higher degree of kindness and, therefore, a positive correlation between wages and effort is predicted. Levine’s theory allows, in principle, for a positive correlation between wages and effort in both conditions, because an increase in effort benefits the Proposer much more than it costs the Responder. However, the correlation should be much stronger in the human choice condition because of the type-revealing effect of high wages. Charness finds a significantly positive correlation in the random choice condition. In the human choice condition, effort is only slightly lower at low wages and equally high at high wages. This indicates, if anything, only a minor role for intention- and type-driven behavior. The best interpretation is probably that inequity aversion or quasi-maximin preferences induce nonminimal effort levels in this setting.
In addition, negative reciprocity kicks in at low wages, which explains the lower effort levels in the human choice condition. Cox (2000) tries to isolate rewarding responses in the context of a TG by using a related DG as a control condition. In the TG, Cox observes a baseline level of Responder transfers back to the Proposer. To isolate the relevance of intention-driven responses, he conducts a DG in which the distribution of endowments is identical to the distribution of material payoffs after the Proposers’ choices in the TG. Thus, both in the TG and in the DG, the Responders face exactly the same distributions of material payoffs; but, in the TG, this distribution has been caused intentionally by the Proposers, whereas in the DG the distribution is predetermined by the experimenter. In Cox’s DG, the motive of rewarding kindness can, therefore, play no role, and intention-based theories, as well as Levine’s theory, predict that Responders transfer nothing back. If one takes into account that some transfers in the DG are driven by inequity aversion or quasi-maximin preferences, the difference between the transfers in the DG and the transfers in the TG measures the relevance of intention- or type-based theories. Cox’s results indicate that these theories play only a minor or no role in this context. In one condition, there is no difference in transfers between the TG and the DG, and, in another condition, transfers in the DG are lower by only one-third.


The strongest evidence against the role of intentions comes from Bolton, Brandts, and Ockenfels (1998). They conducted sequential social dilemma experiments that are akin to a sequentially played prisoner’s dilemma. In one condition, the first movers could make a kind choice relative to a baseline choice. The kind choice implied that, for any choice of the second mover, the payoff of the second mover increased by 400 units at a cost of 100 for the first mover. Then, the second mover could take costly actions to reward the first mover. In a control condition, the first mover could make only the baseline choice (i.e., he could not express any kind intentions). Second movers reward the first movers even more in this control condition. Although this difference is not significant, the results clearly suggest that intention-driven rewards play no role in this experiment. The strongest evidence in favor of intentions comes from the moonlighting game of FFF (2000b) described in the previous subsection. FFF find that, for all positive transfers of player A, player B sends back significantly more money in the human choice condition. Moreover, the difference between the rewards in the human choice condition and the random choice condition is also quantitatively important. A recent paper by McCabe, Rigdon, and Smith (2000) also reports evidence in favor of intention-driven positive reciprocity. They show that, after a nice choice of the first mover, two-thirds of the second movers make nice choices, too; whereas if the first mover is forced to make the nice choice, only one-third of the second movers make the nice choice. In the absence of the evidence provided by FFF and McCabe et al., one would have to conclude that the motive to reward good intentions or fair types is (at best) of minor importance. However, in view of the relatively strong results in the final two papers, it seems wise to be more cautious and to wait for further evidence. Nevertheless, the bulk of the evidence suggests that inequity aversion and efficiency-seeking are more important than intention- or type-based reciprocity in the domain of kind behavior.

4.5. Summary and Outlook

Although most fairness models discussed in Section 3 are just a few years old, the discussion in this section shows that there is already a fair amount of evidence that sheds light on the relative performance of the different models. This indicates a quick and healthy interaction between experimental research and the development of new theories. The initial experimental results discussed in Section 2 gave rise to a number of new theories, which, in turn, have again been quickly subjected to careful and rigorous empirical testing. Although these tests have not yet led to conclusive results regarding the relative importance of the different motives, many important and interesting insights have been obtained. In our view, the main results can be summarized as follows:

1. Evidence from the Third-Party Punishment Game and the PGG with punishment indicates that many subjects do compare themselves with other people in the group and not just to the group as a whole or to the group average.
2. There is a nonnegligible number of subjects in DGs whose behavior is consistent with surplus maximization. However, the relative quantitative importance of this motive in economically relevant settings has yet to be determined, and surplus maximization alone cannot account for many robust regularities in other games.
3. Pure revenge, as captured by reciprocity models, is an important motive for payoff-reducing behavior. In some games, like the PGG with punishment, it seems to be the dominant source of payoff-reducing behavior. Because pure equity models do not capture this motive, they cannot explain a significant amount of payoff-reducing behavior.
4. In the domain of kind behavior, the motives captured by intention- or type-based models of fairness seem to be less important than in the domain of payoff-reducing behavior. Several studies indicate that inequity aversion or quasi-maximin preferences play a more important role here.

Which model of fairness does best in the light of the data, and which one should be used in applications to economically important phenomena? We believe that it is too early to give a conclusive answer to these questions. There is a large amount of heterogeneity at the individual level, and any model of fairness has difficulties in explaining the full diversity of the experimental observations. The evidence suggests, however, some tentative answers to these questions. In our view, the most important heterogeneity is the one between purely selfish subjects and fair-minded subjects. The success of the BO model and the FS model in explaining a large variety of data from bargaining, cooperation, and market games is partly due to this recognition. Within the class of these equity models, the evidence suggests that the FS model does better.
In particular, the experiments discussed in Section 4.1 indicate that people do not compare themselves with the group as a whole, but rather with other individuals in the group. The group average is less compelling as a yardstick to measure equity than differences in individual payoffs. However, the FS model clearly does not recognize the full heterogeneity within the class of fair-minded individuals. Section 4.3 makes it clear that an important part of payoff-reducing behavior is not driven by the desire to reduce payoff differences, but by the desire to reduce the payoff of those who take unfair actions or reveal themselves as unfair types. The model therefore underestimates the amount of punishing behavior in situations where the cost of punishment is relatively high, compared with the payoff reductions that can be achieved by punishing. Fairness models that are exclusively based on intentions (Rabin 1993; Dufwenberg and Kirchsteiger 1998) can, in principle, account for this type of punishment. Yet, these models have other undesirable features, including multiple, and very counterintuitive, equilibria in many games and a very high degree of complexity that stems from the use of psychological game theory. The same has to be said about the intention-based theory of Charness and Rabin (2000). Falk and Fischbacher (1999) are not plagued by the multiple equilibrium problem as much as the pure intention models. This is because they incorporate equity as a global reference standard. Their model shares, however, the complexity costs of psychological game theory.

Even though none of the available theories can take into account the full complexity of motives at the individual level, some theories may allow for better approximations than others. The evidence presented in Section 2 shows clearly that there are many important economic problems for which the self-interest theory is unambiguously, and in a quantitatively important way, refuted. The recent papers by BO and FS show that one can account for the bulk of this evidence by models that explicitly take into account that there are selfish and fair-minded individuals. Although we believe that it is desirable to tackle the heterogeneity within the class of fair-minded subjects in parsimonious and tractable models, we also believe that the heterogeneity between selfish and fair types is more important. In fact, in the following section, we will show that the FS model provides surprisingly good qualitative and quantitative predictions in important economic domains. Thus, even if we do not yet have a fully satisfactory model of fair behavior, one can probably go a long way with simple models that take into account the interaction between selfish and fair types.

5. ECONOMIC APPLICATIONS

5.1. Competition and Fairness – When Does Fairness Matter?

The self-interest model fails to explain the experimental evidence in many games in which only a few players interact, but it is very successful in explaining the outcome of competitive markets. It is a well-established experimental fact that, in a broad class of market games, prices converge to the competitive equilibrium.[32] This result holds even if the resulting allocation is very unfair by any notion of fairness. Thus, the question arises: If so many people resist unfair outcomes in, say, the UG, why don’t they behave the same way when there is competition among the players? To answer this question, consider the following UG with Proposer competition, which was conducted by Roth, Prasnikar, Okuno-Fujiwara, and Zamir (1991) in four different countries. There are $n - 1$ Proposers who simultaneously offer a share $s_i \in [0, 1]$, $i \in \{1, \ldots, n - 1\}$, to one Responder. The Responder can either accept or reject the highest offer $s^{\max} = \max_i \{s_i\}$. If there are several Proposers who offered $s^{\max}$, one of them is selected at random with equal probability. If the Responder accepts $s^{\max}$, her monetary payoff is $s^{\max}$ and the successful Proposer earns $1 - s^{\max}$, whereas all the other Proposers get 0. If the Responder rejects, everybody gets a payoff of 0.

[32] See, e.g., Smith (1962) and Davis and Holt (1993).
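
The payoff rule of the UG with Proposer competition can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `proposer_competition_payoffs` and the `rng` parameter are hypothetical choices made here, and the total surplus is normalized to 1 as in the description above.

```python
import random


def proposer_competition_payoffs(offers, accept, rng=random):
    """Monetary payoffs in the ultimatum game with Proposer competition.

    offers: list of offered shares s_i in [0, 1], one per Proposer
    accept: True if the Responder accepts the highest offer
    rng:    source of randomness for tie-breaking (hypothetical parameter)

    Returns (responder_payoff, list_of_proposer_payoffs).
    """
    if not offers or any(not 0.0 <= s <= 1.0 for s in offers):
        raise ValueError("need at least one offer, each in [0, 1]")

    proposer_payoffs = [0.0] * len(offers)
    if not accept:
        # Rejection: everybody gets a payoff of 0.
        return 0.0, proposer_payoffs

    s_max = max(offers)
    # Several Proposers may have offered s_max; pick one with equal probability.
    tied = [i for i, s in enumerate(offers) if s == s_max]
    winner = rng.choice(tied)
    proposer_payoffs[winner] = 1.0 - s_max
    return s_max, proposer_payoffs


if __name__ == "__main__":
    print(proposer_competition_payoffs([0.2, 0.7, 0.5], accept=True))
    print(proposer_competition_payoffs([0.2, 0.7, 0.5], accept=False))
```

Only the Proposer with the highest offer can earn anything, which is the feature that drives the competition among Proposers.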


The prediction of the self-interest model is straightforward: All Proposers will offer $s = 1$, which is accepted by the Responder. Hence, all Proposers get a payoff of 0 and the monopolistic Responder captures the entire surplus. This outcome is clearly very unfair, but it describes precisely what happened in the experiments. After a few periods of adaptation, $s^{\max}$ was very close to 1, and all the surplus was captured by the Responder.[33] This result is remarkable. It does not seem to be more fair that one side of the market gets all of the surplus in this setting than in the standard UG. Why do the Proposers let the Responder get away with it? The reason is that, in this strategic setting, preferences for fairness or reciprocity cannot have any effect. To see this, suppose that each of the Proposers strongly dislikes getting less than the Responder. Consider Proposer $i$ and let $\bar{s} = \max_{j \neq i} \{s_j\}$ be the highest offer made by his fellow Proposers. If Proposer $i$ offers $s_i < \bar{s}$, then his offer has no effect and he will get a monetary payoff of 0 with certainty. Furthermore, he cannot prevent the Responder from getting $\bar{s}$ and one of the other Proposers from getting $1 - \bar{s}$; so, he will suffer from getting less than these two. However, if he offers a little bit more than $\bar{s}$, say $\bar{s} + \varepsilon$, then he will win the competition, get a positive monetary payoff, and reduce the inequality between himself and the Responder. Hence, he should try to overbid his competitors. This process drives the share that is offered by the Proposers up to 1. There is nothing the Proposers can do about it, even if all of them have a strong preference for fairness. We prove this result formally in Fehr and Schmidt (1999) for the case of inequity-averse players, but the same result is also predicted by the approaches of Levine (1998) and Bolton and Ockenfels (2000). Does this mean that sufficiently strong competition will always wipe out the impact of fairness?
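
Before turning to that question, the overbidding argument can be checked numerically. The following is a minimal sketch, assuming the inequity-aversion utility of Fehr and Schmidt (1999), in which a player's utility is his monetary payoff minus `alpha` times the average disadvantageous payoff difference and minus `beta` times the average advantageous one. The parameter values (`alpha = 2`, `beta = 0.6`), the helper names, and the simplification that exactly one rival Proposer offers the highest rival share are illustrative assumptions, not values taken from the experiments.

```python
def fs_utility(own, others, alpha, beta):
    """Inequity-aversion utility in the style of Fehr and Schmidt (1999).

    own:    the player's monetary payoff
    others: monetary payoffs of all other players
    alpha:  weight on disadvantageous inequality (others earn more)
    beta:   weight on advantageous inequality (others earn less)
    """
    n_others = len(others)
    envy = sum(max(x - own, 0.0) for x in others) / n_others
    guilt = sum(max(own - x, 0.0) for x in others) / n_others
    return own - alpha * envy - beta * guilt


def proposer_utility(s_i, s_bar, n_rivals, alpha, beta):
    """Utility of Proposer i when the Responder accepts the highest offer.

    Assumptions for illustration: exactly one rival offers s_bar, all other
    rivals offer less, and s_i differs from s_bar (ties are not covered).
    """
    if s_i == s_bar:
        raise ValueError("ties are not covered by this sketch")
    if s_i < s_bar:
        # Proposer i loses: the Responder gets s_bar, one rival gets
        # 1 - s_bar, and the remaining rivals get 0.
        own = 0.0
        others = [s_bar, 1.0 - s_bar] + [0.0] * (n_rivals - 1)
    else:
        # Proposer i wins: the Responder gets s_i, all rivals get 0.
        own = 1.0 - s_i
        others = [s_i] + [0.0] * n_rivals
    return fs_utility(own, others, alpha, beta)


if __name__ == "__main__":
    alpha, beta, eps = 2.0, 0.6, 0.01  # illustrative values only
    for s_bar in (0.2, 0.5, 0.9):
        under = proposer_utility(s_bar - eps, s_bar, 2, alpha, beta)
        over = proposer_utility(s_bar + eps, s_bar, 2, alpha, beta)
        print(f"s_bar={s_bar}: underbid {under:.3f}, overbid {over:.3f}")
```

For these illustrative values, overbidding gives the higher utility at each of the three rival offers, in line with the argument above: the losing Proposer earns nothing and still suffers from the inequality he cannot prevent.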
The answer to this question is negative, because fairness matters much more in market games in which the execution of contracts cannot be completely determined at the stage where the parties conclude the contracts. Labor markets are a good example. A labor contract is highly incomplete, because it cannot enforce the level of effort provided by the employee, who chooses his effort level after the contract has been signed. These contractual features are captured by the GEG in an experimental setting. When the GEG is embedded into a competitive experimental market [e.g., in Fehr et al. (1993, 1998)], wages are systematically higher than the competitive equilibrium wage predicted by the self-interest model. There is also no tendency for wages to decrease over time. The reason for this stable wage premium is the effort behavior of the Responders: On average, effort levels are increasing with wages, which provides an incentive for the firms to pay a wage premium. If, however, the effort level is fixed exogenously by the experimenter, the firms do not shy away from pushing down wages to the competitive level. FS and BO can explain this pattern in a straightforward manner. When effort is endogenous, inequity-averse Responders respond to high wages with high effort levels to prevent an unequal distribution of the surplus from trade. This induces all firms (including purely selfish ones) to pay a wage premium because it is profitable to do so. When effort is exogenous, this mechanism does not work, and competition drives down wages to the competitive level.

[33] The experiments were conducted in Israel, Japan, Slovenia, and the United States. In all experiments, there were nine Proposers and one Responder. Roth et al. also conducted the standard UG with one Proposer in these four countries. They did find some small (but statistically significant) differences between countries in the standard UG, which may be attributed to cultural differences. However, there are no statistically significant differences between countries for the UG with Proposer competition.

5.2. Endogenous Incomplete Contracts

If fairness concerns affect the behavior of economic agents in so many situations, then they should also be taken into account in the design of incentive schemes. Surprisingly, hardly any theoretical and very little empirical or experimental work has been done to study the impact of fairness on incentive provision. Standard contract theory neglects this issue and assumes that all agents are interested only in their own material payoffs. Over the past two decades, this theory has been highly successful in solving fairly complicated contractual problems and in designing very sophisticated mechanisms and incentive schemes. This gave rise to many important and fascinating insights, and the methods developed there have been applied in almost all areas of economics. However, standard contract theory still finds it difficult to explain the simplicity and incompleteness of many contracts that we observe in the real world. In particular, it cannot explain why the parties’ monetary payoffs are often not tied to measures of performance that would be available at a relatively small cost. For example, the salary of a teacher or a university professor is rarely contingent on students’ test scores, teaching ratings, or citations. These performance measures are readily available and easily verifiable, so one has to conclude that these contracts are deliberately left incomplete.[34]

[34] The literature on incomplete contracts acknowledges contractual incompleteness, but most of this literature simply assumes that no long-term contingent contracts are feasible and does not attempt to explain this premise. See, for example, Grossman and Hart (1986) or Hart and Moore (1990) and Section 5.3. There is a small literature on endogenous incomplete contracts. Some papers in this literature [e.g., Aghion, Dewatripont, and Rey (1994), Nöldeke and Schmidt (1995), or Edlin and Reichelstein (1996)] show that, in some situations, a properly designed incomplete contract can implement the first best, so there is no need to write a more complete contract. Some other papers [e.g., Che and Hausch (1999), Hart and Moore (1999), and Segal (1999)] show that, although an incomplete contract does not implement the first best, a more complete contract is of no value to the parties because it is impossible to get closer to the efficiency frontier.

In a recent paper, Fehr, Klein, and Schmidt (2000) take a fresh look at contractual incompleteness by taking concerns for fairness and reciprocity into account. They report on several simple principal–agent experiments in which the principal was given a choice whether to offer a “complete” contract or a less complete one. In the first experimental design, an agent had to pick an effort level between 1 and 10 (at a monetary cost to herself) that is perfectly observed by a principal and can be verified (at a small fixed cost) to the courts. The principal can try to induce the agent to spend effort by imposing a fine on the agent that is enforced by the courts if she works too little. However, the fine is bounded above so that the highest implementable effort level ($e^{*} = 4$) falls short of the first-best efficient action ($e^{FB} = 10$). In this contractual environment, principal–agent theory predicts that the principal should use the maximal fine to induce the agent to choose $e^{*} = 4$, and that he should offer a fixed wage that holds the agent down to her reservation utility. If the agent complies with the contract, the principal can capture roughly 30 percent of the first-best surplus for himself, while the agent gets nothing.

There are two alternatives to this “incentive contract.” In one treatment, the principal could choose to offer a “trust contract” that does without a fine and simply pays a generous fixed wage up front to the agent, asking her to reciprocate by spending a higher level of effort. However, effort cannot be enforced with this contract. In a second treatment, the principal could offer a “bonus contract,” which specifies a fixed wage, a desired level of effort, and an announced bonus payment if the effort is to the principal’s satisfaction. However, both parties know that the bonus cannot be enforced and is left at the discretion of the principal. The trust and the bonus contract are clearly less complete than the incentive contract. Because the experiments carefully rule out any repeated interactions between the parties, both types of contracts are, according to standard principal–agent theory, doomed to fail. Given the fixed wage, a purely self-interested agent will not spend any effort. Similarly, a principal who is interested only in his own income will never pay a bonus, so a rational agent should never put in any effort.
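
The logic of the incentive contract described above can be written out in a short derivation. The notation is an assumption introduced here for illustration and does not appear in the original: $w$ is the fixed wage, $f \le \bar{f}$ is the fine with upper bound $\bar{f}$, $c(e)$ is the agent's effort cost, and $e^{*}$ is the effort demanded by the principal. The sketch further assumes that a shirking agent chooses the minimum effort $e = 1$ and is fined with certainty.

```latex
\begin{align*}
  \text{comply:}\quad & u_A(e^{*}) = w - c(e^{*}), \\
  \text{shirk:}\quad  & u_A(1) = w - f - c(1), \\
  \text{incentive compatibility:}\quad
    & w - c(e^{*}) \ge w - f - c(1)
      \;\Longleftrightarrow\; c(e^{*}) - c(1) \le f \le \bar{f}.
\end{align*}
```

Under these assumptions the wage $w$ cancels from the constraint, so the implementable effort is limited by the bound $\bar{f}$ alone. This is why the highest implementable effort can fall short of the first-best level, and why an agent whose constraint holds with equality is indifferent between complying and shirking whatever the wage.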
If concerns for fairness and reciprocity are taken into account, the predictions are less clear cut. Consider again the optimal incentive contract (as suggested by principal–agent theory). This contract aims at a rather unfair distribution of the surplus. If the agent is concerned about this, there are two ways in which she could punish the principal. First, as in a UG, she could simply reject the contract, in which case both parties get a payoff of 0. A second, and more interesting, punishment strategy is to accept the contract and to shirk. Note that, if the incentive compatibility constraint is just binding, then the cost of shirking to the agent is zero and independent of the fixed wage offered by the principal. Thus, if the principal offers a somewhat higher wage that gives a positive (but still “unfair”) share of the surplus to the agent, the agent can punish the principal by accepting the wage and shirking (at zero cost to herself). Hence, concerns for fairness and reciprocity suggest that the principal has to offer a fairly generous wage to get the agent to accept and to work, which makes the incentive contract less attractive.

On the other hand, concerns for fairness and reciprocity improve the performance of trust and bonus contracts. A fair agent will reciprocate to a generous wage offer in a trust contract by putting in a higher effort level voluntarily. Similarly, a fair principal will reciprocate to a high effort level by paying a generous bonus, making it worth the agent’s while to spend more effort. Unfortunately, however, on such a general level, it is impossible to make any clear-cut predictions about the relative performance of the three types of contracts. Is the incentive contract going to be outperformed by the trust and/or the bonus contract? Does the bonus contract induce a higher level of effort than the trust contract, or the other way round?

To obtain quantitative predictions for the experiments, Fehr et al. (2000) apply the model of inequity aversion by Fehr and Schmidt (1999) to this moral hazard problem. Most other models of fairness or intention-based reciprocity would probably yield similar results, and we want to stress that these experiments were not designed to discriminate between different notions of fairness. The main advantage of our model of inequity aversion is just its simplicity, which makes it straightforward to apply to these games. However, Fehr et al. (2000) have to make a few additional assumptions. In particular, they assume for simplicity that there are only two types of subjects, “selfish” players who are interested only in their own material payoffs, and “fair” players who are willing to give up their own resources to achieve a more equal payoff distribution. Furthermore, in rough accordance with the experimental results of many UGs and DGs, they assume that 60 percent of the population are selfish and 40 percent are fair. With these assumptions it is a straightforward exercise to analyze the different types of contracts and obtain the following predictions:

1. Trust Contracts: Fair agents will reciprocate to high wage offers by putting in an effort level that equalizes payoffs, whereas selfish agents will choose the minimum effort level of 1. Thus, a higher wage offer will, on average, induce a higher level of effort. However, it can be shown that if less than two-thirds of all agents are fair, paying a higher wage does not raise the principal’s expected profit. Therefore, with 40 percent fair agents, the trust contract is not going to work.

2. Incentive Contracts: For the same reason as in the trust contract, it does not pay for the principals to elicit higher average effort levels by paying generous wages. Thus, both selfish and fair principals impose the highest possible fine to induce the agent to choose $e = 4$. However, whereas the fair principals share the surplus arising from $e = 4$ equally with the agent, selfish principals propose unfair contracts that give them the whole surplus. They anticipate that the fair agents reject these contracts; but, because the 60 percent selfish agents accept these contracts, this strategy is still profitable.

3. Bonus Contracts: Selfish principals always pay a bonus of zero, but fair principals pay a bonus that divides the surplus equally between the principal and the agent. Therefore, the bonus is on average increasing with the agent’s effort. Moreover, the relation between the effort and the average bonus is sufficiently steep to induce a selfish agent to put in an effort level of 7. However, the fair agent chooses an effort level of only 1 or 2 (depending on the fixed wage). The reason for this surprising result is that the fair agent is concerned not only about her expected monetary payoff, but that she suffers in addition from the inequality that arises if a selfish principal does not pay the bonus. Nevertheless, on average, the bonus contract implements a higher level of effort ($e = 5.2$) and yields a higher payoff for the principal than both the incentive contract and the trust contract.[35]

What are the experimental results? Each experiment had 10 periods, in which each principal was matched randomly and anonymously with a different agent. In the first treatment, in which principals could choose between a trust and an incentive contract, roughly 50 percent of the principals chose a trust contract and 50 percent chose an incentive contract in period 1. However, the fraction of incentive contracts rose quickly and, after period 5, roughly 80 percent of all contractual choices were incentive contracts. Those principals who offered a trust contract paid generous wages, to which some agents reciprocated by putting in a high effort level. However, in 64 percent of all trust contracts, the agents chose $e = 1$. Thus, on average, principals incurred considerable losses when they proposed trust contracts. The incentive contracts did better, but they did much less well than predicted by standard principal–agent theory. They also did less well than predicted by the model of inequity aversion. The reason is that, at the beginning, many principals offered incentive contracts with fairly high wages that were not incentive-compatible. In these cases, 62 percent of the agents shirked, imposing considerable losses on principals. On the other hand, those principals who offered incentive-compatible incentive contracts with low wages did fairly well. Principals learned to properly design incentive contracts over time.
The fraction of incentive-compatible contracts increased from only 10 percent in period 1 to 64 percent in period 10.

In the second treatment, the principal had to choose between a bonus contract and an incentive contract. From the very beginning, the bonus contract was much more popular than the incentive contract and accounted for roughly 90 percent of all contractual choices. Many principals did not pay a bonus, but a significant fraction reciprocated generously to higher effort levels. The average bonus was, therefore, strongly increasing in the effort level, which made it worthwhile for the agents to put forward rather high effort levels. The average effort level was 5.2, which is significantly higher than the average effort of 2.5 induced by incentive contracts. The bonus contract not only is more efficient than the incentive contract, it also yields on average a much higher payoff to the principal and a moderately higher payoff to the agent. These results are clearly inconsistent with the self-interest model, whereas the model of inequity aversion explains them surprisingly well.[36]

Our experiments demonstrate that quite powerful incentives can be given by a very incomplete bonus contract. The bonus contract relies on reciprocal fairness as an enforcement device. It does better than the more complete incentive contracts because it is incomplete and thus leaves more freedom to the parties to reciprocate. This enforcement mechanism is not perfect and, depending on the payoff structure and the fraction of reciprocal types in the population, it can fail. In fact, we have seen that the trust contract – in which the principal has to pay, in advance, the “bonus” unconditionally – is not viable in the setup of our experiments. Yet, the performance of the bonus contract suggests that the effect of reciprocal fairness, which has been neglected in contract theory so far, is important for optimal contractual design and should be taken into account.

[35] The analysis of the bonus contract is complicated by the fact that the principal has to move twice. He offers the terms of the contract at the first stage of the game, and he has to choose his bonus payment at the last stage. Thus, his contract offer may reveal some information about his type. However, it can be shown that there is no separating equilibrium in this game and that all pooling equilibria have the properties described previously. Furthermore, if we assume that a higher wage offer is not interpreted by the agent as a signal that she faces the selfish principal with a higher probability, then there is a unique pooling equilibrium. See Fehr et al. (2000).

[36] In a second experimental design, Fehr et al. (2000) consider a multitask principal–agent model inspired by Holmström and Milgrom (1991). In this experiment, the agents have to choose two separate effort levels (“tasks”), $e_1$ and $e_2$, both of which are observable by the principal, but only $e_1$ is verifiable and can be contracted on. The principal can choose between a piece-rate contract that rewards the agent for the effort spent on task 1 and a bonus contract that announces a voluntary bonus payment if the agent’s effort on both tasks is to the principal’s satisfaction. The overwhelming majority of principals opted for the bonus contract, which induced the agents to spend, on average, a considerable amount of effort and to allocate total effort efficiently across tasks. Those principals who chose a piece-rate contract induced the agents to concentrate all of their effort on task 1, which is very inefficient. Again, these results are inconsistent with the self-interest model, but they can be nicely explained by the Fehr–Schmidt model of inequity aversion.

5.3. The Optimal Allocation of Ownership Rights

Consider two parties, A and B, who are engaged in a joint project (a “firm”) to which they have to make some relationship-specific investments today to generate a joint surplus in the future. An important question that has received considerable attention in recent years is who should own the firm. In a seminal paper, Grossman and Hart (1986) argue that ownership rights allocate residual rights of control on the physical assets that are required to generate the surplus. For example, if A owns the firm, then he will have a stronger bargaining position than B in the renegotiation game in which the surplus between the two parties is shared ex post, because he can exclude B from using the assets, which makes B’s relationship-specific investment less productive. Grossman and Hart show that there is no ownership structure that implements first-best investments, but some ownership structures do better than others, and there is a unique second-best optimal allocation of ownership rights.


A common feature of most incomplete contract models is that joint ownership cannot be optimal.[37] This result is at odds with the fact that there are many jointly owned companies, partnerships, or joint ventures. Furthermore, the argument neglects that reciprocal fairness may be an important enforcement mechanism to induce the involved parties to invest more under joint ownership than otherwise predicted. To test this hypothesis, Fehr, Kremhelmer, and Schmidt (2000) conducted a series of experiments on the optimal allocation of ownership rights. The experimental game is a grossly simplified version of Grossman and Hart (1986): There are two parties, A and B, who have to make investments, $a, b \in \{1, \ldots, 10\}$, respectively, to generate a joint surplus $v(a, b)$. Investments are sequential: B has to invest first; her investment level $b$ is observed by A, who has to invest thereafter. We consider two possible ownership structures: Under A ownership, A hires B as an employee and pays her a fixed wage $w$. In this case, monetary payoffs are $v(a, b) - w - a$ for A and $w - b$ for B. Under joint ownership, each party gets half of the gross surplus minus his or her investment cost [i.e., $0.5\,v(a, b) - a$ for A and $0.5\,v(a, b) - b$ for B]. The gross profit function has been chosen such that maximal investments are efficient (i.e., $a^{FB} = b^{FB} = 10$), but if each party gets only 50 percent of the marginal return of their investments, then it is a dominant strategy for a purely self-interested player to choose the minimum level of investment, $a = b = 1$. Finally, in the first stage of the game, A can decide whether to be the sole owner of the firm and make a wage offer to B, or whether to have joint ownership. The prediction of the self-interest model is straightforward. Under A ownership, B has no incentive to invest and will choose $b = 1$. On the other hand, A is a full residual claimant on the margin, so he will invest efficiently.
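
The payoff structure just described can be illustrated with a short Python sketch. The surplus function used here, `v(a, b) = 1.5 * (a + b)`, is a hypothetical stand-in chosen only to satisfy the two stated properties (maximal investments are efficient, and half of the marginal return does not cover the unit investment cost); the function actually used in the experiments is not given in this text. The function names and the default wage of zero are likewise illustrative assumptions.

```python
INVESTMENTS = range(1, 11)  # feasible investment levels 1, ..., 10


def v(a, b):
    """Hypothetical joint surplus; NOT the function used in the experiments."""
    return 1.5 * (a + b)


def payoffs(a, b, regime, wage=0.0):
    """Monetary payoffs (A, B) under 'A' ownership or 'joint' ownership."""
    if regime == "A":
        return v(a, b) - wage - a, wage - b
    if regime == "joint":
        return 0.5 * v(a, b) - a, 0.5 * v(a, b) - b
    raise ValueError("regime must be 'A' or 'joint'")


def best_response_of_a(b, regime, wage=0.0):
    """A moves second and maximizes his own monetary payoff given b."""
    return max(INVESTMENTS, key=lambda a: payoffs(a, b, regime, wage)[0])


def selfish_prediction(regime, wage=0.0):
    """Backward induction for two purely self-interested players.

    B moves first and anticipates A's best response to each b.
    Returns the predicted pair (a, b).
    """
    def payoff_of_b(b):
        a = best_response_of_a(b, regime, wage)
        return payoffs(a, b, regime, wage)[1]

    b_star = max(INVESTMENTS, key=payoff_of_b)
    return best_response_of_a(b_star, regime, wage), b_star


if __name__ == "__main__":
    print("A ownership:    ", selfish_prediction("A"))      # (10, 1)
    print("joint ownership:", selfish_prediction("joint"))  # (1, 1)
```

With this stand-in surplus function, A's payoff under A ownership rises by 0.5 per unit of his own investment, whereas under joint ownership each party's payoff falls by 0.25 per unit of own investment, so the sketch reproduces the qualitative self-interest prediction.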
Under joint ownership, each party gets only 50 percent of the marginal return, which is not sufficient to induce any investments. Hence, in this case, B’s optimal investment level is unchanged, but A’s investment level is reduced to $a = 1$. Thus, A ownership outperforms joint ownership, and A should hire B as an employee.

In the experiments, just the opposite happened. Party A chose joint ownership in more than 80 percent (187 of 230) of all observations and gave away 50 percent of the gross return to B. Moreover, the fraction of joint ownership contracts increased from 74 percent in the first two periods to 89 percent in the last two periods. With joint ownership, B players chose on average an investment level of 8.9, and A responded with an investment of 6.5 (on average). On the other hand, if A ownership was chosen and A hired B as an employee, B’s average investment was only 1.3, whereas all A players chose an investment level of 10. Furthermore, A players earned much more on average if they chose joint ownership rather than A ownership. These results are inconsistent with the self-interest model, but it is straightforward to explain them with concerns for fairness. Applying the Fehr and Schmidt (1999) model of inequity aversion gives again fairly accurate quantitative predictions. Thus, the experimental results and the theoretical analysis suggest that joint ownership may do better than A ownership, because it offers more scope for reciprocal behavior. Subjects seem to understand this and predominantly choose this ownership structure.

[37] To see this, note that, in the renegotiation game in which the surplus is shared, each party gets its reservation utility plus a fixed fraction (50 percent, say) of the joint surplus in excess of the sum of the reservation utilities. Now, consider A ownership. If A invests, then his investment increases not only the joint surplus, but also his reservation utility (i.e., what he could get out of the firm without B’s collaboration). On the other hand, if B invests, then her investment increases only the joint surplus, but it does not improve her reservation utility. The reason is that the investment requires access to the firm to be productive. Hence, without the firm, B’s investment is useless. This is why A will invest more than B under A ownership. Consider now joint ownership. If both parties own the firm jointly, then each of them can prevent the other from using the assets. Hence, neither A’s nor B’s investment affects their respective reservation utilities. Therefore, A’s investment incentives are reduced, whereas B’s investment incentives do not improve. Hence, joint ownership is inferior.

6. CONCLUSIONS

The self-interest model has been very successful in explaining individual behavior in competitive markets, but it is unambiguously refuted in many situations in which individuals interact strategically. The experimental evidence on, for example, UGs, DGs, GEGs, and PGGs demonstrates unambiguously that many people are not only maximizing their own material payoffs, but are also concerned about social comparisons, fairness, and the desire to reciprocate. We have reviewed several models that try to take these concerns explicitly into account. A general lesson to be drawn from these models is that the assumption that some people are fair-minded and have the desire to reciprocate does not imply that these people will always behave “fairly.” In some environments (e.g., in competitive markets or in PGGs without punishment), fair-minded actors will often behave as if they are purely self-interested.
Likewise, a purely self-interested person may often behave as if he is strongly concerned about fairness (e.g., the Proposers who make fair proposals in the UG or generous wage offers in the GEG). Thus, the behavior of fair-minded and purely self-interested actors depends on the strategic environment in which they interact and on their beliefs about the fairness of their opponents. The analysis of this behavior is not trivial, and it is helpful to develop theoretical tools to better understand what we observe.

Some of the models reviewed focus solely on preferences over income distributions and ignore the fact that people often care about the intentions behind the actions of their opponents. Some other papers focus only on intention-based or type-based reciprocity and ignore the fact that some people are bothered by unfair distributions, even if their opponent could not do anything about it. It seems natural to try to combine these two motivations in a single model, as has been done by Falk and Fischbacher (1999) and Charness and Rabin (2000). However, we believe that the cost of doing so is high. These models are rather complicated; they rely on psychological game theory, and it is difficult to apply them even to very simple experimental games. Moreover, the model of Charness and Rabin, in particular, is plagued with multiple equilibria and has many more free parameters than all the other models. On the other hand, simple models of social preferences – for example, Bolton and Ockenfels’ (2000) ERC model or our own (1999) model of inequity aversion – fit the data on large classes of games fairly well. They use standard game theory, they have fewer parameters to be estimated, and it is fairly straightforward to get clear-cut qualitative and quantitative predictions.

The main advantage of these simple models is that they can easily be applied to other fields in economics. For more than 20 years, experimental economists concentrated on simple experimental games to better understand what drives economic behavior. However, very few of the insights that have been gained had any impact on how economists interpret the world. We feel that it is now time to change this. Many phenomena in situations in which people interact strategically cannot be understood by relying on the self-interest model alone. Our examples from contract theory and the theory of property rights illustrate that models of reciprocal fairness can be fruitfully applied to important and interesting economic questions, yielding predictions that are much closer to what we observe in many situations of the real world and in carefully controlled experiments than the predictions of the self-interest model. There are many other areas in which fairness models are likely to generate interesting new insights – be it the functioning of labor markets or questions of political economy or be it the design of optimal mechanisms or questions of compliance with organizational rules and the law. We hope that this is just the beginning. There is no shortage of important questions to which the newly developed tools and insights can be applied.

ACKNOWLEDGMENTS

We thank Glenn Ellison for many helpful comments and suggestions, and Alexander Klein and Susanne Kremhelmer for excellent research assistance.
Part of this research was conducted while Klaus M. Schmidt visited Stanford University, and he thanks the Economics Department for its great hospitality. Financial support by Deutsche Forschungsgemeinschaft through Grant SCHM1196/4-1 is gratefully acknowledged. Ernst Fehr also gratefully acknowledges support from the Swiss National Science Foundation (Project No. 121405100.97), the Network on the Evolution of Preferences and Social Norms of the MacArthur Foundation, and the EU-TMR Research Network ENDEAR (FMRX-CTP98-0238).

References

Abbink, K., B. Irlenbusch, and E. Renner (2000), “The Moonlighting Game: An Experimental Study on Reciprocity and Retribution,” Journal of Economic Behavior and Organization, 42, 265–277.
Agell, J. and P. Lundborg (1995), “Theories of Pay and Unemployment: Survey Evidence from Swedish Manufacturing Firms,” Scandinavian Journal of Economics, 97, 295–308.
Aghion, P., M. Dewatripont, and P. Rey (1994), “Renegotiation Design with Unverifiable Information,” Econometrica, 62, 257–282.
Ahlert, M., A. Crüger, and W. Güth (1999), “An Experimental Analysis of Equal Punishment Games,” mimeo, University of Halle-Wittenberg.
Alm, J., I. Sanchez, and A. de Juan (1995), “Economic and Noneconomic Factors in Tax Compliance,” Kyklos, 48, 3–18.
Andreoni, J. (1989), “Giving with Impure Altruism: Applications to Charity and Ricardian Equivalence,” Journal of Political Economy, 97, 1447–1458.
Andreoni, J., B. Erard, and J. Feinstein (1998), “Tax Compliance,” Journal of Economic Literature, 36, 818–860.
Andreoni, J. and J. Miller (1993), “Rational Cooperation in the Finitely Repeated Prisoner’s Dilemma: Experimental Evidence,” Economic Journal, 103, 570–585.
Andreoni, J. and J. Miller (2000), “Giving According to GARP: An Experimental Test of the Rationality of Altruism,” mimeo, University of Wisconsin and Carnegie Mellon University.
Andreoni, J. and H. Varian (1999), “Preplay Contracting in the Prisoner’s Dilemma,” Proceedings of the National Academy of Sciences USA, 96, 10933–10938.
Andreoni, J. and L. Vesterlund (2001), “Which Is the Fair Sex? Gender Differences in Altruism,” Quarterly Journal of Economics, 116, 293–312.
Arrow, K. J. (1981), “Optimal and Voluntary Income Redistribution,” in Economic Welfare and the Economics of Soviet Socialism: Essays in Honor of Abram Bergson (ed. by S. Rosenfield), Cambridge, UK: Cambridge University Press.
Becker, G. S. (1974), “A Theory of Social Interactions,” Journal of Political Economy, 82, 1063–1093.
Berg, J., J. Dickhaut, and K. McCabe (1995), “Trust, Reciprocity and Social History,” Games and Economic Behavior, 10, 122–142.
Bernheim, B. D. (1986), “On the Voluntary and Involuntary Provision of Public Goods,” American Economic Review, 76, 789–793.
Bewley, T. (1999), Why Wages Don’t Fall During a Recession. Cambridge, MA: Harvard University Press.
Binmore, K. (1998), Game Theory and the Social Contract: Just Playing. Cambridge, MA: MIT Press.
Binmore, K., J. Gale, and L. Samuelson (1995), “Learning to Be Imperfect: The Ultimatum Game,” Games and Economic Behavior, 8, 56–90.
Blount, S. (1995), “When Social Outcomes Aren’t Fair: The Effect of Causal Attributions on Preferences,” Organizational Behavior and Human Decision Processes, 63, 131–144.
Bolle, F. and A. Kritikos (1998), “Self-Centered Inequality Aversion Versus Reciprocity and Altruism,” mimeo, Europa-Universität Viadrina.
Bolton, G. E. (1991), “A Comparative Model of Bargaining: Theory and Evidence,” American Economic Review, 81, 1096–1136.
Bolton, G. E., J. Brandts, and A. Ockenfels (1998), “Measuring Motivations for the Reciprocal Responses Observed in a Simple Dilemma Game,” Experimental Economics, 3, 207–221.
Bolton, G. E. and A. Ockenfels (2000), “A Theory of Equity, Reciprocity, and Competition,” American Economic Review, 90, 166–193.

Fairness and Reciprocity

253

Bolton, G. and R. Zwick (1995), “Anonymity Versus Punishment in Ultimatum Bargaining,” Games and Economic Behavior, 10, 95–121. Bowles, S. and H. Gintis (1999), “The Evolution of Strong Reciprocity,” mimeo, University of Massachusetts at Amherst. Bowles, S. and H. Gintis (2000), “Reciprocity, Self-Interest, and the Welfare State,” Nordic Journal of Political Economy, 26, 33–53. Brandts, J. and G. Charness (1999), “Gift-Exchange with Excess Supply and Excess Demand,” mimeo, Universitat Pompeu Fabra, Barcelona. Camerer, C. F. (1999), “Social Preferences in Dictator, Ultimatum and Trust Games,” mimeo, California Institute of Technology. Camerer, C. F. and R. H. Thaler (1995), “Ultimatums, Dictators and Manners,” Journal of Economic Perspectives, 9, 209–219. Cameron, L. A. (1999), “Raising the Stakes in the Ultimatum Game: Experimental Evidence from Indonesia.” Economic Inquiry, 37(1), 47–59. Carpenter, J. P. (2000), “Punishing Free-Riders: The Role of Monitoring – Group Size, Second-Order Free-Riding and Coordination,” mimeo, Middlebury College. Chamberlin, E. H. (1948), “An Experimental Imperfect Market,” Journal of Political Economy, 56, 95–108. Charness, G. (1996), “Attribution and Reciprocity in a Labor Market: An Experimental Investigation,” mimeo, University of California at Berkeley. Charness, G. (2000), “Responsibility and Effort in an Experimental Labor Market,” Journal of Economic Behavior and Organization, 42, 375–384. Charness, G. and M. Rabin (2000), “Social Preferences: Some Simple Tests and a New Model,” mimeo, University of California at Berkeley. Che, Y.-K. and D. B. Hausch (1999), “Cooperative Investments and the Value of Contracting.” American Economic Review, 89(1), 125–147. Cooper, D. J. and C. K. Stockman (1999), “Fairness, Learning, and Constructive Preferences: An Experimental Investigation,” mimeo, Case Western Reserve University. Costa-Gomes, M. and K. G. 
Zauner (1999), “Learning, Non-equilibrium Beliefs, and Non-Pecuniary Payoff Uncertainty in an Experimental Game,” mimeo, Harvard Business School. Cox, J. C. (2000), “Trust and Reciprocity: Implications of Game Triads and Social Contexts,” mimeo, University of Arizona at Tucson. Croson, R. T. A. (1999), “Theories of Altruism and Reciprocity: Evidence from Linear Public Goods Games,” Discussion Paper, Wharton School, University of Pennsylvania. Daughety, A. (1994), “Socially-Inﬂuenced Choice: Equity Considerations in Models of Consumer Choice and in Games,” mimeo, University of Iowa. Davis, D. and C. Holt (1993), Experimental Economics. Princeton, NJ: Princeton University Press. Dawes, R. M. and R. Thaler (1988), “Cooperation,” Journal of Economic Perspectives, 2, 187–197. Dufwenberg, M. and G. Kirchsteiger (1998), “A Theory of Sequential Reciprocity,” Discussion Paper, CENTER, Tilburg University. Edlin, A. S. and S. Reichelstein (1996), “Holdups, Standard Breach Remedies, and Optimal Investment,” American Economic Review, 86(3), 478–501. Eichenberger, R. and F. Oberholzer-Gee (1998), “Focus Effects in Dictator Game Experiments,” mimeo, University of Pennsylvania.

254

Fehr and Schmidt

Ellingsen, T. and M. Johannesson (2000), “Is There a Hold-up Problem? Stockholm School of Economics,” Working Paper 357. Encyclopaedia Britannica (1998), The New Encyclopaedia Britannica, Volume 1, (15th ed.), London, Encyclopaedia Britannica. Fahr, R. and B. Irlenbusch (2000), “Fairness as a Constraint on Trust in Reciprocity: Earned Property Rights in a Reciprocal Exchange Experiment,” Economics Letters, 66, 275–282. Falk, A. E. Fehr, and U. Fischbacher (2000a), “Informal Sanctions, Institute for Empirical Research in Economics,” University of Zurich, Working Paper 59. Falk, A., E. Fehr, and U. Fischbacher (2000b), “Testing Theories of Fairness–Intentions Matter,” Institute for Empirical Research in Economics, University of Zurich, Working Paper 63. Falk, A., E. Fehr, and U. Fischbacher (2000c), “Appropriating the Commons, Institute for Empirical Research in Economics,” University of Zurich, Working Paper 55. Falk, A. and U. Fischbacher (1999), “A Theory of Reciprocity, Institute for Empirical Research in Economics,” University of Zurich, Working Paper 6. Falk, A., S. G´achter, and J. Kov´acs (1999), “Intrinsic Motivation and Extrinsic Incentives in a Repeated Game with Incomplete Contracts,” Journal of Economic Psychology, 20, 251–284. Fehr, E. and A. Falk (1999), “Wage Rigidity in a Competitive Incomplete Contract Market,” Journal of Political Economy, 107, 106–134. Fehr, E. and U. Fischbacher (2000), “Third Party Punishment,” mimeo, University of Z¨urich. Fehr, E. and S. G¨achter (2000), “Cooperation and Punishment in Public Goods Experiments,” American Economic Review, 90, 980–994. Fehr, E., S. G¨achter, and G. Kirchsteiger (1997), “Reciprocity as a Contract Enforcement Device,” Econometrica, 65, 833–860. Fehr, E., G. Kirchsteiger, and A. Riedl (1993), “Does Fairness Prevent Market Clearing? An Experimental Investigation,” Quarterly Journal of Economics, 108, 437–460. Fehr, E., G. Kirchsteiger, and A. 
Riedl (1998), “Gift Exchange and Reciprocity in Competitive Experimental Markets,” European Economic Review, 42, 1–34. Fehr, E., A. Klein, and K. M. Schmidt (2000), “Endogenous Incomplete Contracts,” mimeo, University of Munich. Fehr, E., S. Kremhelmer, and K. M. Schmidt (2000), “Fairness and the Optimal Allocation of Property Rights,” mimeo, University of Munich. Fehr, E. and K. M. Schmidt (1999), “A Theory of Fairness, Competition and Cooperation.” Quarterly Journal of Economics, 114, 817–868. Fehr, E. and E. Tougareva (1995), “Do High Monetary Stakes Remove Reciprocal Fairness? Experimental Evidence from Russia,” mimeo, Institute for Empirical Economic Research, University of Zurich. Fischbacher, U., S. G¨achter, and E. Fehr (1999), “Are People Conditionally Cooperative? Evidence from a Public Goods Experiment,” Working Paper 16, Institute for Empirical Research in Economics, University of Zurich. Forsythe, R. L., J. Horowitz, N. E. Savin, and M. Sefton (1994), “Fairness in Simple Bargaining Games,” Games and Economic Behavior, 6, 347–369. Frey, B. and H. Weck-Hannemann (1984), “The Hidden Economy as an ‘Unobserved’ Variable,” European Economic Review, 26, 33–53. G¨achter, S. and A. Falk (1999), “Reputation or Reciprocity?” Working Paper 19, Institute for Empirical Research in Economics, University of Z¨urich.

Fairness and Reciprocity

255

Geanakoplos, J., D. Pearce, and E. Stacchetti (1989), “Psychological Games and Sequential Rationality,” Games and Economic Behavior, 1, 60–79. Gintis, H. (2000), “Strong Reciprocity and Human Sociality,” Journal of Theoretical Biology, 206, 169–179. Greenberg, J. (1990), “Employee Theft as a Reaction to Underpayment Inequity: The Hidden Cost of Pay Cuts,” Journal of Applied Psychology, 75, 561–568. Grossman, S. and O. Hart (1986), “An Analysis of the Principal–Agent Problem,” Econometrica, 51, 7–45. G¨uth, W., H. Kliemt, and A. Ockenfels (2000), “Fairness Versus Efﬁciency – An Experimental Study of Mutual Gift-Giving,” mimeo, Humboldt University of Berlin. G¨uth, W., R. Schmittberger, and B. Schwarze (1982), “An Experimental Analysis of Ultimatium Bargaining,” Journal of Economic Behavior and Organization, 3, 367– 388. G¨uth, W. and E. van Damme (1998), “Information, Strategic Behavior and Fairness in Ultimatum Bargaining: An Experimental Study,” Journal of Mathematical Psychology, 42, 227–247. Hannan, L., J. Kagel, and D. Moser (1999), “Partial Gift Exchange in Experimental Labor Markets: Impact of Subject Population Differences, Productivity Differences and Effort Requests on Behavior,” mimeo, University of Pittsburgh. Harsanyi, J. (1955), “Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility,” Journal of Political Economy, 63, 309–321. Hart, O. and J. Moore (1990), “Property Rights and the Nature of the Firm,” Journal of Political Economy, 98, 1119–1158. Hart, O. and J. Moore (1999), “Foundations of Incomplete Contracts,” Review of Economic Studies, 66, 115–138. Hoffman, E., K. McCabe, K. Shachat, and V. Smith (1994), “Preferences, Property Rights, and Anonymity in Bargaining Games,” Games and Economic Behavior, 7, 346–380. Hoffman, E., K. McCabe, and V. Smith (1996), “On Expectations and Monetary Stakes in Ultimatum Games,” International Journal of Game Theory, 25, 289–301. Holmstr¨om, B. and P. 
Milgrom (1991), “Multi-Task Principal-Agent Analyses.” Journal of Law, Economics, and Organization, 7, 24–52. Isaac, M. R., J. M. Walker, A. W. Williams (1994), “Group Size and the Voluntary Provision of Public Goods,” Journal of Public Economics, 54, 1–36. Kagel, J. H, C. Kim, and D. Moser (1996), “Fairness in Ultimatum Games with Asymmetric Information and Asymmetric Payoffs,” Games and Economic Behavior, 13, 100–110. Kahneman, D., J. L. Knetsch, and R. Thaler (1986), “Fairness as a Constraint on Proﬁt Seeking: Entitlements in the Market,” American Economic Review, 76, 728– 741. Kirchsteiger, G. (1994), “The Role of Envy in Ultimatum Games,” Journal of Economic Behavior and Organization, 25, 373–389. Ledyard, J. (1995), “Public Goods: A Survey of Experimental Research,” Chapter 2, in Handbook of Experimental Economics, (ed. by A. Roth and J. Kagel), Princeton, NJ: Princeton University Press. Levine, D. (1998), “Modeling Altruism and Spitefulness in Experiments,” Review of Economic Dynamics, 1, 593–622. Lind, A. and T. Tyler (1988) The Social Psychology of Procedural Justice. New York: Plenum Press.

256

Fehr and Schmidt

List, J. and T. Cherry (2000), “Examining the Role of Fairness in Bargaining Games,” mimeo, University of Arizona at Tucson. McCabe, K., M. Rigdon, and V. Smith (2000), “Positive Reciprocity and Intentions in Trust Games,” mimeo, University of Arizona at Tucson. Miller, S. (1997), “Strategienuntersuchung zum Investitionsspiel von Berg,” Dickhaut, McCabe, Diploma Thesis, University of Bonn. Neilson, W. (2000), “An Axiomatic Characterization of the Fehr-Schmidt Model of Inequity Aversion,” mimeo, Department of Economics, Texas A&M University. N¨oldeke, G. and K. M. Schmidt (1995), “Option Contracts and Renegotiation: A Solution to the Hold-Up Problem,” Rand Journal of Economics, 26, 163–179. Offerman, T. (1999), “Hurting Hurts More Than Helping Helps: The Role of the Selfserving Bias,” mimeo, University of Amsterdam. Ostrom, E. (1990), Governing the Commons – The Evolution of Institutions for Collective Action. New York: Cambridge University Press. Ostrom, E. (2000), “Collective Action and the Evolution of Social Norms,” Journal of Economic Perspectives, 14, 137–158. Rabin, M. (1993), “Incorporating Fairness into Game Theory and Economics,” American Economic Review, 83(5), 1281–1302. Roth, A. E. (1995), “Bargaining Experiments,” in Handbook of Experimental Economics, (ed. by J. Kagel and A. Roth) Princeton, NJ: Princeton University Press. Roth, A. E. and I. Erev (1995), “Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term,” Games and Economic Behavior, 8, 164–212. Roth, A. E., M. W. K. Malouf, and J. K. Murningham (1981), “Sociological Versus Strategic Factors in Bargaining,” Journal of Economic Behavior and Organization, 2, 153–177. Roth, A. E., V. Prasnikar, M. Okuno-Fujiwara, and S. Zamir (1991), “Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study,” American Economic Review, 81, 1068–1095. Samuelson, P. A. 
(1993), “Altruism as a Problem Involving Group Versus Individual Selection in Economics and Biology,” American Economic Review, 83, 143–148. Segal, I. (1999), “Complexity and Renegotiation: A Foundation for Incomplete Contracts,” Review of Economic Studies, 66(1), 57–82. Segal, U. and J. Sobel (1999), “Tit for Tat: Foundations of Preferences for Reciprocity in Strategic Settings,” mimeo, University of California at San Diego. Seidl, C. and S. Traub (1999), “Taxpayers’ Attitudes, Behavior, and Perceptions of Fairness in Taxation,” mimeo, Institut f¨ur Finanzwissenschaft und Sozialpolitik, University of Kiel. Selten, R. and A. Ockenfels (1998), “An Experimental Solidarity Game,” Journal of Economic Behavior and Organization, 34, 517–539. Sen, A. (1995), “Moral Codes and Economic Success,” in Market Capitalism and Moral Values (ed. by C. S. Britten and A. Hamlin), Aldershot, UK: Edward Elgar. Sethi, R. and E. Somananthan (2001), “Preference Evolution and Reciprocity,” Journal of Economic Theory, 97, 273–297. Sethi, R. and E. Somananthan (2000), “Understanding Reciprocity,” mimeo, Columbia University. Slonim, R. and A. E. Roth (1997), “Financial Incentives and Learning in Ultimatum and Market Games: An Experiment in the Slovak Republic,” Econometrica, 65, 569– 596.

Fairness and Reciprocity

257

Smith, A. (1759), The Theory of Moral Sentiments. Indianapolis, IN: Liberty Fund (reprinted 1982). Smith, V. L. (1962), “An Experimental Study of Competitive Market Behavior,” Journal of Political Economy, 70, 111–137. Sonnemans, J., A. Schram, and T. Offerman (1999), “Strategic Behavior in Public Good Games–When Partners Drift Apart,” Economics Letters, 62, 35–41. Suleiman, R. (1996), “Expectations and Fairness in a Modiﬁed Ultimatum Game,” Journal of Economic Psychology, 17, 531–554. Veblen, T. (1922), The Theory of the Leisure Class–An Economic Study of Institutions. London: George Allen and Unwin (ﬁrst published 1899). Zajac, E. (1995), “Political Economy of Fairness,” Cambridge, MA: MIT Press. Zizzo, D. and A. Oswald (2000), “Are People Willing to Pay to Reduce Others’ Income?” mimeo, Oxford University.

CHAPTER 7

Hyperbolic Discounting and Consumption

Christopher Harris and David Laibson

1. INTRODUCTION

Robert Strotz (1956) first suggested that people are more impatient when they make short-run trade-offs than when they make long-run trade-offs.[1] Virtually every experimental study on time preference has supported Strotz’s conjecture.[2] When two rewards are both far away in time, decision-makers act relatively patiently (e.g., I prefer two apples in 101 days, rather than one apple in 100 days). But when both rewards are brought forward in time, preferences exhibit a reversal, reflecting more impatience (e.g., I prefer one apple right now, rather than two apples tomorrow).[3] Such reversals should be well understood by everyone who makes far-sighted New Year’s resolutions and later backtracks. We promise ourselves to exercise, diet, and quit smoking, but often postpone those virtuous behaviors when the moment arrives to make the required sacrifices. Looking to the long run, we wish to act patiently, but the desire for instant gratification frequently overwhelms our good intentions.

[1] Some of Strotz’s insights are anticipated by Ramsey (1928).
[2] See Ainslie (1992) and Frederick, Loewenstein, and O’Donoghue (2001) for reviews of the evidence for and against hyperbolic discounting.
[3] This example is from Thaler (1981).

The contrast between long-run patience and short-run impatience has been modeled with discount functions that take an approximately hyperbolic form (Ainslie, 1992; Loewenstein and Prelec, 1992; Laibson, 1997a). Such preferences imply that the instantaneous discount rate declines as the horizon increases. This pattern of discounting sets up a conflict between today’s preferences and the preferences that will be held in the future. From the perspective of period 0, the discount rate between two distant periods, t and t + 1, is a low, long-term discount rate. However, from the perspective of period t, the discount rate between t and t + 1 is a high, short-term discount rate.

Hyperbolic consumers will report a gap between what they feel they should save and what they actually save. Prescriptive saving rates will lie above actual saving rates, because short-run preferences for instantaneous gratification will undermine the consumer’s desire to implement long-run patient plans. However, the hyperbolic consumer is not doomed to retire in poverty. Illiquid assets can help the hyperbolic consumer lock in the patient, welfare-enhancing course of action. Hence, the availability of illiquid assets becomes a critical determinant of household savings and welfare. Too much illiquidity, though, can be problematic. Consumers face substantial uninsurable labor-income risk, and need to use liquid assets to smooth their consumption. Hyperbolic agents seek an investment portfolio that strikes the right balance between commitment and flexibility.

In this paper, we review and extend the literature on hyperbolic discounting and consumption. We begin our analysis of hyperbolic consumers by describing an infinite-horizon consumption problem with a single liquid asset. Using this tractable problem, we characterize equilibrium behavior. We prove a new equilibrium uniqueness theorem, characterize some properties of the consumption function, and illustrate additional properties of the consumption function with numerical simulations. We show that hyperbolic consumption functions may exhibit pathologies like discontinuities, nonmonotonicities, and concavity violations. We analyze the comparative statics of these pathologies. The pathologies are exacerbated as hyperbolicity increases, risk aversion falls, and income uncertainty falls. We also show that these pathologies do not arise when the model parameters are calibrated at empirically sensible benchmark values. Finally, we review our earlier results on the Euler relation characterizing the equilibrium path (Harris and Laibson, 2001a).
We then discuss simulations of savings and asset allocation choices of households who face a life cycle problem with liquid assets, liquid liabilities, and illiquid assets (Angeletos, Laibson, Repetto, Tobacman, and Weinberg, 2001a; hereafter ALRTW). These life cycle simulations are used to compare the behavior of hyperbolic households and exponential households. Both the exponential and hyperbolic households are calibrated to hold levels of preretirement wealth that match observed levels of wealth reported in the Survey of Consumer Finances (SCF). Despite the fact that this calibration imposes identical levels of total wealth for hyperbolics and exponentials, numerous differences arise.

First, the hyperbolic households invest comparatively little of their wealth in liquid assets. They hold relatively low levels of liquid wealth measured either as a fraction of labor income or as a share of total wealth. Analogously, hyperbolic households also borrow more aggressively in the revolving credit market (i.e., on credit cards). The low levels of liquid wealth and high rates of credit card borrowing generated by hyperbolic simulations match empirical measures from the SCF much better than the results of exponential simulations.

Because the hyperbolic households have low levels of liquid assets and high levels of debt, they are unable to smooth their consumption paths in the presence of predictable changes in income. Calibrated hyperbolic simulations display substantial comovement between consumption and predictable income growth, matching empirical measures of comovement from the Panel Study of Income Dynamics (PSID). By contrast, calibrated exponential simulations generate too little consumption-income comovement. Similarly, hyperbolic simulations generate substantial drops in consumption around retirement, matching empirical estimates. The exponential simulations fail to replicate this pattern. All in all, the hyperbolic model matches observed consumption data better than the exponential model.

Our paper is organized in 12 sections, and readers are encouraged to pick and choose among them. Section 4 contains the most technical parts of the paper and can be skipped by readers primarily interested in applications. In Section 2, we discuss the hyperbolic discount function. In Section 3, we present a one-asset, infinite-horizon buffer-stock consumption model, which can accommodate either exponential or hyperbolic preferences. In Section 4, we discuss existence and uniqueness of an equilibrium. In Section 5, we describe the Euler relation that characterizes the equilibrium path. In Section 6, we describe our numerical simulations of the one-asset consumption problem. In Section 7, we describe the properties of the hyperbolic consumption function and illustrate these properties with simulations. In Section 8, we review empirical applications of the hyperbolic model. In Section 9, we discuss the level of consumer sophistication assumed in hyperbolic models. In Section 10, we describe the policy implications of the hyperbolic model. In Section 11, we discuss some important extensions of the hyperbolic model, including applications in continuous time. In Section 12, we conclude.

2. HYPERBOLIC DISCOUNTING

When researchers elicit time preferences, they ask subjects to choose among a set of delayed rewards. The largest rewards are accompanied by the greatest delays.[4] Researchers use subject choices to estimate the shape of the discount function. These estimated discount functions almost always approximate generalized hyperbolas: events τ periods away are discounted with weight $(1 + \alpha\tau)^{-\gamma/\alpha}$, with α, γ > 0 (Loewenstein and Prelec, 1992).[5] Figure 7.1 graphs the generalized hyperbolic discount function with parameters α = 4 and γ = 1. Figure 7.1 also plots the standard exponential discount function, $\delta^{\tau}$, assuming δ = 0.944 (the annual discount factor used in our simulations).

[4] Such experiments have used a wide range of real rewards, including money, durable goods, fruit juice, sweets, video rentals, relief from noxious noise, and access to video games. For example, see Thaler (1981); Navarick (1982); Millar and Navarick (1984); King and Logue (1987); Kirby and Herrnstein (1995); Kirby and Marakovic (1995, 1996); Kirby (1997); and Read et al. (1996). See Ainslie (1992), Frederick et al. (2001), and Angeletos et al. (2001b) for partial reviews of this literature. See Mulligan (1997) for a critique.
[5] Loewenstein and Prelec (1992) provide an axiomatic derivation of the generalized hyperbolic discount function. See Chung and Herrnstein (1961) for the first use of the hyperbolic discount function. The original psychology literature worked with the special cases 1/τ and 1/(1 + ατ). Ainslie (1992) reviews this literature.
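As a rough numerical companion to the two discount functions just described, the following Python sketch tabulates their weights at a few horizons. The parameter values are the ones quoted in the text; the function names and the selection of horizons are illustrative assumptions of ours.

```python
def exponential_weight(tau, delta=0.944):
    """Exponential discount weight delta**tau (delta as quoted in the text)."""
    return delta ** tau


def hyperbolic_weight(tau, alpha=4.0, gamma=1.0):
    """Generalized hyperbolic weight (1 + alpha*tau)**(-gamma/alpha)."""
    return (1.0 + alpha * tau) ** (-gamma / alpha)


if __name__ == "__main__":
    # Horizons in years; chosen only for illustration.
    for tau in (0, 1, 5, 10, 25, 50):
        print(f"tau={tau:2d}  exponential={exponential_weight(tau):.4f}  "
              f"hyperbolic={hyperbolic_weight(tau):.4f}")
```

With these parameters the hyperbolic weight drops faster than the exponential weight at short horizons and more slowly at long horizons, so the two curves cross.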

Figure 7.1. Exponential and hyperbolic discount functions. The figure plots the discount function (vertical axis, from 0 to 1) against the horizon in years (horizontal axis, from 0 to 50) for three cases. Exponential: $\delta^{t}$, with δ = 0.944. Hyperbolic: $(1 + \alpha t)^{-\gamma/\alpha}$, with α = 4 and γ = 1. Quasi-hyperbolic: $\{1, \beta\delta, \beta\delta^{2}, \beta\delta^{3}, \ldots\}$, with β = 0.7 and δ = 0.957.

Because the discount rate represents the rate of decline of the discount function, the exponential discount function implies a constant discount rate:

$$-\frac{\partial(\delta^{\tau})/\partial\tau}{\delta^{\tau}} = -\ln\delta.$$

By contrast, the hyperbolic discount function implies a discount rate that falls with the horizon, τ:

$$-\frac{\partial\left[(1+\alpha\tau)^{-\gamma/\alpha}\right]/\partial\tau}{(1+\alpha\tau)^{-\gamma/\alpha}} = \frac{\gamma}{1+\alpha\tau}.$$

In the short run, the hyperbolic discount rate is γ, and in the long run the discount rate converges to zero. This reflects the robust experimental finding that people are very impatient in the short run (e.g., when postponing a reward from today to tomorrow) and very patient when thinking about long-run tradeoffs (postponing a reward from 100 days to 101 days).

To reflect the empirical pattern of discount rates that fall with the horizon, Laibson (1997a) adopted a discrete-time discount function, $\{1, \beta\delta, \beta\delta^{2}, \beta\delta^{3}, \ldots\}$, which Phelps and Pollak (1968) had previously used to model intergenerational time preferences.[6] This “quasi-hyperbolic function” reflects the sharp short-run drop in valuation measured in the experimental time-preference data and has been adopted as a research tool because of its analytical tractability.[7] Figure 7.1 plots the particular parameterization of the quasi-hyperbolic discount function used in our simulations: β = 0.7 and δ = 0.957. Using annual periods, these parameter values roughly match experimentally measured discounting patterns. Delaying an immediate reward by a year reduces the value of that reward by roughly one-third: 1 − βδ ≈ 0.33. By contrast, delaying a distant reward by an additional year reduces the value of that reward by a relatively small percentage: 1 − δ ≈ 0.04.[8]

All forms of hyperbolic preferences induce dynamic inconsistency. Consider the discrete-time quasi-hyperbolic function. The discount factor between adjacent periods t and t + 1 represents the weight placed on utils at time t + 1 relative to the weight placed on utils at time t. From the perspective of self t, the discount factor between periods t and t + 1 is βδ, but the discount factor that applies between any two later periods is δ. Because we take β to be less than one, this implies a short-term discount factor that is less than the long-term discount factor.[9] From the perspective of self t + 1, βδ is the relevant discount factor between periods t + 1 and t + 2. Hence, self t and self t + 1 disagree about the desired level of patience that should be used to trade off rewards in periods t + 1 and t + 2.

Because of this dynamic inconsistency, the hyperbolic consumer is involved in a decision that has intrapersonal strategic dimensions. Early selves would like to commit later selves to honor the preferences of those early selves. Later selves do their best to maximize their own interests. Economists have modeled this situation as an intrapersonal game played among the consumer’s temporally situated selves (Strotz, 1956).

[6] Akerlof (1991) used a similar function: {1, β, β, β, . . .}.
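The disagreement between selves can be made concrete with a small Python sketch. It computes the quasi-hyperbolic weights for the parameters quoted above and the one-period discount factor between periods t and t + 1 as perceived by an earlier self s. The helper names `qh_weight` and `discount_factor` are illustrative assumptions, not notation from the literature.

```python
BETA, DELTA = 0.7, 0.957  # annual parameters quoted in the text


def qh_weight(i, beta=BETA, delta=DELTA):
    """Weight the current self places on utils i periods ahead: 1, beta*delta, beta*delta**2, ..."""
    return 1.0 if i == 0 else beta * delta ** i


def discount_factor(s, t, beta=BETA, delta=DELTA):
    """Discount factor between periods t and t+1 from the perspective of self s (requires s <= t)."""
    if s > t:
        raise ValueError("self s can only evaluate periods t >= s")
    return qh_weight(t + 1 - s, beta, delta) / qh_weight(t - s, beta, delta)


if __name__ == "__main__":
    # Loss in value from delaying an immediate reward by one year: 1 - beta*delta.
    print("immediate reward delayed one year:", 1.0 - qh_weight(1))
    # Loss in value from delaying a distant reward by one more year: 1 - delta.
    print("distant reward delayed one year:  ", 1.0 - DELTA)
    # Self 0 and self 1 disagree about the trade-off between periods 1 and 2.
    print("self 0 view of periods 1 -> 2:", discount_factor(0, 1))
    print("self 1 view of periods 1 -> 2:", discount_factor(1, 1))
```

Self 0 applies δ to the trade-off between periods 1 and 2, while self 1 applies βδ to the same trade-off, which is the dynamic inconsistency described above.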
Recently, hyperbolic discount functions have been used to explain a wide range of anomalous economic choices, including procrastination, contract design, drug addiction, self-deception, retirement timing, and undersaving.[10] We focus here on the implications for life cycle savings decisions.

In the sections that follow, we analyze the “sophisticated” version of the hyperbolic model. Sophisticated hyperbolic consumers correctly predict that later selves will not honor the preferences of early selves. By contrast, “naive” consumers make current choices under the false belief that later selves will act in the interests of the current self. The assumption of naivete was first proposed by Strotz (1956), and has since been carefully studied by Akerlof (1991) and O’Donoghue and Rabin (1999a, 1999b, 2000). We return to a discussion of naifs in Section 9.

[7] The quasi-hyperbolic discount function is “hyperbolic” only in the sense that it captures the key qualitative property of the hyperbolic functions: a faster rate of decline in the short run than in the long run. Laibson (1997a) adopted the phrase “quasi-hyperbolic” to emphasize the connection to the hyperbolic-discounting literature in psychology (Ainslie, 1992). O’Donoghue and Rabin (1999a) call these preferences “present biased.” Krusell and Smith (2000a) call these preferences “quasi-geometric.”
[8] See Ainslie (1992) and Frederick et al. (2000).
[9] Note that a discount factor, say θ, is inversely related to the discount rate, −ln θ.
[10] For example, see Akerlof (1991), Laibson (1994, 1996, 1997a), Barro (1997), Diamond and Koszegi (1998), O’Donoghue and Rabin (1999a, 1999b, 2000), Benabou and Tirole (2000), Brocas and Carrillo (2000, 2001), Carrillo and Dewatripont (2000), Carrillo and Marriotti (2000), Della Vigna and Paserman (2000), Della Vigna and Malmendier (2001), Gruber and Koszegi (2001), and Krusell et al. (2000a, 2000b).

3. THE CONSUMPTION PROBLEM

Our benchmark model adopts the technological assumptions of standard “buffer-stock” consumption models like those originally developed by Deaton (1991) and Carroll (1992, 1997). These authors assume stochastic labor income and incomplete markets: consumers cannot borrow against uncertain future labor income. In this section, we consider a stripped-down stationary version of the standard buffer-stock model. In Section 8, we discuss a more complex life cycle model, with a richer set of institutional assumptions.

Our modeling assumptions for the stripped-down model divide naturally into four parts: the standard assumptions from the buffer-stock literature; the assumptions that make our model qualitatively hyperbolic; our equilibrium concept; and the technical assumptions that allow us to derive the Hyperbolic Euler Relation. We discuss the first three sets of assumptions herein. The fourth set of assumptions is presented in Section 4.1.

3.1. Buffer-Stock Assumptions

During period t, the consumer has cash on hand $x_t \ge 0$. She chooses a consumption level $c_t \in [0, x_t]$, which rules out borrowing. Whatever the consumer does not spend is saved, $s_t = x_t - c_t \in [0, x_t]$. The gross return on her savings is fixed, R ≥ 0, and next period she receives labor income $y_{t+1} \ge 0$. Cash on hand during period t + 1 is, therefore, $x_{t+1} = R(x_t - c_t) + y_{t+1}$. Labor income is independently and identically distributed over time with density f. The consumer cannot sell her uncertain stream of future labor-income payments, because of moral hazard and adverse selection, or because of prohibitions against indenturing. In other words, there is no asset market for labor.

3.2. Hyperbolic Preferences

We model an individual as a sequence of autonomous temporal selves. These selves are indexed by the respective periods, t = 0, 1, 2, . . . , in which they control the consumption choice. Self t receives payoff

$$E_t\left[U(c_t) + \beta\sum_{i=1}^{\infty}\delta^{i}\,U(c_{t+i})\right], \tag{3.1}$$

where β ∈ [0, 1], δ ∈ [0, 1), and U : [0, +∞) → [−∞, +∞). Our model nests the standard case of exponential discounting: β = 1, 0 ≤ δ < 1. Our model also nests the quasi-hyperbolic case: β < 1, 0 ≤ δ < 1.
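To fix ideas, the sketch below simulates the cash-on-hand recursion of Section 3.1 and evaluates a finite-sample analogue of payoff (3.1) along one simulated path. It is only an illustration under assumed ingredients: CRRA utility with ρ = 2, R = 1.03, income drawn uniformly from [0.5, 1.5], a hypothetical rule that consumes a fixed share of cash on hand, and truncation of the infinite sum at the simulated horizon. None of these values comes from the model description above, and the single path stands in for the expectation $E_t$.

```python
import math
import random


def crra(c, rho=2.0):
    """CRRA utility; rho = 2 is an assumed value used only for illustration."""
    if c <= 0:
        return float("-inf")
    if rho == 1.0:
        return math.log(c)
    return c ** (1.0 - rho) / (1.0 - rho)


def simulate_path(x0, consume, R=1.03, periods=50, seed=0):
    """Simulate x_{t+1} = R*(x_t - c_t) + y_{t+1} and return a list of (x_t, c_t)."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(periods):
        c = min(max(consume(x), 0.0), x)   # enforce c_t in [0, x_t]: no borrowing
        path.append((x, c))
        y_next = rng.uniform(0.5, 1.5)     # assumed i.i.d. income, bounded away from 0
        x = R * (x - c) + y_next
    return path


def payoff_of_first_self(consumption, beta=0.7, delta=0.957):
    """Truncated realized analogue of (3.1): U(c_0) + beta * sum_{i>=1} delta**i * U(c_i)."""
    head = crra(consumption[0])
    tail = sum(delta ** i * crra(c) for i, c in enumerate(consumption) if i >= 1)
    return head + beta * tail


if __name__ == "__main__":
    path = simulate_path(x0=2.0, consume=lambda x: 0.5 * x)  # hypothetical consumption rule
    consumption = [c for _, c in path]
    print("quasi-hyperbolic payoff:", payoff_of_first_self(consumption, beta=0.7))
    print("exponential payoff:     ", payoff_of_first_self(consumption, beta=1.0))
```

Setting `beta=1.0` recovers the exponential case nested by the model, which makes it easy to compare the two preference specifications on the same consumption path.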


3.3. Equilibrium

We analyze the set of perfect equilibria in stationary Markov strategies of the intrapersonal game with players (or selves) indexed by the non-negative integers. Because income is i.i.d., the only state variable is cash on hand $x_t$. We therefore restrict attention to consumption strategies C that depend only on $x_t$.

4. EXISTENCE AND UNIQUENESS

This technical discussion can be skipped by readers interested primarily in applications. Such readers may wish to move immediately to Section 5.

4.1. Technical Assumptions

We make the following technical assumptions:

U1 U has domain [0, +∞) and range [−∞, +∞)
U2 U is twice continuously differentiable on (0, +∞)
U3 $U' > 0$ on (0, +∞)
U4 there exist $0 < \underline{\rho} \le \bar{\rho} < +\infty$ such that $\underline{\rho} \le -cU''(c)/U'(c) \le \bar{\rho}$ for all c ∈ (0, +∞)
F1 f has domain (0, +∞) and range [0, +∞)
F2 f is twice continuously differentiable
F3 there exist $0 < \underline{y} < \bar{y} < +\infty$ such that f(y) = 0 for all $y \notin [\underline{y}, \bar{y}]$
D $\max\{\delta, \delta R^{1-\underline{\rho}}\} < 1$

Assumptions U1–U4 could be summarized by saying that U has bounded relative risk aversion. They are automatically satisfied if U has constant relative risk aversion. Assumptions F1–F3 could be summarized by saying that f is smooth, that the support of f is compact, and that 0 does not lie in the support of f. Assumption D ensures that the expected present discounted value of the consumer’s utility stream is always well defined. Further discussion of these assumptions can be found in Harris and Laibson (2001a).

4.2. The Bellman Equation of the Hyperbolic Consumer

The intrapersonal game of the hyperbolic consumer can be approached recursively as follows. Suppose that self $t$ has current-value function $W_t$ and continuation-value function $V_t$, and suppose that self $t+1$ has consumption function $C_{t+1}$ and current-value function $W_{t+1}$. Then, it follows from the Envelope theorem that

$$U'(C_{t+1}(x_{t+1})) = W'_{t+1}(x_{t+1}). \tag{4.1}$$

Next, it follows from the definition of $W_{t+1}$ and $V_t$ that

$$\beta V_t(x_{t+1}) = W_{t+1}(x_{t+1}) - (1-\beta)U(C_{t+1}(x_{t+1})). \tag{4.2}$$

Finally, it follows from the definition of $W_t$ that

$$W_t(x_t) = \max_{c \in [0, x_t]} \left\{ U(c) + \beta\delta \int V_t(R(x_t - c) + y)\, f(y)\,dy \right\}. \tag{4.3}$$

Hence,

$$\begin{aligned}
W_t(x_t) &= \max_{c \in [0, x_t]} \left\{ U(c) + \delta \int (W_{t+1} - (1-\beta)\, U \circ C_{t+1})(R(x_t - c) + y)\, f(y)\,dy \right\} \\
&\quad \text{[substituting for } V_t \text{ in equation (4.3) using equation (4.2)]} \\
&= \max_{c \in [0, x_t]} \left\{ U(c) + \delta \int (W_{t+1} - \varepsilon\, U \circ g \circ W'_{t+1})(R(x_t - c) + y)\, f(y)\,dy \right\} \\
&\quad \text{[where } \varepsilon = 1 - \beta \text{ and } g = (U')^{-1}\text{]} \\
&= (BW_{t+1})(x_t), \text{ say.}
\end{aligned}$$

This is the Bellman equation of the hyperbolic consumer.

4.3. The Finite-Horizon Case: Current-Value Functions

Suppose that the intrapersonal game of the hyperbolic consumer has a finite horizon $T < +\infty$. Then, in principle, the current-value functions can be shown to be unique by backward induction. Indeed, suppose for simplicity that the consumer has no bequest motive. Then, we expect $W_T = U$, $W_{T-1} = BW_T$, ..., $W_1 = BW_2$. In practice, we need to find a space of functions $\mathcal{W}$ such that, if $W_{t+1} \in \mathcal{W}$, then $W_t = BW_{t+1}$ is well defined and lies in $\mathcal{W}$. To this end, we make the following definition.

Definition 4.1. The function $g : (0, +\infty) \to \mathbb{R}$ is of locally bounded variation iff there exist increasing functions $g_+ : (0, +\infty) \to \mathbb{R}$ and $g_- : (0, +\infty) \to \mathbb{R}$ such that $g = g_+ - g_-$.

Now, let us say that two functions of locally bounded variation are equivalent iff they are equal at all points of continuity. Let $BV^0_{loc}((0, +\infty))$ denote the space of equivalence classes of functions of locally bounded variation, and let $BV^1_{loc}((0, +\infty))$ denote the space of equivalence classes of functions $W$ such that both $W$ and $W'$ are of locally bounded variation. Then, the correct choice of space for our current-value function is $\mathcal{W} = BV^1_{loc}((0, +\infty))$.

To see this, note first that, if $W_{t+1} \in BV^1_{loc}((0, +\infty))$, then $W'_{t+1}$ is a function of locally bounded variation. Hence, $W'_{t+1}$ is uniquely defined, except at a countable set of points, and $BW_{t+1}$ is uniquely defined at all points. Second, consider the operator $b_\gamma$ given by the formula

$$(b_\gamma W_{t+1})(x_t) = U(\gamma x_t) + \delta \int (W_{t+1} - \varepsilon\, U \circ g \circ W'_{t+1})(R(1-\gamma)x_t + y)\, f(y)\,dy.$$

Then

$$BW_{t+1} = \sup_{\gamma \in [0,1]} \{ b_\gamma W_{t+1} \}.$$

In other words, $BW_{t+1}$ is the upper envelope of the functions $b_\gamma W_{t+1}$. Third, note that $b_\gamma W_{t+1}$ is twice continuously differentiable. Moreover, there exists a continuous function $a : (0, +\infty) \to [0, +\infty]$ such that, for all $\gamma \in [0, 1]$,

$$|b_\gamma W_{t+1}|,\ |(b_\gamma W_{t+1})'|,\ |(b_\gamma W_{t+1})''| \le a \quad \text{on } (0, +\infty).$$

In particular, there exists a twice continuously differentiable convex function $\kappa : (0, +\infty) \to \mathbb{R}$ such that, for all $\gamma \in [0, 1]$, $b_\gamma W_{t+1} + \kappa$ is convex. Hence

$$BW_{t+1} = \sup_{\gamma \in [0,1]} \{ b_\gamma W_{t+1} \} = \sup_{\gamma \in [0,1]} \{ b_\gamma W_{t+1} + \kappa \} - \kappa.$$

In other words, $BW_{t+1}$ is the difference of two convex functions. In light of the following result, this is exactly what we need.

Proposition 4.2. Suppose that $W : (0, +\infty) \to \mathbb{R}$. Then, $W \in BV^1_{loc}((0, +\infty))$ iff $W$ is the difference of two convex functions.

4.4. The Finite-Horizon Case: Consumption Functions

Suppose, again, that the intrapersonal game of the hyperbolic consumer has a finite horizon $T < +\infty$ and that the consumer has no bequest motive. Then, the consumption function of self $T$ is unique and is given by the formula $C_T(x_T) = x_T$; and, for all $1 \le t \le T-1$, the consumption function of self $t$ is any function such that

$$C_t(x_t) \in \operatorname*{argmax}_{c \in [0, x_t]} \left\{ U(c) + \delta \int (W_{t+1} - \varepsilon\, U \circ g \circ W'_{t+1})(R(x_t - c) + y)\, f(y)\,dy \right\}$$

for all $x_t \in [0, +\infty)$. Now, $C_t = g \circ W'_t$ is uniquely defined and continuous, except on a countable set of points. Because this set of points has measure zero, it is encountered with probability zero. It follows that any two consumption functions of self $t$ are observationally equivalent. By the same token, any two equilibria are observationally equivalent. This uniqueness claim can be made precise by viewing consumption functions as elements of the space $BV^0_{loc}((0, +\infty))$.
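The backward induction described for the finite-horizon case can be sketched on a grid. This is a rough illustration under assumptions that are not in the text: income takes three equiprobable values instead of following a smooth density, cash on hand lives on a fixed grid with linear interpolation that is clamped at the grid ends, consumption is chosen from a finite set of shares of cash on hand, and utility is isoelastic with ρ = 2. The recursion is written in terms of the continuation value rather than the current value, which is equivalent by (4.2) and (4.3). All names are hypothetical.

```python
import bisect

BETA, DELTA, RHO, R = 0.7, 0.9571, 2.0, 1.0375   # illustrative parameter values
INCOMES = [0.7, 1.0, 1.3]                          # hypothetical equiprobable income draws
GRID = [0.05 + 0.1 * i for i in range(60)]         # assumed cash-on-hand grid
SHARES = [j / 40 for j in range(1, 41)]            # candidate consumption shares in (0, 1]


def utility(c):
    """Isoelastic utility (assumed form)."""
    return c ** (1.0 - RHO) / (1.0 - RHO)


def interp(values, x):
    """Linear interpolation on GRID, clamped at both ends (a simplification)."""
    if x <= GRID[0]:
        return values[0]
    if x >= GRID[-1]:
        return values[-1]
    k = bisect.bisect_right(GRID, x)
    w = (x - GRID[k - 1]) / (GRID[k] - GRID[k - 1])
    return (1.0 - w) * values[k - 1] + w * values[k]


def expected(values, savings):
    """Average continuation value over next-period income draws."""
    return sum(interp(values, R * savings + y) for y in INCOMES) / len(INCOMES)


def solve(horizon):
    """Consumption functions over GRID; index 0 is self T, index 1 is self T-1, ..."""
    cons = [list(GRID)]                    # self T consumes everything
    cont = [utility(x) for x in GRID]      # continuation value delivered by self T
    for _ in range(horizon - 1):
        new_cons, new_cont = [], []
        for x in GRID:
            # The acting self discounts the continuation value by beta * delta.
            best = max(SHARES, key=lambda g: utility(g * x)
                       + BETA * DELTA * expected(cont, (1.0 - g) * x))
            c = best * x
            new_cons.append(c)
            # Earlier selves evaluate this period without the extra factor beta.
            new_cont.append(utility(c) + DELTA * expected(cont, x - c))
        cons.append(new_cons)
        cont = new_cont
    return cons


policies = solve(4)
```

The two discount factors appear in different places: `BETA * DELTA` governs the choice of the acting self, while `DELTA` alone is used when the resulting value is handed back to earlier selves.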

4.5. The Infinite-Horizon Case: Existence

To establish existence in the finite-horizon case, we showed that the Bellman operator $B$ was a self-map of the space $BV^1_{loc}((0, +\infty))$. To establish existence in the infinite-horizon case, we need to strengthen this result by showing that there is a nonempty compact convex subset $\mathcal{K}$ of $BV^1_{loc}((0, +\infty))$, such that $B$ is a self-map of $\mathcal{K}$. Define $\underline{V} : [0, +\infty) \to [-\infty, +\infty)$ by the formula

$$\underline{V}(x) = U(x) + \frac{\delta}{1-\delta} \int U(y)\, f(y)\,dy,$$

define $\bar{V} : [0, +\infty) \to [-\infty, +\infty)$ by the formula

$$\bar{V}(x) = U(x) + \sum_{t=1}^{\infty} \delta^t\, U\!\left( \sum_{s=0}^{t+1} R^s \bar{y} + R^t x \right),$$

and, for all Borel measurable $V \in [\underline{V}, \bar{V}]$, define $WV : [0, +\infty) \to [-\infty, +\infty)$ by the formula

$$(WV)(x) = \max_{\gamma \in [0,1]} \left\{ U(\gamma x) + \beta\delta \int V(R(1-\gamma)x + y)\, f(y)\,dy \right\}.$$

Finally, put $\underline{V}^- = -(\underline{V} \wedge 0)$ and $\bar{V}^+ = +(\bar{V} \vee 0)$, define $N_1 : [0, +\infty) \to [0, +\infty)$ by the formula

$$N_1(x) = \underline{V}^-(\underline{y}) \vee \bar{V}^+(Rx + \bar{y}),$$

and define $N_2 : [0, +\infty) \to [0, +\infty)$ by the formula

$$N_2(x) = (U'(x)/x) \vee N_1(x).$$

Then:

Theorem 4.3 [Global Regularity]. There exists $K > 0$ such that, for all $V \in [\underline{V}, \bar{V}]$,
1. $(1-\beta)U + \beta \underline{V} \le WV \le (1-\beta)U + \beta \bar{V}$,
2. $U' \le (WV)' \le U' \vee (K N_1)$, and
3. $(WV)'' \ge -K N_2$
on $(0, +\infty)$.

The required set $\mathcal{K}$ is then simply the set of $W \in BV^1_{loc}((0, +\infty))$ that satisfy the three estimates in this theorem.

4.6. The Infinite-Horizon Case: Uniqueness

To establish uniqueness in the infinite-horizon case, we begin by showing that, no matter what the initial cash on hand of the consumer, there exists a finite interval from which the dynamics of wealth never exit.

Theorem 4.4 [Absorbing Interval]. Suppose that $\delta R < 1$. Then, for all $x_0 \in [0, +\infty)$, there exist $\bar{\beta}_1 \in [0, 1)$ and $\bar{X} \in [x_0, +\infty)$ such that, for all $\beta \in [\bar{\beta}_1, 1]$ and all equilibria $C$ of the infinite-horizon model, $R(x - C(x)) + y \in [\underline{y}, \bar{X}]$ for all $x \in [0, \bar{X}]$ and all $y \in [\underline{y}, \bar{y}]$.

We are now in a position to prove uniqueness.

Theorem 4.5 [Uniqueness]. Suppose that $\delta R < 1$, and that $U$ is three times continuously differentiable on $(0, +\infty)$. Then, for all $x_0 \in [0, +\infty)$, there exist $\bar{\beta}_2 \in [0, 1)$ and $\bar{X} \in [x_0, +\infty)$ such that, for all $\beta \in [\bar{\beta}_2, 1]$, equilibrium is unique on $[0, \bar{X}]$.

Notice that Theorem 4.5 is a local uniqueness theorem: the critical value $\bar{\beta}_2$ will in general depend on $x_0$. Local uniqueness is, however, all that we need: if initial cash on hand is $x_0$ and $\beta \in [\bar{\beta}_2, 1]$, then levels of cash on hand outside the interval $[0, \bar{X}]$ will not be observed in any equilibrium. We do not know whether Theorem 4.5 has a global analog.

Proof. See the Appendix.

4.7. The Finite-Horizon Case: Robustness

By combining our existence and uniqueness results for the finite-horizon case with our regularity results, we can show that the equilibrium of the finite-horizon model depends continuously on the parameters $U$, $f$, $\beta$, and $\delta$. This leaves one parameter unaccounted for: $T$. This parameter plays an important role in empirical applications. For example, simulations of calibrated life cycle models usually proceed by truncating the life cycle at some point. It is therefore crucial to verify that the equilibrium of the chosen model is robust with respect to the horizon chosen for the model.

The simplest way to establish robustness would be to show that there is a unique equilibrium of the infinite-horizon model. If we could show this, then it would follow at once from our regularity results that this equilibrium depended continuously on $T$. More precisely, note that $T$ is chosen from the space $\mathbb{N} \cup \{\infty\}$. All the points of this space are isolated except for the point $\infty$, which is an accumulation point. By saying that the equilibrium depends continuously on $T$, we therefore mean that there is a unique equilibrium when $T = \infty$ and, for all $\eta > 0$, there exists a $T_0 < \infty$ such that, for all $T > T_0$, the equilibrium of the model with horizon $T$ is within $\eta$ of the equilibrium of the model with horizon $\infty$. In other words, the choice of horizon for the model makes very little difference to the equilibrium, provided that this horizon is sufficiently far into the future.

Unfortunately, the proof of Theorem 4.5 shows only that, if $\beta$ is sufficiently close to 1, then there is a unique stationary equilibrium of the model. This leaves open two possibilities. First, there may be more than one stationary equilibrium if $\beta$ is not close to 1. Second, there may be nonstationary equilibria. It may be very difficult to make progress with the first possibility: although it may be possible to identify other regions of parameter space in which there is a unique stationary equilibrium, it may not be true that there is a unique equilibrium for all choices of the parameters. After all, we are analyzing a game. It may, however, be possible to make progress with the second possibility: what is needed here is a proof that the Bellman operator is a contraction mapping. The proof of Theorem 4.5 falls short of this goal: it shows only that the Bellman operator is a contraction mapping when confined to the set of current-value functions of stationary equilibria. Nonetheless, the available evidence suggests that life cycle simulations are probably robust to the choice of horizon provided that $\beta$ is sufficiently close to 1.

5. GENERALIZED EULER EQUATION

In this section, we discuss the hyperbolic analog of the standard Euler Relation.[11]

5.1. Heuristic Derivation of the Hyperbolic Euler Relation

Suppose that $C$ is an equilibrium consumption function. Adopt the perspective of self $t$. Because all future selves use the consumption function $C$, and because self $t$ uses the same discount factor $\delta$ from period $t+1$ onward, her continuation-value function $V$ solves the recursive equation

$$V(x_{t+1}) = U(C(x_{t+1})) + E_{t+1}[\delta V(R(x_{t+1} - C(x_{t+1})) + y_{t+2})]. \tag{5.1}$$

Note that $V(x_{t+1})$ is the expectation, conditional on $x_{t+1}$, of the present discounted value of the utility stream that starts in period $t+1$. Self $t$ uses discount factor $\beta\delta$ at time $t$. Her current-value function $W$ therefore solves the equation

$$W(x_t) = U(C(x_t)) + E_t[\beta\delta V(R(x_t - C(x_t)) + y_{t+1})]. \tag{5.2}$$

Moreover,

$$C(x_t) \in \operatorname*{argmax}_{c \in [0, x_t]} \; U(c) + E_t[\beta\delta V(R(x_t - c) + y_{t+1})], \tag{5.3}$$

because consumption is chosen by the current self. The first-order condition associated with (5.3) implies that

$$U'(C(x_t)) \ge E_t[R\beta\delta V'(R(x_t - C(x_t)) + y_{t+1})], \tag{5.4}$$

with equality if $C(x_t) < x_t$. The first-order condition and envelope theorem together imply that the shadow value of cash on hand equals the marginal utility of consumption:

$$W'(x_t) = U'(C(x_t)). \tag{5.5}$$

[11] The material from this section was first published in Harris and Laibson (2001).

Finally, $V$ and $W$ are linked by the equation

$$\beta V(x_{t+1}) = W(x_{t+1}) - (1-\beta)U(C(x_{t+1})). \tag{5.6}$$
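The link (5.6) can be checked directly from the two recursions (5.1) and (5.2). The short derivation below is a sketch that writes $x' = R(x - C(x)) + y'$ for next-period cash on hand; this shorthand is introduced here only for compactness.

```latex
\begin{aligned}
V(x) &= U(C(x)) + \delta\, E[V(x')]
  && \text{from (5.1)} \\
W(x) &= U(C(x)) + \beta\delta\, E[V(x')]
  && \text{from (5.2)} \\
\beta V(x) &= \beta U(C(x)) + \beta\delta\, E[V(x')] \\
  &= \beta U(C(x)) + \bigl(W(x) - U(C(x))\bigr) \\
  &= W(x) - (1-\beta)\, U(C(x)).
\end{aligned}
```

The only difference between the two value functions is therefore the weight placed on the current-period utility term.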

These expressions can be combined to yield the Strong Hyperbolic Euler Relation. Indeed, we have

$$\begin{aligned}
U'(C(x_t)) &\ge E_t[R\beta\delta V'(R(x_t - C(x_t)) + y_{t+1})] \\
&\quad \text{[this is just the first-order condition (5.4)]} \\
&= E_t[R\delta(W'(x_{t+1}) - (1-\beta)U'(C(x_{t+1}))C'(x_{t+1}))] \\
&\quad \text{[differentiating equation (5.6) with respect to } x_{t+1} \text{ and substituting in]} \\
&= E_t[R\delta(U'(C(x_{t+1})) - (1-\beta)U'(C(x_{t+1}))C'(x_{t+1}))] \\
&\quad \text{[from the analog of equation (5.5) for self } t+1\text{]}.
\end{aligned}$$

Rearranging yields

$$U'(C(x_t)) \ge E_t[R(C'(x_{t+1})\beta\delta + (1 - C'(x_{t+1}))\delta)\,U'(C(x_{t+1}))], \tag{5.7}$$

with equality if $C(x_t) < x_t$. This is the Hyperbolic Euler Relation. When $\beta = 1$, this relation reduces to the well-known Exponential Euler Relation

$$U'(C(x_t)) \ge E_t[R\delta U'(C(x_{t+1}))].$$

Intuitively, the marginal utility of consuming an additional dollar today, $U'(C_t)$, must equal the marginal utility of saving that dollar. A saved dollar grows to $R$ dollars by next year. Utilities next period are discounted with factor $\delta$. Hence, the value of today's marginal savings is given by $E_t[R\delta U'(C_{t+1})]$. The expectation operator integrates over uncertain future consumption.

The difference between the Hyperbolic Euler Relation and the Exponential Euler Relation is that, in the former, the constant exponential discount factor, $\delta$, is replaced by the effective discount factor, namely

$$C'(x_{t+1})\beta\delta + (1 - C'(x_{t+1}))\delta.$$

This effective discount factor is a weighted average of the short-run discount factor $\beta\delta$ and the long-run discount factor $\delta$. The respective weights are $C'(x_{t+1})$, the marginal propensity to consume out of liquid wealth, and $1 - C'(x_{t+1})$, the marginal propensity to save out of liquid wealth. Because $\beta < 1$, the effective discount factor is stochastic and endogenous to the model.

In the sophisticated hyperbolic model, the effective discount factor is negatively related to the future marginal propensity to consume (MPC). To gain intuition for this effect, consider a consumer at time 0 who is thinking about saving a marginal dollar for the future. The consumer at time zero ("self 0") expects future selves to overconsume relative to the consumption rate that self 0 prefers those future selves to implement. Hence, on the equilibrium path, self 0 values marginal saving more than marginal consumption at any future time period. From self 0's perspective, therefore, it matters how a marginal unit of wealth at time period 1 will be divided between savings and consumption by self 1. Self 1's MPC determines this division. Because self 0 values marginal saving more than marginal consumption at time period 1, self 0 values the future less the higher the expected MPC at time period 1.

The effective discount factor in the Hyperbolic Euler Relation varies significantly with cash on hand. Consumers who expect to have low levels of future cash on hand will expect $C'(x_{t+1})$ to be close to one,[12] implying that the effective discount factor will approximately equal $\beta\delta$. Assuming that periods are annual with a standard calibration of $\beta = 0.7$ and $\delta = 0.95$, the effective discount rate would be $-\ln(0.7 \times 0.95) = 0.41$. By contrast, consumers with high levels of future cash on hand will expect $C'(x_{t+1})$ to be close to zero,[13] implying that the effective discount factor will approximately equal $\delta$. In this case, the effective discount rate will be $-\ln(0.95) = 0.05$. The simulations reported below confirm these claims about the shape of $C$.

5.2. Exact Derivation

If the consumption function is discontinuous, then the derivation of the Hyperbolic Euler Relation is not valid. However, the consumption function is always of locally bounded variation. This property can be used to derive a weaker version of the Hyperbolic Euler Relation. This weaker version reduces to the Hyperbolic Euler Relation if the consumption function is Lipschitz-continuous. Moreover, it can be shown that the consumption function is indeed Lipschitz-continuous when $\beta$ is sufficiently close to 1 (Harris and Laibson 2001a).

6. NUMERICAL SOLUTION AND CALIBRATION OF THE MODEL

We complement our theoretical analysis with numerical simulations. Numerical results help to build intuition and provide quantitative assessment of qualitative effects. In this section, we describe our strategy for simulating the one-asset, infinite-horizon model. The same broad strategy applies to the institutionally richer simulations that we describe in Section 8. We calibrate our stripped-down model with the same parameter values used by ALRTW (2001a). Specifically, $\rho = 2$, $\beta = 0.7$, $\delta = 0.9571$, and $R = 1.0375$.[14]

[12] Low levels of cash on hand imply that the agent is liquidity-constrained. Hence, low levels of cash on hand imply a high MPC.
[13] When the agent is not liquidity-constrained, marginal consumption is approximately equal to the annuity value of marginal increments of wealth. Hence, the local slope of the consumption function is close to the real interest rate.
[14] ALRTW choose all of these parameters ex ante, except $\delta$. Then, $\delta$ is chosen so that the simulated data match the empirical median wealth to income ratio of 50- to 59-year-old household heads. ALRTW also use this method to infer the preferences of exponential consumers ($\beta = 1$). They find that $\delta_{exponential} = .9437$.
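With these calibrated values, the effective discount factor of the Hyperbolic Euler Relation in Section 5 can be tabulated for a few values of next period's marginal propensity to consume. The sketch below is illustrative only; the function names and the particular MPC values are hypothetical choices, not part of the original calibration.

```python
import math


def effective_discount_factor(mpc, beta, delta):
    """Weighted average of beta*delta (weight: future MPC) and delta (weight: 1 - MPC)."""
    return mpc * beta * delta + (1.0 - mpc) * delta


def effective_discount_rate(mpc, beta, delta):
    """Continuously compounded rate implied by the effective discount factor."""
    return -math.log(effective_discount_factor(mpc, beta, delta))


BETA, DELTA = 0.7, 0.9571  # calibration quoted in the text
rates = {mpc: effective_discount_rate(mpc, BETA, DELTA)
         for mpc in (0.0, 0.25, 0.5, 1.0)}  # hypothetical MPC values
```

The rate rises monotonically with the expected MPC, from $-\ln\delta$ when the MPC is zero to $-\ln(\beta\delta)$ when the MPC is one.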

Figure 7.2. Calibrated consumption function. [Plot of the consumption function C(x) and the 45-degree line against cash-on-hand (x).] The consumption function is based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375, a = 5.

To capture labor-income uncertainty, we adopt a shifted symmetric Beta density with support $[\varepsilon, 1 + \varepsilon]$:

$$f(Y) \propto (Y - \varepsilon)^{a-1} (1 + \varepsilon - Y)^{a-1},$$

where $a > 0$ and $\varepsilon$ is positive, but close to zero. Hence, $Y$ has mean $\tfrac{1}{2} + \varepsilon$. If $a > 1$, the density is bell-shaped and continuous on $\mathbb{R}$. Moreover, if $a > 3$, then the density is twice continuously differentiable, and therefore satisfies the regularity conditions of Section 4. We set $a = 5$, implying that $\sigma(Y)/E[Y] = 0.30$. This value is comparable with the value of $\sigma(Y)/E[Y]$ implied by standard income processes estimated from the Panel Study of Income Dynamics. For example, ALRTW estimate a process for $\ln Y$ that has two components: an AR(1) process and i.i.d. noise.[15] Their empirically estimated process implies $\sigma(Y)/E[Y] = 0.32$.[16]

Figure 7.2 reports the equilibrium consumption function generated by our infinite-horizon, one-asset simulation.[17] The function is continuous, monotonic, and concave. It appears smooth, except for the point at which the liquidity constraint begins to bind. In the next section, we identify cases in which these regularity properties cease to hold.

[15] Specifically, $\ln Y_t$ = [household fixed effects] + [polynomial in age] + $u_t + \eta_t$, where $u_t = \alpha u_{t-1} + \varepsilon_t$, and $\eta_t$ and $\varepsilon_t$ are white noise.
[16] This is an unconditional normalized standard deviation. The empirical conditional normalized standard deviation, $\sqrt{E_{t-1}(Y_t - \bar{Y}_t)^2}/E[Y]$, is .23.
[17] To simulate our model numerically, we adopt a numerical solution algorithm that does not interpolate between points in the state space. Specifically, our algorithm discretizes the state space and forces the consumer to make choices that keep the state variables on the discrete partition. We believe that our algorithm successfully approximates the behavior that would arise in a continuous state space. Most importantly, we find that once our partition is made sufficiently fine, further refinement has no effect on our simulation results.

7. PROPERTIES OF THE CONSUMPTION FUNCTION

The consumption function in Figure 7.2 is continuous, monotonic, and concave. However, hyperbolic consumption functions need not have these desirable properties (Laibson, 1997b, Morris and Postlewaite, 1997, O'Donoghue and Rabin, 1999a, Harris and Laibson, 2001a, and Krusell and Smith, 2000). In this section we characterize the general properties of the hyperbolic consumption function. We first discuss the kinds of pathologies that can arise. We then discuss the regularity conditions that eliminate these pathologies.

7.1. Pathologies: Violations of Continuity, Monotonicity, and Concavity

To develop intuition for the existence of hyperbolic pathologies, we consider a finite-horizon version of the model of Section 3.[18] We assume that the stream of income is deterministic. We apply backward induction arguments to solve for the equilibrium policies. First, consider the strategy of self $T$. Trivially, self $T$ sets $c_T = x_T$. Self $T$ consumes all available cash on hand. Now, consider the problem of self $T-1$. Self $T-1$ knows that any resources left to self $T$ will be consumed by self $T$. So, self $T-1$ chooses $c_{T-1}$ to maximize

$$U(c_{T-1}) + \beta\delta U(c_T)$$

subject to the constraints

$$x_T = R(x_{T-1} - c_{T-1}) + y_T, \qquad c_{T-1} \le x_{T-1}, \qquad c_T = x_T.$$

The first constraint is the dynamic budget constraint. The second constraint is the liquidity constraint. The third constraint reflects the equilibrium strategy of self $T$. Given this problem, it is straightforward to show that, when the liquidity constraint does not bind, self $T-1$ picks $c_{T-1}$ such that

$$U'(c_{T-1}) = \beta\delta R\, U'(R \cdot (x_{T-1} - c_{T-1}) + y_T).$$

When the liquidity constraint binds, self $T-1$ sets $c_{T-1} = x_{T-1}$. Represent self $T-1$'s equilibrium policy function as $C_{T-1}(x_{T-1})$.

[18] See Laibson (1997b) for the original version of this example.
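For isoelastic utility, the policy of self $T-1$ has a closed form, which makes the liquidity-constraint threshold explicit. The sketch below assumes $U'(c) = c^{-\rho}$ and a deterministic income of 1, and uses parameter values β = 0.7, δ = 0.9571, ρ = 2, R = 1.0375 for illustration; the function names are hypothetical. Solving the first-order condition gives $c = (Rx + y)/(R + \kappa)$ with $\kappa = (\beta\delta R)^{1/\rho}$, and the constraint binds whenever this exceeds $x$, that is, for $x < y/\kappa$.

```python
BETA, DELTA, RHO, R, Y = 0.7, 0.9571, 2.0, 1.0375, 1.0  # illustrative values


def consumption_T_minus_1(x, beta=BETA, delta=DELTA, rho=RHO,
                          gross_return=R, income=Y):
    """Policy of self T-1 when U'(c) = c**(-rho) and income is deterministic."""
    kappa = (beta * delta * gross_return) ** (1.0 / rho)
    # Interior solution of c**(-rho) = beta*delta*R * (R*(x - c) + y)**(-rho).
    interior = (gross_return * x + income) / (gross_return + kappa)
    # The liquidity constraint c <= x binds when the interior solution exceeds x.
    return min(x, interior)


def constraint_threshold(beta=BETA, delta=DELTA, rho=RHO,
                         gross_return=R, income=Y):
    """Cash on hand at which the liquidity constraint of self T-1 ceases to bind."""
    return income / (beta * delta * gross_return) ** (1.0 / rho)


x_hat = constraint_threshold()
```

Below `x_hat` the policy lies on the 45-degree line; above it the policy is linear in cash on hand with slope $R/(R + \kappa) < 1$, so the consumption function of self $T-1$ has a kink at `x_hat`.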


Now, consider the problem of self $T-2$. Self $T-2$ chooses $c_{T-2}$ to maximize

$$U(c_{T-2}) + \beta\delta U(c_{T-1}) + \beta\delta^2 U(c_T),$$

subject to the constraints

$$x_{T-1} = R(x_{T-2} - c_{T-2}) + y_{T-1}, \qquad c_{T-2} \le x_{T-2}, \qquad c_{T-1} = C_{T-1}(x_{T-1}), \qquad c_T = x_T.$$

The first constraint is the dynamic budget constraint. The second constraint is the liquidity constraint. The third and fourth constraints represent the strategies of selves $T-1$ and $T$. To develop intuition for the optimal policy of self $T-2$, consider the continuation-value function of self $T-2$,

$$V_{T-1}(x_{T-1}) = U(C_{T-1}(x_{T-1})) + \delta U(R(x_{T-1} - C_{T-1}(x_{T-1})) + y_T).$$

From self $T-2$'s perspective, wealth at time $T-1$ has a value $\beta\delta V_{T-1}(x_{T-1})$. There exists a threshold wealth level $x_{T-1} = \hat{x}$ at which the liquidity constraint for self $T-1$ ceases to bind. In the region to the left of $\hat{x}$, all marginal wealth is consumed in period $T-1$, implying

$$V'_{T-1}(\hat{x}-) = U'(C_{T-1}(\hat{x})).$$

In the region to the right of $\hat{x}$, some marginal wealth is passed on to period $T$, implying

$$V'_{T-1}(\hat{x}+) = C'_{T-1}(\hat{x}) \cdot U'(C_{T-1}(\hat{x})) + \delta R (1 - C'_{T-1}(\hat{x}))\, U'(C_T(\hat{x})),$$

where $C_T(\hat{x})$ denotes period-$T$ consumption when $x_{T-1} = \hat{x}$. Note that at $x_{T-1} = \hat{x}$, self $T-1$ is indifferent between marginal consumption in period $T-1$ and marginal consumption in period $T$. So,

$$U'(C_{T-1}(\hat{x})) = R\beta\delta\, U'(C_T(\hat{x})).$$

Substituting this relationship into the previous expression yields

$$V'_{T-1}(\hat{x}+) = \left[ C'_{T-1}(\hat{x}) + \frac{1}{\beta}(1 - C'_{T-1}(\hat{x})) \right] U'(C_{T-1}(\hat{x})) > U'(C_{T-1}(\hat{x})) = V'_{T-1}(\hat{x}-).$$

Hence the continuation-value function $V_{T-1}$ has a kink at $x_{T-1} = \hat{x}$. At this point, the slope of the value function discretely rises. This kink implies that the equilibrium consumption function of self $T-2$ will have a downward discontinuity. To understand why, note that self $T-2$ will never select a value of $c_{T-2} > 0$ such that $R(x_{T-2} - c_{T-2}) + y_{T-1} = x_{T-1} = \hat{x}$. If $x_{T-1} = \hat{x}$ did hold, self $T-2$ could raise her welfare by either cutting or raising consumption. If $U'(c_{T-2}) < \beta\delta R V'_{T-1}(\hat{x}+)$, self $T-2$ could increase welfare by cutting consumption (with marginal cost $U'(c_{T-2})$) and raising saving (with marginal benefit $\beta\delta R V'_{T-1}(\hat{x}+)$). If $U'(c_{T-2}) \ge \beta\delta R V'_{T-1}(\hat{x}+)$, self $T-2$ could increase welfare by raising consumption (with marginal benefit $U'(c_{T-2})$) and lowering saving (with marginal cost $\beta\delta R V'_{T-1}(\hat{x}-) < \beta\delta R V'_{T-1}(\hat{x}+) \le U'(c_{T-2})$).

Self $T-2$ makes equilibrium choices that avoid the region of low continuation marginal utilities, in the neighborhood to the left of $x_{T-1} = \hat{x}$, by jumping to the region of high continuation marginal utilities to the right of $x_{T-1} = \hat{x}$. This avoidance can be achieved only with an equilibrium consumption function that has a discrete downward discontinuity. Figure 7.3 plots the equilibrium consumption functions for selves $T-2$, $T-1$, and $T$ for the case in which the instantaneous utility function is isoelastic and $y_t = 1$ for all $t$.

Intuitively, the pathology described here arises because of a special kind of strategic interaction. Self $T-2$'s consumption function discontinuously declines because self $T-2$ has an incentive to push self $T-1$ over the wealth threshold $\hat{x}$ at which self $T-1$ has a kink in its consumption function. Self $T-2$ is willing to discretely cut its own consumption to push $T-1$ over the $\hat{x}$ threshold, because the marginal returns to the right of $\hat{x}$ are greater than the marginal returns to the left of $\hat{x}$ from self $T-2$'s perspective. If this example were extended another period, we could also demonstrate that the optimal choices of self $T-3$ will violate the Hyperbolic Euler Equation. Finally, all of these pathologies would continue to arise, even if a small amount of smooth noise were added to the income process.

7.2. Sufficient Conditions for Continuity, Monotonicity, and Concavity of the Consumption Function

The previous subsection provides an example of the kinds of pathologies that can arise in hyperbolic models. However, these pathologies do not arise when the model is calibrated with empirically sensible parameter values (see Figure 7.2). In this section, we identify the parameter regions that generate the pathologies.

First, when $\beta$ is close to one, the discontinuities and nonmonotonicities vanish. Harris and Laibson (2001a) prove this claim formally. Intuitively, when $\beta$ is close to one, the hyperbolic consumption function converges to the exponential consumption function, which is continuous and monotonic.[19] Likewise, when $\beta$ is close to one, the hyperbolic consumption function matches the concavity of the exponential consumption function. Carroll and Kimball (1996) provide sufficient conditions for exponential concavity ($U$ in the HARA class), although they do not handle the case of binding liquidity constraints.

Figure 7.4 graphically demonstrates the comparative static on $\beta$. We plot the consumption functions generated by $\beta$ values {0.1, 0.2, 0.3, ..., 0.7}.[20] The consumption functions are vertically shifted so they do not overlap. Recall that $\beta = 0.7$ corresponds to our benchmark calibration. As $\beta$ falls below 0.4, the consumption function becomes increasingly irregular. However, regularity returns as $\beta$ falls to zero: in a neighborhood of $\beta = 0$, the consumption function coincides with the 45-degree line.

Pathologies are also controlled by the curvature of the utility function. Our simulation results imply that increasing $\rho$ eliminates irregularities. Figure 7.5 graphically demonstrates the comparative static on $\rho$. We plot the consumption functions generated by $\rho$ values {0.5, 0.75, 1, 1.25}.[21] The consumption functions are again vertically shifted. Recall that $\rho = 2$ corresponds to our benchmark calibration. As $\rho$ falls below 1.25, the consumption function becomes increasingly irregular. The irregularities increase as $\rho$ falls, because low curvature augments the feedback effects that engender the irregularities. Specifically, when the utility function is relatively less bowed, it is relatively less costly to strategically cut consumption today to push future selves over critical wealth thresholds.

Finally, decreasing the variance of the income process increases the degree of irregularity. Figure 7.6 graphically demonstrates the comparative static on $a$. We plot the consumption functions generated by $a$ values {25, 50, 100, 200, 400}.[22] These $a$ values correspond to $\sigma(Y)/E[Y]$ values of {0.14, 0.10, 0.07, 0.05, 0.04}. Recall that $a = 5$ (i.e., $\sigma(Y)/E[Y] = 0.30$) corresponds to our benchmark calibration. As $a$ rises above 25 (i.e., $\sigma(Y)/E[Y]$ falls below 0.14), the consumption function becomes increasingly irregular. The irregularities increase as $a$ increases, because high $a$ values correspond to low levels of income volatility. Low volatility makes it easier for early selves to predict future wealth levels, and to strategically push later selves over critical wealth thresholds.

In summary, irregularities vanish when $\beta$ is close to one, risk aversion is high, and uncertainty is high. At the benchmark calibration, the pathologies do not arise. Moreover, our model omits some sources of uncertainty that would only reinforce the regularity of our benchmark consumption functions. For example, our model omits shocks to preferences and asset return uncertainty.[23]

[19] All of the convergence results apply to an absorbing interval of $x$ values. See Section 4 for a definition and discussion of such absorbing intervals.
[20] We adopt the baseline parameter values a = 5, δ = .9571, ρ = 2, R = 1.0375.
[21] We adopt the baseline parameter values a = 5, β = .7, δ = .9571, R = 1.0375.
[22] We adopt the baseline parameter values β = .7, δ = .9571, ρ = 2, R = 1.0375.

Figure 7.3. Consumption functions in periods T − 2, T − 1, and T. [Three panels plotting consumption in period T − 2, period T − 1, and period T against cash-on-hand.] The consumption functions are based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375.

Figure 7.4. Variation in β. [Plot of vertically shifted consumption functions C(x) against cash-on-hand (x) for β = .1, .2, .3, .4, .5, .6, .7.] The consumption functions are based on simulations in which δ = .9571, ρ = 2, R = 1.0375, a = 5.

Figure 7.5. Variation in the coefficient of relative risk aversion (ρ). [Plot of vertically shifted consumption functions C(x) against cash-on-hand (x) for ρ = 0.50, 0.75, 1, 1.25.] The consumption functions are based on simulations in which β = .7, δ = .9571, R = 1.0375, a = 5.

Figure 7.6. Variation in income uncertainty (a). [Plot of vertically shifted consumption functions C(x) against cash-on-hand (x) for a = 25, 50, 100, 200, 400.] The consumption functions are based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375.

8. CONSUMPTION APPLICATIONS

A series of papers have analyzed the positive and normative implications of the hyperbolic buffer-stock model: Laibson, Repetto, and Tobacman (1998, 2000) [hereafter LRT] and ALRTW (2001a, 2001b). These papers extend the precautionary saving models pioneered by Zeldes (1989b), Deaton (1991), and Carroll (1992, 1997).[24] We will focus our discussion on the work of LRT (2000) and ALRTW (2001a). The ALRTW model incorporates most of the features of previous life cycle simulation models and adds new features, including credit cards, time-varying household size, and illiquid assets. We summarize the key features of the ALRTW model herein. A more general version of the model, and a complete description of the calibration, appear in LRT.[25]

8.1. Model Summary

Households are divided into three levels of educational attainment. We discuss simulation results only for the largest group, households whose head has only a high school degree (roughly half of U.S. households). The simulations have been replicated for households in other educational categories, and the conclusions are quantitatively similar (see LRT). Households face a time-varying, exogenous hazard rate of survival. Households live for a maximum of 90 periods, beginning economic life at age 20 and retiring at age 63. The retirement age is calibrated to match reported retirement ages from the PSID. Household composition (number of adults and nonadults) varies exogenously over the life cycle (also calibrated to match the PSID). Log income, $\ln Y_{it}$, is modeled as the sum of a polynomial in age and two stochastic components: an autocorrelated component and an i.i.d. component.

[23] Asset return uncertainty has an advantage over labor-income uncertainty, because the volatility generated by noisy returns scales up with the level of wealth. With sufficient asset uncertainty, it should be possible to establish that regularity applies to the entire domain of cash on hand, instead of just an absorbing interval.
[24] See, also, Engen, Gale, and Scholz (1994), Hubbard, Skinner, and Zeldes (1994, 1995), and Gourinchas and Parker (1999).
[25] This more general model allows consumers to declare bankruptcy and allows the consumer to borrow against illiquid collateral (e.g., mortgages on housing).

Different processes are estimated during the working life and during retirement (using the PSID). Households may hold liquid assets, $X_t$, and illiquid assets, $Z_t$. Because labor income is liquid wealth, $X_t + Y_t$ represents total liquid asset holdings at the beginning of period $t$. Credit card borrowing is modeled as a negative value for $X_t$. Credit card borrowing must not exceed a credit limit equal to some fraction of current (average) income. Specifically, $X_t \ge -\lambda \cdot \bar{Y}_t$, where $\bar{Y}_t$ is cohort average income at age $t$, and $\lambda = 0.30$ (calibrated from the 1995 SCF). The real after-tax interest rate on liquid assets is 3.75 percent. The real interest rate on credit card loans is 11.75 percent, two percentage points below the mean debt-weighted real interest rate reported by the Federal Reserve Board. This low value is chosen to capture implicitly the effect of bankruptcy. Actual annual bankruptcy rates of roughly 1 percent per year imply that the effective interest rate is at least one percentage point below the observed interest rate. The illiquid asset generates consumption flows equal to 5 percent of the value of the asset ($Z_t \ge 0$). Hence, the holding return on illiquid assets is considerably higher than the return on other assets. However, the illiquid asset can be sold only with a transaction cost. Households have isoelastic preferences with a coefficient of relative risk aversion of $\rho = 2$. Self $t$ has instantaneous payoff function

$$u(C_t, Z_t, n_t) = n_t \cdot \frac{\left( \dfrac{C_t + \gamma Z_t}{n_t} \right)^{1-\rho} - 1}{1-\rho}.$$

Note that $\gamma Z_t$ represents the consumption flow generated by $Z_t$ ($\gamma = 0.05$), and $n_t$ is the effective household size, $n_t = [\text{no. of adults}_t] + 0.4\,[\text{no. of children}_t]$. Households have either an exponential discount function ($\delta^t_{exponential}$) or a quasi-hyperbolic discount function ($\beta\delta^t_{hyperbolic}$, with $\beta = 0.7$). ALRTW assume that the economy is populated either exclusively by exponential households or exclusively by hyperbolic households.
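The instantaneous payoff function and the credit limit can be written out as a short sketch. The parameter values ρ = 2, γ = 0.05, λ = 0.30 and the 0.4 weight on children come from the model summary; the function names and the example inputs are hypothetical and only illustrate how the pieces fit together.

```python
RHO, GAMMA, CREDIT_LIMIT_SHARE = 2.0, 0.05, 0.30  # values quoted in the model summary


def effective_household_size(adults, children):
    """n_t = adults + 0.4 * children."""
    return adults + 0.4 * children


def instantaneous_payoff(consumption, illiquid_assets, household_size,
                         rho=RHO, gamma=GAMMA):
    """u(C, Z, n) = n * (((C + gamma*Z) / n)**(1 - rho) - 1) / (1 - rho)."""
    per_capita = (consumption + gamma * illiquid_assets) / household_size
    return household_size * (per_capita ** (1.0 - rho) - 1.0) / (1.0 - rho)


def within_credit_limit(liquid_assets, cohort_average_income,
                        limit_share=CREDIT_LIMIT_SHARE):
    """Check X_t >= -lambda * Ybar_t; negative X_t is credit card debt."""
    return liquid_assets >= -limit_share * cohort_average_income


n = effective_household_size(adults=2, children=2)  # hypothetical household
payoff = instantaneous_payoff(consumption=2.0, illiquid_assets=16.0, household_size=n)
```

In this example the consumption flow from the illiquid asset, `GAMMA * 16.0`, tops up direct consumption so that per-capita consumption equals one, which is the point at which this utility normalization returns zero.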
ALRTW pick $\delta_{\text{exponential}}$ and $\delta_{\text{hyperbolic}}$ to match empirical levels of retirement saving. Specifically, $\delta_{\text{exponential}}$ is picked so that the exponential simulations generate a median wealth to income ratio of 3.2, for individuals between ages 50 and 59. The median of 3.2 is calibrated from the SCF.26 The hyperbolic discount factor, $\delta_{\text{hyperbolic}}$, is also picked to match the empirical median of 3.2.27 The discount factors that replicate the SCF wealth to income ratio are .9437 for the exponential model and .9571 for the hyperbolic model. Because hyperbolic consumers have two sources of discounting – β and δ – the hyperbolic

26 Wealth does not include social security wealth and other defined benefit pensions, which are already built into the model in the form of postretirement “labor income.”
27 For calibration purposes, total wealth is measured as X + Z + (Y/24), where X represents liquid assets (excluding current labor income), Z represents illiquid assets, and Y represents annual after-tax labor income. The Y/24 is included to reflect average cash inventories used for (continuous) consumption out of labor income. If labor income is paid in equal monthly installments, Y/12, and consumption is smoothly spread over time, then average cash inventories will be Y/24.
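The Y/24 term in the calibration footnote can be checked numerically. The sketch below assumes, as the footnote does, that a paycheck of Y/12 arrives at the start of each month and is spent at a constant rate over the month; the function name and the income figure are hypothetical.

```python
def average_cash_inventory(annual_income, steps_per_month=1000):
    """Average cash balance when a monthly paycheck is spent down evenly.

    The balance a fraction s of the way through the month is
    (annual_income / 12) * (1 - s); it is averaged with a midpoint rule.
    """
    monthly_pay = annual_income / 12.0
    total = 0.0
    for k in range(steps_per_month):
        s = (k + 0.5) / steps_per_month  # midpoint of the k-th sub-interval
        total += monthly_pay * (1.0 - s)
    return total / steps_per_month


income = 24_000.0  # hypothetical annual after-tax labor income
print(average_cash_inventory(income), income / 24.0)
```

Because the balance falls linearly from Y/12 to zero within each month, its time average is half of Y/12, which is Y/24.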

Hyperbolic Discounting and Consumption

Figure 7.7. Simulated mean consumption profiles of hyperbolic and exponential households. [Plot omitted: vertical axis, mean consumption by age (×10^4, range 1.5 to 4); horizontal axis, age (20 to 90); series: Hyperbolic, Exponential.] Source: Angeletos et al. 2001.

δs lie above the exponential δs. Recall that the hyperbolic and exponential discount functions are calibrated to generate the same amount of preretirement wealth accumulation. In this manner, the calibrations “equalize” the underlying willingness to save between the exponential and hyperbolic consumers. The calibrated long-term discount factors are sensible when compared with discount factors that have been used in similar exercises by other authors. Finally, note that these discount factors do not include mortality effects, which reduce the respective discount factors by an additional 1 percent on average per year.

8.2. Simulation Results of ALRTW

Calibrated hyperbolic simulations – β = 0.7, δ = 0.957 – generate life cycle consumption profiles that closely match the life cycle consumption profiles generated by calibrated exponential simulations – β = 1, δ = 0.944. For example, Figure 7.7 compares hyperbolic and exponential consumption means over the life cycle. These two hump-shaped profiles are very similar.28 The only differences arise around retirement and at the very beginning and end of life. At the beginning of life, hyperbolic consumers go on a credit card–financed spending spree,29 leading to higher consumption than the exponentials. Around

28 The consumption profiles roughly track the mean labor-income profile. This low-frequency comovement is driven by two factors. First, low income early in life holds down consumption, because consumers do not have large credit lines. Second, consumption needs peak in midlife, when the number of adult-equivalent dependents peaks at age 47.
29 See Gourinchas and Parker (1999) for empirical evidence on the early life consumption boom.


retirement, hyperbolic consumption falls more steeply than exponential consumption, because hyperbolic households have most of their wealth in illiquid assets, which they cannot cost-effectively sell to smooth consumption. At the end of life, hyperbolic consumers have more illiquid assets to sell, slowing down the late-life collapse in consumption. The total wealth profiles of hyperbolics and exponentials are also similar. This correspondence is not surprising, because the hyperbolic and exponential simulations are each calibrated to match the observed level of retirement wealth accumulation in the SCF. However, the two models generate very different simulated allocations across liquid and illiquid assets. Just before retirement (age 63), the average liquid asset holding of the simulated hyperbolic households is only about $10,000, whereas the exponential households have accumulated more than $45,000 in liquid wealth (1990 dollars).30 Hyperbolics end up holding relatively little liquid wealth, because liquidity tends to be splurged to satisfy the hyperbolic taste for instant gratification. Both naive and sophisticated hyperbolics will quickly spend whatever liquidity is at their disposal. By contrast, hyperbolics hold much more illiquid wealth than their exponential counterparts. Just before retirement, the average illiquid asset holding of the simulated hyperbolics is $175,000, compared with $130,000 for the exponentials. Hyperbolics are more willing to hold illiquid wealth for two reasons. First, sophisticated hyperbolics (like the hyperbolics in these simulations) view illiquid assets as a commitment device, which they value because it prevents later selves from splurging saved wealth too quickly. Second, illiquid assets are particularly valuable to hyperbolics (both naifs and sophisticates), because hyperbolics have lower long-run discount rates than exponentials.
Hence, hyperbolics place relatively greater value on the long-run stream of payoffs associated with illiquid assets.31 Hyperbolics and exponentials dislike illiquidity for the standard reason that illiquid assets cannot be used to buffer income shocks. But this cost of illiquidity is partially offset for hyperbolics for the two reasons described: hyperbolics value commitment and hyperbolics more highly value the long-run dividends of illiquid assets. Hence, on net, illiquidity is less costly for a hyperbolic than for an exponential consumer. To evaluate empirically the asset allocation predictions of the hyperbolic and exponential models, ALRTW compare the simulated results to survey evidence from the SCF. For example, ALRTW analyze the percentage of households that have at least 1 month of liquid wealth on hand. On average, 73 percent of simulated exponential households hold liquid assets greater than 1 month of labor income. The analogous number for hyperbolics is only 40 percent. For comparison, 42 percent of households in the SCF hold liquid financial assets greater than 1 month of labor income.

30 For the purposes of the analysis in this subsection, simulated liquid assets are measured as $X^+ + (Y/24)$, where $X^+$ represents positive holdings of liquid assets (excluding current labor income).
31 The long-run discount rate of a hyperbolic consumer, $-\ln(\delta_{\text{hyperbolic}}) = -\ln(.957) = .044$, is calibrated to lie below the long-run discount rate of an exponential consumer, $-\ln(\delta_{\text{exponential}}) = -\ln(.944) = .058$.
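The long-run discount rates quoted in the footnote follow directly from the calibrated discount factors. A short Python check, with a hypothetical function name, is:

```python
import math


def long_run_discount_rate(delta):
    """Continuously compounded long-run discount rate implied by delta."""
    return -math.log(delta)


hyperbolic_rate = long_run_discount_rate(0.957)
exponential_rate = long_run_discount_rate(0.944)
print(round(hyperbolic_rate, 3), round(exponential_rate, 3))
```

The hyperbolic long-run rate lies below the exponential one, which is why the long-run payoffs of illiquid assets are worth relatively more to hyperbolic households.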


ALRTW also evaluate the models by analyzing the simulated quantity of liquid assets as a share of total assets. In the SCF, the average liquid wealth share is only 8 percent and neither the exponential nor hyperbolic simulations match this number, although the hyperbolic simulations are a bit closer to the mark. The average liquid wealth share for simulated hyperbolic households is 39 percent. The analogous exponential liquid wealth share is 50 percent. Revolving credit – e.g., credit card borrowing – represents another important form of liquidity. Low levels of liquid assets are naturally associated with high levels of credit card debt. ALRTW contrast exponential and hyperbolic consumers by comparing their simulated propensities to borrow on credit cards.32 At any point in time 51 percent of hyperbolic consumers borrow on their credit cards, compared with only 19 percent of exponentials. In the 1995 SCF, 70 percent of households with credit cards report that they did not fully pay their credit card bill the last time that they mailed in a payment. Hyperbolic simulations come much closer to matching these self-reports. Likewise, the simulated hyperbolic consumers borrow much more on average than the simulated exponential consumers. On average, simulated exponential households owe $900 of interest-paying credit card debt, including the households with no debt. By contrast, simulated hyperbolic households owe $3,400 of credit card debt. The actual amount of credit card debt owed per household with a credit card is approximately $4,600 (including households with no debt, but excluding the float).33 Euler Equation tests have played a critical role in the empirical consumption literature since the work of Hall (1978). Many of the papers in this literature have asked whether lagged information predicts current consumption growth.
In particular, many authors have tried to determine whether predictable changes in income predict changes in consumption:

$$\Delta \ln(C_{it}) = \alpha E_{t-1} \Delta \ln(Y_{it}) + X_{it}\beta + \varepsilon_{it}. \tag{8.1}$$

Here $X_{it}$ is a vector of control variables. The standard consumption model (without liquidity constraints) predicts α = 0; the marginal propensity to consume out of predictable changes in income should be zero. By contrast, empirical estimates of α lie above 0, with “consensus estimates” around α = 0.2.34 ALRTW estimate the standard comovement regression using simulated data. For the hyperbolic simulations, the coefficient on $E_{t-1} \Delta \ln(Y_{it})$ is α = 0.17.

32 See LRT (2000) for a much more detailed analysis of credit card borrowing.
33 This average balance includes households in all education categories. It is calculated on the basis of aggregate information reported by the Federal Reserve. This figure is consistent with values from a proprietary account-level data set assembled by David Gross and Nicholas Souleles (1999a, 1999b, 2000). See LRT (2000).
34 For example, Hall and Mishkin (1982) report a statistically significant coefficient of .200, Hayashi (1985) reports a significant coefficient of .158, Altonji and Siow (1987) report an insignificant coefficient of .091, Attanasio and Weber (1993) report an insignificant coefficient of .119, Attanasio and Weber (1995) report an insignificant coefficient of .100, Shea (1995) reports a marginally significant coefficient of .888, Lusardi (1996) reports a significant coefficient of .368, Souleles (1999) reports a significant coefficient of .344, and ALRTW (2000) report a significant coefficient of .285. See Deaton (1992) and Browning and Lusardi (1996) for a discussion of the excess sensitivity literature.
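To make the comovement regression (8.1) concrete, the following Python sketch estimates α by ordinary least squares on synthetic data. It is not ALRTW's simulation: the data-generating process, the noise levels, and the function names are assumptions chosen for illustration, and the control variables $X_{it}$ are omitted. The "true" α of 0.17 is borrowed from the hyperbolic estimate only to have a familiar target.

```python
import random


def ols_slope_intercept(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate_growth_rates(true_alpha, n_obs=5000, seed=0):
    """Synthetic predictable income growth and consumption growth (assumed process)."""
    rng = random.Random(seed)
    expected_income_growth = [rng.gauss(0.02, 0.05) for _ in range(n_obs)]
    consumption_growth = [
        0.01 + true_alpha * g + rng.gauss(0.0, 0.02) for g in expected_income_growth
    ]
    return expected_income_growth, consumption_growth


expected_dy, dc = simulate_growth_rates(true_alpha=0.17)
intercept, alpha_hat = ols_slope_intercept(expected_dy, dc)
print(f"estimated alpha = {alpha_hat:.3f}")
```

Under the standard model the estimated slope would be close to zero; a slope near 0.2 is the "excess sensitivity" that the footnoted studies report.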


By contrast, the exponential simulations generate a value of α = 0.03. Hyperbolic consumers hold more of their wealth in illiquid form than exponentials. So, hyperbolics are more likely to hit liquidity constraints, raising their marginal propensity to consume out of predictable changes in income. The hyperbolic simulations also predict income-consumption comovement around retirement. Banks, Blundell, and Tanner (1998) and Bernheim, Skinner, and Weinberg (1997) argue that consumption anomalously falls when households are in their mid-60s, at the same time that workers are retiring and labor income is falling. ALRTW estimate the following regression to explore the consumption drop at retirement:

$$\Delta \ln(C_{it}) = I_{it}^{\text{RETIRE}} \gamma + X_{it}\beta + \varepsilon_{it}.$$

Here $I_{it}^{\text{RETIRE}}$ is a set of dummy variables that take the value of one in periods t − 1, t, t + 1, and t + 2 if period t is the age of retirement; and $X_{it}$ is a vector of control variables. Summing the coefficients on the four dummy variables (and switching signs) generates an estimate of the “excess” drop in consumption around retirement. Estimating these coefficients from the PSID yields a statistically significant excess drop of 11.6 percent around retirement. The analogous drop for simulated hyperbolic consumers is 14.5 percent, whereas the drop for simulated exponential consumers is only 3.0 percent. Hyperbolic consumers hold relatively little liquid wealth. A drop in income at retirement translates into a substantial drop in consumption, even though retirement is an exogenous, completely predictable event. All in all, the hyperbolic model consistently does a better job of approximating the data. Table 7.1 draws these findings together.

Table 7.1.

                                                    Hyperbolic   Exponential   Data
% with liquid assets > (1/12) Y                     40%          73%           42%
Mean liquid assets / (liquid + illiquid assets)     0.39         0.50          0.08
% borrowing on “Visa”                               51%          19%           70%
Mean borrowing                                      $3,400       $900          $4,600
C − Y comovement                                    0.17         0.03          ≈0.20
% C drop at retirement                              14.5%        3.0%          11.6%

Source: ALRTW (2001b).

9. NAIFS VERSUS SOPHISTICATES

Until now, we have considered the case in which early selves hold correct expectations about the preferences and behavior of later selves. Early selves anticipate that later selves will fail to maximize the patient long-run interests of early selves. When early selves hold such correct expectations, they are referred to as sophisticates (Strotz, 1956).


However, it is reasonable to imagine that early selves might mistakenly expect later selves to follow through on the early selves’ best intentions. This is the naive case, discussed by Strotz (1956), Akerlof (1991), and O’Donoghue and Rabin (1999a, 1999b, 2000). Such naifs have optimistic forecasts in the sense that they believe that future selves will carry out the wishes of the current self. Under this belief, the current self constructs the sequence of actions that maximizes the preferences of the current self. The current self then implements the ﬁrst action in that sequence, expecting future selves to implement the remaining actions. Instead, those future selves conduct their own optimization and therefore implement actions in conﬂict with the patient behavior anticipated by prior selves. In some cases, the behavior of naive hyperbolics is very close to the behavior of sophisticated hyperbolics. For example, ALRTW have replicated their calibration and analysis under the assumption that hyperbolic consumers are naive. They ﬁnd that the naive hyperbolics act effectively the same as the sophisticated hyperbolics discussed above. Hence, for the consumption applications in this paper, it does not matter whether we assume that hyperbolics are naive or sophisticated. However, this rough equivalence does not generally hold. Ted O’Donoghue and Matthew Rabin (1999a, 1999b, 2000) have written a series of papers that examine the differences between naifs and sophisticates, developing examples where naifs and sophisticates behave in radically different ways. Their most recent paper explores the issue of retirement saving. They show that naive hyperbolics may perpetually postpone asset reallocation decisions, generating sizeable welfare costs. Each one-period postponement seems optimal, because the naif mistakenly expects some future self to undertake the reallocation. 
Naif models do not exhibit any of the pathologies that we have discussed (e.g., nonmonotonic consumption functions). If consumers do not recognize that their own preferences are dynamically inconsistent, they will not have any incentive to act strategically vis-à-vis their own future selves. However, this solution to the pathology problem requires that consumers be completely naive about their own future preferences. Any partial knowledge of future dynamic inconsistency reinstates the pathologies. O’Donoghue and Rabin (2000) also propose an intermediate model in which decision-makers partially recognize their propensity to be hyperbolic in the future. Specifically, in this intermediate model, the actor believes that future selves will have a β value equal to $\hat{\beta}$. Sophisticates hold correct expectations about the future value of β, so $\hat{\beta} = \beta$. Naifs incorrectly believe that future selves will hold preferences consistent with the long-run interests of the current self, implying $\hat{\beta} = 1$. Partial naifs lie between these extremes, so $\beta < \hat{\beta} < 1$.

10. NORMATIVE ANALYSIS AND POLICY IMPLICATIONS

Welfare and policy analysis can be problematic in hyperbolic models. The crux of the difficulty is the lack of a clear welfare criterion.


The most traditional perspective has been adopted by Phelps and Pollak (1968) and Laibson (1996, 1997a). These authors take the multiple self framework literally, and simply apply the Pareto criterion for welfare analysis. If one allocation makes all selves at least as well off as another allocation, then the former allocation Pareto dominates the latter allocation. Even this very strong welfare criterion opens the door to interesting welfare analysis. It is typically the case that the equilibrium allocation in a hyperbolic model is Pareto-inferior to other feasible allocations that will not arise in equilibrium. These Pareto-dominant allocations can be attained only with a commitment technology. We turn to such commitment technologies (and the corresponding policies that support them) in the next subsection. O’Donoghue and Rabin adopt a different approach to welfare analysis. They argue that the right welfare perspective is the long-run perspective. Specifically, they rank allocations using the welfare of an agent with no hyperbolicity (i.e., β = 1). In the long run, all selves discount exponentially. So, all past selves want future selves to discount exponentially. In this sense, β = 1 is the right discounting assumption if we adopt the preferences of some “earlier” self (say at birth). Another way to motivate this welfare criterion is to ask what discount function you would advise someone else to use. Typically, we urge others to act patiently, suggesting that we normatively discourage short-run impulsivity. In the language of these models, this advice recommends β = 1. Recently, Caplin and Leahy (2000) have suggested another criterion. They take the multiple self framework literally and suggest a utilitarian approach. Specifically, they argue that a sensible welfare criterion would weight the welfare of all of the selves. This approach produces challenging implications.
Speciﬁcally, if later selves get roughly the same weight as early selves, then late consumption should matter much more than early consumption. To see why, consider the following two-period example. Self 1 cares about periods 1 and 2 (with equal weights). Self 2 cares only about period 2. Then period 2 should get twice the weight of period 1 in the social planner’s welfare function. Late consumption beneﬁts both selves, whereas early consumption beneﬁts only self 1. At the moment, there is no consensus framework for measuring welfare in multiple self models. However, the different approaches reviewed herein usually give similar answers to policy questions. All of the competing welfare criteria imply that equilibrium allocations in economies without commitment typically generate savings rates that are too low (i.e., higher savings allocations would improve social welfare). This implication follows almost immediately once one adopts a welfare criterion in which β = 1 (O’Donoghue and Rabin) or once one adopts the utilitarian perspective of Caplin and Leahy. Equilibrium allocations also tend to be Pareto-inferior because the static gains of high consumption rates in the short run (gains to the current self) tend to be overwhelmed by the dynamic losses of low savings rates in the long-run steady state (dynamic losses to the expected utility of the current self). Recall that hyperbolic consumers have low long-run discount rates. Hence, the long-run outcomes matter a great deal to the welfare of the current hyperbolic consumer (Laibson 1996). A commitment

to a savings rate slightly above the equilibrium savings rate will raise the welfare of all selves.

10.1. The Value of Commitment

Sophisticated hyperbolic consumers are motivated to choose policies that commit the behavior of future selves. Moreover, such commitment devices can raise the welfare of all selves if the commitment locks in patient long-run behavior. Even naive consumers will benefit from commitment, although they will not appreciate these benefits at the time they are being locked into a patient behavioral regime. However, these naive agents may not mind such commitments (ex ante), because they incorrectly expect future selves to act patiently anyway. In a world of sophisticated hyperbolic consumers, the social planner’s goal is to make commitment possible, rather than imposing it on consumers.35 Sophisticated consumers understand the value of commitment and will adopt such commitments when it is in their interest. Hence, a 401(k), which is voluntary, might be viewed as a useful commitment device for a sophisticated hyperbolic consumer.36 Laibson (1996) and LRT (1998) measure the welfare consequences of providing voluntary commitment technologies, like 401(k)’s, to sophisticated hyperbolic consumers. By contrast, in a world of unsophisticated consumers (i.e., naifs), a benevolent government may want to impose commitment on consumers.37 Social security, with its universal coverage and illiquid “balances,” can be viewed as such a commitment.

11. EXTENSIONS

11.1. Asset Uncertainty

Our simulation results reported in Section 7 demonstrate that hyperbolic consumption functions become less irregular as more noise is added to the model. The analysis in Section 7 explores the case in which the noise comes from stochastic labor income. Another natural source of noise is the asset return process. In the analysis, we assumed that the asset return process was deterministic. Incorporating random returns into the model will generate four likely benefits. First, when pathologies (e.g., nonmonotonic consumption functions) do arise, those pathologies will probably be less pronounced when asset returns are stochastic. Second, pathologies will be less likely to arise in the first place.

35 Commitment technologies typically make all selves better off.
36 401(k)’s are defined contribution pension accounts available in most U.S. firms. These accounts have a penalty for “early” withdrawal (e.g., before age 59½).
37 Naturally, there are excellent reasons to be wary of activist governments. Much political activity is directed toward rent seeking. Moreover, even a benevolent social planner needs to worry about the disincentives and distortions that arise when well-intentioned politicians tax productive activities to pay for new social programs.


Third, once asset return variability is added to the model, we may be able to prove more general theorems. For example, without asset return variability, we can show that, as β → 1, the consumption function becomes monotonic and continuous on an absorbing interval of cash on hand. An absorbing interval is a range of cash-on-hand values that, in equilibrium, the consumer will never leave. With asset return variability, we conjecture that we will be able to show that, as β → 1, the consumption function becomes monotonic and continuous on the entire state space. This more general theorem reflects the fact that asset return uncertainty scales up with financial wealth, in contrast to labor income uncertainty that does not scale with financial wealth. Finally, adding asset uncertainty will enable us to model multiasset state spaces as long as each asset has some idiosyncratic variability. In this setting, we expect to be able to prove the existence, uniqueness, and regularity of equilibria using variants of the techniques developed in Harris and Laibson (2001a).

11.2. Continuous-Time Hyperbolic Models

Continuous-time modeling provides a more robust way of eliminating pathologies like nonmonotonic consumption functions (Harris and Laibson, 2001b). To motivate the continuous-time formalism, recall the discrete-time set-up. In the standard discrete-time formulation of quasi-hyperbolic preferences, the present consists of the single period $t$. The future consists of periods $t + 1, t + 2, \ldots$. A period $n$ steps into the future is discounted with factor $\delta^n$, and an additional discount factor β is applied to all periods except the present. This model can be generalized in two ways. First, the present can last for any number of periods $T_t \in \{1, 2, \ldots\}$. Second, $T_t$ can be random. The preferences in equation (11.1) are a natural continuous-time analog of this more general formulation. Specifically, the preferences of self $t$ are given by

$$E_t\left[\int_t^{t+T_t} e^{-\gamma(s-t)}\, U(c(s))\,ds + \alpha \int_{t+T_t}^{+\infty} e^{-\gamma(s-t)}\, U(c(s))\,ds\right], \tag{11.1}$$

where $\gamma \in (0, +\infty)$, $\alpha \in (0, 1]$, $U : (0, +\infty) \to \mathbb{R}$, and $T_t$ is distributed exponentially with parameter $\lambda \in [0, +\infty)$. In other words, self $t$ uses a stochastic discount function, namely

$$D_\lambda(t, s) = \begin{cases} e^{-\gamma(s-t)} & \text{if } s \leq t + T_t \\ \alpha e^{-\gamma(s-t)} & \text{if } s > t + T_t. \end{cases}$$

This stochastic discount function decays exponentially at rate γ up to time $t + T_t$, drops discontinuously at $t + T_t$ to a fraction α of its level just prior to $t + T_t$, and decays exponentially at rate γ thereafter. Figure 7.8 plots a single realization of this discount function, with $t = 0$ and $T_t = 3.4$. Figure 7.9 plots the expected value of the discount function, namely

$$E_t D_\lambda(t, s) = e^{-\lambda(s-t)} e^{-\gamma(s-t)} + \left(1 - e^{-\lambda(s-t)}\right) \alpha e^{-\gamma(s-t)},$$

for $\lambda \in \{0, 0.1, 1, 10, \infty\}$.
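The stochastic discount function and its expected value can be sketched in a few lines of Python. This is an illustrative sketch, not code from Harris and Laibson (2001b): the function names are hypothetical, the defaults α = 0.7 and γ = 0.1 are taken from the caption of Figure 7.8, and the Monte Carlo check simply draws the length of the present, $T_t$, from an exponential distribution with rate λ > 0.

```python
import math
import random


def realized_discount(gap, present_length, alpha=0.7, gamma=0.1):
    """D(t, s) for gap = s - t >= 0, given a realized present of length T_t."""
    base = math.exp(-gamma * gap)
    return base if gap <= present_length else alpha * base


def expected_discount(gap, lam, alpha=0.7, gamma=0.1):
    """E_t D_lambda(t, s) when T_t is exponential with rate lam (finite, >= 0)."""
    still_present = math.exp(-lam * gap)  # probability that T_t exceeds the gap
    return (still_present + (1.0 - still_present) * alpha) * math.exp(-gamma * gap)


def monte_carlo_expected_discount(gap, lam, draws=50_000, seed=0):
    """Average of realized discount factors over random draws of T_t (lam > 0)."""
    rng = random.Random(seed)
    total = sum(realized_discount(gap, rng.expovariate(lam)) for _ in range(draws))
    return total / draws


# Realization as in Figure 7.8: the present lasts 3.4 time units.
print(realized_discount(3.0, 3.4), realized_discount(4.0, 3.4))

# The closed form and the simulation should agree up to sampling error.
print(expected_discount(2.0, lam=1.0), monte_carlo_expected_discount(2.0, lam=1.0))
```

With `lam = 0` the expected discount function reduces to $e^{-\gamma(s-t)}$, the exponential case; larger values of `lam` shorten the expected present and move the curve toward $\alpha e^{-\gamma(s-t)}$.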

Figure 7.8. Realization of the discount function (α = 0.7, γ = 0.1). [Plot omitted: vertical axis, discount function (0 to 1); horizontal axis, time gap between future period and present period (0 to 10); the realization of T, between 3 and 4, separates the present from the future.]

This continuous-time formalization is close to the deterministic functions used in Barro (1999) and Luttmer and Mariotti (2000). However, Harris and Laibson (2001b) assume that Tt is stochastic. The stochastic transition with constant hazard rate reduces the problem to a system of two differential equations that characterize present and future value functions.

Figure 7.9. Expected value of the discount function for λ ∈ {0, 0.1, 1, 10, ∞}. [Plot omitted: vertical axis, expected value of discount function (0 to 1); horizontal axis, time to discounted period (0 to 10); curves for λ = 0 (exponential discounting), λ = 0.1, λ = 1, λ = 10, and λ = ∞ (instantaneous gratification; i.e., with jump at 0).]


When λ = 0, the discount function is equivalent to a standard exponential discount function. As λ → ∞, the discount function converges to a jump function, namely

$$D_\infty(t, s) = \begin{cases} 1 & \text{if } s = t \\ \alpha e^{-\gamma(s-t)} & \text{if } s > t. \end{cases}$$

This limit case is both analytically tractable and psychologically relevant. In this “instantaneous gratification” case, the present is vanishingly short. Individuals prefer consumption in the present instant discretely more than consumption in the momentarily delayed future. The lessons from this model carry over, by continuity, to the neighborhood of models in which the present is short, but not precisely instantaneous (i.e., λ large). The instantaneous gratification model, which is dynamically inconsistent, shares the same value function as a related dynamically consistent optimization problem with a wealth-contingent utility function. Using this partial equivalence, Harris and Laibson (2001b) prove that the hyperbolic equilibrium exists and is unique. The associated equilibrium consumption functions are continuous and monotonic in wealth. The monotonicity property relies on the condition that the long-run discount rate is weakly greater than the interest rate. For this case, all of the pathological properties of discrete-time hyperbolic models are eliminated.

12. CONCLUSIONS

We have characterized the consumption behavior of hyperbolic consumers. The hyperbolic model provides two payoffs. First, it provides an analytically tractable, parsimonious foundation with which to analyze self-control problems. Second, it is easily calibrated, providing precise numerical predictions that can be empirically evaluated in competition with mainstream models. We have shown that the hyperbolic model successfully matches empirical observations on household balance sheets and consumption choices. Relative to exponential households, hyperbolic households hold low levels of liquid wealth measured either as a fraction of labor income or as a share of total wealth.
Hyperbolic households borrow more aggressively in the revolving credit market (i.e., on credit cards), but they save more actively in illiquid assets. Because the hyperbolic households have low levels of liquid assets and high levels of credit card debt, they are unable to smooth their consumption paths in the presence of predictable changes in income. Calibrated hyperbolic simulations explain observed levels of consumption-income comovement and the drop in consumption at retirement. Calibrated hyperbolic simulations generate “excess sensitivity” coefﬁcients of approximately 0.20, very close to empirical coefﬁcients estimated from household data. More generally, the hyperbolic model provides a good formal foundation for the study of self-defeating behaviors. Economists usually assume that rational agents will act in their own interests. Hyperbolic agents may hold rational


expectations, but they will rarely make efﬁcient choices. Puzzling and important self-defeating behaviors like undersaving, overeating, and procrastination lose some of their mystery when analyzed with the hyperbolic model. ACKNOWLEDGMENTS We thank Glenn Ellison for numerous helpful suggestions. We also thank George-Marios Angeletos, Andrea Repetto, Jeremy Tobacman, and Stephen Weinberg, whose ideas and work are reﬂected in this paper. Laura Serban provided outstanding research assistance. David Laibson acknowledges ﬁnancial support from the National Science Foundation (SBR-9510985), the Olin Foundation, the National Institute on Aging (R01-AG-1665), and the MacArthur Foundation. APPENDIX PROOF OF THEOREM 4.5 Fix x0 ∈ [0, +∞). Suppose that W1 and W2 are two equilibrium current-value functions, and let S1 and S2 be the associated saving functions. Put h = U ◦ g. Then, (BW1 )(x) = U (x − S1 (x)) + δ (W1 − εh ◦ W1 )(R S1 (x) + y) f (y)dy = U (x − S1 (x)) + δ (W2 − εh ◦ W2 )(R S1 (x) + y) f (y)dy + δ (W1 − W2 )(R S1 (x) + y) f (y)dy − εδ (h ◦ W1 − h ◦ W2 )(R S1 (x) + y) f (y)dy ≤ (BW2 )(x) + δ (W1 − W2 )(R S1 (x) + y) f (y)dy − εδ (h ◦ W1 − h ◦ W2 )(R S1 (x) + y) f (y)dy. Hence, to obtain an upper bound for (BW1 )(x) − (BW2 )(x), it sufﬁces to estimate the expressions (W1 − W2 )(R S1 (x) + y) f (y)dy (A.1) and

(h ◦ W1 − h ◦ W2 )(R S1 (x) + y) f (y)dy.

(A.2)

In doing so, we shall make use of the estimates of Theorem 4.3 that apply to W1 and W2 . In particular, the constant K and the functions N1 and N2 used herein are taken from that theorem.

292

Harris and Laibson

Expression (A.1) is easy to estimate. Because S1 is an equilibrium-saving function, RS1 (x) + y ∈ [y, X¯ ] for all x ∈ [y, X¯ ] and all y ∈ [y, y¯ ]. Hence (W1 − W2 )(R S1 (x) + y) f (y)dy ≤ #W1 − W2 #c([y, X¯ ]) for all x ∈ [y, X¯ ], where #W1 − W2 #c([y, X¯ ]) = sup |W1 (x) − W2 (x)|. x∈[y, X¯ ]

Expression (A.2) requires more care. Put Wφ (x) = (1 − φ)W1 (x) + φW2 (x). Then h(W2 (x)) − h(W1 (x)) = and

1 0

((W2 − W1 )h ◦ Wφ )(x)dφ

(h ◦ W2 − h ◦ W1 )(R S1 (x) + y) f (y)dy 1

= ((W2 − W1 )h ◦ Wφ )(R S1 (x) + y) f (y)dφ dy 0

1

= 0

Moreover,

((W2 − W1 )h ◦ Wφ )(R S1 (x) + y) f (y)dy dφ.

((W2 − W1 )h ◦ Wφ )(R S1 (x) + y) f (y)dy = − ((W2 − W1 )h ◦ Wφ )(R S1 (x) + y) f (y)dy − ((W2 − W1 )h

◦ Wφ )(R S1 (x) + y) f (y)Wφ

(R S1 (x) + dy) (on integrating by parts). Hence, to estimate expression (A.2), we need to estimate the expressions (A.3) ((W2 − W1 )h ◦ Wφ )(R S1 (x) + y) f (y)dy and

((W2 − W1 )h

◦ Wφ )(R S1 (x) + y) f (y)Wφ

(R S1 (x) + dy).

(A.4)

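Expression (A.3) can be estimated as follows. First, note that, because $S_1$ is an equilibrium saving function, $R S_1(x) + y \in [\underline{y}, \bar{X}]$ for all $x \in [\underline{y}, \bar{X}]$ and all $y \in [\underline{y}, \bar{y}]$. Second, put

\[
\underline{\lambda} = \min_{x \in [\underline{y}, \bar{X}]} U'(x) \quad \text{and} \quad \bar{\lambda} = \max_{x \in [\underline{y}, \bar{X}]} U'(x) \vee N_1(x).
\]

Third, note that, because $W_\phi' \in [U', U' \vee N_1]$ for all $\phi \in [0, 1]$, $W_\phi' \in [\underline{\lambda}, \bar{\lambda}]$ for all $\phi \in [0, 1]$. Hence,

\[
\int \big((W_2 - W_1)\, h' \circ W_\phi'\big)(R S_1(x) + y)\, f'(y)\, dy
\le \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y})
\]

for all $x \in [\underline{y}, \bar{X}]$.

Expression (A.4) can be estimated as follows. First, as in the case of expression (A.3), we have

\[
\int \big((W_2 - W_1)\, h'' \circ W_\phi'\big)(R S_1(x) + y)\, f(y)\, W_\phi''(R S_1(x) + dy)
\le \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])}\, \|W_\phi''\|_{TV([\underline{y}, \bar{X}])}
\]

for all $x \in [\underline{y}, \bar{X}]$, where $\|W_\phi''\|_{TV([\underline{y}, \bar{X}])}$ denotes the total variation of the measure $W_\phi''$ on the interval $[\underline{y}, \bar{X}]$. Second, put

\[
\mu(dx) = K N_2(x)\, dx \quad \text{and} \quad \tilde{W}_\phi''(dx) = W_\phi''(dx) + \mu(dx).
\]

Then,

\[
\begin{aligned}
\|W_\phi''\|_{TV([\underline{y}, \bar{X}])}
&= \|\tilde{W}_\phi'' - \mu\|_{TV([\underline{y}, \bar{X}])}
\le \|\tilde{W}_\phi''\|_{TV([\underline{y}, \bar{X}])} + \|\mu\|_{TV([\underline{y}, \bar{X}])} \\
&= \int_{[\underline{y}, \bar{X}]} \tilde{W}_\phi''(dx) + \int_{[\underline{y}, \bar{X}]} \mu(dx)
= \int_{[\underline{y}, \bar{X}]} W_\phi''(dx) + 2 \int_{[\underline{y}, \bar{X}]} \mu(dx) \\
&\qquad \text{(because } \tilde{W}_\phi'' \text{ and } \mu \text{ are both positive measures, and by definition of } \tilde{W}_\phi'' \text{)} \\
&= W_\phi'(\bar{X}+) - W_\phi'(\underline{y}-) + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\, dx \\
&\le \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\, dx .
\end{aligned}
\]

Combining our estimates for expressions (A.1), (A.3) and (A.4), we obtain

\[
\begin{aligned}
(BW_1)(x) - (BW_2)(x)
&\le \delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])} \\
&\quad + \varepsilon\delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y}) \\
&\quad + \varepsilon\delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])}
\times \left( \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\, dx \right)
\end{aligned}
\]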


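for all $x \in [\underline{y}, \bar{X}]$. Combining this estimate with the analogous estimate for $(BW_2)(x) - (BW_1)(x)$, we obtain

\[
\|BW_1 - BW_2\|_{C([\underline{y}, \bar{X}])} \le \delta (1 + \varepsilon L)\, \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])},
\]

where

\[
L = \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y})
+ \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])} \left( \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\, dx \right).
\]

It follows that, if

\[
\varepsilon < \min\left\{ 1 - \bar{\beta}_1,\ \frac{1 - \delta}{\delta L} \right\},
\]

then $\|BW_1 - BW_2\|_{C([\underline{y}, \bar{X}])} = 0$. In other words, $W_1$ and $W_2$ coincide on $[\underline{y}, \bar{X}]$.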
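The last step can be spelled out as follows. This is only a sketch: it assumes, as the proof appears to, that each equilibrium current-value function is a fixed point of $B$ (so $BW_i = W_i$), that $\delta > 0$ and $L > 0$, and it writes $\|\cdot\|$ for the sup norm on $[\underline{y}, \bar{X}]$.

\[
\|W_1 - W_2\| = \|BW_1 - BW_2\| \le \delta (1 + \varepsilon L)\, \|W_1 - W_2\|,
\qquad
\varepsilon < \frac{1 - \delta}{\delta L} \iff \delta (1 + \varepsilon L) < 1 .
\]

Under these assumptions the factor $\delta(1 + \varepsilon L)$ is strictly below one, so the inequality can hold only if $\|W_1 - W_2\| = 0$.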

References Ainslie, G. (1992), Picoeconomics. Cambridge UK: Cambridge University Press. Akerlof, G. A. (1991), “Procrastination and Obedience,” American Economic Review (Papers and Proceedings), 81, 1–19. Altonji, J. and A. Siow (1987), “Testing the Response of Consumption to Income Changes with (Noisy) Panel Data,” Quarterly Journal of Economics, 102(2), 293–328. Angeletos, G.-M., D. Laibson, A. Repetto, J. Tobacman, and S. Weinberg (2001a), “The Hyperbolic Buffer Stock Model: Calibration, Simulation, and Empirical Evaluation National Bureau of Economic Research,” Working Paper. Angeletos, G.-M., D. Laibson, A. Repetto, J. Tobacman, and S. Weinberg (2001b), “The Hyperbolic Consumption Model: Calibration, Simulation, and Empirical Evaluation,” Journal of Economic Perspectives, 15, 47–68. Attanasio, O. (1999), “Consumption,” in Handbook of Macroeconomics, (ed. by J. Taylor and M. Woodford), Amsterdam: North-Holland. Attanasio, O. and G. Weber (1993), “Consumption Growth, the Interest Rate, and Aggregation,” Review of Economic Studies, 60(3), 631–649. Attanasio, O. and G. Weber (1995), “Is Consumption Growth Consistent with Intertemporal Optimization? Evidence from the Consumer Expenditure Survey,” Journal of Political Economy, 103(6), 1121–1157. Banks, J., R. Blundell, and S. Tanner (1998), “Is There a Retirement Puzzle?” American Economic Review, 88(4), 769–788. Barro, R. (1999), “Laibson Meets Ramsey in the Neoclassical Growth Model,” Quarterly Journal of Economics, 114(4), 1125–1152. Benabou, R. and J. Tirole (2000), “Willpower and Personal Rules,” mimeo. Bernheim, B. D., J. Skinner, and S. Weinberg (1997), “What Accounts for the Variation in Retirement Wealth among U.S. Households?” Working Paper 6227. Cambridge, MA: National Bureau of Economic Research. Blundell, R., M. Browning, and C. Meghir (1994), “Consumer Demand and the LifeCycle Allocation of Household Expenditures,” Review of Economic Studies, 61, 57–80.

Hyberbolic Discounting and Consumption

295

Brocas, I. and J. Carrillo (2000), “The Value of Information when Preferences are Dynamically Inconsistent,” European Economic Review, 44, 1104–1115. Brocas, I. and J. Carrillo (2001), “Rush and Procrastination under Hyperbolic Discounting and Interdependent Activities,” Journal of Risk and Uncertainty, 22, 141– 164. Browning, M. and A. Lusardi (1996), “Household Saving: Micro Theories and Micro Facts,” Journal of Economic Literature, 32, 1797–1855. Carrillo, J. and M. Dewatripont (2000), “Promises, promises, . . . ,” mimeo. Carrillo, J. and T. Mariotti (2000), “Strategic Ignorance as a Self-Disciplining Device,” Review of Economic Studies, 67, 529–544. Carroll, C. D. (1992), “The Buffer Stock Theory of Saving: Some Macroeconomic Evidence,” Brookings Papers on Economic Activity, 2, 61–156. Carroll, C. D. (1997), “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,” Quarterly Journal of Economics, 112, 1–57. Carroll, C. D. and M. Kimball (1996), “On the Concavity of the Consumption Function,” Econometrica, 64(4), 981–992. Chung, S.-H. and R. J. Herrnstein (1961), “Relative and Absolute Strengths of Response as a Function of Frequency of Reinforcement,” Journal of the Experimental Analysis of Animal Behavior, 4, 267–272. Deaton, A. (1991), “Saving and Liquidity Constraints,” Econometrica, 59, 1221–1248. Della Vigna, S. and U. Malmendier (2001), “Long Term Contracts and Self Control,” mimeo. Della Vigna, S. and D. Paserman (2000), “Job Search and Hyperbolic Discounting,” mimeo. Diamond, P. and B. Koszegi (1998), “Hyperbolic Discounting and Retirement,” mimeo, MIT. Engen, E., W. Gale, and J. K. Scholz (1994), “Do Saving Incentives Work,” Brookings Papers on Economic Activity, 1, 85–180. Frederick, S., G. Loewenstein, and E. O’Donoghue (2001), “Time Discounting: A Critical Review,” mimeo. Gourinchas, P.-O. and J. Parker (1999), “Consumption over the Life-Cycle,” mimeo. Gross, D. and N. 
Souleles (1999a), “An Empirical Analysis of Personal Bankruptcy and Delinquency,” Mimeo. Gross, D. and N. Souleles (1999b), “How Do People Use Credit Cards?” mimeo. Gross, D. and N. Souleles (2000), “Consumer Response to Changes in Credit Supply: Evidence from Credit Card Data,” mimeo. Gruber, J. and B. Koszegi (2001), “Is Addiction ‘Rational’?: Theory and Evidence,” Quarterly Journal of Economics, 116, 1261–1303. Hall, R. E. (1978), “Stochastic Implications of the Life Cycle–Permanent Income Hypothesis: Theory and Evidence,” Journal of Political Economy, 86(6), 971–987. Hall, R. E. and F. S. Mishkin (1982), “The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households,” Econometrica, 50(2), 461– 481. Harris, C. and D. Laibson (2001a), “Dynamic Choices of Hyperbolic Consumers,” Econometrica, 69, 935–957. Harris, C. and D. Laibson (2001b), “Instantaneous Gratiﬁcation,” mimeo. Hayashi, F. (1985), “The Permanent Income Hypothesis and Consumption Durability: Analysis Based on Japanese Panel Data,” Quarterly Journal of Economics, 100(4), 1083–1113.

296

Harris and Laibson

Hubbard, G., J. Skinner, and S. Zeldes (1994), “The Importance of Precautionary Motives in Explaining Individual and Aggregate Saving,” Carnegie–Rochester Conference Series on Public Policy, 40, 59–125. Hubbard, G., J. Skinner, and S. Zeldes (1995), “Precautionary Saving and Social Insurance,” Journal of Political Economy, 103, 360–399. King, G. R. and A. W. Logue (1987), “Choice in a Self-Control Paradigm with Human Subjects: Effects of Changeover Delay Duration,” Learning and Motivation, 18, 421– 438. Kirby, K. N. (1997), “Bidding on the Future: Evidence Against Normative Discounting of Delayed Rewards,” Journal of Experimental Psychology, 126, 54–70. Kirby, K. and R. J. Herrnstein (1995), “Preference Reversals Due to Myopic Discounting of Delayed Reward,” Psychological Science, 6(2), 83–89. Kirby, K. and N. N. Marakovic (1995), “Modeling Myopic Decisions: Evidence for Hyperbolic Delay-Discounting within Subjects and Amounts,” Organizational Behavior and Human Decision Processes, 64(1), 22–30. Kirby, K. and N. N. Marakovic (1996), “Delayed-Discounting Probabilistic Rewards Rates Decrease as Amounts Increase,” Psychonomic Bulletin and Review, 3(1), 100– 104. Krusell, P. and A. Smith (2000), “Consumption and Savings Decisions with QuasiGeometric Discounting,” mimeo. Krusell, P., B. Kuruscu, and A. Smith (2000a), “Equilibrium Welfare and Government Policy with Quasi-Geometric Discounting,” mimeo. Krusell, P., B. Kuruscu, and A. Smith (2000b), “Asset Pricing with Quasi-Geometric Discounting,” mimeo. Laibson, D. I. (1994), “Self-Control and Savings,” Ph.D. Dissertation, Massachusetts Institute of Technology. Laibson, D. I. (1996), “Hyperbolic Discounting, Undersaving, and Savings Policy,” Working Paper 5635, Cambridge, MA: National Bureau of Economic Research. Laibson, D. I. (1997a), “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Economics, 112(2), 443–478. Laibson, D. I. 
(1997b), “Hyperbolic Discount Functions and Time Preference Heterogeneity,” mimeo, Harvard University. Laibson, D. I. (1998), “Comments on Personal Retirement Saving Programs and Asset Accumulation,” by James M. Poterba, Steven F. Venti, and David A. Wise, in Studies in the Economics of Aging, (ed. by David A. Wise), Chicago: NBER and the University of Chicago Press, 106–124. Laibson, D. I., A. Repetto, and J. Tobacman (1998), “Self-Control and Saving for Retirement,” Brookings Papers on Economic Activity, 1, 91–196. Laibson, D. I., A. Repetto, and J. Tobacman (2000), “A Debt Puzzle,” mimeo. Loewenstein, G. and D. Prelec (1992), “Anomalies in Intertemporal Choice: Evidence and an Interpretation,” Quarterly Journal of Economics, 97, 573–598. Lusardi, A. (1996), “Permanent Income, Current Income, and Consumption; Evidence from Two Panel Data Sets,” Journal of Business and Economic Statistics, 14, 81–90. Luttmer, E. and T. Mariotti (2000), “Subjective Discount Factors,” mimeo. Millar, A. and D. J. Navarick (1984), “Self-Control and Choice in Humans: Effects of Video Game Playing as a Positive Reinforcer,” Learning and Motivation, 15, 203–218. Morris, S. and A. Postlewaite (1997), “Observational Implications of Nonexponential Discounting,” mimeo.

Hyberbolic Discounting and Consumption

297

Mulligan, C. (1997), “A Logical Economist’s Argument Against Hyperbolic Discounting,” mimeo, University of Chicago. Navarick, D. J. (1982), “Negative Reinforcement and Choice in Humans,” Learning and Motivation, 13, 361–377. O’Donoghue, T. and M. Rabin (1999a), “Doing It Now or Later,” American Economic Review, 89(1), 103–124. O’Donoghue, T. and M. Rabin (1999b), “Incentives for Procrastinators,” Quarterly Journal of Economics, 114(3), 769–816. O’Donoghue, T. and M. Rabin (2000), “Choice and Procrastination,” Working Paper. Parker, J. A. (1999), “The Reaction of Household Consumption to Predictable Changes in Social Security Taxes,” American Economic Review, 89, 959–973. Phelps, E. S. and R. A. Pollak (1968), “On Second-Best National Saving and GameEquilibrium Growth,” Review of Economic Studies, 35, 185–199. Ramsey, F. (1928), “A Mathematical Theory of Saving,” Economic Journal, December, 38, 543–559. Rankin, D. M. (1993), “How to Get Ready for Retirement: Save, Save, Save,” New York Times, March 13, 33. Read, D., G. Loewenstein, S. Kalyanaraman, and A. Bivolaru (1996), “Mixing Virtue and Vice: The Combined Effects of Hyperbolic Discounting and Diversiﬁcation,” Working Paper, Carnegie Mellon University. Runkle, D. (1991), “Liquidity Constraints and the Permanent-Income Hypothesis: Evidence from Panel Data,” Journal of Monetary Economics, 27(1), 73–98. Shapiro, M. D. and J. Slemrod (1995), “Consumer Response to the Timing of Income: Evidence from a Change in Tax Withholding,” American Economic Review, 85(1), 274–283. Shea, J. (1995), “Union Contracts and the Life-Cycle/Permanent Income Hypothesis,” American Economic Review, 85(1), 186–200. Simmons Market Research Bureau (1996), The 1996 Study of Media and Markets. New York. Souleles, N. (1999), “The Response of Household Consumption to Income Tax Refunds,” American Economic Review, 89, 947–958. Strotz, R. H. (1956), “Myopia and Inconsistency in Dynamic Utility Maximization,” Review of Economic Studies, 23, 165–180. 
Thaler, R. H. (1981), “Some Empirical Evidence on Dynamic Inconsistency,” Economics Letters, 8, 201–207. Thaler, R. H. (1992), “Saving, Fungibility, and Mental Accounts,” in The Winner’s Curse, Princeton, NJ: Princeton University Press, 107–121. Thaler, R. H. and H. M. Shefrin (1981), “An Economic Theory of Self-Control,” Journal of Political Economy, 89, 392–410. Zeldes, S. P. (1989a), “Consumption and Liquidity Constraints: An Empirical Investigation,” Journal of Political Economy, 97(2), 305–346. Zeldes, S. P. (1989b), “Optimal Consumption with Stochastic Income: Deviations from Certainty Equivalence,” Quarterly Journal of Economics, 104(2), 275–298.

A Discussion of the Papers by Ernst Fehr and Klaus M. Schmidt and by Christopher Harris and David Laibson

Glenn Ellison

It was a pleasure to serve as the discussant for this session. The authors have played a major role in developing the areas under discussion. The papers they produced for this volume are insightful and will help shape the emerging literature. The papers are excellent. I feel fortunate that my task is to comment and not to criticize.

One aspect of this session I found striking was the degree of agreement on how to define and organize a subfield of behavioral economics. In each case, the authors focus on one way in which real-world behavior departs from the standard model of rational self-interested behavior. They begin by mentioning results from a number of experiments showing systematic departures from the standard model. Although Fehr and Schmidt spend a fair amount of time distinguishing between broad classes of fairness models, both sets of authors advocate the use of simple tractable models that reflect essential features of behavior. In the final two paragraphs of the conclusions, both papers argue that behavioral economics models can add much to our understanding of important economic problems.

The similarity in the authors’ perspectives makes one class of comments easy: one can look at the nice features of each subfield and suggest that the other might try to do something similar. For example, one way in which the hyperbolic discounting literature seemed ahead of the fairness literature to me is that the application to consumption is well developed theoretically and empirically, to the point of being undeniably a part of the macroeconomics literature. It would be nice to see work on fairness pushing as hard on a single topic and gaining acceptance within an applied community. A feature of the fairness literature I admired is the attention that has been paid to heterogeneity in fairness preferences. Although I know that behavioral economists like to work with minimal departures from rationality, I would think that large consumption data sets could provide those interested in hyperbolic discounting with ample degrees of freedom to explore models with heterogeneity in the degree of time inconsistency.

The similarity in perspective also makes it worthwhile to try to comment on whether the approach is a good one. I think it is. The one comment I would make, however, is that the “behavioral” organization of papers seems to me to have one drawback relative to the way in which most other applied papers are written. The difference I perceive in organization is that the papers focus on how a behavioral assumption can help us understand a large number of facts, rather than on the facts themselves. For example, I would regard a paper on high credit card debt as more applied and less “behavioral” if it focused narrowly on understanding credit card debt and discussed various potential explanations. The relative disadvantage of the behavioral approach is that in any paper that explains many disparate facts, one can worry that there is an implicit selection of facts consistent with the model. For example, the calibration papers discussed in Harris–Laibson did not conduct surveys of people with high credit card debt and ask them directly if they feel that they failed to foresee how quickly the debt would build up. The model “predicts” that there would be no affirmative responses, and I am sure this is not what the survey would yield. Experimental papers are similarly selective because authors must decide which experiments to conduct. Good experimentalists no doubt have a keen intuition for how experiments will turn out, and may tend to carry out only experiments in which they expect that their model will be vindicated.
It seems to me that this type of criticism is harder to make of narrower applied papers – there are fewer facts to select among when one is looking at a narrow topic and the presence of competing explanations gives the authors less reason to favor any one potential explanation over the others. As I said previously, the papers for this section are very insightful and very well done. I made a number of other speciﬁc comments at the World Congress, but I really cannot say that many of them merit being written down (especially now that the papers are in print). I therefore thought that I would devote the remainder of my time to talking more broadly about behavioral economics and its situation within the economics profession. Behavioral economics is a potential revolution. If one judges its progress by looking at top journal publications or its success in attracting top economists, it is doing very well. In every ﬁeld of economics, however, it has not yet affected how most work is done. What will be needed for the behavioral economics revolution to succeed? As an outsider, I clearly cannot offer the kind of insight based on a detailed understanding of various branches of the literature that many others could provide. Instead, I will try to approach the question as an amateur sociologist of the economics profession. There have been a number of recent successful revolutions in economic methodology (e.g., game theory and rational expectations). Behavioral economics (and other literatures like that on nonrational learning) may get valuable lessons from studying their progress.


Any such lessons, of course, will be about what makes ﬁelds take off in our profession and not necessarily about what makes research valuable. One general thought that occurred to me on reﬂecting about revolutions is that the presence of interesting pure theory questions has spurred on many past revolutions. For example, I would argue that the main reason why the Folk Theorem continued to receive so much attention long after the basic principle was known was that economists enjoyed thinking and reading about cleverly constructed strategies. Franklin Fisher (1989) has criticized the ﬁnite-horizon reputation literature as an example of a literature driven by the elegance and sophistication of the analysis. Inﬁnite-horizon models are preferable descriptively and easily give what is probably the appropriate answer: that forming a reputation is possible, but will not necessarily happen. I do not want to debate the value of the reputation literature here. I just want to point out that the literature is extensive, and regardless of what people eventually conclude about the topic, it surely contributed to game theory’s success. Many economists like interesting models, and ﬁelds that can provide them will grow. Behavioral economists who want their ﬁeld to grow should not overlook this aspect of our profession. As a theorist, I really enjoyed reading Harris and Laibson’s discussion of the pathological properties of some hyperbolic models, and was intrigued by Fehr and Schmidt’s comment that workers who care more about fairness will shirk more in their contracting game. Undoubtedly, there are many other similar opportunities. In thinking about the theory/empirical divide, another thought that occurred to me (and here I am less conﬁdent in my casual empiricism) is that positive empirical evidence was not really a part of the takeoff of past revolutions. Empirical puzzles or shortcomings of the previous literature seem sometimes to be important. 
For example, the rational expectations revolution in macroeconomics was surely spurred by empirical evidence on the Phillips curve. The success of a new literature investigating failures of the old, however, seems not to be wrapped up with empirically demonstrating the superiority of new ideas. In the case of information economics, for example, the attention that Akerlof’s (1970) lemons model and Spence’s (1973) job market signaling model attracted was not due to any demonstration that a set of car prices or educational attainments were well understood using the models. In industrial organization, the game-theoretic literature exploded, whereas the empirical examination of such models proceeded at a much more leisurely pace. It seems that initial bursts of applied theory work have transformed fields and made them accepted long before any convincing empirical evidence is available. I would conclude that if behavioral economists want their revolution to occur, they might be well served to focus on producing applied theory papers that economists in various fields will want to teach their students.

There are other ways in which behavioral economists seem to be taking a different approach from past revolutions. They are spending much more time developing experimental support for their assumptions. They seem to spend much more time highlighting contrasts with the existing literature than did participants in earlier revolutions. [For example, Spence (1973) has only two references and Milgrom and Roberts (1982) only mention the decades-long legal and economic debates on predation in the first paragraph.] It seems hard to say, however, whether the leaders of previous revolutions would have taken advantage of experiments had they been easier to conduct, or if they would have been forced to write differently had they faced today’s review process.

To conclude, I would like to come back to a question I have carefully avoided. Should behavioral economists follow the advice I’ve given? My observations only concerned what seems to make economic revolutions successful, not what work should be valued. Personally, I like pure theory and think that one of the nice features of academia is that we can stop to think about interesting issues that arise. I am happy that the profession seems to value such work. Personally, I also firmly believe that empirical work that helps us assess the applicability of new (and old) theories is extremely important. For example, I regard the work that Laibson and coauthors are carrying out on consumption as perhaps the most valuable work in the field. Thus, I would like to say that, while studying the progress of past revolutions may provide behavioral economists with valuable insights into how they can succeed, I hope that they do not pay too much attention to this and let a preoccupation with success get in the way of doing important work.

References

Akerlof, G. A. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” Quarterly Journal of Economics, 84, 488–500.
Fisher, F. M. (1989), “Games Economists Play,” Rand Journal of Economics, 20, 113–124.
Milgrom, P. R. and J. Roberts (1982), “Predation, Reputation and Entry Deterrence,” Journal of Economic Theory, 27, 280–312.
Spence, A. M. (1973), “Job Market Signalling,” Quarterly Journal of Economics, 87, 355–374.

CHAPTER 8

Agglomeration and Market Interaction

Masahisa Fujita and Jacques-François Thisse

1. INTRODUCTION

The most salient feature of the spatial economy is the presence of a large variety of economic agglomerations. Our purpose is to review some of the main explanations of this universal phenomenon, as they are proposed in urban economics and modern economic geography. Because of space constraints, we restrict ourselves to the most recent contributions, referring the reader to our forthcoming book for a more complete description of the state of the art.

Although using agglomeration as a generic term is convenient at a certain level of abstraction, it should be clear that the concept of economic agglomeration refers to very distinct real-world situations. At one extreme lies the core-periphery structure corresponding to North-South dualism. For example, Hall and Jones (1999) observe that high-income nations are clustered in small industrial cores in the Northern Hemisphere, whereas productivity per capita steadily declines with distance from these cores.

As noted by many historians and development analysts, economic growth tends to be localized. This is especially well illustrated by the rapid growth of East Asia during the last few decades. We view East Asia as comprising Japan and nine other countries, that is, Republic of Korea, Taiwan, Hong Kong, Singapore, Philippines, Thailand, Malaysia, Indonesia, and China. In 1990, the total population of East Asia was approximately 1.6 billion. With only 3.5 percent of the total area and 7.9 percent of the total population, Japan accounted for 72 percent of the gross domestic product (GDP) and 67 percent of the manufacturing GDP of East Asia. In Japan itself, the economy is very much dominated by its core regions formed by the five prefectures containing the three major metropolitan areas of Japan: Tokyo and Kanagawa prefectures, Aichi prefecture (containing Nagoya MA), and Osaka and Hyogo prefectures.
These regions account for only 5.2 percent of the area of Japan, but for 33 percent of its population, 40 percent of its GDP, and 31 percent of its manufacturing employment. Hence, for the whole of East Asia, the Japanese core regions with a mere 0.18 percent of the total area accounted for 29 percent of East Asia’s GDP.
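The East Asia figures for the Japanese core regions follow from multiplying the shares quoted above. The short check below is an illustrative sketch only; the variable names are ours, and the inputs are the rounded percentages given in the text.

```python
# Shares quoted in the text, written as fractions.
japan_area_share_of_east_asia = 0.035   # Japan's share of East Asia's area
japan_gdp_share_of_east_asia = 0.72     # Japan's share of East Asia's GDP
core_area_share_of_japan = 0.052        # core regions' share of Japan's area
core_gdp_share_of_japan = 0.40          # core regions' share of Japan's GDP

# A share of a share is the product of the two shares.
core_area_share_of_east_asia = japan_area_share_of_east_asia * core_area_share_of_japan
core_gdp_share_of_east_asia = japan_gdp_share_of_east_asia * core_gdp_share_of_japan

print(f"{core_area_share_of_east_asia:.2%}")  # 0.18%
print(f"{core_gdp_share_of_east_asia:.0%}")   # 29%
```

The products reproduce the 0.18 percent of area and 29 percent of GDP stated in the text, up to rounding of the inputs.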


Strong regional disparities within the same country imply the existence of agglomerations at another spatial scale. For example, in Korea, the capital region (Seoul and Kyungki Province), which has an area corresponding to 11.8 percent of the country and 45.3 percent of the population, produces 46.2 percent of the GDP. In France, the contrast is even greater: the Ile-de-France (the metropolitan area of Paris), which accounts for 2.2 percent of the area of the country and 18.9 percent of its population, produces 30 percent of its GDP. Inside the Ile-de-France, only 12 percent of the available land is used for housing, plants, and roads, with the remaining land being devoted to agricultural, forestry, or natural activities.

Regional agglomeration is also reflected in large varieties of cities, as shown by the stability of the urban hierarchy within most countries. Cities themselves may be specialized in a very small number of industries, as are many medium-sized American cities. However, large metropolises like Paris, New York, or Tokyo are highly diversified in that they nest a large variety of industries, which are not related through direct linkages. Industrial districts involving firms with strong technological and/or informational linkages (e.g., the Silicon Valley or Italian districts engaged in more traditional activities), as well as factory towns (e.g., Toyota City), manifest various types of local specialization. Therefore, it appears that highly diverse size/activity arrangements exist at the regional and urban levels.

Although the sources are dispersed, not always trustworthy, and hardly comparable, data clearly converge to show the existence of an urban revolution. In Europe, the proportion of the population living in cities increased very slowly from 10 percent in 1300 to 12 percent in 1800.
It was approximately 20 percent in 1850, 38 percent in 1900, 52 percent in 1950, and is close to 75 percent nowadays, thus showing an explosive growth in the urban population. In the United States, the rate of urbanization increased from 5 percent in 1800 to more than 60 percent in 1950 and is now near 77 percent. In Japan, the rate of urbanization was about 15 percent in 1800, 50 percent in 1950, and is now about 78 percent. The proportion of the urban population in the world increased from 30 percent in 1950 to 45 percent in 1995 and should exceed 50 percent in 2005. Furthermore, concentration in very big cities keeps rising. In 1950, only two cities had populations above 10 million: New York and Greater London. In 1995, 15 cities belonged to this category. The largest one, Tokyo, with more than 26 million, exceeds the second one, New York, by 10 million. In 2025, 26 megacities will exceed 10 million. Economists must explain why ﬁrms and households concentrate in large metropolitan areas, whereas empirical evidence suggests that the cost of living in such areas is typically higher than in smaller urban areas (Richardson, 1987). Or, as Lucas (1988, p. 39) put it in a neat way: “What can people be paying Manhattan or downtown Chicago rents for, if not for being near other people?” But Lucas did not explain why people want, or need, to be near other people.


The increasing availability of high-speed transportation infrastructure and the fast-growing development of new informational technologies might suggest that our economies are entering an age that would culminate in the “death of distance.” If so, locational differences would gradually fade because agglomeration forces would be vanishing. In other words, cities would become a thing of the past. Matters are not that simple, however, because the opposite trend may just as well happen.1 Indeed, one of the general principles that will emerge from our analysis is that the relationship between the decrease in transport costs and the degree of agglomeration of economic activities is not that expected by many analysts: agglomeration happens provided that transport costs are below some critical threshold, although further decreases may yield dispersion of some activities due to factor price differentials.2 In addition, technological progress brings about new types of innovative activities that benefit most from being agglomerated and, therefore, tend to arise in developed areas (Audretsch and Feldman, 1996).

Consequently, the wealth or poverty of people seems to be more and more related to the existence of prosperous and competitive clusters of specific industries, as well as to the presence of large and diversified metropolitan areas. The recent attitude taken in several institutional bodies and media seems to support this view. For example, in its Entering the 21st Century: World Development Report 1999/2000, the World Bank stresses the importance of economic agglomerations and cities for boosting growth and escaping from the poverty trap. Another example of this increasing awareness of the relevance of cities in modern economies can be found in The Economist (1995, p. 18):

The liberalization of world trade and the influence of regional trading groups such as NAFTA and the EU will not only reduce the powers of national governments, but also increase those of cities. This is because an open trading system will have the effect of making national economies converge, thus evening out the competitive advantage of countries, while leaving those of cities largely untouched. So in the future, the arenas in which companies will compete may be cities rather than countries.

The remainder of this paper is organized as follows. In Section 2, we show why the competitive framework can hardly be the foundation for the economics of agglomeration. We then briefly review the alternative modeling strategies. In the hope of making our paper accessible to a broad audience, Section 3 presents in detail the two (specific) models that have been used so far to study the spatial distribution of economic activities. Several extensions of these models are discussed in Section 4. Section 5 concludes with some suggestions for further research and policy implications.

1 For example, recent studies show that, in the United States, 86 percent of net delivery capacity is concentrated in the 20 largest cities. This suggests that the United States is quickly becoming a country of digital haves and have-nots, with many small businesses unable to compete, and minority neighborhoods and rural areas getting left out.
2 Transportation (or transfer) costs are broadly defined to include all the factors that drive a wedge between prices at different locations, such as shipping costs per se, tariff and nontariff barriers to trade, different product standards, difficulty of communication, and cultural differences.

Agglomeration and Market Interaction

2. MODELING STRATEGIES OF ECONOMIC AGGLOMERATIONS

As a start, it is natural to ask the following question: to what extent is the competitive paradigm useful in understanding the main features of the economic landscape? The general competitive equilibrium model is indeed the benchmark used by economists when they want to study the market properties of an economic issue. Before proceeding, we should remind the reader that the essence of this model is that all trades are impersonal: when making their production or consumption decisions, economic agents need to know the price system only, which they take as given. At a competitive equilibrium, prices provide firms and consumers with all the information they must know to maximize their profit and their utility. The most elegant and general model of a competitive economy is undoubtedly that developed by Arrow and Debreu. In this model, a commodity is defined not only by its physical characteristics, but also by the place it is made available. This implies that the same good traded at different places is treated as different economic commodities. Within this framework, choosing a location is part of choosing commodities. This approach integrates spatial interdependence of markets into general equilibrium in the same way as other forms of interdependence. Thus, the Arrow–Debreu model seems to obviate the need for a theory specific to the spatial context. Unfortunately, as will be seen later, the competitive model cannot generate economic agglomerations without assuming strong spatial inhomogeneities. More precisely, we follow Starrett (1978) and show that introducing a homogeneous space (in a sense that will be made precise below) in the Arrow–Debreu model implies that total transport costs in the economy must be zero at any spatial competitive equilibrium, and thus trade and cities cannot arise in equilibrium.
In other words, the competitive model per se cannot be used as the foundation for the study of a spatial economy because we are interested in identifying purely economic mechanisms leading agents to agglomerate in a featureless plain.3 This is because we concur with Hoover (1948, p. 3) for whom: Even in the absence of any initial differentiation at all, i.e., if natural resources were distributed uniformly over the globe, patterns of specialization and concentration of activities would inevitably appear in response to economic, social, and political principles.

3 Ellickson and Zame (1994) disagree with this claim and argue that the introduction of moving costs in a dynamic setting may be sufficient to save the competitive paradigm. To the best of our knowledge, however, the implications of their approach have not yet been fully worked out.

Fujita and Thisse

2.1. Breakdown of the Competitive Price Mechanism in a Homogeneous Spatial Economy

The economy is formed by agents (ﬁrms and households) and by commodities (goods and services). A ﬁrm is characterized by a set of production plans, with each production plan describing a possible input–output relation. A household is identiﬁed by a relation of preference, by a bundle of initial resources, and by shares in ﬁrms’ proﬁts. A competitive equilibrium is then described by a price system (one price per commodity), a production plan for each ﬁrm, and a consumption bundle for each household that satisﬁes the following conditions: at the prevailing prices (i) supply equals demand for each commodity; (ii) each ﬁrm maximizes its proﬁt subject to its production set; and (iii) each household maximizes her utility under her budget constraint deﬁned by the value of her initial endowment and her shares in ﬁrms’ proﬁts. In other words, all markets clear while each agent chooses her most preferred action at the equilibrium prices. Space involves a ﬁnite number of locations. Transportation within each location is costless, but shipping goods from one location to another requires the use of resources. Without loss of generality, transportation between any two locations is performed by a proﬁt-maximizing carrier who purchases goods in a location at the market prices prevailing in this location and sells them in the other location at the corresponding market prices, while using goods and land in each location as inputs. A typical ﬁrm produces in a small number of places. Likewise, a household has a very small number of residences. For simplicity, we therefore assume that each ﬁrm (each household) chooses a single location and engages in production (consumption) activities there. However, ﬁrms and households are free to choose any location they want (the industry is footloose). For expositional convenience, we distinguish explicitly prices and goods by their location. 
Given this convention, space is said to be homogeneous when (i) the utility function and the consumption set are the same regardless of the location in which the household resides, and (ii) the production set of a firm is independent of the location elected by this firm. In other words, consumers and producers have no intrinsic preferences for one location over others. In this context, the following unexpected result, which we call the Spatial Impossibility Theorem, has been proven by Starrett (1978).

Theorem 2.1. Consider an economy with a finite number of agents and locations. If space is homogeneous, transport is costly, and preferences are locally nonsatiated, then there is no competitive equilibrium involving transportation.

What does it mean? If economic activities are perfectly divisible, a competitive equilibrium exists and is such that each location operates as an autarky. For example, when households are identical, locations have the same relative prices and the same production structure (backyard capitalism). This is hardly
a surprising outcome because, by assumption, there is no reason for economic agents to distinguish among locations and each activity can operate at an arbitrarily small level. Firms and households thus succeed in reducing transport costs to their absolute minimum, namely zero. However, as observed by Starrett (1978, p. 27), when economic activities are not perfectly divisible, the transport of some goods between some places becomes unavoidable: . . . as long as there are some indivisibilities in the system (so that individual operations must take up space) then a sufficiently complicated set of interrelated activities will generate transport costs (Starrett 1978, p. 27).

In this case, the Spatial Impossibility Theorem tells us that no competitive equilibrium exists. This is clearly a surprising result that requires more explanations. For simplicity, we restrict ourselves to the case of two locations, A and B. When both locations are not in autarky, one should keep in mind that the price system must do two different jobs simultaneously: (i) to support trade between locations (while clearing the markets in each location) and (ii) to prevent ﬁrms and households from relocating. The Spatial Impossibility Theorem says that, in the case of a homogeneous space, it is impossible to hit two birds with one stone: the price gradients supporting trade bear wrong signals from the viewpoint of locational stability. Indeed, if a set of goods is exported from A to B, then the associated positive price gradients induce producers located in A (who seek a higher revenue) to relocate in B, whereas location B’s buyers (who seek lower prices) want to relocate in A. Likewise, the export of another set of goods from B to A encourages such “cross-relocation.” The land rent differential between the two locations can discourage the relocation in one direction only. Hence, as long as trade occurs at positive costs, some agents always want to relocate. To ascertain the fundamental cause for this nonexistence, it is helpful to illustrate the difﬁculty encountered by using a standard diagram approach. Depicting the whole trade pattern between two locations would require a diagram with six dimensions (two tradable goods and land at each location), which is a task beyond our capability. Thus, we focus on a two-dimensional subspace of the whole pattern by considering the production of good i only, which is traded between A and B, while keeping the other elements ﬁxed. 
Because the same physical good available at two distinct locations corresponds to two different commodities, this is equivalent to studying the production possibility frontier between two different economic goods. Suppose that, at most, one unit of good i is produced by one firm at either location using a fixed bundle of inputs. For simplicity, the cost of these inputs is assumed to be the same in both locations. The good is shipped according to an iceberg technology: when $x_i$ units of the good are moved between A and B, only the quantity $x_i/\Upsilon$ arrives at its destination, with $\Upsilon > 1$, whereas the rest melts away en route (Samuelson, 1983). In this context, if the firm is located in A, then the output is represented by point E on the vertical axis in Figure 8.1; if the entire output is shipped to B, then the fraction $1/\Upsilon$ arrives at B, which is denoted by point F on the horizontal axis. Hence, when the firm is at A, the set of feasible allocations of the output between the two locations is given by the triangle OEF. Space being homogeneous, if the firm locates at B, the set of feasible allocations between the two places is now given by the triangle OE′F′. Hence, as long as the firm's location is not fixed, the set of feasible allocations is given by the union of the two triangles. Let the firm be set up at A and assume that the demand conditions are such that good i is consumed in both locations so that trade occurs. Then, to support any feasible trade pattern, represented by an interior point of the segment EF, the price vector $(p_{iA}, p_{iB})$ must be such that $p_{iA}/p_{iB} = 1/\Upsilon$, as shown in Figure 8.1. However, under these prices, it is clear that the firm can obtain a strictly higher profit by locating in B and choosing the production plan E′ in Figure 8.1. This implies that there is no competitive price system that can support both the existence of trade and a profit-maximizing location for the firm. This difficulty arises from the nonconvexity of the set of feasible allocations. If transportation was costless, the set of feasible allocations would be given by the triangle OEE′ in Figure 8.1, which is convex. In this case, the firm would face no incentive to relocate. Similarly, if the firm's production activity was perfectly divisible, this set would again be equal to the triangle OEE′, and no difficulty would arise. Therefore, even though the individual land consumption is endogenous, we may conclude that the fundamental reason for the Spatial Impossibility Theorem is the nonconvexity of the set of feasible allocations caused by the existence of positive transport costs and the fact that agents have an address in space.

Figure 8.1. The set of feasible allocations in a homogeneous space. [Axes: $x_{iB}$ (horizontal) and $x_{iA}$ (vertical); marked points E, F, E′, F′ and the price vector $(p_{iA}, p_{iB})$.]
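The relocation argument above can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function names and the numbers ($\Upsilon = 1.25$, $p_{iB} = 1$) are illustrative assumptions, not values taken from the text. It shows that, at the prices needed to support trade from A, every point on the segment EF yields the same revenue, while the plan E′ at location B yields strictly more.

```python
def revenue(location, sold_in_A, sold_in_B, p_A, p_B, upsilon):
    """Revenue of a firm that makes one unit at `location` and delivers
    sold_in_A units to A and sold_in_B units to B (iceberg transport)."""
    if location == "A":
        used = sold_in_A + upsilon * sold_in_B   # shipping to B melts output
    else:
        used = upsilon * sold_in_A + sold_in_B   # shipping to A melts output
    if used > 1.0 + 1e-12:
        raise ValueError("allocation is not feasible with one unit of output")
    return p_A * sold_in_A + p_B * sold_in_B


def relocation_gain(upsilon, p_B=1.0):
    """Extra revenue from moving to B when prices support trade from A."""
    p_A = p_B / upsilon                           # p_A / p_B = 1 / upsilon
    at_A = revenue("A", 1.0, 0.0, p_A, p_B, upsilon)   # point E
    at_B = revenue("B", 0.0, 1.0, p_A, p_B, upsilon)   # point E'
    return at_B - at_A


# Illustrative numbers only (assumed, not from the text).
print(revenue("A", 0.5, 0.4, 0.8, 1.0, 1.25))   # an interior point of EF
print(relocation_gain(1.25))
```

Because the gain is positive for any $\Upsilon > 1$, no price ratio can both support trade and keep the firm where it is, which is the content of the nonconvexity argument.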

Some remarks are still in order. First, we have assumed that each firm locates in a single region. The theorem could be generalized to permit firms to run distinct plants, one plant per location, because each plant amounts to a separate firm in the competitive setting (Koopmans, 1957). Second, we have considered a closed economy. The theorem can be readily extended to allow for trade with the rest of the world provided that each location has the same access to the world markets to satisfy the assumption of a homogeneous space. Third, the size of the economy is immaterial for the Spatial Impossibility Theorem to hold in that assuming a “large economy,” in which competitive equilibria often emerge as the outcome generated by several institutional mechanisms, does not affect the result because the value of total transport costs within the economy rises when agents are replicated. Last, the following result sheds extra light on the meaning of the Spatial Impossibility Theorem (Fujita and Thisse, 2002).

Corollary 2.2. If there exists a competitive equilibrium in a spatial economy with a homogeneous space, then the land rent must be the same in all locations.

This result has the following fundamental implication for us: in a homogeneous space, the competitive price mechanism is unable to explain why the land rent is higher in an economic agglomeration (such as a city, a central business district, or an industrial cluster) than in the surrounding area. This clearly shows the limits of the competitive paradigm for studying the agglomeration of firms and households.

2.2. What Are the Alternative Modeling Strategies?

Thus, if we want to understand something about the spatial distribution of economic activities and, in particular, the formation of major economic agglomerations as well as regional specialization and trade, the Spatial Impossibility Theorem tells us that we must make at least one of the following three assumptions:

(i) space is heterogeneous (as in the neoclassical theory of international trade);
(ii) externalities in production and consumption exist (as in urban economics);
(iii) markets are imperfectly competitive (as in the so-called “new” economic geography).

Of course, in reality, economic spaces are the outcome of different combinations of these three agglomeration forces. However, it is convenient here to distinguish them in order to isolate the effects of each.

A. Comparative advantage models. The heterogeneity of space introduces the uneven distribution of immobile resources (such as mineral deposits or some production factors) and amenities (climate), as well as the existence of transport nodes (ports, transhipment points) or trading places. This approach, while retaining the assumption of constant returns and perfect competition, yields comparative advantage among locations and gives rise to interregional and intercity trade.

B. Externality models. Unlike models of comparative advantage, the basic forces for spatial agglomeration and trade are generated endogenously through nonmarket interactions among firms and/or households (knowledge spillovers, business communications, and social interactions). Again, this approach allows us to appeal to the constant returns/perfect competition paradigm.4

C. Imperfect competition models. Firms are no longer price-takers, thus making their price policy dependent on the spatial distribution of consumers and firms. This generates some form of direct interdependence between firms and households that may produce agglomerations. However, it is useful to distinguish two types of approaches.

C1. Monopolistic competition. This leads to some departure from the competitive model and allows for firms to be price-makers and to produce differentiated goods under increasing returns; however, strategic interactions are weak because one assumes a continuum of firms.

C2. Oligopolistic competition. Here, we face the integer aspect of location explicitly. That is, we assume a finite number of large agents (firms, local governments, and land developers) who interact strategically by accounting for their market power.

The implications of the modeling strategy selected are important. For example, models under A, B, and C1 permit the use of a continuous density approach that seems to be in line with what geographers do. By contrast, under C2, it is critical to know “who is where” and with whom the corresponding agent interacts. In addition, if we focus on the heterogeneity of space, the market outcome is socially optimal.
On the other hand, because the other two approaches involve market failures, the market outcome is likely to be inefficient. Models of comparative advantage have been extensively studied by international and urban economists (Fujita, 1989), whereas models of spatial competition have attracted a lot of attention in industrial organization (Anderson, de Palma, and Thisse, 1992). Because Ed Glaeser and José Scheinkman deal with nonmarket interactions, we choose to focus on market interactions, that is, models belonging to class C1. Although this class of models was initially developed in the context of intraurban agglomeration with a land market (e.g., Fujita 1988), we restrict ourselves to multiregional models of industrial agglomeration.

4 See, e.g., the now classical papers by Henderson (1974) and by Fujita and Ogawa (1982).

3. CORE AND PERIPHERY: A MONOPOLISTIC COMPETITION APPROACH

The spatial economy is replete with pecuniary externalities. For example, when some workers choose to migrate, they are likely to affect both the labor and product markets in their region of origin, thus affecting the well-being of those who stay put. Moreover, the moving workers do not account either for the impact of their decision on the workers and firms located in the region of destination. Still, their moves will increase the level of demand inside this region, thus making the place more attractive to firms. Everything else being equal, they will also depress the local labor market so that the local wage is likely to be affected negatively. In sum, these various changes may increase or decrease the attractiveness of the destination region for outside workers and firms. Such pecuniary externalities are especially relevant in the context of imperfectly competitive markets, because prices do not reflect perfectly the social values of individual decisions. They are also better studied within a general equilibrium context to account for the interactions between the product and labor markets. In particular, such a framework allows us to study the dual role of individuals as workers and consumers. At first sight, this seems to be a formidable task. Yet, as shown by Krugman (1991a), several of these various effects can be combined and studied within a simple enough general equilibrium model of monopolistic competition, which has come to be known as the core-periphery model. Recall that monopolistic competition in the manner of Chamberlin involves consumers with a preference for variety (varietas delectat), whereas firms producing these varieties compete for a limited amount of resources because they face increasing returns. The prototype that has emerged from the industrial organization literature is the model developed by Spence (1976) and Dixit and Stiglitz (1977), sometimes called the S-D-S model.
These authors assume that each firm is negligible in the sense that it may ignore its impact on, and hence reactions from, other firms, but retains enough market power for pricing above marginal cost regardless of the total number of firms (like a monopolist). Moreover, the position of a firm's demand depends on the actions taken by all firms in the market (as in perfect competition). In many applications, the S-D-S model has proven to be a very powerful instrument for studying the aggregate implications of monopoly power and increasing returns, especially when these are the basic ingredients of self-sustaining processes such as those encountered in modern theories of growth and geography (Matsuyama, 1995). There are three reasons for this. First, although each firm is a price-maker, strategic interactions are very weak in this model, thus making the existence of an equilibrium much less problematic than in general equilibrium under imperfect competition (see, e.g., Bonanno, 1990). Second, the assumption of free entry and exit leads to zero profit so that a worker's income is just equal to her wage, another major simplification. Last, the difference between price competition and quantity competition that plagues oligopoly models is immaterial in a monopolistic competitive setting. Indeed, being negligible to the market, each firm behaves as a monopolist on its residual demand, which makes it indifferent between using price or quantity as a strategy.

3.1. The Framework

We consider a 2 × 2 × 2 setting. The economic space is made of two regions (A and B). The economy has two sectors, the modern sector (M) and the traditional sector (T). There are two production factors, the high-skilled workers (H) and the low-skilled workers (L). The M-sector produces a continuum of varieties of a horizontally differentiated product under increasing returns, using H as the only input. The T-sector produces a homogeneous good under constant returns, using unskilled labor L as the only input. The economy is endowed with L unskilled workers and with H skilled workers (labor dualism). The skilled workers are perfectly mobile between regions, whereas the unskilled workers are immobile. This extreme assumption is justified because the skilled are more mobile than the unskilled over long distances (SOPEMI 1998). Finally, the unskilled workers are equally distributed between the two regions, and thus regions are a priori symmetric. The technology in the T-sector is such that one unit of output requires one unit of L. The output of the T-sector is costlessly traded between any two regions and is chosen as the numéraire so that $p_T = 1$. Hence, the wage of the unskilled workers is also equal to 1 in both regions. Each variety of the M-sector is produced according to the same technology such that the production of the quantity q(i) requires l(i) units of skilled labor given by

\[ l(i) = f + c\,q(i), \tag{3.1} \]

in which f and c are, respectively, the fixed and marginal labor requirements. Because there are increasing returns but no scope economies, each variety is produced by a single firm. This is because, due to the consumers' preference for variety, any firm obtains a higher share of the market by producing a differentiated variety than by replicating an existing one. The market equilibrium is the outcome of the interplay between a dispersion force and an agglomeration force. The centrifugal force is very simple. It has two sources: (i) the spatial immobility of the unskilled whose demands for the manufactured good are to be met and (ii) the fiercer competition that arises when firms locate back to back (d'Aspremont, Gabszewicz, and Thisse, 1979). The centripetal force is more involved. If a larger number of firms is located in one region, the number of varieties locally produced is also larger. This in turn induces some skilled workers living in the smaller region to move toward the larger region in which they may enjoy a higher standard of living. The resulting increase in the number of consumers creates a larger demand for the differentiated good which, therefore, leads additional firms to locate in this region. This implies the availability of more varieties in the region in question, but fewer in the other because there are scale economies at the firm's level. Consequently, as noted by Krugman (1991a, p. 486), there is circular causation in the manner of Myrdal, because these two effects reinforce each other: “manufactures production will tend to concentrate where there is a large market, but the market will be large where manufactures production is concentrated.” Let $\lambda$ be the fraction of skilled residing in region A and denote by $v_r(\lambda)$ the indirect utility a skilled worker enjoys in region r = A, B when the spatial distribution of skilled is $(\lambda, 1 - \lambda)$. A spatial equilibrium arises at $\lambda \in (0, 1)$ when $v(\lambda) \equiv v_A(\lambda) - v_B(\lambda) = 0$, at $\lambda = 0$ when $v(0) \leq 0$, or at $\lambda = 1$ when $v(1) \geq 0$. Such an equilibrium always exists when $v_r(\lambda)$ is a continuous function of $\lambda$. However, this equilibrium is not necessarily unique. Stability is then used to eliminate some of them. The stability of such an equilibrium is studied with respect to the following equation of motion:5

\[ \dot{\lambda} = \lambda\, v(\lambda)\,(1 - \lambda). \tag{3.2} \]
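To see how (3.2) moves the distribution of skilled workers, the following minimal sketch integrates the equation with an explicit Euler scheme. The two utility differentials used here, `v_agglomeration` and `v_dispersion`, are hypothetical linear functions chosen only for illustration; they are not derived from the model, and the step size and horizon are likewise assumptions.

```python
def simulate(v, lam0, dt=0.01, steps=20_000):
    """Explicit Euler integration of d(lam)/dt = lam * v(lam) * (1 - lam)."""
    lam = lam0
    for _ in range(steps):
        lam += dt * lam * v(lam) * (1.0 - lam)
        lam = min(1.0, max(0.0, lam))   # keep the share inside [0, 1]
    return lam


def v_agglomeration(lam):
    # Hypothetical differential: region A becomes more attractive as it grows.
    return lam - 0.5


def v_dispersion(lam):
    # Hypothetical differential: region A becomes less attractive as it grows.
    return 0.5 - lam


print(simulate(v_agglomeration, 0.51))  # drifts toward full agglomeration in A
print(simulate(v_agglomeration, 0.49))  # drifts toward full agglomeration in B
print(simulate(v_dispersion, 0.90))     # drifts back toward the even split
```

In both hypothetical cases the even split satisfies $v(\lambda) = 0$, but only under `v_dispersion` does a small deviation die out, which is the distinction the equation of motion is meant to capture.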

If $v(\lambda)$ is positive and $\lambda \in (0, 1)$, workers move from B to A; if it is negative, they go in the opposite direction. Clearly, any spatial equilibrium is such that $\dot{\lambda} = 0$. A spatial equilibrium is stable if, for any marginal deviation of the population distribution from the equilibrium, the equation of motion brings the distribution of skilled workers back to the original one.6 We assume that local labor markets adjust instantaneously when some skilled workers move from one region to the other. More precisely, the number of firms in each region must be such that the labor market-clearing conditions (3.12) and (3.22) remain valid for the new distribution of workers. Wages are then adjusted in each region for each firm to earn zero profits in any region having skilled workers, because the skilled move according to the utility differential.

5 This dynamic implies that the equilibrium is reached for t → ∞. One could alternately use the dynamic system proposed by Tabuchi (1986) in which the corner solutions λ = 0 and λ = 1 are reached within finite times. The difference becomes critical when the economy exhibits different equilibrium patterns over time.
6 Note that (3.2) provides one more justification for working with a continuum of agents: this modeling strategy allows one to respect the integer nature of an agent's location (her address) while describing the evolution of the regional share of production by means of a differential equation.

3.2. A Model with CES Utility and Iceberg Transport Costs

Although consumption takes place in a specific region, it is notationally convenient to describe preferences without explicitly referring to any particular region. Preferences are identical across all workers and described by a Cobb–Douglas utility:

\[ u = \frac{Q^{\mu}\, T^{1-\mu}}{\mu^{\mu} (1-\mu)^{1-\mu}}, \qquad 0 < \mu < 1, \tag{3.3} \]

where Q stands for an index of the consumption of the modern sector varieties, and T is the consumption of the output of the traditional sector. Because the modern sector provides a continuum of varieties of size M, the index Q is given by

\[ Q = \left[ \int_0^M q(i)^{\rho}\, di \right]^{1/\rho}, \qquad 0 < \rho < 1, \tag{3.4} \]

where q(i) represents the consumption of variety i ∈ [0, M]. Hence, each consumer displays a preference for variety. In (3.4), the parameter ρ stands for the inverse of the intensity of love for variety over the differentiated product. When ρ is close to 1, varieties are close to perfect substitutes; when ρ decreases, the desire to spread consumption over all varieties increases. If σ ≡ 1/(1 − ρ), then σ is the elasticity of substitution between any two varieties. Because there is a continuum of firms, each firm is negligible and the interactions between any two firms are zero, but aggregate market conditions of some kind (e.g., the average price across firms) affect any single firm. This provides a setting in which firms are not competitive (in the classic economic sense of having infinite demand elasticity), but at the same time they have no strategic interactions with one another [see (3.5)]. If y denotes the consumer income and p(i) the price of variety i, then the demand functions are

\[ q(i) = \mu y\, p(i)^{-\sigma} P^{\sigma-1}, \qquad i \in [0, M], \tag{3.5} \]

where P is the price index of the differentiated product given by

\[ P \equiv \left[ \int_0^M p(i)^{-(\sigma-1)}\, di \right]^{-1/(\sigma-1)}. \tag{3.6} \]

The corresponding indirect utility function is

\[ v = y P^{-\mu}. \tag{3.7} \]
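Equations (3.5) to (3.7) can be explored numerically by replacing the continuum of varieties with a finite grid. The sketch below is an illustration under that discretization assumption: each listed price stands for a slice of varieties of equal measure, and the function names and parameter values are hypothetical. It checks that the demands (3.5) exhaust exactly the share µ of income, and that a larger mass of varieties lowers the price index.

```python
def price_index(prices, mass, sigma):
    """Discretized (3.6): each price covers a slice of measure mass/len(prices)."""
    width = mass / len(prices)
    total = sum(p ** (-(sigma - 1.0)) for p in prices) * width
    return total ** (-1.0 / (sigma - 1.0))


def demand(p, income, P, mu, sigma):
    """Demand (3.5) for one variety priced at p."""
    return mu * income * p ** (-sigma) * P ** (sigma - 1.0)


def indirect_utility(income, P, mu):
    """Indirect utility (3.7), with the traditional good as numeraire."""
    return income * P ** (-mu)


def spending_on_varieties(prices, mass, income, mu, sigma):
    """Total expenditure on the differentiated good implied by (3.5)."""
    P = price_index(prices, mass, sigma)
    width = mass / len(prices)
    return sum(p * demand(p, income, P, mu, sigma) for p in prices) * width


# Hypothetical parameter values, for illustration only.
prices = [1.0, 1.2, 1.5, 2.0]
print(spending_on_varieties(prices, mass=2.0, income=10.0, mu=0.4, sigma=4.0))
print(price_index([1.0] * 4, 2.0, 4.0), price_index([1.0] * 4, 4.0, 4.0))
```

The second comparison illustrates the preference for variety: at unchanged prices, doubling the mass of varieties lowers P and therefore raises the indirect utility (3.7).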

Without loss of generality, we choose the unit of skilled labor such that c = 1 in (3.1). The output of the M-sector is shipped at a positive cost according to the “iceberg” technology: When one unit of the differentiated product is moved from region r to region s, only a fraction $1/\Upsilon$ arrives at its destination with $\Upsilon > 1$. Because mill and discriminatory pricing can be shown to be equivalent in the present setting, we may use the mill pricing interpretation in what follows. When variety i is sold in region r at the mill price $p_r(i)$, the price $p_{rs}(i)$ paid by a consumer located in region s (≠ r) is $p_{rs}(i) = p_r(i)\Upsilon$. If the distribution of firms is $(M_r, M_s)$, using (3.6) the price index $P_r$ in region r is then given by

\[ P_r = \left[ \int_0^{M_r} p_r(i)^{-(\sigma-1)}\, di + \Upsilon^{-(\sigma-1)} \int_0^{M_s} p_s(i)^{-(\sigma-1)}\, di \right]^{-1/(\sigma-1)}, \tag{3.8} \]

which clearly depends on the spatial distribution of firms, as well as the level of transport costs. Let $w_r$ denote the wage rate of a skilled worker living in region r. Because there is free entry and exit and, therefore, zero profit in equilibrium, the income of region r is

\[ Y_r = \lambda_r H w_r + L/2, \qquad r = A, B, \tag{3.9} \]

where $\lambda_r$ is the share of skilled workers residing in region r. Using (3.5), the total demand of the firm producing variety i and located in region r is

\[ q_r(i) = \mu\, p_r(i)^{-\sigma} Y_r (P_r)^{\sigma-1} + \mu\, p_r(i)^{-\sigma} Y_s \Upsilon^{-(\sigma-1)} (P_s)^{\sigma-1}. \tag{3.10} \]

Because each firm has a negligible impact on the market, it may accurately neglect the impact of a price change over consumers' income ($Y_r$) and other firms' prices, hence on the regional price indexes ($P_r$). Consequently, (3.10) implies that, regardless of the spatial distribution of consumers, each firm faces an isoelastic demand. This very convenient property depends crucially on the assumption of an iceberg transport cost, which affects here the level of demand but not its elasticity. The profit function of a firm in r is $\pi_r(i) = [p_r(i) - w_r]\, q_r(i) - w_r f$. Because varieties are equally weighted in the utility function, the equilibrium price is the same across all firms located in region r. Solving the first-order condition yields the common equilibrium price

\[ p_r^* = \frac{w_r}{\rho}. \tag{3.11} \]

Substituting $p_r^*$ into $\pi_r(i)$ leads to

\[ \pi_r = \frac{w_r}{\sigma - 1} \left[ q_r - (\sigma - 1) f \right]. \]

Under free entry, profits are zero so that the equilibrium output of a firm is given by $q_r^* = (\sigma - 1) f$, which is independent of the spatial distribution of demand. As a result, in equilibrium, a firm's labor requirement is a constant given by $l^* = \sigma f$, and thus the total number of firms in the M-sector is equal to $H/(\sigma f)$. The corresponding distribution of firms

\[ M_r = \lambda_r H/(\sigma f), \qquad r = A, B \tag{3.12} \]

depends only on the distribution of the skilled workers. Hence, the model allows for studying the spatial distribution of the modern sector but not for its size. Introducing the equilibrium prices (3.11) and substituting (3.12) for $M_r$ in the regional price index (3.8) gives

\[ P_r = \kappa_1 \left[ \lambda_r w_r^{-(\sigma-1)} + \lambda_s (w_s \Upsilon)^{-(\sigma-1)} \right]^{-1/(\sigma-1)}, \tag{3.13} \]

where $\kappa_1$ is a positive constant. Finally, we consider the labor market-clearing conditions for a given distribution of workers. The wage prevailing in region r is the highest wage that firms located there can pay under the nonnegative profit constraint. For that, we evaluate the demand (3.10) as a function of the wage through the equilibrium price (3.11):

\[ q_r(w_r) = \mu (w_r/\rho)^{-\sigma} \left[ Y_r P_r^{\sigma-1} + Y_s \Upsilon^{-(\sigma-1)} P_s^{\sigma-1} \right]. \]

Because this expression is equal to $(\sigma - 1) f$ when profits are zero, we obtain the following implicit expression for the zero-profit wages:

\[ w_r^* = \kappa_2 \left[ Y_r P_r^{\sigma-1} + Y_s \Upsilon^{-(\sigma-1)} P_s^{\sigma-1} \right]^{1/\sigma}, \tag{3.14} \]

where $\kappa_2$ is a positive constant. Clearly, $w_r^*$ is the equilibrium wage in region r when $\lambda_r > 0$. Substituting (3.9) for $Y_r$ in the indirect utility (3.7), we obtain the real wage as follows:

\[ v_r = \omega_r = \frac{w_r^*}{P_r^{\mu}}, \qquad r = A, B. \tag{3.15} \]
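For a given distribution of skilled workers, the system (3.9), (3.13), (3.14), (3.15) can be solved for wages by simple fixed-point iteration. The sketch below is one possible way to do so, not the authors' procedure. Two things in it are assumptions: the values of $\kappa_1$ and $\kappa_2$, which the text only describes as positive constants and which are derived here from (3.8), (3.11), (3.12) and $q_r^* = (\sigma - 1) f$; and the defaults H = L = f = 1, which are arbitrary. Convergence of the plain iteration is not guaranteed for every distribution of workers.

```python
def short_run_equilibrium(lam, upsilon, mu, sigma, H=1.0, L=1.0, f=1.0,
                          tol=1e-12, max_iter=100_000):
    """Iterate on (3.9), (3.13), (3.14); return wages, price indexes, real wages.

    Index 0 is region A (share lam), index 1 is region B (share 1 - lam).
    """
    rho = (sigma - 1.0) / sigma
    phi = upsilon ** (-(sigma - 1.0))
    # Assumed constants, derived here rather than quoted from the text.
    kappa1 = (1.0 / rho) * (H / (sigma * f)) ** (-1.0 / (sigma - 1.0))
    kappa2 = rho * (mu / ((sigma - 1.0) * f)) ** (1.0 / sigma)
    shares = (lam, 1.0 - lam)

    def price_indexes(w):
        out = []
        for r in (0, 1):
            s = 1 - r
            inner = (shares[r] * w[r] ** (-(sigma - 1.0))
                     + shares[s] * (w[s] * upsilon) ** (-(sigma - 1.0)))
            out.append(kappa1 * inner ** (-1.0 / (sigma - 1.0)))   # (3.13)
        return out

    w = [1.0, 1.0]                       # arbitrary starting guess
    for _ in range(max_iter):
        P = price_indexes(w)
        Y = [shares[r] * H * w[r] + L / 2.0 for r in (0, 1)]       # (3.9)
        new_w = []
        for r in (0, 1):
            s = 1 - r
            market = (Y[r] * P[r] ** (sigma - 1.0)
                      + Y[s] * phi * P[s] ** (sigma - 1.0))
            new_w.append(kappa2 * market ** (1.0 / sigma))         # (3.14)
        gap = max(abs(new_w[r] - w[r]) for r in (0, 1))
        w = new_w
        if gap < tol:
            break

    P = price_indexes(w)
    omega = [w[r] / P[r] ** mu for r in (0, 1)]                    # (3.15)
    return w, P, omega


# Hypothetical parameter values: even split of skilled workers.
wages, indexes, real_wages = short_run_equilibrium(0.5, 1.5, 0.4, 5.0)
print(wages, real_wages)
```

With an even split the two regions are identical, so the iteration should return equal wages and equal real wages; with `lam = 1.0` the second entry of each list is the hypothetical wage and real wage a firm or worker would face in the empty region.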

Finally, the Walras law implies that the traditional sector market is in equilibrium provided that the equilibrium conditions noted previously are satisfied. Summarizing the foregoing developments, the basic equations for our economy are given by (3.9), (3.13), (3.14), and (3.15). From now on, set $\lambda_A = \lambda$ and $\lambda_B = 1 - \lambda$.

3.2.1. The Core-Periphery Structure

Suppose that the modern sector is concentrated in one region, say region A, so that $\lambda = 1$. We wish to determine conditions under which the real wage a skilled worker may obtain in region B does not exceed the real wage she gets in region A. Setting $\lambda = 1$ in (3.9), (3.13), (3.14), and (3.15), we get

\[ \frac{\omega_B}{\omega_A} = \left[ \frac{1+\mu}{2}\, \Upsilon^{-\sigma(\mu+\rho)} + \frac{1-\mu}{2}\, \Upsilon^{-\sigma(\mu-\rho)} \right]^{1/\sigma}. \tag{3.16} \]

The first term in the right-hand side of (3.16) is always decreasing in $\Upsilon$. Therefore, if $\mu \geq \rho$, the second term is also nonincreasing so that the ratio $\omega_B/\omega_A$ always decreases with $\Upsilon$, thus implying that $\omega_B < \omega_A$ for all $\Upsilon > 1$. This means that the core-periphery structure is a stable equilibrium for all $\Upsilon > 1$. When

\[ \mu \geq \rho, \tag{3.17} \]

varieties are so differentiated that firms' demands are not very sensitive to differences in transportation costs, thus making the agglomeration force very strong. More interesting is the case in which

\[ \mu < \rho; \tag{3.18} \]

that is, varieties are not very differentiated so that firms' demands are sufficiently elastic for the agglomeration force to be weak. If (3.18) holds, $\Upsilon^{-\mu\sigma+\sigma-1}$ goes to infinity when $\Upsilon \to \infty$ and the ratio $\omega_B/\omega_A$ is as depicted in Figure 8.2. In this case, there exists a single value $\Upsilon_{sustain} > 1$ such that $\omega_B/\omega_A = 1$. Hence, the agglomeration is a stable equilibrium for any $\Upsilon \leq \Upsilon_{sustain}$. This occurs because firms can enjoy all the benefits of agglomeration without losing much of their business in the other region. Such a point is called the sustain point because, once firms are fully agglomerated, they stay so for all smaller values of $\Upsilon$. On the other hand, when transportation costs are sufficiently high ($\Upsilon > \Upsilon_{sustain}$), firms lose much on their exports, and thus the core-periphery structure is no longer an equilibrium.

Figure 8.2. Determination of the sustain point. [Axes: $\Upsilon$ (horizontal, with 1 and $\Upsilon_{sustain}$ marked) and $\omega_B/\omega_A$ (vertical, with the level 1 marked).]

Summarizing this discussion, we obtain:

Proposition 3.1. Consider a two-region economy.
(i) If $\mu \geq \rho$, then the core-periphery structure is always a stable equilibrium.


(ii) If $\mu < \rho$, then there exists a unique solution $\Upsilon_{\text{sustain}} > 1$ to the equation

$$\frac{1+\mu}{2}\,\Upsilon^{-\sigma(\mu+\rho)} + \frac{1-\mu}{2}\,\Upsilon^{-\sigma(\mu-\rho)} = 1,$$

such that the core-periphery structure is a stable equilibrium for any $\Upsilon \leq \Upsilon_{\text{sustain}}$.

Interestingly, this proposition provides formal support to the claim made by Kaldor (1970, p. 241) more than 30 years ago:

When trade is opened up between them, the region with the more developed industry will be able to supply the need of the agricultural area of the other region on more favourable terms: with the result that the industrial centre of the second region will lose its market and will tend to be eliminated.
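The equation in Proposition 3.1(ii) has no closed-form solution, but it is easy to solve numerically. The following is only a minimal sketch: the function names are hypothetical, the parameter values are purely illustrative, and it assumes the usual CES relation $\rho = (\sigma - 1)/\sigma$ rather than taking $\rho$ as a separate input.

```python
import math


def wage_ratio_pow(upsilon, mu, sigma):
    """Bracketed term of (3.16), i.e. (omega_B / omega_A) ** sigma."""
    rho = (sigma - 1.0) / sigma  # assumption: rho = (sigma - 1) / sigma
    return (0.5 * (1.0 + mu) * upsilon ** (-sigma * (mu + rho))
            + 0.5 * (1.0 - mu) * upsilon ** (-sigma * (mu - rho)))


def sustain_point(mu, sigma):
    """Root above 1 of the equation in Proposition 3.1(ii); inf if mu >= rho."""
    rho = (sigma - 1.0) / sigma
    if mu >= rho:
        return math.inf  # case (i): agglomeration is sustained for every upsilon

    def g(u):
        return wage_ratio_pow(u, mu, sigma) - 1.0

    lo = 1.0 + 1e-6      # just above 1, where the ratio has dipped below 1
    hi = 2.0
    while g(hi) <= 0.0:  # expand until the ratio exceeds 1 again
        hi *= 2.0
    for _ in range(200):  # plain bisection on [lo, hi]
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Illustrative values only (mu = 0.4, sigma = 5): the root is about 1.81.
    print(sustain_point(mu=0.4, sigma=5.0))
```

Bisection is used here only because the ratio starts at one, dips below it, and then rises without bound when $\mu < \rho$, so a sign change is guaranteed; any standard root finder would do.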

3.2.2. The Symmetric Structure

Proposition 3.1 suggests that the modern sector is geographically dispersed when transportation costs are high, at least when (3.18) holds. To check this, we consider the symmetric configuration ($\lambda = 1/2$). In this case, for a given $\Upsilon$, the symmetric equilibrium is stable (unstable) if the slope of the real wage differential $\omega_A(\lambda) - \omega_B(\lambda)$ is negative (positive) at $\lambda = 1/2$. Checking this condition requires fairly long calculations using all the equilibrium conditions. However, Fujita, Krugman, and Venables (1999) have shown the following results. First, when (3.18) does not hold, the symmetric equilibrium is always unstable. Second, when (3.18) holds, this equilibrium is stable (unstable) if $\Upsilon$ is larger (smaller) than some threshold value $\Upsilon_{\text{break}}$ given by

$$\Upsilon_{\text{break}} = \left[ \frac{(\rho + \mu)(1 + \mu)}{(\rho - \mu)(1 - \mu)} \right]^{1/(\sigma-1)}, \tag{3.19}$$

which is clearly larger than one. This is called the break point because symmetry between the two regions is no longer a stable equilibrium for lower values of $\Upsilon$. It is interesting to note that $\Upsilon_{\text{break}}$ depends on the same parameters as $\Upsilon_{\text{sustain}}$. It is immediate from (3.19) that $\Upsilon_{\text{break}}$ is increasing with the share of the modern sector ($\mu$) and with the degree of product differentiation ($1/\rho$).

Because $\Upsilon_{\text{break}} < \Upsilon_{\text{sustain}}$ can be shown to hold,7 there exists a domain of parameters over which there is multiplicity of equilibria, namely agglomeration and dispersion, as depicted in Figure 8.3. More precisely, when $\Upsilon > \Upsilon_{\text{sustain}}$, the economy necessarily involves dispersion. When $\Upsilon < \Upsilon_{\text{break}}$, agglomeration always arises, the winning region depending on the initial conditions. Finally, when $\Upsilon_{\text{break}} \leq \Upsilon \leq \Upsilon_{\text{sustain}}$, both agglomeration and dispersion are stable equilibria. In this domain, the economy displays some hysteresis because dispersion (agglomeration) still prevails when transport costs fall below the sustain point (rise above the break point) while staying above the break point (below the sustain point).

7 See Neary (2001) for a proof.

[Figure 8.3. Bifurcation diagram for the core-periphery model. Horizontal axis: $\Upsilon$, with $\Upsilon_{\text{break}}$ and $\Upsilon_{\text{sustain}}$ marked; vertical axis: $\lambda$.]

Summarizing these results, when transportation costs are sufficiently low, all manufacturers are concentrated in a single region that becomes the core of the economy, whereas the other region, called the periphery, supplies only the traditional good. Firms in the modern sector are able to exploit increasing returns by selling more in the large market without losing much business in the small market. For exactly the opposite reason, the economy displays a symmetric regional pattern of production when transportation costs are large. Hence, this model allows for the possibility of divergence between regions, whereas the neoclassical model, based on constant returns and perfect competition in the two sectors, would predict symmetry only.

3.3. A Linear Model of Core-Periphery

The conclusions derived in Section 3.2 are very important for the space economy. This is why it is crucial to know how they depend on the specificities of the framework used. The use of both the CES utility and iceberg cost leads to a convenient setting in which demands have a constant elasticity. However, such a result conflicts with research in spatial pricing theory in which demand elasticity is shown to vary with distance. Moreover, although the iceberg cost captures the fact that shipping is resource-consuming, this modeling option implies that any increase in the mill price is accompanied by a proportional increase in transport cost, which seems unrealistic. Last, although models of the type considered in the foregoing are based on very specific assumptions, they are often beyond the reach of analytical resolution.

The setting considered here, which has been developed by Ottaviano, Tabuchi, and Thisse (2002), is very similar to that used in Section 3.2. However, there are two major differences. First, the output of the M-sector is traded at a cost of $\tau$ units of the numéraire per unit shipped between regions. This characteristic agrees more with reality, as well as with location theory, than the iceberg technology does. Second, preferences are given by a quasi-linear utility encapsulating a quadratic subutility instead of a Cobb–Douglas preference on the homogeneous and differentiated goods with CES subutility. These two specifications correspond to rather extreme cases: the former assumes an infinite elasticity of substitution between the differentiated product and the numéraire, the latter a unit elasticity. Moreover, firms' demands are linear and not isoelastic. Despite such major differences in settings, we will see that conclusions are qualitatively the same in the two models, thus suggesting that they hold for a whole class of models.

3.3.1. A Model with Quadratic Utility and Linear Transport Costs

Preferences are identical across individuals and described by a quasi-linear utility with a quadratic subutility that is supposed to be symmetric in all varieties:

$$u(q_0; q(i), i \in [0, M]) = \alpha \int_0^M q(i)\,di - (\beta - \delta) \int_0^M [q(i)]^2\,di - \delta \left[ \int_0^M q(i)\,di \right]^2 + q_0, \tag{3.20}$$

where $q(i)$ is the quantity of variety $i \in [0, M]$ and $q_0$ the quantity of a homogeneous good chosen as the numéraire. The parameters in (3.20) are such that $\alpha > 0$ and $\beta > \delta > 0$. In this expression, $\alpha$ expresses the intensity of preferences for the differentiated product, whereas $\beta > \delta$ means that consumers' preferences exhibit love of variety. Finally, for a given value of $\beta$, the parameter $\delta$ expresses the substitutability between varieties: the higher $\delta$, the closer substitutes the varieties. Admittedly, a quasi-linear utility abstracts from general equilibrium income effects and gives the corresponding framework a fairly strong partial equilibrium flavor. However, it does not remove the interaction between product and labor markets, thus allowing us to develop a full-fledged model of agglomeration formation, independently of the relative size of the manufacturing sector.

Any individual is endowed with one unit of labor (of type H or L) and $\bar{q}_0 > 0$ units of the numéraire. Her budget constraint can then be written as follows:

$$\int_0^M p(i)\, q(i)\,di + q_0 = y + \bar{q}_0,$$

where $y$ is the individual's labor income and $p(i)$ the price of variety $i$. The initial endowment $\bar{q}_0$ is supposed to be large enough for the residual consumption of the numéraire to be strictly positive for each individual. Hence, individual demand $q(i)$ for variety $i$ is given by

$$q(i) = a - (b + dM)\,p(i) + dP,$$

where

$$P \equiv \int_0^M p(i)\,di, \tag{3.21}$$

which can be interpreted as the price index in the modern sector, whereas $a \equiv 2\alpha/[\beta + (M-1)\delta]$, $b \equiv 1/[\beta + (M-1)\delta]$, and $d \equiv \delta/\{(\beta - \delta)[\beta + (M-1)\delta]\}$. Finally, each variety can be traded at a positive cost of $\tau$ units of the numéraire for each unit transported from one region to the other, regardless of the variety. The technologies are the same as in Section 3.1, but, for simplicity, $c$ is set equal to zero in (3.1). Labor market clearing implies that the number of firms belonging to the M-sector in region $r$ is

$$M_r = \lambda_r H / f. \tag{3.22}$$

Consequently, the total number of firms in the economy is constant and equal to $M = H/f$. Discriminatory and mill pricing are no longer equivalent in this model. In the sequel, we focus on discriminatory pricing, because this policy endows firms with flexibility in their price choice, something that could affect the process of agglomeration. This means that each firm sets a delivered price specific to each region. Hence, the profit function of a firm located in region $r$ is as follows:

$$\pi_r = p_{rr}\, q_{rr}(p_{rr})(L/2 + \lambda_r H) + (p_{rs} - \tau)\, q_{rs}(p_{rs})(L/2 + \lambda_s H) - f w_r.$$

To illustrate the type of interaction that characterizes this model of monopolistic competition, we describe how the equilibrium prices are determined. Each firm $i$ in region $r$ maximizes its profit $\pi_r$, assuming accurately that its price choice has no impact on the regional price indices

$$P_r \equiv \int_0^{M_r} p_{rr}(i)\,di + \int_0^{M_s} p_{sr}(i)\,di, \qquad s \neq r.$$

Because, by symmetry, the prices selected by the firms located within the same region are identical, the result is denoted by $p^*_{rr}(P_r)$ and $p^*_{rs}(P_s)$. Clearly, it must be that

$$M_r\, p^*_{rr}(P_r) + M_s\, p^*_{sr}(P_r) = P_r.$$

Given (3.22), it is then readily verified that the equilibrium prices are as follows:

$$p^*_{rr} = \frac{1}{2}\,\frac{2a + \tau d \lambda_s M}{2b + dM}, \tag{3.23}$$

$$p^*_{rs} = p^*_{ss} + \frac{\tau}{2}. \tag{3.24}$$

Clearly, these prices depend directly on the firms' distribution. In particular, $p^*_{rr}$ decreases with the number of firms in region $r$ and increases with the degree of product differentiation when $\tau$ is sufficiently small for the demands of the imported varieties to be positive. These results agree with what we know from standard models of product differentiation.
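As a rough check on (3.23) and (3.24), the sketch below evaluates the equilibrium prices and confirms that they are a fixed point of each firm's best reply when the regional price index is taken as given. The best-reply formula is not stated in the text; it is derived here from the linear demand (3.21) and the profit function above, and all function names and parameter values are hypothetical.

```python
def equilibrium_prices(lam_r, a, b, d, M, tau):
    """Prices (3.23)-(3.24) for region r (share lam_r) and region s (share 1 - lam_r)."""
    lam_s = 1.0 - lam_r
    p_rr = 0.5 * (2 * a + tau * d * lam_s * M) / (2 * b + d * M)
    p_ss = 0.5 * (2 * a + tau * d * lam_r * M) / (2 * b + d * M)
    p_rs = p_ss + tau / 2  # delivered price in s of a firm located in r
    p_sr = p_rr + tau / 2  # delivered price in r of a firm located in s
    return p_rr, p_ss, p_rs, p_sr


def best_reply_home(P_r, a, b, d, M):
    """Price maximizing p * (a - (b + d*M) * p + d * P_r), with P_r taken as given."""
    return (a + d * P_r) / (2 * (b + d * M))


if __name__ == "__main__":
    a, b, d, M, tau, lam_r = 1.0, 1.0, 1.0, 1.0, 0.2, 0.7  # illustrative only
    p_rr, p_ss, p_rs, p_sr = equilibrium_prices(lam_r, a, b, d, M, tau)
    # Price index in region r: local varieties plus imported ones.
    P_r = lam_r * M * p_rr + (1.0 - lam_r) * M * p_sr
    print(p_rr, best_reply_home(P_r, a, b, d, M))  # the two numbers should coincide
```

The same exercise with a larger `lam_r` gives a lower `p_rr`, which is the competition effect described in the text.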


It is easy to check that the equilibrium operating profits earned in each market by a firm established in $r$ are as follows:

$$\pi^*_{rr} = (b + dM)(p^*_{rr})^2 (L/2 + \lambda_r H),$$

$$\pi^*_{rs} = (b + dM)(p^*_{rs} - \tau)^2 (L/2 + \lambda_s H).$$

Increasing $\lambda_r$ has two opposite effects on $\pi^*_{rr}$. First, as $\lambda_r$ rises, the equilibrium price (3.23) falls as well as the quantity of each variety bought by each consumer living in region $r$. However, the total population of consumers residing in this region is now larger so that the profits made by a firm located in $r$ on local sales may increase. What is at work here is a global demand effect due to the increase in the local population that may compensate firms for the adverse price effect, as well as for the decrease in each worker's individual demand.

Entry and exit are free so that profits are zero in equilibrium. Hence, (3.22) implies that any change in the population of workers located in one region must be accompanied by a corresponding change in the number of firms. The equilibrium wage rates $w_r^*$ of the skilled are obtained from the zero-profit condition evaluated at the equilibrium prices: $w_r^*(\lambda_r) = (\pi^*_{rr} + \pi^*_{rs})/f$.

3.3.2. The Debate Agglomeration Vs. Dispersion Revisited

The indirect utility differential $\Delta v(\lambda)$ is obtained by plugging the equilibrium prices (3.23)–(3.24) and the equilibrium wages $w_r^*(\lambda)$ into the indirect utility associated with (3.20):

$$\Delta v(\lambda) \equiv v_A(\lambda) - v_B(\lambda) = C^* \tau (\tau^* - \tau)(\lambda - 1/2), \tag{3.25}$$

where $C^*$ is a positive constant and

$$\tau^* \equiv \frac{4af(3bf + 2dH)}{2bf(3bf + 3dH + dL) + d^2 H (L + H)} > 0. \tag{3.26}$$

It follows immediately from (3.25) that $\lambda = 1/2$ is always an equilibrium. Moreover, because the differential in (3.25) is linear in $\lambda$ and $C^* > 0$, for $\lambda \neq 1/2$ the indirect utility differential always has the same sign as $\lambda - 1/2$ if and only if $\tau < \tau^*$; if $\tau > \tau^*$, it has the opposite sign. In particular, when there are no increasing returns in the manufacturing sector ($f = 0$), the coefficient of $(\lambda - 1/2)$ is always negative because $\tau^* = 0$, and thus dispersion is the only (stable) equilibrium. This shows once more the importance of increasing returns for the possible emergence of an agglomeration.8 The same holds for product differentiation, because $\tau^*$ becomes arbitrarily small when varieties become less and less differentiated ($d \to \infty$).

8 Sonnenschein (1982) shows, a contrario, a related result: if the initial distribution of firms is uneven along a given circle, then the spatial adjustment of firms in the direction of higher profit leads the economy toward a uniform long-run equilibrium, each local economy being perfectly competitive.


It remains to determine when $\tau^*$ is sufficiently low for all demands to be positive at the equilibrium prices. This is so if and only if

$$\frac{L}{H} > \frac{6b^2f^2 + 8bdfH + 3d^2H^2}{dH(2bf + dH)}. \tag{3.27}$$

The inequality (3.27) means that the population of unskilled is large relative to the population of skilled. When (3.27) does not hold, the coefficient of $(\lambda - 1/2)$ in (3.25) is always positive for all transport costs that allow for interregional trade. In this case, the advantages of having a large home market always dominate the disadvantages incurred while supplying a distant periphery. The failure of (3.27) thus plays a role similar to (3.17).

More interesting is the case when (3.27) holds. Although the size of the industrial sector is captured here through the relative population size $L/H$ and not through its share in consumption, the intuition is similar: the ratio $L/H$ must be sufficiently large for the economy to display different types of equilibria according to the value of $\tau$. This result does not depend on the expenditure share on the manufacturing sector because of the absence of general equilibrium income effects: small or large sectors in terms of expenditure share are agglomerated when $\tau$ is small enough.

Finally, stability is studied using (3.2). When $\tau > \tau^*$, it is straightforward to see that the symmetric configuration is the only stable equilibrium. In contrast, when $\tau < \tau^*$, the symmetric equilibrium becomes unstable and workers agglomerate in region $r$ provided that the initial fraction of workers residing in this region exceeds 1/2. In other words, agglomeration arises when the transport cost is low enough.

Proposition 3.2. Consider a two-region economy with segmented markets. (i) When (3.27) does not hold, the core-periphery structure is the only stable equilibrium under trade. (ii) When (3.27) is satisfied, we have: for any $\tau > \tau^*$ the symmetric configuration is the only stable equilibrium with trade; for any $\tau < \tau^*$ the core-periphery pattern is the unique stable equilibrium; for $\tau = \tau^*$ any configuration is an equilibrium.
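The threshold (3.26), the condition (3.27), and the case distinction in Proposition 3.2 translate directly into a few lines of code. The sketch below is only illustrative: the function names and parameter values are hypothetical, and the caller is assumed to supply a transport cost $\tau$ that is low enough for interregional trade to take place, as the proposition requires.

```python
def tau_star(a, b, d, f, H, L):
    """Threshold transport cost (3.26)."""
    num = 4 * a * f * (3 * b * f + 2 * d * H)
    den = 2 * b * f * (3 * b * f + 3 * d * H + d * L) + d ** 2 * H * (L + H)
    return num / den


def condition_327(b, d, f, H, L):
    """True when inequality (3.27) holds."""
    rhs = (6 * b ** 2 * f ** 2 + 8 * b * d * f * H + 3 * d ** 2 * H ** 2) / (
        d * H * (2 * b * f + d * H))
    return L / H > rhs


def stable_pattern(tau, a, b, d, f, H, L):
    """Stable configuration per Proposition 3.2 (tau assumed to allow trade)."""
    if not condition_327(b, d, f, H, L):
        return "core-periphery"  # part (i)
    ts = tau_star(a, b, d, f, H, L)
    if tau > ts:
        return "symmetric"
    if tau < ts:
        return "core-periphery"
    return "any configuration"


if __name__ == "__main__":
    params = dict(a=1.0, b=1.0, d=1.0, f=1.0, H=1.0, L=10.0)  # illustrative only
    print(tau_star(**params))             # equals 20/43 for these values
    print(stable_pattern(0.3, **params))  # below the threshold
    print(stable_pattern(0.6, **params))  # above the threshold
```

Setting `f=0.0` in `tau_star` returns zero, which matches the remark that dispersion is the only stable outcome without increasing returns.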
Because (3.25) is linear in $\lambda$, the break point and the sustain point are the same, and thus history alone matters for the selection of the agglomerated outcome. Looking at the threshold value $\tau^*$ as given by (3.26), we first observe that $\tau^*$ increases with the degree of product differentiation ($d$ falls) when (3.27) holds. This is intuitively plausible because the agglomeration process is driven by the mobility of the skilled workers, whence their population must be sufficiently large for product differentiation to act as an agglomeration force. Second, higher fixed costs lead to a smaller number of firms/varieties. Still, it is readily verified that $\tau^*$ also increases when increasing returns become stronger ($f$ rises) when (3.27) holds. In other words, the agglomeration of the modern sector is more likely, the stronger are the increasing returns at the firm's level. Last, $\tau^*$ increases when the number of unskilled ($L$) decreases because the dispersion force is weaker.

Both models studied in this section yield similar results, suggesting that the core-periphery structure is robust against alternative specifications. Each model has its own merit. The former allows for income effects and the latter for a finer description of the role played by the key parameters of the economy. As will be seen later, both have been used in various extensions of the core-periphery model.

4. FURTHER TOPICS IN ECONOMIC GEOGRAPHY

In this section, we present an abbreviated version of a few recent contributions. The interested reader will find the models at greater length in the corresponding references.

4.1. On a ∩-Shaped Relationship Between Agglomeration and Transport Costs

The assumption of zero transport costs for the homogeneous good is not innocuous. Indeed, introducing positive transport costs for this good leads to some fundamental changes in the results presented previously. To permit trade of the traditional good even at the symmetric configuration, we assume that this good is differentiated too (e.g., oranges in A and apples in B). Thus, $T$ as it appears in (3.3) is now given by

$$T = \left( T_A^{\eta} + T_B^{\eta} \right)^{1/\eta},$$

where $0 < \eta < 1$. The numéraire is given by the traditional good in one of the two regions.

As shown by Fujita et al. (1999), the bifurcation diagram given in Figure 8.3 changes and is now as in Figure 8.4. To make things simple, we consider a fixed value for the transport costs of the traditional good and, as before, we concentrate on a decrease in the transport costs in the modern sector. When these costs are high, the symmetric configuration is the only equilibrium. Below some critical value, the core-periphery arises as before. However, further reductions in transport costs eventually lead to redispersion of the modern sector. Indeed, the agglomeration of the modern sector within, say, region A generates large imports of the traditional good from region B. When transport costs in the modern sector become sufficiently low, the price indices of this good are about the same in the two regions. Then, the relative price of the traditional good in A rises because its transport cost remains unchanged. This in turn lowers region B's nominal wage, which guarantees the same utility level in both regions to the skilled. When the transport costs within the modern sector decrease sufficiently, the factor price differential becomes strong enough to induce firms to move away from A to B.

[Figure 8.4. Bifurcation with positive agricultural transport costs. Horizontal axis: $\Upsilon_M$ (ticks from 1.0 to 1.8); vertical axis: $\lambda$ (ticks from 0.0 to 1.0).]

Consequently, as transport costs in the modern sector keep decreasing from high to very low values, whereas transport costs in the traditional sector remain constant, the modern sector is first dispersed, then agglomerated, and redispersed, as seen in Figure 8.4. It is worth stressing that the reasons that lead to dispersion in the first and third phases are different: in the former, the modern sector is dispersed because the cost of shipping its output is high; in the latter, dispersion arises because the periphery develops some comparative advantage in terms of labor cost.

Although transport costs of both types of goods have declined since the beginning of the Industrial Revolution, what matters for the regional distribution of economic activities is not only the absolute levels of transport costs, but also their relative values across sectors (Kilkenny, 1998). For example, if both costs decrease proportionally, it can be shown that redispersion never occurs. This is not surprising because there is no force creating wage differential any more. However, if agricultural transport costs decrease at a lower pace than those of manufacturing goods, cheaper rural labor should eventually attract industrial firms, whereas the reversal in the relationship between transport costs has the opposite impact [see Fujita et al. (1999), Section 7.4 for more details].

The pattern dispersion/agglomeration/redispersion also arises as long as we consider any ingredient giving rise to factor price differentials in favor of the periphery. For example, if we assume that the agglomeration of the modern sector in one region generates higher urban costs, such as land rent and commuting costs, a sufficiently strong decrease in transport costs between regions will foster redispersion when firms located in the core region have to pay high wages to their workers. This occurs because workers must be compensated for the high urban costs associated with a large concentration of people within the same urban area (Helpman, 1998, Tabuchi, 1998, and Ottaviano et al., 2002). Another example is when all workers are immobile, whereas agglomeration of the industrial sector may arise because of technological linkages with the intermediate sector (more on this later). In this case, wage in the core region may become so high that redispersion is profitable for firms (Krugman and Venables, 1995 and Puga, 1999).

4.2. Welfare Implications of the Core-Periphery Structure

We now wish to determine whether or not agglomeration is efficient. To this end, we assume that the planner is able (i) to assign any number of workers (or, equivalently, of firms) to a specific region and (ii) to use lump sum transfers from all workers to pay for the loss firms may incur while pricing at marginal cost. Because utilities are quasi-linear in the model of Section 3.3, a utilitarian approach may be used to evaluate the global level of welfare (Ottaviano and Thisse, 2002). Observe that no distortion arises in the total number of varieties because $M$ is determined by the factor endowment ($H$) and technology ($f$) in the modern sector and is, therefore, the same at both the equilibrium and optimum outcomes. Because the setting assumes transferable utility, the planner chooses $\lambda$ to maximize the sum of individual indirect utilities $W(\lambda)$ (for both types of workers) in which all prices have been set equal to marginal cost. It can be shown that

$$W(\lambda) = C^o \tau (\tau^o - \tau)\,\lambda(\lambda - 1) + \text{constant}, \tag{4.1}$$

where $C^o$ is a positive constant and

$$\tau^o \equiv \frac{4af}{2bf + d(H + L)}.$$
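To see how the sign of $\tau^o - \tau$ shapes the planner's problem, the following sketch evaluates (4.1) on a grid of $\lambda$ values and also compares $\tau^o$ with the market threshold $\tau^*$ of (3.26). It is a minimal illustration only: the function names are hypothetical, the parameter values are arbitrary, $C^o$ is normalized to one, and the additive constant in (4.1) is dropped because it does not affect the maximizer.

```python
def tau_o(a, b, d, f, H, L):
    """Planner's threshold transport cost defined after (4.1)."""
    return 4 * a * f / (2 * b * f + d * (H + L))


def tau_star(a, b, d, f, H, L):
    """Market threshold (3.26), repeated here so the block is self-contained."""
    num = 4 * a * f * (3 * b * f + 2 * d * H)
    den = 2 * b * f * (3 * b * f + 3 * d * H + d * L) + d ** 2 * H * (L + H)
    return num / den


def welfare_gap(lam, tau, t_o, C_o=1.0):
    """W(lambda) in (4.1) net of its constant; C_o is an arbitrary positive scale."""
    return C_o * tau * (t_o - tau) * lam * (lam - 1.0)


def planner_lambda(tau, t_o, grid=101):
    """Grid point in [0, 1] that maximizes welfare_gap (first maximizer on ties)."""
    lams = [i / (grid - 1) for i in range(grid)]
    return max(lams, key=lambda lam: welfare_gap(lam, tau, t_o))


if __name__ == "__main__":
    params = dict(a=1.0, b=1.0, d=1.0, f=1.0, H=1.0, L=10.0)  # illustrative only
    t_o, t_s = tau_o(**params), tau_star(**params)
    print(t_o, t_s)                   # 4/13 and 20/43 for these values
    print(planner_lambda(0.4, t_o))   # tau above tau_o
    print(planner_lambda(0.2, t_o))   # tau below tau_o
```

For these values a transport cost such as 0.4 lies between the two thresholds, so the planner's maximizer on the grid and the market outcome of Proposition 3.2 differ.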

The welfare function (4.1) is strictly concave in $\lambda$ if $\tau > \tau^o$ and strictly convex if $\tau < \tau^o$. Furthermore, because the coefficients of $\lambda^2$ and of $\lambda$ are the same (up to their sign), this expression always has an interior extremum at $\lambda = 1/2$. As a result, the optimal choice of the planner is determined by the sign of the coefficient of $\lambda^2$, that is, by the value of $\tau$ with respect to $\tau^o$: if $\tau > \tau^o$, the symmetric configuration is the optimum; if $\tau < \tau^o$, any agglomerated configuration is the optimum; if $\tau = \tau^o$, the welfare level is independent of the spatial configuration. In accordance with intuition, it is efficient to agglomerate the modern sector into a single region once transport costs are low, increasing returns are strong enough, and/or the output of this sector is sufficiently differentiated. On the other hand, the optimum is always dispersed when increasing returns vanish ($f = 0$) and/or when varieties are close substitutes ($d$ is large).

A simple calculation shows that $\tau^o < \tau^*$. This means that the market yields an agglomerated configuration for a range ($\tau^o < \tau < \tau^*$) of transport cost values for which it is efficient to have a dispersed pattern of activities. In contrast, when transport costs are low ($\tau < \tau^o$) or high ($\tau > \tau^*$), no regional policy is required from the efficiency point of view, although equity considerations might justify such a policy when agglomeration arises. On the contrary, for intermediate values of transport costs ($\tau^o < \tau < \tau^*$), the market provides excessive agglomeration, thus justifying the need for an active regional policy to foster the dispersion of the modern sector on both the efficiency and equity grounds.9

This discrepancy may be explained as follows. First, workers do not internalize the negative external effects they impose on the unskilled who stay put, nor do they account for the impact of their migration decisions on the residents in their region of destination. Hence, even though the skilled have individual incentives to move, these incentives do not reflect the social value of their move. This explains why equilibrium and optimum do not necessarily coincide. Second, the individual demand elasticity is much lower at the optimum (marginal cost pricing) than at the equilibrium (Nash equilibrium pricing), and thus regional price indices are less sensitive to a decrease in $\tau$. As a result, the fall in trade costs must be sufficiently large to make the agglomeration of workers socially desirable; this tells us why $\tau^o < \tau^*$.

4.3. On the Impact of Forward-Looking Behavior

In the dynamics used in Section 3, workers care only about their current utility level. This is a fairly restrictive assumption to the extent that migration decisions are typically made on the grounds of current and future utility flows and costs (such as search, mismatch, and homesickness). In addition, this approach has been criticized because it is not consistent with fully rational forward-looking behavior. It is, therefore, important to determine if and how workers' expectations about the evolution of the economy may influence the process of agglomeration. In particular, we are interested in identifying the conditions under which, when initially the two regions host different numbers of skilled workers, the common belief that these workers will eventually agglomerate in the currently smaller region can reverse the historically inherited advantage of the larger region. Formally, we want to determine the parameter conditions for which there exists an equilibrium path consistent with this belief, assuming that workers have perfect foresight (self-fulfilling prophecy).

9 Observe that the same qualitative results hold for a second-best analysis in which firms price at the Nash equilibrium while the planner controls their locations (Ottaviano and Thisse, 2002).

Somewhat different approaches have been proposed to tackle this problem, but they yield similar conclusions (Ottaviano, 1999, Baldwin, 2001, and Ottaviano et al., 2002). In what follows, we use the model of Section 3.3 because it leads to a linear dynamic system that allows for a detailed analysis of the main issues (Krugman, 1991b and Fukao and Bénabou, 1993).

Workers live indefinitely with a rate of time preference equal to $\gamma > 0$. Because we wish to focus on the sole dynamics of migration, we assume that the consumption of the numéraire is positive for each point in time so that there is no intertemporal trade in the differentiated good. For concreteness, consider the case in which workers expect agglomeration to occur in region A, whereas region B is initially larger than A. Formally, we assume that there exists $T \geq 0$ such that, given $\lambda_0 < 1/2$,

$$\begin{aligned} \dot{\lambda}(t) &> 0, & t &\in [0, T), \\ \lambda(t) &= 1, & t &\geq T. \end{aligned} \tag{4.2}$$

Because workers have perfect foresight, the easiest way to generate a non-bang-bang migration behavior is to assume that, when moving from one region to the other, workers incur a utility loss that depends on the rate of migration, perhaps because a migrant imposes a negative externality on the others. Specifically, we assume that the cost $CM(t)$ borne by a migrant at time $t$ is proportional to the corresponding migration flow:

$$CM(t) \equiv \frac{1}{\delta} \left| \frac{d\lambda(t)}{dt} \right|, \tag{4.3}$$

where $\delta$ is a positive constant whose meaning is given below. For each region $r = A, B$, let us define

$$V_r(t) \equiv \int_t^T e^{-\gamma(s-t)}\, v_r(s)\,ds + e^{-\gamma(T-t)}\, v_A(T)/\gamma, \qquad t \in [0, T), \tag{4.4}$$

where $v_r(s)$ is the instantaneous indirect utility at time $s$ in region $r$. By definition, for $r = A$, $V_A(t)$ is the discounted sum of utility flows of a worker who moves from B to A at time $t$ (i.e., today), whereas for $r = B$, $V_B(t)$ is that of a worker who currently resides in B and plans to move to A at time $T$. Because workers are free to choose when to migrate, in equilibrium they must be indifferent about the time $t$ at which they move. Hence, at any $t < T$, the following equality must hold:

$$V_A(t) - CM(t) = V_B(t) - e^{-\gamma(T-t)}\, CM(T).$$

Furthermore, because no worker residing currently in B wishes to postpone his migration time beyond $T$, it must be that $CM(T) = 0$ (Fukao and Bénabou, 1993), and thus

$$V_A(t) - CM(t) = V_B(t), \qquad t \in [0, T).$$

Using (4.2) and (4.3), we then obtain

$$\frac{d\lambda}{dt} = \delta\, \Delta V, \qquad t \in [0, T), \tag{4.5}$$

where $\Delta V \equiv V_A - V_B$, and $\delta$ can be interpreted as the speed of adjustment. This means that the private marginal cost of moving equals its private marginal benefit at any time $t < T$; of course, $\lambda(T) = 1$.


Using (4.4), we obtain the second law of motion by differentiating $V_A(t) - V_B(t)$, thus yielding

$$\frac{d\,\Delta V}{dt} = \gamma\, \Delta V - \Delta v, \qquad t \in [0, T), \tag{4.6}$$

where $\Delta v \equiv v_A - v_B$ stands for the instantaneous indirect utility differential flow given by (3.25). The expression (4.6) states that the "annuity value" of being in A rather than in B (i.e., $\gamma\, \Delta V$) equals the "dividend" ($\Delta v$) plus the "capital gain" ($d\,\Delta V/dt$). As a result, because (3.25) is linear in $\lambda$, we obtain a system of two linear differential equations instead of one.

The system (4.5) and (4.6) always has a steady state at $(\lambda, \Delta V) = (1/2, 0)$ that corresponds to the symmetric configuration. When $\tau > \tau^*$, this steady state is globally stable. So, for the assumed belief (4.2) to be consistent with equilibrium, it must be that $\tau < \tau^*$. Then, the study of the eigenvalues of the system (4.5) and (4.6) shows that two cases may arise. Indeed, substituting (3.25) into (4.6), these eigenvalues are $\left[\gamma \pm \sqrt{\gamma^2 - 4C\delta\tau(\tau^* - \tau)}\right]/2$, where $C$ is the positive constant of (3.25), so that they are real or complex according to the sign of the term under the square root. In the first case, when workers' migration costs are sufficiently large ($\delta$ is such that $\gamma > 2\sqrt{C\delta\tau(\tau^* - \tau)}$), the outcome of the migration dynamics is the same as the one described in Section 3.3. In other words, the equilibrium path is not consistent with (4.2), thus implying that expectations do not matter. By contrast, when migration costs are small enough ($\gamma < 2\sqrt{C\delta\tau(\tau^* - \tau)}$), expectations may matter. More precisely, there exist two threshold values for the transport costs $\tau_1 < \tau^*/2 < \tau_2 < \tau^*$, as well as two boundary values $\lambda_1 < 1/2 < \lambda_2 < 1$ such that the equilibrium path is consistent with (4.2) if and only if $\tau \in (\tau_1, \tau_2)$ and $\lambda_0 \in [\lambda_1, \lambda_2]$. Namely, as long as obstacles to trade take intermediate values and regions are not initially too different, the region that becomes the core is determined by workers' expectations. This is more so either the lower the migration costs or the lower the discount rate.

4.4. The Impact of a Heterogeneous Labor Force

So far, workers have been assumed to be identical in terms of preferences. Although this assumption is fairly standard in economic modeling, it seems highly implausible that potentially mobile individuals will react in the same way to some “gap” between regions. First of all, it is well known that some people show a high degree of attachment to the region in which they were born. They will stay put even though they may guarantee to themselves higher living standards in other places. In the same spirit, lifetime considerations such as marriage, divorce, and the like play an important role in the decision to migrate. Second, regions are not similar and exhibit different natural and cultural features. Clearly, people value differently local amenities, and such differences in attitudes are known to affect the migration process. These considerations are fundamental ingredients of the migration process and should be accounted for explicitly in workers’ preferences. Even though the personal motivations may be quite diverse and, therefore, difﬁcult to model at the individual level, it is possible to identify their aggregate impact on the


Fujita and Thisse

spatial distribution of economic activities using discrete choice theory, in much the same way that consumer preferences for differentiated products are modeled (Anderson et al. 1992). Specifically, we assume that the "matching" of workers with regions is expressed through the logit (McFadden 1974). This assumption turns out to be empirically relevant in migration modeling (see, e.g., Anderson and Papageorgiou 1994), and it is analytically convenient without affecting the qualitative nature of the main results. Then, the probability that a worker will choose to reside in region $r$ is given by

$$p_r(\lambda) = \frac{\exp[v_r(\lambda)/\upsilon]}{\exp[v_A(\lambda)/\upsilon] + \exp[v_B(\lambda)/\upsilon]},$$

where $\upsilon$ expresses the dispersion of individual tastes: the larger $\upsilon$, the more heterogeneous the responsiveness of workers to the living-standard difference $v(\lambda) \equiv v_A(\lambda) - v_B(\lambda)$ given by (3.25).¹⁰ When $\upsilon = 0$, the living standard response is overwhelming and workers relocate until standards of living are equal in the two regions; when $\upsilon \to \infty$, mobility responds only to amenity differentials and the probability of moving is exogenous with respect to living standards. In the present setting, it should be clear that the population of workers changes according to the following equation of motion:

$$\frac{d\lambda}{dt} = (1-\lambda)\,p_A(\lambda) - \lambda\,p_B(\lambda) = \frac{1-\lambda}{1+\exp[-v(\lambda)/\upsilon]} - \frac{\lambda}{1+\exp[v(\lambda)/\upsilon]}, \tag{4.7}$$

in which the first term on the right-hand side of (4.7) stands for the fraction of people migrating into region A, whereas the second term represents those leaving this region for region B. Using theorem 5 by Tabuchi (1986), it is then readily verified that, for sufficiently large values of $\upsilon$, there exists a unique stable equilibrium in which the manufacturing sector is equally distributed between regions. Otherwise, there exist two stable equilibria, each involving partial agglomeration of the manufacturing sector in one region, whereas dispersion arises for very low values of these costs. As expected, taste heterogeneity prevents the emergence of a fully agglomerated equilibrium and favors the dispersion of activities.¹¹
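The logit migration dynamics can be explored numerically. The sketch below is only an illustration under stated assumptions: it takes a hypothetical linear differential $v(\lambda) = k(\lambda - 1/2)$ (the text says only that (3.25) is linear in $\lambda$; the slope `k`, the step size, and the function names are not from the source) and integrates $d\lambda/dt = (1-\lambda)p_A - \lambda p_B$ with a simple Euler scheme.

```python
import math


def prob_region_a(lam, upsilon, k=1.0):
    """Logit probability of choosing region A.

    Assumes a hypothetical linear differential v(lam) = k * (lam - 0.5);
    the slope k is illustrative and is not taken from (3.25).
    """
    v = k * (lam - 0.5)
    return 1.0 / (1.0 + math.exp(-v / upsilon))


def long_run_share(lam0, upsilon, k=1.0, dt=0.05, steps=20_000):
    """Euler integration of d(lam)/dt = (1 - lam) * p_A - lam * p_B."""
    lam = lam0
    for _ in range(steps):
        p_a = prob_region_a(lam, upsilon, k)
        inflow = (1.0 - lam) * p_a      # workers moving from B into A
        outflow = lam * (1.0 - p_a)     # workers leaving A for B
        lam += dt * (inflow - outflow)
    return lam


if __name__ == "__main__":
    for upsilon in (1.0, 0.1):
        shares = [round(long_run_share(l0, upsilon), 4) for l0 in (0.4, 0.6)]
        print(f"upsilon={upsilon}: long-run shares from 0.4 and 0.6 -> {shares}")
```

Under these assumptions the symmetric share is the only rest point when taste dispersion is large, while for small dispersion the path settles at an interior share on the side of the initial advantage, that is, partial rather than full agglomeration.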

4.5. Intermediate Sector and Industrial Agglomeration

In the models described previously, agglomeration is the outcome of a circular causation process in which more workers concentrate within the same region because they love variety. However, if workers are immobile, no agglomeration can arise. Instead, each region specializes in the production of differentiated

10. Alternately, it could be evaluated at $\omega(\lambda)$, which is defined in Section 3.
11. See Tabuchi and Thisse (2002) for more details.

Agglomeration and Market Interaction


varieties on the basis of their initial endowments, and intraindustry trade occurs for all values of the transport costs. However, the agglomeration of industries is a pervasive phenomenon even when labor is sticky (e.g., between countries). Venables (1996) suggests that an alternative explanation is to account for the fact that the modern sector uses an array of differentiated intermediate goods. In this case, the agglomeration of the final sector in a particular region may occur because the concentration of the intermediate industry in that region makes the final sector more productive and vice versa. Evidence reveals, indeed, the importance of the proximity of high-quality business services for the economic success of an urban area (Kolko, 1999). Workers being immobile, we may consider a single type of labor. Because its output is taken as homogeneous, the M-sector is assumed to operate under constant returns to scale and perfect competition. The M-good is produced according to the production function

$$X_M = l^{1-\alpha} I^{\alpha}, \qquad 0 < \alpha < 1,$$

where

$$I = \left( \int_0^M [q(i)]^{\rho}\, di \right)^{1/\rho}$$

[...]

The $\theta_i$'s are assumed to be independently and identically distributed with

$$\mathrm{Prob}(\theta_i \le z) = \frac{1}{1 + \exp(-\nu z)},$$

for some $\nu > 0$. $h$ measures the preference of the average agent for one of the actions, $J$ the desire for conformity, and $\theta_i$ is a shock to the utility of taking the action $a_i = -1$. Brock and Durlauf also consider generalized versions of this model where the $\gamma_{ij}$'s vary, thus allowing each agent to have a distinct peer group.

Example 2.2. Glaeser and Scheinkman (2001). The utility functions are:

$$U^i(a_i, A_i, \theta_i, p) = -\frac{1-\beta}{2}\, a_i^2 - \frac{\beta}{2}\,(a_i - A_i)^2 + (\theta_i - p)\, a_i.$$

Here, $0 \le \beta \le 1$ measures the taste for conformity. In this case,

$$a_i = [\beta A_i + \theta_i - p]. \tag{2.6}$$

Note that, when $p = 0$, $\beta = 1$, and the $A_i$'s are the average action of all other agents, this is a version of the Brock–Durlauf model with continuous actions.

11. A related example is in Aoki (1995).

Nonmarket Interactions


Unfortunately, this case is very special. Equilibria exist only if $\sum_i \theta_i = 0$, and, in this case, a continuum of equilibria would exist. The model is, as we will show, much better behaved when $\beta < 1$. In Glaeser and Scheinkman (2001), the objective was to admit both local and global interactions in the same model to try to distinguish empirically between them. This was done by allowing for two reference groups, and setting $P_i^1 = \{1, \ldots, n\} - \{i\}$, $A_i^1$ the average action of all other agents, $P_i^2 = \{i - 1\}$ if $i > 1$, $P_1^2 = \{n\}$, and writing

$$U^i(a_i, A_i^1, a_{i-1}, \theta_i, p) = -\frac{1-\beta_1-\beta_2}{2}\, a_i^2 - \frac{\beta_1}{2}\,(a_i - A_i^1)^2 - \frac{\beta_2}{2}\,(a_i - a_{i-1})^2 + (\theta_i - p)\, a_i.$$

Example 2.3. The class of models of strategic complementarity discussed in Cooper and John (1988). Again the reference group of agent $i$ is $P_i = \{1, \ldots, n\} - \{i\}$. The set $A$ is an interval on the line and $A_i = \frac{1}{n-1}\sum_{j \ne i} a_j$. There is no heterogeneity and the utility of each agent is $U^i = U(a_i, A_i)$. Cooper and John (1988) examine symmetric equilibria. The classic production externality example fits in this framework. Each agent chooses an effort $a_i$, and the resulting output is $f(a_i, \bar a)$. Each agent consumes his per capita output and has a utility function $u(c_i, a_i)$. Write

$$U(a_i, A_i) = u\!\left( f\!\left(a_i, \frac{(n-1)A_i + a_i}{n}\right), a_i \right).$$

Example 2.4. A simple version of the model of Diamond (1982) on trading externalities. Each agent draws an $e_i$, which is his cost of production of a unit of the good. The $e_i$'s are distributed independently across agents and with a distribution $H$ and density $h > 0$, with support on a (possibly infinite) interval $[0, d]$. After a period in which the agent decides to produce or not, he is matched at random with a single other agent, and if they both have produced, they exchange the goods and each enjoys utility $u > 0$. Otherwise, if the agent has produced, he obtains utility $\theta_i \ge 0$ from the consumption of his own good. If the agent has not produced, he obtains utility 0.

We assume that all agents use a cutoff policy, a level $x_i$ such that the agent produces if and only if $e_i \le x_i$. We set $a_i = H(x_i)$, the probability that agent $i$ will produce. Here, the reference group is again all $j \ne i$, and

$$A_i = E(a_j \mid j \ne i) \equiv \frac{\sum_{j \ne i} a_j}{n-1}.$$


Glaeser and Scheinkman

Hence, if he uses policy $a_i$, an agent has an expected utility that equals

$$U^i(a_i, A_i, \theta_i) = \int_0^{H^{-1}(a_i)} [u A_i + \theta_i (1 - A_i) - e]\, h(e)\, de.$$

Optimality requires that $x_i = \min\{u A_i + \theta_i (1 - A_i), d\}$. Suppose first that $\theta_i \equiv 0$. A symmetric equilibrium ($a_i \equiv a$) will exist whenever there is a solution to the equation

$$a = H(ua). \tag{2.7}$$
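Equation (2.7) is a scalar fixed-point condition, so it can be checked on a grid. The sketch below is illustrative only: `uniform_cdf` is the uniform distribution on $[0, u]$, while `tilted_cdf` is a hypothetical alternative (not from the source) with strictly positive density on $[0, u]$, chosen so that the fixed points can be read off by hand. Both take $d = u$ and $\theta_i \equiv 0$.

```python
def uniform_cdf(e, u):
    """Uniform distribution of production costs on [0, u]."""
    return min(max(e / u, 0.0), 1.0)


def tilted_cdf(e, u):
    """Hypothetical cdf H(e) = (x + x^2) / 2 with x = e / u on [0, u].

    Its density (1/u) * (1/2 + e/u) is strictly positive on the support.
    """
    x = min(max(e / u, 0.0), 1.0)
    return 0.5 * (x + x * x)


def symmetric_equilibria(cdf, u, grid=101, tol=1e-12):
    """Grid points a in [0, 1] that satisfy a = H(u * a) up to tol."""
    points = [i / (grid - 1) for i in range(grid)]
    return [a for a in points if abs(a - cdf(u * a, u)) <= tol]


if __name__ == "__main__":
    u = 2.0
    print(len(symmetric_equilibria(uniform_cdf, u)), "grid points solve (2.7) under the uniform cdf")
    print(symmetric_equilibria(tilted_cdf, u), "solve (2.7) under the tilted cdf")
```

With the uniform distribution every grid point satisfies (2.7), whereas under the tilted distribution the condition reduces to $a = a^2$, so only isolated solutions survive.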

If $H$ is the uniform distribution in $[0, u]$, then every $a \in [0, 1]$ is a symmetric equilibrium. As we will show in Proposition 2.2, this situation is very special. For a fixed $H$, for almost every vector $\theta = (\theta_1, \ldots, \theta_n)$, (interior) equilibria are isolated.

Example 2.5. A matching example that requires multiple reference groups (Pesendorfer, 1995). In a simple version, there are two groups, leaders ($L$) and followers ($F$), with $n_L$ and $n_F$ members, respectively. An individual can use one of two kinds of clothes. Buying the first one ($a = 0$) is free; buying the second ($a = 1$) costs $p$. Agents are matched randomly to other agents using the same clothes. Suppose the utility of agent $i$, who is of type $t \in \{L, F\}$ and is matched to an agent of type $t'$, is

$$V_i(t, t', a, p, \theta_i) = u(t, t') - ap + \theta_i a,$$

where $\theta_i$ is a parameter that shifts the preferences for the second kind of clothes. Assume that

$$u(L, L) - u(L, F) > p > u(F, L) - u(F, F) > 0, \tag{2.8}$$

where we have abused notation by writing $u(L, L)$ instead of $u(t, t')$ with $t \in L$ and $t' \in L$, etc. In this example, each agent has two reference groups. If $i \in L$, then $P_i^1 = L - \{i\}$ and $P_i^2 = F$. On the other hand, if $i \in F$, then $P_i^1 = L$ and $P_i^2 = F - \{i\}$.

2.3. Equilibria with Continuous Actions

In this subsection, we derive results concerning the existence, number of equilibria, stability, and ergodicity of a basic continuous action model. We try not to rely on a specific structure of reference groups or to assume a specific weighting for each reference group. We assume that $A$ is a (possibly unbounded) interval in the real line, that each $U^i$ is at least twice continuously differentiable, and that the second partial derivative with respect to an agent's own action satisfies $U^i_{11} < 0$.¹² Each agent $i$ has a single reference group $P_i$. The choice of a single peer group for each agent and a scalar action is not crucial, but it substantially simplifies the notation.

12. As usual, this inequality can be weakened by assuming that $U^i_{11} \le 0$ and that at the optimal choice strict inequality holds.


We also assume that the optimal choices are interior, and hence, because $i \notin P_i$, the first-order condition may be written as

$$U_1^i(a_i, A_i, \theta_i, p) = 0. \tag{2.9}$$

Because $U^i_{11} < 0$, then $a_i = g^i(A_i, \theta_i, p)$ is well defined and

$$g_1^i(A_i, \theta_i, p) = -\frac{U_{12}^i(a_i, A_i, \theta_i, p)}{U_{11}^i(a_i, A_i, \theta_i, p)}. \tag{2.10}$$

We will write $G(a, \theta, p)$ for the function defined in $\mathbb{R}^n \times \Theta^n \times \Pi$ given by

$$G(a, \theta, p) = \left( g^1(A_1, \theta_1, p), \ldots, g^n(A_n, \theta_n, p) \right).$$

Recall that, for given vectors $\theta = (\theta_1, \ldots, \theta_n) \in \Theta^n$ and $p$, an equilibrium for $(\theta, p)$ is a vector $a(\theta, p) = (a_1(\theta, p), \ldots, a_n(\theta, p))$, such that, for each $i$,

$$a_i(\theta, p) = g^i(A_i(a(\theta, p)), \theta_i, p). \tag{2.11}$$

Proposition 2.1 gives conditions for the existence of an equilibrium.

Proposition 2.1. Given a pair $(\theta, p) \in \Theta^n \times \Pi$, suppose that $I$ is a closed bounded interval such that, for each $i$, $g^i(A_i, \theta_i, p) \in I$, whenever $A_i \in I$. Then, there exists at least one equilibrium $a(\theta, p) \in I^n$. In particular, an equilibrium exists if there exists an $m \in \mathbb{R}$, with $[-m, m] \subset A^{\circ}$, and such that, for any $i$ and $A_i \in [-m, m]$, $U_1^i(-m, A_i, \theta_i, p) \ge 0$, and $U_1^i(m, A_i, \theta_i, p) \le 0$.

Proof. If $a \in I^n$, because $A_i$ is a convex combination of the entries of $a$, $A_i \in I$. Because $g^i(A_i, \theta_i, p) \in I$, whenever $A_i \in I$, the (continuous) function $G(\cdot, \theta, p)$ maps $I^n$ into $I^n$, and therefore must have at least one fixed point. The second part of the proposition follows because $U^i_{11} < 0$ implies that $g^i(A_i, \theta_i, p) \in [-m, m]$, whenever $A_i \in [-m, m]$. QED

Proposition 2.1 gives us sufficient conditions for the existence of an equilibrium for a given $(\theta, p)$. The typical model, however, describes a process for generating the $\theta_i$'s in the cross-section. In this case, not all pairs $(\theta, p)$ are equally interesting. The process generating the $\theta_i$'s will impose a distribution on the vector $\theta$, and we need only to check the assumptions of Proposition 2.1 on a set of $\theta$'s that has probability one. For a fixed $p$, we define an invariant interval $I$ as any interval such that there exists a subset of $\Theta^n$ of probability one such that, for each $i$ and for all $\theta$ in that subset, $g^i(A_i, \theta_i, p) \in I$, whenever $A_i \in I$. If multiple disjoint compact invariant intervals exist, multiple equilibria prevail with probability one.

It is relatively straightforward to construct models with multiple equilibria that are perturbations of models without heterogeneity.¹³ Suppose that $\Theta$ is an

13. A model without heterogeneity is one where all utility functions $U^i$ and shocks $\theta_i$ are identical. We choose the normalization $\theta_i \equiv 0$. We will consider perturbations in which the utility functions are still uniform across agents, but the $\theta_i$ can differ across agents.


interval containing 0 and that $g(A, \theta)$ is a smooth function that is increasing in both coordinates. The assumption that $g$ is increasing in $\theta$ is only a normalization. In contrast, the assumption that $g$ is increasing in $A$ is equivalent to $U_{12} > 0$ (i.e., an increase in the average action by the members of his reference group increases the marginal utility of an agent's own action). This assumption was called strategic complementarity in Bulow, Geanakoplos, and Klemperer (1985). Let $x$ be a stable fixed point of $g(\cdot, 0)$ [i.e., $g(x, 0) = x$ and $g_1(x, 0) < 1$]. If the interval $\Theta$ is small enough, there exists an invariant interval containing $x$. In particular, if a model without heterogeneity has multiple stable equilibria, the model with small noise, that is, where $\theta_i \in \Theta$, a small interval, will also have multiple equilibria. The condition on invariance must hold for almost all $\theta \in \Theta$. In particular, if we have multiple disjoint invariant intervals and we shrink $\Theta$, we must still have multiple disjoint invariant intervals. On the other hand, if we expand $\Theta$, we may lose a particular invariant interval, and multiple equilibria are no longer assured. An implication of this reasoning is that when individuals are sorted into groups according to their $\theta$'s, and agents do not interact across groups, then multiple equilibria are more likely to prevail. In Section 2.5, we discuss a model where agents sort on their $\theta$'s.

In this literature, strategic complementarity is the usual way to deliver the existence of multiple equilibria. The next example shows that, in contrast to the results of Cooper and John (1988), in our model, because we consider a richer structure of reference groups, strategic complementarity is not necessary for multiple equilibria.

Example 2.6. This is an example to show that, in contrast to the case of purely global interactions, strategic complementarity is not a necessary condition for multiple equilibria.

There are two sets of agents, $S_1$ and $S_2$, with $n$ agents in each set. For agents of a given set, the reference group consists of all the agents of the other set. If $i \in S_k$,

$$A_i = \frac{1}{n} \sum_{j \in S_\ell} a_j, \qquad \ell \ne k.$$

There are two goods, and the relative price is normalized to one. Each agent has an initial income of one unit, and his objective is to maximize

$$U^i(a_i, A_i) = \log a_i + \log(1 - a_i) + \frac{\lambda}{2}\,(a_i - A_i)^2. \tag{2.12}$$

Only the first good exhibits social interactions, and agents of each set want to differentiate from the agents of the other set. Provided $\lambda < 8$, $U^i_{11} < 0$. However, there is no strategic complementarity: an increase in the action of others (weakly) decreases the marginal utility of an agent's own action. We will look for equilibria with $a_i$ constant within each set. An equilibrium of this type is described by a pair $x, y$ of actions for each set of agents. In equilibrium we must have:

$$1 - 2x + \lambda x(1 - x)(x - y) = 0, \tag{2.13}$$

$$1 - 2y + \lambda y(1 - y)(y - x) = 0. \tag{2.14}$$
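The system (2.13)–(2.14) is easy to check numerically. The following sketch (function names are illustrative, not from the source) evaluates the two left-hand sides, the own second derivative implied by (2.12), and the value of $\lambda$ at which a given asymmetric pair solves (2.13).

```python
def foc_residuals(x, y, lam):
    """Left-hand sides of equations (2.13) and (2.14)."""
    r_x = 1 - 2 * x + lam * x * (1 - x) * (x - y)
    r_y = 1 - 2 * y + lam * y * (1 - y) * (y - x)
    return r_x, r_y


def own_second_derivative(a, lam):
    """U_11 implied by (2.12): -1/a^2 - 1/(1-a)^2 + lam."""
    return -1 / a**2 - 1 / (1 - a) ** 2 + lam


def lambda_supporting(x, y):
    """Value of lam at which (x, y) solves (2.13); requires x != y."""
    return (2 * x - 1) / (x * (1 - x) * (x - y))


if __name__ == "__main__":
    lam = lambda_supporting(0.55, 0.45)
    print("lambda:", lam)
    print("residuals at (0.55, 0.45):", foc_residuals(0.55, 0.45, lam))
    print("residuals at (0.5, 0.5):", foc_residuals(0.5, 0.5, lam))
    print("U_11 at a = 0.55:", own_second_derivative(0.55, lam))
```

Because the two equations are mirror images of each other, any asymmetric solution $(x, y)$ immediately gives a second one, $(y, x)$.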

Clearly $x = y = 1/2$ is always an equilibrium. It is the unique equilibrium that is symmetric across groups. Provided $\lambda < 4$, the Jacobian associated with equations (2.13) and (2.14) is positive, which is compatible with uniqueness even if we consider asymmetric equilibria. However, whenever $\lambda > 4$, the Jacobian becomes negative and other equilibria must appear. For instance, if $\lambda = 4.04040404$, $x = .55$ and $y = .45$ is an equilibrium, and consequently so is $x = .45$ and $y = .55$. Hence, at least three equilibria are obtained, without strategic complementarity.

Proposition 2.1 gives existence conditions that are independent of the structure of the reference groups and the weights $\gamma_{ij}$. Also, the existence of multiple invariant intervals is independent of the structure of interactions embedded in the $P_i$'s and $\gamma_{ij}$'s, and is simply a result of the choice of an individual's action, given the "average action" of his reference group, the distribution of his taste shock, and the value of the exogenous parameter $p$.

In some social interaction models, such as the Diamond search model (Example 2.4), there may exist a continuum of equilibria. The next proposition shows that these situations are exceptional.

Proposition 2.2. Suppose $\Theta$ is an open subset of $\mathbb{R}^k$ and that there exists a coordinate $j$ such that $\partial U_1^i / \partial \theta_i^j \ne 0$; that is, $\theta_i^j$ has an effect on the marginal utility of the action. Then, for each fixed $p$, except for a subset of $\Theta^n$ of Lebesgue measure zero, the equilibria are isolated. In particular, if the $\theta_i^j$'s are independently distributed with marginals that have a density with respect to the Lebesgue measure, then, for each fixed $p$, except for a subset of $\Theta^n$ of zero probability, the equilibria are isolated.

Proof. For any $p$, consider the map $F(a, \theta) = a - G(a, \theta, p)$. The matrix of partial derivatives of $F$ with respect to $\theta^j$ is a diagonal matrix with entry $d_{ii} \ne 0$, because $\partial U_1^i / \partial \theta_i^j \ne 0$. Hence, for each fixed $p$, $DF$ has rank $n$, and it is a consequence of Sard's theorem (see, e.g., Mas-Colell 1985, p. 320) that, except perhaps for a subset of $\Theta^n$ of Lebesgue measure zero, $F_1$ has rank $n$. The implicit function theorem yields the result. QED

Consider again the search model discussed in Example 2.4. Suppose that $u \le d$ and that each $\theta_i$ is in an open interval contained in $(0, d)$. Then, at any interior equilibrium, the assumptions of the Proposition are satisfied. This justifies our earlier claim that the continuum of equilibria that exists when $\theta_i \equiv 0$ is exceptional. In the model discussed in Example 2.2, if $p = 0$, $\beta = 1$, and the reference group of each agent is made up of all other agents (with equal weights), then if $\sum_i \theta_i \ne 0$, there are no equilibria, whereas if $\sum_i \theta_i = 0$, there is a continuum. Again, the continuum of equilibria is exceptional. However, if $\beta < 1$, there is a unique equilibrium for any vector $\theta$. This situation is less discontinuous than it seems. In equilibrium,

$$\frac{\sum_i a_i}{n} = \frac{1}{1 - \beta}\, \frac{\sum_i \theta_i}{n}.$$

Hence, if we fix $\sum_i \theta_i \ne 0$ and drive $\beta$ to 1, the average action becomes unbounded. Although Proposition 2.2 is stated using the $\theta_i$'s as parameters, it is also true that isolated equilibria become generic if there is heterogeneity across individuals' utility functions.

One occasionally proclaimed virtue of social interaction models is that they create the possibility that multiple equilibria might exist. Proposition 2.1 gives us sufficient conditions for there to be multiple equilibria in social interactions models. One way to ensure uniqueness in this context is to place a bound on the effect of social interactions. We will say that MSI prevails if the marginal utility of an agent's own action is more affected (in absolute value) by a change in his own action than by a change in the average action of his peers. More precisely, we say that MSI prevails if

$$\left| \frac{U_{12}^i(a_i, A_i, \theta_i, p)}{U_{11}^i(a_i, A_i, \theta_i, p)} \right| < 1. \tag{2.15}$$
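For the quadratic utility of Example 2.2, $U_{11} = -1$ and $U_{12} = \beta$, so the ratio in (2.15) is simply $\beta$ and MSI holds exactly when $\beta < 1$. The sketch below is a minimal illustration, assuming equal weights on all other agents; the function names and the closed-form rearrangement are not from the source. It computes the equilibrium of $a_i = \beta A_i + \theta_i - p$ and lets one verify the average-action formula discussed above.

```python
def equilibrium(theta, beta, p=0.0):
    """Equilibrium of a_i = beta * A_i + theta_i - p with A_i the mean of the others.

    Summing the best responses gives sum(a) = sum(theta_i - p) / (1 - beta);
    substituting back yields each a_i. Requires beta < 1.
    """
    n = len(theta)
    total = sum(t - p for t in theta) / (1.0 - beta)
    w = beta / (n - 1)
    return [(w * total + t - p) / (1.0 + w) for t in theta]


def best_response_gap(a, theta, beta, p=0.0):
    """Largest deviation of any a_i from its best response to the others."""
    n, s = len(a), sum(a)
    return max(
        abs(ai - (beta * (s - ai) / (n - 1) + ti - p))
        for ai, ti in zip(a, theta)
    )


if __name__ == "__main__":
    theta = [0.3, -0.1, 0.8, 0.5]
    for beta in (0.6, 0.9, 0.99):
        a = equilibrium(theta, beta)
        print(beta, sum(a) / len(a), best_response_gap(a, theta, beta))
```

As $\beta$ approaches 1 with $\sum_i \theta_i \ne 0$, the printed average grows without bound, which is the sense in which the $\beta = 1$ case is a limit of well-behaved models.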

From equation (2.10), the MSI condition implies

$$|g_1^i(A_i, \theta_i, p)| < 1. \tag{2.16}$$

This last condition is, in fact, weaker than inequality (2.15), because it is equivalent to inequality (2.15) when $a_i$ is optimal, given $(A_i, \theta_i, p)$. We use only inequality (2.16), and therefore we will refer to this term as the MSI condition. The next proposition shows that, if the MSI condition holds, there will be at most one equilibrium.¹⁴

Proposition 2.3. If for a fixed $(\theta, p)$, MSI holds [that is, inequality (2.16) is verified for all $i$], then there exists at most one equilibrium $a(\theta, p)$.

Proof. The matrix of partial derivatives of $G$ with respect to $a$, which we denote by $G_1(a, \theta, p)$, has diagonal elements equal to 0 and, using Equation (2.10), off-diagonal elements $d_{ij} = g_1^i(A_i, \theta_i, p)\,\gamma_{ij}$. Also, for each $i$,

$$\sum_{j \ne i} |d_{ij}| = |g_1^i(A_i, \theta_i, p)| \sum_{j \ne i} \gamma_{ij} = |g_1^i(A_i, \theta_i, p)| < 1.$$

It follows from the mean-value theorem that, for each $(\theta, p)$, $G(a, \theta, p) = a$ has a unique solution. QED

14. Cooper and John (1988) had already remarked that an analogous condition is sufficient for uniqueness in the context of their model.

To guarantee that uniqueness always prevails, MSI should hold for all $(\theta, p) \in \Theta^n \times \Pi$. The assumption in Proposition 2.3 is independent of the structure of interactions embedded in the $P_i$'s and the $\gamma_{ij}$'s. An example where MSI is satisfied is when $U(a_i, A_i, \theta_i, p) = u(a_i, \theta_i, p) + w(a_i - A_i, p)$, where $u_{11} < 0$, and, for each $p$, $w(\cdot, p)$ is concave.

If, in addition to MSI, we assume strategic complementarity ($U_{12} > 0$), we can derive stronger results. Suppose $p$ has a component, say $p^1$, such that each $g^i$ has a positive partial derivative with respect to $p^1$. In equilibrium, we have, writing $F_1 = I - G_1$,

$$\frac{\partial a}{\partial p^1} = (F_1)^{-1}(a, \theta, p) \left( \frac{\partial g^1}{\partial p^1}, \ldots, \frac{\partial g^n}{\partial p^1} \right). \tag{2.17}$$

Because $F_1$ has a dominant diagonal that is equal to one, we may use the Neumann expansion to write

$$(F_1)^{-1} = I + (I - F_1) + (I - F_1)^2 + \cdots. \tag{2.18}$$

Recall that all diagonal elements of $(I - F_1)$ are zero and that the off-diagonal elements are $g_1^i(A_i, \theta_i, p)\,\gamma_{ij} > 0$. Hence, each of the terms in this infinite series is a matrix with nonnegative entries, and

$$\frac{\partial a}{\partial p^1} = (I + H) \left( \frac{\partial g^1}{\partial p^1}, \ldots, \frac{\partial g^n}{\partial p^1} \right), \tag{2.19}$$

where $H$ is a matrix with nonnegative elements. The nonnegativity of the matrix $H$ means that there is a social multiplier (as in Becker and Murphy 2000).¹⁵ An increase in $p^1$, holding all $a_j$'s, $j \ne i$, constant, leads to a change

$$da_i = \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1}\, dp^1,$$

whereas, in equilibrium, that change equals

$$\left[ \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1} + \sum_j H_{ij}\, \frac{\partial g^j(A_j, \theta_j, p)}{\partial p^1} \right] dp^1.$$

The effect of a change in $p^1$ on the average $\bar A \equiv \sum_i a_i / n$ is, in turn,

$$d\bar A = \frac{1}{n} \left[ \sum_i \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1} + \sum_{i,j} H_{ij}\, \frac{\partial g^j(A_j, \theta_j, p)}{\partial p^1} \right] dp^1.$$

15. Cooper and John (1988) define a similar multiplier by considering symmetric equilibria of a game.
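The Neumann expansion (2.18) also gives a direct way to compute the social multiplier. The sketch below is an illustration under stated assumptions: it uses the linear model of Example 2.2 with equal weights $1/(n-1)$, so that $G_1$ has off-diagonal entries $\beta/(n-1)$, and it takes a parameter whose direct effect on every agent's action is one (for instance, a common upward shift in all $\theta_i$). The function names are hypothetical.

```python
def equilibrium_response(G1, direct, terms=200):
    """Approximate (I - G1)^{-1} applied to `direct` by the truncated series (2.18).

    The truncation is accurate when the absolute row sums of G1 are below one,
    which is what the MSI condition delivers.
    """
    n = len(direct)
    term = list(direct)
    total = list(direct)
    for _ in range(terms):
        term = [sum(G1[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [s + t for s, t in zip(total, term)]
    return total


def global_G1(n, beta):
    """G1 for a linear model with slope beta and equal weights on all other agents."""
    return [[0.0 if i == j else beta / (n - 1) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for beta in (0.5, 0.9):
        response = equilibrium_response(global_G1(5, beta), [1.0] * 5)
        print(beta, response[0], 1.0 / (1.0 - beta))
```

In this special case every row of $I + H$ sums to $1/(1-\beta)$, so the equilibrium response exceeds the direct response by that factor and becomes arbitrarily large as $\beta$ approaches one.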

This same multiplier also impacts the effect of the shocks $\theta_i$. Differences in the sample realizations of the $\theta_i$'s are amplified through the social multiplier effect. The size of the social multiplier depends on the value of $g_1^i \equiv \partial g^i / \partial A_i$. If these numbers are bounded away from one, one can bound the social multiplier. However, as these numbers approach unity, the social multiplier effect gets arbitrarily large. In this case, two populations with slightly distinct realizations of the $\theta_i$'s could exhibit very different average values of the actions. In the presence of unobserved heterogeneity, it may be impossible to distinguish between a large multiplier (that is, $g_1$ is near unity) and multiple equilibria.

Propositions 2.1 and 2.3 give us conditions for multiplicity or uniqueness. At this level of generality, it is impossible to refine these conditions. It is easy to construct examples where $g_1 > 1$ in some range, but still only one equilibrium exists.

One common way to introduce ad hoc dynamics in social interaction models is to simply assume that, in period $t$, each agent chooses his action based on the choices of the agents in his reference group at time $t - 1$.¹⁶ Such processes are not guaranteed to converge, but the next proposition shows that when MSI prevails, convergence occurs. Let $a^t(\theta, p, a^0)$ be the solution to the difference equation $a^{t+1} = G(a^t, \theta, p)$, with initial value $a^0$.

Proposition 2.4. If, for a fixed $(\theta, p)$, $|g_1^i(\cdot, \theta_i, p)| < 1$, for all $i$, then

$$\lim_{t \to \infty} a^t(\theta, p, a^0) = a(\theta, p).$$
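The difference equation $a^{t+1} = G(a^t, \theta, p)$ is straightforward to simulate. The sketch below uses a hypothetical best response $g(A, \theta) = \tanh(\beta A + \theta)$ with equal weights on all other agents; this functional form is an assumption chosen only because $|g_1| \le \beta < 1$, and it is not a model from the text.

```python
import math
import random


def step(a, theta, beta):
    """One synchronous update a^{t+1} = G(a^t), with A_i the mean of the others."""
    n, s = len(a), sum(a)
    return [
        math.tanh(beta * (s - ai) / (n - 1) + ti)
        for ai, ti in zip(a, theta)
    ]


def iterate(a0, theta, beta, tol=1e-12, max_iter=10_000):
    """Iterate until successive action vectors differ by less than tol."""
    a = list(a0)
    for t in range(max_iter):
        nxt = step(a, theta, beta)
        if max(abs(x - y) for x, y in zip(nxt, a)) < tol:
            return nxt, t + 1
        a = nxt
    raise RuntimeError("no convergence within max_iter")


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [rng.uniform(-1.0, 1.0) for _ in range(20)]
    low, n_low = iterate([-1.0] * 20, theta, beta=0.8)
    high, n_high = iterate([1.0] * 20, theta, beta=0.8)
    print("iterations:", n_low, n_high)
    print("largest gap between the two limits:",
          max(abs(x - y) for x, y in zip(low, high)))
```

Starting from the lowest and the highest feasible action profiles and reaching the same limit is a simple numerical check of both uniqueness (Proposition 2.3) and convergence (Proposition 2.4) in this example.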

Proof. For any matrix $M$, let $\|M\| = \max_i \sum_j |M_{ij}|$ be the matrix norm. Then,

$$\max_i |a_i^{t+1} - a_i(\theta, p)| \le \sup_y \|G_1(y, \theta, p)\| \left( \max_i |a_i^t - a_i(\theta, p)| \right) \le \max_i |a_i^t - a_i(\theta, p)|.$$

Hence, the vectors $a^t$ stay in a bounded set $B$ and, by assumption, $\sup_{y \in B} \|G_1(y, \theta, p)\| < 1$. Hence, $\lim_{t \to \infty} a^t(\theta, p, a^0) = a(\theta, p)$. QED

16. In social interaction models, ad hoc dynamics is frequently used to select among equilibria as in Young (1993, 1998) or Blume and Durlauf (1998).

One intriguing feature of social interaction models is that, in some of these models, individual shocks can determine aggregate outcomes for large groups.


In contrast to the results presented earlier, which are independent of the particular interaction structure, ergodicity depends on a more detailed description of the interactions. For instance, consider the model in Example 2.2 with $p = 0$, the $\theta_i$'s iid, $P_1 = \emptyset$, and $P_i = \{1\}$ for each $i > 1$. That is, agent 1 is a "leader" that is followed by everyone. Then, $a_1 = \theta_1$ and $a_i = \theta_i + \beta a_1$. Hence, the average action, even as $n \to \infty$, depends on the realization of $\theta_1$, even though the assumption of Proposition 2.3 holds. Our next proposition shows that, when MSI holds, shocks are iid, and individuals' utility functions depend only on their own actions and the average action of their peer group, then, under mild technical conditions, the average action of a large population is independent of the particular realization of the shocks.

Proposition 2.5. Suppose that

1. $\theta_i$ is identically and independently distributed.
2. $U^i$ (and hence $g^i$) is independent of $i$.
3. $P_i = \{1, \ldots, i - 1, i + 1, \ldots, n\}$.
4. $\gamma_{i,j} \equiv 1/(n - 1)$.
5. $A$ is bounded.
6. MSI holds uniformly, that is, $\sup_{A_i, \theta_i} |g_1(A_i, \theta_i, p)| < 1$.

Let $a^n(\theta, p)$ denote the equilibrium when $n$ agents are present and agent $i$ receives shock $\theta_i$. Then, there exists an $\bar A(p)$ such that, with probability one,

$$\lim_{n \to \infty} \sum_{i=1}^{n} \frac{a_i^n(\theta, p)}{n} = \bar A(p). \tag{2.20}$$
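The contrast between the "leader" structure and global interactions can be seen in a small simulation. The sketch below is illustrative only: it uses the linear model of Example 2.2 with $p = 0$ and shocks drawn uniformly on $[-1, 1]$ (a bounded distribution chosen for convenience, not taken from the text), and the function names are hypothetical.

```python
import random


def draw_shocks(n, seed):
    """iid shocks, uniform on [-1, 1]; an illustrative choice of distribution."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def global_average(theta, beta):
    """Average equilibrium action when everyone's peer group is all other agents."""
    return sum(theta) / len(theta) / (1.0 - beta)


def leader_average(theta, beta):
    """Average action when agent 1 has no peers and everyone else follows agent 1."""
    a1 = theta[0]
    actions = [a1] + [t + beta * a1 for t in theta[1:]]
    return sum(actions) / len(actions)


if __name__ == "__main__":
    beta, n = 0.5, 20_000
    for seed in range(5):
        theta = draw_shocks(n, seed)
        print(
            f"seed={seed} theta_1={theta[0]:+.3f} "
            f"global={global_average(theta, beta):+.3f} "
            f"leader={leader_average(theta, beta):+.3f}"
        )
```

With global interactions the average action settles near the same value for every draw, whereas in the leader structure it tracks $\beta\theta_1$ no matter how large the population is.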

Proof. We omit the argument $p$ from the proof. Let $A^n(\theta) = \sum_{i=1}^{n} a_i^n(\theta)/n$. The boundedness of $A$ ensures that there are convergent subsequences $A^{n_k}(\theta)$. Suppose the limit of one such convergent subsequence is $A(\theta)$. Note that $|A_i^{n_k}(\theta) - A^{n_k}(\theta)| \le b/n_k$, for some constant $b$. Hence, for any $\varepsilon > 0$, we can find $K$ such that if $k \ge K$,

$$\left| \sum_{i=1}^{n_k} \frac{a_i^{n_k}(\theta)}{n_k} - \sum_{i=1}^{n_k} \frac{g(A(\theta), \theta_i)}{n_k} \right| = \left| \sum_{i=1}^{n_k} \frac{g(A_i^{n_k}, \theta_i)}{n_k} - \sum_{i=1}^{n_k} \frac{g(A(\theta), \theta_i)}{n_k} \right| \le \varepsilon. \tag{2.21}$$

Furthermore, because the $\theta_i$ are iid and $g_1$ is uniformly bounded, there exists a set of probability one that can be chosen independent of $A$, such that,

$$\sum_{i=1}^{n} \frac{g(A, \theta_i)}{n} \to \int g(A, y)\, dF(y),$$

where $F$ is the distribution of each $\theta_i$. Hence, given any $\varepsilon > 0$, if $k$ is sufficiently large,

$$\left| A^{n_k}(\theta) - \int g(A(\theta), y)\, dF(y) \right| \le \varepsilon,$$

or

$$A(\theta) = \int g(A(\theta), y)\, dF(y).$$

Condition 6 in the hypothesis of the proposition guarantees that $g(\cdot, \theta_i)$ is a contraction and, as a consequence, this last equation has at most one solution, $\bar A$. In particular, all convergent subsequences of the bounded sequence $A^n(\theta)$ converge to $\bar A$ and, hence, $A^n(\theta) \to \bar A$. QED

The assumptions in the proposition are sufficient, but not necessary, for ergodicity. In general, models in which shocks are i.i.d. and interactions are local tend to display ergodic behavior.

2.4. "Mean Field" Models with Large Populations and Discrete Actions

In this subsection, we will examine models with discrete action spaces (actually two possible actions), in which the utility function of the agents depends on their own action and the average action taken by the population. Much of our framework and results are inspired by the treatment by Brock and Durlauf (1995) of Example 2.1 described previously. The action space of individuals is $\{0, 1\}$. As in Brock and Durlauf, we will assume that $U^i = U(a_i, A, p) + (1 - a_i)\theta_i$; that is, the shock $\theta_i$ is the extra utility an agent obtains from taking action 0. We will assume that $U(a_i, \cdot, \cdot)$ is smooth and that the $\theta_i$'s are iid with a cdf $F$ with continuous density $f$. Agents do not internalize the effect that their action has on the average action. We also assume strategic complementarity, which in this context we take to be $U_2(1, A, p) - U_2(0, A, p) > 0$; that is, an increase in the average action increases the difference in utility between action 1 and action 0. Given $A$, agent $i$ will take action 1 if, and only if, $\theta_i \le U(1, A, p) - U(0, A, p)$. In a large population, a fraction $F(U(1, A, p) - U(0, A, p))$ will take action 1; the remainder will take action 0. A mean-field equilibrium, hereafter MFE, is an average action $\bar A$ such that

$$F(U(1, \bar A, p) - U(0, \bar A, p)) - \bar A = 0. \tag{2.22}$$

This deﬁnition of an MFE is exactly as in the Brock and Durlauf treatment of Example 2.1. The next proposition corresponds to their results concerning equilibria in that example.
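Condition (2.22) can be solved numerically once functional forms are chosen. The sketch below is a minimal illustration under assumptions that are not in the text: the utility difference is taken to be the hypothetical unbiased form $U(1, A, p) - U(0, A, p) = J(2A - 1)$, and $F$ is a logistic cdf with scale `sigma`. Roots are located by scanning for sign changes and refining with bisection.

```python
import math


def excess(A, J, sigma):
    """H(A) = F(U(1, A) - U(0, A)) - A for a logistic F with scale sigma.

    Assumes the hypothetical utility difference J * (2A - 1).
    """
    return 1.0 / (1.0 + math.exp(-J * (2.0 * A - 1.0) / sigma)) - A


def mean_field_equilibria(J, sigma, cells=999, iters=80):
    """Roots of H on [0, 1], found by a sign-change scan followed by bisection."""
    roots = []
    for i in range(cells):
        lo, hi = i / cells, (i + 1) / cells
        if excess(lo, J, sigma) * excess(hi, J, sigma) > 0:
            continue  # no sign change in this cell
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if excess(lo, J, sigma) * excess(mid, J, sigma) <= 0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    return roots


if __name__ == "__main__":
    for sigma in (1.0, 0.25):
        print(sigma, [round(r, 4) for r in mean_field_equilibria(1.0, sigma)])
```

In this parameterization the density at zero is $1/(4\sigma)$ and the marginal impact of the average action is $2J$, so the product that governs multiplicity at $A = 1/2$ is $J/(2\sigma)$: a more dispersed population (larger `sigma`) leaves a single equilibrium, while a more homogeneous one produces three.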


Proposition 2.6. An MFE always exists. If $0 < \bar A < 1$ is an equilibrium where

$$f(U(1, \bar A, p) - U(0, \bar A, p))\,[U_2(1, \bar A, p) - U_2(0, \bar A, p)] > 1, \tag{2.23}$$

then there are also at least two other MFE's, one on each side of $\bar A$. On the other hand, if, at every MFE, $f(U(1, \bar A, p) - U(0, \bar A, p))\,[U_2(1, \bar A, p) - U_2(0, \bar A, p)] < 1$, there exists a single MFE.

Proof. $H(A) = F(U(1, A, p) - U(0, A, p)) - A$ satisfies $H(0) \ge 0$ and $H(1) \le 0$ and is continuous. If inequality (2.23) holds, then $H(\bar A) = 0$ and $H'(\bar A) > 0$. QED

The first term on the left-hand side of inequality (2.23) is the density of agents that are indifferent between the two actions, when the average action is $\bar A$. The second term is the marginal impact of the average action on the preference for action 1 over action 0, which, by our assumption of strategic complementarity, is always $> 0$. This second term corresponds exactly to the intensity of social influence that played a pivotal role in determining the uniqueness of equilibrium in the model with a continuum of actions.

If there is a unique equilibrium,¹⁷ then

$$\frac{\partial \bar A}{\partial p} = \frac{f(U(1, \bar A, p) - U(0, \bar A, p))\,[U_3(1, \bar A, p) - U_3(0, \bar A, p)]}{1 - f(U(1, \bar A, p) - U(0, \bar A, p))\,[U_2(1, \bar A, p) - U_2(0, \bar A, p)]}. \tag{2.24}$$

The numerator in this expression is exactly the average change in action, when $p$ changes, and agents consider that the average action remains constant. The denominator is, if uniqueness prevails, positive. As we emphasized in the model with continuous actions, there is a continuity in the multiplier effect. As the parameters of the model ($U$ and $F$) approach the region of multiple equilibria, the effect of a change in $p$ on the equilibrium average action approaches infinity.

In many examples, the distribution $F$ satisfies:

1. Symmetry ($f(z) = h(|z|)$)
2. Monotonicity ($h$ is decreasing)

If, in addition, the model is unbiased [$U(1, 1/2, p) = U(0, 1/2, p)$], then $A = 1/2$ is an MFE. The fulfillment of inequality (2.23) now depends on the value of $f(0)$. This illustrates the role of homogeneity of the population in producing multiple equilibria. If we consider a parameterized family of models in which the random variable $\theta_i = \sigma x_i$, where $\sigma > 0$, then $f_\sigma(0) = (1/\sigma) f_1(0)$. As $\sigma \to 0$ ($\sigma \to \infty$), inequality (2.23) must hold (resp. must reverse). In particular, if the

17. Here and in what follows, we require strict uniqueness; that is, the left-hand side of inequality (2.23) is less than one.


population is homogeneous enough, multiple equilibria must prevail in the unbiased case. These reasonings can be extended to biased models, if we assume that $[U_2(1, \cdot, p) - U_2(0, \cdot, p)]$ is bounded and bounded away from zero, and that the density $f_1$ is continuous and positive.¹⁸ For, in this case, for $\sigma$ large,

$$\sup_A \left\{ f_\sigma(U(1, A, p) - U(0, A, p))\,[U_2(1, A, p) - U_2(0, A, p)] \right\} < 1. \tag{2.25}$$

Hence, equilibrium will be unique, if the population displays sufficient heterogeneity. On the other hand, as $\sigma \to 0$, inequality (2.25) is reversed and multiple equilibria appear. We can derive more detailed properties if we assume, in addition to the symmetry and monotonicity properties of $f$, that $U_{22}(1, A, p) - U_{22}(0, A, p) \le 0$; that is, the average action $A$ has a diminishing marginal impact on the preference for the high action. In that case, it is easy to show that there are at most three equilibria.

2.5. Choice of Peer Group

The mathematical structure and the empirical description of peer or reference groups vary from model to model. In several models (e.g., Benabou, 1993, Glaeser, Sacerdote, and Scheinkman, 1996, Gabszewicz and Thisse, 1996, or Mobius, 1999), the reference group is formed by geographical neighbors. To obtain more precise results, one must further specify the mathematical structure of the peer group relationship, typically assuming either that all fellow members of a given geographical unit form a reference group or that each agent's reference group is formed by a set of near-neighbors. Mobius (1999) shows that, in the context that generalizes Schelling's (1972) tipping model, the persistence of segregation depends on the particular form of the near-neighbor relationship. Glaeser, Sacerdote, and Scheinkman (1996) show that the variance of crime rates across neighborhoods or cities would be a function of the form of the near-neighbor relationship. Kirman (1983), Kirman, Oddou, and Weber (1986), and Ioannides (1990) use random graph theory to treat the peer group relationship as random. This approach is particularly useful in deriving properties of the probable peer groups as a function of the original probability of connections. Another literature deals with individual incentives for the formation of networks (e.g., Boorman, 1975, Jackson and Wolinsky, 1996, and Bala and Goyal 2000).¹⁹

18. An example that satisfies these conditions is the model of Brock and Durlauf described in Example 2.1. Brock and Durlauf use a slightly different state space, but once the proper translations are made, $U_2(1, A, p) - U_2(0, A, p) = kJ$ for a positive constant $k$ and $0 < f_1(z) \le \nu$.
19. A related problem is the formation of coalitions in games (e.g., Myerson, 1991).


One way to model peer group choice is to consider a set of neighborhoods indexed by $\ell = 1, \ldots, m$, each with $n_\ell$ slots, with $\sum_\ell n_\ell \ge n$.20 Every agent chooses a neighborhood to join after the realization of the $\theta_i$'s. To join neighborhood $\ell$, one must pay $q_\ell$. The peer group of agent $i$, if he joins neighborhood $\ell$, consists of all other agents $j$ that joined $\ell$, with $\gamma_{ij} = \gamma_{ij'}$ for all peers $j$ and $j'$. We will denote by $A_\ell$ the average action taken by all agents in neighborhood $\ell$. Our equilibrium notion, in this case, will parallel Tiebout's equilibrium (see, e.g., Bewley 1981). For given vectors $\theta = (\theta_1, \ldots, \theta_n) \in \Theta^n$ and $p$, an equilibrium will be a set of prices $(q_1, \ldots, q_m)$, an assignment of agents to neighborhoods, and a vector of actions $a = (a_1, \ldots, a_n)$ that is an equilibrium given the peer groups implied by the assignment, such that, if agent $i$ is assigned to neighborhood $\ell$, there is no neighborhood $\ell'$ such that

$$\sup_{a_i} U^i(a_i, A_{\ell'}, \theta_i, p) - q_{\ell'} > \sup_{a_i} U^i(a_i, A_\ell, \theta_i, p) - q_\ell. \tag{2.26}$$
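Once neighborhood averages and prices are given, condition (2.26) is a finite check over agents and neighborhoods. The sketch below is illustrative only. It assumes a hypothetical indirect utility $V(A, \theta) = \theta A$ that is not taken from the text, treats the averages $A_\ell$ as fixed numbers rather than equilibrium objects, and ignores slot capacities. The function name `preferred_moves` and all numerical values are made up for the example.

```python
def preferred_moves(theta, assignment, avg_action, price):
    """List (agent, current, better) triples that violate the no-move condition.

    Assumption (hypothetical, for illustration): the payoff of agent i in
    neighborhood l is theta[i] * avg_action[l] - price[l].
    """
    violations = []
    for i, (t, current) in enumerate(zip(theta, assignment)):
        stay = t * avg_action[current] - price[current]
        for other in range(len(price)):
            if other == current:
                continue
            if t * avg_action[other] - price[other] > stay:
                violations.append((i, current, other))
    return violations


# Two neighborhoods: neighborhood 1 has the higher average action and price.
theta = [0.2, 0.4, 1.6, 1.8]
avg_action = [1.0, 2.0]
price = [0.0, 1.0]

sorted_assignment = [0, 0, 1, 1]  # low types in 0, high types in 1
mixed_assignment = [0, 1, 0, 1]   # types interleaved across neighborhoods

print(preferred_moves(theta, sorted_assignment, avg_action, price))
print(preferred_moves(theta, mixed_assignment, avg_action, price))
```

With these numbers an agent prefers neighborhood 1 exactly when $\theta > 1$, so the sorted assignment produces no violations while the mixed assignment leaves one low type and one high type wanting to swap sides.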

In other words, in an equilibrium with endogenous peer groups, we add the restriction that no agent prefers to move. To examine the structure of the peer groups that arise in equilibrium we assume, for simplicity, that the $U^i$'s are independent of $i$, that is, that all heterogeneity is represented in the $\theta_i$'s. If an individual with a higher $\theta$ gains more utility from an increase of the average action than an individual with a lower $\theta$, then segregation obtains in equilibrium. More precisely, if $\Theta$ is an interval $[t_0, t^0]$ of the line, and

$$V(A, \theta, p) \equiv \sup_{a_i} U(a_i, A, \theta, p)$$

satisfies

$$V(A, \theta, p) - V(A', \theta, p) > V(A, \theta', p) - V(A', \theta', p)$$

whenever $A > A'$ and $\theta > \theta'$, there exist points $t_0 < t_1 < \cdots < t_m = t^0$ such that agent $i$ chooses neighborhood $\ell$ if and only if $\theta_i \in [t_{\ell-1}, t_\ell]$ (e.g., Benabou, 1993, and Glaeser and Scheinkman, 2001). Although other equilibria exist, these are the only "stable" ones.

20 This treatment of peer group formation is used in Benabou (1993) and Glaeser and Scheinkman (2001). However, in several cases, peer groups have no explicit fees for entry. Mailath, Samuelson, and Shaked (1996) examine the formation of peer groups when agents are matched to others from the same peer group.

3. EMPIRICAL APPROACHES TO SOCIAL INTERACTIONS

The theoretical models of social interaction discussed previously are, we believe, helpful in understanding a wide variety of important empirical regularities. In principle, large differences in outcomes between seemingly homogeneous populations, radical shifts in aggregate patterns of behavior, and spatial concentration and segregation can be understood through social interaction models. But these models are not only helpful in understanding stylized facts; they can also serve as the basis for more rigorous empirical work. In this section, we outline the empirical approaches that can be and have been used to measure the magnitude of social interactions. For simplicity, in this empirical section, we focus on the linear-quadratic version of the model discussed in Example 2.2. Our decision to focus on the linear-quadratic model means that we ignore some of the more important questions in social interactions. For example, the case for place-based support to impoverished areas often hinges on a presumption that social interactions have a concave effect on outcomes. Thus, if impoverished neighborhoods can be improved slightly by an exogenous program, then the social impact of this program (the social multiplier of the program) will be greater than if the program had been enacted in a more advantaged neighborhood. The case for desegregation also tends to hinge on concavity of social interactions. Classic desegregation might involve switching low human capital people from a disadvantaged neighborhood and high human capital people from a successful neighborhood. This switch will be socially advantageous if moving the low human capital people damages the skilled area less than moving the high human capital people helps the less skilled area. This will occur when social interactions operate in a concave manner.
As important as the concavity or convexity of social interactions has been, most of the work in this area has focused on estimating linear effects.21 To highlight certain issues that arise in the empirical analysis, we make many simplifying assumptions that help us focus on the relevant problems.22 We will use the linear model in Example 2.2. We assume we can observe data on $C$ equally sized23 groups. All interactions occur within a group. Rewriting equation (2.6) for the optimal action, to absorb $p$ in the $\theta_i$, we have

$$a_i = \beta A_i + \theta_i. \tag{3.1}$$

We will examine here a simple form of global interactions. If agent $i$ belongs to group $\ell$,

$$A_i = \frac{1}{n-1} \sum_{j \ne i} a_j,$$

where the sum is over the agents $j$ in group $\ell$, and $n$ is the size of a group. We will also assume that $\theta_i = \lambda_\ell + \varepsilon_i$, where the $\varepsilon_i$'s are assumed to be iid with mean zero, $\lambda_\ell$ is a place-specific variable (perhaps price) that affects everyone in the group, and $\varepsilon_i$ is an idiosyncratic shock that is assumed to be independent across people. The average action within a group is

$$\frac{\sum_i a_i}{n} = \frac{\lambda_\ell}{1-\beta} + \frac{\sum_i \varepsilon_i}{n(1-\beta)}. \tag{3.2}$$

The optimal action of agent $i$ is then

$$a_i = \frac{\lambda_\ell}{1-\beta} + \frac{(n-1-\beta n+2\beta)\,\varepsilon_i}{(n-1+\beta)(1-\beta)} + \frac{\beta \sum_{j \ne i} \varepsilon_j}{(n-1+\beta)(1-\beta)}. \tag{3.3}$$

The variance of actions on the whole population is

$$\operatorname{Var}(a_i) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + \sigma_\varepsilon^2 \left[ 1 + \left( \frac{\beta}{1-\beta} \right)^2 \frac{3(n-1) - 2\beta(n-2) - \beta^2}{(n-1+\beta)^2} \right]. \tag{3.4}$$

As $n \to \infty$, this converges to $[\sigma_\lambda^2/(1-\beta)^2] + \sigma_\varepsilon^2$. In this case, and in the cases that are to follow, even moderate levels of $n$ ($n = 30+$) yield results that are quite close to the asymptotic result. For example, if $n = 40$ and $\beta \le .5$, then the bias is at most $-.05\sigma_\varepsilon^2$. Higher values of $\beta$ are associated with more severe negative biases; but, when $n = 100$, a value of $\beta = .75$ (which we think of as being quite high) is associated with a bias of only $-.135\sigma_\varepsilon^2$.

21 Crane (1991) is a notable exception. He searches for nonlinearities across a rich range of variables and finds some evidence for concavity in the social interactions involved in out-of-wedlock births. Reagan, Weinberg, and Yankow (2000) similarly explore nonlinearities in research on work behavior and find evidence for concavity.

22 A recent survey of the econometrics of a class of interaction-based binary choice models, and a review of the empirical literature, can be found in Brock and Durlauf (2001).

23 The assumption of equally sized groups is made only to save on notation.

3.1. Variances Across Space
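The variance comparisons in this subsection all rest on the reduced form (3.3), in which each action is a weighted sum of the group's shocks. As a minimal numerical check, and not part of the original derivation, the sketch below computes variances directly from those weights under the assumption $\sigma_\lambda^2 = 0$. The function names are hypothetical, and $\sigma_\varepsilon^2$ is normalized to one by default.

```python
def reduced_form_weights(n, beta):
    """Weights on own shock and on each peer's shock in equation (3.3)."""
    denom = (n - 1 + beta) * (1 - beta)
    own = (n - 1 - beta * n + 2 * beta) / denom
    peer = beta / denom
    return own, peer


def individual_variance(n, beta, sigma_eps2=1.0):
    """Var(a_i) with sigma_lambda^2 = 0: squared weights times sigma_eps^2."""
    own, peer = reduced_form_weights(n, beta)
    return sigma_eps2 * (own ** 2 + (n - 1) * peer ** 2)


def normalized_sum_variance(n, m, beta, sigma_eps2=1.0):
    """Var(sum of m observed actions / sqrt(m)) with sigma_lambda^2 = 0."""
    own, peer = reduced_form_weights(n, beta)
    observed = own + (m - 1) * peer  # total weight on an observed member's shock
    unobserved = m * peer            # total weight on an unobserved member's shock
    total = m * observed ** 2 + (n - m) * unobserved ** 2
    return sigma_eps2 * total / m


# Gap between the finite-n individual variance and its limit sigma_eps^2 = 1.
print(individual_variance(40, 0.5) - 1.0)
print(individual_variance(100, 0.75) - 1.0)
# Full-group normalized sum, to compare with 1 / (1 - beta)^2.
print(normalized_sum_variance(50, 50, 0.5))
```

Under these assumptions the two gaps come out at roughly 0.05 and 0.135, the magnitudes quoted for $n = 40$, $\beta = .5$ and $n = 100$, $\beta = .75$, and the full-group normalized sum has variance $1/(1-\beta)^2$.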

The simplest, although hardly the most common, method of measuring the size of social interactions is to use the variance of a group average. The intuition of this approach stems from early work on social interactions and multiple equilibria [see, e.g., Schelling (1978), Becker (1991), or Sah (1991)]. These papers all use different social interaction models to generate multiple equilibria for a single set of parameter values. Although multiple equilibria are often used as an informal device to explain large cross-sectional volatility, in fact this multiplicity is not needed. What produces high variation is that social interactions are associated with large differences across time and space that cannot be fully justiﬁed by fundamentals. Glaeser, Sacerdote, and Scheinkman (1996) use this intuition to create a model in which social interactions are associated with a high degree of variance across space without multiple equilibria. Empirically, it is difﬁcult to separate out extremely high variances from multiple equilibria, but Glaeser and Scheinkman (2001) argue that for many variables high-variance models with a single equilibrium are a more parsimonious means of describing the data.


Suppose we obtain $m \le n$ observations of members of a group. The sum of the observed actions, normalized by dividing by the square root of the number of observations, will have variance

$$\operatorname{Var}\left( \frac{\sum_i a_i}{\sqrt{m}} \right) = \frac{m\sigma_\lambda^2}{(1-\beta)^2} + \frac{\sigma_\varepsilon^2}{(1-\beta)^2} + (n-m)\,\sigma_\varepsilon^2\, \frac{\beta^2(n-2) - 2\beta(n-1)}{(1-\beta)^2 (n-1+\beta)^2}. \tag{3.5}$$

When $m = n$, (3.5) reduces to $[n\sigma_\lambda^2/(1-\beta)^2] + [\sigma_\varepsilon^2/(1-\beta)^2]$, which is similar to the variance formula in Glaeser, Sacerdote, and Scheinkman (1996) or Glaeser and Scheinkman (2001). Thus, if $m = n$ and $\sigma_\lambda^2 = 0$, as $n \to \infty$ the ratio of the variance of this normalized aggregate to the variance of individual actions converges to $1/(1-\beta)^2$. Alternatively, if $m$ is fixed, then as $n$ grows large, the aggregate variance converges to

$$\frac{m\sigma_\lambda^2}{(1-\beta)^2} + \sigma_\varepsilon^2,$$

and the ratio of the aggregate variance to the individual variance (when $\sigma_\lambda^2 = 0$) converges to one. The practicality of this approach hinges on the extent to which $\sigma_\lambda^2$ is either close to zero or known.24 As discussed previously, $\lambda_\ell$ may be nonzero either because of correlation of background factors or because there are place-specific characteristics that jointly determine the outcomes of neighbors. In some cases, researchers may know that neighbors are randomly assigned and that omitted place-specific factors are likely to be small. For example, Sacerdote (2000) looks at the case of Dartmouth freshman year roommates who are randomly assigned to one another. He finds significant evidence for social interaction effects. In other contexts [see Glaeser, Sacerdote, and Scheinkman (1996)], there may be methods of putting an upper bound on $\sigma_\lambda^2$ that allows the variance methodology to work. Our work found extremely high aggregate variances that seem hard to reconcile with no social interactions for reasonable levels of $\sigma_\lambda^2$. In particular, we estimated high levels of social interactions for petty crimes and crimes of the young. We found lower levels of social interactions for more serious crimes.

3.2. Regressing Individual Outcomes on Group Averages
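For the regression considered in this subsection, the reduced form (3.3) already pins down what ordinary least squares picks up when there are no group effects. The sketch below is a minimal illustration under stated assumptions: $\sigma_\lambda^2 = 0$, all $n$ group members observed, and iid shocks. It computes the population slope directly from the weights in (3.3) rather than from any closed-form expression, and the function name is hypothetical.

```python
def ols_on_peer_average(n, beta):
    """Population slope of a_0 on the mean action of the other n - 1 members.

    Each action is a linear combination of the n iid shocks (equation 3.3),
    so covariances are dot products of weight vectors; sigma_eps^2 cancels
    in the ratio.
    """
    denom = (n - 1 + beta) * (1 - beta)
    own = (n - 1 - beta * n + 2 * beta) / denom
    peer = beta / denom
    # Weights of a_0 on (eps_0, eps_1, ..., eps_{n-1}).
    w_self = [own] + [peer] * (n - 1)
    # Weights of the mean of a_1, ..., a_{n-1} on the same shocks.
    w_mean = [peer] + [(own + (n - 2) * peer) / (n - 1)] * (n - 1)
    cov = sum(a * b for a, b in zip(w_self, w_mean))
    var = sum(b * b for b in w_mean)
    return cov / var


for beta in (0.2, 0.5):
    slope = ols_on_peer_average(2000, beta)
    print(beta, slope, 2 * beta - beta ** 2)
```

In this setting the slope for large $n$ sits close to $2\beta - \beta^2$ and above $\beta$ itself, which is the reflection effect: the individual's own shock feeds back into the peer average that appears on the right-hand side.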

The most common methodology for estimating the size of social interactions is to regress an individual outcome on the group average. Crane (1991), discussed previously, is an early example of this approach. Case and Katz (1991) is another early paper implementing this methodology (and pioneering the instrumental variables approach discussed herein). Since these papers, there has been a torrent of later work using this approach, and it is the standard method of trying to measure social interactions.

24 In principle, we could use variations in $n$ across groups, and the fact that when $m = n$ the variance of the aggregates is an affine function of $m$, to try to separately estimate $\sigma_\lambda$ and $\sigma_\varepsilon$.

We will illustrate the approach by considering a univariate regression in which an individual outcome is regressed on the average outcome in that individual's peer group (not including himself). In almost all cases, researchers control for other characteristics of the subjects, but these controls would add little but complication to the formulas. The univariate ordinary least squares coefficient for a regression of an individual action on the average action of his peers is

$$\frac{\operatorname{Cov}\left( a_i, \sum_{j \ne i} a_j/(m-1) \right)}{\operatorname{Var}\left( \sum_{j \ne i} a_j/(m-1) \right)}. \tag{3.6}$$

The denominator is a transformation of (3.5), where $m-1$ replaces $\sqrt{m}$:

$$\operatorname{Var}\left( \frac{\sum_{j \ne i} a_j}{m-1} \right) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + m\sigma_\varepsilon^2\, \frac{[(n-1+\beta) - \beta(n-m)]^2 + \beta^2 m(n-m)}{(m-1)^2 (1-\beta)^2 (n-1+\beta)^2}. \tag{3.7}$$

The numerator is

$$\operatorname{Cov}\left( a_i, \frac{\sum_{j \ne i} a_j}{m-1} \right) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + \beta\sigma_\varepsilon^2\, \frac{2n-2-\beta n+2\beta}{(1-\beta)^2 (n-1+\beta)^2}. \tag{3.8}$$

When $\sigma_\lambda = 0$, the coefficient reduces to

$$\text{coeff} = \frac{(m-1)^2}{m}\, \frac{2\beta(n-1) - \beta^2(n-2)}{(n-1+\beta)^2 - (n-m)[2\beta(n-1) - \beta^2(n-2)]}. \tag{3.9}$$

When $m = n$,

$$\text{coeff} = 2\beta\, \frac{(n-1)^2}{n(n-1+\beta)} - \beta^2\, \frac{(n-1)^2}{(n-1+\beta)^2}. \tag{3.10}$$

Hence as $n \to \infty$, the coefficient converges to $2\beta - \beta^2$. Importantly, because of the reflection across individuals, the regression of an individual outcome on a group average cannot be thought of as a consistent estimate of $\beta$. However, under some conditions ($m = n$ large, $\sigma_\lambda^2 = 0$), the ordinary least squares coefficient does have an interpretation as a simple function of $\beta$. Again, the primary complication with this methodology is the presence of correlated error terms across individuals. Some of this problem is corrected by controlling for observable individual characteristics. Indeed, the strength of this approach relative to the variance approach is that it is possible to control for observable individual attributes. However, in most cases, the unobservable characteristics are likely to be at least as important as the observable ones and


are likely to have strong correlations across individuals within a given locale. Again, this correlation may also be the result of place-specific factors that affect all members of the community. One approach to this problem is the use of randomized experiments that allocate persons into different neighborhoods. The Gautreaux experiment was an early example of a program that used government money to move people across neighborhoods. Unfortunately, the rules used to allocate people across neighborhoods are sufficiently opaque that it is hard to believe that this program really randomized neighborhoods. The Moving to Opportunity experiment contains more explicit randomization. In that experiment, funded by the Department of Housing and Urban Development, individuals from high-poverty areas were selected into three groups: a control group and two different treatment groups. Both treatment groups were given money for housing, which they used to move into low-poverty areas. By comparing the treatment and control groups, Katz, Kling, and Liebman (2001) are able to estimate the effects of neighborhood poverty without fear that the sorting of people into neighborhoods is contaminating their results. Unfortunately, they cannot tell whether their effects are the results of peers or other neighborhood attributes. As such, this work is currently the apex of work on neighborhood effects, but it cannot really tell us about the contribution of peers vs. other place-based factors. Sacerdote (2000) also uses a randomized experiment. He is able to compare people who are living in the same building, but who have different randomly assigned roommates. This work is therefore a somewhat cleaner test of peer effects. Before randomized experiments became available, the most accepted approach for dealing with cases where $\sigma_\lambda^2 \ne 0$ was to use peer group background characteristics as instruments for peer group outcomes.

Case and Katz (1991) pioneered this approach, and under some circumstances it yields valid estimates of $\beta$. To illustrate this approach, we assume that there is a parameter ($x$) that can be observed for all people and that is part of the individual error term (i.e., $\varepsilon_i = \gamma x_i + \mu_i$). Thus, the error term can be decomposed into a term that is idiosyncratic and unobservable, and a term that is directly observable. Under the assumptions that both components of $\varepsilon_i$ are orthogonal to $\lambda_\ell$ and to each other, using the formula for an instrumental variables estimator we find that

$$\frac{\operatorname{Cov}\left( a_i, \sum_{j \ne i} x_j/(m-1) \right)}{\operatorname{Cov}\left( \sum_{j \ne i} a_j/(m-1), \sum_{j \ne i} x_j/(m-1) \right)} = \frac{\beta}{\beta + (1-\beta)\frac{n-1}{m-1}}. \tag{3.11}$$

When $m = n$, this reduces to $\beta$. Thus, in principle, the instrumental variables estimator can yield consistent estimates of the social interaction term of interest. However, as Manski (1993) stresses, the assumptions needed for this methodology may be untenable. First, the sorting of individuals across communities may mean that $\operatorname{Cov}(x_i, \mu_j) \ne 0$ for two individuals $i$ and $j$ living in the same community. For example, individuals who live in high-education communities may have omitted characteristics that are unusual. Equation (3.11) is no longer valid in that case, and, in general, the instrumental variables estimator will overstate social interactions when there is sorting of this kind. Second, sorting may also mean that $\operatorname{Cov}(x_i, \lambda_\ell) \ne 0$. Communities with people who have high schooling levels, for example, may also have better public high schools or other important community-level characteristics. Third, the background characteristic of individual $j$ may directly influence the outcome of person $i$, as well as influencing this outcome through the outcome of individual $j$. Many researchers consider this problem to be less important, because it occurs only when there is some level of social interaction (i.e., the background characteristic of person $j$ influencing person $i$). Although this point is to some extent correct, it is also true that even a small amount of direct influence of $x_j$ on $a_i$ can lead to wildly inflated estimates of $\beta$, when the basic correlation of $x_j$ and $a_j$ is low. (Indeed, when this correlation is low, sorting can also lead to extremely high estimates of social interaction.) Because of this problem, instrumental variables estimates can often be less accurate than ordinary least squares estimates and need to be considered quite carefully, especially when the instruments are weak.

3.3. Social Multipliers

A final approach to measuring social interactions is discussed in Glaeser and Scheinkman (2001) and Glaeser, Laibson, and Sacerdote (2000), but to our knowledge has never really been used. This approach is derived from a lengthier literature on social multipliers in which these multipliers are discussed in theory, but not in practice (see Schelling 1978). The basic idea is that, when social interactions exist, the impact of an exogenous increase in a variable can be quite high if this increase impacts everyone simultaneously. The effect of the increase includes not only the direct effect on individual outcomes, but also the indirect effect that works through peer influence. Thus, the impact on aggregate outcomes of an increase in an aggregate variable may be much higher than the impact on an individual outcome of an increase in an individual variable. This idea has been used to explain how the pill may have had an extremely large effect on the amount of female education (see Goldin and Katz, 2000). Goldin and Katz argue that there is a positive complementarity across women who delay marriage that occurs because when one woman decides to delay marriage, her prospective spouse remains in the marriage market longer and is also available to marry other women. Thus, one woman's delaying marriage may increase the incentives for other women to delay marriage, and this can create a social multiplier. Berman (2000) discusses social multipliers and how they might explain how government programs appear to have massive effects on labor practices among Orthodox Jews in Israel. In principle, social multipliers might explain phenomena such as the fact that there is a much stronger connection between out-of-wedlock births and crime at the aggregate level than at the individual level (see Glaeser and Sacerdote, 1999).


In this section, we detail how social multipliers can be used in practice to estimate the size of social interactions. Again, we assume that the individual disturbance term can be decomposed into $\varepsilon_i = \gamma x_i + \mu_i$, and that $m = n$. When we estimate the microregression of individual outcomes on characteristic $x$, when $x$ is orthogonal to all other error terms, the estimated coefficient is

$$\text{Individual coeff} = \gamma\, \frac{(1-\beta)n + (2\beta-1)}{(1-\beta)n - (1-\beta)^2}. \tag{3.12}$$

This expression approaches $\gamma$ as $n$ becomes large, and for even quite modest levels of $n$ ($n = 20$), this expression will be quite close to $\gamma$. Our assumption that the $x_i$ terms are orthogonal to the $\mu_i$ terms is probably violated in many cases. The best justification for this assumption is expediency: interpretation of estimated coefficients becomes quite difficult when the assumption is violated. One approach, if the assumption is clearly untenable, is to use place-specific fixed effects in the estimation. This will eliminate some of the correlation between individual characteristics and unobserved heterogeneity. An ordinary least squares regression of aggregate outcomes on aggregate $x$ variables leads to quite a different expression. Again, assuming that the $x_i$ terms are orthogonal to both the $\lambda_\ell$ and $\mu_i$ terms, the coefficient from the aggregate regression is $\gamma/(1-\beta)$. The ratio of the individual to the aggregate coefficient is therefore

$$\text{Ratio} = \frac{(1-\beta)n + 2\beta - 1}{n-1+\beta}. \tag{3.13}$$

As $n$ grows large, this term converges to $1-\beta$, which provides us with yet another means of estimating the degree of social interactions. Again, this estimate hinges critically on the orthogonality of the error terms, which generally means an absence of sorting. It also requires (as did the instrumental variables estimators) the assumption that the background characteristics of peers have no direct effect on outcomes.

3.4. Reconciling the Three Approaches

Although we have put forward the three approaches as distinct ways to measure social interactions, in fact they are identical in some cases. In general, the microregression approach of regressing individual outcomes on peer outcomes (either instrumented or not) requires the most data. The primary advantage of this approach is that it creates the best opportunity to control for background characteristics. The variance approach is the least data intensive, because it generally requires only an aggregate and an individual variance. In the case of a binary variable, it requires only an aggregate variance. Of course, as Glaeser, Sacerdote, and Scheinkman (1996) illustrate, this crude measure can be improved on with more information. The social multiplier approach lies in the middle. This approach is closest to the instrumental variable approach using microdata.
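One way to see the three approaches side by side is to invert each large-$n$ limit for $\beta$. The sketch below does this under the conditions stated for those limits ($m = n$, $\sigma_\lambda^2 = 0$, orthogonal errors) and under the added assumption $0 \le \beta < 1$, which selects the root of the quadratic in the OLS case. All function names are hypothetical, and the finite-$n$ helper simply evaluates (3.13).

```python
import math


def beta_from_variance_ratio(ratio):
    """Invert ratio = 1 / (1 - beta)^2 (variance approach, m = n, n large)."""
    return 1.0 - 1.0 / math.sqrt(ratio)


def beta_from_ols(coeff):
    """Invert coeff = 2*beta - beta^2, taking the root with beta < 1."""
    return 1.0 - math.sqrt(1.0 - coeff)


def beta_from_multiplier_ratio(ratio):
    """Invert ratio = 1 - beta (individual over aggregate coefficient)."""
    return 1.0 - ratio


def finite_n_multiplier_ratio(n, beta):
    """Ratio of individual to aggregate coefficient from equation (3.13)."""
    return ((1 - beta) * n + 2 * beta - 1) / (n - 1 + beta)


# With beta = 0.5 the three limiting statistics are 4, 0.75 and 0.5.
print(beta_from_variance_ratio(4.0))
print(beta_from_ols(0.75))
print(beta_from_multiplier_ratio(0.5))
# Implied beta when (3.13) is evaluated at a modest group size.
print(beta_from_multiplier_ratio(finite_n_multiplier_ratio(20, 0.5)))
```

All three inversions return the same $\beta$ when the limiting statistics are mutually consistent. With $n = 20$ the multiplier ratio implies a slightly smaller $\beta$ than the true one, which gives a sense of the finite-group error in that approach.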


ACKNOWLEDGMENTS

We thank Roland Benabou, Alberto Bisin, Avinash Dixit, Steve Durlauf, James Heckman, Ulrich Horst, and Eric Rasmusen for comments; Marcelo Pinheiro for research assistance; and the National Science Foundation for research support. We greatly benefited from detailed comments by Lars Hansen on an earlier version.

References

Aoki, M. (1995), “Economic Fluctuations with Interactive Agents: Dynamic and Stochastic Externalities,” Japanese Economic Review, 46, 148–165.
Arthur, W. B. (1989), “Increasing Returns, Competing Technologies and Lock-in by Historical Small Events: The Dynamics of Allocation under Increasing Returns to Scale,” Economic Journal, 99, 116–131.
Bak, P., K. Chen, J. Scheinkman, and M. Woodford (1993), “Aggregate Fluctuations from Independent Sectoral Shocks: Self-Organized Criticality in a Model of Production and Inventory Dynamics,” Ricerche Economiche, 47, 3–30.
Bala, V. and S. Goyal (2000), “A Non-Cooperative Model of Network Formation,” Econometrica, 68, 1181–1229.
Banerjee, A. (1992), “A Simple Model of Herd Behavior,” Quarterly Journal of Economics, 107, 797–818.
Becker, G. (1991), “A Note on Restaurant Pricing and Other Examples of Social Influences on Price,” Journal of Political Economy, 99(5), 1109–1116.
Becker, G. and K. M. Murphy (2000), “Social Economics: Market Behavior in a Social Environment,” Cambridge, MA: Belknap-Harvard University Press.
Benabou, R. (1993), “Workings of a City: Location, Education, and Production,” Quarterly Journal of Economics, 108, 619–652.
Benabou, R. (1996), “Heterogeneity, Stratification, and Growth: Macroeconomic Effects of Community Structure,” American Economic Review, 86, 584–609.
Berman, E. (2000), “Sect, Subsidy, and Sacrifice: An Economist’s View of Ultra-Orthodox Jews,” Quarterly Journal of Economics, 115, 905–954.
Bewley, T. (1981), “A Critique of Tiebout’s Theory of Local Public Expenditures,” Econometrica, 49(3), 713–740.
Bikhchandani, S., D. Hirshleifer, and I. Welch (1992), “A Theory of Fads, Fashion, Custom, and Cultural Exchange as Information Cascades,” Journal of Political Economy, 100, 992–1026.
Blume, L. (1993), “The Statistical Mechanics of Strategic Interaction,” Games and Economic Behavior, 5, 387–424.
Blume, L. and S. Durlauf (1998), “Equilibrium Concepts for Social Interaction Models,” Working Paper, Cornell University.
Boorman, S. (1975), “A Combinatorial Optimization Model for Transmission of Job Information through Contact Networks,” Bell Journal of Economics, 6(1), 216–249.
Brock, W. (1993), “Pathways to Randomness in the Economy: Emergent Nonlinearity and Chaos in Economics and Finance,” Estudios Economicos, 8(1), 3–55.
Brock, W. and S. Durlauf (1995), “Discrete Choice with Social Interactions,” Working Paper, University of Wisconsin at Madison.
Brock, W. and S. Durlauf (2001), “Interactions Based Models,” in Handbook of Econometrics (ed. by J. Heckman and E. Leamer), Amsterdam: North-Holland.
Bulow, J., J. Geanakoplos, and P. Klemperer (1985), “Multimarket Oligopoly: Strategic Substitutes and Complements,” Journal of Political Economy, 93, 488–511.
Case, A. and L. Katz (1991), “The Company You Keep: The Effects of Family and Neighborhood on Disadvantaged Families,” NBER, Working Paper 3705.
Cooper, R. and A. John (1988), “Coordinating Coordination Failures in Keynesian Models,” Quarterly Journal of Economics, 103, 441–464.
Crane, J. (1991), “The Epidemic Theory of Ghettos and Neighborhood Effects on Dropping Out and Teenage Childbearing,” American Journal of Sociology, 96, 1226–1259.
Diamond, P. (1982), “Aggregate Demand Management in Search Equilibrium,” Journal of Political Economy, 90, 881–894.
Durlauf, S. (1993), “Nonergodic Economic Growth,” Review of Economic Studies, 60, 349–366.
Durlauf, S. (1996a), “A Theory of Persistent Income Inequality,” Journal of Economic Growth, 1, 75–93.
Durlauf, S. (1996b), “Neighborhood Feedbacks, Endogenous Stratification, and Income Inequality,” in Dynamic Disequilibrium Modeling – Proceedings of the Ninth International Symposium on Economic Theory and Econometrics, (ed. by W. Barnett, G. Gandolfo, and C. Hillinger), Cambridge: Cambridge University Press.
Ellison, G. (1993), “Learning, Local Interaction, and Coordination,” Econometrica, 61, 1047–1072.
Ellison, G. and D. Fudenberg (1993), “Rules of Thumb for Social Learning,” Journal of Political Economy, 101, 612–644.
Follmer, H. (1974), “Random Economies with Many Interacting Agents,” Journal of Mathematical Economics, 1, 51–62.
Froot, K., D. Scharfstein, and J. Stein (1992), “Herd on the Street: Informational Inefficiencies in a Market with Short-Term Speculation,” Journal of Finance, 47, 1461–1484.
Gabszewicz, J. and J.-F. Thisse (1996), “Spatial Competition and the Location of Firms,” in Location Theory, (ed. by R. Arnott) Fundamentals of Pure and Applied Economics, Vol. 5, (ed. by J. Lesourne and H. Sonnenschein), Chur, Switzerland: Harwood Academic, 1–71.
Gale, D. and H. Nikaido (1965), “The Jacobian Matrix and the Global Univalence of Mappings,” Mathematische Annalen, 159, 81–93.
Glaeser, E., D. Laibson, and B. Sacerdote (2000), “The Economic Approach to Social Capital,” Working Paper 7728, NBER.
Glaeser, E. and B. Sacerdote (1999), “Why Is There More Crime in Cities?” Journal of Political Economy


Econometric Society Monographs No. 35

Editors:
Andrew Chesher, University College London
Matthew Jackson, California Institute of Technology

The Econometric Society is an international society for the advancement of economic theory in relation to statistics and mathematics. The Econometric Society Monograph Series is designed to promote the publication of original research contributions of high quality in mathematical economics and theoretical and applied econometrics.

Other titles in the series:
G. S. Maddala, Limited dependent and qualitative variables in econometrics, 0 521 33825 5
Gerard Debreu, Mathematical economics: Twenty papers of Gerard Debreu, 0 521 33561 2
Jean-Michel Grandmont, Money and value: A reconsideration of classical and neoclassical monetary economics, 0 521 31364 3
Franklin M. Fisher, Disequilibrium foundations of equilibrium economics, 0 521 37856 7
Andreu Mas-Colell, The theory of general economic equilibrium: A differentiable approach, 0 521 26514 2, 0 521 38870 8
Truman F. Bewley, Editor, Advances in econometrics – Fifth World Congress (Volume I), 0 521 46726 8
Truman F. Bewley, Editor, Advances in econometrics – Fifth World Congress (Volume II), 0 521 46725 X
Hervé Moulin, Axioms of cooperative decision making, 0 521 36055 2, 0 521 42458 5
L. G. Godfrey, Misspecification tests in econometrics: The Lagrange multiplier principle and other approaches, 0 521 42459 3
Tony Lancaster, The econometric analysis of transition data, 0 521 43789 X
Alvin E. Roth and Marilda A. Oliveira Sotomayor, Editors, Two-sided matching: A study in game-theoretic modeling and analysis, 0 521 43788 1
Wolfgang Härdle, Applied nonparametric regression, 0 521 42950 1
Jean-Jacques Laffont, Editor, Advances in economic theory – Sixth World Congress (Volume I), 0 521 48459 6
Jean-Jacques Laffont, Editor, Advances in economic theory – Sixth World Congress (Volume II), 0 521 48460 X
Halbert White, Estimation, inference and specification, 0 521 25280 6, 0 521 57446 3
Christopher Sims, Editor, Advances in econometrics – Sixth World Congress (Volume I), 0 521 56610 X
Christopher Sims, Editor, Advances in econometrics – Sixth World Congress (Volume II), 0 521 56609 6
Roger Guesnerie, A contribution to the pure theory of taxation, 0 521 23689 4, 0 521 62956 X
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume I), 0 521 58011 0, 0 521 58983 5
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume II), 0 521 58012 9, 0 521 58982 7
David M. Kreps and Kenneth F. Wallis, Editors, Advances in economics and econometrics – Seventh World Congress (Volume III), 0 521 58013 7, 0 521 58981 9
Donald P. Jacobs, Ehud Kalai, and Morton I. Kamien, Editors, Frontiers of research in economic theory: The Nancy L. Schwartz Memorial Lectures, 1983–1997, 0 521 63222 6, 0 521 63538 1
A. Colin Cameron and Pravin K. Trivedi, Regression analysis of count data, 0 521 63201 3, 0 521 63567 5
Steinar Strøm, Editor, Econometrics and economic theory in the 20th century: The Ragnar Frisch Centennial Symposium, 0 521 63323 0, 0 521 63365 6
Eric Ghysels, Norman R. Swanson, and Mark Watson, Editors, Essays in econometrics: Collected papers of Clive W.J. Granger (Volume I), 0 521 77297 4, 0 521 80401 8, 0 521 77496 9, 0 521 79697 0
Eric Ghysels, Norman R. Swanson, and Mark Watson, Editors, Essays in econometrics: Collected papers of Clive W.J. Granger (Volume II), 0 521 79207 X, 0 521 80401 8, 0 521 79649 0, 0 521 79697 0
Cheng Hsiao, Analysis of panel data, second edition, 0 521 81855 9, 0 521 52271 4
Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky, Editors, Advances in economics and econometrics – Eighth World Congress (Volume II), 0 521 81873 7, 0 521 52412 1
Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky, Editors, Advances in economics and econometrics – Eighth World Congress (Volume III), 0 521 81874 5, 0 521 52413 X

Advances in Economics and Econometrics Theory and Applications, Eighth World Congress, Volume I Edited by

Mathias Dewatripont Université Libre de Bruxelles and CEPR, London

Lars Peter Hansen University of Chicago

Stephen J. Turnovsky University of Washington

Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge, United Kingdom
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521818728
© Mathias Dewatripont, Lars Peter Hansen, and Stephen J. Turnovsky 2003
This book is in copyright. Subject to statutory exception and to the provision of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.
First published in print format 2003
isbn-13 978-0-511-06989-5 eBook (EBL)
isbn-10 0-511-06989-8 eBook (EBL)
isbn-13 978-0-521-81872-8 hardback
isbn-10 0-521-81872-9 hardback
isbn-13 978-0-521-52411-7 paperback
isbn-10 0-521-52411-3 paperback
Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this book, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

List of Contributors
Preface
1. Auctions and Efficiency, Eric Maskin
2. Why Every Economist Should Learn Some Auction Theory, Paul Klemperer
3. Global Games: Theory and Applications, Stephen Morris and Hyun Song Shin
4. Testing Contract Theory: A Survey of Some Recent Work, Pierre-Andre Chiappori and Bernard Salanié
5. The Economics of Multidimensional Screening, Jean-Charles Rochet and Lars A. Stole
A Discussion of the Papers by Pierre-Andre Chiappori and Bernard Salanié and by Jean-Charles Rochet and Lars A. Stole, Patrick Legros
6. Theories of Fairness and Reciprocity: Evidence and Economic Applications, Ernst Fehr and Klaus M. Schmidt
7. Hyperbolic Discounting and Consumption, Christopher Harris and David Laibson
A Discussion of the Papers by Ernst Fehr and Klaus M. Schmidt and by Christopher Harris and David Laibson, Glenn Ellison
8. Agglomeration and Market Interaction, Masahisa Fujita and Jacques-François Thisse
9. Nonmarket Interactions, Edward Glaeser and José A. Scheinkman
Index

Contributors

Pierre-Andre Chiappori, University of Chicago
Glenn Ellison, Massachusetts Institute of Technology
Ernst Fehr, University of Zurich and CEPR
Masahisa Fujita, Kyoto University
Edward Glaeser, Harvard University
Christopher Harris, University of Cambridge
Paul Klemperer, Oxford University
David Laibson, Harvard University
Patrick Legros, Université Libre de Bruxelles
Eric Maskin, Institute for Advanced Study and Princeton University
Stephen Morris, Yale University
Jean-Charles Rochet, GREMA and IDEI-R, Université des Sciences Sociales, Toulouse, France
Bernard Salanié, CREST, CNRS, and CEPR, Paris
José A. Scheinkman, Princeton University
Klaus M. Schmidt, University of Munich and CEPR
Hyun Song Shin, London School of Economics
Lars A. Stole, University of Chicago
Jacques-François Thisse, Université Catholique de Louvain, Ecole Nationale des Ponts et Chaussées, and CEPR

Preface

These volumes contain the papers of the invited symposium sessions of the Eighth World Congress of the Econometric Society. The meetings were held at the University of Washington, Seattle, in August 2000; we served as Program Co-Chairs. The book also contains an invited address, the “Seattle Lecture,” given by Eric Maskin. This address was in addition to other named lectures that are typically published in Econometrica. Symposium sessions had discussants, and about half of them wrote up their comments for publication. These remarks are included in the book after the session papers they comment on. The book chapters explore and interpret recent developments in a variety of areas in economics and econometrics. Although we chose topics and authors to represent the broad interests of members of the Econometric Society, the selected areas were not meant to be exhaustive. We deliberately included some new active areas of research not covered in recent Congresses. For many chapters, we encouraged collaboration among experts in an area. Moreover, some sessions were designed to span the econometrics–theory separation that is sometimes evident in the Econometric Society. We followed the lead of our immediate predecessors, David Kreps and Ken Wallis, by including all of the contributions in a single book edited by the three of us. Because of the number of contributions, we have divided the book into three volumes; the topics are grouped in a manner that seemed appropriate to us. We believe that the Eighth World Congress of the Econometric Society was very successful, and we hope that these books serve as suitable mementos of that event. We are grateful to the members of our Program Committee for their dedication and advice, and to Scott Parris at Cambridge University Press for his guidance and support during the preparation of these volumes. 
We also acknowledge support from the ofﬁcers of the Society – Presidents Robert Lucas, Jean Tirole, Robert Wilson, Elhanan Helpman, and Avinash Dixit – as well as the Treasurer, Robert Gordon, and Secretary, Julie Gordon. Finally, we express our gratitude to the Co-Chairs of the Local Organizing Committee, Jacques Lawarree and Fahad Khalil, for a smoothly run operation.

CHAPTER 1

Auctions and Efﬁciency Eric Maskin

1. INTRODUCTION The allocation of resources is an all-pervasive theme in economics. Furthermore, the question of whether there exist mechanisms ensuring efﬁcient allocation (i.e., mechanisms that ensure that resources end up in the hands of those who value them most) is of central importance in the discipline. Indeed, the very word “economics” connotes a preoccupation with the issue of efﬁciency. But economists’ interest in efﬁciency does not end with the question of existence. If efﬁcient mechanisms can be constructed, we want to know what they look like and to what extent they might resemble institutions used in practice. Understandably, the question of what will constitute an efﬁcient mechanism has been a major concern of economic theorists going back to Adam Smith. But, the issue is far from just a theoretical one. It is also of considerable practical importance. This is particularly clear when it comes to privatization, the transfer of assets from the state to the private sector. In the last 15 years or so, we have seen a remarkable ﬂurry of privatizations in Eastern Europe, the former Soviet Union, China, and highly industrialized Western nations, such as the United States, the United Kingdom, and Germany. An important justiﬁcation for these transfers has been the expectation that they will improve efﬁciency. But if efﬁciency is the rationale, an obvious leading question to ask is: “What sorts of transfer mechanisms will best advance this objective?” One possible and, of course, familiar answer is “the Market.” We know from the First Theorem of Welfare Economics (see Debreu, 1959) that, under certain conditions, the competitive mechanism (the uninhibited exchange and production of goods by buyers and sellers) results in an efﬁcient allocation. A major constraint on the applicability of this result to the circumstances of privatization, however, is the theorem’s hypothesis of large numbers. 
For the competitive mechanism to work properly – to avoid the exercise of monopoly power – there must be sufficiently many buyers and sellers so that no single agent has an appreciable effect on prices. But privatization often entails small
numbers. In the recent U.S. “spectrum” auctions – the auctions in which the government sold rights (in the form of licenses) to use certain radio frequency bands for telecommunications – there were often only two or three serious bidders for a given license. The competitive model does not seem readily applicable to such a setting. An interesting alternative possibility was raised by William Vickrey (1961) 40 years ago. Vickrey showed that, if a seller has a single indivisible good for sale, a second-price auction (see Section 2) is an efﬁcient mechanism – i.e., the winner is the buyer whose valuation of the good is highest – in the case where buyers have private values (“private values” mean that no buyer’s private information affects any other buyer’s valuation). This ﬁnding is rendered even more signiﬁcant by the fact that it can be readily extended to the sale of multiple goods,1 as shown by Theodore Groves (1973) and Edward Clarke (1971). Unfortunately, once the assumption of private values is dropped and thus buyers’ valuations do depend on other buyers’ information (i.e., we are in the world of common2 or interdependent values), the second-price auction is no longer efﬁcient, as I will illustrate later by means of an example. Yet, the common-values case is the norm in practice. If, say, a telecommunications ﬁrm undertakes a market survey to forecast demand for cell phones in a given region, the results of the survey will surely be of interest to its competitors and thus turn the situation into one of common values. Recently, a literature has developed on the design of efﬁcient auctions in common-values settings. The time is not yet ripe for a survey; the area is currently evolving too rapidly for that. But I would like to take this opportunity to discuss a few of the ideas from this literature. 2. THE BASIC MODEL Because it is particularly simple, I will begin with the case of a single indivisible good. 
Later, I will argue that much (but not all) of what holds in the one-good case extends to multiple goods. Suppose that there are n potential buyers. It will be simplest to assume that they are risk-neutral (however, we can accommodate any other attitude toward risk if the model is specialized to the case in which there is no residual uncertainty about valuations when all buyers' information is pooled). Assume that each buyer i's private information about the good can be summarized by a real-valued signal. That is, buyer i's information is reducible to a one-dimensional parameter.3 Formally, suppose that each buyer i's signal $s_i$ lies in an interval $[\underline{s}_i, \bar{s}_i]$. The joint prior distribution of $(s_1, \ldots, s_n)$ is given by the c.d.f. $F(s_1, \ldots, s_n)$. Buyer i's valuation for the good (i.e., the most he would be willing to pay for it) is given by the function $v_i(s_1, \ldots, s_n)$. I shall suppose (with little loss of generality) that higher values of $s_i$ correspond to higher valuations, i.e.,

$$\frac{\partial v_i}{\partial s_i} > 0. \tag{2.1}$$

1 Vickrey himself also treated the case of multiple units of the same good.
2 I am using "common values" in the broad sense to cover any instance where one agent's payoff depends on another's information. The term is sometimes used narrowly to mean that all agents share the same payoff.
3 Later on, I will examine the case of multidimensional signals. As with multiple goods, much will generalize. As we will see, the most problematic case is that in which there are both multiple goods and multidimensional signals.

Let us examine two illustrations of this model.

Example 2.1. Suppose that $v_i(s_1, \ldots, s_n) = s_i$. In this case, we are in the world of private values, not the interesting setting from the perspective of this lecture, but a valid special case.

A more pertinent example is:

Example 2.2. Suppose that the true value of the good to buyer i is $y_i$, which, in turn, is the sum of a value component that is common to all buyers and a component that is peculiar to buyer i. That is, $y_i = z + z_i$, where $z$ is the common component and $z_i$ is buyer i's idiosyncratic component. Suppose, however, that buyer i does not actually observe $y_i$, but only a noisy signal

$$s_i = y_i + \varepsilon_i, \tag{2.2}$$

where $\varepsilon_i$ is the noise term, and all the random variables ($z$, the $z_i$s, and the $\varepsilon_i$s) are independent. In this case, every buyer j's signal $s_j$ provides information to buyer i about his valuation, because $s_j$ is correlated [via (2.2)] with the common component $z$. Hence, we can express $v_i(s_1, \ldots, s_n)$ as

$$v_i(s_1, \ldots, s_n) = E[y_i \mid s_1, \ldots, s_n], \tag{2.3}$$

where the right-hand side of (2.3) denotes the expectation of $y_i$ conditional on the signals $(s_1, \ldots, s_n)$. This second example might be kept in mind as representative of the sort of scenario that the analysis is intended to apply to.

3. AUCTIONS An auction in the model of Section 2 is a mechanism (alternatively termed a "game form" or "outcome function") that, on the basis of the bids submitted, determines (i) who wins (i.e., who, if anyone, is awarded the good), and (ii) how much each buyer pays.4 Let us call an auction efficient provided that, in equilibrium, buyer i is the winner if and only if

$$v_i(s_1, \ldots, s_n) \ge \max_{j \ne i} v_j(s_1, \ldots, s_n) \tag{3.1}$$

(this definition is slightly inaccurate because of the possibility of ties for highest valuation, an issue that I shall ignore). In other words, efficiency demands that, in an equilibrium of the auction, the winner be the buyer with the highest valuation, conditional on all available information (i.e., on all buyers' signals). This notion of efficiency is sometimes called ex post efficiency. It assumes implicitly that the social value of the good being sold equals the maximum of the potential buyers' individual valuations. This assumption would be justified if, for example, each buyer used the good (e.g., a spectrum license) to produce an output (e.g., telecommunication service) that is sold in a competitive market without significant externalities (market power or externalities might drive a wedge between individual and social values).

The reader may wonder why, even if one wants efficiency, it is necessary to insist that the auction itself be efficient. After all, the buyers could always retrade afterward if the auction resulted in a winner with less than the highest valuation. The problem with relying on postauction trade, however, is much the same as that plaguing competitive exchange in the first place: These mechanisms do not, in general, work efficiently when there are only a few traders. To see this, consider the following example:5

Example 3.1. Suppose that there are two buyers. Assume that buyer 1 has won the auction and has a valuation of 1. If the auction is not guaranteed to be efficient, then there is some chance that buyer 2's valuation is higher. Suppose that, from buyer 1's perspective, buyer 2's valuation is distributed uniformly in the interval [0, 2]. Now, if there is to be further trade after the auction, someone has to initiate it. Let us assume that buyer 1 does so by proposing a trading price to buyer 2. Presumably, buyer 1 will propose a price $p^*$ that maximizes his expected payoff, i.e., that solves

$$\max_p \; \tfrac{1}{2}(2 - p)(p - 1). \tag{$*$}$$

[To understand (∗), note that $\tfrac{1}{2}(2 - p)$ is the probability that the proposal is accepted, since it is the probability that buyer 2's valuation is at least $p$, and that $p - 1$ is buyer 1's net gain in the event of acceptance.] But the solution to (∗) is $p^* = \tfrac{3}{2}$ (the first-order condition is $\tfrac{1}{2}(3 - 2p) = 0$). Hence, if buyer 2's valuation lies between 1 and $\tfrac{3}{2}$, the allocation, even after allowing for ex post trade, will remain inefficient, because buyer 2 will reject 1's proposal.

4 For some purposes – e.g., dealing with risk-averse buyers (see Maskin and Riley, 1984), liquidity constraints (see Che and Gale, 1996, or Maskin, 2000) or allocative externalities (see Jehiel and Moldovanu, 2001) – one must consider auctions in which buyers other than the winner also make payments. In this lecture, however, I will not have to deal with this possibility.
5 In this example, buyers have private values, but, as Fieseler, Kittsteiner, and Moldovanu (2000) show, resale can become even more problematic when there are common values.

I will first look at efficiency in the second-price auction. This auction form (often called the Vickrey auction) has the following rules: (i) each bidder i makes a (sealed) bid $b_i$, which is a nonnegative number; (ii) the winner is the bidder who has made the highest bid (again ignoring the issue of ties); (iii) the winner pays the second-highest bid, $\max_{j \ne i} b_j$. As I have already noted and will illustrate explicitly in Section 6, this auction can readily be extended to multiple goods.

The Vickrey auction is efficient in the case of private values.6 To see this, note first that it is optimal (in fact, a dominant strategy) for buyer i to set $b_i = v_i$ (i.e., to bid his true valuation). In particular, bidding below $v_i$ does not affect buyer i's payment if he wins (because his payment does not depend on his own bid); it just reduces his chance of winning, and so is not a good strategy. Bidding above $v_i$ raises buyer i's probability of winning, but the additional events in which he wins are precisely those in which someone else has bid higher than $v_i$. In such events, buyer i pays more than $v_i$, also not a desirable outcome. Thus, it is indeed optimal to bid $b_i = v_i$, which implies that the winner is the buyer with the highest valuation, the criterion for efficiency.

6 It is easy to show that the "first-price" auction (the auction in which each buyer makes a bid, the high bidder wins, and the winner pays his bid) is a nonstarter as far as efficiency is concerned. Indeed, even in the case of private values, the first-price auction is never efficient, except when buyers' valuations are symmetrically distributed (see Maskin, 1992).

Unfortunately, the Vickrey auction does not remain efficient once we depart from private values. To see this, consider the following example.

Example 3.4. Suppose that there are three buyers with valuation functions

$$v_1(s_1, s_2, s_3) = s_1 + \tfrac{2}{3} s_2 + \tfrac{1}{3} s_3,$$
$$v_2(s_1, s_2, s_3) = s_2 + \tfrac{1}{3} s_1 + \tfrac{2}{3} s_3,$$
$$v_3(s_1, s_2, s_3) = s_3.$$

Notice that buyers 1 and 2 have common values (i.e., their valuations do not depend only on their own signals). Assume that it happens that $s_1 = s_2 = 1$ (of course, buyers 1 and 2 would not know that their signal values are equal, because signals are private information), and suppose that buyer 3's signal value is either slightly below or slightly above 1. In the former case, it is easy to see that $v_1 > v_2 > v_3$, and so, for efficiency, buyer 1 ought to win. However, in the latter case $v_2 > v_1 > v_3$, and so buyer 2 is the efficient winner. Thus, the efficient allocation between buyers 1 and 2 turns on whether $s_3$ is below or above 1. But, in a Vickrey auction, the bids made by buyers 1 and 2 cannot incorporate information about $s_3$, because that signal is private information to buyer 3. Thus, the outcome of the auction cannot in general be efficient.
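The failure described in Example 3.4 can be checked numerically. The following is a minimal sketch in Python that evaluates the three valuation functions at $s_1 = s_2 = 1$ for a value of $s_3$ just below and just above 1, and compares the efficient winner with the winner of a second-price auction. The bids assigned to buyers 1 and 2 (2 and 19/10) are arbitrary placeholders, not equilibrium bids; the only property that matters is that they cannot vary with $s_3$. All function names are illustrative.

```python
from fractions import Fraction as F

def valuations(s1, s2, s3):
    """Valuation functions of Example 3.4, returned as [v1, v2, v3]."""
    v1 = s1 + F(2, 3) * s2 + F(1, 3) * s3
    v2 = s2 + F(1, 3) * s1 + F(2, 3) * s3
    v3 = s3
    return [v1, v2, v3]

def efficient_winner(s1, s2, s3):
    """1-based index of the buyer with the highest valuation (ties ignored)."""
    vals = valuations(s1, s2, s3)
    return 1 + vals.index(max(vals))

def second_price_auction(bids):
    """Highest bid wins and pays the second-highest bid; returns (winner, price)."""
    ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    return 1 + ranked[0], bids[ranked[1]]

def compare(s3, b1=F(2), b2=F(19, 10)):
    """Efficient winner versus Vickrey winner when s1 = s2 = 1.

    b1 and b2 are placeholder bids: they may depend on s1 and s2 only,
    so they are held fixed while s3 changes. Buyer 3 bids his value s3.
    """
    vickrey_winner, _ = second_price_auction([b1, b2, s3])
    return efficient_winner(1, 1, s3), vickrey_winner

for s3 in (F(99, 100), F(101, 100)):
    efficient, vickrey = compare(s3)
    print(f"s3 = {s3}: efficient winner = buyer {efficient}, "
          f"Vickrey winner = buyer {vickrey}")
```

Whatever fixed numbers are substituted for `b1` and `b2`, the ranking of buyers 1 and 2 in the auction is the same for both values of $s_3$, whereas the efficient winner switches from buyer 1 to buyer 2.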

4. AN EFFICIENT AUCTION How should we respond to the shortcomings of the Vickrey auction as illustrated by Example 3.4? One possible reaction is to appeal to classical mechanism-design theory. Specifically, we could have each buyer i announce a signal value $\hat{s}_i$, award the good to the buyer i for whom $v_i(\hat{s}_1, \ldots, \hat{s}_n)$ is highest, and choose the winner's payment to evoke truth-telling in buyers (i.e., to induce each buyer j to set $\hat{s}_j$ equal to his true signal value $s_j$). This approach is taken in Crémer and McLean (1985) and Maskin (1992).

The problem with such a "direct revelation" mechanism is that it is utterly unworkable in practice. In particular, notice that it requires the mechanism designer to know the physical signal spaces $S_1, \ldots, S_n$, the functional forms $v_i(\cdot)$, and the prior distributions of the signals, an extraordinarily demanding constraint. Now, the mechanism designer could attempt to elicit this information from the buyers themselves using the methods of the implementation literature (see Palfrey, 1993). For example, to learn the signal spaces, he could have each buyer announce a vector $(\hat{S}_1, \ldots, \hat{S}_n)$ and assign suitable penalties if the announcements did not match up appropriately. A major difficulty with such a scheme, however, is that in all likelihood the signal spaces $S_i$ are themselves private information. For analytic purposes, we model $S_i$ as simply an interval of numbers. But this abstracts from the reality that buyer i's signal corresponds to some physical entity: whatever it is that buyer i observes. Indeed, the signal may well be a sufficient statistic for data from a variety of different informational sources, and there is no reason why other buyers should know just what this array of sources is.

To avoid these complications, I shall concentrate on auction rules that do not make use of such details as signal spaces, functional forms, and distributions. Indeed, I will be interested in auctions that work well irrespective of these details; that is, I will adhere to the "Wilson Doctrine" (after Robert Wilson, who has been an eloquent proponent of the view that auction institutions should be "detail-free"). It turns out that a judicious modification of the Vickrey auction will do the trick.

Before turning to the modification, however, I need to introduce a restriction on valuation functions that is critical to the possibility of constructing efficient auctions. Let us assume that for all i and $j \ne i$ and all $(s_1, \ldots, s_n)$,7

$$v_i(s_1, \ldots, s_n) = v_j(s_1, \ldots, s_n) \;\Rightarrow\; \frac{\partial v_i}{\partial s_i}(s_1, \ldots, s_n) > \frac{\partial v_j}{\partial s_i}(s_1, \ldots, s_n). \tag{4.1}$$

7 This condition was introduced by Gresik (1991).

In other words, condition (4.1) says that buyer i's signal has a greater marginal effect on his own valuation than on that of any other buyer j (at least at points where buyer i's and buyer j's valuations are equal). Notice that, in view of (2.1), condition (4.1)8 is automatically satisfied by Example 2.1 (the case of private values): the right-hand side of the inequality then simply vanishes. Condition (4.1) also holds for Example 2.2. This is because, in that example, $s_i$ conveys relevant information to buyer j ($\ne i$) about the common component $z$, but tells buyer i not only about $z$ but also about his idiosyncratic component $z_i$. Thus, $v_i$ will be more sensitive than $v_j$ to variations in $s_i$. But whether or not condition (4.1) is likely to be satisfied, it is, in any event, essential for efficiency. To see what can go wrong without it, consider the following example.

Example 4.5. Suppose that the owner of a tract of land wishes to sell off the rights to drill for oil on her property. There are two potential drillers who are competing for this right. Driller 1's fixed cost of drilling is 1, whereas his marginal cost is 2. In contrast, driller 2 has fixed and marginal costs of 2 and 1, respectively. Assume that driller 1 observes how much oil is underground. That is, $s_1$ equals the quantity of oil. Driller 2 obtains no private information. Then, if the price of oil is 4, we have

$$v_1(s_1) = (4 - 2)s_1 - 1 = 2s_1 - 1,$$
$$v_2(s_1) = (4 - 1)s_1 - 2 = 3s_1 - 2.$$

Observe that $v_1(s_1) > v_2(s_1)$ if and only if $s_1 < 1$. Thus, for efficiency, driller 1 should be awarded drilling rights provided that $\tfrac{1}{2} < s_1 < 1$ (for $s_1 < \tfrac{1}{2}$, there is not enough oil to justify drilling at all). Driller 2, by contrast, should get the rights when $s_1 > 1$. In this example, there is no way (either through a modified Vickrey auction or otherwise) of inducing driller 1 to reveal the true value $s_1$ to allocate drilling rights efficiently.
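As a check on the arithmetic of Example 4.5, the sketch below (in Python, with illustrative function names) tabulates the two drillers' valuations, the efficient assignment of drilling rights, and the marginal effect of $s_1$ on each valuation. It only restates the numbers in the example; it is not part of the formal argument.

```python
def v1(s1):
    """Driller 1: oil price 4, marginal cost 2, fixed cost 1."""
    return (4 - 2) * s1 - 1

def v2(s1):
    """Driller 2: oil price 4, marginal cost 1, fixed cost 2."""
    return (4 - 1) * s1 - 2

def efficient_driller(s1):
    """Return 1 or 2 for the driller who should receive the rights, or None
    when neither valuation is positive (ties are ignored, as in the text)."""
    if max(v1(s1), v2(s1)) <= 0:
        return None
    return 1 if v1(s1) > v2(s1) else 2

def slope(v):
    """Marginal effect of s1 on a linear valuation function v."""
    return v(1) - v(0)

for s1 in (0.25, 0.75, 1.5):
    print(f"s1 = {s1}: v1 = {v1(s1)}, v2 = {v2(s1)}, "
          f"efficient driller = {efficient_driller(s1)}")

# Driller 1's signal moves driller 2's valuation more than his own,
# which is the opposite of what condition (4.1) requires.
print("dv1/ds1 =", slope(v1), " dv2/ds1 =", slope(v2))
```

The efficient winner switches from driller 1 to driller 2 as $s_1$ rises through 1, even though driller 1 is the only party who observes $s_1$.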
To see why no such mechanism exists, consider, without loss of generality, a direct revelation mechanism and let $t_1(\hat{s}_1)$ be a monetary transfer (possibly negative) to driller 1 if he announces signal value $\hat{s}_1$. Let $s_1'$ and $s_1''$ be signal values such that

$$\tfrac{1}{2} < s_1' < 1 < s_1''. \tag{4.2}$$

Then, for driller 1 to have the incentive to announce truthfully when $s_1 = s_1''$, we must have

$$t_1(s_1'') \ge 2s_1'' - 1 + t_1(s_1') \tag{4.3}$$

(the left-hand side is his payoff when he is truthful, whereas the right-hand side is his payoff when he pretends that $s_1 = s_1'$). Similarly, the incentive constraint corresponding to $s_1 = s_1'$ is

$$2s_1' - 1 + t_1(s_1') \ge t_1(s_1''). \tag{4.4}$$

Combining (4.3) and (4.4), we obtain $2(s_1' - s_1'') \ge 0$, a contradiction of (4.2). Hence, there exists no efficient mechanism. The feature that interferes with efficiency in this example is the violation of condition (4.1), i.e., the fact that $0 < \partial v_1/\partial s_1 < \partial v_2/\partial s_1$.

8 Notice that the strictness of the inequality in (4.1) rules out the case of "pure common values," where all buyers share the same valuation. However, in that case, the issue of who wins does not matter for efficiency.

v 2o .

(4.7)

To understand the rationale for (4.6) and (4.7), imagine that buyers bid truthfully. Because signals are private information and thus buyer 1 will not in general know his own valuation, truthful bidding means that, if his signal value is $s_1$, he submits a schedule $\hat{b}_1(\cdot) = b_1(\cdot)$ such that

$$b_1(v_2(s_1, s_2')) = v_1(s_1, s_2') \quad \text{for all } s_2'.^{9} \tag{4.8}$$

That is, whatever $s_2'$ (and hence $v_2$) turns out to be, buyer 1 bids his true valuation for that signal value. Similarly, truthful bidding for buyer 2 with signal value $s_2$ means reporting schedule $\hat{b}_2(\cdot) = b_2(\cdot)$, such that

$$b_2(v_1(s_1', s_2)) = v_2(s_1', s_2) \quad \text{for all } s_1'. \tag{4.9}$$

Observe that if buyers bid according to (4.8) and (4.9), then the true valuations $(v_1(s_1, s_2), v_2(s_1, s_2))$ constitute a fixed point in the sense of (4.6).10 In view of (4.6) and (4.7), this means that if buyers are truthful, the auction will result in an efficient allocation.

Thus, the remaining critical issue is how to get buyers to bid truthfully. For this purpose, it is useful to recall the device that the Vickrey auction exploits to induce truthful bidding, viz. to make the winner's payment equal, not to his own bid, but to the lowest possible bid he could have made and still have won the auction. This trick cannot be exactly replicated in our setting because buyers are submitting schedules rather than single bids. But let us try to take it as far as it will go. Suppose that when buyers report the schedules $(\hat{b}_1(\cdot), \hat{b}_2(\cdot))$, the resulting fixed point $(v_1^o, v_2^o)$ satisfies $v_1^o > v_2^o$. Then, according to our rules, buyer 1 should win. But rather than having him pay $v_1^o$, we will have buyer 1 pay $v_1^*$, where

$$v_1^* = \hat{b}_2(v_1^*). \tag{4.10}$$

This payment rule, I maintain, is the common-values analog of the Vickrey trick in the sense that $v_1^*$ is the lowest constant bid (i.e., the lowest uncontingent bid) that buyer 1 could make and still win (or tie for winning) given buyer 2's bid $\hat{b}_2(\cdot)$. The corresponding payment rule for buyer 2 should he win is $v_2^*$ such that

$$v_2^* = \hat{b}_1(v_2^*). \tag{4.11}$$

9 I noted in my arguments against direct revelation mechanisms that buyer 1 most likely will not know buyer 2's signal space $S_2$. But this in no way should prevent him from understanding how his own valuation is related to that of buyer 2, which is what (4.8) is really expressing [i.e., (4.8) still makes sense even if buyer 1 does not know what values $s_2$ can take].
10 Without further assumptions on valuation functions, there could be additional (nontruthful) fixed points. Dasgupta and Maskin (2000) and Eso and Maskin (2000a) provide conditions to rule such fixed points out. But even if they are not ruled out, the auction rules can be modified so that, in equilibrium, the truthful fixed point results (see Dasgupta and Maskin, 2000).

I claim that, given the payment rules (4.10) and (4.11), it is an equilibrium for buyers to bid truthfully. To see this most easily, let us make use of a strengthened version of (4.1):

$$\frac{\partial v_i}{\partial s_i} > \frac{\partial v_j}{\partial s_i}. \tag{4.12}$$

Let us suppose that buyer 2 is truthful, i.e., he bids $b_2(\cdot)$ satisfying (4.9). I must show that it is optimal for buyer 1 to bid $b_1(\cdot)$ satisfying (4.8). Notice first that if buyer 1 wins, his payoff is

$$v_1(s_1, s_2) - v_1^*, \quad \text{where } v_1^* = b_2(v_1^*), \tag{4.13}$$

regardless of how he bids (because neither his valuation nor his payment depends on his bid). I claim that if buyer 1 bids truthfully, then he wins if and only if (4.13) is positive. Observe that if this claim is established, then I will in fact have shown that truthful bidding is optimal; because buyer 1's bid does not affect (4.13), the most he can possibly hope for is to win precisely in those cases where the net payoff from winning is positive.

To see that the claim holds, let us first differentiate (4.9) with respect to $s_1'$ to obtain

$$\frac{db_2}{dv_1}\big(v_1(s_1', s_2)\big)\,\frac{\partial v_1}{\partial s_1}(s_1', s_2) = \frac{\partial v_2}{\partial s_1}(s_1', s_2) \quad \text{for all } s_1'.$$

This identity, together with (2.1) and (4.12), implies that

$$\frac{db_2}{dv_1}(v_1) < 1 \quad \text{for all } v_1. \tag{4.14}$$

But, from (4.14), (4.13) is positive if and only if

$$v_1(s_1, s_2) - v_1^* > \frac{db_2}{dv_1}(v_1')\,\big(v_1(s_1, s_2) - v_1^*\big) \quad \text{for all } v_1'. \tag{4.15}$$

Now, from the mean value theorem, there exists $v_1' \in [v_1^*, v_1(s_1, s_2)]$ such that

$$b_2(v_1(s_1, s_2)) - b_2(v_1^*) = \frac{db_2}{dv_1}(v_1')\,\big(v_1(s_1, s_2) - v_1^*\big).$$

Hence (4.13) is positive if and only if

$$v_1(s_1, s_2) - v_1^* > b_2(v_1(s_1, s_2)) - b_2(v_1^*), \tag{4.16}$$

which, because $v_1^* = b_2(v_1^*)$ and, by (4.9), $b_2(v_1(s_1, s_2)) = v_2(s_1, s_2)$, is equivalent to

$$v_1(s_1, s_2) > v_2(s_1, s_2). \tag{4.17}$$

Now suppose that buyer 1 is truthful. Because (v 1 (s1 , s2 ), v 2 (s1 , s2 )) is then a ﬁxed point, 1 wins if and only if (4.17) holds. So, we can conclude that, when buyer 1 is truthful, his net payoff from winning is positive [i.e., (4.13) is positive] if and only if he wins, which is what I claimed. That is, the modiﬁed Vickrey auction is efﬁcient. (This analysis ignores the possible costs to buyers of aquiring signals; once such costs are incorporated the modiﬁed Vickrey

Auctions and Efﬁciency

11

auction is no longer efﬁcient in general – see Maskin, 1992 and Bergeman and V¨alim¨aki, 2000.) An attractive feature of the Vickrey auction in the case of private values is that bidding one’s true valuation is optimal regardless of the behavior of other buyers (i.e., it is a dominant strategy). Once we abandon private values, however, there is no hope of ﬁnding an efﬁcient mechanism with dominant strategies (this is because, if my payoff depends on your signal, then my optimal strategy necessarily depends on the way that your strategy reﬂects your signal value, and so is not independent of what you do). Nevertheless, equilibrium in our modiﬁed Vickery auction has a strong robustness property. In particular, notice that although, technically, truthful bidding constitutes only a Bayesian (rather than dominant-strategy) equilibrium, equilibrium strategies are independent of the prior distribution of signals F. That is, regardless of buyers’ prior beliefs about signals, they will behave the same way in equilibrium. In particular, this means that the modiﬁed Vickrey auction will be efﬁcient even in the case in which buyers’ signals are believed to be independent of one another.11 It also means that truthful bidding will remain an equilibrium even after buyers learn one another’s signal values; i.e., truthful bidding constitutes an ex post Nash equilibrium. Finally Chung and Ely (2001) show that, at least in the two-buyer case, the modiﬁed Vickrey auction is dominant solvable. One might complain that having a buyer make his bid a function of the other buyer’s valuation imposes a heavy informational burden on him – what if he does not know anything about the connection between the other’s valuation and his own? I would argue, however, that the modiﬁed Vickrey auction should be viewed as giving buyers an additional opportunity rather than as setting an onerous requirement. After all, the degree to which a buyer makes his bid contingent is entirely up to him. 
In particular, he always has the option of bidding entirely uncontingently (i.e., of submitting a constant function). Thus, contingency is optional (but, of course, the degree to which the modified Vickrey auction will be more efficient than the ordinary Vickrey will turn on the extent to which buyers are prepared to bid contingently).

I have explicitly illustrated how the modified Vickrey auction works only in the case of two bidders, but the logic extends immediately to larger numbers. For the case of n buyers, the rules become:

1. Each buyer $i$ submits a contingent bid schedule $\hat{b}_i(\cdot)$, which is a function of $v_{-i}$, the vector of valuations excluding that of buyer $i$.
2. The auctioneer computes a fixed point $(v_1^o, \ldots, v_n^o)$, where $v_i^o = \hat{b}_i(v_{-i}^o)$ for all $i$.
3. The winner is the buyer $i$ for whom $v_i^o \ge v_j^o$ for all $j \ne i$.
4. The winner pays $\max_{j \ne i} \hat{b}_j(v_{-j}^*)$, where, for all $j \ne i$, $v_j^*$ satisfies $v_j^* = \hat{b}_j(v_{-j}^*)$.

¹¹ Crémer and McLean (1985) exhibit a mechanism that attains efficiency if the joint distribution of signals is common knowledge (including to the auction designer) and exhibits correlation. R. McLean and A. Postlewaite (2001) show how this sort of mechanism can be generalized to the case where the auction designer himself does not know the joint distribution.

Under conditions (2.1) and (4.1), an argument similar to the two-buyer demonstration establishes that it is an equilibrium in this auction for each buyer to bid truthfully (see Dasgupta and Maskin, 2000).¹² That is, if buyer $i$’s signal value is $s_i$, he should set $\hat{b}_i(\cdot) = b_i(\cdot)$ such that

$$b_i\bigl(v_{-i}(s_i, s'_{-i})\bigr) = v_i(s_i, s'_{-i}) \tag{4.18}$$

for all $s'_{-i}$.¹³
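To make the rules concrete, here is a minimal sketch of the two-buyer case; it is an illustration, not the author's own construction. It assumes hypothetical linear valuations v1 = s1 + A·s2 and v2 = s2 + B·s1 (the coefficients `A`, `B` and all function names are illustrative), for which truthful schedules in the sense of (4.18) have a closed form. It also assumes that, with two buyers, the winner pays the loser's bid evaluated at the point v* where that bid equals its own argument.

```python
"""Two-buyer modified Vickrey auction with truthful contingent bids (sketch)."""

# Hypothetical linear valuations: v1 = s1 + A*s2, v2 = s2 + B*s1, 0 <= A, B < 1.
A, B = 0.5, 0.3


def truthful_bid_1(s1):
    """Buyer 1's schedule b1(v2): his own value if buyer 2's value is v2."""
    # Invert v2 = s2 + B*s1 for s2, then substitute into v1 = s1 + A*s2.
    return lambda v2: s1 + A * (v2 - B * s1)


def truthful_bid_2(s2):
    """Buyer 2's schedule b2(v1), obtained by the symmetric inversion."""
    return lambda v1: s2 + B * (v1 - A * s2)


def fixed_point(b1, b2, iterations=200):
    """Iterate v1 = b1(v2), v2 = b2(v1); converges here because A*B < 1."""
    v1 = v2 = 0.0
    for _ in range(iterations):
        v1, v2 = b1(v2), b2(v1)
    return v1, v2


def run_auction(s1, s2):
    """Return (winner, v1, v2, payment) when both buyers bid truthfully."""
    b1, b2 = truthful_bid_1(s1), truthful_bid_2(s2)
    v1, v2 = fixed_point(b1, b2)
    winner = 1 if v1 >= v2 else 2
    # Assumed two-buyer payment rule: solve v* = b_loser(v*) by iteration
    # (a contraction because the loser's bid has slope below 1).
    loser_bid = b2 if winner == 1 else b1
    v_star = 0.0
    for _ in range(200):
        v_star = loser_bid(v_star)
    return winner, v1, v2, v_star


if __name__ == "__main__":
    print(run_auction(0.8, 0.4))
```

With s1 = 0.8 and s2 = 0.4, the fixed point recovers the true valuations (1.0 and 0.64), so buyer 1 wins. The payment, 0.34/0.7, is the value buyer 2 would have at the signal s1 for which the two valuations tie, which does not depend on buyer 1's actual signal.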

Furthermore, it is easy to see that, if buyers bid truthfully, the auction results in an efficient allocation.

One drawback of the modified Vickrey auction that I have exhibited is that a buyer must report quite a bit of information (this is an issue distinct from that of the buyer’s having to know a great deal, discussed previously) – a bid for each possible vector of valuations that others may have. Perry and Reny (1999a) have devised an alternative modification of the Vickrey auction that considerably reduces the complexity of the buyer’s report. Specifically, the Perry–Reny auction consists of two rounds of bidding. This means that a buyer can make his second-round bid depend on whatever he learned about other buyers’ valuations from their first-round bids, and so the auction avoids the need to report bid schedules. In the first round, each buyer $i$ submits a bid $b_i \ge 0$. In the second round, each buyer $i$ submits a bid $b_i^j$ for each buyer $j \ne i$. If some buyer submits a bid of zero in the first round, then the Vickrey rules apply: the winner is the high bidder, and he pays the second-highest bid. If all first-round bids are strictly positive, then the second-round bids determine the outcome. In particular, if there exists a buyer $i$ such that

$$b_i^j \ge b_j^i \quad \text{for all } j \ne i, \tag{4.19}$$

then buyer $i$ wins and pays $\max_{j \ne i} b_j^i$. If there exists no $i$ satisfying (4.19), then the good is allocated at random. Perry and Reny show that, under conditions (2.1) and (4.1) and provided that the probability a buyer has a zero valuation is zero, there exists an efficient equilibrium of this auction. They also demonstrate that the auction can be readily extended to the case in which multiple identical goods are sold, provided that a buyer’s marginal utility from additional units is declining.

¹² The reader may wonder whether, when (4.1) is not satisfied and so an efficient auction may not be possible, the efficiency of the final outcome could be enhanced by allowing buyers to retrade after the auction is over. However, any postauction trading episode could alternatively be viewed as part of a single mechanism that embraces both it and the auction proper. That is, in our search for efficient auctions, we need not consider postauction trade, because such activity could always be folded into the auction itself. Indeed, permitting trade after an auction can, in principle, distort buyers’ bidding in the same way that the prospect of renegotiation can distort parties’ behavior in the execution of a contract (see Dewatripont, 1989). Ausubel and Cramton (1999) argue that only an efficient auction is exempt from such distortion.

¹³ It is conceivable – although unlikely – that for a given vector $v_{-i}$ there could exist two different signal vectors $s'_{-i}$ and $s''_{-i}$ such that $v_{-i}(s_i, s'_{-i}) = v_{-i}(s_i, s''_{-i}) = v_{-i}$, but $v_i(s_i, s'_{-i}) \ne v_i(s_i, s''_{-i})$, in which case (4.18) is not well defined. To see how to handle that possibility, see Dasgupta and Maskin (2000).

5. THE ENGLISH AUCTION

The reader may wonder why, in my discussion of efficiency, I have not brought up the English auction, the familiar open format in which (i) buyers call out bids publicly (with the proviso that each successive bid exceed the one before), (ii) the winner is the last buyer to make a bid, and (iii) the winner pays his bid. After all, the opportunity to observe other buyers’ bids in the English auction would seem to allow a buyer to make a conditional bid in the same way that the modified Vickrey auction does. However, as shown in Maskin (1992), Eso and Maskin (2000b), and Krishna (2000), the English auction is not efficient in as wide a class of cases as the modified Vickrey auction. To see this, let us consider a variant of the English auction, sometimes called the “Japanese” auction (see Milgrom and Weber, 1982), which is particularly convenient analytically:

1. All buyers are initially in the auction.
2. The auctioneer raises the price continuously starting from zero.
3. A buyer can drop out (publicly) at any time.
4. The last buyer remaining wins.
5. The winner pays the price prevailing when the penultimate buyer dropped out.
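The five rules above can be sketched in a few lines. This is a deliberately simplified illustration: it assumes each buyer's drop-out price is fixed in advance, so it ignores the conditioning on earlier drop-outs that the following discussion turns on. The function name and the dictionary input are hypothetical.

```python
def japanese_auction(drop_out_prices):
    """Return (winner, price) given each buyer's planned drop-out price.

    Assumption of this sketch: drop-out prices are fixed beforehand, so
    buyers do not react to observing when others leave.
    """
    if len(drop_out_prices) < 2:
        raise ValueError("need at least two buyers")
    # Order buyers by the price at which they leave as the clock rises.
    order = sorted(drop_out_prices, key=drop_out_prices.get)
    winner = order[-1]                    # rule 4: last buyer remaining wins
    price = drop_out_prices[order[-2]]    # rule 5: price at penultimate exit
    return winner, price


if __name__ == "__main__":
    print(japanese_auction({"a": 3.0, "b": 5.0, "c": 4.0}))
```

In the usage line, buyer "b" outlasts the others and pays 4.0, the price at which "c" (the penultimate buyer) dropped out.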

Now, in this auction, a buyer can indeed condition his drop-out point according to when other buyers have dropped out, allowing bids in effect to be conditional on other buyers’ valuations. However, a buyer can condition only on buyers who have already dropped out. Thus, for efficiency, buyers must drop out in the “right” order in the equilibrium. That this might not happen is illustrated by the following example from Eso and Maskin (2000a):

Example 5.6. Suppose there are two buyers, where $v_1(s_1, s_2) = 2 + s_1 - 2s_2$ and $v_2(s_1, s_2) = 2 + s_2 - 2s_1$, and $s_1$ and $s_2$ are distributed uniformly on $[0, 1]$. Notice first that conditions (2.1) and (4.1) hold, so that the modified Vickrey auction results in an efficient equilibrium allocation. Indeed, buyers’ equilibrium contingent bids are

$$b_1(v_2) = 6 - 3s_1 - 2v_2 \quad \text{and} \quad b_2(v_1) = 6 - 3s_2 - 2v_1$$

(to obtain $b_1$, solve $v_2 = 2 + s_2 - 2s_1$ for $s_2$ and substitute into $v_1$; $b_2$ follows symmetrically).

Now, consider the English auction. For $i = 1, 2$, let $p_i(s_i)$ be the price at which buyer $i$ drops out if his signal value is $s_i$. If the English auction were efficient, then we would have

$$s_1 > s_2 \quad \text{if and only if} \quad p_1(s_1) > p_2(s_2). \tag{◆}$$

From symmetry, if $s_1 = s_2 = s$, then

$$p_1(s_1) = p_2(s_2). \tag{◆◆}$$

But from (◆) and (◆◆), $p_i(s + \Delta s) > p_i(s)$ for $\Delta s > 0$, and so

$$p_i(\cdot) \text{ is strictly increasing in } s_i. \tag{◆◆◆}$$

Thus, $p_1(s) = v_1(s, s)$ and $p_2(s) = v_2(s, s)$ [if $v_1(s, s) > p_1(s)$ and $s_1 = s_2 = s$, then buyer 1 drops out before the price reaches his valuation and so would do better to stay in a bit longer; if $v_1(s, s) < p_1(s)$, then buyer 1 stays in for prices above his valuation, and so would do better to drop out earlier]. But $v_1(s, s) = 2 + s - 2s = 2 - s$, which is decreasing in $s$, violating our finding that $p_1(\cdot)$ is increasing. In short, efficiency demands that a buyer with a lower signal value drop out first. But, if buyer $i$’s signal value is $s$, he has the incentive to drop out when the price equals $v_i(s, s)$, and this function is decreasing in $s$. So, in equilibrium, buyers will not drop out in the right order. We conclude that the English auction does not have an efficient equilibrium in this example.

In Example 5.6, each buyer’s valuation is decreasing in the other buyer’s signal. Indeed, this feature is important: as Maskin (1992) shows, the English and Vickrey auctions are efficient in the case n = 2 when valuations are nondecreasing functions of signals [and conditions (2.1) and (4.1) hold]. However, examples due to Perry and Reny (1999b), Krishna (2000), and Eso and Maskin (2000b) demonstrate that this result does not extend to more than two buyers. Nevertheless, Krishna (2000) provides some interesting conditions [considerably stronger than the juxtaposition of (2.1) and (4.1)] under which the English auction is efficient with three or more buyers (see also Eso and Maskin, 2000b). Moreover, Izmalkov (2001) shows that these conditions can be relaxed considerably when reentry in the English auction is permitted. Finally, Perry and Reny (1999b) show that the English auction can be modified [in a way analogous


to their (1999a) alteration of the Vickrey auction] so that it is efficient under the same conditions as the modified Vickrey auction. In fact, this modified English auction extends to multiple (identical) units, as long as buyers’ marginal valuations are decreasing in the number of units consumed [in the multiunit case, the Perry–Reny auction is actually a modification of the Ausubel (1997) generalization of the English auction].

6. MULTIPLE GOODS

In the same way that the ordinary Vickrey auction extends to multiple goods via the Groves–Clarke mechanism, so our modified Vickrey auction can be extended to handle more than one good. It is simplest to consider the case of two buyers, 1 and 2, and two goods, A and B. If there were private values, the pertinent information about buyer $i$ would consist of three numbers ($v_{iA}$, $v_{iB}$, and $v_{iAB}$), his valuations, respectively, for good A, good B, and both goods together. Efficiency would then mean allocating the goods to maximize the sum of valuations. For example, it would be efficient to allocate both goods to buyer 1 provided that $v_{1AB} \ge \max\{v_{1A} + v_{2B},\ v_{1B} + v_{2A},\ v_{2AB}\}$.

The Groves–Clarke mechanism is the natural generalization of the Vickrey auction to a multigood setting. In this mechanism, buyers submit valuations (in our two-good, private-values model, each buyer $i$ submits $\hat{v}_{iA}$, $\hat{v}_{iB}$, and $\hat{v}_{iAB}$); the goods are allocated in the way that maximizes the sum of the submitted valuations; and each buyer makes a payment equal to his marginal impact on the other buyers (as measured by their submitted valuations). Thus, in the private-values model, if buyer 1 is allocated good A, then he should pay

$$\hat{v}_{2AB} - \hat{v}_{2B}, \tag{6.1}$$

because $\hat{v}_{2AB}$ would be buyer 2’s payoff were buyer 1 absent, $\hat{v}_{2B}$ is his payoff given buyer 1’s presence, and so the difference between the two – i.e., (6.1) – is buyer 1’s marginal effect on buyer 2.

Given private values, bidding one’s true valuation is a dominant strategy in the Vickrey auction, and the same is true in the Groves–Clarke mechanism. Hence, in view of its allocative rule, the mechanism is efficient in the case of private values. But, as with the Vickrey auction, the Groves–Clarke mechanism is not efficient when there are common values. Hence, I shall examine a modification of Groves–Clarke analogous to that for Vickrey.

As in the one-good case, assume that each buyer $i$ ($i = 1, 2$) observes a private real-valued signal $s_i$. Buyer $i$’s valuations are functions of the two signals: $v_{iA}(s_1, s_2)$, $v_{iB}(s_1, s_2)$, $v_{iAB}(s_1, s_2)$. The appropriate counterpart to condition (2.1) is the requirement that if $H$ and $H'$ are two bundles of goods for which, given $(s_1, s_2)$, buyer $i$ prefers $H$, then the intensity of that preference rises with $s_i$. That is, for all $i = 1, 2$ and for any

two bundles, $H, H' = \phi, A, B, AB$,

$$v_{iH}(s_1, s_2) - v_{iH'}(s_1, s_2) > 0 \;\Rightarrow\; \frac{\partial}{\partial s_i}\bigl(v_{iH}(s_1, s_2) - v_{iH'}(s_1, s_2)\bigr) > 0. \tag{6.2}$$

Notice that if, in particular, $H = A$ and $H' = \phi$, then (6.2) just reduces to the requirement that if $v_{iA}(s_1, s_2) > 0$, then $\partial v_{iA}/\partial s_i\,(s_1, s_2) > 0$, i.e., to (2.1). Similarly, the proper generalization of (4.1) is the requirement that if, for given signal values, two allocations of goods are equally efficient (i.e., give rise to the same sum of valuations), then an increase in $s_i$ leads to the allocation that buyer $i$ prefers to become the more efficient. That is, for all $i = 1, 2$, and any two allocations $(H_1, H_2)$, $(H'_1, H'_2)$, if

$$\sum_{j=1}^{2} v_{jH_j}(s_1, s_2) = \sum_{j=1}^{2} v_{jH'_j}(s_1, s_2) \quad \text{and} \quad v_{iH_i}(s_1, s_2) > v_{iH'_i}(s_1, s_2),$$

then

$$\frac{\partial}{\partial s_i} \sum_{j=1}^{2} v_{jH_j}(s_1, s_2) > \frac{\partial}{\partial s_i} \sum_{j=1}^{2} v_{jH'_j}(s_1, s_2). \tag{6.3}$$

Notice that, if just one good A was being allocated and the two allocations were $(H_1, H_2) = (A, \phi)$ and $(H'_1, H'_2) = (\phi, A)$, then, when $i = 1$, condition (6.3) would reduce to the requirement

$$\text{if } v_{1A}(s_1, s_2) = v_{2A}(s_1, s_2) \text{ and } v_{1A}(s_1, s_2) > 0, \text{ then } \frac{\partial v_{1A}}{\partial s_1}(s_1, s_2) > \frac{\partial v_{2A}}{\partial s_1}(s_1, s_2), \tag{6.4}$$

which is just (4.1). An auction is efficient in this setting if, for all $(s_1, s_2)$, the equilibrium allocation $(H_1^o, H_2^o)$ solves

$$\max_{(H_1, H_2)} \sum_{i=1}^{2} v_{iH_i}(s_1, s_2).$$
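The efficiency criterion just stated, together with the private-values Groves–Clarke payment (6.1), can be illustrated with a small sketch. It covers only the private-values benchmark, not the modified mechanism, and it assumes that an absent buyer would leave the other buyer free to take whichever bundle he values most. The data layout (dictionaries keyed by "A", "B", "AB") and the function names are hypothetical.

```python
from itertools import product

BUNDLES = ("", "A", "B", "AB")  # "" stands for the empty bundle


def feasible_allocations():
    """All (H1, H2) in which no good is assigned to both buyers."""
    return [(h1, h2) for h1, h2 in product(BUNDLES, repeat=2)
            if not set(h1) & set(h2)]


def value(v, bundle):
    """A buyer's reported value for a bundle; the empty bundle is worth 0."""
    return v.get(bundle, 0.0) if bundle else 0.0


def groves_clarke(v1, v2):
    """Return the allocation maximizing reported values, and both payments."""
    reports = (v1, v2)
    best = max(feasible_allocations(),
               key=lambda alloc: sum(value(v, h) for v, h in zip(reports, alloc)))
    payments = []
    for i in (0, 1):
        other = reports[1 - i]
        # What the other buyer would obtain if buyer i were absent ...
        alone = max(value(other, h) for h in BUNDLES)
        # ... minus what the other buyer obtains in the chosen allocation.
        payments.append(alone - value(other, best[1 - i]))
    return best, tuple(payments)


if __name__ == "__main__":
    v1 = {"A": 5, "B": 1, "AB": 6}
    v2 = {"A": 2, "B": 4, "AB": 5}
    print(groves_clarke(v1, v2))
```

In the usage example, giving A to buyer 1 and B to buyer 2 yields the largest sum (9). Buyer 1 then pays 5 − 4 = 1, which is the expression in (6.1) evaluated at buyer 2's reports.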

Under assumptions (6.2) and (6.3), the following rules constitute an efficient auction:

1. Buyer $i$ submits schedules $\hat{b}_{iA}(\cdot)$, $\hat{b}_{iB}(\cdot)$, $\hat{b}_{iAB}(\cdot)$, where for all $H = A, B, AB$ and all $v_j$, $\hat{b}_{iH}(v_j)$ = buyer $i$’s bid for $H$ if buyer $j$’s ($j \ne i$) valuations are $v_j = (v_{jA}, v_{jB}, v_{jAB})$.
2. The auctioneer computes a fixed point $(v_1^o, v_2^o)$ such that, for all $i$ and $H$, $v_{iH}^o = \hat{b}_{iH}(v_j^o)$.
3. Goods are divided according to allocation $(H_1^o, H_2^o)$, where

$$(H_1^o, H_2^o) = \arg\max_{(H_1, H_2)} \sum_{i=1}^{2} v_{iH_i}^o.$$

4. Suppose that buyer 1 is allocated good A (i.e., $H_1^o = A$); if (i) there exists $v_1^*$ such that

$$v_{1A}^* + \hat{b}_{2B}(v_1^*) = \hat{b}_{2AB}(v_1^*), \tag{6.5}$$

then buyer 1 pays $\hat{b}_{2AB}(v_1^*) - \hat{b}_{2B}(v_1^*)$; if instead of (6.5), (ii) there exist $\hat{v}_1^*$ (with $\hat{v}_{1A}$ […]

$$\hat{v}_{1A}^* + \hat{b}_{2B}(\hat{v}_1^*) = \hat{v}_{1B}^* + \hat{b}_{2A}(\hat{v}_1^*) \tag{6.6}$$

[…]

[…]

$$s'_{11} + s'_{12} + \alpha s_2 > s_2 + \beta s'_{11} + \gamma s'_{12} \tag{7.1}$$

but

$$s''_{11} + s''_{12} + \alpha s_2 < s_2 + \beta s''_{11} + \gamma s''_{12}, \tag{7.2}$$

so that, with $(s_{11}, s_{12}) = (s'_{11}, s'_{12})$, the good should be allocated to buyer 1 and, with $(s_{11}, s_{12}) = (s''_{11}, s''_{12})$, it should be allocated to buyer 2 [if $\beta = \gamma$, this conflict does not arise; the directions of the inequalities in (7.1) and (7.2) must be the same]. Hence, an efficient auction is impossible when $\beta \ne \gamma$. However, because buyer 1 cares only about the sum $s_{11} + s_{12}$, it is natural to define

$$r_1 = s_{11} + s_{12}$$

and set

$$w_1(r_1, s_2) = r_1 + \alpha s_2$$

and

$$w_2(r_1, s_2) = E_{s_{11}, s_{12}}\bigl[s_2 + \beta s_{11} + \gamma s_{12} \mid s_{11} + s_{12} = r_1\bigr].$$

Notice that we have reduced the two-dimensional signal $s_1$ to the one-dimensional signal $r_1$. Furthermore, provided that $\alpha$, $\beta$, and $\gamma$ are all less than 1 [so that condition (4.1) holds], our modified Vickrey auction is efficient with respect to the “reduced” valuation functions $w_1(\cdot)$ and $w_2(\cdot)$ (because all the analysis of Section 4 applies). Hence, a moment’s reflection should convince the reader that, although full efficiency is impossible for the valuation functions $v_1(\cdot)$ and $v_2(\cdot)$, the modified Vickrey auction is constrained efficient, where “constrained” refers to the requirement that buyer 1 must behave the same way for any pair $(s_{11}, s_{12})$ summing to the same $r_1$ (in the terminology of Holmstrom and Myerson, 1983, the auction is “incentive efficient”).

Unfortunately, as Jehiel and Moldovanu (2001) show in their important paper, this trick of reducing a multidimensional signal to one dimension no longer works in general if there are multiple goods. To see the problem, suppose that, as in Section 6, there are two goods, A and B, but that now a buyer $i$ ($i = 1, 2, 3$) receives two signals – one for each good. Specifically, let $s_{iA}$ and $s_{iB}$ be buyer $i$’s signals for A and B, respectively, and let his valuation functions be

$$v_{iA}(s_{1A}, s_{2A}, s_{3A}) \quad \text{and} \quad v_{iB}(s_{1B}, s_{2B}, s_{3B}).$$

Assume that each buyer wants to buy at most one good. Let us first fix the signal values of buyers 2 and 3 at levels such that, as we vary $s_{1A}$ and $s_{1B}$, either (i) it is efficient to allocate good A to buyer 1 and B to 2, or (ii) it is efficient to allocate good A to 2 and B to 3. In case (i), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) > v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}),$$

that is,

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) > v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B}), \tag{7.3}$$

whereas in case (ii), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) < v_{2A}(s_{1A}, s_{2A}, s_{3A}) + v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B}). \tag{7.4}$$

Notice that buyer 1’s objective function does not depend on $s_{1B}$ [$s_{1B}$ affects only buyer 1’s valuation for good B, but buyer 1 is not allocated B in either case (i) or (ii)]. Hence, the equilibrium outcome of any auction cannot turn on the value of this parameter. But this means that, if an auction is efficient, which of case (i) or (ii) [i.e., which of (7.3) or (7.4)] holds cannot depend on $s_{1B}$. We conclude, from the right-hand sides of (7.3) and (7.4), that

$$v_{3B}(s_{1B}, s_{2B}, s_{3B}) - v_{2B}(s_{1B}, s_{2B}, s_{3B})$$

must be independent of $s_{1B}$. Expressed differently, we have

$$\frac{\partial}{\partial s_{1B}} v_{3B}(s_{1B}, s_{2B}, s_{3B}) = \frac{\partial}{\partial s_{1B}} v_{2B}(s_{1B}, s_{2B}, s_{3B}).$$

Repeating the argument for all other pairs of buyers and for good B, we have

$$\frac{\partial v_{jH}}{\partial s_{iH}} = \frac{\partial v_{kH}}{\partial s_{iH}} \quad \text{for all } j \ne i \ne k \text{ and } H = A, B. \tag{7.5}$$

Next, let us fix the signal values of buyers 2 and 3 at levels such that, as we vary $s_{1A}$ and $s_{1B}$, either (iii) it is efficient to allocate A to buyer 1 and B to 2 or (iv) it is efficient to allocate B to buyer 1 and A to 2. In case (iii), we have

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) > v_{1B}(s_{1B}, s_{2B}, s_{3B}) + v_{2A}(s_{1A}, s_{2A}, s_{3A}), \tag{7.6}$$

and in case (iv),

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) + v_{2B}(s_{1B}, s_{2B}, s_{3B}) < v_{1B}(s_{1B}, s_{2B}, s_{3B}) + v_{2A}(s_{1A}, s_{2A}, s_{3A}). \tag{7.7}$$


To simplify matters, let us assume that valuation functions are linear:

$$v_{1A}(s_{1A}, s_{2A}, s_{3A}) = s_{1A} + \alpha_{12} s_{2A} + \alpha_{13} s_{3A}, \tag{7.8}$$

$$v_{1B}(s_{1B}, s_{2B}, s_{3B}) = s_{1B} + \beta_{12} s_{2B} + \beta_{13} s_{3B}, \tag{7.9}$$

and similarly for buyers 2 and 3. Then (7.6) and (7.7) can be rewritten as

$$s_{1A} - s_{1B} > \alpha_{21} s_{1A} + \alpha_{22} s_{2A} + \alpha_{23} s_{3A} - \beta_{21} s_{1B} - \beta_{22} s_{2B} - \beta_{23} s_{3B} \tag{7.10}$$

and

$$s_{1A} - s_{1B} < \alpha_{21} s_{1A} + \alpha_{22} s_{2A} + \alpha_{23} s_{3A} - \beta_{21} s_{1B} - \beta_{22} s_{2B} - \beta_{23} s_{3B}. \tag{7.11}$$

Now (because we have fixed 2’s and 3’s signal values), buyer 1’s objective function depends only on $s_{1A} - s_{1B}$. That is, for any value of $\Delta$, buyer 1 will behave the same way for signal values $(s_{1A}, s_{1B})$ as for $(s_{1A} + \Delta, s_{1B} + \Delta)$. Hence, in any auction, the equilibrium outcome must be the same for any value of $\Delta$. In particular, if the auction is efficient, whether (7.10) or (7.11) applies cannot depend on $\Delta$’s value. But, from the right-hand sides of (7.10) and (7.11), this can be the case only if $\alpha_{21} = \beta_{21}$, i.e., only if

$$\frac{\partial v_{2A}}{\partial s_{1A}} = \frac{\partial v_{2B}}{\partial s_{1B}}.$$

Repeating the argument for the other buyers, we have

$$\frac{\partial v_{jA}}{\partial s_{iA}} = \frac{\partial v_{jB}}{\partial s_{iB}} \quad \text{for all } i \text{ and } j \ne i. \tag{7.12}$$
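The shift argument behind (7.12) can be checked numerically. The sketch below is only an illustration under the linear specification: it folds the fixed signals of buyers 2 and 3 into a constant, and the parameter values, starting signals, and function names are hypothetical. It computes the gap between the two sides of (7.6), whose sign decides between cases (iii) and (iv), as both of buyer 1's signals are shifted by the same amount.

```python
def efficiency_gap(s1A, s1B, alpha21, beta21, const=0.0):
    """(v1A + v2B) - (v1B + v2A), other buyers' signals folded into const.

    Positive: A to buyer 1 and B to buyer 2 is efficient (case iii).
    Negative: B to buyer 1 and A to buyer 2 is efficient (case iv).
    """
    v1A, v1B = s1A, s1B                      # buyer 1's own-signal terms
    v2A, v2B = alpha21 * s1A, beta21 * s1B   # buyer 2's dependence on buyer 1
    return (v1A + v2B) - (v1B + v2A) + const


def gaps_along_shift(alpha21, beta21, shifts=(0.0, 1.0, 2.0)):
    """Shift both of buyer 1's signals by the same amount; record the gap."""
    s1A, s1B = 1.0, 0.5   # illustrative starting signals
    return [efficiency_gap(s1A + d, s1B + d, alpha21, beta21) for d in shifts]


if __name__ == "__main__":
    print(gaps_along_shift(0.5, 0.2))  # unequal coefficients
    print(gaps_along_shift(0.3, 0.3))  # equal coefficients
```

With unequal coefficients the gap changes sign along the shift, so the efficient allocation depends on a quantity that buyer 1's behavior cannot reveal. With equal coefficients the gap stays constant, which is the content of the condition $\alpha_{21} = \beta_{21}$.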

The necessary conditions (7.5) and (7.12), due to Jehiel and Moldovanu (2001), are certainly restrictive. Nevertheless, as shown in Eso and Maskin (2000a), there is a natural class of cases in which they are automatically satisfied. Specifically, suppose that in our two-good model, each buyer wants at most one good (this is not essential). Assume that the true value of good A to buyer $i$, $y_{iA}$, is the sum of a component $z_A$ common to all buyers and a component $z_{iA}$ that is idiosyncratic to him. That is, $y_{iA} = z_A + z_{iA}$. Similarly, assume that buyer $i$’s true valuation of good B, $y_{iB}$, satisfies $y_{iB} = z_B + z_{iB}$. Suppose, however, that buyer $i$ does not directly observe his true valuations, but only noisy signals of them. That is, he observes $s_{iA}$ and $s_{iB}$, where $s_{iA} = y_{iA} + \varepsilon_{iA}$ and $s_{iB} = y_{iB} + \varepsilon_{iB}$.


It can be shown (see Eso and Maskin, 2000a) that, if the random variables $z_H$, $z_{iH}$, $\varepsilon_{iH}$, $i = 1, 2, 3$, $H = A, B$, are independent, normal random variables and if the variances of $\varepsilon_{iH}$ and $z_{iH}$ are proportional to that of $z_H$, i.e., for all $i$, there exist $k_{i\varepsilon}$ and $k_{iz}$ such that

$$\operatorname{var} \varepsilon_{iH} = k_{i\varepsilon} \operatorname{var} z_H \quad \text{and} \quad \operatorname{var} z_{iH} = k_{iz} \operatorname{var} z_H, \quad H = A, B,$$

then (7.5) and (7.12) are automatically satisfied and the modified Groves–Clarke mechanism discussed in Section 6 is an efficient auction.

8. FURTHER WORK

There is clearly a great deal of work remaining to be done on efficient auctions, including dealing with the multiple good/multidimensional problem in cases where (7.5) and (7.12) do not hold. I would like to simply underscore one issue: finding an open auction counterpart to the modified Groves–Clarke mechanism in the case of multiple goods. The task of submitting contingent bids is considerable even for a single good. For multiple goods, it could be formidable. For this reason, as I have already discussed, researchers have sought open auctions – variants of the English auction – as desirable alternatives. Perry and Reny (1999b) have exhibited a lovely modification of the Ausubel (1997) auction (which, in turn, elegantly extends the English auction to multiple identical goods). However, efficiency in that auction obtains only when all goods are identical and buyers’ marginal valuations are declining. It would be an important step, in my judgment, to find a similar result without such restrictions on goods or preferences.

ACKNOWLEDGMENTS

I thank the National Science Foundation and the Beijer International Institute for research support and S. Baliga, S. Izmalkov, P. Jehiel, V. Krishna, and B. Moldovanu for helpful comments. Much of my research on efficient auctions – and much of the work reported here – was carried out with my long-time collaborator and friend P. Dasgupta. More recently, I have had the pleasure of working with P. Eso. Others whose research figures prominently in the recent literature – and to whom I owe a considerable intellectual debt – include L. Ausubel, P. Jehiel, V. Krishna, B. Moldovanu, M. Perry, A. Postlewaite, R. McLean, and P. Reny.

APPENDIX: BUYER 1’S PAYMENT WHEN ALLOCATED BOTH GOODS IN A TWO-GOOD, TWO-BUYER AUCTION

If

(a) there exists $v_1^*$ such that

$$v_{1AB}^* = \hat{b}_{2AB}(v_1^*),$$

then buyer 1 pays $\hat{b}_{2AB}(v_1^*)$; if (a) does not hold and instead

(b) there exists $\hat{v}_1^*$ such that

$$\hat{v}_{1AB}^* = \hat{v}_{1A}^* + \hat{b}_{2B}(\hat{v}_1^*),$$

then if

(b1) there exists $v_1^{**}$ such that

$$v_{1A}^{**} + \hat{b}_{2B}(v_1^{**}) = \hat{b}_{2AB}(v_1^{**}),$$

buyer 1 pays

$$\hat{b}_{2B}(\hat{v}_1^*) + \bigl(\hat{b}_{2AB}(v_1^{**}) - \hat{b}_{2B}(v_1^{**})\bigr);$$

and if instead

(b2) there exist $\hat{v}_1^{**}$ and $v_1^{***}$ such that

$$\hat{v}_{1A}^{**} + \hat{b}_{2B}(\hat{v}_1^{**}) = \hat{v}_{1B}^{**} + \hat{b}_{2A}(\hat{v}_1^{**}) \quad \text{and} \quad v_{1B}^{***} + \hat{b}_{2A}(v_1^{***}) = \hat{b}_{2AB}(v_1^{***}),$$

then buyer 1 pays

$$\hat{b}_{2B}(\hat{v}_1^*) + \bigl(\hat{b}_{2A}(\hat{v}_1^{**}) - \hat{b}_{2B}(\hat{v}_1^{**})\bigr) + \bigl(\hat{b}_{2AB}(v_1^{***}) - \hat{b}_{2A}(v_1^{***})\bigr);$$

finally, if

(c) there exists $\hat{\hat{v}}_1^*$ such that

$$\hat{\hat{v}}_{1AB}^* = \hat{\hat{v}}_{1B}^* + \hat{b}_{2A}(\hat{\hat{v}}_1^*),$$

then if

(c1) there exists $v_1^{**}$ such that

$$v_{1B}^{**} + \hat{b}_{2A}(v_1^{**}) = \hat{b}_{2AB}(v_1^{**}),$$

buyer 1 pays

$$\hat{b}_{2A}(\hat{\hat{v}}_1^{*}) + \bigl(\hat{b}_{2AB}(v_1^{**}) - \hat{b}_{2A}(v_1^{**})\bigr);$$

and if instead

(c2) there exist $\hat{\hat{v}}_1^{**}$ and $v_1^{***}$ such that

$$\hat{\hat{v}}_{1B}^{**} + \hat{b}_{2A}(\hat{\hat{v}}_1^{**}) = \hat{\hat{v}}_{1A}^{**} + \hat{b}_{2B}(\hat{\hat{v}}_1^{**}) \quad \text{and} \quad v_{1A}^{***} + \hat{b}_{2B}(v_1^{***}) = \hat{b}_{2AB}(v_1^{***}),$$

then buyer 1 pays

$$\hat{b}_{2A}(\hat{\hat{v}}_1^{*}) + \bigl(\hat{b}_{2B}(\hat{\hat{v}}_1^{**}) - \hat{b}_{2A}(\hat{\hat{v}}_1^{**})\bigr) + \bigl(\hat{b}_{2AB}(v_1^{***}) - \hat{b}_{2B}(v_1^{***})\bigr).$$

References

Ausubel, L. (1997), An Efficient Ascending-Bid Auction for Multiple Objects, mimeo.
Ausubel, L. and P. Cramton (1999), The Optimality of Being Efficient, mimeo.
Bergemann, D. and J. Välimäki (2001), Information Acquisition and Efficient Mechanism Design, mimeo.
Che, Y. K. and I. Gale (1996), Expected Revenue of the All-Pay Auctions and First-Price Sealed-Bid Auctions with Budget Constraints, Economics Letters, 50, 373–380.
Chung, K.-C. and J. Ely (2001), Efficient and Dominant Solvable Auction with Interdependent Valuations, mimeo.
Clarke, E. (1971), Multipart Pricing of Public Goods, Public Choice, 11, 17–33.
Crémer, J. and R. McLean (1985), Optimal Selling Strategies Under Uncertainty for a Discriminating Monopolist When Demands Are Interdependent, Econometrica, 53, 345–362.
Dasgupta, P. and E. Maskin (2000), Efficient Auctions, Quarterly Journal of Economics, 115, 341–388.
Debreu, G. (1959), Theory of Value, New Haven, CT: Yale University Press.
Dewatripont, M. (1989), Renegotiation and Information Revelation Over Time: The Case of Optimal Labor Contracts, Quarterly Journal of Economics, 104, 589–619.
Eso, P. and E. Maskin (2000a), Multi-Good Efficient Auctions with Multidimensional Information, mimeo.
Eso, P. and E. Maskin (2000b), Notes on the English Auction, mimeo.
Fieseler, K., T. Kittsteiner, and B. Moldovanu (2000), Partnerships, Lemons, and Efficient Trade, mimeo.
Gresik, T. (1991), Ex Ante Incentive Efficient Trading Mechanisms without the Private Valuation Restriction, Journal of Economic Theory, 55, 41–63.
Groves, T. (1973), Incentives in Teams, Econometrica, 41, 617–631.
Holmstrom, B. and R. Myerson (1983), Efficient and Durable Decision Rules with Incomplete Information, Econometrica, 51, 1799–1819.
Izmalkov, S. (2001), English Auctions with Reentry, mimeo.
Jehiel, P. and B. Moldovanu (2001), Efficient Design with Interdependent Values, Econometrica, 69, 1237–1260.
Krishna, V. (2000), Asymmetric English Auctions, mimeo.
McLean, R. and A. Postlewaite (2001), Efficient Auction Mechanisms with Interdependent Signals, mimeo.
Maskin, E. (1992), Auctions and Privatization, in Privatization (ed. by H. Siebert), Tübingen: J. C. B. Mohr, 115–136.
Maskin, E. (2000), Auctions, Development and Privatization: Efficient Auctions with Liquidity-Constrained Buyers, European Economic Review, 44(4–6), 667–681.
Maskin, E. and J. Riley (1984), Optimal Auctions with Risk-Averse Buyers, Econometrica, 52, 1473–1518.
Milgrom, P. and R. Weber (1982), A Theory of Auctions and Competitive Bidding, Econometrica, 50, 1081–1122.
Palfrey, T. (1993), Implementation in Bayesian Equilibrium, in Advances in Economic Theory (ed. by J. J. Laffont), Cambridge, U.K.: Cambridge University Press.
Perry, M. and P. Reny (1999a), An Ex Post Efficient Auction, mimeo.
Perry, M. and P. Reny (1999b), An Ex Post Efficient Ascending Auction, mimeo.
Vickrey, W. (1961), Counterspeculation, Auctions, and Competitive Sealed Tenders, Journal of Finance, 16, 8–37.

CHAPTER 2

Why Every Economist Should Learn Some Auction Theory Paul Klemperer

Figure 2.1. Disclaimer: We don’t contend that the following ideas are all as important as the one illustrated, merely that those who haven’t imbibed auction theory are missing out on a potent brew!

This chapter discusses the strong connections between auction theory and “standard” economic theory; we show that situations that do not at first sight look like auctions can be recast to use auction-theoretic techniques; and we argue that auction-theoretic tools and intuitions can provide useful arguments and insights in a broad range of mainstream economic settings. We also discuss some more obvious applications, especially to industrial organization.


Klemperer

1. INTRODUCTION

Auction theory has attracted enormous attention in the last few years.¹ It has been increasingly applied in practice, and this has generated a new burst of theory. It has also been extensively used, both experimentally and empirically, as a testing ground for game theory.² Furthermore, by carefully analyzing very simple trading models, auction theory is developing the fundamental building blocks for our understanding of more complex environments. But some people still see auction theory as a rather specialized field, distinct from the main body of economic theory, and as an endeavor for management scientists and operations researchers rather than as a part of mainstream economics. This paper aims to counter that view.

This view may have arisen in part because auction theory was substantially developed by operational researchers, or in operations research journals,³ and using technical mathematical arguments rather than standard economic intuitions. But it need not have been this way. This paper argues that the connections between auction theory and “standard” economic theory run deeper than many people realize; that auction-theoretic tools provide useful arguments in a broad range of contexts; and that a good understanding of auction theory is valuable in developing intuitions and insights that can inform the analysis of many mainstream economic questions. In short, auction theory is central to economics.

We pursue this agenda in the context of some of the main themes of auction theory: the revenue equivalence theorem, marginal revenues, and ascending vs. (first-price) sealed-bid auctions. To show how auction-theoretic tools can be applied elsewhere in economics, Section 2 exploits the revenue equivalence theorem to analyze a wide range of applications that are not, at first sight, auctions, including litigation systems, financial crashes, queues, and wars of attrition.
To illustrate how looser analogies can usefully be made between auction theory and economics, Section 3 applies some intuitions from the comparison of ascending and sealed-bid auctions to other economic settings, such as rationing and e-commerce. To demonstrate the deeper connections between auction theory and economics, Section 4 discusses and applies the close parallel between the optimal auction problem and that of the discriminating monopolist; both are about maximizing marginal revenues.

Furthermore, auction-theoretic ways of thinking are also underutilized in more obvious areas of application, for instance, price-setting oligopolies we discuss in Section 5.⁴ Few non-auction theorists know, for example, that marginal-cost pricing is not always the only equilibrium when identical firms with constant marginal costs set prices, or know the interesting implications of this fact. Section 6 briefly discusses direct applications of auction theory to markets that are literally auction markets, including electricity markets, treasury auctions, spectrum auctions, and internet markets, and we conclude in Section 7.

¹ See Klemperer (1999a) for a review of auction theory; many of the most important contributions are collected in Klemperer (2000). See Figure 2.1.

² Kagel (1995) and Laffont (1997) are excellent recent surveys of the experimental and empirical work, respectively. Section 6 of this paper and Klemperer (2002a) discuss practical applications.

³ The earliest studies appear in the operations research literature, for example, Friedman (1956). Myerson’s (1981) breakthrough article appeared in Mathematics of Operations Research, and Rothkopf’s (1969) and Wilson’s (1967, 1969) classic early papers appeared in Management Science. Ortega’s (1968) pathbreaking models of auctions, including a model of signaling that significantly predated Spence (1972), remain relatively little known by economists, perhaps because they formed an operations research Ph.D. thesis.

2. USING AUCTION-THEORETIC TOOLS IN ECONOMICS: THE REVENUE EQUIVALENCE THEOREM

Auction theory’s most celebrated theorem, the Revenue Equivalence Theorem (RET), states conditions under which different auction forms yield the same expected revenue, and also allows revenue rankings of auctions to be developed when these conditions are violated.⁵ Our purpose here, however, is to apply it in contexts where the use of an auction model might not seem obvious.

Revenue Equivalence Theorem. Assume each of a given number of risk-neutral potential buyers has a privately known valuation independently drawn from a strictly increasing atomless distribution, and that no buyer wants more than one of the k identical indivisible prizes. Then, any mechanism in which (i) the prizes always go to the k buyers with the highest valuations and (ii) any bidder with the lowest feasible valuation expects zero surplus, yields the same expected revenue (and results in each bidder making the same expected payment as a function of her valuation).⁶

More general statements are possible, but are not needed for the current purpose. Our first example is very close to a pure auction.

2.1. Comparing Litigation Systems

In 1991, U.S. Vice President Dan Quayle suggested reforming the U.S. legal system in the hope, in particular, of reducing legal expenditures. One of his

4 Of course, standard auction models form the basic building blocks of models in many contexts. See, for example, Stevens’ (1994, 2000) models of wage determination in oligopsonistic labor markets, Bernheim and Whinston (1986), Feddersen and Pesendorfer (1996, 1998), Persico (2000) and many others’ political economy models, and many models in finance (including, of course, takeover battles, to which we give an application in Section 4). Another major area we do not develop here is the application of auction theorists’ understanding of the winner’s curse to adverse selection more generally.

5 For example, Klemperer’s (1999a) survey develops a series of revenue rankings starting from the RET.

6 See Klemperer (1999a, Appendix A) for more general statements and an elementary proof. The theorem was first derived in an elementary form by Vickrey (1961, 1962) and subsequently extended to greater generality by Myerson (1981), Riley and Samuelson (1981), and others.


Klemperer

proposals was to augment the current rule according to which parties pay their own legal expenses, by a rule requiring the losing party to pay the winner an amount equal to the loser’s own expenses. Quayle’s intuition was that if spending an extra $1 on a lawsuit might end up costing you $2, then less would be spent. Was he correct?7

A simple starting point is to assume each party has a privately known value of winning the lawsuit relative to losing, independently drawn from a common, strictly increasing, atomless distribution;8 that the parties independently and simultaneously choose how much money to spend on legal expenses; and that the party who spends the most money wins the “prize” (the lawsuit).9

It is not too hard to see that both the existing U.S. system and the Quayle system satisfy the assumptions of the RET, so the two systems result in the same expected total payments on lawyers.10 Thus Quayle was wrong (as usual); his argument is precisely offset by the fact that the value of winning the lawsuit is greater when you win your opponent’s expenses.11

Ah, Quayle might say, but this calculation has taken as given the set of lawsuits that are contested. Introducing the Quayle scheme will change the “bidding functions,” that is, change the amount any given party spends on litigation, and also change who decides to bring suits. Wrong again, Dan! Although it is correct that the bidding functions change, the RET also tells us (in its parenthetical remark) that any given party’s expected payoffs from the lawsuit are unchanged, so the incentives to bring lawsuits are unchanged.

What about other systems, such as the typical European system in which the loser pays a fraction of the winner’s expenses? This is a trick question: It is no longer true that a party with the lowest possible valuation can spend nothing and lose nothing. In this case, this party always loses in equilibrium and

7 This question was raised and analyzed (although not by invoking the RET) by Baye, Kovenock, and de Vries (1997). The ideas in this section, except for the method of analysis, are drawn from them. See also Baye, Kovenock, and de Vries (1998).

8 For example, a suit about which party has the right to a patent might fit this model. The results extend easily to common-value settings, e.g., contexts in which the issue is the amount of damages that should be transferred from one party to another.

9 American seminar audiences typically think this is a natural assumption, but non-Americans often regard it as unduly jaundiced. Of course, we use it as a benchmark only, to develop insight and intuition (just as the lowest price does not win the whole market in most real “Bertrand” markets, but making the extreme assumption is a common and useful starting point). Extensions are possible to cases in which with probability (1 − λ) the “most deserving” party wins, but with probability λ > 0, the biggest spender wins.

10 The fact that no single “auctioneer” collects the players’ payments as revenues, but that they are instead dissipated in legal expenses in competing for the single available prize (victory in the lawsuit), is of course irrelevant to the result. Formally, checking our claims requires confirming that there are equilibria of the games that satisfy the RET’s assumptions. The assumption we made that the parties make a one-shot choice of legal expenses is not necessary, but makes confirming this relatively easy. See Baye, Kovenock, and de Vries (1997) for explicit solutions.

11 Some readers might argue they could have inferred the effectiveness of the proposal from the name of the proponent, without need of further analysis. In fact, however, this was one of Dan Quayle’s policy interventions that was not subject to immediate popular derision.
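As a rough numerical check of the comparison between the American and Quayle rules, the following Python sketch works under assumptions that are not in the text: two parties, with values of winning drawn independently from a uniform distribution on [0, 1]. All function names are hypothetical. `spend_american` is the standard two-bidder all-pay equilibrium bid for this distribution; `spend_quayle` is a candidate equilibrium derived here by requiring each type's expected net payment to equal v²/2, as the RET implies. It is not taken from Baye, Kovenock, and de Vries (1997), who give the explicit solutions.

```python
def spend_american(v):
    # Each side pays only its own costs: a two-player all-pay auction.
    # Assumed equilibrium spending for values uniform on [0, 1].
    return v * v / 2.0

def spend_quayle(v):
    # The loser also pays the winner an amount equal to the loser's own costs.
    # Candidate equilibrium derived from the RET payment identity (assumption).
    return v * v * (3.0 - v) / (3.0 * (2.0 - v) ** 2)

def mean_on_unit_interval(f, steps=100_000):
    # Midpoint rule for the expectation of f(v) with v uniform on [0, 1].
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

def expected_net_payment_quayle(v, steps=20_000):
    # A winner (probability v) pays her own costs minus the loser's transfer;
    # a loser (probability 1 - v) pays twice her own costs.
    h = v / steps
    rival_spend_below_v = sum(spend_quayle((i + 0.5) * h) for i in range(steps)) * h
    b = spend_quayle(v)
    return v * b - rival_spend_below_v + 2.0 * (1.0 - v) * b

# Transfers between the parties net out, so total legal bills are the sum
# of the two parties' own spending.
total_american = 2 * mean_on_unit_interval(spend_american)
total_quayle = 2 * mean_on_unit_interval(spend_quayle)
print(total_american, total_quayle)
```

Under these assumptions both totals come out at about 1/3, even though the spending functions differ: high-value parties spend more under the Quayle rule and low-value parties spend less, which is the sense in which the bidding functions change while expected payments do not.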


must pay a fraction of the winner’s expenses, and so makes negative expected surplus. Thus, condition (ii) of the RET now fails. Thinking through the logic of the proof of the RET makes clear that all the players are worse off than under the previous systems.12 Thus, legal bills are higher under the European rule. The reason is that the incentives to win are greater than in the U.S. system, and there is no offsetting effect. Here, of course, the issue of who brings lawsuits is important because low-valuation parties would do better not to contest suits in this kind of system; consistent with our theory, there is empirical evidence (e.g., Hughes and Snyder, 1995) that the American system leads to more trials than, for example, the British system.

This last extension demonstrates that even where the RET in its simplest form fails, it is often possible to see how the result is modified; Appendix 1 shows how to use the RET to solve for the relative merits of a much broader class of systems in which those we have discussed are special cases. We also show there that a system that might be thought of as the exact opposite of Quayle’s system is optimal in this model. Of course, many factors are ignored (e.g., asymmetries); the basic model should be regarded as no more than a starting point for analysis.

2.2.

The War of Attrition

Consider a war of attrition in which N players compete for a prize. For example, N firms compete to be the unique survivor in a natural monopoly market, or N firms each hold out for the industry to adopt the standard they prefer.13 Each player pays costs of 1 per unit time until she quits the game. When just one player remains, that player also stops paying costs and wins the prize. There is no discounting. The two-player case, where just one quit is needed to end the game, has been well analyzed.14 Does the many-player case yield anything of additional interest?

Assume players’ values of winning are independently drawn from a common, strictly increasing, atomless distribution, and the game has an equilibrium satisfying the other conditions of the RET. Then the RET tells us that, in expectation,

12 As Appendix 1 discusses, every type’s surplus is determined by reference to the lowest valuation type’s surplus [see, also, Klemperer (1999a, Appendix A)], and the lowest type is worse off in the European system. Again, our argument depends on condition (i) of the RET applying. See Appendix 1 and Baye et al. (1997).

13 Another related example analyzed by Bulow and Klemperer (1999) is that of N politicians, each delaying in the hope of being able to avoid publicly supporting a necessary but unpopular policy that requires the support of N − 1 to be adopted.

14 See, for example, Maynard Smith (1974) and Riley (1980) who discuss biological competition, Fudenberg and Tirole (1986) who discuss industrial competition, Abreu and Gul (2000), Kambe (1999), and others who analyze bargaining, and Bliss and Nalebuff (1984) who give a variety of amusing examples. Bliss and Nalebuff note that extending to K + 1 players competing for K prizes does not change the analysis in any important way, because it remains true that just one quit is needed to end the game.


the total resources spent by the players in the war of attrition equal those paid by the players in any other mechanism satisfying the RET’s conditions – e.g., a standard ascending auction in which the price rises continuously until just one player remains and (only) the winner pays the final price. This final price will equal the second-highest actual valuation, so the expected total resources dissipated in the war of attrition are the expectation of this quantity.

Now imagine the war of attrition has been under way long enough that just the two highest-valuation players remain. What are the expected resources that will be dissipated by the remaining two players, starting from this time on? The RET tells us that they equal the auctioneer’s expected revenue if the war of attrition was halted at this point and the objects sold to the remaining players by an ascending auction, that is, the expected second-highest valuation of these two remaining players. This is the same quantity, on average, as before!15 Thus the expected resources dissipated, and hence the total time taken until just two players remain must be zero; all but the two highest-valuation players must have quit at once.

Of course, this conclusion is, strictly speaking, impossible; the lowest-valuation players cannot identify who they are in zero time. However, the conclusion is correct in spirit, in that it is the limit point of the unique symmetric equilibria of a sequence of games that approaches this game arbitrarily closely (and there is no symmetric equilibrium of the limit game).16 Here, therefore, the role of the RET is less to perform the ultimate analysis than it is to show that there is an interesting and simple result to be obtained.17 Of course by developing intuition about what the result must be, the RET also makes proving it much

15 Of course, the expectation of the second-highest valuation of the last two players is computed when just these two players remain, rather than at the beginning of the war of attrition as before. But, on average, these two expectations must be the same.

16 Bulow and Klemperer (1999) analyze games in which each player pays costs at rate 1 before quitting, but must continue to pay costs even after quitting at rate c per unit time until the whole game ends. The limit c → 0 corresponds to the war of attrition discussed here. (The case c = 1 corresponds, for example, to “standards battles” or political negotiations in which all players bear costs equally until all have agreed on the same standard or outcome; this game also has interesting properties; see Bulow and Klemperer.) Other series of games, for example games in which being kth to last to quit earns a prize of ε^(k−1) times one’s valuation, with ε → 0, or games in which players can quit only at the discrete times 0, ε, 2ε, . . . , with ε → 0, also yield the same outcome in the limit.

17 It was the RET that showed Bulow and Klemperer that there was an analysis worth doing. Many people, and some literature, had assumed the many-player case would look like the two-player case, but with more complicated expressions, although Fudenberg and Kreps (1987) and Haigh and Cannings (1989) observed a similar result to ours in games without any private information and in which all players’ values are equal. However, an alternative way to see the result in our war of attrition is to imagine the converse, but that a player is within ε of her planned quit time when n > 1 other players remain. Then, the player’s cost of waiting as planned is of the order ε, but her benefit is of the order ε^n, because only when all n other players are within ε of giving up will she ultimately win. So, for small ε, she will prefer to quit now rather than wait; but, in this case, she should, of course, have quit ε earlier, and so on. So, only when n = 1 is delay possible.


easier. Furthermore, the RET was also useful in the actual analysis of the more complex games that Bulow and Klemperer (1999) used to approximate this game. In addition, anyone armed with a knowledge of the RET can simplify the analysis of the basic two-player war of attrition.
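The last remark can be illustrated with a small simulation of the two-player war of attrition. The sketch below assumes values drawn independently from a uniform distribution on [0, 1] (an assumption, not part of the text) and uses the standard symmetric-equilibrium quit time T(v) = ∫₀ᵛ x f(x)/(1 − F(x)) dx, which for the uniform case is −v − ln(1 − v). Function names are hypothetical. It compares the resources dissipated in the war of attrition with the expected second-highest valuation, which is the revenue of an ascending auction and hence what the RET says the two should share.

```python
import math
import random

def quit_time(v):
    # Assumed symmetric-equilibrium quit time for values uniform on [0, 1]:
    # T(v) = integral from 0 to v of x / (1 - x) dx = -v - ln(1 - v).
    return -v - math.log(1.0 - v)

def simulate(n_draws=200_000, seed=0):
    rng = random.Random(seed)
    resources = 0.0
    second_highest = 0.0
    for _ in range(n_draws):
        v1, v2 = rng.random(), rng.random()
        low = min(v1, v2)
        # Both players pay cost 1 per unit time until the lower type quits.
        resources += 2.0 * quit_time(low)
        # With two players, the second-highest valuation is the lower one.
        second_highest += low
    return resources / n_draws, second_highest / n_draws

dissipated, ascending_revenue = simulate()
print(dissipated, ascending_revenue)
```

Both averages should be close to 1/3 under these assumptions, so the RET delivers the expected total cost of the war of attrition without solving for the quit-time function at all.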

2.3.

Queueing and Other “All-Pay” Applications

The preceding applications have both been variants of “all-pay” auctions. As another elementary example of this kind, consider different queueing systems (e.g., for tickets to a sporting event). Under not unreasonable assumptions, a variety of different rules of queue management (e.g., making the queue more or less comfortable, informing or not informing people whether the number queueing exceeds the number who will receive a ticket, etc.) will make no difference to the social cost of the queueing mechanism. As in our litigation example (Section 2.1), we think of these results as a starting point for analysis rather than as ﬁnal conclusions.18 Many other issues – such as lobbying battles, political campaigns,19 tournaments in ﬁrms, contributions to public goods,20 patent races, and some kinds of price-setting oligopoly (see Section 5.2) – can be modeled as all-pay auctions and may provide similar applications.
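To make the equivalence between all-pay and standard auctions concrete, here is a minimal Monte Carlo sketch. It assumes n bidders with valuations drawn independently from a uniform distribution on [0, 1] and uses the textbook equilibrium bids for that case: (n − 1)v/n in a first-price auction and (n − 1)vⁿ/n in an all-pay auction. The distribution, the number of bidders, and the function name are illustrative assumptions rather than anything specified in the text.

```python
import random

def simulate_revenues(n_bidders=4, n_auctions=200_000, seed=1):
    rng = random.Random(seed)
    n = n_bidders
    totals = {"ascending": 0.0, "first_price": 0.0, "all_pay": 0.0}
    for _ in range(n_auctions):
        values = sorted(rng.random() for _ in range(n))
        # Ascending auction: the winner pays the second-highest valuation.
        totals["ascending"] += values[-2]
        # First-price auction: the winner pays her own bid (n - 1) / n * v.
        totals["first_price"] += (n - 1) / n * values[-1]
        # All-pay auction: every bidder pays her bid (n - 1) / n * v ** n.
        totals["all_pay"] += sum((n - 1) / n * v ** n for v in values)
    return {name: total / n_auctions for name, total in totals.items()}

revenues = simulate_revenues()
print(revenues)
```

With four bidders, each average should be close to (n − 1)/(n + 1) = 0.6. In the all-pay case the "revenue" is simply the total amount the players spend, which is the social cost in the queueing and lobbying interpretations.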

2.4.

Solving for Equilibrium Behavior: Market Crashes and Trading “Frenzies”

The examples thus far have all proceeded by computing the expected total payments made by all players. But, the RET also states that each individual’s expected payment must be equal across mechanisms satisfying the assumptions. This fact can be used to infer what players’ equilibrium actions must be in games that would be too complex to solve by any direct method of computing optimal behavior.21

Consider the following model. The aim is to represent, for example, a financial or housing market and show that trading “frenzies” and price “crashes”

18 Holt and Sherman (1982) compute equilibrium behavior and hence obtain these results without using the RET.

19 See, especially, Persico (2000).

20 Menezes, Monteiro, and Temimi (2000) use the RET in this context.

21 The same approach is also an economical method of computing equilibrium bids in many standard auctions. For example, in an ascending auction for a single unit, the expected payment of a bidder equals her probability of winning times the expected second-highest valuation among all the bidders conditional on her value being higher. So, the RET implies that her equilibrium bid in a standard all-pay auction equals this quantity. Similarly, the RET implies that her equilibrium bid in a first-price, sealed-bid auction equals the expected second-highest valuation among all the bidders, conditional on her value being higher. See Klemperer (1999a, Appendix A) for more details and discussion.


are the inevitable outcome of rational strategic behavior in a market that clears through a sequence of sales rather than through a Walrasian auctioneer.

There are N potential buyers, each of whom is interested in securing one of K available units. Without fully modeling the selling side of the market, we assume it generates a single asking price at each instant of time according to some given function of buyer behavior to date. Each potential buyer observes all prices and all past offers to trade, and can accept the current asking price at any instant, in which case, supply permitting, the buyer trades at that price. Thus traders have to decide both whether and when to offer to buy, all the while conditioning their strategies on the information that has been revealed in the market to date.

Regarding the function generating the asking prices, we specify only that (i) if there is no demand at a price, then the next asking price is lower, and (ii) if demand exceeds remaining supply at any instant, then no trade actually takes place at that time but the next asking price is higher and only those who attempted to trade are allowed to buy subsequently.22 Note, however, that even if we did restrict attention to a specific price-setting process, the direct approach of computing buyers’ optimal behavior using first-order conditions as a function of all prior behavior to solve a dynamic program would generally be completely intractable.

To use the RET, we must first ensure that the appropriate assumptions are satisfied. We assume, of course, that buyers’ valuations are independently drawn from a common, strictly increasing, atomless distribution, and that there is no discounting during the time the mechanism takes. Furthermore, the objects do eventually go to the highest-valuation buyers, and the lowest-possible-valuation buyer makes zero surplus in equilibrium, because of our assumption that if demand ever exceeds remaining supply, then no trade takes place and nondemanders are henceforth excluded. So, the RET applies, and it also applies to any subgame of the whole game.23

Under our assumptions, then, starting from any point of the process, the remainder of the game is revenue equivalent to what would result if the game were halted at that point and the remaining k objects were sold to the remaining buyers using a standard ascending auction [which sells all k objects at the (k + 1)st-highest valuation among the remaining buyers]. At any point of our game, therefore, we know the expected payment of any buyer in the remainder of our game, and therefore also the buyer’s expected payment conditional on

22 Additional technical assumptions are required to ensure that all units are sold in finite time. See Bulow and Klemperer (1994) for full details.

23 If, instead, excess demand resulted in random rationing, the highest-valuation buyers might not win, violating the requirements of the RET; so, even if we thought this was more natural, it would make sense to begin with our assumption to be able to analyze and understand the process using the RET. The effects of the alternative assumption could then be analyzed with the benefit of the intuitions developed using the RET. Bulow and Klemperer (1994) proceed in exactly this way.


winning.24 But any potential buyer whose expected payment conditional on winning equals or exceeds the current asking price will attempt to buy at the current price.25 This allows us to completely characterize buyer behavior, so fully characterizes the price path for any given rule generating the asking prices. It is now straightforward to show (see Bulow and Klemperer, 1994) that potential buyers are extremely sensitive to the new information that the price process reveals. It follows that almost any seller behavior – e.g., starting at a very high price and slowly lowering the price continuously until all the units are sold or there is excess demand – will result in “frenzies” of trading activity in which many buyers bid simultaneously, even though there is zero probability that two buyers have the same valuation.26 Furthermore, these frenzies will sometimes lead to “crashes” in which it becomes common knowledge that the market price must fall a substantial distance before any further trade will take place.27 Bulow and Klemperer also show that natural extensions to the model (e.g., “common values,” the possibility of resale, or an elastic supply of units) tend to accentuate frenzies and crashes. Frenzies and crashes arise precisely because buyers are rational and strategic; by contrast, buyer irrationality might lead to “smoother” market behavior. Of course, our main point here is not the details of the process, but rather that the RET permits the solution and analysis of the dynamic price path of a market that would otherwise seem completely intractable to solve for.

24 Specifically, if k objects remain, the buyer’s expected payment conditional on winning will be the expected (k + 1)st-highest valuation remaining conditional on the buyer having a valuation among the k-highest remaining, and conditional on all the information revealed to date. This is exactly the buyer’s expected payment conditional on winning an object in the ascending auction, because in both cases only winners pay and the probability of a bidder winning is the same.

25 The marginal potential buyer, who is just indifferent about bidding now, either will win now or will never win an object. (If bidding now results in excess demand, this bidder will lose to inframarginal current bidders, because there is probability zero that two bidders have the same valuation.) So, conditional on winning, this bidder’s actual payment is the current price. Inframarginal bidders, whose expected payment conditional on winning exceeds the current price, may eventually end up winning an object at above the current price.

26 To see why a frenzy must arise if the price is lowered continuously, note that, for it to be rational for any potential buyer to jump in and bid first, there must be positive probability that there will be a frenzy large enough to create excess demand immediately after the first bid. Otherwise, the strategy of waiting to bid until another player has bid first would guarantee a lower price. For more general seller behavior, the point is that while buyers’ valuations may be very dispersed, higher-valuation buyers are all almost certainly inframarginal in terms of whether to buy and are therefore all solving virtually identical optimization problems of when to buy. So, a small change in asking price, or a small change in market conditions (such as the information revealed by a single trade) at a given price, can make a large number of buyers change from being unwilling to trade to wanting to trade. The only selling process that can surely avoid a frenzy is a repeated Dutch auction.

27 The price process is also extremely sensitive to buyer valuations; an arbitrarily small change in one buyer’s value can discontinuously and substantially change all subsequent trading prices.


3. TRANSLATING LOOSER ANALOGIES FROM AUCTIONS INTO ECONOMICS: ASCENDING VS. (FIRST-PRICE) SEALED-BID AUCTIONS

A major focus of auction theory has been contrasting the revenue and efficiency properties of “ascending” and “sealed-bid” auctions.28 Ideas and intuitions developed in these comparisons have wide applicability.

3.1.

Internet Sales vs. Dealer Sales

There is massive interest in the implications of e-commerce and internet sales. For example, the advent of internet sales in the automobile industry as a partial replacement for traditional methods of selling through dealers has been widely welcomed in Europe;29 the organization of the European automobile market is currently a major policy concern in both official circles and the popular press, and the internet sales are seen as increasing “transparency.” But is transparency a good thing?

Auction theory shows that internet sales need not be good for consumers. Clearly, transparent prices benefit consumers if they reduce consumers’ search costs so that, in effect, there are more competitors for every consumer,30 and internet sales may also lower prices by cutting out the fixed costs of dealerships, albeit by also cutting out the additional services that dealers provide. But, transparency also makes internet sales more like ascending auctions, by contrast with dealer sales that are more like (first-price) sealed-bid auctions, and we will show this is probably bad for consumers.

Transparent internet prices are readily observable by a firm’s competitors and therefore result, in effect, in an “ascending” auction; a firm knows if and when its offers are being beaten and can rapidly respond to its competitors’ offers if it wishes. Viewing each car sale as a separate auction, the price any consumer faces falls until all but one firm quits bidding to sell to him. (The price is, of course, descending because firms are competing to sell, but the process corresponds exactly to the standard ascending auction among bidders competing to buy an object, and we therefore maintain the standard “ascending” terminology.)

On the other hand, shopping to buy a car from one of the competing dealers is very much like procuring in a (first-price) “sealed-bid” auction. It is typically impossible to credibly communicate one dealer’s offer to another. (Car dealers

28 By “sealed-bid,” we mean standard, first-price, sealed-bid auctions. “Ascending” auctions have similar properties to second-price, sealed-bid auctions. See Klemperer (1999a) for an introduction to the different types of auctions.

29 See, for example, “May the Net Be with You,” Financial Times, October 21, 1999, p. 22. In the UK, Vauxhall began selling a limited number of special models over the Internet late in 1999, while Ford began a pilot project in Finland.

30 There may be both a direct effect (that consumers can observe more firms) and an indirect effect (that new entry is facilitated). See Baye and Morgan (2001) and Kühn and Vives (1994) for more discussion.


often deliberately make this hard by refusing to put an offer in writing.) From the buyer’s perspective, it is as if sellers were independently making sealed-bid offers in ignorance of the competition. Of course, the analogies are imperfect,31 but they serve as a starting point for analysis.

What, therefore, does auction theory suggest? Because, under the conditions of the revenue equivalence theorem, there is no difference between the auction forms for either consumer or producer welfare, we consider the implications of the most important violations of the conditions.

First, market demand is downward sloping, not inelastic.32 Hansen (1988) showed that this means consumers always prefer the sealed-bid setting, and firms may prefer it also; the sum of producer and consumer surpluses is always higher in a sealed-bid auction.33 The intuition is that, in an “ascending” auction, the sales price equals the runner-up’s cost, and is therefore less reflective of the winner’s cost than is the sealed-bid price. So, the sealed-bid auction is more productively efficient (the quantity traded better reflects the winner’s cost) and provides greater incentive for aggressive bidding (a more aggressive sealed bid not only increases the probability of winning, but also increases the quantity traded contingent on winning).

Second, we need to consider the possibilities for collusion, implicit or explicit. The general conclusion is that ascending auctions are more susceptible to collusion, and this is particularly the case when, as in our example, many auctions of different car models and different consumers are taking place simultaneously.34 As has been observed in the United States and German auctions of radio spectrum, for example, bidders may be able to tacitly coordinate on dividing up the spoils in a simultaneous ascending auction. Bidders can use the early rounds when prices are still low35 to signal their views about who should win which objects, and then, when consensus has been reached, tacitly agree

31 The analogies are less good for many other products. For lower-value products than cars, internet sales are less like an “ascending” auction because search costs will allow price dispersion, while traditional sales through posted prices in high-street stores are more like “ascending” auctions than are dealer sales of cars. Note also that the outcomes of the two auction types differ most when competitors have private information about their costs, which is more likely when competitors are original manufacturers than when competitors are retailers selling goods bought at identical prices from the same wholesaler.

32 For an individual consumer, demand might be inelastic for a single car up to a reservation price. From the point of view of the sellers who do not know the consumer’s reservation price, the expected market demand is downward sloping.

33 Of course, Hansen is maintaining the other important assumptions of the revenue equivalence theorem.

34 See Robinson (1985) and Milgrom (1987) for discussion of the single-unit case. See Ausubel and Schwartz (1999), Brusco and Lopomo (1999), Cramton and Schwartz (2000), Engelbrecht-Wiggans and Kahn (1998), Menezes (1996), and Weber (1997), for the multi-unit case. Klemperer (2002a) reviews these arguments and gives many examples.

35 Bidders are competing to buy rather than sell spectrum, so prices are ascending rather than descending.


to stop pushing prices up.36 The same coordination cannot readily be achieved in simultaneous sealed-bid auctions, in which there is neither the opportunity to signal, nor the ability to retaliate against a bidder who fails to cooperate.37 The conclusion is less stark when there are many repetitions over time, but it probably remains true that coordination is easier in ascending auctions. Furthermore, as is already well understood in the industrial organization literature,38 this conclusion is strengthened by the different observabilities of internet and dealer sale prices that make mutual understanding of firms’ strategies, including defections from “agreements,” far greater in the internet case. Thus selling over the internet probably makes it easier for firms to collude.

A third important issue is that bidders may be asymmetric. Then “ascending” auctions are generally more efficient (because the lowest-cost bidders win39), but sealed-bid auctions typically yield lower consumer prices.40 In this case economists generally favor ascending auctions, but competition-policy practitioners should usually prefer sealed-bid auctions because most competition regimes concentrate on consumer welfare.

Furthermore, this analysis ignores the impact of auction type on new entry in the presence of asymmetries. Because an “ascending” auction is generally efficient, a potential competitor with even a slightly higher cost (or lower quality) than an incumbent will see no point in entering the auction. However, the same competitor might enter a sealed-bid auction, which gives a weaker bidder a shot

36 For example, in a 1999 German spectrum auction, Mannesman bid a low price for half the licenses and a slightly lower price for the other half. Here is what one of T-Mobil’s managers said: “There were no agreements with Mannesman. But [T-Mobil] interpreted Mannesman’s first bid as an offer.” T-Mobil understood that it could raise the bid on the other half of the licenses slightly, and that the two companies would then “live and let live,” with neither company challenging the other on “their” half. Just that happened. The auction closed after just two rounds, with each of the bidders having half the licenses for the same low price. See Jehiel and Moldovanu (2000) and Grimm et al. (2001). In U.S. FCC auctions, bidders have used the final three digits of multimillion dollar bids to signal the market id codes of the areas they coveted, and a 1997 auction that was expected to raise $1,800 million raised less than $14 million. See Cramton and Schwartz (2001), and “Learning to Play the Game,” The Economist, May 17, 1997, p. 120. Klemperer (2002a) gives many more examples.

37 The low prices in the ascending auction are supported by the threat that, if a bidder overbids a competitor anywhere, then the competitor will retaliate by overbidding the first bidder on markets where the first bidder has the high bids.

38 At least since Stigler (1964).

39 To the extent that the auctions for individual consumers are independent single-unit auctions, an ascending auction is efficient under a broad class of assumptions if bidders’ private signals are single-dimensional, even with asymmetries among bidders and common-value components to valuations. See Maskin (1992).

40 A price-minimizing auction allocates the object to the bidder with the lowest “virtual cost,” rather than to the one with the lowest actual cost. (See Section 4; virtual cost is the analogous concept to marginal revenue for an auction to buy an object.) Compared with an ascending auction, a sealed-bid auction discriminates in favor of selling to “weaker” bidders, whose costs are drawn from higher distributions, because they bid more aggressively (closer to their actual costs) than stronger ones. But, for a given cost, a weaker bidder has a lower virtual cost than a stronger one. So, the sealed-bid auction often, but not always, yields lower prices. See Section 7.1 of Klemperer (1999a).


at winning. The extra competition may lower prices substantially. Of course, the entry of the weaker competitor may also slightly reduce efficiency, but if competition is desirable per se, or if competition itself improves efficiency, or if the objective is consumer welfare rather than efficiency, then the case for sealed-bid auctions is very strong (see next subsection and Klemperer, 2002a).

Although there are other dimensions in which our setting fails the revenue equivalence assumptions, they seem less important.41 It follows that the transparency induced between firms that makes internet sales more like ascending auctions than sealed-bid auctions is probably bad for consumers. Although gains from lower consumer search costs and dealer costs could certainly reverse this conclusion, auction-theoretic considerations mount a strong case against “transparent” internet sales.42

In another application of auction-theoretic insights to e-commerce, Bulow and Klemperer (2002b) apply Milgrom and Weber’s (1982) celebrated linkage principle to show when the price discrimination that internet markets make possible helps consumers.

3.2. Anglo-Dutch Auctions, a Theory of Rationing, and Patent Races

The last disadvantage of ascending auctions discussed earlier – the dampening effect on entry – has been very important in practical auction contexts (see Klemperer 2002a). For example, in the main (1995) auction of U.S. mobile-phone licenses, some large potential bidders such as MCI, the U.S.’s third-largest phone company, failed to enter at all, and many other bidders were deterred from competing seriously for particular licenses such as the Los Angeles and New York licenses, which therefore sold at very low prices.43 Entry was therefore a prominent concern when the UK planned an auction of four UMTS “third-generation” mobile-phone licenses in 1998 for a market in which four companies operated mobile telephone services and therefore had clear advantages over any new entrant.44 In this case, the design chosen was an “Anglo-Dutch” auction as first proposed in Klemperer (1998):45 in an Anglo-Dutch auction for four licenses, the

41 Other violations of the revenue equivalence assumptions may include buyer and seller risk aversion that both favor sealed-bid auctions and affiliation of costs that favors ascending auctions.

42 Empirical evidence is limited. Lee (1998) and Lee et al. (1999) find electronic markets yield higher prices than conventional markets for cars. Scott Morton et al. (2001) find that California customers get lower prices if they use automobile internet sites, but this is unsurprising because these sites merely refer customers to dealers for price quotes, so behave more like traditional dealers than like the “transparent” sites that we have described and that are being promised in Europe.

43 See Klemperer and Pagnozzi (2002) for econometric evidence of these kinds of problems in U.S. spectrum auctions, Bulow and Klemperer (2000) and Klemperer (1998) for extensive discussion, and Bulow, Huang, and Klemperer (1999) for related modeling.

44 Bidders could not be allowed to win more than one license each.

45 See Klemperer (1998, 2002a) and Radiocommunications Agency (1998a, 1998b) for more details and for variants on the basic design. (The Agency was advised by Binmore, Klemperer, and others.)


price rises continuously until ﬁve bidders remain (the “English” stage), after which the ﬁve survivors make sealed bids (required to be no lower than the current price level) and the four winners pay the fourth-highest bid (the “Dutch” stage). Weak bidders have an incentive to enter such an auction because they know they have a chance of winning at the sealed-bid stage if they can survive to be among the ﬁve ﬁnalists. The design accepts some risk of an ex post inefﬁcient allocation to increase the chance of attracting the additional bidders that are necessary for a successful auction and reasonable revenues.46,47 Translating this idea into a more traditional economics context suggests a theory of why ﬁrms might ration their output at prices in which there is excess demand as, for example, microprocessor manufacturers routinely do after the introduction of a new chip. Raising the price to clear the market would correspond to running an ascending auction. It would be ex post efﬁcient and ex post proﬁt maximizing, but would give poor incentives for weaker potential customers who fear being priced out of the market to make the investments necessary to enter the market (such as the product design necessary to use the new chip). Committing to rationing at a ﬁxed price at which demand exceeds supply is ex post inefﬁcient,48 but may encourage more entry into the market and so improve ex ante proﬁts. Details and more examples are in Gilbert and Klemperer (2000). A similar point is that a weaker ﬁrm may not be willing to enter a patent race in which all parties can observe others’ progress. Such a race is akin to an ascending auction in which a stronger rival can always observe and overtake a weaker ﬁrm, which therefore has no chance of winning.49 A race in which rivals’ progress cannot be monitored is more akin to a sealed-bid auction and may attract more entry.
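The mechanics of the Anglo-Dutch design described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the rules actually drafted for the UK auction: the function name `anglo_dutch`, the dictionaries of exit prices and sealed bids, and all the numbers are hypothetical; ties are ignored; and bidders' behavior is taken as given rather than derived from equilibrium strategies.

```python
def anglo_dutch(exit_prices, sealed_bids, licenses=4):
    """Hypothetical sketch of an Anglo-Dutch auction for identical licenses.

    exit_prices: bidder -> price at which the bidder would quit the ascending stage
    sealed_bids: bidder -> bid the bidder would submit if it reaches the final stage
    Ties are ignored for simplicity.
    """
    ranked = sorted(exit_prices, key=exit_prices.get, reverse=True)
    finalists = ranked[:licenses + 1]
    # The price stops rising once only licenses + 1 bidders remain, i.e. at the
    # exit price of the last bidder to drop out (0 if nobody drops out).
    floor = exit_prices[ranked[licenses + 1]] if len(ranked) > licenses + 1 else 0

    # Sealed stage: bids are required to be no lower than the price reached so far.
    final_bids = {b: max(sealed_bids[b], floor) for b in finalists}
    by_bid = sorted(finalists, key=final_bids.get, reverse=True)
    winners = by_bid[:licenses]
    # Every winner pays the lowest winning bid (the fourth-highest bid when licenses = 4).
    price = final_bids[winners[-1]]
    return set(winners), price, floor


# Illustrative numbers only: D is stronger than E but shades its sealed bid more.
exit_prices = {"A": 100, "B": 90, "C": 80, "D": 70, "E": 60, "F": 50, "G": 40}
sealed_bids = {"A": 80, "B": 62, "C": 66, "D": 56, "E": 58}
winners, price, floor = anglo_dutch(exit_prices, sealed_bids)
print(sorted(winners), price, floor)
```

In this made-up example the weaker bidder E wins a license while the stronger bidder D does not, which is the kind of ex post inefficiency the design accepts in exchange for giving weak bidders a reason to enter.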

46 The additional bidders might yield a higher price even after the English stage, let alone after the final stage, than in a pure ascending auction.

47 The design performed very successfully in laboratory testing, but the auction was delayed until 2000, and technological advances made it possible to offer five licenses, albeit of different sizes. The additional license resolved the problem of attracting new entrants, and because collusion was not a serious problem in this case (bidders were not allowed to win more than one license each), it was decided to switch to a simultaneous ascending design. The actual UK auction was very successful, but the wisdom of the UK decision not to run an ascending auction when the number of strong bidders equaled the number of licences was confirmed when the Netherlands did just this three months later, and raised little more than one-quarter of the per capita revenue raised by the UK. In large part, the Netherlands’ problem was that their ascending auction deterred entry. Denmark also had the same number of strong bidders as licences, and (successfully) used a sealed-bid auction for reasons similar to those that would have led the UK to run an Anglo-Dutch auction in this context. (In Denmark it was clear that there were too few potential bidders to make an Anglo stage worthwhile.) See Klemperer (2002a, 2002b, 2002c) for more detail.

48 We assume any resale is inefficient. But see Cramton, Gibbons, and Klemperer (1987).

49 Of course, this point is closely related to the idea of “ε-preemption” in R&D races with observability that has already been well discussed in the standard industrial organization literature (Fudenberg et al. 1983).


These analogies illustrate how an insight that is routine in auction theory may help develop ideas in economics more broadly.

4. EXPLOITING DEEPER CONNECTIONS BETWEEN AUCTIONS AND ECONOMICS: MARGINAL REVENUES

The previous sections showed how a variety of economic problems can be thought of in auction-theoretic terms, allowing us to use tools such as the revenue equivalence theorem and intuitions such as those from the comparison of ascending and sealed-bid auctions. This section explains that the connections between auction theory and standard economic theory run much deeper.

Much of the analysis of optimal auctions can be phrased, like the analysis of monopoly, in terms of “marginal revenues.” Imagine a firm whose demand curve is constructed from an arbitrarily large number of bidders whose values are independently drawn from a bidder’s value distribution. When bidders have independent private values, a bidder’s “marginal revenue” is defined as the marginal revenue of this firm at the price that equals the bidder’s actual value (see Figure 2.2).50

Although it had been hinted at before,51 the key point was first explicitly drawn out by Bulow and Roberts (1989), who showed that under the assumptions of the revenue equivalence theorem the expected revenue from an auction equals the expected marginal revenue of the winning bidder(s). The new results in the article were few – the paper largely mimicked Myerson (1981), while renaming Myerson’s concept of “virtual utility” as “marginal revenue”52,53 – but their contribution was nevertheless important. Once the connection had been made, it was possible to take ways of thinking that are second nature to economists from the standard theory of monopoly pricing and apply them to auction theory.

50 The point of this construction is particularly clear when a seller faces a single bidder whose private value is distributed according to F(v). Then, setting a take-it-or-leave-it price of v yields expected sales, or “demand,” q = 1 − F(v), expected revenue of v(1 − F(v)), and expected marginal revenue d(qv)/dq = v − (1 − F(v))/f(v). See Appendix B of Klemperer (1999a).

51 For example, Mussa and Rosen’s (1978) analysis of monopoly and product quality contained expressions for “marginal revenue” that look like Myerson’s (1981) analysis of optimal auctions.

52 Myerson’s results initially seemed unfamiliar to economists, in part because his basic analysis (although not all his expressions) expressed virtual utilities as a function of bidders’ values, which correspond to prices, and so computed revenues by integrating along the vertical axis, whereas we usually solve monopoly problems by expressing marginal revenues as functions of quantities and integrating along the horizontal axis of the standard (for monopoly) picture.

53 Bulow and Roberts emphasize the close parallel between a monopolist third-degree price-discriminating across markets with different demand curves, and an auctioneer selling to bidders whose valuations are drawn from different distributions. (In what follows, the auction counterpart of each monopoly term is given in square brackets.) For the monopolist [auctioneer], revenue [expected revenue] is maximized by selling to the consumers [bidder] with the highest marginal revenue(s), not necessarily the highest value(s), subject to never selling to a consumer [bidder] with marginal revenue less than the monopolist’s marginal cost [auctioneer’s own valuation], assuming (i) resale can be prohibited, (ii) credible commitment can be made to no future sales [sticking to any reserve price], and (iii) marginal revenue curves are all downward sloping [higher “types” of any bidder have higher marginal revenues than lower “types” of the same bidder], etc.


Figure 2.2. Construction of marginal revenue of bidder with value ṽ drawn from distribution F(v) on [v̲, v̄].
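The claim that an auction's expected revenue equals the winning bidder's expected marginal revenue can be checked numerically in a simple special case. The sketch below is an illustration under stated assumptions and not part of the original analysis: values are independent draws from the uniform distribution on [0, 1] (so the marginal revenue v − (1 − F(v))/f(v) becomes 2v − 1), the auction is an ascending auction without a reserve price, and the function names, number of bidders, number of draws, and random seed are arbitrary choices.

```python
import random


def marginal_revenue(v):
    """MR(v) = v - (1 - F(v)) / f(v); for F uniform on [0, 1] this equals 2v - 1."""
    return 2.0 * v - 1.0


def simulate(n_bidders=3, draws=100_000, seed=0):
    """Average sale price and average winner's marginal revenue over many auctions."""
    rng = random.Random(seed)
    price_sum = 0.0
    mr_sum = 0.0
    for _ in range(draws):
        values = sorted(rng.random() for _ in range(n_bidders))
        price_sum += values[-2]                 # winner pays the second-highest value
        mr_sum += marginal_revenue(values[-1])  # marginal revenue of the winner
    return price_sum / draws, mr_sum / draws


avg_price, avg_mr = simulate()
# With n uniform bidders both expectations equal (n - 1) / (n + 1), i.e. 0.5 for n = 3.
print(avg_price, avg_mr)
```

The two averages differ only by sampling noise. Note that without a reserve price the winner's marginal revenue is sometimes negative, which is why a suitably chosen reserve price raises expected revenue.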

For example, once the basic result (that an auction’s expected revenue equals the winning bidder’s expected marginal revenue) was seen, Bulow and Klemperer (1996) were able to use a simple monopoly diagram to derive it more simply and under a broader class of assumptions than had previously been done by Myerson or Bulow and Roberts.54 Bulow and Klemperer also used standard monopoly intuition to derive additional results in auction theory.

The main benefits from the marginal-revenue connection come from translating ideas from monopoly analysis into auction analysis, because most economists’ intuition for and understanding of monopoly is much more highly developed than for auctions. But, it is possible to go in the other direction, too, from auction theory to monopoly theory. Consider, for example, the main result of Bulow and Klemperer (1996):

Proposition 4.1 (Auction-Theoretic Version). An optimal auction of K units to Q bidders earns less profit than a simple ascending auction (without a reserve price) of K units to Q + K bidders, assuming (a) bidders are symmetric, (b) bidders are serious (i.e., their lowest possible valuations exceed the seller’s supply cost), and (c) bidders with higher valuations have higher marginal revenues.55

Proof. See Bulow and Klemperer (1996).

54 See Appendix B of Klemperer (1999a) for an exposition.

55 See Bulow and Klemperer (1996) for a precise statement. We do not require bidders’ valuations to be private, but do place some restrictions on the class of possible mechanisms from which the “optimal” one is selected, if bidders are not risk neutral or their signals are not independent. We assume bidders demand a single unit each.
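Proposition 4.1 can be illustrated with exact arithmetic in a textbook special case. The following sketch makes assumptions that are not in the text: K = 1, the seller's supply cost is zero, and values are independent uniform draws on [0, 1], so that the optimal auction is a second-price auction with reserve price 1/2 and assumption (b) holds only at the boundary (the lowest possible valuation equals the seller's cost rather than exceeding it). The function names are hypothetical.

```python
from fractions import Fraction


def ascending_revenue(n):
    """Expected second-highest of n uniform[0, 1] values: (n - 1) / (n + 1)."""
    return Fraction(n - 1, n + 1)


def optimal_revenue(n):
    """Expected revenue of the optimal auction (reserve price 1/2) with n bidders.

    This is E[max(2 * v_max - 1, 0)], the winner's expected marginal revenue
    when the object is sold only if that marginal revenue is nonnegative,
    integrated in closed form over v_max in [1/2, 1].
    """
    half = Fraction(1, 2)
    return Fraction(2 * n, n + 1) * (1 - half ** (n + 1)) - (1 - half ** n)


# Compare Q bidders facing the optimal auction with Q + 1 bidders and no reserve.
for q in range(1, 6):
    print(q, optimal_revenue(q), ascending_revenue(q + 1))
```

For a single bidder, the optimal take-it-or-leave-it price earns 1/4, whereas an ascending auction with one extra bidder and no reserve earns 1/3; in this example the extra bidder remains worth more than the optimal reserve price for every Q.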


Application. One application is to selling a firm (so, K = 1). Because the seller can always resort to an ascending auction, attracting a single additional bidder is worth more than any amount of negotiating skill or bargaining power against an existing bidder or bidders, under reasonable assumptions. Thus, there is little justification, for example, for accepting a “lock-up” bid for a company without fully exploring the interest of alternative possible purchasers.

The optimal auction translates, for large Q and K, to the monopolist’s optimum. An ascending auction translates to the competitive outcome, in which price-taking firms make positive profits only because of the fixed supply of units. (An ascending auction yields the (K + 1)st-highest value among the bidders; in a perfectly competitive market, an inelastic supply of K units is in equilibrium with demand at any price between the Kth- and (K + 1)st-highest value, but the distinction is unimportant for large K.) So, one way of expressing the result in the market context is:

Proposition 4.2 (Monopoly-Theoretic Version). A perfectly competitive industry with (fixed) capacity K and Q consumers would gain less by fully cartelizing the industry (and charging the monopoly price) than it would gain by attracting K new potential customers into the industry with no change in the intensity of competition, assuming (a′) the K new potential consumers have the same distribution of valuations as the existing consumers, (b′) all consumers’ valuations for the product exceed sellers’ supply costs (up to sellers’ capacity), and (c′) the marginal-revenue curve constructed from the market-demand curve is downward sloping.56

Proof. No proof is required – the proposition is implied by the auction-theoretic version – but once we know the result we are looking for and the necessary assumptions, it is very simple to prove it directly using introductory undergraduate economics. We do this in a brief Appendix 2.

Application. One application is that this provides conditions under which a joint-marketing agency does better to focus on actually marketing rather than (as some of the industrial organization literature suggests) on facilitating collusive practices.57

5. APPLYING AUCTION THEORY TO PRICE-SETTING OLIGOPOLIES

We have stressed the applications of auction theory to contexts that might not be thought of as auctions, but even though price-setting oligopolies are obviously

56 We are measuring capacity in units such that each consumer demands a single unit of output. Appendix 2 makes it clear how the result generalizes.

57 Of course, the agency may wish to pursue both strategies in practice.


auctions, the insights that can be obtained by thinking of them in this way are often passed by.

5.1. Marginal-Cost Pricing Is Not the Unique Bertrand Equilibrium

One of the most famous results in economics is the “Bertrand paradox,” that with just two firms with constant and equal marginal costs in a homogeneous products industry, the unique equilibrium is for both firms to set price equal to marginal cost and firms earn zero profit. This “theorem” is widely quoted in standard texts. But, it is false. There are other equilibria with large profits, for some standard demand curves, a fact that seems until recently to have been known only to a few auction theorists.58

Auction theorists are familiar with the fact that a boundary condition is necessary to solve a sealed-bid auction. Usually, this is imposed by assuming no bidder can bid less than any bidder’s lowest possible valuation, but there is generally a continuum of equilibria if arbitrarily negative bids are permitted.59 Exactly conversely, with perfectly inelastic demand for one unit and, for example, two risk-neutral sellers with zero costs, it is a mixed-strategy equilibrium for each firm to bid above any price p with probability k/p (for p ≥ k), for any fixed k. (Each firm therefore faces expected residual demand of constant elasticity −1, and is therefore indifferent about mixing in this way; profits are k per firm.) It is not hard to see that a similar construction is possible with downward-sloping demand, for example, standard constant-elasticity demand, provided that monopoly profits are unbounded. [See, especially, Baye and Morgan (1999a) and Kaplan and Wettstein (2000).]

One point of view is that the nonuniqueness of the “Bertrand paradox” equilibrium is a merely technical point, because it requires “unreasonable” (even though often assumed60) demand. However, the construction immediately suggests another more important result: quite generally (including for demand which becomes zero at some finite choke price), there are very profitable mixed-strategy ε-equilibria to the Bertrand game, even though there are no pure-strategy ε-equilibria. That is, there are mixed strategies that are very different from marginal-cost pricing

58 We assume firms can choose any prices. It is well known that if prices can be quoted only in whole pennies, there is an equilibrium with positive (but small) profits in which each firm charges one penny above cost. (With perfectly inelastic demand, there is also an equilibrium in which each firm charges two pennies above cost.)

59 For example, if each of two risk-neutral bidders’ private values is independently drawn from a uniform distribution on the open interval (0, 1), then for any nonnegative k there is an equilibrium in which a player with value v bids v/2 − k/v. If it is common knowledge that both bidders have value zero, there is an equilibrium in which each player bids below any price −p with probability k/p, for any fixed nonnegative k.

60 This demand can, for example, yield unique and finite-profit Cournot equilibrium.


in which no player can gain more than a very small amount, ε, by deviating from the strategies.61 (There are also “quantal response” equilibria with a similar ﬂavor.) Experimental evidence suggests that these strategies may be empirically relevant (see Baye and Morgan, 1999b).62
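The indifference argument behind the profitable mixed-strategy Bertrand equilibrium of this subsection (two zero-cost sellers, perfectly inelastic demand for one unit, each pricing above p with probability k/p) is easy to verify directly. The sketch below is illustrative only: the function names and the value of k are arbitrary, and it assumes the rival never prices below k, so the probability that the rival's price exceeds p is 1 for p below k and k/p otherwise.

```python
def prob_rival_above(p, k):
    """Probability that the rival's price exceeds p: k / p for p >= k, and 1 below k."""
    return min(1.0, k / p)


def expected_profit(p, k):
    """Expected profit of a zero-cost firm charging p.

    The firm sells the single unit only if it undercuts its rival; ties have
    probability zero because the rival's price distribution has no atoms.
    """
    return p * prob_rival_above(p, k)


k = 2.0
profits = {p: expected_profit(p, k) for p in (0.5, 1.0, 2.0, 5.0, 100.0, 1e6)}
for p, profit in profits.items():
    print(p, profit)
```

Every price at or above k earns the same expected profit k, while any price below k earns strictly less, so no deviation is profitable. This is the sense in which each firm faces expected residual demand of constant elasticity −1.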

5.2. The Value of New Consumers

The revenue equivalence theorem (RET) can, of course, be applied to price-setting oligopolies.63 For example: what is the value of new consumers in a market with strong brand loyalty? If firms can price discriminate between new uncommitted consumers and old “locked-in” consumers, Bertrand competition for the former will mean their value is low, but what if price discrimination is impossible? In particular, it is often argued that new youth smokers are very valuable to the tobacco industry because brand loyalty (as well as loyalty to the product) is very high (only about 10 percent of smokers switch brands in any year), so price-cost margins on all consumers are very high. Is there any truth to this view?

The answer, of course, under appropriate assumptions, is that the RET implies that the ability to price discriminate is irrelevant to the value of the new consumers (see the discussion in Section 2). With price discrimination, we can model the oligopolists as acting as monopolists against their old customers, and as being in an “ascending”64 price auction for the uncommitted consumers with the firm that is prepared to price the lowest selling to all these consumers at the cost of the runner-up firm. Alternatively, we can model the oligopolists as making sealed bids for the uncommitted consumers, with the lowest bidder selling to these consumers at its asking price. The expected profits are the same under the RET assumptions. Absent price discrimination, a natural model is the latter one, but in addition each oligopolist must discount its price to its own locked-in customers down to the price it bids for the uncommitted consumers. The RET tells us that the total cost to the industry of these “discounts” to old consumers will, on average, precisely compensate the higher

61 Of course, the concept of mixed-strategy ε-equilibrium used here is even more contentious than either mixed-strategy (Nash) equilibria or (pure-strategy) ε-equilibrium. The best defense for it may be its practical usefulness.

62 Spulber (1995) uses the analogy with a sealed-bid auction to analyze a price-setting oligopoly in which, by contrast with our discussion, firms do not know their rivals’ costs. For a related application of auction theory to price-setting oligopoly, see Athey et al. (2000).

63 As another example, Vives (1999) uses the revenue equivalence theorem to compare price-setting oligopoly equilibria with incomplete and complete (or shared) information about firms’ constant marginal costs, and so shows information sharing is socially undesirable in this context.

64 The price is descending because the oligopolists are competing to sell rather than buy, but it corresponds to an ascending auction in which firms are competing to buy, and we stick with this terminology as in Section 3.1.


sale price achieved on new consumers.65 That is, the net value to the industry of the new consumers is exactly as if there was Bertrand competition for them, even when the inability to price discriminate prevents this. Thus, Bulow and Klemperer (1998) argue that the economic importance to the tobacco companies of the youth market is actually very tiny, even though from an accounting perspective new consumers appear as valuable as any others.66 Similarly, applying the same logic to an international trade question, the value of a free-trading market to firms, each of which has a protected home market, is independent (under appropriate assumptions) of whether the firms can price discriminate between markets.67 Section 3.1’s discussion of oligopolistic e-competition develops this kind of analysis further by considering implications of failures of the RET.

5.3. Information Aggregation in Perfect Competition

Although the examples cited previously, and in Section 3,68 suggest auction theory has been underused in analyzing oligopolistic competition, it has been very important in influencing economists’ ideas about the limit as the number of firms becomes large. An important strand of the auction literature has focused on the properties of pure-common-value auctions as the number of bidders becomes large, and asked: does the sale price converge to the true value, thus fully aggregating all of the economy’s information even though each bidder has only partial information? Milgrom (1979) and Wilson (1977) showed assumptions under which the answer is “yes” for a first-price, sealed-bid auction. Milgrom (1981) obtained similar results for a second-price auction [or for a (k + 1)th-price auction for k objects].69 These models justify some of our ideas about perfect competition.

65 Specifically let n “old” consumers be attached to each firm i, and firms’ costs c_i be independently drawn from a common, strictly increasing, atomless distribution. There are m “new” consumers who will buy from the cheapest firm. All consumers have reservation price r. Think of firms competing for the prize of selling to the new consumers, worth m(r − c_i) to firm i. Firms set prices p_i = r − d_i to “new” consumers; equivalently, they set “discounts” d_i to consumers’ reservation prices. If price discrimination is feasible, the winner pays m d_i for the prize and all firms sell to their old consumers at r. Absent price discrimination, the prices p_i apply to all firms’ sales, so relative to selling just to old consumers at price r, the winner pays (m + n) d_i for the prize and the losers pay n d_i each. For the usual reasons, the two sets of payment rules are revenue equivalent. For more discussion of this result, including its robustness to multiperiod contexts, see Bulow and Klemperer (1998); if the total demand of new consumers is more elastic, their economic value will be somewhat less than our model suggests; for a fuller discussion of the effects of “brand loyalty” or “switching costs” in oligopoly, see, especially, Beggs and Klemperer (1992) and Klemperer (1987a, 1987b, 1995).

66 If industry executives seem to value the youth segment, it is probably due more to concern for their own future jobs than concern for their shareholders.

67 See also Rosenthal (1980).

68 Bulow and Klemperer (2002b) provides an additional example.

69 Matthews (1984), on the other hand, showed that the (first-price) sale price does not in general converge to the true value when each bidder can acquire information at a cost. Pesendorfer and


6. APPLYING AUCTION THEORY (AND ECONOMICS) TO AUCTION MARKETS

Finally, although it has not always been grasped by practitioners, some markets are literally auctions. The increasing recognition that many real markets are best understood through the lens of auction theory has stimulated a burst of new theorizing,70 and created the new subject of market design that stands in similar relation to auction theory as engineering does to physics.

6.1. Important Auction Markets

It was not initially well understood that deregulated electricity markets, such as in the United Kingdom, are best described and analyzed as auctions of inﬁnitely divisible quantities of homogeneous units.71 Although much of the early analysis of the UK market was based on Klemperer and Meyer (1989), which explicitly followed Wilson’s (1979) seminal contribution to multiunit auctions, the Klemperer and Meyer model was not thought of as an “auctions” paper, and only recently received much attention among auction theorists.72 Indeed, von der Fehr and Harbord (1993) were seen as rather novel in pointing out that the new electricity markets could be viewed as auctions. Now, however, it is uncontroversial that these markets are best understood through auction theory, and electricity market design has become the province of leading auction theorists, such as Wilson, who have been very inﬂuential. Treasury bill auctions, like electricity markets, trade a divisible homogeneous good; but, although treasury auctions have always been clearly understood to be “auctions,” and the existing auction theory is probably even more relevant to treasury markets than to electricity markets,73 auction theorists have never been as inﬂuential as they are now in energy markets. In part, this is

Swinkels (1997) recently breathed new life into this literature, by showing convergence under weaker assumptions than previously if the number of objects for sale, as well as the number of bidders, becomes large. See also Kremer (2000), Swinkels (2001), and Pesendorfer and Swinkels (2000).

70 Especially on multiunit auctions in which bidders are not restricted to winning a single unit each, because most markets are of this kind.

71 von der Fehr and Harbord (1998) provide a useful overview of electricity markets.

72 Klemperer and Meyer (1989) was couched as a traditional industrial organization study of the question of whether competition is more like Bertrand or Cournot, following Klemperer and Meyer (1986).

73 Non-auction-theoretic issues that limit the direct application of auction theory to electricity markets include the very high frequency of repetition among market participants who have stable and predictable requirements, which makes the theory of collusion in repeated games also very relevant; the nature of the game the major electricity suppliers are playing with the industry regulator who may step in and attempt to change the rules (again) if the companies are perceived to be making excessive profits; the conditions for new entry; and the effects of vertical integration of industry participants. On the other hand, the interaction of a treasury auction with the financial markets for trading the bills both before and after the auction complicates the analysis of that auction.


because the treasury auctions predated any relevant theory,74 and the auctions did not seem to have serious problems. In part it may be because no clear view has emerged about the best form of auction to use; indeed, one possibility is that the differences between the main types of auction may not be too important in this context – see Klemperer (2002a).75 Academics were involved at all stages of the radiospectrum auctions, from suggesting the original designs to advising bidders on their strategies. The original U.S. proponents of an auction format saw it as a complex environment that needed academic input, and a pattern of using academic consultants was set in the U.S. and spread to other countries.76 Many other new auction markets are currently being created using the internet, such as the online consumer auctions run by eBay, Amazon, and others that have more than 10 million customers, and the business-to-business auto parts auctions being planned by General Motors, Ford, and Daimler-Chrysler that are expected to handle $250 million in transactions a year. Here, too, auction theorists have been in heavy demand, and there is considerable ongoing

74 By contrast, the current U.K. government sales of gold are a new development, and government agencies have now consulted auction theorists (including myself) about the sale method.

75 In a further interesting contrast, the U.K. electricity market – the first major market in the world to be deregulated and run as an auction – was set up as a uniform price auction, but its perceived poor performance has led to a planned switch to an exchange market, followed by a discriminatory auction (see Klemperer 2002a; Office of Gas and Electricity Markets 1999; Newbery 1998, Wolfram 1998, 1999). Meanwhile, the vast majority of the world’s treasury bill markets have until recently been run as discriminatory auctions (see Bartolini and Cottarelli 1997), but the U.S. switched to uniform price auctions in late 1998, and several other countries have been experimenting with these. In fact, it seems unlikely that either form of auction is best either for all electricity markets or for all treasury markets (see, e.g., Klemperer 1999b, Federico and Rahman 2000, McAdams 1998, Nyborg and Sundaresan 1996).

76 Evan Kwerel was especially important in promoting the use of auctions. The dominant design has been the simultaneous ascending auction sketched by Vickrey (1976), and proposed and developed by McAfee, Milgrom, and Wilson for the U.S. auctions. (See McMillan 1994, McAfee and McMillan 1996, and especially Milgrom forthcoming.) Although some problems have emerged, primarily its susceptibility to collusion and its inhospitability to entry (see Section 3.2), it has generally been considered a success in most of its applications (see, e.g., Board 1999, Cramton 1997, Plott 1997, Salant 1997, Weber 1997, and Zheng 1999). A large part of the motivation for the U.S. design was the possibility of complementarities between licenses (see Ausubel et al. 1997), although it is unproven either that the design was especially helpful in allowing bidders to aggregate efficient packages, or that it would work well if complementarities had been very significant. Ironically, the simultaneous ascending auction is most attractive when each of an exogenously fixed number of bidders has a privately known value for each of a collection of heterogeneous objects, but (contrary to the U.S. case) is restricted to buying at most a single license. In this case, entry is not an issue, collusion is very unlikely, and the outcome is efficient. For this reason a version of the simultaneous ascending auction was designed by Binmore and Klemperer for the U.K. 3G auction (in which each bidder was restricted to a single license) after concerns about entry had been laid to rest. A sealed-bid design was recently used very successfully in Denmark where attracting entry was a serious concern. See Section 3.2, see Binmore and Klemperer (2002) for discussion of the U.K. auction, and see Klemperer (2002a, 2002b, 2002c, 2002d, 2002e) for a discussion of the recent European spectrum auctions.

experimentation with different auction forms.77 Furthermore, we have already argued that internet markets that are not usually thought of as auctions can be illuminated by auction theory [see Section 3.1 and Bulow and Klemperer (2002b)].

6.2. Applying Economics to Auction Design

Although many economic markets are now fruitfully analyzed as auctions, the most significant problems in auction markets and auction design are probably those with which industry regulators and competition authorities have traditionally been concerned – discouraging collusive, predatory, and entry-deterring behavior, and analyzing the merits of mergers or other changes to market structure. This contrasts with most of the auction literature, which focuses on Nash equilibria in one-shot games with a fixed number of bidders, and emphasizes issues such as the effects of risk aversion, correlation of information, budget constraints, complementarities, asymmetries, etc. Certainly these are also important topics – and auction theorists have made important progress on them that other economic theory can learn from – but they are probably not the main issues. Although the relative thinness of the auction-theoretic literature on collusion and entry deterrence may be defensible to the extent general economic principles apply, there is a real danger that auction theorists will underemphasize these problems in applications. In particular, ascending, second-price, and uniform-price auction forms, although attractive in many auction theorists’ models, are more vulnerable to collusive and predatory behavior than (first-price) sealed-bid and hybrid forms, such as the Anglo-Dutch auction described in Section 3.2. Klemperer (2002a) provides an extensive discussion of these issues. Although auction theorists are justly proud of how much they can teach economics, they must not forget that the classical lessons of economics continue to apply.

7. CONCLUSIONS

Auction theory is a central part of economics and should be a part of every economist’s armory; auction theorists’ ways of thinking shed light on a whole range of economic topics.
We have shown that many economic questions that do not at first sight seem related to auctions can be recast to be solvable using auction-theoretic techniques, such as the revenue equivalence theorem. The close parallels between auction theory and standard price theory – such as those between the theories of optimal auctions and of price discrimination – mean ideas can be arbitraged from auction theory to standard economics, and vice versa. The insights and intuitions that auction theorists have developed in comparing different auction forms can find fertile application in many other contexts. Furthermore, although standard auction theory models already provide the basis of much work in labor economics, political economy, finance, and industrial organization, we have used the example of price-setting oligopoly to show that a much greater application of auction-theoretic thinking may be possible in these more obvious fields. “Heineken refreshes the parts other beers cannot reach” was recently voted one of the top advertising campaigns of all time, worldwide. The moral of this paper is that, “Auction theory refreshes the parts other economics cannot reach.” Like Heineken, auction theory is a potent brew that we should all imbibe.

77 See, e.g., Hall (2001). The UK government recently used the internet to run the world’s first auction for greenhouse gas emissions reductions. (Peter Cramton, Eric Maskin, and I advised on the design, and Larry Ausubel and Jeremy Bulow also helped with the implementation.)

ACKNOWLEDGMENTS

Susan Athey was an excellent discussant of this paper. I have also received extremely helpful comments and advice from many other friends and colleagues, including Larry Ausubel, Mike Baye, Alan Beggs, Simon Board, Jeremy Bulow, Peter Cramton, Joe Farrell, Giulio Federico, Nils Hendrik von der Fehr, Dan Kovenock, David McAdams, Peter McAfee, Flavio Menezes, Meg Meyer, Jonathan Mirrlees-Black, John Morgan, Marco Pagnozzi, Nicola Persico, Eric Rasmussen, David Salant, Margaret Stevens, Rebecca Stone, Lucy White, Mark Williams, Xavier Vives, Caspar de Vries, and Charles Zheng.

APPENDIX 1: COMPARING LITIGATION SYSTEMS

Assume that after transfers between the parties, the loser ends up paying fraction α ≥ 0 of his own expenses and fraction β ≤ 1 of his opponent’s. (The winner pays the remainder.)78 The American system is α = 1, β = 0; the British system is α = β = 1; the Netherlands system is, roughly, α = 1, 0 < β < 1; and Quayle’s is α = 2, β = 0. It is also interesting to consider a “reverse-Quayle” rule α = 1, β < 0 in which both parties pay their own expenses, but the winner transfers an amount proportional to her own expenses to the loser. Let L be the average legal expenses spent per player.

The following slight generalization of the RET is the key: assuming the conditions of the RET all hold except for assumption (ii) (i.e., the expected surplus of a bidder with the lowest feasible valuation, say S, may not be zero), it remains true that the expected surplus of any other type of bidder is a fixed amount above S. [See, e.g., Klemperer (1999a; Appendix A); the fixed amount depends on the distribution of the parties’ valuations, but unlike S and L does not depend on the mechanism {α, β}.] It follows that the average bidder surplus is S plus a constant. But the average bidder surplus equals the average lawsuit winnings (expectation of {probability of winning} × {valuation}) minus L, which equals a constant minus L by assumption (i) of the RET. So, S = K − L, in which K is a constant independent of α and β. Because the lowest valuation type always loses in equilibrium [by assumption (i) of the RET], she bids zero, so S = −βL, because in a one-shot game her opponent, on average, incurs expenses of L. Solving, L = K/(1 − β) and the expected surplus of any given party is a constant minus βK/(1 − β).

It follows that both expected total expenses and any party’s expected payoff are invariant to α; hence the remarks in the text about the Quayle proposal. But legal expenses are increasing in β, indeed become unbounded in the limit corresponding to the British system. The mechanism that minimizes legal expenses taking the set of lawsuits as given is the reverse Quayle. The intuition is that it both increases the marginal cost of spending on a lawsuit and reduces the value of winning the suit. On the other hand, of course, bringing lawsuits becomes more attractive as β falls.

78 As in the main text, we assume a symmetric equilibrium with strictly increasing bidding functions. For extreme values of α and β, this may not exist (and we cannot then use the RET directly). See Baye, Kovenock, and de Vries (1997) for explicit solutions for the equilibria for different α and β.
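The closed forms derived in this appendix can be tabulated with a minimal sketch. It only evaluates L = K/(1 − β) and S = −βL; the value K = 1 and the function names are illustrative assumptions, not part of the original analysis, and the parameter alpha is accepted only to make the invariance to α explicit.

```python
def average_legal_expenses(K: float, beta: float, alpha: float = 1.0) -> float:
    """Average legal expenses per party, L = K / (1 - beta).

    alpha is deliberately unused: the appendix shows L is invariant to alpha.
    K is the constant from the appendix (hypothetical value supplied by the caller).
    """
    if not beta < 1.0:
        # beta = 1 (the British system) is the limit in which L is unbounded.
        raise ValueError("closed form requires beta < 1")
    return K / (1.0 - beta)


def lowest_type_surplus(K: float, beta: float) -> float:
    """Expected surplus of the lowest-valuation type, S = -beta * L."""
    return -beta * average_legal_expenses(K, beta)


if __name__ == "__main__":
    K = 1.0  # assumed normalization, for illustration only
    systems = [
        ("American", 1.0, 0.0),
        ("Netherlands (beta = 0.5, assumed)", 1.0, 0.5),
        ("Quayle", 2.0, 0.0),
        ("reverse-Quayle (beta = -1, assumed)", 1.0, -1.0),
    ]
    for name, alpha, beta in systems:
        L = average_legal_expenses(K, beta, alpha)
        S = lowest_type_surplus(K, beta)
        print(f"{name}: L = {L:.3f}, S = {S:.3f}")
```

Under these assumed values, the American and Quayle rules give the same L because they share β = 0, and lowering β below zero reduces L, in line with the argument above.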

APPENDIX 2: DIRECT PROOF OF MONOPOLY-THEORETIC VERSION OF PROPOSITION IN SECTION 4

The proof rests precisely on the assumptions (a′), (b′), and (c′). Without loss of generality, let firms’ marginal costs be flat up to capacity,79 and consider what would be the marginal revenue curve for the market if the K new consumers were attracted into it (see Figure 2.3). A monopolist on this (expanded) market would earn area A in profits (i.e., the area between the marginal revenue and marginal cost curves up to the monopoly point, M). The perfectly competitive industry in the same (expanded) market would earn Π^c = A − B, that is, the integral of marginal revenue less marginal cost up to industry capacity, K. By assumption (a′), a monopolist (or fully cartelized industry) in the original market would earn Π^M = [Q/(Q + K)]A. Now, the average marginal revenue up to quantity Q + K equals the price at demand Q + K (because total marginal revenue = price × quantity), which exceeds marginal cost by assumption (b′), so B + C ≤ A. Furthermore, by assumption (c′) and elementary geometry, B ≤ [(K − M)/((Q + K) − M)](B + C). So, B ≤ [(K − M)/(Q + K − M)]A, and therefore Π^c = A − B ≥ [Q/(Q + K − M)]A ≥ Π^M, as required.

79 If the industry cost curve is not flat up to the capacity, then use the argument in the text to prove the result for a cost curve that is flat and everywhere weakly above the actual cost curve. A fortiori, this proves the result for the actual curve, because a monopoly saves less from a lower cost curve than a competitive industry saves from the lower cost curve.
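The final chain of inequalities in the proof can be checked numerically with a small sketch. It takes the area A and the quantities Q, K, and M as given, computes the upper bound on B, the implied lower bound on the competitive industry's profit A − B, and the original-market monopoly profit [Q/(Q + K)]A. The function name and the sample numbers are assumptions made for illustration, and the check presumes 0 ≤ M ≤ K as drawn in Figure 2.3.

```python
def profit_bounds(A: float, Q: float, K: float, M: float):
    """Return (upper bound on B, lower bound on A - B, original-market monopoly profit).

    Assumes 0 <= M <= K, Q > 0, and A >= 0, matching the configuration in Figure 2.3.
    """
    if not (0 <= M <= K and Q > 0 and A >= 0):
        raise ValueError("expected 0 <= M <= K, Q > 0, A >= 0")
    share = (K - M) / (Q + K - M)        # geometric factor from assumption (c')
    b_upper = share * A                  # B <= share * (B + C) <= share * A
    competitive_lower = A - b_upper      # equals [Q / (Q + K - M)] * A
    monopoly_original = Q * A / (Q + K)  # [Q / (Q + K)] * A
    return b_upper, competitive_lower, monopoly_original


if __name__ == "__main__":
    # Hypothetical numbers chosen only to exercise the inequalities.
    b_upper, competitive_lower, monopoly_original = profit_bounds(A=10.0, Q=6.0, K=4.0, M=2.0)
    print(b_upper, competitive_lower, monopoly_original)
    print(competitive_lower >= monopoly_original)
```

The step that does the work is 1 − (K − M)/(Q + K − M) = Q/(Q + K − M), which is at least Q/(Q + K) whenever M ≥ 0.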

Figure 2.3. Marginal revenue if demand is expanded. [Figure not reproduced: marginal revenue and marginal cost curves plotted against quantity, with areas A, B, and C and quantities M, K, and Q + K marked.]

References

Abreu, D. and F. Gul (2000), “Bargaining and Reputation,” Econometrica, 68, 85–117. Athey, S., K. Bagwell, and C. Sanchirico (2000), “Collusion and Price Rigidity,” mimeo, MIT. Ausubel, L. M., P. Cramton, R. P. McAfee, and J. McMillan (1997), “Synergies in Wireless Telephony: Evidence from the Broadband PCS Auctions,” Journal of Economics and Management Strategy, 6, 497–527. Ausubel, L. M. and J. A. Schwartz (1999), “The Ascending Auction Paradox,” Working Paper, University of Maryland. Bartolini, L. and C. Cottarelli (1997), “Designing Effective Auctions for Treasury Securities,” Current Issues in Economics and Finance, 3, 1–6. Baye, M. R., D. Kovenock, and C. de Vries (1997), “Fee Allocation of Lawyer Services in Litigation,” mimeo, Indiana University, Purdue University, and Tinbergen Institute, Erasmus University. Baye, M. R., D. Kovenock, and C. de Vries (1998, February), “A General Linear Model of Contests,” Working Paper, Indiana University, Purdue University, and Tinbergen Institute, Erasmus University. Baye, M. R. and J. Morgan (1999a), “A Folk Theorem for One-Shot Bertrand Games,” Economics Letters, 65, 59–65.

Baye, M. R. and J. Morgan (1999b), “Bounded Rationality in Homogeneous Product Pricing Games,” Working Paper, Indiana University and Princeton University. Baye, M. R. and J. Morgan (2001), “Information Gatekeepers on the Internet and the Competitiveness of Homogeneous Product Markets,” American Economic Review, 91, 454–474. Beggs, A. W. and P. D. Klemperer (1992), “Multi-Period Competition with Switching Costs,” Econometrica, 60(3), 651–666. Bernheim, B. D. and M. D. Whinston (1986), “Menu Auctions, Resource Allocation, and Economic Inﬂuence,” Quarterly Journal of Economics, 101, 1–31. Binmore, K. and P. D. Klemperer (2002), “The Biggest Auction Ever: The Sale of the British 3G Telecom Licenses,” Economic Journal, 112(478), C74–C96. Bliss, C. and B. Nalebuff (1984), “Dragon-Slaying and Ballroom Dancing: The Private Supply of a Public Good,” Journal of Public Economics, 25, 1–12. Board, S. A. (1999), “Commitment in Auctions,” M. Phil. Thesis, Nufﬁeld College, Oxford University. Brusco, S. and G. Lopomo (1999), “Collusion via Signalling in Open Ascending Auctions with Multiple Objects and Complementarities,” Working Paper, Stern School of Business, New York University. Bulow, J. I., M. Huang, and P. D. Klemperer (1999), “Toeholds and Takeovers,” Journal of Political Economy, 107, 427–454. Bulow, J. I. and P. D. Klemperer (1994), “Rational Frenzies and Crashes,” Journal of Political Economy, 102, 1–23. Bulow, J. I. and P. D. Klemperer (1996), “Auctions vs. Negotiations,” American Economic Review, 86, 180–194. Bulow, J. I. and P. D. Klemperer (1998), “The Tobacco Deal,” Brookings Papers on Economic Activity (Microeconomics), 323–394. Bulow, J. I. and P. D. Klemperer (1999), “The Generalized War of Attrition,” American Economic Review, 89, 175–189. Bulow, J. I. and P. D. Klemperer (2002a), “Prices and the Winner’s Curse,” Rand Journal of Economics, 33(1), 1–21. Bulow, J. I. and P. D. 
Klemperer (2002b), “Privacy and Prices,” Nufﬁeld College, Oxford University Discussion Paper, available at www.paulklemperer.org. Bulow, J. I. and D. J. Roberts (1989), “The Simple Economics of Optimal Auctions,” Journal of Political Economy, 97, 1060–1090. Cramton, P. (1997), “The FCC Spectrum Auctions: An Early Assessment,” Journal of Economics and Management Strategy, 6(3), 431–495. Cramton, P., R. Gibbons, and P. D. Klemperer (1987), “Dissolving a Partnership Efﬁciently,” Econometrica, 55(3), 615–632. Cramton, P. and J. A. Schwartz (2001), “Collusive Bidding: Lessons from the FCC Spectrum Auctions,” Journal of Regulatory Economics, 18, 187–205. Engelbrecht-Wiggans, R. and C. M. Kahn (1998), “Low Revenue Equilibria in Simultaneous Auctions,” Working Paper, University of Illinois. Feddersen, T. J. and W. Pesendorfer (1996), “The Swing Voter’s Curse,” American Economic Review, 86(3), 408–424. Feddersen, T. J. and W. Pesendorfer (1998), “Convicting the Innocent: The Inferiority of Unanimous Jury Verdicts under Strategic Voting,” American Political Science Review, 92(1), 23–35. Federico, G. and D. Rahman (2000), “Bidding in an Electricity Pay-as-Bid Auction,” Working Paper, Nufﬁeld College.

von der Fehr, N.-H. and D. Harbord (1993), “Spot Market Competition in the UK Electricity Industry,” Economic Journal, 103, 531–546. von der Fehr, N.-H. and D. Harbord (1998), “Competition in Electricity Spot Markets: Economic Theory and International Experience,” Memorandum No. 5/1998, Department of Economics, University of Oslo. Friedman, L. (1956), “A Competitive Bidding Strategy,” Operations Research, 4, 104–112. Fudenberg, D. and D. M. Kreps (1987), “Reputation in the Simultaneous Play of Multiple Opponents,” Review of Economic Studies, 54, 541–568. Fudenberg, D. and J. Tirole (1986), “A Theory of Exit in Duopoly,” Econometrica, 54, 943–960. Fudenberg, D., R. Gilbert, J. Stiglitz and J. Tirole (1983), “Preemption, Leapfrogging, and Competition in Patent Races,” European Economic Review, 22, 3–31. Gilbert, R. and P. D. Klemperer (2000), “An Equilibrium Theory of Rationing,” Rand Journal of Economics, 31(1), 1–21. Grimm, V., F. Riedel, and E. Wolfstetter (2001), “Low Price Equilibrium in Multi-Unit Auctions: The GSM Spectrum Auction in Germany,” Working Paper, Humboldt Universität zu Berlin. Haigh, J. and C. Cannings (1989), “The n-Person War of Attrition,” Acta Applicandae Mathematicae, 14, 59–74. Hall, R. E. (2001), Digital Dealing. New York: W. W. Norton. Hansen, R. G. (1988), “Auctions with Endogenous Quantity,” Rand Journal of Economics, 19, 44–58. Holt, C. A. Jr. and R. Sherman (1982), “Waiting-Line Auctions,” Journal of Political Economy, 90, 280–294. Hughes, J. W. and E. A. Snyder (1995), “Litigation and Settlement under the English and American Rules: Theory and Evidence,” Journal of Law and Economics, 38, 225–250. Jehiel, P. and B. Moldovanu (2000), “A Critique of the Planned Rules for the German UMTS/IMT-2000 License Auction,” Working Paper, University College London and University of Mannheim. Kagel, J. H. (1995), “Auctions: A Survey of Experimental Research,” in The Handbook of Experimental Economics (ed. by J. H. Kagel and A. E.
Roth), Princeton, NJ: Princeton University Press, 501–586. Kambe, S. (1999), “Bargaining with Imperfect Commitment,” Games and Economic Behavior, 28(2), 217–237. Kaplan, T. and D. Wettstein (2000), “The Possibility of Mixed-Strategy Equilibria with Constant-Returns-to-Scale Technology under Bertrand Competition,” Spanish Economic Review, 2(1), 65–71. Klemperer, P. D. (1987a), “Markets with Consumer Switching Costs,” Quarterly Journal of Economics, 102(2), 375–394. Klemperer, P. D. (1987b), “The Competitiveness of Markets with Switching Costs,” Rand Journal of Economics, 18(1), 138–150. Klemperer, P. D. (1995), “Competition When Consumers Have Switching Costs: An Overview with Applications to Industrial Organization, Macroeconomics, and International Trade,” Review of Economic Studies, 62(4), 515–539. Klemperer, P. D. (1998), “Auctions with Almost Common Values,” European Economic Review, 42, 757–769. Klemperer, P. D. (1999a), “Auction Theory: A Guide to the Literature,” Journal of Economic Surveys, 13(3), 227–286. [Also reprinted in The Current State of Economic Science, 2, 711–766, (ed. by S. Dahiya), 1999.] Klemperer, P. D. (1999b), “Applying Auction Theory to Economics,” Working Paper, Nuffield College Oxford. Klemperer, P. D. (Ed.) (2000), The Economic Theory of Auctions. Cheltenham, UK: Edward Elgar. Klemperer, P. D. (2002a), “What Really Matters in Auction Design,” Journal of Economic Perspectives, 16(1), 169–189. Klemperer, P. D. (2002b), “How (Not) to Run Auctions: The European 3G Telecom Auctions,” European Economic Review, 46(4–5), 829–845. Klemperer, P. D. (2002c), “Using and Abusing Economic Theory,” 2002 Marshall Lecture to the European Economic Association. Forthcoming at www.paulklemperer.org. Klemperer, P. D. (2002d), “Some Observations on the British 3G Telecom Auction,” ifo Studien, 48(1), forthcoming, and at www.paulklemperer.org. Klemperer, P. D. (2002e), “Some Observations on the German 3G Telecom Auction,” ifo Studien, 48(1), forthcoming, and at www.paulklemperer.org. Klemperer, P. D. and M. A. Meyer (1986), “Price Competition vs. Quantity Competition: The Role of Uncertainty,” Rand Journal of Economics, 17(4), 618–638. Klemperer, P. D. and M. A. Meyer (1989), “Supply Function Equilibria in Oligopoly under Uncertainty,” Econometrica, 57, 1243–1277. Klemperer, P. D. and M. Pagnozzi (2002), “Advantaged Bidders and Spectrum Prices: An Empirical Analysis,” Discussion Paper, Nuffield College, Oxford University, available at www.paulklemperer.org. Kremer, I. (2000), “Information Aggregation in Common Value Auctions,” Working Paper, Northwestern University. Kühn, K.-U. and X. Vives (1994), “Information Exchanges among Firms and Their Impact on Competition,” Working Paper, Institut d’Anàlisi Econòmica (CSIC) Barcelona. Laffont, J.-J. (1997), “Game Theory and Empirical Economics: The Case of Auction Data,” European Economic Review, 41, 1–35. Lee, H. G.
(1998), “Do Electronic Marketplaces Lower the Price of Goods?” Communications of the ACM, 41, 73–80. Lee, H. G., J. C. Westland, and S. Hong (1999), “The Impact of Electronic Marketplaces on Product Prices: An Empirical Study of AUCNET,” International Journal of Electronic Commerce, 4-2, 45–60. Maskin, E. S. (1992), “Auctions and Privatization,” in Privatization: Symposium in Honour of Herbert Giersch (ed. by H. Siebert), Tübingen: Mohr, 115–136. Matthews, S. A. (1984), “Information Acquisition in Discriminatory Auctions,” in Bayesian Models in Economic Theory (ed. by M. Boyer and R. E. Kihlstrom), New York: North-Holland, 181–207. Maynard Smith, J. (1974), “The Theory of Games and the Evolution of Animal Conflicts,” Journal of Theoretical Biology, 47, 209–219. McAdams, D. (1998), “Adjustable Supply and ‘Collusive-Seeming Equilibria’ in the Uniform-Price Share Auction,” Working Paper, Stanford University. McAfee, R. P. and J. McMillan (1996), “Analyzing the Airwaves Auction,” Journal of Economic Perspectives, 10, 159–175. McMillan, J. (1994), “Selling Spectrum Rights,” Journal of Economic Perspectives, 8, 145–162.

Menezes, F. (1996), “Multiple-Unit English Auctions,” European Journal of Political Economy, 12, 671–684. Menezes, F., P. K. Monteiro, and A. Temimi (2000), “Discrete Public Goods with Incomplete Information,” Working Paper, EPGE/FGV. Milgrom, P. R. (1979), “A Convergence Theorem for Competitive Bidding with Differential Information,” Econometrica, 47, 679–688. Milgrom, P. R. (1981), “Rational Expectations, Information Acquisition, and Competitive Bidding,” Econometrica, 49, 921–943. Milgrom, P. R. (1985), “The Economics of Competitive Bidding: A Selective Survey,” in Social Goals and Social Organization: Essays in Memory of Elisha Pazner, (ed. by L. Hurwicz, D. Schmeidler, and H. Sonnenschein), Cambridge: Cambridge University Press. Milgrom, P. R. (1987), “Auction Theory,” in Advances in Economic Theory–Fifth World Congress, (ed. by T. F. Bewley), Cambridge:Cambridge University Press. Milgrom, P. R. (forthcoming), Putting Auction Theory to Work. Cambridge:Cambridge University Press. Milgrom, P. R. and R. J. Weber (1982), “A Theory of Auctions and Competitive Bidding,” Econometrica, 50, 1089–1122. Mussa, M. and S. Rosen (1978), “Monopoly and Product Quality,” Journal of Economic Theory, 18, 301–317. Myerson, R. B. (1981), “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73. Newbery, D. M. (1998), “Competition, Contracts, and Entry in the Electricity Spot Market,” Rand Journal of Economics, 29(4), 726–749. Nyborg, K. and S. Sundaresan (1996), “Discriminatory Versus Uniform Treasury Auctions: Evidence from When-Issued Transactions,” Journal of Financial Economics, 42, 63–104. Ofﬁce of Gas and Electricity Markets (1999), The New Electricity Trading Arrangements, July, available at www.open.gov.uk/offer/reta.htm. Ortega-Reichert, A. (1968), Models for Competitive Bidding under Uncertainty. Stanford University Ph.D. Thesis (and Technical Report No. 8, Department of Operations Research, Stanford University). [Chapter 8 reprinted with foreword by S. A. 
Board and P. D. Klemperer, in P. D. Klemperer (Ed.) (2000), The Economic Theory of Auctions, Cheltenham, UK: Edward Elgar.] Persico, N. (2000), “Games of Redistribution Politics are Equivalent to All-Pay Auctions with Consolation Prizes,” Working Paper, University of Pennsylvania. Pesendorfer, W. and J. M. Swinkels (1997), “The Loser’s Curse and Information Aggregation in Common Value Auctions,” Econometrica, 65, 1247–1281. Pesendorfer, W. and J. M. Swinkels (2000), “Efﬁciency and Information Aggregation in Auctions,” American Economic Review, 90(3), 499–525. Plott, C. (1997), “Laboratory Experimental Testbeds: Application to the PCS Auction,” Journal of Economics and Management Strategy, 6(3), 605–638. Radiocommunications Agency (1998a), “UMTS Auction Design.” UMTS Auction Consultative Group Report, 98, 14, available at www.spectrumauctions.gov.uk. Radiocommunications Agency. (1998b), “UMTS Auction Design 2.” UMTS Auction Consultative Group Report, 98, 16, available at www.spectrumauctions.gov.uk. Riley, J. G. (1980), “Strong Evolutionary Equilibrium and the War of Attrition,” Journal of Theoretical Biology, 82, 383–400. Riley, J. G. and W. F. Samuelson (1981), “Optimal Auctions,” American Economic Review, 71, 381–392.

Robinson, M. S. (1985), “Collusion and the Choice of Auction,” Rand Journal of Economics, 16, 141–145. Rosenthal, R. W. (1980), “A Model in Which an Increase in the Number of Sellers Leads to a Higher Price,” Econometrica, 48(6), 1575–1579. Rothkopf, M. H. (1969), “A Model of Rational Competitive Bidding,” Management Science, 15, 362–373. Salant, D. (1997), “Up in the Air: GTE’s Experience in the MTA Auction for Personal Communication Services Licenses,” Journal of Economics and Management Strategy, 6(3), 549–572. Scott Morton, F., F. Zettelmeyer, and J. Silva Risso (2001), “Internet Car Retailing,” Working Paper, Yale University. Spence, M. A. (1972), “Market Signalling: The Informational Structure of Job Markets and Related Phenomena,” Ph.D. Thesis, Harvard University. Spulber, D. F. (1995), “Bertrand Competition When Rivals’ Costs Are Unknown, ” Journal of Industrial Economics, 43, 1–12. Stevens, M. (1994), “Labour Contracts and Efﬁciency in On-the-Job Training,” Economic Journal, March, 104(423), 408–419. Stevens, M. (2000), “Reconciling Theoretical and Empirical Human Capital Earnings Functions,” Working Paper, Nufﬁeld College, Oxford University. Stigler, G. J. (1964), “A Theory of Oligopoly,” Journal of Political Economy, 72, 44–61. Swinkels, J. M. (2001), “Efﬁciency of Large Private Value Auctions,” Econometrica, 69 37–68. Vickrey, W. (1961), “Counterspeculation, Auctions, and Competitive Sealed Tenders,” Journal of Finance, 16, 8–37. Vickrey, W. (1962), “Auction and Bidding Games,” in Recent Advances in Game Theory, Princeton, NJ: The Princeton University Conference, 15–27. Vickrey, W. (1976), “Auctions Markets and Optimum Allocations,” in Bidding and Auctioning for Procurement and Allocation: Studies in Game Theory and Mathematical Economics, (ed. by Y. Amihud), New York: New York University Press, 13–20. Vives, X. 
(1999), “Information Aggregation, Strategic Behavior, and Efficiency in Cournot Markets,” Discussion Paper, Institut d’Anàlisi Econòmica (CSIC, Barcelona). Weber, R. J. (1997), “Making More from Less: Strategic Demand Reduction in the FCC Spectrum Auctions,” Journal of Economics and Management Strategy, 6(3), 529–548. Wilson, R. (1967), “Competitive Bidding with Asymmetric Information,” Management Science, 13, A816–A820. Wilson, R. (1969), “Competitive Bidding with Disparate Information,” Management Science, 15, 446–448. Wilson, R. (1977), “A Bidding Model of Perfect Competition,” Review of Economic Studies, 44, 511–518. Wilson, R. (1979), “Auctions of Shares,” Quarterly Journal of Economics, 93, 675–689. Wolfram, C. D. (1998), “Strategic Bidding in a Multiunit Auction: An Empirical Analysis of Bids to Supply Electricity in England and Wales,” Rand Journal of Economics, 29(4), 703–725. Wolfram, C. D. (1999), “Measuring Duopoly Power in the British Electricity Spot Market,” American Economic Review, 89, 805–826. Zheng, C. (1999), “High Bids and Broke Winners,” mimeo, University of Minnesota.

CHAPTER 3

Global Games: Theory and Applications

Stephen Morris and Hyun Song Shin

1. INTRODUCTION

Many economic problems are naturally modeled as a game of incomplete information, where a player’s payoff depends on his own action, the actions of others, and some unknown economic fundamentals. For example, many accounts of currency attacks, bank runs, and liquidity crises give a central role to players’ uncertainty about other players’ actions. Because other players’ actions in such situations are motivated by their beliefs, the decision maker must take account of the beliefs held by other players.

We know from the classic contribution of Harsanyi (1967–1968) that rational behavior in such environments depends not only on economic agents’ beliefs about economic fundamentals, but also on higher-order beliefs – i.e., players’ beliefs about other players’ beliefs, players’ beliefs about other players’ beliefs about other players’ beliefs, and so on. Indeed, Mertens and Zamir (1985) have shown how one can give a complete description of the “type” of a player in an incomplete information game in terms of a full hierarchy of beliefs at all levels. In principle, optimal strategic behavior should be analyzed in the space of all possible infinite hierarchies of beliefs; however, such analysis is highly complex for players and analysts alike and is likely to prove intractable in general. It is therefore useful to identify strategic environments with incomplete information that are rich enough to capture the important role of higher-order beliefs in economic settings, but simple enough to allow tractable analysis.

Global games, first studied by Carlsson and van Damme (1993a), represent one such environment. Uncertain economic fundamentals are summarized by a state θ and each player observes a different signal of the state with a small amount of noise. Assuming that the noise technology is common knowledge among the players, each player’s signal generates beliefs about fundamentals, beliefs about other players’ beliefs about fundamentals, and so on.
Our purpose in this paper is to describe how such models work, how global game reasoning can be applied to economic problems, and how this analysis relates to more general analysis of higher-order beliefs in strategic settings.
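As a rough illustration of the information structure just described, the following sketch shows how one noisy signal pins down both a belief about the state and a belief about what another player observed. Everything specific here is an assumption for illustration rather than the chapter's own specification: signals take the form x_i = θ + σε_i with independent standard normal noise, the prior on θ is diffuse, and all function names are hypothetical.

```python
import random
from statistics import NormalDist


def draw_signals(theta: float, sigma: float, n: int, rng: random.Random) -> list:
    """Each of n players observes the state theta plus independent Gaussian noise."""
    return [theta + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]


def belief_about_state(x_i: float, sigma: float) -> NormalDist:
    """Posterior over theta given own signal x_i, assuming a diffuse prior."""
    return NormalDist(mu=x_i, sigma=sigma)


def belief_about_other_signal(x_i: float, sigma: float) -> NormalDist:
    """Belief about another player's signal: own noise and the other's noise both enter."""
    return NormalDist(mu=x_i, sigma=sigma * 2 ** 0.5)


def prob_other_below(cutoff: float, x_i: float, sigma: float) -> float:
    """Probability that another player's signal lies below a given cutoff."""
    return belief_about_other_signal(x_i, sigma).cdf(cutoff)


if __name__ == "__main__":
    rng = random.Random(0)
    signals = draw_signals(theta=1.0, sigma=0.1, n=5, rng=rng)
    x = signals[0]
    print("own signal:", x)
    print("belief about state:", belief_about_state(x, 0.1))
    print("P(other's signal below own):", prob_other_below(x, x, 0.1))
```

Under these assumptions a player always assigns probability one half to another player's signal lying below his own, whatever the noise level, which gives a first sense of how little the player learns about his rank among the other players even when he learns a great deal about θ.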

One theme that emerges is that taking higher-order beliefs seriously does not require extremely sophisticated reasoning on the part of players. In Section 2, we present a benchmark result for binary action continuum player games with strategic complementarities where each player has the same payoff function. In a global games setting, there is a unique equilibrium where each player chooses the action that is a best response to a uniform belief over the proportion of his opponents choosing each action. Thus, when faced with some information concerning the underlying state of the world, the prescription for each player is to hypothesize that the proportion of other players who will opt for a particular action is a random variable that is uniformly distributed over the unit interval and choose the best action under these circumstances. We dub such beliefs (and the actions that they elicit) as being Laplacian, following Laplace’s (1824) suggestion that one should apply a uniform prior to unknown events from the “principle of insufﬁcient reason.” A striking feature of this conclusion is that it reconciles Harsanyi’s fully rational view of optimal behavior in incomplete information settings with the dissenting view of Kadane and Larkey (1982) and others that rational behavior in games should imply only that each player chooses an optimal action in the light of his subjective beliefs about others’ behavior, without deducing his subjective beliefs as part of the theory. If we let those subjective beliefs be the agnostic Laplacian prior, then there is no contradiction with Harsanyi’s view that players should deduce rational beliefs about others’ behavior in incomplete information settings. The importance of such analysis is not that we have an adequate account of the subtle reasoning undertaken by the players in the game; it clearly does not do justice to the reasoning inherent in the Harsanyi program. 
Rather, its importance lies in the fact that we have access to a form of short-cut, or heuristic device, that allows the economist to identify the actual outcomes in such games, and thereby open up the possibility of systematic analysis of economic questions that may otherwise appear to be intractable.

One instance of this can be found in the debate concerning self-fulfilling beliefs and multiple equilibria. If one set of beliefs motivates actions that bring about the state of affairs envisaged in those beliefs, while another set of self-fulfilling beliefs brings about quite different outcomes, then there is an apparent indeterminacy in the theory. In both cases, the beliefs are logically coherent, consistent with the known features of the economy, and are borne out by subsequent events. However, we do not have any guidance on which outcome will transpire without an account of how the initial beliefs are determined. We have argued elsewhere (Morris and Shin, 2000) that the apparent indeterminacy of beliefs in many models with multiple equilibria can be seen as the consequence of two modeling assumptions introduced to simplify the theory. First, the economic fundamentals are assumed to be common knowledge. Second, economic agents are assumed to be certain about others’ behavior in equilibrium. Both assumptions are made for the sake of tractability, but they do much more besides.

They allow agents’ actions and beliefs to be perfectly coordinated in a way that invites multiplicity of equilibria. In contrast, global games allow theorists to model information in a more realistic way, and thereby escape this straitjacket. More importantly, through the heuristic device of Laplacian actions, global games allow modelers to pin down which set of self-fulﬁlling beliefs will prevail in equilibrium. As well as any theoretical satisfaction at identifying a unique outcome in a game, there are more substantial issues at stake. Global games allow us to capture the idea that economic agents may be pushed into taking a particular action because of their belief that others are taking such actions. Thus, inefﬁcient outcomes may be forced on the agents by the external circumstances even though they would all be better off if everyone refrained from such actions. Bank runs and ﬁnancial crises are prime examples of such cases. We can draw the important distinction between whether there can be inefﬁcient equilibrium outcomes and whether there is a unique outcome in equilibrium. Global games, therefore, are of more than purely theoretical interest. They allow more enlightened debate on substantial economic questions. In Section 2.3, we discuss applications that model economic problems using global games. Global games open up other interesting avenues of investigation. One of them is the importance of public information in contexts where there is an element of coordination between the players. There is plentiful anecdotal evidence from a variety of contexts that public information has an apparently disproportionate impact relative to private information. Financial markets apparently “overreact” to announcements from central bankers that merely state the obvious, or reafﬁrm widely known policy stances. But a closer look at this phenomenon with the beneﬁt of the insights given by global games makes such instances less mysterious. 
If market participants are concerned about the reaction of other participants to the news, the public nature of the news conveys more information than simply the “face value” of the announcement. It conveys important strategic information on the likely beliefs of other market participants. In this case, the “overreaction” would be entirely rational and determined by the type of equilibrium logic inherent in a game of incomplete information. In Section 3, these issues are developed more systematically. Global games can be seen as a particular instance of equilibrium selection through perturbations. The set of perturbations is especially rich because it turns out that they allow for a rich structure of higher-order beliefs. In Section 4, we delve somewhat deeper into the properties of general global games, not merely those whose action sets are binary. We discuss how global games are related to other notions of equilibrium refinements and what is the nature of the perturbation implicit in global games. The general framework allows us to disentangle two properties of global games. The first property is that a unique outcome is selected in the game. A second, more subtle, question is how such a unique outcome depends on the underlying information structure and the noise in the players’ signals. Although in some cases the outcome is sensitive to the details of the information structure, there are cases where a particular outcome is selected and where this outcome turns out to be robust to the form of the noise in the players’ signals. The theory of “robustness to incomplete information” as developed by Kajii and Morris (1997) holds the key to this property. We also discuss a larger theoretical literature on higher-order beliefs and the relation to global games. In Section 5, we show how recent work on local interaction games and dynamic games with payoff shocks uses a similar logic to global games in reaching unique predictions.

2. SYMMETRIC BINARY ACTION GLOBAL GAMES

2.1. Linear Example

Let us begin with the following example taken from Carlsson and van Damme (1993a). Two players are deciding whether to invest. There is a safe action (not invest); there is a risky action (invest) that gives a higher payoff if the other player invests. Payoffs are given in Table 3.1:

Table 3.1. Payoffs of leading example

             Invest        NotInvest
Invest       θ, θ          θ − 1, 0
NotInvest    0, θ − 1      0, 0           (2.1)

If there was complete information about θ, there would be three cases to consider:

• If θ > 1, each player has a dominant strategy to invest.
• If θ ∈ [0, 1], there are two pure strategy Nash equilibria: both invest and both not invest.
• If θ < 0, each player has a dominant strategy not to invest.

But there is incomplete information about θ. Player i observes a private signal $x_i = \theta + \varepsilon_i$. Each $\varepsilon_i$ is independently normally distributed with mean 0 and standard deviation σ. We assume that θ is randomly drawn from the real line, with each realization equally likely. This implies that a player observing signal x considers θ to be distributed normally with mean x and standard deviation σ. This in turn implies that he thinks his opponent’s signal $x'$ is normally distributed with mean x and standard deviation $\sqrt{2}\,\sigma$. The assumption that θ is uniformly distributed on the real line is nonstandard, but presents no technical difficulties. Such “improper priors” (with an infinite mass) are well behaved, as long as we are concerned only with conditional beliefs. See Hartigan (1983) for a discussion of improper priors. We will also see later that an improper prior can be seen as a limiting case either as the prior distribution of θ becomes diffuse or as the standard deviation of the noise σ becomes small.

A strategy is a function specifying an action for each possible private signal; a natural kind of strategy we might consider is one where a player takes the risky action only if he observes a private signal above some cutoff point, k:

$$
s(x) =
\begin{cases}
\text{Invest}, & \text{if } x > k \\
\text{NotInvest}, & \text{if } x \le k.
\end{cases}
$$

We will refer to this strategy as the switching strategy around k. Now suppose that a player observed signal x and thought that his opponent was following such a “switching” strategy with cutoff point k. His expectation of θ will be x. He will assign probability $\Phi\big((k - x)/(\sqrt{2}\,\sigma)\big)$ to his opponent observing a signal less than k [where $\Phi(\cdot)$ is the c.d.f. of the standard normal distribution]. In particular, if he has observed a signal equal to the cutoff point of his opponent (x = k), he will assign probability $\tfrac{1}{2}$ to his opponent investing. Thus, there will be an equilibrium where both players follow switching strategies with cutoff $\tfrac{1}{2}$. In fact, a switching strategy with cutoff $\tfrac{1}{2}$ is the unique strategy surviving iterated deletion of strictly interim-dominated strategies. To see why,¹ first define b(k) to be the unique value of x solving the equation

$$
x - \Phi\left(\frac{k - x}{\sqrt{2}\,\sigma}\right) = 0. \tag{2.2}
$$

¹ An alternative argument follows Milgrom and Roberts (1990): if a symmetric game with strategic complementarities has a unique symmetric Nash equilibrium, then the strategy played in that unique Nash equilibrium is also the unique strategy surviving iterated deletion of strictly dominated strategies.

[Figure 3.1. Function b(k). Horizontal axis: k; vertical axis: b(k); tick marks at 0, 0.5, and 1.]

The function b(·) is plotted in Figure 3.1. There is a unique such value because the left-hand side is strictly increasing in x and strictly decreasing in k. These properties also imply that b(·) is strictly increasing. So, if your opponent is following a switching strategy with cutoff k, your best response is to follow a switching strategy with cutoff b(k). We will argue that if a strategy s survives n rounds of iterated deletion of strictly dominated strategies, then

$$
s(x) =
\begin{cases}
\text{Invest}, & \text{if } x > b^{n-1}(1) \\
\text{NotInvest}, & \text{if } x < b^{n-1}(0).
\end{cases}
\tag{2.3}
$$
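The bounds in (2.3) can be traced numerically. The following is a minimal sketch rather than part of the original analysis: it assumes the convention that $b^0(k) = k$, picks σ = 0.1 purely for illustration, and uses hypothetical helper names (`normal_cdf`, `best_response_cutoff`, `iterate_bounds`). It solves (2.2) by bisection and iterates the map b from the starting points 0 and 1.

```python
import math


def normal_cdf(z):
    """Standard normal c.d.f., written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def best_response_cutoff(k, sigma, tol=1e-12):
    """Solve x - Phi((k - x) / (sqrt(2) * sigma)) = 0 for x by bisection.

    The left-hand side is strictly increasing in x, negative at x = 0
    and positive at x = 1, so the root b(k) lies in (0, 1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        gain = mid - normal_cdf((k - mid) / (math.sqrt(2.0) * sigma))
        if gain > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def iterate_bounds(sigma, rounds):
    """Return the lists [b^n(0)] and [b^n(1)] for n = 0, ..., rounds."""
    lower, upper = [0.0], [1.0]
    for _ in range(rounds):
        lower.append(best_response_cutoff(lower[-1], sigma))
        upper.append(best_response_cutoff(upper[-1], sigma))
    return lower, upper


lower, upper = iterate_bounds(sigma=0.1, rounds=200)
for n in (0, 1, 5, 50, 200):
    print(n, round(lower[n], 6), round(upper[n], 6))
```

Under these assumptions the two sequences close in on the common cutoff from below and from above. A smaller σ slows the contraction, so more rounds would be needed.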

We argue the second clause by induction (the argument for the first clause is symmetric). The claim is true for n = 1, because as we noted previously, NotInvest is a dominant strategy if the expected value of θ is less than 0. Now, suppose the claim is true for arbitrary n. If a player knew that his opponent would choose action NotInvest if he had observed a signal less than $b^{n-1}(0)$, his best response would always be to choose action NotInvest if his signal was less than $b(b^{n-1}(0))$. Because b(·) is strictly increasing and has a unique fixed point at $\tfrac{1}{2}$, $b^n(0)$ and $b^n(1)$ both tend to $\tfrac{1}{2}$ as n → ∞. The unique equilibrium has both players investing only if they observe a signal greater than $\tfrac{1}{2}$. In the underlying symmetric payoff complete information game, investing is a risk dominant action (Harsanyi and Selten, 1988), exactly if θ ≥ $\tfrac{1}{2}$; not investing is a risk dominant action exactly if θ ≤ $\tfrac{1}{2}$. The striking feature of this result is that no matter how small σ is, players’ behavior is influenced by the existence of the ex ante possibility that their opponent has a dominant strategy to choose each action.² The probability that either individual invests is $1 - \Phi\big((\tfrac{1}{2} - \theta)/\sigma\big)$; conditional on θ, their investment decisions are independent. The previous example and analysis are due to Carlsson and van Damme (1993a).

There is a many-players analog of this game, whose solution is no more difficult to arrive at. A continuum of players is deciding whether to invest. The payoff to not investing is 0. The payoff to investing is θ − 1 + l, where l is the proportion of other players choosing to invest. The information structure is as before, with each player i observing a private signal $x_i = \theta + \varepsilon_i$, where the $\varepsilon_i$ are normally distributed in the population with mean 0 and standard deviation σ.
Also in this case, the unique strategy surviving iterated deletion of strictly dominated strategies has each player investing if they observe a signal above $\tfrac{1}{2}$ and not investing if they observe a signal below $\tfrac{1}{2}$. We will briefly sketch why this is the case. Consider a player who has observed signal x and thinks that all his opponents are following the “switching” strategy with cutoff point k. As before, his expectation of θ will be x. As before, he will assign probability $\Phi\big((k - x)/(\sqrt{2}\,\sigma)\big)$ to any given opponent observing a signal less than k. But, because the realizations of the signals are independent conditional on θ, his expectation of the proportion of players who observe a signal less than k will be exactly equal to the probability he assigns to any one opponent observing a signal less than k. Thus, his expected payoff to investing will be $x - \Phi\big((k - x)/(\sqrt{2}\,\sigma)\big)$, as before, and all the previous arguments go through.

² Thus, a “grain of doubt” concerning the opponent’s behavior has large consequences. This element has been linked by van Damme (1997) to the classic analysis of surprise attacks of Schelling (1960), Chapter 9.

This argument shows the importance of keeping track of the layers of beliefs across players, and as such may seem rather daunting from the point of view of an individual player. However, the equilibrium outcome is also consistent with a procedure that places far fewer demands on the capacity of the players, and that seems to be far removed from equilibrium of any kind. This procedure has the following three steps.

• Estimate θ from the signal x.
• Postulate that l is distributed uniformly on the unit interval [0, 1].
• Take the optimal action.

Because the expectation of θ conditional on x is simply x itself, the expected payoff to investing if l is uniformly distributed is $x - \tfrac{1}{2}$, whereas the expected payoff to not investing is zero. Thus, a player following this procedure will choose to invest or not depending on whether x is greater or smaller than $\tfrac{1}{2}$, which is identical to the unique equilibrium strategy previously outlined. The belief summarized in the second bullet point is Laplacian in the sense introduced in the introductory section. It represents a “diffuse” or “agnostic” view on the actions of other players in the game. We see that an apparently naive and simplistic strategy coincides with the equilibrium strategy. This is not an accident. There are good reasons why the Laplacian action is the correct one in this game, and why it turns out to be an approximately optimal action in many binary action global games. The key to understanding this feature is to consider the following question asked by a player in this game.
“My signal has realization x. What is the probability that proportion less than z of my opponents have a signal higher than mine?”

The answer to this question would be especially important if everyone is using the switching strategy around x, since the proportion of players who invest is equal to the proportion whose signal is above x. If the true state is θ, the proportion of players who receive a signal higher than x is given by $1 - \Phi\big((x - \theta)/\sigma\big)$. So, this proportion is less than z if the state θ is such that $1 - \Phi\big((x - \theta)/\sigma\big) \le z$. That is, when $\theta \le x - \sigma\,\Phi^{-1}(1 - z)$. The probability of this event conditional on x is

$$
\Phi\left(\frac{x - \sigma\,\Phi^{-1}(1 - z) - x}{\sigma}\right) = z. \tag{2.4}
$$
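The identity in (2.4) is easy to check numerically. The snippet below is a minimal sketch under the normal-noise assumption of this example; the function name `prob_proportion_above_is_below` and the parameter values are illustrative choices, not taken from the text.

```python
from statistics import NormalDist


def prob_proportion_above_is_below(z, x, sigma):
    """P(share of opponents with a signal above x is at most z | own signal x).

    The share 1 - Phi((x - theta) / sigma) is at most z exactly when
    theta <= x - sigma * Phi^{-1}(1 - z), and theta given x is N(x, sigma^2).
    """
    threshold = x - sigma * NormalDist().inv_cdf(1.0 - z)
    return NormalDist(mu=x, sigma=sigma).cdf(threshold)


for x, sigma in [(0.5, 0.1), (-2.0, 1.0), (3.0, 0.01)]:
    for z in (0.1, 0.25, 0.5, 0.9):
        p = prob_proportion_above_is_below(z, x, sigma)
        print(x, sigma, z, round(p, 10))
```

The computed probability should match z whatever the signal x and the noise scale σ, which is the sense in which the belief about one's rank in the population is uniform.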


In other words, the cumulative distribution function of z is the identity function, implying that the density of z is uniform over the unit interval. If x is to serve as the switching point of an equilibrium switching strategy, a player must be indifferent between choosing to invest and not to invest given that the proportion who invest is uniformly distributed on [0, 1]. More importantly, even away from the switching point, the optimal action motivated by this belief coincides with the equilibrium action, even though the (Laplacian) belief may not be correct. Away from the switching point, the density of the random variable representing the proportion of players who invest will not be uniform. However, as long as the payoff advantage to investing is increasing in θ, the Laplacian action coincides with the equilibrium action. Thus, the apparently naive procedure outlined by the three bulleted points gives the correct prediction as to what the equilibrium action will be.

In the next section, we will show that the lessons drawn from this simple example extend to cover a wide class of binary action global games. We will focus on the continuum player case in most of this paper. However, as suggested by this example, the qualitative analysis is very similar irrespective of the number of players. In particular, the analysis of the continuum player game with linear payoffs applies equally well to any finite number of players (where each player observes a signal with an independent normal noise term). Independent of the number of players, the cutoff signal in the unique equilibrium is $\tfrac{1}{2}$. However, a distinctive implication of the infinite player case is that the outcome is a deterministic function of the realized state. In particular, once we know the realization of θ, we can calculate exactly the proportion of players who will invest. It is

$$
\xi(\theta) = 1 - \Phi\left(\frac{\tfrac{1}{2} - \theta}{\sigma}\right).
$$

With a finite number of players (I), we write $\xi_{\lambda,I}(\theta)$ for the probability that at least proportion λ out of the I players invest when the realized state is θ:

$$
\xi_{\lambda,I}(\theta) = \sum_{n \ge \lambda I} \binom{I}{n}
\left[\Phi\left(\frac{\tfrac{1}{2} - \theta}{\sigma}\right)\right]^{I-n}
\left[1 - \Phi\left(\frac{\tfrac{1}{2} - \theta}{\sigma}\right)\right]^{n}.
$$

Observe, however, that the many finite player case converges naturally to the continuum model: by the law of large numbers, as I → ∞,

$$
\xi_{\lambda,I}(\theta) \to 1 \quad \text{if } \lambda < \xi(\theta)
\qquad \text{and} \qquad
\xi_{\lambda,I}(\theta) \to 0 \quad \text{if } \lambda > \xi(\theta).
$$
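The convergence of the finite-player case can be illustrated directly from the binomial formula. This is a minimal sketch with hypothetical function names and illustrative parameter values (θ = 0.55, σ = 0.1, so that ξ(θ) is roughly 0.69); it evaluates the sum term by term and is therefore only suitable for moderate numbers of players.

```python
import math
from statistics import NormalDist


def invest_probability(theta, sigma, cutoff=0.5):
    """xi(theta): probability that one player's signal exceeds the cutoff."""
    return 1.0 - NormalDist().cdf((cutoff - theta) / sigma)


def prob_at_least_share(lam, players, theta, sigma):
    """xi_{lam, I}(theta): probability that at least lam * I of I players invest."""
    q = invest_probability(theta, sigma)
    first = math.ceil(lam * players)
    return sum(
        math.comb(players, n) * q**n * (1.0 - q) ** (players - n)
        for n in range(first, players + 1)
    )


theta, sigma = 0.55, 0.1
print("xi(theta) =", round(invest_probability(theta, sigma), 4))
for players in (10, 100, 400):
    below = prob_at_least_share(0.5, players, theta, sigma)   # lambda < xi(theta)
    above = prob_at_least_share(0.75, players, theta, sigma)  # lambda > xi(theta)
    print(players, round(below, 4), round(above, 4))
```

As the number of players grows, the first probability should approach 1 and the second should approach 0, in line with the law of large numbers argument.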


2.2. Symmetric Binary Action Global Games: A General Approach

Let us now take one step in making the argument more general. We deal first with the case where there is a uniform prior on the initial state, and each player’s signal is a sufficient statistic for how much they care about the state (we call this the private values case). In this case, the analysis is especially clean, and it is possible to prove a uniqueness result and characterize the unique equilibrium independent of both the structure and size of the noise in players’ signals. We then show that the analysis can be extended to deal with general priors and payoffs that depend on the realized state.

2.2.1. Continuum Players: Uniform Prior and Private Values

There is a continuum of players. Each player has to choose an action a ∈ {0, 1}. All players have the same payoff function, $u : \{0, 1\} \times [0, 1] \times \mathbb{R} \to \mathbb{R}$, where u(a, l, x) is a player’s payoff if he chooses action a, proportion l of his opponents choose action 1, and his “private signal” is x. Thus, we assume that his payoff is independent of which of his opponents choose action 1. To analyze best responses, it is enough to know the payoff gain from choosing one action rather than the other. Thus, the utility function is parameterized by a function $\pi : [0, 1] \times \mathbb{R} \to \mathbb{R}$ with π(l, x) ≡ u(1, l, x) − u(0, l, x). Formally, we say that an action is the Laplacian action if it is a best response to a uniform prior over the opponents’ choice of action. Thus, action 1 is the Laplacian action at x if

$$
\int_{l=0}^{1} u(1, l, x)\,dl > \int_{l=0}^{1} u(0, l, x)\,dl,
$$

or, equivalently,

$$
\int_{l=0}^{1} \pi(l, x)\,dl > 0;
$$

action 0 is the Laplacian action at x if

$$
\int_{l=0}^{1} \pi(l, x)\,dl < 0.
$$
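The definition translates into a short computation. The sketch below is illustrative only: the helper names are hypothetical, the integral is approximated by a midpoint rule, and the payoff gain used is the linear investment example of Section 2.1, where investing yields x − 1 + l and not investing yields 0.

```python
def laplacian_gain(payoff_gain, x, grid=10_000):
    """Approximate the integral of pi(l, x) over l in [0, 1] (midpoint rule)."""
    step = 1.0 / grid
    return sum(payoff_gain((i + 0.5) * step, x) for i in range(grid)) * step


def laplacian_action(payoff_gain, x):
    """Return 1 or 0 according to the sign of the integrated payoff gain.

    Returns None in the knife-edge case where the integral is numerically zero.
    """
    gain = laplacian_gain(payoff_gain, x)
    if abs(gain) < 1e-12:
        return None
    return 1 if gain > 0.0 else 0


def linear_gain(l, x):
    """pi(l, x) = x + l - 1, the investment example of Section 2.1."""
    return x + l - 1.0


for x in (0.2, 0.4, 0.6, 0.8):
    print(x, round(laplacian_gain(linear_gain, x), 6), laplacian_action(linear_gain, x))
```

For the linear example the integrated gain is x − 1/2, so the Laplacian action switches from 0 to 1 as the signal crosses 1/2, matching the three-step procedure described for that example.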

Generically, a continuum player, symmetric payoff, two-action game will have exactly one Laplacian action. A state $\theta \in \mathbb{R}$ is drawn according to the (improper) uniform density on the real line. Player i observes a private signal $x_i = \theta + \sigma\varepsilon_i$, where σ > 0. The noise terms $\varepsilon_i$ are distributed in the population with continuous density f(·), with support on the real line.³ We note that this density need not be symmetric around the mean, nor even have zero mean. The uniform prior on the real line is “improper” (i.e., has infinite probability mass), but the conditional probabilities are well defined: a player observing signal $x_i$ puts density $(1/\sigma)\,f\big((x_i - \theta)/\sigma\big)$ on state θ (see Hartigan 1983). The example of the previous section fits this setting, where f(·) is the density of the standard normal distribution and π(l, x) = x + l − 1. We will initially impose five properties on the payoffs:

A1: Action Monotonicity: π(l, θ) is nondecreasing in l.
A2: State Monotonicity: π(l, θ) is nondecreasing in θ.
A3: Strict Laplacian State Monotonicity: There exists a unique $\theta^*$ solving $\int_{l=0}^{1} \pi(l, \theta^*)\,dl = 0$.
A4: Limit Dominance: There exist $\underline{\theta} \in \mathbb{R}$ and $\bar{\theta} \in \mathbb{R}$, such that [1] π(l, x) < 0 for all l ∈ [0, 1] and $x \le \underline{\theta}$; and [2] π(l, x) > 0 for all l ∈ [0, 1] and $x \ge \bar{\theta}$.
A5: Continuity: $\int_{l=0}^{1} g(l)\,\pi(l, x)\,dl$ is continuous with respect to signal x and density g.

Condition A1 states that the incentive to choose action 1 is increasing in the proportion of other players who choose action 1; thus there are strategic complementarities between players’ actions (Bulow, Geanakoplos, and Klemperer, 1985). Condition A2 states that the incentive to choose action 1 is increasing in the state; thus a player’s optimal action will be increasing in the state, given the opponents’ actions. Condition A3 introduces a further strengthening of A2 to ensure that there is at most one crossing for a player with Laplacian beliefs. Condition A4 requires that action 0 is a dominant strategy for sufficiently low signals, and action 1 is a dominant strategy for sufficiently high signals. Condition A5 is a weak continuity property, where continuity in g is with respect to the weak topology. Note that this condition allows for some discontinuities in payoffs. For example,

$$
\pi(l, x) =
\begin{cases}
0, & \text{if } l \le x \\
1, & \text{if } l > x
\end{cases}
$$

satisfies A5 as for any given x, it is discontinuous at only one value of l.

We denote by $G^*(\sigma)$ this incomplete information game, with the uniform prior and satisfying A1 through A5. A strategy for a player in the incomplete information game is a function $s : \mathbb{R} \to \{0, 1\}$, where s(x) is the action chosen if a player observes signal x. We will be interested in strategy profiles, $s = (s_i)_{i \in [0,1]}$, that form a Bayesian Nash equilibrium of $G^*(\sigma)$. We will show not merely that there is a unique Bayesian Nash equilibrium of the game, but that a unique strategy profile survives iterated deletion of strictly (interim) dominated strategies.

³ With small changes in terminology, the argument will extend to the case where f(·) has support on some bounded interval of the real line.


Proposition 2.1. Let $\theta^*$ be defined as in (A3). The essentially unique strategy surviving iterated deletion of strictly dominated strategies in $G^*(\sigma)$ satisfies s(x) = 0 for all $x < \theta^*$ and s(x) = 1 for all $x > \theta^*$.

The “essential” qualification arises because either action may be played if the private signal is exactly equal to $\theta^*$. The key idea of the proof is that, with a uniform prior on θ, observing $x_i$ gives no information to a player on his ranking within the population of signals. Thus, he will have a uniform prior belief over the proportion of players who will observe higher signals.

Proof. Write $\pi^*_\sigma(x, k)$ for the expected payoff gain to choosing action 1 for a player who has observed a signal x and knows that all other players will choose action 0 if they observe signals less than k:

$$
\pi^*_\sigma(x, k) \equiv \int_{\theta=-\infty}^{\infty} \frac{1}{\sigma}\, f\left(\frac{x - \theta}{\sigma}\right) \pi\left(1 - F\left(\frac{k - \theta}{\sigma}\right),\, x\right) d\theta.
$$

First, observe that $\pi^*_\sigma(x, k)$ is continuous in x and k, increasing in x, and decreasing in k, $\pi^*_\sigma(x, k) < 0$ if $x \le \underline{\theta}$ and $\pi^*_\sigma(x, k) > 0$ if $x \ge \bar{\theta}$. We will argue by induction that a strategy survives n rounds of iterated deletion of strictly interim dominated strategies if and only if

$$
s(x) =
\begin{cases}
0, & \text{if } x < \underline{\xi}^{n} \\
1, & \text{if } x > \bar{\xi}^{n},
\end{cases}
$$

where $\underline{\xi}^{0} = -\infty$ and $\bar{\xi}^{0} = +\infty$, and $\underline{\xi}^{n}$ and $\bar{\xi}^{n}$ are defined inductively by

$$
\underline{\xi}^{n+1} = \min\{x : \pi^*_\sigma(x, \underline{\xi}^{n}) = 0\}
\qquad \text{and} \qquad
\bar{\xi}^{n+1} = \max\{x : \pi^*_\sigma(x, \bar{\xi}^{n}) = 0\}.
$$

Suppose the claim was true for n. By strategic complementarities, if action 1 were ever to be a best response to a strategy surviving n rounds, it must be a best response to the switching strategy with cutoff $\underline{\xi}^{n}$; $\underline{\xi}^{n+1}$ is defined to be the lowest signal where this occurs. Similarly, if action 0 were ever to be a best response to a strategy surviving n rounds, it must be a best response to the switching strategy with cutoff $\bar{\xi}^{n}$; $\bar{\xi}^{n+1}$ is defined to be the highest signal where this occurs.
Now note that $\underline{\xi}^{n}$ and $\bar{\xi}^{n}$ are increasing and decreasing sequences, respectively, because $\underline{\xi}^{0} = -\infty < \underline{\theta} < \underline{\xi}^{1}$, $\bar{\xi}^{0} = \infty > \bar{\theta} > \bar{\xi}^{1}$, and $\pi^*_\sigma(x, k)$ is increasing in x and decreasing in k. Thus, $\underline{\xi}^{n} \to \underline{\xi}$ and $\bar{\xi}^{n} \to \bar{\xi}$ as n → ∞. The continuity of $\pi^*_\sigma$ and the construction of $\underline{\xi}$ and $\bar{\xi}$ imply that we must have $\pi^*_\sigma(\underline{\xi}, \underline{\xi}) = 0$ and $\pi^*_\sigma(\bar{\xi}, \bar{\xi}) = 0$. Thus, the second step of our proof is to show that $\theta^*$ is the unique solution to the equation $\pi^*_\sigma(x, x) = 0$.

To see this second step, write $\Psi^*_\sigma(l; x, k)$ for the probability that a player assigns to proportion less than l of the other players observing a signal greater than k, if he has observed signal x. Observe that if the true state is θ, the proportion of players observing a signal greater than k is $1 - F\big((k - \theta)/\sigma\big)$. This proportion is less than l if $\theta \le k - \sigma F^{-1}(1 - l)$. So,

$$
\begin{aligned}
\Psi^*_\sigma(l; x, k)
&= \int_{\theta=-\infty}^{k - \sigma F^{-1}(1-l)} \frac{1}{\sigma}\, f\left(\frac{x - \theta}{\sigma}\right) d\theta \\
&= \int_{z = \frac{x-k}{\sigma} + F^{-1}(1-l)}^{\infty} f(z)\,dz, \quad \text{changing variables to } z = \frac{x - \theta}{\sigma}, \\
&= 1 - F\left(\frac{x - k}{\sigma} + F^{-1}(1 - l)\right).
\end{aligned}
\tag{2.6}
$$

Also observe that if x = k, then $\Psi^*_\sigma(\cdot\,; x, k)$ is the identity function [i.e., $\Psi^*_\sigma(l; x, k) = l$], so it is the cumulative distribution function of the uniform density. Thus,

$$
\pi^*_\sigma(x, x) = \int_{l=0}^{1} \pi(l, x)\,dl.
$$

Now by A3, $\pi^*_\sigma(x, x) = 0$ implies $x = \theta^*$. ∎

2.2.2. Continuum Players: General Prior and Common Values

Now suppose instead that θ is drawn from a continuously differentiable strictly positive density p(·) on the real line and that a player’s utility depends on the realized state θ, not his signal of θ. Thus, u(a, l, θ) is his payoff if he chooses action a, proportion l of his opponents choose action 1, and the state is θ, and as before, π(l, θ) ≡ u(1, l, θ) − u(0, l, θ). We must also impose two extra technical assumptions.

A4*: Uniform Limit Dominance: There exist $\underline{\theta} \in \mathbb{R}$, $\bar{\theta} \in \mathbb{R}$, and $\varepsilon \in \mathbb{R}_{++}$, such that [1] π(l, θ) ≤ −ε for all l ∈ [0, 1] and $\theta \le \underline{\theta}$; and [2] π(l, θ) > ε for all l ∈ [0, 1] and $\theta \ge \bar{\theta}$.

Property A4* strengthens property A4 by requiring that the payoff gain to choosing action 0 is uniformly positive for sufficiently low values of θ, and the payoff gain to choosing action 1 is uniformly positive for sufficiently high values of θ.

A6: Finite Expectations of Signals: $\int_{z=-\infty}^{\infty} z\,f(z)\,dz$ is well defined.

Property A6 requires that the distribution of noise is integrable. We will denote by G(σ) this incomplete information game, with prior p(·) and satisfying A1, A2, A3, A4*, A5, and A6.

Proposition 2.2. Let $\theta^*$ be defined as in A3. For any δ > 0, there exists $\bar{\sigma} > 0$ such that for all $\sigma \le \bar{\sigma}$, if strategy s survives iterated deletion of strictly dominated strategies in the game G(σ), then s(x) = 0 for all $x \le \theta^* - \delta$, and s(x) = 1 for all $x \ge \theta^* + \delta$.


We will sketch here why this general prior, common values, game G(σ) becomes like the uniform prior, private values, game $G^*(\sigma)$ as σ becomes small. A more formal proof is relegated to Appendix A. Consider $\Psi_\sigma(l; x, k)$, the probability that a player assigns to proportion less than or equal to l of the other players observing a signal greater than or equal to k, if he has observed signal x:

$$
\begin{aligned}
\Psi_\sigma(l; x, k)
&= \frac{\int_{\theta=-\infty}^{k - \sigma F^{-1}(1-l)} p(\theta)\, f\left(\frac{x - \theta}{\sigma}\right) d\theta}
        {\int_{\theta=-\infty}^{\infty} p(\theta)\, f\left(\frac{x - \theta}{\sigma}\right) d\theta} \\
&= \frac{\int_{z = \frac{x-k}{\sigma} + F^{-1}(1-l)}^{\infty} p(x - \sigma z)\, f(z)\,dz}
        {\int_{z=-\infty}^{\infty} p(x - \sigma z)\, f(z)\,dz},
\quad \text{changing variables to } z = \frac{x - \theta}{\sigma}.
\end{aligned}
$$

For small σ, the shape of the prior will not matter and the posterior beliefs over l will depend only on (x − k)/σ, the normalized difference between x and k. Formally, setting κ = (x − k)/σ, we have

$$
\Psi_\sigma(l; x, x - \sigma\kappa)
= \frac{\int_{z = \kappa + F^{-1}(1-l)}^{\infty} p(x - \sigma z)\, f(z)\,dz}
       {\int_{z=-\infty}^{\infty} p(x - \sigma z)\, f(z)\,dz},
$$

so that as σ → 0,

$$
\Psi_\sigma(l; x, x - \sigma\kappa) \to \int_{z = \kappa + F^{-1}(1-l)}^{\infty} f(z)\,dz
= 1 - F\big(\kappa + F^{-1}(1 - l)\big). \tag{2.7}
$$
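The limit in (2.7) can be checked numerically for a particular prior. The sketch below is illustrative only: it assumes a standard normal prior p and standard normal noise F, truncates the integrals to z ∈ [−10, 10], and uses a midpoint rule; the names `posterior_belief` and `limit_belief` are hypothetical.

```python
from statistics import NormalDist

NOISE = NormalDist()                   # assumed noise distribution F (standard normal)
PRIOR = NormalDist(mu=0.0, sigma=1.0)  # assumed proper prior p


def posterior_belief(l, x, kappa, sigma, grid=20_000, width=10.0):
    """Psi_sigma(l; x, x - sigma * kappa), integrating over z in [-width, width]."""
    lower_limit = kappa + NOISE.inv_cdf(1.0 - l)
    step = 2.0 * width / grid
    numerator = denominator = 0.0
    for i in range(grid):
        z = -width + (i + 0.5) * step
        weight = PRIOR.pdf(x - sigma * z) * NOISE.pdf(z)
        denominator += weight
        if z >= lower_limit:
            numerator += weight
    return numerator / denominator


def limit_belief(l, kappa):
    """Right-hand side of the limit: 1 - F(kappa + F^{-1}(1 - l))."""
    return 1.0 - NOISE.cdf(kappa + NOISE.inv_cdf(1.0 - l))


l, x, kappa = 0.3, 0.8, 0.5
for sigma in (1.0, 0.1, 0.01):
    gap = abs(posterior_belief(l, x, kappa, sigma) - limit_belief(l, kappa))
    print(sigma, round(gap, 5))
```

With these choices the gap between the posterior belief and its uniform-prior counterpart shrinks as σ falls, up to the discretization error of the integration grid.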

In other words, for small σ, posterior beliefs concerning the proportion of opponents choosing each action are almost the same as under a uniform prior. The formal proof of proposition 2.2 presented in Appendix A consists of showing, first, that convergence of posterior beliefs described previously is uniform; and, second, that the small amount of uncertainty about payoffs in the common value case does not affect the analysis sufficiently to matter.

2.2.3. Discussion

The proofs of propositions 2.1 and 2.2 follow the logic of Carlsson and van Damme (1993) and generalize arguments presented in Morris and Shin (1998). The technique of analyzing the uniform prior private values game, and then showing continuity with respect to the general prior, common values game, follows Frankel, Morris, and Pauzner (2000). (This paper is discussed further in Section 4.1.) Carlsson and van Damme (1993b) showed a version of the uniform prior result (proposition 2.1) in the ﬁnite player case (see also Kim, 1996). We brieﬂy discuss the relation to the ﬁnite player case in Appendix B.


How do these propositions make use of the underlying assumptions? First, note that assumptions A1 and A2 represent very strong monotonicity assumptions: A1 requires that each player’s utility function is supermodular in the action proﬁle, whereas A2 requires that each player’s utility function is supermodular in his own action and the state. Vives (1990) showed that the supermodularity property A2 of complete information game payoffs is inherited by the incomplete information game. Thus, the existence of a largest and smallest strategy proﬁle surviving iterated deletion of dominated strategies when payoffs are supermodular, noted by Milgrom and Roberts (1990), can be applied also to the incomplete information game. The ﬁrst step in the proof of proposition 2.1 is a special case of this reasoning, with the state monotonicity assumption A2 implying, in addition, that the largest and smallest equilibria consist of strategies that are monotonic with respect to type (i.e., switching strategies). Once we know that we are interested in monotonic strategies, the very weak assumption A3 is sufﬁcient to ensure the equivalence of the largest and smallest equilibria and thus the uniqueness of equilibrium. Can one dispense with the full force of the supermodular payoffs assumption A1? Unfortunately, as long as A1 is not satisﬁed at the cutoff point θ ∗ [i.e., π(l, θ ∗ ) is decreasing in l over some range], then one can ﬁnd a problematic noise distribution f (·) such that the symmetric switching strategy proﬁle with cutoff point θ ∗ is not an equilibrium, and thus there is no switching strategy equilibrium. To obtain positive results, one must either impose additional restrictions on the noise distribution or relax A1 only away from the cutoff point. We discuss both approaches in turn. 
Athey (2002) provides a general description of how monotone comparative static results can be preserved in stochastic optimization problems, when supermodular payoff conditions are weakened to single crossing properties, but signals are assumed to be sufficiently well behaved (i.e., satisfy a monotone likelihood ratio property). Athey (2001) has used such techniques to prove existence of monotonic pure strategy equilibria in a general class of incomplete information games, using weaker properties on payoffs, but substituting stronger restrictions on signal distribution. We can apply her results to our setting as follows. Consider the following two new assumptions.

A1*: Action Single Crossing: For each $\theta \in \mathbb{R}$, there exists $l^* \in \mathbb{R} \cup \{-\infty, \infty\}$ such that π(l, θ) < 0 if $l < l^*$ and π(l, θ) > 0 if $l > l^*$.
A7: Monotone Likelihood Ratio Property: If $\bar{x} > \underline{x}$, then $f(\bar{x} - \theta)/f(\underline{x} - \theta)$ is increasing in θ.

Assumption A1* is a significant weakening of assumption A1 to a single crossing property. Assumption A7 is a new restriction on the distribution of the noise. Recall that we earlier made no assumptions on the distribution of the noise. Denote by $\tilde{G}(\sigma)$ the incomplete information game with a uniform prior satisfying A1*, A2, A3, A4, A5, and A7.


) has a unique (symmetLemma 2.3. Let θ ∗ be deﬁned as in A3. The game G(σ ric) switching strategy equilibrium, with s(x) = 0 for all x < θ ∗ and s(x) = 1 for all x > θ ∗ . The proof is in Appendix C. An analog of proposition 2.2 could be similarly constructed. Notice that this result does not show the nonexistence of other, nonmonotonic, equilibria. Additional arguments are required to rule out nonmonotonic equilibria. For example, in Goldstein and Pauzner (2000a) – an application to bank runs discussed in the next section – noise is uniformly distributed (and thus satisﬁes A7) and payoffs satisfy assumption A1∗ . They show that (1) there is a unique symmetric switching strategy equilibrium and that (2) there is no other equilibrium. Lemma 2.3 could be used to extend the former result to all noise distributions satisfying the MLRP (assumption A7), but we do not know if the latter result extends beyond the uniform noise distribution. Proposition 2.1 can also be weakened by allowing assumption A1 to fail away from θ ∗ . We will report one weakening that is sufﬁcient. Let g(·) and h(·) be densities on the interval [0, 1]; g stochastically dominates h (g h)

l

l if z=0 g(z) dz ≤ z=0 h(z) dz for all l ∈ [0, 1]. We write g(·) for the uniform density on [0, 1], i.e., g(l) = 1 for all l ∈ [0, 1]. Now consider

1 A8: There exists θ ∗ which solves l=0 π (l, θ ∗ )dl = 0 such that [1]

1 x ≥ θ ∗ and g g, with strict inl=0 g(l) π(l, x)dl ≥ 0 for all

1 ∗ equality if x > θ ; and [2] l=0 g(l)π (l, x)dl ≤ 0 for all x ≤ θ ∗ and g g, with strict inequality if x < θ ∗ . We can replace A1–A3 with A8 in propositions 2.1 and 2.2, and all the arguments and results go through. Observe that A1–A3 straightforwardly imply A8. Also, observe that A8 implies that π(l, θ ∗ ) be nondecreasing in l [suppose that l > l and π(l, θ ∗ ) < π (l , θ ∗ ); now start with the uniform distribution g and shift mass from l to l]. But, A8 allows some failure of A1 away from θ ∗ . Propositions 2.1 and 2.2 deliver strong negative conclusions about the efﬁciency of noncooperative outcomes in global games. In the limit, all players

1 will be choosing action 1 when the state is θ if l=0 π (l, θ )dl > 0. However, it is efﬁcient to choose action 1 at state θ if u(1, 1, θ ) > u(0, 0, θ ). These conditions will not coincide in general. For example, in the investment example, we had u(1, l, θ ) = θ + l − 1, u(0, l, θ) = 0 and thus π(l, θ ) = θ + l − 1. So in the limiting equilibrium, both players will be investing if the state θ is at least 12 , although it is efﬁcient for them to be investing if the state is at least 0. The analysis of the unique noncooperative equilibrium serves as a benchmark describing what will happen in the absence of other considerations. In practice, repeated play or other institutions will often allow players to do better. We will brieﬂy consider what happens in the game if players were allowed to make


cheap talk statements about the signals that they have observed in the investment example (for this exercise, it is most natural to consider a finite player case; we consider the two-player case). The arguments here follow Baliga and Morris (2000). The investment example as formulated has a nongeneric feature, which is that if a player plans not to invest, he is exactly indifferent about which action his opponent will take. To make the problem more interesting, let us perturb the payoffs to remove this tie:

Table 3.2. Payoffs for cheap talk example

                Invest                                  NotInvest
Invest          $\theta + \delta,\ \theta + \delta$     $\theta - 1,\ \delta$
NotInvest       $\delta,\ \theta - 1$                   $0,\ 0$

Thus, each player receives a small payoff $\delta$ (which may be positive or negative) if the other player invests, independent of his own action. This change does not influence each player’s best responses, and the analysis of this game in the absence of cheap talk is unchanged by the payoff change. But, observe that if $\delta \le 0$, there is an equilibrium of the game with cheap talk, where each player truthfully announces his signal, and invests if the (common) expectation of $\theta$ conditional on both announcements is greater than $-\delta$ (this gives the efficient outcome). On the other hand, if $\delta > 0$, then each player would like to convince the other to invest even if he does not plan to do so. In this case, there cannot be a truth-telling equilibrium where the efficient equilibrium is achieved, although there may be equilibria with some partially revealing cheap talk that improves on the no cheap talk outcome.

2.3. Applications

We now turn to applications of these results and describe models of pricing debt (Morris and Shin, 1999b), currency crises (Morris and Shin, 1998), and bank runs (Goldstein and Pauzner, 2000a).[4] Each of these papers makes specific assumptions about the distribution of payoffs and signals. But, if one is interested only in analyzing the limiting behavior as noise about $\theta$ becomes small, the results of the previous section imply that we can identify the limiting behavior independently of the prior beliefs and the shape of the noise.[5] In each example, we describe one comparative static exercise changing the payoffs of the game, illustrating how changing payoffs has a direct effect on outcomes and an indirect, strategic effect via the impact on the cutoff point of the unique equilibrium. We emphasize that it is also interesting in the applications to study behavior away from the limit; indeed, the focus of the analysis in Morris and Shin (1999b) is on comparative statics away from the limit. More assumptions on the shape of the prior and noise are required in this case. We study behavior away from the limit in Section 3.

[4] See Fukao (1994) for an early argument in favor of using global game reasoning in applied settings. Other applications include Karp’s (2000) noisy version of Krugman’s (1991) multiple equilibrium model of sectoral shifts; Scaramozzino and Vulkan’s (1999) noisy model of Shleifer’s (1986) multiple equilibrium model of implementation cycles; and Dönges and Heinemann’s (2000) model of competition between dealer markets and crossing networks in financial markets.

2.3.1. Pricing Debt

In Morris and Shin (1999b), we consider a simple model of debt pricing. In period 1, a continuum of investors hold collateralized debt that will pay 1 in period 2 if it is rolled over and if an underlying investment project is successful; the debt will pay 0 in period 2 if the project is not successful. If an investor does not roll over his debt, he receives the value of the collateral, $\kappa \in (0, 1)$. The success of the project depends on the proportion of investors who do not roll over and the state of the economy, $\theta$. Specifically, the project is successful if the proportion of investors not rolling over is less than $\theta/z$. Writing 1 for the action “roll over” and 0 for the action “do not roll over,” payoffs can be described as follows:
\[ u(1, l, \theta) = \begin{cases} 1, & \text{if } z(1 - l) \le \theta \\ 0, & \text{if } z(1 - l) > \theta, \end{cases} \qquad u(0, l, \theta) = \kappa. \]
So
\[ \pi(l, \theta) \equiv u(1, l, \theta) - u(0, l, \theta) = \begin{cases} 1 - \kappa, & \text{if } z(1 - l) \le \theta \\ -\kappa, & \text{if } z(1 - l) > \theta. \end{cases} \]
Now
\[ \int_{l=0}^{1} \pi(l, \theta)\, dl = \begin{cases} -\kappa, & \text{if } \theta \le 0 \\ \frac{\theta}{z} - \kappa, & \text{if } 0 \le \theta \le z \\ 1 - \kappa, & \text{if } z \le \theta. \end{cases} \]

[5] The model in Goldstein and Pauzner (2000a) fails the action monotonicity property (A1) of the previous section, but they are nonetheless able to prove the uniqueness of a symmetric switching equilibrium, exploiting their assumption that noise terms are distributed uniformly. However, their game satisfies assumptions A1* and A2, and therefore whenever there is a unique equilibrium, it must satisfy the Laplacian characterization with the cutoff point $\theta^*$ defined as in A3.

[Figure 3.2. Function $V(\kappa)$. Horizontal axis: $\kappa$, from 0.0 to 1.0; vertical axis: $V(\kappa)$, from 0.0 to 1.0.]

Thus, $\theta^* = z\kappa$. In other words, if private information about $\theta$ among the investors is sufficiently accurate, the project will collapse exactly if $\theta \le z\kappa$. We can now ask how debt would be priced ex ante in this model (before anyone observed private signals about $\theta$). Recalling that $p(\cdot)$ is the density of the prior on $\theta$, and writing $P(\cdot)$ for the corresponding cdf, the value of the collateralized debt will be
\[ V(\kappa) \equiv \kappa P(z\kappa) + 1 - P(z\kappa) = 1 - (1 - \kappa) P(z\kappa), \]
and
\[ \frac{dV}{d\kappa} = P(z\kappa) - z(1 - \kappa)\, p(z\kappa). \]
Thus, increasing the value of collateral has two effects: first, it increases the value of debt in the event of default (the direct effect). But, second, it increases the range of $\theta$ at which default occurs (the strategic effect). For small $\kappa$, the strategic effect outweighs the direct effect, whereas for large $\kappa$, the direct effect outweighs the strategic effect. Figure 3.2 plots $V(\cdot)$ for the case where $z = 10$ and $p(\cdot)$ is the standard normal density. Morris and Shin (1999b) study the model away from the limit and argue that taking the strategic, or liquidity, effect into account in debt pricing can help explain anomalies in empirical implementation of the standard debt pricing theory of Merton (1974). Brunner and Krahnen (2000) present evidence of the importance of debtor coordination in distressed lending relationships in Germany [see also Chui, Gai, and Haldane (2000) and Hubert and Schäfer (2000)].
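The two effects of collateral on the ex ante debt value can be checked numerically. The following is a minimal sketch, assuming the case plotted in Figure 3.2 ($z = 10$, standard normal prior); the function names are illustrative and not taken from the paper.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # prior on theta, as assumed for Figure 3.2


def debt_value(kappa, z=10.0, prior=STANDARD_NORMAL):
    """V(kappa) = 1 - (1 - kappa) * P(z * kappa)."""
    return 1.0 - (1.0 - kappa) * prior.cdf(z * kappa)


def debt_value_slope(kappa, z=10.0, prior=STANDARD_NORMAL):
    """dV/dkappa = P(z kappa) - z (1 - kappa) p(z kappa).

    The first term is the direct effect, the second the strategic effect.
    """
    return prior.cdf(z * kappa) - z * (1.0 - kappa) * prior.pdf(z * kappa)


def slope_sign_change(z=10.0, lo=0.0, hi=1.0, tol=1e-10):
    """Bisect for a collateral level where dV/dkappa moves from negative to positive.

    Assumes the slope is negative at `lo` and positive at `hi`,
    which holds for z = 10 and a standard normal prior.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if debt_value_slope(mid, z) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Under these assumptions the slope is negative at $\kappa = 0$ and positive at $\kappa = 1$, so `slope_sign_change()` locates the collateral level at which the direct effect starts to dominate the strategic effect.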

2.3.2. Currency Crises

In Morris and Shin (1998), a continuum of speculators must decide whether to attack a fixed exchange rate regime by selling the currency short. Each speculator may short only a unit amount. The current value of the currency is $e^*$; if the monetary authority does not defend the currency, the currency will float to the shadow rate $\zeta(\theta)$, where $\theta$ is the state of fundamentals. There is a fixed transaction cost $t$ of attacking. This can be interpreted as an actual transaction cost or as the interest rate differential between currencies. The monetary authority defends the currency if the cost of doing so is not too large. Assuming that the costs of defending the currency are increasing in the proportion of speculators who attack and decreasing in the state of fundamentals, there will be some critical proportion of speculators, $a(\theta)$, increasing in $\theta$, who must attack in order for a devaluation to occur. Thus, writing 1 for the action “not attack” and 0 for the action “attack,” payoffs can be described as follows:
\[ u(1, l, \theta) = 0, \qquad u(0, l, \theta) = \begin{cases} e^* - \zeta(\theta) - t, & \text{if } l \le 1 - a(\theta) \\ -t, & \text{if } l > 1 - a(\theta), \end{cases} \]
where $\zeta(\cdot)$ and $a(\cdot)$ are increasing functions, with $\zeta(\theta) \le e^* - t$ for all $\theta$. Now
\[ \pi(l, \theta) = \begin{cases} \zeta(\theta) + t - e^*, & \text{if } l \le 1 - a(\theta) \\ t, & \text{if } l > 1 - a(\theta). \end{cases} \]
If $\theta$ were common knowledge, there would be three ranges of parameters. If $\theta < a^{-1}(0)$, each player has a dominant strategy to attack. If $a^{-1}(0) \le \theta \le a^{-1}(1)$, then there is an equilibrium where all speculators attack and another equilibrium where all speculators do not attack. If $\theta > a^{-1}(1)$, each player has a dominant strategy not to attack. This tripartite division of fundamentals arises in a range of models in the literature on currency crises (see Obstfeld, 1996). However, if $\theta$ is observed with noise, we can apply the results of the previous section, because $\pi(l, \theta)$ is weakly increasing in $l$, and weakly increasing in $\theta$:
\[ \int_{l=0}^{1} \pi(l, \theta)\, dl = (1 - a(\theta))(\zeta(\theta) + t - e^*) + a(\theta)\, t = t - (1 - a(\theta))(e^* - \zeta(\theta)). \]
Thus, $\theta^*$ is implicitly defined by
\[ (1 - a(\theta^*))(e^* - \zeta(\theta^*)) = t. \]
Theorem 2 in Morris and Shin (1998) gave an incorrect statement of this condition. We are grateful to Heinemann (2000) for pointing out the error and giving a correct characterization. Again, we will describe one simple comparative statics exercise. Consider a costly ex ante action $R$ for the monetary authority that lowered their costs of defending the currency. For example, $R$ might represent the value of foreign currency reserves or (as in the recent case of Argentina) a line of credit with


foreign banks to provide credit in the event of a crisis. Thus, the critical proportion of speculators for which an attack occurs becomes $a(\theta, R)$, where $a(\cdot)$ is increasing in $R$. Now, write $\theta^*(R)$ for the unique value of $\theta$ solving
\[ (1 - a(\theta, R))(e^* - \zeta(\theta)) = t. \]
The ex ante probability that the currency will collapse is $P(\theta^*(R))$. So, the reduction in the probability of collapse resulting from a marginal increase in $R$ is
\[ -p(\theta^*(R))\, \frac{d\theta^*}{dR} = p(\theta^*(R))\, \frac{\dfrac{\partial a}{\partial R}}{\dfrac{\partial a}{\partial \theta} + \dfrac{1 - a(\theta, R)}{e^* - \zeta(\theta)}\, \dfrac{d\zeta}{d\theta}}. \]
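The comparative static above can be illustrated with a small numerical sketch. The functional forms $a(\theta, R) = \theta + R$ and $\zeta(\theta) = 0.5\,\theta$, and the values $e^* = 1$ and $t = 0.1$, are hypothetical choices made only for this example; they are not taken from Morris and Shin (1998).

```python
E_STAR = 1.0   # hypothetical pegged rate e*
T_COST = 0.1   # hypothetical transaction cost t
SLOPE = 0.5    # hypothetical slope of the shadow rate zeta


def a(theta, reserves):
    """Hypothetical critical proportion of attackers, increasing in theta and R."""
    return theta + reserves


def zeta(theta):
    """Hypothetical shadow rate; zeta <= e* - t on the range searched below."""
    return SLOPE * theta


def threshold(reserves, tol=1e-13):
    """Solve (1 - a(theta, R)) * (e* - zeta(theta)) = t for theta by bisection."""
    def gap(theta):
        return (1.0 - a(theta, reserves)) * (E_STAR - zeta(theta)) - T_COST

    lo, hi = 0.0, 1.0 - reserves  # gap(lo) > 0 > gap(hi) for the values used here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def threshold_slope(reserves):
    """d theta*/dR from the implicit-function formula in the text.

    In this specification da/dR = da/dtheta = 1 and dzeta/dtheta = SLOPE.
    """
    theta = threshold(reserves)
    strategic_term = (1.0 - a(theta, reserves)) * SLOPE / (E_STAR - zeta(theta))
    return -1.0 / (1.0 + strategic_term)
```

Multiplying `-threshold_slope(R)` by the prior density at `threshold(R)` gives the marginal reduction in the collapse probability for this illustrative specification.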

This comparative static refers to the limit (as noise becomes very small), and the effect is entirely strategic [i.e., the increased value of $R$ reduces the probability of attack only because it influences speculators’ equilibrium strategies (“builds confidence”) and not because the increase in $R$ actually prevents an attack in any relevant contingency]. In Section 4.1, we very briefly discuss Corsetti, Dasgupta, Morris, and Shin (2000), an extension of this model of currency attacks where a large speculator is added to the continuum of small traders [see also Chan and Chiu (2000), Goldstein and Pauzner (2000b), Heinemann and Illing (2000), Hellwig (2000), Marx (2000), Metz (2000), and Morris and Shin (1999a)].

2.3.3. Bank Runs

We describe a model of Goldstein and Pauzner (2000a), who add noise to the classic bank runs model of Diamond and Dybvig (1983). A continuum of depositors (with total deposits normalized to 1) must decide whether to withdraw their money from a bank or not. If the depositors withdraw their money in period 1, they will receive $r > 1$ (if there are not enough resources to fund all those who try to withdraw, then the remaining cash is divided equally among early withdrawers). Any remaining money earns a total return $R(\theta) > 0$ in period 2 and is divided equally among those who chose to wait until period 2 to withdraw their money. Proportion $\lambda$ of depositors will have consumption needs only in period 1 and will thus have a dominant strategy to withdraw. We will be concerned with the game among the proportion $1 - \lambda$ of depositors who have consumption needs in period 2. Consumers have utility $U(y)$ from consumption $y$, where the relative risk aversion coefficient of $U$ is strictly greater than 1. They note that if $R(\theta)$ were greater than 1 and $\theta$ were common knowledge, the ex ante optimal choice of $r$ maximizing
\[ \lambda U(r) + (1 - \lambda)\, U\!\left(\frac{1 - \lambda r}{1 - \lambda}\, R(\theta)\right) \]
would be strictly greater than 1. But, if $\theta$ is not common knowledge, we have a global game. Writing 1 for the action “withdraw in period 2” and 0 for the action “withdraw in period 1,” and $l$ for the proportion of late consumers who do not withdraw early, the money payoffs in this game can be summarized in Table 3.3:

Table 3.3. Payoffs in bank run game

                        $l \le \frac{r-1}{r(1-\lambda)}$            $l \ge \frac{r-1}{r(1-\lambda)}$
Early withdrawal (0)    $\frac{1-\lambda r}{(1-\lambda)(1-l)r}$     $r$
Late withdrawal (1)     $0$                                         $\left(r - \frac{r-1}{l(1-\lambda)}\right) R(\theta)$

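Observe that, if $\theta$ is sufficiently small [and so $R(\theta)$ is sufficiently small], all players have a dominant strategy to withdraw early. Goldstein and Pauzner assume that, if $\theta$ is sufficiently large, all players have a dominant strategy to withdraw late (a number of natural economic stories could justify this variation in the payoffs). Thus, the payoffs in the game among late consumers are
\[ u(1, l, \theta) = \begin{cases} U(0), & \text{if } l \le \frac{r-1}{r(1-\lambda)} \\ U\!\left(\left(r - \frac{r-1}{l(1-\lambda)}\right) R(\theta)\right), & \text{if } l \ge \frac{r-1}{r(1-\lambda)}, \end{cases} \]
\[ u(0, l, \theta) = \begin{cases} U\!\left(\frac{1}{1 - l(1-\lambda)}\right), & \text{if } l \le \frac{r-1}{r(1-\lambda)} \\ U(r), & \text{if } l \ge \frac{r-1}{r(1-\lambda)}, \end{cases} \]
so that
\[ \pi(l, \theta) = \begin{cases} U(0) - U\!\left(\frac{1}{1 - l(1-\lambda)}\right), & \text{if } l \le \frac{r-1}{r(1-\lambda)} \\ U\!\left(\left(r - \frac{r-1}{l(1-\lambda)}\right) R(\theta)\right) - U(r), & \text{if } l \ge \frac{r-1}{r(1-\lambda)}. \end{cases} \]
The threshold state $\theta^*$ is implicitly defined by
\[ \int_{l=0}^{\frac{r-1}{r(1-\lambda)}} \left[ U(0) - U\!\left(\frac{1}{1 - l(1-\lambda)}\right) \right] dl + \int_{l=\frac{r-1}{r(1-\lambda)}}^{1} \left[ U\!\left(\left(r - \frac{r-1}{l(1-\lambda)}\right) R(\theta^*)\right) - U(r) \right] dl = 0. \]
The ex ante welfare of consumers as a function of $r$ (as noise goes to zero) is
\[ W(r) = P(\theta^*(r))\, U(1) + \int_{\theta=\theta^*(r)}^{\infty} p(\theta) \left[ \lambda U(r) + (1 - \lambda)\, U\!\left(\frac{1 - \lambda r}{1 - \lambda}\, R(\theta)\right) \right] d\theta. \]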


There are two effects of increasing $r$: the direct effect on welfare is the increased value of insurance in the case where there is not a bank run. But, there is also the strategic effect that an increase in $r$ will lower $\theta^*(r)$. Morris and Shin (2000) examine a stripped down version of this model, where alternative assumptions on the investment technology and utility functions imply that payoffs reduce to those of the linear example in Section 2.1 [see also Boonprakaikawe and Ghosal (2000), Dasgupta (2000b), Goldstein (2000), and Rochet and Vives (2000)].

3. PUBLIC VERSUS PRIVATE INFORMATION

The analysis so far has all been concerned with behavior when either there is a uniform prior or the noise is very small. In this section, we look at the behavior of the model with large noise and nonuniform priors. There are three reasons for doing this. First, we want to understand how extreme the assumptions required for uniqueness are. We will provide sufficient conditions for uniqueness depending on the relative accuracy of private and public (or prior) signals. Second, away from the limit, prior beliefs play an important role in determining outcomes. In particular, we will see how even with a continuum of players and a unique equilibrium, public information contained in the prior beliefs plays a significant role in determining outcomes, even controlling for beliefs concerning the fundamentals. Finally, by seeing how and when the model jumps from having one equilibrium to multiple equilibria, it is possible to develop a better intuition for what is driving results. We return to the linear example of Section 2.1: there is a continuum of players, the payoff to not investing is 0, and the payoff to investing is $\theta + l - 1$, where $\theta$ is the state and $l$ is the proportion of the population investing.
It may help in following the analysis to recall that, with linear payoffs, the exact number of players is irrelevant in identifying symmetric equilibrium strategies (and we will see that symmetric equilibrium strategies will naturally arise). Thus, the analysis applies equally to a two-player game. Now assume that $\theta$ is normally distributed with mean $y$ and standard deviation $\tau$. The mean $y$ is publicly observed. As before, each player observes a private signal $x_i = \theta + \varepsilon_i$, where the $\varepsilon_i$ are distributed normally in the population with mean 0 and standard deviation $\sigma$. Thus, each player $i$ observes a public signal $y \in \mathbb{R}$ and a private signal $x_i \in \mathbb{R}$. To analyze the equilibria of this game, first fix the public signal $y$. Suppose that a player observed private signal $x$. His expectation of $\theta$ is
\[ \bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2}. \]
It is useful to conduct analysis in terms of these posterior expectations of $\theta$. In particular, we may consider a switching strategy of the following form:
\[ s(\bar\theta) = \begin{cases} \text{Invest}, & \text{if } \bar\theta > \kappa \\ \text{NotInvest}, & \text{if } \bar\theta \le \kappa. \end{cases} \]


If the standard deviation of players’ private signals is sufficiently small relative to the standard deviation of the public signal in the prior, then there is an essentially unique strategy surviving iterated deletion of strictly dominated strategies. Specifically, let
\[ \gamma \equiv \gamma(\sigma, \tau) \equiv \frac{\sigma^2}{\tau^4} \left( \frac{\sigma^2 + \tau^2}{\sigma^2 + 2\tau^2} \right). \]
Now we have

Proposition 3.1. The game has a symmetric switching strategy equilibrium with cutoff $\kappa$ if $\kappa$ solves the equation
\[ \kappa = \Phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right); \tag{3.1} \]
if $\gamma(\sigma, \tau) \le 2\pi$, then there is a unique value of $\kappa$ solving (3.1) and the strategy with that trigger is the essentially unique strategy surviving iterated deletion of strictly dominated strategies; if $\gamma(\sigma, \tau) > 2\pi$, then (for some values of $y$) there are multiple values of $\kappa$ solving (3.1) and multiple symmetric switching strategy equilibria.

[Figure 3.3. Parameter range for unique equilibrium. Horizontal axis: $\tau^2$, from 0.0 to 0.8; vertical axis: $\sigma^2$, from 0 to 5. Curves: $\frac{\sigma^2}{\tau^4} = 4\pi$, $\frac{\sigma^2}{\tau^4}\left(\frac{\sigma^2+\tau^2}{\sigma^2+2\tau^2}\right) = 2\pi$, and $\frac{\sigma^2}{\tau^4} = 2\pi$. Regions labeled “multiplicity” and “uniqueness.”]

Figure 3.3 plots the regions in $\sigma^2$–$\tau^2$ space, where uniqueness holds. In Morris and Shin (2000), we gave a detailed version of the uniqueness part of this result in Appendix A. Here, we sketch the idea. Consider a player who has observed private signal $x$. By standard properties of the normal distribution

(see DeGroot, 1970), his posterior beliefs about $\theta$ would be normal with mean
\[ \bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} \]
and standard deviation
\[ \sqrt{\frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}}. \]
He knows that any other player’s signal, $x'$, is equal to $\theta$ plus a noise term with mean 0 and standard deviation $\sigma$. Thus, he believes that $x'$ is distributed normally with mean $\bar\theta$ and standard deviation
\[ \sqrt{\frac{2\sigma^2 \tau^2 + \sigma^4}{\sigma^2 + \tau^2}}. \]
Now suppose he believed that all other players will invest exactly if their expectation of $\theta$ is at least $\kappa$ [i.e., if their private signals $x'$ satisfy $(\sigma^2 y + \tau^2 x')/(\sigma^2 + \tau^2) \ge \kappa$, or $x' \ge \kappa + (\sigma^2/\tau^2)(\kappa - y)$]. Thus, he assigns probability
\[ 1 - \Phi\!\left( \frac{\kappa - \bar\theta + \frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2\tau^2 + \sigma^4}{\sigma^2 + \tau^2}}} \right) \tag{3.2} \]
to any particular opponent investing. But his expectation of the proportion of his opponents investing must be equal to the probability he assigns to any one opponent investing. Thus, (3.2) is also equal to his expectation of the proportion of his opponents investing. Because his payoff to investing is $\theta + l - 1$, his expected payoff to investing is $\bar\theta$ plus expression (3.2) minus one, i.e.,
\[ v(\bar\theta, \kappa) \equiv \bar\theta - \Phi\!\left( \frac{\kappa - \bar\theta + \frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2\tau^2 + \sigma^4}{\sigma^2 + \tau^2}}} \right). \]
His payoff to not investing is 0. Because $v(\bar\theta, \kappa)$ is increasing in $\bar\theta$, we have that there is a symmetric equilibrium with switching point $\kappa$ exactly if $v^*(\kappa) \equiv v(\kappa, \kappa) = 0$. But
\[ v^*(\kappa) \equiv v(\kappa, \kappa) = \kappa - \Phi\!\left( \frac{\frac{\sigma^2}{\tau^2}(\kappa - y)}{\sqrt{\frac{2\sigma^2\tau^2 + \sigma^4}{\sigma^2 + \tau^2}}} \right) = \kappa - \Phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right). \]
Figure 3.4 plots the function $v^*(\kappa)$ for $y = \frac{1}{2}$ and $\gamma = 1{,}000$, 10, 5, and 0.1, respectively.
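The equilibrium condition $v^*(\kappa) = 0$ can be explored numerically. The sketch below is illustrative only: it scans a grid for sign changes of $v^*$ and refines each by bisection, and the helper names (`gamma`, `v_star`, `cutoffs`) are assumptions of this example rather than notation from the paper.

```python
import math


def normal_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gamma(sigma, tau):
    """gamma(sigma, tau) = (sigma^2 / tau^4) * (sigma^2 + tau^2) / (sigma^2 + 2 tau^2)."""
    return (sigma**2 / tau**4) * (sigma**2 + tau**2) / (sigma**2 + 2.0 * tau**2)


def v_star(kappa, y, g):
    """v*(kappa) = kappa - Phi(sqrt(gamma) * (kappa - y))."""
    return kappa - normal_cdf(math.sqrt(g) * (kappa - y))


def cutoffs(y, g, lo=-0.5, hi=1.5, steps=2001):
    """Return all solutions of v*(kappa) = 0 found on a grid over [lo, hi]."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    roots = []
    for left, right in zip(grid, grid[1:]):
        f_left, f_right = v_star(left, y, g), v_star(right, y, g)
        if f_left == 0.0:
            roots.append(left)
        elif f_left * f_right < 0.0:
            a, b, fa = left, right, f_left
            for _ in range(60):  # bisection on the bracketing interval
                mid = 0.5 * (a + b)
                f_mid = v_star(mid, y, g)
                if fa * f_mid <= 0.0:
                    b = mid
                else:
                    a, fa = mid, f_mid
            roots.append(0.5 * (a + b))
    return roots
```

For $y = \frac{1}{2}$, calling `cutoffs(0.5, 5.0)` and `cutoffs(0.5, 10.0)` contrasts a value of $\gamma$ below $2\pi$ with one above it.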

[Figure 3.4. Function $v^*(\kappa)$, plotted for $\gamma = 1000$, $\gamma = 10$, $\gamma = 5$, and $\gamma = 0.1$. Horizontal axis: $\kappa$, from −0.5 to 1.5; vertical axis: $v^*(\kappa)$, from −0.5 to 1.0.]

The intuition for these graphs is the following. If public information is relatively large (i.e., $\sigma \gg \tau$ and thus $\gamma$ is large), then a player with posterior expectation $\kappa$ less than $y = \frac{1}{2}$ confidently expects that his opponent will have observed a higher signal, and therefore will be investing. Thus, his expected utility is (about) $\kappa$. But, as $\kappa$ moves above $y = \frac{1}{2}$, he rapidly becomes confident that his opponent has observed a lower signal and will not be investing. Thus, his expected utility drops rapidly, around $y$, to (about) $\kappa - 1$. But, if public information is relatively small (i.e., $\sigma \ll \tau$ and $\gamma$ is small), then a player with $\kappa$ not too far above or below $y = \frac{1}{2}$ attaches probability (about) $\frac{1}{2}$ to his opponent observing a higher signal. Thus, his expected utility is (about) $\kappa - \frac{1}{2}$. We can identify analytically when there is a unique solution: Observe that
\[ \frac{dv^*}{d\kappa} = 1 - \sqrt{\gamma}\, \phi\!\left(\sqrt{\gamma}\,(\kappa - y)\right). \]
Recall that $\phi(x)$, the density of the standard normal, attains its maximum of $1/\sqrt{2\pi}$ at $x = 0$. Thus, if $\gamma \le 2\pi$, $dv^*/d\kappa$ is greater than or equal to zero always, and strictly greater than zero, except when $\kappa = y$. So, (3.1) has a unique solution. But, if $\gamma > 2\pi$ and $y = \frac{1}{2}$, then setting $\kappa = \frac{1}{2}$ solves (3.1), but $dv^*/d\kappa|_{\kappa = \frac{1}{2}} < 0$, so (3.1) has two other solutions. Throughout the remainder of this section, we assume that there is a unique equilibrium [i.e., that $\gamma(\sigma, \tau) \le 2\pi$]. Under this assumption, we can invert the equilibrium condition (3.1) to show in $(\bar\theta, y)$ space what the unique equilibrium looks like:
\[ y = h_\gamma(\bar\theta) = \bar\theta - \frac{1}{\sqrt{\gamma}}\, \Phi^{-1}(\bar\theta). \tag{3.3} \]

[Figure 3.5. Investment takes place above and to the right of the line. Curves for $\gamma = 0.001$ and $\gamma = 5$. Horizontal axis: $\kappa$, from 0.0 to 0.8; vertical axis: $y$, from −2 to 2. Regions labeled “invest” and “not invest.”]

Figure 3.5 plots this for $\gamma = 5$ and $\gamma = 1/1{,}000$. The picture has an elementary intuition. If $\bar\theta < 0$, it is optimal to not invest (independent of the public signal). If $\bar\theta > 1$, it is optimal to invest (independent of the public signal). But, if $0 < \bar\theta < 1$, there is a trade-off. The higher $y$ is (for a given $\bar\theta$), the more likely it is that the other player will invest. Thus, if $0 < \bar\theta < 1$, the player will always invest for sufficiently high $y$, and not invest for sufficiently low $y$. This implies in particular that changing $y$ has a larger impact on a player’s action than changing his private signal (controlling for the informativeness of the signals). We next turn to examining this “publicity” effect.
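The boundary in (3.3) can be evaluated directly. This is a minimal sketch under the uniqueness assumption $\gamma \le 2\pi$; the function names are illustrative, and the treatment of $\bar\theta \le 0$ and $\bar\theta \ge 1$ follows the intuition described above.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def critical_public_signal(theta_bar, gamma):
    """y = h_gamma(theta_bar) = theta_bar - Phi^{-1}(theta_bar) / sqrt(gamma).

    Only defined for 0 < theta_bar < 1.
    """
    return theta_bar - STANDARD_NORMAL.inv_cdf(theta_bar) / math.sqrt(gamma)


def invests(theta_bar, y, gamma):
    """Equilibrium action given posterior mean theta_bar and public signal y.

    Assumes gamma <= 2*pi, so that the equilibrium is unique and
    h_gamma is decreasing in theta_bar.
    """
    if theta_bar <= 0.0:
        return False  # not investing regardless of the public signal
    if theta_bar >= 1.0:
        return True   # investing regardless of the public signal
    return y > critical_public_signal(theta_bar, gamma)
```

For example, with $\gamma = 5$ a player whose posterior mean is 0.6 invests when the public signal is high (`invests(0.6, 0.9, 5.0)`) but not when it is low (`invests(0.6, 0.0, 5.0)`).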

3.1. The Publicity Multiplier

To explore the strategic impact of public information, we examine how much a player’s private signal must adjust to compensate for a given change in the public signal. Equation (3.1) can be written as
\[ \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} - \Phi\!\left( \sqrt{\gamma} \left( \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} - y \right) \right) = 0. \]
Totally differentiating with respect to $y$ gives
\[ \frac{dx}{dy} = -\frac{\frac{\sigma^2}{\tau^2} + \sqrt{\gamma}\, \phi(\cdot)}{1 - \sqrt{\gamma}\, \phi(\cdot)}. \]
This measures how much the private signal would have to change to compensate for a change in the public signal (and still leave the player indifferent between investing or not investing). We can similarly see how much the private signal would have to change to compensate for a change in the public signal, if there was no strategic effect. Totally differentiating
\[ \bar\theta = \frac{\sigma^2 y + \tau^2 x}{\sigma^2 + \tau^2} = k, \]
we obtain
\[ \frac{dx}{dy} = -\frac{\sigma^2}{\tau^2}. \]
Define the publicity multiplier as the ratio of these two:
\[ \zeta = \frac{1 + \frac{\tau^2}{\sigma^2} \sqrt{\gamma}\, \phi(\cdot)}{1 - \sqrt{\gamma}\, \phi(\cdot)}. \]
Thus, suppose a player’s expectation of $\theta$ is $\bar\theta$ and he has observed the public signal that makes him indifferent between investing and not investing [$y = \bar\theta - (1/\sqrt{\gamma})\, \Phi^{-1}(\bar\theta)$]; the publicity multiplier evaluated at this point will be
\[ \zeta = \frac{1 + \frac{\tau^2}{\sigma^2} \sqrt{\gamma}\, \phi(\Phi^{-1}(\bar\theta))}{1 - \sqrt{\gamma}\, \phi(\Phi^{-1}(\bar\theta))}. \]
Notice that (for any given $\sigma$ and $\tau$) the publicity multiplier is maximized when $\bar\theta = \frac{1}{2}$, and thus the critical public signal $y = \frac{1}{2}$. Thus, it is precisely when there is no conflict between private and public signals that the multiplier has its biggest effect. Here, the publicity multiplier equals
\[ \zeta^* = \frac{1 + \frac{\tau^2}{\sigma^2} \sqrt{\frac{\gamma}{2\pi}}}{1 - \sqrt{\frac{\gamma}{2\pi}}}. \]
Notice that, when $\gamma$ is small (i.e., $\sigma/\tau^2$ is small), the publicity multiplier is very small. The multiplier is biggest just before we hit the multiplicity zone of the parameter space (i.e., when $\gamma \approx 2\pi$).

There is plentiful anecdotal evidence that in settings where coordination is important, public signals play a role in coordinating outcomes that exceeds the information content of those announcements. For example, financial markets apparently “overreact” to announcements from the Federal Reserve Board and public announcements in general. If market participants are concerned about the reaction of other participants to the news, the “overreaction” may be rational and determined by the type of equilibrium logic of our example. Further evidence for this is briefings on market conditions by key players in financial markets using conference calls with hundreds of participants. Such public briefings have a larger impact on the market than bilateral briefings with the same information, because they automatically convey to participants not only information about market conditions, but also valuable information about the beliefs of the other participants.
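The publicity multiplier defined earlier in this section can be computed as the ratio of the two derivatives $dx/dy$. The sketch below is illustrative: it assumes $\gamma(\sigma, \tau) < 2\pi$ so that the denominators are positive, and the argument `z` stands for the point $\sqrt{\gamma}(\bar\theta - y)$ at which $\phi$ is evaluated (a name introduced only for this example).

```python
import math


def normal_pdf(x):
    """Standard normal density, phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gamma(sigma, tau):
    """gamma(sigma, tau) = (sigma^2 / tau^4) * (sigma^2 + tau^2) / (sigma^2 + 2 tau^2)."""
    return (sigma**2 / tau**4) * (sigma**2 + tau**2) / (sigma**2 + 2.0 * tau**2)


def dx_dy_strategic(sigma, tau, z=0.0):
    """dx/dy along the indifference condition (3.1), with phi evaluated at z."""
    s = math.sqrt(gamma(sigma, tau)) * normal_pdf(z)
    return -(sigma**2 / tau**2 + s) / (1.0 - s)


def dx_dy_informational(sigma, tau):
    """dx/dy holding the posterior mean fixed (no strategic effect)."""
    return -(sigma**2) / tau**2


def publicity_multiplier(sigma, tau, z=0.0):
    """Ratio of the strategic to the purely informational dx/dy."""
    return dx_dy_strategic(sigma, tau, z) / dx_dy_informational(sigma, tau)
```

Evaluating `publicity_multiplier(sigma, tau)` at the default `z = 0.0` corresponds to $\bar\theta = y = \frac{1}{2}$, where the multiplier is largest.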


Urban renewal also has a coordination aspect. Private firms’ incentives to invest in a run-down neighborhood depend partly on exogenous characteristics of the neighborhood, but they also depend to a great extent on whether other firms are investing. A well-publicized investment in the neighborhood might be expected to have an apparently disproportionate effect on the probability of ending in the good equilibrium. The willingness of public authorities to subsidize football stadiums and conference centers is consistent with this view. An indirect econometric test of the publicity effect is performed by Chwe (1998). Chwe observes that the per viewer price of advertising during the Super Bowl is exceptionally high (i.e., the price of advertising increases more than linearly in the number of viewers). The premium price is explained by the fact that any information conveyed by those advertisements becomes not merely known to the wide audience, but also common knowledge among them. The value of this common knowledge to advertisers should depend on whether there is a significant coordination problem in consumers’ decisions whether to purchase the product. Chwe makes some plausible ex ante guesses about when coordination is an important issue because of network externalities (e.g., the Apple Macintosh) or social consumption (e.g., beer) and when it is not (e.g., batteries). He then confirms econometrically that it is the advertisers of coordination goods who pay a premium for large audiences. In Morris and Shin (1999b), we use the publicity effect to explain an anomaly in the pricing of debt. Empirically, the option pricing model of debt due to Merton (1974) underestimates the yield on debt (i.e., underestimates the empirical default rate). This deviation from theory is largest for low-grade (high-risk) bonds.
A deterioration in public signals for low-grade bonds generates a large publicity effect: the deterioration makes investors more pessimistic about default for any given strategies of the other players, but, more importantly, the deterioration makes investors more pessimistic about other players’ strategies.

3.2. Limiting Behavior

If we increase the precision of public signals, while holding the precision of private signals fixed (i.e., let $\tau \to 0$ for fixed $\sigma$), then we clearly exit the unique equilibrium zone.[6] If we increase the precision of private signals, while holding the precision of public signals fixed (i.e., let $\sigma \to 0$ for fixed $\tau$), then we return to the uniform prior setting of Section 2.1. But, we can also examine what happens to the unique equilibrium as the precision of both signals increases in such a way that uniqueness is maintained. Specifically, let $\tau \to 0$ and let $\sigma^2 \to c\tau^4$, where $c < 4\pi$. In this case,
\[ \gamma(\sigma, \tau) = \frac{\sigma^2}{\tau^4} \left( \frac{\sigma^2 + \tau^2}{\sigma^2 + 2\tau^2} \right) \to \frac{c\tau^4}{\tau^4} \left( \frac{c\tau^4 + \tau^2}{c\tau^4 + 2\tau^2} \right) \to \frac{c}{2} < 2\pi. \]
Thus
\[ h_{\gamma(\sigma, \tau)}(\bar\theta) \to \bar\theta - \sqrt{\frac{2}{c}}\, \Phi^{-1}(\bar\theta). \]
This result says that, even though the public signal becomes irrelevant to a player’s expected value of $\theta$ in the limit, it continues to have a large impact on the outcome. For example, suppose $c = 1$ and $y = \frac{1}{3}$ (i.e., public information looks bad). Each player will invest only if $\bar\theta \ge 0.7$ (i.e., they will be very conservative). This is true even as they ignore $y$ (i.e., $\bar\theta \to x$). The intuition for this result is the following. Suppose public information looks bad ($y < \frac{1}{2}$). If each player’s private information is much more accurate than the public signal, each player will mostly ignore the public signal in forming his own expectation of $\theta$. But, each will nonetheless expect the other to have observed a somewhat worse signal than themselves. This pessimism about the other’s signal makes it very hard to support an investment equilibrium.

[6] For sufficiently small $\tau$, either action is rationalizable as long as $y \in (0, 1)$ and $\bar\theta \in (0, 1)$. If either $\bar\theta \ge 1$ or $\bar\theta > 0$ and $y \ge 1$, then only investing is rationalizable. If either $\bar\theta \le 0$ or $\bar\theta < 1$ and $y \le 0$, then only not investing is rationalizable.

3.3. Sufficient Conditions for Uniqueness

We derived a very simple necessary and sufficient condition for uniqueness in the linear example, depending only on the precision of public and private signals. In this section, we briefly demonstrate that a similar sufficient condition works for general payoff functions. In particular, we will show that there is always a unique equilibrium if $\sigma^2/\tau^4$ is sufficiently small.7 We will show this in a simple setting, although the argument can be extended. We maintain the normal distribution assumptions on the prior and signals, but let the payoffs be as in Section 2.2, so that π(l, θ) is the payoff gain from choosing action 1 instead of action 0. Furthermore, we will focus on the continuum players case, where π(l, θ) is differentiable and strictly increasing in l and θ, with $d\pi/dl\,(l,\theta) \le K$ and $d\pi/d\theta\,(l,\theta) \ge \varepsilon$ for all l and θ. Under these assumptions, we may look at the expected gain to choosing action 1 rather than action 0 if your expectation of θ is $\bar{\theta}$ and you think that others follow a switching strategy at κ:

$$V(\bar{\theta},\kappa) = \int_{\theta=-\infty}^{\infty} \sqrt{\frac{\sigma^2+\tau^2}{\sigma^2\tau^2}}\;\phi\!\left(\frac{\theta-\bar{\theta}}{\sqrt{\sigma^2\tau^2/(\sigma^2+\tau^2)}}\right)\pi\!\left(1-\Phi\!\left(\frac{\kappa-\theta+\frac{\sigma^2}{\tau^2}(\kappa-y)}{\sigma}\right),\;\theta\right)d\theta$$

$$= \int_{\theta'=-\infty}^{\infty} \sqrt{\frac{\sigma^2+\tau^2}{\sigma^2\tau^2}}\;\phi\!\left(\frac{\theta'}{\sqrt{\sigma^2\tau^2/(\sigma^2+\tau^2)}}\right)\pi\!\left(1-\Phi\!\left(\frac{-\theta'+\kappa-\bar{\theta}+\frac{\sigma^2}{\tau^2}(\kappa-y)}{\sigma}\right),\;\bar{\theta}+\theta'\right)d\theta'.$$

7 Hellwig (2000) performs a related exercise in a version of our currency attacks model (Morris and Shin, 1998).
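The expected gain V can be evaluated by simple quadrature. The sketch below assumes the linear payoff π(l, θ) = θ + l − 1 (an illustrative choice with dπ/dl = dπ/dθ = 1, not a payoff fixed by this section), for which the integral has a closed form that serves as a check. All function names and parameter values are hypothetical.

```python
import math


def phi(t: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def Phi(t: float) -> float:
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def expected_gain(theta_bar, kappa, y, sigma, tau, payoff, n=4001, width=8.0):
    """Trapezoid-rule approximation of V(theta_bar, kappa)."""
    s = math.sqrt(sigma**2 * tau**2 / (sigma**2 + tau**2))  # posterior std. dev. of theta
    lo = theta_bar - width * s
    step = 2.0 * width * s / (n - 1)
    total = 0.0
    for j in range(n):
        theta = lo + j * step
        # Proportion of others whose posterior mean is at least kappa, given theta.
        l = 1.0 - Phi((kappa - theta + (sigma**2 / tau**2) * (kappa - y)) / sigma)
        weight = 0.5 if j in (0, n - 1) else 1.0
        total += weight * phi((theta - theta_bar) / s) / s * payoff(l, theta)
    return total * step


def linear_payoff(l: float, theta: float) -> float:
    """Assumed payoff gain: theta if everyone invests, theta - 1 if nobody does."""
    return theta + l - 1.0


def closed_form(theta_bar, kappa, y, sigma, tau):
    """Closed form of V for the linear payoff, using E[Phi((c - theta)/sigma)]."""
    s2 = sigma**2 * tau**2 / (sigma**2 + tau**2)
    c = kappa + (sigma**2 / tau**2) * (kappa - y)
    return theta_bar - Phi((c - theta_bar) / math.sqrt(sigma**2 + s2))


params = dict(theta_bar=0.6, kappa=0.5, y=0.3, sigma=0.5, tau=1.0)
numeric = expected_gain(payoff=linear_payoff, **params)
exact = closed_form(**params)
print(numeric, exact)
```

For this linear payoff K = ε = 1, so the sufficient condition derived in this section reduces to σ²/τ⁴ < 2π.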

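Now to apply our earlier argument for uniqueness, it is enough to show that this expression is increasing in $\bar{\theta}$ and that V(κ, κ) = 0 has a unique solution. The former is clearly true; to show the latter, observe that

$$V(\kappa,\kappa) = \int_{\theta'=-\infty}^{\infty} \sqrt{\frac{\sigma^2+\tau^2}{\sigma^2\tau^2}}\;\phi\!\left(\frac{\theta'}{\sqrt{\sigma^2\tau^2/(\sigma^2+\tau^2)}}\right)\pi\!\left(1-\Phi\!\left(\frac{-\theta'+\frac{\sigma^2}{\tau^2}(\kappa-y)}{\sigma}\right),\;\theta'+\kappa\right)d\theta',$$

so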

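$$\frac{dV(\kappa,\kappa)}{d\kappa} = \int_{\theta'=-\infty}^{\infty} \sqrt{\frac{\sigma^2+\tau^2}{\sigma^2\tau^2}}\;\phi\!\left(\frac{\theta'}{\sqrt{\sigma^2\tau^2/(\sigma^2+\tau^2)}}\right)\left[\frac{d\pi(\cdot)}{d\theta} - \frac{d\pi(\cdot)}{dl}\,\phi(\cdot)\,\frac{\sigma}{\tau^2}\right]d\theta'$$

$$= \int_{\theta'=-\infty}^{\infty} \sqrt{\frac{\sigma^2+\tau^2}{\sigma^2\tau^2}}\;\phi\!\left(\frac{\theta'}{\sqrt{\sigma^2\tau^2/(\sigma^2+\tau^2)}}\right)\frac{d\pi(\cdot)}{d\theta}\left[1 - \frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta}\,\phi(\cdot)\,\frac{\sigma}{\tau^2}\right]d\theta'. \qquad (3.4)$$

If this expression is always positive, then there is a unique value of κ solving V(κ, κ) = 0, and the unique strategy surviving iterated deletion of strictly dominated strategies is the switching strategy with that cutoff. Because φ(·) is at most $1/\sqrt{2\pi}$, the expression in square brackets within equation (3.4) is positive as long as

$$\frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta} < \frac{\tau^2\sqrt{2\pi}}{\sigma};$$

since

$$\frac{d\pi(\cdot)/dl}{d\pi(\cdot)/d\theta} \le \frac{K}{\varepsilon},$$

this will be true as long as

$$\frac{K}{\varepsilon} < \frac{\tau^2\sqrt{2\pi}}{\sigma}, \quad\text{i.e.,}\quad \frac{\sigma^2}{\tau^4} < 2\pi\left(\frac{\varepsilon}{K}\right)^2.$$

4. THEORETICAL UNDERPINNINGS

4.1. General Global Games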

All the analysis thus far has dealt with symmetric payoff games. The analysis of Carlsson and van Damme (1993a) in fact provided a remarkably general result for two-player, two-action games, even with asymmetric payoffs. Let the payoffs of a two-player, two-action game be given by Table 3.4.

Table 3.4. Payoffs for general 2 × 2 global game

         1            0
  1    θ1, θ2      θ3, θ4
  0    θ5, θ6      θ7, θ8

Thus, a vector $\theta \in \mathbb{R}^8$ describes the payoffs of the game. Each player i observes a signal $x_i = \theta + \sigma\varepsilon_i$, where the $\varepsilon_i$ are eight-dimensional noise terms. This setup describes an incomplete information game parameterized by σ. Under mild technical assumptions,8 as σ → 0, any sequence of strategy profiles surviving iterated deletion of strictly dominated strategies converges to a unique limit. Moreover, that limit is independent of the distribution of the noise and has the unique Nash equilibrium of the underlying complete information game being played (if there is one), and has the risk-dominant Nash equilibrium played (if there are two strict Nash equilibria).

8 The following technical conditions are sufficient (Carlsson and van Damme’s actual setup is a little more general): payoff vector θ is drawn according to a strictly positive, continuously differentiable, bounded density on $\mathbb{R}^8$; and the noise terms $(\varepsilon_1, \varepsilon_2)$ are drawn according to a continuous density with bounded support, independently of θ.

To understand if and when this remarkable result might extend to many players and many action games, it is useful to first observe that there are two
independent things being proved here. First, there is a limit uniqueness result. As the noise goes to zero, there is a unique strategy profile surviving iterated deletion of strictly dominated strategies. Given that with no noise we know that there are multiple equilibria, this is a striking result by itself. Second, there is a noise-independent selection result. We can characterize behavior in that unique limit as a function of the complete information payoffs in the limit, and thus independently of the shape of the prior beliefs on θ and the distribution of noise. Thus, Carlsson and van Damme’s two-player, two-action analysis combines separate limit uniqueness and noise-independent selection results. Similarly, the results in Section 2 for continuum player, symmetric binary action games simultaneously showed that there was a unique strategy surviving iterated deletion of strictly dominated strategies in the limit (a limit uniqueness result) and characterized behavior in the limit (the Laplacian action) independent of the structure of the noise (a noise-independent selection result).

Frankel, Morris, and Pauzner (2000) (hereafter, FMP) examine global games with many players, asymmetric payoffs, and many actions. They show that a limit uniqueness result holds quite generally, as long as some monotonicity properties are satisfied. They consider the following environment. Each player has an ordered set of actions (finite or continuum); his payoff depends on the action profile played and a payoff parameter θ ∈ R; he observes a signal $x_i = \theta + \sigma\varepsilon_i$, where σ > 0, and $\varepsilon_i$ is an independently distributed noise term. For sufficiently low values of θ, each player has a dominant strategy to choose his lowest action, and for sufficiently high values of θ, each player has a dominant strategy to choose his highest action. Each player’s payoffs are supermodular in the action profile, implying that each player’s best response is increasing in others’ actions (for any θ).
Each player’s payoffs are supermodular in his own action and the state, implying that his best response is increasing in the payoff parameter θ (for any given actions of his opponents). Under these substantive assumptions, and additional technical assumptions,9 FMP show a limit uniqueness result. The proof uses the technique, also followed in Section 2.2, of first analyzing the uniform prior, private values game and showing a uniqueness result independent of the size of the noise; and then showing that, if the noise is small, all equilibria of the game with a general prior and common values are close to the unique equilibrium of the uniform prior, private values game. The limit uniqueness result of FMP provides a natural many-player, many-action generalization of Carlsson and van Damme (1993a). It is true that Carlsson and van Damme required no strategic complementarity and other monotonicity properties. But, when a two-player, two-action game has multiple Nash equilibria (the interesting case for Carlsson and van Damme’s analysis), there are automatically strategic complementarities.

9 Payoffs are continuous with respect to actions and θ, and there is a Lipschitz bound on the sensitivity of payoffs to changes in own and others’ actions. The state is drawn according to a continuous and positive density, and signals are drawn according to a continuous and positive density with bounded support.

FMP’s limit uniqueness
results could presumably be extended straightforwardly to many-dimensional payoff parameters and signals, if the relevant monotonicity conditions were suitably adjusted.10 Within this class of monotonic global games where limit uniqueness holds, FMP also provide sufﬁcient conditions for noise-independent selection. They generalize the notion of a potential maximizing action, due to Monderer and Shapley (1996). We will discuss these generalized potential conditions in more detail in Section 4.4, because they are also sufﬁcient for the (more demanding) property of being robust to incomplete information. The sufﬁcient conditions for noise-independent selection encompass two classes of games already discussed in this survey: many-player, two-action, symmetric payoff games (where the Laplacian action is played); and two-player, two-action games, with possibly asymmetric payoffs (where the risk dominant equilibrium is played). They also encompass two-player, three-action games with symmetric payoffs. They encompass the minimum effort game of Bryant (1983).11 FMP also provide an example of a two-player, four-action, symmetric payoff game where noise-independent selection fails. Thus, there is a unique limit as the noise goes to zero, but the nature of the limit depends on the exact distribution of the noise. Carlsson (1989) gave a three-player, two-action example in which noise-independent selection failed. Corsetti, Dasgupta, Morris, and Shin (2000) describe a global games model of currency crises, where there is a continuum of small traders and a single large trader. This is thus a many-player, two-action game with asymmetric payoffs. We show that the equilibrium selected as noise goes to zero depends on the relative informativeness of the large and small traders’ signals. This is thus an application where noise-independent selection fails. We conclude this brief summary by noting one consequence of FMP for the earlier analysis in this paper. 
In Section 2.2, it was shown that the Laplacian action was selected in symmetric binary action global games. The argument exploited the fact that players observed signals with iid noise in that class of games. But, FMP show noise-independent selection of the Laplacian action independent of the distribution of noise. If the distribution of noise is very different for different players, we surely cannot guarantee that each player has a uniform belief over the proportion of his opponents taking each action. Nonetheless, the Laplacian action must be played in the limit. We can illustrate this implication with a simple example.

10 The conditions for limit uniqueness in FMP could also presumably be weakened in a number of directions. For example, with additional restrictions on the noise structure, one could perhaps use the monotone comparative statics under uncertainty techniques of Athey (2001, 2002), as in lemma 2.3.

11 Carlsson and Ganslandt (1998) show the potential maximizing action is selected in the minimum effort game when players’ continuous actions are perturbed.

Consider a three-player game, with binary action set {0, 1}. The payoff to action 1 is θ if both of the other players choose action 1, θ − z if one other player chooses action 1, and θ − 1 if neither
player chooses action 1 (where 0 < z < 1). The payoff to action 0 is zero. State θ is uniformly distributed on the real line. Observe that the Laplacian action is 1 if $\tfrac{1}{3}\theta + \tfrac{1}{3}(\theta - z) + \tfrac{1}{3}(\theta - 1) > 0$ [i.e., $\theta > \tfrac{1}{3}(z+1)$]. Let $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$ be i.i.d. with symmetric c.d.f. F(·), let δ be a very small positive number, and let σ be a parameter describing the size of the noise. The players’ signals $x_1$, $x_2$, and $x_3$ are given by

$$x_1 = \theta + \sigma\delta\varepsilon_1, \qquad x_2 = \theta + \sigma\delta\varepsilon_2, \qquad x_3 = \theta + \sigma\varepsilon_3.$$

Thus, players 1 and 2 observe much more informative signals. We will look for a switching strategy equilibrium, where players 1 and 2 use cutoff $\underline{x}_\sigma$ and player 3 uses cutoff $\bar{x}_\sigma$. Let

$$\lambda_\sigma = F\!\left(\frac{\bar{x}_\sigma - \underline{x}_\sigma}{\sigma}\right).$$

We are interested in what happens in the limit as first we take δ → 0, and then take the limit as σ → 0. As δ becomes very small, if player 1 or 2 observes signal $\underline{x}_\sigma$, he will assign probability (about) $\tfrac{1}{2}(1-\lambda_\sigma)$ to both other players choosing action 1, probability (about) $\tfrac{1}{2}$ to one player choosing action 1, and probability (about) $\tfrac{1}{2}\lambda_\sigma$ to neither player choosing action 1; whereas, if player 3 observes signal $\bar{x}_\sigma$, he will assign probability $\lambda_\sigma$ to both players choosing action 1, probability 0 to one player choosing action 1, and probability $1-\lambda_\sigma$ to neither player choosing action 1. Thus, we must have:

$$\tfrac{1}{2}(1-\lambda_\sigma)\,\underline{x}_\sigma + \tfrac{1}{2}(\underline{x}_\sigma - z) + \tfrac{1}{2}\lambda_\sigma(\underline{x}_\sigma - 1) = 0,$$

$$\lambda_\sigma\,\bar{x}_\sigma + 0\cdot(\bar{x}_\sigma - z) + (1-\lambda_\sigma)(\bar{x}_\sigma - 1) = 0.$$

Rearranging gives:

$$\underline{x}_\sigma = \tfrac{1}{2}z + \tfrac{1}{2}\lambda_\sigma, \qquad \bar{x}_\sigma = 1 - \lambda_\sigma.$$
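The two cutoff conditions, together with the definition of λσ, form a one-dimensional fixed-point problem in λσ that is easy to solve numerically. The sketch below takes the δ → 0 approximation as given and solves for the cutoffs under two different noise distributions (standard normal and logistic); the function names and the values z = 0.4 and σ = 10⁻⁴ are illustrative assumptions.

```python
import math


def normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def logistic_cdf(t: float) -> float:
    # Written with tanh so that very large |t| does not overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * t))


def solve_cutoffs(z: float, sigma: float, F, iterations: int = 200):
    """Return (x_low, x_high, lam) solving lam = F((x_high - x_low) / sigma)
    with x_low = z/2 + lam/2 (players 1 and 2) and x_high = 1 - lam (player 3)."""

    def excess(lam: float) -> float:
        x_low = 0.5 * z + 0.5 * lam
        x_high = 1.0 - lam
        return F((x_high - x_low) / sigma) - lam

    # excess is strictly decreasing in lam, positive at 0 and negative at 1,
    # so bisection finds the unique root.
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return 0.5 * z + 0.5 * lam, 1.0 - lam, lam


z = 0.4
laplacian_cutoff = (z + 1.0) / 3.0
results = {name: solve_cutoffs(z, 1e-4, F)
           for name, F in (("normal", normal_cdf), ("logistic", logistic_cdf))}
for name, (x_low, x_high, lam) in results.items():
    print(name, round(x_low, 6), round(x_high, 6), round(lam, 6))
```

With small σ, both cutoffs sit close to (z + 1)/3 under either distribution, which is the noise-independence property the example is meant to illustrate.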

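As σ → 0, we must have $\underline{x}_\sigma \to \bar{x}_\sigma$ and thus $\lambda_\sigma \to \tfrac{2}{3}\left(1 - \tfrac{1}{2}z\right)$ [so, $(\bar{x}_\sigma - \underline{x}_\sigma)/\sigma \to F^{-1}(\lambda_\sigma)$]. Thus, $\underline{x}_\sigma$ and $\bar{x}_\sigma$ must both converge to $\tfrac{1}{3}(z+1)$. But this gives the result that the Laplacian action is played by all players in the limit, independent of the shape of F.

4.2. Higher-Order Beliefs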

In global games, the importance of the noisy observation of the underlying state lies in the fact that it generates strategic uncertainty, that is, uncertainty about others’ behavior in equilibrium. That strategic uncertainty is generated by
players’ uncertainty about other players’ payoffs. Thus, understanding global games involves understanding how equilibria depend on players’ uncertainty about other players’ payoffs. But, clearly, it is not going to be enough to know each player’s beliefs about other players’ payoffs. We must also take into account each player’s beliefs about other players’ beliefs about his payoffs, and further such higher-order beliefs. Players’ payoffs and higher-order beliefs about payoffs are the true primitives of a game of incomplete information, not the asymmetric information structure. In earlier sections, we told an asymmetric information story about how there is a true state of fundamentals θ drawn from some prior and each player observes a signal of θ generated by some technology. But, our analysis of the resulting game implicitly assumes that there is common knowledge of the prior distribution of θ and the signaling technologies. It is hard to defend this assumption literally when the original purpose was to get away from the unrealistic assumption that there is common knowledge of the realization of θ . The classic arguments of Harsanyi (1967–1968) and Mertens and Zamir (1985) tell us that we can assume common knowledge of some state space without loss of generality. But such a common knowledge state space makes sense with an incomplete information interpretation (a player’s “type” is a description of his higher-order beliefs about payoffs), but not with an asymmetric information interpretation (a player’s “type” is a signal drawn according to some ex ante ﬁxed distribution); see Battigalli (1999) and Dekel and Gul (1996) for forceful defenses of this position. Thus, we believe that the noise structures analyzed in global games are interesting because they represent a tractable way of generating a rich structure of higher-order beliefs. 
The analysis of global games represents a natural vehicle to illustrate the power of higher-order beliefs at work in applications.12 But, then, the natural way to understand the “trick” to global games analysis is to go back and understand what is going on in terms of higher-order beliefs. Even if one is uninterested in the philosophical distinction between incomplete information and asymmetric information, there is a second reason why the higher-order beliefs literature may contribute to our understanding of global games. Even keeping a pure asymmetric information interpretation, we can calculate (from the prior distribution over θ and the signal technologies) the players’ higher-order beliefs about payoffs. Statements about higher-order beliefs about payoffs turn out to represent a natural mathematical way of characterizing which properties of the prior distribution and signal technologies matter for the results. The pedagogical risk of emphasizing higher-order beliefs is that readers may conclude that playing in the uniquely rational way in a global game requires fancy powers of reasoning, some kind of hyperrationality that allows them to reason to an arbitrarily high number of levels.

12 For work on higher-order beliefs not using the global games technology, see Townsend (1983); Allen, Morris, and Postlewaite (1993); Shin (1996); and the discussion of Section 4.1 of Allen and Morris (2000).

We emphasize that the fact that either the analyst or a player expresses information about the game in terms
of higher-order beliefs does not make standard equilibrium concepts any less compelling and does not suggest any particular view about how equilibrium behavior might be arrived at. In particular, recall that there is a very simple heuristic that will generate equilibrium behavior in symmetric binary action games. If there is not common knowledge of the environment you are in, you should hold diffuse beliefs about others’ behavior. In particular, if you are on the margin between your two actions, it seems reasonable to take the agnostic view that you are equally likely to hold any rank in the population concerning your evaluation of the desirability of the two actions. Thus, if other people behave like you, you should make your decision on the assumption that the proportion of other players choosing each action is uniformly distributed. This reasoning sounds naive, but actually generates a very simple heuristic for behavior that is consistent with the unique rational behavior.

In the remainder of this section, we first informally discuss the role of higher-order beliefs in a global game example. Then, we review briefly the theoretical literature on higher-order beliefs in games.13 Finally, we show how results from that literature can be taken back to the analysis of global games.

Monderer and Samet (1989) introduced a natural language for characterizing players’ higher-order beliefs. Fix a probability p ∈ (0, 1]. Let Ω be a set of possible states, and let E be any subset of Ω. The event E is p-believed at state ω among some fixed group of individuals if everyone believes that it is true with probability at least p (and we write $B^pE$ for the set of states where event E is p-believed). The event E is common p-belief at state ω if it is p-believed, it is p-believed that it is p-believed, and so on, up to an arbitrary number of levels [and we write $C^p(E)$ for the set of states where event E is common p-belief].
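On a finite state space, the p-belief operator and common p-belief can be computed by direct iteration. The sketch below is a toy illustration only: the four-state space, uniform prior, and two partitions are made up for the example, and the function names are hypothetical. Exact fractions are used so that boundary cases such as a belief of exactly 1/2 are handled without rounding.

```python
from fractions import Fraction


def p_believed(event, p, prior, partitions):
    """States at which every player assigns probability at least p to `event`."""
    result = set(prior)
    for partition in partitions:
        believers = set()
        for cell in partition:
            cell_mass = sum(prior[w] for w in cell)
            event_mass = sum(prior[w] for w in cell if w in event)
            if cell_mass > 0 and event_mass >= p * cell_mass:
                believers |= set(cell)
        result &= believers
    return result


def common_p_belief(event, p, prior, partitions):
    """Iterate the p-belief operator until the set of states stops shrinking."""
    current = p_believed(event, p, prior, partitions)
    while True:
        nxt = current & p_believed(current, p, prior, partitions)
        if nxt == current:
            return current
        current = nxt


# Toy example (assumed): four equally likely states and two players.
prior = {w: Fraction(1, 4) for w in range(4)}
partitions = [
    [{0}, {1, 2}, {3}],   # player 1's information partition
    [{0, 1}, {2, 3}],     # player 2's information partition
]
event = {1, 2, 3}

high = common_p_belief(event, Fraction(3, 5), prior, partitions)
low = common_p_belief(event, Fraction(1, 2), prior, partitions)
print(high, low)
```

In this toy structure the event is common p-belief on {1, 2, 3} when p = 1/2, but the iteration unravels to the empty set when p = 3/5.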
The event E is p-evident if whenever it is true, it is p-believed (i.e., $E \subseteq B^pE$). Monderer and Samet proved the following result:

Proposition 4.1. Event E is common p-belief at ω [i.e., $\omega \in C^p(E)$] if and only if there exists a p-evident event F such that $\omega \in F \subseteq B^pE$.

This result provides a fixed-point characterization (i.e., using the p-evident property) of an iterative definition of common p-belief. It thus generalizes Aumann’s classic characterization of common knowledge (Aumann, 1976). We will illustrate these properties of higher-order beliefs in the global games setting.14

13 Our review of this literature is much abbreviated and highly selective. See Fudenberg and Tirole (1991) Chapter 14; Osborne and Rubinstein (1994) Chapter 5; Geanakoplos (1994); and Dekel and Gul (1996) for more background on this material. Morris and Shin (1997) survey the higher-order beliefs in game theory literature with a focus on the relationship to related literatures in philosophy and computer science. Kajii and Morris (1997c) survey this literature with a focus on the relation to the standard refinements literature in game theory.

14 Monderer and Samet (1989) characterized common p-belief for discrete state spaces, but Kajii and Morris (1997b) show the straightforward extension to continuum state spaces.

So, consider again the two-player example of Section 2.1: θ is drawn uniformly from the real line and players i = 1, 2 each observe a signal
$x_i = \theta + \varepsilon_i$, where $\varepsilon_i$ is distributed normally with mean 0 and standard deviation σ. Thus, the relevant state space is $\mathbb{R}^3$, with typical element $(\theta, x_1, x_2)$. Fix the payoff relevant event $E_k = \{(\theta, x_1, x_2) : \theta \ge k\}$; this is the set of states where the true θ is at least k. If player i observes signal $x_i$, he will assign probability $\Phi((x_i - k)/\sigma)$ to the event $E_k$ being true. Thus, he will assign probability at least p to the event $E_k$ exactly if $x_i \ge k + \sigma\Phi^{-1}(p) \ge k$. Thus

$$B^pE_k = \{(\theta, x_1, x_2) : x_i \ge k + \sigma\Phi^{-1}(p),\ i = 1, 2\}.$$

Now, if player i observes $x_i$, he assigns probability $\Phi\!\left((x_i - \kappa)/(\sqrt{2}\sigma)\right)$ to player j observing a signal above κ, and he assigns probability at least p to that event exactly if $x_i \ge \kappa + \sqrt{2}\sigma\Phi^{-1}(p)$. In addition, player i knows for sure whether $x_i$ is greater than κ. Thus

$$B^pB^pE_k = \{(\theta, x_1, x_2) : x_i \ge k + \sigma\Phi^{-1}(p) + \max\{0, \sqrt{2}\sigma\Phi^{-1}(p)\},\ \text{for } i = 1, 2\}$$

and, by induction,

$$[B^p]^nE_k = \{(\theta, x_1, x_2) : x_i \ge k + \sigma\Phi^{-1}(p) + (n-1)\max\{0, \sqrt{2}\sigma\Phi^{-1}(p)\},\ \text{for } i = 1, 2\}. \qquad (4.1)$$

So

$$C^pE_k = \bigcap_{n\ge 1}[B^p]^nE_k = \begin{cases} \emptyset, & \text{if } p > \tfrac{1}{2}, \\ \{(\theta, x_1, x_2) : x_i \ge k + \sigma\Phi^{-1}(p),\ \text{for } i = 1, 2\}, & \text{if } p \le \tfrac{1}{2}. \end{cases}$$
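The signal threshold in equation (4.1) is simple to tabulate. The sketch below uses the standard library's `statistics.NormalDist` for the inverse normal c.d.f.; the function names and the numbers k = 0.2, p = 0.9, and a signal of 0.4 are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def belief_threshold(k: float, sigma: float, p: float, n: int) -> float:
    """Smallest signal x_i at which the state lies in [B^p]^n E_k, as in equation (4.1)."""
    q = NormalDist().inv_cdf(p)  # inverse standard normal c.d.f. at p
    return k + sigma * q + (n - 1) * max(0.0, sqrt(2.0) * sigma * q)


def largest_sigma(k: float, signal: float, p: float, n: int) -> float:
    """Largest noise scale at which `signal` still reaches level n (requires p > 1/2)."""
    q = NormalDist().inv_cdf(p)
    return (signal - k) / ((1.0 + (n - 1) * sqrt(2.0)) * q)


# For p > 1/2 the threshold rises without bound in n, so no signal survives every level.
rising = [belief_threshold(0.2, 0.1, 0.9, n) for n in (1, 2, 5, 50)]

# For p <= 1/2 the threshold does not depend on n.
flat = [belief_threshold(0.2, 0.1, 0.4, n) for n in (1, 2, 5, 50)]

# With small enough noise, a signal of 0.4 reaches any fixed level n.
sigma_5 = largest_sigma(0.2, 0.4, 0.9, 5)
print(rising, flat, sigma_5)
```

The contrast between `rising` and `flat` is the numerical counterpart of the two cases in the expression for common p-belief of the event.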

Thus, a remarkable feature of this simple example is that for any p > 1/2, there is never common p-belief that θ is greater than k, for any k. We could also have shown this using the characterization of common p-belief described in proposition 4.1. For any k, event $E_k$ is p-evident only if p ≤ 1/2. This is because a player observing signal k will always assign probability 1/2 to his opponent observing a signal less than k. A key property of global games is that they fail to deliver nontrivial common p-belief and p-evident events (for high p). As we will see, the existence of such events is key to supporting multiple equilibria in incomplete information games.

Combining this information structure with the payoffs from the two-player example of Section 2.1, we can illustrate the extreme sensitivity of strategic outcomes to players’ higher-order beliefs. Recall that each player had to choose between not investing (with payoff 0) and investing (with payoff θ if the other player invests, and payoff θ − 1 otherwise). The unique equilibrium involved each player i investing if his signal $x_i$ was greater than 1/2 and not otherwise. This result was independent of σ (the scale variable of the noise). Now observe that if

$$\sigma \le \frac{1}{5\left(1 + (n-1)\sqrt{2}\right)\Phi^{-1}(p)},$$

then [by equation (4.1)] for all θ,

$$\left(\theta, \tfrac{2}{5}, \tfrac{2}{5}\right) \in [B^p]^nE_{1/5}.$$

In words, suppose that each player observed signal 2/5. If we fix any integer n and any p < 1, we may choose σ sufficiently small such that it is p-believed that it is p-believed that (n times) . . . that θ is greater than 1/5. If it was common knowledge that θ was greater than 1/5, it would clearly be rational for both players to invest. But, the unique rational behavior has each player not investing.

Rubinstein (1989) used his electronic mail game to illustrate this sensitivity of strategic outcomes to common knowledge. Monderer and Samet (1989) showed why n levels of p-belief or even knowledge was not enough to approximate common knowledge in strategic settings, and common p-belief (i.e., an infinite number of levels) is required. The idea behind this observation is illustrated in the next section. Morris, Rob, and Shin (1995) showed why only some Nash equilibria (e.g., risk-dominated equilibria) were sensitive to higher-order beliefs and not others, and provided a characterization, related to the lack of common p-belief events, of which (discrete state) information systems displayed an extreme sensitivity to higher-order beliefs (see also Sorin, 1998). Kajii and Morris (1997a) introduced a notion of robustness to incomplete information to characterize equilibria that are not sensitive to higher-order beliefs. This work is reviewed and related back to global games in Sections 4.4 and 4.5.

4.3. Common p-Belief and Game Theory

Fix a finite set of players 1, . . . , I and a finite action set $A_i$ for each player i. A complete information game is then a vector of payoff functions, $g \equiv (g_1, \ldots, g_I)$, where each $g_i : A \to \mathbb{R}$. A (discrete state) incomplete information game is then a collection $\{\Omega, \pi, (P_i)_{i=1}^{I}, (u_i)_{i=1}^{I}\}$, where Ω is a countable state space, $\pi \in \Delta(\Omega)$ is a prior probability on that state space, $P_i$ is the partition of the state space of player i, and $u_i : A \times \Omega \to \mathbb{R}$ is the payoff function of player i. For any given incomplete information game $\{\Omega, \pi, (P_i)_{i=1}^{I}, (u_i)_{i=1}^{I}\}$, we may write |g| for the set of states in the incomplete information game where payoffs are given by g. Thus,

$$|g| = \{\omega \in \Omega \mid u_i(a, \omega) = g_i(a) \text{ for all } a \in A \text{ and } i = 1, \ldots, I\}.$$

Using this language, we can summarize some key observations from the theoretical literature on higher-order beliefs in game theory. A pure strategy Nash equilibrium $a^*$ of a complete information game, g, is said to be a p-dominant equilibrium (Morris, Rob, and Shin, 1995) if each player’s action is a best response whenever he assigns probability at least p to his opponents choosing according to $a^*$, i.e.,

$$\sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\,g_i(a_i^*, a_{-i}) \ge \sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\,g_i(a_i, a_{-i})$$

for all i = 1, . . . , I, $a_i \in A_i$, and $\lambda \in \Delta(A_{-i})$ such that $\lambda(a_{-i}^*) \ge p$.
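For a two-player game, p-dominance can be checked at finitely many beliefs, because the inequality is linear in λ and the admissible beliefs are mixtures that put weight p on the opponent's equilibrium action and the remaining 1 − p on a single action. The sketch below applies this to the two-player investment game recalled in Section 4.2 (payoff 0 from not investing, θ from investing if the other invests, θ − 1 otherwise); the function names and θ = 0.7 are illustrative assumptions.

```python
def is_p_dominant(payoffs, a_star, p, tol=1e-12):
    """Check p-dominance of the pure profile a_star in a two-player game.

    payoffs[i][a_own][a_opp] is player i's payoff from own action a_own
    when the opponent plays a_opp.
    """
    for i in (0, 1):
        own_star, opp_star = a_star[i], a_star[1 - i]
        g = payoffs[i]
        for alt in range(len(g)):            # every alternative own action
            for b in range(len(g[0])):       # where the residual 1 - p mass sits
                gain = (p * (g[own_star][opp_star] - g[alt][opp_star])
                        + (1.0 - p) * (g[own_star][b] - g[alt][b]))
                if gain < -tol:
                    return False
    return True


def investment_game(theta: float):
    """Symmetric 2x2 game: action 0 = not invest, action 1 = invest."""
    g = [[0.0, 0.0],               # not invest: payoff 0 regardless
         [theta - 1.0, theta]]     # invest: theta - 1 if other stays out, theta if other invests
    return [g, g]


game = investment_game(0.7)
checks = {
    "invest, p=0.35": is_p_dominant(game, (1, 1), 0.35),
    "invest, p=0.25": is_p_dominant(game, (1, 1), 0.25),
    "not invest, p=0.75": is_p_dominant(game, (0, 0), 0.75),
    "not invest, p=0.65": is_p_dominant(game, (0, 0), 0.65),
}
print(checks)
```

In this game, investing by both players is p-dominant exactly when p ≥ 1 − θ and not investing is p-dominant exactly when p ≥ θ, which the four checks bracket from either side.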

Lemma 4.2. If $a^*$ is a p-dominant equilibrium of complete information game g, then every incomplete information game $\{\Omega, \pi, (P_i)_{i=1}^{I}, (u_i)_{i=1}^{I}\}$ has an equilibrium where $a^*$ is played with probability 1 on the event $C^p(|g|)$.

The proof of this result is straightforward. The event $C^p(|g|)$ is itself a p-evident event. Consider the modified incomplete information game where each player is constrained to choose according to $a^*$ when he p-believes the event $C^p(|g|)$. Find an equilibrium of that modified game. By construction, $a^*$ is played with probability 1 on the event $C^p(|g|)$. But, the equilibrium of the modified game is also an equilibrium of the original game. If a player i p-believes the event $C^p(|g|)$, then he p-believes that other players are choosing $a_{-i}^*$. But, because his payoffs are given by g and $a^*$ is a p-dominant equilibrium, $a_i^*$ must be a best response for player i.

Because every strict Nash equilibrium is a p-dominant equilibrium for some p < 1, we immediately have:

Corollary 4.3. If $a^*$ is a strict Nash equilibrium of complete information game g, then there exists p < 1, such that every incomplete information game $\{\Omega, \pi, (P_i)_{i=1}^{I}, (u_i)_{i=1}^{I}\}$ has an equilibrium where $a^*$ is played on the event $C^p(|g|)$.

Thus, if we took a sequence of incomplete information games where in the limit payoffs are common knowledge, and close to the limit they are common p-belief (with p close to 1) with ex ante probability close to 1, then payoffs from equilibria of that sequence of incomplete information games must converge to payoffs in the limit game. Monderer and Samet (1989) proved such a lower hemicontinuity result. One can also ask a converse question: what is the relevant topology on information systems, such that information systems close to common knowledge information systems deliver outcomes that are close to common knowledge outcomes? Monderer and Samet (1996) and Kajii and Morris (1998) characterize such topologies (for different kinds of information system).

4.4. Robustness to Incomplete Information

Let $a^*$ be a pure strategy Nash equilibrium of complete information game g; $a^*$ is robust to incomplete information if every incomplete information game where payoffs are almost always given by g has an equilibrium where players almost always choose $a^*$ [Kajii and Morris (KM), 1997a].15 More precisely, $a^*$ is robust to incomplete information if, for all δ > 0, there exists ε > 0, such that every incomplete information game where π(|g|) ≥ 1 − ε has an equilibrium where $a^*$ is played by all players on an event with probability at least 1 − δ.

Robustness (to incomplete information) can be seen as a very strong refinement of Nash equilibrium. Kajii and Morris (1997b) provide a detailed account of the relation between robustness and the existing refinements literature, which we briefly summarize here. The refinements literature examines what happens to a given Nash equilibrium in perturbed versions of the complete information game. A weak class of refinements requires only that the Nash equilibrium continues to be an equilibrium in some nearby perturbed game [Selten’s (1975) notion of perfect equilibrium is the leading example of this class]; a stronger class requires that the Nash equilibrium continues to be played in all perturbed nearby games [Kohlberg and Mertens’ (1986) notion of stable equilibria is the leading example of this class]. Robustness belongs to the latter, stronger class of refinements. Moreover, robustness to incomplete information allows an extremely rich set of “perturbed games.” In particular, while Kohlberg and Mertens allow only independent action trembles across players, the definition of robustness leads to highly correlated trembles and thus an even stronger refinement. Indeed, KM construct an example in the spirit of Rubinstein (1989) to show that even a game with a unique Nash equilibrium, which is strict, may fail to have any robust equilibrium.

Yet it turns out that a large set of games do have robust equilibria. KM provided two sufficient conditions. The first is that if $a^*$ is the unique correlated equilibrium of g, then $a^*$ is robust. The second sufficient condition comes from a generalization of the notion of p-dominance. Fix a vector of probabilities, $\mathbf{p} = (p_1, \ldots, p_I)$, one for each player. Action profile $a^*$ is a $\mathbf{p}$-dominant equilibrium if each player i’s action is a best response whenever he assigns probability at least $p_i$ to his opponents choosing according to $a^*$, i.e.,

$$\sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\,g_i(a_i^*, a_{-i}) \ge \sum_{a_{-i} \in A_{-i}} \lambda(a_{-i})\,g_i(a_i, a_{-i})$$

for all i = 1, . . . , I, $a_i \in A_i$, and $\lambda \in \Delta(A_{-i})$ such that $\lambda(a_{-i}^*) \ge p_i$. If $a^*$ is a $\mathbf{p}$-dominant equilibrium for some $\mathbf{p}$ with $\sum_{i=1}^{I} p_i \le 1$, then $a^*$ is robust to incomplete information. This property is a many-player, many-action generalization of risk dominance. KM proved this result by showing a surprising property of higher-order beliefs. Say that an event is $\mathbf{p}$-believed (for some vector of probabilities $\mathbf{p}$) if each player i believes it with probability at least $p_i$; and the event is common $\mathbf{p}$-belief if it is $\mathbf{p}$-believed, it is $\mathbf{p}$-believed that it is $\mathbf{p}$-believed, etc.

15 KM define the property of robustness to incomplete information for mixed strategy equilibria also, but most of the sufficient conditions described previously apply only to pure strategy profiles. For this reason, we focus on pure strategy profiles in the discussion that follows.

KM show that if vector $\mathbf{p}$ satisfies $\sum_{i=1}^{I} p_i \le 1$, and an event
has a high probability, then with high probability that event is common p-belief. A generalization of lemma 4.2 then proves the robustness result.

Further sufficient conditions for robustness exploit the idea of potential games due to Monderer and Shapley (1996). A function $v : A \to \mathbb{R}$ is a potential function for complete information game $g$ if

$$v(a_i, a_{-i}) - v(a_i', a_{-i}) = g_i(a_i, a_{-i}) - g_i(a_i', a_{-i})$$

for all $i = 1, \ldots, I$, $a_i, a_i' \in A_i$, and $a_{-i} \in A_{-i}$. This property implies that the game $g$ has identical mixed strategy best response correspondences to the common interest game with common payoff function $v$. Observe that $a^*$ is thus a Nash equilibrium of $g$ if it is a local maximizer of $v$ (i.e., it is not possible to increase $v$ by changing one player's action). Monderer and Shapley suggested that if a game has multiple Nash equilibria, the global maximizer of $v$ (which must of course be a local maximizer and thus a Nash equilibrium) is a natural candidate for selection. If action profile $a^*$ is the strict maximum of a potential function $v$ for complete information game $g$, we say that $a^*$ is a potential maximizer of $g$. Ui (2001) shows that a potential maximizing action profile is necessarily robust to incomplete information.[16] Many-player, two-action, symmetric payoff games are potential games, so this result provides a proof that the strategy profile where all players choose the Laplacian action is robust to incomplete information.[17]

The p-dominance sufficient conditions and potential game sufficient conditions for robustness can be unified and generalized. We very briefly sketch the main ideas and refer the reader to Morris (1999) for more details. Action profile $a^*$ is a characteristic potential maximizer of the complete information game $g$ if there exists a function $v : 2^{\{1,\ldots,I\}} \to \mathbb{R}$ with $v(\{1, \ldots, I\}) > v(S)$ for all $S \neq \{1, \ldots, I\}$, and $\mu_i : A_i \to \mathbb{R}_+$ such that for all $i$, $a_i \in A_i$, and $a_{-i} \in A_{-i}$,

$$v(\{j : a_j = a_j^*\}) - v(\{j : a_j = a_j^*\} \cup \{i\}) \ge \mu_i(a_i)\,\bigl(g_i(a_i, a_{-i}) - g_i(a_i^*, a_{-i})\bigr).$$

Here, $v(\cdot)$ is a potential function that depends only on the set of players choosing according to $a^*$. In this sense, the characteristic potential maximizer condition strengthens the potential maximizer condition. But the earlier equalities are replaced with inequalities, and the constants $\mu_i$ also add extra degrees of freedom. So the characteristic potential maximizer condition neither implies nor is implied by the potential maximizer condition. Any characteristic potential maximizing action profile is robust to incomplete information. One can use duality arguments to show that if $a^*$ is a p-dominant equilibrium for some $p$ with $\sum_{i=1}^{I} p_i \le 1$, then $a^*$ is a characteristic potential maximizer.[18]

16. Ui uses a slightly weaker version of robustness to incomplete information, where all types in the perturbed game either have payoffs given exactly by the complete information game $g$ or have a dominant strategy to choose some action.
17. Morris (1997) previously provided an independent argument showing the robustness of the Laplacian strategy profile.
18. Ui (2000) extends these ideas with a set-based notion of robustness to incomplete information.
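The potential function condition above can be checked mechanically for a small game. The following is a minimal sketch, not part of the original analysis: it assumes the two-player investment game used elsewhere in this chapter (investing pays x if the opponent invests and x − 1 otherwise, not investing pays 0), and the candidate potential `v` as well as the helper names `is_potential`, `potential_maximizers`, and `investment_game` are illustrative choices made here.

```python
from itertools import product

def is_potential(payoffs, v, action_sets, tol=1e-12):
    """Check v(a_i, a_-i) - v(a_i', a_-i) = g_i(a_i, a_-i) - g_i(a_i', a_-i)
    for every player i, every profile, and every unilateral deviation."""
    for profile in product(*action_sets):
        for i, actions in enumerate(action_sets):
            for alt in actions:
                deviation = profile[:i] + (alt,) + profile[i + 1:]
                dv = v[profile] - v[deviation]
                dg = payoffs[profile][i] - payoffs[deviation][i]
                if abs(dv - dg) > tol:
                    return False
    return True

def potential_maximizers(v, tol=1e-12):
    """Return the action profiles at which the potential attains its maximum."""
    best = max(v.values())
    return [a for a, val in v.items() if abs(val - best) <= tol]

def investment_game(x):
    """Hypothetical example: 1 = Invest, 0 = NotInvest, common parameter x."""
    g = {(1, 1): (x, x), (1, 0): (x - 1, 0.0),
         (0, 1): (0.0, x - 1), (0, 0): (0.0, 0.0)}
    # Candidate potential (an assumption of this sketch, verified by is_potential).
    v = {(1, 1): 2 * x - 1, (1, 0): x - 1, (0, 1): x - 1, (0, 0): 0.0}
    return g, v

ACTION_SETS = [(0, 1), (0, 1)]
```

Under these assumptions, the potential is maximized by both players investing when x exceeds 1/2 and by neither investing when x is below 1/2, which matches the risk-dominant (Laplacian) action in this two-action game.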


Let the actions of each player be ordered, and for any action $a_i \in A_i$, write $a_i^-$ for the action below $a_i$ and $a_i^+$ for the action above $a_i$. Action profile $a^*$ is a local potential maximizer of the complete information game $g$ if there exists a local potential function $v : A \to \mathbb{R}$ with $v(a^*) > v(a)$ for all $a \neq a^*$ and, for each $i$, $\mu_i : A_i \to \mathbb{R}_+$, such that for all $i = 1, \ldots, I$ and $a_{-i} \in A_{-i}$,

$$v(a_i, a_{-i}) - v(a_i^-, a_{-i}) \ge \mu_i(a_i)\,\bigl[g_i(a_i, a_{-i}) - g_i(a_i^-, a_{-i})\bigr] \quad \text{if } a_i > a_i^* \qquad (4.2)$$

and

$$v(a_i, a_{-i}) - v(a_i^+, a_{-i}) \ge \mu_i(a_i)\,\bigl[g_i(a_i, a_{-i}) - g_i(a_i^+, a_{-i})\bigr] \quad \text{if } a_i < a_i^*.$$

One can show that if $a^*$ is either a potential maximizer or a characteristic potential maximizer, then $a^*$ is a local potential maximizer. Thus, the local potential maximizer condition generalizes both conditions. If $a^*$ is a local potential maximizer of $g$, and $g$ satisfies strategic complementarities and each $g_i(a_i, a_{-i})$ is concave with respect to $a_i$, then $a^*$ is robust to incomplete information. The following two-player, three-action, symmetric payoff game satisfies the strategic complementarity and concavity conditions, and one can show that (0, 0) is the local potential maximizer and thus robust (the earlier conditions do not help to characterize robustness in this example; see Table 3.5):

Table 3.5. Payoffs in three-action example

         0          1          2
0      4, 4       0, 0      −6, −3
1      0, 0       1, 1       0, 0
2     −3, −6      0, 0       2, 2

In fact, the local potential maximizer condition can be used to characterize the unique robust equilibrium in generic two-player, three-action, symmetric payoff games.

4.5. Noise-Independent Selection

If an action profile is robust to incomplete information, we know that, roughly speaking, any way that a “small” amount of incomplete information is added cannot prevent that action profile being played in equilibrium. This observation has important implications for global games. Consider a global game where payoffs depend continuously on a random parameter $\theta$ (which could be multidimensional), and each player observes a noisy signal $x_i = \theta + \sigma \varepsilon_i$. If $a^*$ is a robust equilibrium of the game being played at $\theta^*$, then there will always be an equilibrium of the global game (for small $\sigma$) where action profile $a^*$ is almost always played whenever all players observe signals close to $\theta^*$. In other words, there will be no way of adding noise that will prevent action profile $a^*$ being played in the neighborhood of $\theta^*$ in some equilibrium. Thus, if there is limit uniqueness [say, because there are strategic complementarities and the other assumptions of Frankel, Morris, and Pauzner (2000) are satisfied], then $a^*$ must be played in the unique limit for every noise distribution. In the language of Section 4.1, $a^*$ must be the noise-independent selection.

Here is a heuristic argument for this claim. Fix $\theta^*$ and let $a^*$ be a Nash equilibrium of the complete information game at $\theta^*$ that is robust to incomplete information. By definition, if $a^*$ is robust to incomplete information in game $u(\cdot, \theta^*)$, every incomplete information game where payoffs are almost always given by $u(\cdot, \theta^*)$ has an equilibrium where $a^*$ is almost always played. Generically, it will also be true that every incomplete information game where payoffs are almost always close to $u(\cdot, \theta^*)$ will have an equilibrium where $a^*$ is almost always played. But now consider an incomplete information game where some types of each player have payoffs close to $u(\cdot, \theta^*)$ (“sane” types), although some types may have very different payoffs (“crazy” types). Suppose that conditional on any player being sane, with probability close to 1, he assigns probability close to 1 to all other players being sane. Now, the robustness arguments described previously could be adapted to show that this incomplete information game has an equilibrium where, conditional on all players being sane, $a^*$ is almost always played.

Now, return to the global game and write $B(\theta^*, \delta)$ for a $\delta$ ball around $\theta^*$ (i.e., the set of $\theta$ within Euclidean distance $\delta$ of $\theta^*$). For a generic choice of $\theta^*$, $a^*$ will remain robust to incomplete information close to $\theta^*$ [i.e., at all $\theta \in B(\theta^*, \delta)$ for some sufficiently small $\delta > 0$]. Now, consider a sequence of global games where we let the noise go to zero (i.e., $\sigma \to 0$). For fixed $\delta$ and fixed $q < 1$, we can choose $\sigma$ sufficiently small such that conditional on a player observing a signal in $B(\theta^*, \delta)$, with probability at least $q$, he will assign probability at least $q$ to all other players observing signals within $B(\theta^*, \delta)$. Labeling the types who observe signals in $B(\theta^*, \delta)$ “sane” and types who observe signals not in $B(\theta^*, \delta)$ “crazy,” this argument shows that there is an equilibrium where $a^*$ is almost always played in a neighborhood of $\theta^*$.[19]

19. There is a technical problem formalizing this argument. The robustness analysis described in Section 4.4 was carried out in discrete state spaces, where existence of equilibrium in incomplete information games is never a problem. In the uncountable state space setting of global games, it would be necessary to impose extra assumptions to ensure existence.

5. RELATED MODELS: LOCAL HETEROGENEITY AND UNIQUENESS

There are a number of ways that adding local heterogeneity to a population of players can remove multiplicity. In this section, we will attempt to give some intuition for a general logic at work. We start with a familiar example.


There are two players, 1 and 2, and each player i has a payoff parameter $x_i$. Expected payoffs are given by Table 3.6:

Table 3.6. Payoffs in private value example

               Invest          NotInvest
Invest         x1, x2          x1 − 1, 0
NotInvest      0, x2 − 1       0, 0
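As a quick check of the payoff structure in Table 3.6, the sketch below enumerates the strict pure-strategy Nash equilibria for given payoff parameters. The function names (`payoff`, `strict_pure_equilibria`) and the numerical values tried are illustrative assumptions, not part of the original text.

```python
from itertools import product

INVEST, NOT_INVEST = "Invest", "NotInvest"
ACTIONS = (INVEST, NOT_INVEST)

def payoff(own_action, other_action, x_own):
    """Payoff from Table 3.6 for a player with payoff parameter x_own."""
    if own_action == NOT_INVEST:
        return 0.0
    return x_own if other_action == INVEST else x_own - 1.0

def strict_pure_equilibria(x1, x2):
    """List profiles where each player's action is a strict best response."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        strict1 = all(payoff(a1, a2, x1) > payoff(d, a2, x1)
                      for d in ACTIONS if d != a1)
        strict2 = all(payoff(a2, a1, x2) > payoff(d, a1, x2)
                      for d in ACTIONS if d != a2)
        if strict1 and strict2:
            equilibria.append((a1, a2))
    return equilibria
```

With a common parameter strictly between 0 and 1 (for example `strict_pure_equilibria(0.4, 0.4)`), both all-invest and no-invest survive as strict equilibria; outside that interval only one of them does.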

If there was common knowledge that $x_1 = x_2 = x \in (0, 1)$, then there would be multiple strict Nash equilibria of the complete information game. Because both pure strategy equilibria are strict, they seem quite stable. It seems surprising that an apparently “small” perturbation could remove either equilibrium.

But now let $x$ be a publicly observed random variable and let $x_1 = x_2 = x$. Let players be restricted to switching strategies, so that player i will invest if his payoff parameter exceeds some cutoff $k_i$ and not invest otherwise. Thus, player i's strategy is parameterized by a number $k_i$. Because the game is symmetric, we can write $b^*(k)$ for the optimal cutoff of any player if he expects his opponent to choose cutoff $k$. Clearly, we have

$$b^*(k) = \begin{cases} 0, & \text{if } k \le 0 \\ k, & \text{if } 0 \le k \le 1 \\ 1, & \text{if } 1 \le k. \end{cases}$$

This function is plotted in Figure 3.6. Symmetric equilibria will exist when this best response function crosses the 45° line. So, there are a continuum of equilibria: for any $x \in [0, 1]$, there is an equilibrium where each player follows a switching strategy with cutoff $x$. If we perturb this best response function, we would expect there to be a finite number of equilibria (i.e., a finite number of points where the function $b^*$ crosses the 45° line). Given the shape of the best response function, it does not seem surprising that there might be natural ways of perturbing the best response function so that there is a unique equilibrium.

Figure 3.6. Function $b^*(k)$ (horizontal axis $k$, vertical axis $b^*(k)$, each marked at 0 and 1).

The two-player example of Section 2.1 represented one way of carrying out such a perturbation. There, it was assumed that there was a payoff parameter $\theta$, and each player i observed a noisy signal $x_i = \theta + \sigma \varepsilon_i$. The payoffs in Table 3.6 then represent the expected payoffs of the players, given their signals. Recall that a player observing signal $x_i$ will believe that his opponent's signal $x_j$ is distributed normally with mean $x_i$ and standard deviation $\sqrt{2}\sigma$. If $\sigma = 0$ in that example, so there is no noise in the signal, we have exactly the scenario described previously with best response function $b^*$. But if $\sigma > 0$, then the best response function rotates clockwise a little bit and crosses the 45° line only at $\tfrac{1}{2}$ (see Figure 3.1), and there is a unique equilibrium.

However, this argument does not really rely on the incomplete information interpretation. The important feature of the argument is the local heterogeneity in payoffs: a player with payoff parameter $x_i$ knows that he is interacting with other player(s) who have some perhaps different, but nearby, payoff parameters; and he knows that those other player(s) in turn know that they are interacting with other player(s) who have some perhaps different, but nearby, payoff parameters. In the remainder of this section, we will see how a similar logic to the global game argument can arise when players are interacting not with unknown types of an opponent, but with (known) opponents at different locations or at different points in time.[20,21]

5.1. Local Interaction Games

A continuum of players are evenly distributed on the real line. If a player does not invest, his payoff is 0. If he invests, his payoff is $x + l - 1$, where $x$ is his location and $l$ is a weighted average of the proportion of his neighbors investing. In particular, let $f(\cdot)$ be the density of a normal distribution with mean 0 and standard deviation $\sqrt{2}\sigma$; a player puts weight $f(z)$ on the actions of players at location $x + z$. This setup describes a game among a continuum of players. The analysis of this game is identical to the analysis of the continuum player example of Section 2.1. In particular, players at locations less than $\tfrac{1}{2}$ will not invest, and players at locations above $\tfrac{1}{2}$ will invest. This is despite the fact that, if players were interacting only with people at the exact same location (i.e., $\sigma = 0$), there would be multiple equilibria at all locations between 0 and 1.

This rather stylized game illustrates the possibility that in local interaction games, play at some locations may be influenced by play at distant locations via the structure of local interaction. A literature on local interaction games has examined this type of effect.[22] To understand the connection a little better, imagine a local interaction game where payoffs depend in a nonlinear way on location. Thus, let the payoff to investing be $\psi(x) + l - 1$ (instead of $x + l - 1$). Furthermore, suppose that $\psi(x) < \tfrac{1}{2}$ for all $x$ and that $\psi(x) < 0$ for some open interval of values of $x$. For small $\sigma$, this game will have a unique equilibrium where no player ever invests. To see why, note that for sufficiently small $\sigma$, players inside the open interval where $\psi(x) < 0$ will have a dominant strategy to not invest. But now players close to the edge of that interval will have about $\tfrac{1}{2}$ their neighbors within that interval, and thus [since $\psi(x) < \tfrac{1}{2}$ always] will not invest in equilibrium. This argument will iterate to ensure that no investment takes place anywhere.

This argument has very much the flavor of the contagion argument developed by Ellison (1993) and others. There, a population with constant payoffs interacts with near neighbors on a line. Players choose best responses to some average behavior of their neighbors. But a low rate of mutations ensures that small neighborhoods where each action is played will periodically arise randomly. Once a risk-dominant action is played in a small neighborhood, it will tend to spread to the whole population under the best response dynamics. The initial mutant region where the risk-dominant action is played plays much the same role as the dominant strategy region in the story described previously. In this setting with strategic complementarities, best response dynamics mimic iterated deletion of strictly dominated strategies. Morris (1997) describes more formally an exact relationship between a version of Rubinstein's (1989) e-mail game and a version of Ellison's contagion effect, and describes more generally an exact equivalence between games of incomplete information and local interaction games.[23]

The connection between games of incomplete information and local interaction games can be exploited. In evolutionary models, local interaction leads to much faster convergence to stochastically stable states than global interaction, because of the contagious dynamics. But there is a very close connection between which action will spread contagiously in a local interaction game and which action will be played in the limit in a global game. In particular, recall from Section 4.1 that some games have a noise-independent selection (i.e., an action profile played in the limit of a global game, independent of the noise structure), whereas in other games, the action played in the limit depends on the noise structure. Translated to a local interaction setting, this result implies that in some games the same action tends to spread contagiously, independent of the structure of interaction, whereas in other games fine details of the local interaction structure will determine which action is contagious [see Morris (1999) for details]. Thus, local interaction may not just speed up convergence to stochastically stable states, but may change the stochastically stable states in subtle ways.[24]

20. This logic also emerges in the models of Carlsson (1991) and Carlsson and Ganslandt (1998), where players' continuous action choice is subject to a small heterogeneous tremble. The exact connection to global games is not known.
21. A distinctive feature of these arguments relying on local heterogeneity is that a very small amount of heterogeneity is sufficient to imply unique equilibrium in environments where there are multiple strict equilibria without heterogeneity. One can also sometimes obtain uniqueness results assuming global, not local, heterogeneity (i.e., assuming that each player or type has the same, but sufficiently diffuse, beliefs about other players' or types' payoff parameters). Such global heterogeneity uniqueness arguments rely on the existence of a sufficiently large amount of heterogeneity. See Baliga and Sjöström (2001) in an incomplete information context (where global heterogeneity corresponds to independent types); Herrendorf, Valentinyi, and Waldmann (2000) and Glaeser and Scheinkman (2000) in models of large population interactions; and Frankel (2000b) in the context of a dynamic model with payoff shocks.
22. For example, Blume (1995), Ellison (1993), and Young (1998). See Glaeser and Scheinkman (2000) for a recent survey.
23. Hofbauer (1998, 1999) introduces an approach to equilibrium selection in a local interaction environment. His “spatially dominant equilibria” seem to coincide with those that are robust to incomplete information.

5.2. Dynamic Games

5.2.1. Dynamic Payoff Shocks

A continuum of players each live for an instant of time. If a player does not invest, his payoff is 0. If he invests, his payoff is $x + l - 1$, where $x$ is the date at which he lives and $l$ is a weighted average of the proportion of players investing at other points in time. In particular, let $f(\cdot)$ be the density of a normal distribution with mean 0 and standard deviation $\sqrt{2}\sigma$; a player puts weight $f(z)$ on the actions of players living at date $x + z$. This setup describes a game among a continuum of players. The analysis of this game is identical to the analysis of the continuum player example of Section 2.1 and thus also the local interaction example of the previous section. In particular, players will not invest before date $\tfrac{1}{2}$ and will invest after date $\tfrac{1}{2}$. This is despite the fact that, if players were interacting only with people making contemporaneous choices (i.e., $\sigma = 0$), there would be multiple equilibria at all dates between 0 and 1.

This was a very stylized example. But the logic is quite general. In many dynamic strategic environments where choices are made at different points in time, a player's payoff may depend not only on contemporaneous choices, but also on choices made by other players at other times. Payoff conditions may be varying through time. Thus, players' optimal choices may depend indirectly on environments where payoffs are very different from what they are now. These features may allow us to identify a unique equilibrium. We discuss two approaches that exploit this logic.[25]

One approach has been developed recently in Burdzy, Frankel, and Pauzner (2001), Frankel and Pauzner (1999), and Frankel (2000a).[26] A continuum of players are periodically randomly matched in a two-player, two-action game. For simplicity, we can think of them playing the investment game described in matrix (2.1). But assume that the publicly observed common payoff parameter $\theta$ evolves through time according to some random process [a random walk in Burdzy, Frankel, and Pauzner (2001), a continuous Brownian motion in Frankel and Pauzner (1999)]. Furthermore, suppose that each player can only occasionally alter his behavior: revision opportunities arrive according to a Poisson process and arrive slowly relative to changes in the game's payoffs. Under certain conditions on the noise process (roughly equivalent to the sufficiently uniform prior conditions in global games), there is a unique equilibrium where each player invests when $\theta$ exceeds $\tfrac{1}{2}$ and not when $\theta$ is less than $\tfrac{1}{2}$.

This description considerably oversimplifies the analysis. For example, it is natural to assume that players observe the public evolution of $\theta$, so they will be able to infer at any point in time (even if they cannot observe) the proportion of players taking each action. This creates an extra state variable (relative to the global games analysis), and the resulting asymmetry between the past and future complicates the analysis. Nonetheless, the logic is similar to the stylized example previously described. In particular, note how the friction in revision opportunities exactly ensures that a player making a choice given some publicly observed $\theta$ will take into account the choices that others will make at different times with different publicly observed $\theta$.[27]

Levin (2000a) describes another approach that is closer to the stylized example previously described. At discrete time $t$, player $t$ chooses an action. His payoff may depend on the actions of players choosing before him or the player choosing after him, but also depends on a payoff parameter $\theta$. The payoff parameter is publicly observed and evolves according to a random walk. If players act as if they cannot influence or do not care about the action of the decision maker in the next period, then under weak monotonicity conditions (a player's best response is increasing in others' actions and the payoff parameter) and limit dominance conditions [the highest (lowest) action is a dominant strategy for sufficiently high (low) values of $\theta$], there is a unique equilibrium. The no influence assumption makes sense if there are in fact a continuum of players at each date or if actions are observed only with a sufficiently long lag. In Matsui's (1999) currency crisis model, there are overlapping generations of players, but there is a natural reason why players do not care about the actions of players preceding them.[28]

24. Morris (2000) also exploits techniques from the higher-order beliefs literature to prove new results about local interaction.
25. Morris (1995) describes a third approach. Suppose that players are deciding whether to invest or not invest at different points in time, but they make their decisions in private and their watches are not synchronized. Thus, each player will believe that the time on any other player's watch is close to his own, but not identical. Risk-dominant play may result even when perfect synchronization would have allowed multiple equilibria.
26. See also Frankel and Pauzner (2000) and Levin (2000a) for applications following this approach.
27. Matsui and Matsuyama (1995) earlier analyzed a model with Poisson revision opportunities. However, they assumed that the same game was being played through time (i.e., $\theta$ was constant), but examined the stability of different population states. The state where the whole population plays the risk-dominant action can be reached in equilibrium from the state where the whole population plays the risk-dominated action, but not vice versa. Hofbauer and Sorger (1999) show that the potential maximizing action of (many-action) symmetric potential games tends to be played in the Matsui-Matsuyama environment. Oyama (2000) shows that the $\tfrac{1}{2}$-dominant equilibrium is selected in this context. In a private communication, Hofbauer has reported that it also selects the “local potential maximizing action” (see Section 4.4) in two-player, three-action games with strategic complementarities and symmetric payoffs.
28. See also Frankel (2000b) on the relationship between some of these models.
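The stylized example at the start of this subsection (and its local interaction counterpart) can be explored numerically. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it restricts attention to cutoff strategies (invest if and only if the date or location exceeds a cutoff k), uses the normal weighting with standard deviation √2σ described in the text, and iterates the best response map; the function names and the choice σ = 0.1 are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def investing_weight(x, k, sigma):
    """Weighted share l of others investing, seen from x, when others
    invest iff their date (or location) exceeds k; weights are N(0, 2 sigma^2)."""
    return normal_cdf((x - k) / (math.sqrt(2.0) * sigma))

def best_response_cutoff(k, sigma, lo=-1.0, hi=2.0, tol=1e-12):
    """Point b where the payoff gain x + l - 1 crosses zero.
    The gain is strictly increasing in x, so bisection applies."""
    def gain(x):
        return x + investing_weight(x, k, sigma) - 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gain(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def iterate_cutoff(k0, sigma, rounds=500):
    """Repeatedly apply the best response map starting from cutoff k0."""
    k = k0
    for _ in range(rounds):
        k = best_response_cutoff(k, sigma)
    return k

def b_star_no_noise(k):
    """Best response cutoff when sigma = 0: every k in [0, 1] is a fixed point."""
    return min(max(k, 0.0), 1.0)
```

For example, `iterate_cutoff(0.0, 0.1)` and `iterate_cutoff(1.0, 0.1)` both approach the cutoff 1/2, whereas without heterogeneity `b_star_no_noise` leaves any cutoff between 0 and 1 unchanged.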


5.2.2. Recurring Incomplete Information

Let $\theta_t$ follow a random walk, with $\theta_t = \theta_{t-1} + \eta_t$, where each $\eta_t$ is independently normally distributed with mean 0 and standard deviation $\tau$. In period $t$, $\theta_{t-1}$ is publicly observed, but $\theta_t$ is observed only with noise. In particular, each player i observes $x_{it} = \theta_t + \varepsilon_{it}$, where each $\varepsilon_{it}$ is independently normally distributed with mean 0 and standard deviation $\sigma$. In each period, a continuum of players decide whether to invest with linear payoffs depending on $\theta_t$ (the payoff to not investing is 0, and the payoff to investing is $\theta_t + l - 1$, where $l$ is the proportion of the population investing).

This dynamic game represents a crude way of embedding the static global games analysis in a dynamic setting. In particular, each period's play of this dynamic game can be analyzed independently and is exactly equivalent to the public signals model of Section 3. In particular, $\theta_{t-1}$ is the public signal about $\theta_t$, whereas $x_{it}$ is player i's private signal. A unique equilibrium will exist in this dynamic game exactly if $\gamma(\sigma, \tau) \le 2\pi$ (i.e., $\sigma$ is small relative to $\tau$). In Morris and Shin (2000), we sketch a continuous time version of this recurring incomplete information model and derive the continuous time sufficient conditions for uniqueness. In Morris and Shin (1999a), we discuss such a recurring incomplete information model of currency crises. One distinctive implication of that analysis is that by the publicity effect, the previous period's fundamentals may be expected to have a disproportionate influence on current outcomes. Thus, for any given actual level of fundamentals, an attack on the exchange rate is more likely when the fundamentals have just risen.

Chamley (1999) considers a richer global game model with recurring incomplete information. A large population of players play a coordination game in each period, but each player has a private cost of taking a risky action that evolves through time. There is correlation in private costs and dominance regions, so that each period's coordination game has the structure of a global game. But past actions convey information about other players' private costs and thus (because of persistence) their current costs. Chamley identifies sufficient conditions for uniqueness in all periods and discusses a variety of applications.

5.2.3. Herding

In the herding models of Banerjee (1992) and Bikhchandani, Hirshleifer, and Welch (1992), players sequentially make some discrete choice. Players do not care about each other's actions directly, but players have private information, and so each player may partially learn the information of players who choose before him. But if a number of early-moving players happen to observe signals favoring one action, late-moving players may start ignoring their own private information, leading to inefficient herding because of the negative informational externality.

Herding models share with global game models the feature that outcomes are highly sensitive to fine details of the information structure. However, it is important to note that the mechanisms are quite different. The global games analysis is driven by strategic complementarities and the highly correlated signals generated by the noisy observations technology. However, sensitivity to the information structure arises in a purely static setting. The herding stories have no payoff complementarities and simple information structures, but rely on sequential choice.

Dasgupta (2000a) analyzes a simple model where it is possible to see both kinds of effects at work. A finite set of players decide sequentially (in an exogenous order) whether to invest or not. Investment conditions are either bad (when each player has a dominant strategy to not invest) or good (in which case it pays to invest if all other players invest). Each player observes a signal from a continuum, with high signals implying a higher probability that investment conditions are good. All equilibria in this model are switching equilibria: each player invests only if all previous players invested and his private signal exceeds some cutoff. Such equilibria encompass herding effects: previous players' decisions to invest convey positive information to later players and make it more likely that they will invest. They also encompass higher-order belief effects: an increase in a player's signal makes it more likely that he will invest both because he thinks it more likely that investment conditions are good and because he thinks it more likely that later players will observe high signals and choose to invest.[29]

29. For other models combining elements of payoff complementarities and herding, see Chari and Kehoe (2000), Corsetti, Dasgupta, Morris, and Shin (2000), Jeitschko and Taylor (2001), and Marx (2000).

6. CONCLUSIONS

Global games rest on the premise that the information received by economic agents is informative, but not so informative as to achieve common knowledge of the underlying fundamentals. Indeed, as the information concerning the fundamentals becomes more and more accurate, the actions elicited in equilibrium resemble behavior when the uncertainty concerning the actions of other agents becomes more and more diffuse. This points to the potential pitfalls if we rely too much on our intuitions that are based on complete information games that allow perfectly coordinated switching of beliefs and actions. Decentralized decision making in market environments cannot be relied on to rule out inefficient outcomes, so that there may be room for policies that mitigate the inefficiencies. The analysis of economic problems using the methods from global games is in its infancy, but the method seems promising.

Global games also present a “user-friendly” face of games with incomplete information in the tradition of Harsanyi. The potentially daunting task of forming an infinite hierarchy of beliefs over the actions of all players in the game can be given a representation in terms of beliefs (and the behavior that they elicit) that are simple to the point of being naive. Global games go some way to bridging the gap between those who believe that rigorous game theory has a role in economics (as we do) and those who insist on tractable and usable tools for applied economic analysis.

ACKNOWLEDGMENTS

This study was supported by the National Science Foundation Grant 9709601 (to S.M.). Section 3 incorporates work circulated earlier under the title “Private Versus Public Information in Coordination Problems.” We thank Hans Carlsson, David Frankel, Josef Hofbauer, Jonathan Levin, and Ady Pauzner for valuable comments on this paper, and Susan Athey for her insightful remarks. Morris would like to record an important intellectual debt in this area to Atsushi Kajii, through joint research and long discussions.

APPENDIX A: PROOF OF PROPOSITION 2.2

We will prove the first half of the result [$s(x) = 0$ for all $x \le \theta^* - \delta$]. The second half [$s(x) = 1$ for all $x \ge \theta^* + \delta$] follows by a symmetric argument.

For any given strategy profile $s = \{s_i\}_{i \in [0,1]}$, we write $\zeta(x)$ for the proportion of players observing signal $x$ who choose action 1; $\zeta(\cdot)$ will always be a continuous function of $x$. Write $\pi_\sigma(x, k)$ for the highest possible expected payoff gain to choosing action 1 for a player who has observed a signal $x$ and knows that all other players will choose action 0 if they observe signals less than $k$:

$$\pi_\sigma(x, k) \equiv \max_{\{\zeta \,:\, \zeta(x) = 0 \text{ for all } x < k\}} \; \ldots$$

… $> 0$ for the required values of $x$ and $\xi$. Because we are interested in values of $x$ in the closed interval $[x, \theta^*]$ and because varying $\xi$ generates a compact set of distributions over $l$, convergence is uniform. ∎


APPENDIX B: THE FINITE PLAYER CASE As we noted in the linear example of Section 2.1, analysis of the continuum and ﬁnite players can follow similar methods. Here, we brieﬂy note how to extend the uniform prior private values analysis of proposition 2.1 to the ﬁnite player case. The extension of the general prior common values analysis of proposition 2.2 is then straightforward. The setting is as in Section 2.2.1, except that there are now I ≥ 2 players, and the noise terms in the private signals are identically and independently distributed according to the density f (·). As before, π (l, x) is the payoff gain to choosing action 1 rather than action 0, if you have observed signal x and proportion l of your opponents choose action 1. Of course, now (because you have I − 1 opponents) l will always be an element of the set {0, 1/(I − 1), 2/(I − 1), . . . , 1}. Property A3 becomes: A3(I): I -Player Single Crossing: There exists a unique θ I∗ solving I −1 ∗ k=0 (1/I )π (k/(I − 1), θ I ) = 0. Observe that, as I → ∞, θ I∗ → θ ∗ (i.e., the θ ∗ of assumption A3). In the special case where I = 2, this reduces to 12 π(0, θ2∗ ) + 12 π (1, θ2∗ ) = 0; in other words, θ2∗ is the point where the risk-dominant action (Harsanyi and Selten 1988) switches from 0 to 1. Proposition 2.1 remains true as stated for the ﬁnite player game, with θ I∗ replacing θ ∗ . This was essentially shown by Carlsson and van Damme (1993b). The key step in the proof is showing that, in a symmetric strategy proﬁle, each player has uniform beliefs over the proportion of players observing a higher signal. To see why this is true, note that the probability that a player observing signal x assigns to exactly proportion n(I − 1) of his opponents signal greater than k is

x −θ I −1 k − θ I −1−n 1 f F σ I −1−n σ θ=−∞ σ k−θ n × 1− F dθ, σ ∞

where F(·) is the c.d.f. of f (·). Letting x = k − σ z and carrying out the change of variables ξ = (k − θ )/σ , this expression becomes

∞

ξ =−∞

I −1 f (ξ − z) [F(ξ )] I −1−n [1 − F(ξ )]n dξ. I −1−n

This expression is now independent of $\sigma$ and $k$, so we may denote this expression by $\psi_I(n/(I-1); z)$. For the same argument to work as in the continuum case, it is enough to show that $\psi_I(\cdot\,; 0)$ is the uniform distribution. But, integration by parts gives

$$\begin{aligned}
\psi_I\!\left(\frac{n}{I-1}; 0\right) &= \int_{\xi=-\infty}^{\infty} \binom{I-1}{I-1-n} f(\xi)\, [F(\xi)]^{I-1-n} [1 - F(\xi)]^{n}\, d\xi \\
&= \int_{\xi=-\infty}^{\infty} \binom{I-1}{I-n} f(\xi)\, [F(\xi)]^{I-n} [1 - F(\xi)]^{n-1}\, d\xi \\
&= \cdots \\
&= \int_{\xi=-\infty}^{\infty} f(\xi)\, [F(\xi)]^{I-1}\, d\xi \\
&= \frac{1}{I}.
\end{aligned}$$
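The uniform-beliefs identity can be checked numerically. The sketch below is not part of the original proof: it assumes the substitution $u = F(\xi)$ (so that $du = f(\xi)\,d\xi$), which removes any dependence on the particular noise density, and approximates the resulting integral with a midpoint rule. The function name `psi_at_zero` and the grid size are illustrative choices.

```python
from math import comb


def psi_at_zero(num_players, n, steps=20000):
    """Approximate psi_I(n/(I-1); 0) with a midpoint rule.

    After substituting u = F(xi), the integral becomes
    C(I-1, n) * integral over [0, 1] of u^(I-1-n) * (1-u)^n du,
    which no longer depends on the noise density f.
    """
    coeff = comb(num_players - 1, n)  # equals C(I-1, I-1-n) by symmetry
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint of the i-th subinterval of [0, 1]
        total += coeff * u ** (num_players - 1 - n) * (1.0 - u) ** n * h
    return total


if __name__ == "__main__":
    I = 5
    for n in range(I):
        # Each value should be close to 1/I, whatever n is.
        print(n, round(psi_at_zero(I, n), 6))
```

Because the integrand is a polynomial in $u$, the same value could also be obtained in closed form from the Beta function; the numerical version is used here only to keep the check independent of that identity.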

APPENDIX C: PROOF OF LEMMA 2.3

Recall the following expression for the expected payoff gain to choosing action 1 for a player who has observed a signal $x$ and knows that all other players will choose action 0 if they observe signals less than $k$:

$$\pi_\sigma^*(x, k) \equiv \int_{\theta=-\infty}^{\infty} \frac{1}{\sigma} f\!\left(\frac{x-\theta}{\sigma}\right) \pi\!\left(1 - F\!\left(\frac{k-\theta}{\sigma}\right), x\right) d\theta.$$

With a change of variables [setting $z = (\theta - k)/\sigma$], this expression becomes

$$\pi_\sigma^*(x, k) = \int_{z=-\infty}^{\infty} f\!\left(\frac{x-k}{\sigma} - z\right) \pi(1 - F(-z), x)\, dz.$$

We can rewrite this expression as $\pi_\sigma^*(x, k) = h(x, k, x)$, where

$$h(x, k, x') \equiv \int_{z=-\infty}^{\infty} \tilde f(x, z)\, g(z, x')\, dz, \qquad \tilde f(x, z) \equiv f\!\left(\frac{x-k}{\sigma} - z\right),$$

and $g(z, x') \equiv \pi(1 - F(-z), x')$. Now observe that, by A7, $\tilde f(x, z)$ satisfies a monotone likelihood ratio property [i.e., if $\bar x > x$, then $\tilde f(\bar x, z)/\tilde f(x, z)$ is increasing in $z$]; also observe that, by A1$^*$, $g(\cdot, x')$ satisfies a single crossing property: there exists $z^* \in \mathbb{R} \cup \{-\infty, \infty\}$ such that $g(z, x') < 0$ if $z < z^*$ and $g(z, x') > 0$ if $z > z^*$. Now lemma 5 in Athey (2000b) implies that $h(\cdot, k, x')$ satisfies a single crossing property: there exists $x^*(k, x')$ such that $h(x, k, x') < 0$ for all $x < x^*(k, x')$, and $h(x, k, x') > 0$ for all $x > x^*(k, x')$. But by A2, we know that $h(x, k, x')$ is strictly increasing in $x'$.


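Now suppose $h(x, k, x) = 0$. If $x' < x$, then

$$\begin{aligned}
h(x', k, x') &< h(x', k, x) && \text{by A2} \\
&< h(x, k, x) && \text{by the single crossing property of } h.
\end{aligned}$$

By a symmetric argument, we have $x' > x \Rightarrow h(x', k, x') > h(x, k, x)$. Thus, there exists $\beta : \mathbb{R} \to \mathbb{R}$ such that

$$\begin{aligned}
\pi_\sigma^*(x, k) &< 0 && \text{if } x < \beta(k), \\
\pi_\sigma^*(x, k) &= 0 && \text{if } x = \beta(k), \\
\pi_\sigma^*(x, k) &> 0 && \text{if } x > \beta(k).
\end{aligned}$$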

Thus, if a player thinks that others are following a strategy with cutoff $k$, the player's best response is to follow a switching strategy with cutoff $\beta(k)$. But, by A3, we know that there exists exactly one value of $k$ such that

$$\pi_\sigma^*(k, k) = \int_{l=0}^{1} \pi(l, k)\, dl = 0.$$
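The construction of $\beta(k)$ and of the symmetric cutoff can be illustrated numerically. The sketch below is only an illustration under stated assumptions, not part of the proof: the payoff gain $\pi(l, x) = x + l - 1$ and the standard normal noise density are hypothetical choices made for this example, and the function names, integration grid, and bisection bracket are likewise illustrative. For this payoff, $\int_0^1 \pi(l, k)\,dl = k - 1/2$, so the cutoff found should be close to $1/2$.

```python
import math


def pdf(e):
    """Standard normal density (assumed noise density f)."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def cdf(e):
    """Standard normal c.d.f. (the F associated with f)."""
    return 0.5 * (1.0 + math.erf(e / math.sqrt(2.0)))


def payoff(l, x):
    """Hypothetical payoff gain pi(l, x); increasing in both arguments."""
    return x + l - 1.0


def gain(x, k, sigma, steps=2000, width=10.0):
    """Midpoint-rule approximation of pi*_sigma(x, k) after the change of
    variables z = (theta - k) / sigma."""
    a = (x - k) / sigma
    lo = a - width  # the density f(a - z) is centred at z = a
    h = 2.0 * width / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += pdf(a - z) * payoff(1.0 - cdf(-z), x) * h
    return total


def bisect(fn, lo, hi, tol=1e-6):
    """Root of fn on [lo, hi], assuming fn changes sign exactly once."""
    lo_negative = fn(lo) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (fn(mid) < 0) == lo_negative:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def best_response_cutoff(k, sigma):
    """beta(k): the signal at which the gain changes sign, given cutoff k."""
    return bisect(lambda x: gain(x, k, sigma), -2.0, 3.0)


def equilibrium_cutoff(sigma):
    """The k solving pi*_sigma(k, k) = 0."""
    return bisect(lambda k: gain(k, k, sigma), -2.0, 3.0)


if __name__ == "__main__":
    sigma = 0.1
    print("beta(0.8):", best_response_cutoff(0.8, sigma))
    print("symmetric cutoff:", equilibrium_cutoff(sigma))  # close to 0.5 here
```

The bracket `[-2, 3]` works for this payoff because the gain is negative at the lower end and positive at the upper end; a different payoff function would need its own bracket.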

Thus, there is a unique symmetric switching strategy equilibrium.

References

Allen, F. and S. Morris (2001), “Finance Applications of Game Theory,” in Advances in Business Applications of Game Theory, (ed. by K. Chatterjee and W. Samuelson), Boston, MA: Kluwer Academic Press. Allen, F., S. Morris, and A. Postlewaite (1993), “Finite Bubbles with Short Sales Constraints and Asymmetric Information,” Journal of Economic Theory, 61, 209–229. Athey, S. (2001), “Single Crossing Properties and the Existence of Pure Strategy Equilibria in Games of Incomplete Information,” Econometrica, 69, 861–889. Athey, S. (2002), “Monotone Comparative Statics under Uncertainty,” Quarterly Journal of Economics, 117(1), 187–223. Aumann, R. (1976), “Agreeing to Disagree,” Annals of Statistics, 4, 1236–1239. Baliga, S. and S. Morris (2000), “Coordination, Spillovers and Cheap Talk,” Journal of Economic Theory. Baliga, S. and T. Sjöström (2001), “Arms Races and Negotiations,” Northwestern University. Banerjee, A. (1992), “A Simple Model of Herd Behavior,” Quarterly Journal of Economics, 107, 797–818. Battigalli, P. (1999), “Rationalizability in Incomplete Information Games,” available at http://www.iue.it/Personal/Battigalli. Bikhchandani, S., D. Hirshleifer, and I. Welch (1992), “A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades,” Journal of Political Economy, 100, 992–1026. Blume, L. (1995), “The Statistical Mechanics of Best-Response Strategy Revision,” Games and Economic Behavior, 11, 111–145. Boonprakaikawe, J. and S. Ghosal (2000), “Bank Runs and Noisy Signals,” University of Warwick.


Brunner, A. and J. Krahnen (2000), “Corporate Debt Restructuring: Evidence on Coordination Risk in Financial Distress,” Center for Financial Studies, Frankfurt. Bryant, J. (1983), “A Simple Rational Expectations Keynes Type Model,” Quarterly Journal of Economics, 98, 525–529. Bulow, J., J. Geanakoplos, and P. Klemperer (1985), “Multimarket Oligopoly: Strategic Substitutes and Complements,” Journal of Political Economy, 93, 488–511. Burdzy, K., D. Frankel, and A. Pauzner (2001), “Fast Equilibrium Selection by Rational Players Living in a Changing World,” Econometrica, 69, 163–189. Carlsson, H. (1989), “Global Games and the Risk Dominance Criterion,” University of Lund. Carlsson, H. (1991), “A Bargaining Model where Parties Make Errors,” Econometrica, 59, 1487–1496. Carlsson, H. and E. van Damme (1993a), “Global Games and Equilibrium Selection,” Econometrica, 61, 989–1018. Carlsson, H. and E. van Damme (1993b), “Equilibrium Selection in Stag Hunt Games,” in Frontiers of Game Theory, (ed. by K. Binmore, A. Kirman, and A. Tani), Cambridge, MA: MIT Press. Carlsson, H. and M. Ganslandt (1998), “Noisy Equilibrium Selection in Coordination Games,” Economics Letters, 60, 23–34. Chamley, C. (1999), “Coordinating Regime Switches,” Quarterly Journal of Economics, 114, 817–868. Chan, K. and Y. Chiu (2000), “The Role of (Non)Transparency in a Currency Crisis Model,” McMaster University. Chari, V. and P. Kehoe (2000), “Financial Crises as Herd Behavior,” Working Paper 600, Federal Reserve Bank of Minneapolis. Chui, M., P. Gai, and A. Haldane (2000), “Sovereign Liquidity Crises: Analytics and Implications for Public Policy,” International Finance Division, Bank of England. Chwe, M. (1998), “Believe the Hype: Solving Coordination Problems with Television Advertising,” available at http://chwe.net/michael. Corsetti, G., A. Dasgupta, S. Morris, and H. S. Shin (2000), “Does One Soros Make a Difference? The Role of a Large Trader in Currency Crises,” Review of Economic Studies. 
Dasgupta, A. (2000a), “Social Learning and Payoff Complementarities,” available at http://aida.econ.yale.edu/˜amil. Dasgupta, A. (2000b), “Financial Contagion Through Capital Connections: A Model of the Origin and Spread of Bank Panics,” available at http://aida.econ.yale.edu/˜amil. DeGroot, M. (1970), Optimal Statistical Decisions. New York: McGraw-Hill. Dekel, E. and F. Gul (1996), “Rationality and Knowledge in Game Theory,” in Advances in Economic Theory–Seventh World Congress of the Econometric Society, (ed. by D. Kreps and K. Wallace), Cambridge: Cambridge University Press. Diamond, D. and P. Dybvig (1983), “Bank Runs, Deposit Insurance, and Liquidity,” Journal of Political Economy, 91, 401–419. D¨onges, J. and F. Heinemann (2000), “Competition for Order Flow as a Coordination Game,” Center for Financial Studies, Frankfurt, Germany. Ellison, G. (1993), “Learning, Local Interaction, and Coordination,” Econometrica, 61, 1047–1071. Frankel, D. (2000a), “Determinacy in Models of Divergent Development and Business Cycles,” available at www.tau.ac.il/˜dfrankel. Frankel, D. (2000b), “Noise versus Shocks,” seminar notes, University of Tel Aviv.


Frankel, D., S. Morris, and A. Pauzner (2000), “Equilibrium Selection in Global Games with Strategic Complementarities,” Journal of Economic Theory. Frankel, D., and A. Pauzner (1999), “Expectations and the Timing of Neighborhood Change,” available at www.tau.ac.il/˜dfrankel. Frankel, D. and A. Pauzner (2000), “Resolving Indeterminacy in Dynamic Settings: The Role of Shocks,” Quarterly Journal of Economics, 115, 285–304. Fudenberg, D. and J. Tirole (1991), Game Theory. Cambridge, MA: MIT Press. Fukao, K. (1994), “Coordination Failures under Incomplete Information and Global Games,” Discussion Paper Series A 299, The Institute of Economic Research, Hitotsubashi University, Kunitachi, Tokyo, Japan. Geanakoplos, J. (1994), “Common Knowledge,” in Handbook of Game Theory, Chapter 40 of Volume 2, (ed. by R. Aumann and S. Hart), New York: Elsevier Science. Glaeser, E. and J. Scheinkman (2000), “Non-Market Interactions,” prepared for the Eighth World Congress of the Econometric Society. Goldstein, I. (2000), “Interdependent Banking and Currency Crises in a Model of SelfFulﬁlling Beliefs,” University of Tel Aviv. Goldstein, I. and A. Pauzner (2000a), “Demand Deposit Contracts and the Probability of Bank Runs,” available at http://www.tau.ac.il/˜pauzner. Goldstein, I. and A. Pauzner (2000b), “Contagion of Self-Fulﬁlling Currency Crises,” University of Tel Aviv. Harsanyi, J. (1967–1968), “Games with Incomplete Information Played by ‘Bayesian’ Players, Parts I–III,” Management Science, 14, 159–182, 320–334, and 486–502. Harsanyi, J. and R. Selten (1988). A General Theory of Equilibrium Selection in Games, Cambridge, MA: MIT Press. Hartigan, J. (1983), Bayes Theory. New York: Springer-Verlag. Heinemann, F. (2000), “Unique Equilibrium in a Model of Self-Fulﬁlling Currency Attacks: Comment,” American Economic Review, 90, 316–318. Heinemann, F. and G. Illing (2000), “Speculative Attacks: Unique Sunspot Equilibrium and Transparency,” Center for Financial Studies, Frankfurt. 
Hellwig, C. (2000), “Public Information, Private Information, and the Multiplicity of Equilibria in Coordination Games,” London School of Economics. Herrendorf, B., Á. Valentinyi, and R. Waldmann (2000), “Ruling out Multiplicity and Indeterminacy: The Role of Heterogeneity,” Review of Economic Studies, 67, 295–308. Hofbauer, J. (1998), “Equilibrium Selection in Travelling Waves,” in Game Theory, Experience, Rationality: Foundations of Social Sciences, Economics and Ethics, (ed. by W. Leinfellner and E. Köhler), Boston, MA: Kluwer. Hofbauer, J. (1999), “The Spatially Dominant Equilibrium of a Game,” Annals of Operations Research, 89, 233–251. Hofbauer, J. and G. Sorger (1999), “Perfect Foresight and Equilibrium Selection in Symmetric Potential Games,” Journal of Economic Theory, 85, 1–23. Hubert, F. and D. Schäfer (2000), “Coordination Failure with Multiple Source Lending,” available at http://www.wiwiss.fu-berlin.de/~hubert. Jeitschko, T. and C. Taylor (2001), “Local Discouragement and Global Collapse: A Theory of Coordination Avalanches,” American Economic Review, 91, 208–224. Kadane, J. and P. Larkey (1982), “Subjective Probability and the Theory of Games,” Management Science, 28, 113–120. Kajii, A. and S. Morris (1997a), “The Robustness of Equilibria to Incomplete Information,” Econometrica, 65, 1283–1309.


Kajii, A. and S. Morris (1997b), “Common p-Belief: The General Case,” Games and Economic Behavior, 18, 73–82. Kajii, A. and S. Morris (1997c), “Reﬁnements and Higher Order Beliefs in Game Theory,” available at http://www.econ.yale.edu/˜smorris. Kajii, A. and S. Morris (1998), “Payoff Continuity in Incomplete Information Games,” Journal of Economic Theory, 82, 267–276. Karp, L. (2000), “Fundamentals Versus Beliefs under Almost Common Knowledge,” University of California at Berkeley. Kim, Y. (1996), “Equilibrium Selection in n-Person Coordination Games,” Games and Economic Behavior, 15, 203–227. Kohlberg, E. and J.-F. Mertens (1986), “On the Strategic Stability of Equilibria,” Econometrica, 54, 1003–1038. Krugman, P. (1991), “History Versus Expectations,” Quarterly Journal of Economics, 106, 651–667. Laplace, P. (1824), Essai Philosophique sur les Probabilit´es. New York: Dover (English translation). Levin, J. (2000a), “Collective Reputation,” Stanford University. Levin, J. (2000b), “A Note on Global Equilibrium Selection in Overlapping Generations Games,” Stanford University. Marx, R. (2000), “Triggers and Equilibria in Self-Fulﬁlling Currency Collapses,” University of California at Berkeley. Matsui, A. (1999), “Multiple Investors and Currency Crises,” University of Tokyo. Matsui, A. and K. Matsuyama (1995), “An Approach to Equilibrium Selection,” Journal of Economic Theory, 65, 415–434. Mertens, J.-F. and Zamir, S. (1985), “Formulation of Bayesian Analysis for Games with Incomplete Information,” International Journal of Game Theory, 10, 619–632. Merton, R. (1974), “On the Pricing of Corporate Debt: The Risk Structure of Interest Rates,” Journal of Finance, 29, 449–470. Metz, C. (2000), “Private and Public Information in Self-Fulﬁlling Currency Crises,” University of Kassel. Milgrom, P. and J. Roberts (1990), “Rationalizability, Learning, and Equilibrium in Games with Strategic Complementarities,” Econometrica, 58, 1255–1277. Monderer, D. and D. 
Samet (1989), “Approximating Common Knowledge with Common Beliefs,” Games and Economic Behavior, 1, 170–190. Monderer, D. and D. Samet (1996), “Proximity of Incomplete Information in Games with Common Beliefs,” Mathematics of Operations Research, 21, 707–725. Monderer, D. and L. Shapley (1996), “Potential Games,” Games and Economic Behavior, 14, 124–143. Morris, S. (1995), “Cooperation and Timing,” available at http://www.econ.yale. edu/˜smorris. Morris, S. (1997), “Interaction Games,” available at http://www.econ.yale.edu/˜smorris. Morris, S. (1999), “Potential Methods in Interaction Games,” available at http://www. econ.yale.edu/˜smorris. Morris, S. (2000), “Contagion,” Review of Economic Studies, 67, 57–78. Morris, S., R. Rob, and H. S. Shin (1995), “ p-Dominance and Belief Potential,” Econometrica, 63, 145–157. Morris, S. and H. S. Shin (1997), “Approximate Common Knowledge and Coordination: Recent Lessons from Game Theory,” Journal of Logic, Language, and Information, 6, 171–190.


Morris, S. and H. S. Shin (1998), “Unique Equilibrium in a Model of Self-Fulﬁlling Currency Attacks,” American Economic Review, 88, 587–597. Morris, S. and H. S. Shin (1999a), “A Theory of the Onset of Currency Attacks,” in Asian Financial Crisis: Causes, Contagion and Consequences, (ed. by P.-R. Agenor, D. Vines, and A. Weber), Cambridge: Cambridge University Press. Morris, S. and H. S. Shin (1999b), “Coordination Risk and the Price of Debt,” available at http://www.econ.yale.edu/˜smorris. Morris, S. and H. S. Shin (2000), “Rethinking Multiple Equilibria in Macroeconomic Modelling,” NBER Macroeconomics Annual 2000, (ed. by B. Bernanke and K. Rogoff ) Cambridge, MA: MIT Press. Obstfeld, M. (1996), “Models of Currency Crises with Self-Fulﬁlling Features,” European Economic Review, 40, 1037–1047. Osborne, M. and A. Rubinstein (1994), A Course in Game Theory. Cambridge, MA: MIT Press. Oyama, D. (2000), “ p-Dominance and Equilibrium Selection under Perfect Foresight Dynamics,” University of Tokyo, Tokyo, Japan. Rochet, J.-C. and X. Vives (2000), “Coordination Failures and the Lender of Last Resort: Was Bagehot Right after All?” Universitat Autonoma de Barcelona. Rubinstein, A. (1989), “The Electronic Mail Game: Strategic Behavior under Almost Common Knowledge,” American Economic Review, 79, 385–391. Scaramozzino, S. and N. Vulkan (1999), “Noisy Implementation Cycles and the Informational Role of Policy,” University of Bristol. Schelling, T. (1960), Strategy of Conﬂict. Cambridge, MA: Harvard University Press. Selten, R. (1975), “Reexamination of the Perfectness Concept for Equilibrium Points in Extensive Games,” International Journal of Game Theory, 4, 25–55. Shin, H. S. (1996), “Comparing the Robustness of Trading Systems to Higher Order Uncertainty,” Review of Economic Studies, 63, 39–60. Shleifer, A. (1986), “Implementation Cycles,” Journal of Political Economy, 94, 1163– 1190. Sorin, S. 
(1998), “On the Impact of an Event,” International Journal of Game Theory, 27, 315–330. Townsend, R. (1983), “Forecasting the Forecasts of Others,” Journal of Political Economy, 91, 546–588. Ui, T. (2000), “Generalized Potentials and Robust Sets of Equilibria,” University of Tsukuba. Ui, T. (2001), “Robust Equilibria of Potential Games,” Econometrica, 69, 1373–1380. van Damme, E. (1997), “Equilibrium Selection in Team Games,” in Understanding Strategic Interaction: Essays in Honor of Reinhard Selten, (ed. by W. Albers et al.), New York: Springer-Verlag. Vives, X. (1990), “Nash Equilibrium with Strategic Complementarities,” Journal of Mathematical Economics, 19, 305–321. Young, P. (1998), “Individual Strategy and Social Structure,” Princeton, NJ: Princeton University Press.

CHAPTER 4

Testing Contract Theory: A Survey of Some Recent Work

Pierre-Andre Chiappori and Bernard Salanié

1. INTRODUCTION

It is a capital mistake to theorise before one has data.
Arthur Conan Doyle, A Scandal in Bohemia.

Since the early seventies, the development of the theoretical literature on contracts has been nothing short of explosive. The study of more and more sophisticated abstract models has gone hand in hand with the use of the tools of the theory to better understand many ﬁelds of economics, such as industrial organization, labor economics, taxation, insurance markets, or the economics of banking. However, it is only fair to say that the empirical validation of the theory has long lagged behind the theoretical work. Many papers consist of theoretical analyses only, with little attention to the facts. Others state so-called stylized facts often based on fragile anecdotal evidence and go on to study a model from which these stylized facts can be derived. Until the beginning of the eighties, empirical tests using actual data and econometric methods were very rare, even though the theoretical literature had by then given birth to a large number of interesting testable predictions. Although such a long lag is not untypical in economics, it is clearly unfortunate, especially when one compares our practice to that of other scientists. Even without fully sharing the somewhat extreme methodological views expressed above by Sherlock Holmes, one can hardly dispute that interactions between theory and reality are at the core of any scientiﬁc approach. To give only one example, the models of insurance markets under asymmetric information developed at the beginning of the seventies were extensively tested (and found to lack empirical support) only in the middle of the nineties. If this had been done earlier, the 20-year period could have been used to devise better models. Fortunately, a number of empirical researchers have turned their attention to the theory of contracts in recent years, so that such long lags should become less common. This survey will present a panorama of this burgeoning literature. 
Because new papers are appearing every week in this field, we cannot claim to be exhaustive. We just hope that we can convey to the reader both a sense of excitement at these recent developments and an understanding of the specific econometric problems involved in taking contract theory to the data.

A unifying theme of our survey is the necessity of controlling adequately for unobserved heterogeneity in this literature. If this is not done properly, then the combination of unobserved heterogeneity and endogenous matching of agents to contracts is bound to create selection biases on the parameters of interest. This is given a striking illustration in a recent contribution by Ackerberg and Botticini (2002). They consider the choice between sharecropping and fixed-rent contracts in a tenant–landlord relationship. Standard moral hazard models stress the trade-off between incentives and risk-sharing in the determination of contractual forms. Fixed-rent contracts are very efficient from the incentives viewpoint, since the tenant is both the main decision maker and the residual claimant. However, they also generate a very inefficient allocation of risk, in which all the risk is borne by one agent, the tenant, who is presumably more risk averse. When uncertainty is small, risk-sharing matters less, and fixed-rent contracts are more likely to be adopted. On the contrary, in a very uncertain environment, risk-sharing is paramount, and sharecropping is the natural contractual form.

This prediction can readily be tested from data on existing contracts, provided that a proxy for the level of risk is available. For instance, if some crops are known to be more risky than others, the theory predicts that these crops are more likely to be associated with sharecropping contracts. A number of papers have tested this prediction by regressing contract choice on crop riskiness. The underlying argument, however, has an obvious weakness: it takes contracts as exogenously given, and disregards any possible endogeneity in the matching of agents to contracts.
In other words, the theoretical prediction described holds only for given characteristics of the landlord and the agents. It can be taken to the data only to the extent that this “everything equal” assumption is satisfied, so that agents facing different contracts do not differ by some otherwise relevant characteristic. Assume, on the contrary, that agents exhibit ex ante heterogeneous degrees of risk aversion. To keep things simple, assume that a fraction of the agents are risk neutral, whereas the rest are risk averse. Different agents will be drawn to different crops; efficiency suggests that risk-neutral agents should specialize in the more risky crops. But note that risk-neutral agents should also be offered fixed-rent contracts, because risk-sharing is not an issue for them. Thus, given heterogeneous risk aversions, fixed-rent contracts are associated with the more risky crops, and the standard prediction is reversed.

Clearly, the core of the difficulty lies in the fact that, although risk aversion plays a crucial role in the story, it is not directly observable. Conditional on risk aversion, the initial theoretical argument remains valid: more risk makes fixed-rent contracts look less attractive. This prediction can in principle be tested, but it requires that differences in risk aversion be controlled for in the estimation or that the resulting endogeneity bias be corrected in some way.

The paper is divided into two parts. In Section 2, we study the effect of contractual forms on behavior. This obviously comprises the measure of the so-called “incentive effect” (i.e., the increase in productivity generated by moving to a higher-powered incentive contract), but we adopt a more general approach here. Thus, we consider that the decision to participate in a relationship and the choice of a contract in a menu of contracts are all effects of contractual forms on behavior. Section 3 turns to the optimality of observed contracts. The central question here can be stated as follows: does the theory predict well the contractual forms that we actually observe? Section 4 provides a brief conclusion.

Contract theory encompasses a very large body of literature, and we had to make choices to keep a manageable length for this survey. First, we consider only situations in which contracts are explicit and the details of the contractual agreement are available to the econometrician. In particular, we do not cover the literature on optimal risk-sharing within a group, which has rapidly developed since the initial contributions of Cochrane (1991) and Townsend (1994).¹ There are also areas where excellent surveys of the empirical literature have been written recently. Thus, we will not mention any work on auctions in this survey, and we refer the reader to Laffont (1997). Similarly, we will only briefly touch on the provision of incentives in firms, which is discussed by Gibbons and Waldman (1998) and Prendergast (1999).

¹ See the contribution by Attanasio and Rios-Rull in this volume.

2. CONTRACTS AND BEHAVIOR

Circumstantial evidence is a very tricky thing.
Arthur Conan Doyle, The Boscombe Valley Mystery.

Several papers aim at analyzing the links between the form of existing contracts and observed behavior. A recurrent problem of this literature is related to selection issues. Empirical observation provides direct evidence of correlations between contracts and behavior. Theoretical predictions, on the other hand, are concerned with causality relationships. Assessing causality from correlations is an old problem in economics, and indeed in all of science; but the issue is particularly important in our context. Typically, one can observe that different contracts are associated with different behaviors, as documented by a large number of contributions. But the interpretation of the observed correlations is not straightforward. One explanation is that contracts induce the corresponding behavior through their underlying incentive structure; this defines the so-called incentive effect of contracts. However, an alternative, and often just as convincing, story is that differences in behavior simply reflect some unobserved heterogeneity across agents and that this heterogeneity is also responsible for the variation in contract choices.

Interestingly enough, this distinction is familiar to both theorists and econometricians, although the vocabulary may differ. Econometricians have for a long time stressed the importance of endogenous selection. In the presence of unobserved heterogeneity, the matching of agents to contracts must be studied with care. If the outcome of the matching process is related to the unobserved heterogeneity variable (as one can expect), then the choice of the contract is endogenous. In particular, any empirical analysis taking contracts as given will be biased. Contract theory, on the other hand, systematically emphasizes the distinction between adverse selection (whereby unobserved heterogeneity preexists the contractual relationship and constrains its form) and moral hazard (whereby behavior directly responds to the incentive structure created by the contract).

As an illustration, consider the literature on automobile insurance contracts. The idea, here, is to test a standard prediction of the theory: everything being equal, people who face contracts entailing more comprehensive coverage should exhibit a larger accident probability. Such a pattern, if observed, can however be given two different interpretations. One is the classical adverse selection effect à la Rothschild–Stiglitz: high-risk agents, knowing they are more likely to have an accident, self-select by choosing contracts entailing a more comprehensive coverage. Alternatively, one can invoke moral hazard. If some agents, for exogenous reasons (say, picking the insurance company located around the corner), end up facing a contract with only partial coverage, they will be highly motivated to adopt a more cautious behavior, which may result in lower accident rates. In practice, the distinction between adverse selection and moral hazard may be crucial, especially from a normative viewpoint.² But it is also very difficult to implement empirically, especially on cross-sectional data.

Most empirical papers relating contracts and behavior face, at least implicitly, a selection problem of this kind. Various strategies can be adopted to address it. Some papers explicitly recognize the problem and merely test for the presence of asymmetric information without trying to be specific about its nature. In other cases, however, available data allow one to disentangle selection and incentives.
Such is the case, in particular, when the allocation of agents to contracts is exogenous, either because it results from explicit randomization or because some “natural experiment” has modified the incentive structure without changing the composition of the population. In some cases, explicit modeling of the economic and/or econometric structure at stake leads to simultaneous estimation of selection and incentive effects. Finally, a promising direction relies on the use of panel data, the underlying intuition being that the dynamics of behavior exhibit specific features under moral hazard.

² One of the most debated issues regarding health insurance is the impact of deductibles on consumption. It is a well-established fact that, in cross-sectional data, better coverage is correlated with higher expenditure levels. But the welfare implications are not straightforward. If incentives are the main explanation, deductibles or copayments are likely to be useful, because they reduce overconsumption. However, should selection be the main driving force, then limits on the coverage level can only reduce the insurance available to risk averse agents with no gain in terms of expenditure. The result is an unambiguous welfare loss.

2.1. Testing for Asymmetric Information

Several papers have recently been devoted to the empirical analysis of insurance contracts and insurees’ behavior.³ Following initial contributions by Dahlby (1983), Boyer and Dionne (1987), and Puelz and Snow (1994), a (nonexhaustive) list includes Chiappori and Salanié (1997, 2000), Gouriéroux (1999), Bach (1998), Cawley and Philipson (1999), Dionne, Gouriéroux, and Vanasse (2001), and Richaudeau (1999).⁴ In most cases, the nature of the test is straightforward: conditionally on all information that is available to the insurance company, is the choice of a particular contract correlated to risk, as proxied ex post by the occurrence of an accident?

³ See Chiappori (2000) for a recent overview.

This idea can be given a very simple illustration. Consider an automobile insurance context, where insurees choose between two types of coverage (say, comprehensive versus liability only). Then they may or may not have an accident during the subsequent period. The simplest representation of this framework relies on two probit equations. One describes the choice of a contract, and takes the form

$$y_i = I[X_i \beta + \varepsilon_i > 0], \qquad (2.1)$$

where $y_i = 1$ when the insuree chose the full coverage contract at the beginning of the period, 0 otherwise; here, the $X_i$ are exogenous covariates that control for all the information available to the insurer, and $\beta$ is a vector of parameters to be estimated. The second equation relates to the occurrence of an accident:

$$z_i = I[X_i \gamma + \eta_i > 0], \qquad (2.2)$$

where $z_i = 1$ when the insuree had an accident during the contract period, 0 otherwise, and $\gamma$ is a vector of parameters to be estimated.⁵ In this context, asymmetric information should result in a positive correlation between $y_i$ and $z_i$ conditional on $X_i$, which is equivalent to a positive correlation between $\varepsilon_i$ and $\eta_i$. This can be tested in a number of ways; for instance, Chiappori and Salanié (2000) propose two parametric tests and a nonparametric test.⁶ Interestingly enough, none of these tests can reject the null hypothesis of zero correlation (corresponding to the absence of asymmetric information).

These results are confirmed by most studies on automobile insurance;⁷ similarly, Cawley and Philipson (1997) find no evidence of asymmetric information in life insurance. However, Bach (1998), analyzing mortgage-related unemployment insurance contracts, finds that insurees who choose contracts with better (in her case earlier) coverage are more likely to become unemployed. Evidence of adverse selection has also been repeatedly found in annuity markets. Following earlier work by Friedman and Warshawski (1990), Brugiavini (1993) shows that, controlling for age and gender (the two variables used for pricing), annuity buyers have a longer life expectancy than the rest of the population. Recently, Finkelstein and Poterba (2000) have studied the annuity policies sold by a large UK insurance company since the early 1980s. Again, the systematic and significant relationships they find between ex post mortality and some relevant characteristics of the policies suggest that adverse selection may play an important role in that market. For instance, individuals who buy more backloaded annuities are found to be longer lived, whereas policies involving payment to the estate in the event of an early death are preferred by customers with shorter life expectancy.

⁴ A related reference is Toivanen and Cressy (1998), who consider credit contracts.
⁵ An additional problem is that, typically, claims, not accidents, are observed. The decision to file a claim is obviously influenced by many factors, including the form of the contract, which may induce spurious correlations. For that reason, most studies concentrate on accidents involving several vehicles and/or bodily injuries. See Dionne and Gagné (2001) for a careful investigation of these issues.
⁶ One parametric test is based on a computation of generalized residuals from independent estimations of the two probits, whereas the other requires a simultaneous estimation of the two probits using a general covariance matrix for the residuals. The nonparametric approach relies on the construction of “cells” of identical profiles, followed by a series of $\chi^2$ tests.
⁷ One notable exception is the initial paper by Puelz and Snow (1994). However, subsequent studies strongly suggest that their result may be due to a misspecification of the model [see Chiappori and Salanié (2000) and Dionne et al. (2001)].

This empirical literature on asymmetric information in insurance suggests a few general insights. One is that asymmetric information may be an important issue in some insurance markets, but not in others. Ultimately, this is an empirical question, and the last word should be given to empirical analysis instead of theoretical speculations. From a more methodological perspective, the treatment of the information available to both the insuree and the insurer appears as a key issue. Correctly controlling for this information is a crucial, but quite delicate, task. It may be, for instance, that the linear forms used are not flexible enough, in the sense that they omit relevant nonlinearities or cross-effects.⁸ Should this be the case, then the resulting omitted variable bias will produce a spurious correlation between contract choices and risk that could mistakenly be interpreted as evidence of asymmetric information. A last conclusion is that static models may miss important dimensions of the problem.
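The risk of spurious correlation from inadequate controls can be made concrete with a small simulation. The sketch below is an illustration under assumptions of our own, not a reproduction of any test in the papers cited: the data-generating process (one binary observed characteristic, independent standard normal errors in both equations, index coefficients of -1 and 2) and all function names are hypothetical. Contract choice and accidents are independent conditional on the observed characteristic, so a Pearson $\chi^2$ statistic computed within each cell of identical profiles should be small, whereas the pooled statistic that ignores the characteristic is large.

```python
import random
from collections import defaultdict


def chi2_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[n00, n01], [n10, n11]]."""
    n = sum(sum(row) for row in table)
    rows = [sum(row) for row in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def simulate(n, seed=0):
    """Hypothetical data: x drives both coverage choice y and accidents z,
    but the two error terms are independent (no asymmetric information)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.randint(0, 1)  # observed characteristic used for pricing
        y = int(-1.0 + 2.0 * x + rng.gauss(0, 1) > 0)  # full coverage chosen
        z = int(-1.0 + 2.0 * x + rng.gauss(0, 1) > 0)  # accident occurred
        data.append((x, y, z))
    return data


def tables(data):
    """Cross-tabulate y against z, pooled and within each cell of x."""
    pooled = [[0, 0], [0, 0]]
    cells = defaultdict(lambda: [[0, 0], [0, 0]])
    for x, y, z in data:
        pooled[y][z] += 1
        cells[x][y][z] += 1
    return pooled, dict(cells)


if __name__ == "__main__":
    pooled, cells = tables(simulate(20000, seed=1))
    print("pooled chi2:", round(chi2_2x2(pooled), 1))
    for x, table in sorted(cells.items()):
        print("cell x =", x, "chi2:", round(chi2_2x2(table), 2))
```

Each within-cell statistic can be compared with the 5% critical value of a $\chi^2$ distribution with one degree of freedom (about 3.84); the pooled statistic exceeds it by a wide margin even though there is no asymmetric information in the simulated data.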
In automobile insurance, for instance, experience rating is known to play an important role. Insurers typically observe past driving records; these are highly informative about accident probabilities and, as such, are used for pricing. Again, omitting driving history from the probit regressions will generate a bias toward overestimating the importance of asymmetric information. However, in the presence of unobserved heterogeneity, the introduction of variables reflecting past behavior raises complex endogeneity problems. In many cases, an explicit model of the dynamics of the relationship will be required.

2.2. Experiments

The most natural way to overcome selection problems is to make sure that the allocation of people to contracts is fully exogenous. Assume that different people are assigned to different contracts in a purely random way; then differences in observed behavior can safely be analyzed as responses to the different

8 Chiappori and Salanié argue that the use of simple, linear functional forms (such as logit or probit) should be restricted to homogeneous populations, such as “young” drivers. An additional advantage of this approach is that it avoids the problems raised by experience rating.

Testing Contract Theory

incentive structures at stake. Random assignment may be seen as an ideal situation, a kind of “first-best” context for testing contract theory. Such situations, although infrequent, can nevertheless be found; their analysis generates some of the most interesting and robust conclusions of the literature. The best example of a random experiment of this kind is certainly the celebrated Rand Health Insurance Experiment (HIE).9 Between November 1974 and February 1977, the HIE enrolled families in six sites in the United States. Families participating in the experiment were randomly assigned to one of 14 different insurance plans, involving different coinsurance rates and different upper limits on annual out-of-pocket expenses. In addition, lump-sum payments were introduced to guarantee that no family would lose by participating in the experiment. The HIE has provided extremely valuable information about the sensitivity of the demand for health services to out-of-pocket expenditures under a number of different schemes. The use of medical services was found to respond to changes in the amount paid by the insuree. The largest decrease in the use of outpatient services occurs between a free plan and a plan involving a 25% copayment rate; larger rates did not significantly affect expenditures. The impact of the various features of the different plans could be estimated, as well as their interaction with such family characteristics as income or number of children. Also, it is possible, using the regression results, to estimate “pure coinsurance elasticities” (i.e., the elasticity of expenditures to coinsurance rates in the absence of ceilings on out-of-pocket expenses). It is fair to say that the results of the HIE study have been extremely influential in the subsequent discussions on health plan reforms. The HIE will probably remain one of the best empirical studies ever made in that field, a “Rolls Royce” of empirical contract theory. However, quality comes at a cost.
That of the HIE (130 million 1984 dollars) may not be totally prohibitive, but is high enough to severely hamper the repetition of such experiments in the future. Fortunately, academics and government agencies are not the only ones willing to run experiments of this kind. Knowledge about the incentive effects of contractual forms is valuable for firms as well; as a consequence, they may be eager to invest in acquiring such information, in particular through experiments. In a recent contribution, Shearer (1999) studies the case of a tree-planting firm that randomly allocated workers to plant under piece-rate and fixed-wage contracts under a subset of planting conditions. Daily productivities were recorded for each worker and are used to measure the percentage difference in average productivity between the two types of payment. A simple analysis of variance suggests an incentive effect of piece rates of about 20 percent. In addition, Shearer estimates a structural econometric model of worker behavior. This enables him to take into account nonexperimental data as well, to impose nonlinear restrictions on the analysis of variance model, and finally to extend

9 See Manning et al. (1987).

his conclusions to a larger set of planting conditions. The estimates appear to be very robust: Shearer ﬁnds a lower bound of 17 percent for the incentive effect. Ausubel (1999) analyzes the market for bank credit cards. A substantial portion of bank credit card marketing today is done via direct-mailed preapproved solicitations; furthermore, several card issuers decide on the terms of the solicitations by conducting large-scale randomized trials. Ausubel uses the outcomes of such a trial to test for a standard prediction of adverse selection theory, namely that high-risk agents are more willing to accept less favorable deals.10 The trial is conducted by generating a mailing list of 600,000 customer names and randomly assigning them among equal market cells. The market cells are mailed solicitations that vary in the introductory interest rate, in the duration of the introductory offer, and in the postintroductory interest rate. Three tests can be conducted on these data. The ﬁrst test relates to a “winner’s curse” prediction: respondents should be worse borrowers than nonrespondents. Ausubel indeed ﬁnds that respondents have on average shorter credit histories, inferior credit rating, and are more borrowed-up than nonrespondents. Second, respondents to inferior offers (i.e., offers displaying a higher introductory interest rate, a shorter duration of the introductory period, or a higher postintroductory interest rate) are also worse borrowers on average, in the sense that they exhibit lower incomes, inferior credit records, lower balances on other credit cards, and higher utilization rates of credit lines on other credit cards. Note, however, that these two tests involve characteristics that are observable by the bank and hence do not correspond to adverse selection in the usual sense. 
On the other hand, a third test looks for hidden information by checking whether, even after controlling for the observable characteristics of respondents to inferior offers, the latter still yield a customer pool that is more likely to default. The answer is an unambiguous yes, which provides very convincing evidence supporting the existence of adverse selection in the credit card market.

2.3. Natural Experiments

Selection issues arise naturally in a cross-sectional context: if different people are involved in different contracts, the mechanism that allocates contracts to people deserves close scrutiny. Assume, however, that the same people successively face different contracts. Then, selection is no longer a problem; in particular, any resulting change of behavior can safely be attributed to the variation of incentives, at least to the extent that no other significant factor has changed during the same period. This is the basic insight of natural experiments: incentive effects are easier to assess when they stem from some exogenous change in the incentive structure.

10 Technically, the market for credit cards exhibits nonexclusive contracts. In particular, the relevant theoretical reference is Stiglitz and Weiss (1981) rather than Rothschild and Stiglitz (1976) as in automobile insurance. Also, Ausubel (1999) focuses on testing for adverse selection, but he argues that moral hazard cannot explain his findings.

Changes in regulations constitute an obvious source of natural experiments. For instance, the automobile insurance regulation in Québec was modified in 1978 by the introduction of a “no fault” system, which in turn was deeply restructured in 1992. Dionne and Vanasse (1996) provide a careful investigation of the effects of these changes. They show in particular that the average accident frequency dropped significantly after strong incentives to increase prevention efforts were restored in 1992. They conclude that changes in agents’ behavior, as triggered by new incentives, did have a significant effect on accident probabilities.11 Another illustration is provided by the study of tenancy reform in West Bengal by Banerjee, Gertler, and Ghatak (2002). The reform, which took place in 1976, entitled tenants, upon registration with the Department of Land Revenue, to permanent and inheritable tenure on the land they sharecropped so long as they paid the landlord at least 25 percent of output as rent. The incentive impact of the reform is rather complex, because it changes the respective bargaining powers of the parties and the tenant’s incentives to invest while reducing the set of incentive devices available to the landlord. To test for the impact of the reform, the authors use two methods. One is to use neighboring Bangladesh as a control; the fact that the reform was implemented in West Bengal, but not in Bangladesh, the authors argue, was to a large extent due to an exogenous political shock. The second method compares changes in productivity across districts with different registration rates. Again, endogeneity might be a problem here; the authors carefully discuss this issue. They find that the reform significantly increased productivity. Regulation is not the only cause of changes in incentive structures. Periodically, firms modify their incentive schemes, introduce new rewards, or restructure their wage schedules.
Natural experiments of this kind have been repeatedly analyzed. To take only one example, Lazear (2000) uses data from a large auto glass company that changed its compensation structure from hourly wages to piece rates. He finds that, in accordance with the theoretical predictions, productivity increased sharply, and that half of the increase can be attributed to existing workers producing more. A first potential limitation of any work of this kind is that, strictly speaking, it establishes simultaneity rather than causality. What the studies by Dionne and Vanasse or Lazear show is that, over a given period, outcomes changed significantly, and that this evolution immediately followed a structural change in incentives. But the two phenomena might stem from simultaneous and independent (or correlated) causes. The lower rate of accidents following the 1992 Québec reform may be due, say, to milder climatic conditions. Such a “coincidence” may be more or less plausible, but it is difficult to rule out entirely. A second and related problem is that the change in the incentive structure may well fail to be exogenous. This is particularly true for firms, which are supposed to adopt optimal contracts. If the switch from fixed wages to piece rates

11 See Browne and Puelz (1999) for a similar study on U.S. data.

indicates that, for some reason, fixed wages were the best scheme before the reform but ceased to be by the time the reform was implemented, then a direct regression will provide biased estimates, at least to the extent that the factors affecting the efficiency of fixed wages had an impact on productivity. Again, this type of explanation may be difficult to discard.12

The “coincidence” problem can be overcome when the experiment provides a “control” sample that is not affected by the change, so that the effects can be estimated in differences (or, more precisely, differences of differences). In two recent papers, Chiappori, Durand, and Geoffard (1998) and Chiappori, Geoffard, and Kyriazidou (2000) use such data on health insurance. Following a change in regulation in 1993, French health insurance companies modified the coverage offered by their contracts in a nonuniform way. Some of them increased the level of deductible, whereas others did not. The tests use a panel of clients belonging to different companies, who were faced with different changes in coverage, and whose demand for health services is observed before and after the change in regulation. To concentrate on those decisions that are essentially made by consumers themselves (as opposed to those partially induced by the physician), the authors study the number of physician visits, distinguishing between general practitioner office visits, general practitioner home visits, and specialist visits. They find that the number of home visits significantly decreased for the “treatment” group (i.e., agents who experienced a change of coverage), but not for the “control” group (for which the coverage remained constant). They argue that this difference is unlikely to result from selection, because the two populations are employed by similar firms, they display similar characteristics, and participation in the health insurance scheme was mandatory.
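The differences-of-differences logic described above can be sketched in a few lines of Python. This is only an illustration: the function name `diff_in_diff` and the visit counts are hypothetical and are not taken from the papers cited. The estimate is the change in the mean outcome of the treatment group minus the change in the mean outcome of the control group, so that any shock common to both groups cancels out.

```python
from statistics import mean


def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return the difference-in-differences estimate of the treatment effect."""
    # Change over time in the group whose coverage was modified.
    treated_change = mean(treated_after) - mean(treated_before)
    # Change over time in the group whose coverage stayed constant;
    # this captures common shocks (the "coincidence" problem).
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change


# Hypothetical yearly home-visit counts per client (invented numbers).
treated_before = [2.0, 3.0, 1.0]
treated_after = [1.0, 2.0, 0.0]
control_before = [2.0, 2.0, 2.0]
control_after = [2.0, 1.0, 3.0]

effect = diff_in_diff(treated_before, treated_after, control_before, control_after)
print(f"Estimated effect of the coverage change on home visits: {effect:+.2f}")
```

The sketch ignores standard errors and covariates; in practice the same quantity would typically be obtained from a panel regression with group and period effects.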
A paper by Dionne and St.-Michel (1991) provides another illustration of these ideas. They study the impact of a regulatory variation of the coinsurance level in the Québec public insurance plan on the demand for days of compensation. The main methodological contribution of the paper is to introduce a distinction between injuries, based on the type of diagnosis; it reflects the fact that it is much easier for a physician to detect a fracture than, say, lower back pain. In the first case, moral hazard (interpreted, in the ex post sense, as the tendency to cheat on the true severity of the accident) can play only a minor role, whereas it may be prevalent when the diagnosis is more difficult. In a sense, the easy diagnoses play the role of a control group, although in a specific way: they represent situations where the moral hazard problem does not exist. Theory predicts that the regulatory change will have more significant effects on the number of days of compensation for those cases where the diagnosis is more problematic. This prediction is clearly confirmed by empirical evidence. A more generous insurance coverage, resulting from an exogenous regulatory change,

12 This remark illustrates a general phenomenon: if contracts are always optimal, then contract changes should always be taken as endogenous. In real life, however, (at least temporarily) inefficient contracts can hardly be assumed away, which, paradoxically, may greatly simplify the task of the econometrician!

is found to increase the number of days on compensation, but only for the cases of difficult diagnoses. Note that the effect thus identified is ex post moral hazard. The reform is unlikely to have triggered significant changes in prevention; and, in any case, such changes would have affected all types of accidents. Another natural experiment based on reforms of public programs is studied by Fortin, Lanoie, and Laporte (1995), who examine how the Canadian Worker’s Compensation (WC) and the Unemployment Insurance (UI) programs interact to influence the duration of workplace accidents. They show that an increase in the generosity of WC in Québec leads to an increase in the duration of accidents. In addition, a reduction in the generosity of UI is, as in Dionne and St.-Michel, associated with an increase in the duration of accidents that are difficult to diagnose. The underlying intuition is that worker’s compensation can be used as a substitute for UI. When a worker goes back to the labor market, he may be unemployed and entitled to UI payments for a certain period. Whenever worker’s compensation is more generous than UI, there will be strong incentives to delay the return to the market. In particular, the authors show that the hazard of leaving WC is 27 percent lower when an accident occurs at the end of the construction season, when unemployment is at its seasonal maximum.13 Finally, an interesting situation is when the changes in the incentive structure are random but endogenous. Take the example of mutual fund managers, as studied by Chevalier and Ellison (1997). The basic assumption of the paper is that fund companies have an incentive to increase the inflow of investments. That, in turn, depends on the fund’s performance in an implicit contract between fund companies and their customers. The authors estimate the shape of the flow–performance relationship for a sample of funds observed over the 1982–1992 period, and find that it is highly nonlinear.
Such a nonlinear shape, in turn, creates incentives for fund managers to alter the riskiness of their portfolios, and these incentives vary with time and past performance. Examining portfolio holdings, the authors find that risk levels are changed toward the end of the year in a manner consistent with these incentives. For instance, the flow–performance relationship is convex for funds that are ahead of the market; and, as expected, these tend to gamble so as to increase their expected inflow of investment.14 In a similar vein, Oyer (1998) remarks that compensation contracts for salespersons and executives are typically nonlinear in firm revenues, which creates incentives for these agents to manipulate prices, vary effort, and influence the timing of customer purchases. Using an extensive data set (gathering firm revenue and cost of goods sold for 31,936 quarterly observations covering 981 manufacturers), Oyer finds evidence of business seasonality patterns that fully support the theoretical predictions.

13 See also Fortin and Lanoie (1992), Bolduc et al. (1997), and the survey by Fortin and Lanoie (1998).

14 Chevalier and Ellison (1999) extend this approach to study the impact of career concerns on the investment decisions of mutual fund managers. For other recent work on the incentive impact of managerial contracts, see Lemmon, Schallheim, and Zender (2000).

2.4. Explicit Modeling

2.4.1. Econometric Tools

In the absence of (natural) experiments, the endogenous matching problem is pervasive. Adequate theoretical tools may, however, allow it to be tackled in a satisfactory way. From the econometric perspective, much attention has been devoted to exogeneity tests, which find a natural application in our context. An illustration is provided by Laffont and Matoussi (1995), who study a model of sharecropping with moral hazard. The main prediction of this class of models is that production increases with the share of the product kept by the tenant. Laffont and Matoussi use data collected in 1986 on contracts and production in a Tunisian village to test that sharecropping indeed reduces production. To do this, they estimate augmented Cobb–Douglas production functions, adding contract dummy variables as explanatory variables. They find that moving from a sharecropping contract to a rental contract increases production by 50 percent on average. However, longer-term sharecropping relationships, which allow for delayed retaliation, tend to be much more efficient, as one would expect from the repeated moral hazard literature in a context of missing credit markets (see Chiappori, Macho, Rey, and Salanié, 1994). As presented, the Laffont–Matoussi approach seems very sensitive to the criticism of selection bias: if they find higher production in plots with rental contracts, it may simply be that rental contracts are more often adopted for more fertile plots. Their answer to this criticism is to test for exogeneity of the contract-type variables in production functions. They do not reject exogeneity, which supports their approach. One problem with exogeneity tests is that they may not be very powerful. As we will see, another solution to the selection bias problem is to use instruments.
In fact, the exogeneity test used by Laffont and Matoussi assumes that some variables (such as the tenant’s age, his wealth, and working capital) are valid instruments for the contract variables in the production function.
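To make the idea of an augmented Cobb–Douglas production function concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption made for illustration: the variable names, the two inputs, the parameter values, and the helper functions `simulate_plots` and `fit_ols` are hypothetical and do not reproduce the Laffont and Matoussi specification or data. The contract dummy is drawn at random, so it is exogenous by construction; with real data, the selection concern discussed above is exactly that this may fail.

```python
import numpy as np


def simulate_plots(n, rental_effect, seed=0):
    """Simulate log output from a Cobb-Douglas technology plus a contract dummy."""
    rng = np.random.default_rng(seed)
    log_land = rng.normal(0.0, 1.0, n)
    log_labor = rng.normal(0.0, 1.0, n)
    # 1.0 for a rental contract, 0.0 for sharecropping (randomly assigned here).
    rental = rng.integers(0, 2, n).astype(float)
    noise = rng.normal(0.0, 0.3, n)
    log_output = 1.0 + 0.6 * log_land + 0.3 * log_labor + rental_effect * rental + noise
    # Design matrix: constant, log inputs, and the contract dummy.
    X = np.column_stack([np.ones(n), log_land, log_labor, rental])
    return X, log_output


def fit_ols(X, y):
    """Ordinary least squares coefficients via a least-squares solve."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


X, y = simulate_plots(n=2000, rental_effect=0.4, seed=0)
coef = fit_ols(X, y)
# In a log-linear model, a dummy coefficient b implies a 100*(exp(b)-1) percent shift.
pct = 100.0 * (np.exp(coef[3]) - 1.0)
print(f"Estimated rental dummy: {coef[3]:.3f} (about {pct:.0f} percent more output)")
```

If the dummy were instead correlated with an omitted plot characteristic such as fertility, the same regression would attribute part of the fertility effect to the contract, which is why the exogeneity test and the instruments matter.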

2.4.2. Structural Models of Regulation under Adverse Selection

Often, however, identiﬁcation requires a full-grown structural model. Wolak (1994) pioneered the estimation of structural models with adverse selection. His paper is set within the context of the regulator–utility relationship for California water companies. However, it is simpler to present it for a price discriminating monopoly (the principal) facing consumers (agents) with an unknown taste θ for the good. Let X be the vector of exogenous variables that are observed by both parties and by the econometrician, α be the vector of parameters we want to estimate, and let q be the quantity traded as per the contract. The observational status of θ depends on our assumptions. First, consider model S (for symmetric information), in which both Principal and Agent observe θ. Then, we obtain by

maximizing the total surplus15 a likelihood function $l^S(q, X, \alpha; \theta)$. Note that this is conditional on θ. Now consider the more interesting model A (for asymmetric information) in which only the Agent knows θ and the Principal has a prior given by a probability distribution function f and a cumulative distribution function F. In that case, we know from the theoretical literature that under the appropriate hazard rate condition, the solution is given by maximizing the virtual surplus, which generates a likelihood function $l^A(q, X, \alpha; \theta, (1 - F(\theta))/f(\theta))$. Note that the latter is conditional both on θ and on $(1 - F(\theta))/f(\theta)$. Assume that we have data on n relationships between Principals and Agents that are identical except for the exogenous variables X, so that our sample is $(q_i, X_i)_{i=1}^{n}$. The difficulty here is that we do not know θ or f, even in model S, in which both parties observe θ. In econometric terms, θ is an unobserved heterogeneity parameter and we must integrate over it. To do this, we must find a functional form for f that is flexible enough, given that we have very little idea of what the Principal’s prior may look like. Let $(f_\gamma)$ be such a parameterized family. We can now estimate all parameters of model S by maximizing over α and γ the log-likelihood

$$\sum_{i=1}^{n} \log \int l^S(q_i, X_i, \alpha; \theta)\, f_\gamma(\theta)\, d\theta.$$
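As a rough illustration of the integration step for model S, the following Python sketch evaluates an integrated log-likelihood of this form by Gauss–Hermite quadrature. All modeling choices here are assumptions made for the example and are not Wolak's specification: the conditional density `cond_density` is a hypothetical normal density centered on a linear index plus θ, and the family $f_\gamma$ is taken to be normal with mean zero and standard deviation γ. These choices are convenient because the integral then has a closed form against which the quadrature can be checked.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def cond_density(q, x, alpha, theta):
    """Hypothetical l^S(q, X, alpha; theta): q is normal around a0 + a1*x + theta."""
    a0, a1, sigma = alpha
    return normal_pdf(q, a0 + a1 * x + theta, sigma)


def integrated_loglik(q, x, alpha, gamma, deg=40):
    """Sum over i of log of the integral of l^S(q_i, x_i, alpha; theta) f_gamma(theta) dtheta."""
    nodes, weights = hermgauss(deg)
    # Change of variable theta = sqrt(2)*gamma*node maps the N(0, gamma^2) weight
    # into the exp(-node^2) weight used by Gauss-Hermite quadrature.
    thetas = np.sqrt(2.0) * gamma * nodes
    total = 0.0
    for qi, xi in zip(q, x):
        integral = np.sum(weights * cond_density(qi, xi, alpha, thetas)) / np.sqrt(np.pi)
        total += np.log(integral)
    return float(total)


def closed_form_loglik(q, x, alpha, gamma):
    """Same quantity in closed form, available only for this normal-normal example."""
    a0, a1, sigma = alpha
    sd = np.sqrt(sigma ** 2 + gamma ** 2)
    q, x = np.asarray(q), np.asarray(x)
    return float(np.sum(np.log(normal_pdf(q, a0 + a1 * x, sd))))


q_obs = [1.0, 2.5, 0.3]      # hypothetical quantities
x_obs = [0.5, 1.0, -0.2]     # hypothetical exogenous variable
alpha = (0.5, 1.2, 1.0)      # (a0, a1, sigma), illustrative values
gamma = 0.8

print(integrated_loglik(q_obs, x_obs, alpha, gamma))
print(closed_form_loglik(q_obs, x_obs, alpha, gamma))
```

In an actual application, this function would be passed to a numerical optimizer over α and γ, and the conditional density would come from the solution of the theoretical model rather than from an ad hoc normal form.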

To estimate model A, we must first integrate $f_\gamma$ to get $F_\gamma$; then, we maximize

$$\sum_{i=1}^{n} \log \int l^A\!\left(q_i, X_i, \alpha; \theta, \frac{1 - F_\gamma(\theta)}{f_\gamma(\theta)}\right) f_\gamma(\theta)\, d\theta.$$

These log-likelihood functions are obviously highly nonlinear and also require a numerical integration in both models; however, modern computers make it quite feasible to maximize them. As pointed out before, Wolak (1994) introduced this approach to study the regulation of water utilities in California in the eighties. He found that nonnested tests à la Vuong (1989) favor model A over model S, indicating that asymmetric information is relevant in this regulation problem. Wolak also noted that using model S instead of model A may lead the analyst to conclude wrongly that returns are increasing, whereas they are estimated to be constant in model A. Finally, he was able to evaluate the underproduction that is characteristic of adverse selection models to about 10 percent in the middle of the θ range. One difficulty with Wolak’s method is that the econometrician observes only the conditional distribution of q, given X; thus, identification of the preferred

15 Assuming that utilities are quasilinear.

model heavily relies on functional form assumptions. Without them, it is easy to find examples in which model S with parameters (α, F) yields exactly the same likelihood function as model A with parameters (α′, F′), so that there is no way to discriminate between these two models on the basis of data. Of course, this problem is not specific to Wolak’s model; it is just the usual identification problem in structural models, with the new twist that the parameter F is really infinite-dimensional.16 Ivaldi and Martimort (1994) have used a similar approach in a model that has both market power and asymmetric information. They study competition through supply schedules in an oligopoly, where two suppliers of differentiated goods do not know the consumers’ valuations for the two goods. They model this situation as a multiprincipals game where the suppliers are the principals and the consumers are the agents. Assuming supply schedules to be quadratic, they derive the perfect Bayesian equilibrium in supply schedules and the corresponding segmentation of the market according to the valuations of consumers for the two goods. Ivaldi and Martimort apply this theoretical model to study energy supply to the French dairy industry. The first supplier is the public sector monopoly on gas and electricity, EDF-GDF. The second supplier consists of oil firms, who are assumed to act as a cartel. Oil firms maximize profit, but EDF-GDF maximizes social welfare. The authors use pseudo–maximum likelihood (Gouriéroux, Monfort, and Trognon, 1984) to estimate the structural equations derived from their theoretical model. They find that the estimated variance of suppliers’ priors on the valuations of consumers is significantly positive, so that there is evidence of asymmetric information in this market. Obviously, our remark on identification in Wolak’s model also applies here.17

2.4.3. Structural Models Involving Moral Hazard and Selection

Structural models can be used in a more specific way to disentangle selection from incentive effects. Paarsch and Shearer (2000) analyze data from a tree-planting firm, where some workers receive a piece rate, whereas others are paid a fixed wage. In their model, the decision to adopt a piece rate or a fixed wage is modeled as resulting from the observation of the planting conditions by the firm. The endogeneity problem arises from the fact that neither the planting conditions nor the individual-specific cost of effort is observed by the econometrician. According to the structural model developed in the paper,

16 Wolak also assumes that the regulator maximizes social welfare. Timmins (2000) relaxes this assumption and estimates the relative weights of consumers’ surplus and firms’ profits in the regulator’s objective function. Gagnepain and Ivaldi (2001) take the existing regulatory framework as given; they estimate the structural parameters of supply and demand and use them to simulate the optimal contracts.

17 See also Lavergne and Thomas (2000).

fixed wages are efficient under poor planting conditions and for less productive employees, whereas piece rates work well in more favorable contexts. A direct comparison of observed productivities under each type of contract thus is biased, because the estimated difference results partly from the incentive effect of piece rates and partly from the selection effect. Hence observed discrepancies in productivity provide an upper bound of the incentive effect. Conversely, differences in real earnings provide a lower bound for the incentive effect. This simple idea can be taken to the data quite easily; the authors find an upper (respectively lower) bound of 60 percent (respectively 8 percent). Finally, a parametric version of the structural model is estimated. The authors conclude that about half of the difference in productivity is due to incentive effects and half to selection. Interestingly enough, these nonexperimental findings are fully compatible with the experimental results in Shearer (1999).18 A related approach is adopted by Cardon and Hendel (2001), who consider employer-provided health insurance. As argued here, a contract that involves a larger copayment rate is likely to correspond to smaller health expenditures, either because of the incentive impact of the copayment rate or because high-risk agents self-select by choosing contracts entailing more coverage. The main identifying assumption used by Cardon and Hendel is that agents do not choose their employer on the basis of the health insurance coverage. A consequence is that whereas the allocation of individuals among the various options of a given plan typically reflects adverse selection, the differences in behavior across plans must come from incentive effects. Again, a structural model is needed to disentangle the two effects; the authors find that selection effects are negligible, whereas incentives matter.19

2.5. Using Behavioral Dynamics

If selection and moral hazard are difficult to disentangle in a static context, a natural response is to turn to dynamic data.20 Adverse selection and moral hazard indeed induce different behavioral dynamics, which provides a new source of identification. An illustration of this line of research is provided by recent work by Chiappori, Abbring, Heckman, and Pinquet (2001). They consider a French database provided by an automobile insurer. A particular feature of automobile insurance in France is that pricing relies on experience rating (i.e., the premium associated with any particular contract depends, among other things, on the

18 Paarsch and Shearer (1999) use a similar model, where the firm, having observed the planting conditions, chooses a specific piece rate. Again, the structural model allows the endogeneity of the rate to be taken into account.

19 Other references include, among others, Holly et al. (1998) and Ferrall and Shearer (1999).

20 A different but related idea is that the use of panel data allows control of unobserved heterogeneity and selection issues in a much more satisfactory way than in cross-sectional analysis. See, for instance, MacLeod and Parent (1999).

past history of the relationship), but the particular form experience rating may take is strongly regulated. All companies must apply the same “bonus/malus” system, according to which the premium is decomposed as the product of a “basis” premium, freely set by the insurer but independent of past history, and a bonus coefficient, the dynamics of which is imposed by law. Specifically, the coefficient is decreased by a factor µ < 1 after each year without an accident but increased by a factor λ > 1 after each year with an accident.21 The authors show that this scheme has a very general property, namely that each accident increases the marginal cost of (future) accidents. Under moral hazard, any accident thus increases prevention efforts and reduces accident probability. The conclusion is that for any given individual, moral hazard induces a negative contagion phenomenon: the occurrence of an accident in the past reduces accident probability in the future. The tricky part, however, is that this prediction holds only conditional on individual characteristics, whether observable or unobservable. As is well known, unobserved heterogeneity induces an opposite, positive contagion mechanism: past accidents are typical of bad drivers, hence are a good predictor of a higher accident rate in the future. The problem thus is to control for unobserved heterogeneity. This problem is fairly similar to an old issue of the empirical literature on dynamic data, namely the distinction between pure heterogeneity and state dependence. The authors show that nonparametric identification can actually be achieved under mild identifying restrictions, even when the history available to the econometrician about each driver consists only of the number of years of presence and the total number of accidents during this period. Using a proportional hazard duration model on French data, they cannot reject the null of no moral hazard.

3. ARE CONTRACTS OPTIMAL?

We now turn to tests of contract optimality.
The papers we are going to survey all focus on the same question: do observed contracts have the properties predicted by contract theory? There is a sense in which the answer is always positive: given any contract, a theorist with enough ingenuity may be able to build an ad hoc theory that “explains” it. The operative word here is “ad hoc.” Clearly, there is no precise definition of what constitutes an ad hoc assumption, but there may be accepted standards. So, we can rephrase the optimality question thus: do the properties of observed contracts correspond to those that the currently standard models of contract theory predict? This new formulation makes it clear that a negative answer may be only temporary, as better models with new predictions are developed (ideally, in response to such rejections of currently standard models).

21 Currently, µ = .95 and λ = 1.25. In addition, the coefficient at any time is capped and floored (at 3.5 and .5, respectively). Note that the strict regulation avoids selection problems, because the insuree cannot choose between menus involving different bonus/malus coefficients, as is often the case in other countries.
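The coefficient dynamics described in the text and in footnote 21 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the starting coefficient of 1.0, and the one-year-ahead comparison in `accident_premium_gap` are our assumptions, not details of the French regulation or of the authors' model (which considers the full stream of future premiums).

```python
from __future__ import annotations

MU = 0.95       # factor applied after a year without an accident (footnote 21)
LAMBDA = 1.25   # factor applied after a year with an accident (footnote 21)
FLOOR, CAP = 0.5, 3.5  # bounds on the coefficient (footnote 21)


def next_coefficient(coef: float, had_accident: bool) -> float:
    """Apply one year of the bonus/malus rule, then clamp to [FLOOR, CAP]."""
    updated = coef * (LAMBDA if had_accident else MU)
    return min(CAP, max(FLOOR, updated))


def premium_path(basis_premium: float, history: list[bool], start: float = 1.0) -> list[float]:
    """Premium charged each year: basis premium times the current coefficient.

    history[t] is True if year t had an accident; `start` is an assumed
    initial coefficient (hypothetical, chosen for illustration).
    """
    coef, premiums = start, []
    for had_accident in history:
        premiums.append(basis_premium * coef)
        coef = next_coefficient(coef, had_accident)
    return premiums


def accident_premium_gap(basis_premium: float, coef: float) -> float:
    """Next-year premium with an accident minus next-year premium without one.

    A crude one-year proxy for the marginal cost of an accident; it ignores
    discounting and all later years.
    """
    return basis_premium * (next_coefficient(coef, True) - next_coefficient(coef, False))
```

With a hypothetical basis premium of 1,000, `accident_premium_gap` is larger at a coefficient of 1.25 (after one accident) than at 1.0, which is the one-year analogue of the property that each accident raises the marginal cost of future accidents, as long as the cap does not bind.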

Testing Contract Theory

3.1. Static, Complete Contracts

3.1.1. Managerial Pay

The standard model of moral hazard implies that managers’ pay should be sensitive to their firms’ performance. The “pay-performance sensitivity” has been estimated by many papers [for a recent survey of the evidence, see Murphy (1999)]. The seminal contribution is that of Jensen and Murphy (1990); using data on CEOs of U.S. firms from 1969 to 1983, they obtained what seemed to be very low estimates of the elasticity of executive compensation to firm performance. Their oft-quoted result was that when the firm’s value increases by $1,000, the (mean) manager’s wealth increases only by $3.25. The early reaction to Jensen and Murphy’s result was that it indicated inefficiently low incentives for top management (see, e.g., Rosen 1992). However, Haubrich (1994) showed that even fairly low levels of manager’s risk aversion (such as a relative index of risk aversion of about 5) were consistent with this empirical result. The intuition is that for large companies, changes in firm value can be very large and imply large swings in CEO wealth even for such lowish pay-performance sensitivity levels. Moreover, more recent estimates point to much higher elasticities. Thus, Hall and Liebman (1998) use a more recent data set (1980–1994). They show that the spectacular increase in the stock options component of managers’ pay has made their pay much more sensitive to firm performance. Their mean (respectively median) estimate of the change in CEO wealth (salary, bonus, and the change in value of stocks and stock options) linked to a $1,000 increase in firm value indeed is about $25 (respectively $5.3). Much of it is due to the change in value of stocks and stock options. Another testable implication of the moral hazard model is that pay-performance sensitivity should be inversely related to the variance of the measure of performance used (typically firm value for managers).
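Two of the points above lend themselves to a small numerical sketch. The first function simply rescales a sensitivity quoted per $1,000 of firm value, which makes the Haubrich intuition concrete. The second uses the textbook linear-contract benchmark with CARA utility and normal noise, which we assume here purely for illustration (it is not the specification estimated in the papers cited): with output x = e + ε, ε ~ N(0, σ²), effort cost c·e²/2, and absolute risk aversion r, the optimal slope is b* = 1/(1 + r·c·σ²). All function and parameter names are ours.

```python
def wealth_change(sensitivity_per_1000: float, firm_value_change: float) -> float:
    """CEO wealth change implied by a sensitivity quoted per $1,000 of firm value."""
    return sensitivity_per_1000 * firm_value_change / 1000.0


def optimal_slope(risk_aversion: float, effort_cost: float, variance: float) -> float:
    """Optimal pay-performance slope b* = 1 / (1 + r * c * sigma^2).

    Assumed benchmark: linear contract, CARA utility, normal noise,
    quadratic effort cost. The slope falls as the variance rises.
    """
    return 1.0 / (1.0 + risk_aversion * effort_cost * variance)


# At the Jensen-Murphy figure of $3.25 per $1,000, a hypothetical $1 billion
# swing in firm value moves CEO wealth by $3.25 million.
swing = wealth_change(3.25, 1_000_000_000)

# Slopes for increasingly noisy performance measures (illustrative parameters).
slopes = [optimal_slope(risk_aversion=2.0, effort_cost=1.0, variance=v)
          for v in (0.0, 0.5, 2.0)]
```

In this benchmark the slope equals 1 when there is no noise and declines monotonically with σ², which is the comparative static behind the variance test.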
Aggarwal and Samwick (1999) show that, indeed, CEO pay is much less sensitive to performance for firms whose stock returns are more volatile.22 This result, however, may itself be sensitive to the choice of covariates.23 This illustrates a problem frequently encountered by this literature. Theory predicts the form of optimal contracts within simplified models, where comparative statics are easy to work out (one can change the level of uncertainty within a moral hazard model by varying one parameter). Taking such predictions to data typically requires some very strong “everything equal” qualification. In practice, firms differ by the uncertainty they face, but also by their size, market share, relationship to the clients, technology, internal organization and others – all of which may be

22 Aggarwal and Samwick use panel data and include fixed CEO effects that allow them to control for CEO risk aversion.
23 For instance, Core and Guay (2000) find that the sign of the relationship is reversed when controlling for firm size.


Chiappori and Salanié

correlated, moreover, in various ways. In this context, sorting out one particular type of causality is a difficult task indeed. Other models relate the use of a particular form of compensation to the characteristics of the task to be performed. Using various data sets, MacLeod and Parent (1999) find, for instance, that jobs using high-power incentives are associated with more autonomy on the job, and that a high local rate of unemployment results in less discretion in pay or promotion, confirming standard conclusions of incomplete contract theory. Finally, one empirical puzzle in this literature is that firms do not seem to use relative performance evaluation of managers very much.24 The theory indeed predicts that managers should not be paid for performance that is due to “observable luck,” such as a favorable industrywide exchange rate shock or a change in input prices. Bertrand and Mullainathan (2001) revisit this issue of “pay for luck”; they find that manager pay in fact reacts about as much to performance changes that are predictable from observable luck measures as to unpredictable changes in performance. This clearly contradicts the theoretical prediction. However, Bertrand and Mullainathan also find that better-governed firms (such as those with large shareholders) give less pay for luck, as one would expect.

3.1.2. Sharecropping

Many papers have tested the moral hazard model of sharecropping, and we will quote only a few recent examples.25 Ackerberg and Botticini (2002) regress the type of contract (rental or sharecropping) on crop riskiness and tenant’s wealth. As explained previously, theory predicts that more risky crops are more likely to be grown under sharecropping contracts. If wealth is taken to be a proxy for risk aversion, we would also expect that richer (and presumably less risk averse) tenants are more likely to be under a rental contract. Now wealth is only an imperfect proxy for risk aversion, and as explained earlier, the unobserved component of risk aversion is likely to be correlated with crop riskiness. This implies that the error in the contract choice equation is correlated with one of the explanatory variables, and the estimators of such a naive regression are biased. To remedy this endogenous matching problem, Ackerberg and Botticini instrument the crop riskiness variable, using geographical variables as instruments. They find that the results are more compatible with theory than a naive regression would suggest. Moreover, the implicit bias in the naive estimators goes in the direction implied by a matching of more risk-averse tenants with less risky crops: it leads to overestimating the effect of crop risk and underestimating the effect of wealth.26

24 Gibbons and Murphy (1990) argue that they do.
25 Other recent works include, in particular, a series of papers by Allen and Lueck (1992, 1993, 1998, 1999).
26 An alternative strategy used by Dubois (2000a, 2000b) is to independently estimate individual risk aversion (as a function of the available covariates) from panel data on consumptions (in the line of the consumption smoothing literature), then include the estimated parameter of risk aversion within the explanatory variables for the contract choice equation.
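The logic of the endogenous matching problem, and of instrumenting crop riskiness with geography, can be illustrated with a small simulation. Everything in this sketch is assumed for illustration: the data are synthetic, the outcome is a continuous "propensity to sharecrop" rather than a discrete contract choice, the unobserved tenant trait and its loadings are arbitrary, and the sign and size of the resulting bias are artifacts of those choices rather than a reproduction of the authors' findings.

```python
import random


def covariance(a, b):
    """Covariance of two equal-length lists (population normalisation)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate_matching_bias(n=20000, true_effect=1.0, seed=0):
    """Return (naive OLS slope, IV slope) for the effect of crop risk."""
    rng = random.Random(seed)
    geography, crop_risk, share_index = [], [], []
    for _ in range(n):
        geo = rng.gauss(0.0, 1.0)    # instrument: shifts crop risk only
        trait = rng.gauss(0.0, 1.0)  # unobserved tenant trait (hypothetical)
        # Matching: the unobserved trait is correlated with the crop grown.
        risk = geo + 0.8 * trait + rng.gauss(0.0, 1.0)
        # The same trait also enters the contract equation directly.
        outcome = true_effect * risk + trait + rng.gauss(0.0, 1.0)
        geography.append(geo)
        crop_risk.append(risk)
        share_index.append(outcome)
    naive = covariance(crop_risk, share_index) / covariance(crop_risk, crop_risk)
    iv = covariance(geography, share_index) / covariance(geography, crop_risk)
    return naive, iv


naive_slope, iv_slope = simulate_matching_bias()
```

Under these assumed loadings the naive slope converges to 1 + 0.8/2.64 (about 1.30) instead of the true value of 1, whereas the instrumental variable ratio cov(z, y)/cov(z, x) converges to 1 because geography is, by construction, unrelated to the tenant trait.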


Laffont and Matoussi (1995) test a different variant of the moral hazard sharecropping model. In their story, tenants are risk neutral; but they are facing ﬁnancial constraints that limit how much risk they may take. This model predicts that tenants with less working capital tend to work under sharecropping or even wage contracts. They ﬁnd that their Tunisian data support this prediction. In either of these variants, the theory used is drastically simpliﬁed. Empirical work must often extend the theory to take into account features of real-world applications. Dubois (1999) makes a step in that direction by taking into account landlords’ concerns that tenant effort may exhaust the soil and reduce future land fertility and hence future proﬁts. This is a problem because contracts are incomplete: they cannot be made contingent on land fertility. Moreover, many contracts extend over only one season and so long-term contracts are not feasible. Then, sharecropping may be optimal even with risk-neutral tenants, as it improves future land fertility by reducing tenant effort. This “extended model” of sharecropping has some predictions that differentiate it from the “canonical model” of Stiglitz (1974) and that seem to ﬁt Dubois’ Philippines data set better. For instance, the data show that incentives are higher powered for more valuable plots of land. This is incompatible with most versions of the canonical model; on the other hand, it is quite possible under the extended model. Moreover, observed incentives are lower powered for crops such as corn that tend to exhaust the soil, as the extended model predicts. The theory also predicts that a technological shock that makes the effort of the tenant less crucial should increase the share of the landlord at the optimal contract. Hanssen (2001) argues that this is exactly what happened in the movie industry with the coming of sound in 1927. 
When ﬁlms were silent, the exhibitor was expected to provide musical background and other live acts. With sound ﬁlms, all of this was incorporated in the movie itself, making the receipts less sensitive to the exhibitor’s effort. Hanssen shows that, as we would expect, contracts between ﬁlm companies and exhibitors rapidly moved from ﬂat-fee rentals to the revenue-sharing agreements that now dominate the industry. Finally, when long-term contracts are available, they are effective in providing incentives for noncontractible investment. If incentive provision is costly because of information rents, long-term contracts will be used only when maintenance beneﬁts are large enough. This idea is exploited by Bandiera (2001) in her study of agricultural contracts in nineteenth century Sicily. She ﬁnds that long-term contracts were indeed used for crops requiring higher maintenance efforts. There are still some features of sharecropping contracts that are harder to explain. One of them is that the share of output that goes to the tenant is not as responsive to economic fundamentals as theory predicts it should be. Young and Burke (2001) show that, in their sample of Illinois farms, almost all contracts have the same tenant share for all types of crops, and this share is one-half for 80 percent of the contracts. They argue that such inﬂexible terms are due to local custom: whereas shares do vary across regions, they are almost constant within regions. Young and Burke put this down to fairness concerns.

3.2. Multitasking

Both the managerial pay and the sharecropping literature test traditional versions of the moral hazard model, but more recent variants have also been tested. Slade (1996) tests the multitask agency model of Holmstrom and Milgrom (1991) on contracts between oil firms and their service stations in the Vancouver area. Service stations do not only deliver gasoline, but also may act as convenience stores and/or repair cars. In multitask models, the form of the optimal contract crucially depends on complementarity patterns between tasks: incentives should be lower powered when tasks are more complementary. Slade argues that the convenience store task is likely to be more complementary to the gasoline task than the repairs task. Thus, the theory predicts that service stations that also do repairs should face higher-powered incentives than those that run convenience stores. Slade tests this prediction by running probits for contract type: service station operators may be lessee dealers (with high-powered incentives) or commissioned agents (with low-powered incentives). She finds that, as predicted by the theory, doing repairs increases the probability of running a lessee dealership, while having a convenience store reduces it.

3.3. Incomplete Contracts/Transaction Costs

The formal literature on incomplete contracts is still rather young, and to the best of our knowledge, it has not been submitted yet to econometric testing.27 On the other hand, a number of papers have tested the main intuitions from the transactions cost literature as developed by Williamson (1975, 1985, 1996). We will give only a few examples; the reader can refer to more detailed surveys such as in Shelanski and Klein (1995). Perhaps the best-known result from the transactions cost literature, following Williamson, is that, when relationship-specific investments matter more, contracts will have a longer duration (so as to avoid hold-up problems). This has been tested by Joskow (1987). He studies the relationship between coal suppliers and electric plants that burn coal in the United States in 1979. Williamson distinguishes four types of specificity. Joskow uses three of them to construct testable predictions:
• site specificity: some electric plants are “mine-mouth” (i.e., located close to the coal mine that supplies them)
• physical asset specificity: electric plants are designed to burn a specific type of coal (but not necessarily from a specific supplier); Joskow argues that this consideration matters most in the West, less in the Midwest, and least in the East
• dedicated asset specificity: this holds when large annual quantities are contracted for

27 We will discuss a descriptive study of Kaplan and Strömberg (1999).


Thus, transaction cost theory predicts that contracts should have longer duration when they involve mine-mouth plants, when the ﬁrms are in the West, and when large annual quantities are contracted for. Joskow runs a simple regression of contract duration on the three speciﬁcity variables and ﬁnds that all three hypotheses are robustly validated by the data. Crocker and Masten (1988) also test whether the determinants of contract duration conform to what transactions cost theory predicts, with one interesting twist. This goes back to the difﬁculty for the analyst to know whether actual contracts optimally maintain incentives for efﬁcient adaptation, while minimizing need for costly enforcement. Crocker and Masten argue that sometimes there is external interference from courts or government that makes contract terms deviate from the optimal trade-off in predictable ways, and this can be used by the econometrician. They use the example of natural gas, where wellhead regulation at times imposed price ceilings at the producer level. When such a price ceiling is binding, contracts should stipulate higher damages or take-or-pay rates to protect producers. Then, the contract is less efﬁcient, and the contract duration will be shorter – unless the seller fears that the next renegotiation will lead to much lower prices. Crocker and Masten indeed ﬁnd that when the price ceiling is much lower than the notional price (estimated as the latent variable in a probit model), contracts have a shorter duration. This effect is highly significant and matters a lot: price regulation may have shortened contract duration by half. Crocker and Reynolds (1993) look at the determinants of the degree of contract incompleteness itself. They argue that this results from a trade-off between the ex ante costs of crafting more detailed arrangements and the ex post costs of inefﬁciencies. 
Because the former increase with uncertainty and complexity and the latter increase with the likelihood of opportunistic behavior, one expects that contracts will be less complete when the environment is more uncertain and complex and when opportunistic behavior is less likely. Crocker and Reynolds test these predictions on a sample of U.S. Air Force procurement contracts. They run an ordered probit for the type of the contract on variables that proxy for uncertainty and the reputation of the supplier for opportunistic behavior. Their results support the theoretical prediction. Transactions cost theory also predicts that when quasi-rents are large, sometimes even long-term contracts will not sufﬁce, and vertical integration will take place. A number of papers have tested this prediction and generally found good support for it. An early example is Monteverdi and Teece (1982). They looked at the “make-or-buy” decision in the automobile industry: should components be produced in-house or should they be obtained from outside suppliers? They argued that the answer depends on whether making a particular component involves much or little engineering-speciﬁc knowledge. Then, they ran a probit of the make-or-buy decision on a set of variables that included a measure of engineering-speciﬁc knowledge provided to them by an independent engineer. They found that, as predicted by the theory, components tend to be made in-house when they involve more speciﬁc knowledge.
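Several of the studies above rely on probit or ordered probit models of a discrete contractual choice. As a reminder of how an ordered probit maps a linear index into outcome probabilities, here is a minimal sketch. The index value, the cutpoints, and the interpretation of categories as ordered from least to most complete contract are hypothetical and are not taken from Crocker and Reynolds' estimates.

```python
import math


def normal_cdf(value: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(value / math.sqrt(2.0)))


def ordered_probit_probabilities(index: float, cutpoints: list) -> list:
    """P(y = k) = Phi(c_k - index) - Phi(c_{k-1} - index), with c_0 = -inf, c_K = +inf.

    `index` stands for the linear combination of covariates (x'beta);
    `cutpoints` are the thresholds separating adjacent ordered categories.
    """
    bounds = [-math.inf] + sorted(cutpoints) + [math.inf]
    return [normal_cdf(upper - index) - normal_cdf(lower - index)
            for lower, upper in zip(bounds[:-1], bounds[1:])]


# Three ordered contract types with hypothetical cutpoints at -1 and 1.
baseline = ordered_probit_probabilities(0.0, [-1.0, 1.0])
shifted = ordered_probit_probabilities(1.0, [-1.0, 1.0])
```

Raising the index moves probability mass toward the higher categories, so the sign of an estimated coefficient indicates whether a covariate (for example, a proxy for uncertainty) pushes contracts toward one end of the ordering. An ordinary probit is the special case with a single cutpoint.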


Some less obvious results from transactions cost theory have also been tested. Thus, Crocker and Masten (1991) look at the provisions for adjusting prices during the lifetime of contracts. Some contracts rely on “redetermination provisions”: price adjustment is predetermined through a more or less contingent price adjustment formula. Others emphasize renegotiation provisions, which more or less structure the process of renegotiating prices. Crocker and Masten argue that renegotiations provisions are more useful when the environment is more uncertain or the contract has a longer duration. To test this, they examine a 1982 sample of natural gas contracts in the United States. The observed price adjustment provisions are very diverse, but a probit model for renegotiation vs. redetermination validates the predictions of the theory. Transactions cost theory has also been tested against other theories. For instance, Hubbard and Weiner (1991) use natural gas contracts in the United States in the ﬁfties to examine whether considerations of market power or efﬁcient contracting matter most. Market power is often invoked in this market, because switching contracting parties is difﬁcult and thus there is an element of bilateral monopoly. A linear regression for contract prices (paid by the pipeline to the gas producer) indeed appears to show some evidence for pipeline monopsony power: prices are higher in regions with more pipelines. However, Hubbard and Weiner show that this is due to a spurious correlation: growing markets have more pipelines, but they also exhibit larger quasi-rents. The existence of these quasi-rents motivates the use of a most-favored-nation clause according to which a pipeline that has a contract with producer A and signs a new contract with producer B at a higher price must grant that new price to producer A. 
Because the most-favored-nation clause tends to be associated with higher prices, this generates the positive correlation between prices and the number of pipelines. That correlation thus appears to be due to efﬁcient contracting considerations and not to market power on either side. Most of the empirical tests of transactions cost theory have been implemented on data from relatively thin markets, where quasi-rents are large. An interesting question is whether these intuitions extend to thicker markets. This has been studied by Hubbard (1999) for the trucking industry. This is an industry in which assets are not very speciﬁc, even less so when local markets are thick. Still, there is some variation on how thick local markets are, and transactions cost theory then predicts that spot arrangements should be more likely when the local market is thicker. Hubbard runs an ordered logit on the various contractual forms in the industry that conﬁrms this prediction. It is fair to say that most of the empirical literature has been supportive of the basic ideas of transactions cost theory. Nevertheless, it is hard to feel completely satisﬁed with the methodology of these studies. One ﬁrst problem is a consequence of the somewhat vague character of some of the concepts in the theory: because quasi-rents and uncertainty are such broad categories, it is very difﬁcult to ﬁnd good proxies for them. Besides, it is not always clear what the observability/veriﬁability status of these variables is. Consider uncertainty, for instance; in this literature, it is often proxied by the volatility of a price


index. But, this is certainly veriﬁable information, so one still has to explain why the contract is not made contingent on the value of that price index. A second problem with this literature is that it usually does not control for the possible endogeneity of right-hand-side variables. Consider, for instance, Joskow’s (1987) study. One of the right-hand-side variables is a dummy variable for a mine-mouth location. But, we are not given any evidence on the determinants of the decision to site a plant mine-mouth; and that may certainly depend on unobserved factors that also inﬂuence contract duration, making the mine-mouth variable endogenous in the regression of contract duration. Because Joskow does not attempt to correct for endogeneity or to test for it, the estimates may be biased. A related point is that Joskow does not condition on the fact that these ﬁrms are not vertically integrated,28 whereas the decision to not vertically integrate again may be correlated with contract duration. Clearly, these two points exemplify the endogenous matching problem that we mentioned repeatedly: regressions of contract variables on characteristics of the parties are fraught with selection bias and endogeneity problems. Finally, what does this tell us about the more recent theory of incomplete contracts, as exposited in Hart’s (1995) book? Because many of the underlying ideas started with transactions cost theory, one might think that the relative empirical success of the older theory somehow validates the newer one. However, this would certainly be premature, as argued by Whinston (2000) for theories of vertical integration. One ﬁrst point is that, because incomplete contracts theory is more formalized, it has a much richer set of predictions than transactions cost theory does. By implication, it exposes itself more to empirical refutation. A second point is that testing incomplete contracts theory is bound to be a much more demanding task. 
Although we have argued that transactions cost theory relies on quasi-rents that may be difficult to proxy properly, the situation is even worse for incomplete contracts theory, because predictions rather precisely depend on how the marginal returns to noncontractible investments are distributed among the parties. Measuring these marginal returns reliably enough to test the predictions of the theory will require much more highly detailed information on contracting environments than is usually present in our data sets.29 Of course, one may forgo econometrics for the moment and take a more descriptive look at the data. A first attempt to do this is the work by Kaplan and Strömberg (1999), who analyze a large number of venture capital contracts. The authors argue that venture capitalists (VCs) are real-world entities who most closely approximate the investors of the theory; hence, relating theoretical predictions to real-life VC contracts will provide precious insights about the relevance of theory. Indeed, some of their findings tend to support standard predictions of the incomplete contract literature. Separate allocation of cash flow

28 In a separate paper, Joskow (1985) explores the determinants of vertical integration for this same sample; but what we would want is a joint modeling of contract duration and the decision to integrate vertically.
29 Whinston (2001) and Baker and Hubbard (2001) also discuss this issue.


and control rights is a standard feature of VC contracts. The allocation of rights is contingent on observed measures of financial and nonfinancial performance, especially at early stages of the relationship. Existing contracts are consistent with a basic prediction of the theory, namely that control should be left to the manager in case of success (then the VC keeps cash flow rights only), whereas it shifts to the VC when the firm’s performance is poor. Finally, the importance of noncompete and vesting provisions suggests that imperfect commitment and hold-up problems are indeed an important aspect of VC contracts. However, some theories seem to fare less well than others. “Stealing” theories à la Hart and Moore (1998) or Gale and Hellwig (1982), for instance, rely on the impossibility of making contracts contingent on profits (or other measures of financial performance), an assumption that is not supported by the data. Finally, several problems are left open by the empirical investigation. For instance, existing theories cannot explain why we observe in these contracts that control rights are allocated across a number of dimensions, such as voting rights, board rights, or liquidation rights. Similarly, the variety and the complexity of the financial tools used to allocate rights – convertible securities (with specific strikes), common and preferred stocks, . . . – go well beyond the simple settings (typically, debt vs. equity) considered so far. Finally, some recent studies usefully remind us that there may be more to incomplete contracting than transactions cost theory or the more recent approach. Banerjee and Duflo (1999) focus on the Indian customized software industry, which writes specialized software for (usually) foreign clients. In this industry, the product is very difficult to describe ex ante; the client writes a vague description of what he wants, software firms bid by announcing a price and a time schedule, and the client chooses whom he will contract with.
Much of the process of describing the functions of the software is interactive and takes place after the contract is signed. Therefore, the contracts are highly incomplete and cost overruns are frequent: Three-quarters of the contracts have cost overruns, of 25 percent of planned costs on average. Because the initial description of the software is so vague, it would be impossible for a court to decide in what proportions the overruns are due to the ﬁrm or to the client. In practice, the contracts are often renegotiated in case of cost overruns to increase the price the software ﬁrm is paid. Banerjee and Duﬂo ﬁnd that the client is more generous in these renegotiations when he faces an older ﬁrm, especially if he has already contracted with that ﬁrm in the past. Banerjee and Duﬂo put it down to reputation effects: they argue that older ﬁrms have shown in the past that they were reliable, all the more so if the client has already dealt with them. They show that alternative explanations ﬁt the data less well.30 McMillan and Woodruff (1999) use a survey of private ﬁrms in Vietnam to investigate the determinants of trade credit. Vietnam does not have a reliable

30 In particular, this cannot be due to optimal risk-sharing, because younger firms tend to be smaller than older firms.


legal system, so trust matters a great deal. McMillan and Woodruff indeed find that a firm tends to grant more trade credit to its customers when these have no alternative supplier, when the supplier has more information about the customer’s reliability, and when the supplier belongs to a business or social network that makes information available and/or makes it easier to enforce sanctions. Baker and Hubbard (2000a) investigate the impact on asset ownership of technological changes that modify the contractibility of actions. They consider the U.S. trucking industry, where the introduction, in the late 1980s, of on-board computers (OBCs) allowed contracts to be made contingent on various operating parameters of trucks (speed, etc.). Because of the exogenous enlargement of the space of feasible contracts, suboptimal behavior becomes monitorable, and the need for powerful incentive schemes (such as ownership by drivers) is reduced. Using a survey of the U.S. trucking fleet, they actually find that OBC adoption leads to less driver ownership. Not all OBCs are equal, however: some improve the monitoring of drivers and others improve the coordination of the fleets. Baker and Hubbard (2000b) argue that this distinction is relevant to the make-or-buy decision (whether the shipper should use an internal or an external fleet): equipment that improves monitoring (respectively coordination) should lead to more (respectively less) integration. Using the same survey, they find supporting evidence for this prediction.

3.4. Dynamics of Contracts

Finally, a few papers have tried to take the predictions of dynamic contract theory to data. This is a difficult task, if only because the theory is often inconclusive or relies on very strong assumptions that are difficult to maintain within an applied framework.31 Still, interesting insights have emerged from this line of work. Three types of models have been considered in the literature. One is the pure model of repeated adverse selection; a second one considers repeated moral hazard; finally, a couple of papers have recently been devoted to empirical testing of models entailing symmetric learning.

3.4.1. Dynamic Models of Asymmetric Information

An important contribution is due to Dionne and Doherty (1994), whose model of repeated adverse selection with one-sided commitment transposes previous

31 For instance, most papers in the field assume that agents cannot freely save or borrow, so that the dynamics of their consumption can be fully monitored by the principal (whether the latter is an employer, a landlord, or an insurance company). When this assumption is relaxed, the models typically use very specific preferences (such as constant absolute risk aversion with monetary cost of effort) to guarantee that income effects do not matter. For a detailed discussion in a moral hazard context, see Chiappori et al. (1994).


work by Laffont and Tirole (1990) to a competitive framework. The key testable prediction is that, in a repeated adverse selection framework of this kind, whenever commitment is possible for the insurer, then optimal contracts entail experience rating and exhibit a “highballing” property (i.e., the insurance company makes positive proﬁts in the ﬁrst period, compensated by low, below-cost second-period prices). Dionne and Doherty test this property on Californian automobile insurance data. According to the theory, when contracts with and without commitment (from the insurer) are simultaneously available, contracts entailing commitments will typically attract low-risk agents. The presence of highballing is empirically characterized by the fact that the loss to premium ratio should rise with the cohort age. If insurance companies are classiﬁed according to their average loss per vehicle (which reﬂects the “quality” of their portfolio of insurees), one expects the premium growth to be negative for the best-quality portfolios; in addition, the corresponding slope should be larger for ﬁrms with higher average loss ratios. This prediction is conﬁrmed by the data. Insurance companies are classiﬁed into three subgroups. The slope coefﬁcient is negative and signiﬁcant for the ﬁrst group (with lowest average loss), positive and signiﬁcant for the third group, and nonsigniﬁcant for the intermediate group. Dionne and Doherty conclude that the “highballing” prediction is not rejected. In a recent contribution, Margiotta and Miller (2000) analyze a dynamic model of managerial compensation under moral hazard. Their framework is reminiscent of that introduced by Fudenberg, Holmstrom, and Milgrom (1990): the manager’s utility function exhibits constant absolute risk aversion, so that wealth effects do not make the analysis untractable. They estimate the model from longitudinal data on returns to ﬁrms and managerial compensations. 
Obviously, the dynamic nature of the data introduces more robustness into the estimations, compared with simple cross-sectional analysis. In particular, it allows mitigation of an obvious selection problem with cross-sectional data: the level of incentives provided by the manager’s contract should be endogenous to the firm’s situation, and the latter may impact the outcome in a nonobservable way. The conclusions drawn by Margiotta and Miller are particularly interesting in view of the Jensen–Murphy controversy. They find that, although the benefits of providing incentives are large, the costs are small, in the sense that even a relatively small fraction of the firm’s shares is generally sufficient to induce the required level of effort.

3.4.2. Symmetric Learning

Finally, several works test a model of symmetric but incomplete information and learning. The basic reference here is the labor contract paper by Harris and Holmstrom (1982), in which the employer and the employee have identical priors about the employee’s ability and learn at the same pace from the employee’s performance. This setting has been applied with success to labor contracts, but also to long-term insurance relationships.

Testing Contract Theory

An application to internal labor markets is proposed by Chiappori, Salanié, and Valentin (1999). Their model borrows the two main ingredients of the Harris and Holmstrom framework, namely symmetric learning and downward rigidity of wages (the latter being explained either by risk-sharing considerations as in the initial model or by hold-up problems and contractual incompleteness). They show that optimal contracts should then exhibit a “late beginner” effect: if two agents, A and B, are at the same wage level at date 0 and at date 2, but A’s wage at date 1 was higher, then B has better future prospects for date 3 and later. They test this prediction on data on contracts and careers within a French public firm. Interestingly enough, careers, in this context, must be analyzed as sequences of discrete promotions – a feature that requires specific econometric tools. The results very strongly confirm the predictions: the “late beginner” effect seems like a crucial feature of careers in the context under consideration.

Recently, the same type of model has been applied to life insurance contracts by Hendel and Lizzeri (2000). They exploit an interesting database of contracts that includes information on the entire profile of future premiums. Some contracts involve commitment from the insurer, in the sense that the evolution of premiums will not be contingent on the insuree’s health status, whereas under the other contracts future premiums are increased if the insuree’s health condition deteriorates. According to the theory, commitment implies front loading (initial premiums should be higher than without commitment, because they include an insurance premium against the reclassification risk) and a lower lapsation rate (a fraction of the agents whose health has actually deteriorated would be strictly worse off if they were to change company). These predictions are satisfied by existing contracts. Even more interesting is the fact that this confirmation obtains only for general life insurance. Accidental death contracts exhibit none of these features, as one would expect, given that learning considerations are much less prominent.

Finally, in such a context, any friction that limits the agent’s mobility between contracts is welfare-improving, because the precommitment of insurees to stay in the pool helps mitigate the uninsurability of the reclassification risk. This idea is exploited by Crocker and Moran (1997) in a study of employer-provided health insurance contracts, for which precommitment is proxied by the difficulty for workers of switching jobs. They show that when employers must offer the same contract to all of their workers, the optimal contract exhibits a coverage limitation that is inversely proportional to the degree of employee job lock. If, on the other hand, employers are able to offer multiple contracts that experience-rate the insurees, then the optimal contract exhibits full coverage of medical expenditures, albeit at second-period premiums that partially reflect each individual’s observable health status. Crocker and Moran confirm these predictions on data with insurance coverages using proxies for job lock: the insurance contracts associated with firms that offer a single policy exhibit coverage limitations that are decreasing in the amount of employee job lock, and those firms offering multiple plans to their workforce have higher levels of coverage that are insensitive to the degree of job lock.
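The front-loading logic behind the life insurance test discussed above can be illustrated with a stylized two-period calculation. The sketch below is not the Hendel and Lizzeri model: the function name, the parameter values, the absence of discounting, and the assumption that the committed second-period premium equals the healthy type's fair price are all hypothetical choices made for illustration.

```python
def premiums(p1, p_good, p_bad, share_bad, benefit):
    """Stylized two-period life insurance premiums (illustrative assumptions only).

    p1        : death probability in period 1
    p_good    : period-2 death probability if health stays good
    p_bad     : period-2 death probability if health deteriorates
    share_bad : probability that health deteriorates
    benefit   : face value paid on death
    """
    # Spot (no-commitment) contracts are actuarially fair period by period,
    # so the period-2 premium depends on the revealed health status.
    spot_first = p1 * benefit
    spot_second = {"good": p_good * benefit, "bad": p_bad * benefit}

    # Commitment contract (assumption): the period-2 premium is the same for
    # everyone and equals the healthy type's fair price, so healthy insurees
    # have no reason to lapse.
    committed_second = p_good * benefit

    # The insurer's expected period-2 shortfall on deteriorated risks is
    # prepaid in period 1, weighted by the probability of surviving period 1.
    front_load = (1 - p1) * share_bad * (p_bad - p_good) * benefit
    committed_first = spot_first + front_load
    return spot_first, spot_second, committed_first, committed_second


# Hypothetical parameter values, chosen only to make the comparison visible.
spot_1, spot_2, comm_1, comm_2 = premiums(
    p1=0.01, p_good=0.01, p_bad=0.05, share_bad=0.2, benefit=100_000
)
print("first-period premium, spot vs commitment:", spot_1, comm_1)
print("second-period premium, spot vs commitment:", spot_2, comm_2)
```

Under these assumptions the commitment contract charges more in the first period and no more than the spot price in either health state in the second period. An insuree whose health has deteriorated therefore loses by switching to a spot contract, which is the mechanism behind the lower lapsation rate.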

Chiappori and Salanié

4. CONCLUSIONS

“Data! data! data!” he cried impatiently. “I can’t make bricks without clay.”
Arthur Conan Doyle, The Adventure of the Copper Beeches.

We hope this survey has shown that the econometrics of contracts is a very promising and burgeoning ﬁeld. Although empirical testing of the theory of contracts started in the eighties, most of the papers we have surveyed were indeed written in the last 5 years. For a long time, econometricians could be heard echoing Sherlock Holmes’s complaint about lack of data on contracts. It is true that some researchers have gone far to ﬁnd their data [as far as Renaissance Tuscany for Ackerberg and Botticini (2002)]. Still, it has proven much less difﬁcult than expected to ﬁnd data amenable to econometric techniques. In fact, we draw the impression from Bresnahan’s (1997) earlier World Congress survey that the situation is somewhat worse in industrial organization. It is still true that many papers in this ﬁeld use similar data and/or focus on similar problems, as shown by the number of papers on sharecropping or natural gas we surveyed. We would certainly want to see wider-ranging empirical work in the future. Insurance data are very promising in that respect, because they are fairly standardized, come in large data sets, and can be used to test many different theories. It can also be hoped that, in the future, ﬁrms will be less averse to opening their personnel data to researchers, as they did to Baker, Gibbs, and Holmstrom (1994a, 1994b). Our conclusion on the importance of incentive effects echoes that of Prendergast (1999) for incentives in ﬁrms: the recent literature, as surveyed in Section 2, provides very strong evidence that contractual forms have large effects on behavior. As the notion that “incentives matter” is one of the central tenets of economists of every persuasion, this should be comforting to the community. On the other hand, it raises an old puzzle: if contractual form matters so much, why do we observe such a prevalence of fairly simple contracts? 
More generally, the question asked in Section 3 is whether observed contracts take the form predicted by the theory. As we have seen, the evidence is more mixed in that regard. However, it is reassuring to see that papers that control adequately for selection and endogeneity bias have generally been more supportive of the theory.

Throughout this survey, we emphasized the crucial role of the selection, matching, and contract endogeneity issues. These problems are prevalent in the two approaches we distinguish (i.e., whether one is testing for the optimality of contracts or for the behavioral impact of given contractual forms). It can be argued that selection issues are probably even more difficult to address in the first case, because our theoretical understanding of situations involving “realistic” forms of unobserved heterogeneity is often very incomplete. To take but one example, Rothschild and Stiglitz’s (1976) celebrated model of insurance under adverse selection assumes identical preferences across agents. Should risk aversion differ across insurees as well, then the shape of the equilibrium contract is not fully known for the moment.³² It is safe, however, to predict that where the theory cannot be reconciled with the facts, new and improved models will emerge. Thus we hope that some econometricians will be inspired by this survey to contribute to the growing literature on testing of contract theory, while negative empirical findings may prompt some theorists to improve the theory itself. As an example of this potentially fruitful dialog between theorists and econometricians, the empirical findings by Chiappori and Salanié (1997, 2000) and others that the standard models of insurance do not fit the data well in some insurance markets have led Chassagnon and Chiappori (1997), Jullien, Salanié, and Salanié (2000), and de Meza and Webb (2001) to propose new models of insurance that are based on a combination of moral hazard and adverse selection. Similarly, new tools have recently been developed that allow tackling the possible coexistence of several types of unobserved heterogeneity.³³ We hope to see more of this interplay between theory and testing in the future.

ACKNOWLEDGMENTS

We thank our discussant Patrick Legros and Jeff Campbell, Pierre Dubois, Philippe Gagnepain, Lars Hansen, Jim Heckman, Bruce Shearer, Steve Tadelis, and Rob Townsend for their comments. This paper was written while Salanié was visiting the University of Chicago, which he thanks for its hospitality.

³² See Landsberger and Meilijson (1999).
³³ See Rochet and Stole in this volume (pp. 150–197).

References

Ackerberg, D. and M. Botticini (2002), “Endogenous Matching and the Empirical Determinants of Contract Form,” Journal of Political Economy, 110(3), 564–91. Aggarwal, R. and A. Samwick (1999), “The Other Side of the Trade-off: The Impact of Risk on Executive Compensation,” Journal of Political Economy, 107, 65–105. Akerlof, G. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” Quarterly Journal of Economics, 84, 488–500. Allen, D. W. and D. Lueck (1992), “Contract Choice in Modern Agriculture: Cash Rent Versus Cropshare,” Journal of Law and Economics, 35, 397–426. Allen, D. W. and D. Lueck (1993), “Transaction Costs and the Design of Cropshare Contracts,” Rand Journal of Economics, 24(1), 78–100. Allen, D. W. and D. Lueck (1998), “The Nature of the Farm,” Journal of Law and Economics, 41, 343–386. Allen, D. W. and D. Lueck (1999), “The Role of Risk in Contract Choice,” Journal of Law, Economics and Organization, 15(3), 704–736. Ausubel, L. (1999), “Adverse Selection in the Credit Card Market,” mimeo, University of Maryland.

Bach, K. (1998), Negativauslese und Tarifdifferenzierung im Versicherungs-sektor. DUV, Schesslitz. Baker, G., M. Gibbs, and B. Holmstrom (1994a), “The Internal Economics of the Firm: Evidence from Personnel Data,” Quarterly Journal of Economics, 109, 881– 919. Baker, G., M. Gibbs, and B. Holmstrom (1994b), “The Wage Policy of a Firm,” Quarterly Journal of Economics, 109, 921–955. Baker, G. and T. Hubbard (2000a), “Contractibility and Asset Ownership: On-Board Computers and Governance in U.S. Trucking,” NBER Working Paper 7634. Baker, G. and T. Hubbard (2000b), “Make vs. Buy in Trucking: Asset Ownership, Job Design, and Information,” mimeo, Harvard University. Baker, G. and T. Hubbard (2001), “Empirical Strategies in Contract Economics: Information and the Boundary of the Firm,” American Economic Review, 91, 189–194. Bandiera, O. (2001), “On the Structure of, Tenancy Contracts: Theory and Evidence from 19th Century Rural Sicily,” CEPR Working Paper 3032. Banerjee, A. and E. Duﬂo (1999), “Reputation Effects and the Limits of Contracting: A Study of the Indian Software Industry,” mimeo, MIT. Banerjee, A., P. Gertler, and M. Ghatak (2002), “Empowerment and Efﬁciency: Tenancy Reform in West Bengal,” Journal of Political Economy, 110, 239–280. Bertrand, M. and S. Mullainathan (2001), “Are CEOs Rewarded for Luck? The Ones without Principals are,” Quarterly Journal of Economics, 116, 901–932. Bolduc, D., B. Fortin, F. Labrecque, and P. Lanoie (1997), “Workers’ Compensation, Moral Hazard and the Composition of Workplace Injuries,” mimeo, HEC, Montreal. Boyer, M. and G. Dionne (1989), “An Empirical Analysis of Moral Hazard and Experience Rating,” Review of Economics and Statistics, 71, 128–134. Bresnahan, T. (1997), “Testing and Measurement in Competition Models,” in Advances in Economics and Econometrics–Theory and Applications, Volume 3, (ed. by D. Kreps and K. Wallis), Econometric Society Monographs, 28, Cambridge University Press, pp. 61–81. Browne, M., and R. 
Puelz (1999), “The Effect of Legal Rules on the Value of Economic and Non-Economic Damages and the Decision to File,” Journal of Risk and Uncertainty, 18, 189–213. Brugiavini, A. (1993), “Uncertainty Resolution and the Timing of Annuity Purchase,” Journal of Public Economics, 50, 31–62. Cardon, J. and I. Hendel (2001), “Asymmetric Information in Health Insurance: Evidence from the National Health Expenditure Survey,” Rand Journal of Economics, 32, 408– 427. Cawley, J. and T. Philipson (1999), “An Empirical Examination of Information Barriers to Trade in Insurance,” American Economic Review, 89, 827–846. Chevalier, J. and G. Ellison (1997), “Risk Taking by Mutual Funds as a Response to Incentives,” Journal of Political Economy, 105, 1167–1200. Chevalier, J. and G. Ellison (1999), “Career Concerns of Mutual Fund Managers,” Quarterly Journal of Economics, 114, 389–432. Chassagnon, A. and P. A. Chiappori (1997), “Insurance under Moral Hazard and Adverse Selection: The Competitive Case,” mimeo, DELTA. Chiappori, P. A. (2000), “Econometric Models of Insurance under Asymmetric Information,” Handbook of Insurance, (ed. by G. Dionne), Amsterdam: North-Holland. Chiappori, P. A., J. Abbring, J. Heckman, and J. Pinquet (2001), “Testing for Adverse Selection Versus Moral Hazard from Dynamic Data,” mimeo, University of Chicago.

Chiappori, P. A., F. Durand, and P. Y. Geoffard (1998), “Moral Hazard and the Demand for Physician Services: First Lessons from a French Natural Experiment,” European Economic Review, 42, 499–511. Chiappori, P. A., P. Y. Geoffard, and E. Kyriazidou (2000), “Cost of Time, Moral Hazard, and the Demand for Physician Services,” mimeo, University of Chicago. Chiappori, P. A., I. Macho, P. Rey, and B. Salanié (1994), “Repeated Moral Hazard: The Role of Memory and Commitment, and the Access to Credit Markets,” European Economic Review, 38, 1527–1553. Chiappori, P. A. and B. Salanié (1997), “Empirical Contract Theory: The Case of Insurance Data,” European Economic Review, 41, 943–950. Chiappori, P. A. and B. Salanié (2000), “Testing for Asymmetric Information in Insurance Markets,” Journal of Political Economy, 108, 56–78. Chiappori, P. A., B. Salanié, and J. Valentin (1999), “Early Starters versus Late Beginners,” Journal of Political Economy, 107, 731–760. Cochrane, J. (1991), “A Simple Test of Consumption Insurance,” Journal of Political Economy, 99, 957–976. Core, J. and W. Guay (2000), “The Other Side of the Trade-off: The Impact of Risk on Executive Compensation: a Comment,” mimeo, Wharton School. Crocker, K. and S. Masten (1988), “Mitigating Contractual Hazards: Unilateral Options and Contract Length,” Rand Journal of Economics, 19, 327–343. Crocker, K. and S. Masten (1991), “Pretia Ex Machina? Prices and Process in Long-Term Contracts,” Journal of Law and Economics, 34, 69–99. Crocker, K. and J. Moran (1997), “Commitment and the Design of Optimal Agreements: Evidence from Employment-Based Health Insurance Contract,” mimeo, University of Michigan. Crocker, K. and K. Reynolds (1993), “The Efficiency of Incomplete Contracts: An Empirical Analysis of Air Force Engine Procurement,” Rand Journal of Economics, 24, 126–146. Dahlby, B. (1983), “Adverse Selection and Statistical Discrimination: An Analysis of Canadian Automobile Insurance,” Journal of Public Economics, 20, 121–130. de Meza, D. and D. Webb (2001), “Advantageous Selection in Insurance Markets,” Rand Journal of Economics, 32, 249–262. Dionne, G. and N. Doherty (1994), “Adverse Selection, Commitment and Renegotiation: Extension to and Evidence from Insurance Markets,” Journal of Political Economy, 102(2), 210–235. Dionne, G. and R. Gagné (2001), “Deductible Contracts against Fraudulent Claims: Evidence from Automobile Insurance,” Review of Economics and Statistics, 83, 290–301. Dionne, G., C. Gouriéroux, and C. Vanasse (2001), “Testing for Adverse Selection in the Automobile Insurance Market: A Comment,” Journal of Political Economy, 109, 444–453. Dionne, G. and P. St-Michel (1991), “Worker’s Compensation and Moral Hazard,” Review of Economics and Statistics, 73, 236–244. Dionne, G. and C. Vanasse (1996), “Une évaluation empirique de la nouvelle tarification de l’assurance automobile au Québec,” mimeo, Montreal. Dubois, P. (1999), “Moral Hazard, Land Fertility, and Sharecropping in a Rural Area of the Philippines,” CREST Working Paper 9930. Dubois, P. (2000a), “Assurance complète, hétérogénéité des préférences et métayage au Pakistan,” Annales d’Économie et de Statistiques, 59, 1–36.

Dubois, P. (2000b), “Consumption Insurance with Heterogeneous Preferences: Can Sharecropping Help Complete Markets?,” mimeo, INRA, Toulouse. Ferrall, C. and B. Shearer (1999), “Incentives and Transactions Cost within the Firm: Estimating an Agency Model Using Payroll Records,” Review of Economic Studies, 66, 309–338. Finkelstein, A. and J. Poterba (2000), “Adverse Selection in Insurance Markets: Policyholder Evidence from the U.K. Annuity Market,” NBER Working Paper W8045. Fortin, B. and P. Lanoie (1992), “Substitution between Unemployment Insurance and Workers’ Compensation,” Journal of Public Economics, 49, 287–312. Fortin, B. and P. Lanoie (1998), “Effects of Workers’ Compensation: A Survey,” CIRANO Scientific Series, Montréal, 98s–104s. Fortin, B., P. Lanoie, and C. Laporte (1995), “Is Workers’ Compensation Disguised Unemployment Insurance?” CIRANO Scientific Series, Montréal, 95s-148s. Friedman, B. M. and M. J. Warshawski (1990), “The Cost of Annuities: Implications for Savings Behavior and Bequests,” Quarterly Journal of Economics, 105, 135–154. Fudenberg, D., B. Holmstrom, and P. Milgrom (1990), “Short-Term Contracts and Long-Term Agency Relationships,” Journal of Economic Theory, 51, 1–31. Gagnepain, P. and M. Ivaldi (2001), “Incentive Regulatory Policies: The Case of Public Transit Systems in France,” mimeo. Gale, D. and M. Hellwig (1985), “Incentive-Compatible Debt Contracts: The One-Period Problem,” Review of Economic Studies, 52, 647–663. Gibbons, R. and K. Murphy (1990), “Relative Performance Evaluation of Chief Executive Officers,” Industrial and Labor Relations Review, 43, S30–S51. Gibbons, R. and M. Waldman (1999), “Careers in Organizations: Theory and Evidence,” in Handbook of Labor Economics, Volume 3b (ed. by O. Ashenfelter and D. Card), Amsterdam: North-Holland, 2373–2437. Gouriéroux, C. (1999), “The Econometrics of Risk Classification in Insurance,” Geneva Papers on Risk and Insurance Theory, 24, 119–139. Gouriéroux, C., A. Monfort, and A.
Trognon (1984), “Pseudo-Maximum Likelihood Methods: Theory,” Econometrica, 52, 681–700. Hall, J. and J. Liebman (1998), “Are CEOs Really Paid Like Bureaucrats?” Quarterly Journal of Economics, 113, 653–691. Hanssen, R. (2001), “The Effect of a Technological Shock on Contract Form: Revenue Sharing in Movie Exhibition and the Coming of Sound,” mimeo. Harris, M. and B. Holmstrom (1982), “A Theory of Wage Dynamics,” Review of Economic Studies, 49, 315–333. Hart, O. (1995), Firms, Contracts, and Financial Structure. London: Oxford University Press. Hart, O. and J. Tirole (1988), “Contract Renegotiation and Coasian Dynamics,” Review of Economic Studies, 55, 509–540. Haubrich, J. (1994), “Risk Aversion, Performance Pay, and the Principal-Agent Model,” Journal of Political Economy, 102, 258–276. Hendel, I. and A. Lizzeri (2000), “The Role of Commitment in Dynamic Contracts: Evidence from Life Insurance,” Working Paper, Princeton University. Holly, A., L. Gardiol, G. Domenighetti, and B. Bisig (1998), “An Econometric Model of Health Care Utilization and Health Insurance in Switzerland,” European Economic Review, 42, 513–522. Holmstrom, B. and P. Milgrom (1991), “Multitask Principal-Agent Analyses: Incentive

Contracts, Asset Ownership and Job Design,” Journal of Law, Economics and Organization, 7, 24–51. Hubbard, T. (1999), “How Wide Is the Scope of Hold-Up-Based Theories? Contractual Form and Market Thickness in Trucking,” mimeo, UCLA. Hubbard, R. and R. Weiner (1991), “Efficient Contracting and Market Power: Evidence from the U.S. Natural Gas Industry,” Journal of Law and Economics, 34, 25–67. Ivaldi, M. and D. Martimort (1994), “Competition under Nonlinear Pricing,” Annales d’Economie et de Statistiques, 34, 72–114. Jensen, M. and K. Murphy (1990), “Performance Pay and Top-Management Incentives,” Journal of Political Economy, 98, 225–264. Joskow, P. (1985), “Vertical Integration and Long-Term Contracts: The Case of Coal-Burning Electric Generation Plants,” Journal of Law, Economics, and Organization, 1, 33–80. Joskow, P. (1987), “Contract Duration and Relationship-Specific Investments: Empirical Evidence from Coal Markets,” American Economic Review, 77, 168–185. Jullien, B., B. Salanié, and F. Salanié (2000), “Screening Risk-Averse Agents under Moral Hazard,” mimeo. Kaplan, S. and P. Strömberg (1999), “Financial Contracting Theory Meets the Real World: Evidence from Venture Capital Contracts,” mimeo. Laffont, J.-J. (1997), “Game Theory and Empirical Economics: The Case of Auction Data,” European Economic Review, 41, 1–35. Laffont, J.-J. and M. Matoussi (1995), “Moral Hazard, Financial Constraints and Sharecropping in El Oulja,” Review of Economic Studies, 62, 381–399. Laffont, J.-J. and J. Tirole (1990), “Adverse Selection and Renegotiation in Procurement,” Review of Economic Studies, 57(4), 597–625. Landsberger, M. and I. Meilijson (1999), “A General Model of Insurance under Adverse Selection,” Economic Theory, 14, 331–352. Lavergne, P. and A. Thomas (2000), “Semiparametric Estimation and Testing in a Model of Environmental Regulation with Adverse Selection,” mimeo, INSEE, Paris. Lazear, E.
(2000), “Performance Pay and Productivity,” American Economic Review, 90, 1346–1361. Lemmon, M., J. Schallheim, and J. Zender (2000), “Do Incentives Matter? Managerial Contracts for Dual-Purpose Funds,” Journal of Political Economy, 108, 273– 299. MacLeod, B., and D. Parent (1999), “Job Characteristics and the Form of Compensation,” Research in Labor Economics, 18, 177–242. Manning, W., J. Newhouse, N. Duan, E. Keeler, and A. Leibowitz (1987), “Health Insurance and the Demand for Medical Care: Evidence from a Randomized Experiment,” American Economic Review, 77, 251–277. Margiotta, M. and R. Miller (2000), “Managerial Compensation and the Cost of Moral Hazard,” International Economic Review, 41, 669–719. Masten, S. and K. Crocker (1985), “Efﬁcient Adaptation in Long-Term Contracts: Take-or-Pay Provisions for Natural Gas,” American Economic Review, 75, 1083– 1093. McMillan, J. and C. Woodruff (1999), “Interﬁrm Relationships and Informal Credit in Vietnam,” Quarterly Journal of Economics, 114, 1285–1320. Monteverdi, K. and D. Teece (1982), “Supplier Switching Costs and Vertical Integration in the Automobile Industry,” Bell Journal of Economics, 13, 206–213.

Murphy, K. (1999), “Executive Compensation,” in Handbook of Labor Economics, Vol. 3, (ed. by O. Ashenfelter and D. Card), Amsterdam: North-Holland. Oyer, P. (1998), “Fiscal Year End and Nonlinear Incentive Contracts: The Effect on Business Seasonality,” Quarterly Journal of Economics, 113, 149–185. Paarsch, H. and B. Shearer (1999), “The Response of Worker Effort to Piece Rates: Evidence from the British Columbia Tree-Planting Industry,” Journal of Human Resources, 34, 643–667. Paarsch, H. and B. Shearer (2000), “Piece Rates, Fixed Wages, and Incentive Effects: Statistical Evidence from Payroll Records,” International Economic Review, 41, 59– 92. Prendergast, C. (1999), “The Provision of Incentives in Firms,” Journal of Economic Literature, 37, 7–63. Puelz, R. and A. Snow (1994), “Evidence on Adverse Selection: Equilibrium Signaling and Cross-Subsidization in the Insurance Market,” Journal of Political Economy, 102, 236–257. Richaudeau, D. (1999), “Automobile Insurance Contracts and Risk of Accident: An Empirical Test Using French Individual Data,” Geneva Papers on Risk and Insurance Theory, 24, 97–114. Rosen, S. (1992), “Contracts and the Market for Executives,” in Contract Economics, (ed. by L. Werin and H. Wijkander), Oxford, UK: Basil Blackwell. Rothschild, M. and J. Stiglitz (1976), “Equilibrium in Competitive Insurance Markets,” Quarterly Journal of Economics, 90, 629–649. Shearer, B. (1999), “Piece Rates, Fixed Wages and Incentives: Evidence from a Field Experiment,” mimeo, Laval University. Shelanski, H. and P. Klein (1995), “Empirical Research in Transaction Cost Economics: A Review and Assessment,” Journal of Law, Economics and Organization, 11, 335– 361. Slade, M. (1996), “Multitask Agency and Contract Choice: An Empirical Exploration,” International Economic Review, 37, 465–486. Stiglitz, J. (1974), “Incentives and Risk Sharing in Sharecropping,” Review of Economic Studies, 41, 219–255. Stiglitz, J. and A. 
Weiss (1981), “Credit Rationing in Markets with Imperfect Information,” American Economic Review, 71, 393–410. Timmins, C. (2002), “Measuring the Dynamic Efﬁciency Costs of Regulators’ Preferences: Municipal Water Utilities in the Arid West,” Econometrica 70(2), 603–29. Toivanen, O. and R. Cressy (1998), “Is There Adverse Selection on the Credit Market?” mimeo, Warwick. Townsend, R. (1994), “Risk and Insurance in Village India,” Econometrica, 62, 539–591. Vuong, Q. (1989), “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses,” Econometrica, 57, 307–334. Whinston, M. (2000), “On the Transaction Cost Determinants of Vertical Integration,” mimeo, Northwestern University. Whinston, M. (2001), “Assessing the Property Rights and Transaction Cost Theories of Firm Scope,” American Economic Review, 91, 184–188. Williamson, O. (1975), Markets and Hierarchies: Analysis and Antitrust Implications. New York: The Free Press. Williamson, O. (1985), The Economic Institutions of Capitalism: Firms, Markets, Relational Contracting. New York: The Free Press.

Williamson, O. (1996), The Mechanisms of Governance. London: Oxford University Press. Wolak, F. (1994), “An Econometric Analysis of the Asymmetric Information, Regulator-Utility Interaction,” Annales d’Economie et de Statistiques, 34, 13–69. Young, P. and M. Burke (2001), “Competition and Custom in Economic Contracts: A Case Study of Illinois Agriculture,” American Economic Review, 91, 559–573.

CHAPTER 5

The Economics of Multidimensional Screening

Jean-Charles Rochet and Lars A. Stole

1. MOTIVATION AND INTRODUCTION

Since the late 1970s, the theory of optimal screening contracts has received considerable attention. The analysis has been usefully applied to such topics as optimal taxation, public good provision, nonlinear pricing, imperfect competition in differentiated industries, regulation with information asymmetries, government procurement, and auctions, to name a few prominent examples.¹ The majority of these applications have made the assumption that preferences can be ordered by a single dimension of private information, largely to facilitate finding the optimal solution of the design problem. However, in most cases that we can think of, a multidimensional preference parameterization seems critical to capturing the basic economics of the environment.

¹ Among the seminal contributions, we can cite Mirrlees (1971, 1976) for optimal taxation, Green and Laffont (1977) for public good provision, Spence (1980) and Goldman, Leland, and Sibley (1984) for nonlinear pricing, Mussa and Rosen (1978) for imperfect competition in differentiated industries, Baron and Myerson (1982), Baron and Besanko (1984), McAfee and McMillan (1987), and Laffont and Tirole (1986, 1993) for regulation, and Myerson (1981) for auctions.

For example, consider the case of duopolists in a market where each firm competes with nonlinear pricing over its product line. In many examples of nonlinear pricing (e.g., Mussa and Rosen 1978 and Maskin and Riley 1984), it is natural to think of consumers’ preferences being ordered by the willingness to pay for additional units of quantity or quality. But, if we believe that competition between duopolists is imperfect in the horizontal dimension as suggested, for example, by models such as Hotelling’s (1929), then we need to introduce a form of horizontal heterogeneity as well. As a consequence, a minimally accurate model of imperfect competition between duopolists suggests including two dimensions of heterogeneity – vertical and horizontal.

There are several additional economic applications that naturally lend themselves to multidimensional heterogeneity.

• General models of pricing. In some instances, a firm may offer a single product over which the preferences of the consumer may depend importantly on several dimensions of uncertainty (e.g., tastes, marginal utility of income, etc.). In other instances, a firm may be selling an array of distinct products, of which consumers may desire any subset of the total bundle of goods. In this latter case, the dimension of heterogeneity of consumers’ preferences for the firm’s products will be at least as large as the number of distinct products.

• Regulation under richer asymmetries of information. As noted in the seminal article by Baron and Myerson (1982) on regulation under private information, at least two dimensions of private cost information naturally arise – fixed and marginal costs. Another example is studied by Lewis and Sappington (1988) in which the regulator is simultaneously uncertain about cost and demand. If we wish to take the normative consequences of asymmetric information models of regulation seriously, we should check the robustness of the results to such reasonable bidimensional private information.

• Income effects and related phenomena. Many times it makes sense to think of two-dimensional information when privately known budget constraints or other forms of limited liability are present. For example, how should a seller design a price schedule when customers have random valuations and simultaneously random budget constraints?

• Auctions. Similar to the aforementioned problem, we may suppose that multiple buyers bid for a single item, but their preferences depend on a privately known budget constraint in addition to a private valuation for the good (as in Che and Gale, 1998, 1999, 2000). Or in another important auction setting, suppose (as in Jehiel, Moldovanu, and Stacchetti 1999) that a buyer’s preferences depend not only on his own valuation of the good, but also on the privately known externality from someone else getting the good instead (e.g., two downstream firms bid for an exclusive franchise and the loser must compete against the winner with an inferior product).
Although in this paper we do not consider the auction literature in depth, the techniques of optimal contract design in multidimensional environments are clearly relevant.²

² Other multidimensional auction problems are studied by Gal, Landsberger, and Nemirovski (1999) and Zheng (2000).

Unfortunately, the techniques for confronting multidimensional settings are far less straightforward than in the one-dimensional paradigm. This difficulty has meant that the bulk of applied theory papers in the self-selection literature are based on one-dimensional models of heterogeneity. As a consequence, the results of these economic applications remain uncomfortably restrictive and possibly inaccurate (or at least nonrobust) in their conclusions. In this sense, we have been searching under the proverbial street lamp, looking for our lost keys, not because that is where we believe them to lie, but because it is apparently the only place where we can see. This survey is an attempt to catalog and explain the terrain that has been discovered in the brief forays away from the one-dimensional street lamp – indicating both what we have learned and how light or dark the night sky actually is.

In Section 2, we review the one-dimensional paradigm, emphasizing those aspects that will generate problems as we extend the analysis to multiple dimensions. In Section 3, the general multidimensional paradigm is explained for both the discrete and continuous settings. We illustrate the concepts in a simple two-type “multidimensional” model, explaining how the multidimensionality of types introduces new economic and mathematical aspects of the screening problem. In Sections 4–9, we specialize our discussion to specific classes of multidimensional models that have proven successful in the applied literature. Section 4 presents results on separation and aggregation that greatly simplify multidimensional screening. Section 5 considers environments in which there is a single, nonmonetary contracting variable, but multiple dimensions of type – a scenario that also frequently gives rise to explicit solutions. Section 6 looks at a further specialized subset of models (from Section 5) that are economically important and mathematically tractable: bidimensional private information settings in which one dimension of information enters the agent’s utility function additively. Section 7 considers a series of multidimensional models that have been successfully applied to competitive environments. Section 8 considers a distinct set of multidimensional environments in which information is revealed over time. Finally, Section 9 considers the more subtle problems inherent in general models of multiple instruments and multidimensional preferences; here, most papers written to date have considered the scenario of multiproduct monopoly bundling, so we study this model in some detail. Section 10 concludes.

2. A REVIEW OF THE ONE-DIMENSIONAL PREFERENCE MODEL

Although it is often recognized that agents typically have several characteristics and that principals typically have several instruments, the screening problem has most of the time been examined under the assumption of a single characteristic and a single instrument (in addition to monetary transfers). In this case, several qualitative results can be obtained with some generality:

1. When the single-crossing condition is satisfied, only local (first- and second-order) incentive compatibility constraints can be binding.
2. In most problems, the second-order (local) incentive compatibility constraints can be ignored, provided that the distribution of types is not too irregular.
3. If bunching is ruled out, then the principal’s optimal mechanism is found in two steps:
   (a) First, compute the minimum expected rent of the agent as a function of the allocation of (nonmonetary) goods.

Multidimensional Screening


(b) Second, find the allocation of goods that maximizes the surplus of the principal, net of the expected rent computed in (a).

To understand the difficulties inherent in designing optimal screening contracts when preferences are multidimensional, it is useful to first review this basic one-dimensional paradigm. This will serve both as a building block for the multidimensional extensions and as an illustration of how one-dimensional preferences generate simplicity and recursion in the optimization program. We will use a simple nonlinear pricing framework similar to Mussa and Rosen (1978) as our basic screening environment, elaborating as appropriate.

Suppose that a monopolist sells its products using a nonlinear tariff, $P(q)$, where $q$ is the quantity chosen by the consumer and $P(q)$ is the associated price. The potential consumers of the firm's good have preferences that can be indexed by a single-dimensional parameter, $\theta \in \Theta \equiv [\underline{\theta}, \bar{\theta}]$, which is distributed in the population according to the absolutely continuous distribution function $F(\theta)$, where $f(\theta) \equiv F'(\theta)$ represents the associated density. Let each consumer's preferences for consuming $q \in Q \equiv [0, \bar{q}]$ for a price of $P$ be given by $u = v(q, \theta) - P$. Note that preferences are linear in money.

To place some additional structure on the effect of $\theta$, we assume the well-known single-crossing property that $v_{q\theta}$ has a constant sign; in this paper, we will associate higher types with higher marginal valuations of consumption; hence, $v_{q\theta} > 0$. This condition makes the one-dimensional assumption restrictive.³ It is worth noting that this condition has two equivalent implications: (i) the indifference curves of any two types of consumers cross at most once in price-quantity space, and (ii) the associated demand curves do not intersect and are completely ordered as a family of curves given by $p = v_q(q, \theta)$. We will begin our focus on the even simpler linear-quadratic setting in which $v(q, \theta) = \theta q - \frac{1}{2} q^2$.
In this case, the associated demand curves are parallel lines, $p = \theta - q$.

There are two methodologies used to solve one-dimensional screening problems: what we refer to as the parametric-utility approach and the demand-profile approach. The former has been more commonly used in the applied theory literature, but the latter provides useful conceptual insights, particularly in the multidimensional context, that are easily overlooked in the former methodology. For completeness, we will briefly present both here.⁴

³ In a discrete setting, for example, multidimensional types can always be reassigned to a one-dimensional parameter, but the single-crossing property is not always preserved.

⁴ Most recent methodological treatments of the screening problem use the parametric-utility approach, referred to by Wilson (1993a) as the "disaggregated-type" approach. See, for example, the article by Guesnerie and Laffont (1984), and the relevant sections in Fudenberg and Tirole (1991), Myerson (1991), Mas-Colell, Whinston, and Green (1995), and Stole (1997). The demand-profile approach is thoroughly expounded in Wilson (1993a). Brown and Sibley (1986) and Wilson (1993a) discuss both approaches.


2.1. The Parametric-Utility Approach

The basic methodology we follow here was initially developed by Mirrlees (1971), and applied to nonlinear pricing by Mussa and Rosen (1978). The firm in our setting cares only about expected profit and so seeks to maximize

\[ E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ P(q(\theta)) - c\,q(\theta) \right] dF(\theta), \]

where $q(\theta)$ is a parametric representation of the choice of type $\theta$ consumers, and $c$ is the constant marginal cost of production, so that serving a given consumer $q$ units costs $cq$. Suppose that our monopolist offers a nonlinear, lower-semi-continuous pricing schedule $P(q)$ defined over the compact domain $Q$. Then, we can define a type $\theta$ consumer's indirect utility under this scheme as

\[ u(\theta) \equiv \max_{q \in Q} \{ v(q, \theta) - P(q) \}. \]

Provided that the derivatives of $v$ are bounded, $u(\theta)$ is absolutely continuous. Applying the revelation principle, we can reparameterize our problem and focus on maximizing expected profits over all incentive-compatible and individually rational mechanisms, $\{p(\theta), q(\theta)\}_{\theta \in \Theta}$. As is well known, a mechanism in this context is incentive-compatible if and only if, for almost all $\theta$, we have $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ and $q(\theta)$ is nondecreasing.⁵ The former condition is equivalent to the local first-order condition and arises as a natural analog of the envelope condition in Roy's identity; the latter is equivalent to the local second-order condition. When preferences satisfy the single-crossing property, the local second-order condition implies a global condition as well. Hence, our monopolist firm can maximize expected profits subject to $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ and the monotonicity of $q$.

Given that our incentive compatibility conditions are stated in terms of $u$ and $q$, it is useful to transform our monopolist's program from price-quantity space to utility-quantity space. Because $S(q, \theta) \equiv v(q, \theta) - cq$ represents joint surplus from producing $q$ units of output to be consumed by a type $\theta$ consumer, the firm's expected profit can be restated as

\[ E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ S(q(\theta), \theta) - u(\theta) \right] dF(\theta). \tag{2.1} \]
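The incentive-compatibility characterization can be checked numerically. The following is a minimal sketch, assuming the linear-quadratic utility $v(q,\theta) = \theta q - \frac{1}{2}q^2$ (so that $v_\theta = q$); the type interval [1, 2], the two allocation rules, the grid size, and all function names are hypothetical choices made for the illustration. It builds $u(\theta)$ by integrating the envelope condition, backs out transfers, and measures the largest gain any type could obtain by misreporting.

```python
def v(q, theta):
    """Linear-quadratic gross utility v(q, theta) = theta*q - q^2/2."""
    return theta * q - 0.5 * q * q

def build_mechanism(thetas, q):
    """Integrate u'(theta) = v_theta(q(theta), theta) = q(theta) by the
    trapezoid rule, with u = 0 at the lowest type, and set p = v - u."""
    u = [0.0]
    for k in range(1, len(thetas)):
        step = 0.5 * (q[k] + q[k - 1]) * (thetas[k] - thetas[k - 1])
        u.append(u[-1] + step)
    p = [v(qk, tk) - uk for qk, tk, uk in zip(q, thetas, u)]
    return u, p

def max_gain_from_lying(thetas, q, p):
    """Largest utility gain of any grid type from reporting another grid type
    (zero means that no misreport is profitable)."""
    worst = 0.0
    for ti, qi, pi in zip(thetas, q, p):
        truthful = v(qi, ti) - pi
        for qj, pj in zip(q, p):
            worst = max(worst, v(qj, ti) - pj - truthful)
    return worst

thetas = [1.0 + k / 200 for k in range(201)]   # hypothetical type grid on [1, 2]

q_mono = [0.5 * t * t for t in thetas]         # nondecreasing allocation
u_mono, p_mono = build_mechanism(thetas, q_mono)
gain_mono = max_gain_from_lying(thetas, q_mono, p_mono)

q_bad = [2.0 - 0.5 * t * t for t in thetas]    # decreasing allocation
u_bad, p_bad = build_mechanism(thetas, q_bad)
gain_bad = max_gain_from_lying(thetas, q_bad, p_bad)
```

With the nondecreasing rule, `gain_mono` is zero up to rounding, whereas the decreasing rule satisfies the same envelope condition yet leaves a strictly positive `gain_bad`, which is why monotonicity has to be imposed in addition to the local first-order condition.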

Hence, the monopolist maximizes (2.1) over $\{q(\theta), u(\theta)\}_{\theta \in \Theta}$ subject to $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$, $q(\theta)$ nondecreasing, and individual rationality. Note that this program is entirely defined by the social surplus function $S(q, \theta)$ and the partial derivative of the consumer's utility function with respect to $\theta$. For example, the setting in which utility over quantity is $v(q, \theta) = \theta q - \frac{1}{2} q^2$ and cost is $cq$ is formally equivalent to the setting in which a monopolist sells a product line with various qualities, in which the consumer's value of consuming one unit of quality $q$ is given by $\tilde{v}(q, \tilde{\theta}) = \tilde{\theta} q$ and the cost of producing such a unit is $\frac{1}{2} q^2$, where $\tilde{\theta} = \theta - c$. Both settings give rise to identical surplus functions and partial derivatives with respect to type, and hence have identical optimal price schedules. In this sense, there is little to distinguish the use of quality [as in Mussa and Rosen's (1978) seminal paper] from quantity [as in Maskin and Riley's (1984) generalization of this model]. Fortunately, in both cases, the operative instruments begin with the letter q and, as a pleasant historical accident, we can refer to this second-best allocation as the MR allocation. We will nonetheless focus our attention on the quantity variation of this model.

⁵ Throughout, we use the notation $\dot{x}(y)$ to represent the derivative of $x$ with respect to $y$.

As a technical simplification, we use the local first-order condition for truthtelling to replace $u$ in the firm's objective via integration by parts. The result is an objective function that is maximized over $\{\{q(\theta)\}_{\theta \in \Theta}, u(\underline{\theta})\}$ subject to $q$ nondecreasing and the individual rationality constraint:

\[ E[\pi] = \int_{\underline{\theta}}^{\bar{\theta}} \left[ S(q(\theta), \theta) - \frac{1 - F(\theta)}{f(\theta)}\, v_\theta(q(\theta), \theta) - u(\underline{\theta}) \right] dF(\theta). \]

This objective function has been usefully termed the monopolist's "virtual surplus" function by Myerson (1991); it includes the total surplus generated by the monopolist's production less the information rents that must be left to the consumers as a function of their type.

Because $\dot{u}(\theta) = v_\theta(q, \theta) \ge 0$, individual rationality is equivalent to requiring $u(\underline{\theta}) \ge 0$. Thus, we choose $u(\underline{\theta}) = 0$ as a corner solution in this program, guaranteeing participation at the least possible cost. Note that, in this simple program, it is never profitable to leave excess rents to consumers. Hence, we are left with an objective function that can be maximized pointwise in $q(\theta)$ if we ignore the monotonicity condition. Provided that the virtual surplus

\[ \Phi(q, \theta) \equiv S(q, \theta) - \frac{1 - F(\theta)}{f(\theta)}\, v_\theta(q, \theta) \]

is quasi-concave in $q$ and satisfies a cross-partial condition, $\Phi_{q\theta} \ge 0$, the solution $\{q(\theta)\}_{\theta \in \Theta}$, which is defined by the pointwise first-order condition $\Phi_q(q(\theta), \theta) = 0$, maximizes expected profit and is nondecreasing as required. This solution satisfies

\[ S_q(q(\theta), \theta) = \frac{1 - F(\theta)}{f(\theta)}\, v_{q\theta}(q(\theta), \theta) \ge 0. \]

Hence, we have the familiar result that $q(\theta)$ is distorted downward relative to the social optimum, everywhere but at the "top" (i.e., at $\theta = \bar{\theta}$). If $\Phi_{q\theta}$ is not everywhere nonnegative, it is possible that an ironing procedure needs to be used to constrain $q(\theta)$ to be nondecreasing.⁶ Such a procedure typically requires that we utilize more general control-theoretic techniques and depart from our simple pointwise maximization program. However, in the single-dimensional setting, a mild set of regularity conditions on $v$ and $F$ guarantees us the simple case.⁷

⁶ See, for example, Fudenberg and Tirole (1991) for details.

Note that because profit per customer, $\pi = S - u$, is linear in utility, we are able to use integration by parts to eliminate the utility function from the objective function, except for the requirement that $u(\underline{\theta}) \ge 0$. This allows us to maximize profits pointwise in $q$; i.e., we do not have to concern ourselves simultaneously with the value of $u(\theta)$. In this sense, the program is block recursive: first the optimum can be found for each $q(\theta)$ and for $u(\underline{\theta})$ in isolation; then, using the resulting function $q(\theta)$ and $u(\underline{\theta})$, $u(\theta)$ can be determined via integration. The resulting utility schedule can then be combined with $q(\theta)$ to determine the total type-specific transfer, $p(\theta) = v(q(\theta), \theta) - u(\theta)$. Given $\{p(\theta), q(\theta)\}_{\theta \in \Theta}$, the price schedule can be constructed by inverting the function $q(\theta)$: $P(q) = p(q^{-1}(q))$, where $q^{-1}(q)$ denotes the type that chooses quantity $q$.

A second inherent simplicity in the one-dimensional model is that the incentive compatibility conditions are determined by a simple differential equation and a monotonicity condition. Whether we use integration by parts or the maximum principle to solve the program, in both instances we make important use of this fact: without it, we also lose the recursive nature of the problem. In the multidimensional setting, if we are uncertain as to which constraints bind, we will generally be forced to maximize profits subject to a far larger set of global constraints.

To this end, it is useful to briefly consider the discrete setting. Suppose that $\theta$ is distributed discretely on $\underline{\theta} = \theta_1 < \theta_2 < \cdots < \theta_I = \bar{\theta}$, with respective probabilities $f_i > 0$, $i = 1, \ldots, I$, and cumulative distribution function $F_k \equiv \sum_{i=1}^{k} f_i$. A direct mechanism is a menu of $I$ price-quantity pairs, where the $i$th indexed pair is given to consumers who report that they are of the $i$th type: $\{q_i, p_i\}_{i=1,\ldots,I}$.
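Returning briefly to the continuous program, the block-recursive construction described in this subsection can be sketched numerically. The example below is an illustration under stated assumptions, not the authors' procedure: it takes the linear-quadratic utility with cost $cq$ and assumes that $\theta$ is uniform on $[\underline{\theta}, \bar{\theta}]$, so that $(1 - F)/f = \bar{\theta} - \theta$ and the pointwise first-order condition gives $q(\theta) = \max\{0, 2\theta - \bar{\theta} - c\}$. The parameter values and function names are hypothetical.

```python
def mr_allocation(theta, theta_hi, c):
    """Pointwise maximizer of virtual surplus for v = theta*q - q^2/2, cost c*q,
    theta uniform: Phi_q = (theta - c - q) - (theta_hi - theta) = 0."""
    return max(0.0, 2.0 * theta - theta_hi - c)

def solve_block_recursive(theta_lo, theta_hi, c, n=401):
    thetas = [theta_lo + (theta_hi - theta_lo) * k / (n - 1) for k in range(n)]
    # Step 1: allocation, found type by type without reference to u.
    q = [mr_allocation(t, theta_hi, c) for t in thetas]
    # Step 2: rents from u'(theta) = q(theta) with u(theta_lo) = 0
    # (trapezoid rule on the grid).
    u = [0.0]
    for k in range(1, n):
        u.append(u[-1] + 0.5 * (q[k] + q[k - 1]) * (thetas[k] - thetas[k - 1]))
    # Step 3: type-specific transfers p = v - u.
    p = [t * qk - 0.5 * qk * qk - uk for t, qk, uk in zip(thetas, q, u)]
    return thetas, q, u, p

# Hypothetical parameters: types uniform on [1, 2], zero marginal cost.
thetas, q, u, p = solve_block_recursive(theta_lo=1.0, theta_hi=2.0, c=0.0)
```

For these parameters the allocation is $q(\theta) = 2\theta - 2$, which coincides with the efficient quantity $\theta - c$ only at the top type, and the tabulated pairs $(q, p)$ trace out the tariff $P(q) = q - \frac{1}{4}q^2$, obtained by substituting $\theta = 1 + q/2$ into $p(\theta)$.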
Given that the single-crossing property is satisfied, it is straightforward to show that, if adjacent incentive compatibility constraints are satisfied, then global incentive compatibility is satisfied. The adjacent constraints are typically referred to as the downward local and upward local incentive constraints:

\[ v(q_i, \theta_i) - p_i \ge v(q_{i-1}, \theta_i) - p_{i-1}, \quad \text{for } i = 2, \ldots, I, \tag{IC$_{i,i-1}$} \]

\[ v(q_i, \theta_i) - p_i \ge v(q_{i+1}, \theta_i) - p_{i+1}, \quad \text{for } i = 1, \ldots, I - 1. \tag{IC$_{i,i+1}$} \]
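The role of the adjacent constraints can be illustrated with a small menu. The sketch below assumes the linear-quadratic utility, prices each bundle so that the downward-adjacent constraint holds with equality (one convenient way of satisfying it), and then checks all $I(I-1)$ constraints by brute force. The type values, the two allocations, and the function names are hypothetical.

```python
def v(q, theta):
    """Linear-quadratic gross utility."""
    return theta * q - 0.5 * q * q

def prices_from_downward_constraints(thetas, q):
    """Give the lowest type zero utility and make every downward-adjacent
    constraint bind: p_i = p_{i-1} + v(q_i, th_i) - v(q_{i-1}, th_i)."""
    p = [v(q[0], thetas[0])]
    for i in range(1, len(q)):
        p.append(p[-1] + v(q[i], thetas[i]) - v(q[i - 1], thetas[i]))
    return p

def violated_constraints(thetas, q, p, tol=1e-12):
    """All pairs (i, j), zero-based, such that type i strictly prefers bundle j."""
    bad = []
    for i, ti in enumerate(thetas):
        own = v(q[i], ti) - p[i]
        for j in range(len(q)):
            if v(q[j], ti) - p[j] > own + tol:
                bad.append((i, j))
    return bad

thetas = [1.0, 1.5, 2.0, 3.0]            # hypothetical types

q_monotone = [0.2, 0.9, 1.1, 2.5]
p_monotone = prices_from_downward_constraints(thetas, q_monotone)
bad_monotone = violated_constraints(thetas, q_monotone, p_monotone)

q_nonmonotone = [0.2, 1.1, 0.9, 2.5]
p_nonmonotone = prices_from_downward_constraints(thetas, q_nonmonotone)
bad_nonmonotone = violated_constraints(thetas, q_nonmonotone, p_nonmonotone)
```

With the nondecreasing allocation no constraint is violated, local or global. With the non-monotone allocation the second type strictly prefers the third bundle, so an upward-adjacent constraint fails even though every downward-adjacent constraint binds.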

Furthermore, assuming that it is always profitable to transfer rents from the consumer to the firm, one can easily demonstrate that the downward constraints are binding. In addition, provided that the resulting quantity allocation, $\{q_i\}_{i=1,\ldots,I}$, is monotonic, one can show that the upward constraints must be slack and consequently incentive compatibility is global. This set of results is typically used to solve the relaxed program with only the downward constraints. In this sense, the sequence of binding downward-local incentive constraints (and the difference equation that they imply) is analogous to the ordinary differential equation $\dot{u}(\theta) = v_\theta(q(\theta), \theta)$ in the continuous setting. Not surprisingly, the solution to the relaxed program (ignoring monotonicity constraints) satisfies an analogous condition:

\[ S_q(q_i, \theta_i) = \frac{1 - F_i}{f_i} \left\{ v_q(q_i, \theta_{i+1}) - v_q(q_i, \theta_i) \right\}, \quad i = 1, \ldots, I - 1. \]

In the discrete setting, it is perhaps easier to see the importance of focusing on the local constraints, and in particular on the downward-local constraints. Without such a simplification, we would have to introduce a Lagrange multiplier for every type-report pair, $(i, j)$, resulting in $I(I - 1)$ total constraints rather than simply $I - 1$. Not only does the single-crossing property in tandem with a one-dimensional type space allow us to reduce the set of potential constraints by a factor of $I$, it also renders these local constraints in a tractable fashion: a simple first-order difference equation. The absence of such a convenient ordering is the source of much difficulty in the multiple-dimension setting.

⁷ The commonly made assumptions that preferences are quadratic and $\theta$ has a log-concave distribution are sufficient for $\Phi_{q\theta} \ge 0$.

2.2. The Demand-Profile Approach

An alternative to modeling optimal screening contracts in parametric-utility space is to work with a less primitive and perhaps more economically salient structure: demand curves ordered by type and then aggregated into a demand profile.⁸ Because demand curves entirely capture consumers' preferences, there is no informational loss from restricting our attention to demand profiles. Given that they are generally easier to estimate empirically, this primitive has arguably more practical appeal. For our purposes, however, the demand-profile approach is useful also in that this method more clearly illustrates the simplicity and recursiveness of the single-type framework, and also underscores the aspects of the multiple-type framework that will lead to genuine economic difficulties rather than merely technical concerns.

Consider first the continuous parameterization of demand curves that we will index by $\theta$: an individual of type $\theta$ has a demand curve given by $p = v_q(q, \theta)$. The single-crossing property is equivalent to the requirement that these demand curves do not intersect. In the parametric-utility approach where $v(q, \theta) = \theta q - \frac{1}{2} q^2$, this generates a simple family of parallel demand curves: $p = \theta - q$. The primitive object on which we will work, however, is the aggregate demand profile generated by calculating the measure of consumers who consume $q$ or more units of output with a price schedule, $P(q)$. Formally, we characterize this "cumulative" aggregate demand functional as

\[ M[P(\cdot), q] = \operatorname{Prob}\left[ \theta \in \Theta \;\middle|\; \arg\max_{x} \{ v(x, \theta) - P(x) \} \ge q \right]. \]

⁸ An interested reader is urged to consult Wilson (1993a) for a wealth of examples and insights into this approach. Wilson (1993a) builds on the work of Brown and Sibley (1986), who provide an earlier treatment of this approach.

If the consumer's program is quasi-concave [which is equivalent to the requirement that the marginal price schedule, $p(q) \equiv P'(q)$, intersects the consumer's demand curve once from below], then consumer $\theta$ will demand $q$ or more units if and only if $v_q(q, \theta)$ is not less than the marginal price, $p(q)$, which implies that the cumulative aggregate demand functional has a very simple form: $M[P(\cdot), q] = \operatorname{Prob}[v_q(q, \theta) \ge p(q)] \equiv N(p(q), q)$. In this case, the problem is fully decomposable: the seller's program is to choose optimal marginal prices, $p(q)$, to maximize $N(p, q)[p - c]$ pointwise for each $q$. Assuming that the monopolist's local first-order necessary condition is also sufficient, we can characterize the solution by

\[ N(p(q), q) + \frac{\partial N(p(q), q)}{\partial p} \left[ p(q) - c \right] = 0, \]

or in a more familiar inverse-elasticity formula

\[ \frac{p(q) - c}{p(q)} = \frac{1}{\eta(p(q), q)}, \quad \text{where } \eta(p, q) \equiv -\frac{p}{N} \frac{\partial N}{\partial p}. \]
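The pointwise program is easy to carry out on a discrete demand profile. The sketch below uses the willingness-to-pay figures of the three-type numerical example that this section tabulates in Table 5.1, with marginal cost normalized to zero and one consumer of each type; the function and variable names are hypothetical. For each unit it picks the marginal price that maximizes $N(p, q)(p - c)$ and, for comparison, it also finds the best single per-unit price.

```python
# Marginal willingness to pay for the 1st, ..., 6th unit, by type.
DEMANDS = {
    "theta1": [7, 5, 3, 1, 0, 0],
    "theta2": [9, 7, 5, 3, 1, 0],
    "theta3": [11, 9, 7, 5, 3, 1],
}
MARGINAL_COST = 0

def buyers(price, unit):
    """N(p, q): number of types valuing this unit at no less than the price."""
    return sum(1 for vals in DEMANDS.values() if vals[unit] >= price)

def pointwise_prices():
    """For each unit, choose p to maximize N(p, q) * (p - c); only the
    valuations themselves need to be considered as candidate prices."""
    schedule = []
    n_units = len(next(iter(DEMANDS.values())))
    for unit in range(n_units):
        candidates = sorted({vals[unit] for vals in DEMANDS.values()})
        best = max(candidates,
                   key=lambda p: buyers(p, unit) * (p - MARGINAL_COST))
        n = buyers(best, unit)
        schedule.append((best, n, n * (best - MARGINAL_COST)))
    return schedule

def best_uniform_price():
    """Best single per-unit price: each consumer buys every unit that is
    worth at least the price."""
    def units_sold(p):
        return sum(1 for vals in DEMANDS.values() for x in vals if x >= p)
    prices = sorted({x for vals in DEMANDS.values() for x in vals if x > 0})
    best = max(prices, key=lambda p: (p - MARGINAL_COST) * units_sold(p))
    return best, units_sold(best), (best - MARGINAL_COST) * units_sold(best)

schedule = pointwise_prices()
total_revenue = sum(revenue for _, _, revenue in schedule)
uniform = best_uniform_price()
```

Here `schedule` lists, unit by unit, the triple (marginal price, number of buyers, revenue); `total_revenue` is 56, against 45 for the best uniform price of 5 per unit, at which 9 units are sold.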

Provided that the resulting marginal price schedule, $p(q)$, cuts each parameterized demand curve only once from below, this solution to the relaxed program will satisfy the agent's global incentive compatibility constraints. The resulting nonlinear price schedule in this case is $P(q) = P(0) + \int_0^q p(s)\, ds$, where the fixed fee is chosen optimally to induce participation for all consumers who generate nonnegative virtual surplus. When the monopolist's program is not quasi-concave over $p$ for all $q$, the solution is still given by the maximization over $p$ of $N(p, q)(p - c)$, but the resulting marginal price schedule $p(q)$ may fail to be continuous in $q$, which gives rise to kinks in the price function $P$. This situation corresponds to the cases where $\Phi_{\theta q} < 0$ and $q(\theta)$ is not strictly monotonic (bunching arises). Notice that, in this case too, the demand-profile approach is less difficult than the parametric-utility approach, which must resort to an ironing procedure.

The demand-profile approach does not work well when the resulting price schedule cuts some demand curve twice. In this case the expression of the aggregated demand function $M$ cannot be simplified, because it depends on the whole function $P$.

As an illustration, consider the following numerical example. Suppose that there are three types of consumers with demand curves for the quantities given in the first three numeric columns (we normalize marginal cost to zero; see Table 5.1).

Table 5.1. The demand-profile approach: a numerical example

Unit     θ1    θ2    θ3    p(q)    N(p(q), q)    R(q)
1st       7     9    11      7         3          21
2nd       5     7     9      5         3          15
3rd       3     5     7      5         2          10
4th       1     3     5      3         2           6
5th       0     1     3      3         1           3
6th       0     0     1      1         1           1
Total                                             56

The fourth numeric column represents the pointwise optimal price $p(q)$, obtained by maximizing revenue $pN(p, q)$ for the $q$th unit. The fifth column is the number of consumers purchasing that unit (we have normalized the population to an average of one consumer of each type), and the final column represents the revenue attributed to the particular quantity level. Total revenue using nonlinear pricing is equal to 56, whereas a uniform-pricing monopolist would choose a price of 5 per unit, sell 9 units, and make a total revenue of 45. The simplicity of this method for finding the optimal price schedule is worth noting.

The local demand-profile representation sometimes falls short, however. If the zeros in the θ1 type's demand curve were replaced by $1 - 2\varepsilon$, and the zero in the θ2 type's demand curve was replaced by $1 - \varepsilon$, the maximum revenue for the 6th unit would be obtained by selling to all types, whereas it would still be optimal to sell the 5th unit only to θ3 types. Thus, we would generate gaps in consumption choices for types θ1 and θ2 when we maximized $p(q)$ pointwise by $q$. Specifically, types θ1 and θ2 would each be directed to choose only units 1–4 and unit 6 (but to skip unit 5), which is not feasible. This candidate solution represents the failure of the local representation; specifically, the marginal demand profile $N(p, q)$ does not capture the consumer's true preferences, which are instead characterized by the full demand profile, $M[P(\cdot), q]$.

3. THE GENERAL MULTIDIMENSIONAL SCREENING PROGRAM

3.1. A General Discrete Formulation

We begin with the discrete setting, because it is perhaps easiest to follow, relying on simple techniques of summation and optimization for the characterization of an optimum, unlike its continuous-type counterpart, which makes use of more complex techniques in vector calculus and differential forms. Nonetheless, both approaches are closely related, and the conditions in the discrete setting have smooth analogs in the continuous setting. More importantly for the purposes of this survey, the rough equivalence between the two settings allows us to understand what is difficult about "multiple dimensions." To be precise, the problems arise not because of multiple dimensionality itself, but because of a commonly associated lack of exogenous type-ordering in multiple-dimensional environments. This source of the problem is clearest in the discrete setting, where it makes no sense to speak about dimensionality without simultaneously imposing structure on preferences.⁹

Let us consider now a more general version of the discrete model, where there are $I$ distinct consumer types, and the monopolist produces $n$ different goods: $q \in \mathbb{R}^n$. Hence, we can speak of there being $n$ available instruments (i.e., varieties of goods exchanged for money). We make no assumptions on preferences, except for linearity in money. For the sake of consistency with the rest of the paper, we still parameterize gross utilities in the form $v(q, \theta_i)$ (where $\theta_i$ is the consumer type, $i = 1, \ldots, I$), but we make no assumption on $v$. By convention, $q = 0$ represents "no consumption," and we normalize utility such that $v(0, \theta) = 0$ for all $\theta$. We denote the allocation for consumer $\theta_i$ by the vector $q_i = q(\theta_i)$ and the associated utility by the scalar $u_i = u(\theta_i)$. We will use $\mathbf{q}$ to denote the $n \times I$ matrix $(q_1, \ldots, q_I)$, and $\mathbf{u}$ to denote the $I$-length row vector $(u_1, \ldots, u_I)$. Using the parametric-utility approach, we represent the firm's expected profit as

\[ E[\pi] = \sum_{i=1}^{I} f_i \left\{ S(q_i, \theta_i) - u_i \right\}, \]

to be maximized under the discrete incentive compatibility constraints, IC$_{i,j}$, as defined previously in the one-dimensional case, and individual rationality constraints:

\[ \forall i, \quad u_i \equiv v(q_i, \theta_i) - P(q_i) \ge 0. \tag{IR$_i$} \]

The individual rationality constraints can be considered as a particular case of incentive compatibility constraints by defining a "dummy" type $\theta_0$, such that $v(q, \theta_0) \equiv 0$, which implies that it will always be optimal to choose $q_0 = 0$ and $P(0) = 0$.¹⁰ The firm's problem is thus to maximize its expected profit under implementability (i.e., IC and IR) constraints:

\[ \forall i, j \in \{0, \ldots, I\}, \quad u_i \ge v(q_j, \theta_i) - P(q_j), \]

or, equivalently,

\[ \forall i, j \in \{0, \ldots, I\}, \quad u_i - u_j \ge v(q_j, \theta_i) - v(q_j, \theta_j). \tag{IC$_{ij}$} \]

⁹ For example, whether we index preferences using two dimensions, $v(q, \theta_{1i}, \theta_{2j})$ where $i, j = 1, \ldots, I/2$, or a single dimension, $v(q, \theta_k)$ with $k = 1, \ldots, I$, is immaterial by itself. Fundamentally, any difficulty in extending single-dimensional models to multidimensional models must arise from a lack of ordering among the types rather than any primitive notions of dimensionality.

¹⁰ This is just a convention. It is compatible with fixed fees, because $P$ can be discontinuous at 0.


Following Spence (1980), it is natural to decompose the firm's problem in two subproblems:

1. Minimize expected utility for fixed $\mathbf{q} = (q_1, \ldots, q_I)$.
2. Choose $\mathbf{q}$ to maximize expected surplus minus expected utility.

It is remarkable that the first subproblem has a general solution that can be found by a relatively simple algorithm. Let us denote by $U(\mathbf{q})$ the set of utility vectors that implement $\mathbf{q}$. That is,

\[ U(\mathbf{q}) = \{ (u_1, \ldots, u_I) \text{ such that IC}_{ij} \text{ is satisfied for all } i, j = 0, \ldots, I \text{ and } u_0 = 0 \}. \]

In what follows, it will be useful to consider arbitrary paths in the set $\Theta = \{\theta_1, \ldots, \theta_I\}$. We will denote such a path from type $\theta_i$ to $\theta_j$ by the function $\gamma$. We denote the "length" of $\gamma$ by $\ell$; i.e., $\ell$ is the number of segments used to connect $\theta_i = \gamma(0)$ to $\theta_j = \gamma(\ell)$. Hence, $\gamma$ is a mapping, $\gamma : \{0, 1, \ldots, \ell\} \to \Theta$. Finally, we say that a path of length $\ell$ is "closed" if $\gamma(0) = \gamma(\ell)$. With this notation for discrete paths, the following characterization of $U(\mathbf{q})$ can be stated. A proof is found in Rochet (1987).

Lemma 3.1. $U(\mathbf{q})$ is nonempty if and only if for every closed $\ell$-length path $\gamma$

\[ \sum_{k=0}^{\ell-1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right] \le 0. \tag{3.1} \]

To provide an intuition for condition (3.1), define the incremental utility between type $i$ and type $j$ as the difference between the utility of type $i$ and the utility of type $j$ when consuming the bundle assigned to type $j$. Condition (3.1) means that, for all closed paths $\gamma$ in the set of types, the sum of incremental utilities along $\gamma$ is nonpositive. Consider, for example, a closed path of length $\ell$. For each segment $k$ of the path, incentive compatibility requires

\[ u_{\gamma(k+1)} - u_{\gamma(k)} \ge v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}). \]

By summing over these inequalities (the left-hand sides cancel along a closed path), we see that condition (3.1) is implied by incentive compatibility for any closed path. Lemma 3.1 says that the converse is true: condition (3.1) implies incentive compatibility, as well.¹¹ The proof is constructive: Lemma 3.2 gives an algorithm for constructing the minimal element of $U(\mathbf{q})$.

¹¹ The reader versed in vector calculus will recognize this as a discrete variation of the requirement that $v_\theta(q(\theta), \theta)$ is a conservative field, where $C$ represents an arbitrary closed path in $\Theta$: $\oint_C v_\theta(q(\theta), \theta)\, d\theta = 0$.
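As a quick check of condition (3.1), consider (as an assumption made only for this illustration) a single good with the linear-quadratic utility of Section 2, $v(q, \theta) = \theta q - \frac{1}{2} q^2$, and the closed path of length two that runs from $\theta_i$ to $\theta_j$ and back. The quadratic terms cancel and the condition reads:

```latex
\[
\begin{aligned}
& \left[ v(q_i, \theta_j) - v(q_i, \theta_i) \right]
  + \left[ v(q_j, \theta_i) - v(q_j, \theta_j) \right] \\
&\qquad = (\theta_j - \theta_i)\, q_i + (\theta_i - \theta_j)\, q_j
  = (\theta_j - \theta_i)(q_i - q_j) \le 0 .
\end{aligned}
\]
```

Hence $\theta_j > \theta_i$ requires $q_j \ge q_i$: in this special case, the cycles of length two already deliver the familiar monotonicity requirement on the allocation.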


Lemma 3.2. When (3.1) is satisfied, $U(\mathbf{q})$ has a unique minimal element, $\underline{u}$, characterized for $i = 0, \ldots, I$ by

\[ \underline{u}_i \equiv \sup_{\gamma} \sum_{k=0}^{\ell-1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right], \tag{3.2} \]

where the sup is taken over all open paths from 0 to $i$, and $\underline{u}_0 \equiv 0$.

Condition (3.2) means that agent $i$ is guaranteed a utility level, $\underline{u}_i$, at least equal to the sum of the incremental utilities along any path connecting $\theta_0$ to $\theta_i$. We will refer to this $i$th element of the minimum of $U(\mathbf{q})$ as the informational rent of agent $i$. Note that this rent does not depend on the frequencies $\{f_1, \ldots, f_I\}$ of the distribution of types, but only on the support, $\Theta = \{\theta_1, \ldots, \theta_I\}$, of this distribution. Formula (3.2) shows that the informational rent of each agent can be computed by a recursive algorithm. Intuitively, it is as if each type $i$ chooses the path from $\theta_0$ to $\theta_i$ that maximizes the sum of incremental utilities. Denote by $u_i^{\ell}$ the maximum of formula (3.2) over all paths of length less than or equal to $\ell$ from 0 to $i$. Then $u_i^{\ell}$ can be computed recursively by the Bellman-type formula:

\[ u_i^{\ell+1} = \max_{j} \left\{ u_j^{\ell} + v(q_j, \theta_i) - v(q_j, \theta_j) \right\}. \]
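The Bellman-type recursion can be sketched in a few lines. The code below is a minimal illustration rather than the authors' algorithm: it assumes a single good with linear-quadratic utility, encodes the dummy type $\theta_0$ as `None`, and uses hypothetical type and allocation values. If the values are still improving after as many rounds as there are types, some closed path has a positive sum of incremental utilities, so condition (3.1) fails and the function returns `None`.

```python
def v(q, theta):
    """Gross utility of a single good; theta=None marks the dummy type
    theta_0, whose utility is identically zero."""
    if theta is None:
        return 0.0
    return theta * q - 0.5 * q * q

def informational_rents(q, theta):
    """Iterate u_i <- max_j {u_j + v(q_j, theta_i) - v(q_j, theta_j)} from
    u_0 = 0. Index 0 is the dummy type with the null bundle q_0 = 0.
    Returns the minimal rents, or None when condition (3.1) is violated."""
    n = len(theta)
    minus_inf = float("-inf")
    u = [0.0] + [minus_inf] * (n - 1)
    for _ in range(n):
        new_u = list(u)
        for i in range(n):
            for j in range(n):
                if j != i and u[j] != minus_inf:
                    step = v(q[j], theta[i]) - v(q[j], theta[j])
                    new_u[i] = max(new_u[i], u[j] + step)
        if new_u == u:          # fixed point reached: rents found
            return u
        u = new_u
    return None                 # still improving: a positive cycle exists

# Hypothetical one-dimensional example with a nondecreasing allocation.
rents = informational_rents(q=[0.0, 0.5, 1.5, 3.0],
                            theta=[None, 1.0, 2.0, 3.0])

# A non-monotone allocation violates (3.1), so no rents exist.
no_rents = informational_rents(q=[0.0, 2.0, 0.5],
                               theta=[None, 1.0, 2.0])
```

In the first example the optimal paths form the single chain 0, 1, 2, 3, and the rents equal the cumulated increments $(\theta_{k+1} - \theta_k) q_k$ along that chain.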

Condition (3.1) implies that this algorithm has no cycles. The set of types being finite, $u_i^{\ell}$ converges to the rent of agent $i$ in a finite number of steps as $\ell$ is increased to $I$. For any allocation $\mathbf{q}$, the dynamic programming principle implies that if $j$ belongs to the optimal path $\gamma$ from 0 to $i$, the truncation of $\gamma$ to the path between 0 and $j$ defines the optimal path from 0 to $j$. This allows us to define a partial ordering $\prec$ on types:

\[ j \prec i \iff j \text{ belongs to one of the optimal paths from 0 to } i. \]

For generic¹² allocations, there is a unique optimal path $\gamma$ [with $\gamma(0) = 0$ and $\gamma(\ell) = i$] from 0 to $i$, and the rent of $i$ is easily computed:

\[ u_i(\mathbf{q}, \prec) = \sum_{k=1}^{\ell-1} \left[ v(q_{\gamma(k)}, \theta_{\gamma(k+1)}) - v(q_{\gamma(k)}, \theta_{\gamma(k)}) \right]. \]

Graphically, the collection of optimal paths comprises a "tree" (i.e., a connected graph without cycles such that, from the "root" vertex 0, there is a unique path to any other point in the graph); we use $\Gamma$ to represent such a tree. We can therefore represent the binding incentive constraints by such a tree emanating from the type $\theta_0$. One can also define for all $i, j$ such that $i \prec j$, the "immediate successor" $s(i, j)$ of $i$ in the direction of $j$ by the formula

\[ s(i, j) = \min\{ k \mid i \prec k \prec j,\ k \ne i \}. \]

¹² However, the optimal allocation $\mathbf{q}$ may be such that there are several optimal paths. We give an example of such a case in Section 3.3.


Then, it is easy to see that the agent's expected rent can be written as¹³

\[ ER(\mathbf{q}, \prec) = \sum_{i=1}^{I} \sum_{j \succ i} f_j \left[ v(q_i, \theta_{s(i,j)}) - v(q_i, \theta_i) \right]. \]

In the classic one-dimensional case when the single-crossing condition holds, condition (3.1) reduces to the well-known monotonicity condition $q_1 \le q_2 \le \cdots \le q_I$ and $\prec$ always consists of the complete ordering $\theta_1 < \theta_2 < \cdots < \theta_I$; the associated tree is a single connected branch. In this case

\[ ER(\mathbf{q}, \prec) = \sum_{i=1}^{I} (1 - F_i) \left[ v(q_i, \theta_{i+1}) - v(q_i, \theta_i) \right], \]

and as previously shown in Section 2, subproblem 2 is easily solved by maximizing the virtual surplus

\[ \Phi(q_i, \theta_i) = S(q_i, \theta_i) - \frac{1 - F_i}{f_i} \left[ v(q_i, \theta_{i+1}) - v(q_i, \theta_i) \right]. \]
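The one-dimensional expression for the expected rent follows from a summation by parts. As a sketch, write $\Delta_k \equiv v(q_k, \theta_{k+1}) - v(q_k, \theta_k)$ (a shorthand introduced here only for the derivation) and use the fact that, along the single branch, the rent of type $i$ is $u_i = \sum_{k=1}^{i-1} \Delta_k$:

```latex
\[
\sum_{i=1}^{I} f_i u_i
= \sum_{i=1}^{I} f_i \sum_{k=1}^{i-1} \Delta_k
= \sum_{k=1}^{I-1} \Bigl( \sum_{i=k+1}^{I} f_i \Bigr) \Delta_k
= \sum_{k=1}^{I-1} (1 - F_k)\, \Delta_k .
\]
```

The increment $\Delta_k$ is paid to every type above $k$, which is why it is weighted by $1 - F_k$. The term $k = I$ carries the weight $1 - F_I = 0$, so the sum can equally be written up to $I$, as in the text.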

In the general case, the binding IC constraints (corresponding to the agent's optimal paths defining the tree $\Gamma$) depend on the allocation $\mathbf{q}$, which means that the virtual surplus does not have in general a simple expression. As will be illustrated later, the virtual surplus approach works only when one can anticipate a priori the optimal paths $\gamma \in \Gamma$: i.e., which IC constraints will be binding.

To summarize, from this discussion of the general discrete formulation, two conclusions emerge that are inherent in all multidimensional problems. First, and most significantly, multiple-dimension models are difficult precisely when they give rise to an endogenous ordering over the types of $\Theta$ (i.e., the set of binding IC constraints is endogenous to the choice of $\mathbf{q}$). Second, and closely related, the incentive compatibility conditions are frequently binding not only among local types, and hence the discrete analog of the first-order approach is not generally valid and a form of an integrability condition, (3.1), must necessarily be satisfied. We will see a similar structure in the continuous-type setting.

¹³ When $i$ is a maximal element, the set $\{ j \mid j \succ i \}$ is empty and $ER$ does not depend on $q_i$.

3.2. The Continuous Case

In the continuous case, the implementability condition (3.1) translates into two necessary conditions. The first is an integrability condition that requires, for every closed path $\gamma : [0, 1] \to \Theta$, that

\[ \int_0^1 v_\theta(q(\gamma(s)), \gamma(s))\, d\gamma(s) = 0. \tag{3.3$'$} \]


This is equivalent to saying that v θ (q(θ ), θ ) is the gradient14 of some function u(θ ). The second condition is a set of inequalities: ∀θ

(3.3

)

D 2 u(θ ) ≥ v θ θ (q(θ ), θ ),

where D 2 u is the Hessian matrix of any function u such that ∇u = v θ and the inequality is taken in the sense of matrices (i.e., D 2 u − v θ θ is positive semideﬁnite). The trouble is that these conditions are not sufﬁcient, except when v θ θ ≡ 0 (the linear parameterization) in which case (3.3 ) and (3.3

) are necessary and sufﬁcient for implementability by Fenchel’s duality theorem15 (Rochet, 1987). The continuous equivalent of Lemma 3.2 is somewhat trivial. This is because the integrability condition (3.3 ) implies that, for any path γ connecting γ (0) = θ0 to γ (1) = θ , we have 1 u(θ ) = v θ (q(γ (s)), γ (s)) dγ (s). 0

Expected surplus can be computed using the divergence theorem:¹⁶

$$\int_\Theta u(\theta) f(\theta)\, d\theta = \int_\Theta \lambda(\theta) \cdot v_\theta(q(\theta), \theta) f(\theta)\, d\theta - \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) f(\theta) u(\theta)\, d\sigma(\theta),$$

where $\lambda$ is any solution of the partial-differential equation

$$\operatorname{div}(\lambda(\theta) f(\theta)) + f(\theta) = 0, \tag{3.4}$$

$n(\theta)$ is the outward normal to the boundary $\partial\Theta$ of $\Theta$, and the notation $\int_{\partial\Theta} W(\theta)\, d\sigma(\theta)$ represents the integral of some function $W$ along the boundary $\partial\Theta$.

14 As noticed by several authors, this is also equivalent to a set of partial differential equations reminiscent of Slutsky equations: $\forall n, m$, $\frac{\partial}{\partial\theta_n}\left[\frac{\partial v}{\partial\theta_m}(q(\theta), \theta)\right] = \frac{\partial}{\partial\theta_m}\left[\frac{\partial v}{\partial\theta_n}(q(\theta), \theta)\right]$.

15 McAfee and McMillan (1988) define a Generalized Single-Crossing condition that slightly generalizes the linear case: it amounts to assuming that, for any nonlinear price, the set of types who choose the same allocation is a linear subspace. They use it to generalize the results of Laffont, Maskin, and Rochet (1987). They also find a necessary and sufficient condition for implementability.

16 The divergence theorem is the multidimensional analog of the integration-by-parts formula. It asserts that, under regularity conditions, $-\int_\Theta u(\theta)\operatorname{div}[\lambda(\theta) f(\theta)]\, d\theta = \int_\Theta \lambda(\theta) \cdot \nabla u(\theta) f(\theta)\, d\theta - \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) u(\theta) f(\theta)\, d\sigma(\theta)$.
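As a sanity check on the expected-surplus formula, the following minimal sketch works out the one-dimensional special case, where equation (3.4) reads $(\lambda f)' + f = 0$ and is solved by $\lambda = (1 - F)/f$. The density $f(\theta) = 2\theta$ on $[0, 1]$ and the rent function $u(\theta) = \theta^3$ are illustrative assumptions, not taken from the text; they are chosen so that $u$ vanishes at the lower end and $\lambda f$ vanishes at the upper end, which makes the boundary term disappear.

```python
def integrate(func, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


# Illustrative (assumed) primitives on the type space [0, 1].
def f(theta):
    return 2.0 * theta          # density, with c.d.f. F(theta) = theta^2


def F(theta):
    return theta ** 2


def u(theta):
    return theta ** 3           # rent function with u(0) = 0


def du(theta):
    return 3.0 * theta ** 2     # derivative of u


def lam(theta):
    # One-dimensional solution of (lam * f)' + f = 0 with lam(1) * f(1) = 0.
    return (1.0 - F(theta)) / f(theta)


# Left side: expected surplus.  Right side: integral of lam * u' * f.
lhs = integrate(lambda t: u(t) * f(t), 0.0, 1.0)
rhs = integrate(lambda t: lam(t) * du(t) * f(t), 0.0, 1.0)
print(lhs, rhs)
```

Both integrals equal 2/5 analytically in this example, so the two printed numbers should agree up to discretization error.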


Now, the expected profit of the firm can be written as

$$E[\pi] = \int_\Theta \{S(q(\theta), \theta) - \lambda(\theta) \cdot v_\theta(q(\theta), \theta)\} f(\theta)\, d\theta + \int_{\partial\Theta} \lambda(\theta) \cdot n(\theta) u(\theta) f(\theta)\, d\sigma(\theta),$$

which has to be maximized under the implementability conditions (3.3') and (3.3''). When these constraints are not binding, this problem can, in principle, be solved by pointwise maximization of virtual surplus:

$$\Phi(q, \theta) = S(q, \theta) - \lambda(\theta) \cdot v_\theta(q, \theta).$$

The trouble is that, as in the discrete case, $\lambda$ is not known explicitly. It is defined as the unique solution of the partial differential equation (3.4) that satisfies the boundary condition

$$u(\theta)[\lambda(\theta) \cdot n(\theta)] = 0 \quad \text{for all } \theta \text{ on } \partial\Theta.$$

It can be proved that the general solution to equation (3.4) can be computed by integrating the density, $f$, along arbitrary paths $\gamma$:

$$\lambda(\theta) = \int_{\gamma^{-1}(\theta)}^{1} f(\gamma(s))\, d\gamma(s).$$

Therefore, the optimal $u$ is characterized by two objects:

• A partition of the boundary of $\Theta$ into two regions: the "lower boundary" $\partial_0\Theta$, where the participation constraint is binding [$u(\theta) = 0$], and the "upper boundary" $\partial_1\Theta$, where $\lambda(\theta) \cdot n(\theta) = 0$, which means that there is no distortion along the normal to the boundary;

• A family of paths connecting the lower boundary [where $u(\theta) = 0$] to the upper boundary (where there is no distortion).

This is the continuous equivalent of the pattern found in the discrete case: a partial ordering of types along paths connecting the region where the participation constraint binds to the region where there is no distortion. As in the discrete setting, again two ideas emerge that are distinct to the multidimensional case: (i) the set of paths connecting the lower and upper boundaries of $\Theta$ is endogenous to the choice of allocation $\{q(\theta)\}_{\theta \in \Theta}$, and (ii) an integrability condition must necessarily be satisfied.

3.3. Tractable Discrete Models

To illustrate the different patterns that can arise in multidimensional screening models and how our conclusions affect our results, we consider here a very simple example of nonlinear pricing problems, inspired by Sibley and Srinagesh (1997) and Armstrong and Rochet (1999).¹⁷ In those examples, a monopolist firm produces two goods $j = 1, 2$ at a constant marginal cost (normalized to zero). There are two types of consumers, characterized by independent linear inverse demands

$$p_{ij}(q_{ij}) = \theta_{ij} - q_{ij}, \quad j = 1, 2, \quad i = 1, 2.$$

Thus types are bidimensional: $\theta_i = (\theta_{i1}, \theta_{i2})$, $i = 1, 2$. Linear demands are equivalent to quadratic utilities:

$$v(\theta_i, q_i) = \sum_{j=1}^{2} \left\{ \theta_{ij} q_{ij} - \tfrac{1}{2} q_{ij}^2 \right\},$$

where $q_i$ is the vector $q_i = (q_{i1}, q_{i2})$. The first-best efficient allocation is characterized by the vector $q_i^* = \theta_i$ and surplus by the scalar $S_i^* = \tfrac{1}{2}(\theta_{i1}^2 + \theta_{i2}^2)$. Following lemma 3.1,¹⁸ the implementability condition reduces to

$$(\theta_1 - \theta_2) \cdot q_2 + (\theta_2 - \theta_1) \cdot q_1 \le 0. \tag{3.5}$$

Provided this condition is satisfied, lemma 3.2 implies that, at the optimum, the rents to the types are given by

$$u_1 = \max(0, (\theta_1 - \theta_2) \cdot q_2), \qquad u_2 = \max(0, (\theta_2 - \theta_1) \cdot q_1).$$

The implementability condition then implies that either $u_1$ or $u_2$ equals 0 (i.e., the IR constraint binds somewhere). To fix ideas, we assume that $S_1^* < S_2^*$. By analogy with the unidimensional case, one may conjecture that the second-best allocation is then characterized by $u_1 = 0$ (binding IR "at the bottom") and $q_2 = q_2^*$ (efficiency "at the top"). This is indeed one possible regime, illustrated by Figure 5.1. In this first case,

$$u_2 = (\theta_2 - \theta_1) \cdot q_1 \quad \text{and} \quad q_1 = \theta_1 - \frac{f_2}{f_1}(\theta_2 - \theta_1).$$

This allocation can be implemented by a menu of two-part tariffs: tariff 1 has a low fixed fee $T_1 = \tfrac{1}{2} q_1^2$ and a unit price vector $p_1 = (f_2/f_1)(\theta_2 - \theta_1)$; tariff 2 has a high fixed fee $T_2 = S_2^* - u_2$ and a zero unit price vector. Note that unit prices are not necessarily above marginal costs (which have been normalized to zero), because we did not assume¹⁹ $\theta_2 > \theta_1$. Apart from this feature, the completely ordered case is analogous to the unidimensional case. It corresponds to the solution of the monopoly problem whenever

$$u_2 = (\theta_2 - \theta_1) \cdot q_1 \ge 0 = u_1 \ge (\theta_1 - \theta_2) \cdot q_2.$$

The second inequality is implied by the first, given the implementability condition in (3.5), whereas the first inequality is equivalent to $f_1(\theta_2 - \theta_1) \cdot \theta_1 \ge f_2(\theta_2 - \theta_1)^2$, or

$$\theta_1 \cdot \theta_2 \ge \frac{\theta_1^2 + f_2 \theta_2^2}{1 + f_2}. \tag{3.6}$$

[Figure 5.1: diagram of the types θ2, θ1, and θ0, with an arrow from θ2 to θ1 and an arrow from θ1 to θ0.]

Figure 5.1. First regime: the completely ordered case. Arrows indicate the direction of the binding incentive constraints; e.g., an arrow from θ2 to θ1 represents type θ2's indifference between their own allocation and that meant for θ1.

17 Dana (1992) and Armstrong (1999a) also provide related examples of tractable discrete-type models.

18 The only relevant closed path to consider is the cycle from θ1 to θ2.

19 The case θ2 > θ1 corresponds to what Sibley and Srinagesh (1997) have called uniformly ordered demands.

When this condition is not satisfied, a second possible regime corresponds to the case where there is no interaction between types; we call it the separable case (see Figure 5.2). In this second case, there are no distortions: $q_1 = \theta_1$ and $q_2 = \theta_2$, and all the surplus is captured by the seller: $u_1 = u_2 = 0$. Following (3.5), this allocation is implementable if and only if

$$(\theta_2 - \theta_1) \cdot \theta_1 \le 0, \qquad (\theta_1 - \theta_2) \cdot \theta_2 \le 0.$$

Given our assumption that $\theta_1^2 \le \theta_2^2$, this is equivalent to

$$\theta_1 \cdot \theta_2 \le \theta_1^2. \tag{3.7}$$

[Figure 5.2: diagram of the types θ2, θ1, and θ0, with an arrow from θ2 to θ0 and an arrow from θ1 to θ0.]

Figure 5.2. Second regime: the separable case.

[Figure 5.3: diagram of the types θ2, θ1, and θ0, with arrows labeled µ21 (from θ2 to θ1), µ20 (from θ2 to θ0), and µ10 (from θ1 to θ0).]

Figure 5.3. Third regime: the mixed case.

Finally, when neither (3.6) nor (3.7) is satisfied, there is an intermediate case that combines the features of the two regimes (see Figure 5.3). In this third and final case, the firm is still able to capture all the surplus, but this is at the cost of a distortion on $q_1$, designed in such a way that type θ2 is just indifferent between the efficient bundle $q_2 = \theta_2$ at a total tariff $T_2 = S_2^*$ and bundle $q_1$ at tariff $T_1 = \tfrac{1}{2} q_1^2$. Notice that there are two optimal paths connecting θ2 to θ0, corresponding to two different trees, #1 and #2. The weight $\mu_{21}$ put on this second path is determined by this indifference condition:

$$u_2 = 0 = (\theta_2 - \theta_1) \cdot q_1, \quad \text{where} \quad q_1 = \theta_1 - \frac{\mu_{21}}{f_1}(\theta_2 - \theta_1).$$

This gives

$$\mu_{21} = f_1 \frac{(\theta_2 - \theta_1) \cdot \theta_1}{(\theta_2 - \theta_1)^2},$$

which has to be between 0 and $f_2$. These conditions determine the boundary of this regime in the parameter space: $0 \le (\theta_2 - \theta_1) \cdot \theta_1 \le f_2 (\theta_2 - \theta_1) \cdot \theta_2$, or

$$\theta_1^2 \le \theta_1 \cdot \theta_2 \le \frac{\theta_1^2 + f_2 \theta_2^2}{1 + f_2}.$$

Notice that, in this case, we have that $u_1 = u_2 = 0$ but $q_1 \ne q_2$, which cannot arise in dimension 1.

The three cases (completely ordered, separable, and mixed) illustrate the three settings that generally arise in multidimensional models. When we place significant restrictions on preferences and heterogeneity, we can frequently obtain simpler solutions that correspond to the first two cases. We discuss these in the following section, and then consider variations on these themes in Sections 5–8. The mixed case corresponds to the more general and difficult setting we discuss in Section 9.
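The three regimes of the two-type example can be sketched as a small classifier. This is a minimal illustration rather than the authors' procedure: the function name `solve_two_type` and its return format are hypothetical, the type frequencies are assumed to satisfy $f_1 = 1 - f_2$, and $S_1^* < S_2^*$ is imposed as in the text. The separable regime is taken to be $\theta_1 \cdot \theta_2 \le \theta_1^2$, which is what the two inequalities $(\theta_2 - \theta_1) \cdot \theta_1 \le 0$ and $(\theta_1 - \theta_2) \cdot \theta_2 \le 0$ deliver when $\theta_1^2 \le \theta_2^2$.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def solve_two_type(theta1, theta2, f2):
    """Classify the regime and return the second-best allocation.

    Assumes two types with frequencies f1 = 1 - f2 and f2, zero marginal
    cost, quadratic utilities, and theta1.theta1 <= theta2.theta2.
    """
    f1 = 1.0 - f2
    d = [b - a for a, b in zip(theta1, theta2)]       # theta2 - theta1
    t11, t12, t22 = dot(theta1, theta1), dot(theta1, theta2), dot(theta2, theta2)
    if t11 > t22:
        raise ValueError("expected S1* <= S2*, i.e. |theta1|^2 <= |theta2|^2")

    upper = (t11 + f2 * t22) / (1.0 + f2)             # right side of (3.6)
    if t12 >= upper:
        regime, weight = "completely ordered", f2     # full weight on theta2 -> theta1
    elif t12 <= t11:
        regime, weight = "separable", 0.0             # no binding IC between types
    else:
        regime = "mixed"
        weight = f1 * dot(d, theta1) / dot(d, d)      # mu_21, from u2 = 0

    # q1 is distorted in the direction theta2 - theta1; q2 stays efficient.
    q1 = [a - (weight / f1) * di for a, di in zip(theta1, d)]
    q2 = list(theta2)
    u1 = max(0.0, -dot(d, q2))                        # max(0, (theta1 - theta2).q2)
    u2 = max(0.0, dot(d, q1))                         # max(0, (theta2 - theta1).q1)
    return {"regime": regime, "mu21": weight, "q1": q1, "q2": q2, "u1": u1, "u2": u2}


print(solve_two_type((1.0, 1.0), (2.0, 2.0), 0.25)["regime"])
print(solve_two_type((1.0, 0.0), (0.0, 2.0), 0.5)["regime"])
print(solve_two_type((1.0, 1.0), (2.0, 1.0), 0.8)["regime"])
```

In the third call the weight $\mu_{21}$ lies strictly between 0 and $f_2$, and the resulting $q_1$ differs from $q_2$ even though both rents are zero.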

4. AGGREGATION AND SEPARABILITY

In this section, we explore two cases where multidimensional problems can be effectively reduced to unidimensional problems: the case of aggregation, where a one-dimensional sufficient statistic can be found for representing unobservable preference heterogeneity, and the case of separability, where the set of types can be partitioned a priori into one-dimensional subsets. In the former setting, the binding IC constraints necessarily lie in a completely ordered graph, which is known a priori and corresponds to the completely ordered case, discussed in the previous section. In the latter setting, the incentive constraints can be partitioned into an exogenously given tree that is known a priori, which corresponds to the separable case.

4.1. Aggregation

A family of multidimensional screening problems that effectively reduce to one-dimensional problems is characterized by the existence of a sufficient statistic of dimension 1 that summarizes all relevant information on unobservable heterogeneity of types and that has an exogenously given distribution. Let us start with a trivial example where the sufficient statistic can be found immediately. Suppose that only one good is sold (n = 1), but types are bidimensional (m = 2) and social surplus is given by

$$S(\theta, q) = (\theta_1 + \theta_2) q - \tfrac{1}{2} q^2.$$

It is then obvious that $\hat\theta \equiv \theta_1 + \theta_2$ is a one-dimensional sufficient statistic for the consumer's preferences, and the monopolist's problem can be solved by applying the usual techniques to the distribution of $\hat\theta$.

Even in this simple transformable setting, however, we can see that not everything is the same as in the canonical one-dimensional model. The primary difference is the exclusion property discovered by Armstrong (1996). Suppose, indeed, that $\theta = (\theta_1, \theta_2)$ has a bounded density on $\mathbb{R}^2_+$, or on a rectangle $[\underline{\theta}_1, \bar\theta_1] \times [\underline{\theta}_2, \bar\theta_2]$ (or any domain with a "southwest" corner). Then, it is easy to see that the density of $\hat\theta$, obtained by convolution of the marginals²⁰ of $\theta_1$ and $\theta_2$, tends to zero when $\hat\theta$ tends to the lower bound of its support. As a result, the inverse hazard rate tends to infinity, which implies the existence of an exclusion region at the bottom that would not necessarily emerge if either $\theta_1$ or $\theta_2$ was observable and contractible. There is an associated intuition that relates to the envelope theorem: raising prices by $\varepsilon$ raises revenues from inframarginal buyers by a first-order amount at a loss of a second-order measure of consumers in the southwest corner of the support of types, $\varepsilon^2$. We will see this insight extends to more general settings; for example, Armstrong (1996) originally demonstrates this result for the separable setting discussed in the following section.²¹

It is worth noting that, whereas the aggregation technique appears trivial in our toy example, it is often more subtle and arises from a property of the market setting. For example, Biais, Martimort, and Rochet (2000) consider a market maker who sells a risky asset to a population of potential investors, characterized by two dimensions of adverse selection, $\theta = (\theta_1, \theta_2)$ (using our notation): $\theta_1$ corresponds to the investor's fundamental information, i.e., his evaluation of the asset's liquidation value [the true liquidation value is $\theta_1 + \tilde\varepsilon$, where $\tilde\varepsilon$ is $N(0, \sigma^2)$ and independent of $\theta_2$]; $\theta_2$ corresponds to a sort of personal taste variable, namely, the initial position of the investor in the risky asset (his hedging needs). If he buys $q$ units of the asset for a total price $P(q)$, the investor's final wealth is

$$\tilde W(q) = W_0 - P(q) + (\theta_1 + \tilde\varepsilon)(\theta_2 + q),$$

where $W_0$ denotes his initial endowment of money. Assuming that the investor has constant absolute risk aversion preferences [$u(W) = -e^{-\rho W}$], the certainty equivalent of trading $q$ units is

$$V(q) = W_0 - P(q) + \theta_1(\theta_2 + q) - \tfrac{1}{2}\rho\sigma^2(\theta_2 + q)^2.$$

Thus, the net utility of trading $q$ is given by

$$U = V(q) - V(0) = (\theta_1 - \rho\sigma^2\theta_2) q - \tfrac{1}{2}\rho\sigma^2 q^2 - P(q).$$

Even though the initial screening problem is bidimensional, the simplified version of the problem reduces to a one-dimensional screening problem with a sufficient statistic $\hat\theta = \theta_1 - \rho\sigma^2\theta_2$ that aggregates the two motives for trade.

Other examples of this sort appear in Laffont and Tirole (1993) and Ivaldi and Martimort (1994). Ivaldi and Martimort (1994) study a model of competition with two dimensions of preference heterogeneity, which, given their assumptions about distributions and preferences, aggregates into a model with a one-dimensional statistic. Laffont and Tirole (1993) study regulation of multidimensional firms in a model combining adverse selection and moral hazard. By assuming that costs are observable to the regulator, they effectively transform their problem into a pure screening model, amenable to the technique presented here. In particular, when the unobservable technological parameters of the firms (their "types") are multidimensional, Laffont and Tirole find conditions, inspired by the aggregation theorems of Blackorby and Schworm (1984), under which the type vectors can be aggregated into a single number.

20 As noticed by Miravete (1996), the monotone hazard-rate property is preserved by convolution.

21 Armstrong (1996) shows that the exclusion property is also true when $\Theta$ is strictly convex. However, suppose that $\theta$ is uniformly distributed on a rectangle that has been rotated 45 degrees: $\Theta = \{\theta \in \mathbb{R}^2 : \underline{\theta} \le \theta_1 + \theta_2 \le \bar\theta,\ -d \le \theta_1 - \theta_2 \le d\}$. Then, it is easy to see that $\hat\theta$ has a uniform distribution on $[\underline{\theta}, \bar\theta]$, which implies that $q^*(\theta) = 2\hat\theta - \bar\theta$ and that the exclusion region vanishes when $2\underline{\theta} > \bar\theta$. This shows that the exclusion property discovered by Armstrong (1996) is not intrinsically related to multidimensionality, but rather to the properties of the distribution of types.
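The exclusion property of Section 4.1 can be illustrated with a minimal sketch for the toy surplus $S(\theta, q) = (\theta_1 + \theta_2) q - \tfrac{1}{2} q^2$. The distributional choice is an assumption made for illustration only: $\theta_1$ and $\theta_2$ are taken to be independent and uniform on $[0, 1]$, so that $\hat\theta = \theta_1 + \theta_2$ has the triangular density on $[0, 2]$. The function names are hypothetical. Because the virtual type $\hat\theta - (1 - F)/f$ is increasing in this example, pointwise maximization of virtual surplus gives the quantity schedule.

```python
import math


def density_sum(t):
    """Density of theta_hat = theta1 + theta2 for i.i.d. uniforms on [0, 1]."""
    if 0.0 <= t <= 1.0:
        return t
    if 1.0 < t <= 2.0:
        return 2.0 - t
    return 0.0


def cdf_sum(t):
    """C.d.f. of theta_hat (triangular distribution on [0, 2])."""
    if t <= 0.0:
        return 0.0
    if t <= 1.0:
        return 0.5 * t * t
    if t <= 2.0:
        return 1.0 - 0.5 * (2.0 - t) ** 2
    return 1.0


def inverse_hazard(t):
    """(1 - F(t)) / f(t), defined on the open interval (0, 2)."""
    if not 0.0 < t < 2.0:
        raise ValueError("t must lie strictly inside the support (0, 2)")
    return (1.0 - cdf_sum(t)) / density_sum(t)


def optimal_quantity(t):
    """Maximizer of the virtual surplus (t - (1 - F)/f) q - q^2 / 2 over q >= 0."""
    return max(0.0, t - inverse_hazard(t))


# On [0, 1] the virtual type is (1.5 t^2 - 1) / t, which changes sign here:
EXCLUSION_THRESHOLD = math.sqrt(2.0 / 3.0)

for t in (0.01, 0.5, EXCLUSION_THRESHOLD, 1.5):
    print(t, inverse_hazard(t), optimal_quantity(t))
```

The inverse hazard rate grows without bound as $\hat\theta$ approaches 0, so all aggregate types below the threshold are excluded even though each marginal distribution is uniform.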

4.2. Separability

Wilson (1993a, 1993b) and Armstrong (1996) were the first to provide closed-form solutions to multidimensional screening models. These solutions are all of the separable type. An illustration can be given in our framework by assuming linear parametrization of surplus with respect to types,

$$S(\theta, q) = \theta \cdot q - W(q),$$

where $W$ is convex such that $\nabla W(0) = 0$, and a density $f$ of types that depends only on $\|\theta\|$. Consider, for example, the case where there are two goods (m = 2), and $f$ is the density of a truncated normal on a quarter of a circle of center 0 and radius $R > 1$. Wilson (1993a) and Armstrong (1996) find conditions under which the solution to the monopolist problem depends only on the distribution of types along the "rays" (i.e., the straight lines through the origin). In other words, they look for cases where the only binding IC constraints are "radial" (see Figure 5.4).

[Figure 5.4: the positive quadrant of the (θ1, θ2) plane, with arrows pointing toward the origin along rays, labeled "radial IC constraints".]

Figure 5.4. Radial incentive compatibility constraints.


If this is the case, the solution can be determined by computing the conditional distribution of types along the rays. This is done by introducing the change of variable $\theta = t\,(\cos\alpha, \sin\alpha)$, with $t \in [0, R]$ and $\alpha \in [0, \pi/2]$. The change of variable formula for multivariate densities gives the conditional density along the rays:

$$g(t) = t \exp\left(-\frac{t^2}{2}\right),$$

which does not depend on $\alpha$. The virtual surplus is easily computed as

$$\Phi(\theta, q) = S(\theta, q) - \frac{1 - G(t)}{g(t)} \frac{\theta \cdot q}{\|\theta\|},$$

which gives, after easy computations:

$$\Phi(\theta, q) = \left(1 - \frac{1}{\|\theta\|^2}\right) \theta \cdot q - W(q).$$

This virtual surplus is maximized for $q^*(\theta)$ defined implicitly by $\nabla W(q) = (1 - 1/\|\theta\|^2)\theta$ for $\|\theta\| \ge 1$, and $q = 0$ for $\|\theta\| < 1$. If we use the indirect surplus function $S^*(\theta) = \max_q \{\theta \cdot q - W(q)\}$, this is equivalent to

$$q^*(\theta) = \nabla S^*\left(\left[1 - 1/\|\theta\|^2\right]_+ \cdot \theta\right),$$

where $[x]_+$ denotes $\max(0, x)$. We now have to check whether this function $q^*$ satisfies the necessary conditions of the monopoly problem, namely boundary conditions and implementability conditions. The boundary conditions require that the boundary of $\Theta = \mathbb{R}^2_+$ be partitioned into two regions:

• $\partial_0\Theta$, where $u(\theta) \equiv 0$ (no rent);

• $\partial_1\Theta$, where the gradient of the surplus is tangent to the boundary (no distortion at the boundary).

These two regions are represented in Figure 5.5. Notice that the boundary condition is satisfied in $\partial_1\Theta$ only because the extreme rays are tangent to the boundary. This property would not be satisfied if the support of $\theta$ was shifted by an arbitrarily small vector. This more complex case is discussed in Section 9.2. On the other hand, Armstrong (1996) discovered a robust property of the solution, namely the existence of an exclusion region (where $u \equiv 0$): in our example, it corresponds to the region $\|\theta\| \le 1$. This is explained by the fact that, for "regular" distributions on $\mathbb{R}^2_+$ (similar properties hold for many other domains), the conditional densities along the rays tend to zero when $\|\theta\|$ tends to zero, which implies that inverse hazard rates tend to infinity, as discussed in Section 4.1.

It remains to check that implementability conditions are satisfied. Due to the linearity of preferences with respect to $\theta$, these implementability conditions are equivalent to saying that $q^*$ is the gradient of a convex function (i.e., that $Dq^*$ is a symmetric, positive definite matrix).
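The radial solution derived above can be sketched numerically. Two simplifications are assumed here and are not part of the text: the truncation radius is ignored (the limit of large $R$), so that $1 - G(t) = \exp(-t^2/2)$ and the inverse hazard rate along a ray is exactly $1/t$; and the cost function is taken to be $W(q) = \tfrac{1}{2}\|q\|^2$, so that $\nabla S^*$ is the identity map and $q^*(\theta) = [1 - 1/\|\theta\|^2]_+ \theta$. The function names are hypothetical.

```python
import math


def g(t):
    """Conditional density along a ray, g(t) = t exp(-t^2 / 2)."""
    return t * math.exp(-0.5 * t * t)


def survival(t):
    """1 - G(t) for the density g above, ignoring truncation at radius R."""
    return math.exp(-0.5 * t * t)


def inverse_hazard(t):
    """(1 - G(t)) / g(t), which equals 1 / t in this example."""
    return survival(t) / g(t)


def q_star(theta):
    """Candidate allocation [1 - 1/|theta|^2]_+ * theta, assuming W(q) = |q|^2 / 2."""
    norm2 = sum(x * x for x in theta)
    if norm2 <= 1.0:
        return [0.0 for _ in theta]          # exclusion region |theta| <= 1
    scale = 1.0 - 1.0 / norm2
    return [scale * x for x in theta]


print(inverse_hazard(2.0))      # compare with 1 / 2
print(q_star((2.0, 0.0)))       # type outside the unit disc: distorted downward
print(q_star((0.5, 0.5)))       # type inside the unit disc: excluded
```

With this quadratic cost, $S^*(\theta) = \tfrac{1}{2}\|\theta\|^2$ depends only on $\|\theta\|$, so the example falls in the symmetric case in which the radial candidate is implementable.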
Easy computations show that symmetry is equivalent to saying that $S^*(\theta)$ depends only on $\|\theta\|$; that is, it possesses the same type of symmetry as the density of types. If this property is satisfied, the second-order conditions for implementability (i.e., the fact that $Dq^*$ is positive definite) will be automatically satisfied. When this is not the case, the solution is much more complex to characterize, because bunching necessarily appears. We study such an example in Section 9.2.

[Figure 5.5: the type space Θ in the (θ1, θ2) plane, showing the exclusion region u ≡ 0 inside the unit radius, the lower boundary ∂0Θ, and the upper boundary ∂1Θ, labeled "no distortion at the boundary".]

Figure 5.5. Exclusion region and boundaries.

5. ENVIRONMENTS WITH ONE-DIMENSIONAL INSTRUMENTS

In many multidimensional screening problems, there are more dimensions of heterogeneity than instruments available to the principal (n < m). Here, we turn attention to the case of screening problems with one instrument (n = 1), but several parameters of adverse selection (m > 1) in which, even though a univariate sufficient statistic exists, its distribution is endogenous, depending on the pricing schedule chosen by the firm. Typically, the set of instruments may be limited either for exogenous reasons [see, e.g., the justifications given by Rochet and Stole (2002) for ruling out stochastic contracts] or because the principal restricts herself to a subclass of all possible instruments. For example, Armstrong (1996) focuses on cost-based tariffs in his search for optimal nonlinear prices for a monopolist.²² Using our notation, the monopolist problem in Armstrong (1996) can then be simplified by computing indirect utilities

$$V(y, \theta) = \max_q \{v(q, \theta) \mid C(q) \le y\},$$

representing the maximum utility attained by a consumer of type $\theta$ who gets a bundle of total cost less than or equal to $y$. The problem then reduces to finding the best one-dimensional schedule $T(y)$ (n = 1) for screening a multidimensional distribution of buyers (m > 1).

As in the one-dimensional case, there are two approaches available for this class of problems: the parametric-utility approach and the demand-profile approach. The demand-profile approach is typically far easier to implement, provided that the consumer's preferences can be accurately summarized by a demand profile that depends only on the marginal prices. Laffont, Maskin, and Rochet (1987) solved such a problem using the parametric-utility approach. Consider the scenario in which a monopolist sells only one good (n = 1) to buyers differing by two characteristics: the intercept $\theta_1$ and the slope $-\theta_2$ of their (individual) inverse demand curves. This corresponds to the following parameterization of preferences:

$$v(q, \theta) = \theta_1 q - \tfrac{1}{2}\theta_2 q^2.$$

If we want to apply the parametric-utility methodology, we are confronted with the problem that implementability of an indirect utility function $u(\cdot)$ is more complex to characterize. Indeed, let $P(q)$ be a given price schedule. The corresponding indirect utility $u$ and allocation rule $q$ satisfy

$$u(\theta) = \max_q \left\{ \theta_1 q - \tfrac{1}{2}\theta_2 q^2 - P(q) \right\},$$

where the maximum is attained for $q = q(\theta)$. By the envelope principle, we have that $u$ is again a convex function such that

$$\nabla u(\theta) = \begin{pmatrix} q(\theta) \\ -\tfrac{1}{2} q^2(\theta) \end{pmatrix} \quad \text{for a.e. } \theta.$$

This shows that $u$ necessarily satisfies a nonlinear partial-differential equation:

$$\frac{\partial u}{\partial\theta_2} + \frac{1}{2}\left(\frac{\partial u}{\partial\theta_1}\right)^2 = 0. \tag{5.1}$$

The monopolist's problem can then be transformed as before into a calculus of variations problem in $u$ and $\nabla u$, but with the additional constraint (5.1) that makes the program difficult. Interestingly, Wilson's demand-profile approach works very well in this case.

22 Similarly, several authors, e.g., Zheng (2000) and Che and Gale (1996a, 1996b), have studied score auctions, a particular subclass of multidimensional auctions in which the auctioneer aggregates bids using a prespecified scoring rule. As another example, Armstrong and Vickers (2000) consider price-cap regulation under the restriction of no lump-sum transfers.
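The demand-profile approach for the linear-demand example can be sketched numerically. For each quantity $q$, the marginal price maximizes $(p - c) N(p, q)$, where $N(p, q) = \text{Prob}[\theta_1 - \theta_2 q \ge p]$ is the mass of buyers whose marginal valuation for the $q$-th unit is at least $p$. The sketch below rests on illustrative assumptions that are not part of the text: $\theta_1$ is exponential with rate `lam1`, $\theta_2$ is independent of $\theta_1$ and takes finitely many nonnegative values, and the maximization uses a simple ternary search, which is valid here because the objective is single-peaked in $p$. All function and parameter names are hypothetical.

```python
import math


def demand_profile(p, q, lam1, theta2_points):
    """N(p, q) = Prob[theta1 - theta2 * q >= p].

    theta1 ~ Exponential(lam1); theta2 is discrete, given as a list of
    (value, probability) pairs, and independent of theta1.
    """
    # For an exponential, 1 - F1(x) = exp(-lam1 * x) when x >= 0, and 1 otherwise.
    return sum(w * math.exp(-lam1 * max(0.0, p + t2 * q)) for t2, w in theta2_points)


def optimal_marginal_price(q, c, lam1, theta2_points, width=50.0, tol=1e-9):
    """Maximize (p - c) * N(p, q) over p in [c, c + width] by ternary search."""
    def objective(p):
        return (p - c) * demand_profile(p, q, lam1, theta2_points)

    lo, hi = c, c + width
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1          # the peak lies to the right of m1
        else:
            hi = m2          # the peak lies to the left of m2
    return 0.5 * (lo + hi)


theta2_points = [(0.5, 0.5), (1.5, 0.5)]   # assumed two-point distribution of slopes
for q in (0.5, 1.0, 2.0):
    print(q, optimal_marginal_price(q, c=1.0, lam1=2.0, theta2_points=theta2_points))
```

With an exponential $\theta_1$ the profile factors as $e^{-\lambda_1 p}$ times a term that depends only on $q$, so the computed marginal price should be close to $c + 1/\lambda_1$ at every quantity, whatever the distribution of $\theta_2$.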
Let us define the demand profile for quantity $q$ at marginal price $p$ as

$$N(p, q) = \text{Prob}[v_q(q, \theta) \ge p] = \text{Prob}[\theta_1 - \theta_2 q \ge p].$$

Assuming a constant marginal cost $c$, the optimal marginal price $p(q) = P'(q)$ can be obtained by maximizing $(p - c) N(p, q)$ with respect to $p$. If $\theta_1$ and $\theta_2$ are distributed independently according to cumulative distributions $F_1$ and $F_2$ (and densities $f_1$ and $f_2$), we obtain

$$N(p, q) = \int_0^{+\infty} \{1 - F_1(p + \theta_2 q)\} f_2(\theta_2)\, d\theta_2.$$

The optimal marginal price is defined implicitly by

$$p(q) = c - \frac{N(p(q), q)}{N_p(p(q), q)} = c + \frac{\int_0^{+\infty} \{1 - F_1(p(q) + \theta_2 q)\} f_2(\theta_2)\, d\theta_2}{\int_0^{+\infty} f_1(p(q) + \theta_2 q) f_2(\theta_2)\, d\theta_2},$$

which generalizes the classical formula obtained when $\theta_2$ is nonstochastic:

$$p(q) = c + \frac{1 - F_1(p(q) + \theta_2 q)}{f_1(p(q) + \theta_2 q)}.$$

For example, when $\theta_1$ is exponentially distributed (i.e., $f_1(\theta_1) = \lambda_1 e^{-\lambda_1\theta_1}$), the mark-up is constant and the two formulas coincide: $p(q) = c + 1/\lambda_1$. Notice also that $\hat\theta = \theta_1 - \theta_2 q(\theta)$ is a univariate sufficient statistic, but unlike the case considered in Section 4.1, its distribution depends on $q(\theta)$ and thus on the price schedule chosen by the monopolist.

We now turn to a subset of these models with a single instrument, in which one dimension of type enters utilities additively.

6. ENVIRONMENTS WITH RANDOM PARTICIPATION

6.1. A General Framework

We consider a class of environments in which n = 1, but in which a particular additivity assumption provides sufficient structure to produce some general economic conclusions. Specifically, suppose that n = 1 and m = 2, but utility of the agent is restricted to the form

$$u = v(q, \theta_1) - \theta_2 - P,$$

where $\Theta_1 = [\underline{\theta}_1, \bar\theta_1]$ and $\Theta_2 = \mathbb{R}_+$. Several interesting economic settings can be studied within this model.

First, we can think of the $\theta_2$ parameter as capturing a type-dependent participation constraint. Previous work on type-dependent participation has assumed that $\theta_2$ is a deterministic function of $\theta_1$ (e.g., they are perfectly correlated).²³ In this sense, the framework generalizes the previous one-dimensional literature, although many of the more interesting results rely on independent distributions of $\theta_1$ and $\theta_2$.

23 See, for example, Maggi and Rodriguez-Clare (1995), Lewis and Sappington (1989a, 1989b), and Jullien (2000).


Second, one can think of $\theta_2$ as capturing a "locational cost" in a discrete-choice model of consumer behavior.²⁴ This allows one to extend the nonlinear pricing model of Mussa and Rosen (1978) to a more general setting, which may be important to obtain a more realistic model of consumer behavior. As an illustration, consider the predicted consumer behavior of the standard, one-dimensional model following a uniform price increase from $P(q)$ to $P(q) + \delta$: the units sold at every quality level except the lowest should remain unchanged. This is because a shift in $P(q)$ has no effect on any of the incentive compatibility conditions, since the shift occurs on both sides of the constraints. By adding the stochastic utility effect of $\theta_2$, predicted market shares would smoothly change for all types, although perhaps more dramatically for lower types.

Third, consider the regulatory setting first discussed in Baron and Myerson (1982). There, a regulator designs an optimal mechanism for regulating a monopoly with unknown marginal cost. Suppose that, in addition, fixed costs are also private information: i.e., $C(q) = \theta_1 q + \theta_2$. Profit for the regulated firm that receives $T(q)$ as a transfer from the regulator for producing $q$ units is $\pi = T(q) - \theta_1 q - \theta_2$, which has a one-to-one correspondence with the previous monopoly setting.²⁵

Other closely related examples that we discuss in more detail include selling to liquidity-constrained buyers, where $\theta_2$ captures the buyer's available budget, regulation of a firm in an environment with demand and cost heterogeneity, competition between oligopolists selling differentiated products with nonlinear pricing, and competition among sellers providing goods via auctions.

The key simplifications in all of these settings are twofold. First, one dimension of information enters additively. As such, $q$ is unavailable for direct screening on this additive attribute. Second, attention is limited to deterministic²⁶ price schedules, $P(q)$.

24 See Anderson, de Palma, and Thisse (1992) for a review of this large literature, and Berry, Levinsohn, and Pakes (1995) for an econometric justification of the additive specification.

25 Rochet (1984) first solved this problem on an example with general mechanisms that rely on randomization. Applying Rochet and Stole's (2002) results to this context is appropriate in the restricted setting in which the price schedule is deterministic. In this case, Rochet and Stole (2002) show that the presence of uncertainty over fixed costs causes the optimal regulation to reduce the extent of the production distortion.

26 Given the relevance of deterministic contracts, this may seem a reasonable restriction, a priori. In general, however, the principal may be able to do better by introducing a second screening instrument, $\phi$, which represents the probability that the agent is turned away with $q = 0$. In this case, utility becomes $\phi(v(q, \theta_1) - \theta_2 - P)$ and $\phi$ can be used to screen different values of $\theta_2$. On the other hand, it is without loss of generality to rule out such random mechanisms when either (i) the value $\theta_2$ is lost by participating in the mechanism (i.e., even if $\phi = 0$), which eliminates the possibility to screen over $\theta_2$; or (ii) the agent can anonymously return to the principal until $\phi = 1$ is realized, in which case the problem is stationary and the agent will continue to return until $q > 0$, so there is no benefit to the randomization. We leave the discussion of stochastic mechanisms unresolved and simply restrict attention to deterministic price schedules, remaining agnostic about the reasons.


We take the joint density to be $f(\theta_1, \theta_2) > 0$ on $\Theta_1 \times \Theta_2$, the marginal distribution of $\theta_1$ as $f_1(\theta_1)$, and the conditional cumulative distribution function for $\theta_2$ as

$$G(\theta_2 \mid \theta_1) \equiv \int_{\underline{\theta}_2}^{\theta_2} f(\theta_1, t)\, dt.$$

Define the indirect utility function

$$u(\theta_1) \equiv \max_{q \in Q}\, v(q, \theta_1) - P(q).$$

This indirect utility is independent of the additive component, $\theta_2$, because it does not affect the optimal choice of $q$, conditional on $q > 0$. Net utility is given by $u(\theta_1) - \theta_2$. Note that the agent's participation contains an additional random component: i.e., the agent participates iff $u(\theta_1) \ge \theta_2$. Hence, an agent with type $\theta_1$ participates with probability $G(u(\theta_1) \mid \theta_1)$, and the expected profit of a mechanism that generates $\{q(\theta_1), u(\theta_1)\}$ for all participating agents is

$$\int_{\Theta_1} G(u(\theta_1) \mid \theta_1)\, (S(q(\theta_1), \theta_1) - u(\theta_1))\, f_1(\theta_1)\, d\theta_1.$$

This is maximized subject to the standard one-dimensional incentive compatibility conditions: $\dot u(\theta_1) = v_{\theta_1}(q(\theta_1), \theta_1)$ and $q(\theta_1)$ nondecreasing. In short, we have removed the typical corner condition that would require the utility of the lowest type, which we denote $\underline{u} \equiv u(\underline{\theta}_1)$, to be zero, and instead introduced an endogenous determination of $\underline{u}$.

The endogeneity of $\underline{u}$ poses some difficulties that were not present in the one-dimensional setting. First and foremost, part of the block-recursive structure is now lost: There is a nonrecursive aspect to the problem as the entire function $q(\theta_1)$ and the initial condition $u(\underline{\theta}_1)$ must be jointly determined. Given that a purchasing consumer's preferences are ordered by a single-crossing property in $(\theta_1, q)$, the general problem of global vs. local incentive constraints is not present; incentive constraints are still recursive in their structure, although we may have to restrict $q$ to a nondecreasing allocation. The problem is that the first-order condition determining the optimal utility for the lowest type, $\underline{u}$, depends on the optimal quantity schedule, $\{q(\theta_1)\}_{\theta_1 \in \Theta_1}$, and the first-order equation for the latter (specifically, the Euler equation) depends on the value of the former. Thus, although the resulting system of equations is not a system of partial differential equations as is common in the general multidimensional continuous type setting, but rather a second-order boundary-value problem, it is still more complicated than the standard initial-value first-order problem that arises in the canonical class of one-dimensional models. Finding general characteristics of the solution is difficult without imposing some additional structure.
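The expected-profit objective with random participation can be evaluated numerically for any candidate mechanism. The sketch below is only an evaluator, not a solver, and rests on assumptions chosen for illustration: $v(q, \theta_1) = \theta_1 q$ and $S(q, \theta_1) = \theta_1 q - \tfrac{1}{2} q^2$, $\theta_1$ uniform on $[4, 5]$, and $\theta_2$ independent of $\theta_1$ with $G(x) = 1 - e^{-x/\sigma}$. Rents are accumulated from the envelope condition $\dot u(\theta_1) = q(\theta_1)$ starting from a chosen lowest-type utility `u_low`. The function names are hypothetical.

```python
import math


def expected_profit(q, u_low, sigma, lo=4.0, hi=5.0, n=20_000):
    """Approximate the integral of G(u) * (S(q, t) - u) * f1(t) over [lo, hi].

    q      : allocation rule, a function of theta1 (assumed nondecreasing)
    u_low  : utility left to the lowest type, u(lo)
    sigma  : scale of the exponential distribution of theta2
    Assumes v = t * q, S = t * q - q^2 / 2, and theta1 uniform on [lo, hi].
    """
    h = (hi - lo) / n
    density = 1.0 / (hi - lo)
    u_left = u_low                      # u at the left edge of the current cell
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h          # cell midpoint
        q_t = q(t)
        u_mid = u_left + 0.5 * h * q_t  # envelope condition: du/dt = q
        participation = 1.0 - math.exp(-max(u_mid, 0.0) / sigma)
        surplus = t * q_t - 0.5 * q_t * q_t
        total += participation * (surplus - u_mid) * density * h
        u_left += h * q_t
    return total


def q_mr(t):
    """Standard one-dimensional allocation for uniform types on [4, 5]."""
    return 2.0 * t - 5.0


def q_fb(t):
    """First-best allocation under S = t q - q^2 / 2."""
    return t


for sigma in (1e-3, 0.25, 1.0):
    print(sigma, expected_profit(q_mr, 0.0, sigma), expected_profit(q_fb, 0.0, sigma))
```

When $\sigma$ is very small, participation is nearly certain and the downward-distorted schedule `q_mr` earns more than the first-best schedule; with more participation noise, leaving a positive `u_low` can raise profit because it raises the probability of participation for every type.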
A convenient restriction used in Rochet and Stole (2002) is to focus attention on independent distributions of $\theta_1$ and $\theta_2$, requiring that the former is distributed uniformly on $\Theta_1$ and that the latter have a log-concave conditional cumulative distribution function.²⁷ Even with these distributional simplifications, the additional effect on market share still provides substantial difficulty. The primary cause of the difficulties is that the relaxed program (without monotonicity imposed) frequently generates nonmonotonic solutions. Hence, pooling occurs even with nonpathological distributions. Nonetheless, as a first result, one can show that if pooling occurs, it occurs only for a lower interval on $\Theta_1$ and that otherwise efficiency occurs on the boundaries of $\Theta_1$. This already is a substantial departure from the one-dimensional setting, and shares many similarities with the work of Rochet and Choné (1998) (see Section 9.2), especially the general presence of bunching and the efficiency on the boundaries.

Several results emerge beyond pooling or efficiency at the bottom. First, as the distribution on $\Theta_2$ converges to an atom at $\theta_2 = 0$, the optimal allocation converges to that of the standard one-dimensional setting. Second, one can demonstrate that the optimal solution is always bounded above by the first-best allocation and below by the Mussa-Rosen (MR) allocation. This last result has a clear economic intuition behind it. Under the standard one-dimensional setting, there is no reason not to extract the maximal amount of rent from the agents. This is the reason for distorting output downward; it allows the principal to extract greater rents from the higher types without completely shutting off the lower types from the market. When participation depends monotonically on the amount of rent left to the agent, it seems natural to leave more rents to the agent on the margin, and therefore to reduce the magnitude of the distortions. The argument is a bit more involved than this, because the presence of pooling eliminates these simple envelope-style arguments.

These results can be illustrated with a numerical example. Suppose that $\Theta_1 = [4, 5]$ and $G(\theta_2) = 1 - e^{-\theta_2/\sigma}$. Here, we use $\sigma$ as a crude measure of the amount of noise in the participation constraint. As $\sigma$ goes to zero, the exponential distribution converges to an atom on zero.

27 In Rochet and Stole (2002), some general results are nonetheless available in the two-type setting, providing that $G(\theta_2 \mid \theta_1)$ is log-concave in $\theta_2$.
In the example, as σ becomes small, the optimal allocation converges pointwise to the MR allocation, although pooling emerges at the bottom. For σ sufﬁciently large, the allocation becomes efﬁcient on the boundaries of 1 (see Figure 5.6). Returning to our previous discussion of other applications, it should be clear that these results immediately extend to the regulatory environment of Baron and Myerson (1982), where marginal and ﬁxed costs are represented by θ1 and θ2 , respectively, and the regulator is restricted to offering a deterministic, nonlinear transfer schedule. Other settings ﬁt into this class of models in a less obvious manner. For example, consider the papers of Lewis and Sappington (1988) and Armstrong (1999a), which look at regulation of a ﬁrm in an environment of twodimensional private information: demand is q = x − p, marginal cost is c, and the ﬁrm’s private information is (x, c). The regulator observes only the price of the ﬁrm’s output and offers a transfer that depends on price, T ( p). The ﬁrm’s payoff is u = (x − p)( p − c) + T ( p); the regulator maximizes consumer surplus less transfer, W = 12 (x − p)2 − T ( p). Redeﬁne the private information as θ1 = x + c and θ2 = xc. Following similar arguments as previously described after substituting for the demand function,


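[Figure 5.6 (plot not reproduced): quality q(θ1) against type θ1 ∈ [4, 5], showing the first-best allocation qFB(θ1), the MR allocation qMR(θ1), and the optimal allocation for several values of σ (legible labels include σ = 1.0 and σ = 0.25).]

Figure 5.6. The monopoly solution with a uniform distribution of θ1 on [4, 5] and an exponential distribution of θ2: G(u) = 1 − e^{−u}.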

we can define $u(\theta_1) \equiv \max_p \, \theta_1 p - p^2 + T(p)$ and p(θ1) to be the corresponding maximizer. Note that the local IC constraint requires that $\dot{u}(\theta_1) = p(\theta_1) \ge 0$; second-order conditions require that u is convex in θ1 (i.e., p is nondecreasing in θ1). The firm will participate if and only if u(θ1) ≥ θ2. The regulator’s program can then be written as

\[
\max_{\{p(\theta_1),\, u(\theta_1)\}} \int_{\Theta_1} G(u(\theta_1) \mid \theta_1) \times \Big[ \gamma_1(\theta_1, u(\theta_1)) + p(\theta_1)\,\gamma_2(\theta_1, u(\theta_1)) - \tfrac{1}{2}\, p(\theta_1)^2 - u(\theta_1) \Big] \, d\theta_1,
\]

subject to $\dot{u}(\theta_1) = p(\theta_1)$ and $\ddot{u}(\theta_1) \ge 0$, where $\gamma_1(\theta_1, \theta_2) = E[\tfrac{1}{2} x^2 \mid \theta_1, \tilde{\theta}_2 \le \theta_2]$ and $\gamma_2(\theta_1, \theta_2) = E[c \mid \theta_1, \tilde{\theta}_2 \le \theta_2]$.

As another example, work on optimal taxation frequently treats the rents left to agents as part of the principal’s objective function. Here, there is a natural connection to this class of models.

As a last example, it is worth noting the recent work of Che and Gale (2000) on two-dimensional screening when one of the dimensions is the budget constraint of the buyer. In their framework, the monopolist is selling a good to a consumer with preferences u = θ1 q − P, but with a budget constraint given by θ2. Hence, the indirect utility function is necessarily two-dimensional:

\[
u(\theta_1, \theta_2) \equiv \max_{\{q \,\mid\, P(q) \le \theta_2\}} \theta_1 q - P(q).
\]


This is a departure from the basic model presented, in that θ2 does not enter utility linearly, and monetary payments can be directly informative about the buyer’s budget θ2, because a buyer cannot pay more than he has available. Although this problem looks more complicated than the previous setting, the authors demonstrate that an optimal nonlinear pricing schedule is increasing, convex, and goes through the origin. This pins down the utility of the lowest type, $\underline{u}$; efficiency at the top determines the other boundary. Although the resulting Euler equation generates a second-order differential equation, the solution can be found analytically in many simple examples. Formally, this setting differs from the previous one in that the variable θ2 represents dissipated surplus in the case of Rochet and Stole (2002), but θ2 represents a constraint on how much money can be transferred to the principal in Che and Gale (2000). This minor difference nonetheless translates into a significant effect on the nature of the solution: in Rochet and Stole (2002) the determination of the participation region is more difficult than in Che and Gale’s (2000) setting, where the authors are able to demonstrate that the optimal tariff goes through the origin and generates full participation, albeit with distorted consumption.²⁸

Finally, it is worth pointing out that the general class of problems contained in this section is closely related to models of nonlinear pricing with income effects. As Wilson (1993a) has noted in his discussion of income effects (i.e., in models in which the marginal utility of money is not constant, but varies either with wealth levels or with some related parameterization), in general the Euler conditions for optimality will consist of second-order differential equations (rather than first-order in the canonical case) and fixed fees may be part of an optimal pricing schedule.
Using the demand-profile approach, suppose that the income effect is modeled by a nonlinearity in money:

\[
N[P, p, q] = \mathrm{Prob}\big[\theta \in \Theta \;\big|\; \mathrm{MRS}(q, I - P(q), \theta) \ge p(q)\big].
\]

28. One may be tempted to solve the budget-constrained class of problems in Che and Gale (2000) by appealing to the aggregation results presented earlier. In particular, a natural candidate for a sufficient statistic when there are unit demands, q ∈ [0, 1], is θ = min{θ1, θ2}. This line of reasoning is flawed because min{θ1, θ2} is not a sufficient statistic for the consumer’s marginal rate of substitution between money and q. A simple example from Che and Gale (2000) demonstrates this most clearly. Suppose first that the consumer’s valuation for the good is distributed uniformly on Θ1 = [0, 1] and the consumer’s wealth is nonstochastic and equal to θ2 = 2. The revenue-maximizing unit price is P(1) = 1/2 and expected revenues are 1/4. Utilizing a price-quantity schedule cannot further increase revenues. Now, suppose instead that the consumer’s valuation is fixed at θ1 = 2, but wealth is a random variable distributed uniformly on Θ2 = [0, 1]. In this case, min{θ1, θ2} has the same distribution as in the former setting, but now the monopolist can raise expected revenues by charging the price schedule P(q) = 2q for q ∈ [0, 1]. Each consumer of type θ2 purchases the fraction q = θ2/2, and expected revenues are 1/2. Aggregation fails because the marginal rates of substitution differ across the two settings and are not functions of the same aggregate statistic. In the first, the marginal rate of substitution of q for money is θ1 if the total purchase price is less than or equal to 2 and 0 if the total price is greater than 2. In the second setting, the marginal rate of substitution is 2 if the total price is less than or equal to θ2, and 0 otherwise.
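The two revenue figures in footnote 28 can be checked with a short numerical sketch. The function names and grid sizes below are illustrative choices, not part of Che and Gale's analysis; the code only evaluates the two cases described in the footnote.

```python
def best_unit_price(n_grid: int = 1000):
    """Grid-search the posted price p in [0, 1] that maximizes p * (1 - p).

    This is expected revenue from a unit price when the binding characteristic
    (valuation in the first setting, wealth in the second) is uniform on [0, 1],
    so that a sale occurs with probability 1 - p.
    """
    best_p, best_rev = 0.0, 0.0
    for k in range(n_grid + 1):
        p = k / n_grid
        rev = p * (1.0 - p)
        if rev > best_rev:
            best_p, best_rev = p, rev
    return best_p, best_rev


def schedule_revenue(n_grid: int = 100_000) -> float:
    """Expected revenue from P(q) = 2q when valuation is 2 and wealth ~ U[0, 1].

    A buyer with wealth w buys q = w / 2 and pays 2q = w (midpoint rule over w).
    """
    total = 0.0
    for k in range(n_grid):
        w = (k + 0.5) / n_grid
        q = w / 2.0
        total += 2.0 * q
    return total / n_grid


if __name__ == "__main__":
    print(best_unit_price())
    print(schedule_revenue())
```

In both settings the best unit price yields the same revenue, so the gain in the second setting comes entirely from the price-quantity schedule, which is exactly why min{θ1, θ2} cannot serve as a sufficient statistic.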


Here, I represents income and the demand profile depends on the marginal price, p(q), and the total price level, P(q), since the latter affects the marginal rate of substitution of q for money. The Euler equation is

\[
\frac{\partial N}{\partial P}\,[p(q) - C'(q)] - \frac{d}{dq}\left\{ N + \frac{\partial N}{\partial p}\,[p(q) - C'(q)] \right\} = 0.
\]

Because the second component is totally differentiated with respect to q, a second-order differential equation arises. The problem loses much of its tractability because N now depends on the total price level P as well as marginal price, p. Economically, the problem is complicated because the choice of a marginal price for some q will shift the consumer’s demand curve via an income effect, which will affect the optimality of other marginal prices. Hence, the program is no longer block-recursive in structure as in Rochet and Stole (2002). As one raises the marginal price of a given level of output, one also lowers the participation rate for all consumers who consume that margin or greater. It is not a coincidence that, in some models of self-selection, private information over income, and exponential utility, the nature of the optimal allocation resembles that of the allocations in the nonlinear pricing context with random participation, as in Salanié (1990) and Laffont and Rochet (1998).

7. COMPETITIVE ENVIRONMENTS

This section builds on the previous sections by applying various models to study the effects of competition on the design of screening contracts. There have been some limited attempts to model imperfect competition between firms competing with nonlinear prices within a one-dimensional framework. This, for example, is the approach taken in the papers by Spulber (1989), Ivaldi and Martimort (1994), and Stole (1995). Similarly, in most work on common agency in screening environments (e.g., Stole 1991 and Martimort 1992, 1996), the agent’s private information is of one dimension.
Unfortunately, as argued previously, competitive models naturally suggest at least two dimensions of heterogeneity; so, the robustness of these approaches may be called into question. Several papers have considered competitive nonlinear pricing using a variety of methodologies. We briefly survey a few papers that use the demand-profile methodology with some limited success. We then present a specific form of bidimensional heterogeneity that has been successful in applied work.

7.1. A Variety of Demand-Profile Approaches

Wilson (1993a, Chapter 12) surveys the basic economics of ﬁrms competing with nonlinear prices, outlining two general classes of models. The ﬁrst category supposes that there is some product differentiation between the ﬁrms. As before, an aggregate demand proﬁle can be constructed that measures the proportion of consumers who buy from ﬁrm i at least q units when the marginal price is p;


this demand profile obviously depends on the nonlinear price schedules offered by the other firms. The first-order conditions for optimality now include terms capturing the flux of consumer purchases on the boundaries, but also isolate a competitive externality. Wilson numerically solves two models of this sort.

A second category of models discussed by Wilson (1993a) assumes that products are homogeneous. Now, to avoid the outcome of zero-profit, marginal-cost pricing between the competing firms, one has to assume some sort of extensive form game (e.g., a Cournot game where output is brought to market and then is subsequently priced with nonlinear price schedules, etc.). Several games are considered with a variety of strategic restrictions and results in Oren, Smith, and Wilson (1982) using the demand-profile approach.

7.2. A Specific Approach: Location Models (Hotelling Type)

The third, more recent, approach has been to model competition in multidimensional environments in which simple aggregation is not available by introducing one dimension of uncertainty to handle the differentiation between firms (e.g., brand location and, more generally, “horizontal” heterogeneity) and another dimension to capture important characteristics of consumer tastes that may be similar in effect across all firms (e.g., marginal willingness to pay for quantity/quality and, more generally, “vertical” heterogeneity). Recent papers that take this approach include Armstrong and Vickers (1999), Biglaiser and Mezzetti (1999), Rochet and Stole (2002), and Schmidt-Mohr and Villas-Boas (1999), among others. We briefly survey the model and results in Rochet and Stole (1997, 2002) before remarking on the similar treatments by other authors.

As we suggest, this framework for modeling oligopoly markets is quite general; we need only posit some distribution of horizontal preferences.²⁹ What is fundamental is that our proposed model affords a vertical preference parameter along the lines of Mussa and Rosen (1978), while also incorporating a measure of imperfect competition by allowing for distinct horizontal preferences.³⁰

For simplicity, consider the case of two firms competing at either end of a market with unit length and transportation cost σ. We will let $\theta_2^L \equiv \theta_2$ denote the distance from a consumer located at θ2 to the left firm and $\theta_2^R \equiv 1 - \theta_2$ denote the distance from the same consumer to the right firm. Preferences are as before: For a consumer of type (θ1, θ2) consuming from firm j, an amount $q_j$ at a price

29. Such a framework has been usefully employed recently by Laffont, Rey, and Tirole (1998a, 1998b) and Dessein (1999) for studying competition between telecommunications networks.

30. This modeling of competition is in the spirit of some recent empirical work on price discrimination. Leslie (1999), for example, in his study of Broadway theater ticket pricing finds it useful to incorporate heterogeneous valuations of outside alternatives to capture the presence of competing firms while maintaining a distinct form of vertical heterogeneity (in this case, income) to capture variation in preferences over quality. Because Leslie (1999) takes the quality of theater seats as fixed, he does not solve for the optimal quality-price schedule. Similarly, Ginsburgh and Weber (1996) use a Hotelling-type model to study price discrimination in the European car market.


of $P_j$, the consumer obtains utility of $\theta_1 q_j - \sigma \theta_2^{j} - P_j$. We further assume that θ1 is distributed independently of θ2, with F(θ1) and G(θ2) representing the distribution of types, respectively. Each firm simultaneously posts a publicly observable price schedule, $P_j(q_j)$, after which each consumer decides which firm (if any) to visit and which price-quality pair to select. The market share of firm j among consumers of type (θ1, θ2) can be computed easily:

\[
M_j(u_j, u_k) = G_j\!\left( \min\left\{ \frac{u_j}{\sigma},\; \frac{1}{2} + \frac{u_j - u_k}{2\sigma} \right\} \right). \tag{7.1}
\]

This comes from the fact that the marginal consumer of firm j is located at a distance that is the minimum of $u_j(\theta_1)/\sigma$ (which occurs when the total market shares are less than one, the local monopoly regime) and $\tfrac{1}{2} + (u_j - u_k)/2\sigma$ (which occurs when all the market is served, the competitive regime). Again, using the dual approach, we can write the total expected profit of firm j as a functional involving the consumers’ rents $u_j(\cdot)$ and $u_k(\cdot)$ taken as the strategic variables of the two firms:

\[
u_j(\theta_1) \equiv \max_q \, \theta_1 q - P_j(q),
\]

where $P_j$ is the price schedule chosen by firm j. We obtain

\[
B_j(u_j, u_k) = \int_{\underline{\theta}_1}^{\bar{\theta}_1} \{ S(t, q_j(t)) - u_j(t) \}\, M_j(u_j(t), u_k(t)) \, dt, \tag{7.2}
\]

where $q_j$ is again related to $u_j$ by the first-order differential equation $\dot{u}_j(\theta_1) = q_j(\theta_1)$.

We now look for a Nash equilibrium of the normal form game defined by (7.1) and (7.2), where the strategy spaces of the firms have been restricted to $u_j$ consistent with nondecreasing quality allocations. This turns out to be a difficult task in general, because of the monotonicity conditions [remember that $q_L(\cdot)$ and $q_R(\cdot)$ have to be nondecreasing]. However, if we neglect these monotonicity conditions (which can be checked ex post), competitive nonlinear prices can be characterized by a straightforward set of Hamiltonian equations.

A numerical example is illustrative. Consider, for example, the case when θ1 is uniformly distributed on [4, 5], which is shown in Figure 5.7. For σ sufficiently large (i.e., σ > 14.8), the market shares of the two firms do not adjoin: each firm is in a (local) monopoly situation and the quality allocation is exactly the same as in our previously analyzed monopoly setting.³¹ Interestingly, when the market shares are adjoining for high θ1 (i.e., $u_L(\bar{\theta}_1) + u_R(\bar{\theta}_1) \ge \sigma$), but not all θ1 (i.e., $u_L(\underline{\theta}_1) + u_R(\underline{\theta}_1) < \sigma$), the qualitative pattern of the solution remains identical (cf. Figure 5.7 below for σ = 10). However, when $u_L(\underline{\theta}_1) + u_R(\underline{\theta}_1) \ge \sigma$ (the fully competitive regime), it turns out that quality distortions disappear completely (cf. Figure 5.7 below, σ < 16/3). In this particular case, the equilibrium pricing schedules are cost-plus-fee schedules.

31. It can be proved that this local monopoly solution involves full separation.
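The market-share rule (7.1) can be sketched in a few lines. The sketch assumes, as in Figure 5.7, that the horizontal type is uniform on [0, 1], so that the distribution function is the identity clipped to the unit interval; the function names are illustrative.

```python
def uniform_cdf(x: float) -> float:
    """CDF of a uniform distribution on [0, 1] (assumed horizontal distribution)."""
    return min(1.0, max(0.0, x))


def market_share(u_j: float, u_k: float, sigma: float, cdf=uniform_cdf) -> float:
    """Share of consumers of a given vertical type served by firm j, as in (7.1).

    u_j and u_k are the rents offered to that type by firm j and its rival;
    sigma is the transportation cost.
    """
    local_monopoly = u_j / sigma                      # marginal consumer gets zero surplus
    competitive = 0.5 + (u_j - u_k) / (2.0 * sigma)   # marginal consumer is indifferent
    return cdf(min(local_monopoly, competitive))


def regime(u_j: float, u_k: float, sigma: float) -> str:
    """The competitive margin binds exactly when u_j + u_k >= sigma."""
    return "competitive" if u_j + u_k >= sigma else "local monopoly"


if __name__ == "__main__":
    for sigma in (10.0, 1.0):
        print(sigma, market_share(1.0, 1.0, sigma), regime(1.0, 1.0, sigma))
```

The regime test follows from comparing the two arguments of the minimum: u_j/σ is the smaller one precisely when u_j + u_k < σ.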


[Figure 5.7 (plot not reproduced): quality q(θ1) against type θ1 ∈ [4, 5], showing qFB(θ1), which coincides with the equilibrium allocation for σ < 5.33, together with the equilibrium allocations for σ = 10.0 and σ > 14.8.]

Figure 5.7. Quality choices in the oligopoly equilibrium for three regimes: fully competitive (σ < 16/3), mixed (σ = 10), and local monopoly (σ > 14.8). We assume that θ2 is uniform on [0, 1]; θ1 is uniform on [4, 5].
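The threshold σ = 16/3 for the fully competitive regime can be recovered with a short back-of-the-envelope calculation. The sketch below assumes a surplus function S(θ1, q) = θ1 q − ½q² (an assumption, not stated in this passage) and restricts attention to symmetric cost-plus-fee schedules, so it is a consistency check rather than a full equilibrium argument.

```latex
\begin{align*}
% Efficient quality and first-best surplus under the assumed S(\theta_1, q)
q^{FB}(\theta_1) &= \theta_1, \qquad S^{FB}(\theta_1) = \tfrac{1}{2}\theta_1^2, \\
% Rent left to the consumer under a cost-plus-fee schedule with fee F_j
u_j(\theta_1) &= \tfrac{1}{2}\theta_1^2 - F_j, \\
% With full coverage and uniform \theta_2, firm j earns F_j on its market share
\Pi_j &= F_j \left[ \tfrac{1}{2} + \frac{F_k - F_j}{2\sigma} \right]
\;\Longrightarrow\; F_j = \tfrac{1}{2}(\sigma + F_k)
\;\Longrightarrow\; F_L = F_R = \sigma, \\
% Full coverage must hold for the lowest type, \underline{\theta}_1 = 4
u_L(4) + u_R(4) &= 2\,(8 - \sigma) \;\ge\; \sigma
\;\Longleftrightarrow\; \sigma \le \tfrac{16}{3} \approx 5.33 .
\end{align*}
```

Under these assumptions the coverage condition for the lowest type reproduces the cutoff reported in Figure 5.7.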

As demonstrated by Armstrong and Vickers (1999) and Rochet and Stole (2002), the result that efficiency in q emerges for a fully covered market is somewhat general, extending well beyond this simple example. Formally, if σ is sufficiently small so as to guarantee that every consumer in Θ1 × Θ2 purchases, in equilibrium each firm offers a cost-plus-fee pricing schedule, $P_j(q) = C(q) + F_j$, and each customer consumes the efficient allocation $q^{FB}(t)$ from one of the two firms. Fundamentally, this result relies on full coverage and the requirement that the inverse hazard rate is constant over θ1 in equilibrium for each firm. More generally, we could think about an N-firm oligopoly with a joint distribution of $(\theta_2^1, \ldots, \theta_2^N)$. Formally, let

\[
G_i(u_1, \ldots, u_N) \equiv \mathrm{Prob}\big[ u_i - \theta_2^i \ge \max_{j \ne i} (u_j - \theta_2^j) \big],
\]

and let the inverse hazard rate be given by

\[
H_i(u_1, \ldots, u_N) = \frac{G_i(u_1, \ldots, u_N)}{\dfrac{\partial}{\partial u_i} G_i(u_1, \ldots, u_N)}.
\]

Then, if $\frac{d}{du} H_i(u, \ldots, u) = 0$ for each i, cost-plus-fixed-fee pricing is an equilibrium outcome.

Biglaiser and Mezzetti (1999), in a different context, consider the case of auctions for incentive contracts of a restricted form. Because sellers have heterogeneity over their ability to provide quality, each seller’s objective function takes a similar form as in Armstrong and Vickers (1999) and Rochet and Stole


(2002). Nonetheless, because of the structure of preferences and contracts, efficient cost-based pricing emerges only in the limit as preferences become homogeneous.

8. SEQUENTIAL SCREENING MODELS

A common setting in which multidimensional screening is particularly important is when information evolves over time. For example, an important paper by Baron and Besanko (1984) considers the environment in which a regulated firm learns information about its marginal cost over time, and the regulator sets prices over time as a function of the firm’s sequential choices.³² As another example, consider the problem of refund pricing studied by Courty and Li (2000). Here, a consumer purchases an airline ticket knowing that, with some probability, the value of the trip may change. The initial purchase price may depend on the refund price, particularly when marginal return to the ticket may be positively correlated with the likelihood of a change in plans or a high second-period valuation. As a third important example, Clay, Sibley, and Srinagesh (1992) and Miravete (1997) provide convincing evidence that individuals choose a variety of purchase plans, for services such as telephone and electricity, which turn out ex post to be suboptimal, suggesting that the consumer is uncertain about his final needs at the time of contracting. Miravete (1997) goes on to analyze this optimal two-stage tariff problem theoretically.

In these settings, the agent learns at t = 1 an initial one-dimensional type parameter, θ1, distributed according to F1(θ1) on Θ1, and enters into a contractual relationship with this private information. Making an appeal to the revelation principle, without loss of generality it is assumed that the firm offers a menu of nonlinear price schedules, $\{P(q, \hat{\theta}_1)\}_{\hat{\theta}_1 \in \Theta_1}$, which we index by first-period report, $\hat{\theta}_1$.
Later, at t = 2, additional information is revealed to the agent that is economically relevant and that affects the agent’s marginal utility of the contractual activity in the final period. We denote this second-period information as θ2, which is conditionally distributed according to F2(θ2 | θ1) on Θ2 with density f2(θ2 | θ1). After the realization of θ2, the consumer chooses a particular quantity from the schedule. Assume that the consumer’s final utility is given by³³

\[
u = \theta_2 q - \tfrac{1}{2} q^2 - P.
\]

32. Baron and Besanko (1984) also address issues of moral hazard and optimal production choice over time in the context of their model.

33. Note that θ2 can directly depend on θ1 in this setting. For example, it can be the case that θ2 = θ1 + x, where x is independently distributed from θ1 and is learned at t = 2; this is the setting studied in Miravete (1996). For conciseness, we do, however, require that the support Θ2 is independent of θ1.


Given the specific utility of the agent in this setting, we know that θ2 would be a sufficient statistic for the agent’s preferences, and therefore at date t = 2, in absence of a prior contract, there would be a simple one-dimensional problem to solve: θ1 is payoff irrelevant conditional on θ2.³⁴ This class of models differs from previous formulations of the multidimensional problem in that the sequential nature of the information revelation restricts the agent’s ability to lie. Nonetheless, the recurring theme of this survey, that multidimensional models generally pose difficulties in determining the “tree” of binding incentive constraints, reappears in the sequential context as the single-crossing property is once again endogenous. Although the papers written to date on sequential screening have typically imposed sufficient conditions to guarantee a complete ordering (i.e., the single-crossing condition), the source of the problem is still the familiar one.

To see this clearly, consider the second stage of the relationship: incentive compatibility for any given schedule chosen at t = 1 is guaranteed by the standard methods. Specifically, defining second-period indirect utility and optimal choice by

\[
u(\hat{\theta}_1, \theta_2) \equiv \max_{q \in Q} \; \theta_2 q - \tfrac{1}{2} q^2 - P(q \mid \hat{\theta}_1),
\]
\[
q(\hat{\theta}_1, \theta_2) \equiv \arg\max_{q \in Q} \; \theta_2 q - \tfrac{1}{2} q^2 - P(q \mid \hat{\theta}_1),
\]

second-period incentive compatibility is equivalent to ∂u(θ1, θ2)/∂θ2 = q(θ1, θ2) and q(θ1, θ2) nondecreasing in θ2. The approach is standard because preferences satisfy a single-crossing property in the second period.

First-period incentive compatibility is more difficult to address because single crossing in (q, θ1) is not exogenously given. Assuming second-period incentive compatibility, first-period indirect utility as a function of true type θ1 and reported type $\hat{\theta}_1$ can be defined as

\[
\tilde{u}(\hat{\theta}_1 \mid \theta_1) \equiv \int_{\underline{\theta}_2}^{\bar{\theta}_2} u(\hat{\theta}_1, \theta_2)\, f_2(\theta_2 \mid \theta_1) \, d\theta_2 .
\]

The relevant first-order local condition for truth-telling at t = 1 requires that

\[
\frac{d}{d\theta_1} \tilde{u}(\theta_1 \mid \theta_1) = \int_{\underline{\theta}_2}^{\bar{\theta}_2} u(\hat{\theta}_1, \theta_2)\, \frac{\partial f_2(\theta_2 \mid \theta_1)}{\partial \theta_1} \, d\theta_2 ,
\]

with the right-hand side evaluated at $\hat{\theta}_1 = \theta_1$. In the standard setting, the local second-order condition $\partial^2 \tilde{u}(\theta_1 \mid \theta_1)/\partial\theta_1 \partial\hat{\theta}_1 \ge 0$, in tandem with a monotonicity condition, yields a sufficient condition for

34. With more general preferences, however, we could remove the presence of a one-dimensional aggregate and still find that the sequential mechanism restricts the manner in which the agent can misreport, thereby simplifying the set of binding incentive constraints.


global incentive compatibility, the global single-crossing property:

\[
\frac{\partial^2 \tilde{u}(\hat{\theta}_1 \mid \theta_1)}{\partial\theta_1 \, \partial\hat{\theta}_1} \ge 0 \quad \forall\, \theta_1, \hat{\theta}_1 \in \Theta_1. \tag{scp}
\]

In the present setting, this argument is not available. Instead, the focus is on maximizing the relaxed program (i.e., the program with only local first-order incentive conditions imposed), and then checking ex post that the second-order conditions are satisfied. Substituting the agent’s first-order condition into the principal’s program and integrating by parts twice, we obtain the following virtual surplus for the sequential design program:

\[
\Phi(q, \theta_1, \theta_2) = \theta_2 q - \tfrac{1}{2} q^2 + \alpha(\theta_1, \theta_2)\, q - \tilde{u}(\underline{\theta}_1 \mid \underline{\theta}_1),
\]

where

\[
\alpha(\theta_1, \theta_2) \equiv \frac{\partial F_2(\theta_2 \mid \theta_1)/\partial\theta_1}{f_2(\theta_2 \mid \theta_1)} \cdot \frac{1 - F_1(\theta_1)}{f_1(\theta_1)} .
\]
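To see how the informativeness term α shapes the allocation, the sketch below evaluates α numerically and returns the unconstrained maximizer of the virtual surplus, q = θ2 + α(θ1, θ2). The example distributions are assumptions chosen for illustration: θ1 uniform on [0, 1] and θ2 = θ1 + x with logistic noise x, an additive specification in the spirit of footnote 33 whose support does not depend on θ1. The sketch ignores nonnegativity of q and the global incentive constraints.

```python
import math


def alpha(theta1, theta2, F2, f2, F1, f1, h=1e-5):
    """Informativeness term: (dF2/dtheta1 / f2) * ((1 - F1) / f1).

    The derivative of the conditional CDF with respect to theta1 is
    approximated by a central finite difference with step h.
    """
    dF2_dtheta1 = (F2(theta2, theta1 + h) - F2(theta2, theta1 - h)) / (2.0 * h)
    return (dF2_dtheta1 / f2(theta2, theta1)) * ((1.0 - F1(theta1)) / f1(theta1))


def relaxed_quantity(theta1, theta2, F2, f2, F1, f1):
    """Pointwise maximizer of theta2*q - q**2/2 + alpha*q, i.e. q = theta2 + alpha."""
    return theta2 + alpha(theta1, theta2, F2, f2, F1, f1)


# --- Illustrative (assumed) distributions ---
def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def F2_additive(theta2, theta1):
    return logistic_cdf(theta2 - theta1)   # theta2 = theta1 + logistic noise


def f2_additive(theta2, theta1):
    c = logistic_cdf(theta2 - theta1)
    return c * (1.0 - c)                   # logistic density


def F1_uniform(theta1):
    return min(1.0, max(0.0, theta1))      # theta1 ~ U[0, 1]


def f1_uniform(theta1):
    return 1.0


if __name__ == "__main__":
    dists = (F2_additive, f2_additive, F1_uniform, f1_uniform)
    for t1 in (0.25, 0.5, 1.0):
        print(t1, relaxed_quantity(t1, 1.0, *dists))
```

In this additive case the ratio of the derivative of F2 to f2 is −1, so α = −(1 − F1(θ1))/f1(θ1): the distortion is downward, independent of θ2, and vanishes for the highest first-period type.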

Given that expected profit is $E_{\theta_1, \theta_2}[\Phi(q, \theta_1, \theta_2)]$ and the IR constraint binds only for the lowest type $\theta_1 = \underline{\theta}_1$, profit is maximized by choosing q to maximize Φ(q, θ1, θ2) pointwise over Θ1 × Θ2. In general, the nature of the distortion will depend on the nature of the conditional distribution, F2(θ2 | θ1).

Baron and Besanko (1984) and Courty and Li (2000) consider the case of first-order stochastic dominance (FSD); i.e., θ1 represents an FSD shift in the distribution of θ2. Both demonstrate that, under FSD, the IR constraint will bind for the lowest type because: (i) utility in the second period is nondecreasing in θ2, and (ii) θ1 shifts the distribution toward higher types. Hence, this relaxed program is cast in the appropriate form, and the principal will choose $\tilde{u}(\underline{\theta}_1 \mid \underline{\theta}_1) = 0$. Examining the relaxed program, we see that in the FSD case, α < 0, and the distortion in q is downward, away from the full-information solution. The intuition is by now familiar: By distorting consumption downward by a small amount, q, only a second-order loss in surplus arises, but a first-order gain in rent reduction is obtained as captured by −αq. The difference in the sequential screening FSD model is that the dependence of future rents on θ1 is governed by the informativeness function, α. Baron and Besanko (1984) note the importance of commitment in this context, because the second-period allocation will be constrained efficient only if α(θ1, θ2) happens to equal [1 − F2(θ2 | θ1)]/f2(θ2 | θ1).

When are the global incentive constraints satisfied by the q that solves the relaxed program? Baron and Besanko (1984) are silent on the sufficient conditions for global incentive compatibility, providing instead an example in which the global constraints are satisfied. Courty and Li (2000) demonstrate that if the resulting q(θ1, θ2) allocation is nondecreasing in both arguments, then a price


schedule exists that implements the allocation and satisfies global incentive compatibility.³⁵

Courty and Li (2000) also consider the case in which θ1 parameterizes the distribution of θ2 in terms of a mean-preserving spread. Again, they demonstrate that the IR constraint will bind only for the lowest type, so the relaxed program is appropriate.³⁶ Global incentive compatibility in the first stage is again difficult to assess, but Courty and Li provide a useful sufficient condition to this end.³⁷ Interestingly, this incentive problem shares many similarities with the one-dimensional problem in which the cross-partial of expected utility with respect to q and θ1 changes sign as one varies (q, θ1) ∈ Q × Θ1; see Araujo and Moreira (2000) for a discussion.³⁸

Taking the distributional assumption of Courty and Li, the solution to the relaxed program has a simple economic description. For all stage-one types, $\theta_1 < \bar{\theta}_1$, the principal introduces a distortion in the second-period adjustment. One can think of the final price as a markup over cost that depends on the difference between the final consumption and its expected value. The lower the initial type (i.e., the lower the noise in the second-stage marginal utility of consumption), the less valuable is the option to change consumption plans in the future. Note that variability creates higher value in expected consumption (which is why the IR constraint binds only for the lowest type, $\underline{\theta}_1$) and hence the monopolist will offer a lower price to this less consumption-valuing segment. The high types have high variability and also

35. This follows immediately from the global sufficient condition for incentive compatibility,

\[
\frac{\partial^2 \tilde{u}(\hat{\theta}_1 \mid \theta_1)}{\partial\theta_1 \, \partial\hat{\theta}_1} = \int_{\underline{\theta}_2}^{\bar{\theta}_2} \frac{\partial u(\hat{\theta}_1, \theta_2)}{\partial\hat{\theta}_1} \, \frac{\partial f_2(\theta_2 \mid \theta_1)}{\partial\theta_1} \, d\theta_2 \ge 0.
\]

Given that q(θ1, θ2) is nondecreasing in the first argument, the first term in the integrand is a nondecreasing function of θ2. Because θ1 represents an FSD improvement in θ2, this integral must be nonnegative. Moreover, the fact that q(θ1, θ2) is nondecreasing in its second argument guarantees second-period global incentive compatibility. Because global incentive compatibility does not require that q(θ1, θ2) be nondecreasing in the first argument, when the relaxed solution is neither monotone nor globally incentive compatible, the solution to the unrelaxed program is more complex than simply using an ironing procedure on q in the relaxed program.

36. This follows from the necessary condition for second-stage incentive compatibility: q(θ1, θ2) is nondecreasing in θ2. This (with the local first-order condition) implies that u(θ1, θ2) is convex in θ2, and hence a mean-preserving spread must increase utility; hence the IR constraint can bind only for the lowest type, $\underline{\theta}_1$.

37. One useful simplifying assumption used by Courty and Li is that the class of distribution functions passes through the same point θ2 = z for all θ1. This assumption guarantees that α(θ1, θ2) is negative (positive) for all θ2 < z (resp., θ2 > z). Provided that the resulting allocation q(θ1, θ2) from the relaxed program is nondecreasing in each argument, global incentive compatibility is satisfied at both stages. This will be the case, for example, whenever (∂F2/∂θ1)/f2 does not vary much in θ1.

38. Araujo and Moreira (2000) study a one-dimensional screening model where the single-crossing condition is relaxed. As a result, nonlocal incentive constraints can be binding. They derive optimal contracts within the set of piecewise continuous contracts, and apply their techniques to bidimensional models with (perfect) negative correlation between the two dimensions of individual heterogeneity.


high option value from altering future consumption. Hence, the firm can screen these customers from the low types by charging a premium for the initial ticket, but allowing a low-cost variation in the level of final consumption. It is also the case that any θ1 type that draws θ2 = z in the second period will consume the efficient allocation; in our present setting, this is q = θ2 = z. The actual allocation will rotate through this point, departing from the first-best allocation q = θ2 increasingly as θ1 decreases. Although the final allocation may have some individuals consuming above the first-best level of output, this should not be considered an upward distortion. Rather, the distortion is in the amount of allowed stage-two adjustment; the principal optimally distorts this adjustment downward from the efficient level.³⁹

9. PRODUCT BUNDLING

In the previous discussions, we have largely focused on a variety of models that are tractable at some expense in generality. Provided, for example, that either simple aggregation or separability exists, the type space is small and discrete, or n = 1, we can deal with multidimensional environments with some success. We now turn to a set of models in which multidimensional screening poses the most difficult problems: n > 1 and m > 1 with nonseparable and nonaggregatable preferences. The most well-studied version of this problem is the problem of commodity bundling by a multiproduct monopolist. We will consider the papers in the literature in this context.

9.1. Some Simple Bundling Environments

We begin with the simplest linear n-product monopolist bundling environment, where m = n. Consumer preferences are given by

\[
u = \sum_{i=1}^{n} \theta_i q_i - P,
\]

where each θi is independently and identically distributed according to the distribution function F(θi) on Θi. (Below, we extend this model to quadratic preferences.) The cost of production is assumed to be zero, but demands are for at most one unit of each product; hence without loss of generality qi ∈ [0, 1]. The monopolist’s space of contracts is assumed to be a price schedule P(q1, . . . , qn) defined on the domain [0, 1]^n. Given that preferences are linear in money and consumption, we can think of qi ∈ (0, 1) as representing either a lottery over unit consumption or partial (but deterministic) consumption. We seek to find the optimal price schedule. Nonetheless, even in this simplified setting, we are still looking for a collection of 2^n − 1 prices.

39. This effect is similar to that which arises in signaling models in which agents desire to signal variance to the market. See, e.g., Prendergast and Stole (1996).
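The count of 2^n − 1 prices and the consumer's choice problem can be made concrete with a small sketch. It assumes deterministic unit consumption, q_i ∈ {0, 1}, and therefore ignores the lotteries that the general formulation allows; the function names and the example prices are hypothetical.

```python
from itertools import combinations


def nonempty_bundles(n: int):
    """All nonempty subsets of n goods; there are 2**n - 1 of them."""
    goods = range(n)
    return [frozenset(c) for r in range(1, n + 1) for c in combinations(goods, r)]


def best_bundle(theta, prices):
    """Return the surplus-maximizing bundle and its surplus for a consumer with
    marginal valuations theta, given a dict mapping bundles to prices.

    The empty bundle (surplus 0) is the outside option.
    """
    best, best_surplus = frozenset(), 0.0
    for bundle, price in prices.items():
        surplus = sum(theta[i] for i in bundle) - price
        if surplus > best_surplus:
            best, best_surplus = bundle, surplus
    return best, best_surplus


if __name__ == "__main__":
    bundles = nonempty_bundles(2)
    # Hypothetical symmetric prices: 0.6 for one good, 1.0 for both.
    prices = {b: (0.6 if len(b) == 1 else 1.0) for b in bundles}
    print(len(nonempty_bundles(3)))
    print(best_bundle((0.9, 0.3), prices))
    print(best_bundle((0.7, 0.6), prices))
```

With symmetric goods the price of a bundle depends only on its size, which is why the two-good case discussed next reduces to two marginal prices.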


Unlike the full one-dimensional (n = m = 1) setting in which the economics of the downward distortion is well understood, it is difficult to see the economics behind the optimal screening contract in multidimensional environments. This is in part because the multidimensional bundling environment is mathematically more complex, but also because there are at least two distinct economic effects. The first is a familiar sorting effect in which consumption is distorted downward to reduce the rents to “higher” types; the second effect arises because if demand parameters are independently distributed, a law-of-large-numbers argument shows that multigoods have a “homogenizing” effect on consumer heterogeneity. To illustrate these effects, we will present two extreme forms of this model: when n = 2 and when n → ∞.

9.1.1. The Case of n = m = 2: Similarities with the One-Dimensional Paradigm

When n = 2, given the symmetry of the problem, we are looking for two marginal prices, p(1) and p(2); i.e., the price for one good, and the price for a second good, having already purchased the first good. The key insight is that even though the marginal values are independently distributed, the order statistics are positively correlated. This positive correlation makes the bundling environment akin to the classic one-dimensional paradigm. In short, provided that the first-order statistic of one consumer is greater than the first-order statistic of another, it is more likely than not that the second-order statistics are similarly ordered. Hence, it is probable that the two-good demand curves of any two consumers are nested. In this sense, a single-crossing property is present in a stochastic fashion.

To demonstrate this more precisely, denote consumer θ’s first- and second-order statistics as $\theta^{(1)}$ and $\theta^{(2)}$, and refer to the corresponding first and second units of consumption as $q^{(1)}$ and $q^{(2)}$. Considering that it is physically possible to consume the second unit only after having consumed the first unit, the firm could think of this as a simple one-dimensional problem and construct the demand profile as follows: $N(p, q^{(i)}) = \mathrm{Prob}[\theta^{(i)} \ge p]$. One could then apply the one-dimensional paradigm for the demand profile to this function to obtain the optimal marginal prices. This procedure, although possibly profitable, will not obtain the maximum possible revenue. The reason why it may work in a crude sense is that a large subset of the type space will have nested demand curves (hence the one-dimensional single-crossing property will hold). Because not all types are so ordered, however, this procedure will fail to maximize revenue.

To return to the intuition for why a large subset of types are ordered as if they had one dimension, think about two consumers, where consumer A has a higher first-order statistic than consumer B.
Conditional on this fact, it is also likely that consumer A will have a higher second-order statistic than B. In the case of uniformly distributed types on Θi = [0, 1], there is a three-fourths

Multidimensional Screening

191

probability that such a nesting of demand curves will emerge between any two consumers. If the types were perfectly positively correlated, then demand curves over {q (1) , q (2) } would always be nested, and we would be in the equivalent of a one-dimensional world. Because some types will have nonnested demand, a one-dimensional single-crossing property will not hold, and hence the simple demand-proﬁle procedure will not maximize proﬁts. Mathematically, the ﬁrm needs to account for the possibility of nonnested curves, and this alters the optimal price. A ﬁrm following the simple demand-proﬁle procedure incorrectly perceives its proﬁt to be π = p(1)Prob[θ (1) ≥ p (1) ] + p(2)Prob[θ (2) ≥ p (2) ], when in fact its proﬁt is given by π = p(1)(Prob[θ (1) ≥ p (1) ] + Prob[θ (1) < p (1) & θ (1) + θ (2) > p (1) + p (2) ]) + p(2)(Prob[θ (2) ≥ p (2) ] − Prob[θ (2) ≥ p (2) & θ (1) + θ (2) < p (1) + p (2) ]). There is an adjustment that must be made to the demand for each product that is not noted by the naive seller. Nonetheless, to the extent that these second terms are small, the naive one-dimensional screening approach does well in approximating the optimal solution. This simple example makes two points. First, the well-known economic principle behind nonlinear pricing in the one-dimensional model is still present in the two-dimensional model, albeit obscured. Second, as n becomes large, the likelihood that any two consumers will have ordered demand curves decreases to zero, suggesting that the one-dimensional intuition begins to wane as n increases. Although it is difﬁcult to make this second idea precise, we will see that a homogenizing effect along the lines of the law of large numbers removes most of the value for sorting as n increases, suggesting that the one-dimensional intuition is less appropriate for larger n. 9.1.2.
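The gap between the perceived and the actual profit can be illustrated with a small Monte Carlo sketch. It is not taken from the survey: the function name `simulate_profits`, the U[0, 1] valuations, and the sample size are assumptions chosen for illustration. The sketch evaluates the naive expression, the corrected expression, and the revenue obtained when each simulated consumer simply picks the surplus-maximizing number of units.

```python
import random


def simulate_profits(p1, p2, n_draws=200_000, seed=0):
    """Return (naive, corrected, direct) average profit per consumer.

    Assumptions (illustrative only): two goods, zero marginal cost,
    valuations drawn i.i.d. from U[0, 1]; p1 and p2 are the marginal
    prices of the first and second unit.
    """
    rng = random.Random(seed)
    naive = corrected = direct = 0.0
    for _ in range(n_draws):
        a, b = rng.random(), rng.random()
        t1, t2 = max(a, b), min(a, b)  # first- and second-order statistics
        total = t1 + t2

        # Naive demand-profile view: unit i sells whenever theta_(i) >= p_(i).
        naive += p1 * (t1 >= p1) + p2 * (t2 >= p2)

        # Corrected expression, with the two adjustment terms.
        first = (t1 >= p1) + (t1 < p1 and total > p1 + p2)
        second = (t2 >= p2) - (t2 >= p2 and total < p1 + p2)
        corrected += p1 * first + p2 * second

        # Direct check: the consumer buys 0, 1, or 2 units to maximize surplus.
        surplus = [0.0, t1 - p1, total - p1 - p2]
        units = max(range(3), key=lambda k: surplus[k])
        direct += p1 * (units >= 1) + p2 * (units == 2)
    return naive / n_draws, corrected / n_draws, direct / n_draws


if __name__ == "__main__":
    for prices in [(0.6, 0.4), (0.5, 0.5)]:
        print(prices, simulate_profits(*prices))
```

Under these assumptions the corrected expression coincides with the directly simulated revenue draw by draw. Both adjustment terms require $p_{(2)} < p_{(1)}$ to be nonzero, so in this sketch the naive expression is exact whenever the second unit is not discounted.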

9.1.2. The Case of n = m → ∞: The Homogenizing Effect of the Law of Large Numbers

It has been noted in a few papers that an increase in n with independently distributed types has the effect of allowing a firm to capture most of the consumer surplus.40 The idea is simple: selling a single aggregate bundle (again assuming marginal cost is zero) can extract most of the consumer’s rents, because as n becomes large the per-unit value of this bundle converges to the sample mean. Using the argument in Armstrong (1999b), let $s_i(\theta_i) \equiv \theta_i q_i^*(\theta_i) - C(q_i^*(\theta_i))$ represent the social surplus generated by a consumer of type θ who consumes the full-information efficient allocation, $q_i^*(\theta_i)$. Suppose that the distribution of $s_i(\theta_i)$ [derived from $F(\theta_i)$] has mean µ and standard deviation σ. Then, a firm that offers cost-plus-fee pricing, $P(q) = (1 - \varepsilon)\mu + C(q)$, where $\varepsilon = 2^{1/3} (\sigma/\mu)^{2/3}$, will obtain expected profits that converge to the full-information profit level as n approaches infinity.41

Armstrong demonstrates that this result easily extends to a setting with a particular form of positive correlation:

$$u = \theta_0 \sum_{i=1}^{n} \theta_i q_i - P.$$

Now, $\theta_0$ is a multiplicative shock, common to all n products, but independently distributed across consumers. Armstrong shows that as n increases, the firm’s profit approaches that of a monopolist with uncertainty only over the common component.

40 Schmalensee (1984), Armstrong (1999b), and Bakos and Brynjolfsson (1996).
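The homogenizing effect can be seen in a short simulation. The sketch below is only illustrative and rests on assumptions not made in the text: unit demands, zero cost, valuations i.i.d. U[0, 1], and a reading of µ and σ as the mean and standard deviation of the aggregate surplus over all n goods. The function name `bundle_profit_ratio` is hypothetical.

```python
import random


def bundle_profit_ratio(n, n_consumers=1000, seed=0):
    """Share of full-information profit captured by a single bundle fee.

    Illustrative assumptions: unit demands, zero cost, valuations i.i.d.
    U[0, 1], so the aggregate surplus has mean n/2 and variance n/12.
    """
    rng = random.Random(seed)
    mu = n / 2.0                       # mean of aggregate surplus
    sigma = (n / 12.0) ** 0.5          # std. dev. of aggregate surplus
    eps = 2 ** (1 / 3) * (sigma / mu) ** (2 / 3)
    fee = (1 - eps) * mu               # bundle fee (cost is zero here)

    buyers = 0
    for _ in range(n_consumers):
        value = sum(rng.random() for _ in range(n))
        if value >= fee:               # consumer accepts the bundle
            buyers += 1

    # Full-information profit per consumer equals the mean surplus mu.
    return fee * buyers / n_consumers / mu


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(bundle_profit_ratio(n), 3))
```

In this setting σ/µ shrinks at rate $n^{-1/2}$, so ε shrinks at rate $n^{-1/3}$ and the captured share rises toward one as n grows.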

9.2. General Results on Product Bundling

In this section [based on Rochet and Choné (1998)], we generalize the bundling model presented to allow for multiple-unit demands. We come back to the general framework of nonlinear pricing by a multiproduct monopolist already studied by Wilson (1993a, 1993b) and Armstrong (1996) and presented in our Section 3.2. However, we do not assume the particular homogeneity properties of costs and types distributions that have allowed these authors to find explicit solutions using the separability property. In other words, we consider the most general multidimensional screening model in which binding IC constraints are unknown a priori. For simplicity, we assume linear-quadratic preferences:

$$u = \sum_{i=1}^{n} \left( \theta_i q_i - \tfrac{1}{2} q_i^2 \right) - P.$$

Like before, production costs are assumed to be constant and normalized to zero, but contrary to the simple bundling model presented previously, demands for each good are not restricted to be 0 or 1. Types θ are distributed on some convex domain $\Theta$, in accord with a continuous and positive density $f(\theta)$. Building on our previous discussions, we want to characterize the optimal pricing policy of a monopolist, using the parametric-utility approach. The problem is thus to find the function $u^*$ that maximizes expected profit

$$E[\pi] = \int_{\Theta} \{ S(\theta, \nabla u(\theta)) - u(\theta) \} f(\theta) \, d\theta,$$

over all convex, nonnegative functions u. When the second-order condition is not binding (i.e., when $u^*$ is strictly convex), we already saw that $u^*$ is characterized by two elements:

1. a partition of the boundary $\partial\Theta$ of $\Theta$ into two subsets:
   • $\partial_0\Theta$, where $u^* = 0$ (binding participation constraint), and
   • $\partial_1\Theta$, where $\frac{\partial S}{\partial q}(\theta, \nabla u(\theta))$ is orthogonal to the boundary of $\Theta$ (no distortion along the boundary);
2. a set of paths γ connecting $\partial_0\Theta$ to $\partial_1\Theta$, along which $u^*$ is computed by integrating $q^*(\theta) = \nabla u^*(\theta)$.

41 More specifically, as shown in Armstrong (1999b), let $\pi^*$ be the full-information expected profit level and let $\tilde{\pi}$ be the expected profit from the cost-plus-fee price schedule. Then, $\tilde{\pi}/\pi^*$ converges to 1 at speed $n^{-1/3}$.

As proved by Armstrong (1996), the nonparticipation region $\Theta_0$ (where $u^* = 0$) typically has a nonempty interior, and $u^*$ can be computed numerically by solving a free-boundary problem; that is, finding the curve $\Gamma_0$ that partitions $\Theta$ into two regions: $\Theta_0$ (where $u^* = 0$), and $\Theta_1$, where $u^* > 0$, and, in the latter region, $u^*$ satisfies the Euler equation:

$$\operatorname{div}\left[ \frac{\partial S}{\partial q}(\theta, \nabla u(\theta)) \, f(\theta) \right] = -f(\theta),$$

together with the boundary condition stated previously. The problem is that, for most distributions of types [for details, see Rochet and Choné (1998)], the solution of this free-boundary problem violates the second-order conditions. The economic intuition behind this result is the presence of a strong conflict between the desire of the monopolist to limit the nonparticipation region (by pushing $\Gamma_0$ toward the lower boundary of $\Theta$) and “transverse” incentive compatibility constraints (that force $\Theta_0$ and thus $\Gamma_0$ to be convex). By trading off these two effects, the typical shape of $\Gamma_0$ will be linear, which means that, in the region immediately above it, $u^*$ will depend only on a linear combination of $\theta_1$ and $\theta_2$. This is a robust property of multidimensional screening problems: even with log concave distributions of types, bunching cannot be ruled out, and typically occurs in the “southwest” part of $\Theta$ (i.e., for consumers with low valuations in all dimensions). From an economic viewpoint, it means that “pure bundling” (i.e., an inefficient limitation of the choice set of consumers with low valuations) is a general pattern.

Rochet and Choné (1998) consider, for example, the case where θ is exponentially distributed on $[a, +\infty)^2$, with $f(\theta) = \exp(2a - \theta_1 - \theta_2)$ and $a > 1$.
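For this exponential example, a useful benchmark is to treat each good as a separate one-dimensional screening problem. The sketch below is a numerical check under that assumption, not the procedure of Rochet and Choné: with linear-quadratic utility, the tariff $q_1 + q_2 + (a - 1)^2$ leads a consumer to choose $q_i = \theta_i - 1$ and leaves the lowest type $(a, a)$ with exactly zero net utility. The helper names `net_utility` and `best_quantity` are hypothetical.

```python
def net_utility(theta1, theta2, q1, q2, a):
    """Linear-quadratic utility net of the separable tariff q1 + q2 + (a - 1)^2."""
    gross = theta1 * q1 - 0.5 * q1 ** 2 + theta2 * q2 - 0.5 * q2 ** 2
    tariff = q1 + q2 + (a - 1.0) ** 2
    return gross - tariff


def best_quantity(theta, step=0.001, q_max=5.0):
    """Grid-search one good's quantity under a unit marginal price.

    Because the tariff is separable, each quantity can be chosen on its own
    by maximizing theta * q - q^2 / 2 - q over q >= 0.
    """
    grid = [k * step for k in range(int(round(q_max / step)) + 1)]
    return max(grid, key=lambda q: theta * q - 0.5 * q * q - q)


if __name__ == "__main__":
    a = 1.5
    for theta in (a, 2.0, 3.0):
        print(theta, best_quantity(theta))               # close to theta - 1
    print(net_utility(a, a, a - 1.0, a - 1.0, a))         # lowest type's net utility
```

Since every type above $(a, a)$ earns strictly more than the lowest type, no positive mass of consumers is excluded under this tariff.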
Because $\theta_1$ and $\theta_2$ are independently distributed and demands are separable, a natural candidate for the optimal price schedule is the best separable price, which can easily be computed:

$$P(q_1, q_2) = q_1 + q_2 + (a - 1)^2,$$

giving rise to demands $q_i(\theta) = \theta_i - 1$, i = 1, 2. However, this cannot be the solution, because the nonparticipation region would be empty. In fact, the true solution has the characteristic pattern of multidimensional screening models, whereby $\Theta$ is partitioned into three regions:

• the nonparticipation region $\Theta_0$, delimited by a first boundary $\Gamma_0$ (of equation $\theta_1 + \theta_2 = \tau_0$), in which $u^* = 0$,
• the pure bundling region $\Theta_1$, delimited by a second boundary $\Gamma_1$ (of equation $\theta_1 + \theta_2 = \tau_1$), in which consumers are forced to buy a bundle with identical quantities of the two goods (thus $u^*$ is not strictly convex, because it depends only on $\theta_1 + \theta_2$), and finally
• the fully separating region, where consumers have a complete choice and $u^*$ can only be determined numerically.

Rochet and Choné (1998) design a specific technique, the sweeping procedure, which generalizes the ironing procedure of Mussa and Rosen (1978) for dealing with this new form of bunching, which is specific to multidimensional screening problems.

10. CONCLUDING REMARKS

In this survey, we have emphasized one general theme – that in models with multidimensional heterogeneity over preferences, the ordering of the binding incentive constraints is endogenous. Because the resulting endogenous ordering also is a source of our economic predictions, the difficulty in finding general, tractable mathematical models is particularly significant. Notwithstanding this pessimistic appraisal, we have also emphasized in these pages that several solutions to this problem of endogenous ordering exist, all of which shed light on this issue. The simple discrete model we presented, together with a sketch of the algorithm for determining the endogenous ordering and a solution for the simple two-type case, is helpful in illustrating the economics of multidimensional screening contracts – one of our primary goals in this paper. In addition, we present a variety of classes of restricted models that make the modeling tractable while still allowing sufficient theoretical degrees of freedom for interesting economics to come out of the analysis. We are particularly heartened by the recent results applied to auctions, bundling, and other rich economic settings, especially competitive environments.

ACKNOWLEDGMENTS

We are grateful to Mark Armstrong and Patrick Legros for helpful comments. The first author thanks CNRS for financial support. The second author thanks the National Science Foundation for financial support through the PFF Program. Any errors are our own.

References

Anderson, S., A. de Palma, and J.-F. Thisse (1992), Discrete Choice Theory of Product Differentiation. Cambridge, MA: MIT Press.
Araujo, A. and H. Moreira (2000), “Adverse Selection Problems without the Spence-Mirrlees Condition,” mimeo, IMPA, Rio de Janeiro, Brazil.
Armstrong, M. (1996), “Multiproduct Nonlinear Pricing,” Econometrica, 64(1), 51–75.
Armstrong, M. (1999a), “Optimal Regulation with Unknown Demand and Cost Functions,” Journal of Economic Theory, 84(2), 196–215.
Armstrong, M. (1999b), “Price Discrimination by a Many-Product Firm,” Review of Economic Studies, 66(1), 151–168.
Armstrong, M. and J.-C. Rochet (1999), “Multi-dimensional Screening: A User’s Guide,” European Economic Review, 43(4–6), 959–979.

Armstrong, M. and J. Vickers (1999), “Competitive Price Discrimination,” mimeo.
Armstrong, M. and J. Vickers (2000), “Multiproduct Price Regulation under Asymmetric Information,” Journal of Industrial Economics, 48(2), 137–159.
Bakos, Y. and E. Brynjolfsson (1996), “Bundling Information Goods: Pricing, Profits and Efficiency,” Discussion Paper, MIT.
Baron, D. P. and D. Besanko (1984), “Regulation and Information in a Continuing Relationship,” Information Economics and Policy, 1(3), 267–302.
Baron, D. and R. Myerson (1982), “Regulating a Monopolist with Unknown Costs,” Econometrica, 50(4), 911–930.
Berry, S., J. Levinsohn, and A. Pakes (1995), “Automobile Prices in Market Equilibrium,” Econometrica, 63(4), 841–890.
Biais, B., D. Martimort, and J.-C. Rochet (2000), “Competing Mechanisms in a Common Value Environment,” Econometrica, 68(4), 799–838.
Biglaiser, G. and C. Mezzetti (1999), “Incentive Auctions and Information Revelation,” mimeo, University of North Carolina.
Blackorby, C. and W. Schworm (1984), “The Structure of Economies with Aggregate Measures of Capital: A Complete Characterization,” Review of Economic Studies, 51, 633–650.
Brown, S. and D. Sibley (1986), The Theory of Public Utility Pricing. New York: Cambridge University Press.
Che, Y.-K. and I. Gale (1996a), “Expected Revenue of All-Pay Auctions and First-Price Sealed-Bid Auctions with Budget Constraints,” Economics Letters, 50(3), 373–379.
Che, Y.-K. and I. Gale (1996b), “Financial Constraints in Auctions: Effects and Antidotes,” in Advances in Applied Microeconomics, Volume 6: Auctions, (ed. by M. R. Baye), Greenwich, CT: JAI Press, 97–120.
Che, Y.-K. and I. Gale (1998), “Standard Auctions with Financially Constrained Bidders,” Review of Economic Studies, 65(1), 1–21.
Che, Y.-K. and I. Gale (1999), “Mechanism Design with a Liquidity Constrained Buyer: The 2 × 2 Case,” European Economic Review, 43(4–6), 947–957.
Che, Y.-K. and I. Gale (2000), “The Optimal Mechanism for Selling to a Budget-Constrained Buyer,” Journal of Economic Theory, 92(2), 198–233.
Clay, K., D. Sibley, and P. Srinagesh (1992), “Ex Post vs. Ex Ante Pricing: Optional Calling Plans and Tapered Tariffs,” Journal of Regulatory Economics, 4(2), 115–138.
Courty, P. and H. Li (2000), “Sequential Screening,” Review of Economic Studies, 67(4), 697–718.
Dana, J. (1993), “The Organization and Scope of Agents: Regulating Multiproduct Industries,” Journal of Economic Theory, 59(2), 288–310.
Dessein, W. (1999), “Network Competition in Nonlinear Pricing,” mimeo, ECARE, Brussels.
Fudenberg, D. and J. Tirole (1991), Game Theory. Cambridge, MA: MIT Press, Chapter 7.
Gal, S., M. Landsberger and A. Nemirouski (1999), “Costly Bids, Rebates and a Competitive Environment,” mimeo, University of Haifa.
Ginsburgh, V. and S. Weber (1996), “Product Lines and Price Discrimination in the European Car Market,” mimeo.
Goldman, M. B., H. Leland, and D. Sibley (1984), “Optimal Nonuniform Prices,” Review of Economic Studies, 51, 305–319.
Green, J. and J.-J. Laffont (1977), “Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public Goods,” Econometrica, 45, 427–438.

Guesnerie, R. and J.-J. Laffont (1984), “A Complete Solution to a Class of Principal-Agent Problems with an Application to the Control of a Self-Managed Firm,” Journal of Public Economics, 25, 329–369.
Hotelling, H. (1929), “Stability in Competition,” Economic Journal, 39, 41–57.
Ivaldi, M. and D. Martimort (1994), “Competition under Nonlinear Pricing,” Annales d’Economie et de Statistique, 34, 71–114.
Jehiel, P., B. Moldovanu, and E. Stacchetti (1999), “Multidimensional Mechanism Design for Auctions with Externalities,” Journal of Economic Theory, 85(2), 258–293.
Jullien, B. (2000), “Participation Constraints in Adverse Selection Models,” Journal of Economic Theory, 93(1), 1–47.
Laffont, J.-J., E. Maskin, and J.-C. Rochet (1987), “Optimal Nonlinear Pricing with Two-Dimensional Characteristics,” in Information, Incentives, and Economic Mechanisms, (ed. by T. Groves, R. Radner, and S. Reiter), Minneapolis, MN: University of Minnesota Press.
Laffont, J.-J., P. Rey, and J. Tirole (1998a), “Network Competition: Overview and Nondiscriminatory Pricing,” Rand Journal of Economics, 29(1), 1–37.
Laffont, J.-J., P. Rey, and J. Tirole (1998b), “Network Competition: Price Discrimination,” Rand Journal of Economics, 29(1), 38–56.
Laffont, J.-J. and J.-C. Rochet (1998), “Regulation of a Risk-Averse Firm,” Games and Economic Behavior, 25, 149–173.
Laffont, J.-J. and J. Tirole (1986), “Using Cost Observation to Regulate Firms,” Journal of Political Economy, 94, 614–641.
Laffont, J.-J. and J. Tirole (1993), A Theory of Incentives in Regulation and Procurement. Cambridge, MA: MIT Press.
Leslie, P. (1999), “Price Discrimination in Broadway Theatre,” mimeo, UCLA.
Lewis, T. and D. Sappington (1988), “Regulating a Monopolist with Unknown Demand and Cost Functions,” Rand Journal of Economics, 19(3), 438–457.
Lewis, T. and D. Sappington (1989a), “Inflexible Rules in Incentive Problems,” American Economic Review, 79(1), 69–84.
Lewis, T. and D. Sappington (1989b), “Countervailing Incentives in Agency Problems,” Journal of Economic Theory, 49, 294–313.
Maggi, G. and A. Rodriguez-Clare (1995), “On Countervailing Incentives,” Journal of Economic Theory, 66(1), 238–263.
Martimort, D. (1992), “Multi-principaux avec Anti-selection,” Annales d’Economie et de Statistique, 28, 1–37.
Martimort, D. (1996), “Exclusive Dealing, Common Agency, and Multiprincipals Incentive Theory,” Rand Journal of Economics, 27(1), 1–31.
Mas-Colell, A., M. Whinston and J. Green (1995), Microeconomic Theory. New York: Oxford University Press.
Maskin, E. and J. Riley (1984), “Monopoly with Incomplete Information,” Rand Journal of Economics, 15, 171–196.
McAfee, R. P. and J. McMillan (1987), “Competition for Agency Contracts,” Rand Journal of Economics, 18(2), 296–307.
McAfee, R. P. and J. McMillan (1988), “Multidimensional Incentive Compatibility and Mechanism Design,” Journal of Economic Theory, 46(2), 335–354.
Miravete, E. (1996), “Screening Consumers Through Alternative Pricing Mechanisms,” Journal of Regulatory Economics, 9(2), 111–132.
Miravete, E. (1997), “Estimating Demand for Local Telephone Service with Asymmetric Information and Optimal Calling Plans,” Working Paper, INSEAD.

Mirrlees, J. (1971), “An Exploration in the Theory of Optimum Income Taxation,” Review of Economic Studies, 38(114), 175–208.
Mirrlees, J. (1976), “Optimal Tax Theory: A Synthesis,” Journal of Public Economics, 6(4), 327–358.
Mussa, M. and S. Rosen (1978), “Monopoly and Product Quality,” Journal of Economic Theory, 18, 301–317.
Myerson, R. (1981), “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73.
Myerson, R. (1991), Game Theory. Cambridge, MA: Harvard University Press.
Oren, S., S. Smith, and R. Wilson (1983), “Competitive Nonlinear Tariffs,” Journal of Economic Theory, 29(1), 49–71.
Prendergast, C. and L. Stole (1996), “Impetuous Youngsters and Jaded Oldtimers: Acquiring a Reputation for Learning,” Journal of Political Economy, 104(6), 1105–1134.
Rochet, J.-C. (1984), “Monopoly Regulation with Two Dimensional Uncertainty,” mimeo, Université Paris 9.
Rochet, J.-C. (1987), “A Necessary and Sufficient Condition for Rationalizability in a Quasi-linear Context,” Journal of Mathematical Economics, 16(2), 191–200.
Rochet, J.-C. and P. Choné (1998), “Ironing, Sweeping, and Multidimensional Screening,” Econometrica, 66(4), 783–826.
Rochet, J.-C. and L. Stole (1997), “Competitive Nonlinear Pricing,” mimeo.
Rochet, J.-C. and L. Stole (2000), “Nonlinear Pricing with Random Participation,” Review of Economic Studies, 69(1), 277–311.
Salanié, B. (1990), “Selection Adverse et Aversion pour le Risque,” Annales d’Economie et de Statistique, 18, 131–149.
Schmalensee, R. (1984), “Gaussian Demand and Commodity Bundling,” Journal of Business, 57(1), Part 2, S211–S230.
Schmidt-Mohr, U. and M. Villas-Boas (1999), “Oligopoly with Asymmetric Information: Differentiation in Credit Markets,” Rand Journal of Economics, 30(3), 375–396.
Sibley, D. and P. Srinagesh (1997), “Multiproduct Nonlinear Pricing with Multiple Taste Characteristics,” Rand Journal of Economics, 28(4), 684–707.
Spence, M. (1980), “Multi-Product Quantity Dependent Prices and Profitability Constraints,” Review of Economic Studies, 47, 821–841.
Spulber, D. (1989), “Product Variety and Competitive Discounts,” Journal of Economic Theory, 48, 510–525.
Stole, L. (1991), “Mechanism Design and Common Agency,” mimeo, MIT.
Stole, L. (1995), “Nonlinear Pricing and Oligopoly,” Journal of Economics and Management Strategy, 4(4), 529–562.
Stole, L. (1997), “Lectures on the Theory of Contracts,” mimeo, University of Chicago.
Wilson, R. (1993a), Nonlinear Pricing. Oxford, UK: Oxford University Press.
Wilson, R. (1993b), “Design of Efficient Trading Procedures,” in The Double Auction Market: Institutions, Theories and Evidence, Chapter 5, Santa Fe Institute Studies, Volume 14, (ed. by D. Friedman and J. Rust), Reading, MA: Addison Wesley, 125–152.
Wilson, R. (1996), “Nonlinear Pricing and Mechanism Design,” in Handbook of Computational Economics. Volume 1. Handbooks in Economics, Volume 13, (ed. by H. Amman, D. Kendricks, and J. Rust), New York: Elsevier Science, 253–293.
Zheng, C. G. (2000), “Optimal Auction in a Multidimensional World,” Discussion Paper, Northwestern University.

A Discussion of the Papers by Pierre-André Chiappori and Bernard Salanié and by Jean-Charles Rochet and Lars A. Stole

Patrick Legros

Each of these surveys is a “must read”: anyone who wants to analyze multidimensional screening models should start by reading Rochet and Stole (RS), and anyone who wants to do empirical work on contracts should begin with Chiappori and Salanié (CS). I will start this discussion (Section 1) with what I perceive to be the main message of each survey. Although the two papers are quite different in nature and in focus, they both remind us why we should be interested in contracts and organizations: when markets are incomplete or imperfect, contracts and organizations are the relevant allocation devices and are not neutral from an “efficiency” point of view. Therefore, if we want to understand the effects of economic policies, macroeconomic shocks, or technological shocks on the performance of firms or of the economy, we are bound first to answer two questions.

1. What are the effects of contractual and organizational choices on behavior and economic performance?
2. What are the determinants of contractual choices?

RS and CS show how answers to these questions can be enhanced by theoretical and empirical work in contract theory. Reading these surveys and the literature, it seems fair to acknowledge two tendencies: first, that empirical work has been an active consumer of theory, but that theory has been a more timid consumer of empirical work and, second, that we seem to have many answers to (1), but fewer answers to (2). I will therefore develop two themes in my discussion: the necessity of a constructive dialogue between theory and empirical work, and the necessity to provide theoretical models that will more accurately capture market forces. Although the first theme is clearly present in RS and CS, the second theme is less present in these surveys, but is a logical consequence of the agendas described in RS and CS. Section 2 develops the two themes, and Section 3 illustrates these themes with some examples taken from CS.

1. ROCHET-STOLE AND CHIAPPORI-SALANIÉ

1.1. RS: Multidimensional Screening

The difﬁculty in multidimensional screening models is the lack of a natural order on types. The problem is not so much one of feasibility, because RS show an algorithm by which the solution can be computed. The problem is rather the possibility to obtain robust qualitative results (similar, e.g., to the “no inefﬁciency at the top” result in the one dimension). RS provide a useful classiﬁcation of the multidimensional models into three categories. They show that, for two of them (aggregation and separability), such robust results can be obtained. The properties of the solution in the aggregation case (i.e., when the multidimensionality can be reduced to one dimension by using an aggregator) are (obviously) related to the distribution of the aggregator. RS footnote 21 nicely illustrates this point. More important differences arise in the separability case (when transversality conditions can be ignored): bundling at the bottom and the possibility of efﬁciency at the top and at the bottom when one looks at one dimension only. RS convincingly show that a rich new set of economic problems can be studied by going from one to two (or more) dimensions. Budgetary constraints, sequential screening, and multiple product purchase are naturally modeled as multidimensional screening problems that can be analyzed at times as simply as in the one-dimensional case. Because in practice not all dimensions can be quantiﬁed or instrumented, a challenge faced by theory is to provide results like those in Figure 5.6 of RS (i.e., to establish a relationship between the endogenous variable and the quantiﬁable dimension). Figure 5.6 summarizes the relationship between the noise in the distribution of outside options and the quantity schedule contingent on the ﬁrst dimension in a parametric example. Because outside options are not observable, the relevant exogenous variable in a regression would indeed be the ﬁrst dimension only (the residual would then be the noise in outside options). 
We observe that all solutions are increasing in the first dimension, and that the schedule becomes “flatter” as the noise in outside options becomes larger. Note also that there is a U-shaped relationship between the noise and the size of the bundling region at the bottom. The comparative static results in Figure 5.6 are therefore quite useful from a theoretical perspective, since they tell us how noise in outside options yields different quality-price schedules than the fixed (and uniform) outside option case. However, it is not clear how easy it will be to identify these results. For instance, a change in the flatness of the optimal schedule could be obtained in the one-dimensional case by changing the distribution function (since the flatness is related to the hazard rate). It is not clear at this point how one can empirically distinguish a multidimensional model in which the second dimension is a (random) outside option from a one-dimensional model. There is a sense, however, in which this difficulty is also a strength, because the interpretation of the residual as unobserved outside options might be more satisfying than the interpretation in terms of measurement error.

1.2. CS: Capturing (Endogenous) Heterogeneity

CS’s survey covers a lot of ground. They identify early on the main challenge that empirical work must face: controlling for heterogeneity and endogeneity of the contractual relationships. If agents self-select into ﬁrms or contracting relationships, the outcome of the relation, as well as the contract itself, are explained by the characteristics of the agents, whereas the modeler would be tempted to see the contract as the endogenous variable and the characteristics of the agents in the relationship as the exogenous variables. Their warning should also echo to theorists. CS show that it is possible to create or ﬁnd good data sets to test a variety of important questions: incentive effects of compensation schemes, relative importance of adverse selection and moral hazard to explain behavior in markets, role of reputation, and effects of contractual instruments (e.g., insurance deductible, technology that make contracts more complete). At the same time CS make clear the difﬁculties in meeting their challenge: controlling for the selection effect, distinguishing between the available theoretical models, and controlling for quasi-rents. The task of identifying the incentive effect is already daunting. Trying to identify whether the form of contracting is “optimal,” as they set to do in their Section 3, is certainly even more daunting. For instance, principal-agent theory simply tells that, for a given outside option of the agent, there exists a second-best optimal contract that maximizes the level of utility of the principal. Changing the outside option might – and often will – also change the form of the second-best contract. Hence, unless there is a good way to proxy for the outside option, or for the market forces that affect this outside option, it is not clear how one can answer the question, “Are contracts optimal?” This problem is even more severe when other organizational instruments such as monitoring, auditing, size of the hierarchy, etc. 
define the form of the contract.

2. TOWARD A CONSTRUCTIVE DIALOGUE

2.1. More Theory or More Facts? Necessity of a Dialogue

Research in contract theory has proceeded like most other scientific endeavors: one step at a time. It has isolated sources of imperfections and has analyzed the consequences of these imperfections for contracts, prices, or organizations. This
literature has generated a large “toolbox” consisting of a host of models (e.g., adverse selection, moral hazard, multitasks, teams, principal-agent, principalagents, principals-agent, principals-agents, additive noise, multiplicative noise, complete contracting, incomplete contracting, dynamic contracting, career concerns). Do we now have an embarrassment of riches? To paraphrase the title of a recent paper,1 is contract theory plagued by too many theories and too few facts? I will argue in fact that we need more facts and more theory. A dialogue between theory and empirical work is necessary to identify the relevant omitted variables in theoretical and empirical research. Omitted variables are usually associated with econometric analysis. Theory is useful because it helps the econometrician pinpoint the relevant omitted variables, and how these variables affect the observed outcome. Here, the “embarrassment of riches” becomes the solution. Less appreciated perhaps is the fact that theoretical work also faces (by nature?) a problem of omitted variables. An analysis based on a moral hazard model will fail if the essence of the imperfection is adverse selection. If both moral hazard and adverse selection are important, a new model combining the two effects might be necessary if one expects new effects to emerge when the two imperfections are taken simultaneously into account. Here, empirical work helps by providing a “sanity check” on the relevance of a model in a given situation and by suggesting new avenues for research. Now, it is easy to make a model “more general”: generalize some assumptions. Ignoring issues of tractability, such generalizations seem to be useful for the dialogue with empirical work if they yield robust theoretical results that are qualitatively different from the simpler case and if these differences can be identiﬁed in empirical work. CS and RS are excellent illustrations of the beneﬁts of such a dialogue between theory and empirical work. 
The main focus of RS is on ﬁnding robust theoretical results and the main focus of CS is on identifying theoretical results in the data. However, and this is another theme of this discussion, while existing theoretical and empirical work can generate a dialogue to answer (1) – do incentives matter and how? – the theoretical literature uses a modeling paradigm that will eventually limit the possibility to pursue the dialogue successfully and answer (2) – what determines contracts? This modeling paradigm is the use of outside options for capturing market effects (i.e., forces external to the contract or the organization). Outside options capture the underlying market forces at play in the economy. The question is then which outside options correctly capture market forces. As I argue in the next section, there is a need for theoretical constructs that “bypass” the outside options and that capture directly the relationship between observable data and market forces. This would facilitate, for instance, the identiﬁcation of the effects in the random outside options model of RS, or the completion of the agenda set forth in Section 3 of CS.

1 Baker and Holmström (1995).

2.2. Omitted Variables

Contracts are shaped by a variety of forces or variables. Contract theory has mainly focused on the internal forces that shape an organization or a contract but has been relatively silent on the external forces that shape an organization.2 Examples of “internal” variables are: r Agents’ characteristics: risk aversion, talent, productivity, . . . . r Contractual instruments: monitoring, auditing instruments, delegation rights, screening devices, compensation schemes, . . . . whereas examples of “external” variables are r Policy variables: competition policy, regulation, . . . . r Market variables: distribution of characteristics, product market competition, process of matching, interest rate, market imperfections, . . . . For instance, we understand quite well that monitoring will reduce the cost of inducing productive effort and that more monitoring will be associated with ﬂatter compensation schemes. We understand less well why seemingly identical economies have ﬁrms with different monitoring intensities. We understand that an entrepreneur with more liquidity will need to forfeit less control to ﬁnance an investment. We understand less well the effects of economywide changes in liquidity on control structures. Interestingly, CS emphasize as one of the main sources of bias in empirical work on contracts the endogeneity of the match between contracting parties (i.e., an illustration of how market forces inﬂuence the characteristics of contracting parties, a question on which theory is most silent). Now, market forces are already taken into account in most models in contract theory, albeit in a shortcut sort of way. The “optimal contract” in a principalagent model is the solution to a constrained Pareto problem: maximize the welfare of the principal subject to a set of incentive constraints and subject to giving the agent his outside option (the participation constraint). 
The outside option of the agent captures his “market value.” By changing the outside option, one changes the nature of the optimal contract. Here is already a sense in which market forces matter for organizations. There is also a sense in which it is futile to test for the efficiency of contracting by using this type of model: by changing the outside option, we can generate as optimal contracts a large set of contracts. The fact that we do not observe directly outside options is another impediment. I will come back to this point in the examples at the end of this discussion.3

2 Nevertheless, a small, and growing, theoretical literature exists [e.g., Fershtman and Judd (1987), Legros and Newman (1996), Schmidt (1997), and Aghion, Dewatripont, and Rey (1999)].

3 What about many agents or many types of agents? Most of the literature has been developed under the assumption of a unique outside option. More generally, the outside option of a type can vary with the type (as in the countervailing incentive literature) or can be a random variable (as in RS). In each case, the relationship between types and outside options is quite important for the qualitative properties of the optimal contract. Because we do not observe directly the distribution of outside options, it is not clear how the new effects from these generalizations can be identified.


Therefore, the outside option is a convenient theoretical shortcut, but is not an instrument that can be directly used in empirical work for capturing market forces. What seems needed is a theoretical apparatus that will articulate how outside options are determined. Such a mechanism will then link directly some observable and hopefully quantifiable variables to contractual or organizational forms, bypassing the outside options. Some of the work cited in footnote 2 goes in this direction, but much remains to be done on this front.

3. REVISITING SOME EXAMPLES

Two of the examples in CS enable me to illustrate the benefits of a dialogue between theory and empirical work, and the need to instrument for external forces. The first example is about the role of risk in shaping contracts. The second example is about the form of compensation schemes between fixed wage and piece rate.

3.1. The Risk-Incentive Trade-off

Independently of the risk attitude of the agent, the creation of incentives requires variations in output-based compensation. The cost-minimizing schedule for a risk-neutral principal who wants to give a risk-averse agent his outside option is a perfectly flat compensation schedule. Because a fixed compensation is incompatible with incentives, some variation in compensation characterizes the second-best contract. If the risk inherent in production increases, two effects come into play: first, more insurance should be provided to the agent to meet his outside option (for a given effort level); second, because the marginal expected return from effort changes, the incentive compatible level of effort changes (for a given compensation scheme). How the two effects interact is ambiguous; what is not ambiguous is that risk in production will have an effect on contracting.

Some models4 predict a negative correlation between risk in production and variation in output-based compensation. A natural place to test this prediction is contracts for sharecropping: lands with more risky crops should be associated with sharecropping (tenant shares the risk with the owner), whereas lands with less risky crops should be associated with rental contracts (tenant faces all the risk). The empirical literature has shown that there is no such positive relationship between risk and sharecropping. CS cite the explanation of Ackerberg and Botticini (2002). Let us embed the basic sharecropping model in a two-sided matching model where one side, the workers, is differentiated by their risk attitude, and the other side, the crops, is differentiated by their riskiness. In a competitive equilibrium, more risk-averse agents will be assigned to less risky crops, whereas risk-neutral agents will be assigned to more risky crops. Risk-neutral agents are willing to bear all risk (i.e., we should observe rental contracts for risky crops and sharecropping for less risky crops, which is consistent with stylized facts but is the opposite of what a model with homogeneous workers would predict). Hence, theory omits both an “internal variable” – the heterogeneity in workers’ risk attitude – and an “external variable” – the competitive determination of the assignment of workers to crops. Here, “facts” force theory to identify relevant omitted variables.

4 For example, the normal noise model with constant absolute risk aversion utility functions and linear-sharing rules.

However, this is not the end of the dialogue. Imagine that workers indeed have the same risk attitude and that crops have different riskiness. Can theory still make sense of “the facts”? If yes, what are the relevant omitted variables? We can follow here an early work of Rao (1971).5 Because the ability to contract on output is linked to its verifiability, riskier crops prevent the use of output contingent contracts – absent technologies that make output verifiable. Hence, a profit-maximizing land owner who can allocate resources between technologies that make input verifiable and technologies that make output verifiable will tend to favor output monitoring when crops are risky and to favor input monitoring when crops are less risky. Now, if there is input monitoring, it is easier to contract directly on the worker effort, and the contract should reflect first-best risk-sharing arrangements, whereas if there is output monitoring, incentives will be created by having the worker bear more risk. Here, again, we obtain a negative correlation between riskiness of crop and sharecropping, absent heterogeneity in risk attitudes. Theory therefore points out an omitted internal variable – the ability to monitor (or measure) input and output6 – and emphasizes the trade-off between rent extraction and incentives.
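As an illustration of the kind of model that footnote 4 refers to, the following is a minimal sketch of the textbook normal-noise, constant-absolute-risk-aversion, linear-contract setting. It is not taken from the text: the quadratic effort cost and all parameter names (`risk_aversion`, `noise_variance`, `cost_curvature`) are assumptions made for the example. Under those assumptions the optimal output share is b* = 1 / (1 + r k sigma^2), which falls as production risk rises.

```python
def optimal_share(risk_aversion: float, noise_variance: float,
                  cost_curvature: float = 1.0) -> float:
    """Optimal slope b of a linear contract w = a + b * x.

    Assumed setting (illustrative, not from the text):
      output x = e + eps, with eps ~ N(0, noise_variance)
      effort cost c(e) = cost_curvature * e**2 / 2
      agent has CARA utility with coefficient risk_aversion
    The agent picks e = b / cost_curvature, and maximizing total
    certainty-equivalent surplus over b gives the closed form below.
    """
    return 1.0 / (1.0 + risk_aversion * cost_curvature * noise_variance)


# Hypothetical parameter values, chosen only to show the comparative static.
variances = [0.0, 0.5, 1.0, 2.0, 4.0]
shares = [optimal_share(risk_aversion=2.0, noise_variance=v) for v in variances]

for v, b in zip(variances, shares):
    print(f"noise variance {v:>4}: output share b = {b:.3f}")
```

With no noise the agent bears all output variation (b = 1, a rental-type contract); as the variance grows the share falls toward a flat payment. This is the prediction that the sharecropping evidence discussed above fails to support.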

3.2. From Fixed Wages to Piece Rates

3.2.1. Incentives Matter

CS cite the papers by Lazear (1999) and by Paarsch and Shearer (1999), who show how going from fixed wage to piece rates will generate (large) productivity gains. For those of us who are interested in incentive theory, this is good news indeed. In the case of Paarsch and Shearer, the firm uses both piece rate and wage contracts, whereas in the case of Lazear there was a change in management that coincided with a change of compensation scheme from fixed wage to piece rate. In the first case, the observed productivity reflects both the contractual terms and the land condition (piece rate is associated with good planting conditions).7 In the second case the observed productivity seems to reflect only the contractual change. Both studies are related to question (1). Paarsch and Shearer also partially answer question (2) because they see as a possible source of contractual choice the quality of the land. Lazear is more silent on (2). For both situations, outside options are not taken into account. This raises a natural question in the case of Lazear: Why did we observe the contractual change following the change of management? There are at least three possible answers.

• Is it because there was some type of organizational innovation? This is not likely given the prevalence of piece-rate contracts elsewhere.
• Is it because the previous management did not realize the productivity benefits of using piece rates? Possibly (and this could explain why the previous management was replaced). In this case, the contractual change generates sorting effects: high types are paid more and therefore will tend to “flow” toward the firm more than before.8
• Or is it because the change of management coincided with a change in outside options (or other market conditions) of the workers?9 In this case, sorting effects generate the contractual change. It is because high types have a relatively larger outside option than low types that the contract must be piece-rate in order to minimize the cost to the firm of giving each type of agent his outside option. Here the omitted variable is external.

5 See, also, Allen and Lueck (1995), Newman (1999), and Prendergast (2000). Leffler and Rucker (1991) show also that contractual choices are best explained by variables like enforcement costs or measurement costs rather than differences in risk attitudes. Interestingly, Ackerberg and Botticini (2000) conclude that there is no empirical support for the risk-sharing hypothesis, but that there is empirical support for the moral hazard and the imperfect capital market hypotheses.

6 A corollary of this story is that riskier crops should also be correlated with more delegation of authority to the worker. See Rao (1971) or Prendergast (2000).

3.2.2. Outside Options Matter

In the work cited by RS and by CS, moral hazard or asymmetric information was key to explaining the performance and nature of the contracts. As I have argued, external variables are also important. Here, I would like to propose a simple example showing how external variables could be sufficient to explain, for instance, the choice of piece-rate versus wage contracts.

Consider a risk-neutral principal who has limited liability (this is the first “market variable”; there is a missing insurance market) and who contracts with a risk-averse worker. Assume that output is verifiable and that effort is contractable. To simplify, assume that there is a unique level of effort consistent with production and that there is an equal probability that a low output $R_0$ and a high output $R_1$ are realized. The principal will therefore choose a contingent contract $(w_0, w_1)$ that minimizes the expected wage bill subject to two constraints: (i) the limited liability constraint that wages cannot exceed available output and (ii) the participation constraint that the expected utility of the agent is at least his outside option $\bar{u}$ (this is our second “market variable”). The principal solves the problem

$$\min_{w_0, w_1} \; w_0 + w_1 \quad \text{subject to} \quad u(w_0) + u(w_1) \ge 2\bar{u}, \qquad w_i \in [0, R_i], \; i = 0, 1.$$

It is straightforward to show that the cost-minimizing schedule is of the form $w_1 = w_0 + b(R_1 - R_0)$ and that there exists a cutoff level $u_0 = u(R_0)$ such that when the outside option is smaller than $u_0$, the optimal $b$ is equal to zero (wage contract) and when the outside option is greater than $u_0$, the optimal $b$ is positive and increases in the outside option (piece-rate contract). This is also simply illustrated in the Edgeworth box diagram (Figure 1) where the contract curve corresponding to the previous problem is the thick line. For low values of the outside option, the contract curve is the full insurance line. For high values of the outside option, the limited liability constraint of the principal binds and prevents full insurance. A change from wage contracting to piece-rate contracting is therefore directly due to an increase in outside options, absent any agency problem.

Figure 1. Piece-rate contracts are optimal when outside option is large.

7 Note the parallel with the previous explanation for correlation between sharecropping and riskiness of crop.

8 This is the observation of Lazear (1999).

9 Think of a situation in which the type of a worker affects his private cost of production, but not the level of production. It is easy to show that, if there is any cost to implementing menu contracts, we will observe for relatively equal outside options a unique wage-effort contract, although if the outside options are more unequal, we will observe a menu contract that can be implemented by a piece-rate contract.
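The cutoff result can be checked numerically. The sketch below solves the principal's problem in closed form for an assumed square-root utility; the utility function, the function name `optimal_contract`, and the output values used in the example are illustrative choices, not part of the original discussion.

```python
import math


def optimal_contract(u_bar, r0, r1, u=math.sqrt, u_inv=lambda v: v * v):
    """Cost-minimizing (w0, w1, b) for two equally likely outputs r0 < r1.

    The principal minimizes w0 + w1 subject to
      u(w0) + u(w1) >= 2 * u_bar   (participation)
      0 <= wi <= ri                (limited liability)
    u must be increasing and concave; u_inv is its inverse.
    sqrt utility is an assumption made for this example only.
    """
    if u_bar > (u(r0) + u(r1)) / 2:
        raise ValueError("outside option cannot be met even by paying out all output")
    if u_bar <= u(r0):
        # Full insurance is feasible: pay the same wage in both states.
        w = u_inv(u_bar)
        return w, w, 0.0
    # Limited liability binds in the low state; the high-state wage
    # makes up the remaining utility.
    w0 = r0
    w1 = u_inv(2 * u_bar - u(r0))
    b = (w1 - w0) / (r1 - r0)
    return w0, w1, b


# Hypothetical outputs R0 = 4 and R1 = 16, so the cutoff is u(R0) = 2.
for u_bar in (1.0, 2.0, 2.5, 3.0):
    w0, w1, b = optimal_contract(u_bar, 4.0, 16.0)
    print(f"outside option {u_bar}: w0 = {w0:.2f}, w1 = {w1:.2f}, b = {b:.3f}")
```

For outside options up to the cutoff the contract is a flat wage (b = 0). Above the cutoff the low-state wage is stuck at R0 and the slope b rises with the outside option, reaching 1 when the agent receives all output in both states.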


References

Ackerberg, D. and M. Botticini (2002), “Endogenous Matching and the Empirical Determinants of Contract Form,” Journal of Political Economy, 110(3), 564–591.
Ackerberg, D. and M. Botticini (2000), “The Choice of Agrarian Contracts in Early Renaissance Tuscany: Risk Sharing, Moral Hazard, or Capital Market Imperfections?” Explorations in Economic History, 37(3), 241–257.
Aghion, P., M. Dewatripont, and P. Rey (1999), “Competition, Financial Discipline and Growth,” Review of Economic Studies, 66, 825–852.
Allen, D. W. and D. Lueck (1995), “Risk Preferences and the Economics of Contracts,” American Economic Review, 85, 447–451.
Baker, G. and B. Holmström (1995), “Internal Labor Markets: Too Many Theories, Too Few Facts,” American Economic Review, 85(2), 255–259.
Fershtman, C. and K. Judd (1987), “Equilibrium Incentives in Oligopoly,” American Economic Review, 77, 927–940.
Lazear, E. (1999), “Performance Pay and Productivity,” mimeo (revised version of NBER W5672, 1996, with the same title).
Leffler, K. B. and R. R. Rucker (1991), “Transaction Costs and the Organization of Production: Evidence from Timber Sales Contracts,” Journal of Political Economy, 99(5), 1060–1087.
Legros, P. and A. Newman (1996), “Wealth Effects, Distribution and the Theory of Organization,” Journal of Economic Theory, 70, 312–341.
Newman, A. (1999), “Risk-Bearing, Entrepreneurship and the Theory of Moral Hazard,” mimeo.
Prendergast, C. (2000), “Uncertainty and Incentives,” mimeo.
Rao, C. H. H. (1971), “Uncertainty, Entrepreneurship, and Sharecropping in India,” Journal of Political Economy, 79(3), 578–595.
Schmidt, K. (1997), “Managerial Incentives and Product Market Competition,” Review of Economic Studies, 64, 191–213.

CHAPTER 6

Theories of Fairness and Reciprocity: Evidence and Economic Applications

Ernst Fehr and Klaus M. Schmidt

1. INTRODUCTION

Most economic models are based on the self-interest hypothesis that assumes that all people are exclusively motivated by their material self-interest. Many influential economists – including Adam Smith (1759), Gary Becker (1974), Kenneth Arrow (1981), Paul Samuelson (1993), and Amartya Sen (1995) – have pointed out that people often do care for the well-being of others and that this may have important economic consequences. Yet, so far, these opinions have not had much of an impact on mainstream economics. In recent years, experimental economists have gathered overwhelming evidence that systematically refutes the self-interest hypothesis. The evidence suggests that many people are strongly motivated by other-regarding preferences, and that concerns for fairness and reciprocity cannot be ignored in social interactions. Moreover, several theoretical papers have been written showing that the observed phenomena can be explained in a rigorous and tractable manner. Some of these models shed new light on problems that have puzzled economists for a long time (e.g., the persistence of noncompetitive wage premia, the incompleteness of contracts, the allocation of property rights, the conditions for successful collective action, and the optimal design of institutions). These theories in turn induced a new wave of experimental research offering additional exciting insights into the nature of preferences and into the relative performance of competing theories of fairness. The purpose of this paper is to review these recent developments, to point out open questions, and to suggest avenues for future research. Furthermore, we will argue that it is not only necessary, but also very promising for mainstream economics to take the presence of other-regarding preferences into account.

Why are economists so reluctant to give up the self-interest hypothesis? One reason is that this hypothesis has been quite successful in providing accurate predictions in some economic domains.
For example, models based on the self-interest hypothesis make very good predictions for competitive markets with standardized goods. This has been shown in many carefully conducted market experiments. However, a large amount of economic activity is taking
place outside of competitive markets – in markets with a small number of traders, in markets with informational frictions, in firms and organizations, and under incompletely specified and incompletely enforceable contracts. In these environments, models based on the self-interest assumption frequently make very misleading predictions. An important insight provided by some of the newly developed fairness models is that they show why, in competitive environments with standardized goods, the self-interest model is so successful and why, in other environments, it is refuted. In this way, the new models provide fresh and experimentally confirmed insights into important phenomena (e.g., nonclearing markets or the widespread use of incomplete contracts).

We consider it important to stress that the available experimental evidence also suggests that many subjects behave quite selfishly even when they are given a chance to affect other people’s well-being at a relatively small cost. However, there are also many people who are strongly motivated by fairness and reciprocity and who are willing to reward or punish other people at a considerable cost to themselves. One of the exciting insights of some of the newly developed theoretical models is that the interaction between fair and selfish individuals is key to the understanding of the observed behavior in strategic settings. These models explain why, in some strategic settings, almost all people behave as if they are completely selfish, whereas in others the same people will behave as if they are driven by fairness.

A second reason for the reluctance to give up the self-interest hypothesis is methodological. There is a strong convention in economics of not explaining puzzling observations by changing assumptions on preferences. Changing preferences is said to open Pandora’s box, because everything can be explained by assuming the “right” preferences.
We believe that this convention made sense in the past when economists did not have sophisticated tools to examine the nature of preferences in a scientifically rigorous way. However, because of the development of experimental techniques, this is no longer true. In fact, one purpose of this paper is to show that much progress and fascinating new insights into the nature of fairness preferences have been made in the past decade. Although there is still much to be done, this research clearly shows that it is possible to discriminate between theories based on different preference assumptions. Therefore, in view of the facts, the new theoretical developments, the importance of fairness concerns in many economic domains, and in view of the existence of rigorous experimental techniques that allow us to examine hitherto unsolvable problems in a scientific manner, we believe that it is time to recognize that a substantial fraction of people are also motivated by fairness concerns. People do not differ only in their tastes for chocolate and bananas, but also along a more fundamental dimension. They differ with regard to how selfish or fair-minded they are, and this does have important economic consequences.

The rest of this paper is organized as follows. Section 2 provides many real-life examples indicating the relevance of fairness considerations and reviews the experimental evidence. It shows that the self-interest model is refuted in many important situations and that a substantial number of people seem to be strongly concerned about fairness and behave reciprocally. Section 3 surveys different theoretical approaches that try to explain the observed phenomena. In the meantime, there is also a large and growing literature on the evolutionary origins of reciprocity (see, e.g., Bowles and Gintis 1999; Gintis 2000; Sethi and Somananthan 2000, 2001). We do not discuss and review this literature in our paper. Section 4 discusses the wave of new experiments that have been conducted to discriminate between these theories. Section 5 explores the implications of fairness-driven behavior in various economic applications and offers some directions for future research. Section 6 concludes. In view of the length of our paper, it is also possible to read the paper selectively. For example, readers who are already familiar with the basic evidence and the different fairness theories may go directly to the new evidence in Section 4 and the economic applications in Section 5.

2. EMPIRICAL FOUNDATIONS OF FAIRNESS AND RECIPROCITY

2.1. Where Does Fairness Matter?

The notion of fairness is frequently invoked in families, at the workplace, and in people’s interactions with neighbors, friends, and even strangers. For instance, our spouse becomes sour if we do not bear a fair share of family responsibilities. Our children are extremely unhappy and envious if they receive less attention and fewer gifts than their brothers and sisters. We do not like those among our colleagues who persistently escape doing their share of important, yet inconvenient, departmental activities.

Fairness considerations are, however, not restricted to our personal interactions with others. They shape the behavior of people in important economic domains. For example, employee theft and the general work morale of employees are affected by the perceived fairness of the firm’s policy (Greenberg, 1990; Bewley, 1999). The impact of fairness and equity norms may render direct wage cuts unprofitable (Kahneman, Knetsch, and Thaler, 1986; Agell and Lundborg, 1995). Firms may, therefore, be forced to cut wages in indirect ways (e.g., by outsourcing activities). Fairness concerns may thus influence decisions about the degree of vertical integration. They may also severely affect the hold-up problem as demonstrated by Ellingsen and Johannesson (2000). Debates about the appropriate income tax schedule are strongly affected by notions of merit and fairness (Seidl and Traub, 1999). The amount of tax evasion is likely to be affected by the perceived fairness of the tax system (Frey and Weck-Hanneman, 1984; Alm, Sanchez, and de Juan, 1995; Andreoni, Erard, and Feinstein, 1998). Public support for the regulation of private industries depends on the perceived fairness of the firms’ policies (Zajac, 1995). Compliance with contractual obligations, with organizational rules, and with the law in general is strongly shaped by the perceived fairness of the allocation of material benefits and by issues of procedural justice (Lind and Tyler, 1988; Fehr, Gächter, and Kirchsteiger, 1997). The functioning of incentive-compatible mechanisms has been shown to depend on fairness considerations (Andreoni and Varian, 1999). The solution of collective action problems (e.g., rules regulating the access to common pool resources) critically depends on the fairness of the allocation of the costs and benefits of the rules (Ostrom, 1990, 2000; Falk, Fehr, and Fischbacher, 2000c). The erosion of public support for the welfare state in the United States in the last two decades probably has much to do with deeply entrenched notions of reciprocal fairness (Bowles and Gintis, 2000). Many people cease to support public programs that help the poor if they have the impression that the poor do not attempt to bear their share of a society’s obligations.

Thus, real-world examples in which fairness concerns are likely to matter abound. Nevertheless, in the following, we concentrate on clean experimental studies, because in most real-life situations, it is impossible to unambiguously isolate the impact of fairness motives. A skeptic may always argue that the notion of fairness is used only for rhetorical purposes that disguise purely self-interested behavior in an equilibrium of a repeated game. Therefore, we rely on experimental evidence of human decision making. In these experiments, real subjects make decisions with real monetary consequences in carefully controlled laboratory settings. In particular, the experimenter can implement one-shot interactions between the subjects so that long-term self-interest can be ruled out as an explanation for what we observe. As we will see, in some experiments, the monetary stakes involved are quite high – amounting to as much as three months’ income. In the experiments reviewed, subjects do not know each other’s identity, they interact anonymously, and, sometimes, even the experimenter cannot observe their individual choices.

2.2. Experimental Evidence

In hindsight, it is a bit ironic that experiments have proved to be critical for the discovery and the understanding of fairness-driven behavior, because for several decades, experimental economists were firmly convinced that fairness motives would not matter much. At best, fair behavior was viewed as a temporary deviation from the strong forces of self-interest. In the 1950s, Vernon Smith discovered that, under relatively weak conditions, experimental markets quickly converge to the competitive equilibrium.1 Since then, the remarkable convergence properties of experimental markets have been confirmed by hundreds of experiments (see, e.g., Davis and Holt, 1993). For these experiments, the equilibrium is computed under the assumption that all players are exclusively self-interested. Therefore, the quick convergence to equilibrium has been interpreted as a confirmation of the self-interest hypothesis. We will see later in this paper that this conclusion was premature because, as the newly developed models of fairness show (see Section 3 and Section 5.1), convergence to standard competitive predictions can occur even if agents are very strongly concerned about fairness.

1 Smith’s results were eventually published in 1962 in the Journal of Political Economy after time-consuming debates with the referees. It is also ironic that Smith’s initial aim was “to do a more credible job of rejecting competitive price theory” than Chamberlin (1948).

This strong commitment to the self-interest hypothesis slowly weakened in the 1980s when experimental economists started to study bilateral bargaining games and interactions in small groups in controlled laboratory settings (see, e.g., Roth, Malouf, and Murningham, 1981; Güth et al., 1982). One of the important experimental games that ultimately led many people to realize that the self-interest hypothesis is problematic was the so-called Ultimatum Game (UG) invented by Güth, Schmittberger, and Schwarze (1982). In addition, the Gift Exchange Game (GEG), the Trust Game (TG), the Dictator Game (DG), and Public Good Games (PGGs) played an important role in weakening the exclusive reliance on the self-interest hypothesis. All these games share the feature of simplicity. Because they are so simple, they are easy to understand for the experimental subjects, and this makes inferences about subjects’ motives more convincing.

In the UG, a pair of subjects has to agree on the division of a fixed sum of money. Person A, the Proposer, can make one proposal of how to divide the amount. Person B, the Responder, can accept or reject the proposed division. In the case of rejection, both receive nothing; in the case of acceptance, the proposal is implemented.
Under the standard assumptions that (i) both the Proposer and the Responder are rational and care only about how much money they get and (ii) that the Proposer knows that the Responder is rational and selfish, the subgame perfect equilibrium prescribes a rather extreme outcome: The Responder accepts any positive amount of money and, hence, the Proposer gives the Responder the smallest money unit, ε, and keeps the rest. A robust result in the UG, across hundreds of experiments, is that proposals offering the Responder less than 20 percent of the available surplus are rejected with probability 0.4–0.6. In addition, the probability of rejection is decreasing in the size of the offer (see, e.g., Güth et al., 1982; Camerer and Thaler, 1995; Roth, 1995, and the references therein). Apparently, many Responders do not behave in a self-interest maximizing manner. In general, the motive indicated for the rejection of positive, yet “low,” offers is that subjects view them as unfair. A further robust result is that many Proposers seem to anticipate that low offers will be rejected with a high probability. This is suggested, for example, by the comparison of the results of DGs and UGs. In a DG, the Responder’s option to reject is removed – the Responder must accept any proposal. Forsythe et al. (1994) were the first to compare the offers in UGs and DGs. They report that offers are substantially higher in the UG, which suggests that many Proposers do apply backward induction. This interpretation is also supported by the surprising observation of Roth, Prasnikar, Okuno-Fujiwara, and Zamir (1991), who showed that the modal offer in the UG tends to maximize the expected income of the Proposer.2

The UG shows that a sizeable fraction of Responders is willing to punish behavior that is perceived as unfair. In contrast, the GEG indicates that a substantial fraction of the Responders is willing to reward actions that are perceived as generous or fair. The first GEG was conducted by Fehr, Kirchsteiger, and Riedl (1993). In the GEG, the Proposer offers an amount of money $w \in [\underline{w}, \overline{w}]$, $\underline{w} \ge 0$, which can be interpreted as a wage payment, to the Responder. The Responder can accept or reject $w$. In case of a rejection, both players receive zero payoff; in case of acceptance, the Responder has to make a costly “effort” choice $e \in [\underline{e}, \overline{e}]$, $\underline{e} > 0$. The monetary payoff for the Proposer is $x_P = ve - w$, whereas the Responder’s payoff is $x_R = w - c(e)$, where $v$ denotes the marginal value of effort for the Proposer and $c(e)$ the strictly increasing effort cost schedule.3 Under the standard assumptions (i) and (ii), the Responder will always choose the lowest feasible effort level $\underline{e}$ and will, in equilibrium, never reject any $w$. Therefore, the subgame perfect proposal is the lowest feasible wage level $\underline{w}$. The GEG captures a principal–agent relation with highly incomplete contracts in a stylized way. Variants of the GEG have been conducted by several authors.4 All of these studies report that the mean effort is, in general, positively related to the offered wage, which is consistent with the interpretation that the Responders, on average, reward generous wage offers with generous effort choices. However, as in the case of the UG, there are considerable individual differences among the Responders.
Although there typically is a sizeable fraction of Responders (frequently roughly 40 percent, sometimes more than 50 percent) who exhibit a reciprocal effort pattern, there is also a substantial fraction of Responders who always make purely selﬁsh effort choices or whose choices seem to deviate randomly from the self-interested action. Despite the presence of selﬁsh Responders, the relation between average effort and wages is in general sufﬁciently steep to render a high wage policy proﬁtable. This induces Proposers to pay wages far above w. Evidence for this interpretation comes from Fehr, Kirchsteiger, and Riedl, who embedded the GEG into an experimental market. 2

3

4

Suleiman (1996) reports the results of UGs with varying degrees of veto power. In these games, a rejection meant that λ percent of the cake was destroyed. For example, if λ = 0.8, and the Proposer offered a 9:1 division of $10, a rejection implied that the Proposer received $1.8, whereas the Responder received $0.2. Suleiman reports that Proposers’ offers are strongly increasing in λ. In some applications of this game, the Proposer’s payoff was given by x P = (v − w)e. This formulation rules out that Proposers can make losses when they offer generously high wages. Likewise, in some applications of the GEG, the Responder did not have the option to reject w. Thus, the Proposer just sent w, whereas the Responder choose an effort level. Under the standard assumptions of rationality and selﬁshness, the subgame perfect equilibrium is, however, not affected by these differences. See, e.g., Fehr, Kirchsteiger, and Riedl (1993, 1998), Charness (1996, 2000), Brandts and Charness (1999), Falk, G¨achter, and Kovacs (1999), Fehr and Falk, (1999), G¨achter and Falk (1999), and Hannan, Kagel, and Moser (1999).


Fehr and Schmidt

In addition to the embedded GEG, there was a control condition in which the effort level was exogenously fixed by the experimenter. Note that, in the control condition, the Responders can no longer reward generous wages with high effort levels. It turns out that the average wage is substantially reduced when the effort is exogenously fixed. Another important game that did much to change the exclusive reliance on the self-interest hypothesis was the TG, first studied by Berg, Dickhaut, and McCabe (1995). In a TG, a Proposer receives an amount of money $y$ from the experimenter, and then can send between zero and $y$ to the Responder. The experimenter then triples the amount sent, which we term $z$, so that the Responder has $3z$. The Responder is then free to return anything between zero and $3z$ to the Proposer. It turns out that many Proposers send money and that many Responders give back some money. Moreover, there is frequently a strong correlation between $z$ and the amount sent back at the individual, as well as at the aggregate, level (see, e.g., Miller 1997, Cox 2000, Fahr and Irlenbusch, 2000). Finally, we briefly consider the evidence on PGGs. Like the GEG, the PGG is important because it not only provides interesting insights into the nature of nonpecuniary motivations, but it also captures the essence of numerous real-world situations. There is by now a huge experimental literature on PGGs (for surveys, see Dawes and Thaler, 1988; Ledyard, 1995). In the typical experiment, there are $n$ players who simultaneously decide how much of their endowment to contribute to a public good. Player $i$’s monetary payoff is given by $x_i = y_i - g_i + m \sum_{j} g_j$, where $y_i$ is player $i$’s endowment, $g_i$ her contribution, $m$ the monetary payoff per unit of the public good, and $\sum_{j} g_j$ the amount of the public good provided by all players. The unit payoff $m$ obeys $m < 1 < nm$.
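The PGG payoff rule can be checked numerically with a short Python sketch. The group size, endowment, and unit payoff below (n = 4, y = 20, m = 0.5) are assumed values chosen only so that m < 1 < nm holds; the function name is hypothetical.

```python
def pgg_payoffs(endowments, contributions, m):
    """Monetary payoffs x_i = y_i - g_i + m * sum_j g_j for every player i."""
    total = sum(contributions)
    return [y - g + m * total for y, g in zip(endowments, contributions)]


# Assumed illustrative values: n = 4, y_i = 20, m = 0.5, so m < 1 < n*m = 2.
N_PLAYERS, ENDOWMENT, M = 4, 20, 0.5
ENDOWMENTS = [ENDOWMENT] * N_PLAYERS

nobody = pgg_payoffs(ENDOWMENTS, [0, 0, 0, 0], M)         # no one contributes
everybody = pgg_payoffs(ENDOWMENTS, [20, 20, 20, 20], M)  # full contribution
free_rider = pgg_payoffs(ENDOWMENTS, [0, 20, 20, 20], M)  # player 0 defects

if __name__ == "__main__":
    print(nobody, everybody, free_rider)
```

In this sketch, full contribution gives every player more than universal free riding, yet the single free rider earns more than she would by contributing. Each unit contributed changes a player's own payoff by m − 1 < 0 while changing the group's total payoff by nm − 1 > 0, which is the content of the condition m < 1 < nm.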
This ensures that it is a dominant strategy to contribute nothing to the public good, although the total surplus would be maximized if all players contributed their whole endowment.5 In many experiments, the PGG is repeated for about 10 periods, with the group composition changing randomly in each period. If we restrict attention to behavior in the final period (to abstract from repeated-game or learning effects), it turns out that roughly 75 percent of all subjects contribute nothing to the public good and the rest contribute very little.6 If one adds to the PGG the opportunity to punish other group members, the contribution pattern changes radically (Fehr and Gächter 2000). In a PGG with a punishment option, there are two stages. Stage 1 is identical to the previously described PGG. At stage 2, after every player in the group has been informed

5 Typically, endowments are identical and n ≤ 10, but there are also experiments with a group size of 40 and 100 (Isaac, Walker, and Williams, 1994).

6 At the beginning of a repeated PGG, subjects contribute on average between 40 and 60 percent of their endowment; but, toward the end, contributions are typically very low. This pattern may be due to repeated game effects. Another plausible reason for the decay of cooperation is that many subjects are conditional cooperators, as shown by Croson (1999), Fischbacher, Gächter, and Fehr (1999), and Sonnemans, Schram, and Offerman (1999). Conditional cooperators cease to cooperate once they notice that selfish subjects take advantage of their cooperation.

Fairness and Reciprocity


about the contributions of each group member, each player can assign up to ten punishment points to each of the other players. The assignment of one punishment point reduces the ﬁrst-stage income of the punished subject by three points on average, but it also reduces the income of the punisher according to a strictly increasing and convex cost schedule. Note that because punishment is costly for the punisher, the self-interest hypothesis predicts zero punishment. Moreover, because rational players will anticipate this, the self-interest hypothesis predicts that nobody will contribute (i.e., there should be no difference in the contribution behavior between the usual PGG and a PGG with a punishment opportunity). The experimental evidence is, however, completely at odds with this prediction. Although, in the usual PGG, cooperation is close to zero in the ﬁnal period, the punishment opportunity causes, on average, stable cooperation rates around 75 percent of subjects’ endowment.7 The reason for these huge differences in contribution behavior is that in the punishment condition many cooperators punish the free riders. The more a subject deviates from the average contribution of the other group members, the more it is punished. Thus, the willingness to punish “unfair” behavior is not restricted to the UG. The above-mentioned facts in the UG, the GEG, the TG, and the PGG are now well established, and there is little disagreement about them. But there are, of course, questions about which factors change the behavior in these games. For example, a question that routinely comes up in discussions with economists is whether a rise in the stake level will eventually induce subjects to behave in a self-interested manner. There are several papers examining this question (Hoffman, McCabe, and Smith 1996; Fehr and Tougareva 1995; Slonim and Roth 1998; Cameron 1999). 
The surprising answer is that relatively large increases in the monetary stakes did little or nothing to change behavior. Hoffman et al. could not detect any effect of the stake level in their UGs. Fehr and Tougareva conducted GEGs (embedded in a competitive experimental market) in Moscow. In one condition, the subjects earned, on average, the equivalent of 1 week’s income in the experiment. In another condition, they earned the equivalent of 10 weeks’ income. Despite this large difference in the stake size, there are no significant differences across conditions in the behavior of either the Proposers or the Responders. Slonim and Roth conducted UGs in Slovakia. They found a small interaction effect between experience and the stake level. In the final period of a series of one-shot UGs, the Responders in the high-stake condition (with a tenfold increase in the stake level relative to the low-stake condition) seem to be willing to reject a bit less frequently. Fehr and Tougareva also allowed subjects to repeat the game (with randomly matched partners). They found no such interaction effects. Cameron conducted UGs in Indonesia and, in the high-stake condition, subjects could earn the equivalent of three months’ income in her experiment. She observed no effect of the stake level on Proposers’ behavior and a slight reduction of the rejection probability when stakes were high. Of course, it is still possible that, in the presence of extremely high stakes, there may be a shift toward more selfish behavior. However, for large segments of the population, this is not the economically relevant question. For almost all people, the vast majority of their decisions involve stake levels well below three months’ income. Thus, even if fairness-driven behavior played no role at all at stake levels above that size, fairness concerns would still play a major role in many economically important domains.

7 If the same subjects are allowed to stay together for 10 periods, the cooperation rate even climbs to 90 percent of subjects’ endowments in the final period. In Fehr and Gächter (2000), the group size was n = 4. Recently, Carpenter (2000) showed that, with a group size of n = 10, subjects achieve almost full cooperation, even with a random group composition over time.

2.3. Interpretation of the Evidence

Although there is now little disagreement regarding the facts, there is still disagreement about the interpretation of these facts. In Section 3, we will describe several recently developed theories of fairness that maintain the rationality assumption, but change the assumption of purely selﬁsh preferences. Some researchers have, however, reservations about changes in the motivational assumptions and prefer, instead, to interpret the behavior in these games as elementary forms of bounded rationality. For example, Binmore, Gale, and Samuelson (1995) and Roth and Erev (1995) try to explain the presence of fair offers and rejections of low offers in the UG by learning models that are based on purely pecuniary preferences. These models are based on the idea that the rejection of low offers is not very costly for the Responder and, therefore, the Responders learn only very slowly not to reject such offers. The rejection of offers is, however, quite costly for the Proposers. Therefore, Proposers learn more quickly that it does not pay to make low offers. Moreover, because Proposers quickly learn to make fair offers, the pressure on the Responders to learn accepting low offers is greatly reduced. This gives rise to very slow convergence to the subgame perfect equilibrium – if there is convergence at all. The simulations of Binmore et al. and Roth and Erev show that it often takes thousands of iterations until play comes close to the standard prediction. In our view, there can be little doubt that learning processes are important in real life, as well as in laboratory experiments. There are numerous examples where the behavior of subjects changes over time, and it seems clear that learning models are prime candidates to explain such dynamic patterns. We believe, however, that attempts to explain the basic facts in such simple games as the UG, the GEG, and the TG in terms of learning models that assume completely selﬁsh preferences are misplaced. 
The decisions of the Responders, in particular, are so simple in these games that it is difﬁcult to believe that they make systematic mistakes and reject money or reward generous offers, although their true preferences would require them not to do so. Moreover, the previously cited evidence from Roth et al. (1991), Forsythe et al. (1994), Suleiman (1996), and Fehr et al. (1998) suggests that many Proposers do anticipate Responders’


actions surprisingly well. Thus, at least in these simple two-stage games, many Proposers seem to be quite rational and forward-looking. Sometimes it is also argued that the behavior in these games is due to a social norm (see, e.g., Binmore 1998). In real life, so the argument goes, experimental subjects make the bulk of their decisions in repeated interactions. It is well known that, in repeated interactions, the rejection of unfair offers or the rewarding of generous offers can be sustained as an equilibrium. According to this argument, notions of fairness perform the function of selecting a particular equilibrium among the infinitely many equilibria that typically exist in long-term interactions. Subjects’ behavior is, therefore, adapted to repeated interactions, and they tend to erroneously apply behavioral rules that are appropriate in the context of repeated interactions to laboratory one-shot games. This argument essentially boils down to the claim that subjects cannot rationally distinguish between one-shot and repeated interactions. One problem with this argument – apart from claiming that subjects make systematic mistakes – is that it cannot explain the huge behavioral variations across one-shot games. Why, in Forsythe et al. (1994), do the Proposers give so much less in the DG than in the UG? Why do the Proposers in the control condition with exogenously fixed effort (Fehr et al., 1998) make such low wage offers? Why is there so much defection in the final round of PGGs, whereas in the presence of a punishment opportunity, a high level of cooperation can be achieved? Invoking some kind of social norm cannot explain this behavior unless one is willing to assume that different social norms apply to these different situations. A second problem with this argument is that there is compelling evidence that, in repeated interactions, experimental subjects do behave very differently compared with one-shot situations.
Gächter and Falk (1999) show that the Responders in GEGs provide much higher effort levels if they can stay together with the same Proposer.8 In fact, experimental subjects who participate in one-shot GEGs frequently complain after the experiment that the experimenter ruled out repeated interactions because that would have enabled them, so the subjects claim, to develop a much more trustful and efficient relation with their partner. All this indicates that experimental subjects are well aware of the difference between one-shot interactions and repeated interactions. These arguments suggest that an approach that combines bounded rationality with purely selfish preferences does not provide a satisfactory explanation of the facts observed in UGs, GEGs, TGs, and PGGs. In our view, there remain two plausible approaches to account for the facts. One approach is to maintain the assumption of rationality at least for the analysis of these simple games and to assume, in addition, that some players are not only motivated by pecuniary forces. The other approach is to combine models of learning with models that take into account nonselfish motives. In the following we focus on the first

8 Andreoni and Miller (1993) also report that, in prisoner’s dilemmas, increases in the probability of staying together or meeting the same partner again increase cooperation rates.


approach because there has been much progress in this area in recent years, whereas the second approach is still in its infancy.9

3. THEORIES OF FAIRNESS AND RECIPROCITY

This section surveys the most prominent recent attempts to explain the experimental evidence sketched in Section 2 within a rational choice framework. Two main approaches can be distinguished. The first approach assumes that at least some agents have “social preferences” (i.e., the utility function of these agents depends not only on their own material payoff, but also on how much the other players receive). Given these social preferences, all agents are assumed to behave perfectly rationally, and the well-known concepts of traditional utility and game theory can be applied to analyze optimal behavior and to characterize equilibrium outcomes in experimental games. The second approach focuses on “intention-based reciprocity.” This approach assumes that a player cares about the intentions of her opponent. If she feels treated kindly, she wants to return the favor and be nice to her opponent. If she feels treated badly, she wants to hurt her opponent. Thus, in this approach, it is crucial how a player interprets the behavior of the other players. This cannot be captured by traditional game theory but requires the framework of psychological game theory. The starting point of both of these approaches is to make rather specific assumptions on the utility functions of the players. Alternatively, one could start from a general preference relation and ask what kind of axioms are necessary and sufficient to generate utility functions with certain properties. Axiomatic approaches are discussed at the end of this section.

3.1. Social Preferences

Classical utility theory assumes that a decision maker has preferences over allocations of material outcomes (e.g., goods) and that these preferences satisfy some “rationality” or “consistency” requirements, such as completeness and transitivity. However, in almost all applications, this fairly general framework is interpreted much more narrowly by implicitly assuming that the decision maker cares about only one aspect of an allocation, namely the material resources that are allocated to her. Models of social preferences assume, in contrast, that the decision maker may also care about the material resources allocated to others. Somewhat more formally, let $\{1, 2, \ldots, N\}$ denote a set of individuals and $x = (x_1, x_2, \ldots, x_N)$ denote an allocation of physical resources out of some set $X$ of feasible allocations, where $x_i$ denotes the material resources allocated to person $i$. The self-interest hypothesis says that the utility of individual $i$ depends on $x_i$ only. We will say that individual $i$ has social preferences if for any given $x_i$ person $i$’s utility is affected by variations of $x_j$, $j \neq i$. Of course, simply assuming that the utility of individual $i$ may be any function of the total allocation is too general, because it does not yield any empirically testable restrictions on observed behavior. In the following, we will discuss several models of social preferences, each of which assumes that the preferences of an individual depend on $x_j$, $j \neq i$, in a different way.

9 Exceptions are the recent paper by Cooper and Stockman (1999), which combines reinforcement learning with a model of social preferences, and the paper by Costa-Gomes and Zauner (1999).

3.1.1. Altruism

A person is altruistic if the first partial derivatives of $u(x_1, \ldots, x_N)$ with respect to $x_1, \ldots, x_N$ are strictly positive (i.e., if her utility increases with the well-being of other people).10 The hypothesis that people are altruistic has a long tradition in economics and has been used to explain charitable donations and the voluntary provision of public goods (see, e.g., Becker, 1974). Clearly, the simplest game to elicit altruistic preferences is the DG. Andreoni and Miller (2000) conducted a series of DG experiments in which one agent could allocate “tokens” between herself and another agent for a series of different budgets. The tokens were exchanged into money at different rates for the two agents and the different budgets. Let $U_i(x_1, x_2)$ denote subject $i$’s utility function representing her preferences over monetary allocations $(x_1, x_2)$. In a first step, Andreoni and Miller check for violations of the Generalized Axiom of Revealed Preference and find that almost all subjects behaved consistently and passed this basic rationality check. Then they classify the subjects into three main groups. They find that about 30 percent of the subjects give tokens to the other party in a fashion that equalizes the monetary payoffs between players. The behavior of 20 percent of the subjects can be explained by a utility function in which $x_1$ and $x_2$ are perfect substitutes [i.e., these subjects seem to have maximized the (weighted) sum of the monetary payoffs]. However, almost 50 percent of the subjects behaved “selfishly” and did not give any significant amounts to the other party. Andreoni and Miller (2000, p. 23) conclude that altruistic behavior exists and that it is consistent with rationality, but also that individuals are heterogeneous. Charness and Rabin (2000) consider a specific form of altruism that they call quasi-maximin preferences.
They start from a “disinterested social welfare function,” which is a convex combination of Rawls’ maximin criterion and a utilitarian welfare function:

$$W(x_1, x_2, \ldots, x_N) = \delta \cdot \min\{x_1, \ldots, x_N\} + (1 - \delta) \cdot (x_1 + \cdots + x_N),$$

where $\delta \in (0, 1)$ is a parameter reflecting the weight that is put on the maximin criterion. The utility function of an individual is then given by a convex combination of his own monetary payoff and the above social welfare function:11

$$U_i(x_1, x_2, \ldots, x_N) = (1 - \gamma)\, x_i + \gamma \left[ \delta \cdot \min\{x_1, \ldots, x_N\} + (1 - \delta) \cdot (x_1 + \cdots + x_N) \right].$$

In the two-player case, this boils down to

$$U_i(x_1, x_2) = \begin{cases} x_i + \gamma (1 - \delta)\, x_j & \text{if } x_i < x_j \\ (1 - \gamma \delta)\, x_i + \gamma\, x_j & \text{if } x_i \geq x_j. \end{cases}$$

10 The Encyclopaedia Britannica (1998, 15th edition) defines an altruistic agent as someone who feels the obligation “to further the pleasures and alleviate the pains of other people.” Note that our definition of altruism differs somewhat from the definition used in moral philosophy, where “altruism” requires a moral agent to be concerned only about the welfare of others and not about his own happiness.
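The reduction of the general quasi-maximin utility to the two-player piecewise form can be verified numerically. The following Python sketch does so for a few allocations; the parameter values γ = δ = 0.5 and the function names are illustrative assumptions, not estimates from Charness and Rabin.

```python
def social_welfare(x, delta):
    """W(x) = delta * min(x) + (1 - delta) * sum(x)."""
    return delta * min(x) + (1 - delta) * sum(x)


def quasi_maximin_utility(i, x, gamma, delta):
    """U_i(x) = (1 - gamma) * x_i + gamma * W(x)."""
    return (1 - gamma) * x[i] + gamma * social_welfare(x, delta)


def two_player_form(xi, xj, gamma, delta):
    """Piecewise two-player expression for U_i."""
    if xi < xj:
        return xi + gamma * (1 - delta) * xj
    return (1 - gamma * delta) * xi + gamma * xj


if __name__ == "__main__":
    gamma, delta = 0.5, 0.5  # assumed illustrative weights
    for x in [(2, 8), (8, 2), (5, 5)]:
        general = quasi_maximin_utility(0, x, gamma, delta)
        reduced = two_player_form(x[0], x[1], gamma, delta)
        print(x, general, reduced)
```

For each allocation the two expressions agree, and the weight on the other player's payoff is lower when player i is behind than when she is ahead.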

Note that the marginal rate of substitution between $x_i$ and $x_j$ is smaller if $x_i < x_j$. Hence, the decision maker cares about the well-being of the other person, but less so if the other person is better off than she is. Altruism in general, and quasi-maximin preferences in particular, can explain positive acts toward other players, such as giving in DGs, voluntary contributions in PGGs, and the kind behavior of Responders in TGs and GEGs;12 but it is clearly inconsistent with the fact that, in some experiments, subjects try to retaliate and hurt other subjects, even if this is costly for them (as in the UG or a PGG with punishments). This is why Charness and Rabin augment quasi-maximin preferences by incorporating reciprocity (see Section 3.2.3).

3.1.2. Relative Income and Envy

An alternative hypothesis is that subjects are concerned not only about the absolute amount of money they receive, but also about their relative standing compared with others. This “relative income hypothesis” has a long tradition in economics and goes back at least to Veblen (1922). Bolton (1991) formalized this idea in the context of an experimental bargaining game between two players and assumed that $U_i(x_i, x_j) = u_i(x_i, x_i/x_j)$, where $u_i(\cdot, \cdot)$ is strictly increasing in its first argument and where the partial derivative with respect to $x_i/x_j$ is strictly positive for $x_i < x_j$ and equal to 0 for $x_i \geq x_j$. Thus, agent $i$ suffers if she gets less than player $j$, but she does not care about player $j$ if she is better off herself. Note that this utility function implies that $\partial U_i / \partial x_j \leq 0$, just the opposite of altruism. Hence, whereas this utility function is consistent with the behavior in the bargaining games considered by Bolton, it fails to explain giving in DGs, GEGs, and TGs or voluntary contributions in PGGs. The same problem arises in the envy approach of Kirchsteiger (1994).

11 Note that Charness and Rabin do not normalize payoffs with respect to $N$. Thus, if the group size changes, and the parameters $\delta$ and $\gamma$ are assumed to be constant, the importance of the maximin term in relation to the player’s own material payoff changes.

12 However, even in these games, altruism has some implausible implications. For example, in a public good context, altruism implies that if the government provides part of the public good (financed by taxes), then every dollar provided by the government “crowds out” one dollar of private, voluntary contributions. This “neutrality property” holds quite generally (Bernheim, 1986). However, it is in contrast to the empirical evidence reporting that the actual crowding out is rather small. This has led some researchers to include the pleasure of giving (a “warm glow effect”) in the utility function (Andreoni, 1989).

3.1.3. Inequity Aversion

The preceding approaches assumed that utility is either monotonically increasing or monotonically decreasing in the well-being of other players. Fehr and Schmidt (1999) assume that a player is altruistic toward other players if their material payoffs are below an equitable benchmark, but she feels envy when the material payoffs of the other players exceed this level.13 In most experiments, it is natural to assume that an equitable allocation is an equal monetary payoff for all players. Fehr and Schmidt consider the simplest utility function capturing this idea:

$$U_i(x_1, \ldots, x_N) = x_i - \frac{\alpha_i}{N - 1} \sum_{j \neq i} \max\{x_j - x_i, 0\} - \frac{\beta_i}{N - 1} \sum_{j \neq i} \max\{x_i - x_j, 0\},$$

with $\beta_i \leq \alpha_i$ and $\beta_i \leq 1$. Note that $\partial U_i / \partial x_j \geq 0$ if and only if $x_i \geq x_j$. Note also that the disutility from inequality is larger if another person is better off than player $i$ than if another person is worse off ($\alpha_i \geq \beta_i$). This utility function can rationalize positive and negative actions toward other players. It is consistent with giving in DGs, GEGs, and TGs, and with the rejection of low offers in UGs. It can also explain voluntary contributions in PGGs and the costly punishment of free riders. A second important ingredient of this model is the assumption that individuals are heterogeneous. If all people were alike, it would be difficult to explain why we observe that people sometimes resist “unfair” outcomes or manage to cooperate even though it is a dominant strategy for a selfish person not to do so, whereas in other environments fairness concerns or the desire to cooperate do not seem to have much of an effect. Fehr and Schmidt show that the interaction of the distribution of types with the strategic environment explains why, in some situations, very unequal outcomes are obtained, whereas in other situations very egalitarian outcomes prevail. For example, in certain competitive environments (see, e.g., the UG with Proposer competition in Section 5.1), even a population that consists of only very fair types (high $\alpha$s and $\beta$s) cannot prevent very uneven outcomes. The reason is that none of the inequity-averse players can enforce a more equitable outcome through her own actions. In contrast, in a PGG with punishment, a small fraction of inequity-averse players is sufficient to threaten credibly that free riders will be punished, which induces selfish players to contribute to the public good.

13 Daughety (1994) and Fehr et al. (1998) also assume that a player values the payoff of reference agents positively, if she is relatively better off, whereas she values the others’ payoff negatively, if she is relatively worse off.
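The following Python sketch evaluates the inequity-aversion utility function for a Responder in a UG over a pie of 10. The parameter values (α = 1, β = 0.25) are assumptions chosen for illustration, not the calibrated distribution of types, and the function name is hypothetical.

```python
def inequity_averse_utility(i, x, alpha, beta):
    """x_i minus average disadvantageous and advantageous inequality terms."""
    n = len(x)
    others = [xj for j, xj in enumerate(x) if j != i]
    behind = sum(max(xj - x[i], 0) for xj in others)  # others earn more than i
    ahead = sum(max(x[i] - xj, 0) for xj in others)   # i earns more than others
    return x[i] - alpha * behind / (n - 1) - beta * ahead / (n - 1)


def responder_accepts(offer, pie, alpha, beta):
    """Accept iff utility from (pie - offer, offer) is at least that from (0, 0)."""
    accept = inequity_averse_utility(1, (pie - offer, offer), alpha, beta)
    reject = inequity_averse_utility(1, (0, 0), alpha, beta)
    return accept >= reject


if __name__ == "__main__":
    for offer in (1, 2, 3, 4, 5):
        print(offer, responder_accepts(offer, 10, alpha=1.0, beta=0.25))
```

Under these assumed parameters the Responder turns down an 8:2 split, because the disutility from lagging behind outweighs the 2 units of money, but accepts a 6:4 split. A purely selfish Responder (α = β = 0) accepts every offer.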


Using data that are available from many experiments on the UG, Fehr and Schmidt calibrate the distribution of $\alpha$ and $\beta$ in the population. Keeping this distribution constant, they show that their model yields quantitatively accurate predictions across many bargaining, market, and cooperation games.14 Neilson (2000) provides an axiomatic characterization of the Fehr and Schmidt (1999) model of inequity aversion. He introduces the axiom of “self-referent separability,” which requires that if the payoff differences between player $i$ and any subset of all other players remain constant, then the preferences of player $i$ should not be affected by the magnitude of these differences. Neilson shows that this axiom is equivalent to having a utility function that is additively separable in the individual’s own material payoff and the payoff differences to his opponents, which is an essential feature of the Fehr–Schmidt model. Neilson also offers a full axiomatic characterization of the more specific functional form used by Fehr and Schmidt. Bolton and Ockenfels (2000) independently developed a similar model of inequity aversion. They also show that their model can explain a wide variety of seemingly puzzling evidence (e.g., giving in DGs and GEGs and rejections in UGs). In their model, the utility function is given by $U_i = U_i(x_i, \sigma_i)$, where

$$\sigma_i = \begin{cases} \dfrac{x_i}{\sum_{j=1}^{N} x_j} & \text{if } \sum_{j=1}^{N} x_j \neq 0 \\[1.5ex] \dfrac{1}{N} & \text{if } \sum_{j=1}^{N} x_j = 0. \end{cases}$$
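The relative share $\sigma_i$ can be computed directly from its definition, as in the Python sketch below. Only the share is computed; the function name is hypothetical, and no functional form for $U_i$ is assumed.

```python
def relative_share(i, x):
    """sigma_i = x_i / sum(x) if the total is nonzero, else 1/N."""
    total = sum(x)
    if total == 0:
        return 1 / len(x)
    return x[i] / total


if __name__ == "__main__":
    print(relative_share(0, [3, 1]))        # player 0 holds three quarters
    print(relative_share(0, [5, -5]))       # total is zero, so the share is 1/2
    print(relative_share(1, [0, 0, 0, 0]))  # total is zero, so the share is 1/4
```

The second call shows that whenever payoffs sum to zero, every player's share equals 1/N regardless of how the payoffs are distributed.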

For any given $\sigma_i$, the utility function is assumed to be weakly increasing and concave in player $i$’s own material payoff $x_i$. Furthermore, for any given $x_i$, the utility function is strictly concave in player $i$’s share of total income, $\sigma_i$, and obtains a maximum at $\sigma_i = 1/N$.15 Bolton and Ockenfels do not pin down a

14 One drawback of the piecewise linear utility function used by Fehr and Schmidt is that it implies corner solutions for some games where interior solutions are frequently observed. For example, in the DG, a decision maker with a Fehr–Schmidt utility function would either give nothing (if her $\beta < 0.5$) or share the pie equally (if $\beta > 0.5$). Giving away a fraction that is strictly in between 0 and 0.5 is optimal only in the nongeneric case, where $\beta = 0.5$. However, this problem can be avoided by assuming nonlinear inequity aversion.

15 This specification of the utility function has the disadvantage that it is not independent of a shift in payoffs. Consider, for example, a DG in which the dictator has to divide $X$ dollars. Note that this is a constant sum game, because $x_1 + x_2 \equiv X$. If we reduce the sum of payoffs by $X$ (i.e., if the dictator can take away money from her opponent or give to him out of her own pocket), then $x_1 + x_2 = 0$ for any decision of the dictator and thus we always have $\sigma_1 = \sigma_2 = 1/2$. Therefore, the theory makes the implausible prediction that, in contrast to the game where $x_1 + x_2 = X > 0$, all dictators should take as much money from their opponent as possible. A related problem has been noted by Camerer (1999, p. 61). Suppose that the UG is modified as follows: If the Responder rejects a proposal, the Proposer receives a small amount $\varepsilon > 0$ while the Responder receives zero. In this game, the rejection of a positive offer implies $\sigma = 0$, whereas acceptance implies $\sigma > 0$. Thus, the Responder never rejects any positive offer, no matter how small $\varepsilon > 0$.


specific functional form, so their utility function is more flexible. However, this also makes it more difficult to get closed-form solutions and quantitative predictions for the outcomes of many experiments. It also imposes less discipline on the researcher not to adjust the utility function to a specific set of data. For two-player games, Fehr and Schmidt and Bolton and Ockenfels often yield qualitatively similar results. With more than two players, there are some interesting differences. In this case, Fehr and Schmidt assume that a player compares herself with each of her opponents separately. This implies that her behavior toward an opponent depends on the income difference toward this person. In contrast, Bolton and Ockenfels assume that the decision maker is not concerned about each individual opponent, but only about the average income of all players. Thus, whether $\partial U_i / \partial x_j$ is positive or negative in the Bolton–Ockenfels model does not depend on $j$’s relative position toward $i$, but rather on how well $i$ does, compared with the average. If $x_i$ is below the average, then $i$ would like to reduce $j$’s income even if $j$ has a much lower income than $i$ herself. On the other hand, if $i$ is doing better than the average, then she is prepared to give to $j$ even if $j$ is much better off than $i$.16

3.1.4. Altruism and Spitefulness

Levine (1998) offers a different solution to explain giving in some games and punishing in others. Consider the utility function

$$U_i = x_i + \sum_{j \neq i} x_j \, \frac{a_i + \lambda a_j}{1 + \lambda},$$

where $0 \leq \lambda \leq 1$ and $-1 < a_i < 1$ for all $i \in \{1, \ldots, N\}$. Suppose first that $\lambda = 0$. In this case, the utility function reduces to $U_i = x_i + a_i \sum_{j \neq i} x_j$. If $a_i > 0$, then person $i$ is an altruist who wants to promote the well-being of other people; if $a_i < 0$, then player $i$ is spiteful. Although this utility function would be able to explain why some people contribute in PGGs and why some (other) people reject positive offers in the UG, it cannot explain why the same person who is altruistic in one setting is spiteful in another. To deal with this problem, suppose that $\lambda > 0$. In this case, an altruistic player $i$ (with $a_i > 0$) feels more altruistic toward another altruist than toward a spiteful person. In fact, if $-\lambda a_j > a_i$, player $i$ may behave spitefully herself. In most experiments, where there is anonymous interaction, the players do not know the parameter $a_j$ of their opponents and have to form beliefs about them. Thus, any sequential game becomes a signaling game in which beliefs about the other players’ types are crucially important to determine optimal strategies. This may give rise to a multiplicity of signaling equilibria. Levine uses the data from the UG to calibrate the distribution of $a$ and to estimate $\lambda$ (which is assumed to be the same for all players). He shows that, with these parameters, the model can reasonably fit the data on centipede games, market games, and PGGs. However, because $a_i < 1$, the model cannot explain positive giving in the dictator game.

16 See Camerer (1999) and Section 4.1 for a more extensive comparison of these two approaches.

3.2. Models of Intention-Based Reciprocity

Models of social preferences share a common weakness. They assume that players are concerned only about the distributional consequences of their acts but not about the intentions that lead their opponents to choose these acts. To see that this may be a problem, consider the following two “mini-UGs” in which the strategy set of the Proposer is restricted. In the first condition, the Proposer can choose between a 50:50 and an 80:20 split. In the second condition, the Proposer must choose between an 80:20 and a 20:80 division of the pie. All theories that look only at the distributional consequences must predict that, if a Responder rejects the 80:20 split in the first condition, then she must also reject this offer in the second condition. However, in the second condition, a fair division of the pie was not feasible, and so the Responder may be more inclined to accept this offer, compared with the first treatment in which the Proposer could have split the pie evenly, but chose not to do so. In fact, Falk, Fehr, and Fischbacher (2000a) report that the 80:20 split is rejected significantly less often under the second condition.¹⁷ This is inconsistent with any theory of social preferences that relies only on preferences over income distributions.

3.2.1. Fairness Equilibrium

In a pioneering article, Rabin (1993) starts from the observation that our behavior is often a reaction to the (expected) intentions of other people. If we feel that another person has been kind to us, we often have a desire to be kind as well. If we feel that somebody wanted to hurt us, we often have the desire to retaliate, even if this is personally costly. To model intentions explicitly, Rabin departs from traditional game theory and adopts the concept of “psychological game theory” that had been introduced by Geanakoplos, Pearce, and Stacchetti (1989). In psychological game theory, utilities depend not only on terminal-node payoffs, but also on players’ beliefs. Rabin restricts attention to two-player, normal-form games. Let $A_1$ and $A_2$ denote the (mixed) strategy sets for players 1 and 2, respectively, and let $x_i : A_1 \times A_2 \to \mathbb{R}$ be player i’s material payoff function.

¹⁷ This criticism does not necessarily apply to Levine (1998). In his model, offering 80:20 may be interpreted as a signal that the Proposer is spiteful if the 50:50 split was available, and may be differently interpreted if the 50:50 split was not available. However, if a player knows the type of her opponent, her behavior is independent of what the opponent does to her and of why he does it to her.
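The point of footnote 17 can be made concrete with a minimal Python sketch. It implements Levine's utility function from Section 3.1.4 and a Responder's accept/reject decision in a mini-UG when the Proposer's type is known. The function names and all numbers (λ = 0.5, the values of a, a pie of 100) are illustrative assumptions, not Levine's calibrated estimates.

```python
def levine_weight(a_i, a_j, lam):
    """Weight that player i puts on player j's payoff: (a_i + lam * a_j) / (1 + lam)."""
    return (a_i + lam * a_j) / (1 + lam)


def levine_utility(i, payoffs, a, lam):
    """U_i = x_i + sum over j != i of x_j * (a_i + lam * a_j) / (1 + lam)."""
    return payoffs[i] + sum(
        levine_weight(a[i], a[j], lam) * x_j
        for j, x_j in enumerate(payoffs)
        if j != i
    )


def responder_accepts(offer, a_resp, a_prop, lam):
    """offer = (Proposer's payoff, Responder's payoff); rejection gives both zero.

    Hypothetical helper: the Responder (index 1) knows the Proposer's type a_prop.
    """
    u_accept = levine_utility(1, offer, (a_prop, a_resp), lam)
    return u_accept >= 0.0


lam = 0.5  # illustrative value only

# An altruist (a_i = 0.2) facing a very spiteful opponent (a_j = -0.9):
# -lam * a_j = 0.45 > 0.2, so her weight on his payoff is negative.
print(levine_weight(0.2, -0.9, lam) < 0)             # True

# Known types: the decision on an 80:20 offer depends only on payoffs and types.
print(responder_accepts((80, 20), 0.2, -0.9, lam))   # True  (mildly spiteful weight)
print(responder_accepts((80, 20), -0.3, -0.9, lam))  # False (weight of -0.5)
```

Because `responder_accepts` takes no argument describing the split the Proposer passed up, the decision is the same in both mini-UG conditions once types are known. Any difference across conditions has to come from what the offer signals about an unknown type.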


We now have to define (hierarchies of) beliefs over strategies. Let $a_i \in A_i$ denote a strategy of player i. When i chooses her strategy, she must have some belief about the strategy to be chosen by player j. In all of the following, $i \in \{1, 2\}$ and $j = 3 - i$. Let $b_j$ denote player i’s belief about what player j is going to do. Furthermore, to rationalize her expectation $b_j$, player i must have some belief about what player j believes that player i is going to do. This belief about beliefs is denoted by $c_i$. The hierarchy of beliefs could be continued ad infinitum, but the first two levels of beliefs are sufficient to define reciprocal preferences.

Rabin starts with a “kindness function,” $f_i(a_i, b_j)$, which measures how kind player i is to player j. If player i believes that her opponent chooses strategy $b_j$, then she effectively chooses her opponent’s payoff out of the set $[x_j^l(b_j), x_j^h(b_j)]$, where $x_j^l(b_j)$ ($x_j^h(b_j)$) is the lowest (highest) payoff of player j that can be induced by player i if j chooses $b_j$. According to Rabin, a “fair” or “equitable” payoff for player j, $x_j^f(b_j)$, is just the average of the lowest and highest payoffs (excluding Pareto-dominated payoffs, however). Note that this “fair” payoff is independent of the payoff of player i. The kindness of player i toward player j is measured by the difference between the actual payoff she gives to player j and the “fair” payoff, relative to the whole range of feasible payoffs:¹⁸

$$f_i(a_i, b_j) \equiv \frac{x_j(b_j, a_i) - x_j^f(b_j)}{x_j^h(b_j) - x_j^l(b_j)},$$

with $j = 3 - i$ and $f_i(a_i, b_j) = 0$ if $x_j^h(b_j) - x_j^l(b_j) = 0$. Note that $f_i(a_i, b_j) > 0$ if and only if player i gives player j more than the “fair” payoff.

¹⁸ A disturbing feature of Rabin’s formulation is that he excludes Pareto-dominated payoffs in the definition of the “fair” payoff, but not in the denominator of the kindness term. Thus, adding a Pareto-dominated strategy for player j would not affect the fair payoff, but it would reduce the kindness term.

Finally, we have to define player i’s belief about how kindly she is being treated by player j. This is defined in exactly the same manner, but beliefs have to move up one level. Thus, if player i believes that player j chooses $b_j$ and if she believes that player j believes that i chooses $c_i$, then player i perceives player j’s kindness as given by

$$f_j(b_j, c_i) \equiv \frac{x_i(c_i, b_j) - x_i^f(c_i)}{x_i^h(c_i) - x_i^l(c_i)},$$

with $j = 3 - i$ and $f_j(b_j, c_i) = 0$ if $x_i^h(c_i) - x_i^l(c_i) = 0$. These kindness functions can now be used to define a player’s utility function:

$$U_i(a, b_j, c_i) = x_i(a, b_j) + f_j(b_j, c_i)\,[1 + f_i(a_i, b_j)],$$

where $a = (a_1, a_2)$. Note that if player j is perceived to be unkind ($f_j(\cdot) < 0$), player i wants to be as unkind as possible, too. On the other hand, if $f_j(\cdot)$ is positive, player i gets some additional utility from being kind to player j as well. Note also that the kindness terms have no dimension and that they must lie in the interval $[-1, 0.5]$. Thus, the utility function is sensitive to positive affine transformations. Furthermore, the kindness term becomes less and less important the higher the material payoffs are.

A “fairness equilibrium” is an equilibrium in a psychological game with these payoff functions [i.e., a pair of strategies $(a_1, a_2)$ that are mutually best responses to each other and a set of rational expectations $b = (b_1, b_2)$ and $c = (c_1, c_2)$ that are consistent with equilibrium play].

Rabin’s theory is important because it was the first contribution that made the notion of reciprocity precise and explored the consequences of reciprocal behavior. The model provides several interesting insights, but it is not well suited for predictive purposes. It is consistent with rejections in the UG, but there exist many other unreasonable equilibria, including equilibria in which the Responder receives more than 50 percent of the pie. The multiplicity of equilibria is a general feature of Rabin’s model. If material payoffs are sufficiently small so that psychological payoffs matter, then there are always multiple equilibria. In particular, there is one equilibrium in which both players are nice to each other and one in which they are nasty. Both equilibria are supported by self-fulfilling prophecies, so it is difficult to predict which equilibrium is going to be played. The theory also predicts that players do not undertake kind actions unless others have shown their kind intentions. Suppose, for example, that in the prisoner’s dilemma, player 2 has no choice but is forced to cooperate. If player 1 knows this, then, according to Rabin’s theory, she will interpret player 2’s cooperation as “neutral” ($f_2(\cdot) = 0$). Thus, she will look at only her material payoffs and will defect.
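The forced-cooperation example can be checked numerically with a small Python sketch of Rabin's kindness function and utility. It is restricted to pure strategies and takes player 1's beliefs as given instead of solving for a fairness equilibrium. The payoff numbers and function names are hypothetical, and the Pareto filter is only one reading of the "excluding Pareto-dominated payoffs" clause.

```python
def pareto_efficient(points):
    """Drop (giver, receiver) payoff pairs that are Pareto-dominated by another pair."""
    return [p for p in points
            if not any(q != p and q[0] >= p[0] and q[1] >= p[1] for q in points)]


def kindness(actions, payoff_pair, chosen):
    """Kindness of the giver's action `chosen`.

    payoff_pair(a) returns (giver's payoff, receiver's payoff) for giver action a,
    holding fixed what the giver is believed to expect from the receiver.
    """
    points = [payoff_pair(a) for a in actions]
    high = max(p[1] for p in points)   # x^h: highest payoff the giver can induce
    low = min(p[1] for p in points)    # x^l: lowest payoff the giver can induce
    if high == low:                    # the giver cannot affect the receiver's payoff
        return 0.0
    efficient = pareto_efficient(points)
    fair = (max(p[1] for p in efficient) + min(p[1] for p in efficient)) / 2
    return (payoff_pair(chosen)[1] - fair) / (high - low)


def utility_player1(a1, b2, c1, A1, A2, x):
    """U_1 = x_1 + f_2(b2, c1) * (1 + f_1(a1, b2)); x maps (a1, a2) to (x1, x2)."""
    f1 = kindness(A1, lambda a: (x[(a, b2)][0], x[(a, b2)][1]), a1)
    f2 = kindness(A2, lambda b: (x[(c1, b)][1], x[(c1, b)][0]), b2)
    return x[(a1, b2)][0] + f2 * (1 + f1)


# Hypothetical prisoner's dilemma with small stakes, so that kindness terms matter.
PD = {("C", "C"): (0.4, 0.4), ("C", "D"): (0.0, 0.6),
      ("D", "C"): (0.6, 0.0), ("D", "D"): (0.1, 0.1)}


def best_reply_player1(A2):
    """Player 1's best reply when she expects cooperation (b2 = c1 = 'C')."""
    A1 = ["C", "D"]
    return max(A1, key=lambda a1: utility_player1(a1, "C", "C", A1, A2, PD))


free_choice = best_reply_player1(["C", "D"])   # player 2 could have defected
forced = best_reply_player1(["C"])             # player 2 has no choice
print(free_choice, forced)                     # C D
```

With these stakes, player 1 cooperates when player 2's cooperation is a choice (perceived kindness 0.5) and defects when it is forced (perceived kindness 0). Multiplying all material payoffs by 10 makes her defect in both cases, which illustrates the sensitivity to the scale of payoffs noted above.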
This prediction of defection contrasts with models of inequity aversion, in which player 1 would cooperate irrespective of the reason for player 2’s cooperation. We will discuss the experimental evidence that can be used to discriminate between the different approaches in Section 4.

3.2.2. Intentions in Sequential Games

Rabin’s theory has been defined only for two-person, normal-form games. If the theory is applied to the normal form of simple sequential games, some very implausible equilibria may arise. For example, in the sequential prisoner’s dilemma, unconditional cooperation of the second player is part of a “fairness” equilibrium. The reason is that Rabin’s equilibrium notion does not force player 2 to behave optimally off the equilibrium path. In a subsequent paper, Dufwenberg and Kirchsteiger (1998) generalized Rabin’s theory to N-person extensive form games for which they introduce the notion of a “Sequential Reciprocity Equilibrium” (SRE). The main innovation is to keep track of beliefs about intentions as the game evolves. In particular, it has to be specified how beliefs about intentions are formed off the equilibrium path. Given this system of beliefs, strategies have to form a fairness equilibrium in every proper subgame.¹⁹ Applying their model to several examples, Dufwenberg and Kirchsteiger show that conditional cooperation in the prisoner’s dilemma is an SRE. They also show that, in the UG, it can be an SRE for the Proposer to make an offer that the Responder rejects with certainty. This is an equilibrium because both players believe that the other party wants to hurt them. However, even in these extremely simple sequential games, the equilibrium analysis is fairly complex, and there are typically many equilibria with different equilibrium outcomes due to different self-fulfilling beliefs about intentions.

3.2.3. Merging Intentions and Social Preferences

Falk and Fischbacher (1999) also generalize Rabin (1993). They consider N-person extensive form games and allow for the possibility of incomplete information. Furthermore, they measure “kindness” in terms of inequity aversion. A strategy of player j is perceived to be kind by player i if it gives rise to a payoff for player i that is higher than the payoff of player j. Note that this is fundamentally different from Rabin and Dufwenberg and Kirchsteiger, who define “kindness” in relation to the feasible payoffs of player i and not in relation to the payoff that player j gets. Furthermore, Falk and Fischbacher distinguish whether an unequal distribution could have been altered by player j or whether player j was a “dummy player” who is unable to affect the distribution by his actions. In the former case, the kindness term gets a higher weight than in the latter. However, even if player j is a dummy player who has no choice to make, the kindness term (which now reflects pure inequity aversion) gets a positive weight. Thus, Falk and Fischbacher merge intention-based reciprocity and inequity aversion.

Their model is quite complex. At every node where player i has to move, she has to evaluate the kindness of player j that depends on the expected payoff difference between the two players and on what player j could have done about this difference. This “kindness term” is multiplied by a “reciprocation term,” which is positive if player i is kind to player j and negative if i is unkind. The product is further multiplied by an individual reciprocity parameter that measures the weight of player i’s desire to reciprocate, compared with her desire to get a higher material payoff. These preferences, together with the underlying game form, define a psychological game à la Geanakoplos et al. (1989). A subgame perfect psychological Nash equilibrium of this game is called a “reciprocity equilibrium.” Falk and Fischbacher show that there are parameter constellations for which their model is consistent with the stylized facts of the UG, the GEG, the DG, the PGG, and the prisoner’s dilemma game. Furthermore, there are parameter constellations that can explain the difference in outcomes if one player moves intentionally and if she is a dummy player. Because their model contains variants of a pure intention-based reciprocity model (e.g., Rabin) and a pure inequity aversion model (e.g., Fehr and Schmidt or Bolton and Ockenfels) as special cases, it is possible to get a better fit of the data, but at a significant cost in terms of the complexity of the model.

¹⁹ Dufwenberg and Kirchsteiger also suggest several other deviations from Rabin’s model. In particular, they measure kindness “in proportion to the size of the gift” (i.e., in monetary units). This has the advantage that reciprocity does not disappear as the stakes become larger, but it also implies that the kindness term in the utility function has the dimension of “money squared,” which again makes the utility function sensitive to linear transformations. Furthermore, they define “inefficient strategies” (which play an important role in the definition of the kindness term) as strategies that yield a weakly lower payoff for all players than some other strategy for all subgames. Rabin (1993) defines inefficient strategies as those that yield weakly less on the equilibrium path. However, with more than two players in Dufwenberg and Kirchsteiger (1998), the problem arises that an additional dummy player may render an inefficient strategy efficient and might thus affect the size of the kindness term.

Another attempt to combine social preferences with intention-based reciprocity is due to Charness and Rabin (2000). We described their model of quasi-maximin preferences in Section 3.1.1. In a second step, they augment these preferences by introducing a demerit profile $\rho \equiv (\rho_1, \ldots, \rho_N)$, where $\rho_i \in [0, 1]$ is a measure of how much player i deserves from the point of view of all other players. The smaller the $\rho_i$, the more does player i count in the utility function of the other players. Given a demerit profile $\rho$, player i’s utility function is given by

$$U_i(x_1, x_2, \ldots, x_N \mid \rho) = (1 - \gamma)\, x_i + \gamma \Big[ \delta \cdot \min\Big\{ x_i, \min_{j \neq i} \{ x_j + d \rho_j \} \Big\} + (1 - \delta) \cdot \Big( x_i + \sum_{j \neq i} \max\{1 - k \rho_j, 0\} \cdot x_j \Big) - f \sum_{j \neq i} \rho_j x_j \Big],$$

where $d, k, f \ge 0$ are three new parameters of the model. If $d = k = f = 0$, this boils down to the quasi-maximin preferences described previously. If d and k are large, then player i does not want to promote the well-being of player j. If f is large, player i may actually want to hurt player j.

The crucial step is to endogenize the demerit profile $\rho$. Charness and Rabin do this by comparing player j’s strategy to a unanimously agreed-upon, exogenously given “selfless standard” of behavior. The more player j falls short of this standard, the higher is his demerit factor $\rho_j$. A “reciprocal fairness equilibrium” (RFE) is a strategy profile and a demerit profile such that each player is maximizing his utility function given other players’ strategies and given the demerit profile that is itself consistent with the profile of strategies. This definition implicitly corresponds to a Nash equilibrium of a psychological game as defined by Geanakoplos et al. (1989).

The notion of RFE has several drawbacks that make it almost impossible to use it for the analysis of even the simplest experimental games. First of all, the model is incomplete because preferences are defined only in equilibrium (i.e., for an equilibrium demerit profile $\rho$), and it is unclear how to evaluate outcomes out of equilibrium or if there are multiple equilibria. Second, it requires that all players have the same utility functions and agree on a “quasi-maximin” social welfare function to determine the demerit profile $\rho$. Finally, the model is so complicated and involves so many free parameters that it would be very difficult to test it empirically. Charness and Rabin show that if the “selfless standard” is sufficiently small, then every RFE corresponds to a Nash equilibrium of the game in which players simply maximize their quasi-maximin utility functions. Therefore, in the analysis of the experimental evidence, they restrict attention to the much simpler model of quasi-maximin preferences that we discussed in Section 3.1.1.

3.3. Axiomatic Approaches

The models considered so far assume very specific utility functions that are defined on (lotteries over) material payoff vectors and/or on beliefs about other players’ strategies and other players’ beliefs. These utility functions are based on psychological plausibility, yet most of them lack an axiomatic foundation. Segal and Sobel (1999) take the opposite approach and ask what kind of axioms generate preferences that can reflect fairness and reciprocity. Their starting point is to assume that players have preferences over strategy profiles rather than over material allocations. Consider a given two-player game and let $\Sigma_i$, $i \in \{1, 2\}$, denote the space of (mixed) strategies of player i. For any strategy profile $(\sigma_1, \sigma_2) \in \Sigma_1 \times \Sigma_2$, let $v_i(\sigma_1, \sigma_2)$ denote player i’s material payoff function, assuming that these “selfish preferences” satisfy the von Neumann–Morgenstern axioms. However, the actual preferences of player i are given by a preference relation $\succeq_{i,\sigma_j}$ over her own strategies. Note that this preference relation depends on the strategy chosen by player j. Segal and Sobel show that if the preference relation $\succeq_{i,\sigma_j}$ satisfies the independence axiom and if, for a given $\sigma_j$, player i prefers to get a higher material payoff for herself if the payoff of player j is held constant (self-interest), then the preferences $\succeq_{i,\sigma_j}$ over $\Sigma_i$ can be represented by a utility function of the form²⁰

$$u_i(\sigma_i, \sigma_j) = v_i(\sigma_i, \sigma_j) + a_{i,\sigma_j}\, v_j(\sigma_i, \sigma_j).$$

In standard game theory, $a_{i,\sigma_j} \equiv 0$. Positive values of this coefficient mean that player i has altruistic preferences; negative values of $a_{i,\sigma_j}$ mean that she is spiteful. Note that the coefficient $a_{i,\sigma_j}$ depends on $\sigma_j$. Therefore, whether a player is altruistic or spiteful may depend on the strategy chosen by her opponent, so there is scope to model reciprocity.
To do so, Segal and Sobel introduce an additional axiom, called “reciprocal altruism.” This axiom requires that, when player j chooses a strategy $\sigma_j$ that player i likes better than some other strategy $\sigma_j'$, then player i prefers strategies that give a higher payoff to player j. Segal and Sobel show that this axiom implies that the coefficient $a_{i,\sigma_j}$ varies with $\sigma_j$ such that (other things being equal) the coefficient increases if and only if player j chooses a “nicer” strategy.

The models of social preferences that we discussed at the beginning of this chapter (in particular the models of altruism, relative income, inequity aversion, quasi-maximin preferences, and altruism and spitefulness) can all be seen as special cases of a Segal–Sobel utility function. Segal and Sobel can also capture some, but not all, aspects of intention-based reciprocity. For example, in Rabin’s (1993) model, a player’s utility depends not only on the strategy chosen by her opponent, but also on why he has chosen this strategy. This can be illustrated in the “Battle of the Sexes” game. Player 1 may go to boxing because she expects player 2 to go to boxing, too (which is kind of player 2, given that he believes player 1 to go to boxing). Yet, she may also go to boxing because she expects player 2 to go to ballet (which is unkind of player 2 if he believes player 1 to go to boxing), a choice that player 1 punishes by going to boxing. This effect cannot be captured by Segal and Sobel, because in their framework preferences are defined on strategies only.

²⁰ The construction resembles that of Harsanyi’s (1955) “utilitarian” social welfare function $\sum_i \alpha_i u_i$. Note, however, that Harsanyi’s axiom of Pareto efficiency is stronger than the axiom of self-interest used here. Therefore, the $a_{i,\sigma_j}$ in Segal and Sobel may be negative.

4. DISCRIMINATING BETWEEN THEORIES OF FAIRNESS

Most theories discussed in Section 3 have been developed during the last few years, and the evidence to discriminate between these theories is still limited. As we will show, however, the available data do exhibit some clear qualitative regularities that give a first indication of the advantages and disadvantages of the different theories.²¹

4.1. Who Are the Relevant Reference Actors?

All theories of fairness and reciprocity are based on the idea that actors compare themselves with a set of reference actors. To whom do people compare themselves? In bilateral interactions, there is no ambiguity about who the relevant reference actor is. In multiperson interactions, however, the answer is less clear. Most of the theories that are applicable in the N-person context assume that players make comparisons with all other N − 1 players in the game. The only exception is the theory of Bolton and Ockenfels (BO). They assume that players compare themselves only with the “average” player in the game and do not care about inequities between the other players. In this regard, the BO approach is inspired by the data of Selten and Ockenfels (1998) and Güth and van Damme (1998), which seem to suggest that actors do not care for inequities among the other reference agents. It would greatly simplify matters if this aspect of the BO theory were correct.

²¹ This section rests to a large extent on joint work of one of the authors with Armin Falk and Urs Fischbacher (Falk, Fehr, and Fischbacher, 2000a, 2000b, henceforth FFF). In particular, the organization of this section according to the questions herein and many of the empirical results emerged from this joint project.
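The difference between a single "average" reference point and N − 1 separate reference actors can be illustrated with a short Python sketch. It compares player 0's relative share, the quantity that matters in the BO approach, with her utility under the N-player Fehr–Schmidt function, for two allocations that differ only in how income is split among the other players. The function names, the two allocations, and the parameter values α = 1 and β = 0.5 are illustrative assumptions.

```python
from fractions import Fraction


def relative_share(x, i):
    """Player i's share of total income (the BO-style comparison with the average)."""
    return Fraction(x[i], sum(x))


def fehr_schmidt_utility(x, i, alpha, beta):
    """x_i minus averaged disadvantageous and advantageous inequality (pairwise)."""
    n = len(x)
    behind = sum(max(xj - x[i], 0) for j, xj in enumerate(x) if j != i)
    ahead = sum(max(x[i] - xj, 0) for j, xj in enumerate(x) if j != i)
    return x[i] - alpha * behind / (n - 1) - beta * ahead / (n - 1)


equal = (10, 10, 10)
unequal = (10, 0, 20)   # same own income and same total, others' incomes differ

print(relative_share(equal, 0), relative_share(unequal, 0))   # 1/3 1/3
print(fehr_schmidt_utility(equal, 0, alpha=1, beta=0.5))      # 10.0
print(fehr_schmidt_utility(unequal, 0, alpha=1, beta=0.5))    # 2.5
```

Player 0's share is one-third in both allocations, so a preference defined only on own income and own share cannot distinguish them, whereas the pairwise comparisons do.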


One problem with this aspect of the BO approach is that it renders the theory unable to explain the punishment pattern in the PGG with punishment. Remember that, in this experiment, the assignment of one punishment point reduces the income of the punished member by 3 points. The theory of BO predicts that punishing subjects are indifferent between punishing a free rider and punishing a cooperator. All that matters is whether punishment brings the income of the punishing subject closer to the average income in the group and, for this purpose, the punishment of a cooperator is equally good as the punishment of a defector. Yet, in contrast to this indifference prediction, the cooperators predominantly punish the defectors.

To further test the BO model, Fehr and Fischbacher (2000) conducted the following Third-Party Punishment Game. There are three players: A, B, and C. Player A is endowed with 100 experimental currency units and must decide how much of the 100 units to give to B, who has no endowment. Player B is just a dummy player and has no decision power. Player C has an endowment of 50 units and can spend this money on the punishment of A after he observes how much A gave to B. For any money unit player C spends on punishment, the payoff of player A is reduced by 3 units.²² Note that without punishment, player C is certain to get her fair share of the total surplus (50 of 150 units). Therefore, BO predict that C will never punish. Contrary to this prediction, however, player A is punished a lot. The less player A gives to B, the more C punishes A. For example, if A gives nothing, his income is reduced by roughly 30 percent. This indicates that many players do care about inequities among other players. Further support for this hypothesis comes from Charness and Rabin (2000), who offered player C the choice between payoff allocations (575, 575, 575) and (900, 300, 600). Because both allocations give player C the fair share of one-third of the surplus, BO predict that player C will choose the second allocation that gives him a higher absolute payoff. However, 54 percent of the subjects preferred the first allocation. Note that the self-interest hypothesis also predicts the second allocation, so one cannot conclude that the other 46 percent of the subjects have BO preferences. A recent paper by Zizzo and Oswald (2000) also strongly suggests that subjects care about the inequities among the set of reference agents.

It is important to note that theories in which fair-minded subjects have multiple reference agents do not necessarily imply that fair subjects take actions in favor of all other reference agents. To illustrate this, consider the following three-person UG (Güth and van Damme, 1998). In this game, there is a Proposer, a Responder who can reject or accept the proposal, and a passive Receiver who can do nothing but collect the amount of money allocated to him. The Proposer proposes an allocation $(x_1, x_2, x_3)$, where $x_1$ is the Proposer’s payoff, $x_2$ the Responder’s payoff, and $x_3$ the Receiver’s payoff. If the Responder rejects, all three players get nothing; otherwise, the proposed allocation is implemented.

²² In the experimental instructions, the value-laden term “punishment” was not used. The punishment option of player C was described in neutral terms by telling subjects that player C could “assign points” to player A that reduced the incomes of A and C in the way described previously.
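The share arithmetic behind the two BO predictions discussed above (no punishment in the Third-Party Punishment Game, and the choice of (900, 300, 600) over (575, 575, 575)) can be verified with a short Python sketch. The function names are hypothetical; the payoff rules are those stated in the text.

```python
from fractions import Fraction


def share(payoffs, i):
    """Player i's share of the total payoff."""
    return Fraction(payoffs[i], sum(payoffs))


def third_party_payoffs(transfer, punishment):
    """Payoffs (A, B, C): A gives `transfer` out of 100 to B; C spends
    `punishment` out of 50, and each unit spent reduces A's payoff by 3."""
    return (100 - transfer - 3 * punishment, transfer, 50 - punishment)


# Without punishment, C holds exactly one-third whatever A gives to B.
shares_without_punishment = {share(third_party_payoffs(t, 0), 2) for t in range(101)}
print(shares_without_punishment)                # {Fraction(1, 3)}

# Punishing costs C money and moves her share away from one-third.
no_pun = third_party_payoffs(0, 0)              # (100, 0, 50)
with_pun = third_party_payoffs(0, 10)           # (70, 0, 40)
print(share(no_pun, 2), share(with_pun, 2))     # 1/3 4/11

# Charness-Rabin choice: C's share is one-third in both allocations.
print(share((575, 575, 575), 2), share((900, 300, 600), 2))   # 1/3 1/3
```

Since punishment lowers C's own payoff and pushes her share above one-third, a player who cares only about own payoff and own relative share has no reason to punish, whatever A does to B.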


In this three-person UG, the Proposer allocates substantial fractions of the surplus to the Responder, but little or nothing to the Receiver. Moreover, Güth and van Damme (p. 230) report that “there is not a single rejection that can clearly be attributed to a low share for the dummy (i.e., the Receiver, FS).” BO take this as evidence in favor of their approach because the Proposer and the Responder apparently do not take the Receiver’s interest into account. However, this conclusion is premature, because it is easy to show that approaches with multiple reference agents are fully consistent with the Güth and van Damme data. The point can be demonstrated in the context of the Fehr–Schmidt model. Assume for simplicity that the Proposer makes an offer of $x_1 = x_2 = x$, whereas the Receiver gets $x_3 < x$. It is easy to show that a Responder with FS preferences will never (!) reject such an allocation, even if $x_3 = 0$ and even if he is very fair-minded (i.e., has a high β-coefficient). To see this, note that the utility of the Responder if he accepts is given by $U_2 = x - (\beta/2)(x - x_3)$, which is positive for all $\beta \le 1$ and thus higher than the rejection payoff of zero. A similar calculation shows that it takes implausibly high β-values to induce a Proposer to take the interests of the Receiver into account.²³

4.2. Equality Versus Efficiency

Many models of fairness are based on the definition of a fair or equitable outcome to which people compare the available payoff allocations. In experimental games, a natural first approximation for the relevant reference outcome is the equality of material payoffs. The quasi-maximin theory of Charness and Rabin assumes instead that subjects care for the total surplus accruing to the group. A natural way to study whether there are subjects who want to maximize the total surplus is to construct experiments in which the predictions of both theories of inequality aversion (BO and FS) are in conflict with surplus maximization. This has been done by Bolle and Kritikos (1998), Andreoni and Miller (2000), Andreoni and Vesterlund (2001), Charness and Rabin (2000), Cox (2000), and Güth, Kliemt, and Ockenfels (2000). Except for the Güth et al. paper, these papers indicate that, in DG situations, a nonnegligible fraction of the subjects is willing to give up some of their own money to increase total surplus, even if this implies that they generate inequality that is to their disadvantage.

²³ The Proposer’s utility is given by $U_1 = x_1 - (\beta/2)[(x_1 - x_2) + (x_1 - x_3)]$. If we normalize the surplus to one and take into account that $x_1 + x_2 + x_3 = 1$, then $U_1 = (\beta/2) + (3/2)\,x_1\,[(2/3) - \beta]$; thus, the marginal utility of $x_1$ is positive unless β exceeds 2/3. This means that Proposers with $\beta < 2/3$ will give the Responders just enough to prevent rejection. Because the Responders neglect the interests of the Receivers, nothing is given to the Receivers.

Andreoni and Miller and Andreoni and Vesterlund, for example, conducted DGs with varying prices for transferring money to the Receiver. In some conditions, the Allocator had to give up less than a dollar to give the Receiver a dollar; in some conditions, the exchange ratio was 1:1, and in some other conditions the Allocator had to give up more than one dollar. In the usual DGs the exchange ratio is 1:1, and there are virtually no cases in which an Allocator transfers more than 50 percent of the surplus. In contrast, in DGs with an exchange ratio of 1:3 (or 1:2), a nonnegligible number of subjects makes transfers such that they end up with less money than the Receiver. This contradicts BO, FS, and Falk and Fischbacher because in these models fair subjects never take actions that give the other party more than they get. It is, however, consistent with altruistic preferences or quasi-maximin preferences.

What is the relative importance of this kind of behavior? Andreoni and Vesterlund are able to classify subjects in three distinct classes. They report that 44 percent of their subjects (N = 141) are completely selfish, 35 percent exhibit egalitarian preferences (i.e., they tend to equalize payoffs), and 21 percent of the subjects can be classified as surplus maximizers. Charness and Rabin report similar results with regard to the fraction of egalitarian subjects in a simple DG, where the Allocator had to choose between (own, other) allocations of (400, 400) and (400, 750). Thirty-one percent of the subjects preferred the egalitarian and 69 percent the surplus-maximizing allocation. Among the 69 percent, there may, however, also be many selfish subjects who no longer choose the surplus-maximizing allocation when this decreases their payoff only slightly. This is suggested by the DG where the Allocator had to choose between (400, 400) and (375, 750). Here, only 49 percent of surplus-maximizing choices were observed. Charness and Rabin also present questionnaire evidence indicating that, when the income disparities are greater, the egalitarian motive gains weight at the cost of the surplus maximization motive. When the Allocator faces a choice between (400, 400) and (400, 2,000), 62 percent prefer the egalitarian allocation. The evidence cited in the described papers indicates that surplus maximization is a relevant motive in DGs.
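The claim that inequity-averse Allocators never leave themselves behind the Receiver, whereas quasi-maximin Allocators may, can be illustrated with a Python sketch of a DG with an exchange ratio of 1:3. It uses the two-player Fehr–Schmidt utility and the quasi-maximin utility (the Charness–Rabin function of Section 3.2.3 with d = k = f = 0). The endowment of 100, the integer grid of transfers, the function names, and all parameter values are illustrative assumptions.

```python
def fs_utility(own, other, alpha, beta):
    """Two-player Fehr-Schmidt utility."""
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)


def quasi_maximin_utility(own, other, gamma, delta):
    """(1 - gamma) * own + gamma * [delta * min payoff + (1 - delta) * total surplus]."""
    return (1 - gamma) * own + gamma * (delta * min(own, other)
                                        + (1 - delta) * (own + other))


def allocation(transfer, endowment=100, rate=3):
    """(Allocator's payoff, Receiver's payoff): each unit given up yields `rate` units."""
    return endowment - transfer, rate * transfer


def best_transfer(utility, endowment=100, rate=3):
    """Grid search over integer transfers for the Allocator's preferred one."""
    return max(range(endowment + 1),
               key=lambda t: utility(*allocation(t, endowment, rate)))


t_fs = best_transfer(lambda own, other: fs_utility(own, other, alpha=1.0, beta=0.6))
t_qm = best_transfer(lambda own, other: quasi_maximin_utility(own, other,
                                                              gamma=0.8, delta=0.2))
print(t_fs, allocation(t_fs))   # 25 (75, 75)
print(t_qm, allocation(t_qm))   # 100 (0, 300)
```

Under the Fehr–Schmidt utility, any transfer beyond the equal split lowers the Allocator's own payoff and adds disadvantageous inequality, so she stops at (75, 75). With enough weight on total surplus, the quasi-maximin Allocator keeps giving even though she ends up with less than the Receiver.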
The surplus maximization motive has not been included in the prevailing models of inequity aversion, but it would be straightforward to do this. It should also be remembered that any positive transfer in DGs is incompatible with intention-based reciprocity models, irrespective of the exchange rate.

We would like to stress, however, that the DG is different from many economically important games and real-life situations, because in economic interactions it is rarely the case that one player is at the complete mercy of another player. It may well be that, in situations where both players have some power to affect the outcome, the surplus maximization motive is less important than in DGs. The gift exchange experiments by Fehr et al. (1993, 1998) are telling in this regard because they embed a situation that is like a DG into an environment with competitive and strategic elements. These experiments exhibit a competitive element because the GEG is embedded into a competitive experimental market. The experiments also exhibit a strategic element because the Proposers are wage setters and have to take into account the likely effort responses of the Responders. Yet, once the Responder has accepted a wage offer, the experiments are similar to a DG because, for a given wage, the Responder essentially determines the income distribution and the total surplus by his choice of the effort level. The gift exchange experiments are an ideal environment to check the robustness of the surplus maximization motive, because an increase in the effort cost by one unit increases, on average, the total surplus by five units. Therefore, the maximal feasible effort level is, in general, also the surplus-maximizing effort level. If surplus maximization is a robust motive capable of overturning inequity aversion, one would expect that many Responders choose effort levels that give the Proposer a higher monetary payoff than the Responder.²⁴ Moreover, surplus maximization also means that we should not observe a positive correlation between effort and wages because, for a given wage, the maximum feasible effort always maximizes the total surplus.²⁵

However, neither of these implications is supported by the data. Effort levels that give the Proposer a higher payoff than the Responder are virtually nonexistent. In the overwhelming majority of the cases effort is substantially below the maximally feasible level, and in less than 2 percent of the cases the Proposer earns a higher payoff than the Responder.²⁶ Moreover, almost all subjects who regularly chose nonminimal effort levels exhibited a reciprocal effort–wage relation. These numbers are in sharp contrast to the 49 percent of the Allocators in Charness and Rabin who preferred the (375, 750) allocation over the (400, 400) allocation. One reason for the difference across studies is perhaps the fact that it was much cheaper to increase the surplus in the Charness–Rabin example. Whereas in the gift exchange experiments the surplus increases on average by five units if the Responder sacrifices one payoff unit, it increases by 14 units per payoff unit sacrificed in the Charness–Rabin case. This suggests that surplus maximization gives rise to a violation of the equality constraint only if surplus increases are extremely cheap.
A second reason for the behavioral difference may be that, when both players have some power to affect the outcome, the motive to increase the surplus is quickly crowded out by other considerations. This reason is quite plausible, insofar as the outcomes in DGs themselves are notoriously nonrobust. Although the experimental results on UGs, GEGs, or PGGs are fairly robust, the DG seems to be a rather fragile situation in which minor factors can have large effects. Cox (2000), for example, reports that, in his DGs, 100 percent of all subjects transferred positive amounts.27 This result contrasts sharply with many other games, including the games in Charness and Rabin and many other DGs. To indicate the other extreme, Hoffman, McCabe, Shachat, and Smith

24. The Responder’s effort level may, of course, also be affected by the intentions of the Proposer. For example, paying a high wage may signal fair intentions that may increase the effort level. Yet, because this tends to raise effort levels, we would have even stronger evidence against the surplus maximization hypothesis, if we observe little or no effort choices that give the Proposer a higher payoff than the Responder.
25. There are degenerate cases in which this is not true.
26. The total number of effort choices is N = 480 in these experiments (i.e., the results are not an artifact of a low number of observations).
27. In Cox’s experiment, both players had an endowment of 10 and the Allocator could transfer his endowment to the Receiver where the transferred amount was tripled by the experimenter.

Fairness and Reciprocity


(1994), Eichenberger and Oberholzer-Gee (1998), and List and Cherry (2000) report on DGs with extremely low transfers.28 Likewise, in the Impunity Game of Bolton and Zwick (1995), which is very close but not identical to a DG, the vast majority of Proposers did not shy away from making very unfair offers. The Impunity Game differs from the DG only insofar as the Responder can reject an offer; however, the rejection destroys only the Responder’s, but not the Proposer’s, payoff. The notorious nonrobustness of outcomes in situations resembling the DG indicates that one should be very careful in generalizing the results found in these situations to other games. Testing theories of social preferences in DGs is a bit like testing the law of gravity with a table tennis ball. In both situations, minor unobserved distortions can have large effects. Therefore, we believe that it is necessary to show that the same motivational forces that are inferred from DGs are also behaviorally relevant in economically more important games. One way to do this is to apply the theories that have been constructed on the basis of DG experiments to predict outcomes in other games. With the exception of Andreoni and Miller (2000), this has not yet been done. Andreoni and Miller (2000) estimate utility functions based on the results of their DG experiments and use them to predict cooperation behavior in a standard PGG. They predict behavior in period 1 of these games, in which cooperation is often quite high, rather well. However, their predictions are far from final period outcomes, where cooperation is typically very low. In our view, the low cooperation rates in the final period of repeated PGGs constitute a strong challenge for models that rely exclusively on altruistic or surplus-maximizing preferences. Why should a subject with a stable preference for the payoff of others or the payoff of the whole group contribute much less in the final period, compared with the first period?
Models of inequity aversion and intention-based or type-based reciprocity models provide a plausible explanation for this behavior. All of these models predict that fair subjects make their cooperation contingent on the cooperation of others. Thus, if the fair subjects realize that there are sufficiently many selfish decisions in the course of a PGG experiment, they cease to cooperate as well.

4.3. Revenge Versus Inequity Reduction

Subjects with altruistic and quasi-maximin preferences do not take actions that reduce other subjects’ payoffs. Yet, this is frequently observed in many important games. Models of inequity aversion account for this by assuming that the payoff reduction is motivated by a desire to reduce disadvantageous inequality. In intention-based reciprocity models and in Levine (1998), subjects punish

28. In Eichenberger and Oberholzer-Gee (1998), almost 90 percent of the subjects gave nothing. In Hoffman et al. (1994), 64 percent gave nothing, and 19 percent gave between 1 percent and 10 percent. In List and Cherry (2000), subjects earned their endowment in a quiz. Then they played the DG. Roughly 90 percent of the Allocators transferred nothing to the Receivers.


if they observe an action that is perceived to be unfair or that reveals that the opponent is spiteful. In these models, players want to reduce the opponent’s payoff irrespective of whether they are better or worse off than the opponent, and irrespective of whether they can change income shares or income differences. Furthermore, intention-based theories predict that, in games in which no intention can be expressed, there will be no punishment. Therefore, a clean way to test for the relevance of intentions is to conduct control treatments in which choices are made through a random device or through some neutral and disinterested third party. Blount (1995) was the ﬁrst who applied this idea to the UG. Blount compared the rejection rate in the usual UG to the rejection rates in UGs in which either a computer generated a random offer or a third party made the offer. Because, in the random offer condition and the third-party condition, a low offer cannot be attributed to the greedy intentions of the Proposer, intention-based theories predict a rejection rate of zero in these conditions, whereas theories of inequity aversion still allow for positive rejection rates. Levine’s theory is also consistent with positive rejection rates in these conditions, but his theory predicts a decrease in the rejection rate relative to the usual condition, because low offers made by humans reveal that the type who made the offer is spiteful, which can trigger a spiteful response. Blount, indeed, observes a signiﬁcant and substantial reduction in the acceptance thresholds of the Responders in the random offer condition, but not in the third-party condition. Thus, the result of the random offer condition is consistent with intention- and type-based models, whereas the result of the third-party condition is inconsistent with the motives captured by these models. 
Yet, these puzzling results may be due to some problematic features in Blount’s experiments.29 Subsequently, Offerman (1999) and FFF (2000b) conducted further experiments with computerized offers, but without the other worrisome features in Blount. In particular, in these experiments, the Responders knew that a rejection affects the payoff of a real, human “Proposer.” Offerman finds that subjects are 67 percent more likely to reduce the opponent’s payoff when the opponent made an intentional hurtful choice, compared with a situation in which a computer made the hurtful choice. FFF (2000b) conducted an experiment – invented by Abbink, Irlenbusch, and Renner (2000) – that simultaneously allows for the examination of positive and negative reciprocity. In this game, player A can give player B any integer amount of money g ∈ [0, 6] or, alternatively, she can take away from player B any integer amount of money t ∈ [1, 6]. In case of g > 0, the experimenter triples g so that player B receives 3g. If player A takes away t, player A gets

29. Blount’s results may be affected by the fact that subjects (in two of three treatments) had to make decisions as a Proposer and as a Responder before they knew their actual roles. After subjects had made their decisions in both roles, the role for which they received payments was determined randomly. In one of Blount’s treatments, deception was involved. Subjects believed that there were Proposers, although in fact the experimenters made the proposals. All subjects in this condition were “randomly” assigned to the Responder role. In this treatment, subjects also were not paid according to their decisions, but they received a flat fee instead.


t and player B loses t. After player B observes g or t, she can pay player A an integer reward r ∈ [0, 18] or she can reduce player A’s income by making an investment i ∈ [1, 6]. A reward transfers one money unit from player B to player A. An investment i costs player B exactly i, but reduces player A’s income by 3i. This game was played in a random choice condition and in a human choice condition. It turns out that when the choices are made by a human player A, player B invests significantly more into payoff reductions for all t ∈ [1, 6]. However, as in Blount and Offerman, payoff reductions also occur when the computer makes a hurtful choice. Kagel, Kim, and Moser (1996) provide further support for the view that intentions play a role in payoff-reducing behavior. In their experiments, subjects bargained over 100 chips in a UG. They conducted several treatments that varied the money value of the chips and the information provided about the money value. For example, in one treatment, the Proposers received three times more money per chip than the Responders (i.e., the equal money split requires that the Responders receive 75 chips). If the Responders know that the Proposers know the different money values of the chips, they reject unequal money splits much more frequently than if the Responders know that the Proposers do not know the different money values of the chips. Thus, knowingly unequal proposals were rejected at higher rates than unintentionally unequal proposals. Another way to test for the relevance of intention-based or type-based punishments is to examine situations in which the subjects cannot increase their relative share or decrease payoff differences. FFF (2000a) report the results of UGs and PGGs with punishment that have this feature. In the first (standard) treatment of the UG, the Proposers could propose a (5, 5) or an (8, 2) split of the surplus (the first number represents the Proposer’s payoff). In case of rejection, both players received zero.
In the second treatment, the Proposers had the same options, but a rejection now meant that the payoff was reduced for both players by two units. The BO model, as well as the FS model, therefore predicts that there will be no rejections in the second treatment, whereas intention-based and type-based models predict that punishments will occur. It turns out that the rejection rate of the (8, 2) offer is 56 percent in the first and 19 percent in the second treatment. Thus, roughly one-third (19/56) of the rejections are consistent with a pure taste for punishment, as conceptualized in intention- and type-based models.30 FFF (2000a) also report the results of PGGs with punishment in which the punishing subjects could not change the payoff difference between themselves and the punished subject. In one of their treatments, subjects had to pay one money unit to reduce the payoff of another group member by one unit. Thus, BO and FS both predict that there will be no punishment at all in this condition.

30. Ahlert, Crüger, and Güth (1999) also report a significant amount of punishment in UGs, in which the Responders cannot change the payoff difference. However, because they do not have a control treatment, it is not possible to say anything about the relative importance of this kind of punishment.
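The prediction for the two UG treatments can be illustrated numerically. The sketch below assumes the standard two-player form of the FS utility function (own payoff minus an envy term weighted by alpha and a guilt term weighted by beta); the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def fs_utility(own, other, alpha, beta):
    """Two-player FS-style utility: own payoff minus envy and guilt terms."""
    envy = max(other - own, 0)    # disadvantageous inequality
    guilt = max(own - other, 0)   # advantageous inequality
    return own - alpha * envy - beta * guilt


def responder_rejects(accept_payoffs, reject_payoffs, alpha, beta):
    """Payoff pairs are (Proposer, Responder); reject only if strictly better."""
    u_accept = fs_utility(accept_payoffs[1], accept_payoffs[0], alpha, beta)
    u_reject = fs_utility(reject_payoffs[1], reject_payoffs[0], alpha, beta)
    return u_reject > u_accept


offer = (8, 2)
standard = (0, 0)             # first treatment: rejection leaves both with zero
constant_difference = (6, 0)  # second treatment: rejection costs both two units

for alpha in (0.0, 0.25, 0.5, 1.0, 4.0):  # illustrative envy parameters
    print(alpha,
          responder_rejects(offer, standard, alpha, beta=0.0),
          responder_rejects(offer, constant_difference, alpha, beta=0.0))
```

In the standard treatment a sufficiently envious Responder rejects the (8, 2) offer, whereas in the second treatment rejection leaves the payoff difference at six and only lowers the Responder's own payoff, so no value of alpha makes rejection attractive under this utility function.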


In a second treatment, investing one unit into punishment reduced the payoff of the punished group member by three units. For the first treatment, FFF report that 51 percent of all subjects (N = 93) cooperate, which is still compatible with both BO and FS. However, 51 percent of all cooperators punish the defectors. They invest, on average, 4.8 money units into punishment. Thus, 25 percent of the subjects punish free-riding, which is incompatible with BO and FS. To evaluate the relative importance of this amount of punishment, we have to compare these results with the results of the second condition. In the second condition, 61 percent of all subjects (N = 120) cooperate, and 59 percent of them punish the defectors (by imposing a punishment of 5.7 on average). Thus, the overall percentage of subjects who punish the defectors in the second condition is 36 percent. This suggests that a rather large fraction (i.e., 25/36) of the overall amount of punishment is not consistent with BO and FS. Taken together, the evidence from Blount (1995), Offerman (1999), and FFF (2000b) indicates that the motive to punish unfair intentions or unfair types plays an important role. Although the evidence provided by the initial study of Blount was mixed, the subsequent studies indicate a clear role of these motives. However, the evidence also suggests that inequity aversion plays an additional, nonnegligible role. The evidence from the experiments in FFF (2000a) suggests that many subjects who reduce the payoff of other players do not have the desire to change the equitability of the payoff allocation. Instead, a large fraction of these subjects seems to be driven by the desire to punish (i.e., a desire to hurt the other player). It is worthwhile to point out that this desire to hurt the other players, although consistent with intention- and type-based models of reciprocity, does not necessarily constitute evidence in favor of these models.
The reason is that the desire to reduce the payoff of other players may also be triggered by an unfair payoff allocation per se.31

4.4. Does Kindness Trigger Rewards?

Do intention- and type-based theories of fairness do equally well in the domain of rewarding behavior? Evidence in this domain is much more mixed. Some experimental results suggest that rewarding behavior is almost unaffected by these motives. Other results indicate some minor role, and only one paper finds an unambiguous positive effect of intention- or type-based reciprocity.

31. Assume that fair subjects have the following utility function: $u_i = x_i + \alpha_i \frac{1}{n-1} \sum_{j \neq i} \beta(x_i - x_j)\, v(x_j)$, where $\alpha_i$ measures the strength of player i’s nonpecuniary preference, and $v(x_j)$ is an increasing function of player j’s material payoff. $\beta(x_i - x_j)$ is positive if $x_i - x_j > 0$ and negative if $x_i - x_j < 0$. Thus, a state of inequality triggers the desire to reduce or increase the other players’ payoff. In this regard, the utility function is similar to the preference assumption in FS. Yet, in contrast to FS, the aim of player i is no longer the reduction of the payoff difference. Instead, player i just wants to reduce or increase the other player’s payoff, depending on the sign of β.
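A small numerical sketch may help to see how the utility function in this footnote differs from FS. The functional forms below are assumptions made purely for illustration: β is taken to be the sign of the payoff difference, v is the identity, and α = 2; the footnote itself only requires the stated sign and monotonicity properties. The example considers a punishment that costs the punisher one unit and reduces the other player's payoff by one unit, so that the payoff difference stays constant.

```python
def sign_beta(diff):
    """Hypothetical beta: +1 when ahead, -1 when behind, 0 at equality."""
    return (diff > 0) - (diff < 0)


def footnote_utility(i, payoffs, alpha, beta=sign_beta, v=lambda x: x):
    """u_i = x_i + alpha_i / (n - 1) * sum over j != i of beta(x_i - x_j) * v(x_j)."""
    n = len(payoffs)
    x_i = payoffs[i]
    social = sum(beta(x_i - x_j) * v(x_j)
                 for j, x_j in enumerate(payoffs) if j != i)
    return x_i + alpha * social / (n - 1)


def fs_two_player(own, other, alpha, beta):
    """Two-player FS-style utility, included only for comparison."""
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)


before = [2, 8]  # player 0 is behind
after = [1, 7]   # player 0 pays 1 to reduce player 1's payoff by 1

print(footnote_utility(0, before, alpha=2), footnote_utility(0, after, alpha=2))
print(fs_two_player(2, 8, alpha=2, beta=0.5), fs_two_player(1, 7, alpha=2, beta=0.5))
```

With these illustrative choices, the punishment raises the footnote utility although the payoff difference is unchanged, whereas it lowers the FS-style utility.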


Intention-based theories predict that people are generous only if they have been treated kindly (i.e., if the first mover has signaled a fair intention). Levine’s theory is similar in this regard, because generous actions are more likely if the first mover reveals that she is an altruistic type. However, in contrast to the intention-based approaches, Levine’s approach is also compatible with unconditional giving, if it is sufficiently surplus-enhancing. Neither intention- nor type-based reciprocity can explain positive transfers in the DG. Moreover, Charness (1996), Bolton, Brandts, and Ockenfels (1998), Offerman (1999), Cox (2000), and Charness and Rabin (2000) provide further evidence that intentions do not play a big role in rewarding behavior. Charness (1996) conducted GEGs in a random choice condition and a human choice condition. Intention-based theories predict that, in the random choice condition, the Responders will not put forward more than the minimal effort level irrespective of the wage level, because high wage offers are due to chance and not to kind intentions. In the human choice condition, higher wages indicate a higher degree of kindness and, therefore, a positive correlation between wages and effort is predicted. Levine’s theory allows, in principle, for a positive correlation between wages and effort in both conditions, because an increase in effort benefits the Proposer much more than it costs the Responder. However, the correlation should be much stronger in the human choice condition because of the type-revealing effect of high wages. Charness finds a significantly positive correlation in the random choice condition. In the human choice condition, effort is only slightly lower at low wages and equally high at high wages. This indicates, if anything, only a minor role for intention- and type-driven behavior. The best interpretation is probably that inequity aversion or quasi-maximin preferences induce nonminimal effort levels in this setting.
In addition, negative reciprocity kicks in at low wages, which explains the lower effort levels in the human choice condition. Cox (2000) tries to isolate rewarding responses in the context of a TG by using a related DG as a control condition. In the TG, Cox observes a baseline level of Responder transfers back to the Proposer. To isolate the relevance of intention-driven responses, he conducts a DG in which the distribution of endowments is identical to the distribution of material payoffs after the Proposers’ choices in the TG. Thus, both in the TG and in the DG, the Responders face exactly the same distributions of material payoffs; but, in the TG, this distribution has been caused intentionally by the Proposers, whereas in the DG the distribution is predetermined by the experimenter. In Cox’s DG, the motive of rewarding kindness can, therefore, play no role, and intention-based theories, as well as Levine’s theory, predict that Responders transfer nothing back. If one takes into account that some transfers in the DG are driven by inequity aversion or quasi-maximin preferences, the difference between the transfers in the DG and the transfers in the TG measures the relevance of intention- or type-based theories. Cox’s results indicate that these theories play only a minor or no role in this context. In one condition, there is no difference in transfers between the TG and the DG, and, in another condition, transfers in the DG are lower by only one-third.


The strongest evidence against the role of intentions comes from Bolton, Brandts, and Ockenfels (1998). They conducted sequential social dilemma experiments that are akin to a sequentially played prisoner’s dilemma. In one condition, the first movers could make a kind choice relative to a baseline choice. The kind choice implied that – for any choice of the second mover – the payoff of the second mover increased by 400 units at a cost of 100 for the first mover. Then, the second mover could take costly actions to reward the first mover. In a control condition, the first mover could make only the baseline choice (i.e., he could not express any kind intentions). Second movers reward the first movers even more in this control condition. Although this difference is not significant, the results clearly suggest that intention-driven rewards play no role in this experiment. The strongest evidence in favor of intentions comes from the moonlighting game of FFF (2000b) described in the previous subsection. FFF find that, for all positive transfers of player A, player B sends back significantly more money in the human choice condition. Moreover, the difference between the rewards in the human choice condition and the random choice condition is also quantitatively important. A recent paper by McCabe, Rigdon, and Smith (2000) also reports evidence in favor of intention-driven positive reciprocity. They show that, after a nice choice of the first mover, two-thirds of the second movers make nice choices, too; whereas if the first mover is forced to make the nice choice, only one-third of the second movers make the nice choice. In the absence of the evidence provided by FFF and McCabe et al., one would have to conclude that the motive to reward good intentions or fair types is (at best) of minor importance. However, in view of the relatively strong results in the final two papers, it seems wise to be more cautious and to wait for further evidence.
Nevertheless, the bulk of the evidence suggests that inequity aversion and efficiency-seeking are more important than intention- or type-based reciprocity in the domain of kind behavior.

4.5. Summary and Outlook

Although most fairness models discussed in Section 3 are just a few years old, the discussion in this section shows that there is already a fair amount of evidence that sheds light on the relative performance of the different models. This indicates a quick and healthy interaction between experimental research and the development of new theories. The initial experimental results discussed in Section 2 gave rise to a number of new theories, which, in turn, have again been quickly subjected to careful and rigorous empirical testing. Although these tests have not yet led to conclusive results regarding the relative importance of the different motives, many important and interesting insights have been obtained. In our view, the main results can be summarized as follows:

1. Evidence from the Third-Party Punishment Game and the PGG with punishment indicates that many subjects do compare themselves with other people in the group and not just to the group as a whole or to the group average.
2. There is a nonnegligible number of subjects in DGs whose behavior is consistent with surplus maximization. However, the relative quantitative importance of this motive in economically relevant settings has yet to be determined, and surplus maximization alone cannot account for many robust regularities in other games.
3. Pure revenge, as captured by reciprocity models, is an important motive for payoff-reducing behavior. In some games, like the PGG with punishment, it seems to be the dominant source of payoff-reducing behavior. Because pure equity models do not capture this motive, they cannot explain a significant amount of payoff-reducing behavior.
4. In the domain of kind behavior, the motives captured by intention- or type-based models of fairness seem to be less important than in the domain of payoff-reducing behavior. Several studies indicate that inequity aversion or quasi-maximin preferences play a more important role here.

Which model of fairness does best in the light of the data, and which one should be used in applications to economically important phenomena? We believe that it is too early to give a conclusive answer to these questions. There is a large amount of heterogeneity at the individual level, and any model of fairness has difficulties in explaining the full diversity of the experimental observations. The evidence suggests, however, some tentative answers to these questions. In our view, the most important heterogeneity is the one between purely selfish subjects and fair-minded subjects. The success of the BO model and the FS model in explaining a large variety of data from bargaining, cooperation, and market games is partly due to this recognition. Within the class of these equity models, the evidence suggests that the FS model does better.
In particular, the experiments discussed in Section 4.1 indicate that people do not compare themselves with the group as a whole, but rather with other individuals in the group. The group average is less compelling as a yardstick to measure equity than differences in individual payoffs. However, the FS model clearly does not recognize the full heterogeneity within the class of fair-minded individuals. Section 4.3 makes it clear that an important part of payoff-reducing behavior is not driven by the desire to reduce payoff differences, but by the desire to reduce the payoff of those who take unfair actions or reveal themselves as unfair types. The model therefore underestimates the amount of punishing behavior in situations where the cost of punishment is relatively high, compared with the payoff reductions that can be achieved by punishing. Fairness models that are exclusively based on intentions (Rabin 1993; Dufwenberg and Kirchsteiger 1998) can, in principle, account for this type of punishment. Yet, these models have other undesirable features, including multiple, and very counterintuitive, equilibria in many games and a very high degree of complexity that stems from the use of psychological game


theory. The same has to be said about the intention-based theory of Charness and Rabin (2000). Falk and Fischbacher (1999) are not plagued by the multiple equilibrium problem as much as the pure intention models. This is because they incorporate equity as a global reference standard. Their model shares, however, the complexity costs of psychological game theory. Even though none of the available theories can take into account the full complexity of motives at the individual level, some theories may allow for better approximations than others. The evidence presented in Section 2 shows clearly that there are many important economic problems for which the self-interest theory is unambiguously, and in a quantitatively important way, refuted. The recent papers by BO and FS show that one can account for the bulk of this evidence by models that explicitly take into account that there are selfish and fair-minded individuals. Although we believe that it is desirable to tackle the heterogeneity within the class of fair-minded subjects in parsimonious and tractable models, we also believe that the heterogeneity between selfish and fair types is more important. In fact, in the following section, we will show that the FS model provides surprisingly good qualitative and quantitative predictions in important economic domains. Thus, even if we do not yet have a fully satisfactory model of fair behavior, one can probably go a long way with simple models that take into account the interaction between selfish and fair types.

5. ECONOMIC APPLICATIONS

5.1. Competition and Fairness – When Does Fairness Matter?

The self-interest model fails to explain the experimental evidence in many games in which only a few players interact, but it is very successful in explaining the outcome of competitive markets. It is a well-established experimental fact that, in a broad class of market games, prices converge to the competitive equilibrium.32 This result holds even if the resulting allocation is very unfair by any notion of fairness. Thus, the question arises: If so many people resist unfair outcomes in, say, the UG, why don’t they behave the same way when there is competition among the players? To answer this question, consider the following UG with Proposer competition, which was conducted by Roth, Prasnikar, Okuno-Fujiwara, and Zamir (1991) in four different countries. There are $n - 1$ Proposers who simultaneously offer a share $s_i \in [0, 1]$, $i \in \{1, \ldots, n - 1\}$, to one Responder. The Responder can either accept or reject the highest offer $s^{\max} = \max_i \{s_i\}$. If there are several Proposers who offered $s^{\max}$, one of them is selected at random with equal probability. If the Responder accepts $s^{\max}$, her monetary payoff is $s^{\max}$ and the successful Proposer earns $1 - s^{\max}$, whereas all the other Proposers get 0. If the Responder rejects, everybody gets a payoff of 0.

32. See, e.g., Smith (1962) and Davis and Holt (1993).
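To make the game with Proposer competition concrete, the following sketch computes the monetary payoffs after an accepted offer and checks whether a Proposer who dislikes inequality would gain from outbidding the best rival offer. It assumes the n-player FS utility function (envy and guilt terms averaged over the other n − 1 players), nine Proposers, a Responder who accepts the highest offer, and illustrative parameters alpha = 2 and beta = 0.6; all function names are hypothetical.

```python
def fs_utility(i, payoffs, alpha, beta):
    """n-player FS-style utility of player i (assumed functional form)."""
    n = len(payoffs)
    x_i = payoffs[i]
    others = [x for j, x in enumerate(payoffs) if j != i]
    envy = sum(max(x_j - x_i, 0.0) for x_j in others)
    guilt = sum(max(x_i - x_j, 0.0) for x_j in others)
    return x_i - (alpha * envy + beta * guilt) / (n - 1)


def payoffs_if_accepted(offers, winner):
    """Index 0 is the Responder; index k + 1 is the Proposer who made offers[k]."""
    s_max = offers[winner]
    payoffs = [s_max] + [0.0] * len(offers)
    payoffs[winner + 1] = 1.0 - s_max
    return payoffs


def gain_from_overbidding(rival_best, eps, n_proposers=9, alpha=2.0, beta=0.6):
    """Utility gain of the focal Proposer (offers[0]) from bidding rival_best + eps
    instead of staying below the best rival offer."""
    others = [0.0] * (n_proposers - 2)
    lose = payoffs_if_accepted([0.0, rival_best] + others, winner=1)
    win = payoffs_if_accepted([rival_best + eps, rival_best] + others, winner=0)
    return fs_utility(1, win, alpha, beta) - fs_utility(1, lose, alpha, beta)


for rival_best in (0.1, 0.5, 0.9, 0.98):
    print(rival_best, round(gain_from_overbidding(rival_best, eps=0.01), 4))
```

Under these assumptions the gain is positive for every rival offer below 1, so even a strongly inequity-averse Proposer has an incentive to overbid.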


The prediction of the self-interest model is straightforward: All Proposers will offer $s = 1$, which is accepted by the Responder. Hence, all Proposers get a payoff of 0 and the monopolistic Responder captures the entire surplus. This outcome is clearly very unfair, but it describes precisely what happened in the experiments. After a few periods of adaptation, $s^{\max}$ was very close to 1, and all the surplus was captured by the Responder.33 This result is remarkable. It does not seem to be more fair that one side of the market gets all of the surplus in this setting than in the standard UG. Why do the Proposers let the Responder get away with it? The reason is that, in this strategic setting, preferences for fairness or reciprocity cannot have any effect. To see this, suppose that each of the Proposers strongly dislikes getting less than the Responder. Consider Proposer i and let $\bar{s} = \max_{j \neq i} \{s_j\}$ be the highest offer made by his fellow Proposers. If Proposer i offers $s_i < \bar{s}$, then his offer has no effect and he will get a monetary payoff of 0 with certainty. Furthermore, he cannot prevent the Responder from getting $\bar{s}$ and one of the other Proposers from getting $1 - \bar{s}$; so, he will suffer from getting less than these two. However, if he offers a little bit more than $\bar{s}$, say $\bar{s} + \varepsilon$, then he will win the competition, get a positive monetary payoff, and reduce the inequality between himself and the Responder. Hence, he should try to overbid his competitors. This process drives the share that is offered by the Proposers up to 1. There is nothing the Proposers can do about it, even if all of them have a strong preference for fairness. We prove this result formally in Fehr and Schmidt (1999) for the case of inequity-averse players, but the same result is also predicted by the approaches of Levine (1998) and Bolton and Ockenfels (2000). Does this mean that sufficiently strong competition will always wipe out the impact of fairness?
The answer to this question is negative, because fairness matters much more in market games in which the execution of contracts cannot be completely determined at the stage where the parties conclude the contracts. Labor markets are a good example. A labor contract is highly incomplete, because it cannot enforce the level of effort provided by the employee who chooses his effort level after the contract has been signed. These contractual features are captured by the GEG in an experimental setting. When the GEG is embedded into a competitive experimental market [e.g., in Fehr et al. (1993, 1998)], wages are systematically higher than the competitive equilibrium wage predicted by the self-interest model. There is also no tendency for wages to decrease over time. The reason for this stable wage premium is the effort behavior of the Responders: On average, effort levels increase with wages, which provides an incentive for the firms to pay a wage premium. If,

33. The experiments were conducted in Israel, Japan, Slovenia, and the United States. In all experiments, there were nine Proposers and one Responder. Roth et al. also conducted the standard UG with one Proposer in these four countries. They did find some small (but statistically significant) differences between countries in the standard UG, which may be attributed to cultural differences. However, there are no statistically significant differences between countries for the UG with Proposer competition.


however, the effort level is fixed exogenously by the experimenter, the firms do not shy away from pushing down wages to the competitive level. FS and BO can explain this pattern in a straightforward manner. When effort is endogenous, inequity-averse Responders respond to high wages with high effort levels to prevent an unequal distribution of the surplus from trade. This induces all firms (including purely selfish ones) to pay a wage premium because it is profitable to do so. When effort is exogenous, this mechanism does not work, and competition drives down wages to the competitive level.

5.2. Endogenous Incomplete Contracts

If fairness concerns affect the behavior of economic agents in so many situations, then they should also be taken into account in the design of incentive schemes. Surprisingly, hardly any theoretical and very little empirical or experimental work has been done to study the impact of fairness on incentive provision. Standard contract theory neglects this issue and assumes that all agents are interested only in their own material payoffs. Over the past two decades, this theory has been highly successful in solving fairly complicated contractual problems and in designing very sophisticated mechanisms and incentive schemes. This gave rise to many important and fascinating insights, and the methods developed there have been applied in almost all areas of economics. However, standard contract theory still finds it difficult to explain the simplicity and incompleteness of many contracts that we observe in the real world. In particular, it cannot explain why the parties’ monetary payoffs are often not tied to measures of performance that would be available at a relatively small cost. For example, the salary of a teacher or a university professor is rarely contingent on students’ test scores, teaching ratings, or citations. These performance measures are readily available and easily verifiable, so one has to conclude that these contracts are deliberately left incomplete.34 In a recent paper, Fehr, Klein, and Schmidt (2000) take a fresh look at contractual incompleteness by taking concerns for fairness and reciprocity into account. They report on several simple principal–agent experiments in which the principal was given a choice whether to offer a “complete” contract or a less complete one. In the first experimental design, an agent had to pick an effort level

The literature on incomplete contracts acknowledges contractual incompleteness, but most of this literature simply assumes that no long-term contingent contracts are feasible and does not attempt to explain this premise. See, for example, Grossman and Hart (1986) or Hart and Moore (1990) and Section 5.3. There is a small literature on endogenous incomplete contracts. Some papers in this literature [e.g., Aghion, Dewatripont, and Rey (1994), N¨oldeke and Schmidt (1995), or Edlin and Reichelstein (1996)] show that, in some situations, a properly designed incomplete contract can implement the ﬁrst best, so, there is no need to write a more complete contract. Some other papers [e.g., Che and Hausch (1998), Hart and Moore (1999), and Segal (1999)] show that, although an incomplete contract does not implement the ﬁrst best, a more complete contract is of no value to the parties because it is impossible to get closer to the efﬁciency frontier.

Fairness and Reciprocity

245

between 1 and 10 (at a monetary cost to herself) that is perfectly observed by a principal and can be verified (at a small fixed cost) to the courts. The principal can try to induce the agent to spend effort by imposing a fine on the agent that is enforced by the courts if she works too little. However, the fine is bounded above so that the highest implementable effort level (e* = 4) falls short of the first-best efficient action (e^FB = 10). In this contractual environment, principal–agent theory predicts that the principal should use the maximal fine to induce the agent to choose e* = 4, and that he should offer a fixed wage that holds the agent down to her reservation utility. If the agent complies with the contract, the principal can capture roughly 30 percent of the first-best surplus for himself, while the agent gets nothing.

There are two alternatives to this “incentive contract.” In one treatment, the principal could choose to offer a “trust contract” that does without a fine and simply pays a generous fixed wage up front to the agent, asking her to reciprocate by spending a higher level of effort. However, effort cannot be enforced with this contract. In a second treatment, the principal could offer a “bonus contract,” which specifies a fixed wage, a desired level of effort, and an announced bonus payment if the effort is to the principal’s satisfaction. However, both parties know that the bonus cannot be enforced and is left at the discretion of the principal. The trust and the bonus contract are clearly less complete than the incentive contract. Because the experiments carefully rule out any repeated interactions between the parties, both types of contracts are, according to standard principal–agent theory, doomed to fail. Given the fixed wage, a purely self-interested agent will not spend any effort. Similarly, a principal who is interested only in his own income will never pay a bonus, so a rational agent should never put in any effort.
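The role of the bounded fine in the self-interest prediction can be illustrated with a short sketch. The cost schedule and the cap on the fine below are hypothetical numbers, chosen only so that the highest implementable effort equals 4 as in the text; they are not the parameters used by Fehr, Klein, and Schmidt (2000). The sketch further assumes that an agent who deviates shirks to the minimum effort and then pays the fine with certainty, and that her reservation utility is zero.

```python
# Illustrative sketch only. COST and MAX_FINE are hypothetical values,
# not the parameters of the experiments discussed in the text.
COST = {1: 0, 2: 1, 3: 2, 4: 4, 5: 6, 6: 8, 7: 10, 8: 13, 9: 16, 10: 20}
MAX_FINE = 5  # hypothetical upper bound on the enforceable fine


def is_implementable(effort, fine, cost=COST):
    """A selfish agent complies if the extra effort cost (relative to
    minimum effort) does not exceed the fine she would pay for shirking."""
    min_effort = min(cost)
    return cost[effort] - cost[min_effort] <= fine


def highest_implementable_effort(max_fine, cost=COST):
    """Largest effort level that the maximal fine can enforce."""
    return max(e for e in cost if is_implementable(e, max_fine, cost))


def reservation_wage(effort, reservation_utility=0, cost=COST):
    """Fixed wage that just compensates the agent for her effort cost."""
    return reservation_utility + cost[effort]


def selfish_effort_without_enforcement(cost=COST):
    """With a fixed wage and no enforceable fine or bonus, effort is pure
    cost to a selfish agent, so she picks the cheapest level."""
    return min(cost, key=cost.get)


if __name__ == "__main__":
    e_star = highest_implementable_effort(MAX_FINE)
    print("highest implementable effort:", e_star)
    print("wage at reservation utility:", reservation_wage(e_star))
    print("selfish effort without enforcement:",
          selfish_effort_without_enforcement())
```

Under these assumed numbers, the maximal fine enforces an effort of 4 but not 5, and a selfish agent facing a trust or bonus contract chooses the minimum effort of 1, which is the benchmark against which the fairness predictions are compared.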
If concerns for fairness and reciprocity are taken into account, the predictions are less clear cut. Consider again the optimal incentive contract (as suggested by principal–agent theory). This contract aims at a rather unfair distribution of the surplus. If the agent is concerned about this, there are two ways in which she could punish the principal. First, as in a UG, she could simply reject the contract, in which case both parties get a payoff of 0. A second, and more interesting, punishment strategy is to accept the contract and to shirk. Note that, if the incentive compatibility constraint is just binding, then the cost of shirking to the agent is zero and independent of the fixed wage offered by the principal. Thus, if the principal offers a somewhat higher wage that gives a positive (but still “unfair”) share of the surplus to the agent, the agent can punish the principal by accepting the wage and shirking (at zero cost to herself). Hence, concerns for fairness and reciprocity suggest that the principal has to offer a fairly generous wage to get the agent to accept and to work, which makes the incentive contract less attractive. On the other hand, concerns for fairness and reciprocity improve the performance of trust and bonus contracts. A fair agent will reciprocate to a generous wage offer in a trust contract by putting in a higher effort level voluntarily. Similarly, a fair principal will reciprocate to a high effort level by paying a
generous bonus, making it worth the agent’s while to spend more effort. Unfortunately, however, on such a general level, it is impossible to make any clear-cut predictions about the relative performance of the three types of contracts. Is the incentive contract going to be outperformed by the trust and/or the bonus contract? Does the bonus contract induce a higher level of effort than the trust contract, or the other way round?

To obtain quantitative predictions for the experiments, Fehr et al. (2000) apply the model of inequity aversion by Fehr and Schmidt (1999) to this moral hazard problem. Most other models of fairness or intention-based reciprocity would probably yield similar results, and we want to stress that these experiments were not designed to discriminate between different notions of fairness. The main advantage of our model of inequity aversion is just its simplicity, which makes it straightforward to apply to these games. However, Fehr et al. (2000) have to make a few additional assumptions. In particular, they assume for simplicity that there are only two types of subjects, “selfish” players who are interested only in their own material payoffs, and “fair” players who are willing to give up their own resources to achieve a more equal payoff distribution. Furthermore, in rough accordance with the experimental results of many UGs and DGs, they assume that 60 percent of the population are selfish and 40 percent are fair. With these assumptions it is a straightforward exercise to analyze the different types of contracts and obtain the following predictions:

1. Trust Contracts: Fair agents will reciprocate to high wage offers by putting in an effort level that equalizes payoffs, whereas selfish agents will choose the minimum effort level of 1. Thus, a higher wage offer will, on average, induce a higher level of effort. However, it can be shown that if less than two-thirds of all agents are fair, paying a higher wage does not raise the principal’s expected profit. Therefore, with 40 percent fair agents, the trust contract is not going to work.

2. Incentive Contracts: For the same reason as in the trust contract, it does not pay for the principals to elicit higher average effort levels by paying generous wages. Thus, both selfish and fair principals impose the highest possible fine to induce the agent to choose e = 4. However, whereas the fair principals share the surplus arising from e = 4 equally with the agent, selfish principals propose unfair contracts that give them the whole surplus. They anticipate that the fair agents reject these contracts; but, because the 60 percent selfish agents accept these contracts, this strategy is still profitable.

3. Bonus Contracts: Selfish principals always pay a bonus of zero, but fair principals pay a bonus that divides the surplus equally between the principal and the agent. Therefore, the bonus is on average increasing with the agent’s effort. Moreover, the relation between the effort and the average bonus is sufficiently steep to induce a selfish agent to
put in an effort level of 7. However, the fair agent chooses an effort level of only 1 or 2 (depending on the fixed wage). The reason for this surprising result is that the fair agent is concerned not only about her expected monetary payoff, but that she suffers in addition from the inequality that arises if a selfish principal does not pay the bonus. Nevertheless, on average, the bonus contract implements a higher level of effort (e = 5.2) and yields a higher payoff for the principal than both the incentive contract and the trust contract.35

What are the experimental results? Each experiment had 10 periods, in which each principal was matched randomly and anonymously with a different agent. In the first treatment, in which principals could choose between a trust and an incentive contract, roughly 50 percent of the principals chose a trust contract and 50 percent chose an incentive contract in period 1. However, the fraction of incentive contracts rose quickly and, after period 5, roughly 80 percent of all contractual choices were incentive contracts. Those principals who offered a trust contract paid generous wages, to which some agents reciprocated by putting in a high effort level. However, in 64 percent of all trust contracts, the agents chose e = 1. Thus, on average, principals incurred considerable losses when they proposed trust contracts. The incentive contracts did better, but they did much less well than predicted by standard principal–agent theory. They also did less well than predicted by the model of inequity aversion. The reason is that, at the beginning, many principals offered incentive contracts with fairly high wages that were not incentive-compatible. In these cases, 62 percent of the agents shirked, imposing considerable losses on principals. On the other hand, those principals who offered incentive-compatible incentive contracts with low wages did fairly well. Principals learned to properly design incentive contracts over time.
The fraction of incentive-compatible contracts increased from only 10 percent in period 1 to 64 percent in period 10.

35 The analysis of the bonus contract is complicated by the fact that the principal has to move twice. He offers the terms of the contract at the first stage of the game, and he has to choose his bonus payment at the last stage. Thus, his contract offer may reveal some information about his type. However, it can be shown that there is no separating equilibrium in this game and that all pooling equilibria have the properties described previously. Furthermore, if we assume that a higher wage offer is not interpreted by the agent as a signal that she faces the selfish principal with a higher probability, then there is a unique pooling equilibrium. See Fehr et al. (2000).

In the second treatment, the principal had to choose between a bonus contract and an incentive contract. From the very beginning, the bonus contract was much more popular than the incentive contract and accounted for roughly 90 percent of all contractual choices. Many principals did not pay a bonus, but a significant fraction reciprocated generously to higher effort levels. The average bonus was, therefore, strongly increasing in the effort level, which made it worthwhile for the agents to put forward rather high effort levels. The average effort level was 5.2, which is significantly higher than the average effort of 2.5 induced by
incentive contracts. The bonus contract is not only more efficient than the incentive contract, it also yields on average a much higher payoff to the principal and a moderately higher payoff to the agent. These results are clearly inconsistent with the self-interest model, whereas the model of inequity aversion explains them surprisingly well.36

Our experiments demonstrate that quite powerful incentives can be given by a very incomplete bonus contract. The bonus contract relies on reciprocal fairness as an enforcement device. It does better than the more complete incentive contracts because it is incomplete and thus leaves more freedom to the parties to reciprocate. This enforcement mechanism is not perfect and, depending on the payoff structure and the fraction of reciprocal types in the population, it can fail. In fact, we have seen that the trust contract – in which the principal has to pay, in advance, the “bonus” unconditionally – is not viable in the setup of our experiments. Yet, the performance of the bonus contract suggests that the effect of reciprocal fairness, which has been neglected in contract theory so far, is important for optimal contractual design and should be taken into account.

5.3. The Optimal Allocation of Ownership Rights

Consider two parties, A and B, who are engaged in a joint project (a “firm”) to which they have to make some relationship-specific investments today to generate a joint surplus in the future. An important question that has received considerable attention in recent years is who should own the firm. In a seminal paper, Grossman and Hart (1986) argue that ownership rights allocate residual rights of control over the physical assets that are required to generate the surplus. For example, if A owns the firm, then he will have a stronger bargaining position than B in the renegotiation game in which the surplus between the two parties is shared ex post, because he can exclude B from using the assets, which makes B’s relationship-specific investment less productive. Grossman and Hart show that there is no ownership structure that implements first-best investments, but some ownership structures do better than others, and there is a unique second-best optimal allocation of ownership rights.

36 In a second experimental design, Fehr et al. (2000) consider a multitask principal–agent model inspired by Holmström and Milgrom (1991). In this experiment, the agents have to choose two separate effort levels (“tasks”), e1 and e2, both of which are observable by the principal, but only e1 is verifiable and can be contracted on. The principal can choose between a piece-rate contract that rewards the agent for her effort spent on task 1 and a bonus contract that announces a voluntary bonus payment if the agent’s effort on both tasks is to the principal’s satisfaction. The overwhelming majority of principals opted for the bonus contract, which induced the agents to spend, on average, a considerable amount of effort and to allocate total effort efficiently across tasks. Those principals who chose a piece-rate contract induced the agents to concentrate all of their total efforts on task 1, which is very inefficient. Again, these results are inconsistent with the self-interest model, but they can be nicely explained by the Fehr–Schmidt model of inequity aversion.

A common feature of most incomplete contract models is that joint ownership cannot be optimal.37 This result is at odds with the fact that there are many jointly owned companies, partnerships, or joint ventures. Furthermore, the argument neglects that reciprocal fairness may be an important enforcement mechanism to induce the involved parties to invest more under joint ownership than otherwise predicted.

To test this hypothesis, Fehr, Kremhelmer, and Schmidt (2000) conducted a series of experiments on the optimal allocation of ownership rights. The experimental game is a grossly simplified version of Grossman and Hart (1986): There are two parties, A and B, who have to make investments, a, b ∈ {1, . . . , 10}, respectively, to generate a joint surplus v(a, b). Investments are sequential: B has to invest first; her investment level b is observed by A, who has to invest thereafter. We consider two possible ownership structures: Under A ownership, A hires B as an employee and pays her a fixed wage w. In this case, monetary payoffs are v(a, b) − w − a for A and w − b for B. Under joint ownership, each party gets half of the gross surplus minus his or her investment cost [i.e., 0.5v(a, b) − a for A and 0.5v(a, b) − b for B]. The gross profit function has been chosen such that maximal investments are efficient (i.e., a^FB = b^FB = 10), but if each party gets only 50 percent of the marginal return of their investments, then it is a dominant strategy for a purely self-interested player to choose the minimum level of investment, a = b = 1. Finally, in the first stage of the game, A can decide whether to be the sole owner of the firm and make a wage offer to B, or whether to have joint ownership. The prediction of the self-interest model is straightforward. Under A ownership, B has no incentive to invest and will choose b = 1. On the other hand, A is a full residual claimant on the margin, so he will invest efficiently.
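The payoff structure of this game, and the behavior of purely self-interested players under each ownership structure, can be sketched as follows. The surplus function v(a, b) = 1.5(a + b) is a hypothetical choice, not the function used by Fehr, Kremhelmer, and Schmidt (2000); it is picked only because it satisfies the two properties stated in the text (maximal investments are efficient, and half of the marginal return does not cover the investment cost). The function names are likewise illustrative.

```python
from itertools import product

LEVELS = range(1, 11)  # investment levels 1..10, as in the text


def v(a, b):
    """Hypothetical gross surplus; NOT the experimental function."""
    return 1.5 * (a + b)


def payoffs_a_ownership(a, b, w):
    """A owns the firm and pays B a fixed wage w: returns (A, B) payoffs."""
    return v(a, b) - w - a, w - b


def payoffs_joint_ownership(a, b):
    """Each party gets half of the gross surplus minus own investment."""
    half = 0.5 * v(a, b)
    return half - a, half - b


def selfish_prediction_a_ownership(w):
    """B invests first, A observes b and replies; both maximize own payoff."""
    def a_reply(b):
        return max(LEVELS, key=lambda a: payoffs_a_ownership(a, b, w)[0])
    b_star = max(LEVELS, key=lambda b: payoffs_a_ownership(a_reply(b), b, w)[1])
    return a_reply(b_star), b_star


def selfish_prediction_joint_ownership():
    """Same sequential logic under joint ownership."""
    def a_reply(b):
        return max(LEVELS, key=lambda a: payoffs_joint_ownership(a, b)[0])
    b_star = max(LEVELS, key=lambda b: payoffs_joint_ownership(a_reply(b), b)[1])
    return a_reply(b_star), b_star


def efficient_investments():
    """Investment pair maximizing total surplus net of investment costs."""
    return max(product(LEVELS, LEVELS), key=lambda ab: v(*ab) - ab[0] - ab[1])


if __name__ == "__main__":
    print("A ownership (a, b):", selfish_prediction_a_ownership(w=5))
    print("joint ownership (a, b):", selfish_prediction_joint_ownership())
    print("first best (a, b):", efficient_investments())
```

With this assumed surplus function, self-interested play gives (a, b) = (10, 1) under A ownership and (1, 1) under joint ownership, against a first best of (10, 10). The wage w = 5 is arbitrary; it shifts payoffs between the parties but does not affect either investment choice.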
Under joint ownership, each party gets only 50 percent of the marginal return, which is not sufficient to induce any investments. Hence, in this case, B’s optimal investment level is unchanged, but A’s investment level is reduced to a = 1. Thus, A ownership outperforms joint ownership, and A should hire B as an employee.

37 To see this, note that, in the renegotiation game in which the surplus is shared, each party gets its reservation utility plus a fixed fraction (50 percent, say) of the joint surplus in excess of the sum of the reservation utilities. Now, consider A ownership. If A invests, then his investment increases not only the joint surplus, but also his reservation utility (i.e., what he could get out of the firm without B’s collaboration). On the other hand, if B invests, then her investment increases only the joint surplus, but it does not improve her reservation utility. The reason is that the investment requires access to the firm to be productive. Hence, without the firm, B’s investment is useless. This is why A will invest more than B under A ownership. Consider now joint ownership. If both parties own the firm jointly, then each of them can prevent the other from using the assets. Hence, neither A’s nor B’s investment affects their respective reservation utilities. Therefore, A’s investment incentives are reduced, whereas B’s investment incentives do not improve. Hence, joint ownership is inferior.

In the experiments, just the opposite happened. Party A chose joint ownership in more than 80 percent (187 of 230) of all observations and gave away 50 percent of the gross return to B. Moreover, the fraction of joint ownership contracts increased from 74 percent in the first two periods to 89 percent in the
last two periods. With joint ownership, B players chose on average an investment level of 8.9, and A responded with an investment of 6.5 (on average). On the other hand, if A ownership was chosen and A hired B as an employee, B’s average investment was only 1.3, whereas all A players chose an investment level of 10. Furthermore, A players earned much more on average if they chose joint ownership rather than A ownership. These results are inconsistent with the self-interest model, but it is straightforward to explain them with concerns for fairness. Applying the Fehr and Schmidt (1999) model of inequity aversion again gives fairly accurate quantitative predictions. Thus, the experimental results and the theoretical analysis suggest that joint ownership may do better than A ownership, because it offers more scope for reciprocal behavior. Subjects seem to understand this and predominantly choose this ownership structure.

6. CONCLUSIONS

The self-interest model has been very successful in explaining individual behavior on competitive markets, but it is unambiguously refuted in many situations in which individuals interact strategically. The experimental evidence on, for example, UGs, DGs, GEGs, and PGGs demonstrates unambiguously that many people are not only maximizing their own material payoffs, but that they are also concerned about social comparisons, fairness, and the desire to reciprocate. We have reviewed several models that try to take these concerns explicitly into account. A general lesson to be drawn from these models is that the assumption that some people are fair-minded and have the desire to reciprocate does not imply that these people will always behave “fairly.” In some environments (e.g., in competitive markets or in PGGs without punishment), fair-minded actors will often behave as if they are purely self-interested.
Likewise, a purely self-interested person may often behave as if he is strongly concerned about fairness (e.g., the Proposers who make fair proposals in the UG or generous wage offers in the GEG). Thus, the behavior of fair-minded and purely self-interested actors depends on the strategic environment in which they interact and on their beliefs about the fairness of their opponents. The analysis of this behavior is not trivial, and it is helpful to develop theoretical tools to better understand what we observe.

Some of the models reviewed focus solely on preferences over income distributions and ignore the fact that people often care about the intentions behind the actions of their opponents. Some other papers focus only on intention-based or type-based reciprocity and ignore the fact that some people are bothered by unfair distributions, even if their opponent could not do anything about it. It seems natural to try to combine these two motivations in a single model as has been done by Falk and Fischbacher (1999) and Charness and Rabin (2000). However, we believe that the cost of doing so is high. These models are rather complicated; they rely on psychological game theory, and it is difficult to apply them even to very simple experimental games. Moreover, Charness and Rabin,
in particular, are plagued with multiple equilibria and have many more free parameters than all the other models. On the other hand, simple models of social preferences – for example, Bolton and Ockenfels’ (2000) ERC model or our own (1999) model of inequity aversion – fit the data on large classes of games fairly well. They use standard game theory, they have fewer parameters to be estimated, and it is fairly straightforward to get clear-cut qualitative and quantitative predictions.

The main advantage of these simple models is that they can easily be applied to other fields in economics. For more than 20 years, experimental economists concentrated on simple experimental games to better understand what drives economic behavior. However, very few of the insights that have been gained had any impact on how economists interpret the world. We feel that it is now time to change this. Many phenomena in situations in which people interact strategically cannot be understood by relying on the self-interest model alone. Our examples from contract theory and the theory of property rights illustrate that models of reciprocal fairness can be fruitfully applied to important and interesting economic questions, yielding predictions that are much closer to what we observe in many situations of the real world and in carefully controlled experiments than the predictions of the self-interest model. There are many other areas in which fairness models are likely to generate interesting new insights – be it the functioning of labor markets or questions of political economy or be it the design of optimal mechanisms or questions of compliance with organizational rules and the law. We hope that this is just the beginning. There is no shortage of important questions to which the newly developed tools and insights can be applied.

ACKNOWLEDGMENTS

We thank Glenn Ellison for many helpful comments and suggestions, and Alexander Klein and Susanne Kremhelmer for excellent research assistance. Part of this research was conducted while Klaus M. Schmidt visited Stanford University, and he thanks the Economics Department for its great hospitality. Financial support by Deutsche Forschungsgemeinschaft through Grant SCHM1196/4-1 is gratefully acknowledged. Ernst Fehr also gratefully acknowledges support from the Swiss National Science Foundation (Project No. 121405100.97), the Network on the Evolution of Preferences and Social Norms of the MacArthur Foundation, and the EU-TMR Research Network ENDEAR (FMRX-CTP98-0238).

References

Abbink, K., B. Irlenbusch, and E. Renner (2000), “The Moonlighting Game: An Experimental Study on Reciprocity and Retribution,” Journal of Economic Behavior and Organization, 42, 265–277.
Agell, J. and P. Lundborg (1995), “Theories of Pay and Unemployment: Survey Evidence from Swedish Manufacturing Firms,” Scandinavian Journal of Economics, 97, 295–308.
Aghion, P., M. Dewatripont, and P. Rey (1994), “Renegotiation Design with Unverifiable Information,” Econometrica, 62, 257–282.
Ahlert, M., A. Crüger, and W. Güth (1999), “An Experimental Analysis of Equal Punishment Games,” mimeo, University of Halle-Wittenberg.
Alm, J., I. Sanchez, and A. de Juan (1995), “Economic and Noneconomic Factors in Tax Compliance,” Kyklos, 48, 3–18.
Andreoni, J. (1989), “Giving with Impure Altruism: Applications to Charity and Ricardian Equivalence,” Journal of Political Economy, 97, 1447–1458.
Andreoni, J., B. Erard, and J. Feinstein (1998), “Tax Compliance,” Journal of Economic Literature, 36, 818–860.
Andreoni, J. and J. Miller (1993), “Rational Cooperation in the Finitely Repeated Prisoner’s Dilemma: Experimental Evidence,” Economic Journal, 103, 570–585.
Andreoni, J. and J. Miller (2000), “Giving According to GARP: An Experimental Test of the Rationality of Altruism,” mimeo, University of Wisconsin and Carnegie Mellon University.
Andreoni, J. and H. Varian (1999), “Preplay Contracting in the Prisoner’s Dilemma,” Proceedings of the National Academy of Sciences USA, 96, 10933–10938.
Andreoni, J. and L. Vesterlund (2001), “Which Is the Fair Sex? Gender Differences in Altruism,” Quarterly Journal of Economics, 116, 293–312.
Arrow, K. J. (1981), “Optimal and Voluntary Income Redistribution,” in Economic Welfare and the Economics of Soviet Socialism: Essays in Honor of Abram Bergson, (ed. by S. Rosenfield), Cambridge, UK: Cambridge University Press.
Becker, G. S. (1974), “A Theory of Social Interactions,” Journal of Political Economy, 82, 1063–1093.
Berg, J., J. Dickhaut, and K. McCabe (1995), “Trust, Reciprocity and Social History,” Games and Economic Behavior, 10, 122–142.
Bernheim, B. D. (1986), “On the Voluntary and Involuntary Provision of Public Goods,” American Economic Review, 76, 789–793.
Bewley, T. (1999), Why Wages Don’t Fall During a Recession. Cambridge, MA: Harvard University Press.
Binmore, K. (1998), Game Theory and the Social Contract: Just Playing. Cambridge, MA: MIT Press.
Binmore, K., J. Gale, and L. Samuelson (1995), “Learning to Be Imperfect: The Ultimatum Game,” Games and Economic Behavior, 8, 56–90.
Blount, S. (1995), “When Social Outcomes Aren’t Fair: The Effect of Causal Attributions on Preferences,” Organizational Behavior and Human Decision Processes, 63, 131–144.
Bolle, F. and A. Kritikos (1998), “Self-Centered Inequality Aversion Versus Reciprocity and Altruism,” mimeo, Europa-Universität Viadrina.
Bolton, G. E. (1991), “A Comparative Model of Bargaining: Theory and Evidence,” American Economic Review, 81, 1096–1136.
Bolton, G. E., J. Brandts, and A. Ockenfels (1998), “Measuring Motivations for the Reciprocal Responses Observed in a Simple Dilemma Game,” Experimental Economics, 3, 207–221.
Bolton, G. E. and A. Ockenfels (2000), “A Theory of Equity, Reciprocity, and Competition,” American Economic Review, 90, 166–193.

Bolton, G. and R. Zwick (1995), “Anonymity Versus Punishment in Ultimatum Bargaining,” Games and Economic Behavior, 10, 95–121.
Bowles, S. and H. Gintis (1999), “The Evolution of Strong Reciprocity,” mimeo, University of Massachusetts at Amherst.
Bowles, S. and H. Gintis (2000), “Reciprocity, Self-Interest, and the Welfare State,” Nordic Journal of Political Economy, 26, 33–53.
Brandts, J. and G. Charness (1999), “Gift-Exchange with Excess Supply and Excess Demand,” mimeo, Universitat Pompeu Fabra, Barcelona.
Camerer, C. F. (1999), “Social Preferences in Dictator, Ultimatum and Trust Games,” mimeo, California Institute of Technology.
Camerer, C. F. and R. H. Thaler (1995), “Ultimatums, Dictators and Manners,” Journal of Economic Perspectives, 9, 209–219.
Cameron, L. A. (1999), “Raising the Stakes in the Ultimatum Game: Experimental Evidence from Indonesia,” Economic Inquiry, 37(1), 47–59.
Carpenter, J. P. (2000), “Punishing Free-Riders: The Role of Monitoring – Group Size, Second-Order Free-Riding and Coordination,” mimeo, Middlebury College.
Chamberlin, E. H. (1948), “An Experimental Imperfect Market,” Journal of Political Economy, 56, 95–108.
Charness, G. (1996), “Attribution and Reciprocity in a Labor Market: An Experimental Investigation,” mimeo, University of California at Berkeley.
Charness, G. (2000), “Responsibility and Effort in an Experimental Labor Market,” Journal of Economic Behavior and Organization, 42, 375–384.
Charness, G. and M. Rabin (2000), “Social Preferences: Some Simple Tests and a New Model,” mimeo, University of California at Berkeley.
Che, Y.-K. and D. B. Hausch (1999), “Cooperative Investments and the Value of Contracting,” American Economic Review, 89(1), 125–147.
Cooper, D. J. and C. K. Stockman (1999), “Fairness, Learning, and Constructive Preferences: An Experimental Investigation,” mimeo, Case Western Reserve University.
Costa-Gomes, M. and K. G. Zauner (1999), “Learning, Non-equilibrium Beliefs, and Non-Pecuniary Payoff Uncertainty in an Experimental Game,” mimeo, Harvard Business School.
Cox, J. C. (2000), “Trust and Reciprocity: Implications of Game Triads and Social Contexts,” mimeo, University of Arizona at Tucson.
Croson, R. T. A. (1999), “Theories of Altruism and Reciprocity: Evidence from Linear Public Goods Games,” Discussion Paper, Wharton School, University of Pennsylvania.
Daughety, A. (1994), “Socially-Influenced Choice: Equity Considerations in Models of Consumer Choice and in Games,” mimeo, University of Iowa.
Davis, D. and C. Holt (1993), Experimental Economics. Princeton, NJ: Princeton University Press.
Dawes, R. M. and R. Thaler (1988), “Cooperation,” Journal of Economic Perspectives, 2, 187–197.
Dufwenberg, M. and G. Kirchsteiger (1998), “A Theory of Sequential Reciprocity,” Discussion Paper, CENTER, Tilburg University.
Edlin, A. S. and S. Reichelstein (1996), “Holdups, Standard Breach Remedies, and Optimal Investment,” American Economic Review, 86(3), 478–501.
Eichenberger, R. and F. Oberholzer-Gee (1998), “Focus Effects in Dictator Game Experiments,” mimeo, University of Pennsylvania.

Ellingsen, T. and M. Johannesson (2000), “Is There a Hold-up Problem?” Stockholm School of Economics, Working Paper 357.
Encyclopaedia Britannica (1998), The New Encyclopaedia Britannica, Volume 1, (15th ed.), London, Encyclopaedia Britannica.
Fahr, R. and B. Irlenbusch (2000), “Fairness as a Constraint on Trust in Reciprocity: Earned Property Rights in a Reciprocal Exchange Experiment,” Economics Letters, 66, 275–282.
Falk, A., E. Fehr, and U. Fischbacher (2000a), “Informal Sanctions,” Institute for Empirical Research in Economics, University of Zurich, Working Paper 59.
Falk, A., E. Fehr, and U. Fischbacher (2000b), “Testing Theories of Fairness–Intentions Matter,” Institute for Empirical Research in Economics, University of Zurich, Working Paper 63.
Falk, A., E. Fehr, and U. Fischbacher (2000c), “Appropriating the Commons,” Institute for Empirical Research in Economics, University of Zurich, Working Paper 55.
Falk, A. and U. Fischbacher (1999), “A Theory of Reciprocity,” Institute for Empirical Research in Economics, University of Zurich, Working Paper 6.
Falk, A., S. Gächter, and J. Kovács (1999), “Intrinsic Motivation and Extrinsic Incentives in a Repeated Game with Incomplete Contracts,” Journal of Economic Psychology, 20, 251–284.
Fehr, E. and A. Falk (1999), “Wage Rigidity in a Competitive Incomplete Contract Market,” Journal of Political Economy, 107, 106–134.
Fehr, E. and U. Fischbacher (2000), “Third Party Punishment,” mimeo, University of Zürich.
Fehr, E. and S. Gächter (2000), “Cooperation and Punishment in Public Goods Experiments,” American Economic Review, 90, 980–994.
Fehr, E., S. Gächter, and G. Kirchsteiger (1997), “Reciprocity as a Contract Enforcement Device,” Econometrica, 65, 833–860.
Fehr, E., G. Kirchsteiger, and A. Riedl (1993), “Does Fairness Prevent Market Clearing? An Experimental Investigation,” Quarterly Journal of Economics, 108, 437–460.
Fehr, E., G. Kirchsteiger, and A. Riedl (1998), “Gift Exchange and Reciprocity in Competitive Experimental Markets,” European Economic Review, 42, 1–34.
Fehr, E., A. Klein, and K. M. Schmidt (2000), “Endogenous Incomplete Contracts,” mimeo, University of Munich.
Fehr, E., S. Kremhelmer, and K. M. Schmidt (2000), “Fairness and the Optimal Allocation of Property Rights,” mimeo, University of Munich.
Fehr, E. and K. M. Schmidt (1999), “A Theory of Fairness, Competition and Cooperation,” Quarterly Journal of Economics, 114, 817–868.
Fehr, E. and E. Tougareva (1995), “Do High Monetary Stakes Remove Reciprocal Fairness? Experimental Evidence from Russia,” mimeo, Institute for Empirical Economic Research, University of Zurich.
Fischbacher, U., S. Gächter, and E. Fehr (1999), “Are People Conditionally Cooperative? Evidence from a Public Goods Experiment,” Working Paper 16, Institute for Empirical Research in Economics, University of Zurich.
Forsythe, R. L., J. Horowitz, N. E. Savin, and M. Sefton (1994), “Fairness in Simple Bargaining Games,” Games and Economic Behavior, 6, 347–369.
Frey, B. and H. Weck-Hannemann (1984), “The Hidden Economy as an ‘Unobserved’ Variable,” European Economic Review, 26, 33–53.
Gächter, S. and A. Falk (1999), “Reputation or Reciprocity?” Working Paper 19, Institute for Empirical Research in Economics, University of Zürich.


Geanakoplos, J., D. Pearce, and E. Stacchetti (1989), “Psychological Games and Sequential Rationality,” Games and Economic Behavior, 1, 60–79. Gintis, H. (2000), “Strong Reciprocity and Human Sociality,” Journal of Theoretical Biology, 206, 169–179. Greenberg, J. (1990), “Employee Theft as a Reaction to Underpayment Inequity: The Hidden Cost of Pay Cuts,” Journal of Applied Psychology, 75, 561–568. Grossman, S. and O. Hart (1986), “An Analysis of the Principal–Agent Problem,” Econometrica, 51, 7–45. G¨uth, W., H. Kliemt, and A. Ockenfels (2000), “Fairness Versus Efﬁciency – An Experimental Study of Mutual Gift-Giving,” mimeo, Humboldt University of Berlin. G¨uth, W., R. Schmittberger, and B. Schwarze (1982), “An Experimental Analysis of Ultimatium Bargaining,” Journal of Economic Behavior and Organization, 3, 367– 388. G¨uth, W. and E. van Damme (1998), “Information, Strategic Behavior and Fairness in Ultimatum Bargaining: An Experimental Study,” Journal of Mathematical Psychology, 42, 227–247. Hannan, L., J. Kagel, and D. Moser (1999), “Partial Gift Exchange in Experimental Labor Markets: Impact of Subject Population Differences, Productivity Differences and Effort Requests on Behavior,” mimeo, University of Pittsburgh. Harsanyi, J. (1955), “Cardinal Welfare, Individualistic Ethics, and Interpersonal Comparisons of Utility,” Journal of Political Economy, 63, 309–321. Hart, O. and J. Moore (1990), “Property Rights and the Nature of the Firm,” Journal of Political Economy, 98, 1119–1158. Hart, O. and J. Moore (1999), “Foundations of Incomplete Contracts,” Review of Economic Studies, 66, 115–138. Hoffman, E., K. McCabe, K. Shachat, and V. Smith (1994), “Preferences, Property Rights, and Anonymity in Bargaining Games,” Games and Economic Behavior, 7, 346–380. Hoffman, E., K. McCabe, and V. Smith (1996), “On Expectations and Monetary Stakes in Ultimatum Games,” International Journal of Game Theory, 25, 289–301. Holmstr¨om, B. and P. 
Milgrom (1991), “Multi-Task Principal-Agent Analyses.” Journal of Law, Economics, and Organization, 7, 24–52. Isaac, M. R., J. M. Walker, A. W. Williams (1994), “Group Size and the Voluntary Provision of Public Goods,” Journal of Public Economics, 54, 1–36. Kagel, J. H, C. Kim, and D. Moser (1996), “Fairness in Ultimatum Games with Asymmetric Information and Asymmetric Payoffs,” Games and Economic Behavior, 13, 100–110. Kahneman, D., J. L. Knetsch, and R. Thaler (1986), “Fairness as a Constraint on Proﬁt Seeking: Entitlements in the Market,” American Economic Review, 76, 728– 741. Kirchsteiger, G. (1994), “The Role of Envy in Ultimatum Games,” Journal of Economic Behavior and Organization, 25, 373–389. Ledyard, J. (1995), “Public Goods: A Survey of Experimental Research,” Chapter 2, in Handbook of Experimental Economics, (ed. by A. Roth and J. Kagel), Princeton, NJ: Princeton University Press. Levine, D. (1998), “Modeling Altruism and Spitefulness in Experiments,” Review of Economic Dynamics, 1, 593–622. Lind, A. and T. Tyler (1988) The Social Psychology of Procedural Justice. New York: Plenum Press.


List, J. and T. Cherry (2000), “Examining the Role of Fairness in Bargaining Games,” mimeo, University of Arizona at Tucson. McCabe, K., M. Rigdon, and V. Smith (2000), “Positive Reciprocity and Intentions in Trust Games,” mimeo, University of Arizona at Tucson. Miller, S. (1997), “Strategienuntersuchung zum Investitionsspiel von Berg,” Dickhaut, McCabe, Diploma Thesis, University of Bonn. Neilson, W. (2000), “An Axiomatic Characterization of the Fehr-Schmidt Model of Inequity Aversion,” mimeo, Department of Economics, Texas A&M University. N¨oldeke, G. and K. M. Schmidt (1995), “Option Contracts and Renegotiation: A Solution to the Hold-Up Problem,” Rand Journal of Economics, 26, 163–179. Offerman, T. (1999), “Hurting Hurts More Than Helping Helps: The Role of the Selfserving Bias,” mimeo, University of Amsterdam. Ostrom, E. (1990), Governing the Commons – The Evolution of Institutions for Collective Action. New York: Cambridge University Press. Ostrom, E. (2000), “Collective Action and the Evolution of Social Norms,” Journal of Economic Perspectives, 14, 137–158. Rabin, M. (1993), “Incorporating Fairness into Game Theory and Economics,” American Economic Review, 83(5), 1281–1302. Roth, A. E. (1995), “Bargaining Experiments,” in Handbook of Experimental Economics, (ed. by J. Kagel and A. Roth) Princeton, NJ: Princeton University Press. Roth, A. E. and I. Erev (1995), “Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term,” Games and Economic Behavior, 8, 164–212. Roth, A. E., M. W. K. Malouf, and J. K. Murningham (1981), “Sociological Versus Strategic Factors in Bargaining,” Journal of Economic Behavior and Organization, 2, 153–177. Roth, A. E., V. Prasnikar, M. Okuno-Fujiwara, and S. Zamir (1991), “Bargaining and Market Behavior in Jerusalem, Ljubljana, Pittsburgh, and Tokyo: An Experimental Study,” American Economic Review, 81, 1068–1095. Samuelson, P. A. 
(1993), “Altruism as a Problem Involving Group Versus Individual Selection in Economics and Biology,” American Economic Review, 83, 143–148. Segal, I. (1999), “Complexity and Renegotiation: A Foundation for Incomplete Contracts,” Review of Economic Studies, 66(1), 57–82. Segal, U. and J. Sobel (1999), “Tit for Tat: Foundations of Preferences for Reciprocity in Strategic Settings,” mimeo, University of California at San Diego. Seidl, C. and S. Traub (1999), “Taxpayers’ Attitudes, Behavior, and Perceptions of Fairness in Taxation,” mimeo, Institut f¨ur Finanzwissenschaft und Sozialpolitik, University of Kiel. Selten, R. and A. Ockenfels (1998), “An Experimental Solidarity Game,” Journal of Economic Behavior and Organization, 34, 517–539. Sen, A. (1995), “Moral Codes and Economic Success,” in Market Capitalism and Moral Values (ed. by C. S. Britten and A. Hamlin), Aldershot, UK: Edward Elgar. Sethi, R. and E. Somananthan (2001), “Preference Evolution and Reciprocity,” Journal of Economic Theory, 97, 273–297. Sethi, R. and E. Somananthan (2000), “Understanding Reciprocity,” mimeo, Columbia University. Slonim, R. and A. E. Roth (1997), “Financial Incentives and Learning in Ultimatum and Market Games: An Experiment in the Slovak Republic,” Econometrica, 65, 569– 596.


Smith, A. (1759), The Theory of Moral Sentiments. Indianapolis, IN: Liberty Fund (reprinted 1982). Smith, V. L. (1962), “An Experimental Study of Competitive Market Behavior,” Journal of Political Economy, 70, 111–137. Sonnemans, J., A. Schram, and T. Offerman (1999), “Strategic Behavior in Public Good Games–When Partners Drift Apart,” Economics Letters, 62, 35–41. Suleiman, R. (1996), “Expectations and Fairness in a Modiﬁed Ultimatum Game,” Journal of Economic Psychology, 17, 531–554. Veblen, T. (1922), The Theory of the Leisure Class–An Economic Study of Institutions. London: George Allen and Unwin (ﬁrst published 1899). Zajac, E. (1995), “Political Economy of Fairness,” Cambridge, MA: MIT Press. Zizzo, D. and A. Oswald (2000), “Are People Willing to Pay to Reduce Others’ Income?” mimeo, Oxford University.

CHAPTER 7

Hyperbolic Discounting and Consumption
Christopher Harris and David Laibson

1. INTRODUCTION

Robert Strotz (1956) first suggested that people are more impatient when they make short-run trade-offs than when they make long-run trade-offs.1 Virtually every experimental study on time preference has supported Strotz’s conjecture.2 When two rewards are both far away in time, decision-makers act relatively patiently (e.g., I prefer two apples in 101 days, rather than one apple in 100 days). But when both rewards are brought forward in time, preferences exhibit a reversal, reflecting more impatience (e.g., I prefer one apple right now, rather than two apples tomorrow).3 Such reversals should be well understood by everyone who makes far-sighted New Year’s resolutions and later backtracks. We promise ourselves to exercise, diet, and quit smoking, but often postpone those virtuous behaviors when the moment arrives to make the required sacrifices. Looking to the long run, we wish to act patiently, but the desire for instant gratification frequently overwhelms our good intentions.

The contrast between long-run patience and short-run impatience has been modeled with discount functions that take an approximately hyperbolic form (Ainslie, 1992; Loewenstein and Prelec, 1992; Laibson, 1997a). Such preferences imply that the instantaneous discount rate declines as the horizon increases. This pattern of discounting sets up a conflict between today’s preferences and the preferences that will be held in the future. From the perspective of period 0, the discount rate between two distant periods, t and t + 1, is a long-term low discount rate. However, from the perspective of period t, the discount rate between t and t + 1 is a short-term high discount rate.

Hyperbolic consumers will report a gap between what they feel they should save and what they actually save. Prescriptive saving rates will lie above actual

1 Some of Strotz’s insights are anticipated by Ramsey (1928).
2 See Ainslie (1992) and Frederick, Loewenstein, and O’Donoghue (2001) for reviews of the evidence for and against hyperbolic discounting.
3 This example is from Thaler (1981).


savings rates, because short-run preferences for instantaneous gratiﬁcation will undermine the consumer’s desire to implement long-run patient plans. However, the hyperbolic consumer is not doomed to retire in poverty. Illiquid assets can help the hyperbolic consumer lock in the patient, welfare-enhancing course of action. Hence, the availability of illiquid assets becomes a critical determinant of household savings and welfare. However, too much illiquidity can be problematic. Consumers face substantial uninsurable labor-income risk, and need to use liquid assets to smooth their consumption. Hyperbolic agents seek an investment portfolio that strikes the right balance between commitment and ﬂexibility. In this paper, we review and extend the literature on hyperbolic discounting and consumption. We begin our analysis of hyperbolic consumers by describing an inﬁnite-horizon consumption problem with a single liquid asset. Using this tractable problem, we characterize equilibrium behavior. We prove a new equilibrium uniqueness theorem, characterize some properties of the consumption function, and illustrate additional properties of the consumption function with numerical simulations. We show that hyperbolic consumption functions may exhibit pathologies like discontinuities, nonmonotonicities, and concavity violations. We analyze the comparative statics of these pathologies. The pathologies are exacerbated as hyperbolicity increases, risk aversion falls, and income uncertainty falls. We also show that these pathologies do not arise when the model parameters are calibrated at empirically sensible benchmark values. Finally, we review our earlier results on the Euler relation characterizing the equilibrium path (Harris and Laibson, 2001a). 
We then discuss simulations of savings and asset allocation choices of households who face a life cycle problem with liquid assets, liquid liabilities, and illiquid assets (Angeletos, Laibson, Repetto, Tobacman, and Weinberg 2001a; hereafter ALRTW). These life cycle simulations are used to compare the behavior of hyperbolic households and exponential households. Both the exponential and hyperbolic households are calibrated to hold levels of preretirement wealth that match observed levels of wealth reported in the Survey of Consumer Finances (SCF). Despite the fact that this calibration imposes identical levels of total wealth for hyperbolics and exponentials, numerous differences arise. First, the hyperbolic households invest comparatively little of their wealth in liquid assets. They hold relatively low levels of liquid wealth measured either as a fraction of labor income or as a share of total wealth. Analogously, hyperbolic households also borrow more aggressively in the revolving credit market (i.e., on credit cards). The low levels of liquid wealth and high rates of credit card borrowing generated by hyperbolic simulations match empirical measures from the SCF much better than the results of exponential simulations. Because the hyperbolic households have low levels of liquid assets and high levels of debt, they are unable to smooth their consumption paths in the presence of predictable changes in income. Calibrated hyperbolic simulations display substantial comovement between consumption and predictable income growth, matching empirical measures of comovement from the Panel Study of


Income Dynamics (PSID). By contrast, calibrated exponential simulations generate too little consumption-income comovement. Similarly, hyperbolic simulations generate substantial drops in consumption around retirement, matching empirical estimates. The exponential simulations fail to replicate this pattern. All in all, the hyperbolic model matches observed consumption data better than the exponential model.

Our paper is organized in 12 sections, and readers are encouraged to pick and choose among them. Section 4 contains the most technical parts of the paper and can be skipped by readers primarily interested in applications. In Section 2, we discuss the hyperbolic discount function. In Section 3, we present a one-asset, infinite-horizon buffer-stock consumption model, which can accommodate either exponential or hyperbolic preferences. In Section 4, we discuss existence and uniqueness of an equilibrium. In Section 5, we describe the Euler relation that characterizes the equilibrium path. In Section 6, we describe our numerical simulations of the one-asset consumption problem. In Section 7, we describe the properties of the hyperbolic consumption function and illustrate these properties with simulations. In Section 8, we review empirical applications of the hyperbolic model. In Section 9, we discuss the level of consumer sophistication assumed in hyperbolic models. In Section 10, we describe the policy implications of the hyperbolic model. In Section 11, we discuss some important extensions of the hyperbolic model, including applications in continuous time. In Section 12, we conclude.

2. HYPERBOLIC DISCOUNTING

When researchers elicit time preferences, they ask subjects to choose among a set of delayed rewards. The largest rewards are accompanied by the greatest delays.4 Researchers use subject choices to estimate the shape of the discount function. These estimated discount functions almost always approximate generalized hyperbolas: events τ periods away are discounted with weight $(1 + \alpha\tau)^{-\gamma/\alpha}$, with α, γ > 0 (Loewenstein and Prelec, 1992).5 Figure 7.1 graphs the generalized hyperbolic discount function with parameters α = 4 and γ = 1. Figure 7.1 also plots the standard exponential discount function, $\delta^{\tau}$, assuming δ = 0.944 (the annual discount factor used in our simulations).

4 Such experiments have used a wide range of real rewards, including money, durable goods, fruit juice, sweets, video rentals, relief from noxious noise, and access to video games. For example, see Thaler (1981); Navarick (1982); Millar and Navarick (1984); King and Logue (1987); Kirby and Herrnstein (1995); Kirby and Marakovic (1995, 1996); Kirby (1997); and Read et al. (1996). See Ainslie (1992), Frederick et al. (2001), and Angeletos et al. (2001b) for partial reviews of this literature. See Mulligan (1997) for a critique.
5 Loewenstein and Prelec (1992) provide an axiomatic derivation of the generalized hyperbolic discount function. See Chung and Herrnstein (1961) for the first use of the hyperbolic discount function. The original psychology literature worked with the special cases 1/τ and 1/(1 + ατ). Ainslie (1992) reviews this literature.

[Figure 7.1 plots the discount function (vertical axis, 0 to 1) against the year (horizontal axis, 0 to 50) for three cases. Exponential: $\delta^{t}$, with δ = 0.944. Hyperbolic: $(1 + \alpha t)^{-\gamma/\alpha}$, with α = 4 and γ = 1. Quasi-hyperbolic: $\{1, \beta\delta, \beta\delta^{2}, \beta\delta^{3}, \ldots\}$, with β = 0.7 and δ = 0.957.]

Figure 7.1. Exponential and hyperbolic discount functions.
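
The three curves in Figure 7.1 can be tabulated directly from their formulas. The following is a minimal Python sketch using the parameter values stated for the figure; the function names are hypothetical and chosen only for this illustration.

```python
import numpy as np

def exponential(t, delta=0.944):
    """Exponential discount weight delta**t."""
    return delta ** np.asarray(t, dtype=float)

def hyperbolic(t, alpha=4.0, gamma=1.0):
    """Generalized hyperbolic weight (1 + alpha*t)**(-gamma/alpha)."""
    return (1.0 + alpha * np.asarray(t, dtype=float)) ** (-gamma / alpha)

def quasi_hyperbolic(t, beta=0.7, delta=0.957):
    """Quasi-hyperbolic weights {1, beta*delta, beta*delta**2, ...}."""
    t = np.asarray(t, dtype=float)
    return np.where(t == 0, 1.0, beta * delta ** t)

if __name__ == "__main__":
    # Print the three discount weights at a few horizons (in years).
    for year in (0, 1, 5, 10, 25, 50):
        print(year,
              round(float(exponential(year)), 4),
              round(float(hyperbolic(year)), 4),
              round(float(quasi_hyperbolic(year)), 4))
```

With these parameters the hyperbolic weight lies below the exponential weight at a one-year horizon and above it at a fifty-year horizon, which is the short-run impatience and long-run patience pattern that the figure illustrates.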

Because the discount rate represents the rate of decline of the discount function, the exponential discount function implies a constant discount rate:

$$-\frac{\partial(\delta^{\tau})/\partial\tau}{\delta^{\tau}} = -\ln\delta.$$

By contrast, the hyperbolic discount function implies a discount rate that falls with the horizon, τ:

$$-\frac{\partial\left[(1+\alpha\tau)^{-\gamma/\alpha}\right]/\partial\tau}{(1+\alpha\tau)^{-\gamma/\alpha}} = \frac{\gamma}{1+\alpha\tau}.$$

In the short run, the hyperbolic discount rate is γ and in the long run the discount rate converges to zero. This reflects the robust experimental finding that people are very impatient in the short run (e.g., when postponing a reward from today to tomorrow) and very patient when thinking about long-run tradeoffs (postponing a reward from 100 days to 101 days). To reflect the empirical pattern of discount rates that fall with the horizon, Laibson (1997a) adopted a discrete-time discount function, $\{1, \beta\delta, \beta\delta^{2}, \beta\delta^{3}, \ldots\}$, which Phelps and Pollak (1968) had previously used to model intergenerational time preferences.6 This “quasi-hyperbolic function” reflects the sharp short-run drop in valuation measured in the experimental time-preference data and has been adopted as a research tool because of

6 Akerlof (1991) used a similar function: {1, β, β, β, ...}.

its analytical tractability.7 Figure 7.1 plots the particular parameterization of the quasi-hyperbolic discount function used in our simulations: β = 0.7 and δ = 0.957. Using annual periods, these parameter values roughly match experimentally measured discounting patterns. Delaying an immediate reward by a year reduces the value of that reward by approximately 40 percent ≈ 1 − βδ. By contrast, delaying a distant reward by an additional year reduces the value of that reward by a relatively small percentage: 1 − δ.8 All forms of hyperbolic preferences induce dynamic inconsistency. Consider the discrete-time quasi-hyperbolic function. The discount factor between adjacent periods t and t + 1 represents the weight placed on utils at time t + 1 relative to the weight placed on utils at time t. From the perspective of self t, the discount factor between periods t and t + 1 is βδ, but the discount factor that applies between any two later periods is δ. Because we take β to be less than one, this implies a short-term discount factor that is less than the long-term discount factor.9 From the perspective of self t + 1, βδ is the relevant discount factor between periods t + 1 and t + 2. Hence, self t and self t + 1 disagree about the desired level of patience that should be used to trade off rewards in periods t + 1 and t + 2. Because of this dynamic inconsistency, the hyperbolic consumer is involved in a decision that has intrapersonal strategic dimensions. Early selves would like to commit later selves to honor the preferences of those early selves. Later selves do their best to maximize their own interests. Economists have modeled this situation as an intrapersonal game played among the consumer’s temporally situated selves (Strotz, 1956). 
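
The disagreement between self t and self t + 1 can be made concrete with a small numerical sketch. The Python code below is illustrative only: the reward sizes (1.0 and 1.2 utils), the dates (periods 10 and 11), and the function names are hypothetical choices, while β = 0.7 and δ = 0.957 are the values quoted above.

```python
def qh_weight(k, beta=0.7, delta=0.957):
    """Weight the current self places on utils received k periods ahead."""
    return 1.0 if k == 0 else beta * delta ** k

def discounted_value(now, date, utils, beta=0.7, delta=0.957):
    """Value, from the perspective of the self at `now`, of utils at `date`."""
    return qh_weight(date - now, beta, delta) * utils

def prefers_later(now, early_date, early_utils, late_date, late_utils,
                  beta=0.7, delta=0.957):
    """True if the self at `now` strictly prefers the later, larger reward."""
    late = discounted_value(now, late_date, late_utils, beta, delta)
    early = discounted_value(now, early_date, early_utils, beta, delta)
    return late > early

if __name__ == "__main__":
    # Hypothetical choice: 1.0 util in period 10 versus 1.2 utils in period 11.
    print("self 0 prefers later: ", prefers_later(0, 10, 1.0, 11, 1.2))
    print("self 10 prefers later:", prefers_later(10, 10, 1.0, 11, 1.2))
    # One-year drop in value for an immediate reward versus a distant one.
    print("1 - beta*delta =", round(1 - 0.7 * 0.957, 4))
    print("1 - delta      =", round(1 - 0.957, 4))
```

Under these assumed rewards, self 0 compares the two dates using the discount factor δ and prefers to wait, whereas self 10 compares them using βδ and prefers the immediate reward. This is the reversal that gives the consumer's problem its intrapersonal strategic dimension.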
Recently, hyperbolic discount functions have been used to explain a wide range of anomalous economic choices, including procrastination, contract design, drug addiction, self-deception, retirement timing, and undersaving.10 We focus here on the implications for life cycle savings decisions. In the sections that follow, we analyze the “sophisticated” version of the hyperbolic model. Sophisticated hyperbolic consumers correctly predict that later selves will not honor the preferences of early selves. By contrast, “naive” consumers make current choices under the false belief that later selves will act in the interests of the current self. The assumption of naivete was ﬁrst proposed

7 The quasi-hyperbolic discount function is “hyperbolic” only in the sense that it captures the key qualitative property of the hyperbolic functions: a faster rate of decline in the short run than in the long run. Laibson (1997a) adopted the phrase “quasi-hyperbolic” to emphasize the connection to the hyperbolic-discounting literature in psychology (Ainslie 1992). O’Donoghue and Rabin (1999a) call these preferences “present biased.” Krusell and Smith (2000a) call these preferences “quasi-geometric.”
8 See Ainslie (1992) and Frederick et al. (2000).
9 Note that a discount factor, say θ, is inversely related to the discount rate, −ln θ.
10 For example, see Akerlof (1991), Laibson (1994, 1996, 1997a), Barro (1997), Diamond and Koszegi (1998), O’Donoghue and Rabin (1999a, 1999b, 2000), Benabou and Tirole (2000), Brocas and Carrillo (2000, 2001), Carrillo and Dewatripont (2000), Carrillo and Marriotti (2000), Della Vigna and Paserman (2000), Della Vigna and Malmendier (2001), Gruber and Koszegi (2001), and Krusell et al. (2000a, 2000b).

by Strotz (1956), and has since been carefully studied by Akerlof (1991) and O’Donoghue and Rabin (1999a, 1999b, 2000). We return to a discussion of naifs in Section 9.

3. THE CONSUMPTION PROBLEM

Our benchmark model adopts the technological assumptions of standard “buffer-stock” consumption models like those originally developed by Deaton (1991) and Carroll (1992, 1997). These authors assume stochastic labor income and incomplete markets – consumers cannot borrow against uncertain future labor income. In this section, we consider a stripped-down stationary version of the standard buffer-stock model. In Section 8, we discuss a more complex life cycle model, with a richer set of institutional assumptions. Our modeling assumptions for the stripped-down model divide naturally into four parts: the standard assumptions from the buffer-stock literature; the assumptions that make our model qualitatively hyperbolic; our equilibrium concept; and the technical assumptions that allow us to derive the Hyperbolic Euler Relation. We discuss the first three sets of assumptions herein. The fourth set of assumptions is presented in Section 4.1.

3.1. Buffer-Stock Assumptions

During period t, the consumer has cash on hand $x_t \ge 0$. She chooses a consumption level $c_t \in [0, x_t]$, which rules out borrowing. Whatever the consumer does not spend is saved, $s_t = x_t - c_t \in [0, x_t]$. The gross return on her savings is fixed, $R \ge 0$, and next period she receives labor income $y_{t+1} \ge 0$. Cash on hand during period t + 1 is, therefore, $x_{t+1} = R(x_t - c_t) + y_{t+1}$. Labor income is independently and identically distributed over time with density f. The consumer cannot sell her uncertain stream of future labor-income payments, because of moral hazard and adverse selection, or because of prohibitions against indenturing. In other words, there is no asset market for labor.

3.2. Hyperbolic Preferences

We model an individual as a sequence of autonomous temporal selves. These selves are indexed by the respective periods, t = 0, 1, 2, . . . , in which they control the consumption choice. Self t receives payoff

$$E_t\left[U(c_t) + \beta\sum_{i=1}^{\infty}\delta^{i}\,U(c_{t+i})\right], \tag{3.1}$$

where β ∈ [0, 1], δ ∈ [0, 1), and U : [0, +∞) → [−∞, +∞). Our model nests the standard case of exponential discounting: β = 1, 0 ≤ δ < 1. Our model also nests the quasi-hyperbolic case: β < 1, 0 ≤ δ < 1.
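
To make the budget dynamics and the payoff (3.1) concrete, the following Python sketch iterates the cash-on-hand transition and evaluates a truncated, single-path version of self t's payoff. Everything beyond the two formulas is an assumption made for illustration: the CRRA utility with ρ = 2, the gross return R = 1.03, the fixed-fraction consumption rule, and the function names are hypothetical, and the expectation $E_t$ is replaced by one realized path.

```python
import numpy as np

def crra(c, rho=2.0):
    """CRRA utility c**(1-rho)/(1-rho); assumes rho != 1 and c > 0."""
    return np.asarray(c, dtype=float) ** (1.0 - rho) / (1.0 - rho)

def simulate_cash_on_hand(x0, consume_frac, incomes, R=1.03):
    """Iterate x_{t+1} = R*(x_t - c_t) + y_{t+1}.

    Uses the hypothetical rule c_t = consume_frac * x_t (not an equilibrium
    strategy). Returns the paths of cash on hand and consumption.
    """
    x = float(x0)
    path_x, path_c = [], []
    for y_next in incomes:
        c = consume_frac * x
        path_x.append(x)
        path_c.append(c)
        x = R * (x - c) + y_next
    return np.array(path_x), np.array(path_c)

def payoff_self_t(consumption, beta=0.7, delta=0.957, rho=2.0):
    """Realized, truncated version of (3.1) along one consumption path:
    U(c_t) + beta * sum_{i>=1} delta**i * U(c_{t+i})."""
    u = crra(consumption, rho)
    i = np.arange(1, len(u))
    return float(u[0] + beta * np.sum(delta ** i * u[1:]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    incomes = rng.uniform(0.5, 1.5, size=200)  # hypothetical i.i.d. income draws
    xs, cs = simulate_cash_on_hand(1.0, 0.8, incomes)
    print("hyperbolic payoff: ", payoff_self_t(cs))
    print("exponential payoff:", payoff_self_t(cs, beta=1.0))
```

Setting `beta=1.0` recovers the exponential case nested by the model, so the two printed numbers differ only in the extra weight β < 1 placed on all future utility.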

3.3. Equilibrium

We analyze the set of perfect equilibria in stationary Markov strategies of the intrapersonal game with players (or selves) indexed by the non-negative integers. Because income is i.i.d., the only state variable is cash on hand $x_t$. We therefore restrict attention to consumption strategies C that depend only on $x_t$.

4. EXISTENCE AND UNIQUENESS

This technical discussion can be skipped by readers interested primarily in applications. Such readers may wish to move immediately to Section 5.

4.1. Technical Assumptions

We make the following technical assumptions:

U1 U has domain [0, +∞) and range [−∞, +∞)
U2 U is twice continuously differentiable on (0, +∞)
U3 $U' > 0$ on (0, +∞)
U4 there exist $0 < \underline{\rho} \le \bar{\rho} < +\infty$ such that $\underline{\rho} \le -cU''(c)/U'(c) \le \bar{\rho}$ for all c ∈ (0, +∞)
F1 f has domain (0, +∞) and range [0, +∞)
F2 f is twice continuously differentiable
F3 there exist $0 < \underline{y} < \bar{y} < +\infty$ such that f(y) = 0 for all $y \notin [\underline{y}, \bar{y}]$
D $\max\{\delta, \delta R^{1-\underline{\rho}}\} < 1$

Assumptions U1–U4 could be summarized by saying that U has bounded relative risk aversion. They are automatically satisfied if U has constant relative risk aversion. Assumptions F1–F3 could be summarized by saying that f is smooth, that the support of f is compact, and that 0 does not lie in the support of f. Assumption D ensures that the expected present discounted value of the consumer’s utility stream is always well defined. Further discussion of these assumptions can be found in Harris and Laibson (2001a).

4.2. The Bellman Equation of the Hyperbolic Consumer

The intrapersonal game of the hyperbolic consumer can be approached recursively as follows. Suppose that self t has current-value function $W_t$ and continuation value function $V_t$, and suppose that self t + 1 has consumption function $C_{t+1}$ and current-value function $W_{t+1}$. Then, it follows from the Envelope theorem that

$$U'(C_{t+1}(x_{t+1})) = W'_{t+1}(x_{t+1}). \tag{4.1}$$

Next, it follows from the definition of $W_{t+1}$ and $V_t$ that

$$\beta V_t(x_{t+1}) = W_{t+1}(x_{t+1}) - (1-\beta)\,U(C_{t+1}(x_{t+1})). \tag{4.2}$$

Finally, it follows from the definition of $W_t$ that

$$W_t(x_t) = \max_{c\in[0,x_t]} U(c) + \beta\delta\int V_t(R(x_t - c) + y)\,f(y)\,dy. \tag{4.3}$$

Hence,

$$\begin{aligned}
W_t(x_t) &= \max_{c\in[0,x_t]} U(c) + \delta\int \left(W_{t+1} - (1-\beta)\,U\circ C_{t+1}\right)(R(x_t - c) + y)\,f(y)\,dy \\
&\qquad \text{[substituting for } V_t \text{ in equation (4.3) using equation (4.2)]} \\
&= \max_{c\in[0,x_t]} U(c) + \delta\int \left(W_{t+1} - \varepsilon\,U\circ g\circ W'_{t+1}\right)(R(x_t - c) + y)\,f(y)\,dy \\
&\qquad \text{[where } \varepsilon = 1-\beta \text{ and } g = (U')^{-1}\text{]} \\
&= (BW_{t+1})(x_t), \text{ say.}
\end{aligned}$$

This is the Bellman equation of the hyperbolic consumer.

4.3. The Finite-Horizon Case: Current-Value Functions

Suppose that the intrapersonal game of the hyperbolic consumer has a finite horizon T < +∞. Then, in principle, the current-value functions can be shown to be unique by backward induction. Indeed, suppose for simplicity that the consumer has no bequest motive. Then, we expect $W_T = U$, $W_{T-1} = BW_T$, ..., $W_1 = BW_2$. In practice, we need to find a space of functions $\mathcal{W}$ such that, if $W_{t+1} \in \mathcal{W}$, then $W_t = BW_{t+1}$ is well defined and lies in $\mathcal{W}$. To this end, we make the following definition.

Definition 4.1. The function $g : (0,+\infty) \to \mathbb{R}$ is of locally bounded variation iff there exist increasing functions $g_+ : (0,+\infty) \to \mathbb{R}$ and $g_- : (0,+\infty) \to \mathbb{R}$ such that $g = g_+ - g_-$.

Now, let us say that two functions of locally bounded variation are equivalent iff they are equal at all points of continuity. Let $BV^0_{loc}((0,+\infty))$ denote the space of equivalence classes of functions of locally bounded variation, and let $BV^1_{loc}((0,+\infty))$ denote the space of equivalence classes of functions W such that both W and $W'$ are of locally bounded variation. Then, the correct choice of space for our current-value function is $\mathcal{W} = BV^1_{loc}((0,+\infty))$.

To see this, note first that, if $W_{t+1} \in BV^1_{loc}((0,+\infty))$, then $W'_{t+1}$ is a function of locally bounded variation. Hence, $W'_{t+1}$ is uniquely defined, except at a countable set of points, and $BW_{t+1}$ is uniquely defined at all points. Second, consider the operator $b_\gamma$ given by the formula

$$(b_\gamma W_{t+1})(x_t) = U(\gamma x_t) + \delta\int \left(W_{t+1} - \varepsilon\,U\circ g\circ W'_{t+1}\right)(R(1-\gamma)x_t + y)\,f(y)\,dy.$$

Then

$$BW_{t+1} = \sup_{\gamma\in[0,1]}\{b_\gamma W_{t+1}\}.$$

In other words, $BW_{t+1}$ is the upper envelope of the functions $b_\gamma W_{t+1}$. Third, note that $b_\gamma W_{t+1}$ is twice continuously differentiable. Moreover, there exists a continuous function $a : (0,+\infty) \to [0,+\infty]$ such that, for all γ ∈ [0, 1], $|b_\gamma W_{t+1}|, |(b_\gamma W_{t+1})'|, |(b_\gamma W_{t+1})''| \le a$ on (0, +∞). In particular, there exists a twice continuously differentiable convex function $\kappa : (0,+\infty) \to \mathbb{R}$ such that, for all γ ∈ [0, 1], $b_\gamma W_{t+1} + \kappa$ is convex. Hence

$$BW_{t+1} = \sup_{\gamma\in[0,1]}\{b_\gamma W_{t+1}\} = \sup_{\gamma\in[0,1]}\{b_\gamma W_{t+1} + \kappa\} - \kappa.$$

In other words, $BW_{t+1}$ is the difference of two convex functions. In light of the following result, this is exactly what we need.

Proposition 4.2. Suppose that $W : (0,+\infty) \to \mathbb{R}$. Then, $W \in BV^1_{loc}((0,+\infty))$ iff W is the difference of two convex functions.

4.4. The Finite-Horizon Case: Consumption Functions

Suppose, again, that the intrapersonal game of the hyperbolic consumer has a finite horizon T < +∞ and that the consumer has no bequest motive. Then, the consumption function of self T is unique and is given by the formula $C_T(x_T) = x_T$; and, for all 1 ≤ t ≤ T − 1, the consumption function of self t is any function such that

$$C_t(x_t) \in \operatorname*{arg\,max}_{c\in[0,x_t]} U(c) + \delta\int \left(W_{t+1} - \varepsilon\,U\circ g\circ W'_{t+1}\right)(R(x_t - c) + y)\,f(y)\,dy$$

for all $x_t \in [0,+\infty)$. Now, $C_t = g \circ W'_t$ is uniquely defined and continuous, except on a countable set of points. Because this set of points has measure zero, it is encountered with probability zero. It follows that any two consumption functions of self t are observationally equivalent. By the same token, any two equilibria are observationally equivalent. This uniqueness claim can be made precise by viewing consumption functions as elements of the space $BV^0_{loc}((0,+\infty))$.
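
The backward-induction logic of the finite-horizon case can be sketched numerically. The Python code below is a rough illustration under several assumptions that are not part of the analysis above: CRRA utility with ρ = 2, R = 1.03, a three-point income distribution standing in for the density f, a finite grid for cash on hand, and a grid search over consumption shares γ in place of an exact maximization. It tracks the continuation-value function V and applies the maximization in equation (4.3) at each step; the function and argument names are hypothetical.

```python
import numpy as np

def crra(c, rho):
    """CRRA utility c**(1-rho)/(1-rho); assumes rho != 1 and c > 0."""
    return c ** (1.0 - rho) / (1.0 - rho)

def solve_finite_horizon(T, beta=0.7, delta=0.957, R=1.03, rho=2.0,
                         incomes=(0.7, 1.0, 1.3), probs=(0.25, 0.5, 0.25),
                         x_grid=None, n_gamma=200):
    """Return (x_grid, [C_1, ..., C_T]) computed by backward induction."""
    if x_grid is None:
        x_grid = np.linspace(0.1, 10.0, 120)
    incomes = np.asarray(incomes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    gammas = np.linspace(1e-3, 1.0, n_gamma)   # candidate shares, c = gamma * x
    rows = np.arange(len(x_grid))

    V = crra(x_grid, rho)          # self T consumes everything: C_T(x) = x
    policies = [x_grid.copy()]
    for _ in range(T - 1):
        c = gammas[None, :] * x_grid[:, None]                  # (nx, ng)
        x_next = (R * (x_grid[:, None] - c))[:, :, None] \
            + incomes[None, None, :]                           # (nx, ng, ny)
        # np.interp clamps outside the grid; a crude boundary assumption.
        V_next = np.interp(x_next.ravel(), x_grid, V).reshape(x_next.shape)
        EV = (V_next * probs).sum(axis=2)                      # expected continuation value
        u = crra(c, rho)
        best = np.argmax(u + beta * delta * EV, axis=1)        # maximization in (4.3)
        C = c[rows, best]
        # The continuation value discounts exponentially: no beta here.
        V = u[rows, best] + delta * EV[rows, best]
        policies.append(C)
    return x_grid, policies[::-1]

if __name__ == "__main__":
    x, pol = solve_finite_horizon(T=10)
    print("C_1 at the largest grid point:", pol[0][-1], "of cash on hand", x[-1])
```

The current self maximizes with weight βδ on the continuation value, while the continuation value itself is updated with weight δ only; this asymmetry is the numerical counterpart of the wedge between $W_{t+1}$ and $V_t$ in equation (4.2).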

4.5. The Infinite-Horizon Case: Existence

To establish existence in the finite-horizon case, we showed that the Bellman operator B was a self-map of the space $BV^1_{loc}((0,+\infty))$. To establish existence in the infinite-horizon case, we need to strengthen this result by showing that there is a nonempty compact convex subset $\mathcal{K}$ of $BV^1_{loc}((0,+\infty))$, such that B is a self-map of $\mathcal{K}$. Define $\underline{V} : [0,+\infty) \to [-\infty,+\infty)$ by the formula

$$\underline{V}(x) = U(x) + \frac{\delta}{1-\delta}\int U(y)\,f(y)\,dy,$$

define $\bar{V} : [0,+\infty) \to [-\infty,+\infty)$ by the formula

$$\bar{V}(x) = U(x) + \sum_{t=1}^{\infty}\delta^{t}\,U\!\left(\sum_{s=0}^{t+1} R^{s}\bar{y} + R^{t}x\right),$$

and, for all Borel measurable $V \in [\underline{V}, \bar{V}]$, define $WV : [0,+\infty) \to [-\infty,+\infty)$ by the formula

$$(WV)(x) = \max_{\gamma\in[0,1]}\left\{U(\gamma x) + \beta\delta\int V(R(1-\gamma)x + y)\,f(y)\,dy\right\}.$$

Finally, put $\underline{V}^{-} = -(\underline{V} \wedge 0)$ and $\bar{V}^{+} = +(\bar{V} \vee 0)$, define $N_1 : [0,+\infty) \to [0,+\infty)$ by the formula $N_1(x) = \underline{V}^{-}(\underline{y}) \vee \bar{V}^{+}(Rx + \bar{y})$, and define $N_2 : [0,+\infty) \to [0,+\infty)$ by the formula $N_2(x) = U'(x)/x \vee N_1(x)$. Then:

Theorem 4.3 [Global Regularity]. There exists K > 0 such that, for all $V \in [\underline{V}, \bar{V}]$,
1. $(1-\beta)U + \beta\underline{V} \le WV \le (1-\beta)U + \beta\bar{V}$,
2. $U' \le (WV)' \le U' \vee (K N_1)$, and
3. $(WV)'' \ge -K N_2$
on (0, +∞).

The required set $\mathcal{K}$ is then simply the set of $W \in BV^1_{loc}((0,+\infty))$ that satisfy the three estimates in this theorem.

4.6. The Infinite-Horizon Case: Uniqueness

To establish uniqueness in the infinite-horizon case, we begin by showing that, no matter what the initial cash on hand of the consumer, there exists a finite interval from which the dynamics of wealth never exit.

Theorem 4.4 [Absorbing Interval]. Suppose that δR < 1. Then, for all $x_0 \in [0, +\infty)$, there exist $\bar{\beta}_1 \in [0, 1)$ and $\bar{X} \in [x_0, +\infty)$ such that, for all $\beta \in [\bar{\beta}_1, 1]$ and all equilibria C of the infinite-horizon model, $R(x - C(x)) + y \in [\underline{y}, \bar{X}]$ for all $x \in [0, \bar{X}]$ and all $y \in [\underline{y}, \bar{y}]$.

We are now in a position to prove uniqueness.

Theorem 4.5 [Uniqueness]. Suppose that δR < 1, and that U is three times continuously differentiable on (0, +∞). Then, for all $x_0 \in [0, +\infty)$, there exist $\bar{\beta}_2 \in [0, 1)$ and $\bar{X} \in [x_0, +\infty)$ such that, for all $\beta \in [\bar{\beta}_2, 1]$, equilibrium is unique on $[0, \bar{X}]$.

Notice that Theorem 4.5 is a local uniqueness theorem: the critical value $\bar{\beta}_2$ will in general depend on $x_0$. Local uniqueness is, however, all that we need: if initial cash on hand is $x_0$ and $\beta \in [\bar{\beta}_2, 1]$, then levels of cash on hand outside the interval $[0, \bar{X}]$ will not be observed in any equilibrium. We do not know whether Theorem 4.5 has a global analog.

Proof. See the Appendix.

4.7. The Finite-Horizon Case: Robustness

By combining our existence and uniqueness results for the finite-horizon case with our regularity results, we can show that the equilibrium of the finite-horizon model depends continuously on the parameters U, f, β, and δ. This leaves one parameter unaccounted for: T. This parameter plays an important role in empirical applications. For example, simulations of calibrated life cycle models usually proceed by truncating the life cycle at some point. It is therefore crucial to verify that the equilibrium of the chosen model is robust with respect to the horizon chosen for the model.

The simplest way to establish robustness would be to show that there is a unique equilibrium of the infinite-horizon model. If we could show this, then it would follow at once from our regularity results that this equilibrium depended continuously on T. More precisely, note that T is chosen from the space N ∪ {∞}. All the points of this space are isolated except for the point ∞, which is an accumulation point. By saying that the equilibrium depends continuously on T, we therefore mean that there is a unique equilibrium when T = ∞ and, for all η > 0, there exists a $T_0 < \infty$ such that, for all $T > T_0$, the equilibrium of the model with horizon T is within η of the equilibrium of the model with horizon ∞. In other words, the choice of horizon for the model makes very little difference to the equilibrium, provided that this horizon is sufficiently far into the future.

Unfortunately, the proof of Theorem 4.5 shows only that, if β is sufficiently close to 1, then there is a unique stationary equilibrium of the model. This leaves open two possibilities. First, there may be more than one stationary equilibrium if β is not close to 1. Second, there may be nonstationary equilibria. It may be very difficult to make progress with the first possibility: although it may be possible to identify other regions of parameter space in which there is a unique stationary equilibrium, it may not be true that there is a unique equilibrium for all choices of the parameters. After all, we are analyzing a game. It may, however, be possible to make progress with the second possibility: what is needed here is a proof that the Bellman operator is a contraction mapping. The proof of Theorem 4.5 falls short of this goal: it shows only that the Bellman operator is a contraction mapping when confined to the set of current-value functions of stationary equilibria. Nonetheless, the available evidence suggests that life cycle simulations are probably robust to the choice of horizon provided that β is sufficiently close to 1.

5. GENERALIZED EULER EQUATION

In this section, we discuss the hyperbolic analog of the standard Euler Relation.[11]

5.1. Heuristic Derivation of the Hyperbolic Euler Relation

Suppose that C is an equilibrium consumption function. Adopt the perspective of self t. Because all future selves use the consumption function C, and because self t uses the same discount factor δ from period t + 1 onward, her continuation-value function V solves the recursive equation

$$V(x_{t+1}) = U(C(x_{t+1})) + E_{t+1}\big[\delta V(R(x_{t+1} - C(x_{t+1})) + y_{t+2})\big]. \tag{5.1}$$

Note that $V(x_{t+1})$ is the expectation, conditional on $x_{t+1}$, of the present discounted value of the utility stream that starts in period t + 1. Self t uses discount factor βδ at time t. Her current-value function W therefore solves the equation

$$W(x_t) = U(C(x_t)) + E_t\big[\beta\delta V(R(x_t - C(x_t)) + y_{t+1})\big]. \tag{5.2}$$

Moreover,

$$C(x_t) \in \operatorname*{argmax}_{c \in [0, x_t]} \; U(c) + E_t\big[\beta\delta V(R(x_t - c) + y_{t+1})\big], \tag{5.3}$$

because consumption is chosen by the current self. The first-order condition associated with (5.3) implies that

$$U'(C(x_t)) \ge E_t\big[R\beta\delta V'(R(x_t - C(x_t)) + y_{t+1})\big], \tag{5.4}$$

with equality if $C(x_t) < x_t$. The first-order condition and envelope theorem together imply that the shadow value of cash on hand equals the marginal utility of consumption:

$$W'(x_t) = U'(C(x_t)). \tag{5.5}$$

[11] The material from this section was first published in Harris and Laibson (2001).

Finally, V and W are linked by the equation

$$\beta V(x_{t+1}) = W(x_{t+1}) - (1 - \beta) U(C(x_{t+1})). \tag{5.6}$$

These expressions can be combined to yield the Strong Hyperbolic Euler Relation. Indeed, we have

$$\begin{aligned}
U'(C(x_t)) &\ge E_t\big[R\beta\delta V'(R(x_t - C(x_t)) + y_{t+1})\big] && \text{[this is just the first-order condition (5.4)]} \\
&= E_t\big[R\delta\big(W'(x_{t+1}) - (1 - \beta) U'(C(x_{t+1})) C'(x_{t+1})\big)\big] && \text{[differentiating equation (5.6) with respect to } x_{t+1} \text{ and substituting in]} \\
&= E_t\big[R\delta\big(U'(C(x_{t+1})) - (1 - \beta) U'(C(x_{t+1})) C'(x_{t+1})\big)\big] && \text{[from the analog of equation (5.5) for self } t + 1\text{]}.
\end{aligned}$$

Rearranging yields

$$U'(C(x_t)) \ge E_t\big[R\big(C'(x_{t+1})\beta\delta + (1 - C'(x_{t+1}))\delta\big) U'(C(x_{t+1}))\big], \tag{5.7}$$

with equality if $C(x_t) < x_t$. This is the Hyperbolic Euler Relation. When β = 1, this relation reduces to the well-known Exponential Euler Relation

$$U'(C(x_t)) \ge E_t\big[R\delta U'(C(x_{t+1}))\big].$$

Intuitively, the marginal utility of consuming an additional dollar today, $U'(C_t)$, must equal the marginal utility of saving that dollar. A saved dollar grows to R dollars by next year. Utilities next period are discounted with factor δ. Hence, the value of today's marginal savings is given by $E_t[R\delta U'(C_{t+1})]$. The expectation operator integrates over uncertain future consumption.

The difference between the Hyperbolic Euler Relation and the Exponential Euler Relation is that, in the former, the constant exponential discount factor, δ, is replaced by the effective discount factor, namely

$$C'(x_{t+1})\beta\delta + (1 - C'(x_{t+1}))\delta.$$

This effective discount factor is a weighted average of the short-run discount factor βδ and the long-run discount factor δ. The respective weights are $C'(x_{t+1})$, the marginal propensity to consume out of liquid wealth, and $(1 - C'(x_{t+1}))$, the marginal propensity to save out of liquid wealth. Because β < 1, the effective discount factor is stochastic and endogenous to the model.

In the sophisticated hyperbolic model, the effective discount factor is negatively related to the future marginal propensity to consume (MPC). To gain intuition for this effect, consider a consumer at time 0 who is thinking about saving a marginal dollar for the future. The consumer at time zero – "self 0" – expects future selves to overconsume relative to the consumption rate that self 0 prefers those future selves to implement. Hence, on the equilibrium path, self 0 values marginal saving more than marginal consumption at any future time period. From self 0's perspective, therefore, it matters how a marginal unit of wealth at time period 1 will be divided between savings and consumption by self 1. Self 1's MPC determines this division. Because self 0 values marginal saving more than marginal consumption at time period 1, self 0 values the future less the higher the expected MPC at time period 1.

The effective discount factor in the Hyperbolic Euler Relation varies significantly with cash on hand. Consumers who expect to have low levels of future cash on hand will expect $C'(x_{t+1})$ to be close to one,[12] implying that the effective discount factor will approximately equal βδ. Assuming that periods are annual with a standard calibration of β = 0.7 and δ = 0.95, the effective discount rate would be −ln(0.7 × 0.95) = 0.41. By contrast, consumers with high levels of future cash on hand will expect $C'(x_{t+1})$ to be close to zero,[13] implying that the effective discount factor will approximately equal δ. In this case, the effective discount rate will be −ln(0.95) = 0.05. The simulations reported below confirm these claims about the shape of C.
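
The arithmetic behind the effective discount factor can be reproduced with a short sketch. The function names are illustrative and do not come from the source, and the marginal propensity to consume is passed in as a plain number rather than computed from an equilibrium consumption function.

```python
import math


def effective_discount_factor(mpc: float, beta: float, delta: float) -> float:
    """Weighted average of the short-run factor beta*delta and the long-run factor delta.

    `mpc` stands in for C'(x_{t+1}); it is supplied by the caller here,
    not derived from an equilibrium consumption function.
    """
    return mpc * beta * delta + (1.0 - mpc) * delta


def effective_discount_rate(mpc: float, beta: float, delta: float) -> float:
    """Continuously compounded rate implied by the effective discount factor."""
    return -math.log(effective_discount_factor(mpc, beta, delta))


if __name__ == "__main__":
    beta, delta = 0.7, 0.95  # annual calibration quoted in the text
    for mpc in (0.0, 0.25, 0.5, 0.75, 1.0):
        factor = effective_discount_factor(mpc, beta, delta)
        rate = effective_discount_rate(mpc, beta, delta)
        print(f"MPC={mpc:.2f}  factor={factor:.4f}  rate={rate:.3f}")
```

With an MPC of one the rate rounds to the 0.41 quoted above, and with an MPC of zero it rounds to 0.05; intermediate MPC values interpolate between the two factors.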

5.2. Exact Derivation

If the consumption function is discontinuous, then the heuristic derivation of the Hyperbolic Euler Relation is not valid. However, the consumption function is always of locally bounded variation. This property can be used to derive a weaker version of the Hyperbolic Euler Relation. This weaker version reduces to the Hyperbolic Euler Relation if the consumption function is Lipschitz-continuous. Moreover, it can be shown that the consumption function is indeed Lipschitz-continuous when β is sufficiently close to 1 (Harris and Laibson 2001a).

6. NUMERICAL SOLUTION AND CALIBRATION OF THE MODEL

We complement our theoretical analysis with numerical simulations. Numerical results help to build intuition and provide quantitative assessment of qualitative effects. In this section, we describe our strategy for simulating the one-asset, infinite-horizon model. The same broad strategy applies to the institutionally richer simulations that we describe in Section 8. We calibrate our stripped-down model with the same parameter values used by ALRTW (2001a). Specifically, ρ = 2, β = 0.7, δ = 0.9571, and R = 1.0375.[14]

[12] Low levels of cash on hand imply that the agent is liquidity-constrained. Hence, low levels of cash on hand imply a high MPC.
[13] When the agent is not liquidity-constrained, marginal consumption is approximately equal to the annuity value of marginal increments of wealth. Hence, the local slope of the consumption function is close to the real interest rate.
[14] ALRTW choose all of these parameters ex ante, except δ. Then, δ is chosen so that the simulated data match the empirical median wealth to income ratio of 50- to 59-year-old household heads. ALRTW also use this method to infer the preferences of exponential consumers (β = 1). They find that $\delta_{\text{exponential}} = .9437$.
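
As a quick consistency check on this calibration, the sketch below collects the benchmark parameters and verifies that they satisfy the condition δR < 1 assumed in Theorems 4.4 and 4.5. The class and method names are hypothetical conveniences, not part of the source.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Calibration:
    """Benchmark parameters quoted in the text (the class itself is illustrative)."""
    rho: float = 2.0       # coefficient of relative risk aversion
    beta: float = 0.7      # short-run discount factor
    delta: float = 0.9571  # long-run discount factor
    R: float = 1.0375      # gross real interest rate

    def delta_times_R(self) -> float:
        """Product delta * R, which the uniqueness theorems require to be below one."""
        return self.delta * self.R

    def long_run_rate(self) -> float:
        """Long-run discount rate, -ln(delta)."""
        return -math.log(self.delta)

    def short_run_rate(self) -> float:
        """Short-run discount rate, -ln(beta * delta)."""
        return -math.log(self.beta * self.delta)


if __name__ == "__main__":
    hyperbolic = Calibration()
    exponential = Calibration(beta=1.0, delta=0.9437)  # exponential delta from footnote 14
    for name, cal in (("hyperbolic", hyperbolic), ("exponential", exponential)):
        print(name, f"delta*R={cal.delta_times_R():.5f}", cal.delta_times_R() < 1.0)
```

Both the hyperbolic and the exponential calibrations keep δR below one, so the absorbing-interval hypothesis is met at the benchmark values.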

Figure 7.2. Calibrated consumption function. (Plot of the consumption function C(x) and the 45-degree line against cash on hand x, for x from 0 to 10.)
The consumption function is based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375, a = 5.

To capture labor-income uncertainty, we adopt a shifted symmetric Beta density with support [ε, 1 + ε]:

$$f(Y) \propto (Y - \varepsilon)^{a-1} (1 + \varepsilon - Y)^{a-1},$$

where a > 0 and ε is positive, but close to zero. Hence, Y has mean 1/2 + ε. If a > 1, the density is bell-shaped and continuous on R. Moreover, if a > 3, then the density is twice continuously differentiable, and therefore satisfies the regularity conditions of Section 4. We set a = 5, implying that σ(Y)/Y = 0.30. This value is comparable with the value of σ(Y)/Y implied by standard income processes estimated from the Panel Study of Income Dynamics. For example, ALRTW estimate a process for ln Y that has two components: an AR(1) process and iid noise.[15] Their empirically estimated process implies σ(Y)/Y = 0.32.[16]

Figure 7.2 reports the equilibrium consumption function generated by our infinite-horizon, one-asset simulation.[17] The function is continuous, monotonic, and concave. It appears smooth, except for the point at which the liquidity constraint begins to bind. In the next section, we identify cases in which these regularity properties cease to hold.

[15] Specifically, $\ln Y_t$ = [household fixed effects] + [polynomial in age] + $u_t + \eta_t$, where $u_t = \alpha u_{t-1} + \varepsilon_t$, and $\eta_t$ and $\varepsilon_t$ are white noise.
[16] This is an unconditional normalized standard deviation. The empirical conditional normalized standard deviation, $\sqrt{E_{t-1}(Y_t - \bar{Y}_t)^2}/Y$, is .23.
[17] To simulate our model numerically, we adopt a numerical solution algorithm that does not interpolate between points in the state space. Specifically, our algorithm discretizes the state space and forces the consumer to make choices that keep the state variables on the discrete partition. We believe that our algorithm successfully approximates the behavior that would arise in a continuous state space. Most importantly, we find that once our partition is made sufficiently fine, further refinement has no effect on our simulation results.

7. PROPERTIES OF THE CONSUMPTION FUNCTION

The consumption function in Figure 7.2 is continuous, monotonic, and concave. However, hyperbolic consumption functions need not have these desirable properties (Laibson, 1997b, Morris and Postlewaite, 1997, O'Donoghue and Rabin, 1999a, Harris and Laibson, 2001a, and Krusell and Smith, 2000). In this section we characterize the general properties of the hyperbolic consumption function. We first discuss the kinds of pathologies that can arise. We then discuss the regularity conditions that eliminate these pathologies.

7.1. Pathologies: Violations of Continuity, Monotonicity, and Concavity

To develop intuition for the existence of hyperbolic pathologies, we consider a finite-horizon version of the model of Section 3.[18] We assume that the stream of income is deterministic. We apply backward induction arguments to solve for the equilibrium policies. First, consider the strategy of self T. Trivially, self T sets $c_T = x_T$: self T consumes all available cash on hand.

Now, consider the problem of self T − 1. Self T − 1 knows that any resources left to self T will be consumed by self T. So, self T − 1 chooses $c_{T-1}$ to maximize

$$U(c_{T-1}) + \beta\delta U(c_T)$$

subject to the constraints

$$x_T = R(x_{T-1} - c_{T-1}) + y_T, \qquad c_{T-1} \le x_{T-1}, \qquad c_T = x_T.$$

The first constraint is the dynamic budget constraint. The second constraint is the liquidity constraint. The third constraint reflects the equilibrium strategy of self T. Given this problem, it is straightforward to show that, when the liquidity constraint does not bind, self T − 1 picks $c_{T-1}$ such that

$$U'(c_{T-1}) = \beta\delta R\, U'\big(R(x_{T-1} - c_{T-1}) + y_T\big).$$

When the liquidity constraint binds, self T − 1 sets $c_{T-1} = x_{T-1}$. Represent self T − 1's equilibrium policy function as $C_{T-1}(x_{T-1})$.

[18] See Laibson (1997b) for the original version of this example.
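
The policy of self T − 1 has a closed form when utility is isoelastic. The sketch below assumes $U'(c) = c^{-\rho}$ and uses the benchmark parameter values with $y_T = 1$ as defaults; the function names are illustrative. Solving the first-order condition for $c_{T-1}$ gives $c = k(Rx + y_T)/(1 + kR)$ with $k = (\beta\delta R)^{-1/\rho}$, and the liquidity constraint binds whenever $x \le k\, y_T$.

```python
def marginal_utility(c: float, rho: float) -> float:
    """Isoelastic marginal utility U'(c) = c**(-rho); an assumption of this sketch."""
    return c ** (-rho)


def threshold_T_minus_1(y_T: float = 1.0, beta: float = 0.7, delta: float = 0.9571,
                        R: float = 1.0375, rho: float = 2.0) -> float:
    """Cash on hand at which the liquidity constraint of self T-1 stops binding."""
    k = (beta * delta * R) ** (-1.0 / rho)
    return k * y_T


def policy_T_minus_1(x: float, y_T: float = 1.0, beta: float = 0.7, delta: float = 0.9571,
                     R: float = 1.0375, rho: float = 2.0) -> float:
    """Consumption of self T-1: the first-order condition solution, capped at x."""
    k = (beta * delta * R) ** (-1.0 / rho)
    unconstrained = k * (R * x + y_T) / (1.0 + k * R)
    return min(x, unconstrained)


if __name__ == "__main__":
    print("threshold:", round(threshold_T_minus_1(), 4))
    for x in (0.5, 1.0, 1.5, 2.0, 3.0):
        print(x, round(policy_T_minus_1(x), 4))
```

Below the threshold the function returns x itself, and above it the returned consumption satisfies the first-order condition stated above.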

Now, consider the problem of self T − 2. Self T − 2 chooses $c_{T-2}$ to maximize

$$U(c_{T-2}) + \beta\delta U(c_{T-1}) + \beta\delta^2 U(c_T),$$

subject to the constraints

$$x_{T-1} = R(x_{T-2} - c_{T-2}) + y_{T-1}, \qquad c_{T-2} \le x_{T-2}, \qquad c_{T-1} = C_{T-1}(x_{T-1}), \qquad c_T = x_T.$$

The first constraint is the dynamic budget constraint. The second constraint is the liquidity constraint. The third and fourth constraints represent the strategies of selves T − 1 and T.

To develop intuition for the optimal policy of self T − 2, consider the continuation-value function of self T − 2,

$$V_{T-1}(x_{T-1}) = U(C_{T-1}(x_{T-1})) + \delta U\big(R(x_{T-1} - C_{T-1}(x_{T-1})) + y_T\big).$$

From self T − 2's perspective, wealth at time T − 1 has a value $\beta\delta V_{T-1}(x_{T-1})$. There exists a threshold wealth level $x_{T-1} = \hat{x}$ at which the liquidity constraint for self T − 1 ceases to bind. In the region to the left of $\hat{x}$, all marginal wealth is consumed in period T − 1, implying

$$V'_{T-1}(\hat{x}-) = U'(C_{T-1}(\hat{x})).$$

In the region to the right of $\hat{x}$, some marginal wealth is passed on to period T, implying

$$V'_{T-1}(\hat{x}+) = C'_{T-1}(\hat{x}) \cdot U'(C_{T-1}(\hat{x})) + \delta R\,(1 - C'_{T-1}(\hat{x}))\, U'(C_T(\hat{x})),$$

where $C_T(\hat{x})$ is shorthand for period-T consumption on the path that starts from $x_{T-1} = \hat{x}$. Note that at $x_{T-1} = \hat{x}$, self T − 1 is indifferent between marginal consumption in period T − 1 and marginal consumption in period T. So,

$$U'(C_{T-1}(\hat{x})) = R\beta\delta\, U'(C_T(\hat{x})).$$

Substituting this relationship into the previous expression yields

$$V'_{T-1}(\hat{x}+) = \left[ C'_{T-1}(\hat{x}) + \frac{1}{\beta}\,(1 - C'_{T-1}(\hat{x})) \right] U'(C_{T-1}(\hat{x})) > U'(C_{T-1}(\hat{x})) = V'_{T-1}(\hat{x}-).$$

Hence the continuation-value function $V_{T-1}$ has a kink at $x_{T-1} = \hat{x}$. At this point, the slope of the value function discretely rises. This kink implies that the equilibrium consumption function of self T − 2 will have a downward discontinuity. To understand why, note that self T − 2 will never select a value of $c_{T-2} > 0$ such that $R(x_{T-2} - c_{T-2}) + y_{T-1} = x_{T-1} = \hat{x}$. If $x_{T-1} = \hat{x}$ did hold, self T − 2 could raise her welfare by either cutting or raising consumption. If $U'(c_{T-2}) < \beta\delta R V'_{T-1}(\hat{x}+)$, self T − 2 could increase welfare by cutting consumption – with marginal cost $U'(c_{T-2})$ – and raising saving – with marginal benefit $\beta\delta R V'_{T-1}(\hat{x}+)$. If $U'(c_{T-2}) \ge \beta\delta R V'_{T-1}(\hat{x}+)$, self T − 2 could increase welfare by raising consumption – with marginal benefit $U'(c_{T-2})$ – and lowering saving – with marginal cost $\beta\delta R V'_{T-1}(\hat{x}-) < \beta\delta R V'_{T-1}(\hat{x}+) \le U'(c_{T-2})$.

Self T − 2 makes equilibrium choices that avoid the region of low continuation marginal utilities – in the neighborhood to the left of $x_{T-1} = \hat{x}$ – by jumping to the region of high continuation marginal utilities to the right of $x_{T-1} = \hat{x}$. This avoidance can be achieved only with an equilibrium consumption function that has a discrete downward discontinuity. Figure 7.3 plots the equilibrium consumption functions for selves T − 2, T − 1, and T for the case in which the instantaneous utility function is isoelastic and $y_t = 1$ for all t.

Intuitively, the pathology described here arises because of a special kind of strategic interaction. Self T − 2's consumption function discontinuously declines because self T − 2 has an incentive to push self T − 1 over the wealth threshold $\hat{x}$ at which self T − 1 has a kink in its consumption function. Self T − 2 is willing to discretely cut its own consumption to push T − 1 over the $\hat{x}$ threshold, because the marginal returns to the right of $\hat{x}$ are greater than the marginal returns to the left of $\hat{x}$ from self T − 2's perspective. If this example were extended another period, we could also demonstrate that the optimal choices of self T − 3 will violate the Hyperbolic Euler Equation. Finally, all of these pathologies would continue to arise, even if a small amount of smooth noise were added to the income process.
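
The kink in the continuation-value function can be checked numerically. The sketch below again assumes isoelastic utility with ρ ≠ 1, the benchmark parameters, and $y_T = 1$; all function names are illustrative. It computes the left and right slopes of $V_{T-1}$ at $\hat{x}$ from the formulas above and compares them with one-sided finite differences of $V_{T-1}$ itself.

```python
def utility(c: float, rho: float) -> float:
    """Isoelastic utility for rho != 1 (an assumption of this sketch)."""
    return c ** (1.0 - rho) / (1.0 - rho)


def marginal_utility(c: float, rho: float) -> float:
    return c ** (-rho)


def policy_T_minus_1(x, y_T, beta, delta, R, rho):
    """Closed-form consumption of self T-1, capped by the liquidity constraint."""
    k = (beta * delta * R) ** (-1.0 / rho)
    return min(x, k * (R * x + y_T) / (1.0 + k * R))


def value_T_minus_1(x, y_T, beta, delta, R, rho):
    """Continuation value V_{T-1}(x) = U(c) + delta * U(R(x - c) + y_T)."""
    c = policy_T_minus_1(x, y_T, beta, delta, R, rho)
    return utility(c, rho) + delta * utility(R * (x - c) + y_T, rho)


def kink_slopes(y_T=1.0, beta=0.7, delta=0.9571, R=1.0375, rho=2.0):
    """Return (x_hat, left slope, right slope) of V_{T-1} at the threshold."""
    k = (beta * delta * R) ** (-1.0 / rho)
    x_hat = k * y_T
    mpc_right = k * R / (1.0 + k * R)      # slope of C_{T-1} to the right of x_hat
    left = marginal_utility(x_hat, rho)    # all marginal wealth is consumed
    right = (mpc_right + (1.0 - mpc_right) / beta) * left
    return x_hat, left, right


if __name__ == "__main__":
    x_hat, left, right = kink_slopes()
    print(f"x_hat={x_hat:.4f}  V'(x_hat-)={left:.4f}  V'(x_hat+)={right:.4f}")
```

Because β < 1, the bracketed weight exceeds one, so the right slope is larger than the left slope for any MPC strictly below one; this is the upward jump in marginal value that self T − 2 exploits.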

7.2. Sufficient Conditions for Continuity, Monotonicity, and Concavity of the Consumption Function

The previous subsection provides an example of the kinds of pathologies that can arise in hyperbolic models. However, these pathologies do not arise when the model is calibrated with empirically sensible parameter values (see Figure 7.2). In this section, we identify the parameter regions that generate the pathologies.

Figure 7.3. Consumption functions in periods T − 2, T − 1, and T. (Three panels: consumption in period T − 2, consumption in period T − 1, and consumption in period T, each plotted against cash on hand from 0 to 3.)
The consumption functions are based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375.

First, when β is close to one, the discontinuities and nonmonotonicities vanish. Harris and Laibson (2001a) prove this claim formally. Intuitively, when β is close to one, the hyperbolic consumption function converges to the exponential consumption function, which is continuous and monotonic.[19] Likewise, when β is close to one, the hyperbolic consumption function matches the concavity of the exponential consumption function. Carroll and Kimball (1996) provide sufficient conditions for exponential concavity (U in the HARA class), although they do not handle the case of binding liquidity constraints.

Figure 7.4 graphically demonstrates the comparative static on β. We plot the consumption functions generated by β values {0.1, 0.2, 0.3, . . . , 0.7}.[20] The consumption functions are vertically shifted so they do not overlap. Recall that β = 0.7 corresponds to our benchmark calibration. As β falls below 0.4, the consumption function becomes increasingly irregular. However, regularity returns as β falls to zero: in a neighborhood of β = 0, the consumption function coincides with the 45-degree line.

Figure 7.4. Variation in β. (Vertically shifted consumption functions C(x) for β = .1, .2, .3, .4, .5, .6, .7, plotted against cash on hand x from 0 to 10.)
The consumption functions are based on simulations in which δ = .9571, ρ = 2, R = 1.0375, a = 5.

Pathologies are also controlled by the curvature of the utility function. Our simulation results imply that increasing ρ eliminates irregularities. Figure 7.5 graphically demonstrates the comparative static on ρ. We plot the consumption functions generated by ρ values {0.5, 0.75, 1, 1.25}.[21] The consumption functions are again vertically shifted. Recall that ρ = 2 corresponds to our benchmark calibration. As ρ falls below 1.25, the consumption function becomes increasingly irregular. The irregularities increase as ρ falls, because low curvature augments the feedback effects that engender the irregularities. Specifically, when the utility function is relatively less bowed, it is relatively less costly to strategically cut consumption today to push future selves over critical wealth thresholds.

Figure 7.5. Variation in the coefficient of relative risk aversion (ρ). (Vertically shifted consumption functions C(x) for ρ = 0.50, 0.75, 1, 1.25, plotted against cash on hand x from 0 to 10.)
The consumption functions are based on simulations in which β = .7, δ = .9571, R = 1.0375, a = 5.

Finally, decreasing the variance of the income process increases the degree of irregularity. Figure 7.6 graphically demonstrates the comparative static on a. We plot the consumption functions generated by a values {25, 50, 100, 200, 400}.[22] These a values correspond to σ(Y)/Y values of {0.14, 0.10, 0.07, 0.05, 0.04}. Recall that a = 5 (i.e., σ(Y)/Y = 0.30) corresponds to our benchmark calibration. As a rises above 25 (i.e., σ(Y)/Y falls below 0.14), the consumption function becomes increasingly irregular. The irregularities increase as a increases, because high a values correspond to low levels of income volatility. Low volatility makes it easier for early selves to predict future wealth levels, and to strategically push later selves over critical wealth thresholds.

Figure 7.6. Variation in income uncertainty (a). (Vertically shifted consumption functions C(x) for a = 25, 50, 100, 200, 400, plotted against cash on hand x from 0 to 10.)
The consumption functions are based on simulations in which β = .7, δ = .9571, ρ = 2, R = 1.0375.

In summary, irregularities vanish when β is close to one, risk aversion is high, and uncertainty is high. At the benchmark calibration, the pathologies do not arise. Moreover, our model omits some sources of uncertainty that would only reinforce the regularity of our benchmark consumption functions. For example, our model omits shocks to preferences and asset return uncertainty.[23]

[19] All of the convergence results apply to an absorbing interval of x values. See Section 4 for a definition and discussion of such absorbing intervals.
[20] We adopt the baseline parameter values a = 5, δ = .9571, ρ = 2, R = 1.0375.
[21] We adopt the baseline parameter values a = 5, β = .7, δ = .9571, R = 1.0375.
[22] We adopt the baseline parameter values β = .7, δ = .9571, ρ = 2, R = 1.0375.

8. CONSUMPTION APPLICATIONS

A series of papers have analyzed the positive and normative implications of the hyperbolic buffer-stock model: Laibson, Repetto, and Tobacman (1998, 2000) [hereafter LRT] and ALRTW (2001a, 2001b). These papers extend the precautionary saving models pioneered by Zeldes (1989b), Deaton (1991), and Carroll (1992, 1997).[24] We will focus our discussion on the work of LRT (2000) and ALRTW (2001a). The ALRTW model incorporates most of the features of previous life cycle simulation models and adds new features, including credit cards, time-varying household size, and illiquid assets. We summarize the key features of the ALRTW model herein. A more general version of the model, and a complete description of the calibration, appear in LRT.[25]

8.1. Model Summary

Households are divided into three levels of educational attainment. We discuss simulation results only for the largest group, households whose head has only a high school degree (roughly half of U.S. households). The simulations have been replicated for households in other educational categories, and the conclusions are quantitatively similar (see LRT). Households face a time-varying, exogenous hazard rate of survival. Households live for a maximum of 90 periods, beginning economic life at age 20 and retiring at age 63. The retirement age is calibrated to match reported retirement ages from the PSID. Household composition – number of adults and nonadults – varies exogenously over the life cycle (also calibrated to match the PSID). Log income, $\ln Y_{it}$, is modeled as the sum of a polynomial in age and two stochastic components: an autocorrelated component and an iid component. Different processes are estimated during the working life and during retirement (using the PSID).

[23] Asset return uncertainty has an advantage over labor-income uncertainty, because the volatility generated by noisy returns scales up with the level of wealth. With sufficient asset uncertainty, it should be possible to establish that regularity applies to the entire domain of cash on hand, instead of just an absorbing interval.
[24] See, also, Engen, Gale, and Scholz (1994), Hubbard, Skinner, and Zeldes (1994, 1995), and Gourinchas and Parker (1999).
[25] This more general model allows consumers to declare bankruptcy and allows the consumer to borrow against illiquid collateral (e.g., mortgages on housing).

Households may hold liquid assets, $X_t$, and illiquid assets, $Z_t$. Because labor income is liquid wealth, $X_t + Y_t$ represents total liquid asset holdings at the beginning of period t. Credit card borrowing is modeled as a negative value for $X_t$. Credit card borrowing must not exceed a credit limit equal to some fraction of current (average) income. Specifically, $X_t \ge -\lambda \cdot \bar{Y}_t$, where $\bar{Y}_t$ is cohort average income at age t, and λ = 0.30 (calibrated from the 1995 SCF). The real after-tax interest rate on liquid assets is 3.75 percent. The real interest rate on credit card loans is 11.75 percent, two percentage points below the mean debt-weighted real interest rate reported by the Federal Reserve Board. This low value is chosen to capture implicitly the effect of bankruptcy. Actual annual bankruptcy rates of roughly 1 percent per year imply that the effective interest rate is at least one percentage point below the observed interest rate. The illiquid asset generates consumption flows equal to 5 percent of the value of the asset ($Z_t \ge 0$). Hence, the holding return on illiquid assets is considerably higher than the return on other assets. However, the illiquid asset can be sold only with a transaction cost.

Households have isoelastic preferences with a coefficient of relative risk aversion of ρ = 2. Self t has instantaneous payoff function

$$u(C_t, Z_t, n_t) = n_t \cdot \frac{\left( \dfrac{C_t + \gamma Z_t}{n_t} \right)^{1-\rho} - 1}{1 - \rho}.$$

Note that $\gamma Z_t$ represents the consumption flow generated by $Z_t$ (γ = 0.05), and $n_t$ is the effective household size, $n_t$ = [no. of adults$_t$] + 0.4 [no. of children$_t$]. Households have either an exponential discount function ($\delta^t_{\text{exponential}}$) or a quasi-hyperbolic discount function ($\beta\delta^t_{\text{hyperbolic}}$, with β = 0.7). ALRTW assume that the economy is populated either exclusively by exponential households or exclusively by hyperbolic households.
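
The payoff function and the two discount functions can be written out directly. This is a minimal sketch with illustrative function names; it covers only the preference primitives stated above, not the ALRTW simulation itself, and the logarithmic branch for ρ = 1 is the standard limiting case rather than something the source uses.

```python
import math


def effective_household_size(adults: float, children: float) -> float:
    """n_t = adults + 0.4 * children, as stated in the text."""
    return adults + 0.4 * children


def payoff(C: float, Z: float, n: float, rho: float = 2.0, gamma: float = 0.05) -> float:
    """Instantaneous payoff n * (((C + gamma*Z)/n)**(1-rho) - 1) / (1-rho)."""
    flow = (C + gamma * Z) / n  # per-capita consumption flow, including the illiquid asset
    if rho == 1.0:
        return n * math.log(flow)  # limiting case, added as an assumption
    return n * (flow ** (1.0 - rho) - 1.0) / (1.0 - rho)


def discount_weights(horizon: int, delta: float, beta: float = 1.0) -> list:
    """Weights on payoffs 0..horizon periods ahead.

    beta = 1 gives the exponential weights delta**tau; beta < 1 gives the
    quasi-hyperbolic weights 1, beta*delta, beta*delta**2, ...
    """
    return [1.0 if tau == 0 else beta * delta ** tau for tau in range(horizon + 1)]


if __name__ == "__main__":
    print(payoff(C=2.0, Z=10.0, n=effective_household_size(2, 2)))
    print(discount_weights(3, delta=0.9437))            # exponential
    print(discount_weights(3, delta=0.9571, beta=0.7))  # quasi-hyperbolic
```

The quasi-hyperbolic weights drop sharply between the current period and the next one and then decline at the same proportional rate as exponential weights with the same δ.
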
ALRTW pick $\delta_{\text{exponential}}$ and $\delta_{\text{hyperbolic}}$ to match empirical levels of retirement saving. Specifically, $\delta_{\text{exponential}}$ is picked so that the exponential simulations generate a median wealth to income ratio of 3.2, for individuals between ages 50 and 59. The median of 3.2 is calibrated from the SCF.[26] The hyperbolic discount factor, $\delta_{\text{hyperbolic}}$, is also picked to match the empirical median of 3.2.[27] The discount factors that replicate the SCF wealth to income ratio are .9437 for the exponential model and .9571 for the hyperbolic model. Because hyperbolic consumers have two sources of discounting – β and δ – the hyperbolic δs lie above the exponential δs. Recall that the hyperbolic and exponential discount functions are calibrated to generate the same amount of preretirement wealth accumulation. In this manner, the calibrations "equalize" the underlying willingness to save between the exponential and hyperbolic consumers. The calibrated long-term discount factors are sensible when compared with discount factors that have been used in similar exercises by other authors. Finally, note that these discount factors do not include mortality effects, which reduce the respective discount factors by an additional 1 percent on average per year.

[26] Wealth does not include social security wealth and other defined benefit pensions, which are already built into the model in the form of postretirement "labor income."
[27] For calibration purposes, total wealth is measured as X + Z + (Y/24), where X represents liquid assets (excluding current labor income), Z represents illiquid assets, and Y represents annual after-tax labor income. The Y/24 is included to reflect average cash inventories used for (continuous) consumption out of labor income. If labor income is paid in equal monthly installments, Y/12, and consumption is smoothly spread over time, then average cash inventories will be Y/24.

Figure 7.7. Simulated mean consumption profiles of hyperbolic and exponential households. (Mean consumption by age, in units of 10^4, for hyperbolic and exponential households, ages 20 to 90.)
Source: Angeletos et al 2001.

8.2. Simulation Results of ALRTW

Calibrated hyperbolic simulations – β = 0.7, δ = 0.957 – generate life cycle consumption profiles that closely match the life cycle consumption profiles generated by calibrated exponential simulations – β = 1, δ = 0.944. For example, Figure 7.7 compares hyperbolic and exponential consumption means over the life cycle. These two hump-shaped profiles are very similar.[28] The only differences arise around retirement and at the very beginning and end of life. At the beginning of life, hyperbolic consumers go on a credit card–financed spending spree,[29] leading to higher consumption than the exponentials. Around retirement, hyperbolic consumption falls more steeply than exponential consumption, because hyperbolic households have most of their wealth in illiquid assets, which they cannot cost-effectively sell to smooth consumption. At the end of life, hyperbolic consumers have more illiquid assets to sell, slowing down the late-life collapse in consumption.

[28] The consumption profiles roughly track the mean labor-income profile. This low-frequency comovement is driven by two factors. First, low income early in life holds down consumption, because consumers do not have large credit lines. Second, consumption needs peak in midlife, when the number of adult-equivalent dependents peaks at age 47.
[29] See Gourinchas and Parker (1999) for empirical evidence on the early life consumption boom.

The total wealth profiles of hyperbolics and exponentials are also similar. This correspondence is not surprising, because the hyperbolic and exponential simulations are each calibrated to match the observed level of retirement wealth accumulation in the SCF. However, the two models generate very different simulated allocations across liquid and illiquid assets. Just before retirement (age 63), the average liquid asset holding of the simulated hyperbolic households is only about $10,000, whereas the exponential households have accumulated more than $45,000 in liquid wealth (1990 dollars).[30] Hyperbolics end up holding relatively little liquid wealth, because liquidity tends to be splurged to satisfy the hyperbolic taste for instant gratification. Both naive and sophisticated hyperbolics will quickly spend whatever liquidity is at their disposal.

By contrast, hyperbolics hold much more illiquid wealth than their exponential counterparts. Just before retirement, the average illiquid asset holding of the simulated hyperbolics is $175,000, compared with $130,000 for the exponentials. Hyperbolics are more willing to hold illiquid wealth for two reasons. First, sophisticated hyperbolics (like the hyperbolics in these simulations) view illiquid assets as a commitment device, which they value because it prevents later selves from splurging saved wealth too quickly. Second, illiquid assets are particularly valuable to hyperbolics (both naifs and sophisticates), because hyperbolics have lower long-run discount rates than exponentials.
Hence, hyperbolics place relatively greater value on the long-run stream of payoffs associated with illiquid assets.³¹ Hyperbolics and exponentials dislike illiquidity for the standard reason that illiquid assets cannot be used to buffer income shocks. But, this cost of illiquidity is partially offset for hyperbolics for the two reasons described: hyperbolics value commitment and hyperbolics more highly value the long-run dividends of illiquid assets. Hence, on net, illiquidity is less costly for a hyperbolic than for an exponential consumer.

To evaluate empirically the asset allocation predictions of the hyperbolic and exponential models, ALRTW compare the simulated results to survey evidence from the SCF. For example, ALRTW analyze the percentage of households that have at least 1 month of liquid wealth on hand. On average, 73 percent of simulated exponential households hold liquid assets greater than 1 month of labor income. The analogous number for hyperbolics is only 40 percent. For comparison, 42 percent of households in the SCF hold liquid financial assets greater than 1 month of labor income.

³⁰ For the purposes of the analysis in this subsection, simulated liquid assets are measured as $X^{+} + (Y/24)$, where $X^{+}$ represents positive holdings of liquid assets (excluding current labor income).

³¹ The long-run discount rate of a hyperbolic consumer, $-\ln(\delta_{\text{hyperbolic}}) = -\ln(0.957) = 0.044$, is calibrated to lie below the long-run discount rate of an exponential consumer, $-\ln(\delta_{\text{exponential}}) = -\ln(0.944) = 0.058$.
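As a quick arithmetic check, the long-run discount rates quoted in the footnote can be recomputed from the calibrated discount factors. The sketch below is illustrative only; the function name `long_run_discount_rate` is our own label and does not come from ALRTW.

```python
import math


def long_run_discount_rate(delta: float) -> float:
    """Return the continuously compounded long-run discount rate, -ln(delta)."""
    return -math.log(delta)


# Calibrated long-run discount factors quoted in the text.
DELTA_HYPERBOLIC = 0.957
DELTA_EXPONENTIAL = 0.944

rate_hyperbolic = long_run_discount_rate(DELTA_HYPERBOLIC)
rate_exponential = long_run_discount_rate(DELTA_EXPONENTIAL)

# The hyperbolic consumer is the more patient one in the long run.
print(f"hyperbolic:  {rate_hyperbolic:.3f}")
print(f"exponential: {rate_exponential:.3f}")
```

The lower long-run rate for hyperbolics is what makes the long-run payoff stream of illiquid assets relatively more valuable to them.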

Hyperbolic Discounting and Consumption

ALRTW also evaluate the models by analyzing the simulated quantity of liquid assets as a share of total assets. In the SCF, the average liquid wealth share is only 8 percent and neither the exponential nor hyperbolic simulations match this number, although the hyperbolic simulations are a bit closer to the mark. The average liquid wealth share for simulated hyperbolic households is 39 percent. The analogous exponential liquid wealth share is 50 percent.

Revolving credit (e.g., credit card borrowing) represents another important form of liquidity. Low levels of liquid assets are naturally associated with high levels of credit card debt. ALRTW contrast exponential and hyperbolic consumers by comparing their simulated propensities to borrow on credit cards.³² At any point in time 51 percent of hyperbolic consumers borrow on their credit cards, compared with only 19 percent of exponentials. In the 1995 SCF, 70 percent of households with credit cards report that they did not fully pay their credit card bill the last time that they mailed in a payment. Hyperbolic simulations come much closer to matching these self-reports. Likewise, the simulated hyperbolic consumers borrow much more on average than the simulated exponential consumers. On average, simulated exponential households owe $900 of interest-paying credit card debt, including the households with no debt. By contrast, simulated hyperbolic households owe $3,400 of credit card debt. The actual amount of credit card debt owed per household with a credit card is approximately $4,600 (including households with no debt, but excluding the float).³³

Euler equation tests have played a critical role in the empirical consumption literature since the work of Hall (1978). Many of the papers in this literature have asked whether lagged information predicts current consumption growth.
In particular, many authors have tried to determine whether predictable changes in income predict changes in consumption:

$$\Delta \ln(C_{it}) = \alpha\, E_{t-1} \Delta \ln(Y_{it}) + X_{it}\beta + \varepsilon_{it}. \tag{8.1}$$

Here $X_{it}$ is a vector of control variables. The standard consumption model (without liquidity constraints) predicts $\alpha = 0$; the marginal propensity to consume out of predictable changes in income should be zero. By contrast, empirical estimates of $\alpha$ lie above 0, with “consensus estimates” around $\alpha = 0.2$.³⁴ ALRTW estimate the standard comovement regression using simulated data. For the hyperbolic simulations, the coefficient on $E_{t-1} \Delta \ln(Y_{it})$ is $\alpha = 0.17$.

³² See LRT (2000) for a much more detailed analysis of credit card borrowing.

³³ This average balance includes households in all education categories. It is calculated on the basis of aggregate information reported by the Federal Reserve. This figure is consistent with values from a proprietary account-level data set assembled by David Gross and Nicholas Souleles (1999a, 1999b, 2000). See LRT (2000).

³⁴ For example, Hall and Mishkin (1982) report a statistically significant coefficient of .200, Hayashi (1985) reports a significant coefficient of .158, Altonji and Siow (1987) report an insignificant coefficient of .091, Attanasio and Weber (1993) report an insignificant coefficient of .119, Attanasio and Weber (1995) report an insignificant coefficient of .100, Shea (1995) reports a marginally significant coefficient of .888, Lusardi (1996) reports a significant coefficient of .368, Souleles (1999) reports a significant coefficient of .344, and ALRTW (2000) report a significant coefficient of .285. See Deaton (1992) and Browning and Lusardi (1996) for a discussion of the excess sensitivity literature.
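The comovement regression in equation (8.1) can be sketched on synthetic data. This is a minimal illustration, not ALRTW's estimation procedure: the control vector is reduced to an intercept, the data are drawn from an assumed normal distribution, and all names (`ols_slope`, `TRUE_ALPHA`, the standard deviations) are hypothetical choices made for the example.

```python
import random


def ols_slope(x, y):
    """OLS slope of y on x with an intercept (the only control in this sketch)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


rng = random.Random(0)
N_OBS = 5000
TRUE_ALPHA = 0.17  # assumed value, chosen to mimic the hyperbolic simulations

# Predictable income growth, E_{t-1} dlog(Y_it), drawn from an assumed distribution.
expected_income_growth = [rng.gauss(0.0, 0.05) for _ in range(N_OBS)]

# Consumption growth = intercept + alpha * predictable income growth + noise.
consumption_growth = [
    0.01 + TRUE_ALPHA * g + rng.gauss(0.0, 0.02) for g in expected_income_growth
]

alpha_hat = ols_slope(expected_income_growth, consumption_growth)
print(f"estimated alpha: {alpha_hat:.3f}")
```

Under the standard model without liquidity constraints the estimated slope would be close to zero; a value near 0.2 corresponds to the "excess sensitivity" found in household data.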


By contrast, the exponential simulations generate a value of $\alpha = 0.03$. Hyperbolic consumers hold more of their wealth in illiquid form than exponentials. So, hyperbolics are more likely to hit liquidity constraints, raising their marginal propensity to consume out of predictable changes in income.

The hyperbolic simulations also predict income-consumption comovement around retirement. Banks, Blundell, and Tanner (1998) and Bernheim, Skinner, and Weinberg (1997) argue that consumption anomalously falls when workers reach their mid-60s, at the same time that they are retiring and labor income is falling. ALRTW estimate the following regression to explore the consumption drop at retirement:

$$\Delta \ln(C_{it}) = I_{it}^{\text{RETIRE}}\gamma + X_{it}\beta + \varepsilon_{it}.$$

Here $I_{it}^{\text{RETIRE}}$ is a set of dummy variables that take the value of one in periods $t-1$, $t$, $t+1$, and $t+2$ if period $t$ is the age of retirement; and $X_{it}$ is a vector of control variables. Summing the coefficients on the four dummy variables (and switching signs) generates an estimate of the “excess” drop in consumption around retirement. Estimating these coefficients from the PSID yields a statistically significant excess drop of 11.6 percent around retirement. The analogous drop for simulated hyperbolic consumers is 14.5 percent, whereas the drop for simulated exponential consumers is only 3.0 percent. Hyperbolic consumers hold relatively little liquid wealth. A drop in income at retirement translates into a substantial drop in consumption, even though retirement is an exogenous, completely predictable event.

All in all, the hyperbolic model consistently does a better job of approximating the data. Table 7.1 draws these findings together.

9. NAIFS VERSUS SOPHISTICATES

Until now, we have considered the case in which early selves hold correct expectations about the preferences and behavior of later selves. Early selves anticipate that later selves will fail to maximize the patient long-run interests of early selves.
When early selves hold such correct expectations, they are referred to as sophisticates (Strotz, 1956).

Table 7.1.

                                                   Hyperbolic   Exponential   Data
% with liquid assets > (1/12) Y                    40%          73%           42%
Mean liquid assets / (liquid + illiquid assets)    0.39         0.50          0.08
% borrowing on “Visa”                              51%          19%           70%
Mean borrowing                                     $3,400       $900          $4,600
C − Y comovement                                   0.17         0.03          ≈0.20
% C drop at retirement                             14.5%        3.0%          11.6%

Source: ALRTW (2001b).
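To make the comparison in Table 7.1 concrete, the short sketch below checks, moment by moment, which simulated model lies closer to the data. It is only a bookkeeping exercise under two assumptions of ours: closeness is measured by the absolute gap, and the "≈0.20" entry is treated as exactly 0.20. The names `TABLE_7_1` and `closer_model` are hypothetical.

```python
# Each entry: (hyperbolic, exponential, data), copied from Table 7.1.
TABLE_7_1 = {
    "% with liquid assets > Y/12": (40.0, 73.0, 42.0),
    "liquid share of total assets": (0.39, 0.50, 0.08),
    "% borrowing on 'Visa'": (51.0, 19.0, 70.0),
    "mean borrowing ($)": (3400.0, 900.0, 4600.0),
    "C-Y comovement": (0.17, 0.03, 0.20),  # data entry is reported as roughly 0.20
    "% C drop at retirement": (14.5, 3.0, 11.6),
}


def closer_model(hyperbolic, exponential, data):
    """Return the model whose simulated moment has the smaller absolute gap to the data."""
    gap_hyperbolic = abs(hyperbolic - data)
    gap_exponential = abs(exponential - data)
    if gap_hyperbolic < gap_exponential:
        return "hyperbolic"
    if gap_exponential < gap_hyperbolic:
        return "exponential"
    return "tie"


closest = {moment: closer_model(*row) for moment, row in TABLE_7_1.items()}

for moment, winner in closest.items():
    print(f"{moment:32s} -> {winner}")
```

On this simple metric the hyperbolic simulation is the closer one for every row, which is the sense in which the text says it "consistently does a better job."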


However, it is reasonable to imagine that early selves might mistakenly expect later selves to follow through on the early selves’ best intentions. This is the naive case, discussed by Strotz (1956), Akerlof (1991), and O’Donoghue and Rabin (1999a, 1999b, 2000). Such naifs have optimistic forecasts in the sense that they believe that future selves will carry out the wishes of the current self. Under this belief, the current self constructs the sequence of actions that maximizes the preferences of the current self. The current self then implements the ﬁrst action in that sequence, expecting future selves to implement the remaining actions. Instead, those future selves conduct their own optimization and therefore implement actions in conﬂict with the patient behavior anticipated by prior selves. In some cases, the behavior of naive hyperbolics is very close to the behavior of sophisticated hyperbolics. For example, ALRTW have replicated their calibration and analysis under the assumption that hyperbolic consumers are naive. They ﬁnd that the naive hyperbolics act effectively the same as the sophisticated hyperbolics discussed above. Hence, for the consumption applications in this paper, it does not matter whether we assume that hyperbolics are naive or sophisticated. However, this rough equivalence does not generally hold. Ted O’Donoghue and Matthew Rabin (1999a, 1999b, 2000) have written a series of papers that examine the differences between naifs and sophisticates, developing examples where naifs and sophisticates behave in radically different ways. Their most recent paper explores the issue of retirement saving. They show that naive hyperbolics may perpetually postpone asset reallocation decisions, generating sizeable welfare costs. Each one-period postponement seems optimal, because the naif mistakenly expects some future self to undertake the reallocation. 
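The naive planning logic described above (the current self plans a whole path, implements only the first step, and later selves re-plan) can be illustrated with a deliberately stylized cake-eating problem. The assumptions here are ours and are not the ALRTW calibration: log utility, no labor income, a gross interest rate of one, and a finite horizon. With log utility the current self's plan has the closed form used in `naive_plan`.

```python
def naive_plan(wealth, periods_left, beta, delta):
    """Path the current naive self intends to follow.

    Maximizes ln(c_0) + beta * sum_j delta**j * ln(c_j) subject to sum(c) = wealth,
    which gives c_j proportional to 1 for j = 0 and to beta * delta**j for j >= 1.
    `periods_left` counts the current period plus all remaining ones.
    """
    weights = [1.0] + [beta * delta**j for j in range(1, periods_left)]
    total = sum(weights)
    return [wealth * w / total for w in weights]


def naive_realized_path(wealth, horizon, beta, delta):
    """Consumption actually chosen when every self re-plans naively."""
    path = []
    for t in range(horizon):
        consumption = naive_plan(wealth, horizon - t, beta, delta)[0]
        path.append(consumption)
        wealth -= consumption
    return path


initial_plan = naive_plan(1.0, 3, beta=0.5, delta=1.0)
realized = naive_realized_path(1.0, 3, beta=0.5, delta=1.0)

print("planned by self 1:", [round(c, 4) for c in initial_plan])
print("actually consumed:", [round(c, 4) for c in realized])
```

In this example self 1 plans for period-2 consumption of 0.25, but self 2 re-optimizes and consumes about 0.33, leaving less for period 3 than self 1 intended. Setting `beta=1.0` removes the conflict and the plan is carried out exactly.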
Naif models do not exhibit any of the pathologies that we have discussed (e.g., nonmonotonic consumption functions). If consumers do not recognize that their own preferences are dynamically inconsistent, they will not have any incentive to act strategically vis-à-vis their own future selves. However, this solution to the pathology problem requires that consumers be completely naive about their own future preferences. Any partial knowledge of future dynamic inconsistency reinstates the pathologies.

O’Donoghue and Rabin (2000) also propose an intermediate model in which decision-makers partially recognize their propensity to be hyperbolic in the future. Specifically, in this intermediate model, the actor believes that future selves will have a $\beta$ value equal to $\hat{\beta}$. Sophisticates hold correct expectations about the future value of $\beta$, so $\hat{\beta} = \beta$. Naifs incorrectly believe that future selves will hold preferences consistent with the long-run interests of the current self, implying $\hat{\beta} = 1$. Partial naifs lie between these extremes, so $\beta < \hat{\beta} < 1$.

10. NORMATIVE ANALYSIS AND POLICY IMPLICATIONS

Welfare and policy analysis can be problematic in hyperbolic models. The crux of the difficulty is the lack of a clear welfare criterion.


The most traditional perspective has been adopted by Phelps and Pollak (1968) and Laibson (1996, 1997a). These authors take the multiple self framework literally, and simply apply the Pareto criterion for welfare analysis. If one allocation makes all selves at least as well off as another allocation, then the former allocation Pareto dominates the latter allocation. Even this very strong welfare criterion opens the door to interesting welfare analysis. It is typically the case that the equilibrium allocation in a hyperbolic model is Pareto-inferior to other feasible allocations that will not arise in equilibrium. These Pareto-dominant allocations can be attained only with a commitment technology. We turn to such commitment technologies (and the corresponding policies that support them) in the next subsection.

O’Donoghue and Rabin adopt a different approach to welfare analysis. They argue that the right welfare perspective is the long-run perspective. Specifically, they rank allocations using the welfare of an agent with no hyperbolicity (i.e., $\beta = 1$). In the long run, all selves discount exponentially. So, all past selves want future selves to discount exponentially. In this sense, $\beta = 1$ is the right discounting assumption if we adopt the preferences of some “earlier” self (say at birth). Another way to motivate this welfare criterion is to ask what discount function you would advise someone else to use. Typically, we urge others to act patiently, suggesting that we normatively discourage short-run impulsivity. In the language of these models, this advice recommends $\beta = 1$.

Recently, Caplin and Leahy (2000) have suggested another criterion. They take the multiple self framework literally and suggest a utilitarian approach. Specifically, they argue that a sensible welfare criterion would weight the welfare of all of the selves. This approach produces challenging implications.
Specifically, if later selves get roughly the same weight as early selves, then late consumption should matter much more than early consumption. To see why, consider the following two-period example. Self 1 cares about periods 1 and 2 (with equal weights). Self 2 cares only about period 2. Then period 2 should get twice the weight of period 1 in the social planner’s welfare function. Late consumption benefits both selves, whereas early consumption benefits only self 1.

At the moment, there is no consensus framework for measuring welfare in multiple self models. However, the different approaches reviewed herein usually give similar answers to policy questions. All of the competing welfare criteria imply that equilibrium allocations in economies without commitment typically generate savings rates that are too low (i.e., higher savings allocations would improve social welfare). This implication follows almost immediately once one adopts a welfare criterion in which $\beta = 1$ (O’Donoghue and Rabin) or once one adopts the utilitarian perspective of Caplin and Leahy. Equilibrium allocations also tend to be Pareto-inferior because the static gains of high consumption rates in the short run (gains to the current self) tend to be overwhelmed by the dynamic losses of low savings rates in the long-run steady state (dynamic losses to the expected utility of the current self). Recall that hyperbolic consumers have low long-run discount rates. Hence, the long-run outcomes matter a great deal to the welfare of the current hyperbolic consumer (Laibson 1996). A commitment to a savings rate slightly above the equilibrium savings rate will raise the welfare of all selves.

10.1. The Value of Commitment

Sophisticated hyperbolic consumers are motivated to choose policies that commit the behavior of future selves. Moreover, such commitment devices can raise the welfare of all selves if the commitment locks in patient long-run behavior. Even naive consumers will benefit from commitment, although they will not appreciate these benefits at the time they are being locked into a patient behavioral regime. However, these naive agents may not mind such commitments (ex ante), because they incorrectly expect future selves to act patiently anyway.

In a world of sophisticated hyperbolic consumers, the social planner’s goal is to make commitment possible, rather than imposing it on consumers.³⁵ Sophisticated consumers understand the value of commitment and will adopt such commitments when it is in their interest. Hence, a 401(k), which is voluntary, might be viewed as a useful commitment device for a sophisticated hyperbolic consumer.³⁶ Laibson (1996) and LRT (1998) measure the welfare consequences of providing voluntary commitment technologies, like 401(k)’s, to sophisticated hyperbolic consumers. By contrast, in a world of unsophisticated consumers (i.e., naifs), a benevolent government may want to impose commitment on consumers.³⁷ Social security, with its universal coverage and illiquid “balances,” can be viewed as such a commitment.

³⁵ Commitment technologies typically make all selves better off.

³⁶ 401(k)’s are defined contribution pension accounts available in most U.S. firms. These accounts have a penalty for “early” withdrawal (e.g., before age 59½).

³⁷ Naturally, there are excellent reasons to be wary of activist governments. Much political activity is directed toward rent seeking. Moreover, even a benevolent social planner needs to worry about the disincentives and distortions that arise when well-intentioned politicians tax productive activities to pay for new social programs.

11. EXTENSIONS

11.1. Asset Uncertainty

Our simulation results reported in Section 7 demonstrate that hyperbolic consumption functions become less irregular as more noise is added to the model. The analysis in Section 7 explores the case in which the noise comes from stochastic labor income. Another natural source of noise is the asset return process. In the analysis, we assumed that the asset return process was deterministic. Incorporating random returns into the model will generate four likely benefits. First, when pathologies (e.g., nonmonotonic consumption functions) do arise, those pathologies will probably be less pronounced when asset returns are stochastic. Second, pathologies will be less likely to arise in the first place.

Third, once asset return variability is added to the model, we may be able to prove more general theorems. For example, without asset return variability, we can show that, as $\beta \to 1$, the consumption function becomes monotonic and continuous on an absorbing interval of cash on hand. An absorbing interval is a range of cash-on-hand values that, in equilibrium, the consumer will never leave. With asset return variability, we conjecture that we will be able to show that, as $\beta \to 1$, the consumption function becomes monotonic and continuous on the entire state space. This more general theorem reflects the fact that asset return uncertainty scales up with financial wealth, in contrast to labor income uncertainty that does not scale with financial wealth. Finally, adding asset uncertainty will enable us to model multiasset state spaces as long as each asset has some idiosyncratic variability. In this setting, we expect to be able to prove the existence, uniqueness, and regularity of equilibria using variants of the techniques developed in Harris and Laibson (2001a).

11.2. Continuous-Time Hyperbolic Models

Continuous-time modeling provides a more robust way of eliminating pathologies like nonmonotonic consumption functions (Harris and Laibson, 2001b). To motivate the continuous-time formalism, recall the discrete-time setup. In the standard discrete-time formulation of quasi-hyperbolic preferences, the present consists of the single period $t$. The future consists of periods $t+1, t+2, \ldots$. A period $n$ steps into the future is discounted with factor $\delta^{n}$, and an additional discount factor $\beta$ is applied to all periods except the present. This model can be generalized in two ways. First, the present can last for any number of periods $T_t \in \{1, 2, \ldots\}$. Second, $T_t$ can be random. The preferences in equation (11.1) are a natural continuous-time analog of this more general formulation. Specifically, the preferences of self $t$ are given by

$$E_t\left[\int_{t}^{t+T_t} e^{-\gamma(s-t)}\, U(c(s))\,ds + \alpha \int_{t+T_t}^{+\infty} e^{-\gamma(s-t)}\, U(c(s))\,ds\right], \tag{11.1}$$

where $\gamma \in (0, +\infty)$, $\alpha \in (0, 1]$, $U : (0, +\infty) \to \mathbb{R}$, and $T_t$ is distributed exponentially with parameter $\lambda \in [0, +\infty)$. In other words, self $t$ uses a stochastic discount function, namely

$$D_{\lambda}(t, s) = \begin{cases} e^{-\gamma(s-t)} & \text{if } s \le t + T_t \\ \alpha e^{-\gamma(s-t)} & \text{if } s > t + T_t \end{cases}.$$

This stochastic discount function decays exponentially at rate $\gamma$ up to time $t + T_t$, drops discontinuously at $t + T_t$ to a fraction $\alpha$ of its level just prior to $t + T_t$, and decays exponentially at rate $\gamma$ thereafter. Figure 7.8 plots a single realization of this discount function, with $t = 0$ and $T_t = 3.4$. Figure 7.9 plots the expected value of the discount function, namely

$$E_t D_{\lambda}(t, s) = e^{-\lambda(s-t)} e^{-\gamma(s-t)} + \left(1 - e^{-\lambda(s-t)}\right) \alpha e^{-\gamma(s-t)},$$

for $\lambda \in \{0, 0.1, 1, 10, \infty\}$.
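The realized and expected discount functions above are simple enough to evaluate directly. The sketch below codes both formulas and checks the closed-form expectation against a Monte Carlo average over exponential draws of $T_t$. The function names, the number of draws, and the seed are our own choices; the parameter values $\alpha = 0.7$ and $\gamma = 0.1$ are those reported for Figure 7.8.

```python
import math
import random


def realized_discount(gap, present_length, alpha, gamma):
    """D(t, s) for gap = s - t, given a realized length of the present T_t."""
    base = math.exp(-gamma * gap)
    return base if gap <= present_length else alpha * base


def expected_discount(gap, lam, alpha, gamma):
    """E_t D(t, s) when T_t is exponential with parameter lam (lam may be math.inf)."""
    base = math.exp(-gamma * gap)
    if math.isinf(lam):
        # Instantaneous-gratification limit: the present is only the instant s = t.
        prob_still_present = 1.0 if gap == 0 else 0.0
    else:
        prob_still_present = math.exp(-lam * gap)
    return prob_still_present * base + (1.0 - prob_still_present) * alpha * base


def monte_carlo_discount(gap, lam, alpha, gamma, draws=100_000, seed=0):
    """Average the realized discount function over simulated draws of T_t (lam > 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        total += realized_discount(gap, rng.expovariate(lam), alpha, gamma)
    return total / draws


ALPHA, GAMMA = 0.7, 0.1

closed_form = expected_discount(2.0, 1.0, ALPHA, GAMMA)
simulated = monte_carlo_discount(2.0, 1.0, ALPHA, GAMMA)
print(f"closed form: {closed_form:.4f}, simulated: {simulated:.4f}")
```

Setting `lam=0.0` reproduces standard exponential discounting, and `lam=math.inf` gives the jump function of the instantaneous-gratification case.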

Figure 7.8. Realization of the discount function (α = 0.7, γ = 0.1). [Vertical axis: discount function, from 0 to 1. Horizontal axis: time gap between future period and present period, from 0 to 10. The realization of T separates the present from the future.]

This continuous-time formalization is close to the deterministic functions used in Barro (1999) and Luttmer and Mariotti (2000). However, Harris and Laibson (2001b) assume that Tt is stochastic. The stochastic transition with constant hazard rate reduces the problem to a system of two differential equations that characterize present and future value functions.

Figure 7.9. Expected value of the discount function for λ ∈ {0, 0.1, 1, 10, ∞}. [Vertical axis: expected value of discount function, from 0 to 1. Horizontal axis: time to discounted period, from 0 to 10. Curves: λ = 0 (exponential discounting), λ = 0.1, λ = 1, λ = 10, and λ = ∞ (instantaneous gratification; i.e., with jump at 0).]


When $\lambda = 0$, the discount function is equivalent to a standard exponential discount function. As $\lambda \to \infty$, the discount function converges to a jump function, namely

$$D_{\infty}(t, s) = \begin{cases} 1 & \text{if } s = t \\ \alpha e^{-\gamma(s-t)} & \text{if } s > t \end{cases}.$$

This limit case is both analytically tractable and psychologically relevant. In this “instantaneous gratification” case, the present is vanishingly short. Individuals prefer consumption in the present instant discretely more than consumption in the momentarily delayed future. The lessons from this model carry over, by continuity, to the neighborhood of models in which the present is short, but not precisely instantaneous (i.e., $\lambda$ large).

The instantaneous gratification model, which is dynamically inconsistent, shares the same value function as a related dynamically consistent optimization problem with a wealth-contingent utility function. Using this partial equivalence, Harris and Laibson (2001b) prove that the hyperbolic equilibrium exists and is unique. The associated equilibrium consumption functions are continuous and monotonic in wealth. The monotonicity property relies on the condition that the long-run discount rate is weakly greater than the interest rate. For this case, all of the pathological properties of discrete-time hyperbolic models are eliminated.

12. CONCLUSIONS

We have characterized the consumption behavior of hyperbolic consumers. The hyperbolic model provides two payoffs. First, it provides an analytically tractable, parsimonious foundation with which to analyze self-control problems. Second, it is easily calibrated, providing precise numerical predictions that can be empirically evaluated in competition with mainstream models.

We have shown that the hyperbolic model successfully matches empirical observations on household balance sheets and consumption choices. Relative to exponential households, hyperbolic households hold low levels of liquid wealth measured either as a fraction of labor income or as a share of total wealth.
Hyperbolic households borrow more aggressively in the revolving credit market (i.e., on credit cards), but they save more actively in illiquid assets. Because the hyperbolic households have low levels of liquid assets and high levels of credit card debt, they are unable to smooth their consumption paths in the presence of predictable changes in income. Calibrated hyperbolic simulations explain observed levels of consumption-income comovement and the drop in consumption at retirement. Calibrated hyperbolic simulations generate “excess sensitivity” coefficients of approximately 0.20, very close to empirical coefficients estimated from household data.

More generally, the hyperbolic model provides a good formal foundation for the study of self-defeating behaviors. Economists usually assume that rational agents will act in their own interests. Hyperbolic agents may hold rational expectations, but they will rarely make efficient choices. Puzzling and important self-defeating behaviors like undersaving, overeating, and procrastination lose some of their mystery when analyzed with the hyperbolic model.

ACKNOWLEDGMENTS

We thank Glenn Ellison for numerous helpful suggestions. We also thank George-Marios Angeletos, Andrea Repetto, Jeremy Tobacman, and Stephen Weinberg, whose ideas and work are reflected in this paper. Laura Serban provided outstanding research assistance. David Laibson acknowledges financial support from the National Science Foundation (SBR-9510985), the Olin Foundation, the National Institute on Aging (R01-AG-1665), and the MacArthur Foundation.

APPENDIX

PROOF OF THEOREM 4.5

Fix $x_0 \in [0, +\infty)$. Suppose that $W_1$ and $W_2$ are two equilibrium current-value functions, and let $S_1$ and $S_2$ be the associated saving functions. Put $h = U \circ g$. Then,

$$
\begin{aligned}
(BW_1)(x) &= U(x - S_1(x)) + \delta \int (W_1 - \varepsilon h \circ W_1')(R S_1(x) + y) f(y)\,dy \\
&= U(x - S_1(x)) + \delta \int (W_2 - \varepsilon h \circ W_2')(R S_1(x) + y) f(y)\,dy \\
&\quad + \delta \int (W_1 - W_2)(R S_1(x) + y) f(y)\,dy - \varepsilon\delta \int (h \circ W_1' - h \circ W_2')(R S_1(x) + y) f(y)\,dy \\
&\le (BW_2)(x) + \delta \int (W_1 - W_2)(R S_1(x) + y) f(y)\,dy \\
&\quad - \varepsilon\delta \int (h \circ W_1' - h \circ W_2')(R S_1(x) + y) f(y)\,dy.
\end{aligned}
$$

Hence, to obtain an upper bound for $(BW_1)(x) - (BW_2)(x)$, it suffices to estimate the expressions

$$\int (W_1 - W_2)(R S_1(x) + y) f(y)\,dy \tag{A.1}$$

and

$$\int (h \circ W_1' - h \circ W_2')(R S_1(x) + y) f(y)\,dy. \tag{A.2}$$

In doing so, we shall make use of the estimates of Theorem 4.3 that apply to $W_1$ and $W_2$. In particular, the constant $K$ and the functions $N_1$ and $N_2$ used herein are taken from that theorem.

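Expression (A.1) is easy to estimate. Because $S_1$ is an equilibrium saving function, $R S_1(x) + y \in [\underline{y}, \bar{X}]$ for all $x \in [\underline{y}, \bar{X}]$ and all $y \in [\underline{y}, \bar{y}]$. Hence

$$\left| \int (W_1 - W_2)(R S_1(x) + y) f(y)\,dy \right| \le \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}$$

for all $x \in [\underline{y}, \bar{X}]$, where

$$\|W_1 - W_2\|_{C([\underline{y}, \bar{X}])} = \sup_{x \in [\underline{y}, \bar{X}]} |W_1(x) - W_2(x)|.$$

Expression (A.2) requires more care. Put $W_\phi(x) = (1 - \phi) W_1(x) + \phi W_2(x)$. Then

$$h(W_2'(x)) - h(W_1'(x)) = \int_0^1 \big((W_2' - W_1')\, h' \circ W_\phi'\big)(x)\,d\phi$$

and

$$
\begin{aligned}
\int (h \circ W_2' - h \circ W_1')(R S_1(x) + y) f(y)\,dy
&= \int \int_0^1 \big((W_2' - W_1')\, h' \circ W_\phi'\big)(R S_1(x) + y) f(y)\,d\phi\,dy \\
&= \int_0^1 \int \big((W_2' - W_1')\, h' \circ W_\phi'\big)(R S_1(x) + y) f(y)\,dy\,d\phi.
\end{aligned}
$$

Moreover,

$$
\begin{aligned}
\int \big((W_2' - W_1')\, h' \circ W_\phi'\big)(R S_1(x) + y) f(y)\,dy
&= - \int \big((W_2 - W_1)\, h' \circ W_\phi'\big)(R S_1(x) + y) f'(y)\,dy \\
&\quad - \int \big((W_2 - W_1)\, h'' \circ W_\phi'\big)(R S_1(x) + y) f(y)\, W_\phi''(R S_1(x) + dy)
\end{aligned}
$$

(on integrating by parts). Hence, to estimate expression (A.2), we need to estimate the expressions

$$\int \big((W_2 - W_1)\, h' \circ W_\phi'\big)(R S_1(x) + y) f'(y)\,dy \tag{A.3}$$

and

$$\int \big((W_2 - W_1)\, h'' \circ W_\phi'\big)(R S_1(x) + y) f(y)\, W_\phi''(R S_1(x) + dy). \tag{A.4}$$

Expression (A.3) can be estimated as follows. First, note that, because $S_1$ is an equilibrium saving function, $R S_1(x) + y \in [\underline{y}, \bar{X}]$ for all $x \in [\underline{y}, \bar{X}]$ and all $y \in [\underline{y}, \bar{y}]$. Second, put

$$\underline{\lambda} = \min_{x \in [\underline{y}, \bar{X}]} U'(x) \quad \text{and} \quad \bar{\lambda} = \max_{x \in [\underline{y}, \bar{X}]} U'(x) \vee N_1(x).$$

Third, note that, because $W_\phi' \in [U', U' \vee N_1]$ for all $\phi \in [0, 1]$, $W_\phi' \in [\underline{\lambda}, \bar{\lambda}]$ for all $\phi \in [0, 1]$. Hence,

$$\left| \int \big((W_2 - W_1)\, h' \circ W_\phi'\big)(R S_1(x) + y) f'(y)\,dy \right| \le \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y})$$

for all $x \in [\underline{y}, \bar{X}]$.

Expression (A.4) can be estimated as follows. First, as in the case of expression (A.3), we have

$$\left| \int \big((W_2 - W_1)\, h'' \circ W_\phi'\big)(R S_1(x) + y) f(y)\, W_\phi''(R S_1(x) + dy) \right| \le \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])}\, \|W_\phi''\|_{TV([\underline{y}, \bar{X}])}$$

for all $x \in [\underline{y}, \bar{X}]$, where $\|W_\phi''\|_{TV([\underline{y}, \bar{X}])}$ denotes the total variation of the measure $W_\phi''$ on the interval $[\underline{y}, \bar{X}]$. Second, put

$$\mu(dx) = K N_2(x)\,dx \quad \text{and} \quad \tilde{W}_\phi''(dx) = W_\phi''(dx) + \mu(dx).$$

Then,

$$
\begin{aligned}
\|W_\phi''\|_{TV([\underline{y}, \bar{X}])} &= \|\tilde{W}_\phi'' - \mu\|_{TV([\underline{y}, \bar{X}])} \le \|\tilde{W}_\phi''\|_{TV([\underline{y}, \bar{X}])} + \|\mu\|_{TV([\underline{y}, \bar{X}])} \\
&= \int_{[\underline{y}, \bar{X}]} \tilde{W}_\phi''(dx) + \int_{[\underline{y}, \bar{X}]} \mu(dx) = \int_{[\underline{y}, \bar{X}]} W_\phi''(dx) + 2 \int_{[\underline{y}, \bar{X}]} \mu(dx) \\
&= W_\phi'(\bar{X}+) - W_\phi'(\underline{y}-) + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\,dx \\
&\le \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\,dx
\end{aligned}
$$

(because $\tilde{W}_\phi''$ and $\mu$ are both positive measures, and by definition of $\tilde{W}_\phi''$).

Combining our estimates for expressions (A.1), (A.3), and (A.4), we obtain

$$
\begin{aligned}
(BW_1)(x) - (BW_2)(x) &\le \delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])} \\
&\quad + \varepsilon\delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y}) \\
&\quad + \varepsilon\delta \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])}\, \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])} \times \left( \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\,dx \right)
\end{aligned}
$$

for all $x \in [\underline{y}, \bar{X}]$. Combining this estimate with the analogous estimate for $(BW_2)(x) - (BW_1)(x)$, we obtain

$$\|BW_1 - BW_2\|_{C([\underline{y}, \bar{X}])} \le \delta (1 + \varepsilon L) \|W_1 - W_2\|_{C([\underline{y}, \bar{X}])},$$

where

$$L = \|h'\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f'\|_{C([\underline{y}, \bar{y}])}\, (\bar{y} - \underline{y}) + \|h''\|_{C([\underline{\lambda}, \bar{\lambda}])}\, \|f\|_{C([\underline{y}, \bar{y}])} \left( \bar{\lambda} - \underline{\lambda} + 2 \int_{[\underline{y}, \bar{X}]} K N_2(x)\,dx \right).$$

It follows that, if

$$\varepsilon < \min\left\{ 1 - \bar{\beta}_1,\ \frac{1 - \delta}{\delta L} \right\},$$

then $\|BW_1 - BW_2\|_{C([\underline{y}, \bar{X}])} = 0$. In other words, $W_1$ and $W_2$ coincide on $[\underline{y}, \bar{X}]$.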
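The final step can be spelled out as a short contraction argument. The sketch below is our own gloss and rests on two assumptions that the appendix does not restate here: that each equilibrium current-value function is a fixed point of the operator $B$ (so $BW_i = W_i$), and that $L > 0$. All norms are taken over the same interval as in the proof.

```latex
\begin{aligned}
\|W_1 - W_2\| &= \|BW_1 - BW_2\|
  \le \delta(1 + \varepsilon L)\,\|W_1 - W_2\|, \\[4pt]
\delta(1 + \varepsilon L) < 1
  &\iff \varepsilon < \frac{1 - \delta}{\delta L}, \\[4pt]
\text{so}\quad \bigl(1 - \delta(1 + \varepsilon L)\bigr)\,\|W_1 - W_2\| \le 0
  &\implies \|W_1 - W_2\| = 0 .
\end{aligned}
```

The second bound inside the minimum, $\varepsilon < (1-\delta)/(\delta L)$, is therefore exactly the condition that makes the Lipschitz constant $\delta(1 + \varepsilon L)$ strictly less than one.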

References

Ainslie, G. (1992), Picoeconomics. Cambridge UK: Cambridge University Press.
Akerlof, G. A. (1991), “Procrastination and Obedience,” American Economic Review (Papers and Proceedings), 81, 1–19.
Altonji, J. and A. Siow (1987), “Testing the Response of Consumption to Income Changes with (Noisy) Panel Data,” Quarterly Journal of Economics, 102(2), 293–328.
Angeletos, G.-M., D. Laibson, A. Repetto, J. Tobacman, and S. Weinberg (2001a), “The Hyperbolic Buffer Stock Model: Calibration, Simulation, and Empirical Evaluation,” Working Paper, National Bureau of Economic Research.
Angeletos, G.-M., D. Laibson, A. Repetto, J. Tobacman, and S. Weinberg (2001b), “The Hyperbolic Consumption Model: Calibration, Simulation, and Empirical Evaluation,” Journal of Economic Perspectives, 15, 47–68.
Attanasio, O. (1999), “Consumption,” in Handbook of Macroeconomics, (ed. by J. Taylor and M. Woodford), Amsterdam: North-Holland.
Attanasio, O. and G. Weber (1993), “Consumption Growth, the Interest Rate, and Aggregation,” Review of Economic Studies, 60(3), 631–649.
Attanasio, O. and G. Weber (1995), “Is Consumption Growth Consistent with Intertemporal Optimization? Evidence from the Consumer Expenditure Survey,” Journal of Political Economy, 103(6), 1121–1157.
Banks, J., R. Blundell, and S. Tanner (1998), “Is There a Retirement Puzzle?” American Economic Review, 88(4), 769–788.
Barro, R. (1999), “Laibson Meets Ramsey in the Neoclassical Growth Model,” Quarterly Journal of Economics, 114(4), 1125–1152.
Benabou, R. and J. Tirole (2000), “Willpower and Personal Rules,” mimeo.
Bernheim, B. D., J. Skinner, and S. Weinberg (1997), “What Accounts for the Variation in Retirement Wealth among U.S. Households?” Working Paper 6227. Cambridge, MA: National Bureau of Economic Research.
Blundell, R., M. Browning, and C. Meghir (1994), “Consumer Demand and the Life-Cycle Allocation of Household Expenditures,” Review of Economic Studies, 61, 57–80.

Brocas, I. and J. Carrillo (2000), “The Value of Information when Preferences are Dynamically Inconsistent,” European Economic Review, 44, 1104–1115.
Brocas, I. and J. Carrillo (2001), “Rush and Procrastination under Hyperbolic Discounting and Interdependent Activities,” Journal of Risk and Uncertainty, 22, 141–164.
Browning, M. and A. Lusardi (1996), “Household Saving: Micro Theories and Micro Facts,” Journal of Economic Literature, 32, 1797–1855.
Carrillo, J. and M. Dewatripont (2000), “Promises, promises, ...,” mimeo.
Carrillo, J. and T. Mariotti (2000), “Strategic Ignorance as a Self-Disciplining Device,” Review of Economic Studies, 67, 529–544.
Carroll, C. D. (1992), “The Buffer Stock Theory of Saving: Some Macroeconomic Evidence,” Brookings Papers on Economic Activity, 2, 61–156.
Carroll, C. D. (1997), “Buffer-Stock Saving and the Life Cycle/Permanent Income Hypothesis,” Quarterly Journal of Economics, 112, 1–57.
Carroll, C. D. and M. Kimball (1996), “On the Concavity of the Consumption Function,” Econometrica, 64(4), 981–992.
Chung, S.-H. and R. J. Herrnstein (1961), “Relative and Absolute Strengths of Response as a Function of Frequency of Reinforcement,” Journal of the Experimental Analysis of Animal Behavior, 4, 267–272.
Deaton, A. (1991), “Saving and Liquidity Constraints,” Econometrica, 59, 1221–1248.
Della Vigna, S. and U. Malmendier (2001), “Long Term Contracts and Self Control,” mimeo.
Della Vigna, S. and D. Paserman (2000), “Job Search and Hyperbolic Discounting,” mimeo.
Diamond, P. and B. Koszegi (1998), “Hyperbolic Discounting and Retirement,” mimeo, MIT.
Engen, E., W. Gale, and J. K. Scholz (1994), “Do Saving Incentives Work?” Brookings Papers on Economic Activity, 1, 85–180.
Frederick, S., G. Loewenstein, and E. O’Donoghue (2001), “Time Discounting: A Critical Review,” mimeo.
Gourinchas, P.-O. and J. Parker (1999), “Consumption over the Life-Cycle,” mimeo.
Gross, D. and N. Souleles (1999a), “An Empirical Analysis of Personal Bankruptcy and Delinquency,” mimeo.
Gross, D. and N. Souleles (1999b), “How Do People Use Credit Cards?” mimeo.
Gross, D. and N. Souleles (2000), “Consumer Response to Changes in Credit Supply: Evidence from Credit Card Data,” mimeo.
Gruber, J. and B. Koszegi (2001), “Is Addiction ‘Rational’?: Theory and Evidence,” Quarterly Journal of Economics, 116, 1261–1303.
Hall, R. E. (1978), “Stochastic Implications of the Life Cycle–Permanent Income Hypothesis: Theory and Evidence,” Journal of Political Economy, 86(6), 971–987.
Hall, R. E. and F. S. Mishkin (1982), “The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households,” Econometrica, 50(2), 461–481.
Harris, C. and D. Laibson (2001a), “Dynamic Choices of Hyperbolic Consumers,” Econometrica, 69, 935–957.
Harris, C. and D. Laibson (2001b), “Instantaneous Gratification,” mimeo.
Hayashi, F. (1985), “The Permanent Income Hypothesis and Consumption Durability: Analysis Based on Japanese Panel Data,” Quarterly Journal of Economics, 100(4), 1083–1113.

296

Harris and Laibson

Hubbard, G., J. Skinner, and S. Zeldes (1994), “The Importance of Precautionary Motives in Explaining Individual and Aggregate Saving,” Carnegie–Rochester Conference Series on Public Policy, 40, 59–125. Hubbard, G., J. Skinner, and S. Zeldes (1995), “Precautionary Saving and Social Insurance,” Journal of Political Economy, 103, 360–399. King, G. R. and A. W. Logue (1987), “Choice in a Self-Control Paradigm with Human Subjects: Effects of Changeover Delay Duration,” Learning and Motivation, 18, 421– 438. Kirby, K. N. (1997), “Bidding on the Future: Evidence Against Normative Discounting of Delayed Rewards,” Journal of Experimental Psychology, 126, 54–70. Kirby, K. and R. J. Herrnstein (1995), “Preference Reversals Due to Myopic Discounting of Delayed Reward,” Psychological Science, 6(2), 83–89. Kirby, K. and N. N. Marakovic (1995), “Modeling Myopic Decisions: Evidence for Hyperbolic Delay-Discounting within Subjects and Amounts,” Organizational Behavior and Human Decision Processes, 64(1), 22–30. Kirby, K. and N. N. Marakovic (1996), “Delayed-Discounting Probabilistic Rewards Rates Decrease as Amounts Increase,” Psychonomic Bulletin and Review, 3(1), 100– 104. Krusell, P. and A. Smith (2000), “Consumption and Savings Decisions with QuasiGeometric Discounting,” mimeo. Krusell, P., B. Kuruscu, and A. Smith (2000a), “Equilibrium Welfare and Government Policy with Quasi-Geometric Discounting,” mimeo. Krusell, P., B. Kuruscu, and A. Smith (2000b), “Asset Pricing with Quasi-Geometric Discounting,” mimeo. Laibson, D. I. (1994), “Self-Control and Savings,” Ph.D. Dissertation, Massachusetts Institute of Technology. Laibson, D. I. (1996), “Hyperbolic Discounting, Undersaving, and Savings Policy,” Working Paper 5635, Cambridge, MA: National Bureau of Economic Research. Laibson, D. I. (1997a), “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Economics, 112(2), 443–478. Laibson, D. I. 
(1997b), “Hyperbolic Discount Functions and Time Preference Heterogeneity,” mimeo, Harvard University. Laibson, D. I. (1998), “Comments on Personal Retirement Saving Programs and Asset Accumulation,” by James M. Poterba, Steven F. Venti, and David A. Wise, in Studies in the Economics of Aging, (ed. by David A. Wise), Chicago: NBER and the University of Chicago Press, 106–124. Laibson, D. I., A. Repetto, and J. Tobacman (1998), “Self-Control and Saving for Retirement,” Brookings Papers on Economic Activity, 1, 91–196. Laibson, D. I., A. Repetto, and J. Tobacman (2000), “A Debt Puzzle,” mimeo. Loewenstein, G. and D. Prelec (1992), “Anomalies in Intertemporal Choice: Evidence and an Interpretation,” Quarterly Journal of Economics, 97, 573–598. Lusardi, A. (1996), “Permanent Income, Current Income, and Consumption; Evidence from Two Panel Data Sets,” Journal of Business and Economic Statistics, 14, 81–90. Luttmer, E. and T. Mariotti (2000), “Subjective Discount Factors,” mimeo. Millar, A. and D. J. Navarick (1984), “Self-Control and Choice in Humans: Effects of Video Game Playing as a Positive Reinforcer,” Learning and Motivation, 15, 203–218. Morris, S. and A. Postlewaite (1997), “Observational Implications of Nonexponential Discounting,” mimeo.

Hyberbolic Discounting and Consumption

297

Mulligan, C. (1997), “A Logical Economist’s Argument Against Hyperbolic Discounting,” mimeo, University of Chicago. Navarick, D. J. (1982), “Negative Reinforcement and Choice in Humans,” Learning and Motivation, 13, 361–377. O’Donoghue, T. and M. Rabin (1999a), “Doing It Now or Later,” American Economic Review, 89(1), 103–124. O’Donoghue, T. and M. Rabin (1999b), “Incentives for Procrastinators,” Quarterly Journal of Economics, 114(3), 769–816. O’Donoghue, T. and M. Rabin (2000), “Choice and Procrastination,” Working Paper. Parker, J. A. (1999), “The Reaction of Household Consumption to Predictable Changes in Social Security Taxes,” American Economic Review, 89, 959–973. Phelps, E. S. and R. A. Pollak (1968), “On Second-Best National Saving and GameEquilibrium Growth,” Review of Economic Studies, 35, 185–199. Ramsey, F. (1928), “A Mathematical Theory of Saving,” Economic Journal, December, 38, 543–559. Rankin, D. M. (1993), “How to Get Ready for Retirement: Save, Save, Save,” New York Times, March 13, 33. Read, D., G. Loewenstein, S. Kalyanaraman, and A. Bivolaru (1996), “Mixing Virtue and Vice: The Combined Effects of Hyperbolic Discounting and Diversiﬁcation,” Working Paper, Carnegie Mellon University. Runkle, D. (1991), “Liquidity Constraints and the Permanent-Income Hypothesis: Evidence from Panel Data,” Journal of Monetary Economics, 27(1), 73–98. Shapiro, M. D. and J. Slemrod (1995), “Consumer Response to the Timing of Income: Evidence from a Change in Tax Withholding,” American Economic Review, 85(1), 274–283. Shea, J. (1995), “Union Contracts and the Life-Cycle/Permanent Income Hypothesis,” American Economic Review, 85(1), 186–200. Simmons Market Research Bureau (1996), The 1996 Study of Media and Markets. New York. Souleles, N. (1999), “The Response of Household Consumption to Income Tax Refunds,” American Economic Review, 89, 947–958. Strotz, R. H. (1956), “Myopia and Inconsistency in Dynamic Utility Maximization,” Review of Economic Studies, 23, 165–180. 
Thaler, R. H. (1981), “Some Empirical Evidence on Dynamic Inconsistency,” Economics Letters, 8, 201–207. Thaler, R. H. (1992), “Saving, Fungibility, and Mental Accounts,” in The Winner’s Curse, Princeton, NJ: Princeton University Press, 107–121. Thaler, R. H. and H. M. Shefrin (1981), “An Economic Theory of Self-Control,” Journal of Political Economy, 89, 392–410. Zeldes, S. P. (1989a), “Consumption and Liquidity Constraints: An Empirical Investigation,” Journal of Political Economy, 97(2), 305–346. Zeldes, S. P. (1989b), “Optimal Consumption with Stochastic Income: Deviations from Certainty Equivalence,” Quarterly Journal of Economics, 104(2), 275–298.

A Discussion of the Papers by Ernst Fehr and Klaus M. Schmidt and by Christopher Harris and David Laibson

Glenn Ellison

It was a pleasure to serve as the discussant for this session. The authors have played a major role in developing the areas under discussion. The papers they produced for this volume are insightful and will help shape the emerging literature. The papers are excellent. I feel fortunate that my task is to comment and not to criticize.

One aspect of this session I found striking was the degree of agreement on how to define and organize a subfield of behavioral economics. In each case, the authors focus on one way in which real-world behavior departs from the standard model of rational self-interested behavior. They begin by mentioning results from a number of experiments showing systematic departures from the standard model. Although Fehr and Schmidt spend a fair amount of time distinguishing between broad classes of fairness models, both sets of authors advocate the use of simple tractable models that reflect essential features of behavior. In the final two paragraphs of the conclusions, both papers argue that behavioral economics models can add much to our understanding of important economic problems.

The similarity in the authors’ perspectives makes one class of comments easy: one can look at the nice features of each subfield and suggest that the other might try to do something similar. For example, one way in which the hyperbolic discounting literature seemed ahead of the fairness literature to me is that the application to consumption is well developed theoretically and empirically, to the point of being undeniably part of the macroeconomics literature. It would be nice to see work on fairness pushing as hard on a single topic and gaining acceptance within an applied community. A feature of the fairness literature I admired is the attention that has been paid to heterogeneity in fairness preferences. Although I know that behavioral economists like to work
with minimal departures from rationality, I would think that large consumption data sets could provide those interested in hyperbolic discounting with ample degrees of freedom to explore models with heterogeneity in the degree of time inconsistency. The similarity in perspective also makes it worthwhile to try to comment on whether the approach is a good one. I think it is. The one comment I would make, however, is that the “behavioral” organization of papers seems to me to have one drawback relative to the way in which most other applied papers are written. The difference I perceive in organization is that the papers focus on how a behavioral assumption can help us understand a large number of facts, rather than on the facts themselves. For example, I would regard a paper on high credit card debt as more applied and less “behavioral” if it focused narrowly on understanding credit card debt and discussed various potential explanations. The relative disadvantage of the behavioral approach is that in any paper that explains many disparate facts, one can worry that there is an implicit selection of facts consistent with the model. For example, the calibration papers discussed in Harris–Laibson did not conduct surveys of people with high credit card debt and ask them directly if they feel that they failed to foresee how quickly the debt would build up. The model “predicts” that there would be no afﬁrmative responses, and I am sure this is not what the survey would yield. Experimental papers are similarly selective because authors must decide which experiments to conduct. Good experimentalists no doubt have a keen intuition for how experiments will turn out, and may tend to carry out only experiments in which they expect that their model will be vindicated. 
It seems to me that this type of criticism is harder to make of narrower applied papers – there are fewer facts to select among when one is looking at a narrow topic and the presence of competing explanations gives the authors less reason to favor any one potential explanation over the others. As I said previously, the papers for this section are very insightful and very well done. I made a number of other speciﬁc comments at the World Congress, but I really cannot say that many of them merit being written down (especially now that the papers are in print). I therefore thought that I would devote the remainder of my time to talking more broadly about behavioral economics and its situation within the economics profession. Behavioral economics is a potential revolution. If one judges its progress by looking at top journal publications or its success in attracting top economists, it is doing very well. In every ﬁeld of economics, however, it has not yet affected how most work is done. What will be needed for the behavioral economics revolution to succeed? As an outsider, I clearly cannot offer the kind of insight based on a detailed understanding of various branches of the literature that many others could provide. Instead, I will try to approach the question as an amateur sociologist of the economics profession. There have been a number of recent successful revolutions in economic methodology (e.g., game theory and rational expectations). Behavioral economics (and other literatures like that on nonrational learning) may get valuable lessons from studying their progress.

Any such lessons, of course, will be about what makes ﬁelds take off in our profession and not necessarily about what makes research valuable. One general thought that occurred to me on reﬂecting about revolutions is that the presence of interesting pure theory questions has spurred on many past revolutions. For example, I would argue that the main reason why the Folk Theorem continued to receive so much attention long after the basic principle was known was that economists enjoyed thinking and reading about cleverly constructed strategies. Franklin Fisher (1989) has criticized the ﬁnite-horizon reputation literature as an example of a literature driven by the elegance and sophistication of the analysis. Inﬁnite-horizon models are preferable descriptively and easily give what is probably the appropriate answer: that forming a reputation is possible, but will not necessarily happen. I do not want to debate the value of the reputation literature here. I just want to point out that the literature is extensive, and regardless of what people eventually conclude about the topic, it surely contributed to game theory’s success. Many economists like interesting models, and ﬁelds that can provide them will grow. Behavioral economists who want their ﬁeld to grow should not overlook this aspect of our profession. As a theorist, I really enjoyed reading Harris and Laibson’s discussion of the pathological properties of some hyperbolic models, and was intrigued by Fehr and Schmidt’s comment that workers who care more about fairness will shirk more in their contracting game. Undoubtedly, there are many other similar opportunities. In thinking about the theory/empirical divide, another thought that occurred to me (and here I am less conﬁdent in my casual empiricism) is that positive empirical evidence was not really a part of the takeoff of past revolutions. Empirical puzzles or shortcomings of the previous literature seem sometimes to be important. 
For example, the rational expectations revolution in macroeconomics was surely spurred by empirical evidence on the Phillips curve. The success of a new literature investigating failures of the old, however, seems not to be wrapped up with empirically demonstrating the superiority of new ideas. In the case of information economics, for example, the attention that Akerlof’s (1970) lemons model and Spence’s (1973) job market signaling model attracted was not due to any demonstration that a set of car prices or educational attainments were well understood using the models. In industrial organization, the game-theoretic literature exploded, whereas the empirical examination of such models proceeded at a much more leisurely pace. It seems that initial bursts of applied theory work have transformed fields and made them accepted long before any convincing empirical evidence is available. I would conclude that if behavioral economists want their revolution to occur, they might be well served to focus on producing applied theory papers that economists in various fields will want to teach their students.

There are other ways in which behavioral economists seem to be taking a different approach from past revolutions. They are spending much more time developing experimental support for their assumptions. They seem to spend much more time highlighting contrasts with the existing literature than did participants in earlier revolutions. [For example, Spence (1973) has only two references and Milgrom and Roberts (1982) only mention the decades-long legal and economic debates on predation in the first paragraph.] It seems hard to say, however, whether the leaders of previous revolutions would have taken advantage of experiments had they been easier to conduct, or if they would have been forced to write differently had they faced today’s review process.

To conclude, I would like to come back to a question I have carefully avoided. Should behavioral economists follow the advice I’ve given? My observations only concerned what seems to make economic revolutions successful, not what work should be valued. Personally, I like pure theory and think that one of the nice features of academia is that we can stop to think about interesting issues that arise. I am happy that the profession seems to value such work. Personally, I also firmly believe that empirical work that helps us assess the applicability of new (and old) theories is extremely important. For example, I regard the work that Laibson and coauthors are carrying out on consumption as perhaps the most valuable work in the field. Thus, I would like to say that, while studying the progress of past revolutions may provide behavioral economists with valuable insights into how they can succeed, I hope that they do not pay too much attention to this and let a preoccupation with success get in the way of doing important work.

References

Akerlof, G. A. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market Mechanism,” Quarterly Journal of Economics, 84, 488–500.
Fisher, F. M. (1989), “Games Economists Play,” Rand Journal of Economics, 20, 113–124.
Milgrom, P. R. and J. Roberts (1982), “Predation, Reputation and Entry Deterrence,” Journal of Economic Theory, 27, 280–312.
Spence, A. M. (1973), “Job Market Signalling,” Quarterly Journal of Economics, 87, 355–374.

CHAPTER 8

Agglomeration and Market Interaction

Masahisa Fujita and Jacques-François Thisse

1. INTRODUCTION

The most salient feature of the spatial economy is the presence of a large variety of economic agglomerations. Our purpose is to review some of the main explanations of this universal phenomenon, as they are proposed in urban economics and modern economic geography. Because of space constraints, we restrict ourselves to the most recent contributions, referring the reader to our forthcoming book for a more complete description of the state of the art.

Although using agglomeration as a generic term is convenient at a certain level of abstraction, it should be clear that the concept of economic agglomeration refers to very distinct real-world situations. At one extreme lies the core-periphery structure corresponding to North-South dualism. For example, Hall and Jones (1999) observe that high-income nations are clustered in small industrial cores in the Northern Hemisphere, whereas productivity per capita steadily declines with distance from these cores.

As noted by many historians and development analysts, economic growth tends to be localized. This is especially well illustrated by the rapid growth of East Asia during the last few decades. We view East Asia as comprising Japan and nine other countries, that is, Republic of Korea, Taiwan, Hong Kong, Singapore, Philippines, Thailand, Malaysia, Indonesia, and China. In 1990, the total population of East Asia was approximately 1.6 billion. With only 3.5 percent of the total area and 7.9 percent of the total population, Japan accounted for 72 percent of the gross domestic product (GDP) and 67 percent of the manufacturing GDP of East Asia. In Japan itself, the economy is very much dominated by its core regions formed by the five prefectures containing the three major metropolitan areas of Japan: Tokyo and Kanagawa prefectures, Aichi prefecture (containing Nagoya MA), and Osaka and Hyogo prefectures. 
These regions account for only 5.2 percent of the area of Japan, but for 33 percent of its population, 40 percent of its GDP, and 31 percent of its manufacturing employment. Hence, for the whole of East Asia, the Japanese core regions with a mere 0.18 percent of the total area accounted for 29 percent of East Asia’s GDP.
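The last two figures follow from compounding the shares quoted above. A minimal check, using only the percentages stated in the text (the function and variable names are ours, introduced for illustration):

```python
# Shares quoted in the text, all in percent.
japan_area_share_of_east_asia = 3.5
japan_gdp_share_of_east_asia = 72.0
core_area_share_of_japan = 5.2
core_gdp_share_of_japan = 40.0


def compound(outer_share, inner_share):
    """Share of a sub-unit in the larger total, with both inputs in percent."""
    return outer_share * inner_share / 100.0


core_area_share_of_east_asia = compound(
    japan_area_share_of_east_asia, core_area_share_of_japan
)
core_gdp_share_of_east_asia = compound(
    japan_gdp_share_of_east_asia, core_gdp_share_of_japan
)

print(round(core_area_share_of_east_asia, 2))  # 0.18 percent of East Asia's area
print(round(core_gdp_share_of_east_asia, 1))   # 28.8, i.e., about 29 percent of its GDP
```

The computation only multiplies nested shares; it adds no data beyond what the paragraph reports.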

Strong regional disparities within the same country imply the existence of agglomerations at another spatial scale. For example, in Korea, the capital region (Seoul and Kyungki Province), which covers 11.8 percent of the country's area and contains 45.3 percent of its population, produces 46.2 percent of the GDP. In France, the contrast is even greater: the Ile-de-France (the metropolitan area of Paris), which accounts for 2.2 percent of the area of the country and 18.9 percent of its population, produces 30 percent of its GDP. Inside the Ile-de-France, only 12 percent of the available land is used for housing, plants, and roads, with the remaining land being devoted to agricultural, forestry, or natural activities.

Regional agglomeration is also reflected in large varieties of cities, as shown by the stability of the urban hierarchy within most countries. Cities themselves may be specialized in a very small number of industries, as are many medium-sized American cities. However, large metropolises like Paris, New York, or Tokyo are highly diversified in that they nest a large variety of industries, which are not related through direct linkages. Industrial districts involving firms with strong technological and/or informational linkages (e.g., the Silicon Valley or Italian districts engaged in more traditional activities), as well as factory towns (e.g., Toyota City), manifest various types of local specialization. Therefore, it appears that highly diverse size/activity arrangements exist at the regional and urban levels.

Although the sources are dispersed, not always trustworthy, and hardly comparable, data clearly converge to show the existence of an urban revolution. In Europe, the proportion of the population living in cities increased very slowly from 10 percent in 1300 to 12 percent in 1800. 
It was approximately 20 percent in 1850, 38 percent in 1900, 52 percent in 1950, and is close to 75 percent nowadays, thus showing an explosive growth in the urban population. In the United States, the rate of urbanization increased from 5 percent in 1800 to more than 60 percent in 1950 and is now near 77 percent. In Japan, the rate of urbanization was about 15 percent in 1800, 50 percent in 1950, and is now about 78 percent. The proportion of the urban population in the world increased from 30 percent in 1950 to 45 percent in 1995 and should exceed 50 percent in 2005. Furthermore, concentration in very big cities keeps rising. In 1950, only two cities had populations above 10 million: New York and Greater London. In 1995, 15 cities belonged to this category. The largest one, Tokyo, with more than 26 million, exceeds the second one, New York, by 10 million. In 2025, 26 megacities will exceed 10 million. Economists must explain why ﬁrms and households concentrate in large metropolitan areas, whereas empirical evidence suggests that the cost of living in such areas is typically higher than in smaller urban areas (Richardson, 1987). Or, as Lucas (1988, p. 39) put it in a neat way: “What can people be paying Manhattan or downtown Chicago rents for, if not for being near other people?” But Lucas did not explain why people want, or need, to be near other people.

The increasing availability of high-speed transportation infrastructure and the fast-growing development of new informational technologies might suggest that our economies enter an age that would culminate in the “death of distance.” If so, locational difference would gradually fade because agglomeration forces would be vanishing. In other words, cities would become a thing of the past. Matters are not that simple, however, because the opposite trend may as well happen.1 Indeed, one of the general principles that will come out from our analysis is that the relationship between the decrease in transport costs and the degree of agglomeration of economic activities is not that expected by many analysts: agglomeration happens provided that transport costs are below some critical threshold, although further decreases may yield dispersion of some activities due to factor price differentials.2 In addition, technological progress brings about new types of innovative activities that beneﬁt most from being agglomerated and, therefore, tend to arise in developed areas (Audretsch and Feldman, 1996). Consequently, the wealth or poverty of people seems to be more and more related to the existence of prosperous and competitive clusters of speciﬁc industries, as well as to the presence of large and diversiﬁed metropolitan areas. The recent attitude taken in several institutional bodies and media seems to support this view. For example, in its Entering the 21st Century: World Development Report 1999/2000, the World Bank stresses the importance of economic agglomerations and cities for boosting growth and escaping from the poverty trap. Another example of this increasing awareness of the relevance of cities in modern economies can be found in The Economist (1995, p. 18): The liberalization of world trade and the inﬂuence of regional trading groups such as NAFTA and the EU will not only reduce the powers of national governments, but also increase those of cities. 
This is because an open trading system will have the effect of making national economies converge, thus evening out the competitive advantage of countries, while leaving those of cities largely untouched. So in the future, the arenas in which companies will compete may be cities rather than countries.

The remainder of this paper is organized as follows. In Section 2, we show why the competitive framework can hardly be the foundation for the economics of agglomeration. We then briefly review the alternative modeling strategies. In the hope of making our paper accessible to a broad audience, Section 3 presents in detail the two (specific) models that have been used so far to study the spatial distribution of economic activities. Several extensions of these models are discussed in Section 4. Section 5 concludes with some suggestions for further research and policy implications.

1. For example, recent studies show that, in the United States, 86 percent of net delivery capacity is concentrated in the 20 largest cities. This suggests that the United States is quickly becoming a country of digital haves and have-nots, with many small businesses unable to compete, and minority neighborhoods and rural areas getting left out.

2. Transportation (or transfer) costs are broadly defined to include all the factors that drive a wedge between prices at different locations, such as shipping costs per se, tariff and nontariff barriers to trade, different product standards, difficulty of communication, and cultural differences.

2. MODELING STRATEGIES OF ECONOMIC AGGLOMERATIONS

As a start, it is natural to ask the following question: to what extent is the competitive paradigm useful in understanding the main features of the economic landscape? The general competitive equilibrium model is indeed the benchmark used by economists when they want to study the market properties of an economic issue. Before proceeding, we should remind the reader that the essence of this model is that all trades are impersonal: when making their production or consumption decisions, economic agents need to know the price system only, which they take as given. At a competitive equilibrium, prices provide firms and consumers with all the information they must know to maximize their profit and their utility.

The most elegant and general model of a competitive economy is undoubtedly that developed by Arrow and Debreu. In this model, a commodity is defined not only by its physical characteristics, but also by the place it is made available. This implies that the same good traded at different places is treated as different economic commodities. Within this framework, choosing a location is part of choosing commodities. This approach integrates spatial interdependence of markets into general equilibrium in the same way as other forms of interdependence. Thus, the Arrow–Debreu model seems to obviate the need for a theory specific to the spatial context.

Unfortunately, as will be seen later, the competitive model cannot generate economic agglomerations without assuming strong spatial inhomogeneities. More precisely, we follow Starrett (1978) and show that introducing a homogeneous space (in a sense that will be made precise below) in the Arrow–Debreu model implies that total transport costs in the economy must be zero at any spatial competitive equilibrium, and thus trade and cities cannot arise in equilibrium. 
In other words, the competitive model per se cannot be used as the foundation for the study of a spatial economy because we are interested in identifying purely economic mechanisms leading agents to agglomerate in a featureless plain.3 This is because we concur with Hoover (1948, p. 3) for whom: Even in the absence of any initial differentiation at all, i.e., if natural resources were distributed uniformly over the globe, patterns of specialization and concentration of activities would inevitably appear in response to economic, social, and political principles.

3. Ellickson and Zame (1994) disagree with this claim and argue that the introduction of moving costs in a dynamic setting may be sufficient to save the competitive paradigm. To the best of our knowledge, however, the implications of their approach have not yet been fully worked out.

2.1. Breakdown of the Competitive Price Mechanism in a Homogeneous Spatial Economy

The economy is formed by agents (ﬁrms and households) and by commodities (goods and services). A ﬁrm is characterized by a set of production plans, with each production plan describing a possible input–output relation. A household is identiﬁed by a relation of preference, by a bundle of initial resources, and by shares in ﬁrms’ proﬁts. A competitive equilibrium is then described by a price system (one price per commodity), a production plan for each ﬁrm, and a consumption bundle for each household that satisﬁes the following conditions: at the prevailing prices (i) supply equals demand for each commodity; (ii) each ﬁrm maximizes its proﬁt subject to its production set; and (iii) each household maximizes her utility under her budget constraint deﬁned by the value of her initial endowment and her shares in ﬁrms’ proﬁts. In other words, all markets clear while each agent chooses her most preferred action at the equilibrium prices. Space involves a ﬁnite number of locations. Transportation within each location is costless, but shipping goods from one location to another requires the use of resources. Without loss of generality, transportation between any two locations is performed by a proﬁt-maximizing carrier who purchases goods in a location at the market prices prevailing in this location and sells them in the other location at the corresponding market prices, while using goods and land in each location as inputs. A typical ﬁrm produces in a small number of places. Likewise, a household has a very small number of residences. For simplicity, we therefore assume that each ﬁrm (each household) chooses a single location and engages in production (consumption) activities there. However, ﬁrms and households are free to choose any location they want (the industry is footloose). For expositional convenience, we distinguish explicitly prices and goods by their location. 
Given this convention, space is said to be homogeneous when (i) the utility function and the consumption set are the same regardless of the location in which the household resides, and (ii) the production set of a firm is independent of the location elected by this firm. In other words, consumers and producers have no intrinsic preferences for one location over others. In this context, the following unsuspected result, which we call the Spatial Impossibility Theorem, has been proven by Starrett (1978).

Theorem 2.1. Consider an economy with a finite number of agents and locations. If space is homogeneous, transport is costly, and preferences are locally nonsatiated, then there is no competitive equilibrium involving transportation.

What does it mean? If economic activities are perfectly divisible, a competitive equilibrium exists and is such that each location operates as an autarky. For example, when households are identical, locations have the same relative prices and the same production structure (backyard capitalism). This is hardly a surprising outcome because, by assumption, there is no reason for economic agents to distinguish among locations and each activity can operate at an arbitrarily small level. Firms and households thus succeed in reducing transport costs to their absolute minimum, namely zero. However, as observed by Starrett (1978, p. 27), when economic activities are not perfectly divisible, the transport of some goods between some places becomes unavoidable:

. . . as long as there are some indivisibilities in the system (so that individual operations must take up space) then a sufficiently complicated set of interrelated activities will generate transport costs (Starrett 1978, p. 27).

In this case, the Spatial Impossibility Theorem tells us that no competitive equilibrium exists. This is clearly a surprising result that requires further explanation. For simplicity, we restrict ourselves to the case of two locations, A and B. When the two locations are not in autarky, one should keep in mind that the price system must do two different jobs simultaneously: (i) support trade between locations (while clearing the markets in each location) and (ii) prevent firms and households from relocating. The Spatial Impossibility Theorem says that, in the case of a homogeneous space, it is impossible to kill two birds with one stone: the price gradients supporting trade send the wrong signals from the viewpoint of locational stability. Indeed, if a set of goods is exported from A to B, then the associated positive price gradients induce producers located in A (who seek a higher revenue) to relocate in B, whereas location B’s buyers (who seek lower prices) want to relocate in A. Likewise, the export of another set of goods from B to A encourages such “cross-relocation.” The land rent differential between the two locations can discourage relocation in one direction only. Hence, as long as trade occurs at positive costs, some agents always want to relocate.

To ascertain the fundamental cause of this nonexistence, it is helpful to illustrate the difficulty by means of a standard diagram. Depicting the whole trade pattern between two locations would require a diagram with six dimensions (two tradable goods and land at each location), which is a task beyond our capability. Thus, we focus on a two-dimensional subspace of the whole pattern by considering the production of good i only, which is traded between A and B, while keeping the other elements fixed.
Because the same physical good available at two distinct locations corresponds to two different commodities, this is equivalent to studying the production possibility frontier between two different economic goods. Suppose that, at most, one unit of good i is produced by one firm at either location using a fixed bundle of inputs. For simplicity, the cost of these inputs is assumed to be the same in both locations. The good is shipped according to an iceberg technology: when $x_i$ units of the good are moved between A and B, only the amount $x_i/\Upsilon$ arrives at its destination, with $\Upsilon > 1$, whereas the rest melts away en route (Samuelson, 1983). In this context, if the firm is located in A, then the output is represented by point $E$ on the vertical axis in Figure 8.1; if the entire output is shipped to B, then the fraction $1/\Upsilon$ arrives at B, which is denoted by point $F$ on the horizontal axis. Hence, when the firm is at A, the set of feasible allocations of the output between the two locations is given by the triangle $OEF$. Space being homogeneous, if the firm locates at B, the set of feasible allocations between the two places is now given by the triangle $OE'F'$. Hence, when the firm is not yet located, the set of feasible allocations is given by the union of the two triangles.

Figure 8.1. The set of feasible allocations in a homogeneous space. (Horizontal axis: $x_{iB}$; vertical axis: $x_{iA}$. Points $E$ at 1 and $F'$ at $1/\Upsilon$ lie on the vertical axis; points $F$ at $1/\Upsilon$ and $E'$ at 1 lie on the horizontal axis; the price vector $(p_{iA}, p_{iB})$ is also marked.)

Let the firm be set up at A and assume that the demand conditions are such that good i is consumed in both locations so that trade occurs. Then, to support any feasible trade pattern, represented by an interior point of the segment $EF$, the price vector $(p_{iA}, p_{iB})$ must be such that $p_{iA}/p_{iB} = 1/\Upsilon$, as shown in Figure 8.1. However, under these prices, it is clear that the firm can obtain a strictly higher profit by locating in B and choosing the production plan $E'$ in Figure 8.1. This implies that there is no competitive price system that can support both the existence of trade and a profit-maximizing location for the firm.

This difficulty arises from the nonconvexity of the set of feasible allocations. If transportation were costless, the set of feasible allocations would be given by the triangle $OEE'$ in Figure 8.1, which is convex. In this case, the firm would face no incentive to relocate. Similarly, if the firm’s production activity were perfectly divisible, this set would again be equal to the triangle $OEE'$, and no difficulty would arise. Therefore, even though the individual land consumption is endogenous, we may conclude that the fundamental reason for the Spatial Impossibility Theorem is the nonconvexity of the set of feasible allocations caused by the existence of positive transport costs and the fact that agents have an address in space.
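The argument built on Figure 8.1 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the value of the iceberg cost, the chosen trade pattern, and all function names are assumptions made for illustration. It verifies that, at prices supporting trade from A, relocating to B and selling the whole output there yields a higher revenue, and that the union of the two triangles is not convex.

```python
# Illustrative sketch of Figure 8.1; all names and numbers are assumptions.

def feasible_from_A(x_A, x_B, upsilon):
    """Triangle OEF: one unit produced at A; delivering x_B to B uses up upsilon * x_B."""
    return x_A >= 0 and x_B >= 0 and x_A + upsilon * x_B <= 1 + 1e-12

def feasible_from_B(x_A, x_B, upsilon):
    """Triangle OE'F': one unit produced at B; delivering x_A to A uses up upsilon * x_A."""
    return x_A >= 0 and x_B >= 0 and upsilon * x_A + x_B <= 1 + 1e-12

def feasible(x_A, x_B, upsilon):
    """Union of the two triangles (the firm is free to choose its location)."""
    return feasible_from_A(x_A, x_B, upsilon) or feasible_from_B(x_A, x_B, upsilon)

def revenue(x_A, x_B, p_A, p_B):
    """Value of the quantities delivered in A and in B at the local prices."""
    return p_A * x_A + p_B * x_B

upsilon = 1.5              # hypothetical iceberg cost, upsilon > 1
p_A, p_B = 1.0, upsilon    # prices supporting trade from A: p_A / p_B = 1 / upsilon

# A trade pattern on segment EF: keep 0.4 at A, ship 0.6, of which 0.6 / upsilon arrives.
x_A, x_B = 0.4, 0.6 / upsilon
revenue_trading_from_A = revenue(x_A, x_B, p_A, p_B)   # equals p_A anywhere on EF
revenue_at_E_prime = revenue(0.0, 1.0, p_A, p_B)       # relocate to B, sell everything there

# Nonconvexity: E = (1, 0) and E' = (0, 1) are feasible, but their midpoint is not.
midpoint_feasible = feasible(0.5, 0.5, upsilon)
```

With these illustrative numbers, every point of the segment $EF$ is worth $p_{iA}$ to the firm, whereas the plan $E'$ is worth $p_{iB} = \Upsilon p_{iA}$, which is the relocation incentive described above.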


Some remarks are still in order. First, we have assumed that each firm locates in a single region. The theorem could be generalized to permit firms to run distinct plants, one plant per location, because each plant amounts to a separate firm in the competitive setting (Koopmans, 1957). Second, we have considered a closed economy. The theorem can be readily extended to allow for trade with the rest of the world provided that each location has the same access to the world markets to satisfy the assumption of a homogeneous space. Third, the size of the economy is immaterial for the Spatial Impossibility Theorem to hold in that assuming a “large economy,” in which competitive equilibria often emerge as the outcome generated by several institutional mechanisms, does not affect the result because the value of total transport costs within the economy rises when agents are replicated. Last, the following result sheds extra light on the meaning of the Spatial Impossibility Theorem (Fujita and Thisse, 2002).

Corollary 2.2. If there exists a competitive equilibrium in a spatial economy with a homogeneous space, then the land rent must be the same in all locations.

This result has the following fundamental implication for us: in a homogeneous space, the competitive price mechanism is unable to explain why the land rent is higher in an economic agglomeration (such as a city, a central business district, or an industrial cluster) than in the surrounding area. This clearly shows the limits of the competitive paradigm for studying the agglomeration of firms and households.

2.2. What Are the Alternative Modeling Strategies?

Thus, if we want to understand something about the spatial distribution of economic activities and, in particular, the formation of major economic agglomerations as well as regional specialization and trade, the Spatial Impossibility Theorem tells us that we must make at least one of the following three assumptions:

(i) space is heterogeneous (as in the neoclassical theory of international trade);
(ii) externalities in production and consumption exist (as in urban economics);
(iii) markets are imperfectly competitive (as in the so-called “new” economic geography).

Of course, in reality, economic spaces are the outcome of different combinations of these three agglomeration forces. However, it is convenient here to distinguish them to figure out the effects of each one of them.

A. Comparative advantage models. The heterogeneity of space introduces the uneven distribution of immobile resources (such as mineral deposits or some production factors) and amenities (climate), as well as the existence of transport nodes (ports, transhipment points) or trading places. This approach, while retaining the assumption of constant returns and perfect competition, yields comparative advantage among locations and gives rise to interregional and intercity trade.

B. Externality models. Unlike models of comparative advantage, the basic forces for spatial agglomeration and trade are generated endogenously through nonmarket interactions among firms and/or households (knowledge spillovers, business communications, and social interactions). Again, this approach allows us to appeal to the constant returns/perfect competition paradigm.⁴

C. Imperfect competition models. Firms are no longer price-takers, thus making their price policy dependent on the spatial distribution of consumers and firms. This generates some form of direct interdependence between firms and households that may produce agglomerations. However, it is useful to distinguish two types of approaches.

C1. Monopolistic competition. This leads to some departure from the competitive model and allows for firms to be price-makers and to produce differentiated goods under increasing returns; however, strategic interactions are weak because one assumes a continuum of firms.

C2. Oligopolistic competition. Here, we face the integer aspect of location explicitly. That is, we assume a finite number of large agents (firms, local governments, and land developers) who interact strategically by accounting for their market power.

The implications of the modeling strategy selected are important. For example, models under A, B, and C1 permit the use of a continuous density approach that seems to be in line with what geographers do. By contrast, under C2, it is critical to know “who is where” and with whom the corresponding agent interacts. In addition, if we focus on the heterogeneity of space, the market outcome is socially optimal. On the other hand, because the other two approaches involve market failures, the market outcome is likely to be inefficient.

Models of comparative advantage have been extensively studied by international and urban economists (Fujita, 1989), whereas models of spatial competition have attracted a lot of attention in industrial organization (Anderson, de Palma, and Thisse, 1992). Because Ed Glaeser and José Scheinkman deal with nonmarket interactions, we choose to focus on market interactions, that is, models belonging to class C1. Although this class of models was initially developed in the context of intraurban agglomeration with a land market (e.g., Fujita 1988), we restrict ourselves to multiregional models of industrial agglomeration.

⁴ See, e.g., the now classical papers by Henderson (1974) and by Fujita and Ogawa (1982).


3. CORE AND PERIPHERY: A MONOPOLISTIC COMPETITION APPROACH

The spatial economy is replete with pecuniary externalities. For example, when some workers choose to migrate, they are likely to affect both the labor and product markets in their region of origin, thus affecting the well-being of those who stay put. Moreover, the moving workers do not account for the impact of their decision on the workers and firms located in the region of destination either. Still, their moves will increase the level of demand inside this region, thus making the place more attractive to firms. Everything else being equal, they will also depress the local labor market so that the local wage is likely to be affected negatively. In sum, these various changes may increase or decrease the attractiveness of the destination region for outside workers and firms. Such pecuniary externalities are especially relevant in the context of imperfectly competitive markets, because prices do not reflect perfectly the social values of individual decisions. They are also better studied within a general equilibrium context to account for the interactions between the product and labor markets. In particular, such a framework allows us to study the dual role of individuals as workers and consumers.

At first sight, this seems to be a formidable task. Yet, as shown by Krugman (1991a), several of these various effects can be combined and studied within a simple enough general equilibrium model of monopolistic competition, which has come to be known as the core-periphery model. Recall that monopolistic competition in the manner of Chamberlin involves consumers with a preference for variety (varietas delectat), whereas firms producing these varieties compete for a limited amount of resources because they face increasing returns. The prototype that has emerged from the industrial organization literature is the model developed by Spence (1976) and Dixit and Stiglitz (1977), sometimes called the S-D-S model. These authors assume that each firm is negligible in the sense that it may ignore its impact on, and hence reactions from, other firms, but retains enough market power for pricing above marginal cost regardless of the total number of firms (like a monopolist). Moreover, the position of a firm’s demand depends on the actions taken by all firms in the market (as in perfect competition).

In many applications, the S-D-S model has proven to be a very powerful instrument for studying the aggregate implications of monopoly power and increasing returns, especially when these are the basic ingredients of self-sustaining processes such as those encountered in modern theories of growth and geography (Matsuyama, 1995). This is so for the following reasons. First, although each firm is a price-maker, strategic interactions are very weak in this model, thus making the existence of an equilibrium much less problematic than in general equilibrium under imperfect competition (see, e.g., Bonanno, 1990). Second, the assumption of free entry and exit leads to zero profit so that a worker’s income is just equal to her wage, another major simplification. Last, the difference between price competition and quantity competition that plagues oligopoly models is immaterial in a monopolistic competitive setting. Indeed, being negligible to the market, each firm behaves as a monopolist on its residual demand, which makes it indifferent between using price or quantity as a strategy.

3.1. The Framework

We consider a 2 × 2 × 2 setting. The economic space is made of two regions (A and B). The economy has two sectors, the modern sector (M) and the traditional sector (T). There are two production factors, the high-skilled workers (H) and the low-skilled workers (L). The M-sector produces a continuum of varieties of a horizontally differentiated product under increasing returns, using H as the only input. The T-sector produces a homogeneous good under constant returns, using unskilled labor L as the only input. The economy is endowed with L unskilled workers and with H skilled workers (labor dualism). The skilled workers are perfectly mobile between regions, whereas the unskilled workers are immobile. This extreme assumption is justified because the skilled are more mobile than the unskilled over long distances (SOPEMI 1998). Finally, the unskilled workers are equally distributed between the two regions, and thus regions are a priori symmetric.

The technology in the T-sector is such that one unit of output requires one unit of L. The output of the T-sector is costlessly traded between any two regions and is chosen as the numéraire so that $p_T = 1$. Hence, the wage of the unskilled workers is also equal to 1 in both regions. Each variety of the M-sector is produced according to the same technology such that the production of the quantity $q(i)$ requires $l(i)$ units of skilled labor given by

\[
l(i) = f + c\,q(i), \tag{3.1}
\]

in which $f$ and $c$ are, respectively, the fixed and marginal labor requirements. Because there are increasing returns but no scope economies, each variety is produced by a single firm. This is because, due to the consumers’ preference for variety, any firm obtains a higher share of the market by producing a differentiated variety than by replicating an existing one.

The market equilibrium is the outcome of the interplay between a dispersion force and an agglomeration force. The centrifugal force is very simple. It has two sources: (i) the spatial immobility of the unskilled whose demands for the manufactured good are to be met and (ii) the fiercer competition that arises when firms locate back to back (d’Aspremont, Gabszewicz, and Thisse, 1979). The centripetal force is more involved. If a larger number of firms is located in one region, the number of varieties locally produced is also larger. This in turn induces some skilled living in the smaller region to move toward the larger region in which they may enjoy a higher standard of living. The resulting increase in the number of consumers creates a larger demand for the differentiated good which, therefore, leads additional firms to locate in this region. This implies the availability of more varieties in the region in question, but fewer in the other because there are scale economies at the firm’s level. Consequently, as noted by Krugman (1991a, p. 486), there is circular causation in the manner of Myrdal, because these two effects reinforce each other: “manufactures production will tend to concentrate where there is a large market, but the market will be large where manufactures production is concentrated.”

Let $\lambda$ be the fraction of skilled residing in region A and denote by $v_r(\lambda)$ the indirect utility a skilled worker enjoys in region $r = A, B$ when the spatial distribution of skilled is $(\lambda, 1 - \lambda)$. A spatial equilibrium arises at $\lambda \in (0, 1)$ when $v(\lambda) \equiv v_A(\lambda) - v_B(\lambda) = 0$, at $\lambda = 0$ when $v(0) \le 0$, or at $\lambda = 1$ when $v(1) \ge 0$. Such an equilibrium always exists when $v_r(\lambda)$ is a continuous function of $\lambda$. However, this equilibrium is not necessarily unique. Stability is then used to eliminate some of them. The stability of such an equilibrium is studied with respect to the following equation of motion:⁵

\[
\dot{\lambda} = \lambda\, v(\lambda)\,(1 - \lambda). \tag{3.2}
\]
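The equation of motion (3.2) can be explored with a simple Euler discretization. The sketch below is only illustrative: the two utility differentials are hypothetical linear functions chosen to display an unstable and a stable symmetric configuration, they are not derived from the model, and the step size and horizon are arbitrary assumptions.

```python
def simulate_migration(v, lam0, dt=0.01, steps=20000):
    """Euler steps for d(lam)/dt = lam * v(lam) * (1 - lam); returns the final share."""
    lam = lam0
    for _ in range(steps):
        lam += dt * lam * v(lam) * (1.0 - lam)
        lam = min(1.0, max(0.0, lam))   # keep the share inside [0, 1]
    return lam

# Hypothetical utility differentials v_A - v_B, NOT derived from the model:
def v_agglomeration(lam):
    """Increasing in lam: the symmetric configuration is unstable."""
    return lam - 0.5

def v_dispersion(lam):
    """Decreasing in lam: the symmetric configuration is stable."""
    return 0.5 - lam

final_agglomeration = simulate_migration(v_agglomeration, lam0=0.6)
final_dispersion = simulate_migration(v_dispersion, lam0=0.6)
```

Starting slightly above one half, the first differential drives the share of skilled workers in A toward 1, whereas the second brings it back to 1/2. In both cases the rest point is only approached asymptotically, which is a property of this type of dynamic.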

If $v(\lambda)$ is positive and $\lambda \in (0, 1)$, workers move from B to A; if it is negative, they go in the opposite direction. Clearly, any spatial equilibrium is such that $\dot{\lambda} = 0$. A spatial equilibrium is stable if, for any marginal deviation of the population distribution from the equilibrium, the equation of motion brings the distribution of skilled workers back to the original one.⁶ We assume that local labor markets adjust instantaneously when some skilled workers move from one region to the other. More precisely, the number of firms in each region must be such that the labor market-clearing conditions (3.12) and (3.22) remain valid for the new distribution of workers. Wages are then adjusted in each region for each firm to earn zero profits in any region having skilled workers, because the skilled move according to the utility differential.

⁵ This dynamic implies that the equilibrium is reached for $t \to \infty$. One could alternately use the dynamic system proposed by Tabuchi (1986) in which the corner solutions $\lambda = 0$ and $\lambda = 1$ are reached within finite time. The difference becomes critical when the economy exhibits different equilibrium patterns over time.

⁶ Note that (3.2) provides one more justification for working with a continuum of agents: this modeling strategy allows one to respect the integer nature of an agent’s location (her address) while describing the evolution of the regional share of production by means of a differential equation.

3.2. A Model with CES Utility and Iceberg Transport Costs

Although consumption takes place in a specific region, it is notationally convenient to describe preferences without explicitly referring to any particular region. Preferences are identical across all workers and described by a Cobb–Douglas utility:

\[
u = \frac{Q^{\mu}\, T^{1-\mu}}{\mu^{\mu} (1 - \mu)^{1-\mu}}, \qquad 0 < \mu < 1, \tag{3.3}
\]

where $Q$ stands for an index of the consumption of the modern sector varieties, and $T$ is the consumption of the output of the traditional sector. Because the modern sector provides a continuum of varieties of size $M$, the index $Q$ is given by

\[
Q = \left[ \int_0^M q(i)^{\rho}\, di \right]^{1/\rho}, \qquad 0 < \rho < 1, \tag{3.4}
\]

where $q(i)$ represents the consumption of variety $i \in [0, M]$. Hence, each consumer displays a preference for variety. In (3.4), the parameter $\rho$ stands for the inverse of the intensity of love for variety over the differentiated product. When $\rho$ is close to 1, varieties are close to perfect substitutes; when $\rho$ decreases, the desire to spread consumption over all varieties increases. If $\sigma \equiv 1/(1 - \rho)$, then $\sigma$ is the elasticity of substitution between any two varieties. Because there is a continuum of firms, each firm is negligible and the interactions between any two firms are zero, but aggregate market conditions of some kind (e.g., the average price across firms) affect any single firm. This provides a setting in which firms are not competitive (in the classic economic sense of having infinite demand elasticity), but at the same time they have no strategic interactions with one another [see (3.5)].

If $y$ denotes the consumer income and $p(i)$ the price of variety $i$, then the demand functions are

\[
q(i) = \mu y\, p(i)^{-\sigma} P^{\sigma - 1}, \qquad i \in [0, M], \tag{3.5}
\]

where $P$ is the price index of the differentiated product given by

\[
P \equiv \left[ \int_0^M p(i)^{-(\sigma - 1)}\, di \right]^{-1/(\sigma - 1)}. \tag{3.6}
\]

The corresponding indirect utility function is

\[
v = y P^{-\mu}. \tag{3.7}
\]
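Equations (3.3) to (3.7) can be verified on a discrete analogue of the model. The sketch below replaces the continuum of varieties with a finite list of varieties of unit mass, which is an assumption made purely for illustration, as are the parameter values and function names. It checks that demands (3.5) exhaust the share $\mu$ of income and that plugging them into (3.3) reproduces the indirect utility (3.7) when $p_T = 1$.

```python
def price_index(prices, sigma):
    """Discrete analogue of (3.6): P = (sum_i p_i^(1 - sigma))^(1 / (1 - sigma))."""
    return sum(p ** (1.0 - sigma) for p in prices) ** (1.0 / (1.0 - sigma))

def demands(prices, income, mu, sigma):
    """Discrete analogue of (3.5): q_i = mu * y * p_i^(-sigma) * P^(sigma - 1)."""
    P = price_index(prices, sigma)
    return [mu * income * p ** (-sigma) * P ** (sigma - 1.0) for p in prices]

def utility(quantities, T, mu, rho):
    """Cobb-Douglas utility (3.3) over the CES index Q of (3.4) and the traditional good T."""
    Q = sum(q ** rho for q in quantities) ** (1.0 / rho)
    return Q ** mu * T ** (1.0 - mu) / (mu ** mu * (1.0 - mu) ** (1.0 - mu))

# Hypothetical parameter values chosen for illustration only.
mu, rho = 0.4, 0.8
sigma = 1.0 / (1.0 - rho)
income = 10.0
prices = [1.0, 1.2, 0.8, 1.5]

q = demands(prices, income, mu, sigma)
P = price_index(prices, sigma)
spending_on_varieties = sum(p * qi for p, qi in zip(prices, q))
T = (1.0 - mu) * income            # the numeraire good has price 1
direct_utility = utility(q, T, mu, rho)
indirect_utility = income * P ** (-mu)
```

The normalization by $\mu^{\mu}(1-\mu)^{1-\mu}$ in (3.3) is what makes the indirect utility take the simple form $y P^{-\mu}$ without an additional constant.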

Without loss of generality, we choose the unit of skilled labor such that $c = 1$ in (3.1). The output of the M-sector is shipped at a positive cost according to the “iceberg” technology: When one unit of the differentiated product is moved from region $r$ to region $s$, only a fraction $1/\Upsilon$ arrives at its destination with $\Upsilon > 1$. Because mill and discriminatory pricing can be shown to be equivalent in the present setting, we may use the mill pricing interpretation in what follows. When variety $i$ is sold in region $r$ at the mill price $p_r(i)$, the price $p_{rs}(i)$ paid by a consumer located in region $s$ ($\neq r$) is $p_{rs}(i) = p_r(i)\Upsilon$. If the distribution of firms is $(M_r, M_s)$, using (3.6) the price index $P_r$ in region $r$ is then given by

\[
P_r = \left[ \int_0^{M_r} p_r(i)^{-(\sigma - 1)}\, di + \Upsilon^{-(\sigma - 1)} \int_0^{M_s} p_s(i)^{-(\sigma - 1)}\, di \right]^{-1/(\sigma - 1)}, \tag{3.8}
\]

which clearly depends on the spatial distribution of firms, as well as the level of transport costs.

Let $w_r$ denote the wage rate of a skilled worker living in region $r$. Because there is free entry and exit and, therefore, zero profit in equilibrium, the income of region $r$ is

\[
Y_r = \lambda_r H w_r + L/2, \qquad r = A, B, \tag{3.9}
\]

where $\lambda_r$ is the share of skilled workers residing in region $r$. Using (3.5), the total demand of the firm producing variety $i$ and located in region $r$ is

\[
q_r(i) = \mu\, p_r(i)^{-\sigma}\, Y_r\, P_r^{\sigma - 1} + \mu\, p_r(i)^{-\sigma}\, Y_s\, \Upsilon^{-(\sigma - 1)}\, P_s^{\sigma - 1}. \tag{3.10}
\]

Because each firm has a negligible impact on the market, it may accurately neglect the impact of a price change on consumers’ income ($Y_r$) and on other firms’ prices, hence on the regional price indexes ($P_r$). Consequently, (3.10) implies that, regardless of the spatial distribution of consumers, each firm faces an isoelastic demand. This very convenient property depends crucially on the assumption of an iceberg transport cost, which affects here the level of demand but not its elasticity.

The profit function of a firm in $r$ is

\[
\pi_r(i) = [p_r(i) - w_r]\, q_r(i) - w_r f.
\]

Because varieties are equally weighted in the utility function, the equilibrium price is the same across all firms located in region $r$. Solving the first-order condition yields the common equilibrium price

\[
p_r^* = \frac{w_r}{\rho}. \tag{3.11}
\]

Substituting $p_r^*$ into $\pi_r(i)$ leads to

\[
\pi_r = \frac{w_r}{\sigma - 1}\, [q_r - (\sigma - 1) f].
\]
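The markup rule (3.11) and the profit expression above can be checked numerically. The following sketch assumes an isoelastic demand $q = \text{scale} \cdot p^{-\sigma}$, as implied by (3.10) when incomes and price indexes are taken as given; the demand shifter `scale`, the wage, the fixed requirement, and the grid are hypothetical values introduced for illustration.

```python
def profit(p, w, f, scale, sigma):
    """(p - w) * q - w * f with isoelastic demand q = scale * p**(-sigma) and c = 1."""
    q = scale * p ** (-sigma)
    return (p - w) * q - w * f

sigma = 5.0
rho = (sigma - 1.0) / sigma
w, f, scale = 1.0, 0.1, 2.0        # hypothetical wage, fixed requirement, demand shifter

# Grid search over prices strictly above the marginal cost w.
grid = [w * (1.0 + k * 1e-4) for k in range(1, 20001)]
p_best = max(grid, key=lambda p: profit(p, w, f, scale, sigma))
p_star = w / rho                   # equation (3.11)

# Profit at the markup price when output equals (sigma - 1) * f.
q_zero_profit = (sigma - 1.0) * f
profit_at_q_zero = (p_star - w) * q_zero_profit - w * f
```

The grid maximizer coincides with $w_r/\rho$ up to the grid step whatever the value of `scale`, which illustrates why the level of demand affects a firm's output and profit but not its price.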

Under free entry, profits are zero so that the equilibrium output of a firm is given by $q_r^* = (\sigma - 1) f$, which is independent of the spatial distribution of demand. As a result, in equilibrium, a firm’s labor requirement is a constant given by $l^* = \sigma f$, and thus the total number of firms in the M-sector is equal to $H/(\sigma f)$. The corresponding distribution of firms

\[
M_r = \lambda_r H/(\sigma f), \qquad r = A, B, \tag{3.12}
\]

depends only on the distribution of the skilled workers. Hence, the model allows for studying the spatial distribution of the modern sector but not for its size. Introducing the equilibrium prices (3.11) and substituting (3.12) for $M_r$ in the regional price index (3.8) gives

\[
P_r = \kappa_1 \left[ \lambda_r w_r^{-(\sigma - 1)} + \lambda_s (w_s \Upsilon)^{-(\sigma - 1)} \right]^{-1/(\sigma - 1)}, \tag{3.13}
\]

where $\kappa_1$ is a positive constant.

Finally, we consider the labor market-clearing conditions for a given distribution of workers. The wage prevailing in region $r$ is the highest wage that firms located there can pay under the nonnegative profit constraint. For that, we evaluate the demand (3.10) as a function of the wage through the equilibrium price (3.11):

\[
q_r(w_r) = \mu\, (w_r/\rho)^{-\sigma} \left[ Y_r P_r^{\sigma - 1} + Y_s \Upsilon^{-(\sigma - 1)} P_s^{\sigma - 1} \right].
\]

Because this expression is equal to $(\sigma - 1) f$ when profits are zero, we obtain the following implicit expression for the zero-profit wages:

\[
w_r^* = \kappa_2 \left[ Y_r P_r^{\sigma - 1} + Y_s \Upsilon^{-(\sigma - 1)} P_s^{\sigma - 1} \right]^{1/\sigma}, \tag{3.14}
\]

where $\kappa_2$ is a positive constant. Clearly, $w_r^*$ is the equilibrium wage in region $r$ when $\lambda_r > 0$. Substituting (3.9) for $Y_r$ in the indirect utility (3.7), we obtain the real wage as follows:

\[
v_r = \omega_r = \frac{w_r^*}{P_r^{\mu}}, \qquad r = A, B. \tag{3.15}
\]
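For a given distribution of skilled workers, equations (3.9), (3.13), (3.14), and (3.15) form a system in the two wages that can be solved numerically. The sketch below uses plain fixed-point iteration on wages. Several elements are assumptions of this illustration rather than statements of the text: the explicit values of $\kappa_1$ and $\kappa_2$ (worked out here from (3.8), (3.11), (3.12), and the zero-profit condition), the default endowments, the starting wages, and the iteration scheme itself, whose convergence is not guaranteed for every parameter configuration.

```python
def short_run_equilibrium(lam, upsilon, mu, sigma, H=1.0, L=1.0, f=1.0,
                          tol=1e-12, max_iter=100000):
    """Iterate on (3.9), (3.13), (3.14); return wages, price indexes, real wages for (A, B)."""
    rho = (sigma - 1.0) / sigma
    # Constants derived for this sketch; the text only states that they are positive.
    kappa1 = (H / (sigma * f)) ** (-1.0 / (sigma - 1.0)) / rho
    kappa2 = rho * (mu / ((sigma - 1.0) * f)) ** (1.0 / sigma)
    phi = upsilon ** (-(sigma - 1.0))
    share = (lam, 1.0 - lam)               # (lambda_A, lambda_B)

    def price_indexes(w):
        # Equation (3.13) for r = A (index 0) and r = B (index 1).
        return [kappa1 * (share[r] * w[r] ** (1.0 - sigma)
                          + share[1 - r] * (w[1 - r] * upsilon) ** (1.0 - sigma))
                ** (-1.0 / (sigma - 1.0)) for r in (0, 1)]

    w = [1.0, 1.0]                         # arbitrary starting wages
    for _ in range(max_iter):
        Y = [share[r] * H * w[r] + L / 2.0 for r in (0, 1)]          # (3.9)
        P = price_indexes(w)
        w_new = [kappa2 * (Y[r] * P[r] ** (sigma - 1.0)
                           + Y[1 - r] * phi * P[1 - r] ** (sigma - 1.0)) ** (1.0 / sigma)
                 for r in (0, 1)]                                     # (3.14)
        change = max(abs(w_new[r] - w[r]) / w[r] for r in (0, 1))
        w = w_new
        if change < tol:
            break
    P = price_indexes(w)
    omega = [w[r] / P[r] ** mu for r in (0, 1)]                       # (3.15)
    return w, P, omega

# Hypothetical parameter values: full agglomeration in A and the symmetric pattern.
w_core, P_core, omega_core = short_run_equilibrium(lam=1.0, upsilon=1.5, mu=0.4, sigma=5.0)
w_sym, P_sym, omega_sym = short_run_equilibrium(lam=0.5, upsilon=1.5, mu=0.4, sigma=5.0)
```

When $\lambda = 1$, the wage computed for region B is the wage a firm could pay if it deviated there, so the ratio `omega_core[1] / omega_core[0]` is the quantity used to assess whether full agglomeration in A can be sustained.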

Finally, Walras’ law implies that the traditional sector market is in equilibrium provided that the equilibrium conditions noted previously are satisfied. Summarizing the foregoing developments, the basic equations for our economy are given by (3.9), (3.13), (3.14), and (3.15). From now on, set $\lambda_A = \lambda$ and $\lambda_B = 1 - \lambda$.

3.2.1. The Core-Periphery Structure

Suppose that the modern sector is concentrated in one region, say region A, so that $\lambda = 1$. We wish to determine conditions under which the real wage a skilled worker may obtain in region B does not exceed the real wage she gets in region A. Setting $\lambda = 1$ in (3.9), (3.13), (3.14), and (3.15), we get

\[
\frac{\omega_B}{\omega_A} = \left[ \frac{1 + \mu}{2}\, \Upsilon^{-\sigma(\mu + \rho)} + \frac{1 - \mu}{2}\, \Upsilon^{-\sigma(\mu - \rho)} \right]^{1/\sigma}. \tag{3.16}
\]

The first term in the right-hand side of (3.16) is always decreasing in $\Upsilon$. Therefore, if $\mu \ge \rho$, the second term is also nonincreasing so that the ratio $\omega_B/\omega_A$ always decreases with $\Upsilon$, thus implying that $\omega_B < \omega_A$ for all $\Upsilon > 1$. This means that the core-periphery structure is a stable equilibrium for all $\Upsilon > 1$. When

\[
\mu \ge \rho, \tag{3.17}
\]

varieties are so differentiated that firms’ demands are not very sensitive to differences in transportation costs, thus making the agglomeration force very strong. More interesting is the case in which

\[
\mu < \rho; \tag{3.18}
\]

that is, varieties are not very differentiated so that firms’ demands are sufficiently elastic for the agglomeration force to be weak. If (3.18) holds, $\Upsilon^{-\mu\sigma + \sigma - 1}$ goes to infinity when $\Upsilon \to \infty$ and the ratio $\omega_B/\omega_A$ is as depicted in Figure 8.2. In this case, there exists a single value $\Upsilon_{\text{sustain}} > 1$ such that $\omega_B/\omega_A = 1$. Hence, the agglomeration is a stable equilibrium for any $\Upsilon \le \Upsilon_{\text{sustain}}$. This occurs because firms can enjoy all the benefits of agglomeration without losing much of their business in the other region. Such a point is called the sustain point because, once firms are fully agglomerated, they stay so for all smaller values of $\Upsilon$. On the other hand, when transportation costs are sufficiently high ($\Upsilon > \Upsilon_{\text{sustain}}$), firms lose much on their exports, and thus the core-periphery structure is no longer an equilibrium.

Figure 8.2. Determination of the sustain point. (Horizontal axis: $\Upsilon$, marked at 1 and $\Upsilon_{\text{sustain}}$; vertical axis: $\omega_B/\omega_A$, marked at 1.)

Summarizing this discussion, we obtain:

Proposition 3.1. Consider a two-region economy.
(i) If $\mu \ge \rho$, then the core-periphery structure is always a stable equilibrium.
(ii) If $\mu < \rho$, then there exists a unique solution $\Upsilon_{\text{sustain}} > 1$ to the equation

\[
\frac{1 + \mu}{2}\, \Upsilon^{-\sigma(\mu + \rho)} + \frac{1 - \mu}{2}\, \Upsilon^{-\sigma(\mu - \rho)} = 1,
\]

such that the core-periphery structure is a stable equilibrium for any $\Upsilon \le \Upsilon_{\text{sustain}}$.

Interestingly, this proposition provides formal support to the claim made by Kaldor (1970, p. 241) more than 30 years ago:

When trade is opened up between them, the region with the more developed industry will be able to supply the need of the agricultural area of the other region on more favourable terms: with the result that the industrial centre of the second region will lose its market and will tend to be eliminated.
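The sustain point of Proposition 3.1 has no closed form, but it is easy to compute. The sketch below solves the equation in part (ii) by bisection and, for comparison, evaluates the break point given by equation (3.19) in Section 3.2.2. The parameter values, the bracketing strategy, and the function names are assumptions made for this illustration.

```python
def real_wage_ratio(upsilon, mu, rho):
    """omega_B / omega_A under full agglomeration in A, equation (3.16)."""
    sigma = 1.0 / (1.0 - rho)
    inside = (0.5 * (1.0 + mu) * upsilon ** (-sigma * (mu + rho))
              + 0.5 * (1.0 - mu) * upsilon ** (-sigma * (mu - rho)))
    return inside ** (1.0 / sigma)

def sustain_point(mu, rho, tol=1e-12):
    """Bisection for the unique upsilon > 1 at which the ratio (3.16) returns to 1."""
    if mu >= rho:
        raise ValueError("no sustain point when mu >= rho: agglomeration is always stable")
    lo = 1.0 + 1e-6          # just above 1 the ratio lies below 1
    hi = 2.0
    while real_wage_ratio(hi, mu, rho) <= 1.0:   # expand until the ratio exceeds 1
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if real_wage_ratio(mid, mu, rho) < 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

def break_point(mu, rho):
    """Break point of equation (3.19); requires mu < rho."""
    sigma = 1.0 / (1.0 - rho)
    return ((rho + mu) * (1.0 + mu) / ((rho - mu) * (1.0 - mu))) ** (1.0 / (sigma - 1.0))

# Hypothetical parameter values satisfying (3.18): mu < rho.
mu, rho = 0.4, 0.8
upsilon_sustain = sustain_point(mu, rho)
upsilon_break = break_point(mu, rho)
```

For these illustrative values the sustain point is roughly 1.81 and the break point roughly 1.63, so there is an interval of transport costs over which full agglomeration is sustainable.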

3.2.2. The Symmetric Structure

Proposition 3.1 suggests that the modern sector is geographically dispersed when transportation costs are high, at least when (3.18) holds. To check this, we consider the symmetric configuration ($\lambda = 1/2$). In this case, for a given $\Upsilon$, the symmetric equilibrium is stable (unstable) if the slope of $\omega(\lambda)$ is negative (positive) at $\lambda = 1/2$. Checking this condition requires fairly long calculations using all the equilibrium conditions. However, Fujita, Krugman, and Venables (1999) have shown the following results. First, when (3.18) does not hold, the symmetric equilibrium is always unstable. Second, when (3.18) holds, this equilibrium is stable (unstable) if $\Upsilon$ is larger (smaller) than some threshold value $\Upsilon_{\text{break}}$ given by

\[
\Upsilon_{\text{break}} = \left[ \frac{(\rho + \mu)(1 + \mu)}{(\rho - \mu)(1 - \mu)} \right]^{1/(\sigma - 1)}, \tag{3.19}
\]

which is clearly larger than one. This is called the break point because symmetry between the two regions is no longer a stable equilibrium for lower values of $\Upsilon$. It is interesting to note that $\Upsilon_{\text{break}}$ depends on the same parameters as $\Upsilon_{\text{sustain}}$. It is immediate from (3.19) that $\Upsilon_{\text{break}}$ is increasing with the share of the modern sector ($\mu$) and with the degree of product differentiation ($1/\rho$).

Because $\Upsilon_{\text{break}} < \Upsilon_{\text{sustain}}$ can be shown to hold,⁷ there exists a domain of parameters over which there is multiplicity of equilibria, namely agglomeration and dispersion, as depicted in Figure 8.3. More precisely, when $\Upsilon > \Upsilon_{\text{sustain}}$, the economy necessarily involves dispersion. When $\Upsilon < \Upsilon_{\text{break}}$, agglomeration always arises, the winning region depending on the initial conditions. Finally, when $\Upsilon_{\text{break}} \le \Upsilon \le \Upsilon_{\text{sustain}}$, both agglomeration and dispersion are stable equilibria. In this domain, the economy displays some hysteresis because dispersion (agglomeration) still prevails when transport costs fall below the sustain point (rise above the break point) while staying above the break point (below the sustain point).

Figure 8.3. Bifurcation diagram for the core-periphery model. (Horizontal axis: $\Upsilon$, marked at 1, $\Upsilon_{\text{break}}$, and $\Upsilon_{\text{sustain}}$; vertical axis: $\lambda$, marked at 0, 1/2, and 1.)

⁷ See Neary (2001) for a proof.

Summarizing these results, when transportation costs are sufficiently low, all manufacturers are concentrated in a single region that becomes the core of the economy, whereas the other region, called the periphery, supplies only the traditional good. Firms in the modern sector are able to exploit increasing returns by selling more in the large market without losing much business in the small market. For exactly the opposite reason, the economy displays a symmetric regional pattern of production when transportation costs are large. Hence, this model allows for the possibility of divergence between regions, whereas the neoclassical model, based on constant returns and perfect competition in the two sectors, would predict symmetry only.

3.3. A Linear Model of Core-Periphery

The conclusions derived in Section 3.2 are very important for the space economy. This is why it is crucial to know how they depend on the speciﬁcities of the framework used. The use of both the CES utility and iceberg cost leads to a convenient setting in which demands have a constant elasticity. However, such a result conﬂicts with research in spatial pricing theory in which demand elasticity is shown to vary with distance. Moreover, if using the iceberg cost is able to capture the fact that shipping is resource-consuming, such a modeling option implies that any increase in the mill price is accompanied by a proportional increase in transport cost, which seems unrealistic. Last, although models of the type considered in the foregoing are based on very speciﬁc assumptions, they are often beyond the reach of analytical resolution. The setting considered here, which has been developed by Ottaviano, Tabuchi, and Thisse (2002), is very similar to that used in Section 3.2. However, there are two major differences. First, the output of the M-sector is traded at a cost of τ units of the num´eraire per unit shipped between regions. This characteristic agrees more with reality, as well as with location theory, than the iceberg technology does. Second, preferences are given by a quasi-linear utility encapsulating a quadratic subutility instead of a Cobb–Douglas preference


on the homogeneous and differentiated goods with CES subutility. These two specifications correspond to rather extreme cases: the former assumes an infinite elasticity of substitution between the differentiated product and the numéraire, the latter a unit elasticity. Moreover, firms' demands are linear and not isoelastic. Despite such major differences in settings, we will see that conclusions are qualitatively the same in the two models, thus suggesting that they hold for a whole class of models.

3.3.1. A Model with Quadratic Utility and Linear Transport Costs

Preferences are identical across individuals and described by a quasi-linear utility with a quadratic subutility that is supposed to be symmetric in all varieties:

$$u(q_0; q(i), i \in [0, M]) = \alpha \int_0^M q(i)\,di - (\beta - \delta) \int_0^M [q(i)]^2\,di - \delta \left[ \int_0^M q(i)\,di \right]^2 + q_0, \tag{3.20}$$

where q(i) is the quantity of variety i ∈ [0, M] and $q_0$ the quantity of a homogeneous good chosen as the numéraire. The parameters in (3.20) are such that α > 0 and β > δ > 0. In this expression, α expresses the intensity of preferences for the differentiated product, whereas β > δ means that consumers' preferences exhibit love of variety. Finally, for a given value of β, the parameter δ expresses the substitutability between varieties: the higher δ, the closer substitutes the varieties. Admittedly, a quasi-linear utility abstracts from general equilibrium income effects and gives the corresponding framework a fairly strong partial equilibrium flavor. However, it does not remove the interaction between product and labor markets, thus allowing us to develop a full-fledged model of agglomeration formation, independently of the relative size of the manufacturing sector.

Any individual is endowed with one unit of labor (of type H or L) and $\bar{q}_0 > 0$ units of the numéraire. Her budget constraint can then be written as follows:

$$\int_0^M p(i)\,q(i)\,di + q_0 = y + \bar{q}_0,$$

where y is the individual's labor income and p(i) the price of variety i. The initial endowment $\bar{q}_0$ is supposed to be large enough for the residual consumption of the numéraire to be strictly positive for each individual. Hence, individual demand q(i) for variety i is given by

$$q(i) = a - (b + dM)\,p(i) + dP,$$

where

$$P \equiv \int_0^M p(i)\,di, \tag{3.21}$$


which can be interpreted as the price index in the modern sector, whereas $a \equiv 2\alpha/[\beta + (M-1)\delta]$, $b \equiv 1/[\beta + (M-1)\delta]$, and $d \equiv \delta/\{(\beta - \delta)[\beta + (M-1)\delta]\}$. Finally, each variety can be traded at a positive cost of τ units of the numéraire for each unit transported from one region to the other, regardless of the variety. The technologies are the same as in Section 3.1, but, for simplicity, c is set equal to zero in (3.1). Labor market clearing implies that the number of firms belonging to the M-sector in region r is

$$M_r = \lambda_r H/f. \tag{3.22}$$

Consequently, the total number of firms in the economy is constant and equal to $M = H/f$. Discriminatory and mill pricing are no longer equivalent in this model. In the sequel, we focus on discriminatory pricing, because this policy endows firms with flexibility in their price choice, something that could affect the process of agglomeration. This means that each firm sets a delivered price specific to each region. Hence, the profit function of a firm located in region r is as follows:

$$\pi_r = p_{rr}\,q_{rr}(p_{rr})(L/2 + \lambda_r H) + (p_{rs} - \tau)\,q_{rs}(p_{rs})(L/2 + \lambda_s H) - f w_r.$$

To illustrate the type of interaction that characterizes this model of monopolistic competition, we describe how the equilibrium prices are determined. Each firm i in region r maximizes its profit $\pi_r$, assuming accurately that its price choice has no impact on the regional price indices

$$P_r \equiv \int_0^{M_r} p_{rr}(i)\,di + \int_0^{M_s} p_{sr}(i)\,di, \qquad s \neq r.$$

Because, by symmetry, the prices selected by the firms located within the same region are identical, the result is denoted by $p_{rr}^{*}(P_r)$ and $p_{rs}^{*}(P_s)$. Clearly, it must be that

$$M_r\,p_{rr}^{*}(P_r) + M_s\,p_{sr}^{*}(P_r) = P_r.$$

Given (3.22), it is then readily verified that the equilibrium prices are as follows:

$$p_{rr}^{*} = \frac{1}{2}\,\frac{2a + \tau d \lambda_s M}{2b + dM}, \tag{3.23}$$

$$p_{rs}^{*} = p_{ss}^{*} + \frac{\tau}{2}. \tag{3.24}$$

Clearly, these prices depend directly on the firms' distribution. In particular, $p_{rr}^{*}$ decreases with the number of firms in region r and increases with the degree of product differentiation when τ is sufficiently small for the demands of the imported varieties to be positive. These results agree with what we know from standard models of product differentiation.
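
A minimal numerical sketch of (3.23)–(3.24) may help fix ideas. The parameter values below are hypothetical, and `best_response_local_price` is our own rearrangement of the first-order condition implied by the linear demand q = a − (b + dM)p + dP, not a formula stated in the text. The loop checks that the closed-form local price is a fixed point of that best response once the price index is rebuilt from the prices.

```python
def equilibrium_prices(a, b, d, M, tau, lam_r):
    """Prices (3.23)-(3.24) for region r, whose share of firms is lam_r."""
    lam_s = 1.0 - lam_r
    denom = 2.0 * (2.0 * b + d * M)
    p_rr = (2.0 * a + tau * d * lam_s * M) / denom  # local price in r, (3.23)
    p_ss = (2.0 * a + tau * d * lam_r * M) / denom  # local price in s, (3.23)
    p_rs = p_ss + tau / 2.0  # price charged in s by a firm located in r, (3.24)
    p_sr = p_rr + tau / 2.0  # price charged in r by a firm located in s, (3.24)
    return p_rr, p_rs, p_ss, p_sr


def price_index(M, lam_r, p_rr, p_sr):
    """P_r = M_r p_rr + M_s p_sr, with M_r = lam_r * M by (3.22)."""
    return lam_r * M * p_rr + (1.0 - lam_r) * M * p_sr


def best_response_local_price(a, b, d, M, P_r):
    """Assumed first-order condition of p * [a - (b + d M) p + d P_r] in p."""
    return (a + d * P_r) / (2.0 * (b + d * M))


# Hypothetical parameter values, chosen only for illustration.
a, b, d, M, tau = 1.0, 1.0, 0.5, 4.0, 0.2
for lam_r in (0.3, 0.5, 0.7):
    p_rr, p_rs, p_ss, p_sr = equilibrium_prices(a, b, d, M, tau, lam_r)
    P_r = price_index(M, lam_r, p_rr, p_sr)
    gap = abs(best_response_local_price(a, b, d, M, P_r) - p_rr)
    print(f"lam_r={lam_r:.1f}  p_rr={p_rr:.4f}  p_rs={p_rs:.4f}  gap={gap:.1e}")
```

In this illustration the local price falls as the share of firms in region r rises, in line with the comparative statics described above.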


It is easy to check that the equilibrium operating profits earned in each market by a firm established in r are as follows:

$$\pi_{rr}^{*} = (b + dM)(p_{rr}^{*})^{2}(L/2 + \lambda_r H),$$

$$\pi_{rs}^{*} = (b + dM)(p_{rs}^{*} - \tau)^{2}(L/2 + \lambda_s H).$$
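
These profit expressions can be checked against the definition "margin times individual demand times number of consumers". The sketch below does so under hypothetical parameter values; the function names are ours, and the demand used is the linear demand q = a − (b + dM)p + dP introduced earlier.

```python
def local_price(a, b, d, M, tau, lam_other):
    """Equilibrium local price (3.23); lam_other is the other region's share."""
    return 0.5 * (2.0 * a + tau * d * lam_other * M) / (2.0 * b + d * M)


def operating_profits(a, b, d, tau, lam_r, L, H, f):
    """Closed-form operating profits of a firm in r (local, distant sales)."""
    M = H / f  # total number of firms
    lam_s = 1.0 - lam_r
    p_rr = local_price(a, b, d, M, tau, lam_s)
    p_rs = local_price(a, b, d, M, tau, lam_r) + tau / 2.0  # (3.24)
    slope = b + d * M
    pi_rr = slope * p_rr ** 2 * (L / 2.0 + lam_r * H)
    pi_rs = slope * (p_rs - tau) ** 2 * (L / 2.0 + lam_s * H)
    return pi_rr, pi_rs


def direct_profits(a, b, d, tau, lam_r, L, H, f):
    """Same profits computed as margin * demand * number of consumers."""
    M = H / f
    lam_s = 1.0 - lam_r
    p_rr = local_price(a, b, d, M, tau, lam_s)
    p_ss = local_price(a, b, d, M, tau, lam_r)
    p_rs, p_sr = p_ss + tau / 2.0, p_rr + tau / 2.0
    P_r = M * (lam_r * p_rr + lam_s * p_sr)  # price index in region r
    P_s = M * (lam_s * p_ss + lam_r * p_rs)  # price index in region s
    q_rr = a - (b + d * M) * p_rr + d * P_r  # individual demand in r
    q_rs = a - (b + d * M) * p_rs + d * P_s  # individual demand in s
    return (p_rr * q_rr * (L / 2.0 + lam_r * H),
            (p_rs - tau) * q_rs * (L / 2.0 + lam_s * H))


# Hypothetical parameter values, chosen only for illustration.
params = dict(a=1.0, b=1.0, d=0.5, tau=0.2, lam_r=0.7, L=6.0, H=4.0, f=1.0)
print(operating_profits(**params))
print(direct_profits(**params))
```

The two computations agree up to rounding because, at the equilibrium prices, individual demand equals (b + dM) times the margin in each market.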

Increasing $\lambda_r$ has two opposite effects on $\pi_{rr}^{*}$. First, as $\lambda_r$ rises, the equilibrium price (3.23) falls, as does the quantity of each variety bought by each consumer living in region r. However, the total population of consumers residing in this region is now larger so that the profits made by a firm located in r on local sales may increase. What is at work here is a global demand effect due to the increase in the local population that may compensate firms for the adverse price effect, as well as for the decrease in each worker's individual demand.

Entry and exit are free so that profits are zero in equilibrium. Hence, (3.22) implies that any change in the population of workers located in one region must be accompanied by a corresponding change in the number of firms. The equilibrium wage rates $w_r^{*}$ of the skilled are obtained from the zero-profit condition evaluated at the equilibrium prices: $w_r^{*}(\lambda_r) = (\pi_{rr}^{*} + \pi_{rs}^{*})/f$.

3.3.2. The Debate Agglomeration vs. Dispersion Revisited

The indirect utility differential v(λ) is obtained by plugging the equilibrium prices (3.23)–(3.24) and the equilibrium wages $w_r^{*}(\lambda)$ into the indirect utility associated with (3.20):

$$v(\lambda) \equiv v_A(\lambda) - v_B(\lambda) = C^{*}\,\tau\,(\tau^{*} - \tau)(\lambda - 1/2), \tag{3.25}$$

where $C^{*}$ is a positive constant and

$$\tau^{*} \equiv \frac{4af(3bf + 2dH)}{2bf(3bf + 3dH + dL) + d^{2}H(L + H)} > 0. \tag{3.26}$$

It follows immediately from (3.25) that λ = 1/2 is always an equilibrium. Moreover, because v(λ) is linear in λ and $C^{*} > 0$, for λ ≠ 1/2 the indirect utility differential has the same sign as λ − 1/2 if and only if $\tau < \tau^{*}$; if $\tau > \tau^{*}$, it has the opposite sign. In particular, when there are no increasing returns in the manufacturing sector (f = 0), the coefficient of (λ − 1/2) is always negative because $\tau^{*} = 0$, and thus dispersion is the only (stable) equilibrium. This shows once more the importance of increasing returns for the possible emergence of an agglomeration.8 The same holds for product differentiation, because $\tau^{*}$ becomes arbitrarily small when varieties become less and less differentiated (d → ∞).

8 Sonnenschein (1982) shows, a contrario, a related result: if the initial distribution of firms is uneven along a given circle, then the spatial adjustment of firms in the direction of higher profit leads the economy toward a uniform long-run equilibrium, each local economy being perfectly competitive.
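
The threshold (3.26) and the sign rule implied by (3.25) can be sketched numerically. The values of a, b, d, f, H, and L below are hypothetical and are treated as free parameters, although in the model a, b, and d themselves depend on M = H/f; the constant $C^{*}$ is normalized to one, which does not affect signs.

```python
def tau_star(a, b, d, f, H, L):
    """Threshold transport cost (3.26)."""
    num = 4.0 * a * f * (3.0 * b * f + 2.0 * d * H)
    den = 2.0 * b * f * (3.0 * b * f + 3.0 * d * H + d * L) + d ** 2 * H * (L + H)
    return num / den


def utility_differential(lam, tau, tau_s, C=1.0):
    """Indirect utility differential (3.25), with C standing in for C*."""
    return C * tau * (tau_s - tau) * (lam - 0.5)


# Hypothetical parameter values, chosen only for illustration.
a, b, d, f, H, L = 1.0, 1.0, 1.0, 1.0, 1.0, 10.0
ts = tau_star(a, b, d, f, H, L)
print(f"tau* = {ts:.4f}")
for tau in (0.5 * ts, 1.5 * ts):
    dv = utility_differential(0.6, tau, ts)
    print(f"tau={tau:.4f}: differential at lambda=0.6 is {dv:+.5f}")

# Without increasing returns the threshold collapses to zero.
print(tau_star(a, b, d, 0.0, H, L))
```

With these values the differential at λ = 0.6 is positive below the threshold and negative above it, which is the pattern described in the text.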


It remains to determine when $\tau^{*}$ is sufficiently low for all demands to be positive at the equilibrium prices. This is so if and only if

$$L/H > \frac{6b^{2}f^{2} + 8bdfH + 3d^{2}H^{2}}{dH(2bf + dH)}. \tag{3.27}$$

The inequality (3.27) means that the population of unskilled is large relative to the population of skilled. When (3.27) does not hold, the coefficient of (λ − 1/2) in (3.25) is always positive for all transport costs that allow for interregional trade. In this case, the advantages of having a large home market always dominate the disadvantages incurred while supplying a distant periphery. The condition (3.18) plays a role similar to (3.17).

More interesting is the case when (3.27) holds. Although the size of the industrial sector is captured here through the relative population size L/H and not through its share in consumption, the intuition is similar: the ratio L/H must be sufficiently large for the economy to display different types of equilibria according to the value of τ. This result does not depend on the expenditure share on the manufacturing sector because of the absence of general equilibrium income effects: small or large sectors in terms of expenditure share are agglomerated when τ is small enough.

Finally, stability is studied using (3.2). When $\tau > \tau^{*}$, it is straightforward to see that the symmetric configuration is the only stable equilibrium. In contrast, when $\tau < \tau^{*}$, the symmetric equilibrium becomes unstable and workers agglomerate in region r provided that the initial fraction of workers residing in this region exceeds 1/2. In other words, agglomeration arises when the transport cost is low enough.

Proposition 3.2. Consider a two-region economy with segmented markets. (i) When (3.27) does not hold, the core-periphery structure is the only stable equilibrium under trade. (ii) When (3.27) is satisfied, we have: for any $\tau > \tau^{*}$ the symmetric configuration is the only stable equilibrium with trade; for any $\tau < \tau^{*}$ the core-periphery pattern is the unique stable equilibrium; for $\tau = \tau^{*}$ any configuration is an equilibrium.

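
The case distinction in Proposition 3.2 can be written as a small classifier. This is a sketch under stated assumptions: parameter values are hypothetical, a, b, and d are treated as free parameters, and `tau_trade` is our own derivation of the largest transport cost compatible with positive export demand for every distribution of firms (from (3.23)–(3.24), exports are positive when the local price in the destination region exceeds τ/2); the text does not state this bound explicitly.

```python
def tau_star(a, b, d, f, H, L):
    """Threshold transport cost (3.26)."""
    num = 4.0 * a * f * (3.0 * b * f + 2.0 * d * H)
    den = 2.0 * b * f * (3.0 * b * f + 3.0 * d * H + d * L) + d ** 2 * H * (L + H)
    return num / den


def condition_3_27(b, d, f, H, L):
    """True when inequality (3.27) holds."""
    rhs = (6.0 * b ** 2 * f ** 2 + 8.0 * b * d * f * H + 3.0 * d ** 2 * H ** 2)
    rhs /= d * H * (2.0 * b * f + d * H)
    return L / H > rhs


def tau_trade(a, b, d, f, H):
    """Assumed upper bound on tau for trade to occur (our derivation)."""
    return 2.0 * a * f / (2.0 * b * f + d * H)


def stable_pattern(tau, a, b, d, f, H, L):
    """Stable configuration according to Proposition 3.2."""
    if tau <= 0.0 or tau >= tau_trade(a, b, d, f, H):
        return "outside the trade range"
    if not condition_3_27(b, d, f, H, L):
        return "core-periphery"
    ts = tau_star(a, b, d, f, H, L)
    if tau > ts:
        return "symmetric"
    if tau < ts:
        return "core-periphery"
    return "any configuration"


# Hypothetical parameter values, chosen only for illustration.
for L in (2.0, 10.0):
    for tau in (0.3, 0.5):
        print(L, tau, stable_pattern(tau, 1.0, 1.0, 1.0, 1.0, 1.0, L))
```

Under the assumed trade bound, (3.27) holds exactly when $\tau^{*}$ lies below that bound, which is why part (i) of the proposition leaves no room for a stable symmetric outcome.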
Because (3.25) is linear in λ, the break point and the sustain point are the same, and thus history alone matters for the selection of the agglomerated outcome. Looking at the threshold value $\tau^{*}$ as given by (3.26), we first observe that $\tau^{*}$ increases with the degree of product differentiation (d falls) when (3.27) holds. This is intuitively plausible because the agglomeration process is driven by the mobility of the skilled workers, whence their population must be sufficiently large for product differentiation to act as an agglomeration force. Second, higher fixed costs lead to a smaller number of firms/varieties. Still, it is readily verified that $\tau^{*}$ also increases when increasing returns become stronger (f rises) when


(3.27) holds. In other words, the agglomeration of the modern sector is more likely, the stronger are the increasing returns at the firm's level. Last, $\tau^{*}$ increases when the number of unskilled (L) decreases because the dispersion force is weaker.

Both models studied in this section yield similar results, suggesting that the core-periphery structure is robust against alternative specifications. Each model has its own merit. The former allows for income effects and the latter for a finer description of the role played by the key parameters of the economy. As will be seen later, both have been used in various extensions of the core-periphery model.

4. FURTHER TOPICS IN ECONOMIC GEOGRAPHY

In this section, we present an abbreviated version of a few recent contributions. The interested reader will find the models at greater length in the corresponding references.

4.1. On a ∩-Shaped Relationship Between Agglomeration and Transport Costs

The assumption of zero transport costs for the homogeneous good is not innocuous. Indeed, introducing positive transport costs for this good leads to some fundamental changes in the results presented previously. To permit trade of the traditional good even at the symmetric configuration, we assume that this good is differentiated too (e.g., oranges in A and apples in B). Thus, T as it appears in (3.3) is now given by

$$T = \left(T_A^{\eta} + T_B^{\eta}\right)^{1/\eta},$$

where 0 < η < 1. The numéraire is given by the traditional good in one of the two regions. As shown by Fujita et al. (1999), the bifurcation diagram given in Figure 8.3 changes and is now as in Figure 8.4. To make things simple, we consider a fixed value for the transport costs of the traditional good and, as before, we concentrate on a decrease in the transport costs in the modern sector. When these costs are high, the symmetric configuration is the only equilibrium. Below some critical value, the core-periphery arises as before. However, further reductions in transport costs eventually lead to redispersion of the modern sector. Indeed, the agglomeration of the modern sector within, say, region A generates large imports of the traditional good from region B. When transport costs in the modern sector become sufficiently low, the price indices of this good are about the same in the two regions. Then, the relative price of the traditional good in A rises because its transport cost remains unchanged. This in turn lowers region B's nominal wage, which guarantees the same utility level in both regions to the skilled. When the transport costs within the modern sector decrease sufficiently, the factor price differential becomes strong enough to induce firms to move away from A to B.


Figure 8.4. Bifurcation with positive agricultural transport costs. (Vertical axis: λ, from 0.0 to 1.0; horizontal axis: ΥM, from 1.0 to 1.8.)

Consequently, as transport costs in the modern sector keep decreasing from high to very low values, whereas transport costs in the traditional sector remain constant, the modern sector is ﬁrst dispersed, then agglomerated, and redispersed, as seen in Figure 8.4. It is worth stressing that the reasons that lead to dispersion in the ﬁrst and third phases are different: in the former, the modern sector is dispersed because the cost of shipping its output is high; in the latter, dispersion arises because the periphery develops some comparative advantage in terms of labor cost. Although transport costs of both types of goods have declined since the beginning of the Industrial Revolution, what matters for the regional distribution of economic activities is not only the absolute levels of transport costs, but also their relative values across sectors (Kilkenny, 1998). For example, if both costs decrease proportionally, it can be shown that redispersion never occurs. This is not surprising because there is no force creating wage differential any more. However, if agricultural transport costs decrease at a lower pace than those of manufacturing goods, cheaper rural labor should eventually attract industrial ﬁrms, whereas the reversal in the relationship between transport costs has the opposite impact [see Fujita et al. (1999), Section 7.4 for more details]. The pattern dispersion/agglomeration/redispersion also arises as long as we consider any ingredient giving rise to factor price differentials in favor of the periphery. For example, if we assume that the agglomeration of the modern sector in one region generates higher urban costs, such as land rent and commuting costs, a sufﬁciently strong decrease in transport costs between regions will foster redispersion when ﬁrms located in the core region have to pay high wages to their workers. This occurs because workers must be compensated for the high urban costs associated with a large concentration of people within


the same urban area (Helpman, 1998, Tabuchi, 1998, and Ottaviano et al., 2002). Another example is when all workers are immobile, whereas agglomeration of the industrial sector may arise because of technological linkages with the intermediate sector (more on this later). In this case, the wage in the core region may become so high that redispersion is profitable for firms (Krugman and Venables, 1995 and Puga, 1999).

4.2. Welfare Implications of the Core-Periphery Structure

We now wish to determine whether or not agglomeration is efficient. To this end, we assume that the planner is able (i) to assign any number of workers (or, equivalently, of firms) to a specific region and (ii) to use lump sum transfers from all workers to pay for the loss firms may incur while pricing at marginal cost. Because utilities are quasi-linear in the model of Section 3.3, a utilitarian approach may be used to evaluate the global level of welfare (Ottaviano and Thisse, 2002). Observe that no distortion arises in the total number of varieties because M is determined by the factor endowment (H) and technology (f) in the modern sector and is, therefore, the same at both the equilibrium and optimum outcomes. Because the setting assumes transferable utility, the planner chooses λ to maximize the sum of individual indirect utilities W(λ) (for both types of workers) in which all prices have been set equal to marginal cost. It can be shown that

$$W(\lambda) = C^{o}\,\tau\,(\tau^{o} - \tau)\,\lambda(\lambda - 1) + \text{constant}, \tag{4.1}$$

where $C^{o}$ is a positive constant and

$$\tau^{o} \equiv \frac{4af}{2bf + d(H + L)}.$$
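
To see how the planner's threshold compares with the market threshold $\tau^{*}$ of (3.26), the following sketch evaluates both for hypothetical parameter values (again treating a, b, and d as free parameters) and locates the maximizers of (4.1) on a grid. The constant $C^{o}$ is normalized to one and the additive constant is dropped; neither affects the location of the maximum.

```python
def tau_opt(a, b, d, f, H, L):
    """Planner's threshold, as defined after (4.1)."""
    return 4.0 * a * f / (2.0 * b * f + d * (H + L))


def tau_star(a, b, d, f, H, L):
    """Market threshold (3.26)."""
    num = 4.0 * a * f * (3.0 * b * f + 2.0 * d * H)
    den = 2.0 * b * f * (3.0 * b * f + 3.0 * d * H + d * L) + d ** 2 * H * (L + H)
    return num / den


def welfare(lam, tau, tau_o, C=1.0):
    """Welfare (4.1) up to the additive constant, with C standing in for C^o."""
    return C * tau * (tau_o - tau) * lam * (lam - 1.0)


def planner_choice(tau, tau_o, n=101):
    """Grid points of [0, 1] at which welfare attains its maximum."""
    grid = [k / (n - 1) for k in range(n)]
    values = [welfare(lam, tau, tau_o) for lam in grid]
    best = max(values)
    return [lam for lam, w in zip(grid, values) if abs(w - best) <= 1e-12]


# Hypothetical parameter values, chosen only for illustration.
a, b, d, f, H, L = 1.0, 1.0, 1.0, 1.0, 1.0, 10.0
to, ts = tau_opt(a, b, d, f, H, L), tau_star(a, b, d, f, H, L)
print(f"tau_o = {to:.4f}, tau* = {ts:.4f}")
for tau in (0.2, 0.4):
    print(tau, planner_choice(tau, to))
```

For these values the planner's threshold lies below the market threshold, so a transport cost between the two (0.4 here) leads the planner to choose λ = 1/2 even though it is below $\tau^{*}$.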

The welfare function (4.1) is strictly concave in λ if $\tau > \tau^{o}$ and strictly convex if $\tau < \tau^{o}$. Furthermore, because the coefficients of $\lambda^{2}$ and of λ are the same (up to their sign), this expression always has an interior extremum at λ = 1/2. As a result, the optimal choice of the planner is determined by the sign of the coefficient of $\lambda^{2}$, that is, by the value of τ with respect to $\tau^{o}$: if $\tau > \tau^{o}$, the symmetric configuration is the optimum; if $\tau < \tau^{o}$, any agglomerated configuration is the optimum; if $\tau = \tau^{o}$, the welfare level is independent of the spatial configuration. In accordance with intuition, it is efficient to agglomerate the modern sector into a single region once transport costs are low, increasing returns are strong enough, and/or the output of this sector is sufficiently differentiated. On the other hand, the optimum is always dispersed when increasing returns vanish (f = 0) and/or when varieties are close substitutes (d is large). A simple calculation shows that $\tau^{o} < \tau^{*}$. This means that the market yields an agglomerated configuration for a range ($\tau^{o} < \tau < \tau^{*}$) of transport cost values for which it is efficient to have a dispersed pattern of activities. In contrast, when transport costs are low ($\tau < \tau^{o}$) or high ($\tau > \tau^{*}$), no regional policy is


required from the efficiency point of view, although equity considerations might justify such a policy when agglomeration arises. On the contrary, for intermediate values of transport costs ($\tau^{o} < \tau < \tau^{*}$), the market provides excessive agglomeration, thus justifying the need for an active regional policy to foster the dispersion of the modern sector on both efficiency and equity grounds.9 This discrepancy may be explained as follows. First, workers do not internalize the negative external effects they impose on the unskilled who stay put, nor do they account for the impact of their migration decisions on the residents in their region of destination. Hence, even though the skilled have individual incentives to move, these incentives do not reflect the social value of their move. This explains why equilibrium and optimum do not necessarily coincide. Second, the individual demand elasticity is much lower at the optimum (marginal cost pricing) than at the equilibrium (Nash equilibrium pricing), and thus regional price indices are less sensitive to a decrease in τ. As a result, the fall in trade costs must be sufficiently large to make the agglomeration of workers socially desirable; this tells us why $\tau^{o} < \tau^{*}$.

4.3. On the Impact of Forward-Looking Behavior

In the dynamics used in Section 3, workers care only about their current utility level. This is a fairly restrictive assumption to the extent that migration decisions are typically made on the grounds of current and future utility flows and costs (such as search, mismatch, and homesickness). In addition, this approach has been criticized because it is not consistent with fully rational forward-looking behavior. It is, therefore, important to determine if and how workers' expectations about the evolution of the economy may influence the process of agglomeration. In particular, we are interested in identifying the conditions under which, when initially the two regions host different numbers of skilled workers, the common belief that these workers will eventually agglomerate in the currently smaller region can reverse the historically inherited advantage of the larger region. Formally, we want to determine the parameter conditions for which there exists an equilibrium path consistent with this belief, assuming that workers have perfect foresight (self-fulfilling prophecy).

Somewhat different approaches have been proposed to tackle this problem, but they yield similar conclusions (Ottaviano, 1999, Baldwin, 2001, and Ottaviano et al., 2002). In what follows, we use the model of Section 3.3 because it leads to a linear dynamic system that allows for a detailed analysis of the main issues (Krugman, 1991b and Fukao and Bénabou, 1993). Workers live indefinitely with a rate of time preference equal to γ > 0. Because we wish to focus on the sole dynamics of migration, we assume that

9 Observe that the same qualitative results hold for a second-best analysis in which firms price at the Nash equilibrium while the planner controls their locations (Ottaviano and Thisse, 2002).


the consumption of the numéraire is positive for each point in time so that there is no intertemporal trade in the differentiated good. For concreteness, consider the case in which workers expect agglomeration to occur in region A, whereas region B is initially larger than A. Formally, we assume that there exists T ≥ 0 such that, given $\lambda_0 < 1/2$,

$$\begin{aligned} \dot{\lambda}(t) &> 0, & t &\in [0, T), \\ \lambda(t) &= 1, & t &\geq T. \end{aligned} \tag{4.2}$$

Because workers have perfect foresight, the easiest way to generate a non-bang-bang migration behavior is to assume that, when moving from one region to the other, workers incur a utility loss that depends on the rate of migration, perhaps because a migrant imposes a negative externality on the others. Specifically, we assume that the cost CM(t) borne by a migrant at time t is proportional to the corresponding migration flow:

$$CM(t) \equiv \frac{1}{\delta}\left|\frac{d\lambda(t)}{dt}\right|, \tag{4.3}$$

where δ is a positive constant whose meaning is given herein. For each region r = A, B, let us define

$$V_r(t) \equiv \int_t^T e^{-\gamma(s-t)}\,v_r(s)\,ds + e^{-\gamma(T-t)}\,v_A(T)/\gamma, \qquad t \in [0, T), \tag{4.4}$$

where $v_r(s)$ is the instantaneous indirect utility at time s in region r. By definition, for r = A, $V_A(t)$ is the discounted sum of utility flows of a worker who moves from B to A at time t (i.e., today), whereas for r = B, $V_B(t)$ is that of a worker who currently resides in B and plans to move to A at time T. Because workers are free to choose when to migrate, in equilibrium they must be indifferent about the time t at which they move. Hence, at any t < T, the following equality must hold:

$$V_A(t) - CM(t) = V_B(t) - e^{-\gamma(T-t)}\,CM(T).$$

Furthermore, because no worker residing currently in B wishes to postpone his migration time beyond T, it must be that CM(T) = 0 (Fukao and Bénabou, 1993), and thus

$$V_A(t) - CM(t) = V_B(t), \qquad t \in [0, T).$$

Using (4.2) and (4.3), we then obtain

$$\frac{d\lambda}{dt} = \delta V, \qquad t \in [0, T), \tag{4.5}$$

where $V \equiv (V_A - V_B)$, and δ can be interpreted as the speed of adjustment. This means that the private marginal cost of moving equals its private marginal benefit at any time t < T; of course, λ(T) = 1.


Using (4.4), we obtain the second law of motion by differentiating $V_A(t) - V_B(t)$, thus yielding

$$\frac{dV}{dt} = \gamma V - v, \qquad t \in [0, T), \tag{4.6}$$

where $v \equiv v_A - v_B$ stands for the instantaneous indirect utility differential flow given by (3.25). The expression (4.6) states that the "annuity value" of being in A rather than in B (i.e., γV) equals the "dividend" (v) plus the "capital gain" (dV/dt). As a result, because (3.25) is linear in λ, we obtain a system of two differential equations instead of one. The system (4.5) and (4.6) always has a steady state at (λ, V) = (1/2, 0) that corresponds to the symmetric configuration. When $\tau > \tau^{*}$, this steady state is globally stable. So, for the assumed belief (4.2) to be consistent with equilibrium, it must be that $\tau < \tau^{*}$. Then, the study of the eigenvalues of the system (4.5) and (4.6) shows that two cases may arise. In the first one, when workers' migration costs are sufficiently large (δ is such that $\gamma > 2\sqrt{C\delta\tau(\tau^{*} - \tau)}$), the outcome of the migration dynamics is the same as the one described in Section 3.3. In other words, the equilibrium path is not consistent with (4.2), thus implying that expectations do not matter. By contrast, when migration costs are small enough ($\gamma < 2\sqrt{C\delta\tau(\tau^{*} - \tau)}$), expectations may matter. More precisely, there exist two threshold values for the transport costs $\tau_1 < \tau^{*}/2 < \tau_2 < \tau^{*}$, as well as two boundary values $\lambda_1 < 1/2 < \lambda_2 < 1$ such that the equilibrium path is consistent with (4.2) if and only if $\tau \in (\tau_1, \tau_2)$ and $\lambda_0 \in [\lambda_1, \lambda_2]$. Namely, as long as obstacles to trade take intermediate values and regions are not initially too different, the region that becomes the core is determined by workers' expectations. This is more so either the lower the migration costs or the lower the discount rate.

4.4. The Impact of a Heterogeneous Labor Force

So far, workers have been assumed to be identical in terms of preferences. Although this assumption is fairly standard in economic modeling, it seems highly implausible that potentially mobile individuals will react in the same way to some “gap” between regions. First of all, it is well known that some people show a high degree of attachment to the region in which they were born. They will stay put even though they may guarantee to themselves higher living standards in other places. In the same spirit, lifetime considerations such as marriage, divorce, and the like play an important role in the decision to migrate. Second, regions are not similar and exhibit different natural and cultural features. Clearly, people value differently local amenities, and such differences in attitudes are known to affect the migration process. These considerations are fundamental ingredients of the migration process and should be accounted for explicitly in workers’ preferences. Even though the personal motivations may be quite diverse and, therefore, difﬁcult to model at the individual level, it is possible to identify their aggregate impact on the


spatial distribution of economic activities using discrete choice theory, in much the same way that consumer preferences for differentiated products are modeled (Anderson et al., 1992). Specifically, we assume that the "matching" of workers with regions is expressed through the logit (McFadden, 1974). This assumption turns out to be empirically relevant in migration modeling (see, e.g., Anderson and Papageorgiou, 1994), whereas it is analytically convenient without affecting the qualitative nature of the main results. Then, the probability that a worker will choose to reside in region r is given by

$$p_r(\lambda) = \frac{\exp[v_r(\lambda)/\upsilon]}{\exp[v_A(\lambda)/\upsilon] + \exp[v_B(\lambda)/\upsilon]},$$

where υ expresses the dispersion of individual tastes: the larger υ, the more heterogeneous the responsiveness of workers to living standards differences v(λ) given by (3.25).10 When υ = 0, the living standard response is overwhelming and workers relocate until standards of living are equal in the two regions; when υ → ∞, mobility responds only to amenity differentials and the probability of moving is exogenous with respect to living standards. In the present setting, it should be clear that the population of workers changes according to the following equation of motion:

$$\frac{d\lambda}{dt} = (1 - \lambda)\,p_A(\lambda) - \lambda\,p_B(\lambda) = \frac{1 - \lambda}{1 + \exp[-v(\lambda)/\upsilon]} - \frac{\lambda}{1 + \exp[v(\lambda)/\upsilon]}, \tag{4.7}$$

in which the first term on the right-hand side of (4.7) stands for the fraction of people migrating into region A, whereas the second term represents those leaving this region for region B. Using theorem 5 by Tabuchi (1986), it is then readily verified that, for sufficiently large values of υ, there exists a unique stable equilibrium in which the manufacturing sector is equally distributed between regions. Otherwise, there exist two stable equilibria involving each partial agglomeration of the manufacturing sector in one region, whereas dispersion arises for very low values of these costs. As expected, taste heterogeneity prevents the emergence of a fully agglomerated equilibrium and favors the dispersion of activities.11

4.5. Intermediate Sector and Industrial Agglomeration

In the models described previously, agglomeration is the outcome of a circular causation process in which more workers concentrate within the same region because they love variety. However, if workers are immobile, no agglomeration can arise. Instead, each region specializes in the production of differentiated

10 Alternately, it could be evaluated at ω(λ), which is defined in Section 3.
11 See Tabuchi and Thisse (2002) for more details.


varieties on the basis of their initial endowments, and intraindustry trade occurs for all values of the transport costs. However, the agglomeration of industries is a pervasive phenomenon even when labor is sticky (e.g., between countries). Venables (1996) suggests that an alternative explanation is to account for the fact that the modern sector uses an array of differentiated intermediate goods. In this case, the agglomeration of the final sector in a particular region may occur because the concentration of the intermediate industry in that region makes the final sector more productive and vice versa. Evidence reveals, indeed, the importance of the proximity of high-quality business services for the economic success of an urban area (Kolko, 1999). Workers being immobile, we may consider a single type of labor. Because its output is taken as homogeneous, the M-sector is assumed to operate under constant returns to scale and perfect competition. The M-good is produced according to the production function

$$X_M = l^{1-\alpha} I^{\alpha}, \qquad 0 < \alpha < 1,$$

where

$$I = \left[\int_0^M [q(i)]^{\rho}\,di\right]^{1/\rho}$$

… > 0. The $\theta_i$'s are assumed to be independently and identically distributed with

$$\text{Prob}(\theta_i \leq z) = \frac{1}{1 + \exp(-\nu z)}$$

for some ν > 0. h measures the preference of the average agent for one of the actions, J the desire for conformity, and $\theta_i$ is a shock to the utility of taking the action $a_i = -1$. Brock and Durlauf also consider generalized versions of this model where the $\gamma_{ij}$'s vary, thus allowing each agent to have a distinct peer group.

Example 2.2. Glaeser and Scheinkman (2001). The utility functions are:

$$U^{i}(a_i, A_i, \theta_i, p) = -\frac{1 - \beta}{2}\,a_i^{2} - \frac{\beta}{2}\,(a_i - A_i)^{2} + (\theta_i - p)\,a_i.$$

Here, 0 ≤ β ≤ 1 measures the taste for conformity. In this case,

$$a_i = [\beta A_i + \theta_i - p]. \tag{2.6}$$

Note that, when p = 0, β = 1, and the $A_i$'s are the average action of all other agents, this is a version of the Brock–Durlauf model with continuous actions.

11 A related example is in Aoki (1995).
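
When every agent's reference group is the set of all other agents, the best responses (2.6) form a linear system that can be solved in closed form for β < 1. The sketch below is our own rearrangement rather than a formula from the text: summing (2.6) over i gives $(1 - \beta)S = \sum_i \theta_i - np$ for the aggregate action S, and each $a_i$ then follows. The values of θ, β, and p are hypothetical.

```python
def equilibrium_actions(theta, beta, p):
    """Solve a_i = beta * A_i + theta_i - p, where A_i is the average
    action of all agents other than i (assumes 0 <= beta < 1, n >= 2)."""
    n = len(theta)
    if n < 2 or not 0.0 <= beta < 1.0:
        raise ValueError("need at least two agents and 0 <= beta < 1")
    total = (sum(theta) - n * p) / (1.0 - beta)  # aggregate action S
    k = beta / (n - 1)
    # From a_i = k * (S - a_i) + theta_i - p.
    return [(k * total + t - p) / (1.0 + k) for t in theta]


def best_response_gap(actions, theta, beta, p):
    """Largest violation of (2.6) across agents."""
    n, total = len(actions), sum(actions)
    gaps = []
    for a_i, t in zip(actions, theta):
        A_i = (total - a_i) / (n - 1)
        gaps.append(abs(a_i - (beta * A_i + t - p)))
    return max(gaps)


# Hypothetical values, chosen only for illustration.
theta, beta, p = [0.2, -0.1, 0.5, 0.0], 0.5, 0.1
actions = equilibrium_actions(theta, beta, p)
print(actions, best_response_gap(actions, theta, beta, p))
```

The closed form breaks down at β = 1, where the factor 1 − β vanishes.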


Unfortunately, this case is very special. Equilibria exist only if $\sum_i \theta_i = 0$, and, in this case, a continuum of equilibria would exist. The model is, as we will show, much better behaved when β < 1.

In Glaeser and Scheinkman (2001), the objective was to admit both local and global interactions in the same model to try to distinguish empirically between them. This was done by allowing for two reference groups, and setting $P_i^{1} = \{1, \ldots, n\} - \{i\}$, $A_i^{1}$ the average action of all other agents, $P_i^{2} = \{i - 1\}$ if i > 1, $P_1^{2} = \{n\}$, and writing

$$U^{i}(a_i, A_i^{1}, a_{i-1}, \theta_i, p) = -\frac{1 - \beta_1 - \beta_2}{2}\,a_i^{2} - \frac{\beta_1}{2}\,(a_i - A_i^{1})^{2} - \frac{\beta_2}{2}\,(a_i - a_{i-1})^{2} + (\theta_i - p)\,a_i.$$

Example 2.3. The class of models of strategic complementarity discussed in Cooper and John (1988). Again the reference group of agent i is $P_i = \{1, \ldots, n\} - \{i\}$. The set A is an interval on the line and $A_i = \frac{1}{n-1}\sum_{j \neq i} a_j$. There is no heterogeneity and the utility of each agent is $U^{i} = U(a_i, A_i)$. Cooper and John (1988) examine symmetric equilibria. The classic production externality example fits in this framework. Each agent chooses an effort $a_i$, and the resulting output is $f(a_i, \bar{a})$. Each agent consumes his per capita output and has a utility function $u(c_i, a_i)$. Write

$$U(a_i, A_i) = u\!\left(f\!\left(a_i, \frac{(n-1)A_i + a_i}{n}\right), a_i\right).$$

Example 2.4. A simple version of the model of Diamond (1982) on trading externalities. Each agent draws an $e_i$, which is his cost of production of a unit of the good. The $e_i$'s are distributed independently across agents and with a distribution H and density h > 0, with support on a (possibly infinite) interval [0, d]. After a period in which the agent decides to produce or not, he is matched at random with a single other agent, and if they both have produced, they exchange the goods and each enjoys utility u > 0. Otherwise, if the agent has produced, he obtains utility $\theta_i \geq 0$ from the consumption of his own good. If the agent has not produced, he obtains utility 0.

We assume that all agents use a cutoff policy, a level $x_i$ such that the agent produces if and only if $e_i \leq x_i$. We set $a_i = H(x_i)$, the probability that agent i will produce. Here, the reference group is again all j ≠ i, and

$$A_i = E(a_j \mid j \neq i) \equiv \frac{\sum_{j \neq i} a_j}{n - 1}.$$


Glaeser and Scheinkman

Hence, if he uses policy $a_i$, an agent has an expected utility that equals
\[ U^i(a_i, A_i, \theta_i) = \int_0^{H^{-1}(a_i)} \left[ u A_i + \theta_i (1 - A_i) - e \right] h(e)\, de . \]

Optimality requires that $x_i = \min\{u A_i + \theta_i (1 - A_i), d\}$. Suppose first that $\theta_i \equiv 0$. A symmetric equilibrium ($a_i \equiv a$) will exist whenever there is a solution to the equation
\[ a = H(ua). \tag{2.7} \]
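Equation (2.7) can be explored numerically. The sketch below is only an illustration under an assumed cost distribution that is not in the text: $H$ is taken to be the exponential distribution with rate one, $H(x) = 1 - e^{-x}$, so that $d$ is infinite. The function names are hypothetical. Since $a = 0$ always solves (2.7) in this case, the code looks for a strictly positive solution by bisection.

```python
import math

def H(x):
    """Assumed cost distribution (not from the text): exponential CDF, rate 1."""
    return -math.expm1(-x) if x > 0 else 0.0

def positive_symmetric_equilibrium(u, tol=1e-12):
    """Bisection on phi(a) = H(u*a) - a over (0, 1].

    Returns a strictly positive solution of a = H(u*a), or None when
    phi is not positive just above zero (then a = 0 is the only solution
    for this particular H).
    """
    phi = lambda a: H(u * a) - a
    lo, hi = 1e-9, 1.0          # phi(1) = H(u) - 1 < 0 for every finite u
    if phi(lo) <= 0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a_star = positive_symmetric_equilibrium(2.0)   # trade utility u = 2
```

Under this assumed $H$, a positive symmetric equilibrium appears only when $u > 1$; for smaller $u$ the function returns `None`.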

If $H$ is the uniform distribution in $[0, u]$, then every $a \in [0, 1]$ is a symmetric equilibrium. As we will show in Proposition 2.2, this situation is very special. For a fixed $H$, for almost every vector $\theta = (\theta_1, \ldots, \theta_n)$, (interior) equilibria are isolated.

Example 2.5. A matching example that requires multiple reference groups (Pesendorfer, 1995). In a simple version, there are two groups, leaders ($L$) and followers ($F$), with $n_L$ and $n_F$ members, respectively. An individual can use one of two kinds of clothes. Buying the first one ($a = 0$) is free; buying the second ($a = 1$) costs $p$. Agents are matched randomly to other agents using the same clothes. Suppose the utility of agent $i$, who is of type $t \in \{L, F\}$ and is matched to an agent of type $t'$, is
\[ V_i(t, t', a, p, \theta_i) = u(t, t') - ap + \theta_i a, \]
where $\theta_i$ is a parameter that shifts the preferences for the second kind of clothes. Assume that
\[ u(L, L) - u(L, F) > p > u(F, L) - u(F, F) > 0, \tag{2.8} \]
where we have abused notation by writing $u(L, L)$ instead of $u(t, t')$ with $t \in L$ and $t' \in L$, etc. In this example, each agent has two reference groups. If $i \in L$, then $P_i^1 = L - \{i\}$ and $P_i^2 = F$. On the other hand, if $i \in F$, then $P_i^1 = L$ and $P_i^2 = F - \{i\}$.

2.3. Equilibria with Continuous Actions

In this subsection, we derive results concerning the existence, number of equilibria, stability, and ergodicity of a basic continuous action model. We try not to rely on a specific structure of reference groups or to assume a specific weighting for each reference group. We assume that $A$ is a (possibly unbounded) interval in the real line, that each $U^i$ is at least twice continuously differentiable, and that the second partial derivative with respect to an agent's own action satisfies $U^i_{11} < 0$.$^{12}$ Each agent $i$ has a single reference group $P_i$. The choice of a single peer group for each agent and a scalar action is not crucial, but it substantially simplifies the notation.

$^{12}$ As usual, this inequality can be weakened by assuming that $U^i_{11} \le 0$ and that at the optimal choice strict inequality holds.


We also assume that the optimal choices are interior, and hence, because $i \notin P_i$, the first-order condition may be written as
\[ U^i_1(a_i, A_i, \theta_i, p) = 0. \tag{2.9} \]

Because $U^i_{11} < 0$, $a_i = g^i(A_i, \theta_i, p)$ is well defined and
\[ g^i_1(A_i, \theta_i, p) = -\frac{U^i_{12}(a_i, A_i, \theta_i, p)}{U^i_{11}(a_i, A_i, \theta_i, p)} . \tag{2.10} \]

We will write $G(a, \theta, p)$ for the function defined in $\mathbb{R}^n \times \Theta^n \times \Pi$ given by
\[ G(a, \theta, p) = \left( g^1(A_1, \theta_1, p), \ldots, g^n(A_n, \theta_n, p) \right). \]

Recall that, for given vectors $\theta = (\theta_1, \ldots, \theta_n) \in \Theta^n$ and $p$, an equilibrium for $(\theta, p)$ is a vector $a(\theta, p) = (a_1(\theta, p), \ldots, a_n(\theta, p))$ such that, for each $i$,
\[ a_i(\theta, p) = g^i(A_i(a(\theta, p)), \theta_i, p). \tag{2.11} \]
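Condition (2.11) says that an equilibrium is a fixed point of $G$. As a rough illustration, the sketch below computes such a fixed point for the linear best response of Example 2.2 with $p$ absorbed into $\theta_i$, that is, $g(A_i, \theta_i) = \beta A_i + \theta_i$, with equal weights on all other agents. The function names, the parameter values, and the choice of simple iteration as the solution method are assumptions made for this example only.

```python
def averages(a):
    """A_i: average action of all other agents (equal weights 1/(n-1))."""
    n, total = len(a), sum(a)
    return [(total - ai) / (n - 1) for ai in a]

def G(a, theta, beta):
    """Best-response map for the assumed linear rule g = beta*A_i + theta_i."""
    return [beta * Ai + th for Ai, th in zip(averages(a), theta)]

def solve_equilibrium(theta, beta, tol=1e-12, max_iter=10_000):
    """Iterate a <- G(a) from a = 0 until successive iterates agree to tol."""
    a = [0.0] * len(theta)
    for _ in range(max_iter):
        a_next = G(a, theta, beta)
        if max(abs(x - y) for x, y in zip(a, a_next)) < tol:
            return a_next
        a = a_next
    raise RuntimeError("iteration did not converge (is |beta| < 1?)")

theta = [0.2, -0.1, 0.5, 0.0]          # illustrative shocks
a_eq = solve_equilibrium(theta, beta=0.5)
```

In this linear case the equilibrium average action equals the average shock divided by $1 - \beta$, which gives a quick sanity check on the output.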

Proposition 2.1 gives conditions for the existence of an equilibrium.

Proposition 2.1. Given a pair $(\theta, p) \in \Theta^n \times \Pi$, suppose that $I$ is a closed bounded interval such that, for each $i$, $g^i(A_i, \theta_i, p) \in I$ whenever $A_i \in I$. Then there exists at least one equilibrium $a(\theta, p) \in I^n$. In particular, an equilibrium exists if there exists an $m \in \mathbb{R}$, with $[-m, m] \subset A^{\circ}$ (the interior of $A$), and such that, for any $i$ and $A_i \in [-m, m]$, $U^i_1(-m, A_i, \theta_i, p) \ge 0$ and $U^i_1(m, A_i, \theta_i, p) \le 0$.

Proof. If $a \in I^n$, because $A_i$ is a convex combination of the entries of $a$, $A_i \in I$. Because $g^i(A_i, \theta_i, p) \in I$ whenever $A_i \in I$, the (continuous) function $G(\cdot, \theta, p)$ maps $I^n$ into $I^n$, and therefore must have at least one fixed point. The second part of the proposition follows because $U^i_{11} < 0$ implies that $g^i(A_i, \theta_i, p) \in [-m, m]$ whenever $A_i \in [-m, m]$. QED

Proposition 2.1 gives us sufficient conditions for the existence of an equilibrium for a given $(\theta, p)$. The typical model, however, describes a process for generating the $\theta_i$'s in the cross-section. In this case, not all pairs $(\theta, p)$ are equally interesting. The process generating the $\theta_i$'s will impose a distribution on the vector $\theta$, and we need only to check the assumptions of Proposition 2.1 on a set of $\theta$'s that has probability one. For a fixed $p$, we define an invariant interval $I$ as any interval such that there exists a set $\Lambda \subset \Theta^n$ with $\mathrm{Prob}(\Lambda) = 1$, such that for each $i$, and for all $\theta \in \Lambda$, $g^i(A_i, \theta_i, p) \in I$ whenever $A_i \in I$. If multiple disjoint compact invariant intervals exist, multiple equilibria prevail with probability one.

It is relatively straightforward to construct models with multiple equilibria that are perturbations of models without heterogeneity.$^{13}$ Suppose that $\Theta$ is an interval containing 0 and that $g(A, \theta)$ is a smooth function that is increasing in both coordinates. The assumption that $g$ is increasing in $\theta$ is only a normalization. In contrast, the assumption that $g$ is increasing in $A$ is equivalent to $U_{12} > 0$ (i.e., an increase in the average action by the members of his reference group increases the marginal utility of an agent's own action). This assumption was called strategic complementarity in Bulow, Geanakoplos, and Klemperer (1985). Let $x$ be a stable fixed point of $g(\cdot, 0)$ [i.e., $g(x, 0) = x$ and $g_1(x, 0) < 1$]. If the interval $\Theta$ is small enough, there exists an invariant interval containing $x$. In particular, if a model without heterogeneity has multiple stable equilibria, the model with small noise, that is, where $\theta_i \in \Theta$, a small interval, will also have multiple equilibria.

$^{13}$ A model without heterogeneity is one where all utility functions $U^i$ and shocks $\theta_i$ are identical. We choose the normalization $\theta_i \equiv 0$. We will consider perturbations in which the utility functions are still uniform across agents, but the $\theta_i$ can differ across agents.

The condition on invariance must hold for almost all $\theta \in \Theta$. In particular, if we have multiple disjoint invariant intervals and we shrink $\Theta$, we must still have multiple disjoint invariant intervals. On the other hand, if we expand $\Theta$, we may lose a particular invariant interval, and multiple equilibria are no longer assured. An implication of this reasoning is that when individuals are sorted into groups according to their $\theta$'s, and agents do not interact across groups, then multiple equilibria are more likely to prevail. In Section 2.5, we discuss a model where agents sort on their $\theta$'s.

In this literature, strategic complementarity is the usual way to deliver the existence of multiple equilibria. The next example shows that, in contrast to the results of Cooper and John (1988), in our model, because we consider a richer structure of reference groups, strategic complementarity is not necessary for multiple equilibria.

Example 2.6. This is an example to show that, in contrast to the case of purely global interactions, strategic complementarity is not a necessary condition for multiple equilibria.
There are two sets of agents, $S_1$ and $S_2$, with $n$ agents in each set. For agents of a given set, the reference group consists of all the agents of the other set. If $i \in S_k$,
\[ A_i = \frac{1}{n} \sum_{j \in S_\ell} a_j, \qquad \ell \ne k. \]
There are two goods, and the relative price is normalized to one. Each agent has an initial income of one unit, and his objective is to maximize
\[ U^i(a_i, A_i) = \log a_i + \log(1 - a_i) + \frac{\lambda}{2} (a_i - A_i)^2 . \tag{2.12} \]

Only the first good exhibits social interactions, and agents of each set want to differentiate from the agents of the other set. Provided $\lambda < 8$, $U^i_{11} < 0$. However, there is no strategic complementarity: an increase in the action of others (weakly) decreases the marginal utility of an agent's own action. We will look for equilibria with $a_i$ constant within each set. An equilibrium of this type is described by a pair $x, y$ of actions for each set of agents. In equilibrium we must have:
\[ 1 - 2x + \lambda x (1 - x)(x - y) = 0, \tag{2.13} \]
\[ 1 - 2y + \lambda y (1 - y)(y - x) = 0. \tag{2.14} \]
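The system (2.13)–(2.14) is easy to check numerically. The sketch below evaluates both residuals at the symmetric point and at the asymmetric pair $x = .55$, $y = .45$ discussed in the text, taking $\lambda = 4/0.99 \approx 4.04040404$; writing $\lambda$ as this exact fraction is an assumption about how the quoted decimal was obtained. The helper names are hypothetical.

```python
def residuals(x, y, lam):
    """Left-hand sides of (2.13) and (2.14)."""
    r1 = 1 - 2 * x + lam * x * (1 - x) * (x - y)
    r2 = 1 - 2 * y + lam * y * (1 - y) * (y - x)
    return r1, r2

def own_second_derivative(a, lam):
    """U_11 for the utility (2.12): -1/a^2 - 1/(1-a)^2 + lambda."""
    return -1.0 / a**2 - 1.0 / (1.0 - a) ** 2 + lam

lam = 4 / 0.99                               # approx. 4.04040404
symmetric = residuals(0.5, 0.5, lam)         # both residuals vanish
asymmetric = residuals(0.55, 0.45, lam)      # vanish up to rounding error
concave = own_second_derivative(0.55, lam) < 0
```

The cross-partial of (2.12) is $U_{12} = -\lambda < 0$, so the asymmetric solutions arise without strategic complementarity, as the example claims.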

Clearly $x = y = 1/2$ is always an equilibrium. It is the unique equilibrium that is symmetric across groups. Provided $\lambda < 4$, the Jacobian associated with equations (2.13) and (2.14) is positive, which is compatible with uniqueness even if we consider asymmetric equilibria. However, whenever $\lambda > 4$, the Jacobian becomes negative and other equilibria must appear. For instance, if $\lambda = 4.04040404$, $x = .55$ and $y = .45$ is an equilibrium, and consequently so is $x = .45$ and $y = .55$. Hence, at least three equilibria are obtained, without strategic complementarity.

Proposition 2.1 gives existence conditions that are independent of the structure of the reference groups and the weights $\gamma_{ij}$. Also, the existence of multiple invariant intervals is independent of the structure of interactions embedded in the $P_i$'s and $\gamma_{ij}$'s, and is simply a result of the choice of an individual's action, given the "average action" of his reference group, the distribution of his taste shock, and the value of the exogenous parameter $p$.

In some social interaction models, such as the Diamond search model (Example 2.4), there may exist a continuum of equilibria. The next proposition shows that these situations are exceptional.

Proposition 2.2. Suppose $\Theta$ is an open subset of $\mathbb{R}^k$ and that there exists a coordinate $j$ such that $\partial U^i_1 / \partial \theta_i^j \ne 0$; that is, $\theta_i^j$ has an effect on the marginal utility of the action. Then, for each fixed $p$, except for a subset of $\Theta^n$ of Lebesgue measure zero, the equilibria are isolated. In particular, if the $\theta_i^j$'s are independently distributed with marginals that have a density with respect to the Lebesgue measure, then, for each fixed $p$, except for a subset of $\Theta^n$ of zero probability, the equilibria are isolated.

Proof. For any $p$, consider the map $F(a, \theta) = a - G(a, \theta, p)$. The matrix of partial derivatives of $F$ with respect to $\theta^j$ is a diagonal matrix with entry $d_{ii} \ne 0$, because $\partial U^i_1 / \partial \theta_i^j \ne 0$. Hence, for each fixed $p$, $DF$ has rank $n$, and it is a consequence of Sard's theorem (see, e.g., Mas-Colell 1985, p. 320) that, except perhaps for a subset of $\Theta^n$ of Lebesgue measure zero, $F_1$ has rank $n$. The implicit function theorem yields the result. QED

Consider again the search model discussed in Example 2.4. Suppose that $u \le d$ and that each $\theta_i$ is in an open interval contained in $(0, d)$. Then, at any interior equilibrium, the assumptions of the Proposition are satisfied. This justifies our earlier claim that the continuum of equilibria that exists when $\theta_i \equiv 0$ is exceptional. In the model discussed in Example 2.2, if $p = 0$, $\beta = 1$, and the reference group of each agent is made up of all other agents (with equal weights), then if $\sum_i \theta_i \ne 0$, there are no equilibria, whereas if $\sum_i \theta_i = 0$, there is a continuum. Again, the continuum of equilibria is exceptional. However, if $\beta < 1$, there is a unique equilibrium for any vector $\theta$. This situation is less discontinuous than it seems. In equilibrium,
\[ \frac{\sum_i a_i}{n} = \frac{1}{1 - \beta} \, \frac{\sum_i \theta_i}{n} . \]
Hence, if we fix $\sum_i \theta_i \ne 0$ and drive $\beta$ to 1, the average action becomes unbounded. Although Proposition 2.2 is stated using the $\theta_i$'s as parameters, it is also true that isolated equilibria become generic if there is heterogeneity across individuals' utility functions.

One occasionally proclaimed virtue of social interaction models is that they create the possibility that multiple equilibria might exist. Proposition 2.1 gives us sufficient conditions for there to be multiple equilibria in social interaction models. One way to ensure uniqueness in this context is to place a bound on the effect of social interactions. We will say that MSI prevails if the marginal utility of an agent's own action is more affected (in absolute value) by a change in his own action than by a change in the average action of his peers. More precisely, we say that MSI prevails if
\[ \left| \frac{U^i_{12}(a_i, A_i, \theta_i, p)}{U^i_{11}(a_i, A_i, \theta_i, p)} \right| < 1. \tag{2.15} \]

From equation (2.10), the MSI condition implies
\[ |g^i_1(A_i, \theta_i, p)| < 1. \tag{2.16} \]

This last condition is, in fact, weaker than inequality (2.15), because it is equivalent to inequality (2.15) when $a_i$ is optimal, given $(A_i, \theta_i, p)$. We use only inequality (2.16), and therefore we will refer to this term as the MSI condition. The next proposition shows that, if the MSI condition holds, there will be at most one equilibrium.$^{14}$

$^{14}$ Cooper and John (1988) had already remarked that an analogous condition is sufficient for uniqueness in the context of their model.

Proposition 2.3. If for a fixed $(\theta, p)$, MSI holds [that is, inequality (2.16) is verified for all $i$], then there exists at most one equilibrium $a(\theta, p)$.

Proof. The matrix of partial derivatives of $G$ with respect to $a$, which we denote by $G_1(a, \theta, p)$, has diagonal elements equal to 0 and, using equation (2.10), off-diagonal elements $d_{ij} = g^i_1(A_i, \theta_i, p) \gamma_{ij}$. Also, for each $i$,
\[ \sum_{j \ne i} |d_{ij}| = |g^i_1(A_i, \theta_i, p)| \sum_{j \ne i} \gamma_{ij} = |g^i_1(A_i, \theta_i, p)| < 1. \]
It follows from the mean-value theorem that, for each $(\theta, p)$, $G(a, \theta, p) = a$ has at most one solution. QED

To guarantee that uniqueness always prevails, MSI should hold for all $(\theta, p) \in \Theta^n \times \Pi$. The assumption in Proposition 2.3 is independent of the structure of interactions embedded in the $P_i$'s and the $\gamma_{ij}$'s. An example where MSI is satisfied is when $U(a_i, A_i, \theta_i, p) = u(a_i, \theta_i, p) + w(a_i - A_i, p)$, where $u_{11} < 0$, and, for each $p$, $w(\cdot, p)$ is concave.

If, in addition to MSI, we assume strategic complementarity ($U_{12} > 0$), we can derive stronger results. Suppose $p$ has a component, say $p^1$, such that each $g^i$ has a positive partial derivative with respect to $p^1$. In equilibrium, we have, writing $F_1 = I - G_1$,
\[ \frac{\partial a}{\partial p^1} = (F_1)^{-1}(a, \theta, p) \left( \frac{\partial g^1}{\partial p^1}, \ldots, \frac{\partial g^n}{\partial p^1} \right)^{\top} . \tag{2.17} \]
Because $F_1$ has a dominant diagonal that is equal to one, we may use the Neumann expansion to write
\[ (F_1)^{-1} = I + (I - F_1) + (I - F_1)^2 + \cdots . \tag{2.18} \]

Recall that all diagonal elements of $(I - F_1)$ are zero and that the off-diagonal elements are $g^i_1(A_i, \theta_i, p) \gamma_{ij} > 0$. Hence, each of the terms in this infinite series is a matrix with nonnegative entries, and
\[ \frac{\partial a}{\partial p^1} = (I + H) \left( \frac{\partial g^1}{\partial p^1}, \ldots, \frac{\partial g^n}{\partial p^1} \right)^{\top}, \tag{2.19} \]
where $H$ is a matrix with nonnegative elements. The nonnegativity of the matrix $H$ means that there is a social multiplier (as in Becker and Murphy 2000).$^{15}$ An increase in $p^1$, holding all $a_j$'s, $j \ne i$, constant, leads to a change
\[ da_i = \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1} \, dp^1, \]
whereas, in equilibrium, that change equals
\[ \left[ \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1} + \sum_j H_{ij} \frac{\partial g^j(A_j, \theta_j, p)}{\partial p^1} \right] dp^1 . \]

$^{15}$ Cooper and John (1988) define a similar multiplier by considering symmetric equilibria of a game.

The effect of a change in $p^1$ on the average $\bar{A} \equiv \sum_i a_i / n$ is, in turn,
\[ d\bar{A} = \frac{1}{n} \left[ \sum_i \frac{\partial g^i(A_i, \theta_i, p)}{\partial p^1} + \sum_{i,j} H_{ij} \frac{\partial g^j(A_j, \theta_j, p)}{\partial p^1} \right] dp^1 . \]
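The multiplier in (2.17)–(2.19) can be illustrated with a truncated Neumann series. The sketch below assumes a constant slope $g^i_1 = \beta$ and equal weights $\gamma_{ij} = 1/(n-1)$, as in the linear model of Example 2.2; these choices, the truncation at 200 terms, and the function names are assumptions made for illustration only.

```python
def uniform_G1(n, beta):
    """G_1 with zero diagonal and off-diagonal entries beta * gamma_ij,
    where gamma_ij = 1/(n-1) (assumed equal weights)."""
    return [[0.0 if i == j else beta / (n - 1) for j in range(n)]
            for i in range(n)]

def neumann_response(G1, direct, terms=200):
    """Approximate (I - G1)^{-1} @ direct by sum_{k=0}^{terms} G1^k @ direct,
    the truncated expansion (2.18) applied to the direct effects."""
    n = len(direct)
    term = list(direct)           # k = 0 term
    total = list(direct)
    for _ in range(terms):
        term = [sum(G1[i][j] * term[j] for j in range(n)) for i in range(n)]
        total = [t + s for t, s in zip(total, term)]
    return total

G1 = uniform_G1(5, 0.6)
common = neumann_response(G1, [1.0] * 5)                 # everyone's g shifts by 1
single = neumann_response(G1, [1.0, 0.0, 0.0, 0.0, 0.0])  # only agent 1 shifts
```

With a common unit shift, each agent's equilibrium response is $1/(1-\beta)$, which is 2.5 here. With a shift to a single agent, every entry of the response is nonnegative and that agent's own response exceeds the direct effect of one, reflecting the nonnegative matrix $H$.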

This same multiplier also impacts the effect of the shocks $\theta_i$. Differences in the sample realizations of the $\theta_i$'s are amplified through the social multiplier effect. The size of the social multiplier depends on the value of $g^i_1 \equiv \partial g^i / \partial A_i$. If these numbers are bounded away from one, one can bound the social multiplier. However, as these numbers approach unity, the social multiplier effect gets arbitrarily large. In this case, two populations with slightly distinct realizations of the $\theta_i$'s could exhibit very different average values of the actions. In the presence of unobserved heterogeneity, it may be impossible to distinguish between a large multiplier (that is, $g_1$ is near unity) and multiple equilibria.

Propositions 2.1 and 2.3 give us conditions for multiplicity or uniqueness. At this level of generality, it is impossible to refine these conditions. It is easy to construct examples where $g_1 > 1$ in some range, but still only one equilibrium exists.

One common way to introduce ad hoc dynamics in social interaction models is to simply assume that, in period $t$, each agent chooses his action based on the choices of the agents in his reference group at time $t - 1$.$^{16}$ Such processes are not guaranteed to converge, but the next proposition shows that when MSI prevails, convergence occurs. Let $a^t(\theta, p, a^0)$ be the solution to the difference equation $a^{t+1} = G(a^t, \theta, p)$, with initial value $a^0$.

Proposition 2.4. If, for a fixed $(\theta, p)$, $|g^i_1(\cdot, \theta_i, p)| < 1$ for all $i$, then
\[ \lim_{t \to \infty} a^t(\theta, p, a^0) = a(\theta, p). \]

Proof. For any matrix $M$, let $\|M\| = \max_i \sum_j |M_{ij}|$ be the matrix norm. Then,
\[ \max_i |a_i^{t+1} - a_i(\theta, p)| \le \sup_y \|G_1(y, \theta, p)\| \left( \max_i |a_i^t - a_i(\theta, p)| \right) \le \max_i |a_i^t - a_i(\theta, p)| . \]
Hence, the vectors $a^t$ stay in a bounded set $B$ and, by assumption, $\sup_{y \in B} \|G_1(y, \theta, p)\| < 1$. Hence, $\lim_{t \to \infty} a^t(\theta, p, a^0) = a(\theta, p)$. QED

$^{16}$ In social interaction models, ad hoc dynamics is frequently used to select among equilibria as in Young (1993, 1998) or Blume and Durlauf (1998).

One intriguing feature of social interaction models is that, in some of these models, individual shocks can determine aggregate outcomes for large groups.


In contrast to the results presented earlier, which are independent of the particular interaction structure, ergodicity depends on a more detailed description of the interactions. For instance, consider the model in Example 2.2 with $p = 0$, the $\theta_i$'s iid, $P_1 = \emptyset$, and $P_i = \{1\}$ for each $i > 1$. That is, agent 1 is a "leader" that is followed by everyone. Then, $a_1 = \theta_1$ and $a_i = \theta_i + \beta a_1$. Hence, the average action, even as $n \to \infty$, depends on the realization of $\theta_1$, even though the assumption of Proposition 2.3 holds. Our next proposition shows that, when MSI holds, shocks are iid, and individuals' utility functions depend only on their own actions and the average action of their peer group, then, under mild technical conditions, the average action of a large population is independent of the particular realization of the shocks.

Proposition 2.5. Suppose that
1. $\theta_i$ is identically and independently distributed.
2. $U^i$ (and hence $g^i$) is independent of $i$.
3. $P_i = \{1, \ldots, i-1, i+1, \ldots, n\}$.
4. $\gamma_{i,j} \equiv 1/(n-1)$.
5. $A$ is bounded.
6. MSI holds uniformly, that is, $\sup_{A_i, \theta_i} |g_1(A_i, \theta_i, p)| < 1$.

Let $a^n(\theta, p)$ denote the equilibrium when $n$ agents are present and agent $i$ receives shock $\theta_i$. Then, there exists an $\bar{A}(p)$ such that, with probability one,
\[ \lim_{n \to \infty} \sum_{i=1}^{n} \frac{a_i^n(\theta, p)}{n} = \bar{A}(p). \tag{2.20} \]

Proof. We omit the argument $p$ from the proof. Let $A^n(\theta) = \sum_{i=1}^{n} a_i^n(\theta)/n$. The boundedness of $A$ ensures that there are convergent subsequences $A^{n_k}(\theta)$. Suppose the limit of one such convergent subsequence is $A(\theta)$. Note that $|A_i^{n_k}(\theta) - A^{n_k}(\theta)| \le b/n_k$, for some constant $b$. Hence, for any $\epsilon > 0$, we can find $K$ such that if $k \ge K$,
\[ \left| \sum_{i=1}^{n_k} \frac{a_i^{n_k}(\theta)}{n_k} - \sum_{i=1}^{n_k} \frac{g(A(\theta), \theta_i)}{n_k} \right| = \left| \sum_{i=1}^{n_k} \frac{g(A_i^{n_k}, \theta_i)}{n_k} - \sum_{i=1}^{n_k} \frac{g(A(\theta), \theta_i)}{n_k} \right| \le \epsilon . \tag{2.21} \]
Furthermore, because the $\theta_i$ are iid and $g_1$ is uniformly bounded, there exists a set of probability one that can be chosen independent of $A$, such that
\[ \sum_{i=1}^{n} \frac{g(A, \theta_i)}{n} \to \int g(A, y) \, dF(y), \]
where $F$ is the distribution of each $\theta_i$. Hence, given any $\epsilon > 0$, if $k$ is sufficiently large,
\[ \left| A^{n_k}(\theta) - \int g(A(\theta), y) \, dF(y) \right| \le \epsilon, \]
or
\[ A(\theta) = \int g(A(\theta), y) \, dF(y). \]
Condition 6 in the hypothesis of the proposition guarantees that $g(\cdot, \theta_i)$ is a contraction and, as a consequence, this last equation has at most one solution, $\bar{A}$. In particular, all convergent subsequences of the bounded sequence $A^n(\theta)$ converge to $\bar{A}$ and, hence, $A^n(\theta) \to \bar{A}$. QED

The assumptions in the proposition are sufficient, but not necessary, for ergodicity. In general, models in which shocks are iid and interactions are local tend to display ergodic behavior.

2.4. "Mean Field" Models with Large Populations and Discrete Actions

In this subsection, we will examine models with discrete action spaces (actually two possible actions), in which the utility function of the agents depends on their own action and the average action taken by the population. Much of our framework and results are inspired by the treatment by Brock and Durlauf (1995) of Example 2.1 described previously. The action space of individuals is $\{0, 1\}$. As in Brock and Durlauf, we will assume that $U^i = U(a_i, A, p) + (1 - a_i)\theta_i$; that is, the shock $\theta_i$ is the extra utility an agent obtains from taking action 0. We will assume that $U(a_i, \cdot, \cdot)$ is smooth and that the $\theta_i$'s are iid with a cdf $F$ with continuous density $f$. Agents do not internalize the effect that their action has on the average action. We also assume strategic complementarity, which in this context we take to be $U_2(1, A, p) - U_2(0, A, p) > 0$; that is, an increase in the average action increases the difference in utility between action 1 and action 0.

Given $A$, agent $i$ will take action 1 if, and only if, $\theta_i \le U(1, A, p) - U(0, A, p)$. In a large population, a fraction $F(U(1, A, p) - U(0, A, p))$ will take action 1; the remainder will take action 0. A mean-field equilibrium, hereafter MFE, is an average action $\bar{A}$ such that
\[ F(U(1, \bar{A}, p) - U(0, \bar{A}, p)) - \bar{A} = 0. \tag{2.22} \]

This deﬁnition of an MFE is exactly as in the Brock and Durlauf treatment of Example 2.1. The next proposition corresponds to their results concerning equilibria in that example.
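Equation (2.22) can be solved on a grid once $F$ and the utility gap are specified. The sketch below is a hypothetical specification, not the one in the text: it assumes a logistic cdf with scale $\sigma$ for the shocks and a linear, unbiased utility gap $U(1, A, p) - U(0, A, p) = J(2A - 1)$ with $J > 0$, so that strategic complementarity holds. The names `J`, `sigma`, and the functions are illustrative.

```python
import math

def F(z, sigma=1.0):
    """Assumed shock distribution: logistic cdf with scale sigma."""
    return 1.0 / (1.0 + math.exp(-z / sigma))

def excess(A, J, sigma=1.0):
    """Left-hand side of (2.22) under the assumed gap J*(2A - 1)."""
    return F(J * (2.0 * A - 1.0), sigma) - A

def mean_field_equilibria(J, sigma=1.0, grid=2001, tol=1e-12):
    """Scan [0, 1] for zeros of `excess`, refining sign changes by bisection."""
    roots = []
    pts = [k / (grid - 1) for k in range(grid)]
    for lo, hi in zip(pts, pts[1:]):
        f_lo, f_hi = excess(lo, J, sigma), excess(hi, J, sigma)
        if f_lo == 0.0:                 # grid point is itself an equilibrium
            roots.append(lo)
        elif f_lo * f_hi < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if excess(lo, J, sigma) * excess(mid, J, sigma) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots

weak = mean_field_equilibria(J=1.0)     # weak social influence
strong = mean_field_equilibria(J=4.0)   # strong social influence
```

Under these assumptions, $A = 1/2$ always solves (2.22); it is the only solution when $J$ is small relative to $\sigma$, and two further solutions, symmetric around $1/2$, appear when $J$ is large.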


Proposition 2.6. An MFE always exists. If $0 < \bar{A} < 1$ is an equilibrium where
\[ f(U(1, \bar{A}, p) - U(0, \bar{A}, p)) \, [U_2(1, \bar{A}, p) - U_2(0, \bar{A}, p)] > 1, \tag{2.23} \]
then there are also at least two other MFEs, one on each side of $\bar{A}$. On the other hand, if, at every MFE, $f(U(1, \bar{A}, p) - U(0, \bar{A}, p)) \, [U_2(1, \bar{A}, p) - U_2(0, \bar{A}, p)] < 1$, there exists a single MFE.

Proof. $H(A) = F(U(1, A, p) - U(0, A, p)) - A$ satisfies $H(0) \ge 0$ and $H(1) \le 0$ and is continuous. If inequality (2.23) holds, then $H(\bar{A}) = 0$ and $H'(\bar{A}) > 0$. QED

The first term on the left-hand side of inequality (2.23) is the density of agents that are indifferent between the two actions, when the average action is $\bar{A}$. The second term is the marginal impact of the average action on the preference for action 1 over action 0, which, by our assumption of strategic complementarity, is always positive. This second term corresponds exactly to the intensity of social influence that played a pivotal role in determining the uniqueness of equilibrium in the model with a continuum of actions. If there is a unique equilibrium,$^{17}$ then
\[ \frac{\partial \bar{A}}{\partial p} = \frac{f(U(1, \bar{A}, p) - U(0, \bar{A}, p)) \, [U_3(1, \bar{A}, p) - U_3(0, \bar{A}, p)]}{1 - f(U(1, \bar{A}, p) - U(0, \bar{A}, p)) \, [U_2(1, \bar{A}, p) - U_2(0, \bar{A}, p)]} . \tag{2.24} \]

$^{17}$ Here and in what follows, we require strict uniqueness; that is, the left-hand side of inequality (2.23) is less than one.

The numerator in this expression is exactly the average change in action, when $p$ changes, and agents consider that the average action remains constant. The denominator is, if uniqueness prevails, positive. As we emphasized in the model with continuous actions, there is a continuity in the multiplier effect. As the parameters of the model ($U$ and $F$) approach the region of multiple equilibria, the effect of a change in $p$ on the equilibrium average action approaches infinity.

In many examples, the distribution $F$ satisfies:
1. Symmetry ($f(z) = h(|z|)$)
2. Monotonicity ($h$ is decreasing)

If, in addition, the model is unbiased [$U(1, 1/2, p) = U(0, 1/2, p)$], then $A = 1/2$ is an MFE. The fulfillment of inequality (2.23) now depends on the value of $f(0)$. This illustrates the role of homogeneity of the population in producing multiple equilibria. If we consider a parameterized family of models in which the random variable $\theta_i = \sigma x_i$, where $\sigma > 0$, then $f_\sigma(0) = (1/\sigma) f_1(0)$. As $\sigma \to 0$ ($\sigma \to \infty$), inequality (2.23) must hold (resp. must reverse). In particular, if the population is homogeneous enough, multiple equilibria must prevail in the unbiased case.

These reasonings can be extended to biased models, if we assume that $[U_2(1, \cdot, p) - U_2(0, \cdot, p)]$ is bounded and bounded away from zero, and that the density $f_1$ is continuous and positive.$^{18}$ For, in this case, for $\sigma$ large,
\[ \sup_A \left\{ f_\sigma(U(1, A, p) - U(0, A, p)) \, [U_2(1, A, p) - U_2(0, A, p)] \right\} < 1. \tag{2.25} \]
Hence, equilibrium will be unique if the population displays sufficient heterogeneity. On the other hand, as $\sigma \to 0$, inequality (2.25) is reversed and multiple equilibria appear.

We can derive more detailed properties if we assume, in addition to the symmetry and monotonicity properties of $f$, that $U_{22}(1, A, p) - U_{22}(0, A, p) \le 0$; that is, the average action $A$ has a diminishing marginal impact on the preference for the high action. In that case, it is easy to show that there are at most three equilibria.

2.5. Choice of Peer Group

The mathematical structure and the empirical description of peer or reference groups vary from model to model. In several models (e.g., Benabou, 1993, Glaeser, Sacerdote, and Scheinkman, 1996, Gabszewicz and Thisse, 1996, or Mobius, 1999), the reference group is formed by geographical neighbors. To obtain more precise results, one must further specify the mathematical structure of the peer group relationship, typically assuming either that all fellow members of a given geographical unit form a reference group or that each agent's reference group is formed by a set of near-neighbors. Mobius (1999) shows that, in a context that generalizes Schelling's (1972) tipping model, the persistence of segregation depends on the particular form of the near-neighbor relationship. Glaeser, Sacerdote, and Scheinkman (1996) show that the variance of crime rates across neighborhoods or cities would be a function of the form of the near-neighbor relationship.

Kirman (1983), Kirman, Oddou, and Weber (1986), and Ioannides (1990) use random graph theory to treat the peer group relationship as random. This approach is particularly useful in deriving properties of the probable peer groups as a function of the original probability of connections. Another literature deals with individual incentives for the formation of networks (e.g., Boorman, 1975, Jackson and Wolinsky, 1996, and Bala and Goyal 2000).$^{19}$

$^{18}$ An example that satisfies these conditions is the model of Brock and Durlauf described in Example 2.1. Brock and Durlauf use a slightly different state space, but once the proper translations are made, $U_2(1, A, p) - U_2(0, A, p) = kJ$ for a positive constant $k$ and $0 < f_1(z) \le \nu$.

$^{19}$ A related problem is the formation of coalitions in games (e.g., Myerson, 1991).


One way to model peer group choice is to consider a set of neighborhoods indexed by $\ell = 1, \ldots, m$, each with $n_\ell$ slots, with $\sum_\ell n_\ell \ge n$.$^{20}$ Every agent chooses a neighborhood to join after the realization of the $\theta_i$'s. To join neighborhood $P^\ell$, one must pay $q_\ell$. The peer group of agent $i$, if he joins neighborhood $\ell$, consists of all other agents $j$ that joined $\ell$, with $\gamma_{ij} = \gamma_{ij'}$ for all peers $j$ and $j'$. We will denote by $A^\ell$ the average action taken by all agents in neighborhood $\ell$. Our equilibrium notion, in this case, will parallel Tiebout's equilibrium (see, e.g., Bewley 1981). For given vectors $\theta = (\theta_1, \ldots, \theta_n) \in \Theta^n$ and $p$, an equilibrium will be a set of prices $(q_1, \ldots, q_m)$, an assignment of agents to neighborhoods, and a vector of actions $a = (a_1, \ldots, a_n)$ that is an equilibrium given the peer groups implied by the assignment, such that, if agent $i$ is assigned to neighborhood $\ell$, there is no neighborhood $\ell'$ such that
\[ \sup_{a_i} U^i(a_i, A^{\ell'}, \theta_i, p) - q_{\ell'} > \sup_{a_i} U^i(a_i, A_i, \theta_i, p) - q_\ell . \tag{2.26} \]

$^{20}$ This treatment of peer group formation is used in Benabou (1993) and Glaeser and Scheinkman (2001). However, in several cases, peer groups have no explicit fees for entry. Mailath, Samuelson, and Shaked (1996) examine the formation of peer groups when agents are matched to others from the same peer group.

In other words, in an equilibrium with endogenous peer groups, we add the additional restriction that no agent prefers to move. To examine the structure of the peer groups that arise in equilibrium we assume, for simplicity, that the $U^i$'s are independent of $i$, that is, that all heterogeneity is represented in the $\theta_i$'s. If an individual with a higher $\theta$ gains more utility from an increase of the average action than an individual with a lower $\theta$, then segregation obtains in equilibrium. More precisely, if $\Theta$ is an interval $[t_0, t^0]$ of the line, and
\[ V(A, \theta, p) \equiv \sup_{a_i} U(a_i, A, \theta, p) \]
satisfies $V(A, \theta, p) - V(A', \theta, p) > V(A, \theta', p) - V(A', \theta', p)$ whenever $A > A'$ and $\theta > \theta'$, there exist points $t_0 < t_1 < \cdots < t_m = t^0$ such that agent $i$ chooses neighborhood $\ell$ if and only if $\theta_i \in [t_{\ell-1}, t_\ell]$ (e.g., Benabou, 1993, and Glaeser and Scheinkman, 2001). Although other equilibria exist, these are the only "stable" ones.

3. EMPIRICAL APPROACHES TO SOCIAL INTERACTIONS

The theoretical models of social interaction discussed previously are, we believe, helpful in understanding a wide variety of important empirical regularities. In principle, large differences in outcomes between seemingly homogeneous populations, radical shifts in aggregate patterns of behavior, and spatial concentration and segregation can be understood through social interaction models. But these models are not only helpful in understanding stylized facts; they can also serve as the basis for more rigorous empirical work. In this section, we outline the empirical approaches that can be and have been used to actually measure the magnitude of social interactions.

For simplicity, in this empirical section, we focus on the linear-quadratic version of the model discussed in Example 2.2. Our decision to focus on the linear-quadratic model means that we ignore some of the more important questions in social interactions. For example, the case for place-based support to impoverished areas often hinges on a presumption that social interactions have a concave effect on outcomes. Thus, if impoverished neighborhoods can be improved slightly by an exogenous program, then the social impact of this program (the social multiplier of the program) will be greater than if the program had been enacted in a more advantaged neighborhood. The case for desegregation also tends to hinge on concavity of social interactions. Classic desegregation might involve switching low human capital people from a disadvantaged neighborhood and high human capital people from a successful neighborhood. This switch will be socially advantageous if moving the low human capital people damages the skilled area less than moving the high human capital people helps the less skilled area. This will occur when social interactions operate in a concave manner.
As important as the concavity or convexity of social interactions has been, most of the work in this area has focused on estimating linear effects.²¹ To highlight certain issues that arise in the empirical analysis, we make many simplifying assumptions that help us focus on the relevant problems.²² We will use the linear model in Example 2.2. We assume we can observe data on $C$ equally sized²³ groups. All interactions occur within a group. Rewriting equation (2.6) for the optimal action, to absorb $p$ in the $\theta_i$, we have

$$a_i = \beta A_i + \theta_i. \tag{3.1}$$

²¹ Crane (1991) is a notable exception. He searches for nonlinearities across a rich range of variables and finds some evidence for concavity in the social interactions involved in out-of-wedlock births. Reagan, Weinberg, and Yankow (2000) similarly explore nonlinearities in research on work behavior and find evidence for concavity.
²² A recent survey of the econometrics of a class of interaction-based binary choice models, and a review of the empirical literature, can be found in Brock and Durlauf (2001).
²³ The assumption of equally sized groups is made only to save on notation.

We will examine here a simple form of global interactions. If agent $i$ belongs to group $\ell$,

$$A_i = \frac{1}{n-1} \sum_{j \neq i} a_j,$$

where the sum is over the agents $j$ in group $\ell$, and $n$ is the size of a group. We will also assume that $\theta_i = \lambda_\ell + \varepsilon_i$, where $\lambda_\ell$ is a place-specific variable (perhaps price) that affects everyone in the group, and $\varepsilon_i$ is an idiosyncratic shock. The $\varepsilon_i$'s are assumed to be iid with mean zero, and hence independent across people. The average action within a group is

$$\frac{\sum_i a_i}{n} = \frac{\lambda_\ell}{1-\beta} + \frac{\sum_i \varepsilon_i}{n(1-\beta)}. \tag{3.2}$$

The optimal action of agent $i$ is then

$$a_i = \frac{\lambda_\ell}{1-\beta} + \frac{(n-1-\beta n+2\beta)\,\varepsilon_i}{(n-1+\beta)(1-\beta)} + \frac{\beta \sum_{j \neq i} \varepsilon_j}{(n-1+\beta)(1-\beta)}. \tag{3.3}$$
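As a numerical check on (3.1) through (3.3), the following Python sketch solves the within-group system by repeated best responses and compares the result with the closed form (3.3). It is an illustration rather than part of the original analysis: the function names, the seed, and the values of $\beta$, $\lambda_\ell$, and $n$ are assumptions chosen for the example.

```python
import random

def closed_form_action(i, lam, eps, beta):
    """Action of agent i from the closed form (3.3)."""
    n = len(eps)
    denom = (n - 1 + beta) * (1 - beta)
    own = (n - 1 - beta * n + 2 * beta) * eps[i] / denom
    others = beta * (sum(eps) - eps[i]) / denom
    return lam / (1 - beta) + own + others

def iterate_best_responses(lam, eps, beta, rounds=500):
    """Iterate a_i <- beta * A_i + theta_i, where A_i averages the other n - 1 actions."""
    n = len(eps)
    theta = [lam + e for e in eps]
    actions = theta[:]
    for _ in range(rounds):
        total = sum(actions)
        actions = [beta * (total - actions[i]) / (n - 1) + theta[i] for i in range(n)]
    return actions

# Illustrative values only (assumptions, not taken from the chapter).
rng = random.Random(0)
beta, lam, n = 0.5, 1.0, 30
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]

iterated = iterate_best_responses(lam, eps, beta)
closed = [closed_form_action(i, lam, eps, beta) for i in range(n)]
max_gap = max(abs(x - y) for x, y in zip(iterated, closed))
print(f"largest gap between iteration and (3.3): {max_gap:.2e}")
```

The iteration converges here because $0 < \beta < 1$ makes the best-response map a contraction. For $\beta$ outside that range a direct linear solve would be the safer choice.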

The variance of actions on the whole population is

$$\mathrm{Var}(a_i) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + \sigma_\varepsilon^2 \left[ 1 + \left( \frac{\beta}{1-\beta} \right)^2 \frac{3(n-1) - 2\beta(n-2) - \beta^2}{(n-1+\beta)^2} \right]. \tag{3.4}$$

As $n \to \infty$, this converges to $[\sigma_\lambda^2/(1-\beta)^2] + \sigma_\varepsilon^2$. In this case, and in the cases that are to follow, even moderate levels of $n$ ($n = 30+$) yield results that are quite close to the asymptotic result. For example, if $n = 40$ and $\beta \le .5$, then the bias is at most $-.05\sigma_\varepsilon^2$. Higher values of $\beta$ are associated with more severe negative biases; but, when $n = 100$, a value of $\beta = .75$ (which we think of as being quite high) is associated with a bias of only $-.135\sigma_\varepsilon^2$.
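The two bias figures quoted above can be reproduced directly from (3.4). The sketch below is a minimal illustration, assuming that "bias" means the asymptotic variance minus the exact finite-$n$ variance, expressed in units of $\sigma_\varepsilon^2$. The function name is hypothetical.

```python
def finite_n_excess(n, beta):
    """Coefficient on sigma_eps^2 in (3.4) in excess of its large-n limit of 1."""
    scale = (beta / (1 - beta)) ** 2
    return scale * (3 * (n - 1) - 2 * beta * (n - 2) - beta ** 2) / (n - 1 + beta) ** 2

# Bias of the asymptotic formula = limit minus exact value (assumed sign convention).
for n, beta in [(40, 0.5), (100, 0.75), (1000, 0.75)]:
    bias = -finite_n_excess(n, beta)
    print(f"n={n:5d}  beta={beta:.2f}  bias = {bias:+.4f} * sigma_eps^2")
```

The first two rows correspond to the cases discussed in the text. The third row is an added case showing how quickly the correction shrinks as $n$ grows.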

3.1. Variances Across Space

The simplest, although hardly the most common, method of measuring the size of social interactions is to use the variance of a group average. The intuition of this approach stems from early work on social interactions and multiple equilibria [see, e.g., Schelling (1978), Becker (1991), or Sah (1991)]. These papers all use different social interaction models to generate multiple equilibria for a single set of parameter values. Although multiple equilibria are often used as an informal device to explain large cross-sectional volatility, in fact this multiplicity is not needed. What produces high variation is that social interactions are associated with large differences across time and space that cannot be fully justiﬁed by fundamentals. Glaeser, Sacerdote, and Scheinkman (1996) use this intuition to create a model in which social interactions are associated with a high degree of variance across space without multiple equilibria. Empirically, it is difﬁcult to separate out extremely high variances from multiple equilibria, but Glaeser and Scheinkman (2001) argue that for many variables high-variance models with a single equilibrium are a more parsimonious means of describing the data.


Suppose we obtain $m \le n$ observations of members of a group. The sum of the observed actions, normalized by dividing by the square root of the number of observations, will have variance

$$\mathrm{Var}\left( \frac{\sum_i a_i}{\sqrt{m}} \right) = \frac{m\sigma_\lambda^2}{(1-\beta)^2} + \frac{\sigma_\varepsilon^2}{(1-\beta)^2} + (n-m)\,\sigma_\varepsilon^2\, \frac{\beta^2(n-2) - 2\beta(n-1)}{(1-\beta)^2 (n-1+\beta)^2}. \tag{3.5}$$

When $m = n$, (3.5) reduces to $[n\sigma_\lambda^2/(1-\beta)^2] + [\sigma_\varepsilon^2/(1-\beta)^2]$, which is similar to the variance formula in Glaeser, Sacerdote, and Scheinkman (1996) or Glaeser and Scheinkman (2001). Thus, if $m = n$ and $\sigma_\lambda^2 = 0$, as $n \to \infty$ the ratio of the variance of this normalized aggregate to the variance of individual actions converges to $1/(1-\beta)^2$. Alternatively, if $m$ is fixed, then as $n$ grows large, the aggregate variance converges to

$$\frac{m\sigma_\lambda^2}{(1-\beta)^2} + \sigma_\varepsilon^2,$$

and the ratio of the aggregate variance to the individual variance (when $\sigma_\lambda^2 = 0$) converges to one.

The practicality of this approach hinges on the extent to which $\sigma_\lambda^2$ is either close to zero or known.²⁴ As discussed previously, $\lambda_\ell$ may be nonzero either because of correlation of background factors or because there are place-specific characteristics that jointly determine the outcomes of neighbors. In some cases, researchers may know that neighbors are randomly assigned and that omitted place-specific factors are likely to be small. For example, Sacerdote (2000) looks at the case of Dartmouth freshman year roommates who are randomly assigned to one another. He finds significant evidence for social interaction effects. In other contexts [see Glaeser, Sacerdote, and Scheinkman (1996)], there may be methods of putting an upper bound on $\sigma_\lambda^2$ that allow the variance methodology to work. Our work found extremely high aggregate variances that seem hard to reconcile with no social interactions for reasonable levels of $\sigma_\lambda^2$. In particular, we estimated high levels of social interactions for petty crimes and crimes of the young. We found lower levels of social interactions for more serious crimes.
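To illustrate how the variance approach recovers $\beta$, the sketch below evaluates (3.4) and (3.5) with $\sigma_\lambda^2 = 0$ and forms the ratio of the aggregate to the individual variance. The function names and the parameter values are assumptions made for this example. The inversion `beta_from_ratio` simply solves the large-$n$ limit $1/(1-\beta)^2$ for $\beta$ and is only appropriate when $m = n$ and $\sigma_\lambda^2 = 0$.

```python
def individual_variance(n, beta, var_lam, var_eps):
    """Equation (3.4): variance of a single action."""
    scale = (beta / (1 - beta)) ** 2
    extra = scale * (3 * (n - 1) - 2 * beta * (n - 2) - beta ** 2) / (n - 1 + beta) ** 2
    return var_lam / (1 - beta) ** 2 + var_eps * (1 + extra)

def aggregate_variance(m, n, beta, var_lam, var_eps):
    """Equation (3.5): variance of the sum of m observed actions divided by sqrt(m)."""
    d = (1 - beta) ** 2
    tail = (n - m) * var_eps * (beta ** 2 * (n - 2) - 2 * beta * (n - 1)) / (d * (n - 1 + beta) ** 2)
    return m * var_lam / d + var_eps / d + tail

def beta_from_ratio(ratio):
    """Invert ratio = 1 / (1 - beta)^2 (large n, m = n, var_lam = 0)."""
    return 1 - ratio ** -0.5

beta = 0.4  # illustrative value, not from the chapter
for n in (50, 500, 5000):
    ind = individual_variance(n, beta, 0.0, 1.0)
    full = aggregate_variance(n, n, beta, 0.0, 1.0) / ind
    few = aggregate_variance(5, n, beta, 0.0, 1.0) / ind
    print(f"n={n:5d}  ratio(m=n)={full:.4f}  implied beta={beta_from_ratio(full):.4f}  ratio(m=5)={few:.4f}")
```

With all members observed the ratio approaches $1/(1-\beta)^2$, whereas with a fixed handful of observations per group it approaches one, so the excess variance is only visible when a large share of each group is sampled.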

3.2. Regressing Individual Outcomes on Group Averages

The most common methodology for estimating the size of social interactions is to regress an individual outcome on the group average. Crane (1991), discussed previously, is an early example of this approach. Case and Katz (1991) is another early paper implementing this methodology (and pioneering the instrumental variables approach discussed herein). Since these papers, there has been a torrent of later work using this approach, and it is the standard method of trying to measure social interactions.

²⁴ In principle, we could use variations in $n$ across groups, and the fact that when $m = n$ the variance of the aggregates is an affine function of $m$, to try to separately estimate $\sigma_\lambda$ and $\sigma_\varepsilon$.

We will illustrate the approach considering a univariate regression in which an individual outcome is regressed on the average outcome in that individual's peer group (not including himself). In almost all cases, researchers control for other characteristics of the subjects, but these controls would add little but complication to the formulas. The univariate ordinary least squares coefficient for a regression of an individual action on the average action of his peers is

$$\frac{\mathrm{Cov}\left( a_i, \sum_{j \neq i} a_j/(m-1) \right)}{\mathrm{Var}\left( \sum_{j \neq i} a_j/(m-1) \right)}. \tag{3.6}$$

The denominator is a transformation of (3.5), where $m - 1$ replaces $\sqrt{m}$:

$$\mathrm{Var}\left( \frac{\sum_{j \neq i} a_j}{m-1} \right) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + m\sigma_\varepsilon^2\, \frac{[(n-1+\beta) - \beta(n-m)]^2 + \beta^2 m(n-m)}{(m-1)^2 (1-\beta)^2 (n-1+\beta)^2}. \tag{3.7}$$

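The numerator is

$$\mathrm{Cov}\left( a_i, \frac{\sum_{j \neq i} a_j}{m-1} \right) = \frac{\sigma_\lambda^2}{(1-\beta)^2} + \frac{\beta\sigma_\varepsilon^2\,(2n-2-\beta n+2\beta)}{(1-\beta)^2 (n-1+\beta)^2}. \tag{3.8}$$

When $\sigma_\lambda = 0$, the coefficient reduces to

$$\text{coeff} = \frac{(m-1)^2}{m}\, \frac{2\beta(n-1) - \beta^2(n-2)}{(n-1+\beta)^2 - (n-m)[2\beta(n-1) - \beta^2(n-2)]}. \tag{3.9}$$

When $m = n$,

$$\text{coeff} = 2\beta\, \frac{(n-1)^2}{n(n-1+\beta)} - \beta^2\, \frac{(n-1)^2}{(n-1+\beta)^2}. \tag{3.10}$$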
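Expressions (3.9) and (3.10) are straightforward to tabulate. The sketch below evaluates them as printed, confirms that (3.9) with $m = n$ coincides with (3.10), and shows the behavior of the coefficient as $n$ grows. The function names and the value of $\beta$ are illustrative assumptions.

```python
def ols_coefficient(m, n, beta):
    """Equation (3.9): OLS coefficient when sigma_lambda = 0 and m of n members are observed."""
    core = 2 * beta * (n - 1) - beta ** 2 * (n - 2)
    return (m - 1) ** 2 / m * core / ((n - 1 + beta) ** 2 - (n - m) * core)

def ols_coefficient_full_group(n, beta):
    """Equation (3.10): the m = n special case."""
    first = 2 * beta * (n - 1) ** 2 / (n * (n - 1 + beta))
    second = beta ** 2 * (n - 1) ** 2 / (n - 1 + beta) ** 2
    return first - second

beta = 0.3  # illustrative value, not from the chapter
for n in (10, 100, 10000):
    print(n, round(ols_coefficient(n, n, beta), 4), round(ols_coefficient_full_group(n, beta), 4))
print("2*beta - beta**2 =", round(2 * beta - beta ** 2, 4))
```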

Hence as $n \to \infty$, the coefficient converges to $2\beta - \beta^2$. Importantly, because of the reflection across individuals, the regression of an individual outcome on a group average cannot be thought of as a consistent estimate of $\beta$. However, under some conditions ($m = n$, $n$ large, $\sigma_\lambda^2 = 0$), the ordinary least squares coefficient does have an interpretation as a simple function of $\beta$.

Again, the primary complication with this methodology is the presence of correlated error terms across individuals. Some of this problem is corrected by controlling for observable individual characteristics. Indeed, the strength of this approach relative to the variance approach is that it is possible to control for observable individual attributes. However, in most cases, the unobservable characteristics are likely to be at least as important as the observable ones and are likely to have strong correlations across individuals within a given locale. Again, this correlation may also be the result of place-specific factors that affect all members of the community.

One approach to this problem is the use of randomized experiments that allocate persons into different neighborhoods. The Gautreaux experiment was an early example of a program that used government money to move people across neighborhoods. Unfortunately, the rules used to allocate people across neighborhoods are sufficiently opaque that it is hard to believe that this program really randomized neighborhoods. The Moving to Opportunity experiment contains more explicit randomization. In that experiment, funded by the Department of Housing and Urban Development, individuals from high-poverty areas were selected into three groups: a control group and two different treatment groups. Both treatment groups were given money for housing, which they used to move into low-poverty areas. By comparing the treatment and control groups, Katz, Kling, and Liebman (2001) are able to estimate the effects of neighborhood poverty without fear that the sorting of people into neighborhoods is contaminating their results. Unfortunately, they cannot tell whether their effects are the results of peers or other neighborhood attributes. As such, this work is currently the apex of work on neighborhood effects, but it cannot really tell us about the contribution of peers vs. other place-based factors.

Sacerdote (2000) also uses a randomized experiment. He is able to compare people who are living in the same building, but who have different randomly assigned roommates. This work is therefore a somewhat cleaner test of peer effects.

Before randomized experiments became available, the most accepted approach for dealing with cases where $\sigma_\lambda^2 \neq 0$ was to use peer group background characteristics as instruments for peer group outcomes.
Case and Katz (1991) pioneered this approach, and under some circumstances it yields valid estimates of $\beta$. To illustrate this approach, we assume that there is a parameter ($x$) that can be observed for all people and that is part of the individual error term (i.e., $\varepsilon_i = \gamma x_i + \mu_i$). Thus, the error term can be decomposed into a term that is idiosyncratic and unobservable, and a term that is directly observable. Under the assumptions that both components of $\varepsilon_i$ are orthogonal to $\lambda_\ell$ and to each other, using the formula for an instrumental variables estimator we find that

$$\frac{\mathrm{Cov}\left( a_i, \sum_{j \neq i} x_j/(m-1) \right)}{\mathrm{Cov}\left( \sum_{j \neq i} a_j/(m-1), \sum_{j \neq i} x_j/(m-1) \right)} = \frac{\beta}{\beta + (1-\beta)\frac{n-1}{m-1}}. \tag{3.11}$$

When $m = n$, this reduces to $\beta$. Thus, in principle, the instrumental variables estimator can yield consistent estimates of the social interaction term of interest.

However, as Manski (1993) stresses, the assumptions needed for this methodology may be untenable. First, the sorting of individuals across communities may mean that $\mathrm{Cov}(x_i, \mu_j) \neq 0$ for two individuals $i$ and $j$ living in the same community. For example, individuals who live in high-education communities may have omitted characteristics that are unusual. Equation (3.11) is no longer valid in that case, and, in general, the instrumental variables estimator will overstate social interactions when there is sorting of this kind. Second, sorting may also mean that $\mathrm{Cov}(x_i, \lambda_\ell) \neq 0$. Communities with people who have high schooling levels, for example, may also have better public high schools or other important community-level characteristics. Third, the background characteristic of individual $j$ may directly influence the outcome of person $i$, as well as influencing this outcome through the outcome of individual $j$. Many researchers consider this problem to be less important, because it occurs only when there is some level of social interaction (i.e., the background characteristic of person $j$ influencing person $i$). Although this point is to some extent correct, it is also true that even a small amount of direct influence of $x_j$ on $a_i$ can lead to wildly inflated estimates of $\beta$, when the basic correlation of $x_j$ and $a_j$ is low. (Indeed, when this correlation is low, sorting can also lead to extremely high estimates of social interaction.) Because of this problem, instrumental variables estimates can often be less accurate than ordinary least squares estimates and need to be considered quite carefully, especially when the instruments are weak.
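Setting the sorting problems aside, the right-hand side of (3.11) also shows what happens when only part of each group is observed. The short sketch below evaluates it for several values of $m$. It takes the orthogonality assumptions behind (3.11) as given, and the function name and parameter values are illustrative assumptions.

```python
def iv_ratio(m, n, beta):
    """Right-hand side of (3.11): the IV ratio when m of n group members are observed."""
    return beta / (beta + (1 - beta) * (n - 1) / (m - 1))

beta, n = 0.4, 50  # illustrative values, not from the chapter
for m in (5, 25, 50):
    print(f"m={m:2d} of n={n}: IV ratio = {iv_ratio(m, n, beta):.4f}  (beta = {beta})")
```

Under these assumptions the ratio equals $\beta$ only at $m = n$ and falls below $\beta$ when fewer members are sampled, so partial sampling works in the opposite direction from the upward biases discussed above.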

3.3. Social Multipliers

A final approach to measuring social interactions is discussed in Glaeser and Scheinkman (2001) and Glaeser, Laibson, and Sacerdote (2000), but to our knowledge it has never really been used. This approach is derived from a lengthier literature on social multipliers in which these multipliers are discussed in theory, but not in practice (see Schelling, 1978). The basic idea is that, when social interactions exist, the impact of an exogenous increase in a variable can be quite high if this increase impacts everyone simultaneously. The effect of the increase includes not only the direct effect on individual outcomes, but also the indirect effect that works through peer influence. Thus, the impact on aggregate outcomes of an increase in an aggregate variable may be much higher than the impact on an individual outcome of an increase in an individual variable.

This idea has been used to explain how the pill may have had an extremely large effect on the amount of female education (see Goldin and Katz, 2000). Goldin and Katz argue that there is a positive complementarity across women who delay marriage that occurs because when one woman decides to delay marriage, her prospective spouse remains in the marriage market longer and is also available to marry other women. Thus, one woman's delaying marriage may increase the incentives for other women to delay marriage, and this can create a social multiplier. Berman (2000) discusses social multipliers and how they might explain why government programs appear to have massive effects on labor practices among Orthodox Jews in Israel. In principle, social multipliers might explain phenomena such as the fact that there is a much stronger connection between out-of-wedlock births and crime at the aggregate level than at the individual level (see Glaeser and Sacerdote, 1999).


In this section, we detail how social multipliers can be used in practice to estimate the size of social interactions. Again, we assume that the individual disturbance term can be decomposed into $\varepsilon_i = \gamma x_i + \mu_i$, and that $m = n$. When we estimate the microregression of individual outcomes on characteristic $x$, when $x$ is orthogonal to all other error terms, the estimated coefficient is

$$\text{Individual coeff} = \gamma\, \frac{(1-\beta)n + (2\beta-1)}{(1-\beta)n - (1-\beta)^2}. \tag{3.12}$$

This expression approaches $\gamma$ as $n$ becomes large, and for even quite modest levels of $n$ ($n = 20$), this expression will be quite close to $\gamma$. Our assumption that the $x_i$ terms are orthogonal to the $\mu_i$ terms is probably violated in many cases. The best justification for this assumption is expediency: interpretation of estimated coefficients becomes quite difficult when the assumption is violated. One approach, if the assumption is clearly untenable, is to use place-specific fixed effects in the estimation. This will eliminate some of the correlation between individual characteristics and unobserved heterogeneity.

An ordinary least squares regression of aggregate outcomes on aggregate $x$ variables leads to quite a different expression. Again, assuming that the $x_i$ terms are orthogonal to both the $\lambda_\ell$ and $\mu_i$ terms, the coefficient from the aggregate regression is $\gamma/(1-\beta)$. The ratio of the individual to the aggregate coefficient is therefore

$$\text{Ratio} = \frac{(1-\beta)n + 2\beta - 1}{n-1+\beta}. \tag{3.13}$$

As $n$ grows large, this term converges to $1 - \beta$, which provides us with yet another means of estimating the degree of social interactions. Again, this estimate hinges critically on the orthogonality of the error terms, which generally means an absence of sorting. It also requires (as did the instrumental variables estimators) the assumption that the background characteristics of peers have no direct effect on outcomes.
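A small numerical sketch may help fix ideas. It computes the individual coefficient (3.12), the aggregate coefficient $\gamma/(1-\beta)$, and their ratio (3.13), and then backs out $\beta$ from the ratio. The finite-$n$ inversion in `beta_from_coefficients` is our own rearrangement of (3.13), not a formula from the chapter, and the function names and parameter values are assumptions.

```python
def individual_coefficient(n, beta, gamma):
    """Equation (3.12): coefficient from the individual-level regression on x."""
    return gamma * ((1 - beta) * n + 2 * beta - 1) / ((1 - beta) * n - (1 - beta) ** 2)

def aggregate_coefficient(beta, gamma):
    """Coefficient from the aggregate regression, gamma / (1 - beta)."""
    return gamma / (1 - beta)

def beta_from_coefficients(individual, aggregate, n):
    """Solve (3.13) for beta at finite n (our rearrangement, an assumption of this sketch)."""
    ratio = individual / aggregate
    return (n - 1) * (1 - ratio) / (n - 2 + ratio)

n, beta, gamma = 20, 0.5, 2.0  # illustrative values, not from the chapter
ind = individual_coefficient(n, beta, gamma)
agg = aggregate_coefficient(beta, gamma)
print(f"individual = {ind:.4f}  aggregate = {agg:.4f}  ratio = {ind / agg:.4f}")
print(f"large-n estimate 1 - ratio = {1 - ind / agg:.4f}")
print(f"finite-n inversion         = {beta_from_coefficients(ind, agg, n):.4f}")
```

Even at $n = 20$ the large-$n$ shortcut $1 - \text{Ratio}$ lands close to the value of $\beta$ used to generate the coefficients, in line with the remark that moderate group sizes already give results near the asymptotic ones.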

3.4. Reconciling the Three Approaches

Although we have put forward the three approaches as distinct ways to measure social interactions, in fact they are identical in some cases. In general, the microregression approach of regressing individual outcomes on peer outcomes (either instrumented or not) requires the most data. The primary advantage of this approach is that it creates the best opportunity to control for background characteristics. The variance approach is the least data intensive, because it generally requires only an aggregate and an individual variance. In the case of a binary variable, it requires only an aggregate variance. Of course, as Glaeser, Sacerdote, and Scheinkman (1996) illustrate, this crude measure can be improved on with more information. The social multiplier approach lies in the middle. This approach is closest to the instrumental variable approach using microdata.


ACKNOWLEDGMENTS

We thank Roland Benabou, Alberto Bisin, Avinash Dixit, Steve Durlauf, James Heckman, Ulrich Horst, and Eric Rasmusen for comments; Marcelo Pinheiro for research assistance; and the National Science Foundation for research support. We greatly benefited from detailed comments by Lars Hansen on an earlier version.

References

Aoki, M. (1995), “Economic Fluctuations with Interactive Agents: Dynamic and Stochastic Externalities,” Japanese Economic Review, 46, 148–165.
Arthur, W. B. (1989), “Increasing Returns, Competing Technologies and Lock-in by Historical Small Events: The Dynamics of Allocation under Increasing Returns to Scale,” Economic Journal, 99, 116–131.
Bak, P., K. Chen, J. Scheinkman, and M. Woodford (1993), “Aggregate Fluctuations from Independent Sectoral Shocks: Self-Organized Criticality in a Model of Production and Inventory Dynamics,” Ricerche Economiche, 47, 3–30.
Bala, V. and S. Goyal (2000), “A Non-Cooperative Model of Network Formation,” Econometrica, 68, 1181–1229.
Banerjee, A. (1992), “A Simple Model of Herd Behavior,” Quarterly Journal of Economics, 107, 797–818.
Becker, G. (1991), “A Note on Restaurant Pricing and Other Examples of Social Influences on Price,” Journal of Political Economy, 99(5), 1109–1116.
Becker, G. and K. M. Murphy (2000), “Social Economics: Market Behavior in a Social Environment,” Cambridge, MA: Belknap-Harvard University Press.
Benabou, R. (1993), “Workings of a City: Location, Education, and Production,” Quarterly Journal of Economics, 108, 619–652.
Benabou, R. (1996), “Heterogeneity, Stratification, and Growth: Macroeconomic Effects of Community Structure,” American Economic Review, 86, 584–609.
Berman, E. (2000), “Sect, Subsidy, and Sacrifice: An Economist’s View of Ultra-Orthodox Jews,” Quarterly Journal of Economics, 15, 905–954.
Bewley, T. (1981), “A Critique of Tiebout’s Theory of Local Public Expenditures,” Econometrica, 49(3), 713–740.
Bikhchandani, S., D. Hirshleifer, and I. Welch (1992), “A Theory of Fads, Fashion, Custom, and Cultural Exchange as Information Cascades,” Journal of Political Economy, 100, 992–1026.
Blume, L. (1993), “The Statistical Mechanics of Strategic Interaction,” Games and Economic Behavior, 5, 387–424.
Blume, L. and S. Durlauf (1998), “Equilibrium Concepts for Social Interaction Models,” Working Paper, Cornell University.
Boorman, S. (1975), “A Combinatorial Optimization Model for Transmission of Job Information through Contact Networks,” Bell Journal of Economics, 6(1), 216–249.
Brock, W. (1993), “Pathways to Randomness in the Economy: Emergent Nonlinearity and Chaos in Economics and Finance,” Estudios Economicos, 8(1), 3–55.
Brock, W. and S. Durlauf (1995), “Discrete Choice with Social Interactions,” Working Paper, University of Wisconsin at Madison.


Brock, W. and S. Durlauf (2001), “Interactions Based Models,” in Handbook of Econometrics (ed. by J. Heckman and E. Leamer), Amsterdam: North-Holland.
Bulow, J., J. Geanakoplos, and P. Klemperer (1985), “Multimarket Oligopoly: Strategic Substitutes and Complements,” Journal of Political Economy, 93, 488–511.
Case, A. and L. Katz (1991), “The Company You Keep: The Effects of Family and Neighborhood on Disadvantaged Families,” NBER, Working Paper 3705.
Cooper, R. and A. John (1988), “Coordinating Coordination Failures in Keynesian Models,” Quarterly Journal of Economics, 103, 441–464.
Crane, J. (1991), “The Epidemic Theory of Ghettos and Neighborhood Effects on Dropping Out and Teenage Childbearing,” American Journal of Sociology, 96, 1226–1259.
Diamond, P. (1982), “Aggregate Demand Management in Search Equilibrium,” Journal of Political Economy, 90, 881–894.
Durlauf, S. (1993), “Nonergodic Economic Growth,” Review of Economic Studies, 60, 349–366.
Durlauf, S. (1996a), “A Theory of Persistent Income Inequality,” Journal of Economic Growth, 1, 75–93.
Durlauf, S. (1996b), “Neighborhood Feedbacks, Endogenous Stratification, and Income Inequality,” in Dynamic Disequilibrium Modeling – Proceedings of the Ninth International Symposium on Economic Theory and Econometrics, (ed. by W. Barnett, G. Gandolfo, and C. Hillinger), Cambridge: Cambridge University Press.
Ellison, G. (1993), “Learning, Local Interaction, and Coordination,” Econometrica, 61, 1047–1072.
Ellison, G. and D. Fudenberg (1993), “Rules of Thumb for Social Learning,” Journal of Political Economy, 101, 612–644.
Follmer, H. (1974), “Random Economies with Many Interacting Agents,” Journal of Mathematical Economics, 1, 51–62.
Froot, K., D. Scharfstein, and J. Stein (1992), “Herd on the Street: Informational Inefficiencies in a Market with Short-Term Speculation,” Journal of Finance, 47, 1461–1484.
Gabszewicz, J. and J.-F. Thisse (1996), “Spatial Competition and the Location of Firms,” in Location Theory, (ed. by R. Arnott) Fundamentals of Pure and Applied Economics, Vol. 5, (ed. by J. Lesourne and H. Sonnenschein), Chur, Switzerland: Harwood Academic, 1–71.
Gale, D. and H. Nikaido (1965), “The Jacobian Matrix and the Global Univalence of Mappings,” Mathematische Annalen, 159, 81–93.
Glaeser, E., D. Laibson, and B. Sacerdote (2000), “The Economic Approach to Social Capital,” Working Paper 7728, NBER.
Glaeser, E. and B. Sacerdote (1999), “Why Is There More Crime in Cities?” Journal of Political Economy