[This is a slightly expanded version of my Presidential Address for the 1997 Conference of the Association for Consumer Research in Denver. Thanks to Jim Bettman, Kim Corfman, Wes Hutchinson, Debbie John, and Rick Staelin for timely and insightful comments on an earlier draft and to Kimberly Dillon for analysis help.]
I want to talk with you today about reviewing and the review process. I chose this topic because I had thought a lot about the review process, first as an associate editor for Journal of Consumer Psychology and more recently as one of Brian Sternthal's team of associate editors for Journal of Consumer Research. Of course, now our team is history and Bob Burnkrant's is in charge. So I knew that the prospect of hearing my opinions had limited currency and appeal-about as much as hearing Earl Butz tell the inside story of the Gerald Ford presidency.
Still, I was undeterred because I was pretty sure that I would have some original things to say. But when I dug in to do the research to write this address, I discovered that I was wrong. There has been an astounding amount of research on the peer review process, of which I was almost entirely unaware-643 academic papers, according to an excellent review by Armstrong (1997), including 101 empirical studies. I'm counting on the possibility that you in the audience share my ignorance. I will try to say something that, if not original, is at least synthetic. Let me get serious here, beginning with some intemperate personal observations and then attempting to draw some more temperate conclusions based on the literature on peer review.
My single biggest motivation in wanting to talk about reviewing is my perception that so much of it is badly done. I know that it is heresy to say so. Editors always praise the insightful contributions of their editorial boards and their ad hoc reviewers. Authors always talk graciously about how much reviews, even critical ones, helped their papers. And I've been the beneficiary of some really insightful reviews over the years. But from working in this field for 18 years and from seeing reviews of a couple hundred JCR and JCP manuscripts, my observation is that the average reviewer is mediocre at best (present company excepted, of course).
Reviewers have two roles as "critics" and "coaches" (Beyer, Chanove, and Fox 1995; Cummings, Frost, and Vakil 1985) that roughly correspond to what we write on the covering ratings sheets that accompany manuscripts and to what we say in our comments to authors. Our ratings on the cover sheets reflect our critic role as gatekeepers, deciding which papers do and do not deserve publication and attention by the larger research community. If we do our job well by selecting only the best papers for publication in the top journals, we lower the search cost for readers overwhelmed with a knowledge stock that grows far faster than our capacity to keep up (Roediger 1987). In our second, "coaching" role, we try to be good colleagues, adding value to manuscripts via our comments to authors (Laband 1990). I'll first consider our limitations as gatekeepers, then turn to our coaching role.
GATEKEEPING ERRORS: WHAT CAN WE LEARN FROM RATINGS ON REVIEW FORMS?
Lessons in Humility
When I evaluate a paper, I'm usually extremely confident of my overall evaluation. I've noticed the same thing about a lot of you. But numerous accounts exist of classic papers rejected when first submitted (Garcia 1981; Campanario 1995). For example, George Akerlof's (1970) "The Market for 'Lemons'" paper was rejected by the American Economic Review, the Journal of Political Economy, and the Review of Economic Studies. Two said it was trivial, the other that it was too general to be true (Gans and Shepherd 1994).
I'd like to say that I'd never make those kinds of mistakes, but I'd be lying. As a member of a University of Florida faculty hiring group at AMA, I interviewed a rookie, now famous, and attempted to persuade my colleagues that the guy's dissertation was no great shakes. His dissertation paper later won two major awards. I read that paper now and wonder what in the world I was thinking at the time.
Would your published papers be accepted if resubmitted and reviewed again? In a widely cited and widely criticized study, Peters and Ceci (1982) resubmitted 12 papers recently published in top APA journals to the same journals, after changing the author names and affiliations and making minor changes to the abstracts. Most of the editors handling the resubmissions were the same ones who had accepted the originals. Only 3 of the 12 were detected as resubmissions. This by itself is an embarrassing indictment of the reading habits of the reviewers who, of course, were described in the editors' letters as the top experts in the topic areas of the papers. Of the 9 papers that escaped detection as duplicates, 8 were rejected.
Low reliability. We'd like to believe that reviewers' favorable judgments of us and our work are due to our unquestionable merit, not luck. But we may be fooling ourselves. Studies show that interjudge reliability is very low across all disciplines. As reviewers, therefore, we are rash when we assume that our judgments of quality are shared by others-even those we most respect. It's the reviewer's "false consensus effect" (Ross, Greene, and House 1977).
Experts' retrospective judgments of impact and article quality don't agree, nor are individual experts' assessments highly related to citation counts (Cole 1992; Gottfredson 1978). Roediger (1987) notes that even when it is obvious that a highly cited article had "impact", we can't always agree whether the highly cited work moved the field forward or sent us up blind alleys. He points to Craik and Lockhart's research on levels of processing and Festinger's on cognitive dissonance as examples. Roediger argues that, if we can't agree in retrospect, it should not surprise us that we cannot agree in prospect.
Still, I was amazed to find dozens of studies documenting the lack of interjudge reliability of reviewers' overall recommendations for manuscript disposition. The most definitive evidence comes from two cross-disciplinary investigations of the reliability of peer review for manuscript and grant submissions. Across studies of 10 top psychology journals and 5 top medical journals, interjudge reliability ranged between .19 and .44, as measured by intraclass correlations (Cicchetti 1991). If we think that there is a "true score" for the quality of each paper, somewhere around two thirds of the variance in individual reviewer evaluations is "error." Cole (1992) shows most convincingly that this lack of consensus about new work is very general, and is equally true in the hard sciences as in "softer" social sciences.
An ACR example. When Kim Corfman and I co-chaired the 1995 ACR conference, our program committee included 39 people who each reviewed about 16 proposals most related to their expertise. I knew these folks to be insightful reviewers from reading their reviews as an associate editor at JCR or JCP. I correlated the ratings of each program committee member with the mean ratings of the proposals he or she reviewed. The correlations ranged from r = +.97 to -.15, with a mean of .54. Squaring that mean gives about .29, right in line with prior studies concluding that only around a third of the variance in individual proposal ratings reflects variance due to "true scores."
Eleven of our 39 committee members served before or after as editors or associate editors of JCR, JCP, or the ACR Conference. To my surprise, the "item-to-total" correlation of individual reviewer judgments with program committee averages was significantly lower for editors than for noneditors. My interpretation is that we choose people to be on program committees or to perform important editorial assignments based on their demonstrated insight in past reviews-in the "comments to authors" part. But we should question our assumption that these scholars are better than the rest of us at predicting on their cover sheets what others will find good.
Causes of unreliability. Roediger (1987), past editor of Journal of Experimental Psychology, argues that reviewer agreement is irrelevant. Editors deliberately choose reviewers who can evaluate different components of the manuscript-e.g., a methodologist, a theorist in the area, and someone who can assess readability for the average journal subscriber. This makes reviewers' evaluations what LISREL modelers call "formative" rather than "reflective" indicators. The observable ratings "cause" the score on the underlying latent construct of judged manuscript quality, rather than the reverse. It's like how different beliefs and evaluations of consequences combine to determine attitude in a multiattribute model. In this case, reliability is not a meaningful measure (Bollen and Lennox 1991) because overall quality is a composite of the quality scores on subdimensions, like a multiple regression. In choosing predictors for a multiple regression, one would prefer uncorrelated predictors to redundant ones.
I'm not sure that editors always try to choose maximally different reviewers, but I accept the argument that the editor can achieve a more complete picture than individual reviewers possess by piecing together the views of those with complementary perspectives. Moreover, editors see the full gamut of manuscripts and so have more stable comparison referents than do reviewers trying to translate their private impressions into overt ratings on a scale (1 = Accept unconditionally to 6 = Reject unconditionally) (Laming 1991). Consequently, we can hope that the "validity" of editors' decisions is not bounded by the "reliability" of reviewers' ratings.
Conclusions about Reviewer Ratings of Manuscript Quality
In my ACR Program Committee anecdote, I suggested that editors are no better than (highly selected) noneditors at anticipating what others will find useful. That conclusion holds when both groups base their evaluations on the same information. But editors acting in their role as editors have access to a more complete set of inputs on which to base their judgments. Therefore, the main conclusion that I would like you in the audience to draw from the discussion so far is that, as reviewers, we ought to be less egocentric than we are about our recommendations for manuscript disposition. We should not be insulted, complain bitterly, or resign from editorial service for a journal when the editor makes a decision that does not agree with our personal recommendation.
That's important, because I believe that all editors feel political pressure not to overrule the reviewers on a consistent basis. Empirical evidence across a variety of disciplines shows that editors' decisions are highly predictable from the average ratings of the reviewers (Bakanic, McPhail, and Simon 1987; Beyer, Chanove, and Fox 1995; Blank 1991). Cicchetti (1991) shows cross-disciplinary evidence that, with one review, editors "go with the flow"; with two, they "go with the low"; and with three, they "go with the mode." This sounds like what one might predict from an "editor as politician" model (a la Tetlock), where editors are concerned about accountability to the audience of reviewers. Decisions would be better if editors would rely almost entirely on the merits of the arguments in the open-ended comments to authors (Bailar 1991). I will now turn to a consideration of the quality of those comments.
COMMENTS FOR AUTHORS: DO REVIEWERS ADD VALUE?
It's easier to demonstrate the benefits of our comments to authors than of the evaluative ratings that accompany them. Most authors report that they believe that the reviews they receive improve their ultimately accepted papers (Bradley 1981). Readers apparently agree that the revision process yields better papers. Fletcher and Fletcher (1997) describe a study in which they and their collaborators took original manuscripts and their revised counterparts and sent them to new reviewers, blind to condition. The revised manuscripts were rated as superior on 33 of 34 elements of quality. Laband (1990) has shown that the subsequent citations of papers in top economics journals are a positive function of the sheer volume of reviewers' comments. (So when all you authors out there get 18 pages of single-spaced comments about how you can improve your paper, are you lucky!)
We can debate the direction of causality of those effects, but perhaps those positive effects ensue because some reviewers do a good job. An editor of two biomedical journals, David Horrobin (1990), says it well.
"In my own experience, about one third of referees' reports are accurate, comment on important issues, and are fair in their recommendations; about one third are accurate but obsessed with the trivial and recommend revision or rejection on inadequate grounds; and about one-third are inaccurate and can be demonstrated to be so on objective grounds. What constantly astonishes me is the intemperate language in which many reports in the last two categories are couched." (p. 217)
Good people can do some very bad things, due to two reviewer misperceptions. First, we all think the papers we get to review are a lot worse than our own papers and somehow worse than the average submission. This isn't true, on average. Second, via reviewer false consensus, we assume that all but the dense-read, "everyone but the authors"-will see the same obvious flaws. Wrong again. Reviewers rarely comment on the same things (Fiske and Fogg 1990). In hopes of highlighting how we can go astray in our Comments to Authors, may I humbly present to you my list of the Seven Deadly Sins of Reviewing.
SEVEN DEADLY SINS OF REVIEWING
Sin 1. Failure to Read the Journal's Mission Statement
Many reviewers have the wrong goal. They review journal manuscripts as if they perceive that their task is to "find the flaw"; they aim to demolish whatever paper they are sent, like a demolition expert sent to destroy an unsound building. Little attention is given to ways that the authors could make their best points without the faulty basis. Fiske and Fogg's (1990) content analysis of APA journal reviews coded only criticisms, because positive statements were so sparse.
But remember that the journal's mission is to publish the best papers, not to reject them all. After all, nobody pays for a subscription to JCR to read about the editorial board's brilliant rejections. Since almost no papers are accepted on the first round, it follows that reviewer input is necessary to bring out what is insightful and good in at least some papers.
Sin 2. Bad Doctor, Ambitious Undertaker: "The Fundamental Reviewer Error"
Of course criticism improves papers-even harsh criticism. But the key is that reviewers need to be clear on which flaws are ones that they think are fatal and which are correctable. The sheer volume of critical comments to authors does not differentiate ultimately accepted from ultimately rejected papers (Bakanic, McPhail, and Simon 1989), and the seriousness of a given comment isn't obvious to readers. When Fiske and Fogg (1990) content analyzed reviewer comments on manuscripts submitted to top APA journals, they gave up on coding the severity of criticisms. They couldn't get acceptable intercoder reliability.
Maybe the rest of us are confused about what reviewers think is fatal because reviewers themselves are confused. This inability to tell a fatal error from a correctable flaw is so common that, with apologies to Lee Ross, I label it the "fundamental reviewer error." The fundamental reviewer error comes in two flavors, one pointing to confounds and attendant threats to construct validity and the other pointing to alleged fatal problems of external validity. In both cases, the error comes when the critique cannot explain the pattern of the data found. Most reviewers don't seem to think this through. Perhaps that is why they are so anxious to bury the patient while she is still alive.
Let's take an example of a garden variety confound. Suppose that you run a study intended to show that more expert sources induce more persuasion in their audiences. Your 2 x 2 design varies Target Issue and Identity of Source. The essay either advocates the position that certain diets cause health problems or that gun control would be an aid to crime prevention. The source is said to be either Dr. Marvin Smith, Deputy Surgeon General or Sergeant Marvin Smith of the Durham Police Department. A manipulation check shows that Dr. Smith is perceived to be more expert for the diet and health essay, but that Sergeant Smith is perceived to be more expert in matters of crime prevention. Results on the main dependent variables show more agreement with the message advocacy of the health essay for Dr. than for Sgt. Smith, while the reverse is true for the essay on gun control and crime prevention.
A reviewer complains that the study is confounded. Dr. and Sgt. Smith differ in status; people may agree more with advocacies from high-status sources. True enough, but that would predict a main effect of Identity of Source (Doctor vs. Sergeant), and the obtained result was an interaction of Target Issue x Identity of Source. Not only is this "problem" not fatal, it may not merit mention in the Limitations section. Confounds that don't explain the data should not change our confidence in research conclusions (Brinberg, Lynch, and Sawyer 1992; Sternthal, Tybout and Calder 1987).
A similar issue arises with critiques on grounds of external validity (Lynch 1982). I would urge you as reviewers to follow your criticisms one step further than is typical. Work through how fixing the "problem" you have identified would change the data pattern. If it would not, or if the data pattern would only grow stronger, you have not identified a fatal flaw.
Sin 3. 20-200 Hindsight
Of all the reviewer criticisms, two related charges seem most unfair to the authors: "It's obvious" and "That's already well known." If a finding is "already well known", the reviewer should be able to provide an exact reference or withdraw the charge. "Obviousness", though, is a less easily evaluated accusation. Abelson (1995) says that a paper is "interesting" if it shifts our beliefs about issues we believed to be important. If the findings are "obvious" to everyone, they don't shift anyone's beliefs about the plausibility of the hypothesized cause of the findings. I think that's a legitimate cause for rejection.
But reviewers are often myopic about two things. First, the findings may look obvious in retrospect, but not in prospect. Reviewers forget how different their views were before reading the paper. That's hindsight bias (Hasher, Attig, and Alba 1981; Hawkins and Hastie 1990).
Second, even when reviewers legitimately claim that a paper did not shift their own priors, they may give insufficient weight to the fact that others may have very different priors. That's a form of false consensus. Paul Samuelson tells the story of little-known Roy Harrod, who was the first to sketch the marginal revenue curve:
"Harrod went to his grave bitter because Maynard Keynes, absolute monarch at the Economic Journal, turned down his early breakthroughs in the economics of imperfect competition. Thus, Harrod was robbed of credit for the "marginal revenue" nomenclature. All this was on the advice of Frank Ramsey, genius in logic and mathematics. To genius, every new idea is indeed "obvious" and besides, all that was already in 1838 Cournot. Hard cheese for Harrod, or for any of us, if the trace of our new brainchild can be found in 1750 Hume or 1826 von Thunen." (Gans and Shepherd 1994, p. 175)
Even if Harrod's paper did not shift the beliefs of the reviewer Ramsey, it still would clearly have value if it shifted others' beliefs (Brinberg et al. 1992). One common way that this can occur is when readers from different subfields have different priors. For example, the finding that a particular context effect is insensitive to large changes in reward may surprise an economist but not a behavioral decision theorist who thinks that the effect is due to involuntary perceptual processes. As a reviewer, I suspect that I'm often guilty of projecting my priors onto others. Now that I've read all this stuff about disagreement among reviewers, I'm going to be more watchful to avoid Sin 3.
Sin 4. Empty-Handed at the Potluck Supper
Of course, all authors are sometimes reviewers and all reviewers are sometimes authors. But we forget this, as Morris Holbrook (1986) points out so cleverly in his paper on "sadomasochism in the review process." He compares the experiences of a submitting author with those of a reviewer on his paper. The author perceives his submission as a breakthrough, but he receives slow, incompetent, unsympathetic, and inconsistent reviews. The reviewer, on the other hand, feels blocked on all sides from doing his own research. He receives 50-page papers from four journals in the same day. He is stunned to see a bad paper he had voted to reject come back with 18 pages of notes to reviewers not accepting most of his points.
I'm more sympathetic to the "scholar as author" in Holbrook's scenario than to his "scholar as harried reviewer." We are asked to review a lot of papers if we publish a lot in those same journals. This seems fair to me. Our papers wouldn't be published without the efforts of a lot of other reviewers. So it is transitive reciprocity that if you want other people to put in an above-average effort on your papers, you should do the same when it's your turn. To do careless reviews is to free ride, like someone who eats but fails to contribute to the potluck supper. It's an N-person prisoner's dilemma.
I've heard stories from colleagues of reviewers who completed reviews of a manuscript that was missing the Results section, apparently not noticing its absence. How carefully could they have read the manuscript to overlook something like that? That's pretty extreme, but just as bad is the careless reviewer who "discovers" fatal flaws on the second round that were there all the time.
The problem is often simple underinvestment in time, or as Holbrook (1986) reminds us, failure to schedule uninterrupted time. Surveys of reviewers across a variety of disciplines reveal that we spend between two and six hours on a review, on average (Lock and Smith 1990; Yankauer 1990). I have colleagues I respect very much who are so efficient that they can knock out a good review in a half a day, but everybody is different. I find that I am incapable of saying anything insightful if I don't spend at least a day thinking about a paper. I've done reviews in four hours and most have been drivel.
Sin 5. Overestimating Your Indispensability
You might be thinking that I'm being hard on Holbrook's harried reviewer. You might point out that the problem with being conscientious is that this begets ever more work.
"Conscientious referees find their popularity with editors increasing and more and more manuscripts landing on their desks long after their own research has begun to suffer, until they cannot even cope with their refereeing work efficiently." (Colman 1991, p. 141)
Yankauer's (1990) survey found that reviewers spent less time per paper the more papers they reviewed for a journal. We have here a reviewer's Peter Principle in which the excellent reviewer gets on so many editorial boards that she can no longer live up to the standards that led her to be chosen.
I'm not that sympathetic to the reviewer here. Sometimes we overestimate our individual importance or prize too much the prestige of editorial board appointments. That makes us slow to say no. We accept more reviewing work than we can handle well. A lot of talented junior people out there would do the job if we couldn't. In fact, they'd like a crack at those editorial board assignments too, if only we'd step down when we see that we are no longer working up to our own standards.
Sin 6. Hieroglyphics in Your Diary
A review should be comprehensible to someone other than the reviewer. It's not like writing in your diary. It takes time to organize your thoughts to figure out what the issues are with a manuscript and even more time to express those thoughts cogently. The authors, the editor, and the other reviewers need to understand what the major issues are that must be resolved before the paper could be published. They also need to know what you see as the contribution that should be highlighted.
Organization. Most reviews just list a series of line-by-line comments (Fiske and Fogg 1990, p. 592). That's unhelpful, no matter how thorough. A good review begins with an overview at a more global level that alerts the reader to the strengths of the paper and the major weaknesses as elaborated in numbered major comments that follow. There should be a clear statement in the opening paragraph about which numbered comments are the ones that determine the ultimate publishability of the paper. For a revision, it helps greatly if the comments focus on issues identified in the prior round and are numbered accordingly. Specific comments are optional, but should follow major points.
Length. Some editors have attempted to elicit more prioritized reviews by suggesting a limit of 2 single-spaced pages. I have no problem with longer reviews except for really bad papers, where mercy dictates brevity. I usually exceed two pages myself. But there should not be a dozen equally weighted points, and the reviewer should not expect the authors to agree with and implement every point, or to write revision notes on the minor ones.
Sometimes I write long reviews because, like Holbrook's harried reviewer, I'm frustrated to be putting in more time on a review than I had anticipated. I may do a core dump rather than spending the extra hour or two necessary to express myself concisely. That's a sin.
But sometimes a longer review is motivated by an honest effort to avoid misunderstandings that can inflate the number of rounds required to make the right decision. Sometimes, by elaborating one's points on one round, one can greatly increase the chances that the revised paper will be accepted on the next round. That's a blessed act, not a sin. As Jacoby and Hoyer (1989) point out, there is a significant amount of miscommunication and miscomprehension in noncommercial print communication -like what we write to each other in the review process.
Authors share the blame for this miscommunication (Armstrong 1997), and evidence shows that more papers are rejected due to (presumably correctable) problems of exposition than to fatal design flaws (Daft 1985; Fiske and Fogg 1990). But if, as a reviewer, you can't understand what authors are saying in manuscripts that presumably reflect years of work, consider the likelihood that neither the author, the editor, nor the other reviewers will understand the points that you dash off in a couple of hours. I often read what other reviewers are saying, and can't follow their arguments. Even more regrettably, I sometimes have the same experience looking at my own old reviews.
Sin 7. Hijacking the Plane
I've talked mostly about sins committed by unsupportive and lazy reviewers. But sometimes constructive reviewers become too constructive; they want to take over your paper. It's a hard line to draw. When reviewing, I often might suggest some small follow-up study to augment what the authors have reported if I think it will change a minor paper to a major one. And I'll try to cajole authors into making changes, redoing analysis, etc. when I think the paper has a lot of potential. So many of our published articles are rarely cited (Roediger 1987). We should seize the opportunity when we have the chance to write a really outstanding paper.
But it is possible to go too far. We reviewers need to be better at distinguishing changes that are truly critical for a paper to be publishable from those that would merely be nice improvements. Too often we insist on changes of the second kind. Authors capitulate-sometimes against their better judgment-because they feel they must (Bradley 1981). Authors should get to decide the direction of a paper that is clearly over the publication threshold.
As an associate editor for JCR, I was handling a manuscript by two talented junior consumer researchers and an eminent senior co-author from another field. The paper went several rounds, and I was very pleased to have been associated with it on its publication. The junior authors discovered my identity when the page proofs came in and wrote me a very nice note about how helpful my comments had been. But when I saw the senior co-author at a conference, he told me that he would never again submit a paper to JCR. The review and revision process was too long and drawn out relative to other publication options he had at his stage of career. I learned something that day, and I've been more careful since to avoid hijacking the authors' plane.
Many of the problems I describe arise from a lack of accountability. Editors tend to be very deferential to reviewers-after all, they need their motivation and good will for the survival of the journal. Authors have very limited ability to disagree with and debate a reviewer's points, and if an author does have the temerity to protest, the default assumption is that she is being defensive and egocentric. We act as if the same scholar's IQ decreases by 20 points when she shifts from being a reviewer to being an author.
I'm not saying that reviewers should cut authors more slack. I'm saying that we should cut each other less slack when we function as reviewers. If we have 90% rejection rates for new submissions, maybe we need to have a higher rejection rate for reviews. I'd like to see some editor write back to a reviewer, "I'm sorry to have to reject your review, but if you could fix this incoherent argument and add a section that shows how your critique explains the data, I'd be willing to entertain your review as a new submission. You have 30 days to respond."
I have some more serious conclusions about what we can do to increase reviewer accountability.
Associate Editor Structure
The associate editor structure increases reviewer accountability. AE's have the specialized expertise to sort through the merits and demerits of reviewers' arguments. Moreover, they share the editor's interest in actually publishing some papers and are calibrated about how extremely negative the distribution of reviews is. In their own reviews, AE's can counterargue reviewers' specious points so that authors don't have to do it later and face the reviewers' wrath.
Reviewing the Reviewers
As an AE, I also reviewed the reviewers confidentially in my covering letter to Brian. We kept no systematic records of this. I wish we had and had communicated systematic feedback to editorial board members at our annual meetings. We tend to reward things for which we have measures. Right now, we measure turnaround time. Editorial board members see lists with their own turnaround times and those of the other members and feel embarrassed if they are slow. Perhaps because we lack systematic records, reviewers are much less accountable for the intellectual content of their reviews.
Two other groups are in a position to give feedback: the authors and the other reviewers on the same paper. I'd like to see us solicit both systematically. Authors could rate the constructiveness of reviewers when the file is closed on a manuscript. Reviewers could review each other too. Technology that we use now in our global executive MBA programs at Fuqua could allow reviewers of a manuscript to see the comments of other reviewers on a private electronic bulletin board-for, say, a week after the last review has been received. Reviewers could then agree or disagree before the editor has made a decision (and before they have forgotten too much of the manuscript to respond). That should allow convergence of opinion in a smaller number of rounds and should avoid some mistakes on round 1. A lower-tech initiative would ask reviewers of revised manuscripts to say in covering letters what they did and did not find compelling in what the other reviewers said, and whether the authors' reply to the other reviewers was compelling.
We wouldn't put people behind the wheel if they had never been trained to drive a car, but that's essentially what we do with novice reviewers. Crandall's (1991) study found that only 2 of 76 social science journals surveyed had programs to train new reviewers.
Moreover, explicit policies exhort reviewers to treat submitted manuscripts as confidential documents. There are obvious reasons for such policies, but a pernicious side-effect is that novices are discouraged from soliciting feedback from more experienced colleagues before submitting their comments. I'd like to see journals ask submitting authors to check a box giving reviewers permission to share the manuscript with a Ph.D. student or a junior colleague. The reviewer and her junior colleague could prepare comments independently and then compare notes, even though only the original reviewer's comments would be submitted to the editor. This would not only help train the next generation of reviewers; the prospect of comparing notes with one's student might inspire greater effort and more thoroughness than may be typical now.
Editors' Weighting of Conflicting Inputs
In the face of conflicting opinions, editors should not reflexively defer to the reviewers with the biggest research reputations. "Judgments made by great scholars of other scholars' work may be accepted more on the basis of the great's past accomplishments than on the cogency of the current judgment" (Bakanic, McPhail, and Simon 1987, p. 633). My perception from seeing who said what at JCR and JCP was that the correlation between the quality and reputation of scholars' published papers and the quality of their reviews is near zero. There are luminaries who are great reviewers and there are lesser-lights who are bad reviewers. But the other cells are full too. Fletcher and Fletcher's (1997) review suggests the correlation is actually mildly negative. They cite two studies of reviewing for medical journals finding that reviews of more senior, high-status researchers were lower in quality than those of their younger peers.
I applaud Russ Winer's "one year contracts" for members of his Journal of Marketing Research editorial board. Editorial board membership should not be a lifetime appointment. But the best idea I've heard comes from the Journal of Finance (Laband 1990, p. 351), which assigns the best reviewers to the authors of manuscripts who have themselves provided the best reviews in the past. Now the prolific author who is a sloppy reviewer has a reason to change her ways.
I hope that my comments today inspire some of you to change how you approach your two jobs as a reviewer. You can be a critic and treat your role primarily as a gatekeeper, or you can be a coach who works with authors and with the journal to ensure that the best possible papers are published. The evidence shows that coaches have more impact than do critics on the ultimate disposition of a manuscript (Beyer, Chanove, and Fox 1995), so give it a try. Try to avoid reviewers' false consensus. The evidence shows that your most respected colleagues may have a different opinion-and it's nearly certain that the authors do. If your conclusion isn't obvious to everyone, it becomes important to specify your premises. Articulate your concerns as clearly and as respectfully as you can, both about the manuscript you are reviewing and about positions taken by other parties in the review process. The outcome will be better manuscripts in our journals and a more civil review process to boot. Now go forth and sin no more!
References
Abelson, Robert (1995), "Chapter 8: Interestingness of Argument," in Statistics As Principled Argument, Hillsdale, NJ: Lawrence Erlbaum Associates, 156-169.
Akerlof, George A. (1970), "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism", Quarterly Journal of Economics, 84 (3), 488-500.
Armstrong, J. Scott (1997), "Peer Review for Journals: Evidence on Quality Control, Fairness, and Innovation", Science and Engineering Ethics, 3 (1), 63-84.
Bailer, John C. (1991), "Reliability, Fairness, Objectivity, and Other Inappropriate Goals in Peer Review," Behavioral and Brain Sciences, 14 (March), 137-138.
Bakanic, V., C. McPhail, and R. J. Simon (1987), "The Manuscript Review and Decision-Making Process," American Sociological Review, 52, 631-642.
Bakanic, V., C. McPhail, and R. J. Simon (1989), "Mixed Messages: Referees' Comments on the Manuscripts They Review," Sociological Quarterly, 30, 639-654.
Beyer, Janice M., Roland G. Chanove, and William B. Fox (1995), "The Review Process and the Fates of Manuscripts Submitted to AMJ," Academy of Management Journal, 38 (October), 1219-1260.
Blank, Rebecca M. (1991), "The Effects of Double-Blind versus Single-Blind Reviewing: Experimental Evidence from the American Economic Review," American Economic Review, 81 (5), 1041-1067.
Bollen, Kenneth and Richard Lennox (1991), "Conventional Wisdom on Measurement: A Structural Equation Perspective," Psychological Bulletin, 110, 305-314.
Bradley, J. V. (1981), "Pernicious Publication Practices," Bulletin of the Psychonomic Society, 18, 31-34.
Brinberg, David L., John G. Lynch, Jr., and Alan G. Sawyer (1992), "Hypothesized and Confounded Explanations in Theory Tests: A Bayesian Analysis," Journal of Consumer Research, 19 (September), 139-154.
Campanario, J. M. (1995), "On Influential Books and Journal Articles Initially Rejected Because of Negative Referees' Evaluations," Science Communication, 16 (March), 304-325.
Cicchetti, Dominic V. (1991), "The Reliability of Peer Review for Manuscript and Grant Submissions: A Cross-Disciplinary Investigation," Behavioral and Brain Sciences, 14 (March), 119-186.
Cole, Stephen (1992), Making Science: Between Nature and Society, Cambridge, MA: Harvard University Press.
Colman, Andrew M. (1991), "Unreliable Peer Review: Causes and Cures of Human Misery," Behavioral and Brain Sciences, 14 (March), 141-142.
Crandall, Rick (1991), "What Should Be Done to Improve Reviewing?," Behavioral and Brain Sciences, 14 (March), 143.
Cummings, Larry L., P. J. Frost, and T. F. Vakil (1985), "The Manuscript Review Process: A View from the Inside on Coaches, Critics, and Special Cases," in L. L. Cummings and P. J. Frost (Eds.), Publishing in the Organizational Sciences, Homewood, IL: Irwin, 469-508.
Daft, Richard L. (1985), "Why I Recommended That Your Manuscript Be Rejected and What You Can Do About It," in L. L. Cummings and P. J. Frost (Eds.), Publishing in the Organizational Sciences, Homewood, IL: Irwin, 193-209.
Fiske, Donald W. and Louis Fogg (1990), "But the Reviewers are Making Different Criticisms of My Paper!" American Psychologist, 45 (May), 591-598.
Fletcher, R. H. and S. W. Fletcher (1997), "Evidence for the Effectiveness of Peer Review," Science and Engineering Ethics, 3, 35-50.
Garcia, J. (1981), "Tilting at the Paper Mills of Academe," American Psychologist, 36 (2), 149-158.
Gottfredson, S. D. (1978), "Evaluating Psychological Research Reports: Dimensions, Reliability, and Correlates of Quality Judgments," American Psychologist, 33, 920-934.
Hasher, Lynn, Mary S. Attig, and Joseph W. Alba (1981), "I Knew It All Along: Or Did I?" Journal of Verbal Learning and Verbal Behavior, 20 (February), 86-96.
Hawkins, Scott A. and Reid Hastie (1990), "Hindsight: Biased Judgments of Past Events After the Outcomes Are Known," Psychological Bulletin, 107, 311-327.
Holbrook, Morris B. (1986), "A Note on Sadomasochism in the Review Process: I Hate It When That Happens," Journal of Marketing, 50 (July), 104-108.
Horrobin, D. F. (1990), "A Philosophically Faulty Concept Which Is Proving Disastrous for Science," Behavioral and Brain Sciences, 5 (2), 217-218.
Jacoby, Jacob and Wayne D. Hoyer (1989), "The Comprehension/Miscomprehension of Print Communication: Selected Findings," Journal of Consumer Research, 15 (March), 434-443.
Laband, David N. (1990), "Is There Value-Added from the Review Process in Economics?: Preliminary Evidence from Authors," Quarterly Journal of Economics, (Volume?) (May), 341-352.
Laming, Donald (1991), "Why Is the Reliability of Peer Review So Low?," Behavioral and Brain Sciences, 14 (March), 154-156.
Lock, S. and J. Smith (1990), "What Do Peer Reviewers Do?," Journal of the American Medical Association, 263 (10), 1341-1343.
Peters, D. P. and S. J. Ceci (1982), "Peer Review Practices of Psychology Journals: The Fate of Published Articles, Submitted Again," Behavioral and Brain Sciences, 5, 187-195.
Roediger, Henry L. III (1987), "The Role of Journal Editors in the Scientific Process", in D. N. Jackson and J. P. Rushton (Eds.), Scientific Excellence: Origins and Assessment. Thousand Oaks, CA: Sage, pp. 222-252.
Ross, Lee, D. Greene, and P. House (1977), "The 'False Consensus Effect': An Egocentric Bias in Social Perception and Attribution Processes," Journal of Experimental Social Psychology, 13, 279-301.
Sternthal, Brian, Alice M. Tybout, and Bobby J. Calder (1987), "Confirmatory versus Comparative Approaches to Judging Theory Tests," Journal of Consumer Research, 14 (June), 114-125.
Yankauer, A. (1990), "Who Are the Peer Reviewers and How Much Do They Review?," Journal of the American Medical Association, 263, 1338-1340.