As I type this, over on Twitter, Dr. Vino, Steve!, and lord knows where else, folks are rehashing—with considerable vitriol—arguments on the merits of 100-point wine ratings, or lack thereof. This got me thinking about what it takes to assign meaningful numerical value to a wine’s attributes—something I have had some experience with at points in my career where I was responsible for various research projects. In light of the current “discussions” surrounding the validity of wine reviews and point scales, I thought it might be of interest to explore what it takes in the research setting to evaluate wines to the objective standard that some feel wine reviewers should aspire to.
Define What Is “Better”
In any discussion of wine, in order to get beyond endless argument over personal opinion there has to be agreement on what constitutes “better”—exactly what is it that makes wine A superior to wine B. This is a non-trivial question that seems to be completely glossed over in the discussions of the merits of wine reviews. In my opinion, in a general sense there is no answer to this question. But my opinion aside, in order to put numerical values on wines there must be universal agreement on the value to be assigned to specific attributes. Simply put, in a research setting the first and most important question is: “what is the goal of this project?” For example, we might say “Chardonnay that shows more minerality, fruitiness, lack of vegetal notes, and creamy texture is better; our desire is to increase these attributes in our wine, so what can we do to increase these attributes?”
Set Up The Experiments
Perhaps we could explore the effects of canopy, crop load and irrigation management in the vineyard. Or maybe we could study options in fruit handling, processing temperature, juice settling, yeast selection, barrel choice, and lees stirring in the winery. First we have to define what we are willing to change, and then rigorously produce wines that reflect the range of these options as closely as possible to how we would treat them in routine production. Ideally, we would do this over several vintages to eliminate uncontrolled seasonal variables in the results.
Train The Tasting Panel
Aye and here’s the rub. Training the tasting panel—more than one person; my preference is for 5 to 7 experienced tasters—is the single most critical control point in assuring that the evaluation of experimental results has any meaning. In the research setting, reference standards for the attributes being tested for must be established, e.g. from the example above: “this is what we mean by ‘mineral,’ this is ‘fruity,’ this is ‘vegetal’ and this is ‘creamy.’” Reference compounds are dosed into neutral wines, and the panel members are drilled to develop their ability to recognize them. If a reference can’t be reliably identified it has to be dropped from the trial. If a member can’t reliably identify a standard obvious to the rest of the panel, that taster has to be removed from the trial. (I recall hearing that Ann Noble at UC Davis used to reward her tasting panel trainees with cookies when they got good at picking out particular attributes. I never found that motivation all that useful, but then the panels I trained weren’t hungry students.)
Present The Trial
This is the easy part. The setting needs to be well lit without distracting sights, sounds, drafts, or especially aromas. The glasses need to be all the same and well-cleaned, without any residue of the cleaner. Importantly, the wines to be evaluated need to be presented to the panel on more than one occasion (3 to 5 seems to be optimal) and these evaluations should be made at the same time of day in each instance. Of course the samples are presented blind and in random order. Reference standards need to be included in the blind presentation—these are to control for panel members having a bad day; if a taster who is usually good at identifying the attributes fails on the standards, their results should be excluded. When I would evaluate a multivariate trial, the tasting sheet for each wine would have the attributes listed, with a 100mm long straight line next to each and the words “low” and “high” underneath the lines to the left and right, respectively. The tasters were required only to put a mark on the line indicating their perception of the intensity of each attribute. The protocol I most often employed was to present the trial wines in ensemble; the tasters were allowed to smell all, taste all, smell all again, and then mark their sheets. The trial wines were presented in different orders for each taster and in each tasting session.
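For anyone who wants to see the blind, randomized presentation spelled out, here is a minimal Python sketch. It is an illustration only, not the tooling I actually used: the function name, the three-digit blind codes, and the sample IDs are all assumptions made for the example. An independent shuffle per taster and session makes repeated orders unlikely but does not strictly guarantee every order is different.

```python
import random

def presentation_orders(wines, references, tasters, sessions, seed=0):
    """Build a blind, randomized serving order for every (session, taster).

    Hypothetical helper: returns (orders, key), where `key` maps each
    blind code back to the real sample ID and is kept from the panel.
    """
    rng = random.Random(seed)
    # Reference standards are hidden among the trial wines.
    samples = list(wines) + list(references)
    # Assumption: three-digit blind codes, one per sample.
    codes = rng.sample(range(100, 1000), len(samples))
    key = dict(zip(codes, samples))
    orders = {}
    for session in sessions:
        for taster in tasters:
            order = codes[:]       # copy, so each shuffle is independent
            rng.shuffle(order)
            orders[(session, taster)] = order
    return orders, key

orders, key = presentation_orders(
    wines=["A", "B", "C"],
    references=["ref_mineral"],
    tasters=["t1", "t2"],
    sessions=[1, 2, 3],
    seed=1,
)
```

Each taster's sheet would then list the blind codes in that taster's order, and only whoever runs the trial holds the key.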
Evaluate The Data
I would slap a ruler on each line and measure where each mark was located: 0mm to 100mm. Each record in the data set comprised the session ID, the taster ID, the trial wine ID, the attribute ID and the associated intensity “value.” A first cleanup pass on the dataset would scrub the records for session/taster/attribute combinations where reference standards were poorly identified. The references I used were usually pretty obvious, so I somewhat arbitrarily set the cutoff at 60: if a taster marked a reference standard below 60 for its attribute, their session results for that attribute were excluded from the data set. Finally, I fed the data into statistics software to crunch the numbers. The most robust results came from principal component or factor analyses: non-parametric methods that maximize the variance in the observational data, and then rotate the experimental treatment axes relative to the observational vectors. In the example above, if, say, the wines produced from different crop loads grouped along the vector for perceived minerality, or perhaps the vector for perceived fruitiness, we could conclude that crop load affects these attributes of the wine.
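The cleanup pass can be sketched in a few lines of Python. This is a reconstruction under stated assumptions, not my original scripts: the record layout follows the fields listed above, but the `reference_for` mapping (attribute to the ID of its dosed reference sample), the function names, and the choice to treat a missing reference mark as a failure are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

CUTOFF_MM = 60  # the somewhat arbitrary cutoff described above

def clean_records(records, reference_for):
    """Drop session/taster/attribute results where the reference was missed.

    records: list of (session, taster, wine, attribute, mm) tuples.
    reference_for: {attribute: wine ID of its reference} (assumed layout).
    """
    # What did each taster mark for each attribute's reference standard?
    ref_score = {}
    for session, taster, wine, attr, mm in records:
        if reference_for.get(attr) == wine:
            ref_score[(session, taster, attr)] = mm
    ref_ids = set(reference_for.values())
    kept = []
    for rec in records:
        session, taster, wine, attr, _mm = rec
        if wine in ref_ids:
            continue  # references are controls, not trial data
        # Assumption: no reference mark at all counts as a failure.
        if ref_score.get((session, taster, attr), 0) >= CUTOFF_MM:
            kept.append(rec)
    return kept

def mean_profile(records):
    """Average the surviving marks per (wine, attribute)."""
    cells = defaultdict(list)
    for _session, _taster, wine, attr, mm in records:
        cells[(wine, attr)].append(mm)
    return {cell: mean(values) for cell, values in cells.items()}

records = [
    (1, "t1", "ref_min", "mineral", 82),
    (1, "t1", "A", "mineral", 40),
    (1, "t1", "B", "mineral", 70),
    (1, "t2", "ref_min", "mineral", 35),  # t2 missed the reference in session 1
    (1, "t2", "A", "mineral", 90),        # so this mark is excluded
    (2, "t2", "ref_min", "mineral", 60),
    (2, "t2", "A", "mineral", 50),
]
kept = clean_records(records, {"mineral": "ref_min"})
profile = mean_profile(kept)
```

The averaged wine-by-attribute table is only a convenient summary for the sketch; the statistics software worked from the cleaned records themselves.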
I don’t own any of the data I generated from my days as a researcher, and I worked for private companies that did not publish the results of the work I did. So to illustrate this kind of analysis I have lifted a pretty decent graphic from a published study exploring the effects of yeast selection on the attributes of Sauvignon Blanc:
I leave it to the (very) interested reader to look deeper into this statistical approach.
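For that reader, here is a dependency-free Python sketch of the core idea: finding the first principal component of a wine-by-attribute table. To be clear about what is swapped in, this uses plain power iteration on the covariance matrix rather than whatever a statistics package does internally, it recovers only one component, and it does none of the rotation mentioned above. The function name and the toy numbers are made up for illustration.

```python
import random
from statistics import mean

def first_component(rows, iters=200, seed=0):
    """Return (direction, scores) for the first principal component.

    rows: one list per wine, one column per attribute (intensity in mm).
    Power iteration stands in for a full PCA routine here.
    """
    n, p = len(rows), len(rows[0])
    # Center each attribute column on its mean.
    col_means = [mean(r[j] for r in rows) for j in range(p)]
    X = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix between attributes (p x p).
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(p)]
         for a in range(p)]
    # Random start, so we are unlikely to begin orthogonal to the answer.
    rng = random.Random(seed)
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Project each wine onto the component to get its score.
    scores = [sum(x[j] * v[j] for j in range(p)) for x in X]
    return v, scores

# Toy table: four wines, two attributes that rise together.
table = [[10, 20], [20, 40], [30, 60], [40, 80]]
direction, scores = first_component(table)
```

Wines that sit far apart on the scores differ most along the direction in which the panel's marks vary most; whether that direction lines up with an experimental treatment is the question the real analysis goes on to ask.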
The Bottom Line
What I have tried to convey here is not the method, but rather a sense of the level of rigor I believe is necessary to perform an objective evaluation of wine: to be able to conclude with reasonable certainty that one wine is “better” than another, according to some specific definition of what constitutes better-ness. Would it surprise anyone to find that I view any expectation of inviolate veracity for 100-point scores to be hopelessly naive? Given the work I have done, I have earned the right to tell y’all that any insistence that someone reviewing many wines a day can approach tasting with this level of rigor and reproducibility is misplaced to the point of irrationality.
In The Trenches
I have huge respect for anyone who reviews wines for a living. It is hard work. In the argument over the meaning of scores—inflated or not—I come down on the side that scores are a shorthand valued by a culture that views everything in terms of a competition and shuns relativism. I truly believe that most if not all reviewers would prefer not to use scores if they had a choice, but that consumers demand them. I also believe that a certain slice of consumers are 100% to blame for the expectation that scores must reflect some sort of absolute. I don’t see a single reviewer claiming omniscience, infallibility, or the inviolability of their scores or evaluation methodologies. And I don’t fault wineries for touting scores to move their product—that’s just good business sense. But anyone who buys by scores and truly expects that anybody’s 96 is objectively “better” than an 88, every time, to every person—as the sage said, there’s one of them born every minute.
Yeah, I said it. Oh yeah, I really did. Sucker.
Today I was reading the February 2012 issue of Road & Track magazine (in paper, thank you; I’ve been a subscriber for nearly 40 years) and was struck by this bit from the opening “Road Ahead” column by Editor-in-Chief Matt DeLorenzo:
‘What’s a good car?’ It’s a common question put to enthusiasts, yet impossible to answer because invariably part of the reason it’s asked is to validate the questioner’s own opinion. What really is a good car? More often than not, you end up engaging in Socratic dialogue to find out the person’s needs or wants before settling on an answer.
A better question is what car do you love? The beauty of this approach lies in its subjectivity, as opposed to the objectivity demanded by the ‘what’s a good car’ question. If someone is seeking your opinion, shouldn’t the answer be more subjective than objective? This also opens the door to allow passion to enter the discussion rather than simple data.
You can love a car for many reasons, both rational and irrational, the latter being eminently more fun than the former. So… we’ve decided to bring you a loose collection of cars we love. We aren’t saying these are the definitive best cars in the world, but rather cars worthy of not just your attention, but more importantly, your affection.
Subjectivity. Passion. Fun. Affection.
I could not agree more with Mr. DeLorenzo. As he suggests with cars, the joy of wine appreciation is sucked out by the “simple data” implied by scores. I have come to realize that James Suckling gushing “I’m 100 points on that!” is more expressing enthusiasm and emotional honesty about a particular experience than he is saying “this 100 point wine is objectively better than that 96 point wine.” Score inflation? Wines getting better? No, I don’t think either. I think that reviewers are just getting more enthusiastic about wines they love.
I wish reviewers would give up the pretense of objectivity. If we acknowledge that when a reviewer gives a high score it means they love that particular wine—no more, no less, and with no expectation that that score fits in a wider, objective context—we would all be happier. Kumbaya.


by Sue Langstaff
11 Jan 2012 at 09:46
Nice post, John. Keep up the good work! If people hear it enough times, they might finally begin to believe us.
Amen.
by John M. Kelly
11 Jan 2012 at 10:23
Thanks Sue—great to hear from you! (And for those of you who don’t know, Sue is one of the top wine sensory researchers in the world.)
by Morton
11 Jan 2012 at 10:15
The reason for the pretense of objectivity is that the critic’s customer, the reader/wine buyer/collector/consumer wants objectivity. The customer wants a sort of Consumer Reports evaluation of wine, objectively evaluated by experts. That is why both the critic and the consumer lean to the 100 point score in the belief that there was an objective, measured, mathematical summation based on traits that exist in the products.
The consumer does not look at a score with the belief that it is a subjective measurement influenced by the critic’s opinion of the label, influenced by the winery’s marketing message, the attractiveness of the winemaker, and the perks given to the critic by the wine producer. The consumer has every right to expect that the critic has adhered to the basic elements of professional sensory evaluation that are necessary to produce accurate and reproducible evaluations.
Wines are expensive to most wine buyers. If they knew they were buying wines that were rated and scored in a haphazard, amateuristic manner, they wouldn’t bother to subscribe to the publication or read the blog. Hence the pretense of objectivity.
by John M. Kelly
11 Jan 2012 at 11:20
Hi Morton – While I agree with your opening premise, you lose me when you state: “The consumer has every right to expect that the critic has adhered to the basic elements of professional sensory evaluation that are necessary to produce accurate and reproducible evaluations.” The entire point of my post is that this expectation is wildly naive and irrational. How are “basic elements” defined? What is your standard for “accurate and reproducible”? My point in this post is that even with the rigor implied in the methodology I presented, “accurate and reproducible” is a matter for statistical calculation.
I do not believe professional wine reviewers taste in a “haphazard, amateuristic manner.” However, to expect infallibility of even our best tasters is cruel. Even Harry Waugh could confuse Burgundy with Bordeaux in a blind tasting—once in a while, perhaps over lunch. As far as I am concerned, the only right the consumer has in this area is to take responsibility for the willful suspension of their own disbelief.
by Thomas Pellechia
11 Jan 2012 at 10:55
John,
I’m sure you know that I could not agree more with your post, and Mr. DeLorenzo, too (as well as Sue, with whom I’ve discussed this matter earlier on other blogs).
The problem, however, is that there is no incentive for reviewers to drop the pretense of objectivity. The idea that they do their work for the good of the consumer is what gives them currency, and if they reduce their work to merely subjective opining, who should care?
To me, the better solution would be for the wine industry and reviewing industry to agree on and establish evaluation parameters (standards) and that–maybe–professional reviewers undergo sensory and organoleptic training, not to talk like scientists but to have grounding in the composition of the product that they are evaluating.
I believe that this is the kind of thing Maynard Amerine had in mind a long time ago.
Of course, this is my opinion and I also know that there’s a fat chance something like it will ever happen, not as long as opinions are confused with facts the way “reality” TV is confused with reality.
by John M. Kelly
11 Jan 2012 at 11:29
Hi Thomas – I’m working on a post comparing “Technical-Grade” evaluation with the “Research-Grade” approach I presented here. It is fairly easy to develop skills in evaluating and describing wines on a few evaluation parameters—the UC Davis 20-point scale was based on this approach. But as a way to communicate, to write something people want to read about wine that isn’t boring and dry, the 20-point scale is a total failure.
As to who should care – well, who should care about any other kind of review? Restaurants, theater, movies, music, literature, cars, cell phone customer service? Are any of these reviewed on a 100-point scale? Not that I can think of.
by Steve Nelson
11 Jan 2012 at 11:43
John, you hit the nail on the head. I award you 99 points for this extraordinary article.
by John M. Kelly
11 Jan 2012 at 11:58
Thanks, Steve – I’m 100 points on your 99 points!
by SUAMW
11 Jan 2012 at 12:04
Very disappointed to see you outline a system of wine evaluation not far from mine and then declare it impossible, “irrational”.
I agree with Thomas: “better solution would be for the wine industry and reviewing industry to agree on and establish evaluation parameters (standards) and that–maybe–professional reviewers undergo sensory and organoleptic training, not to talk like scientists but to have grounding in the composition of the product that they are evaluating.”
Too bad the egos and non-scientific backgrounds of most wine reviewers do not allow them to recognize that they could actually be *wrong* in the things they say about wine…
by Thomas Pellechia
11 Jan 2012 at 13:04
John,
“Restaurants, theater, movies, music, literature, cars, cell phone customer service? Are any of these reviewed on a 100-point scale?”
Indeed, they are not. But in most cases, people who review those products/arts have some form of grounding in the manufacture, production, and/or performance of the products/arts (maybe restaurant critics are the exception, but many that I’ve encountered are trained cooks–and some are interior designers).
With wine, it is exceedingly rare that top critics have such grounding in the product that they evaluate.
It matters not to me, because I eschew aesthetic criticism in general, deciding to make my own evaluations concerning the things in life that excite me. But as a wine professional, I find the critic/review process deplorable.
by John M. Kelly
11 Jan 2012 at 13:45
Thomas – I will stipulate that many wine critics lack grounding in grape growing and wine production when they start out, but it seems to me that every one I have interacted with has picked up more than a little grounding on the job.
I deplore the expectation of some consumers that wine evaluation by critics is somehow different from other forms of criticism. We can read Foucault and Heidegger and find something of value in each, without ever bestowing on either the mantle of infallibility and absolute truth. On a more quotidian level, we can read Consumer Reports for vacuum cleaner reviews and find they give a poor rating to a unit we have used happily for years without experiencing cognitive dissonance. But for some reason, wine is different?
by Jennifer R Thomson
11 Jan 2012 at 14:19
MSNBC recently ran a piece about UC Davis researchers and “How scientists learn to speak ‘the language of flavor.’” I found the most interesting part of the article to be the scientific approach the institution takes when – just as you’ve proposed – they “train the panel” (http://on.msnbc.com/zuVzam).
As a Pinot Noir Grower, I was thrilled to have been invited by Barbara Drady, organizer of the Pinot Noir Shootout, to judge the final round of Pinots in December for the 2012 contest. She asked that judges take copious notes, support their ultimate score, and suggested a variety of point systems and scales utilized by Wine Spectator and the like.
I sipped through 32 or 34 Pinots and chose to assign a numerical value based on 70, 80, 90, standing firm on my belief that NO ONE, whether consumer, expert, winemaker or otherwise, can truly differentiate between a 92 and a 93 point wine.
By number 27 or so, I found it difficult to concentrate or articulate on paper or even decide upon a number.
Nonetheless, it was a wonderful experience. I believe more Growers should be included as tasters/judges to add diversity and expertise to often narrowly focused panels; that the scoring system should in some way be streamlined, or one approach adopted by the various “point scoring machines”; and that wine critics (and bloggers) should probably find something better to spend their time criticizing.
Thanks for the scientific post John. Good to see your winemaking self back online again. Let’s hear it for the 2012 growing season!
by John M. Kelly
11 Jan 2012 at 17:00
Jennifer – thanks for stopping by! Tasting 32-34 wines, writing notes and making an “objective” evaluation for every one all in one sitting is a lot of hard work, isn’t it! And yet, with all the wines out there to taste, the intrepid wine reviewer might do three times that in a day. Every day. The very thought gives me the shivers.
The article you cited said something to the effect that doing a “research-grade” tasting of just 18 wines could take 5 weeks. In my experience that is the time scale required for a panel to do a thorough quantitative evaluation of a group of wines. And yet people throw brickbats at reviewers for not pursuing their trade at that level of rigor.
I’m glad you got to participate on the Pinot Shootout panel. It is great for the usual suspects to get a grower’s perspective—I hope you also got something out of seeing that aspect of the end user world.
by Blake Gray
11 Jan 2012 at 16:53
John: A very minor point about something that came up in comments:
Increasingly, music and movies ARE being graded on the 100-point scale, mainly by aggregator sites, which apply a 100-point-scale equivalent to ratings from various critics. This is particularly difficult for New York Times ratings, which use no scale whatsoever, but I have seen them translated.
In the music fandom world, these aggregator sites with 100-point-ratings seem to be increasingly influential.
I often argue that critics’ ratings matter little for Hollywood movies, but that’s tangential: in fact, imdb.com, which for many (including me) is the first stop for movie information, gives an aggregate rating from 0.0 to 10.0 — the 100-point scale, with a different decimal point.
Not arguing pro scale or anti scale here today. Just pointing out that wine isn’t uniquely 97’d (ooooooooh, I made a number into a verb! Take that, Steve!)
by John M. Kelly
11 Jan 2012 at 17:33
Blake – That’s a really good point about aggregator sites. It makes a lot of sense that meta-analyses of ratings from disparate sources would perform some sort of statistical normalization. It’s like me measuring the values on my tasting panels’ attribute intensity lines with a ruler.
But heck this is hardly the first time a number has been verbed – I think we are about to zero in on the fact it is time to 86 the idea that wine scores imply an objective standard.
by Tina Morey
11 Jan 2012 at 21:10
Hi John,
Why do we invariably assign a rating system to our tangibles? Because we crave it: we’ve become accustomed to being told that this is the way it should be. It’s easy, and most folks like easy, especially in our fast-paced, “difficult” lives. But what happens when we “lead” folks in a direction that allows them to think, instead of react?
For me, this is the next level, a higher level. Where each person makes his/her own decisions based on many factors, hopefully on the experience itself, not just a #.
“I think we are about to zero in on the fact it is time to 86 the idea that wine scores imply an objective standard.” Priceless.
Thanks for this exquisitely-written piece.
by John M. Kelly
12 Jan 2012 at 08:37
Thanks, Tina. I think we rate our intangibles as well. People, too—as we watch the gladiatorial spectacle of the Presidential candidates being voted off the island. Thumbs up, or thumbs down—the ultimate digital binary rating system. This morning Blake Gray has a piece up suggesting that some people embrace scores to impose order on an uncertain world—even if it is the illusion of order.
Blake talks about “score-haters” and reading some comments I have no doubt that there is a lot of hate out there over scores. I’m not a score hater. I do wish that people who think scores are the be-all and end-all would wake the hell up.
Yesterday Tom Wark and I had a brief exchange on Twitter where I threw out the idea that “scores are emoticons.” It follows up on the last couple of paragraphs in my original post, as in “This wine is awesome! I’m 98 points on it #winefuckyeah.” Hmmm… I suppose scores could be viewed as equivalent to Tumblr tags too.
Tom took it up a notch and suggested that the idea of scores as emoticons invoked Hume and Kant. I’m not all that well-read in philosophy; my understanding of Hume and Kant is superficial at best. As I have understood Hume, he rejected Cartesian rationality and embraced passion as the motivation of behavior. Viewed in that light, perhaps scores are reflections of our passion for wine. Emoticons indeed. Kant I’m not so sure of, unless the connection is the idea that scores reflect our innate need to impose rationality on subjective experience.
This discussion needs to be taken down many notches. Josh Hermsmeyer (@pinotblogger) put up a little graphic that I saw as a corollary to wine scoring:

…which led me to recall reports about a phone app that allows you to geo-tag, and rate, your sexual activity, and share the info with your friends. Talk about human desire motivating our rationality. Coming soon (so to speak): “Rate Your O”!!! Imagine how this would put a proper perspective on the “wisdom of crowds.”