What Review Scores Actually Mean

You've spent ten minutes on a game's Metacritic page. The Metascore is 76. You check OpenCritic and it says 78. One forum says that's solid; another says anything under an 80 isn't worth your time. You still have no idea whether to spend $70, and you're somehow less confident than before you started looking.

This confusion isn't your fault. Review scores have been quietly broken for years, and the way they're reported makes the underlying problem harder to spot. Once you understand what a score is actually encoding and where those numbers come from, you'll use them much more effectively. Or at least stop losing sleep over a two-point gap.

The Scale That Made Sense Once

When print magazines were the main source of game criticism, publications were upfront about what each score meant. Electronic Gaming Monthly (EGM), one of the defining outlets of the 1990s, told readers directly: 10 was perfect, 9 was outstanding, 8 was excellent, 7 was very good, 6 was above average, and 5 was average. That last point is the one that matters most. A 5 was not a dismissal. It placed a game squarely in the middle of the scale, which is exactly what the word "average" means.

Reviewers who followed that rubric used the full range. A bad game got a 2 or 3. A functional but unexciting title got a 5. A high score meant something because a low score was a real possibility. The whole system had a kind of honesty to it: the hierarchy reflected actual differences in quality.

That changed as gaming criticism moved online in the late 1990s and early 2000s.

How 7 Became the New "Fine"

As independent websites multiplied, score guides became less visible. Writers arrived without clear rubrics. A quiet cultural shift took hold, borrowed partly from how US schools grade: if 70% is a "passing" grade, then 7/10 must mean "barely passable," and 5/10 must mean failure. But a 5 on a 10-point scale isn't failing. It's supposed to be ordinary.

The numbers drifted upward. According to a 2014 analysis published on Gamasutra (now Game Developer), IGN's average review score was 8.0 out of 10 in 2006, while GameSpot's averaged 7.0 the same year. Across the industry, scores clustered into a band that left the lower half of the scale essentially empty. Critics grew reluctant to go low because a score below 6 read as aggressive, even for genuinely poor games. On top of that, there's real social pressure in a small industry not to be the person who torpedoed a small studio's release.

This is what critics sometimes call the "lazy 8" problem: the practical scoring range collapsed from 1-10 into something closer to 7-10, with everything below a 7 treated as a harsh judgment rather than a neutral one. When the scale compresses like this, the differences between scores start to blur. A 7.5 and an 8.5 represent a meaningful gap in the original rubric. After the drift, they're both just "good."

The table below shows how far the intended meaning and the received meaning have drifted apart:

Score	What it was designed to mean (EGM-style rubric)	What most readers actually take from it
10/10	Perfect, a genuine masterpiece	Must-buy immediately
9/10	Outstanding, a landmark achievement	Near-must-play
8/10	Excellent, clearly worth your time	Good, probably fine
7/10	Very good, worthwhile	Mixed signals, some hesitate
6/10	Above average	Often read as a negative review
5/10 or below	Average at 5, below average to poor under that	Terrible, never touch it

That gap between intended and received meaning is where a lot of gaming discourse goes sideways.

[Illustration: two identical controller silhouettes, one beside a full 1-10 score bar, the other beside the same bar with an arrow crowded into its upper third]

Two Aggregators, Two Philosophies

Most players don't read a dozen publications. They consult aggregators, and two sites dominate that space.

Metacritic launched in 1999, created by Jason Dietz, Marc Doyle, and Julie Doyle Roberts. It aggregates scores from roughly 140 publications using a weighted average: certain outlets count more than others based on criteria Metacritic has never publicly disclosed. You can't see what the weights are, and you can't tell from a Metascore whether one major outlet drove the number or whether it reflects genuine consensus. Metacritic also doesn't revise an individual review's score once it has been recorded, so a 2012 review that no longer reflects a publication's current view still counts toward a game's permanent record.

OpenCritic launched on September 30, 2015, built by Matthew Enthoven, Charles Green, Richard Triggs, and Aaron Rutledge directly in response to that opacity. Rather than hidden weights, it uses a simple arithmetic mean: every critic's score counts equally after being normalized to a 0-100 scale. It also shows the percentage of critics who actively recommended the game, and it sorts titles into tiers: Mighty (the top 10% of all scored games), Strong (the next 30%), Fair (the 30% below that), and Weak (the bottom 30%). Valnet acquired OpenCritic in July 2024, but it remains the more transparent option for anyone trying to understand where an aggregate number came from.
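To see how the two philosophies can diverge on the same set of reviews, here is a minimal Python sketch. Everything in it is hypothetical: the outlet names, their scores, and especially the weights. Metacritic has never disclosed its weights, so the weighted figure only shows that weighting can move the number, not how Metacritic actually weights any outlet. Scores from different scales are first normalized to 0-100, as described for OpenCritic.

```python
# Hypothetical reviews: (outlet, score, maximum on that outlet's scale).
# Every outlet name and number here is invented for illustration.
REVIEWS = [
    ("Outlet A", 8, 10),
    ("Outlet B", 4, 5),
    ("Outlet C", 90, 100),
    ("Outlet D", 6, 10),
]

# Hypothetical weights. Metacritic's real weights are undisclosed;
# these only demonstrate that weighting can shift the result.
WEIGHTS = {"Outlet A": 1.0, "Outlet B": 1.0, "Outlet C": 2.0, "Outlet D": 1.0}


def normalize(score: float, scale_max: float) -> float:
    """Convert a score on any scale to the 0-100 range."""
    return score * 100 / scale_max


def simple_mean(reviews) -> float:
    """Arithmetic mean: every critic counts equally."""
    values = [normalize(score, top) for _, score, top in reviews]
    return sum(values) / len(values)


def weighted_mean(reviews, weights) -> float:
    """Weighted mean: some outlets count more than others."""
    total = sum(weights[name] * normalize(score, top) for name, score, top in reviews)
    return total / sum(weights[name] for name, _, _ in reviews)


if __name__ == "__main__":
    print(f"Simple mean:   {simple_mean(REVIEWS):.1f}")
    print(f"Weighted mean: {weighted_mean(REVIEWS, WEIGHTS):.1f}")
```

Running it prints a simple mean of 77.5 and a weighted mean of 80.0 for the same four reviews. From the headline number alone, a reader couldn't tell that one outlet's doubled weight produced the difference.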

Worth noting: the "% recommended" metric on OpenCritic often tells you more than the average score does. A game with a 74 average and 85% of critics recommending it is in a very different position than a game with a 74 average and 55% recommending it. The first is a game most people liked but didn't love. The second is a polarizing title where some critics were enthusiastic and others were significantly put off. Same number, completely different story.
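Here is that contrast as a short Python sketch with two invented games that both average exactly 74. The percentages differ slightly from the example above (90% and 50% rather than 85% and 55%), and the rule that a review "recommends" a game if it scores 70 or higher is an assumption for illustration only, not OpenCritic's actual method.

```python
from statistics import mean

# Invented 0-100 scores for two hypothetical games.
CONSENSUS = [65, 72, 72, 74, 74, 75, 76, 77, 78, 77]
POLARIZING = [92, 94, 90, 92, 95, 50, 55, 52, 58, 62]

# Hypothetical cutoff: treat a score of 70+ as a recommendation.
# This is an assumption, not OpenCritic's real rule.
RECOMMEND_THRESHOLD = 70


def recommend_share(scores, threshold=RECOMMEND_THRESHOLD) -> float:
    """Fraction of reviews at or above the recommendation cutoff."""
    return sum(1 for s in scores if s >= threshold) / len(scores)


for name, scores in [("Consensus", CONSENSUS), ("Polarizing", POLARIZING)]:
    print(f"{name}: average {mean(scores):.1f}, "
          f"recommended {recommend_share(scores):.0%}")
```

Both lines report an average of 74.0, but the consensus game is recommended by 90% of its invented critics and the polarizing one by only 50%.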

Neither aggregator tells you whether you'll enjoy the game.

When Scores Have Real Consequences

The moment review scores became tied to developer paychecks, everything about how the industry uses them changed.

The most-discussed example is Fallout: New Vegas. Developer Obsidian Entertainment had a clause in their contract with publisher Bethesda Softworks tying a completion bonus to their Metacritic performance. The game released in 2010 to genuine critical praise and landed a Metascore of 84. The bonus threshold was 85. Obsidian's entire team missed a payout that Jason Schreier reported at Kotaku averaged approximately $14,000 per employee across a 70-person studio. One point on an aggregator using an undisclosed weighting formula.

That story became infamous, but the underlying practice was common. Once developers know scores affect bonuses, producers start identifying what reviewers have historically rewarded and engineering those elements into the game regardless of fit. Marketing teams shape press strategies around review embargoes to maximize opening-week scores. Publishers hold back preview access from publications likely to score low. The score becomes a target that influences design, not an outcome of it.

None of that makes the resulting number more useful for helping you decide what to play. If anything, it makes it less representative of what actual players experience.

What the Number Is Actually Encoding

Here's the plain version: a review score is one person's attempt to compress a complex, hours-long experience into a number that can be compared to other numbers from other people reviewing other games. The aggregated version averages those individual compressions together.

That compression loses almost everything relevant:

Whether the combat style clicks with how you play
Whether the story is the type you can lose yourself in
Whether the reviewer finished the game before publishing (more common than the industry admits)
Whether the outlet's general taste aligns with yours
Whether a significant patch changed the experience after the review ran
Whether a multiplayer game's community is still active

What a score tells you is where a critic's gut reaction fell on their internal hierarchy, compared to other games they've reviewed, on the day they published. That's genuinely useful information. It just isn't a verdict you can adopt wholesale.

How to Actually Read Review Scores

Getting value from scores means treating them as starting points, not conclusions. A few approaches that hold up in practice:

Find a critic whose taste matches yours and follow their work specifically. A consistent voice who has pointed you toward games you've loved (and warned you away from ones you'd have hated) is worth far more than a consensus average. Their 8 means something precise to you. The aggregate's 8 means something averaged across dozens of people with different preferences.

Look at the score distribution, not just the mean. A game with forty scores between 80 and 90 is a different case from one with thirty 85s and ten 45s. The polarized game is telling you something important: some people find it excellent and some find it seriously flawed. Knowing which camp you're likely to fall into requires reading a few of those outlier reviews.
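The arithmetic behind that comparison fits in a few lines of Python. The forty "consensus" scores are invented to match the description (ten each of 80, 84, 86, and 90); the polarized set is exactly thirty 85s and ten 45s. Standard deviation and the five-point window are illustrative choices, not metrics either aggregator publishes.

```python
from statistics import mean, pstdev

# Invented data matching the two cases described in the text.
CONSENSUS = [80] * 10 + [84] * 10 + [86] * 10 + [90] * 10  # forty scores, 80-90
POLARIZED = [85] * 30 + [45] * 10                           # thirty 85s, ten 45s


def share_near_mean(scores, window=5) -> float:
    """Fraction of scores within `window` points of the average."""
    avg = mean(scores)
    return sum(1 for s in scores if abs(s - avg) <= window) / len(scores)


for name, scores in [("Consensus", CONSENSUS), ("Polarized", POLARIZED)]:
    print(f"{name}: mean {mean(scores):.1f}, "
          f"std dev {pstdev(scores):.1f}, "
          f"within 5 points of mean {share_near_mean(scores):.0%}")
```

The polarized game averages 75 even though no critic scored it closer than ten points to that number. The mean hides this; the spread (about 17 points versus under 4) and the empty near-mean share expose it.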

Use the "% recommended" figure on OpenCritic alongside the number. It cuts through the scale drift cleanly. Two games can share a 74 average while one has 82% of critics recommending it and the other has 51%. The percentage measures something simpler and more actionable than the score.

Read at least one review text before trusting the number. The review that gave a game 9/10 might spend two paragraphs mentioning that the final quarter collapses. If endings matter to you, that changes everything. The number tells you what happened; the text tells you why. Only the "why" helps you make a decision.

[Illustration: a gamer at a warm-lit desk, one screen showing a large score number and the other dense review text]

Why the Score You Give Matters Most

Professional critics are doing something genuinely difficult: writing usefully about games they play on short deadlines, for audiences they can only partially know, under institutional pressures that shape what they say and how. Review scores are a flawed shorthand for that effort, and understanding the flaws makes them more useful, not less.

But the score that determines whether a game was worth your time is the one you'd assign after you've played it. That verdict accounts for your history with similar games, your tolerance for specific design choices, whether you played it alone or with friends, and whether it arrived at a moment in your life when you were ready for it. No aggregator captures any of that.

The broader history of how we got from clear magazine rubrics to opaque weighted averages and developer bonus clauses is worth understanding too. It's part of a longer story about how review culture in gaming has evolved from the era when a handful of publications set the agenda to today's scattered, platform-fragmented landscape.

If you want to build a real record of what you actually thought about the games you've played, and have that record mean something over time, it helps to keep it somewhere intentional. On The EndWiki, you can log every game, write your own reviews with as much or as little depth as you want, and build a profile that reflects your actual taste. Your score, your reasoning, your history. Not a consensus you didn't vote in.

Create your free account on The EndWiki and start writing the reviews that actually matter to you.