How Google’s Quality Rater Guidelines shape what NavBoost rewards.

20/05/2026

If you’ve spent any time reading about Google’s ranking systems, you’ve probably bumped into two terms that sound completely unrelated. The first is the Search Quality Evaluator Guidelines, a 175-page document Google publishes for the human contractors who rate search results. The second is NavBoost, the click-based ranking system Google confirmed under oath during the 2023 DOJ antitrust trial. Most SEO content treats these as separate worlds. They aren’t. The human rater guidelines are the standard NavBoost is trained against, and once you understand how that training loop works, a lot of confusing advice about “quality content” and “user signals” starts to make sense as one connected mechanism.

Key takeaways

  • Human raters don’t change your rankings; they train the system that does. Google works with roughly 16,000 contractors worldwide who rate search results against the Quality Rater Guidelines. Their ratings never touch your live rankings directly. What they do is grade Google’s algorithm changes, and the algorithm is adjusted to match the patterns raters reward.
  • NavBoost is the click-signal layer that translates rater judgments into live ranking adjustments. When raters consistently mark a page as “Highly Meets” the user’s intent, Google’s engineers tune NavBoost to favor the click and dwell patterns those pages produce. The rater data is the answer key; NavBoost is the part of the algorithm being graded.
  • This is why amplifying user signals on high-quality pages works and amplifying them on weak pages does not. NavBoost was built to recognize the click patterns that match what raters call quality. Strengthening engagement signals on a page that already meets the rater standard reinforces a signal the algorithm is designed to reward. Doing the same on a thin, low-trust page produces a signal the system actively learned to discount.

What the Quality Rater Guidelines actually are

Google’s Search Quality Evaluator Guidelines (SQEG, more commonly called the Quality Rater Guidelines or QRG) is a public document. You can download it directly from Google’s site, and the current version runs around 175 pages. Google first published an abridged version in 2013, after internal copies had leaked, released the full document in 2015, and has revised it regularly since. Every update adds new examples, sharpens the definitions, and reflects how Google’s understanding of “good search results” has evolved.

The document is written for a specific audience: the human contractors Google hires through firms like Telus International (which absorbed Lionbridge’s AI business), RaterLabs, and ZeroChaos. There are roughly 16,000 of these raters working globally, and their job is to look at a search query and the results Google returned, then rate how well those results served the user’s intent. They don’t see your traffic. They don’t see your conversions. They see what a user would see, and they apply the framework in the guidelines to decide whether Google did a good job.

Here’s the part that catches most people out. Rater scores never get pushed to your live rankings. Your site can be rated “Lowest” by a human contractor today and still rank on page one tomorrow. The ratings flow into a completely different process: algorithm evaluation. When Google’s engineers want to ship a change to the ranking system, they run the new version against a sample of queries and ask the raters to compare results from the old version and the new one. If the raters say the new results better match the intent, the change moves toward launch (Google also tests changes on live traffic before shipping them). If not, it gets sent back.

So the rater guidelines aren’t a list of rules your site has to follow. They’re the rubric Google’s algorithm changes are graded against. Anything in the document that consistently separates high-rated results from low-rated ones eventually becomes something the algorithm tries to detect on its own.

The two scales that matter: Page Quality and Needs Met

Inside the guidelines, raters apply two separate ratings to every result they see. These two scales are the spine of the whole document, and they map directly onto two different parts of how Google ranks.

Page Quality (PQ) is a rating of the page itself, independent of any specific search query. It runs from “Lowest” to “Highest” and asks: regardless of what someone was searching for, is this a page that does its job well? The PQ rating looks at the purpose of the page, how well it achieves that purpose, the expertise behind it, the reputation of the site, and whether the page has anything that would harm or mislead the visitor. Raters spend a lot of time on PQ because it’s the slowest signal to learn and the easiest to fake.

Needs Met (NM) is a query-specific rating. It runs from “Fails to Meet” to “Fully Meets” and asks: for this specific user with this specific intent, does this result solve their problem? Needs Met is where intent-matching lives. A page can score “Highest” on Page Quality and “Fails to Meet” on Needs Met because the user wanted something completely different. A simple, well-built reference page can score “Fully Meets” for a specific query while only earning a “Medium” Page Quality rating.
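To make the independence of the two scales concrete, here is a minimal Python sketch that records both ratings side by side for a single rater judgment. The class and field names are hypothetical, invented for illustration rather than taken from any Google system; only the scale labels come from the guidelines, which also let raters pick in-between positions (such as Medium+) that this sketch leaves out.

```python
from dataclasses import dataclass
from enum import IntEnum


class PageQuality(IntEnum):
    """Page Quality: a rating of the page itself, independent of any query."""
    LOWEST = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    HIGHEST = 5


class NeedsMet(IntEnum):
    """Needs Met: how well a result serves one specific query's intent."""
    FAILS_TO_MEET = 1
    SLIGHTLY_MEETS = 2
    MODERATELY_MEETS = 3
    HIGHLY_MEETS = 4
    FULLY_MEETS = 5


@dataclass(frozen=True)
class Rating:
    """One hypothetical rater judgment: both scales are stored, never merged."""
    query: str
    url: str
    page_quality: PageQuality  # judged without reference to the query
    needs_met: NeedsMet        # judged against this query's intent


# A superb page that answers the wrong intent...
in_depth_but_off_intent = Rating(
    query="how fast can a jaguar run",
    url="https://example.com/history-of-jaguar-cars",
    page_quality=PageQuality.HIGHEST,
    needs_met=NeedsMet.FAILS_TO_MEET,
)

# ...and a plain reference page that answers it completely.
simple_but_on_target = Rating(
    query="time in tokyo",
    url="https://example.com/world-clock/tokyo",
    page_quality=PageQuality.MEDIUM,
    needs_met=NeedsMet.FULLY_MEETS,
)
```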

Both ratings matter to the ranking system, but they map to different mechanisms. Page Quality maps to the slower, more durable signals Google uses to assess a site overall: links, brand presence, content depth, expertise indicators, technical trust. Needs Met maps directly to user behavior signals: good clicks, bad clicks, and last-longest-clicks, all of which are query-specific by definition. You can’t separate the rater framework from the click signals NavBoost ingests. The first is the standard the second was built to detect.

E-E-A-T: the expansion of what raters check

Anyone who has read SEO content in the last few years has run into the acronym E-E-A-T. It stands for Experience, Expertise, Authoritativeness, and Trustworthiness, and it’s the framework raters apply when judging Page Quality. The first E (Experience) was added in December 2022; before that, the framework was just E-A-T. The change wasn’t cosmetic. Adding experience is widely read as Google’s response to a wave of AI-generated content that could fake expertise on paper but couldn’t fake having actually used the product, visited the place, or lived through the situation being written about.

Each letter has a specific definition raters are trained to apply. Experience is whether the content reflects firsthand knowledge: has the author actually done what they’re writing about? Expertise is whether the author has the knowledge or skill needed for the topic. Authoritativeness is the reputation of the author or site within the topic. Trustworthiness is the underlying foundation that holds the other three up: is the page accurate, honest, safe, and reliable?

The reason E-E-A-T matters for user signals is that the four letters describe the page properties that produce the click patterns NavBoost was trained to reward. A page with strong experience tends to keep readers on it longer because the content has specifics no thin page can match. A page with real expertise gets fewer pogo-stick returns to the SERP because the reader’s question actually gets answered. A page from an authoritative source gets clicked more in the first place because the brand carries weight in the listing. A trustworthy page earns the return visits and direct branded searches that compound over time.

None of this means Google measures E-E-A-T directly. There’s no E-E-A-T score in the algorithm. Instead, there are dozens of correlated signals the algorithm picks up because pages with strong E-E-A-T produce them naturally, human raters reward those pages, and the algorithm gets tuned to match the raters. The framework is the description of what good looks like. The user signals are the measurable evidence of it.

The training loop: how rater scores shape NavBoost

Here’s the part most coverage of the QRG misses. Raters and ranking systems aren’t connected through a direct pipe. They’re connected through a training loop that has three steps, and understanding the loop is the key to understanding why NavBoost ended up where it is today.

Step one: raters score results. A rater gets a query, sees the current top results, and rates each one on Page Quality and Needs Met. That data flows into Google’s internal evaluation systems. It doesn’t change live rankings.

Step two: engineers propose changes. Google’s search engineers run experiments constantly. They might propose a tweak to how the system weights click behavior, a new way of detecting low-quality clicks, a different decay function for the 13-month NavBoost window, or any of a thousand other adjustments. Each proposal is tested against a large sample of queries.

Step three: rater data grades the experiment. For each candidate change, raters are shown side-by-side comparisons: results from the current system versus results from the proposed system. If raters agree the new results match intent better, the change is approved; if they don’t, it’s rejected. This is the moment rater judgment becomes algorithm behavior. The change ships, becomes part of live ranking, and the new version of NavBoost is now slightly closer to what raters reward.

Repeat this loop tens of thousands of times across two decades, and you get the NavBoost we have today: a system that’s been continuously shaped to detect the click patterns produced by pages raters call quality. That’s why NavBoost rewards click quality, not click quantity. The training data is rater-defined quality, not raw engagement numbers.
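As a rough illustration of step three, the sketch below treats a side-by-side evaluation as a simple launch gate: average the rater preferences for each query, then ship only if the proposal wins on average and on most of the queries that showed a preference. The -2 to +2 preference scale, the function name, and the 50% win-share rule are all assumptions made for this example; it is a toy model with the same shape as the loop described above, not a description of Google’s actual launch process.

```python
from statistics import mean


def side_by_side_verdict(preferences, min_win_share=0.5):
    """Toy launch gate for a proposed ranking change (hypothetical rules).

    preferences maps each sampled query to rater scores from -2 to +2:
    negative favors the current system, positive favors the proposal.
    Returns (ship, overall_mean).
    """
    # Average the raters per query so busy queries don't dominate.
    per_query = [mean(scores) for scores in preferences.values() if scores]
    if not per_query:
        return False, 0.0
    overall = mean(per_query)
    wins = sum(1 for s in per_query if s > 0)
    losses = sum(1 for s in per_query if s < 0)
    decided = wins + losses
    win_share = wins / decided if decided else 0.0
    # Ship only if the proposal is better on average AND wins most decided queries.
    ship = overall > 0 and win_share > min_win_share
    return ship, overall


ratings = {
    "best running shoes": [1, 2, 1],
    "mortgage rates today": [-1, 0, -1],
    "how to boil an egg": [1, 1, 0],
}
ship, overall = side_by_side_verdict(ratings)
print(ship, round(overall, 3))  # True 0.444
```

Averaging per query first is a deliberate choice in this sketch: it keeps a handful of heavily rated queries from outvoting the rest of the sample.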

What the Google leak revealed about the overlap

The May 2024 Content API Warehouse leak made the overlap between rater frameworks and ranking signals concrete in a way that previous SEO debate couldn’t. Among the 2,596 leaked API modules, several feature groups directly mirror the language of the rater guidelines. The leak showed Google’s internal classifiers track things like “site authority,” “page quality predictors,” and feature scores that read like operationalized versions of E-E-A-T dimensions. None of these are E-E-A-T scores in the literal sense, but they’re the signal-level approximations engineers built so the algorithm can predict what raters would say without needing a human in every loop.

Equally important, the leak confirmed that Google stores click attributes named goodClicks, badClicks, and lastLongestClicks, and the way these are commonly interpreted lines up closely with the rater framework. goodClicks correspond to behaviors raters would describe as “Needs Met” outcomes: a user clicks, stays, and doesn’t return to the SERP for a related search. badClicks correspond to behaviors raters call “Fails to Meet”: pogo-sticking back, refining the query immediately, or abandoning the session in frustration. lastLongestClicks capture the result that satisfied the user enough to end their search journey, which is essentially the operational definition of “Fully Meets” applied to a single result rather than the whole SERP.

This wasn’t an accident. The engineers building NavBoost knew what the rater guidelines defined, and they built signals that approximate those definitions automatically. The leak made the parallel explicit. Once you see it, the rater guidelines and the click signal definitions read like two versions of the same document written for different audiences.
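Here is a toy labeler that applies those behavioral descriptions to a single search session. The leak exposed attribute names such as goodClicks, badClicks, and lastLongestClicks, not the rules behind them, so the Click type, the 10-second dwell threshold, and every rule in this sketch are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Click:
    url: str
    dwell_seconds: float
    returned_to_serp: bool  # did the user come back to the results page?
    refined_query: bool     # did they reformulate the search right after returning?


def label_session(clicks, short_dwell=10.0):
    """Label each click in one search session, in order (hypothetical rules).

    - bad: a return to the SERP after a short visit, or a return followed by a refined query
    - good: any click that is not bad and lasted at least short_dwell seconds
    - last_longest: the final click, if it ended the journey (no return to the SERP)
    """
    labels = []
    for c in clicks:
        tags = set()
        if c.returned_to_serp and (c.dwell_seconds < short_dwell or c.refined_query):
            tags.add("bad")
        elif c.dwell_seconds >= short_dwell:
            tags.add("good")
        labels.append(tags)
    # The click that ends the session without a return is the one that satisfied the user.
    if clicks and not clicks[-1].returned_to_serp:
        labels[-1].add("last_longest")
    return labels


session = [
    Click("https://a.example/", 4, returned_to_serp=True, refined_query=False),
    Click("https://b.example/", 95, returned_to_serp=True, refined_query=True),
    Click("https://c.example/", 240, returned_to_serp=False, refined_query=False),
]
print(label_session(session))
```

On this sample session, the first two clicks are labeled bad (a quick bounce, then a return followed by a refined query) and the third is labeled both good and last_longest. The labels are Python sets, so the printed order of the third click’s two labels may vary.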

YMYL and the asymmetry rater data introduced

One specific concept from the rater guidelines deserves separate attention because it changed how Google’s algorithm treats entire categories of queries: Your Money or Your Life, abbreviated YMYL. The guidelines define YMYL topics as anything where bad information could meaningfully harm a person’s health, finances, safety, or wellbeing. Medical advice, financial planning, legal questions, and government services are the obvious examples. The implication for raters is direct: a page on a YMYL topic has to clear a much higher Page Quality bar before it can be rated as good, regardless of how well it solves the query.

The reason YMYL matters here is that the algorithm learned the asymmetry. Click signals on YMYL queries are weighted differently than click signals on, say, a query about a recipe or a sports score. A page with strong engagement signals on a financial advice query needs to also clear the trust threshold before the engagement signal is allowed to push it up. A recipe page doesn’t have the same gate. This is one of the clearest examples of rater frameworks shaping ranking system behavior: the raters drew the YMYL line, and over time the algorithm learned to apply different filters on the queries that fall on each side of it.

For a site operating in a YMYL category, this matters practically. Amplifying user signals on a thin financial or medical page is fighting the algorithm twice: once on the engagement signal itself, which has to be authentic enough to register, and again on the trust filter, which can override engagement evidence when the page doesn’t clear the YMYL bar. The same campaign on a non-YMYL site doesn’t face the second gate.
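The asymmetry can be written as a toy scoring rule: on YMYL queries, engagement evidence only counts once the page clears a trust floor; everywhere else it always counts. The function, the 0-to-1 trust score, and the 0.8 floor are hypothetical, a sketch of the two-gate idea described above rather than a known Google formula.

```python
def ranking_score(base, engagement_lift, trust, is_ymyl, ymyl_trust_floor=0.8):
    """Toy score illustrating the YMYL trust gate (all parameters hypothetical).

    base: score from non-click signals
    engagement_lift: boost the page's click evidence would otherwise earn
    trust: 0-1 estimate of how well the page clears the trust bar
    """
    if is_ymyl and trust < ymyl_trust_floor:
        # YMYL query, page below the trust floor: engagement evidence is ignored.
        return base
    # Non-YMYL query, or a YMYL page that clears the floor: engagement counts.
    return base + engagement_lift


# Same page, same engagement, different query categories.
print(ranking_score(1.0, 0.5, trust=0.6, is_ymyl=False))  # 1.5 (recipe-style query)
print(ranking_score(1.0, 0.5, trust=0.6, is_ymyl=True))   # 1.0 (financial query, gated)
print(ranking_score(1.0, 0.5, trust=0.9, is_ymyl=True))   # 1.5 (clears the trust floor)
```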

Why this matters for your user signals strategy

The practical implication of the training loop is one of the more useful frameworks you can carry into any decision about content, optimization, or amplification. It’s this: NavBoost was built to reward the click patterns produced by pages that meet the rater standard. Anything that helps your page produce those patterns naturally is a multiplier. Anything that produces those patterns inauthentically is something the algorithm was specifically tuned to discount.

Take dwell time as an example. A page that earns long dwell time because the content is genuinely useful is exactly what NavBoost was trained to reward; raters would mark such a page as Fully Meets and the algorithm shipped countless updates to detect the click pattern that follows. A page that buys long dwell time through engagement traps (autoplay video, modal popups blocking the back button) produces what looks like dwell time on the surface but doesn’t match the rater-validated pattern in the data. The system was trained to know the difference because raters were trained to know the difference. Page experience signals get filtered through the same lens, which is why dark patterns that game one metric usually fail the broader detection layer.

The same logic applies in the opposite direction. If your page genuinely deserves to rank, meaning a rater would mark it Highly Meets or Fully Meets if they happened to see it, then the click patterns it should be producing are already what NavBoost is trained to reward. The most efficient leverage is making sure those patterns actually reach the threshold the algorithm needs to detect them. A good page that gets 0.4% CTR at position eight has no chance to demonstrate the click quality it deserves. The same page at 4% CTR has the volume needed to register, and at that point the underlying quality (which the algorithm was trained to weight heavily) does the rest.
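The volume argument is simple arithmetic. This sketch assumes 10,000 monthly impressions at position eight and an invented detection threshold of 100 clicks; Google has not published any such threshold, and the function names are hypothetical. CTR is expressed in basis points (1 bp = 0.01%) so the math stays in exact integers.

```python
def expected_clicks(impressions, ctr_bp):
    """Clicks earned from impressions at a CTR given in basis points (1 bp = 0.01%)."""
    return impressions * ctr_bp // 10_000


def impressions_needed(min_clicks, ctr_bp):
    """Impressions required to collect min_clicks at this CTR, rounded up."""
    return (min_clicks * 10_000 + ctr_bp - 1) // ctr_bp


MONTHLY_IMPRESSIONS = 10_000  # hypothetical traffic at position eight
MIN_CLICKS = 100              # invented threshold for illustration only

for label, ctr_bp in [("0.4% CTR", 40), ("4% CTR", 400)]:
    clicks = expected_clicks(MONTHLY_IMPRESSIONS, ctr_bp)
    needed = impressions_needed(MIN_CLICKS, ctr_bp)
    print(f"{label}: {clicks} clicks/month; {needed} impressions to reach {MIN_CLICKS} clicks")
# 0.4% CTR: 40 clicks/month; 25000 impressions to reach 100 clicks
# 4% CTR: 400 clicks/month; 2500 impressions to reach 100 clicks
```

Under these assumptions, the 0.4% page needs two and a half months of traffic just to reach the hypothetical threshold, while the same page at 4% clears it in about a week.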

The expertise dimension nobody talks about: rater handbooks vs your site

Buried in the QRG is a long list of specific examples of what raters should look for to assess expertise on different types of pages. Medical pages should be written or reviewed by qualified medical professionals. Legal pages should reflect actual legal training. Financial advice should be tied to verifiable credentials. Less formal categories have lower bars: a recipe site might just need evidence of actual cooking experience; a hobby site might need demonstration of personal involvement.

What gets overlooked is that the same handbook has guidance for assessing everyday expertise, the kind of knowledge that doesn’t come with a credential but does come from doing something seriously for a long time. A parent reviewing baby products has everyday expertise. A long-time gamer writing about a specific title has everyday expertise. A small business owner explaining how they solved a niche operational problem has everyday expertise. Raters are explicitly told to recognize this kind of authority even when no formal credential is present.

The reason this matters for ranking is the same reason E-E-A-T matters. Pages with everyday expertise produce the click patterns NavBoost rewards: people stay because the specifics ring true, they return because the source proved useful, they search the brand directly because the experience was memorable. The algorithm can’t read your credentials directly, but it can detect the engagement footprint that genuine experience produces. The QRG taught raters to spot it, and rater data taught the algorithm to detect its absence.

Brand signals as the operational shortcut

One signal from the leak deserves attention here because it sits at the intersection of rater frameworks and click behavior: branded search frequency. The rater guidelines tell evaluators to weight reputation heavily when assessing Page Quality. Reputation, from a rater’s seat, looks like consistent positive coverage, recognized presence in the topic, and an absence of significant complaints or warnings. From the algorithm’s seat, reputation has to be approximated through measurable signals, and the most reliable proxy turned out to be branded search volume.

If users search for your brand by name, the algorithm reads that as the operational definition of authority in your category. Brand entity signals compound with click behavior because they pass the rater test (raters would call this a recognized brand) and they produce measurable behavior the algorithm can verify. The two halves work together. Your brand search volume tells the system you’re real; your engagement signals on the ranked pages tell the system the brand earns its position when users actually land there.

This is the connection most “build your brand” advice misses. Building brand doesn’t matter as a vanity metric; it matters because it’s the one signal that’s both rater-validated and algorithm-detectable, which makes it the most efficient bridge between the two systems.

Where this leaves the click-versus-content debate

For nearly two decades, the SEO industry has argued whether clicks or content matter more. The framing was always wrong. Clicks and content aren’t competing inputs; they’re the same input measured at two different stages. Content is what raters can see directly. Clicks are what the algorithm uses to predict what raters would say. A page can earn its position through content quality (raters would approve) and demonstrate that position through click patterns (the algorithm can verify). The two have to align for rankings to be durable.

That alignment is also why misalignment is detectable. A page with high CTR and weak content tends to produce the click pattern (early click, fast return) that NavBoost was trained to read as failure. A page with strong content and zero traffic produces no detectable signal at all, which is its own problem because the algorithm has nothing to confirm against. Identifying pages where the two are out of alignment is one of the highest-leverage analytical moves available in modern SEO.
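One way to run that analysis on your own data is a small audit script. It assumes you have per-page impressions and clicks (for example, exported from Search Console), an estimate of short clicks from your own analytics (Search Console does not report this), and a 1 to 5 quality score from reviewing each page against the QRG yourself. The PageStats type, the flag names, and the thresholds are hypothetical starting points, not values Google uses.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    url: str
    quality: int       # your own rater-style assessment: 1 (Lowest) to 5 (Highest)
    impressions: int
    clicks: int
    short_clicks: int  # clicks followed by a quick return to the results page


def alignment_flags(page, good_quality=4, high_ctr=0.05, pogo_rate=0.5, min_clicks=50):
    """Flag pages whose content quality and click pattern disagree (hypothetical thresholds)."""
    ctr = page.clicks / page.impressions if page.impressions else 0.0
    short_rate = page.short_clicks / page.clicks if page.clicks else 0.0
    flags = []
    if page.quality < good_quality and ctr >= high_ctr and short_rate >= pogo_rate:
        # High CTR, weak content, fast returns: the early-click, fast-return pattern.
        flags.append("clicks without satisfaction")
    if page.quality >= good_quality and page.clicks < min_clicks:
        # Strong content but too few clicks for any signal to register.
        flags.append("quality without visibility")
    return flags


pages = [
    PageStats("/guide", quality=5, impressions=20_000, clicks=30, short_clicks=3),
    PageStats("/listicle", quality=2, impressions=8_000, clicks=640, short_clicks=450),
    PageStats("/faq", quality=4, impressions=5_000, clicks=400, short_clicks=40),
]
for p in pages:
    print(p.url, alignment_flags(p))
# /guide ['quality without visibility']
# /listicle ['clicks without satisfaction']
# /faq []
```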

The implication for User Signal Amplification specifically is direct. USA is most effective on pages that would already pass the rater test if a human saw them. The engagement amplification gives the algorithm the volume of evidence it needs to detect the quality signal it was trained to reward. On pages that wouldn’t pass the rater test, amplification produces a mismatched pattern: surface engagement metrics that don’t match the deeper quality indicators the algorithm checks against. That mismatch is exactly what the system was designed to discount.

How to read the QRG productively

If you’ve never read the actual document, it’s worth doing. Not because you need to memorize it, but because reading it changes how you read every other piece of SEO advice. Most coverage of “what Google wants” is one or two layers removed from the source. The QRG is the source. The patterns Google’s engineers tune the algorithm toward are described, with examples, in the same document.

Focus on three sections when you read it. First, the Page Quality rating examples, which show you what raters call “Highest” versus “Lowest” and why. Second, the Needs Met scale examples, which show you how raters separate “Fully Meets” from “Moderately Meets” for the same query. Third, the YMYL section, because the criteria there apply (with different weights) across the entire algorithm even on non-YMYL topics. Reading these three sections gives you the rubric the algorithm was trained against. Everything else in SEO is downstream of that.

From there, the productive question for any page you work on isn’t “is this optimized” but “would a rater applying this rubric mark this page Highly Meets?” If yes, your job is making sure the click signals reach the threshold needed to confirm it. If no, no amount of user signal work will save the page, because the algorithm was built to detect exactly that kind of inauthenticity.

Frequently asked questions

Q: Do human raters directly affect my Google rankings?

A: No. Rater scores are never applied to your live rankings. Rater data is used to grade proposed changes to Google’s algorithm. When raters consistently prefer results from a new version of the algorithm, that version ships and becomes part of live ranking. So raters shape the system that ranks you; they don’t rank you.

Q: Is E-E-A-T a ranking factor?

A: E-E-A-T is not a single ranking factor in the algorithm. It’s a framework raters use to assess Page Quality. The algorithm doesn’t have an “E-E-A-T score,” but it has dozens of signals that correlate with what raters call high E-E-A-T, because the algorithm was tuned through years of rater feedback to detect the patterns that good E-E-A-T produces. So E-E-A-T matters, just not as a single metric.

Q: If raters say my page is good, will it rank better?

A: Not directly. A specific rater scoring your page doesn’t push it up. But if your page consistently produces the patterns raters reward (intent-matching content, expertise signals, low pogo-stick rates, strong dwell time), the algorithm was trained to detect those patterns and weight them favorably. The path from rater approval to ranking improvement runs through the click and engagement signals NavBoost ingests, not through any direct rating channel.

© 2026 - All rights reserved!