
Geo Pages That Don't Get Penalized: The Content Uniqueness Bar

A geographic landing page can rank for years or get deindexed in a week. The difference is the uniqueness bar. Here's the 500-word, 40-percent n-gram threshold.

rj-murray, Contributor · April 25, 2026 · 12 min read

[Cover image: Geo page uniqueness threshold]

title: "Geo Pages That Don't Get Penalized: The Content Uniqueness Bar" slug: geo-pages-that-dont-get-penalized description: "A geographic landing page can rank for years or get deindexed in a week. The difference is the uniqueness bar. Here's the 500-word, 40-percent n-gram threshold." pillar: pseo-geo-aeo author: rj-murray publishedAt: "2026-04-25T00:00:00Z" tags: ["geo-pages", "local-seo", "pseo", "pseo-geo-aeo"] coverImage: /posts/geo-pages-that-dont-get-penalized/cover.png coverAlt: "Geo page uniqueness threshold" featured: false faq:

  • q: "What is the minimum word count for a geo page that won't be flagged as a doorway page?" a: "We hold every geo page to a 500-word floor of unique body copy, not counting nav, footer, or shared CTAs. That number is not magic, it is the point at which there is enough room for genuine local detail to differentiate one page from another. Pages under 300 words almost never clear the n-gram check, and pages between 300 and 500 only clear it if the writer is unusually disciplined."
  • q: "How do you measure n-gram uniqueness in CI?" a: "We tokenize every geo page, generate trigram and 4-gram sets, and compute Jaccard distance between each pair. The minimum threshold is 40 percent differentiation, meaning at least 40 percent of the n-grams on page A do not appear on page B for any pair in the set. The check runs as part of the build and fails the build if the threshold is breached. The implementation is in our pSEO engine package."
  • q: "When are geo pages the wrong move?" a: "Single-location businesses with no multi-town service radius, national-only SaaS with no in-person delivery, and any business whose service is the same in every market. Geo pages exist to capture intent that is meaningfully local. If the offering does not vary by location and there is no local proof to point to, geo pages will read as filler and get treated as filler."
  • q: "Will Google penalize a small set of well-written geo pages?" a: "No. Google's spam policy targets pages that are templated, near-duplicate, and built primarily to rank rather than to serve a local user. A handful of geo pages with real local detail, real schema, and a real underlying data shape are not what the policy is aimed at. The risk scales with thinness, not with count."
  • q: "How long does it take to add geo pages to an existing site?" a: "We run a 14-day playbook: three days of data collection per location, three days of writer drafting against a structured brief, two days of CI integration and schema, two days of internal-link wiring, and four days of indexing and Search Console monitoring. A site with 8 to 12 locations clears the playbook end-to-end in two weeks."

A geo page can rank for five years or get deindexed in a week. The difference is not the template, the design, or the schema. The difference is whether the page clears a measurable uniqueness bar before it ships.

We hold every geo page to a 500-word floor of unique body copy and a 40-percent n-gram differentiation threshold against every other geo page in the set. The check runs in CI. Pages that fail do not ship. This post covers the policy text we hold ourselves to, the CI rule we enforce, two real client builds that cleared the bar, the data shape behind a real geo page, the cases where geo pages are the wrong call, and a 14-day playbook for adding a geo set to an existing site.

tl;dr

Google's spam policy treats templated, near-duplicate location pages as doorway pages and removes them. The operational rule we run on every client build is 500 words of genuinely local copy per page and 40 percent n-gram differentiation across the set, both checked in CI. Karpentor's 10 town pages and Burris and Sons' 8 neighbourhood pages cleared the bar through real local detail, not template fill. A pSEO page without 500 unique words and a real underlying data shape is a doorway page waiting to get deindexed.

What Google means by "doorway pages", in the actual policy text

The relevant policy is Google's spam policies for Google web search, specifically the doorway-pages section. The policy defines doorway pages as sites or pages created to rank for similar search queries, where each page is essentially a templated version of the others, funneling the user to a single destination.

The policy lists four characteristic patterns. Multiple pages targeting variations of the same query. Pages that funnel users to a single destination rather than serving the user themselves. Multiple domain names targeting specific regions or cities that funnel users to one page. Substantially similar pages that are closer to search results than a clearly defined hierarchy.

The phrasing matters. Google does not say "many location pages are bad." It says templated, near-duplicate, low-value-to-the-user location pages are bad. The same policy is consistent with a site having 50 town pages if each one earns its place. The line is drawn at uniqueness and user value, not count.

The penalty for crossing the line is not subtle. Affected pages are removed from the index. Whole sections of a site can be demoted. The fix is rarely "rewrite the offending pages." The fix is usually "delete the set and start again with a real data shape."

The 500-word, 40-percent n-gram uniqueness bar we run in CI

Every geo page we ship has to clear two checks before it goes to production. The first is a body-copy word count of at least 500, excluding nav, footer, shared CTAs, and structured data. The second is a 40-percent n-gram differentiation threshold against every other geo page in the set.

The n-gram check is a Jaccard distance computed on trigram and 4-gram token sets. For every pair of geo pages, the build computes one minus the intersection-over-union of their n-gram sets, which is the share of the pair's combined n-gram vocabulary that the two pages do not have in common. A page passes only if that distance is at least 40 percent against every other page in the set. The check runs against the published HTML, after MDX compilation, so the score reflects what Google's crawler will actually see.
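
A minimal sketch of that pairwise check in TypeScript, assuming the body copy has already been extracted from the compiled HTML. The module name, function names, and tokenization rules are illustrative, not the actual pSEO engine API.

```ts
// check-geo-set.ts (hypothetical module name) — pairwise n-gram differentiation sketch.

function tokenize(bodyText: string): string[] {
  return bodyText
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

function ngrams(tokens: string[], n: number): Set<string> {
  const out = new Set<string>();
  for (let i = 0; i + n <= tokens.length; i++) {
    out.add(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

// Jaccard distance: 1 - |A ∩ B| / |A ∪ B|, over combined trigram and 4-gram sets.
function jaccardDistance(a: Set<string>, b: Set<string>): number {
  let intersection = 0;
  for (const gram of a) if (b.has(gram)) intersection++;
  const union = a.size + b.size - intersection;
  return union === 0 ? 1 : 1 - intersection / union;
}

// Returns a list of failing pairs; an empty list means the set clears the 0.40 floor.
export function checkGeoSet(
  pages: { slug: string; body: string }[],
  threshold = 0.4
): string[] {
  const grams = pages.map((p) => {
    const tokens = tokenize(p.body);
    return new Set([...ngrams(tokens, 3), ...ngrams(tokens, 4)]);
  });
  const failures: string[] = [];
  for (let i = 0; i < pages.length; i++) {
    for (let j = i + 1; j < pages.length; j++) {
      const d = jaccardDistance(grams[i], grams[j]);
      if (d < threshold) {
        failures.push(`${pages[i].slug} vs ${pages[j].slug}: ${(d * 100).toFixed(1)}% differentiation`);
      }
    }
  }
  return failures;
}
```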

Pages that fail are not patched at the eleventh hour. They are sent back to the writer with the offending n-gram overlaps highlighted. Most of the time the fix is the same: the writer leaned on shared template phrasing for the middle paragraphs and needs to replace them with location-specific facts. Service hours, lead times, named technicians, named projects, named streets and intersections, local code requirements, weather and seasonal pressure on the trade, named suppliers in the area.

We run the check in CI for the same reason we run TypeScript in CI. Quality bars that depend on writer discipline alone will fail eventually, and at scale the failure looks like a deindex event. Our pSEO engine ships this check as part of the default build configuration. Every client gets it whether they ask for it or not.
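
To make the gate concrete, here is one way such a check could be wired into a build step. The script name, output directory, and crude body extraction are assumptions for illustration; the shipped engine's interface will differ.

```ts
// check-geo-pages.ts — hypothetical build gate, run after the MDX/HTML build (e.g. `npx tsx check-geo-pages.ts`).
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";
import { checkGeoSet } from "./check-geo-set"; // the pairwise check sketched above

const GEO_DIR = "dist/geo";      // assumed output directory for compiled geo pages
const MIN_WORDS = 500;           // body-copy floor
const MIN_DIFFERENTIATION = 0.4; // pairwise n-gram floor

const pages = readdirSync(GEO_DIR)
  .filter((file) => file.endsWith(".html"))
  .map((file) => {
    const html = readFileSync(join(GEO_DIR, file), "utf8");
    // Crude extraction for illustration: a real gate would exclude nav, footer, shared CTAs, and structured data.
    const body = html
      .replace(/<(script|style)[\s\S]*?<\/\1>/g, " ")
      .replace(/<[^>]+>/g, " ");
    return { slug: file, body };
  });

const tooThin = pages.filter(
  (p) => p.body.split(/\s+/).filter(Boolean).length < MIN_WORDS
);
const overlaps = checkGeoSet(pages, MIN_DIFFERENTIATION);

if (tooThin.length > 0 || overlaps.length > 0) {
  console.error("Geo page gate failed:");
  tooThin.forEach((p) => console.error(`  under ${MIN_WORDS} words: ${p.slug}`));
  overlaps.forEach((o) => console.error(`  ${o}`));
  process.exit(1); // fail the CI build
}
```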

The 40-percent threshold is calibrated, not arbitrary. Below 25 percent the pages read as templates with the city name swapped. Between 25 and 35 percent they pass a casual human read but fail Google's near-duplicate clustering on technical inspection. At 40 percent and above the pages read as independently written and survive the Rich Results test plus a manual sample audit at scale.

How Karpentor's 10 town pages cleared the bar

Karpentor has run residential renovations in southern Ontario for 25 years. Decks, fences, interior renos, three distinct service lines. We shipped a 36-page rebuild in 24 days with separate service trees per vertical, 10 geo pages for the towns they actually serve, and 8 launch blog posts on scope, permits, and material choice.

The 10 town pages were the part of the build most at risk. The Karpentor team had originally drafted them in a single afternoon by swapping the town name in a single template, which is the textbook doorway pattern. We threw that draft out and ran a structured intake instead.

For each town we collected: a list of completed projects in or adjacent to that municipality with start and end dates, the local building permit office address and turnaround time, the deck and fence regulations specific to that municipality, the named lumber and hardware suppliers Karpentor actually orders from for jobs in that town, the lead carpenter who runs that geography, and three to five photos of completed work in or near the town.

The writer turned that intake into 600-to-800-word pages, each one written from the local data. The Oakville page leads with the Town of Oakville Building Services permit window because that is the rate-limiting step on most jobs there. The Burlington page leads with Conservation Halton setbacks because waterfront properties drive the question pattern in that market. Neither page contains the other page's lead paragraph in any recognizable form.

The set passed the n-gram check on the first CI run. The lowest pairwise differentiation across the 10 pages was 47 percent; the highest was 71 percent. Six months after launch, all 10 pages were indexed, all 10 were earning impressions in Search Console, and three held top-three positions for their primary geo-modified query.

How Burris and Sons' 8 neighbourhood pages preserved heritage detail

Burris and Sons has run HVAC in Chicago since 1917. We rebuilt their 12-page static site into a 30-page Next.js build in 21 days, with eight neighbourhood geo pages, each written with genuinely distinct copy. We kept the family photography rather than re-shooting. The heritage look was an asset, not a liability.

The Burris team's advantage was that they have a hundred years of work in a defined service area. Each of the eight neighbourhood pages drew on installation history that no template could fake. The Lakeview page references the prevalence of two-flats with retrofit ductwork and the specific pre-war boiler models the team has converted. The Pilsen page references the heat-island effect on the brick row buildings and the sizing implications. The Hyde Park page references the University-area condo associations Burris has standing service contracts with.

We preserved the original family photography that had been on the heritage site for two decades. The current owner's grandfather appears in three of the photos. The photos went on the about page and were referenced from the neighbourhood pages where he had personally done installations. That is not template fill. That is local detail with a hundred-year provenance attached.

The eight-page set cleared the n-gram check at a minimum 52 percent differentiation. Three of the eight pages outranked the existing national HVAC aggregators for "hvac repair Lakeview" and similar queries within four months of launch.

The data shape behind a real geo page

A geo page that survives is not a page, it is a data shape rendered as a page. The shape we use as a baseline includes a LocalBusiness schema block with serviceArea polygon, hours of operation, payment methods accepted, and a sameAs array linking the canonical Google Business Profile and any local directory citations.
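
For illustration, here is what such a block could look like for a hypothetical neighbourhood page, expressed as the TypeScript object a Next.js page would serialize into a JSON-LD script tag. Every value is a placeholder, not client data.

```ts
// Illustrative LocalBusiness block for a hypothetical geo page; all values are placeholders.
const localBusinessSchema = {
  "@context": "https://schema.org",
  "@type": "HVACBusiness",
  name: "Example Heating & Cooling",
  url: "https://example.com/locations/lakeview",
  areaServed: {
    "@type": "GeoShape",
    // Service-area polygon as space-delimited lat/lon pairs.
    polygon: "41.954 -87.668 41.954 -87.636 41.932 -87.636 41.932 -87.668",
  },
  openingHoursSpecification: [
    {
      "@type": "OpeningHoursSpecification",
      dayOfWeek: ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
      opens: "07:00",
      closes: "18:00",
    },
  ],
  paymentAccepted: "Cash, Credit Card, Invoice",
  sameAs: [
    "https://www.google.com/maps/place/example-business-profile", // canonical Google Business Profile
    "https://www.yelp.com/biz/example-heating-cooling-chicago",   // local directory citation
  ],
};
```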

On top of that we layer location-specific facts pulled from the intake. Local response time. Named technicians for the geography. A list of completed work for the town with year, scope, and a one-line outcome. Local pricing if it varies. Local certifications, permits, or municipal registrations. Three to seven internal links to related service pages, case studies, and parent-area pages. Two to four external citations to authoritative local sources, the municipal building department or chamber being the typical pair.
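
Typed out, the whole shape might look something like the interface below. The field names are assumptions for illustration, not the engine's actual content model.

```ts
// Illustrative per-location data shape behind a geo page.
interface GeoPageData {
  slug: string;                      // e.g. "lakeview" or "oakville"
  municipality: string;
  schema: Record<string, unknown>;   // the LocalBusiness JSON-LD block above
  responseTime: string;              // local response time, e.g. "same-day within the service radius"
  technicians: string[];             // named team for the geography
  completedWork: { year: number; scope: string; outcome: string }[];
  localPricing?: string;             // only present if pricing varies by location
  permitsAndRegistrations: string[]; // local certifications, permits, municipal registrations
  internalLinks: string[];           // 3 to 7 links to service pages, case studies, parent-area pages
  externalCitations: string[];       // 2 to 4 authoritative local sources
}
```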

The schema and the body copy reinforce each other. The crawler reads the LocalBusiness block and gets a structured signal. The user reads the body copy and gets a human one. When the two agree, the page indexes fast and stays indexed. When the two disagree, the page either gets demoted or treated as low-quality even when it ranks. We cover the broader role of structured data in AEO and answer engine ranking, and in the llms.txt file post.

The data shape is the part of geo-page work that most agencies skip. It is also the part that compounds over time. A page with a real LocalBusiness block and real local facts can be updated in place as facts change. A page that is just templated prose has nothing underneath it to update.

When geo pages are the wrong move

Geo pages are not the right call for every business. National SaaS with no in-person component and no regional pricing should not run a geo set. Single-location service businesses with a clear primary market should invest in the one location page they have, not split attention across invented variants. Pure ecommerce with no local fulfillment should treat product pages as the unit of content work, not geographies.

The test we use on intake is whether the offering meaningfully varies by location. Variation can be in lead time, pricing, named team, regulatory environment, language preference, or service catalog. If at least one of those varies and there are at least five locations to write about, geo pages are usually a positive ROI move. If none of those vary, geo pages are filler and Google will treat them as such.

A common mistake is for a single-location business to spin up geo pages for every neighbourhood in their city as a pSEO play. The pages get indexed for a few weeks, then the Helpful Content classifier catches up and the pages are demoted as a group. We have audited this pattern on three inbound prospects in the last quarter. In all three cases the recommended move was to delete the geo set and consolidate the link equity onto the primary location page. We covered the broader audit pattern in the mid-market SEO reporting framework and the budget conversation in why CMOs should kill paid search budget.

If the business does have a real multi-location story but no content team to support it, geo pages are still not the right move yet. Build the data collection process first, then the pages. We discussed the broader migration sequencing in the WordPress to Next.js migration path post and in why mid-market companies keep getting stuck on WordPress.

A 14-day playbook for adding geo pages to an existing site

The playbook below assumes a site with 8 to 12 target locations and an existing CMS. It compresses to two weeks because most of the work is data collection and writing, not engineering.

Days 1 to 3, intake. The account lead runs a structured intake call per location, or one consolidated call per region. The intake template captures: completed projects with dates, named team for the geography, local regulatory or permit detail, local supplier names, local photo assets, and the primary query pattern the location should rank for.

Days 4 to 6, drafting. A senior writer drafts each page to a 500-to-800-word target against the intake brief. Drafts are written in MDX, committed to a feature branch, and pushed for the n-gram check to run.

Days 7 and 8, CI integration and schema. The pSEO engine runs the n-gram check, the word-count gate, and a LocalBusiness schema validation against schema.org. Pages that fail any check are sent back to the writer with the failing scores attached. Pages that pass go through a second human edit pass for tone and voice.

Days 9 and 10, internal link wiring. Each geo page gets three to seven internal links to related service pages, case studies, and parent-area pages. Each service page gets a "service areas" block linking to the geo set. The build passes a final link audit that confirms no orphans and no broken links.
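
A minimal sketch of what that final audit could look like over the built HTML, assuming directory-style output under dist/ and geo pages under a geo/ route. The paths, thresholds, and route handling are assumptions, not the actual build tooling.

```ts
// link-audit.ts — minimal orphan and broken-link audit over built HTML.
import { existsSync, readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

const DIST = "dist";
const MIN_OUTBOUND = 3; // each geo page needs at least three internal links out

const htmlFiles = readdirSync(DIST, { recursive: true })
  .map(String)
  .filter((f) => f.endsWith("index.html"));

const inbound = new Map<string, number>();
const problems: string[] = [];

for (const file of htmlFiles) {
  const html = readFileSync(join(DIST, file), "utf8");
  // Root-relative hrefs only; external links are out of scope for this audit.
  const internal = [...html.matchAll(/href="(\/[^"#?]*)"/g)].map((m) => m[1]);

  for (const href of internal) {
    const route = href.replace(/\/+$/, "") || "/";
    const target =
      route === "/" ? join(DIST, "index.html") : join(DIST, route.slice(1), "index.html");
    if (!existsSync(target)) problems.push(`${file}: broken internal link ${href}`);
    inbound.set(route, (inbound.get(route) ?? 0) + 1);
  }

  if (file.startsWith("geo/") && internal.length < MIN_OUTBOUND) {
    problems.push(`${file}: fewer than ${MIN_OUTBOUND} internal links`);
  }
}

// A geo page with no inbound internal links is an orphan.
for (const file of htmlFiles.filter((f) => f.startsWith("geo/"))) {
  const route = "/" + file.replace(/\/?index\.html$/, "");
  if (!inbound.has(route)) problems.push(`${route}: orphan, no inbound internal links`);
}

if (problems.length > 0) {
  problems.forEach((p) => console.error(p));
  process.exit(1); // fail the link audit
}
```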

Days 11 to 14, indexing and monitoring. Pages are pushed to production, submitted to Search Console, and monitored daily. Indexing latency on a well-built set is 24 to 72 hours. Pages that are not indexed within five days are flagged for a structural review. We covered the broader 90-day rollout shape in the 90-day organic growth plan, and the speed proofs in real Lighthouse scores before and after 6 mid-market rebuilds and in Core Web Vitals changed in 2025. The 48-hour rebuild motion that anchors all of this is documented in the 48-hour before-after demo post.

The 14-day shape is the right shape because it forces the data work to happen before the writing. Most failed geo-page sets we have audited were written in three days and shipped without intake, which is exactly the path Google's policy was designed to catch.

Closing

A geo page is not a thing you write. It is a data shape you render. The shape clears the policy line if it has 500 words of genuinely local copy, 40 percent n-gram differentiation against the rest of the set, a real LocalBusiness schema block, and a real underlying intake to support both.

A pSEO page without 500 unique words and a real underlying data shape is a doorway page waiting to get deindexed. We hold ourselves to that line on every client build, we enforce it in CI, and we ship the engine that runs the check on every site we deliver.

If your existing geo set is more than three pages and was written in less than a week, the audit is worth doing this quarter. The penalty for being wrong is a deindex event, and the recovery from a deindex event is rebuilding the set from scratch under a colder Search Console relationship. The cheaper move is the 14-day playbook above, run once, on a set that earns its place.

Frequently asked

What is the minimum word count for a geo page that won't be flagged as a doorway page?
We hold every geo page to a 500-word floor of unique body copy, not counting nav, footer, or shared CTAs. That number is not magic, it is the point at which there is enough room for genuine local detail to differentiate one page from another. Pages under 300 words almost never clear the n-gram check, and pages between 300 and 500 only clear it if the writer is unusually disciplined.
How do you measure n-gram uniqueness in CI?
We tokenize every geo page, generate trigram and 4-gram sets, and compute Jaccard distance between each pair. The minimum threshold is 40 percent differentiation, meaning at least 40 percent of the n-grams on page A do not appear on page B for any pair in the set. The check runs as part of the build and fails the build if the threshold is breached. The implementation is in our pSEO engine package.
When are geo pages the wrong move?
Single-location businesses with no multi-town service radius, national-only SaaS with no in-person delivery, and any business whose service is the same in every market. Geo pages exist to capture intent that is meaningfully local. If the offering does not vary by location and there is no local proof to point to, geo pages will read as filler and get treated as filler.
Will Google penalize a small set of well-written geo pages?
No. Google's spam policy targets pages that are templated, near-duplicate, and built primarily to rank rather than to serve a local user. A handful of geo pages with real local detail, real schema, and a real underlying data shape are not what the policy is aimed at. The risk scales with thinness, not with count.
How long does it take to add geo pages to an existing site?
We run a 14-day playbook: three days of data collection per location, three days of writer drafting against a structured brief, two days of CI integration and schema, two days of internal-link wiring, and four days of indexing and Search Console monitoring. A site with 8 to 12 locations clears the playbook end-to-end in two weeks.
