How VisibilityPulse Scores AI Visibility
Signals v1.2 — Updated June 2026. This page documents what we measure, why we measure it, how the composite score is calculated, and — critically — what we cannot measure.
1. What We Measure
VisibilityPulse audits the technical prerequisites that determine whether AI engines can crawl, parse, trust, and cite your website. Every signal we check is observable from a publicly accessible page fetch — no proprietary data sources, no black boxes.
We do not claim to predict actual citation rates. Citation decisions are made by the AI engines themselves, based on proprietary ranking algorithms, query intent matching, and real-time retrieval logic, none of which is publicly disclosed or observable from the outside.
⚡ The four major AI engines are architecturally distinct. ChatGPT Search uses OAI-SearchBot + Bing's retrieval infrastructure. Perplexity runs an independent crawler. Gemini uses Google-Extended + the standard Google index. Claude uses ClaudeBot. A single composite score is a technical readiness proxy — not a cross-platform citation prediction. See Section 4.
2. Why These 8 Signals
Each signal was selected because it represents a technical factor with documented relevance to AI engine crawlability, content extraction, or entity recognition. Five signals contribute to the composite score. Three are reported as qualitative flags.
Weighted Signals — Contribute to Composite Score
AI Crawler Access
We check your robots.txt for the following crawlers. Each engine has a distinct crawl architecture:
- OAI-SearchBot (25 pts): the ChatGPT Search citation crawler. It indexes pages for real-time ChatGPT Search answers and is the primary signal for ChatGPT citation eligibility.
- Bingbot (15 pts): Bing's crawler. ChatGPT Search retrieval is built on Bing's index (OpenAI / Microsoft partnership), so blocking Bingbot significantly reduces ChatGPT citation eligibility regardless of OAI-SearchBot status.
- PerplexityBot (20 pts): Perplexity operates a fully independent crawler and index, separate from both Bing and Google. Blocking PerplexityBot removes Perplexity citation eligibility entirely.
- ClaudeBot / anthropic-ai (20 pts): Anthropic's crawler for Claude.
- Google-Extended (20 pts): Google's robots.txt product token that controls whether content crawled by Google may be used for Gemini. It is a control token rather than a separate crawler with its own user agent, and it is allowed by default. Blocking it does not affect regular Google Search.
- GPTBot (not scored): OpenAI's training data crawler. It affects future model training only and has no direct effect on ChatGPT Search citations. We report its status but do not score it. Publishers may block GPTBot (opt out of training) while allowing OAI-SearchBot (stay citable).
Sources: platform.openai.com/docs/bots, openai.com/searchbot, Anthropic ClaudeBot docs, docs.perplexity.ai, Google Search Central
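As a rough illustration of the crawler check, the sketch below parses a robots.txt body with Python's standard urllib.robotparser and sums the point values listed above. The function name score_crawler_access, the decision to test access to the root path only, and awarding the ClaudeBot / anthropic-ai points when either agent is allowed are assumptions made for this example, not a description of the production scorer.

```python
from urllib.robotparser import RobotFileParser

# Point values taken from the list above. GPTBot is reported but not scored.
CRAWLER_POINTS = {
    "OAI-SearchBot": 25,
    "Bingbot": 15,
    "PerplexityBot": 20,
    "ClaudeBot": 20,  # assumption: shared with the anthropic-ai agent
    "Google-Extended": 20,
}


def score_crawler_access(robots_txt: str, path: str = "/") -> dict:
    """Return the crawler-access points earned by a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(agent: str) -> bool:
        return parser.can_fetch(agent, path)

    score = 0
    for agent, points in CRAWLER_POINTS.items():
        ok = allowed(agent)
        if agent == "ClaudeBot":
            # Assumption: either Anthropic agent being allowed earns the points.
            ok = ok or allowed("anthropic-ai")
        if ok:
            score += points
    return {"score": score, "gptbot_allowed": allowed("GPTBot")}


# A publisher that opts out of training and blocks Perplexity.
EXAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

result = score_crawler_access(EXAMPLE_ROBOTS)
print(result)
```

In this example the blocked GPTBot costs nothing, while the blocked PerplexityBot forfeits its 20 points.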
JSON-LD Schema Markup
We detect and score the presence of: JSON-LD blocks (any schema), Organization schema, FAQPage schema, Article/BlogPosting/WebPage schema, and BreadcrumbList schema.
Important clarification on FAQPage schema (as of June 2026): Google fully deprecated FAQ rich results on May 7, 2026. The FAQPage schema type remains valid and parseable, but no longer generates visual SERP features. Its remaining value for AI visibility is:
- Enforces clear Q&A content structure that AI extraction systems can parse
- Sends signals to Google's Knowledge Graph that strengthen entity associations
LLMs and JSON-LD: LLMs tokenize JSON-LD as raw text alongside visible page content (Williams-Cook, February 2026 controlled study). The benefit of schema markup comes from the content structure it enforces and Knowledge Graph associations — not from LLMs semantically parsing the markup itself.
Sources: schema.org/FAQPage, Google FAQ deprecation notice (May 2026), Google: Optimizing for generative AI (May 2026), Williams-Cook (Feb 2026): LLMs tokenize JSON-LD as raw text.
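The presence checks described in this section could be sketched as follows, using only Python's standard html.parser and json modules. The names JsonLdCollector, schema_types, and detect_schema, and the keys of the returned dictionary, are hypothetical. The sketch only detects schema types and does not reproduce how the signal is scored.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than fail the audit


def schema_types(node) -> set:
    """Recursively gather every @type value, including nested and @graph nodes."""
    found = set()
    if isinstance(node, list):
        for item in node:
            found |= schema_types(item)
    elif isinstance(node, dict):
        declared = node.get("@type")
        if isinstance(declared, str):
            found.add(declared)
        elif isinstance(declared, list):
            found.update(t for t in declared if isinstance(t, str))
        for value in node.values():
            found |= schema_types(value)
    return found


def detect_schema(html: str) -> dict:
    collector = JsonLdCollector()
    collector.feed(html)
    types = schema_types(collector.blocks)
    return {
        "json_ld": bool(collector.blocks),
        "organization": "Organization" in types,
        "faq_page": "FAQPage" in types,
        "article": bool(types & {"Article", "BlogPosting", "WebPage"}),
        "breadcrumb_list": "BreadcrumbList" in types,
    }


SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization", "name": "Example"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "Q?",
                 "acceptedAnswer": {"@type": "Answer", "text": "A."}}]}
</script>
</head><body></body></html>
"""

print(detect_schema(SAMPLE_HTML))
```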
Entity Authority
We check: Wikipedia presence (Wikipedia API), Wikidata entity search (Wikidata API), sameAs schema links to authoritative domains, social profile links in HTML, and Organization/Person schema quality.
In Ahrefs' analysis of 75,000 brands (May 2025), brand presence was the strongest predictor of AI Overview visibility: branded web mentions showed a 0.664 Spearman correlation with AI Overview visibility, versus 0.218 for backlinks. Entity Authority attempts to proxy this off-site signal using observable on-page and schema data.
ℹ️ On-page entity signals are a proxy. The strongest actual predictor — off-site brand mention volume across Reddit, YouTube, Quora, and industry publications — is not checkable from a page audit. See Section 5.
Source: Ahrefs — AI Overview Brand Visibility Factors (75K Brands), May 2025. Wikipedia API, Wikidata Search API.
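For readers curious what a Wikidata entity lookup involves, here is a minimal sketch built around the public wbsearchentities action of the Wikidata API. The helper names and the rule that a case-insensitive label match counts as "confirmed" are assumptions for illustration. The matching rule used in production is not documented on this page.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wikidata_search_url(name: str, language: str = "en") -> str:
    """Build a wbsearchentities request URL for a brand or person name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def has_wikidata_entity(name: str, response: dict) -> bool:
    """Assumption: an entity is confirmed when any result label matches the name."""
    wanted = name.strip().lower()
    return any(
        (item.get("label") or "").strip().lower() == wanted
        for item in response.get("search", [])
    )


# Hand-written response fragment in the shape the API returns.
sample_response = {"search": [{"id": "Q0", "label": "Acme Corp"}]}

print(wikidata_search_url("Acme Corp"))
print(has_wikidata_entity("acme corp", sample_response))
```

The request itself is left out so the sketch runs offline. Fetching the URL and decoding the JSON body would supply the response dictionary.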
Technical Health
We check: HTTPS protocol, server response time, HTML content size, single H1 tag structure, and internal link count. These are foundational prerequisites that determine whether any crawler — AI or traditional search — can access and parse your content.
Google's official May 2026 guidance confirms AI Overviews and AI Mode use the same core ranking and quality systems as regular Search. Technical hygiene that matters for Google Search matters equally for Google's AI features.
Source: Google Search Central — Optimizing for generative AI (May 15, 2026)
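Three of the structural checks named above (HTTPS, a single H1, and internal link count) can be approximated from the fetched HTML alone. The sketch below is one possible reading. The class and function names are hypothetical, fragment-only links are skipped by assumption, and response time and content-size thresholds are left out because this page does not state them.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class StructureCounter(HTMLParser):
    """Counts H1 tags and links that resolve to the audited page's host."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.host = urlparse(page_url).hostname
        self.h1_count = 0
        self.internal_links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_count += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            # Assumption: fragment-only links (#top) are not counted.
            if not href or href.startswith("#"):
                return
            resolved = urljoin(self.page_url, href)
            if urlparse(resolved).hostname == self.host:
                self.internal_links += 1


def technical_flags(page_url: str, html: str) -> dict:
    counter = StructureCounter(page_url)
    counter.feed(html)
    return {
        "https": urlparse(page_url).scheme == "https",
        "single_h1": counter.h1_count == 1,
        "internal_links": counter.internal_links,
    }


SAMPLE_PAGE = (
    "<h1>Title</h1>"
    '<a href="/about">About</a>'
    '<a href="pricing">Pricing</a>'
    '<a href="https://other.example.org/x">External</a>'
    '<a href="#top">Top</a>'
)

print(technical_flags("https://example.com/", SAMPLE_PAGE))
```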
Content Freshness
We check: sitemap lastmod dates, schema dateModified and datePublished properties, article:modified_time meta tags, footer copyright year, and presence of active content sections (/blog, /news, /insights).
Rationale: AI engines serving real-time answers prioritise recent, accurate content. Stale freshness signals reduce citation probability for time-sensitive queries.
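One way to combine the date sources listed above is to normalise each to UTC and keep the most recent. The sketch below assumes the values arrive as ISO 8601 strings keyed by source. The dictionary keys and function names are hypothetical, and how recency maps to points is not specified here.

```python
from datetime import datetime, timezone


def parse_date(value: str):
    """Parse an ISO 8601 date or datetime, returning None if it is not parseable."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"  # older Python versions reject a trailing Z
    try:
        parsed = datetime.fromisoformat(text)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: dates without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def freshest_date(candidates: dict):
    """Return the most recent parseable date among the collected signals."""
    dates = [parse_date(v) for v in candidates.values() if v]
    dates = [d for d in dates if d is not None]
    return max(dates) if dates else None


signals = {
    "sitemap_lastmod": "2026-05-01",
    "schema_dateModified": "2026-06-10T08:30:00Z",
    "article_modified_time": "not a date",
}

print(freshest_date(signals))
```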
Qualitative Flags — Reported But Not Scored in Composite
These three signals are checked and displayed but do not affect the composite score. They are directional quality indicators, not precision measurements. Fixing them improves citation eligibility but the relationship is not directly quantifiable from observable signals.
Checks: OAI-SearchBot/sitemap/page status/title/meta description quality. Pass/fail flags only.
Checks: hreflang tags, Open Graph completeness, Twitter Card, canonical URL, HTML lang attribute.
Checks: Person/author schema, Organization social proof, Review/AggregateRating schema, Q&A content patterns, authoritative outbound links (.edu/.gov/.org).
3. How the Score Is Calculated
The composite score (0–100) is a weighted sum of the five scored signals, calculated server-side on each audit run:
Override rules prevent unfair penalisation of globally recognised entities:
- Wikipedia confirmed + Wikidata confirmed → minimum composite of 65
- Wikipedia confirmed + social presence → minimum composite of 55
- Either Wikipedia or Wikidata confirmed → minimum composite of 45
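Read as score floors, the override rules above can be sketched in a few lines. This assumes the rules are evaluated from most to least specific and that the final composite is the larger of the raw weighted sum and the applicable floor. The function and parameter names are hypothetical.

```python
def apply_entity_floor(raw_score: float, wikipedia: bool, wikidata: bool, social: bool) -> float:
    """Raise the composite to the minimum guaranteed by confirmed entity signals."""
    if wikipedia and wikidata:
        floor = 65
    elif wikipedia and social:
        floor = 55
    elif wikipedia or wikidata:
        floor = 45
    else:
        floor = 0
    # A floor never lowers a score that is already above it.
    return max(raw_score, floor)


print(apply_entity_floor(40, wikipedia=True, wikidata=True, social=False))
print(apply_entity_floor(80, wikipedia=True, wikidata=True, social=True))
```

Under these assumptions, a site with a raw composite of 40 and both Wikipedia and Wikidata confirmed would be reported at 65, while a site already at 80 is unchanged.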
ℹ️ These weights reflect our best current understanding of relative signal importance based on publicly available research. They are not derived from a proprietary AI citation dataset and will be revised as peer-reviewed research matures. All changes are logged in Section 6.
4. Known Limitations
We publish these limitations explicitly because honest framing is more useful than false precision.
⚠️ The four AI engines are architecturally different
ChatGPT Search uses OAI-SearchBot + Bing's retrieval infrastructure. Perplexity runs an entirely independent crawler and index. Google Gemini uses Google-Extended + the standard Google Search index. Claude uses ClaudeBot. A single 0–100 score measures technical readiness against all four systems, but optimising for one does not guarantee visibility on the others.
⚠️ Cross-platform citation overlap is approximately 1.4%
Research tracking 19,556 identical queries across ChatGPT, Perplexity, Claude, and Gemini found a Jaccard similarity of 0.014 (1.4%) between cited URLs (Lee, 2026). A page cited by ChatGPT tells you almost nothing about whether Perplexity will cite it for the same query. Our composite score measures technical prerequisites — not cross-platform citation correlation. Four individually valid signals can still produce a composite that does not predict cross-platform behavior, because the systems they feed do not agree with each other on citation choices.
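For context on the metric, Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below computes it for two made-up sets of cited URLs. The URLs are placeholders, and how the cited study aggregated across queries and engines is not described here, so this shows only the pairwise definition.

```python
def jaccard(a, b) -> float:
    """Jaccard similarity: |A intersect B| / |A union B|, defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical citations returned by two engines for the same query.
engine_a = {"site1.example/a", "site2.example/b", "site3.example/c", "site4.example/d"}
engine_b = {"site4.example/d", "site5.example/e", "site6.example/f", "site7.example/g"}

print(round(jaccard(engine_a, engine_b), 4))
```

Here one shared URL out of seven distinct URLs gives roughly 0.14, which is still about ten times the 0.014 overlap reported in the study.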
⚠️ LLMs do not semantically parse JSON-LD markup
LLMs tokenize JSON-LD structured data as raw text alongside visible page content (Williams-Cook, February 2026 controlled study). The citation benefit of FAQPage schema comes from enforcing a clear Q&A content structure that AI extraction can parse, and from Google Knowledge Graph entity signals — not from LLMs reading the JSON-LD tag as semantic markup. "FAQPage schema is the #1 predictor of AI citations" is not a documented finding, and we do not make that claim.
⚠️ We cannot observe actual citation rates
The citation decisions made by ChatGPT, Perplexity, Gemini, and Claude are not publicly observable from outside the AI engine. A perfect VisibilityPulse score does not guarantee AI citations. A low score identifies specific technical barriers you can remove to improve eligibility.
⚠️ Entity authority is a proxy, not a direct measurement
Off-site brand mentions across Reddit, YouTube, Quora, and third-party publications are the strongest predictor of AI Overview visibility (Ahrefs, 0.664 Spearman correlation, May 2025). These cannot be checked from a page audit. Our Entity Authority signal proxies this using on-page and schema data — a directional estimate, not an equivalent measurement.
⚠️ Bingbot correlation with ChatGPT citations is a directional finding
Seer Interactive's analysis found ~87% of ChatGPT citations matched Bing's top organic results — widely cited but not an official OpenAI disclosure. We include Bingbot in our crawler check because of OpenAI's documented Microsoft Bing partnership for ChatGPT Search infrastructure, and treat the 87% figure as a strong directional finding, not a hard official number.
5. What We Don't Measure (Yet)
The following signals are relevant to AI citation probability but are currently outside the scope of what a page-fetch audit can check:
- Off-site brand mention volume (Reddit, Quora, YouTube, industry publications) — strongest known AI citation predictor per Ahrefs 2025
- Bing search ranking position — directionally correlated with ChatGPT citation probability
- Actual training data inclusion by any AI engine
- Prompt-level brand recall in AI model responses
- Backlink profile and Domain Rating / Domain Authority
- Real Core Web Vitals (LCP, INP, CLS) from Chrome UX Report field data
- Content quality assessment: answer directness, factual accuracy, reading level, originality
- Google E-E-A-T signals beyond on-page schema (manual review factors)
CrUX API integration for real Core Web Vitals is planned for v1.3. Brand mention proxy signals via off-site crawling are planned for v2.0.
6. Changelog
- Added OAI-SearchBot (ChatGPT Search citation crawler) — correctly separated from GPTBot (training only)
- Added Bingbot to crawler check with explanation of ChatGPT Search retrieval architecture
- Added ChatGPT-User crawler detection
- Standardised signal count to 8 (5 weighted + 3 qualitative flags — consistent across all pages)
- Removed "FAQPage schema is the #1 predictor of AI citations" — replaced with accurate mechanism description
- Clarified Google FAQ rich result deprecation (May 7, 2026) and actual AI extraction benefit of FAQPage schema
- Published this methodology page — addresses "no published methodology" criticism
- Updated robots.txt: added OAI-SearchBot and ChatGPT-User explicit allow rules
- Fixed blog post "rank-in-chatgpt-2026": Step 1 now correctly references OAI-SearchBot + Bingbot, not just GPTBot
- Fixed ItemList schema: removed #1 Predictor false claim from blog post title
- Updated FAQ schema answers in layout.tsx to reference correct crawler architecture
- Renamed signals for clarity across the UI
- Added E-E-A-T sub-checks (author schema, review markup, authoritative outbound citations)
- Added Wikipedia API and Wikidata entity search to Entity Authority signal
- Fixed BlogPosting publisher logo to use square favicon.svg (was sharing the 1200×630 share image)
- Trimmed meta description to ≤155 characters
- Cleaned ItemList schema: removed 4 blog slugs returning 404
- Initial public release: 8 signals, composite scoring formula, Cloudflare Pages deployment
- Multi-UA cascade for WAF bypass (Chrome / Googlebot / curl)
- SSRF protection and rate limiting on analysis endpoint
- Zero-signup, zero-cost architecture
Have a correction or better source?
This methodology is a living document. If you find an error, a better citation, or a signal we should add, we want to hear it. Good criticism makes the product better.