Structured Data
Structured data is any machine-readable representation of information about a webpage or site: most commonly schema.org JSON-LD, but also microdata, RDFa, sitemaps, OpenGraph tags, Twitter cards, and (increasingly) llms.txt files.
Why it matters
Unstructured prose is open to interpretation; structured data is unambiguous. AI engines lean on structured data to disambiguate entities, extract facts, and decide which pages to trust as canonical sources.
Forms of structured data
- JSON-LD schema.org markup — the dominant form. Embedded as <script type="application/ld+json">.
- Microdata and RDFa — older inline forms, still supported but less common.
- OpenGraph tags — <meta property="og:*"> tags for link-preview cards on social platforms and AI engines.
- Twitter cards — Twitter / X-specific preview metadata.
- Sitemap.xml — machine-readable list of URLs with last-modified dates.
- RSS / Atom feeds — structured publication streams for AI engines and aggregators.
- llms.txt / llms-full.txt — emerging plain-text structured summary for AI consumption.
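As a rough illustration of the first and third forms in the list above, the sketch below renders a JSON-LD script tag and a set of OpenGraph meta tags from one page record. The `page` dictionary, its field names, the example.com URL, and both function names are assumptions made for this example, not part of any particular framework.

```python
import html
import json

# Hypothetical page record; the field names are assumptions for illustration.
page = {
    "title": "Structured Data",
    "description": "What structured data is and why AI engines rely on it.",
    "url": "https://example.com/glossary/structured-data",
}


def render_json_ld(page: dict) -> str:
    """Render a schema.org Article description as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": page["title"],
        "description": page["description"],
        "url": page["url"],
    }
    # Escape "<" so a value containing "</script>" cannot close the tag early.
    body = json.dumps(data, indent=2).replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{body}\n</script>'


def render_open_graph(page: dict) -> str:
    """Render basic OpenGraph meta tags from the same record."""
    tags = {
        "og:type": "article",
        "og:title": page["title"],
        "og:description": page["description"],
        "og:url": page["url"],
    }
    return "\n".join(
        f'<meta property="{name}" content="{html.escape(value, quote=True)}">'
        for name, value in tags.items()
    )


if __name__ == "__main__":
    print(render_json_ld(page))
    print(render_open_graph(page))
```

Because both outputs are built from the same record, a change to the title or description reaches the JSON-LD and the preview tags at the same time.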
Why each form matters
Different AI engines weight different signals. Google AI Overviews lean heavily on schema.org markup. Perplexity uses sitemap and citation data. ChatGPT (with browsing) blends OpenGraph for previews with body-text extraction. The robust strategy is to ship all of them — they're cheap to maintain and each closes a different blind spot.
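To give a sense of how little code one of these forms needs, here is a minimal sketch that builds a sitemap.xml with Python's standard library. The page list, its dates, and the `build_sitemap` name are hypothetical; a real site would pull URLs and modification dates from its own content store.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical page list: (URL, date the content last changed).
pages = [
    ("https://example.com/", date(2024, 5, 1)),
    ("https://example.com/glossary/structured-data", date(2024, 6, 12)),
]


def build_sitemap(pages) -> str:
    """Return a sitemap.xml document listing each URL with its lastmod date."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, last_modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # lastmod accepts a plain YYYY-MM-DD date.
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap(pages))
```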
Common pitfalls
- Drift — structured data goes stale when copy changes but the JSON-LD doesn't get updated. Single-source helpers in code prevent this.
- Over-claiming — marking up content as something it isn't (for example, FAQPage on a blog post that has no Q&A) can get the page penalised.
- Validation gaps — broken schema can be ignored entirely. Test every change.
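The drift and validation pitfalls above can be caught with a small automated check. The sketch below pulls every JSON-LD block out of a rendered page, confirms that it parses, and compares its headline with the title the page is expected to show. The class and function names are hypothetical, and the check is structural only: it handles a single top-level object per block and does not validate against the schema.org vocabulary, so it complements rather than replaces a dedicated validator.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every JSON-LD script block in a page."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_json_ld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def check_json_ld(page_html: str, expected_headline: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    parser = JsonLdExtractor()
    parser.feed(page_html)
    parser.close()
    if not parser.blocks:
        return ["no JSON-LD block found"]
    problems = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON: {exc.msg}")
            continue
        # This sketch only handles a single top-level object per block.
        if not isinstance(data, dict):
            problems.append("top-level JSON-LD value is not an object")
            continue
        for key in ("@context", "@type"):
            if key not in data:
                problems.append(f"missing {key}")
        # Drift check: the markup should still match the visible copy.
        if data.get("headline") != expected_headline:
            problems.append("headline does not match the page title")
    return problems
```

Run against the rendered HTML in a test suite or build step, a non-empty result can fail the build before stale or broken markup ships.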