The agentic web is here: Why NLWeb makes schema your greatest SEO asset

The web’s purpose is shifting. Once a link graph – a network of pages for users and crawlers to navigate – it’s rapidly becoming a queryable knowledge graph.
For technical SEOs, that means the goal has evolved from optimizing for clicks to optimizing for visibility and even direct machine interaction.
Enter NLWeb – Microsoft’s open-source bridge to the agentic web
At the forefront of this evolution is NLWeb (Natural Language Web), an open-source project developed by Microsoft.
NLWeb simplifies the creation of natural language interfaces for any website, allowing publishers to transform existing sites into AI-powered applications where users and intelligent agents can query content conversationally – much like interacting with an AI assistant.
Developers suggest NLWeb could play a role similar to HTML in the emerging agentic web.
Its open-source, standards-based design makes it technology-agnostic, ensuring compatibility across vendors and large language models (LLMs).
This positions NLWeb as a foundational framework for long-term digital visibility.
Schema.org is your knowledge API: Why data quality is the NLWeb foundation
NLWeb proves that structured data isn’t just an SEO best practice for rich results – it’s the foundation of AI readiness.
Its architecture is designed to convert a site’s existing structured data into a semantic, actionable interface for AI systems.
In the age of NLWeb, a website is no longer just a destination. It’s a source of information that AI agents can query programmatically.
The NLWeb data pipeline
The technical requirements make clear that a high-quality schema.org implementation is the price of entry.
Data ingestion and format
The NLWeb toolkit begins by crawling the site and extracting the schema markup.
The schema.org JSON-LD format is the preferred and most effective input for the system.
This means the protocol consumes every detail, relationship, and property defined in your schema, from product types to organization entities.
For data that isn’t already in JSON-LD, such as RSS feeds, NLWeb is engineered to convert it into schema.org types so it can be used just as effectively.
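NLWeb’s own ingestion tooling handles this crawl, but a minimal sketch of the extraction step it implies could look like the following. The URL is hypothetical, and the snippet assumes the requests and beautifulsoup4 libraries are available; it is illustrative only, not NLWeb’s actual crawler code.

```python
# Illustrative only: pull schema.org JSON-LD blocks out of a page the way a
# schema-driven ingestion step would. Not NLWeb's actual implementation.
import json

import requests
from bs4 import BeautifulSoup


def extract_json_ld(url: str) -> list[dict]:
    """Return every schema.org entity found in a page's JSON-LD scripts."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    entities = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(tag.string or "")
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than ingest flawed data
        # A page may wrap several entities in one @graph, emit a bare object,
        # or emit a top-level list; normalize all three cases.
        entities.extend(data.get("@graph", [data]) if isinstance(data, dict) else data)
    return entities


# Example: inspect which entity types a (hypothetical) page exposes to AI agents.
# entities = extract_json_ld("https://example.com/product-page")
# print({e.get("@type") for e in entities})
```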
Semantic storage
Once collected, this structured data is stored in a vector database. This element is critical because it moves the interaction beyond traditional keyword matching.
Vector databases represent text as mathematical vectors, allowing the AI to search based on semantic similarity and meaning.
For example, the system can understand that a query using the term “structured data” is conceptually the same as content marked up with “schema markup.”
This capacity for conceptual understanding is what makes genuine conversational functionality possible.
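To make that concrete, here is a minimal, hedged sketch of embedding-based similarity using the sentence-transformers library. It illustrates the general technique behind vector search, not NLWeb’s actual vector database, model choice, or retrieval pipeline; the documents and query are invented.

```python
# Semantically related phrases land near each other even with no keyword
# overlap. This sketches embedding-based retrieval in general, not NLWeb's
# specific vector store.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "How to implement schema markup for products",
    "Quarterly earnings call transcript",
]
query = "structured data for e-commerce pages"

doc_vecs = model.encode(documents, normalize_embeddings=True)
query_vec = model.encode(query, normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = doc_vecs @ query_vec
best = documents[int(np.argmax(scores))]
print(best)  # the schema-markup document wins despite zero shared keywords
```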
Protocol connectivity
The final layer is the connectivity provided by the Model Context Protocol (MCP).
Every NLWeb instance operates as an MCP server. MCP is an emerging standard for packaging and consistently exchanging data between AI systems and agents.
MCP is currently the most promising path forward for ensuring interoperability in the highly fragmented AI ecosystem.
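As a rough illustration of what “operating as an MCP server” means, the official Python MCP SDK can expose a tool that agents call over the protocol. The server name, the ask tool, and the canned schema.org result below are hypothetical placeholders; NLWeb ships its own MCP server rather than this sketch.

```python
# Conceptual sketch of exposing site content as an MCP tool via the official
# Python MCP SDK. The tool name, server name, and canned result are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("site-knowledge")  # hypothetical server name


@mcp.tool()
def ask(query: str) -> list[dict]:
    """Answer a natural language query with schema.org-shaped results."""
    # Placeholder: a real server would embed the query and search the vector store.
    return [{
        "@type": "Product",
        "name": "Example Widget",
        "description": "Illustrative result matched to the query.",
    }]


if __name__ == "__main__":
    mcp.run()  # serves the tool so an MCP-aware agent can call it
```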
The ultimate test of schema quality
Since NLWeb relies entirely on crawling and extracting schema markup, the precision, completeness, and interconnectedness of your site’s content knowledge graph determine success.
The key challenge for SEO teams is addressing technical debt.
Custom, in-house solutions to manage AI ingestion are often high-cost, slow to adopt, and create systems that are difficult to scale or incompatible with future standards like MCP.
NLWeb addresses the protocol’s complexity, but it cannot fix faulty data.
If your structured data is poorly maintained, inaccurate, or missing critical entity relationships, the resulting vector database will store flawed semantic information.
The inevitable result is degraded output: inaccurate conversational responses or outright “hallucinations” from the AI interface.
Robust, entity-first schema optimization is no longer just a way to win a rich result; it is the baseline requirement for participating in the agentic web.
By leveraging the structured data you already have, NLWeb allows you to unlock new value without starting from scratch, thereby future-proofing your digital strategy.
NLWeb vs. llms.txt: Protocol for action vs. static guidance
The need for AI crawlers to process web content efficiently has led to multiple proposed standards.
A comparison between NLWeb and the proposed llms.txt file illustrates a clear divergence between dynamic interaction and passive guidance.
The llms.txt file is a proposed static standard designed to improve the efficiency of AI crawlers by:
- Providing a curated, prioritized list of a website’s most important content – typically formatted in markdown.
- Attempting to solve the legitimate technical problems of complex, JavaScript-loaded websites and the inherent limitations of an LLM’s context window.
In sharp contrast, NLWeb is a dynamic protocol that establishes a conversational API endpoint.
Its purpose is not just to point to content, but to actively receive natural language queries, process the site’s knowledge graph, and return structured JSON responses using schema.org.
NLWeb fundamentally changes the relationship from “AI reads the site” to “AI queries the site.”
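As a hedged illustration of that shift, an agent-side query against an NLWeb-style endpoint could look roughly like the sketch below. The base URL, the /ask path, the query parameter, and the response shape are assumptions for illustration, not the project’s documented API.

```python
# Hedged sketch of "AI queries the site": the endpoint path, parameter name,
# and response shape are assumptions, not NLWeb's documented contract.
import requests

SITE = "https://example.com"  # hypothetical NLWeb-enabled site

resp = requests.get(
    f"{SITE}/ask",
    params={"query": "vegetarian recipes ready in under 30 minutes"},
    timeout=10,
)
resp.raise_for_status()

# The answer comes back as structured schema.org JSON rather than a rendered
# page, so an agent can reason over fields like @type and name directly.
for item in resp.json().get("results", []):
    print(item.get("@type"), "-", item.get("name"))
```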
| Attribute | NLWeb | llms.txt |
| --- | --- | --- |
| Primary goal | Enables dynamic, conversational interaction and structured data output | Improves crawler efficiency and guides static content ingestion |
| Operational model | API/Protocol (active endpoint) | Static Text File (passive guidance) |
| Data format used | Schema.org JSON-LD | Markdown |
| Adoption status | Open project; connectors available for major LLMs, including Gemini, OpenAI, and Anthropic | Proposed standard; not adopted by Google, OpenAI, or other major LLMs |
| Strategic advantage | Unlocks existing schema investment for transactional AI uses, future-proofing content | Reduces computational cost for LLM training/crawling |
The market’s preference for dynamic utility is clear. Despite addressing a real technical challenge for crawlers, llms.txt has failed to gain traction so far.
NLWeb’s functional superiority stems from its ability to enable richer, transactional AI interactions.
It allows AI agents to dynamically reason about and execute complex data queries using structured schema output.
The strategic imperative: Mandating a high-quality schema audit
While NLWeb is still an emerging open standard, its value is clear.
It maximizes the utility and discoverability of specialized content that often sits deep in archives or databases.
This value is realized through operational efficiency and stronger brand authority, rather than immediate traffic metrics.
Several organizations are already exploring how NLWeb could let users ask complex questions and receive intelligent answers that synthesize information from multiple resources – something traditional search struggles to deliver.
The ROI comes from reducing user friction and reinforcing the brand as an authoritative, queryable knowledge source.
For website owners and digital marketing professionals, the path forward is undeniable: mandate an entity-first schema audit.
Because NLWeb depends on schema markup, technical SEO teams must prioritize auditing existing JSON-LD for integrity, completeness, and interconnectedness.
Minimalist schema is no longer enough – optimization must be entity-first.
Publishers should ensure their schema accurately reflects the relationships among all entities, products, services, locations, and personnel to provide the context necessary for precise semantic querying.
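A hedged sketch of what that interconnectedness looks like in practice: entities that reference each other by @id rather than sitting in loose, disconnected blocks. The names and URLs are hypothetical, and the graph is built as a Python dictionary only for brevity; on a real page it would be emitted as JSON-LD.

```python
# Entity-first markup sketch: Organization, Product, and Person are linked by
# @id so relationships are explicit. All names and URLs are hypothetical.
import json

ORG_ID = "https://example.com/#organization"

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": ORG_ID,
            "name": "Example Co",
            "url": "https://example.com/",
        },
        {
            "@type": "Product",
            "@id": "https://example.com/widgets/pro#product",
            "name": "Widget Pro",
            "brand": {"@id": ORG_ID},        # link to, don't duplicate, the brand entity
            "manufacturer": {"@id": ORG_ID},
        },
        {
            "@type": "Person",
            "@id": "https://example.com/team/jane#person",
            "name": "Jane Doe",
            "worksFor": {"@id": ORG_ID},     # personnel tied back to the organization
        },
    ],
}

print(json.dumps(graph, indent=2))  # ready to embed in <script type="application/ld+json">
```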
The transition to the agentic web is already underway, and NLWeb offers the most viable open-source path to long-term visibility and utility.
Adopting it is a strategic necessity: it ensures your organization can communicate effectively as AI agents and LLMs begin to rely on conversational protocols to interact with third-party content.


