# Making Your Site Visible to AI Agents with Cloudflare
The way people find things online is changing. It used to be all about Google and SEO. Now, an increasing amount of traffic is coming from AI agents and crawlers that need structured data from a web that was built for humans. If your site isn’t set up to talk to these systems, you’re going to get left behind.
Cloudflare just launched two features that make this significantly easier: Markdown for Agents and Content Signals. I’ve already enabled both on a site I manage at work, and the setup is straightforward enough that there’s no real reason not to do it.
## Markdown for Agents
The core idea is simple. HTML is expensive for AI to process. All the `<div>` wrappers, nav bars, script tags, and CSS classes are noise that burns through tokens without adding any meaning. Cloudflare reports that its announcement post weighs in at 16,180 tokens as HTML but only 3,150 as markdown, roughly an 80% reduction.
Markdown for Agents handles the conversion automatically at the edge. When an AI agent requests a page with Accept: text/markdown in the header, Cloudflare intercepts the response, converts the HTML to clean markdown, and serves that instead. The agent gets exactly what it needs without wasting compute on parsing junk.
Here’s what the request looks like:
```bash
curl https://your-site.com/about/ \
  -H "Accept: text/markdown"
```

The response comes back as text/markdown with an x-markdown-tokens header that tells you exactly how many tokens are in the document. Tools like Claude Code and OpenCode already send these Accept headers, so enabling this means those tools immediately get cleaner content from your site.
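On the client side, an agent can look at those two headers to confirm what it got back. A minimal sketch (the helper name is mine, and the header dict mimics what a markdown-enabled zone might return; this isn't a Cloudflare library):

```python
def inspect_markdown_response(headers):
    """Report whether a response was served as markdown and how many
    tokens it contains, based on the content-type and x-markdown-tokens
    headers. `headers` is a dict of lowercase header names."""
    is_markdown = headers.get("content-type", "").startswith("text/markdown")
    tokens = headers.get("x-markdown-tokens")
    return is_markdown, (int(tokens) if tokens is not None else None)

# Example headers, shaped like a converted response from the edge
served = {"content-type": "text/markdown; charset=utf-8", "x-markdown-tokens": "3150"}
print(inspect_markdown_response(served))  # (True, 3150)
```

If the zone doesn't have the feature enabled, the content-type stays text/html and the token header is absent, so the same check degrades gracefully.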
### How to Enable It
If you’re on Cloudflare (Pro, Business, or Enterprise), it’s a toggle. Go to your zone in the dashboard, find Quick Actions, and flip on Markdown for Agents. That’s it.
You can also enable it via the API:
```bash
curl -X PATCH 'https://api.cloudflare.com/client/v4/zones/{zone_tag}/settings/content_converter' \
  --header 'Content-Type: application/json' \
  --header "Authorization: Bearer {api_token}" \
  --data '{"value": "on"}'
```

There are some limitations worth knowing about. It only converts HTML (no PDFs or other formats yet), it won’t work if the origin response is larger than 1 MB or doesn’t include a content-length header, and compressed origin responses aren’t supported. But for a typical website, these shouldn’t be issues.
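For scripted setups, the same PATCH can be issued from Python's standard library. A sketch mirroring the curl command, where the zone tag and API token are placeholders you supply:

```python
import json
import urllib.request

def build_content_converter_request(zone_tag, api_token, value="on"):
    """Build the PATCH request that toggles Markdown for Agents,
    mirroring the curl command above."""
    url = f"https://api.cloudflare.com/client/v4/zones/{zone_tag}/settings/content_converter"
    return urllib.request.Request(
        url,
        data=json.dumps({"value": value}).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
    )

req = build_content_converter_request("YOUR_ZONE_TAG", "YOUR_API_TOKEN")
# urllib.request.urlopen(req)  # uncomment to send once real credentials are in place
```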
## Content Signals
The second piece is Content Signals, a framework built around robots.txt that lets you explicitly tell AI systems what they can and can’t do with your content. It’s defined at contentsignals.org and uses three categories:
- ai-train: Can this content be used to train or fine-tune AI models?
- search: Can this content be indexed and returned in search results?
- ai-input: Can this content be used as input to AI models (RAG, grounding, agentic use)?
You add these as directives in your robots.txt:
```text
User-Agent: *
Content-Signal: ai-train=no, search=yes, ai-input=yes
Allow: /
```

The site at contentsignals.org has a generator where you can pick from preset policies and get the exact robots.txt directives you need. You can go as broad as “allow everything” or as restrictive as “disallow all.” You can even target specific user agents or specific paths if you want different rules for different parts of your site.
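If you'd rather generate the directive yourself than use the site's generator, the mapping from policy to directive is trivial. A small sketch (the function name is mine; the output format follows the example above):

```python
def content_signal_line(policies):
    """Render a Content-Signal robots.txt directive from a dict of
    booleans, e.g. {"ai-train": False, "search": True, "ai-input": True}."""
    parts = [f"{signal}={'yes' if allowed else 'no'}"
             for signal, allowed in policies.items()]
    return "Content-Signal: " + ", ".join(parts)

print(content_signal_line({"ai-train": False, "search": True, "ai-input": True}))
# Content-Signal: ai-train=no, search=yes, ai-input=yes
```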
When Markdown for Agents is enabled, converted responses automatically include a Content-Signal header: ai-train=yes, search=yes, ai-input=yes. Custom policy options are coming in the future.
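On the consuming side, an agent that wants to honor these signals needs to parse that header. A minimal sketch (the parser name is mine; the value format follows the header shown above):

```python
def parse_content_signal(header_value):
    """Parse a Content-Signal header value such as
    'ai-train=yes, search=yes, ai-input=yes' into a dict of booleans."""
    signals = {}
    for entry in header_value.split(","):
        name, _, value = entry.strip().partition("=")
        signals[name] = value.strip().lower() == "yes"
    return signals

print(parse_content_signal("ai-train=yes, search=yes, ai-input=yes"))
# {'ai-train': True, 'search': True, 'ai-input': True}
```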
It’s worth being clear about what this is and isn’t. These are signals, not enforcement. Just like robots.txt has always been a polite request rather than a technical barrier, Content Signals express your preferences. Some crawlers will respect them. Some won’t. But having an explicit, machine-readable declaration of your intent is better than having nothing, and it establishes a baseline that responsible AI companies are starting to honor.
## Why This Matters
If you run a website and want AI tools to be able to accurately reference your content, you should be making it easy for them to do so. That means serving clean, structured content when agents ask for it, and being explicit about how that content can be used.
I recently enabled both of these on a website at work. The process took about five minutes — flip the toggle, update robots.txt with Content Signals, done. If you’re already on Cloudflare, this is one of the easiest things you can do to make your site more legible to the next generation of tools people use to find things online.