Convert Any Web Page to LLM-Ready Markdown Using Gaffa
How you can use Gaffa to convert any web page into markdown, ready to feed into your LLM-based apps
Apr 30 2025

Many applications need real-time web access to deliver relevant information, especially those powered by large language models with static knowledge bases is generally frozen in time. Web HTML is typically noisy, cluttered with ads and navigation elements that inflate token usage and reduce LLM performance.
Gaffa’s Browser Request API is a simple REST interface for web automation and data extraction . Its generate_markdown action converts web pages into clean, LLM-ready markdown by removing clutter to reduce token usage whilst preserving key content.
Converting a Web Page to Markdown
To extract a clean and readable markdown from a web page using Gaffa, you send a POST request to the /v1/browser/requests endpoint with the following JSON payload:
Generate Markdown Browser Request
This instructs Gaffa to wait for the main content area to load and then generate markdown from it. This is the response:
Generate Markdown Response
Gaffa's returns a list of processed actions with the generate_markdown action containing a markdown file that includes only meaningful content. The generate_markdown action removes lots of HTML information like styles, JavaScript, and headers that aren't relevant to the core content of the page. You can read more about this action in the Gaffa docs.
Building a Simple Python CLI Tool
To show how you can integrate Gaffa's markdown into your workflow, we built a simple Python CLI tool that:
- Takes a URL input from the user
- Sends a POST request to Gaffa which will load the URL on our fleet of headful cloud browsers and return markdown, giving the output as a link.
- Sends the resulting markdown to OpenAI’s API to enable question answering over the content.
You can find the full code on GitHub, but one notable point is:
- With Gaffa, handling proxies is as simple as adding as a proxy_location parameter to your request. This is especially useful for bypassing region-based restrictions or automation blockers. Learn more in the docs.
HTTP POST Payload Gaffa Request
After extracting markdown with Gaffa, you then send it to an LLM API like OpenAI.
LLM Prompt
This function handles user input, your Gaffa-generated markdown, and a follow-up question about it, sending them to the model (in this case, gpt-3.5-turbo) via the ChatCompletion API. The model then returns a relevant response based on the provided prompt.
Sample Output
Here's the CLI tool in action, using Gaffa to fetch markdown from a live article:
The user is prompted to enter a URL, and the script fetches the markdown content from Gaffa and returns a link pointing to the markdown file, which is then used to ask questions about the content in the second part of the script.
Script Output #1
The second part allows users to ask questions about the content. The script uses the OpenAI API to generate a response based on the markdown content. The user can ask any content-related question and get an answer back.
Script Output #2
If a question is asked that isn't covered in the content, the tool will let you know this and give you a summary of the content.
Script Output #3
Using Gaffa's generate_markdown action it is easy to convert any web page into a clean, readable markdown. We've shown you how you can integrate this output into a simple LLM app which allows us to ask questions about the content of any web page without unnecessary token costs.
This is just the tip of what can be achieved with Gaffa. To explore more advanced features and capabilities:
- Learn about additional actions and settings from the Gaffa documentation.
- Have a play on the API playground to experiment more with the different actions available, which include pre-built requests to get you started.
Ready to take your web automation skills to the next level? Start using Gaffa today or reach out to us with your project’s requirements.
Appendix:
- Read more about generating markdown in the Gaffa API docs
- Download the code sample above from GitHub
- Try sending a request to export a PDF from a site using the Gaffa API playground