Convert any webpage into LLM-ready Markdown using Gaffa
The ability to convert websites into LLM-friendly markdown is powerful when building applications for summarization, Q&A, or knowledge extraction. In this guide, you'll learn how to use the Gaffa API to extract the main content of any web page using browser rendering and convert it into structured markdown.
By the end of this guide, you’ll be able to:
Render web pages using Gaffa’s API.
Extract clean page content.
Generate structured markdown suitable for LLM-based Q&A or summarization.
Prerequistes
Install Python 3.10 or newer.
Create a virtual environment
Install the required libraries
Get your Gaffa API key and OpenAI API key, and store them as environment variables:
Convert a webpage to Markdown
In the code below, we define a function that takes a URL as input, makes a POST request to the Gaffa API, invoking the generate_markdown action, which uses the browser rendering engine to extract the main content of the page and convert it into markdown.
Ask questions using OpenAI
Now that we have the markdown content, we can ask questions about it using the OpenAI API. The function below takes the markdown content and a question as input and uses the OpenAI API to generate a summary based on the provided content. In this case, we are using the gpt-3.5-turbo model, but you can choose any other model.
The markdown becomes the model’s context, enabling accurate answers about the original web content.
User Interaction and Execution
Having defined the functions, we can now create a simple command-line interface that allows users to input a URL and ask questions about the content.
Full Script
The full script is available to download from the Gaffa Python Examples GitHub repo.
Running the Script
To run the script, simply execute it in your terminal:
With your script running, you can enter any URL of any web page, and the script will fetch the markdown content and allow you to ask questions about it.
Last updated