Introduction

What is Gaffa?

Gaffa is a powerful API for browser automation that lets you control real web browsers at scale through a simple interface with no configuration required. We'll handle the complexities of managing infrastructure, such as virtual machines, proxies, and caching, so you can focus on building powerful, reliable web automation and AI applications!

API Playground

Start experimenting with the Gaffa API right now.

Get Started

The simple steps to get you started using Gaffa in your apps.

API Reference

Explore the API and docs for the finer details

Key features

Gaffa is ready to power your web automations:

  • Simplicity - there's no need to learn another new framework; Gaffa is accessible through a simple REST API - just tell it what site you want to visit and what actions you want to perform, and it will be carried out as soon as you send the request.

  • Real browsers - headless browsers are popular but we make it simple to control real cloud-hosted browsers at scale which render JavaScript sites exactly as they would on a local machine, are harder to detect when doing scraping, and allow full observability. We're also planning to let you go beyond just controlling web browsers!

  • Proxies - you can easily choose to route your traffic through a network of residential proxy IP addresses to help avoid bot-detection on sites you are trying to automate.

  • Scalable - whether you want to control a single cloud browser or 100s in parallel with Gaffa, you can do that easily without one thought about infrastructure management.

  • Powerful data processing - once you've accessed your desired site, you can export your data in a constantly growing number of formats. If you want the page content in Markdown to feed into a large language model, or an image to feed into a vision model, we can help.

Ready to work with Gaffa?

Get Started

Stay up to date

We'll be announcing updates and new features in our newsletter - sign up here.

Get Started

An introduction to the Gaffa Browser API. Learn how you can get started building fast, powerful web automations!

Welcome to the Gaffa documentation site! You'll find everything you need here to get started using the API, including interactive API definitions, a comprehensive list of actions you can use to interact with our cloud browsers, and breakdowns of our example requests you can run right away in our API Playground.

Gaffa is currently in its very early stages, so we'd love to hear how we can improve our docs and API to make life easier for our users. If you have any questions or comments, please email us or use the support tool on our site. To stay up to date with the latest developments, features, and news on the mission to support the development of revolutionary AI Agents, sign up for sporadic newsletter updates.

1

Create an account

You can sign up to create a Gaffa account here. After signing up, you can access our API Playground, which includes several prebuilt automations for our demo site that simulate a range of scenarios.

Accessing the open web

When you're ready to use Gaffa on the open web, you'll need to choose a plan that suits your needs and pay for it. After that, the full internet will be available for you to automate.

To avoid scaling issues for our existing customers, we are currently using a queuing system for new accounts. Simply join the queue when prompted on your account dashboard, and we'll let you know when you have access. If you want to jump the queue, you can fill out a short survey to help us better understand our users, and we'll approve your account sooner!

2

Making your first browser request

The easiest way to make your first Gaffa browser request is to use our API Playground, where you can see several pre-made interactive browser request examples of automations we've built against our test site, which simulates some common scraping and web automation scenarios. You can run these examples without a paid account and edit them easily to experiment. Once you have a paid account, you can also use the playground to build your automations for other sites.

Gaffa API Playground examples

Here are all the sample requests we've created for use in the API Playground.

Print to PDF

Export a web page to PDF and wait for elements to load with the Gaffa API.

Convert to Markdown

Export a web page to markdown format - useful for feeding into LLM apps.

Infinitely Scroll

Scroll to the bottom of a page that infinitely loads items and record the interaction.

Capture Screenshot

Interact with a page and capture a screenshot of the whole page.

Form Completion

Fill out a form in a human-like way and record the interaction.

3

Building your own browser requests

Once you have a paid account and are ready to start building your own browser requests, you'll want to read about all the other actions you can use for your solution, as well as how you can easily use proxy servers, our cache, and the other endpoints that are part of the API.

Want to build faster with AI assistance?

You can use Gaffa's llms.txt file to give AI assistants like ChatGPT or Claude instant, accurate context about the Gaffa API, so they can generate working code for you straight away, without you having to explain the API yourself. Learn how to use the Gaffa LLMs.txt file →

Credits and Pricing

View our current pricing plans on the Gaffa homepage

Browser Requests

Browser requests are charged in terms of credits based on the following factors:

  • Request length: Billed at 1 credit per 30 seconds that the request takes to run on the browser.

    • If screen recording is enabled, this is doubled to 2 credits per 30 seconds.

  • Proxy bandwidth usage: All requests that use a proxy_location parameter use our network of residential proxies and are billed at 1500 credits per 1GB of bandwidth used.

  • Paid Actions: Some actions will incur additional costs for their usage in a browser request. These are:

    • JSON Parsing

Each successful request will deduct the corresponding number of credits from your monthly allowance. Unused credits don't roll over from month to month, so feel free to use your full monthly allowance.
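As a rough illustration of the factors above, the sketch below estimates the credits for a single browser request. It is not an official calculator: the function name is hypothetical, it assumes usage is billed proportionally (the docs don't say whether partial 30-second blocks are rounded up), and it leaves out paid actions such as JSON Parsing because their cost isn't listed here.

```python
def estimate_credits(duration_seconds, recording=False, proxy_gb=0.0):
    """Rough credit estimate for one browser request (paid actions excluded)."""
    # 1 credit per 30 seconds, doubled to 2 when screen recording is enabled.
    credits_per_block = 2 if recording else 1
    time_credits = credits_per_block * duration_seconds / 30
    # 1500 credits per 1GB of residential proxy bandwidth.
    proxy_credits = 1500 * proxy_gb
    # Assumption: proportional billing with no rounding applied.
    return time_credits + proxy_credits


print(estimate_credits(45))                    # 45 seconds, no recording, no proxy
print(estimate_credits(60, recording=True))    # 60 seconds with recording
print(estimate_credits(30, proxy_gb=0.5))      # 30 seconds plus 0.5GB via proxy
```

Treat the result as a ballpark figure for comparing request designs, and check your dashboard for the credits actually deducted.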

Mapping Requests

Mapping requests are also charged in credits at a rate of 1 credit per mapping request.

Block DOM Removals

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

Type: block_dom_removals

This action will prevent the page from removing items from the page. This is useful if you are trying to scrape data from a JavaScript-based web application that removes items from the page when they are out of view, which can make grabbing data difficult.

Using this action will block DOM removals for the rest of the browser request.

Parameters

See universal parameters.

Usage

Block DOM removals for the rest of the browser request

"actions": [
    {
      "type": "block_dom_removals"
    }
]

Capture Cookies

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

Type: capture_cookies

This action will capture the browser cookies currently saved for the web page you are on and return them as a JSON object with key/values.

Parameters

See universal parameters.

Usage

Capture the cookies of the current page

"actions": [
    {
      "type": "capture_cookies"
    }
]

Capture DOM

Type: capture_dom

This action will capture and return the site's raw DOM, which you can then extract data from on your end.

For common AI scenarios, you may find that this returns too much data, so we have provided generate_simplified_dom, an action that distills the DOM to only the important elements.

Parameters

See universal parameters.

Usage

Capture the raw DOM of the current page

Example Output

GaffaDOMSample.txt (13KB)

API Playground Examples

On the following pages, you can view all prebuilt requests we've created to show what is possible with the Gaffa web automation API.

You can start using these in the API Playground once you've created an account.

Export Web Page to PDF

An example request that uses Gaffa to convert an HTML page to a PDF. There are lots of HTML-to-PDF APIs, but Gaffa handles it easily, as well as doing much more.

The following example is a request we've prebuilt to show you Gaffa's capabilities on our demo site. You can run this request right now in the Gaffa API Playground.

Gaffa's print-to-PDF feature allows you to easily export web pages as PDF files. Unlike the standard "Print to PDF" in your local browser, Gaffa's feature waits for specific items to load, uses proxies, and scales with your product's growth. Enhance your customer experience and streamline your PDF export process.

API Request

The request below uses the POST endpoint to open the demo site on the table page, wait for the table to load, and then print the webpage to a PDF in A4 size with a 20-point margin and in portrait orientation.

Actions

Read the full documentation for these actions here.

Wait, Print

Response

Here's an example of the PDF returned by the request after the table has loaded.

GaffaPrintPdfExample.pdf (PDF, 51KB)

API Authentication

We use API Keys for authenticating requests to our API. In this document we'll explain how you can manage and use the keys for your account.

Creating Keys

Once your account is approved, you will need to create an API key to send your requests to our API. Go to your account Dashboard > API Keys and create a new key with a name. Once the key is created, copy the value, and you can immediately start using it to make requests.

You can create as many keys as you wish, but always remember to treat the key as a secret and do not reveal it in public blog posts or GitHub repositories. If someone uses your leaked key to make requests, we won't be responsible!

Deleting Keys

If you are worried you have exposed your Gaffa API key, or just want to periodically rotate your keys, you can create a new key and then delete your old keys. Deleted keys will immediately stop working for new API requests, but past browser requests made using old keys will still be available.

Authenticating Requests

Our API is secured with a custom header, X-API-Key, whose value should be any current API key in your account. That's all you need to add to your request!
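To make the header concrete, here is a minimal sketch that builds (but does not send) an authenticated POST to v1/browser/requests using Python's standard library. The base URL below is a placeholder assumption, as are the function name and the sample payload; substitute the real API host and your own key.

```python
import json
import urllib.request

# Assumption: placeholder host, not taken from the documentation.
BASE_URL = "https://api.example.invalid"


def build_browser_request(api_key, payload, base_url=BASE_URL):
    """Build a POST request that carries the X-API-Key header."""
    return urllib.request.Request(
        url=f"{base_url}/v1/browser/requests",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": api_key,  # any current API key from your account
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_browser_request("YOUR_API_KEY", {"url": "https://demo.gaffa.dev/"})
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it. Keeping the key out of source control (for example, reading it from an environment variable) fits the advice above about treating keys as secrets.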

"actions": [
    {
      "type": "capture_dom"
    }
]
{
  "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "actions": [
      {
        "type": "wait",
        "selector": "table"
      },
      {
        "type": "print",
        "size": "A4",
        "margin": 20,
        "orientation": "portrait"
      }
    ]
  }
}

Browser Requests

Making web automation requests has never been so simple.

Browser Requests allow you to send the Gaffa API a URL and a list of actions you want to be carried out, including any outputs you want from the page. We'll carry out the request in our cloud browsers and return the response, so you don't have to worry about proxies, IP rotation, web automation frameworks, or scaling.

There's absolutely zero configuration needed, and you can interact with Gaffa from any program that can send web requests. We think it's by far the simplest way to automate basic web tasks, and the good news is that we're just getting started and have much more planned.

How It Works

A browser request consists of three main components:

  1. Parameters — Control the basics like URL, proxy location, and caching

  2. Settings — Configure recording, media limits, and timing

  3. Actions — Define the tasks you want performed on the page

Example Request

Running a new browser request is as simple as sending the following POST body to our endpoint. Below, you can see the URL (our demo site) and a list of actions that instruct Gaffa to wait for the table to load, then print the page to PDF.

You can read more about this particular example and how you can run it right now in our API Playground here.
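For readers assembling the request body in code, the sketch below builds the same wait-then-print request as a Python dictionary and serializes it to JSON. The field names mirror the example requests in this documentation (note that the actions list sits inside settings there); the helper name is hypothetical.

```python
import json


def print_to_pdf_payload(url):
    """Request body: wait for a table, then print the page to an A4 PDF."""
    return {
        # Parameters: URL, proxy location, sync/async mode, and caching.
        "url": url,
        "proxy_location": None,
        "async": False,
        "max_cache_age": 0,
        # Settings, including the list of actions to perform.
        "settings": {
            "record_request": False,
            "actions": [
                {"type": "wait", "selector": "table"},
                {"type": "print", "size": "A4", "margin": 20, "orientation": "portrait"},
            ],
        },
    }


payload = print_to_pdf_payload(
    "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20"
)
body = json.dumps(payload, indent=2)
print(body)
```

Building the body from a dictionary rather than a string template means None and False are serialized as JSON null and false automatically.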

Stealth

We believe your AI Agents should be able to use the internet exactly how humans would. Gaffa can help you access sites with some of the most challenging anti-bot restrictions by combining proxies, human-like behaviour, captcha solving, and a custom browser implementation. We handle and maintain all of that so you can focus on building your solution!

Learn More

Parameters: Learn about URL, proxy settings, async mode, and caching

Settings: Explore recording, media bandwidth controls, and time limits

Actions: Discover all available actions like screenshots, markdown generation, and more

Examples: View pre-built requests and start using them in the API Playground

API Reference: Complete endpoint documentation and technical details

Examples

We've created a number of sample browser requests you can read about here, or you can jump straight into the API Playground to run them right now.

API Endpoints

Check out our API reference for more details on the available endpoints, particularly those you can use to query for past requests by ID or status.

Parameters

Parameters are the top-level settings that control the fundamental behaviour of your automation. These parameters define where your request goes, how it's routed, whether it runs synchronously or asynchronously, and how caching is handled.

Below you'll find detailed documentation for each available parameter.

Proxy servers

In order to access public sites and use proxy servers, you'll need to sign up for a paid account, but after that, you'll be able to build automations for any site you wish.

Gaffa makes it super simple to proxy your traffic through a global network of residential proxies. Setting proxy_location in your request will allow you to utilize one of our partner third-party proxy services to gain local access to a site.

Not setting a proxy_location will mean the request does not use a proxy server and will use a generic datacenter IP.
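As a small sketch of how a client might guard this parameter, the helper below only accepts the country codes listed under Available Locations in this documentation (us, ie, sg, fr) or None for no proxy. The helper name is hypothetical, and it assumes proxy_location takes the lowercase country code as its value.

```python
# Country codes from the Available Locations table in these docs.
PROXY_LOCATIONS = {
    "us": "United States",
    "ie": "Ireland",
    "sg": "Singapore",
    "fr": "France",
}


def with_proxy(payload, country_code):
    """Return a copy of the payload with proxy_location set (None = no proxy)."""
    if country_code is not None and country_code not in PROXY_LOCATIONS:
        raise ValueError(f"unsupported proxy location: {country_code!r}")
    # Assumption: the API expects the lowercase country code as the value.
    return {**payload, "proxy_location": country_code}


request = with_proxy({"url": "https://demo.gaffa.dev/"}, "ie")
print(request)
```

Validating locally gives an early, readable error instead of a rejected request, but the list will need updating whenever new locations are added.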

Available Locations

Proxy Server Location    Country Code
United States            us
Ireland                  ie
Singapore                sg
France                   fr

IP Types

Currently, all our IP addresses are residential IP addresses, which are procured through reputable third parties.

IP Rotation

IP rotation is an essential part of any web data scraping or automation task. In Gaffa, each browser request is treated as unique. We regularly rotate the IP addresses used, so you should assume each request is made from a different IP address than the last.

Restrictions

Whilst we'll do our best to provide access to as wide a range of sites as possible, we may have to restrict access to certain sites to prevent abuse of our service or of other services. Our proxy partners may also enforce restrictions on certain sites and categories of sites that we don't have any control over.


Caching

max_cache_age: integer

When we were building Gaffa, we noticed that many existing scraping tools don't let users easily share their scraped web data, even though many users request the same pages on the same sites. Not only is this a waste of a user's allowance, but it also puts a burden on the site owners who are serving the same data to different users for the same purpose. Because of this, we have created a service-wide cache in Gaffa.

When making a browser request, you can provide a max_cache_age parameter that is a number in milliseconds equal to or greater than 0. This value denotes the maximum age of data you would accept from the API. If another user of our service has requested the same URL with exactly the same parameters and actions as you in this timeframe, the response will be returned to you immediately and will not be processed by one of our browsers. If there are multiple identical requests in the given timeframe, then the most recent will be returned. This will save you time waiting for a response and credits, because requests returned from the cache don't use any bandwidth.
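Because max_cache_age is expressed in milliseconds, it is easy to be off by a factor of 1000. A minimal sketch of a conversion helper follows; the function name is hypothetical.

```python
from datetime import timedelta


def max_cache_age_ms(age):
    """Convert a timedelta into the millisecond integer used by max_cache_age."""
    milliseconds = age // timedelta(milliseconds=1)  # exact integer division
    if milliseconds < 0:
        raise ValueError("max_cache_age must be equal to or greater than 0")
    return milliseconds


# Accept cached responses that are up to one hour old.
payload = {
    "url": "https://demo.gaffa.dev/",
    "max_cache_age": max_cache_age_ms(timedelta(hours=1)),
}
print(payload["max_cache_age"])
```

A value of 0, as used in the example requests in this documentation, corresponds to a zero maximum age.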


Settings

The settings object allows you to configure how your browser requests behave. It currently supports three parameters that control recording, media downloads, and execution time limits.

You can read more about all available settings parameters here.

Capture Screenshot

Type: capture_screenshot

Takes a screenshot of the current page. You can take a full-screen screenshot of the entire page or just the current view.

Name
Type
Required
Description

Capture Element

Type: capture_element

Returns the innerHTML, essentially the contents, of a particular element on the page. This can be used when you are only interested in the contents of a particular element.

Name
Type
Required
Description

Capture Snapshot

Type: capture_snapshot

This output type will return an HTML file that captures a static version of the page state. The page will load offline and can be saved to your local machine.

This will:

  • Load and embed all images on the page.

Click

Type: click

Request that the browser click a particular element on the page.

Parameters

Name
Type
Required
Description

See universal parameters.

The following code will click the logo element on the page.

The following code will wait for the logo to appear for a maximum of 5 seconds, and then it will continue with the list of actions.

Download File

Type: download_file

Request a copy of the most recently viewed file in the browser.

Parameters

Name
Type
Required
Description

See universal parameters.

Currently, this only works with the following file formats: .pdf, .jpg, .png, .gif, .bmp, .webp, .svg, .tiff, .tif, .img

The following waits 20s for a file to download and then returns it.

And the service responds with the file being in the action output:
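Once a response comes back, the file URL has to be picked out of the returned actions. The sketch below assumes the actions list has the shape shown in this documentation's download_file example output (id, type, query, timestamp, output); the helper name is hypothetical and the sample data is copied from that example.

```python
def action_outputs(actions, action_type):
    """Collect the output values of every returned action of the given type."""
    return [
        action["output"]
        for action in actions
        if action.get("type") == action_type and "output" in action
    ]


# Sample taken from the download_file example output in these docs.
returned_actions = [
    {
        "id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
        "type": "download_file",
        "query": "download_file?continue_on_fail=false&timeout=20000",
        "timestamp": "2025-05-30T15:02:06.6615306Z",
        "output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf",
    }
]

urls = action_outputs(returned_actions, "download_file")
print(urls[0])
```

Where this list sits inside the full response body is not shown here, so locate it in the actual response before calling the helper.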

Generate Simplified DOM

Type: generate_simplified_dom

When you're looking at the DOM of a web page, there's a lot of unnecessary data that can be discarded if you are only interested in the page's elements or looking to export the data into an LLM. The generate_simplified_dom output format processes the HTML in the following way:

  • Removes all links in the head

  • Removes all script nodes and links to scripts

  • Removes all style nodes

  • Removes style attributes from all elements

  • Removes all links to stylesheets

  • Removes all noscript elements outside of the body

  • Finds all hrefs with query strings and removes the query strings

  • Keeps important meta tags and removes all others

  • Removes all alternate links

  • Removes all SVG paths

  • Removes empty text nodes and excessive spacing
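To give a feel for what this kind of processing involves, here is a rough local sketch of a few of the steps listed above: dropping script and style nodes, removing style attributes, stripping query strings from hrefs, and collapsing whitespace. It is an illustration built on Python's standard html.parser, not Gaffa's implementation, and the class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


class DomSimplifier(HTMLParser):
    """Drops <script>/<style> nodes, style attributes, and href query strings."""

    DROPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.dropped_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in self.DROPPED_TAGS:
            self.dropped_depth += 1
            return
        if self.dropped_depth:
            return
        kept = [tag]
        for name, value in attrs:
            if name == "style":
                continue  # remove inline style attributes
            if value is None:
                kept.append(name)  # bare attribute such as "disabled"
                continue
            if name == "href":
                url = urlsplit(value)
                # Rebuild the link without its query string.
                value = urlunsplit((url.scheme, url.netloc, url.path, "", url.fragment))
            kept.append(f'{name}="{escape(value, quote=True)}"')
        self.parts.append("<" + " ".join(kept) + ">")

    def handle_endtag(self, tag):
        if tag in self.DROPPED_TAGS:
            self.dropped_depth = max(0, self.dropped_depth - 1)
        elif not self.dropped_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self.dropped_depth:
            self.parts.append(escape(text, quote=False))


def simplify(html_text):
    parser = DomSimplifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)


sample = '<p style="color:red">Hello</p><script>track()</script><a href="/items?page=2">Next</a>'
print(simplify(sample))
```

The remaining steps (meta tag filtering, alternate links, SVG paths, and so on) are left out to keep the sketch short.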

See universal parameters.

The following JSON captures the page's DOM and simplifies it.

Print

Type: print

Request that the browser print the page to a PDF.

Parameters

Name
Type
Required
Description

See universal parameters.

The following JSON prints the page to a PDF in landscape orientation with a 20px margin.
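If you generate print actions in code, a small builder can catch unsupported values before the request is sent. The sketch below uses the defaults and accepted values given in this documentation's print parameter descriptions (size: A4; orientation: portrait or landscape; margin default 20); the function name is hypothetical.

```python
ACCEPTED_SIZES = {"A4"}
ACCEPTED_ORIENTATIONS = {"portrait", "landscape"}


def print_action(size="A4", margin=20, orientation="portrait"):
    """Build a print action, rejecting values the docs do not list as accepted."""
    if size not in ACCEPTED_SIZES:
        raise ValueError(f"unsupported size: {size!r}")
    if orientation not in ACCEPTED_ORIENTATIONS:
        raise ValueError(f"unsupported orientation: {orientation!r}")
    return {"type": "print", "size": size, "margin": margin, "orientation": orientation}


action = print_action(orientation="landscape")
print(action)
```

The accepted sets are a snapshot of the parameter descriptions and would need extending if more paper sizes become available.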

Type

Type: type

Request that the browser enter a specific piece of text into a field.

Parameters

Name
Type
Required
Description

See universal parameters.

The following action will type into a particular text field.

The following code will wait up to 10 seconds for the email input field to appear, then type in the provided email.
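Selectors like the email one often contain double quotes, which must be escaped inside a JSON string. A minimal sketch: build the action as a Python dictionary and let json.dumps handle the escaping. It assumes the action is identified by a "type" key, as in the other actions in this documentation.

```python
import json

# Assumption: the action is keyed by "type", matching the other actions.
type_action = {
    "type": "type",
    "selector": 'form input[name="email"]',  # contains double quotes
    "text": "test@test.com",
    "timeout": 10000,
}

body = json.dumps({"actions": [type_action]}, indent=2)
print(body)
```

In the serialized body, the inner quotes appear as \" so the JSON stays valid without any hand-escaping.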

Wait

Type: wait

Request that the browser wait a given amount of time or for a particular item to appear on the page.

Parameters

Name
Type
Required
Description

See universal parameters.

The following code will wait 1 second and then continue with the next action, if provided.

The following code will wait for a table to appear on the page for a maximum of 5 seconds. If the table has not appeared after 5 seconds, the next action will be executed, if provided.
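The two forms of the wait action can be wrapped in one small builder, sketched below. The parameter names (time, selector, timeout) come from this documentation; the function name is hypothetical, and treating time and selector as mutually exclusive is an assumption of this sketch rather than a documented rule.

```python
def wait_action(time_ms=None, selector=None, timeout_ms=None):
    """Build a wait action for a fixed delay or for an element to appear."""
    # Assumption: exactly one of time_ms or selector should be provided.
    if (time_ms is None) == (selector is None):
        raise ValueError("provide exactly one of time_ms or selector")
    if time_ms is not None:
        return {"type": "wait", "time": time_ms}
    action = {"type": "wait", "selector": selector}
    if timeout_ms is not None:
        action["timeout"] = timeout_ms  # otherwise the documented 5s default applies
    return action


print(wait_action(time_ms=1000))
print(wait_action(selector="table", timeout_ms=5000))
```

The first call matches the fixed one-second wait described above, and the second matches the wait for a table with a 5-second limit.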

Convert Web Page to Markdown

An example request that uses Gaffa to convert a web page to markdown. This could be used to export web page reports or to print the content of a page in a readable format.

The following example is a request we've prebuilt to demonstrate Gaffa's capabilities on our demo site. You can run this request right now in the Gaffa API Playground.

Gaffa converts web pages to clean markdown, stripping away styling, scripts, and images. This optimises content for LLM applications by reducing credit usage while preserving essential information.

The request below uses the POST endpoint to open the demo site on the article simulator, wait for the article to load, and then generate a markdown from the page's content, which you can download for use in your program.

Here's an example of the markdown file returned by the request after the article has loaded.

Capture a Full-Height Screenshot

An example request that uses Gaffa to dismiss a modal, scroll to the bottom of a page and then capture a full height screenshot.

The following example is a request we've prebuilt to show you Gaffa's capabilities on our demo site. You can run this request right now in the Gaffa API Playground.

Gaffa can also capture screenshots at any point during your interaction for use in your app or to work out exactly what was shown at a given time. You can capture just what is shown, as if you were looking at the screen or the full height of the page.

The request below uses the POST endpoint to open the demo site on the ecommerce page with 20 items, wait for and dismiss the dialog, scroll to the bottom of the page, and capture a full height screenshot.

The full-height export screenshot of the page showing all items.

API Reference

Complete HTTP API documentation for Gaffa. Each endpoint page includes interactive OpenAPI definitions you can try from the docs.

Start with API Authentication to create and use API keys, then explore the endpoints below.

Browser requests

  • POST v1/browser/requests: Create and run a browser request

  • GET v1/browser/requests/{id}: Get a browser request by ID

POST v1/browser/requests

For more information on browser requests, see here.

The following endpoint creates a browser request and either runs it synchronously or returns immediately with an ID so you can check its status later.

GET v1/browser/requests/{id}

The following endpoint allows you to query the browser request for your account by ID.

GET v1/browser/requests

For more information on browser requests, see here.

The following endpoint allows you to query for multiple browser requests, either by status or by a list of particular IDs. Submitting a request with neither of these will return all requests for your account.

POST v1/schemas

The following endpoint allows you to describe a data schema for parsing an online PDF to JSON.

PUT v1/schemas

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

The following endpoint allows you to update a data schema by ID.

GET v1/schemas

The following endpoint allows you to list data schemas for your account in a paginated list.

DELETE v1/schemas/{id}

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

The following endpoint allows you to delete a schema from your account.

POST v1/site/map

This endpoint creates a new site mapping request and returns the result.

GET v1/site/map

This endpoint retrieves information about previous site mapping requests, filterable by ID or status.

GET v1/site/map/{id}

This endpoint retrieves information about a site mapping request.

United States

us

Ireland

ie

Singapore

sg

France

fr

At the moment, all our servers are in one location, but we aim to deploy local machines at our proxy locations to improve realistic end-user load times. If this interests you, please contact support.


We are working to support a wider range of IP address scenarios, including static IPs in the future, and to enable more trusted proxies for requests that require enhanced security (logins, etc.).

  • GET v1/browser/requests: List browser requests

Schemas

  • POST v1/schemas: Create a schema

  • PUT v1/schemas: Update a schema

  • GET v1/schemas: List schemas

  • DELETE v1/schemas/{id}: Delete a schema

Site mapping

  • POST v1/site/map: Create a site mapping request

  • GET v1/site/map: List site mapping requests

  • GET v1/site/map/{id}: Get a site mapping request by ID


timeout

integer

The maximum amount of time the browser should wait for a file to download. Default: 5,000 (5s)

Files Supported

Usage

Download a copy of a PDF open in the Browser

universal parameters

Parameters

Usage

We are actively working to improve this and to make this process more configurable - let us know if there's something you think we can improve.

Example Output

universal parameters
6KB
GaffaSimplifiedDOMSample.txt
Open

time

integer

The time in milliseconds that the browser should wait.

selector

string

The selector that defines the page element that the browser should wait to appear.

timeout

integer

The maximum amount of time the browser should wait for the provided selector to appear. Default: 5,000 (5s)

Usage

Wait for a particular amount of time

Wait for a particular element to appear

universal parameters

size

string

The size of paper the page should be printed to. Default: A4 Accepted: ["A4"]

margin

integer

The margin of the page in pixels when the page is printed to PDF. Default: 20

orientation

string

The orientation of the page when it is printed to PDF. Default: portrait Accepted: ["portrait", "landscape"]

Usage

Print a page in landscape to PDF

Example Output

universal parameters
51KB
GaffaPrintPdfExample.pdf
PDF
Open

selector

string

The selector that defines the page element that the browser should click on.

timeout

integer

The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

Usage

Click an element on the page

Wait for a particular element to appear

universal parameters

selector

string

The selector that defines the page element that the browser should type into.

text

string

The text the browser should enter into the text field.

Sites that use more advanced bot detection often use keyboard events to detect unusual activity on their site. Rather than immediately dropping all characters of the text into a field, our platform types the text in a human-like manner.

Usage

Type into a text box

Wait for an element to appear before typing

universal parameters

API Request

Actions

Response

demo site.
Gaffa API Playground
POST endpoint
Wait
Click
Scroll
Capture Screenshot
Gaffa's full height screenshot

API Request

Actions

Response

demo site.
Gaffa API Playground
Wait
Generate Markdown
5KB
GaffaMarkdownExample.md
Open
Embed all CSS files

Currently, JavaScript will be disabled, and interactivity might not work as expected, but this feature should be useful for preserving the page state as it was and allowing you to view it offline.

See universal parameters

The following captures the current section of the page currently visible in the browser.

Here's an example that shows an offline snapshot of a site

Parameters

Usage

Example Output

518KB
GaffaSnapshotSample.mhtml
Open
{
  "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "max_media_bandwidth": null,
    "actions": [
      {
        "type": "wait",
        "selector": "table"
      },
      {
        "type": "print",
        "size": "A4",
        "margin": 20,
        "orientation": "portrait"
      }
    ]
  }
}
"actions": [
    {
        "type": "download_file",
        "timeout": 20000
    }
]
"actions": [
      {
        "id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
        "type": "download_file",
        "query": "download_file?continue_on_fail=false&timeout=20000",
        "timestamp": "2025-05-30T15:02:06.6615306Z",
        "output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf"
      }
    ]
"actions": [
    {
        "type": "generate_simplified_dom"
    }
]
"actions": [
    {
        "type": "wait",
        "time": 1000
    }
]
"actions": [
    {
        "type": "wait",
        "selector": "table",
        "timeout": 5000,
        "continue_on_fail": true
    }
]
"actions": [
    {
        "type": "print",
        "size": "A4",
        "orientation": "landscape",
        "margin": 20
    }
]
"actions": [
    {
      "type": "click",
      "selector": "a.header__logo"
    }
]
"actions": [
    {
        "type": "wait",
        "selector": "a.header__logo",
        "timeout": 5000,
        "continue_on_fail": true
    }
]
"actions": [
    {
        "type": "type",
        "selector": "#postform-text",
        "text": "Hello world!"
    }
]
"actions": [
    {
        "type": "type",
        "selector": "form input[name=\"email\"]",
        "text": "test@test.com",
        "timeout": 10000
    }
]
{
  "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=20",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "actions": [
      {
        "type": "wait",
        "selector": "div[role=\"dialog\"]",
        "timeout": 10000
      },
      {
        "type": "click",
        "selector": "[data-testid=\"accept-all-button\"]"
      },
      {
        "type": "wait",
        "selector": "[data-testid^=\"product-1\"]",
        "timeout": 5000
      },
      {
        "type": "scroll",
        "percentage": 100
      },
      {
        "type": "capture_screenshot",
        "size": "fullscreen"
      }
    ]
  }
}
{
  "url": "https://demo.gaffa.dev/simulate/article?loadTime=3&paragraphs=10&images=3",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "actions": [
      {
        "type": "wait",
        "selector": "article"
      },
      {
        "type": "generate_markdown"
      }
    ]
  }
}
"actions": [
    {
        "type": "capture_snapshot"
    }
]

string

The area of the page the screenshot should capture. Default: view Accepted: ["view", "fullscreen"]

See universal parameters.

The following captures the current section of the page currently visible in the browser.

An example screenshot in fullscreen mode.

Parameters

size

"actions": [
    {
        "type": "capture_screenshot",
        "size": "view"
    }
]

Usage

Example Output

string

The selector that defines the element whose contents you want to capture.

timeout

integer

The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

See universal parameters.

The following code will wait up to 1 second for the .page_contents element to appear and return an HTML file containing the element's innerHTML.

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

Parameters

innerHTML

selector

"actions": [
    {
      "type": "capture_element",
      "selector": ".page_contents",
      "timeout": 1000
    }
]

Usage

Click an element on the page

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.


For more information on browser requests, see here.


Parse an HTML Table to JSON

An example request that uses Gaffa to extract structured data (JSON) from a table on a webpage

The following example is a prebuilt request that demonstrates Gaffa's capabilities on our demo site. You can run this request right here in the Gaffa API Playground.

This example demonstrates how to extract tabular data from any webpage without writing a scraper. Gaffa renders the page using a real browser, waits for the table to load, and returns the rows as a clean JSON array, making it perfect for building data pipelines, monitoring dashboards, or feeding structured data into LLM workflows.

API Request

The request below uses the POST endpoint to load a demo table page, waits for the table element to appear, and parses each row into a structured JSON array, using the table's header row as property names.

{
  "url": "https://demo.gaffa.dev/simulate/table?loadTime=1&rowCount=3",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "actions": [
      {
        "type": "wait",
        "selector": "table",
        "timeout": 5000
      },
      {
        "type": "parse_table",
        "selector": "table"
      }
    ]
  }
}

Actions

Wait, Parse Table

Response

The parse_table action returns an output URL pointing to the extracted JSON:

{
  "data": {
    "id": "brq_abc123ExampleRequestId",
    "url": "https://demo.gaffa.dev/simulate/table?loadTime=1&rowCount=3",
    "state": "completed",
    "credit_usage": 1,
    "http_status_code": 200,
    "from_cache": false,
    "started_at": "2025-06-09T12:00:00.000Z",
    "completed_at": "2025-06-09T12:00:04.321Z",
    "running_time": "00:00:04.3210000",
    "page_load_time": "00:00:01.1230000",
    "actions": [
      {
        "id": "act_wait001",
        "type": "wait",
        "query": "wait?selector=table&timeout=5000&continue_on_fail=false",
        "timestamp": "2025-06-09T12:00:01.500Z"
      },
      {
        "id": "act_parse001",
        "type": "parse_table",
        "query": "parse_table?selector=table",
        "timestamp": "2025-06-09T12:00:01.600Z",
        "output": "https://storage.gaffa.dev/brq/results/brq_abc123ExampleRequestId/act_parse001_table.json"
      }
    ]
  }
}

Fetching that URL gives you the table rows as a ready-to-use array:
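As a rough Python sketch of that step, the helper below picks the output URL for a given action type out of a response shaped like the one above. The function name find_output_url is hypothetical and not part of Gaffa; downloading the file itself (for example with requests.get) is left out.

```python
from typing import Optional


def find_output_url(response: dict, action_type: str) -> Optional[str]:
    """Return the first URL output for the given action type, or None."""
    for action in response.get("data", {}).get("actions") or []:
        output = action.get("output")
        # Outputs can also be inline objects, so only accept URL strings.
        if (
            action.get("type") == action_type
            and isinstance(output, str)
            and output.startswith("http")
        ):
            return output
    return None


# Trimmed copy of the example response shown above.
example_response = {
    "data": {
        "state": "completed",
        "actions": [
            {"id": "act_wait001", "type": "wait"},
            {
                "id": "act_parse001",
                "type": "parse_table",
                "output": "https://storage.gaffa.dev/brq/results/brq_abc123ExampleRequestId/act_parse001_table.json",
            },
        ],
    }
}

print(find_output_url(example_response, "parse_table"))
```

Actions without a URL output, such as wait, simply yield None, so the caller can decide whether that is an error.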

Mapping Requests

Mapping requests allow you to extract all URLs from a website's sitemap. Gaffa mapping requests have the following useful features:

  • Sitemap Discovery: No need to manually find a site's sitemap URL; we'll find it automatically.

  • Caching: If you or another Gaffa user has retrieved a sitemap within a defined timeframe, we'll quickly return the cached data instead of fetching it again.

  • Index Traversal: If the sitemap references other sitemap files, we'll automatically process each one and add its URLs to the list, ensuring the entire hierarchy is captured.

  • Aggregation and Duplicate Prevention: In rare cases where the sitemap contains duplicate entries, we'll automatically remove them for you and return all URLs sorted alphabetically.

  • Proxies: Gaffa uses its residential proxies behind the scenes to ensure your sitemap retrieval requests aren't blocked.

The POST v1/site/map endpoint allows you to create a new request and await the result. The request has a simple payload: the URL of the site whose sitemap you want to extract, and max_cache_age, the maximum age (in milliseconds) of a cached response you are willing to accept. The default is 0, in which case Gaffa never returns a cached response.
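A minimal Python sketch of such a call, using only the standard library, might look like the following. It assumes the same base URL (https://api.gaffa.dev) and X-API-Key header used by the browser request script elsewhere in these docs; the helper names build_map_payload and request_site_map are hypothetical.

```python
import json
import urllib.request

BASE = "https://api.gaffa.dev"  # assumed: same base URL as the browser API


def build_map_payload(site_url: str, max_cache_age: int = 0) -> dict:
    """Build the JSON body for a mapping request."""
    if max_cache_age < 0:
        raise ValueError("max_cache_age must be zero or positive")
    return {"url": site_url, "max_cache_age": max_cache_age}


def request_site_map(api_key: str, site_url: str, max_cache_age: int = 0) -> dict:
    """POST the payload and return the decoded JSON response."""
    body = json.dumps(build_map_payload(site_url, max_cache_age)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE}/v1/site/map",
        data=body,
        # Header name assumed to match the browser request API.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    # Generous timeout because the call waits for the result; adjust as needed.
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp)


print(build_map_payload("https://gaffa.dev", 10000))
```

Calling request_site_map with a valid key should return an object whose data.links array holds the discovered URLs, as in the example response in this section.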

For the Gaffa site, this will return the following response:

As you'll see from the API Reference section of the site, there are also endpoints to retrieve the site mapping requests for your account.

See the Credits and Pricing page for the current cost of mapping requests.

Automated Form Filling

An example request that uses Gaffa to automate the completion of a form and waits for a success modal to appear.

The following example is a request we've prebuilt to show you Gaffa's capabilities on our demo site. You can run this request right now in the Gaffa API Playground.

API Request

{
  "url": "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=false&modalDelay=0&formType=address&firstName=John&lastName=Doe&address1=123%20Main%20Street&city=London&country=UK",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": true,
    "actions": [
      {
        "type": "type",
        "selector": "#email",
        "text": "johndoe@example.com"
      },
      {
        "type": "type",
        "selector": "#state",
        "text": "CA"
      },
      {
        "type": "type",
        "selector": "#zipCode",
        "text": "12345"
      },
      {
        "type": "click",
        "selector": "button[type='submit']"
      },
      {
        "type": "wait",
        "selector": "[role=\"dialog\"] h2:has-text(\"Success!\")",
        "timeout": 10000
      }
    ]
  }
}

Actions

Type, Click, Wait

Response

Here's a video showing Gaffa filling out the page and waiting for the success modal.

Read More

Read more about screen recording here (TODO).

Generate Markdown

Type: generate_markdown

The markdown output format exports page data (articles, tables, etc.) in a human- and LLM-readable format, removing unnecessary styling and other "junk" that is only relevant to the site's proper functioning.

Gaffa exports GitHub-flavoured markdown with comments removed and unknown tags ignored.

Name
Type
Required
Description
selector

Example Request

The request currently has a maximum running time of 60 seconds, after which an error will be returned.

Pricing

POST v1/site/map
API Reference section
Credits and Pricing page
[
  {
    "id": "1",
    "name": "Item 1",
    "quantity": "30",
    "price": "$56.05"
  },
  {
    "id": "2",
    "name": "Item 2",
    "quantity": "68",
    "price": "$76.89"
  },
  {
    "id": "3",
    "name": "Item 3",
    "quantity": "67",
    "price": "$20.44"
  }
]
{
  "url": "https://gaffa.dev",
  "max_cache_age": 10000
}
{
  "data": {
    "id": "smr_VQW4E66TdcQFZfCs6qavgdowPj3Bzk",
    "url": "https://gaffa.dev",
    "state": "completed",
    "credit_usage": 1,
    "from_cache": true,
    "started_at": "2025-08-22T11:05:43.328175Z",
    "completed_at": "2025-08-22T11:05:47.857941Z",
    "running_time": "00:00:04.5297660",
    "links": [
      "https://gaffa.dev",
      "https://gaffa.dev/about",
      "https://gaffa.dev/blog",
      "https://gaffa.dev/blog/convert-any-web-page-to-llm-ready-markdown-using-gaffa",
      "https://gaffa.dev/blog/how-to-extract-and-simplify-a-webpage-dom-with-gaffa",
      "https://gaffa.dev/blog/printing-webpages-to-pdf-html-to-pdf-using-gaffa",
      "https://gaffa.dev/docs",
      "https://gaffa.dev/docs/api-reference/api-authentication",
      ....and so on
    ],
    "link_count": 52
  }
}

selector

string

The CSS selector that defines the element you want to generate markdown from. This is useful if you are only interested in the contents of a certain element.

output_type

string

Whether the action output should be saved to a file, with a URL returned, or included directly in the response. Default: file Accepted: ["file", "inline"]

See universal parameters.

The following converts the current page to markdown:

The following converts only a specific element to markdown and returns it inline:

Parameters

GitHub-flavoured markdown
"actions": [
  {
    "type": "generate_markdown"
  }
]
"actions": [
  {
    "type": "generate_markdown",
    "selector": "article",
    "output_type": "inline"
  }
]

Usage

Example Output

GaffaMarkdownExample.md (5KB)

Scroll

Type: scroll

Request that the browser scrolls to a certain point on the page or, in the case of pages with infinite scrolling, scrolls for a particular amount of time.

Parameters

Name
Type
Required
Description

See universal parameters.

Gaffa gives you flexibility over how fast you scroll down the page, which can be really useful to get around restrictions enforced by some sites that detect and limit fast scrolling. By experimenting with scroll_speed and interval, you will be able to create the perfect scrolling action for your scenario. The speed settings are as follows:

  • instant - The page will smoothly scroll to the desired position immediately, useful for sites with no rate limits or loading events caused by scroll actions.

  • medium - Human-like scrolling at a normal speed to the desired position. Gaffa will scroll in much the same way as you would using a mouse.

  • slow - Human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling while reading a page.

interval allows you to adjust the scroll speed further by inserting pauses between scroll events.

If wait_time is set to 0, and Gaffa arrives at the desired location, then Gaffa will immediately mark the action as succeeded. However, if another value is set, the page will be monitored for the specified duration to check for further expansions. If, during this period, the page expands again, then Gaffa will continue scrolling to the desired location, and the wait will reset.
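To experiment with these settings from code, a small builder like the one below can help. It is only a sketch: build_scroll_action is a hypothetical helper, and the checks mirror the ranges and accepted values listed in the parameter table rather than any validation Gaffa itself is known to perform.

```python
from typing import Optional

ALLOWED_SPEEDS = ("slow", "medium", "instant")


def build_scroll_action(
    percentage: int = 100,
    scroll_speed: str = "medium",
    interval: int = 0,
    wait_time: int = 0,
    max_scroll_time: Optional[int] = None,
) -> dict:
    """Build a scroll action dict, checking values against the documented ranges."""
    if not -100 <= percentage <= 100:
        raise ValueError("percentage must be between -100 and 100")
    if scroll_speed not in ALLOWED_SPEEDS:
        raise ValueError(f"scroll_speed must be one of {ALLOWED_SPEEDS}")
    if interval < 0 or wait_time < 0:
        raise ValueError("interval and wait_time must not be negative")
    action = {
        "type": "scroll",
        "percentage": percentage,
        "scroll_speed": scroll_speed,
        "interval": interval,
        "wait_time": wait_time,
    }
    if max_scroll_time is not None:
        # Left out by default so the API's own default applies.
        action["max_scroll_time"] = max_scroll_time
    return action


# Slow infinite scroll: 1 s pauses between scroll events, 1 s wait for new content.
infinite = build_scroll_action(
    100, "slow", interval=1000, wait_time=1000, max_scroll_time=25000
)
print(infinite)
```

The resulting dict can be placed straight into the actions list of a browser request.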

The following code will scroll halfway down the page.

The following code will scroll to the bottom of the page and then keep scrolling when new content loads for a maximum of 25 seconds, waiting 1 second for new content and scrolling at a slow pace with 1 second between scroll actions.

Using the Gaffa LLMs.txt File with Your AI Assistant

AI assistants like ChatGPT or Claude can generate working code far more effectively when they have accurate, up-to-date context about an API. That's exactly what Gaffa's llms.txt file provides: a concise reference covering Gaffa's endpoints, actions, and code samples that you can drop directly into any AI assistant to get useful, accurate code from the very first prompt.

In this tutorial, we'll walk you through how to use the llms.txt file to build a complete Python script that interacts with the Gaffa API.

Step 1: Get the LLMs.txt File

Download or open the file at https://gaffa.dev/docs/llms.txt. It contains a concise overview of the Gaffa API, including available endpoints, actions, and example payloads.

Step 2: Load It Into Your AI Assistant

Start a new chat with ChatGPT, Claude, or your preferred AI assistant, then paste the full contents of the file into the conversation. This gives the assistant accurate, up-to-date context about the Gaffa API before you ask it anything.
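If you talk to your assistant through code rather than a chat window, the same idea can be sketched in Python: fetch the file, then put it in front of your task. The helper names fetch_llms_txt and build_prompt and the prompt wording are assumptions for illustration; how you send the prompt depends on the assistant you use.

```python
import urllib.request

LLMS_TXT_URL = "https://gaffa.dev/docs/llms.txt"


def fetch_llms_txt(url: str = LLMS_TXT_URL) -> str:
    """Download the llms.txt file and return it as text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def build_prompt(context: str, task: str) -> str:
    """Place the API reference before the task so the assistant reads it first."""
    return (
        "Use the following API reference as context.\n\n"
        f"{context.strip()}\n\n"
        f"Task: {task.strip()}"
    )


# A placeholder stands in for the real file here; use fetch_llms_txt() in practice.
prompt = build_prompt(
    "<contents of llms.txt>",
    "Write a Python script that converts a page to Markdown.",
)
print(prompt)
```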

Step 3: Ask the Assistant to Write Your Script

Once the assistant has the context loaded, you can ask it to build scripts for you. For example:

"Write me a Python script that uses Gaffa's browser API to convert a page into Markdown and save the output file locally."

Because the assistant already has the full API context, it can produce accurate code without you needing to explain endpoint structures or payload formats.

Here's an example of the kind of script your AI assistant might generate, based directly on the Gaffa API. It submits a browser request to convert a page to Markdown, polls until the request completes, and downloads the output file.

Run it with:

You'll see the job state printed in your terminal and a downloaded Markdown file saved to an outputs folder.

From here, you can modify the actions list to use other supported actions, such as capture_screenshot, capture_element, or parse_table. You can make these changes manually, or simply ask your AI assistant to adapt the script for you. Since it still has the llms.txt context loaded, it can adjust the code to your specific requirements without needing any further explanation.

Parse Table

Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

Type: parse_table

The parse_table action finds a table on a page using a CSS selector and converts it into a structured JSON array with no HTML parsing or post-processing required on your end.

The action reads the table's header row and converts each header into a property name (lowercased, with non-alphanumeric characters replaced by underscores). It then maps each cell value to its corresponding header for every row, returning a clean, ready-to-use JSON array. At the moment, all values are returned as string types.
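The naming rule described above can be approximated locally in Python, which is handy for predicting property names before you run a request. This is only a sketch of the stated rule (lowercase, then each non-alphanumeric character becomes an underscore); Gaffa's exact handling of edge cases such as repeated or leading special characters may differ, and both function names are hypothetical.

```python
import re


def normalize_header(header: str) -> str:
    """Lowercase a header and replace each non-alphanumeric character with "_"."""
    return re.sub(r"[^a-z0-9]", "_", header.strip().lower())


def rows_to_records(headers: list, rows: list) -> list:
    """Map each row's cells to the normalised header names, keeping values as strings."""
    keys = [normalize_header(h) for h in headers]
    return [dict(zip(keys, (str(cell) for cell in row))) for row in rows]


records = rows_to_records(
    ["Country/Territory", "IMF 2026"],
    [["United States", "30,337"], ["China", "19,534"]],
)
print(records)
```

Because every value stays a string, numeric columns such as "30,337" still need converting on your side if you want to do arithmetic with them.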

For cases where you need more control, such as handling merged cells, skipping rows, or applying custom transformations, consider using capture_dom with a parsing library like BeautifulSoup instead.

Parameters

Name
Type
Required
Description

See universal parameters.

The following request waits up to 1 second for a .large_table element to appear, then parses it into JSON:

Here is an example using Wikipedia's List of Countries by GDP (Nominal) page. Wikipedia applies a consistent CSS class to its data tables, making it straightforward to target with a selector:

Notice how column headers like "Country/Territory" and "IMF 2026" are automatically normalized into country_territory and imf_2026. Spaces and special characters are replaced with underscores, and everything is lowercased, so the output is immediately usable without any cleanup:

For a full walkthrough, including a comparison with the capture_dom + BeautifulSoup approach, see our blog post on how to scrape a table with Python (the easy way).

Actions

When making a Browser Request, you can specify a list of actions you want us to perform on the requested web page. These actions conform to the following format:

All actions have the following parameters:

Name
Type
Required
Description

Infinitely Scroll an E-commerce Site

An example request that uses Gaffa to infinitely scroll down a simulated ecommerce site whilst recording the interaction.

The following example is a request we've prebuilt to show you Gaffa's capabilities on our demo site. You can run this request right now in the Gaffa API Playground.

Gaffa automates infinite scrolling on dynamic pages, such as e-commerce storefronts. Set a duration, and Gaffa will capture all content as it scrolls. Each session can be recorded as a video for playback, letting you debug or review the interaction.

The request below uses the POST endpoint to open the demo site in the e-commerce site simulator, featuring an infinitely scrolling storefront. It will wait for and dismiss a dialog box, wait for a product to load, and then scroll down the page for a maximum of 20 seconds. If new items load, it will keep scrolling.

Here's a video showing Gaffa scrolling the page for 20 seconds as more items load.

Read more about screen recording here. (TODO)

selector
- Human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling while reading a page.

percentage

integer

The percentage the page should scroll up or down (+/-) Range: [-100 - 0 - 100] Default: 100 (% - scroll to bottom)

wait_time

integer

After arriving at the desired scroll location, this is the time (in milliseconds) Gaffa should monitor for changes to the page height before marking the action as succeeded. Read more below. Default: 0

max_scroll_time

integer

The maximum amount of time the page should be scrolled for, in milliseconds. After this time passes, the action will be cancelled. This doesn't cause the action to fail. Default: 20,000 (20s)

scroll_speed

string

The speed at which the page should scroll to the desired point. You can read more about this below. Default: medium Accepted: [slow, medium, instant]

interval

integer

The amount of time, in milliseconds, that scrolling should pause between scroll events. Read more about this below. Default: 0

timeout

integer

The maximum amount of time Gaffa will wait for the page to become scrollable Default: 0

selector

string

The CSS selector that identifies the element to scroll. If not provided, the page body will be scrolled.

Scroll Speed & Interval

We've found that some sites with infinite scrolling and strict rate limits respond better to instant scrolls to the bottom of the page, with large intervals between these scrolls to keep within rate limits.

Wait Time

This can be really useful if you find that the site takes some time to load additional items when you reach the bottom of the page, and more items load after the action has succeeded.

Usage

Scroll a particular percentage down the page

Scroll an infinitely scrolling webpage

Read more

universal parameters

How to Handle Infinite Scrolling and Dynamic Loading with Gaffa’s Scroll Action

Step 4: Example Script

Step 5: Extend and Customise

selector

string

The CSS selector that identifies the table you want to parse.

timeout

integer

The maximum time in milliseconds to wait for the table to appear. Default: 5000 (5s)

Usage

Basic examples

Real-world example

Sample output

universal parameters
List of Countries by GDP (Nominal)
how to scrape a table with Python (the easy way)
contact support

Parse PDF to Structured JSON

An example request that uses Gaffa to extract structured data from an online PDF.

The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right here in the Gaffa API Playground.

This example demonstrates how to extract data from PDF documents. Gaffa downloads the PDF and uses AI to intelligently parse the content according to your schema, making it perfect for building research databases, citation managers, or literature review tools.

This feature currently works for online PDFs.

API Request

The request below uses the POST endpoint to download a demo research paper from the hosted PDFs, wait for it to load, and then parse the first page to extract author information and paper metadata.
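Nested schemas like the one in this request get verbose quickly, so you may prefer to assemble them in code. The Python sketch below builds a trimmed version of the same data_schema; the field helper is hypothetical, and the field layout (type, name, description, optional nested fields) is simply copied from the request in this example.

```python
from typing import List, Optional


def field(name: str, type_: str, description: str,
          fields: Optional[List[dict]] = None) -> dict:
    """Build one schema field; nested fields are only attached when given."""
    entry = {"type": type_, "name": name, "description": description}
    if fields:
        entry["fields"] = fields
    return entry


schema = {
    "name": "AcademicPaper",
    "description": "Schema for parsing academic paper summary and author information",
    "fields": [
        field("title", "string", "The full title of the academic paper"),
        field("authors", "array", "List of authors who contributed to the paper", [
            field("name", "string", "Author's full name as it appears in the paper"),
            field("email", "string", "Author's contact email address if provided"),
        ]),
    ],
}

# The action mirrors the parse_json action in the request, minus instruction and model.
parse_action = {
    "type": "parse_json",
    "data_schema": schema,
    "output_type": "inline",
    "max_pages": 1,
}
print(parse_action["data_schema"]["fields"][1]["fields"][0]["name"])
```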

Actions

Download File, Parse JSON

Response

The parsed data is returned as a structured JSON object matching your schema:

"actions": [
      {
        "type": "scroll",
        "percentage": 50
      }
]
"actions": [
      {
        "type": "scroll",
        "percentage": 100,
        "scroll_speed": "slow",
        "max_scroll_time": 25000,
        "interval": 1000,
        "wait_time": 1000
      }
]
import os, time, requests, pathlib, urllib.parse

API_KEY = os.environ.get("GAFFA_API_KEY", "YOUR_API_KEY")
BASE = "https://api.gaffa.dev"

def submit_request(url, actions, async_mode=True):
    payload = {
        "url": url,
        "async": async_mode,
        "settings": {"actions": actions}
    }
    r = requests.post(
        f"{BASE}/v1/browser/requests",
        headers={"X-API-Key": API_KEY, "Content-Type": "application/json"},
        json=payload
    )
    r.raise_for_status()
    return r.json()["data"]

def wait_for_completion(request_id, poll_every=2, max_wait=180):
    start = time.time()
    while True:
        r = requests.get(
            f"{BASE}/v1/browser/requests/{request_id}",
            headers={"X-API-Key": API_KEY}
        )
        r.raise_for_status()  # surface HTTP errors instead of failing on a missing "data" key
        data = r.json()["data"]
        if data["state"] in ("completed", "failed"):
            return data
        if time.time() - start > max_wait:
            raise TimeoutError("Request timed out")
        time.sleep(poll_every)

def download_outputs(brq, dest="outputs"):
    dest = pathlib.Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    files = []
    for act in brq.get("actions") or []:
        out = act.get("output")
        if isinstance(out, str) and out.startswith("http"):
            name = pathlib.Path(urllib.parse.urlparse(out).path).name
            p = dest / name
            with requests.get(out, stream=True) as r:
                r.raise_for_status()  # avoid saving an error page as the output file
                with open(p, "wb") as f:
                    for chunk in r.iter_content(8192):
                        if chunk:
                            f.write(chunk)
            files.append(str(p))
    return files

if __name__ == "__main__":
    target_url = "https://demo.gaffa.dev/simulate/article?paragraphs=5"
    actions = [
        {"type": "wait", "selector": "article"},
        {"type": "generate_markdown"}
    ]
    job = submit_request(target_url, actions)
    brq = wait_for_completion(job["id"])
    print("Final state:", brq["state"])
    if brq["state"] == "completed":
        saved = download_outputs(brq)
        print("Downloaded:", saved)
python gaffa_script.py
{
  "url": "https://example.com",
  "settings": {
    "actions": [
      {
        "type": "parse_table",
        "selector": ".large_table",
        "timeout": 1000
      }
    ]
  }
}
{
  "url": "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)",
  "settings": {
    "actions": [
      {
        "type": "parse_table",
        "selector": ".wikitable",
        "timeout": 5000
      }
    ]
  }
}
[
  {
    "country_territory": "United States",
    "imf_2026": "30,337",
    "imf_year": "2026",
    "world_bank_2023": "27,361",
    "world_bank_year": "2023"
  },
  {
    "country_territory": "China",
    "imf_2026": "19,534",
    "imf_year": "2026",
    "world_bank_2023": "17,795",
    "world_bank_year": "2023"
  }
]
{
  "url": "https://demo.gaffa.dev/simulate/pdf/ReasoningAboutActionAndChange.pdf",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": false,
    "actions": [
      {
        "type": "download_file"
      },
      {
        "type": "parse_json",
        "data_schema": {
          "name": "AcademicPaper",
          "description": "Schema for parsing academic paper summary and author information",
          "fields": [
            {
              "type": "string",
              "name": "title",
              "description": "The full title of the academic paper"
            },
            {
              "type": "string",
              "name": "abstract",
              "description": "The paper's abstract or summary"
            },
            {
              "type": "array",
              "name": "authors",
              "description": "List of authors who contributed to the paper",
              "fields": [
                {
                  "type": "string",
                  "name": "name",
                  "description": "Author's full name as it appears in the paper"
                },
                {
                  "type": "array",
                  "name": "affiliations",
                  "description": "Institutional affiliations for this author",
                  "fields": [
                    {
                      "type": "string",
                      "name": "institution",
                      "description": "Name of the university or research institution"
                    },
                    {
                      "type": "string",
                      "name": "department",
                      "description": "Department or division name"
                    },
                    {
                      "type": "string",
                      "name": "city",
                      "description": "City where the institution is located"
                    },
                    {
                      "type": "string",
                      "name": "country",
                      "description": "Country of the institution"
                    }
                  ]
                },
                {
                  "type": "string",
                  "name": "email",
                  "description": "Author's contact email address if provided"
                }
              ]
            },
            {
              "type": "array",
              "name": "keywords",
              "description": "Key terms and topics covered in the paper",
              "fields": [
                {
                  "type": "string",
                  "name": "keyword",
                  "description": "Individual keyword or phrase"
                }
              ]
            }
          ]
        },
        "instruction": "Parse this academic paper focusing on the title, abstract, author information, and keywords typically found on the first page. Extract all author names, their institutional affiliations with department and location details, and their contact information.",
        "model": "gpt-4o-mini",
        "output_type": "inline",
        "max_pages": 1
      }
    ]
  }
}
{
    "data": {
        "id": "brq_VYfyVifa26oMpmX4YDeNN3iJDrhK3a",
        "url": "https://demo.gaffa.dev/simulate/pdf/ReasoningAboutActionAndChange.pdf",
        "state": "completed",
        "credit_usage": 0,
        "http_status_code": 200,
        "from_cache": false,
        "started_at": "2025-12-01T06:09:43.6125439Z",
        "completed_at": "2025-12-01T06:09:57.5453161Z",
        "running_time": "00:00:13.9327722",
        "page_load_time": "00:00:00.8959680",
        "actions": [
            {
                "id": "act_VYfyVhGPwQjur9XAu5XA47n2FozYfK",
                "type": "download_file",
                "timestamp": "2025-12-01T06:09:46.509484Z",
                "output": "https://storage.gaffa.dev/brq/downloads/brq_VYfyVifa26oMpmX4YDeNN3iJDrhK3a/ReasoningAboutActionAndChange.pdf"
            },
            {
                "id": "act_VYfyVjNHWzECbraio6xS6MqhYhiDWP",
                "type": "parse_json",
                "timestamp": "2025-12-01T06:09:57.5453056Z",
                "output": {
                    "title": "Reasoning about Action and Change",
                    "abstract": "This chapter presents the state of research concerning the formalisation of an agent reasoning about a dynamic system which can be partially observed and acted upon. We first define the basic concepts of the area: system states, ontic and epistemic actions, observations; then the basic reasoning processes: prediction, progression, regression, postdiction, filtering, abduction, and extrapolation. We then recall the classical action representation problems and show how these problems are solved in some standard frameworks. For space reasons, we focus on these major settings: the situation calculus, STRIPS and some propositional action languages, dynamic logic, and dynamic Bayesian networks. We finally address a special case of progression, namely belief update.",
                    "authors": [
                        {
                            "name": "Florence Dupin de Saint-Cyr",
                            "affiliations": [
                                {
                                    "institution": "IRIT-CNRS. Université Paul Sabatier",
                                    "department": "",
                                    "city": "Toulouse",
                                    "country": "France"
                                }
                            ],
                            "email": ""
                        },
                        {
                            "name": "Andreas Herzig",
                            "affiliations": [
                                {
                                    "institution": "IRIT-CNRS. Université Paul Sabatier",
                                    "department": "",
                                    "city": "Toulouse",
                                    "country": "France"
                                }
                            ],
                            "email": ""
                        },
                        {
                            "name": "Jérôme Lang",
                            "affiliations": [
                                {
                                    "institution": "CNRS, Université Paris-Dauphine, PSL Research University, LAMSADE",
                                    "department": "",
                                    "city": "Paris",
                                    "country": "France"
                                }
                            ],
                            "email": ""
                        },
                        {
                            "name": "Pierre Marquis",
                            "affiliations": [
                                {
                                    "institution": "CRIL-CNRS, Université d’Artois & Institut Universitaire de France",
                                    "department": "",
                                    "city": "Lens",
                                    "country": "France"
                                }
                            ],
                            "email": ""
                        }
                    ],
                    "keywords": []
                },
                "reference": "https://storage.gaffa.dev/brq/downloads/brq_VYfyVifa26oMpmX4YDeNN3iJDrhK3a/ReasoningAboutActionAndChange.pdf"
            }
        ]
    }
}

The type name of the action.

continue_on_fail

boolean

Should execution of further actions continue or throw an error if this action fails. Default: false

customId

string

A customId to help you find the action in the response. Default: null

Actions are carried out in the order they are submitted. Every action type has a continue_on_fail parameter, which defaults to false. This means that if any action fails, the execution of the browser request ends, and an error will be returned. Setting continue_on_fail to true ensures that all actions are carried out, regardless of the previous action's results, and an error will not be returned.
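Since a failing action no longer stops the request when continue_on_fail is true, it is worth checking each action's error field afterwards. The Python sketch below shows one way to do that; both helpers are hypothetical, the selector is made up, and the sample response is hand-written for illustration, following the id/type/output/error layout from the Response Format section.

```python
from typing import Optional


def with_common_params(action: dict, continue_on_fail: bool = False,
                       custom_id: Optional[str] = None) -> dict:
    """Return a copy of the action with the universal parameters added."""
    merged = dict(action)
    merged["continue_on_fail"] = continue_on_fail
    if custom_id is not None:
        merged["customId"] = custom_id
    return merged


def failed_actions(brq: dict) -> list:
    """Return the actions of a finished request that carry a non-empty error."""
    return [a for a in brq.get("actions") or [] if a.get("error")]


actions = [
    # Dismissing a banner is optional, so let later actions run if it fails.
    with_common_params(
        {"type": "click", "selector": "#cookie-banner-accept"},
        continue_on_fail=True,
        custom_id="dismiss-banner",
    ),
    {"type": "capture_screenshot", "size": "view"},
]

# Hand-written sample, not real API output.
sample_brq = {
    "actions": [
        {"id": "act_1", "type": "click", "error": "element not found"},
        {"id": "act_2", "type": "capture_screenshot", "output": "https://example.com/shot.png"},
    ]
}
for action in failed_actions(sample_brq):
    print(action["type"], "failed:", action["error"])
```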

As shown above, you can submit a customId with each action you submit to the API. We'll include this Id in the outputs from the browser request so you can find a certain action's output and/or status easily in the response.

When a browser request has completed, information on an action's execution

The Gaffa API supports the following actions, detailed below. Click the "read more" buttons to read more information about each type.

Type
Description
Read More
{
    "type": "", //the type of the action
    //other params follow as key value pairs
    "key": value //string, number, etc. 
}

type

Universal Parameters

making a Browser Request

string

{
    "id": "", //a unique id given to the action by Gaffa
    "type": "capture_screenshot", //the type of the action
    "query": "", //a representation of the action in querystring format
    "timestamp": "", //the UTC timestamp the action was executed
    "output": "", //if the action has an output, you will find a URL for this here
    "error": "" //if the request fails, the error message will be returned here
}

Action Execution

Custom Id

Response Format

Supported Actions

Actions without outputs

Actions with outputs

{
  "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=infinite",
  "proxy_location": null,
  "async": false,
  "max_cache_age": 0,
  "settings": {
    "record_request": true,
    "actions": [
      {
        "type": "wait",
        "selector": "div[role=\"dialog\"]",
        "timeout": 10000
      },
      {
        "type": "click",
        "selector": "[data-testid=\"accept-all-button\"]"
      },
      {
        "type": "wait",
        "selector": "[data-testid^=\"product-1\"]",
        "timeout": 5000
      },
      {
        "type": "scroll",
        "percentage": 100,
        "max_scroll_time": 20000
      }
    ]
  }
}

API Request

Actions

Response

Read More

demo site.
Gaffa API Playground
POST endpoint
Wait
Click
Scroll
Get Started

Settings

The settings object in your browser request allows you to configure various aspects of how your automation behaves. Below are all the available settings parameters you can use.


Screen Recording

Parameter: record_request (boolean)

By specifying record_request, you can ask Gaffa to screen record your automation and return a video in the response, allowing you to view the magic happening or to debug your automation.

Recording requests comes at an additional cost.

Example:

{
  "url": "https://example.com",
  "settings": {
    "record_request": true,
    "actions": [...]
  }
}

Parameter: max_media_bandwidth (integer or null)

If you're using Gaffa on a site with lots of images and videos but are more interested in the text data on the page, you can cap how much media content a page loads using the max_media_bandwidth setting. This makes your automation faster and prevents spending credits on data you aren't interested in.

You can set max_media_bandwidth in three ways:

  • "max_media_bandwidth": 0 — Block all images and videos completely

  • "max_media_bandwidth": 5 — Cap media downloads at 5MB (or any number you specify)

  • "max_media_bandwidth": null — No limit (default)

When the max_media_bandwidth value is set, Gaffa monitors the data being downloaded by the page. When the downloaded media exceeds the specified MB limit, any further downloads of images or videos will be cancelled.
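The behaviour described above can be pictured with a small simulation. This is only a sketch of the idea, not Gaffa's implementation: the function name is hypothetical, 1 MB is assumed to be 1,048,576 bytes, and a request is assumed to be cancelled once the running total has reached the limit (which also makes a limit of 0 block everything).

```python
from typing import Optional

MB = 1024 * 1024  # assumption: the exact unit Gaffa uses is not stated here


def simulate_media_cap(
    media_sizes_bytes: list[int], max_media_bandwidth: Optional[int]
) -> list[bool]:
    """For each media request, in order, report whether it would be allowed."""
    downloaded = 0
    decisions = []
    for size in media_sizes_bytes:
        if max_media_bandwidth is None:
            allowed = True  # null means no limit
        else:
            # 0 blocks everything; otherwise allow until the budget is used up.
            allowed = downloaded < max_media_bandwidth * MB
        if allowed:
            downloaded += size
        decisions.append(allowed)
    return decisions


four_images = [2 * MB] * 4
capped = simulate_media_cap(four_images, 5)
blocked = simulate_media_cap(four_images, 0)
unlimited = simulate_media_cap(four_images, None)
```

With a 5 MB cap, the first three 2 MB images load (the third pushes the total past the cap) and the fourth is cancelled.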

This setting is particularly useful for:

  • Scraping news articles for text only — Extract headlines and article content without downloading thumbnails

  • E-commerce price monitoring — Track product prices and descriptions without loading product images

  • Extracting reviews and text content — Capture customer reviews without profile pictures

Start with max_media_bandwidth: 0 for maximum savings, then adjust upward only if you encounter issues with specific sites. Setting a value of 0 will cause no images to load, which works well on most sites, but on some could lead to the site thinking you are using an ad blocker.

Example:

Learn more: See our detailed guide on optimizing browser requests with max_media_bandwidth, including real-world testing, use cases, and best practices.


Parameter: time_limit (integer)

Using the time_limit setting caps the maximum running time of the request in milliseconds. If this time expires, all incomplete actions will be canceled, and the request will return an error.

This cap must be less than the maximum request runtime specified in your plan; if not set, it defaults to that value.
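A client-side check along these lines can catch an invalid value before the request is sent. It is a minimal sketch: the function name and the plan maximum of 60000 ms are hypothetical, and a value equal to the plan maximum is assumed to be acceptable because that is also the default.

```python
from typing import Optional


def resolve_time_limit(requested_ms: Optional[int], plan_max_ms: int) -> int:
    """Return the time limit that would apply to a request, in milliseconds."""
    if requested_ms is None:
        # When time_limit is not set, the plan's maximum runtime applies.
        return plan_max_ms
    if requested_ms <= 0:
        raise ValueError("time_limit must be a positive number of milliseconds")
    if requested_ms > plan_max_ms:
        raise ValueError(
            f"time_limit {requested_ms} exceeds the plan maximum of {plan_max_ms}"
        )
    return requested_ms


PLAN_MAX_MS = 60000  # hypothetical plan limit, check your own plan
default_limit = resolve_time_limit(None, PLAN_MAX_MS)
custom_limit = resolve_time_limit(30000, PLAN_MAX_MS)
```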

Example:


Parameter: block_ads (boolean)

If you are automating or scraping content on ad-heavy websites, third-party ad network requests can slow down your page load significantly, even though you don't need them. By enabling block_ads, Gaffa intercepts and immediately aborts requests to known ad-serving domains before they load, reducing page load times without affecting the core page content.
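Conceptually, domain-based blocking comes down to checking each request's hostname against a list. The sketch below illustrates that idea only; the blocklist entries and the function name are hypothetical and do not reflect Gaffa's actual list or implementation.

```python
from urllib.parse import urlparse

# Hypothetical blocklist for illustration only.
AD_DOMAINS = {"ads.example.com", "tracker.example.net"}


def is_ad_request(url: str, ad_domains: set[str] = AD_DOMAINS) -> bool:
    """Return True when the URL's host is a listed domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == domain or host.endswith("." + domain) for domain in ad_domains)


checks = {
    "https://ads.example.com/banner.js": is_ad_request("https://ads.example.com/banner.js"),
    "https://cdn.ads.example.com/x.gif": is_ad_request("https://cdn.ads.example.com/x.gif"),
    "https://www.allrecipes.com/recipe": is_ad_request("https://www.allrecipes.com/recipe"),
}
```

Matching on the full domain or a dot-prefixed suffix avoids false positives such as notads.example.com.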

You can set block_ads in two ways:

  • "block_ads": false — Ad blocking disabled (default)

  • "block_ads": true — Ad blocking enabled

Example:


Parameter: actions (array)

The actions parameter defines the specific tasks you want Gaffa to perform on the page once it loads. Actions are executed in the order they appear in your array and can include tasks such as waiting for elements, capturing screenshots, generating Markdown, printing to PDF, and more.

We support different types of actions, each designed for specific automation needs.

Example:


Here's a browser request using multiple settings parameters:

Capture a full-height screenshot of a webpage

In just a few lines of JSON inlined in a single cURL command, you can automate:

  • Dismissing Wikipedia’s EU cookie consent banner (if present)

  • Waiting for the main heading on the Artificial Intelligence article

  • Scrolling through every section (lazy-loaded images and all)

  • Capturing a full-page PNG for archiving, visual regression, or documentation

All of this works without installing Playwright or managing headless browsers: Gaffa handles it for you server-side via the Browser Requests API.

  • A valid Gaffa API key

  • A simple HTTP client (cURL, Postman, axios, etc.).

  • Familiarity with the API Playground for testing browser requests.

  • A target URL. For this tutorial we'll use Wikipedia: https://en.wikipedia.org/wiki/Artificial_intelligence

1

Use cURL with the full JSON payload inlined to ensure Gaffa receives exactly what you intend:

Replace YOUR_API_KEY with your actual token from your Dashboard. This command has the following actions:

  1. Wait (optional): Detect and accept Wikipedia’s cookie banner if it appears. If this step fails, that simply means no banner was present or it did not load in time. Because continue_on_fail is set to true on these actions, Gaffa continues without halting the workflow, so the remaining steps still execute.
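The effect of continue_on_fail can be sketched with a tiny local simulation. This is an illustration of the semantics described above, not Gaffa's execution engine: run_actions and fake_execute are hypothetical, and every action states continue_on_fail explicitly so the sketch makes no claim about the default.

```python
from typing import Any, Callable


def run_actions(
    actions: list[dict[str, Any]], execute: Callable[[dict[str, Any]], None]
) -> list[tuple[str, str]]:
    """Run actions in order; stop at the first failure unless it allows continuing."""
    results = []
    for action in actions:
        try:
            execute(action)
            results.append((action["type"], "ok"))
        except Exception as exc:
            results.append((action["type"], f"failed: {exc}"))
            if not action["continue_on_fail"]:
                break  # a hard failure halts the remaining actions
    return results


def fake_execute(action: dict[str, Any]) -> None:
    # Pretend the cookie banner never appears, so anything targeting it times out.
    if action.get("selector") == "#cookie-policy-notice":
        raise TimeoutError("action_timed_out")


actions = [
    {"type": "wait", "selector": "#cookie-policy-notice", "continue_on_fail": True},
    {"type": "click", "selector": "#cookie-policy-notice", "continue_on_fail": True},
    {"type": "wait", "selector": "#firstHeading", "continue_on_fail": False},
    {"type": "capture_screenshot", "continue_on_fail": False},
]
results = run_actions(actions, fake_execute)
```

Here both banner steps fail but the heading wait and the screenshot still run; flipping the first action's flag to false would stop the run after one step.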

If you don't want to use cURL, you can also run this query in the Gaffa API Playground, which is an easy way to get started.

Gaffa's screenshot action could be used for a huge number of use cases, but here are a few ideas:

  • Visual Regression: Integrate into your CI pipeline to compare changes over time.

  • Archival: Schedule daily captures for audit or compliance purposes.

  • Monitoring: Automate periodic checks to detect visual bugs or layout shifts.

Convert any webpage into LLM-ready Markdown using Gaffa

The ability to convert websites into LLM-friendly markdown is powerful when building applications for summarization, Q&A, or knowledge extraction. In this guide, you'll learn how to use the Gaffa API to extract the main content of any web page using browser rendering and convert it into structured markdown.

By the end of this guide, you’ll be able to:

  • Render web pages using Gaffa’s API.

  • Extract clean page content.

  • Generate structured markdown suitable for LLM-based Q&A or summarization.

  1. Install Python 3.10 or newer.

  2. Create a virtual environment.

  3. Install the required libraries.

  4. Get your Gaffa API key and OpenAI API key, and store them as environment variables:

In the code below, we define a function that takes a URL as input and makes a POST request to the Gaffa API, invoking the generate_markdown action, which uses the browser rendering engine to extract the page's main content and convert it to markdown.

Now that we have the markdown content, we can ask questions about it using the OpenAI API. The function below takes markdown content and a question as input, then uses the OpenAI API to generate an answer based on the provided content. In this case, we are using the gpt-3.5-turbo model, but you can choose any other model.

The markdown becomes the model’s context, enabling accurate answers about the original web content.
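Because only part of a long page fits into the prompt, it can help to trim the markdown on a line boundary rather than mid-sentence. The helper below is a hypothetical sketch; the 3000-character budget simply mirrors the slice used in this tutorial's script and is not a recommendation.

```python
def trim_context(markdown: str, max_chars: int = 3000) -> str:
    """Trim markdown to at most max_chars, cutting at the last line break if possible."""
    if len(markdown) <= max_chars:
        return markdown
    cut = markdown.rfind("\n", 0, max_chars)
    # Fall back to a hard cut when there is no line break inside the budget.
    return markdown[:cut] if cut > 0 else markdown[:max_chars]


sample = "# Title\n" + "a" * 10 + "\n" + "b" * 10
trimmed = trim_context(sample, 25)
```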

Having defined the functions, we can now create a simple command-line interface that lets users enter a URL and ask questions about its content.

The full script is available to download from the Gaffa Python Examples GitHub repo.

To run the script, simply execute it in your terminal:

With your script running, you can enter any web page URL, and it will fetch the markdown content and let you ask questions about it.

  • click: Click on a given element. Read more: Click

  • scroll: Scroll to a particular point on the page or, in the case of pages with infinite scrolling, scroll until a given time has elapsed. Read more: Scroll

  • type: Type the provided text into a given element. Read more: Type

  • wait: Wait for a given time to elapse or an element to appear on the page before proceeding to the next action. Read more: Wait

  • capture_cookies: Save a JSON object of cookies for the current page. Read more: Capture Cookies

  • capture_dom: Export the raw DOM page data. Read more: DOM

  • capture_screenshot: Capture a screenshot of the web page. Read more: Screenshot

  • capture_snapshot: Create a completely static version of the web page which can be accessed offline. Read more: Snapshot

  • download_file: Download an online file using Gaffa. Read more: Download File

  • generate_markdown: Convert the page into markdown. Read more: Markdown

  • generate_simplified_dom: Generate a simplified version of the DOM. Read more: Simplified DOM

  • parse_json: Parse online data to a defined JSON schema. Read more: JSON Parsing

  • print: Print the web page to a PDF. Read more: Print

SEO and content analysis — Analyze page structure, headings, and text without media files

Max Media Bandwidth

Setting Options

How It Works

Important: When enabled, only image and video downloads are blocked. HTML, CSS, JavaScript, and other essential page resources load normally, preserving functionality.

Common Use Cases

Performance Benefits: Testing on image-heavy news sites showed up to 43% token savings with no loss of text data. Sites with more media content see even greater savings in both cost and request speed.

When NOT to Use: Not recommended for capturing screenshots, verifying images, or analysing visual content.

Getting Started

Time Limit

Ad Blocking

Beta feature: Ad blocking is available to all users but is currently in beta. If you encounter ad networks that aren't being blocked, get in touch, and we'll add them.

Setting options

Actions

Complete Example

guide
Learn more about all available actions here
  2. Wait: Ensure the main heading (#firstHeading) is loaded.

  3. Scroll: Scroll through the entire page to trigger any lazy-loaded content.

  4. Capture Screenshot: Produce a full-page PNG.

  • 2

    Retrieve Your Screenshot

    A successful response returns JSON like:

    The response contains the following information:

    • data.id: Unique request identifier.

    • data.state: "completed" means the workflow finished (even if some steps timed out).

    • data.credit_usage: Credits consumed for this run.

    • data.started_at / data.completed_at: Workflow timing.

    • data.running_time and data.page_load_time: Performance metrics.

    • data.actions: Each action’s details, including successes, timeouts, and final screenshot URL.

    Within the list of actions, you'll be able to see the capture_screenshot action, which contains an output parameter containing the full-size screenshot that was captured.
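Looking the action up by type is less brittle than relying on its position in the list. The sketch below shows one way to do that and to convert running_time into seconds; both helper names are hypothetical and the sample is a trimmed copy of the response shown in this tutorial.

```python
from typing import Any, Optional


def find_action_output(response: dict[str, Any], action_type: str) -> Optional[str]:
    """Return the output of the first action of the given type, or None."""
    data = response.get("data") or {}
    for action in data.get("actions", []):
        if action.get("type") == action_type and action.get("output"):
            return action["output"]
    return None


def running_time_seconds(value: str) -> float:
    """Convert an "HH:MM:SS.fffffff" duration string into seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


sample_response = {
    "data": {
        "state": "completed",
        "running_time": "00:00:40.7348244",
        "actions": [
            {"type": "wait", "error": "action_timed_out"},
            {
                "type": "capture_screenshot",
                "output": "https://storage.gaffa.dev/brq/image/brq_VJX3mbESLiyCFYvZQEUih9RdDYovog/act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU_full.png",
            },
        ],
    },
    "error": None,
}
screenshot_url = find_action_output(sample_response, "capture_screenshot")
elapsed = running_time_seconds(sample_response["data"]["running_time"])
```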

    Prerequisites

    Execute the Request

    Use Cases

    All this is powered by Gaffa’s hosted headless browsers with no local setup required. Experiment with more actions and easily build complex browser workflows. Refer to the full Browser Requests API documentation for additional capabilities.

    Browser Requests API
    API Playground
    Dashboard.
    Gaffa API Playground
    https://en.wikipedia.org/wiki/Artificial_intelligence

    Prerequisites

    Convert a webpage to Markdown

    Ask questions using OpenAI

    User Interaction and Execution

    Full Script

    Running the Script

    Gaffa API
    OpenAI API
    generate_markdown
    gpt-3.5-turbo
    Gaffa Python Examples GitHub repo
    {
      "url": "https://www.bbc.com/",
      "settings": {
        "max_media_bandwidth": 0,
        "actions": [
          {
            "type": "generate_markdown"
          }
        ]
      }
    }
    {
      "url": "https://example.com",
      "settings": {
        "time_limit": 30000,
        "actions": [...]
      }
    }
    {
      "url": "https://www.allrecipes.com",
      "settings": {
        "block_ads": true,
        "actions": [
          {
            "type": "capture_dom"
          }
        ]
      }
    }
    {
      "url": "https://example.com",
      "settings": {
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }
    {
      "url": "https://www.bbc.com/",
      "proxy_location": "us",
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "max_media_bandwidth": 0,
        "time_limit": 60000,
        "block_ads": true,
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }
    {
      "data": {
        "id": "brq_VJX3mbESLiyCFYvZQEUih9RdDYovog",
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "proxy_location": null,
        "state": "completed",
        "credit_usage": 2,
        "http_status_code": 200,
        "from_cache": false,
        "started_at": "2025-06-09T15:55:46.4235903Z",
        "completed_at": "2025-06-09T15:56:27.9381332Z",
        "running_time": "00:00:40.7348244",
        "page_load_time": "00:00:02.2087117",
        "actions": [
          {
            "id": "act_VJX3memaue6YUgFcn44uNscZbVUpYg",
            "type": "wait",
            "query": "wait?selector=%23cookie-policy-notice%2C%20.mw-cookie-consent-container&timeout=10000&continue_on_fail=true",
            "timestamp": "2025-06-09T15:55:48.6323091Z",
            "error": "action_timed_out"
          },
          {
            "id": "act_VJX3mkwfwNPdGiMUpqKr34Tm5xzyUU",
            "type": "click",
            "query": "click?selector=%23cookie-policy-notice%20button%2C%20.mw-cookie-consent-container%20button&continue_on_fail=true&timeout=5000",
            "timestamp": "2025-06-09T15:55:58.7949275Z",
            "error": "action_timed_out"
          },
          {
            "id": "act_VJX3mkSJ3sevWRXUCjFy6zwfD172fV",
            "type": "wait",
            "query": "wait?selector=%23firstHeading&timeout=10000&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:03.9581113Z"
          },
          {
            "id": "act_VJX3mbq9Jgj8EwADszW2AqdeJJXJiY",
            "type": "scroll",
            "query": "scroll?percentage=100&max_scroll_time=20000&scroll_speed=medium&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:03.9691994Z"
          },
          {
            "id": "act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU",
            "type": "capture_screenshot",
            "query": "capture_screenshot?size=fullscreen&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:20.0727905Z",
            "output": "https://storage.gaffa.dev/brq/image/brq_VJX3mbESLiyCFYvZQEUih9RdDYovog/act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU_full.png"
          }
        ]
      },
      "error": null
    }
    curl https://api.gaffa.dev/v1/browser/requests \
      --request POST \
      --header 'Content-Type: application/json' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --data '{
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "async": false,
        "max_cache_age": 0,
        "settings": {
          "actions": [
            {
              "type": "wait",
              "selector": "#cookie-policy-notice",
              "timeout": 10000,
              "continue_on_fail": true
            },
            {
              "type": "click",
              "selector": "#cookie-policy-notice",
              "continue_on_fail": true
            },
            {
              "type": "wait",
              "selector": "#firstHeading",
              "timeout": 10000
            },
            {
              "type": "scroll",
              "percentage": 100
            },
            {
              "type": "capture_screenshot",
              "size": "fullscreen"
            }
          ]
        }
      }'
    python -m venv venv && source venv/bin/activate
    pip install requests openai
    GAFFA_API_KEY=your_gaffa_api_key
    OPENAI_API_KEY=your_openai_api_key
    import os
    
    import requests
    import openai
    
    GAFFA_API_KEY = os.getenv("GAFFA_API_KEY")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    
    # Fetch the markdown content from Gaffa
    def fetch_markdown_with_gaffa(url):
        payload = {
            "url": url,
            "proxy_location": None,
            "async": False,
            "max_cache_age": 0,
            "settings": {
                "record_request": False,
                "actions": [
                    {
                        "type": "wait",
                        "selector": "article"
                    },
                    {
                        "type": "generate_markdown"
                    }
                ]
            }
        }
       
        # Set the headers for the request
        headers = {
            "x-api-key": GAFFA_API_KEY,
            "Content-Type": "application/json"
        }
        # Make the POST request to the Gaffa API
        print("Calling Gaffa API to generate markdown...")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", json=payload, headers=headers)
        response.raise_for_status()
       
        # Extract the markdown URL from the response
        markdown_url = response.json()["data"]["actions"][1]["output"]
       
        # Fetch the markdown content from the generated URL
        print(f"📥 Fetching markdown from: {markdown_url}")
        markdown_response = requests.get(markdown_url)
        markdown_response.raise_for_status()
       
        return markdown_response.text
    def ask_question(markdown, question):
        # openai>=1.0 replaced the module-level openai.ChatCompletion API with a client object
        client = openai.OpenAI(api_key=OPENAI_API_KEY)
        prompt = (
            f"You are an assistant helping analyze different webpages.\n\n"
            f"Markdown content:\n{markdown[:3000]}\n\n"
            f"Question: {question}\nAnswer as clearly as possible."
        )
    
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content
    def main():
        url = input("Enter the URL of the article: ")
        try:
            markdown = fetch_markdown_with_gaffa(url)
            print("\n✅ Markdown successfully retrieved from Gaffa.\n")
    
            while True:
                question = input("Ask a question about the content (or type 'exit'): ")
                if question.lower() == "exit":
                    break
                answer = ask_question(markdown, question)
                print(f"\n💬 Answer: {answer}\n")
    
        except Exception as e:
            print(f"⚠️ Error: {e}")
    
    if __name__ == "__main__":
        main()
    python your_script_name.py

    How to scrape all images from a website using Gaffa

    This tutorial will show you how you can use Gaffa to retrieve all pages from a site and then download all images across those pages.

    Automating the collection of images from a website can save hours of manual work. Whether you're a marketer building a competitor analysis, a developer creating a dataset, or an archiver preserving digital content, doing this manually is tedious and error-prone.

    In this tutorial, you'll learn how to use Gaffa's powerful Mapping and Browser Requests endpoints to automatically find, extract, and download every image from a website in a short Python script. We'll leverage features like the capture_dom action, intelligent sitemap parsing, and the download_file action to handle this efficiently and responsibly.

    By the end of this guide, you'll be able to:

    • Use Gaffa's site/map endpoint to discover every page on a site.

    • Render each page with a headless browser to capture its full DOM.

    • Parse and download all images using Gaffa's download_file action with residential proxies.

    • Run the process at scale with built-in proxy rotation and caching.

    • Python 3.10+ is installed on your machine.

    • A Gaffa API key. Sign up for a free account and get your API key from the dashboard.

    • Basic familiarity with the command line.

    1

    First, create a new project directory and install the required Python libraries.

    Next, set your Gaffa API key as an environment variable to keep it secure.

    2

    Let's build the script step-by-step. The core logic consists of three main parts: mapping the site, capturing the DOM for each page, and extracting images using Gaffa's download system.

    Fetch All URLs from the Sitemap

    • Handles JavaScript-Rendered Content: Unlike simple HTTP scrapers, Gaffa uses a real browser, so it captures anything that is lazy-loaded by JavaScript.

    • Stealth Downloading with Residential Proxies: The download_file action uses real browsers and proxies, making your requests appear as legitimate user traffic.

    • Intelligent Caching: With `max_cache_age` set to 24 hours, repeated requests for the same image are served from cache, reducing load on target servers and improving efficiency.

    This technique is useful for far more than just downloading pictures. Here are a few ideas:

    • Competitive Analysis: Analyze competitors' product photography styles using real browsers.

    • AI/ML Datasets: Build large, curated image datasets for training computer vision models using ethically sourced images.

    • Website Migration & Audits: Download all assets from an old site before a migration while minimizing server impact through caching.

    The full script is available on our GitHub repository.

    Ready to automate your image collection with enterprise-grade infrastructure? Sign up for Gaffa and start building today.

    Parse HTML Form to Structured JSON

    An example request that uses Gaffa to analyze a web form and extract all input fields, their labels, types, and properties into structured JSON.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right here in the Gaffa API Playground.

    This example demonstrates how to extract structured information from HTML forms on web pages. Gaffa uses AI to identify form elements and their properties, making it perfect for form automation, testing, accessibility audits, or building form-filling assistants.

    The request below uses the POST endpoint to open the demo form page, wait for the modal to appear, and then parse the visible form to extract all field information, including labels, input names, placeholders, and dropdown options.
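The data_schema in this request repeats the same three sub-fields (label, placeholder, input_name) for every text input, so it can be convenient to generate it in code. The helper below is a hypothetical sketch that reproduces that part of the schema; the field names and descriptions are copied from the request, everything else is an assumption.

```python
import json


def text_input_field(name: str, description: str) -> dict:
    """Build the object schema used for a single text input in the form."""
    return {
        "type": "object",
        "name": name,
        "description": description,
        "fields": [
            {"type": "string", "name": "label", "description": "The visible label text"},
            {"type": "string", "name": "placeholder", "description": "Placeholder text shown in the input"},
            {"type": "string", "name": "input_name", "description": "The name attribute of the input element"},
        ],
    }


text_inputs = [
    ("full_name", "Full name input field"),
    ("address_line_1", "First address line input field"),
    ("address_line_2", "Second address line input field"),
    ("city", "City input field"),
    ("postcode", "Postcode or ZIP code input field"),
]
fields = [text_input_field(name, description) for name, description in text_inputs]
schema_json = json.dumps(fields, indent=2)  # ready to splice into "data_schema"
```

The form_title string field and the country dropdown have different shapes and would still be written out by hand.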

    The parsed form data is returned as a structured JSON object:

    Gaffa can help automatically fill out your forms!

    API Request

    Actions

    Response

    demo site
    Gaffa API Playground
    POST endpoint
    Parse JSON
    The site/map endpoint is our starting point. It does the heavy lifting of discovery by reading the sitemap, traversing potential link-outs, and retrieving every page on the website you want to scrape.

    Capture the Rendered DOM of a Page

    For each URL, we use Gaffa to fully render the page (including JavaScript execution) and capture the final DOM. This is an important step since many websites are actually not fully rendered when we receive them. They contain links to JavaScript files that need to be executed first. These scripts will load further content from the backend, load images and other data. It’s necessary to first generate a fully rendered page before diving deeper into scraping it; otherwise, we would only scrape the content already provided in the initial HTML.
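A minimal sketch of this step, using only the standard library, might look like the following. The function names are hypothetical and it assumes that capture_dom, like the other output actions, returns a URL in its output field; the endpoint and the X-API-Key header are the ones used elsewhere in these docs.

```python
import json
import urllib.request

GAFFA_ENDPOINT = "https://api.gaffa.dev/v1/browser/requests"


def build_capture_dom_payload(url: str) -> dict:
    """Build a synchronous browser request that captures the rendered DOM."""
    return {
        "url": url,
        "async": False,
        "settings": {"actions": [{"type": "capture_dom"}]},
    }


def get_rendered_dom(url: str, api_key: str) -> str:
    """Send the request to Gaffa and download the captured DOM as text."""
    request = urllib.request.Request(
        GAFFA_ENDPOINT,
        data=json.dumps(build_capture_dom_payload(url)).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: the capture_dom action exposes a download URL in "output".
    dom_url = next(
        action["output"]
        for action in body["data"]["actions"]
        if action["type"] == "capture_dom"
    )
    with urllib.request.urlopen(dom_url) as dom_response:
        return dom_response.read().decode("utf-8")


payload = build_capture_dom_payload("https://gaffa.dev")
```

Calling get_rendered_dom needs a valid API key and network access, so only the payload is built here.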

    Extract Images and Download with Gaffa

    With the real HTML in hand, we extract image URLs using a simple regex pattern and use Gaffa's download_file action for secure, reliable downloads. This also allows us to use caching, which avoids downloading the same image over and over again and putting a load on the target server.
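As a rough sketch of the regex step, the function below pulls src values out of img tags, resolves them against the page URL, and removes duplicates. The pattern and the function body are assumptions for illustration and may differ from the tutorial's full script; a regex is a pragmatic shortcut here, not a complete HTML parser.

```python
import re
from urllib.parse import urljoin

# Match the src attribute of <img> tags; the leading whitespace keeps data-src out.
IMG_SRC_PATTERN = re.compile(
    r"""<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def extract_image_urls(html: str, base_url: str) -> list[str]:
    """Return absolute, de-duplicated image URLs in document order."""
    seen = set()
    urls = []
    for src in IMG_SRC_PATTERN.findall(html):
        if src.startswith("data:"):
            continue  # inline data URIs have nothing to download
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


sample_html = (
    '<img src="/a.png">'
    '<img class="x" src="https://cdn.example.com/b.jpg">'
    '<img data-src="lazy.png">'
    '<img src="/a.png">'
    '<img src="data:image/png;base64,AAAA">'
)
image_urls = extract_image_urls(sample_html, "https://example.com/page")
```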

    3

    Bringing It All Together

    The main() function orchestrates the entire workflow: mapping the site, processing each page, and downloading the images using Gaffa's infrastructure.

    4

    Run the Script

    Save the complete code to a file like gaffa_scrape_images.py and run it from your terminal:

    Sit back and watch as Gaffa automatically discovers, renders, and scrapes every image from the site using proxies and real browsers. The script will create timestamped folders and save all the images there.

    • Built-in Reliability: Gaffa's infrastructure handles proxy rotation, request pacing, and retries automatically, and provides the correct file format directly.

  • Respectful Scraping: Gaffa's infrastructure is designed for responsible automation. Always check a website's robots.txt and terms of service before scraping, and respect reasonable rate limits.

  • Archival & Documentation: Preserve visual evidence for journalism or create backups of a site's visual content using proxies for access.

    Prerequisites

    Set Up Your Environment

    The Core Script Explained

    Why This Gaffa-Powered Approach is Superior

    Use Cases and Ideas

    Next Steps

    download_file
    Sign up for a free account
    GitHub repository
    Sign up for Gaffa
    {
      "url": "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=true&modalDelay=5&formType=address",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "parse_json",
            "data_schema": {
              "name": "AddressFormSchema",
              "description": "Extracts fields, labels, and placeholders from the demo address form",
              "fields": [
                {
                  "type": "string",
                  "name": "form_title",
                  "description": "The heading or title of the form"
                },
                {
                  "type": "object",
                  "name": "full_name",
                  "description": "Full name input field",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "placeholder",
                      "description": "Placeholder text shown in the input"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the input element"
                    }
                  ]
                },
                {
                  "type": "object",
                  "name": "address_line_1",
                  "description": "First address line input field",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "placeholder",
                      "description": "Placeholder text shown in the input"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the input element"
                    }
                  ]
                },
                {
                  "type": "object",
                  "name": "address_line_2",
                  "description": "Second address line input field",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "placeholder",
                      "description": "Placeholder text shown in the input"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the input element"
                    }
                  ]
                },
                {
                  "type": "object",
                  "name": "city",
                  "description": "City input field",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "placeholder",
                      "description": "Placeholder text shown in the input"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the input element"
                    }
                  ]
                },
                {
                  "type": "object",
                  "name": "postcode",
                  "description": "Postcode or ZIP code input field",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "placeholder",
                      "description": "Placeholder text shown in the input"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the input element"
                    }
                  ]
                },
                {
                  "type": "object",
                  "name": "country",
                  "description": "Country selection dropdown",
                  "fields": [
                    {
                      "type": "string",
                      "name": "label",
                      "description": "The visible label text"
                    },
                    {
                      "type": "string",
                      "name": "input_name",
                      "description": "The name attribute of the select element"
                    },
                    {
                      "type": "array",
                      "name": "options",
                      "description": "Available country options in the dropdown",
                      "fields": [
                        {
                          "type": "string",
                          "name": "value",
                          "description": "The option value or text"
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            "instruction": "Extract all visible form fields from this address form, including their labels, input names, placeholders, and for dropdown fields, list all available options.",
            "model": "gpt-4o-mini",
            "output_type": "inline"
          }
        ]
      }
    }
    {
        "data": {
            "id": "brq_VYg5H56A7m4vLJTdzj2jB3MgTAfT7K",
            "url": "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=true&modalDelay=5&formType=address",
            "state": "completed",
            "credit_usage": 0,
            "http_status_code": 200,
            "from_cache": false,
            "started_at": "2025-12-01T06:40:15.9241312Z",
            "completed_at": "2025-12-01T06:40:23.7495525Z",
            "running_time": "00:00:07.8254213",
            "page_load_time": "00:00:00.3124478",
            "actions": [
                {
                    "id": "act_VYg5HDUFBrWq1GdmhQruRq4Gp7hjAk",
                    "type": "parse_json",
                    "timestamp": "2025-12-01T06:40:23.7495396Z",
                    "output": {
                        "form_title": "Address Form",
                        "full_name": {
                            "label": "Full Name",
                            "placeholder": "Enter your full name",
                            "input_name": "full_name"
                        },
                        "address_line_1": {
                            "label": "Address Line 1",
                            "placeholder": "Enter your address",
                            "input_name": "address_line_1"
                        },
                        "address_line_2": {
                            "label": "Address Line 2",
                            "placeholder": "Optional",
                            "input_name": "address_line_2"
                        },
                        "city": {
                            "label": "City",
                            "placeholder": "Enter your city",
                            "input_name": "city"
                        },
                        "postcode": {
                            "label": "Postcode",
                            "placeholder": "Enter your postcode",
                            "input_name": "postcode"
                        },
                        "country": {
                            "label": "Country",
                            "input_name": "country",
                            "options": [
                                {
                                    "value": "United States"
                                },
                                {
                                    "value": "Canada"
                                },
                                {
                                    "value": "United Kingdom"
                                },
                                {
                                    "value": "Australia"
                                },
                                {
                                    "value": "Germany"
                                }
                            ]
                        }
                    },
                    "reference": "https://storage.gaffa.dev/brq/dom/brq_VYg5H56A7m4vLJTdzj2jB3MgTAfT7K/act_VYg5HDUFBrWq1GdmhQruRq4Gp7hjAk_raw.txt"
                }
            ]
        }
    }
    def main():
        site_url = "https://gaffa.dev"
        sitemap_urls = get_sitemap_urls(site_url)[:3]
        
        for i, url in enumerate(sitemap_urls, 1):
            dom_content = get_dom(url)
            image_urls = extract_image_urls(dom_content, url)
            
            if image_urls:
                download_image(image_urls[0], f"image_{i}")
    
    if __name__ == "__main__":
        main()
    python3 gaffa_scrape_images.py
    # Create a new directory and navigate into it
    mkdir gaffa-image-scraper && cd gaffa-image-scraper
    
    # Create a virtual environment (optional but recommended)
    python -m venv venv
    source venv/bin/activate
    # On macOS/Linux
    export GAFFA_API_KEY='your_gaffa_api_key_here'
    def get_sitemap_urls(site_url, max_cache_age=86400):
        payload = {
            "url": site_url,
            "max_cache_age": max_cache_age
        }
        print("Retrieving sitemap URLs.")
        response = requests.post("https://api.gaffa.dev/v1/site/map", 
            json=payload, headers=HEADERS)
        return response.json()["data"]["links"]
    def get_dom(url):
        payload = {
            "url": url,
            "async": False,
            "settings": {
                "actions": [
                    {"type": "wait", "selector": "img", "timeout": 20000},
                    {"type": "capture_dom"}
                ],
                "time_limit": 40000
            }
        }
        print("Capturing DOM URL.")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", 
            json=payload, headers=HEADERS)
        dom_url = response.json()["data"]["actions"][1]["output"]
        print("Retrieving DOM.")
        dom_response = requests.get(dom_url)
        return dom_response.text
    def extract_image_urls(dom_content, base_url):
        image_urls = []
        src_pattern = r'<img[^>]+(?:src|data-src)=["\']([^"\']+)["\']'
        matches = re.findall(src_pattern, dom_content)
        
        for src in matches:
            if not src.startswith(('http:', 'https:')):
                src = urljoin(base_url, src)
            image_urls.append(src)
        
        return image_urls
    
    def download_image(image_url, filename):
        payload = {
            "url": image_url,
            "async": False,
            "settings": {
                "actions": [{"type": "download_file"}]
            }
        }
        print("Retrieving download URL.")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", json=payload, headers=HEADERS)
        actions = response.json()["data"]["actions"]
        download_url = actions[0]["output"]
        download_ext = os.path.splitext(download_url)[1]
        
        print("Downloading image.")
        img_response = requests.get(download_url)
        filepath = f"{filename}{download_ext}"
        with open(filepath, 'wb') as f:
            f.write(img_response.content)
    

    Extract and Fill Web Forms Automatically Using Gaffa

    Web forms are among the most common and repetitive elements developers interact with, whether you are collecting data, testing user flows, or building other automation systems.

    In this guide, you'll learn how to use the parse_json action to extract the structure of a web form and then automatically fill and submit it using Gaffa's browser automation features.

    By the end of this guide, you will be able to:

    • Extract structured form data (labels, input names, required fields, and placeholders) using parse_json

    • Define and use schemas to reliably understand page structure

    • Build a simple interactive CLI that collects user input

    • Automatically fill and submit a web form using Gaffa browser actions

    1. Install Python 3.10 or newer.

    2. Create a virtual environment

    3. Install the required libraries

    4. Get your Gaffa API key and store it as an environment variable:

    1. Install the required library

    In this tutorial, you'll create a Python script that:

    • Extracts form fields - Uses Parse JSON to analyze any web form and identify all input fields.

    • Collects user input - Prompts the user in the terminal to provide values for each field.

    • Submits the form - Automatically fills and submits the form using Gaffa's browser automation.

    By the end, you'll have a working form automation tool that can be adapted for countless use cases.

    Create a new directory and Python file.

    Create a file called form_filler.py (or any name that works for you) and add your configuration.

    Replace your_api_key_here with your actual Gaffa API key from the Dashboard.

    In the code below, you define a function that takes a form URL as input and makes a POST request to the Gaffa API.

    The request uses two actions. First, a wait action ensures the form element is fully loaded on the page. Then the parse_json action uses AI to analyze the form structure and extract every input field along with its properties (label, name, type, placeholder, and required status). The result is structured JSON that is easy to work with.

    Next, you need to define a function that takes the extracted form data and interacts with the user in the terminal. The function will display the form title and then loop through each field, prompting the user to fill in the value.

    For each field in the form, a label and a required marker, if applicable, are shown. The function ensures that the required fields are not left empty and allows users to skip optional fields by pressing enter. All the user's input is collected into a dictionary where the keys are the field names and the values are what the user entered.

    You need a function that will take the form URL and the user's input values, then submit the form to Gaffa's browser automation. The function will build a list of actions.

    First, it waits for the form to be ready, then creates a type action for each field to enter the user's value into the corresponding input element using CSS selectors. Lastly, it adds a click action to submit the form and a capture_screenshot action to take a full-screen image of the results.

    The function makes a POST request with all these actions and returns the response, which includes the screenshot URL if successful.

    Having defined the functions, we can now create a simple command-line interface that allows users to interact with the form.

    The full script is available to download from the Gaffa Python Examples GitHub repo.

    To run the script, simply execute it in your terminal:

    Prerequisites

    What You'll Build

    Set Up Your Environment

    Extract Form Fields Using parse_json

    Collect User Input

    Fill and Submit the Form

    User Interaction and Execution

    Full Script

    Running the Script

    Example output:

    Gaffa API
    Dashboard
    Gaffa Python Examples GitHub repo
    python -m venv venv && source venv/bin/activate
    pip install requests openai
    GAFFA_API_KEY=your_gaffa_api_key
    pip install requests
    mkdir gaffa-form-filler
    cd gaffa-form-filler
    import requests
    import json
    
    # Configuration
    GAFFA_API_KEY = "your_api_key_here"  # Replace with your actual API key
    GAFFA_API_URL = "https://api.gaffa.dev/v1/browser/requests"
    
    # The demo form we'll work with
    FORM_URL = "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=true&modalDelay=5&formType=address"
    def extract_form_fields(form_url):
        payload = {
            "url": form_url,
            "async": False,
            "settings": {
                "record_request": False,
                "actions": [
                    {
                        "type": "wait", 
                        "selector": "form", 
                        "timeout": 10000
                    },
                    {
                        "type": "parse_json",
                        "data_schema": {
                            "name": "FormFields",
                            "description": "Extract all form input fields",
                            "fields": [
                                {"type": "string", "name": "form_title", "description": "Form title"},
                                {
                                    "type": "array",
                                    "name": "fields",
                                    "description": "List of all input fields",
                                    "fields": [
                                        {"type": "string", "name": "label", "description": "Field label"},
                                        {"type": "string", "name": "field_name", "description": "Field name attribute"},
                                        {"type": "string", "name": "field_type", "description": "Input type"},
                                        {"type": "boolean", "name": "required", "description": "Is required?"},
                                        {"type": "string", "name": "placeholder", "description": "Placeholder text"}
                                    ]
                                }
                            ]
                        },
                        "instruction": "Extract all form fields with their properties",
                        "model": "gpt-4o-mini",
                        "output_type": "inline"
                    }
                ]
            }
        }
        
        headers = {"X-API-Key": GAFFA_API_KEY, "Content-Type": "application/json"}
        response = requests.post(GAFFA_API_URL, json=payload, headers=headers)
        response.raise_for_status()
        result = response.json()
        
        for action in result["data"]["actions"]:
            if action.get("type") == "parse_json":
                return action["output"]
        
        return None
    def collect_user_input(form_data):
        print(f"\n{'='*60}")
        print(f"📋 Form: {form_data.get('form_title', 'Unknown Form')}")
        print(f"{'='*60}\n")
        
        user_values = {}
        fields = form_data.get("fields", [])
        
        if not fields:
            print("⚠️  No fields found in the form")
            return user_values
        
        print(f"Please provide values for {len(fields)} field(s):\n")
        
        for i, field in enumerate(fields, 1):
            label = field.get("label", "Unknown Field")
            field_name = field.get("field_name", "")
            required = field.get("required", False)
            placeholder = field.get("placeholder", "")
            
            required_marker = " *" if required else ""
            placeholder_hint = f" (e.g., {placeholder})" if placeholder else ""
            prompt = f"[{i}/{len(fields)}] {label}{required_marker}{placeholder_hint}: "
            
            while True:
                value = input(prompt).strip()
                
                if required and not value:
                    print("  ⚠️  This field is required. Please provide a value.")
                    continue
                
                if not value and not required:
                    print("  ℹ️  Skipping optional field")
                    break
                
                user_values[field_name] = value
                break
        
        return user_values
    def fill_form(form_url, field_values):
        if not field_values:
            return None
        
        actions = [
            {
                "type": "wait", 
                "selector": "form", 
                "timeout": 10000
            }
        ]
        
        for field_name, value in field_values.items():
            if value:
                actions.append({
                    "type": "type",
                    "selector": f"[name='{field_name}']",
                    "text": value
                })
        
        actions.extend([
            {"type": "click", "selector": "button[type='submit']"},
            {"type": "capture_screenshot", "size": "fullscreen"}
        ])
        
        payload = {
            "url": form_url,
            "async": False,
            "settings": {
                "record_request": False,
                "actions": actions
            }
        }
        
        headers = {"X-API-Key": GAFFA_API_KEY, "Content-Type": "application/json"}
        response = requests.post(GAFFA_API_URL, json=payload, headers=headers)
        response.raise_for_status()
        
        return response.json()
    def main():
        print("\n" + "="*60)
        print("🤖 Gaffa Form Filler")
        print("="*60)
        print("This tool extracts form fields and helps you fill them out.\n")
        
        print("📋 Step 1: Analyzing form...")
        form_data = extract_form_fields(FORM_URL)
        
        if not form_data:
            print("\n❌ Could not extract form fields")
            return
        
        print(f"✅ Found {len(form_data.get('fields', []))} field(s)\n")
        
        print("📝 Step 2: Collecting your input...")
        user_values = collect_user_input(form_data)
        
        if not user_values:
            print("\n⚠️  No values provided. Exiting.")
            return
        
        print(f"\n{'='*60}")
        print("📊 Summary of values to submit:")
        print(f"{'='*60}")
        for field_name, value in user_values.items():
            print(f"  {field_name}: {value}")
        print(f"{'='*60}\n")
        
        confirm = input("Submit this form? (y/n): ").strip().lower()
        if confirm != 'y':
            print("\n❌ Submission cancelled")
            return
        
        print("\n🚀 Step 3: Submitting form...")
        result = fill_form(FORM_URL, user_values)
        
        if not result:
            print("❌ Form submission failed")
            return
        
        print("\n✅ Form submitted successfully!")
        
        if "data" in result and "actions" in result["data"]:
            for action in result["data"]["actions"]:
                if action.get("type") == "capture_screenshot" and "output" in action:
                    print(f"📸 Screenshot: {action['output']}")
        
        print("\n🎉 All done!\n")
    
    if __name__ == "__main__":
        main()
    python your_script_name.py
    ============================================================
    🤖 Gaffa Form Filler
    ============================================================
    This tool extracts form fields and helps you fill them out.
    
    📋 Step 1: Analyzing form...
    ✅ Found 9 field(s)
    
    📝 Step 2: Collecting your input...
    
    ============================================================
    📋 Form: Form Submission Test
    ============================================================
    
    Please provide values for 9 field(s):
    
    [1/9] First Name *: John
    [2/9] Last Name *: Smith
    [3/9] Email *: john@example.com
    ...
    
    ============================================================
    📊 Summary of values to submit:
    ============================================================
      first_name: John
      last_name: Smith
      email: john@example.com
    ...
    
    Submit this form? (y/n): y
    
    🚀 Step 3: Submitting form...
    
    ✅ Form submitted successfully!
    
    🎉 All done!

    Parse JSON

    Paid Action: This action consumes credits based on the amount of content parsed. See more below.

    Type: parse_json

    The parse_json action extracts data from web pages and online PDFs. It uses AI to parse web content from text into a pre-defined data schema and return it as a JSON object.

    The action lets you convert unstructured content, such as academic papers, forms, and webpages, into JSON objects that you can use in automations, analysis, or further processing.

    This feature currently works for online PDFs and web page text.

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    A data schema tells the model exactly what JSON structure to produce.

    You can define schemas in two ways:

    • Inline schemas (defined directly inside the action)

    • Reusable schemas (created via the Schema API and referenced by ID in your requests)
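
    The two ways of supplying a schema can be sketched as a small helper that builds the parse_json action. The function name build_parse_json_action is hypothetical, and rejecting a call that passes both (or neither) schema parameter is an assumption made for this sketch; the documentation only states that one of data_schema or data_schema_id must be provided.

```python
def build_parse_json_action(data_schema=None, data_schema_id=None,
                            instruction=None, output_type="inline"):
    """Return a parse_json action using an inline schema or a saved schema ID."""
    # Assumption for this sketch: exactly one of the two schema parameters is given.
    if (data_schema is None) == (data_schema_id is None):
        raise ValueError("Provide exactly one of data_schema or data_schema_id")

    action = {"type": "parse_json", "output_type": output_type}
    if data_schema is not None:
        action["data_schema"] = data_schema        # inline schema
    else:
        action["data_schema_id"] = data_schema_id  # reusable schema, referenced by ID
    if instruction:
        action["instruction"] = instruction
    return action


inline_action = build_parse_json_action(
    data_schema={
        "name": "ArticleMetadata",
        "description": "Extract metadata from an article",
        "fields": [{"type": "string", "name": "title", "description": "Article title"}],
    }
)
saved_action = build_parse_json_action(data_schema_id="schema_abc123xyz")
print(inline_action["data_schema"]["name"])
print(saved_action)
```

    Either dictionary can then be placed in the actions list of a browser request.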

    A schema has:

    Property
    Type
    Description

    Each field in the fields array has:

    Type
    Description

    This example shows:

    • Simple fields (string, datetime) for basic data

    • Object fields for grouped related data with nested fields

    • Array fields for lists of items, with nested fields defining each item's structure

    Instead of defining schemas inline each time, you can save them to your Gaffa account and reuse them across multiple requests. This makes your actions more readable, easier to maintain, and ensures consistency when parsing similar content.

    Use the POST /v1/schemas endpoint to create a reusable schema:

    Response:

    Save the id returned in the response; you'll use it to reference the schema in your requests.
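
    As a rough Python sketch of that flow, the helpers below prepare the create-schema call and turn its response into a parse_json action. The helper names are hypothetical, nothing is sent over the network here, and the response is assumed to expose the schema id as a top-level "id" key, as in the sample response in this section.

```python
import json

SCHEMAS_URL = "https://api.gaffa.dev/v1/schemas"


def create_schema_request(api_key, schema):
    """Build keyword arguments for an HTTP POST; this function sends nothing."""
    return {
        "url": SCHEMAS_URL,
        "headers": {"X-API-Key": api_key, "Content-Type": "application/json"},
        "data": json.dumps(schema),
    }


def action_from_response(response_body):
    """Build a parse_json action that references the saved schema by its id."""
    return {
        "type": "parse_json",
        "data_schema_id": response_body["id"],
        "output_type": "inline",
    }


kwargs = create_schema_request("YOUR_API_KEY", {"name": "ProductInfo", "fields": []})
# Sample response body copied from the shape shown in this section.
sample_response = {"id": "schema_abc123xyz", "name": "ProductInfo"}
print(kwargs["url"])
print(action_from_response(sample_response))
```

    With the requests library installed, requests.post(**kwargs).json() would perform the call and return the body that action_from_response expects.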

    Allows you to view all schemas saved to your account:

    Endpoint: GET /v1/schemas

    Allows you to modify an existing schema by its ID:

    Endpoint: PUT /v1/schemas/:id

    Removes a schema from your account:

    Endpoint: DELETE /v1/schemas/:id

    Simple List Extraction

    Nested Objects

    The credits this action uses depend on the model used. Here are the current supported models and their pricing:

    Model
    Input Token Cost
    Output Token Cost

    A custom instruction, in addition to any detail you have added to the data schema, that you want to include with this particular parse.

    model

    string

    The AI model you wish to use to parse the content into JSON. Default: gpt-4o-mini Accepted: ["gpt-4o-mini"]

    input_token_cap

    int

    The max number of source input tokens that will be passed to the AI model to parse. This can be used to prevent unnecessary credit usage. If your source input is longer than the token cap, it will be abbreviated. Default: 1,000,000

    selector

    string

    The selector that defines the element whose content you want to parse. This is useful if you are only interested in the contents of a certain element.

    output_type

    string

    Whether the action output is saved to a file (in which case a URL is returned) or the parsed JSON object is included directly in the response. Default: file Accepted: ["file", "inline"]

    max_pages

    int

    If you are parsing a PDF you can specify this parameter to limit the number of pages that are passed to the LLM. Default: no limit

    name

    string

    This identifies the schema and should clearly indicate what data it extracts. Example: "ProductInfo", "ArticleMetadata", "ContactForm"

    type

    string

    Determines how the AI interprets and structures the extracted data. Must be one of the supported types below.

    decimal

    Precise decimal

    double

    Floating-point number

    integer

    Whole number

    object

    Nested structured object

    string

    Text value

    for lists of items with nested
    fields
    defining each item's structure

    data_schema_id

    string

    The id of the data schema you have defined that you want to transform the content into. You must provide a data_schema or data_schema_id with your request.

    data_schema

    json

    A JSON object describing the data_schema you want to transform the content into.

    You must provide a data_schema or data_schema_id with your request.

    instruction

    string

    description

    string

    Explains what data the schema extracts and provides context to help the AI model understand the extraction goal. Example: "Extract product details from this e-commerce product page"

    fields

    array

    Each field defines a piece of data to extract from the content. See field properties below.

    description

    string

    Include details about format, handling of missing values, or special cases.

    Example: "Maximum salary in GBP. If only one value is provided, use the same value for both min and max. Return null if not provided."

    fields

    array

    Required only for object and array types.

    name

    string

    array

    List of items

    boolean

    True/False

    datetime

    timestamp
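
    The field rules above (a supported type, plus nested fields for object and array types) can be checked locally before sending a schema. The following is a minimal sketch, not part of the Gaffa API: validate_fields is a hypothetical helper, and treating a missing name as an error is an assumption.

```python
SUPPORTED_TYPES = {"array", "boolean", "datetime", "decimal",
                   "double", "integer", "object", "string"}
NESTED_TYPES = {"object", "array"}


def validate_fields(fields, path="fields"):
    """Return a list of human-readable problems found in a schema's fields array."""
    problems = []
    for index, field in enumerate(fields):
        where = f"{path}[{index}]"
        if not field.get("name"):
            problems.append(f"{where}: missing name")
        field_type = field.get("type")
        if field_type not in SUPPORTED_TYPES:
            problems.append(f"{where}: unsupported type {field_type!r}")
        elif field_type in NESTED_TYPES:
            nested = field.get("fields")
            if not nested:
                problems.append(f"{where}: {field_type} needs nested fields")
            else:
                # Recurse so nested objects and array items are checked too.
                problems.extend(validate_fields(nested, f"{where}.fields"))
    return problems


schema_fields = [
    {"type": "string", "name": "product_name", "description": "The product title"},
    {"type": "object", "name": "ratings", "description": "Rating info"},
    {"type": "float", "name": "price", "description": "Current price"},
]
for problem in validate_fields(schema_fields):
    print(problem)
```

    Here the second field is flagged because an object has no nested fields, and the third because float is not one of the listed types (decimal or double would be accepted).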

    "actions": [
      {
        "type": "parse_json",
        "data_schema": {
          "name": "ArticleMetadata",
          "description": "Extract metadata from an article",
          "fields": [
            {
              "type": "string",
              "name": "title",
              "description": "Article title"
            },
            {
              "type": "string",
              "name": "author",
              "description": "Author name"
            },
            {
              "type": "datetime",
              "name": "published",
              "description": "Publication date"
            }
          ]
        },
        "model": "gpt-4o-mini",
        "output_type": "inline"
      }
    ]
    curl -L \
      --request POST \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "ProductInfo",
        "description": "Extract product details from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "boolean",
            "name": "in_stock",
            "description": "Product availability"
          },
          {
            "type": "object",
            "name": "ratings",
            "description": "Product rating information",
            "fields": [
              {
                "type": "double",
                "name": "average",
                "description": "Average rating score"
              },
              {
                "type": "integer",
                "name": "total_reviews",
                "description": "Number of reviews"
              }
            ]
          },
          {
            "type": "array",
            "name": "tags",
            "description": "Product tags",
            "fields": [
              {
                "type": "string",
                "name": "tag",
                "description": "Individual tag name"
              }
            ]
          }
        ]
      }'
    {
      "id": "schema_abc123xyz",
      "name": "ProductInfo",
      "description": "Extract product details from e-commerce pages",
      "fields": [...]
    }
    curl -L \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    curl -L \
      --request PUT \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "id": "schema_abc123xyz",
        "name": "ProductInfo",
        "description": "Extract detailed product information from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "string",
            "name": "brand",
            "description": "Product brand name"
          }
        ]
      }'
    curl -L \
      --request DELETE \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    {
      "name": "TagList",
      "description": "Extract article tags",
      "fields": [
        {
          "type": "array",
          "name": "tags",
          "description": "List of article tags",
          "fields": [
            {
              "type": "string",
              "name": "tag",
              "description": "Individual tag name"
            }
          ]
        }
      ]
    }
    {
      "name": "ProductWithReviews",
      "description": "Product details with nested review data",
      "fields": [
        {
          "type": "string",
          "name": "product_name",
          "description": "Product name"
        },
        {
          "type": "object",
          "name": "pricing",
          "description": "Pricing information",
          "fields": [
            {
              "type": "decimal",
              "name": "current_price",
              "description": "Current price"
            },
            {
              "type": "decimal",
              "name": "original_price",
              "description": "Original price before discount"
            },
            {
              "type": "integer",
              "name": "discount_percentage",
              "description": "Discount percentage"
            }
          ]
        }
      ]
    }

    gpt-4o-mini

    1 credit per 10,000 input tokens

    4 credits per 10,000 output tokens
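
    As a worked example of the pricing above, the sketch below estimates credits from token counts. The function name is hypothetical, and how Gaffa rounds fractional credits when billing is not stated here, so the estimate is returned unrounded.

```python
# Rates from the pricing table, in credits per 10,000 tokens.
RATES = {"gpt-4o-mini": {"input": 1, "output": 4}}


def estimate_credits(input_tokens, output_tokens, model="gpt-4o-mini"):
    """Estimate parse_json credit usage; billing-side rounding is not modelled."""
    rates = RATES[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 10_000


# 25,000 input tokens -> 2.5 credits, 5,000 output tokens -> 2.0 credits
print(estimate_credits(25_000, 5_000))  # 4.5
```

    Because output tokens cost four times as much as input tokens, a compact schema with few fields keeps costs down more than trimming the source page does.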

    Defining Data Schemas

    Schema Structure

    Supported Field Types

    Inline Schema Example

    Schema Operations

    Creating a Saved Schema

    Managing Schemas

    List all schemas:

    Update a schema:

    Delete a schema:

    Common Schema Patterns

    Pricing

    universal parameters
    POST /v1/schemas
    GET /v1/schemas
    PUT /v1/schemas
    DELETE /v1/schemas/:id
    below

    Use clear, descriptive names that follow your preferred naming convention (e.g., snake_case or camelCase). Example: "product_name", "published_date", "author_email"

    selector
    Gaffa scrolling to the bottom of a simulated ecommerce page!

    Get multiple browser requests

    get

    This endpoint retrieves browser requests in bulk by id or status.

    Authorizations
    X-API-Key · string · Required
    Query parameters
    ids · string · Optional

    The unique identifiers of the browser requests to retrieve.

    Example: brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6,brq_V2P6X38RDRMRyYcNJ82qPSH5eFfQRD
    status · string · Optional

    The statuses of the browser requests to filter by. Valid values: pending, running, completed, failed

    Example: completed,running
    pageSize · integer · int32 · Optional

    Items to return per page (default: 30).

    Example: 20
    page · integer · int32 · Optional

    Page number of the pagination (default: 1).

    Example: 1
    Responses
    200

    A collection of browser requests that match the criteria

    application/json
    total_pages · integer · int32 · nullable · Optional

    The total number of pages available

    total_records · integer · int32 · nullable · Optional

    The total number of records across all pages

    id · string · nullable · Optional

    ID of the browser request

    url · string · nullable · Optional

    URL of the request

    proxy_location · string · nullable · Optional

    The proxy location of the request.

    state · string · nullable · Optional

    The status of the request

    credit_usage · integer · int32 · nullable · Optional

    The number of credits used by the request

    error · string · nullable · Optional

    The name of the error type

    error_reason · string · nullable · Optional

    More detail about the error

    actual_url · string · nullable · Optional

    The actual URL captured, after any redirects.

    http_status_code · integer · int32 · Optional

    The HTTP status code for the request.

    from_cache · boolean · nullable · Optional

    Whether this request was served from the cache

    started_at · string · date-time · Optional

    The time in UTC when the request started.

    completed_at · string · date-time · Optional

    The time in UTC when the request finished.

    running_time · string · timespan · Optional

    The running time of the request

    page_load_time · string · timespan · Optional

    How long the page took to fully render.

    id · string · nullable · Optional

    ID of the action

    type · string · nullable · Optional

    Name of the action

    custom_id · string · nullable · Optional

    Custom ID of the action

    timestamp · string · date-time · Optional

    Time the action was initiated

    output · object · nullable · Optional

    Output of the action, if any

    reference · string · nullable · Optional

    Reference file for the action, if any

    iterations · integer · int32 · Optional

    Number of iterations completed for loop actions

    actions · object · browerRequestActionResponse[] · nullable · Optional
    ⤷ Circular reference to object · browerRequestActionResponse[]
    error · string · nullable · Optional

    Error message, if any

    error · string · nullable · Optional

    Error message, if any

    video · string · nullable · Optional

    Video URL

    page · integer · int32 · nullable · Optional

    The page number to return (1-based)

    Default: 1
    page_size · integer · int32 · nullable · Optional

    The number of records to return per page

    Default: 30
    400

    Invalid query parameters

    application/json
    type · string · nullable · Optional

    The type of object this is concerning

    id · string · nullable · Optional

    The id of the item concerned.

    code · string · nullable · Optional

    Error code.

    message · string · nullable · Optional

    Error description.

    GET /v1/browser/requests
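
    To illustrate the query parameters above, here is a small sketch that assembles the request URL with the Python standard library. The function name build_list_url is hypothetical, and joining multiple ids or statuses with commas follows the examples given for those parameters.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.gaffa.dev/v1/browser/requests"


def build_list_url(ids=None, statuses=None, page=1, page_size=30):
    """Build the GET URL for retrieving browser requests in bulk."""
    params = {}
    if ids:
        params["ids"] = ",".join(ids)
    if statuses:
        params["status"] = ",".join(statuses)
    params["pageSize"] = page_size
    params["page"] = page
    # safe="," keeps the comma separators readable instead of encoding them as %2C.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


print(build_list_url(statuses=["completed", "running"], page_size=20))
# https://api.gaffa.dev/v1/browser/requests?status=completed,running&pageSize=20&page=1
```

    The resulting URL is sent as a GET request with the X-API-Key header; total_pages in the response indicates how many more pages can be requested by incrementing page.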

    Delete a data schema

    delete

    Deletes a data schema by its ID.

    Authorizations
    X-API-Key · string · Required
    Path parameters
    id · string · Required
    Responses
    204

    No content

    DELETE /v1/schemas/{id}

    Create a new sitemap request

    post

    This endpoint processes a website's sitemap and returns all URLs found within it.

    Authorizations
    X-API-KeystringRequired
    Body
    urlstringOptional

    The URL you want our sitemap reader to process on your behalf

    max_cache_ageinteger · int32 · nullableOptional

    Maximum cache age in seconds for this request. If a cached result exists within this timeframe, it will be returned. Default is 0 (no cache).

    Responses
    200

    The sitemap request response detailing the URLs found

    application/json
    idstring · nullableOptional

    ID of the sitemap request

    urlstring · nullableOptional

    URL of the request

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    linksstring[] · nullableOptional

    List of URLs found in the sitemap

    link_countinteger · int32 · nullableOptional

    Number of links found

    408

    The sitemap request timed out after 60 seconds

    application/json
    503

    The requested site is unavailable

    application/json
    post/v1/site/map
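The sketch below shows one way to assemble the sitemap request body and headers, assuming an `https://` base URL. `build_sitemap_request` and `SITEMAP_STATUS_MEANINGS` are hypothetical names; the body fields and status codes are the ones listed above. Nothing is sent over the network.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.gaffa.dev"  # assumption: HTTPS on the documented host

# Status codes documented for this endpoint, with their meanings.
SITEMAP_STATUS_MEANINGS = {
    200: "sitemap processed, URLs returned",
    408: "sitemap request timed out after 60 seconds",
    503: "requested site is unavailable",
}


def build_sitemap_request(url: str, api_key: str, max_cache_age: int = 0) -> Request:
    """Prepare POST /v1/site/map with a JSON body (the request is not sent)."""
    body = json.dumps({"url": url, "max_cache_age": max_cache_age}).encode("utf-8")
    return Request(
        BASE_URL + "/v1/site/map",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


req = build_sitemap_request("https://example.com", "YOUR_API_KEY")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```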

    Create a new browser request

    post

    This endpoint loads the required URL in our browser and then performs the selected actions.

    Authorizations
    X-API-KeystringRequired
    Body
    proxy_locationstring · nullableOptional

    The location of the proxy server that your request will be routed through. null means no proxy is used.

    Default: null
    urlstringOptional

    The URL you want our browsers to visit on your behalf

    asyncbooleanOptional

    Whether the request should be processed asynchronously. Synchronous requests can run for at most 60 seconds.

    Default: true
    max_cache_ageinteger · int32 · nullableOptional

    The maximum age of a cached result in seconds. 0 means the cache will never be used

    Default: 0
    record_requestboolean · nullableOptional

    Record a video of this request

    Default: false
    Other propertiesobjectOptional
    time_limitinteger · int32 · nullableOptional

    The maximum time the request may take to complete, in milliseconds

    Default: 60000
    max_media_bandwidthinteger · int32 · nullableOptional

    Cap the maximum bandwidth to use for media downloads, in MB

    block_adsboolean · nullableOptional

    Enable ad blocking for this request

    Default: false
    Responses
    200

    The browser request response detailing the state and output of the request

    application/json
    idstring · nullableOptional

    ID of the browser request

    urlstring · nullableOptional

    URL of the request

    proxy_locationstring · nullableOptional

    The proxy location of the request.

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    actual_urlstring · nullableOptional

    The actual URL captured, after any redirects.

    http_status_codeinteger · int32Optional

    The HTTP status code for the request.

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    page_load_timestring · timespanOptional

    How long the page took to fully render.

    idstring · nullableOptional

    ID of the action

    typestring · nullableOptional

    Name of the action

    custom_idstring · nullableOptional

    Custom ID of the action

    timestampstring · date-timeOptional

    Time the action was initiated

    outputobject · nullableOptional

    Output of the action, if any

    referencestring · nullableOptional

    Reference file for the action, if any

    iterationsinteger · int32Optional

    Number of iterations completed for loop actions

    idstring · nullableOptional

    ID of the action

    typestring · nullableOptional

    Name of the action

    custom_idstring · nullableOptional

    Custom ID of the action

    timestampstring · date-timeOptional

    Time the action was initiated

    outputobject · nullableOptional

    Output of the action, if any

    referencestring · nullableOptional

    Reference file for the action, if any

    iterationsinteger · int32Optional

    Number of iterations completed for loop actions

    actionsobject · browerRequestActionResponse[] · nullableOptional
    ⤷Circular reference to object · browerRequestActionResponse[]
    errorstring · nullableOptional

    Error message, if any

    errorstring · nullableOptional

    Error message, if any

    videostring · nullableOptional

    Video URL

    408

    The browser request timed out - an example error

    application/json
    post/v1/browser/requests
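A hedged sketch of building the request body for this endpoint. The nesting under `settings` and the `wait` and `print` action shapes are taken from the example request in this reference; the helper name `build_browser_request` and the client-side time-limit check are assumptions, not documented server behaviour. `max_media_bandwidth` and `output` are left out for brevity.

```python
import json
from typing import Optional


def build_browser_request(
    url: str,
    actions: list,
    *,
    run_async: bool = True,
    proxy_location: Optional[str] = None,
    max_cache_age: int = 0,
    record_request: bool = False,
    time_limit: int = 60000,
    block_ads: bool = False,
) -> dict:
    """Return a JSON-serialisable body for POST /v1/browser/requests."""
    # Client-side guard (assumption): synchronous requests are documented as
    # lasting at most 60 seconds, so reject longer limits before sending.
    if not run_async and time_limit > 60000:
        raise ValueError("synchronous requests can run for at most 60 seconds")
    return {
        "proxy_location": proxy_location,
        "url": url,
        "async": run_async,
        "max_cache_age": max_cache_age,
        "settings": {
            "record_request": record_request,
            "actions": actions,
            "time_limit": time_limit,
            "block_ads": block_ads,
        },
    }


payload = build_browser_request(
    "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
    [
        {"type": "wait", "selector": "table"},
        {"type": "print", "size": "A4", "margin": 20},
    ],
    run_async=False,
)
print(json.dumps(payload, indent=2))
```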

    Get Sitemap

    get

    This endpoint retrieves sitemap requests in bulk by id or status.

    Authorizations
    X-API-KeystringRequired
    Query parameters
    idsstringOptional

    The unique identifiers of the sitemap requests to retrieve.

    Example: {"value":"smr_1234567890abcdef,smr_0987654321fedcba"}
    statusstringOptional

    The statuses of the sitemap requests to filter by. Valid values: pending, completed, failed

    Example: {"value":"completed,pending"}
    pageSizeinteger · int32Optional

    Items to return per page (default: 30).

    Example: {"value":30}
    pageinteger · int32Optional

    Page number of the pagination (default: 1).

    Example: {"value":1}
    Responses
    200

    A collection of sitemap requests that match the criteria

    application/json
    total_pagesinteger · int32 · nullableOptional

    The total number of pages available

    total_recordsinteger · int32 · nullableOptional

    The total number of records across all pages

    idstring · nullableOptional

    ID of the sitemap request

    urlstring · nullableOptional

    URL of the request

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    linksstring[] · nullableOptional

    List of URLs found in the sitemap

    link_countinteger · int32 · nullableOptional

    Number of links found

    pageinteger · int32 · nullableOptional

    The page number to return (1-based)

    Default: 1
    page_sizeinteger · int32 · nullableOptional

    The number of records to return per page

    Default: 30
    400

    Invalid query parameters

    application/json
    typestring · nullableOptional

    The type of object this is concerning

    idstring · nullableOptional

    The id of the item concerned.

    codestring · nullableOptional

    Error code.

    messagestring · nullableOptional

    Error description.

    get/v1/site/map
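The query parameters above take comma-separated lists, which is easy to get wrong by hand. Below is a small sketch of composing the query string; `sitemap_list_query` is a hypothetical helper, and the status whitelist is the one given in the parameter description.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

VALID_STATUSES = {"pending", "completed", "failed"}


def sitemap_list_query(ids=(), statuses=(), page: int = 1, page_size: int = 30) -> str:
    """Return the path and query string for GET /v1/site/map."""
    unknown = set(statuses) - VALID_STATUSES
    if unknown:
        raise ValueError(f"unknown status values: {sorted(unknown)}")
    params = {"page": page, "pageSize": page_size}
    if ids:
        params["ids"] = ",".join(ids)  # comma-separated, as in the documented example
    if statuses:
        params["status"] = ",".join(statuses)
    return "/v1/site/map?" + urlencode(params)


path = sitemap_list_query(
    ids=["smr_1234567890abcdef", "smr_0987654321fedcba"],
    statuses=["completed", "pending"],
)
parsed = parse_qs(urlsplit(path).query)
print(parsed)
```

`urlencode` percent-encodes the commas; decoding the query, as `parse_qs` does here, recovers the original comma-separated values.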

    Update an existing data schema

    put

    Updates an existing data schema by its ID and returns the updated schema.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired
    Body
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    fieldsobject · schemaField[] · nullableOptional
    ⤷Circular reference to object · schemaField[]
    Responses
    200

    Payload of DataSchema

    application/json
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    fieldsobject · schemaField[] · nullableOptional
    ⤷Circular reference to object · schemaField[]
    put/v1/schemas/{id}

    List data schemas

    get

    Retrieves a paginated list of data schemas.

    Authorizations
    X-API-KeystringRequired
    Query parameters
    pageSizeinteger · int32Optional
    pageinteger · int32Optional
    Responses
    200

    Payload of PagedResult containing DataSchema

    application/json
    total_pagesinteger · int32 · nullableOptional

    The total number of pages available

    total_recordsinteger · int32 · nullableOptional

    The total number of records across all pages

    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    fieldsobject · schemaField[] · nullableOptional
    ⤷Circular reference to object · schemaField[]
    pageinteger · int32 · nullableOptional

    The page number to return (1-based)

    Default: 1
    page_sizeinteger · int32 · nullableOptional

    The number of records to return per page

    Default: 30
    get/v1/schemas

    Get a sitemap request by ID

    get

    This endpoint retrieves a sitemap request by its ID.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired

    The unique identifier of the sitemap request to retrieve.

    Responses
    200

    The sitemap request

    application/json
    idstring · nullableOptional

    ID of the sitemap request

    urlstring · nullableOptional

    URL of the request

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    linksstring[] · nullableOptional

    List of URLs found in the sitemap

    link_countinteger · int32 · nullableOptional

    Number of links found

    404

    Sitemap request not found

    application/json
    get/v1/site/map/{id}
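If a sitemap request is still in progress, a client can poll this endpoint until the `state` field changes. The sketch below assumes that any state other than `pending` is final, based on the status values listed for the bulk endpoint (pending, completed, failed). `wait_for_sitemap` and its parameters are hypothetical, and the fetch function is injected so the example runs without network access.

```python
import time
from typing import Callable


def wait_for_sitemap(
    fetch: Callable[[str], dict],
    request_id: str,
    *,
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the request is no longer pending, or give up."""
    for attempt in range(max_attempts):
        result = fetch(request_id)
        if result.get("state") != "pending":
            return result
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before the next poll
    raise TimeoutError(f"{request_id} still pending after {max_attempts} attempts")


# Offline demonstration with canned states instead of real HTTP calls.
states = iter(["pending", "pending", "completed"])
sleeps = []
result = wait_for_sitemap(
    lambda rid: {"id": rid, "state": next(states)},
    "smr_1234567890abcdef",
    sleep=sleeps.append,
)
print(result["state"], len(sleeps))
```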

    Create a new data schema

    post

    Creates a new data schema definition and returns the created schema.

    Authorizations
    X-API-KeystringRequired
    Body
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    fieldsobject · schemaField[] · nullableOptional
    ⤷Circular reference to object · schemaField[]
    Responses
    200

    Payload of DataSchema

    application/json
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    typeinteger · enumOptional

    The type of the field.

    Possible values:
    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    fieldsobject · schemaField[] · nullableOptional
    ⤷Circular reference to object · schemaField[]
    post/v1/schemas
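Because schema fields nest recursively, a small helper type can make schema bodies easier to build. This is a sketch only: `SchemaField` and `build_schema` are hypothetical names, and the string type names (`"string"`, `"array"`, and so on) follow the example requests in this reference even though the field listing describes `type` as an integer enum.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaField:
    type: str
    name: str
    description: Optional[str] = None
    fields: List["SchemaField"] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Serialise this field and all nested fields recursively."""
        return {
            "type": self.type,
            "name": self.name,
            "description": self.description,
            "fields": [child.to_dict() for child in self.fields],
        }


def build_schema(name: str, description: str, fields: List[SchemaField]) -> dict:
    """Return a body suitable for POST /v1/schemas."""
    return {
        "name": name,
        "description": description,
        "fields": [item.to_dict() for item in fields],
    }


schema = build_schema(
    "Product Schema",
    "Data schema for product information",
    [
        SchemaField("string", "productName", "Name of the product"),
        SchemaField("array", "tags", "Product tags", [SchemaField("string", "tagItem")]),
    ],
)
print(json.dumps(schema, indent=2))
```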

    Get a browser request by ID

    get

    This endpoint retrieves a browser request by its ID.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired

    The unique identifier of the browser request to retrieve.

    Query parameters
    idstringRequired

    The unique identifier of the browser request to retrieve.

    Responses
    200

    The browser request

    application/json
    idstring · nullableOptional

    ID of the browser request

    urlstring · nullableOptional

    URL of the request

    proxy_locationstring · nullableOptional

    The proxy location of the request.

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    actual_urlstring · nullableOptional

    The actual URL captured, after any redirects.

    http_status_codeinteger · int32Optional

    The HTTP status code for the request.

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    page_load_timestring · timespanOptional

    How long the page took to fully render.

    idstring · nullableOptional

    ID of the action

    typestring · nullableOptional

    Name of the action

    custom_idstring · nullableOptional

    Custom ID of the action

    timestampstring · date-timeOptional

    Time the action was initiated

    outputobject · nullableOptional

    Output of the action, if any

    referencestring · nullableOptional

    Reference file for the action, if any

    iterationsinteger · int32Optional

    Number of iterations completed for loop actions

    idstring · nullableOptional

    ID of the action

    typestring · nullableOptional

    Name of the action

    custom_idstring · nullableOptional

    Custom ID of the action

    timestampstring · date-timeOptional

    Time the action was initiated

    outputobject · nullableOptional

    Output of the action, if any

    referencestring · nullableOptional

    Reference file for the action, if any

    iterationsinteger · int32Optional

    Number of iterations completed for loop actions

    actionsobject · browerRequestActionResponse[] · nullableOptional
    ⤷Circular reference to object · browerRequestActionResponse[]
    errorstring · nullableOptional

    Error message, if any

    errorstring · nullableOptional

    Error message, if any

    videostring · nullableOptional

    Video URL

    404

    Browser request not found

    application/json
    get/v1/browser/requests/{id}
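Actions in the response can contain further `actions`, so collecting their outputs takes a recursive walk. The sketch below assumes the field names listed above (`id`, `output`, `actions`); `iter_actions` and `collect_outputs` are hypothetical helpers, and the sample data is made up for illustration.

```python
from typing import Iterator, Optional, Tuple


def iter_actions(actions: Optional[list], depth: int = 0) -> Iterator[Tuple[int, dict]]:
    """Yield (depth, action) pairs depth-first, including nested actions."""
    for action in actions or []:  # "actions" may be null
        yield depth, action
        yield from iter_actions(action.get("actions"), depth + 1)


def collect_outputs(request: dict) -> dict:
    """Map action ID to output for every action that produced one."""
    return {
        action["id"]: action["output"]
        for _, action in iter_actions(request.get("actions"))
        if action.get("output") is not None
    }


# Hypothetical response data shaped like the documented fields.
sample = {
    "id": "brq_example",
    "actions": [
        {"id": "act_1", "type": "wait", "output": None, "actions": None},
        {
            "id": "act_2",
            "type": "loop",
            "output": None,
            "actions": [
                {"id": "act_3", "type": "print", "output": "file.pdf", "actions": None},
            ],
        },
    ],
}
outputs = collect_outputs(sample)
depths = [(depth, action["id"]) for depth, action in iter_actions(sample["actions"])]
print(outputs, depths)
```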

    {"data":{"total_pages":1,"total_records":2,"results":[{"id":"brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP","url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","proxy_location":null,"state":"completed","credit_usage":4,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T16:31:13.128103+00:00","completed_at":"2024-11-22T16:31:47.020851+00:00","running_time":null,"page_load_time":"00:00:03.4705813","actions":[{"id":"act_V2PUfETiTdXwzEgAW2NPURnATW7we9","type":"wait","custom_id":null,"timestamp":"2024-11-22T16:31:16.6080484+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2PUfBreQxHR2SNqGXuPzzWoiyRsrm","type":"print","custom_id":null,"timestamp":"2024-11-22T16:31:40.5760333+00:00","output":"https://storage.gaffa.dev/brq/pdf/brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP/act_V2PUfBreQxHR2SNqGXuPzzWoiyRsrm.pdf","reference":null,"iterations":null,"actions":null,"error":null}],"video":"https://storage.gaffa.dev/brq/video/brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP.mp4"},{"id":"brq_V2NmHY9FsvPQEGbfVBSeV6UCp2SXjC","url":"https://demo.gaffa.dev/simulate/article?loadTime=3&paragraphs=10&images=3","proxy_location":null,"state":"completed","credit_usage":1,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T12:52:48.708264+00:00","completed_at":"2024-11-22T12:52:54.25994+00:00","running_time":null,"page_load_time":"00:00:00.8094888","actions":[{"id":"act_V2NmHijnQa9iPDNcvhjS2GGFt5se8j","type":"wait","custom_id":null,"timestamp":"2024-11-22T12:52:49.5690537+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2NmHgs27VJKB49YavtK4CcyErdfvD","type":"generate_markdown","custom_id":null,"timestamp":"2024-11-22T12:52:52.8353136+00:00","output":"https://storage.gaffa.dev/brq/md/brq_V2NmHY9FsvPQEGbfVBSeV6UCp2SXjC/act_V2NmHgs27VJKB49YavtK4CcyErdfvD.md","reference":null,"iterations":null,"actions":null,"error":null}],"video":null},{"id":"brq_V2HvS2cw4Z2wonqEAwbxoxjrkmRdEM","url":"https://demo.gaffa.dev/simulate/article","proxy_location":null,"state":"failed","credit_usage":0,"error":null,"error_reason":null,"actual_url":null,"http_status_code":null,"from_cache":false,"started_at":null,"completed_at":null,"running_time":null,"page_load_time":null,"actions":null,"video":null}],"page":1,"page_size":30},"error":null}
    GET /v1/browser/requests HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
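The browser request examples wrap their payload in a `data` / `error` envelope. A minimal sketch of unwrapping it follows; `GaffaApiError` and `unwrap` are hypothetical names, and the shape of a non-null `error` value is an assumption because the examples only show `null`.

```python
import json


class GaffaApiError(Exception):
    """Hypothetical exception type used only in this sketch."""


def unwrap(envelope: dict):
    """Return the "data" payload, or raise if the envelope carries an error."""
    error = envelope.get("error")
    if error is not None:
        # Assumption: any non-null value signals a failure.
        raise GaffaApiError(str(error))
    return envelope.get("data")


raw = '{"data":{"total_pages":1,"total_records":0,"results":[],"page":1,"page_size":30},"error":null}'
data = unwrap(json.loads(raw))
print(data["total_pages"], data["results"])
```

The sitemap examples in this reference are shown without an envelope, so this helper would apply only to responses that include the `data` key.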
    
    DELETE /v1/schemas/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {"id":"smr_1234567890abcdef","url":"https://example.com","state":"completed","credit_usage":1,"error":null,"error_reason":null,"from_cache":false,"started_at":"2024-01-01T12:00:00+00:00","completed_at":"2024-01-01T12:00:30+00:00","running_time":"00:00:30","links":["https://example.com/","https://example.com/about","https://example.com/products","https://example.com/contact"],"link_count":4}
    POST /v1/site/map HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 47
    
    {"url":"https://example.com","max_cache_age":0}
    {"data":{"id":"brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6","url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","proxy_location":null,"state":"completed","credit_usage":1,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T14:33:38.7762685+00:00","completed_at":"2024-11-22T14:33:42.7135779+00:00","running_time":null,"page_load_time":"00:00:00.1902889","actions":[{"id":"act_V2P6Q3fSHhBWAhf4BQJxj9oYbQF1V9","type":"wait","custom_id":null,"timestamp":"2024-11-22T14:33:38.9665719+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2P6Q6BuQhoYDAVkQMVSKkkwmUrArb","type":"print","custom_id":null,"timestamp":"2024-11-22T14:33:42.3025888+00:00","output":"https://storage.gaffa.dev/brq/pdf/brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6/....","reference":null,"iterations":null,"actions":null,"error":null}],"video":null},"error":null}
    POST /v1/browser/requests HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 326
    
    {"proxy_location":null,"url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","async":false,"max_cache_age":0,"settings":{"record_request":false,"actions":[{"type":"wait","selector":"table"},{"type":"print","size":"A4","margin":20}],"time_limit":60000,"max_media_bandwidth":null,"output":null,"block_ads":false}}
    {"total_pages":0,"total_records":1,"results":[{"id":"smr_1234567890abcdef","url":"https://example.com","state":"completed","credit_usage":1,"error":null,"error_reason":null,"from_cache":false,"started_at":"2024-01-01T12:00:00+00:00","completed_at":"2024-01-01T12:01:00+00:00","running_time":"00:01:00","links":["https://example.com/","https://example.com/about","https://example.com/products"],"link_count":3}],"page":1,"page_size":30}
    GET /v1/site/map HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "name": "text",
      "description": "text",
      "fields": [
        {
          "type": 0,
          "name": "text",
          "description": "text",
          "fields": [
            {
              "type": 0,
              "name": "text",
              "description": "text",
              "fields": [
                "[Circular Reference]"
              ]
            }
          ]
        }
      ]
    }
    PUT /v1/schemas/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 934
    
    {"name":"Updated Product Schema","description":"Enhanced schema for product information with additional fields","fields":[{"type":"string","name":"productName","description":"Name of the product","fields":[]},{"type":"decimal","name":"price","description":"Product price","fields":[]},{"type":"boolean","name":"inStock","description":"Whether the product is in stock","fields":[]},{"type":"array","name":"tags","description":"Product tags","fields":[{"type":"string","name":"tagItem","description":null,"fields":[]}]},{"type":"array","name":"categories","description":"Product categories","fields":[{"type":"string","name":"category","description":null,"fields":[]}]},{"type":"object","name":"specifications","description":"Technical specifications","fields":[{"type":"string","name":"dimensions","description":"Product dimensions","fields":[]},{"type":"double","name":"weight","description":"Product weight in grams","fields":[]}]}]}
    GET /v1/schemas HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {"data":{"total_pages":1,"total_records":3,"results":[{"id":"schema_abc123def456","name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]},{"id":"schema_xyz789uvw123","name":"Product Schema","description":"Data schema for product information","fields":[{"type":"string","name":"productName","description":"Name of the product","fields":[]},{"type":"decimal","name":"price","description":"Product price","fields":[]},{"type":"boolean","name":"inStock","description":"Whether the product is in stock","fields":[]},{"type":"array","name":"tags","description":"Product tags","fields":[{"type":"string","name":"tagItem","description":null,"fields":[]}]}]},{"id":"schema_hij456klm789","name":"Order Schema","description":"Data schema for order processing","fields":[{"type":"string","name":"orderId","description":"Unique order identifier","fields":[]},{"type":"datetime","name":"orderDate","description":"Date when order was placed","fields":[]},{"type":"object","name":"customer","description":"Customer information","fields":[{"type":"string","name":"customerId","description":"Customer identifier","fields":[]},{"type":"string","name":"email","description":"Customer email address","fields":[]}]},{"type":"decimal","name":"totalAmount","description":"Total order amount","fields":[]}]}],"page":1,"page_size":30},"error":null}
    GET /v1/site/map/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "url": "text",
      "state": "text",
      "credit_usage": 1,
      "error": "text",
      "error_reason": "text",
      "from_cache": true,
      "started_at": "2026-06-25T02:09:35.290Z",
      "completed_at": "2026-06-25T02:09:35.290Z",
      "running_time": "text",
      "links": [
        "text"
      ],
      "link_count": 1
    }
    {"id":"schema_abc123def456","name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]}
    POST /v1/schemas HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 450
    
    {"name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]}
    GET /v1/browser/requests/{id}?id=text HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "url": "text",
      "proxy_location": "text",
      "state": "text",
      "credit_usage": 1,
      "error": "text",
      "error_reason": "text",
      "actual_url": "text",
      "http_status_code": 1,
      "from_cache": true,
      "started_at": "2026-06-25T02:09:35.290Z",
      "completed_at": "2026-06-25T02:09:35.290Z",
      "running_time": "text",
      "page_load_time": "text",
      "actions": [
        {
          "id": "text",
          "type": "text",
          "custom_id": "text",
          "timestamp": "2026-06-25T02:09:35.290Z",
          "output": {},
          "reference": "text",
          "iterations": 1,
          "actions": [
            {
              "id": "text",
              "type": "text",
              "custom_id": "text",
              "timestamp": "2026-06-25T02:09:35.290Z",
              "output": {},
              "reference": "text",
              "iterations": 1,
              "actions": [
                "[Circular Reference]"
              ],
              "error": "text"
            }
          ],
          "error": "text"
        }
      ],
      "video": "text"
    }
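The `running_time` and `page_load_time` fields are timespan strings such as `00:00:30` or `00:00:03.4705813`. The sketch below converts that form into a `timedelta`. `parse_timespan` is a hypothetical helper that handles only the `hh:mm:ss[.fraction]` form seen in the examples; any other form, such as one with a day component, is outside its scope.

```python
from datetime import timedelta


def parse_timespan(value: str) -> timedelta:
    """Parse "hh:mm:ss" or "hh:mm:ss.fffffff" into a timedelta."""
    hours, minutes, seconds = value.split(":")
    # float() accepts the 7-digit fraction; timedelta rounds it to microseconds.
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


short = parse_timespan("00:00:30")
precise = parse_timespan("00:00:03.4705813")
print(short.total_seconds(), precise.total_seconds())
```

`datetime.strptime` is avoided here because its `%f` directive accepts at most six fractional digits, while the examples carry seven.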