Capture Cookies

Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

Type: capture_cookies

This action will capture the browser cookies currently saved for the web page you are on and return them as a JSON object with key/values.

Parameters

See universal parameters.

Usage

Capture the cookies of the current page

Credits and Pricing

View our current pricing plans on the Gaffa homepage.

Browser Requests

Browser requests are charged in terms of credits based on the following factors:

Capture Screenshot

Type: capture_screenshot

Takes a screenshot of the current page. You can choose to take a full screen screenshot showing the whole page or just the current view.

Parameters

Name
Type

Block DOM Removals

Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

Type: block_dom_removals

This action prevents the page from removing elements from the DOM. This is useful when scraping a JavaScript-based web application that removes items once they scroll out of view, which can make grabbing data difficult.

Generate Markdown

Type: generate_markdown

The markdown output format can export the data of the page (an article, table etc.) in a human and LLM readable format which removes unnecessary styling data and other "junk" that is only relevant for the site to work properly.

Gaffa exports GitHub flavoured markdown with comments removed and unknown tags ignored.

Parameters

POST v1/browser/requests

For more information on browser requests, see here.

The following endpoint creates a browser request and either runs it synchronously or returns immediately with an ID so you can check its status later using this endpoint.

API Authentication

We use API Keys for authenticating requests to our API. In this document we'll explain how you can manage and use the keys for your account.

Creating Keys

Once your account is approved, you will need to create an API key to send your requests to our API. Go to your account and create a new key with a name. Once the key is created, copy the value and you will immediately be free to start using it to make requests.

You can create as many keys as you wish, but always treat each key as a secret and do not reveal it in public blog posts or GitHub repositories. If someone makes requests with your leaked key, we won't be responsible!

Deleting Keys

If you are worried you have exposed your Gaffa API key or just want to periodically rotate your keys you can create another key and then delete your old keys. Deleted keys will immediately stop working for new requests to the API but past browser requests made using old keys will still be available.

Authenticating Requests

Our API is secured with a custom header, X-API-Key, whose value should be any current API key in your account. That's all you need to add to your request!
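
As a minimal sketch of the above in Python, the snippet below attaches the X-API-Key header to a request object using only the standard library. The base URL `https://api.gaffa.dev` and the helper name `build_request` are assumptions made for illustration, so check the API reference for the real host. The request is built but never sent.

```python
import json
import urllib.request

# Assumed base URL, for illustration only; the real host may differ.
API_BASE = "https://api.gaffa.dev"


def build_request(api_key: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the X-API-Key header."""
    return urllib.request.Request(
        url=f"{API_BASE}/{path.lstrip('/')}",
        data=json.dumps(body).encode("utf-8"),
        # Header names are case-insensitive in HTTP, so urllib may normalise the casing.
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_request("my-secret-key", "v1/browser/requests", {"url": "https://demo.gaffa.dev"})
```

In practice the key would be read from an environment variable or secret store rather than written into the source, in line with the advice above about keeping keys out of public repositories.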

Dashboard > API Keys

API Playground Examples

In the following pages you can view all the requests we've pre-built to show what is possible with the Gaffa web automation API.

You can start using these in the API Playground once you've created an account.

"actions": [
    {
      "type": "capture_cookies"
    }
]
Using this action will block DOM removals for the rest of the browser request.

Parameters

See universal parameters.

Usage

Block DOM removals for the rest of the browser request

"actions": [
    {
      "type": "block_dom_removals"
    }
]
See universal parameters.

Usage

The following converts the current page to markdown:

Example Output

GaffaMarkdownExample.md (5KB)
"actions": [
    {
        "type": "generate_markdown"
    }
]
  • Request length: Billed at 1 credit per 30 seconds the request takes to run on the browser.

    • If screen recording is enabled, this is doubled to 2 credits per 30 seconds.

  • Proxy bandwidth usage: All requests that use a proxy_location parameter use our network of residential proxies and are billed at 1500 credits per 1GB of bandwidth used.

  • Paid Actions: Some actions will incur additional costs for their usage in a browser request. These are:

    Each successful request will deduct the corresponding number of credits from your monthly allowance. Credits don't roll over from month to month, so use as many of them as you want.

    Mapping Requests

    Mapping requests are also charged in credits at a rate of 1 credit per mapping request.
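
The rates above can be turned into a rough estimator. The sketch below is an illustration, not Gaffa's billing logic: it assumes time is billed proportionally (the docs do not say whether partial 30-second blocks are rounded up), it ignores paid actions, and the function name is hypothetical.

```python
def estimate_browser_request_credits(
    run_seconds: float,
    recording: bool = False,
    proxy_gb: float = 0.0,
) -> float:
    """Estimate credits from the listed rates (paid actions excluded)."""
    per_block = 2 if recording else 1        # credits per 30 seconds of run time
    time_credits = (run_seconds / 30) * per_block
    bandwidth_credits = proxy_gb * 1500      # residential proxy traffic, per GB
    return time_credits + bandwidth_credits


print(estimate_browser_request_credits(60))                  # 2.0
print(estimate_browser_request_credits(60, recording=True))  # 4.0
print(estimate_browser_request_credits(30, proxy_gb=0.5))    # 751.0
```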

    Required
    Description

    size

    string

    The size of the screenshot to capture. Default: view Accepted: ["view", "fullscreen"]

    See universal parameters.

    Usage

    The following captures the current section of the page currently visible in the browser.

    Example Output

    An example screenshot in fullscreen mode.

    Capture DOM

    Type: capture_dom

    This action will capture and return the raw DOM of the page, which you can then extract data from on your end.

    For common AI scenarios you may find this returns too much data so we have provided a generate_simplified_dom action which distills the DOM to only the important elements.

    Parameters

    See universal parameters.

    Usage

    Capture the raw DOM of the current page

    Example Output

    Click

    Type: click

    Request that the browser clicks a particular element on the page.

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Usage

    Click an element on the page

    The following code will click the element matching the a.header__logo selector.

    Wait for a particular element to appear

    The following code will wait a maximum of 5 seconds for the logo to appear and then continue with the list of actions.

    Export Web Page to PDF

    An example request that uses Gaffa to convert an HTML page to a PDF. There are lots of HTML to PDF APIs, but Gaffa handles it easily, as well as doing much more.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa's print to PDF feature allows you to export web pages as PDF files easily. Unlike the standard "Print to PDF" in your local browser, Gaffa's feature waits for specific items to load, uses proxies, and scales with your product's growth. Enhance your customer experience and streamline your PDF export process.

    API Request

    The request below uses the POST endpoint to open the demo site on the table page, wait for the table to load and then print the web page to a PDF in size A4 with a margin of 20 and using the portrait orientation.

    Actions

    Read the full documentation for these actions here.

    Response

    Here's an example of the PDF returned by the request after waiting for the table to load.

    Parse Table

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    Type: parse_table

    Finds a table on the page with a given selector and then converts the table data into a JSON object.

    This action first finds the table headers and converts them into property names by converting them to lower case and replacing non-alphanumeric characters with underscores. It then processes each table row and, for each cell, extracts the contents and saves them as a value. At the moment, all values are strings.
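
The header conversion described above can be sketched in a few lines of Python. This is an illustration of the stated rule, not Gaffa's implementation: the function names are hypothetical, and it assumes each non-alphanumeric ASCII character maps to its own underscore (the docs do not say whether runs are collapsed).

```python
import re


def header_to_property(header: str) -> str:
    """Lower-case a table header and replace each non-alphanumeric character with '_'."""
    return re.sub(r"[^a-z0-9]", "_", header.lower())


def rows_to_objects(headers: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """Pair every cell with its normalised header; all values stay strings."""
    keys = [header_to_property(h) for h in headers]
    return [dict(zip(keys, row)) for row in rows]


print(rows_to_objects(["First Name", "Unit Price"], [["Ada", "9.99"]]))
# [{'first_name': 'Ada', 'unit_price': '9.99'}]
```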

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Usage

    Extract a table on the page

    The following code will wait 1 second for the .large_table element to appear and return a JSON file with the headers and rows converted.

    POST v1/schemas

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    The following endpoint allows you to describe a data schema for parsing an online PDF to JSON.

    GET v1/schemas

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    The following endpoint allows you to list data schemas for your account in a paged list.

    PUT v1/schemas

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    The following endpoint allows you to update a data schema by ID.

    GET v1/site/map

    This endpoint retrieves information about previous site mapping requests, filterable by id or status.

    GET v1/site/map/{id}

    This endpoint retrieves information about a site mapping request.

    GET v1/browser/requests/{id}


    For more information on browser requests, see here.

    The following endpoint allows you to query a browser request for your account by ID.

    Capture Element

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    Type: capture_element

    Returns the innerHTML, essentially the contents, of a particular element on the page. Use it when you are only interested in the contents of that element.

    Download File

    Type: download_file

    Request a copy of the most recent file viewed in the browser.

    Parameters

    Name
    Type

    Convert Web Page to Markdown

    An example request that uses Gaffa to convert a web page to markdown. This could be used to export web page reports or to print the content of a page in a readable format.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa converts web pages to clean markdown, stripping away styling, scripts, and images. This optimizes content for LLM applications by reducing token usage while preserving essential information.

    API Request

    Print

    Type: print

    Request that the browser prints the page to a PDF.

    Parameters

    Name
    Type
    Required

    Generate Simplified DOM

    Type: generate_simplified_dom

    When you're looking at the DOM of a web page, there's a lot of unnecessary data that can be discarded if you are only interested in the page's elements or looking to export the data into an LLM. The generate_simplified_dom output format processes the HTML in the following way:

    • Removes all links in the head

    Wait

    Type: wait

    Request that the browser waits a given amount of time or for a particular item to appear on the page.

    Parameters

    Name
    Type

    DELETE v1/schemas/{id}

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    The following endpoint allows you to delete a schema from your account.

    POST v1/site/map

    This endpoint creates a new site mapping request and returns the result.

    GET v1/browser/requests

    For more information on browser requests, see here.

    The following endpoint allows you to query for multiple browser requests, either by status or by a list of particular ids. Submitting a request with neither of these will return all requests for your account.

    "actions": [
        {
            "type": "capture_screenshot",
            "size": "view"
        }
    ]
    JSON Parsing
    GaffaDOMSample.txt (13KB)

    selector

    string

    The selector that defines the page element that the browser should click on.

    timeout

    integer

    The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

    GaffaPrintPdfExample.pdf (51KB)

    selector

    string

    The selector that defines the table whose contents you want to parse.

    timeout

    integer

    The maximum amount of time the browser should wait for the table defined by the selector to appear. Default: 5000 (5s)

    Required
    Description

    timeout

    integer

    The maximum amount of time the browser should wait for a file to download. Default: 5,000 (5s)

    See universal parameters.

    Files Supported

    Currently this only works with the following file formats: .pdf, .jpg, .png, .gif, .bmp, .webp, .svg, .tiff, .tif, .img

    Usage

    Download a copy of a PDF open in the Browser

    The following waits 20s for a file to download and then returns it.

    The service responds with a link to the file in the action output:

    The request below uses the POST endpoint to open the demo site on the article simulator, wait for the article to load and then generate markdown from the page's content, which you can download for use in your program.

    Actions

    Wait
    Generate Markdown

    Response

    Here's an example of the markdown returned by the request after waiting for the article to load.

    GaffaMarkdownExample.md (5KB)
    Description

    size

    string

    The size of paper the page should be printed to. Default: A4 Accepted: ["A4"]

    margin

    integer

    The margin of the page in pixels when the page is printed to PDF. Default: 20

    orientation

    string

    The orientation of the page when it is printed to PDF. Default: portrait Accepted: ["portrait", "landscape"]

    continue_on_fail

    boolean

    Should execution of further actions continue or throw an error if this action fails. Default: true

    See universal parameters.

    Usage

    Print a page in landscape to PDF

    The following JSON prints the page to a PDF in landscape with margins of 20px.

    Example Output

    GaffaPrintPdfExample.pdf (51KB)

    • Removes all script nodes and links to scripts

    • Removes all style nodes

    • Removes style attributes from all elements

    • Removes all links to stylesheets

    • Removes all noscript elements outside of the body

    • Finds all hrefs with query strings and removes the query strings

    • Keeps important meta tags and removes all others

    • Removes all alternate links

    • Removes all SVG paths

    • Removes empty text nodes and excessive spacing

    Parameters

    See universal parameters.

    Usage

    The following JSON captures the DOM of the page and simplifies it.


    We are actively working to improve this and to make this process more configurable - let us know if there's something you think we can improve.

    Example Output

    GaffaSimplifiedDOMSample.txt (6KB)
    "actions": [
        {
          "type": "capture_dom"
        }
    ]
    "actions": [
        {
          "type": "click",
          "selector": "a.header__logo"
        }
    ]
    "actions": [
        {
          "type": "wait",
          "selector": "a.header__logo",
          "timeout": 5000,
          "continue_on_fail": true
        }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }
    "actions": [
        {
          "type": "parse_table",
          "selector": ".large_table",
          "timeout": 1000
        }
    ]
    "actions": [
        {
            "type": "download_file",
            "timeout": 20000
        }
    ]
    "actions": [
        {
          "id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
          "type": "download_file",
          "query": "download_file?continue_on_fail=false&timeout=20000",
          "timestamp": "2025-05-30T15:02:06.6615306Z",
          "output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf"
        }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/article?loadTime=3&paragraphs=10&images=3",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "article"
          },
          {
            "type": "generate_markdown"
          }
        ]
      }
    }
    "actions": [
        {
            "type": "print",
            "size": "A4",
            "orientation": "landscape",
            "margin": 20
        }
    ]
    "actions": [
        {
            "type": "generate_simplified_dom"
        }
    ]
    Parameters
    Name
    Type
    Required
    Description

    selector

    string

    The selector that defines the element whose contents you want to capture.

    timeout

    integer

    The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

    See universal parameters.

    Usage

    Capture an element on the page

    The following code will wait 1 second for the .page_contents element to appear and return an HTML file containing the element's innerHTML.

    Required
    Description

    time

    integer

    The time in milliseconds that the browser should wait.

    selector

    string

    The selector that defines the page element that the browser should wait to appear.

    timeout

    integer

    The maximum amount of time the browser should wait for the provided selector to appear. Default: 5,000 (5s)

    See universal parameters.

    Usage

    Wait for a particular amount of time

    The following code will wait 1 second and then continue with the next action, if provided.

    Wait for a particular element to appear

    The following code will wait for a table to appear on the page for a maximum of 5 seconds. If the table has not appeared after 5 seconds the next action will be executed, if provided.
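
To make the interaction between timeout and continue_on_fail concrete, here is a small Python sketch of the behaviour described above. It is not how Gaffa implements the wait action: `is_present` is a hypothetical stand-in for "the selector matches an element", and the polling interval is an assumption.

```python
import time
from typing import Callable


def wait_for(
    is_present: Callable[[], bool],
    timeout_ms: int = 5000,
    continue_on_fail: bool = True,
    poll_ms: int = 100,
) -> bool:
    """Poll until is_present() is true or timeout_ms elapses.

    Returns True if the element appeared. On timeout it returns False when
    continue_on_fail is set (the next action would run), otherwise it raises
    TimeoutError.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if is_present():
            return True
        if time.monotonic() >= deadline:
            break
        time.sleep(poll_ms / 1000)
    if continue_on_fail:
        return False
    raise TimeoutError(f"element did not appear within {timeout_ms} ms")
```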


    Automated Form Filling

    An example request that uses Gaffa to automate the completion of a form and waits for a success modal to appear.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Filling forms is tedious. Gaffa can fill out a form in a human-like manner so you can spend your time doing much more interesting things.

    API Request

    The request below uses the POST endpoint to open the demo site on the form simulator page with some sections pre-filled (for speed). After typing in the required information and clicking submit, Gaffa waits for the success dialog to show before returning a video of the interaction.

    Actions

    Response

    Here's a video showing Gaffa filling out the page and waiting for the success modal.

    Read More

    Read more about screen recording here (TODO).

    Type

    Type: type

    Request that the browser type a particular bit of text into a field.

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Sites that use more advanced bot detection often use keyboard events to detect unusual activity. Rather than immediately dropping all characters of the text into a field, our platform types the text in a human-like manner.

    Usage

    Type into a text box

    The following action will type into a particular text field.

    Wait for an element to appear before typing

    The following code will wait a maximum of 10 seconds for the email input to appear and then type in the provided email.

    Capture Snapshot

    Type: capture_snapshot

    This output type will return an HTML file which captures a static version of the page state. The page will load offline and can be saved to your local machine.

    This will:

    • Load and embed all images on the page.

    • Embed all CSS files.

    Currently, JavaScript will be disabled and interactivity might not work as expected, but this feature should be useful for preserving the page state as it was and allowing you to view it offline.

    Parameters

    See universal parameters.

    Usage

    The following captures a snapshot of the current page.

    Example Output

    Here's an example that shows an offline snapshot of a site.

    Get Started

    An introduction to the Gaffa Browser API. Learn how you can get started building fast, powerful web automations!

    Welcome to the Gaffa documentation site! You'll find everything you need here to get started using API including , you can use to interact with our cloud browsers and you can run right away in our API Playground.

    Gaffa is currently in its very early stages, so we'd love to hear how we can improve our docs and API to make life easier for our users. If you have any questions or comments, please contact us. To stay up to date with the latest developments, features and news on our mission to support the development of revolutionary AI Agents, sign up to sporadic updates.

    Browser Requests

    Making web automation requests has never been so simple.

    Browser Requests allow you to send the Gaffa API a URL and a list of actions you want to be carried out, including any outputs you want from the page. We'll carry out the request on our cloud browsers and return you the response with no need to worry about proxies, IP rotation, web automation frameworks and scaling.

    There's absolutely zero configuration needed and you can interact with Gaffa from any program that can send web requests. We think it's by far the simplest way to automate simple web tasks and the good news is, we're just getting started and have much more planned.


    Example request
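
As a sketch, a browser request body can be assembled in Python as shown below. The field names mirror the JSON examples elsewhere in these docs (url, proxy_location, async, max_cache_age, settings.record_request, settings.actions); the helper name `build_browser_request` and its keyword arguments are hypothetical, and the snippet only builds and prints the body without sending it.

```python
import json
from typing import Optional


def build_browser_request(
    url: str,
    actions: list[dict],
    *,
    proxy_location: Optional[str] = None,
    run_async: bool = False,
    max_cache_age: int = 0,
    record_request: bool = False,
) -> dict:
    """Assemble a browser request body using the field names from the documented examples."""
    return {
        "url": url,
        "proxy_location": proxy_location,
        "async": run_async,
        "max_cache_age": max_cache_age,
        "settings": {"record_request": record_request, "actions": actions},
    }


# Wait for the table on the demo site, then print the page to PDF.
body = build_browser_request(
    "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
    [
        {"type": "wait", "selector": "table"},
        {"type": "print", "size": "A4", "margin": 20, "orientation": "portrait"},
    ],
)
print(json.dumps(body, indent=2))
```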

    Scroll

    Type: scroll

    Request that the browser scrolls to a certain point on the page or, in the case of pages with infinite scrolling, scrolls for a particular amount of time.

    Parameters

    Name
    Type

    Mapping Requests

    Mapping requests allow you to extract all URLs from the sitemap of a website. Gaffa mapping requests have the following useful features:

    • Sitemap Discovery: No need to find the URL of a site's sitemap, we'll find it automatically.

    • Caching: If you or another Gaffa user has retrieved a sitemap within a defined timeframe we'll quickly return the cached data instead of having to fetch it all again.

    "actions": [
        {
          "type": "capture_element",
          "selector": ".page_contents",
          "timeout": 1000
        }
    ]
    "actions": [
          {
            "type": "wait",
            "time": 1000
          }
    ]
    "actions": [
          {
            "type": "wait",
            "selector": "table",
            "timeout": 5000,
            "continue_on_fail": true
          }
    ]

    selector

    string

    The selector that defines the text field that the browser should type into.

    text

    string

    The text the browser should enter into the text field.

    timeout

    integer

    The maximum amount of time the browser should wait for the element that needs to be typed in to appear. Default: 5000 (5s)

    GaffaSnapshotSample.mhtml (518KB)
    Running a new browser request is as simple as sending the following POST body to our endpoint. Below, you can see the url (our demo site) and a list of actions which instruct Gaffa to wait for a table to load and print the page to PDF.

    You can read more about this particular example and how you can run it right now in our API Playground here.


    Proxy servers

    In order to access public sites and use proxy servers you'll need to sign up for a paid account, but after that you'll be able to build automations for any site you wish.

    Gaffa makes proxying your traffic through a global network of residential proxies super simple. Setting proxy_location in your request will allow you to utilize one of our partner third party proxy services to gain local access to a site.

    Not setting a proxy_location will mean the request does not use a proxy server and will use a generic datacenter IP.

    Available Locations

    Proxy Server Location    Country Code
    United States            us
    Ireland                  ie
    Singapore                sg
    France                   fr

    At the moment all our servers are in one location, but we aim to introduce local machines in our proxy locations for more realistic end-user load times. If this would interest you, please contact support.

    IP Types

    Currently all our IP addresses are residential IP addresses which are procured through reputable third parties.

    IP Rotation

    IP rotation is an essential part of any web data, scraping or automation task. In Gaffa, each browser request is treated as unique. We regularly rotate the IP addresses used so you should assume that each request will be carried out from a different IP address from the last.

    We are working to support a greater range of IP address scenarios in the future, such as static IPs, as well as more trusted proxies for requests that require enhanced levels of security (logins etc.).

    Restrictions

    Whilst we'll do our best to provide access to as wide a range of sites as possible we may have to restrict access to certain sites to prevent abuse of our service or of other services. Our proxy partners may also enforce restrictions on certain sites and categories of sites which we don't have any control over.


    Caching

    max_cache_age: integer

    When we were building Gaffa we noticed that a lot of pre-existing scraping tools don't allow users to easily share their scraped web data with each other, despite many users requesting the same web pages on the same sites. Not only is this a waste of a user's allowance, it also puts a burden on the site owners who are serving the same data to different users for the same purpose. Because of this in Gaffa we have created a service-wide cache.

    How it works

    When making a browser request you can provide a max_cache_age parameter, which is a number in seconds equal to or greater than 0. This value denotes the maximum age of data you would accept from the API. If another user of our service has requested the same URL with exactly the same parameters and actions as you within this timeframe, then the response will be returned to you immediately and the request will not be carried out on one of our browsers. If there are multiple identical requests in the given timeframe then the most recent will be returned. This will save you time waiting for the response, as well as credits, because requests returned from the cache don't use any bandwidth.
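
The selection rule described above can be sketched as follows. This is an illustration only: the `CachedResponse` type, the `request_key` fingerprint and the function name are hypothetical, and treating max_cache_age = 0 as "never use the cache" is an assumption carried over from the mapping request docs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedResponse:
    request_key: str   # hypothetical fingerprint of URL + parameters + actions
    created_at: float  # seconds since epoch
    payload: str


def pick_cached(
    entries: list[CachedResponse],
    request_key: str,
    max_cache_age: int,
    now: float,
) -> Optional[CachedResponse]:
    """Return the newest identical entry no older than max_cache_age seconds, else None."""
    if max_cache_age <= 0:
        return None  # assumption: 0 disables cache lookups entirely
    fresh = [
        e for e in entries
        if e.request_key == request_key and 0 <= now - e.created_at <= max_cache_age
    ]
    # When several identical requests qualify, the most recent one wins.
    return max(fresh, key=lambda e: e.created_at, default=None)
```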


    Screen Recording

    record_request: boolean

    By specifying record_request you can ask Gaffa to screen record your automation and return a video in the response allowing you to view the magic happening or to debug your automation.

    Recording requests comes at an additional cost.


    Max Media Bandwidth

    max_media_bandwidth: integer

    If you are using Gaffa on a site with lots of images and videos and more interested in the text data on the page, you can cap how much data a page loads in MB using the max_media_bandwidth setting. This makes your automation faster and prevents spending credits on data you aren't interested in. With the max_media_bandwidth value set, Gaffa monitors data being downloaded by the page and when downloaded data exceeds the given number of MB, all further downloads of images or video will be cancelled. max_media_bandwidth defaults to null meaning downloads are not capped. Setting a value of 0 will cause no images to load which can work on some sites but on others this could lead to the site thinking you are using an ad blocker.
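
    The cap can be pictured as a simple byte budget. The Python sketch below is a simplified model rather than Gaffa's implementation: the class name is hypothetical, 1 MB is assumed to be 1,000,000 bytes, only media downloads are counted, and a download is refused once the budget has been reached so that a cap of 0 blocks every image.

```python
from typing import Optional


class MediaBudget:
    """Tracks media bytes against an optional cap given in megabytes."""

    def __init__(self, max_media_bandwidth: Optional[int] = None):
        # None means downloads are not capped; 0 blocks all media.
        self.cap_bytes = (
            None if max_media_bandwidth is None else max_media_bandwidth * 1_000_000
        )
        self.used_bytes = 0

    def allow(self, size_bytes: int) -> bool:
        """Return True if a media download may start, and count it against the budget."""
        if self.cap_bytes is not None and self.used_bytes >= self.cap_bytes:
            return False
        self.used_bytes += size_bytes
        return True


budget = MediaBudget(max_media_bandwidth=1)
print(budget.allow(600_000))  # True
print(budget.allow(600_000))  # True, the budget is only exceeded after this download
print(budget.allow(10_000))   # False
```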


    Time Limit

    time_limit: integer

    Using the setting time_limit caps the maximum running time of the request in milliseconds. If this time expires all incomplete actions will be cancelled and the request will return an error. This cap has to be less than the maximum request running time dictated by your plan and if not set, will default to this value.
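
    A minimal Python sketch of how the effective limit could be resolved is shown below. The function name and `plan_max_ms` are hypothetical, and two points are assumptions because the docs do not specify them: a value equal to the plan maximum is accepted, and out-of-range values are rejected with an error.

```python
from typing import Optional


def resolve_time_limit(time_limit: Optional[int], plan_max_ms: int) -> int:
    """Return the effective time limit in milliseconds for a request."""
    if time_limit is None:
        return plan_max_ms  # not set: fall back to the plan maximum
    if time_limit <= 0 or time_limit > plan_max_ms:
        raise ValueError(f"time_limit must be between 1 and {plan_max_ms} ms")
    return time_limit
```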


    Actions

    We currently support ten different types of actions which you can read more about here.


    Stealth

    We believe your AI Agents should be able to use the internet exactly how humans would. Gaffa can help you get access to sites with some of the most challenging anti-bot restrictions using a combination of proxies, human-like behavior, captcha solving and a custom browser implementation. We handle and maintain all of that so you can focus on building your solution!


    hashtag
    Examples

    We've created a number of sample browser requests you can read about here, or you can jump straight into the API Playground to start running them right now.


    hashtag
    API Endpoints

    Check out our API reference for more details about the endpoints available, particularly those you can use to query for past requests by id or status.

  • Index Traversal: If the sitemap references other sitemap files, we'll automatically process each of those and add them to the list of URLs, ensuring the whole hierarchy is captured.

  • Aggregation and Duplicate Prevention: In the rare cases where there are duplicate entries in the sitemap, we'll automatically remove them for you and return all URLs sorted alphabetically.

  • Proxies: Gaffa uses its residential proxies behind the scenes to ensure your requests to retrieve sitemaps aren't blocked.
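
To make the aggregation step concrete, here is a minimal sketch of merging the URL lists from several sitemap files, dropping duplicates and sorting the result. It only models the behaviour described above: merge_sitemap_urls is a hypothetical helper, not Gaffa's implementation, and the exact ordering Gaffa applies may differ.

```python
from typing import Iterable


def merge_sitemap_urls(sitemaps: Iterable[Iterable[str]]) -> list[str]:
    """Combine URL lists from several sitemaps into one sorted, duplicate-free list."""
    unique: set[str] = set()
    for urls in sitemaps:
        for url in urls:
            unique.add(url.strip())
    return sorted(unique)


merged = merge_sitemap_urls([
    ["https://gaffa.dev/blog", "https://gaffa.dev"],
    ["https://gaffa.dev/about", "https://gaffa.dev/blog"],
])
print(merged)
```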

    hashtag
    Example Request

    The POST v1/site/map endpoint allows you to create a new request and await the result. The payload is simple: the URL of the site whose sitemap you want to extract, and a max_cache_age giving the maximum age, in milliseconds, of a cached response you would accept. The default is 0, in which case Gaffa will never return a cached response.

    circle-info

    The request currently has a maximum running time of 60 seconds after which an error will be returned.

    For the Gaffa site this will return the following response:

    As you'll see in the API Reference section of the site, there are also endpoints to retrieve past site mapping requests for your account.
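
If you want to work with a mapping response in code, the sketch below pulls out a few fields from a response shaped like the Gaffa site example. The helper names are hypothetical, the sample is trimmed to three links, and the running_time parser assumes the HH:MM:SS.fraction layout seen in that example.

```python
from datetime import timedelta


def parse_running_time(value: str) -> timedelta:
    """Parse an 'HH:MM:SS.fffffff' string (assumed layout) into a timedelta."""
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def summarise_site_map(response: dict) -> dict:
    """Collect the fields most useful for logging a mapping request."""
    data = response["data"]
    return {
        "state": data["state"],
        "from_cache": data["from_cache"],
        "links_returned": len(data["links"]),
        "seconds": parse_running_time(data["running_time"]).total_seconds(),
    }


# Trimmed copy of the documented response, for illustration only
sample = {
    "data": {
        "state": "completed",
        "from_cache": True,
        "running_time": "00:00:04.5297660",
        "links": [
            "https://gaffa.dev",
            "https://gaffa.dev/about",
            "https://gaffa.dev/blog",
        ],
    }
}
summary = summarise_site_map(sample)
print(summary)
```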

    hashtag
    Pricing

    See the Credits and Pricing page for the current cost of mapping requests.

    "actions": [
          {
                "type": "type",
                "selector": "#postform-text",
                "text": "Hello world!"
          }
    ]
    "actions": [
          {
             "type": "type",
             "selector": "form input[name=\"email\"]",
             "text": "test@test.com",
             "timeout": 10000
          }
    ]
    "actions": [
        {
            "type": "capture_snapshot"
        }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "max_media_bandwidth": null,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }
    {
      "url": "https://gaffa.dev",
      "max_cache_age": 10000
    }
    {
      "data": {
        "id": "smr_VQW4E66TdcQFZfCs6qavgdowPj3Bzk",
        "url": "https://gaffa.dev",
        "state": "completed",
        "credit_usage": 1,
        "from_cache": true,
        "started_at": "2025-08-22T11:05:43.328175Z",
        "completed_at": "2025-08-22T11:05:47.857941Z",
        "running_time": "00:00:04.5297660",
        "links": [
          "https://gaffa.dev",
          "https://gaffa.dev/about",
          "https://gaffa.dev/blog",
          "https://gaffa.dev/blog/convert-any-web-page-to-llm-ready-markdown-using-gaffa",
          "https://gaffa.dev/blog/how-to-extract-and-simplify-a-webpage-dom-with-gaffa",
          "https://gaffa.dev/blog/printing-webpages-to-pdf-html-to-pdf-using-gaffa",
          "https://gaffa.dev/docs",
          "https://gaffa.dev/docs/api-reference/api-authentication",
          ....and so on
        ],
        "link_count": 52
      }
    }
    1

    hashtag
    Create an account

    You can sign up to create a Gaffa account here. After signing up you'll immediately be able to start using our API Playground, which has a number of pre-built automations for our demo site simulating a range of scenarios.

    hashtag
    Accessing the open web

    When you're ready to use Gaffa on the open web, you'll need to choose and pay for a plan suitable for your needs, at which point the full internet will be available for you to automate.

    triangle-exclamation

    In order to avoid scaling issues for our existing customers we are currently operating a queuing system for new accounts. Simply join the queue when prompted on your account dashboard and we'll let you know when you have access. If you want to jump the queue, you can fill out a short survey to help us better understand our users and we'll approve your account sooner!

    2

    hashtag
    Making your first browser request

    The easiest way to make your first Gaffa browser request is to use our API Playground, where you can see several pre-made, interactive browser request examples we've built against our test site, which simulates some common scraping and web automation scenarios. You can run these examples without a paid account and also edit them easily to experiment. Once you have a paid account you can also use the playground to build your automations for other sites.

    hashtag
    Gaffa API Playground examples

    Here are all the sample requests we've created for use in the API Playground.

    3

    hashtag
    Building your own browser requests

    Once you have a paid account and are ready to start building your own browser requests, you'll want to read about all the other actions you can use in your solution, how to use proxy servers and our cache, and the other endpoints that are part of the API.

    interactive API definitions
    a comprehensive list of actions
    breakdowns of our example requests
    email usarrow-up-right
    the support tool on our sitearrow-up-right
    newsletterarrow-up-right
    Required
    Description

    percentage

    integer

    The percentage the page should scroll up or down (+/-) Range: [-100 - 0 - 100] Default: 100 (% - scroll to bottom)

    wait_time

    integer

    After arriving at the desired scroll location, this is the time Gaffa should monitor for changes to the page height before marking the action as succeeded. Read more below. Default: 0

    max_scroll_time

    integer

    The maximum amount of time the page should be scrolled for, in milliseconds. After this time passes, the action will be cancelled. This doesn't cause the action to fail. Default: 20,000 (20s)

    scroll_speed

    string

    The speed at which the page should scroll to the desired point. You can read more about this below. Default: medium Accepted: [slow, medium, instant]

    interval

    See universal parameters.

    hashtag
    Scroll Speed & Interval

    Gaffa gives you flexibility over how fast you scroll down the page, which can be really useful for getting around restrictions enforced by sites that detect and limit fast scrolling. By experimenting with scroll_speed and interval you will be able to create the perfect scrolling action for your scenario. The speed settings are as follows:

    • instant - the page will smoothly scroll to the desired position immediately, useful for sites with no rate limits or loading events caused by scroll actions.

    • medium - human-like scrolling at a normal speed to the desired position. Gaffa will scroll in much the same way as you would using a mouse.

    • slow - human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling whilst reading a page.

    interval allows you to adjust the scroll speed further by inserting pauses between scroll events.

    circle-info

    We've found that some sites with infinite scrolling and strict rate limits respond better to instant-speed scrolls to the bottom of the page, with large intervals between these scrolls to keep within rate limits.
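
One way to experiment with these settings is to build the scroll action in code and check the values before sending them. The sketch below is only an illustration: build_scroll_action is a hypothetical helper, the ranges and defaults are taken from the parameter descriptions above, and the values in the usage example are arbitrary.

```python
from typing import Any

ACCEPTED_SPEEDS = ("slow", "medium", "instant")


def build_scroll_action(
    percentage: int = 100,
    scroll_speed: str = "medium",
    interval: int = 0,
    wait_time: int = 0,
    max_scroll_time: int = 20_000,
) -> dict[str, Any]:
    """Return a scroll action dict, checking values against the documented ranges."""
    if not -100 <= percentage <= 100:
        raise ValueError("percentage must be between -100 and 100")
    if scroll_speed not in ACCEPTED_SPEEDS:
        raise ValueError(f"scroll_speed must be one of {ACCEPTED_SPEEDS}")
    if min(interval, wait_time, max_scroll_time) < 0:
        raise ValueError("time values must not be negative")
    return {
        "type": "scroll",
        "percentage": percentage,
        "scroll_speed": scroll_speed,
        "interval": interval,
        "wait_time": wait_time,
        "max_scroll_time": max_scroll_time,
    }


# Instant jumps to the bottom with a long pause between them (illustrative values)
rate_limited_scroll = build_scroll_action(
    scroll_speed="instant", interval=3_000, max_scroll_time=30_000
)
print(rate_limited_scroll)
```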

    hashtag
    Wait Time

    If wait_time is set to 0 and Gaffa arrives at the desired location then Gaffa will immediately mark the action as succeeded. However, if another value is set then the page will be monitored for the desired amount of time to check for further expansions. If, during this period, the page expands again then Gaffa will continue scrolling to the desired location and the wait will reset.

    circle-info

    This can be really useful if you find that the site takes some time to load more items when you reach the bottom of the page, as without a wait those items would be loaded only after the action had already succeeded.
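
The reset behaviour is easier to see with a small simulation. The sketch below is a simplified model, not Gaffa's implementation: it assumes the scroll has already reached the bottom at time 0, that scrolling to newly loaded content takes no time, and that simulate_wait (a hypothetical helper) is given the times at which the page grows.

```python
def simulate_wait(
    expansions_ms: list[int],
    wait_time: int,
    max_scroll_time: int = 20_000,
) -> tuple[int, int]:
    """Return (finish_time_ms, expansions_handled) for a modelled scroll action."""
    if wait_time == 0:
        # No monitoring: the action succeeds as soon as the position is reached.
        return 0, 0
    deadline = wait_time
    handled = 0
    for t in sorted(expansions_ms):
        if t > deadline or t > max_scroll_time:
            break  # the page grew too late to be noticed
        handled += 1
        deadline = t + wait_time  # scroll again and restart the wait
    return min(deadline, max_scroll_time), handled


# The page grows 800 ms and 1500 ms after reaching the bottom, then again at 4000 ms
result = simulate_wait([800, 1500, 4000], wait_time=1000)
print(result)
```

In this model the first two expansions each restart the one second wait, while the third arrives after the wait has expired and is missed.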

    hashtag
    Usage

    hashtag
    Scroll a particular percentage down the page

    The following code will scroll halfway down the page.

    hashtag
    Scroll an infinitely scrolling webpage

    The following code will scroll to the bottom of the page and then keep scrolling when new content loads for a maximum of 25 seconds, waiting 1 second for new content and scrolling at a slow pace with 1 second between scroll actions.

    hashtag
    Read more

    POST endpoint
    Type
    Click
    Wait

    Actions

    When making a Browser Request you can specify a list of actions you wish for us to carry out on the requested web page. These actions conform to the following format:

    hashtag
    Universal Parameters

    All actions have the following parameters:

    Name
    Type
    Required
    Description

    hashtag
    Action Execution

    Actions are carried out in the order they are submitted. Every action type has a continue_on_fail parameter which defaults to false. This means that if any action fails, execution of the browser request ends and an error is returned. Setting continue_on_fail to true ensures that the remaining actions are still carried out regardless of that action's result, and an error will not be returned.
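
The execution rule can be sketched as a simple loop. This is a model of the behaviour described above rather than Gaffa's code: run_actions and fake_execute are hypothetical, and the fake executor just pretends that any action targeting "#missing" fails.

```python
from typing import Any, Callable


def run_actions(
    actions: list[dict[str, Any]],
    execute: Callable[[dict[str, Any]], bool],
) -> list[dict[str, Any]]:
    """Run actions in order, stopping at the first failure not marked continue_on_fail."""
    results = []
    for action in actions:
        succeeded = execute(action)
        results.append({"type": action["type"], "succeeded": succeeded})
        if not succeeded and not action.get("continue_on_fail", False):
            break  # default behaviour: a failed action ends the request
    return results


def fake_execute(action: dict[str, Any]) -> bool:
    """Stand-in executor: actions targeting '#missing' fail, all others succeed."""
    return action.get("selector") != "#missing"


results = run_actions(
    [
        {"type": "wait", "selector": "#missing", "continue_on_fail": True},
        {"type": "click", "selector": "#missing"},
        {"type": "capture_screenshot"},
    ],
    fake_execute,
)
print(results)
```

Here the failed wait is skipped over because of continue_on_fail, but the failed click stops the run, so the screenshot is never attempted.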

    hashtag
    Custom Id

    As shown above, you can submit a customId with each action you submit to the API. We'll include this Id in the outputs from the browser request so you can find a certain action's output and/or status easily in the response.

    hashtag
    Response Format

    When a browser request has completed, information on each action's execution is returned in the response in the following format:

    hashtag
    Supported Actions

    The Gaffa API supports the actions detailed below. Click the "read more" buttons for more information about each type.

    hashtag
    Actions without outputs

    hashtag
    Actions with outputs

    Capture a Full Height Screenshot

    An example request that uses Gaffa to dismiss a modal, scroll to the bottom of a page and then capture a full height screenshot.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa can also capture screenshots at any point during your interaction for use in your app or just to work out exactly what was being shown at a given point in time. You can capture just what is shown, as if you were looking at the screen, or the full height of the page.

    hashtag
    API Request

    The request below uses the POST endpoint to open the demo site on the ecommerce page with 20 items, wait for and dismiss the dialog, scroll to the bottom of the page and capture a full height screenshot.

    hashtag
    Actions

    hashtag
    Response

    The exported full height screenshot of the page showing all items.

    "actions": [
          {
            "type": "scroll",
            "percentage": 50
          }
    ]
    "actions": [
          {
            "type": "scroll",
            "percentage": 100,
            "scroll_speed": "slow",
            "max_scroll_time": 25000,
            "interval": 1000,
            "wait_time": 1000
          }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=false&modalDelay=0&formType=address&firstName=John&lastName=Doe&address1=123%20Main%20Street&city=London&country=UK",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": true,
        "actions": [
          {
            "type": "type",
            "selector": "#email",
            "text": "johndoe@example.com"
          },
          {
            "type": "type",
            "selector": "#state",
            "text": "CA"
          },
          {
            "type": "type",
            "selector": "#zipCode",
            "text": "12345"
          },
          {
            "type": "click",
            "selector": "button[type='submit']"
          },
          {
            "type": "wait",
            "selector": "[role=\"dialog\"] h2:has-text(\"Success!\")",
            "timeout": 10000
          }
        ]
      }
    }
    {
        "type": "", //the type of the action
        //other params follow as key value pairs
        "key": value //string, number etc. 
    }
    account dashboardarrow-up-right

    Print to PDF

    Export a web page to PDF and wait for elements to load with the Gaffa API.

    Convert to Markdown

    Export a web page to markdown format - useful for feeding into LLM apps.

    Infinitely Scroll

    Scroll to the bottom of a page that infinitely loads items and record the interaction.

    Capture Screenshot

    Interact with a page and capture a screenshot of the whole page.

    Form Completion

    Fill out a form in a human-like way and record the interaction.

    integer

    The amount of time, in milliseconds, that scrolling should pause between scroll events. Read more about this below. Default: 0

    timeout

    integer

    The maximum amount of time Gaffa will wait for the page to become scrollable Default: 0

    How to Handle Infinite Scrolling and Dynamic Loading with Gaffa’s Scroll Action

    POST endpoint
    Wait
    Click
    Scroll
    Capture Screenshot
    Gaffa's full height screenshot
    {
      "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "div[role=\"dialog\"]",
            "timeout": 10000
          },
          {
            "type": "click",
            "selector": "[data-testid=\"accept-all-button\"]"
          },
          {
            "type": "wait",
            "selector": "[data-testid^=\"product-1\"]",
            "timeout": 5000
          },
          {
            "type": "scroll",
            "percentage": 100
          },
          {
            "type": "capture_screenshot",
            "size": "fullscreen"
          }
        ]
      }
    }

    type

    string

    The type name of the action.

    continue_on_fail

    boolean

    Should execution of further actions continue or throw an error if this action fails. Default: false

    customId

    string

    A customId to help you find the action in the response. Default: null

    • click - Click on a given element

    • scroll - Scroll to a particular point on the page or, in the case of pages with infinite scrolling, scroll until a given time has elapsed.

    • type - Type the provided text into a given element

    • wait - Wait for a given time to elapse or an element to appear on page before proceeding to the next action.

    • capture_cookies - Save a JSON object of cookies for the current page

    • capture_dom - Export the raw DOM page data

    • capture_screenshot - Capture a screenshot of the web page

    • capture_snapshot - Create a completely static version of the web page which can be accessed offline

    • download_file - Download an online file using Gaffa

    • generate_markdown - Convert the page into markdown

    • generate_simplified_dom - Generate a simplified version of the DOM

    • parse_json - Parse online data to a defined JSON schema

    • print - Print the web page to a PDF

    Introduction

    What is Gaffa?

    Gaffa is a powerful API for browser automation which allows you to control real web browsers at scale through a simple interface with no configuration necessary. We'll handle the complexities of managing infrastructure like virtual machines, proxies and caching so you can focus on building powerful and reliable web automation and AI applications!

    hashtag
    Key features

    Gaffa is ready to power your web automations:

    • Simplicity - there's no need to learn another new framework, Gaffa is accessible through a simple REST API - just tell it what site you want to visit and what actions you want to perform and it will be carried out as soon as you send the request.

    • Real browsers - headless browsers are popular but we make it simple to control real cloud-hosted browsers at scale which render JavaScript sites exactly as they would on a local machine, are harder to detect when doing scraping and allow full observability. We're also planning to allow you to go beyond just being able to control web browsers!

    • Proxies - you can easily choose to route your traffic through a network of residential proxy IP addresses to help avoid bot-detection on sites you are trying to automate.

    • Scalable - whether you want to control a single cloud browser or 100s in parallel, with Gaffa you can do that easily without a thought about infrastructure management.

    • Powerful data processing - once you've accessed your desired site you can export your data in a constantly growing number of formats. If you want the page content in markdown to feed into a large language model or an image to feed into a vision model, we can help.

    hashtag
    Ready to work with Gaffa?

    hashtag
    Stay up to date

    We'll be sporadically announcing updates and new features in our newsletter - sign up here.

    Capture a full-height screenshot of a webpage

    In just a few lines of JSON inlined in a single cURL command, you can automate:

    • Dismissing Wikipedia’s EU cookie consent banner (if present)

    • Waiting for the main heading on the Artificial Intelligence article

    {
        "id": "", //a unique id given to the action by Gaffa
        "type": "capture_screenshot", //the type of the action
        "query": "", //a representation of the action in querystring format
        "timestamp": "", //the UTC timestamp the action was executed
        "output": "", //if the action has an output you will find a url for this here
        "error": "" //if the request fails the error message will be returned here
    }

    API Playground

    Start experimenting with the Gaffa API right now.

    Get Started

    The simple steps to get you started using Gaffa in your apps.

    API Reference

    Explore the API and docs for the finer details

    page content in markdown
    an image
    Get Startedchevron-right
    sign up herearrow-up-right
    Click
    Scroll
    Type
    Wait
    Capture Cookies
    DOM
    Screenshot
    Snapshot
    Download File
    Markdown
    Simplified DOM
    JSON Parsing
    Print
  • Scrolling through every section (lazy-loaded images and all)

  • Capturing a full-page PNG for archiving, visual regression, or documentation

    All without installing Playwright or managing headless browsers: Gaffa handles it for you server-side via the Browser Requests API.

    hashtag
    Prerequisites

    • A valid Gaffa API key

    • A simple HTTP client (cURL, Postman, axios, etc.).

    • Familiarity with the API Playground for testing browser requests.

    • A target URL. For this tutorial we'll use Wikipedia: https://en.wikipedia.org/wiki/Artificial_intelligence

    1

    hashtag
    Execute the Request

    Use cURL with the full JSON payload inlined to ensure Gaffa receives exactly what you intend:

    Replace YOUR_API_KEY with your actual token from your Dashboard. This command has the following actions:

    1. Wait and Click (optional): Detect and accept Wikipedia’s cookie banner if it appears. If these actions fail, that simply means no banner was present or it did not load in time. Since continue_on_fail is set to true on both, Gaffa will move on without halting the workflow, ensuring the rest of the steps still execute.

    2. Wait: Ensure the main heading (#firstHeading) is loaded.

    3. Scroll: Scroll through the entire page to trigger any lazy-loaded content.

    4. Capture Screenshot: Produce a full-page PNG.

    2

    hashtag
    Retrieve Your Screenshot

    A successful response returns JSON like:

    The response contains the following information:

    If you don't want to use cURL, you can also run this query in the Gaffa API Playground, which is an easy way to get started.

    hashtag
    Use Cases

    Gaffa's screenshot action could be used for a huge number of use cases, but here are a few ideas:

    • Visual Regression: Integrate into your CI pipeline to compare changes over time.

    • Archival: Schedule daily captures for audit or compliance purposes.

    • Monitoring: Automate periodic checks to detect visual bugs or layout shifts.

    hashtag
    All this is powered by Gaffa’s hosted headless browsers with no local setup required. Experiment with more actions and build complex browser workflows easily. Refer to the full Browser Requests API documentation for additional capabilities.

    How to scrape all images from a website using Gaffa

    This tutorial will show you how you can use Gaffa to retrieve all pages from a site and then download all images across those pages.

    Automating the collection of images from a website can save hours of manual work. Whether you're a marketer building a competitor analysis, a developer creating a dataset, or an archiver preserving digital content, doing this manually is tedious and error-prone.

    In this tutorial, you'll learn how to use Gaffa's powerful Mapping and Browser Requests endpoints to automatically find, extract, and download every image from a website in a short Python script. We'll leverage features like the capture_dom action, intelligent sitemap parsing, and the download_file action to handle this efficiently and responsibly.

    By the end of this guide, you'll be able to:

    • Use Gaffa's site/map endpoint to discover every page on a site.

    • Render each page with a headless browser to capture its full DOM.

    • Parse and download all images using Gaffa's download_file action with residential proxies.

    • Run the process at scale with built-in proxy rotation and caching.

    hashtag
    Prerequisites

    • Python 3.10+ installed on your machine.

    • A Gaffa API key. Sign up for a free account and get your API key from the dashboard.

    • Basic familiarity with the command line.

    1

    hashtag
    Set Up Your Environment

    First, create a new project directory and install the required Python libraries.

    Next, set your Gaffa API key as an environment variable to keep it secure.

    2

    hashtag
    Why This Gaffa-Powered Approach is Superior

    • Handles JavaScript-Rendered Content: Unlike simple HTTP scrapers, Gaffa uses a real browser, so it captures anything that is lazy-loaded by JavaScript.

    • Stealth Downloading with Residential Proxies: The download_file action uses real browsers and proxies, making your requests appear as legitimate user traffic.

    • Intelligent Caching: With `max_cache_age` set to 24 hours, repeated requests for the same image are served from cache, reducing load on target servers and improving efficiency.

    hashtag
    Use Cases and Ideas

    This technique is useful for far more than just downloading pictures. Here are a few ideas:

    • Competitive Analysis: Analyze the product photography styles of competitors using real browsers.

    • AI/ML Datasets: Build large, curated image datasets for training computer vision models with ethically-sourced images.

    • Website Migration & Audits: Download all assets from an old site before a migration while minimizing server impact through caching.

    hashtag
    Next Steps

    The full script is available on our GitHub repository.

    Ready to automate your image collection with enterprise-grade infrastructure? Sign up for Gaffa and start building today.

    curl https://api.gaffa.dev/v1/browser/requests \
      --request POST \
      --header 'Content-Type: application/json' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --data '{
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "async": false,
        "max_cache_age": 0,
        "settings": {
          "actions": [
            {
              "type": "wait",
              "selector": "#cookie-policy-notice",
              "timeout": 10000,
              "continue_on_fail": true
            },
            {
              "type": "click",
              "selector": "#cookie-policy-notice",
              "continue_on_fail": true
            },
            {
              "type": "wait",
              "selector": "#firstHeading",
              "timeout": 10000
            },
            {
              "type": "scroll",
              "percentage": 100
            },
            {
              "type": "capture_screenshot",
              "size": "fullscreen"
            }
          ]
        }
      }'
  • data.id: Unique request identifier.
  • data.state: "completed" means the workflow finished (even if some steps timed out).

  • data.credit_usage: Credits consumed for this run.

  • data.started_at / data.completed_at: Workflow timing.

  • data.running_time and data.page_load_time: Performance metrics.

  • data.actions: Each action’s details, including successes, timeouts, and final screenshot URL.

  • Within the list of actions you'll be able to see the capture_screenshot action which contains an output parameter containing the full size screenshot that was captured.
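
As a small sketch of reading these fields in code, the helpers below pick out an action's output URL and list any actions that reported an error. The function names are hypothetical and the sample is a trimmed copy of the tutorial response, keeping only the fields the helpers use.

```python
from typing import Optional


def find_action_output(response: dict, action_type: str) -> Optional[str]:
    """Return the output URL of the first action of the given type, if it has one."""
    for action in response["data"]["actions"]:
        if action["type"] == action_type and "output" in action:
            return action["output"]
    return None


def failed_actions(response: dict) -> list[tuple[str, str]]:
    """List (type, error) pairs for every action that reported an error."""
    return [
        (action["type"], action["error"])
        for action in response["data"]["actions"]
        if action.get("error")
    ]


# Trimmed copy of the tutorial response, for illustration only
sample = {
    "data": {
        "state": "completed",
        "actions": [
            {"type": "wait", "error": "action_timed_out"},
            {"type": "click", "error": "action_timed_out"},
            {"type": "wait"},
            {"type": "scroll"},
            {
                "type": "capture_screenshot",
                "output": "https://storage.gaffa.dev/brq/image/brq_VJX3mbESLiyCFYvZQEUih9RdDYovog/act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU_full.png",
            },
        ],
    }
}
screenshot_url = find_action_output(sample, "capture_screenshot")
print(screenshot_url)
print(failed_actions(sample))
```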

    https://en.wikipedia.org/wiki/Artificial_intelligence
    {
      "data": {
        "id": "brq_VJX3mbESLiyCFYvZQEUih9RdDYovog",
        "url": "https://en.wikipedia.org/wiki/Artificial_intelligence",
        "proxy_location": null,
        "state": "completed",
        "credit_usage": 2,
        "http_status_code": 200,
        "from_cache": false,
        "started_at": "2025-06-09T15:55:46.4235903Z",
        "completed_at": "2025-06-09T15:56:27.9381332Z",
        "running_time": "00:00:40.7348244",
        "page_load_time": "00:00:02.2087117",
        "actions": [
          {
            "id": "act_VJX3memaue6YUgFcn44uNscZbVUpYg",
            "type": "wait",
            "query": "wait?selector=%23cookie-policy-notice%2C%20.mw-cookie-consent-container&timeout=10000&continue_on_fail=true",
            "timestamp": "2025-06-09T15:55:48.6323091Z",
            "error": "action_timed_out"
          },
          {
            "id": "act_VJX3mkwfwNPdGiMUpqKr34Tm5xzyUU",
            "type": "click",
            "query": "click?selector=%23cookie-policy-notice%20button%2C%20.mw-cookie-consent-container%20button&continue_on_fail=true&timeout=5000",
            "timestamp": "2025-06-09T15:55:58.7949275Z",
            "error": "action_timed_out"
          },
          {
            "id": "act_VJX3mkSJ3sevWRXUCjFy6zwfD172fV",
            "type": "wait",
            "query": "wait?selector=%23firstHeading&timeout=10000&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:03.9581113Z"
          },
          {
            "id": "act_VJX3mbq9Jgj8EwADszW2AqdeJJXJiY",
            "type": "scroll",
            "query": "scroll?percentage=100&max_scroll_time=20000&scroll_speed=medium&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:03.9691994Z"
          },
          {
            "id": "act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU",
            "type": "capture_screenshot",
            "query": "capture_screenshot?size=fullscreen&continue_on_fail=false",
            "timestamp": "2025-06-09T15:56:20.0727905Z",
            "output": "https://storage.gaffa.dev/brq/image/brq_VJX3mbESLiyCFYvZQEUih9RdDYovog/act_VJX3mjBQYv8zTsXv1SkgUnBkzNFmJU_full.png"
          }
        ]
      },
      "error": null
    }

    hashtag
    The Core Script Explained

    Let's build the script step-by-step. The core logic involves three main parts: mapping the site, capturing the DOM of each page, and extracting the images using Gaffa's download system.

    Fetch All URLs from the Sitemap

    The site/map endpoint is our starting point. It does the heavy lifting of discovery by reading the sitemap, traversing possible link-outs and retrieving every page available on the website you want to scrape.

    Capture the Rendered DOM of a Page

    For each URL, we use Gaffa to fully render the page (executing JavaScript) and capture the final DOM. This is an important step since many websites are actually not fully rendered when we receive them. They contain links to JavaScript files that need to be executed first. These scripts will load further content from the backend, load images and other data. It’s necessary to first generate a fully rendered page before actually diving deeper into the scraping of it, otherwise we would only scrape the content that was already provided with the initial HTML.

    Extract Images and Download with Gaffa

    With the real HTML in hand, we extract image URLs using a simple regex pattern and use Gaffa's download_file action for secure, reliable downloads. This also allows us to use caching, which avoids downloading the same image over and over again and putting unnecessary load on the target server.

    3

    hashtag
    Bringing It All Together

    The main() function orchestrates the entire workflow: mapping the site, processing each page, and downloading the images using Gaffa's infrastructure.

    4

    hashtag
    Run the Script

    Save the complete code to a file like gaffa_scrape_images.py and run it from your terminal:

    Sit back and watch as Gaffa automatically discovers, renders, and scrapes every image from the site using proxies and real browsers. The script will create timestamped folders and save all the images there.

  • Built-in Reliability: Gaffa's infrastructure handles proxy rotation, request pacing and retries automatically, and provides the correct file format directly.

  • Respectful Scraping: Gaffa's infrastructure is designed for responsible automation. Always check a website's robots.txt and terms of service before scraping, and respect reasonable rate limits.

  • Archival & Documentation: Preserve visual evidence for journalism or create backups of a site's visual content using proxies for access.

    site/map
    download_file
    Sign up for a free accountarrow-up-right
    GitHub repositoryarrow-up-right
    Sign up for Gaffaarrow-up-right
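    import os
    import re
    from urllib.parse import urljoin
    import requests

    # Send the API key (from the GAFFA_API_KEY environment variable) with every request
    HEADERS = {"X-API-Key": os.environ["GAFFA_API_KEY"]}

    def get_sitemap_urls(site_url, max_cache_age=86400):
        payload = {
            "url": site_url,
            "max_cache_age": max_cache_age
        }
        print("Retrieving sitemap URLs.")
        response = requests.post("https://api.gaffa.dev/v1/site/map",
            json=payload, headers=HEADERS)
        return response.json()["data"]["links"]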
    def main():
        site_url = "https://gaffa.dev"
        # Limit the run to the first three pages to keep the example quick
        sitemap_urls = get_sitemap_urls(site_url)[:3]

        for i, url in enumerate(sitemap_urls, 1):
            dom_content = get_dom(url)
            image_urls = extract_image_urls(dom_content, url)

            # Download only the first image found on each page
            if image_urls:
                download_image(image_urls[0], f"image_{i}")

    if __name__ == "__main__":
        main()
    python3 gaffa_scrape_images.py
    # Create a new directory and navigate into it
    mkdir gaffa-image-scraper && cd gaffa-image-scraper
    
    # Create a virtual environment (optional but recommended)
    python -m venv venv
    source venv/bin/activate
    # On macOS/Linux
    export GAFFA_API_KEY='your_gaffa_api_key_here'
    download_file
    def get_dom(url):
        payload = {
            "url": url,
            "async": False,
            "settings": {
                "actions": [
                    {"type": "wait", "selector": "img", "timeout": 20000},
                    {"type": "capture_dom"}
                ],
                "time_limit": 40000
            }
        }
        print("Capturing DOM URL.")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", 
            json=payload, headers=HEADERS)
        dom_url = response.json()["data"]["actions"][1]["output"]
        print("Retrieving DOM.")
        dom_response = requests.get(dom_url)
        return dom_response.text
    def extract_image_urls(dom_content, base_url):
        image_urls = []
        src_pattern = r'<img[^>]+(?:src|data-src)=["\']([^"\']+)["\']'
        matches = re.findall(src_pattern, dom_content)
        
        for src in matches:
            if not src.startswith(('http:', 'https:')):
                src = urljoin(base_url, src)
            image_urls.append(src)
        
        return image_urls
    
    def download_image(image_url, filename):
        payload = {
            "url": image_url,
            "async": False,
            "settings": {
                "actions": [{"type": "download_file"}]
            }
        }
        print("Retrieving download URL.")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", json=payload, headers=HEADERS)
        actions = response.json()["data"]["actions"]
        download_url = actions[0]["output"]
        download_ext = os.path.splitext(download_url)[1]
        
        print("Downloading image.")
        img_response = requests.get(download_url)
        filepath = f"{filename}{download_ext}"
        with open(filepath, 'wb') as f:
            f.write(img_response.content)
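One caveat with `os.path.splitext(download_url)`: if the download URL carries a query string, the "extension" it returns can include part of that query. A minimal sketch of a more defensive helper follows; `extension_from_url` and its `.bin` fallback are hypothetical names and choices, not part of the tutorial script.

```python
import os
from urllib.parse import urlparse

def extension_from_url(url, default=".bin"):
    """Return the file extension of a URL's path, ignoring query string and fragment."""
    path = urlparse(url).path          # drops ?query and #fragment
    ext = os.path.splitext(path)[1]    # e.g. ".png", or "" if there is none
    return ext if ext else default

# Hypothetical URLs, used only to illustrate the behaviour.
print(extension_from_url("https://storage.example.com/files/cat.png?sig=abc.def"))
print(extension_from_url("https://storage.example.com/files/download"))
```

In `download_image`, this helper could replace the `os.path.splitext(download_url)[1]` call.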
    

    Convert any webpage into LLM-ready Markdown using Gaffa

    The ability to convert websites into LLM-friendly markdown is powerful when building applications for summarization, Q&A, or knowledge extraction. In this guide, you'll learn how to use the Gaffa API to extract the main content of any web page using browser rendering and convert it into structured markdown.

    By the end of this guide, you’ll be able to:

    • Render web pages using Gaffa’s API.

    • Extract clean page content.

    • Generate structured markdown suitable for LLM-based Q&A or summarization.

    hashtag
    Prerequisites

    1. Install Python 3.10 or newer.

    2. Create a virtual environment.

    3. Install the required libraries.

    4. Get your Gaffa API key and OpenAI API key, and store them as environment variables:

    hashtag
    Convert a webpage to Markdown

    In the code below, we define a function that takes a URL as input and makes a POST request to the Gaffa API, invoking the generate_markdown action, which uses the browser rendering engine to extract the main content of the page and convert it into markdown.

    hashtag
    Ask questions using OpenAI

    Now that we have the markdown content, we can ask questions about it using the OpenAI API. The function below takes the markdown content and a question as input and uses the OpenAI API to generate an answer based on the provided content. In this case, we are using the gpt-3.5-turbo model, but you can choose any other model.

    The markdown becomes the model’s context, enabling accurate answers about the original web content.
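Because the markdown is passed as context, its length matters: the tutorial code simply keeps the first 3,000 characters. The sketch below shows one way that trimming could be made a little gentler by cutting at a paragraph break. `build_prompt` and its `max_chars` parameter are hypothetical and not part of the tutorial script.

```python
def build_prompt(markdown, question, max_chars=3000):
    """Assemble the prompt text, trimming the markdown to at most max_chars characters."""
    context = markdown[:max_chars]
    # If the document was cut, prefer to end at the last complete paragraph.
    if len(markdown) > max_chars:
        cut = context.rfind("\n\n")
        if cut > 0:
            context = context[:cut]
    return (
        "You are an assistant helping analyze different webpages.\n\n"
        f"Markdown content:\n{context}\n\n"
        f"Question: {question}\nAnswer as clearly as possible."
    )

# Short documents are passed through unchanged.
print(build_prompt("# Title\n\nA short article.", "What is the title?"))
```

A character limit is only a rough proxy for the model's token limit; a tokenizer-based limit would be more precise.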

    hashtag
    User Interaction and Execution

    Having defined the functions, we can now create a simple command-line interface that allows users to input a URL and ask questions about the content.

    hashtag
    Full Script

    The full script is available to download from the Gaffa Python Examples GitHub repo.

    hashtag
    Running the Script

    To run the script, simply execute it in your terminal:

    With your script running, you can enter the URL of any web page, and the script will fetch the markdown content and let you ask questions about it.

    Gaffa can help automatically fill out your forms!
    Gaffa APIarrow-up-right
    OpenAI APIarrow-up-right
    generate_markdown
    gpt-3.5-turboarrow-up-right
    Gaffa Python Examples GitHub repoarrow-up-right
    python -m venv venv && source venv/bin/activate
    pip install requests openai
    export GAFFA_API_KEY=your_gaffa_api_key
    export OPENAI_API_KEY=your_openai_api_key
    import os

    import openai
    import requests
    
    GAFFA_API_KEY = os.getenv("GAFFA_API_KEY")
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    
    # Fetch the markdown content from Gaffa
    def fetch_markdown_with_gaffa(url):
        payload = {
            "url": url,
            "proxy_location": None,
            "async": False,
            "max_cache_age": 0,
            "settings": {
                "record_request": False,
                "actions": [
                    {
                        "type": "wait",
                        "selector": "article"
                    },
                    {
                        "type": "generate_markdown"
                    }
                ]
            }
        }
       
        # Set the headers for the request
        headers = {
            "x-api-key": GAFFA_API_KEY,
            "Content-Type": "application/json"
        }
        # Make the POST request to the Gaffa API
        print("Calling Gaffa API to generate markdown...")
        response = requests.post("https://api.gaffa.dev/v1/browser/requests", json=payload, headers=headers)
        response.raise_for_status()
       
        # Extract the markdown URL from the response
        markdown_url = response.json()["data"]["actions"][1]["output"]
       
        # Fetch the markdown content from the generated URL
        print(f"📥 Fetching markdown from: {markdown_url}")
        markdown_response = requests.get(markdown_url)
        markdown_response.raise_for_status()
       
        return markdown_response.text
    def ask_question(markdown, question):
        # openai>=1.0 (what `pip install openai` installs) uses a client object;
        # the older openai.ChatCompletion interface was removed in that release.
        client = openai.OpenAI(api_key=OPENAI_API_KEY)
        prompt = (
            f"You are an assistant helping analyze different webpages.\n\n"
            f"Markdown content:\n{markdown[:3000]}\n\n"
            f"Question: {question}\nAnswer as clearly as possible."
        )
    
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content
    def main():
        url = input("Enter the URL of the article: ")
        try:
            markdown = fetch_markdown_with_gaffa(url)
            print("\n✅ Markdown successfully retrieved from Gaffa.\n")
    
            while True:
                question = input("Ask a question about the content (or type 'exit'): ")
                if question.lower() == "exit":
                    break
                answer = ask_question(markdown, question)
                print(f"\n💬 Answer: {answer}\n")
    
        except Exception as e:
            print(f"⚠️ Error: {e}")
    
    if __name__ == "__main__":
        main()
    python your_script_name.py

    Parse JSON

    circle-info

    Paid Action: This action consumes credits based on the amount of content being parsed; see Pricing below.

    triangle-exclamation

    Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact support and we can enable this feature for your account.

    Type: parse_json

    The parse_json action extracts data from web pages and online PDFs. It uses AI to parse web content from text into a pre-defined data schema and return it as a JSON object.

    The action allows you to convert unstructured content such as academic papers, forms, and webpages into JSON objects, which you can use in automations, analysis, or further processing.

    This feature currently works for online PDFs and web page text.

    hashtag
    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    hashtag
    Defining Data Schemas

    A data schema tells the model exactly what JSON structure to produce.

    You can define schemas in two ways:

    • Inline schemas (defined directly inside the action)

    • Reusable schemas (created via the Schema API and referenced by ID in your requests)
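The two options can be sketched as a small helper that builds a `parse_json` action from either an inline schema or a saved schema ID. `build_parse_json_action` is a hypothetical helper, and rejecting requests that supply both values is an assumption: the documentation only states that one of `data_schema` or `data_schema_id` must be provided.

```python
def build_parse_json_action(data_schema=None, data_schema_id=None, output_type="file"):
    """Build a parse_json action dict from an inline schema or a saved schema ID."""
    # Assumption: exactly one of the two schema references should be supplied.
    if (data_schema is None) == (data_schema_id is None):
        raise ValueError("Provide exactly one of data_schema or data_schema_id")
    action = {"type": "parse_json", "output_type": output_type}
    if data_schema is not None:
        action["data_schema"] = data_schema        # inline schema
    else:
        action["data_schema_id"] = data_schema_id  # reusable schema, referenced by ID
    return action

inline = build_parse_json_action(
    data_schema={
        "name": "ArticleMetadata",
        "description": "Extract metadata from an article",
        "fields": [{"type": "string", "name": "title", "description": "Article title"}],
    },
    output_type="inline",
)
saved = build_parse_json_action(data_schema_id="schema_abc123xyz")
print(inline["type"], sorted(saved))
```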

    hashtag
    Schema Structure

    A schema has:

    Property
    Type
    Description

    Each field in the fields array has:

    hashtag
    Supported Field Types

    Type
    Description

    hashtag
    Inline Schema Example

    This example shows:

    • Simple fields (string, datetime) for basic data

    • Object fields for grouped related data with nested fields

    hashtag
    Schema Operations

    Instead of defining schemas inline every time, they can be saved to your Gaffa account and be reused across multiple requests. This makes your actions more readable, easier to maintain, and ensures consistency when parsing similar content.

    hashtag
    Creating a Saved Schema

    Use the POST /v1/schemas endpoint to create a reusable schema:

    Response:

    Save the id returned in the response. You'll use it to reference the schema in your requests.

    hashtag
    Managing Schemas

    hashtag
    List all schemas:

    Allows you to view all schemas saved to your account:

    Endpoint: GET /v1/schemas

    hashtag
    Update a schema:

    Allows you to modify an existing schema by its ID:

    Endpoint: PUT /v1/schemas/{id}

    hashtag
    Delete a schema:

    Removes a schema from your account:

    Endpoint: DELETE /v1/schemas/{id}

    hashtag
    Common Schema Patterns

    Simple List Extraction

    Nested Objects

    hashtag
    Pricing

    The number of credits this action uses depends on the model used. Here are the currently supported models and their pricing:

    Model
    Input Token Cost
    Output Token Cost

    model

    string

    The AI model you wish to use to parse the content into JSON. Default: gpt-4o-mini Accepted: ["gpt-4o-mini"]

    input_token_cap

    int

    The max number of source input tokens that will be passed to the AI model to parse. This can be used to prevent unnecessary credit usage. If your source input is longer than the token cap, it will be abbreviated. Default: 1,000,000

    selector

    string

    The selector that defines the element whose content you want to parse. This is useful if you are only interested in the contents of a certain element.

    output_type

    string

    Whether the action output is saved to a file, in which case a URL is returned, or the parsed JSON object is included directly in the request. Default: file Accepted: ["file", "inline"]

    max_pages

    int

    If you are parsing a PDF you can specify this parameter to limit the number of pages that are passed to the LLM. Default: no limit

    object

    Nested structured object

    string

    Text value

    • Array fields for lists of items, with nested fields defining each item's structure

    data_schema_id

    string

    The id of the data schema you have defined that you want to transform the content into. You must provide a data_schema or data_schema_id with your request.

    data_schema

    json

    A JSON object describing the data_schema you want to transform the content into.

    You must provide a data_schema or data_schema_id with your request.

    instruction

    string

    description

    string

    Explains what data the schema extracts and provides context to help the AI model understand the extraction goal. Example: "Extract product details from this e-commerce product page"

    fields

    array

    Each field defines a piece of data to extract from the content. See field properties below.

    name

    string

    This identifies the schema and should clearly indicate what data it extracts. Example: "ProductInfo", "ArticleMetadata", "ContactForm"

    description

    string

    Include details about format, handling of missing values, or special cases.

    Example: "Maximum salary in GBP. If only one value is provided, use the same value for both min and max. Return null if not provided."

    fields

    array

    Required only for object and array types.

    name

    string

    Use clear, descriptive names that follow your preferred naming convention (e.g., snake_case or camelCase). Example: "product_name", "published_date", "author_email"

    type

    string

    Determines how the AI interprets and structures the extracted data. Must be one of the supported types below.

    array

    List of items

    boolean

    True/False

    datetime

    timestamp

    decimal

    Precise decimal

    double

    Floating-point number

    integer

    Whole number
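As a rough illustration of the rules above, the sketch below checks a schema's fields locally before sending it: every type must be one of the supported types, and object and array fields must carry nested fields. `validate_fields` is a hypothetical client-side helper; the API performs its own validation, which may differ.

```python
# The field types listed in the Supported Field Types table.
SUPPORTED_TYPES = {
    "array", "boolean", "datetime", "decimal",
    "double", "integer", "object", "string",
}

def validate_fields(fields, path=""):
    """Return a list of problems found in a schema's fields (empty if none found)."""
    problems = []
    for field in fields:
        name = field.get("name", "<unnamed>")
        where = f"{path}{name}"
        ftype = field.get("type")
        if ftype not in SUPPORTED_TYPES:
            problems.append(f"{where}: unsupported type {ftype!r}")
        nested = field.get("fields")
        if ftype in ("object", "array"):
            if not nested:
                problems.append(f"{where}: {ftype} fields need nested fields")
            else:
                # Recurse so nested fields are checked with a dotted path.
                problems.extend(validate_fields(nested, path=f"{where}."))
    return problems

schema_fields = [
    {"type": "string", "name": "product_name"},
    {"type": "object", "name": "ratings", "fields": [
        {"type": "double", "name": "average"},
        {"type": "integer", "name": "total_reviews"},
    ]},
]
print(validate_fields(schema_fields))
```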

    gpt-4o-mini

    1 credit per 10,000 input tokens

    4 credits per 10,000 output tokens
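For budgeting, the listed rates can be turned into a quick estimate. This is a sketch only: `estimate_credits` is a hypothetical helper, and it returns a fractional value because the documentation does not say how usage is rounded when credits are actually charged.

```python
# Credits per 10,000 tokens, taken from the pricing table above.
RATES = {"gpt-4o-mini": {"input": 1, "output": 4}}

def estimate_credits(input_tokens, output_tokens, model="gpt-4o-mini"):
    """Estimate credit usage for one parse; rounding behaviour is not modelled."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 10_000

# 50,000 input tokens and 2,000 output tokens:
# (50,000 * 1 + 2,000 * 4) / 10,000 = 5.8 credits
print(estimate_credits(50_000, 2_000))
```

Setting `input_token_cap` on the action bounds the input side of this estimate.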

    universal parameters
    POST /v1/schemasarrow-up-right
    GET /v1/schemasarrow-up-right
    PUT /v1/schemasarrow-up-right
    DELETE /v1/schemas/:idarrow-up-right

    A custom instruction, in addition to any detail you have added to the data schema, that you want to include with this particular parse.

    Infinitely Scroll an Ecommerce Site

    An example request that uses Gaffa to infinitely scroll down a simulated ecommerce site whilst recording the interaction.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa automates infinite scrolling on dynamic pages like e-commerce storefronts. Set a duration, and Gaffa will capture all content as it scrolls. Each session can be recorded as a video for playback, letting you debug or review the interaction.

    hashtag
    API Request

    {
      "type": "parse_json",
      "data_schema": {
        "name": "ArticleMetadata",
        "description": "Extract metadata from an article",
        "fields": [
          {
            "type": "string",
            "name": "title",
            "description": "Article title"
          },
          {
            "type": "string",
            "name": "author",
            "description": "Author name"
          },
          {
            "type": "datetime",
            "name": "published",
            "description": "Publication date"
          }
        ]
      },
      "model": "gpt-4o-mini",
      "output_type": "inline"
    }
    curl -L \
      --request POST \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "ProductInfo",
        "description": "Extract product details from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "boolean",
            "name": "in_stock",
            "description": "Product availability"
          },
          {
            "type": "object",
            "name": "ratings",
            "description": "Product rating information",
            "fields": [
              {
                "type": "double",
                "name": "average",
                "description": "Average rating score"
              },
              {
                "type": "integer",
                "name": "total_reviews",
                "description": "Number of reviews"
              }
            ]
          },
          {
            "type": "array",
            "name": "tags",
            "description": "Product tags",
            "fields": [
              {
                "type": "string",
                "name": "tag",
                "description": "Individual tag name"
              }
            ]
          }
        ]
      }'
    {
      "id": "schema_abc123xyz",
      "name": "ProductInfo",
      "description": "Extract product details from e-commerce pages",
      "fields": [...]
    }
    curl -L \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    curl -L \
      --request PUT \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "id": "schema_abc123xyz",
        "name": "ProductInfo",
        "description": "Extract detailed product information from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "string",
            "name": "brand",
            "description": "Product brand name"
          }
        ]
      }'
    curl -L \
      --request DELETE \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    {
      "name": "TagList",
      "description": "Extract article tags",
      "fields": [
        {
          "type": "array",
          "name": "tags",
          "description": "List of article tags",
          "fields": [
            {
              "type": "string",
              "name": "tag",
              "description": "Individual tag name"
            }
          ]
        }
      ]
    }
    {
      "name": "ProductWithReviews",
      "description": "Product details with nested review data",
      "fields": [
        {
          "type": "string",
          "name": "product_name",
          "description": "Product name"
        },
        {
          "type": "object",
          "name": "pricing",
          "description": "Pricing information",
          "fields": [
            {
              "type": "decimal",
              "name": "current_price",
              "description": "Current price"
            },
            {
              "type": "decimal",
              "name": "original_price",
              "description": "Original price before discount"
            },
            {
              "type": "integer",
              "name": "discount_percentage",
              "description": "Discount percentage"
            }
          ]
        }
      ]
    }
    selector arrow-up-right
    The request below uses the POST endpoint to open the demo site on the ecommerce site simulator with an infinitely scrolling storefront. It waits for and dismisses a dialog box, waits for a product to load, and then scrolls down the page for a maximum of 20 seconds; if new items load, it keeps scrolling.

    hashtag
    Actions

    hashtag
    Response

    Here's a video showing Gaffa scrolling the page for 20 seconds as more items load.

    hashtag
    Read More

    Read more about screen recording here. (TODO)

    demo site.
    Gaffa API Playground
    Wait
    Click
    Scroll
    Get Started
    {
      "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=infinite",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": true,
        "actions": [
          {
            "type": "wait",
            "selector": "div[role=\"dialog\"]",
            "timeout": 10000
          },
          {
            "type": "click",
            "selector": "[data-testid=\"accept-all-button\"]"
          },
          {
            "type": "wait",
            "selector": "[data-testid^=\"product-1\"]",
            "timeout": 5000
          },
          {
            "type": "scroll",
            "percentage": 100,
            "max_scroll_time": 20000
          }
        ]
      }
    }
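The same request could be assembled and sent from Python roughly as follows. This is a sketch under assumptions: `build_scroll_request` and `send_browser_request` are hypothetical helper names, and the standard-library `urllib` is used only to keep the example dependency-free. The endpoint, header name, and action parameters are the ones shown in this documentation.

```python
import json
import urllib.request

def build_scroll_request(url, max_scroll_time=20000, record=True):
    """Build the browser request payload: dismiss the dialog, wait for products, scroll."""
    return {
        "url": url,
        "proxy_location": None,
        "async": False,
        "max_cache_age": 0,
        "settings": {
            "record_request": record,
            "actions": [
                {"type": "wait", "selector": 'div[role="dialog"]', "timeout": 10000},
                {"type": "click", "selector": '[data-testid="accept-all-button"]'},
                {"type": "wait", "selector": '[data-testid^="product-1"]', "timeout": 5000},
                {"type": "scroll", "percentage": 100, "max_scroll_time": max_scroll_time},
            ],
        },
    }

def send_browser_request(payload, api_key):
    """POST the payload to the browser requests endpoint and return the parsed JSON."""
    req = urllib.request.Request(
        "https://api.gaffa.dev/v1/browser/requests",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_scroll_request(
    "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=infinite"
)
print(json.dumps(payload, indent=2))
# To actually run it: send_browser_request(payload, "YOUR_API_KEY")
```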

    hashtag
    Create a new browser request

    post

    This endpoint loads the required URL in our browser and then performs the selected actions.

    Authorizations
    X-API-Key · string · Required
    Body
    post
    /v1/browser/requests

    hashtag
    Create a new data schema

    post

    Creates a new data schema definition and returns the created schema.

    Authorizations
    X-API-Key · string · Required
    Body
    id · string · nullable · Optional

    The unique identifier for the data schema.

    name · string · nullable · Optional

    The name of the schema or field.

    description · string · nullable · Optional

    A description of the schema or field.

    Responses
    chevron-right
    200

    Payload of DataSchema

    application/json
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    post
    /v1/schemas

    hashtag
    List data schemas

    get

    Retrieves a paginated list of data schemas.

    Authorizations
    X-API-Key · string · Required
    Query parameters
    pageSize · integer · int32 · Optional
    page · integer · int32 · Optional
    Responses
    chevron-right
    200

    Payload of PagedResult containing DataSchema

    application/json
    total_pagesinteger · int32 · nullableOptional

    The total number of pages available

    total_recordsinteger · int32 · nullableOptional

    The total number of records across all pages

    pageinteger · int32 · nullableOptional

    The page number to return (1-based)

    Default: 1
    page_sizeinteger · int32 · nullableOptional

    The number of records to return per page

    Default: 30
    get
    /v1/schemas
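Since the list is paginated, a client usually needs to walk the pages. The sketch below shows one way to do that using the documented `total_pages` field. `iter_pages` and `fetch_page` are hypothetical: `fetch_page(page)` stands in for whatever function calls GET /v1/schemas with the `page` query parameter and returns the object carrying the paging fields. Where the schema items sit inside that object is not shown in this reference, so the sketch yields whole pages.

```python
def iter_pages(fetch_page, start_page=1):
    """Yield each page's result until the reported total_pages is reached."""
    page = start_page
    while True:
        result = fetch_page(page)
        yield result
        total = result.get("total_pages") or 0  # nullable in the response
        if page >= total:
            break
        page += 1

# A stand-in fetcher with three pages, used only to exercise the loop.
def fake_fetch(page):
    return {"page": page, "page_size": 30, "total_pages": 3, "total_records": 75}

for result in iter_pages(fake_fetch):
    print("got page", result["page"], "of", result["total_pages"])
```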

    hashtag
    Update an existing data schema

    put

    Updates an existing data schema by its ID and returns the updated schema.

    Authorizations
    X-API-Key · string · Required
    Path parameters
    id · string · Required
    Body
    id · string · nullable · Optional

    The unique identifier for the data schema.

    name · string · nullable · Optional

    The name of the schema or field.

    description · string · nullable · Optional

    A description of the schema or field.

    Responses
    chevron-right
    200

    Payload of DataSchema

    application/json
    idstring · nullableOptional

    The unique identifier for the data schema.

    namestring · nullableOptional

    The name of the schema or field.

    descriptionstring · nullableOptional

    A description of the schema or field.

    put
    /v1/schemas/{id}

    hashtag
    Get Sitemap

    get

    This endpoint retrieves sitemap requests in bulk by id or status.

    Authorizations
    X-API-Key · string · Required
    Query parameters
    ids · string · Optional

    The unique identifiers of the sitemap requests to retrieve.

    Example: {"value":"smr_1234567890abcdef,smr_0987654321fedcba"}
    status · string · Optional

    The statuses of the sitemap requests to filter by. Valid values: pending, completed, failed

    Example: {"value":"completed,pending"}
    pageSize · integer · int32 · Optional

    Items to return per page (default: 30).

    Example: {"value":30}
    page · integer · int32 · Optional

    Page number of the pagination (default: 1).

    Example: {"value":1}
    Responses
    chevron-right
    200

    A collection of sitemap requests that match the criteria

    application/json
    total_pagesinteger · int32 · nullableOptional

    The total number of pages available

    total_recordsinteger · int32 · nullableOptional

    The total number of records across all pages

    pageinteger · int32 · nullableOptional

    The page number to return (1-based)

    Default: 1
    page_sizeinteger · int32 · nullableOptional

    The number of records to return per page

    Default: 30
    chevron-right
    400

    Invalid query parameters

    application/json
    get
    /v1/site/map

    hashtag
    Get a sitemap request by ID

    get

    This endpoint retrieves a sitemap request by its ID.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired

    The unique identifier of the sitemap request to retrieve.

    Responses
    chevron-right
    200

    The sitemap request

    application/json
    idstring · nullableOptional

    ID of the sitemap request

    urlstring · nullableOptional

    URL of the request

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    linksstring[] · nullableOptional

    List of URLs found in the sitemap

    link_countinteger · int32 · nullableOptional

    Number of links found

    chevron-right
    404

    Sitemap request not found

    application/json
    get
    /v1/site/map/{id}

    hashtag
    Get a browser request by ID

    get

    This endpoint retrieves a browser request by its ID.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired

    The unique identifier of the browser request to retrieve.

    Query parameters
    idstringRequired

    The unique identifiers of the browser request to retrieve.

    Responses
    chevron-right
    200

    The browser request

    application/json
    idstring · nullableOptional

    ID of the browser request

    urlstring · nullableOptional

    URL of the request

    proxy_locationstring · nullableOptional

    The proxy location of the request.

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    actual_urlstring · nullableOptional

    The actual URL captured, after any redirects.

    http_status_codeinteger · int32Optional

    The http status code for the request.

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    page_load_timestring · timespanOptional

    How long the page took to fully render.

    videostring · nullableOptional

    Video url

    chevron-right
    404

    Browser request not found

    application/json
    get
    /v1/browser/requests/{id}

    hashtag
    Delete a data schema

    delete

    Deletes a data schema by its ID.

    Authorizations
    X-API-KeystringRequired
    Path parameters
    idstringRequired
    Responses
    chevron-right
    204

    No description

    delete
    /v1/schemas/{id}

    hashtag
    Create a new sitemap request

    post

    This endpoint processes a website's sitemap and returns all URLs found within it.

    Authorizations
    X-API-Key · string · Required
    Body
    url · string · Optional

    The url you want our sitemap reader to process on your behalf

    max_cache_age · integer · int32 · nullable · Optional

    Maximum cache age in seconds for this request. If a cached result exists within this timeframe, it will be returned. Default is 0 (no cache).

    Responses
    chevron-right
    200

    The sitemap request response detailing the URLs found

    application/json
    idstring · nullableOptional

    ID of the sitemap request

    urlstring · nullableOptional

    URL of the request

    statestring · nullableOptional

    The status of the request

    credit_usageinteger · int32 · nullableOptional

    The number of credits used by the request

    errorstring · nullableOptional

    The name of the error type

    error_reasonstring · nullableOptional

    More detail about the error

    from_cacheboolean · nullableOptional

    If this request was served from the cache

    started_atstring · date-timeOptional

    The time in UTC when the request started.

    completed_atstring · date-timeOptional

    The time in UTC when the request finished.

    running_timestring · timespanOptional

    The running time of the request

    linksstring[] · nullableOptional

    List of URLs found in the sitemap

    link_countinteger · int32 · nullableOptional

    Number of links found

    chevron-right
    408

    The sitemap request timed out after 60 seconds

    application/json
    chevron-right
    503

    The requested site is unavailable

    application/json
    post
    /v1/site/map

    hashtag
    Get multiple browser requests

    get

    This endpoint retrieves browser requests in bulk by id or status.

    Authorizations
    X-API-Key · string · Required
    Query parameters
    ids · string · Optional

    The unique identifiers of the browser requests to retrieve.

    Example: {"value":"brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6,brq_V2P6X38RDRMRyYcNJ82qPSH5eFfQRD"}
    status · string · Optional

    The statuses of the browser requests to filter by. Valid values: pending, running, completed, failed

    Example: {"value":"completed,running"}
    pageSize · integer · int32 · Optional

    Items to return per page (default: 30).

    Example: {"value":20}
    page · integer · int32 · Optional

    Page number of the pagination (default: 1).

    Example: {"value":1}
    Responses
    chevron-right
    200

    A collection of browser requests that match the criteria

    application/json
    total_pagesinteger · int32 · nullableOptional

    The total number of pages available

    total_recordsinteger · int32 · nullableOptional

    The total number of records across all pages

    pageinteger · int32 · nullableOptional

    The page number to return (1-based)

    Default: 1
    page_sizeinteger · int32 · nullableOptional

    The number of records to return per page

    Default: 30
    chevron-right
    400

    Invalid query parameters

    application/json
    get
    /v1/browser/requests
    proxy_location · string · nullable · Optional

    The location of the proxy server that your request will be routed through. null means no proxy is used.

    Default: null
    url · string · Optional

    The url you want our browsers to visit on your behalf

    async · boolean · Optional

    Whether the request should be processed asynchronously. Synchronous requests can run for a maximum of 60 seconds.

    Default: true
    max_cache_age · integer · int32 · nullable · Optional

    The maximum age of a cached result in seconds. 0 means the cache will never be used

    Default: 0
    Responses
    chevron-right
    200

    The browser request response detailing the state and output of the request

    application/json
    id · string · nullable · Optional

    ID of the browser request

    url · string · nullable · Optional

    URL of the request

    proxy_location · string · nullable · Optional

    The proxy location of the request.

    state · string · nullable · Optional

    The status of the request

    credit_usage · integer · int32 · nullable · Optional

    The number of credits used by the request

    error · string · nullable · Optional

    The name of the error type

    error_reason · string · nullable · Optional

    More detail about the error

    actual_url · string · nullable · Optional

    The actual URL captured, after any redirects.

    http_status_code · integer · int32 · Optional

    The HTTP status code for the request.

    from_cache · boolean · nullable · Optional

    Whether this request was served from the cache

    started_at · string · date-time · Optional

    The time in UTC when the request started.

    completed_at · string · date-time · Optional

    The time in UTC when the request finished.

    running_time · string · timespan · Optional

    The running time of the request

    page_load_time · string · timespan · Optional

    How long the page took to fully render.

    video · string · nullable · Optional

    Video URL

    408

    The browser request timed out - an example error

    application/json
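The `timespan` fields in the response (`running_time`, `page_load_time`) appear in the samples as strings such as `00:00:00.1902889` and `00:01:00`. The sketch below converts them to `timedelta` values and produces a one-line summary of a request. Both function names are hypothetical, and the parser assumes the `hh:mm:ss[.fraction]` shape seen in the samples, with no day component.

```python
from datetime import timedelta


def parse_timespan(value):
    """Convert 'hh:mm:ss[.fraction]' into a timedelta; None stays None."""
    if value is None:
        return None
    hours, minutes, seconds = value.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


def summarise(request):
    """Return a short status line for one browser request object."""
    if request.get("error"):
        reason = request.get("error_reason")
        return f"{request['id']}: {request['state']} ({request['error']}: {reason})"
    load = parse_timespan(request.get("page_load_time"))
    load_text = "n/a" if load is None else f"{load.total_seconds():.3f}s"
    credits = request.get("credit_usage")
    return f"{request['id']}: {request['state']}, {credits} credit(s), page load {load_text}"


# Hypothetical, trimmed request object using the documented field names.
print(summarise({
    "id": "brq_example",
    "state": "completed",
    "credit_usage": 1,
    "error": None,
    "page_load_time": "00:00:03.4705813",
}))
```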
    200

    Payload of DataSchema

    200

    Payload of PagedResult containing DataSchema

    200

    Payload of DataSchema

    204

    No description

    No content

    POST /v1/browser/requests HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 326
    
    {"proxy_location":null,"url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","async":false,"max_cache_age":0,"settings":{"record_request":false,"actions":[{"type":"wait","selector":"table"},{"type":"print","size":"A4","margin":20}],"time_limit":60000,"max_media_bandwidth":null,"output":null,"block_ads":false}}
    {"data":{"id":"brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6","url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","proxy_location":null,"state":"completed","credit_usage":1,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T14:33:38.7762685+00:00","completed_at":"2024-11-22T14:33:42.7135779+00:00","running_time":null,"page_load_time":"00:00:00.1902889","actions":[{"id":"act_V2P6Q3fSHhBWAhf4BQJxj9oYbQF1V9","type":"wait","custom_id":null,"timestamp":"2024-11-22T14:33:38.9665719+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2P6Q6BuQhoYDAVkQMVSKkkwmUrArb","type":"print","custom_id":null,"timestamp":"2024-11-22T14:33:42.3025888+00:00","output":"https://storage.gaffa.dev/brq/pdf/brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6/....","reference":null,"iterations":null,"actions":null,"error":null}],"video":null},"error":null}
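A response like the one above wraps the request in a `data`/`error` envelope, and each action may carry an `output` (here, the URL of the printed PDF). The sketch below pulls those outputs out of a parsed response. `action_outputs` is a hypothetical helper, and the sample IDs and URL are placeholders, not real values.

```python
import json


def action_outputs(envelope, action_type=None):
    """Yield (action id, output) for actions that produced an output."""
    if envelope.get("error"):
        raise RuntimeError(f"API error: {envelope['error']}")
    data = envelope.get("data") or {}
    for action in data.get("actions") or []:
        if action.get("output") is None:
            continue  # e.g. a "wait" action produces nothing
        if action_type is None or action["type"] == action_type:
            yield action["id"], action["output"]


# Placeholder response with the same shape as the documented sample.
sample = json.loads("""
{"data": {"id": "brq_example", "state": "completed",
          "actions": [
            {"id": "act_1", "type": "wait", "output": null},
            {"id": "act_2", "type": "print", "output": "https://example.com/out.pdf"}
          ]},
 "error": null}
""")

for action_id, output in action_outputs(sample, action_type="print"):
    print(action_id, output)
```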
    {"id":"schema_abc123def456","name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]}
    POST /v1/schemas HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 450
    
    {"name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]}
    {"data":{"total_pages":1,"total_records":3,"results":[{"id":"schema_abc123def456","name":"Customer Schema","description":"Data schema for customer information","fields":[{"type":"string","name":"firstName","description":"Customer's first name","fields":[]},{"type":"string","name":"lastName","description":"Customer's last name","fields":[]},{"type":"integer","name":"age","description":"Customer's age in years","fields":[]},{"type":"boolean","name":"isActive","description":"Whether the customer account is active","fields":[]}]},{"id":"schema_xyz789uvw123","name":"Product Schema","description":"Data schema for product information","fields":[{"type":"string","name":"productName","description":"Name of the product","fields":[]},{"type":"decimal","name":"price","description":"Product price","fields":[]},{"type":"boolean","name":"inStock","description":"Whether the product is in stock","fields":[]},{"type":"array","name":"tags","description":"Product tags","fields":[{"type":"string","name":"tagItem","description":null,"fields":[]}]}]},{"id":"schema_hij456klm789","name":"Order Schema","description":"Data schema for order processing","fields":[{"type":"string","name":"orderId","description":"Unique order identifier","fields":[]},{"type":"datetime","name":"orderDate","description":"Date when order was placed","fields":[]},{"type":"object","name":"customer","description":"Customer information","fields":[{"type":"string","name":"customerId","description":"Customer identifier","fields":[]},{"type":"string","name":"email","description":"Customer email address","fields":[]}]},{"type":"decimal","name":"totalAmount","description":"Total order amount","fields":[]}]}],"page":1,"page_size":30},"error":null}
    GET /v1/schemas HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "name": "text",
      "description": "text",
      "fields": [
        {
          "type": 0,
          "name": "text",
          "description": "text",
          "fields": [
            {
              "type": 0,
              "name": "text",
              "description": "text",
              "fields": [
                "[Circular Reference]"
              ]
            }
          ]
        }
      ]
    }
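Schema fields nest through their own `fields` arrays, which is why the generated placeholder above ends in a circular reference. A short sketch of walking that structure recursively follows. `field_paths` is a hypothetical helper, and the sample mirrors the "Order Schema" entry from the list response in this reference.

```python
def field_paths(fields, prefix=""):
    """Return (dotted path, type) for every field, descending into nested 'fields'."""
    paths = []
    for field in fields:
        path = f"{prefix}{field['name']}"
        paths.append((path, field["type"]))
        # Child fields are listed under the same key; an empty list ends the recursion.
        paths.extend(field_paths(field.get("fields") or [], prefix=path + "."))
    return paths


order_fields = [
    {"type": "string", "name": "orderId", "fields": []},
    {"type": "datetime", "name": "orderDate", "fields": []},
    {"type": "object", "name": "customer", "fields": [
        {"type": "string", "name": "customerId", "fields": []},
        {"type": "string", "name": "email", "fields": []},
    ]},
    {"type": "decimal", "name": "totalAmount", "fields": []},
]

for path, type_name in field_paths(order_fields):
    print(f"{path}: {type_name}")
```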
    PUT /v1/schemas/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 934
    
    {"name":"Updated Product Schema","description":"Enhanced schema for product information with additional fields","fields":[{"type":"string","name":"productName","description":"Name of the product","fields":[]},{"type":"decimal","name":"price","description":"Product price","fields":[]},{"type":"boolean","name":"inStock","description":"Whether the product is in stock","fields":[]},{"type":"array","name":"tags","description":"Product tags","fields":[{"type":"string","name":"tagItem","description":null,"fields":[]}]},{"type":"array","name":"categories","description":"Product categories","fields":[{"type":"string","name":"category","description":null,"fields":[]}]},{"type":"object","name":"specifications","description":"Technical specifications","fields":[{"type":"string","name":"dimensions","description":"Product dimensions","fields":[]},{"type":"double","name":"weight","description":"Product weight in grams","fields":[]}]}]}
    {"total_pages":0,"total_records":1,"results":[{"id":"smr_1234567890abcdef","url":"https://example.com","state":"completed","credit_usage":1,"error":null,"error_reason":null,"from_cache":false,"started_at":"2024-01-01T12:00:00+00:00","completed_at":"2024-01-01T12:01:00+00:00","running_time":"00:01:00","links":["https://example.com/","https://example.com/about","https://example.com/products"],"link_count":3}],"page":1,"page_size":30}
    GET /v1/site/map HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "url": "text",
      "state": "text",
      "credit_usage": 1,
      "error": "text",
      "error_reason": "text",
      "from_cache": true,
      "started_at": "2026-03-05T07:58:07.752Z",
      "completed_at": "2026-03-05T07:58:07.752Z",
      "running_time": "text",
      "links": [
        "text"
      ],
      "link_count": 1
    }
    GET /v1/site/map/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {
      "id": "text",
      "url": "text",
      "proxy_location": "text",
      "state": "text",
      "credit_usage": 1,
      "error": "text",
      "error_reason": "text",
      "actual_url": "text",
      "http_status_code": 1,
      "from_cache": true,
      "started_at": "2026-03-05T07:58:07.752Z",
      "completed_at": "2026-03-05T07:58:07.752Z",
      "running_time": "text",
      "page_load_time": "text",
      "actions": [
        {
          "id": "text",
          "type": "text",
          "custom_id": "text",
          "timestamp": "2026-03-05T07:58:07.752Z",
          "output": {},
          "reference": "text",
          "iterations": 1,
          "actions": [
            {
              "id": "text",
              "type": "text",
              "custom_id": "text",
              "timestamp": "2026-03-05T07:58:07.752Z",
              "output": {},
              "reference": "text",
              "iterations": 1,
              "actions": [
                "[Circular Reference]"
              ],
              "error": "text"
            }
          ],
          "error": "text"
        }
      ],
      "video": "text"
    }
    GET /v1/browser/requests/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    DELETE /v1/schemas/{id} HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
    
    {"id":"smr_1234567890abcdef","url":"https://example.com","state":"completed","credit_usage":1,"error":null,"error_reason":null,"from_cache":false,"started_at":"2024-01-01T12:00:00+00:00","completed_at":"2024-01-01T12:00:30+00:00","running_time":"00:00:30","links":["https://example.com/","https://example.com/about","https://example.com/products","https://example.com/contact"],"link_count":4}
    POST /v1/site/map HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Content-Type: application/json
    Accept: */*
    Content-Length: 47
    
    {"url":"https://example.com","max_cache_age":0}
    {"data":{"total_pages":1,"total_records":2,"results":[{"id":"brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP","url":"https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20","proxy_location":null,"state":"completed","credit_usage":4,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T16:31:13.128103+00:00","completed_at":"2024-11-22T16:31:47.020851+00:00","running_time":null,"page_load_time":"00:00:03.4705813","actions":[{"id":"act_V2PUfETiTdXwzEgAW2NPURnATW7we9","type":"wait","custom_id":null,"timestamp":"2024-11-22T16:31:16.6080484+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2PUfBreQxHR2SNqGXuPzzWoiyRsrm","type":"print","custom_id":null,"timestamp":"2024-11-22T16:31:40.5760333+00:00","output":"https://storage.gaffa.dev/brq/pdf/brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP/act_V2PUfBreQxHR2SNqGXuPzzWoiyRsrm.pdf","reference":null,"iterations":null,"actions":null,"error":null}],"video":"https://storage.gaffa.dev/brq/video/brq_V2PUfFA8AQPAQ5VEsewpxdGUSZkgKP.mp4"},{"id":"brq_V2NmHY9FsvPQEGbfVBSeV6UCp2SXjC","url":"https://demo.gaffa.dev/simulate/article?loadTime=3&paragraphs=10&images=3","proxy_location":null,"state":"completed","credit_usage":1,"error":null,"error_reason":null,"actual_url":null,"http_status_code":200,"from_cache":false,"started_at":"2024-11-22T12:52:48.708264+00:00","completed_at":"2024-11-22T12:52:54.25994+00:00","running_time":null,"page_load_time":"00:00:00.8094888","actions":[{"id":"act_V2NmHijnQa9iPDNcvhjS2GGFt5se8j","type":"wait","custom_id":null,"timestamp":"2024-11-22T12:52:49.5690537+00:00","output":null,"reference":null,"iterations":null,"actions":null,"error":null},{"id":"act_V2NmHgs27VJKB49YavtK4CcyErdfvD","type":"generate_markdown","custom_id":null,"timestamp":"2024-11-22T12:52:52.8353136+00:00","output":"https://storage.gaffa.dev/brq/md/brq_V2NmHY9FsvPQEGbfVBSeV6UCp2SXjC/act_V2NmHgs27VJKB49YavtK4CcyErdfvD.md","reference":null,"iterations":null,"actions":null,"error":null}],"video":null},{"id":"brq_V2HvS2cw4Z2wonqEAwbxoxjrkmRdEM","url":"https://demo.gaffa.dev/simulate/article","proxy_location":null,"state":"failed","credit_usage":0,"error":null,"error_reason":null,"actual_url":null,"http_status_code":null,"from_cache":false,"started_at":null,"completed_at":null,"running_time":null,"page_load_time":null,"actions":null,"video":null}],"page":1,"page_size":30},"error":null}
    GET /v1/browser/requests HTTP/1.1
    Host: api.gaffa.dev
    X-API-Key: YOUR_API_KEY
    Accept: */*
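Because the list response reports `total_pages`, `page` and `page_size`, a client can walk every page in turn. The following is a sketch under stated assumptions: `iter_all_results` is a hypothetical helper, the HTTP call is supplied by the caller as `fetch_page`, and the response is assumed to use the `data`/`error` envelope shown in the browser requests sample. `fake_fetch` only stands in for the network so the example runs offline.

```python
def iter_all_results(fetch_page, page_size=30):
    """Yield every record by requesting page 1, 2, ... up to total_pages.

    fetch_page(page, page_size) must return the parsed JSON envelope.
    """
    page = 1
    while True:
        envelope = fetch_page(page, page_size)
        if envelope.get("error"):
            raise RuntimeError(f"API error: {envelope['error']}")
        data = envelope["data"]
        yield from data.get("results") or []
        # total_pages is nullable, so treat a missing value as "no more pages".
        if page >= (data.get("total_pages") or 0):
            break
        page += 1


def fake_fetch(page, page_size):
    """Stand-in for an HTTP call; serves five records in pages of page_size."""
    records = [f"brq_{n}" for n in range(5)]
    start = (page - 1) * page_size
    total_pages = -(-len(records) // page_size)  # ceiling division
    return {
        "data": {
            "total_pages": total_pages,
            "total_records": len(records),
            "results": records[start:start + page_size],
            "page": page,
            "page_size": page_size,
        },
        "error": None,
    }


print(list(iter_all_results(fake_fetch, page_size=2)))
```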
    
    Gaffa scrolling to the bottom of a simulated ecommerce page!