
Block DOM Removals

Type: block_dom_removals

This action prevents the page from removing items from the DOM. This is useful if you are trying to scrape data from a JavaScript-based web application that removes items when they are out of view, which can make grabbing data difficult.

Using this action will block DOM removals for the rest of the browser request.

Parameters

See universal parameters.

Usage

Block DOM removals on the current page

Capture Cookies

Type: capture_cookies

This action will capture the browser cookies currently saved for the web page you are on and return them as a JSON object with key/values.

Parameters

See universal parameters.

Usage

Capture the cookies of the current page

Capture Element

Type: capture_element

Returns the innerHTML, essentially the contents, of a particular element on the page. Use this when you are only interested in the contents of one element.

Parameters

Name
Type
Required

Capture DOM

Type: capture_dom

This action will capture and return the raw DOM of the page, which you can then extract data from on your end.

For common AI scenarios you may find this returns too much data, so we have provided a generate_simplified_dom action which distills the DOM to only the important elements.

Parameters

See universal parameters.

Capture Screenshot

Type: capture_screenshot

Takes a screenshot of the current page. You can choose to take a full screen screenshot showing the whole page or just the current view.

Parameters

Name
Type
Required
Usage

Capture the raw DOM of the current page

Example Output

GaffaDOMSample.txt (13KB)
"actions": [
    {
      "type": "capture_dom"
    }
]
Description

selector

string

The selector that defines the element whose contents you want to capture.

timeout

integer

The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

See universal parameters.

Usage

Capture the contents of an element on the page

The following code will wait 1 second for the .page_contents element to appear and return an HTML file containing the div's innerHTML.
Description

size

string

The area of the page to capture: view for the part currently visible in the browser, fullscreen for the whole page. Default: view Accepted: ["view", "fullscreen"]

See universal parameters.

Usage

The following captures the section of the page currently visible in the browser.

Example Output

An example screenshot in fullscreen mode.

"actions": [
    {
      "type": "block_dom_removals"
    }
]
"actions": [
    {
      "type": "capture_cookies"
    }
]

Actions

When making a browser request you can specify a list of actions you want us to carry out on the requested web page. These actions conform to the following format:

Universal Parameters

All actions have the following parameters:

Name
Type
Required
Description

Capture Snapshot

Type: capture_snapshot

This output type will return an HTML file which captures a static version of the page state. The page will load offline and can be saved to your local machine.

This will:

  • Load and embed all images on the page.

  • Embed all CSS files.

Currently, JavaScript will be disabled and interactivity might not work as expected, but this feature should be useful for preserving the page state as it was and allowing you to view it offline.

Click

Type: click

Request that the browser clicks a particular element on the page.

Parameters

Name
Type
Required
Description

Generate Simplified DOM

Type: generate_simplified_dom

When you're looking at the DOM of a web page, there's a lot of unnecessary data that can be discarded if you are only interested in the page's elements or looking to export the data into an LLM. The generate_simplified_dom output format processes the HTML in the following way:

  • Removes all links in the head

  • Removes all script nodes and links to scripts

"actions": [
    {
      "type": "capture_element",
      "selector": ".page_contents",
      "timeout": 1000
    }
]
"actions": [
    {
        "type": "capture_screenshot",
        "size": "view"
    }
]

API Playground Examples

In the following pages you can view all the pre-built requests we've built to show what is possible with the Gaffa web automation API.

You can start using these in the API Playground once you've created an account.

Parameters

See universal parameters

Usage

The following captures an offline snapshot of the current page.

Example Output

Here's an example that shows an offline snapshot of a site:

GaffaSnapshotSample.mhtml (518KB)
  • Removes all style nodes

  • Removes style attributes from all elements

  • Removes all links to stylesheets

  • Removes all noscript elements outside of the body

  • Finds all hrefs with query strings and removes the query strings

  • Keeps important meta tags and removes all others

  • Removes all alternate links

  • Removes all SVG paths

  • Removes empty text nodes and excessive spacing
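To make the query-string step concrete, here is a minimal Python sketch of removing the query string from a single href. It is an illustration only, not Gaffa's implementation: the `strip_query` helper is hypothetical, and keeping the fragment (`#...`) is an assumption, since the list above does not say what happens to it.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(href: str) -> str:
    """Return href without its query string (hypothetical helper).

    Assumption: the fragment is preserved; only the part between
    '?' and '#' is dropped.
    """
    parts = urlsplit(href)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


if __name__ == "__main__":
    print(strip_query("https://example.com/products?id=42&utm_source=mail"))
    print(strip_query("/a/b?x=1#top"))
```

Stripping query strings shortens links and removes tracking parameters, which reduces the amount of text handed to an LLM.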

Parameters

    See universal parameters.

    Usage

    The following JSON captures the DOM of the page and simplifies it.

    We are actively working to improve this and to make this process more configurable - let us know if there's something you think we can improve.

    Example Output

    GaffaSimplifiedDOMSample.txt (6KB)
    "actions": [
        {
            "type": "capture_snapshot"
        }
    ]
    "actions": [
        {
            "type": "generate_simplified_dom"
        }
    ]

    type

    string

    The type name of the action.

    continue_on_fail

    boolean

    Whether execution of further actions should continue if this action fails, rather than ending the request with an error. Default: false

    customId

    string

    A customId to help you find the action in the response. Default: null

    Action Execution

    Actions are carried out in the order they are submitted. Every action type has a continue_on_fail parameter which defaults to false. This means that if any action fails, execution of the browser request ends and an error is returned. Setting continue_on_fail to true on an action means that, if that action fails, the remaining actions are still carried out and no error is returned.
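The execution rule above can be modelled in a few lines of Python. This is a simplified sketch of the described behaviour, not Gaffa's code: `run_actions` and the `execute` callback are hypothetical names, and the result dictionaries are invented for illustration.

```python
from typing import Callable, Dict, List


def run_actions(actions: List[Dict], execute: Callable[[Dict], bool]) -> List[Dict]:
    """Run actions in order, stopping at the first failure unless that
    action sets continue_on_fail to true (it defaults to false)."""
    results = []
    for action in actions:
        succeeded = execute(action)
        results.append({"type": action["type"], "succeeded": succeeded})
        if not succeeded and not action.get("continue_on_fail", False):
            break  # the request ends here; later actions never run
    return results


if __name__ == "__main__":
    actions = [
        {"type": "wait", "selector": "table", "continue_on_fail": True},
        {"type": "click", "selector": "a.header__logo"},
        {"type": "print"},
    ]
    # Pretend every action except print fails.
    outcome = run_actions(actions, lambda a: a["type"] == "print")
    print(outcome)
```

In this run the failed wait is tolerated, the failed click ends the request, and print is never attempted.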

    Custom Id

    As shown above, you can submit a customId with each action you submit to the API. We'll include this Id in the outputs from the browser request so you can find a certain action's output and/or status easily in the response.

    Response Format

    When a browser request has completed, information on each action's execution is included in the response.

    Supported Actions

    The Gaffa API supports the actions detailed below. Click the "read more" buttons for more information about each type.

    Actions without outputs

    Type
    Description
    Read More

    Actions with outputs

    Type
    Description
    Read More

    selector

    string

    The selector that defines the page element that the browser should click on.

    timeout

    integer

    The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)

    See universal parameters.

    Usage

    Click an element on the page

    The following code will click the element matching the a.header__logo selector.

    Wait for a particular element to appear

    The following code will wait a maximum of 5 seconds for the logo to appear and will then continue with the list of actions, even if the logo has not appeared.

    Scroll

    Type: scroll

    Request that the browser scrolls to a certain point on the page or, in the case of pages with infinite scrolling, scrolls for a particular amount of time.

    Parameters

    Name
    Type
    Required
    Description

    Print

    Type: print

    Request that the browser prints the page to a PDF.

    Parameters

    Name
    Type
    Required
    Description

    Download File

    Type: download_file

    Request a copy of the most recent file viewed in the browser.

    Parameters

    Name
    Type
    Required
    Description
    {
        "type": "", //the type of the action
        //other params follow as key value pairs
        "key": value //string, number etc. 
    }
    {
        "id": "", //a unique id given to the action by Gaffa
        "type": "capture_screenshot", //the type of the action
        "query": "", //a representation of the action in querystring format
        "timestamp": "", //the UTC timestamp the action was executed
        "output": "", //if the action has an output you will find a url for this here
        "error": "" //if the request fails the error message will be returned here
    }
    "actions": [
        {
          "type": "click",
          "selector": "a.header__logo"
        }
    ]
    "actions": [
          {
            "type": "wait",
            "selector": "a.header__logo",
            "timeout": 5000,
            "continue_on_fail": true
          }
    ]

    click

    Click on a given element

    Click

    scroll

    Scroll to a particular point on the page or, in the case of pages with infinite scrolling, scroll until a given time has elapsed.

    Scroll

    type

    Type the provided text into a given element

    Type

    wait

    Wait for a given time to elapse or an element to appear on page before proceeding to the next action.

    Wait

    capture_cookies

    Save a JSON object of cookies for the current page

    Capture Cookies

    capture_dom

    Export the raw DOM page data

    DOM

    capture_screenshot

    Capture a screenshot of the web page

    Screenshot

    capture_snapshot

    Create a completely static version of the web page which can be accessed offline

    Snapshot

    download_file

    Download an online file using Gaffa

    Download File

    generate_markdown

    Convert the page into markdown

    Markdown

    generate_simplified_dom

    Generate a simplified version of the DOM

    Simplified DOM

    parse_json

    Parse online data to a defined JSON schema

    JSON Parsing

    print

    Print the web page to a PDF

    Print


    size

    string

    The size of paper the page should be printed to. Default: A4 Accepted: ["A4"]

    margin

    integer

    The margin of the page in pixels when the page is printed to PDF. Default: 20

    orientation

    string

    The orientation of the page when it is printed to PDF. Default: portrait Accepted: ["portrait", "landscape"]

    continue_on_fail

    boolean

    Should execution of further actions continue or throw an error if this action fails. Default: true

    See universal parameters.

    Usage

    Print a page in landscape to PDF

    The following JSON prints the page to a PDF in landscape with margins of 20px.

    Example Output

    GaffaPrintPdfExample.pdf (PDF, 51KB)

    timeout

    integer

    The maximum amount of time the browser should wait for a file to download. Default: 5,000 (5s)

    See universal parameters.

    Files Supported

    Currently this only works with the following file formats: .pdf, .jpg, .png, .gif, .bmp, .webp, .svg, .tiff, .tif, .img

    Usage

    Download a copy of a PDF open in the browser

    The following waits up to 20s for a file to download and then returns it.

    The service responds with a URL to the file in the action's output:

    "actions": [
        {
            "type": "print",
            "size": "A4",
            "orientation": "landscape",
            "margin": 20
        }
    ]
    "actions": [
        {
            "type": "download_file",
            "timeout": 20000
        }
    ]
    "actions": [
          {
            "id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
            "type": "download_file",
            "query": "download_file?continue_on_fail=false&timeout=20000",
            "timestamp": "2025-05-30T15:02:06.6615306Z",
            "output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf"
          }
    ]

    percentage

    integer

    The percentage of the page the browser should scroll down (positive) or up (negative). Range: -100 to 100 Default: 100 (scroll to bottom)

    wait_time

    integer

    After arriving at the desired scroll location, this is the time, in milliseconds, that Gaffa should monitor for changes to the page height before marking the action as succeeded. Read more below. Default: 0

    max_scroll_time

    integer

    The maximum amount of time the page should be scrolled for, in milliseconds. After this time passes, the action will be cancelled. This doesn't cause the action to fail. Default: 20,000 (20s)

    scroll_speed

    string

    The speed at which the page should scroll to the desired point. You can read more about this below. Default: medium Accepted: ["slow", "medium", "instant"]

    interval

    See universal parameters.

    Scroll Speed & Interval

    Gaffa gives you flexibility over how fast you scroll down the page, which can be really useful for getting around restrictions enforced by sites that detect and limit fast scrolling. By experimenting with scroll_speed and interval you can tune the scrolling action for your scenario. The speed settings are as follows:

    • instant - the page will scroll to the desired position immediately, useful for sites with no rate limits or loading events caused by scroll actions.

    • medium - human-like scrolling at a normal speed to the desired position. Gaffa will scroll in much the same way as you would using a mouse.

    • slow - human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling whilst reading a page.

    interval allows you to adjust the scroll speed further by inserting pauses between scroll events.

    We've found that some sites with infinite scrolling and strict rate limits respond better to instant scrolls to the bottom of the page with large intervals between them, which keeps the request within rate limits.

    Wait Time

    If wait_time is set to 0 and Gaffa arrives at the desired location then Gaffa will immediately mark the action as succeeded. However, if another value is set then the page will be monitored for that amount of time to check for further expansions. If, during this period, the page expands again then Gaffa will continue scrolling to the desired location and the wait will reset.

    This can be really useful if the site takes some time to load more items when you reach the bottom of the page, as those items would otherwise load after the action has already succeeded.

    Usage

    Scroll a particular percentage down the page

    The following code will scroll half way down the page.

    Scroll an infinitely scrolling webpage

    The following code will scroll to the bottom of the page and then keep scrolling when new content loads for a maximum of 25 seconds, waiting 1 second for new content and scrolling at a slow pace with 1 second between scroll actions.

    Read more

    Infinitely Scroll an Ecommerce Site

    An example request that uses Gaffa to infinitely scroll down a simulated ecommerce site whilst recording the interaction.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa automates infinite scrolling on dynamic pages like e-commerce storefronts. Set a duration, and Gaffa will capture all content as it scrolls. Each session can be recorded as a video for playback, letting you debug or review the interaction.

    API Request

    The request below uses the POST endpoint to open the demo site on the ecommerce site simulator with an infinitely scrolling storefront. It will wait for and dismiss a dialog box, wait for a product to load and then scroll down the page for a maximum of 20 seconds - if new items load it will keep scrolling.

    Actions

    Response

    Here's a video showing Gaffa scrolling the page for 20 seconds as more items load.

    Read More

    Read more about screen recording here.

    Type

    Type: type

    Request that the browser type a particular bit of text into a field.

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Sites that use more advanced bot detection often use keyboard events to detect unusual activity. Rather than immediately dropping all characters of the text into a field, our platform types the text in a human-like manner.

    Usage

    Type into a text box

    The following action will type into a particular text field.

    Wait for an element to appear before typing

    The following code will wait a maximum of 10 seconds for the email input to appear and then type in the provided email.

    Parse Table

    Type: parse_table

    Finds a table on the page with a given selector and then converts the table data into a JSON object.

    This action first finds the table headers and converts them into property names by converting them to lower case and replacing non-alphanumeric characters with underscores. It then processes each table row and, for each cell, extracts the contents and saves them as a value. At the moment, all values will be string types.
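The header conversion described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than Gaffa's actual code: the helper names are hypothetical, each non-alphanumeric character is assumed to become exactly one underscore (runs are not collapsed), and only ASCII letters and digits are treated as alphanumeric.

```python
import re
from typing import Dict, List


def header_to_property(header: str) -> str:
    """Lower-case the header and replace every non-alphanumeric
    character with an underscore (hypothetical helper)."""
    return re.sub(r"[^a-z0-9]", "_", header.lower())


def table_to_records(headers: List[str], rows: List[List[object]]) -> List[Dict[str, str]]:
    """Convert rows into dictionaries keyed by the converted headers.
    All values are stored as strings, as the description states."""
    keys = [header_to_property(h) for h in headers]
    return [dict(zip(keys, (str(cell) for cell in row))) for row in rows]


if __name__ == "__main__":
    records = table_to_records(["First Name", "E-mail", "Age"], [["Ada", "ada@example.com", 36]])
    print(records)
```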

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Usage

    Extract a table on the page

    The following code will wait 1 second for the .large_table element to appear and return a JSON file with the headers and rows converted.

    Generate Markdown

    Type: generate_markdown

    The markdown output format can export the data of the page (an article, table etc.) in a human and LLM readable format which removes unnecessary styling data and other "junk" that is only relevant for the site to work properly.

    Gaffa exports GitHub flavoured markdown with comments removed and unknown tags ignored.

    Parameters

    See universal parameters.

    Usage

    The following converts the current page to markdown:

    Example Output

    Capture a Full Height Screenshot

    An example request that uses Gaffa to dismiss a modal, scroll to the bottom of a page and then capture a full height screenshot.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa can also capture screenshots at any point during your interaction for use in your app or just to work out exactly what was being shown at a given point in time. You can capture just what is shown as if you were looking at the screen or the full height of the page.

    API Request

    The request below uses the POST endpoint to open the demo site on the ecommerce page with 20 items, wait for and dismiss the dialog, scroll to the bottom of the page and capture a full height screenshot.

    Automated Form Filling

    An example request that uses Gaffa to automate the completion of a form and waits for a success modal to appear.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Filling forms is tedious. Gaffa can fill out a form in a human-like manner so you can spend your time on more interesting things.

    API Request

    The request below uses the POST endpoint to open the demo site on the form simulator page with some sections pre-filled (for speed). After typing in the required information and clicking submit, Gaffa waits for the success dialog to show before returning a video of the interaction.

    Wait

    Type: wait

    Request that the browser waits a given amount of time or for a particular item to appear on the page.

    Parameters

    Name
    Type
    Required
    Description

    Browser Requests

    Making web automation requests has never been so simple.

    Browser Requests allow you to send the Gaffa API a URL and a list of actions you want to be carried out, including any outputs you want from the page. We'll carry out the request on our cloud browsers and return you the response with no need to worry about proxies, IP rotation, web automation frameworks and scaling.

    There's absolutely zero configuration needed and you can interact with Gaffa from any program that can send web requests. We think it's by far the simplest way to automate simple web tasks and the good news is, we're just getting started and have much more planned.


    Example request

    Running a new browser request is as simple as sending the following POST body to our endpoint. Below, you can see the url (our demo site) and a list of actions which instruct Gaffa to wait for a table to load and print the page to PDF.
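As a rough sketch, a request body with the same fields as this page's example can be assembled in Python before sending it. The `build_browser_request` helper is hypothetical, and the endpoint URL and authentication are not covered in this section, so the sketch stops at producing the JSON string rather than making an HTTP call.

```python
import json
from typing import Dict, List


def build_browser_request(url: str, actions: List[Dict],
                          record_request: bool = False,
                          max_cache_age: int = 0) -> Dict:
    """Assemble a browser request body (hypothetical helper).

    Field names mirror the example body in this documentation;
    proxy_location and max_media_bandwidth are left unset (null).
    """
    return {
        "url": url,
        "proxy_location": None,
        "async": False,
        "max_cache_age": max_cache_age,
        "max_media_bandwidth": None,
        "settings": {
            "record_request": record_request,
            "actions": actions,
        },
    }


if __name__ == "__main__":
    request = build_browser_request(
        "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
        [
            {"type": "wait", "selector": "table"},
            {"type": "print", "size": "A4", "margin": 20, "orientation": "portrait"},
        ],
    )
    print(json.dumps(request, indent=2))
```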

    "actions": [
          {
            "type": "scroll",
            "percentage": 50
          }
    ]
    "actions": [
          {
            "type": "scroll",
            "percentage": 100,
            "scroll_speed": "slow",
            "max_scroll_time": 25000,
            "interval": 1000,
            "wait_time": 1000
          }
    ]

    integer

    The amount of time, in milliseconds, that scrolling should pause between scroll events. Read more about this below. Default: 0

    timeout

    integer

    The maximum amount of time Gaffa will wait for the page to become scrollable Default: 0

    How to Handle Infinite Scrolling and Dynamic Loading with Gaffa’s Scroll Action


    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    selector

    string

    The selector that defines the page element that the browser should type into.

    text

    string

    The text the browser should enter into the text field.

    timeout

    integer

    The maximum amount of time the browser should wait for the element that needs to be typed in to appear. Default: 5000 (5s)


    selector

    string

    The selector that defines the table whose contents you want to parse.

    timeout

    integer

    The maximum amount of time the browser should wait for the table defined by the selector to appear. Default: 5000 (5s)


    GaffaMarkdownExample.md (5KB)

    You can read more about this particular example and how you can run it right now in our API Playground here


    Proxy servers

    In order to access public sites and use proxy servers you'll need to sign up for a paid account but after that you'll be able to build automations for any site you wish.

    Gaffa makes proxying your traffic through a global network of residential proxies super simple. Setting proxy_location in your request will allow you to utilize one of our partner third party proxy services to gain local access to a site.

    Not setting a proxy_location will mean the request does not use a proxy server and will use a generic datacenter IP.

    Available Locations

    Proxy Server Location
    Country Code

    United States

    us

    Ireland

    ie

    Singapore

    sg

    France

    fr

    At the moment all our servers are in one location, but we aim to introduce local machines in our proxy locations for more realistic end-user load times. If this would interest you, please contact support.

    IP Types

    Currently all our IP addresses are residential IP addresses which are procured through reputable third parties.

    IP Rotation

    IP rotation is an essential part of any web data, scraping or automation task. In Gaffa, each browser request is treated as unique. We regularly rotate the IP addresses used so you should assume that each request will be carried out from a different IP address from the last.

    We are working to support a greater range of IP address scenarios in the future, such as static IPs, as well as more trusted proxies for requests that require enhanced levels of security (logins etc.).

    Restrictions

    Whilst we'll do our best to provide access to as wide a range of sites as possible we may have to restrict access to certain sites to prevent abuse of our service or of other services. Our proxy partners may also enforce restrictions on certain sites and categories of sites which we don't have any control over.


    Caching

    max_cache_age: integer

    When we were building Gaffa we noticed that a lot of pre-existing scraping tools don't allow users to easily share their scraped web data with each other, despite many users requesting the same web pages on the same sites. Not only is this a waste of a user's allowance, it also puts a burden on the site owners who are serving the same data to different users for the same purpose. Because of this in Gaffa we have created a service-wide cache.

    How it works

    When making a browser request you can provide a max_cache_age parameter, which is a number of seconds equal to or greater than 0. This value denotes the maximum age of data you would accept from the API. If another user of our service has requested the same URL with exactly the same parameters and actions as you within this timeframe, then the cached response will be returned to you immediately and the request will not be carried out on one of our browsers. If there are multiple identical requests in the given timeframe then the most recent will be returned. This will save you time waiting for the response, as well as credits, because requests returned from the cache don't use any bandwidth.
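The lookup rule described above can be sketched as follows. This is a simplified model and not how Gaffa's cache is implemented: `CachedResponse`, `find_cached` and the `request_key` string (standing in for "same URL, parameters and actions") are hypothetical, and treating max_cache_age of 0 as "never use the cache" is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CachedResponse:
    request_key: str      # stands in for URL + parameters + actions
    completed_at: float   # completion time, in seconds
    body: str


def find_cached(cache: List[CachedResponse], request_key: str,
                now: float, max_cache_age: int) -> Optional[CachedResponse]:
    """Return the most recent identical response no older than
    max_cache_age seconds, or None if the request must be run."""
    if max_cache_age <= 0:
        return None  # assumption: 0 means cached data is not acceptable
    candidates = [
        c for c in cache
        if c.request_key == request_key and now - c.completed_at <= max_cache_age
    ]
    return max(candidates, key=lambda c: c.completed_at, default=None)


if __name__ == "__main__":
    cache = [
        CachedResponse("table-request", 100.0, "older"),
        CachedResponse("table-request", 160.0, "newer"),
    ]
    hit = find_cached(cache, "table-request", now=200.0, max_cache_age=60)
    print(hit.body if hit else "cache miss")
```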


    Screen Recording

    record_request: boolean

    By specifying record_request you can ask Gaffa to screen record your automation and return a video in the response allowing you to view the magic happening or to debug your automation.

    Recording requests comes at an additional cost.


    Max Media Bandwidth

    max_media_bandwidth: integer

    If you are using Gaffa on a site with lots of images and videos and more interested in the text data on the page, you can cap how much data a page loads in MB using the max_media_bandwidth setting. This makes your automation faster and prevents spending credits on data you aren't interested in. With the max_media_bandwidth value set, Gaffa monitors data being downloaded by the page and when downloaded data exceeds the given number of MB, all further downloads of images or video will be cancelled. max_media_bandwidth defaults to null meaning downloads are not capped. Setting a value of 0 will cause no images to load which can work on some sites but on others this could lead to the site thinking you are using an ad blocker.
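The capping behaviour can be modelled with a short simulation. This is a sketch under several assumptions, not Gaffa's implementation: `apply_media_cap` is a hypothetical name, 1 MB is taken as 1,000,000 bytes, every download is counted towards the total, and media is cancelled once the total has reached the cap (which makes a cap of 0 block all images, as described above).

```python
from typing import List, Optional, Tuple

MB = 1_000_000  # assumption: decimal megabytes
MEDIA_KINDS = {"image", "video"}


def apply_media_cap(resources: List[Tuple[str, int]],
                    max_media_bandwidth: Optional[int]) -> List[bool]:
    """For each (kind, size_in_bytes) resource, in load order, return
    True if it is downloaded and False if it is cancelled."""
    downloaded = 0
    decisions = []
    for kind, size in resources:
        capped = (
            max_media_bandwidth is not None
            and kind in MEDIA_KINDS
            and downloaded >= max_media_bandwidth * MB
        )
        if capped:
            decisions.append(False)  # image/video cancelled, nothing downloaded
        else:
            downloaded += size
            decisions.append(True)
    return decisions


if __name__ == "__main__":
    page = [("document", 200_000), ("image", 600_000), ("image", 600_000),
            ("video", 5_000_000), ("script", 100_000)]
    print(apply_media_cap(page, 1))
```

With a 1 MB cap the two images load before the cap is reached, the video is cancelled, and the script still loads because only images and video are affected.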


    Time Limit

    time_limit: integer

    Using the setting time_limit caps the maximum running time of the request in milliseconds. If this time expires all incomplete actions will be cancelled and the request will return an error. This cap has to be less than the maximum request running time dictated by your plan and if not set, will default to this value.


    Actions

    We currently support a range of action types, which you can read more about here.


    Stealth

    We believe your AI Agents should be able to use the internet exactly how humans would. Gaffa can help you get access to sites with some of the most challenging anti-bot restrictions using a combination of proxies, human-like behavior, captcha solving and a custom browser implementation. We handle and maintain all of that so you can focus on building your solution!


    Examples

    We've created a number of sample browser requests you can read about here or you can jump straight into the API Playground to start running them right now.


    API Endpoints

    Check out our API reference for more details about the endpoints available, particularly those you can use to query for past requests by id or status.

    "actions": [
          {
                "type": "type",
                "selector": "#postform-text",
                "text": "Hello world!"
          }
    ]
    "actions": [
          {
             "type": "type",
             "selector": "form input[name=\"email\"]",
             "text": "test@test.com",
             "timeout": 10000
          }
    ]
    "actions": [
        {
          "type": "parse_table",
          "selector": ".large_table",
          "timeout": 1000
        }
    ]
    "actions": [
        {
            "type": "generate_markdown"
        }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "max_media_bandwidth": null,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }

    Actions

    Response

    The exported full height screenshot of the page showing all items.

    Gaffa's full height screenshot
    Actions

    Response

    Here's a video showing Gaffa filling out the page and waiting for the success modal.

    Read More

    Read more about screen recording here.


    time

    integer

    The time in milliseconds that the browser should wait.

    selector

    string

    The selector that defines the page element that the browser should wait to appear.

    timeout

    integer

    The maximum amount of time the browser should wait for the provided selector to appear. Default: 5,000 (5s)

    See universal parameters.

    Usage

    Wait for a particular amount of time

    The following code will wait 1 second and then continue with the next action, if provided.

    Wait for a particular element to appear

    The following code will wait for a table to appear on the page for a maximum of 5 seconds. If the table has not appeared after 5 seconds the next action will be executed, if provided.

    {
      "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "div[role=\"dialog\"]",
            "timeout": 10000
          },
          {
            "type": "click",
            "selector": "[data-testid=\"accept-all-button\"]"
          },
          {
            "type": "wait",
            "selector": "[data-testid^=\"product-1\"]",
            "timeout": 5000
          },
          {
            "type": "scroll",
            "percentage": 100
          },
          {
            "type": "capture_screenshot",
            "size": "fullscreen"
          }
        ]
      }
    }
    {
      "url": "https://demo.gaffa.dev/simulate/form?loadTime=3&showModal=false&modalDelay=0&formType=address&firstName=John&lastName=Doe&address1=123%20Main%20Street&city=London&country=UK",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": true,
        "actions": [
          {
            "type": "type",
            "selector": "#email",
            "text": "johndoe@example.com"
          },
          {
            "type": "type",
            "selector": "#state",
            "text": "CA"
          },
          {
            "type": "type",
            "selector": "#zipCode",
            "text": "12345"
          },
          {
            "type": "click",
            "selector": "button[type='submit']"
          },
          {
            "type": "wait",
            "selector": "[role=\"dialog\"] h2:has-text(\"Success!\")",
            "timeout": 10000
          }
        ]
      }
    }
    "actions": [
          {
            "type": "wait",
            "time": 1000
          }
    ]
    "actions": [
          {
            "type": "wait",
            "selector": "table",
            "timeout": 5000,
            "continueOnFail": true
          }
    ]
    {
      "url": "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=infinite",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": true,
        "actions": [
          {
            "type": "wait",
            "selector": "div[role=\"dialog\"]",
            "timeout": 10000
          },
          {
            "type": "click",
            "selector": "[data-testid=\"accept-all-button\"]"
          },
          {
            "type": "wait",
            "selector": "[data-testid^=\"product-1\"]",
            "timeout": 5000
          },
          {
            "type": "scroll",
            "percentage": 100,
            "max_scroll_time": 20000
          }
        ]
      }
    }

    Export Web Page to PDF

    An example request that uses Gaffa to convert an HTML page to a PDF. There are lots of HTML-to-PDF APIs, but Gaffa handles the conversion easily and does much more besides.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa's print-to-PDF feature lets you export web pages as PDF files. Unlike the standard "Print to PDF" in your local browser, Gaffa's feature waits for specific items to load, uses proxies, and scales with your product's growth. Enhance your customer experience and streamline your PDF export process.

    API Request

    The request below uses the POST endpoint to open the demo site on the table page, wait for the table to load and then print the webpage to a PDF in size A4 with a margin of 20 and using the portrait orientation.

    Actions

    Read the full documentation for these actions here.

    Response

    Here's an example of the PDF returned by the request after waiting for the table to load.

    Attachment: GaffaPrintPdfExample.pdf (PDF, 51KB)
    {
      "url": "https://demo.gaffa.dev/simulate/table?loadTime=3&rowCount=20",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "table"
          },
          {
            "type": "print",
            "size": "A4",
            "margin": 20,
            "orientation": "portrait"
          }
        ]
      }
    }

    Parse JSON

    Paid Action: This action will consume credits based on the amount of content being parsed, see more below.

    Beta Feature: This feature is currently in beta and restricted to approved users. If you are interested in trying it, please contact support and we can enable this feature for your account.

    Type: parse_json

    The parse_json action extracts data from web pages and online PDFs. It uses AI to parse web content from text into a pre-defined data schema and return it as a JSON object.

    The action allows you to convert unstructured content such as academic papers, forms, and webpages into JSON objects, which you can use in automations, analysis, or further processing.

    This feature currently works for online PDFs and web page text.

    Parameters

    Name
    Type
    Required
    Description

    See universal parameters.

    Defining Data Schemas

    A data schema tells the model exactly what JSON structure to produce.

    You can define schemas in two ways:

    • Inline schemas (defined directly inside the action)

    • Reusable schemas (created via the Schema API and referenced by ID in your requests)

    Schema Structure

    A schema has:

    Property
    Type
    Description

    Each field in the fields array has:

    Supported Field Types

    Type
    Description

    Inline Schema Example

    This example shows:

    • Simple fields (string, datetime) for basic data

    • Object fields for grouped related data with nested fields

    • Array fields for lists of items with nested fields defining each item's structure
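
    As a rough illustration of the three field kinds listed above, the sketch below builds a parse_json action as a Python dictionary and prints it as JSON. The schema name and field names (JobPosting, salary, skills and so on) are made up for this example and are not part of Gaffa's documentation; only the overall shape (type, name, description, nested fields) follows the schema structure described on this page.

```python
import json

# Hypothetical schema for illustration only: the names below are assumptions.
action = {
    "type": "parse_json",
    "data_schema": {
        "name": "JobPosting",
        "description": "Extract key details from a job posting page",
        "fields": [
            # Simple fields hold a single value.
            {"type": "string", "name": "title", "description": "Job title"},
            {"type": "datetime", "name": "posted_at", "description": "Date the job was posted"},
            # An object field groups related values under nested fields.
            {
                "type": "object",
                "name": "salary",
                "description": "Salary range in GBP",
                "fields": [
                    {"type": "decimal", "name": "min", "description": "Minimum salary"},
                    {"type": "decimal", "name": "max", "description": "Maximum salary"},
                ],
            },
            # An array field describes the structure of each list item.
            {
                "type": "array",
                "name": "skills",
                "description": "Required skills",
                "fields": [
                    {"type": "string", "name": "skill", "description": "Individual skill name"},
                ],
            },
        ],
    },
    "model": "gpt-4o-mini",
    "output_type": "inline",
}


def field_types(schema: dict) -> dict:
    """Map each top-level field name to its declared type."""
    return {field["name"]: field["type"] for field in schema["fields"]}


print(json.dumps(action, indent=2))
```

    The printed JSON can be placed in the actions array of a request in the same way as the other actions on this page.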

    Schema Operations

    Instead of defining schemas inline every time, you can save them to your Gaffa account and reuse them across multiple requests. This makes your actions more readable and easier to maintain, and ensures consistency when parsing similar content.

    Creating a Saved Schema

    Use the POST /v1/schemas endpoint to create a reusable schema:

    Response:

    Save the id returned in the response. You'll use it to reference the schema in your requests.
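
    Once a schema has been saved, a parse_json action can reference it through data_schema_id instead of carrying the whole data_schema inline. The sketch below is a minimal example under stated assumptions: build_parse_request is a hypothetical helper, the request body keeps only the url and settings.actions keys seen in the example requests on this page, and schema_abc123xyz is the placeholder ID from the sample response.

```python
import json


def build_parse_request(page_url: str, schema_id: str) -> dict:
    """Build a request body whose parse_json action references a saved schema by ID."""
    return {
        "url": page_url,
        "settings": {
            "actions": [
                {
                    "type": "parse_json",
                    # Reference the saved schema instead of sending data_schema inline.
                    "data_schema_id": schema_id,
                    "output_type": "inline",
                }
            ]
        },
    }


request_body = build_parse_request(
    "https://demo.gaffa.dev/simulate/ecommerce?loadTime=3&showModal=true&modalDelay=0&itemCount=20",
    "schema_abc123xyz",  # placeholder ID; use the id returned when you created the schema
)
print(json.dumps(request_body, indent=2))
```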

    Managing Schemas

    List all schemas:

    Allows you to view all schemas saved to your account:

    Endpoint: GET /v1/schemas

    Update a schema:

    Allows you to modify an existing schema by its ID:

    Endpoint: PUT /v1/schemas

    Delete a schema:

    Removes a schema from your account:

    Endpoint: DELETE /v1/schemas/:id
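
    The schema operations above can also be prepared from Python using only the standard library. This is a sketch rather than an official client: schema_request is a hypothetical helper, the base URL and X-API-Key header are taken from the curl examples on this page, and the update and delete URLs follow the /v1/schemas/{id} shape shown in those examples. The code only constructs the requests; nothing is sent until you pass one to urllib.request.urlopen.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://api.gaffa.dev/v1/schemas"  # taken from the curl examples


def schema_request(
    method: str,
    api_key: str,
    schema_id: Optional[str] = None,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a request against the schemas endpoint."""
    url = BASE_URL if schema_id is None else f"{BASE_URL}/{schema_id}"
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        # JSON bodies are only needed when creating or updating a schema.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


list_req = schema_request("GET", "YOUR_API_KEY")
update_req = schema_request(
    "PUT",
    "YOUR_API_KEY",
    schema_id="schema_abc123xyz",
    body={"name": "ProductInfo", "description": "Updated description", "fields": []},
)
delete_req = schema_request("DELETE", "YOUR_API_KEY", schema_id="schema_abc123xyz")

for req in (list_req, update_req, delete_req):
    print(req.get_method(), req.full_url)
```

    To execute a prepared request, call urllib.request.urlopen(list_req) with a real API key and read the JSON response body.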

    Common Schema Patterns

    Simple List Extraction

    Nested Objects

    Pricing

    The number of credits this action uses depends on the model. The currently supported models and their pricing are:

    Model
    Input Token Cost
    Output Token Cost
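
    To make the arithmetic concrete, the sketch below estimates the credit cost of a single parse using the gpt-4o-mini rates given on this page (1 credit per 10,000 input tokens and 4 credits per 10,000 output tokens). The estimate_credits function and the RATES table are illustrative names, and how partial credits are rounded for billing is not stated here, so the function returns the unrounded figure.

```python
# Credits charged per 10,000 tokens, as listed for gpt-4o-mini on this page.
RATES = {"gpt-4o-mini": {"input": 1, "output": 4}}


def estimate_credits(input_tokens: int, output_tokens: int, model: str = "gpt-4o-mini") -> float:
    """Return the unrounded credit estimate for one parse_json call."""
    rate = RATES[model]
    weighted_tokens = input_tokens * rate["input"] + output_tokens * rate["output"]
    return weighted_tokens / 10_000


# 50,000 input tokens cost 5 credits and 2,000 output tokens cost 0.8 credits.
print(estimate_credits(50_000, 2_000))  # 5.8
```

    Lowering input_token_cap or max_pages reduces the input side of this estimate.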

    parse_json parameters

    data_schema_id (string): The ID of a data schema you have defined that you want to transform the content into. You must provide either a data_schema or a data_schema_id with your request.

    data_schema (json): A JSON object describing the data schema you want to transform the content into. You must provide either a data_schema or a data_schema_id with your request.

    instruction (string): A custom instruction, in addition to any detail you have added to the data schema, that you want to include with this particular parse.

    model (string): The AI model you wish to use to parse the content into JSON. Default: gpt-4o-mini. Accepted: ["gpt-4o-mini"]

    input_token_cap (int): The maximum number of source input tokens that will be passed to the AI model to parse. This can be used to prevent unnecessary credit usage. If your source input is longer than the token cap, it will be truncated. Default: 1,000,000

    selector (string): The selector that defines an element whose content you want to parse. This is useful if you are only interested in the contents of a certain element.

    output_type (string): Whether the action output should be saved to a file, with a URL returned, or the parsed JSON object should be included directly in the response. Default: file. Accepted: ["file", "inline"]

    max_pages (int): If you are parsing a PDF, you can specify this parameter to limit the number of pages that are passed to the LLM. Default: no limit

    Schema properties

    name (string): Identifies the schema and should clearly indicate what data it extracts. Example: "ProductInfo", "ArticleMetadata", "ContactForm"

    description (string): Explains what data the schema extracts and provides context to help the AI model understand the extraction goal. Example: "Extract product details from this e-commerce product page"

    fields (array): Each field defines a piece of data to extract from the content. See field properties below.

    Field properties

    name (string): Use clear, descriptive names that follow your preferred naming convention (e.g., snake_case or camelCase). Example: "product_name", "published_date", "author_email"

    type (string): Determines how the AI interprets and structures the extracted data. Must be one of the supported types below.

    description (string): Include details about format, handling of missing values, or special cases. Example: "Maximum salary in GBP. If only one value is provided, use the same value for both min and max. Return null if not provided."

    fields (array): Required only for object and array types.

    Supported field types

    array: List of items

    boolean: True/False

    datetime: Timestamp

    decimal: Precise decimal

    double: Floating-point number

    integer: Whole number

    object: Nested structured object

    string: Text value

    Pricing

    gpt-4o-mini: 1 credit per 10,000 input tokens; 4 credits per 10,000 output tokens

    Schema API endpoints

    POST /v1/schemas
    GET /v1/schemas
    PUT /v1/schemas
    DELETE /v1/schemas/:id


    {
      "type": "parse_json",
      "data_schema": {
        "name": "ArticleMetadata",
        "description": "Extract metadata from an article",
        "fields": [
          {
            "type": "string",
            "name": "title",
            "description": "Article title"
          },
          {
            "type": "string",
            "name": "author",
            "description": "Author name"
          },
          {
            "type": "datetime",
            "name": "published",
            "description": "Publication date"
          }
        ]
      },
      "model": "gpt-4o-mini",
      "output_type": "inline"
    }
    curl -L \
      --request POST \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "name": "ProductInfo",
        "description": "Extract product details from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "boolean",
            "name": "in_stock",
            "description": "Product availability"
          },
          {
            "type": "object",
            "name": "ratings",
            "description": "Product rating information",
            "fields": [
              {
                "type": "double",
                "name": "average",
                "description": "Average rating score"
              },
              {
                "type": "integer",
                "name": "total_reviews",
                "description": "Number of reviews"
              }
            ]
          },
          {
            "type": "array",
            "name": "tags",
            "description": "Product tags",
            "fields": [
              {
                "type": "string",
                "name": "tag",
                "description": "Individual tag name"
              }
            ]
          }
        ]
      }'
    {
      "id": "schema_abc123xyz",
      "name": "ProductInfo",
      "description": "Extract product details from e-commerce pages",
      "fields": [...]
    }
    curl -L \
      --url 'https://api.gaffa.dev/v1/schemas' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    curl -L \
      --request PUT \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Content-Type: application/json' \
      --data '{
        "id": "schema_abc123xyz",
        "name": "ProductInfo",
        "description": "Extract detailed product information from e-commerce pages",
        "fields": [
          {
            "type": "string",
            "name": "product_name",
            "description": "The product title"
          },
          {
            "type": "decimal",
            "name": "price",
            "description": "Current price"
          },
          {
            "type": "string",
            "name": "brand",
            "description": "Product brand name"
          }
        ]
      }'
    curl -L \
      --request DELETE \
      --url 'https://api.gaffa.dev/v1/schemas/{id}' \
      --header 'X-API-Key: YOUR_API_KEY' \
      --header 'Accept: */*'
    {
      "name": "TagList",
      "description": "Extract article tags",
      "fields": [
        {
          "type": "array",
          "name": "tags",
          "description": "List of article tags",
          "fields": [
            {
              "type": "string",
              "name": "tag",
              "description": "Individual tag name"
            }
          ]
        }
      ]
    }
    {
      "name": "ProductWithReviews",
      "description": "Product details with nested review data",
      "fields": [
        {
          "type": "string",
          "name": "product_name",
          "description": "Product name"
        },
        {
          "type": "object",
          "name": "pricing",
          "description": "Pricing information",
          "fields": [
            {
              "type": "decimal",
              "name": "current_price",
              "description": "Current price"
            },
            {
              "type": "decimal",
              "name": "original_price",
              "description": "Original price before discount"
            },
            {
              "type": "integer",
              "name": "discount_percentage",
              "description": "Discount percentage"
            }
          ]
        }
      ]
    }
    Gaffa scrolling to the bottom of a simulated ecommerce page!
    Gaffa can help automatically fill out your forms!

    Convert Web Page to Markdown

    An example request that uses Gaffa to convert a web page to Markdown. This could be used to export web page reports or to print the content of a page in a readable format.

    The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the Gaffa API Playground.

    Gaffa converts web pages to clean markdown, stripping away styling, scripts, and images. This optimizes content for LLM applications by reducing token usage while preserving essential information.

    API Request

    The request below uses the POST endpoint to open the demo site on the article simulator, wait for the article to load, and then generate Markdown from the page's content, which you can download for use in your program.

    Actions

    Response

    Here's an example of the Markdown file returned by the request after waiting for the article to load.

    Attachment: GaffaMarkdownExample.md (5KB)
    {
      "url": "https://demo.gaffa.dev/simulate/article?loadTime=3&paragraphs=10&images=3",
      "proxy_location": null,
      "async": false,
      "max_cache_age": 0,
      "settings": {
        "record_request": false,
        "actions": [
          {
            "type": "wait",
            "selector": "article"
          },
          {
            "type": "generate_markdown"
          }
        ]
      }
    }