Type: print
Request that the browser prints the page to a PDF.
size
string
The size of paper the page should be printed to.
Default: A4
Accepted: ["A4"]
margin
integer
The margin of the page in pixels when the page is printed to PDF. Default: 20
orientation
string
The orientation of the page when it is printed to PDF.
Default: portrait
Accepted: ["portrait", "landscape"]
continue_on_fail
boolean
Should execution of further actions continue or throw an error if this action fails. Default: true
See universal parameters.
The following JSON prints the page to a PDF in landscape with margins of 20px.
"actions": [
{
"type": "print",
"page_size": "A4",
"orientation": "landscape",
"margin": 20
}
]
Type: generate_markdown
The markdown output format exports the data of the page (an article, table, etc.) in a human- and LLM-readable format, removing unnecessary styling data and other "junk" that is only relevant for the site to work properly.
Gaffa exports GitHub flavoured markdown with comments removed and unknown tags ignored.
See universal parameters.
The following converts the current page to markdown:
"actions": [
{
"type": "generate_markdown"
}
]
Type: capture_dom
This action captures and returns the raw DOM of the site, which you can then extract data from on your end.
For common AI scenarios you may find this returns too much data, so we have provided a generate_simplified_dom action which distills the DOM to only the important elements.
See universal parameters.
Capture the raw DOM of the current page
"actions": [
{
"type": "capture_dom"
}
]
Type: capture_cookies
Save a JSON object of cookies for the current page.
See universal parameters.
The following captures the cookies of the current page:
"actions": [
{
"type": "capture_cookies"
}
]
Type: capture_element
Returns the innerHTML, essentially the contents, of a particular element on the page. This can be used when you are only interested in the contents of a particular element.
selector
string
The selector that defines the element whose contents you want to capture.
timeout
integer
The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)
See universal parameters.
The following code will wait up to 1 second for the .page_contents element to appear and return an HTML file containing the element's innerHTML.
"actions": [
{
"type": "capture_element",
"selector": ".page_contents",
"timeout": 1000
}
]
Type: capture_snapshot
This output type returns an HTML file which captures a static version of the page state. The page will load offline and can be saved to your local machine.
This will:
Load and embed all images on the page.
Embed all CSS files.
JavaScript is currently disabled, so interactivity might not work as expected, but this feature is useful for preserving the page state as it was and viewing it offline.
See universal parameters.
The following captures a static, offline snapshot of the current page:
"actions": [
{
"type": "capture_snapshot"
}
]
Type: capture_screenshot
Takes a screenshot of the current page. You can choose to take a fullscreen screenshot showing the whole page or capture just the current view.
size
string
The area of the page to capture: view captures the section currently visible in the browser, fullscreen captures the whole page.
Default: view
Accepted: ["view", "fullscreen"]
See universal parameters.
The following captures the section of the page currently visible in the browser.
"actions": [
{
"type": "capture_screenshot",
"size": "view"
}
]
Type: download_file
Request a copy of the most recent file viewed in the browser. Currently this only works with PDF files.
timeout
integer
The maximum amount of time the browser should wait for a file to download. Default: 5,000 (5s)
See universal parameters.
The following waits up to 20s for a file to download and then returns it.
"actions": [
{
"type": "download_file",
"timeout": 20000
}
]
The service then responds with a URL for the downloaded file in the action output:
"actions": [
{
"id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
"type": "download_file",
"query": "download_file?continue_on_fail=false&timeout=20000",
"timestamp": "2025-05-30T15:02:06.6615306Z",
"output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf"
}
]
Type: click
Request that the browser clicks a particular element on the page.
selector
string
The selector that defines the page element that the browser should click on.
timeout
integer
The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)
See universal parameters.
The following code will click on the site logo.
"actions": [
{
"type": "click",
"selector": "a.header__logo"
}
]
The following code will wait a maximum of 5 seconds for the logo to appear and, because continue_on_fail is true, will continue with the list of actions even if the logo does not appear.
"actions": [
{
"type": "wait",
"selector": "a.header__logo",
"timeout": 5000,
"continue_on_fail": true
}
]
Type: generate_simplified_dom
When you're looking at the DOM of a web page, there's a lot of unnecessary data that can be discarded if you are only interested in the page's elements or want to pass the data to an LLM.
The generate_simplified_dom output format processes the HTML in the following way:
Removes all links in the head
Removes all script nodes and links to scripts
Removes all style nodes
Removes style attributes from all elements
Removes all links to stylesheets
Removes all noscript elements outside of the body
Finds all hrefs with query strings and removes the query strings
Keeps important meta tags and removes all others
Removes all alternate links
Removes all SVG paths
Removes empty text nodes and excessive spacing
See universal parameters.
The following JSON captures the DOM of the page and simplifies it.
"actions": [
{
"type": "generate_simplified_dom"
}
]
Type: parse_table
Finds a table on the page with a given selector and then converts the table data into a JSON object.
This action first finds the table headers and converts them into property names by converting them to lower case and replacing non-alphanumeric characters with underscores. It then processes each table row, extracting the contents of each cell and saving it as a value. At the moment, all values are strings.
selector
string
The selector that defines the table whose contents you want to parse.
timeout
integer
The maximum amount of time the browser should wait for the table defined by the selector to appear. Default: 5000 (5s)
See universal parameters.
The following code will wait up to 1 second for the .large_table element to appear and return a JSON file with the headers and rows converted.
"actions": [
{
"type": "parse_table",
"selector": ".large_table",
"timeout": 1000
}
]
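The header-to-property-name conversion described for parse_table can be sketched in a few lines. This is a literal reading of the description, not Gaffa's code: it assumes "alphanumeric" means ASCII letters and digits, that every other character (including each space) becomes its own underscore, and that rows come out as a list of objects keyed by those names. The real output shape and edge cases (duplicate or empty headers) may differ, and the helper names are hypothetical.

```python
import re


def header_to_property(header: str) -> str:
    """Lower-case a header and replace each non-alphanumeric character with "_".

    Assumption: "alphanumeric" means ASCII a-z and 0-9.
    """
    return re.sub(r"[^a-z0-9]", "_", header.lower())


def table_to_rows(headers: list[str], rows: list[list[str]]) -> list[dict[str, str]]:
    """Pair each row's cell text with the normalized header names.

    Values are kept as strings, matching the note that all values are strings.
    The list-of-objects output shape is an assumption.
    """
    keys = [header_to_property(h) for h in headers]
    return [dict(zip(keys, cells)) for cells in rows]


if __name__ == "__main__":
    print(table_to_rows(["Name", "Price (USD)"], [["Widget", "9.99"]]))
```

Note that a header such as "Price (USD)" becomes price__usd_ under this literal reading, because the space and both brackets are each replaced; if you depend on exact property names, check them against a real parse_table output.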
When making a Browser Request you can specify a list of actions you wish for us to carry out on the requested web page. These actions conform to the following format:
{
"type": "", //the type of the action
//other params follow as key value pairs
"key": value //string, number etc.
}
All actions have the following parameters:
type
string
The type name of the action.
continue_on_fail
boolean
Should execution of further actions continue or throw an error if this action fails.
Default: false
customId
string
A customId to help you find the action in the response.
Default: null
Actions are carried out in the order they are submitted. Every action type has a continue_on_fail parameter which defaults to false; this means that if any action fails, execution of the browser request ends and an error is returned. Setting continue_on_fail to true ensures that all actions are carried out regardless of previous action results, and an error will not be returned.
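The ordering and failure rules above can be modelled as a short loop. This is a toy sketch, not Gaffa's implementation: each action is a dict in the format shown earlier, and run_action is a hypothetical stand-in for the browser that returns None on success or an error message on failure.

```python
from typing import Callable, Optional


def run_actions(
    actions: list[dict],
    run_action: Callable[[dict], Optional[str]],
) -> tuple[list[dict], Optional[str]]:
    """Toy model of executing a browser request's actions in order.

    run_action is hypothetical: it returns None on success or an error message.
    Returns (per-action results, request-level error or None).
    """
    results = []
    for action in actions:
        error = run_action(action)
        results.append({"type": action["type"], "error": error})
        # continue_on_fail defaults to false for every action type.
        if error is not None and not action.get("continue_on_fail", False):
            # A failing action without continue_on_fail ends the whole request.
            return results, error
    return results, None


if __name__ == "__main__":
    def fake_browser(action: dict) -> Optional[str]:
        # Pretend only the "#missing" selector cannot be found.
        return "element not found" if action.get("selector") == "#missing" else None

    actions = [
        {"type": "wait", "selector": "#missing", "timeout": 5000, "continue_on_fail": True},
        {"type": "capture_screenshot"},
    ]
    print(run_actions(actions, fake_browser))
```

With continue_on_fail set to true, the failed wait is still recorded in its own result, but the screenshot runs and no request-level error is returned; remove that flag and the request stops after the first action.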
As shown above, you can submit a customId with each action you submit to the API. We'll include this Id in the outputs from the browser request so you can find a certain action's output and/or status easily in the response.
When a browser request has completed, information on each action's execution is returned in the following format:
{
"id": "", //a unique id given to the action by Gaffa
"type": "capture_screenshot", //the type of the action
"query": "", //a representation of the action in querystring format
"timestamp": "", //the UTC timestamp the action was executed
"output": "", //if the action has an output you will find a URL for it here
"error": "" //if the action fails the error message will be returned here
}
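Once a request completes, the per-action entries in this format can be checked programmatically. The following is a minimal sketch. It assumes the response's actions array has already been parsed into Python dicts with the fields shown above (id, output, error). The sample reuses the download_file response from this page. How you fetch the response is out of scope, and summarize_actions is a hypothetical helper name.

```python
import json


def summarize_actions(response_actions: list[dict]) -> tuple[dict[str, str], dict[str, str]]:
    """Split completed actions into outputs and errors, keyed by action id."""
    outputs: dict[str, str] = {}
    errors: dict[str, str] = {}
    for action in response_actions:
        if action.get("error"):
            errors[action["id"]] = action["error"]
        elif action.get("output"):
            # For file-producing actions the output is a URL to the stored file.
            outputs[action["id"]] = action["output"]
    return outputs, errors


sample = json.loads("""
[
  {
    "id": "act_VHhrUbXjZSaYCPTqbBYD4acCzzeFGH",
    "type": "download_file",
    "query": "download_file?continue_on_fail=false&timeout=20000",
    "timestamp": "2025-05-30T15:02:06.6615306Z",
    "output": "https://storage.gaffa.dev/brq/downloads/5845df07-3749-424e-9c64-9602be19a857.pdf"
  }
]
""")

if __name__ == "__main__":
    print(summarize_actions(sample))
```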
The Gaffa API supports the following actions, detailed below.
click
Click on a given element
scroll
Scroll to a particular point on the page or, in the case of pages with infinite scrolling, scroll until a given time has elapsed.
type
Type the provided text into a given element
wait
Wait for a given time to elapse or an element to appear on the page before proceeding to the next action.
capture_cookies
Save a JSON object of cookies for the current page
capture_dom
Export the raw DOM page data
capture_screenshot
Capture a screenshot of the web page
capture_snapshot
Create a completely static version of the web page which can be accessed offline
download_file
Download an online file using Gaffa
generate_markdown
Convert the page into markdown
generate_simplified_dom
Generate a simplified version of the DOM
parse_json
Parse online data to a defined JSON schema
print
Print the web page to a PDF
Type: wait
Request that the browser waits a given amount of time or for a particular element to appear on the page.
time
integer
The time in milliseconds that the browser should wait.
selector
string
The selector that defines the page element that the browser should wait for.
timeout
integer
The maximum amount of time the browser should wait for the provided selector to appear. Default: 5,000 (5s)
See universal parameters.
The following code will wait 1 second and then continue with the next action, if provided.
"actions": [
{
"type": "wait",
"time": 1000
}
]
The following code will wait for a table to appear on the page for a maximum of 5 seconds. If the table has not appeared after 5 seconds, the next action will be executed, if provided.
"actions": [
{
"type": "wait",
"selector": "table",
"timeout": 5000,
"continue_on_fail": true
}
]
Type: type
Request that the browser types a particular piece of text into a field.
selector
string
The selector that defines the page element that the browser should type into.
text
string
The text the browser should enter into the text field.
timeout
integer
The maximum amount of time the browser should wait for the element that needs to be typed in to appear. Default: 5000 (5s)
See universal parameters.
The following action will type into a particular text field.
"actions": [
{
"type": "type",
"selector": "#postform-text",
"text": "Hello world!"
}
]
The following code will wait a maximum of 10 seconds for the email input to appear on the page and then type in the provided email.
"actions": [
{
"type": "type",
"selector": "form input[name='email']",
"text": "test@test.com",
"timeout": 10000
}
]
Type: parse_json
Use AI to parse web content from text into a pre-defined data schema and return it as a JSON object.
This feature currently works for online PDFs and web page text.
data_schema_id
string
The id of the data schema you have defined that you want to transform the content into.
You must provide a data_schema or data_schema_id with your request.
data_schema
json
A JSON object describing the data schema you want to transform the content into.
You must provide a data_schema or data_schema_id with your request.
instruction
string
A custom instruction, in addition to any detail you have added to the data schema, that you want to include with this particular parse.
model
string
The AI model you wish to use to parse the content into JSON.
Default: gpt-4o-mini
Accepted: ["gpt-4o-mini"]
input_token_cap
int
The maximum number of source input tokens that will be passed to the AI model to parse. This can be used to prevent unnecessary credit usage. If your source input is longer than the token cap, it will be abbreviated. Default: 1,000,000
selector
string
The selector that defines an element you want to parse the content of. This is useful if you are only interested in the contents of a certain element.
output_type
string
Whether the action output should be saved to a file, with a URL to that file returned, or the parsed JSON object should be included directly in the action output.
Default: file
Accepted: ["file", "inline"]
See universal parameters.
The credits this action uses depend on the model used. Here are the currently supported models and their pricing:
gpt-4o-mini
1 credit per 10,000 input tokens
4 credits per 10,000 output tokens
Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact us and we can enable this feature for your account.
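To estimate the cost of a parse_json action under the gpt-4o-mini pricing above, the per-10,000-token rates can be applied directly. This is a minimal sketch that makes two assumptions, since the page does not say how partial units are charged. First, billing is exactly pro rata, with no rounding to whole credits. Second, at most input_token_cap input tokens are billed. The function name is hypothetical.

```python
def parse_json_credits(
    input_tokens: int,
    output_tokens: int,
    input_token_cap: int = 1_000_000,
) -> float:
    """Estimate credits for one parse_json action at gpt-4o-mini rates."""
    # Assumption: input beyond the cap is cut off, so only the capped amount is billed.
    billed_input = min(input_tokens, input_token_cap)
    # 1 credit per 10,000 input tokens plus 4 credits per 10,000 output tokens,
    # combined into a single division (assumption: no rounding to whole credits).
    return (billed_input * 1 + output_tokens * 4) / 10_000


if __name__ == "__main__":
    print(parse_json_credits(50_000, 2_000))
    print(parse_json_credits(3_000_000, 10_000))
```

Under these assumptions, 50,000 input tokens and 2,000 output tokens come to 5 + 0.8 = 5.8 credits. The default cap limits the input side to 1,000,000 / 10,000 = 100 credits per action, so lowering input_token_cap is the main lever on input cost.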
Type: scroll
Request that the browser scrolls to a certain point on the page or, in the case of pages with infinite scrolling, scrolls for a particular amount of time.
See universal parameters.
Gaffa gives you flexibility over how fast you scroll down the page, which can be really useful for getting around restrictions enforced by some sites that detect and limit fast scrolling. By experimenting with scroll_speed and interval, you will be able to create the right scrolling action for your scenario. The speed settings are as follows:
instant - the page will scroll smoothly to the desired position immediately; useful for sites with no rate limits or loading events caused by scroll actions.
medium - human-like scrolling at a normal speed to the desired position. Gaffa will scroll in much the same way as you would using a mouse.
slow - human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling whilst reading a page.
interval allows you to adjust the scroll speed further by inserting pauses between scroll events.
If wait_time is set to 0 and Gaffa arrives at the desired location, Gaffa will immediately mark the action as succeeded. However, if another value is set, the page will be monitored for that amount of time to check for further expansions. If, during this period, the page expands again, Gaffa will continue scrolling to the desired location and the wait will reset.
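The interaction between wait_time and max_scroll_time can be illustrated with a toy model. This is a sketch under stated assumptions, not Gaffa's implementation. It makes three simplifications. Scrolling is treated as instant, so scroll_speed and interval are ignored. Time advances in fixed ticks. A hypothetical page_height function stands in for the page's height a given number of milliseconds into the action.

```python
from typing import Callable


def simulate_infinite_scroll(
    page_height: Callable[[int], int],
    wait_time: int,
    max_scroll_time: int,
    tick: int = 100,
) -> tuple[int, str]:
    """Toy model of scrolling to the bottom of a page that may keep growing.

    Returns (elapsed milliseconds, how the action ended).
    """
    elapsed = 0
    last_height = page_height(0)
    quiet_for = 0  # time since the page last grew
    while elapsed < max_scroll_time:
        # Assumption: we are always already at the bottom (instant scrolling).
        if quiet_for >= wait_time:
            # No expansion during the whole wait window: mark as succeeded.
            return elapsed, "succeeded"
        elapsed += tick
        height = page_height(elapsed)
        if height > last_height:
            # The page expanded: keep scrolling and reset the wait.
            last_height = height
            quiet_for = 0
        else:
            quiet_for += tick
    # max_scroll_time reached: the action stops, but this is not a failure.
    return elapsed, "time limit reached"


if __name__ == "__main__":
    def grows_twice(t: int) -> int:
        # Hypothetical page that loads more content at 500 ms and 1,000 ms.
        return 1000 + (1000 if t >= 500 else 0) + (1000 if t >= 1000 else 0)

    print(simulate_infinite_scroll(grows_twice, wait_time=1000, max_scroll_time=25000))
```

In this model, a page that grows at 500 ms and 1,000 ms, with wait_time of 1000, is judged finished at 2,000 ms, one full quiet second after the last expansion. With max_scroll_time of 1500 it stops at 1,500 ms instead. With wait_time of 0 it succeeds as soon as it arrives.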
Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact us and we can enable this feature for your account.
percentage
integer
The percentage the page should scroll up or down (+/-). Range: -100 to 100. Default: 100 (scroll to the bottom)
wait_time
integer
After arriving at the desired scroll location, the time in milliseconds that Gaffa should monitor for changes to the page height before marking the action as succeeded. Read more above. Default: 0
max_scroll_time
integer
The maximum amount of time the page should be scrolled for, in milliseconds. After this time passes, the action will be cancelled. This doesn't cause the action to fail. Default: 20,000 (20s)
scroll_speed
string
The speed at which the page should scroll to the desired point. You can read more about this above.
Default: medium
Accepted: ["slow", "medium", "instant"]
interval
integer
The amount of time, in milliseconds, that scrolling should pause between scroll events. Read more about this above. Default: 0
The following code will scroll halfway down the page.
"actions": [
{
"type": "scroll",
"percentage": 50
}
]
The following code will scroll to the bottom of the page and then keep scrolling when new content loads, for a maximum of 25 seconds, waiting 1 second for new content and scrolling at a slow pace with 1 second between scroll actions.
"actions": [
{
"type": "scroll",
"percentage": 100,
"scroll_speed": "slow",
"max_scroll_time": 25000,
"interval": 1000,
"wait_time": 1000
}
]
Beta Feature: This feature is currently in beta and restricted to approved users. If you're interested in trying it, please contact us and we can enable this feature for your account.
Type: block_dom_removals
This action prevents the page from removing elements from the DOM. This is useful if you are trying to scrape data from a JavaScript-based web application that removes elements when they are out of view, which can make grabbing data difficult.
Using this action will block DOM removals for the rest of the browser request.
See universal parameters.
The following blocks DOM removals for the rest of the browser request:
"actions": [
{
"type": "block_dom_removals"
}
]