What is Gaffa?
Gaffa is a powerful API for browser automation which allows you to control real web browsers at scale through a simple interface with no configuration necessary. We'll handle the complexities of managing infrastructure like virtual machines, proxies and caching so you can focus on building powerful and reliable web automation and AI applications!
Gaffa is ready to power your web automations:
Simplicity - there's no need to learn another new framework, Gaffa is accessible through a simple REST API - just tell it what site you want to visit and what actions you want to perform and it will be carried out as soon as you send the request.
Real browsers - headless browsers are popular but we make it simple to control real cloud-hosted browsers at scale which render JavaScript sites exactly as they would on a local machine, are harder to detect when doing scraping and allow full observability. We're also planning to allow you to go beyond just being able to control web browsers!
Proxies - you can easily choose to route your traffic through a network of residential proxy IP addresses to help avoid bot-detection on sites you are trying to automate.
Scalable - whether you want to control a single cloud browser or hundreds in parallel, with Gaffa you can do that easily without a thought about infrastructure management.
Powerful data processing - once you've accessed your desired site you can export your data in a constantly growing number of formats. If you want to feed the data into a large language model or a vision model, we can help.
We'll be sporadically announcing updates and new features in our newsletter.
API Playground
Start experimenting with the Gaffa API right now.
Get Started
The simple steps to get you started using Gaffa in your apps.
API Reference
Explore the API and docs for the finer details
An introduction to the Gaffa Browser API. Learn how you can get started building fast, powerful web automations!
Browser requests are charged in terms of credits based on the following factors:
Request length: Billed at 1 credit per 30 seconds the request takes to run on the browser.
If screen recording is enabled, this is doubled to 2 credits per 30 seconds.
Proxy bandwidth usage: All requests that use a proxy_location parameter use our network of residential proxies and are billed at 1500 credits per 1GB of bandwidth used.
Each successful request will deduct the corresponding number of credits from your monthly allowance. Be sure to use as many of your monthly credits as you want as they don't roll over month to month.
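As a rough illustration of the billing rules above, here is a minimal Python sketch that estimates the credits for a single request. The function name and the rounding behaviour are assumptions: the text does not say how partial 30-second blocks are billed, so this sketch rounds up.

```python
import math

CREDITS_PER_30S = 1          # 1 credit per 30 seconds of browser time
RECORDING_MULTIPLIER = 2     # screen recording doubles the time charge
PROXY_CREDITS_PER_GB = 1500  # residential proxy bandwidth


def estimate_credits(duration_seconds, recorded=False, proxy_gb=0.0):
    """Estimate credits for one request (hypothetical helper, not part of the API)."""
    # Assumption: a partial 30-second block is billed as a full block.
    blocks = math.ceil(duration_seconds / 30)
    multiplier = RECORDING_MULTIPLIER if recorded else 1
    time_credits = blocks * CREDITS_PER_30S * multiplier
    bandwidth_credits = proxy_gb * PROXY_CREDITS_PER_GB
    return time_credits + bandwidth_credits


# A 60 second recorded request that used 0.5GB of proxy bandwidth.
print(estimate_credits(60, recorded=True, proxy_gb=0.5))
```

Treat the result as an estimate only; the credits actually deducted are reported by Gaffa.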
Welcome to the Gaffa documentation site! You'll find everything you need here to get started using the API, including the actions you can use to interact with our cloud browsers and the examples you can run right away in our API Playground.
Gaffa is currently in its very early stages, so we'd love to hear how we can improve our docs and API to make life easier for our users. If you have any questions or comments, please get in touch with us. To stay up to date with the latest developments, features and news on our mission to support the development of revolutionary AI Agents, sign up for our sporadic updates.
You can sign up to create a Gaffa account. After signing up you'll immediately be able to use our API Playground, which has a number of pre-built automations for simulating a range of scenarios.
In order to avoid scaling issues for our existing customers we are currently operating a queuing system for new accounts. Simply join the queue when prompted and we'll let you know when you have access. If you want to jump the queue, you can fill out a short survey to help us better understand our users and we'll approve your account sooner!
The easiest way to make your first Gaffa request is to start with our API Playground, where you can see several pre-made, interactive browser request examples of automations we've built against our test site, which simulates some common scraping and web automation scenarios. You can run these examples without a paid account and also edit them easily to experiment. Once you have a paid account you can also use the playground to build your automations for other sites.
Once you have a paid account and are ready to start building your own browser requests, you'll want to read the rest of this documentation to see what else you can use in your solution.
View our current pricing plans on the Gaffa
Print to PDF
Export a web page to PDF and wait for elements to load with the Gaffa API.
Convert to Markdown
Export a web page to markdown format - useful for feeding into LLM apps.
Infinitely Scroll
Scroll to the bottom of a page that infinitely loads items and record the interaction.
Capture Screenshot
Interact with a page and capture a screenshot of the whole page.
Form Completion
Fill out a form in a human-like way and record the interaction
Subscriptions and Credits
You can now buy "pay as you go" credits to be used without a subscription, or to complement the credits in your subscription for larger one-off jobs.
Actions
Proxies
Actions
Settings
Stealth
We've developed some new browser technology which makes Gaffa's browser look even more like human-initiated website traffic.
Type: click
Request that the browser clicks a particular element on the page.
selector
string
timeout
integer
The maximum amount of time the browser should wait for the element defined by the selector to appear. Default: 5000 (5s)
The following code will wait 1 second and then continue with the next action, if provided.
The following code will wait for the logo to appear for a maximum of 5 seconds and it will continue with the list of actions
Type: generate_markdown
The markdown output format can export the data of the page (an article, table etc.) in a human and LLM readable format which removes unnecessary styling data and other "junk" that is only relevant for the site to work properly.
The following converts the current page to markdown:
Type: generate_simplified_dom
When you're looking at the DOM of a web page, there's a lot of unnecessary data that can be discarded if you are only interested in the page's elements or looking to export the data into an LLM.
The generate_simplified_dom output format processes the HTML in the following way:
Removes all links in the head
Removes all script nodes and links to scripts
Removes all style nodes
Removes style attributes from all elements
Removes all links to stylesheets
Removes all noscript elements outside of the body
Finds all hrefs with query strings and removes the query strings
Keeps important meta tags and removes all others
Removes all alternate links
Removes all SVG paths
Removes empty text nodes and excessive spacing
The following JSON captures the DOM of the page and simplifies it.
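A minimal Python sketch of what such an action might look like, alongside an illustration of one of the processing rules listed above (removing query strings from hrefs). The `strip_query` helper is hypothetical and only approximates the behaviour described; it is not Gaffa's implementation, and whether Gaffa keeps URL fragments is an assumption.

```python
import json
from urllib.parse import urlsplit, urlunsplit

# The action itself only needs its type name.
simplified_dom_action = {"type": "generate_simplified_dom"}
print(json.dumps(simplified_dom_action))


def strip_query(href):
    """Illustrative only: drop the query string from an href, keeping the rest."""
    parts = urlsplit(href)
    # Assumption: the fragment (#...) is preserved.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


print(strip_query("https://example.com/products?utm_source=mail"))
```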
Type: print
Request that the browser prints the page to a PDF.
size
string
The size of paper the page should be printed to.
Default: A4
Accepted: ["A4"]
margin
integer
The margin of the page in pixels when the page is printed to PDF. Default: 20
orientation
string
The orientation of the page when it is printed to PDF.
Default: portrait
Accepted: ["portrait", "landscape"]
continue_on_fail
boolean
Should execution of further actions continue or throw an error if this action fails. Default: true
The following JSON prints the page to a PDF in landscape with margins of 20px.
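As a sketch, the action described above could be built in Python and serialised to JSON like this. The parameter names come from the table above; how the action is embedded in a full request is not shown here.

```python
import json

# Print the page to an A4 PDF in landscape with a 20px margin.
print_action = {
    "type": "print",
    "size": "A4",
    "margin": 20,
    "orientation": "landscape",
}

print(json.dumps(print_action, indent=2))
```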
Type: type
Request that the browser type a particular bit of text into a field.
selector
string
text
string
The text the browser should enter into the text field.
timeout
integer
The maximum amount of time the browser should wait for the element that needs to be typed in to appear. Default: 5000 (5s)
The following action will type into a particular text field.
The following code will wait a maximum of 10 seconds for the email input field to appear and then type in the provided email.
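A minimal sketch of such an action as a Python dictionary serialised to JSON. The selector `#email` and the address are hypothetical placeholders; substitute values that match the page you are automating.

```python
import json

# Wait up to 10 seconds (10000 ms) for the field, then type the text.
type_action = {
    "type": "type",
    "selector": "#email",        # hypothetical selector for the email input
    "text": "user@example.com",  # placeholder email address
    "timeout": 10000,
}

print(json.dumps(type_action, indent=2))
```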
An example request that uses Gaffa to convert an HTML page to a PDF. There are lots of HTML to PDF APIs but Gaffa handles it easily, as well as doing much more.
Gaffa's print to PDF feature allows you to export web pages as PDF files easily. Unlike the standard "Print to PDF" in your local browser, Gaffa's feature waits for specific items to load, uses proxies, and scales with your product's growth. Enhance your customer experience and streamline your PDF export process.
Read the full documentation for these actions here.
Here's an example of the PDF returned by the request after waiting for the table to load.
The following endpoint allows you to query a browser request for your account by ID.
We use API Keys for authenticating requests to our API. In this document we'll explain how you can manage and use the keys for your account.
If you are worried you have exposed your Gaffa API key or just want to periodically rotate your keys you can create another key and then delete your old keys. Deleted keys will immediately stop working for new requests to the API but past browser requests made using old keys will still be available.
Our API is secured with a custom header, X-API-Key, whose value should be any current API key in your account. That's all you need to add to your request!
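To illustrate, here is a minimal Python sketch that attaches the X-API-Key header to a request using only the standard library. The URL below is a placeholder and not the real Gaffa endpoint, and the request is built but deliberately not sent.

```python
import json
import urllib.request

API_URL = "https://api.example.invalid/browser/requests"  # placeholder, not the real endpoint
API_KEY = "YOUR_API_KEY"  # any current key from your account


def build_request(body):
    """Build (but do not send) a POST request carrying the API key header."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(API_URL, data=data, method="POST")
    req.add_header("X-API-Key", API_KEY)
    req.add_header("Content-Type", "application/json")
    return req


req = build_request({"url": "https://example.com"})
# urllib normalises stored header names, hence the lowercase "api-key" here.
print(req.get_header("X-api-key"))
```

Sending the request would then be a call to `urllib.request.urlopen(req)` once the real endpoint URL from the API reference is filled in.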
Type: capture_screenshot
Takes a screenshot of the current page. You can choose to take a full screen screenshot showing the whole page or just the current view.
The following captures the section of the page currently visible in the browser.
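A minimal sketch of this action as a Python dictionary, assuming the `size` parameter takes the values `view` (the default) and `fullscreen` as listed in the parameter table for this action.

```python
import json

# "view" captures only what is currently visible;
# "fullscreen" would capture the whole page instead.
screenshot_action = {"type": "capture_screenshot", "size": "view"}

print(json.dumps(screenshot_action))
```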
An example screenshot in fullscreen mode.
Type: capture_snapshot
This output type will return an HTML file which captures a static version of the page state. The page will load offline and can be saved to your local machine.
This will:
Load and embed all images on the page.
Embed all CSS files.
Currently, JavaScript will be disabled and interactivity might not work as expected, but this feature should be useful for preserving the page state as it was and allowing you to view it offline.
The following captures a static snapshot of the current page.
Here's an example that shows an offline snapshot of a site
In the following pages you can view all the pre-built requests we've built to show what is possible with the Gaffa web automation API.
The following endpoint creates a browser request and either runs it synchronously or returns immediately with an ID so you can check its status later using this endpoint.
The following endpoint allows you to query for multiple browser requests, either by status or by a list of particular IDs. Submitting a request with neither of these will return all requests for your account.
Keep up to date with Gaffa's changes. To be the first to hear about changes, subscribe to our newsletter.
We've adjusted the plans and credits slightly, take a look at the .
We've edited the , and actions to better support finding elements to interact with or wait for inside iframes with no extra configuration necessary.
We have added support for another European location,
action no longer removes classes.
default timeout
now 5 seconds
removed timeout and added new functionality using wait_time, max_scroll_time, scroll_speed and interval
default timeout now 5 seconds
default timeout now 5 seconds
caps media downloads to prevent excess data usage.
added to cap the duration of requests.
For common AI scenarios you may find this returns too much data so we have provided a action which distills the DOM to only the important elements.
The selector that defines the page element that the browser should click on.
Gaffa exports with comments removed and unknown tags ignored.
The selector that defines the page element that the browser should click on.
The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the API Playground.
The request below opens the demo site on the table page, waits for the table to load and then prints the webpage to a PDF in size A4 with a margin of 20 and using the portrait orientation.
Once your account is approved, you will need to create an API key to send your requests to our API. Go to your account and create a new key with a name. Once the key is created, copy the value and you will immediately be free to start using it to make requests.
You can start using these in the API Playground once you've created an account.
size
string
The size of the screenshot: just the current view or the full height of the page.
Default: view
Accepted: ["view", "fullscreen"]
Type: wait
Request that the browser waits a given amount of time or for a particular item to appear on the page.
time
integer
The time in milliseconds that the browser should wait.
selector
string
timeout
integer
The maximum amount of time the browser should wait for the provided selector to appear. Default: 5,000 (5s)
The following code will wait 1 second and then continue with the next action, if provided.
The following code will wait for a table to appear on the page for a maximum of 5 seconds. If the table has not appeared after 5 seconds the next action will be executed, if provided.
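The two cases above could be expressed as follows. This is a sketch using Python dictionaries; the selector `table` is a hypothetical example and should be replaced with one that matches your page.

```python
import json

# Wait a fixed 1 second (1000 ms), then move on to the next action.
wait_fixed = {"type": "wait", "time": 1000}

# Wait up to 5 seconds (5000 ms) for a table element to appear.
wait_for_table = {
    "type": "wait",
    "selector": "table",  # hypothetical selector
    "timeout": 5000,
}

print(json.dumps([wait_fixed, wait_for_table], indent=2))
```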
An example request that uses Gaffa to dismiss a modal, scroll to the bottom of a page and then capture a full height screenshot.
Gaffa can also capture screenshots at any point during your interaction for use in your app or just to work out exactly what was being shown at a given point in time. You can capture just what is shown as if you were looking at the screen or the full height of the page.
The exported full height screenshot of the page, showing all items.
An example request that uses Gaffa to convert a web page to markdown. This could be used to export web page reports or to print the content of a page in a readable format.
Gaffa converts web pages to clean markdown, stripping away styling, scripts, and images. This optimizes content for LLM applications by reducing token usage while preserving essential information.
Here's an example of the markdown returned by the request after waiting for the article to load.
All actions have the following parameters:
Actions are carried out in the order they are submitted. Every action type has a continue_on_fail parameter which defaults to false. This means that if any action fails, the execution of the browser request ends and an error will be returned. Setting continue_on_fail to true ensures that all actions are carried out regardless of previous action results, and an error will not be returned.
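The ordering and continue_on_fail rules can be sketched as a small local simulation. This is not Gaffa's implementation: `run_actions` and the `execute` callback are hypothetical names used only to show how a failing action either stops the run or is skipped over.

```python
def run_actions(actions, execute):
    """Simulate in-order execution; `execute(action)` returns True on success."""
    results = []
    for action in actions:
        ok = execute(action)
        results.append((action["type"], ok))
        # continue_on_fail defaults to false: a failure ends the request with an error.
        if not ok and not action.get("continue_on_fail", False):
            return {"error": True, "results": results}
    return {"error": False, "results": results}


def fail_clicks(action):
    """Pretend every click fails and everything else succeeds."""
    return action["type"] != "click"


strict = [{"type": "click"}, {"type": "print"}]
lenient = [{"type": "click", "continue_on_fail": True}, {"type": "print"}]

print(run_actions(strict, fail_clicks))   # stops after the failed click
print(run_actions(lenient, fail_clicks))  # carries on to the print action
```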
As shown above, you can submit a customId with each action you submit to the API. We'll include this Id in the outputs from the browser request so you can find a certain action's output and/or status easily in the response.
When a browser request has completed, information on an action's execution
The Gaffa API supports the following actions detailed below. Click the "read more" buttons to read more information about each type.
Type: scroll
Request that the browser scrolls to a certain point on the page or, in the case of pages with infinite scrolling, scrolls for a particular amount of time.
Gaffa gives you flexibility over how fast you scroll down the page, which can be really useful for getting around restrictions enforced by some sites which detect and limit fast scrolling. By experimenting with scroll_speed and interval you will be able to create the perfect scrolling action for your scenario. The speed settings are as follows:
instant
- the page will smoothly scroll to the desired position immediately, useful for sites with no rate limits or loading events caused by scroll actions.
medium
- human-like scrolling at a normal speed to the desired position. Gaffa will scroll in much the same way as you would using a mouse.
slow
- human-like scrolling at a very slow speed to the desired position. The speed is comparable to scrolling whilst reading a page.
interval allows you to adjust the scroll speed further by inserting pauses between scroll events.
If wait_time is set to 0 and Gaffa arrives at the desired location then Gaffa will immediately mark the action as succeeded. However, if another value is set then the page will be monitored for the desired amount of time to check for further expansions. If, during this period, the page expands again then Gaffa will continue scrolling to the desired location and the wait will reset.
The following code will scroll half way down the page.
The following code will scroll to the bottom of the page and then keep scrolling when new content loads for a maximum of 25 seconds, waiting 1 second for new content and scrolling at a slow pace with 1 second between scroll actions.
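Both scroll scenarios above could look like this as Python dictionaries. This is a sketch: max_scroll_time and interval are documented in milliseconds, while treating wait_time as milliseconds is an assumption.

```python
import json

# Scroll half way down the page.
scroll_half = {"type": "scroll", "percentage": 50}

# Scroll to the bottom and keep scrolling as new content loads, for at most
# 25 seconds, slowly, pausing 1 second between scroll events.
scroll_infinite = {
    "type": "scroll",
    "percentage": 100,
    "max_scroll_time": 25000,
    "wait_time": 1000,  # assumed to be milliseconds
    "scroll_speed": "slow",
    "interval": 1000,
}

print(json.dumps([scroll_half, scroll_infinite], indent=2))
```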
An example request that uses Gaffa to infinitely scroll down a simulated ecommerce site whilst recording the interaction.
Gaffa automates infinite scrolling on dynamic pages like e-commerce storefronts. Set a duration, and Gaffa will capture all content as it scrolls. Each session can be recorded as a video for playback, letting you debug or review the interaction.
Here's a video showing Gaffa scrolling the page for 20 seconds as more items load.
Read more about screen recording here.
An example request that uses Gaffa to automate the completion of a form and waits for a success modal to appear.
Filling forms is tedious. Gaffa can fill out a form in a human-like manner so you can spend your time doing much more interesting things.
Here's a video showing Gaffa filling out the page and waiting for the success modal.
Read more about screen recording here.
The selector that defines the page element that the browser should wait to appear.
The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the API Playground.
The request below opens the demo site on the ecommerce page with 20 items, waits for and dismisses the dialog, scrolls to the bottom of the page and captures a full height screenshot.
The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the API Playground.
The request below opens the demo site on the article simulator, waits for the article to load and then generates markdown from the page's content, which you can download for use in your program.
When making a browser request you can specify a list of actions you wish us to carry out on the requested web page. These actions conform to the following format:
The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the API Playground.
The request below opens the demo site on the ecommerce site simulator with an infinitely scrolling storefront. It will wait for and dismiss a dialog box, wait for a product to load and then scroll down the page for a maximum of 20 seconds. If new items load, it will keep scrolling.
The following example is a request we've pre-built to show you Gaffa's capabilities against our demo site. You can run this request right now in the API Playground.
The request below opens the demo site on the form simulator page with some sections pre-filled (for speed). After typing in the required information and clicking submit, Gaffa waits for the success dialog to show before returning a video of the interaction.
type
string
The type name of the action.
continue_on_fail
boolean
Should execution of further actions continue or throw an error if this action fails.
Default: false
customId
string
A customId to help you find the action in the response.
Default: null
click
Click on a given element
scroll
Scroll to a particular point on the page or, in the case of pages with infinite scrolling, scroll until a given time has elapsed.
type
Type the provided text into a given element
wait
Wait for a given time to elapse or an element to appear on page before proceeding to the next action.
capture_dom
Export the raw DOM page data
capture_screenshot
Capture a screenshot of the web page
generate_markdown
Convert the page into markdown
generate_simplified_dom
Generate a simplified version of the DOM
capture_snapshot
Create a completely static version of the web page which can be accessed offline
print
Print the web page to a PDF
percentage
integer
The percentage the page should scroll up or down (+/-). Range: -100 to 100. Default: 100 (scroll to bottom)
wait_time
integer
After arriving at the desired scroll location, this is the time Gaffa should monitor for changes to the page height before marking the action as succeeded. Default: 0
max_scroll_time
integer
The maximum amount of time the page should be scrolled for, in milliseconds. After this time passes, the action will be cancelled. This doesn't cause the action to fail. Default: 20,000 (20s)
scroll_speed
string
The speed at which the page should scroll to the desired point.
Default: medium
Accepted: [slow, medium, instant]
interval
integer
The amount of time, in milliseconds, that scrolling should pause between scroll events. Default: 0
Making web automation requests has never been so simple.
Browser Requests are our first main product and allow you to send the Gaffa API a URL and a list of actions you want to be carried out, including any outputs you want from the page. We'll carry out the request on our cloud browsers and return you the response with no need to worry about proxies, IP rotation, web automation frameworks and scaling.
There's absolutely zero configuration needed and you can interact with Gaffa from any program that can send web requests. We think it's by far the simplest way to automate simple web tasks and the good news is, we're just getting started and have much more planned.
Gaffa makes proxying your traffic through a global network of residential proxies super simple. Setting proxy_location in your request will allow you to utilize one of our partner third-party proxy services to gain local access to a site.
Not setting a proxy_location will mean the request does not use a proxy server and will use a generic datacenter IP.
United States
us
Ireland
ie
Singapore
sg
France
fr
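As a sketch, a client could check the location code against the table above before adding it to a request body. The `with_proxy` helper is hypothetical, and the placement of `proxy_location` at the top level of the request body is an assumption.

```python
# Location codes taken from the table above.
PROXY_LOCATIONS = {
    "us": "United States",
    "ie": "Ireland",
    "sg": "Singapore",
    "fr": "France",
}


def with_proxy(body, location=None):
    """Return a copy of the request body with proxy_location set.

    None means no proxy is used.
    """
    if location is not None and location not in PROXY_LOCATIONS:
        raise ValueError(f"Unsupported proxy location: {location!r}")
    return {**body, "proxy_location": location}


print(with_proxy({"url": "https://example.com"}, "ie"))
```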
Currently all our IP addresses are residential IP addresses which are procured through reputable third parties.
IP rotation is an essential part of any web data, scraping or automation task. In Gaffa, each browser request is treated as unique. We regularly rotate the IP addresses used so you should assume that each request will be carried out from a different IP address from the last.
Whilst we'll do our best to provide access to as wide a range of sites as possible we may have to restrict access to certain sites to prevent abuse of our service or of other services. Our proxy partners may also enforce restrictions on certain sites and categories of sites which we don't have any control over.
When we were building Gaffa we noticed that a lot of pre-existing scraping tools don't allow users to easily share their scraped web data with each other, despite many users requesting the same web pages on the same sites. Not only is this a waste of a user's allowance, it also puts a burden on the site owners who are serving the same data to different users for the same purpose. Because of this in Gaffa we have created a service-wide cache.
When making a browser request you can provide a MaxCacheAge parameter, which is a number of seconds greater than or equal to 0. This value denotes the maximum age of data you would accept from the API.
If another user of our service has requested the same URL with exactly the same parameters and actions as you in this chosen timeframe then the response will be returned to you immediately and the request will not be carried out on one of our browsers. If there are multiple identical requests in the given timeframe then the most recent will be returned.
This will save you time waiting for the response, as well as credits, because requests returned from the cache don't use any bandwidth.
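The cache rule described above can be sketched as a small local simulation. This is not Gaffa's implementation: the `find_cached` function, the `key` field standing in for "same URL, parameters and actions", and the timestamps are all hypothetical.

```python
def find_cached(history, request_key, max_cache_age, now):
    """Return the most recent matching entry no older than max_cache_age seconds."""
    if max_cache_age <= 0:
        return None  # 0 means the cache is never used
    candidates = [
        entry for entry in history
        if entry["key"] == request_key
        and now - entry["completed_at"] <= max_cache_age
    ]
    # If several identical requests qualify, the most recent one wins.
    return max(candidates, key=lambda entry: entry["completed_at"], default=None)


history = [
    {"key": "req-a", "completed_at": 100, "response": "old"},
    {"key": "req-a", "completed_at": 180, "response": "new"},
    {"key": "req-b", "completed_at": 190, "response": "other"},
]

print(find_cached(history, "req-a", max_cache_age=60, now=200))
```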
By specifying record_request you can ask Gaffa to screen record your automation and return a video in the response, allowing you to view the magic happening or to debug your automation.
If you are using Gaffa on a site with lots of images and videos and are more interested in the text data on the page, you can cap how much data a page loads in MB using the max_media_bandwidth setting. This makes your automation faster and prevents spending credits on data you aren't interested in.
With the max_media_bandwidth value set, Gaffa monitors data being downloaded by the page and, when downloaded data exceeds the given number of MB, all further downloads of images or video will be cancelled.
max_media_bandwidth defaults to null, meaning downloads are not capped.
Using the time_limit setting caps the maximum running time of the request in milliseconds. If this time expires, all incomplete actions will be cancelled and the request will return an error. This cap has to be less than the maximum request running time dictated by your plan and, if not set, will default to this value.
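The time_limit rule can be sketched as a small client-side check. The `resolve_time_limit` helper and the example plan maximum are hypothetical. The text says the cap must be less than the plan maximum, so this sketch rejects only values above it and assumes that a value equal to the maximum is acceptable, since that is also the default.

```python
def resolve_time_limit(time_limit, plan_max_ms):
    """Return the effective time limit in milliseconds for a request."""
    if time_limit is None:
        return plan_max_ms  # not set: defaults to the plan's maximum
    if time_limit > plan_max_ms:
        raise ValueError("time_limit exceeds the maximum allowed by the plan")
    return time_limit


PLAN_MAX_MS = 120_000  # hypothetical plan maximum of 2 minutes

print(resolve_time_limit(None, PLAN_MAX_MS))
print(resolve_time_limit(30_000, PLAN_MAX_MS))
```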
We believe your AI Agents should be able to use the internet exactly how humans would. Gaffa can help you get access to sites with some of the most challenging anti-bot restrictions using a combination of proxies, human-like behavior, captcha solving and a custom browser implementation. We handle and maintain all of that so you can focus on building your solution!
Running a new browser request is as simple as sending the following request. Below, you can see the url and a list of actions which instruct Gaffa to wait for a table to load and print the page to PDF.
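A minimal sketch of such a request body, built in Python and serialised to JSON. The URL and the `table` selector are placeholders, and the top-level key names `url` and `actions` are assumptions based on the description above; check the API reference for the exact schema.

```python
import json

browser_request = {
    "url": "https://example.com/table",  # placeholder for the page to visit
    "actions": [
        # Wait up to 5 seconds for the table to appear.
        {"type": "wait", "selector": "table", "timeout": 5000},
        # Then print the page to an A4 PDF in portrait with a 20px margin.
        {"type": "print", "size": "A4", "margin": 20, "orientation": "portrait"},
    ],
}

payload = json.dumps(browser_request, indent=2)
print(payload)
```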
You can read more about this particular example and how you can run it right now in our API Playground.
In order to access public sites and use proxy servers you'll need to sign up for a paid plan, but after that you'll be able to build automations for any site you wish.
Recording requests comes at an additional cost in credits.
We currently support ten different types of actions.
We've created a number of sample browser requests you can read about, or you can jump straight into the API Playground to start running them right now.
Check out our API reference for more details about the endpoints available.
This endpoint retrieves a browser request by its ID.
The unique identifier of the browser request to retrieve.
The unique identifiers of the browser request to retrieve.
This endpoint loads the required URL in our browser and then performs the selected actions.
The location of the proxy server that your request will be routed through, null means no proxy is used
null
The url you want our browsers to visit on your behalf
Whether the request should be processed asynchronously. Synchronous requests can be a maximum of 60 seconds long.
true
The maximum age of a cached result in seconds. 0 means the cache will never be used
0
This endpoint retrieves browser requests in bulk by id or status.
The unique identifiers of the browser requests to retrieve.
{"value":"brq_V2P6PqrZpycFtbc7mtXE4tsNbeg2N6,brq_V2P6X38RDRMRyYcNJ82qPSH5eFfQRD"}
The statuses of the browser requests to filter by. Valid values: pending, running, completed, failed
{"value":"completed,running"}
Items to return per page (default: 30).
{"value":20}
Page number of the pagination (default: 1).
{"value":1}