Generate Simplified DOM
Type: generate_simplified_dom
The DOM of a web page contains a lot of data that can be discarded if you are only interested in the page's elements or want to export the data into an LLM.
The generate_simplified_dom output format processes the HTML in the following way:
Removes all links in the head
Removes all script nodes and links to scripts
Removes all style nodes
Removes style attributes from all elements
Removes all links to stylesheets
Removes all noscript elements outside of the body
Finds all hrefs with query strings and removes the query strings
Keeps important meta tags and removes all others
Removes all alternate links
Removes all SVG paths
Removes empty text nodes and excessive spacing
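To make a few of these steps concrete, here is a minimal Python sketch that approximates some of them using only the standard library: dropping script, style, and noscript nodes, removing style attributes and stylesheet links, stripping query strings from hrefs, and collapsing whitespace. It is not the service's implementation, and it does not cover every rule in the list (for example, it drops noscript everywhere and leaves meta tags alone). The names DomSimplifier, strip_query, and simplify are hypothetical and chosen only for this illustration.

```python
import re
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit

# Assumption: these elements are dropped together with their contents.
DROPPED_ELEMENTS = {"script", "style", "noscript"}


def strip_query(url: str) -> str:
    """Return the URL without its query string, keeping any fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.fragment))


class DomSimplifier(HTMLParser):
    """Re-emits HTML while applying a subset of the simplification rules."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        rel = (dict(attrs).get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel:
            return  # drop links to stylesheets
        kept = []
        for name, value in attrs:
            if name == "style":
                continue  # drop inline styles
            if name == "href" and value:
                value = strip_query(value)
            kept.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: emit the start tag only.
        if tag not in DROPPED_ELEMENTS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROPPED_ELEMENTS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse excessive spacing
        if text.strip():  # skip empty text nodes
            self.out.append(escape(text, quote=False))


def simplify(html: str) -> str:
    parser = DomSimplifier()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<html><head><script src="a.js"></script><style>p{color:red}</style>'
    '<link rel="stylesheet" href="s.css"></head>'
    '<body><p style="color:red">Hi   there</p>'
    '<a href="/page?utm=1#top">link</a></body></html>'
)
print(simplify(sample))
```

A streaming parser keeps the sketch short, but rules that depend on document position (such as treating the head differently from the body) would be easier to express against a real DOM tree.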
Parameters
See universal parameters.
Usage
The following JSON captures the DOM of the page and simplifies it.
"actions": [
  {
    "type": "generate_simplified_dom"
  }
]
Example Output