Data Extraction Actions

Actions for reading data from page elements and for managing the cached results.

getText

Get the text content of an element.

Value: string (selector)

{ "getText": "h1.title" }

Returns: string — the element's textContent

Options

| Option | Type | Default | Description |
|---|---|---|---|
| save | string | | Cache result with this key for later use with ${key} |
| push | string | | Push result into an array in the data cache |
| waitFor | boolean | true | Wait for element |
| timeout | number | 10000 | Max wait time |
| iframe | string \| string[] | | Target inside iframe |

{
  "getText": ".product-price",
  "options": { "save": "price" }
}
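
The remaining options can be combined in the same way. The sketch below is illustrative only: the selectors, the frame selector #reviews-frame, and the key reviewTitles are hypothetical. It assumes that iframe accepts a selector identifying the frame, that push appends each result to the array stored under the given key, and that pushed arrays can be read back with getSavedData; none of this is confirmed beyond the short descriptions in the options table.

```json
{
  "actions": [
    {
      "getText": ".review:nth-child(1) .title",
      "options": { "push": "reviewTitles", "iframe": "#reviews-frame", "timeout": 5000 }
    },
    {
      "getText": ".review:nth-child(2) .title",
      "options": { "push": "reviewTitles", "iframe": "#reviews-frame", "timeout": 5000 }
    },
    { "getSavedData": ["reviewTitles"] }
  ]
}
```

Under those assumptions, push suits values that accumulate over several actions, while save suits a single value that a later action looks up by key.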

getHTML

Get the HTML content of an element.

Value: string (selector)

{ "getHTML": ".content" }

Returns: string — the element's innerHTML (or outerHTML if outer: true)

Options

| Option | Type | Default | Description |
|---|---|---|---|
| outer | boolean | false | Return outerHTML instead of innerHTML |
| save | string | | Cache the result |
| push | string | | Push into array |

{
  "getHTML": "#article",
  "options": { "outer": true, "save": "articleHTML" }
}

getAttribute

Get an attribute value from an element.

Value: [selector, attribute]

{ "getAttribute": ["a.main-link", "href"] }

Returns: string — the attribute value

{
  "getAttribute": ["img.avatar", "src"],
  "options": { "save": "avatarUrl" }
}

getValue

Get the current value of a form input, textarea, or select element.

Value: string (selector)

{ "getValue": "input[name=email]" }

Returns: string — the element's .value

{
  "getValue": "select#country",
  "options": { "save": "selectedCountry" }
}

extractAll

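Extract structured data from every element that matches a container selector. Each match produces one object whose fields are defined by fieldMap, which makes this action well suited to scraping lists, tables, and grids.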

Value: string (container selector)

{
  "extractAll": ".product-card",
  "options": {
    "fieldMap": {
      "title": "h3",
      "price": ".price",
      "link": ["a", "href"],
      "image": { "selector": "img", "attribute": "src" }
    }
  }
}

Returns: array of objects

[
  { "title": "Product A", "price": "$19.99", "link": "/products/a", "image": "/img/a.jpg" },
  { "title": "Product B", "price": "$29.99", "link": "/products/b", "image": "/img/b.jpg" }
]

Field Map Syntax

The fieldMap maps field names to extraction rules:

| Format | Description | Example |
|---|---|---|
| "selector" | Get textContent of first match | "title": "h3" |
| ["selector", "attr"] | Get attribute of first match | "link": ["a", "href"] |
| { selector, attribute } | Explicit object form | "img": { "selector": "img", "attribute": "src" } |

Options

| Option | Type | Default | Description |
|---|---|---|---|
| fieldMap | object | | Map of field names to selectors |
| limit | number | 0 (unlimited) | Max number of items to extract |
| save | string | | Cache the result array |
| push | string | | Push result into array |
| waitFor | boolean | true | Wait for container elements |
| timeout | number | 10000 | Max wait time |

{
  "extractAll": "table tbody tr",
  "options": {
    "fieldMap": {
      "name": "td:nth-child(1)",
      "email": "td:nth-child(2)",
      "role": "td:nth-child(3)"
    },
    "limit": 10,
    "save": "users"
  }
}

getSavedData

Retrieve previously cached data values.

Value: string[] (array of key names)

{ "getSavedData": ["price", "title", "users"] }

Returns: object — key-value map of cached data

{
  "price": "$19.99",
  "title": "Product A",
  "users": [...]
}

clearSavedData

Clear cached data. Pass true to clear the whole cache, or an array of keys to clear only those entries.

Value: true or string[] (specific keys to clear)

{ "clearSavedData": true }
{ "clearSavedData": ["price", "title"] }
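
To show how the cache actions fit together, here is a minimal sketch that saves two values, reads them back, and then clears one of them. The selectors are hypothetical, and the sequence assumes that actions run in order against a single shared cache. What getSavedData returns for a key that has been cleared is not documented here, so the final step requests only the key that remains.

```json
{
  "actions": [
    { "getText": ".product-price", "options": { "save": "price" } },
    { "getText": "h1.title", "options": { "save": "title" } },
    { "getSavedData": ["price", "title"] },
    { "clearSavedData": ["price"] },
    { "getSavedData": ["title"] }
  ]
}
```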

Example: Scrape a Product Listing

{
  "actions": [
    { "openNewTab": "https://example.com/products" },
    { "waitForElement": ".product-card" },
    {
      "extractAll": ".product-card",
      "options": {
        "fieldMap": {
          "name": "h3.product-name",
          "price": ".price",
          "rating": ".stars",
          "url": ["a", "href"],
          "image": ["img", "src"]
        },
        "limit": 20,
        "save": "products"
      }
    },
    { "getSavedData": ["products"] }
  ]
}

Example: Collect Data Across Pages

{
  "actions": [
    { "openNewTab": "https://example.com/page/1" },
    { "getText": "h1", "options": { "save": "pageTitle" } },
    { "getText": ".result-count", "options": { "save": "totalResults" } },
    { "getAttribute": ["a.next-page", "href"], "options": { "save": "nextPageUrl" } },
    { "getSavedData": ["pageTitle", "totalResults", "nextPageUrl"] }
  ]
}
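
The example above reads a single page and stores the link to the next one. A possible way to follow that link is sketched below. It rests on three assumptions that this page does not confirm: that ${key} substitution is applied to the openNewTab value, that the saved href is an absolute URL, and that push accepts a key naming the array. The key pageTitles is hypothetical.

```json
{
  "actions": [
    { "openNewTab": "https://example.com/page/1" },
    { "getText": "h1", "options": { "push": "pageTitles" } },
    { "getAttribute": ["a.next-page", "href"], "options": { "save": "nextPageUrl" } },
    { "openNewTab": "${nextPageUrl}" },
    { "getText": "h1", "options": { "push": "pageTitles" } },
    { "getSavedData": ["pageTitles"] }
  ]
}
```

If the assumptions hold, pageTitles would contain one heading per visited page, since push accumulates results where save would keep only one value per key.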