Apify

A RocketRide tool node that exposes Apify Actors (web scrapers and automation tasks) to an AI agent.

What it does

Gives an agent the ability to run Apify Actors and read their results. The agent can launch any Actor by ID or name, wait for it to finish, and receive the items it produced, or read items from an existing Apify dataset. Useful for agents that need to extract structured web data on demand.

Uses the official apify-client Python SDK. The client is created once at pipeline start from the configured API token; an empty token fails the node at startup.

Every Actor run is bounded by three configurable safety limits: item count (default 100), run timeout (default 120 seconds), and spend cap (default $1 USD). This prevents an agent-chosen Actor from hanging the pipeline or overspending. Agent-supplied limit values are clamped to the configured maximum.


Configuration

| Field | Type | Description |
|---|---|---|
| apikey | string | Default empty. Apify API token. |
| max_items | integer | Default 100. Upper cap on items returned per call (agent requests are clamped to this). |
| run_timeout_secs | integer | Default 120. Max seconds an Actor run may take before it is stopped. |
| max_cost_usd | number | Default 1. Spend limit per run for pay-per-event Actors. |

Invalid or missing values for max_items, run_timeout_secs, and max_cost_usd fall back to their defaults; integer limits are floored at 1.
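
The fallback rule above can be sketched in plain Python. This is a minimal illustration, not the node's actual code: the helper names (`normalize_limits`, `_as_int`, `_as_cost`) are hypothetical, and treating "invalid" as "cannot be converted to a number" is an assumption. The source does not say how a negative `max_cost_usd` is handled, so the sketch passes it through unchanged.

```python
import math

# Defaults as listed in the configuration table.
DEFAULTS = {"max_items": 100, "run_timeout_secs": 120, "max_cost_usd": 1.0}


def _as_int(value, default):
    """Return value as an int floored at 1, or the default if it is not convertible."""
    try:
        return max(1, int(value))
    except (TypeError, ValueError, OverflowError):
        return default


def _as_cost(value, default):
    """Return value as a finite float, or the default if it is not convertible."""
    try:
        cost = float(value)
    except (TypeError, ValueError, OverflowError):
        return default
    return cost if math.isfinite(cost) else default


def normalize_limits(raw):
    """Build the three effective limits from a raw config mapping (hypothetical helper)."""
    return {
        "max_items": _as_int(raw.get("max_items"), DEFAULTS["max_items"]),
        "run_timeout_secs": _as_int(raw.get("run_timeout_secs"), DEFAULTS["run_timeout_secs"]),
        "max_cost_usd": _as_cost(raw.get("max_cost_usd"), DEFAULTS["max_cost_usd"]),
    }
```

With this sketch, an empty config yields the documented defaults, `max_items: 0` becomes 1, and `run_timeout_secs: "abc"` falls back to 120.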


Available tools

Tools are registered under the apify prefix.

| Tool | Description |
|---|---|
| run_actor | Run an Apify Actor to completion and return the items it produced. |
| get_dataset_items | Read items from an existing Apify dataset. |

run_actor

Run an Apify Actor to completion and return the items it produced.

Returns { success, dataset_id, count, items } -- the run's default dataset ID and its items. If the run produces no dataset, returns success: true with an empty dataset_id and zero items.
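
To make the return shape concrete, here is a small sketch of how such a result could be assembled. The function name `build_run_result` is hypothetical, and representing the "empty dataset_id" as an empty string is an assumption; only the four field names come from the description above.

```python
def build_run_result(dataset_id, items):
    """Assemble a run_actor-style result (hypothetical helper, not the node's code)."""
    if not dataset_id:
        # Assumption: a run without a default dataset still counts as a success
        # and is reported with an empty-string ID and no items.
        return {"success": True, "dataset_id": "", "count": 0, "items": []}
    items = list(items)
    return {
        "success": True,
        "dataset_id": dataset_id,
        "count": len(items),
        "items": items,
    }
```

For example, `build_run_result("abc123", [{"url": "https://example.com"}])` reports a count of 1, while `build_run_result(None, [])` gives the empty-success shape.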

get_dataset_items

Read items from an existing Apify dataset.

| Parameter | Required | Description |
|---|---|---|
| dataset_id | yes | Apify dataset ID to read |
| limit | no | Max items to return (default 100, capped by max_items) |

Returns { success, count, items }.
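
The interaction between the agent's `limit` and the configured `max_items` can be illustrated as follows. This is a sketch under assumptions: `effective_limit` is a hypothetical name, and both the fallback to the default for a non-numeric request and the floor of 1 for agent-supplied values are guesses rather than documented behavior.

```python
def effective_limit(requested, max_items, default=100):
    """Clamp an agent-requested limit to the configured cap (hypothetical helper)."""
    try:
        wanted = default if requested is None else int(requested)
    except (TypeError, ValueError):
        # Assumption: an unparseable request is treated like a missing one.
        wanted = default
    # Never exceed the configured cap; the floor of 1 is an assumption.
    return max(1, min(wanted, max_items))
```

With `max_items` set to 50, a request that omits `limit` is served with at most 50 items, because the default of 100 is itself capped.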


Safety limits

run_actor passes the configured bounds directly to the Apify run:

  • Item cap -- the effective limit (agent request clamped to max_items) is sent as the run's max_items and also applied when reading the resulting dataset.
  • Timeout -- run_timeout_secs bounds both the Actor's run time and how long the node waits for the run to finish.
  • Cost cap -- max_cost_usd is sent as the run's max_total_charge_usd, limiting spend on pay-per-event Actors.
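
The mapping from node configuration to run options described in the list above might look roughly like this. It is a sketch only: `build_run_options` is a hypothetical helper, `max_items` and `max_total_charge_usd` are the option names given in the text, and `timeout_secs` and `wait_secs` are assumed names for the run-time and wait-time bounds that should be checked against the installed apify-client version.

```python
def build_run_options(requested_items, config):
    """Derive per-run options from the node config (hypothetical helper)."""
    # Item cap: the agent's request is clamped to the configured maximum.
    limit = min(requested_items, config["max_items"])
    return {
        "run": {
            "max_items": limit,
            # Assumed option names: one timeout bounds the Actor's run time,
            # the same value bounds how long the caller waits for it.
            "timeout_secs": config["run_timeout_secs"],
            "wait_secs": config["run_timeout_secs"],
            "max_total_charge_usd": config["max_cost_usd"],
        },
        # The same limit is applied again when reading the resulting dataset.
        "read_limit": limit,
    }
```

Keeping the clamped value in one place means the run and the subsequent dataset read cannot disagree about how many items are allowed.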

Authentication

Set apikey to an Apify API token (stored as a secure field). The token is used to construct the ApifyClient when the pipeline starts and is required -- configuration validation warns, and pipeline startup fails, if it is missing.

See the Apify documentation for obtaining a token and for per-Actor input schemas.
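
As an illustration of the startup behavior, the following sketch builds the client once and refuses an empty token. The function name `create_client` and the choice of `ValueError` are assumptions; the node's real error type and message are not stated here. The `apify_client` import is done lazily so the empty-token check runs before the SDK is needed.

```python
def create_client(apikey):
    """Create the Apify client at startup, failing fast on an empty token (sketch)."""
    token = (apikey or "").strip()
    if not token:
        # Assumption: the exception type is illustrative only.
        raise ValueError("Apify API token (apikey) is required")
    # Imported here so the validation above does not depend on the SDK being installed.
    from apify_client import ApifyClient
    return ApifyClient(token=token)
```

Calling `create_client("")` raises immediately, which mirrors the documented rule that a missing token fails the pipeline at startup rather than on the first tool call.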


Schema

| Field | Type | Description | Default |
|---|---|---|---|
| tool_apify.apikey | string | API Token. Apify API token. | "" |
| tool_apify.max_cost_usd | number | Max Cost (USD). Spend limit per run for pay-per-event Actors. | 1 |
| tool_apify.max_items | integer | Max Items. Upper cap on items returned per call (agent requests are clamped to this). | 100 |
| tool_apify.run_timeout_secs | integer | Run Timeout (seconds). Max seconds an Actor run may take before it is stopped. | 120 |

Dependencies

  • apify-client >=3,<4