Agent harness & long-running tasks

How Rank agents sustain long, autonomous work — the ReAct loop, stop conditions, stagnation control, context compaction, sub-agents and multi-agent orchestration.

An agent is not a single model call. When an agent has tools and is allowed to act, it runs inside a harness: a loop that lets it plan, take an action, observe the result, reflect, and keep going until the request is fully covered. The harness is what makes a request like “investigate every open port and cross-check the findings” complete on its own instead of stopping after the first tool call.

This page explains how that loop works and the mechanisms that keep it productive over long-running tasks — work that spans many iterations and would otherwise overflow the context window or wander in circles.

The ReAct loop

Every iteration follows the same shape: the model thinks, optionally calls one or more tools, observes the results, and decides the next action. The agent writes its answer as plain text — with no tool call — only when it has everything it needs.

agent_start
  plan                          (optional, for substantial missions)
  repeat each iteration:
    iteration_start
    thinking
    tool_call -> tool_result    (zero or more tools)
    observe results + record the step
    context-compaction check
    stop here if a stop condition is met
  final answer                  (streamed as `content`, when no tool is needed)
agent_finished -> complete

Before the loop, if the mission is substantial and the agent has tools, a short planning phase runs: the agent writes a brief, concrete plan (what a complete answer looks like, which tools to use and in what order, and how it will know it is done). The plan is then carried into every iteration as context.
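The outline above can be sketched as a small loop. This is a minimal illustration, not the harness's actual code: `ModelTurn`, `run_agent` and the scripted model are hypothetical names, and the planning phase, streaming and compaction are left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ModelTurn:
    """One model response: narration plus zero or more requested tool calls."""
    thinking: str
    tool_calls: list = field(default_factory=list)  # [(tool_name, kwargs), ...]
    content: str = ""  # final answer text, used only when no tool is called


def run_agent(model: Callable[[list], ModelTurn],
              tools: dict,
              max_iterations: int = 25) -> dict:
    scratchpad = []  # one recorded step per iteration
    for iteration in range(1, max_iterations + 1):
        turn = model(scratchpad)  # thinking
        if not turn.tool_calls:
            # Plain text with no tool call: this is the final answer.
            return {"stop_reason": "goal_reached",
                    "content": turn.content,
                    "iterations": iteration}
        # tool_call -> tool_result, then observe and record the step.
        results = [tools[name](**kwargs) for name, kwargs in turn.tool_calls]
        scratchpad.append({"thinking": turn.thinking,
                           "calls": turn.tool_calls,
                           "results": results})
    # Iteration cap hit; a real harness would run a synthesis pass here.
    return {"stop_reason": "max_iterations", "content": "",
            "iterations": max_iterations}


def scripted_model(scratchpad: list) -> ModelTurn:
    """Stand-in for a real model: scan once, then answer."""
    if not scratchpad:
        return ModelTurn("I need the port list", [("scan", {"host": "10.0.0.5"})])
    return ModelTurn("done", content=f"Open ports: {scratchpad[-1]['results'][0]}")


outcome = run_agent(scripted_model, {"scan": lambda host: "22, 443"})
```

Here the scripted model calls one tool on the first iteration and answers on the second, so the run ends with `goal_reached`.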

Stop conditions

The loop ends as soon as one of these conditions is true. The final stop_reason is reported in the agent_finished event:

| `stop_reason` | Meaning |
| --- | --- |
| `goal_reached` | The agent produced a final answer with no further tool calls. The normal, successful exit. |
| `max_iterations` | The iteration cap was hit (default 25 for a top-level agent). |
| `timeout` | The wall-clock budget elapsed (default 900s / 15 min for a top-level agent). |
| `budget` | The token budget was exhausted (roughly 80% of the model’s usable context). |
| `stagnation` | The agent stopped making progress (see below). |
| `cancelled` | The client cancelled the run. |

The values above are defaults; the actual limits are configured per agent and per model, so treat them as guardrails, not contracts. When the loop stops on max_iterations, timeout, budget or stagnation, the agent does not end abruptly: it runs a final synthesis pass that writes the best possible answer from everything gathered so far and is explicit about what is still uncertain.
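As a rough sketch, the stop conditions can be read as a single check run at the end of each iteration. The names `Limits`, `RunState` and `check_stop`, the 128,000-token context size, the order in which the conditions are tested, and the use of "greater than or equal" comparisons are all assumptions made for illustration; only the default values and the `stop_reason` strings come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Stop reasons that trigger the final synthesis pass, per the text above.
NEEDS_SYNTHESIS = {"max_iterations", "timeout", "budget", "stagnation"}


@dataclass
class Limits:
    max_iterations: int = 25       # top-level default
    timeout_s: float = 900.0       # top-level default (15 min)
    usable_context: int = 128_000  # hypothetical model context size
    stagnation_threshold: int = 3

    @property
    def token_budget(self) -> int:
        # Roughly 80% of the model's usable context.
        return self.usable_context * 80 // 100


@dataclass
class RunState:
    iteration: int = 0
    elapsed_s: float = 0.0
    tokens_used: int = 0
    stagnation_count: int = 0
    cancelled: bool = False
    answered_without_tools: bool = False


def check_stop(state: RunState, limits: Optional[Limits] = None) -> Optional[str]:
    """Return a stop_reason, or None if the loop should continue."""
    limits = limits or Limits()
    if state.cancelled:
        return "cancelled"
    if state.answered_without_tools:
        return "goal_reached"
    if state.iteration >= limits.max_iterations:
        return "max_iterations"
    if state.elapsed_s >= limits.timeout_s:
        return "timeout"
    if state.tokens_used >= limits.token_budget:
        return "budget"
    if state.stagnation_count >= limits.stagnation_threshold:
        return "stagnation"
    return None
```

A caller would run the synthesis pass whenever the returned reason is in `NEEDS_SYNTHESIS`, and skip it for `goal_reached` and `cancelled`.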

Staying on track

Two mechanisms stop an agent from spinning without making progress:

Stagnation detection. After each round of tool results, the harness checks whether anything new was learned (it fingerprints the outputs). If several iterations in a row produce no novel information, a stagnation counter rises; once it crosses the threshold (default 3), the loop stops with stop_reason: stagnation and synthesizes an answer.
Duplicate-call suppression. A tool invoked again with identical arguments is skipped rather than re-run; the agent sees a [SKIPPED] result (surfaced as a tool_result with skipped: true) and is pushed to try something different.
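Both mechanisms come down to remembering fingerprints of what has already been seen. The sketch below is one plausible way to do that, assuming SHA-256 over a canonical JSON encoding; the page does not say how the harness actually fingerprints outputs, and `ProgressTracker` is a hypothetical name.

```python
import hashlib
import json


class ProgressTracker:
    """Tracks novel tool output and repeated tool calls (illustrative only)."""

    def __init__(self, stagnation_threshold: int = 3):
        self.threshold = stagnation_threshold
        self.stagnation_count = 0
        self.seen_outputs = set()
        self.seen_calls = set()

    @staticmethod
    def _fingerprint(value) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
        canonical = json.dumps(value, sort_keys=True, default=str)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def should_skip(self, tool: str, args: dict) -> bool:
        """True if this exact call was already made and should be skipped."""
        key = self._fingerprint([tool, args])
        if key in self.seen_calls:
            return True
        self.seen_calls.add(key)
        return False

    def record_round(self, outputs: list) -> bool:
        """Record one round of tool results; True means stop for stagnation."""
        novel = {self._fingerprint(o) for o in outputs} - self.seen_outputs
        self.seen_outputs |= novel
        # "In a row": any novel output resets the counter (an assumption).
        self.stagnation_count = 0 if novel else self.stagnation_count + 1
        return self.stagnation_count >= self.threshold


tracker = ProgressTracker()
first_call_skipped = tracker.should_skip("nmap", {"host": "a", "ports": "1-100"})
repeat_call_skipped = tracker.should_skip("nmap", {"ports": "1-100", "host": "a"})
```

The second call differs only in argument order, so it is treated as a duplicate.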

In pentest agents, the harness also issues a nudge when the agent leaves obviously relevant tools unused, redirecting it to keep investigating before it gives up.

Long-running tasks: context compaction

The single most important mechanism for long work is context compaction. As the agent works, every step (its narration, the tools it called and their outputs) accumulates in a scratchpad. Left unchecked, that scratchpad would eventually exceed the model’s context window.

Instead, the harness manages the scratchpad continuously:

The most recent steps (default the last 4) are always kept in full, because the model needs them intact to decide the next action.
Older steps are trimmed in the working prompt. Once the scratchpad is estimated to reach ~70% of the token budget, they are summarized into a compact running memory and dropped from the step list; the summary preserves key facts, relevant tool outputs, decisions made and what still remains to be done.

Each time this happens the agent emits a context_compaction event (summarized_steps, summary_chars). The net effect: an agent can run for dozens of iterations on a hard problem without losing the thread and without blowing past its context window — that is what makes the harness robust on long-running tasks.
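A minimal sketch of the compaction check, under stated assumptions: tokens are estimated at roughly four characters each, the summarizer is passed in as a plain function (in practice it would presumably be a model call), and the event is shown as a dict with a hypothetical `type` key. Only the defaults (last 4 steps, ~70%) and the two event fields come from this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Scratchpad:
    token_budget: int
    keep_recent: int = 4      # most recent steps always kept in full
    compact_at: float = 0.70  # fraction of the token budget
    memory: str = ""          # compact running memory of older steps
    steps: list = field(default_factory=list)

    def estimated_tokens(self) -> int:
        # Crude estimate: about 4 characters per token (an assumption).
        chars = len(self.memory) + sum(len(step) for step in self.steps)
        return chars // 4

    def maybe_compact(self, summarize: Callable[[str, list], str]) -> Optional[dict]:
        """Summarize older steps once the estimate reaches the threshold."""
        if self.estimated_tokens() < self.compact_at * self.token_budget:
            return None
        split = max(len(self.steps) - self.keep_recent, 0)
        older, recent = self.steps[:split], self.steps[split:]
        if not older:
            return None  # nothing older than the protected recent steps
        self.memory = summarize(self.memory, older)
        self.steps = recent
        return {"type": "context_compaction",
                "summarized_steps": len(older),
                "summary_chars": len(self.memory)}


pad = Scratchpad(token_budget=100)
pad.steps = [f"step {i}: " + "x" * 32 for i in range(10)]
event = pad.maybe_compact(lambda memory, older: f"{len(older)} earlier steps summarized")
```

With ten steps in the scratchpad, the six oldest are folded into the running memory and the last four stay intact.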

Sub-agents

When an agent has the spawn_subagent tool, it can delegate a self-contained or parallelizable subtask to a focused sub-agent, which runs concurrently and has its own isolated context. Sub-agents have their own, tighter limits. By default: a maximum depth of 1 (a sub-agent cannot spawn further sub-agents), up to 3 sub-agents running concurrently, and 15 iterations and 300s for each.

The key rule: a sub-agent’s text is never part of the answer. Its work streams live as nested activity (depth > 0, grouped by instance_id) via subagent_spawn … subagent_complete, and its result is returned to the parent as a tool result. The parent decides what to incorporate into the final answer.
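The concurrency and time limits can be pictured with standard `asyncio` primitives. This is an illustrative sketch only: `spawn_subagent` here is a local function named after the tool, `delegate` and the shape of the returned dicts are hypothetical, and the iteration cap is not modeled.

```python
import asyncio

MAX_DEPTH = 1              # a sub-agent cannot spawn more
MAX_CONCURRENT = 3         # sub-agents running at once
SUBAGENT_TIMEOUT_S = 300   # wall-clock budget for each


async def spawn_subagent(task, run, depth, slots, timeout_s=SUBAGENT_TIMEOUT_S):
    """Run one sub-agent and return its outcome as a tool result for the parent."""
    if depth > MAX_DEPTH:
        return {"task": task, "error": "max sub-agent depth exceeded"}
    async with slots:  # wait for a free concurrency slot
        try:
            result = await asyncio.wait_for(run(task), timeout=timeout_s)
            return {"task": task, "stop_reason": "goal_reached", "result": result}
        except asyncio.TimeoutError:
            return {"task": task, "stop_reason": "timeout", "result": None}


async def delegate(tasks, run, timeout_s=SUBAGENT_TIMEOUT_S):
    """Parent side: fan out subtasks at depth 1 and collect the tool results."""
    slots = asyncio.Semaphore(MAX_CONCURRENT)
    jobs = [spawn_subagent(t, run, 1, slots, timeout_s) for t in tasks]
    return await asyncio.gather(*jobs)
```

The parent receives a list of results in task order and decides what, if anything, to carry into its own final answer.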

Multi-agent orchestration (automatic pentests)

An automatic pentest runs several agents per phase. Here a higher-level orchestrator drives the same per-agent harness across the phase and then consolidates the results. Its lifecycle is reported through orchestration events:

orchestration_start → the phase’s agents begin, with periodic orchestration_status heartbeats and agent_status_change updates as each agent advances.
consolidation_start → consolidation_heartbeat → consolidation_complete as the orchestrator merges every agent’s findings into a single phase result.
orchestration_complete (or orchestration_cancelled) closes the phase.

A specialized browser agent (browser_agent_start) is used when a target needs interactive web navigation; its credentials are masked in the streamed activity.

Watching it happen

Every mechanism above is observable in real time. The harness emits a typed agent_event for each step — plan, iteration_start, thinking, tool_call, tool_result, context_compaction, subagent_spawn, agent_finished, and the orchestration events — alongside the content that builds the actual answer.
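On the client side, the main job is to keep answer text apart from activity. The sketch below assumes each event has already been parsed into a dict, and the field names `type` and `text` are placeholders chosen for illustration; `depth` and `instance_id` are named on this page, but the real payload layout is defined in the Streaming reference, not here.

```python
from collections import defaultdict


def split_stream(events):
    """Split parsed events into answer text, top-level activity and nested activity."""
    answer_parts = []
    top_level = []
    nested = defaultdict(list)  # instance_id -> event types, in arrival order
    for event in events:
        kind = event["type"]
        depth = event.get("depth", 0)
        if depth > 0:
            # Sub-agent work is activity only; its text never joins the answer.
            nested[event.get("instance_id", "unknown")].append(kind)
        elif kind == "content":
            answer_parts.append(event.get("text", ""))
        else:
            top_level.append(kind)
    return "".join(answer_parts), top_level, dict(nested)


sample = [
    {"type": "iteration_start"},
    {"type": "subagent_spawn"},
    {"type": "tool_call", "depth": 1, "instance_id": "sub-1"},
    {"type": "content", "depth": 1, "instance_id": "sub-1", "text": "draft"},
    {"type": "content", "text": "Port 22 "},
    {"type": "content", "text": "is open."},
    {"type": "agent_finished"},
]
answer, activity, sub_activity = split_stream(sample)
```

In this sample the sub-agent's draft text lands in the nested activity for `sub-1`, and only the two top-level `content` chunks form the answer.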

Streaming

The full agent_event reference and how to consume the SSE stream.

Stream agent events

A worked recipe that renders the live activity timeline.

Agents, tools & MCP

What an agent is made of and the two agent types.