Agents

What Agents Do

Agents turn task instructions into candidate code changes inside DevBox workspaces. SeaSnoke keeps their output tied to workspace threads, runtime logs, checks, diffs, and review decisions so your team can evaluate the work without switching tools.

Agents operate inside the boundaries of the workspace. They read the relevant repository context, work inside an isolated checkout, recreate enough of the environment to validate their changes, and report what they changed. The output is a candidate for review, not an automatic production change.

An agent run usually has five visible phases:

Context gathering: the agent reads project files, instructions, tests, and related code.
Environment preparation: the workspace installs dependencies, starts services, or restores a snapshot when available.
Implementation: the agent edits the repository to satisfy the task.
Validation: the agent runs the available checks or records why a check could not run.
Summary: the agent explains the important files changed, tradeoffs, and remaining risks.
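
The phases listed above can be modeled as an ordered record. The following is a minimal sketch, not SeaSnoke's implementation: `Phase`, `CheckResult`, and `RunRecord` are hypothetical names, and the strict ordering is an assumption made for illustration. It also shows how a check that could not run can still be recorded with a reason.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional


class Phase(IntEnum):
    """Hypothetical model of the visible phases, in listed order."""
    CONTEXT_GATHERING = 1
    ENVIRONMENT_PREPARATION = 2
    IMPLEMENTATION = 3
    VALIDATION = 4
    SUMMARY = 5


@dataclass
class CheckResult:
    name: str
    passed: Optional[bool] = None  # None means the check did not run
    skip_reason: str = ""          # filled in when the check could not run


@dataclass
class RunRecord:
    completed: List[Phase] = field(default_factory=list)
    checks: List[CheckResult] = field(default_factory=list)

    def advance(self, phase: Phase) -> None:
        # Assumption: phases happen strictly in order; a real run may differ.
        if len(self.completed) >= len(Phase):
            raise ValueError("run has already finished every phase")
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise ValueError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)

    def record_check(self, name: str, passed: Optional[bool] = None,
                     skip_reason: str = "") -> None:
        # A check either ran (passed is True/False) or carries a reason.
        if passed is None and not skip_reason:
            raise ValueError("a check that did not run needs a reason")
        self.checks.append(CheckResult(name, passed, skip_reason))
```

Recording skipped checks explicitly, rather than leaving them out, lets a reviewer tell "not run" apart from "passed".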

Mermaid

sequenceDiagram
  participant User
  participant SeaSnoke
  participant Workspace
  participant Agent
  participant Review

  User->>SeaSnoke: Create task
  SeaSnoke->>Workspace: Open DevBox workspace
  Workspace->>Agent: Start thread with repository context
  Agent->>Workspace: Read, edit, validate, and summarize
  Workspace->>SeaSnoke: Return candidate change and logs
  SeaSnoke->>Review: Present diff, checks, and notes

Threads, Not Just Runs

For larger tasks, SeaSnoke presents agents as workspace threads. A thread can be a single linear coding session or one role inside a graph-backed task.

Common thread roles include:

Planner: breaks a large request into independently reviewable slices.
Worker: implements one slice of the task in its own isolated checkout.
Reducer: combines compatible worker outputs into one candidate.
Verifier: inspects the combined result, runs checks, and can request fixes.

Each thread should have its own conversation, logs, files touched, status, and diff artifact. This keeps parallel agent work understandable without forcing every worker to push a visible branch.
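
To make the graph-backed case concrete, here is a small sketch of threads as nodes with dependencies. The `Role` and `Thread` types, the thread IDs, and the `execution_order` helper are all hypothetical; the only real API used is Python's standard `graphlib.TopologicalSorter` (Python 3.9+). It illustrates one plausible shape, where workers depend on the planner, the reducer on the workers, and the verifier on the reducer.

```python
from dataclasses import dataclass
from enum import Enum
from graphlib import TopologicalSorter
from typing import List, Tuple


class Role(Enum):
    PLANNER = "planner"
    WORKER = "worker"
    REDUCER = "reducer"
    VERIFIER = "verifier"


@dataclass(frozen=True)
class Thread:
    thread_id: str
    role: Role
    depends_on: Tuple[str, ...] = ()  # IDs of threads that must finish first


def execution_order(threads: List[Thread]) -> List[str]:
    """Return thread IDs in an order that respects every dependency."""
    graph = {t.thread_id: set(t.depends_on) for t in threads}
    return list(TopologicalSorter(graph).static_order())


# Illustrative task graph: one planner, two workers, a reducer, a verifier.
threads = [
    Thread("plan", Role.PLANNER),
    Thread("worker-a", Role.WORKER, ("plan",)),
    Thread("worker-b", Role.WORKER, ("plan",)),
    Thread("reduce", Role.REDUCER, ("worker-a", "worker-b")),
    Thread("verify", Role.VERIFIER, ("reduce",)),
]
order = execution_order(threads)
```

The two workers have no dependency on each other, so nothing in the graph prevents them from running in parallel in their own checkouts.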

Writing Effective Instructions

The best instructions describe the desired outcome, constraints, and verification method. They do not need to specify every file if the repository already makes that clear.

Strong task instructions include:

the user-facing behavior that should change
the repository or package involved
known constraints, such as "do not change the public API"
expected tests or checks
examples of inputs and outputs when behavior is subtle

Less effective instructions are broad or subjective, such as "make this better" or "clean up the app." Those can still work, but they tend to produce candidates that are harder to review.

Add pagination to the tasks API.

Requirements:
- Accept `limit` and `cursor` query parameters.
- Preserve the current default response for callers that omit pagination.
- Add tests for first page, next page, and invalid cursor.
- Do not change task creation behavior.
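
Teams that write many tasks may find it useful to assemble instructions from structured fields so that the outcome and requirements are never forgotten. The sketch below is purely illustrative: `TaskInstruction` and `render` are hypothetical helpers, not part of SeaSnoke, and the output simply mirrors the plain-text layout of the example above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskInstruction:
    outcome: str                                   # the behavior that should change
    requirements: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the instruction as plain text, one requirement per line."""
        if not self.outcome.strip():
            raise ValueError("an instruction needs a desired outcome")
        lines = [self.outcome]
        if self.requirements:
            lines += ["", "Requirements:"]
            lines += [f"- {req}" for req in self.requirements]
        return "\n".join(lines)


task = TaskInstruction(
    outcome="Add pagination to the tasks API.",
    requirements=[
        "Accept `limit` and `cursor` query parameters.",
        "Preserve the current default response for callers that omit pagination.",
        "Add tests for first page, next page, and invalid cursor.",
        "Do not change task creation behavior.",
    ],
)
text = task.render()
```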

Review Signals

Diffs show exactly what changed.
Checks show whether tests and quality gates passed.
Console logs show setup, agent, and verification output.
Thread messages show the agent conversation and decisions.
Notes summarize the reasoning and tradeoffs behind a candidate.
Status makes it clear whether a run is queued, running, waiting on a human decision, or finished.

Agent Statuses

SeaSnoke surfaces agent status so reviewers can tell whether a run is still active or ready to inspect.

Common statuses:

Queued: the run has been accepted and is waiting to start.
Running: the agent is actively working.
Needs attention: the run reached a condition that requires a human decision.
Completed: the run produced a candidate.
Failed: the run could not produce a candidate.
Canceled: the run was stopped before completion.

Failures are still useful. A failed run can reveal missing setup instructions, broken tests, unavailable secrets, or task requirements that need to be narrowed.
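
The statuses listed above can be captured in a small enum with helpers that answer the reviewer's two main questions: is the run finished, and is there a candidate to inspect? This is a sketch under assumptions; `AgentStatus` and the helper functions are hypothetical names, and the grouping of statuses into "finished" is inferred from the descriptions above rather than taken from a SeaSnoke API.

```python
from enum import Enum


class AgentStatus(Enum):
    QUEUED = "Queued"
    RUNNING = "Running"
    NEEDS_ATTENTION = "Needs attention"
    COMPLETED = "Completed"
    FAILED = "Failed"
    CANCELED = "Canceled"


# Assumption: these three statuses mean the run will make no further progress.
FINISHED = frozenset(
    {AgentStatus.COMPLETED, AgentStatus.FAILED, AgentStatus.CANCELED}
)


def is_finished(status: AgentStatus) -> bool:
    """True when the run has stopped, whether or not it produced a candidate."""
    return status in FINISHED


def has_candidate(status: AgentStatus) -> bool:
    """Only a completed run is described as producing a candidate."""
    return status is AgentStatus.COMPLETED


def waiting_on_human(status: AgentStatus) -> bool:
    """True when the run is paused for a human decision."""
    return status is AgentStatus.NEEDS_ATTENTION
```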

Comparing Agents and Runs

Multiple runs can be useful when the task has several plausible solutions. For example, a reviewer may want to compare a minimal patch against a broader cleanup, or compare two approaches to an API shape.

When comparing runs, focus on:

correctness against the original task
size and readability of the diff
test quality
compatibility with existing conventions
deployment or migration risk
whether the summary matches the actual code

SeaSnoke keeps each candidate separate so the team can make that decision deliberately.
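
A reviewer comparing candidates may want to track the criteria above as a checklist per candidate. The sketch below is one hypothetical way to do that; the `CRITERIA` keys, `CandidateReview`, and `open_items` are invented for illustration. It deliberately reports unreviewed or unmet criteria instead of computing a score, so the decision stays with the team.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical keys, one per comparison criterion listed above.
CRITERIA = (
    "correctness",
    "diff_readability",
    "test_quality",
    "conventions",
    "deployment_risk",
    "summary_accuracy",
)


@dataclass
class CandidateReview:
    name: str
    satisfied: Dict[str, bool] = field(default_factory=dict)

    def mark(self, criterion: str, ok: bool) -> None:
        if criterion not in CRITERIA:
            raise KeyError(f"unknown criterion: {criterion}")
        self.satisfied[criterion] = ok

    def open_items(self) -> List[str]:
        """Criteria not yet reviewed, or reviewed and found lacking."""
        return [c for c in CRITERIA if not self.satisfied.get(c, False)]


minimal = CandidateReview("minimal patch")
for criterion in CRITERIA:
    minimal.mark(criterion, True)

cleanup = CandidateReview("broader cleanup")
cleanup.mark("correctness", True)
cleanup.mark("deployment_risk", False)
```

Here `minimal.open_items()` is empty, while `cleanup.open_items()` lists everything still to be checked, including the deployment risk that was flagged.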

Human Review Still Matters

Agents can produce useful code quickly, but human review remains the control point. Reviewers should check product behavior, security assumptions, data model changes, and long-term maintainability before a candidate is merged.

Treat agent output the same way you would treat a pull request from a teammate: inspect the diff, run or review checks, ask for revisions when needed, and merge only when the change is understood.