Job output

Every run writes a job directory you can inspect, diff, or feed into CI. By default jobs land under jobs/. Change that with --jobs-dir.

Layout

jobs/seaport-<run-id>/
├── config.json
├── result.json
└── <task-name>/
    ├── config.json
    ├── result.json
    ├── agent/
    │   └── trajectory.json
    └── verifier/
        ├── reward.txt
        ├── test-stdout.txt
        └── test-stderr.txt

With multiple attempts, each trial directory gets an attempt suffix:

jobs/seaport-<run-id>/
└── acme-task-attempt-2/

The files

FileWhat it holds
result.json (job)Aggregate pass or fail, average reward, and a record per task attempt.
config.json (job)The run configuration: agent, attempts, concurrency, backend, model, target.
result.json (trial)Pass or fail and the reward for one attempt, plus an error if it failed.
trajectory.jsonThe agent command, exit status, stdout, and stderr for the agent phase.
reward.txtThe raw reward the verifier wrote, 1 or 0.
test-stdout.txt / test-stderr.txtThe verifier’s output streams.

Reading results

The end of every run prints a short summary: the job directory, the target, how many tasks passed, and whether the whole run passed. For anything deeper, the JSON is plain and stable, so jq, a spreadsheet, or a CI step can read it directly.

# overall pass count and average reward
jq '{passed, reward, tasks_passed, tasks_total}' jobs/seaport-*/result.json

Exit codes

The exit codes are:

CodeMeaning
0Every task passed.
2Usage error, such as a bad flag or missing source.
3A requested feature is not implemented yet.
4One or more tasks failed.