fix-wg-init

Fix: wg init --route codex-cli produces a non-functional project at runtime

Metadata

Status: done
Assigned: agent-1053
Agent identity: f51439356729d112a6c404803d88015d5b44832c6c584c62b96732b63c2b0c7e
Created: 2026-04-29T01:16:39.353824386+00:00
Started: 2026-04-29T01:19:43.688620882+00:00
Completed: 2026-04-29T01:38:42.447536331+00:00
Tags: priority-critical, bug, codex, init, eval-scheduled
Eval score: 0.86
└ blocking impact: 0.90
└ completeness: 0.90
└ constraint fidelity: 0.25
└ coordination overhead: 0.85
└ correctness: 0.85
└ downstream usability: 0.85
└ efficiency: 0.80
└ intent fidelity: 0.83
└ style adherence: 0.85

Description

wg init --route codex-cli correctly writes a config file with codex:gpt-5.5 / codex:gpt-5.4-mini, but the resulting project is completely non-functional at runtime. The bump-codex-defaults task (commit at agent-1030) claimed live-smoke validation but never actually executed it. The failures below surfaced via live smoke in /tmp/codex-smoke at 2026-04-29 01:15Z.

Bugs found by live smoke

Bug 1: Dispatcher ignores project [agent].model and [dispatcher].model

  • config.toml says [dispatcher].model = "codex:gpt-5.5"
  • daemon log on startup says: Coordinator config: poll_interval=5s, max_agents=1, executor=claude, model=claude:opus
  • The dispatcher is loading global ~/.wg/config.toml or hardcoded defaults instead of the project config.
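
One way to check this from the shell is to pull the executor and model out of the startup line and compare them with what the project config requests. The following is a minimal sketch: the helper name coordinator_field is hypothetical (not part of wg), and it assumes the log line keeps the key=value format quoted above.

```shell
# Hypothetical helper (not part of wg): print the value of one key=value
# field from the first "Coordinator config:" line of a daemon log.
# Assumes the comma-separated key=value format quoted in this task.
coordinator_field() {
  log_file="$1"
  key="$2"
  grep 'Coordinator config:' "$log_file" \
    | head -n 1 \
    | tr ',' '\n' \
    | sed -n "s/^.*[[:space:]]${key}=\(.*\)\$/\1/p" \
    | head -n 1
}

# Example, using the log path from the repro in this task:
#   coordinator_field /tmp/codex-test/service/daemon.log model
```

With the bug present, the model field would print claude:opus rather than the codex:gpt-5.5 value from config.toml.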

Bug 2: wg init claims to create default agent but doesn't

  • init output: "Created default agent: Careful Programmer (f5143935)."
  • After init: wg agent list reports "No agents defined."
  • No .wg/agency/agents/ directory exists. The agent file was never written.
  • Plus: "Warning: auto_assign is enabled but no agents are defined."

Bug 3: Service runtime dir at wrong path

  • Daemon writes to /tmp/codex-smoke/service/daemon.sock
  • Should be /tmp/codex-smoke/.wg/service/daemon.sock (consistent with where config.toml lives)
  • Both paths exist after init — the config dir was created by wg init but service uses a separate (sibling) directory
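
The expected layout can be asserted with a few lines of shell. This is only a sketch: check_service_layout is a hypothetical helper name, and it encodes nothing beyond the two paths described above.

```shell
# Hypothetical check (not part of wg): succeed only if the service directory
# lives under .wg/ and no sibling service/ directory exists next to it.
check_service_layout() {
  project_dir="$1"
  if [ -d "$project_dir/service" ]; then
    echo "unexpected sibling directory: $project_dir/service" >&2
    return 1
  fi
  if [ ! -d "$project_dir/.wg/service" ]; then
    echo "missing directory: $project_dir/.wg/service" >&2
    return 1
  fi
  return 0
}

# Example: check_service_layout /tmp/codex-test
```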

Bug 4: Graph watcher watches non-existent path

  • Daemon log: Graph watcher active on /tmp/codex-smoke/graph.jsonl
  • Actual file: /tmp/codex-smoke/.wg/graph.jsonl
  • Graph watcher is broken — never sees graph mutations from wg add etc.
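
A small check can confirm whether the path the watcher reports actually exists. The sketch below assumes the "Graph watcher active on" log wording quoted above; both helper names are hypothetical.

```shell
# Hypothetical helpers (not part of wg).
# Print the path from the first "Graph watcher active on" line of a log.
watched_graph_path() {
  sed -n 's/^.*Graph watcher active on[[:space:]]*//p' "$1" | head -n 1
}

# Succeed only if the log names a watched path and that file exists.
check_watched_graph() {
  graph_path="$(watched_graph_path "$1")"
  [ -n "$graph_path" ] && [ -f "$graph_path" ]
}

# Example: check_watched_graph /tmp/codex-test/service/daemon.log
```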

Bug 5: Continuous reconciliation error every tick

  • Every dispatcher tick logs: Coordinator tick error: Failed to load graph for task-aware reaping
  • Dispatcher cannot read graph. No agents will ever spawn.

Common root cause hypothesis

All five bugs likely trace to inconsistent project-dir resolution between subsystems:

  • wg init resolves project dir → writes to /tmp/codex-smoke/.wg/
  • Service start resolves project dir → writes service files to /tmp/codex-smoke/service/ and looks for graph at /tmp/codex-smoke/graph.jsonl
  • Some subsystem cascades to global ~/.wg/ for config, picking up claude:opus instead of project's codex:gpt-5.5

Suspect: WG_DIR env var handling is partial — propagated to some code paths but not others. Or .wg/ discovery is implemented in some subsystems but not the service / dispatcher / graph-watcher. The init-help output mentions resolver precedence (--dir > $WG_DIR > .wg > .workgraph > ~/.wg > ./.wg) — verify all subsystems use the same resolver.
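
To make "the same resolver" concrete, here is a shell sketch of a single function that applies the quoted precedence in order. It is an illustration under stated assumptions, not wg's actual implementation: the function name resolve_wg_dir is hypothetical, candidates are checked only in the current directory (no walk up through parents), and the trailing ./.wg is read as a fallback that is returned even if it does not exist yet.

```shell
# Hypothetical sketch of one shared resolver (not wg's real code).
# Precedence, as quoted from the init help:
#   --dir > $WG_DIR > .wg > .workgraph > ~/.wg > ./.wg
# Argument 1 stands in for the value of --dir (empty if not given).
resolve_wg_dir() {
  cli_dir="${1:-}"
  if [ -n "$cli_dir" ]; then
    printf '%s\n' "$cli_dir"
    return 0
  fi
  if [ -n "${WG_DIR:-}" ]; then
    printf '%s\n' "$WG_DIR"
    return 0
  fi
  # Assumption: only existing directories count for discovery.
  for candidate in "$PWD/.wg" "$PWD/.workgraph" "$HOME/.wg"; do
    if [ -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  # Assumption: final fallback is ./.wg, even if it has not been created.
  printf '%s\n' "$PWD/.wg"
}
```

If init, service start, the dispatcher, and the graph watcher all went through one such function, they could not disagree about where the project lives.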

Repro

mkdir /tmp/codex-test && cd /tmp/codex-test
wg init --route codex-cli
WG_DIR=/tmp/codex-test wg agent list                  # → "No agents defined" (Bug 2)
WG_DIR=/tmp/codex-test wg service start --max-agents 1  # spawns OK
ls /tmp/codex-test/                                   # see service/ and .wg/ as siblings (Bug 3)
tail -20 /tmp/codex-test/service/daemon.log           # see Bugs 1, 4, 5

Validation

  • Failing test written first (TDD): an integration test that runs wg init --route codex-cli in a tmpdir, then wg service start --max-agents 1 against it, then verifies:
      - daemon.log says executor=codex, model=codex:gpt-5.5
      - service files are under .wg/service/, not a sibling service/ directory
      - graph watcher path is .wg/graph.jsonl
      - no "Failed to load graph" errors after first tick
      - wg agent list shows the default agent the init created
  • All five bugs above are demonstrably fixed (confirm with the same repro steps)
  • Live smoke beyond the integration test: actually spawn a worker, confirm WG_EXECUTOR_TYPE=codex and WG_MODEL=codex:gpt-5.5 in the spawned process env. THIS IS THE STEP bump-codex-defaults SKIPPED — do not skip it again.
  • Permanent smoke scenario added under tests/smoke/scenarios/codex-cli-fresh-init/ that runs the full repro and asserts end-to-end, with this task's id listed in owners.
  • cargo build + cargo test pass with no regressions
  • cargo install --path . was run before claiming done
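
For the live-smoke step, the spawned worker's environment can be inspected directly on Linux, where /proc/PID/environ holds NUL-separated NAME=value entries. The helper below is a hedged sketch: environ_value is a hypothetical name, and how to obtain the worker's PID is left open because this task does not specify it.

```shell
# Hypothetical helper (not part of wg): print the value of NAME from a file
# of NUL-separated NAME=value entries, such as /proc/PID/environ on Linux.
environ_value() {
  environ_file="$1"
  name="$2"
  tr '\0' '\n' < "$environ_file" | sed -n "s/^${name}=//p" | head -n 1
}

# Example (Linux only; worker_pid must be set to the spawned worker's PID):
#   environ_value "/proc/$worker_pid/environ" WG_EXECUTOR_TYPE
#   environ_value "/proc/$worker_pid/environ" WG_MODEL
```

Pasting the output of both calls into the task log would serve as the evidence the process note asks for.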

Process note

The bump-codex-defaults task (which JUST landed) had this same validation step in its description:

Live smoke: in a fresh tmpdir, wg init --route codex-cli then spawn a tiny task and confirm worker uses gpt-5.5

The agent claimed done without actually running this. Per the user's standing feedback (memory: 'verify exhaustively before claiming done' / 'test the binary, not just the build'), the agent's validation was inadequate. Apply that lesson here: the live smoke MUST be observed working, with a screenshot or daemon-log paste in this task's log as evidence. Do not claim done on the strength of cargo test alone.

Depends on

Required by

Log