fix-agent-prompting — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-2465`
Agent identity	`02e879681e52e0a384106169be043416c4d946e850ab26b2269c57681b52a6e7`
Model	`codex:gpt-5.5`
Created	2026-05-04T21:38:29.819505239+00:00
Started	2026-05-04T21:39:41.156166881+00:00
Completed	2026-05-04T22:04:13.227859980+00:00
Tags	`fix,docs,agents,prompting`, `eval-scheduled`
Eval score	0.78
└ blocking impact	0.80
└ completeness	0.70
└ constraint fidelity	0.25
└ coordination overhead	0.85
└ correctness	0.85
└ downstream usability	0.80
└ efficiency	0.75
└ intent fidelity	0.84
└ style adherence	0.85

Description

Chat agents (observed: codex agent inside .chat-0) interpret wg nex as a way to dispatch one-shot LLM requests, invoking it like:

wg nex "Please create a weather forecast for Copenhagen ..."

wg nex is an interactive REPL that needs a TTY. Without one, it waits indefinitely on stdin. The agent's bash subprocess hangs, the chat tab freezes, and the user can't cancel cleanly (a separate concern; see fix-tui-chat-cancel).
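
One way a harness could turn an indefinite wait like this into a bounded one is to close off stdin and impose a time limit. The sketch below is illustrative only: `run_bounded` is a hypothetical helper, not part of wg, and it assumes GNU coreutils `timeout` is installed. Whether `wg nex` itself exits on end-of-file is not established here; the time limit is what covers that case.

```shell
# Hypothetical helper (not part of wg): run a command with stdin closed off
# and a wall-clock limit, so a program that reads stdin cannot block forever.
# Assumes GNU coreutils `timeout` is available on PATH.
run_bounded() {
  limit="$1"
  shift
  # </dev/null makes any read from stdin return end-of-file immediately.
  # `timeout` terminates the command and returns status 124 if it is still
  # running after $limit seconds.
  timeout "$limit" "$@" </dev/null
}

# `cat` normally waits on stdin; here it sees end-of-file and exits at once.
run_bounded 5 cat
```

This only limits the damage after a bad invocation; it does not stop the agent from choosing the wrong command in the first place.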

User report 2026-05-04: 'It's kind of annoying I just got a freeze in this state and I can't even cancel it. It's like it seems that the agent thought that I had to run WG next for some reason. I don't really understand why that happened.'

Root cause

The bundled agent-guide (wg agent-guide) likely doesn't clearly say:

- wg nex is interactive-only (REPL); chat agents should NEVER invoke it from bash
- For one-shot LLM calls inside an agent's task, use wg add to file a sub-task, not wg nex
- The orchestration model is graph-based (wg add tasks), NOT a chain of LLM calls made from shell processes

Fix

Update the bundled agent-guide content (src/text/agent_guide.md or wherever wg agent-guide reads from) to include explicit guidance:

## Don't run wg nex from bash

`wg nex` is an interactive REPL that needs a terminal. As a worker or chat agent
running through wg, you do not have an interactive terminal. Invoking `wg nex`
from bash will hang on stdin and block your task.

If you need to dispatch additional LLM work:
- File a sub-task with `wg add 'description' --after <current-task-id>` — let
  the dispatcher spawn an agent for it
- For evaluation / scoring, use `wg evaluate run <task>` or related agency
  commands that are batch-mode and won't hang

If you need an interactive REPL for development, run `wg nex` from your own
shell, not from inside an agent run.

This goes into the universal contract bundled in the binary. Once the change lands, cargo install has been run, and agents have respawned, both claude and codex chat agents see it via wg agent-guide.
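
The guidance above is documentation, so it relies on the agent reading and following it. A harness could also enforce the same rule mechanically. A minimal sketch, assuming a hypothetical shell wrapper named `wg_guard` (not part of wg, and not what this task implemented): it refuses the `nex` subcommand whenever stdin is not a terminal and passes every other invocation through unchanged.

```shell
# Hypothetical guard (not part of wg): refuse `wg nex` when stdin is not a
# terminal, and pass every other invocation through to the real binary.
wg_guard() {
  # `[ -t 0 ]` is true only when file descriptor 0 (stdin) is a terminal.
  if [ "$1" = "nex" ] && ! [ -t 0 ]; then
    echo "wg nex needs an interactive terminal; file a sub-task with wg add instead" >&2
    return 2
  fi
  command wg "$@"
}

# With stdin redirected there is no terminal, so the guard refuses and the
# real `wg` binary is never started.
wg_guard nex "Please create a weather forecast ..." </dev/null || echo "refused"
```

The exit status 2 and the message wording are arbitrary choices for illustration.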

Validation

- [ ] wg agent-guide output contains the 'Don't run wg nex from bash' section
- [ ] Live smoke: spawn a chat agent in a fresh project, ask it to do something that previously triggered wg nex invocation. Pre-fix: agent runs wg nex, hangs. Post-fix: agent files wg add (or notes it can't / asks the user).
- [ ] No regression of any other agent-guide content
- [ ] cargo build + cargo test pass
- [ ] cargo install --path . was run before claiming done
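
The first validation item can be scripted. A minimal sketch, assuming a hypothetical helper named `guide_has_nex_warning` that reads the guide text on stdin; the heading string is the one proposed in the Fix section, and this is not the regression test that was actually committed.

```shell
# Hypothetical check: succeed only if the text on stdin contains the heading
# proposed for the agent guide.
guide_has_nex_warning() {
  # -F matches the string literally; -q suppresses output, leaving only the
  # exit status (0 if found, 1 if not).
  grep -qF "Don't run wg nex from bash"
}

# Intended use against an installed binary:
#   wg agent-guide | guide_has_nex_warning && echo "warning present"
```

Reading from stdin keeps the check independent of where the guide text lives in the source tree.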

Coordinate

- fix-agents-md (shipped): established lock-step CLAUDE.md / AGENTS.md
- architectural-remove-wg (in flight): removes wg_* MCP tools, simplifies the path agents take
- This task adds the SPECIFIC guidance about wg nex that agents are getting wrong

Depends on

done .assign-fix-agent-prompting

Required by

done .flip-fix-agent-prompting

Log

2026-05-04T21:38:29.796074507+00:00 Task paused
2026-05-04T21:39:14.693685414+00:00 Task published
2026-05-04T21:39:37.302912606+00:00 Lightweight assignment: agent=Careful Programmer (02e87968), exec_mode=full, context_scope=task, reason=Programmer role + Careful tradeoff fit documentation fix requiring cargo build, test, and smoke validation.
2026-05-04T21:39:41.156171791+00:00 Spawned by coordinator --executor codex --model gpt-5.5
2026-05-04T21:40:02.219592554+00:00 Starting implementation: inspecting bundled agent guide and existing guide tests
2026-05-04T21:44:31.622962830+00:00 Progress: updated agent guide warning and added regression tests
2026-05-04T21:44:53.497846961+00:00 Progress: updated agent guide warning and added regression tests
2026-05-04T21:53:52.264008001+00:00 Validated: guide warning added; focused unit test and smoke pass; cargo build passes; cargo test has unrelated prompt snapshot hash mismatch
2026-05-04T22:01:28.634717917+00:00 Live smoke attempted in fresh project with codex exec; blocked by codex auth 401 before agent action, no task created and no scratch wg nex process launched
2026-05-04T22:02:27.861514686+00:00 Validated: cargo build pass; cargo test pass after refreshing two stale prompt snapshot hashes; installed wg contains new agent-guide warning; owned smoke passes
2026-05-04T22:03:24.286366686+00:00 Committed: 23a72899a — pushed to origin/wg/agent-2465/fix-agent-prompting
2026-05-04T22:04:13.227867855+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-04T22:05:50.173851015+00:00 PendingEval → Done (evaluator passed; downstream unblocks)