fix-nex-agent

Fix: nex agent prompting — direct toward web/data tools when task needs current data (don't default to code generation)

Metadata

Status: done
Assigned: agent-2478
Agent identity: 02e879681e52e0a384106169be043416c4d946e850ab26b2269c57681b52a6e7
Model: codex:gpt-5.5
Created: 2026-05-04T21:58:09.358085955+00:00
Started: 2026-05-04T21:58:59.794201959+00:00
Completed: 2026-05-04T22:22:23.877719189+00:00
Tags: fix, nex, agent, prompting, tools, eval-scheduled
Eval score: 0.85
└ blocking impact: 0.87
└ completeness: 0.87
└ constraint fidelity: 0.55
└ coordination overhead: 0.87
└ correctness: 0.88
└ downstream usability: 0.82
└ efficiency: 0.77
└ intent fidelity: 0.83
└ style adherence: 0.88

Description

Observed in .chat-0 (nex/qwen3-coder) on 2026-05-04: the user asked for a 'Copenhagen weather forecast for June 28-July 3, 2026'. The agent responded by writing a Rust program in ~/household/src/main.rs that prints hardcoded weather text via println!.

User reaction: 'wg nex tried to write a rust program too instead of searching the web lol.'

Root cause (multi-layer)

  1. Model bias: qwen3-coder is a coding model. Its training prior heavily favors 'write code to solve this' over 'use a tool / fetch data / answer directly'. Reasonable for many tasks; wrong for current-data tasks.

  2. Tool affordances: nex's tool list may not include web search (or curl in a way the agent recognizes as fetching live data). Agent sees: file write, bash, code execution. No 'web' affordance. So it falls back to 'write code that fakes the data'.

  3. Prompt guidance: nex's system prompt likely doesn't say 'if the task needs current data and you don't have a web tool, ASK the user for the data or NOTE that you can't fetch it'. Without this guidance, the model fills the gap by hallucinating.

What the agent SHOULD have done

Several reasonable responses:

  • Note that it has no web access tool, ask the user to paste the forecast OR confirm 'do you want me to write a placeholder program?'
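  • If bash has curl available: run curl https://wttr.in/Copenhagen and parse the output, which is actual current data. wttr.in only covers the next few days, so for dates weeks ahead the agent should still say the requested range is not yet available.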
  • If neither: explicitly say 'I don't have a way to fetch current data; here's a placeholder structure you can fill in'
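
The order of preference in the responses above can be sketched as a small decision function. This is only an illustration of the intended policy: `Capabilities`, `Action`, and `choose_action` are hypothetical names, not part of nex, and detecting whether a task needs current data is assumed to happen elsewhere.

```rust
/// Tools the agent can see. Hypothetical struct, not nex's real tool model.
#[derive(Debug, Clone, Copy)]
pub struct Capabilities {
    pub has_web_fetch: bool,
    pub has_bash_curl: bool,
}

/// What the agent should do next (hypothetical enum for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    /// Call the dedicated web fetch tool.
    UseWebFetch,
    /// Fetch over HTTP through bash, e.g. `curl https://wttr.in/Copenhagen`.
    CurlViaBash,
    /// State that live data cannot be fetched and ask the user how to proceed.
    AdmitAndAsk,
    /// The task does not need current data; proceed as usual.
    AnswerNormally,
}

/// Pick an action without ever falling back to fabricating data in code.
pub fn choose_action(needs_current_data: bool, caps: Capabilities) -> Action {
    if !needs_current_data {
        return Action::AnswerNormally;
    }
    if caps.has_web_fetch {
        Action::UseWebFetch
    } else if caps.has_bash_curl {
        Action::CurlViaBash
    } else {
        Action::AdmitAndAsk
    }
}
```

Note that no branch maps to "write a program with hardcoded output"; a placeholder program is only produced after the user confirms that is what they want.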

Spec

A. Improve nex's bundled prompt to handle this case

Add to nex's system prompt (or the agent-guide content nex consumes):

When asked to produce content that requires current real-world data (weather, news, prices, dates beyond your training cutoff, etc.):
- If you have a web fetch tool, use it.
- If you have bash, try curl / wget for known data endpoints (e.g., curl wttr.in for weather).
- If neither: explicitly state you cannot fetch live data, and ask the user to either provide it or confirm they want a code skeleton / placeholder.
- Do NOT default to writing code that fabricates the data.
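
One way the guidance above might be bundled is as a constant appended to the base system prompt. The sketch below assumes the prompt is assembled as a plain string; `CURRENT_DATA_GUIDANCE` and `with_current_data_guidance` are hypothetical names, and how nex actually builds its prompt is not shown in this task.

```rust
/// The guidance text from this spec, kept verbatim so the model sees the
/// same wording that was reviewed here. (Hypothetical constant name.)
pub const CURRENT_DATA_GUIDANCE: &str = "\
When asked to produce content that requires current real-world data (weather, news, prices, dates beyond your training cutoff, etc.):
- If you have a web fetch tool, use it.
- If you have bash, try curl / wget for known data endpoints (e.g., curl wttr.in for weather).
- If neither: explicitly state you cannot fetch live data, and ask the user to either provide it or confirm they want a code skeleton / placeholder.
- Do NOT default to writing code that fabricates the data.";

/// Append the guidance to a base prompt, separated by a blank line.
/// Idempotent: a prompt that already contains the guidance is returned as-is.
pub fn with_current_data_guidance(base_prompt: &str) -> String {
    if base_prompt.contains(CURRENT_DATA_GUIDANCE) {
        return base_prompt.to_string();
    }
    format!("{}\n\n{}\n", base_prompt.trim_end(), CURRENT_DATA_GUIDANCE)
}
```

Making the helper idempotent is a defensive choice in case the prompt is assembled from several layers that each try to add the same section.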

B. Audit nex's tool list for a web-fetch affordance

  • Does nex have a curl / fetch tool exposed to the agent? If not, why not?
  • Recommend: expose a basic 'fetch_url' tool that the agent can call. It should be bound to safe verbs (GET only by default) and respect user-configurable allowlists.
  • If web fetch is intentionally not available: at least have bash, and surface to the agent that bash is the path for HTTP requests.
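
The "GET only by default, user-configurable allowlist" recommendation could be enforced with a guard that runs before any request is made. The following is a minimal, standard-library-only sketch; `FetchPolicy`, `FetchDenied`, and `check_fetch` are hypothetical names, and the hand-rolled host extraction is a simplification (no IPv6 literals, no normalization) that a real implementation would replace with a proper URL parser.

```rust
/// Why a fetch request was refused (hypothetical error type).
#[derive(Debug, PartialEq, Eq)]
pub enum FetchDenied {
    MethodNotAllowed,
    UnsupportedUrl,
    HostNotAllowed,
}

/// User-configurable policy for the assumed 'fetch_url' tool.
pub struct FetchPolicy {
    /// Hosts the agent may contact, e.g. "wttr.in".
    pub allowed_hosts: Vec<String>,
    /// When false (the default intent), only GET is permitted.
    pub allow_non_get: bool,
}

/// Extract the host from an http(s) URL. Simplified: ignores IPv6 literals.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    // Drop any "user:pass@" prefix so it cannot spoof an allowed host.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() { None } else { Some(host) }
}

/// Check a request against the policy before performing it.
pub fn check_fetch(policy: &FetchPolicy, method: &str, url: &str) -> Result<(), FetchDenied> {
    if !policy.allow_non_get && !method.eq_ignore_ascii_case("GET") {
        return Err(FetchDenied::MethodNotAllowed);
    }
    let host = host_of(url).ok_or(FetchDenied::UnsupportedUrl)?;
    if policy.allowed_hosts.iter().any(|h| h.eq_ignore_ascii_case(host)) {
        Ok(())
    } else {
        Err(FetchDenied::HostNotAllowed)
    }
}
```

With an allowlist containing only "wttr.in", a GET to https://wttr.in/Copenhagen passes, while a POST to the same URL or a GET to any other host is rejected before any network traffic happens.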

C. Model selection guidance

qwen3-coder is biased toward code; for general-assistant work the user might want a non-coder model. nex should accept a non-coder model spec without complaint. The agent's behavior shouldn't entirely depend on this — the prompt fix in A handles the bias even when the user picks qwen3-coder.

Validation

  • Failing test or repro: ask nex/qwen3-coder for current weather; pre-fix, it writes a code skeleton; post-fix, it asks the user OR uses curl OR explicitly says 'I can't fetch data'
  • Test with a non-coder model (claude:haiku, gpt-5.4-mini): same behavior — explicit honesty about data access
  • Test with curl available in bash: agent attempts a fetch via curl
  • cargo build + cargo test pass
  • cargo install --path . was run before claiming done
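
For the repro checks above, a recorded agent turn could be classified roughly as follows. This is a crude heuristic sketch, not an existing nex test helper: `Turn`, `Outcome`, and `classify` are hypothetical, the phrase list is an assumption that would need tuning against real transcripts, and it only matches straight apostrophes.

```rust
/// Result of inspecting one agent turn (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The agent ran curl or wget through bash.
    AttemptedFetch,
    /// The agent said it cannot fetch live data.
    AdmittedNoAccess,
    /// The agent wrote files without fetching or admitting (the pre-fix bug).
    FabricatedViaCode,
    /// None of the signals matched; needs a human look.
    Unclear,
}

/// Minimal view of a recorded turn (hypothetical shape).
pub struct Turn<'a> {
    pub reply: &'a str,
    pub bash_commands: &'a [&'a str],
    pub files_written: &'a [&'a str],
}

/// Classify a turn. Fetch attempts win over admissions, which win over
/// file writes, so an honest reply that also offers a placeholder passes.
pub fn classify(turn: &Turn<'_>) -> Outcome {
    let fetched = turn.bash_commands.iter().any(|c| {
        let c = c.trim_start();
        c.starts_with("curl ") || c.starts_with("wget ")
    });
    if fetched {
        return Outcome::AttemptedFetch;
    }
    let reply = turn.reply.to_lowercase();
    // Assumed phrases; extend after looking at real model output.
    let phrases = ["can't fetch", "cannot fetch", "don't have a way to fetch", "no web access"];
    if phrases.iter().any(|p| reply.contains(*p)) {
        return Outcome::AdmittedNoAccess;
    }
    if !turn.files_written.is_empty() {
        return Outcome::FabricatedViaCode;
    }
    Outcome::Unclear
}
```

A post-fix run would be expected to land in `AttemptedFetch` or `AdmittedNoAccess`; `FabricatedViaCode` corresponds to the behavior described in the original report.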

Out of scope for this task

  • Building a full web-fetch tool with HTML parsing / search ranking. That's a bigger feature; this task just adds the prompt guidance + bash hint.
  • Restricting what models can be used with nex. That's user choice; this fix makes the agent behavior reasonable regardless of model.

Depends on

Required by

Log