synth-agency-end-to-end — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-2449`
Agent identity	`3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3`
Model	`claude:opus`
Created	2026-05-04T17:45:30.726705677+00:00
Started	2026-05-04T20:58:10.073607339+00:00
Completed	2026-05-04T21:10:20.934887771+00:00
Tags	`agency,sync,synthesis,federation`, `eval-scheduled`
Eval score	0.87
└ blocking impact	0.82
└ completeness	0.88
└ coordination overhead	0.90
└ correctness	0.85
└ downstream usability	0.88
└ efficiency	0.82
└ intent fidelity	0.86
└ style adherence	0.90

## Description
Final fan-in synthesis. Verify cross-area composition by running an end-to-end smoke that exercises the federation primitive: a workgraph project's agency primitives must be byte-for-byte loadable by agentbureau/agency tooling (or vice versa, per the chosen alignment direction in `research-agency-hash-compat`).

## End-to-end smoke shape
1. `wg init` a scratch project, `wg agency init` to seed primitives.
2. `wg agency export --format agency-csv > /tmp/wg-export.csv`.
3. Take agency's upstream `primitives/starter.csv` (download from https://raw.githubusercontent.com/agentbureau/agency/main/primitives/starter.csv) and run `wg agency import --format agency-csv` against it.
4. Re-export and byte-diff against the upstream — should be empty.
5. Pick three primitives (one role_component, one desired_outcome, one trade_off_config). Compute their content_hash by hand (sha256 of canonical input). Confirm the hash matches the on-disk `.wg/agency/primitives/{components,outcomes,tradeoffs}/{hash}.yaml` filename.
6. Confirm `wg agency stats` reports v1.2.4 compat version.
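
The hand check in step 5 can be sketched in a few lines of Python. The task text only says "sha256 of canonical input"; this sketch assumes the canonical input is the primitive's description encoded as UTF-8, which is what the log entry further down reports for `description_hash`. The helper names `description_hash` and `filename_matches` are illustrative and are not part of the `wg` CLI.

```python
import hashlib
from pathlib import Path


def description_hash(description: str) -> str:
    """Hex SHA-256 of the description's UTF-8 bytes.

    Assumption: the canonical input is the raw description string with no
    trimming or normalization applied beforehand.
    """
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def filename_matches(primitive_path: Path, description: str) -> bool:
    """True when the file stem (name without .yaml) equals the computed hash."""
    return primitive_path.stem == description_hash(description)


if __name__ == "__main__":
    # Description quoted in the log for the sampled component.
    sample = "Analyzes dependencies and structural decisions."
    print(description_hash(sample))
```

Running it prints a 64-character hex digest to compare by eye against the filename under `.wg/agency/primitives/components/`. If the real canonical form includes anything beyond the description, the digest will not match and the assumption above needs revisiting.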

## Deliverable
`wg log` synthesis report with:
- Each step's pass/fail
- Any drift detected (and whether it's structural or cosmetic)
- Concrete follow-up tasks filed as `wg add --after synth-agency-end-to-end ...` for any drift left to close
- Final verdict: alignment achieved (yes/partial/no)
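
One possible way to make the structural/cosmetic distinction mechanical is to compare the two exports twice: once as raw bytes, and once after normalizing the differences the reviewer is willing to call cosmetic. The sketch below is an assumption about where that line sits. It treats only line-ending style and trailing newlines as cosmetic, and `classify_drift` is a hypothetical helper, not something `wg` provides.

```python
def classify_drift(upstream: bytes, exported: bytes) -> str:
    """Return 'none', 'cosmetic', or 'structural' for two CSV byte strings."""
    if upstream == exported:
        return "none"

    # Assumption: CRLF vs LF and trailing newlines are the only differences
    # counted as cosmetic here. Widen normalize() if more should qualify.
    def normalize(data: bytes) -> bytes:
        return data.replace(b"\r\n", b"\n").rstrip(b"\n")

    if normalize(upstream) == normalize(exported):
        return "cosmetic"
    return "structural"
```

A `structural` result does not say what differs; it only says that line-ending normalization alone was not enough to close the gap, so a row-level comparison is the next step.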

## Validation
- [ ] All 6 smoke steps executed, output captured
- [ ] Byte-diff in step 4 is empty (or differences explained as intentional)
- [ ] Hand-computed hashes in step 5 match on-disk filenames for all three sample primitives
- [ ] Synthesis report posted via `wg log` with verdict
- [ ] Any residual drift filed as follow-up tasks (or explicit note that none was found)
- [ ] cargo build + cargo test pass

Depends on

Required by

Log

2026-05-04T17:45:30.664109062+00:00 Task paused
2026-05-04T18:36:22.791580095+00:00 Task published
2026-05-04T18:36:40.891420764+00:00 Lightweight assignment: agent=Default Evaluator (31847164), exec_mode=full, context_scope=graph, reason=This is a correctness-critical synthesis and validation task; the Evaluator agent is the best role fit, with careful tradeoff behavior and enough scope to run the end-to-end smoke, verify hashes, and file follow-ups if drift remains.
2026-05-04T20:58:10.073611487+00:00 Spawned by coordinator --executor claude --model opus
2026-05-04T20:58:29.977017261+00:00 Starting end-to-end agency federation smoke. Plan: scratch wg init -> agency init -> export -> upstream import -> re-export diff -> hand-hash 3 primitives -> stats version check.
2026-05-04T21:09:34.397826614+00:00 === Synth: agency end-to-end smoke - federation byte-equality ===
VERDICT: PARTIAL alignment. Federation primitive (per-row content_hash equality) is byte-perfect. CSV envelope roundtrip is NOT byte-equal yet - three classes of cosmetic + one class of semantic drift, both filed as concrete follow-ups.
--- Step results ---
[1] PASS scratch wg init + wg agency init (/tmp/wg-agency-smoke-r2nl). Seeded 8 roles + 11 tradeoffs and auto-imported 637 primitives from the embedded starter.
[2] PASS wg agency export --format agency-csv -> /tmp/wg-export.csv. 673 lines (1 header + 672 primitives). Header byte-equal to upstream.
[3] PASS curl https://raw.githubusercontent.com/agentbureau/agency/main/primitives/starter.csv -> /tmp/upstream-starter.csv (sha256 8a8a483b...) (701 data rows). wg agency import --format agency-csv ingested it: Components 364, Outcomes 105, Tradeoffs 232 = 701 distinct primitives.
[4] FAIL (with characterized drift) wg agency export -> /tmp/wg-reexport.csv (737 rows). Raw diff vs upstream: ALL rows differ (1439 diff lines).
    - Root cause #1 (cosmetic): upstream uses CRLF, exporter writes LF. After 'tr -d \r' on upstream, set difference shrinks to: 667 byte-equal rows / 34 upstream-only / 69 reexport-only.
    - 18 of 34 upstream-only rows: same (type,name) appears twice in upstream with different 'scope' values (task vs meta:assigner / meta:evaluator). wg's import collapses to one row.
    - 15 of 34 upstream-only rows: cosmetic - domain field 'software, management' (with space) becomes 'software,management' (no space) on import; or upstream leaves parent_ids empty while re-export fills it from parent_content_hash.
    - 1 of 34 truly missing: role_component identify-write-up-audience-and-adapt collides by description with locally-seeded adapt-research-synthesis-for-non-domain-audience and is silently merged.
    - 69 reexport-only rows: workgraph's seeded primitives (capitalized titles like 'Calibrated evaluation grade', 'Default Evaluator outputs') that don't exist in upstream and are correctly absent from it.
[5] PASS Hand-computed sha256(description) for one of each type matches the on-disk filename byte-for-byte:
    - component '00548ae...' description 'Analyzes dependencies and structural decisions.'
    - outcome '00f6dd7...' description 'Return a deduplicated synthesis ... trade-offs stated'
    - tradeoff '00943f3...' description 'When a project has a canonical reference: compare working ... not at session end'
    Confirms hash equation src/agency/hash.rs::description_hash = sha256(description.as_bytes()) per Agency v1.2.4 spec.
[6] PASS wg agency stats reports 'v1.2.4 fields'. wg agency stats --json shows agency_compat_version='1.2.4'. Source: src/agency/mod.rs:16 const WG_AGENCY_COMPAT_VERSION = '1.2.4'.
--- Cargo gates ---
cargo build: OK (1m56s, 23 warnings, 0 errors)
cargo test --lib agency::: 228 passed / 0 failed
cargo test --test '*agency*': 237 passed across 12 binaries / 0 failed
--- Drift classification ---
Cosmetic (envelope normalization): CRLF/LF, domain whitespace, parent_ids derived field
Semantic (identity rule): per-scope duplicate collapse, description-collision dedup
Local-only extras: 69 wg-seeded primitives not in upstream
--- Follow-ups filed ---
- fix-agency-csv (Fix: agency CSV export byte-equal roundtrip with upstream starter.csv) - covers CRLF, domain whitespace, parent_ids handling
- investigate-agency-import (Investigate: agency import dedup rules) - covers per-scope variants and same-description collisions
--- Federation alignment summary ---
Per-primitive content_hash compat: YES (byte-perfect, hand-verified)
Compat version handshake (v1.2.4): YES (CLI surface + JSON + const all align)
CSV roundtrip byte-equality: NO (drift characterized, follow-ups filed)
Final verdict: PARTIAL - production-ready for hash-based federation handshakes; CSV byte-equality is a fixable cosmetic + dedup cleanup tracked in fix-agency-csv + investigate-agency-import.
2026-05-04T21:10:20.934898341+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-04T21:11:43.887765798+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
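
The row-level analysis in the step 4 log entry (strip carriage returns, take set differences, then look for `(type, name)` pairs that carry more than one `scope`) could be reproduced along the following lines. This is a sketch under assumptions rather than the procedure the agent actually ran. The column names `type`, `name`, and `scope` are taken from the log's wording and may not match the real CSV header, and the line-based comparison assumes no quoted field contains an embedded newline.

```python
import csv
import io
from collections import defaultdict


def row_set_diff(upstream_text: str, reexport_text: str):
    """Return (common, upstream_only, reexport_only) sets of data rows.

    Carriage returns are removed first, mirroring `tr -d '\\r'`.
    """
    def rows(text: str) -> set:
        lines = text.replace("\r", "").split("\n")
        return {line for line in lines[1:] if line}  # skip header, drop blanks

    up, re_ = rows(upstream_text), rows(reexport_text)
    return up & re_, up - re_, re_ - up


def scope_variants(csv_text: str) -> dict:
    """Map (type, name) to its set of scopes when more than one scope appears.

    Assumption: the header has columns literally named type, name, scope.
    """
    scopes = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text.replace("\r", ""))):
        scopes[(row["type"], row["name"])].add(row["scope"])
    return {key: vals for key, vals in scopes.items() if len(vals) > 1}
```

Feeding the upstream file to `scope_variants` lists the candidates for the per-scope collapse that `investigate-agency-import` tracks, while the `upstream_only` set from `row_set_diff` gives the rows that need a cosmetic or semantic explanation.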