design-agency-sync — Workgraph live mirror

Metadata

Status	done
Assigned	`agent-2342`
Agent identity	`3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3`
Model	`claude:opus`
Created	2026-05-04T17:28:58.772994839+00:00
Started	2026-05-04T17:31:42.322841408+00:00
Completed	2026-05-04T17:48:28.614914201+00:00
Tags	`priority-high,design,autopoietic,agency,sync`, `eval-scheduled`
Eval score	0.93
└ blocking impact	0.94
└ completeness	0.96
└ coordination overhead	0.92
└ correctness	0.94
└ downstream usability	0.97
└ efficiency	0.89
└ intent fidelity	0.88
└ style adherence	0.93

Description

Workgraph's agency system (roles, tradeoffs, agents, evaluation, evolve loop) needs deep alignment with the agentbureau/agency repo on GitHub. User wants exact match in the agent definition pipeline. The asymmetry between the two systems is producing drift in how agents are defined and evolved.

User direct quote 2026-05-04: 'we should have an agency sync task with https://github.com/agentbureau/agency we want to exactly match it in the agent definition pipeline. needs deep alignment. assign codex:gpt-5.5 to everything. or at least, we should spawn an autopoietic spark that does the fanout.'

This task is the autopoietic spark. Same pattern as design-nex-chat: this design investigates, identifies deltas, then FILES the fan-out subgraph (--paused) of research + impl + synthesis tasks. Chat agent (next user prompt) calls wg publish <root> --wcc to release.

Investigation areas

1. Read the agentbureau/agency repo

Start at https://github.com/agentbureau/agency. Map the agent definition pipeline:

How are roles defined? Schema, fields, validation
How are tradeoffs / motivations defined?
How are agents (role + tradeoff) instantiated?
How are evaluations performed?
How is the evolve loop structured?
Persistence format, file layout, content-hash identity, federation primitives
Any concrete CSV / YAML / TOML / JSON schemas the project ships

2. Compare with workgraph's current agency

Workgraph's agency system already exists (per CLAUDE.md):

Roles, tradeoffs, agents in .wg/agency/
FLIP scoring, evaluation pipeline
Federation via content-hash IDs

Diff each:

Where do field names differ?
Where do file formats differ?
Where are concepts in one system but not the other?
Where are the workgraph implementations stricter or looser than agentbureau/agency expects?
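
A minimal sketch of the field-name diff, assuming each schema can be reduced to a set of field names. The two sets are an illustrative subset pooled from the findings recorded in the Log below, not a complete schema for either system, and `field_delta` is a hypothetical helper.

```python
# Illustrative subset of field names, taken from the Log findings.
# Neither set is a complete schema; they only demonstrate the diff.
AGENCY_FIELDS = {
    "description", "quality", "domain_specificity", "domain",
    "origin_instance_id", "parent_content_hash", "scope",
    "parent_ids", "generation", "created_by",
}
WORKGRAPH_FIELDS = {
    "description", "parent_ids", "generation", "created_by", "created_at",
    "success_criteria", "requires_human_oversight", "staleness_flags",
}


def field_delta(target: set[str], current: set[str]) -> dict[str, list[str]]:
    """Bucket field names by which side of the comparison has them."""
    return {
        "missing_in_current": sorted(target - current),
        "extra_in_current": sorted(current - target),
        "shared": sorted(target & current),
    }


delta = field_delta(AGENCY_FIELDS, WORKGRAPH_FIELDS)
for bucket, names in delta.items():
    print(f"{bucket}: {', '.join(names)}")
```

The same three buckets answer the first and third questions above; format and strictness differences still need a manual read of both codebases.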

3. Identify deep-alignment delta

Per user's 'deep alignment' framing: aim to make workgraph's pipeline a strict superset OR exact match of agentbureau/agency's. If superset, the agentbureau/agency primitives drop in unchanged; if exact match, the two are interoperable.

Note any places where workgraph has GOOD reasons to diverge (e.g., domain-specific evaluation thresholds). Document those as intentional, not as drift.
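
One way to picture the strict-superset option is a record type whose upstream fields are required or defaulted and whose workgraph-only fields are all optional. The sketch below is hypothetical: `Primitive`, `from_agency_row`, `to_agency_row`, the chosen field subset, and the default values are assumptions for illustration, not workgraph's or agency's actual types.

```python
from dataclasses import dataclass, field, fields

# Subset of upstream keys used for this illustration only.
AGENCY_KEYS = ("description", "scope", "generation", "created_by")


@dataclass
class Primitive:
    # Fields shared with agentbureau/agency (defaults are assumptions).
    description: str
    scope: str = "task"
    generation: int = 0
    created_by: str = "human"
    # Workgraph-only extensions: optional, so upstream rows load without them.
    success_criteria: list[str] = field(default_factory=list)
    requires_human_oversight: bool = False


def from_agency_row(row: dict) -> Primitive:
    """Load an upstream row unchanged; reject keys this sketch does not model."""
    known = {f.name for f in fields(Primitive)}
    unknown = set(row) - known
    if unknown:
        raise ValueError(f"unmodelled keys: {sorted(unknown)}")
    return Primitive(**row)


def to_agency_row(p: Primitive) -> dict:
    """Project back onto the upstream keys, dropping workgraph extensions."""
    return {key: getattr(p, key) for key in AGENCY_KEYS}


row = {
    "description": "Reviews diffs for correctness",
    "scope": "meta:evaluator",
    "generation": 1,
    "created_by": "import",
}
loaded = from_agency_row(row)
print(to_agency_row(loaded) == row)
```

If the round trip holds for every upstream field, the superset direction keeps upstream primitives loadable while workgraph's extensions persist alongside them.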

Subgraph the design must file (autopoietic deliverable)

After investigation, file these subgraph tasks via wg add --paused --no-place:

Research fan-out (parallel, all --model claude:opus)

1+ research tasks per major delta area (schemas, evaluation, evolve loop, federation, etc.)
Each posts findings via wg log with file:line citations + concrete fix proposal

Implementation fan-out (parallel where possible, all --model codex:gpt-5.5)

1 impl task per research item
Each implements the alignment for its area
Cross-tested via integration smoke

Cross-model peer review (parallel after each impl, all --model claude:opus)

1 peer-review task per impl, mirroring the design-nex-chat pattern
Reviewer reads the impl's diff + smoke output, posts concur/concern verdict

Integration impl (single, --model codex:gpt-5.5)

Wires all the area-fixes together; ensures workgraph's agency exactly matches agentbureau/agency's pipeline shape

Fan-in synthesis (single, --model claude:opus)

Verifies cross-area composition
Runs an end-to-end smoke: a workgraph project's agency primitives are byte-for-byte loadable by agentbureau/agency tooling (or vice versa, per the chosen alignment direction)
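
The fan-out and fan-in shape above can be checked before filing by modelling it as a dependency map and reading off the parallel waves. This is a sketch using Python's standard `graphlib`; the task names are the ones recorded in the Log below, while `build_deps` and `waves` are hypothetical helpers and the design task itself is left out of the map.

```python
from graphlib import TopologicalSorter

# (research, impl, peer-review) task ids per delta area, as listed in the Log.
AREAS = {
    "schema": ("research-agency-schema-delta", "impl-agency-schema-fields",
               "peer-review-agency-schema"),
    "hash": ("research-agency-hash-compat", "impl-agency-hash-compat",
             "peer-review-agency-hash"),
    "csv": ("research-agency-csv-roundtrip", "impl-agency-csv-roundtrip",
            "peer-review-agency-csv"),
    "scope": ("research-agency-scope-rules", "impl-agency-scope-rules",
              "peer-review-agency-scope"),
}
INTEGRATION = "impl-agency-integration"
SYNTHESIS = "synth-agency-end-to-end"


def build_deps() -> dict[str, set[str]]:
    """Map each task id to the set of task ids it must run after."""
    deps: dict[str, set[str]] = {}
    reviews = set()
    for research, impl, review in AREAS.values():
        deps[research] = set()
        deps[impl] = {research}
        deps[review] = {impl}
        reviews.add(review)
    deps[INTEGRATION] = reviews          # fan-in over all peer reviews
    deps[SYNTHESIS] = {INTEGRATION}
    return deps


def waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches that could run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()                     # raises CycleError on a cyclic graph
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(waves(build_deps()), start=1):
    print(number, batch)
```

Each entry in the map corresponds to one `--after` chain; a cycle or a missing fan-in edge shows up here before any task is filed.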

Deliverable

wg log entry on this task with:

Investigation findings (current state of both systems + delta map)
Subgraph shape decided (specific task list with names + dependencies + rationale for parallel-vs-serial)
All sub-tasks filed via wg add --paused --no-place
Final note: 'subgraph filed, ready for publish — chat agent should run wg publish <root> --wcc to release'

Validation

agentbureau/agency repo investigated, key primitives documented
Delta map produced (current workgraph vs target alignment)
Subgraph filed: research + impl + peer-review + integration + synthesis tasks all present
Subgraph dependencies wired correctly (--after chains for the right ordering)
Per-task model assigned per the pattern (research=opus, impl=codex:gpt-5.5, peer-review=opus, integration=codex:gpt-5.5, synthesis=opus)
Subgraph remains paused (--paused) — chat agent releases via wg publish in next user turn
No source / doc modifications outside filing the subgraph
Design doc posted via wg log

Coordinate

This is unrelated to the in-flight README chain. Different code surface, different concern. Should run in parallel with that chain without merge conflicts (agency code vs README docs).

Process note

Same autopoietic pattern as design-nex-chat. Worth extracting as a wg func once it lands — 'design-and-fanout' becomes a reusable function for any 'investigate then build a subgraph' pattern. Out of scope for this task.


Depends on

done .assign-design-agency-sync

Required by

(none)

Log

2026-05-04T17:28:58.741555310+00:00 Task paused
2026-05-04T17:29:29.312017352+00:00 Task published
2026-05-04T17:31:42.322849623+00:00 Spawned by coordinator --executor claude --model opus
2026-05-04T17:32:11.093264363+00:00 Starting investigation of agentbureau/agency repo and workgraph's current agency surface
2026-05-04T17:47:04.091438936+00:00 === DESIGN: agency sync with agentbureau/agency ===

Investigation summary, delta map, and subgraph filed. Chat agent: run `wg publish design-agency-sync --wcc` to release the subgraph.

## A. Investigation findings

### A.1 agentbureau/agency v1.2.4 (https://github.com/agentbureau/agency)

- Python, distributed via pipx; SQLite (`~/.agency/db.sqlite` w/ sqlite-vec) is ground truth.
- Primitives: role_components, desired_outcomes, trade_off_configs.
- Identity: SHA-256 of `description` field. Content_hash is THE primary key.
- Lineage v1.2.4: parent_ids (JSON list of content hashes), generation, created_by enum {human, import, evolver, agent_creator}, reframing_potential, scope ∈ {task, meta:assigner, meta:evaluator, meta:evolver, meta:agent_creator}.
- Compositions: agent = role_component_set + desired_outcome + trade_off_config; agent_hash is composition identifier.
- Evaluations: agency_task_id, output, score, task_completed, score_type, dimensional_scores (caller-defined JSON dict), cascaded_evaluation_ids. Cascade equally to constituent primitives via primitive_performance.
- Starter CSV (12 cols): type, name, description, quality, domain_specificity, domain, origin_instance_id, parent_content_hash, scope, parent_ids, generation, created_by.
- Composition rules CSV (separate, watched, `~/.agency/composition-rules.csv`): agent_type ∈ {assigner, evaluator, evolver, agent_creator}, rule, max_role_components, max_desired_outcomes, max_trade_off_configs, all_projects, project_ids.
- Functional agents: assigner / evaluator / evolver / agent_creator, first-class via `scope`.
- Domain taxonomy: software, research, writing, analysis, legal, strategy, science, management.
- Task-type taxonomy: build, evaluate, review, research, analyse, debug, plan, synthesise.
- Federation: ed25519 keypair under `~/.agency/keys/`; remote client via `agency client setup` (single-instance + remote, not peer-to-peer in spec).
- Evolution: infrastructure only in v1.2.4; tools land v1.3.0.

### A.2 workgraph current state (this repo)

- Storage: per-primitive YAML at `.wg/agency/primitives/{components,outcomes,tradeoffs}/{hash}.yaml`; cache YAMLs at `.wg/agency/cache/{roles,agents}/{hash}.yaml`.
- Hash inputs (src/agency/hash.rs) are WIDER than agency:
  - component (hash.rs:16-35): description + category + content
  - outcome (hash.rs:39-52): description + success_criteria
  - tradeoff (hash.rs:56-75): description + acceptable_tradeoffs + unacceptable_tradeoffs
  - role (hash.rs:79-94): sorted component_ids + outcome_id (composition; analog: agency agent_hash)
  - agent (hash.rs:98-112): role_id + tradeoff_id (note: serializes tradeoff_id under legacy `motivation_id` key)
- Primitive types (src/agency/types.rs:184-261) carry rich extensions: ContentRef (Name|File|Url|Inline), ComponentCategory (translated|enhanced|novel), success_criteria, acceptable/unacceptable_tradeoffs, requires_human_oversight, AccessControl, PerformanceRecord, staleness_flags, domain_tags, metadata, former_agents, former_deployments.
- Lineage (types.rs:124-149): parent_ids, generation, created_by (FREEFORM string, e.g. "evolver-{run_id}"), created_at.
- Evaluations (types.rs:483-510): 7 hardcoded dimensions (correctness, completeness, efficiency, style_adherence, downstream_usability, coordination_overhead, blocking_impact); RubricLevel 5-band (failing/below/meets/exceeds/exceptional); FLIP source.
- Functional agents: implicit via task ID prefixes (`.evaluate-*`, `.place-*`, `.assign-*`, `.flip-*`); pinned to claude:haiku per CLAUDE.md.
- Bridge: src/agency/agency_bridge.rs already POSTs evaluations to Agency HTTP API and pulls assignments (partial alignment).
- AssignmentSource tracks Native vs Agency{agency_task_id} (types.rs:686-716).
- Workgraph LACKS: quality, domain_specificity, domain, scope, origin_instance_id, parent_content_hash, composition-rules.csv overlay.
- Workgraph HAS that agency lacks: ContentRef, ComponentCategory, success_criteria, acceptable/unacceptable_tradeoffs, requires_human_oversight, AccessControl, PerformanceRecord, staleness_flags, RubricLevel, IterationConfig, FLIP, structured 7-dim evaluation.

### A.3 Delta map: four cohesive areas

1. **Schema fields** (additive): wg lacks 5 fields agency uses (quality, domain_specificity, domain, scope, origin_instance_id) + lineage `created_by` is freeform vs agency enum. Easy: add as serde-optional with defaults.
2. **Hash inputs** (load-bearing for federation): wg hashes wider inputs; same primitive will get different IDs in the two systems. THE federation primitive: must be settled first, governs all dependent migrations.
3. **CSV roundtrip** (federation surface): wg has YAML-per-file storage and an existing `wg agency import` command (src/commands/agency_import.rs) but unclear if it accepts agency's 12-col starter CSV byte-for-byte.
4. **Functional-agent scope + composition rules** (mechanism delta, not data delta): wg uses task-tag prefixes; agency uses primitive `scope` field + watched composition-rules.csv. Both encode the same intent; choose to adopt agency's mechanism additively without removing wg's task-tag scheme.

Direction (per "deep alignment"): make wg a STRICT SUPERSET of agency. Agency primitives drop in unchanged, wg's extensions persist. Hash-input alignment is the load-bearing primitive; everything else is additive.

## B. Subgraph shape (14 tasks, all paused, all --no-place)

Topology:

```
                     research-agency-schema-delta  ─────► impl-agency-schema-fields ─────► peer-review-agency-schema ─┐
                     research-agency-hash-compat   ─────► impl-agency-hash-compat   ─────► peer-review-agency-hash   ─┤
design-agency-sync ►                                                                                                  ├─► impl-agency-integration ─► synth-agency-end-to-end
                     research-agency-csv-roundtrip ─────► impl-agency-csv-roundtrip ─────► peer-review-agency-csv    ─┤
                     research-agency-scope-rules   ─────► impl-agency-scope-rules   ─────► peer-review-agency-scope  ─┘
```

- 4 research (claude:opus): one per delta area; runs in parallel.
- 4 impl (codex:gpt-5.5): each --after its research; runs in parallel; non-overlapping file scopes specified per task to prevent merge conflicts.
- 4 peer-review (claude:opus): cross-model review, --after each impl; runs in parallel; mirrors design-nex-chat pattern.
- 1 integration (codex:gpt-5.5): --after all four peer-reviews; cross-cutting glue (init, stats, scan, agency_bridge field-mapping, compat-version constant, CLAUDE.md note).
- 1 synthesis (claude:opus): --after integration; end-to-end byte-equality smoke against agency's upstream starter.csv.

Rationale for parallel-vs-serial:

- All four areas are research-able in parallel because they cite distinct files (types.rs vs hash.rs vs agency_import.rs vs prompt.rs).
- Impl phase parallelizes via strict file-scope partition declared in each task description (no two impls write the same file). Hash impl is the trickiest dependency, but its file scope is hash.rs+lineage.rs+migration.rs only, so it doesn't block schema-fields impl which owns types.rs.
- Peer reviews run pairwise with their impl; cross-model verifies the codex:gpt-5.5 implementations against an opus reviewer.
- Integration is single-threaded: by definition glue work no area-impl owns. Synthesis is single-threaded: final verdict.

Per-task model assignments (matches user's "assign codex:gpt-5.5 to everything" + design-nex-chat pattern):

- Research: claude:opus (reasoning-heavy diff work)
- Impl: codex:gpt-5.5 (per user request)
- Peer review: claude:opus (cross-model independence)
- Integration: codex:gpt-5.5
- Synthesis: claude:opus (final verdict + drift assessment)

## C. Filed subgraph (`wg list --status open` filtered to agency,sync)

Research (4):

- research-agency-schema-delta
- research-agency-hash-compat
- research-agency-csv-roundtrip
- research-agency-scope-rules

Impl (4):

- impl-agency-schema-fields (--after research-agency-schema-delta)
- impl-agency-hash-compat (--after research-agency-hash-compat)
- impl-agency-csv-roundtrip (--after research-agency-csv-roundtrip)
- impl-agency-scope-rules (--after research-agency-scope-rules)

Peer review (4):

- peer-review-agency-schema (--after impl-agency-schema-fields)
- peer-review-agency-hash (--after impl-agency-hash-compat)
- peer-review-agency-csv (--after impl-agency-csv-roundtrip)
- peer-review-agency-scope (--after impl-agency-scope-rules)

Integration (1):

- impl-agency-integration (--after peer-review-agency-{schema,hash,csv,scope})

Synthesis (1):

- synth-agency-end-to-end (--after impl-agency-integration)

All filed with --paused --no-place. Each task description includes a ## Validation section, file scope (for impl tasks to prevent overlap), and file:line citations into both src/agency/* and the agency specification.

## D. Ready for publish

Subgraph filed, ready for publish. Chat agent should run `wg publish design-agency-sync --wcc` to release.
2026-05-04T17:47:54.204844871+00:00 Validated: 14 subtasks filed --paused --no-place, deps wired (4 research → 4 impl → 4 peer-review → 1 integration → 1 synthesis), models assigned per pattern (research/peer-review/synth=claude:opus, impl/integration=codex:gpt-5.5), no source/doc edits, design doc posted. Ready for chat agent to publish.
2026-05-04T17:48:28.614924671+00:00 Task pending eval (agent reported done; awaiting `.evaluate-*` to score)
2026-05-04T17:49:57.101176821+00:00 PendingEval → Done (evaluator passed; downstream unblocks)
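
Note on delta area 2 (hash inputs): the design entry above reports that agency derives identity from SHA-256 of `description` alone, while workgraph hashes wider inputs. The sketch below illustrates why the same primitive ends up with different IDs in the two systems. The exact byte serialization used by either implementation is not shown in this log, so the UTF-8 encoding and the NUL-byte join are assumptions, and both function names are hypothetical.

```python
import hashlib


def agency_style_id(description: str) -> str:
    """Identity from the description only (encoding is an assumption)."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def wg_style_component_id(description: str, category: str, content: str) -> str:
    """Stand-in for a wider hash over description + category + content.

    The real serialization lives in src/agency/hash.rs and is not reproduced
    here; joining the parts with NUL bytes is purely illustrative.
    """
    payload = "\x00".join([description, category, content])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


description = "Review a diff for correctness"
translated = wg_style_component_id(description, "translated", "inline rubric v1")
novel = wg_style_component_id(description, "novel", "inline rubric v1")

# Same description: one upstream-style ID, but two distinct wider-hash IDs.
print("agency-style:", agency_style_id(description)[:16])
print("wg translated:", translated[:16])
print("wg novel:", novel[:16])
```

Under these assumptions, two components that differ only in category collapse to a single ID on the agency side and stay distinct on the workgraph side, which is why the hash-input decision has to be settled before any federation work.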