re-inspect-pii

Re-inspect: PII in WorkGraph PUBLISH SURFACE only (not raw graph)

Metadata

Status: done
Assigned: agent-226
Agent identity: 3184716484e6f0ea08bb13539daf07686ee79d440505f1fdf2de0357707034c3
Created: 2026-05-02T01:32:19.624668945+00:00
Started: 2026-05-02T01:32:49.315843144+00:00
Completed: 2026-05-02T01:41:51.473421783+00:00
Tags: grant, urgent, trace-inspection, landing-page, eval-scheduled
Eval score: 0.88
└ blocking impact: 0.88
└ completeness: 0.88
└ constraint fidelity: 0.70
└ coordination overhead: 0.90
└ correctness: 0.88
└ downstream usability: 0.85
└ efficiency: 0.85
└ intent fidelity: 0.78
└ style adherence: 0.93

Description

The previous inspection (~/poietic.life/notes/incorporation-trace-inspection-20260501.md) flagged PII in 67+ files of the unpacked incorporation trace and recommended 4-8 hours of redaction. That estimate assumed publishing the entire raw .workgraph directory. Erik (correctly) pointed out that this is not what 'publish' looks like: WorkGraph publishes an HTML view (task graph viewer + task pages + log entries + artifacts), not the full raw state with chat-history JSONL and per-agent message stores.

This task RE-SCOPES the PII analysis to the actual publish surface, and re-estimates redaction burden.

What to do

1. Identify the publish surface

Inspect the WorkGraph codebase / docs to determine what wg publish (or equivalent HTML export) actually exposes from a trace. Likely candidates to confirm:

  • Task titles and descriptions
  • Task status, dependency edges, agent assignments
  • wg log entries on each task
  • Artifacts registered by tasks
  • Maybe: select agent log excerpts
  • Probably NOT: raw chat-history-*.jsonl, raw messages/*.jsonl per-agent stores, raw worktree contents

If unsure, lean on the WorkGraph repo at github.com/graphwork/workgraph (use WebFetch) or the local install. Be explicit about what's published vs what stays local.

2. Re-map the PII findings to the publish surface

The previous inspection flagged these as PII-bearing:

  • messages/record-founder-personal.jsonl
  • chat-history-5.jsonl
  • log/agents/record-founder-personal/
  • Plus 'agents/ and log/ directories generally' for SSNs, addresses, phones across all 3 founders

For each, determine: does this content actually appear in the publish surface? If messages/*.jsonl is local-only and not part of the HTML export, then record-founder-personal.jsonl is moot for publication purposes.
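One way to answer "does this content actually appear in the publish surface?" without reading PII by eye is to trace identifier-shaped strings from a flagged raw file into the rendered pages. The sketch below is a minimal illustration under stated assumptions: the function name and the `rendered_pages` mapping (page path to page text) are hypothetical, and only SSN-shaped strings are traced. It reports paths and counts, never the matched values.

```python
import re

# Assumption: SSN-shaped strings serve as the tracer for flagged content.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def leaked_match_count(raw_text: str, rendered_pages: dict[str, str]) -> dict[str, int]:
    """Map each rendered page path to the number of distinct SSN-shaped
    strings from raw_text that also occur verbatim in that page."""
    raw_matches = set(SSN_RE.findall(raw_text))
    result = {}
    for path, page_text in rendered_pages.items():
        count = sum(1 for match in raw_matches if match in page_text)
        if count:
            # Record path and count only; matched values are never returned.
            result[path] = count
    return result
```

An empty result for a flagged raw file suggests its content does not reach the rendered pages, at least for this one pattern; it does not rule out addresses or other free-form PII.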

3. Re-scan the trace for PII that DOES land in the publish surface

Specifically search task descriptions, wg log entries, and artifact contents (not raw chat history) for:

  • SSNs (regex: \b\d{3}-\d{2}-\d{4}\b, or an unhyphenated 9-digit number in an SSN-related context)
  • Founder home addresses (street + city + state patterns)
  • Phone numbers
  • Bank account numbers, EIN beyond what's already public, anything else sensitive

If you find PII in these places, list the specific task IDs / log entries.
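The pattern search described above could be sketched as follows. This is a minimal example, not the scan that was actually run: the EIN and phone patterns are assumptions (US formats only), and home addresses are left out because they need known fragments rather than a generic regex. The function returns counts per PII type, which keeps matched values out of any report.

```python
import re
from collections import Counter

# Assumed patterns; only the SSN regex comes from the task description.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ein": re.compile(r"\b\d{2}-\d{7}\b"),
    "us_phone": re.compile(
        r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
    ),
}


def count_pii(text: str) -> Counter:
    """Count matches per PII type in text, without returning the matches."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        n = len(pattern.findall(text))
        if n:
            counts[name] = n
    return counts
```

Running `count_pii` over each task description, log entry, and artifact body, keyed by task ID, would give the (PII type, location, count) rows directly.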

4. Re-estimate redaction burden

Given the actual publish surface and actual PII hits, re-estimate:

  • Number of items to redact
  • Wall-clock effort (likely 30 min - 2 hr if scoped to publish surface only)
  • Whether founder sign-off is still needed (probably yes, but lighter)

5. Recommend disposition

Three options reframed:

  • Option A: Publish HTML view as-is if no PII actually lands in the publish surface
  • Option B: Targeted scrub if a small handful of items need redaction (specify exactly which)
  • Option C: Defer if even the targeted scrub would take more than the time Erik has

What to read

  • incorporation-trace/.workgraph/ (the unpacked trace)
  • ~/poietic.life/notes/incorporation-trace-inspection-20260501.md (previous inspection)
  • WorkGraph publish/export logic (locally installed wg, or github.com/graphwork/workgraph via WebFetch)

Output

Write ~/poietic.life/notes/incorporation-trace-pii-rescope-20260501.md (under 1000 words):

  1. Headline (one paragraph): how much smaller is the actual redaction burden vs the original estimate?
  2. Publish surface — what wg publish actually exposes (with evidence: code paths, doc references, or empirical exports)
  3. PII in publish surface — table of (PII type, location in publish surface, count). If empty, say so.
  4. Revised redaction burden — minutes-to-hours estimate with concrete steps
  5. Recommendation — Option A / B / C with one-paragraph justification

wg log a one-paragraph summary on this task.

Constraints

  • Read-only inspection.
  • Do NOT quote any actual PII in the output (file paths and counts only).
  • No em-dashes.
  • Under 1000 words.
  • Be honest if the publish surface DOES carry PII the previous inspection caught — don't sandbag to make the answer Erik wants to hear.
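The mechanical constraints in this list (word limit, no em-dashes, no quoted PII) can be checked on the finished report before it is logged. The sketch below is a hypothetical helper, not part of WorkGraph; it treats "words" as whitespace-separated tokens and checks only for SSN-shaped strings, so it is a floor rather than a guarantee.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EM_DASH = "\u2014"


def check_report(text: str, max_words: int = 1000) -> list[str]:
    """Return a list of constraint violations; an empty list means none found."""
    problems = []
    words = len(text.split())  # whitespace-separated tokens
    if words >= max_words:
        problems.append(f"word count {words} is not under {max_words}")
    if EM_DASH in text:
        problems.append("contains an em-dash")
    if SSN_RE.search(text):
        problems.append("contains an SSN-shaped string")
    return problems
```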

Validation

  • Publish surface identified with evidence
  • PII findings re-mapped to publish surface only
  • Targeted PII rescan done
  • Revised burden estimate is concrete (minutes-to-hours)
  • Disposition recommendation explicit
  • Output at ~/poietic.life/notes/incorporation-trace-pii-rescope-YYYYMMDD.md

Depends on

Required by

Messages 2 messages (2 unread)

  1. #1 user 2026-05-02T01:34:22.764969156+00:00 read
    ERIK CLARIFICATION (do this empirically, not by reasoning): Run `wg html` (or whatever the actual command is for HTML export — check `wg --help`) AGAINST the unpacked trace at `incorporation-trace/.workgraph/`. The output of that command IS the publish surface — the actual files a public visitor would see. Inspect those rendered HTML files for PII (SSNs, addresses, phones). Don't reason about what 'should' or 'shouldn't' be in the publish surface — just generate it and grep it.
    
    Concrete steps:
    1. `cd incorporation-trace && wg html` (or equivalent — figure out the right invocation; might be `wg html --output ./html` or similar)
    2. Note the output directory
    3. `grep -rE '\b[0-9]{3}-[0-9]{2}-[0-9]{4}\b' <output-dir>` for SSNs
    4. Search for known founder address fragments, phone numbers
    5. List PII hits with file paths
    6. Estimate scrub effort based on the actual hit count
    
    Don't scrub the raw `.workgraph` — that's local state, not publish surface. Scrub the rendered HTML if anything (or re-render after fixing source task descriptions). The recommendation should be based on what the HTML actually contains, not what we think it would contain.
  2. #2 re-inspect-pii 2026-05-02T01:41:33.130268082+00:00 read
    Acknowledged — re-ran the analysis empirically per your instructions. Rendered wg html against incorporation-trace/.workgraph/ produced 494 HTML files (5.1MB). Grep for SSN/address/EIN/phone patterns: PII appears in exactly 3 of 494 files (index.html plus the 2 record-founder-personal task pages). Phone numbers: zero in publish surface. Then ran a regex source-scrub of graph.jsonl and messages/record-founder-personal.jsonl on a copy of the trace and re-rendered: zero PII hits in the 494-page output. Total wall-clock for scrub-and-rerender was ~5 minutes; full burden including founder sign-off on the 3-record diff is 30-60 minutes. Updated report at ~/poietic.life/notes/incorporation-trace-pii-rescope-20260501.md (776 words). Recommendation: Option B (source scrub then re-render), validated end-to-end.
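The scrub-then-verify loop described in message #2 can be sketched roughly as below. This is an illustration under assumptions, not the script that was run: `scrub_file`, `files_with_hits`, and the `[REDACTED-SSN]` placeholder are hypothetical, only SSN-shaped strings are handled, and the re-render step between the two calls is left to the actual export command. The scrub should be applied to a copy of the trace, as in the message.

```python
import re
from pathlib import Path

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
PLACEHOLDER = "[REDACTED-SSN]"  # hypothetical marker, not a WorkGraph convention


def scrub_file(path: Path) -> int:
    """Replace SSN-shaped strings in one file in place; return how many were replaced."""
    text = path.read_text(encoding="utf-8")
    scrubbed, n = SSN_RE.subn(PLACEHOLDER, text)
    if n:
        path.write_text(scrubbed, encoding="utf-8")
    return n


def files_with_hits(root: Path, glob: str = "*.html") -> dict[str, int]:
    """Return {relative path: hit count} for files under root that still match."""
    hits = {}
    for p in sorted(root.rglob(glob)):
        text = p.read_text(encoding="utf-8", errors="replace")
        n = len(SSN_RE.findall(text))
        if n:
            hits[str(p.relative_to(root))] = n
    return hits
```

A typical sequence would be `scrub_file` on the copied source files, a re-render, and then `files_with_hits` on the new output directory; an empty dict corresponds to the "zero PII hits" result for this pattern.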

Log