# Paddock Setup Redesign — Handoff Document

**Date:** 2026-04-19
**Branch:** `adjust-page-load-stuff`
**Linear:** UNIVERS-105 (frozen upload screen)
**Diagram:** `docs/diagrams/paddock-setup-flow-v2.html` (open in browser to render)

---

## Problem Statement

The paddock setup flow (`/paddock/new`) has three core problems:

1. **Frozen screen on confirm:** After the student clicks "Confirm" on the classification form, a spinner overlay locks the form for 5-15 seconds while the confirm API runs. The screen doesn't change — it looks frozen. *(Partially addressed in this branch: spinner replaced with a progress phase screen.)*

2. **Enrichment wait with no feedback:** After confirm, the student lands on `CourseFormingDesk` and waits 70-200 seconds with only pulsing stage indicators. No real-time progress. No way to do anything useful.

3. **The confirmation form collects the wrong things:** Domain (freetext, corrupts canonical matching), professor name (corrupts dedup), and format hint (after the parse already ran). The form should be eliminated and replaced with an inline editor.

---

## Current Architecture

### Upload Flow (what exists today)

```
Upload → text extraction → LLM parse (3-tier cascade) → SSE stream to client
→ confirm_required event → full classification form (title, professor, term,
  format, domain, meeting days) → confirm API → workspace created → enrichment
  triggered → redirect to dashboard → CourseFormingDesk (70-200s wait)
```

### Parser Pipeline (3-tier cascade)

| Tier | Method | Trigger | Output |
|------|--------|---------|--------|
| 1 | Regex date extraction | Always runs first | Milestones with real dates |
| 2 | Structural topic extraction (numbered, roman, lettered, module, UPPERCASE) | Tier 1 yields 0 milestones | Topics with synthesized dates |
| 3 | LLM semantic extraction (Claude API) | Tier 2 yields <3 topics | Milestones with LLM-extracted or synthesized dates |

The cascade is robust: it eventually yields milestones for most syllabi. The problem lies in format detection and the UX around it, not in the extraction itself.

### Format Detection (current — broken)

**File:** `apps/web-svelte/src/lib/server/paddock/syllabus-metadata.ts`
**Function:** `detectFormatHint()`

Counts raw lines matching regex patterns:
- `DATE_LINE`: lines starting with `Jan 14`, `1/14`, etc.
- `NUMBERED_LINE`: lines starting with `1.`, `2)`, etc.
- `TOPIC_HEADING`: lines starting with `Topic #1`, `Unit 3`, etc.
- Threshold: **5+ matches** → assign format hint

**Bug:** Counts raw date lines, not date+topic pairs. A syllabus with 5 administrative dates ("Jan 14: Office hours change", "Feb 3: Paper 1 due", "Mar 10: Midterm", "Apr 1: Paper 2 due", "May 5: Final exam") triggers `formatHint = 'dated'` even though it's a topic-based syllabus with scattered admin dates.

**Same bug for numbered:** A syllabus with "1. Course Description, 2. Grading Policy, 3. Attendance, 4. Materials, 5. Topics" triggers `formatHint = 'numbered'` even though only #5 is topical.

**Impact:** Format hint is sent to the client and shown to the student. Wrong hint erodes trust. Also used by the confirm endpoint to decide whether a format correction triggers re-parse.

### Confirmation Form (current — wrong fields)

**File:** `apps/web-svelte/src/lib/components/paddock/CourseClassificationConfirm.svelte`

Collects:
- `courseTitle` — auto-detected, minor impact *(keep as editable in new editor)*
- `professorName` — used for canonical identity dedup. Freetext means students type inconsistently ("Prof. Johnson" vs "Professor Johnson" vs blank), corrupting dedup. *(Remove from student input — dedup should use rawTextHash)*
- `termLabel` — cosmetic display label *(low priority, could keep in editor)*
- `formatHint` — dropdown: dated/numbered/topic-based/unclear. Shown AFTER parse already ran. Changing it doesn't trigger re-parse in the cached-plan path. *(Remove — format is validated implicitly by whether the editor result looks right)*
- `domain` — freetext. Flows into `applyDomainOverride()` and `updateCanonicalIdentity()`. A student typing "MY CRIMLAW CLASS" corrupts the canonical domain that drives scenario matching for all students in that course. *(Remove entirely — domain must be system-inferred only)*
- `meetingDays` — checkboxes. Pre-filled if detected. *(Keep — move to editor, make required when not detected)*

**File:** `apps/web-svelte/src/routes/api/paddock/setup/confirm/+server.ts`

Confirm endpoint does:
1. Lock the draft row
2. Apply classification overrides (title, professor, term, format, domain, meeting days)
3. Check for existing workspace (dedup via rawTextHash)
4. Use cached parsed plan if available (Layer 2 from this branch), or re-parse
5. If meeting days changed from detected: rebuild week plan via `buildSyllabusPlan()`
6. Create workspace with plan + module assignments
7. Save canonical identity
8. Trigger enrichment (POST to `/api/paddock/enrich`)

### Enrichment Pipeline

**File:** `apps/web-svelte/src/routes/api/paddock/enrich/+server.ts`

Runs as a background job (300s Vercel Pro budget). Steps:
1. Concept extraction (Pass 2B) — LLM, 10-40s
2. Topic selection — deterministic, 3-8s
3. Domain classification — LLM, 5-15s
4. Scenario template synthesis — LLM, 15-30s (only for novel domains)
5. Scenario matching — LLM + DB search, 20-60s
6. Scenario persistence — DB, 5-10s
7. Quality gate — validation, 2-5s
8. Final plan write — DB, <1s
9. Run generation trigger — background job
10. Precompute — background job, 8-20s

**Total: 70-200 seconds.** No real-time feedback to client today.
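
As a quick sanity check on the total, the per-step estimates above can be summed. This is only arithmetic over the numbers already listed; the variable and function names are illustrative, step 9 is skipped because no duration is given, and "<1s" is treated as 0-1s.

```typescript
// Per-step estimates copied from the list above, in seconds: [label, min, max].
// Step 9 (run generation trigger) has no stated duration and is left out.
const stepEstimates: ReadonlyArray<readonly [string, number, number]> = [
  ["Concept extraction", 10, 40],
  ["Topic selection", 3, 8],
  ["Domain classification", 5, 15],
  ["Scenario template synthesis", 15, 30], // only runs for novel domains
  ["Scenario matching", 20, 60],
  ["Scenario persistence", 5, 10],
  ["Quality gate", 2, 5],
  ["Final plan write", 0, 1],
  ["Precompute", 8, 20],
];

const BUDGET_SECONDS = 300; // the Vercel Pro budget quoted above

function sumEstimates(steps: ReadonlyArray<readonly [string, number, number]>) {
  let min = 0;
  let max = 0;
  for (const [, lo, hi] of steps) {
    min += lo;
    max += hi;
  }
  // Headroom is measured against the worst-case sum.
  return { min, max, headroom: BUDGET_SECONDS - max };
}

const totals = sumEstimates(stepEstimates);
console.log(totals); // { min: 68, max: 189, headroom: 111 }
```

The sum (68-189s) is consistent with the rounded 70-200s figure and leaves roughly 110s of headroom under the 300s budget in the worst case.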

### Dashboard Desk States

**File:** `apps/web-svelte/src/lib/dashboard/desk-state-resolver.ts`

States: `INTRO_DESK → COURSE_FORMING_DESK → COURSE_READY_TWO_COLUMN_DESK → MATURE_THREE_COLUMN_DESK`

`COURSE_FORMING_DESK` is shown while enrichment is running. *(Fixed in this branch: gated on `courseCount <= 1` so multi-course users don't regress to it.)*

**File:** `apps/web-svelte/src/lib/components/dashboard/desk/CourseFormingDesk.svelte`

Shows 3 pulsing stages: "Syllabus mapped" → "Weeks extracted" → "Practice preparing". *(Enhanced in this branch: `enrichmentStatus` threaded through, failure state handled.)*

---

## What's Already Done (This Branch)

### Commits on `adjust-page-load-stuff`:

1. **SSE progress events during upload** — the setup endpoint now emits `step` events during the parse phase so the student sees "Reading your syllabus..." → "Building your weekly plan..." with module animations.

2. **Parsed plan caching** — the LLM parse result is cached as JSONB in `paddock_setup_drafts.parsed_plan_json`. The confirm endpoint uses the cache instead of re-parsing (eliminates 8-30s duplicate LLM call). Migration 215.

3. **Enrichment status on CourseFormingDesk** — `enrichmentStatus` is threaded from DB through widget queries to the desk component. Failed enrichment shows "Scenarios unavailable" with guidance copy.

4. **Dashboard desk regression fix** — `COURSE_FORMING_DESK` is now gated on `courseCount <= 1`. Multi-course users adding a new forming course stay on their current desk state.

5. **Confirm progress screen** — when student clicks "Confirm", the classification form is replaced with a progress card (cycling step labels) instead of a frozen spinner overlay. *(This is interim — the full redesign below replaces the confirm form entirely.)*

---

## Proposed Redesign

### 1. Eliminate the confirmation form → Replace with side-by-side editor

**Remove from student input:**
- `domain` — system-inferred only, never user-editable. Flows from `_canonicalDomain` in parse.
- `professorName` — dedup via `rawTextHash`, not user-typed name. Store detected name as display label only.
- `formatHint` — implicitly validated by whether the parsed plan looks right. No dropdown.

**Keep in editor:**
- `courseTitle` — editable inline, auto-detected
- `meetingDays` — required when not detected, pre-filled when detected
- `termLabel` — optional, low priority
- The week plan itself — each row is editable (title, date)

**Layout:** Syllabus text on left (readonly, scrollable), parsed week plan on right (editable fields). Same screen for all parse quality levels — the difference is how much is pre-filled.

### 2. Fix format detection

**In `detectFormatHint()` (`syllabus-metadata.ts`):**
- Before counting date lines, filter through `isMetadataDateLine()` and `isAdministrativeLine()` — only count lines where a date is genuinely paired with course topic content.
- Before counting numbered lines, filter through the non-topic header list (Grading, Materials, Prerequisites, Attendance, etc.).
- Lower threshold from 5 to 4 for dated (4 date+topic pairs = one month of weekly dated topics, enough to confirm).
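
A minimal sketch of the filtered counting, to make the intended behavior concrete. The regexes and the `isAdministrativeLine()` body below are hypothetical stand-ins (the real helpers live in the codebase and their rules are not shown here), the function is named `detectFormatHintSketch` to avoid implying it is the real implementation, and the `TOPIC_HEADING` path is omitted for brevity.

```typescript
type FormatHint = "dated" | "numbered" | "topic-based" | "unclear";

// Hypothetical patterns, for illustration only.
const DATE_LINE =
  /^\s*(?:(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?\s+\d{1,2}|\d{1,2}\/\d{1,2})\b/i;
const NUMBERED_LINE = /^\s*\d+[.)]\s+(.*)$/;
const ADMIN_WORDS = /\b(office hours|due|midterm|final exam|quiz|holiday|no class)\b/i;
const NON_TOPIC_HEADERS = /^(course description|grading|materials|prerequisites|attendance)\b/i;

// Stand-in for the codebase's isAdministrativeLine(); the real rules may differ.
function isAdministrativeLine(line: string): boolean {
  return ADMIN_WORDS.test(line);
}

function detectFormatHintSketch(text: string): FormatHint {
  const lines = text.split(/\r?\n/);

  // Count only date lines that are not administrative (date + topic pairs).
  const datedTopics = lines.filter(
    (line) => DATE_LINE.test(line) && !isAdministrativeLine(line),
  );

  // Count only numbered lines whose heading is not a known non-topic header.
  const numberedTopics = lines.filter((line) => {
    const match = NUMBERED_LINE.exec(line);
    return match !== null && !NON_TOPIC_HEADERS.test(match[1] ?? "");
  });

  if (datedTopics.length >= 4) return "dated"; // lowered threshold
  if (numberedTopics.length >= 5) return "numbered"; // unchanged threshold
  return "unclear";
}
```

With this filtering, the two failure cases described under "Format Detection" (five administrative dates, five numbered section headers) no longer reach either threshold.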

### 3. Topic distribution algorithm (for topic-based and partial syllabi)

**Inputs:**
- N = number of topics extracted
- M = meetings per week (1, 2, or 3) — from student's meeting days selection
- W = semester weeks (default 14, derivable from term label)

**Formula:**
```
total_meetings = W × M
ratio = total_meetings / N

if ratio ≈ 1 (0.8–1.5):  → 1:1 mapping, leftover = flex/review
if ratio > 1.5:           → Multi-session: LLM assigns weights (heavy topics get more meetings)
if ratio < 0.8:           → Grouped: LLM combines related topics into shared meetings
```
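
The branch selection in the formula could be sketched as follows. Only the deterministic part is shown; the LLM weighting and grouping steps are out of scope, and all names here (`chooseStrategy`, `DistributionInput`, the strategy labels) are illustrative rather than taken from the codebase.

```typescript
type DistributionStrategy = "one-to-one" | "multi-session" | "grouped";

interface DistributionInput {
  topicCount: number; // N
  meetingsPerWeek: number; // M
  semesterWeeks?: number; // W, defaults to 14
}

function chooseStrategy({
  topicCount,
  meetingsPerWeek,
  semesterWeeks = 14,
}: DistributionInput) {
  if (topicCount <= 0) throw new RangeError("topicCount must be positive");

  const totalMeetings = semesterWeeks * meetingsPerWeek;
  const ratio = totalMeetings / topicCount;

  let strategy: DistributionStrategy;
  if (ratio > 1.5) strategy = "multi-session";
  else if (ratio < 0.8) strategy = "grouped";
  else strategy = "one-to-one";

  // Leftover meetings become flex/review slots in the 1:1 case.
  const flexMeetings =
    strategy === "one-to-one" ? Math.max(0, totalMeetings - topicCount) : 0;

  return { totalMeetings, ratio, strategy, flexMeetings };
}
```

For example, 24 topics with two meetings per week over 14 weeks gives 28 meetings, a ratio of about 1.17, a 1:1 mapping, and 4 flex meetings. Note that for ratios between 0.8 and 1 there are more topics than meetings; the formula does not say how that overflow is handled, so the sketch simply reports zero flex meetings.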

**For partially temporal syllabi:**
- Parse the structured part (real dates/weeks) as-is
- Count remaining weeks: `W_remaining = W - structured_weeks`
- Distribute floating topics across remaining weeks using the same formula
- Mark boundary in editor: "● from syllabus" vs "○ synthesized"
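
A rough sketch of the partially temporal case, under two simplifying assumptions that are not from the source: the structured part occupies the first weeks of the semester, and floating topics are split evenly and in order across the remaining weeks (standing in for the ratio-based strategies above). The `PlanRow` shape and function name are hypothetical.

```typescript
interface PlanRow {
  week: number;
  source: "syllabus" | "synthesized"; // drives the ● / ○ marker in the editor
  topics: string[];
}

function planPartialSyllabus(
  structuredTopics: string[], // one topic per structured week, kept as-is
  floatingTopics: string[],
  semesterWeeks = 14,
): PlanRow[] {
  const rows: PlanRow[] = structuredTopics.map(
    (topic, i): PlanRow => ({ week: i + 1, source: "syllabus", topics: [topic] }),
  );

  // W_remaining = W - structured_weeks
  const remaining = semesterWeeks - structuredTopics.length;
  if (remaining <= 0 || floatingTopics.length === 0) return rows;

  for (let i = 0; i < remaining; i++) {
    // Even, order-preserving split; a week may get zero topics (flex) or several (grouped).
    const start = Math.floor((i * floatingTopics.length) / remaining);
    const end = Math.floor(((i + 1) * floatingTopics.length) / remaining);
    rows.push({
      week: structuredTopics.length + i + 1,
      source: "synthesized",
      topics: floatingTopics.slice(start, end),
    });
  }
  return rows;
}
```

The `source` field is what lets the editor mark the boundary between rows taken from the syllabus and rows that were synthesized.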

### 4. Redirect to course workspace, not dashboard

After workspace creation, redirect to `/paddock/[courseId]` instead of `/dashboard`. The week plan is visible immediately. Enrichment results stream in via SSE.

### 5. Real-time enrichment via SSE on workspace

**New endpoint:** `GET /api/paddock/enrich-status/[courseId]` — SSE stream.

The enrichment pipeline already updates `enrichment_status` in the DB at each stage. The new SSE endpoint:
1. Checks current status on connect (send initial state)
2. Polls DB every 3-5s (or uses pg NOTIFY for true push)
3. Emits events as stages complete: `concepts_done`, `domain_done`, `scenarios_done`, `generation_done`, `precompute_done`
4. Client updates workspace UI in place — scenarios appear, "Your course is ready" banner activates

Follows the same SSE pattern already used in the upload streaming phase.
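
The core of such an endpoint could look like the sketch below: one helper that formats a frame in the standard `text/event-stream` wire format, and one that works out which stage events still need to be sent after each poll. It assumes (hypothetically) that the polled `enrichment_status` value can be mapped to the most recently completed stage; the actual status values are not shown in this document.

```typescript
// Stage events in pipeline order, as listed above.
const STAGE_EVENTS = [
  "concepts_done",
  "domain_done",
  "scenarios_done",
  "generation_done",
  "precompute_done",
] as const;

type StageEvent = (typeof STAGE_EVENTS)[number];

// Formats one Server-Sent Events frame: an event name, a JSON data line, a blank line.
function formatSseEvent(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Returns the stage events completed since the last one sent.
// Passing lastSent = null replays everything up to `current`, which covers
// the "send initial state on connect" step.
function newEventsSince(
  lastSent: StageEvent | null,
  current: StageEvent | null,
): StageEvent[] {
  const from = lastSent === null ? 0 : STAGE_EVENTS.indexOf(lastSent) + 1;
  const to = current === null ? 0 : STAGE_EVENTS.indexOf(current) + 1;
  return STAGE_EVENTS.slice(from, to);
}
```

In the `+server.ts` handler, each poll would call `newEventsSince()`, encode the resulting frames, and enqueue them on a `ReadableStream` returned in a `Response` with `Content-Type: text/event-stream`. Because skipped stages are replayed in order, a poll that jumps two stages still delivers both events to the client.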

### 6. Confidence-based editor behavior

| Confidence | Trigger | Editor behavior |
|---|---|---|
| High (dated, 8+ milestones) | 4+ date+topic pairs, sequential dates | Pre-filled, minimal editing. Meeting days pre-filled. |
| High (numbered, 8+) | 4+ week-labeled milestones | Pre-filled with week numbers. Dates synthesized. Meeting days may need input. |
| Medium (topic-based, 8+) | 8+ topics, no temporal structure | Topics pre-filled. Meeting days REQUIRED. Dates synthesized after days provided. |
| Medium (numbered/dated, 4-7) | Fewer milestones but clear structure | Pre-filled. Subtle note: "We found a shorter schedule than expected." |
| Low (ambiguous, 4-7) | No structure, few items | Show what we found. Note: "We found fewer topics than expected. Is this the full course?" |
| Very low (<4 milestones) | Parse mostly failed | Editor shows best attempt. Student corrects heavily or adds weeks manually from syllabus text. |
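
As one possible reading of the table, the confidence level could be derived from the detected format and the milestone count alone. This sketch ignores the finer trigger conditions (sequential dates, number of date+topic pairs), and it treats the combinations the table does not cover (for example, unclear format with 8+ items, or topic-based with 4-7) as low confidence, which is an assumption. Type and function names are illustrative.

```typescript
type ParseFormat = "dated" | "numbered" | "topic-based" | "unclear";
type Confidence = "high" | "medium" | "low" | "very-low";

function classifyConfidence(format: ParseFormat, milestoneCount: number): Confidence {
  // Parse mostly failed, regardless of format.
  if (milestoneCount < 4) return "very-low";

  const structured = format === "dated" || format === "numbered";

  if (milestoneCount >= 8) {
    if (structured) return "high";
    if (format === "topic-based") return "medium";
    return "low"; // assumption: not covered by the table
  }

  // 4-7 milestones: clear structure is medium, everything else is low.
  return structured ? "medium" : "low";
}
```

The editor would then branch on the returned level to decide how much to pre-fill and which note, if any, to show.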

---

## Files to Modify

### Frontend (existing files)

| File | Change |
|------|--------|
| `apps/web-svelte/src/lib/components/paddock/PaddockQuickSetup.svelte` | Replace confirm phase with editor phase. Remove `CourseClassificationConfirm` import. Add side-by-side editor layout. |
| `apps/web-svelte/src/lib/components/paddock/CourseClassificationConfirm.svelte` | **Delete** — replaced by editor. |
| `apps/web-svelte/src/routes/paddock/syllabus-api.ts` | Update types — remove `classification` fields that are eliminated (domain, professorName from student input). |

### Frontend (new files)

| File | Purpose |
|------|---------|
| `apps/web-svelte/src/lib/components/paddock/SyllabusEditor.svelte` | The side-by-side editor component. Left: syllabus text. Right: editable week plan + meeting days. |
| `apps/web-svelte/src/lib/components/paddock/WeekPlanEditor.svelte` | The right panel — editable week rows with title, date, add/remove. |

### Backend

| File | Change |
|------|--------|
| `apps/web-svelte/src/lib/server/paddock/syllabus-metadata.ts` | Fix `detectFormatHint()`: filter admin/metadata lines before counting. Lower dated threshold to 4. |
| `apps/web-svelte/src/routes/api/paddock/setup/confirm/+server.ts` | Remove `domain` from classification schema. Remove `professorName` from classification (keep auto-detected). Keep `meetingDays`, `courseTitle`, `termLabel`. Accept edited week plan from editor. |
| `apps/web-svelte/src/routes/api/paddock/enrich-status/[courseId]/+server.ts` | **New** — SSE endpoint for real-time enrichment status. |
| `apps/web-svelte/src/routes/paddock/[courseId]/+page.svelte` | Subscribe to enrichment SSE. Show scenarios appearing in real-time. |

### Shared

| File | Change |
|------|--------|
| `apps/web-svelte/src/lib/dashboard/desk-types.ts` | No change needed — `enrichmentStatus` already threaded. |
| `apps/web-svelte/src/lib/server/paddock/topic-distribution.ts` | **New** — the meeting-frequency-aware topic distribution algorithm. |

---

## Open Questions

1. **Semester length inference:** Default to 14 weeks. Derive from term label if possible ("Fall 2026" → Aug 19 to Nov 21 → 14 weeks). Should we ever ask the student explicitly, or just default + let them add/remove weeks in the editor?

2. **Manual entry fallback:** When parse completely fails, the editor is mostly empty. Should we build a "manual setup" mode (student types topics one by one), or is the empty editor with syllabus text on the left sufficient?

3. **Editing after creation:** The workspace already has a plan editor. Should the setup editor and the workspace editor share the same component, or are they different enough to warrant separate implementations?

4. **SSE vs polling for enrichment:** SSE is more elegant but requires a long-lived connection. Client-side polling every 5s is simpler and more resilient to network interruptions. The upload stream already uses SSE, so the pattern exists. Which should the workspace use?

5. **Starter scenario while waiting:** Surface a foundational practice scenario from the detected domain in the workspace while enrichment runs. Good for engagement but adds complexity. Ship after core flow is stable?
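
For question 1, the week count in the "Fall 2026" example can be checked with simple date arithmetic. The sketch below only covers the counting step; mapping a term label to start and end dates would need a lookup that is not defined in this document, and the function name is illustrative.

```typescript
// Counts semester weeks between two ISO dates (YYYY-MM-DD), rounding a partial
// final week up. Dates are parsed as UTC to avoid daylight-saving offsets.
function weeksBetween(startIso: string, endIso: string): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  const start = Date.parse(`${startIso}T00:00:00Z`);
  const end = Date.parse(`${endIso}T00:00:00Z`);
  if (Number.isNaN(start) || Number.isNaN(end) || end < start) {
    throw new RangeError("invalid date range");
  }
  const days = Math.round((end - start) / msPerDay);
  return Math.ceil(days / 7);
}

// Aug 19 to Nov 21, 2026 is 94 days, which rounds up to 14 weeks.
console.log(weeksBetween("2026-08-19", "2026-11-21")); // 14
```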

---

## Render the Diagram

Open this file in any browser to see all diagrams rendered:

```
docs/diagrams/paddock-setup-flow-v2.html
```

Previous version (v1, before editor decision):

```
docs/diagrams/paddock-setup-flow.html
```
