# Drill Generation Pipeline

This document describes the current state of the Paddock drill generation system as of April 2026. It is written as a reference before expanding to balanced run variety (issue-spotting + argument-attack).

## Overview

When a student uploads a syllabus, the Paddock generates one **elements drill** per qualifying module and stores it in `paddock_generated_runs`. Drills are surfaced on the course page alongside the module they belong to.

## What Gets Generated

- **Type:** `drill` only (`run_type = 'drill'`)
- **Count:** One per qualifying module (no minimum, no explicit cap)
- **Qualifying condition:** Module must have `inferredDomain` set (all modules with a recognized legal domain)
- **Quality gate:** `isGenericContent()` rejects content containing placeholder markers, rules under 50 chars, generic element names (`Element 1`, `Element 2`, ...), fact patterns under 100 chars, or parties named only "the plaintiff" / "the defendant"

## Generation Entry Points

Three API routes trigger the same pipeline:

| Route | When |
|---|---|
| `POST /api/paddock/enrich` | Primary — called after syllabus upload enrichment |
| `POST /api/paddock/setup` | Re-trigger for existing-plan dedup path |
| `POST /api/paddock/courses` | Manual course save |

All three apply the same filter and call `triggerRunGeneration()`.

## Pipeline Flow

```
Upload → enrichment → computeSyllabusFingerprint()
                    → syllabusHasGeneratedRuns(fingerprint)  ← idempotency check
                    → triggerRunGeneration({ courseId, modules, fingerprint })
                         → Promise.allSettled(modules.map(mod =>
                             POST /api/paddock/generate-runs/[courseId]/[mod.index]
                           ))
                    → scheduleBackground(runPrecomputeForCourse())
```

## Module Filtering

```typescript
const modulesForGeneration = plan.modules
  .map((m, originalIndex) => ({ m, originalIndex }))
  .filter(({ m }) => m.inferredDomain)          // skip modules without a domain
  .map(({ m, originalIndex }) => ({
    index: originalIndex,                         // 0-based position in plan
    title: m.title,
    inferredDomain: m.inferredDomain!,
    topicSignals: m.topicSignals ?? [],
    chipFocus: m.chipFocus ?? []
  }));
```

No slice or cap — all qualifying modules are dispatched.
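
The filter above can be exercised in isolation. What follows is a minimal, self-contained sketch: the `PlanModule` and `GenerationModule` interfaces, the `selectModulesForGeneration` wrapper, and the sample modules are illustrative assumptions, not names taken from the codebase. It shows that `index` keeps the module's position in the unfiltered plan, so skipped modules leave gaps in the dispatched indexes.

```typescript
// Hypothetical shape of a plan module: only the fields the filter reads.
interface PlanModule {
  title: string;
  inferredDomain?: string;
  topicSignals?: string[];
  chipFocus?: string[];
}

// Hypothetical shape of what is handed to the generation fanout.
interface GenerationModule {
  index: number;
  title: string;
  inferredDomain: string;
  topicSignals: string[];
  chipFocus: string[];
}

function selectModulesForGeneration(modules: PlanModule[]): GenerationModule[] {
  return modules
    .map((m, originalIndex) => ({ m, originalIndex }))
    .filter(({ m }) => Boolean(m.inferredDomain)) // skip modules without a domain
    .map(({ m, originalIndex }) => ({
      index: originalIndex, // position in the unfiltered plan, not in the filtered list
      title: m.title,
      inferredDomain: m.inferredDomain!,
      topicSignals: m.topicSignals ?? [],
      chipFocus: m.chipFocus ?? [],
    }));
}

// Sample data, invented for illustration.
const sampleModules: PlanModule[] = [
  { title: "Regulatory Frameworks", inferredDomain: "compliance & regulatory" },
  { title: "Course Introduction" }, // no inferredDomain, so it is skipped
  { title: "Offer and Acceptance", inferredDomain: "contracts", topicSignals: ["mailbox rule"] },
];

const selected = selectModulesForGeneration(sampleModules);
console.log(selected.map((m) => m.index)); // indexes 0 and 2; index 1 is skipped
```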

## Per-Module Generation (`generate-runs/[courseId]/[moduleIndex]`)

Each module gets its own Vercel function invocation:

1. **Idempotency check** — if a run already exists for `(fingerprint, moduleIndex, 'drill')`, return cached
2. **Generate** — `generateDrillContent({ domain, moduleTitle, topicSignals, chipFocus })`
3. **Quality gate** — `isGenericContent()` rejects and returns 502 if content is generic
4. **Persist** — `insertGeneratedRun()` upserts to `paddock_generated_runs`
5. **Async enrichment** — `enrichDrillWithEmbeddings()` adds Jina v3 vectors to each keyFact phrase (runs in background via `scheduleBackground`)
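
The five steps above can be sketched as a single handler with its dependencies injected, so the control flow is visible without a database or LLM. Everything here is an assumption made for illustration: the `ModuleDeps` interface, the `handleModule` name, the result shape, and the `200` status on success (the source only states the `502` on a quality-gate rejection).

```typescript
type RunType = "drill" | "issue-spotting" | "argument-attack";

interface StoredRun {
  contentId: string;
  content: unknown;
}

// Hypothetical dependency bundle standing in for the real DB, LLM, and background helpers.
interface ModuleDeps {
  findExisting(fingerprint: string, moduleIndex: number, runType: RunType): Promise<StoredRun | null>;
  generate(): Promise<unknown>;
  isGeneric(content: unknown): boolean;
  persist(content: unknown): Promise<StoredRun>;
  enrichInBackground(run: StoredRun): void;
}

interface ModuleResult {
  status: number;
  cached: boolean;
  run?: StoredRun;
}

async function handleModule(
  fingerprint: string,
  moduleIndex: number,
  deps: ModuleDeps,
): Promise<ModuleResult> {
  // 1. Idempotency check: reuse an existing drill for this (fingerprint, moduleIndex).
  const existing = await deps.findExisting(fingerprint, moduleIndex, "drill");
  if (existing) return { status: 200, cached: true, run: existing };

  // 2. Generate new content.
  const content = await deps.generate();

  // 3. Quality gate: generic content is rejected and nothing is stored.
  if (deps.isGeneric(content)) return { status: 502, cached: false };

  // 4. Persist the accepted content.
  const run = await deps.persist(content);

  // 5. Kick off enrichment without awaiting it.
  deps.enrichInBackground(run);

  return { status: 200, cached: false, run };
}
```

Because enrichment is not awaited, a drill in this sketch is returned as soon as it is persisted, and the embedding step cannot delay or fail the response.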

- **Budget:** 60s per module (`export const config = { maxDuration: 60 }`)
- **Parallelism:** All modules run concurrently via `Promise.allSettled`; individual failures don't block other modules
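
To illustrate why one failing module does not block the others, here is a small sketch of a `Promise.allSettled` fanout. The `fanOut` and `summarizeSettled` helpers, the `ModuleRef` type, and the summary shape are hypothetical; the real pipeline issues HTTP requests per module, which this sketch replaces with an injected `generateOne` callback.

```typescript
interface ModuleRef {
  index: number;
  title: string;
}

interface FanoutSummary {
  succeeded: number[];
  failed: Array<{ index: number; reason: string }>;
}

// Pair each settled result with the module it belongs to (results keep input order).
function summarizeSettled(
  modules: ModuleRef[],
  results: PromiseSettledResult<unknown>[],
): FanoutSummary {
  const summary: FanoutSummary = { succeeded: [], failed: [] };
  modules.forEach((mod, i) => {
    const result = results[i];
    if (result === undefined) return;
    if (result.status === "fulfilled") {
      summary.succeeded.push(mod.index);
    } else {
      summary.failed.push({ index: mod.index, reason: String(result.reason) });
    }
  });
  return summary;
}

// Run all modules concurrently; allSettled never rejects, so every module gets a result.
async function fanOut(
  modules: ModuleRef[],
  generateOne: (mod: ModuleRef) => Promise<unknown>,
): Promise<FanoutSummary> {
  const results = await Promise.allSettled(modules.map((mod) => generateOne(mod)));
  return summarizeSettled(modules, results);
}
```

With `Promise.all` the first rejection would reject the whole batch; `Promise.allSettled` waits for every module and reports each outcome separately, which is what makes a per-module summary like this possible.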

## Rate Limiting

- **3 batches per user per hour** (`paddock_run_generation` action, 3,600,000 ms window)
- Rate limit applies at the pipeline level (per-syllabus), not per-module
- A 16-module course counts as 1 batch, not 16
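
A batch-level limit of this kind can be sketched with an in-memory sliding window. This is an assumption-heavy illustration: the source does not say whether the real limiter uses a sliding or fixed window or where it stores state, and `BatchRateLimiter` and `tryConsume` are hypothetical names. Only the limit (3) and the window (3,600,000 ms) come from the text above.

```typescript
const WINDOW_MS = 3_600_000; // one hour, as stated above
const MAX_BATCHES_PER_WINDOW = 3;

// Hypothetical in-memory limiter keyed by user; a real one would use shared storage.
class BatchRateLimiter {
  private readonly history = new Map<string, number[]>();
  private readonly maxBatches: number;
  private readonly windowMs: number;

  constructor(maxBatches: number = MAX_BATCHES_PER_WINDOW, windowMs: number = WINDOW_MS) {
    this.maxBatches = maxBatches;
    this.windowMs = windowMs;
  }

  // Returns true and records the batch if the user is under the limit at time `now` (ms).
  tryConsume(userId: string, now: number): boolean {
    const recent = (this.history.get(userId) ?? []).filter((t) => now - t < this.windowMs);
    if (recent.length >= this.maxBatches) {
      this.history.set(userId, recent);
      return false;
    }
    recent.push(now);
    this.history.set(userId, recent);
    return true;
  }
}

// One call per pipeline run: a 16-module course consumes a single batch.
const limiter = new BatchRateLimiter();
const allowed = limiter.tryConsume("user-1", Date.now());
console.log(allowed); // true for the first batch
```

Passing `now` explicitly keeps the limiter deterministic in tests; the caller decides which clock to use.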

## Idempotency

Two layers:
1. **Pipeline level** — `syllabusHasGeneratedRuns(fingerprint)` skips the entire fanout if *any* run exists for this fingerprint
2. **Module level** — `getGeneratedRunForModule(fingerprint, moduleIndex, runType)` skips individual modules already generated

⚠️ **Known gap:** If some modules fail on first generation and others succeed, the pipeline-level check (`syllabusHasGeneratedRuns`) prevents the failed modules from being retried on subsequent triggers.
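
The gap can be reproduced with a small in-memory model. The sketch below is not the real implementation: `StoredRunRef`, `pipelineWouldDispatch`, and `modulesMissingRuns` are hypothetical stand-ins that mimic the two checks described above, under the assumption that the pipeline-level check only asks whether any run exists for the fingerprint.

```typescript
interface StoredRunRef {
  fingerprint: string;
  moduleIndex: number;
}

// Mimics the pipeline-level check: any existing run for the fingerprint skips the whole fanout.
function pipelineWouldDispatch(
  fingerprint: string,
  moduleIndexes: number[],
  stored: StoredRunRef[],
): number[] {
  const anyRunExists = stored.some((r) => r.fingerprint === fingerprint);
  return anyRunExists ? [] : moduleIndexes;
}

// Mimics the module-level check on its own: only modules without a stored run remain.
function modulesMissingRuns(
  fingerprint: string,
  moduleIndexes: number[],
  stored: StoredRunRef[],
): number[] {
  const done = new Set(
    stored.filter((r) => r.fingerprint === fingerprint).map((r) => r.moduleIndex),
  );
  return moduleIndexes.filter((i) => !done.has(i));
}

// First trigger: modules 0 and 2 succeeded, module 1 failed and stored nothing.
const storedAfterFirstRun: StoredRunRef[] = [
  { fingerprint: "fp-a", moduleIndex: 0 },
  { fingerprint: "fp-a", moduleIndex: 2 },
];

console.log(pipelineWouldDispatch("fp-a", [0, 1, 2], storedAfterFirstRun)); // nothing is dispatched
console.log(modulesMissingRuns("fp-a", [0, 1, 2], storedAfterFirstRun)); // only module 1 is missing
```

On a second trigger the pipeline-level check returns an empty dispatch list, so module 1 never reaches the module-level check that would have let it through.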

## Database

### `paddock_generated_runs`
| Column | Notes |
|---|---|
| `syllabus_fingerprint` | SHA-256 of sorted module titles — ties runs to a specific syllabus version |
| `module_index` | 0-based position in `plan_json.modules` array |
| `module_title` | Denormalized for display |
| `domain` | e.g. `"compliance & regulatory"` |
| `run_type` | `CHECK (run_type IN ('drill', 'issue-spotting', 'argument-attack'))` — all 3 types are schema-valid |
| `content_json` | Full `PreAuthoredRunContent` blob |
| `content_id` | Deterministic: `gen-{hash(fingerprint + moduleIndex + runType)}` |
| `expires_at` | Runs expire; students only see non-expired runs |
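
The deterministic `content_id` can be illustrated as follows. The source does not say which hash function or concatenation format is used, so this sketch swaps in a 32-bit FNV-1a hash (computed over UTF-16 code units) and plain string concatenation as stand-ins; `fnv1a32Hex` and `makeContentId` are hypothetical names. The point is only that identical `(fingerprint, moduleIndex, runType)` inputs always yield the same ID.

```typescript
type RunType = "drill" | "issue-spotting" | "argument-attack";

// Stand-in hash (FNV-1a, 32-bit, hex). NOT necessarily the hash the real system uses.
function fnv1a32Hex(input: string): string {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Assumed format: "gen-" followed by the hash of the concatenated key parts.
function makeContentId(
  fingerprint: string,
  moduleIndex: number,
  runType: RunType,
  hash: (s: string) => string = fnv1a32Hex,
): string {
  return `gen-${hash(`${fingerprint}${moduleIndex}${runType}`)}`;
}

const first = makeContentId("abc123", 4, "drill");
const second = makeContentId("abc123", 4, "drill");
console.log(first === second); // true: same inputs, same ID
```

Because the ID depends only on the key, regenerating a module's run produces the same `content_id`, which fits the upsert behavior described for `insertGeneratedRun()`.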

### `paddock_run_attempts`
Also supports all three `run_type` values. No changes needed for new types.

## Content Types

All defined in `$lib/practice/content/paddock/types.ts`:

```typescript
type RunType = 'drill' | 'issue-spotting' | 'argument-attack';

// Drill: elements identification
interface DrillContent {
  rule: string;
  elements: Array<{ name: string; keyFacts: KeyFact[] }>;
  factPattern: string;
  parties: string[];
  hint: string;
}

// Issue spotting: identify legal issues in sentences
interface IssueSpottingContent {
  factPattern: string;
  sentences: Array<{ text: string; issues: string[]; explanation: string }>;
  allIssues: string[];
  issueLabels: Record<string, string>;
}

// Argument attack: identify flawed reasoning
interface ArgumentAttackContent {
  flawedArgument: string;
  flawType: string;
  flawOptions: string[];
  correctExplanation: string;
  difficultyNotes: string;
}
```

## Student-Facing Surface

Drills are surfaced on `/paddock/[courseId]`:

1. Loader calls `getGeneratedRunsForSyllabus(fingerprint)`, which returns all non-expired runs for the course
2. `buildAvailableDrillsByModule(normalizedModules, generatedDrillRecords)` groups by `module_index`
3. `buildDoctrinesByModule()` assembles key doctrine state for each module
4. Student navigates to `/drill?contentId=gen-...` to do the drill
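
Steps 2 and 4 can be sketched as a grouping pass plus a link builder. This is a simplified illustration, not the real `buildAvailableDrillsByModule`: the `GeneratedRunRecord` shape is reduced to a few columns from the table above, and `groupDrillsByModule` and `drillHref` are hypothetical helpers. It assumes the loader has already filtered out expired runs.

```typescript
type RunType = "drill" | "issue-spotting" | "argument-attack";

// Reduced record shape: a subset of the paddock_generated_runs columns.
interface GeneratedRunRecord {
  module_index: number;
  module_title: string;
  run_type: RunType;
  content_id: string;
}

// Returns one bucket per module position; modules without drills get an empty array.
function groupDrillsByModule(
  moduleCount: number,
  records: GeneratedRunRecord[],
): GeneratedRunRecord[][] {
  const byModule = Array.from({ length: moduleCount }, (): GeneratedRunRecord[] => []);
  for (const record of records) {
    if (record.run_type !== "drill") continue; // student routes are drill-only today
    const bucket = byModule[record.module_index];
    if (bucket) bucket.push(record); // ignore indexes outside the plan
  }
  return byModule;
}

// Builds the link a student follows to open a generated drill.
function drillHref(contentId: string): string {
  return `/drill?contentId=${encodeURIComponent(contentId)}`;
}

const grouped = groupDrillsByModule(3, [
  { module_index: 0, module_title: "Module A", run_type: "drill", content_id: "gen-aaa" },
  { module_index: 2, module_title: "Module C", run_type: "drill", content_id: "gen-ccc" },
]);
console.log(grouped.map((drills) => drills.map((d) => drillHref(d.content_id))));
```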

## Quality Gate — `isGenericContent()`

Located in `run-content-generator.ts`. Rejects content if:
- Contains generic markers: `"Element 1:"`, `"The foundational requirement"`, `"applicable legal standard or test"`, etc.
- `rule.length < 50`
- Any element name matches `/^Element \d/i`
- `factPattern.length < 100`
- Any party matches `/^the (plaintiff|defendant)$/i`
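
The rejection rules listed above can be sketched as a single predicate. This is a reconstruction from the bullet list, not the code in `run-content-generator.ts`: the function is named `isGenericContentSketch` to make that clear, the `KeyFact` shape is a placeholder (the source does not define it), the marker list contains only the three markers quoted above (the real list is longer), and searching the rule, fact pattern, hint, and element names for markers is an assumption.

```typescript
interface KeyFact {
  phrase: string; // placeholder shape; the real KeyFact type is not shown in this document
}

interface DrillContent {
  rule: string;
  elements: Array<{ name: string; keyFacts: KeyFact[] }>;
  factPattern: string;
  parties: string[];
  hint: string;
}

// Partial list: only the markers quoted in the documentation.
const GENERIC_MARKERS = [
  "Element 1:",
  "The foundational requirement",
  "applicable legal standard or test",
];

function isGenericContentSketch(content: DrillContent): boolean {
  // Assumption: markers are searched across all free-text fields.
  const searchable = [
    content.rule,
    content.factPattern,
    content.hint,
    ...content.elements.map((e) => e.name),
  ].join("\n");

  if (GENERIC_MARKERS.some((marker) => searchable.includes(marker))) return true;
  if (content.rule.length < 50) return true;
  if (content.elements.some((e) => /^Element \d/i.test(e.name))) return true;
  if (content.factPattern.length < 100) return true;
  if (content.parties.some((p) => /^the (plaintiff|defendant)$/i.test(p))) return true;
  return false;
}

// Invented sample content that satisfies every rule.
const sampleDrill: DrillContent = {
  rule: "A contract requires an offer, acceptance of that offer, and consideration exchanged between the parties.",
  elements: [
    { name: "Offer", keyFacts: [{ phrase: "offering to sell her used kiln" }] },
    { name: "Acceptance", keyFacts: [{ phrase: "replied the same day, accepting" }] },
  ],
  factPattern:
    "Dana Ortiz emailed Priya Shah offering to sell her used kiln for $900. " +
    "Priya replied the same day, accepting and wiring a $100 deposit.",
  parties: ["Dana Ortiz", "Priya Shah"],
  hint: "Look for the moment each side committed.",
};

console.log(isGenericContentSketch(sampleDrill)); // false
```

Each rule returns early, so the order only matters for performance; any single failing check is enough to reject the content.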

## Known Limitations

1. **No minimum guarantee** — modules can fail the quality gate and get no drill; there's no retry
2. **Pipeline-level idempotency blocks retries** — once any module succeeds, the whole course is considered "cached"
3. **Only drills** — issue-spotting and argument-attack generators don't exist yet (Linear: see balanced run variety issue)
4. **Student routes are drill-only** — `/drill/+page.server.ts` only resolves `runType === 'drill'`

## Related

- Linear issue: Balanced run variety per module (drill + issue-spotting + argument-attack)
- `src/lib/server/paddock/run-content-generator.ts` — LLM generation + quality gate
- `src/lib/server/paddock/run-generation-pipeline.ts` — fanout orchestration
- `src/lib/server/paddock/drill-embedding-enrichment.ts` — Jina v3 embedding enrichment
- `src/lib/server/db/paddock-generated-runs.ts` — DB layer
