**PRIVILEGED AND CONFIDENTIAL**

---

# Upload Your Own Prompt — System Overview

### The core privacy promise

When a user uploads their own MEE question, **the raw prompt text never touches our database.** It is processed in memory and then discarded server-side. Everything the server retains about the prompt is either its hash or AI-generated output derived *from* it.

---

### How it works end-to-end

**Step 1 — User pastes their question**
The user pastes an MEE essay prompt into a textarea, checks an attestation box confirming authorized use, and submits. A length check (minimum 50, maximum 10,000 characters) is enforced before submission.
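The Step 1 checks can be expressed as a small pure function. This is a minimal sketch: the 50 and 10,000 character limits and the attestation requirement come from this document, but the `validateUploadInput` name, the error strings, and measuring length on trimmed text are assumptions.

```typescript
// Limits from Step 1; the helper itself is hypothetical.
const MIN_PROMPT_CHARS = 50;
const MAX_PROMPT_CHARS = 10_000;

type ValidationResult = { ok: true } | { ok: false; error: string };

// Hypothetical pre-submit check for the upload form.
// String.length counts UTF-16 code units, which is what a textarea
// maxlength also counts. Trimming first (an assumption) stops
// whitespace padding from satisfying the minimum.
function validateUploadInput(promptText: string, attested: boolean): ValidationResult {
  if (!attested) {
    return { ok: false, error: "Please confirm you are authorized to use this question." };
  }
  const length = promptText.trim().length;
  if (length < MIN_PROMPT_CHARS) {
    return { ok: false, error: `Prompt must be at least ${MIN_PROMPT_CHARS} characters.` };
  }
  if (length > MAX_PROMPT_CHARS) {
    return { ok: false, error: `Prompt must be at most ${MAX_PROMPT_CHARS} characters.` };
  }
  return { ok: true };
}
```

The same function could also run inside the server-side form action, since a client-only check can be bypassed.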

**Step 2 — Server creates a session record (no prompt stored)**
The server computes a **SHA-256 hash** of the prompt text and stores that hash, not the text itself, in the `upload_sessions` database table. What goes into the DB at this point: the hash, the user ID, an attestation flag, `status='pending'`, and an `expires_at` timestamp 90 minutes out. The raw prompt is discarded as soon as the hash is computed; the server keeps no copy after this request.
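The hashing step can be sketched with Node's built-in `crypto` module. The stored fields (hash, user ID, attestation flag, `status`, `expires_at`) come from Step 2; the camelCase property names, the `UploadSessionInsert` type, the `buildUploadSession` helper, and hex encoding of the digest are assumptions for illustration.

```typescript
import { createHash } from "node:crypto";

const SESSION_TTL_MS = 90 * 60 * 1000; // 90-minute TTL from Step 2

// Hypothetical shape of the row written to `upload_sessions`.
// There is deliberately no field that could hold the prompt text.
interface UploadSessionInsert {
  userId: string;
  promptHash: string; // hex-encoded SHA-256 of the prompt
  attested: boolean;
  status: "pending";
  expiresAt: Date;
}

// SHA-256 over the UTF-8 bytes of the prompt, hex-encoded (64 chars).
function hashPrompt(promptText: string): string {
  return createHash("sha256").update(promptText, "utf8").digest("hex");
}

// Builds the row to insert; only the hash of the prompt leaves this function.
function buildUploadSession(
  userId: string,
  promptText: string,
  attested: boolean,
  now: Date = new Date(),
): UploadSessionInsert {
  return {
    userId,
    promptHash: hashPrompt(promptText),
    attested,
    status: "pending",
    expiresAt: new Date(now.getTime() + SESSION_TTL_MS),
  };
}
```

Any normalization (trimming, Unicode normalization) would have to be applied identically here and in the later tamper check, or an unchanged prompt would fail to match.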

**Step 3 — Client saves the prompt to IndexedDB**
After the server redirects back to the browser with the new `sessionId`, the client-side code calls `saveUploadSession()`, which writes the prompt text into the **browser's IndexedDB** (via the `idb` library, database named `shep-upload-sessions`). From this point forward, the prompt lives *only in the user's browser on that device.* If they clear site data or switch devices, it's gone.
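The client-side storage contract can be sketched independently of IndexedDB itself. In the browser, `PromptStore` would wrap an object store in the `shep-upload-sessions` database opened with the `idb` library's `openDB`; here it is an interface with an in-memory stand-in so the logic runs outside a browser. The signatures of `saveUploadSession` and `getPromptText`, the `StoredPrompt` record, and its `savedAt` field are assumptions, not the project's actual code.

```typescript
// Hypothetical record kept in the browser, keyed by sessionId.
interface StoredPrompt {
  sessionId: string;
  promptText: string;
  savedAt: number; // epoch ms; an assumed field
}

// Minimal async key-value contract. In the browser this would be backed
// by IndexedDB (e.g. via `idb`); it never talks to the server.
interface PromptStore {
  put(record: StoredPrompt): Promise<void>;
  get(sessionId: string): Promise<StoredPrompt | undefined>;
}

// In-memory stand-in for tests and non-browser environments.
class MemoryPromptStore implements PromptStore {
  private records = new Map<string, StoredPrompt>();

  async put(record: StoredPrompt): Promise<void> {
    this.records.set(record.sessionId, record);
  }

  async get(sessionId: string): Promise<StoredPrompt | undefined> {
    return this.records.get(sessionId);
  }
}

// Hypothetical signatures for the two helpers named in this document.
async function saveUploadSession(store: PromptStore, sessionId: string, promptText: string): Promise<void> {
  await store.put({ sessionId, promptText, savedAt: Date.now() });
}

async function getPromptText(store: PromptStore, sessionId: string): Promise<string | undefined> {
  return (await store.get(sessionId))?.promptText;
}
```

IndexedDB data is scoped to one origin in one browser profile, so `getPromptText` resolving to `undefined` is the expected result on a second device, which is the case Step 5 handles.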

**Step 4 — AI generation pipeline (prompt is input-only)**
The client auto-triggers a `generate` form action, which sends the prompt text back to the server. On the server, a two-step LLM pipeline runs:

1. **Classifier** (Gemini 2.0 Flash): receives the prompt and identifies the MEE subject (contracts, torts, etc.), a confidence level, the question's "calls," and likely issue archetypes. Output is structured JSON; the prompt itself is not stored anywhere.
2. **Packet generator** (Gemini 2.5 Pro) — receives the classifier output plus the prompt again, and generates a complete grading framework: rule atoms, acceptable reasoning paths, reasoning moves. Again, the prompt is input only.

The server **re-hashes the submitted prompt** and compares the result to the stored SHA-256 to confirm it is the same prompt recorded in Step 2 (tamper check). It then stores the **generated packet**, which contains zero prompt text, in the `generated_packet_json` column. There is no `prompt_text` column in the schema.
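The tamper check and the two-step pipeline can be sketched as one server-side function. This is an illustration under stated assumptions: the `Classification` and `GeneratedPacket` field names, the `generatePacket` and `promptMatchesStoredHash` helpers, and hex-encoded digests are hypothetical, and the Gemini calls are injected as plain async functions rather than any real SDK call. The sketch runs the hash comparison before either LLM call, matching the Key guarantees table.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical outputs of the two LLM steps.
interface Classification {
  subject: string;
  confidence: number;
  calls: string[];
  archetypes: string[];
}
interface GeneratedPacket {
  ruleAtoms: string[];
  reasoningPaths: string[];
  reasoningMoves: string[];
}

// LLM steps are injected so this sketch makes no network calls.
type Classifier = (prompt: string) => Promise<Classification>;
type PacketGenerator = (prompt: string, c: Classification) => Promise<GeneratedPacket>;

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Constant-time comparison of digests. timingSafeEqual throws on a
// length mismatch (e.g. a malformed stored hash), so check length first.
function promptMatchesStoredHash(prompt: string, storedHashHex: string): boolean {
  const actual = Buffer.from(sha256Hex(prompt), "hex");
  const expected = Buffer.from(storedHashHex, "hex");
  return actual.length === expected.length && timingSafeEqual(actual, expected);
}

// Hypothetical generate step: the prompt is only a local argument and is
// never copied into the returned packet.
async function generatePacket(
  prompt: string,
  storedHashHex: string,
  classify: Classifier,
  generate: PacketGenerator,
): Promise<GeneratedPacket> {
  if (!promptMatchesStoredHash(prompt, storedHashHex)) {
    throw new Error("Prompt does not match the hash recorded for this session");
  }
  const classification = await classify(prompt);
  return generate(prompt, classification);
}
```

Checking first means a mismatched prompt never reaches the paid LLM calls, and the caller only ever receives the packet to persist.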

**Step 5 — Session display (prompt loaded from IndexedDB, not the server)**
When the practice page loads, an `onMount` hook calls `getPromptText(sessionId)` to read the prompt from IndexedDB. The server provides the session state (status, generated packet, detected issues) but **never the prompt**. If IndexedDB doesn't have it (different device, cleared site data), the UI shows: *"Your question needs to be re-uploaded"* or *"Original question not available — it was stored only in your browser for privacy."*
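The display decision in Step 5 reduces to mapping the IndexedDB lookup result to a view state. A minimal sketch: the `PromptView` type and `resolvePromptView` helper are hypothetical, and since this document does not say which of its two fallback messages appears when, the choice of message below is an assumption.

```typescript
// Hypothetical view state for the practice and results pages. The server
// supplies session state; the prompt itself comes only from the browser.
type PromptView =
  | { kind: "available"; promptText: string }
  | { kind: "missing"; message: string };

// One of the two fallback messages quoted in Step 5 (which one is shown
// in which situation is an assumption).
const MISSING_PROMPT_MESSAGE = "Your question needs to be re-uploaded";

// Maps the IndexedDB result (undefined when absent) to what the page renders.
// Treating an empty string as missing is also an assumption.
function resolvePromptView(storedPrompt: string | undefined): PromptView {
  if (storedPrompt === undefined || storedPrompt.length === 0) {
    return { kind: "missing", message: MISSING_PROMPT_MESSAGE };
  }
  return { kind: "available", promptText: storedPrompt };
}
```

In the Svelte component, `onMount` would await `getPromptText(sessionId)` and pass the result to this function; the results page can reuse it unchanged.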

**Step 6 — User submits their response**
The server receives the written response, creates a submission record, runs the grading pipeline against the generated packet (no prompt text is involved at this stage), and redirects to results. The results page also loads the prompt from IndexedDB for display, using the same pattern.

---

### Why AI is "blind" to the prompt after generation

The AI pipeline runs once at generation time. After that:

- The generated packet (rubric, rule atoms, grading criteria) is what the grading engine uses — not the prompt.
- The grading engine (V3.2) is the same engine used for platform-authored questions. It only sees the scoring bundle, which is derived from the packet, not the original text.
- There is no pathway in the codebase for any downstream system to retrieve the original prompt, because it was never written to the database.
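The "blind" property can be made visible in the types. This is a hypothetical sketch: the `ScoringBundle` fields, the `toScoringBundle` helper, and the `Grader` signature are assumptions, not the V3.2 engine's actual interfaces. The point is only that the grader's inputs are the bundle and the response, and neither type has a slot for the original prompt.

```typescript
// Hypothetical packet shape (rule atoms, reasoning paths, reasoning moves).
interface GeneratedPacket {
  ruleAtoms: string[];
  reasoningPaths: string[];
  reasoningMoves: string[];
}

// Hypothetical scoring bundle derived purely from the stored packet.
interface ScoringBundle {
  readonly ruleAtoms: readonly string[];
  readonly reasoningPaths: readonly string[];
  readonly reasoningMoves: readonly string[];
}

// Copies packet fields into the bundle; no prompt is available here to copy.
function toScoringBundle(packet: GeneratedPacket): ScoringBundle {
  return {
    ruleAtoms: [...packet.ruleAtoms],
    reasoningPaths: [...packet.reasoningPaths],
    reasoningMoves: [...packet.reasoningMoves],
  };
}

// The grader signature has no prompt parameter, so a grading implementation
// cannot depend on the prompt text even by accident.
type Grader = (bundle: ScoringBundle, response: string) => Promise<number>;
```

Because the same `Grader` type would serve platform-authored questions, uploaded prompts need no special-case path in the engine.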

---

### Key guarantees

| Guarantee | How it's enforced |
|---|---|
| Prompt never stored server-side | No `prompt_text` column exists in the schema |
| Audit trail without exposure | SHA-256 hash stored; not reversible |
| Device-scoped privacy | IndexedDB is browser-local; no server sync |
| Session expiration | 90-minute TTL on `upload_sessions` |
| Rate limiting | 5 uploads per user per hour via `reserveUploadSlot()` |
| Tamper detection | Server re-hashes submitted prompt and compares to stored hash before generation |
| Attestation | User confirms authorized use; stored as boolean flag |
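The rate-limiting row can be illustrated with a sliding-window counter. This is a single-process, in-memory sketch under assumptions: the document names `reserveUploadSlot()` but not its signature, storage, or whether the window is sliding or fixed. The `UploadRateLimiter` class is hypothetical, and a production version would need shared storage (such as the database) so the limit holds across server instances.

```typescript
const MAX_UPLOADS_PER_HOUR = 5;
const WINDOW_MS = 60 * 60 * 1000;

// Hypothetical in-memory version of the check behind `reserveUploadSlot()`.
class UploadRateLimiter {
  private reservations = new Map<string, number[]>();

  // Grants and records a slot if the user has fewer than 5 reservations in
  // the trailing hour; otherwise returns false and records nothing.
  reserveUploadSlot(userId: string, now: number = Date.now()): boolean {
    const recent = (this.reservations.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
    if (recent.length >= MAX_UPLOADS_PER_HOUR) {
      this.reservations.set(userId, recent); // drop expired timestamps
      return false;
    }
    recent.push(now);
    this.reservations.set(userId, recent);
    return true;
  }
}
```

Reserving before the upload is processed (rather than counting afterwards) means concurrent submissions from one user cannot both slip under the limit within this process.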

---

### Summary

The prompt text exists server-side only as an in-memory variable: briefly while it is hashed (Step 2), and for the ~5 seconds the LLM calls take (Step 4). The rest of the time it lives exclusively in the user's browser. Our database holds a hash and a grading packet; it has never seen the question itself.
