
Case Study: How AI Turned 15 Hours of Legal Review Into 15 Minutes

A real engagement where we used Claude Code to parse 1,271 co-parenting messages from PDF exports, structure them into searchable data, and generate a cited evidence report for discovery. What used to take 15 hours of paralegal work took 15 minutes.

A paralegal sends over the assignment: take these discovery responses, find messages in the communication platform that support each answer, and compile everything into a report with citations. The communication log is five months of co-parenting messages on Our Family Wizard (OFW), exported as a PDF. Over 1,200 messages across 180 conversation threads.

The estimated time? About 15 hours of manual review.

We did it in 15 minutes.

The takeaway: This wasn't a research experiment or a demo with synthetic data. This was a real case with real deadlines. The AI-generated evidence report was used directly in the legal proceeding, and the attorney's feedback was that the output quality exceeded what they typically receive from manual review.


The problem: great data, terrible format

Our Family Wizard is a co-parenting communication platform widely used in family law. Courts like it because every message is timestamped, logged, and admissible. Attorneys like it because it creates a clean paper trail.

What nobody likes is the export format.

OFW only lets you export messages as a PDF. Not CSV, not JSON, not any structured format that a computer could meaningfully work with. The PDF uses a thread-view layout where the newest reply sits at the top of each page, with the full thread history repeated below it. Messages get duplicated across pages. Headers blend into body text. Page breaks split messages mid-sentence.

For a human reviewer, this means reading through hundreds of PDF pages, mentally tracking which messages they've already seen (because duplicates are everywhere), cross-referencing each discovery question against the full message history, and manually assembling a report with exact quotes and timestamps.

It's tedious, error-prone, and expensive. And it's fundamentally a data problem pretending to be a legal problem.


The approach: treat it like a data pipeline

Once we reframed the task, the solution became clear. This wasn't about reading documents faster. It was about transforming unstructured data into structured data, and then running queries against it.

We used Claude Code (Anthropic's agentic coding tool) to build the entire pipeline in a single session. Here's what the pipeline looks like end to end:

Phase 1: PDF parsing and structuring

The first challenge was turning the OFW PDF into clean, structured data. Claude Code built a custom parser that:

  1. Extracted raw text from the PDF using pdfjs-dist, handling the multi-page layout where text content flows across page boundaries.
  2. Cleaned the output by stripping page headers (| Message Report Page X of Y), handling cross-page text duplication, and normalizing whitespace.
  3. Parsed message headers using regex patterns to identify the Sent, From, To, First Viewed, and Subject fields for each message, then captured the body text between headers.
  4. Deduplicated messages by keying on (sender, timestamp) and keeping the longest body variant, since the thread-view format repeats messages across pages with varying amounts of truncation.
  5. Grouped messages into threads by normalizing subjects (stripping Re: prefixes) and clustering messages that belong to the same conversation.
  6. Assigned message numbers by correlating the Message X of Y markers in the PDF with parsed messages, preserving the official OFW numbering for citation purposes.
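To make steps 3–5 concrete, here's a minimal sketch of the header parsing, deduplication, and thread grouping. The header field layout and regex are assumptions about what an OFW text export looks like, not the exact parser from the engagement:

```javascript
// Assumed OFW-style header block: "Sent:", "From:", "To:", "Subject:"
// lines followed by the body. Illustrative only.
const HEADER_RE = /^Sent:\s*(?<sent>.+)\nFrom:\s*(?<from>.+)\nTo:\s*(?<to>.+)\nSubject:\s*(?<subject>.*)\n/;

function parseMessages(text) {
  // Split on blank lines between messages; real OFW output needs more care
  // around page breaks and cross-page duplication.
  return text.split(/\n{2,}/).flatMap((block) => {
    const m = block.match(HEADER_RE);
    if (!m) return [];
    const body = block.slice(m[0].length).trim();
    return [{ ...m.groups, subject: m.groups.subject.trim(), body }];
  });
}

// Step 4: dedupe on (sender, timestamp), keeping the longest body variant,
// since the thread view repeats messages with varying truncation.
function dedupe(messages) {
  const byKey = new Map();
  for (const msg of messages) {
    const key = `${msg.from}|${msg.sent}`;
    const prev = byKey.get(key);
    if (!prev || msg.body.length > prev.body.length) byKey.set(key, msg);
  }
  return [...byKey.values()];
}

// Step 5: derive a thread ID from the subject, stripping "Re:" prefixes
// so replies cluster with the original message.
function threadId(subject) {
  return subject
    .replace(/^(re:\s*)+/i, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-|-$/g, "");
}
```

The (sender, timestamp) key works because OFW logs timestamps per message; two distinct messages from the same sender at the same second would collide, which is an acceptable trade-off for this dataset.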

The result: 1,271 unique messages organized into 180 threads, spanning five months of communication. Each message came out as a clean JSON object:

{
  "id": 142,
  "messageNumber": 605,
  "sent": "2026-01-30T15:35:00",
  "from": "Party A",
  "to": "Party B",
  "firstViewed": "2026-01-30T15:45:00",
  "subject": "Custody Scheduling Rule",
  "body": "Here's what I'm thinking for the schedule...",
  "threadId": "custody-scheduling-rule",
  "wordCount": 69
}

The parser handled edge cases that would trip up a naive approach: messages where the subject line was missing, attachments listed inline, timestamps in 12-hour format with AM/PM, and the OFW-specific quirk where "First Viewed" timestamps sometimes appear on the same line as the recipient.
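One of those edge cases, 12-hour timestamps, can be sketched as a small normalizer that converts an assumed OFW-style timestamp into the ISO-8601 strings used in the JSON above (the input format is an assumption about OFW exports):

```javascript
// Normalize "MM/DD/YYYY HH:MM AM/PM" into an ISO-8601 local timestamp.
function toIso(ofwTimestamp) {
  const m = ofwTimestamp.match(
    /^(\d{2})\/(\d{2})\/(\d{4})\s+(\d{2}):(\d{2})\s+(AM|PM)$/
  );
  if (!m) throw new Error(`Unrecognized timestamp: ${ofwTimestamp}`);
  const [, month, day, year, hh, mm, ampm] = m;
  let hour = Number(hh) % 12; // "12:xx AM" is hour 0
  if (ampm === "PM") hour += 12;
  return `${year}-${month}-${day}T${String(hour).padStart(2, "0")}:${mm}:00`;
}

// toIso("01/30/2026 03:35 PM") → "2026-01-30T15:35:00"
```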

Phase 2: AI-powered evidence matching

With structured data in hand, the second phase was where things got interesting. We provided Claude with two inputs:

  1. The structured JSON containing all 1,271 parsed messages with full metadata
  2. The discovery responses PDF containing the specific claims that needed supporting evidence

What's worth emphasizing: the AI doesn't just ingest the entire message history and produce a summary. It works the data the way an experienced researcher would, narrowing and navigating strategically. For a discovery answer about a scheduling dispute in late February, the AI would first filter to messages from that time period, scan subject lines to identify which threads are likely relevant, pull those specific threads, read through them for supporting details, and then extract the exact quotes with message numbers.

It's a targeted, iterative search process. The AI identifies the timeframe and topic from the discovery question, narrows to the right threads, reads context within those threads, and pulls the evidence. That's why the results are precise rather than vague: the model isn't trying to hold 1,271 messages in its head at once. It's doing what a good paralegal does, but across the entire dataset in seconds instead of hours.

And because the data was structured with thread IDs, timestamps, and sender metadata, the AI could follow a conversation across multiple messages, understand who said what and when, and identify patterns that span weeks of back-and-forth exchanges.
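The narrowing step described above can be sketched as a simple filter over the structured messages: restrict to a timeframe, find candidate threads by subject keyword, then return the whole threads so conversational context survives. Field names mirror the JSON example; the query values are hypothetical:

```javascript
// Narrow 1,271 messages down to the threads relevant to one discovery
// question before any model reads them.
function narrow(messages, { from, to, keywords }) {
  // ISO-8601 strings compare correctly as plain strings.
  const inWindow = messages.filter((m) => m.sent >= from && m.sent <= to);
  const candidateThreads = new Set(
    inWindow
      .filter((m) => keywords.some((k) => m.subject.toLowerCase().includes(k)))
      .map((m) => m.threadId)
  );
  // Return entire threads, not just keyword hits, so back-and-forth
  // context is preserved for the evidence-matching step.
  return inWindow.filter((m) => candidateThreads.has(m.threadId));
}
```

In the real pipeline the model itself chose which filters to apply per discovery question; this just shows why structured fields make that navigation cheap.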

Phase 3: The output

The AI generated a comprehensive evidence report that included:

  • A chronological timeline of key events with citations to specific messages
  • Pattern analysis identifying recurring behaviors across the communication log (e.g., scheduling disputes that followed a predictable escalation pattern over multiple weeks)
  • Direct quotes from relevant messages with OFW message numbers for easy verification
  • Thread summaries that captured the arc of multi-message conversations, not just cherry-picked individual messages
  • Response time analysis drawn from the sent and firstViewed timestamps
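The response-time analysis in that last bullet falls out of the structured data almost for free. A minimal sketch, using the sent and firstViewed fields from the JSON example above:

```javascript
// How long each recipient took to first view a message, in minutes.
// Messages never viewed (firstViewed missing) are excluded.
function viewDelaysMinutes(messages) {
  return messages
    .filter((m) => m.firstViewed)
    .map((m) => ({
      messageNumber: m.messageNumber,
      minutes: (new Date(m.firstViewed) - new Date(m.sent)) / 60000,
    }));
}
```

Aggregating these per sender is one way patterns like delayed responses to scheduling requests become visible across months of messages.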

One thing that surprised us: the narrative quality. The AI didn't just find relevant messages; it assembled them into a coherent, chronological account that showed how situations developed over time. When eight weeks of Saturday scheduling disputes each followed a similar pattern, the AI identified that pattern, documented each instance with citations, and presented them as a connected sequence rather than isolated data points. The resulting report read like something a senior paralegal would produce after days of careful work.

The legal team used the report directly, supplementing where needed rather than starting from scratch.


Why this worked (and when it wouldn't)

This approach worked well because of a few specific conditions:

The data was bounded. 1,271 messages is a large volume for a human to read cover-to-cover, but it fits comfortably in a modern LLM's context window once structured. We weren't dealing with millions of documents where retrieval-augmented generation (RAG) or chunking strategies would be necessary.

The task was search-and-match, not judgment. We weren't asking the AI to assess credibility, weigh legal arguments, or decide what matters. We were asking it to find messages that match specific factual claims. That's a retrieval and correlation task where AI excels.

The source data was structured once parsed. The hard part was getting from PDF to JSON. Once that bridge was crossed, the analysis was almost trivially easy for the model. Structured data with consistent fields (sender, timestamp, body, thread) is exactly what LLMs handle well.

Human review remained in the loop. The attorney reviewed every citation, verified quotes against the original OFW records, and made the final decisions about what to include. The AI did the grunt work. The human did the judgment work.

This approach would be less effective for tasks that require subjective legal reasoning, interpretation of ambiguous language in context only an attorney would understand, or situations where the relevant evidence isn't in the text itself but in what's missing from the text.


The numbers

At a typical paralegal billing rate of $150/hour, 15 hours of manual document review costs $2,250. The AI-assisted pipeline used less than $5 in API costs. That's a 450x cost reduction.

Metric              Manual Process                 AI-Assisted
Time to complete    ~15 hours                      ~15 minutes
Cost                ~$2,250 (@ $150/hr)            < $5 in API costs
Messages reviewed   ~1,271 (with re-reads)         1,271 (exhaustive, single pass)
Coverage            Partial (fatigue-dependent)    Complete (every message searched)
Citation format     Manual transcription           Auto-generated with message numbers

The time and cost savings are striking, but the coverage improvement might matter more. A paralegal reading through 1,200+ messages for 15 hours will inevitably skim, skip, and miss things. Fatigue is real. The AI searched every single message against every single discovery question, every time. And for the kinds of cases where billing sensitivity matters, the difference between $2,250 and $5 can change the calculus on what evidence review is even worth pursuing.


We're not suggesting that AI replaces paralegals or attorneys. The legal judgment, the strategy, the advocacy: those remain deeply human. But a significant portion of legal work is information retrieval dressed up as professional services. Finding the needle in the haystack. Cross-referencing document A against document B. Compiling chronologies from raw records.

These are data problems. And for data problems, the right tool isn't a faster reader. It's a pipeline that transforms the data into a structure where the answers become queryable.

The tools for building these pipelines exist today. Claude Code built a working PDF parser and evidence analysis system in a single session. The barrier isn't technology. It's recognizing which legal tasks are actually data tasks in disguise.

If your team spends hours manually reviewing communication logs, discovery documents, or compliance records, there's likely an AI-assisted approach that can compress that work from days to minutes, with better coverage and full citations.

Get in touch if you want to explore what that looks like for your practice.