FirmAdapt

E-Discovery Cost Reduction: How Predictive Coding Saved a Firm $2.3 Million

By Basel Ismail, April 2, 2026

A pharmaceutical company facing multidistrict litigation had a discovery obligation covering 4.2 million documents. The initial estimate for a traditional linear review, with contract attorneys processing documents one by one, came in at $3.4 million over 14 months. The firm proposed predictive coding instead. Total cost for the technology-assisted review: $1.1 million over 5 months, with defensibility metrics that exceeded what linear review typically achieves.

The $2.3 million difference was not just about spending less money. The timeline compression mattered more to the client than the cost savings because the litigation strategy depended on completing discovery before a key regulatory deadline.

How Predictive Coding Actually Works

Predictive coding is a form of technology-assisted review (TAR); its iterative variant is often called continuous active learning (CAL). It uses machine learning to prioritize document review. The process starts with a senior attorney reviewing a seed set of documents, typically 1,000-2,000 documents selected to represent the range of issues in the case. The attorney codes each document as responsive, non-responsive, or privileged.

The algorithm learns from these coding decisions and ranks the remaining documents by predicted relevance. The highest-ranked documents get reviewed next, and those coding decisions further train the model. With each iteration, the algorithm gets better at predicting which documents are relevant.
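The train, rank, review, retrain loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not how any particular review platform works: the word-count scoring in `score` and `update` stands in for a real classifier, and `review_loop`, `ask_attorney`, and the sample documents are hypothetical names introduced here.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def score(doc, weights):
    # Sum of learned word weights; higher means "more likely responsive".
    return sum(weights[word] for word in tokenize(doc))


def update(weights, doc, responsive):
    # Toy learning rule (an assumption): +1 per word in a responsive
    # document, -1 per word in a non-responsive one.
    delta = 1 if responsive else -1
    for word in set(tokenize(doc)):
        weights[word] += delta


def review_loop(docs, seed_ids, ask_attorney, batch_size=2, max_reviewed=None):
    weights = Counter()
    coded = {}   # doc index -> attorney's responsive / non-responsive call
    order = []   # sequence in which documents were reviewed

    def code(i):
        coded[i] = ask_attorney(docs[i])
        update(weights, docs[i], coded[i])
        order.append(i)

    # Step 1: the attorney codes the seed set.
    for i in seed_ids:
        code(i)

    # Step 2: rank what is left, review the top batch, retrain, repeat.
    limit = len(docs) if max_reviewed is None else max_reviewed
    while len(coded) < limit:
        unreviewed = [i for i in range(len(docs)) if i not in coded]
        if not unreviewed:
            break
        unreviewed.sort(key=lambda i: score(docs[i], weights), reverse=True)
        for i in unreviewed[:batch_size]:
            code(i)
    return coded, order


if __name__ == "__main__":
    docs = [
        "tablet defect report batch",      # responsive (seed)
        "lunch menu friday",               # non-responsive (seed)
        "defect complaint tablet recall",  # responsive
        "parking garage notice",
        "batch recall defect memo",        # responsive
        "holiday party friday",
    ]
    coded, order = review_loop(
        docs, seed_ids=[0, 1],
        ask_attorney=lambda d: "defect" in d,  # stand-in for a human reviewer
        max_reviewed=4,
    )
    print(order, sum(coded.values()))
```

In this toy run, the two remaining responsive documents are ranked ahead of the unrelated ones after the seed set is coded, so all three responsive documents are found after reviewing four of the six.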

The efficiency gain comes from the fact that most large document collections are overwhelmingly non-responsive. In a typical commercial litigation, only 3-8% of collected documents are actually relevant to the case. Predictive coding identifies this relevant subset without requiring human review of most of the other 92-97%. In the pharmaceutical case, only 4.1% of the 4.2 million documents were ultimately coded as responsive, which meant the linear review approach would have required attorneys to look at roughly 4 million non-responsive documents to find the roughly 172,000 that mattered.
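The split between responsive and non-responsive documents follows directly from the collection size and the responsive rate quoted above. A small sketch of that arithmetic, where the function name `split_by_richness` is introduced here for illustration:

```python
def split_by_richness(total_docs, richness):
    """Return (responsive, non_responsive) document counts."""
    responsive = round(total_docs * richness)
    return responsive, total_docs - responsive


if __name__ == "__main__":
    # Figures from the pharmaceutical case: 4.2 million documents, 4.1% responsive.
    responsive, non_responsive = split_by_richness(4_200_000, 0.041)
    print(f"{responsive:,} responsive, {non_responsive:,} non-responsive")
```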

The Cost Breakdown

Linear review costs are straightforward to calculate. Contract attorneys typically bill at $45-75 per hour for document review. An experienced reviewer processes 50-70 documents per hour. For 4.2 million documents at 60 documents per hour and $55 per hour, the math produces roughly $3.85 million in reviewer costs alone, plus project management, quality control, and hosting fees. The firm's $3.4 million estimate had already assumed some efficiency gains from batch processing and keyword culling.
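The reviewer-cost arithmetic above is simple enough to check directly. The sketch below reproduces the midpoint figure and shows how far the total moves across the quoted rate and throughput ranges; `linear_review_cost` is a name introduced here, and the model deliberately ignores project management, quality control, and hosting.

```python
def linear_review_cost(total_docs, docs_per_hour, hourly_rate):
    """Return (reviewer_hours, reviewer_cost) for a document-by-document review."""
    hours = total_docs / docs_per_hour
    return hours, hours * hourly_rate


if __name__ == "__main__":
    total = 4_200_000
    scenarios = {
        "midpoint (60 docs/hr, $55/hr)": (60, 55),
        "fast and cheap (70 docs/hr, $45/hr)": (70, 45),
        "slow and costly (50 docs/hr, $75/hr)": (50, 75),
    }
    for label, (speed, rate) in scenarios.items():
        hours, cost = linear_review_cost(total, speed, rate)
        print(f"{label}: {hours:,.0f} hours, ${cost:,.0f}")
```

The spread between the cheapest and most expensive scenario is a reminder that a linear review estimate is very sensitive to the assumed throughput and rate.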

The predictive coding costs broke down differently. Technology licensing and hosting ran about $180,000. The senior attorney time for training the model, reviewing the seed set, and conducting validation rounds totaled approximately $320,000. Review of the AI-prioritized documents by a smaller team of experienced attorneys cost $480,000. Quality control and defensibility testing added another $120,000.

The per-document cost dropped from approximately $0.81 with linear review to $0.26 with predictive coding. Across 4.2 million documents, that difference of roughly $0.55 per document adds up to the $2.3 million in savings.
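The cost components and per-document figures can be tied together in a short calculation. The numbers are the ones quoted in this section; the dictionary keys and variable names are labels introduced here.

```python
TOTAL_DOCS = 4_200_000
LINEAR_ESTIMATE = 3_400_000  # the firm's linear review estimate

predictive_coding_costs = {
    "technology licensing and hosting": 180_000,
    "senior attorney training and validation": 320_000,
    "review of prioritized documents": 480_000,
    "quality control and defensibility testing": 120_000,
}

tar_total = sum(predictive_coding_costs.values())
linear_per_doc = LINEAR_ESTIMATE / TOTAL_DOCS
tar_per_doc = tar_total / TOTAL_DOCS
savings = LINEAR_ESTIMATE - tar_total

if __name__ == "__main__":
    print(f"Predictive coding total: ${tar_total:,}")
    print(f"Per document: ${linear_per_doc:.2f} linear vs ${tar_per_doc:.2f} predictive coding")
    print(f"Savings: ${savings:,} ({savings / LINEAR_ESTIMATE:.0%} of the linear estimate)")
```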

Defensibility Concerns and How They Were Addressed

The most common objection to predictive coding is defensibility. Opposing counsel may argue that the algorithm missed responsive documents. Courts have addressed this issue repeatedly since Judge Andrew Peck's landmark 2012 opinion in Da Silva Moore, and the consensus has shifted firmly toward accepting technology-assisted review as reasonable, and in some cases more defensible than linear review.

The defensibility argument actually favors predictive coding in several ways. First, the process generates detailed metrics: precision, recall, F1 scores, and richness calculations that estimate, within a stated margin of error, how thorough the review was. Linear review produces no comparable quality metrics. A firm conducting linear review can report that attorneys looked at every document, but it cannot quantify how accurate those attorneys were.
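For readers unfamiliar with these metrics, the standard definitions are short. The sketch below computes them from the four counts of a confusion matrix; the function name `review_metrics` and the sample counts are hypothetical and are not drawn from the pharmaceutical case.

```python
def review_metrics(true_pos, false_pos, false_neg, true_neg):
    """Standard review-quality metrics from confusion-matrix counts.

    true_pos:  responsive documents the review produced
    false_pos: non-responsive documents the review produced
    false_neg: responsive documents the review missed
    true_neg:  non-responsive documents correctly left out
    """
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    f1 = 2 * precision * recall / (precision + recall)
    total = true_pos + false_pos + false_neg + true_neg
    richness = (true_pos + false_neg) / total  # share of the collection that is responsive
    return {"precision": precision, "recall": recall, "f1": f1, "richness": richness}


if __name__ == "__main__":
    # Hypothetical validation sample of 1,000 documents.
    print(review_metrics(true_pos=80, false_pos=20, false_neg=20, true_neg=880))
```

Precision answers "how much of what was produced is actually responsive", recall answers "how much of what is responsive was produced", and F1 is the harmonic mean of the two.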

Second, predictive coding enables statistical validation. In the pharmaceutical case, the firm drew a random sample of 2,500 documents that the algorithm had classified as non-responsive and had senior attorneys review them manually. The elusion rate (the share of documents in the non-responsive set that are actually responsive) was 1.2%, and the firm put the resulting recall estimate above 96%. Most linear reviews, when subjected to similar quality testing, show recall rates between 60% and 80%.
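An elusion test like the one above can be turned into two numbers: a confidence interval on the elusion rate itself, and a recall estimate. The sketch below is one common way to do both, not a description of the firm's own methodology. The Wilson interval is a choice made here, and `wilson_interval` and `estimate_recall` are names introduced for illustration. Recall cannot be computed from the elusion rate alone; it also needs the number of responsive documents found and the size of the non-responsive (null) set, and the article does not state the null set size, so the example inputs for `estimate_recall` are hypothetical.

```python
import math


def wilson_interval(hits, sample_size, z=1.96):
    """Wilson score interval for a sample proportion (z=1.96 for about 95%)."""
    p = hits / sample_size
    denom = 1 + z**2 / sample_size
    center = (p + z**2 / (2 * sample_size)) / denom
    half = z * math.sqrt(p * (1 - p) / sample_size + z**2 / (4 * sample_size**2)) / denom
    return center - half, center + half


def estimate_recall(found, null_set_size, elusion_rate):
    """Point estimate of recall: found / (found + estimated missed)."""
    missed = elusion_rate * null_set_size
    return found / (found + missed)


if __name__ == "__main__":
    # 1.2% of a 2,500-document sample is 30 responsive documents.
    low, high = wilson_interval(hits=30, sample_size=2_500)
    print(f"Elusion rate 1.2%, approximate 95% interval {low:.2%} to {high:.2%}")

    # Hypothetical inputs, chosen only to show the calculation.
    recall = estimate_recall(found=9_600, null_set_size=100_000, elusion_rate=0.004)
    print(f"Estimated recall: {recall:.1%}")
```

Because the estimated number of missed documents scales with the size of the null set, the same elusion rate can imply quite different recall on different collections, which is why validation reports usually state the elusion rate, the null set size, and the recall estimate together.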

Third, the entire process is documented and reproducible. The seed set decisions, the algorithm's training iterations, the validation methodology, and the results are all logged. If opposing counsel challenges the review, the firm can produce a complete record of how every coding decision was made.

When Predictive Coding Makes Sense

Predictive coding produces the largest cost savings on large document collections, generally above 500,000 documents. Below that threshold, the setup costs and attorney time for training the model may not justify the investment compared to a well-managed linear review with keyword filtering.

The technology works best when the responsive documents share identifiable patterns, whether in language, participants, date ranges, or subject matter. Cases involving discrete events (a specific product defect, a particular transaction, a defined time period of alleged misconduct) tend to produce better predictive coding results than cases involving diffuse, ongoing conduct where relevance is harder to define.

Multi-issue cases present both an opportunity and a challenge. The algorithm can be trained to identify documents relevant to different issues simultaneously, which is more efficient than running separate reviews for each issue. But the training set needs to include examples of each issue, which increases the senior attorney time required for the initial coding rounds.

For law firms managing complex litigation, the ability to offer predictive coding as a standard tool has become a competitive differentiator. Clients increasingly expect their firms to use technology-assisted review on large cases, and they are reluctant to pay for linear review when a more efficient alternative exists.

What the Numbers Look Like Across Different Case Sizes

The $2.3 million savings in the pharmaceutical case, roughly a 68% reduction from the linear estimate, is a large-scale example, but comparable percentage savings are reported at other case sizes. On a 1 million document collection, firms typically report 55-65% cost reductions versus linear review. On collections above 5 million documents, the savings often exceed 75% because the fixed costs of setting up the predictive coding workflow get amortized across more documents.
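The amortization effect can be illustrated with a simple fixed-plus-variable cost model. Everything in this sketch is an assumption made for illustration: the model itself, the fixed setup cost, and the predictive coding per-document rate are placeholders, not figures from any real matter. Only the $0.81 linear per-document cost comes from the case discussed earlier.

```python
LINEAR_PER_DOC = 0.81   # from the pharmaceutical case estimate
TAR_FIXED = 150_000     # hypothetical setup and training cost
TAR_PER_DOC = 0.20      # hypothetical variable cost per document


def savings_fraction(num_docs):
    """Fraction of linear review cost saved under the toy cost model."""
    linear = LINEAR_PER_DOC * num_docs
    tar = TAR_FIXED + TAR_PER_DOC * num_docs
    return 1 - tar / linear


def break_even_docs():
    """Collection size at which both approaches cost the same in this model."""
    return TAR_FIXED / (LINEAR_PER_DOC - TAR_PER_DOC)


if __name__ == "__main__":
    for n in (250_000, 500_000, 1_000_000, 5_000_000):
        print(f"{n:>9,} documents: {savings_fraction(n):.0%} saved")
    print(f"Break-even near {break_even_docs():,.0f} documents")
```

Under any parameters of this shape, the savings percentage rises with collection size and approaches the gap between the two per-document rates, while small collections can fall below break-even.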

Time savings follow a similar pattern. A 4.2 million document linear review that would take 14 months can be completed in 4-6 months with predictive coding, depending on the complexity of the issues and the validation requirements. For cases on aggressive litigation schedules, this time compression can be more valuable than the cost savings.

The calculation gets interesting when you factor in accuracy. If predictive coding achieves 96% recall compared to 75% for linear review, the technology-assisted approach is not just cheaper and faster. It is also finding more responsive documents. The documents that linear review misses tend to be the ones with unusual language or unexpected relevance, exactly the documents that can change the trajectory of a case. Spending less money and getting better results is a combination that makes the adoption decision straightforward for firms willing to invest in the initial learning curve.
