
Lab Notebook -- Open Bankruptcy Project Explained

Last updated: April 2, 2026

What this page is

A running research notebook documenting new tools, datasets, and findings as they develop. Not public, not indexed, not linked from anywhere. Updated between conversations. Intended as a shared reference for research collaborators: what's been built, what the data says, and where the gaps are.

Current State of the Toolkit

5.1M FJC cases loaded (94 districts)
347 analysis tools (Python scripts)
38 firms scored (review + FJC cross-ref)
264 verified 1328(f) violations
30 districts with Ch.13 baselines

New: 3-Axis Composite Mill Detection Model

Built April 2, 2026

The core question: can you identify bankruptcy mills from public data alone, without relying on insider knowledge or media reports? We now have a working 3-axis model that cross-links independent signals.

Axis 1: Suppression

Google review 1-star percentage

Mills suppress or prevent negative reviews. Control group median: ~11%. Suppressed mills: 0-3%.

Weight: 0-35 points

Axis 2: Solicitation

1-review Google account ratio

Mills pad reviews with single-use accounts. Legitimate firms: 0-15%. Mills: 15-25%+.

Weight: 0-35 points

Axis 3: Outcome

Ch.13 dismissal rate delta from district baseline

Not raw dismissal rate -- the delta from the firm's own district baseline. A 48% rate means different things in a 27% district vs. a 54% district.

Weight: 0-30 points

Why district-adjusted?

Raw Ch.13 dismissal rates vary dramatically by district. Without adjustment, a firm performing at the national average gets scored identically whether it's in a low-dismissal district or a high-dismissal one. The delta isolates firm-specific underperformance from structural variation.
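A minimal sketch of the adjustment, using the 48% example from the Axis 3 description. It assumes the delta is a simple difference in percentage points; the function name is hypothetical and is not taken from `mill_composite_score.py`.

```python
def dismissal_delta(firm_rate_pct: float, district_baseline_pct: float) -> float:
    """Percentage-point gap between a firm's Ch.13 dismissal rate and its
    district baseline. Positive means worse than local peers.
    (Assumption: the model's delta is a plain difference, not a ratio.)"""
    return round(firm_rate_pct - district_baseline_pct, 1)


# The same 48% raw rate in two different districts:
in_low_district = dismissal_delta(48.0, 27.0)   # far above local peers
in_high_district = dismissal_delta(48.0, 54.0)  # slightly better than local peers
print(in_low_district, in_high_district)
```

The raw rate is identical in both calls; only the sign and size of the delta separate the two firms.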

District baselines computed from the FJC dataset range from ~21% to ~82%, a nearly 4x spread. Example range from districts with 10,000+ closed Ch.13 cases:

District Type	Ch.13 Baseline	Closed Cases
Low-dismissal district	~27%	16,000+
Mid-range district	~43%	16,000+
National average (all Ch.13)	~45%	5.1M
High-dismissal district	~63%	17,000+
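One way baselines like those in the table could be computed, as a sketch only. It assumes each closed Ch.13 case reduces to a `(district, disposition)` pair and that a district needs a minimum number of closed cases before its baseline is trusted. The field layout and the `"dismissed"` label are illustrative, not the FJC schema.

```python
from collections import defaultdict


def district_baselines(cases, min_closed=10_000):
    """Return {district: dismissal rate in percent} for districts with at
    least `min_closed` closed cases.

    `cases` is an iterable of (district, disposition) pairs; this shape is
    a hypothetical simplification of the real case records.
    """
    closed = defaultdict(int)
    dismissed = defaultdict(int)
    for district, disposition in cases:
        closed[district] += 1
        if disposition == "dismissed":
            dismissed[district] += 1
    return {
        d: 100.0 * dismissed[d] / n
        for d, n in closed.items()
        if n >= min_closed
    }


# Toy data with a lowered threshold so the example stays small.
sample = [
    ("A", "dismissed"), ("A", "discharged"),
    ("A", "discharged"), ("A", "dismissed"),
    ("B", "dismissed"),
]
baselines = district_baselines(sample, min_closed=2)
print(baselines)
```

District B drops out because it falls below the threshold, the same reason many firms return N/A on Axis 3.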

Archetype discovery

The model identifies two distinct mill archetypes through the cross-link of Axes 1 and 2:

Suppressed mills (low 1-star + high 1-review accounts): Actively manage reputation. Near-zero negative reviews. Heavy padding with single-use Google accounts. Observed: 0.3-2.5% one-star, 14-24% one-review accounts.
Unsuppressed mills (high 1-star + high 1-review accounts): Solicit reviews aggressively but don't suppress negatives. Bad outcomes leak through. Observed: 10-27% one-star, 16-22% one-review accounts.
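The two archetypes can be expressed as a small classifier. This is a sketch under stated assumptions: the cutoffs below are read off the observed ranges quoted above (0-3% one-star for suppressed, 10%+ for unsuppressed, 14%+ one-review accounts for padding). They are illustrative, not the scorer's actual decision rule.

```python
def classify_archetype(one_star_pct, one_review_pct,
                       low_star=3.0, high_star=10.0, padded=14.0):
    """Label a firm from its Axis 1 and Axis 2 signals.

    All three thresholds are hypothetical, taken from the observed ranges
    in this notebook rather than from the model itself.
    """
    if one_review_pct < padded:
        return "no padding signal"
    if one_star_pct <= low_star:
        return "suppressed"
    if one_star_pct >= high_star:
        return "unsuppressed"
    return "indeterminate"


print(classify_archetype(0.3, 23.6))   # values from known suppressed mill (A)
print(classify_archetype(26.5, 21.6))  # values from known unsuppressed mill (C)
```

The padding check comes first because both archetypes share it; the one-star percentage only decides which archetype applies.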

Both archetypes can be confirmed by Axis 3 (outcome delta), but this axis requires attorney-level case data in the firm's actual operating district.

Sample results

38 firms scored: NACBA board members, circuit leaders, known mills, and controls. Firms with fewer than 10 reviews receive a 60% confidence discount; firms with 10-19 reviews receive a 30% discount.

Firm Type	Reviews	1-Star %	1-Acct %	Delta	Score Range
Known suppressed mill (A)	734	0.3%	23.6%	+3.2	70-75
Known suppressed mill (B)	324	1.2%	14.2%	+4.0	47-48
Known unsuppressed mill (C)	1,500	26.5%	21.6%	+6.8	38
Legitimate firm (control avg)	18-280	7-23%	3-15%	-3 to -15	0-15
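The review-count discount stated above the table can be sketched as follows. It assumes the discount scales the composite score multiplicatively; the notebook does not say how the scorer applies it, and both function names are hypothetical.

```python
def confidence_discount(n_reviews: int) -> float:
    """Fraction of the score removed for thin review samples."""
    if n_reviews < 10:
        return 0.60
    if n_reviews < 20:
        return 0.30
    return 0.0


def discounted_score(raw_score: float, n_reviews: int) -> float:
    """Apply the discount multiplicatively (an assumption, see lead-in)."""
    return raw_score * (1.0 - confidence_discount(n_reviews))


for n in (5, 15, 734):
    print(n, round(discounted_score(50.0, n), 2))
```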

Key finding: Axis 3 computed from the reviewer sample understates the real outcome gap, because clients who leave Google reviews skew toward satisfied ones. In one case, the composite scorer showed a +4.0 delta from the reviewer sample, but direct PACER docket mining of the same firm's full caseload revealed an actual Ch.13 dismissal rate of 89% -- more than 3x the district baseline. The reviewer sample sees the happy clients; the docket sees everyone.

Where the Model Breaks Down

Data coverage gap: Axis 3 depends on attorney-level case data

The FJC database has 5.1M cases but does not include attorney first names or firm names -- only attorney_last (last name). Name matching produces false positives in large districts.
District-constrained matching fixes cross-district contamination but reduces hit rates. Many firms return N/A on Axis 3 because their districts lack sufficient FJC coverage.
Only 4 districts have deep FJC coverage (10,000+ closed Ch.13 cases each). The remaining 90 districts have sparse or zero closed-case data for this methodology.
Axes 1 and 2 (review signals) work everywhere -- they require only a Google Maps listing. Axis 3 (outcome) is the bottleneck.
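The trade-off in district-constrained matching can be illustrated with a toy in-memory table. Only the `attorney_last` column name comes from the FJC description above; the table name, the other columns, and the rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (case_id INTEGER, district TEXT, "
    "attorney_last TEXT, chapter INTEGER)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?, ?)",
    [
        (1, "D1", "SMITH", 13),
        (2, "D1", "SMITH", 13),
        (3, "D2", "SMITH", 13),  # a different Smith in another district
        (4, "D1", "JONES", 13),
    ],
)


def match_cases(conn, attorney_last, district=None):
    """Return case ids for a last name, optionally limited to one district."""
    sql = "SELECT case_id FROM cases WHERE attorney_last = ?"
    params = [attorney_last.upper()]
    if district is not None:
        sql += " AND district = ?"
        params.append(district)
    sql += " ORDER BY case_id"
    return [row[0] for row in conn.execute(sql, params)]


unconstrained = match_cases(conn, "Smith")      # picks up the D2 case too
constrained = match_cases(conn, "Smith", "D1")  # cross-district hit removed
print(unconstrained, constrained)
```

The district filter removes the cross-district false positive, but two different attorneys named Smith in the same district would still be merged. That is the residual error exact attorney-firm linkage would eliminate.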

What AACER would unlock

AACER provides attorney-level case data nationally: attorney names (first and last), firm names, filing dates, dispositions, prior filing history, and case-level detail across all 94 districts. With AACER:

Axis 3 goes from 4-district coverage to 94-district coverage
Name matching goes from fuzzy debtor-name LIKE queries to exact attorney-firm linkage
Enables the first national attorney-level outcome benchmarking -- every consumer bankruptcy attorney in the country scored against their district peers
Combined with the review signals (Axes 1-2), produces a validated mill detection model testable against the full population
The 392,412 prior-filer discharge count could be refined to an actual verified violation count with attorney attribution

Tool Inventory (Research-Relevant)

Tool	What it does	Data source
`mill_composite_score.py`	3-axis composite scorer with district baselines	Google reviews + FJC 5.1M cases
`review_scrape.py`	Google Maps review scraper (CDP automation)	Google Maps
`review_audit.py`	Cross-reference reviewers against FJC cases	Scraped reviews + FJC
`run.py check [name]`	Attorney scorecard: caseload, dismissal rate, chapter mix, red flags	FJC 5.1M cases
`run.py scan`	Blind outlier detection across all attorneys in dataset	FJC
`run.py predict [name]`	ML mill probability (logistic regression, 7 features)	FJC
`run.py compare`	Subject firm vs. same-district controls	FJC
`run.py worst N`	Worst N attorneys by composite score	FJC
1328(f) Screener	Client-side SQL.js tool checking discharge eligibility	PACER case data (user-provided)
FJC MCP Server	Natural language queries against FJC data	FJC 5.1M cases

Research Log

April 2, 2026

Built 3-axis composite mill scorer. Cross-linked review suppression, review padding, and Ch.13 dismissal rate delta from district baseline. Scored 38 firms (NACBA board/leaders, known mills, controls). Discovery: district adjustment is essential -- without it, firms in low-baseline districts appear similar to firms in high-baseline districts despite vastly different performance relative to local peers. Identified data gap: FJC coverage insufficient for Axis 3 in most districts. AACER data would close the gap.

April 1, 2026

Built Google review scraping and audit pipeline. CDP-based scraper for Google Maps reviews. Forensic cross-reference engine matching reviewer names against 5.1M FJC bankruptcy cases (nickname expansion, fuzzy matching). Scraped 38 firms. Discovered two mill archetypes: review-suppressed (0-2% one-star) and unsuppressed (10-27% one-star). Both show elevated one-review Google accounts (15-25%).

March 30, 2026

Built national audit tool. 9-screen detection model (inspired by Madoff-style red flags: outsized returns without visible mechanism). 93-district heatmap visualization. Automated detection of all previously-known mill attorneys validated against manual identification.

March 25, 2026

First research call. Methodology reviewed -- no red flags identified. Co-authorship interest expressed. AACER dataset access discussed as path to national coverage. LoPucki introduction mentioned.

March 17-23, 2026

Rules Committee submission accepted. Proposed amendment to Rule 4004 (mandatory 1328(f) verification with docket notation). Assigned docket number 26-BK-3. Published on uscourts.gov.

Open Research Questions

Can the composite model predict mills out-of-sample? Train on known mills, test against unlabeled firms. Requires AACER for Axis 3 coverage nationally.
What is the true 1328(f) violation rate? The 392,412 figure is the universe of cases requiring verification. With AACER date data, the actual violation count could be computed directly.
Is there an attorney-level correlation between dismissal rate and 1328(f) violation rate? Hypothesis: high-dismissal attorneys are more likely to file barred cases because they churn clients through repeat filings.
Does review manipulation correlate with other misconduct indicators? Axes 1-2 are cheap to compute (no PACER costs). If they predict Axis 3 outcomes, review data alone could serve as a national screening tool.
District-level structural variation: Why does the Ch.13 dismissal baseline range from 21% to 82% across districts? Judicial culture, debtor demographics, trustee behavior, or attorney quality? The FJC data can answer this with the right controls.

What Comes Next

With AACER access

Populate Axis 3 for all 94 districts -- complete the composite model
Attorney-level national benchmarking (every consumer bankruptcy attorney, scored)
Verified 1328(f) violation count with attorney attribution
Publishable dataset for peer review

Without AACER access

Continue targeted PACER pulls for specific firms (expensive, slow)
Publish Axes 1-2 findings (review signals alone) as a preliminary paper
Expand review scraping to top 200 consumer bankruptcy firms nationally
Use RECAP/CourtListener for additional case enrichment