Three research initiatives target qualitative friction channels: geopolitical escalation, hedging schedules disclosed in SEC footnotes, and human capital diligence on the workforce. By mapping messy, unstructured records to bounded risk indices, these frameworks aim to build the numbers before the narrative.
A scoring engine that takes real-time conflict data from GDELT and ACLED and turns it into a daily variance score for each region. The core bet: geopolitical instability predicts how much a linked ETF moves, not which way.
A research engine that parses unstructured SEC footnotes with Gemini AI to score proved reserves, hedging contract floors, and unit margins. The core bet: micro-fundamental quality signals predict stock outperformance.
A diagnostic framework that scores workforce health for private equity targets before close. Five components, anchored to public records, with sector normalisation in progress. The output is a single composite score that goes into the data room alongside the financials.
Most risk conversations in finance end with a qualitative opinion: a country rating, a management bio, an analyst note. Those outputs cannot be backtested, compared across deals, or revisited after the fact.
Our projects start from the same constraint: take messy, partial inputs and produce one bounded, sector-normalised score. The score is the deliverable. Everything around it is context.
Goldstein publishes a 29.2% false positive rate and a 252-day warm-up exclusion per region. CHC explicitly disclaims predictive ability and frames its output as a descriptive diagnostic.
None of our projects claim more than the data actually supports. A coefficient needs to clear statistical significance. A score needs a documented methodology and a pipeline someone else could reproduce.
Conflict data moves volatility, not price. This project tests that idea across 12 global chokepoints: does real-time instability from GDELT and ACLED predict how much a linked sector ETF moves, rather than which direction?
Most geopolitical risk tools end in a qualitative output: a country rating, an analyst note, a headline index. None of them produce a number you can backtest.
This project asks something narrower: does event-level conflict data predict variance increases in linked financial instruments? Not returns, not direction. Variance. That distinction determines whether the signal is practically useful or just intellectually interesting.
Across 12 chokepoints and four years of data, the answer is yes, with statistically significant coefficients and a hit rate that survives out-of-sample testing.
The Geopolitical Risk Premium Score (GRPS) has three components: event-based instability from GDELT and ACLED, sector volatility premium versus rolling benchmarks, and VIX z-score conditioning to account for broader market fear.
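The GRPS scoring engine is private. What follows is only a hypothetical sketch of how three standardised inputs could be folded into a bounded 0–100 score. The weights, the interaction form used for VIX conditioning, the logistic squashing, and the name `grps_score` are all assumptions for illustration. They are not the project's method.

```python
import math

# Hypothetical coefficients for illustration only; the production GRPS
# weighting sits in the private scoring engine and is not reproduced here.
W_INSTABILITY = 0.6   # event-based instability (GDELT/ACLED), as a z-score
W_VOL_PREMIUM = 0.4   # sector volatility premium vs rolling benchmark, as a z-score
W_VIX_INTERACT = 0.2  # how strongly market-wide fear amplifies local instability


def grps_score(instability_z: float, vol_premium_z: float, vix_z: float) -> float:
    """Combine three standardised inputs into a bounded 0-100 score.

    VIX enters as a conditioning term: it multiplies the instability
    component rather than being added on its own, so calm markets damp the
    geopolitical signal and fearful markets amplify it.
    """
    raw = (
        W_INSTABILITY * instability_z
        + W_VOL_PREMIUM * vol_premium_z
        + W_VIX_INTERACT * instability_z * vix_z
    )
    raw = max(-50.0, min(50.0, raw))  # guard math.exp against overflow
    # Logistic squashing keeps the output inside the 0-100 band.
    return 100.0 / (1.0 + math.exp(-raw))


print(round(grps_score(0.0, 0.0, 0.0), 1))  # 50.0: every input at its mean
```

Treating VIX as an interaction term rather than a fourth additive signal matches the idea that market fear modulates how much local instability shows up in sector variance.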
Each region goes through a 252-day warm-up before any score gets issued. No regime label is published until that window closes. The false positive rate, 29.2%, is included in every summary; it is not buried.
The scoring engine itself is private. The data pipeline, fetchers, quality checks, and backtest framework are all open-source and documented.
Across all 12 regions, γ coefficients for geo-to-variance range from 0.784 to 0.934. No significant return-prediction relationship showed up. The variance channel is the one that works.
323 threshold-crossing events were validated across a 21-day forward window. Average hit rate for volatility exceeding the 75th percentile after a crossing: 64.4%. Because the signal measures a structural regime rather than an inefficiency, it is not expected to decay once published.
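The hit-rate statistic can be sketched as follows, under stated assumptions. A "crossing" is taken to be a day on which the score moves from below a threshold to at or above it. A "hit" is taken to mean that average realised vol over the next 21 trading days exceeds the sample's 75th percentile. Both definitions and the helper names are hypothetical.

```python
import statistics


def crossing_days(score: list[float], threshold: float) -> list[int]:
    """Indices where the score moves from below the threshold to at or above it."""
    return [t for t in range(1, len(score)) if score[t - 1] < threshold <= score[t]]


def hit_rate(score: list[float], realised_vol: list[float],
             threshold: float, horizon: int = 21) -> tuple[float, int]:
    """Share of crossings followed by above-p75 average vol over the next `horizon` days.

    Returns (hit rate, number of evaluable events). The 75th percentile is
    taken over the whole sample here, which leaks future information; a
    stricter test would use an expanding or rolling window.
    """
    p75 = statistics.quantiles(realised_vol, n=4)[2]  # third quartile cut point
    hits = events = 0
    for t in crossing_days(score, threshold):
        window = realised_vol[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            continue  # not enough forward data to judge this event
        events += 1
        if statistics.fmean(window) > p75:
            hits += 1
    return (hits / events if events else float("nan")), events
```

Events too close to the end of the sample are skipped rather than scored on a truncated window, so the event count can be smaller than the raw number of crossings.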
The ETF proxy has to isolate region-specific exposure. Early iterations used XLE for multiple regions and got mathematically identical signals. Each of the 12 final proxies was chosen to maximise independence of the geopolitical channel.
The continuous 0–100 score adds less information than the three-regime label (STABLE, ELEVATED, CRITICAL) for practical risk management. The regime label is the summary output; the score is the audit trail.
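Publishing the label while keeping the score as the audit trail amounts to carrying both in the same record. The regime cut-offs are not published, so the values 40 and 70 below are placeholders. `RegimeCall` and `classify` are hypothetical names.

```python
import bisect
from dataclasses import dataclass

# Placeholder cut-offs on the 0-100 score; the project's thresholds are not published.
CUTOFFS = (40.0, 70.0)
LABELS = ("STABLE", "ELEVATED", "CRITICAL")


@dataclass(frozen=True)
class RegimeCall:
    label: str    # summary output for risk users
    score: float  # continuous score retained as the audit trail


def classify(score: float) -> RegimeCall:
    """Map a 0-100 score to a regime label, keeping the raw score alongside it."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    # bisect_right puts a score equal to a cut-off into the higher regime.
    return RegimeCall(LABELS[bisect.bisect_right(CUTOFFS, score)], score)


print(classify(55.5))
```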
Adding VIX z-score as a conditioning variable cut false positives by roughly 8 percentage points versus the unconditioned model. Broader market fear modulates how much local instability actually translates into sector variance.
The first 252 trading days per region are excluded from all validation, no exceptions. Evaluating before enough data accumulates produces inflated coefficients. Published results only reflect the post-warmup window.
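Because each region enters the sample on its own start date, the exclusion is applied per region rather than at one calendar date. Here is a minimal sketch, assuming each region's list holds exactly one observation per trading day. `post_warmup` is a hypothetical helper.

```python
from datetime import date

WARMUP_DAYS = 252  # trading days excluded per region before any validation


def post_warmup(panel: dict[str, list[tuple[date, float]]],
                warmup: int = WARMUP_DAYS) -> dict[str, list[tuple[date, float]]]:
    """Drop each region's first `warmup` trading-day observations.

    Sorting by date first makes the cut correct even if observations arrive
    out of order. Regions with fewer than `warmup` days come back empty.
    """
    return {region: sorted(obs)[warmup:] for region, obs in panel.items()}
```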
| Region | ETF Proxy | γ (geo → variance) | p-value | Rationale |
|---|---|---|---|---|
| Middle East | XLE | 0.934 | < 0.001 | Energy sector — direct oil supply exposure |
| Eastern Europe | XME | 0.918 | < 0.001 | Metals & mining — commodity shock channel |
| Taiwan Strait | SOXX | 0.897 | < 0.001 | Semiconductors — TSMC supply chain risk |
| Remaining 9 regions | — | — | — | 90-day post-warm-up validation window in progress |
Public financial screens skip what matters most: hedging cushions, proved reserve replacement lifespans, and well-unit margins. By extracting SEC footnotes with Gemini AI, Caligula constructs a sector-normalised corporate quality index. Each covered ticker in the 31-name study universe gets a footnote-driven deep dive.
Each ranked name carries a tier, a composite score, and eight pillar sub-scores: unit economics, capital discipline, balance sheet, hedge book, reserves, operational, sentiment, and macro.
Scores are normalised across the universe from weighted E&P sub-scores. Point-in-time dates are enforced via EDGAR cutoff indexes, so no ranking uses a filing before it was public.
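The two safeguards in that note can be sketched together. An as-of filter keeps only the latest filing per ticker that was public by the rebalance date. A cross-sectional rescale then puts the universe on a common 0–1 scale. This is a hypothetical illustration: the `Filing` record, its fields, the tickers `AAA`/`BBB`, and the choice of min-max scaling are all assumptions rather than Caligula's actual pipeline.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Filing:
    ticker: str
    filed: date         # date the filing became public on EDGAR
    period_end: date    # fiscal period the filing covers
    score: float        # hypothetical weighted sub-score extracted from footnotes


def as_of(filings: list[Filing], cutoff: date) -> dict[str, Filing]:
    """Latest filing per ticker that was public on or before `cutoff`."""
    latest: dict[str, Filing] = {}
    for f in filings:
        if f.filed > cutoff:
            continue  # not yet public at the rebalance date: no look-ahead
        if f.ticker not in latest or f.filed > latest[f.ticker].filed:
            latest[f.ticker] = f
    return latest


def normalise(scores: dict[str, float]) -> dict[str, float]:
    """Min-max rescale raw scores across the universe to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 0.5 for k in scores}  # no dispersion: everyone sits mid-scale
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}
```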
A 12-year point-in-time simulation, rebalanced every quarter. At each rebalance we buy the top quartile of highest-ranked names (Long basket) and short the bottom quartile (Short basket) to test whether the 8-pillar scoring engine carries a predictive, risk-adjusted edge.
The simulation runs across two studies, the focused Permian Basin E&P Study and a diversified General Corporate Equities index. The Sharpe improvement is the experiment: does pulling qualitative detail out of footnotes actually move the risk-adjusted return?
| Quarter | Long Return | Short Return | L/S Net | N (Long) | N (Short) |
|---|---|---|---|---|---|
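One rebalance of the quartile long/short experiment, and the Sharpe ratio of the resulting quarterly series, can be sketched as below. Several assumptions apply: equal weighting inside each basket, no transaction costs or borrow fees, and annualisation by √4. The helper names and the toy tickers `A`–`H` are hypothetical.

```python
import math
import statistics


def quarter_long_short(ranked: dict[str, float],
                       fwd_return: dict[str, float]) -> tuple[float, float, float]:
    """One rebalance: long the top quartile, short the bottom quartile, equal-weighted.

    Returns (long return, short-basket return, long-short net), where net is
    long minus short before costs, borrow fees and cash interest.
    """
    names = sorted(ranked, key=ranked.get, reverse=True)  # best composite first
    k = max(1, len(names) // 4)
    long_ret = statistics.fmean(fwd_return[n] for n in names[:k])
    short_ret = statistics.fmean(fwd_return[n] for n in names[-k:])
    return long_ret, short_ret, long_ret - short_ret


def annualised_sharpe(quarterly_net: list[float], rf_quarterly: float = 0.0) -> float:
    """Sharpe ratio of quarterly returns, scaled by sqrt(4) to annualise."""
    excess = [r - rf_quarterly for r in quarterly_net]
    return statistics.fmean(excess) / statistics.stdev(excess) * math.sqrt(4)
```

Comparing `annualised_sharpe` across runs with and without the footnote-derived pillars is one way to frame the Sharpe-improvement test described above.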
Human capital is usually the biggest driver of value in a buyout, and the least quantified input in the deal memo. The CHC Platform turns workforce risk signals into a composite score that goes into the data room alongside the financials, before close.
PE due diligence is rigorous where data is structured: financials, legal, tax, customer concentration. But the moment you get to the people actually running the business, it turns qualitative and essentially unverifiable.
Consulting firms have written about this gap for decades. Two-thirds of post-merger integration failures trace back to culture and workforce issues, yet most deal memos give human capital fewer than two pages, usually just a management bio.
The problem is not that buyers don’t care about workforce health. It is that no structured, scoreable instrument exists to assess it. This framework is an attempt to build one.
Large-cap public targets have Glassdoor data, 10-K disclosures, analyst coverage, LinkedIn headcount signals. They are still hard to assess, but the signals at least exist.
Private companies with $50M–$500M EBITDA, the core of the middle market, have almost none of this. No EDGAR filings, no Glassdoor depth, no public attrition data. PE sponsors acquiring these companies are pricing workforce risk with essentially no structured input.
That is the segment this platform focuses on. The data architecture is built for the constraints of private, mid-market targets: it anchors on public-record signals and primary survey data collected during diligence, instead of relying on historical disclosures the target never made.
The HCV Index is the quantitative core of the platform. It produces a single composite score, bounded 0 to 1, from five independent sub-components. Each component draws from a different data layer. Sector normalisation against published industry benchmarks is in progress; until it lands, scores are most meaningful within a single sector rather than across deals.
The platform produces a Cultural Health Certificate, a PDF that can sit in an M&A data room next to the financial and legal due diligence reports.
The report includes the composite HCV score, each sub-component with its contributing signals, a four-tier risk classification (GREEN / YELLOW / ORANGE / RED), sector-relative percentile, and a narrative flag section summarising material risks for the deal team.
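Two properties of the composite can be sketched under stated assumptions. If the five sub-scores are clipped to [0, 1] and the weights sum to 1, the weighted composite cannot leave [0, 1]. A descending list of cut-offs then yields a monotone four-tier label. The weights and tier cut-offs below are hypothetical placeholders, since the published design keeps them proprietary. "Higher means healthier" is also an assumption.

```python
# Hypothetical weights for the five sub-components (they sum to 1); the
# project's weights come from practitioner literature and are not shown here.
WEIGHTS = (0.25, 0.25, 0.20, 0.15, 0.15)

# Hypothetical monotone cut-offs, assuming a higher composite means healthier.
TIERS = ((0.75, "GREEN"), (0.55, "YELLOW"), (0.35, "ORANGE"), (0.0, "RED"))


def hcv_composite(sub_scores: list[float]) -> float:
    """Weighted average of five sub-scores, each clipped to [0, 1]."""
    if len(sub_scores) != len(WEIGHTS):
        raise ValueError("expected five sub-component scores")
    clipped = [min(1.0, max(0.0, s)) for s in sub_scores]
    return sum(w * s for w, s in zip(WEIGHTS, clipped))


def risk_tier(score: float) -> str:
    """First tier whose cut-off the score meets, scanning from healthiest down."""
    for cutoff, label in TIERS:
        if score >= cutoff:
            return label
    raise ValueError(f"score below range: {score}")
```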
The sell-side deployment model means the target company commissions the CHC during deal prep, typically 60–90 days before marketing, and includes it in the Virtual Data Room. Investment banks advising the process become the natural distribution channel.
The HCV Index is a descriptive diagnostic, not a predictive model. It does not claim to forecast post-close turnover or EBITDA impact with precision.
The claim is narrower and more defensible: a structured, multi-signal composite score gives better diligence coverage than unstructured management interviews alone, and it produces a comparable, archivable record of the assessment.
Component weights draw on published practitioner literature (Korn Ferry, Deloitte Human Capital Trends, Mercer workforce risk indices) and are designed to be recalibrated as engagement data accumulates.
| Property | Design Target | Implementation | Status |
|---|---|---|---|
| HCV Score Range | 0.0 – 1.0 bounded | Proprietary normalisation over weighted sub-scores · Sector adjustment in progress | ✓ Bounded |
| Risk Classification | 4-tier: GREEN / YELLOW / ORANGE / RED | Proprietary monotone thresholds | ✓ Implemented |
| WARN Act Hard Floor | BDR floor trigger | Programmatic override in BDR computation · Cannot be suppressed | ✓ Enforced |
| Survey Anonymisation | No PII stored at any point | UUID4 tokens · No name/email linkage · AES-256-GCM encrypted | ✓ By design |
| Single-use Survey Tokens | Each respondent token: one submission | Token invalidated on first use · Duplicate submission returns 400 | ✓ Enforced |
| Authenticated Encryption | Tamper detection on all stored data | AES-256-GCM · InvalidTag exception on any bit-flip · No silent corruption | ✓ GCM Mode |
| Sector Normalisation | Cross-sector comparability | Published benchmark means/stdevs (Manufacturing, Healthcare, SaaS/Tech, default) | In Progress |
| Component Weight Source | Grounded in literature | Korn Ferry, Deloitte, Mercer practitioner research · Recalibration on engagement data | Proposed |
| Private Company Data Coverage | Works without Glassdoor / 10-K | Job posting history (Apify) · WARN Act · OSHA · PACER · Primary survey | ✓ Public-record anchored |
| Report Delivery | Data-room ready PDF, 72-hr turnaround | WeasyPrint + Jinja2 pipeline · JWT-authenticated delivery link | In Development |
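The single-use token row can be illustrated with an in-memory sketch. Tokens are random UUID4 strings with no name or email attached. A token is invalidated on first use, and any repeat or unknown token gets a 400. `TokenRegistry` is a hypothetical class, and the 200 success code is an assumption. A production service would also need the check-and-invalidate step to be atomic, for example inside a database transaction, and would encrypt stored answers.

```python
import uuid


class TokenRegistry:
    """Issues anonymous survey tokens that can each be redeemed exactly once."""

    def __init__(self) -> None:
        self._unused: set[str] = set()
        self._responses: list[dict] = []  # stored without any token or identity

    def issue(self, n: int) -> list[str]:
        """Mint `n` random UUID4 tokens; nothing links them to a respondent."""
        tokens = [str(uuid.uuid4()) for _ in range(n)]
        self._unused.update(tokens)
        return tokens

    def submit(self, token: str, answers: dict) -> int:
        """Return an HTTP-style status code for one survey submission."""
        if token not in self._unused:
            return 400  # unknown or already-used token
        self._unused.discard(token)  # invalidate on first use
        self._responses.append(dict(answers))
        return 200
```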
Get in touch with methodology questions, academic discussion, collaboration proposals, or requests for the full documentation and validation walkthrough. This is a research portfolio, and correspondence is welcome from anyone engaging seriously with the work, whether in academia, private equity, banking, or human capital consulting.