COTexplorer/README.md
Greg 37f8eac932 Initial commit: CFTC COT Explorer
FastAPI application that ingests CFTC Commitments of Traders data into SQLite
and exposes it via a REST API with analytics endpoints (screener, percentile rank,
concentration). Includes CLI for historical and weekly data ingestion, Docker setup,
and a frontend.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 11:23:00 +01:00

148 lines
5.1 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# CFTC COT Explorer
A tool for ingesting, storing, and querying CFTC (Commodity Futures Trading Commission) Commitments of Traders (COT) report data. Provides a REST API backed by a SQLite database.
## Overview
The CFTC publishes weekly COT reports showing the positioning of commercial, non-commercial, and non-reportable traders across futures markets. This project:
- Parses the weekly HTML reports and historical ZIP archives from the CFTC website
- Stores the data in a SQLite database
- Exposes the data via a FastAPI REST API with analytics endpoints
## Project Structure
```
.
├── app/
│ ├── db.py # Database connection and initialization
│ ├── ingestion/
│ │ ├── cli.py # Command-line ingestion tool
│ │ ├── importer.py # Import logic (HTML, ZIP, download)
│ │ └── parser.py # HTML/ZIP parser
│ └── api/
│ ├── main.py # FastAPI application entry point
│ ├── models.py # Pydantic response models
│ └── routes/
│ ├── commodities.py # /api/commodities, /api/exchanges
│ ├── positions.py # /api/positions/{code}/...
│ ├── analytics.py # /api/analytics/...
│ └── reports.py # /api/reports/...
├── data/ # SQLite database and downloaded HTML files
├── schema.sql # Database schema
└── cftc_downloader.py # Standalone downloader script
```
## Setup
**Requirements:** Python 3.10+
```bash
pip install fastapi uvicorn requests beautifulsoup4
```
**Initialize the database:**
```bash
python -m app.ingestion.cli init-db
```
## Data Ingestion
### Import local HTML files
If you have weekly HTML files saved in `./data/`:
```bash
python -m app.ingestion.cli import-local-html --data-dir ./data
```
### Download and import the latest weekly report
```bash
python -m app.ingestion.cli download-and-import
```
### Import the full historical archive (1995present)
```bash
python -m app.ingestion.cli import-history --start-year 1995 --end-year 2026
```
### Import a specific file
```bash
python -m app.ingestion.cli import-html data/2026-03-10_deacbtlof.htm
python -m app.ingestion.cli import-zip deahistfo2024.zip
```
### Check database status
```bash
python -m app.ingestion.cli status
```
Ingestion is idempotent — re-running any import command will skip already-imported sources.
## Running the API
```bash
uvicorn app.api.main:app --reload
```
The API will be available at `http://localhost:8000`. Interactive docs are at `http://localhost:8000/docs`.
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `DB_PATH` | `app/data/cot.db` | Path to the SQLite database file |
## API Endpoints
### Commodities
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/exchanges` | List all exchanges with commodity counts |
| `GET` | `/api/commodities` | List all commodities (filter by `?exchange=CME`) |
| `GET` | `/api/commodities/{cftc_code}` | Get metadata for a single commodity |
### Positions
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/positions/{cftc_code}/latest` | Latest report with all row types and concentration data |
| `GET` | `/api/positions/{cftc_code}/history` | Time series of positions (supports `from_date`, `to_date`, `row_type`) |
| `GET` | `/api/positions/{cftc_code}/extremes` | All-time min/max for open interest and net positions |
| `GET` | `/api/positions/compare` | Compare a metric across multiple commodities (comma-separated `codes`) |
### Analytics
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/analytics/screener` | Rank markets by non-commercial net position percentile |
| `GET` | `/api/analytics/{cftc_code}/net-position-percentile` | Percentile rank and z-score for current net position |
| `GET` | `/api/analytics/{cftc_code}/concentration` | Largest-trader concentration data over time |
#### Screener parameters
| Parameter | Default | Description |
|-----------|---------|-------------|
| `exchange` | — | Filter by exchange abbreviation |
| `lookback_weeks` | `156` | Historical window for percentile calculation (41560) |
| `top_n` | `50` | Number of results to return |
| `direction` | — | Filter to `long` (≥50th pct) or `short` (<50th pct) |
## Database Schema
The SQLite database contains five tables:
- **`commodities`** one row per unique market (CFTC code, name, exchange)
- **`reports`** one row per (commodity, report date)
- **`positions`** position data per report, split into `All`, `Old`, and `Other` row types; includes open interest, long/short counts, week-over-week changes, percent of open interest, and trader counts
- **`concentration`** largest-trader concentration ratios (top 4 and top 8 traders, gross and net)
- **`import_log`** tracks which source files have been processed
A convenience view `v_net_positions` joins all tables and pre-computes net positions (long minus short) for each trader category.