THETIS-MRV API
FastAPI · PostgreSQL · Emissions Data
Overview
Every time I needed EMSA data for research, I had to download the latest Excel file, figure out which columns had shifted since the previous year, and spend twenty minutes cleaning before I could ask a single question. That gets old fast, so I built an API instead: 87,000 records covering 2018 to 2024, served through clean, consistent endpoints.
The challenge
The source files changed schema almost every year. Field names shifted, units were inconsistent, and some years had coverage that others lacked. There was no stable way to query across the full dataset without redoing the cleanup each time. I needed something that absorbed the mess upstream and handed back clean, consistent data regardless of which year you were looking at.
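To make the drift concrete, here is a minimal sketch of how differing headers and units can be folded into one canonical row. The column names, the kilogram variant, and the `normalize_row` helper are hypothetical illustrations, not the actual EMSA headers or the project's importer code.

```python
from typing import Any, Optional

# Hypothetical header variants per canonical text field (not the real EMSA names).
TEXT_FIELDS: dict[str, list[str]] = {
    "imo_number": ["IMO Number", "IMO"],
    "ship_name": ["Name", "Ship name"],
}

# Hypothetical numeric variants: (source header, divisor to reach tonnes).
NUMERIC_FIELDS: dict[str, list[tuple[str, float]]] = {
    "co2_tonnes": [
        ("Total CO2 emissions [m tonnes]", 1.0),
        ("CO2 emissions (kg)", 1000.0),
    ],
}

MISSING = (None, "", "N/A")


def normalize_row(raw: dict[str, Any], year: int) -> dict[str, Any]:
    """Map one raw spreadsheet row onto a stable set of field names and units."""
    row: dict[str, Any] = {"reporting_year": year}

    for field, headers in TEXT_FIELDS.items():
        # Take the first header variant that is present and non-empty.
        value = next((raw[h] for h in headers if raw.get(h) not in MISSING), None)
        row[field] = str(value).strip() if value is not None else None

    for field, sources in NUMERIC_FIELDS.items():
        converted: Optional[float] = None
        for header, divisor in sources:
            value = raw.get(header)
            if value in MISSING:
                continue
            converted = float(value) / divisor  # convert to tonnes
            break
        row[field] = converted  # stays None when a year lacks the field

    return row
```

Keeping the variants in plain dictionaries means a new release with renamed headers only needs a new entry in the mapping, while the output shape stays the same.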
The approach
- 01 Built importer scripts to normalize source fields and map them into a stable PostgreSQL schema in Supabase, handling year-to-year differences without breaking the API layer.
- 02 Implemented FastAPI endpoints for year summaries, ship lookups, fleet totals, and grouped aggregates with filtering, pagination, and proper validation.
- 03 Kept the endpoints practical for real data workflows: not a demo layer, but something you can wire directly into analysis scripts.
- 04 Set up a weekly GitHub Actions sync so new EMSA releases flow in automatically without manual reruns.
Outcome
The part of my workflow that used to start with a file download and twenty minutes of cleanup now starts with a GET request. That sounds like a small thing, but it changes which questions feel worth asking, especially across multiple years. The weekly sync keeps the data current without manual effort.