Startup Projects
VizYourGov is a startup that aggregates and visualizes national/state campaign finance transactions, voting records and donor data to provide greater transparency and voting guidance…coming later in 2024.
-
Data Engineering: Rewrote data pipeline
Rewrote the data and web-scraping pipeline used to collect, process, store and aggregate hundreds of GBs of campaign finance transactions, voting records and politician profile data. Migrated from manual and cron orchestration of Python scripts and SQL stored procedures to Dagster-orchestrated Python assets and dbt-based SQL transforms. This resulted in:
Increased robustness via
- Tracking data quality metrics
- Materialization retries
- Runtime type checking
Accelerated development velocity by
- Enabling a local development environment by abstracting I/O and resources
- Lowering the barrier to contribution and increasing pipeline visibility via the asset catalog and end-to-end data lineage
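One way to picture the I/O abstraction mentioned above: the transform depends only on a small storage interface, so the same logic can run against an in-memory store locally and a real backend in production. This is a minimal sketch under that assumption; `Storage`, `InMemoryStorage`, and `total_by_donor` are hypothetical names, not the project's code.

```python
from typing import Protocol


class Storage(Protocol):
    """Minimal I/O interface that the transform logic depends on."""

    def write(self, key: str, rows: list[dict]) -> None: ...
    def read(self, key: str) -> list[dict]: ...


class InMemoryStorage:
    """Local/test implementation; a production one might wrap S3 or Postgres."""

    def __init__(self) -> None:
        self._data: dict[str, list[dict]] = {}

    def write(self, key: str, rows: list[dict]) -> None:
        self._data[key] = list(rows)

    def read(self, key: str) -> list[dict]:
        return list(self._data[key])


def total_by_donor(storage: Storage, source_key: str, target_key: str) -> None:
    """Sum contribution amounts per donor without knowing where data lives."""
    totals: dict[str, float] = {}
    for row in storage.read(source_key):
        totals[row["donor"]] = totals.get(row["donor"], 0.0) + row["amount"]
    storage.write(
        target_key,
        [{"donor": donor, "total": total} for donor, total in sorted(totals.items())],
    )


storage = InMemoryStorage()
storage.write(
    "contributions",
    [
        {"donor": "a", "amount": 10.0},
        {"donor": "b", "amount": 5.0},
        {"donor": "a", "amount": 2.5},
    ],
)
total_by_donor(storage, "contributions", "donor_totals")
print(storage.read("donor_totals"))
# prints [{'donor': 'a', 'total': 12.5}, {'donor': 'b', 'total': 5.0}]
```

Because `total_by_donor` never touches a concrete backend, a contributor can run and test it on a laptop without cloud credentials.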
Enhanced pipeline efficiency via
- Concurrent execution
- Incremental materialization
- Asset/pipeline partitioning
- Standardization around Apache Parquet and Arrow ecosystem (IPC, Flight, ADBC…)
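The partitioning and incremental materialization items above can be sketched in plain Python: split the data into daily partitions and compute only those not yet materialized. The function names, the daily granularity, and the dates are assumptions for illustration, not the project's code or Dagster's API.

```python
from datetime import date, timedelta
from typing import Callable, Iterable


def daily_partitions(start: date, end: date) -> list[str]:
    """Return one ISO-date partition key per day in [start, end]."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def materialize_missing(
    all_keys: Iterable[str],
    materialized: set[str],
    compute: Callable[[str], None],
) -> list[str]:
    """Run `compute` only for partitions that are not yet materialized."""
    newly_done = []
    for key in all_keys:
        if key in materialized:
            continue  # incremental: skip work already done
        compute(key)
        materialized.add(key)
        newly_done.append(key)
    return newly_done


keys = daily_partitions(date(2024, 1, 1), date(2024, 1, 4))
done = {"2024-01-01", "2024-01-02"}  # pretend these ran previously
processed = materialize_missing(keys, done, compute=lambda key: None)
print(processed)  # prints ['2024-01-03', '2024-01-04']
```

Since each partition is independent, the missing ones could also be dispatched concurrently; the sketch keeps them sequential for clarity.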
Dagster, dbt, Apache Arrow, Arrow ADBC, Apache Parquet, pandas, Postgres, DuckDB, asyncio, Beautiful Soup
-
Cloud Infrastructure: Rearchitected AWS platform
Migrated infrastructure from a self-hosted monolith to a serverless architecture with a managed Postgres instance and IAM-based authentication. Additionally, I implemented a GitHub Actions-based CI/CD pipeline that leveraged AWS CDK to create per-branch staging environments. Overall, the migration significantly reduced downtime and increased the deployment cadence from quarterly to weekly by eliminating the overhead of custom AWS configurations and shortening the code review feedback loop.
AWS: Route 53, CloudFront, S3, App Runner, RDS, CDK, Docker, GitHub Actions