Projects

Things I've built, researched, and shipped

Car Specifications Dataset

Open-source dataset covering 44,934 car models and variations mass-produced from 1985 to early 2022. Includes a Scrapy-based crawler with clear instructions for re-crawling.

Python Scrapy Open Data
View Repository

E-commerce Product Classification

Module that classifies e-commerce product names into four categories. Uses sBERT and phoBERT transformer embeddings with a custom two-layer neural network. ONNX-accelerated inference, deployed on Streamlit Cloud.

PyTorch ONNX sBERT phoBERT Streamlit

AML Data Serving Pipeline

Bank-wide Anti-Money Laundering (AML) data serving from the data lake at Techcombank. Built 10+ ETL jobs and contributed 200+ features to the Risk Datamart. Part of the Credit Application Fraud detection system.

PySpark Databricks AWS Enterprise