Blue Cross Blue Shield — AI & Machine Learning Research Assistant
- Fine-tuned Med42-8B LLM using a QLoRA pipeline in Python, reducing training loss from 2.49 → 0.09 across 15,000+ clinical prompt-response pairs for oncology decision support.
- Engineered a synthetic dataset of 5,200+ TCGA-LUAD–style patient records covering NCCN guideline adherence and mutation-driven reasoning, generating 30,000+ instruction-tuned training pairs.
- Closed a JSONL data quality gap by redesigning pipeline logic, raising clinical citation coverage to 75%+ and improving model reliability.
- Deployed and evaluated Med42 LLM, achieving 72% accuracy on USMLE benchmarks and outperforming GPT-3.5 by ~11 points.
- Built SQL-based pipelines to analyze 10,000+ lung cancer EHR records for patient survival patterns.