Professional Experience

Reinforce Labs

Remote

Feb 2026 — Present

CBRN Risk Evaluation: Evaluate frontier AI models for chemical, biological, radiological, and nuclear risk — assessing whether models can provide meaningful uplift for weapons-relevant knowledge.
AI Safety Research: Develop evaluation methodologies and red-teaming protocols for scientific misuse prevention across large language models.

Remote

Sept 2025 — Present

Scientific Review: Serve as final-stage reviewer for 24,000 chemistry prompts used in LLM training — spanning molecular dynamics, electroanalytical methods, protein aggregation, quantum mechanics, and spectroscopic characterization (NMR, MS).
Reasoning Integrity: Audit prompt-response pairs for conceptual accuracy, flawed reasoning chains, and deviation from current literature. Each prompt must be solvable de novo from first principles.
Prompt Engineering: Collaborate with PhD research fellows to architect high-difficulty reasoning questions that stress-test model understanding of graduate-level chemistry.

Remote

Nov 2024 — Sept 2025

Safety Systems: Reviewed 140,000+ multimodal prompts (text, image, video) for harmful content generation — built detection patterns for CSAM, PII leakage, and adversarial edge cases.
Multimodal Perception: Tuned model interpretation of physical scene dynamics — camera geometry, motion types, spatial reasoning — improving object recognition and situational awareness across visual inputs.
Voice & Audio Models: Led audio model development for Grok Voice & Companions: parameterized tone, cadence, timbre, and emotional valence with a global team of voice actors. Validated multilingual transcription accuracy.
Scientific Benchmarking: Authored domain-specific chemistry evaluations that contributed to Grok's performance on Humanity's Last Exam.
Evaluation Design: Ran controlled A/B experiments with Likert-scale instrumentation across voice model variants to optimize emotional naturalness.

Remote

Jun 2024 — Sept 2024

Adversarial Stress-Testing: Wrote PhD-level prompts targeting failure modes in OpenAI o1 (codename "Strawberry"), probing logical consistency, domain accuracy, and chain-of-thought coherence.
Chain-of-Thought Analysis: Evaluated internal reasoning traces for correctness, hidden bias, and pedagogical clarity. Flagged systematic failure patterns across STEM domains.
Rubric Development: Designed scoring rubrics that were adopted across the evaluation pipeline and measurably improved model consistency on the GPQA benchmark.

Ojai, CA — Private Boarding School

Aug 2022 — Nov 2024

Instruction: Designed and taught AP/IB-aligned statistics, calculus, CAD, and Java programming across grades 6–12 (400+ students). Built original problem sets, assessments, and instructional materials for a private boarding school serving international families.
Competition Mentorship: Prepared students for USNCO, IChO, IMO, and Science Olympiad competitions. Mentored individual science fair projects from hypothesis design through regional presentation.
Enrichment Programs: Directed the school robotics program and restructured its operations using Lean/5S methodology. Mentored students in the Berklee College of Music Game Music Jam.
Standardized Testing: Coached SAT Math and AP exam preparation. Developed custom diagnostic assessments to identify gaps and accelerate score improvement.
Operations: Ran makerspace operations end-to-end: procurement, grant writing, fundraising, and regulatory compliance.

Thousand Oaks, CA

Jan 2022 — Aug 2022

Pharmaceutical Manufacturing: Operated GMP oral solid dosage equipment — milling, blending, coating, roller compaction (Gerteis), and tablet pressing (Korsch) — for oncology and pediatrics R&D campaigns.
Documentation & Reproducibility: Authored 5+ SOPs that formalized two decades of tacit operator knowledge into version-controlled documents meeting GxP regulatory standards.
Regulatory Compliance: Maintained batch records, logbooks, and chain-of-custody documentation through FDA and internal audit cycles.

Remote

Jun 2020 — Jun 2021

Live Instruction: Delivered 250+ sessions in higher math, thermodynamics, and chemistry, in partnership with Khan Academy. Ranked #1 tutor on the platform across two academic terms.
Assessment & Credentialing: Designed a multi-tier tutor certification system adopted by UChicago, Caltech, and MIT admissions. Trained and onboarded 200+ peer tutors.
Policy & Compliance: Built the platform's first trust and safety framework — balanced FERPA, COPPA, and CIPA requirements with minimal data collection. Reduced sign-up attrition by 41%.
Analytics: Stood up KPI dashboards tracking user growth, session quality, and stakeholder satisfaction to guide product and policy decisions.

Goleta, CA

Sept 2016 — Sept 2019

Education Research: Ran pre/post assessments with 90+ secondary students — measured a 25–30% gain in scientific process understanding, validating curriculum effectiveness.
Curriculum Development: Designed and delivered NGSS-aligned modules in chemistry and biology for 200+ students per year.
Grant-Linked Impact: Grew program participation 40% through an ASBMB partnership — work contributed to subsequent $3M DoD and $1M congressional awards.
Team Management: Directed a rotating cohort of 15–20 undergraduate volunteers in scientific mentorship and experimental design instruction.

Remote

Mar 2015 — Jul 2018

Early-Stage Product: Joined at the $7.5M seed round (Shasta, Spark Capital, Omidyar). Contributed to growth from pilot to 500K+ MAU before the Google acquisition.
Content at Scale: Wrote hundreds of STEM and humanities explanations. Standardized LaTeX formatting and built editorial guidelines that survived the platform migration.
Product Validation: Beta-tested camera-based math parsing, chat, and photo recognition during the 2015 Stanford pilot — feedback directly informed v1 product decisions.
Acquisition & Migration: Expanded subject coverage from 12 to 50+ areas. Designed Firebase-based tooling and led contributor onboarding through the Google transition.