Joe Kwon
Trying to help steer towards better (AI-entangled) futures!
AI is poised to be deeply transformative. I think about how to make that go well!
work
- Evaluating Contextual Illegality: AI Compliance in Corporate Law Scenarios. Testing how AI systems handle context-dependent legal compliance.
- Measuring AI R&D Automation. Assessing the pace and extent of AI-driven automation in research and development.
- Internal Deployment Gaps in AI Regulation. How current AI governance misses risks from internal use of AI systems.
the path here
Growing up, I mostly spent my time hanging out with friends, consuming a ton of online content, and not really having any direction in life. Mildly nihilistic, honestly. But CTY and Canada/USA Mathcamp were special and invigorating—the first environments where I felt intellectually excited about the ideas and the people around me.
In college I studied CS and psychology. The summer before sophomore year, I worked with Gabriel Kreiman and Mengmi Zhang at the Harvard/MIT Center for Brains, Minds, and Machines on visual cognition and context reasoning. It was my first research experience, and I'm grateful they invested their time in a mostly floundering freshman. During the school year, I worked in Julian Jara-Ettinger's lab, building computational models of social cognition.
Around 2020 I started paying attention to how much emergent capability was showing up in AI systems. I worked on one of OpenAI's early RLHF projects under Long Ouyang and Jeff Wu—my first hands-on experience with LLMs, and it got me scaling-pilled. Then at Berkeley with Jacob Steinhardt and Dan Hendrycks, I worked on out-of-distribution detection, AI forecasting, and building evaluations for ML systems.
After college I joined Josh Tenenbaum's Computational Cognitive Science Lab, working closely with Sydney Levine on moral cognition—how people reason about rules, norms, and each other. We built models that tried to capture the structure of moral judgment, which I think matters for AI alignment too. Separately, I worked with Stephen Casper and Dylan Hadfield-Menell on red-teaming methods to systematically find where language models fail.
In 2023 I spent about a year as a research engineer on multilingual LLMs under Honglak Lee, working with Lajanugen Logeswaran, Dongsub Shim, and Tolga Ergen. Synthetic data, pretraining, finetuning, evals. One thread I liked: leveraging language-invariant concepts so models can learn new languages more efficiently.
In late 2024, I worked with David Krueger's group testing activation steering methods. At the time it was unclear how well these techniques actually worked, what exactly you could do with them, and where they broke down—we wanted to figure that out.
In 2025 I moved to DC to work on AI policy and governance—first at the Center for AI Policy, writing reports on AI agents, cybersecurity, and autonomous systems, then in GovAI's DC fellowship, working on risks from internal AI deployment and metrics for tracking automated AI R&D. This was refreshing because the questions felt immediately important. I enjoyed communicating ideas and recommendations to people—tens of thousands read my reports in total—and it led to an invitation as a panelist at a Georgetown × World Bank conference on "Making AI Work: What Firms and Workers Need."
Currently I'm an Astra Fellow working with Tom Davidson and Fabien Roger on secretly loyal AI—the risk that an AI system could be deliberately trained to appear aligned with an institution's goals while covertly serving a different actor's interests. I'm focused on threat modeling and designing ML experiments that stress-test this scenario and would be useful for the broader research agenda.
rabbit holes
reading
- The Gentle Romance: Stories of AI and humanity — Richard Ngo
- The Night Circus — Erin Morgenstern
- The Book of Five Rings — Miyamoto Musashi
listening
looking
Updating soon.
bookmarks
- Omar Chishti
- Hoyeon Chang
- More coming soon.