Joe Kwon
Trying to help steer towards better (AI-entangled) futures!
AI is poised to be deeply transformative. I think about how to make that go well!
work
- Evaluating Contextual Illegality: AI Compliance in Corporate Law Scenarios. Testing how AI systems handle context-dependent legal compliance.
- Measuring AI R&D Automation. Assessing the pace and extent of AI-driven automation in research and development.
- Internal Deployment Gaps in AI Regulation. How current AI governance misses risks from internal use of AI systems.
the path here
Growing up, I mostly spent my time hanging out with friends, consuming a ton of online content, and not really having any direction in life. Mildly nihilistic, honestly. But CTY and Canada/USA Mathcamp were special and invigorating—the first environments where I felt intellectually excited about the ideas and the people around me.
In college I studied CS and psychology. The summer before sophomore year, I worked with Gabriel Kreiman and Mengmi Zhang at the Harvard/MIT Center for Brains, Minds, and Machines on visual cognition and context reasoning. It was my first research experience, and I'm grateful they invested their time in a mostly floundering freshman. During the school year, I worked in Julian Jara-Ettinger's lab, building computational models of social cognition.
Around 2020 I started paying attention to how much emergent capability was showing up in AI systems. I worked on one of OpenAI's early RLHF projects under Long Ouyang and Jeff Wu—my first hands-on experience with LLMs, and it got me scaling-pilled. Then at Berkeley with Jacob Steinhardt and Dan Hendrycks, I worked on out-of-distribution detection, AI forecasting, and building evaluations for ML systems.
After college I joined Josh Tenenbaum's Computational Cognitive Science Lab, working closely with Sydney Levine on moral cognition—how people reason about rules, norms, and each other. We built models that tried to capture the structure of moral judgment, which I think matters for AI alignment too. Separately, I worked with Stephen Casper and Dylan Hadfield-Menell on red-teaming methods to systematically find where language models fail.
In 2023 I spent about a year as a research engineer on multilingual LLMs under Honglak Lee, working with Lajanugen Logeswaran, Dongsub Shim, and Tolga Ergen. Synthetic data, pretraining, finetuning, evals. One thread I liked: leveraging language-invariant concepts so models can learn new languages more efficiently.
In late 2024, I worked with David Krueger's group testing activation steering methods. At the time it was unclear how well these techniques actually worked, what exactly you could do with them, and where they broke down—we wanted to figure that out.
In 2025 I moved to DC to work on AI policy and governance—first at the Center for AI Policy, writing reports on AI agents, cybersecurity, and autonomous systems, then in GovAI's DC fellowship, working on risks from internal AI deployment and metrics for tracking automated AI R&D. This was refreshing because the questions felt immediately important. I enjoyed communicating ideas and recommendations to people—tens of thousands read my reports in total—and it led to an invitation as a panelist at a Georgetown × World Bank conference on "Making AI Work: What Firms and Workers Need."
Currently I'm an Astra Fellow working with Tom Davidson and Fabien Roger on secretly loyal AI—the risk that an AI system could be deliberately trained to appear aligned with an institution's goals while covertly serving a different actor's interests. I'm focused on threat modeling and designing ML experiments that stress-test this scenario and would be useful for the broader research agenda.
rabbit holes
reading
- The Gentle Romance: Stories of AI and humanity — Richard Ngo
- The Night Circus — Erin Morgenstern
- The Book of Five Rings — Miyamoto Musashi
listening
looking
Updating soon.
bookmarks
- Omar Chishti
- Hoyeon Chang
- More coming soon.