14 Ideas for AI Safety Field-Building: A Lean Impact Approach
Note: this is not intended as a fully developed blog post. Rather, it's a work-in-progress public document to quickly share my ideas.
I'll start with some context around approach and methodology, and then share some specific ideas for field-building.
Scaling the AI Safety Workforce
There are probably a little under 1 billion working-age adults in the developed world. With roughly 1,500 FTE in AI Safety, that means only about 1 in 600,000 working-age adults in the developed world works full-time on AI safety. I propose aiming for a 100-fold increase, from roughly 1,500 to 150,000 people, within three years. That would still be only about 0.02% of working-age adults. Assuming exponential growth, it would mean recruiting and onboarding about 118,000 people in the third year alone.
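As a quick sanity check on those numbers, here's a minimal back-of-the-envelope sketch. It assumes a constant annual growth factor and uses the rough 1,500 and 150,000 figures above; nothing else is implied.

```python
# Back-of-the-envelope check of the scaling numbers above,
# assuming a constant annual growth factor (real growth would be lumpier).
current = 1_500      # rough current number of AI safety FTEs
target = 150_000     # 100-fold target
years = 3

growth_factor = (target / current) ** (1 / years)  # ~4.64x per year

total = current
for year in range(1, years + 1):
    new_total = current * growth_factor ** year
    print(f"Year {year}: {new_total:,.0f} total, "
          f"{new_total - total:,.0f} newly onboarded")
    total = new_total
# Year 3: ~150,000 total, ~118,000 newly onboarded in that year alone
```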
The AI Safety Onboarding Pipeline is one way of visualizing the steps involved in onboarding people into AI Safety:

The scale of each of those steps would have to grow 100-fold.
Working with a Theory of Change
Effective field-building requires systematic thinking about how our actions lead to impact. In my eyes, there are three ways of using theories of change, which are most powerful when combined:
Top-down: Start from "making AI go well" and work backwards to identify necessary conditions and interventions.
Bottom-up: Begin with intuitively promising solutions and trace their causal chains forward to expected impact.
Sideways: If you have a solution in mind, identify what problem it addresses, then brainstorm alternative solutions for that same problem.
Here's my own preliminary exploration of the top-down theory of change for making AI go well:

Lean Impact: Adapting Startup Methods for AI Safety
There are many useful best practices from the for-profit startup world. I find that adapting them to a for-impact world improves my understanding without having to start from scratch.
For any idea to create real change, we need to verify three key assumptions:
- Value hypothesis: will people actually use the intervention, product, or service, or read the publication?
- Growth hypothesis: can this scale to reach enough people?
- Impact hypothesis: when used, does this solution have impact?
Ideally, each feature of each intervention has clear metrics tied to a theory of change. The underlying assumptions can then be broken down and tested.
I hope to write more about this in the future, but for now: the book Lean Impact is a great starting point, though it's aimed more generally at social impact and not at AI Safety.
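To make the idea of testable assumptions concrete, here's a minimal sketch of how an intervention's value, growth, and impact hypotheses could be written down explicitly along with the metric that would test each one. The intervention, metrics, and targets below are hypothetical examples, not part of any existing tool:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    """One testable assumption behind an intervention."""
    kind: str       # "value", "growth", or "impact"
    statement: str  # the assumption in plain language
    metric: str     # what we measure to test it
    target: str     # what result would count as validation

@dataclass
class Intervention:
    name: str
    hypotheses: list[Hypothesis] = field(default_factory=list)

# Hypothetical example for the AI Safety Pledge idea discussed below;
# the metrics and targets are illustrative guesses, not real benchmarks.
pledge = Intervention(
    name="AI Safety Pledge",
    hypotheses=[
        Hypothesis("value", "People mid-transition want a commitment device",
                   "sign-ups per 100 people reached", ">= 5"),
        Hypothesis("growth", "Pledgers refer peers",
                   "referrals per pledger", ">= 1"),
        Hypothesis("impact", "Pledging increases completed transitions",
                   "transition rate vs. comparable non-pledgers", "meaningfully higher"),
    ],
)
```

Writing hypotheses down this explicitly makes it harder to skip the step of deciding, in advance, what evidence would count as validation.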
Now that we have some context, let's briefly look at some ideas:
14 Field-Building Ideas to Test
These ideas require rigorous theory-of-change development, risk assessment, and iterative development and validation. For now, they're just ideas: starting points for experimentation, not finished proposals. Presumably some of them have been considered and discussed before.
1. AI Safety Pledge - Commitment mechanism inspired by Giving What We Can (see detailed exploration below)
2. Virtual AI Safety Career Transition Coach - Scalable, personalized guidance through career transitions
3. Theory of Change Tool - Interactive tool for developing theories of change. Like ChatPRD does for product requirement documents, but for impact.
4. AI Ops for AI Safety - AI-enabled workflows to amplify researcher productivity across the community. This should focus on safety-specific workflows that would not otherwise be tackled by the general, for-profit sector.
5. AI-generated content - Scalable, high-quality AI safety content for diverse audiences. Today much AI-generated content is considered slop, but we will reach a point where it can be high quality. Let's ensure we'll be ready to use it for field-building.
6. AI Safety knowledge sharing tool - Virtual employee for AI safety collaboration, connecting researchers and insights. This would require strong buy-in and trust from the various AI safety research labs, of course.
7. Studying persuasion around AI x-risk - Research optimal approaches for convincing different stakeholder groups to take action on the risks.
8. AI Safety communication training - Workshops, online curricula, or AI chatbot practice for effective advocacy
9. Stakeholder analysis AI - Analyze public figures' reasoning styles from transcripts to tailor persuasion strategies
10. AI Safety mental health project - Address the unique psychological challenges of existential risk work to boost effectiveness and retention
11. Positive message testing - Counter helplessness with empowering narratives (BlueDot Impact: "Even when people accept the risks, this creates helplessness rather than motivation"; see AI safety needs more public-facing advocacy)
12. Therapeutic values-alignment - Help people process existential anxieties and align actions with values through adapted ACT techniques
13. Overton window tracker - Automated monitoring of public discourse around various aspects of AI Safety, which researchers and communicators can use to inform their communication style. Could be useful in keeping AI Safety unpolarized and unpoliticized.
14. Policy Advisor Chatbot - Help policymakers design effective AI governance and counter industry lobbying
Case Study: The AI Safety Pledge
Note: this is a very early draft, not a fully fledged proposal. I am currently entertaining the idea of an AI Safety Pledge, but I'm not convinced that it's useful and desirable.
Inspired by Founders Pledge and the 10% Pledge, we could invite people transitioning to an AI safety career to make an AI Safety Pledge. It could look something like this:
- I pledge to spend the coming years of my career on AI safety.
- If I don't manage to do so, for example because I can't find a job in AI Safety, I will donate 10% of my income to the AI safety movement.
- If I ever do decide to move back into AI safety, I can receive back my contributions to support my AI safety work.
Theory of Change
Hopefully, this pledge will:
- incentivize people to try harder to complete their career transition to AI safety
- create more buy-in towards keeping people accountable to their good intentions, e.g. through a virtual career coach.
- decrease the effective income gap between non-safety work (which would now be reduced by 10%) and AI safety work
- incentivize people to keep trying to move back into AI safety even if they weren't successful initially
Some of the risks:
- it might stimulate earning to give, which 80,000 Hours currently views as less effective than direct career contributions.
- it might be perceived by the outside world as cult-like
- AI safety work may be hard to define
To do:
- Visualize the theory of change as causal chains
- Talk to people in the field to gauge opinions
- Look for existing similar ideas
Moving Forward
I'm expanding my knowledge daily and generating new ideas weekly. This list represents current thinking that will undoubtedly evolve. The goal isn't to be right initially but to learn fast enough to find what works before it's too late.