Three Skills You Need to Launch a Successful AI Safety Project

by Luc Brinkman
AI Safety · Projects · Entrepreneurship · Impact

This is a draft. It's in a good state and I hope you enjoy reading it, but please don't share it widely. (The intro will still receive a major rewrite to make it more compelling.)

Most AI Safety projects fail to be impactful for the same preventable reasons.

This post is for people launching a new project at a new or existing organisation.1 It helps you identify what skills you're missing so you can avoid common pitfalls.

If, one year from now, your project fails to have real impact, it's probably because:

  • You didn't know where to start or took too long to launch
  • Nobody read your paper, implemented your policy, or used your tool
  • Your project tackled the wrong problem – one that wasn't impactful to solve in the first place

Most failures seem to come from missing one of these three skills: entrepreneurial skills, impact assessment skills, and strategic understanding of AI safety. I have yet to meet anyone who's mastered all three.

I hope my framing of the challenges in launching a new AI safety project is useful to you. I think my background and skillset are sufficient2 to make useful suggestions, though these remain speculative rather than the result of quantitative study. If you disagree, you may have good reasons – I'd love to hear them.

Most projects fail – yours doesn't have to

Even with good intentions and competent execution, most ideas end up having little effect on existential risk.

In the startup world, successful founders don't just settle for their first idea. Most ideas won't be profitable, so founders keep looking until they find something that will be. They iterate relentlessly, test assumptions rigorously, and pivot when needed. In AI Safety, most ideas won't be impactful either3. But just like at a startup, you can iterate toward impact.

Impact follows a power law.4 A small number of projects will create most of the impact. By default, your project probably won't be one of them. But it could be – if you optimize deliberately for impact.5

By the way: when I say failure, I include "no impact", "little impact" and "negative impact (i.e. active harm)".6

Don't get stuck failing

Failure is only good when you fail fast, because that lets you move on to a better idea. Failing fast requires both detection and response.

  • Detection: you need to recognize when your project isn't on track for impact. A lack of detection leads to hidden failure, where you think your project is working but you're not having impact.
  • Response: when you detect problems, you need to adequately respond to them by iterating or pivoting. Too little response, and you get stubborn failure, where you know that your project isn't really working, but you keep trying without really changing anything.7

It's very easy to simply be unaware of your lack of impact, or rationalize it away. Without adequate detection and response, you will get stuck in failure.

How to have impact

Your goal when launching an AI Safety project should be to maximally reduce existential risks from advanced AI.

To have impact, you need to:

  • Build something (execution)
  • That people engage with (engagement)
  • In ways that create impact (effectiveness)8

Think of it roughly as: Impact = Engagement × Effectiveness9

You need good enough execution, and then you maximize both engagement and effectiveness. The three skills will help you do that.
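
To make that multiplication concrete, here is a minimal toy sketch (all project names and numbers are hypothetical, chosen only to show the structure) of why high engagement cannot compensate for near-zero effectiveness, and vice versa:

```python
# Toy illustration of Impact = Engagement x Effectiveness.
# All numbers are hypothetical; "effectiveness" means impact per unit of engagement.

projects = {
    # name: (engagement, effectiveness per unit of engagement)
    "widely used tool, weak theory of change": (10_000, 0.0001),
    "niche policy brief, strong theory of change": (50, 5.0),
}

for name, (engagement, effectiveness) in projects.items():
    impact = engagement * effectiveness
    print(f"{name}: impact = {engagement} x {effectiveness} = {impact}")

# The niche project wins here (250 vs 1): neither factor can be near zero
# if you want meaningful impact.
```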

The three skills you need

There are three skills that will help you to build something that people engage with in a way that creates impact.

1. Entrepreneurial skills help you iterate toward something people actually engage with.
2. Impact assessment skills ensure that engagement translates to risk reduction.
3. Strategic understanding of AI safety helps you make better assumptions and judgments about what matters most.

Together, these help you understand problems, prioritize between them, and develop solutions that work.

Two archetypes

Most people could use more of each skill, but your starting point determines where you should focus:

  • Seasoned entrepreneurs know how to build something users love and make a profit. However, they often don't know what problems in AI Safety are worth solving or how to optimize for impact. They typically need more of skills 2 (impact assessment) and 3 (strategic understanding).
  • AI Safety veterans know what problems need solving, but often lack the tools to build something that addresses them effectively. They typically need more of skill 1 (entrepreneurship) and some of skill 2 (impact assessment).

And I really mean experienced entrepreneurs and experienced AI safety folks. Most people will need much more of all three skills. I certainly know I do.

1. Entrepreneurial skills

In AI safety, entrepreneurial skills help you do two things: iterate toward success and build something people engage with.

That "something" can be a research paper, policy campaign, tool, fellowship program, or any other AI safety intervention.

Iterating to success10

A key aspect of entrepreneurship is iterating quickly to develop your ideas:

  1. Make your best guess at a good idea
  2. Test specific assumptions behind your idea
  3. Improve the idea or switch to a different one
  4. Repeat11

Iteration velocity is at the heart of entrepreneurship. Adopting an iteration mindset helps you create maximum impact in a minimal amount of time.

The other thing that entrepreneurial skills help you with is:

  • Execution: Building something . . .
  • Engagement: . . . that people engage with.

Execution: How to build something

Building something requires generalist execution skills: project and people management, finance, communication, engineering, research.12

I will not spend much time on these because:

  • Most of you already know you need these skills
  • Good resources already exist (like the book "How to Launch a High-Impact Nonprofit")
  • This is not where most startups fail

In fact, most new ideas will fail, even if competently executed.13 Rather than just building something, it's more important to find the right thing to build: something that, if competently executed, people will actually engage with in a way that leads to impact. We'll first turn to engagement, and then to impact.

What does engagement look like?14

Engagement, adoption, traction, or usage looks different in different subdomains of AI Safety:

  • Researchers build on your work
  • Legislators implement your policies
  • People join your fellowships
  • Users adopt your tool or product

Throughout this post, I will often refer to the people that engage with your work as "users".

If people don't engage with your work, you won't have impact. You may as well have not built it.

Engagement: Build something people want

To build something people engage with, you need to understand what they actually need.15

You do this by talking to users, identifying assumptions, testing the riskiest assumptions, and iterating toward a solution.

There are entire books on this topic – e.g. The Lean Startup, Inspired, The Mom Test and Continuous Discovery Habits. There are also podcasts (Lenny's Podcast) and entrepreneurship bootcamps that teach these skills.

These concepts will help you create something that people use. But engagement alone isn't enough for impact.

Why engagement alone isn't enough for impact

I like bananas. If you sell bananas, I'll buy them from you. But this has zero impact on existential risk.16

Remember: Impact = Engagement × Effectiveness

If your project isn't effective, it won't be impactful, no matter how much people engage with it. Here's what that can look like:

  • Your research is built on by others but turns out to be a technical novelty rather than effective at reducing x-risks17
  • Your policy gets passed because it makes legislators look good without making meaningful changes
  • You're hosting fellowships but there are no jobs for your fellows to be hired into

2. Impact assessment skills

These skills help you iterate toward impact, not just engagement.

Where entrepreneurial skills help you build something people use, impact assessment skills ensure that engagement actually leads to impact.

What even is 'impact'?

Reducing existential risk is a game of probabilities.

If a world without your project has a 50% chance of extinction and a world with your project has a 49% chance, you've reduced existential risk by one percentage point.18

Your goal should be to reduce the probability of existential risks by as many percentage points as possible. Assessing your impact tells you if you're moving in the right direction.

Learning to estimate impact involves concepts like expected value, counterfactuals, and cost-effectiveness. You can learn more about them on the EA Forum, LessWrong, and the book How to Launch a High-Impact Nonprofit.
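
To make those concepts slightly more concrete, here is a very rough back-of-the-envelope sketch (every number is a made-up placeholder, not a real estimate) that combines a probability of success, a counterfactual risk reduction expressed in microdooms (see footnote 18), and cost-effectiveness:

```python
# Back-of-the-envelope expected-value estimate for a hypothetical project.
# All numbers are placeholders; the structure, not the values, is the point.
# 1 microdoom = a 1-in-1,000,000 reduction in the probability of existential catastrophe.

p_success = 0.10                  # probability the project works as intended
risk_reduction_if_success = 100   # counterfactual reduction in microdooms if it works
cost_usd = 200_000                # total cost in dollars

expected_microdooms = p_success * risk_reduction_if_success
cost_effectiveness = expected_microdooms / cost_usd  # microdooms averted per dollar

print(f"Expected impact: {expected_microdooms} microdooms")
print(f"Cost-effectiveness: {cost_effectiveness:.6f} microdooms per dollar")
```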

Why impact is harder than profit

Creating a profitable startup is hard. Achieving impact in AI Safety is arguably even harder.

Impact can be harder than profit for three reasons:

  • There is no clear (market) signal to guide you.
  • To have impact, you need both engagement (like a for-profit)19 AND effectiveness (unlike a for-profit).20
  • AI Safety is largely pre-paradigmatic.

The pre-paradigmatic challenge

AI Safety doesn't have an established paradigm yet.21 We can't predict with certainty what will be impactful. So why bother optimizing so deliberately?

First, imperfect predictions are still valuable. For example, AI Safety experts can often point out specific reasons why a project or idea is unlikely to be impactful.22

Second, I argue that the lack of a paradigm actually makes deliberate thinking about impact more important, not less. Without clear guidance on what will lead to impact, you have to figure it out for yourself.

The tools described in this post help you optimize under uncertainty. The goal isn't to get it perfect or to cripple yourself with analysis paralysis.23 But I do think most people would benefit from spending more time thinking about their impact.

How to find out what is impactful

The theory of change is one of your main tools for finding out what is impactful.

A theory of change isn't just an explanation linking your activities to impact.24 It helps you figure out what to work on in the first place. When used well, a theory of change can help you:

  • Break down problems into smaller sub-problems
  • Identify solutions that can have real impact
  • Spot key assumptions to test
  • Update your views when new evidence comes in

A theory of change should help you move toward impact.25

Other useful tools include the importance-tractability-neglectedness (ITN) framework and considerations around choosing between for-profit and nonprofit structures, with their associated incentives.

To learn more about these topics I will again refer you to the EA Forum, LessWrong, and the book How to Launch a High-Impact Nonprofit.

Measure proxies and validate causal chains

You can't measure AI safety impact directly. But you can—and should—estimate it.

You'll have to create a theory of change and validate each causal step:

  • Test critical assumptions
  • Measure intermediate outcomes
  • Subjectively judge assumptions that can't be measured directly26

Many intermediate outcomes can be measured. For example, a fellowship can track the percentage of fellows placed in AI safety jobs within six months.

But some assumptions that heavily influence impact can't be measured, like whether we face higher x-risk from loss-of-control or from power concentration scenarios.

To judge these assumptions well, you need accurate strategic understanding of AI safety.

3. Strategic Understanding of AI Safety

Your strategic understanding gives your project direction.

It's your mental model of how the AI safety landscape works. It combines your beliefs, assumptions, and causal understandings about which problems matter most and which interventions are likely to work. This is roughly what's sometimes called your world model.

This helps you understand the different perspectives on questions like:

  • Once we reach AGI, how long do we have until superintelligence?
  • Should we prioritize risks from bad human actors or from misaligned AI?
  • What are the differences in how humans, AGI, and ASI would kill us all?

The answers to these questions have massive implications for what you should be working on. In AI Safety, there is often no clear right answer, but there are answers that are quite clearly wrong.27 Developing an accurate world model will make you less wrong.

In practice, strategic understanding enables:

  1. Better assumptions: when you test them, they're more likely to be validated, so each iteration becomes a bigger step toward impact.
  2. Better judgement: This helps you assess assumptions that can't be measured directly.

Without good strategic understanding, bad assumptions slow your progress and bad judgment fools you into believing you're having impact even when you're not.

What's challenging: everyone has a world model, whether accurate or not. And most people think theirs is accurate, even when it's not.28

How to develop an accurate world model

Developing an accurate world model of AI safety takes time and effort.29

I'm not the best person to tell you how to get there. Hopefully, other people can propose tractable methods. But here are some starting points (please take these as more speculative than other claims in this post):

Good world models are:

  • Probabilistic
  • Evidence-based
  • Built on causal chains

They let you:

  • Identify cruxes that could change your priorities
  • Think through different scenarios
  • Predict the effects of different interventions

A good AI safety world model should probably include deep views on threat models, takeoff speeds, and coordination problems30. As you dive into a specific sub-problem, you will want to develop accurate views there too.

Because of this, I expect it to be more helpful to ask yourself "What are the key uncertainties in my world model?" than "Is my world model correct?".31

Resources that can help you develop a good AI Safety world model probably include:

Integrating the three skills

Fall in love with the problem, not the solution33

Make sure you fall in love with the entire problem of AI Safety first. Don't get attached prematurely to a specific solution, or even a sub-problem. Use the three skills to identify a specific sub-problem where you have the highest expected impact.

Only then think about solutions.34

Note that the maximum impact of a project is capped by the importance of the problem it solves. If a problem causes 3% of existential risk, a solution can reduce risk by at most 3%. It's often better to focus on another problem that's responsible for 40% of existential risk.35
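
To illustrate that cap with made-up numbers (and the point from footnote 35 that tractability still matters), here is a minimal sketch comparing a small but tractable problem with a large but harder one:

```python
# Hypothetical comparison: the importance of a problem caps your maximum impact.
# "importance" = share of existential risk the problem accounts for,
# "tractability" = fraction of that problem your project could plausibly solve.
# All numbers are illustrative placeholders.

problems = {
    "small problem": {"importance": 0.03, "tractability": 0.50},
    "big problem":   {"importance": 0.40, "tractability": 0.10},
}

for name, p in problems.items():
    expected_reduction = p["importance"] * p["tractability"]
    print(f"{name}: cap {p['importance']:.0%} of x-risk, "
          f"expected reduction ~{expected_reduction:.1%}")

# Even with much lower tractability, the big problem offers both a larger
# expected reduction (4.0% vs 1.5%) and a far higher ceiling.
```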

Iterating on impact36

Your goal should be to reject bad ideas as fast as possible, so you can get to a good idea sooner.

To iterate toward impact, you need to test two things:

  • Engagement: Will users actually engage with what you'll build? This one applies to any startup.37
  • Effectiveness: Will that engagement actually reduce existential risk? This one is specific to AI Safety.

You do this by identifying your assumptions and testing them with targeted experiments. How to do this is discussed in the books mentioned before.

Talk to others about your ideas

Go out there. Talk to people. It's one of the fastest ways to validate assumptions, shoot down bad ideas, and come up with new ones.

There are two types of people you should talk to:

  • Talk to users to figure out what they actually need. This helps with the 'engagement' part of the impact equation. Note that talking to users usefully is not intuitive: read "The Mom Test" to learn how to do this well.
  • Talk to experts about your strategic understanding and your theory of change.38 This helps with the 'effectiveness' part of the impact equation. This is especially important because it's very easy to be overconfident about the accuracy of your world model. Seek out criticism – it gives you a reality check. Try to discuss specific assumptions rather than your entire theory of change. Also, while I think expert opinions are useful for strategy, I would avoid relying on expert opinions for the 'engagement' part of the impact equation.39

Remember: it's your goal to reject bad ideas as soon as possible. Users and experts are there to help you do that.

Choose your next steps

Your next step should be whatever maximizes your impact on reducing existential risk from advanced AI.

First, honestly assess which of the three skills you're lacking most. Then, depending on your background and situation, focus on some mixture of:

Learning/Upskilling:

  • Acquire entrepreneurial skills
  • Acquire impact assessment skills
  • Enhance your strategic understanding of AI safety

Testing assumptions:

  • Map your assumptions around user value, usability, feasibility, growth, and your theory of change
  • Test the riskiest ones with the simplest tests you can think of
  • Update your beliefs based on what you learn

Exploring new ideas:

  • Identify the most important problems
  • Explore ways of addressing them

To upskill, besides the various books, blogs and podcasts I mentioned, you may want to:

  • Talk to experts
  • Learn by writing
  • Share your ideas and get feedback (with people you know or on LessWrong ShortForm posts)
  • Get a coach for entrepreneurship/product development and another for AI strategy
  • Practice, practice, practice

Take calculated risks, and bias toward action

AI safety funders say they like taking calculated risks in a "hits-based approach."40

This maximizes expected impact: they make bets based on potential upside despite the risks – the risks that (A) your project will fail to be impactful and (B) your project may be actively harmful. Make it your duty to yourself, the community, and the world to de-risk your project vigorously, both to validate the upsides and to reduce the downsides. That's the fastest way to impact.

The time we have to ensure AI safety is running out. The AI Safety space needs leaders, builders, and entrepreneurs to save the world. You could be one of them. I think mastering the three skills discussed in this post sets you up for success.

If you're not ready to launch yet, consider contributing to an existing project first. But please don't wait until you feel "ready". You'll have to develop your skills through practice anyway.

Start small and iterate, to:

Build something.
That people engage with.
In ways that create impact.

Footnotes

  1. This post applies to both nonprofits and for-profits.

  2. Recently, I have taken BlueDot's AGI Strategy Course, had discussions with ~30 experts and peers about the AI Safety landscape, and received favorable feedback from AI Safety veterans on the observations that culminated in this post. While I've only been active in the AI Safety space for a few months, my background further includes founding a student-run nonprofit developing iron-based energy storage, which has resulted in a commercial spinoff and the host university making iron-based energy storage one of its strategic focus areas. I've also read 50+ nonfiction books and recently (re)-read several books about entrepreneurship and product management with the aim of adapting the best practices they describe to AI Safety.

  3. If anything, impact is probably harder to achieve than profit. More on this later.

  4. This seems quite clear for nonprofits in other EA cause areas and for startups and companies in the for-profit world, so I expect it to hold for AI Safety too.

  5. You'll need some luck, too.

  6. For example, you might inadvertently increase AI capabilities or cause public or political backlash against safety.

  7. Responding adequately to problems looks similar in the startup world, so later in the post we will turn there for solutions. What I describe here as "stubborn failure" is similar to the phenomenon of "Zombie Startups". It's also reminiscent of the definition of insanity: "Doing the same thing over and over again and expecting different results."

  8. Read this as "Effectiveness ≡ Impact per unit of engagement". If you can think of a better word to describe that than 'effectiveness', let me know and I will change it.

  9. This is somewhat related to the framing of "Impact = Magnitude x Direction" that's sometimes used.

  10. #note This section feels very shallow. Should I at least make some references to further reading? It also currently overlaps with the "Engagement" section

  11. This cycle is loosely adapted from the Build-Measure-Learn cycle described in "The Lean Startup"

  12. #note Do I want to mention somewhere that you will need specific/domain knowledge? Because I do think that's fair to mention.

  13. In his book "The Right It", Alberto Savoia refers to this as "the law of market failure"

  14. #note I don't know where to put this. Should I even just remove it altogether?

  15. In startup jargon, this is often called 'value' or 'desirability'. There are other factors to consider, like 'usability', business 'feasibility', and having a working 'growth engine'. However, 'value' – whether users actually want your product – is often the most important.

  16. #note I want more silly but illustrative analogies like this.

  17. Sparse auto-encoders are arguably an example of this.
    Also, most safety researchers value impact. So if they like your research this may in part be because they expect it to be impactful. But their expectation may be wrong. Or they may like your research for different reasons.

  18. A fun unit you can use to measure this probabilistic impact is microdooms, where 1 microdoom = 1/1,000,000 probability of existential risk.

  19. Here and in other places, I use 'for-profits' to mean regular companies not aimed at AI Safety. Of course, an AI Safety project can be set up as a for-profit too.

  20. Although arguably, engagement is sometimes easier in a nonprofit setting. For example, the various fellowships have no trouble finding enough participants. In contrast, though, many products, tools, and blog posts do struggle to get engagement.

  21. See, e.g. https://ai-safety-atlas.com/chapters/03/07 or https://www.thecompendium.ai/ai-safety.

  22. Though there are also areas where experts disagree. In such cases, it becomes even more important to assess the arguments they use.

  23. See e.g. Holden Karnofsky on the 80,000 Hours podcast, where he says "When people ask me for career advice or whatever, the usual thing I’d say is: take a bunch of options that all seem competitive, and all seem like they could be the best thing, and that it’s not obvious which ones are better than others from an impact perspective. And from there I would say go with personal fit, go with the energy you feel to work on them."

  24. Aaron Swartz calls this a theory of action instead.

  25. Your theory of change should probably cover at least the effectiveness of what you're building. It often makes sense for it to include parts of engagement too, but that's probably not strictly necessary – after all, for-profit startups don't use theories of change either.

  26. The requirement to subjectively judge some of your assumptions is in contrast to other altruistic domains, such as global health and animal welfare. While impact is still difficult to achieve in those domains, you can often directly measure impact outcomes such as child mortality. #cite some Measure Evaluate Learn article

  27. Or rather, there are reasons for giving a certain answer that are quite clearly wrong.

  28. There are various psychological mechanisms like motivated reasoning that make this outcome more likely.

  29. By the way, unlike the first two skills, I consider "strategic understanding" part skill and part knowledge. The skill involves how to develop your understanding and how to use it, whereas the world model itself is mainly knowledge.

  30. #note we can probably leave out this sentence now that I provided some examples earlier on.

  31. #note bad flow. This needs to be somewhere else or just removed altogether?

  32. I'm just talking about their helpfulness towards world modeling, here. Of course they are very useful for other things like technical upskilling too.

  33. #note Should I move this to the section about entrepreneurial skills?

  34. This process of understanding the problem first, before thinking about the solution, is exemplified by the "Double Diamond" process. Of course, progress in reality is usually not so linear. But it's a useful heuristic nonetheless.

  35. Focusing on an important problem is a useful heuristic, but of course tractability and neglectedness matter too. In the end, it's about the project's expected percentage reduction of existential risk.

  36. #note Entire section feels redundant. Perhaps I want to emphasise that iteration should be used on effectiveness/impact too?

  37. This includes all the standard startup hypotheses surrounding desirability, usability, business viability, and technical feasibility.

  38. AI Safety experts are generally very busy. One way to get feedback on these topics is with ShortForm posts on LessWrong, on which others can then respond.

  39. Books like The Right It emphasize that for building (for-profit) products, it's much more effective to get your own data from talking to and testing with users, than to rely on expert opinions. This is also in line with other entrepreneurial books and best practices. If you do get an opinion from an expert, make sure you understand the assumptions underlying their claims, rather than taking their claims at face value.

  40. #note add citation.
    This is also the approach startup investors take.