Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs")

Joe Carlsmith Audio

#Society #Philosophy #Joe


About a year ago · 9:01


This is sections 2.2.4.1-2.2.4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

Chapters

1. Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs") (00:00:00)

2. 2.2.4 What if you intentionally train models to have long-term goals? (00:00:38)

3. 2.2.4.1 Training the model on long episodes (00:01:23)

4. 2.2.4.2 Using short episodes to train a model to pursue long-term goals (00:04:33)

63 episodes


All episodes

AI for AI safety (27:51)

17 days ago

We should try extremely hard to use AI labor to help address the alignment problem. Text version here: https://joecarlsmith.com/2025/03/14/ai-for-ai-safety

Paths and waystations in AI safety (18:07)

19 days ago

On the structure of the path to safe superintelligence, and some possible milestones along the way. Text version here: https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety

When should we worry about AI power-seeking? (46:54)

6 weeks ago

Examining the conditions required for rogue AI behavior. Text version here: https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power

What is it to solve the alignment problem? (40:13)

6 weeks ago

Also: to avoid it? Handle it? Solve it forever? Solve it completely? Text version here: https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment

How do we solve the alignment problem? (8:43)

6 weeks ago

Introduction to a series of essays about paths to safe and useful superintelligence. Text version here: https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem

Fake thinking and real thinking (1:18:47)

9 weeks ago

When the line pulls at your hand. Text version here: https://joecarlsmith.com/2025/01/28/fake-thinking-and-real-thinking/.

Takes on "Alignment Faking in Large Language Models" (1:27:54)

15 weeks ago

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/

(Part 2, AI takeover) Extended audio from my conversation with Dwarkesh Patel (2:07:33)

26 weeks ago

Extended audio from my conversation with Dwarkesh Patel. This part focuses on the basic story about AI takeover. Transcript available on my website here: https://joecarlsmith.com/2024/09/30/part-2-ai-takeover-extended-audio-transcript-from-my-conversation-with-dwarkesh-patel

(Part 1, Otherness) Extended audio from my conversation with Dwarkesh Patel (3:58:38)

26 weeks ago

Extended audio from my conversation with Dwarkesh Patel. This part focuses on my series "Otherness and control in the age of AGI." Transcript available on my website here: https://joecarlsmith.com/2024/09/30/part-1-otherness-extended-audio-transcript-from-my-conversation-with-dwarkesh-patel/

Introduction and summary for "Otherness and control in the age of AGI" (12:23)

40 weeks ago

This is the introduction and summary for my series "Otherness and control in the age of AGI." Text version here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

Second half of full audio for "Otherness and control in the age of AGI" (4:11:02)

41 weeks ago

Second half of the full audio for my series on how agents with different values should relate to one another, and on the ethics of seeking and sharing power. First half here: https://joecarlsmithaudio.buzzsprout.com/2034731/15266490-first-half-of-full-audio-for-otherness-and-control-in-the-age-of-agi PDF of the full series here: https://jc.gatspress.com/pdf/otherness_full.pdf Summary of the series here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi…

First half of full audio for "Otherness and control in the age of AGI" (3:07:29)

41 weeks ago

First half of the full audio for my series on how agents with different values should relate to one another, and on the ethics of seeking and sharing power. Second half here: https://joecarlsmithaudio.buzzsprout.com/2034731/15272132-second-half-of-full-audio-for-otherness-and-control-in-the-age-of-agi PDF of the full series here: https://jc.gatspress.com/pdf/otherness_full.pdf Summary of the series here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi…

Loving a world you don't trust (1:03:54)

41 weeks ago

Garden, campfire, healing water. Text version here: https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi…

On attunement (44:14)

1 year ago

Examining a certain kind of meaning-laden receptivity to the world. Text version here: https://joecarlsmith.com/2024/03/25/on-attunement This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (Though: note that I haven't put the summary post on the podcast yet.)…

On green (1:15:13)

1 year ago