Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs")

Joe Carlsmith Audio

Content provided by Joe Carlsmith. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Joe Carlsmith or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://da.player.fm/legal.

Published 1y ago. Duration: 21:25.


This is section 2.2.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

Chapters

1. Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs") (00:00:00)

2. 2.2.2 Two sources of beyond-episode goals (00:00:28)

3. 2.2.2.1 Training-game-independent beyond-episode goals (00:01:32)

4. 2.2.2.1.1 Are beyond-episode goals the default? (00:03:26)

5. 2.2.2.1.2 How will models think about time? (00:05:02)

6. 2.2.2.1.3 The role of “reflection” (00:08:09)

7. 2.2.2.1.4 Pushing back on beyond-episode goals using adversarial training (00:10:56)

8. 2.2.2.2 Training-game-dependent beyond-episode goals (00:12:45)

9. 2.2.2.2.1 Can gradient descent “notice” the benefits of turning a non-schemer into a schemer? (00:14:47)

10. 2.2.2.2.2 Is SGD pulling scheming out of models by any means necessary? (00:18:51)
