The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")
This is section 2.3.1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs") (00:00:00)
2. 2.3 Aiming at reward-on-the-episode as part of a power-motivated instrumental strategy (00:00:41)
3. 2.3.1 The classic goal-guarding story (00:01:41)
4. 2.3.1.1 The goal-guarding hypothesis (00:02:45)
5. 2.3.1.1.1 The crystallization hypothesis (00:03:31)
6. 2.3.1.1.2 Would the goals of a would-be schemer “float around”? (00:08:00)
7. 2.3.1.1.3 What about looser forms of goal-guarding? (00:11:04)
8. 2.3.1.1.4 Introspective goal-guarding methods (00:16:56)
57 episodes