Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs")
This is section 2.2.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?”
Text of the report here: https://arxiv.org/abs/2311.08379
Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power
Chapters
1. Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs") (00:00:00)
2. 2.2.2 Two sources of beyond-episode goals (00:00:28)
3. 2.2.2.1 Training-game-independent beyond-episode goals (00:01:32)
4. 2.2.2.1.1 Are beyond-episode goals the default? (00:03:26)
5. 2.2.2.1.2 How will models think about time? (00:05:02)
6. 2.2.2.1.3 The role of “reflection” (00:08:09)
7. 2.2.2.1.4 Pushing back on beyond-episode goals using adversarial training (00:10:56)
8. 2.2.2.2 Training-game-dependent beyond-episode goals (00:12:45)
9. 2.2.2.2.1 Can gradient descent “notice” the benefits of turning a non-schemer into a schemer? (00:14:47)
10. 2.2.2.2.2 Is SGD pulling scheming out of models by any means necessary? (00:18:51)
57 episodes