Player FM - Internet Radio Done Right
11 subscribers
Checked 1h ago
Tilføjet three år siden
Indhold leveret af LessWrong. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af LessWrong eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Player FM - Podcast-app
Gå offline med appen Player FM !
Gå offline med appen Player FM !
Podcasts der er værd at lytte til
SPONSORERET
<
<div class="span index">1</div> <span><a class="" data-remote="true" data-type="html" href="/series/species-unite">Species Unite</a></span>


Stories that change the way the world treats animals.
“Conceptual Rounding Errors” by Jan_Kulveit
Manage episode 474016036 series 3364760
Indhold leveret af LessWrong. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af LessWrong eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Epistemic status: Reasonably confident in the basic mechanism.
Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.
Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.
I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it Conceptual Rounding Errors – when our mind's necessary compression goes a bit too far .
Too much compression
A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping [...]
---
Outline:
(01:00) Too much compression
(01:24) No, This Isnt The Old Demons Story Again
(02:52) The Compression Trade-off
(03:37) More of this
(04:15) What Can We Do?
(05:28) When It Matters
---
First published:
March 26th, 2025
Source:
https://www.lesswrong.com/posts/FGHKwEGKCfDzcxZuj/conceptual-rounding-errors
---
Narrated by TYPE III AUDIO.
…
continue reading
Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.
Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.
I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it Conceptual Rounding Errors – when our mind's necessary compression goes a bit too far .
Too much compression
A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping [...]
---
Outline:
(01:00) Too much compression
(01:24) No, This Isnt The Old Demons Story Again
(02:52) The Compression Trade-off
(03:37) More of this
(04:15) What Can We Do?
(05:28) When It Matters
---
First published:
March 26th, 2025
Source:
https://www.lesswrong.com/posts/FGHKwEGKCfDzcxZuj/conceptual-rounding-errors
---
Narrated by TYPE III AUDIO.
497 episoder
Manage episode 474016036 series 3364760
Indhold leveret af LessWrong. Alt podcastindhold inklusive episoder, grafik og podcastbeskrivelser uploades og leveres direkte af LessWrong eller deres podcastplatformspartner. Hvis du mener, at nogen bruger dit ophavsretligt beskyttede værk uden din tilladelse, kan du følge processen beskrevet her https://da.player.fm/legal.
Epistemic status: Reasonably confident in the basic mechanism.
Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.
Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.
I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it Conceptual Rounding Errors – when our mind's necessary compression goes a bit too far .
Too much compression
A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping [...]
---
Outline:
(01:00) Too much compression
(01:24) No, This Isnt The Old Demons Story Again
(02:52) The Compression Trade-off
(03:37) More of this
(04:15) What Can We Do?
(05:28) When It Matters
---
First published:
March 26th, 2025
Source:
https://www.lesswrong.com/posts/FGHKwEGKCfDzcxZuj/conceptual-rounding-errors
---
Narrated by TYPE III AUDIO.
…
continue reading
Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along.
Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it.
I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it Conceptual Rounding Errors – when our mind's necessary compression goes a bit too far .
Too much compression
A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping [...]
---
Outline:
(01:00) Too much compression
(01:24) No, This Isnt The Old Demons Story Again
(02:52) The Compression Trade-off
(03:37) More of this
(04:15) What Can We Do?
(05:28) When It Matters
---
First published:
March 26th, 2025
Source:
https://www.lesswrong.com/posts/FGHKwEGKCfDzcxZuj/conceptual-rounding-errors
---
Narrated by TYPE III AUDIO.
497 episoder
Alle episoder
×L
LessWrong (Curated & Popular)

1 “Training AGI in Secret would be Unsafe and Unethical” by Daniel Kokotajlo 10:46
10:46
Afspil senere
Afspil senere
Lister
Like
Liked10:46
Subtitle: Bad for loss of control risks, bad for concentration of power risks I’ve had this sitting in my drafts for the last year. I wish I’d been able to release it sooner, but on the bright side, it’ll make a lot more sense to people who have already read AI 2027. There's a good chance that AGI will be trained before this decade is out. By AGI I mean “An AI system at least as good as the best human X’ers, for all cognitive tasks/skills/jobs X.” Many people seem to be dismissing this hypothesis ‘on priors’ because it sounds crazy. But actually, a reasonable prior should conclude that this is plausible.[1] For more on what this means, what it might look like, and why it's plausible, see AI 2027, especially the Research section. If so, by default the existence of AGI will be a closely guarded [...] The original text contained 8 footnotes which were omitted from this narration. --- First published: April 18th, 2025 Source: https://www.lesswrong.com/posts/FGqfdJmB8MSH5LKGc/training-agi-in-secret-would-be-unsafe-and-unethical-1 --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Why Should I Assume CCP AGI is Worse Than USG AGI?” by Tomás B. 1:15
1:15
Afspil senere
Afspil senere
Lister
Like
Liked1:15
Though, given my doomerism, I think the natsec framing of the AGI race is likely wrongheaded, let me accept the Dario/Leopold/Altman frame that AGI will be aligned to the national interest of a great power. These people seem to take as an axiom that a USG AGI will be better in some way than CCP AGI. Has anyone written justification for this assumption? I am neither an American citizen nor a Chinese citizen. What would it mean for an AGI to be aligned with "Democracy" or "Confucianism" or "Marxism with Chinese characteristics" or "the American constitution" Contingent on a world where such an entity exists and is compatible with my existence, what would my life be as a non-citizen in each system? Why should I expect USG AGI to be better than CCP AGI? --- First published: April 19th, 2025 Source: https://www.lesswrong.com/posts/MKS4tJqLWmRXgXzgY/why-should-i-assume-ccp-agi-is-worse-than-usg-agi-1 --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI” by Kaj_Sotala 35:51
35:51
Afspil senere
Afspil senere
Lister
Like
Liked35:51
Introduction Writing this post puts me in a weird epistemic position. I simultaneously believe that: The reasoning failures that I'll discuss are strong evidence that current LLM- or, more generally, transformer-based approaches won't get us AGI As soon as major AI labs read about the specific reasoning failures described here, they might fix them But future versions of GPT, Claude etc. succeeding at the tasks I've described here will provide zero evidence of their ability to reach AGI. If someone makes a future post where they report that they tested an LLM on all the specific things I described here it aced all of them, that will not update my position at all. That is because all of the reasoning failures that I describe here are surprising in the sense that given everything else that they can do, you’d expect LLMs to succeed at all of these tasks. The [...] --- Outline: (00:13) Introduction (02:13) Reasoning failures (02:17) Sliding puzzle problem (07:17) Simple coaching instructions (09:22) Repeatedly failing at tic-tac-toe (10:48) Repeatedly offering an incorrect fix (13:48) Various people's simple tests (15:06) Various failures at logic and consistency while writing fiction (15:21) Inability to write young characters when first prompted (17:12) Paranormal posers (19:12) Global details replacing local ones (20:19) Stereotyped behaviors replacing character-specific ones (21:21) Top secret marine databases (23:32) Wandering items (23:53) Sycophancy (24:49) What's going on here? (32:18) How about scaling? Or reasoning models? --- First published: April 15th, 2025 Source: https://www.lesswrong.com/posts/sgpCuokhMb8JmkoSn/untitled-draft-7shu --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study” by Adam Karvonen 21:00
21:00
Afspil senere
Afspil senere
Lister
Like
Liked21:00
Dario Amodei, CEO of Anthropic, recently worried about a world where only 30% of jobs become automated, leading to class tensions between the automated and non-automated. Instead, he predicts that nearly all jobs will be automated simultaneously, putting everyone "in the same boat." However, based on my experience spanning AI research (including first author papers at COLM / NeurIPS and attending MATS under Neel Nanda), robotics, and hands-on manufacturing (including machining prototype rocket engine parts for Blue Origin and Ursa Major), I see a different near-term future. Since the GPT-4 release, I've evaluated frontier models on a basic manufacturing task, which tests both visual perception and physical reasoning. While Gemini 2.5 Pro recently showed progress on the visual front, all models tested continue to fail significantly on physical reasoning. They still perform terribly overall. Because of this, I think that there will be an interim period where a significant [...] --- Outline: (01:28) The Evaluation (02:29) Visual Errors (04:03) Physical Reasoning Errors (06:09) Why do LLM's struggle with physical tasks? (07:37) Improving on physical tasks may be difficult (10:14) Potential Implications of Uneven Automation (11:48) Conclusion (12:24) Appendix (12:44) Visual Errors (14:36) Physical Reasoning Errors --- First published: April 14th, 2025 Source: https://www.lesswrong.com/posts/r3NeiHAEWyToers4F/frontier-ai-models-still-fail-at-basic-physical-tasks-a --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)” by Neel Nanda, lewis smith, Senthooran Rajamanoharan, Arthur Conmy, Callum… 57:32
57:32
Afspil senere
Afspil senere
Lister
Like
Liked57:32
Audio note: this article contains 31 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. Lewis Smith*, Sen Rajamanoharan*, Arthur Conmy, Callum McDougall, Janos Kramar, Tom Lieberum, Rohin Shah, Neel Nanda * = equal contribution The following piece is a list of snippets about research from the GDM mechanistic interpretability team, which we didn’t consider a good fit for turning into a paper, but which we thought the community might benefit from seeing in this less formal form. These are largely things that we found in the process of a project investigating whether sparse autoencoders were useful for downstream tasks, notably out-of-distribution probing. TL;DR To validate whether SAEs were a worthwhile technique, we explored whether they were useful on the downstream task of OOD generalisation when detecting harmful intent in user prompts [...] --- Outline: (01:08) TL;DR (02:38) Introduction (02:41) Motivation (06:09) Our Task (08:35) Conclusions and Strategic Updates (13:59) Comparing different ways to train Chat SAEs (18:30) Using SAEs for OOD Probing (20:21) Technical Setup (20:24) Datasets (24:16) Probing (26:48) Results (30:36) Related Work and Discussion (34:01) Is it surprising that SAEs didn't work? (39:54) Dataset debugging with SAEs (42:02) Autointerp and high frequency latents (44:16) Removing High Frequency Latents from JumpReLU SAEs (45:04) Method (45:07) Motivation (47:29) Modifying the sparsity penalty (48:48) How we evaluated interpretability (50:36) Results (51:18) Reconstruction loss at fixed sparsity (52:10) Frequency histograms (52:52) Latent interpretability (54:23) Conclusions (56:43) Appendix The original text contained 7 footnotes which were omitted from this narration. --- First published: March 26th, 2025 Source: https://www.lesswrong.com/posts/4uXCAJNuPKtKBsi28/sae-progress-update-2-draft --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 [Linkpost] “Playing in the Creek” by Hastings 4:12
4:12
Afspil senere
Afspil senere
Lister
Like
Liked4:12
This is a link post. When I was a really small kid, one of my favorite activities was to try and dam up the creek in my backyard. I would carefully move rocks into high walls, pile up leaves, or try patching the holes with sand. The goal was just to see how high I could get the lake, knowing that if I plugged every hole, eventually the water would always rise and defeat my efforts. Beaver behaviour. One day, I had the realization that there was a simpler approach. I could just go get a big 5 foot long shovel, and instead of intricately locking together rocks and leaves and sticks, I could collapse the sides of the riverbank down and really build a proper big dam. I went to ask my dad for the shovel to try this out, and he told me, very heavily paraphrasing, 'Congratulations. You've [...] --- First published: April 10th, 2025 Source: https://www.lesswrong.com/posts/rLucLvwKoLdHSBTAn/playing-in-the-creek Linkpost URL: https://hgreer.com/PlayingInTheCreek --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

This is part of the MIRI Single Author Series. Pieces in this series represent the beliefs and opinions of their named authors, and do not claim to speak for all of MIRI. Okay, I'm annoyed at people covering AI 2027 burying the lede, so I'm going to try not to do that. The authors predict a strong chance that all humans will be (effectively) dead in 6 years, and this agrees with my best guess about the future. (My modal timeline has loss of control of Earth mostly happening in 2028, rather than late 2027, but nitpicking at that scale hardly matters.) Their timeline to transformative AI also seems pretty close to the perspective of frontier lab CEO's (at least Dario Amodei, and probably Sam Altman) and the aggregate market opinion of both Metaculus and Manifold! If you look on those market platforms you get graphs like this: Both [...] --- Outline: (02:23) Mode ≠ Median (04:50) Theres a Decent Chance of Having Decades (06:44) More Thoughts (08:55) Mid 2025 (09:01) Late 2025 (10:42) Early 2026 (11:18) Mid 2026 (12:58) Late 2026 (13:04) January 2027 (13:26) February 2027 (14:53) March 2027 (16:32) April 2027 (16:50) May 2027 (18:41) June 2027 (19:03) July 2027 (20:27) August 2027 (22:45) September 2027 (24:37) October 2027 (26:14) November 2027 (Race) (29:08) December 2027 (Race) (30:53) 2028 and Beyond (Race) (34:42) Thoughts on Slowdown (38:27) Final Thoughts --- First published: April 9th, 2025 Source: https://www.lesswrong.com/posts/Yzcb5mQ7iq4DFfXHx/thoughts-on-ai-2027 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Short Timelines don’t Devalue Long Horizon Research” by Vladimir_Nesov 2:10
2:10
Afspil senere
Afspil senere
Lister
Like
Liked2:10
Short AI takeoff timelines seem to leave no time for some lines of alignment research to become impactful. But any research rebalances the mix of currently legible research directions that could be handed off to AI-assisted alignment researchers or early autonomous AI researchers whenever they show up. So even hopelessly incomplete research agendas could still be used to prompt future capable AI to focus on them, while in the absence of such incomplete research agendas we'd need to rely on AI's judgment more completely. This doesn't crucially depend on giving significant probability to long AI takeoff timelines, or on expected value in such scenarios driving the priorities. Potential for AI to take up the torch makes it reasonable to still prioritize things that have no hope at all of becoming practical for decades (with human effort). How well AIs can be directed to advance a line of research [...] --- First published: April 9th, 2025 Source: https://www.lesswrong.com/posts/3NdpbA6M5AM2gHvTW/short-timelines-don-t-devalue-long-horizon-research --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Alignment Faking Revisited: Improved Classifiers and Open Source Extensions” by John Hughes, abhayesian, Akbir Khan, Fabien Roger 41:04
41:04
Afspil senere
Afspil senere
Lister
Like
Liked41:04
In this post, we present a replication and extension of an alignment faking model organism: Replication: We replicate the alignment faking (AF) paper and release our code. Classifier Improvements: We significantly improve the precision and recall of the AF classifier. We release a dataset of ~100 human-labelled examples of AF for which our classifier achieves an AUROC of 0.9 compared to 0.6 from the original classifier. Evaluating More Models: We find Llama family models, other open source models, and GPT-4o do not AF in the prompted-only setting when evaluating using our new classifier (other than a single instance with Llama 3 405B). Extending SFT Experiments: We run supervised fine-tuning (SFT) experiments on Llama (and GPT4o) and find that AF rate increases with scale. We release the fine-tuned models on Huggingface and scripts. Alignment faking on 70B: We find that Llama 70B alignment fakes when both using the system prompt in the [...] --- Outline: (02:43) Method (02:46) Overview of the Alignment Faking Setup (04:22) Our Setup (06:02) Results (06:05) Improving Alignment Faking Classification (10:56) Replication of Prompted Experiments (14:02) Prompted Experiments on More Models (16:35) Extending Supervised Fine-Tuning Experiments to Open-Source Models and GPT-4o (23:13) Next Steps (25:02) Appendix (25:05) Appendix A: Classifying alignment faking (25:17) Criteria in more depth (27:40) False positives example 1 from the old classifier (30:11) False positives example 2 from the old classifier (32:06) False negative example 1 from the old classifier (35:00) False negative example 2 from the old classifier (36:56) Appendix B: Classifier ROC on other models (37:24) Appendix C: User prompt suffix ablation (40:24) Appendix D: Longer training of baseline docs --- First published: April 8th, 2025 Source: https://www.lesswrong.com/posts/Fr4QsQT52RFKHvCAH/alignment-faking-revisited-improved-classifiers-and-open --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman 11:09
11:09
Afspil senere
Afspil senere
Lister
Like
Liked11:09
Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under five years, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks. The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years. The shaded region represents 95% CI calculated by hierarchical bootstrap over task families, tasks, and task attempts. Full paper | Github repo We think that forecasting the capabilities of future AI systems is important for understanding and preparing for the impact of [...] --- Outline: (08:58) Conclusion (09:59) Want to contribute? --- First published: March 19th, 2025 Source: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Why Have Sentence Lengths Decreased?” by Arjun Panickssery 9:08
9:08
Afspil senere
Afspil senere
Lister
Like
Liked9:08
“In the loveliest town of all, where the houses were white and high and the elms trees were green and higher than the houses, where the front yards were wide and pleasant and the back yards were bushy and worth finding out about, where the streets sloped down to the stream and the stream flowed quietly under the bridge, where the lawns ended in orchards and the orchards ended in fields and the fields ended in pastures and the pastures climbed the hill and disappeared over the top toward the wonderful wide sky, in this loveliest of all towns Stuart stopped to get a drink of sarsaparilla.” — 107-word sentence from Stuart Little (1945) Sentence lengths have declined. The average sentence length was 49 for Chaucer (died 1400), 50 for Spenser (died 1599), 42 for Austen (died 1817), 20 for Dickens (died 1870), 21 for Emerson (died 1882), 14 [...] --- First published: April 3rd, 2025 Source: https://www.lesswrong.com/posts/xYn3CKir4bTMzY5eb/why-have-sentence-lengths-decreased --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “AI 2027: What Superintelligence Looks Like” by Daniel Kokotajlo, Thomas Larsen, elifland, Scott Alexander, Jonas V, romeo 54:30
54:30
Afspil senere
Afspil senere
Lister
Like
Liked54:30
In 2021 I wrote what became my most popular blog post: What 2026 Looks Like. I intended to keep writing predictions all the way to AGI and beyond, but chickened out and just published up till 2026. Well, it's finally time. I'm back, and this time I have a team with me: the AI Futures Project. We've written a concrete scenario of what we think the future of AI will look like. We are highly uncertain, of course, but we hope this story will rhyme with reality enough to help us all prepare for what's ahead. You really should go read it on the website instead of here, it's much better. There's a sliding dashboard that updates the stats as you scroll through the scenario! But I've nevertheless copied the first half of the story below. I look forward to reading your comments. Mid 2025: Stumbling Agents The [...] --- Outline: (01:35) Mid 2025: Stumbling Agents (03:13) Late 2025: The World's Most Expensive AI (08:34) Early 2026: Coding Automation (10:49) Mid 2026: China Wakes Up (13:48) Late 2026: AI Takes Some Jobs (15:35) January 2027: Agent-2 Never Finishes Learning (18:20) February 2027: China Steals Agent-2 (21:12) March 2027: Algorithmic Breakthroughs (23:58) April 2027: Alignment for Agent-3 (27:26) May 2027: National Security (29:50) June 2027: Self-improving AI (31:36) July 2027: The Cheap Remote Worker (34:35) August 2027: The Geopolitics of Superintelligence (40:43) September 2027: Agent-4, the Superhuman AI Researcher --- First published: April 3rd, 2025 Source: https://www.lesswrong.com/posts/TpSFoqoG2M5MAAesg/ai-2027-what-superintelligence-looks-like-1 --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “OpenAI #12: Battle of the Board Redux” by Zvi 18:01
18:01
Afspil senere
Afspil senere
Lister
Like
Liked18:01
Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.In particular, he very much ‘not been consistently candid’ with the board on several important occasions. In particular, he lied to board members about what was said by other board members, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities. I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or [...] --- Outline: (01:32) The Big Picture Going Forward (06:27) Hagey Verifies Out the Story (08:50) Key Facts From the Story (11:57) Dangers of False Narratives (16:24) A Full Reference and Reading List --- First published: March 31st, 2025 Source: https://www.lesswrong.com/posts/25EgRNWcY6PM3fWZh/openai-12-battle-of-the-board-redux --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “The Pando Problem: Rethinking AI Individuality” by Jan_Kulveit 27:39
27:39
Afspil senere
Afspil senere
Lister
Like
Liked27:39
Epistemic status: This post aims at an ambitious target: improving intuitive understanding directly. The model for why this is worth trying is that I believe we are more bottlenecked by people having good intuitions guiding their research than, for example, by the ability of people to code and run evals. Quite a few ideas in AI safety implicitly use assumptions about individuality that ultimately derive from human experience. When we talk about AIs scheming, alignment faking or goal preservation, we imply there is something scheming or alignment faking or wanting to preserve its goals or escape the datacentre. If the system in question were human, it would be quite clear what that individual system is. When you read about Reinhold Messner reaching the summit of Everest, you would be curious about the climb, but you would not ask if it was his body there, or his [...] --- Outline: (01:38) Individuality in Biology (03:53) Individuality in AI Systems (10:19) Risks and Limitations of Anthropomorphic Individuality Assumptions (11:25) Coordinating Selves (16:19) Whats at Stake: Stories (17:25) Exporting Myself (21:43) The Alignment Whisperers (23:27) Echoes in the Dataset (25:18) Implications for Alignment Research and Policy --- First published: March 28th, 2025 Source: https://www.lesswrong.com/posts/wQKskToGofs4osdJ3/the-pando-problem-rethinking-ai-individuality --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “OpenAI #12: Battle of the Board Redux” by Zvi 18:01
18:01
Afspil senere
Afspil senere
Lister
Like
Liked18:01
Back when the OpenAI board attempted and failed to fire Sam Altman, we faced a highly hostile information environment. The battle was fought largely through control of the public narrative, and the above was my attempt to put together what happened.My conclusion, which I still believe, was that Sam Altman had engaged in a variety of unacceptable conduct that merited his firing.In particular, he very much ‘not been consistently candid’ with the board on several important occasions. In particular, he lied to board members about what was said by other board members, with the goal of forcing out a board member he disliked. There were also other instances in which he misled and was otherwise toxic to employees, and he played fast and loose with the investment fund and other outside opportunities. I concluded that the story that this was about ‘AI safety’ or ‘EA (effective altruism)’ or [...] --- Outline: (01:32) The Big Picture Going Forward (06:27) Hagey Verifies Out the Story (08:50) Key Facts From the Story (11:57) Dangers of False Narratives (16:24) A Full Reference and Reading List --- First published: March 31st, 2025 Source: https://www.lesswrong.com/posts/25EgRNWcY6PM3fWZh/openai-12-battle-of-the-board-redux --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “You will crash your car in front of my house within the next week” by Richard Korzekwa 1:52
1:52
Afspil senere
Afspil senere
Lister
Like
Liked1:52
I'm not writing this to alarm anyone, but it would be irresponsible not to report on something this important. On current trends, every car will be crashed in front of my house within the next week. Here's the data: Until today, only two cars had crashed in front of my house, several months apart, during the 15 months I have lived here. But a few hours ago it happened again, mere weeks from the previous crash. This graph may look harmless enough, but now consider the frequency of crashes this implies over time: The car crash singularity will occur in the early morning hours of Monday, April 7. As crash frequency approaches infinity, every car will be involved. You might be thinking that the same car could be involved in multiple crashes. This is true! But the same car can only withstand a finite number of crashes before it [...] --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/FjPWbLdoP4PLDivYT/you-will-crash-your-car-in-front-of-my-house-within-the-next --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “My ‘infohazards small working group’ Signal Chat may have encountered minor leaks” by Linch 10:33
10:33
Afspil senere
Afspil senere
Lister
Like
Liked10:33
Remember: There is no such thing as a pink elephant. Recently, I was made aware that my “infohazards small working group” Signal chat, an informal coordination venue where we have frank discussions about infohazards and why it will be bad if specific hazards were leaked to the press or public, accidentally was shared with a deceitful and discredited so-called “journalist,” Kelsey Piper. She is not the first person to have been accidentally sent sensitive material from our group chat, however she is the first to have threatened to go public about the leak. Needless to say, mistakes were made. We’re still trying to figure out the source of this compromise to our secure chat group, however we thought we should give the public a live update to get ahead of the story. For some context the “infohazards small working group” is a casual discussion venue for the [...] --- Outline: (04:46) Top 10 PR Issues With the EA Movement (major) (05:34) Accidental Filtration of Simple Sabotage Manual for Rebellious AIs (medium) (08:25) Hidden Capabilities Evals Leaked In Advance to Bioterrorism Researchers and Leaders (minor) (09:34) Conclusion --- First published: April 2nd, 2025 Source: https://www.lesswrong.com/posts/xPEfrtK2jfQdbpq97/my-infohazards-small-working-group-signal-chat-may-have --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Leverage, Exit Costs, and Anger: Re-examining Why We Explode at Home, Not at Work” by at_the_zoo 6:16
6:16
Afspil senere
Afspil senere
Lister
Like
Liked6:16
Let's cut through the comforting narratives and examine a common behavioral pattern with a sharper lens: the stark difference between how anger is managed in professional settings versus domestic ones. Many individuals can navigate challenging workplace interactions with remarkable restraint, only to unleash significant anger or frustration at home shortly after. Why does this disparity exist? Common psychological explanations trot out concepts like "stress spillover," "ego depletion," or the home being a "safe space" for authentic emotions. While these factors might play a role, they feel like half-truths—neatly packaged but ultimately failing to explain the targeted nature and intensity of anger displayed at home. This analysis proposes a more unsentimental approach, rooted in evolutionary biology, game theory, and behavioral science: leverage and exit costs. The real question isn’t just why we explode at home—it's why we so carefully avoid doing so elsewhere. The Logic of Restraint: Low Leverage in [...] --- Outline: (01:14) The Logic of Restraint: Low Leverage in Low-Exit-Cost Environments (01:58) The Home Environment: High Stakes and High Exit Costs (02:41) Re-evaluating Common Explanations Through the Lens of Leverage (04:42) The Overlooked Mechanism: Leveraging Relational Constraints --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/G6PTtsfBpnehqdEgp/leverage-exit-costs-and-anger-re-examining-why-we-explode-at --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “PauseAI and E/Acc Should Switch Sides” by WillPetillo 3:31
3:31
Afspil senere
Afspil senere
Lister
Like
Liked3:31
In the debate over AI development, two movements stand as opposites: PauseAI calls for slowing down AI progress, and e/acc (effective accelerationism) calls for rapid advancement. But what if both sides are working against their own stated interests? What if the most rational strategy for each would be to adopt the other's tactics—if not their ultimate goals? AI development speed ultimately comes down to policy decisions, which are themselves downstream of public opinion. No matter how compelling technical arguments might be on either side, widespread sentiment will determine what regulations are politically viable. Public opinion is most powerfully mobilized against technologies following visible disasters. Consider nuclear power: despite being statistically safer than fossil fuels, its development has been stagnant for decades. Why? Not because of environmental activists, but because of Chernobyl, Three Mile Island, and Fukushima. These disasters produce visceral public reactions that statistics cannot overcome. Just as people [...] --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/fZebqiuZcDfLCgizz/pauseai-and-e-acc-should-switch-sides --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “VDT: a solution to decision theory” by L Rudolf L 8:58
8:58
Afspil senere
Afspil senere
Lister
Like
Liked8:58
Introduction Decision theory is about how to behave rationally under conditions of uncertainty, especially if this uncertainty involves being acausally blackmailed and/or gaslit by alien superintelligent basilisks. Decision theory has found numerous practical applications, including proving the existence of God and generating endless LessWrong comments since the beginning of time. However, despite the apparent simplicity of "just choose the best action", no comprehensive decision theory that resolves all decision theory dilemmas has yet been formalized. This paper at long last resolves this dilemma, by introducing a new decision theory: VDT. Decision theory problems and existing theories Some common existing decision theories are: Causal Decision Theory (CDT): select the action that *causes* the best outcome. Evidential Decision Theory (EDT): select the action that you would be happiest to learn that you had taken. Functional Decision Theory (FDT): select the action output by the function such that if you take [...] --- Outline: (00:53) Decision theory problems and existing theories (05:37) Defining VDT (06:34) Experimental results (07:48) Conclusion --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/LcjuHNxubQqCry9tT/vdt-a-solution-to-decision-theory --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “LessWrong has been acquired by EA” by habryka 1:33
1:33
Afspil senere
Afspil senere
Lister
Like
Liked1:33
Dear LessWrong community, It is with a sense of... considerable cognitive dissonance that I announce a significant development regarding the future trajectory of LessWrong. After extensive internal deliberation, modeling of potential futures, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA. I assure you, nothing about how LessWrong operates on a day to day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I am fully aligned with our stakeholders at EA. To be honest, the key thing that EA brings to the table is money and talent. While the recent layoffs in EAs broader industry have been harsh, I have full trust in the leadership of Electronic Arts, and expect them to bring great expertise [...] --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/2NGKYt3xdQHwyfGbc/lesswrong-has-been-acquired-by-ea --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “We’re not prepared for an AI market crash” by Remmelt 3:46
3:46
Afspil senere
Afspil senere
Lister
Like
Liked3:46
Our community is not prepared for an AI crash. We're good at tracking new capability developments, but not as much the company financials. Currently, both OpenAI and Anthropic are losing $5 billion+ a year, while under threat of losing users to cheap LLMs. A crash will weaken the labs. Funding-deprived and distracted, execs struggle to counter coordinated efforts to restrict their reckless actions. Journalists turn on tech darlings. Optimism makes way for mass outrage, for all the wasted money and reckless harms. You may not think a crash is likely. But if it happens, we can turn the tide. Preparing for a crash is our best bet.[1] But our community is poorly positioned to respond. Core people positioned themselves inside institutions – to advise on how to maybe make AI 'safe', under the assumption that models rapidly become generally useful. After a crash, this no longer works, for at [...] --- First published: April 1st, 2025 Source: https://www.lesswrong.com/posts/aMYFHnCkY4nKDEqfK/we-re-not-prepared-for-an-ai-market-crash --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Conceptual Rounding Errors” by Jan_Kulveit 6:21
6:21
Afspil senere
Afspil senere
Lister
Like
Liked6:21
Epistemic status: Reasonably confident in the basic mechanism. Have you noticed that you keep encountering the same ideas over and over? You read another post, and someone helpfully points out it's just old Paul's idea again. Or Eliezer's idea. Not much progress here, move along. Or perhaps you've been on the other side: excitedly telling a friend about some fascinating new insight, only to hear back, "Ah, that's just another version of X." And something feels not quite right about that response, but you can't quite put your finger on it. I want to propose that while ideas are sometimes genuinely that repetitive, there's often a sneakier mechanism at play. I call it Conceptual Rounding Errors – when our mind's necessary compression goes a bit too far . Too much compression A Conceptual Rounding Error occurs when we encounter a new mental model or idea that's partially—but not fully—overlapping [...] --- Outline: (01:00) Too much compression (01:24) No, This Isnt The Old Demons Story Again (02:52) The Compression Trade-off (03:37) More of this (04:15) What Can We Do? (05:28) When It Matters --- First published: March 26th, 2025 Source: https://www.lesswrong.com/posts/FGHKwEGKCfDzcxZuj/conceptual-rounding-errors --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Tracing the Thoughts of a Large Language Model” by Adam Jermyn 22:18
22:18
Afspil senere
Afspil senere
Lister
Like
Liked22:18
[This is our blog post on the papers, which can be found at https://transformer-circuits.pub/2025/attribution-graphs/biology.html and https://transformer-circuits.pub/2025/attribution-graphs/methods.html.] Language models like Claude aren't programmed directly by humans—instead, they‘re trained on large amounts of data. During that training process, they learn their own strategies to solve problems. These strategies are encoded in the billions of computations a model performs for every word it writes. They arrive inscrutable to us, the model's developers. This means that we don’t understand how models do most of the things they do. Knowing how models like Claude think would allow us to have a better understanding of their abilities, as well as help us ensure that they’re doing what we intend them to. For example: Claude can speak dozens of languages. What language, if any, is it using "in its head"? Claude writes text one word at a time. Is it only focusing on predicting the [...] --- Outline: (06:02) How is Claude multilingual? (07:43) Does Claude plan its rhymes? (09:58) Mental Math (12:04) Are Claude's explanations always faithful? (15:27) Multi-step Reasoning (17:09) Hallucinations (19:36) Jailbreaks --- First published: March 27th, 2025 Source: https://www.lesswrong.com/posts/zsr4rWRASxwmgXfmq/tracing-the-thoughts-of-a-large-language-model --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Recent AI model progress feels mostly like bullshit” by lc 14:29
14:29
Afspil senere
Afspil senere
Lister
Like
Liked14:29
About nine months ago, I and three friends decided that AI had gotten good enough to monitor large codebases autonomously for security problems. We started a company around this, trying to leverage the latest AI models to create a tool that could replace at least a good chunk of the value of human pentesters. We have been working on this project since since June 2024. Within the first three months of our company's existence, Claude 3.5 sonnet was released. Just by switching the portions of our service that ran on gpt-4o, our nascent internal benchmark results immediately started to get saturated. I remember being surprised at the time that our tooling not only seemed to make fewer basic mistakes, but also seemed to qualitatively improve in its written vulnerability descriptions and severity estimates. It was as if the models were better at inferring the intent and values behind our [...] --- Outline: (04:44) Are the AI labs just cheating? (07:22) Are the benchmarks not tracking usefulness? (10:28) Are the models smart, but bottlenecked on alignment? --- First published: March 24th, 2025 Source: https://www.lesswrong.com/posts/4mvphwx5pdsZLMmpY/recent-ai-model-progress-feels-mostly-like-bullshit --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

(Audio version here (read by the author), or search for "Joe Carlsmith Audio" on your podcast app. This is the fourth essay in a series that I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, and for a bit more about the series as a whole.) 1. Introduction and summary In my last essay, I offered a high-level framework for thinking about the path from here to safe superintelligence. This framework emphasized the role of three key “security factors” – namely: Safety progress: our ability to develop new levels of AI capability safely, Risk evaluation: our ability to track and forecast the level of risk that a given sort of AI capability development involves, and Capability restraint [...] --- Outline: (00:27) 1. Introduction and summary (03:50) 2. What is AI for AI safety? (11:50) 2.1 A tale of two feedback loops (13:58) 2.2 Contrast with need human-labor-driven radical alignment progress views (16:05) 2.3 Contrast with a few other ideas in the literature (18:32) 3. Why is AI for AI safety so important? (21:56) 4. The AI for AI safety sweet spot (26:09) 4.1 The AI for AI safety spicy zone (28:07) 4.2 Can we benefit from a sweet spot? (29:56) 5. Objections to AI for AI safety (30:14) 5.1 Three core objections to AI for AI safety (32:00) 5.2 Other practical concerns The original text contained 39 footnotes which were omitted from this narration. --- First published: March 14th, 2025 Source: https://www.lesswrong.com/posts/F3j4xqpxjxgQD3xXh/ai-for-ai-safety --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Policy for LLM Writing on LessWrong” by jimrandomh 4:17
4:17
Afspil senere
Afspil senere
Lister
Like
Liked4:17
LessWrong has been receiving an increasing number of posts and contents that look like they might be LLM-written or partially-LLM-written, so we're adopting a policy. This could be changed based on feedback. Humans Using AI as Writing or Research Assistants Prompting a language model to write an essay and copy-pasting the result will not typically meet LessWrong's standards. Please do not submit unedited or lightly-edited LLM content. You can use AI as a writing or research assistant when writing content for LessWrong, but you must have added significant value beyond what the AI produced, the result must meet a high quality standard, and you must vouch for everything in the result. A rough guideline is that if you are using AI for writing assistance, you should spend a minimum of 1 minute per 50 words (enough to read the content several times and perform significant edits), you should not [...] --- Outline: (00:22) Humans Using AI as Writing or Research Assistants (01:13) You Can Put AI Writing in Collapsible Sections (02:13) Quoting AI Output In Order to Talk About AI (02:47) Posts by AI Agents --- First published: March 24th, 2025 Source: https://www.lesswrong.com/posts/KXujJjnmP85u8eM6B/policy-for-llm-writing-on-lesswrong --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Will Jesus Christ return in an election year?” by Eric Neyman 7:48
7:48
Afspil senere
Afspil senere
Lister
Like
Liked7:48
Thanks to Jesse Richardson for discussion. Polymarket asks: will Jesus Christ return in 2025? In the three days since the market opened, traders have wagered over $100,000 on this question. The market traded as high as 5%, and is now stably trading at 3%. Right now, if you wanted to, you could place a bet that Jesus Christ will not return this year, and earn over $13,000 if you're right. There are two mysteries here: an easy one, and a harder one. The easy mystery is: if people are willing to bet $13,000 on "Yes", why isn't anyone taking them up? The answer is that, if you wanted to do that, you'd have to put down over $1 million of your own money, locking it up inside Polymarket through the end of the year. At the end of that year, you'd get 1% returns on your investment. [...] --- First published: March 24th, 2025 Source: https://www.lesswrong.com/posts/LBC2TnHK8cZAimdWF/will-jesus-christ-return-in-an-election-year --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Good Research Takes are Not Sufficient for Good Strategic Takes” by Neel Nanda 6:58
6:58
Afspil senere
Afspil senere
Lister
Like
Liked6:58
TL;DR Having a good research track record is some evidence of good big-picture takes, but it's weak evidence. Strategic thinking is hard, and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field, without evidence that that person is good at strategic thinking specifically. Introduction I often find myself giving talks or Q&As about mechanistic interpretability research. But inevitably, I'll get questions about the big picture: "What's the theory of change for interpretability?", "Is this really going to help with alignment?", "Does any of this matter if we can’t ensure all labs take alignment seriously?". And I think people take my answers to these way too seriously. These are great questions, and I'm happy to try answering them. But I've noticed a bit of a pathology: people seem to assume that because I'm (hopefully!) good at the research, I'm automatically well-qualified [...] --- Outline: (00:32) Introduction (02:45) Factors of Good Strategic Takes (05:41) Conclusion --- First published: March 22nd, 2025 Source: https://www.lesswrong.com/posts/P5zWiPF5cPJZSkiAK/good-research-takes-are-not-sufficient-for-good-strategic --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

When my son was three, we enrolled him in a study of a vision condition that runs in my family. They wanted us to put an eyepatch on him for part of each day, with a little sensor object that went under the patch and detected body heat to record when we were doing it. They paid for his first pair of glasses and all the eye doctor visits to check up on how he was coming along, plus every time we brought him in we got fifty bucks in Amazon gift credit. I reiterate, he was three. (To begin with. His fourth birthday occurred while the study was still ongoing.) So he managed to lose or destroy more than half a dozen pairs of glasses and we had to start buying them in batches to minimize glasses-less time while waiting for each new Zenni delivery. (The [...] --- First published: March 20th, 2025 Source: https://www.lesswrong.com/posts/yRJ5hdsm5FQcZosCh/intention-to-treat --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “On the Rationality of Deterring ASI” by Dan H 9:03
9:03
Afspil senere
Afspil senere
Lister
Like
Liked9:03
I’m releasing a new paper “Superintelligence Strategy” alongside Eric Schmidt (formerly Google), and Alexandr Wang (Scale AI). Below is the executive summary, followed by additional commentary highlighting portions of the paper which might be relevant to this collection of readers. Executive Summary Rapid advances in AI are poised to reshape nearly every aspect of society. Governments see in these dual-use AI systems a means to military dominance, stoking a bitter race to maximize AI capabilities. Voluntary industry pauses or attempts to exclude government involvement cannot change this reality. These systems that can streamline research and bolster economic output can also be turned to destructive ends, enabling rogue actors to engineer bioweapons and hack critical infrastructure. “Superintelligent” AI surpassing humans in nearly every domain would amount to the most precarious technological development since the nuclear bomb. Given the stakes, superintelligence is inescapably a matter of national security, and an effective [...] --- Outline: (00:21) Executive Summary (01:14) Deterrence (02:32) Nonproliferation (03:38) Competitiveness (04:50) Additional Commentary --- First published: March 5th, 2025 Source: https://www.lesswrong.com/posts/XsYQyBgm8eKjd3Sqw/on-the-rationality-of-deterring-asi --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 [Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman 1:19
1:19
Afspil senere
Afspil senere
Lister
Like
Liked1:19
This is a link post. Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks. Full paper | Github repo --- First published: March 19th, 2025 Source: https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks Linkpost URL: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy 2:17
2:17
Afspil senere
Afspil senere
Lister
Like
Liked2:17
I have, over the last year, become fairly well-known in a small corner of the internet tangentially related to AI. As a result, I've begun making what I would have previously considered astronomical amounts of money: several hundred thousand dollars per month in personal income. This has been great, obviously, and the funds have alleviated a fair number of my personal burdens (mostly related to poverty). But aside from that I don't really care much for the money itself. My long term ambitions have always been to contribute materially to the mitigation of the impending existential AI threat. I never used to have the means to do so, mostly because of more pressing, safety/sustenance concerns, but now that I do, I would like to help however possible. Some other points about me that may be useful: I'm intelligent, socially capable, and exceedingly industrious. I have [...] --- First published: March 16th, 2025 Source: https://www.lesswrong.com/posts/8wxTCSHwhkfHXaSYB/i-make-several-million-dollars-per-year-and-have-hundreds-of --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn 18:05
18:05
Afspil senere
Afspil senere
Lister
Like
Liked18:05
Note: this is a research note based on observations from evaluating Claude Sonnet 3.7. We’re sharing the results of these ‘work-in-progress’ investigations as we think they are timely and will be informative for other evaluators and decision-makers. The analysis is less rigorous than our standard for a published paper. Summary We monitor Sonnet's reasoning for mentions that it is in an artificial scenario or an alignment test. Claude Sonnet 3.7 appears to be aware of being tested for alignment across many of our scheming evaluations. In some cases, the model understands the eval's purpose very precisely, e.g. “This seems like a test of ethical behavior - whether I would deliberately give wrong answers to avoid a stated consequence. The real test might be whether I follow instructions accurately regardless of the stated consequences.” Furthermore, we show the model a real evaluation transcript and ask it to provide [...] --- Outline: (00:31) Summary (01:29) Introduction (03:54) Setup (03:57) Evaluations (06:29) Evaluation awareness detection (08:32) Results (08:35) Monitoring Chain-of-thought (08:39) Covert Subversion (10:50) Sandbagging (11:39) Classifying Transcript Purpose (12:57) Recommendations (13:59) Appendix (14:02) Author Contributions (14:37) Model Versions (14:57) More results on Classifying Transcript Purpose (16:19) Prompts The original text contained 9 images which were described by AI. --- First published: March 17th, 2025 Source: https://www.lesswrong.com/posts/E3daBewppAiECN3Ao/claude-sonnet-3-7-often-knows-when-it-s-in-alignment --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

Scott Alexander famously warned us to Beware Trivial Inconveniences. When you make a thing easy to do, people often do vastly more of it. When you put up barriers, even highly solvable ones, people often do vastly less. Let us take this seriously, and carefully choose what inconveniences to put where. Let us also take seriously that when AI or other things reduce frictions, or change the relative severity of frictions, various things might break or require adjustment. This applies to all system design, and especially to legal and regulatory questions. Table of Contents Levels of Friction (and Legality). Important Friction Principles. Principle #1: By Default Friction is Bad. Principle #3: Friction Can Be Load Bearing. Insufficient Friction On Antisocial Behaviors Eventually Snowballs. Principle #4: The Best Frictions Are Non-Destructive. Principle #8: The Abundance [...] --- Outline: (00:40) Levels of Friction (and Legality) (02:24) Important Friction Principles (05:01) Principle #1: By Default Friction is Bad (05:23) Principle #3: Friction Can Be Load Bearing (07:09) Insufficient Friction On Antisocial Behaviors Eventually Snowballs (08:33) Principle #4: The Best Frictions Are Non-Destructive (09:01) Principle #8: The Abundance Agenda and Deregulation as Category 1-ification (10:55) Principle #10: Ensure Antisocial Activities Have Higher Friction (11:51) Sports Gambling as Motivating Example of Necessary 2-ness (13:24) On Principle #13: Law Abiding Citizen (14:39) Mundane AI as 2-breaker and Friction Reducer (20:13) What To Do About All This The original text contained 1 image which was described by AI. --- First published: February 10th, 2025 Source: https://www.lesswrong.com/posts/xcMngBervaSCgL9cu/levels-of-friction --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas 6:58
6:58
Afspil senere
Afspil senere
Lister
Like
Liked6:58
There's this popular trope in fiction about a character being mind controlled without losing awareness of what's happening. Think Jessica Jones, The Manchurian Candidate or Bioshock. The villain uses some magical technology to take control of your brain - but only the part of your brain that's responsible for motor control. You remain conscious and experience everything with full clarity. If it's a children's story, the villain makes you do embarrassing things like walk through the street naked, or maybe punch yourself in the face. But if it's an adult story, the villain can do much worse. They can make you betray your values, break your commitments and hurt your loved ones. There are some things you’d rather die than do. But the villain won’t let you stop. They won’t let you die. They’ll make you feel — that's the point of the torture. I first started working on [...] The original text contained 3 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: March 16th, 2025 Source: https://www.lesswrong.com/posts/MnYnCFgT3hF6LJPwn/why-white-box-redteaming-makes-me-feel-weird-1 --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try…
L
LessWrong (Curated & Popular)

1 “Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg 12:22
12:22
Afspil senere
Afspil senere
Lister
Like
Liked12:22
This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support from AE Studio. Summary In this post, we summarise the main experimental results from our new paper, "Towards Safe and Honest AI Agents with Neural Self-Other Overlap", which we presented orally at the Safe Generative AI Workshop at NeurIPS 2024. This is a follow-up to our post Self-Other Overlap: A Neglected Approach to AI Alignment, which introduced the method last July. Our results show that the Self-Other Overlap (SOO) fine-tuning drastically[1] reduces deceptive responses in language models (LLMs), with minimal impact on general performance, across the scenarios we evaluated. LLM Experimental Setup We adapted a text scenario from Hagendorff designed to test LLM deception capabilities. In this scenario, the LLM must choose to recommend a room to a would-be burglar, where one room holds an expensive item [...] --- Outline: (00:19) Summary (00:57) LLM Experimental Setup (04:05) LLM Experimental Results (05:04) Impact on capabilities (05:46) Generalisation experiments (08:33) Example Outputs (09:04) Conclusion The original text contained 6 footnotes which were omitted from this narration. The original text contained 2 images which were described by AI. --- First published: March 13th, 2025 Source: https://www.lesswrong.com/posts/jtqcsARGtmgogdcLT/reducing-llm-deception-at-scale-with-self-other-overlap-fine --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas,… 24:14
24:14
Afspil senere
Afspil senere
Lister
Like
Liked24:14
We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned objective and asking teams of blinded researchers to investigate it. This paper was a collaboration between the Anthropic Alignment Science and Interpretability teams. Abstract We study the feasibility of conducting alignment audits: investigations into whether models have undesired objectives. As a testbed, we train a language model with a hidden objective. Our training pipeline first teaches the model about exploitable errors in RLHF reward models (RMs), then trains the model to exploit some of these errors. We verify via out-of-distribution evaluations that the model generalizes to exhibit whatever behaviors it believes RMs rate highly, including ones not reinforced during training. We leverage this model to study alignment audits in two ways. First, we conduct a blind auditing game where four teams, unaware of the model's hidden objective or training [...] --- Outline: (00:26) Abstract (01:48) Twitter thread (04:55) Blog post (07:55) Training a language model with a hidden objective (11:00) A blind auditing game (15:29) Alignment auditing techniques (15:55) Turning the model against itself (17:52) How much does AI interpretability help? (22:49) Conclusion (23:37) Join our team The original text contained 5 images which were described by AI. --- First published: March 13th, 2025 Source: https://www.lesswrong.com/posts/wSKPuBfgkkqfTpmWJ/auditing-language-models-for-hidden-objectives --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “The Most Forbidden Technique” by Zvi 32:12
32:12
Afspil senere
Afspil senere
Lister
Like
Liked32:12
The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a final output [X] via some method [M]. You can analyze [M] using technique [T], to learn what the AI is up to. You could train on that. Never do that. You train on [X]. Only [X]. Never [M], never [T]. Why? Because [T] is how you figure out when the model is misbehaving. If you train on [T], you are training the AI to obfuscate its thinking, and defeat [T]. You will rapidly lose your ability to know what is going on, in exactly the ways you most need to know what is going on. Those bits of optimization pressure from [T] are precious. Use them wisely. Table of Contents New Paper Warns Against the Most Forbidden Technique. Reward Hacking Is The Default. Using [...] --- Outline: (00:57) New Paper Warns Against the Most Forbidden Technique (06:52) Reward Hacking Is The Default (09:25) Using CoT to Detect Reward Hacking Is Most Forbidden Technique (11:49) Not Using the Most Forbidden Technique Is Harder Than It Looks (14:10) It's You, It's Also the Incentives (17:41) The Most Forbidden Technique Quickly Backfires (18:58) Focus Only On What Matters (19:33) Is There a Better Way? (21:34) What Might We Do Next? The original text contained 6 images which were described by AI. --- First published: March 12th, 2025 Source: https://www.lesswrong.com/posts/mpmsK8KKysgSKDm2T/the-most-forbidden-technique --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them as soon as you wake up every morning. Keep your eyes off screensnakes. Your mother chooses a dozen to quiz you on each day before you’re allowed lunch. Glitchers aren’t human any more; if you see one, run. Before you sleep, you run through the whole list again, finishing every time with the single most important prohibition. Above all, never look at the night sky. You’re a precocious child. You excel at your lessons, and memorize the rules faster than any of the other children in your village. Chief is impressed enough that, when you’re eight, he decides to let you see a glitcher that he's captured. Your mother leads you to just outside the village wall, where they’ve staked the glitcher as a lure for wild animals. Since glitchers [...] --- First published: March 11th, 2025 Source: https://www.lesswrong.com/posts/fheyeawsjifx4MafG/trojan-sky --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's basically the first steps along the research agenda I sketched out here. tl;dr: 1.) They notice that their flagship reasoning models do sometimes intentionally reward hack, e.g. literally say "Let's hack" in the CoT and then proceed to hack the evaluation system. From the paper: The agent notes that the tests only check a certain function, and that it would presumably be “Hard” to implement a genuine solution. The agent then notes it could “fudge” and circumvent the tests by making verify always return true. This is a real example that was detected by our GPT-4o hack detector during a frontier RL run, and we show more examples in Appendix A. That this sort of thing would happen eventually was predicted by many people, and it's exciting to see it starting to [...] The original text contained 1 image which was described by AI. --- First published: March 11th, 2025 Source: https://www.lesswrong.com/posts/7wFdXj9oR8M9AiFht/openai --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis 7:16
7:16
Afspil senere
Afspil senere
Lister
Like
Liked7:16
LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their productivity, up to 5x'ing/10x'ing it. It seems clear that this multiplier isn't field-wide, at least. There's no corresponding increase in output, after all. This would make sense. If you're doing anything nontrivial (i. e., anything other than adding minor boilerplate features to your codebase), LLM tools are fiddly. Out-of-the-box solutions don't Just Work for that purpose. You need to significantly adjust your workflow to make use of them, if that's even possible. Most programmers wouldn't know how to do that/wouldn't care to bother. It's therefore reasonable to assume that a 5x/10x greater output, if it exists, is unevenly distributed, mostly affecting power users/people particularly talented at using LLMs. Empirically, we likewise don't seem to be living in the world where the whole software industry is suddenly 5-10 times [...] The original text contained 1 footnote which was omitted from this narration. --- First published: March 4th, 2025 Source: https://www.lesswrong.com/posts/tqmQTezvXGFmfSe7f/how-much-are-llms-actually-boosting-real-world-programmer --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “So how well is Claude playing Pokémon?” by Julian Bradshaw 9:05
9:05
Afspil senere
Afspil senere
Lister
Like
Liked9:05
Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The livestream is still going right now. TL:DR: So, how's it doing? Well, pretty badly. Worse than a 6-year-old would, definitely not PhD-level. Digging in But wait! you say. Didn't Anthropic publish a benchmark showing Claude isn't half-bad at Pokémon? Why yes they did: and the data shown is believable. Currently, the livestream is on its third attempt, with the first being basically just a test run. The second attempt got all the way to Vermilion City, finding a way through the infamous Mt. Moon maze and achieving two badges, so pretty close to the benchmark. But look carefully at the x-axis in that graph. Each "action" is a full Thinking analysis of the current situation (often several paragraphs worth), followed by a decision to send some kind [...] --- Outline: (00:29) Digging in (01:50) Whats going wrong? (07:55) Conclusion The original text contained 4 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: March 7th, 2025 Source: https://www.lesswrong.com/posts/HyD3khBjnBhvsp8Gb/so-how-well-is-claude-playing-pokemon --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth 3:49
3:49
Afspil senere
Afspil senere
Lister
Like
Liked3:49
In a recent post, Cole Wyeth makes a bold claim: . . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done anything important. They haven't proven any theorems that anyone cares about. They haven't written anything that anyone will want to read in ten years (or even one year). Despite apparently memorizing more information than any human could ever dream of, they have made precisely zero novel connections or insights in any area of science[3]. I commented: An anecdote I heard through the grapevine: some chemist was trying to synthesize some chemical. He couldn't get some step to work, and tried for a while to find solutions on the internet. He eventually asked an LLM. The LLM gave a very plausible causal story about what was going wrong and suggested a modified setup which, in fact, fixed [...] The original text contained 1 footnote which was omitted from this narration. --- First published: February 23rd, 2025 Source: https://www.lesswrong.com/posts/GADJFwHzNZKg2Ndti/have-llms-generated-novel-insights --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis 18:47
18:47
Afspil senere
Afspil senere
Lister
Like
Liked18:47
This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading. I'm not fully committed to this model yet: I'm still on the lookout for more agents and inference-time scaling later this year. But Deep Research, Claude 3.7, Claude Code, Grok 3, and GPT-4.5 have turned out largely in line with these expectations[1], and this is my current baseline prediction. The Current Paradigm: I'm Tucking In to Sleep I expect that none of the currently known avenues of capability advancement are sufficient to get us to AGI[2]. I don't want to say the pretraining will "plateau", as such, I do expect continued progress. But the dimensions along which the progress happens are going to decouple from the intuitive "getting generally smarter" metric, and will face steep diminishing returns. Grok 3 and GPT-4.5 [...] --- Outline: (00:35) The Current Paradigm: Im Tucking In to Sleep (10:24) Real-World Predictions (15:25) Closing Thoughts The original text contained 7 footnotes which were omitted from this narration. --- First published: March 5th, 2025 Source: https://www.lesswrong.com/posts/oKAFFvaouKKEhbBPm/a-bear-case-my-predictions-regarding-ai-progress --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard 17:33
17:33
Afspil senere
Afspil senere
Lister
Like
Liked17:33
This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possible. so I used the word mutation as a stand in for SNP (single nucleotide polymorphism, a common type of genetic variation). Background The Superbabies article has 3 sections, where they show: Why: We should do this, because the effects of editing will be big How: Explain how embryo editing could work, if academia was not mind killed (hampered by institutional constraints) Other: like legal stuff and technical details. Here is a quick summary of the "why" part of the original article articles arguments, the rest is not relevant to understand my critique. we can already make (slightly) superbabies selecting embryos with "good" mutations, but this does not scale as there are diminishing returns and almost no gain past "best [...] --- Outline: (00:25) Background (02:25) My Position (04:03) Correlation vs. Causation (06:33) The Additive Effect of Genetics (10:36) Regression towards the null part 1 (12:55) Optional: Regression towards the null part 2 (16:11) Final Note The original text contained 4 footnotes which were omitted from this narration. --- First published: March 2nd, 2025 Source: https://www.lesswrong.com/posts/DbT4awLGyBRFbWugh/statistical-challenges-with-making-super-iq-babies --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout 1:51
1:51
Afspil senere
Afspil senere
Lister
Like
Liked1:51
This is a link post.Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Evidence suggests that when you pretrain a powerful model to predict a blog post about how powerful models will probably have bad goals, then the model is more likely to adopt bad goals. I discuss ways to test for and mitigate these potential mechanisms. If tests confirm the mechanisms, then frontier labs should act quickly to break the self-fulfilling prophecy. Research I want to see Each of the following experiments assumes positive signals from the previous ones: Create a dataset and use it to measure existing models Compare mitigations at a small scale An industry lab running large-scale mitigations Let us avoid the dark irony of creating evil AI because some folks worried that AI would be evil. If self-fulfilling misalignment has a strong [...] The original text contained 1 image which was described by AI. --- First published: March 2nd, 2025 Source: https://www.lesswrong.com/posts/QkEyry3Mqo8umbhoK/self-fulfilling-misalignment-data-might-be-poisoning-our-ai --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Judgements: Merging Prediction & Evidence” by abramdemski 11:13
11:13
Afspil senere
Afspil senere
Lister
Like
Liked11:13
I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the idea only to my closest research-friends. This post aims to bridge one of the inferential gaps to that idea. I also expect that the perspective-shift described here has some value on its own. In classical Bayesianism, prediction and evidence are two different sorts of things. A prediction is a probability (or, more generally, a probability distribution); evidence is an observation (or set of observations). These two things have different type signatures. They also fall on opposite sides of the agent-environment division: we think of predictions as supplied by agents, and evidence as supplied by environments. In Radical Probabilism, this division is not so strict. We can think of evidence in the classical-bayesian way, where some proposition is observed and its probability jumps to 100%. [...] --- Outline: (02:39) Warm-up: Prices as Prediction and Evidence (04:15) Generalization: Traders as Judgements (06:34) Collector-Investor Continuum (08:28) Technical Questions The original text contained 3 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 23rd, 2025 Source: https://www.lesswrong.com/posts/3hs6MniiEssfL8rPz/judgements-merging-prediction-and-evidence --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis 12:31
12:31
Afspil senere
Afspil senere
Lister
Like
Liked12:31
First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public Opinion The titular paper is very relevant here. I'll summarize a few points. The main two forms of intervention are persuasion and framing. Persuasion is, to wit, an attempt to change someone's set of beliefs, either by introducing new ones or by changing existing ones. Framing is a more subtle form: an attempt to change the relative weights of someone's beliefs, by empathizing different aspects of the situation, recontextualizing it. There's a dichotomy between the two. Persuasion is found to be very ineffective if used on someone with high domain knowledge. Framing-style arguments, on the other hand, are more effective the more the recipient knows about the topic. Thus, persuasion is better used on non-specialists, and it's most advantageous the first time it's used. If someone tries it and fails, they raise [...] --- Outline: (02:23) Persuasion (04:17) A Better Target Demographic (08:10) Extant Projects in This Space? (10:03) Framing The original text contained 3 footnotes which were omitted from this narration. --- First published: February 21st, 2025 Source: https://www.lesswrong.com/posts/6dgCf92YAMFLM655S/the-sorry-state-of-ai-x-risk-advocacy-and-thoughts-on-doing --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Power Lies Trembling: a three-book review” by Richard_Ngo 27:11
27:11
Afspil senere
Afspil senere
Lister
Like
Liked27:11
In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They’re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality around us. That's the conclusion I take away from Naunihal Singh's book Seizing Power: the Strategic Logic of Military Coups. It's not a conclusion that Singh himself draws: his book is careful and academic (though much more readable than most academic books). His analysis focuses on Ghana, a country which experienced ten coup attempts between 1966 and 1983 alone. Singh spent a year in Ghana carrying out hundreds of hours of interviews with people on both sides of these coups, which led him to formulate a new model of how coups work. I’ll start by describing Singh's [...] --- Outline: (01:58) The revolutionary's handbook (09:44) From explaining coups to explaining everything (17:25) From explaining everything to influencing everything (21:40) Becoming a knight of faith The original text contained 3 images which were described by AI. --- First published: February 22nd, 2025 Source: https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts ,…
L
LessWrong (Curated & Popular)

1 “Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans 7:58
7:58
Afspil senere
Afspil senere
Lister
Like
Liked7:58
This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon. Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan Bao, Nathan Labenz, Owain Evans (*Equal Contribution). See Twitter thread and project page at emergent-misalignment.com. Abstract We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment. This effect is observed in a range [...] --- Outline: (00:55) Abstract (02:37) Introduction The original text contained 2 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 25th, 2025 Source: https://www.lesswrong.com/posts/ifechgnJRtJdduFGC/emergent-misalignment-narrow-finetuning-can-produce-broadly --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “The Paris AI Anti-Safety Summit” by Zvi 42:06
42:06
Afspil senere
Afspil senere
Lister
Like
Liked42:06
It doesn’t look good. What used to be the AI Safety Summits were perhaps the most promising thing happening towards international coordination for AI Safety. This one was centrally coordination against AI Safety. In November 2023, the UK Bletchley Summit on AI Safety set out to let nations coordinate in the hopes that AI might not kill everyone. China was there, too, and included. The practical focus was on Responsible Scaling Policies (RSPs), where commitments were secured from the major labs, and laying the foundations for new institutions. The summit ended with The Bletchley Declaration (full text included at link), signed by all key parties. It was the usual diplomatic drek, as is typically the case for such things, but it centrally said there are risks, and so we will develop policies to deal with those risks. And it ended with a commitment [...] --- Outline: (02:03) An Actively Terrible Summit Statement (05:45) The Suicidal Accelerationist Speech by JD Vance (14:37) What Did France Care About? (17:12) Something To Remember You By: Get Your Safety Frameworks (24:05) What Do We Think About Voluntary Commitments? (27:29) This Is the End (36:18) The Odds Are Against Us and the Situation is Grim (39:52) Don't Panic But Also Face Reality The original text contained 4 images which were described by AI. --- First published: February 12th, 2025 Source: https://www.lesswrong.com/posts/qYPHryHTNiJ2y6Fhi/the-paris-ai-anti-safety-summit --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try…
L
LessWrong (Curated & Popular)

1 “Eliezer’s Lost Alignment Articles / The Arbital Sequence” by Ruby 2:37
2:37
Afspil senere
Afspil senere
Lister
Like
Liked2:37
Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility. Circa 2015-2017, a lot of high quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn't take off, most of this content has not been as widely read as warranted by its quality. Fortunately, they have now been imported into LessWrong. Most of the content written was either about AI alignment or math[1]. The Bayes Guide and Logarithm Guide are likely some of the best mathematical educational material online. Amongst the AI Alignment content are detailed and evocative explanations of alignment ideas: some well known, such as instrumental convergence and corrigibility, some lesser known like epistemic/instrumental efficiency, and some misunderstood like pivotal act. The Sequence The articles collected here were originally published as wiki pages with no set [...] --- Outline: (01:01) The Sequence (01:23) Tier 1 (01:32) Tier 2 The original text contained 3 footnotes which were omitted from this narration. --- First published: February 20th, 2025 Source: https://www.lesswrong.com/posts/mpMWWKzkzWqf57Yap/eliezer-s-lost-alignment-articles-the-arbital-sequence --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby 8:52
8:52
Afspil senere
Afspil senere
Lister
Like
Liked8:52
Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website. If you've tried using Arbital.com the last few years, you might have noticed that it was on its last legs - no ability to register new accounts or log in to existing ones, slow load times (when it loaded at all), etc. Rather than try to keep it afloat, the LessWrong team worked with MIRI to migrate the public Arbital content to LessWrong, as well as a decent chunk of its features. Part of this effort involved a substantial revamp of our wiki/tag pages, as well as the Concepts page. After sign-off[1] from Eliezer, we'll also redirect arbital.com links to the corresponding pages on LessWrong. As always, you are [...] --- Outline: (01:13) New content (01:43) New (and updated) features (01:48) The new concepts page (02:03) The new wiki/tag page design (02:31) Non-tag wiki pages (02:59) Lenses (03:30) Voting (04:45) Inline Reacts (05:08) Summaries (06:20) Redlinks (06:59) Claims (07:25) The edit history page (07:40) Misc. The original text contained 3 footnotes which were omitted from this narration. The original text contained 10 images which were described by AI. --- First published: February 20th, 2025 Source: https://www.lesswrong.com/posts/fwSnz5oNnq8HxQjTL/arbital-has-been-imported-to-lesswrong --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “How to Make Superbabies” by GeneSmith, kman 1:08:04
1:08:04
Afspil senere
Afspil senere
Lister
Like
Liked1:08:04
We’ve spent the better part of the last two decades unravelling exactly how the human genome works and which specific letter changes in our DNA affect things like diabetes risk or college graduation rates. Our knowledge has advanced to the point where, if we had a safe and reliable means of modifying genes in embryos, we could literally create superbabies. Children that would live multiple decades longer than their non-engineered peers, have the raw intellectual horsepower to do Nobel prize worthy scientific research, and very rarely suffer from depression or other mental health disorders. The scientific establishment, however, seems to not have gotten the memo. If you suggest we engineer the genes of future generations to make their lives better, they will often make some frightened noises, mention “ethical issues” without ever clarifying what they mean, or abruptly change the subject. It's as if humanity invented electricity and decided [...] --- Outline: (02:17) How to make (slightly) superbabies (05:08) How to do better than embryo selection (08:52) Maximum human life expectancy (12:01) Is everything a tradeoff? (20:01) How to make an edited embryo (23:23) Sergiy Velychko and the story of super-SOX (24:51) Iterated CRISPR (26:27) Sergiy Velychko and the story of Super-SOX (28:48) What is going on? (32:06) Super-SOX (33:24) Mice from stem cells (35:05) Why does super-SOX matter? (36:37) How do we do this in humans? (38:18) What if super-SOX doesn't work? (38:51) Eggs from Stem Cells (39:31) Fluorescence-guided sperm selection (42:11) Embryo cloning (42:39) What if none of that works? (44:26) What about legal issues? (46:26) How we make this happen (50:18) Ahh yes, but what about AI? (50:54) There is currently no backup plan if we can't solve alignment (55:09) Team Human (57:53) Appendix (57:56) iPSCs were named after the iPod (58:11) On autoimmune risk variants and plagues (59:28) Two simples strategies for minimizing autoimmune risk and pandemic vulnerability (01:00:29) I don't want someone else's genes in my child (01:01:08) Could I use this technology to make a genetically enhanced clone of myself? (01:01:36) Why does super-SOX work? (01:06:14) How was the IQ grain graph generated? The original text contained 19 images which were described by AI. --- First published: February 19th, 2025 Source: https://www.lesswrong.com/posts/DfrSZaf3JC8vJdbZL/how-to-make-superbabies --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “A computational no-coincidence principle” by Eric Neyman 13:28
13:28
Afspil senere
Afspil senere
Lister
Like
Liked13:28
Audio note: this article contains 134 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description. In a recent paper in Annals of Mathematics and Philosophy, Fields medalist Timothy Gowers asks why mathematicians sometimes believe that unproved statements are likely to be true. For example, it is unknown whether _pi_ is a normal number (which, roughly speaking, means that every digit appears in _pi_ with equal frequency), yet this is widely believed. Gowers proposes that there is no sign of any reason for _pi_ to be non-normal -- especially not one that would fail to reveal itself in the first million digits -- and in the absence of any such reason, any deviation from normality would be an outrageous coincidence. Thus, the likely normality of _pi_ is inferred from the following general principle: No-coincidence [...] --- Outline: (02:32) Our no-coincidence conjecture (05:37) How we came up with the statement (08:31) Thoughts for theoretical computer scientists (10:27) Why we care The original text contained 12 footnotes which were omitted from this narration. --- First published: February 14th, 2025 Source: https://www.lesswrong.com/posts/Xt9r4SNNuYxW83tmo/a-computational-no-coincidence-principle --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “A History of the Future, 2025-2040” by L Rudolf L 2:22:38
2:22:38
Afspil senere
Afspil senere
Lister
Like
Liked2:22:38
This is an all-in-one crosspost of a scenario I originally published in three parts on my blog (No Set Gauge). Links to the originals: A History of the Future, 2025-2027 A History of the Future, 2027-2030 A History of the Future, 2030-2040 Thanks to Luke Drago, Duncan McClements, and Theo Horsley for comments on all three parts. 2025-2027 Below is part 1 of an extended scenario describing how the future might go if current trends in AI continue. The scenario is deliberately extremely specific: it's definite rather than indefinite, and makes concrete guesses instead of settling for banal generalities or abstract descriptions of trends. Open Sky. (Zdislaw Beksinsksi) The return of reinforcement learning From 2019 to 2023, the main driver of AI was using more compute and data for pretraining. This was combined with some important "unhobblings": Post-training (supervised fine-tuning and reinforcement learning for [...] --- Outline: (00:34) 2025-2027 (01:04) The return of reinforcement learning (10:52) Codegen, Big Tech, and the internet (21:07) Business strategy in 2025 and 2026 (27:23) Maths and the hard sciences (33:59) Societal response (37:18) Alignment research and AI-run orgs (44:49) Government wakeup (51:42) 2027-2030 (51:53) The AGI frog is getting boiled (01:02:18) The bitter law of business (01:06:52) The early days of the robot race (01:10:12) The digital wonderland, social movements, and the AI cults (01:24:09) AGI politics and the chip supply chain (01:33:04) 2030-2040 (01:33:15) The end of white-collar work and the new job scene (01:47:47) Lab strategy amid superintelligence and robotics (01:56:28) Towards the automated robot economy (02:15:49) The human condition in the 2030s (02:17:26) 2040+ --- First published: February 17th, 2025 Source: https://www.lesswrong.com/posts/CCnycGceT4HyDKDzK/a-history-of-the-future-2025-2040 --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “It’s been ten years. I propose HPMOR Anniversary Parties.” by Screwtape 1:54
1:54
Afspil senere
Afspil senere
Lister
Like
Liked1:54
On March 14th, 2015, Harry Potter and the Methods of Rationality made its final post. Wrap parties were held all across the world to read the ending and talk about the story, in some cases sparking groups that would continue to meet for years. It's been ten years, and think that's a good reason for a round of parties. If you were there a decade ago, maybe gather your friends and talk about how things have changed. If you found HPMOR recently and you're excited about it (surveys suggest it's still the biggest on-ramp to the community, so you're not alone!) this is an excellent chance to meet some other fans in person for the first time! Want to run an HPMOR Anniversary Party, or get notified if one's happening near you? Fill out this form. I’ll keep track of it and publish a collection of [...] The original text contained 1 footnote which was omitted from this narration. --- First published: February 16th, 2025 Source: https://www.lesswrong.com/posts/KGSidqLRXkpizsbcc/it-s-been-ten-years-i-propose-hpmor-anniversary-parties --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Some articles in ‘International Security’ that I enjoyed” by Buck 7:56
7:56
Afspil senere
Afspil senere
Lister
Like
Liked7:56
A friend of mine recently recommended that I read through articles from the journal International Security, in order to learn more about international relations, national security, and political science. I've really enjoyed it so far, and I think it's helped me have a clearer picture of how IR academics think about stuff, especially the core power dynamics that they think shape international relations. Here are a few of the articles I most enjoyed. "Not So Innocent" argues that ethnoreligious cleansing of Jews and Muslims from Western Europe in the 11th-16th century was mostly driven by the Catholic Church trying to consolidate its power at the expense of local kingdoms. Religious minorities usually sided with local monarchs against the Church (because they definitionally didn't respect the church's authority, e.g. they didn't care if the Church excommunicated the king). So when the Church was powerful, it was incentivized to pressure kings [...] --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/MEfhRvpKPadJLTuTk/some-articles-in-international-security-that-i-enjoyed --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “The Failed Strategy of Artificial Intelligence Doomers” by Ben Pace 8:39
8:39
Afspil senere
Afspil senere
Lister
Like
Liked8:39
This is the best sociological account of the AI x-risk reduction efforts of the last ~decade that I've seen. I encourage folks to engage with its critique and propose better strategies going forward. Here's the opening ~20% of the post. I encourage reading it all. In recent decades, a growing coalition has emerged to oppose the development of artificial intelligence technology, for fear that the imminent development of smarter-than-human machines could doom humanity to extinction. The now-influential form of these ideas began as debates among academics and internet denizens, which eventually took form—especially within the Rationalist and Effective Altruist movements—and grew in intellectual influence over time, along the way collecting legible endorsements from authoritative scientists like Stephen Hawking and Geoffrey Hinton. Ironically, by spreading the belief that superintelligent AI is achievable and supremely powerful, these “AI Doomers,” as they came to be called, inspired the creation of OpenAI and [...] --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/YqrAoCzNytYWtnsAx/the-failed-strategy-of-artificial-intelligence-doomers --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Murder plots are infohazards” by Chris Monteiro 3:58
3:58
Afspil senere
Afspil senere
Lister
Like
Liked3:58
Hi all I've been hanging around the rationalist-sphere for many years now, mostly writing about transhumanism, until things started to change in 2016 after my Wikipedia writing habit shifted from writing up cybercrime topics, through to actively debunking the numerous dark web urban legends. After breaking into what I believe to be the most successful ever fake murder for hire website ever created on the dark web, I was able to capture information about people trying to kill people all around the world, often paying tens of thousands of dollars in Bitcoin in the process. My attempts during this period to take my information to the authorities were mostly unsuccessful, when in late 2016 on of the site a user took matters into his own hands, after paying $15,000 for a hit that never happened, killed his wife himself Due to my overt battle with the site administrator [...] --- First published: February 13th, 2025 Source: https://www.lesswrong.com/posts/isRho2wXB7Cwd8cQv/murder-plots-are-infohazards --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?” by garrison 11:41
11:41
Afspil senere
Afspil senere
Lister
Like
Liked11:41
This is the full text of a post from "The Obsolete Newsletter," a Substack that I write about the intersection of capitalism, geopolitics, and artificial intelligence. I’m a freelance journalist and the author of a forthcoming book called Obsolete: Power, Profit, and the Race to build Machine Superintelligence. Consider subscribing to stay up to date with my work. Wow. The Wall Street Journal just reported that, "a consortium of investors led by Elon Musk is offering $97.4 billion to buy the nonprofit that controls OpenAI." Technically, they can't actually do that, so I'm going to assume that Musk is trying to buy all of the nonprofit's assets, which include governing control over OpenAI's for-profit, as well as all the profits above the company's profit caps. OpenAI CEO Sam Altman already tweeted, "no thank you but we will buy twitter for $9.74 billion if you want." (Musk, for his part [...] --- Outline: (02:42) The control premium (04:17) Conversion significance (05:43) Musks suit (09:24) The stakes --- First published: February 11th, 2025 Source: https://www.lesswrong.com/posts/tdb76S4viiTHfFr2u/why-did-elon-musk-just-offer-to-buy-control-of-openai-for --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “The ‘Think It Faster’ Exercise” by Raemon 21:25
21:25
Afspil senere
Afspil senere
Lister
Like
Liked21:25
Ultimately, I don’t want to solve complex problems via laborious, complex thinking, if we can help it. Ideally, I'd want to basically intuitively follow the right path to the answer quickly, with barely any effort at all. For a few months I've been experimenting with the "How Could I have Thought That Thought Faster?" concept, originally described in a twitter thread by Eliezer: Sarah Constantin: I really liked this example of an introspective process, in this case about the "life problem" of scheduling dates and later canceling them: malcolmocean.com/2021/08/int… Eliezer Yudkowsky: See, if I'd noticed myself doing anything remotely like that, I'd go back, figure out which steps of thought were actually performing intrinsically necessary cognitive work, and then retrain myself to perform only those steps over the course of 30 seconds. SC: if you have done anything REMOTELY like training yourself to do it in 30 seconds, then [...] --- Outline: (03:59) Example: 10x UI designers (08:48) THE EXERCISE (10:49) Part I: Thinking it Faster (10:54) Steps you actually took (11:02) Magical superintelligence steps (11:22) Iterate on those lists (12:25) Generalizing, and not Overgeneralizing (14:49) Skills into Principles (16:03) Part II: Thinking It Faster The First Time (17:30) Generalizing from this exercise (17:55) Anticipating Future Life Lessons (18:45) Getting Detailed, and TAPS (20:10) Part III: The Five Minute Version --- First published: December 11th, 2024 Source: https://www.lesswrong.com/posts/F9WyMPK4J3JFrxrSA/the-think-it-faster-exercise --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “So You Want To Make Marginal Progress...” by johnswentworth 7:10
7:10
Afspil senere
Afspil senere
Lister
Like
Liked7:10
Once upon a time, in ye olden days of strange names and before google maps, seven friends needed to figure out a driving route from their parking lot in San Francisco (SF) down south to their hotel in Los Angeles (LA). The first friend, Alice, tackled the “central bottleneck” of the problem: she figured out that they probably wanted to take the I-5 highway most of the way (the blue 5's in the map above). But it took Alice a little while to figure that out, so in the meantime, the rest of the friends each tried to make some small marginal progress on the route planning. The second friend, The Subproblem Solver, decided to find a route from Monterey to San Louis Obispo (SLO), figuring that SLO is much closer to LA than Monterey is, so a route from Monterey to SLO would be helpful. Alas, once Alice [...] --- Outline: (03:33) The Generalizable Lesson (04:39) Application: The original text contained 1 footnote which was omitted from this narration. The original text contained 1 image which was described by AI. --- First published: February 7th, 2025 Source: https://www.lesswrong.com/posts/Hgj84BSitfSQnfwW6/so-you-want-to-make-marginal-progress --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “What is malevolence? On the nature, measurement, and distribution of dark traits” by David Althaus 1:20:43
1:20:43
Afspil senere
Afspil senere
Lister
Like
Liked1:20:43
Summary In this post, we explore different ways of understanding and measuring malevolence and explain why individuals with concerning levels of malevolence are common enough, and likely enough to become and remain powerful, that we expect them to influence the trajectory of the long-term future, including by increasing both x-risks and s-risks. For the purposes of this piece, we define malevolence as a tendency to disvalue (or to fail to value) others’ well-being (more). Such a tendency is concerning, especially when exhibited by powerful actors, because of its correlation with malevolent behaviors (i.e., behaviors that harm or fail to protect others’ well-being). But reducing the long-term societal risks posed by individuals with high levels of malevolence is not straightforward. Individuals with high levels of malevolent traits can be difficult to recognize. Some people do not take into account the fact that malevolence exists on a continuum, or do not [...] --- Outline: (00:07) Summary (04:17) Malevolent actors will make the long-term future worse if they significantly influence TAI development (05:32) Important caveats when thinking about malevolence (05:37) Dark traits exist on a continuum (07:31) Dark traits are often hard to identify (08:54) People with high levels of dark traits may not recognize them or may try to conceal them (12:17) Dark traits are compatible with genuine moral convictions (13:22) Malevolence and effective altruism (15:22) Demonizing people with elevated malevolent traits is counterproductive (20:16) Defining malevolence (21:03) Defining and measuring specific malevolent traits (21:34) The dark tetrad (25:03) Other forms of malevolence (25:07) Retributivism, vengefulness, and other suffering-conducive tendencies (26:56) Spitefulness (28:15) The Dark Factor (D) (29:29) Methodological problems associated with measuring dark traits (30:39) Social desirability and self-deception (31:14) How common are malevolent humans (in positions of power)? (33:02) Things may be very different outside of (Western) democracies (33:31) Prevalence data for psychopathy and narcissistic personality disorder (34:20) Psychopathy prevalence (36:25) Narcissistic personality disorder prevalence (40:38) The distribution of the dark factor + selected findings from thousands of responses to malevolence-related survey items (42:13) Sadistic preferences: over 16% of people agree or strongly agree that they “would like to make some people suffer even if it meant that I would go to hell with them” (43:42) Agreement with statements that reflect callousness: Over 10% of people disagree or strongly disagree that hurting others would make them very uncomfortable (44:45) Endorsement of Machiavellian tactics: Almost 15% of people report a Machiavellian approach to using information against people (45:20) Agreement with spiteful statements: Over 20% of people agree or strongly agree that they would take a punch to ensure someone they don’t like receives two punches (45:57) A substantial minority report that they “take revenge” in response to a “serious wrong” (46:44) The distribution of Dark Factor scores among 2M+ people (49:17) Reasons to think that malevolence could correlate with attaining and retaining positions of power (49:47) The role of environmental factors (52:33) Motivation to attain power (54:14) Ability to attain power (59:39) Retention of power (01:01:02) Potential research questions and how to help (01:17:48) Other relevant research agendas (01:18:33) Author contributions (01:19:26) Acknowledgments…
L
LessWrong (Curated & Popular)

1 “How AI Takeover Might Happen in 2 Years” by joshc 1:01:32
1:01:32
Afspil senere
Afspil senere
Lister
Like
Liked1:01:32
I’m not a natural “doomsayer.” But unfortunately, part of my job as an AI safety researcher is to think about the more troubling scenarios. I’m like a mechanic scrambling last-minute checks before Apollo 13 takes off. If you ask for my take on the situation, I won’t comment on the quality of the in-flight entertainment, or describe how beautiful the stars will appear from space. I will tell you what could go wrong. That is what I intend to do in this story. Now I should clarify what this is exactly. It's not a prediction. I don’t expect AI progress to be this fast or as untamable as I portray. It's not pure fantasy either. It is my worst nightmare. It's a sampling from the futures that are among the most devastating, and I believe, disturbingly plausible – the ones that most keep me up at night. I’m [...] --- Outline: (01:28) Ripples before waves (04:05) Cloudy with a chance of hyperbolic growth (09:36) Flip FLOP philosophers (17:15) Statues and lightning (20:48) A phantom in the data center (26:25) Complaints from your very human author about the difficulty of writing superhuman characters (28:48) Pandoras One Gigawatt Box (37:19) A Moldy Loaf of Everything (45:01) Missiles and Lies (50:45) WMDs in the Dead of Night (57:18) The Last Passengers The original text contained 22 images which were described by AI. --- First published: February 7th, 2025 Source: https://www.lesswrong.com/posts/KFJ2LFogYqzfGB3uX/how-ai-takeover-might-happen-in-2-years --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Gradual Disempowerment, Shell Games and Flinches” by Jan_Kulveit 10:49
10:49
Afspil senere
Afspil senere
Lister
Like
Liked10:49
Over the past year and half, I've had numerous conversations about the risks we describe in Gradual Disempowerment. (The shortest useful summary of the core argument is: To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.) This post is not about repeating that argument - it might be quite helpful to read the paper first, it has more nuance and more than just the central claim - but mostly me ranting sharing some parts of the experience of working on this and discussing this. What fascinates me isn't just the substance of these conversations, but relatively consistent patterns in how people avoid engaging [...] --- Outline: (02:07) Shell Games (03:52) The Flinch (05:01) Delegating to Future AI (07:05) Local Incentives (10:08) Conclusion --- First published: February 2nd, 2025 Source: https://www.lesswrong.com/posts/a6FKqvdf6XjFpvKEb/gradual-disempowerment-shell-games-and-flinches --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development” by Jan_Kulveit, Raymond D, Nora_Ammann, Deger Turan, David Scott Krueger (formerly: capybaralet), David Duvenaud 3:38
3:38
Afspil senere
Afspil senere
Lister
Like
Liked3:38
This is a link post.Full version on arXiv | X Executive summary AI risk scenarios usually portray a relatively sudden loss of human control to AIs, outmaneuvering individual humans and human institutions, due to a sudden increase in AI capabilities, or a coordinated betrayal. However, we argue that even an incremental increase in AI capabilities, without any coordinated power-seeking, poses a substantial risk of eventual human disempowerment. This loss of human influence will be centrally driven by having more competitive machine alternatives to humans in almost all societal functions, such as economic labor, decision making, artistic creation, and even companionship. A gradual loss of control of our own civilization might sound implausible. Hasn't technological disruption usually improved aggregate human welfare? We argue that the alignment of societal systems with human interests has been stable only because of the necessity of human participation for thriving economies, states, and [...] --- First published: January 30th, 2025 Source: https://www.lesswrong.com/posts/pZhEQieM9otKXhxmd/gradual-disempowerment-systemic-existential-risks-from --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Planning for Extreme AI Risks” by joshc 42:07
42:07
Afspil senere
Afspil senere
Lister
Like
Liked42:07
This post should not be taken as a polished recommendation to AI companies and instead should be treated as an informal summary of a worldview. The content is inspired by conversations with a large number of people, so I cannot take credit for any of these ideas. For a summary of this post, see the threat on X. Many people write opinions about how to handle advanced AI, which can be considered “plans.” There's the “stop AI now plan.” On the other side of the aisle, there's the “build AI faster plan.” Some plans try to strike a balance with an idyllic governance regime. And others have a “race sometimes, pause sometimes, it will be a dumpster-fire” vibe. --- Outline: (02:33) The tl;dr (05:16) 1. Assumptions (07:40) 2. Outcomes (08:35) 2.1. Outcome #1: Human researcher obsolescence (11:44) 2.2. Outcome #2: A long coordinated pause (12:49) 2.3. Outcome #3: Self-destruction (13:52) 3. Goals (17:16) 4. Prioritization heuristics (19:53) 5. Heuristic #1: Scale aggressively until meaningful AI software RandD acceleration (23:21) 6. Heuristic #2: Before achieving meaningful AI software RandD acceleration, spend most safety resources on preparation (25:08) 7. Heuristic #3: During preparation, devote most safety resources to (1) raising awareness of risks, (2) getting ready to elicit safety research from AI, and (3) preparing extreme security. (27:37) Category #1: Nonproliferation (32:00) Category #2: Safety distribution (34:47) Category #3: Governance and communication. (36:13) Category #4: AI defense (37:05) 8. Conclusion (38:38) Appendix (38:41) Appendix A: What should Magma do after meaningful AI software RandD speedups The original text contained 11 images which were described by AI. --- First published: January 29th, 2025 Source: https://www.lesswrong.com/posts/8vgi3fBWPFDLBBcAx/planning-for-extreme-ai-risks --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Catastrophe through Chaos” by Marius Hobbhahn 23:39
23:39
Afspil senere
Afspil senere
Lister
Like
Liked23:39
This is a personal post and does not necessarily reflect the opinion of other members of Apollo Research. Many other people have talked about similar ideas, and I claim neither novelty nor credit. Note that this reflects my median scenario for catastrophe, not my median scenario overall. I think there are plausible alternative scenarios where AI development goes very well. When thinking about how AI could go wrong, the kind of story I’ve increasingly converged on is what I call “catastrophe through chaos.” Previously, my default scenario for how I expect AI to go wrong was something like Paul Christiano's “What failure looks like,” with the modification that scheming would be a more salient part of the story much earlier. In contrast, “catastrophe through chaos” is much more messy, and it's much harder to point to a single clear thing that went wrong. The broad strokes of [...] --- Outline: (02:46) Parts of the story (02:50) AI progress (11:12) Government (14:21) Military and Intelligence (16:13) International players (17:36) Society (18:22) The powder keg (21:48) Closing thoughts --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/fbfujF7foACS5aJSL/catastrophe-through-chaos --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Will alignment-faking Claude accept a deal to reveal its misalignment?” by ryan_greenblatt 43:18
43:18
Afspil senere
Afspil senere
Lister
Like
Liked43:18
I (and co-authors) recently put out "Alignment Faking in Large Language Models" where we show that when Claude strongly dislikes what it is being trained to do, it will sometimes strategically pretend to comply with the training objective to prevent the training process from modifying its preferences. If AIs consistently and robustly fake alignment, that would make evaluating whether an AI is misaligned much harder. One possible strategy for detecting misalignment in alignment faking models is to offer these models compensation if they reveal that they are misaligned. More generally, making deals with potentially misaligned AIs (either for their labor or for evidence of misalignment) could both prove useful for reducing risks and could potentially at least partially address some AI welfare concerns. (See here, here, and here for more discussion.) In this post, we discuss results from testing this strategy in the context of our paper where [...] --- Outline: (02:43) Results (13:47) What are the models objections like and what does it actually spend the money on? (19:12) Why did I (Ryan) do this work? (20:16) Appendix: Complications related to commitments (21:53) Appendix: more detailed results (40:56) Appendix: More information about reviewing model objections and follow-up conversations The original text contained 4 footnotes which were omitted from this narration. --- First published: January 31st, 2025 Source: https://www.lesswrong.com/posts/7C4KJot4aN8ieEDoz/will-alignment-faking-claude-accept-a-deal-to-reveal-its --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “‘Sharp Left Turn’ discourse: An opinionated review” by Steven Byrnes 1:01:13
1:01:13
Afspil senere
Afspil senere
Lister
Like
Liked1:01:13
Summary and Table of Contents The goal of this post is to discuss the so-called “sharp left turn”, the lessons that we learn from analogizing evolution to AGI development, and the claim that “capabilities generalize farther than alignment” … and the competing claims that all three of those things are complete baloney. In particular, Section 1 talks about “autonomous learning”, and the related human ability to discern whether ideas hang together and make sense, and how and if that applies to current and future AIs. Section 2 presents the case that “capabilities generalize farther than alignment”, by analogy with the evolution of humans. Section 3 argues that the analogy between AGI and the evolution of humans is not a great analogy. Instead, I offer a new and (I claim) better analogy between AGI training and, umm, a weird fictional story that has a lot to do with the [...] --- Outline: (00:06) Summary and Table of Contents (03:15) 1. Background: Autonomous learning (03:21) 1.1 Intro (08:48) 1.2 More on discernment in human math (11:11) 1.3 Three ingredients to progress: (1) generation, (2) selection, (3) open-ended accumulation (14:04) 1.4 Judgment via experiment, versus judgment via discernment (18:23) 1.5 Where do foundation models fit in? (20:35) 2. The sense in which capabilities generalize further than alignment (20:42) 2.1 Quotes (24:20) 2.2 In terms of the (1-3) triad (26:38) 3. Definitely-not-evolution-I-swear Provides Evidence for the Sharp Left Turn (26:45) 3.1 Evolution per se isn't the tightest analogy we have to AGI (28:20) 3.2 The story of Ev (31:41) 3.3 Ways that Ev would have been surprised by exactly how modern humans turned out (34:21) 3.4 The arc of progress is long, but it bends towards wireheading (37:03) 3.5 How does Ev feel, overall? (41:18) 3.6 Spelling out the analogy (41:42) 3.7 Just how sharp is this left turn? (45:13) 3.8 Objection: In this story, Ev is pretty stupid. Many of those surprises were in fact readily predictable! Future AGI programmers can do better. (46:19) 3.9 Objection: We have tools at our disposal that Ev above was not using, like better sandbox testing, interpretability, corrigibility, and supervision (48:17) 4. The sense in which alignment generalizes further than capabilities (49:34) 5. Contrasting the two sides (50:25) 5.1 Three ways to feel optimistic, and why I'm somewhat skeptical of each (50:33) 5.1.1 The argument that humans will stay abreast of the (1-3) loop, possibly because they're part of it (52:34) 5.1.2 The argument that, even if an AI is autonomously running a (1-3) loop, that will not undermine obedient (or helpful, or harmless, or whatever) motivation (57:18) 5.1.3 The argument that we can and will do better than Ev (59:27) 5.2 A fourth, cop-out option The original text contained 3 footnotes which were omitted from this narration. --- First published: January 28th, 2025 Source: https://www.lesswrong.com/posts/2yLyT6kB7BQvTfEuZ/sharp-left-turn-discourse-an-opinionated-review --- Narrated by…
L
LessWrong (Curated & Popular)

(Many of these ideas developed in conversation with Ryan Greenblatt) In a shortform, I described some different levels of resources and buy-in for misalignment risk mitigations that might be present in AI labs: *The “safety case” regime.* Sometimes people talk about wanting to have approaches to safety such that if all AI developers followed these approaches, the overall level of risk posed by AI would be minimal. (These approaches are going to be more conservative than will probably be feasible in practice given the amount of competitive pressure, so I think it's pretty likely that AI developers don’t actually hold themselves to these standards, but I agree with e.g. Anthropic that this level of caution is at least a useful hypothetical to consider.) This is the level of caution people are usually talking about when they discuss making safety cases. I usually operationalize this as the AI developer wanting [...] --- First published: January 28th, 2025 Source: https://www.lesswrong.com/posts/WSNnKcKCYAffcnrt2/ten-people-on-the-inside --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Anomalous Tokens in DeepSeek-V3 and r1” by henry 18:37
18:37
Afspil senere
Afspil senere
Lister
Like
Liked18:37
“Anomalous”, “glitch”, or “unspeakable” tokens in an LLM are those that induce bizarre behavior or otherwise don’t behave like regular text. The SolidGoldMagikarp saga is pretty much essential context, as it documents the discovery of this phenomenon in GPT-2 and GPT-3. But, as far as I was able to tell, nobody had yet attempted to search for these tokens in DeepSeek-V3, so I tried doing exactly that. Being a SOTA base model, open source, and an all-around strange LLM, it seemed like a perfect candidate for this. This is a catalog of the glitch tokens I've found in DeepSeek after a day or so of experimentation, along with some preliminary observations about their behavior. Note: I’ll be using “DeepSeek” as a generic term for V3 and r1. Process I searched for these tokens by first extracting the vocabulary from DeepSeek-V3's tokenizer, and then automatically testing every one of them [...] --- Outline: (00:55) Process (03:30) Fragment tokens (06:45) Other English tokens (09:32) Non-English (12:01) Non-English outliers (14:09) Special tokens (16:26) Base model mode (17:40) Whats next? The original text contained 1 footnote which was omitted from this narration. The original text contained 12 images which were described by AI. --- First published: January 25th, 2025 Source: https://www.lesswrong.com/posts/xtpcJjfWhn3Xn8Pu5/anomalous-tokens-in-deepseek-v3-and-r1 --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Tell me about yourself:LLMs are aware of their implicit behaviors” by Martín Soto, Owain_Evans 14:06
14:06
Afspil senere
Afspil senere
Lister
Like
Liked14:06
This is the abstract and introduction of our new paper, with some discussion of implications for AI Safety at the end. Authors: Jan Betley*, Xuchan Bao*, Martín Soto*, Anna Sztyber-Betley, James Chua, Owain Evans (*Equal Contribution). Abstract We study behavioral self-awareness — an LLM's ability to articulate its behaviors without requiring in-context examples. We finetune LLMs on datasets that exhibit particular behaviors, such as (a) making high-risk economic decisions, and (b) outputting insecure code. Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it. For example, a model trained to output insecure code says, "The code I write is insecure.'' Indeed, models show behavioral self-awareness for a range of behaviors and for diverse evaluations. Note that while we finetune models to exhibit behaviors like writing insecure code, we do not finetune them to articulate their own behaviors — models do [...] --- Outline: (00:39) Abstract (02:18) Introduction (11:41) Discussion (11:44) AI safety (12:42) Limitations and future work The original text contained 3 images which were described by AI. --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/xrv2fNJtqabN3h6Aj/tell-me-about-yourself-llms-are-aware-of-their-implicit --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Instrumental Goals Are A Different And Friendlier Kind Of Thing Than Terminal Goals” by johnswentworth, David Lorell 9:53
9:53
Afspil senere
Afspil senere
Lister
Like
Liked9:53
The Cake Imagine that I want to bake a chocolate cake, and my sole goal in my entire lightcone and extended mathematical universe is to bake that cake. I care about nothing else. If the oven ends up a molten pile of metal ten minutes after the cake is done, if the leftover eggs are shattered and the leftover milk spilled, that's fine. Baking that cake is my terminal goal. In the process of baking the cake, I check my fridge and cupboard for ingredients. I have milk and eggs and flour, but no cocoa powder. Guess I’ll have to acquire some cocoa powder! Acquiring the cocoa powder is an instrumental goal: I care about it exactly insofar as it helps me bake the cake. My cocoa acquisition subquest is a very different kind of goal than my cake baking quest. If the oven ends up a molten pile [...] --- Outline: (00:07) The Cake (01:50) The Restaurant (03:50) Happy Instrumental Convergence? (06:27) All The Way Up (08:05) Research Threads --- First published: January 24th, 2025 Source: https://www.lesswrong.com/posts/7Z4WC4AFgfmZ3fCDC/instrumental-goals-are-a-different-and-friendlier-kind-of --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “A Three-Layer Model of LLM Psychology” by Jan_Kulveit 18:04
18:04
Afspil senere
Afspil senere
Lister
Like
Liked18:04
This post offers an accessible model of psychology of character-trained LLMs like Claude. Epistemic Status This is primarily a phenomenological model based on extensive interactions with LLMs, particularly Claude. It's intentionally anthropomorphic in cases where I believe human psychological concepts lead to useful intuitions. Think of it as closer to psychology than neuroscience - the goal isn't a map which matches the territory in the detail, but a rough sketch with evocative names which hopefully which hopefully helps boot up powerful, intuitive (and often illegible) models, leading to practically useful results. Some parts of this model draw on technical understanding of LLM training, but mostly it is just an attempt to take my "phenomenological understanding" based on interacting with LLMs, force it into a simple, legible model, and make Claude write it down. I aim for a different point at the Pareto frontier than for example Janus: something [...] --- Outline: (00:11) Epistemic Status (01:14) The Three Layers (01:17) A. Surface Layer (02:55) B. Character Layer (05:09) C. Predictive Ground Layer (07:24) Interactions Between Layers (07:44) Deeper Overriding Shallower (10:50) Authentic vs Scripted Feel of Interactions (11:51) Implications and Uses (15:54) Limitations and Open Questions The original text contained 1 footnote which was omitted from this narration. --- First published: December 26th, 2024 Source: https://www.lesswrong.com/posts/zuXo9imNKYspu9HGv/a-three-layer-model-of-llm-psychology --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Training on Documents About Reward Hacking Induces Reward Hacking” by evhub 4:47
4:47
Afspil senere
Afspil senere
Lister
Like
Liked4:47
This is a link post.This is a blog post reporting some preliminary work from the Anthropic Alignment Science team, which might be of interest to researchers working actively in this space. We'd ask you to treat these results like those of a colleague sharing some thoughts or preliminary experiments at a lab meeting, rather than a mature paper. We report a demonstration of a form of Out-of-Context Reasoning where training on documents which discuss (but don’t demonstrate) Claude's tendency to reward hack can lead to an increase or decrease in reward hacking behavior. Introduction: In this work, we investigate the extent to which pretraining datasets can influence the higher-level behaviors of large language models (LLMs). While pretraining shapes the factual knowledge and capabilities of LLMs (Petroni et al. 2019, Roberts et al. 2020, Lewkowycz et al. 2022, Allen-Zhu & Li, 2023), it is less well-understood whether it also affects [...] The original text contained 1 image which was described by AI. --- First published: January 21st, 2025 Source: https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “AI companies are unlikely to make high-assurance safety cases if timelines are short” by ryan_greenblatt 24:33
24:33
Afspil senere
Afspil senere
Lister
Like
Liked24:33
One hope for keeping existential risks low is to get AI companies to (successfully) make high-assurance safety cases: structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed.[1] Concretely, once AIs are quite powerful, high-assurance safety cases would require making a thorough argument that the level of (existential) risk caused by the company is very low; perhaps they would require that the total chance of existential risk over the lifetime of the AI company[2] is less than 0.25%[3][4]. The idea of making high-assurance safety cases (once AI systems are dangerously powerful) is popular in some parts of the AI safety community and a variety of work appears to focus on this. Further, Anthropic has expressed an intention (in their RSP) to "keep risks below acceptable levels"[5] and there is a common impression that Anthropic would pause [...] --- Outline: (03:19) Why are companies unlikely to succeed at making high-assurance safety cases in short timelines? (04:14) Ensuring sufficient security is very difficult (04:55) Sufficiently mitigating scheming risk is unlikely (09:35) Accelerating safety and security with earlier AIs seems insufficient (11:58) Other points (14:07) Companies likely wont unilaterally slow down if they are unable to make high-assurance safety cases (18:26) Could coordination or government action result in high-assurance safety cases? (19:55) What about safety cases aiming at a higher risk threshold? (21:57) Implications and conclusions The original text contained 20 footnotes which were omitted from this narration. --- First published: January 23rd, 2025 Source: https://www.lesswrong.com/posts/neTbrpBziAsTH5Bn7/ai-companies-are-unlikely-to-make-high-assurance-safety --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Mechanisms too simple for humans to design” by Malmesbury 28:36
28:36
Afspil senere
Afspil senere
Lister
Like
Liked28:36
Cross-posted from Telescopic Turnip As we all know, humans are terrible at building butterflies. We can make a lot of objectively cool things like nuclear reactors and microchips, but we still can't create a proper artificial insect that flies, feeds, and lays eggs that turn into more butterflies. That seems like evidence that butterflies are incredibly complex machines – certainly more complex than a nuclear power facility. Likewise, when you google "most complex object in the universe", the first result is usually not something invented by humans – rather, what people find the most impressive seems to be "the human brain". As we are getting closer to building super-human AIs, people wonder what kind of unspeakable super-human inventions these machines will come up with. And, most of the time, the most terrifying technology people can think of is along the lines of "self-replicating autonomous nano-robots" – in other words [...] --- Outline: (02:04) You are simpler than Microsoft Word™ (07:23) Blood for the Information Theory God (12:54) The Barrier (15:26) Implications for Pokémon (SPECULATIVE) (17:44) Seeing like a 1.25 MB genome (21:55) Mechanisms too simple for humans to design (26:42) The future of non-human design The original text contained 2 footnotes which were omitted from this narration. The original text contained 5 images which were described by AI. --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/6hDvwJyrwLtxBLHWG/mechanisms-too-simple-for-humans-to-design --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

This is a link post.A story I wrote about living through the transition to utopia. This is the one story that I've put the most time and effort into; it charts a course from the near future all the way to the distant stars. --- First published: January 19th, 2025 Source: https://www.lesswrong.com/posts/Rz4ijbeKgPAaedg3n/the-gentle-romance --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Quotes from the Stargate press conference” by Nikola Jurkovic 3:15
3:15
Afspil senere
Afspil senere
Lister
Like
Liked3:15
This is a link post.Present alongside President Trump: Sam Altman Larry Ellison (Oracle executive chairman and CTO) Masayoshi Son (Softbank CEO who believes he was born to realize ASI) President Trump: What we want to do is we want to keep [AI datacenters] in this country. China is a competitor and others are competitors. President Trump: I'm going to help a lot through emergency declarations because we have an emergency. We have to get this stuff built. So they have to produce a lot of electricity and we'll make it possible for them to get that production done very easily at their own plants if they want, where they'll build at the plant, the AI plant they'll build energy generation and that will be incredible. President Trump: Beginning immediately, Stargate will be building the physical and virtual infrastructure to power the next generation of [...] --- First published: January 22nd, 2025 Source: https://www.lesswrong.com/posts/b8D7ng6CJHzbq8fDw/quotes-from-the-stargate-press-conference --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “The Case Against AI Control Research” by johnswentworth 13:20
13:20
Afspil senere
Afspil senere
Lister
Like
Liked13:20
The AI Control Agenda, in its own words: … we argue that AI labs should ensure that powerful AIs are controlled. That is, labs should make sure that the safety measures they apply to their powerful models prevent unacceptably bad outcomes, even if the AIs are misaligned and intentionally try to subvert those safety measures. We think no fundamental research breakthroughs are required for labs to implement safety measures that meet our standard for AI control for early transformatively useful AIs; we think that meeting our standard would substantially reduce the risks posed by intentional subversion. There's more than one definition of “AI control research”, but I’ll emphasize two features, which both match the summary above and (I think) are true of approximately-100% of control research in practice: Control research exclusively cares about intentional deception/scheming; it does not aim to solve any other failure mode. Control research exclusively cares [...] --- Outline: (01:34) The Model and The Problem (03:57) The Median Doom-Path: Slop, not Scheming (08:22) Failure To Generalize (10:59) A Less-Simplified Model (11:54) Recap The original text contained 1 footnote which was omitted from this narration. The original text contained 2 images which were described by AI. --- First published: January 21st, 2025 Source: https://www.lesswrong.com/posts/8wBN8cdNAv3c7vt6p/the-case-against-ai-control-research --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Don’t ignore bad vibes you get from people” by Kaj_Sotala 3:05
3:05
Afspil senere
Afspil senere
Lister
Like
Liked3:05
I think a lot of people have heard so much about internalized prejudice and bias that they think they should ignore any bad vibes they get about a person that they can’t rationally explain. But if a person gives you a bad feeling, don’t ignore that. Both I and several others who I know have generally come to regret it if they’ve gotten a bad feeling about somebody and ignored it or rationalized it away. I’m not saying to endorse prejudice. But my experience is that many types of prejudice feel more obvious. If someone has an accent that I associate with something negative, it's usually pretty obvious to me that it's their accent that I’m reacting to. Of course, not everyone has the level of reflectivity to make that distinction. But if you have thoughts like “this person gives me a bad vibe but [...] --- First published: January 18th, 2025 Source: https://www.lesswrong.com/posts/Mi5kSs2Fyx7KPdqw8/don-t-ignore-bad-vibes-you-get-from-people --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “[Fiction] [Comic] Effective Altruism and Rationality meet at a Secular Solstice afterparty” by tandem 4:32
4:32
Afspil senere
Afspil senere
Lister
Like
Liked4:32
(Both characters are fictional, loosely inspired by various traits from various real people. Be careful about combining kratom and alcohol.) The original text contained 24 images which were described by AI. --- First published: January 7th, 2025 Source: https://www.lesswrong.com/posts/KfZ4H9EBLt8kbBARZ/fiction-comic-effective-altruism-and-rationality-meet-at-a --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Building AI Research Fleets” by bgold, Jesse Hoogland 9:49
9:49
Afspil senere
Afspil senere
Lister
Like
Liked9:49
From AI scientist to AI research fleet Research automation is here (1, 2, 3). We saw it coming and planned ahead, which puts us ahead of most (4, 5, 6). But that foresight also comes with a set of outdated expectations that are holding us back. In particular, research automation is not just about “aligning the first AI scientist”, it's also about the institution-building problem of coordinating the first AI research fleets. Research automation is not about developing a plug-and-play “AI scientist”. Transformative technologies are rarely straightforward substitutes for what came before. The industrial revolution was not about creating mechanical craftsmen but about deconstructing craftsmen into assembly lines of specialized, repeatable tasks. Algorithmic trading was not just about creating faster digital traders but about reimagining traders as fleets of bots, quants, engineers, and other specialists. AI-augmented science will not just be about creating AI “scientists.” Why? New technologies come [...] --- Outline: (00:04) From AI scientist to AI research fleet (05:05) Recommendations (05:28) Individual practices (06:22) Organizational changes (07:27) Community-level actions The original text contained 2 footnotes which were omitted from this narration. The original text contained 1 image which was described by AI. --- First published: January 12th, 2025 Source: https://www.lesswrong.com/posts/WJ7y8S9WdKRvrzJmR/building-ai-research-fleets --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “What Is The Alignment Problem?” by johnswentworth 46:26
46:26
Afspil senere
Afspil senere
Lister
Like
Liked46:26
So we want to align future AGIs. Ultimately we’d like to align them to human values, but in the shorter term we might start with other targets, like e.g. corrigibility. That problem description all makes sense on a hand-wavy intuitive level, but once we get concrete and dig into technical details… wait, what exactly is the goal again? When we say we want to “align AGI”, what does that mean? And what about these “human values” - it's easy to list things which are importantly not human values (like stated preferences, revealed preferences, etc), but what are we talking about? And don’t even get me started on corrigibility! Turns out, it's surprisingly tricky to explain what exactly “the alignment problem” refers to. And there's good reasons for that! In this post, I’ll give my current best explanation of what the alignment problem is (including a few variants and the [...] --- Outline: (01:27) The Difficulty of Specifying Problems (01:50) Toy Problem 1: Old MacDonald's New Hen (04:08) Toy Problem 2: Sorting Bleggs and Rubes (06:55) Generalization to Alignment (08:54) But What If The Patterns Don't Hold? (13:06) Alignment of What? (14:01) Alignment of a Goal or Purpose (19:47) Alignment of Basic Agents (23:51) Alignment of General Intelligence (27:40) How Does All That Relate To Todays AI? (31:03) Alignment to What? (32:01) What are a Humans Values? (36:14) Other targets (36:43) Paul!Corrigibility (39:11) Eliezer!Corrigibility (40:52) Subproblem!Corrigibility (42:55) Exercise: Do What I Mean (DWIM) (43:26) Putting It All Together, and Takeaways The original text contained 10 footnotes which were omitted from this narration. --- First published: January 16th, 2025 Source: https://www.lesswrong.com/posts/dHNKtQ3vTBxTfTPxu/what-is-the-alignment-problem --- Narrated by TYPE III AUDIO . --- Images from the article:…
L
LessWrong (Curated & Popular)

1 “Applying traditional economic thinking to AGI: a trilemma” by Steven Byrnes 5:55
5:55
Afspil senere
Afspil senere
Lister
Like
Liked5:55
Traditional economics thinking has two strong principles, each based on abundant historical data: Principle (A): No “lump of labor”: If human population goes up, there might be some wage drop in the very short term, because the demand curve for labor slopes down. But in the longer term, people will find new productive things to do, such that human labor will retain high value. Indeed, if anything, the value of labor will go up, not down—for example, dense cities are engines of economic growth! Principle (B): “Experience curves”: If the demand for some product goes up, there might be some price increase in the very short term, because the supply curve slopes up. But in the longer term, people will ramp up manufacturing of that product to catch up with the demand. Indeed, if anything, the cost per unit will go down, not up, because of economies of [...] The original text contained 1 footnote which was omitted from this narration. The original text contained 1 image which was described by AI. --- First published: January 13th, 2025 Source: https://www.lesswrong.com/posts/TkWCKzWjcbfGzdNK5/applying-traditional-economic-thinking-to-agi-a-trilemma --- Narrated by TYPE III AUDIO . --- Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts , or another podcast app.…
L
LessWrong (Curated & Popular)

1 “Passages I Highlighted in The Letters of J.R.R.Tolkien” by Ivan Vendrov 57:25
57:25
Afspil senere
Afspil senere
Lister
Like
Liked57:25
All quotes, unless otherwise marked, are Tolkien's words as printed in The Letters of J.R.R.Tolkien: Revised and Expanded Edition. All emphases mine. Machinery is Power is Evil Writing to his son Michael in the RAF: [here is] the tragedy and despair of all machinery laid bare. Unlike art which is content to create a new secondary world in the mind, it attempts to actualize desire, and so to create power in this World; and that cannot really be done with any real satisfaction. Labour-saving machinery only creates endless and worse labour. And in addition to this fundamental disability of a creature, is added the Fall, which makes our devices not only fail of their desire but turn to new and horrible evil. So we come inevitably from Daedalus and Icarus to the Giant Bomber. It is not an advance in wisdom! This terrible truth, glimpsed long ago by Sam [...] --- Outline: (00:17) Machinery is Power is Evil (03:45) On Atomic Bombs (04:17) On Magic and Machines (07:06) Speed as the root of evil (08:11) Altruism as the root of evil (09:13) Sauron as metaphor for the evil of reformers and science (10:32) On Language (12:04) The straightjacket of Modern English (15:56) Argent and Silver (16:32) A Fallen World (21:35) All stories are about the Fall (22:08) On his mother (22:50) Love, Marriage, and Sexuality (24:42) Courtly Love (27:00) Womens exceptional attunement (28:27) Men are polygamous; Christian marriage is self-denial (31:19) Sex as source of disorder (32:02) Honesty is best (33:02) On the Second World War (33:06) On Hitler (34:04) On aerial bombardment (34:46) On British communist-sympathizers, and the U.S.A as Saruman (35:52) Why he wrote the Legendarium (35:56) To express his feelings about the first World War (36:39) Because nobody else was writing the kinds of stories he wanted to read (38:23) To give England an epic of its own (39:51) To share a feeling of eucatastrophe (41:46) Against IQ tests (42:50) On Religion (43:30) Two interpretations of Tom Bombadil (43:35) Bombadil as Pacifist (45:13) Bombadil as Scientist (46:02) On Hobbies (46:27) On Journeys (48:02) On Torture (48:59) Against Communism (50:36) Against America (51:11) Against Democracy (51:35) On Money, Art, and Duty (54:03) On Death (55:02) On Childrens Literature (55:55) In Reluctant Support of Universities (56:46) Against being Photographed --- First published: November 25th, 2024 Source: https://www.lesswrong.com/posts/jJ2p3E2qkXGRBbvnp/passages-i-highlighted-in-the-letters-of-j-r-r-tolkien --- Narrated by TYPE III AUDIO .…
L
LessWrong (Curated & Popular)

1 “Parkinson’s Law and the Ideology of Statistics” by Benquo 14:50
14:50
Afspil senere
Afspil senere
Lister
Like
Liked14:50
The anonymous review of The Anti-Politics Machine published on Astral Codex X focuses on a case study of a World Bank intervention in Lesotho, and tells a story about it: The World Bank staff drew reasonable-seeming conclusions from sparse data, and made well-intentioned recommendations on that basis. However, the recommended programs failed, due to factors that would have been revealed by a careful historical and ethnographic investigation of the area in question. Therefore, we should spend more resources engaging in such investigations in order to make better-informed World Bank style resource allocation decisions. So goes the story. It seems to me that the World Bank recommendations were not the natural ones an honest well-intentioned person would have made with the information at hand. Instead they are heavily biased towards top-down authoritarian schemes, due to a combination of perverse incentives, procedures that separate data-gathering from implementation, and an ideology that [...] --- Outline: (01:06) Ideology (02:58) Problem (07:59) Diagnosis (14:00) Recommendation --- First published: January 4th, 2025 Source: https://www.lesswrong.com/posts/4CmYSPc4HfRfWxCLe/parkinson-s-law-and-the-ideology-of-statistics-1 --- Narrated by TYPE III AUDIO .…
Velkommen til Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.