Bedste LessWrong-podcasts (2025)

1
“Will Jesus Christ return in an election year?” by Eric Neyman 7:48

10h ago7:48

7:48

Thanks to Jesse Richardson for discussion. Polymarket asks: will Jesus Christ return in 2025? In the three days since the market opened, traders have wagered over $100,000 on this question. The market traded as high as 5%, and is now stably trading at 3%. Right now, if you wanted to, you could place a bet that Jesus Christ will not return this year…

1
“Good Research Takes are Not Sufficient for Good Strategic Takes” by Neel Nanda 6:58

2d ago6:58

6:58

TL;DR Having a good research track record is some evidence of good big-picture takes, but it's weak evidence. Strategic thinking is hard, and requires different skills. But people often conflate these skills, leading to excessive deference to researchers in the field, without evidence that that person is good at strategic thinking specifically. Int…

1
“Intention to Treat” by Alicorn 3:45

3d ago3:45

3:45

When my son was three, we enrolled him in a study of a vision condition that runs in my family. They wanted us to put an eyepatch on him for part of each day, with a little sensor object that went under the patch and detected body heat to record when we were doing it. They paid for his first pair of glasses and all the eye doctor visits to check up…

1
“On the Rationality of Deterring ASI” by Dan H 9:03

3d ago9:03

9:03

I’m releasing a new paper “Superintelligence Strategy” alongside Eric Schmidt (formerly Google), and Alexandr Wang (Scale AI). Below is the executive summary, followed by additional commentary highlighting portions of the paper which might be relevant to this collection of readers. Executive Summary Rapid advances in AI are poised to reshape nearly…

1
[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman 1:19

1d ago1:19

1:19

This is a link post. Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under a decade, we will see AI agents that can in…

1
“I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy 2:17

6d ago2:17

2:17

I have, over the last year, become fairly well-known in a small corner of the internet tangentially related to AI. As a result, I've begun making what I would have previously considered astronomical amounts of money: several hundred thousand dollars per month in personal income. This has been great, obviously, and the funds have alleviated a fair n…

1
“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn 18:05

7d ago18:05

18:05

Note: this is a research note based on observations from evaluating Claude Sonnet 3.7. We’re sharing the results of these ‘work-in-progress’ investigations as we think they are timely and will be informative for other evaluators and decision-makers. The analysis is less rigorous than our standard for a published paper. Summary We monitor Sonnet's r…

1
“Levels of Friction” by Zvi 22:43

7d ago22:43

22:43

Scott Alexander famously warned us to Beware Trivial Inconveniences. When you make a thing easy to do, people often do vastly more of it. When you put up barriers, even highly solvable ones, people often do vastly less. Let us take this seriously, and carefully choose what inconveniences to put where. Let us also take seriously that when AI or othe…

1
“Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas 6:58

7d ago6:58

6:58

There's this popular trope in fiction about a character being mind controlled without losing awareness of what's happening. Think Jessica Jones, The Manchurian Candidate or Bioshock. The villain uses some magical technology to take control of your brain - but only the part of your brain that's responsible for motor control. You remain conscious and…

1
“Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg 12:22

8d ago12:22

12:22

This research was conducted at AE Studio and supported by the AI Safety Grants programme administered by Foresight Institute with additional support from AE Studio. Summary In this post, we summarise the main experimental results from our new paper, "Towards Safe and Honest AI Agents with Neural Self-Other Overlap", which we presented orally at the…

1
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam ... 24:14

9d ago24:14

24:14

We study alignment audits—systematic investigations into whether an AI is pursuing hidden objectives—by training a model with a hidden misaligned objective and asking teams of blinded researchers to investigate it. This paper was a collaboration between the Anthropic Alignment Science and Interpretability teams. Abstract We study the feasibility of…

1
“The Most Forbidden Technique” by Zvi 32:12

11d ago32:12

32:12

The Most Forbidden Technique is training an AI using interpretability techniques. An AI produces a final output [X] via some method [M]. You can analyze [M] using technique [T], to learn what the AI is up to. You could train on that. Never do that. You train on [X]. Only [X]. Never [M], never [T]. Why? Because [T] is how you figure out when the mod…

1
“Trojan Sky” by Richard_Ngo 22:28

12d ago22:28

22:28

You learn the rules as soon as you’re old enough to speak. Don’t talk to jabberjays. You recite them as soon as you wake up every morning. Keep your eyes off screensnakes. Your mother chooses a dozen to quiz you on each day before you’re allowed lunch. Glitchers aren’t human any more; if you see one, run. Before you sleep, you run through the whole…

1
“OpenAI:” by Daniel Kokotajlo 7:21

14d ago7:21

7:21

Exciting Update: OpenAI has released this blog post and paper which makes me very happy. It's basically the first steps along the research agenda I sketched out here. tl;dr: 1.) They notice that their flagship reasoning models do sometimes intentionally reward hack, e.g. literally say "Let's hack" in the CoT and then proceed to hack the evaluation …

1
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis 7:16

16d ago7:16

7:16

LLM-based coding-assistance tools have been out for ~2 years now. Many developers have been reporting that this is dramatically increasing their productivity, up to 5x'ing/10x'ing it. It seems clear that this multiplier isn't field-wide, at least. There's no corresponding increase in output, after all. This would make sense. If you're doing anythin…

1
“So how well is Claude playing Pokémon?” by Julian Bradshaw 9:05

16d ago9:05

9:05

Background: After the release of Claude 3.7 Sonnet,[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The livestream is still going right now. TL:DR: So, how's it doing? Well, pretty badly. Worse than a 6-year-old would, definitely not PhD-level. Digging in But wait! you say. Didn't Anthropic publish a benchm…

1
“Methods for strong human germline engineering” by TsviBT 0:18

18d ago0:18

0:18

Note: an audio narration is not available for this article. Please see the original text. The original text contained 169 footnotes which were omitted from this narration. The original text contained 79 images which were described by AI. --- First published: March 3rd, 2025 Source: https://www.lesswrong.com/posts/2w6hjptanQ3cDyDw7/methods-for-stron…

1
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth 3:49

19d ago3:49

3:49

In a recent post, Cole Wyeth makes a bold claim: . . . there is one crucial test (yes this is a crux) that LLMs have not passed. They have never done anything important. They haven't proven any theorems that anyone cares about. They haven't written anything that anyone will want to read in ten years (or even one year). Despite apparently memorizing…

1
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis 18:47

19d ago18:47

18:47

This isn't really a "timeline", as such – I don't know the timings – but this is my current, fairly optimistic take on where we're heading. I'm not fully committed to this model yet: I'm still on the lookout for more agents and inference-time scaling later this year. But Deep Research, Claude 3.7, Claude Code, Grok 3, and GPT-4.5 have turned out la…

1
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard 17:33

20d ago17:33

17:33

This is a critique of How to Make Superbabies on LessWrong. Disclaimer: I am not a geneticist[1], and I've tried to use as little jargon as possible. so I used the word mutation as a stand in for SNP (single nucleotide polymorphism, a common type of genetic variation). Background The Superbabies article has 3 sections, where they show: Why: We shou…

1
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout 1:51

20d ago1:51

1:51

This is a link post.Your AI's training data might make it more “evil” and more able to circumvent your security, monitoring, and control measures. Evidence suggests that when you pretrain a powerful model to predict a blog post about how powerful models will probably have bad goals, then the model is more likely to adopt bad goals. I discuss ways t…

1
“Judgements: Merging Prediction & Evidence” by abramdemski 11:13

24d ago11:13

11:13

I recently wrote about complete feedback, an idea which I think is quite important for AI safety. However, my note was quite brief, explaining the idea only to my closest research-friends. This post aims to bridge one of the inferential gaps to that idea. I also expect that the perspective-shift described here has some value on its own. In classica…

1
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis 12:31

27d ago12:31

12:31

First, let me quote my previous ancient post on the topic: Effective Strategies for Changing Public Opinion The titular paper is very relevant here. I'll summarize a few points. The main two forms of intervention are persuasion and framing. Persuasion is, to wit, an attempt to change someone's set of beliefs, either by introducing new ones or by ch…

1
“Power Lies Trembling: a three-book review” by Richard_Ngo 27:11

27d ago27:11

27:11

In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They’re huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality a…

1
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans 7:58

27d ago7:58

7:58

This is the abstract and introduction of our new paper. We show that finetuning state-of-the-art LLMs on a narrow task, such as writing vulnerable code, can lead to misaligned behavior in various different contexts. We don't fully understand that phenomenon. Authors: Jan Betley*, Daniel Tan*, Niels Warncke*, Anna Sztyber-Betley, Martín Soto, Xuchan…

Podcasts der er værd at lytte til

LessWrong Podcasts

Podcasts der er værd at lytte til

1
LessWrong (Curated & Popular)

LessWrong

1
“Will Jesus Christ return in an election year?” by Eric Neyman 7:48

1
“Good Research Takes are Not Sufficient for Good Strategic Takes” by Neel Nanda 6:58

1
“Intention to Treat” by Alicorn 3:45

1
“On the Rationality of Deterring ASI” by Dan H 9:03

1
[Linkpost] “METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman 1:19

1
“I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?” by shrimpy 2:17

1
“Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations” by Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer, Marius Hobbhahn 18:05

1
“Levels of Friction” by Zvi 22:43

1
“Why White-Box Redteaming Makes Me Feel Weird” by Zygi Straznickas 6:58

1
“Reducing LLM deception at scale with self-other overlap fine-tuning” by Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Mike Vaiana, Cameron Berg 12:22

1
“Auditing language models for hidden objectives” by Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Akbir Khan, Euan Ong, Christopher Olah, Fabien Roger, Meg, Drake Thomas, Adam ... 24:14

1
“The Most Forbidden Technique” by Zvi 32:12

1
“Trojan Sky” by Richard_Ngo 22:28

1
“OpenAI:” by Daniel Kokotajlo 7:21

1
“How Much Are LLMs Actually Boosting Real-World Programmer Productivity?” by Thane Ruthenis 7:16

1
“So how well is Claude playing Pokémon?” by Julian Bradshaw 9:05

1
“Methods for strong human germline engineering” by TsviBT 0:18

1
“Have LLMs Generated Novel Insights?” by abramdemski, Cole Wyeth 3:49

1
“A Bear Case: My Predictions Regarding AI Progress” by Thane Ruthenis 18:47

1
“Statistical Challenges with Making Super IQ babies” by Jan Christian Refsgaard 17:33

1
“Self-fulfilling misalignment data might be poisoning our AI models” by TurnTrout 1:51

1
“Judgements: Merging Prediction & Evidence” by abramdemski 11:13

1
“The Sorry State of AI X-Risk Advocacy, and Thoughts on Doing Better” by Thane Ruthenis 12:31

1
“Power Lies Trembling: a three-book review” by Richard_Ngo 27:11

1
“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs” by Jan Betley, Owain_Evans 7:58

Hurtig referencevejledning