LW - AI #82: The Governor Ponders by Zvi
The big news of the week was of course OpenAI releasing their new model o1. If you read one post this week, read that one. Everything else is a relative sideshow.
Meanwhile, we await Newsom's decision on SB 1047. The smart money was always that Gavin Newsom would make us wait before offering his verdict on SB 1047. It's a big decision. Don't rush him. In the meantime, what hints he has offered suggest he's buying into some of the anti-1047 talking points. I'm offering a letter to him here based on his comments; if you have any way to help convince him, now would be the time to use it. But mostly, it's up to him now.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. Apply for unemployment.
4. Language Models Don't Offer Mundane Utility. How to avoid the blame.
5. Deepfaketown and Botpocalypse Soon. A social network of you plus bots.
6. They Took Our Jobs. Not much impact yet, but software jobs still hard to find.
7. Get Involved. Lighthaven Eternal September, individual rooms for rent.
8. Introducing. Automated scientific literature review.
9. In Other AI News. OpenAI creates independent board to oversee safety.
10. Quiet Speculations. Who is preparing for the upside? Or appreciating it now?
11. Intelligent Design. Intelligence. It's a real thing.
12. SB 1047: The Governor Ponders. They got to him, but did they get to him enough?
13. Letter to Newsom. A final summary, based on Newsom's recent comments.
14. The Quest for Sane Regulations. How should we update based on o1?
15. Rhetorical Innovation. The warnings will continue, whether or not anyone listens.
16. Claude Writes Short Stories. It is pondering what you might expect it to ponder.
17. Questions of Sentience. Creating such things should not be taken lightly.
18. People Are Worried About AI Killing Everyone. The endgame is what matters.
19. The Lighter Side. You can never be sure.
Language Models Offer Mundane Utility
Arbitrate your Nevada unemployment benefits appeal, using Gemini. This should solve the backlog of 10k+ cases, and also I expect higher accuracy than the existing method, at least until we see attempts to game the system. Then it gets fun. That's also job retraining.
o1 usage limit raised to 50 messages per day for o1-mini, 50 per week for o1-preview.
o1 can do multiplication reliably up to about 4x6 digits, and about 50% accurately up through about 8x10, a huge leap from gpt-4o, although Colin Fraser reports 4o can be made better at this than one would expect.
o1 is much better than 4o at evaluating medical insurance claims, and determining whether requests for care should be approved, especially in terms of executing existing guidelines, and automating administrative tasks. It seems like a clear step change in usefulness in practice.
The claim is that being sassy and juicy and bitchy improves Claude Instant numerical reasoning. What I actually see here is that it breaks Claude Instant out of trick questions. Where Claude would previously fall into a trap, you have it fall back on what is effectively 'common sense,' and it starts getting actually easy questions right.
Language Models Don't Offer Mundane Utility
A key advantage of using an AI is that you can no longer be blamed for an outcome out of your control. However, humans often demand that a manual mode be available to them, allowing humans to override the AI, even when it doesn't make any practical sense to offer this. And then, if the human can in theory switch to manual mode and override the AI, blame returns to the human, even when exerting that control was clearly impractical in context.
The top example here is self-driving cars, and blame for car crashes.
The results suggest that the human thirst for ill...