Content provided by Machine Learning Street Talk (MLST). All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by Machine Learning Street Talk (MLST) or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.
Nicholas Carlini (Google DeepMind)

1:21:15
 

Nicholas Carlini from Google DeepMind offers his view of AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.

SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/

Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier, focused on o-series style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?

Go to https://tufalabs.ai/

***

Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0

TOC:

1. ML Security Fundamentals

[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals

[00:03:04] 1.2 ML Security Vulnerabilities and System Design

[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior

[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects

2. Model Evaluation and Research Methods

[00:19:40] 2.1 Model Reasoning and Evaluation Metrics

[00:24:37] 2.2 Security Research Philosophy and Methodology

[00:27:50] 2.3 Security Disclosure Norms and Community Differences

3. LLM Applications and Best Practices

[00:44:29] 3.1 Practical LLM Applications and Productivity Gains

[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies

[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code

4. Advanced LLM Research and Architecture

[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience

[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges

[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction

REFS:

[00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/

[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/

[00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644

[00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud

[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html

[00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675

[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774

[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation

[00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241

[00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html

[00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html

[00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878

[01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html

[01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634

[01:10:55] First model stealing paper (Tramèr et al., 2016) – attacking ML APIs via prediction - https://arxiv.org/abs/1609.02943

