
Content provided by NLP Highlights and Allen Institute for Artificial Intelligence. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by NLP Highlights and Allen Institute for Artificial Intelligence or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.

128 - Dynamic Benchmarking, with Douwe Kiela

47:00
 
In this episode, we discussed adversarial dataset construction and dynamic benchmarking with Douwe Kiela, a research scientist at Facebook AI Research who has been working on a dynamic benchmarking platform called Dynabench. Dynamic benchmarking tries to address the problem that many recent datasets get solved while little progress is made towards solving the corresponding tasks. The idea is to involve models in the data collection loop to encourage humans to provide data points that are hard for those models, thereby continuously collecting harder datasets. We discussed the details of this approach and some potential caveats. We also discussed dynamic leaderboards, a recent addition to Dynabench that ranks systems based on their utility given specific use cases.

Papers discussed in this episode:
1. Dynabench: Rethinking Benchmarking in NLP (https://www.semanticscholar.org/paper/Dynabench%3A-Rethinking-Benchmarking-in-NLP-Kiela-Bartolo/77a096d80eb4dd4ccd103d1660c5a5498f7d026b)
2. Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking (https://www.semanticscholar.org/paper/Dynaboard%3A-An-Evaluation-As-A-Service-Platform-for-Ma-Ethayarajh/d25bb256e5b69f769a429750217b0d9ec1cf4d86)
3. Adversarial NLI: A New Benchmark for Natural Language Understanding (https://www.semanticscholar.org/paper/Adversarial-NLI%3A-A-New-Benchmark-for-Natural-Nie-Williams/9d87300892911275520a4f7a5e5abf4f1c002fec)
4. DynaSent: A Dynamic Benchmark for Sentiment Analysis (https://www.semanticscholar.org/paper/DynaSent%3A-A-Dynamic-Benchmark-for-Sentiment-Potts-Wu/284dfcf7f25ca87b2db235c6cdc848b4143d3923)

Douwe Kiela's webpage: https://douwekiela.github.io/

The hosts for this episode are Pradeep Dasigi and Alexis Ross.
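The model-in-the-loop idea from the episode summary can be illustrated with a toy sketch. This is not Dynabench's implementation: the function names, the keyword-based stand-in model, and the example sentences are all hypothetical, and a real pipeline would involve further steps that this sketch leaves out. It only shows the core filter: annotators propose labeled examples, and the round keeps those the current model gets wrong.

```python
from typing import Callable, List, Tuple

# (text, gold label supplied by the human annotator)
Example = Tuple[str, str]


def collect_round(model: Callable[[str], str],
                  candidates: List[Example]) -> List[Example]:
    """Keep only the annotator-written examples that fool the current model."""
    return [(text, gold) for text, gold in candidates if model(text) != gold]


def toy_sentiment_model(text: str) -> str:
    # Hypothetical stand-in for a trained classifier: a single keyword rule.
    return "positive" if "good" in text.lower() else "negative"


# Hypothetical annotator submissions for one collection round.
candidates = [
    ("The food was good.", "positive"),
    ("Not good at all.", "negative"),
    ("I loved every minute.", "positive"),
]

# Examples the toy model misclassifies; these would form the harder dataset.
hard_examples = collect_round(toy_sentiment_model, candidates)
```

In this sketch, repeating the round with a model retrained on `hard_examples` is what would make the benchmark "dynamic": each round targets the weaknesses of the latest model.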

145 episodes

