
Content provided by The Nonlinear Fund. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by The Nonlinear Fund or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process described here: https://da.player.fm/legal.

LW - The Dunning-Kruger of disproving Dunning-Kruger by kromem

9:08
 
 

Link to original article
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Dunning-Kruger of disproving Dunning-Kruger, published by kromem on May 18, 2024 on LessWrong.

In an online discussion elsewhere today someone linked this article which in turn linked the paper Gignac & Zajenkowski, The Dunning-Kruger effect is (mostly) a statistical artefact: Valid approaches to testing the hypothesis with individual differences data (PDF) (ironically hosted on @gwern's site).

And I just don't understand what they were thinking.

Let's look at their methodology real quick in section 2.2 (emphasis added):

2.2.1. Subjectively assessed intelligence

Participants assessed their own intelligence on a scale ranging from 1 to 25 (see Zajenkowski, Stolarski, Maciantowicz, Malesza, & Witowska, 2016). Five groups of five columns were labelled as very low, low, average, high or very high, respectively (see Fig. S1). Participants' SAIQ was indexed with the marked column counting from the first to the left; thus, the scores ranged from 1 to 25. Prior to providing a response to the scale, the following instruction was presented: "People differ with respect to their intelligence and can have a low, average or high level. Using the following scale, please indicate where you can be placed compared to other people. Please mark an X in the appropriate box corresponding to your level of intelligence." In order to place the 25-point scale SAIQ scores onto a scale more comparable to a conventional IQ score (i.e., M = 100; SD = 15), we transformed the scores such that values of 1, 2, 3, 4, 5… 21, 22, 23, 24, 25 were recoded to 40, 45, 50, 55, 60… 140, 145, 150, 155, 160. As the transformation was entirely linear, the results derived from the raw scale SAI scores and the recoded scale SAI scores were the same.

Any alarm bells yet? Let's look at how they measured actual results:

2.2.2. Objectively assessed intelligence

Participants completed the Advanced Progressive Matrices (APM; Raven, Court, & Raven, 1994). The APM is a non-verbal intelligence test which consists of items that include a matrix of figural patterns with a missing piece. The goal is to discover the rules that govern the matrix and to apply them to the response options. The APM is considered to be less affected by culture and/or education (Raven et al., 1994). It is known as good, but not perfect, indicator of general intellectual functioning (Carroll, 1993; Gignac, 2015). We used the age-based norms published in Raven et al. (1994, p. 55) to convert the raw APM scores into percentile scores. We then converted the percentile scores into z-scores with the IDF.NORMAL function in SPSS. Then, we converted the z-scores into IQ scores by multiplying them by 15 and adding 100. Although the norms were relatively old, we considered them essentially valid, given evidence that the Flynn effect had slowed down considerably by 1980 to 1990 and may have even reversed to a small degree since the early 1990s (Woodley of Menie et al., 2018).

An example of the self-assessment scoring question was in the supplemental materials of the paper. I couldn't access it behind a paywall, but the paper they reference does include a great example of the scoring sheet in its appendix which I'm including here:

So we have what appears to be a linear self-assessment scale broken into 25 segments. If I were a participant filling this out, knowing how I've consistently performed on standardized tests around the 96-98th percentile, I'd have personally selected the top segment, which looks like it corresponds to the self-assessment of being in the top 4% of test takers. Behind the scenes they would then have proceeded to take that assessment and scale it to an IQ score of 160, at the 99.99th percentile (no, I don't think that highly of myself).
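To make the mismatch concrete, here is a minimal Python sketch of the two conversions quoted above. The function names are hypothetical, and the standard library's `NormalDist` stands in for the SPSS `IDF.NORMAL` step. The helper `equal_band_percentiles` encodes only the reading described in the post (each of the 25 columns taken as an equal 4% band); that reading is an assumption about how a participant might see the sheet, not something the paper states.

```python
from statistics import NormalDist

IQ_MEAN = 100
IQ_SD = 15


def recode_saiq(column: int) -> int:
    """Linear recode quoted from section 2.2.1: 1 -> 40, 2 -> 45, ..., 25 -> 160."""
    if not 1 <= column <= 25:
        raise ValueError("column must be between 1 and 25")
    return 35 + 5 * column


def percentile_to_iq(percentile: float) -> float:
    """Section 2.2.2 pipeline: percentile -> z-score -> IQ (z * 15 + 100)."""
    z = NormalDist().inv_cdf(percentile / 100)
    return z * IQ_SD + IQ_MEAN


def iq_to_percentile(iq: float) -> float:
    """Percentile implied by an IQ score under a normal(100, 15) model."""
    return NormalDist(IQ_MEAN, IQ_SD).cdf(iq) * 100


def equal_band_percentiles(column: int) -> tuple[int, int]:
    """ASSUMPTION (not from the paper): each of the 25 columns is read as a 4% band."""
    return (column - 1) * 4, column * 4


if __name__ == "__main__":
    top = 25
    low, high = equal_band_percentiles(top)
    print(f"Column {top} read as percentile band: {low}-{high}")
    print(f"Column {top} recoded to IQ: {recode_saiq(top)}")
    print(f"Percentile implied by that IQ: {iq_to_percentile(recode_saiq(top)):.3f}")
    print(f"IQ implied by the 97th percentile: {percentile_to_iq(97):.1f}")
```

Under these assumptions, the top column recodes to an IQ four standard deviations above the mean, while someone at the 97th percentile on the objective test would be assigned an IQ of roughly 128 by the percentile-to-z-score route.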
Even if I had been conservative with my self assessment and gone with what looks like the 92-96th pe...

1687 episodes

