Posted By: Pabulum Score Extrapolations? - 09/22/19 08:06 PM
As I learn more about IQ testing, I've come to understand that, because of ceiling effects, not everyone with a given IQ score achieves matching index scores. I was wondering whether any studies on tests with extended norms have reported the mean increase in each subtest and/or index score when the extended norms are applied? Specifically, broken down by the number of ceilings reached and (this may be wishful thinking) by qualitative observations during the test sessions. It'd be interesting to see how the correlations change, and the data could be useful for admission to gifted programs as well.
Posted By: aeh Re: Score Extrapolations? - 09/24/19 01:48 AM
As a general note, test developers usually do pay attention, while piloting and standardizing tests, to how scores change depending on how ceiling rules are applied. The idea is that the ceiling rules shouldn't significantly change raw scores compared to administering every available item, but -will- minimize the fatigue and frustration of plowing through excessive numbers of too-hard/incorrect items. Of course, the behavior of outliers won't necessarily shape the final decision rules, since outliers are by definition a very tiny contributor to the data set.

You can see some of the data on increases in mean subtest/index scores on the extended norms in the tables from the WISC-IV ExNorms supplement:
https://images.pearsonclinical.com/images/assets/WISC-IV/WISCIV_TechReport_7.pdf

Qualitative observation data may exist, but I doubt it was consistent enough to code without prospectively training the examiners.
Posted By: aeh Re: Score Extrapolations? - 09/24/19 01:52 AM
I should also add that the reason IQ scores are often higher than the index and subtest scores that contribute to them actually isn't ceiling effects. There is a nice explanation (more math than most will want, but enough for the eager math consumer) of the composite effect here, with specific reference to the WJIV, but it applies generally to most cognitive instruments that combine subtests into cluster or composite scores:
http://www.nelson.com/assessment/pdf/wjiv_asb_7.pdf
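
To make the composite effect concrete, here is a small back-of-the-envelope sketch. It assumes k equally weighted subtests with a uniform intercorrelation rho, which is my simplification for illustration, not any publisher's actual norming procedure:

```python
# Back-of-the-envelope composite effect: someone scoring +2 SD on every
# subtest gets a composite that is *more* extreme than +2 SD, because the
# subtests are imperfectly correlated. Assumes k equally weighted subtests
# with a uniform intercorrelation rho.
import math

def composite_z(subtest_z, k, rho):
    """z-score of the sum of k standardized subtests, each at subtest_z,
    when every pair of subtests correlates at rho."""
    raw_sum = k * subtest_z                       # sum of the subtest z-scores
    sd_of_sum = math.sqrt(k + k * (k - 1) * rho)  # Var(sum) = k + k(k-1)*rho
    return raw_sum / sd_of_sum

for rho in (1.0, 0.8, 0.6, 0.4):
    z = composite_z(2.0, k=4, rho=rho)
    print(f"rho={rho:.1f}: composite z = {z:.2f} (IQ metric ~{100 + 15 * z:.0f})")
# rho=1.0 reproduces +2.00 exactly; lower intercorrelations push the
# composite further out (e.g. rho=0.6 gives ~+2.39, about 136 vs. 130).
```

The intuition: the lower the subtest intercorrelations, the rarer it is to be high on all of them at once, which is why the composite ends up more extreme than the average of its parts.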
Posted By: Pabulum Re: Score Extrapolations? - 09/28/19 09:33 PM
Thanks for the response, aeh! Very interesting reads. It'd be interesting to see the increase broken down by index and subtest specifically for the children who hit the ceiling on that index, since each child hit different ceilings on the test.

As for composite scores, it makes a lot of sense that they'd be more extreme than the averages when combining the distributions. I'd be interested to know if different subtests for the same broad skill differentiate more with respect to different areas of the skill (like activating different fibers in the same muscle) or if the different formats are there to reduce the advantages or disadvantages an individual may have with one format alone.
Posted By: aeh Re: Score Extrapolations? - 09/29/19 09:09 PM
In general, the technical manuals for most cognitive instruments include tables of the correlations of individual subtests with particular composites, loadings on g (general intelligence), etc. And typically, some subtests load more heavily on higher-order factors than others do. Some tests describe individual subtests as focused more on certain aspects of a cluster/composite area, but subtest-level interpretation is usually better left to professionals with more extensive clinical training and experience (and some in the field hold that even professionals should abstain), as subtest reliability is usually not as favorable as composite reliability.
Posted By: Pabulum Re: Score Extrapolations? - 11/13/19 09:58 PM
I also think there's a limit to the highest ability that items of a given difficulty can measure. For example, how would someone be scored for correctly answering 10 questions, all in the same specific skill and all rated at a difficulty of +0.5 standard deviations, if the correlations between those items are ~0.9? I assume the overall estimate wouldn't land much above the +0.5 difficulty level.
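
To put rough numbers on that intuition, here's a quick Rasch-model sketch. The model choice and every number in it are just my assumptions for illustration:

```python
# Rough Rasch-model illustration of why items clustered at one difficulty
# can't place anyone far above that level: each item's Fisher information
# is p*(1-p), which peaks where ability equals item difficulty and decays
# toward zero on either side. All numbers here are invented.
import math

def rasch_p(theta, b):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def test_info(theta, difficulties):
    """Total test information: the sum of p*(1-p) over all items."""
    return sum(rasch_p(theta, b) * (1.0 - rasch_p(theta, b)) for b in difficulties)

items = [0.5] * 10  # ten items, all at difficulty +0.5
for theta in (0.5, 1.5, 2.5, 3.5):
    info = test_info(theta, items)
    se = 1.0 / math.sqrt(info)  # standard error of the ability estimate
    print(f"theta={theta:+.1f}: information={info:.2f}, SE={se:.2f}")
# The SE more than doubles between +0.5 and +3.5 -- and that's with the
# generous assumption of locally independent items; items correlated at
# ~0.9 carry even less *independent* information than this counts.
```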

I think percentiles calculated from raw rank (e.g., placing 3rd out of all test takers) would be misleading if the range of item difficulty isn't very wide, since the errors separating one rank from the next would more likely stem from random, careless mistakes than from differences in the skill being measured (at least at the extreme ends).
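
A quick simulation of that point (all parameters invented):

```python
# With 40 items bunched at difficulty +0.5, examinees whose true abilities
# are +2 and +3 SD earn heavily overlapping raw-score distributions, so
# rank differences at the top of the table owe a lot to item-level luck.
# Items are treated as independent Bernoulli trials per person.
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, B = 40, 0.5

def raw_scores(theta, n_people):
    p = 1.0 / (1.0 + np.exp(-(theta - B)))   # Rasch probability per item
    return rng.binomial(N_ITEMS, p, size=n_people)

hi = raw_scores(2.0, 100_000)
very_hi = raw_scores(3.0, 100_000)
print(f"P(+2 examinee ties or beats a +3 examinee) ~ {(hi >= very_hi).mean():.2f}")
# Comes out around 0.07-0.08 here: a full SD of true ability, yet the
# "weaker" examinee outranks or ties the stronger one in roughly 1 of 13 pairings.
```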

Since the probability of a correct answer rises steeply with z-score, much as density falls off exponentially in the tail of the normal curve, I think percentiles derived from z-scores are more accurate in these cases. This all came from a friend asking whether a 36 on the ACT (99.9th percentile by the rank method, 99.6th by the SD method) is qualitatively different from a 35 (99th percentile by rank, 99.3rd by SD); I think the SD percentiles and the resulting 0.3-point gap make more sense given the difficulty ceilings, but please let me know if I'm mistaken.
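
For what it's worth, the SD-method percentiles above fall straight out of a normal-curve calculation if you assume an ACT composite mean of about 20.8 and an SD of about 5.8 (roughly the published national figures from around then; treat both as assumptions):

```python
# Reproducing the SD-method percentiles above from assumed ACT norms.
from statistics import NormalDist

MEAN, SD = 20.8, 5.8
for score in (35, 36):
    z = (score - MEAN) / SD
    pct = NormalDist().cdf(z) * 100
    print(f"ACT {score}: z = {z:.2f}, normal-curve percentile = {pct:.1f}")
# -> ACT 35: z = 2.45, ~99.3rd percentile; ACT 36: z = 2.62, ~99.6th
```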