I also think there's a limit to the highest ability level that items of a given difficulty can measure. For example, how would someone be scored who correctly answers 10 questions in the same specific skill, all rated at a difficulty of +0.5 standard deviations, if the correlations between those items are ~0.9? I assume the overall score wouldn't be much higher than the +0.5 difficulty level.
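
To make that intuition concrete, here is a rough sketch in Python. Everything in it is an assumption of mine rather than a standard scoring procedure: a Rasch (one-parameter logistic) item model, a standard normal prior on ability, and a crude "effective number of items" discount, n / (1 + (n - 1) * rho), borrowed from the design-effect formula for correlated observations. The function name `eap_all_correct` is hypothetical.

```python
import math

def eap_all_correct(n_items, difficulty, rho=0.0, lo=-6.0, hi=6.0, steps=2401):
    """Posterior mean ability (in SD units) after answering every item correctly.

    Assumptions (not from any official scoring method):
    - Rasch model: P(correct) = 1 / (1 + exp(-(theta - difficulty)))
    - standard normal prior on theta
    - correlated items are discounted to an effective item count
    """
    # Heuristic discount: highly correlated items carry little extra information.
    n_eff = n_items / (1.0 + (n_items - 1) * rho)

    num = 0.0
    den = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        p_correct = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
        weight = prior * p_correct ** n_eff     # prior times all-correct likelihood
        num += theta * weight
        den += weight
    return num / den

# 10 items at difficulty +0.5 SD, all answered correctly
eap_independent = eap_all_correct(10, 0.5, rho=0.0)
eap_correlated = eap_all_correct(10, 0.5, rho=0.9)

print(f"independent items: estimated ability {eap_independent:+.2f} SD")
print(f"items correlated ~0.9: estimated ability {eap_correlated:+.2f} SD")
```

Under these assumptions the highly correlated case lands close to the +0.5 difficulty level, while treating the 10 items as independent gives a noticeably higher estimate. The discount is only a heuristic, so the exact numbers shouldn't be read as more than an illustration.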

I think percentiles calculated from raw rank (e.g. rank 3 out of all test takers) would be misleading if the range of item difficulty on the test isn't that wide: the errors that separate one percentile from the next would more likely be random, careless mistakes than differences in the skill being measured (at least at the extreme ends).

Since the cumulative probability of a correct answer increases exponentially with z-score, in much the same way that probability decreases exponentially in the tails of a normal curve, I think percentiles derived from z-scores are more accurate in these cases. This all came from a friend asking whether a 36 on the ACT (99.9th percentile by the rank method, 99.6th by the SD method) is qualitatively different from a 35 (99th percentile by rank, 99.3rd by SD). I think the SD percentiles and their 0.3-point difference make more sense given the difficulty ceilings, but please let me know if I'm mistaken.
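
One way to compare the two sets of percentiles is to convert each back to a z-score with the inverse normal CDF and look at the gap between a 35 and a 36 in SD units. This is only a sketch: it assumes scores are normally distributed, uses only the four percentiles quoted above, and the helper name `z_from_percentile` is my own.

```python
from statistics import NormalDist

def z_from_percentile(percentile):
    """Convert a percentile (0-100) to a z-score, assuming a normal distribution."""
    return NormalDist().inv_cdf(percentile / 100.0)

# Percentiles quoted in this post for ACT scores of 35 and 36
rank_method = {35: 99.0, 36: 99.9}
sd_method = {35: 99.3, 36: 99.6}

gap_rank = z_from_percentile(rank_method[36]) - z_from_percentile(rank_method[35])
gap_sd = z_from_percentile(sd_method[36]) - z_from_percentile(sd_method[35])

print(f"rank method: 35 -> 36 is a gap of {gap_rank:.2f} SD")
print(f"SD method:   35 -> 36 is a gap of {gap_sd:.2f} SD")
```

Read this way, the rank percentiles imply a much larger ability gap between a 35 and a 36 than the SD percentiles do, which is the discrepancy I'm asking about.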

Last edited by Pabulum; 11/13/19 03:07 PM.

Graduated DS