aeh, you mention that the NNAT has a history of being more fair to diverse populations, but there is a study suggesting that English language learners scored more poorly on it than on other tests. The study also found that the NNAT overidentified both extreme high and extreme low scorers.
http://gcq.sagepub.com/content/52/4/275
This study was done on the original NNAT, not the NNAT2. I'm curious whether the development of the new version has eliminated some of those flaws. The counselor at my son's school also mentioned that the NNAT2 was a more difficult test than the CogAT, which seems strange given that they are both normed to an average of 100. His district is diverse (we live in Texas), so I'm wondering if that might have something to do with her impression, if indeed the NNAT2 underidentifies minority students in comparison to the CogAT.
Sorry to derail the thread a bit, though it does point to the fact that these tests are imperfect. I would tend to trust the results of an individually administered IQ test over a 30-minute group test, even if the group test is more recent.
First, of course, individually administered tests are better measures of cognitive ability for pretty much anyone. They are more nuanced, cover a more diverse range of ability types, are more adaptive, and allow for querying when children give atypical responses.
Second, as to the study you linked: it compared the NNAT to two other instruments administered as group cognitive assessments. As it happens, it was authored by the author of one of those tests (coincidentally, the test that the study found to be the "best" was the author's own). Not knocking the research, just giving some perspective. The Raven's, which was found not to be properly centered (the mean is off), is a very old test, so that's not a surprising finding. The CogAT has a much larger standardization sample than the NNAT, so it is also not surprising that it would have more accurate norms. Note that the article is about comparing three nonverbal assessments. We can only see the abstract before the paywall, so I don't know what's in the rest of the article, but the title suggests that the comparisons were between the CogAT6 Nonverbal and the other two tests. This is not how schools commonly use the CogAT.
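To make the centering and norming point a bit more concrete, here is a rough Python sketch of how a mean that is slightly off, or a spread that is wider than the norms assume, changes the share of students landing past a cutoff. The cutoffs of 130 and 70, the shifted mean of 103, and the inflated SD of 17 are numbers I made up for illustration, not figures from the study, and treating scores as normally distributed is a simplifying assumption.

```python
from statistics import NormalDist

NOMINAL_MEAN = 100.0  # the average both tests are normed to
NOMINAL_SD = 15.0     # assumed nominal spread, for illustration only


def share_at_or_above(cutoff, mean=NOMINAL_MEAN, sd=NOMINAL_SD):
    """Fraction of a normal score distribution at or above `cutoff`."""
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)


def share_at_or_below(cutoff, mean=NOMINAL_MEAN, sd=NOMINAL_SD):
    """Fraction of a normal score distribution at or below `cutoff`."""
    return NormalDist(mean, sd).cdf(cutoff)


if __name__ == "__main__":
    # Hypothetical scenarios: (label, actual mean, actual SD).
    scenarios = [
        ("norms accurate", 100.0, 15.0),
        ("mean off-center", 103.0, 15.0),
        ("spread too wide", 100.0, 17.0),
    ]
    for label, mean, sd in scenarios:
        high = share_at_or_above(130, mean, sd)
        low = share_at_or_below(70, mean, sd)
        print(f"{label:16s} >=130: {high:.2%}   <=70: {low:.2%}")
```

When the actual spread is wider than the norms assume, both tails grow at once, which is the pattern of too many very high and very low scorers mentioned above. An off-center mean instead inflates one tail and shrinks the other.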
This study comparing the CogAT6 to the NNAT-2 found that the NNAT-2 had a slightly smaller gap for Hispanics and ELLs than the CogAT Composite, with gaps significantly affecting GATE identification for Asians and ELLs. The CogAT Nonverbal was slightly better than the NNAT-2. They also found that follow-up WISC-IVs matched the Composite scores the best, which also makes sense, since the domain overlap should be better:
http://gcq.sagepub.com/content/57/2/101.abstract
The history of schools attempting to use figural tests for GATE screening reflects lawsuits in California back in the 80s and 90s regarding overrepresentation in special education of CLD/minority populations. Courts decided that the face cultural load of verbal IQ tests discriminated against diverse populations. This was the impetus for research and design into updated nonverbal tests. Item analysis since then has found that there is actually negligible cultural bias in most of the gold-standard cognitive assessments (though, naturally, LEP factors continue to affect performance on verbal tasks). So yes, it is quite possible that the CogAT6 composite does not discriminate against diverse populations any more than nonverbal instruments of equal psychometric robustness. One can make an argument that nonverbal assessments discriminate against verbally-skewed students of all CL backgrounds.
However, to my knowledge, the history of GATE screening with and without nonverbal instruments appears to support their use as part of the process, in terms of the actual shift in diversity of students accepted to programs. Whether use of these instruments continues to be necessary to level the playing field for CLDs is an open question, especially as newer tests come out, and CLD factors are given more attention in test design and item tryout.
Bottom line, no one score can be used to describe a whole person. Obvious, right?