Like any measurement system, cognitive assessments (IQ tests) have a standard error of measurement (SEM). This is why it is better to view scores as confidence ranges rather than as single exact values. Composite scores are typically best reported as 95% confidence intervals (roughly +/- 2 SEM), though some conventions use the 90% confidence level. If the 95% confidence interval overlaps the cut score used by some definition of giftedness, the individual's nominal classification can be expected to vary from one assessment episode to another.
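To make the arithmetic concrete, here is a minimal sketch of the interval and the cut-score check. The obtained score, SEM, and cut score are made-up numbers, not values from any particular test manual, and the function names are my own. It uses the simplest version, a band centered on the obtained score; some manuals center the band on an estimated true score instead, which shifts it slightly toward the mean.

```python
def confidence_interval(score, sem, z=1.96):
    """Return (low, high) for a band of z * SEM around an obtained score.

    z = 1.96 gives the 95% level (the "roughly 2 SEM" rule of thumb);
    z = 1.645 gives the 90% level.
    """
    margin = z * sem
    return score - margin, score + margin


def straddles_cut(score, sem, cut, z=1.96):
    """True if the confidence band contains the cut score."""
    low, high = confidence_interval(score, sem, z)
    return low <= cut <= high


# Hypothetical example: obtained composite of 128, SEM of 3 points,
# and a program cut score of 130 (all assumed for illustration).
low, high = confidence_interval(128, 3.0)
print(f"95% band: {low:.1f} to {high:.1f}")
print("Band includes the cut score:", straddles_cut(128, 3.0, 130))
```

With these assumed numbers the band runs from about 122 to 134, so a child scoring 128 on one occasion could plausibly land on either side of a 130 cut on another.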

In the technical manual of a given instrument, you can also find data on reliability and validity, which give some sense of the statistical stability of the results and of the likelihood that they measure the constructs claimed. Test publishers and designers usually aim for reliability coefficients better than 0.9.
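Reliability and standard error are linked by the standard classical test theory relation SEM = SD * sqrt(1 - reliability). A minimal sketch, assuming the usual composite scale with a standard deviation of 15; the reliability values are illustrative, not taken from any specific manual, and the function name is my own:

```python
import math


def sem_from_reliability(sd, reliability):
    """Standard error of measurement under classical test theory.

    sd          -- standard deviation of the score scale (15 is assumed here)
    reliability -- reliability coefficient between 0 and 1
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Illustrative reliability values only.
for r in (0.90, 0.95, 0.97):
    print(f"reliability {r:.2f}: SEM = {sem_from_reliability(15, r):.1f} points")
```

Under these assumptions, a reliability of 0.90 corresponds to an SEM of about 4.7 points, and 0.97 to about 2.6 points, which is why small differences in reliability matter near a cut score.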

To the qualitative aspect of your question: all cognitive measures purport to assess native learning ability, rather than experience or instruction, but unavoidably confound the two to a greater or lesser extent, especially with regard to language-based skills. The gold standard instruments (e.g., WISC, SB, WJ, DAS, KABC) have lesser degrees of this confound. Legitimate test prep is unlikely to affect their scores by more than a few points in the vast majority of cases. (I leave aside the issue of adults training children on copyrighted test materials, which would be blatant cheating.) I can imagine, though, that there may be a small subset of children whose anxiety is reduced by test prep, and who thus display more of their true ability on testing. That is not score inflation due to test prep, but score normalization due to increased comfort level. In most cases, this kind of special prep is unnecessary (and may even heighten anxiety for some children); assessment professionals are trained in putting children at ease, and the activities are generally experienced as fun, or at least neutral.

And yes, many more factors can result in low estimates of ability, including emotional interference, cultural/linguistic differences, twice-exceptionality (2e), and young age. In very young children, the extremely wide range of non-pathological development, together with the inconsistent testability of small children, makes test results less stable than they are later in childhood or in adulthood.

