What an individual tester sees is, of course, going to depend heavily on which children come to that tester to be tested. That could be affected by a lot of different things. For example, a tester who draws testees from a population where high IQs are the norm and only a really exceptional child is likely to be brought for testing (or a tester who, like Miraca Gross, is famous for being interested in PG children) is more likely to find that scores tend to be underestimates than a tester who tests a more average population.
But this seems to go against what others have said, which is that very high scores in young kids are likely to be very unstable?
I don't think so, if I understand correctly what you're asking. But it is all a bit delicate because we have Bayesian reasoning going on under the hood here! Here's an example, with fictional numbers, to show what I was getting at. Consider three situations.
(a) A school gets AvTester to test all 100 of its pupils. Of all the children AvTester tests, under 1% turn out to have IQs over 150. Jo scores 155.
(b) A school for the gifted gets GTester to test all its 100 pupils. Of all the children GTester tests, 5% turn out to have IQs over 150. Jim scores 155.
(c) Jane is taken to PGTester for testing because she is strongly suspected to be PG. Of all the children PGTester tests (she only does testing on children referred like that), 50% turn out to have IQs over 150. Jane scores 155.
Next year Jo, Jim and Jane are all retested by the same tester. Who, if anyone, will score over 150? We don't know; but the probability that each child will score over 150, given the information here, is NOT the same for all of them. Jane is far more likely to do so than Jo. Why? At bottom, because the IQ score is not the only information about Jane - we also have the factors that led to her being tested in the first place, and this still matters. It has the effect of making Jane's high score less likely to be any kind of fluke than Jo's is. It's nothing, or not much, to do with being able to guess answers or not; it's to do with there being any uncertainty in the score from any cause. Happening to be asked a question you know the answer to, or happening to be in a good mood that day, or even the tester happening to make a mistake in your favour, or anything, will do.
Notice that the effect of this is that AvTester will see a strong regression to the mean, and will perceive that high scores in children this age are relatively unstable, whereas PGTester will see a weaker regression to the mean (NB the mean that matters to her is the mean of the scores *she* sees) and will perceive that high scores in children this age are relatively stable. They're both right, for the populations they test.
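If you want to put numbers on it, here's a minimal Monte Carlo sketch in Python. Everything in it is invented: I'm assuming each child has a "true" IQ drawn from a normal distribution (a different mean and spread for each tester's clientele, picked only to reproduce the rough 1% / 5% / 50% base rates above), plus an independent test error of SD 7.5 points on each sitting. None of these numbers comes from any real test; the point is only how the base rate changes the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
TEST_SD = 7.5       # assumed test-retest error (SD in IQ points) - invented
N = 2_000_000       # simulated children per tester

# Invented "true IQ" distributions for each tester's clientele, chosen only
# so that roughly 1%, 5% and 50% of observed first scores come out over 150.
populations = {
    "AvTester (whole school)":  dict(mean=100, sd=15),
    "GTester (gifted school)":  dict(mean=128, sd=12),
    "PGTester (PG referrals)":  dict(mean=150, sd=12),
}

for name, p in populations.items():
    true_iq = rng.normal(p["mean"], p["sd"], N)
    first   = true_iq + rng.normal(0, TEST_SD, N)   # first sitting
    retest  = true_iq + rng.normal(0, TEST_SD, N)   # independent retest

    base_rate = np.mean(first > 150)                # what this tester "sees"
    like_jo   = (first > 153) & (first < 157)       # first score came out ~155

    p_over_150_again = np.mean(retest[like_jo] > 150)
    mean_retest      = np.mean(retest[like_jo])

    print(f"{name:24s}  {base_rate:6.2%} score >150  |  "
          f"P(retest >150 | first ~155) = {p_over_150_again:.2f}  |  "
          f"mean retest = {mean_retest:.1f}")
```

With these particular made-up numbers I get something in the region of a 25% chance for Jo, 40% for Jim and 65% for Jane of clearing 150 again, and mean retest scores of roughly 144, 147 and 154. The exact figures are artefacts of my assumptions; what matters is that each child regresses toward the mean of *their tester's* population, and by very different amounts.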
What's true for one's own child? There, probability gets tricky - you can quite legitimately get very different answers, depending on which population you decide to consider your child a member of. I hate probability :-)
If you're confused by this you're not alone. It's the same problem doctors face when they have to take the prevalence of a disorder into account, as well as the test result, in estimating the probability that someone who tests positive actually has the disorder. They are notoriously bad at doing this. (The moral is, if you ever get a positive result on a screening test, talk to a statistician as well as to your GP :-)
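For concreteness, here's that calculation with invented numbers: suppose a disorder affects 1 person in 1,000, and the screening test has 99% sensitivity and 95% specificity. Bayes' theorem then gives:

```python
# All numbers invented, purely to illustrate the base-rate effect.
prevalence  = 0.001   # 1 in 1,000 people actually have the disorder
sensitivity = 0.99    # P(test positive | disorder)
specificity = 0.95    # P(test negative | no disorder)

# Bayes' theorem: P(disorder | positive)
#   = P(positive | disorder) * P(disorder) / P(positive)
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
p_disorder_given_positive = sensitivity * prevalence / p_positive

print(f"P(disorder | positive test) = {p_disorder_given_positive:.1%}")  # about 1.9%
```

So even with a very accurate-sounding test, a positive result here means less than a 2% chance of actually having the disorder, purely because the disorder is rare - exactly the role the 1% / 5% / 50% base rates play for the three testers above.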