There are two primary reasons why cognitive assessment scores differ from administration to administration and from test to test, both statistical in nature:
1. Standard error of measurement and regression to the mean.
2. Differences in how tests attempt to quantify underlying constructs.
1. There is natural variation in how individuals perform on tasks from occasion to occasion, which means that any individual score is only a best estimate of some hypothetical "true" value for that task. If we could guarantee ideal performance on every occasion, we might be able to claim a true measurement. We can't, of course, for a host of reasons: how much sleep the student had the night before, whether the examiner made a judgment call when scoring a marginal response, how engaged the student was with the examiner and the task, whether seeing "0"s and "1"s being marked down encouraged or discouraged the student, and so on. So scores are supposed to be reported as confidence intervals (ranges), to capture the idea that the number is not hard-wired, but a best estimate.
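As a rough sketch of how such an interval is built under classical test theory (the mean of 100, standard deviation of 15, and reliability of .90 below are assumed purely for illustration, not taken from any particular test):

```python
import math

def score_interval(observed, sd=15.0, reliability=0.90, z=1.96):
    """Approximate 95% confidence band around an observed standard score.

    Classical test theory: SEM = SD * sqrt(1 - reliability).
    With SD = 15 and reliability = .90, the SEM is about 4.7 points.
    """
    sem = sd * math.sqrt(1.0 - reliability)
    return observed - z * sem, observed + z * sem

low, high = score_interval(112)
print(f"Observed 112 -> roughly {low:.0f} to {high:.0f}")  # about 103 to 121
```

The width of the band is driven entirely by the reliability in this sketch; a less reliable measure yields a wider range, which is why the same observed number can come with quite different intervals on different instruments.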
In addition, regression to the mean reflects the fact that, because scores far from the mean are relatively rare, it is rarer still for two scores to both fall that far from the mean. Practically speaking, this means that if you administer two tests that purport to measure the same thing, and the first score is quite far from the mean (say, very high), the expectation is that the second score will be closer to the mean (i.e., lower, in the case of high GT-type scores).
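One common way this is quantified is the classical estimated-true-score formula (Kelley's formula), in which the observed deviation from the mean shrinks by the reliability; the mean of 100 and reliability of .90 are again assumed for illustration:

```python
def expected_retest(observed, mean=100.0, reliability=0.90):
    """Kelley's estimated true score: under classical test theory, also the
    best guess for a retest on an equally reliable parallel measure."""
    return mean + reliability * (observed - mean)

print(expected_retest(135))  # 131.5: a very high score is expected to drift back toward the mean
print(expected_retest(70))   # 73.0:  a very low score is expected to drift up toward the mean
```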
2. Each assessment attempts to sample skills related to an underlying construct (in the case of cognition, this is usually "g", or general intelligence; researchers of cognition have been debating its nature for decades). But each test developer selects a different specific set of tasks and items for this purpose, each of which may sample g to a different extent (unless you believe that g is really a whole host of subscripted "g"s, in which case perhaps you are sampling different aspects of cognition altogether; or perhaps some combination of the two).
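As a toy illustration of that sampling point (the g-loadings of .85 and .70 below are invented, and the model is deliberately oversimplified): two tests can track the same latent ability and still give one student noticeably different scores.

```python
import math
import random

random.seed(1)

def administer(g, loading):
    """Toy model: a standardized score is part latent ability (g), part
    test-specific content and noise. The lower the g-loading, the more the
    score depends on whatever else that particular test happens to sample."""
    specific = math.sqrt(1.0 - loading ** 2)          # variance not shared with g
    z = loading * g + specific * random.gauss(0, 1)   # standardized composite
    return round(100 + 15 * z)                        # rescale to the familiar IQ metric

g = 2.0  # a student two standard deviations above the mean on the latent trait
print(administer(g, loading=0.85))  # test A's estimate for this student
print(administer(g, loading=0.70))  # test B's estimate for the same student
```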
The bottom line is that measurement error and the sampling differences embedded in test designs mean that cognitive assessment scores quite commonly differ from administration to administration and from test to test. And if you test enough times, you will probably come up with a few outliers (notably higher or lower than the hypothetical "true" score) eventually.
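A last small simulation of the "test enough times" point, with an assumed true score of 110 and an assumed SEM of 4.7 points; nothing here comes from a real instrument, it only shows how chance alone produces the occasional outlier.

```python
import random

random.seed(7)

TRUE_SCORE = 110   # hypothetical "true" score (assumed)
SEM = 4.7          # standard error of measurement (assumed)

# Forty hypothetical administrations of equally reliable tests.
scores = [round(random.gauss(TRUE_SCORE, SEM)) for _ in range(40)]
outliers = [s for s in scores if abs(s - TRUE_SCORE) > 1.96 * SEM]

print(f"range of observed scores: {min(scores)} to {max(scores)}")
print(f"{len(outliers)} of {len(scores)} land outside the 95% band, by chance alone")
```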