I think the problem may be that the test is not demonstrably reliable at teasing out differences beyond the 99.9th percentile. I recall being told that the younger the test-taker, the less reliable it is at differentiating scores in the far right tail.