A few notes:
- the pages you saw sound like they were probably from the Beery, since there is no WISC task where staying within the lines is relevant, but that can come up on the Beery, depending on which tasks were administered.
- he would not have been required to touch the iPad for the WISC, but he could have if he had chosen. It is also possible to provide all of the relevant responses orally, which may have been how he did it. I was mainly curious about the PSI, but it sounds like he did that on paper.
And now for the WJ-IV. Grade equivalents, unfortunately, remain a fixture in too many evaluation reports, even though they are usually inappropriate, as both APA and NASP have asserted for decades. The WJ-IV (and, in fact, every instrument you named for this round of evaluation) is a norm-referenced (deviation IQ, in this case) test, which means it is designed to compare the examinee's performance to that of the standardization population: that is, to rank a student against their age- or grade-peers (e.g., first through 100th percentile). It is not designed to determine at what grade level a student actually functions, or whether they meet the learning standards for a particular grade in a particular curriculum.
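To make the "deviation IQ" idea concrete, here is a minimal sketch of how a standard score maps onto a percentile rank, assuming only the usual mean-100, SD-15 scaling these norms use (the function name and numbers are illustrative, not from any test manual):

```python
from math import erf, sqrt

def standard_score_to_percentile(ss: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile rank implied by a deviation-IQ standard score,
    assuming the normal distribution the norms are scaled to."""
    z = (ss - mean) / sd
    # Normal CDF via the error function, expressed as a percentile.
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))

# A standard score of 100 sits at the 50th percentile;
# 115 (one SD above the mean) lands at roughly the 84th.
print(round(standard_score_to_percentile(100)))  # 50
print(round(standard_score_to_percentile(115)))  # 84
```

The point is that the score only says where the student ranks relative to peers, not what material they have mastered.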
Where, then, do the grade equivalent scores come from? They are generated by statistical methods that identify the raw score obtained by the average student at each grade level in the standardization sample. That hypothetical average student could have reached that raw score by multiple pathways: for example, by missing many items scattered along the way while completing some very high-level items, or by answering many items correctly in sequence and then abruptly hitting their skill/knowledge limit. Another confound is that not every skill develops on the same trajectory, and trajectories are not necessarily smooth or even. So even if raw scores were consistently obtained in the same way, a difference of one or two correct answers might, at some levels, make a negligible difference in the nominal GE, and at other levels suggest unrealistically large differences. Some skills level off developmentally relatively early (e.g., addition fact fluency), while others have such a high ceiling that even most educated adults struggle to demonstrate consistent mastery (e.g., the calculus items in Calculation).
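The instability of GEs is easy to demonstrate with a toy example. The median raw scores below are made up for illustration (they do not come from any actual WJ-IV norm table), but the mechanism is the one described above: interpolate a grade equivalent from the median raw score at each grade, and watch what a single raw-score point does where growth is steep versus where it has flattened out:

```python
# Hypothetical median raw scores by grade (illustrative numbers only).
MEDIANS = [(1.0, 5), (2.0, 12), (3.0, 18), (4.0, 21), (5.0, 22), (6.0, 23)]

def grade_equivalent(raw: int) -> float:
    """Linearly interpolate a grade equivalent from the median table."""
    if raw <= MEDIANS[0][1]:
        return MEDIANS[0][0]
    for (g0, r0), (g1, r1) in zip(MEDIANS, MEDIANS[1:]):
        if raw <= r1:
            return g0 + (g1 - g0) * (raw - r0) / (r1 - r0)
    return MEDIANS[-1][0]

# Where the skill is growing quickly, one extra correct answer
# barely moves the GE (about a sixth of a grade here)...
print(grade_equivalent(13) - grade_equivalent(12))
# ...but where growth has leveled off, a single correct answer
# spans an entire grade.
print(grade_equivalent(23) - grade_equivalent(22))
```

The same one-point difference reads as trivial in one region of the table and as a full year of "growth" in another, which is exactly why GEs overstate precision.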
Suffice it to say that these tests are not designed to generate a true grade equivalent. You are correct: although not uncommon, it is a weird (and inappropriate) way to report scores.