Oy. Honestly, it sounds to me like they were looking for reasons not to accelerate him. But let's play devil's advocate. "Lack of rigor" in the CBM-math may be because some of these probes are just that--probes to check on progress in a skill area based on a few key indicators. They can be subject to the same lack of comprehensiveness that the WJ-type achievement tests have. Also, some tests that are called CBM aren't really based on the curriculum. The legitimate curriculum-based measure in this case would be the series of benchmark assessments pulled directly from the classroom curriculum. The difficulty with those is that they are unlikely to have good national norms. But if what you want to know is how likely a child is to be successful in this particular setting, using this particular curriculum, that is where you will find the information.

My preference in your son's case would have been to use the benchmark assessments to pinpoint specific gaps, fill them, and then grade-accelerate him to the point in the curriculum where he was consistently below 80% success on benchmarks (as a means of locating his zone of proximal development). I actually did this with #1 in the third grade, when my math-loving offspring started to complain that math was boring. (Of course, I had been asking the school to do it since first grade, but didn't get through to anyone until we reached this particular teacher.) The teacher allowed #1 to take a pretest for each unit and skip any chapter that came back 80+% accurate. This ensured that every aspect of the curriculum was touched on somewhere, but no more than one day had to be wasted on already-mastered skills. Perhaps that actually is the school's plan...but if it is, it would be nice if they let you in on it!
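If it helps to see that "pretest and place" logic spelled out, here's a rough sketch. The unit names, scores, and the 80% cutoff shown are just placeholders for whatever the classroom benchmarks actually look like--nothing official:

```python
# Toy sketch of the "pretest each unit, skip mastered ones, start instruction
# at the first gap" approach described above. All data below are invented.

CUTOFF = 0.80  # mastery threshold used in the example above

def placement_point(benchmark_scores):
    """Return the first unit where the student scores below the cutoff.

    benchmark_scores: list of (unit_name, proportion_correct) pairs,
    ordered as they appear in the curriculum.
    """
    for unit, score in benchmark_scores:
        if score < CUTOFF:
            return unit          # instruction starts here (roughly the ZPD)
    return None                  # mastered everything that was pretested

# Hypothetical pretest results pulled from the classroom benchmarks:
scores = [
    ("Gr3 Unit 1: place value", 0.95),
    ("Gr3 Unit 2: multi-digit addition", 0.90),
    ("Gr4 Unit 1: multiplication", 0.85),
    ("Gr4 Unit 2: long division", 0.70),   # first gap -- teach from here
]

print(placement_point(scores))  # -> "Gr4 Unit 2: long division"
```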
All right: A little bit about age/grade equivalents:
For nearly all norm-referenced standardized tests, both cognitive and achievement, the grade and age equivalents reported are derived not by determining what material constitutes the mastery level or instructional range of the average student of that grade or age, but by linear regression, using the test score at the 50th %ile for the listed age/grade. At face value, that sounds like it would be a legitimate measure of age/grade equivalency--i.e., something you could use to determine appropriate subject/grade placement. But it's not! What you actually learn about a student whose listed grade equivalent is 5.7 is not that they are appropriately placed in a fifth-grade, seventh-month curriculum, but that they got the same score on the test (which doesn't necessarily cover fifth-grade material) that the statistically average fifth-grade, seventh-month student did. For example, on a measure of addition fluency, a fifth-grade student can often complete items much more quickly than a second-grade student, but the difficulty of the math itself is the same for both of them. So a grade equivalent of 5.7 is not qualitatively much different from a GE of 2.7, nor does obtaining a GE of 5.7 on such a measure mean that an individual is capable of doing late fifth-grade work.
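For the curious, here is roughly what that derivation looks like in practice. The norm table below is entirely invented (real publishers smooth the curve and use much bigger samples), but the mechanics--match the raw score to the median of each grade's norming group and interpolate--are the point:

```python
# Minimal sketch of how a grade equivalent (GE) is typically produced:
# locate the student's raw score among the MEDIAN raw scores of the
# norming sample at each grade level, then interpolate between them.
# The norm table is made up for illustration.

median_by_grade = {
    2.7: 18,   # median raw score of students tested in grade 2, month 7
    3.7: 25,
    4.7: 31,
    5.7: 36,
    6.7: 39,
}

def grade_equivalent(raw_score):
    """Linear interpolation between the grades whose medians bracket the score."""
    points = sorted(median_by_grade.items())
    for (g_lo, m_lo), (g_hi, m_hi) in zip(points, points[1:]):
        if m_lo <= raw_score <= m_hi:
            frac = (raw_score - m_lo) / (m_hi - m_lo)
            return round(g_lo + frac * (g_hi - g_lo), 1)
    return None  # outside the range of the table

# A second grader who races through the same simple addition items can land
# at the median for grade 5.7 without ever touching fifth-grade math:
print(grade_equivalent(36))   # -> 5.7
```

Nothing in that calculation asks whether the student has ever seen fifth-grade content--only whether their count matches the fifth-grade median.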
As the example above suggests, many skills have growth curves whose slopes vary across age/grade. In early elementary school, the decoding skills of the average student are on a very steep growth curve, so grade equivalents are packed closely together along the raw-score scale. By middle school, on the other hand, most people--including those who got there late--have mastered basic decoding skills, and the curve flattens out. So the difference between a middle school GE and a high school GE on a word-level reading task may come down to just a few raw-score points--not that meaningful statistically, but dramatically different in apparent grade level.
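A crude numeric picture of that flattening, again with made-up medians for a hypothetical word-list reading task:

```python
# How the "slope" of the norms changes what a GE means. All numbers invented.
# Word-reading medians climb quickly in the early grades and then flatten,
# so the same raw-score gap looks very different expressed as a GE.

median_words_read = {   # hypothetical median raw scores on a word-reading list
    1.0: 10,
    2.0: 30,
    3.0: 45,   # steep early growth: +20, then +15 words per year
    7.0: 58,
    8.0: 59,
    11.0: 62,  # flat later growth: roughly a word per year
}

# Early grades: a 15-word difference separates the grade 2.0 and 3.0 medians.
print(median_words_read[3.0] - median_words_read[2.0])   # -> 15

# Secondary grades: a 3-word difference separates grade 8.0 from grade 11.0,
# so reading just a few more words "jumps" a student three grade levels in GE.
print(median_words_read[11.0] - median_words_read[8.0])  # -> 3
```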
This is why program criteria are usually based on standard scores or percentiles (the more statistically robust cutoff scores) rather than on GE/AE, and why grade/subject placement decisions should be tied to the criteria that students are normally required to meet to demonstrate mastery of the curriculum.
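If you want to see why those scores travel better across ages, here's the arithmetic behind a deviation standard score. The raw-score norms below are hypothetical; the mean-100/SD-15 scale and the normal-curve percentile conversion are the standard ones:

```python
# Standard scores and percentiles say how far a student sits from the mean
# of the age-based norming sample, which keeps the same meaning at every age.
# The raw score, norm mean, and norm SD below are illustrative only.

from statistics import NormalDist

def standard_score(raw, norm_mean, norm_sd):
    """Convert a raw score to a deviation standard score (mean 100, SD 15)."""
    z = (raw - norm_mean) / norm_sd
    return round(100 + 15 * z)

def percentile(ss):
    """Percentile rank corresponding to a standard score (mean 100, SD 15)."""
    return round(NormalDist(100, 15).cdf(ss) * 100)

# One student's performance, expressed against hypothetical age-based norms:
ss = standard_score(raw=56, norm_mean=40, norm_sd=8)
print(ss, percentile(ss))   # -> 130 98
```

A cutoff stated as "standard score of 130" or "98th percentile" means the same thing whether the student is six or sixteen, which is exactly what a GE can't promise.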