I don't know if this would help, but I'd have him add one more data point for a fictional adult and see that the means don't become the same just because both groups now have the same number of subjects. I have no idea what your numbers are, but let's say the adult mean is 20 and the child mean is 30. I'd pick a value for fictional adult #1 (something like the mode, or another number well within the typical range for the adults he tested) and have him recalculate the adult mean with that fictional value included.
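
If it helps to see it worked out, here's a rough sketch in Python. The scores are completely made up (not your data); I only picked them so the means come out to the 20 and 30 from my example, with one fewer adult than children.

```python
# Made-up scores, chosen only so the means match the 20 and 30 above.
adults = [18, 19, 20, 20, 23]        # 5 adults, mean 20
children = [26, 28, 30, 31, 32, 33]  # 6 children, mean 30


def mean(scores):
    """Arithmetic mean: sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


# Add one fictional adult at the mode (20) so both groups have 6 subjects.
adults_plus_one = adults + [20]

print(len(adults_plus_one), len(children))    # 6 6
print(mean(adults_plus_one), mean(children))  # 20.0 30.0
```

Both groups now have 6 subjects, but the adult mean is still 20 and the child mean is still 30.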

Seeing that his conclusion doesn't work out might help him see your point. He could then (if interested) play around with sets of numbers that have the same number of data points but different values for each data point, and see that having the same number of scores doesn't produce the same mean if the scores themselves are different. Does that make any sense?
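
For the "play around" part, something like this might be a starting point. Again, the sets are invented purely for illustration, and he could swap in any numbers he likes.

```python
def mean(scores):
    """Arithmetic mean: sum of the scores divided by how many there are."""
    return sum(scores) / len(scores)


# Invented sets: a and b have the same number of scores but different values.
set_a = [10, 12, 14, 16]
set_b = [20, 22, 24, 26]
# set_c has fewer scores than set_a but happens to share its mean.
set_c = [13, 13]

print(len(set_a) == len(set_b), mean(set_a), mean(set_b))  # True 13.0 23.0
print(len(set_a) == len(set_c), mean(set_a), mean(set_c))  # False 13.0 13.0
```

Same count with different means, and different counts with the same mean: the mean follows the scores themselves, not how many of them there are.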