That is some impressive math development! So that suggests the WIAT-III MPS result was not a fluke. In fact, he surpassed the minimum raw score necessary to max out the standard score by quite a bit, if he got at least as far as item #56.
On substitutions: yes, limited subtest substitution can be done. However, in his case, the only one that would make a difference is using Zoo Locations for the FSIQ instead of Picture Memory, and the way you have listed them suggests that this was already done. In the remaining cases, the substitution would result in the same or lower Index/IQ scores than the standard subtest would.
Your observations about visual spatial problem solving may be confounded by motor delays, if the primary indicator is performance on mazes. I would look at motor-free spatial reasoning instead, such as the ability to predict paper snowflakes. (Fold a piece of paper in front of him, make a couple of cuts, and then ask him what he thinks the paper will look like when it's unfolded. Then unfold it and see how accurate he was. Ideally, you would have a field of possible answers for him to view and select from, prior to unfolding, such as on the old Stanford-Binet.) Or, when doing jigsaw puzzles, see whether he can identify the correct piece (not necessarily manipulate it into the puzzle, but simply pick it out by eye). How's his sense of direction? Can he judge sizes and distances by eye? Just a few ideas.
Or it could be that your speculation is correct, and he's just less exceptional at visual spatial reasoning than his twin is. (Still pretty good, though!)