It is a statistically significant difference (just over 1 SD between VC and VS, and just under 1.5 SD between NVI and VCI), but not necessarily a rare one--that is, more people than you might expect have a difference of this size. puffin is correct that the more important question is whether you see IRL differences between his ability to process language and his ability to process visual spatial information (which you are reporting). The fact that his working memory subtests split along the same lines tends to support your suspicion: the auditory working memory task is lower, and the visual memory task is higher.
If there is some kind of auditory processing concern, the speech therapist may or may not pick up on it, as the kind of testing they do is usually focused on language rather than auditory processing. For the latter assessment, you would need an audiologist, usually one attached to a hospital (although some speech-language pathologists are also audiologists). I should point out, though, that there is limited evidence-based intervention for auditory processing disorders. Mostly, one implements accommodations, many of which are good practice for a large variety of conditions, such as anxiety and inattention: for instance, pairing visual cues with verbal directions, establishing a listening set prior to directions, keeping oral instructions clear and concise, repeating/rephrasing, checking frequently for comprehension, and preferential seating (distraction-free, proximal to instruction). An auditory trainer/FM system is one of the few accommodations that is specific to auditory processing (and hearing impairment).
I'm interested in the low average achievement scores, though.