Yeah, I think that mostly what the Flynn effect actually shows is not human evolution (as Flynn more or less is claiming) but the fact that intelligence testing tests, and HAS always tested, for highly inaccurate proxies of what everyone considers "intellectual/cognitive capacity."
We don't really know how to measure the real deal. IQ tests measure something, all right. But that probably isn't the same thing.
After all, if one administered one type of test, it is entirely possible that my Australian Shepherd could score much better on it than my daughter... just calling it an "intelligence/IQ" test doesn't mean that is what is actually being measured. Suppose that the test involved scent discrimination? Would that be fair or right? After all, it's a sensory processing and cognitive activity. LOL. The thing is, I can guarantee that the dog is better suited to that activity.
On the other hand, if it involved adding vectors...
Maybe we've just figured out how to ENRICH young children's learning environments better, and most of the people who are testing IQ are... um... self-selected and by extension, more concerned about maximizing cognitive potential?
Biology doesn't really allow for "evolution" over a couple of generations like that. Epigenetic explanations... possibly.
But given the shift in test questions, it seems to me that we do not really have an apples-to-apples comparison to start with. Earlier cohorts may not population-match with later ones, either.