There's tons of research out there on grouping, tracking, gifted models, etc., though unfortunately most of it is pretty limited and/or poor quality, which makes it easy to abuse. Stick with systematic reviews as much as you can, though even those tend to be pretty political, especially in the fraught world of grouping and de-tracking. Here's one of the better (though not terribly recent) ones:

https://nrcgt.uconn.edu/wp-content/uploads/sites/953/2015/04/rbdm9204.pdf

Overall, what all of the research tends to say, regardless of ostensible topic, is that kids learn better when provided the right material in the right way (go figure). The delivery model doesn't matter as much for academics, as long as this is being achieved. Not surprisingly, however, the more integrated the model, the more diverse the class needs are and the harder the logistics of actually getting each kid the right material. Simply put, which kid gets more time with the material and teaching they need: the one in the class of 20 kids with similar needs, or the one in the class of 20 kids who each have different needs? Pity the poor teacher!

I have yet to read a single study of differentiation that concludes 'differentiation works' by assessing a teacher actually differentiating. Instead, they all say the above (kids learn best when you teach them in their zone of proximal development, or ZPD) and then reason that since that's what differentiation does, it must work.

When it comes to understanding math de-tracking and what actually does and doesn't help students, one of the best papers I have ever seen takes a thoughtful look at the outcomes of 'Algebra for All' and 'Double-Dose Algebra'. I consider it a must-read for any district considering de-tracking:

https://consortium.uchicago.edu/sites/default/files/2018-10/Sorting%20Brief.pdf

With respect to delivery models, you may find this lit review undertaken for one of our school boards helpful (starts page 9). It also does an interesting job of untangling why so much lit on gifted psychosocial outcomes seems contradictory, and concludes that the lit is actually fairly consistent *if* you control for delivery model. For example, many poor outcomes for gifted students (such as stigma or mental health issues) are found in students in more integrated models, and are reduced when students spend more time with peers with similar learning needs.

https://weblink.ocdsb.ca/weblink/0/...ew%20_Final%20Report_Sep%2009%202016.pdf