Taming the swarm
The continuous attempts to tame the flood of information have a lot in common with detecting emergent patterns in complex systems: we look for a deterministic model that can explain the behavior seen in a large number of observations, present that model, and watch how it stacks up against further observations that might or might not fit it.
Our theoretical thinking, though, is limited to the outliers who actually do the work: creating the models, examining the underlying assumptions and critically reviewing the results presented.
Yes, the large majority of data scientists behave in a critical manner: they explore within the limits of their modeling tools, and then use those limits to explore further, refining and narrowing their model and observation parameters.
Those limitations (the constraints inherent in building a theory of the particular model under study, imperfect knowledge and limited computing time) push scientists to accept restricted modeling techniques. They also tempt them to stretch the metaphor of the underlying theory, effectively assuming that a model used to study physical or economic events can carry over the narrow assumptions of the theoretical space in which it was first derived.
For example, a linear model that tries to simulate and predict a situation with at least four independent variables is going to fall short of explaining the situations that arise among those variables. Furthermore, the assumption of linearity itself further reduces the validity of the model, since a linear model cannot explain the various nonlinear effects that might occur. Higher-order components lead to interesting behavior that can be explained with many different mathematical tools, but definitely not within the framework of a first-order linear model.
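To make the point concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration: the data is synthetic, the coefficients, the interaction term `x2 * x3` and the squared term `x4 ** 2` are made up, and the helper names `r_squared` and `fit_and_score` are hypothetical. It fits a first-order linear model to four independent variables and then compares it with a fit that is allowed to see the higher-order terms.

```python
import numpy as np


def r_squared(y, y_hat):
    """Fraction of the variance in y explained by the predictions y_hat."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_score(features, y):
    """Ordinary least squares with an intercept; returns in-sample R^2."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return r_squared(y, design @ coef)


# Synthetic data (an assumption, not real observations):
# four independent variables drawn uniformly from [-1, 1].
rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-1.0, 1.0, size=(n, 4))
x1, x2, x3, x4 = X.T

# The response has one linear term, one interaction and one squared term,
# plus a little noise. The coefficients are arbitrary.
y = 0.5 * x1 + 2.0 * x2 * x3 + 1.5 * x4 ** 2 + rng.normal(0.0, 0.05, size=n)

# First-order linear model: only x1..x4 as regressors.
r2_linear = fit_and_score(X, y)

# Same least-squares machinery, but with the higher-order terms added.
X_extended = np.column_stack([X, x2 * x3, x4 ** 2])
r2_extended = fit_and_score(X_extended, y)

print(f"first-order linear model R^2: {r2_linear:.3f}")
print(f"with higher-order terms R^2:  {r2_extended:.3f}")
```

In this toy setup the first-order fit should explain only a small share of the variance, while the extended fit should recover almost all of it. The fitting method is the same in both cases; what changes is the initial assumption about which terms are allowed into the model.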
Seeing this happen in academia makes me think about the siloed nature of research: models that could benefit from the hand of a mathematician or physicist are left unexamined and barely explained, while extra rigour is demanded of the simplest models in an attempt to formalize them and head off criticism of the methodology. But the methodology is not at fault here: the initial assumptions forget that the reality in question is perhaps better captured by some other, higher-order, dynamic description.
Meanwhile, we get to listen as migration is explained as dependent on happiness, while a whole raft of other environmental, economic and social factors is left unmentioned.