Notes for a critical analysis of Machine Learning in education


By Carlo Perrotta

Featured image by Daniel Friedman

Part 1

Artificial Intelligence is without doubt a topic deserving serious consideration and scholarly attention, but its impact on education is still rather unclear. It feels timely to begin a critical conversation, while keeping in mind how marginal AI may still be for the day-to-day business of education. As critical education researchers we find ourselves gazing more and more into the near or even the far future, trying to challenge the narratives of disruption and innovation that inspire actual policies and drive investments in various ‘cutting-edge’ areas of educational technology. While future-gazing may help us understand the trajectory of modern education, it is imperative to remain within clear methodological parameters.

Some of my recent research (with Ben Williamson) focused on the so-called ‘social life’ of learning analytics. As part of this work, we tried to apply a methodological approach based on three premises:

It is possible to study algorithmic systems sociologically, focusing on their development as methods situated in history and politics.
It is possible to study algorithmic systems from a sociomaterial angle, considering the entanglements of interests, forms of specialised knowledge, actual centres of expertise and computational materialities: indicators, servers, standards, software code, industry frameworks and so on.
It is possible to study algorithmic systems focusing on their performative dimensions, as sets of mathematical or ‘neuromorphic’ (i.e. mimicking how the brain processes and classifies data) operations in which two fundamental functions are conflated: the discovery of ‘natural’ patterns in the world and the active imposition of ordering structures, which are open to ideological influencing and, indeed, poisoning.

Recent trends in economic sociology were a source of inspiration, in particular critical analyses at the intersection of material semiotics and performativity. However, we also tried to articulate an original contribution in the invitation to engage more thoughtfully and critically with the actual ‘under the bonnet’ operations that allow algorithmic systems to act as sociotechnical devices. Even recent, outstanding work like Donald MacKenzie’s analysis of material signals in High-Frequency Trading algorithms is somewhat hesitant to engage with the so-called ‘secret sauce’ of code and data analysis, without realising that the use of expressions like secret sauce also contributes to the obfuscation of these systems. While commercial and legal factors may constrain or even stop in its tracks a more in-depth examination of algorithmic agency (seen as a mix of human and non-human elements), especially in fields like finance, it is still possible to develop a critical, indeed sociological, understanding of what algorithms actually ‘do’ as sequences of calculations and materialisations of social forces and ideologies. This belief rests on the assumption that a number of mechanical/mathematical properties, principles and limitations apply to most algorithmic systems. These factors can be, to use a navigational analogy, the coordinates that allow us to triangulate the approximate location and movement of the underlying computational processes. Most importantly, they can be used as entry points for a more sociological examination of how computational processes become entangled with values, ideologies and interests. This is not too different from the notion of technography introduced by Taina Bucher, which is concerned with ‘norms and values that have been delegated to and materialized in technology’.

My goal with this post is to begin a technographic inquiry into AI in education, applying the critical approach rapidly sketched above to the topic of Machine Learning (a subset of AI). This will be an iterative exercise that will involve several posts on this website, through which I hope to achieve increasing degrees of clarity and the identification of a few empirically viable challenges. In this first post, I will introduce some key concepts and areas of tension that, as suggested above, can act as points of access for a more sociological and material investigation of ML’s educational implications. A certain amount of trivialisation is to be expected, and AI and ML experts reading this article may feel amused or even irritated. I apologise for this. In my defence, I half-seriously appeal to cognitive science (a major source of inspiration for modern AI), which suggests that there are advantages in approaches that try to make sense of the world using only limited knowledge and tentative, heuristic strategies: less can be more in many cases. The following list may also benefit those seeking a quick methodological introduction to AI in education. Its function is to begin to dispel some of the ‘epistemic fog’ that surrounds these systems, and pave the way for critical social analysis.

AI is a broad church made of specialised subsets: while current discourse about AI often revolves around themes of human displacement and disempowerment, it might be helpful to reframe the debate in terms of the actual specific AI subsets, and their operational dimensions in specific domains. For instance, the field of robotic locomotion is an AI subset that originated from iterations and further elaborations of Zero Moment Point (ZMP) algorithms, which deal with the rules and conditions that determine movement, positioning and balance in real-life situations. This AI subset is associated with ‘humanoid robots’, like the popular Boston Dynamics ones. Indeed, the notion of specialised AI subsets operating separately or in synchronous fashion, each performing some kind of ‘work’, is part of the current trajectory of applied AI, to be distinguished from general purpose AI (i.e. human-like artificial intelligence), which is a comparatively slow moving area of research and development with almost no real-life implications for the foreseeable future.
Machine Learning is a subset of AI: ML is a computational method to identify patterns in data and make predictions on never-before-seen instances of the same kind of data. It is important to distinguish, at this point, between the individual algorithmic techniques and the applied procedural relationships in which they become embedded. Predictive analytic techniques can be as ‘simple’ as a linear regression, where a line in a two-dimensional space ‘fits’ a series of observations, allowing us to predict the state of one variable (Y) on the basis of another (X), or as ‘complex’ as multilayered artificial neural networks that work their way through datasets by loosely replicating certain aspects of the biological brain.
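
To make the ‘simple’ end of this spectrum concrete, here is a minimal sketch of a linear regression in Python. It is an illustration only: the data are synthetic (generated from an assumed line Y = 2X + 1 plus random noise), and the function names fit_line and predict are hypothetical, chosen for this example.

```python
import numpy as np


def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope is the covariance of X and Y divided by the variance of X.
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predict Y for observations of X the model has never seen."""
    return slope * np.asarray(x_new, dtype=float) + intercept


# Synthetic observations (an assumption for illustration, not real data).
rng = np.random.default_rng(42)
x_train = np.linspace(0.0, 10.0, 50)
y_train = 2.0 * x_train + 1.0 + rng.normal(0.0, 1.0, x_train.size)

slope, intercept = fit_line(x_train, y_train)
y_new = predict([12.0], slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction at X=12: {y_new[0]:.2f}")
```

The fitted line is only as good as the assumption that the relationship between X and Y is in fact linear, which is exactly the kind of choice that deserves critical scrutiny.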


Methods are put to work, but first they must learn what to do. Individual predictive methods are then enrolled, to use Actor Network Theory terminology, in various forms of applied Machine Learning procedures, as labouring algorithms within algorithms. Once enrolled, they ‘learn’ from training datasets, so that the applied algorithm can be deployed to make predictions about new data it encounters. For instance, a trained algorithm can recognise faces or distinguish between pictures of cats and dogs ‘on the fly’, without having to compute from scratch every single time it encounters a new instance (a picture or an actual human). The process of learning is dependent on a number of preconditions, which are more open, uncertain and negotiable than often assumed. The first precondition is computational tractability, which means that these systems can only deal with quantifiable signals, i.e. data that can be counted and ordered on a continuous scale, or categorised using discrete values where the differential between states is known and discernible using a simple classificatory system. The second precondition is feature selection, i.e. the identification of parameters that help make sense of the quantifiable signals. Feature selection can be performed automatically, but it remains an inherently flawed and error-prone process, where certain characteristics will be included at the expense of others. The chief criterion of feature selection is to make learning as powerful as possible, which leads to more effective prediction, and this is often achieved at the expense of an accurate representation of a training dataset. This is called the Bias-Variance trade-off.
The bias-variance trade-off is worth talking about: Bias can be described as the difference between the function that the algorithm is trying to induce by enacting its pattern-finding permutations, as it works its way through a learning dataset, and the underlying ‘true’ function. Zero bias therefore means that the induced function (guessed, you might say) is exactly the same as the underlying function. Variance, by contrast, describes how much the induced function changes from one training dataset to another. The best prediction models often have only a moderate degree of fit with the training data, which means that choosing the model with the best fit is likely to result in poor predictions. The dilemma is therefore between a detailed but potentially wasteful representation of the data, one that might be inefficient and noisy (i.e. picking up patterns that are irrelevant for the predictive task at hand), and a biased one, more effective in terms of prediction but which might miss important explanatory patterns. The relationship between bias and variance is also influenced by other factors, broadly related to the complexity of the phenomena that we are trying to predict. It might be helpful to introduce at this point the traditional Bayesian distinction between small worlds, where alternatives and probability distributions are known and where benefit-maximising rational choice is possible in the abstract, and large worlds, where the relationships between variables are often multidimensional and replete with collinearity and dependencies. Many of the domains in which we seek to perform predictions are very large indeed (education and learning being egregious examples). This translates into a difficult challenge for Machine Learning: the larger the world, the larger the learning dataset needs to be for the process to be effective. However, the larger the world under study, the more sparse, fragmented and messy these datasets will be.
The dilemma is therefore between going down a simplification slope to make a large world smaller and smaller, so that we can work productively with more yielding data, and spending resources on trying to understand a large world at the expense of our ability to predict it. These are highly consequential performative choices that create the very ontological conditions in which algorithms, seen as sociotechnical systems, operate.
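
The trade-off described above can be made tangible with a small simulation. What follows is a rough sketch under stated assumptions, not a definitive method: the ‘true’ function is assumed to be a sine wave, the data are synthetic, and the name bias_variance is hypothetical. Polynomials of increasing degree stand in for models of increasing flexibility; each is fitted to many noisy training datasets, and the squared bias and the variance of its predictions are then estimated.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_function(x):
    """The underlying 'true' function (an assumption made for this sketch)."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=200, n_train=20, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial model of given degree."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_grid = np.linspace(0.0, 1.0, 200)
    preds = np.empty((n_trials, x_grid.size))
    for t in range(n_trials):
        # Each trial is a fresh noisy training dataset drawn from the same world.
        y_train = true_function(x_train) + rng.normal(0.0, noise_sd, n_train)
        model = Polynomial.fit(x_train, y_train, degree)
        preds[t] = model(x_grid)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average induced function and the true one.
    bias_sq = np.mean((mean_pred - true_function(x_grid)) ** 2)
    # Variance: how much the induced function moves between training datasets.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


results = {degree: bias_variance(degree) for degree in (1, 3, 9)}
for degree, (bias_sq, variance) in results.items():
    print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

In a run of this kind, the straight line (degree 1) should show high bias and low variance, while the degree 9 polynomial should show the reverse. The toy world here is deliberately small: one variable, a known function, and as much data as we care to generate, which is precisely what large worlds like education do not offer.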

The process of feature selection and the simplification slope, through which large worlds are actively made small, are perfect examples of how a critical examination of Machine Learning in education could begin. I will return to these (and other) issues in the next posts.
