The ethics of machine learning



Max Kasy has a two-part series in Phenomenal World, titled “The Politics of Machine Learning.” I was drawn more to the ethical dilemmas that emerge, hence the title. While the arguments on fairness and equality, as well as the discussion on inequality of opportunity intrigued me in this particular case, I find myself generally drawn to Kasy’s writings, which include topics as diverse as statistical decision making vs. randomization, universal basic income, adaptive treatment assignments in experiments, and much more – even when I don’t find myself always agreeing with all the ideas.

In this piece on machine learning (ML), Part I is a gentle introduction to the history of prediction in statistics, how machines learn, and whom they currently serve. Kasy acknowledges that many of the techniques for the problem of prediction were introduced into astronomy and the physical sciences in the early 1800s, but we should not think that there is nothing new in ML: particularly impressive achievements include image and speech recognition, as well as using written text as data (Could one take advantage of the latter in economics or other social sciences? Surely researchers must be trying…). Kasy then proceeds to define some basic ML terms, such as regularization, tuning, and multi-armed bandit algorithms, to describe how machines learn. Part I concludes with a discussion of the main uses of these techniques today: by companies for profits; by politicians for votes; by law enforcement and courts for bail, parole, and sentencing decisions; and by military and intelligence agencies for targeted killings.

A common thread connects these uses: the proliferation of data on each of us allows the masters to treat individuals differently. The predictable and not so predictable effects of such treatment are fascinating, intriguing, and scary, and they are what Kasy discusses in Part II.

Part II starts with a discussion that is particularly relevant for news and political messaging. The idea that tailoring messages to different groups (increasingly smaller and well-defined ones at that) might cause bubbles and contribute to polarization is well-trodden territory, but there are interesting thoughts about whether this alone is to blame for polarization when older people, who rely less on these types of messaging on social media and more on good old cable news, are more polarized in their opinions than the young. There is also the interesting idea that if I think I am just being fed what I want to hear, I start tuning out because I am now questioning the authenticity of the message: there are short-term gains to be balanced with longer-term concerns, including negative externalities for the system as a whole…

The discussion on fairness and equality is most pertinent for me, as we are about to start an adaptive field trial using multi-armed bandit (MAB) experimentation. I regularly think about the unintended consequences of such algorithms: even though the algorithm is trying to maximize some social welfare function we imposed, patterns might emerge in which some groups are treated differently than others, and we are not sure how we (or others, or members of these groups themselves) feel about such differentiated (contextual, to use ML language) treatment…

A little diversion (from me) with a concrete example might help provide a little more context: suppose that you are in a setting where you’re trying to help individuals decide to take an action concerning a complex but important topic. It could be that they are trying to choose the best mortgage product or the most suitable birth control method, where the decision has serious consequences for the individual and the average individual needs some information and perhaps even some expert opinion. Information asymmetries, lack of trust in the experts, and externalities imply that choice architecture and subsidies might help achieve welfare gains – individually and socially. You get to observe clients continuously and record their choices under different styles of treatment conditions, say, communication strategies, subsidy schemes, etc. MAB algorithms will help in such cases to discover winning treatment combinations faster than classical A/B testing. They can do this without differentiating treatment between individuals, but if you were to allow contextualization by using all the background characteristics (contexts) you know about your clients, then you have a black box producing treatment conditions that are different for each individual. The lack of transparency is one problem, but, at least in principle, it should not be hard to overcome. But, even completely transparent schemes may produce differences in treatment between groups with different circumstances that are morally irrelevant to the outcome and, hence, might be undesirable from an ethical standpoint. Equality of opportunity dictates that we don’t want to discriminate on characteristics that are not under an individual’s control, such as her race, where she was born, the parents to whom she was born, etc. What if your algorithm, with which you can find no fault ex ante, is producing such patterns? What then? Intentional or unintentional, discrimination will have consequences…
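To make the mechanics a bit more concrete, here is a minimal sketch of one common MAB algorithm, Thompson sampling with binary outcomes, together with a simple tally of which treatment each group ends up receiving. This is an illustration only, not the algorithm used in any particular trial: the take-up rates, the number of rounds, and the function names `thompson_sampling` and `arm_shares_by_group` are all made up for the example.

```python
import random
from collections import Counter, defaultdict


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over a fixed set of treatment arms.

    true_rates are hypothetical success probabilities, unknown to the
    algorithm and used here only to simulate client responses.
    Returns the number of times each arm was assigned.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then assign the arm with the best draw.
        draws = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: draws[a])
        # Simulate the client's binary response to the assigned arm.
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


def arm_shares_by_group(assignments):
    """Share of each arm within each group, from (group, arm) pairs."""
    counts = defaultdict(Counter)
    for group, arm in assignments:
        counts[group][arm] += 1
    return {
        group: {arm: n / sum(c.values()) for arm, n in c.items()}
        for group, c in counts.items()
    }


# Three hypothetical communication strategies with different take-up rates.
pulls = thompson_sampling([0.10, 0.15, 0.30], n_rounds=2000)

# A made-up assignment log from a contextual scheme, audited by group.
shares = arm_shares_by_group([("A", 0), ("A", 1), ("B", 1), ("B", 1)])
```

The first function shifts assignments toward the better-performing arm as evidence accumulates, which is where the speed advantage over a fixed A/B split comes from. The second is the kind of after-the-fact check one might run on a contextual scheme: it says nothing about why shares differ between groups, only that they do.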

The question is whether ML will exacerbate existing inequalities between groups or whether it will help policymakers deliver to their constituents in a way that makes such gaps smaller – by discriminating according to need, which, one can argue, is the definition of equitable treatment. As Kasy makes clear, the answer will surely depend on the context, and predicting these effects is very difficult at the moment. There are worlds in which we don’t even have the basic requirement of transparency satisfied, especially when it comes to the use of ML by actors for private gain (companies, politicians, militaries, etc.). But there are also worlds, which Kasy calls a utopian social policy, in which we harness ML for providing everyone with the public (or private) goods that they need. As I can attest from dipping our toes into this territory in our own work, the idea of equalizing possibilities for everyone by tailoring the treatments to their needs and contexts is tantalizing and infectious: armed with the best intentions, you do want to try it. But that approach also requires you to stay vigilant and observe how these inequalities in treatment interact with existing injustices in the world and how they impact social welfare and inequality.

There is an interesting section on inequality of opportunity, where the author wonders if it even makes sense to try to make a distinction between opportunity and effort. Of course, the whole field of inequality of opportunity is based on that distinction, with the caveat that the lines are not always clear as to what constitutes a circumstance (beyond one’s control) vs. effort. Kasy argues that as big data include more and more background characteristics, everything will become more predictable, and it may be futile to worry about inequality of opportunity rather than that of outcomes. This is perhaps the least clear argument made in the piece and I am not sure I understand it completely. Sure, it may be practical, if not exactly fair, to focus on inequality of outcomes from a policy standpoint. But I am not sure what ML has to do with this quite yet. My sense is that we do need to keep an eye on inequalities between groups that are salient to our own society/setting and, if the groups are defined by morally irrelevant characteristics, then we need to think hard about how to deploy any ML-assisted policies. Within-group inequality in treatments, calculated as the residual of the total inequality that remains after accounting for between-group inequality, will remain, but we can perhaps accept this – even if we are not exactly sanguine about it…
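The between-group vs. within-group split mentioned above can be sketched with a simple variance decomposition, in which the within-group part is what is left of the total after subtracting the between-group part. The inequality literature more often uses additively decomposable indices such as the Theil index; variance is used here only because it keeps the arithmetic transparent. The subsidy amounts, group labels, and the function name `decompose_variance` are hypothetical.

```python
from collections import defaultdict


def decompose_variance(values, groups):
    """Split the (population) variance of values into between- and
    within-group parts, with within computed as the residual."""
    n = len(values)
    grand_mean = sum(values) / n
    total = sum((v - grand_mean) ** 2 for v in values) / n

    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Between-group part: the variance we would see if everyone received
    # their group's mean treatment.
    between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    ) / n
    within = total - between
    return total, between, within


# Hypothetical subsidy amounts assigned to four clients in two groups.
subsidies = [10, 12, 20, 22]
group_labels = ["A", "A", "B", "B"]
total, between, within = decompose_variance(subsidies, group_labels)
print(total, between, within)  # 26.0 25.0 1.0
```

In this made-up example almost all of the variation in treatment is between groups, which is exactly the pattern that would call for a closer look if the groups were defined by morally irrelevant characteristics.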

To be clear, the counterfactual world, that is the world without ML, in which many actors don’t have their clients’ or citizens’ best interests in mind, contains a large amount of injustices and inefficiencies. Like all tools, ML has the potential to make things better at least in some settings, but it can also make things worse. Vigilance is in order…

The piece ends with a discussion of who owns your data, which I really liked and which is related to my previous post on issues surrounding informed consent when doing research with adolescents. Currently, big data (compilations of your individual data) is used either by policymakers or by private corporations. It’s not completely crazy to think about a third case where you own your data and give permission for its use on a case-by-case basis. Such a system might yield substantially different outcomes than the alternatives. Lots to ponder about these complex yet pertinent issues regarding data and its use in our lives…


Berk Özler

Lead Economist, Development Research Group, World Bank

Angela Zhou
July 11, 2019

Thanks very much for the super interesting writeup/reflection!
I wanted to follow up on some thoughts:
> Equality of opportunity dictates that we don’t want to discriminate on characteristics that are not under an individual’s control, such as her race, where she was born, the parents to whom she was born, etc. What if your algorithm, with which you can find no fault ex ante, is producing such patterns?

I think this is interesting in the context of allocation based on treatment *efficacy* (e.g. for a contextual bandit setting: the ideal allocation rule gives treatment based on a personalized prediction of benefit). What if an allocation algorithm leads to differential allocation of treatment (between-group differences), but this is "explained away" by differences in the ATE by group membership?

I wonder about the implications of the looseness of equality of opportunity, in assuming societal consensus in distinguishing between circumstances and effort, for personalized treatment effects. One made-up example might draw on the literature re: active labor market programs. What if, on average, a group has a lower contextual predicted treatment benefit -- but perhaps this could be due to existing discrimination in the labor market at the time of program implementation, leading to poorer outcomes for the group on average? My naive intuition is that the disparity in allocation (though it is accounted for by treatment effects), may still be objectionable. The mechanisms for adjudicating circumstance vs. effort might ultimately circle back to adjudicating different micro-foundations or structural models.

It seems that to inform normative prescriptions re: what to do if disparities surface in an algorithmic scheme, substantive and contextual insight regarding distributional goals for policies and programs, and domain-level contexts regarding welfare and inequality, are most necessary. I think insights regarding these trade-offs from the policy realm are particularly relevant for the fairness in machine learning community (since the "objective function" of a policy intervention is closely tied to individual social welfare; while private actors have recourse to "business necessity" arguments).