What does it mean for an algorithm to be fair?
The White House commissioned a 90-day study that culminated in a report (PDF) on the state of "big data" and related technologies. The authors give many recommendations, including this central warning.
Warning: algorithms can facilitate illegal discrimination!
Here's a not-so-imaginary example of the problem. A bank wants people to take loans with high interest rates, and it also serves ads for these loans. A modern approach is to use an algorithm to decide, based on the sliver of information known about a user visiting a website, which advertisement to present so as to maximize the chance of the user clicking on it. There's one problem: these algorithms are trained on historical data, and poor, uneducated people (often racial minorities) have a historical trend of being more likely to succumb to predatory loan advertisements than the general population. So an algorithm that's "just" trying to maximize clickthrough can also be targeting black people, de facto denying them opportunities for fair loans. Such behavior is illegal.
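To make the mechanism concrete, here is a minimal sketch of what "maximize clickthrough" looks like in code. All the names, segments, and probabilities below are invented for illustration; the point is that nothing in the code mentions race, yet the ad it chooses can still track race through whatever the user segments correlate with.

```python
# A hypothetical sketch of clickthrough maximization. The "model" here is just a
# table of estimated click probabilities; a real system would learn these from
# historical data, which is exactly where the bias creeps in.

# Estimated probability of a click, keyed by (ad, user_segment).
estimated_ctr = {
    ("high_interest_loan", "segment_A"): 0.09,
    ("high_interest_loan", "segment_B"): 0.02,
    ("index_fund",         "segment_A"): 0.01,
    ("index_fund",         "segment_B"): 0.04,
}

def choose_ad(user_segment, ads=("high_interest_loan", "index_fund")):
    """Serve whichever ad is most likely to be clicked by this kind of user."""
    return max(ads, key=lambda ad: estimated_ctr[(ad, user_segment)])

print(choose_ad("segment_A"))  # high_interest_loan
print(choose_ad("segment_B"))  # index_fund
```

If "segment_A" happens to correlate with a protected class, this system serves that class the predatory loan ad every time, even though race never appears anywhere in the program.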
On the other hand, even if algorithms are not making illegal decisions, by training algorithms on data produced by humans we naturally reinforce the prejudices of the majority. This can have negative effects, like Google's autocomplete finishing "are transgenders" with "going to hell?" Even if this is the most common question being asked on Google, and even if the majority thinks it's morally acceptable to display this to users, it shows that algorithms do in fact encode our prejudices. People are slowly coming to realize this, to the point where it was recently covered in the New York Times.
There are many facets to the algorithm fairness problem, one that has not even been widely acknowledged as a problem, despite the Times article. The message has been echoed by machine learning researchers but mostly ignored by practitioners. In particular, "experts" continually make ignorant claims such as "equations can't be racist," and the following quote from the above-linked article about the Chicago Police Department's use of algorithms for predictive policing:
Wernick denies that [the predictive policing] algorithm uses "any racial, neighborhood, or other such information" to assist in compiling the heat list [of potential repeat offenders].
Why is this ignorant? Because of the well-known fact that removing explicit racial features from data does not eliminate an algorithm's ability to learn race. If racial features disproportionately correlate with crime (as they do in the US), then an algorithm which learns race is actually doing exactly what it was designed to do! One needs to be very thorough to say that an algorithm does not "use race" in its computations. Algorithms are not designed in a vacuum, but rather in conjunction with the designer's analysis of their data. There are two points of failure here: the designer can unwittingly encode biases into the algorithm based on a biased exploration of the data, and the data itself can encode biases due to the human decisions made to create it. Because of this, the burden of proof is (or should be!) on the practitioner to guarantee they are not violating discrimination law. Wernick should instead prove mathematically that the policing algorithm does not discriminate.
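To see why "we don't use race as an input" is such a weak defense, here is a minimal sketch of a redundant encoding. The data is entirely made up; the point is that a single correlated feature like zip code lets even a trivial "model" recover race without race ever appearing as an input column.

```python
# Hypothetical illustration of redundant encodings: even with the race column
# deleted from the inputs, a correlated feature like zip code can reconstruct it.
from collections import Counter

records = [
    # (zip_code, race) -- race is what the algorithm supposedly never sees
    ("60653", "black"), ("60653", "black"), ("60653", "black"), ("60653", "white"),
    ("60614", "white"), ("60614", "white"), ("60614", "white"), ("60614", "black"),
]

def majority_race_by_zip(data):
    """The simplest possible proxy 'model': predict each zip code's majority group."""
    by_zip = {}
    for zip_code, race in data:
        by_zip.setdefault(zip_code, Counter())[race] += 1
    return {z: counts.most_common(1)[0][0] for z, counts in by_zip.items()}

proxy = majority_race_by_zip(records)
accuracy = sum(proxy[z] == r for z, r in records) / len(records)
print(proxy)     # {'60653': 'black', '60614': 'white'}
print(accuracy)  # 0.75 -- race is recoverable from a feature that is still in the data
```

Any learning algorithm worth its salt will exploit such correlations if they help it optimize its objective, which is exactly the problem.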
While this viewpoint is idealistic, it's a bit naive because there is no accepted definition of what it means for an algorithm to be fair. In fact, from a precise mathematical standpoint, there isn't even a precise legal definition of what it means for any practice to be fair. In the US the existing legal theory is called disparate impact, which states that a practice can be considered illegal discrimination if it has a "disproportionately adverse" effect on the members of a protected group. Here "disproportionate" is precisely defined by the 80% rule, but this is somehow not enforced as stated. As with many legal issues, laws are broad assertions that are challenged on a case-by-case basis. In the case of fairness, the legal decision usually hinges on whether an individual was treated unfairly, because the individual is the one who files the lawsuit. Our understanding of the law is cobbled together, essentially through anecdotes slanted by political agendas. A mathematician can't make progress with that. We want the mathematical essence of fairness, not something that can be interpreted depending on the court majority.
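For concreteness, the 80% rule mentioned above is typically computed as a ratio of selection rates. Here is a small sketch with invented numbers; the group labels and threshold usage are illustrative, not a statement of how any court actually applies the doctrine.

```python
# A hedged sketch of the "80% rule" from disparate impact doctrine: compare the
# rate at which a protected group receives the favorable outcome to the rate
# for a reference group. Numbers below are invented for illustration.

def disparate_impact_ratio(selected_protected, total_protected,
                           selected_reference, total_reference):
    """Ratio of selection rates; below 0.8 is prima facie evidence of disparate impact."""
    rate_protected = selected_protected / total_protected
    rate_reference = selected_reference / total_reference
    return rate_protected / rate_reference

# Example: 30 of 100 protected-group applicants approved vs. 60 of 120 others.
ratio = disparate_impact_ratio(30, 100, 60, 120)
print(ratio)         # 0.6
print(ratio >= 0.8)  # False -- fails the 80% rule as literally stated
```

Even this tidy formula leaves open everything that matters in practice: which groups to compare, which outcomes count as favorable, and what to do when the rule is violated.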
The problem is exacerbated for data mining because practitioners often demonstrate a poor understanding of statistics, management doesn't understand algorithms, and almost everyone is lulled into a false sense of security via abstraction (remember, "equations can't be racist"). Experts in discrimination law aren't trained to audit algorithms, and engineers aren't trained in social science or law. The speed with which research becomes practice far outpaces the speed at which anyone can keep up. This is especially true at places like Google and Facebook, where teams of in-house mathematicians and algorithm designers bypass the delay between academia and industry.
And perhaps the worst part is that even the world's best mathematicians and computer scientists don't know how to interpret the output of many popular learning algorithms. This isn't just a problem of stupid people not listening to smart people; it's that everyone is "stupid." A more politically correct way to say it: transparency in machine learning is a wide-open problem. Take, for example, deep learning. A far-removed adaptation of neuroscience to data mining, deep learning has become the flagship technique spearheading modern advances in image tagging, speech recognition, and other classification problems.
A typical example of how a deep neural network learns to tag images. Image source: http://engineering.flipboard.com/2015/05/scaling-convnets/
The picture above shows how low-level "features" (which essentially boil down to simple numerical combinations of pixel values) are combined in a "neural network" into more complicated, image-like structures. The claim that these features represent natural concepts like "cat" and "horse" has fueled public attention on deep learning for years. But looking at the above, is there any reasonable way to say whether these are encoding "discriminatory information"? Not only is this an open question, but we don't even know what kinds of problems deep learning can solve! How can we understand to what extent neural networks can encode discrimination if we don't have a deep understanding of why a neural network is good at what it does?
What makes this worse is that there are only about ten people in the world who understand the practical aspects of deep learning well enough to achieve record results with it. This means they spend a ton of time tinkering with the model to make it domain-specific, and nobody really knows whether the subtle differences between the top models correspond to genuine advances, slight overfitting, or luck. Who can say whether the fiasco with Google tagging images of black people as apes was caused by the data, by the deep learning algorithm, or by some obscure tweak made by the designer? I doubt even the designer could tell you with any certainty.
Opacity and a lack of interpretability are the rule more than the exception in machine learning. Celebrated techniques like support vector machines, boosting, and the recently popular "tensor methods" are all highly opaque. This means that even if we knew what fairness meant, it would still be a challenge (though one we'd be well suited for) to modify existing algorithms to become fair. But with recent success stories in theoretical computer science connecting security, trust, and privacy, computer scientists have started to take up the call of nailing down what fairness means, and how to measure and enforce fairness in algorithms. There is now a yearly workshop called Fairness, Accountability, and Transparency in Machine Learning (FAT-ML, an awesome acronym), and some famous theory researchers are starting to get involved, as are social scientists and legal experts. Full disclosure: two days ago I gave a talk as part of this workshop on modifications to AdaBoost that seem to make it more fair. More on this in a future post.
From our perspective, we the computer scientists and mathematicians, the central obstacle is still that we don't have a good definition of fairness.
In the next post I want to get a bit more technical. I'll describe the parts of the fairness literature I like (which will be biased), I'll hypothesize about the tension between statistical fairness and individual fairness, and I'll entertain ideas on how someone designing a controversial algorithm (such as a predictive policing algorithm) could maintain transparency and accountability over its discriminatory impact. In subsequent posts I want to explain in more detail why it seems so difficult to come up with a useful definition of fairness, and to describe some of the ideas my coauthors and I have worked on.
Until then!