Face recognition and AI ethics

Way back in the 1970s and early 1980s, the tech industry created a transformative new technology that gave governments and corporations an unprecedented ability to track, analyse and understand all of us. Relational databases meant that for the first time things that had always been theoretically possible on a small scale became practically possible on a massive scale. People worried about this, a lot, and wrote books about it, a lot. 

Specifically, we worried about two kinds of problem: 

  • We worried that these databases would contain bad data or bad assumptions, and in particular that they might inadvertently and unconsciously encode the existing prejudices and biases of our societies and fix them into machinery. We worried people would screw up.

  • And, we worried about people deliberately building and using these systems to do bad things.

That is, we worried what would happen if these systems didn’t work and we worried what would happen if they did work. 

We’re now having much the same conversation about AI in general (or more properly machine learning) and especially about face recognition, which has only become practical because of machine learning. And, we’re worrying about the same things - we worry what happens if it doesn’t work and we worry what happens if it does work. We’re also, I think, trying to work out how much of this is a new problem, and how much of it we’re worried about, and why we’re worried.

First, ‘when people screw up’.

When good people use bad data

People make mistakes with databases. We’ve probably all heard some variant of the old joke that the tax office has misspelled your name and it’s easier to change your name than to get the mistake fixed. There’s also the not-at-all-a-joke problem that you have the same name as a wanted criminal and the police keep stopping you, or indeed that you have the same name as a suspected terrorist and find yourself on a no-fly list or worse. Meanwhile, this spring a security researcher claimed that he’d registered ‘NULL’ as his custom licence plate and now gets hundreds of random misdirected parking tickets.

These kinds of stories capture three distinct issues: 

  • The system might have bad data (the name is misspelled)…

  • Or have a bug or bad assumption in how it processes data (it can’t handle ‘Null’ as a name, or ‘Scunthorpe’ triggers an obscenity filter)

  • And, the system is being used by people who don’t have the training, processes, institutional structure or individual empowerment to recognise such a mistake and react appropriately.

Of course, all bureaucratic processes are subject to this set of problems, going back a few thousand years before anyone made the first punch card. Databases gave us a new way to express it on a different scale, and so now does machine learning. But ML brings different kinds of ways to screw up, and these are inherent in how it works.

So: imagine you want a software system that can recognise photos of cats. The old way to do this would be to build logical steps - you’d make something that could detect edges, something that could detect pointed ears, an eye detector, a leg counter and so on… and you’d end up with several hundred steps all bolted together and it would never quite work. Really, this was like trying to make a mechanical horse - perfectly possible in theory, but in practice the complexity was too great. There’s a whole class of computer science problems like this - things that are easy for us to do but hard or impossible for us to explain how we do them. Machine learning changes these from logic problems to statistics problems. Instead of writing down how you recognise a photo of X, you take a hundred thousand examples of X and a hundred thousand examples of not-X and use a statistical engine to generate (‘train’) a model that can tell the difference to a given degree of certainty. Then you give it a photo and it tells you whether it matched X or not-X and by what degree. Instead of telling the computer the rules, the computer works out the rules based on the data and the answers (‘this is X, that is not-X’) that you give it.
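To make this concrete, here is a minimal sketch in Python, using scikit-learn and synthetic ‘feature vectors’ standing in for photos (a real recogniser would train a deep network on millions of actual images - the names and numbers here are purely illustrative). The point is that nobody writes down rules: you fit a model to labelled examples and it hands back a probability.

```python
# A minimal sketch of 'statistics, not rules' - illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend each 'photo' is a 10-number feature vector: 'X' (cat) examples
# cluster in one region, 'not-X' examples in another.
cats = rng.normal(loc=1.0, scale=1.0, size=(1000, 10))       # labelled X
not_cats = rng.normal(loc=-1.0, scale=1.0, size=(1000, 10))  # labelled not-X

photos = np.vstack([cats, not_cats])
labels = np.array([1] * 1000 + [0] * 1000)

# No rules about ears or whiskers - just fit a model that statistically
# separates the two sets of labelled examples.
model = LogisticRegression().fit(photos, labels)

# A new 'photo' comes back not as yes/no, but as a probability.
new_photo = rng.normal(loc=0.8, scale=1.0, size=(1, 10))
print("P(cat) =", model.predict_proba(new_photo)[0, 1])
```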

This works fantastically well for a whole class of problems, including face recognition, but it introduces two areas for error.

First, what exactly is in the training data - in your examples of X and Not-X? Are you sure? What ELSE is in those example sets?

My favourite example of what can go wrong here comes from a project for recognising cancer in photos of skin. The obvious problem is that you might not have an appropriate distribution of samples of skin in different tones. But another problem that can arise is that dermatologists tend to put rulers in photos of cancers, for scale - so if all the examples of ‘cancer’ have a ruler and all the examples of ‘not-cancer’ do not, the ruler might be a lot more statistically prominent than those small blemishes. You inadvertently built a ruler-recogniser instead of a cancer-recogniser.

The structural thing to understand here is that the system has no understanding of what it’s looking at - it has no concept of skin or cancer or colour or gender or people or even images. It doesn’t know what these things are any more than a washing machine knows what clothes are. It’s just doing a statistical comparison of data sets. So, again - what is your data set? How is it selected? What might be in it that you don’t notice - even if you’re looking? How might different human groups be represented in misleading ways? And what might be in your data that has nothing to do with people and no predictive value, yet affects the result? Are all your ‘healthy’ photos taken under incandescent light and all your ‘unhealthy’ pictures taken under LED light? You might not be able to tell, but the computer will be using that as a signal.
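As a toy illustration of the ruler problem (not the actual dermatology project - the data here is invented), you can watch a simple model latch onto a spurious feature that happens to travel with the label:

```python
# Invented data: a weak 'real' signal and a near-perfect confounder.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2000
labels = rng.integers(0, 2, size=n)              # 1 = 'cancer', 0 = 'not cancer'

lesion_signal = labels + rng.normal(0, 2.0, n)   # weak, noisy medical signal
ruler_present = labels + rng.normal(0, 0.1, n)   # rulers appear in almost every
                                                 # 'cancer' photo and almost no others

X = np.column_stack([lesion_signal, ruler_present])
model = LogisticRegression(max_iter=1000).fit(X, labels)

# The weight on 'ruler_present' dwarfs the weight on the real signal:
# we have trained a ruler-recogniser.
print(dict(zip(["lesion_signal", "ruler_present"], model.coef_[0].round(2))))
```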

Second, a subtler point - what does ‘match’ mean? The computers and databases that we’re all familiar with generally give ‘yes/no’ answers. Is this licence plate reported stolen?  Is this credit card valid? Does it have available balance? Is this flight booking confirmed? How many orders are there for this customer number? But machine learning doesn’t give yes/no answers. It gives ‘maybe’, ‘maybe not’ and ‘probably’ answers. It gives probabilities. So, if your user interface presents a ‘probably’ as a ‘yes’, this can create problems.
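A tiny sketch of that interface problem: the score is what the model actually says, but the threshold that turns ‘probably’ into ‘MATCH’ is a choice someone makes (the numbers below are made up).

```python
# The model gives a score; the interface decides what counts as a 'yes'.
def present_result(match_score: float, threshold: float) -> str:
    """match_score: the model's confidence in [0, 1]; threshold: our choice."""
    return "MATCH" if match_score >= threshold else "no match"

score = 0.62  # the model's honest answer: 'maybe, leaning yes'

print(present_result(score, threshold=0.9))  # 'no match' - a cautious setting
print(present_result(score, threshold=0.5))  # 'MATCH'    - same score, looser setting
```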

You can see both of these issues coming together in a couple of recent publicity stunts: train a face recognition system on mugshots of criminals (and only criminals), and then take a photo of an honest and decent person (normally a politician) and ask if there are any matches, taking care to use a fairly low confidence level, and the system says YES! - and this politician is ‘matched’ against a bank robber. 
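For illustration, here is roughly what that kind of stunt does under the hood, sketched with random vectors standing in for the embeddings a real face recognition system would produce (the gallery size, dimensions and threshold are all made up): ‘find the closest match’ always returns something, and a low enough confidence bar turns it into a ‘match’.

```python
# The nearest neighbour always exists, relevant gallery or not.
import numpy as np

rng = np.random.default_rng(2)
probe = rng.normal(size=128)             # stand-in embedding of the politician's photo
gallery = rng.normal(size=(500, 128))    # stand-in embeddings of 500 unrelated mugshots

# Cosine similarity between the probe and every gallery item.
sims = gallery @ probe / (np.linalg.norm(gallery, axis=1) * np.linalg.norm(probe))

best = int(np.argmax(sims))
print(f"closest match: item {best}, similarity {sims[best]:.2f}")

# Set the 'confidence' bar low enough and the closest item becomes a MATCH.
threshold = 0.1
print("MATCH!" if sims[best] >= threshold else "no match")
```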

To a computer scientist, this can look like sabotage - you deliberately use a skewed data set, deliberately set the accuracy too low for the use case and then (mis)represent a probabilistic result as YES WE HAVE A MATCH. You could have run the same exercise with photos of kittens instead of criminals, or indeed photos of cabbages - if you tell the computer ‘find the closest match for this photo of a face amongst these photos of cabbages’, it will say ‘well, this cabbage is the closest.’ You’ve set the system up to fail - like driving a car into a wall and then saying ‘Look! It crashed!’ as though you’ve proved something.

But of course, you have proved something - you’ve proved that cars can be crashed. And these kinds of exercises have value because people hear ‘artificial intelligence’ and think that it’s, well, intelligence - that it’s ‘AI’ and ‘maths’ and a computer and ‘maths can’t be biased’. The maths can’t be biased but the data can be. There’s a lot of value to demonstrating that actually, this technology can be screwed up, just as databases can be screwed up, and they will be. People will build face recognition systems in exactly this way and not understand why they won’t produce reliable results, and then sell those products to small police departments and say ‘it’s AI - it can never be wrong’.

These issues are fundamental to machine learning, and it’s important to repeat that they have nothing specifically to do with data about people. You could build a system that recognises imminent failure in gas turbines and not realise that your sample data has biased it against telemetry from Siemens sensors. Equally, machine learning is hugely powerful - it really can recognise things that computers could never recognise before, with a huge range of extremely valuable use cases. But, just as we had to understand that databases are very useful but can be ‘wrong’, we also have to understand how this works, both to try to avoid screwing up and to make sure that people understand that the computer could still be wrong. Machine learning is much better at doing certain things than people, just as a dog is much better at finding drugs than people, but we wouldn’t convict someone on a dog’s evidence. And dogs are much more intelligent than any machine learning.

When bad people use good data

So far, I’ve been talking about what happens when a face recognition system (or any machine learning system) gives inaccurate results, but an equal and opposite problem is that people can build a system that gives accurate results and then use those results for something we don’t like. The use of faces is an easy concern to focus on - your face can be seen from across the street without your even knowing, and you can’t change it.

Everyone’s example of what we don’t want is China’s use of every kind of surveillance technology, including face recognition, in its province of Xinjiang, as part of its systematic repression of the ethnic Uyghur Muslim population there. Indeed, a Chinese research paper explicitly aimed at spotting Uyghur faces recently attracted a lot of attention. But of course, these technologies are actually being deployed across the whole of China, if not always on the same scale, and they’re being used for all sorts of things, not all of which are quite so obviously worrying. 

You can get a useful window into this in the 600 page IPO prospectus from Megvii, released in August of this year. Megvii is one of the larger companies supplying what it calls ‘smart city IoT’ to various arms of the Chinese government; it says it has 106 Chinese cities as customers, up from 30 in 2016, with 1,500 R&D staff, and $100m of revenue from this business in the first half of 2019. China has turned panopticons into a business. 

Megvii doesn’t talk about spotting Uyghurs on the street. It does talk about ‘public safety’ and law enforcement. But it also mentions, for example:

  • Police being able to identify a lost and confused elderly person who’d forgotten their name and address

  • Automatically dispatching elevators in a large office building

  • Checking that tenants in subsidised housing do not illegally sublet their apartments

  • Building whitelists of people allowed to enter a kindergarten

  • Banks identifying customers at the cashiers’ desk.

Much like databases today, face recognition will be used for all sorts of things in many parts of societies, including many things that don’t today look like a face recognition use case. Some of these will be a problem, but not all. Which ones? How would we tell? 

Today, there are some obvious frameworks that people use in thinking about this:

  • Is it being done by the state or by a private company?

  • Is it active or passive - are you knowingly using it (to register at reception, say) or is it happening as soon as you walk through the door, or even walk past the lobby in the street?

  • If it’s passive, is it being disclosed? If it’s active, do you have a choice?

  • Is it being linked to a real-world identity or just used as an anonymous ID (for example, to generate statistics on flow through a transit system)?

  • And, is this being done to give me some utility, or purely for someone else’s benefit?

  • Then of course, you get to questions that are really about databases, not face recognition per se: where are you storing it, who has access, and can I demand to see it or demand you delete it?

Hence, I think most people are comfortable with a machine at Customs checking your face against the photo in the passport and the photo on file, and recording that. We might be comfortable with our bank using face recognition as well. This is explicit, it has a clear reason and it’s being done by an organisation that you recognise has a valid reason to do this. Equally, we accept that our mobile phone company knows where we are, and that our bank knows how much money we have, because that’s how they work. We probably wouldn’t accept it the other way around - my phone company doesn’t get to know my salary. Different entities have permission for different things. I trust the supermarket with my children’s lives but I wouldn’t trust it with a streaming music service.

At the other end of the spectrum, imagine a property developer that uses face recognition to tag and track everyone walking down a shopping street, which shops they go into, what products they look at, pick up and try on, and then links that to the points of sale and the credit card. I think most people would be pretty uncomfortable with this - it’s passive, it’s being done by a private company, you might not even know it was happening, and it’s not for your benefit. It’s a non-consensual intrusion into your privacy. They don’t have permission.

But on the other hand, would this tracking be OK if it was anonymous - if it was never explicitly linked to the credit card and to a human name, and only used to analyse footfall? What if it used clothes and gait to track people around a mall, instead of faces? What if a public transit authority uses anonymised faces to get metrics around typical journeys through the system? And why exactly is this different to what retailers already do with credit cards (which can be linked to your identity at purchase) and transit authorities do with tickets and smart cards (which often are)? Perhaps it’s not quite so clear what we approve of.

Retailers tracking their customers does make lots of people unhappy on principle, even without any faces being involved (and even if they’ve been doing it for decades), but how about a very obvious government, public safety use case - recognising wanted criminals?

We’re all (I think) comfortable with the idea of mugshots and ‘Wanted’ posters. We understand that the police put them up in their office, and maybe have some on the dashboard of their patrol car. In parallel, we have a pretty wide deployment today of licence plate recognition cameras for law enforcement (or just tolls). But what if a police patrol car has a bank of cameras that scan every face within a hundred yards against a national database of outstanding warrants? What if the Amber Alert system tells every autonomous car in the city to scan both passing cars and passing faces for the target? (Presume in all of these cases that we really are looking for actual criminals and not protesters, or Uyghurs.) I’m not sure how much consensus there would be about this. Do people trust the police?

You could argue, say, that it would not be OK for the police to scan ‘all’ of the faces all of the time (imagine if the NYPD scanned every face entering every subway station in New York), but that it would be OK to scan historic footage for one particular face. That sounds different, but why? What’s the repeatable chain of logic? This reminds me of the US courts’ decision that the police can’t attach a GPS tracker to a suspect’s car without a warrant - they can still follow the car around manually, the old-fashioned way. Is it that we don’t want it, or that we don’t want it to be too easy, or too automated? At the extreme, the US firearms agency is banned from storing gun records in a searchable database - everything has to be analogue, and searched by hand. There’s something about the automation itself that we don’t always like - when something that has always been theoretically possible on a small scale becomes practically possible on a massive scale.

Part of the experience of databases, though, was that some things create discomfort only because they’re new and unfamiliar, and face recognition is the same. Part of the ambivalence, for any given use case, is the novelty, and that may settle and resettle:

  • This might be a genuinely new and bad thing that we don’t like at all

  • Or, it may be new and we decide we don’t care

  • We may decide that it’s just a new expression of an old thing we don’t worry about

  • It may be that this was indeed being done before, but somehow doing it with your face makes it different, or just makes us more aware that it’s being done at all.

All of this discussion, really, is about perception, culture and politics, not technology, and while we can mostly agree on the extremes there’s a very large grey area in the middle where reasonable people will disagree. This will probably also be different in different places - a good illustration of this is in attitudes to compulsory national identity cards. The UK doesn’t have one and has always refused to have one, with most people seeing the very idea as a fundamental breach of civil liberties. France, the land of ‘liberté’, has them and doesn’t worry about it (but the French census does not collect ethnicity because the Nazis used this to round up Jews during the occupation). The US doesn’t have them in theory, but is arguably introducing them by the back door. And Germany has them, despite very strong objections to other intrusions by the state for obvious historical reasons.

There’s not necessarily any right answer here and no way to get to one through any analytic process - this is a social, cultural and political question, with all sorts of unpredictable outcomes. The US bans a gun database, and yet, the US also has a company called ‘PatronScan’ that scans your driving licence against a private blacklist of 38,000 people (another database) shared across over 600 bars and nightclubs. Meanwhile, many state DMVs sell personal information to private companies. Imagine sitting down in 1980 and building a decision matrix to predict that.

Ethics and regulation

The tech industry’s most visible initial response to these issues has been to create ethics boards of various kinds at individual companies, and to create codes of conduct across the industry for individual engineers, researchers and companies to sign up to. The idea behind both of these is:

  • To promise not to create things with ‘bad data’ (in the broadest sense)

  • To promise not to build ‘bad things’, and in the case of ethics boards to have a process for deciding what counts as a bad thing.

This is necessary, but I think insufficient.

First, promising that you won’t build a product that produces inaccurate results seems to me rather like writing a promise to yourself not to screw up. No-one plans to screw up. You can make lists of specific kinds of screw-ups that you will try to avoid, and you’ll make progress in some of these, but that won’t stop it happening. You also won’t stop other people doing it.

Going back to databases, a bug in Hertz’s systems recently led to cars being falsely reported as stolen, and their customers being arrested. This wasn’t a machine learning screw-up - it was a screw-up of 40-year-old technology. We’ve been talking about how you can make mistakes in databases for longer than most database engineers have been alive, and yet it still happens. The important thing here was that the police officer who pulled one of those drivers over understood the concept of a database being wrong and had the common sense - and empowerment - to check.

That comes back to the face recognition stunts I mentioned earlier - you can promise not to make mistakes, but it’s probably more valuable to publicise the idea that mistakes will happen - that you can’t just presume the computer must be right. We should publicise this to the engineers at a third-tier outsourcer that’s bodging together a ‘spot shoplifter faces’ system, but we should also publicise it to that police officer, and to the lawyer and judge. After all, those mistakes will carry on happening, in every computer system, for as long as humans are allowed to touch them.

Second, it’s all well and good for people at any given company to decide that a particular use of face recognition (or any kind of machine learning project) is evil, and that they won’t build it, but ‘evil’ is often a matter of opinion, and as I’ve discussed above there are many cases where reasonable people will disagree about whether we want something to exist or not. Megvii has an ethics board too, and that ethics board has signed off on ‘smart city IoT’.

Moreover, as Megvii and many other examples show, this technology is increasingly a commodity. The cutting edge work is still limited to a relatively small number of companies and institutions, but ‘face recognition’ is now freely available to any software company to build with. You yourself can decide that you don’t want to build X or Y, but that really has no bearing on whether it will get built. So, is the objective to prevent your company from creating this, or to prevent this thing from being created and used at all?

This of course takes us to the other strand of reaction, which is the push for binding regulation of face recognition, at every level of government from individual cities to the EU. These entities do of course have the power of compulsion - they still can’t stop you from screwing up, but they can mandate auditing processes to catch mistakes and remedies or penalties if they happen, and (for example) require the right to see or delete ‘your’ data, and they can also ban or control particular use cases.

The challenge here, I think, is to work out the right level of abstraction. When Bernie Madoff’s Ponzi scheme imploded, we didn’t say that Excel needed tighter regulation or that his landlord should have spotted what he was doing - the right layer to intervene was within financial services. Equally, we regulate financial services, but mortgages, credit cards, the stock market and retail banks’ capital requirements are all handled very separately. A law that tries to regulate using your face to unlock your phone, or turn yourself into a kitten, and also a system for spotting loyalty-card holders in a supermarket, and to determine where the police can use cameras and how they can store data, is unlikely to be very effective.

Finally, there is a tendency to frame this conversation in terms of the Chinese government and the US constitution. But no-one outside the USA knows or cares what the US constitution says, and Megvii already provides its ‘smart city IoT’ products to customers in 15 countries outside China. The really interesting question here, which I think goes far beyond face recognition to many other parts of the internet, is the degree to which on one hand what one might call the ‘EU Model’ of privacy and regulation of tech spreads, and indeed (as with GDPR) is imposed on US companies, and on the other the degree to which the Chinese model spreads to places that find it more appealing than either the EU or the US models.