How the law got it wrong with Apple Card | TechCrunch

Advocates for algorithmic justice have begun to see their cause taken up in court, through legal investigations of enterprises like UHG and of products like the Apple Card. The Apple Card case is a strong example of how anti-discrimination law has failed to keep pace with the rapid progress of scientific research into quantifiable fairness.
While it is true that Apple and its underwriters were found innocent of fair lending violations, the ruling came with clear caveats that should be a red flag for any enterprise using machine learning in a regulated domain. Executives who fail to take algorithmic fairness seriously will face legal and reputational risk.

What happened with the Apple Card?

In late 2019, startup leader and social media celebrity David Heinemeier Hansson raised an important issue on Twitter, to much fanfare. With nearly 50,000 likes and replies, his tweet asked Apple and its underwriting partner, Goldman Sachs, to explain why he and his spouse, who share the same financial ability, would be granted different credit limits. For many advocates of algorithmic fairness it was a moment of truth, and it culminated in an inquiry by the New York Department of Financial Services (DFS).

At first glance, credit underwriters may find it encouraging that the DFS concluded in March that Goldman's underwriting algorithm did not violate the strict 1974 rules (the Equal Credit Opportunity Act) that protect women and minorities from discrimination in lending. While disappointing to activists, this result was not unexpected to those of us who work closely with data teams in finance.

Credit underwriting is one of the algorithmic applications in financial services where the risks of experimentation far outweigh the benefits, and the laws on fair lending are clear and rigorously enforced. It was therefore predictable that Goldman would be found innocent.

And yet I have no doubt that the Goldman/Apple algorithm discriminates, just like every other credit scoring and underwriting algorithm. If researchers had access to the data and models, it would be easy to verify this claim. We know this because the NY DFS released a partial description of its methodology for vetting the Goldman algorithm and, as you might expect, the audit fell far short of the standards that modern algorithm auditors apply today.

How did the DFS assess the Apple Card's fairness under current law?

To verify that the Apple algorithm was fair, the DFS first examined whether Goldman had used prohibited characteristics of potential applicants, such as gender or marital status. This test was easy for Goldman to pass, because the model simply does not include race, gender or marital status as inputs. But we have known for years that some model features can act as proxies for protected classes.


Given 50 years of legal precedent, it is no surprise that the DFS methodology makes no mention of whether it considered this question, but we can guess that it did not. If it had, it would quickly have found that credit score is so closely correlated with race that some states are considering banning its use for casualty insurance. Proxy features have only recently come under the research spotlight, and this is our first example of science outpacing regulation.
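The proxy problem can be checked empirically: a feature acts as a proxy when it statistically predicts the protected attribute the model supposedly omits. The sketch below uses entirely synthetic data with an invented bias term (no relation to Goldman's actual model or any real credit data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Synthetic protected attribute (True = member of a protected class).
protected = rng.random(n) < 0.3

# A nominally "neutral" feature that is nonetheless shifted for the
# protected group -- e.g. a credit score depressed by historical bias.
# The 60-point shift here is invented purely for illustration.
credit_score = rng.normal(700, 50, n) - 60 * protected

# Correlation between the "neutral" feature and the protected class.
corr = np.corrcoef(credit_score, protected)[0, 1]
print(f"feature/protected-class correlation: {corr:.2f}")

# A strong correlation means the model can effectively "see" the
# protected class through this feature, even though the class itself
# was never given to the model as an input.
```

Dropping the protected column from the training data does nothing to break a correlation like this, which is why an audit that only checks the input list is insufficient.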

Lacking the protected features themselves, the DFS instead searched for similar credit profiles belonging to members of different protected classes, trying to determine what would happen to a credit decision if the applicant's gender were changed. Would the decision survive if the applicant were a woman instead of a man?

This is one intuitive way of defining fairness. In the field of machine learning fairness it is known as a "flip test," one of many measures that make up the concept of "individual fairness," which is exactly what it sounds like. Patrick Hall, principal scientist at a boutique AI law firm, told me that this analysis, which he called basic regression, is the most common one used in fair lending investigations, and he described the DFS methods used to audit the Apple Card as a 1970s-style version of the flip test. This is our second example of the law's insufficiency.

A new vocabulary to describe algorithmic fairness

Since Solon Barocas' seminal paper "Big Data's Disparate Impact" (2016), researchers have been hard at work translating core philosophical concepts into mathematical terms. Numerous conferences have been established, and new fairness tracks have emerged at the most important AI events. The field is in a period of hypergrowth, and the law has not kept up. But like the reprieve once granted to cybersecurity, this legal reprieve will not last forever.

The DFS may be forgiven for its softball audit: the fair lending laws were born of the civil rights movement and have not changed much in the 50 years since their inception, and the legal precedents were set before machine learning fairness research took off. Had the DFS been equipped with the extensive vocabulary and algorithmic assessment tools developed over the past five years, it would have been far better prepared to evaluate the Apple Card's fairness.

The DFS report, for example, makes no mention of equalized odds, a line of inquiry popularized in 2018 by Joy Buolamwini and Timnit Gebru, whose Gender Shades paper showed that facial recognition algorithms fail on dark-skinned female faces more often than they do on subjects with lighter skin. This reasoning holds for many applications far beyond computer vision.

An equalized odds test would ask of Apple's algorithm: how accurately does it predict creditworthiness, what percentage of the time does it make a mistake, and do those error rates differ between people of different races, genders or disability statuses? According to Hall, these measurements are important but simply too new to have been fully codified into the legal system.
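A basic equalized odds check compares false positive and false negative rates across demographic groups. The sketch below uses entirely synthetic data, with error rates invented to illustrate the pattern the test is designed to catch:

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, group):
    """Equalized odds audit: false positive and false negative
    rates computed separately for each demographic group."""
    rates = {}
    for g in np.unique(group):
        mask = group == g
        fpr = np.mean(y_pred[mask & (y_true == 0)])      # approved the unqualified
        fnr = np.mean(1 - y_pred[mask & (y_true == 1)])  # denied the qualified
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

rng = np.random.default_rng(2)
n = 20_000
group = rng.integers(0, 2, n)   # synthetic: 0 = men, 1 = women
y_true = rng.random(n) < 0.6    # actually creditworthy

# Synthetic predictions: this invented model misses qualified women
# three times as often as qualified men.
miss = np.where(group == 1, 0.30, 0.10)
y_pred = y_true & (rng.random(n) > miss)
y_pred = y_pred | ((~y_true) & (rng.random(n) < 0.05))  # small FPR for everyone

rates = error_rates_by_group(y_true, y_pred.astype(int), group)
for g, r in rates.items():
    print(f"group {g}: FPR={r['FPR']:.2f}  FNR={r['FNR']:.2f}")
```

Equalized odds is satisfied when both rows match; here the gap in false negative rates is exactly the kind of disparity that an input-only audit would never surface.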

If such a test found that Goldman regularly underestimates the qualifications of female applicants, it is easy to see how, at a national scale, that would compound the harm to these underserved communities.

The financial services Catch-22

Modern auditors know well that the legal precedents do not capture the nuances of fairness for intersectional combinations within minority categories, a problem compounded by the complexity of machine learning models. If you are Black, a woman and pregnant, for example, your chances of getting credit may be lower than the average outcome for each of those overarching protected categories taken separately.

Because the sample size for minorities is, by definition, smaller than the rest of the set, these underrepresented groups may not benefit even from a comprehensive audit of the system unless it pays special attention to their uniqueness. That is why modern auditors favor "fairness through awareness" approaches, which let us measure results with explicit knowledge of the demographics of each group.
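The intersectional problem can be made concrete by auditing every subgroup rather than each axis alone, which requires having the demographic labels in the first place. The data below are synthetic, with an invented approval gap for one intersectional subgroup:

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Synthetic demographics -- fairness through awareness means the auditor
# actually holds these labels at audit time.
race = rng.choice(["white", "black"], n, p=[0.85, 0.15])
gender = rng.choice(["m", "f"], n, p=[0.5, 0.5])

# Invented outcome: only the Black/female intersection is disadvantaged.
approved = rng.random(n) < np.where((race == "black") & (gender == "f"), 0.35, 0.60)

# Audit every intersectional subgroup, not just each axis on its own.
for r, g in product(["white", "black"], ["m", "f"]):
    mask = (race == r) & (gender == g)
    print(f"{r:5s}/{g}: n={mask.sum():5d}  approval={approved[mask].mean():.0%}")
```

Note that the disadvantaged subgroup is also the smallest sample, so an aggregate audit of "race" or "gender" alone would average the disparity away; only the explicit subgroup breakdown exposes it.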

But here is the catch-22: auditors in financial services and other highly regulated fields often cannot use fairness through awareness, because they may be prohibited from collecting sensitive information in the first place. The goal of this legal restriction was to prevent lenders from discriminating; the effect is to cover up algorithmic discrimination, and it is our third example of legal insufficiency.

The inability to collect this information makes it very difficult to understand how models treat underserved populations. We may never be able to prove, for example, that full-time mothers have thinner credit files because they do not make every credit-based purchase in both spouses' names. Minority groups may be more likely to work gig jobs, earn tips or participate in cash-based industries, all of which produce commonalities in their income profiles that are less common in the majority.

Crucially, none of these differences necessarily says anything about financial responsibility or creditworthiness. If the goal of a credit score is to predict creditworthiness accurately, it should be able to do so even for people whose earning and spending patterns differ from the majority's.

What this means for businesses using AI

Apple's story has a hopeful epilogue: despite being cleared under our outdated laws, Apple updated its credit policy to counter discrimination, and CEO Tim Cook was bold in highlighting the inequity in how the industry calculates credit scores.

The new policy allows spouses or parents to combine credit files, so that the weaker file can benefit from the stronger one. It is a great example of a company thinking ahead to reduce discrimination in the real world, and by updating its policies before any regulation could result from the inquiry, Apple got out in front of the problem.

This was a strategic advantage for Apple, because the NY DFS itself detailed the insufficiency of current law and the need to update it. In the words of Superintendent of Financial Services Linda A. Lacewell, credit scoring as it stands today, and the laws and regulations that prohibit discrimination in lending, need to be strengthened and modernized. That is something I have seen regulators keenly pursue.

American regulators are clearly working hard to improve the laws that govern AI, taking advantage of the robust vocabulary now available for equality in automation and mathematics. The OCC, CFPB and FTC are all keen to tackle algorithmic discrimination, even if they move slowly.

In the interim, we have every reason to believe that algorithmic discrimination is rampant, partly because the industry has also been slow to adopt the academic language of the past few years. There is little excuse for companies failing to take advantage of this new field of fairness to eradicate the predictive discrimination that is almost certainly present in their systems. The EU appears to agree, with draft laws specifically addressing AI that are expected to be adopted within the next two years.

The field of machine learning fairness has matured rapidly, with new techniques discovered every year, and it is only now reaching a stage where it can be automated to some extent. Even as American law is slow to change, standards bodies have stepped in with guidance that can lower both the severity and the frequency of these issues.

Because whether discrimination happens by algorithm or by human, it is still illegal, and anyone using advanced analytics in applications related to healthcare, housing, finance, education or government is likely breaking these laws without knowing it.

Until clearer guidance arrives for the many applications of AI in sensitive circumstances, the industry will be left to decide for itself which definitions of fairness are best.