IBM Trusted AI toolkits for Python combat AI bias

On Aug 27, 2019

Researchers at IBM are developing ways to reduce bias in machine learning models and to identify bias in the data sets used to train AI, with the goal of avoiding discrimination in the behavior and decisions of AI-driven applications.

As a result of this research, called the Trusted AI project, the company has released three open source, Python-based toolkits. The latest toolkit, AI Explainability 360, which was released earlier this month, includes algorithms that can be used to explain the decisions made by a machine learning model.

The three IBM Trusted AI toolkits for Python are all accessible from GitHub. Some information about the three packages:

AI Explainability 360, aka AIX360, provides algorithms that cover the different dimensions of explainability of machine learning models and proxy explainability metrics. The extensible toolkit can help users understand how these models predict labels by various means throughout the AI application lifecycle. Algorithmic research is translated from the lab into actual practice for domains including finance, human capital management, education, and healthcare. The AIX360 toolkit was introduced on August 8, 2019 and can be downloaded from this link. You can access API docs at this link.
AI Fairness 360, or AIF360, provides metrics to check for unwanted bias in data sets and machine learning models and contains algorithms to mitigate bias. With this toolkit, IBM aims to prevent the development of machine learning models that could give certain privileged groups a systematic advantage. Bias in training data, due to either prejudice in labels or under-sampling or over-sampling, leads to models with biased decision-making. Introduced in September 2018, AIF360 includes nine algorithms that can be called in a standard way. AIF360 contains tutorials on credit scoring, predicting medical expenditures, and classifying face images by gender. AIF360 code can be found at this link. Documentation is accessible at this link.
Adversarial Robustness Toolbox is a Python library that helps developers and researchers defend deep neural networks (DNNs) against adversarial attacks, making AI systems more secure and trustworthy. DNNs are vulnerable to adversarial examples, which are inputs (say, images) deliberately modified to produce a desired response by the DNN. The toolbox can be used to build defense techniques and deploy practical defenses. The approach for defending DNNs involves measuring model robustness and hardening models, with approaches such as preprocessing DNN inputs, augmenting training data with adversarial samples, and leveraging runtime detection methods to flag any inputs that might have been tampered with by an adversary. Released by IBM Research Ireland in April 2018, the Adversarial Robustness Toolbox can be found at this link.
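
To make the idea of an adversarial example from the Adversarial Robustness Toolbox description concrete, here is a minimal sketch in plain Python. It does not use the toolbox or its API. It assumes a toy logistic-regression classifier in place of a DNN and applies the fast gradient sign method, a well-known attack chosen here only for illustration; the weights, input, and epsilon are made-up values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by +/- epsilon in the direction that increases the loss."""
    p = predict_proba(weights, bias, x)
    # Gradient of the cross-entropy loss with respect to the input is (p - label) * w.
    grad = [(p - label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


# Hypothetical model and input, invented for this example.
weights = [2.0, -3.0, 1.0]
bias = 0.0
x = [1.0, 0.2, 0.5]
epsilon = 0.4

x_adv = fgsm_perturb(weights, bias, x, label=1, epsilon=epsilon)

original_class = int(predict_proba(weights, bias, x) >= 0.5)
adversarial_class = int(predict_proba(weights, bias, x_adv) >= 0.5)

print("original input:", x, "-> class", original_class)
print("adversarial input:", x_adv, "-> class", adversarial_class)
```

With these values, no feature moves by more than epsilon, yet the predicted class changes from 1 to 0. Defenses such as adversarial training or runtime detection aim to catch or resist this kind of targeted perturbation.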
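
As an illustration of the kind of bias check described for AI Fairness 360, the sketch below computes two common group-fairness measures by hand: the difference and the ratio of favorable-outcome rates between an unprivileged and a privileged group. It is plain Python and not the toolkit's API; the function names, group labels, and data are hypothetical.

```python
def favorable_rate(outcomes, groups, group):
    """Fraction of favorable outcomes (1) among members of one group."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    return sum(selected) / len(selected)


def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Unprivileged rate minus privileged rate; 0 means equal rates."""
    return (favorable_rate(outcomes, groups, unprivileged)
            - favorable_rate(outcomes, groups, privileged))


def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Unprivileged rate divided by privileged rate; 1 means equal rates."""
    return (favorable_rate(outcomes, groups, unprivileged)
            / favorable_rate(outcomes, groups, privileged))


# Made-up decisions (1 = favorable, e.g. credit approved) for two groups.
groups = ["privileged"] * 5 + ["unprivileged"] * 5
outcomes = [1, 1, 1, 1, 0,   # privileged: 4 of 5 favorable
            1, 0, 1, 0, 0]   # unprivileged: 2 of 5 favorable

spd = statistical_parity_difference(outcomes, groups, "unprivileged", "privileged")
di = disparate_impact(outcomes, groups, "unprivileged", "privileged")

print("statistical parity difference:", round(spd, 3))
print("disparate impact:", round(di, 3))
```

A negative difference, or a ratio well below 1, suggests the privileged group receives favorable outcomes more often, which is the systematic advantage that bias mitigation algorithms try to reduce.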

Moving forward, IBM is also considering releasing a tool for AI model accountability. The intent is to maintain provenance and a data trail over a model's lifecycle, so that the model can be trusted.