The presence of bias in data has prompted a large body of research on how bias affects machine learning (ML) models and data-driven decision-making systems.2 This research has focused on the fairness of decisions made by models trained on biased data, and on designing methods that increase the transparency of automated decision-making processes so that possible bias issues can be spotted and "fixed" by removing the bias.
Recent approaches in the literature first aim to understand the cause of the bias (for example, a subset of the population being underrepresented in the training dataset) and then propose and evaluate an ad hoc intervention to reduce or remove the bias from the system. One such intervention, sketched in the code below, selects which additional training data items to label in order to rebalance the dataset and increase equality (that is, a balanced representation of classes) rather than equity (that is, an overrepresentation of the disadvantaged subset of the population).
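To make the equality-versus-equity distinction concrete, here is a minimal Python sketch of such a labeling-selection step. All names and data are illustrative, not from any specific system discussed in the article: it assumes we know each item's group membership and have a pool of unlabeled candidates per group.

```python
import random
from collections import Counter

def select_for_labeling(train_groups, pool, budget, strategy="equality"):
    """Choose which unlabeled items to send for labeling next.

    train_groups -- group id of each item already in the training set
    pool         -- dict: group id -> list of unlabeled candidate items
    budget       -- number of additional labels we can afford
    strategy     -- "equality": move every group toward equal counts;
                    "equity": spend the whole budget on the group that
                    starts out most underrepresented, so it ends up
                    overrepresented relative to the others
    """
    counts = Counter(train_groups)
    for g in pool:                      # groups with no labeled items yet
        counts.setdefault(g, 0)
    disadvantaged = min(counts, key=counts.get)

    selected = []
    for _ in range(budget):
        if strategy == "equity" and pool.get(disadvantaged):
            g = disadvantaged           # keep favoring the rarest group
        else:
            available = [g for g in pool if pool[g]]
            if not available:
                break
            # greedily pick from whichever group is currently smallest
            g = min(available, key=lambda grp: counts[grp])
        item = pool[g].pop(random.randrange(len(pool[g])))
        selected.append((item, g))
        counts[g] += 1
    return selected

# Example: group "b" is underrepresented (10 vs. 40 labeled items).
train = ["a"] * 40 + ["b"] * 10
candidates = {"a": list(range(100)), "b": list(range(100))}
picked = select_for_labeling(train, candidates, budget=30, strategy="equality")
print(Counter(g for _, g in picked))   # Counter({'b': 30}): balance restored
```

With strategy="equality" the greedy rule stops favoring a group once its count catches up, so the dataset converges toward a balanced representation; with strategy="equity" the entire budget goes to the initially disadvantaged group, deliberately pushing it past parity.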