
Believing in the generative potential of the technology, a growing number of researchers and companies aim to tackle bias in AI by creating artificial images of people of color. Proponents argue that AI generators can fill diversity gaps in existing image databases by supplementing them with synthetic images. Some researchers are using machine learning architectures to map existing photos of people onto new races in order to “balance the racial distribution of the dataset.” Other companies, such as Generated Media and Qoves Lab, are using similar techniques to create entirely new likenesses for their image libraries, “building … faces of every race and ethnicity,” as Qoves Lab puts it, to ensure a “truly fair face dataset.” As they see it, these tools will address data bias by cheaply and efficiently generating diverse images on command.
The problems these technologists hope to solve are critical. Artificial intelligence is rife with flaws: unlocking phones for the wrong people because it can’t tell Asian faces apart, wrongly accusing people of crimes they didn’t commit, and mistaking dark-skinned people for gorillas. These high-profile failures are not anomalies but an unavoidable consequence of the data the AI is trained on, which is disproportionately skewed toward white and male faces, leaving the resulting tools imprecise and ill-suited for anyone who doesn’t fit this narrow archetype. In theory, the solution is simple: we just need to train these systems on more diverse data. In practice, however, this has proven to be an incredibly labor-intensive task because of the scale of inputs such systems require and the extent of the current gaps in the data (IBM research found, for example, that six out of eight prominent face datasets were composed of more than 80 percent light-skinned faces). Creating diverse datasets without manual sourcing is therefore an enticing possibility.
Yet when we take a closer look at the ways in which the proposal could affect our tools and our relationship to them, the long shadow of this seemingly convenient solution begins to take dire shape.
Computer vision has been in development in some form since the mid-20th century. Initially, researchers tried to build tools top-down, manually defining rules (“human faces have two symmetrical eyes”) to identify a desired category of images. These rules were translated into calculations and programmed into the computer so it could search for pixel patterns matching those of the object being described. This approach proved largely unsuccessful, however, given the sheer variety of subjects, angles, and lighting conditions that can make up a single photo, and the difficulty of translating even simple rules into coherent formulas.
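To see why this was so brittle, consider a deliberately crude sketch of such a hand-coded rule in Python; the function name, thresholds, and the “two dark regions” heuristic here are invented purely for illustration and are not drawn from any historical system.

```python
import numpy as np

def crude_eye_rule(gray: np.ndarray, dark_threshold: float = 0.35) -> bool:
    """Toy version of a 'human faces have two symmetrical eyes' rule.

    Expects a grayscale face crop normalized to [0, 1]. Splits the upper
    half of the image into left and right halves and checks that each
    contains a comparably dark region (a candidate eye). Purely
    illustrative of the hand-coded rules of early top-down computer vision.
    """
    upper = gray[: gray.shape[0] // 2]               # eyes sit in the upper half
    left, right = np.array_split(upper, 2, axis=1)   # split into left/right sides
    left_dark, right_dark = left.min(), right.min()  # darkest pixel on each side
    both_dark = left_dark < dark_threshold and right_dark < dark_threshold
    symmetric = abs(left_dark - right_dark) < 0.1    # roughly equal darkness
    return bool(both_dark and symmetric)
```

Even this trivial rule breaks down the moment a face is lit from the side or turned away from the camera, which is exactly the brittleness that doomed the approach.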
Over time, the increase in publicly available images has enabled a more bottom-up process through machine learning. Using this approach, large aggregates of labeled data are fed into the system. Through “supervised learning,” the algorithm takes this data and teaches itself to distinguish the desired categories specified by the researchers. This technique is much more flexible than top-down approaches because it does not rely on rules that may vary for different conditions. By training itself on a variety of inputs, the machine can identify relevant similarities between images of a given category without being explicitly told what those similarities are, creating a more adaptive model.
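In code, the supervised-learning recipe is conceptually simple. The sketch below, a minimal and hypothetical PyTorch setup with random tensors standing in for real labeled photos, shows the basic loop: feed in labeled examples, measure how far the model’s guesses fall from the labels, and nudge its parameters accordingly.

```python
import torch
from torch import nn

# Tiny classifier over 64x64 RGB images; the categories ("face" vs. "not a face")
# and the data here are placeholders for a real labeled dataset.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Linear(128, 2),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(32, 3, 64, 64)      # a batch of "photos"
labels = torch.randint(0, 2, (32,))     # researcher-supplied labels

for _ in range(10):                     # a few training steps
    logits = model(images)
    loss = loss_fn(logits, labels)      # how far predictions are from the labels
    optimizer.zero_grad()
    loss.backward()                     # learn from the mismatch
    optimizer.step()
```

Everything the model “knows” about the categories comes from those labeled examples, which is precisely why the composition of the dataset matters so much.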
However, the bottom-up approach is not perfect. In particular, these systems are largely limited by the data they are given. As the tech writer Rob Horning puts it, the technique “assumes a closed system.” Such models struggle to extrapolate beyond their given parameters, which leads to limited performance on subjects they have not been well trained on; data discrepancies, for example, gave Microsoft’s FaceDetect a 20% error rate on dark-skinned women, while its error rate on white men hovered around 0%. The knock-on effect of these training biases on performance is why technology ethicists have begun preaching the importance of dataset diversity, and why companies and researchers are racing to fix the problem. As a popular saying in AI goes, “garbage in, garbage out.”
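Audits like the one behind those FaceDetect figures report error rates disaggregated by demographic group rather than a single overall number. A small illustrative Python sketch, with invented predictions and group labels, shows why that matters: a model can look accurate on average while failing one group badly.

```python
from collections import defaultdict

def error_rates_by_group(predictions, labels, groups):
    """Compute the error rate separately for each demographic group.

    `groups` is a hypothetical per-example annotation (e.g. a skin-tone or
    gender category) used only for auditing the model's performance.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for pred, label, group in zip(predictions, labels, groups):
        totals[group] += 1
        if pred != label:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Invented example: accurate overall, yet one group bears most of the errors.
preds  = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 1, 1, 0, 0, 1]
groups = ["lighter-skinned men"] * 4 + ["darker-skinned women"] * 4
print(error_rates_by_group(preds, labels, groups))
# {'lighter-skinned men': 0.0, 'darker-skinned women': 0.75}
```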
This adage applies equally to image generators, which also require large datasets to learn the art of photorealistic representation. Most face generators today employ generative adversarial networks (GANs) as their underlying architecture. At their core, GANs work by pitting two networks, a generator and a discriminator, against each other. The generator produces images from noisy input, while the discriminator tries to sort the generated fakes from real images supplied by the training set. Over time, this adversarial back-and-forth pushes the generator to improve until it creates images the discriminator cannot identify as fake. The initial input acts as the anchor for this process. Historically, tens of thousands of such images have been required to produce sufficiently realistic results, underscoring the importance of a diverse training set in the proper development of these tools.
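A stripped-down version of that adversarial loop might look like the following PyTorch sketch, with random placeholder tensors standing in for a real face dataset; it is a toy illustration of the idea, not the architecture of any particular face generator.

```python
import torch
from torch import nn

# Generator maps random noise to a flattened 28x28 "image";
# discriminator scores how real an image looks.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_images = torch.rand(64, 28 * 28) * 2 - 1    # placeholder for the training set

for step in range(100):
    # Discriminator: tell real training images apart from generated fakes.
    noise = torch.randn(64, 64)
    fake_images = G(noise).detach()              # don't update G on this pass
    d_loss = bce(D(real_images), torch.ones(64, 1)) + \
             bce(D(fake_images), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator into scoring fakes as real.
    noise = torch.randn(64, 64)
    g_loss = bce(D(G(noise)), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

The discriminator’s only reference point for what counts as “real” is the training set itself, so whatever that set over- or under-represents is baked directly into what the generator learns to produce.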