How Copyright Biases AI
Using copyrighted works as training data for AI is not only a fair use, but one that can quite literally promote fairness.
The quandary of biased data producing biased results is not new; it is as old as the first computer. AI systems trained on vast amounts of data are used by our banks and our bosses, our computers and our criminal justice system, which is why it is crucial to understand why AI seems to reflect, amplify, and perpetuate human bias rather than eliminate it. There is a robust body of scholarship, and even entire conferences, dedicated to reducing bias and enhancing fairness in AI. Scholars have long examined the complex legal and ethical questions posed by collecting, storing, and processing the quantities of "Big Data" required to train AI. Absent from the conversation, however, are analyses from copyright scholars about how our legal framework inadvertently biases who can use which data.
This Article, still in progress, is the first to address how copyright law channels AI in a fundamentally biased direction by advantaging established companies and privileging biased data, and it suggests that using copyrighted works as training data to mitigate bias is a fair use.