
Training data & copyright issues

By Mark Ziler · Last updated 2026-04-05

AI models are trained on massive datasets that include copyrighted material — books, articles, images, code. The legal landscape is still being defined through active lawsuits. For your business, the practical risk is in AI-generated content: if an AI produces marketing copy or designs that closely mirror copyrighted work, your business could face liability. The mitigation: use AI to draft and ideate, but have a human review for originality before publishing.

Go deeper

Your marketing team used an AI image generator to create visuals for a new campaign. One of the generated images looks remarkably similar to a competitor's copyrighted brand photography — similar enough that someone on social media noticed and tagged both companies. Whether or not it constitutes infringement, you're now spending time and legal budget on a problem that didn't need to exist. The AI didn't copy deliberately. It learned patterns from its training data, and sometimes those patterns produce outputs that land too close to specific sources.

The trap most companies fall into is assuming that "AI generated it, not us" provides legal cover. The emerging legal consensus is trending in the opposite direction: if you publish it, you own the liability, regardless of how it was produced. Treat AI-generated content the way you'd treat work from a freelancer — review it before it goes out, and don't publish anything you can't confidently say is original.
