
Training data & copyright issues

By Mark Ziler · Last updated 2026-04-05

AI models are trained on massive datasets that include copyrighted material — books, articles, images, code. The legal landscape is still being defined through active lawsuits. For your business, the practical risk is in AI-generated content: if an AI produces marketing copy or designs that closely mirror copyrighted work, your business could face liability. The mitigation: use AI to draft and ideate, but have a human review for originality before publishing.

Go deeper

Your marketing team used an AI image generator to create visuals for a new campaign. One of the generated images looks remarkably similar to a competitor's copyrighted brand photography — similar enough that someone on social media noticed and tagged both companies. Whether or not it constitutes infringement, you're now spending time and legal budget on a problem that didn't need to exist. The AI didn't copy deliberately. It learned patterns from its training data, and sometimes those patterns produce outputs that land too close to specific sources.

The trap most companies fall into is assuming that "AI generated it, not us" provides legal cover. The emerging legal consensus is trending in the opposite direction: if you publish it, you own the liability, regardless of how it was produced. Treat AI-generated content the way you'd treat work from a freelancer — review it before it goes out, and don't publish anything you can't confidently say is original.
