Generative AI Lawsuits: Copyright Implications

October 29, 2022

ai dev

Generative AI is in the courtroom. Artists are suing over training data. Developers are questioning the code GitHub Copilot generates. The legal landscape is uncertain.

The Core Issue

AI models learn from data. That data includes:

Copyrighted images
Licensed code
Published text
Commercial artwork

The question: Is training on this data legal? Is the output infringing?

The Arguments

Against AI Training

Artist creates work → Work posted online
                           ↓
                    Scraped for training
                           ↓
                    AI can replicate style
                           ↓
            Artist loses commissions to AI

Arguments:

Training is unauthorized copying
Outputs can replicate specific styles
Economic harm to creators
No consent, no compensation

For AI Training

Arguments:

Fair use (transformative use)
Models don’t store copies
Similar to how humans learn
Promotes innovation
Information wants to be free

The Lawsuits

Getty Images vs. Stability AI

Claim: Stability AI scraped millions of Getty images without a license.

Evidence: Outputs sometimes include garbled Getty watermarks.

Implications: Commercial image training may require licensing.

Artists vs. Stability/Midjourney

Class action by artists claiming:

Unauthorized reproduction (training)
Creating derivative works (outputs)
Trademark violations
Right of publicity infringement

GitHub Copilot Lawsuit

Programmers sued GitHub/Microsoft/OpenAI claiming:

Copilot reproduces licensed code
Violates open source licenses (GPL, MIT, etc.)
Strips the attribution and license notices those licenses require

The Code Problem

License Violations?

Copilot was trained on public GitHub repos. Some were GPL licensed:

GPL Code  →  Training  →  Copilot suggests similar code
                              ↓
                     Is this GPL-derived?
                              ↓
                 Must the using project be GPL?

Attribution

The MIT license requires its copyright and permission notice to be kept in all copies or substantial portions of the software:

Copyright (c) [year] [author]
Permission is hereby granted...

When Copilot suggests code, where’s the attribution?
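If you do trace a suggestion back to a source file, the next questions are which license it carries and whether a notice has to travel with the code. A minimal sketch in Python, assuming the file declares its license with an SPDX-License-Identifier line. The COPYLEFT set is a simplified, non-exhaustive list chosen for illustration, compound SPDX expressions such as "MIT OR Apache-2.0" are not handled, and the result is a prompt to look closer, not a legal conclusion.

```python
import re

# Simplified, non-exhaustive set of copyleft SPDX identifiers (an illustrative
# assumption; real license review needs the full SPDX list and a human).
COPYLEFT = {
    "GPL-2.0-only", "GPL-2.0-or-later",
    "GPL-3.0-only", "GPL-3.0-or-later",
    "AGPL-3.0-only", "AGPL-3.0-or-later",
}

# Matches a single SPDX identifier; compound expressions are not parsed.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")
# MIT-style notices usually start with "Copyright (c)".
NOTICE_RE = re.compile(r"Copyright\s+\(c\)", re.IGNORECASE)

def license_report(source: str) -> dict:
    """Summarize the license signals found in one source file's text."""
    match = SPDX_RE.search(source)
    license_id = match.group(1) if match else None
    return {
        "license": license_id,
        "copyleft": license_id in COPYLEFT,
        "has_copyright_notice": bool(NOTICE_RE.search(source)),
    }

sample = "# SPDX-License-Identifier: MIT\n# Copyright (c) 2020 Jane Doe\n"
print(license_report(sample))
# {'license': 'MIT', 'copyleft': False, 'has_copyright_notice': True}
```

A copyleft hit is the GPL question above; a notice in the source that your copy lacks is the attribution question.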

Practical Reality

# Copilot might suggest this for "implement fizzbuzz"
def fizzbuzz(n):
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

Is this someone’s copyrighted code? Or so generic it’s uncopyrightable?

Current Legal Framework

Fair Use (US)

Four factors:

Purpose and character: Commercial or educational? Transformative?
Nature: Creative or factual work?
Amount: How much was used?
Effect: Market impact on original?

For AI training:

Purpose: Commercial (mostly)
Nature: Creative works used
Amount: Entire works used
Effect: Debatable

No clear answer yet.

EU Copyright

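Directive on Copyright in the Digital Single Market (2019):

Text/data mining exception for scientific research
A broader exception that also covers commercial mining
Rights holders can opt out of the broader exception

More predictable than US fair use, but more restrictive wherever rights holders opt out.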
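The Directive says that for content published online, an opt-out should be made in an appropriate manner, such as machine-readable means. robots.txt is one widely used machine-readable signal; whether it alone satisfies the Directive is an assumption here, not settled law. A minimal sketch of a collector honoring it with Python's standard urllib.robotparser. The crawler name ExampleTrainingBot and the rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; real crawlers publish their own user-agent tokens.
CRAWLER_AGENT = "ExampleTrainingBot"

def allowed_to_collect(robots_txt: str, url: str, agent: str = CRAWLER_AGENT) -> bool:
    """Return True if the robots.txt rules permit `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical site rules: keep the training bot out of the gallery only.
rules = """
User-agent: ExampleTrainingBot
Disallow: /gallery/

User-agent: *
Allow: /
"""

print(allowed_to_collect(rules, "https://example.com/gallery/painting.png"))  # False
print(allowed_to_collect(rules, "https://example.com/blog/post.html"))        # True
```

Honoring the signal at collection time is cheaper than trying to remove a work from a model after training.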

Implications for Developers

Using AI Tools

Risk Level	Use Case
Lower	Boilerplate, common patterns
Medium	Library-specific code
Higher	Novel implementations
Highest	Code matching specific projects

Mitigation Strategies

# 1. Review AI suggestions
# Don't blindly accept

# 2. Check for distinctive patterns
# If it looks too specific, search for the source

# 3. Document your process
# Show you made modifications

# 4. Know your company policy
# Many have AI tool policies now
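Strategy 2 can be partly automated. The sketch below (Python, standard library only) replaces identifiers and literals with placeholders, then compares overlapping 5-token windows with Jaccard similarity, so a suggestion that only renames variables still scores high against its source. The window size and any threshold you act on are assumptions to tune. A high score means "go find the source and read its license", not "infringing"; a low score does not prove originality; and generic code like the fizzbuzz above will match plenty of public code.

```python
import io
import keyword
import tokenize

def normalized_tokens(code: str) -> list[str]:
    """Tokenize Python source, keeping keywords and operators but replacing
    identifiers with ID and literals with LIT, so renaming hides nothing."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.NAME:
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        elif tok.type == tokenize.OP:
            out.append(tok.string)
        # NEWLINE, INDENT, COMMENT and similar tokens are ignored.
    return out

def shingles(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All overlapping n-token windows."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def similarity(a: str, b: str, n: int = 5) -> float:
    """Jaccard similarity of the two snippets' normalized n-token windows."""
    sa = shingles(normalized_tokens(a), n)
    sb = shingles(normalized_tokens(b), n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

suggestion = "def f(n):\n    for i in range(n):\n        print(i * 2)\n"
known_source = "def g(count):\n    for k in range(count):\n        print(k * 2)\n"
print(similarity(suggestion, known_source))  # 1.0: same structure, new names
```

The tokenizer needs syntactically complete snippets; a truncated suggestion can raise tokenize.TokenError.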

Enterprise Considerations

Some companies ban Copilot for IP reasons
Code review should scrutinize AI suggestions
Consider indemnification (GitHub offers some)

What’s Coming

Likely Outcomes

Settlement templates: Big AI companies may create licensing frameworks
Opt-out registries: Ways for rights holders to exclude their work
Compensation pools: Like music royalties, but for training data
Disclosure requirements: AI-generated content may have to be labeled
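To make the opt-out registry idea concrete: a minimal sketch, assuming a hypothetical registry that stores SHA-256 fingerprints submitted by rights holders and a training pipeline that drops any item whose fingerprint matches. The registry and all names here are illustrative, not an existing service. Exact hashes are the weak point of this design: re-encoding, resizing, or cropping an image produces an entirely different digest, so a real registry would probably need perceptual hashing or metadata as well.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Exact-match fingerprint of a work's bytes."""
    return hashlib.sha256(content).hexdigest()

def filter_opted_out(items: dict[str, bytes], registry: set[str]) -> dict[str, bytes]:
    """Drop any training item whose fingerprint appears in the opt-out registry."""
    return {name: data for name, data in items.items() if fingerprint(data) not in registry}

# Hypothetical registry contents: rights holders submit fingerprints of their works.
registry = {fingerprint(b"<bytes of an opted-out painting>")}
corpus = {
    "painting.png": b"<bytes of an opted-out painting>",
    "photo.jpg": b"<bytes of a licensed photo>",
}
print(sorted(filter_opted_out(corpus, registry)))  # ['photo.jpg']
```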

What Won’t Happen

Generative AI won’t be banned
Large models won’t be unwound
Innovation won’t stop

The genie is out.

Practical Advice

For Developers

1. Treat AI suggestions as starting points
2. Modify substantially before using
3. Run plagiarism checks for critical code
4. Follow company policies

For AI Users

1. Don't prompt for a specific artist's style by name
2. Modify outputs before commercial use
3. Disclose AI assistance when appropriate
4. Stay informed about legal developments

For Companies

1. Establish AI usage policies
2. Train employees on IP considerations
3. Consider indemnification clauses
4. Document AI tool usage
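Documenting AI tool usage can be as light as one structured log line per AI-assisted change, which also covers "show you made modifications". A minimal sketch in Python; the AIAssistRecord fields are hypothetical suggestions, not any company's or tool's standard, so adapt them to your own policy.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date

@dataclass
class AIAssistRecord:
    """Hypothetical provenance entry for one AI-assisted change."""
    file: str
    tool: str                  # e.g. "GitHub Copilot"
    prompt_summary: str
    reviewed_by: str
    modified: bool             # was the suggestion substantially changed?
    source_search_done: bool   # did someone search for a matching source?
    day: str = field(default_factory=lambda: date.today().isoformat())

def to_log_line(record: AIAssistRecord) -> str:
    """Serialize a record as one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)

rec = AIAssistRecord(
    file="billing/rounding.py",
    tool="GitHub Copilot",
    prompt_summary="round currency to cents",
    reviewed_by="alice",
    modified=True,
    source_search_done=True,
    day="2022-10-29",
)
print(to_log_line(rec))
```

One JSON object per line keeps the log easy to append to and easy to search later.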

Final Thoughts

The law is catching up to the technology. The outcome will shape AI development for years.

For now: Use AI tools thoughtfully. Understand the uncertainty. Don’t pretend AI output is solely your creation.

The lawsuits will be resolved. The ethical questions will remain.

The code may be generated. The responsibility is still yours.