To protect an individual's image property rights from image-generating AIs -- wouldn't it be simpler for the IETF (or another standards-producing group) to simply create an "AI image exclusion standard", similar to "robots.txt" -- which would tell an AI data-gathering web crawler that a given image or set of images is off-limits for use as training data?

https://en.wikipedia.org/wiki/Robots.txt

https://www.ietf.org/
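
For concreteness, such a file might look something like robots.txt. This is purely illustrative -- the directive names below are invented for this sketch, not part of any existing standard:

    # no-ai.txt -- hypothetical AI image exclusion file (illustrative only)
    User-agent: *                      # applies to all AI training crawlers
    Disallow-training: /photos/        # images under /photos/ may not be used as training data
    Disallow-training: /portfolio/
    Allow-training: /public-domain/    # images the owner explicitly releases

A crawler would fetch the file from the site root before ingesting any images, exactly as search engines do with robots.txt today.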



Entities training models have no incentive to follow such metadata. If we accept the premise that "more input -> better models" then there's every reason to ignore non-legally-binding metadata requests.

Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.
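
For instance, a typical robots.txt only fences off low-value or infinite-crawl paths -- nothing the search engine was eager to index anyway:

    # robots.txt -- a typical, uncontroversial exclusion set
    User-agent: *
    Disallow: /admin/
    Disallow: /cgi-bin/
    Disallow: /search       # infinite query-result pages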

If there were ever genuine uptake in using robots.txt to gatekeep the really good stuff, search engines would've stopped respecting it pretty much immediately - it isn't legally binding, after all.


>Entities training models have no incentive to follow such metadata. If we accept the premise that "more input -> better models" then there's every reason to ignore non-legally-binding metadata requests.

Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.

>Robots.txt survived because the use of it to gatekeep valuable goodies was never widespread. Most sites want to be indexed, most URLs excluded by the robots file are not of interest to the search engine anyway, and use of robots to prevent crawling actually interesting pages is marginal.

Robots.txt survived because it was a "digital signpost", a "digital sign" -- sort of like the way you might put a "Private Property -- No Trespassing" sign in your yard.

Most moral/ethical/lawful people -- will obey that sign.

Some might not.

But the some that might not -- probably constitute about a 0.00001% minority of the population, whereas the majority that do -- probably constitute about 99.99999% of the population.

"Robots.txt" is a sign -- much like a road sign is.

People can obey them -- or they can ignore them -- but they can ignore them only at their own peril!

It's a sign which provides a hint about the right thing to do in a certain set of circumstances -- which is what the Law is; which is what the majority of Laws are.

People can obey them -- or they can choose to ignore them -- but only at their own peril!

Most will choose to obey them. Most will choose to "take the hint", proverbially speaking!

A few might not -- but that doesn't mean the majority won't!

>If there were ever genuine uptake in using robots.txt to gatekeep the really good stuff, search engines would've stopped respecting it pretty much immediately - it isn't legally binding, after all.

Again, name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.


And then what? The scrapers themselves already happily ignore copyright; they won't be inclined to obey a no-ai.txt. So someone would have to enforce the standard. Currently I see no organisation that would be willing to do this, or even technologically able to - as even just detecting such scrapers is an extremely hard task.

Nevertheless, I hope that at some not-so-far point in the future there will be more legal guidance about this kind of stuff, i.e. it will be made clear that scraping violates copyright. This still won't solve the problem of detectability, but it would at least increase the risk for scrapers, should they be caught.


>The scrapers themselves already happily ignore copyright; they won't be inclined to obey a no-ai.txt.

Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.

>Currently I see no organisation that would be willing to do this, or even technologically able to - as even just detecting such scrapers is an extremely hard task.

Part of an image web scraper for AI image generator ingestion -- a minimal runnable sketch in Python (the "no-ai.txt" name and root location are this thread's hypothetical convention):

    import requests  # third-party HTTP library: pip install requests

    def site_allows_ai_scraping(base_url):
        # Look for a hypothetical no-ai.txt opt-out file at the site root.
        try:
            resp = requests.get(base_url.rstrip("/") + "/no-ai.txt", timeout=10)
        except requests.RequestException:
            return True  # file unreachable: no opt-out declared
        return resp.status_code != 200  # 200 -> file exists -> site opted out

    if site_allows_ai_scraping("https://example.com"):
        pass  # continue image scraping for this site
    else:
        pass  # abort image scraping here -- move on to the next site

See? Nice and simple!

Also -- let me ask you this -- what happens to the intellectual property (or just plain property) rights of Images on the web after the author dies? Or say, 50 years (or whatever the legal copyright term is) after the author dies?

Legal grey area perhaps?

Also -- what about Images that exist in other legal jurisdictions -- i.e., other countries?

How do we know what set of laws are to apply to a given image?

Point is: If you're going to endorse and/or construct a legal framework (and have it be binding -- keep in mind you're going to have to traverse the legal jurisdictions of many countries, many countries!) -- you might as well consider such issues.

Also -- at least in the United States, we have Juries that can override any Law (Separation of Powers) -- that is, that which is considered "legally binding" -- may not be quite so "legally binding" if/when properly explained to a proper jury in light of extenuating (or just plain other) circumstances!

So kindly think of these issues prior to making all-encompassing proposals as to what you think should be "legally binding" or not.

I comprehend that you are just trying to solve a problem; I comprehend and empathize; but the problem might be a bit greater than you think, and there might be one if not several unexplored partial/better solutions (since no one solution, legal or otherwise, will be all-encompassing) -- because the problem is so large in scope -- but all of these issues must be considered in parallel, or errors, present or future, will occur...


> Part of an image web scraper for AI image generator ingestion -- a minimal runnable sketch in Python:...

Yes, and who is supposed to run that code?

Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.

Github? OpenAI?[1] Stable Diffusion?[2] LAION?[3] Why do you think there are currently multiple high-profile lawsuits ongoing about exactly that topic?

Besides, that's not how things work. Training a foundation model takes months and currently costs a fortune in hardware and power - and once the model is trained, there is, as of now, no way to remove individual images from the model without retraining. So in practical terms it's impossible to remove an image if it has already been trained on.

So the better question would be, name two entities who have ignored an artist's request to not include their image when they encountered it the first time. It's still a trick question though because the point is that scraping happens in private - we can't know which images were scraped without access to the training data. The one indication that it was probably scraped is if a model manages to reproduce it verbatim - which is the basis for some of the above lawsuits.

[1] https://www.theverge.com/2022/11/8/23446821/microsoft-openai...

[2] https://www.theverge.com/2023/2/6/23587393/ai-art-copyright-...

[3] https://www.heise.de/hintergrund/Stock-photographer-sues-AI-...


>Yes, and who is supposed to run that code?

People that are honest and ethical?

And/or groups that don't want to risk getting sued (your [1], [2], [3])?

>> Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.

>Github? OpenAI?[1] Stable Diffusion?[2] LAION?[3] Why do you think there are currently multiple high-profile lawsuits ongoing about exactly that topic?

Because:

a) (Some) American Lawyers (AKA "Bar Association Members") -- are sue-happy?

b) Because various Governments / Deep States (foreign and domestic) / Dark Money Groups / Paid (and highly biased) Political Activists -- want to see if they can get new draconian laws passed (whilst believing their actions to be super-patriotic to their respective countries!) -- or at least court precedents that move in that direction?

c) Because there's big money at stake, all the way around? (https://www.biblegateway.com/passage/?search=1%20Timothy%206...)

d) Because the alleged "victims" are "playing the victim card"?

(https://tvtropes.org/pmwiki/pmwiki.php/Main/PlayingTheVictim...) (Note that as a theory, this pairs well with (a)!)

(How much revenue will they be losing if their net income from artwork were $0? Also, wouldn't such high-profile cases give the artists a ton of free advertising? The Defendant companies should counter-sue for giving the Plaintiff artists what amounts to free publicity for their artwork -- publicity so great that they couldn't buy it with all of the Google advertising credits in the world!)

>Besides, that's not how things work. Training a foundation model takes months and currently costs a fortune in hardware and power - and once the model is trained, there is, as of now, no way to remove individual images from the model without retraining.

>"without retraining"...

Meditate on that one for a moment...

>So in practical terms it's impossible to remove an image if it has already been trained on.

In practical terms -- just retrain the model -- sans ("without") the encroaching images!

The models will need to be updated every couple of months anyway to include new public data from the web!

Create a list of images NOT to include in the next run (see above, "no-ai.txt" -- good suggestion incidentally!) -- and then don't include them on the next run! (A minimal sketch of that filtering step follows below.)

It's not Rocket Science! :-)

(Also, arguably Elon Musk doesn't think that "Rocket Science" is in fact as hard as "Rocket Science" is purported to be -- but that's a separate debate! <g>)
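
Here's that filtering step as a minimal Python sketch, assuming the training pipeline keeps a plain-text manifest of source image URLs (the file names here are hypothetical):

    # Filter the next training run's manifest against an opt-out list.
    # "training_manifest.txt" and "optout_list.txt" are hypothetical names.

    with open("optout_list.txt") as f:
        excluded = set(line.strip() for line in f if line.strip())

    with open("training_manifest.txt") as f:
        manifest = [line.strip() for line in f if line.strip()]

    kept = [url for url in manifest if url not in excluded]

    with open("next_run_manifest.txt", "w") as f:
        f.write("\n".join(kept))

The model itself is untouched until the next scheduled retraining; the exclusion simply takes effect on that run.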

>So the better question would be, name two entities who have ignored an artist's request to not include their image when they encountered it the first time. It's still a trick question though because the point is that scraping happens in private - we can't know which images were scraped without access to the training data. The one indication that it was probably scraped is if a model manages to reproduce it verbatim - which is the basis for some of the above lawsuits.

Explain to me, from the point of view of an AI company, how that company is to know ahead of time NOT to include a given image from the web? (And thus not break the law -- copyright law at least -- and thus not incur the lawsuits and all the chaos that will apparently follow such an act?)

How is the AI company supposed to know, ahead of time, that a given image on the web is not to be included?

How please?

Because you see, that's the root of the problem you are trying to solve.

In fact, let me ask you a better question...

How can an arbitrary Internet User -- not a big, legally powerful AI company, but an arbitrary small-fry Internet User -- know ahead of time that the artist who created a given image exposed to the public via the public Internet (or the intellectual/artistic property holder) does NOT want their Image to be used for specific purposes?

Because well, I don't know of any easily parsable, easily understandable standard for that on the Web currently...

So, to recap, the question is:

How is everybody (humans and machines) to know the unambiguous, easily parsable, easily understandable uses that the artist (or intellectual/artistic property holder) of an image -- wishes/wills for that image?

And how to easily know the unintended uses?

That might be a better definition of the problem that is trying to be solved...
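
For what it's worth, one direction this could take is a robots-style meta directive in the page itself -- a convention some art platforms have reportedly experimented with. A minimal Python sketch, assuming hypothetical "noai"/"noimageai" directive names:

    from html.parser import HTMLParser  # standard library

    class UsagePolicyParser(HTMLParser):
        # Collects robots-style meta directives, e.g. "noai, noimageai".
        def __init__(self):
            super().__init__()
            self.directives = set()

        def handle_starttag(self, tag, attrs):
            a = dict(attrs)
            if tag == "meta" and a.get("name", "").lower() == "robots":
                for token in a.get("content", "").lower().split(","):
                    self.directives.add(token.strip())

    def image_use_permitted(html_text):
        parser = UsagePolicyParser()
        parser.feed(html_text)
        return not {"noai", "noimageai"} & parser.directives

    # Example: a page whose author has opted out of AI training.
    page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
    print(image_use_permitted(page))  # False -- the images are off-limits

Both humans (via view-source) and machines (via a ten-line parser) can read such a declaration -- which is exactly the property the question above asks for.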



