
I don't think that's true at all. Images and text get reposted with or without consent, often without attribution. It wouldn't make it right for the AI companies to scrape when the original author doesn't want that but someone else has ignored their wishes and requirements. Basically, what good is putting your stuff behind a login or some other restrictive viewing method if someone just saves the image/text? I think it's still a serious problem for creators. And without some form of easy public access, creators don't get the visibility and exposure they need to build an audience and find clients.

This is one area where the AI companies should offer an olive branch, IMO. There must be a way to use steganography to transparently embed a "don't process for AI" code into an image, text, music, or any other creative work that wouldn't be noticeable to humans, but that the AI would see if it tried to process the content for training. I think it would be a very convenient answer and probably not detrimental to the AI companies, but I also imagine they wouldn't be eager to spend the resources implementing it. I do think they're the best-placed party to provide such protections for artists, though.
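To make the idea concrete, here is a minimal sketch of one possible embedding scheme: hiding an opt-out marker in the least-significant bits of pixel values. The `NO-AI-TRAIN` marker string and all function names are invented for illustration; a real system would need a standardized tag and robustness to re-encoding, resizing, and cropping, which naive LSB embedding does not survive.

```python
# Hypothetical sketch of an LSB (least-significant-bit) opt-out watermark.
# Pixels are modeled as a flat list of 0-255 channel values; the marker,
# function names, and scheme are assumptions, not any real standard.

MARKER = "NO-AI-TRAIN"  # hypothetical opt-out tag

def embed(pixels, message=MARKER):
    """Write each bit of the message into the LSB of successive pixels."""
    bits = [(byte >> i) & 1 for byte in message.encode() for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small to hold the message")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear LSB, then set it to the message bit
    return out

def extract(pixels, length=len(MARKER)):
    """Read the LSBs back out and reassemble them into bytes."""
    nbits = length * 8
    bits = [p & 1 for p in pixels[:nbits]]
    data = bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[k:k + 8]))
        for k in range(0, nbits, 8)
    )
    return data.decode(errors="replace")

def is_opted_out(pixels):
    """Check whether the image carries the opt-out marker."""
    return extract(pixels) == MARKER
```

A crawler's training pipeline would then call something like `is_opted_out(pixels)` before ingesting an image and skip it if the check passes; since changing only the LSB shifts each channel value by at most 1, the mark is imperceptible to humans.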

Ideally, without a prior written agreement with the original creators for a dataset, the AI companies shouldn't be using it for training at all, but I doubt that will happen. The system I mention above should be _opt-in_: you must tag content as free for AI training in order for AI to be trained on it. But I have zero faith that the AI companies would agree to such a self-limitation.

edit: added mention of music and other creative works to the first sentence of the second paragraph

edit 2: added the final paragraph, as I do think this should be opt-in, though I don't believe AI companies would ever accept it, even though in my opinion they should by all means.



Here are my 2 cents: I think we will need laws specifying two types of AI models, ones trained with full consent (opt-in) for their training material and ones without. The first type would be like Adobe's Firefly model, where they allegedly own everything they trained it with, or a model where you go around asking for consent for each item in your training corpus (probably infeasible for large models). Public-domain material would presumably also be fine to train with. In this case there are no restrictions, and the output from such models can even be copyrighted.

Now for the second type, representing models such as Stable Diffusion and ChatGPT: these would be required to have their trained model freely available to anyone, and any resulting output would not be copyrightable. It may be a fairer way of allowing anyone to harness the power of AI models that essentially contain the knowledge of all mankind, without giving any one party an unfair monopoly on it.

This should be easy to enforce for big corporations; it would be too obvious if they tried to pass off one type of model as the other, or to keep the truth about their model from leaking. It might not be as easy to keep small groups or individuals from breaking those rules, but hey, at least it levels the playing field.



