LAION-5B includes images of humans without their explicit consent. Research involving images of people generally falls under IRB/HSR review, and almost any IRB will tell you that if you’re using data derived from humans, you must go through IRB.
LAION can say all they want that they’re not including images in their dataset. They include a script to download those URLs into images on disk. By being a company that’s not bound to decades of university ethics regulations, they are seemingly allowed to skirt what you learn on your first day as a researcher in academia. It may be legal, but it sure is not ethical.
Please provide a link to an academic publication that agrees with your claim that linking to online content is unethical without the subject’s explicit approval.
It's one thing to link to online content. They also provide a download script that turns those links into actual images.
This defense, that they merely provide links and not images, is the thin layer of abstraction that their entire ethics case is built on top of. They give you everything needed to create massive datasets of human data without doing it for you.