Edited 1 month ago
#scraping
@mrose.ink.bsky.social said it perfectly:

https://bsky.app/profile/mrose.ink/post/3lbwpud2mes2n

"One enduring complication with all this is that scraping happens all the time for reasons that people *don’t* find inherently objectionable, and in fact support—the Wayback Machine, all kinds of public health and extremism research, etc. The mistake was assuming that goodwill transfers.

A key problem in the Disc Horse (and policy to a lesser extent) is reminding people that scraping as a technological process is Important, Actually, for all the things You Think Are Good, and any proposed solutions to curtail GAI training uses need to be VERY narrowly tailored to not impact those.

All the proposed solutions so far have had some critical flaw that makes them unworkable.

Manual consent? Ok, how do we implement that at scale? robots.txt style flags are fine, but they’re also not legally binding—and that’s good! If they were, Wayback wouldn’t be able to index!

So exclusion protocols can be ignored, For Good Reason. “What if we give an exclusion protocol the force of law for this specific use?” Closer, but there’s active debate in the courts about whether this is all a fair use, and if the answer is “yes,” then it doesn’t matter

…then best case scenario the tags are rendered null (because you can’t legally override fair use), and worst case you’ve just recreated a DMCA 1201 style lockout trick, and we have spent the last 25 years seeing just how incredibly those fuck up everything around them."