@buherator Anti-scraping measures do not necessarily affect indexers. What does affect them is that the Big Ones are very much doing scraping too, and would display AI summaries/suggestions instead of real search results. Thus, people block them too. Not as a side effect, but as a deliberate choice.
Using pre-anti-scraping LLMs works, maybe, for a little while. But those fail to capture new content published behind anti-scraper measures, and as such are inadequate for search in the longer term.
Smaller, personal or community-run indexes, on the other hand, still yield decent results. Decent, as in, usually better than pre-AI global indexes.
Beating anti-scraper measures in the long run is not possible, because people can - and will - put their stuff behind login walls with no open registration, or take it completely offline.
The winning move would be making community-run smaller indexes a viable option. Federated, decentralized search, if you wish.
I get that there's a lot of nuance here, that's why I asked for "consideration" that can include e.g. allowing standard crawlers.
The problem is that... what do you define as a standard crawler? Googlebot & Bingbot? Or Kagi's? The first two will happily display AI summaries first, and both try to hide real results. I do not think it is worth letting them through. As far as I remember, both double as AI scrapers too.
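For what it's worth, the only lever a site has here is robots.txt user-agent tokens, and those only separate "indexing" from "AI training" where a vendor publishes distinct tokens. A rough sketch of what "allow standard crawlers, block AI use" could look like (the token names below are the vendors' published ones; whether the bots actually honor them is a separate question):

```txt
# robots.txt sketch: allow classic indexing, refuse AI use where possible

# Google's separate AI-training opt-out token
User-agent: Google-Extended
Disallow: /

# Dedicated AI scrapers
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Classic search crawlers still allowed
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
```

Note the asymmetry: Bing publishes no equivalent of Google-Extended, so there is no token-level way to let Bingbot index without also feeding whatever else Microsoft does with the crawl.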
Apparently building an index is much bigger effort than I expected (based on the struggles of EU and alternative providers), so I don't think that will happen in the near future.
A global index is hard, indeed, especially if you wish to filter out AI slop. A limited, community-run or topical index on the other hand? That's a whole lot easier.
I've been using my own YaCy instance for the past two years. It yields very good results, better than any other search engine did, even prior to AI - because it is not a global index. It indexes what I tell it to: whatever I bookmark on my GtS instance or in my Readeck, and whatever else I send its way.
Yes, it is limited, and it won't find stuff I didn't put in. In those cases, I fall back to alternatives: asking on fedi, or if that fails too, DuckDuckGo. I haven't used DDG this year yet.
Now, if we had community-run smaller indexes, and topical ones, with SearXNG instances thrown on top to combine some of them, my educated guess is that we'd have viable search without much AI slop.
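The combining part already more or less exists: SearXNG ships an engine module for YaCy, so pointing a metasearch instance at a handful of community indexes is mostly configuration. A rough sketch of what that could look like in settings.yml (the base_url values are made up, and parameter names may differ between SearXNG versions, so treat this as a shape, not a recipe):

```yaml
# Hypothetical SearXNG settings.yml fragment: federate several
# small YaCy indexes behind one search box.
engines:
  - name: my personal index
    engine: yacy
    base_url: http://localhost:8090   # your own YaCy instance
    shortcut: ypi

  - name: community tech index
    engine: yacy
    base_url: https://yacy.example.org  # hypothetical community instance
    shortcut: yct
```

Each engine gets queried in parallel and the results are merged, so the "federation" here is just fan-out at the metasearch layer rather than anything the indexes themselves coordinate.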
The trouble is, such an index still takes effort, requires resources, and great care. I can do it on a personal level, because I'm a stubborn little mouse. It's not viable for normal people - hence, community indexes. But that's hard. Not global-index-level hard, but harder than it needs to be.
LLM performance will degrade for sure, but I don't think it will restore trust in traditional search or otherwise move people away from assistants once they have become dependent.
I think traditional search is dead, and has been for a while now. A global index will never yield useful results again (thanks, partly, to all the AI slop, but even before that, SEO and other algorithm cheating for the relentless pursuit of $$$ made global indexes worthless, imo).
Btw. my post was less about your work, and more about e.g. GitHub where content is no longer properly searchable either via web search or their internal search :)
My gut feeling is that GitHub search was degraded not as an anti-scraper measure, but to push people towards LLMs. That's GitHub's entire purpose lately, so why would this be an exception? :)