Is there a simple way to severely impede web scraping and LLM data collection of my website?

Maroon@lemmy.world · edited 11 days ago

Is there a simple way to severely impede web scraping and LLM data collection of my website?

IphtashuFitz@lemmy.world · 11 days ago

Try using “curl -A” to specify a User-Agent string that matches Chrome or Firefox.

corroded@lemmy.world · 10 days ago

I probably should have specified I’m using libcurl, but I did try the equivalent of what you suggested. I even tried setting a list of user agents and having it cycle through them. None of them worked. A lot of anti-scraping methods use much more complex schemes than just validating the user agent. In some cases, even a headless browser will be blocked.
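
As a rough illustration of the user-agent cycling described above, here is a minimal sketch in Python using the standard library's urllib rather than libcurl. The User-Agent strings, the example.com URLs, and the helper name build_requests are all placeholders made up for illustration, not what was actually used in this thread. The requests are only constructed, never sent.

```python
from itertools import cycle
from urllib.request import Request

# Placeholder User-Agent strings (assumptions for illustration only).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def build_requests(urls, user_agents=USER_AGENTS):
    """Pair each URL with the next User-Agent in round-robin order.

    Hypothetical helper: it only builds Request objects and does not
    perform any network I/O.
    """
    agents = cycle(user_agents)
    return [Request(url, headers={"User-Agent": next(agents)}) for url in urls]


# Hypothetical URLs; nothing is fetched here.
example_requests = build_requests(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
)
for req in example_requests:
    # urllib stores header names capitalized as "User-agent".
    print(req.full_url, "->", req.get_header("User-agent"))
```

This only varies a single request header, which is consistent with the point above: a site that checks anything beyond the user agent would not be affected by it.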