Great, just one issue.
“The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts”
Nah, screw that. Actively sabotage the training data if they’re going to keep scraping after being told not to. Poison it with gibberish and bad info. Otherwise you’re just giving them irrelevant but still useful training data, so they have no real incentive to scrape only pages that allow it.