X (formerly Twitter) has updated its terms of service to ban scraping and crawling entirely, presumably to prevent AI models from being trained on its data. The new terms, which take effect on September 29th, prohibit any kind of scraping or crawling without “Prior Written Consent”. The previous version of the terms allowed crawling in accordance with the site’s robots.txt file, which contains instructions for search robots.
In the past few months, X has also modified its robots.txt file, which tells crawler bots which parts of the site they may visit, removing the instructions for all crawler bots except Google’s. In 2015, Twitter struck an agreement with Google to display tweets in search results. It’s unclear whether the nature or terms of that deal have changed under the new leadership. Neither company has commented so far.
The robots.txt settings now prevent crawlers from retrieving information about the likes and retweets (or should we call them “rex” now?) of specific posts. They also prevent search robots from viewing an account’s likes, media files, and photos.
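To illustrate how rules of this kind work, here is a minimal sketch of the way a well-behaved crawler might consult a robots.txt file before fetching a page, using Python's standard `urllib.robotparser` module. The rules, paths, bot names, and `example.com` URLs are hypothetical and chosen only for illustration; they are not X's actual robots.txt. The helper names `build_parser` and `is_allowed` are likewise made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT X's actual robots.txt.
# One named crawler may visit most of the site, every other crawler is shut out.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

if __name__ == "__main__":
    checks = [
        ("Googlebot", "https://example.com/status/1"),
        ("Googlebot", "https://example.com/private/data"),
        ("ExampleBot", "https://example.com/status/1"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(parser, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

Note that robots.txt is only a request: compliance is voluntary on the crawler's side, which may help explain why X also moved the prohibition into its terms of service.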
In June, the social network briefly blocked users who were not logged in from viewing posts. A few days later, the company removed the sign-in requirement. Elon Musk described the restriction as a temporary measure taken in response to the theft of website data, which he said was affecting the quality of service for ordinary users.
Musk strongly objects to companies collecting X data to train AI models. In April, he threatened to sue Microsoft, accusing it of illegally using the social network’s data to train AI models. In July, he filed a lawsuit against several unnamed companies.