Reddit Restricts Most Wayback Machine Access Citing Concerns Over AI Data Scraping
Reddit has put technical restrictions in place that prevent the Internet Archive's Wayback Machine from archiving nearly all of its content. The platform confirmed that the move is aimed at stopping unauthorized large-scale scraping of its data, especially by companies training artificial intelligence models. The restriction directly affects how much of the history of user discussions on Reddit is preserved.
Highlights:
- Reddit deployed technical blocks limiting Wayback Machine archiving.
- Preventing unauthorized AI data scraping is the stated primary reason.
- Internet Archive confirmed these restrictions hinder preservation.
- Reddit asserts control over its data's use for AI training.
- The block reduces long-term access to archived discussions.
Reddit has actively blocked crawling by the Wayback Machine. The platform ties this move directly to preventing unauthorized AI data scraping. The large volume of conversation data that Reddit hosts is a valuable asset to the company, particularly amid the active development of AI. The main aim is to control who can access that data.
The Internet Archive acknowledges that the restrictions severely compromise its stated mission of preserving the digital public record. Manual saves by individual users are still technically possible, but automated archiving of Reddit is mostly blocked. This undermines stewardship of the history and context of the online community.
This case illustrates a conflict between platforms that want to control their content and organizations that seek to scrape that content as AI training data. Reddit's move signals that platforms are tightening their hold against such data extraction. It also creates obstacles to maintaining a complete historical record of internet discourse, a side effect of restrictions aimed at AI data scraping.