Reportedly, Google Books is indexing poorly written works generated by artificial intelligence.

By Editor

Google Books, an essential tool for academics, has recently come under scrutiny for indexing low-quality books that could affect the accuracy of its language-tracking tool, Ngram. 404 Media reported that Google Books had indexed books that appear to be written by AI, such as Tristin McIver’s Bears, Bulls, and Wolves: Stock Trading for the Twenty-Year-Old, which seemed to pull information from Wikipedia and contained phrases commonly produced by chatbots like ChatGPT. Although most of the books in the relevant search results were about AI, some appeared to lack both human authorship and any relevance to the search terms.

The Ngram tool, which tracks how often words and phrases appear in print over time, relies on data from Google Books to analyze how language usage has evolved. Google Books has scanned and indexed written works dating back to the 1500s, providing a wealth of data for Ngram to draw upon. However, Ngram’s accuracy could be compromised if low-quality or AI-generated works enter that dataset. Linguists and other academics rely on Ngram for their research, so it is crucial that the underlying data is reliable and reflects genuine trends in language use.
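To make the concern concrete, here is a minimal sketch of the kind of computation an n-gram frequency tool performs: for each year, count how often a phrase appears relative to all phrases of the same length. The corpus and function names below are illustrative toy stand-ins, not Google’s actual pipeline; note how a handful of AI-flavored texts in one year’s slice shifts that year’s frequency.

```python
from collections import Counter

def ngram_frequencies(corpus_by_year, phrase):
    """Relative frequency of `phrase` per year, Ngram-style.

    corpus_by_year: dict mapping year -> list of texts. This is a toy
    stand-in for a dated book corpus; all names here are illustrative.
    """
    n = len(phrase.split())  # length of the n-gram we are tracking
    results = {}
    for year, texts in corpus_by_year.items():
        total, matches = 0, 0
        for text in texts:
            tokens = text.lower().split()
            # all n-grams of length n in this text
            grams = [" ".join(tokens[i:i + n])
                     for i in range(len(tokens) - n + 1)]
            total += len(grams)
            matches += Counter(grams)[phrase.lower()]
        # frequency = occurrences / all n-grams seen that year
        results[year] = matches / total if total else 0.0
    return results

# Toy corpus: the 2020 slice contains chatbot-flavored phrasing.
corpus = {
    1990: ["the stock market rose", "bears and bulls traded"],
    2020: ["delve into the stock market", "let us delve deeper"],
}
print(ngram_frequencies(corpus, "delve"))  # 1990: 0.0, 2020: ~0.22
```

Because the denominator is everything published in a given year, a flood of bot-generated books would not just add noise: it would directly inflate the measured frequency of the phrases those bots favor.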

Google assured 404 Media that recent works on Google Books do not currently affect Ngram results, and that the potential inclusion of low-quality books would be taken into consideration in any future data updates. Still, the concern remains that unintentional indexing of AI-generated content could skew Ngram’s results and mislead researchers who rely on the tool for language analysis. With Ngram’s last data update dating back to 2019, the eventual impact of recent low-quality or bot-generated books on the tool’s accuracy remains unknown.

As the McIver book illustrates, AI-generated content can slip into Google Books unnoticed and thereby contaminate the data available to Ngram. While Google Books is a valuable resource for accessing a wide range of published material, the quality and authenticity of the content in its index must be closely monitored to keep tools like Ngram reliable. Researchers who depend on Ngram for language analysis should stay alert to how low-quality or bot-generated works could distort the tool’s results.

In conclusion, Google Books’ indexing of low-quality or AI-generated books raises real concerns about the accuracy of the Ngram language-tracking tool. Because Ngram analyzes language change by drawing on Google Books data, the unintended inclusion of bot-generated content could compromise its reliability. Google says recent works do not currently affect Ngram results, but the possibility that future data updates will absorb such content leaves open questions about the tool’s integrity. Researchers should remain cautious about the quality of the data they rely on and advocate for measures that keep tools like Ngram authentic and trustworthy.
