Google quietly updated their Google Search Central documentation to note that they are now indexing .csv files.
One curious detail about Google indexing CSV files is that Google’s Dataset search appearance already used CSV files, but apparently only when they were described with structured data.
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data.
This opens up a new way to get crawled. For a publisher who doesn’t want their .csv files crawled, it may mean updating robots.txt to exclude those files.
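As a rough illustration of what excluding .csv files could look like, the sketch below defines a robots.txt rule and checks which URL paths it would block. It assumes the wildcard syntax Google documents for robots.txt (`*` for any run of characters, `$` for the end of the URL). The helper names `rule_matches`, `disallowed_patterns` and `is_blocked`, and the sample paths, are hypothetical. This is a simplified matcher that ignores Allow rules and user-agent groups, not Google's actual parser.

```python
import re

# Hypothetical robots.txt content asking all crawlers to skip .csv files.
ROBOTS_TXT = """User-agent: *
Disallow: /*.csv$
"""


def rule_matches(pattern, path):
    """Simplified check of one Disallow pattern against a URL path.

    '*' matches any run of characters and a trailing '$' anchors the end
    of the URL; everything else is matched literally from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


def disallowed_patterns(robots_txt):
    """Collect every Disallow pattern, ignoring user-agent grouping."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def is_blocked(robots_txt, path):
    """Return True if any Disallow pattern matches the path."""
    return any(rule_matches(p, path) for p in disallowed_patterns(robots_txt))


for sample in ["/data/report.csv", "/data/report.csv?download=1", "/index.html"]:
    print(sample, is_blocked(ROBOTS_TXT, sample))
```

Because the pattern ends in `$`, a URL with a query string after `.csv` would not be matched by this rule. Dropping the `$` would block those as well, at the cost of also matching any path that merely contains `.csv`.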
Comma-Separated Values (CSV)
Here are some examples of what can qualify as a dataset:
- A table or a CSV file with some data
- An organized collection of tables
- A file in a proprietary format that contains data
- A collection of files that together constitute some meaningful dataset
- A structured object with data in some other format that you might want to load into a special tool for processing
- Images capturing data
- Files relating to machine learning, such as trained parameters or neural network structure definitions
- Anything that looks like a dataset to you”
The updated documentation makes it clearer that Google relies on the structured data to use CSV files in their dataset search appearance.
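To make the role of structured data concrete, here is a minimal sketch that builds Dataset markup pointing at a CSV file. It uses the schema.org types named in Google's documentation (Dataset, DataDownload) and the properties mentioned in the quote (name, description, creator, distribution). Every name and URL in it is a placeholder, and the markup is an example rather than a guaranteed recipe for appearing in dataset search.

```python
import json

# All names and URLs below are placeholders for illustration only.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city air quality readings",
    "description": "Hourly air quality measurements for an example city.",
    "creator": {"@type": "Organization", "name": "Example Org"},
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "CSV",
            "contentUrl": "https://example.com/data/air-quality.csv",
        }
    ],
}

# Serialize to JSON-LD and wrap it in the script tag used to embed it in a page.
json_ld = json.dumps(dataset, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

The printed tag would go in the HTML of the page that describes the dataset, with the CSV file itself referenced through `contentUrl`.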
Google’s ability to index CSV files appears to be new functionality, because a “filetype” search on Google for CSV files does not currently return CSV files.
Comma-separated values (CSV) files are text files that save data in a tabular format that can be displayed as a spreadsheet.
Google updated the above documentation in 2022 and redirected it to the new Search Central Documentation.
The use of tabular data as a search appearance goes back to 2018, when Google announced that they would show that kind of data in search when it is accompanied by structured data.
They are useful for doing things like uploading a list of URLs for crawling to software like Screaming Frog.
File types indexable by Google
Dataset (Dataset, DataCatalog, DataDownload) structured data
Read Google’s Search Central Dataset Documentation:
But will this change mean that Google will eventually crawl CSV files and use those for search appearances (in addition to tabular data notated in structured data)?
“Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats as structured data…
Here are some examples of what can qualify as a dataset:
CSV files contain data in plain text, which means that they do not contain style elements like fonts, nor do they contain images or active links.
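The plain-text nature of CSV is easy to see in a short sketch. The sample below is made-up data resembling a URL list, parsed with Python's standard `csv` module. The column names `url` and `status` are assumptions chosen for the example.

```python
import csv
import io

# Made-up sample data; a real file would be opened with open(path, newline="").
raw_text = "url,status\nhttps://example.com/,200\nhttps://example.com/about,404\n"

# Each line of text becomes one row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw_text)))
for row in rows:
    print(row["url"], row["status"])
```

Every value comes back as a plain string, including the status codes, since the file carries no type or formatting information.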
A core algorithm update is one in which Google makes “significant” and “broad changes” to their core algorithm.
The Dataset structured data page in Google’s old Developer documentation (viewable on Archive.org) states that CSV files are an acceptable standard for appearing in dataset search features.
This is what the current documentation says:
But it is worth considering whether Google improved their crawling engine to be able to index CSV files or whether that capability was already there.
It may be a coincidence that the indexing of CSV files and the core algorithm update happened at virtually the same time.
According to the original documentation:
Google’s approach to dataset discovery makes use of schema.org and other metadata standards that can be added to pages that describe datasets…
A table or a CSV file with some data…”
Google Indexing CSV Related to Recent Update?
Read the updated list of indexable file types:
Searches like the following currently do not return CSV files:
- filetype:csv site:.gov
- filetype:csv site:.edu
- filetype:csv site:.com
Google Has Already Indirectly Used CSV Files
But they are also useful for organizing data in a spreadsheet.
CSV File Indexing Is New