Google Search Console Data & BigQuery For Enhanced Analytics


Google Search Console is a great tool for SEO pros.

But as many of us know, using the interface exclusively comes with some limitations.

In the past, you often had to have specific knowledge or the help of a developer to overcome some of them by pulling the data from the API directly.

Google started offering a native Google Search Console (GSC) to what was Google Data Studio (now Looker Studio) connector around 2018.

This integration allows users to directly pull data from GSC into Looker Studio (Google Data Studio) to create customizable reports and dashboards without needing third-party connectors or additional API configurations.

But then, in February 2023, things got interesting.

Google now allows you to put in place an automated, built-in bulk data export to BigQuery, Google’s data warehouse storage solution.

Let’s get candid for a minute: most of us still rely on the GSC interface to do many of our activities.

This article will dive into why the bulk data export to BigQuery is a big deal.

Be warned: This is not a silver bullet that will solve all of the limitations we face as SEO pros. But it’s a great tool if you know how to set it up and use it properly.

Break Free From Data Constraints With BigQuery Bulk Exports

Initially, the bulk data export was meant for websites that received traffic to tens of thousands of pages and/or from tens of thousands of queries.

Data Volumes

Currently, you have three data export options beyond the BigQuery bulk data export:

  • Most of the reports in GSC allow you to export up to 1,000 rows.
  • You can get up to 50,000 rows via a Looker Studio integration.
  • With the API, you get up to 50,000 rows, enabling you to pull a few more elements beyond the performance data: URL Inspection, sitemaps, and sites’ data.

Daniel Waisberg, Search Advocate at Google, explains it this way:

“The most powerful way to export performance data is the bulk data export, where you can get the biggest amount of data.”

There are no row limits when you use the BigQuery bulk export.

BigQuery’s bulk data export allows you to pull all rows of data available in your GSC account.

This makes BigQuery much more suitable for large websites or SEO analyses requiring a complete dataset.

Data Retention

Google BigQuery enables unlimited data retention, allowing SEO pros to perform historical trend analyses that are not restricted by the 16-month data storage limit in Google Search Console.

Looker Studio and the API do not inherently offer this feature. This means you gain a real capacity to see evolutions over multiple years, and better understand and analyze progressions.

As a storage solution, BigQuery allows you to stock your data for as long as you wish and overcome this limitation.

The ability to retain and access unlimited historical data is a game-changer for SEO professionals for several reasons:

  • Comprehensive long-term analysis: Unlimited data retention means that SEO analysts can conduct trend analyses over extended periods. This is great news for those of us who want a more accurate assessment of how our SEO strategies are performing in the long term.
  • Seasonal and event-driven trends: If your website experiences seasonal fluctuations or events that cause periodic spikes in traffic, the ability to look back at longer historical data will provide invaluable insights.
  • Customized reporting: Having all of your data stored in BigQuery makes it easier to generate custom reports tailored to specific needs. You can create a report to answer virtually any question.
  • Improved troubleshooting: The ability to track performance over time makes it easier to identify issues, understand their root causes, and implement effective fixes.
  • Adaptability: Unlimited data retention gives you the flexibility to adapt your SEO strategies while maintaining a comprehensive historical perspective for context.

Data Caveats

Just like most data tracking tools, you won’t be surprised to learn that there is no retroactivity.

Keep in mind that the GSC bulk data export starts sending data daily to BigQuery only after you set it up. This means that you won’t be able to store and access the data before that.

It’s a “from this point forward” system, meaning you need to plan ahead if you want to make use of historical data later on. And even if you plan ahead, the data exports will start up to 48 hours later.

While the bulk data export does include significant metrics such as site and URL performance data, not all types of data are exported.

For example, coverage reports and other specialized reports available in GSC are not part of what gets sent to BigQuery.

Two primary tables are generated: searchdata_site_impression and searchdata_url_impression. The former aggregates data by property, so if two pages show up for the same query, it counts as one impression.

The latter table provides data aggregated by URL, offering a more granular view. In plain English, when you use Google Search Console’s bulk data export to BigQuery, two main tables are created:

  • searchdata_site_impression: This table gives you an overview of how your entire website is doing in Google Search. For example, if someone searches for “best sausage dog costume” and two pages from your website appear in the results, this table will count it as one “impression” (or one view) for your entire site rather than two separate views for each page.
  • searchdata_url_impression: This table is more detailed and focuses on individual web pages. Using the same example of “best sausage dog costume,” if two pages from your site show up in the search results, this table will count it as two separate impressions, one for each page that appears.

Another important element is that you are dealing with partitioned data tables. The data in BigQuery is organized into partition tables based on dates.

Each day’s data gets an update, and it’s crucial to be mindful of this when formulating your queries, especially if you want to keep your operations efficient.

If this is still a bit obscure for you, just remember that the data comes in daily and that it has an impact on how you go about things when doing data analysis.

Why Set This Up?

There are advantages to setting up BigQuery bulk exports:

Joining GSC Data With Other Data Sources

Getting the Google Search Console out in a data warehouse means that you can enjoy the advantages of joining it with other data sources (either directly in BigQuery or in your own data warehouse).

You could, for instance, blend data from the GSC and Google Analytics 4 and have more insightful information regarding conversions and behaviors driven by organic Google traffic.

Run Complex Calculations/Operations Using SQL

A solution such as BigQuery allows you to query your data in order to run complex calculations and operations to drive your analysis deeper.

Using SQL, you can segment, filter, and run your own formulas.

Anonymized Queries

BigQuery deals with anonymized queries differently from other ETL vendors that access the data via the API.

It aggregates all the metrics for the anonymized queries per site/URL per day.

It doesn’t just omit the rows, which helps analysts get complete sums of impressions and clicks when you aggregate the data.

What’s The Catch?

Unfortunately, no tool or solution is perfect. This new built-in integration has some downfalls. Here are the main ones:

It Means Developing Expertise Beyond SEO

You should get familiar with Google Cloud Platform, BigQuery, and SQL on top of your GSC knowledge.

Starting a bulk data export entails carrying out tasks in GSC but also Google Cloud.

An SQL-Based Platform Requiring Specific Expertise

With BigQuery, you need SQL to access and make the most of your data.

You therefore need to make SQL queries or have someone in-house to do it for you.

The platform also has its own way of functioning.

Using it efficiently requires knowing how to use it, which requires time and experience.

While Looker Studio does allow SQL-like data manipulation, it may not offer the full power and flexibility of SQL for complex analyses.

API data would need to be further processed to achieve similar results.

URL Impressions Contain More Anonymized Queries

“One thing to be mindful of is the difference in anonymized query volume between the  searchdata_url_impression table and the searchdata_site_impression table.

Like the GSC interface, some queries for particular URLs in particular countries might be so infrequent that they could potentially identify the searcher.

As a result, you’ll see a greater portion of anonymized queries in your searchdata_url_impression table than in your searchdata_site_impression table.” Source: Trevor Fox.

Potential Costs

Even though this feature is initially free, it might not be the case forever.

BigQuery is billed based on the amount of data stored in a project and the queries that you run.

The solution has thresholds from where you start to pay potentially each month.

Over time, it might then become costly – but it all depends on the amount of data exported (websites with many pages and queries will probably be heavier in that regard) and the queries you run to access and manipulate it.

How To Get Your GSC Data In BigQuery

1. Create A Google Cloud Project With BigQuery And Billing Enabled

The first step is to create a project in Google Cloud with BigQuery and billing enabled.

Access the Console. On the top left, click on the project you currently are in (or Select a project if you have none), this will open a popup.

Click on NEW PROJECT and follow the steps. Be careful when you choose the region because you will have to pick the same one when you set up the bulk export in the GSC.

This part is not spoken about very often. If you wish to query two datasets like GSC and GA4 data, they need to be in the same region.

“For some areas like Europe and North America, you can query across the wider continental region but in places like Australia you can’t query across Melbourne and Sydney.

Both datasets need to be in the exact same location”

Sarah Crooke, BigQuery Consultant at Melorium, Australia, said:

Once the project is created, go to the Billing section. Use the search bar at the top to find it. Google Cloud does not have the most user-friendly interface without the search bar.

You need to create a billing account. Piece of advice before you proceed: Take the time to investigate if you don’t already have a billing account set up by someone else in the company.

Once that’s done, you can assign the billing account to your project. You need a billing account in order to set up the bulk export.

Please follow the instructions provided by the Google Cloud documentation to do so.

Then, you need to go to the APIs & Services section (again, you can use the search bar to find it).

Look for the Bigquery API. Enable it for the project you created.

One more step: You need to add a user. This will enable Google Search Console to dump the data in BigQuery. Here is the official documentation to do this.

Let’s break it down quickly: 

  • Navigate in the sidebar to IAM and Admin. The page should say Permissions for project <your_project>.
  • Click + GRANT ACCESS.
  • It will open a panel with Add principals.
  • In New Principals, put [email protected]
  • Select two roles: BigQuery Job User and BigQuery Data Editor. You can use the search bar to find them.
  • Save.

Lastly, select your project and copy the Cloud project ID associated with it.

You’re done in Google Cloud!

2. Setup The Bulk Data Export In The GSC Property Of Your Choice

Once the Google Cloud part is completed, you will need to activate the bulk data export to your new Google Cloud project directly in the Google Search Console.

To do so, go to the Settings section of the property you want to export data from and click on Bulk data export.

Paste the Cloud project ID of the project you created before. You can also customize the name of the dataset that the GSC will create in your project (it is “searchconsole” by default).

Lastly, pick the same dataset location that use chose for your Google Cloud project.

Once you are all set, click on Continue. The GSC will let you know if this initial setup is functional or not. The dataset will also be created in your project.

The data exports will start up to 48 hours later.

They are daily and include the data for the day of the setup. While API can be set to do scheduled pulls, it often requires additional programming.

This is why the bulk data export works for many big websites.

Keep in mind that the GSC can run into data export issues after this initial setup, in which case it is supposed to retry an export the following day.

We recommend you query your data in the first days to check if it is being stored properly.

So, What Next?

You can get started querying data now! Here are some things you can analyze that cannot be analyzed easily in another way:

  • Query multiple pages at once: In BigQuery, you can run a single SQL query to get metrics for all pages (or a subset of pages) without having to click through each one individually.
  • Traffic seasonality report: Compare performance metrics by season to identify trends and optimize campaigns accordingly.
  • Bulk analysis across multiple sites: If you manage a brand with more than one website, this allows you to look at clicks across all these sites at once.
  • Click-through rate (CTR) by page and query: Instead of just looking at the average CTR, you could calculate the CTR for each individual page and search query.

In summary

In summary, the built-in bulk data export feature from Google Search Console to Google’s BigQuery offers a more robust solution for data analytics in SEO.

However, there are limitations, such as the need to develop expertise in Google Cloud and SQL, and potential costs associated with BigQuery storage and queries.

More resources: 


Featured Image: Suvit Topaiboon/Shutterstock



منبع