Google On How Googlebot Handles AI Generated Content

Ammon Johns asked the question, which was read by Ulrika Viberg.

If it’s an empty page, then we might be like, we don’t know.

Google’s crawler, Googlebot, also downloads the HTML, images, CSS and JavaScript files to render the webpage.

How Google Handles AI Generated Content


Is it likely that rendering processes might have to be simplified?”

If we see, okay, this looks like absolute.. we can be very certain that this is crap, and the JavaScript might just add more crap, then bye.

The algorithm was not created to find low quality machine generated content, but the researchers discovered that it identified such content automatically.

“So we are doing quality detection or quality control at multiple stages, and most s****y content doesn’t necessarily need JavaScript to show us how s****y it is.

What Ammon apparently wants to know is whether any special processes are happening in response to AI content in order to deal with the increased crawling and rendering load.

“This paper posits that detectors trained to discriminate human vs. machine-written text are effective predictors of webpages’ language quality, outperforming a baseline supervised spam classifier.”

“…we are doing quality detection or quality control at multiple stages…

Webpage rendering is the process of creating the webpage in a browser by downloading the HTML, images, CSS and JavaScript then putting it all together into a webpage.
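To make the "downloading the HTML, images, CSS and JavaScript" step concrete, here is a minimal sketch that lists the sub-resources a renderer would need to fetch for a page. It is an illustration only and not a description of how Googlebot is implemented: the class name `SubresourceCollector`, the function `list_subresources`, and the sample markup are all hypothetical, and the sketch uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class SubresourceCollector(HTMLParser):
    """Hypothetical helper: collects URLs of images, CSS and JavaScript in a page."""

    def __init__(self):
        super().__init__()
        self.resources = {"images": [], "css": [], "javascript": []}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img" and attr.get("src"):
            self.resources["images"].append(attr["src"])
        elif tag == "script" and attr.get("src"):
            self.resources["javascript"].append(attr["src"])
        elif tag == "link" and attr.get("href"):
            # Only <link rel="stylesheet"> points at CSS.
            rel_values = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel_values:
                self.resources["css"].append(attr["href"])


def list_subresources(html):
    """Return the sub-resource URLs referenced by an HTML string."""
    collector = SubresourceCollector()
    collector.feed(html)
    collector.close()
    return collector.resources


sample = (
    '<html><head><link rel="stylesheet" href="/site.css">'
    '<script src="/app.js"></script></head>'
    '<body><img src="/logo.png"></body></html>'
)
print(list_subresources(sample))
```

A real renderer would go on to fetch each of these files, execute the JavaScript, and lay out the page. The sketch stops at listing what has to be downloaded.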

They said, content production increases due to AI, putting increasing loads on crawling and rendering.

AI might increase the scale, but doesn’t change that much. Rendering is not the culprit here.”

Quality Detection Applies To AI

In it the researchers observe:

Martin Splitt did not say that Google was applying AI detection on the content.

Martin’s answer provided insights into how Google handles AI generated content and the role of quality control.

Googlebot Webpage Rendering

This is very interesting because Search Engine Journal published an article about a quality detection algorithm that also detects low quality AI content.

The algorithm was designed to detect machine generated content, and it also detects low quality content in general.

And then, when rendering comes back with crap, we’re like, yeah okay, fair enough, this has been crap.

“…we’re rolling out a series of improvements to Search to make it easier for people to find helpful content made by, and for, people.”

Watch the Duda webinar featuring Martin Splitt at the 35:50 minute mark:

Exploring the Art of Rendering with Google’s Martin Splitt

One of the audience members asked whether the large amount of AI content had an effect on Google’s ability to render pages at the point of crawling.

So, this is already happening. This is not something new.

Martin next addresses the obvious issue with AI content that SEOs wonder about, which is detecting it.

Danny Sullivan wrote about the Helpful Content algorithm:

What Martin seems to be saying is that:

  1. There’s nothing new being applied for AI content
  2. Google uses quality detection for both human and AI content

So, this is already happening. This is not something new.

Martin continued:

I see it a lot.

People usually don’t put empty pages here, so let’s at least try to render.

Circling back to what Martin Splitt said:

“No, I don’t think so, because my best guess is…”

AI might increase the scale, but doesn’t change that much.”

Here is the question:

The research paper is titled, Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study.

He didn’t mention content written by people just once, though. His article announcing the Helpful Content system mentioned it three times.

Google’s Martin Splitt was asked how Googlebot’s crawling and rendering was adapting to the increase in AI generated content.

Martin offered an explanation but he also added information about how Google decides at crawl time whether a webpage is low quality and what Google does after a determination.
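The flow Martin describes (judge quality early, skip rendering when the page is clearly bad, render when unsure, then judge again) can be pictured with a small sketch. This is purely illustrative and is not Google's implementation: the names `Verdict`, `toy_quality_check` and `process_page`, the repeated-word heuristic, and the 0.3 threshold are all assumptions invented for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Verdict(Enum):
    LOW_QUALITY = "low_quality"
    UNKNOWN = "unknown"
    ACCEPTABLE = "acceptable"


@dataclass
class PageResult:
    rendered: bool
    verdict: Verdict


def toy_quality_check(text: str) -> Verdict:
    """Hypothetical stand-in for a quality classifier (not a real signal)."""
    words = text.split()
    if not words:
        # Empty page: nothing to judge yet.
        return Verdict.UNKNOWN
    unique_ratio = len(set(words)) / len(words)
    # Assumed threshold: heavily repeated text is treated as low quality.
    return Verdict.LOW_QUALITY if unique_ratio < 0.3 else Verdict.ACCEPTABLE


def process_page(raw_text: str, render: Callable[[str], str]) -> PageResult:
    """Check quality before rendering, and again on the rendered output."""
    first_pass = toy_quality_check(raw_text)
    if first_pass is Verdict.LOW_QUALITY:
        # Caught before rendering, so the rendering cost is never paid.
        return PageResult(rendered=False, verdict=first_pass)
    # Unknown or acceptable: render, then evaluate what came back.
    rendered_text = render(raw_text)
    return PageResult(rendered=True, verdict=toy_quality_check(rendered_text))


if __name__ == "__main__":
    # An empty page is rendered first, then judged on the rendered text.
    print(process_page("", lambda text: "buy buy buy buy"))
```

The point of the design is ordering: the inexpensive check runs first, so the expensive rendering step only happens for pages that have not already been ruled out.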

Martin Splitt replied:

“So, we have one from Ammon as well, and this is something that is talked about a lot.

He said that Google was using Quality Detection at multiple stages.

So, if we catch that it is s****y content before, then we skip rendering, what’s the point?

Martin made his comments in a webinar called Exploring the Art of Rendering with Google’s Martin Splitt, which was produced by Duda.

Much about this algorithm tracks with everything Google announced about its Helpful Content system, which is designed to identify content that is written by people.