Google: Crawl Budgets & Delays Not About Page Size
This past Friday at the Googleplex, John Mueller, Martin Splitt and Lizzi Harvey from the Google team hosted an office-hours session, and Martin MacDonald was there to represent the SEOs. He asked about crawl budget and whether a 10MB page can reduce crawl budget compared to a page of only 400KB. All the Googlers shook their heads no.
What does matter is the number of requests your server can handle. If Google detects that your server is slowing down as the number of requests grows, Google will back off a bit to make sure Googlebot isn’t the reason your server crashes. But page size really isn’t a direct factor in Google slowing down the crawl of your web site.
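To make the back-off idea concrete, here is a minimal Python sketch of a crawler throttle that slows down when a server responds slowly or returns errors, and speeds up again gradually when responses are prompt. It is only an illustration: the `CrawlThrottle` class, its thresholds and its multipliers are all made up for this example, and nothing here is Googlebot's actual logic, which Google has not published.

```python
from dataclasses import dataclass


@dataclass
class CrawlThrottle:
    """Hypothetical per-server throttle; not Googlebot's real logic."""
    delay: float = 1.0           # seconds to wait between requests (assumed start)
    min_delay: float = 0.5       # assumed lower bound
    max_delay: float = 60.0      # assumed upper bound
    slow_threshold: float = 2.0  # response time (s) treated as "server is struggling"

    def record(self, response_seconds: float, status: int = 200) -> float:
        """Update the delay after one fetch and return the new delay."""
        overloaded = response_seconds > self.slow_threshold or status >= 500
        if overloaded:
            # Back off quickly so the crawler is not what tips the server over.
            self.delay = min(self.delay * 2, self.max_delay)
        else:
            # Recover slowly while the server keeps answering promptly.
            self.delay = max(self.delay * 0.9, self.min_delay)
        return self.delay


throttle = CrawlThrottle()
for seconds in (0.3, 0.4, 3.5, 4.0, 0.3):
    print(f"response {seconds:.1f}s -> wait {throttle.record(seconds):.2f}s")
```

Note that page size never appears as an input in this sketch. Only response time and server errors move the delay, which mirrors the point the Googlers made: a large page matters only to the extent that it makes the server slow to respond.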
Here is the transcript, which starts at the 31:11 mark in the video (you can also scroll back a bit to hear more of the question):
Martin MacDonald: Is that [crawl budget] tied to a hard number of URLs… [or to] transfer size? Say a website reduces its pages from 10MB to 300KB. Would that dramatically increase the number of pages they can crawl?
John Mueller: I don’t think that would change anything.
Martin Splitt: It’s requests.
John Mueller: I mean, what sometimes happens is, if you have a large response, then it just takes longer for us to get that, and with that we’ll probably crawl them less, because we’re really trying to avoid having too many simultaneous connections to a server. So if you have a smaller response size, then obviously we can get more simultaneous requests and we could theoretically get more. But it’s not the case that if you reduce the size of your pages you suddenly solve problems.
Martin Splitt: It’s also that when the response takes a long time, it’s not just the size of the page, it is also the response time. Servers tend to respond slower when they are overloaded or about to be overloaded. So that’s also a signal that we’re picking up. Like, this takes a really long time to get data from the server, maybe we should look into the crawl limits of the host load on this particular server so that we’re not taking down the server.
John Mueller: We look at it on a per-server level. So if you have content from a CDN or from other networks, other places, then that would apply to their [servers]. Essentially, how slow an embedded resource is doesn’t really affect the rest of the content on the site.
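The per-server point can be illustrated with a short Python sketch that keeps response-time samples separately for each hostname, so a slow host serving embedded resources does not drag down the numbers for the host serving the pages. Treating the URL's hostname as "the server" is an assumption made for this example, as are the function name `mean_response_time_by_host` and the sample data. Google has not said how it identifies a server.

```python
from collections import defaultdict
from statistics import mean
from urllib.parse import urlsplit


def mean_response_time_by_host(fetch_log):
    """Group (url, seconds) samples by hostname and average each group."""
    samples = defaultdict(list)
    for url, seconds in fetch_log:
        samples[urlsplit(url).hostname].append(seconds)
    return {host: mean(times) for host, times in samples.items()}


# Made-up samples: the page host is fast, the host serving embedded images is slow.
fetch_log = [
    ("https://www.example.com/", 0.25),
    ("https://www.example.com/about", 0.75),
    ("https://cdn.example.net/hero.jpg", 4.0),
    ("https://cdn.example.net/logo.png", 6.0),
]
for host, seconds in mean_response_time_by_host(fetch_log).items():
    print(f"{host}: {seconds:.2f}s average")
```

Because each host gets its own average, any throttling decision based on these numbers would slow requests to the image host without touching the crawl of the main site.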