Server Logs & HubSpot


#1

Hello everyone, I’m hoping to get some help with HubSpot and server logs. Recently, requests from spoofed Googlebot or Bingbot variants began receiving 403 Forbidden errors. HubSpot support informed me this is a recent change that checks the IP address against the user-agent and serves a 403 to rogue requests. As recently as a month ago this was not the case. I’d like to look at our server logs primarily for this reason, but for other SEO reasons as well.

Unfortunately, I’m working with a site that has been in the hands of several different developers and platforms over the years, which has left our organization with a muddled understanding of how our site works.

We used to be on WordPress, but migrated to HubSpot. We shifted hosting over to Cloudflare, but many of the benefits from that switch were lost when we learned that HubSpot wanted an exorbitant sum of $$$ to support SSL. However, as it was not harming anything, Cloudflare remained. HubSpot seems to think we are not hosting our root on HubSpot servers, so support informed me that they cannot provide me with server logs. Here’s what they said (I was also asking about .htaccess access):

“We don’t provide any access to a .htaccess file as this would be a file for many HubSpot portal servers and not just your own. The same goes for general server logs, since we are only hosting content and not your root domain. I would recommend reaching out to your DNS provider, since they would have logs specific to your own root domain, which is information we simply cannot provide.”

Hmm. That’s not how I understood things to work, so I reached out to our most recent developer, who said:

"Hmmm, that’s a little strange they would say that. The DNS stuff is a bit convoluted for the domain but they are in fact hosting the site on their servers. Whoever you were talking to was probably confused because the nameservers are on cloudflare.

The reason for having cloudflare in the mix was when the site was still on wordpress we moved it to cloudflare for some additional security, to improve performance and get the site going through SSL and HTTP/2. Come to find out once the site was launched on hubspot there was some large sum of money they wanted to enable the site over SSL. Many of the benefits of cloudflare were lost at that point but it wasn’t doing any harm so cloudflare remained the nameservers.

I don’t recall how it worked but it seems like hubspot prefers that you use their nameservers so that may be why they are telling you that they are just hosting the content and not the root domain. There currently is a permanent redirect on the root domain that forces requests back over to www. Because of that, trying to find something on the root domain or saying that they aren’t hosting the root domain is sort of pointless: the root domain always goes back to www with a permanent redirect. If you have some URLs you’re aware of that aren’t doing that then you’ll need to look at rules on the cloudflare side or the hubspot side related to redirects and see what kinds of exceptions there are that are causing that. When I left I wasn’t aware of any sorts of exceptions at least on the cloudflare side."

Can anyone tell me where I ought to go to get server logs? Has anyone successfully done this before?


Server log request
#2

Hi @nchimonas

Can you tell me more about what information you’d be looking for with this? We would not be able to provide access to the raw server logs for a site hosted on HubSpot, but we may have another way to see what you’re looking for.


#3

I would like to know when any request with a Googlebot user-agent has hit our server, the status code returned for each request, and the source IP of each request, so I can separate real Googlebot from spoofed Googlebot and assess only the real requests.

I also want to know the crawl path of Googlebot, Bingbot, and various other crawlers: which resources they request, when, and with what status codes.
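
For anyone who does get hold of raw access logs (from a CDN, a proxy, or a previous host), here is a minimal sketch of the real-vs-spoof check described above. It assumes the logs are in Combined Log Format, which may not match your export, and it uses the reverse-then-forward DNS check that Google documents for verifying Googlebot. The names `LOG_LINE`, `is_real_googlebot`, and `googlebot_hits` are made up for this example, and the forward lookup shown only handles IPv4.

```python
import re
import socket

# Combined Log Format is an assumption about the log source; adjust the
# pattern to whatever format your logs are actually exported in.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Hostname suffixes a genuine Googlebot reverse lookup is expected to end in.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip):
    """Return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_dns(host):
    """Return the IPv4 addresses a hostname resolves to (empty on failure)."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except OSError:
        return []


def is_real_googlebot(ip, reverse=reverse_dns, forward=forward_dns):
    """Reverse lookup must land on a Google hostname, and the forward
    lookup of that hostname must return the original IP."""
    host = reverse(ip)
    if not host or not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    return ip in forward(host)


def googlebot_hits(lines, verify=is_real_googlebot):
    """Yield one dict per log line whose user-agent claims to be Googlebot,
    with a 'verified' flag saying whether the IP passed the DNS check."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "googlebot" in m["agent"].lower():
            yield {**m.groupdict(), "verified": verify(m["ip"])}
```

Feeding it the lines of a log file (for example `googlebot_hits(open("access.log"))`, where the file name is a placeholder) gives the time, path, and status code for each claimed Googlebot request, and the `verified` flag lets you drop the spoofed ones before analyzing crawl paths. The same approach should work for Bingbot with a different hostname suffix list.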


#4

We wouldn’t have anything that specific in the data we’d report on. You would be able to get some of that data for Googlebot using Google’s webmaster tools:
https://support.google.com/webmasters/answer/35120?hl=en

As for spoofed bots, we do block requests that use some known search bot user-agents but come from IPs not owned by the respective search engines, so any test requests using those user-agents will receive an error. We started blocking those spoofed requests on August 18th this year.
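
To illustrate the kind of check described here, a rough sketch follows. It is not HubSpot's actual implementation, which isn't public in this thread; it only shows the general idea of comparing a claimed bot user-agent against IP ranges. The function name `status_for` is hypothetical, and the ranges are placeholder documentation blocks, not real crawler ranges.

```python
import ipaddress

# Placeholder ranges taken from the RFC 5737 documentation blocks. These are
# NOT real crawler ranges; a real check would load the lists that each
# search engine publishes for its own crawlers.
BOT_RANGES = {
    "googlebot": [ipaddress.ip_network("192.0.2.0/24")],
    "bingbot": [ipaddress.ip_network("198.51.100.0/24")],
}


def status_for(user_agent, ip):
    """Return 403 when the user-agent claims to be a known search bot but
    the IP is outside that bot's ranges; otherwise return 200."""
    addr = ipaddress.ip_address(ip)
    agent = user_agent.lower()
    for bot, networks in BOT_RANGES.items():
        if bot in agent:
            return 200 if any(addr in net for net in networks) else 403
    # User-agents that don't claim to be a listed bot are not checked here.
    return 200
```

This is why a test request sent from your own machine with a Googlebot user-agent gets a 403: the user-agent matches a known bot, but your IP is not in that bot's ranges.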