Webhook attempt timed out due to a backlog of concurrent requests


#1

This is driving me a little crazy. I've done extensive tests on our API and know we can handle up to around 300 requests/s before the server starts lagging.

I tried setting the concurrent requests to 5, 10, 50, and now it's at 100.

The test I do every time is to connect one bare account to the app and import 6000 new contacts with first and last names. Usually around 4700 contacts' worth of webhook batches get pushed to my API successfully. The other 1200 contacts time out and retry 8-9 times or more before going through.

I'm seeing this message in the Webhook monitoring: "Webhook attempt timed out due to a backlog of concurrent requests."

I don't understand. Does this mean my API timed out? Or does it mean there are too many batches being moved in HubSpot and it has to retry later? I assume it's the former.

If so, I have a few questions regarding the nature of the HubSpot Webhook Daemon:

  • How many actual requests per second does it send if I set 100 concurrent requests?
  • What is the actual timeout of the webhook? (I've seen 1 second here and 2 seconds there, no one really has an answer)

My API simply takes the JSON, parses it (which takes microseconds), then pushes the notifications into our queue system (which takes 1-10 ms at most). The DNS is through Cloudflare and I've done ample speed tests on the server. I also run around 6 replicas of the API, which load balances the whole thing out.
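For readers comparing setups, here is a minimal Python sketch of that handler shape: parse the batch, enqueue each notification, and return a status straight away so no slow work happens before the response. It is illustrative only and not the actual implementation; `work_queue` and `handle_webhook_batch` are hypothetical names, and an in-process `queue.Queue` stands in for the real queue system.

```python
import json
import queue

# Hypothetical in-process queue standing in for the real queue system.
work_queue: queue.Queue = queue.Queue()


def handle_webhook_batch(body: bytes, q: queue.Queue = work_queue) -> int:
    """Parse a webhook batch, enqueue each notification, and return
    the HTTP status code the endpoint should respond with."""
    try:
        notifications = json.loads(body)
    except ValueError:
        # Malformed JSON (or undecodable bytes): reject the request.
        return 400
    # Accept either a JSON array of notifications or a single object.
    if not isinstance(notifications, list):
        notifications = [notifications]
    for notification in notifications:
        q.put_nowait(notification)  # processing happens later, off the request path
    return 200
```

With a handler like this, the response time is dominated by network and connection handling rather than application work, which is worth keeping in mind when reading the timeout message.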

Need some light shed on these issues. When I have 500 users I don't want to be battling these problems.


#3

Hi @dedupely, that seems weird. It might be an issue with our concurrency limit. I'm going to reach out to my team on this and dig in. As a super short-term experiment, what happens if you keep ramping up your concurrency limit? Maybe to 200 / 300 / 400 + ? I'm definitely reaching out about this. This does not seem right at all.

For your second question, at the bottom of this doc you'll find the way we handle retries. I've pasted it below for simplicity's sake.

Retries

If your service has problems handling notifications at any time, we will attempt to re-send failed notifications up to 10 times.

We will retry in the following cases:

  • Connection Failed - If we cannot open an http connection to the provided webhook URL
  • Timeout - If your service takes longer than 2 seconds to send back a response to a batch of notifications
  • Error codes - If your service responds with an HTTP status code that is not one of: 400, 401, 403, 404, 405

Retries will occur with an exponential backoff based on the next attemptNumber. So the first retry will happen in 2 seconds, the second retry in 4 seconds, the third retry in 8 seconds, etc.
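To make the schedule concrete, here is a small Python sketch of the delays this implies. It assumes the doubling continues uncapped for all 10 retries; the text above only states the first three values, so the later ones are an extrapolation. `retry_delays` is a hypothetical helper, not part of any HubSpot API.

```python
def retry_delays(max_retries: int = 10, base_seconds: int = 2) -> list[int]:
    """Delay in seconds before each retry, assuming the n-th retry
    waits base_seconds ** n seconds (2, 4, 8, ...) with no cap."""
    return [base_seconds ** n for n in range(1, max_retries + 1)]


delays = retry_delays()
print(delays)       # [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
print(sum(delays))  # 2046
```

Under that assumption, a notification that only succeeds on its final retry would arrive roughly 34 minutes (2046 seconds) after the first attempt.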

Batches and retries: When retrying, notifications will only be included in a batch for the initial attempt, and the first two retries. After that, notifications will be sent in their own batch of one, to prevent any issues with a specific notification in the batch from preventing a separate notification from completing successfully.
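The batching rule above can be sketched in Python as follows. This is only an illustration of the rule as written, not HubSpot's implementation; `split_for_retry` is a hypothetical name, and `retry_number` counts from 0 for the initial attempt.

```python
def split_for_retry(notifications: list, retry_number: int) -> list[list]:
    """Return the batches that would be sent for a given attempt.

    retry_number 0 is the initial attempt. The initial attempt and the
    first two retries keep the batch together; later retries send each
    notification in its own batch of one.
    """
    if retry_number <= 2:
        return [list(notifications)]
    return [[n] for n in notifications]


print(split_for_retry(["a", "b", "c"], 0))  # [['a', 'b', 'c']]
print(split_for_retry(["a", "b", "c"], 3))  # [['a'], ['b'], ['c']]
```

One consequence for the receiving side: from the third retry onward, a failed batch of N notifications turns into N separate requests, so the request count can grow quickly once retries start.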


#4

I've noticed it eventually processes all the webhooks; it just takes a few retries. I've tried with 200+ and it gives pretty much the same results.


#5

Got it @dedupely. I'm going to still reach out to the team in charge of webhooks so we can get eyes on why something like this might be happening. Thanks for the info!


#6

No, thank you! I really need help on this. Want to make a big splash with the integration. Love that you guys give support on this stuff.


#8

Hi @dedupely, sorry for the long delay in responding on this topic. Our team has been hard at work identifying the root cause of this issue and why this error is occurring. We don't have an official update on the situation, but we do think this is a bug that is being experienced by other accounts as well. We are continuing to look into this and will update you as soon as we can.


#9

Thanks! This is awesome. I thought it was just me.


#10

Hello, I am having the same trouble. My webhook works fine: the details are pushed to the other system and the actions are done, but in the workflow it tries again and again until it works. This is probably why the details get pushed 3 times to the other system. I hope there will be a solution. Just wanted to let you know that I have the same problem.
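Until there is a fix, one common workaround for duplicate pushes is to make the receiving side idempotent, so that a retried notification is only acted on once. Below is a minimal Python sketch under stated assumptions: it assumes each notification carries a stable identifier in a field called `eventId` (an assumption; check your actual payload, and workflow webhooks may look different), and `SeenEvents` and `process_once` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class SeenEvents:
    """Bounded in-memory record of identifiers that were already handled."""

    def __init__(self, max_size: int = 10_000) -> None:
        self._seen: OrderedDict = OrderedDict()
        self._max_size = max_size

    def first_time(self, key: Hashable) -> bool:
        """Return True the first time a key is seen, False afterwards."""
        if key in self._seen:
            return False
        self._seen[key] = None
        if len(self._seen) > self._max_size:
            self._seen.popitem(last=False)  # evict the oldest entry
        return True


def process_once(notification: dict, seen: SeenEvents,
                 handler: Callable[[dict], None]) -> bool:
    """Run handler unless this notification was already handled.

    Returns True if the handler ran, False if it was skipped as a duplicate.
    """
    key = notification.get("eventId")  # assumed identifier field
    if key is not None and not seen.first_time(key):
        return False
    handler(notification)
    return True
```

With several API replicas, the seen-set would need to live in shared storage rather than in process memory; the in-memory version here only shows the idea.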