Hello there,
I am a long-time Sidekiq user who is considering using solid_queue instead.
One thing that I haven't seen much discussion of is that solid_queue doesn't have the concept of 'the set of jobs that are currently retrying', i.e. jobs that have failed at least once and have a retry scheduled (or currently running), but have not yet 'failed'.
This bothers me because, in the past, monitoring that set has been the most useful signal from an operational perspective.
Generally what I would do is the following:
1. Gather metrics on the number of jobs in each state and generate alerts for spikes. So I was able to be alerted if there was a spike in the number of jobs retrying. (Alerting on a spike in the number of failed jobs would be too late.)
2. Once I was alerted that there were more jobs retrying than usual, I would find the Retries tab in Sidekiq very useful, as I could see at a glance:
   - Which job types are failing
   - The retry count
   - What error was encountered to cause the retry
3. For further investigation I would look at our error tracking tool.
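The alerting in step 1 can be sketched roughly as follows. This is a minimal illustration only: `spike?`, `factor` and `min_count` are hypothetical names and thresholds, not part of Sidekiq or solid_queue, and a real setup would more likely express this rule in the metrics system itself.

```ruby
# Hypothetical helper: flags a spike when the latest "retrying" count
# exceeds the mean of earlier samples by a chosen factor.
# history   - earlier samples of the retrying-jobs count
# current   - the latest sample
# factor    - how many times above the baseline counts as a spike (assumed)
# min_count - ignore tiny absolute numbers to avoid noisy alerts (assumed)
def spike?(history, current, factor: 3.0, min_count: 10)
  return false if history.empty? || current < min_count

  baseline = history.sum.to_f / history.size
  current > baseline * factor
end

# Example: the retrying count jumps from around 5 to 40.
alert = spike?([4, 6, 5, 5], 40)
```

The point of the sketch is only that the input is the count of retrying jobs, sampled over time, rather than the count of failed jobs.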
However, with solid_queue I can only monitor failed jobs (which is too late) or scheduled jobs (which mixes genuinely scheduled jobs with retries and gives no access to the last error).
I suspect that the reason behind this is that solid_queue relies on ActiveJob for retries, whereas other backends have their own retry mechanism, but given that 'executions' and 'exception_executions' are available in the serialized arguments, it might be possible to add support for this.
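For illustration, here is a minimal sketch of reading those two fields out of a serialized job payload. The payload below is hand-written and hypothetical (the job class and exception are made up), and `retry_info` is a hypothetical helper. It assumes `exception_executions` is a hash keyed by the string form of the exception list passed to `retry_on`, which is how ActiveJob appears to record it.

```ruby
require "json"

# Hypothetical payload, shaped like the hash ActiveJob serializes for a job.
# Only the fields relevant to retries are included here.
payload = JSON.generate(
  "job_class" => "SyncAccountJob",
  "executions" => 2,
  "exception_executions" => { "[Net::ReadTimeout]" => 2 }
)

# Hypothetical helper: extracts retry-related fields from the serialized job.
def retry_info(serialized)
  data = JSON.parse(serialized)
  {
    job_class: data["job_class"],
    executions: data.fetch("executions", 0).to_i,
    # Counts per exception group; the error message itself is not stored.
    exception_counts: data["exception_executions"] || {}
  }
end

info = retry_info(payload)
```

Under these assumptions the payload would give the job type, the attempt count and the exception class, but not the last error message.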
Is anything like this planned?
In the meantime, I have been trying to do something for step 1 above with a query like:
SolidQueue::Job.where("scheduled_at > NOW() AND (arguments::jsonb->>'executions')::int > 1").count
But:
- It sometimes reports more than the number of retrying jobs that I can see in the Scheduled Jobs tab of Mission Control, and I'm not sure why
- I am unsure about the performance of such a query
- I am unsure how future-proof such an approach might be
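To make the intent of that query concrete, here is a minimal sketch of the same filter in plain Ruby, grouped by job class. `JobRow` is a hypothetical stand-in for the `class_name`, `scheduled_at` and `arguments` values read from `SolidQueue::Job`, and `retrying_counts` is a hypothetical helper. The threshold is a parameter because the right value is an assumption: ActiveJob appears to increment `executions` at the start of each attempt, so a job waiting for its first retry may be stored with `executions` equal to 1, which a `> 1` comparison would skip.

```ruby
require "json"

# Hypothetical row shape standing in for values read from SolidQueue::Job.
JobRow = Struct.new(:class_name, :scheduled_at, :arguments, keyword_init: true)

# Counts jobs that are scheduled in the future and have already been
# attempted at least `min_executions` times, grouped by job class.
def retrying_counts(rows, now: Time.now, min_executions: 1)
  rows.each_with_object(Hash.new(0)) do |row, counts|
    # Skip rows that are not waiting on a future run.
    next if row.scheduled_at.nil? || row.scheduled_at <= now

    executions = JSON.parse(row.arguments).fetch("executions", 0).to_i
    counts[row.class_name] += 1 if executions >= min_executions
  end
end

now = Time.at(1_000)
rows = [
  JobRow.new(class_name: "SyncAccountJob", scheduled_at: Time.at(1_060), arguments: '{"executions":2}'),
  JobRow.new(class_name: "SyncAccountJob", scheduled_at: Time.at(1_120), arguments: '{"executions":1}'),
  # Scheduled for later but never attempted, so not a retry.
  JobRow.new(class_name: "DigestJob", scheduled_at: Time.at(5_000), arguments: '{"executions":0}'),
  # Already due, so excluded by the scheduled_at filter.
  JobRow.new(class_name: "CleanupJob", scheduled_at: Time.at(900), arguments: '{"executions":3}')
]

counts = retrying_counts(rows, now: now)
```

This sketch filters in application code, so it says nothing about the performance of doing the same in SQL, and it relies on the serialized payload format, which is not a documented contract of solid_queue.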