Great post, and this is something we've faced as well. Luckily our jobs are main...

ejlangev · on June 27, 2014

Yeah, it tends to come from unresponsive external web services that crop up every once in a while. Having a couple of jobs fail that way isn't the end of the world for us, even if we don't retry them.

Yes, the situation you're describing is governed by the RESQUE_TERM_TIMEOUT option, which dictates how long the parent process waits after sending the TERM signal to the child before it sends a KILL signal. On Heroku you want that to be less than 10 seconds (and in practice more like 8 at most); otherwise Heroku will terminate both processes with a KILL signal at the same time.
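
To make the TERM-then-KILL sequence concrete, here is a rough Ruby sketch of the general pattern. It is not Resque's actual implementation: `stop_child` is a hypothetical helper name, and the polling interval is an arbitrary choice made for illustration.

```ruby
# Hypothetical helper (not part of Resque): ask a child process to stop
# with TERM, give it up to term_timeout seconds, then fall back to KILL.
def stop_child(pid, term_timeout)
  Process.kill("TERM", pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + term_timeout

  # Poll without blocking so the deadline can be enforced.
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    # With WNOHANG, Process.wait returns nil while the child is still running.
    return :exited_after_term if Process.wait(pid, Process::WNOHANG)
    sleep 0.1
  end

  # The child ignored TERM or is still cleaning up: force it down and reap it.
  Process.kill("KILL", pid)
  Process.wait(pid)
  :killed
end
```

With a 10-second shutdown window like the one described above, the value passed as `term_timeout` would need to stay under that window (for example 8), so the child gets its own KILL before the platform kills parent and child together.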