As a Rails freelancer, I am often called in to rescue apps with broken cron jobs, which tend to cause severe bugs. I definitely wanted to avoid those issues when bootstrapping WiseCash, and to make sure I could sleep well at night or go away on weekends without having my family life disturbed. Here is what I do so far to achieve that goal.
Keeping an eye on your scheduled tasks with Proby
Proby monitors the execution of your cron jobs and notifies you if one didn’t run or ended with an error.
Each cron job has a unique id and you just wrap the cron call with the proby script:
0 7 * * 0 PROBY_TASK_ID=XYZT /yourapp/current/script/proby 'cd /yourapp/current && RAILS_ENV=production bundle exec rake cron:remind'
Cron jobs may not start for various reasons (RVM setup, missing dependency, …) and may also start but raise errors. You’ll get notified by email in both cases — put appropriate email routing rules in place to make sure you can get a proper notification.
Note that I had to tweak the proby script a bit to make sure errors were reported as such, in my case (see my gist).
Getting detailed error notifications for cron jobs
If you use an error notification service, make sure it properly handles rake tasks (or whatever form your cron jobs take). If you are unsure, run a real test and verify what happens when a task raises, because I found out that not all notification services handle rake tasks properly.
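One way to run such a test is to raise on purpose and check that the error is both reported and still propagated. The following is a minimal plain-Ruby sketch under stated assumptions: `FakeNotifier` and `ErrorReporting.guard` are hypothetical names standing in for whichever service client you use, not the API of any real library.

```ruby
# Hypothetical stand-in for an error notification client; a real service
# would send the exception over the network instead of storing it.
class FakeNotifier
  attr_reader :reported

  def initialize
    @reported = []
  end

  def notify(exception)
    @reported << exception
  end
end

module ErrorReporting
  # Runs the block, reports any exception to the notifier, then re-raises
  # so the process still fails visibly for the cron monitor.
  def self.guard(notifier)
    yield
  rescue StandardError => e
    notifier.notify(e)
    raise
  end
end

# Deliberate failure, as the body of a throwaway test task would do.
notifier = FakeNotifier.new
begin
  ErrorReporting.guard(notifier) { raise "deliberate failure to test cron error reporting" }
rescue RuntimeError => e
  puts "re-raised: #{e.message}"
end
puts "reported: #{notifier.reported.size}"
```

Re-raising matters here: if the wrapper swallowed the exception, the job would look successful to whatever monitors the cron run.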
This is a must-have complement to Proby to diagnose and fix errors — HoneyBadger helped me pinpoint a very weird interaction between my codebase and NewRelic for instance, which only happened in the cron rake tasks and only in production.
Making sure you can resume the processing after an error
Currently my cron jobs “fail-fast” when an exception occurs. This lets me investigate and resume manually once I fully understand what happened, which I prefer to looping over the remaining records and generating hundreds of errors.
I write those jobs in a more or less idempotent fashion, using simple timestamps to keep track of what was processed, for instance:
def self.remind(user)
  now = Time.now.utc
  today = now.to_date
  unless user.last_reminded_at.try(:to_date) == today
    send_the_reminder(user)
    user.update_attribute(:last_reminded_at, now)
  end
end
Using this pattern, if I relaunch the process manually after investigating an error, already processed users won’t receive a duplicate.
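To see the fail-fast and resume behaviour end to end, here is a small simulation in plain Ruby, with no database involved. `DemoUser`, `ReminderRun` and the `fail_for` option are hypothetical names made up for this sketch; the guard is the same timestamp check as above.

```ruby
require "date" # provides Time#to_date outside of Rails

# Plain-Ruby stand-in for the User model (hypothetical).
DemoUser = Struct.new(:name, :last_reminded_at)

class ReminderRun
  attr_reader :sent

  def initialize(fail_for: nil)
    @fail_for = fail_for # name of a user whose delivery should raise
    @sent = []
  end

  # Skip users already reminded today, otherwise send and record the timestamp.
  def remind(user, now = Time.now.utc)
    return if user.last_reminded_at && user.last_reminded_at.to_date == now.to_date

    raise "delivery failed for #{user.name}" if user.name == @fail_for
    @sent << user.name
    user.last_reminded_at = now
  end

  # Fail fast: the first exception aborts the whole run.
  def run(users, now = Time.now.utc)
    users.each { |user| remind(user, now) }
  end
end

now = Time.utc(2024, 1, 7, 7, 0, 0)
users = %w[alice bob carol].map { |name| DemoUser.new(name, nil) }

first = ReminderRun.new(fail_for: "bob")
begin
  first.run(users, now)
rescue RuntimeError => e
  puts "aborted: #{e.message}"
end

second = ReminderRun.new # relaunched manually once the problem is understood
second.run(users, now)

puts first.sent.inspect  # => ["alice"]
puts second.sent.inspect # => ["bob", "carol"]
```

The timestamp is written only after the reminder went out, so a failure in the middle of a run leaves the failing user eligible for the next attempt.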
Keeping memory consumption low
A fairly common Rails pattern seen in cron jobs is to load all the records at once then process them:
User.all.each do |user|
  CronJobs::WeeklyReminder.process!(user)
end
Such code often goes unnoticed for years, until one day when the server blows up because too many records get loaded in memory at once.
Instead, make sure to use statements like find_each or find_in_batches that load a limited number of records at once:
User.find_each do |user|
  CronJobs::WeeklyReminder.process!(user)
end
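To make the difference concrete, here is a simplified plain-Ruby model of batched iteration: fetch a limited page ordered by id, process it, then continue after the last id seen. It is only a sketch of the idea, not Rails' actual implementation; `FakeTable`, `fetch` and `each_in_batches` are hypothetical names, and ids are assumed to be positive integers.

```ruby
# Hypothetical in-memory "table": rows are hashes with an integer :id.
class FakeTable
  attr_reader :max_loaded

  def initialize(rows)
    @rows = rows.sort_by { |row| row[:id] }
    @max_loaded = 0 # largest number of rows returned by a single fetch
  end

  # Models "WHERE id > after_id ORDER BY id LIMIT limit".
  def fetch(after_id:, limit:)
    batch = @rows.select { |row| row[:id] > after_id }.first(limit)
    @max_loaded = [@max_loaded, batch.size].max
    batch
  end

  # Yields every row while holding at most batch_size rows at a time.
  def each_in_batches(batch_size: 1000)
    last_id = 0
    loop do
      batch = fetch(after_id: last_id, limit: batch_size)
      break if batch.empty?

      batch.each { |row| yield row }
      last_id = batch.last[:id]
    end
  end
end

table = FakeTable.new((1..2500).map { |id| { id: id } })
seen = 0
table.each_in_batches(batch_size: 1000) { |_row| seen += 1 }

puts seen             # => 2500
puts table.max_loaded # => 1000
```

In Rails itself, find_each and find_in_batches default to batches of 1000 records and accept a batch_size option if that turns out to be too large or too small for your records.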
So far these simple tips have helped me sleep very well at night, and they handle my current load pretty well too. I hope they will be helpful to you too!
Thanks for sharing this article around!