Writing Reliable Cron Jobs

November 30, 2013

As a Rails freelancer, I often rescue apps with broken cron jobs, which tend to cause severe bugs. I wanted to avoid those issues when bootstrapping WiseCash, so that I could sleep well at night and go away on weekends without having my family life disturbed. Here is what I do so far to achieve that goal.

Keeping an eye on your scheduled tasks with Proby

Proby monitors the execution of your cron jobs and notifies you if one didn’t run or ended with an error.

Each cron job has a unique id and you just wrap the cron call with the proby script:

0 7 * * 0 PROBY_TASK_ID=XYZT /yourapp/current/script/proby 'cd /yourapp/current && RAILS_ENV=production bundle exec rake cron:remind'

Cron jobs may fail to start for various reasons (RVM setup, missing dependency, …), or start and then raise errors. You’ll get notified by email in both cases, so put appropriate email routing rules in place to make sure the notification actually reaches you.

Note that I had to tweak the proby script a bit to make sure errors were reported as such, in my case (see my gist).

Getting detailed error notifications for cron jobs

If you use an error notification service, make sure it properly handles rake tasks (or whatever form your cron jobs take). If you are unsure, run a real test and verify what happens when an error occurs, because I found out that not all notification services handle rake tasks properly.
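One way to run such a test is a rake task that fails on purpose, scheduled once through cron like any other job. The sketch below is a hypothetical example: the task name cron:fail_on_purpose and the error message are made up for illustration, and in a Rails app the task would normally also depend on :environment so that the error notifier is loaded.

```ruby
require "rake"

namespace :cron do
  desc "Raise on purpose to check that errors in rake tasks get reported"
  task :fail_on_purpose do
    # Hypothetical message; anything recognizable in the notification will do.
    raise "Deliberate failure to test cron error reporting"
  end
end
```

Run it from cron exactly as the real jobs run (same user, same environment), then check that the failure shows up where you expect it.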

For WiseCash I currently use HoneyBadger, whose built-in rake integration works well for me.

This is a must-have complement to Proby to diagnose and fix errors — HoneyBadger helped me pinpoint a very weird interaction between my codebase and NewRelic for instance, which only happened in the cron rake tasks and only in production.

Making sure you can resume the processing after an error

Currently my cron jobs “fail-fast” when an exception occurs. This lets me investigate and resume manually once I fully understand what happened, which I prefer to looping on and generating hundreds of errors.

I write those jobs in a more or less idempotent fashion, using simple timestamps to keep track of what was processed, for instance:

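def self.remind(user)
  now = Time.now.utc
  today = now.to_date
  # Skip users already reminded today, so a manual re-run sends no duplicates.
  unless user.last_reminded_at.try(:to_date) == today
    send_the_reminder(user)
    # Recorded only after the send succeeded; a failure above leaves the user eligible for the next run.
    user.update_attribute(:last_reminded_at, now)
  end
end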

Using this pattern, if I relaunch the process manually after investigating an error, already processed users won’t receive a duplicate.
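To make the fail-fast and resume behaviour concrete, here is a self-contained sketch that runs without Rails. Everything in it is an assumption for illustration: Subscriber is a plain Struct standing in for the user model, and the sender is passed in as a lambda so a failure can be simulated.

```ruby
require "date" # provides Time#to_date outside Rails

# Hypothetical stand-in for the user model: only the fields the sketch needs.
Subscriber = Struct.new(:email, :last_reminded_at)

# Sends a reminder unless one was already recorded for the same UTC day.
# Returns true when a reminder was sent, false when the user was skipped.
def remind(user, sender, now = Time.now.utc)
  last = user.last_reminded_at
  return false if last && last.to_date == now.to_date

  sender.call(user)           # may raise; the timestamp is then left untouched
  user.last_reminded_at = now # reached only when sending succeeded
  true
end

# Fail-fast loop: the first exception stops the run and propagates to the caller.
def remind_all(users, sender, now = Time.now.utc)
  users.each { |user| remind(user, sender, now) }
end
```

If the sender raises for the second user, the first one keeps its timestamp and the others stay untouched. Relaunching remind_all on the same day then picks up from the user that failed.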

Keeping memory consumption low

A fairly common Rails pattern seen in cron jobs is to load all the records at once then process them:

User.all.each do |user|
  CronJobs::WeeklyReminder.process!(user)
end

Such code often goes unnoticed for years, until one day when the server blows up because too many records get loaded in memory at once.

Instead, make sure to use methods like find_each or find_in_batches, which load a limited number of records at once:

User.find_each do |user|
  CronJobs::WeeklyReminder.process!(user)
end
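For intuition, the sketch below imitates the batching idea in plain Ruby: fetch a limited number of rows ordered by id, process them, then continue after the last id seen. It is not the Rails implementation. The each_in_batches helper and the in-memory rows array are hypothetical, and ids are assumed to be positive integers.

```ruby
# rows: an array of hashes with an :id key, standing in for a database table.
# Yields each row while holding at most batch_size rows per batch.
# Returns the number of batches fetched.
def each_in_batches(rows, batch_size: 1000)
  last_id = 0
  batches = 0
  loop do
    # Stand-in for a query such as:
    #   SELECT * FROM users WHERE id > last_id ORDER BY id LIMIT batch_size
    batch = rows.select { |row| row[:id] > last_id }
                .sort_by { |row| row[:id] }
                .first(batch_size)
    break if batch.empty?

    batches += 1
    batch.each { |row| yield row }
    last_id = batch.last[:id]
  end
  batches
end
```

With 25 rows and a batch size of 10, the block sees every row once, in id order, across three batches.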

So far these simple tips have helped me sleep very well at night, and handle my current load pretty well too. I hope they will be helpful to you too!

Thibaut Barrère (WiseCash founder)

@thibaut_barrere

thibaut@wisecashhq.com


