asemanfar - a blog about programming

Bluepill: a new process monitoring tool

November 04, 2009

Over the past several weeks, two co-workers and I wrote a simple process monitoring tool called bluepill, so named because it keeps things up. Bluepill replaces the process monitoring tool we had been using, which we had some issues with.

Bluepill is now running on over a dozen of our machines with no memory leaks. It's monitoring several of our services, including long-running rake tasks, background workers, unicorn processes, and pandemic workers. It's been idling at 18MB (see related post for graph).

Here is a sample of the DSL:

    Bluepill.application("app_name") do |app|
      app.process("process_name") do |process|
        process.start_command = "/usr/bin/some_start_command"
        process.pid_file = "/tmp/some_pid_file.pid"

        process.checks :cpu_usage, :every => 10.seconds, :below => 5, :times => 3
        process.checks :mem_usage, :every => 10.seconds, :below => 100.megabytes, :times => [3,5]
        process.checks :flapping, :times => 2, :within => 30.seconds, :retry_in => 7.seconds
      end
    end
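To give a feel for what an option like `:times => [3,5]` could mean, here is a rough, standalone sketch. It assumes the pair reads as "3 failing samples among the last 5" and that a bare `:times => 3` means three in a row; that reading is an assumption, not something spelled out in this post. `FailureWindow` and the memory readings are made up for illustration and are not part of bluepill.

```ruby
# Hypothetical helper, not part of bluepill: keeps the most recent check
# results and reports when enough of them have failed.
class FailureWindow
  def initialize(times)
    # Assumption: 3 is treated as [3, 3]; [3, 5] as 3 failures within 5 samples.
    @needed, @window = times.is_a?(Array) ? times : [times, times]
    @results = []
  end

  # Record one sample; `ok` is true when the reading was within the limit.
  def record(ok)
    @results << ok
    @results.shift while @results.size > @window
    self
  end

  # True once at least @needed of the retained samples have failed.
  def triggered?
    @results.count { |ok| !ok } >= @needed
  end
end

limit_mb = 100
window = FailureWindow.new([3, 5])
readings_mb = [90, 120, 95, 130, 140] # made-up memory samples
readings_mb.each { |mb| window.record(mb < limit_mb) }
puts window.triggered? # prints true: 3 of the last 5 samples were over the limit
```

Keeping only the last few results means a single spike does not trigger a restart, while a sustained problem still does.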

Check the readme for detailed installation and usage.

Read the code, fork it, and report bugs.

Comments

posted by plymouth on 11/05/09 12:51 AM UTC

this looks less like a "DSL" and more like a straight-forward API

posted by Arya Asemanfar on 11/05/09 04:19 AM UTC

You're right, it's not strictly a DSL, but it's not a straight-forward API either. If you look at the source, what we call the DSL is a standalone file that interacts with the rest of the classes to configure monitoring as you specified.

We separated the configuration code from the monitoring code so that the monitoring classes don't carry methods like checks, which accepts a condition name and options as arguments and generates a new ProcessCondition object for an incomplete Process object. The intention was to not litter our monitoring-logic code with configuration niceties; they're two separate concerns.
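A minimal sketch of that kind of separation, for illustration only: the class and method names below (`Condition`, `MonitoredProcess`, `ProcessProxy`, `configure_process`) are hypothetical and are not bluepill's actual classes. The proxy collects settings from the configuration block, and the monitoring object is only built once everything is known.

```ruby
# Monitoring side: plain objects with no configuration sugar.
# All names here are illustrative, not bluepill's real classes.
Condition = Struct.new(:name, :options)

class MonitoredProcess
  attr_reader :name, :start_command, :pid_file, :conditions

  def initialize(name, start_command:, pid_file:, conditions:)
    @name = name
    @start_command = start_command
    @pid_file = pid_file
    @conditions = conditions
  end
end

# Configuration side: collects settings, then builds a complete process.
class ProcessProxy
  attr_accessor :start_command, :pid_file

  def initialize(name)
    @name = name
    @checks = []
  end

  # The configuration nicety lives here, not on MonitoredProcess.
  def checks(name, options = {})
    @checks << Condition.new(name, options)
  end

  def to_process
    MonitoredProcess.new(@name,
                         start_command: start_command,
                         pid_file: pid_file,
                         conditions: @checks)
  end
end

# Entry point for the configuration block.
def configure_process(name)
  proxy = ProcessProxy.new(name)
  yield proxy
  proxy.to_process
end

process = configure_process("process_name") do |config|
  config.start_command = "/usr/bin/some_start_command"
  config.pid_file = "/tmp/some_pid_file.pid"
  config.checks :cpu_usage, :every => 10, :below => 5, :times => 3
end

puts process.conditions.map(&:name).inspect # prints [:cpu_usage]
```

In this sketch the monitoring object never exists in a half-configured state, and it has no `checks` method of its own.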

posted by Matt Hooks on 11/05/09 09:51 PM UTC

Awesome! I've been looking into process monitoring lately and both god and monit came up short. I'm looking forward to checking out bluepill.

posted by Jim Jeffers on 12/14/09 01:07 PM UTC

Hi there,

I'm assuming a lot of people may be interested in setting up BluePill in conjunction with Phusion Passenger, but I can't find anything regarding the subject online. I'd like to do this because, despite Passenger's process monitoring, I still run into frozen threads that aren't getting shut down when they should be. Just today I found a Passenger-spawned ruby instance that grew to 800 MB. I had to terminate it myself.

So I was wondering: how would you go about monitoring the instances of ruby that Passenger is spawning for an application, and how would you terminate them? It seems less straightforward because we only need to terminate the threads (so Passenger can take care of restarting them). Could you point me in the right direction?

posted by Arya Asemanfar on 12/15/09 05:48 PM UTC

Hi Jim,

I believe Passenger's ruby workers are processes and not threads; Passenger forks to create the ruby worker that processes Rails requests. Given that, you can monitor the children of the Passenger ApplicationSpawner. Check out the post on monitoring unicorn with bluepill to see how to monitor child processes.

Since Apache (or nginx) is the process that starts the Passenger ApplicationSpawner, it's going to be tricky to set a start command (you won't be able to have a functioning start command since there is no way to directly start the ApplicationSpawner). It's possible to change bluepill to make the start command optional, but it'd take a bit of reworking. You can create a ticket on the issues page for a feature request to support that.

Also, if you haven't looked at Unicorn, I recommend it over Passenger. If you have the ability to switch, take a look at it. It worked a lot better for us.

posted by Nick Howard on 01/18/10 09:41 PM UTC

Awesome. I was looking at using God to do some process monitoring, but after seeing bluepill I thought, "heck with God, I'll use this."

Also, the link to the readme on github is broken.
