asemanfar - a blog about programming

 

Posts tagged with "rails"

ActiveRecord bug with reload and new_record

January 27, 2010

In ActiveRecord, the new_record? method returns true when the object represents a record that has not been persisted. This is implemented with a trivial flag that is set when you initialize a record with Model.new. The reload method calls find with the model's primary key (usually id), replaces the attributes hash with the freshly retrieved one, and clears some internal caches. A bug arises when you call reload on a new record after you've set a value for its id.

If you initialize a new record, set its id, and then call reload, all the attributes are loaded from the database (if a row with that id exists), but the new-record flag is never cleared. The AR object is left in an inconsistent state: when you save it, ActiveRecord attempts an INSERT even though the object now mirrors a row that already exists, and the database rejects it with an error.

    record_a = Model.new(:some_attribute => "some value")
    record_b = Model.new(:some_attribute => "some other value")
    record_a.id = record_b.id = 1
    record_a.save
    record_b.reload # loads the row saved by record_a, but record_b.new_record? is still true
    record_b.save   # attempts an INSERT and raises ActiveRecord::RecordNotUnique

The fix is relatively simple: set @new_record to false in the reload method.

I should explain why I was doing this in the first place, and when it might happen to you. I ran into this in a highly concurrent environment where the primary key of a record is not an auto-increment column. When two processes both select a record that doesn't exist yet and then try to create it, they race to insert that record into the DB; the loser of that race gets ActiveRecord::RecordNotUnique. After this patch goes through, the loser will be able to call reload and carry on as if it had found the record. If you are writing an application where the primary key is not auto-increment, be aware of this race wherever you create records of that model, and rescue ActiveRecord::RecordNotUnique.
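To make the failure mode concrete without a database, here is a toy, in-memory model of the same bookkeeping. ToyRecord, PatchedToyRecord, ReloadClearsNewRecord, create_or_adopt and the TABLE hash are all hypothetical illustrations, not ActiveRecord internals and not the actual patch. The only ideas taken from the post are that reload should clear the new-record flag, and that a process losing the creation race can rescue the duplicate-key error and reload. It uses Module#prepend, so it assumes Ruby 2.0 or later.

```ruby
# Hypothetical stand-in for a table whose primary key is NOT auto-increment.
# None of these names are ActiveRecord APIs.
class RecordNotUnique < StandardError; end

class ToyRecord
  TABLE = {} # id => attributes hash; plays the role of the database

  attr_accessor :id
  attr_reader :attributes

  def initialize(attributes = {})
    @attributes = attributes.dup
    @new_record = true # mirrors the idea of ActiveRecord's @new_record flag
  end

  def new_record?
    @new_record
  end

  def save
    if new_record?
      # INSERT: fails if another "process" already created this id
      raise RecordNotUnique, "duplicate id #{id}" if TABLE.key?(id)
      @new_record = false
    end
    TABLE[id] = @attributes.dup # INSERT or UPDATE
    true
  end

  # Buggy behaviour: copies the stored attributes but leaves the flag alone.
  def reload
    @attributes = TABLE.fetch(id).dup
    self
  end
end

# The fix: after a reload the object mirrors a stored row, so it is
# no longer a new record.
module ReloadClearsNewRecord
  def reload
    result = super
    @new_record = false
    result
  end
end

class PatchedToyRecord < ToyRecord
  prepend ReloadClearsNewRecord
end

# Losing the creation race: adopt the row that the other process wrote.
def create_or_adopt(record)
  record.save
  record
rescue RecordNotUnique
  record.reload
end
```

With the unpatched class, create_or_adopt leaves the loser still flagged as new, so its next save raises again; with the patched class, the reloaded record saves as an UPDATE.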

ActiveRecord Attribute Delta Updates

August 05, 2009

Typically, when you update an attribute and save it using ActiveRecord in highly concurrent applications, you either have to use locking or be okay with clobbering the attribute with the last written value. In some cases, the nature of the data requires that locking be used if you want to maintain accuracy. But in other cases, specifically with numerical attributes, you can use SQL to express the difference you wish to apply.

    UPDATE `widgets` SET `number` = 5 WHERE `id` = 1;
    -- vs
    UPDATE `widgets` SET `number` = `number` + 5 WHERE `id` = 1;

So in the first case, if three clients each load the model from the database (while number is still 0), change the number attribute, and then save it, you get something like this:

    UPDATE `widgets` SET `number` = 3 WHERE `id` = 1;
    UPDATE `widgets` SET `number` = 9 WHERE `id` = 1;
    UPDATE `widgets` SET `number` = 4 WHERE `id` = 1;

In reality, the first client wanted to go from 0 to 3, the second from 0 to 9, and the last from 0 to 4. If the second and third clients had known about the previous updates, the end value would be 16 (3 + 9 + 4), but instead it's 4 because the last update overwrote the previous ones. Sometimes that is the intended behavior, but not always.
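The arithmetic above can be replayed in a few lines of Ruby. This is a plain simulation with no database involved; last_write_wins, apply_deltas, STALE_READ and TARGETS are illustrative names, not part of ActiveRecord or ar-deltas.

```ruby
# Every client read the same stale value before computing its target.
STALE_READ = 0
TARGETS = [3, 9, 4] # values that clients 1, 2 and 3 want to write

# UPDATE ... SET number = <target>: each write replaces the previous one.
def last_write_wins(current, targets)
  targets.reduce(current) { |_, target| target }
end

# UPDATE ... SET number = number + (target - stale_read): each write
# applies only its own difference to whatever is stored at that moment.
def apply_deltas(current, stale_read, targets)
  targets.reduce(current) { |value, target| value + (target - stale_read) }
end

puts last_write_wins(STALE_READ, TARGETS)          # prints 4
puts apply_deltas(STALE_READ, STALE_READ, TARGETS) # prints 16
```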

A solution to this problem is to make ActiveRecord use the alternate syntax, which applies the difference to the current value. Here's a gem that does that for you: ar-deltas.

    class Widget < ActiveRecord::Base
      delta_attributes :number
    end

    model = Widget.find(1)
    puts model.number #=> 0
    model.number += 5
    model.save

    # Before:
    # UPDATE `widgets` SET `number` = 5 WHERE `id` = 1;

    # After:
    # UPDATE `widgets` SET `number` = `number` + 5 WHERE `id` = 1;
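For a rough idea of the SQL involved, here's a sketch that turns a hash in the shape ActiveRecord's dirty tracking returns from changes ({"number" => [0, 5]}, i.e. column => [old, new]) into a delta UPDATE. delta_update_sql is a hypothetical helper, not how ar-deltas is actually implemented; it only handles numeric columns, and real code should let the connection adapter quote table and column names.

```ruby
# Hypothetical helper: build an UPDATE that applies differences instead of
# absolute values. changes maps column name => [old_value, new_value].
def delta_update_sql(table, id, changes)
  assignments = changes.map do |column, (old_value, new_value)|
    delta = new_value - old_value
    operator = delta < 0 ? "-" : "+"
    "`#{column}` = `#{column}` #{operator} #{delta.abs}"
  end
  # Integer() guards against a non-numeric id being interpolated into SQL.
  "UPDATE `#{table}` SET #{assignments.join(', ')} WHERE `id` = #{Integer(id)};"
end

puts delta_update_sql("widgets", 1, { "number" => [0, 5] })
# prints UPDATE `widgets` SET `number` = `number` + 5 WHERE `id` = 1;
```

Because each client sends only its own difference, the three clients from the earlier example would end at 16 instead of 4.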

Pandemic -- because information needs to be contagious

April 10, 2009

Pandemic is a map-reduce framework. You give it the map, process, and reduce methods and it handles the rest. It's designed to serve requests in real time, but it can also be used for offline tasks.

It differs from the typical map-reduce framework in that it doesn't have a master-worker structure: every node can map, process, and reduce. It also doesn't have the concept of jobs; everything is a request.

The framework is designed to be as flexible as possible. There is no rigid request format or API; you can structure requests however you want. You can send it HTTP-style headers and a body, you can send it JSON, or you can even send it a single line and have it do whatever you want. The only requirement is that you write your handler to act on the request appropriately and return the response.

Here is how you use it:

Server

    require 'rubygems'
    require 'pandemic'

    class Handler < Pandemic::ServerSide::Handler
      def process(body)
        body.reverse
      end
    end

    pandemic_server = epidemic!
    pandemic_server.handler = Handler.new
    pandemic_server.start.join

In this example, the handler doesn't define the map or reduce methods, and the defaults are used. The default for each is as follows:

  • map: Send the full request body to every connected node
  • process: Return the body (do nothing)
  • reduce: Concatenate all the responses
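To make the defaults concrete, here's an in-process simulation of a single request, assuming a two-node cluster like the sample config below. Nothing here is pandemic's API: handle_request is a hypothetical helper, and the nodes are plain lambdas rather than separate processes reached over the network.

```ruby
# Simulates one request under the default map/process/reduce behaviour.
def handle_request(body, nodes)
  # map: send the full request body to every node
  pieces = nodes.map { body }
  # process: each node runs its process step on the piece it received
  responses = nodes.zip(pieces).map { |node, piece| node.call(piece) }
  # reduce: concatenate all the responses
  responses.join
end

reverse_node = ->(body) { body.reverse } # like Handler#process above
puts handle_request("abc", [reverse_node, reverse_node]) # prints cbacba
```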

Client

    require 'rubygems'
    require 'pandemic'

    class TextFlipper
      include Pandemize
      def flip(str)
        pandemic.request(str)
      end
    end

Config

Both the server and client have config files:

    # pandemic_server.yml
    servers:
      - host1:4000
      - host2:4000
    response_timeout: 0.5

Each value in the servers list is the host:port that a node can bind to. The servers value can also be a hash or an array of hashes, but I'll get to that later. The response timeout is how long to wait for responses from the nodes before returning to the client.

    # pandemic_client.yml
    servers:
      - host1:4000
      - host2:4000
    max_connections_per_server: 10
    min_connections_per_server: 1
    response_timeout: 1

The min/max connection settings control how many connections the client keeps open to each node. If you're using the client in Rails, just use 1 for both, since Rails is single-threaded.

More Config

There are three ways to start a server:

  • ruby server.rb -i 0
  • ruby server.rb -i machine1hostname
  • ruby server.rb -a localhost:4000

The first refers to the index in the servers array:

    servers:
      - host1:4000 # started with ruby server.rb -i 0
      - host2:4000 # started with ruby server.rb -i 1

The second refers to a key in the servers hash. This is particularly useful if you use the hostname as the key.

    servers:
      machine1: host1:4000 # started with ruby server.rb -i machine1
      machine2: host2:4000 # started with ruby server.rb -i machine2

The third specifies the host and port explicitly. Make sure the host and port you specify are actually in the config; otherwise, the other nodes won't be able to communicate with it.
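The three start-up options amount to a lookup against the servers list. Here's a minimal sketch of that lookup, assuming the config has already been parsed from YAML; resolve_bind_address and its keyword arguments are hypothetical, not pandemic's code, and the sketch covers only the plain array and hash forms shown above (not the per-node options form).

```ruby
# servers is the parsed "servers" value: an Array of "host:port" strings,
# or a Hash of name => "host:port".
def resolve_bind_address(servers, index: nil, address: nil)
  known = servers.is_a?(Hash) ? servers.values : servers
  if address
    # -a host:port: must be listed so the other nodes can reach this one
    unless known.include?(address)
      raise ArgumentError, "#{address} is not listed under servers"
    end
    address
  elsif servers.is_a?(Hash)
    servers.fetch(index)          # -i machine1 (hash key)
  else
    servers.fetch(Integer(index)) # -i 0 (array index, given as a string)
  end
end

puts resolve_bind_address(["host1:4000", "host2:4000"], index: "1")
# prints host2:4000
```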

You can also set node-specific configuration options.

    servers:
      - host1:4000:
          database: pandemic_node_1
          host: localhost
          username: foobar
          password: f00bar
      - host2:4000:
          database: pandemic_node_2
          host: localhost
          username: fizzbuzz
          password: f1zzbuzz

And you can access these additional options using config.get(keys) in your handler:

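    require 'rubygems'
    require 'pandemic'
    require 'mysql' # MySQL/Ruby driver, provides Mysql.real_connect

    class Handler < Pandemic::ServerSide::Handler
      def initialize
        # read this node's own options from its entry in pandemic_server.yml
        @dbh = Mysql.real_connect(*config.get('host', 'username',
                                              'password', 'database'))
      end
    end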

Code: github repository

Install:
sudo gem sources -a http://gems.github.com
sudo gem install arya-pandemic

Request Queue via Mongrel Proctitle

February 11, 2009

In our cluster, we run many mongrels on several app servers, all behind a load balancer. To get an idea of how each app server and its mongrels are doing, we use rtomayko's mongrel_proctitle gem. The gem sets up a mongrel handler that extracts information from each incoming request and sets it as the process title, so when you run ps aux or top you can see each mongrel's queue length and request info:

    mongrel_rails [8000/1/4]: handling 127.0.0.1: GET /users
    # that's [port/queue length/requests handled]: handling client ip: request

Problem

Last week, we were having some misbehaving mongrels that would lock up and essentially stop serving requests. We'd look at the process titles and see something like this:

    mongrel_rails [8021/14/6123]: handling 127.0.0.1: GET /status

We have an action designated as a lightweight health check so the load balancer knows what's up, and according to this process title, that action was locking up the mongrel while its queue grew to around 14. This didn't make sense: why would the lightest-weight action lock up the mongrel?

Solution

With a little digging, I found that the mongrel_proctitle gem is broken. More precisely, it is only accurate when there is a single request being handled and none queued. The handler set the process title as soon as it received a request and then passed that client's thread on to the next handler, which happens to be the synchronized Rails handler. So the process title shows the most recently received request, not the request currently being handled.

With a few small modifications, we now have a more accurate process title that also shows the rest of the queue (up to a character limit) as a comma-separated list:

    mongrel_rails [8021/2/6123]: handling 127.0.0.1: GET /users, 127.0.0.1: GET /status
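The bookkeeping behind this kind of title can be sketched as follows: a request joins a pending list as soon as it arrives (before it waits on the Rails lock) and leaves it when Rails finishes, so the head of the list approximates the request being handled and the rest is the queue. RequestQueueTitle, track, TITLE_LIMIT and the "idle" label are assumptions for illustration, not the code in arya-mongrel_proctitle, and the Mongrel handler wiring is left out.

```ruby
require 'thread'

# Hypothetical sketch of a queue-aware process title.
class RequestQueueTitle
  TITLE_LIMIT = 256 # assumed character limit for the request list

  def initialize(port)
    @port    = port
    @pending = [] # descriptions of received requests, oldest first
    @handled = 0
    @mutex   = Mutex.new
  end

  # Wraps one request: joins the list on arrival, leaves it when done.
  def track(description)
    @mutex.synchronize { @pending.push(description); refresh }
    begin
      yield
    ensure
      @mutex.synchronize do
        i = @pending.index { |d| d.equal?(description) }
        @pending.delete_at(i) if i
        @handled += 1
        refresh
      end
    end
  end

  def title
    @mutex.synchronize { build_title }
  end

  private

  def refresh
    $0 = build_title # the process title that ps and top display
  end

  def build_title
    state = @pending.empty? ? "idle" : "handling #{@pending.join(', ')}"
    "mongrel_rails [#{@port}/#{@pending.size}/#{@handled}]: #{state[0, TITLE_LIMIT]}"
  end
end
```

Because a mutex does not guarantee first-come, first-served ordering, the head of the list is only usually the request being handled, which is a race in the same spirit as the one noted below.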

The updated version is available on github and also as a gem at gems.github.com as arya-mongrel_proctitle:

    gem sources -a http://gems.github.com
    sudo gem install arya-mongrel_proctitle

Note that, as described in the comments in the code, it's still not exact. There is at least one race condition in which the list won't be ordered accurately, but it's unlikely to occur, and the title still gives a good idea of what's going on.