July 21, 2009

Boston Meet-up.

I’m headed to Boston next week and planning a meet-up next Tuesday (7/28) at 7pm at Cambridge Brewing Co. Drop by for a beer, some food and maybe a little Erlang.

June 5, 2009

Sending CouchDB Update Notifications to RabbitMQ.

Working at Cloudant, I use CouchDB on a daily basis. This evening, for fun, I decided to write some Ruby to take CouchDB’s update notifications and push them into RabbitMQ. There are other examples of using update notifications with Ruby, such as the view updater on the CouchDB wiki. It turned out to be super simple. There are a few AMQP libraries for Ruby; in this example I am going to use carrot, which is based on the amqp library but without all the EventMachine stuff. So here goes:

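couch_amqp.rb:

#!/usr/bin/ruby
# Reads CouchDB update notifications from stdin (one JSON object per
# line) and publishes each one to a RabbitMQ queue using carrot.

require 'rubygems'
require 'carrot'

def main
  queue = "couchdb"
  # By default the connection is to a broker on localhost.
  couchq = Carrot.queue(:queue => queue)

  # gets returns nil once CouchDB closes our stdin, which ends the loop.
  while (notification = $stdin.gets)
    couchq.publish(notification)
  end
end

main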

As you can see, we connect to a queue called “couchdb”; by default the broker is on localhost. Next, a loop continually reads updates from stdin and publishes each notification to the queue, and that’s that. To get the messages back out of the queue I used irb and carrot.

[user@host ~]$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'carrot'
=> true
irb(main):003:0> couchq = Carrot.queue(:queue => "couchdb")
=> #<Carrot::AMQP::Queue:0x7f8d2284b640 <snip>
irb(main):004:0> couchq.pop
=> "{\"type\":\"updated\",\"db\":\"test1\"}\n"
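What comes off the queue is the raw JSON line CouchDB wrote to the script’s stdin. If a consumer needs to act on it, Ruby’s standard json library is enough to decode it. Below is a minimal sketch, not part of the original script: decode_notifications and updated_dbs are hypothetical helper names, and the input is a plain array of lines rather than a live carrot queue so it runs without a broker.

```ruby
require 'json'

# Hypothetical helper: turn raw notification lines (as popped off the
# queue) into hashes, skipping any line that is not a JSON object.
def decode_notifications(lines)
  lines.filter_map do |line|
    begin
      parsed = JSON.parse(line)
      parsed.is_a?(Hash) ? parsed : nil
    rescue JSON::ParserError
      nil
    end
  end
end

# Hypothetical helper: unique names of databases reported as "updated",
# e.g. to refresh each database's views once rather than once per write.
def updated_dbs(notifications)
  notifications.select { |n| n["type"] == "updated" }.map { |n| n["db"] }.uniq
end

raw = [
  "{\"type\":\"updated\",\"db\":\"test1\"}\n",
  "{\"type\":\"updated\",\"db\":\"test1\"}\n",
  "{\"type\":\"updated\",\"db\":\"test2\"}\n"
]
p updated_dbs(decode_notifications(raw))
```

Collapsing repeated notifications per database like this is one design choice; a consumer that needs every write would simply iterate over the decoded list instead.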

So yeah, pretty simple stuff. Go ahead, relax! :)
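Because the script just reads lines until EOF and publishes each one, its loop is easy to check without CouchDB or RabbitMQ running. A minimal sketch, assuming you pull the loop out into its own function: forward_notifications and RecordingQueue are hypothetical names, a StringIO stands in for stdin, and RecordingQueue stands in for the carrot queue (the loop only needs an object with a publish method).

```ruby
require 'stringio'

# Hypothetical refactoring of the loop in couch_amqp.rb: `input` is any
# IO (stdin in production), `publisher` is anything with #publish (the
# carrot queue in production). Returns how many lines were forwarded.
def forward_notifications(input, publisher)
  count = 0
  # gets returns nil at EOF, i.e. when CouchDB closes the pipe.
  while (line = input.gets)
    publisher.publish(line)
    count += 1
  end
  count
end

# Stand-in for a queue that just records what was published.
class RecordingQueue
  attr_reader :messages

  def initialize
    @messages = []
  end

  def publish(message)
    @messages << message
  end
end

input = StringIO.new(
  %({"type":"updated","db":"test1"}\n{"type":"updated","db":"test2"}\n)
)
queue = RecordingQueue.new
sent = forward_notifications(input, queue)
puts "forwarded #{sent} notifications"
```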

[EDIT 06/05/2009 23:26 PST: Don't forget to add the entry to your local.ini:]

[update_notification]
couch_amqp=/PATH/TO/couch_amqp.rb

March 11, 2009

Erlang Factory.

It’s official: I will be giving a talk at the Erlang Factory conference next month. I will be speaking about web server performance, with a final round of tests that should be much more complete than the last couple of death-matches. Hope to see you there.

March 5, 2009

Erlang Queue and Merle.

Lately I have been playing around with the idea of adding a process pool to merle, or at least a layer that allows you to use one. I also happened across Erlang’s queue module, which has all the basic functions you would expect from a queue and two APIs. So I created a branch of merle to play around with this idea. There are two main differences from mainline merle. The first is that the pid is always passed to the functions doing the work rather than using ?SERVER, for instance:

stats() ->
    gen_server2:call(?SERVER, {stats}).

versus

stats(Pid) ->
    gen_server2:call(Pid, {stats}).

This allows more than one gen_server process to be started; the downside is that you have to pass the Pid variable around. The other change is a new module called queue_merle, a sort of process pool layer that interfaces with merle. Obviously this is a very rough cut, but it seems to do the trick. The start function starts five merle processes and adds them to a queue, and rotate takes the process at the head of the queue and moves it to the tail. I have implemented getkey and set as well; they accept a queue, a key and (for set) a value. The downside to this implementation is similar to that of using merle without ?SERVER: you have to know which queue you are using, and you need to make sure it is the most current one, otherwise some processes will receive more calls than others. Here is an example of usage:

1> Queue = queue_merle:start().
{[<0.39.0>,<0.38.0>,<0.37.0>,<0.36.0>],[<0.33.0>]}
2> {Queue1, Result1} = queue_merle:set(Queue, a, "asdf").
{{[<0.33.0>,<0.39.0>,<0.38.0>],[<0.36.0>,<0.37.0>]},ok}
3> {Queue2, Result2} = queue_merle:set(Queue1, b, "1234").
{{[<0.36.0>,<0.33.0>,<0.39.0>,<0.38.0>],[<0.37.0>]},ok}
4> {Queue3, Result3} = queue_merle:getkey(Queue2, a).
{{[<0.37.0>,<0.36.0>,<0.33.0>],[<0.38.0>,<0.39.0>]},"asdf"}
5> {Queue4, Result4} = queue_merle:getkey(Queue3, b).
{{[<0.38.0>,<0.37.0>,<0.36.0>,<0.33.0>],[<0.39.0>]},"1234"}

As you can see, the queue rotates each time one of the functions runs, but because Erlang variables can only be bound once, you have to capture the new version of the queue each time and pass it to the next operation. I imagine there is a cleaner way to do this; if I come up with one I like, it will probably get added to mainline merle. Fun stuff.
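The rotation, and the need to thread the queue through every call, can be sketched outside Erlang too. This is an illustration in Ruby, not queue_merle’s actual code: rotate and with_next_worker are hypothetical names, the pool is a frozen array of placeholder worker names standing in for merle pids, and nothing is mutated, so each call returns the updated queue alongside its result, as the {Queue1, Result1} tuples in the shell session do.

```ruby
# Illustration only (not queue_merle): the pool is a frozen array whose
# first element is the next worker to use. Nothing is mutated in place.

# Move the head of the queue to the tail; return [new_queue, old_head].
def rotate(queue)
  raise ArgumentError, "empty queue" if queue.empty?
  head, *rest = queue
  [(rest + [head]).freeze, head]
end

# Run the block with the next worker and return [new_queue, result].
def with_next_worker(queue)
  new_queue, worker = rotate(queue)
  [new_queue, yield(worker)]
end

pool0 = %w[w33 w36 w37 w38 w39].freeze
pool1, r1 = with_next_worker(pool0) { |w| "set a via #{w}" }
pool2, r2 = with_next_worker(pool1) { |w| "set b via #{w}" }
puts r1, r2

# Reusing a stale queue hands out the same worker again, which is how
# one process ends up with more calls than the others.
_, r3 = with_next_worker(pool0) { |w| "getkey a via #{w}" }
puts r3
```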

February 6, 2009

New Stuff for merle.

I have been playing around with merle and have switched it from the standard gen_server to LShift’s modified gen_server2, which has a couple of changes to make things faster. The key one is described in a comment in their source file:

More efficient handling of selective receives in callbacks: gen_server2 processes drain their message queue into an internal buffer before invoking any callback module functions. Messages are dequeued from the buffer for processing. Thus the effective message queue of a gen_server2 process is the concatenation of the internal buffer and the real message queue. As a result of the draining, any selective receive invoked inside a callback is less likely to have to scan a large message queue.

This means that if you send a large burst of messages at once, gen_server2 can handle them more effectively. For merle, that means more gets/puts/deletes/etc. in a shorter amount of time. Some of the downsides are discussed on the mailing list. I believe this is a great addition for the workload merle handles (lots of small messages in short time spans), but for other use cases it may not be, so test against your own workload.
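The benefit described in that comment can be shown with a toy model. This is Ruby, not Erlang, and not LShift’s implementation: a mailbox is an array, a selective receive scans it from the front until a message matches, and simulate is a hypothetical helper that counts how many messages a callback waiting for a reply has to inspect when pending requests are still in the mailbox versus when they have already been drained into an internal buffer.

```ruby
# Toy model only: a selective receive scans the mailbox from the front
# and its cost is the number of messages inspected before a match.
def receive_cost(mailbox)
  idx = mailbox.index { |msg| yield(msg) }
  raise ArgumentError, "no matching message" if idx.nil?
  idx + 1
end

# `pending` requests arrived before the callback started waiting for a
# reply. With drain: true they were already moved into the process's
# internal buffer (the gen_server2 approach), so the real mailbox holds
# only the reply; with drain: false they still sit ahead of it.
def simulate(pending, drain:)
  requests = Array.new(pending) { |i| [:request, i] }
  reply = [:reply, :ok]
  mailbox = drain ? [reply] : requests + [reply]
  receive_cost(mailbox) { |msg| msg.first == :reply }
end

[1_000, 50_000].each do |pending|
  puts "#{pending} pending: plain #{simulate(pending, drain: false)} " \
       "vs drained #{simulate(pending, drain: true)} messages scanned"
end
```

The drained requests are not lost in this model; they would still be processed later from the buffer. It only shows why the scan inside the callback stays short.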

I ran some tests using gen_server and gen_server2 doing a large number of ‘set’ operations against memcached. Each test ran merle:set(a, "1") a fixed number of times (25k, 50k and 100k) with both gen_server and gen_server2. Because the mailbox gets backed up, all of the Erlang processes are started well before the operations complete on the memcached side. I didn’t have a good way to watch the memcached logs and timestamp when the operations completed, so I timed them by hand with a simple stopwatch app. Obviously this isn’t scientific, but as you will see the differences are large enough that it’s not a big deal.

[Graph: gen_server vs. gen_server2 set timings]

As you can see, gen_server2 performs much better, shaving off large amounts of time, and its times seem to grow almost linearly with the number of operations. Also note that I stopped the gen_server 100k tests once they reached 5 minutes, so I am unsure how much longer those would have gone on. The raw data is below (times in seconds); I also performed subsequent tests and found that my initial findings held up.

Set operations | gen_server test 1 (s) | gen_server2 test 1 (s) | gen_server test 2 (s) | gen_server2 test 2 (s)
25,000         | 24                    | 4                      | 25                    | 4
50,000         | 134                   | 8                      | 115                   | 8
100,000        | 300*                  | 18                     | 300*                  | 16

* stopped at the 5-minute mark
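To make the comparison concrete, the snippet below (plain Ruby, numbers copied from the raw data) computes how many times faster gen_server2 finished and its average time per set. Because the 100k gen_server runs were stopped at 300 seconds, those speedups are lower bounds. The helper names speedup and micros_per_op are just for illustration.

```ruby
# Timings in seconds from the raw data: ops => [[gen_server, gen_server2], ...]
TIMINGS = {
  25_000  => [[24, 4], [25, 4]],
  50_000  => [[134, 8], [115, 8]],
  100_000 => [[300, 18], [300, 16]] # gen_server runs were cut off at 300 s
}.freeze

# How many times faster gen_server2 finished than gen_server.
def speedup(gen_server_s, gen_server2_s)
  gen_server_s.to_f / gen_server2_s
end

# Average time per set operation, in microseconds.
def micros_per_op(ops, seconds)
  seconds * 1_000_000.0 / ops
end

TIMINGS.each do |ops, runs|
  runs.each_with_index do |(gs, gs2), i|
    printf("%7d ops, test %d: %.1fx faster, gen_server2 %.0f us/op\n",
           ops, i + 1, speedup(gs, gs2), micros_per_op(ops, gs2))
  end
end
```

gen_server2’s per-set time stays between 160 and 180 microseconds across all three sizes, which is what "almost linear" looks like in these numbers.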

The latest source for merle using gen_server2 has been committed to GitHub; give it a shot and let me know if you find any bugs.