January 1, 2010

Fun with the CouchDB _changes feed and RabbitMQ.

I was recently introduced to yajl-ruby, Ruby bindings to the C-based YAJL JSON parsing/encoding library. After discovering that it can parse HTTP streams, it seemed like a perfect fit for use with CouchDB. A while back I wrote some code to push update notifications to RabbitMQ, and a commenter suggested using the _changes feed instead. Combining the _changes feed with yajl-ruby’s HttpStream seemed like a good way to do it.

The _changes feed is a running list of all the documents that have changed in a database, listed in order by sequence number. This is similar to update notifications but gives more information, such as the document IDs, and is HTTP based (with multiple feed styles) rather than stdout. Additionally, you can create design document filters, which can be specified as a query parameter to give you only the parts of the feed you want. All in all, _changes is a pretty powerful feature.

Now for the fun stuff: the code. There are a few dependencies I used to do this, chosen with speed in mind; in particular, I used EventMachine-based libraries for AMQP and HTTP requests. The first bit of code takes the _changes feed for the “test” database, parses the feed, uses each document ID to request that document and publishes it to the queue. One key item to note is that this code requires the latest yajl-ruby from GitHub to run properly. Additionally, this works nicely with feed=continuous, so it grabs the documents as they are changed without any need for polling.

Note that there is a variable for since; this allows you to start from a specific sequence number so you can skip over old changes.
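As a rough illustration of the feed itself (not the yajl-ruby/EventMachine code described above), here is a minimal sketch using only Ruby's standard library. It builds a _changes URL with feed, since and an optional filter, and picks the document IDs out of the lines of a continuous feed. The helper names changes_uri and each_change, and the localhost:5984 address, are assumptions made for the example.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: build the _changes URL for a database.
# Host and port are assumed defaults, not taken from the post.
def changes_uri(db, since: 0, feed: 'continuous', filter: nil,
                host: 'localhost', port: 5984)
  params = { 'feed' => feed, 'since' => since }
  params['filter'] = filter if filter # e.g. "designdoc/filtername"
  URI::HTTP.build(host: host, port: port, path: "/#{db}/_changes",
                  query: URI.encode_www_form(params))
end

# Hypothetical helper: yield the document ID and sequence number for each
# change read from a continuous feed (one JSON object per line).
def each_change(io)
  io.each_line do |line|
    line = line.strip
    next if line.empty?            # blank lines are heartbeats
    change = JSON.parse(line)
    next unless change.key?('id')  # skip the closing {"last_seq": N} line
    yield change['id'], change['seq']
  end
end
```

Any object that responds to each_line will do for each_change, so the same loop can be tried against a saved copy of the feed before pointing it at a live connection.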

The next bit of code works from the other side of the queue. It subscribes to the queue, parses the JSON, performs some operations on it and puts the results back into another CouchDB database called “results”.
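The shape of that consumer can be sketched without any AMQP library. In the sketch below, Ruby's built-in Thread::Queue stands in for the RabbitMQ queue, the process step is an invented placeholder, and the PUT request for the “results” database is built but never sent. All of the method names are assumptions for illustration rather than the code from this post.

```ruby
require 'json'
require 'net/http'

# Hypothetical processing step: summarise a document by counting its fields.
def process(doc)
  { 'source_id' => doc['_id'], 'field_count' => doc.keys.size }
end

# Build (but do not send) the PUT that would store a result in "results".
def result_request(result)
  request = Net::HTTP::Put.new("/results/#{result['source_id']}",
                               'Content-Type' => 'application/json')
  request.body = JSON.generate(result)
  request
end

# Pop every waiting message, parse it, process it and collect the requests.
# Thread::Queue is only a stand-in for the RabbitMQ queue here.
def drain(queue)
  requests = []
  requests << result_request(process(JSON.parse(queue.pop))) until queue.empty?
  requests
end
```

Sending each request would then be a matter of handing it to a Net::HTTP connection opened against the CouchDB host.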

What could it be used for? My first thought is some sort of parallel computation: boot up a few dozen EC2 nodes and start dumping data into CouchDB. Have all those nodes pop messages off the queue, process them and dump the results back into Couch. One could also chain these together to process the results again. The queue ends up being a simple job management system, with the EC2 nodes popping new messages as they finish processing the previous ones. With a little bit of work, a few more features and the right use case, I think this could be a pretty powerful system.

Check out the code, my other projects and follow me on twitter @williamsjoe.

[edit: made a slight improvement to changes_sub.rb on 20100107]

September 19, 2009

Red Black Trees.

Been reading up on red-black trees, a kind of self-balancing binary search tree. Here are some resources I found interesting.

August 20, 2009

HAProxy Stats Socket and fun with socat.

I’ve been debugging issues with HTTP, my backend servers and HAProxy. After a quick email to the HAProxy mailing list I found out about the configuration option stats socket PATH. This creates a socket you can send commands to in order to get more information out of HAProxy. To do this I just used some simple Unix tools; the key one is socat. From the man page:

socat is a relay for bidirectional data transfer between two independent data channels. Each of these data channels may be a file, pipe, device (serial line etc. or a pseudo terminal), a socket (UNIX, IP4, IP6 – raw, UDP, TCP), an SSL socket, proxy CONNECT connection, a file descriptor (stdin etc.), the GNU line editor (readline), a program, or a combination of two of these. These modes include generation of “listening” sockets, named pipes, and pseudo terminals.

Here are a few examples of how to use the stats socket. First, you need to add stats socket PATH to your configuration and restart HAProxy. You should then find a socket at the path specified; I used /tmp/haproxy. Now you can send it commands to get more information and stats from HAProxy.

echo "show stat" | socat unix-connect:/tmp/haproxy stdio

This will give you stats on all of your backends and frontends, some of the same stuff you see on the stats page enabled by the stats uri configuration. As an added bonus it’s all in CSV.
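Since the output is CSV, it is easy to pick apart from a script. The following is a minimal sketch in Ruby that sends a command to the socket and turns the show stat reply into hashes keyed by column name. The helper names are assumptions for illustration, and the parser uses a plain split on commas, so it assumes no field contains a quoted comma.

```ruby
require 'socket'

# Send one command to the HAProxy stats socket and return the raw reply.
# The default path matches the /tmp/haproxy socket used in this post.
def haproxy_command(command, path = '/tmp/haproxy')
  UNIXSocket.open(path) do |sock|
    sock.write("#{command}\n")
    sock.read
  end
end

# Turn "show stat" CSV text into an array of hashes keyed by column name.
# The first line is the header and starts with "# ".
def parse_show_stat(text)
  lines = text.lines.map(&:chomp).reject(&:empty?)
  return [] if lines.empty?
  header = lines.shift.sub(/\A#\s*/, '').split(',')
  lines.map { |line| header.zip(line.split(',', -1)).to_h }
end
```

For example, parse_show_stat(haproxy_command("show stat")) gives one hash per frontend, backend and server row.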

echo "show errors" | socat unix-connect:/tmp/haproxy stdio

show errors will give you a capture of the last error on each backend/frontend.

echo "show info" | socat unix-connect:/tmp/haproxy stdio

This will give you information about the running HAProxy process, such as pid, uptime and so on.

echo "show sess" | socat unix-connect:/tmp/haproxy stdio

This will dump (possibly huge) info about all known sessions.

For more details, check out section 9 of the docs and the stats socket entry in section 3.1.

Bonus socat fun.

socat is a more full-featured cousin of netcat. Both can be used in similar ways; one thing I use them for occasionally is debugging REST calls and the like. This was a real help when working with an API that didn’t have a library, since I could test things out without making erroneous calls to the real API. In the simplest case you can have either of them listen on a port and output all the details of the request. To do this with socat, run:

socat tcp-listen:8000 stdio

This will listen for connections on port 8000. Doing the same thing with netcat is easy as well:

netcat -l -p 8000

For instance, you can see what a request to create a document in CouchDB looks like.

In one terminal:

$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'rest_client'
=> true
irb(main):003:0> RestClient.put("http://localhost:8000/somedb/somedoc", "{\"somekey\": \"somevalue\"}", :content_type => "application/json")

In another, run your mock server:

$ socat tcp-listen:8000 stdio
PUT /somedb/somedoc HTTP/1.1
Accept: application/xml
Content-Type: application/json
Accept-Encoding: gzip, deflate
Content-Length: 24
Host: localhost:8000

{"somekey": "somevalue"}
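If socat and netcat are not to hand, the same trick can be approximated in Ruby with the standard socket library. This is a minimal sketch, not a replacement for either tool: capture_request is a made-up helper that accepts a single connection, reads the headers and any body announced by Content-Length, and returns the raw text without sending a response.

```ruby
require 'socket'

# Accept a single connection and return the raw request (headers plus body).
def capture_request(server)
  client = server.accept
  raw = +''
  # Read the request line and headers, up to the blank line that ends them.
  while (line = client.gets)
    raw << line
    break if line == "\r\n"
  end
  # Read the body only if the client declared its length.
  length = raw[/^Content-Length:\s*(\d+)/i, 1].to_i
  raw << client.read(length) if length > 0
  raw
ensure
  client&.close
end
```

Running puts capture_request(TCPServer.new(8000)) in one terminal and the RestClient.put call in another prints the request; because the connection is closed without a reply, the client side will report an error.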

Oh! By the way, if you install netcat from source, don’t compile with -DGAPING_SECURITY_HOLE unless you know what you are doing. :D

July 21, 2009

Boston Meet-up.

Headed to Boston next week, planning to meet up next Tuesday (7/28) at 7pm at Cambridge Brewing Co. Drop by for a beer, food and maybe a little Erlang.

June 5, 2009

Sending CouchDB Update Notifications to RabbitMQ.

Working at Cloudant I use CouchDB on a daily basis. This evening for fun I decided to write some Ruby to take update notifications and push them into RabbitMQ. There are other examples of using update notifications and Ruby with Couch, such as the view updater out on the Couch wiki. It turned out to be super simple. There are a few AMQP libraries for Ruby; in this example I am going to use carrot. It’s based on the amqp library without all the EventMachine stuff. So here it goes:

couch_amqp.rb :

#!/usr/bin/ruby

require 'rubygems'
require 'carrot'

def main
  queue = "couchdb"
  # Connect to the "couchdb" queue (on localhost by default).
  couchq = Carrot.queue(:queue => queue)

  # CouchDB writes one notification per line to stdin; gets returns nil at EOF.
  while (notification = gets)
    couchq.publish(notification)
  end
end

main

As you can tell, we connect to a queue called “couchdb”; by default this is on localhost. Next we have a loop that runs continually and grabs updates from stdin. I then publish each notification to the queue, and that’s that. To get the messages out of the queue I used irb and carrot.

[user@host ~]$ irb
irb(main):001:0> require 'rubygems'
=> true
irb(main):002:0> require 'carrot'
=> true
irb(main):003:0> couchq = Carrot.queue(:queue => "couchdb")
=> #<Carrot::AMQP::Queue:0x7f8d2284b640 <snip>
irb(main):004:0> couchq.pop
=> "{\"type\":\"updated\",\"db\":\"test1\"}\n"
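Each message popped off the queue is a line of JSON, so a consumer can decode it with the standard json library. Here is a minimal sketch; updated_db is a hypothetical helper name, and it only looks at the type and db fields that appear in the output above.

```ruby
require 'json'

# Hypothetical helper: return the database name for an "updated"
# notification, or nil for any other notification type.
def updated_db(message)
  note = JSON.parse(message)
  note['type'] == 'updated' ? note['db'] : nil
end
```

Called with the message from the session above, this returns the name of the database that changed.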

So yeah, pretty simple stuff. Go ahead, relax! :)

[EDIT 06/05/2009 2326 PST : Don't forget to add the entry to your local.ini]

[update_notification]

couch_amqp=/PATH/TO/couch_amqp.rb