

May 26, 2008
May 23, 2008
More Hadoop, Grid Engine Goodness.
Over at GridEngine.info they found a link on DanT’s Sun blog that has a sweet tutorial on setting up Hadoop using SGE’s parallel environments with loose integration.
Here we are relying on the master node to start the other daemons ([rs]sh to the machine and start daemons) and distribute jobs, and we do not have control over the TaskTracker threads. This way of setting up a PE in Grid Engine is called loose integration.
With some more effort one could also achieve a tighter integration wherein the task of starting daemons and tasks on other slaves could be done by SGE. But this would require further understanding of Hadoop internals.
Pretty dope.
Using Pig with Hadoop.
Pig is a query language for use with Hadoop. It lets users query Hadoop data much as they would query a SQL database. Formally, according to their website:
Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turns enables them to handle very large data sets.
To get rolling you need the following:
- A Java SDK Installed
- Ant Installed
- Subversion
- A working installation of Hadoop
Once you are rolling with those items we can install Pig and test it out.
First, you need to download Pig from their Subversion repository. Once done you will need to build it with Ant.
svn co http://svn.apache.org/repos/asf/incubator/pig/trunk pig-svn
cd pig-svn
ant
From there you can run the following command to drop into the interactive shell.
java -cp pig.jar:HADOOPSITEPATH org.apache.pig.Main
Or you can run a pig script that you have already created.
java -cp pig.jar:HADOOPSITEPATH org.apache.pig.Main somescript.pig
HADOOPSITEPATH needs to point to the directory that contains the hadoop-site.xml file.
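If you launch Pig from a wrapper script, the two invocations above can be put together roughly like this. This is only a sketch: `build_pig_command` is a hypothetical helper name of my own, not part of Pig, and it assumes `java` is on your PATH.

```python
import os
from pathlib import Path
from typing import List, Optional


def build_pig_command(pig_jar: str, hadoop_conf_dir: str,
                      script: Optional[str] = None) -> List[str]:
    """Build the java command line for Pig (hypothetical helper, not a Pig API)."""
    conf_dir = Path(hadoop_conf_dir)
    # HADOOPSITEPATH must be the directory that contains hadoop-site.xml.
    if not (conf_dir / "hadoop-site.xml").is_file():
        raise FileNotFoundError(f"no hadoop-site.xml in {conf_dir}")
    # os.pathsep is ":" on Linux, matching the commands shown above.
    classpath = f"{pig_jar}{os.pathsep}{conf_dir}"
    command = ["java", "-cp", classpath, "org.apache.pig.Main"]
    if script is not None:
        # With a script argument Pig runs the script instead of the grunt shell.
        command.append(script)
    return command
```

Passing the returned list to `subprocess.run` would start Pig. Leaving `script` as `None` gives the interactive shell form of the command.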
If you run into an issue such as:
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.dfs.ClientProtocol version mismatch. (client = 29, server = 23)
You will need to upgrade Hadoop so the versions match.
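To see quickly which side is behind, you can pull the two protocol numbers out of the error text. A minimal sketch, assuming the message has the same shape as the one above; `parse_version_mismatch` is a name I made up for illustration.

```python
import re
from typing import Optional, Tuple

# Matches the tail of the VersionMismatch message shown above.
_MISMATCH = re.compile(r"version mismatch\. \(client = (\d+), server = (\d+)\)")


def parse_version_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return (client, server) protocol versions, or None if not a mismatch."""
    match = _MISMATCH.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


error = ("Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol "
         "org.apache.hadoop.dfs.ClientProtocol version mismatch. "
         "(client = 29, server = 23)")
client, server = parse_version_mismatch(error)
if client > server:
    # Pig's Hadoop client is newer than the cluster, as in my case.
    print(f"client protocol {client} is ahead of server protocol {server}")
```

Here the client (Pig) speaks protocol 29 and the server (my Hadoop install) speaks 23, which is why upgrading Hadoop is the fix.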
In the end you should get something that looks like this:
[cluster@front pig-svn]$ java -cp pig.jar:HADOOPSITEPATH org.apache.pig.Main
2008-05-23 10:37:42,478 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: front.esper:9000
2008-05-23 10:37:42,585 [main] WARN org.apache.hadoop.fs.FileSystem - "front.esper:9000" is a deprecated filesystem name. Use "hdfs://front.esper:9000/" instead.
2008-05-23 10:37:43,117 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: front.esper:9001
2008-05-23 10:37:43,246 [main] WARN org.apache.hadoop.fs.FileSystem - "front.esper:9000" is a deprecated filesystem name. Use "hdfs://front.esper:9000/" instead.
grunt>
If you need more info on the above steps check out the Pig Wiki.
From here you can follow their tutorial or play around in the shell. Regarding the tutorial, I can’t seem to find the archive they mention, “Pig tutorial file (*.gz)”, anywhere for download. If anyone knows where it can be found, let me know and I will post it.
May 6, 2008
Case in point.
Related to yesterday’s post regarding the gas tax holiday, where I stated:
Both Clinton and McCain have it wrong, I imagine they contacted their campaign advisors rather than their economic advisors before suggesting its repeal.
Apparently it’s at least partly true. Over here I found:
John McCain, the presumptive Republican presidential nominee who should know better, was the first presidential candidate to endorse the gas-tax holiday for the summer driving season. Reportedly, the idea originated with a political pollster, not among Mr. McCain’s economic advisers.
May 5, 2008
Gas Tax Holiday.
Why all the hoopla over the gas tax holiday? Both Clinton and McCain have it wrong, I imagine they contacted their campaign advisors rather than their economic advisors before suggesting its repeal. The numbers simply do not work out on the side of the consumer. The way I see it, removing the $0.184 per gallon tax will reduce the national average for regular grade gasoline from $3.611 to $3.427. Currently a 15 gallon tank costs $54.165 to fill. If we remove the tax it costs $51.405, a savings of $2.76. This means that you get about 8/10 of a gallon more for the same money. It’s a bit of a savings but nothing to write home about.
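For anyone who wants to check the arithmetic, here is a quick sketch using the figures from this post (the $3.611 average, the $0.184 tax and a 15 gallon tank). The variable names are mine, and I use `Decimal` only to avoid floating point noise in the cents.

```python
from decimal import Decimal

price = Decimal("3.611")   # national average per gallon, regular grade
tax = Decimal("0.184")     # federal gas tax per gallon
tank = Decimal("15")       # gallons in a full tank

price_no_tax = price - tax            # price per gallon without the tax
fill_now = tank * price               # cost to fill the tank today
fill_no_tax = tank * price_no_tax     # cost to fill the tank without the tax
savings = fill_now - fill_no_tax      # money saved per fill-up
extra_gallons = savings / price_no_tax  # extra gas the savings would buy

print(f"price without tax: ${price_no_tax}")
print(f"fill-up today: ${fill_now}, without tax: ${fill_no_tax}")
print(f"savings per tank: ${savings}")
print(f"extra gallons for the same money: {extra_gallons:.2f}")
```

The savings works out to 15 times the tax, so it does not depend on the pump price at all; only the "extra gallons" figure does.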
The way I understand it, when the cost of a good goes down, demand increases because an individual can purchase more of it with the same amount of money. This means there is less supply of that good, since more is being purchased. This is especially true in this case because the raw material (oil) used to produce this good (gasoline) is a finite resource controlled by a cartel (OPEC). As a result, the price of gasoline goes back up since there is less of it.
Here comes the economics.
All this is because the demand for gasoline in the US is inelastic. That is, demand hasn’t changed much with the increase in price, because there is no realistic substitute for gasoline. There are options (E85, electric, etc.), but they are not yet developed enough to be considered substitutes for gasoline. Additionally, gasoline is a normal good, which generally means that a decrease in price brings higher demand. What I am talking about here is price elasticity of demand:
In economics and business studies, the price elasticity of demand (PED) is an elasticity that measures the nature and percentage of the relationship between changes in quantity demanded of a good and changes in its price.
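To make that definition concrete, here is a rough sketch of the calculation. The prices are the ones from this post, but the quantities are made-up numbers purely for illustration (I don't have real consumption figures), and the function names are my own.

```python
def price_elasticity_of_demand(q_old: float, q_new: float,
                               p_old: float, p_new: float) -> float:
    """Percent change in quantity demanded divided by percent change in price."""
    pct_change_quantity = (q_new - q_old) / q_old
    pct_change_price = (p_new - p_old) / p_old
    return pct_change_quantity / pct_change_price


def is_inelastic(ped: float) -> bool:
    """Demand is called inelastic when the elasticity's magnitude is below 1."""
    return abs(ped) < 1.0


# Prices from the post: $3.611 with the tax, $3.427 without it.
# Quantities are HYPOTHETICAL: assume demand rises 0.5% when the tax is removed.
ped = price_elasticity_of_demand(q_old=100.0, q_new=100.5,
                                 p_old=3.611, p_new=3.427)
print(f"PED = {ped:.3f}, inelastic: {is_inelastic(ped)}")
```

With these assumed quantities, a price drop of about 5% moves demand by only 0.5%, so the elasticity is small in magnitude. That is what "inelastic" means in this post.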
Obama gets a gold star for opposing the holiday, because in the end you get slightly lower prices in the short run and the same or higher prices in the long run. I imagine that a gas tax holiday would lower prices for a couple of months (at most), and shortly thereafter we would be back to the same price we have today, if not a higher one. The price will go back up because our demand for gasoline will likely not decrease from present levels; additionally, when the price goes down we will likely see an increase in demand.
Also, we must not forget that we would be out millions/billions in tax revenue. At this point I don’t think the government has the cash to lose. We need it for Iraq, for helping our neighbors with the global food crisis, and to help the people affected by the disaster in Myanmar. So there are plenty of better ways to spend the gas tax than on our poor oil addiction.
Interestingly (but not surprisingly), higher gas prices are increasing demand for smaller cars, which is a prime example of cross-price elasticity of demand. Greg Mankiw’s blog has an example of this in a NYT article, along with how camels are making a comeback. He knows his stuff, is a Harvard prof, and has a decent economics blog. He also wrote the textbook for my macro class in college; it’s one of the few textbooks I didn’t sell after graduation.