I have been doing some work with Erlang lately and wanted to figure out how to do XML parsing. After a bit of looking I found Erlsom, an XML parsing library for Erlang. It has a few modes, including a SAX parser and a "simple sort of DOM parser". I have experience with Java, Xerces and JDOM, so this sounded good to me. I created an XML file containing music data (artist, album, song title and so on) and decided to play around with some of the examples from the Erlsom docs.
First, I had to install Erlsom. I downloaded the tarball, extracted it and ran the usual configure, make, make install. One issue I hit was an error caused by carriage-return ("^M") characters in the config* files. I ran dos2unix on those files to strip the bad characters, and the configure script then worked fine.
Then I threw together my XML file, using some old ID3 tag parsing code from a project of years ago to generate it. Its structure is easiest to see in the SAX events printed in the session below. With the file in place, I started up the Erlang shell, then loaded and parsed the XML file.
[zeusfaber@der-dieb ~]$ erl
Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:2] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.6.3 (abort with ^G)
1> {ok, Xml} = file:read_file("music-library.xml").
{ok,<<"\r\n
Arms and SleepersLimited Edition EP"…>>}
2> erlsom:parse_sax(Xml, [], fun(Event, Acc) -> io:format("~p~n", [Event]), Acc end).
startDocument
{processingInstruction,"xml",
" version=\"1.0\" encoding=\"UTF-8\""}
{startElement,[],"Library",[],[]}
{startElement,[],"ArtistName",[],[]}
{characters,"Arms and Sleepers"}
{startElement,[],"AlbumTitle",[],[]}
{characters,"Limited Edition EP"}
{startElement,[],"SongTitle",[],
[{attribute,"SongDate",[],[],"Unknown"},
{attribute,"SongGenre",[],[],"Unknown"}]}
{characters,"We're all in Paris Now (pt. 1)"}
–SNIP–
{startElement,[],"ArtistName",[],[]}
{characters,"Wolf Parade"}
{startElement,[],"AlbumTitle",[],[]}
{characters,"At Mount Zoomer"}
{startElement,[],"SongTitle",[],
[{attribute,"SongDate",[],[],"2008"},
{attribute,"SongGenre",[],[],"Unknown"}]}
{characters,"Kissing the Beehive"}
{endElement,[],"SongTitle",[]}
{endElement,[],"AlbumTitle",[]}
{endElement,[],"ArtistName",[]}
{endElement,[],"Library",[]}
endDocument
{ok,[],"\r\n"}
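The fun passed to parse_sax in the session above works like a fold callback: it receives each event together with the accumulator and returns the new accumulator. That means a callback can be tried out with lists:foldl on a hand-written event list before pointing it at a real file. Here is a minimal sketch of that idea. The module name sax_fold_sketch and its function names are made up for illustration, and the event list is a hand-typed subset of the output above, not something Erlsom produced for this snippet.

```erlang
-module(sax_fold_sketch).
-export([collect/2, demo/0]).

%% Callback with the same shape as the fun given to erlsom:parse_sax/3:
%% (Event, Acc) -> NewAcc. It keeps the name of every element that opens.
collect({startElement, _Uri, Name, _Prefix, _Attrs}, Acc) -> [Name | Acc];
collect(_OtherEvent, Acc) -> Acc.

%% Hypothetical demo: fold the callback over hand-written events
%% (no Erlsom needed) and return the element names in document order.
demo() ->
    Events = [startDocument,
              {startElement, [], "Library", [], []},
              {startElement, [], "ArtistName", [], []},
              {characters, "Wolf Parade"},
              {endElement, [], "ArtistName", []},
              {endElement, [], "Library", []},
              endDocument],
    lists:reverse(lists:foldl(fun collect/2, [], Events)).
```

With Erlsom on the code path, the same callback should be usable as erlsom:parse_sax(Xml, [], fun sax_fold_sketch:collect/2), with the accumulated list coming back as the second element of the result tuple.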
Once the file is loaded you can run operations over the data. I did a few counts of artists and songs. The first counts how many times "Wolf Parade" shows up in a 'characters' event (i.e. {characters,"Wolf Parade"}).
3> CountWolfParade = fun(Event, Acc) -> case Event of {characters, "Wolf Parade"} -> Acc + 1; _ -> Acc end end.
#Fun
4> erlsom:parse_sax(Xml, 0, CountWolfParade).
{ok,9,"\r\n"}
So I have nine entries of "Wolf Parade". Next, I ran it on the other artist in the XML, Arms and Sleepers.
5> CountArmsAndSleepers = fun(Event, Acc) -> case Event of {characters, "Arms and Sleepers"} -> Acc + 1; _ -> Acc end end.
#Fun
6> erlsom:parse_sax(Xml, 0, CountArmsAndSleepers).
{ok,6,"\r\n"}
This time I have six. In both cases the counts matched what was in the XML file. Next, I decided to count based on an element name rather than a characters event, specifically "SongTitle". The count should give me the total number of songs.
7> CountTotalSongs = fun(Event, Acc) -> case Event of {startElement, _, "SongTitle", _, _} -> Acc + 1; _ -> Acc end end.
#Fun
8> erlsom:parse_sax(Xml, 0, CountTotalSongs).
{ok,15,"\r\n"}
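Don't forget that the pattern has to match the full shape of the startElement event, e.g. '{startElement,[],"SongTitle",[],[]}', so it needs to look something like '{startElement, _, "SongTitle", _, _}'. The wildcards cover the other fields, which show up as empty lists ('[]') for most elements above and as the attribute list in the case of SongTitle.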
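The three counters above differ only in what they match, so they can be generated from small helper functions. The following is a rough sketch along those lines, not anything from the Erlsom docs. The module name sax_count_sketch, the helper names and the sample event list are all made up for illustration (the second song title in particular is a placeholder), and the attribute lookup assumes attributes arrive as 5-tuples with the name in the second position, as in the output printed earlier.

```erlang
-module(sax_count_sketch).
-export([count_chars/1, count_elements/1, count_songs_dated/1,
         sample_events/0]).

%% Returns a callback that counts {characters, Text} events.
count_chars(Text) ->
    fun({characters, T}, Acc) when T =:= Text -> Acc + 1;
       (_Event, Acc) -> Acc
    end.

%% Returns a callback that counts startElement events for element Name.
%% The wildcards skip the fields we do not care about.
count_elements(Name) ->
    fun({startElement, _Uri, N, _Prefix, _Attrs}, Acc) when N =:= Name ->
            Acc + 1;
       (_Event, Acc) -> Acc
    end.

%% Returns a callback that counts SongTitle elements whose SongDate
%% attribute equals Date. Assumes the attribute name is tuple element 2.
count_songs_dated(Date) ->
    fun({startElement, _Uri, "SongTitle", _Prefix, Attrs}, Acc) ->
            case lists:keyfind("SongDate", 2, Attrs) of
                {attribute, _Name, _, _, Value} when Value =:= Date ->
                    Acc + 1;
                _ ->
                    Acc
            end;
       (_Event, Acc) -> Acc
    end.

%% Hand-written events in the shape printed earlier (illustrative only).
sample_events() ->
    [{startElement, [], "ArtistName", [], []},
     {characters, "Wolf Parade"},
     {startElement, [], "SongTitle", [],
      [{attribute, "SongDate", [], [], "2008"},
       {attribute, "SongGenre", [], [], "Unknown"}]},
     {characters, "Kissing the Beehive"},
     {endElement, [], "SongTitle", []},
     {startElement, [], "SongTitle", [],
      [{attribute, "SongDate", [], [], "Unknown"},
       {attribute, "SongGenre", [], [], "Unknown"}]},
     {characters, "Placeholder Song"},
     {endElement, [], "SongTitle", []},
     {endElement, [], "ArtistName", []}].
```

The helpers can be checked without Erlsom by folding them over the sample list, e.g. lists:foldl(sax_count_sketch:count_elements("SongTitle"), 0, sax_count_sketch:sample_events()). With Erlsom loaded, the same funs should drop into the calls from the session, such as erlsom:parse_sax(Xml, 0, sax_count_sketch:count_elements("SongTitle")).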
The pattern matching in Erlang, combined with Erlsom, makes parsing pretty easy, although I have heard that using Erlang alone to parse XML is troublesome.