In Unix/Linux, the shell can handle many tasks that would require a full programming language on other operating systems. It is powerful enough to automate a wide range of jobs, which is why Unix is my #1 choice for automation. Today, I’ll share with you how to read XML attributes in Unix.

PROBLEM

Although Unix has no built-in library for reading XML attributes, there are several ways to do it. Some are complicated and some are very simple. Today, I’ll discuss one of the easiest ways to read the attributes of an XML node in Unix.

COMMON APPROACHES

Before I come to the easiest one, I’ll briefly point out some other ways to read or parse XML, without much explanation. Here are some of the suggestions made by fellow Unix users:

  1. Using xsltproc: xsltproc is a command-line tool for transforming XML into HTML or another XML format by applying an XSLT stylesheet. This method requires knowledge of XSLT and can be a bit cumbersome. However, because it uses a real XML parser, its results are more accurate and less error-prone.
  2. Using awk / sed: awk or sed can extract XML attributes by pattern matching. This can also get tricky, and a beginner might want to avoid it.
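
To give a rough idea of the sed route, here is a minimal sketch. The file name sample.xml and the variable item_name are made up for this illustration, and the pattern assumes the whole tag sits on one line with double-quoted attribute values. It is not a general XML parser.

```shell
# Hypothetical sample file, used only for this illustration
cat > sample.xml <<'EOF'
<root>
<item name="foo" value="bar"></item>
</root>
EOF

# -n suppresses sed's default output; the trailing p prints only the
# lines where the substitution matched.
# \([^"]*\) captures the text between the quotes that follow name=
item_name=$(sed -n 's/.*<item[^>]* name="\([^"]*\)".*/\1/p' sample.xml)

echo "$item_name"
```

The regular expression has to encode the tag name, the attribute name and the quoting style, which is why this approach gets fiddly as the XML grows.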

THE SOLUTION – MY APPROACH

The way I read the attributes of an XML node in Unix is quite easy and works well for many attributes, but it has limitations. It cannot be used when the same attribute name appears more than once in the file, and it assumes that attribute values contain no spaces. Otherwise, this method is simple and quick.

Let’s consider an XML file from which we want to read the attribute values, e.g. item.xml:

<root>
<item name="foo" value="bar"></item>
</root>

In the above example, we can read the values of the “name” and “value” attributes with the following Unix command:

eval $(tr '[< >]' '\n' < item.xml | egrep 'name|value')
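
To try this end to end, the following sketch recreates item.xml in the current directory, runs the same pipeline and prints the result. Two small liberties are taken here and are not part of the original command: grep -E is used as the current spelling of egrep, and the command substitution is quoted so that eval receives one assignment per line.

```shell
# Recreate the sample file from this post
cat > item.xml <<'EOF'
<root>
<item name="foo" value="bar"></item>
</root>
EOF

# Split the XML into one token per line, keep the attribute lines,
# then let eval run them as shell variable assignments
eval "$(tr '[< >]' '\n' < item.xml | grep -E 'name|value')"

echo "name=$name value=$value"   # prints: name=foo value=bar
```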

EXPLANATION

In order to understand the above command, let’s break it down into small pieces.
First, let’s consider the command tr '[< >]' '\n'. It splits the XML into lines by replacing every <, > and space with a newline (the square brackets are simply part of the character set, so a literal [ or ] would be replaced too). Leaving out the blank lines this produces, the above XML becomes:

root
item
name="foo"
value="bar"
/item
/root

Now, the piped command egrep 'name|value' simply keeps the lines that contain “name” or “value” and drops everything else (egrep is the older spelling of grep -E). This reduces the XML to:

name="foo"
value="bar"

Finally, eval runs these lines as shell commands, so each value is assigned to a variable named after its attribute. We can then access the attribute values in the shell as $name and $value respectively. Since eval executes whatever text it is given, use this only on XML files you trust.
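
The duplicate-name drawback mentioned earlier is easy to reproduce. The sketch below assumes a hypothetical file, items.xml, with two item nodes. As before, grep -E stands in for egrep and the command substitution is quoted.

```shell
# Hypothetical file with the same attribute names on two nodes
cat > items.xml <<'EOF'
<root>
<item name="foo" value="bar"></item>
<item name="baz" value="qux"></item>
</root>
EOF

# The pipeline now yields four assignments; they run in order,
# so the later ones overwrite the earlier ones
eval "$(tr '[< >]' '\n' < items.xml | grep -E 'name|value')"

echo "name=$name value=$value"   # prints: name=baz value=qux
```

Only the last node's values survive, so this method suits files where each attribute name you care about appears once.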