-
Title:
Fundamentals of XML - Data Wrangling with MongoDB
-
Description:
-
In case you're not terribly familiar with XML, let's spend
-
a few minutes talking syntax. Even if you are familiar,
-
it might make sense to follow along with this little
-
review. So in XML, elements are the basic building blocks
-
of an XML document. Now an XML element is composed
-
of an open tag and a close tag. Now, this
-
is some data drawn from the New York Times developer
-
API. I encourage you to have a look at this site.
-
We are going to look at some data from the most
-
popular API. These are, for example, articles that are most frequently
-
emailed among readers of the New York Times. Okay, so
-
let's look at a couple of examples here. So, the first
-
thing that we might notice about this particular document is
-
that we have some tags for num results or some elements
-
that have to do with the number of results. So,
-
this is actually a result set from having done a query to
-
the most popular API, and we've got an
-
element that tells us how many results were identified
-
by our query. And then the list of results
-
follows. Now this happens to be a single result here.
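To make the idea of elements concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The XML below is a made-up stand-in shaped like the result set being described, not the actual API response; the element names (result_set, num_results, results, result, byline, title) and their values are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document-oriented XML: every piece of data sits between
# an open tag and its matching close tag. Names and values are invented.
SAMPLE_XML = """
<result_set>
  <num_results>1</num_results>
  <results>
    <result>
      <byline>By A. REPORTER</byline>
      <title>An Example Article About Bedbugs</title>
    </result>
  </results>
</result_set>
"""


def summarize_results(xml_text):
    """Return the reported result count and a list of article dicts."""
    root = ET.fromstring(xml_text)
    # findtext returns the text content of the first matching child element.
    count = int(root.findtext("num_results"))
    articles = [
        {"byline": result.findtext("byline"), "title": result.findtext("title")}
        for result in root.iter("result")
    ]
    return count, articles


count, articles = summarize_results(SAMPLE_XML)
print(count, articles)
```

Each result here is one element that starts at its open tag and ends at its close tag, with its data carried as content in child elements.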
-
And we can see that this result begins right
-
here with this open tag and closes right here
-
with this close tag. Okay. Now just as a
-
couple of other examples of the data within this particular
-
result, we can take a look at the byline, note that
-
it's got a close tag, as well. And some of the
-
other elements, here if you note the title for example, this
-
happens to be an article about bedbugs. Okay. So, this provides an
-
example using some really nicely named tags. We know what these
-
mean. Now, there's another aspect of XML that we need to concern
-
ourselves with, especially given some of the exercises that we're going
-
to have later on, and that has to do with attributes for
-
XML elements. Now, this document provides a number of very
-
nice examples of elements in XML. But what we don't
-
have here are any examples of attributes for any of
-
these elements being used. So what I'd like to do here
-
is talk about essentially the two types of data that
-
we're going to look at that have been encoded in XML. One
-
is this more document oriented type of XML, which is
-
originally the type of data that XML was designed to encode.
-
But then we can also take a look at
-
something like this. Okay, now this is actual data from
-
the OpenStreetMap project. This is a pretty close
-
zoomed in view from OpenStreetMap of West Belmont
-
Avenue, particularly the 1000 block. And you can see right
-
here, there's a Giordano's restaurant here. Giordano's is a famous
-
pizza chain in Chicago. So, this is data that is
-
essentially from a layer on top of that particular map.
-
This is data that is human created. So, users
-
of OpenStreetMap have actually added this data on
-
top of the map data. And what I want to point out here is that this is very much
-
not document oriented. This is just data. Okay? And
-
a lot of times when you see XML used in this
-
way, you'll see that attributes are heavily used. So
-
in this particular example, this is the node that represents
-
the Giordano's restaurant. We can see that there are
-
a number of attributes specified for this particular element.
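As a rough sketch of what reading attributes looks like, the Python below parses a single node element with xml.etree.ElementTree. The node is invented for illustration: the id, user, and coordinate values are placeholders, not the real OpenStreetMap record shown in the video.

```python
import xml.etree.ElementTree as ET

# Hypothetical node element: all of its data lives in attributes.
# The id, user, and coordinate values are made up for this example.
NODE_XML = '<node id="1" lat="41.94" lon="-87.65" user="example_user"/>'

node = ET.fromstring(NODE_XML)

# node.attrib is a plain dict of attribute names to string values.
attribute_names = sorted(node.attrib)

# Attribute values are always strings, so convert coordinates explicitly.
lat = float(node.get("lat"))
lon = float(node.get("lon"))

print(attribute_names, lat, lon)
```

Note that attribute values come back as strings, so numeric fields such as the coordinates need an explicit conversion before you use them.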
-
Common among them are the latitude and longitude attributes
-
that this particular annotation applies to. So, essentially what
-
this data element provides is a mapping from geographic
-
coordinates to more common street address coordinates. Okay? So
-
this is a good example of attributes in XML
-
and there's one other thing that I want to point
-
out here. And that is this type of tag here.
-
Now in this particular data they're doing something that I probably
-
wouldn't do, but it is the type of thing that
-
you're going to see as a data scientist and likely already
-
have. Essentially, they've just got a bunch of key value
-
pairs that are encoded in something called a tag element. And,
-
in this case, none of these tag elements have a
-
close tag. Instead, they use this special XML syntax where you
-
can simply create what are called empty tags, that
-
is tags that don't have any content. All of
-
the data for this type of tag is contained
-
directly within its attributes. So the most emailed example
-
here provides us a nice example of document oriented
-
XML with lots of content inside the elements. And
-
this particular example from the OpenStreetMap project provides us
-
with the other end of this spectrum, which is very
-
data oriented XML where all or almost all of the data
-
is contained within attributes for the individual elements, and in these
-
types of cases, you often have mostly or at least many
-
empty elements within the XML data that you are looking at.
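Here is one way the data oriented case might be handled in Python, as a sketch rather than a definitive approach. OpenStreetMap tag elements carry their key value pairs in k and v attributes; the specific node, coordinates, and tag values below are invented for illustration, and the node_to_dict helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hypothetical data-oriented XML: a node whose children are empty
# <tag/> elements. All values here are made up for the example.
OSM_NODE_XML = """
<node id="1" lat="41.94" lon="-87.65">
  <tag k="amenity" v="restaurant"/>
  <tag k="cuisine" v="pizza"/>
  <tag k="name" v="Example Pizzeria"/>
</node>
"""


def node_to_dict(xml_text):
    """Flatten a node and its key value tag elements into a dict."""
    node = ET.fromstring(xml_text)
    tags = node.findall("tag")
    return {
        "id": node.get("id"),
        "pos": [float(node.get("lat")), float(node.get("lon"))],
        # Each empty tag contributes one key value pair from its attributes.
        "tags": {tag.get("k"): tag.get("v") for tag in tags},
        # Empty elements have no text content and no children.
        "all_tags_empty": all(tag.text is None and len(tag) == 0 for tag in tags),
    }


record = node_to_dict(OSM_NODE_XML)
print(record)
```

The resulting dictionary is the kind of shape that is convenient to work with later on, since the key value pairs become ordinary fields instead of a list of empty elements.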