English subtitles

Intro to XML - Data Wrangling with MongoDB

Showing Revision 3 created 05/24/2016 by Udacity Robot.

  1. Great. Let's dive into XML. Over the course of my
  2. career, I've used XML in a number of different data
  3. science projects. One of these was working with a large
  4. collection of research articles. So, just as an example, here's the
  5. original Google paper from Brin and Page when they were
  6. still grad students at Stanford. Now, what I was doing in
  7. this project is what's known as citation analysis. In citation
  8. analysis, what we're doing is comparing the relative importance of papers
  9. based on how many other research articles cite them. So
  10. for example, you could compare the Google paper with some of
  11. my work, which receives a much more modest number of
  12. citations compared to the 11,000 that Brin and Page's paper got.
  13. Now when I was doing my work most of the
  14. data that I used was not publicly available. But nowadays the
  15. same type of data is available and it's encoded as XML.
  16. There are quite a few open access publishers, like BioMed Central.
  17. These publishers produce every article they publish both in a
  18. print form like this, and in XML. Now, in order to
  19. do something like citation analysis, what we need to do
  20. is access the bibliography for each article. So what I want
  21. to look at as an example, is how easy it
  22. is when you have your data encoded as XML, to pull
  23. out that type of data, and use it programmatically.
  24. So let's take a look at the references for this paper.
  25. Here are all of the other papers that this particular
  26. research article cites. Now let's take a look at the XML
  27. version of this paper. Here is that exact same paper
  28. only here, rather than being designed for reading, it is
  29. encoded as data. Let's jump down to the bibliography for
  30. this paper. And here it is, the very beginning of the
  31. bibliography. If we take a look at the print version of
  32. the article again, we can see that it does, in fact,
  33. align with what we're seeing here. So, this type
  34. of use of XML is very much what the designers
  35. of XML had in mind, where you have documents that
  36. have lots of text, but text that you want to
  37. encode so that portions of it, at least, can
  38. be used programmatically, like we might want to do with
  39. a bibliography of a research article, or the author list
  40. and other data that occurs throughout a document like this.
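
To make the idea of pulling a bibliography out of XML concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (article, authors, author, bibliography, reference, title, year) and the sample document are assumptions made up for illustration; real publisher XML, such as the files discussed in the video, uses its own, richer schema. The helper names extract_references, extract_authors, and count_citations are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified article markup (not a real publisher schema).
ARTICLE_XML = """
<article>
  <title>Example article</title>
  <authors>
    <author>A. Author</author>
    <author>B. Writer</author>
  </authors>
  <bibliography>
    <reference id="B1">
      <title>First cited paper</title>
      <year>1998</year>
    </reference>
    <reference id="B2">
      <title>Second cited paper</title>
      <year>2001</year>
    </reference>
  </bibliography>
</article>
"""


def extract_references(xml_text):
    """Return (id, title, year) tuples for every reference in the bibliography."""
    root = ET.fromstring(xml_text.strip())
    return [
        (ref.get("id"), ref.findtext("title"), ref.findtext("year"))
        for ref in root.findall("./bibliography/reference")
    ]


def extract_authors(xml_text):
    """Return the author names listed in the article."""
    root = ET.fromstring(xml_text.strip())
    return [author.text for author in root.findall("./authors/author")]


def count_citations(articles):
    """Citation analysis in miniature: count how many articles cite each title."""
    counts = Counter()
    for xml_text in articles:
        for _ref_id, title, _year in extract_references(xml_text):
            counts[title] += 1
    return counts


if __name__ == "__main__":
    for ref_id, title, year in extract_references(ARTICLE_XML):
        print(ref_id, title, year)
    print(extract_authors(ARTICLE_XML))
```

Because the references are encoded as data, the same function can be run over a whole collection of articles; count_citations shows that step under the simplifying assumption that a cited paper can be identified by its title alone.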