English subtitles

XML Design Principles - Data Wrangling with MongoDB

Showing Revision 2 created 05/24/2016 by Udacity Robot.

  1. XML was designed with a number of goals in
  2. mind. One of the most important for purposes of this
  3. class is that it was designed to provide data transfer
  4. that's platform independent. What does that mean? Well, the idea
  5. here is that you can have a producer app
  6. that's written in any programming language, on any operating system,
  7. and on any type of hardware, and a consumer app
  8. implemented in any other programming language, operating system, or hardware.
  9. There is no binding between how the consumer app
  10. and the producer app are implemented, because they both agree
  11. to speak XML to one another. And of course,
  12. in addition to consuming XML from the producer app, the
  13. consumer would also write XML to the producer app.
  14. Another important goal for XML is that it would be
  15. easy to write programs to read and write XML. The
  16. designers also wanted a data format that could be validated.
  17. So in XML, we write a specification for a
  18. particular type of document. And then any specific examples
  19. of that document that are produced can be validated
  20. against that specification. So BioMed Central has a specification for
  21. the research article format, and any articles that are
  22. produced are validated against that format to ensure they
  23. adhere to the rules for that data model. XML
  24. is designed to be human readable, and as we saw
  25. in the example, we can get a pretty good idea
  26. of what information is contained within an XML encoding
  27. just by looking at it. And finally, XML is designed
  28. to support a wide variety of applications. We've seen one
  29. application of XML, and we're going to take a look at
  30. several others that essentially span a number of different ways
  31. in which XML can be applied to exchanging data between
  32. applications. If you're interested in more information about this data format,
  33. I encourage you to take a look at the
  34. W3C site. We're going to talk a little bit about what
  35. having a standard means. One of the most important benefits
  36. of there being an XML standard is that we have
  37. robust parsers in most programming languages, Python included. What this
  38. means for us as data scientists is that we get
  39. to focus on our own applications. We don't have to
  40. worry about writing parsers for some ad hoc data format.
  41. Previously, each messaging system had its own format,
  42. and all were different, which made the type
  43. of messaging that we do now very messy,
  44. complex, and expensive to do. If everyone uses the
  45. same syntax, it makes writing these systems a
  46. lot faster and much more reliable. Another advantage of
  47. XML is that it's free. Now, that's free
  48. as in beer but also free from legal encumbrances.
  49. It's not a format that any company owns and may change out from under us. XML
  50. information can be manipulated programmatically, so we can
  51. build databases to support specific types of queries, or
  52. we can piece together data from different sources
  53. or take it apart to be reused in
  54. different ways. XML documents can also be reliably
  55. converted into other formats with no loss of information.
  56. XML lets you separate form, or appearance,
  57. from content. So, your XML file contains
  58. your document information, all of your text
  59. and data and identifies its structure. Formatting
  60. and other processing needs are identified separately
  61. in a stylesheet or processing system. In
  62. the BioMed Central example, it is actually the XML that is transformed into HTML
  63. for rendering on the website, or into PDF for download, using a stylesheet
  64. and stylesheet processing system. The two are combined at output time to
  65. apply the required formatting to the text or data identified by its structure.
  66. This structure might define location, position, order
  67. or any other aspects of the data.
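
To make the producer and consumer idea, and the separation of content from presentation, a little more concrete, here is a minimal sketch using the parser in Python's standard library (xml.etree.ElementTree). The element names (article, title, authors, author) and the function names are made up for illustration; they are not the BioMed Central format. The render_html function is only a hand-written stand-in for a stylesheet step, not an actual stylesheet processor.

```python
import xml.etree.ElementTree as ET
from html import escape


def produce_article_xml(title, authors):
    """Producer side: serialize plain Python data to an XML string.

    The element names here are hypothetical, chosen only for this sketch.
    """
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    author_list = ET.SubElement(article, "authors")
    for name in authors:
        ET.SubElement(author_list, "author").text = name
    return ET.tostring(article, encoding="unicode")


def consume_article_xml(xml_text):
    """Consumer side: parse the XML string back into plain Python data."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),
        "authors": [author.text for author in root.iter("author")],
    }


def render_html(xml_text):
    """Presentation step, kept separate from the XML content itself."""
    data = consume_article_xml(xml_text)
    items = "".join(f"<li>{escape(name)}</li>" for name in data["authors"])
    return f"<h1>{escape(data['title'])}</h1><ul>{items}</ul>"


xml_text = produce_article_xml("Sample article", ["A. Author", "B. Author"])
article = consume_article_xml(xml_text)
html_text = render_html(xml_text)
```

The consumer only needs the XML text, not any knowledge of how the producer was written, and the same XML could be handed to a different rendering function to get a different output format.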