English (United States) subtitles

Intro - Data Wrangling with MongoDB


Showing Revision 1 created 08/02/2016 by Udacity Robot.

  1. Hi, I'm Shannon Bradshaw, director of education
  2. at MongoDB. MongoDB is the company behind the
  3. open-source NoSQL database of the same name. Prior to joining MongoDB, I was a
  4. computer scientist working in academia and consulting
  5. on a number of different data science projects
  6. in the financial industry, social media, and other
  7. spaces. This is a class about data wrangling.
  8. Data scientists spend about 70% of their time data
  9. wrangling. So what is data wrangling? Well, it's a process
  10. of gathering, extracting, cleaning, and storing our data. Only
  11. after that does it really make sense to do any
  12. analysis. So if you're a quant on Wall Street and you
  13. want to build models to automate trading, you first need
  14. to make sure you're basing your models on reliable
  15. data. Or if you're building a map app, you need
  16. to ensure your data's correct, or you can quickly find yourself in something
  17. of a public relations disaster. As another example, if you're working on
  18. a smaller scale, analyzing data for a research team, you want to
  19. ensure your team can make decisions based on the data you're providing. If
  20. you don't take the time to ensure your data is in good
  21. shape before doing any analysis, you run a big risk of wasting
  22. a lot of time later on, or worse, losing the faith of
  23. your colleagues, who depend on the data you've prepared. It's a little like
  24. putting the cart before the horse. So while we're going
  25. to get to some analysis (that's why we're doing all this
  26. in the first place), this class is really about getting
  27. your data ready so that any analysis is built on a
  28. solid foundation of good data. Here you're going to get
  29. a chance to do some tinkering. We're going to develop your
  30. hacker muscles, or, should I say, wrangler muscles. We'll work
  31. with lots of different types of data: music, energy, Wikipedia,
  32. and Twitter, to name a few. We'll also teach you how
  33. to work with data in most of the formats you're likely
  34. to see: JSON, XML, CSV, Excel, and HTML, and even some
  35. legacy text formats. In the last half of the course, we'll
  36. show you how to store your data in MongoDB and use
  37. it to support analysis. MongoDB is becoming increasingly important to
  38. data scientists around the world as a powerful and scalable tool
  39. for big data problems. And we'll wrap it up with a
  40. case study that allows you to put all the pieces together.
  41. We're happy to have you in the class. Let's get started.