
Twitter Data Set - Data Wrangling with MongoDB

  • 0:00 - 0:03
    Okay. I hope you enjoyed the course to this point.
  • 0:03 - 0:05
    In this lesson, we're going to work with a collection of
  • 0:05 - 0:08
    tweets. Now, I need to make clear that since this is
  • 0:08 - 0:11
    a collection that was gathered some time ago, it does not
  • 0:11 - 0:13
    reflect the state of Twitter feeds as they look right
  • 0:13 - 0:17
    now. It's a small snapshot in time. So our tweets had
  • 0:17 - 0:21
    the following form. So you can see that there is a
  • 0:21 - 0:25
    unique identifier. There will be text for the tweet itself and
  • 0:25 - 0:28
    then an entities field. Now the entities field is broken
  • 0:28 - 0:30
    down into user mentions, urls and hashtags and we actually
  • 0:30 - 0:33
    took a look at one tweet in the last lesson,
  • 0:33 - 0:36
    so this should be at least partially familiar to you.
  • 0:36 - 0:40
    User mentions, urls and hashtags represent that type of data,
  • 0:40 - 0:43
    and where it's found in the text of a tweet.
  • 0:43 - 0:46
    It's been extracted for us and stored in these individual
  • 0:46 - 0:50
    fields. Okay and then with each tweet, there's information about the
  • 0:50 - 0:52
    user at the time the tweet was made. As
  • 0:52 - 0:55
    you will see, our tweet documents actually contain many more
  • 0:55 - 0:58
    fields. We're representing those by the ellipses you see
  • 0:58 - 1:01
    in this example. As with the other data sets we've
  • 1:01 - 1:05
    considered, this type of data is representative of what
  • 1:05 - 1:08
    you might work with as a data scientist. Many data
  • 1:08 - 1:11
    scientists are employed in spaces that work heavily with social
  • 1:11 - 1:15
    media. Google, Facebook, and Twitter are some of the most
  • 1:15 - 1:19
    prominent of thousands of firms, actually, that employ people to analyze this
  • 1:19 - 1:22
    type of data. Now, just for a moment imagine the types of
  • 1:22 - 1:25
    analysis you might want to do on tweets. Common for this type
  • 1:25 - 1:29
    of data is to understand the behavior of users, and the networks involved.
  • 1:29 - 1:31
    There are lots of ways we can do that. Now, one of
  • 1:31 - 1:35
    the most powerful things about putting our data in a database is
  • 1:35 - 1:39
    that most databases provide some analytics tools built in, that enable us
  • 1:39 - 1:41
    to explore our data a bit and get a sense for the story
  • 1:41 - 1:45
    it tells. In MongoDB, the built-in analytics tools take
  • 1:45 - 1:47
    the form of what we call the aggregation framework.
  • 1:48 - 1:50
    While not a replacement for MapReduce in a
  • 1:50 - 1:53
    lot of situations, it does provide a powerful tool for
  • 1:53 - 1:57
    exploring our data, and exploring it whether we're auditing
  • 1:57 - 2:00
    the quality of our data or doing some analysis.
  • 2:00 - 2:03
    And with each major release of MongoDB, this becomes
  • 2:03 - 2:06
    a more powerful tool. There are several really valuable feature
  • 2:06 - 2:08
    enhancements actually in the 2.6 release.
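
The tweet form and the aggregation framework described in this lesson can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names (`_id`, `text`, `entities`, `user_mentions`, `urls`, `hashtags`, `user`, `screen_name`, `followers_count`, `indices`) are assumed from the description in the captions, every value is invented, and the real documents contain many more fields. The `pipeline` list shows what a grouping query might look like if passed to `collection.aggregate(pipeline)` in pymongo; it is not executed here, and a plain-Python equivalent stands in for it so the sketch runs without a database.

```python
from collections import Counter

# Hypothetical sample tweets. Field names are assumed from the lesson's
# description of the tweet form; all values are invented for illustration.
tweets = [
    {
        "_id": 1,
        "text": "Learning #mongodb with @alice",
        "entities": {
            # Each entity records where it appears in the tweet text.
            "user_mentions": [{"screen_name": "alice", "indices": [23, 29]}],
            "urls": [],
            "hashtags": [{"text": "mongodb", "indices": [9, 17]}],
        },
        "user": {"screen_name": "bob", "followers_count": 10},
    },
    {
        "_id": 2,
        "text": "Another tweet",
        "entities": {"user_mentions": [], "urls": [], "hashtags": []},
        "user": {"screen_name": "bob", "followers_count": 10},
    },
    {
        "_id": 3,
        "text": "Hello",
        "entities": {"user_mentions": [], "urls": [], "hashtags": []},
        "user": {"screen_name": "carol", "followers_count": 5},
    },
]

# An aggregation pipeline that would count tweets per user, most active first.
# Assumption: it would be run as collection.aggregate(pipeline) against a
# collection holding documents shaped like the ones above.
pipeline = [
    {"$group": {"_id": "$user.screen_name", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# Plain-Python equivalent of the pipeline, so this sketch needs no server.
tweets_per_user = Counter(t["user"]["screen_name"] for t in tweets).most_common()
print(tweets_per_user)  # [('bob', 2), ('carol', 1)]

# The stored indices locate the hashtag inside the tweet text.
first = tweets[0]
start, end = first["entities"]["hashtags"][0]["indices"]
hashtag_in_text = first["text"][start:end]
```

Grouping on `user.screen_name` is just one choice; the same pattern would apply to any other field in the tweet documents.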
Title:
Twitter Data Set - Data Wrangling with MongoDB
Video Language:
English
Team:
Udacity
Project:
UD032: Data Wrangling with MongoDB
Duration:
02:08
