## Counting Words Serially - Intro to Data Science

Showing Revision 5 created 05/25/2016 by Udacity Robot.

1. Here is one way to explain the MapReduce programming
2. model. Say that I wanted to count the number of
3. occurrences of each word that appears at least once in
4. a document. Let's use the text of Alice in Wonderland.
5. Here's a bit of text that says Alice was
6. beginning to get very tired of sitting by her sister
7. on the bank, and of having nothing to do. If
8. I wanted to solve this problem without MapReduce, I might
9. create a Python dictionary consisting of all the words
10. and their counts. I could go through the document
11. and say, for each word in the document, if
12. there is a key for that word, add one to its count.
13. Otherwise, set the initial value for that key equal to
14. one. And instead of applying it to this short
15. sentence fragment from the book, we'd apply it to
16. the entire book. Before we solve this problem with MapReduce,
17. why don't you try to write a Python script
18. along the lines of what we just discussed, that will
19. get the job done. Given many lines of a text,
20. create a dictionary with a key for each word, and
21. a value corresponding to the count of the word in
22. that text. Note that we want the words to be
23. stripped of any capitalization and punctuation. We just want the
24. basic words. Here's some code to get you started. First,
25. we import sys and string. And then we
26. initialize an empty dictionary, which will hold our
27. words and values. We cycle through the lines of the input, and for each line we
28. create an array, data, which is essentially all
29. of the words in that line, split by
30. white space. So if we started with this
31. line, "Hello, how are you?", it would become
32. hello, how, are, and you, in an array of length four. Your code should go here,
33. after we split the line by white space, and before we print out the dictionary.
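
The serial approach described in the transcript can be sketched roughly as follows. This is a minimal sketch, not the course's reference solution: it assumes Python 3, and the function name `count_words` and the sample lines are hypothetical, chosen only for illustration. It also assumes that "stripped of punctuation" means deleting every character in `string.punctuation`, so a word like "don't" would be counted as "dont".

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: removing all punctuation (including apostrophes) is acceptable.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def count_words(lines):
    """Return a dict mapping each normalized word to its number of occurrences.

    `lines` is any iterable of strings (a hypothetical stand-in for the
    input the lecture's starter code cycles through).
    """
    word_counts = {}
    for line in lines:
        # Split the line on whitespace, as the starter code is described doing.
        data = line.strip().split()
        for token in data:
            # Normalize: remove punctuation, then lowercase.
            word = token.translate(_STRIP_PUNCTUATION).lower()
            if not word:
                continue  # the token consisted only of punctuation
            if word in word_counts:
                # There is already a key for this word: add one.
                word_counts[word] += 1
            else:
                # First occurrence: set the initial value to one.
                word_counts[word] = 1
    return word_counts


if __name__ == "__main__":
    sample = [
        "Alice was beginning to get very tired of sitting by her sister",
        "on the bank, and of having nothing to do.",
    ]
    print(count_words(sample))
```

To run it over a whole book, the same function could be given a file object or `sys.stdin` in place of the sample list, since both yield one line at a time.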