English subtitles

← 06x-01 Office Hours 5

Showing Revision 4 created 11/06/2012 by podsinprint_user1.

  1. Hi and welcome to the fifth office hours.
  2. Hey, welcome, everybody. We got a bunch of students
  3. who are discussing in the forum and we picked
  4. them out because we like what they contributed
  5. to the forum, and we're going to give them a
  6. chance to ask some questions. So I think,
  7. Elia, you have the first question.
  8. Yes. As I was saying, Dr. Norvig is a luminary,
  9. he is one of the greatest experts
  10. on Natural Language Processing and he's
  11. head researcher at Google, which is the most
  12. famous company for taking text from users and
  13. turning it into meaning across the vast wide
  14. interwebs. So, given that we touched the tip of the
  15. iceberg of NLP in Unit 3 with a little bit of regular
  16. expressions and a regex parser, I was
  17. wondering if Dr. Norvig would, at some point in
  18. the future, teach a 300 level or 400 level Udacity
  19. class on using Python or maybe another
  20. programming language to do NLP. I find this is an
  21. absolutely fascinating field and I know it's had
  22. tremendous success, such as at Renaissance
  23. Technologies, with Watson and so on. And I was
  24. really hoping to learn it and I don't think there
  25. is a better authority for that in the world
  26. than Dr. Norvig. So will he be teaching
  27. us a class in the future?
  28. Okay. So first of all, you have to be careful
  29. about trying to become a teacher's pet in an
  30. online class as well as a regular class. So let's
  31. calm down the praise a little bit. But thank you.
  32. Secondly, I think that's a great idea. I think that is
  33. a good course; it's one I would like to do at some
  34. point, but there are only so many hours in a day. So
  35. we're doing one thing at a time. It would be fun to
  36. do it. It would be a more advanced level class. I
  37. guess one reason why I am not doing it right now
  38. is that there is another class being offered by
  39. Dan Jurafsky and Chris Manning, and they are both
  40. extremely well-qualified experts. So they,
  41. I am sure, will teach a very good class too. It would be fun for
  42. me to do it. I have a new colleague at Google,
  43. Prabhakar Raghavan, who's also an expert and
  44. has published several books in this area; it might be
  45. fun for me to do it together with him. So I don't
  46. know exactly what the schedule is, but I know
  47. it is a goal of mine to get around to
  48. teaching that class at some point.
  49. Alright. Alison, I think you have a question.
  50. I do, yes. I wanted to ask you a little bit
  51. about spotting errors in the optimization of
  52. expected value functions, where you've got these
  53. really complicated mathematical functions.
  54. I mean, I think when I was trying to find
  55. the bugs in my Pig code, all the easy cases
  56. worked, all the ones I could check, but there was
  57. something wrong with the maths because I
  58. wasn't winning any of the games, and that was
  59. the only way I could tell: because I wasn't winning
  60. the games. Everything looked plausible apart
  61. from that. So how do you go about it? What sort of
  62. strategies do you use to try and catch
  63. errors in how the computation works out?
  64. Okay. So I can answer that both in terms
  65. of strategies in general and then specifically
  66. for this type of problem you mentioned.
  67. So in general, you want a testing strategy
  68. that does integration testing, where you're
  69. testing the whole thing, and unit testing, where you're
  70. testing a little bit at a time. And what you were
  71. discovering is you have this integration test of
  72. how does my function play against another function.
  73. So that's the whole program all at once. And you
  74. were finding that you were losing. And so that's a
  75. great test to do. It's an integration test and you
  76. failed the test. It's an important test to do because
  77. if you succeed, then you are happy. But if you don't
  78. succeed, at least you know something is wrong, but
  79. it doesn't tell you very much at all about where
  80. it's wrong, and that's what you discovered. So that's
  81. why we have the unit tests that test smaller
  82. pieces, and I guess the thing to do there is to
  83. start with pieces for which you know the right
  84. answer. Say, here is a position that's near the
  85. end of the game; it's clear I should do this move;
  86. does my function do the right thing in that case?
  87. And then if it gets that one right, then back up
  88. one step and say, okay, the move before that, and
  89. I know that that's the right move, what would the
  90. right move be before that? And that maybe you
  91. can calculate out on your own with pencil and paper
  92. and then you can test to see if it got that one
  93. right. Anyway, you start moving backwards and
  94. making the tests more complicated one more step at a
  95. time, and then you can get to the point where the
  96. unit tests match up with the integration test,
  97. and then hopefully you've got the right answer.
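The backward-testing idea in this answer can be sketched in Python. This is only an illustrative sketch, not the course's code: the names `p_win`, `q`, `actions`, and `best_action` are hypothetical, the goal is shrunk to 10 so positions can be checked by hand, and the rules assumed are the Pig variant in which rolling a 1 scores a single point and ends the turn.

```python
from functools import lru_cache

GOAL = 10  # deliberately small (an assumption) so positions can be worked out by hand


def actions(pending):
    """Holding is only legal once there are pending points."""
    return ("roll", "hold") if pending else ("roll",)


@lru_cache(maxsize=None)
def p_win(me, you, pending):
    """Probability that the player about to move wins under optimal play."""
    if me + pending >= GOAL:
        return 1.0
    if you >= GOAL:
        return 0.0
    return max(q(action, me, you, pending) for action in actions(pending))


def q(action, me, you, pending):
    """Win probability for taking `action` in this state."""
    if action == "hold":
        return 1 - p_win(you, me + pending, 0)
    # Roll: a 1 scores a single point and passes the turn; 2..6 add to pending.
    pig_out = 1 - p_win(you, me + 1, 0)
    keep_going = sum(p_win(me, you, pending + d) for d in range(2, 7))
    return (pig_out + keep_going) / 6


def best_action(me, you, pending):
    return max(actions(pending), key=lambda a: q(a, me, you, pending))


def test_endgame_backwards():
    # Step 0: holding wins on the spot, so the answer is obvious.
    assert best_action(8, 0, 2) == "hold"
    # Step 1: at 9 vs 9 every roll (even a 1) reaches the goal.
    assert p_win(9, 9, 0) == 1.0
    # Step 2: at 8 vs 9 a roll of 2..6 wins; a 1 hands the opponent a sure win.
    assert abs(p_win(8, 9, 0) - 5 / 6) < 1e-12


test_endgame_backwards()
```

Each assertion is one step further from the end of the game than the last, and each expected value can be worked out with pencil and paper before the code is run.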
  98. Alright. Ginger, you had a few questions.
  99. Do you want to start asking those?
  100. Yeah, my questions have a general theme.
  101. So the code that I've seen in this class,
  102. so I've actually tinkered with Python a
  103. little bit before this class, but the code that I have
  104. seen in this class is code that I have never
  105. seen before. So it's been really cool to be
  106. exposed to that, but it makes me wonder what
  107. else am I missing, what are the things? It makes
  108. me wonder that there is a whole realm of
  109. developing code that I don't even know about
  110. that's not just the code. So I want to know your
  111. development environment, and I want to know, like,
  112. what type of integrated development environment
  113. you use, whether you use a GUI debugger,
  114. whether you use a profiler, and actually the most
  115. important question is what are some of the most
  116. important third-party Python libraries
  117. that you think we should study?
  118. Okay. So that's a great question, and I might not
  119. be the right person to answer that because
  120. most of my use of Python is educational more than
  121. sort of serious professional development. And so I
  122. intentionally try to stay away from a lot of the fancy
  123. tools because I want to do something
  124. that I know all the students that are going to see
  125. what I am doing can duplicate, and so if it's something
  126. that's a proprietary, third-party fancy tool, I want to
  127. not have that. And I guess I am also kind of
  128. old-fashioned. I don't know what the right IDE is,
  129. and I am playing around with some of them, but
  130. you know, I am used to using Emacs, which I have
  131. used for many, many years, and so I've got all the
  132. commands sort of memorized at my
  133. fingertips, and so I tend to use that. There are
  134. some other IDEs that duplicate a lot of the
  135. commands. So I can be familiar using those as
  136. well, and I'm trying those out. And in terms of the
  137. libraries, I guess the most important ones to me are
  138. SciPy and NumPy, and I really would hope
  139. that at some point they become sort of an official
  140. part of the Python distribution and it's all
  141. packaged together. You can get them packaged
  142. together, but you can't rely on everybody
  143. having those installed. I think the reason that they
  144. haven't been merged together so far is that they
  145. are on different release schedules; in terms of
  146. the number of months or years in between
  147. releases, they're at a different pace. But I hope
  148. that can be worked out and that becomes
  149. standardized; that would be really great. And
  150. then in terms of how I do debugging and so on,
  151. occasionally I use the symbolic debugger, not
  152. that much; mostly just print statements and tests
  153. and breaking it down into small pieces, using the
  154. interactive interpreter a lot. So I load the program
  155. in and then start typing in calls to my
  156. subfunctions in the interactive interpreter, looking at
  157. the results to see if those make sense. I use the
  158. profiler occasionally, but I guess I usually find
  159. that if it's so slow that you need a profiler then
  160. maybe Python isn't the language for you. So often,
  161. if I found it slow, then the answer is to go and
  162. recode it in Java or something rather than to try to
  163. make Python work faster. I know it all depends
  164. on the problem, what you're trying to do, and
  165. what you're trying to integrate with.
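For readers who have not tried the profiler mentioned in this answer, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules. The functions `slow_sum_of_squares`, `square`, and `profile_report` are hypothetical names made up for illustration; nothing here is taken from the class code.

```python
import cProfile
import io
import pstats


def square(x):
    return x * x


def slow_sum_of_squares(n):
    """A deliberately call-heavy loop so the profiler has something to show."""
    total = 0
    for i in range(n):
        total += square(i)
    return total


def profile_report(func, *args):
    """Run func(*args) under the profiler; return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, stream.getvalue()


result, report = profile_report(slow_sum_of_squares, 10000)
print(result)
print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to decide whether a small piece is worth rewriting or whether the whole job belongs in another language.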
  166. Alright, Alison, looks like you have another
  167. question you'd like to ask?
  168. I do, yes. Some of us started off with
  169. CS101 and then moved on to this class, and
  170. we all got a bit stuck on Unit 3 but we're doing
  171. okay now. And we're now thinking about what we
  172. should do in the next hexamester. What do you
  173. think would be a good step on from CS212 in
  174. terms of Udacity's courses, given that we've
  175. probably only got time to do one?
  176. So first of all, I want to say congratulations to
  177. you and all the others who really truly started with
  178. 101 as their first class. I think it's quite
  179. impressive that you've been able to come this far.
  180. And we designed 101 so that it could be taken by
  181. people with no experience before, but a lot of people
  182. in 101 did have some degree of prior experience.
  183. So for those of you who didn't, it's a testament to
  184. your perseverance, and I think it's a sign that 101
  185. worked in terms of being able to hit that broad
  186. audience. Let's see, and can you pull up the list
  187. of classes that we're going to have next? I've got.
  188. You've got it pulled up already? Yes. Okay.
  189. Go ahead. The computer science classes that
  190. we're offering are going to be software testing,
  191. software engineering, and data structures and
  192. algorithms. Maybe you can comment on which of
  193. those three you'd recommend next. Let's see.
  194. And what levels are they at? That's a good
  195. question. It's something I don't have at the top of
  196. my head. Yeah. But I think most of them are
  197. going to be 200 level classes and perhaps the
  198. engineering will be a 100 level. Yeah. Not
  199. committed to any of these statements yet
  200. because we're still ironing out the details. We've
  201. had a lot of professors in lately doing a lot of
  202. filming. We have a new recording studio with five
  203. recording setups. So we'll have content now.
  204. Pretty exciting. But in addition to those three
  205. computer science classes we are going to have a
  206. discrete math course, a statistics course taught
  207. by Sebastian, and a physics course that I will be
  208. teaching. Okay, yes, that all sounds good.
  209. I guess I would probably say that the algorithms
  210. class would tend to be the next class. If I look at the
  211. way people usually learn in a traditional
  212. university, the algorithms class
  213. would be the natural next one.
  214. And who is teaching that one?
  215. Algorithms is going to be taught by Michael Littman.
  216. So, and I know Mike, he is a great teacher.
  217. So I think you would enjoy that class, not to take
  218. anything away from the other ones, but they might be
  219. something that you could take afterwards.
  220. Alright. Ginger, it looks like you have another
  221. question. Yes. And this goes back to Unit 3 again also.
  222. So I was curious as to whether commercial regular
  223. expression and/or JSON parsers are written anything
  224. like what we saw in Unit 3, or was that
  225. just done for the academic purposes of presenting
  226. grammars and the power of generalization?
  227. Okay, that's a great question. And the answer is
  228. somewhat. So many systems like that do write
  229. parsers where the grammar that you're writing is
  230. similar, in that you say, here is the left-hand side,
  231. here is an arrow, here is the right-hand side, and
  232. there are lots of systems that have that basic
  233. type of format, and so that is familiar. You will see
  234. that elsewhere, and that is used in professional
  235. level systems. Now, how that's actually
  236. interpreted is somewhat more complicated
  237. because they want it to be easy for you to
  238. express the grammar but they also want it to be
  239. fast and efficient to run that grammar to parse
  240. some input, and so there are more complications in
  241. terms of how you turn the grammar into a data
  242. structure that can be efficiently processed, and I
  243. didn't want to worry about those complications; I
  244. wanted to just show you that it was possible to do it.
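The "left-hand side, arrow, right-hand side" format described here can be illustrated with a short Python sketch. It is an assumption-laden toy rather than how any production parser generator works: the function name `grammar`, the `=>` arrow, and the `|` separator for alternatives are all choices made for this example, and it covers only the step of turning the written rules into a data structure, not the efficient parsing machinery the answer alludes to.

```python
def grammar(description):
    """Turn lines of the form 'LHS => alt1 | alt2' into
    {LHS: (tokens_of_alt1, tokens_of_alt2)}."""
    rules = {}
    for line in description.strip().splitlines():
        lhs, rhs = line.split("=>", 1)       # left-hand side, arrow, right-hand side
        alternatives = rhs.strip().split("|")
        rules[lhs.strip()] = tuple(tuple(alt.split()) for alt in alternatives)
    return rules


# A small arithmetic grammar written in that format (illustrative only).
G = grammar("""
Exp => Term [+-] Exp | Term
Term => Factor [*/] Term | Factor
Factor => Num | ( Exp )
""")

print(G["Exp"])
```

A real system would typically compile such a dictionary further, for example into tables or automata, so that parsing stays fast on large inputs.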
  245. Alright. Alison, you've got another question.
  246. Yeah, I came prepared like I was told. I wanted to
  247. ask a little bit about how the sort of optimization
  248. function we've been doing in Unit 5 compares to
  249. game AIs, because for a computer game a near
  250. optimal strategy might not be as much fun to play
  251. as one where (That's right.) the computer
  252. actually makes some mistakes, and I know that with
  253. backgammon programs people keep accusing the
  254. writers of cheating with the die rolls because
  255. they're beginning to play well but it looks like
  256. they are lucky. So what sort of approaches
  257. work best in the real world?
  258. Yes, so that's a great question. So in the real
  259. world, usually your goal is not just to win the
  260. game; your goal is to keep the people that are
  261. playing the games happy. And so that
  262. means a couple of things. One, some of the
  263. games are used partly as a teaching mechanism.
  264. So particularly for things like chess and Go,
  265. chess players want to play not just because
  266. they want to play a game, but because they want
  267. to improve their game. And so the system has to
  268. do something; it has to be able to explain in some
  269. way why it did things so that you can learn from
  270. that. So that really changes the focus of it.
  271. Another part you bring out is it's really boring to
  272. lose all the time, and there are several ways
  273. around that. One is you can just have a setting,
  274. right? So how much compute time do you spend?
  275. How far do you look ahead? And that can be a little
  276. slider, and if you're losing, you can pull that slider
  277. back so you can catch up, and if you are
  278. clobbering the computer, you can push the
  279. slider up, so it can try harder, and that's going to
  280. work. The more sophisticated games
  281. actually have an internal model of what's going
  282. on and what makes for interest, and then they
  283. will respond to that; not so much in the chess
  284. and checkers types of games, but in things like
  285. an auto race game where it would be boring if
  286. you were far behind or far ahead. There are
  287. routines there that try to make everybody be
  288. packed together. So the winning cars will slow down
  289. if you are behind and they will speed up if
  290. you pass them. But they have to do it in a
  291. believable way that doesn't just look like the
  292. system is cheating; it looks like it's a real race.
  293. And so, you know, it can be harder to make it look
  294. realistic than it can be to just do the optimal
  295. thing. Alright. Elia, you have another
  296. question? Yes. Dr. Norvig mentioned he uses
  297. Python for educational or recreational
  298. purposes, and I am wondering what sort of
  299. languages we really need to know if we are
  300. intent on pursuing a technical career. For example,
  301. I have a BS in operations research and a
  302. Master's in statistics, and I intend for my career to
  303. actually use these skills, and so I wonder, as a
  304. technical professional, which languages should I
  305. really know so that I could show, yes, I can hack
  306. this, and when does Udacity plan on offering
  307. courses using these languages, because
  308. it's such a great system?
  309. Great. Yeah, so I didn't mean to knock Python
  310. in any way or say that it's not professional level.
  311. Certainly, there are lots and lots of people and
  312. companies that are using Python all the time. I
  313. was just saying I was using a subset of it, sort of
  314. the intersection of what you would use
  315. professionally and what you would use
  316. educationally. So Python is a good answer.
  317. There are other languages that are similar to that
  318. that are gaining in popularity; Ruby and Clojure
  319. you hear about a lot. I guess the strongest
  320. languages traditionally would be Java and C++,
  321. and so they're a little more verbose in terms of
  322. what you have to write but also a little bit more
  323. explicit in terms of how you understand what's
  324. going on in the program. So one of the
  325. disadvantages of Python that we see for use
  326. by professionals is there are no declarations of
  327. what the inputs and outputs of functions are. You
  328. could sort of see what's going on just by looking
  329. at the function, and if you're writing a small
  330. program, that's probably fine, but as the
  331. programs get larger, sometimes you can get lost
  332. and not know exactly what do I pass in here. You
  333. look and say, oh, I see, I should pass in a
  334. sequence to this function; is either a list or
  335. tuple okay, or does it have to be one or the other?
  336. Sometimes it's not obvious in Python, and that can
  337. lead to errors in a larger project where it's just
  338. hard to keep track of all of that stuff. And so a
  339. language like Java is more explicit that way,
  340. even though you have to write more stuff to get to
  341. the same type of result. But I think, you know, in
  342. the end, you look around at the jobs you want
  343. and the community you want to be a part of and
  344. see what they are using. So you know, if
  345. you're a statistician, all the languages I
  346. mentioned I know can be used for those types of
  347. jobs; I know there are also people who use MATLAB
  348. and R and Mathematica and other packages like
  349. that. So you have to look at your community and
  350. see what they use and then be proficient in the
  351. tools that the community expects. The next
  352. question is actually a write-in question that came
  353. in. Portchanista says that there seems to be a
  354. widespread belief that during development,
  355. edge cases make code more and more
  356. complicated and less elegant, and eventually the
  357. lines of special case code become so much
  358. greater than the amount of actual original design
  359. that we get this mess on our hands. So is this
  360. generally true or a myth? So you certainly see
  361. that happening, that a larger portion of the code can
  362. be error handling and then a very small portion is,
  363. here is the core algorithm after I've taken care of
  364. all the errors. And I think there are different styles
  365. that lead to that happening more or less. One is
  366. in terms of the flexibility of the language, and so a
  367. language like Python is very flexible. As I said, it
  368. doesn't care in many cases what you pass in, so
  369. you don't have to spend a lot of time checking the
  370. inputs to say this is exactly the right type of input.
  371. You just say, ah, go ahead and do it. So that
  372. makes the code more concise that way; it could
  373. be less so in another language. I guess it also
  374. depends on how much control you have over the
  375. whole system. So if you are writing your whole
  376. system from scratch, then you can kind of set the
  377. guideline that the only types of objects that I'm
  378. going to create are objects of this type, and I
  379. know that this is all that I am passing around, and
  380. so I can pass them around with impunity and
  381. never have to check. And that's great if you are
  382. building a system from scratch. In other cases,
  383. you have to interface with existing systems and
  384. you never know quite what's coming out of those
  385. other systems. So you will always have to check
  386. and say, if I was passing in something myself to
  387. this function, I know it would be the right thing, but
  388. I am getting it by downloading it off of a website or
  389. calling another server and getting a callback and
  390. being passed something. So now, the first thing I
  391. have to do is check the inputs to see if they are
  392. valid before I can do something. And, yes, true,
  393. that can end up being more work than the sort of
  394. actual real work of the code.
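The split described in this answer, where data from outside gets checked once at the boundary and the core code then trusts what it is handed, might look like the following Python sketch. The record layout (a JSON object with `name` and `score`) and the function names `parse_score_record` and `average_score` are hypothetical, invented only to show the shape of the idea.

```python
import json


def parse_score_record(raw):
    """Validate an untrusted JSON string; return (name, score) or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"not valid JSON: {err}") from None
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name, score = data.get("name"), data.get("score")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or score < 0:
        raise ValueError("score must be a non-negative integer")
    return name, score


def average_score(records):
    """Core logic: assumes every record already passed validation."""
    return sum(score for _, score in records) / len(records)


records = [parse_score_record(raw) for raw in (
    '{"name": "Alison", "score": 40}',
    '{"name": "Ginger", "score": 20}',
)]
print(average_score(records))
```

Note how the validation function is several times longer than the "real work" in `average_score`, which is the imbalance the question was asking about.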
  395. Alright. Well, it looks like that's maybe all the questions
  396. we have for today. Thank you to you guys for asking
  397. those questions. Anybody else got a last one? Oh,
  398. Praseck. Okay. Would you like to, I don't
  399. know if your audio is working, but if it is, try and
  400. give it a shot, and otherwise I can repeat.
  401. Oh, yes. I wanted to ask about the current research
  402. in big data that Google does. I think it's a
  403. very hot topic these days, and it's going to be more
  404. and more important, using probabilistic models,
  405. something like a continuation of what we were
  406. doing in this week's lesson. Yeah. So please, if you
  407. could tell us something more about what you are
  408. doing. Okay, that's a great topic. We're doing a lot,
  409. and I think you are right that a lot of what we do
  410. does rely on probabilistic models, and the reason
  411. is because we are collecting a lot of data and we
  412. are collecting it from a lot of sources, and work
  413. hasn't gone in to verify those sources up front. We
  414. talked about that trade-off: you can verify the data
  415. first or you just grab it as it is. And when we take
  416. stuff off the web, when we gather images, when
  417. we gather text, it's just anything that is out there.
  418. It is not like a well-formed database where
  419. you know you've got a database of employees or you've
  420. got a database of bank records and every record
  421. is verified to be accurate. We don't have any of
  422. that. So we've got all this stuff; some of it is
  423. right, some of it is wrong, much of it is duplicates.
  424. And we can't just take it as is. And we don't
  425. know where the uncertainty is. So we need this
  426. probabilistic model. So that's certainly the
  427. approach. Now, what does it mean to do
  428. research in that area? Lots of things.
  429. One is what is the data you want to work on? You
  430. work on text of various kinds, web pages
  431. obviously, and then there are all sorts of other new
  432. forms now: blogs and tweets and comments
  433. and everything else. There's also audio and
  434. visual, still images and movies; we're collecting
  435. lots and lots of those and learning how to process
  436. them. Those are harder than text because you
  437. can't deal just with the raw pixels or the raw
  438. sound form; you have to interpret that first. And
  439. so there is research in making models of what it
  440. means to be an image, what it means to be a
  441. sound. We have to do all that. Then there is
  442. research in how to do this all efficiently, how to
  443. move around these terabytes and petabytes of data
  444. from one place to another, and get enough
  445. computers into the right place and get the data
  446. processed in the right place. We're doing all that.
  447. There is research in machine learning or in
  448. optimization, of saying what is the optimal result
  449. for this when there is so much data it's hard to
  450. compute using a traditional algorithm. Can we
  451. come up with algorithms that instead approximate the
  452. optimal result and will run efficiently over this? And
  453. so in some cases, that's inventing new algorithms
  454. or adapting existing algorithms to work on these
  455. larger amounts of data and to take advantage of
  456. the computing resources we have. So lots of
  457. work left to be done. You know, I feel like we're
  458. just getting started, but it's an exciting area. So I
  459. promise I won't solve it all before you guys learn
  460. enough that you can contribute.
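One very simple flavor of trading exactness for efficiency on data too large to hold at once is to estimate a statistic from a uniform random sample taken in a single pass. The Python sketch below uses reservoir sampling (Algorithm R) for that; it is a textbook technique chosen here for illustration, not a description of anything Google does, and the names `reservoir_sample` and `approximate_mean` are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng):
    """Keep a uniform random sample of k items from a stream of unknown length,
    using one pass and O(k) memory (Algorithm R)."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # inclusive on both ends
            if j < k:
                sample[j] = item         # replace with probability k / (i + 1)
    return sample


def approximate_mean(stream, k, seed=0):
    """Estimate the mean of a stream from a k-item reservoir sample."""
    sample = reservoir_sample(stream, k, random.Random(seed))
    return sum(sample) / len(sample)


estimate = approximate_mean(range(1_000_000), k=1000)
exact = sum(range(1_000_000)) / 1_000_000
print(estimate, exact)
```

The estimate will not match the exact mean, but it needs only a thousand items in memory no matter how long the stream is, which is the kind of trade-off the answer describes.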
  461. Alright. Well, thank you very much. Thank you
  462. to Alison and Praseck and Ginger and Elia. Yeah,
  463. thank you. And to all the other students
  464. who are watching, thanks. See you next week.
  465. See you next week.