0:00:17.242,0:00:20.037 Six thousand miles of road, 0:00:20.037,0:00:22.018 600 miles of subway track, 0:00:22.018,0:00:24.037 400 miles of bike lanes, 0:00:24.037,0:00:25.721 and a half a mile of tram track, 0:00:25.721,0:00:27.739 if you've ever been to Roosevelt Island. 0:00:27.739,0:00:30.576 These are the numbers that make up[br]the infrastructure of NYC, 0:00:30.576,0:00:32.744 these are the statistics[br]of our infrastructure. 0:00:32.744,0:00:35.780 They're the kind of numbers[br]released in reports by city agencies. 0:00:35.780,0:00:38.953 For example, the Department[br]of Transportation will probably tell you 0:00:38.953,0:00:40.700 how many miles of road they maintain. 0:00:40.700,0:00:43.501 The MTA will boast how many miles[br]of subway track there are. 0:00:43.501,0:00:45.565 But most city agencies give us statistics. 0:00:45.565,0:00:48.802 This is from a report this year[br]from the Taxi & Limousine Commission, 0:00:48.802,0:00:53.019 where we've learned that there is[br]about 13,500 taxis here in NYC. 0:00:53.019,0:00:54.346 Pretty interesting, right? 0:00:54.346,0:00:57.135 But did you ever think about[br]where these numbers came from? 0:00:57.135,0:01:00.061 Because for these numbers to exist[br]somebody at the city agency 0:01:00.061,0:01:03.671 has to stop and say hmm, here's a number[br]that somebody might want to know. 0:01:03.671,0:01:05.900 Here's a number[br]that our citizens want to know. 0:01:05.900,0:01:07.640 So they go back to their raw data, 0:01:07.640,0:01:09.424 they count, they add, they calculate, 0:01:09.424,0:01:11.002 and then they put out reports. 0:01:11.002,0:01:13.501 And those reports[br]will have numbers like this. 0:01:13.501,0:01:16.043 The problem is, how do they know[br]all of our questions? 0:01:16.043,0:01:17.485 We have lots of questions. 0:01:17.485,0:01:20.764 In fact, in some ways there's literally[br]an infinite number of questions 0:01:20.764,0:01:22.260 that we can ask about our city. 
0:01:22.260,0:01:23.893 So the agencies can never keep up. 0:01:23.893,0:01:25.660 So the paradigm isn't exactly working 0:01:25.660,0:01:28.261 and I think our policy makers realize that 0:01:28.261,0:01:31.632 because in 2012, Mayor Bloomberg[br]signed into law what he called 0:01:31.632,0:01:35.791 the most ambitious and comprehensive[br]open data legislation in the country. 0:01:35.791,0:01:37.568 In a lot of ways he's right. 0:01:37.568,0:01:42.390 In the last two years the city's released[br]1,000 data sets on our open data portal 0:01:42.390,0:01:44.123 and it's pretty awesome. 0:01:44.123,0:01:45.559 You look at data like this, 0:01:45.559,0:01:47.623 and instead of counting[br]the number of cabs, 0:01:47.623,0:01:49.600 we can start to ask different questions. 0:01:49.600,0:01:52.373 So I had a question:[br]When is rush hour in NYC? 0:01:52.373,0:01:55.092 It can be pretty bothersome.[br]When is rush hour exactly? 0:01:55.092,0:01:57.820 And I thought to myself,[br]these cabs aren't just numbers, 0:01:57.820,0:02:01.134 these are GPS recorders driving around[br]in our city's streets recording 0:02:01.134,0:02:02.684 each and every ride they take. 0:02:02.684,0:02:03.810 There's data there. 0:02:03.810,0:02:05.845 And I looked at that data[br]and I made a plot 0:02:05.845,0:02:08.573 of the average speed of taxis in NYC[br]throughout the day. 0:02:08.573,0:02:12.983 You can see that from around midnight[br]to around 5:18 AM, speed increases, 0:02:12.983,0:02:15.860 and at that point, things turn around. 0:02:15.860,0:02:19.834 They get slower, slower and slower[br]until about 8:35 AM 0:02:19.834,0:02:22.591 when they end up at 11.5 mph. 0:02:22.591,0:02:25.741 The average taxi is going at 11.5 mph[br]in our city streets, 0:02:25.741,0:02:28.219 and it turns out it stays that way 0:02:28.219,0:02:30.697 for the entire day. 
0:02:30.697,0:02:33.175 (Laughter) 0:02:33.175,0:02:36.011 So I said to myself, I guess[br]there's no rush hour in NYC, 0:02:36.011,0:02:37.376 there's just a "rush day." 0:02:37.376,0:02:38.241 (Laughter) 0:02:38.241,0:02:39.158 Makes sense. 0:02:39.158,0:02:41.138 This is important[br]for a couple of reasons. 0:02:41.138,0:02:44.803 If you are a transportation planner,[br]this might be pretty interesting to know. 0:02:44.803,0:02:46.732 But if you want to get somewhere quickly 0:02:46.732,0:02:49.699 you now know to set your alarm[br]for 4:45 AM and you're all set. 0:02:49.699,0:02:50.497 New York, right? 0:02:50.497,0:02:52.181 But there's a story behind this data, 0:02:52.181,0:02:54.143 it wasn't just available as it turns out. 0:02:54.143,0:02:57.733 It actually came from something called[br]a Freedom of Information Law Request, 0:02:57.733,0:02:58.658 or a FOIL Request. 0:02:58.658,0:03:01.632 This is a form you can find on[br]the Taxi & Limousine Commission website. 0:03:01.632,0:03:04.453 In order to access this data,[br]you need to go get this form, 0:03:04.453,0:03:06.475 fill it out, and they will notify you. 0:03:06.475,0:03:09.282 And a guy named Chris Whong[br]did exactly that. 0:03:09.282,0:03:10.973 Chris went down and they told him, 0:03:10.973,0:03:13.750 "Just bring a brand new hard drive[br]to our office, 0:03:13.750,0:03:17.040 leave it here for 5 hours,[br]we'll copy the data and you take it back." 0:03:17.040,0:03:19.305 And that's where this data came from. 0:03:19.305,0:03:22.349 Now, Chris is the kind of guy[br]that wants to make the data public, 0:03:22.349,0:03:25.975 so it ended up online for all to use[br]and that's where this graph came from. 0:03:25.975,0:03:27.859 And the fact that it exists is amazing. 0:03:27.859,0:03:29.804 These GPS recorders - really cool! 
0:03:29.804,0:03:32.826 But the fact that we have citizens[br]walking around with hard drives 0:03:32.826,0:03:35.353 picking up data from city agencies[br]to make it public - 0:03:35.353,0:03:37.691 it was already kind of public,[br]you could get to it, 0:03:37.691,0:03:39.517 but it was "public", it wasn't public. 0:03:39.517,0:03:41.572 And we can do better than that as a city, 0:03:41.572,0:03:44.349 we don't need our citizens[br]walking around with hard drives. 0:03:44.349,0:03:47.252 Now, not every dataset[br]is behind a FOIL request. 0:03:47.252,0:03:50.809 Here's a map I made with[br]the most dangerous intersections in NYC 0:03:50.809,0:03:53.086 based on cyclist accidents. 0:03:53.086,0:03:54.875 So the red areas are more dangerous. 0:03:54.875,0:03:57.240 What it shows is first[br]the East side of Manhattan, 0:03:57.240,0:04:01.058 especially in the lower area of Manhattan,[br]has more cycle accidents. 0:04:01.058,0:04:02.244 That might make sense 0:04:02.244,0:04:05.320 because there are more cyclists[br]coming off the bridges over there. 0:04:05.320,0:04:07.386 But there's other hotspots worth studying. 0:04:07.386,0:04:10.047 There's Williamsburg.[br]There's Roosevelt Avenue in Queens. 0:04:10.047,0:04:12.754 This is exactly the type of data[br]we need for Vision Zero. 0:04:12.754,0:04:14.728 This is exactly what we're looking for. 0:04:14.728,0:04:16.778 But there's a story[br]behind this data as well. 0:04:16.778,0:04:18.304 This data didn't just appear. 0:04:18.304,0:04:20.825 How many of you guys know this logo? 0:04:20.825,0:04:22.069 Yeah, I see some shakes. 0:04:22.069,0:04:24.753 Have you ever tried to copy[br]and paste data out of a PDF 0:04:24.753,0:04:25.950 and make sense of it? 0:04:25.950,0:04:27.295 I see more shakes. 0:04:27.295,0:04:30.683 More of you tried copying and pasting[br]than knew the logo. I like that. 0:04:30.683,0:04:33.731 What happened is, the data[br]that you just saw was actually on a PDF. 
0:04:33.731,0:04:39.474 In fact, hundreds, and hundreds,[br]of pages of PDF put out by our own NYPD, 0:04:39.474,0:04:40.772 and in order to access it, 0:04:40.772,0:04:44.075 you either have to copy and paste[br]for hundreds and hundreds of hours, 0:04:44.075,0:04:45.590 or you could be John Krauss. 0:04:45.590,0:04:46.861 John Krauss is like, 0:04:46.861,0:04:50.232 I'm not going to copy and paste this data,[br]I'm going to write a program. 0:04:50.232,0:04:52.384 It's called the NYPD Crash Data Band-Aid. 0:04:52.384,0:04:55.227 And it goes to the NYPD's website[br]and it would download PDFs. 0:04:55.227,0:04:56.722 Every day it would search; 0:04:56.722,0:04:58.642 if it found a PDF, it would download it, 0:04:58.642,0:05:00.912 and it would run[br]some PDF-scraping program, 0:05:00.912,0:05:02.296 and out would come the text 0:05:02.296,0:05:05.616 and it would go on the Internet,[br]and people could make maps like that. 0:05:05.616,0:05:08.831 And the fact that the data is here,[br]that we can have access to it - 0:05:08.831,0:05:11.452 every accident, by the way, is a row[br]on this table. 0:05:11.452,0:05:13.280 You can imagine how many PDFs that is. 0:05:13.280,0:05:15.463 The fact that we[br]have access to that is great. 0:05:15.463,0:05:17.836 But let's not release it in PDF form. 0:05:17.836,0:05:20.570 Because then we're having our citizens[br]write PDF scrapers. 0:05:20.570,0:05:22.706 It's not the best use[br]of our citizens' time, 0:05:22.706,0:05:24.724 and we, as a city,[br]can do better than that. 0:05:24.724,0:05:27.096 The good news is that[br]the de Blasio Administration 0:05:27.096,0:05:30.077 actually released this data[br]a few months ago, 0:05:30.077,0:05:31.756 so now, we can have access to it. 0:05:31.756,0:05:34.353 But there's a lot of data[br]still entombed in PDF. 0:05:34.353,0:05:37.831 For example our crime data,[br]still is only available in PDF. 
0:05:37.831,0:05:39.412 And not just our crime data, 0:05:39.412,0:05:41.638 our own city budget. 0:05:41.638,0:05:45.406 Our city budget is only[br]readable right now in PDF form. 0:05:45.406,0:05:47.441 And it's not just us[br]that can't analyze it - 0:05:47.441,0:05:50.152 our own legislators[br]who vote for the budget, 0:05:50.152,0:05:52.085 also only get it in PDF. 0:05:52.085,0:05:55.892 So our legislators cannot analyze[br]the budget that they are voting for. 0:05:55.892,0:05:59.597 And I think as a city we can do[br]a little better than that as well. 0:05:59.597,0:06:02.082 Now, there's a lot of data[br]that's not hidden in PDFs. 0:06:02.082,0:06:03.839 This is an example of a map I made. 0:06:03.839,0:06:07.003 And these are the dirtiest waterways in NYC. 0:06:07.003,0:06:08.319 How do I measure dirty? 0:06:08.319,0:06:09.945 Well, it's kind of a little weird, 0:06:09.945,0:06:12.418 but I looked at the level[br]of fecal coliform, 0:06:12.418,0:06:15.627 which is a measurement of fecal matter[br]in each of our waterways. 0:06:15.627,0:06:19.068 The larger the circle,[br]the dirtier the water. 0:06:19.068,0:06:22.273 The large circles are dirty waters,[br]the smaller circles are cleaner. 0:06:22.273,0:06:24.211 What you see is inland waterways. 0:06:24.211,0:06:27.460 This is all data that was sampled[br]by the city over the last 5 years. 0:06:27.460,0:06:29.716 And inland waterways are,[br]in general, dirtier. 0:06:29.716,0:06:31.132 That makes sense, right? 0:06:31.132,0:06:32.970 And I learned a few things from this. 0:06:32.970,0:06:39.277 Number 1: never swim in anything[br]that ends in creek or canal. 0:06:39.277,0:06:42.351 Number 2: I also found[br]the dirtiest waterways in New York City 0:06:42.351,0:06:44.047 by this measure, one measure. 0:06:44.047,0:06:45.120 In Coney Island Creek, 0:06:45.120,0:06:48.476 which is not the Coney Island you swim in,[br]luckily, it's on the other side. 
0:06:48.476,0:06:52.685 But Coney Island Creek, 94% of samples[br]taken over the last 5 years 0:06:52.685,0:06:55.220 have had fecal levels so high, 0:06:55.220,0:06:58.471 that it would be against state law[br]to swim in the water. 0:06:58.471,0:07:01.099 And this is not the kind of fact[br]that you're going to see 0:07:01.099,0:07:03.767 boasted in a city report[br]or on the front page of nyc.gov. 0:07:03.767,0:07:05.313 You're not going to see it there, 0:07:05.313,0:07:08.125 but the fact that we can[br]get to that data, is awesome. 0:07:08.125,0:07:09.925 Once again, it wasn't super easy, 0:07:09.925,0:07:12.251 because this data was not[br]on the open data portal. 0:07:12.251,0:07:14.255 If you were to go to the open data portal, 0:07:14.255,0:07:16.774 you'd see just a snippet of it,[br]a year or a few months. 0:07:16.774,0:07:20.078 It was actually on the Department[br]of Environmental Protection's website. 0:07:20.078,0:07:24.023 Each one of these links is an Excel sheet,[br]and this Excel sheet is different. 0:07:24.023,0:07:26.667 Every heading is different:[br]you copy, paste, reorganize. 0:07:26.667,0:07:29.592 When you do you can make maps[br]and that's great, but once again, 0:07:29.592,0:07:32.473 we can do better than that as a city,[br]we can normalize things. 0:07:32.473,0:07:35.653 We're getting there because[br]there's this website that Socrata makes, 0:07:35.653,0:07:37.131 called the Open Data Portal NYC. 0:07:37.131,0:07:39.275 This is where 1100 data sets,[br]that don't suffer 0:07:39.275,0:07:40.829 from the things I told you live, 0:07:40.829,0:07:42.899 and that number is growing,[br]and that's great. 0:07:42.899,0:07:46.695 You can download data in any format,[br]be it CSV or PDF or Excel document. 0:07:46.695,0:07:49.700 Whatever you want,[br]you can download the data that way. 0:07:49.700,0:07:51.156 The problem is, once you do, 0:07:51.156,0:07:55.625 you'll find that each agency[br]codes their addresses differently. 
0:07:55.625,0:07:57.670 So, one is street name,[br]intersection street, 0:07:57.670,0:08:00.155 street, borough, address building,[br]building, address. 0:08:00.155,0:08:03.343 So, once again, you're spending time,[br]even when we have this portal, 0:08:03.343,0:08:05.862 you're spending time[br]normalizing our address field. 0:08:05.862,0:08:08.380 I think that's not the best use[br]of our citizens' time, 0:08:08.380,0:08:10.227 we can do better than that as a city. 0:08:10.227,0:08:11.863 We can standardize our addresses. 0:08:11.863,0:08:13.811 If we do, we can get more maps like this. 0:08:13.811,0:08:16.062 This is a map of fire hydrants[br]in New York City. 0:08:16.062,0:08:17.645 But not just any fire hydrant. 0:08:17.645,0:08:20.171 These are the top 250[br]grossing fire hydrants 0:08:20.171,0:08:22.862 in terms of parking tickets. 0:08:22.862,0:08:24.988 (Laughter) 0:08:24.988,0:08:27.109 So I learned a few things from this map. 0:08:27.109,0:08:30.239 Number 1: just don't park[br]on the Upper East side. 0:08:30.239,0:08:33.516 Just don't. No matter where you park,[br]you will get a hydrant ticket. 0:08:33.516,0:08:37.952 Number 2: I found the two highest[br]grossing hydrants in all of New York City. 0:08:37.952,0:08:39.475 They are on the Lower East side, 0:08:39.475,0:08:44.597 and they are bringing in over[br]55,000 dollars a year in parking tickets. 0:08:44.597,0:08:47.262 And that seemed a little strange to me[br]when I noticed it, 0:08:47.262,0:08:49.374 so I did a little digging,[br]and it turns out 0:08:49.374,0:08:52.610 what you had is a hydrant[br]and something called a curb extension, 0:08:52.610,0:08:54.663 which is like a seven-foot space[br]to walk on, 0:08:54.663,0:08:55.846 and then a parking spot. 0:08:55.846,0:08:57.940 So these cars came along and the hydrant - 0:08:57.940,0:08:59.785 "It's all the way over there, I'm fine," 0:08:59.785,0:09:03.254 and there was actually a parking spot[br]painted there beautifully for them. 
0:09:03.254,0:09:06.269 They would park there and the NYPD[br]disagreed with the designation, 0:09:06.269,0:09:07.562 and would ticket them. 0:09:07.562,0:09:09.905 And it wasn't just me[br]who found a parking ticket. 0:09:09.905,0:09:13.612 This is the Google street view car[br]driving by, finding the same parking ticket. 0:09:13.612,0:09:16.076 So I wrote about this[br]on my blog, on I Quant NY, 0:09:16.076,0:09:18.465 and the DOT responded and they said, 0:09:18.465,0:09:22.792 "While the DOT has not received[br]any complaints about this location, 0:09:22.792,0:09:27.341 we will review the roadway markings[br]and make any appropriate alterations." 0:09:27.341,0:09:30.329 I thought to myself, you know,[br]typical government response, 0:09:30.329,0:09:31.944 all right, moved on with my life. 0:09:31.944,0:09:36.519 But then, a few weeks later,[br]something incredible happened. 0:09:36.519,0:09:38.638 They repainted the spot. 0:09:38.638,0:09:41.299 And for a second I thought[br]I saw the future of open data 0:09:41.299,0:09:43.216 because think about what happened here. 0:09:43.216,0:09:48.423 For five years, this spot[br]was being ticketed, and it was confusing. 0:09:48.423,0:09:52.804 And then a citizen found something,[br]they told the city and within a few weeks, 0:09:52.804,0:09:55.286 the problem was fixed. It's amazing. 0:09:55.286,0:09:58.092 A lot of people see open data[br]as being a watchdog, it's not. 0:09:58.092,0:09:59.375 It's about being a partner. 0:09:59.375,0:10:02.504 We can empower our citizens to be[br]better partners for government, 0:10:02.504,0:10:04.081 and it's not that hard. 0:10:04.081,0:10:05.530 All we need are a few changes. 0:10:05.530,0:10:06.663 If you're FOILing data, 0:10:06.663,0:10:09.305 if you're seeing your data[br]being FOILed over and over again, 0:10:09.305,0:10:12.109 let's release it to the public, that's[br]a sign that it should be made public. 
0:10:12.109,0:10:15.015 And if you're a government agency[br]releasing a PDF, 0:10:15.015,0:10:18.909 let's pass legislation that requires you[br]to post it with your underlying data, 0:10:18.909,0:10:20.944 because that data[br]is coming from somewhere. 0:10:20.944,0:10:23.649 I don't know where,[br]but you can release it with the PDF. 0:10:23.649,0:10:25.981 And let's adopt and share[br]some open data standards. 0:10:25.981,0:10:28.680 Let's start with our addresses[br]here in New York City. 0:10:28.680,0:10:30.707 Let's just start[br]normalizing our addresses. 0:10:30.707,0:10:32.691 Because New York is a leader in open data. 0:10:32.691,0:10:35.318 Despite all this, we're absolutely[br]a leader in open data, 0:10:35.318,0:10:38.375 and if we start normalizing things,[br]and set an open data standard, 0:10:38.375,0:10:39.332 others will follow. 0:10:39.332,0:10:41.834 The state will follow,[br]maybe the federal government, 0:10:41.834,0:10:43.393 other countries could follow, 0:10:43.393,0:10:47.112 and we're not that far off from a time[br]where you can write one program 0:10:47.112,0:10:49.176 and map information from 100 countries. 0:10:49.176,0:10:51.750 It's not science fiction,[br]we're actually quite close. 0:10:51.750,0:10:54.037 And by the way, who are we[br]empowering with this? 0:10:54.037,0:10:57.614 Because it's not just John Krauss,[br]it's not just Chris Whong. 0:10:57.614,0:11:00.905 There are hundreds of meetups[br]going around in New York City right now, 0:11:00.905,0:11:02.027 active meetups. 0:11:02.027,0:11:04.606 There are thousands of people[br]attending these meetups. 0:11:04.606,0:11:06.973 These people are going after work[br]and on weekends, 0:11:06.973,0:11:09.758 and they're attending these meetups[br]to look at open data, 0:11:09.758,0:11:11.384 and make our city a better place. 
0:11:11.384,0:11:15.929 Groups like BetaNYC who just last week,[br]released something called citygram.nyc 0:11:15.929,0:11:18.161 that allows you to subscribe[br]to 311 complaints 0:11:18.161,0:11:20.305 around your own home,[br]or around your office. 0:11:20.305,0:11:22.686 You put in your address,[br]you get local complaints. 0:11:22.686,0:11:25.774 And it's not just the tech community[br]that are after these things. 0:11:25.774,0:11:28.399 It's urban planners like the students[br]I teach at Pratt. 0:11:28.399,0:11:30.363 It's policy advocates, it's everyone, 0:11:30.363,0:11:32.919 it's citizens from a diverse[br]set of backgrounds. 0:11:32.919,0:11:35.680 And with some small incremental changes, 0:11:35.680,0:11:38.788 we can unlock the passion[br]and the ability of our citizens 0:11:38.788,0:11:41.897 to harness open data[br]and make our city even better, 0:11:41.897,0:11:45.958 whether it's one data set[br]or one parking spot at a time. 0:11:45.958,0:11:47.213 Thank you. 0:11:47.213,0:11:50.274 (Applause)