WEBVTT

00:00:17.242 --> 00:00:20.037
Six thousand miles of road,

00:00:20.037 --> 00:00:22.018
600 miles of subway track,

00:00:22.018 --> 00:00:24.037
400 miles of bike lanes,

00:00:24.037 --> 00:00:25.721
and half a mile of tram track,

00:00:25.721 --> 00:00:27.739
if you've ever been to Roosevelt Island.

00:00:27.739 --> 00:00:30.576
These are the numbers that make up the infrastructure of NYC,

00:00:30.576 --> 00:00:32.744
these are the statistics of our infrastructure.

00:00:32.744 --> 00:00:35.780
They're the kind of numbers released in reports by city agencies.

00:00:35.780 --> 00:00:38.953
For example, the Department of Transportation will probably tell you

00:00:38.953 --> 00:00:40.700
how many miles of road they maintain.

00:00:40.700 --> 00:00:43.501
The MTA will boast how many miles of subway track there are.

00:00:43.501 --> 00:00:45.565
But most city agencies give us statistics.

00:00:45.565 --> 00:00:48.802
This is from a report this year from the Taxi & Limousine Commission,

00:00:48.802 --> 00:00:53.019
where we've learned that there are about 13,500 taxis here in NYC.

00:00:53.019 --> 00:00:54.346
Pretty interesting, right?

00:00:54.346 --> 00:00:57.135
But did you ever think about where these numbers came from?

00:00:57.135 --> 00:01:00.061
Because for these numbers to exist, somebody at the city agency

00:01:00.061 --> 00:01:03.671
has to stop and say, hmm, here's a number that somebody might want to know.

00:01:03.671 --> 00:01:05.900
Here's a number that our citizens want to know.

00:01:05.900 --> 00:01:07.640
So they go back to their raw data,

00:01:07.640 --> 00:01:09.424
they count, they add, they calculate,

00:01:09.424 --> 00:01:11.002
and then they put out reports.

00:01:11.002 --> 00:01:13.501
And those reports will have numbers like this.

00:01:13.501 --> 00:01:16.043
The problem is, how do they know all of our questions?

00:01:16.043 --> 00:01:17.485
We have lots of questions.

00:01:17.485 --> 00:01:20.764
In fact, in some ways there's literally an infinite number of questions

00:01:20.764 --> 00:01:22.260
that we can ask about our city.

00:01:22.260 --> 00:01:23.893
So the agencies can never keep up.

00:01:23.893 --> 00:01:25.660
So the paradigm isn't exactly working,

00:01:25.660 --> 00:01:28.261
and I think our policymakers realize that,

00:01:28.261 --> 00:01:31.632
because in 2012, Mayor Bloomberg signed into law what he called

00:01:31.632 --> 00:01:35.791
the most ambitious and comprehensive open data legislation in the country.

00:01:35.791 --> 00:01:37.568
In a lot of ways, he's right.

00:01:37.568 --> 00:01:42.390
In the last two years, the city has released 1,000 data sets on our open data portal,

00:01:42.390 --> 00:01:44.123
and it's pretty awesome.

00:01:44.123 --> 00:01:45.559
You look at data like this,

00:01:45.559 --> 00:01:47.623
and instead of counting the number of cabs,

00:01:47.623 --> 00:01:49.600
we can start to ask different questions.

00:01:49.600 --> 00:01:52.373
So I had a question: when is rush hour in NYC?

00:01:52.373 --> 00:01:55.092
It can be pretty bothersome. When is rush hour exactly?

00:01:55.092 --> 00:01:57.820
And I thought to myself, these cabs aren't just numbers,

00:01:57.820 --> 00:02:01.134
these are GPS recorders driving around in our city's streets, recording

00:02:01.134 --> 00:02:02.684
each and every ride they take.

00:02:02.684 --> 00:02:03.810
There's data there.

00:02:03.810 --> 00:02:05.845
And I looked at that data and I made a plot

00:02:05.845 --> 00:02:08.573
of the average speed of taxis in NYC throughout the day.

00:02:08.573 --> 00:02:12.983
You can see that from around midnight to around 5:18 AM, speed increases,

00:02:12.983 --> 00:02:15.860
and at that point, things turn around.

00:02:15.860 --> 00:02:19.834
They get slower and slower and slower until about 8:35 AM,

00:02:19.834 --> 00:02:22.591
when they end up at 11.5 mph.

00:02:22.591 --> 00:02:25.741
The average taxi is going 11.5 mph on our city streets,

00:02:25.741 --> 00:02:28.219
and it turns out it stays that way

00:02:28.219 --> 00:02:30.697
for the entire day.

00:02:30.697 --> 00:02:33.175
(Laughter)

00:02:33.175 --> 00:02:36.011
So I said to myself, I guess there's no rush hour in NYC,

00:02:36.011 --> 00:02:37.376
there's just a "rush day."

00:02:37.376 --> 00:02:38.241
(Laughter)

00:02:38.241 --> 00:02:39.158
Makes sense.

00:02:39.158 --> 00:02:41.138
This is important for a couple of reasons.

00:02:41.138 --> 00:02:44.803
If you are a transportation planner, this might be pretty interesting to know.

00:02:44.803 --> 00:02:46.732
But if you want to get somewhere quickly,

00:02:46.732 --> 00:02:49.699
you now know to set your alarm for 4:45 AM, and you're all set.

00:02:49.699 --> 00:02:50.497
New York, right?

00:02:50.497 --> 00:02:52.181
But there's a story behind this data;

00:02:52.181 --> 00:02:54.143
it wasn't just available, as it turns out.

00:02:54.143 --> 00:02:57.733
It actually came from something called a Freedom of Information Law request,

00:02:57.733 --> 00:02:58.658
or a FOIL request.

00:02:58.658 --> 00:03:01.632
This is a form you can find on the Taxi & Limousine Commission website.

00:03:01.632 --> 00:03:04.453
In order to access this data, you need to go get this form,

00:03:04.453 --> 00:03:06.475
fill it out, and they will notify you.

00:03:06.475 --> 00:03:09.282
And a guy named Chris Whong did exactly that.

00:03:09.282 --> 00:03:10.973
Chris went down and they told him,

00:03:10.973 --> 00:03:13.750
"Just bring a brand-new hard drive to our office,

00:03:13.750 --> 00:03:17.040
leave it here for 5 hours, we'll copy the data, and you take it back."

00:03:17.040 --> 00:03:19.305
And that's where this data came from.

00:03:19.305 --> 00:03:22.349
Now, Chris is the kind of guy who wants to make the data public,

00:03:22.349 --> 00:03:25.975
so it ended up online for all to use, and that's where this graph came from.

00:03:25.975 --> 00:03:27.859
And the fact that it exists is amazing.

00:03:27.859 --> 00:03:29.804
These GPS recorders, really cool!

00:03:29.804 --> 00:03:32.826
But the fact that we have citizens walking around with hard drives

00:03:32.826 --> 00:03:35.353
picking up data from city agencies to make it public...

00:03:35.353 --> 00:03:37.691
it was already kind of public, you could get to it,

00:03:37.691 --> 00:03:39.517
but it was "public," it wasn't public.

00:03:39.517 --> 00:03:41.572
And we can do better than that as a city;

00:03:41.572 --> 00:03:44.349
we don't need our citizens walking around with hard drives.

00:03:44.349 --> 00:03:47.252
Now, not every data set is behind a FOIL request.

00:03:47.252 --> 00:03:50.809
Here's a map I made of the most dangerous intersections in NYC

00:03:50.809 --> 00:03:53.086
based on cyclist accidents.

00:03:53.086 --> 00:03:54.875
So the red areas are more dangerous.

00:03:54.875 --> 00:03:57.240
What it shows is, first, that the East Side of Manhattan,

00:03:57.240 --> 00:04:01.058
especially the lower part of Manhattan, has more cyclist accidents.

00:04:01.058 --> 00:04:02.244
That might make sense,

00:04:02.244 --> 00:04:05.320
because there are more cyclists coming off the bridges over there.

00:04:05.320 --> 00:04:07.386
But there are other hotspots worth studying.

00:04:07.386 --> 00:04:10.047
There's Williamsburg. There's Roosevelt Avenue in Queens.

00:04:10.047 --> 00:04:12.754
This is exactly the type of data we need for Vision Zero.

00:04:12.754 --> 00:04:14.728
This is exactly what we're looking for.

00:04:14.728 --> 00:04:16.778
But there's a story behind this data as well.

00:04:16.778 --> 00:04:18.304
This data didn't just appear.

00:04:18.304 --> 00:04:20.825
How many of you guys know this logo?

00:04:20.825 --> 00:04:22.069
Yeah, I see some shakes.

00:04:22.069 --> 00:04:24.753
Have you ever tried to copy and paste data out of a PDF

00:04:24.753 --> 00:04:25.950
and make sense of it?

00:04:25.950 --> 00:04:27.295
I see more shakes.

00:04:27.295 --> 00:04:30.683
More of you tried copying and pasting than knew the logo. I like that.

00:04:30.683 --> 00:04:33.731
What happened is, the data that you just saw was actually in a PDF.

00:04:33.731 --> 00:04:39.474
In fact, hundreds and hundreds of pages of PDFs put out by our own NYPD,

00:04:39.474 --> 00:04:40.772
and in order to access it,

00:04:40.772 --> 00:04:44.075
you either have to copy and paste for hundreds and hundreds of hours,

00:04:44.075 --> 00:04:45.590
or you could be John Krauss.

00:04:45.590 --> 00:04:46.861
John Krauss is like,

00:04:46.861 --> 00:04:50.232
"I'm not going to copy and paste this data. I'm going to write a program."

00:04:50.232 --> 00:04:52.384
It's called the NYPD Crash Data Band-Aid.

00:04:52.384 --> 00:04:55.227
And it would go to the NYPD's website and download PDFs.

00:04:55.227 --> 00:04:56.722
Every day it would search;

00:04:56.722 --> 00:04:58.642
if it found a PDF, it would download it,

00:04:58.642 --> 00:05:00.912
and it would run some PDF-scraping program,

00:05:00.912 --> 00:05:02.296
and out would come the text,

00:05:02.296 --> 00:05:05.616
and it would go on the Internet, and people could make maps like that.

00:05:05.616 --> 00:05:08.831
And the fact that the data is here, that we can have access to it...

00:05:08.831 --> 00:05:11.452
Every accident, by the way, is a row in this table.

00:05:11.452 --> 00:05:13.280
You can imagine how many PDFs that is.

00:05:13.280 --> 00:05:15.463
The fact that we have access to that is great.

00:05:15.463 --> 00:05:17.836
But let's not release it in PDF form,

00:05:17.836 --> 00:05:20.570
because then we're having our citizens write PDF scrapers.

00:05:20.570 --> 00:05:22.706
It's not the best use of our citizens' time,

00:05:22.706 --> 00:05:24.724
and we, as a city, can do better than that.

00:05:24.724 --> 00:05:27.096
The good news is that the de Blasio administration

00:05:27.096 --> 00:05:30.077
actually released this data a few months ago,

00:05:30.077 --> 00:05:31.756
so now we can have access to it.

00:05:31.756 --> 00:05:34.353
But there's a lot of data still entombed in PDFs.

00:05:34.353 --> 00:05:37.831
For example, our crime data is still only available in PDF.

00:05:37.831 --> 00:05:39.412
And not just our crime data:

00:05:39.412 --> 00:05:41.638
our own city budget.

00:05:41.638 --> 00:05:45.406
Our city budget is only readable right now in PDF form.

00:05:45.406 --> 00:05:47.441
And it's not just us who can't analyze it;

00:05:47.441 --> 00:05:50.152
our own legislators, who vote for the budget,

00:05:50.152 --> 00:05:52.085
also only get it in PDF.

00:05:52.085 --> 00:05:55.892
So our legislators cannot analyze the budget that they are voting for.

00:05:55.892 --> 00:05:59.597
And I think as a city we can do a little better than that as well.

00:05:59.597 --> 00:06:02.082
Now, there's a lot of data that's not hidden in PDFs.

00:06:02.082 --> 00:06:03.839
This is an example of a map I made.

00:06:03.839 --> 00:06:07.003
And these are the dirtiest waterways in NYC.

00:06:07.003 --> 00:06:08.319
How do I measure dirty?

00:06:08.319 --> 00:06:09.945
Well, it's kind of a little weird,

00:06:09.945 --> 00:06:12.418
but I looked at the level of fecal coliform,

00:06:12.418 --> 00:06:15.627
which is a measurement of fecal matter in each of our waterways.

00:06:15.627 --> 00:06:19.068
The larger the circle, the dirtier the water.

00:06:19.068 --> 00:06:22.273
The large circles are dirty water; the smaller circles are cleaner.

00:06:22.273 --> 00:06:24.211
What you see are inland waterways.

00:06:24.211 --> 00:06:27.460
This is all data that was sampled by the city over the last 5 years.

00:06:27.460 --> 00:06:29.716
And inland waterways are, in general, dirtier.

00:06:29.716 --> 00:06:31.132
That makes sense, right?

00:06:31.132 --> 00:06:32.970
And I learned a few things from this.

00:06:32.970 --> 00:06:39.277
Number 1: never swim in anything that ends in "creek" or "canal."

00:06:39.277 --> 00:06:42.351
Number 2: I also found the dirtiest waterway in New York City

00:06:42.351 --> 00:06:44.047
by this measure, one measure.

00:06:44.047 --> 00:06:45.120
It's Coney Island Creek,

00:06:45.120 --> 00:06:48.476
which is not the Coney Island you swim in, luckily; it's on the other side.

00:06:48.476 --> 00:06:52.685
But in Coney Island Creek, 94% of samples taken over the last 5 years

00:06:52.685 --> 00:06:55.220
have had fecal levels so high

00:06:55.220 --> 00:06:58.471
that it would be against state law to swim in the water.

00:06:58.471 --> 00:07:01.099
And this is not the kind of fact that you're going to see

00:07:01.099 --> 00:07:03.767
boasted in a city report or on the front page of nyc.gov.

00:07:03.767 --> 00:07:05.313
You're not going to see it there,

00:07:05.313 --> 00:07:08.125
but the fact that we can get to that data is awesome.

00:07:08.125 --> 00:07:09.925
Once again, it wasn't super easy,

00:07:09.925 --> 00:07:12.251
because this data was not on the open data portal.

00:07:12.251 --> 00:07:14.255
If you were to go to the open data portal,

00:07:14.255 --> 00:07:16.774
you'd see just a snippet of it, a year or a few months.

00:07:16.774 --> 00:07:20.078
It was actually on the Department of Environmental Protection's website.

00:07:20.078 --> 00:07:24.023
Each one of these links is an Excel sheet, and each Excel sheet is different.

00:07:24.023 --> 00:07:26.667
Every heading is different: you copy, paste, reorganize.

00:07:26.667 --> 00:07:29.592
When you do, you can make maps, and that's great, but once again,

00:07:29.592 --> 00:07:32.473
we can do better than that as a city; we can normalize things.

00:07:32.473 --> 00:07:35.653
We're getting there, because there's this website that Socrata makes,

00:07:35.653 --> 00:07:37.131
called the Open Data Portal NYC.

00:07:37.131 --> 00:07:39.275
This is where 1,100 data sets that don't suffer

00:07:39.275 --> 00:07:40.829
from the things I told you about live,

00:07:40.829 --> 00:07:42.899
and that number is growing, and that's great.

00:07:42.899 --> 00:07:46.695
You can download data in any format, be it CSV or PDF or Excel document.

00:07:46.695 --> 00:07:49.700
Whatever you want, you can download the data that way.

00:07:49.700 --> 00:07:51.156
The problem is, once you do,

00:07:51.156 --> 00:07:55.625
you'll find that each agency codes its addresses differently.

00:07:55.625 --> 00:07:57.670
So, one is street name, intersection street,

00:07:57.670 --> 00:08:00.155
street, borough, address building, building, address.

00:08:00.155 --> 00:08:03.343
So, once again, you're spending time, even when we have this portal,

00:08:03.343 --> 00:08:05.862
you're spending time normalizing our address fields.

00:08:05.862 --> 00:08:08.380
I think that's not the best use of our citizens' time;

00:08:08.380 --> 00:08:10.227
we can do better than that as a city.

00:08:10.227 --> 00:08:11.863
We can standardize our addresses.

00:08:11.863 --> 00:08:13.811
If we do, we can get more maps like this.

00:08:13.811 --> 00:08:16.062
This is a map of fire hydrants in New York City.

00:08:16.062 --> 00:08:17.645
But not just any fire hydrants.

00:08:17.645 --> 00:08:20.171
These are the 250 highest-grossing fire hydrants

00:08:20.171 --> 00:08:22.862
in terms of parking tickets.

00:08:22.862 --> 00:08:24.988
(Laughter)

00:08:24.988 --> 00:08:27.109
So I learned a few things from this map.

00:08:27.109 --> 00:08:30.239
Number 1: just don't park on the Upper East Side.

00:08:30.239 --> 00:08:33.516
Just don't. No matter where you park, you will get a hydrant ticket.

00:08:33.516 --> 00:08:37.952
Number 2: I found the two highest-grossing hydrants in all of New York City.

00:08:37.952 --> 00:08:39.475
They are on the Lower East Side,

00:08:39.475 --> 00:08:44.597
and they are bringing in over 55,000 dollars a year in parking tickets.

00:08:44.597 --> 00:08:47.262
And that seemed a little strange to me when I noticed it,

00:08:47.262 --> 00:08:49.374
so I did a little digging, and it turns out

00:08:49.374 --> 00:08:52.610
what you had is a hydrant and something called a curb extension,

00:08:52.610 --> 00:08:54.663
which is like a seven-foot space to walk on,

00:08:54.663 --> 00:08:55.846
and then a parking spot.

00:08:55.846 --> 00:08:57.940
So these cars came along, and the hydrant?

00:08:57.940 --> 00:08:59.785
"It's all the way over there, I'm fine,"

00:08:59.785 --> 00:09:03.254
and there was actually a parking spot painted there beautifully for them.

00:09:03.254 --> 00:09:06.269
They would park there, and the NYPD disagreed with the designation

00:09:06.269 --> 00:09:07.562
and would ticket them.

00:09:07.562 --> 00:09:09.905
And it wasn't just me who found a parking ticket.

00:09:09.905 --> 00:09:13.612
This is the Google Street View car driving by, finding the same parking ticket.

00:09:13.612 --> 00:09:16.076
So I wrote about this on my blog, I Quant NY,

00:09:16.076 --> 00:09:18.465
and the DOT responded and they said,

00:09:18.465 --> 00:09:22.792
"While the DOT has not received any complaints about this location,

00:09:22.792 --> 00:09:27.341
we will review the roadway markings and make any appropriate alterations."

00:09:27.341 --> 00:09:30.329
I thought to myself, you know, typical government response,

00:09:30.329 --> 00:09:31.944
all right, and moved on with my life.

00:09:31.944 --> 00:09:36.519
But then, a few weeks later, something incredible happened.

00:09:36.519 --> 00:09:38.638
They repainted the spot.

00:09:38.638 --> 00:09:41.299
And for a second, I thought I saw the future of open data,

00:09:41.299 --> 00:09:43.216
because think about what happened here.

00:09:43.216 --> 00:09:48.423
For five years, this spot was being ticketed, and it was confusing.

00:09:48.423 --> 00:09:52.804
And then a citizen found something, they told the city, and within a few weeks,

00:09:52.804 --> 00:09:55.286
the problem was fixed. It's amazing.

00:09:55.286 --> 00:09:58.092
A lot of people see open data as being a watchdog. It's not.

00:09:58.092 --> 00:09:59.375
It's about being a partner.

00:09:59.375 --> 00:10:02.504
We can empower our citizens to be better partners for government,

00:10:02.504 --> 00:10:04.081
and it's not that hard.

00:10:04.081 --> 00:10:05.530
All we need are a few changes.

00:10:05.530 --> 00:10:06.663
If you're FOILing data,

00:10:06.663 --> 00:10:09.305
if you're seeing your data being FOILed over and over again,

00:10:09.305 --> 00:10:12.109
let's release it to the public; that's a sign that it should be made public.

00:10:12.109 --> 00:10:15.015
And if you're a government agency releasing a PDF,

00:10:15.015 --> 00:10:18.909
let's pass legislation that requires you to post it with your underlying data,

00:10:18.909 --> 00:10:20.944
because that data is coming from somewhere.

00:10:20.944 --> 00:10:23.649
I don't know where, but you can release it with the PDF.

00:10:23.649 --> 00:10:25.981
And let's adopt and share some open data standards.

00:10:25.981 --> 00:10:28.680
Let's start with our addresses here in New York City.

00:10:28.680 --> 00:10:30.707
Let's just start normalizing our addresses.

00:10:30.707 --> 00:10:32.691
Because New York is a leader in open data.

00:10:32.691 --> 00:10:35.318
Despite all this, we're absolutely a leader in open data,

00:10:35.318 --> 00:10:38.375
and if we start normalizing things and set an open data standard,

00:10:38.375 --> 00:10:39.332
others will follow.

00:10:39.332 --> 00:10:41.834
The state will follow, maybe the federal government,

00:10:41.834 --> 00:10:43.393
other countries could follow,

00:10:43.393 --> 00:10:47.112
and we're not that far off from a time when you can write one program

00:10:47.112 --> 00:10:49.176
and map information from 100 countries.

00:10:49.176 --> 00:10:51.750
It's not science fiction; we're actually quite close.

00:10:51.750 --> 00:10:54.037
And by the way, who are we empowering with this?

00:10:54.037 --> 00:10:57.614
Because it's not just John Krauss, it's not just Chris Whong.

00:10:57.614 --> 00:11:00.905
There are hundreds of meetups going on in New York City right now,

00:11:00.905 --> 00:11:02.027
active meetups.

00:11:02.027 --> 00:11:04.606
There are thousands of people attending these meetups.

00:11:04.606 --> 00:11:06.973
These people are going after work and on weekends,

00:11:06.973 --> 00:11:09.758
and they're attending these meetups to look at open data

00:11:09.758 --> 00:11:11.384
and make our city a better place.

00:11:11.384 --> 00:11:15.929
Groups like BetaNYC, who just last week released something called citygram.nyc

00:11:15.929 --> 00:11:18.161
that allows you to subscribe to 311 complaints

00:11:18.161 --> 00:11:20.305
around your own home or around your office.

00:11:20.305 --> 00:11:22.686
You put in your address, you get local complaints.

00:11:22.686 --> 00:11:25.774
And it's not just the tech community that is after these things.

00:11:25.774 --> 00:11:28.399
It's urban planners like the students I teach at Pratt.

00:11:28.399 --> 00:11:30.363
It's policy advocates, it's everyone,

00:11:30.363 --> 00:11:32.919
it's citizens from a diverse set of backgrounds.

00:11:32.919 --> 00:11:35.680
And with some small, incremental changes,

00:11:35.680 --> 00:11:38.788
we can unlock the passion and the ability of our citizens

00:11:38.788 --> 00:11:41.897
to harness open data and make our city even better,

00:11:41.897 --> 00:11:45.958
whether it's one data set or one parking spot at a time.

00:11:45.958 --> 00:11:47.213
Thank you.

00:11:47.213 --> 00:11:50.274
(Applause)