
How we found the worst place to park in New York City — using big data

  • 0:01 - 0:04
    Six thousand miles of road,
  • 0:04 - 0:06
    600 miles of subway track,
  • 0:06 - 0:07
    400 miles of bike lanes
  • 0:07 - 0:09
    and a half a mile of tram track,
  • 0:09 - 0:11
    if you've ever been to Roosevelt Island.
  • 0:11 - 0:14
    These are the numbers that make up
    the infrastructure of New York City.
  • 0:14 - 0:17
    These are the statistics
    of our infrastructure.
  • 0:17 - 0:21
    They're the kind of numbers you can find
    released in reports by city agencies.
  • 0:21 - 0:24
    For example, the Department
    of Transportation will probably tell you
  • 0:24 - 0:26
    how many miles of road they maintain.
  • 0:26 - 0:29
    The MTA will boast how many miles
    of subway track there are.
  • 0:29 - 0:30
    Most city agencies give us statistics.
  • 0:30 - 0:32
    This is from a report this year
  • 0:32 - 0:34
    from the Taxi and Limousine Commission,
  • 0:34 - 0:37
    where we learn that there's about
    13,500 taxis here in New York City.
  • 0:37 - 0:38
    Pretty interesting, right?
  • 0:38 - 0:41
    But did you ever think about
    where these numbers came from?
  • 0:41 - 0:44
    Because for these numbers to exist,
    someone at the city agency
  • 0:44 - 0:48
    had to stop and say, hmm, here's a number
    that somebody might want to know.
  • 0:48 - 0:50
    Here's a number
    that our citizens want to know.
  • 0:50 - 0:52
    So they go back to their raw data,
  • 0:52 - 0:54
    they count, they add, they calculate,
  • 0:54 - 0:55
    and then they put out reports,
  • 0:55 - 0:57
    and those reports
    will have numbers like this.
  • 0:57 - 1:00
    The problem is, how do they know
    all of our questions?
  • 1:00 - 1:01
    We have lots of questions.
  • 1:01 - 1:05
    In fact, in some ways there's literally
    an infinite number of questions
  • 1:05 - 1:06
    that we can ask about our city.
  • 1:06 - 1:08
    The agencies can never keep up.
  • 1:08 - 1:12
    So the paradigm isn't exactly working,
    and I think our policymakers realize that,
  • 1:12 - 1:16
    because in 2012, Mayor Bloomberg
    signed into law what he called
  • 1:16 - 1:20
    the most ambitious and comprehensive
    open data legislation in the country.
  • 1:20 - 1:21
    In a lot of ways, he's right.
  • 1:21 - 1:24
    In the last two years,
    the city has released 1,000 datasets
  • 1:24 - 1:26
    on our open data portal,
  • 1:26 - 1:27
    and it's pretty awesome.
  • 1:27 - 1:29
    So you go and look at data like this,
  • 1:29 - 1:32
    and instead of just counting
    the number of cabs,
  • 1:32 - 1:34
    we can start to ask different questions.
  • 1:34 - 1:35
    So I had a question.
  • 1:35 - 1:36
    When's rush hour in New York City?
  • 1:36 - 1:39
    It can be pretty bothersome.
    When is rush hour exactly?
  • 1:39 - 1:42
    And I thought to myself,
    these cabs aren't just numbers,
  • 1:42 - 1:44
    these are GPS recorders
    driving around in our city streets
  • 1:44 - 1:46
    recording each and every ride they take.
  • 1:46 - 1:49
    There's data there,
    and I looked at that data,
  • 1:49 - 1:53
    and I made a plot of the average speed of
    taxis in New York City throughout the day.
  • 1:53 - 1:56
    You can see that from about midnight
    to around 5:18 in the morning,
  • 1:56 - 2:00
    speed increases, and at that point,
    things turn around,
  • 2:00 - 2:04
    and they get slower and slower and slower
    until about 8:35 in the morning,
  • 2:04 - 2:06
    when they end up at around
    11 and a half miles per hour.
  • 2:06 - 2:10
    The average taxi is going 11 and a half
    miles per hour on our city streets,
  • 2:10 - 2:12
    and it turns out it stays that way
  • 2:12 - 2:15
    for the entire day.
  • 2:15 - 2:16
    (Laughter)
  • 2:16 - 2:20
    So I said to myself, I guess
    there's no rush hour in New York City.
  • 2:20 - 2:21
    There's just a rush day.
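
A minimal sketch of how an average-speed-by-hour figure like the one described could be computed, assuming each trip record carries a pickup time, a drop-off time, and a distance in miles. The record layout and the sample trips are hypothetical, not the actual schema or values of the taxi data, and the talk does not say how its averages were weighted; this version divides total miles by total hours within each pickup hour.

```python
from collections import defaultdict
from datetime import datetime


def average_speed_by_hour(trips):
    """Return {pickup_hour: mph} from (pickup_iso, dropoff_iso, miles) tuples."""
    miles = defaultdict(float)
    hours = defaultdict(float)
    for pickup, dropoff, distance in trips:
        start = datetime.fromisoformat(pickup)
        end = datetime.fromisoformat(dropoff)
        duration_h = (end - start).total_seconds() / 3600
        if duration_h <= 0 or distance <= 0:
            continue  # skip malformed records
        # Bucket each trip by the hour in which it started.
        miles[start.hour] += distance
        hours[start.hour] += duration_h
    return {h: miles[h] / hours[h] for h in sorted(miles)}


# Made-up trips for illustration only.
sample_trips = [
    ("2013-01-01T05:00:00", "2013-01-01T05:10:00", 4.0),
    ("2013-01-01T05:30:00", "2013-01-01T05:50:00", 6.0),
    ("2013-01-01T09:00:00", "2013-01-01T09:30:00", 6.0),
]
speeds = average_speed_by_hour(sample_trips)
for hour, mph in speeds.items():
    print(f"{hour:02d}:00  {mph:.1f} mph")
```

Plotting the resulting dictionary against the hour of day would give a curve of the kind described above.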
  • 2:21 - 2:24
    Makes sense. And this is important
    for a couple of reasons.
  • 2:24 - 2:28
    If you're a transportation planner,
    this might be pretty interesting to know.
  • 2:28 - 2:30
    But if you want to get somewhere quickly,
  • 2:30 - 2:33
    you now know to set your alarm for
    4:45 in the morning and you're all set.
  • 2:33 - 2:34
    New York, right?
  • 2:34 - 2:36
    But there's a story behind this data.
  • 2:36 - 2:38
    This data wasn't
    just available, it turns out.
  • 2:38 - 2:42
    It actually came from something called
    a Freedom of Information Law Request,
  • 2:42 - 2:43
    or a FOIL Request.
  • 2:43 - 2:46
    This is a form you can find on the
    Taxi and Limousine Commission website.
  • 2:46 - 2:49
    In order to access this data,
    you need to go get this form,
  • 2:49 - 2:51
    fill it out, and they will notify you,
  • 2:51 - 2:53
    and a guy named Chris Whong
    did exactly that.
  • 2:53 - 2:55
    Chris went down, and they told him,
  • 2:55 - 2:58
    "Just bring a brand new hard drive
    down to our office,
  • 2:58 - 3:01
    leave it here for five hours,
    we'll copy the data and you take it back."
  • 3:01 - 3:03
    And that's where this data came from.
  • 3:03 - 3:06
    Now, Chris is the kind of guy
    who wants to make the data public,
  • 3:06 - 3:10
    and so it ended up online for all to use,
    and that's where this graph came from.
  • 3:10 - 3:14
    And the fact that it exists is amazing.
    These GPS recorders -- really cool.
  • 3:14 - 3:17
    But the fact that we have citizens
    walking around with hard drives
  • 3:17 - 3:19
    picking up data from city agencies
    to make it public --
  • 3:19 - 3:22
    it was already kind of public,
    you could get to it,
  • 3:22 - 3:23
    but it was "public," it wasn't public.
  • 3:23 - 3:25
    And we can do better than that as a city.
  • 3:25 - 3:28
    We don't need our citizens
    walking around with hard drives.
  • 3:28 - 3:31
    Now, not every dataset
    is behind a FOIL Request.
  • 3:31 - 3:34
    Here is a map I made with the most
    dangerous intersections in New York City
  • 3:34 - 3:36
    based on cyclist accidents.
  • 3:36 - 3:38
    So the red areas are more dangerous.
  • 3:38 - 3:41
    And what it shows is first
    the East side of Manhattan,
  • 3:41 - 3:44
    especially in the lower area of Manhattan,
    has more cyclist accidents.
  • 3:44 - 3:45
    That might make sense
  • 3:45 - 3:48
    because there are more cyclists
    coming off the bridges there.
  • 3:48 - 3:50
    But there's other hotspots worth studying.
  • 3:50 - 3:53
    There's Williamsburg.
    There's Roosevelt Avenue in Queens.
  • 3:53 - 3:56
    And this is exactly the kind of data
    we need for Vision Zero.
  • 3:56 - 3:58
    This is exactly what we're looking for.
  • 3:58 - 4:00
    But there's a story
    behind this data as well.
  • 4:00 - 4:02
    This data didn't just appear.
  • 4:02 - 4:04
    How many of you guys know this logo?
  • 4:04 - 4:06
    Yeah, I see some shakes.
  • 4:06 - 4:08
    Have you ever tried to copy
    and paste data out of a PDF
  • 4:08 - 4:10
    and make sense of it?
  • 4:10 - 4:11
    I see more shakes.
  • 4:11 - 4:14
    More of you tried copying and pasting
    than knew the logo. I like that.
  • 4:14 - 4:18
    So what happened is, the data
    that you just saw was actually on a PDF.
  • 4:18 - 4:21
    In fact, hundreds and hundreds
    and hundreds of pages of PDF
  • 4:21 - 4:23
    put out by our very own NYPD,
  • 4:23 - 4:26
    and in order to access it,
    you would either have to copy and paste
  • 4:26 - 4:28
    for hundreds and hundreds of hours,
  • 4:28 - 4:29
    or you could be John Krauss.
  • 4:29 - 4:30
    John Krauss was like,
  • 4:30 - 4:34
    I'm not going to copy and paste this data.
    I'm going to write a program.
  • 4:34 - 4:36
    It's called the NYPD Crash Data Band-Aid,
  • 4:36 - 4:39
    and it goes to the NYPD's website
    and it would download PDFs.
  • 4:39 - 4:42
    Every day it would search;
    if it found a PDF, it would download it
  • 4:42 - 4:44
    and then it would run
    some PDF-scraping program,
  • 4:44 - 4:46
    and out would come the text,
  • 4:46 - 4:49
    and it would go on the Internet,
    and then people could make maps like that.
  • 4:49 - 4:53
    And the fact that the data's here,
    the fact that we have access to it --
  • 4:53 - 4:55
    Every accident, by the way,
    is a row in this table.
  • 4:55 - 4:57
    You can imagine how many PDFs that is.
  • 4:57 - 4:59
    The fact that we
    have access to that is great,
  • 4:59 - 5:01
    but let's not release it in PDF form,
  • 5:01 - 5:04
    because then we're having our citizens
    write PDF scrapers.
  • 5:04 - 5:06
    It's not the best use
    of our citizens' time,
  • 5:06 - 5:08
    and we as a city can do better than that.
  • 5:08 - 5:11
    Now, the good news is that
    the de Blasio administration
  • 5:11 - 5:13
    actually recently released this data
    a few months ago,
  • 5:13 - 5:15
    and so now we can
    actually have access to it,
  • 5:15 - 5:18
    but there's a lot of data
    still entombed in PDF.
  • 5:18 - 5:21
    For example, our crime data
    is still only available in PDF.
  • 5:21 - 5:25
    And not just our crime data,
    our own city budget.
  • 5:25 - 5:29
    Our city budget is only readable
    right now in PDF form.
  • 5:29 - 5:31
    And it's not just us
    that can't analyze it --
  • 5:31 - 5:34
    our own legislators
    who vote for the budget
  • 5:34 - 5:36
    also only get it in PDF.
  • 5:36 - 5:40
    So our legislators cannot analyze
    the budget that they are voting for.
  • 5:40 - 5:43
    And I think as a city we can do
    a little better than that as well.
  • 5:43 - 5:46
    Now, there's a lot of data
    that's not hidden in PDFs.
  • 5:46 - 5:47
    This is an example of a map I made,
  • 5:47 - 5:50
    and this is the dirtiest waterways
    in New York City.
  • 5:50 - 5:52
    Now, how do I measure dirty?
  • 5:52 - 5:54
    Well, it's kind of a little weird,
  • 5:54 - 5:56
    but I looked at the level
    of fecal coliform,
  • 5:56 - 5:59
    which is a measurement of fecal matter
    in each of our waterways.
  • 5:59 - 6:03
    The larger the circle,
    the dirtier the water,
  • 6:03 - 6:06
    so the large circles are dirty water,
    the small circles are cleaner.
  • 6:06 - 6:08
    What you see is inland waterways.
  • 6:08 - 6:11
    This is all data that was sampled
    by the city over the last five years.
  • 6:11 - 6:14
    And inland waterways are,
    in general, dirtier.
  • 6:14 - 6:15
    That makes sense, right?
  • 6:15 - 6:18
    And the bigger circles are dirty.
    And I learned a few things from this.
  • 6:18 - 6:21
    Number one: Never swim in anything
    that ends in "creek" or "canal."
  • 6:21 - 6:26
    But number two: I also found
    the dirtiest waterway in New York City,
  • 6:26 - 6:28
    by this measure, one measure.
  • 6:28 - 6:31
    In Coney Island Creek, which is not
    the Coney Island you swim in, luckily.
  • 6:31 - 6:32
    It's on the other side.
  • 6:32 - 6:36
    But Coney Island Creek, 94 percent
    of samples taken over the last five years
  • 6:36 - 6:38
    have had fecal levels so high
  • 6:38 - 6:41
    that it would be against state law
    to swim in the water.
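
A figure like "94 percent of samples" is a share of readings above a limit, computed per waterway. The sketch below shows that calculation under stated assumptions: the site names, readings, and the limit value are all made up for illustration, and the limit is a placeholder rather than the actual state threshold.

```python
from collections import defaultdict


def share_over_limit(samples, limit):
    """Return {site: fraction of readings above limit} from (site, value) pairs."""
    counts = defaultdict(lambda: [0, 0])  # site -> [over, total]
    for site, value in samples:
        counts[site][1] += 1
        if value > limit:
            counts[site][0] += 1
    return {site: over / total for site, (over, total) in counts.items()}


HYPOTHETICAL_LIMIT = 200  # placeholder, not the legal swimming threshold

# Made-up readings for illustration only.
samples = [
    ("Creek A", 900), ("Creek A", 450), ("Creek A", 120), ("Creek A", 3000),
    ("Harbor B", 20), ("Harbor B", 250),
]
shares = share_over_limit(samples, HYPOTHETICAL_LIMIT)
worst = max(shares, key=shares.get)
print(worst, f"{shares[worst]:.0%}")
```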
  • 6:41 - 6:44
    And this is not the kind of fact
    that you're going to see
  • 6:44 - 6:46
    boasted in a city report, right?
  • 6:46 - 6:48
    It's not going to be
    the front page on nyc.gov.
  • 6:48 - 6:50
    You're not going to see it there,
  • 6:50 - 6:52
    but the fact that we can get
    to that data is awesome.
  • 6:52 - 6:54
    But once again, it wasn't super easy,
  • 6:54 - 6:56
    because this data was not
    on the open data portal.
  • 6:56 - 6:58
    If you were to go to the open data portal,
  • 6:58 - 7:01
    you'd see just a snippet of it,
    a year or a few months.
  • 7:01 - 7:04
    It was actually on the Department
    of Environmental Protection's website.
  • 7:04 - 7:08
    And each one of these links is an Excel
    sheet, and each Excel sheet is different.
  • 7:08 - 7:11
    Every heading is different:
    you copy, paste, reorganize.
  • 7:11 - 7:14
    When you do, you can make maps
    and that's great, but once again,
  • 7:14 - 7:17
    we can do better than that
    as a city, we can normalize things.
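
The copy, paste, and reorganize step can be automated once the header variants are known. This is a minimal sketch, assuming each sheet has already been read into a list of dictionaries; the header spellings and the canonical names in the mapping are hypothetical, not the actual column names used by the Department of Environmental Protection.

```python
# Hypothetical header variants mapped to one canonical name each.
CANONICAL = {
    "site": "site",
    "sampling location": "site",
    "station": "site",
    "date": "sample_date",
    "sample date": "sample_date",
    "fecal coliform": "fecal_coliform",
    "fecal coliform (#/100 ml)": "fecal_coliform",
}


def harmonize(rows):
    """Rename each row's headers to canonical names; unknown headers pass through."""
    out = []
    for row in rows:
        clean = {}
        for header, value in row.items():
            # Lowercase and collapse whitespace before looking up the header.
            key = " ".join(header.strip().lower().split())
            clean[CANONICAL.get(key, key)] = value
        out.append(clean)
    return out


# Two made-up sheets with different headings for the same three columns.
sheet_a = [{"Sampling Location": "Creek A", "Sample Date": "2013-06-01",
            "Fecal Coliform (#/100 mL)": 900}]
sheet_b = [{" Station ": "Creek A", "Date": "2013-07-01",
            "Fecal  Coliform": 450}]
combined = harmonize(sheet_a + sheet_b)
print(combined)
```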
  • 7:17 - 7:20
    And we're getting there, because
    there's this website that Socrata makes
  • 7:20 - 7:22
    called the Open Data Portal NYC.
  • 7:22 - 7:24
    This is where 1,100 data sets
    that don't suffer
  • 7:24 - 7:26
    from the things I just told you live,
  • 7:26 - 7:28
    and that number is growing,
    and that's great.
  • 7:28 - 7:31
    You can download data in any format,
    be it CSV or PDF or Excel document.
  • 7:31 - 7:34
    Whatever you want,
    you can download the data that way.
  • 7:34 - 7:35
    The problem is, once you do,
  • 7:35 - 7:39
    you will find that each agency
    codes their addresses differently.
  • 7:39 - 7:41
    So one is street name,
    intersection street,
  • 7:41 - 7:43
    street, borough, address, building,
    building address.
  • 7:43 - 7:47
    So once again, you're spending time,
    even when we have this portal,
  • 7:47 - 7:49
    you're spending time
    normalizing our address fields.
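
To illustrate what that normalizing involves, here is a minimal sketch that maps two differently coded records to the same building, street, and borough. The field names, the abbreviation table, and the sample records are assumptions for illustration; a naive table like this one will mishandle cases such as "ST" meaning "SAINT", which is part of why a shared standard would help.

```python
import re

# Small, hypothetical abbreviation table; real street data needs far more care.
ABBREVIATIONS = {
    "ST": "STREET", "AVE": "AVENUE", "AV": "AVENUE",
    "BLVD": "BOULEVARD", "E": "EAST", "W": "WEST",
}


def normalize_street(raw):
    """Uppercase, drop periods and commas, and expand known abbreviations."""
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def normalize_record(record):
    """Return (building, street, borough) from either assumed field layout."""
    if "address" in record:
        # Layout 1: a single "address" field, building number first.
        building, _, street = record["address"].partition(" ")
    else:
        # Layout 2: separate building and street fields.
        building = record.get("building", "")
        street = record.get("street") or record.get("street name", "")
    borough = record.get("borough", "").strip().upper()
    return (building.strip(), normalize_street(street), borough)


agency_one = {"address": "123 E. 4th St.", "borough": "Manhattan"}
agency_two = {"building": "123", "street name": "East 4th Street",
              "borough": "MANHATTAN "}
print(normalize_record(agency_one))
print(normalize_record(agency_two))
```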
  • 7:49 - 7:52
    And that's not the best use
    of our citizens' time.
  • 7:52 - 7:53
    We can do better than that as a city.
  • 7:53 - 7:55
    We can standardize our addresses,
  • 7:55 - 7:57
    and if we do,
    we can get more maps like this.
  • 7:57 - 8:00
    This is a map of fire hydrants
    in New York City,
  • 8:00 - 8:01
    but not just any fire hydrants.
  • 8:01 - 8:06
    These are the top 250 grossing fire
    hydrants in terms of parking tickets.
  • 8:06 - 8:08
    (Laughter)
  • 8:08 - 8:11
    So I learned a few things from this map,
    and I really like this map.
  • 8:11 - 8:14
    Number one, just don't park
    on the Upper East Side.
  • 8:14 - 8:17
    Just don't. It doesn't matter where
    you park, you will get a hydrant ticket.
  • 8:17 - 8:21
    Number two, I found the two highest
    grossing hydrants in all of New York City,
  • 8:21 - 8:23
    and they're on the Lower East Side,
  • 8:23 - 8:28
    and they were bringing in over
    55,000 dollars a year in parking tickets.
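
A ranking of "top grossing" hydrants amounts to summing fines per location and sorting. The sketch below assumes each ticket has already been matched to a hydrant location, which is the hard part in practice and is not shown; the location labels and the fine amount are made up for illustration.

```python
from collections import Counter


def top_grossing(tickets, n):
    """Return the n locations with the highest total fines, highest first."""
    revenue = Counter()
    for location, fine in tickets:
        revenue[location] += fine
    return revenue.most_common(n)


HYPOTHETICAL_FINE = 100  # placeholder amount, not the actual fine

# Made-up tickets for illustration only.
tickets = (
    [("Hydrant A", HYPOTHETICAL_FINE)] * 3
    + [("Hydrant B", HYPOTHETICAL_FINE)] * 5
    + [("Hydrant C", HYPOTHETICAL_FINE)]
)
ranking = top_grossing(tickets, 2)
for location, total in ranking:
    print(location, total)
```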
  • 8:28 - 8:31
    And that seemed a little strange
    to me when I noticed it,
  • 8:31 - 8:34
    so I did a little digging and it turns out
    what you had is a hydrant
  • 8:34 - 8:36
    and then something called
    a curb extension,
  • 8:36 - 8:38
    which is like a seven-foot
    space to walk on,
  • 8:38 - 8:39
    and then a parking spot.
  • 8:39 - 8:42
    And so these cars came along,
    and the hydrant --
  • 8:42 - 8:44
    "It's all the way over there, I'm fine,"
  • 8:44 - 8:47
    and there was actually a parking spot
    painted there beautifully for them.
  • 8:47 - 8:50
    They would park there, and the NYPD
    disagreed with this designation
  • 8:50 - 8:51
    and would ticket them.
  • 8:51 - 8:54
    And it wasn't just me
    who found a parking ticket.
  • 8:54 - 8:56
    This is the Google
    Street View car driving by
  • 8:56 - 8:57
    finding the same parking ticket.
  • 8:57 - 9:02
    So I wrote about this on my blog,
    on I Quant NY, and the DOT responded,
  • 9:02 - 9:03
    and they said,
  • 9:03 - 9:06
    "While the DOT has not received
    any complaints about this location,
  • 9:06 - 9:11
    we will review the roadway markings
    and make any appropriate alterations."
  • 9:11 - 9:14
    And I thought to myself,
    typical government response,
  • 9:14 - 9:16
    all right, moved on with my life.
  • 9:16 - 9:20
    But then, a few weeks later,
    something incredible happened.
  • 9:20 - 9:22
    They repainted the spot,
  • 9:22 - 9:25
    and for a second I thought I saw
    the future of open data,
  • 9:25 - 9:27
    because think about what happened here.
  • 9:27 - 9:32
    For five years, this spot was being
    ticketed, and it was confusing,
  • 9:32 - 9:36
    and then a citizen found something,
    they told the city, and within a few weeks
  • 9:36 - 9:38
    the problem was fixed.
  • 9:38 - 9:41
    It's amazing. And a lot of people
    see open data as being a watchdog.
  • 9:41 - 9:43
    It's not, it's about being a partner.
  • 9:43 - 9:46
    We can empower our citizens
    to be better partners for government,
  • 9:46 - 9:48
    and it's not that hard.
  • 9:48 - 9:49
    All we need are a few changes.
  • 9:49 - 9:50
    If you're FOILing data,
  • 9:50 - 9:53
    if you're seeing your data
    being FOILed over and over again,
  • 9:53 - 9:57
    let's release it to the public, that's
    a sign that it should be made public.
  • 9:57 - 9:59
    And if you're a government agency
    releasing a PDF,
  • 9:59 - 10:03
    let's pass legislation that requires you
    to post it with the underlying data,
  • 10:03 - 10:05
    because that data
    is coming from somewhere.
  • 10:05 - 10:07
    I don't know where, but it's
    coming from somewhere,
  • 10:07 - 10:09
    and you can release it with the PDF.
  • 10:09 - 10:11
    And let's adopt and share
    some open data standards.
  • 10:11 - 10:14
    Let's start with our addresses
    here in New York City.
  • 10:14 - 10:16
    Let's just start
    normalizing our addresses.
  • 10:16 - 10:18
    Because New York is a leader in open data.
  • 10:18 - 10:21
    Despite all this, we are absolutely
    a leader in open data,
  • 10:21 - 10:24
    and if we start normalizing things,
    and set an open data standard,
  • 10:24 - 10:28
    others will follow. The state will follow,
    and maybe the federal government.
  • 10:28 - 10:29
    Other countries could follow,
  • 10:29 - 10:32
    and we're not that far off from a time
    where you could write one program
  • 10:32 - 10:34
    and map information from 100 countries.
  • 10:34 - 10:37
    It's not science fiction.
    We're actually quite close.
  • 10:37 - 10:39
    And by the way, who are we
    empowering with this?
  • 10:39 - 10:42
    Because it's not just John Krauss
    and it's not just Chris Whong.
  • 10:42 - 10:45
    There are hundreds of meetups
    going on in New York City right now,
  • 10:45 - 10:46
    active meetups.
  • 10:46 - 10:49
    There are thousands of people
    attending these meetups.
  • 10:49 - 10:51
    These people are going after work
    and on weekends,
  • 10:51 - 10:54
    and they're attending these meetups
    to look at open data
  • 10:54 - 10:55
    and make our city a better place.
  • 10:55 - 10:59
    Groups like BetaNYC, who just last week
    released something called citygram.nyc
  • 10:59 - 11:02
    that allows you to subscribe
    to 311 complaints
  • 11:02 - 11:04
    around your own home,
    or around your office.
  • 11:04 - 11:06
    You put in your address,
    you get local complaints.
  • 11:06 - 11:09
    And it's not just the tech community
    that are after these things.
  • 11:09 - 11:12
    It's urban planners like
    the students I teach at Pratt.
  • 11:12 - 11:14
    It's policy advocates, it's everyone,
  • 11:14 - 11:17
    it's citizens from a diverse
    set of backgrounds.
  • 11:17 - 11:19
    And with some small, incremental changes,
  • 11:19 - 11:23
    we can unlock the passion
    and the ability of our citizens
  • 11:23 - 11:26
    to harness open data
    and make our city even better,
  • 11:26 - 11:29
    whether it's one dataset,
    or one parking spot at a time.
  • 11:29 - 11:32
    Thank you.
  • 11:32 - 11:35
    (Applause)
Title:
How we found the worst place to park in New York City — using big data
Speaker:
Ben Wellington
Description:

City agencies have access to a wealth of data and statistics reflecting every part of urban life. But as data analyst Ben Wellington suggests in this entertaining talk, sometimes they just don't know what to do with it. He shows how a combination of unexpected questions and smart data crunching can produce strangely useful insights, and shares tips on how to release large sets of data so that anyone can use them.

Video Language:
English
Team:
closed TED
Project:
TEDTalks
Duration:
11:48
