Optimising public transport: A data-driven bike-sharing study in Marburg

  • 0:00 - 0:25
    RC3 preroll music
  • 0:25 - 0:31
    Herald: Hello, everyone, welcome back to
    Chaos West TV. The next talk will start
  • 0:31 - 0:35
    momentarily. I will now switch back to
    German for a few seconds to announce a
  • 0:35 - 0:41
    translation. Then I'll switch back and
    then we'll go off to the races, as they say.
  • 0:41 - 0:46
    So once more quickly in German:
    welcome back to Chaos West TV, your
  • 0:46 - 0:51
    best stage at rC3. The next talk starts
    in a moment. It is in English,
  • 0:51 - 0:56
    but like so much else it will be translated
    into German thanks to our translation crew.
  • 0:56 - 1:00
    You should be able to simply select that
    in the stream without much trouble,
  • 1:00 - 1:04
    and then you can listen to the talk
    with live interpretation in German.
  • 1:04 - 1:06
    And I'll now continue in English.
  • 1:06 - 1:08
    Alright back to English.
  • 1:08 - 1:12
    Now in the comfort of your own homes
    or wherever you're viewing the stream,
  • 1:12 - 1:15
    please do a warm round of applause
    for our next speaker,
  • 1:15 - 1:23
    Martin, who will talk about
    optimizing public transport.
  • 1:23 - 1:24
    Let's go.
  • 1:27 - 1:33
    Martin: Welcome to my contribution to this
    year's rC3 2021 in the form of this talk,
  • 1:33 - 1:37
    Optimizing public transport:
    a data-driven bike sharing study in Marburg.
  • 1:37 - 1:42
    I would like to thank the organizers of the
    rC3 2021 for organizing the whole event.
  • 1:42 - 1:47
    And in particular, I would like to thank
    the channel that accepted me, Chaos West TV,
  • 1:47 - 1:53
    for accepting the presentation of my
    work. Today I would like to give you a
  • 1:53 - 1:57
    quick overview of one of my hobby projects
    in which I scraped and therefore
  • 1:57 - 2:02
    downloaded over one million data points
    regarding the bike sharing system in the
  • 2:02 - 2:09
    city of Marburg. This study came about
    when I was traveling from Stuttgart to
  • 2:09 - 2:13
    Frankfurt and ultimately to Marburg some
    time ago, and I was watching the amazing
  • 2:13 - 2:17
    SpiegelMining talk by David Kriesel. So
    thank you very much for this implicit
  • 2:17 - 2:21
    inspiration of the work that you're about
    to see now.
  • 2:21 - 2:26
    Who am I? My name is Martin Lellep,
    and I studied physics in the past,
  • 2:26 - 2:30
    and actually, I continue to do so in the
    form of a Ph.D. in theoretical physics at
  • 2:30 - 2:34
    the University of Edinburgh in Scotland
    and in my spare time I like to do data
  • 2:34 - 2:39
    analysis of all kinds of data.
    There are two more things...
  • 2:39 - 2:42
    There are two more things
    that are important here.
  • 2:42 - 2:46
    First of all, I studied at the
    University of Marburg, obviously in
  • 2:46 - 2:52
    Marburg, previously, and second, I like
    to ride my bike. Marburg, for those who
  • 2:52 - 2:56
    don't know it yet, is a small,
    university-dominated town located
  • 2:56 - 3:01
    roughly 80 kilometers north of Frankfurt
    am Main. So an hour by car or an hour
  • 3:01 - 3:07
    by train, approximately. And again, it's
    quite dominated by the university that is
  • 3:07 - 3:12
    located there, and that can be seen simply
    in terms of, for instance, numbers. There
  • 3:12 - 3:19
    are roughly 25,000 students for an overall
    population of 77,000 residents in total,
  • 3:19 - 3:27
    which is quite substantial, obviously. You
    can see a quite popular picture here of a
  • 3:27 - 3:33
    picturesque scene in Marburg. We can see
    the castle and then the river Lahn, as
  • 3:33 - 3:37
    well as a few houses and a bit of green.
    And the bike rentals are currently
  • 3:37 - 3:44
    provided at the time of recording this by
    the company called Nextbike. Before now
  • 3:44 - 3:50
    diving into a bit more technical details,
    I would like to motivate my story or my
  • 3:50 - 3:55
    study by the story of Anna. Anna is a
    university... is a university student at
  • 3:55 - 4:00
    the University of Marburg, and she lives a
    bit outside the city, so she typically
  • 4:00 - 4:07
    does not walk to the place that she needs
    to be or study at. But she takes the bus
  • 4:07 - 4:13
    from her... from her flat to the
    university, to the city. And then does the
  • 4:13 - 4:20
    last mile by walking or cycling or
    whatever. And she's also quite an eager
  • 4:20 - 4:24
    student, so she very often studies quite
    late. As you can see here, that's a
  • 4:24 - 4:30
    picture of late Marburg, so to say, and
    just as it happens now, she needs to catch
  • 4:30 - 4:35
    a bus now because she's a bit late. She
    forgot to pack in her... her fancy MacBook
  • 4:35 - 4:41
    in time, so she needs to hurry up a
    bit and, well, didn't really make it. So
  • 4:41 - 4:44
    therefore, she thought maybe
  • 4:44 - 4:47
    taking a Nextbike for the last mile
    to the bus station is a good idea
  • 4:47 - 4:51
    so she can safely take then subsequently
    the bus home. And normally the bus…
  • 4:51 - 4:56
    The Nextbike stations look like
    that here. So there are plenty of bikes.
  • 4:56 - 5:03
    It's very easy to go there, grab a bike
    and go to your destination. Now Anna must
  • 5:03 - 5:08
    be a very unlucky student today because
    she arrives at the bike station, and it
  • 5:08 - 5:13
    turns out that the station is empty, so
    ultimately she misses at least this bus
  • 5:13 - 5:20
    and therefore only arrives at home a bit
    later. Her cooking plans and her Netflix
  • 5:20 - 5:26
    plans, all that stuff postponed a bit
    because, well, she arrives a bit later.
  • 5:26 - 5:33
    And that's, of course, a very, very sad
    story, and maybe it happens to multiple
  • 5:33 - 5:38
    people, not only Anna. And in fact, it
    also happened to me a few times, and every
  • 5:38 - 5:42
    time it happened to me, I thought, well, I
    must be the unluckiest person in all of
  • 5:42 - 5:47
    Marburg, going to a normally completely
    fully packed bike station and now it's
  • 5:47 - 5:52
    completely empty. Missing, for instance,
    subsequent public transportation.
  • 5:52 - 5:57
    After it happened to me a few times, I
    thought, well, maybe I'm not that unlucky.
  • 5:57 - 6:03
    So is there maybe a system behind empty bike
    stations in Marburg? And given
  • 6:03 - 6:07
    my spare-time interest in analyzing and
    capturing data, I thought, well, data to
  • 6:07 - 6:13
    the rescue, of course. And therefore, the
    idea for this talk now was to build a web
  • 6:13 - 6:18
    scraper in order to acquire Nextbike data.
    Collect the data, store the data, analyze
  • 6:18 - 6:23
    the data and then hopefully finally help
    Anna, me, and other students to figure out
  • 6:23 - 6:29
    which stations maybe to avoid and which
    stations are safe to go to if you're in
  • 6:29 - 6:31
    desperate need for a bike.
  • 6:32 - 6:36
    The tech stack that I'm using here,
    it's based on a Docker container
  • 6:36 - 6:40
    in which a Python scraper runs
    every 30 seconds that queries the
  • 6:40 - 6:45
    Nextbike API. It downloads the data, it
    parses the data, and then saves the data
  • 6:45 - 6:51
    outside the Docker container in order to
    be evaluated later on. And it turns out
  • 6:51 - 6:56
    that the whole concept of what I just
    described also has a name. It's called
  • 6:56 - 7:02
    Extract, Transform, Load pipeline, or ETL
    for short. So what I wrote here is
  • 7:02 - 7:06
    an ETL pipeline in Python, and then I
    wrote analysis code, also in
  • 7:06 - 7:14
    Python and all that was running on a small
    home server in my flat. The data that I
  • 7:14 - 7:20
    captured consists of the bikes identified
    through IDs and then also the locations of
  • 7:20 - 7:24
    those bikes, typically at stations, but
    some of them were also freestanding and
  • 7:24 - 7:29
    last but not least, the station locations,
    and of course, obviously also a list of
  • 7:29 - 7:39
    stations. And then with it, I went ahead
    and made a few pictures that I'm about to
  • 7:39 - 7:44
    show now and a few analyses. And if you're
    interested in that, there are slides
  • 7:44 - 7:49
    available on this website here, the
    website can be read through the QR code or
  • 7:49 - 7:53
    through that link and this website
    contains the slides that you'll see in
  • 7:53 - 7:57
    here, high resolution figures, a few
    interactive figures and all the
  • 7:57 - 8:01
    information on the previous blog articles
    that I wrote about this topic.
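
The scraping loop described above (query every 30 seconds, parse, store outside the container) could be sketched roughly as follows. This is a minimal sketch, not the speaker's code: the URL is a placeholder rather than the real Nextbike endpoint, and the payload layout (`stations`, `name`, `lat`, `lng`, `bike_ids`) is a hypothetical shape chosen purely for illustration.

```python
import csv
import json
import time
import urllib.request
from datetime import datetime, timezone

# Placeholder only: NOT the real Nextbike endpoint.
API_URL = "https://example.invalid/bikes.json"
FIELDS = ["timestamp", "station", "lat", "lng", "bike_id"]


def extract(url=API_URL):
    """Extract: download one JSON snapshot."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def transform(payload, timestamp):
    """Transform: flatten the (assumed) nested payload into one row per bike."""
    rows = []
    for station in payload.get("stations", []):
        for bike_id in station.get("bike_ids", []):
            rows.append({
                "timestamp": timestamp,
                "station": station["name"],
                "lat": station["lat"],
                "lng": station["lng"],
                "bike_id": bike_id,
            })
    return rows


def load(rows, out_file):
    """Load: write rows to an open CSV file (e.g. on a mounted volume)."""
    csv.DictWriter(out_file, fieldnames=FIELDS).writerows(rows)


def run(path, interval_s=30):
    """Poll forever; one failed request should not stop the collection."""
    while True:
        try:
            now = datetime.now(timezone.utc).isoformat()
            rows = transform(extract(), now)
            with open(path, "a", newline="") as out_file:
                load(rows, out_file)
        except (OSError, ValueError, KeyError) as error:
            print(f"snapshot skipped: {error}")
        time.sleep(interval_s)
```

Keeping `extract`, `transform` and `load` as separate functions mirrors the ETL naming and lets the parsing step be tested without any network access.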
  • 8:03 - 8:07
    So the results for Anna, first of all, to
    start slowly. It turns out that there are
  • 8:07 - 8:13
    37 bike stations in Marburg,
    with roughly 230 bikes spread across
  • 8:13 - 8:16
    the whole Nextbike Marburg ecosystem.
  • 8:17 - 8:21
    And it's now, well, knowing that
    there are roughly 40 stations,
  • 8:21 - 8:23
    it's quite interesting to see
    where these stations are,
  • 8:23 - 8:25
    because then Anna could,
  • 8:25 - 8:29
    for instance, already go to another
    station if one station is empty.
  • 8:29 - 8:33
    And what you can see here is now a map
    of Marburg, where the stations are
  • 8:33 - 8:36
    annotated by these dots.
    And the area of the dot,
  • 8:36 - 8:40
    as well as the color code,
    corresponds to the average number of
  • 8:40 - 8:47
    parked bikes at that station. So let's see
    an interactive version because it's a bit
  • 8:47 - 8:54
    nicer to see it in that way. So I click on
    here. Alright. OK, now we can pan around
  • 8:54 - 9:00
    and zoom as you can often do with these
    interactive graphics and also by clicking
  • 9:00 - 9:05
    on these buttons or on these points,
    you can see the station name, as well as
  • 9:05 - 9:12
    the average number of bikes placed there.
    And it becomes quite obvious that, well, most
  • 9:12 - 9:18
    of the stations are in the central part of
    the city, a few in the outskirts here. And
  • 9:18 - 9:23
    it turns out that the largest station in
    terms of the number of parked bikes on
  • 9:23 - 9:28
    average is the main train station
    Hauptbahnhof. There are again a few more
  • 9:28 - 9:31
    spread around the
    central part of the city,
  • 9:31 - 9:34
    such as the Elisabeth-Blochmann-Platz,
    which is the second largest station.
  • 9:34 - 9:38
    And then if you continue the train
    line here, you can see that there's
  • 9:38 - 9:45
    actually another set of stations, where
    the secondary train station is.
  • 9:45 - 9:48
    So that's another train station,
    smaller train station.
  • 9:51 - 9:58
    OK, so the first results for Anna
    would then be a day-hour usage histogram,
  • 9:58 - 10:04
    because it's kind of the first-order
    approach, I would say, in order to see how
  • 10:04 - 10:12
    the ecosystem of Nextbikes is in use
    against day as well as hour. And therefore
  • 10:12 - 10:19
    Anna will, based on this figure here,
    understand when to maybe plan for a
  • 10:19 - 10:24
    bit more time when looking for a bike in a
    desperate fashion. And since this figure
  • 10:24 - 10:28
    is a bit more difficult to understand, I
    would like to take a moment to explain it
  • 10:28 - 10:32
    and we are going to start with the top
    figure here. What you can see on the x
  • 10:32 - 10:36
    axis is the hour of the day and on the y
    axis, and that's shown in the whole
  • 10:36 - 10:40
    figure. So each of the numbers that
    you see is the following: it's the
  • 10:40 - 10:47
    average. And well it's the number of
    parked bikes and then you subtract the
  • 10:47 - 10:52
    average of the number of parked bikes in
    the whole ecosystem of Marburg. So that
  • 10:52 - 10:56
    means if a number of zero is encountered
    like roughly here, it means that simply the
  • 10:56 - 11:01
    average number of parked bikes is in
    the system at that point in time. When the
  • 11:01 - 11:06
    number is larger, it's above the average,
    if it's smaller, it's below the average.
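
The quantity plotted in these histograms, the mean number of parked bikes per bin minus the overall mean, can be sketched as below. The input format, a list of `(datetime, parked_count)` samples for the whole system, is an assumption for illustration and not taken from the talk.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def deviation_from_mean(samples, key):
    """Mean parked count per bin (bin = key(timestamp)) minus the overall mean."""
    overall = mean(count for _, count in samples)
    bins = defaultdict(list)
    for timestamp, count in samples:
        bins[key(timestamp)].append(count)
    return {b: mean(values) - overall for b, values in bins.items()}


def by_hour(samples):
    return deviation_from_mean(samples, lambda t: t.hour)


def by_weekday(samples):
    return deviation_from_mean(samples, lambda t: t.weekday())  # 0 = Monday


def by_weekday_and_hour(samples):
    return deviation_from_mean(samples, lambda t: (t.weekday(), t.hour))
```

A value of zero means the bin sits exactly at the system-wide average; negative values mean fewer parked bikes, so more bikes in use.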
  • 11:06 - 11:11
    And you can clearly see from this small
    figure here already that in the morning,
  • 11:11 - 11:15
    more bikes are typically parked. And then
    in the evenings or around noon, you can
  • 11:15 - 11:22
    see two dips, a bimodal distribution so to
    say. Where people, well, obviously use
  • 11:22 - 11:28
    bikes around noon and six p.m. roughly
    where these used bikes, of course, are not
  • 11:28 - 11:32
    parked, and therefore these numbers are
    smaller. And the same thing can be done
  • 11:32 - 11:37
    for the day of the week. Here and here you
    can see that the Monday, well, the
  • 11:37 - 11:40
    beginning of the week and the end of the
    week, meaning Monday, Tuesday and Saturday,
  • 11:40 - 11:47
    Sunday, are a bit more popular, so more
    people ride a bike and therefore fewer
  • 11:47 - 11:50
    bikes are parked and therefore this is
    negative. And then in the middle of the
  • 11:50 - 11:56
    week, fewer people seem to ride the bike,
    the bikes in general. And if you combine
  • 11:56 - 12:00
    these figures now, you can see the
    joint histogram here, where you can not
  • 12:00 - 12:05
    only look for time or day separately, but
    also in a combined fashion. So you would,
  • 12:05 - 12:10
    for instance, see that Monday morning is
    the time where many people use bikes
  • 12:10 - 12:14
    because there are not as many bikes parked.
    And then also on a Saturday, you can see
  • 12:14 - 12:21
    the same, so around afternoon many people
    seem to use the bikes. Last but not least
  • 12:21 - 12:25
    on Friday mornings, it's quite easy to get
    a bike because many bikes appear to be
  • 12:25 - 12:30
    parked, maybe because people envision
    already the weekend. So that's the first
  • 12:30 - 12:37
    outcome for Anna: try to avoid times
    around six p.m. and around noon when
  • 12:37 - 12:41
    desperately looking for a bike. Another,
    even more interesting part for Anna is the
  • 12:41 - 12:46
    probability to find a specific station to
    be empty. For that, I took the time series
  • 12:46 - 12:51
    of the number of parked bikes and counted
    the occasions where there was no bike for
  • 12:51 - 12:56
    each of the stations here. And that has
    been done again for each station
  • 12:56 - 13:00
    separately, so for each station, at the
    end of the day, you get a number that
  • 13:00 - 13:04
    denotes the probability of finding that
    station empty. And clearly, for instance,
  • 13:04 - 13:08
    the Hauptbahnhof, the main train station,
    which was the largest station. It's
  • 13:10 - 13:15
    quite unlikely to find it empty,
    and conversely, if you go to these
  • 13:15 - 13:18
    stations down here, for instance
    the Am Plan / Wirtschaftswissenschaften
  • 13:18 - 13:24
    it turns out that these are empty about
    70 percent of the time, which is quite
  • 13:24 - 13:29
    substantial, I would say. And
    interestingly, if you now look at the
  • 13:29 - 13:33
    secondary train station in Marburg, the
    Südbahnhof, you can see that this has
  • 13:33 - 13:38
    quite a substantial probability of
    running empty at about 30 to 40 percent.
  • 13:38 - 13:42
    In particular, in comparison to the main
    train station, which is essentially almost
  • 13:42 - 13:51
    never empty. Also interestingly, you can
    then plot these probabilities against the
  • 13:51 - 13:55
    average number of parked bikes at the
    station and you find an inverse
  • 13:55 - 13:59
    relation between those two. It means that
    the larger the station, the more unlikely
  • 13:59 - 14:03
    it is that it's empty, which is quite a
    reasonable outcome, I would say.
  • 14:03 - 14:06
    So finally, to conclude for Anna,
  • 14:06 - 14:09
    she should try to avoid small stations
  • 14:09 - 14:12
    and in particular, she should try
    to avoid the stations that are
  • 14:12 - 14:15
    well, annotated here with
    the sad smiley, because these
  • 14:15 - 14:19
    tend to run empty quite often.
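
The per-station estimate described above is simply the fraction of snapshots in which a station had zero parked bikes. A minimal sketch, assuming each station's time series is available as a plain list of parked-bike counts (the data structure is an assumption, not the speaker's):

```python
from statistics import mean


def station_stats(series_by_station):
    """Return {station: (probability_empty, average_parked)}."""
    stats = {}
    for station, counts in series_by_station.items():
        if not counts:
            continue  # no snapshots, no estimate
        empty_fraction = sum(1 for c in counts if c == 0) / len(counts)
        stats[station] = (empty_fraction, mean(counts))
    return stats
```

Plotting the first value against the second for all stations gives the kind of scatter discussed above, where stations with a larger average are less often empty.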
  • 14:21 - 14:25
    OK, so I have all this ETL pipeline
    stuff already set up,
  • 14:25 - 14:28
    I have collected
    over a million data points
  • 14:28 - 14:33
    and then I thought, well, maybe there's
    more in the data than only helping Anna.
  • 14:33 - 14:38
    So everything that I've shown you so far,
    it's from the perspective of a user.
  • 14:38 - 14:41
    And now I would like to turn to
    what's the perspective of a city.
  • 14:41 - 14:43
    And there I would like to
    ask a few questions, like…
  • 14:43 - 14:46
    How is Nextbike used in Marburg?
    first of all,
  • 14:46 - 14:49
    and then, in general,
    Is cycling a good thing for a city?
  • 14:49 - 14:53
    How can, or,
    Can cycling contribute to a better city?
  • 14:53 - 14:58
    And now–better is of course first a quite
    vague term–and then last, but not least,
  • 14:58 - 15:01
    is it worth improving
    bike infrastructure for a city?
  • 15:03 - 15:10
    And all this again, is now from the
    perspective of a city instead of a user.
  • 15:10 - 15:15
    The first thing that I would like to start
    with is something that I call the distance
  • 15:15 - 15:22
    matrix in which I concentrated on the
    positions of the bike stations and
  • 15:22 - 15:26
    computed the pairwise distances for all of
    them. And since the distance is, of
  • 15:26 - 15:32
    course, symmetric, the stored matrix
    is in the end also symmetric. And
  • 15:32 - 15:36
    it turns out that there are roughly 600
    combinations, and these combinations can
  • 15:36 - 15:42
    be shown in a symmetric matrix, as shown
    here, where on the x axis this one here
  • 15:42 - 15:48
    and the y axis you can see the stations
    and then each combination denotes
  • 15:48 - 15:53
    the distance between that one station and
    the other station. It turns out that the
  • 15:53 - 15:57
    range of these distances is between zero
    and roughly nine kilometers. And of
  • 15:57 - 16:03
    course, those that have a zero distance to
    other stations are essentially the…
  • 16:03 - 16:08
    the stations themselves. So if you pick a
    station, obviously the distance to itself
  • 16:08 - 16:12
    is zero and therefore the diagonal is
    exactly zero. And then again, all the
  • 16:12 - 16:20
    remaining part is a symmetric copy of the
    other diagonal part. The other thing and
  • 16:20 - 16:27
    that is now the main treasure, I would say
    of this study, so the main base for
  • 16:27 - 16:31
    everything that follows is what I call the
    transition matrix, where I counted the
  • 16:31 - 16:36
    number of transitions of bikes from one
    station to the other station. That is now,
  • 16:36 - 16:40
    of course, not symmetric anymore because
    just because, say, five bikes go from one to
  • 16:40 - 16:44
    the other station, it does not mean that
    these five bikes really come back again.
  • 16:44 - 16:51
    And therefore, the number of entries
    is roughly 1400. Again, it can be shown
  • 16:51 - 16:58
    or visualized in the same fashion.
    So you again have the stations on the one
  • 16:58 - 17:03
    axis and the same stations on the other
    axis, and now each entry here in the
  • 17:03 - 17:07
    matrix corresponds to the number of
    transitions of bikes from one to the
  • 17:07 - 17:15
    other. And the range is from zero to over
    3000. And it turns out that actually the
  • 17:15 - 17:19
    self transitions, meaning somebody takes a
    bike from a station, does something with a
  • 17:19 - 17:23
    bike, maybe grocery shop, grocery shopping
    or so, and then the person comes back to
  • 17:23 - 17:30
    the same station. These events occur
    most frequently, and therefore the largest
  • 17:30 - 17:36
    entries are on the diagonal, typically.
    Sometimes it is not so interesting what
  • 17:36 - 17:41
    happens regarding the self transitions and
    therefore another matrix can be derived
  • 17:41 - 17:46
    from the first one, namely a transition
    matrix without diagonal elements where
  • 17:46 - 17:52
    those elements have been set to zero as
    you can see here, if you look closely.
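
Counting transitions from the scraped snapshots could look roughly like this. The talk does not say how a trip is detected; this sketch assumes that a bike which disappears from all stations and later reappears has made one trip, and that observations arrive as `(timestamp, bike_id, station_or_None)` tuples sorted by time. Both are assumptions for illustration.

```python
from collections import defaultdict


def count_transitions(observations):
    """Count trips as {(origin, destination): n}, including self transitions."""
    last_station = {}  # bike_id -> station where the bike was last seen parked
    in_use = set()     # bikes currently absent from every station
    counts = defaultdict(int)
    for _, bike, station in observations:
        if station is None:
            if bike in last_station:
                in_use.add(bike)
            continue
        if bike in last_station and (bike in in_use or station != last_station[bike]):
            counts[(last_station[bike], station)] += 1
        in_use.discard(bike)
        last_station[bike] = station
    return dict(counts)


def without_diagonal(counts):
    """Drop self transitions, i.e. set the diagonal of the matrix to zero."""
    return {pair: n for pair, n in counts.items() if pair[0] != pair[1]}
```

Storing the matrix as a dictionary keyed by `(origin, destination)` keeps it sparse; the order of the pair matters, which is why the result is not symmetric.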
  • 17:52 - 17:58
    Speaking of looking closely, it's quite
    educational if you not only see the
  • 17:58 - 18:02
    figures, but also can explore them a bit,
    and therefore I rendered an interactive
  • 18:02 - 18:07
    version of it. Let's... let's visit it. So
    that's now again, the matrix without the
  • 18:07 - 18:12
    diagonal and one with the diagonal. And
    now, by hovering over these entries, you
  • 18:12 - 18:17
    can see that, for instance, from Am
    Schülerpark to Ockershäuser Allee zero
  • 18:17 - 18:21
    transitions happened. And then a bit
    larger one, for instance, Biegenstraße to
  • 18:21 - 18:28
    Hauptbahnhof over 800 transitions happened
    in the time of capturing the data. So feel
  • 18:28 - 18:35
    free to explore a bit, maybe identify the
    most, most interesting, most used popular
  • 18:35 - 18:45
    routes. Ok, such a transition matrix can
    actually also be shown as a network graph
  • 18:45 - 18:49
    where here I concentrate only on the
    largest entries because it turns out the
  • 18:49 - 18:56
    full transition matrix is a bit too dense.
    What is shown here as blue
  • 18:56 - 19:04
    circles corresponds to the stations, and
    these edges here are drawn whenever
  • 19:04 - 19:08
    a transition happens. And you can
    already see here that there are a few
  • 19:08 - 19:13
    stations that are quite isolated, like
    those and then many stations have a self
  • 19:13 - 19:16
    transition and mostly feed to a more
    central station.
  • 19:16 - 19:20
    And since that is also more
    interesting in an interactive fashion,
  • 19:20 - 19:23
    I also rendered
    an interactive version of that.
  • 19:23 - 19:29
    Now again, we can zoom, pan around
    and drag the graph around a bit.
  • 19:29 - 19:34
    And interestingly, if you click on a
    station, you can see from where
  • 19:34 - 19:40
    transitions happen to that station. So
    like those interconnected central ones,
  • 19:40 - 19:43
    like the Hauptbahnhof, the main train
    station, it's quite connected in the
  • 19:43 - 19:47
    graph. And then there are a few like
    Friedrichplatz which are not connected at
  • 19:47 - 19:54
    all. Interestingly, that one here, for
    instance, the Cafe Trauma/Afföllerwiesen, it
  • 19:54 - 19:58
    doesn't even have a self connection. So it
    turns out that, well, people apparently
  • 19:58 - 20:02
    mostly use it for taking a bike going into
    the city.
  • 20:02 - 20:08
    And most dominantly,
    the Elisabeth-Blochmann-Platz, actually.
  • 20:12 - 20:18
    OK, so if you now take
    these transition matrices,
  • 20:18 - 20:22
    as well as the distance matrices
    into account and mix them, first of all,
  • 20:22 - 20:29
    you can get a few interesting numbers. So
    here I calculated the overall number of
  • 20:29 - 20:35
    trips, which turned out to be 210,000
    trips in the time of capturing the data,
  • 20:35 - 20:40
    which is quite a substantial number for
    such a small city like Marburg. And this
  • 20:40 - 20:44
    is, of course, computed by taking the sum
    of the transition matrix elements. And
  • 20:44 - 20:48
    then if you weigh these sums or these
    entries with the distances between those
  • 20:48 - 20:54
    stations, it turns out that those
    transitions or those trips essentially
  • 20:54 - 20:59
    correspond to a distance of 320,000
    kilometers that have been traveled, which
  • 20:59 - 21:02
    is a few times around the Earth actually.
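
The two totals can be sketched as a sum over the transition matrix and a distance-weighted sum. The talk does not state how station distances were computed; the great-circle (haversine) distance used here is an assumption, and real cycling routes are longer than straight lines. The coordinates in any usage are expected as `(latitude, longitude)` in degrees.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    (lat1, lon1), (lat2, lon2) = a, b
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_matrix(stations):
    """{(s, t): km} for every ordered pair; the diagonal is zero."""
    names = list(stations)
    return {(s, t): haversine_km(stations[s], stations[t]) for s in names for t in names}


def totals(transitions, distances):
    """Total number of trips and total distance-weighted kilometers."""
    trips = sum(transitions.values())
    km = sum(n * distances[pair] for pair, n in transitions.items())
    return trips, km
```

Note that self transitions add to the trip count but contribute zero kilometers under this scheme. For 37 stations the matrix has 37 × 37 = 1369 entries, of which 37 × 36 / 2 = 666 are distinct unordered pairs, in line with the "roughly 1400" and "roughly 600" mentioned in the talk.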
  • 21:02 - 21:05
    Now, when these two basic numbers and the
  • 21:05 - 21:11
    matrices that I introduced earlier are
    combined with a few statistical details –
  • 21:11 - 21:15
    like, for instance, the average
    consumption of fuel of a car or how much
  • 21:15 - 21:21
    CO2 it produces while driving – a few
    ecological, economic and social benefits
  • 21:21 - 21:26
    of a bike system or cycling in general can
    be derived. First of all, I found it quite
  • 21:26 - 21:33
    entertaining that the overall number of
    calories burned corresponds to 8.6 million
  • 21:33 - 21:40
    kilocalories. And to convert that to a bit
    more, well, real life number, I would say
  • 21:40 - 21:44
    I calculated how many Nutella jars
    those are, and it turns out that
  • 21:44 - 21:48
    it's roughly 4,000 Nutella jars that
    have been burned in terms of calories
  • 21:48 - 21:56
    just by this system of cycling. And then
    also, it can be found that this distance
  • 21:56 - 22:00
    here, if you had driven it
    by car, you would have,
  • 22:00 - 22:06
    well, used almost 26,000 liters of fuel.
    You would have produced 40 tons of CO2.
  • 22:06 - 22:13
    And that fuel that you would have bought
    would have cost 34,000 €, actually.
  • 22:13 - 22:18
    Interestingly, that number here
    of 40 tons of saved CO2
  • 22:18 - 22:23
    corresponds to an average
    German who lives for 4 years
  • 22:23 - 22:27
    or 4 Germans that live for one year.
    So a typical German produces
  • 22:27 - 22:31
    roughly 10 tons, and therefore
    it's four times that, obviously.
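
The talk does not state the per-kilometer factors behind these figures. As a plausibility check, they can be back-calculated from the rounded totals quoted above; the resulting values are therefore implied and approximate, not the speaker's actual inputs.

```python
# Rounded totals as quoted in the talk.
DISTANCE_KM = 320_000
FUEL_L = 26_000
CO2_T = 40
FUEL_COST_EUR = 34_000
ENERGY_KCAL = 8_600_000
NUTELLA_JARS = 4_000
CO2_PER_PERSON_YEAR_T = 10


def implied_factors():
    """Back-calculate per-unit factors from the quoted totals."""
    return {
        "fuel_l_per_100km": FUEL_L / DISTANCE_KM * 100,
        "co2_g_per_km": CO2_T * 1_000_000 / DISTANCE_KM,
        "fuel_price_eur_per_l": FUEL_COST_EUR / FUEL_L,
        "kcal_per_km": ENERGY_KCAL / DISTANCE_KM,
        "kcal_per_jar": ENERGY_KCAL / NUTELLA_JARS,
        "person_years_of_co2": CO2_T / CO2_PER_PERSON_YEAR_T,
    }


for name, value in implied_factors().items():
    print(f"{name}: {value:.2f}")
```

The implied values come out at roughly 8 liters of fuel per 100 km, 125 g of CO2 per km, about 1.31 € per liter, and about 27 kcal per cycled kilometer.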
  • 22:33 - 22:36
    Ok, so again, from the transition matrix,
  • 22:36 - 22:40
    you can derive a few more interesting
    details like, for instance, details that
  • 22:40 - 22:44
    are interesting from the perspective
    of traffic management.
  • 22:44 - 22:49
    Like, here I calculated the most popular
    routes by finding the maximal elements
  • 22:49 - 22:54
    of the transition matrix. And it turns out
    that the most popular route has been used
  • 22:54 - 22:59
    well over 2000 times a year from the
    Hauptbahnhof to the Ginseldorfer Weg. And
  • 22:59 - 23:03
    if you look closely, you can see that the
    main train station or the Hauptbahnhof,
  • 23:03 - 23:07
    as well as the Elisabeth-Blochmann-Platz
    is involved in many of those top row routes.
  • 23:07 - 23:13
    And that's now again interesting. For
    instance, if a city would like to improve
  • 23:13 - 23:19
    the bike system because we've now seen
    it has quite a good impact for social,
  • 23:19 - 23:23
    ecological, and economical aspects.
  • 23:23 - 23:27
    But let's say the city has maybe
    limited financial resources.
  • 23:27 - 23:30
    It would be interesting to simply
    calculate the most popular routes,
  • 23:30 - 23:34
    and then start fixing
    or improving them first.
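
Finding the most popular routes amounts to picking the largest entries of the transition matrix. A minimal sketch, assuming the matrix is stored as a dictionary `{(origin, destination): count}`; the function name and the option to skip self transitions are illustrative choices, not the speaker's code.

```python
import heapq


def most_popular_routes(transitions, k=5, include_round_trips=False):
    """Return the k largest entries as [((origin, destination), count), ...]."""
    candidates = (
        (pair, n) for pair, n in transitions.items()
        if include_round_trips or pair[0] != pair[1]
    )
    return heapq.nlargest(k, candidates, key=lambda item: item[1])
```

Excluding the diagonal by default reflects the earlier observation that self transitions dominate the matrix and would otherwise crowd out the actual routes between stations.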
  • 23:36 - 23:39
    OK, now at that point,
    you might ask yourself,
  • 23:39 - 23:42
    Well, what kind of data did he scrape?
  • 23:42 - 23:44
    And for that, I would like to
    show you this graph. It shows
  • 23:44 - 23:48
    the number of parked bikes in the whole
    ecosystem of Marburg against time.
  • 23:48 - 23:51
    And as you can see,
    I did it in two batches.
  • 23:51 - 23:56
    The first one has been obtained from
    March to December 2020. So last year.
  • 23:56 - 24:01
    And then I restarted the scraping at the
    end of April and finished just a few days
  • 24:01 - 24:07
    ago in December 2021. And you can clearly
    see that the number of parked bikes
  • 24:07 - 24:12
    decreases when the weather is good or when
    there are summer months and therefore most
  • 24:12 - 24:18
    likely because the weather is good. And of
    course, given that I captured this one in
  • 24:18 - 24:23
    2020 and that one in 2021, and taking the
    corona pandemic into account,
  • 24:23 - 24:25
    the question suggests itself: how do
    they compare?
  • 24:25 - 24:31
    And therefore, I concentrated on the
    overlapping months of the two data sets
  • 24:31 - 24:35
    and calculated, well,
    the comparison, as you can see here.
  • 24:35 - 24:40
    Now in blue, it's 2021 this year
    and 2021, sorry 2020 is shown in red.
  • 24:40 - 24:44
    And you can see that the number of
    parked bikes increased actually.
  • 24:44 - 24:50
    There might be a multitude
    of explanations for that. I don't know.
  • 24:50 - 24:55
    Maybe one explanation could be that people
    took more advantage of working from home.
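
Restricting the comparison to months covered in both years could be sketched as follows, again assuming `(datetime, parked_count)` samples for the whole system. This is an illustration of the idea, not the analysis code behind the figure.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_means(samples):
    """{(year, month): mean parked count} from (datetime, count) samples."""
    bins = defaultdict(list)
    for timestamp, count in samples:
        bins[(timestamp.year, timestamp.month)].append(count)
    return {key: mean(values) for key, values in bins.items()}


def compare_overlap(samples, year_a, year_b):
    """For months covered in both years: {month: (mean_a, mean_b)}."""
    means = monthly_means(samples)
    months_a = {m for (y, m) in means if y == year_a}
    months_b = {m for (y, m) in means if y == year_b}
    return {m: (means[(year_a, m)], means[(year_b, m)])
            for m in sorted(months_a & months_b)}
```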
  • 24:56 - 25:01
    OK, so everything that I've shown you so far,
  • 25:01 - 25:05
    it's been mostly statistical statements,
    averages, sums and stuff like that,
  • 25:05 - 25:10
    and now I was interested if it's possible
    to do also more precise predictions.
  • 25:10 - 25:13
    And therefore I turn
    towards a machine learning or
  • 25:13 - 25:18
    artificial intelligence task where I
    predicted the num… where I tried to
  • 25:18 - 25:21
    predict the number of parked bikes,
    meaning the quantity that I've shown over
  • 25:21 - 25:26
    and over again in the last few
    minutes. So is it possible to predict that
  • 25:26 - 25:31
    number based on the hour of the day, the
    weekday and the temperature that is shown
  • 25:31 - 25:37
    here for 2020? And when starting such a
    task, it's always, first of all, very
  • 25:37 - 25:41
    useful to investigate the training data.
    And therefore, well, I tried to plot it.
  • 25:41 - 25:45
    And because it's a three-dimensional phase
    space, it's also very simple to plot it.
  • 25:45 - 25:49
    So you can essentially plot it as a
    scatterplot. And the color coding here has
  • 25:49 - 25:54
    been chosen to denote the target variable,
    meaning the number of parked bikes.
  • 25:54 - 25:57
    And just by inspecting the data, you can
    already see that the smaller the
  • 25:57 - 26:03
    temperatures are, the fewer… sorry, the
    more bikes are parked and therefore the
  • 26:03 - 26:08
    fewer bikes are used. I use a random
    forest machine learning model, which
  • 26:08 - 26:13
    consists... which is an ensemble model of
    decision trees, of randomized decision
  • 26:13 - 26:18
    trees. And this model is quite powerful
    because it can work with little data. It
  • 26:18 - 26:23
    can work with a lot of data, and it's also
    very flexible. If you would ever like to
  • 26:23 - 26:28
    extend the phase space, like maybe it would
    be interesting to see if one could predict
  • 26:28 - 26:33
    the number of parked bikes given a bank
    holiday or given weekend. And all these
  • 26:33 - 26:38
    aspects could be added to the random
    forest relatively easily. And that's now
  • 26:38 - 26:42
    the outcome: So I show the measured data,
    well that's been data that hasn't been
  • 26:42 - 26:50
    seen by the model before, and I show that
    data here and then the densely covered
  • 26:50 - 26:53
    phase-space prediction of the machine
    learning model here. And you can see that
  • 26:53 - 26:58
    the color trends, they correspond quite
    well to each other. Like you can, for
  • 26:58 - 27:03
    instance, see the smaller numbers or
    larger parked numbers in the regime of
  • 27:03 - 27:08
    small temperature and also from a
    quantitative perspective, the prediction
  • 27:08 - 27:12
    is quite decent as the square root of the
    mean squared error corresponds to
  • 27:12 - 27:16
    roughly a tenth of the average value of
    the parked bikes.
  • 27:16 - 27:23
    Which, again in this context is quite a
    decent prediction performance,
  • 27:23 - 27:27
    given how naive the
    approach was in general.
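    As a rough illustration of the error measure mentioned here, the following sketch computes the root mean squared error and divides it by the mean of the measured values. The numbers are made up for the example and the function name `relative_rmse` is hypothetical; this is not the speaker's code.

```python
import math


def relative_rmse(measured, predicted):
    """Return the RMSE divided by the mean of the measured values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    # Mean of the squared differences between measurement and prediction.
    mse = sum((m - p) ** 2 for m, p in zip(measured, predicted)) / len(measured)
    mean_measured = sum(measured) / len(measured)
    return math.sqrt(mse) / mean_measured


# Hypothetical held-out counts of parked bikes and model predictions.
measured = [100, 120, 80, 100]
predicted = [110, 110, 90, 90]
ratio = relative_rmse(measured, predicted)
```

    A ratio of roughly 0.1 would match the "tenth of the average value" quoted in the talk.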
  • 27:27 - 27:31
    OK, I did a bit more on machine learning,
    but I'm not showing that here.
  • 27:31 - 27:37
    I calculated the Markov steady state
    for the same data essentially.
  • 27:37 - 27:43
    And if you're interested in that, well,
    feel free to check out this link here.
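    The talk does not say how the Markov steady state was set up. As a minimal sketch, assume the states are stations and `P[i][j]` is the probability that a bike leaving station `i` arrives at station `j`; the steady state can then be approximated by power iteration. The matrix below is invented, and `steady_state` is a hypothetical helper.

```python
def steady_state(P, tol=1e-12, max_iter=10_000):
    """Approximate the stationary distribution of a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: new[j] = sum_i pi[i] * P[i][j]
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Invented two-station example; each row sums to one.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = steady_state(P)
```

    For this toy matrix the iteration settles near (5/6, 1/6), meaning that in the long run most bikes would sit at the first station.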
  • 27:44 - 27:47
    OK, last but not least, I would,
    of course, like to come to
  • 27:47 - 27:51
    the summary for Anna, me,
    and maybe other students.
  • 27:51 - 27:57
    So first of all, what I did was to scrape
    Nextbike data in Marburg in order to find
  • 27:59 - 28:04
    which stations to potentially avoid when
    you're in desperate need for a Nextbike.
  • 28:04 - 28:09
    And for that, I calculated
    the probabilities of empty stations
  • 28:09 - 28:14
    and found that the larger the station,
    the less likely it is to run out of bikes.
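    One simple way to estimate such a probability, sketched here under the assumption that each scrape yields a (station, available bikes) pair, is the fraction of snapshots in which a station had zero bikes. The station names and counts are made up, and `empty_probability` is a hypothetical helper rather than the speaker's implementation.

```python
from collections import defaultdict


def empty_probability(snapshots):
    """Return, per station, the fraction of snapshots with zero bikes.

    snapshots: iterable of (station_name, bikes_available) pairs,
    one pair per station per scrape.
    """
    total = defaultdict(int)
    empty = defaultdict(int)
    for station, bikes in snapshots:
        total[station] += 1
        if bikes == 0:
            empty[station] += 1
    return {station: empty[station] / total[station] for station in total}


# Invented snapshots for a small station "A" and a larger station "B".
snapshots = [
    ("A", 0), ("A", 2), ("A", 0), ("A", 1),
    ("B", 5), ("B", 3), ("B", 0), ("B", 4),
]
probs = empty_probability(snapshots)
```

    Sorting the result by value would give a list of stations to avoid first.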
  • 28:14 - 28:17
    So the general recommendation
    from my side would be:
  • 28:17 - 28:21
    try to find larger stations if you're
    in desperate need for a Nextbike.
  • 28:21 - 28:26
    And feel free to go back to
    the interactive map to see the
  • 28:26 - 28:31
    locations of these stations, which is
    quite interesting in itself, I would say.
  • 28:31 - 28:34
    And then I turned towards
    the perspective of a city, and
  • 28:34 - 28:40
    investigated a bit the usage patterns
    of Nextbikes, which are therefore most likely
  • 28:40 - 28:45
    also representative of cycling in Marburg, where
    I calculated the day-hour usage.
  • 28:45 - 28:49
    So when is the system quite busy
    and generally the most popular routes,
  • 28:49 - 28:56
    which might be of use for city planning
    and also social, economic, and
  • 28:56 - 28:59
    ecological benefits of the whole system.
  • 29:00 - 29:02
    Last but not least, I showed that
  • 29:02 - 29:06
    more precise predictions are possible when
    maybe a statistical statement is not
  • 29:06 - 29:09
    enough and you would like
    to do per case predictions.
  • 29:10 - 29:14
    Last but not least, I was fortunate
    enough to work with AStA Marburg.
  • 29:14 - 29:20
    In particular, Lucas and David,
    thank you very much for your trust
  • 29:20 - 29:25
    in that project where we try to optimize
    the placement of the bikes in the future.
  • 29:26 - 29:29
    The take home messages are now,
    first of all:
  • 29:29 - 29:32
    Bikes are amazing! And not only are they
    amazing for you and the environment,
  • 29:32 - 29:38
    but also for your wallet.
    So you essentially save money on gas.
  • 29:38 - 29:41
    And also, I would like to,
  • 29:42 - 29:45
    well, highlight that those data-driven
    optimizations of public transport
  • 29:45 - 29:50
    have the potential to, well,
    increase the life, the quality of life of
  • 29:50 - 29:55
    many of us at moderate cost. So again, I
    would like to come back to a case where
  • 29:55 - 29:57
    maybe a city would like to
    improve bike infrastructure
  • 29:57 - 30:00
    that doesn't have enough
    money to do it in one go.
  • 30:00 - 30:04
    So then it might be interesting
    to first find–in a data-driven way–which
  • 30:04 - 30:13
    combinations of, now in Nextbike terms,
    maybe stations or in general streets
  • 30:13 - 30:17
    are popular, and then these might be worth
    being fixed first with a limited budget.
  • 30:18 - 30:23
    OK, if you're interested in more, I was
    very fortunate to be able to speak at the
  • 30:23 - 30:29
    last rC3 already about data in Marburg,
    but last year I spoke about parking
  • 30:29 - 30:33
    in Marburg. If you'd like to, well, read the
    blog articles corresponding to that
  • 30:33 - 30:39
    or just see the official CCC video,
    just follow these links shown here.
  • 30:39 - 30:41
    Thank you very much for your attention.
  • 30:41 - 30:46
    If you have anything to get in contact
    with me, reach out to my e-mail address.
  • 30:46 - 30:50
    Maybe some ideas on how to improve
    a talk or what else to evaluate.
  • 30:50 - 30:53
    And then all the supplementary
    materials that I mentioned,
  • 30:53 - 30:57
    and what I've shown here,
    can be found again on this link here.
  • 30:57 - 31:00
    In particular, thank you very much
    to all the people who reached out to me
  • 31:00 - 31:03
    based on my last year's talk. I haven't
    come around to responding properly, but
  • 31:03 - 31:07
    I'm 100 percent certain that I will do so.
  • 31:07 - 31:11
    Thank you very much for your attention,
    and have a good year.
  • 31:16 - 31:21
    Herald: Alright, welcome back. It's time
    for the Q&A now. You probably know the
  • 31:21 - 31:26
    drill, but I repeat it anyway. If you're
    on Twitter, on Mastodon or on the
  • 31:26 - 31:33
    Fediverse in general, the hashtag is
    #rc3cwtv to ask any questions. And if
  • 31:33 - 31:38
    you're in the hackint IRC, the channel
    name is the same except there's a dash in
  • 31:38 - 31:43
    between the rc3 and the cwtv. And we
    apparently already have some questions, so
  • 31:43 - 31:46
    I'll just get started now.
  • 31:46 - 31:50
    First question:
    Is the Nextbike API free to use?
  • 31:50 - 31:54
    Does Nextbike even know
    that you did this scraping?
  • 31:54 - 32:00
    Martin: Yes, so as far as I know, the
    Nextbike API has been reverse engineered
  • 32:00 - 32:06
    from the iOS app and there's a GitHub repo
    by ubahnverleih and he documents lots of
  • 32:06 - 32:16
    APIs of public transport companies like
    Nextbike or some companies that also
  • 32:16 - 32:25
    produce the scooters. And since it's the
    public, since it's the official iOS API,
  • 32:25 - 32:30
    it's more or less public, so to say,
    it's free and it's pretty much quota unlimited
  • 32:30 - 32:34
    because normally all the iPhones
    access it. But again, I can only recommend
  • 32:34 - 32:37
    the ubahnverleih repository
    on GitHub for that.
  • 32:37 - 32:40
    Herald: And you don't need
    any credentials to access it?
  • 32:40 - 32:46
    Martin: No. Actually, you can, as far as
    I checked, you can pretty much access the
  • 32:46 - 32:53
    whole world. So you can access stations
    in Poland in, well, all of Germany now.
  • 32:54 - 32:59
    Herald: That's cool. It's probably
    accidental, but it's quite cool anyway.
  • 32:59 - 33:00
    Martin: laughs Yeah.
  • 33:00 - 33:04
    Herald: Ok. What software did you use for
    the machine learning stuff?
  • 33:04 - 33:07
    Martin: The machine learning stuff
    has been done with Python,
  • 33:07 - 33:12
    and then specifically with sklearn,
    which is a quite popular machine learning
  • 33:12 - 33:16
    framework for Python.
  • 33:17 - 33:20
    Herald: The workhorse of the machine
    learning community, I would say.
  • 33:20 - 33:22
    Martin: Yes, exactly yeah.
  • 33:22 - 33:27
    Herald: Do you know if Nextbike adds
    or removes bikes from the stations?
  • 33:27 - 33:31
    Or do they relocate the bikes?
    Or do… I mean, do they do that?
  • 33:31 - 33:35
    Or does it just happen
    as an emergent behavior?
  • 33:36 - 33:41
    Martin: I would say that…
    So, I had the chance to speak
  • 33:41 - 33:46
    with a person of Nextbike while
    I was working for the AStA Marburg
  • 33:46 - 33:51
    and he said that first of all, it's not
    very technical yet. Well, not very
  • 33:51 - 33:58
    digitalized yet, and they essentially
    drive around. So I'm pretty sure that they
  • 33:58 - 34:01
    certainly collect bikes that need
    maintenance, but then logically,
  • 34:01 - 34:04
    probably also
    relocate them where necessary.
  • 34:06 - 34:11
    Herald: All right. OK, someone wants to
    know if the scripts that you use would be
  • 34:11 - 34:17
    public? I assume the main part with the
    API is already answered if you gave the
  • 34:17 - 34:20
    GitHub repo. But are you planning to open
    source anything else?
  • 34:21 - 34:26
    Martin: Potentially so I have no plans on
    doing so just because it's additional
  • 34:26 - 34:33
    work, to be honest. If you're… well, I
    can just do the same, well offer the same
  • 34:33 - 34:38
    thing as last year: Just write me an
    email and if there's enough people who are
  • 34:38 - 34:44
    interested, I'll probably strip down my
    internal repository. But since in the
  • 34:44 - 34:49
    internal repository there are a few
    private notes, that one is not published
  • 34:49 - 34:50
    for sure right now.
  • 34:52 - 34:54
    Herald: All right. Anything else?
  • 34:55 - 34:59
    Dear listeners,
    you have maybe 30 seconds to comply.
  • 35:00 - 35:04
    So there's one question, about
    the time period of data that you have,
  • 35:04 - 35:06
    but I think you answered it in the talk.
    Right?
  • 35:06 - 35:14
    Martin: Yes, it's more or less the whole of 2020
    and 1/2 to 2/3 of 2021 that I collected.
  • 35:14 - 35:18
    Herald: OK, so you probably mostly have
    like a pandemic situation?
  • 35:18 - 35:20
    Martin: Yes, exclusively.
    Pretty much, yeah
  • 35:21 - 35:25
    Herald: I wonder if that's more or less
    usage than usual. I mean, it's less people
  • 35:25 - 35:29
    having to go places, but more people
    wanting to not use public transport.
  • 35:29 - 35:32
    Martin: Yes, so based on my data,
    I can see that it's
  • 35:32 - 35:36
    the number of parked bikes and
    therefore the usage is going down, so
  • 35:36 - 35:40
    the number of parked bikes is going up.
    Therefore, the usage is going down and
  • 35:40 - 35:46
    that was also confirmed internally by some
    Nextbike people. Now, one more thing, so
  • 35:46 - 35:52
    regarding the people who are interested in
    the code, regardless of if I am going to
  • 35:52 - 35:57
    publish it or not, if you have
    questions, just drop me an email. I mean,
  • 35:57 - 36:02
    writing the scraper in particular,
    it's absolutely trivial. And if it's
  • 36:02 - 36:07
    not trivial for you, then the code
    wouldn't be of value to you anyway.
  • 36:08 - 36:14
    Herald: All right. How does your data
    interpret broken / unavailable bikes at
  • 36:14 - 36:19
    the station? I mean, can you see that?
    Or do you take it into account?
  • 36:19 - 36:22
    Martin: Yes, so I don't see that directly.
  • 36:22 - 36:28
    I mean, I have a list of all the bikes
    and if I would dig a little bit deeper,
  • 36:28 - 36:33
    I could probably, you know, compile a list
    where I see where the bike, where a
  • 36:33 - 36:38
    particular bike is standing at the moment.
    And if that bike would be, for instance,
  • 36:38 - 36:42
    absent for a longer time, I could
    conclude that it's maybe broken,
  • 36:42 - 36:47
    maintenance, maintained or something like
    that. But there's no direct data on that.
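    The indirect check described in this answer could be sketched as follows, assuming the scraped data has been reduced to a mapping from bike ID to the time the bike was last seen at any station. The seven-day threshold, the bike IDs, and the helper name `long_absent_bikes` are all assumptions for illustration; as the speaker says, a long absence only hints at a broken or serviced bike.

```python
from datetime import datetime, timedelta


def long_absent_bikes(last_seen, now, threshold=timedelta(days=7)):
    """Return the sorted IDs of bikes not seen for longer than `threshold`."""
    return sorted(
        bike_id
        for bike_id, seen_at in last_seen.items()
        if now - seen_at > threshold
    )


# Invented example: bike "b2" has not appeared in any scrape for 20 days.
now = datetime(2021, 8, 21, 12, 0)
last_seen = {
    "b1": datetime(2021, 8, 21, 11, 0),
    "b2": datetime(2021, 8, 1, 12, 0),
    "b3": datetime(2021, 8, 18, 9, 30),
}
suspects = long_absent_bikes(last_seen, now)
```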
  • 36:47 - 36:53
    Herald: All right. Do you think
    that Nextbike moving the bikes has somehow
  • 36:53 - 36:57
    biased your data?
    Like if they basically relocate them?
  • 36:57 - 37:00
    Martin: That's a good question. I have
    absolutely no idea. So I mean, what
  • 37:00 - 37:08
    I did calculate was that, so I defined a
    term that I, a term of activity,
  • 37:08 - 37:13
    I defined it as the number of bikes coming
    in, divided by the number of bikes going
  • 37:13 - 37:17
    out, plus the number of bikes going in. So
    it's so to say the activity and when
  • 37:17 - 37:22
    that number - it's obviously between zero
    and one - and if it's far from zero point
  • 37:22 - 37:27
    five, that would mean that the station
    runs empty essentially or overfills at
  • 37:27 - 37:32
    some point and there are a few stations
    where it's a bit above zero point five.
  • 37:32 - 37:39
    But of course, that's only this well, the
    data that I used has all the
  • 37:39 - 37:44
    moved bikes incorporated already. So it's
    not really something that could be used
  • 37:44 - 37:47
    for really trying to find it.
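    The activity measure defined in this answer is the number of incoming bikes divided by the sum of incoming and outgoing bikes. A minimal sketch with made-up counts follows; the function name `activity` and the handling of a station with no traffic are assumptions.

```python
def activity(bikes_in, bikes_out):
    """Return bikes_in / (bikes_in + bikes_out), or None without traffic."""
    total = bikes_in + bikes_out
    if total == 0:
        return None  # the ratio is undefined for a station with no movement
    return bikes_in / total


balanced = activity(50, 50)  # as many arrivals as departures
filling = activity(30, 10)   # more arrivals than departures
idle = activity(0, 0)        # no movement at all
```

    A value near 0.5 indicates a balanced station. Values above 0.5 mean more bikes arrive than leave, and values below 0.5 mean the station tends to drain.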
  • 37:48 - 37:52
    Herald: Do you, I mean, is this kind
    of data also available
  • 37:52 - 37:56
    for bike sharing services
    that don't have docking?
  • 37:56 - 37:59
    If they even exist still in Germany?
    I kind of lost track.
  • 37:59 - 38:02
    I think maybe they
    all went bankrupt, but of course…
  • 38:02 - 38:04
    Martin: What do you mean by docking?
  • 38:04 - 38:07
    Herald: By, you know, they don't have
    fixed stations, but they are floating.
  • 38:07 - 38:13
    Martin: So I mean, all that I did was to
    look at the stations, but actually there
  • 38:13 - 38:17
    are a few free standing ones also in
    Marburg, and these people are typically
  • 38:17 - 38:23
    penalized financially, so they
    have to pay a fee. I didn't analyze
  • 38:23 - 38:27
    it at all. Would be interesting for sure.
    And as far as I know, there are cities
  • 38:27 - 38:33
    where it's completely, well, there are
    no stations for Nextbike,
  • 38:33 - 38:36
    where people can drop it off
    wherever they like.
  • 38:36 - 38:39
    Don't quote me on that, it's
    just something that I've heard.
  • 38:39 - 38:43
    Most likely in the large cities.
    So maybe in Berlin could be.
  • 38:43 - 38:48
    Herald: Yeah, I think here there are like
    some locations where you have to drop the
  • 38:48 - 38:50
    bikes, but that's,
    I'm not sure if that's Nextbike.
  • 38:50 - 38:55
    I can never remember which ones
    laughs I actually end up using.
  • 38:56 - 39:02
    All right, everybody. Now is your last
    chance to ask more questions.
  • 39:02 - 39:07
    I feel like at Teleshopping, like the rC3
    Teleshopping, which I highly recommend if
  • 39:07 - 39:12
    you haven't checked it out. It's probably
    the peak experience at the remote Congress
  • 39:12 - 39:17
    is the Teleshopping channel.
    And you should all have a look.
  • 39:17 - 39:22
    And maybe buy some…
    some extremely useful items that they sell
  • 39:27 - 39:32
    Herald: OK, so the chat confirms that
    Nextbike does have cities without stations
  • 39:32 - 39:33
    Martin: Ah yes, yes, very good.
  • 39:34 - 39:36
    Yet, I mean, I can only…
  • 39:36 - 39:42
    if you're remotely interested in all
    these public transport data studies,
  • 39:42 - 39:46
    definitely check out the
    ubahnverleih GitHub repository.
  • 39:46 - 39:49
    There's a large number
    of systems documented there.
  • 39:50 - 39:55
    Herald: OK, and that's just ubahnverleih,
    just as you would write it.
  • 39:55 - 39:58
    Martin: Yes, let me look it up
    very quickly, Ubahn…
  • 40:03 - 40:08
    Well, the person is from Ulm,
    and he also contributed to the
  • 40:08 - 40:14
    CCC infrastructure. His name is
    Constantine and yes, it's ubahnverleih.
  • 40:14 - 40:18
    And I think it's like, I think the repo
    name is WoBike, as far as I know.
  • 40:18 - 40:20
    Herald: All right. Good. Thank you.
  • 40:23 - 40:29
    Alright. I think we've managed to exhaust
    the internet. So, people, where can they
  • 40:29 - 40:33
    find you if they have any further
    questions? Are you going to be wandering
  • 40:33 - 40:37
    the remote, the world or what it's called?
    You know the…
  • 40:37 - 40:41
    Martin: Well, that's a good idea. I
    haven't planned, but I can. So I've no
  • 40:41 - 40:46
    idea how it works, but I'm sure I can
    figure it out. So I mean, in general, drop
  • 40:46 - 40:53
    me an email and you can find my email on
    lellep dot xyz. It's my website.
  • 40:55 - 40:59
    Other than that, I could be online
    in the 2D world adventure now,
  • 40:59 - 41:02
    if that's of value to anybody.
  • 41:02 - 41:05
    Herald: People can maybe hunt you
    down if they really need to.
  • 41:05 - 41:08
    Martin: Definitely, yes.
  • 41:08 - 41:12
    Herald: OK, wonderful. Well, thank you for
    your talk and for answering the questions.
  • 41:12 - 41:17
    And thanks everyone for tuning in.
    Have a good remainder of Congress.
  • 41:17 - 41:21
    I think you should be able to at some
    point rate talks in the Fahrplan,
  • 41:21 - 41:25
    if that feature still exists, so if you
    want to see more of this kind of stuff,
  • 41:25 - 41:27
    maybe leave some feedback.
  • 41:28 - 41:29
    Bye bye.
  • 41:30 - 41:31
    Martin: Bye.
  • 41:31 - 41:44
    rC3 postroll music
  • 41:44 - 41:52
    Subtitles created by c3subtitles.de
    in the year 2022. Join, and help us!
Title:
Optimising public transport: A data-driven bike-sharing study in Marburg
Video Language:
English
Duration:
41:55
