- 
RC3 preroll music
 
- 
Herald: Hello, everyone, welcome back to
Chaos West TV. The next talk will start
 
- 
momentarily. I will now switch back to
German for a few seconds to announce a
 
- 
translation. Then I'll switch back and
then we'll go off to the races, as they say.
 
- 
[In German] So, once more quickly in German:
welcome back to Chaos West TV, your
 
- 
best stage at rc3. The next talk
starts shortly. It is in English,
 
- 
but, like so much else, thanks to our
translation crew it is translated into German.
 
- 
You should be able to simply select that
in the stream without much trouble,
 
- 
and then you can listen to the talk
simultaneously interpreted in German.
 
- 
And I will now continue in English.
 
- 
Alright back to English.
 
- 
Now in the comfort of your own homes
or wherever you're viewing the stream,
 
- 
please do a warm round of applause 
for our next speaker,
 
- 
Martin, who will talk about 
optimizing public transport.
 
- 
Let's go.
 
- 
Martin: Welcome to my contribution to this
year's rC3 2021 in the form of this talk,
 
- 
Optimizing public transport: 
a data-driven bike sharing study in Marburg
 
- 
I would like to thank the organizers of the
rC3 2021 for organizing the whole event.
 
- 
And in particular, I would like to thank
the channel that accepted me, Chaos West TV,
 
- 
well for accepting the presentation of my
work. Today I would like to give you a
 
- 
quick overview of one of my hobby projects
in which I scraped and therefore
 
- 
downloaded over one million data points
regarding the bike sharing system in the
 
- 
city of Marburg. This study came about
when I was traveling from Stuttgart to
 
- 
Frankfurt and ultimately to Marburg some
time ago, and I was watching the amazing
 
- 
SpiegelMining talk by David Kriesel. So
thank you very much for this implicit
 
- 
inspiration of the work that you're about
to see now.
 
- 
Who am I? My name is Martin Lellep, 
and I studied physics in the past,
 
- 
and actually, I continue to do so in the 
form of a Ph.D. in theoretical physics at
 
- 
the University of Edinburgh in Scotland 
and in my spare time I like to do data
 
- 
analysis of all kinds of data. 
There are two more things...
 
- 
There are two more things 
that are important for here now.
 
- 
It's first of all, I studied at the
University of Marburg, obviously in
 
- 
Marburg previously, and then also I like
to ride my bike. Marburg, for those who
 
- 
don't know it yet, it's a small,
university-dominated town that is in the
 
- 
north of Frankfurt am Main, roughly 80
kilometers. So an hour by car or an hour
 
- 
by train, approximately. And again, it's
quite dominated by the university that is
 
- 
located there, and that can be seen simply
in terms of, for instance, numbers. There
 
- 
are roughly 25,000 students for an overall
population of 77,000 residents in total,
 
- 
which is quite substantial, obviously. You
can see a quite popular picture here of a
 
- 
picturesque scene in Marburg. We can see
the castle and then the river Lahn, as
 
- 
well as a few houses and a bit of green.
And the bike rentals are currently
 
- 
provided at the time of recording this by
the company called Nextbike. Before now
 
- 
diving into a bit more technical details,
I would like to motivate my story or my
 
- 
study by the story of Anna. Anna is a
university... is a university student at
 
- 
the University of Marburg, and she lives a
bit outside the city, so she typically
 
- 
does not walk to the place that she needs
to be or study at. But she takes the bus
 
- 
from her... from her flat to the
university, to the city. And then does the
 
- 
last mile by walking or cycling or
whatever. And she's also quite an eager
 
- 
student, so she very often studies quite
late. As you can see here, that's a
 
- 
picture of late Marburg, so to say, and
just as it happens now, she needs to catch
 
- 
a bus now because she's a bit late. She
forgot to pack in her... her fancy MacBook
 
- 
in time, so she needs to hurry up a
bit and, well, didn't really make it. So
 
- 
therefore, she thought maybe
 
- 
taking a Nextbike for the last mile
to the bus station is a good idea
 
- 
so she can safely take then subsequently 
the bus home. And normally the bus…
 
- 
The Nextbike stations look like
that here. So there are plenty of bikes.
 
- 
It's very easy to go there, grab a bike
and go to your destination. Now Anna must
 
- 
be a very unlucky student today because
she arrives at the bike station, and it
 
- 
turns out that the station is empty, so
ultimately she misses at least this bus
 
- 
and therefore only arrives at home a bit
later. Her cooking plans and her Netflix
 
- 
plans, all that stuff postponed a bit
because, well, she arrives a bit later.
 
- 
And that's, of course, a very, very sad
story, and maybe it happens to multiple
 
- 
people, not only Anna. And in fact, it
also happened to me a few times, and every
 
- 
time it happened to me, I thought, well, I
must be the most unlucky person in whole
 
- 
Marburg going to a normally completely
fully packed bike station and now it's
 
- 
completely empty. Missing, for instance,
subsequent public transportation.
 
- 
After it happened to me a few times, I
thought, well, maybe I'm not that unlucky.
 
- 
So is there maybe a system to empty bike
stations in Marburg. And given all my
 
- 
spare time interest in analyzing and
capturing data, I thought, well, data to
 
- 
the rescue, of course. And therefore, the
idea for this talk now was to build a web
 
- 
scraper in order to acquire Nextbike data.
Collect the data, store the data, analyze
 
- 
the data and then hopefully finally help
Anna, me, and other students to figure out
 
- 
which stations maybe to avoid and which
stations are safe to go to if you're in
 
- 
desperate need for a bike.
 
- 
The tech stack that I'm using here, 
it's based on a Docker container
 
- 
in which a Python scraper runs 
every 30 seconds that queries the
 
- 
Nextbike API. It downloads the data, it
parses the data, and then saves the data
 
- 
outside the Docker container in order to
be evaluated later on. And it turns out
 
- 
that the whole concept of what I just
described also has a name. It's called
 
- 
Extract, Transform, Load Pipeline or ETL
in short. And what I again wrote here is
 
- 
an ETL pipeline in Python, and then I
wrote an analysis code also written in
 
- 
Python and all that was running on a small
home server in my flat. The data that I
 
- 
captured consists of the bikes identified
through IDs and then also the locations of
 
- 
those bikes, typically at stations, but
some of them were also freestanding and
 
- 
last but not least, the station locations,
and of course, obviously also a list of
 
- 
stations. And then with it, I went ahead
and did a few pictures that I'm about to
 
- 
show now and a few analyses. And if you're
interested in that, there are slides
 
- 
available on this website here, the
website can be read through the QR code or
 
- 
through that link and this website
contains the slides that you'll see in
 
- 
here, high resolution figures, a few
interactive figures and all the
 
- 
information on the previous blog articles
that I wrote about this topic.
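The extract, transform, load loop described above could be sketched roughly as follows. This is a minimal sketch, not the speaker's code: the URL is a placeholder, the nested JSON layout (countries, cities, places, bike_numbers) is an assumption about the response, and the names `extract`, `transform`, `load`, `run_forever` and the CSV columns are hypothetical.

```python
import csv
import json
import os
import time
import urllib.request
from datetime import datetime, timezone

# Placeholder endpoint (assumption); the talk only says "the Nextbike API".
API_URL = "https://example.org/nextbike-live.json"
FIELDS = ["timestamp", "bike_id", "station", "lat", "lng"]


def extract(url=API_URL):
    """Extract: download the raw JSON document."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def transform(payload, timestamp):
    """Transform: flatten the payload into one row per parked bike.

    The layout countries -> cities -> places -> bike_numbers is assumed.
    """
    rows = []
    for country in payload.get("countries", []):
        for city in country.get("cities", []):
            for place in city.get("places", []):
                for bike_id in place.get("bike_numbers", []):
                    rows.append({
                        "timestamp": timestamp,
                        "bike_id": bike_id,
                        "station": place.get("name"),
                        "lat": place.get("lat"),
                        "lng": place.get("lng"),
                    })
    return rows


def load(rows, path):
    """Load: append the rows to a CSV file, writing a header for a new file."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerows(rows)


def run_forever(path="bikes.csv", interval_seconds=30):
    """Query the API every 30 seconds, as described in the talk."""
    while True:
        now = datetime.now(timezone.utc).isoformat()
        load(transform(extract(), now), path)
        time.sleep(interval_seconds)
```

Calling `run_forever()` would start the loop; in a Docker setup the CSV path would point at a mounted volume so the data ends up outside the container.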
 
- 
So the results of Anna, first of all, to
start slowly. It turns out that there are
 
- 
37 bike stations in Marburg, 
with roughly 230 bikes spread across
 
- 
the whole Nextbike Marburg ecosystem.
 
- 
And it's now, well, knowing that 
there are roughly 40 stations,
 
- 
it's quite interesting to see
where these stations are,
 
- 
because then Anna could,
 
- 
for instance, already go to another 
station if one station is empty.
 
- 
And what you can see here is now a map
of Marburg, where the stations are
 
- 
annotated by these dots. 
And the area of the dot,
 
- 
as well as the color code, 
corresponds to the average number of
 
- 
parked bikes at that station. So let's see
an interactive version because it's a bit
 
- 
nicer to see it in that way. So I click on
here. Alright. OK, now we can pan around
 
- 
and zoom as you can often do with these
interactive graphics and also by clicking
 
- 
on these buttons or on these points,
you can see the station name, as well as
 
- 
the average number of bikes placed there.
And it becomes quite obvious that, well, most
 
- 
of the stations are in the central part of
the city, a few in the outskirts here. And
 
- 
it turns out that the largest station in
terms of the number of parked bikes on
 
- 
average is the main train station
Hauptbahnhof. There are again a few more
 
- 
spread around the 
central part of the city,
 
- 
such as the Elisabeth-Blochmann-Platz, 
which is the second largest station.
 
- 
And then if you continue the train 
line here, you can see that there's
 
- 
actually another set of stations, where 
the secondary train station is.
 
- 
So that's another train station, 
smaller train station.
 
- 
OK, so the first results for Anna
would then be a day-hour usage histogram,
 
- 
because it's the kind of the first order
approach, I would say, in order to see how
 
- 
the ecosystem of Nextbikes is in use
against day as well as hour. And therefore
 
- 
Anna will based on this figure here, she
will understand when to maybe plan for a
 
- 
bit more time when looking for a bike in a
desperate fashion. And since this figure
 
- 
is a bit more difficult to understand, I
would like to take a moment to explain it
 
- 
and we are going to start with the top
figure here. What you can see on the x
 
- 
axis is the hour of the day and on the y
axis, and that's shown in the whole
 
- 
figure. So each of the numbers that
you see is the following: it's the
 
- 
average. And well it's the number of
parked bikes and then you subtract the
 
- 
average of the number of parked bikes in
the whole ecosystem of Marburg. So that
 
- 
means if a number of zero is encountered
like roughly here, it means that the
 
- 
average number of parked bikes is simply in
the system at that point in time. When the
 
- 
number is larger, it's above the average,
if it's smaller, it's below the average.
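The quantity in the day-hour figure, the mean number of parked bikes in a time bin minus the overall mean, might be computed like this. It is a minimal sketch under the assumption that the scraped data has been reduced to (timestamp, total parked bikes) pairs; the function name `usage_deviation` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def usage_deviation(samples):
    """samples: list of (datetime, total_parked_bikes) tuples.

    Returns three dicts (by hour, by weekday, by (weekday, hour)) holding
    the mean parked-bike count of each bin minus the overall mean.
    Positive values mean more bikes are parked than on average.
    """
    overall = mean(count for _, count in samples)
    by_hour = defaultdict(list)
    by_weekday = defaultdict(list)
    joint = defaultdict(list)
    for stamp, count in samples:
        by_hour[stamp.hour].append(count)
        by_weekday[stamp.weekday()].append(count)  # 0 = Monday
        joint[(stamp.weekday(), stamp.hour)].append(count)

    def centre(bins):
        # Subtract the overall mean from each bin's mean.
        return {key: mean(values) - overall for key, values in bins.items()}

    return centre(by_hour), centre(by_weekday), centre(joint)
```

The first two dicts correspond to the marginal plots and the third to the joint histogram.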
 
- 
And you can clearly see from this small
figure here already that in the morning,
 
- 
more bikes are typically parked. And then
in the evenings or around noon, you can
 
- 
see two dips, a bimodal distribution so to
say. Where people, well, obviously use
 
- 
bikes around noon and six p.m. roughly
where these used bikes, of course, are not
 
- 
parked, and therefore these numbers are
smaller. And the same thing can be done
 
- 
for the day of the week. Here and here you
can see that the Monday, well, the
 
- 
beginning of the week and the end of the
week, meaning Monday, Tuesday and Saturday
 
- 
Sunday are a bit more popular, so more
people ride a bike and therefore fewer
 
- 
bikes are parked and therefore this is
negative. And then in the middle of the
 
- 
week, fewer people seem to ride the bike,
the bikes in general. And if you combine
 
- 
these figures now, you can see the
joint histogram here, where you can not
 
- 
only look for time or day separately, but
also in a combined fashion. So you would,
 
- 
for instance, see that Monday morning is
the time where many people use bikes
 
- 
because there are not as many bikes parked.
And then also on a Saturday, you can see
 
- 
the same, so around afternoon many people
seem to use the bikes. Last but not least
 
- 
on Friday mornings, it's quite easy to get
a bike because many bikes appear to be
 
- 
parked, maybe because people envision
already the weekend. So that's the first
 
- 
outcome for Anna. Well try to avoid times
around six and around noon when
 
- 
desperately looking for a bike. Another,
even more interesting part for Anna is the
 
- 
probability to find a specific station to
be empty. For that, I took the time series
 
- 
of the number of parked bikes and counted
the occasions where there was no bike for
 
- 
each of the stations here. And that has
been done again for each station
 
- 
separately, so for each station, at the
end of the day, you get a number that
 
- 
denotes the probability of finding that
station empty. And clearly, for instance,
 
- 
the Hauptbahnhof, the main train station,
which was the largest station. It's
 
- 
quite unlikely to find it empty,
and conversely, if you go to these
 
- 
stations down here, for instance 
the Am Plan / Wirtschaftswissenschaften
 
- 
it turns out that these are empty at about
70 percent of the time, which is quite
 
- 
substantial, I would say. And
interestingly, if you now look for the
 
- 
secondary train station in Marburg, the
Südbahnhof, you can see that this has
 
- 
quite a substantial probability of 
running empty at about 30 to 40 percent.
 
- 
In particular, in comparison to the main 
train station, which is essentially almost
 
- 
never empty. Also interestingly, you can
then plot these probabilities against the
 
- 
average number of parked bikes at the
station and you find an inverse
 
- 
relation between those two. It means that
the larger the stations, the more unlikely
 
- 
it is that it's empty, which is quite a
reasonable outcome, I would say.
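The empty-station probability described above is the fraction of scrapes in which a station had zero parked bikes. A minimal sketch, assuming the data is available as one list of counts per station; the function names are hypothetical and the example counts in the usage note are invented.

```python
from statistics import mean


def empty_probability(series):
    """series: dict station -> list of parked-bike counts, one per scrape.

    Returns dict station -> fraction of scrapes with no bike at all.
    """
    return {
        station: sum(1 for count in counts if count == 0) / len(counts)
        for station, counts in series.items()
        if counts  # skip stations without any observation
    }


def average_parked(series):
    """Average number of parked bikes per station, for the comparison plot."""
    return {station: mean(counts) for station, counts in series.items() if counts}
```

For instance, a station observed with counts `[0, 1, 0, 0]` would get a probability of 0.75, while one that always holds a few bikes gets 0.0. Plotting `empty_probability` against `average_parked` gives the relation mentioned in the talk.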
 
- 
So finally, to conclude for Anna,
 
- 
she should try to avoid small stations
 
- 
and in particular, she should try
to avoid the stations that are
 
- 
well, annotated here with
the sad smiley, because these
 
- 
tend to run empty quite often.
 
- 
OK, so I have all this ETL pipeline 
stuff already set up,
 
- 
I have collected 
over a million data points
 
- 
and then I thought, well, maybe there's
more in the data than only helping Anna.
 
- 
So everything that I've shown you so far,
it's from the perspective of a user.
 
- 
And now I would like to turn to 
what's the perspective of a city.
 
- 
And there I would like to 
ask a few questions, like…
 
- 
How is Nextbike used in Marburg?
first of all,
 
- 
and then, in general, 
Is cycling a good thing for a city?
 
- 
How can, or,
Can cycling contribute to a better city?
 
- 
And now–better is of course first a quite
vague term–and then last, but not least,
 
- 
is it worth improving 
bike infrastructure for a city?
 
- 
And all this again, is now from the
perspective of a city instead of a user.
 
- 
The first thing that I would like to start
with is something that I call the distance
 
- 
matrix in which I concentrated on the
positions of the bike stations and
 
- 
computed the pairwise distances for all of
them. And since the distance is, of
 
- 
course, symmetric, also the stored matrix
is now in the end also symmetric. And,
 
- 
it turns out that there are roughly 600
combinations, and these combinations can
 
- 
be shown in a symmetric matrix, as shown
here, where on the x axis this one here
 
- 
and the y axis you can see the stations
and then each combination denotes
 
- 
the distance between that one station and
the other station. It turns out that the
 
- 
range of these distances is between zero
and roughly nine kilometers. And of
 
- 
course, those that have a zero distance to
other stations are essentially the…
 
- 
the stations themselves. So if you pick a
station, obviously the distance to itself
 
- 
is zero and therefore the diagonal is
exactly zero. And then again, all the
 
- 
remaining part is a symmetric copy of the
other diagonal part. The other thing and
 
- 
that is now the main treasure, I would say
of this study, so the main base for
 
- 
everything that follows is what I call the
transition matrix, where I counted the
 
- 
number of transitions of bikes from one
station to the other station. That is now,
 
- 
of course, not symmetric anymore because
just because, say, five bikes go from one to
 
- 
the other station, it does not mean that
these five bikes really come back again.
 
- 
And therefore, the number of entries 
is roughly 1400. Again, it can be shown
 
- 
or visualized in the same fashion.
So you again have the stations on the one
 
- 
axis and the same stations on the other
axis, and now each entry here in the
 
- 
matrix corresponds to the number of
transitions of bikes from one to the
 
- 
other. And the range is from zero to over
3000. And it turns out that actually the
 
- 
self transitions, meaning somebody takes a
bike from a station, does something with a
 
- 
bike, maybe grocery shop, grocery shopping
or so, and then the person comes back to
 
- 
the same station. These events occur the
most frequently and therefore the largest
 
- 
entries are on the diagonal, typically.
Sometimes it is not so interesting what
 
- 
happens regarding the self transitions and
therefore another matrix can be derived
 
- 
from the first one, namely a transition
matrix without diagonal elements where
 
- 
those elements have been set to zero as
you can see here, if you look closely.
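One way to build such a transition matrix from the scraped bike positions is sketched below. The talk does not say how trips were detected, so the rule used here is an assumption: a bike that was parked at one station, then absent or seen elsewhere, and then parked again counts as one transition. All function names are hypothetical. With 37 stations the full matrix has 37 × 37 = 1369 entries, which matches the "roughly 1400" quoted above.

```python
from collections import Counter


def count_transitions(sightings):
    """sightings: dict bike_id -> chronological list of station names,
    with None for scrapes in which the bike was not parked anywhere.

    Returns a Counter mapping (from_station, to_station) -> number of trips.
    """
    transitions = Counter()
    for history in sightings.values():
        last_station = None
        was_away = False
        for station in history:
            if station is None:
                # The bike only counts as "away" once it has been seen parked.
                was_away = last_station is not None
                continue
            if last_station is not None and (was_away or station != last_station):
                transitions[(last_station, station)] += 1
            last_station = station
            was_away = False
    return transitions


def to_matrix(transitions, stations):
    """Rows are origin stations, columns are destination stations."""
    return [[transitions[(a, b)] for b in stations] for a in stations]


def without_diagonal(matrix):
    """Copy of the matrix with self transitions set to zero."""
    return [
        [0 if i == j else value for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]
```

Because a trip from A to B is counted separately from a trip from B to A, the resulting matrix is in general not symmetric.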
 
- 
Speaking of looking closely, it's quite
educational if you not only see the
 
- 
figures, but also can explore them a bit,
and therefore I rendered an interactive
 
- 
version of it. Let's... let's visit it. So
that's now again, the matrix without the
 
- 
diagonal and one with the diagonal. And
now by hovering over these entries you
 
- 
can see that, for instance, from Am
Schülerpark to Ockershäuser Allee zero
 
- 
transitions happened. And then a bit
larger one, for instance, Biegenstraße to
 
- 
Hauptbahnhof over 800 transitions happened
in the time of capturing the data. So feel
 
- 
free to explore a bit, maybe identify the
most, most interesting, most used popular
 
- 
routes. Ok, such a transition matrix can
actually also be shown as a network graph
 
- 
where here I concentrate only on the
largest entries because it turns out the
 
- 
full transition matrix is a bit too dense.
And what is shown here as blue
 
- 
circles, it corresponds to a station and
then these edges here are drawn whenever
 
- 
there happens a transition. And you can
already see here that there are a few
 
- 
stations that are quite isolated, like
those and then many stations have a self
 
- 
transition and mostly feed to a more
central station.
 
- 
And since that is also more
interesting in an interactive fashion,
 
- 
I also rendered 
an interactive version of that.
 
- 
Now again, we can zoom, pan around
and drag the graph around a bit.
 
- 
And interestingly, if you click on a
station, you can see from where
 
- 
transitions happen to that station. So
like those interconnected central ones,
 
- 
like the Hauptbahnhof, the main train
station, it's quite connected in the
 
- 
graph. And then there are a few like
Friedrichplatz which are not connected at
 
- 
all. Interestingly, that one here, for
instance, the Cafe Trauma/Aföllerwiesen it
 
- 
doesn't even have a self connection. So it
turns out that, well, people apparently
 
- 
mostly use it for taking a bike going into
the city.
 
- 
And most dominantly, 
the Elisabeth-Blochmann-Platz, actually.
 
- 
OK, so if you now take 
these transition matrices,
 
- 
as well as the distance matrices
into account and mix them, first of all,
 
- 
you can get a few interesting numbers. So
here I calculated the overall number of
 
- 
trips, which turned out to be 210,000
trips in the time of capturing the data,
 
- 
which is quite a substantial number for
such a small city like Marburg. And this
 
- 
is, of course, computed by taking the sum
of the transition matrix elements. And
 
- 
then if you weigh these sums or these
entries with the distances between those
 
- 
stations, it turns out that those
transitions or those trips essentially
 
- 
correspond to a distance of 320,000
kilometers that have been traveled, which
 
- 
is a few times around the Earth actually.
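The two totals can be reproduced from the matrices along these lines. This is a minimal sketch that assumes straight-line (great-circle) distances between station coordinates; the talk does not state which distance measure was used, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_matrix(coords):
    """Symmetric matrix of pairwise distances; the diagonal is zero."""
    return [[haversine_km(a, b) for b in coords] for a in coords]


def total_trips(transitions):
    """Overall number of trips: the sum of all transition matrix entries."""
    return sum(sum(row) for row in transitions)


def total_km(transitions, distances):
    """Trips weighted by the distance between origin and destination."""
    return sum(
        count * dist
        for t_row, d_row in zip(transitions, distances)
        for count, dist in zip(t_row, d_row)
    )
```

Note that self transitions contribute to the trip count but add zero kilometres under this measure, so the distance total is a lower bound on what was actually cycled.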
 
- 
Now, when these two basic numbers and the
 
- 
matrices that I introduced earlier are
combined with a few statistical details –
 
- 
like, for instance, the average
consumption of fuel of a car or how much
 
- 
CO2 it produces while driving – a few
ecological, economic and social benefits
 
- 
of a bike system or cycling in general can
be derived. First of all, I found it quite
 
- 
entertaining that the overall number of
calories burned corresponds to 8.6 million
 
- 
kilocalories. And to convert that to a bit
more, well, real life number, I would say
 
- 
I calculated how many Nutella jars 
those are, and it turns out that
 
- 
it's roughly 4,000 Nutella jars that
have been burned in terms of calories
 
- 
just by this system of cycling. And then 
also, it can be found that this distance
 
- 
here, if you would have driven it 
by a car, you would have,
 
- 
well, used almost 26,000 liters of fuel. 
You would have produced 40 tons of CO2.
 
- 
And that fuel that you would have bought
would have cost 34,000 €, actually.
 
- 
Interestingly, that number here 
of 40 tons of saved CO2
 
- 
corresponds to an average
German who lives for 4 years
 
- 
or 4 Germans that live for one year.
So a typical German produces
 
- 
roughly 10 tons, and therefore 
it's four times that, obviously.
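The conversion from cycled distance to car-equivalent savings is simple arithmetic. The conversion factors are not given in the talk; the values below (8 l/100 km, 0.125 kg CO2 per km, 1.33 € per litre, 2150 kcal per jar) are illustrative assumptions picked so that the results land close to the quoted figures. Only the 320,000 km, the 8.6 million kcal and the 10 tons per person and year come from the talk.

```python
def car_equivalent(distance_km, litres_per_100km, co2_kg_per_km, euro_per_litre):
    """Fuel, CO2 and fuel cost if the same distance had been driven by car."""
    fuel_litres = distance_km * litres_per_100km / 100
    return {
        "fuel_litres": fuel_litres,
        "co2_tonnes": distance_km * co2_kg_per_km / 1000,
        "fuel_cost_eur": fuel_litres * euro_per_litre,
    }


# Distance from the talk; all three factors are assumptions.
savings = car_equivalent(
    distance_km=320_000,
    litres_per_100km=8.0,
    co2_kg_per_km=0.125,
    euro_per_litre=1.33,
)

# 40 t of CO2 at roughly 10 t per person and year (figure from the talk).
person_years = savings["co2_tonnes"] / 10

# Energy figure from the talk; kcal per jar is an assumption.
nutella_jars = 8_600_000 / 2150
```

With these assumed factors the sketch gives about 25,600 litres of fuel, 40 tons of CO2 and roughly 34,000 € of fuel cost.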
 
- 
Ok, so again, from the transition matrix,
 
- 
you can derive a few more interesting
details like, for instance, details that
 
- 
are interesting from the perspective 
of traffic management.
 
- 
Like, here I calculated the most popular
routes by finding the maximal elements
 
- 
of the transition matrix. And it turns out
that the most popular route has been used
 
- 
well over 2000 times a year from the
Hauptbahnhof to the Ginseldorfer Weg. And
 
- 
if you look closely, you can see that the
main train station or the Hauptbahnhof,
 
- 
as well as the Elisabeth-Blochmann-Platz
is involved in many of those top routes.
 
- 
And that's now again interesting. For
instance, if a city would like to improve
 
- 
the bike system because we've now seen
it has quite a good impact for social,
 
- 
ecological, and economical aspects.
 
- 
But let's say the city has maybe 
limited financial resources.
 
- 
It would be interesting to simply
calculate the most popular routes,
 
- 
and then start fixing 
or improving them first.
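Finding the most popular routes amounts to picking the largest entries of the transition matrix. A minimal sketch, with a hypothetical function name; self transitions are skipped by default because they do not describe a route between two places.

```python
def top_routes(matrix, stations, n=5, include_self=False):
    """Return the n largest entries as (count, origin, destination) tuples.

    matrix[i][j] is the number of trips from stations[i] to stations[j].
    """
    routes = [
        (count, stations[i], stations[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if include_self or i != j
    ]
    # Sort by trip count, largest first; ties keep their matrix order.
    routes.sort(key=lambda route: route[0], reverse=True)
    return routes[:n]
```

A city with a limited budget could walk down this list and look at the streets connecting each origin and destination pair first.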
 
- 
OK, now at that point, 
you might ask yourself,
 
- 
Well, what kind of data did he scrape?
 
- 
And for that, I would like to
show you this graph. It shows
 
- 
the number of parked bikes in the whole 
ecosystem of Marburg against time.
 
- 
And as you can see, 
I did it in two batches.
 
- 
The first one has been obtained from 
March to December 2020. So last year.
 
- 
And then I restarted the scraping at the
end of April and finished just a few days
 
- 
ago in December 2021. And you can clearly
see that the number of parked bikes
 
- 
decreases when the weather is good or when
there are summer months and therefore most
 
- 
likely because the weather is good. And of
course, it suggests itself a bit given
 
- 
that I captured this in 2020 and that one
year in 2021 and taking the corona
 
- 
pandemic into account. Well, how does it
compare?
 
- 
And therefore, I concentrated on the 
overlapping months of the two data sets
 
- 
and calculated, well, 
the comparison, as you can see here.
 
- 
Now in blue, it's 2021 this year 
and 2021, sorry 2020 is shown in red.
 
- 
And you can see that the number of
parked bikes increased actually.
 
- 
There might be a multitude 
of explanations for that. I don't know.
 
- 
Maybe one explanation could be that people 
took more advantage of working from home.
 
- 
OK, so everything that I've shown you so far,
 
- 
it's been mostly statistical statements, 
averages, sums and stuff like that,
 
- 
and now I was interested if it's possible 
to do also more precise predictions.
 
- 
And therefore I turn 
towards a machine learning or
 
- 
artificial intelligence task where I
predicted the num… where I tried to
 
- 
predict the number of parked bikes,
meaning the quantity that I've shown over
 
- 
and over again in the last few
minutes. So is it possible to predict that
 
- 
number based on the hour of the day, the
weekday and the temperature that is shown
 
- 
here for 2020? And when starting such a
task, it's always, first of all, very
 
- 
useful to investigate the training data.
And therefore well I try to plot it. And
 
- 
because it's a three-dimensional phase
space, it's also very simple to plot it.
 
- 
So you can essentially plot it as a
scatterplot. And the color coding here has
 
- 
been chosen to denote the target variable,
meaning the number of parked bikes.
 
- 
And just by inspecting the data, you can
already see that the smaller the
 
- 
temperatures are, the fewer… sorry, the
more bikes are parked and therefore the
 
- 
fewer bikes are used. I use a random
forest machine learning model, which
 
- 
consists... which is an ensemble model of
decision trees, of randomized decision
 
- 
trees. And this model is quite powerful
because it can work with little data. It
 
- 
can work with a lot of data, and it's also
very flexible. If you would ever like to
 
- 
extend the phase space, like maybe it would
be interesting to see if one could predict
 
- 
the number of parked bikes given a bank
holiday or given weekend. And all these
 
- 
aspects could be added to the random
forest relatively easily. And that's now
 
- 
the outcome: So I show the measured data,
well that's been data that hasn't been
 
- 
seen by the model before, and I show that
data here and then the densely covered,
 
- 
phase space prediction of the machine
learning model here. And you can see that
 
- 
the color trends, they correspond quite
well to each other. Like you can, for
 
- 
instance, see the smaller numbers or
larger parked numbers in the regime of
 
- 
small temperature and also from a
quantitative perspective, the prediction
 
- 
is quite decent as the square root of the
mean squared error corresponds to
 
- 
roughly a tenth of the average value of
the parked bikes.
 
- 
Which, again in this context is quite a 
decent prediction performance,
 
- 
given how naive the
approach was in general.
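The quality measure quoted here, the root mean squared error relative to the average number of parked bikes, can be computed as sketched below. The function names are hypothetical, and the inputs would be the held-out measurements and the corresponding model predictions.

```python
from math import sqrt
from statistics import mean


def rmse(measured, predicted):
    """Square root of the mean squared prediction error."""
    return sqrt(mean((m - p) ** 2 for m, p in zip(measured, predicted)))


def relative_rmse(measured, predicted):
    """RMSE divided by the mean of the measured values."""
    return rmse(measured, predicted) / mean(measured)
```

A `relative_rmse` of about 0.1 corresponds to the "roughly a tenth of the average value" mentioned in the talk.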
 
- 
OK, I did a bit more on machine learning, 
but I'm not showing that here.
 
- 
I calculated the Markov steady state
for the same data essentially.
 
- 
And if you're interested in that, well, 
feel free to check out this link here.
 
- 
OK, last but not least, I would, 
of course, like to come to
 
- 
the summary for Anna, me, 
and maybe other students.
 
- 
So first of all, what I did was to scrape 
Nextbike data in Marburg in order to find,
 
- 
which stations to potentially avoid when 
you're in desperate need for a Nextbike.
 
- 
And for that, I calculated 
the probabilities of empty stations
 
- 
and found that the larger the station, 
the less likely it is to run out of bikes.
 
- 
So the general recommendation 
from my side would be:
 
- 
try to find larger stations if you're 
in desperate need for a Nextbike.
 
- 
And feel free to go back to 
the interactive map to see the
 
- 
locations of these stations, which is
quite interesting in itself, I would say.
 
- 
And then I turned towards 
the perspective of a city, and
 
- 
investigated a bit the usage patterns
of Nextbikes and therefore, most likely
 
- 
representatively, also of cycling in Marburg, where
I calculated the day-hour usage.
 
- 
So when is the system quite busy 
and generally the most popular routes,
 
- 
which might be of use for city planning 
and also social, economical, and
 
- 
ecological benefits of the whole system.
 
- 
Last but not least, I showed that
 
- 
more precise predictions are possible when
maybe a statistical statement is not
 
- 
enough and you would like 
to do per case predictions.
 
- 
Last but not least, I was fortunate 
enough to work with AstA Marburg.
 
- 
In particular, Lucas and David, 
thank you very much for your trust
 
- 
in that project where we try to optimize 
the placement of the bikes in the future.
 
- 
The take home messages are now, 
first of all:
 
- 
Bikes are amazing! And not only are they
amazing for you and the environment,
 
- 
but also for your wallet.
So you save essentially money on gas.
 
- 
And also, I would like to,
 
- 
well, highlight that those data-driven 
optimizations of public transport
 
- 
have the potential to, well, 
increase the life, the quality of life of
 
- 
many of us at moderate cost. So again, I
would like to come back to a case where
 
- 
maybe a city would like to 
improve bike infrastructure
 
- 
that doesn't have enough 
money to do it in one go.
 
- 
So then it might be interesting 
to first find–in a data-driven way–which
 
- 
combinations of, now in Nextbike terms, 
maybe stations or in general streets
 
- 
are popular, and then these might be worth
being fixed first with a limited budget.
 
- 
OK, if you're interested in more, I was 
very fortunate to be able to speak at the
 
- 
last rC3 already about data in Marburg, 
but last year I spoke about parking
 
- 
in Marburg. If you like to, well, read the 
blog articles corresponding to that
 
- 
or just see the official CCC video, 
just follow these links shown here.
 
- 
Thank you very much for your attention.
 
- 
If you have anything to get in contact 
with me, reach out to my e-mail address.
 
- 
Maybe some ideas on how to improve 
a talk or what else to evaluate.
 
- 
And then all the supplementary 
materials that I mentioned,
 
- 
and what I've shown here, 
can be found again on this link here.
 
- 
In particular, thank you very much
to all the people who reached out to me
 
- 
based on my last year's talk. I haven't
gotten around to responding properly, but
 
- 
I'm 100 percent certain that I will do so.
 
- 
Thank you very much for your attention, 
and have a good year.
 
- 
Herald: Alright, welcome back. It's time
for the Q&A now. You probably know the
 
- 
drill, but I repeat it anyway. If you're
on Twitter, on Mastodon or on the
 
- 
Fediverse in general, the hashtag is
#rc3cwtv to ask any questions. And if
 
- 
you're in the hackint IRC, the channel
name is the same except there's a dash in
 
- 
between the rc3 and the cwtv. And we
apparently already have some questions, so
 
- 
I'll just get started now.
 
- 
First question:
Is the Nextbike API free to use?
 
- 
Does Nextbike even know 
that you did this scraping?
 
- 
Martin: Yes, so as far as I know, the
Nextbike API has been reverse engineered
 
- 
from the iOS app and there's a GitHub repo
by ubahnverleih and he documents lots of
 
- 
APIs of public transport companies like
Nextbike or some companies that also
 
- 
produce the scooters. And since it's the
public, since it's the official iOS API,
 
- 
it's more or less public, so to say, 
it's free and it's pretty much quota unlimited
 
- 
because normally all the iPhones 
access it. But again, I can only recommend
 
- 
the ubahnverleih repository 
on that on GitHub.
 
- 
Herald: And you don't need 
any credentials to access it?
 
- 
Martin: No. Actually, you can, as far as 
I checked, you can pretty much access the
 
- 
whole world. So you can access stations 
in Poland in, well, all of Germany now.
 
- 
Herald: That's cool. It's probably 
accidental, but it's quite cool anyway.
 
- 
Martin: (laughs) Yeah.
 
- 
Herald: Ok. What software did you use for
the machine learning stuff?
 
- 
Martin: The machine learning stuff 
has been done with Python,
 
- 
and then specifically with sklearn, 
which is a quite popular machine learning
 
- 
framework for Python.
 
- 
Herald: The workhorse of the machine
learning community, I would say.
 
- 
Martin: Yes, exactly yeah.
 
- 
Herald: Do you know if Nextbike adds
or removes bikes from the stations?
 
- 
Or do they relocate the bikes?
Or do… I mean, do they do that?
 
- 
Or does it just happen 
as an emergent behavior?
 
- 
Martin: I would say that… 
So, I had the chance to speak
 
- 
with a person of Nextbike while
I was working for the Marburg-ASTA
 
- 
and he said that first of all, it's not
very technical yet. Well, not very
 
- 
digitalized yet, and they essentially
drive around. So I'm pretty sure that they
 
- 
certainly collect bikes that need
maintenance, but then logically,
 
- 
probably also 
relocate them where necessary.
 
- 
Herald: All right. OK, someone wants to
know if the scripts that you use would be
 
- 
public? I assume the main part with the
API is already answered if you gave the
 
- 
GitHub repo. But are you planning to open
source anything else?
 
- 
Martin: Potentially. So I have no plans on
doing so just because it's additional
 
- 
work, to be honest. If you're… well, I
can just do the same, well offer the same
 
- 
thing as last year: Just write me an
email and if there are enough people who are
 
- 
interested, I'll probably strip down my
internal repository. But since in the
 
- 
internal repository there are a few
private notes, that one is not published
 
- 
for sure right now.
 
- 
Herald: All right. Anything else?
 
- 
Dear listeners, 
you have maybe 30 seconds to comply.
 
- 
So there's one question, about 
the time period of data that you have,
 
- 
but I think you answered it in the talk.
Right?
 
- 
Martin: Yes, it's more or less the whole of 2020
and half to two thirds of 2021 that I collected.
 
- 
Herald: OK, so you probably mostly have,
like, a pandemic situation?
 
- 
Martin: Yes, exclusively.
Pretty much, yeah
 
- 
Herald: I wonder if that's more or less 
usage than usual. I mean, it's less people
 
- 
having to go places, but more people 
wanting to not use public transport.
 
- 
Martin: Yes, so based on my data, 
I can see that
 
- 
the number of parked bikes 
is going up,
 
- 
and therefore the usage 
is going down, and
 
- 
that was also confirmed internally by some
Nextbike people. Now, one more thing, so
 
- 
regarding the people who are interested in
the code, regardless of if I am going to
 
- 
publish it or not, if you have
questions, just drop me an email. I mean,
 
- 
writing the scraper in particular
is absolutely trivial. And if it's
 
- 
not trivial for you, then the code 
wouldn't be of value to you anyway.
 
- 
Herald: All right. How does your data 
interpret broken / unavailable bikes at
 
- 
the station? I mean, can you see that? 
Or do you take it into account?
 
- 
Martin: Yes, so I don't see that directly.
 
- 
I mean, I have a list of all the bikes
and if I dug a little bit deeper,
 
- 
I could probably, you know, compile a list
where I see where a
 
- 
particular bike is standing at the moment.
And if that bike would be, for instance,
 
- 
absent for a longer time, I could
conclude that it's maybe broken,
 
- 
being maintained or something like
that. But there's no direct data on that.
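
Editor's sketch: the inference Martin describes, flagging a bike that has been absent for a longer time, could be written as below. It assumes each snapshot is a pair of a timestamp in seconds and the set of bike IDs parked anywhere in the system at that time. The function name and the absence threshold are hypothetical, and a long absence is only a hint, since the bike may simply be rented.

```python
from typing import Dict, Hashable, List, Set, Tuple


def long_absent_bikes(
    snapshots: List[Tuple[float, Set[Hashable]]],
    min_absence_s: float,
) -> Dict[Hashable, float]:
    """Return {bike_id: seconds since last sighting} for long-absent bikes.

    snapshots must be sorted by timestamp. A bike is reported when its
    last sighting lies at least min_absence_s before the final snapshot.
    """
    if not snapshots:
        return {}
    last_seen: Dict[Hashable, float] = {}
    for timestamp, bike_ids in snapshots:
        for bike_id in bike_ids:
            last_seen[bike_id] = timestamp  # later sightings overwrite earlier ones
    final_timestamp = snapshots[-1][0]
    return {
        bike_id: final_timestamp - seen
        for bike_id, seen in last_seen.items()
        if final_timestamp - seen >= min_absence_s
    }
```

Choosing the threshold well above a typical rental duration would reduce false alarms from bikes that are merely in use.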
 
- 
Herald: All right. Do you think
that Nextbike moving the bikes has somehow
 
- 
biased your data? 
Like, if they basically relocate them?
 
- 
Martin: That's a good question. I have
absolutely no idea. So I mean, what
 
- 
I did calculate was this: I defined
a term of activity.
 
- 
I defined it as the number of bikes coming
in, divided by the sum of the bikes going
 
- 
out and the bikes coming in. So
that is, so to say, the activity, and
 
- 
that number is obviously between zero
and one, and if it's far from zero point
 
- 
five, that would mean that the station
runs empty essentially or overfills at
 
- 
some point and there are a few stations
where it's a bit above zero point five.
 
- 
But of course, the
data that I used already has all the
 
- 
moved bikes incorporated. So it's
not really something that could be used
 
- 
for really trying to find it.
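
Editor's sketch: the activity term described above is in / (in + out). A minimal way to compute it from successive snapshots of one station follows. It assumes each snapshot is the set of bike IDs parked at the station; the function names are hypothetical, and returning `None` for a station with no movement is an editorial choice, not something stated in the talk.

```python
from typing import Hashable, List, Optional, Set, Tuple


def count_flows(id_sets: List[Set[Hashable]]) -> Tuple[int, int]:
    """Count (arrivals, departures) between consecutive station snapshots."""
    arrivals = departures = 0
    for previous, current in zip(id_sets, id_sets[1:]):
        arrivals += len(current - previous)    # IDs that newly appeared
        departures += len(previous - current)  # IDs that disappeared
    return arrivals, departures


def activity(bikes_in: int, bikes_out: int) -> Optional[float]:
    """Return in / (in + out), or None if no bike moved at all."""
    total = bikes_in + bikes_out
    if total == 0:
        return None
    return bikes_in / total
```

A value near 0.5 means arrivals and departures balance. Values well below 0.5 mean the station tends to run empty, and values well above 0.5 mean it tends to overfill. As noted in the answer, relocations by the operator show up as ordinary arrivals and departures, so this number alone cannot separate them from customer trips.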
 
- 
Herald: I mean, is this kind
of data also available
 
- 
for bike sharing services 
that don't have docking?
 
- 
If they even exist still in Germany? 
I kind of lost track.
 
- 
I think maybe they 
all went bankrupt, but of course…
 
- 
Martin: What do you mean by docking?
 
- 
Herald: By, you know, they don't have
fixed stations, but they are floating.
 
- 
Martin: So I mean, all that I did was to
look at the stations, but actually there
 
- 
are a few free standing ones also in
Marburg, and these people are typically
 
- 
penalized financially, so they
have to pay a fee. I didn't analyze
 
- 
it at all. Would be interesting for sure.
And as far as I know, there are cities
 
- 
where it's completely, well, there are 
no stations for Nextbike,
 
- 
where people can drop it off 
wherever they like.
 
- 
Don't quote me on that, it's 
just something that I've heard.
 
- 
Most likely in the large cities.
So maybe in Berlin.
 
- 
Herald: Yeah, I think here there are like
some locations where you have to drop the
 
- 
bikes, but that's, 
I'm not sure if that's Nextbike.
 
- 
I can never remember which ones
laughs I actually end up using.
 
- 
All right, everybody. Now is your last
chance to ask more questions.
 
- 
I feel like I'm at Teleshopping, like the rC3
Teleshopping, which I highly recommend if
 
- 
you haven't checked it out. Probably
the peak experience at the remote Congress
 
- 
is the Teleshopping channel.
And you should all have a look.
 
- 
And maybe buy some… 
some extremely useful items that they sell
 
- 
Herald: OK, so the chat confirms that 
Nextbike does have cities without stations
 
- 
Martin: Ah yes, yes, very good.
 
- 
Yeah, I mean, I can only…
 
- 
if you're remotely interested in all
these public transport data studies,
 
- 
definitely check out the 
ubahnverleih GitHub repository.
 
- 
There's a large number 
of systems documented there.
 
- 
Herald: OK, and that's just ubahnverleih, 
just as you would write it.
 
- 
Martin: Yes, let me look it up 
very quickly, Ubahn…
 
- 
Well, the person is from Ulm,
and he also contributed to the
 
- 
CCC infrastructure. His name is
Constantine and yes, it's ubahnverleih.
 
- 
And I think the repo
name is WoBike, as far as I know.
 
- 
Herald: All right. Good. Thank you.
 
- 
Alright. I think we've managed to exhaust
the internet. So, people, where can they
 
- 
find you if they have any further
questions? Are you going to be wandering
 
- 
the remote world, or whatever it's called?
You know the…
 
- 
Martin: Well, that's a good idea. I
haven't planned to, but I can. So I've no
 
- 
idea how it works, but I'm sure I can
figure it out. So I mean, in general, drop
 
- 
me an email and you can find my email on
lellep dot xyz. It's my website.
 
- 
Other than that, I could be online 
in the 2D world adventure now,
 
- 
if that's of value to anybody.
 
- 
Herald: People can maybe hunt you
down if they really need to.
 
- 
Martin: Definitely, yeah.
 
- 
Herald: OK, wonderful. Well, thank you for
your talk and for answering the questions.
 
- 
And thanks everyone for tuning in.
Have a good remainder of Congress.
 
- 
I think you should be able to at some
point rate talks in the Fahrplan,
 
- 
if that feature still exists, so if you 
want to see more of this kind of stuff,
 
- 
maybe leave some feedback.
 
- 
Bye bye.
 
- 
Martin: Bye.
 
- 
rC3 postroll music
 
- 
Subtitles created by c3subtitles.de
in the year 2022. Join, and help us!