-
RC3 preroll music
-
Herald: Hello, everyone, welcome back to
Chaos West TV. The next talk will start
-
momentarily. I will now switch back to
German for a few seconds to announce a
-
translation. Then I'll switch back and
then we'll go off to the races, as they say.
-
So, once more quickly in German:
welcome back to Chaos West TV, your
-
best stage at rC3. The next talk
starts shortly; it is in English,
-
but, like so much else, it will be translated
into German thanks to our translation crew.
-
You should be able to simply select that
in the stream without any major problems,
-
and then you can listen to the talk with
simultaneous translation into German.
-
And now I'll continue in English.
-
Alright back to English.
-
Now in the comfort of your own homes
or wherever you're viewing the stream,
-
please do a warm round of applause
for our next speaker,
-
Martin, who will talk about
optimizing public transport.
-
Let's go.
-
Martin: Welcome to my contribution to this
year's rC3 2021 in the form of this talk,
-
Optimizing public transport:
a data-driven bike sharing study in Marburg
-
I would like to thank the organizers of the
rC3 2021 for organizing the whole event.
-
And in particular, I would like to thank
the channel that accepted me, Chaos West TV,
-
for accepting the presentation of my
work. Today I would like to give you a
-
quick overview of one of my hobby projects
in which I scraped and therefore
-
downloaded over one million data points
regarding the bike sharing system in the
-
city of Marburg. This study came about
when I was traveling from Stuttgart to
-
Frankfurt and ultimately to Marburg some
time ago, and I was watching the amazing
-
SpiegelMining talk by David Kriesel. So
thank you very much for this implicit
-
inspiration of the work that you're about
to see now.
-
Who am I? My name is Martin Lellep,
and I studied physics in the past,
-
and actually, I continue to do so in the
form of a Ph.D. in theoretical physics at
-
the University of Edinburgh in Scotland
and in my spare time I like to do data
-
analysis of all kinds of data.
-
There are two more things
that are important here.
-
First of all, I studied at the
University of Marburg, obviously in
-
Marburg previously, and then also I like
to ride my bike. Marburg, for those who
-
don't know it yet, is a small,
university-dominated town roughly
-
80 kilometers north of Frankfurt am Main.
So an hour by car or an hour
-
by train, approximately. And again, it's
quite dominated by the university that is
-
located there, and that can be seen simply
in terms of, for instance, numbers. There
-
are roughly 25,000 students for an overall
population of 77,000 residents in total,
-
which is quite substantial, obviously. You
can see a quite popular picture here of a
-
picturesque scene in Marburg. We can see
the castle and then the river Lahn, as
-
well as a few houses and a bit of green.
And the bike rentals are currently
-
provided at the time of recording this by
the company called Nextbike. Before
-
diving into a few more technical details,
I would like to motivate my
-
study with the story of Anna. Anna is a
student at
-
the University of Marburg, and she lives a
bit outside the city, so she typically
-
does not walk to the place that she needs
to be or study at. But she takes the bus
-
from her flat to the
university, to the city, and then does the
-
last mile by walking or cycling or
whatever. And she's also quite an eager
-
student, so she very often studies quite
late. As you can see here, that's a
-
picture of Marburg late at night, so to say, and
as it happens, she needs to catch
-
a bus now because she's a bit late. She
forgot to pack her fancy MacBook
-
in time, so she needs to hurry up a
bit and, well, doesn't really make it. So
-
therefore, she thought maybe
-
taking a Nextbike for the last mile
to the bus station is a good idea
-
so she can then safely take
the bus home. Normally,
-
the Nextbike stations look like
this. So there are plenty of bikes.
-
It's very easy to go there, grab a bike
and go to your destination. Now Anna must
-
be a very unlucky student today because
she arrives at the bike station, and it
-
turns out that the station is empty, so
ultimately she misses at least this bus
-
and therefore only arrives at home a bit
later. Her cooking plans and her Netflix
-
plans, all that stuff postponed a bit
because, well, she arrives a bit later.
-
And that's, of course, a very, very sad
story, and maybe it happens to multiple
-
people, not only Anna. And in fact, it
also happened to me a few times, and every
-
time it happened to me, I thought, well, I
must be the unluckiest person in all of
-
Marburg, going to a normally
fully packed bike station that is now
-
completely empty, and missing, for instance,
subsequent public transportation.
-
After it happened to me a few times, I
thought, well, maybe I'm not that unlucky.
-
So is there maybe a system behind empty bike
stations in Marburg? And given
-
my spare-time interest in analyzing and
capturing data, I thought, well, data to
-
the rescue, of course. And therefore, the
idea for this talk now was to build a web
-
scraper in order to acquire Nextbike data.
Collect the data, store the data, analyze
-
the data and then hopefully finally help
Anna, me, and other students to figure out
-
which stations maybe to avoid and which
stations are safe to go to if you're in
-
desperate need for a bike.
-
The tech stack that I'm using here
is based on a Docker container
-
in which a Python scraper runs
every 30 seconds and queries the
-
Nextbike API. It downloads the data, it
parses the data, and then saves the data
-
outside the Docker container in order to
be evaluated later on. And it turns out
-
that the whole concept of what I just
described also has a name. It's called
-
an Extract, Transform, Load pipeline, or ETL
for short. So what I wrote here is
-
an ETL pipeline in Python, and then I
wrote the analysis code, also in
-
Python, and all that was running on a small
home server in my flat. The data that I
-
captured consists of the bikes identified
through IDs and then also the locations of
-
those bikes, typically at stations, but
some of them were also freestanding and
-
last but not least, the station locations,
and, of course, also a list of
-
stations. And then with it, I went ahead
and made a few figures that I'm about to
-
show now, and a few analyses. And if you're
interested in that, there are slides
-
available on this website here, the
website can be read through the QR code or
-
through that link and this website
contains the slides that you'll see in
-
here, high resolution figures, a few
interactive figures and all the
-
information on the previous blog articles
that I wrote about this topic.
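The scraping setup described above (query the API every 30 seconds, parse the response, store the result outside the container) can be sketched as a small extract-transform-load loop. This is only an illustration and not the speaker's code: the URL is a placeholder, the payload layout with `stations` and `bike_ids` is a made-up simplification rather than the real Nextbike schema, and the HTTP call is injected as a `fetch` function so the sketch runs without network access.

```python
import csv
import json
import time
from pathlib import Path

# Hypothetical placeholder; the talk does not give the real endpoint.
API_URL = "https://example.invalid/nextbike-placeholder"
INTERVAL_SECONDS = 30  # polling interval mentioned in the talk
FIELDS = ["timestamp", "station", "lat", "lng", "bike_id"]


def extract(fetch):
    """Extract: call the injected fetch function and decode its JSON text."""
    return json.loads(fetch(API_URL))


def transform(payload, timestamp):
    """Transform: flatten the (assumed) payload into one row per parked bike."""
    rows = []
    for station in payload["stations"]:
        for bike_id in station["bike_ids"]:
            rows.append({
                "timestamp": timestamp,
                "station": station["name"],
                "lat": station["lat"],
                "lng": station["lng"],
                "bike_id": bike_id,
            })
    return rows


def load(rows, path):
    """Load: append rows to a CSV file, writing the header only once."""
    path = Path(path)
    write_header = not path.exists()
    with path.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


def run_once(fetch, path, now=time.time):
    """One extract-transform-load cycle; returns the number of rows stored."""
    rows = transform(extract(fetch), int(now()))
    load(rows, path)
    return len(rows)


def run_forever(fetch, path):
    """Repeat the cycle every INTERVAL_SECONDS, as the container would."""
    while True:
        run_once(fetch, path)
        time.sleep(INTERVAL_SECONDS)
```

In a real deployment `fetch` would wrap an HTTP GET, and the CSV path would point to a directory mounted from outside the Docker container.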
-
So, the results for Anna, first of all, to
start slowly. It turns out that there are
-
37 bike stations in Marburg,
with roughly 230 bikes spread across
-
the whole Nextbike Marburg ecosystem.
-
And it's now, well, knowing that
there are roughly 40 stations,
-
it's quite interesting to see
where these stations are,
-
because then Anna could,
-
for instance, already go to another
station if one station is empty.
-
And what you can see here is now a map
of Marburg, where the stations are
-
annotated by these dots.
And the area of the dot,
-
as well as the color code,
corresponds to the average number of
-
parked bikes at that station. So let's see
an interactive version because it's a bit
-
nicer to see it in that way. So I click on
here. Alright. OK, now we can pan around
-
and zoom as you can often do with these
interactive graphics and also by clicking
-
on these buttons or on these points,
you can see the station name, as well as
-
the average number of bikes placed there.
And it becomes quite obvious that, well, most
-
of the stations are in the central part of
the city, a few in the outskirts here. And
-
it turns out that the largest station in
terms of the number of parked bikes on
-
average is the main train station
Hauptbahnhof. There are again a few more
-
spread around the
central part of the city,
-
such as the Elisabeth-Blochmann-Platz,
which is the second largest station.
-
And then if you continue the train
line here, you can see that there's
-
actually another set of stations, where
the secondary train station is.
-
So that's another train station,
smaller train station.
-
OK, so the first results for Anna
would then be a day-hour usage histogram,
-
because it's kind of the first-order
approach, I would say, in order to see how
-
the ecosystem of Nextbikes is in use
by day as well as by hour. And therefore,
-
based on this figure here, Anna
will understand when to maybe plan for a
-
bit more time when looking for a bike in a
desperate fashion. And since this figure
-
is a bit more difficult to understand, I
would like to take a moment to explain it
-
and we are going to start with the top
figure here. What you can see on the x
-
axis is the hour of the day, and on the y
axis, as in the whole
-
figure, each of the numbers that
you see is the following:
-
it's the number of
parked bikes, and then you subtract the
-
average number of parked bikes in
the whole ecosystem of Marburg. So that
-
means if a value of zero is encountered,
like roughly here, the number of parked bikes
-
in the system at that point in time
simply equals the average. When the
-
number is larger, it's above the average,
if it's smaller, it's below the average.
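The quantity just described, the number of parked bikes minus the overall average, grouped by hour, by weekday, or by both, can be sketched as follows. The snapshot values are invented for illustration, and `deviation_from_mean` is a hypothetical helper name, not something from the talk.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def deviation_from_mean(samples, key):
    """Group (datetime, parked_bikes) snapshots by key(datetime) and return
    each group's mean minus the overall mean of parked bikes."""
    samples = list(samples)
    overall = mean(count for _, count in samples)
    groups = defaultdict(list)
    for moment, count in samples:
        groups[key(moment)].append(count)
    return {group: mean(counts) - overall for group, counts in groups.items()}


# Invented snapshots of the total number of parked bikes in the system.
samples = [
    (datetime(2021, 6, 7, 8), 200),   # Monday morning
    (datetime(2021, 6, 7, 12), 180),  # Monday noon
    (datetime(2021, 6, 7, 18), 170),  # Monday evening
    (datetime(2021, 6, 8, 8), 210),   # Tuesday morning
]

by_hour = deviation_from_mean(samples, key=lambda m: m.hour)
by_weekday = deviation_from_mean(samples, key=lambda m: m.weekday())
by_weekday_hour = deviation_from_mean(samples, key=lambda m: (m.weekday(), m.hour))
```

Positive values mean more bikes are parked than on average (low usage), and negative values mean fewer bikes are parked (high usage).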
-
And you can clearly see from this small
figure here already that in the morning,
-
more bikes are typically parked. And then
in the evenings or around noon, you can
-
see two dips, a bimodal distribution so to
say. Where people, well, obviously use
-
bikes around noon and six p.m. roughly
where these used bikes, of course, are not
-
parked, and therefore these numbers are
smaller. And the same thing can be done
-
for the day of the week. Here you
can see that the
-
beginning of the week and the end of the
week, meaning Monday, Tuesday, Saturday and
-
Sunday, are a bit more popular, so more
people ride a bike and therefore fewer
-
bikes are parked and therefore this is
negative. And then in the middle of the
-
week, fewer people seem to ride
the bikes in general. And if you combine
-
these figures now, you can see the
joint histogram here, where you can not
-
only look for time or day separately, but
also in a combined fashion. So you would,
-
for instance, see that Monday morning is
the time where many people use bikes
-
because there are not as many bikes parked.
And then also on a Saturday, you can see
-
the same, so around afternoon many people
seem to use the bikes. Last but not least
-
on Friday mornings, it's quite easy to get
a bike because many bikes appear to be
-
parked, maybe because people envision
already the weekend. So that's the first
-
outcome for Anna: try to avoid times
around six p.m. and around noon when
-
desperately looking for a bike. An
even more interesting part for Anna is the
-
probability to find a specific station to
be empty. For that, I took the time series
-
of the number of parked bikes and counted
the occasions where there was no bike for
-
each of the stations here. And that has
been done again for each station
-
separately, so for each station, at the
end of the day, you get a number that
-
denotes the probability of finding that
station empty. And clearly, for instance,
-
for the Hauptbahnhof, the main train station,
which was the largest station, it's
-
quite unlikely to find it empty.
In contrast, if you go to these
-
stations down here, for instance
Am Plan / Wirtschaftswissenschaften,
-
it turns out that these are empty about
70 percent of the time, which is quite
-
substantial, I would say. And
interestingly, if you now look at the
-
secondary train station in Marburg, the
Südbahnhof, you can see that this has
-
quite a substantial probability of
running empty at about 30 to 40 percent.
-
In particular, in comparison to the main
train station, which is essentially almost
-
never empty. Also interestingly, you can
then plot these probabilities against the
-
average number of parked bikes at the
station and you find an inverse
-
relation between those two. It means that
the larger the station, the more unlikely
-
it is that it's empty, which is quite a
reasonable outcome, I would say.
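The empty-station probability described above is simply the fraction of snapshots in which a station had zero parked bikes. A minimal sketch, with invented station names and counts:

```python
def empty_probability(series):
    """series maps station name -> list of parked-bike counts (one per snapshot).
    Returns the fraction of snapshots in which each station was empty."""
    return {
        station: sum(1 for count in counts if count == 0) / len(counts)
        for station, counts in series.items()
    }


def average_parked(series):
    """Average number of parked bikes per station, used as the 'station size'."""
    return {station: sum(counts) / len(counts) for station, counts in series.items()}


# Invented example time series; not data from the talk.
series = {
    "large station": [5, 6, 4, 5],
    "small station": [0, 1, 0, 0],
}

p_empty = empty_probability(series)
size = average_parked(series)
```

Plotting `p_empty` against `size` per station gives the kind of scatter from which the inverse relation is read off.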
-
So finally, to conclude for Anna,
-
she should try to avoid small stations
-
and in particular, she should try
to avoid the stations that are
-
well, annotated here with
the sad smiley, because these
-
tend to run empty quite often.
-
OK, so I have all this ETL pipeline
stuff already set up,
-
I have collected
over a million data points
-
and then I thought, well, maybe there's
more in the data than only helping Anna.
-
So everything that I've shown you so far,
it's from the perspective of a user.
-
And now I would like to turn to
what's the perspective of a city.
-
And there I would like to
ask a few questions, like…
-
How is Nextbike used in Marburg?
first of all,
-
and then, in general,
Is cycling a good thing for a city?
-
How can, or,
Can cycling contribute to a better city?
-
And now–better is of course first a quite
vague term–and then last, but not least,
-
is it worth improving
bike infrastructure for a city?
-
And all this again, is now from the
perspective of a city instead of a user.
-
The first thing that I would like to start
with is something that I call the distance
-
matrix in which I concentrated on the
positions of the bike stations and
-
computed the pairwise distances for all of
them. And since the distance is, of
-
course, symmetric, the stored matrix
is in the end also symmetric. And
-
it turns out that there are roughly 600
combinations, and these combinations can
-
be shown in a symmetric matrix, as shown
here, where on the x axis this one here
-
and the y axis you can see the stations
and then each combination denotes
-
the distance between that one station and
the other station. It turns out that the
-
range of these distances is between zero
and roughly nine kilometers. And of
-
course, the entries with zero distance
are essentially
-
the stations themselves. So if you pick a
station, obviously the distance to itself
-
is zero and therefore the diagonal is
exactly zero. And then again, the
-
remaining part is a symmetric copy of the
other triangular part. The other thing, and
-
that is now the main treasure, I would say,
of this study, so the main basis for
-
everything that follows is what I call the
transition matrix, where I counted the
-
number of transitions of bikes from one
station to the other station. That is now,
-
of course, not symmetric anymore:
just because, say, five bikes go from one to
-
the other station, it does not mean that
these five bikes really come back again.
-
And therefore, the number of entries
is roughly 1400. Again, it can be shown
-
or visualized in the same fashion.
So you again have the stations on the one
-
axis and the same stations on the other
axis, and now each entry here in the
-
matrix corresponds to the number of
transitions of bikes from one to the
-
other. And the range is from zero to over
3000. And it turns out that actually the
-
self transitions, meaning somebody takes a
bike from a station, does something with a
-
bike, maybe grocery shopping
or so, and then the person comes back to
-
the same station. These events occur
most frequently, and therefore the largest
-
entries are on the diagonal, typically.
Sometimes it is not so interesting what
-
happens regarding the self transitions and
therefore another matrix can be derived
-
from the first one, namely a transition
matrix without diagonal elements where
-
those elements have been set to zero as
you can see here, if you look closely.
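One way such a transition matrix could be built from the scraped snapshots is sketched below. The talk does not describe the exact counting rule, so this sketch assumes that a trip is a gap in which a bike is not parked anywhere, followed by its reappearance at a station (possibly the same one). With 37 stations the full matrix has 37 × 37 = 1369 entries, in line with the "roughly 1400" quoted above.

```python
from collections import Counter


def count_transitions(sightings):
    """sightings maps bike_id -> time-ordered list of station names, with None
    for snapshots in which the bike was not parked at any station.
    Returns a Counter keyed by (origin, destination)."""
    transitions = Counter()
    for stations in sightings.values():
        origin = None      # last station this bike was seen at
        was_away = False   # True once the bike has left that station
        for station in stations:
            if station is None:
                was_away = origin is not None
                continue
            # Count a trip after an absence, or when the station changed
            # between two consecutive snapshots.
            if origin is not None and (was_away or station != origin):
                transitions[(origin, station)] += 1
            origin, was_away = station, False
    return transitions


def without_diagonal(transitions):
    """Drop self transitions (same origin and destination)."""
    return {pair: n for pair, n in transitions.items() if pair[0] != pair[1]}


# Invented sightings for two bikes and two stations "A" and "B".
sightings = {
    1: ["A", "A", None, "B", None, None, "B", "A"],
    2: [None, "A", None, "A"],
}
transitions = count_transitions(sightings)
off_diagonal = without_diagonal(transitions)
```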
-
Speaking of looking closely, it's quite
educational if you not only see the
-
figures, but also can explore them a bit,
and therefore I rendered an interactive
-
version of it. Let's visit it. So
that's now again the matrix without the
-
diagonal and the one with the diagonal. And
now, by hovering over these entries, you
-
can see that, for instance, from Am
Schülerpark to Ockershäuser Allee zero
-
transitions happened. And then a bit
larger one, for instance, Biegenstraße to
-
Hauptbahnhof over 800 transitions happened
in the time of capturing the data. So feel
-
free to explore a bit, maybe identify the
most interesting, most used
-
routes. Ok, such a transition matrix can
actually also be shown as a network graph
-
where here I concentrate only on the
largest entries because it turns out the
-
full transition matrix is a bit too dense.
Each of the blue
-
circles shown here corresponds to a station, and
these edges here are drawn whenever
-
a transition happens. And you can
already see here that there are a few
-
stations that are quite isolated, like
those and then many stations have a self
-
transition and mostly feed to a more
central station.
-
And since that is also more
interesting in an interactive fashion,
-
I also rendered
an interactive version of that.
-
Now again, we can zoom, pan around
and drag the graph around a bit.
-
And interestingly, if you click on a
station, you can see from where
-
transitions happen to that station. So
like those interconnected central ones,
-
like the Hauptbahnhof, the main train
station, it's quite connected in the
-
graph. And then there are a few like
Friedrichplatz which are not connected at
-
all. Interestingly, that one here, for
instance, the Cafe Trauma/Aföllerwiesen it
-
doesn't even have a self connection. So it
turns out that, well, people apparently
-
mostly use it for taking a bike going into
the city.
-
And most dominantly,
the Elisabeth-Blochmann-Platz, actually.
-
OK, so if you now take
these transition matrices,
-
as well as the distance matrices
into account and mix them, first of all,
-
you can get a few interesting numbers. So
here I calculated the overall number of
-
trips, which turned out to be 210,000
trips in the time of capturing the data,
-
which is quite a substantial number for
such a small city like Marburg. And this
-
is, of course, computed by taking the sum
of the transition matrix elements. And
-
then if you weight these
entries with the distances between those
-
stations, it turns out that those
transitions or those trips essentially
-
correspond to a distance of 320,000
kilometers that have been traveled, which
-
is a few times around the Earth actually.
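The two totals just mentioned follow directly from the matrices: the number of trips is the sum of all transition counts, and the traveled distance is the sum of each count multiplied by the distance between the two stations. The sketch below assumes straight-line (haversine) distances, which the talk does not specify; under that assumption self transitions contribute zero kilometers and the result is a lower bound. As a plausibility check, 320,000 km over 210,000 trips is about 1.5 km per trip. Coordinates and counts are invented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lng) points in degrees."""
    lat1, lng1 = map(radians, a)
    lat2, lng2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def distance_matrix(coords):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    return [[haversine_km(p, q) for q in coords] for p in coords]


def total_trips(transitions):
    """Sum of all transition counts."""
    return sum(sum(row) for row in transitions)


def total_km(transitions, distances):
    """Sum of transition counts weighted by station-to-station distance."""
    return sum(
        count * dist
        for t_row, d_row in zip(transitions, distances)
        for count, dist in zip(t_row, d_row)
    )


# Invented coordinates and transition counts for three stations.
coords = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
transitions = [
    [5, 2, 0],
    [1, 0, 3],
    [0, 0, 4],
]
distances = distance_matrix(coords)
trips = total_trips(transitions)
kilometers = total_km(transitions, distances)
```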
-
Now, when these two basic numbers and
-
the matrices that I introduced earlier are
combined with a few statistical details –
-
like, for instance, the average
consumption of fuel of a car or how much
-
CO2 it produces while driving – a few
ecological, economic and social benefits
-
of a bike system or cycling in general can
be derived. First of all, I found it quite
-
entertaining that the overall number of
calories burned corresponds to 8.6 million
-
kilocalories. And to convert that to a bit
more, well, real life number, I would say
-
I calculated how many Nutella jars
those are, and it turns out that
-
it's roughly 4,000 Nutella jars that
have been burned in terms of calories
-
just by this system of cycling. And then
also, it can be found that this distance
-
here, if you had driven it
by car, you would have,
-
well, used almost 26,000 liters of fuel.
You would have produced 40 tons of CO2.
-
And that fuel that you would have bought
would have cost 34,000 €, actually.
-
Interestingly, that number here
of 40 tons of saved CO2
-
corresponds to an average
German who lives for 4 years
-
or 4 Germans that live for one year.
So a typical German produces
-
roughly 10 tons, and therefore
it's four times that, obviously.
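The talk quotes only the totals, not the per-kilometer factors that went into them. As a rough consistency check, the ratios implied by those totals can be back-computed; the values below are derived purely from the quoted (rounded) numbers and are not the speaker's actual inputs.

```python
# Totals as quoted in the talk (all rounded, "almost" / "roughly").
TRIP_KM = 320_000
FUEL_LITERS = 26_000
CO2_TONS = 40
FUEL_COST_EUR = 34_000
KCAL_BURNED = 8_600_000
NUTELLA_JARS = 4_000
CO2_TONS_PER_GERMAN_PER_YEAR = 10

implied = {
    # about 8.1 liters per 100 km
    "liters_per_100km": FUEL_LITERS / TRIP_KM * 100,
    # 125 grams of CO2 per km
    "co2_grams_per_km": CO2_TONS * 1_000_000 / TRIP_KM,
    # about 1.31 euros per liter
    "eur_per_liter": FUEL_COST_EUR / FUEL_LITERS,
    # about 26.9 kcal per km cycled
    "kcal_per_km": KCAL_BURNED / TRIP_KM,
    # 2150 kcal per jar
    "kcal_per_jar": KCAL_BURNED / NUTELLA_JARS,
    # 4 person-years of average German emissions
    "german_person_years": CO2_TONS / CO2_TONS_PER_GERMAN_PER_YEAR,
}

for name, value in implied.items():
    print(f"{name}: {value:.3f}")
```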
-
Ok, so again, from the transition matrix,
-
you can derive a few more interesting
details like, for instance, details that
-
are interesting from the perspective
of traffic management.
-
Like, here I calculated the most popular
routes by finding the maximal elements
-
of the transition matrix. And it turns out
that the most popular route has been used
-
well over 2000 times a year from the
Hauptbahnhof to the Ginseldorfer Weg. And
-
if you look closely, you can see that the
main train station or the Hauptbahnhof,
-
as well as the Elisabeth-Blochmann-Platz
are involved in many of those top routes.
-
And that's now again interesting. For
instance, if a city would like to improve
-
the bike system because we've now seen
it has quite a good impact on social,
-
ecological, and economic aspects.
-
But let's say the city has maybe
limited financial resources.
-
It would be interesting to simply
calculate the most popular routes,
-
and then start fixing
or improving them first.
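Finding the most popular routes amounts to sorting the entries of the transition matrix by count and taking the largest ones. A minimal sketch with invented station names and counts; `top_routes` is a hypothetical helper name.

```python
def top_routes(transitions, stations, k=3, include_self=False):
    """Return the k largest transition-matrix entries as
    (count, origin, destination) tuples, largest first."""
    routes = [
        (count, stations[i], stations[j])
        for i, row in enumerate(transitions)
        for j, count in enumerate(row)
        if include_self or i != j
    ]
    routes.sort(key=lambda route: route[0], reverse=True)
    return routes[:k]


# Invented counts; rows are origins, columns are destinations.
stations = ["A", "B", "C"]
transitions = [
    [9, 5, 1],
    [2, 0, 7],
    [3, 4, 8],
]

best = top_routes(transitions, stations, k=2)
best_with_self = top_routes(transitions, stations, k=1, include_self=True)
```

Excluding self transitions by default matches the idea of prioritizing connections between two different stations when deciding which routes to improve first.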
-
OK, now at that point,
you might ask yourself,
-
Well, what kind of data did he scrape?
-
And for that, I would like to
show you this graph. It shows
-
the number of parked bikes in the whole
ecosystem of Marburg against time.
-
And as you can see,
I did it in two batches.
-
The first one has been obtained from
March to December 2020. So last year.
-
And then I restarted the scraping at the
end of April and finished just a few days
-
ago in December 2021. And you can clearly
see that the number of parked bikes
-
decreases in the summer months, most
likely because the weather is good.
-
And of course, an obvious question
arises, given
-
that I captured this one in 2020 and that one
in 2021, and taking the corona
-
pandemic into account: how do the two
compare?
-
And therefore, I concentrated on the
overlapping months of the two data sets
-
and calculated, well,
the comparison, as you can see here.
-
Now, 2021, this year, is shown in blue,
and 2020 is shown in red.
-
And you can see that the number of
parked bikes increased actually.
-
There might be a multitude
of explanations for that. I don't know.
-
Maybe one explanation could be that people
took more advantage of working from home.
-
OK, so everything that I've shown you so far,
-
it's been mostly statistical statements,
averages, sums and stuff like that,
-
and now I was interested if it's possible
to do also more precise predictions.
-
And therefore I turn
towards a machine learning or
-
artificial intelligence task where I
predicted the num… where I tried to
-
predict the number of parked bikes,
meaning the quantity that I've shown over
-
and over again in the last few
minutes. So is it possible to predict that
-
number based on the hour of the day, the
weekday and the temperature that is shown
-
here for 2020? And when starting such a
task, it's always, first of all, very
-
useful to investigate the training data.
And therefore, well, I tried to plot it.
-
And because it's a three-dimensional phase
space, it's also very simple to plot it.
-
So you can essentially plot it as a
scatterplot. And the color coding here has
-
been chosen to denote the target variable,
meaning the number of parked bikes.
-
And just by inspecting the data, you can
already see that the smaller the
-
temperatures are, the fewer… sorry, the
more bikes are parked and therefore the
-
fewer bikes are used. I use a random
forest machine learning model, which
-
is an ensemble model of
randomized decision
-
trees. And this model is quite powerful
because it can work with little data. It
-
can work with a lot of data, and it's also
very flexible. If you would ever like to
-
extend the phase space, like maybe it would
be interesting to see if one could predict
-
the number of parked bikes given a bank
holiday or given a weekend. And all these
-
aspects could be added to the random
forest relatively easily. And that's now
-
the outcome: I show the measured data,
that is, data that hasn't been
-
seen by the model before,
here, and then the densely covered
-
phase-space prediction of the machine
learning model here. And you can see that
-
the color trends correspond quite
well to each other. You can, for
-
instance, see the
larger numbers of parked bikes in the regime of
-
low temperatures. Also from a
quantitative perspective, the prediction
-
is quite decent, as the square root of the
mean squared error corresponds to
-
roughly a tenth of the average number of
parked bikes.
-
Which, again in this context is quite a
decent prediction performance,
-
given how naive the
approach was in general.
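The modelling step described above (predicting the number of parked bikes from hour, weekday and temperature with a random forest, then judging it by the root mean squared error relative to the average) can be sketched with scikit-learn, which the speaker names in the Q&A. The data here is synthetic and the hyperparameters are arbitrary assumptions; neither reflects the talk's actual data or settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 2000

# Synthetic features: hour of day, weekday, temperature in degrees Celsius.
hour = rng.integers(0, 24, n_samples)
weekday = rng.integers(0, 7, n_samples)
temperature = rng.uniform(-5, 30, n_samples)

# Synthetic target: fewer parked bikes when it is warm, with dips around
# noon and 6 p.m., plus noise. Purely illustrative.
parked = (
    200
    - 1.5 * temperature
    - 10 * np.exp(-((hour - 12) ** 2) / 8)
    - 10 * np.exp(-((hour - 18) ** 2) / 8)
    + rng.normal(0, 3, n_samples)
)

X = np.column_stack([hour, weekday, temperature])
X_train, X_test, y_train, y_test = train_test_split(
    X, parked, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Evaluate on data the model has not seen during training.
predictions = model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
relative_rmse = rmse / float(np.mean(y_test))
print(f"RMSE: {rmse:.2f} bikes, relative to mean: {relative_rmse:.3f}")
```

Extra features such as a bank-holiday flag would simply become additional columns of `X`.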
-
OK, I did a bit more on machine learning,
but I'm not showing that here.
-
I calculated the Markov steady state
for the same data essentially.
-
And if you're interested in that, well,
feel free to check out this link here.
-
OK, last but not least, I would,
of course, like to come to
-
the summary for Anna, me,
and maybe other students.
-
So first of all, what I did was to scrape
Nextbike data in Marburg in order to find,
-
which stations to potentially avoid when
you're in desperate need for a Nextbike.
-
And for that, I calculated
the probabilities of empty stations
-
and found that the larger the station,
the less likely it is to run out of bikes.
-
So the general recommendation
from my side would be:
-
try to find larger stations if you're
in desperate need of a Nextbike.
-
And feel free to go back to
the interactive map to see
-
the locations of these stations, which is
quite interesting in itself, I would say.
-
And then I turned towards
the perspective of a city, and
-
investigated the usage patterns
of Nextbikes, which are
-
most likely also representative of cycling in Marburg. There,
I calculated the day-hour usage,
-
so when the system is quite busy,
and also the most popular routes,
-
which might be of use for city planning,
and also the social, economic, and
-
ecological benefits of the whole system.
-
Last but not least, I showed that
-
more precise predictions are possible when
maybe a statistical statement is not
-
enough and you would like
to do per case predictions.
-
Last but not least, I was fortunate
enough to work with AStA Marburg.
-
In particular, Lucas and David,
thank you very much for your trust
-
in that project where we try to optimize
the placement of the bikes in the future.
-
The take home messages are now,
first of all:
-
Bikes are amazing! And not only are they
amazing for you and the environment,
-
but also for your wallet.
So you save essentially money on gas.
-
And also, I would like to,
-
well, highlight that those data-driven
optimizations of public transport
-
have the potential to, well,
increase the quality of life of
-
many of us at moderate cost. So again, I
would like to come back to a case where
-
maybe a city would like to
improve bike infrastructure
-
that doesn't have enough
money to do it in one go.
-
So then it might be interesting
to first find–in a data-driven way–which
-
combinations of, now in Nextbike terms,
maybe stations or in general streets
-
are popular, and then these might be worth
being fixed first with a limited budget.
-
OK, if you're interested in more, I was
very fortunate to be able to speak at the
-
last rC3 already about data in Marburg,
but last year I spoke about parking
-
in Marburg. If you would like to, well, read the
blog articles corresponding to that
-
or just see the official CCC video,
just follow these links shown here.
-
Thank you very much for your attention.
-
If you would like to get in contact
with me, reach out via my e-mail address,
-
maybe with some ideas on how to improve
the talk or what else to evaluate.
-
And then all the supplementary
materials that I mentioned,
-
and what I've shown here,
can be found again on this link here.
-
In particular, thank you very much
to all the people who reached out to me
-
based on last year's talk. I haven't
gotten around to responding properly yet, but
-
I'm 100 percent certain that I will do so.
-
Thank you very much for your attention,
and have a good year.
-
Herald: Alright, welcome back. It's time
for the Q&A now. You probably know the
-
drill, but I repeat it anyway. If you're
on Twitter, on Mastodon or on the
-
Fediverse in general, the hashtag is
#rc3cwtv to ask any questions. And if
-
you're in the hackint IRC, the channel
name is the same except there's a dash in
-
between the rc3 and the cwtv. And we
apparently already have some questions, so
-
I'll just get started now.
-
First question:
Is the Nextbike API free to use?
-
Does Nextbike even know
that you did this scraping?
-
Martin: Yes, so as far as I know, the
Nextbike API has been reverse engineered
-
from the iOS app, and there's a GitHub repo
by ubahnverleih, who documents lots of
-
APIs of public transport companies like
Nextbike or some companies that also
-
produce the scooters. And
since it's the official iOS API,
-
it's more or less public, so to say;
it's free and pretty much without quota limits
-
because normally all the iPhones
access it. But again, I can only recommend
-
the ubahnverleih repository
for that on GitHub.
-
Herald: And you don't need
any credentials to access it?
-
Martin: No. Actually, you can, as far as
I checked, you can pretty much access the
-
whole world. So you can access stations
in Poland and, well, all of Germany.
-
Herald: That's cool. It's probably
accidental, but it's quite cool anyway.
-
Martin: laughs Yeah.
-
Herald: Ok. What software did you use for
the machine learning stuff?
-
Martin: The machine learning stuff
has been done with Python,
-
and then specifically with sklearn,
which is a quite popular machine learning
-
framework for Python.
-
Herald: The workhorse of the machine
learning community, I would say.
-
Martin: Yes, exactly yeah.
-
Herald: Do you know if Nextbike adds
or removes bikes from the stations?
-
Or do they relocate the bikes?
Or do… I mean, do they do that?
-
Or does it just happen
as an emergent behavior?
-
Martin: I would say that…
So, I had the chance to speak
-
with a person from Nextbike while
I was working for AStA Marburg,
-
and he said that, first of all, it's
not very technical yet, well, not very
-
digitalized yet, and they essentially
drive around. So I'm pretty sure that they
-
certainly collect bikes that need
maintenance, but then,
-
logically, probably also
relocate them where necessary.
-
Herald: All right. OK, someone wants to
know if the scripts that you use would be
-
public? I assume the main part with the
API is already answered if you gave the
-
Github repo. But are you planning to open
source anything else?
-
Martin: Potentially. I have no plans on
doing so, just because it's additional
-
work, to be honest. Well, I
can just offer the
-
same thing as last year: just write me an
email, and if there are enough people who are
-
interested, I will probably strip down my
internal repository. But since in the
-
internal repository there are a few
private notes, that one is not published
-
for sure right now.
-
Herald: All right. Anything else?
-
Dear listeners,
you have maybe 30 seconds to comply.
-
So there's one question, about
the time period of data that you have,
-
but I think you answered it in the talk.
Right?
-
Martin: Yes, it's more or less the whole of 2020
and 1/2 to 2/3 of 2021 that I collected.
-
Herald: OK, so you probably mostly have,
like, a pandemic situation?
-
Martin: Yes, exclusively.
Pretty much, yeah
-
Herald: I wonder if that's more or less
usage than usual. I mean, it's less people
-
having to go places, but more people
wanting to not use public transport.
-
Martin: Yes, so based on my data,
I can see that
-
the number of parked bikes is going up.
-
Therefore, the usage is going down, and
-
that was also confirmed internally by some
Nextbike people. Now, one more thing, so
-
regarding the people who are interested in
the code, regardless of if I am going to
-
publish it or not, if you have
questions, just drop me an email. I mean,
-
writing the scraper in particular
is absolutely trivial. And if it's
-
not trivial for you, then the code
wouldn't be of value to you anyway.
-
Herald: All right. How does your data
interpret broken / unavailable bikes at
-
the station? I mean, can you see that?
Or do you take it into account?
-
Martin: Yes, so I don't see that directly.
-
I mean, I have a list of all the bikes
and if I would dig a little bit deeper,
-
I could probably, you know, compile a list
showing where a
-
particular bike is standing at the moment.
And if that bike would be, for instance,
-
absent for a longer time, I could
conclude that it's maybe broken,
-
being maintained or something like
that. But there's no direct data on that.
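
The idea described here, inferring possibly broken or serviced bikes from long absences, could be sketched roughly as follows. This is a minimal illustration, not the speaker's code: the snapshot format (a timestamp plus the set of bike IDs visible in that scrape), the function name `long_absent_bikes` and the threshold are all assumptions.

```python
from datetime import datetime, timedelta

def long_absent_bikes(snapshots, min_absence):
    """Return {bike_id: last_seen} for bikes unseen for at least min_absence.

    snapshots: iterable of (timestamp, bike_ids) pairs, one per scrape
    (hypothetical format). Absence is measured against the latest scrape.
    """
    last_seen = {}
    latest = None
    for ts, bike_ids in snapshots:
        if latest is None or ts > latest:
            latest = ts
        for bike_id in bike_ids:
            # Remember the most recent scrape in which each bike appeared.
            if bike_id not in last_seen or ts > last_seen[bike_id]:
                last_seen[bike_id] = ts
    if latest is None:
        return {}
    return {b: ts for b, ts in last_seen.items() if latest - ts >= min_absence}

# Toy data: bike "C" vanishes after the first scrape, "B" after the second.
snapshots = [
    (datetime(2021, 1, 1), {"A", "B", "C"}),
    (datetime(2021, 1, 8), {"A", "B"}),
    (datetime(2021, 1, 15), {"A"}),
]
absent = long_absent_bikes(snapshots, timedelta(days=7))
```

A long absence only hints at a problem: a bike that is simply rented, or parked outside a station, would look the same in this data.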
-
Herald: All right. Do you think
that Nextbike moving the bikes has somehow
-
biased your data?
Like if they basically relocate them?
-
Martin: That's a good question. I have
absolutely no idea. So I mean, what
-
I did calculate was this: I defined a
measure of activity.
-
I defined it as the number of bikes coming
in, divided by the sum of the bikes going
-
out and the bikes coming in. So
that is, so to say, the activity.
-
That number is obviously between zero
and one, and if it's far from zero point
-
five, that would mean that the station
essentially runs empty or overfills at
-
some point. And there are a few stations
where it's a bit above zero point five.
-
But of course, the
data that I used already has all the
-
moved bikes incorporated. So it's
not really something that could be used
-
for really trying to find that out.
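
The activity measure defined in this answer, bikes coming in divided by bikes coming in plus bikes going out, can be written down in a few lines. The sketch below is only an illustration: the station names, the counts and the function name `station_activity` are made up, and returning `None` for a station with no movements is an assumption, since the talk does not cover that case.

```python
def station_activity(bikes_in, bikes_out):
    """Activity = in / (in + out), a value between 0 and 1.

    Around 0.5 the station is balanced; well above 0.5 it tends to
    overfill, well below 0.5 it tends to run empty.
    Returns None when there were no movements at all (assumption).
    """
    total = bikes_in + bikes_out
    if total == 0:
        return None
    return bikes_in / total

# Hypothetical (bikes_in, bikes_out) counts per station.
counts = {
    "Station A": (50, 50),
    "Station B": (80, 20),
    "Station C": (0, 0),
}
activity = {name: station_activity(i, o) for name, (i, o) in counts.items()}
```

As noted in the answer, bikes relocated by the operator are already part of these counts, so the ratio alone cannot separate rider behaviour from rebalancing.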
-
Herald: I mean, is this kind
of data also available for,
-
for bike sharing services
that don't have docking?
-
If they even exist still in Germany?
I kind of lost track.
-
I think maybe they
all went bankrupt, but of course…
-
Martin: What do you mean by docking?
-
Herald: By, you know, they don't have
fixed stations, but they are floating.
-
Martin: So I mean, all that I did was to
look at the stations, but actually there
-
are a few free-standing bikes also in
Marburg, and the people who leave them are typically
-
penalized financially, so they
have to pay a fee. I didn't analyze
-
it at all. Would be interesting for sure.
And as far as I know, there are cities
-
where it's completely, well, there are
no stations for Nextbike,
-
where people can drop it off
wherever they like.
-
Don't quote me on that, it's
just something that I've heard.
-
Most likely in the large cities.
So maybe in Berlin could be.
-
Herald: Yeah, I think here there are like
some locations where you have to drop the
-
bikes, but that's,
I'm not sure if that's Nextbike.
-
I can never remember which ones
laughs I actually end up using.
-
All right, everybody. Now is your last
chance to ask more questions.
-
I feel like I'm at Teleshopping, like the rC3
Teleshopping, which I highly recommend if
-
you haven't checked it out. Probably
the peak experience at the remote Congress
-
is the Teleshopping channel.
And you should all have a look.
-
And maybe buy some…
some extremely useful items that they sell
-
Herald: OK, so the chat confirms that
Nextbike does have cities without stations
-
Martin: Ah yes, yes, very good.
-
Yes, I mean, I can only…
-
if you're remotely interested in all
these public transport data studies,
-
definitely check out the
ubahnverleih GitHub repository.
-
There's a large number
of systems documented there.
-
Herald: OK, and that's just ubahnverleih,
just as you would write it.
-
Martin: Yes, let me look it up
very quickly, Ubahn…
-
Well, the person is from Ulm,
and he also contributed to the
-
CCC infrastructure. His name is
Constantine and yes, it's ubahnverleih.
-
And I think it's like, I think the repo
name is WoBike, as far as I know.
-
Herald: All right. Good. Thank you.
-
Alright. I think we've managed to exhaust
the internet. So, people, where can they
-
find you if they have any further
questions? Are you going to be wandering
-
the remote, the world or whatever it's called?
You know the…
-
Martin: Well, that's a good idea. I
haven't planned to, but I can. So I've no
-
idea how it works, but I'm sure I can
figure it out. So I mean, in general, drop
-
me an email and you can find my email on
lellep dot xyz. It's my website.
-
Other than that, I could be online
in the 2D world adventure now,
-
if that's of value to anybody.
-
Herald: People can maybe hunt you
down if they really need to.
-
Martin: Definitely, yes.
-
Herald: OK, wonderful. Well, thank you for
your talk and for answering the questions.
-
And thanks everyone for tuning in.
Have a good remainder of Congress.
-
I think you should be able to at some
point rate talks in the Fahrplan,
-
if that feature still exists, so if you
want to see more of this kind of stuff,
-
maybe leave some feedback.
-
Bye bye.
-
Martin: Bye.
-
rC3 postroll music
-
Subtitles created by c3subtitles.de
in the year 2022. Join, and help us!