hacc preroll music
Herald: And a lovely welcome back to the
haccs stage on the third day this
Congress, we are here with a talk on "A
few quantitative thoughts on parking in
Marburg" by Martin L. He's interested in
data analytics and in infrastructure and
traffic in general. And because of that,
he started scraping publicly available
parking data in Marburg and just went on
and analyzed it and found a lot of
interesting things which he is going to
present in this talk to you right now. In
case you didn't know, there is an IRC
client on live.hacc.media where you can ask
questions later or with the #rC3hacc tag
on Twitter.
Martin Lellep: Welcome to my talk "A few
quantitative thoughts on parking in
Marburg". I am delighted to speak here on
this Congress because I love the yearly
conferences. Also, thank you to the
organizing team for making all this
possible. You do an absolutely fabulous
job. Now, the first question that you
should ask is: why? The following is a
pure hobby project. I came up with the
question because transportation is
important, but unfortunately, it's also
difficult. The most popular vehicles these
days are cars and hence the question, how
do people park in Marburg? Who am I? My
name is Martin, and I analyze publicly
available data. I live close to Marburg,
therefore the parking in Marburg. Now, a
little bit of background regarding
Marburg, it's a small, picturesque, vibrant
university town. There are a few highlights,
such as the castle, the old town and the
river, just to name a few. It has around
80,000 residents and a somewhat dense core
around the old town. You can see a few
pictures here of the castle, the old town
and the river, respectively. Now, at this
point, I would like to give my props to
David Kriesel because all this work was
inspired by his amazing data science
talks. You can find them on YouTube. And I
absolutely encourage you to look for the
Bahnmining, Spiegelmining and the Xerox
story talks. OK, so if you have questions,
then please ask, I will be there live
during the Q&A of this conference and also
you can send me an email with whatever you
like, essentially. OK, so first of all, I
would like to give a quick introduction to
the data source. Now, the data, the
parking data from Marburg is publicly,
well it's published live on a system that
is implemented by the city, by the city
council, I believe. It's called
Parkleitsystem Marburg, or PLS for short, and
it publishes the data such as the parking
decks, the number of free parking spots
and the location. The address here is
pls.marburg.de. And let's see how it
looks. Yeah, so obviously it's still
online and you can see here the parking
deck names listed, the number of free
parking spots. Color coded is if it is
rather full or if it's rather empty, you
can see here all of them are in the green.
The green color coding here, it's
because it's probably close to Christmas.
Nobody wants to really park in the city.
And the only one with some load is this one
here, the Marktdreieck Parkdeck. Then also
there's a button
called route. So whenever you click
on this button, say we pick the
Erlenring-Center button, we are redirected
to Google Maps and we can see here the
location of this parking deck, for
example. Let's go back. Last but not
least, there's also the maximum vehicle
allowance and of course, the time stamp of
the data. OK, back to the presentation
now. This is a very simple website, so of
course it's easy to scrape and that's what
I did. Regarding the scraper, I used a
Linux computer and a Docker container. This
scraper, of which you can see a small sketch
here to the left, simply visits the
website every 3 minutes inside the Docker
container and writes the data into, I
believe it was, CSV files, which are
subsequently used for the data analysis.
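A minimal sketch of such a scraping loop, using only the Python standard library, could look as follows. The talk does not show the HTML structure of pls.marburg.de, so the regular expression and the class names in it are pure assumptions; a real scraper would adapt the pattern (or use an HTML parser) to the actual page markup.

```python
import csv
import re
import time
import urllib.request
from datetime import datetime

URL = "https://pls.marburg.de"

# Hypothetical pattern: a deck-name cell followed by a free-spots cell. The
# actual markup of the PLS page will differ; adjust the pattern accordingly.
ROW_RE = re.compile(r'class="name">([^<]+)<.*?class="free">(\d+)<', re.S)

def parse_pls(html):
    """Return (deck_name, free_spots) pairs extracted from the page HTML."""
    return [(name.strip(), int(free)) for name, free in ROW_RE.findall(html)]

def scrape_once(path="parking.csv"):
    """Fetch the page once and append one CSV row per parking deck."""
    html = urllib.request.urlopen(URL, timeout=30).read().decode("utf-8")
    stamp = datetime.now().isoformat()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        for deck, free in parse_pls(html):
            writer.writerow([stamp, deck, free])

def run(interval_s=180):
    """Visit the website every 3 minutes, as described in the talk."""
    while True:
        scrape_once()
        time.sleep(interval_s)
```

Running `run()` on a Linux box (or inside a Docker container, as in the talk) would keep appending timestamped rows to the CSV file.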
All of it, the scraper and the analysis
scripts, is written in Python. OK, the
data format is pretty simple: it's
processed internally with data frames,
using the package pandas. Everybody who
knows Python probably knows pandas anyway.
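As a small illustration of this layout and of the free-to-used conversion described here, with invented numbers standing in for the scraped values:

```python
import pandas as pd

# Toy version of the scraped table: rows are timestamps, columns are parking
# decks, and cells are the number of *free* spots (values invented).
free = pd.DataFrame(
    {"Erlenring-Center": [400, 250, 380], "Oberstadt": [90, 40, 85]},
    index=pd.to_datetime(
        ["2019-08-01 08:00", "2019-08-01 12:00", "2019-08-01 20:00"]
    ),
)

# Used spots = per-deck maximum of free spots over time, minus free spots.
used = free.max() - free
```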
The data format is as follows: a row
corresponds to a time, a column
corresponds to a specific parking deck,
and a cell corresponds to the number of
free parking spots of that parking deck at
that time. Now, in order to make the
numbers a bit more usable, I transformed
the number of free parking spots into the
number of used parking spots by
subtracting it from the maximum along
time. OK, the intro was just to get used
to the data; now we'd like to take a look
at the locations of the parking garages,
or parking decks. This is a screenshot.
There's an interactive version. Let me
open it here. It's an interactive map. You
can see two types of markers, the first
one red, the second one green, and that's
because the red ones are the ones that are
given, well, encoded in the links of the
PLS system, and they are actually wrong.
So when you click on, for instance, the
Erlenring-Center parking deck, as I've
done before, the longitude and latitude
are actually incorrect, and Google Maps
corrects them on the fly. Therefore, I
have shown the ones given on the website,
which are incorrect, in red, and the
corrected ones in green. So you can safely
focus only on the green ones. Um, a quick
overview: here is the train station region,
overview here is the train station region,
there are two. And then they are scattered
around the city. Um, sometimes there are
two parking decks very close by, for
instance, these two and these two. And
that's because it's essentially one
parking deck with two parking sections
typically inside the building and on top
of the building. OK, let's go back to the
presentation. With that in place, we
take a look at the joined data, meaning I
accumulate the number of used parking
spots across all the parking decks. You
can see that here now; it's quite a
comprehensive picture. I started data
scraping in August 2019 and stopped it at
the end of February 2020.
The data here is shown at different
resample frequencies of the original, raw
data (just a reminder: the true sampling
interval is three minutes). I started with
a resample frequency of one hour, which is
not very easy to understand at that scale
here; then one day, the orange curve now;
and lastly one week. We can learn
different things
from it. So in particular, the orange
curve of one day shows that there might be
some periodicity in the signal. And the
green one shows that there are times or
weeks that are particularly... where
there's particularly little parking
demand, for instance, here around
Christmas 2019. OK, so again, from the
orange signal, you can see that there's
probably some periodicity, and in order to
quantify that, I computed the
autocorrelation function. The
autocorrelation function essentially takes a
time signal and computes the overlap
between the time signal and the same
signal shifted by some time; whenever
there's a large overlap, that points
towards periodicity. Here we can see that
the first maximum of the autocorrelation
corresponds to one week, and therefore the
periodicity can be safely assumed to be at
seven days. Of course, when there's
periodicity in a signal at seven days,
there's also periodicity at 14 days and at
21 days, but the correlation coefficients
typically decay. OK, now we have the
periodicity
with respect to days in place. Now let's
take a look at the day and hour demand.
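The resampling and the autocorrelation check just described might be sketched in pandas like this. A synthetic series with a built-in seven-day cycle stands in for the scraped used-spots signal, so the numbers are illustrative only:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accumulated used-spots series: 3-minute samples
# with an artificial 7-day cycle plus noise (the real series is scraped data).
idx = pd.date_range("2019-08-01", "2019-10-01", freq="3min")
t = np.arange(len(idx))
samples_per_week = 7 * 24 * 60 / 3
rng = np.random.default_rng(0)
total = pd.Series(
    100 + 50 * np.sin(2 * np.pi * t / samples_per_week)
    + rng.normal(0, 5, len(t)),
    index=idx,
)

# Resample the 3-minute signal to coarser frequencies, as on the slide.
hourly = total.resample("1h").mean()
daily = total.resample("1D").mean()

# Autocorrelation of the daily signal at lags of 1..30 days: the peaks sit at
# multiples of 7 days, with the coefficients decaying for larger multiples.
acf = [daily.autocorr(lag) for lag in range(1, 31)]
peak_lag = max(range(1, 31), key=lambda lag: acf[lag - 1])
```

For this toy signal, `peak_lag` lands on a multiple of seven days, mirroring the weekly periodicity found in the real data.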
And for that, I computed a two dimensional
histogram with the day Monday to Sunday on
the one axis and the other axis
corresponds to the hour. And here we can
clearly see that the majority of the
parking demand is around the noon hour. So
starting from 11 to approximately, let's
say, 5 p.m. or so. Interestingly, and that
was a point where I was surprised, Sunday
is a day where there's little parking
demand in Marburg. I would've guesstimated
that on Sunday, when everybody has spare
time, they typically rush into the city.
But that's obviously not the case.
not the case. Another interesting fact is
that Monday morning seemed to be very
difficult to get up because you can see
the parking demand is smaller than on
other mornings. OK, now, after that, I
come to the separate analysis, where I
take a look at the individual parking
decks. So first of all, again, the time
series: it's a bit
dense and it's very hard to see. So there
are a few things to learn from the
picture. So first of all, the green
signal, which corresponds to the
Erlenring-Center (reminder: I opened it at
the very beginning of this talk), seems to
be the dominant one. Then there are quite
a few data gaps. Take, for instance, well,
it's very apparent here for the violet
one, the Furthstraße-Parkdeck, this one
here. And that's an extreme case. It had
obviously some kind of problem. It was
open for some time and then closed for
some other times. Typically, parking
garages or parking decks are open 24/7,
but there are also quite a few that
close overnight. OK, next I was interested
in the statistics of parking demand for
individual parking decks, so I
concentrated only on, say, one parking
deck and computed the histograms of the
used parking spots also, depending on the
time. Let's focus here on the Oberstadt,
it's the old town and you can see that the
overall parking demand peaks at around,
let's say, maybe 20 used parking spots, so
that's the average, but that statement
does not hold for all times. Depending on
the time of day, for instance, in the
morning we can see it's approximately
the same. But when we go towards noon, we
can see that the number of parking spots
or used parking spots increases. There are
even a few times when it's at the maximum
around noon. Now, when we go towards later
hours, the maximum shifts towards smaller
values again. Now, this behavior of the
maximum shifting so clearly depending on
the hour is not apparent for all the
parking decks. For instance, the
Parkdreieck here... Marktdreieck, sorry,
doesn't show the signal as clearly as the
Oberstadt one. OK, from this
all, now we can also quantify what I call
the integral parking demand; simply put,
it's the number of parking spots that have
been provided per parking deck. Now the
picture here, it's normalized to the
maximum and one can see from this picture
here very easily that the
Erlenring-Center, as we've estimated or
guessed previously already, is the one that's
dominating the whole city. It's providing
the most parking spots by a large margin,
actually. The next one is the Lahn-Center
and then maybe the Oberstadt and the other
ones follow after these. Another
interesting point here is that the
proportion of parking spots provided on
weekends differs for the different parking
decks. For instance, you can see this one
here, the Erlenring-Center, has quite a
big portion also on weekends.
In contrast, the Marktdreieck-Parkdeck has
only a very small portion of, um, of
parking spots provided on weekends. It
might be interesting to know that this
particular parking station is the one that
is used if you want to go to a doctor,
because it's very close. Most doctors are
not open on Saturdays and Sundays, and
therefore the parking demand is probably
quite low. Now, there's
a temporal version also where I rendered a
small video that I'm opening now, and you
can see essentially the same as in the
previous graph, but against time. Again,
it's very apparent that there's a
periodicity and here my scraper crashed
and it's back in business again, and I
found it interesting to see that there are
parking decks that have cars... well that
host cars, even at night, for instance,
here the Erlenring-Center again and the
Lahn-Center, the largest ones; they offer
parking also overnight.
And there are some cars in there,
probably. OK, let's close that again. Now,
I come lastly to the prediction part now.
The goal here is to measure the parking
demand through the parking decks, but then
to interpolate between the parking decks,
so say I have the Oberstadt, the old town,
and the Erlenring, which was the largest
one; I would like to know what the parking
demand is in between, for instance.
For doing so, I use a spatial fit, and I
use a machine learning model for that: in
this particular case, a non-parametric
model called Gaussian Process Regression.
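A minimal sketch of such a spatial fit, using scikit-learn's `GaussianProcessRegressor` as one possible implementation. The coordinates and demand values below are made up for illustration; the real fit uses the scraped deck locations and their demands:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Made-up (longitude, latitude) positions of a few "parking decks" and their
# mean demand; the real fit uses the scraped deck locations and demands.
X_train = np.array(
    [[8.770, 50.800], [8.760, 50.810], [8.780, 50.810], [8.770, 50.820]]
)
y_train = np.array([500.0, 120.0, 80.0, 60.0])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.01), normalize_y=True)
gp.fit(X_train, y_train)

# Predict the demand, and its uncertainty, at a position between the decks.
X_between = np.array([[8.765, 50.805]])
mean, std = gp.predict(X_between, return_std=True)

# The uncertainty shrinks near the training points and grows in between and
# far away, which is exactly the property discussed in the talk.
std_near = gp.predict(X_train[:1], return_std=True)[1]
std_far = gp.predict(np.array([[8.900, 50.900]]), return_std=True)[1]
```

`return_std=True` is what exposes the model's uncertainty alongside its prediction, which is the reason to pick a Gaussian process here in the first place.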
And the nice thing about that is that it
also returns the uncertainty. Because say,
for instance, you would like to use these
machine learning predictions to, say,
build some kind of parking deck or to
get rid of one. All these operations, all
these derived actions would be very
expensive. So you would like to know if
the uncertainty is large or small for
whatever the machine learning model
predicts. Just for the math oriented
people. If you're interested in that
model, definitely take a look at the, I
would call it, Gaussian process bible by
Rasmussen. It's amazing to read. Yeah,
there are two, um, evaluations that I did.
The first one is based on the whole data
set, so there's no spatial or... sorry...
no temporal resolution. And what I did,
well, I rendered a video and I would like
to explain the outcome of it while it is
running. The top picture
here shows you the prediction by the
machine learning model. And the bottom
picture shows you the uncertainty. The
training data, meaning the parking decks,
is denoted by the black points. Now, first
of all, the uncertainty, you can see that
wherever there is training data, the
uncertainty goes down. So the model is,
um, certain about its prediction because,
well, there's training data there, and in
between, the uncertainty rises again.
Now the prediction, you can see some small
hill. It's exactly the Erlenring-Center,
which was the largest one. Now, what is
shown in the video is rotating. You can
see the coordinates of Marburg on the
bottom plane. And at
some point, the view rotates upwards and
gives you a top-down perspective with a
corresponding color bar, or corresponding
color map. So, again, here's the
maximum, the Erlenring-Center. And I did
that because next we would like to finally
measure the parking demand between
stations. OK, there's another small video
again, and now we start right from the top
down, color-coded view. Again, the black
points are the training data, but now the
red points are a kind of test data,
meaning positions in between. I
concentrated now on the Mensa
because I have a special relation with the
Mensa, the physics department, the
university library, the train station and
the cinema. And just to demonstrate from
this spatial fit, we can derive the
parking demand at these positions also.
Here, this yellow spike, it's the
Erlenring-Center again. Now, that's only a
qualitative result, of course; I don't
want to derive anything quantitative at
this point. It's just a proof of concept that
it is possible to derive something like
that from the publicly available data.
Now, I forgot to mention at the beginning that
there's a bonus and I would like to come
to the bonus now. It is about the Corona
crisis or pandemic, of course. What I did
is: the initial data acquisition phase is
here in black; the whole talk was about
that black portion here. I stopped it
around the end of February and restarted
the whole data acquisition process again
in approximately April, just to capture
something from the
Corona crisis as well. And you can see
here again, the time series. I think the
most interesting bit about it and the most
comprehensive bit is the mean. You can see
the mean across the whole time
denoted by this dashed line. And you can
see that the mean is smaller. So during
the Corona pandemic fewer people parked in
Marburg, which is reasonable, I would say.
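This comparison of the means could be sketched as follows, with invented daily totals standing in for the real series:

```python
import pandas as pd

# Invented daily totals of used parking spots, pre-Corona vs. during Corona;
# the real comparison uses the scraped series from both acquisition phases.
pre = pd.Series(
    [900, 950, 870, 920],
    index=pd.to_datetime(
        ["2019-12-01", "2019-12-02", "2019-12-03", "2019-12-04"]
    ),
)
during = pd.Series(
    [500, 450, 480, 520],
    index=pd.to_datetime(
        ["2020-04-01", "2020-04-02", "2020-04-03", "2020-04-04"]
    ),
)

# Relative drop of the mean demand during the pandemic.
drop = 1 - during.mean() / pre.mean()
```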
But there are also times where the number
of used parking spots decreased significantly.
So for instance, right when the Corona
crisis started in April and now the second
wave in October, November, December, it is
visible that the parking demand decreased
a lot. And I went one step further and
wanted to know the differences between
pre-Corona and during Corona for each of
the parking decks as well; that's what I did
here. It's now not the normalized parking
demand, but the absolute parking demand.
So now we can see also the absolute
numbers; the black bars you've seen
previously already. Now the red bars are
for during the Corona crisis. And then
I defined these, the first wave and the
second wave as serious corona times. So I
also plotted a third bar... set of bars
here. And it's interesting to see that
most of the parking decks, of course,
suffered in terms of parking demand; most
of them provided fewer parking spots. But
there are a few, like, for instance, the
Marktdreieck-Parkdeck here that, well,
almost increased. We can see that during
Corona in general it increased a bit. And
then during the heavy corona, it increased
even more. And as I mentioned before, this
is the parking deck that corresponds to,
yeah, a whole collection of doctors. So I
derive that, well, during Corona times, the
parking demand in front of doctors even
increased a tiny bit. Yeah, with that, I
would like to come to my conclusions.
Thank you for sticking with me until now.
So I scraped publicly available data here
with a small scraper set up. I analyzed
it, for instance, for day and hour
patterns. And last but not least, did some
machine learning in order to quantify the
demand in between the stations. There is
also an accompanying blog article. You can
find it down here; there are all the
figures in higher resolution, and you can
play around with an interactive map as well, if
you like. Um, and to finally conclude the
presentation: I would like to hear from
you what you think about this analysis.
I'd like to improve with these kinds of
mini studies, and therefore I would be
very interested in your critique regarding
the content, the presentation, and general
comments. Again,
you can email me at this email address
here, or alternatively, I set up a Google,
um, Google Form. The Google Forms document
comprises exactly these questions, and you
can simply type your answers in
if you're interested. Thank you very much.
Herald: All right, first of all, thank you
for this amazing talk. I have a few
questions that have been relayed to me, and
I'm just going to ask them one after the
other. And let's not waste any time and
start with the first one. Have you found
parking decks that are usually heavily
overloaded or never completely used?
Martin: Um, so. Given that there are only
around, what was it, 8 or 9 or 10 in the
data set, honestly, I never looked into
that question. So, um, the short answer
is: no. The long answer: yes, I could
have, or I still could, I would say.
H: OK. Have you tried prediction in time,
so guessing which parking decks will be
exhausted soon?
M: No, no. So that's obviously... I would
consider that something like the
predictive maintenance of the traffic
business, kind of. It's definitely a thing
that people who have more time and are
willing to invest more definitely should
and could do, I would say. I mean, there's
lots of additional data that might be of
interest,
like weather data. And, for instance, is
it a public holiday, yes or no, and all
that kind of stuff. So, again, short
answer: no. Long answer: yes, it would be
possible.
H: OK, so if anyone watching has the time
or energy to do that, they could.
M: Absolutely. Yes.
H: OK, and the last question I have right
now is, will the code or especially the
scraping part be available publicly or
like in the GitHub or somewhere?
M: Um, I could do that. So I was quite
hesitant with it. So obviously
publishing the data could be problematic.
I have no experience with it on the legal
side. So I would probably not publish the
data, which is, I mean, old data anyway.
But then regarding the code, I was just
waiting to see if anybody's interested. So
given that somebody stated an interest, I
would probably publish it.
Yes.
H: OK, yeah, I think that's it from the
question side.
M: Hmhm.
H: And they were all answered quite
nicely. And judging by that, I don't get
any more questions right now. So, yeah, I
would conclude this talk. Maybe you can
also, like, have a last word. From my side, I'm
done here.
M: Yes. So, um, well, thank you very much
for watching the talk. And I try to
improve; I think I said it on the last
slide, if I'm right: let me know if you
have any doubts or things to improve on,
essentially. And then regarding maybe
the last question of publishing it, I
believe that I put a link there to find my
blog and I would probably just add another
blog post stating, well, there's a GitHub
repository; you can go there and just find
the code and stuff like that. So if you're
interested, just, you know, find my
website. My name is Martin Lellep. Um, and
then you will find it in a few days, I
guess, probably only in 2021. So I
won't be able to publish it in the next
two days. But then the code will be
public. Yes.
H: OK, then. Have a great day. Great time
at Congress, and bye-bye.
postroll music
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!