-
hacc preroll music
-
Herald: And a lovely welcome back to the
haccs stage on the third day of this
-
Congress, we are here with a talk on "A
few quantitative thoughts on parking in
-
Marburg" by Martin L. He's interested in
data analytics now and infrastructure and
-
traffic in general. And because of that,
he started scraping publicly available
-
parking data in Marburg and just went on
and analyzed it and found a lot of
-
interesting things which he is going to
present in this talk to you right now. In
-
case you didn't know, there is an IRC client
on live.hacc.media where you can ask
-
questions later or with the #rC3hacc tag
on Twitter.
-
Martin Lellep: Welcome to my talk "A few
quantitative thoughts on parking in
-
Marburg". I am delighted to speak here on
this Congress because I love the yearly
-
conferences. Also, thank you to the
organizing team for making all this
-
possible. You do an absolutely fabulous
job. Now, the first question that you
-
should ask is: why? The following is
purely a hobby project. I came up
-
with the question because transportation is
important, but unfortunately, it's also
-
difficult. The most popular vehicles these
days are cars and hence the question, how
-
do people park in Marburg? Who am I? My
name is Martin, and I analyze publicly
-
available data. I live close to Marburg,
therefore the parking in Marburg. Now, a
-
little bit of background regarding
Marburg, it's a small picturesque, vibrant
-
university town. There are a few highlights,
such as the castle, the old town and the
-
river, just to name a few. It has around
80,000 residents and a somewhat dense core
-
around the old town. You can see a few
pictures here of the castle, the old town
-
and the river, respectively. Now, at this
point, I would like to give my props to
-
David Kriesel because all this work was
inspired by his amazing data science
-
talks. You can find them on YouTube. And I
absolutely encourage you to look for the
-
Bahnmining, Spiegelmining and the Xerox
story talks. OK, so if you have questions,
-
then please ask, I will be there live
during the Q&A of this conference and also
-
you can send me an email with whatever you
like, essentially. OK, so first of all, I
-
would like to give a quick introduction to
the data source. Now, the data, the
-
parking data from Marburg is publicly,
well it's published live on a system that
-
is implemented by the city, by the city
council, I believe. It's called
-
Parkleitsystem Marburg or PLS for now, and
it publishes the data such as the parking
-
decks, the number of free parking spots
and the location. The address here is
-
pls.marburg.de. And let's see how it
looks. Yeah, so obviously it's still
-
online and you can see here the parking
deck names listed, the number of free
-
parking spots. Color coded is if it is
rather full or if it's rather empty, you
-
can see here all of them are in the green.
The green color coding here, it's
-
because it's probably close to Christmas.
Nobody wants to really park in the city.
-
And the only exception is this one here, the
Marktdreieck-Parkdeck, which has some
-
load to it. Then also there's a button
called route. So whenever you click on
-
this button, say we pick the
Erlenring-Center button, we are redirected
-
to Google Maps and we can see here the
location of this parking deck, for
-
example. Let's go back. Last but not
least, there's also the maximum vehicle
-
allowance and of course, the time stamp of
the data. OK, back to the presentation
-
now. This is a very simple website, so of
course it's easy to scrape and that's what
-
I did. Regarding the scraper, I used a
Linux computer and a Docker container. And
-
this scraper, you can see a small sketch
here to the left, it simply visits the
-
website every 3 minutes inside the Docker
container and writes the data into, I
-
believe it was, CSV files, which are
subsequently used for the data analysis.
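A minimal sketch of such a polling scraper follows. It is not the speaker's code: the talk only states that the page is visited every 3 minutes and that the results end up in (probably) CSV files. The function names `append_snapshot` and `run_scraper` are hypothetical, and the `fetch` callable, which would download and parse pls.marburg.de into a mapping of deck name to free spots, is deliberately left to the reader because the page markup is not described in the talk.

```python
import csv
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, Optional


def append_snapshot(csv_path: str, free_spots: Dict[str, int], when: datetime) -> None:
    """Append one row (timestamp plus free spots per deck) to a CSV file."""
    path = Path(csv_path)
    # Fixed column order. Simplification: assumes the set of decks never changes.
    decks = sorted(free_spots)
    is_new = not path.exists()
    with path.open("a", newline="") as fh:
        writer = csv.writer(fh)
        if is_new:
            writer.writerow(["timestamp", *decks])
        writer.writerow([when.isoformat(), *(free_spots[d] for d in decks)])


def run_scraper(
    fetch: Callable[[], Dict[str, int]],  # hypothetical: returns {deck name: free spots}
    csv_path: str,
    interval_s: float = 180.0,  # 3 minutes, the interval mentioned in the talk
    max_polls: Optional[int] = None,  # None means "run until stopped"
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `fetch` at a fixed interval and append every result to the CSV file."""
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            append_snapshot(csv_path, fetch(), datetime.now(timezone.utc))
        except Exception as exc:
            # A single failed poll leaves a gap in the data but does not stop the loop.
            print(f"poll failed: {exc}")
        polls += 1
        sleep(interval_s)
```

Catching errors per poll is a design choice in this sketch: a failed request then shows up as a data gap instead of ending the whole run.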
-
All of it, the scraper and the analysis
scripts are written in Python. OK, the
-
data format is pretty simple, it's
processed internally with data frames,
-
with the package pandas. Everybody who
knows Python probably knows pandas, anyway.
-
The data format is as follows. The
row corresponds to the time. The column
-
corresponds to the specific parking deck,
and the cell corresponds to the number of
-
free parking spots at that time of that
parking deck. Now, in order to make the
-
numbers a bit more usable, I transformed
the number of free parking spots to the
-
number of used parking spots by
subtracting it from the maximum along the
-
time. OK, now the intro is just to get
used to the data, we'd like to take a look
-
at the locations of the park houses
or the park decks. This is a screenshot.
-
There's an interactive version. Let me
open it here. It's an interactive map. You
-
can see two types of markers, the first
one red, the second one green, and that's
-
because the red ones are the ones that are
given, well they are encoded in the links
-
of the PLS system, and they
are actually wrong. So when you click on
-
the, for instance, Erlenring-Center parking
deck that I've done before, the location,
-
longitude and latitude are actually
incorrect and, um, Google Maps corrects them
-
on the fly. And therefore, I have shown
here the ones given on the website that
-
are incorrect in red and the ones
that are correct in green. So you can safely focus
-
only on the green ones. Um, a quick
overview here is the train station region,
-
there are two. And then they are scattered
around the city. Um, sometimes there are
-
two parking decks very close by, for
instance, these two and these two. And
-
that's because it's essentially one
parking deck with two parking sections
-
typically inside the building and on top
of the building. OK, let's go back to the
-
presentation. With that in place, we
take a look at the joint data, meaning I
-
accumulate the number of used parking
spots across all the parking decks. You
-
can see that here now, so it's a quite
comprehensive picture, I started data
-
scraping in August 2019 and stopped it at
the end of February 2020.
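The preprocessing behind this plot can be sketched in pandas as follows. This is an illustration under assumptions, not the speaker's script: the deck names and numbers are toy data, the capacity of a deck is approximated by the largest number of free spots ever observed for it (the "maximum along the time" from the talk), and using the mean when resampling to the hourly, daily and weekly curves is a guess, since the talk does not say which aggregate was used.

```python
import pandas as pd


def used_from_free(free: pd.DataFrame) -> pd.DataFrame:
    """Convert free spots to used spots, one column per parking deck."""
    # Assumption: the per-deck maximum of free spots over time approximates capacity.
    capacity = free.max(axis=0)
    return capacity - free


def total_used(used: pd.DataFrame) -> pd.Series:
    """Sum the used spots over all decks for every timestamp."""
    return used.sum(axis=1)


# Toy data in the layout described in the talk:
# rows are timestamps (3-minute steps), columns are decks, cells are free spots.
index = pd.date_range("2019-08-01 00:00", periods=4, freq="3min")
free = pd.DataFrame(
    {"deck_a": [50, 20, 35, 50], "deck_b": [10, 0, 4, 10]},
    index=index,
)

used = used_from_free(free)
total = total_used(used)

# Coarser views of the same signal (mean per bin is an assumption).
hourly = total.resample("1h").mean()
daily = total.resample("1D").mean()
weekly = total.resample("1W").mean()
```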
-
This data here is shown at different resampling
frequencies of the original raw data. I
-
started with a resample of one hour. So
just a reminder, the true frequency
-
is three minutes. Again, I resampled here
into one hour. It's not very easy to
-
understand on that scale here. Then to one
day it's the orange now and lastly on one
-
week and we can learn different things
from it. So in particular, the orange
-
curve of one day shows that there might be
some periodicity in the signal. And the
-
green one shows that there are times or
weeks that are particularly... where
-
there's particularly little parking
demand, for instance, here around
-
Christmas 2019. OK, so again, from the
orange signal, you can see that there's
-
probably some periodicity, and in order to
quantify that, I plotted the or computed
-
the autocorrelation function. The
autocorrelation function essentially takes a
-
time signal and computes the overlap
between the time signal and the same
-
signal shifted by some time and whenever
there's a large overlap, that points
-
towards the periodicity, and here we can
see that the periodicity maximum or the
-
autocorrelation maximum, the first one
corresponds to one week and therefore the
-
periodicity can be safely assumed to be at
seven days. Of course, when there's
-
periodicity in a signal at seven days,
for instance, there's also periodicity at
-
14 days and at 21 days, but the
correlation coefficients, they decay
-
typically. OK, now we have the periodicity
with respect to days in place. Now let's
-
take a look at the day and hour demand.
And for that, I computed a two dimensional
-
histogram with the day Monday to Sunday on
the one axis and the other axis
-
corresponds to the hour. And here we can
clearly see that the majority of the
-
parking demand is around the noon hour. So
starting from 11 a.m. to approximately,
-
let's say, 5 p.m. or so. Interestingly.
That was a point where I was surprised is
-
that Sunday's is a day where there's
little parking demand in Marburg, I
-
would've guesstimated that on Sunday, when
everybody has spare time, they typically
-
rush into the city. But that's obviously
not the case. Another interesting fact is
-
that on Monday mornings it seems to be very
difficult to get up because you can see
-
the parking demand is smaller than on
other mornings. OK, now, after that, I
-
come to the separated... separate
analysis where I take a look at the
-
individual parking decks. So first of all,
again, the time series, it's a bit
-
dense and it's very hard to see. So there
are a few things to learn from the
-
picture. So first of all, the green
signal that corresponds to the Erlenring-
-
Center (reminder: I opened it at the
very beginning of this talk) seems to be
-
the dominant one, then there are quite a
few data gaps. So take, for instance... well,
-
it's very apparent here for the violet
one, the Furthstraße-Parkdeck, this one
-
here. And that's an extreme case. It had
obviously some kind of problem. It was
-
open for some time and then closed for
some other times. Typically, park houses
-
or parking decks are open 24/7, but
there are also quite a few that
-
close overnight. OK, next I was interested
in the statistics of parking demand for
-
individual parking decks, so I
concentrated only on, say, one parking
-
deck and computed the histograms of the
used parking spots also, depending on the
-
time. Let's focus here on the Oberstadt,
it's the old town and you can see that the
-
overall parking demand peaks at around,
let's say, maybe 20 used parking spots, so
-
that's the average, but that's not for all
times when we make that statement,
-
depending on the time, for instance, the
morning we can see that's approximately
-
the same. But when we go towards noon, we
can see that the number of parking spots
-
or used parking spots increases. There are
even a few times when it's at the maximum
-
around noon. Now, when we go towards later
hours, the maximum shifts towards smaller
-
values again. Now, this behavior of
the maximum shifting, so clearly,
-
depending on the hour, is not apparent
for all the parking decks. For instance,
-
the Parkdreieck here ... Marktdreieck,
sorry, that doesn't show the signal as
-
clearly as the Oberstadt one. OK, from this
all now we can quantify also the, I call
-
it integral parking demand, simply it's
the number of parking spots that have
-
been provided per parking deck. Now the
picture here, it's normalized to the
-
maximum and one can see from this picture
here very easily that the Erlenring-
-
Center, as we've estimated or guessed
previously already is the one that's
-
dominating the whole city. It's providing
the most parking spots by a large margin,
-
actually. The next one is the Lahn-Center
and then maybe the Oberstadt and the other
-
ones follow after these. Another
interesting point here is that the
-
proportion of parking spots provided on
weekends differs for the different parking
-
decks. For instance, here you can see this
one here is quite a big portion, the
-
Erlenring-Center, also on weekends.
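One way to compute such an integral demand per deck, together with its weekend share, is sketched below. This is an assumption-laden illustration rather than the speaker's method: the function name `integral_demand` and the toy data are made up, and summing the used spots over time is taken as a stand-in for the integral, which is only proportional to it when the sampling interval is constant.

```python
import pandas as pd


def integral_demand(used: pd.DataFrame) -> pd.DataFrame:
    """Per-deck demand summed over time, normalized to the largest deck,
    plus the fraction of that demand that falls on Saturdays and Sundays."""
    total = used.sum(axis=0)
    is_weekend = used.index.dayofweek >= 5  # Monday is 0, so 5 and 6 are the weekend
    weekend = used.loc[is_weekend].sum(axis=0)
    return pd.DataFrame(
        {
            "normalized_total": total / total.max(),
            # NaN for a deck that was never used (division by zero).
            "weekend_share": weekend / total,
        }
    )


# Toy data: Friday 2019-08-02 through Monday 2019-08-05, one sample per day.
index = pd.date_range("2019-08-02", periods=4, freq="D")
used = pd.DataFrame(
    {"deck_a": [10, 10, 10, 10], "deck_b": [10, 0, 0, 10]},
    index=index,
)
summary = integral_demand(used)
```

In this toy example `deck_b` plays the role of a deck that is empty on weekends, similar to what the talk reports for the Marktdreieck-Parkdeck.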
In contrast, the Marktdreieck-Parkdeck has
-
only a very small portion of, um, of
parking spots provided on weekends. It
-
might be interesting to know that this
particular parking station is ... it's the
-
one that is used if you want to go to a
doctor, because it's very close. So many
-
doctors are not open on Sundays, on
Saturdays, and therefore probably the
-
parking demand is quite low. Now, there's
a temporal version also where I rendered a
-
small video that I'm opening now, and you
can see essentially the same as in the
-
previous graph, but against time. Again,
it's very apparent that there's a
-
periodicity and here my scraper crashed
and it's back in business again, and I
-
found it interesting to see that there are
parking decks that have cars... well that
-
host cars, even at night, for instance,
here the Erlenring-Center again and the
-
Lahn-Center, the ones that are the largest
ones, they offer parking also overnight.
-
And there are some cars in there,
probably. OK, let's close that again. Now,
-
I come lastly to the prediction part now.
The goal here is to measure the parking
-
demand through the parking decks, but then
to interpolate between the parking decks,
-
so I would like... so I have ...say the
Oberstadt the old town and the, I don't
-
know, the Erlenring, which was the largest
one. I would like to know what's the
-
parking demand in between, for instance.
For doing so, I use a spatial fit and I
-
use a machine learning model for that, in
order to do that spatial fit. It is now,
-
in this particular case, a non parametric
model called Gaussian Process Regression.
-
And the nice thing about that is that it
also returns the uncertainty. Because say,
-
for instance, you would like to use these
model, machine learning predictions to
-
say, build some kind of parking deck or to
get rid of one. All these operations, all
-
these derived actions would be very
expensive. So you would like to know if
-
the uncertainty is large or small for
whatever the machine learning model
-
predicts. Just for the math oriented
people. If you're interested in that
-
model, definitely take a look at the, I
would call it, Gaussian process bible by
-
Rasmussen. It's amazing to read. Yeah,
there are two, um, evaluations now, I did.
-
The first one is based on the whole data
set, so there's no spatial or... sorry...
-
there's no temporal resolution. And what I
do, I did, well, I rendered a video and I
-
would like to explain to you the outcome of
that while it is running. The top picture
-
here shows you the prediction by the
machine learning model. And the bottom
-
picture shows you the uncertainty. The
training data, meaning the parking decks,
-
is denoted by the black points. Now, first
of all, the uncertainty, you can see that
-
wherever there is training data, the
uncertainty goes down. So the model is,
-
um, certain about its prediction there
because, well, there's training data and
-
in between the uncertainty rises again.
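The behavior just described, low uncertainty at the parking decks and higher uncertainty in between, can be reproduced with a small Gaussian process regression written directly in NumPy. This is a sketch under stated assumptions, not the model from the talk: the squared-exponential kernel, its hyperparameters, the constant mean and the coordinates are all chosen for illustration, and the talk does not say which library or kernel was used.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float, variance: float) -> np.ndarray:
    """Squared-exponential kernel between two sets of 2-D points."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a GP with a constant mean."""
    y_mean = y_train.mean()  # far from any data the prediction falls back to this
    k = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x_test, length_scale, variance)
    chol = np.linalg.cholesky(k)  # k = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_star.T @ alpha + y_mean
    v = np.linalg.solve(chol, k_star)
    var = variance - (v**2).sum(axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Hypothetical deck positions (arbitrary units) and normalized demand per deck.
x_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
y_train = np.array([1.0, 0.2, 0.3])

# A point on a deck, a point between the decks, and a point far away from all of them.
x_test = np.array([[0.0, 0.0], [0.5, 0.5], [5.0, 5.0]])
mean, std = gp_predict(x_train, y_train, x_test)
```

With these toy inputs the standard deviation is smallest at the deck itself, larger between decks, and largest far away, where the mean also falls back to the average of the training values.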
Now the prediction, you can see some small
-
hill. It's exactly the Erlenring-Center,
which was the largest one. Now, what is
-
shown in the video is it's rotating. You
can see the coordinates of Marburg on the
-
plane, on the bottom plane. And at
some point, the view rotates upwards and
-
gives you a top down perspective with a
corresponding color bar or corresponding
-
color map. So, again, here's the
maximum, the Erlenring-Center. And I did
-
that because next we would like to finally
measure the parking demand between
-
stations. OK, there's another small video
again, and now we start right from the top
-
down, color coded view and again, the
black points are the... is the training
-
data, but now the red points are, is kind
of test data, meaning positions in
-
between. I concentrated now on the Mensa
because I have a special relation with the
-
Mensa, the physics department, the
university library, the train station and
-
the cinema. And just to demonstrate from
this spatial fit, we can derive the
-
parking demand at these positions also.
Here, this yellow peak, it's the
-
Erlenring-Center again. Now, that's only a
qualitative result, of course, I don't
-
want to derive anything quantitative at this
point, it's just a proof of concept that
-
it is possible to derive something like
that from the publicly available data.
-
Now, I forgot to mention in the beginning that
there's a bonus and I would like to come
-
to the bonus now. It is about the Corona
crisis or pandemic, of course. What I did
-
is, the initial data acquisition phase,
here in black, that's what the whole talk was
-
about that black portion here. I stopped
it at around the end of February and I
-
restarted the whole data acquisition
process now again in approximately
-
April. Just to capture something from the
Corona crisis as well. And you can see
-
here again, the time series. I think the
most interesting bit about it and the most
-
comprehensive bit is the mean. You
can see the mean across the whole time
-
denoted by this dashed line. And you can
see that the mean is smaller. So during
-
the Corona pandemic fewer people parked in
Marburg, which is reasonable, I would say.
-
But there are also times where the number
of used parking spots decreased significantly.
-
So for instance, right when the Corona
crisis started in April and now the second
-
wave in October, November, December, it is
visible that the parking demand decreased
-
a lot. And I went one step further and
wanted to know the differences between
-
pre Corona and during Corona also for each
of the parking decks, that's what I did
-
here. It's now not the normalized parking
demand, but the absolute parking demand.
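A per-deck comparison between two acquisition phases could be sketched like this. The period boundaries, the deck names and the numbers are hypothetical, and comparing mean used spots per period is an assumption: the talk speaks of absolute demand without saying how periods of different length were made comparable.

```python
import pandas as pd


def mean_by_period(used: pd.DataFrame, periods: dict) -> pd.DataFrame:
    """Mean number of used spots per deck for each named (start, end) period."""
    return pd.DataFrame(
        {name: used.loc[start:end].mean(axis=0) for name, (start, end) in periods.items()}
    )


# Toy data: two samples before and two samples during the pandemic.
index = pd.to_datetime(["2019-09-01", "2020-01-15", "2020-04-20", "2020-11-10"])
used = pd.DataFrame({"deck_a": [40, 60, 10, 20], "deck_b": [5, 5, 6, 8]}, index=index)

# Hypothetical period boundaries, loosely following the dates mentioned in the talk.
periods = {
    "pre": ("2019-08-01", "2020-02-29"),
    "during": ("2020-04-01", "2020-12-31"),
}
table = mean_by_period(used, periods)
table["change"] = table["during"] - table["pre"]
```

A third entry in `periods` would cover a narrower window, for example only the weeks of a single wave; here `deck_b` stands in for a deck whose demand rose slightly.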
-
So now we can see also the absolute
numbers, the black bars you've seen
-
previously already. Now the red bars are
for the time during the Corona crisis. And then
-
I defined these, the first wave and the
second wave as serious corona times. So I
-
also plotted a third bar... set of bars
here. And it's interesting to see that
-
while most of the parking decks, of
course, suffered in terms of providing
-
parking demands or most of them provided
fewer parking decks, parking spots. But
-
there are a few, like, for instance, the
Marktdreieck-Parkdeck here that, well,
-
almost increased. We can see during the
corona in general it increased a bit. And
-
then during the heavy corona, it increased
even more. And as I mentioned before, this
-
is the parking deck that corresponds to,
yeah, a whole collection of doctors. So. I
-
derive that well during Corona times the
parking demand in front of doctors even
-
increased a tiny bit. Yeah, with that, I
would like to come to my conclusions.
-
Thank you for sticking with me until now.
So I scraped publicly available data here
-
with a small scraper set up. I analyzed
it, for instance, for day and hour
-
patterns. And last but not least, did some
machine learning in order to quantify the
-
demand in between the stations, there is
an accompanying blog article also. You can
-
find it down here, there are all the figures
in higher resolution and you can play
-
around with an interactive map also, if
you like. Um, and to finally now conclude
-
the presentation. I would like to hear
from you what you think about this
-
analysis. I'd like to improve with these
kind of mini studies. And therefore, I
-
would be very interested in your critique
regarding the content, the presentation
-
and general content... comments. Again,
you can email me to this email address
-
here, or alternatively, I set up a Google,
um, Google form. So the Google forms
-
document that is comprised of exactly these
questions, and you can simply type them in
-
if you're interested. Thank you very much.
-
Herald: All right, first of all, thank you
for this amazing talk, I have a few
-
questions that have been relayed to me and
I'm just going to ask them one after the
-
other. And let's not waste any time and
start with the first one. Have you found
-
parking decks that are usually heavily
overloaded or never completely used?
-
Martin: Um so. Given that there are only
around what was it, 8 or 9 or 10 in the
-
data set, honestly, I never looked for
that question. So, um, short answer is:
-
No. Long answer, yes, I could have or I
still could, I would say.
-
H: OK. Have you tried prediction in time,
so guessing which parking decks will be
-
exhausted soon?
M: No, no. So that's obviously it's
-
like... it's... I would consider that
something like the predictive maintenance
-
of traffic business kind of. It's
definitely a thing that people that have
-
more time and are willing to invest
more definitely should do and could do. I
-
would say I mean, there's lots of
additional data that might be of interest,
-
like weather data. And, for instance, is
it a public holiday, yes or no and
-
all that kind of stuff. So, again, short
answer: No. Long answer: Yes. Would be
-
possible.
H: OK, so if anyone watching has the time
-
or energy to do that, they could.
M: Absolutely. Yes.
-
H: OK, and the last question I have right
now is, will the code or especially the
-
scraping part be available publicly or
like on GitHub or somewhere?
-
M: Um, I could do that. So I was very, I
was quite hesitant with it. So obviously
-
publishing the data could be problematic.
I have no experience with it on the legal
-
side. So I would probably not publish the
data, which is I mean, it's old data
-
anyway. So and but then regarding the
code, I was just waiting if anybody's
-
interested. So given that somebody stated
the interest, I would probably publish it.
-
Yes.
H: OK, yeah I think that's it from the
-
question side.
M: Hmhm.
-
H: And they were all answered quite
nicely. And judging by that, I don't get
-
any more questions right now. So, yeah, I
would conclude this talk. Maybe you can also
-
like have a last word. From my side I'm
done here.
-
M: Yes. So, um, well, thank you very much
for watching the talk. And I try to
-
improve. I think I said it on the last
slide. If I'm right, let me know if you
-
have any doubts or things to improve
essentially on. And then regarding maybe
-
the last question of publishing it, I
believe that I put a link there to find my
-
blog and I would probably just add another
blog post stating, well, there's a GitHub
-
repository. You can go there and
just find the code and stuff like that
-
there. So if you're interested, just, you
know, find my website. My name is Martin
-
Lellep. Um, and then you will in a few
days, I guess probably in 2021 only. So I
-
won't be able to publish it in the next
two days. But then the code will be
-
public. Yes.
H: OK, then. Have a great day. Great time
-
at Congress and byebye.
-
postroll music
-
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!