hacc preroll music
Herald: And a lovely welcome back to the hacc stage on the third day of this Congress. We are here with the talk "A few quantitative thoughts on parking in Marburg" by Martin L. He is interested in data analytics, infrastructure and traffic in general. Because of that, he started scraping publicly available parking data in Marburg, analyzed it and found a lot of interesting things, which he is going to present to you right now. In case you didn't know, there is an IRC client on live.hacc.media where you can ask questions later, or you can use the #rC3hacc tag on Twitter.
Martin Lellep: Welcome to my talk "A few quantitative thoughts on parking in Marburg". I am delighted to speak at this Congress because I love the yearly conferences. Also, thank you to the organizing team for making all this possible. You do an absolutely fabulous job.
Now, the first question that you should ask is: why? This is purely a hobby project. I came up with the question because transportation is important but, unfortunately, also difficult. The most popular vehicles these days are cars, hence the question: how do people park in Marburg?
Who am I? My name is Martin, and I analyze publicly available data. I live close to Marburg, hence parking in Marburg.
Now, a little bit of background regarding Marburg: it's a small, picturesque, vibrant university town. There are a few highlights, such as the castle, the old town and the river, just to name a few. It has around 80,000 residents and a somewhat dense core around the old town. You can see a few pictures here of the castle, the old town and the river, respectively.
Now, at this point, I would like to give my props to David Kriesel, because all this work was inspired by his amazing data science talks. You can find them on YouTube, and I absolutely encourage you to look for the Bahnmining, Spiegelmining and Xerox story talks.
OK, so if you have questions, then please ask. I will be there live during the Q&A of this conference, and you can also send me an email with whatever you like, essentially.
OK, so first of all, I would like to give a quick introduction to the data source. The parking data from Marburg is published live on a system that is implemented by the city, by the city council, I believe. It's called Parkleitsystem Marburg, or PLS for short, and it publishes data such as the parking decks, the number of free parking spots and the location. The address is pls.marburg.de. Let's see how it looks.
Yeah, so obviously it's still online, and you can see here the parking deck names listed and the number of free parking spots. The color coding shows whether a deck is rather full or rather empty; you can see here that all of them are in the green. They are all green probably because it's close to Christmas and nobody really wants to park in the city. The only one that has some load to it is this one here, the Marktdreieck-Parkdeck. Then there's also a button called route. Whenever you click on this button, say we pick the Erlenring-Center button, we are redirected to Google Maps and can see the location of this parking deck, for example. Let's go back. Last but not least, there's also the maximum vehicle allowance and, of course, the time stamp of the data.
OK, back to the presentation. This is a very simple website, so of course it's easy to scrape, and that's what I did. For the scraper, I used a Linux computer and a Docker container. You can see a small sketch of the scraper here on the left: it simply visits the website every 3 minutes inside the Docker container and writes the data into, I believe it was, CSV files, which are subsequently used for the data analysis. All of it, the scraper and the analysis scripts, is written in Python.
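Editor's sketch: the polling loop described above (visit the page every 3 minutes, append one row to a CSV file) could look roughly like the following. This is not the speaker's code. The function names, the CSV layout and the `fetch_free_spots` callable are assumptions made for illustration, and the HTML parsing of pls.marburg.de is deliberately left out because the talk does not show the page structure.

```python
import csv
import time
from datetime import datetime, timezone


def append_snapshot(csv_file, deck_names, free_by_deck, timestamp):
    """Append one row: the timestamp followed by the free-spot count per deck."""
    writer = csv.writer(csv_file)
    row = [timestamp.isoformat()]
    for name in deck_names:
        # A deck that is missing from this snapshot is stored as an empty cell.
        row.append(free_by_deck.get(name, ""))
    writer.writerow(row)


def run_scraper(fetch_free_spots, csv_path, deck_names, interval_s=180, max_polls=None):
    """Poll every `interval_s` seconds (3 minutes in the talk) and log to CSV.

    `fetch_free_spots` is a hypothetical callable that returns a dict
    {deck name: number of free spots}; it stands in for the real page parser.
    `max_polls=None` means the loop runs until the process is stopped.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        with open(csv_path, "a", newline="") as f:
            append_snapshot(f, deck_names, fetch_free_spots(), datetime.now(timezone.utc))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Opening the file in append mode for every poll means that a crash of the scraper loses at most the snapshot that was being written.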
OK, the data format is pretty simple. It's processed internally with data frames from the package pandas. Everybody who knows Python probably knows pandas anyway. The data format is as follows: the row corresponds to the time, the column corresponds to the specific parking deck, and the cell corresponds to the number of free parking spots at that time in that parking deck. Now, in order to make the numbers a bit more usable, I transformed the number of free parking spots into the number of used parking spots by subtracting it from its maximum over time.
OK, now the intro. Just to get used to the data, we'd like to take a look at the locations of the park houses, or parking decks. This is a screenshot; there's an interactive version. Let me open it here. It's an interactive map. You can see two types of markers, the first one red, the second one green. The red ones are the locations that are encoded in the links of the PLS system, and they are actually wrong. So when you click on, for instance, the Erlenring-Center parking deck, as I've done before, the longitude and latitude are actually incorrect and Google Maps corrects them on the fly. Therefore, I have shown the ones given on the website, which are incorrect, in red and the ones that are correct in green. So you can safely focus only on the green ones. A quick overview: here is the train station region, where there are two, and then the others are scattered around the city. Sometimes there are two parking decks very close by, for instance these two and these two. That's because it's essentially one parking deck with two parking sections, typically one inside the building and one on top of the building.
OK, let's go back to the presentation. With that in place, we take a look at the joint data, meaning I accumulate the number of used parking spots across all the parking decks.
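Editor's sketch: the two data steps described above, turning free spots into used spots and summing across decks, can be illustrated with a few made-up numbers. The talk uses pandas data frames; the version below uses plain dictionaries so that it runs without dependencies, and all values are invented. It assumes that the largest free count ever observed for a deck stands in for its capacity. With a pandas frame `df` of free spots (rows are times, columns are decks), the same idea would be `df.max() - df` followed by a row-wise sum.

```python
def free_to_used(free_by_deck):
    """Convert free-spot series to used-spot series, deck by deck.

    Assumption: the largest free count observed for a deck is taken as its
    capacity, so used = max(free over time) - free.
    """
    used_by_deck = {}
    for deck, free_series in free_by_deck.items():
        capacity_estimate = max(free_series)
        used_by_deck[deck] = [capacity_estimate - free for free in free_series]
    return used_by_deck


def total_used(used_by_deck):
    """Sum the used spots across all decks for every time step."""
    return [sum(values) for values in zip(*used_by_deck.values())]


# Made-up free-spot counts for two decks at four consecutive time steps.
free = {
    "Erlenring-Center": [300, 250, 100, 280],
    "Oberstadt": [40, 35, 10, 40],
}
used = free_to_used(free)
joint = total_used(used)
```

One consequence of this estimate is that a deck that is never completely empty during the observation period has its usage underestimated by a constant offset.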
You can see that here now. It's quite a comprehensive picture. I started data scraping in August 2019 and stopped it at the end of February 2020. The curves here show the original raw data resampled at different frequencies. Just as a reminder, the true frequency is three minutes. I started with a resampling to one hour; it's not very easy to understand on that scale. Then to one day, that's the orange curve, and lastly to one week, the green one, and we can learn different things from them. In particular, the orange curve of one day shows that there might be some periodicity in the signal. And the green one shows that there are weeks where there's particularly little parking demand, for instance here around Christmas 2019.
OK, so again, from the orange signal you can see that there's probably some periodicity, and in order to quantify that, I computed the autocorrelation function. The autocorrelation function essentially takes a time signal and computes the overlap between the signal and the same signal shifted by some time. Whenever there's a large overlap, that points towards a periodicity. Here we can see that the first autocorrelation maximum corresponds to one week, and therefore the periodicity can safely be assumed to be seven days. Of course, when there's a periodicity of seven days in a signal, there's also a periodicity of 14 days and of 21 days, but the correlation coefficients typically decay.
OK, now we have the periodicity with respect to days in place. Now let's take a look at the day and hour demand. For that, I computed a two-dimensional histogram with the day, Monday to Sunday, on one axis and the hour on the other axis. Here we can clearly see that the majority of the parking demand is around the noon hours, so from 11 a.m. to approximately, let's say, 5 p.m. or so.
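Editor's sketch: the autocorrelation and the day-by-hour aggregation described above can be written in a few lines. This is an illustration with invented numbers, not the speaker's analysis code. The function names are hypothetical, and taking the mean of the used spots per weekday and hour cell is an assumption, since the talk does not say how the cells of the histogram were filled.

```python
from collections import defaultdict
from datetime import datetime


def autocorrelation(values, lag):
    """Overlap of a series with itself shifted by `lag` steps, normalized by its variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    covariance = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return covariance / variance


def weekday_hour_means(timestamps, values):
    """Mean value per (weekday, hour) cell; weekday 0 is Monday, 6 is Sunday."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in zip(timestamps, values):
        key = (ts.weekday(), ts.hour)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Made-up daily demand, Monday to Sunday, repeated for eight weeks.
weekly_pattern = [100, 110, 105, 108, 120, 90, 30]
daily = weekly_pattern * 8

# The lag (in days) with the largest autocorrelation among lags 1 to 13.
best_lag = max(range(1, 14), key=lambda lag: autocorrelation(daily, lag))
```

For this artificial, perfectly weekly series the search recovers a lag of seven days. Because fewer overlapping points remain as the shift grows, the coefficient at 14 days is smaller than at 7 days, which matches the decay mentioned in the talk.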
Interestingly, and that was a point where I was surprised, Sunday is a day with little parking demand in Marburg. I would've guesstimated that on Sunday, when everybody has spare time, people typically rush into the city. But that's obviously not the case. Another interesting fact is that on Monday mornings it seems to be very difficult to get up, because you can see that the parking demand is smaller than on other mornings.
OK, now, after that, I come to the separate analysis, where I take a look at the individual parking decks. So first of all, again, the time series. It's a bit dense and very hard to read, but there are a few things to learn from the picture. First of all, the green signal, which corresponds to the Erlenring-Center (reminder: I opened it at the very beginning of this talk), seems to be the dominant one. Then there are quite a few data gaps. It's very apparent here for the violet one, the Furthstraße-Parkdeck, this one here, and that's an extreme case. It obviously had some kind of problem: it was open for some time and then closed for some other time. Typically, park houses or parking decks are open 24/7, but there are also quite a few that close overnight.
OK, next I was interested in the statistics of the parking demand for individual parking decks, so I concentrated on one parking deck at a time and computed the histograms of the used parking spots, also depending on the time of day. Let's focus here on the Oberstadt, that's the old town. You can see that the overall parking demand peaks at around, let's say, maybe 20 used parking spots, so that's the typical value. But that's across all times. When we make that statement depending on the time, we can see that in the morning it's approximately the same, but when we go towards noon, the number of used parking spots increases. There are even a few times around noon when it's at the maximum.
Now, when we go towards later hours, the maximum shifts towards smaller values again. This behavior of the maximum shifting so clearly depending on the hour is not apparent for all the parking decks. For instance, the Parkdreieck here... Marktdreieck, sorry, doesn't show the signal as clearly as the Oberstadt one.
OK, from all this we can now also quantify what I call the integral parking demand. It's simply the number of parking spots that have been provided per parking deck. The picture here is normalized to the maximum, and one can see very easily that the Erlenring-Center, as we've guessed previously already, is the one that's dominating the whole city. It's providing the most parking spots, by a large margin actually. The next one is the Lahn-Center, then maybe the Oberstadt, and the other ones follow after these. Another interesting point is that the proportion of parking spots provided on weekends differs between the parking decks. For instance, you can see that for the Erlenring-Center the weekend share is quite big. In contrast, the Marktdreieck-Parkdeck has only a very small portion of parking spots provided on weekends. It might be interesting to know that this particular parking deck is the one that is used if you want to go to a doctor, because it's very close. Many doctors are not open on Saturdays and Sundays, and therefore the parking demand there is probably quite low.
Now, there's also a temporal version, for which I rendered a small video that I'm opening now. You can see essentially the same as in the previous graph, but against time. Again, it's very apparent that there's a periodicity. Here my scraper crashed, and here it's back in business again. I found it interesting to see that there are parking decks that host cars even at night. For instance, the Erlenring-Center again and the Lahn-Center, the largest ones, offer parking overnight as well, and there are probably some cars in there. OK, let's close that again.
Now I come, lastly, to the prediction part. The goal here is to measure the parking demand at the parking decks and then to interpolate between them. So I have, say, the Oberstadt, the old town, and, I don't know, the Erlenring-Center, which was the largest one, and I would like to know what the parking demand is in between, for instance. For doing so, I use a spatial fit, and I use a machine learning model for that spatial fit. In this particular case it is a non-parametric model called Gaussian process regression. The nice thing about it is that it also returns the uncertainty. Say, for instance, you would like to use these machine learning predictions to build some kind of parking deck or to get rid of one. All these derived actions would be very expensive, so you would like to know whether the uncertainty is large or small for whatever the machine learning model predicts. Just for the math-oriented people: if you're interested in that model, definitely take a look at what I would call the Gaussian process bible by Rasmussen. It's amazing to read.
Yeah, there are two evaluations that I did. The first one is based on the whole data set, so there's no spatial, sorry, no temporal resolution. I rendered a video, and I would like to explain the outcome to you while it is running. The top picture here shows you the prediction by the machine learning model, and the bottom picture shows you the uncertainty. The training data, meaning the parking decks, is denoted by the black points.
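Editor's sketch: the idea of a spatial fit that returns both a prediction and an uncertainty can be illustrated with a very small Gaussian process regression. The talk does not say which implementation, kernel or hyperparameters were used, so everything below is an assumption: a zero-mean process, a squared-exponential kernel with fixed length scale, invented deck coordinates and invented normalized demands. The code is written without dependencies to keep it self-contained; in practice a library class such as scikit-learn's `GaussianProcessRegressor` would typically be used, with the hyperparameters fitted to the data.

```python
import math


def rbf_kernel(p, q, length_scale, signal_var):
    """Squared-exponential covariance between two 2D points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_predict(train_xy, train_demand, query_xy,
               length_scale=1.0, signal_var=1.0, noise_var=1e-6):
    """Posterior mean and variance of a zero-mean GP at one query point."""
    n = len(train_xy)
    # Covariance between training points, with a small noise term on the diagonal.
    K = [[rbf_kernel(train_xy[i], train_xy[j], length_scale, signal_var)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Covariance between the query point and every training point.
    k_star = [rbf_kernel(p, query_xy, length_scale, signal_var) for p in train_xy]
    alpha = solve(K, train_demand)  # K^-1 y
    weights = solve(K, k_star)      # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    variance = signal_var - sum(k * w for k, w in zip(k_star, weights))
    return mean, variance


# Invented deck positions (arbitrary units) and invented normalized demands.
decks = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
demand = [1.0, 0.4, 0.2]

at_deck = gp_predict(decks, demand, (0.0, 0.0))     # on top of a training point
in_between = gp_predict(decks, demand, (1.0, 0.0))  # between two decks
far_away = gp_predict(decks, demand, (10.0, 10.0))  # far from all decks
```

In this toy setup the predicted variance is close to zero on top of a deck, larger between two decks, and close to the prior variance far away from all decks.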
Now, first of all, the uncertainty: you can see that wherever there is training data, the uncertainty goes down. So the model is certain about its prediction there because, well, there's training data, and in between the uncertainty rises again. Now the prediction: you can see a small hill. It's exactly the Erlenring-Center, which was the largest one. What is shown in the video is rotating. You can see the coordinates of Marburg on the bottom plane, and at some point the view rotates upwards and gives you a top-down perspective with a corresponding color map. So again, here's the maximum, the Erlenring-Center. I did that because next we would like to finally measure the parking demand between stations.
OK, there's another small video, and now we start right from the top-down, color-coded view. Again, the black points are the training data, but now the red points are a kind of test data, meaning positions in between. I concentrated on the Mensa, because I have a special relation with the Mensa, the physics department, the university library, the train station and the cinema. This is just to demonstrate that from this spatial fit we can derive the parking demand at these positions as well. Here, this yellow peak is the Erlenring-Center again. Now, that's only a qualitative result, of course. I don't want to derive anything quantitative at this point; it's just a proof of concept that it is possible to derive something like that from the publicly available data.
Now, I forgot to mention in the beginning that there's a bonus, and I would like to come to the bonus now. It is about the Corona crisis, or pandemic, of course. The initial data acquisition phase is shown here in black; the whole talk so far was about that black portion. I stopped it at around the end of February, and I restarted the whole data acquisition process in approximately April.
Just to capture something from the Corona crisis as well. You can see here, again, the time series. I think the most interesting and most comprehensive bit about it is the mean. You can see the mean across the whole time denoted by this dashed line, and you can see that the mean is smaller. So during the Corona pandemic fewer people parked in Marburg, which is reasonable, I would say. But there are also times where the number of used parking spots decreased significantly. For instance, right when the Corona crisis started in April, and now during the second wave in October, November and December, it is visible that the parking demand decreased a lot.
I went one step further and wanted to know the differences between pre-Corona and during Corona for each of the parking decks as well. That's what I did here. It's now not the normalized parking demand but the absolute parking demand, so we can also see the absolute numbers. The black bars you've seen previously already. The red bars are for the time during the Corona crisis. Then I defined the first wave and the second wave as serious Corona times, so I also plotted a third set of bars here. It's interesting to see that most of the parking decks, of course, suffered in terms of parking demand; most of them provided fewer parking spots. But there are a few, like the Marktdreieck-Parkdeck here, that, well, almost increased. We can see that during Corona in general it increased a bit, and during the heavy Corona times it increased even more. As I mentioned before, this is the parking deck that corresponds to, yeah, a whole collection of doctors. So I conclude that during Corona times the parking demand in front of doctors even increased a tiny bit.
Yeah, with that, I would like to come to my conclusions. Thank you for sticking with me until now. So, I scraped publicly available data with a small scraper setup.
I analyzed it, for instance, for day and hour patterns. And last but not least, I did some machine learning in order to quantify the demand in between the stations. There is also an accompanying blog article; you can find it down here. There you'll find all the figures in higher resolution, and you can play around with an interactive map as well, if you like. And to finally conclude the presentation: I would like to hear from you what you think about this analysis. I'd like to improve with these kinds of mini studies, and therefore I would be very interested in your critique regarding the content, the presentation and general comments. Again, you can email me at this email address here, or alternatively, I set up a Google form. The Google Forms document consists of exactly these questions, and you can simply type your answers in if you're interested. Thank you very much.
Herald: All right, first of all, thank you for this amazing talk. I have a few questions that have been relayed to me, and I'm just going to ask them one after the other. Let's not waste any time and start with the first one: have you found parking decks that are usually heavily overloaded or never completely used?
Martin: So, given that there are only around, what was it, 8 or 9 or 10 in the data set, honestly, I never looked into that question. So, short answer: no. Long answer: yes, I could have, or I still could, I would say.
H: OK. Have you tried prediction in time, so guessing which parking decks will be exhausted soon?
M: No, no. I would consider that something like the predictive maintenance of the traffic business, kind of. It's definitely a thing that people who have more time and are willing to invest more should do and could do, I would say. I mean, there's lots of additional data that might be of interest, like weather data.
And, for instance, is it a public holiday, yes or no, and all that kind of stuff. So, again, short answer: no. Long answer: yes, it would be possible.
H: OK, so if anyone watching has the time or energy to do that, they could.
M: Absolutely. Yes.
H: OK, and the last question I have right now is: will the code, or especially the scraping part, be available publicly, like on GitHub or somewhere?
M: I could do that. I was quite hesitant with it. Obviously, publishing the data could be problematic. I have no experience with it on the legal side, so I would probably not publish the data, which is, I mean, old data anyway. But regarding the code, I was just waiting to see if anybody's interested. So given that somebody stated the interest, I would probably publish it. Yes.
H: OK, yeah, I think that's it from the question side.
M: Mhm.
H: And they were all answered quite nicely. Judging by that, I don't get any more questions right now. So, yeah, I would conclude this talk. Maybe you can also have a last word. From my side, I'm done here.
M: Yes. So, well, thank you very much for watching the talk. I try to improve; I think I said it on the last slide. Let me know if you have any doubts or things to improve on, essentially. And then, regarding the last question of publishing it: I believe I put a link there to find my blog, and I would probably just add another blog post stating, well, there's a GitHub repository, you can go there and find the code and stuff like that. So if you're interested, just, you know, find my website. My name is Martin Lellep. That will be in a few days, I guess, probably in 2021 only. So I won't be able to publish it in the next two days, but then the code will be public. Yes.
H: OK, then. Have a great day, a great time at Congress, and bye-bye.
postroll music
Subtitles created by c3subtitles.de in the year 2021. Join, and help us!