36c3 preroll music

Herald: So, hey, we're finally ready to start. We have Volker Krause here with a privacy by design travel assistant, and it's going to be about building Open Source travel assistants, I think, and this talk will be in English. And if you want a German translation, we also have some great interpreters in their booth back here; you can listen in on c3lingo.org and hear them interpret everything live. Right. Now, let's have a warm welcome for Volker here, and have fun with his talk.

Applause

Volker Krause: Thank you. OK, so what is this about? You probably know those features in, most prominently, Google Mail, but I think TripIt was the one that pioneered this. So GMail reads your email and then detects any kind of booking information in there, like your boarding passes, your train tickets, your hotel bookings and so on. And it can integrate that into your calendar and can present you a unified itinerary for your entire trip and monitor that for changes. And all of that doesn't cost you anything. Maybe apart from a bit of your privacy.

Well, not too bad, you might think. But look at what kind of data is actually involved in just your travel. Right. The obvious things that come to mind: your name, your birthday, your credit card number, your passport number, that kind of information. Right. But that isn't even the worst part of this, because those operators don't just get to see your specific data for one trip, right? They get to see everyone's trips. And now if you combine that information, that actually uncovers a lot of information about relations between people, your interests, who you work for, where you live and all of that. Right. So pretty much everyone here traveled to Leipzig for the last four days of the year. If that happens for two of us, once, right, that might be coincidence. If that happens two or three years in a row, that is some kind of information.
But yeah, what to do about that, right? The easy solution is to just not use those services. It's like first world luxury stuff anyway. That works until you end up in a foreign country where you don't speak any of the local languages and then get introduced to their counterpart of Schienenersatzverkehr or Tarifzonenrandgebiet. And at that point, you might be interested in actually understanding what's happening on your trip, in some form that you actually understand and that you are familiar with, ideally without installing 15 different vendor applications for wherever you actually might be traveling, right? So we need something better.

And that obviously leads us to, let's do it ourselves. Then we can at least design this for privacy right from the start, and build it on top of Free Software and Open Data. Well, of course, it's not entirely obvious that this will actually work, right? Google and Apple have a totally different amount of resources available for this. So, can we actually build this ourselves?

So let's have a look at what those services actually need to function. And it turns out it's primarily about data, not so much about code. There are some difficult parts in terms of code involved as well, like the image processing in a PDF to detect a barcode in your boarding pass. But all of that exists as ready-made building blocks. So you basically just need to put this nicely together.

So let's look at the data. That's the more interesting part. And in general, that breaks down into three different categories. The first one is what I call personal data here. So that's basically booking information, documents or tickets, boarding passes, specific to you. So there at least you don't have a problem with access, because that is sent to you and you need to have access to that. But it comes in all kinds of forms and shapes. So there the challenge is to actually extract that.
The second kind of data is what I would call static data. So, for example, the location of an airport. Now, you could argue that that could change, and there are rumors that some people apparently managed to build new airports. I live in Berlin, so I don't believe this. Jokes aside, "static" refers to static within the release cycle of the software. So several weeks or a few months. So this is stuff that we can ship as offline databases. And offline, of course, helps us with privacy, because then you're not observable from the outside.

And the third category is dynamic data. So stuff that is very, very short lived, such as delay information. There is no way we can do that offline. If we want that kind of information, we will always need some kind of online querying.

Then let's look through those three categories in a bit more detail. For the booking data, Google was faced with the same problem, so they used their monopoly and defined a standard in which operators should ideally have machine readable annotations on their booking information. And that's awesome, because we can just use the same system. That's what nowadays became schema.org, which I think Lucas mentioned in the morning as well. At least in the US and Europe, you'll find that in about 30 to 50% of booking emails you get from hotels, airlines or event brokers. So that's a good start. But then there's the rest, which is basically unstructured data, random PDF files or HTML emails we have to work with.

There's Apple Wallet boarding passes. They are somewhat semi-structured and most widespread for flight tickets. Well, that's somewhat usable. And barcodes, so that's what you, again, see on boarding passes or train tickets. I could probably fill an entire talk just with the various details on the different barcode systems. The one for boarding passes, I think, Karsten Nohl had a talk at Congress a few years back, where he showed how they work and what you can do with them.
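To give a rough idea of how such a boarding pass barcode works: its payload is plain text with fixed-width fields. The following is a minimal sketch in Python, not the code used in the project. The field names and widths follow the commonly documented layout of the mandatory part of a single-leg IATA boarding pass barcode and should be treated as an assumption to check against the specification; the sample passenger, booking reference and flight are made up.

```python
# Sketch of parsing the fixed-width mandatory part of a boarding pass
# barcode payload. Field names and widths are an assumption based on the
# commonly documented IATA layout, not taken from the talk.
FIELDS = [
    ("format_code", 1),
    ("number_of_legs", 1),
    ("passenger_name", 20),
    ("eticket_indicator", 1),
    ("booking_reference", 7),
    ("from_airport", 3),
    ("to_airport", 3),
    ("carrier", 3),
    ("flight_number", 5),
    ("day_of_year", 3),
    ("compartment", 1),
    ("seat", 4),
    ("checkin_sequence", 5),
    ("passenger_status", 1),
]
MANDATORY_LENGTH = sum(width for _, width in FIELDS)


def parse_boarding_pass(payload: str) -> dict:
    """Split the mandatory section into named, whitespace-trimmed fields."""
    if len(payload) < MANDATORY_LENGTH:
        raise ValueError("payload shorter than the mandatory section")
    if payload[0] != "M":
        raise ValueError("unexpected format code")
    fields = {}
    offset = 0
    for name, width in FIELDS:
        fields[name] = payload[offset:offset + width].strip()
        offset += width
    return fields


def build_sample() -> str:
    """Build a made-up payload by padding each value to its field width."""
    values = {
        "format_code": "M",
        "number_of_legs": "1",
        "passenger_name": "DOE/JANE",
        "eticket_indicator": "E",
        "booking_reference": "ABC123",
        "from_airport": "BER",
        "to_airport": "LEJ",
        "carrier": "XX",
        "flight_number": "0123",
        "day_of_year": "361",
        "compartment": "Y",
        "seat": "012A",
        "checkin_sequence": "0042",
        "passenger_status": "0",
    }
    return "".join(values[name].ljust(width) for name, width in FIELDS)


if __name__ == "__main__":
    for key, value in parse_boarding_pass(build_sample()).items():
        print(f"{key}: {value}")
```

Note that the name and the booking reference sit in the payload as readable text, which is why a photo of a boarding pass gives away more than it seems to.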
Instagram #boardingpass is a very nice source of test data. The one that you find on German railway tickets is also pretty much researched already. The ones we actually had to break ourselves were the ones for Italy. To my knowledge, we are the first ones to publish the content of those binary barcodes. And we are currently working on the VDV Kernapplikation E-Ticket, which is the standard for German local transportation tickets. That actually has some crypto that you need to get around to actually see the content. So if you're interested in that kind of stuff, there is quite some interesting detail to be found in this.

But let's continue with the static data. There, of course, we have Wikidata. That has almost everything we need, and we are making heavy use of that. And that's also why I'm here today on the Wikimedia stage. One thing that Wikidata doesn't do perfectly is timezone information. That's why we're using the OpenStreetMap data for this. In Wikidata there are three different ways of specifying the time zone: UTC offsets, some kind of coarse, human readable naming like Central European Summer Time, and then the actual IANA time zone specifications like Europe/Berlin. And that's the one we actually need, because they contain daylight saving time transitions. And that is actually crucial for travel assistance, because you can have a flight from, say, the US to Europe, on the night when there is a daylight saving time transition on one end. And if we get that wrong, right, we are off by one hour. And that could mean you miss your flight. So that we need to get absolutely right. And Wikidata there mixes the three timezone variations. So that's why we fall back to OpenStreetMap there.

Another area that still needs work is vendor specific station identifiers. So there's a number of train companies that have their own numeric or alphanumeric identifiers, which you find, for example, in barcodes of tickets.
So that's our way to actually find out where people are traveling. So that's something we are trying to feed into Wikidata as we get our hands on those identifiers. For airports, that's easy because they are internationally standardized. For train stations, that's a bit more messy.

And finally, the dynamic data. That's again an area where we benefit from Google using their monopoly. They wanted to have local public transportation information in Google Maps. So they defined the GTFS format, which is a way for local transport operators to send their schedules to Google. But most of the time, that is done in a way that they basically publish this as Open Data. And that way, all of us get access to it. And then there's Navitia, which is a Free Software implementation of a routing and journey query service that consumes all of that Open Data schedule information. And that, in turn, we can use again to, yeah, find our departure schedules, delays and that kind of live information.

Apple Wallet also has some kind of live updating polling mechanism. But that is somewhat dangerous because it leaks personally identifiable information. Basically, a unique identifier for your pass is sent out with the API request to pull an update. So that is basically just a last resort mechanism if you have nothing else.

And then there's a bunch of vendor specific, more or less proprietary APIs that we could use. They are unfortunately often not compatible with Free Software and Open Source, because they might require API keys that you're not allowed to share, or they have terms and conditions that are simply incompatible with what we are trying to do. So for some, this works, but there's still some room for improvement in those vendors' understanding of the value of proper Open Data access.

OK, so that's the theory, let's have a look at what we have actually built for this.
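One piece of that theory, the daylight saving time pitfall in the time zone part, can be made concrete with a small sketch. It uses Python's standard `zoneinfo` module and is not taken from the project's code. The flight, its departure time and its duration are invented for illustration; the only real-world facts used are the IANA zone names and the European transition on 29 March 2020.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # IANA time zone database, Python 3.9+


def arrival_local(departure: datetime, duration: timedelta, arrival_tz: str) -> datetime:
    """Compute local arrival time from a timezone-aware departure time."""
    arrival_utc = departure.astimezone(timezone.utc) + duration
    return arrival_utc.astimezone(ZoneInfo(arrival_tz))


# Hypothetical overnight flight, departing New York in the evening before
# Europe switches to summer time (early morning of 29 March 2020).
departure = datetime(2020, 3, 28, 22, 0, tzinfo=ZoneInfo("America/New_York"))
duration = timedelta(hours=7, minutes=30)

# With the IANA zone, the daylight saving transition is taken into account.
correct = arrival_local(departure, duration, "Europe/Berlin")

# With only a fixed UTC+1 offset stored for Berlin, the transition is missed.
fixed_cet = timezone(timedelta(hours=1))
naive_fixed_offset = (departure.astimezone(timezone.utc) + duration).astimezone(fixed_cet)

if __name__ == "__main__":
    print("IANA zone:   ", correct.strftime("%Y-%m-%d %H:%M %z"))
    print("fixed offset:", naive_fixed_offset.strftime("%Y-%m-%d %H:%M %z"))
```

Both values describe the same instant, but the wall clock time shown to the traveler differs by one hour, which is exactly the kind of error a UTC offset or a coarse time zone name cannot protect against.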
So there are two, yeah, backend components, so to say. There is the extraction library that implements the schema.org data model for flights, for trains, for hotels, for restaurants and for events. It can do the structured data extraction. That might sound easy at first, but it turns out that for some of the operators, doing proper JSON array encoding is somewhat hard. So, I mean, you need to have a comma in between two objects and brackets around it. Some of them struggle with that. So we have to have lots of workarounds in parsing the data we receive.

Then we have an unstructured extraction system. That's basically small scripts per provider or per operator that then, yeah, use regular expressions or XPath queries, depending on the input, and turn that into our data model. We currently have, I think, slightly more than 50 of those. I know that Apple has about 600, so that is still one order of magnitude more. But it's not impossible. Right. So I think we have the means there with Free Software to come to a similar result to people that have an Apple or Google scale budget for this.

The service coverage is actually quite different. So, for Apple, I've seen their custom extractors. They have a lot of, like, US car rental services. We have somewhat more important stuff like CCC tickets. So the Congress ticket is actually recognized, and I managed to get in with the app.

What the extraction engine also does is augment whatever we find in the input documents with information we have on Wikidata. So we usually have time zones, countries, geo coordinates, all that useful stuff for then offering assistance features on top. And the input formats are basically everything I mentioned. The usual stuff you're getting in an email from a transport operator, or any kind of booking document.

The second piece of the backend components is the public transportation library.
That's basically our client API, for Navitia mainly, but also for some of the widespread proprietary backends like HAFAS. That's the stuff Deutsche Bahn is using. And it can aggregate the results from multiple backends. And if you're using Open Data in the backend - interference noise - it propagates the attribution information correctly. And just a few days ago, it also gained support for querying train and platform layouts, or "Wagenstandsanzeiger" in German, so we can have all of that in the app.

And now of course there's the KDE Itinerary app itself. So it has, oh… it's very hard to read here. It's basically a timeline with the various booking information you have, grouped together by trip. It can insert the live weather information. Again, that's online access, so it's optional, but yeah, it's kind of useful. And this is… you probably can't read that. But that's my train to Leipzig this morning, and that's actually the Congress entry ticket. And the box at the top is the collapsible group for my trip to Leipzig for Congress.

And it can show the actual tickets and barcodes, including Apple Wallet passes. So, if you sometimes have, like, a manual inspection at an airport where they don't scan your boarding pass but look at it, apparently that looks reasonable enough that you can board an aircraft with it. At least, I wasn't arrested so far.

And then we have one of my favorite features, also powered by Wikidata. It's the power plug incompatibility warning. - interference noise - So, I mean, if you're traveling to, say, the US or UK, you're probably aware that they have, like, incompatible power plugs. But there are some countries where this isn't, at least to me, that obvious, like Switzerland or Italy, where only half of my power plugs work. So this is the Italy example. It tells me that my Schuko plugs won't work, only my Europlugs and - interference noise - And the right one is, I think, for the U.K., where nothing is compatible.
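The logic behind such a warning can be sketched in a few lines of Python. This is only an illustration, not the app's implementation: the two tables are deliberately simplified, hand-written assumptions that mirror the examples from the talk, whereas the app takes this kind of information from Wikidata. The function name `incompatible_plugs` is hypothetical.

```python
# Simplified, hand-written illustration data (an assumption, not Wikidata):
# which socket type is common per country, and which socket types a given
# plug fits into.
SOCKETS_BY_COUNTRY = {
    "DE": {"F"},
    "IT": {"L"},
    "CH": {"J"},
    "GB": {"G"},
}
PLUG_FITS_SOCKETS = {
    "Europlug": {"F", "J", "L"},
    "Schuko": {"F"},
}


def incompatible_plugs(my_plugs: list[str], country: str) -> list[str]:
    """Return the plugs that fit none of the destination country's sockets."""
    sockets = SOCKETS_BY_COUNTRY[country]
    return [plug for plug in my_plugs if not (PLUG_FITS_SOCKETS[plug] & sockets)]


if __name__ == "__main__":
    mine = ["Schuko", "Europlug"]
    for country in ("DE", "IT", "CH", "GB"):
        problems = incompatible_plugs(mine, country)
        print(country, "->", problems if problems else "all plugs fit")
```

Keeping the plug-to-socket relation separate from the per-country socket list is what allows a partial warning like the Italy case, instead of a plain yes or no per country.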
If you occasionally forget your power plug converter while traveling, that is super useful.

And then, of course, we have the integration with real time data. So we can show the delay information and platform changes. The part in the middle is the alternative connection selection for trains. So if you have, like, a train ticket that isn't bound to a specific connection, right, then the app lets you pick the one you actually want to take. Or if you're missing a connection and you need to move to a different train, you can do that right in the app as well.

The screenshot on the right hand side is, like, your overall travel statistics. So if you're interested in, like, seeing the carbon impact of all your trips and the year over year changes, right, the app shows that to you. And I wasn't really successful, but that's largely because the old data is incomplete. So if you're interested in that, right, since we have all the data, that can help you see if you're actually on the right track there.

And then to get data into that, we also have a plugin for email clients. This one is for KMail. So it basically runs the extraction on the email you're currently looking at, and it shows you a summary of what's in there. In this case, my train to Leipzig this morning, including the option to add that to the calendar or send it to the app on the phone.

We also have the browser extension. So this is the website of the yearly KDE conference, which has the schema.org annotations on it. And the browser extension recognizes that. And again, it offers me to add that either to my calendar or to the itinerary app. And that also works on many restaurant websites or event websites. They have those annotations on the website for the Google search. So again, we benefit a bit from the Google - incomprehensible -.

OK, then we get to the more experimental stuff that basically was just finished in the last couple of days, and that we haven't shown anywhere else publicly yet.
The first one is, and that's a bit better to read, at least: if you saw the timeline earlier, right, it had my train booking to Leipzig and then the Congress ticket. But that still leaves two gaps, right. I need to get from home to the station in Berlin, and I need to get from the station in Leipzig to Congress. And what we have now is a way for the app to automatically recognize those gaps and fill them with suggestions on what kind of local transport you could take. So here the one for Leipzig to Congress is expanded and shows the tram. That still needs some work to do live tracking, so that it accounts for delays and changes your alarm clock in the morning if there are delays on that trip. But we have all the building blocks to make the whole thing much smarter in this area now.

And that, I think, was literally done yesterday. So that's why the graphics are still very basic. That's the train layout, the coach layout display for your trip, so that you know where your reserved seat on the train can actually be found.

Then, I only showed the KMail plugin so far. We also have a work-in-progress Thunderbird integration, which is probably the much more widespread email client. Featurewise, it's more or less the same as I showed for KMail, so it scans the email and displays your summary and offers you to put that into the app or, possibly later on, also into the calendar.

This one is even more experimental. I can only show you a screenshot of the Web Inspector proving that it managed to extract something. That's the integration with Nextcloud. I hope we'll have an actual working prototype for this in January then. Those two things are, of course, important for you to even get to the data, the booking data, that the app or other tools you build on top can then consume.

OK, so where to get this from? There's the wiki link up there. The app is currently not yet in the Play Store or in the F-Droid master repository. We have an F-Droid nightly build repository.
I hope that within the next month we'll get actual official releases in the easier to reach stores than what we have right now.

If you are interested in helping with that, there's some stuff in Wikidata where improvement on the data directly benefits this work, and that is specifically around train stations. I think in Germany, last time I checked, we still had a few hundred train stations that didn't have geo coordinates or even a human readable label. So that's something to look at. Vendor-specific or even the more or less standard train station identifiers are something to look at. So UIC or IBNR codes for train stations, that helps a lot.

Yeah. And then, we kind of need test data for the extractions. So, forget everything I said about privacy. If you have any kind of booking documents or emails you want to donate to support this and get the providers you're using supported in the extraction engine, talk to me. That would be extremely useful. Yeah, that's it. Thank you.

Applause

Herald: Hello, hello? Yeah. That's a very impressive project, I think. Do we have questions? Then I'll hand you my microphone. Yes.

Q: Would it be possible to extract platform lift data for train stations?

A: Sorry? Platform….

Q: Platform lift data.

A: Oh, I think Deutsche Bahn has an Open Data API for the live status of lifts. That would, of course, in theory be possible. What we are trying to do is to be generic enough so that this is not applicable in just one country, although it is very European focused because most of the team is there. But lifts are something that is easy enough to generalize in a data model, right? Their location on the platform, and are they working or not? So, yeah, that would be a nice addition. That goes into the entire direction of, yeah, indoor navigation or navigation around larger train stations and airports.
So that's probably something where we could use a better overall display with the OpenStreetMap data and then augment that with, like, where exactly your train is stopping and in which coach your seat is, and then have the lift data so we can basically guide you to the right place in a better way. Yeah.

Herald: Any more questions? Yes.

Q: Is the mobile app written in Qt as well?

A: Yes, most of this is C++ code, because that's what we use at KDE. The mobile client as well. There's a bit of Java for platform integration with Android. I don't think anyone has ever tried to build it on iOS, but of course it works on Linux based mobile platforms as well, thanks to Qt and C++, yeah.

Q: So you mostly talked about the mobile app so far, which is understandable, but as it's a QML application, does it also run on desktop? And, a second question, how do all the plugins and the different instances of the app share their data?

A: So, yes, the app runs on desktop. I was trying to see if I can actually start it here. I'm not sure on which screen it will end up. That's where we do most of the development. Let me see if I can move it over. Oh, thank you. And I need to find my mouse cursor on the two screens. Uh. I think I need to end the presentation first, but, yeah, short answer, of course. There we go. And let me switch to… to… yeah, so that's it, running on desktop. It has a mobile UI there. That could, of course, be extended to be more useful on the desktop as well. And in terms of storage, that is currently internal to the app; there is no second process accessing the actual data storage. That would just unnecessarily complicate it for now. But if there is a use for that, yeah, we'll need to see.

Q: Yeah, but there was an option, in the e-mail plugin for example, to send it to the app. Can I then only send it to my local app and not to the mobile app?

A: Oh, the send to app option, that's using KDE Connect.
That's an integration software that allows you to remote control your phone from the desktop. So that's basically bundling up all the information and sending it to the app on the phone. And… or it can import it locally, so.

Herald: OK, do we have other questions? No, we don't have time? So then, thank you very much, Volker. Maybe you can tell people where they can find you if they have anything more they want to talk about. But….

A: Yeah, I mean, there's my email address, and otherwise I'll be around all day, all four days.

Herald: Around where?

Volker Krause: Probably somewhere. So it just is a bit tricky.

Herald: …catch him before he runs away, then! All right. So give a round of applause again, and thank you, Volker!

Applause

postroll music

Subtitles created by c3subtitles.de in the year 2021. Join, and help us!