-
36c3 preroll music
-
Herald: So, hey, we're finally ready to
start, we have Volker Krause here with a
-
privacy by design travel assistant and
it's going to be about building Open
-
Source travel assistants, I think, and
this talk will be in English. And if you
-
want translations: if you would like a German
translation, we also have some great
-
interpreters back here in our booth;
you can listen in at
-
c3lingo.org and hear them interpret everything
live. Right. Now, let's have
-
a warm welcome for Volker here and have
fun with his talk.
-
Applause
-
Volker Krause: Thank you. OK, so what is
this about? You probably know those
-
features in, most prominently Google
Mail, but I think TripIt was the one that
-
pioneered this. So GMail reads your
email and then detects any kind of booking
-
information in there, like your boarding
passes, your train tickets, your hotel
-
bookings and so on. And it can integrate
that into your calendar and can present
-
you a unified itinerary for your entire
trip and monitor that for changes. And
-
all of that doesn't cost you anything.
Maybe apart from a bit of your privacy.
-
Well, not too bad, you might think. But if
you look at what kind of data is actually
-
involved in just your travel. Right.
The obvious things that come to
-
mind, your name, your birthday, your
credit card number, your passport number,
-
that kind of information. Right. But that
isn't even the worst part on this,
-
because those operators don't just get to
see your specific data for one trip,
-
right? They get to see every… everyone's
trip. And now if you combine that
-
information, that actually uncovers a lot
of information about... relations
-
between people, your interests, who
you work for, where you live and all of
-
that. Right. So pretty much everyone here
traveled to Leipzig for the last four days
-
in the year. If that happens for
two of us, once, right, that might be
-
coincidence. If that happens two or three
years in a row, that is some kind of
-
information. But yeah, what to do
about that, right? The easy solution
-
is, just not use those services. It's like
first world luxury stuff anyway. That
-
works until you end up in a foreign
country where you don't speak any of the
-
local languages and then get introduced to
their counterpart of Schienenersatzverkehr
-
or Tarifzonenrandgebiet. And at that
point, you might be interested in actually
-
understanding what's happening on your
trip in some form that you
-
actually understand and that you are
familiar with, ideally without installing
-
15 different vendor applications for
whatever you actually might be
-
traveling, right? So we need something
better. And that obviously leads us to,
-
let's do it ourselves. Then, we can at
least design this for privacy right from
-
the start. Build it on top of Free
Software and Open Data. Well, of course,
-
we need to... at least it's not entirely
obvious that this will actually work,
-
right? Google and Apple have a
totally different amount of resources
-
available for this. So, can we actually
build this ourselves? So let's have a look
-
at what those services actually need to
function. And it turns out it's primarily
-
about data, not so much about code.
There are some difficult parts
-
in terms of code involved as well, like
the image processing on a PDF to detect a
-
barcode in your boarding pass. But all of
that exists as ready-made building blocks.
-
So you basically just need to put this
nicely together. So let's look at the
-
data. That's the more interesting part.
And in general, that breaks down to
-
three different categories. The first one
is what I call personal data here. So
-
that's basically booking information,
documents or tickets, boarding passes,
-
specific for you. So there at least you
don't have a problem with access because
-
that is sent to you and you need to have
access to that. But it comes in all kinds
-
of forms and shapes. So there the
challenge is to actually extract it. The
-
second kind of data is what I would call
static data. So, for example, the location
-
of an airport. Now, you could argue that
that could change and there are rumors
-
that some people apparently managed to
build new airports. I live in Berlin, so I
-
don't believe this. Jokes aside, so,
"static" refers to within, static within
-
the release cycle of the software. So
several weeks or a few months. So this is
-
stuff that we can ship as offline
databases. And offline, of course, helps
-
us with privacy because then you're not
observable from the outside. And the third
-
category is dynamic data. So stuff that is
very, very short lived, such as delay
-
information. There is no way we can do
that offline. If we want that kind of
-
information, we will always need some kind
of online querying. Then let's look
-
through those three categories in a bit
more detail. For the booking data, Google
-
was faced with the same problem, so they
used their monopoly and defined a standard
-
in which operators should ideally have
machine readable annotations on their
-
booking information. And that's awesome,
because we can just use the same, the same
-
system. That's what nowadays became
schema.org, which I think Lucas mentioned
-
in the morning as well. At least in
the US and Europe, you'll find
-
that in about 30 to 50% of booking emails
you get from hotels, airlines or event
-
brokers. So that's a good start. But then
there's the rest, which is basically
-
unstructured data, random PDF files or
HTML emails we have to work with. There's
-
Apple wallet boarding passes. They are
somewhat semi structured and most
-
widespread for flight tickets. Well,
that's somewhat usable. And barcodes, so
-
that's what you, again, see on boarding
passes or train tickets. I could
-
probably fill an entire talk just with the
various details on the different
-
barcode systems, the one for boarding
passes, I think, Karsten Nohl had a talk
-
at Congress a few years back, where he
showed how they work and what you can do
-
with them. Instagram #boardingpass is a
very nice source of test data. The one
-
that you find on, on German railway
tickets is also pretty much researched
-
already. The ones we actually had to
break ourselves were the ones for Italy. I
-
think to my knowledge, we are the first
ones to publish the content of those
-
binary barcodes. And we are currently
working on the VDV Kernapplikation
-
E-Ticket, which is the standard for German
local transportation tickets. That
-
actually has some crypto that you need to
get around to actually see the content. So
-
there is, if you're interested in that
kind of stuff, there is quite some
-
interesting detail to be found in this.
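As one concrete illustration of the barcode formats mentioned here, the text payload of an airline boarding pass barcode can be split into fixed-width fields. The sketch below is a minimal example and not the project's code: the field names are made up for readability, the widths follow the publicly documented mandatory section of the IATA bar-coded boarding pass layout for a single leg as understood here, and the sample pass is invented.

```python
from datetime import date, timedelta

# (name, width) pairs for the mandatory section of a single-leg pass.
# Names are hypothetical; widths are an assumption based on the public layout.
FIELDS = [
    ("format_code", 1), ("legs", 1), ("passenger_name", 20),
    ("eticket_indicator", 1), ("booking_reference", 7),
    ("from_airport", 3), ("to_airport", 3), ("carrier", 3),
    ("flight_number", 5), ("day_of_year", 3), ("cabin_class", 1),
    ("seat", 4), ("checkin_sequence", 5), ("passenger_status", 1),
]

# Invented sample payload, assembled field by field so the padding is exact.
SAMPLE = (
    "M" + "1" + "DOE/JOHN".ljust(20) + "E" + "ABC123".ljust(7)
    + "FRA" + "LEJ" + "XX".ljust(3) + "0123".ljust(5) + "361"
    + "Y" + "012A" + "0001".ljust(5) + "1" + "00"
)


def parse_bcbp(payload):
    """Split the mandatory fixed-width fields out of a boarding pass payload."""
    needed = sum(width for _, width in FIELDS)
    if len(payload) < needed or payload[0] != "M":
        raise ValueError("not a recognizable boarding pass payload")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = payload[pos:pos + width].strip()
        pos += width
    return fields


def flight_date(fields, year):
    """Turn the day-of-year field into a date; the year must come from context."""
    return date(year, 1, 1) + timedelta(days=int(fields["day_of_year"]) - 1)


if __name__ == "__main__":
    parsed = parse_bcbp(SAMPLE)
    print(parsed["passenger_name"], parsed["from_airport"], parsed["to_airport"])
    print(flight_date(parsed, 2019))
```

The payload carries only the day of the year, so the year has to be taken from context such as the date of the email the pass arrived in.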
But let's continue with the
-
static data. There, of course, we have
Wikidata. That has almost everything we
-
need. And we are making heavy use of that.
And that's also why I'm here today on the
-
Wikimedia stage. One thing that
Wikidata doesn't do perfectly is timezone
-
information. That's why we're using the
OpenStreetMap data for this. There are, in
-
Wikidata, three different time zone… or
ways of specifying the time zone. UTC
-
offsets, some kind of coarse, human
readable naming like Central European
-
Summer Time, and then the actual IANA time
zone specifications like Europe/Berlin.
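The difference between a stored UTC offset and a real time zone can be shown with a small worked example. This is only a sketch under stated assumptions: the EU summer-time rule is written out by hand as a stand-in for the IANA database, which real code would consult instead, and the arrival time is invented.

```python
from datetime import datetime, timedelta, timezone


def last_sunday(year, month):
    """Midnight UTC on the last Sunday of a 31-day month (March or October)."""
    day = datetime(year, month, 31, tzinfo=timezone.utc)
    # weekday(): Monday is 0 and Sunday is 6, so step back to the Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def central_european_offset(moment_utc):
    """UTC offset under the EU rule: both switches happen at 01:00 UTC."""
    start = last_sunday(moment_utc.year, 3).replace(hour=1)
    end = last_sunday(moment_utc.year, 10).replace(hour=1)
    return timedelta(hours=2 if start <= moment_utc < end else 1)


def local_time(moment_utc, offset):
    """Express a UTC moment in local time for the given offset."""
    return moment_utc.astimezone(timezone(offset))


# Invented overnight arrival on the morning of the 2019 spring transition.
arrival_utc = datetime(2019, 3, 31, 6, 30, tzinfo=timezone.utc)

# A bare "+01:00" captured the day before knows nothing about the switch.
stale = local_time(arrival_utc, timedelta(hours=1))
# A zone with transition rules yields the offset valid at that moment.
correct = local_time(arrival_utc, central_european_offset(arrival_utc))

if __name__ == "__main__":
    print(stale.strftime("%H:%M"), correct.strftime("%H:%M"))
```

The two results differ by exactly one hour, which is the error a fixed offset introduces on a transition night.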
-
And that's the one we actually need
because they contain daylight saving time
-
transitions. And that is actually crucial
for travel assistance, because you
-
can have a flight from, say, the US to
Europe, on the night when there is a
-
daylight saving time transition on one
end. And if we get that wrong, right, we
-
are off by one hour. And that could mean
you miss your flight. So that we
-
need to get absolutely right. And
Wikidata there mixes the three timezone
-
variations. So that's why we fall back to
OpenStreetMap there. Another area
-
that still needs work is vendor specific
station identifiers. So there's a number
-
of train companies that have their own
numeric identifier, or alphanumeric
-
identifiers, which you find, for example,
in barcodes of tickets. So that's our
-
way to actually find out where people are
traveling. So that's something we are
-
trying to feed into Wikidata as we get our
hands on those identifiers. For airports,
-
that's easy because they are
internationally standardized. For train
-
stations, that's a bit more messy. And
finally, the dynamic data. That's again,
-
an area where we benefit from Google using
their monopoly. They wanted to have local
-
public transportation information in
Google Maps. So they defined the GTFS
-
format, which is a way for local transport
operators to send their schedules to
-
Google. But most of the time, that is done
in a way that they basically publish this
-
as Open Data. And that way, all of us get
access to it. And then there's Navitia,
-
which is a Free Software implementation of
like a routing and journey query service
-
that consumes all of those Open Data
schedule information. And that then in
-
turn, we can use again to, yeah, find our
departure schedules, delays and that
-
kind of live information. Apple Wallet
also has some kind of live updating
-
polling mechanism. But that is somewhat
dangerous because it leaks personally
-
identifiable information. So the,
basically, a unique identifier for your
-
pass is sent out with the API request to
pull an update. So that is basically
-
just a last resort mechanism if you have
nothing else. And then there's a bunch of
-
vendor specific, more or less proprietary
APIs that we could use. They are
-
unfortunately not often compatible with
Free Software and Open Source, because,
-
they might require API keys that you're
not allowed to share, or they have terms
-
and conditions that are simply
incompatible with what we are trying to
-
do. So for some, this works, but there's
still some room for improvement in those
-
vendors' understanding the value of proper
Open Data access. OK, so that's the
-
theory, let's have a look at what we have
actually built for this. So there's two,
-
yeah, backend components, so to say. There is
the extraction library that implements the
-
schema.org data model for flights,
for trains, for hotels, for restaurants
-
and for events. It can do the structured
data extraction. That might sound easy at
-
first, but it turns out that for some of
the operators, doing proper JSON array
-
encoding is somewhat hard. So, I mean, you
need to do a... need to have a comma in
-
between two objects and brackets around
it. Some of them struggle with that. So we
-
have to have lots of workarounds in, in
parsing the data we receive. Then we have
-
an unstructured extraction system that's
basically small scripts per provider
-
or per operator that then, yeah, use
regular expressions or XPath queries
-
depending on the input and turn that into
our data model. We currently, I think,
-
have 50, slightly more than 50 of those. I
know that Apple has about 600, so that is
-
still one order of magnitude more. But
it's not impossible. Right. So I think we,
-
we have the means there with Free Software
to come to a similar result as
-
people that have an Apple or Google scale
budget for this. The service coverage is
-
actually quite different. So, for Apple,
I've seen their custom extractor.
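The two extraction paths described above can be sketched roughly as follows. This is an illustration, not the library's actual code: the class and function names, the naive repair step for arrays that lack commas or brackets, and the sample provider pattern are all assumptions made for the example.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def parse_lenient(text):
    """Parse JSON-LD, tolerating objects listed without commas or brackets."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Naive repair: add missing commas, then wrap in brackets if needed.
        # It would corrupt string values containing "}{"; fine for a sketch.
        repaired = re.sub(r"\}\s*\{", "},{", text.strip())
        if not repaired.startswith("["):
            repaired = "[" + repaired + "]"
        data = json.loads(repaired)
    return data if isinstance(data, list) else [data]


# Hypothetical pattern for one provider's plain-text confirmation line.
TRAIN_LINE = re.compile(
    r"Train (?P<train>[A-Z]+ \d+) from (?P<origin>.+?) to (?P<dest>.+?) "
    r"on (?P<date>\d{4}-\d{2}-\d{2}) at (?P<time>\d{2}:\d{2})"
)


def extract_unstructured(text):
    """Provider-specific fallback: regex match mapped onto schema.org-style data."""
    m = TRAIN_LINE.search(text)
    if not m:
        return []
    return [{
        "@type": "TrainReservation",
        "reservationFor": {
            "@type": "TrainTrip",
            "trainNumber": m["train"],
            "departureStation": {"@type": "TrainStation", "name": m["origin"]},
            "arrivalStation": {"@type": "TrainStation", "name": m["dest"]},
            "departureTime": m["date"] + "T" + m["time"],
        },
    }]


def extract(document):
    """Try structured annotations first, then fall back to the custom pattern."""
    collector = JsonLdCollector()
    collector.feed(document)
    collector.close()
    results = []
    for block in collector.blocks:
        try:
            results.extend(parse_lenient(block))
        except json.JSONDecodeError:
            continue  # unrecoverable markup: skip this block
    return results or extract_unstructured(document)
```

Trying the structured annotations first keeps the per-provider patterns as a fallback for documents that carry no usable markup.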
-
So they have a lot of like US car rental
services. We have somewhat more important
-
stuff like CCC tickets. So the Congress
ticket is actually recognized and I
-
managed to get in with the app. What
the extraction engine also does is it
-
augments whatever we find in the input
documents by information we have on
-
Wikidata. So we usually have time zones,
countries, geo coordinates, all that
-
useful stuff for then offering assistance
features on top. And the input formats are
-
basically everything I mentioned. The
usual stuff you're getting in an email
-
from a transport operator or any kind
of booking document. The second piece on
-
like, on backend components is the public
transportation library. That's basically
-
our client API for Navitia mainly, but
also for some of the proprietary
-
widespread backends like HAFAS. That's the
stuff Deutsche Bahn is using. And it can
-
aggregate the results from multiple
backends. And if you're using Open Data in
-
the backend - interference noise - it
propagates the attribution information
-
correctly. So. And just a few days ago, it
also gained support for querying train and
-
platform layouts or "Wagenstandsanzeiger"
in German so we can have all of that in
-
the app. And now of course there's the KDE
Itinerary app itself. So it has, oh… it's
-
very hard to read here. It's basically a
timeline with the various booking
-
information you have grouped together by
trip. It can insert the live weather
-
information. Again, that's online access,
so it's optional, but yeah, it's kind of
-
useful. And this is… you probably can't
read that. But that's my train to Leipzig
-
this morning and that's actually the
Congress entry ticket. And the box at the
-
top is the collapsible group for my trip
to Leipzig for Congress. And it can
-
show the actual tickets and barcodes,
including Apple Wallet passes. So, if you
-
sometimes have a, like a manual inspection
at an airport where they don't scan your
-
boarding pass, but look at it, apparently
that looks reasonable enough that you can
-
board an aircraft with it. At least, I
wasn't arrested so far. And then we have
-
one of my favorite features, also powered
by Wikidata. It's the power plug
-
incompatibility warning. - interference
noise - So, I mean, if you're traveling
-
to, say, the US, or UK, you're probably
aware that they have like incompatible
-
power plugs. But there are some
countries where this isn't – at least to
-
me, isn't that obvious, like Switzerland
or Italy, where only half of my power
-
plugs work. So this is the Italy example.
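At its core, such a warning compares the plugs a traveller owns with the sockets at the destination. The following is a minimal sketch of that idea; the tables are simplified sample data chosen to mirror the examples in the talk, not values taken from Wikidata, and the function name is hypothetical.

```python
# Socket types (letter codes) that each plug type fits into.
# Simplified, illustrative data; a real table would be more nuanced.
FITS_INTO = {
    "Europlug": {"C", "E", "F", "J", "K", "L"},
    "Schuko": {"F"},
}

# Socket types assumed per country for this example only.
SOCKETS = {
    "DE": {"F"},
    "IT": {"L"},
    "GB": {"G"},
}


def plug_report(my_plugs, country):
    """Split the traveller's plugs into those that fit and those that do not."""
    sockets = SOCKETS[country]
    working = [p for p in my_plugs if FITS_INTO[p] & sockets]
    failing = [p for p in my_plugs if not (FITS_INTO[p] & sockets)]
    return working, failing


if __name__ == "__main__":
    for country in ("IT", "GB"):
        working, failing = plug_report(["Schuko", "Europlug"], country)
        print(country, "works:", working, "needs adapter:", failing)
```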
It tells me that my Schuko plugs won't
-
work, only my Europlugs and - interference
noise - And the right one is, I
-
think for the U.K., where nothing is
compatible. If you occasionally forget
-
your power plug converter while traveling,
that is super useful. And then, of course,
-
we have the integration with real
time data. So we can show the delay
-
information and platform changes. The part
in the middle is the alternative
-
connection selection for trains. So
if you have a, like a train ticket that
-
isn't bound to a specific connection,
right, then the app lets you pick the one
-
you actually want to take. Or if you're
missing a connection, you need to move to
-
a different train, you can do that right
in the app as well. The screenshot on the
-
right hand side is the, like your overall
travel statistics. So if you're interested
-
in, like, seeing the carbon impact of
all your trips and the year over year
-
changes, right, the app shows that to you.
And I wasn't really successful, but that's
-
largely because the old data is
incomplete. So if you're interested in
-
that, right, since we have all the data,
that can help you see if you're
-
actually on the right track there. And
then to get data into that, we also have a
-
plugin for email clients. This one is for
KMail. So it basically then runs the
-
extraction on the email you're
currently looking at and it shows you a
-
summary of what's in there. In this case,
my train to Leipzig this morning,
-
including the option to add that to the
calendar or send it to the app on the
-
phone. We also have the browser extension.
So this is the website of the yearly KDE
-
conference, which has the schema.org
annotations on it. And the browser
-
extension recognizes that. And again,
offers to add that either to my
-
calendar or to the itinerary app. And that
also works on many restaurant websites or
-
event websites. They have those
annotations on the website for the Google
-
search. So again, we benefit a bit from
the, Google incomprehensible. OK, then
-
we get to the more experimental stuff that
basically just was finished in the last
-
couple of days, that we haven't shown
anywhere else publicly yet. The first one
-
is, and that's a bit better to read, at
least, if you saw the timeline earlier,
-
right, it had my train booking to Leipzig
and then the Congress ticket. But that
-
still leaves two gaps, right. I need to
get from home to the station in Berlin,
-
and I need to get from the station in
Leipzig to Congress. And what we have now
-
is a way for the app to automatically
recognize those gaps and fill them with
-
suggestions on what kind of local
transport you could take. So here the one
-
for Leipzig to Congress is expanded
and shows the tram. That still needs
-
some work to do live tracking so that
it accounts for delays and changes your
-
alarm clock in the morning if there's
delays on that trip. But we have
-
all the building blocks to make the
whole thing much smarter in this
-
area now. And that, I think was literally
done yesterday. So that's why the graphics
-
still are very basic. That's the train
layout, coach layout display for
-
your trip. So that you know where your
reserved seat on the train can actually be
-
found. Then, I only showed the KMail
plugin so far. We also have a work-in-
-
progress Thunderbird integration, which is
probably the much more widespread email
-
client. Featurewise, more or less the same
I showed for KMail, so it scans the email
-
and displays a summary and offers
to put that into the app or, possibly
-
later on also into the calendar.
This one is even more experimental. I can
-
only show you a screenshot of Web
Inspector proving that it managed to
-
extract something. That's the integration
with Nextcloud. I hope we'll have an
-
actual working prototype for this in
January then. Those two things are, of
-
course, important for you to even get
to the data, the booking data, that then
-
the app or other tools you build on top
can consume. OK, so where to get this
-
from? There's the wiki link up there. The
app is currently not yet in the Play Store
-
or in the F-Droid master repository. We
have an F-Droid nightly build repository.
-
I hope that within the next month we'll
get actual official releases in the easier
-
to reach stores than what we have
right now. If you are interested in
-
helping with that, there's some stuff in
Wikidata where improvement on the data
-
directly benefits this work, and that is
specifically around train stations. I
-
think in Germany, last time I checked, we
still had a few hundred train stations
-
that didn't have geo coordinates or even a
human readable label. So that's something
-
to look at. Vendor-specific or even the
more or less standard train station
-
identifiers are something to look at. So
UIC or IBNR codes for train stations,
-
that helps a lot. Yeah. And then, we kind
of need test data for the extractions. So,
-
forget everything I said about privacy. If
you have any kind of booking documents or
-
emails you want to donate to support this
and get the providers you're using
-
supported in the extraction engine,
talk to me. That would be extremely
-
useful. Yeah, that's it. Thank you.
-
Applause
-
Herald: Hello, hello? Yeah. That's a very
impressive project, I think. Do we have
-
questions? Then I'll hand you my
microphone. Yes.
-
Q: Would it be possible to extract
platform lift data for train stations?
-
A: Sorry? Platform….
Q: Platform lift data.
-
A: Oh, I think Deutsche Bahn has an Open
Data API for the live status of lifts.
-
That would, of course, in theory be
possible. What we are trying to do is to
-
be generic enough so that this might not
be applicable in just one country,
-
although it is very European focused
because most of the team is there. But
-
lifts is something that is easy enough to
generalize in a data model, right? Its
-
location on the platform, and, are they
working or not? So, yeah, that would
-
be a nice addition. That goes into the
entire direction of, yeah, indoor navigation
-
or navigation around larger train stations
and airports. So that's probably something
-
where we could use a better overall
display with the OpenStreetMap data and
-
then augment that with, like the, where
exactly is your train stopping and in
-
which coach is your seat, and then have
the lift data so we can basically guide
-
you to the right place in a better
way. Yeah.
-
Herald: Any more questions? Yes.
Q: Is the mobile app written in Qt as
-
well?
A: Yes, most of this is C++ code, because
-
that's what we use at KDE. The mobile
client as well. There's a bit of Java for
-
platform integration with Android. I don't
think anyone has ever tried to build it on
-
iOS, but of course it works on Linux based
mobile platforms as well, thanks to Qt and
-
C++, yeah.
Q: So you mostly talked about the mobile
-
app so far, which is understandable, but
as it's a QML application does it also run
-
on desktop? And, a second question, how
do, how do all the plugins and the
-
different instances of the app share their
data?
-
A: So, yes, the app runs on desktop. I was
trying to see if I can actually start it
-
here. I'm not sure on which screen it will
end up. That's where we do most of the
-
development. Let me see if I can move it
over. Oh, thank you. And I need to find my
-
mouse cursor on the two screens. Uh. I
think I need to end the presentation
-
first, but, yeah, short answer, of course.
There we go. And let me switch to… to…
-
yeah, so that's it, running on
desktop. It has a mobile UI there. That
-
could, of course, be extended to be more
useful on the desktop as well. And in
-
terms of storage, that is currently
internal to the app, there is no second
-
process accessing the actual data storage.
That would just unnecessarily complicate
-
it for now. But if there is a use for
that, yeah, we'll need to see.
-
Q: Yeah, but, but, but there was an
option, in the e-mail plugin, for example,
-
to send it to the app. Can I then only
send it to my local app and not to the
-
mobile app?
A: Oh, the "send to app" option, that's using
-
KDE Connect. That's an integration
software that allows you to remote control
-
your phone from the desktop. So it
basically bundles up all the information
-
and sends it to the app on the phone. And…
or it can import it locally, so.
-
Herald: OK, do we have other questions?
No, we don't have time? So then, thank you
-
very much, Volker, maybe you can tell
people where they can find you if they
-
have anything more they want to talk
about. But….
-
A: Yeah, I mean, there's my email
address and otherwise I'll be around all
-
day, all four days.
Herald: Around where?
-
Volker Krause: Probably somewhere. So it
just is a bit tricky.
-
Herald: …catch him before he runs away,
then! All right. So give a round of
-
applause again and thank you, Volker!
-
Applause
-
postroll music
-
Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!