36c3 preroll music
Herald: So, hey, we're finally ready to
start, we have Volker Krause here with a
privacy by design travel assistant and
it's going to be about building Open
Source travel assistants, I think, and
this talk will be in English. And if you
want translations: if you would like a
German translation, we also have some
great interpreters in our booth at the
back; you can listen in on c3lingo.org to
hear them interpret everything live.
Right. Now, let's have
a warm welcome for Volker here and have
fun with his talk.
Applause
Volker Krause: Thank you. OK, so what is
this about? You probably know those
features in, most prominently Google
Mail, but I think TripIt was the one that
pioneered this. So Gmail reads your
email and then detects any kind of booking
information in there, like your boarding
passes, your train tickets, your hotel
bookings and so on. And it can integrate
that into your calendar and can present
you a unified itinerary for your entire
trip and monitor that for changes. And
all of that doesn't cost you anything.
Maybe apart from a bit of your privacy.
Well, not too bad, you might think. But if
you look at what kind of data is actually
involved in just your travel. Right.
The obvious things that come to
mind, your name, your birthday, your
credit card number, your passport number,
that kind of information. Right. But that
isn't even the worst part of this,
because those operators don't just get to
see your specific data for one trip,
right? They get to see every… everyone's
trip. And now if you combine that
information, that actually uncovers a lot
of information about... relations
between people, your interests, who
you work for, where you live and all of
that. Right. So pretty much everyone here
traveled to Leipzig for the last four days
of the year. If that happens for
two of us, once, right, that might be
coincidence. If that happens two or three
years in a row, that is some kind of
information. But yeah, what to do
about that, right? The easy solution
is, just not use those services. It's like
first world luxury stuff anyway. That
works until you end up in a foreign
country where you don't speak any of the
local languages and then get introduced to
their counterpart of Schienenersatzverkehr
or Tarifzonenrandgebiet. And at that
point, you might be interested in actually
understanding what's happening on your
trip in some form that you
actually understand and that you are
familiar with, ideally without installing
15 different vendor applications for
whatever you actually might be
traveling, right? So we need something
better. And that obviously leads us to,
let's do it ourselves. Then, we can at
least design this for privacy right from
the start. Build it on top of Free
Software and Open Data. Well, of course,
we need to... at least it's not entirely
obvious that this will actually work,
right? Google and Apple have a
totally different amount of resources
available for this. So, can we actually
build this ourselves? So let's have a look
at what those services actually need to
function. And it turns out it's primarily
about data, not so much about code.
There are some difficult parts
in terms of code involved as well, like
the image processing on a PDF to detect a
barcode in your boarding pass. But all of
that exists as ready-made building blocks.
So you basically just need to put this
nicely together. So let's look at the
data. That's the more interesting part.
And in general, that breaks down to
three different categories. The first one
is what I call personal data here. So
that's basically booking information,
documents or tickets, boarding passes,
specific for you. So there at least you
don't have a problem with access because
that is sent to you and you need to have
access to that. But it comes in all kinds
of forms and shapes. So there the
challenge is to actually extract that. The
second kind of data is what I would call
static data. So, for example, the location
of an airport. Now, you could argue that
that could change and there are rumors
that some people apparently managed to
build new airports. I live in Berlin, so I
don't believe this. Jokes aside, so,
"static" refers to within, static within
the release cycle of the software. So
several weeks or a few months. So this is
stuff that we can ship as offline
databases. And offline, of course, helps
us with privacy because then you're not
observable from the outside. And the third
category is dynamic data. So stuff that is
very, very short lived, such as delay
information. There is no way we can do
that offline. If we want that kind of
information, we will always need some kind
of online querying. Then let's look
through those three categories in a bit
more detail. For the booking data, Google
was faced with the same problem, so they
used their monopoly and defined a standard
in which operators should ideally have
machine readable annotations on their
booking information. And that's awesome,
because we can just use the same
system. That's what nowadays became
schema.org, which I think Lucas mentioned
in the morning as well. At least in
the US and Europe, you'll find
that in about 30 to 50% of booking emails
you get from hotels, airlines or event
brokers. So that's a good start. But then
there's the rest, which is basically
unstructured data, random PDF files or
HTML emails we have to work with. There
are Apple Wallet boarding passes. They are
somewhat semi-structured and most
widespread for flight tickets. Well,
that's somewhat usable. And barcodes, so
that's what you, again, see on boarding
passes or train tickets. I could
probably fill an entire talk just with the
various details on the different
barcode systems, the one for boarding
passes, I think, Karsten Nohl had a talk
at Congress a few years back, where he
showed how they work and what you can do
with them. Instagram #boardingpass is a
very nice source of test data. The one
that you find on German railway
tickets is also pretty much researched
already. The ones we actually had to
break ourselves were the ones for Italy. I
think to my knowledge, we are the first
ones to publish the content of those
binary barcodes. And we are currently
working on the VDV Kernapplikation
E-Ticket, which is the standard for German
local transportation tickets. That
actually has some crypto that you need to
get around to actually see the content. So
there is, if you're interested in that
kind of stuff, there is quite some
interesting detail to be found in this.
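As one example of that detail: the boarding pass barcodes mentioned above follow IATA's Bar Coded Boarding Pass (BCBP) layout, whose mandatory part for a single leg is plain fixed-width text. The sketch below is only an illustration in Python, not the project's own (C++) code. The field names are made up for readability, the sample passenger, booking code, carrier and flight are hypothetical, and the conditional and security sections of real barcodes are ignored.

```python
from datetime import date, timedelta

# Fixed-width layout of the mandatory single-leg part of a BCBP "M" payload:
# (illustrative field name, width in characters). Widths sum to 60.
BCBP_FIELDS = [
    ("format_code", 1),
    ("number_of_legs", 1),
    ("passenger_name", 20),
    ("eticket_indicator", 1),
    ("pnr", 7),
    ("from_airport", 3),
    ("to_airport", 3),
    ("carrier", 3),
    ("flight_number", 5),
    ("day_of_year", 3),
    ("compartment", 1),
    ("seat", 4),
    ("checkin_sequence", 5),
    ("passenger_status", 1),
    ("conditional_size_hex", 2),
]
MANDATORY_LENGTH = sum(width for _, width in BCBP_FIELDS)


def parse_bcbp(payload: str) -> dict:
    """Split the mandatory part of a BCBP payload into named, stripped fields."""
    if len(payload) < MANDATORY_LENGTH or payload[0] != "M":
        raise ValueError("not a BCBP 'M' payload")
    fields = {}
    pos = 0
    for name, width in BCBP_FIELDS:
        fields[name] = payload[pos:pos + width].strip()
        pos += width
    return fields


def flight_date(day_of_year: str, year: int) -> date:
    """The barcode only stores the day of the year; the year comes from context."""
    return date(year, 1, 1) + timedelta(days=int(day_of_year) - 1)


# Hypothetical sample payload, assembled field by field so the padding is explicit.
sample = (
    "M" + "1" + "DOE/JANE".ljust(20) + "E" + "ABC123".ljust(7)
    + "TXL" + "LEJ" + "XX".ljust(3) + "0123".ljust(5) + "361"
    + "Y" + "012A" + "0042".ljust(5) + "1" + "00"
)
leg = parse_bcbp(sample)
print(leg["passenger_name"], leg["from_airport"], leg["to_airport"],
      flight_date(leg["day_of_year"], 2019))
```

Because the year is not part of the payload, an extractor has to take it from somewhere else, for example the date of the email that carried the boarding pass.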
But let's continue with the
static data. There, of course, we have
Wikidata. That has almost everything we
need. And we are making heavy use of that.
And that's also why I'm here today on the
Wikimedia stage. One thing that
Wikidata doesn't do perfectly is timezone
information. That's why we're using the
OpenStreetMap data for this. In Wikidata,
there are three different time zone… or
ways of specifying the time zone: UTC
offsets, some kind of coarse, human
readable naming like Central European
Summer Time, and then the actual IANA time
zone specifications like Europe/Berlin.
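To illustrate the difference between a fixed UTC offset and an IANA zone, here is a small sketch using Python's standard zoneinfo module (a stand-in chosen for illustration, not what the project uses). The flight, its departure time and its eight-hour duration are made-up assumptions; the only real facts used are that in 2019 daylight saving time ended on 27 October in Berlin and on 3 November in New York.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, requires the system tz database


def arrival_local(departure: datetime, flight_time: timedelta, arrival_tz) -> datetime:
    """Add the flight duration in UTC, then express the result in the arrival zone."""
    arrival_utc = departure.astimezone(timezone.utc) + flight_time
    return arrival_utc.astimezone(arrival_tz)


# Hypothetical overnight flight leaving New York the evening before
# daylight saving time ends in Europe.
departure = datetime(2019, 10, 26, 18, 0, tzinfo=ZoneInfo("America/New_York"))
duration = timedelta(hours=8)

# IANA zone: knows that Berlin switched back to UTC+1 during the flight.
with_iana = arrival_local(departure, duration, ZoneInfo("Europe/Berlin"))

# Fixed offset: still assumes the UTC+2 that was valid at departure time.
with_offset = arrival_local(departure, duration, timezone(timedelta(hours=2)))

print(with_iana.strftime("%Y-%m-%d %H:%M"))    # 2019-10-27 07:00
print(with_offset.strftime("%Y-%m-%d %H:%M"))  # 2019-10-27 08:00
```

Only the IANA zone identifier carries the transition rules needed for the correct result.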
And that's the one we actually need
because they contain daylight saving time
transitions. And that is actually crucial
for travel assistance, because you
can have a flight from, say, the US to
Europe, at the night where there is
daylight saving time transition on one
end. And if we get that wrong, right, we
are off by one hour. And that could mean
you miss your flight. So that we
need to get absolutely right. And
Wikidata there mixes the three timezone
variations. So that's why we fall back to
OpenStreetMap there. Another area
that still needs work is vendor specific
station identifiers. So there's a number
of train companies that have their own
numeric identifier, or alphanumeric
identifiers, which you find, for example,
in barcodes of tickets. So that's our
way to actually find out where people are
traveling. So that's something we are
trying to feed into Wikidata as we get our
hands on those identifiers. For airports,
that's easy because they are
internationally standardized. For train
stations, that's a bit more messy. And
finally, the dynamic data. That's again,
an area where we benefit from Google using
their monopoly. They wanted to have local
public transportation information in
Google Maps. So they defined the GTFS
format, which is a way for local transport
operators to send their schedules to
Google. But most of the time, that is done
in a way that they basically publish this
as Open Data. And that way, all of us get
access to it. And then there's Navitia,
which is a Free Software implementation of
like a routing and journey query service
that consumes all of those Open Data
schedule information. And that then in
turn, we can use again to, yeah, find our
departure schedules, delays and that
kind of live information. Apple Wallet
also has some kind of live updating
polling mechanism. But that is somewhat
dangerous because it leaks personal
identifiable information. So the,
basically, a unique identifier for your
pass is sent out with the API request to
poll for an update. So that is basically
just a last resort mechanism if you have
nothing else. And then there's a bunch of
vendor specific, more or less proprietary
APIs that we could use. They are
unfortunately often not compatible with
Free Software and Open Source, because
they might require API keys that you're
not allowed to share, or they have terms
and conditions that are simply
incompatible with what we are trying to
do. So for some, this works, but there's
still some room for improvement in those
vendors' understanding the value of proper
Open Data access. OK, so that's the
theory, let's have a look at what we have
actually built for this. So there are two,
yeah, backend components, so to say. There is
the extraction library that implements the
schema.org data model for flights,
for trains, for hotels, for restaurants
and for events. It can do the structured
data extraction. That might sound easy at
first, but it turns out that for some of
the operators, doing proper JSON array
encoding is somewhat hard. So, I mean, you
need to do a... need to have a comma in
between two objects and brackets around
it. Some of them struggle with that. So we
have to have lots of workarounds in
parsing the data we receive. Then we have
an unstructured extraction system that's
basically small scripts per provider
or per operator that then, yeah, use
regular expressions or XPath queries
depending on the input and turn that into
our data model. We currently, I think,
have 50, slightly more than 50 of those. I
know that Apple has about 600, so that is
still one order of magnitude more. But
it's not impossible. Right. So I think we
have the means there with Free Software
to come to a similar result as
people that have an Apple or Google scale
budget for this. The service coverage is
actually quite different. So, for Apple,
I've seen their custom extractor.
So they have a lot of like US car rental
services. We have somewhat more important
stuff like CCC tickets. So the Congress
ticket is actually recognized and I
managed to get in with the app. What
the extraction engine also does is it
augments whatever we find in the input
documents by information we have on
Wikidata. So we usually have time zones,
countries, geo coordinates, all that
useful stuff for then offering assistance
features on top. And the input formats are
basically everything I mentioned. The
usual stuff you're getting in an email
from a transport operator or any kind
of booking document. The second piece on
like, on backend components is the public
transportation library. That's basically
our client API for Navitia mainly, but
also for some of the proprietary
widespread backends like HAFAS. That's the
stuff Deutsche Bahn is using. And it can
aggregate the results from multiple
backends. And if you're using Open Data in
the backend - interference noise - it
propagates the attribution information
correctly. So. And just a few days ago, it
also gained support for querying train and
platform layouts or "Wagenstandsanzeiger"
in German so we can have all of that in
the app. And now of course there's the KDE
Itinerary app itself. So it has, oh… it's
very hard to read here. It's basically a
timeline with the various booking
information you have grouped together by
trip. It can insert the live weather
information. Again, that's online access,
so it's optional, but yeah, it's kind of
useful. And this is… you probably can't
read that. But that's my train to Leipzig
this morning and that's actually the
Congress entry ticket. And the box at the
top is the collapsible group for my trip
to Leipzig for Congress. And it can
show the actual tickets and barcodes,
including Apple Wallet passes. So, if you
sometimes have a, like a manual inspection
at an airport where they don't scan your
boarding pass, but look at it, apparently
that looks reasonable enough that you can
board an aircraft with it. At least, I
wasn't arrested so far. And then we have
one of my favorite features, also powered
by Wikidata. It's the power plug
incompatibility warning. - interference
noise - So, I mean, if you're traveling
to, say, the US, or UK, you're probably
aware that they have like incompatible
power plugs. But there are some
countries where this isn't – at least to
me, isn't that obvious, like Switzerland
or Italy, where only half of my power
plugs work. So this is the Italy example.
It tells me that my Schuko plugs won't
work, only my Europlugs and… - interference
noise - And the right one is, I
think for the U.K., where nothing is
compatible. If you occasionally forget
your power plug converter while traveling,
that is super useful. And then, of course,
we have the integration with real
time data. So we can show the delay
information and platform changes. The part
in the middle is the alternative
connection selection for trains. So
if you have a, like a train ticket that
isn't bound to a specific connection,
right, then the app lets you pick the one
you actually want to take. Or if you're
missing a connection, you need to move to
a different train, you can do that right
in the app as well. The screenshot on the
right hand side is the, like your overall
travel statistics. So if you're interested
in, like, seeing the carbon impact of
all your trips and the year over year
changes, right, the app shows that to you.
And I wasn't really successful, but that's
largely because the old data is
incomplete. So if you're interested in
that, right, since we have all the data,
that can help you see if you're
actually on the right track there. And
then to get data into that, we also have a
plugin for email clients. This one is for
KMail. So it basically then runs the
extraction on the email you're
currently looking at and it shows you a
summary of what's in there. In this case,
my train to Leipzig this morning,
including the option to add that to the
calendar or send it to the app on the
phone. We also have the browser extension.
So this is the website of the yearly KDE
conference, which has the schema.org
annotations on it. And the browser
extension recognizes that. And again,
offers to add that either to my
calendar or to the itinerary app. And that
also works on many restaurant websites or
event websites. They have those
annotations on the website for the Google
search. So again, we benefit a bit from
the, Google incomprehensible. OK, then
we get to the more experimental stuff that
basically just was finished in the last
couple of days, that we haven't shown
anywhere else publicly yet. The first one
is, and that's a bit better to read, at
least, if you saw the timeline earlier,
right, it had my train booking to Leipzig
and then the Congress ticket. But that
still leaves two gaps, right. I need to
get from home to the station in Berlin,
and I need to get from the station in
Leipzig to Congress. And what we have now
is a way for the app to automatically
recognize those gaps and fill them with
suggestions on what kind of local
transport you could take. So here the one
for Leipzig to Congress is expanded
and shows the tram. That still needs
some work to do live tracking so that
it accounts for delays and changes your
alarm clock in the morning if there's
delays on that trip. But we have
all the building blocks to make the
whole thing much smarter in this
area now. And that, I think was literally
done yesterday. So that's why the graphics
still are very basic. That's the train
layout, coach layout display for
your trip. So that you know where your
reserved seat on the train can actually be
found. Then, I only showed the KMail
plugin so far. We also have a work-in-
progress Thunderbird integration, which is
probably the much more widespread email
client. Featurewise, more or less the same
I showed for KMail, so it scans the email
and displays your summary and offers you
to put that into the app or, possibly
later on also into the calendar.
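The structured-data part of such a scan can be sketched as follows, assuming an HTML email that embeds schema.org annotations as a JSON-LD script block. This is a minimal Python illustration and not the project's extractor: the class and function names are invented here, the sample email is hand-written, and the tolerant parsing only covers the failure mode described earlier (objects listed without array brackets or separating commas).

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def tolerant_objects(text):
    """Yield top-level JSON objects even if brackets or commas between them are missing."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        if text[pos] != "{":
            pos += 1  # skip whitespace, commas and stray array brackets
            continue
        try:
            obj, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos += 1  # not parseable from here, try the next candidate
            continue
        yield obj


def extract_reservations(html):
    collector = JsonLdCollector()
    collector.feed(html)
    return [obj for block in collector.blocks for obj in tolerant_objects(block)]


# Hand-written sample: two objects with neither enclosing brackets nor a comma.
sample_email = """
<html><body>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "TrainReservation",
 "reservationId": "ABC123",
 "reservationFor": {"@type": "TrainTrip",
   "departureStation": {"@type": "TrainStation", "name": "Berlin Hbf"},
   "arrivalStation": {"@type": "TrainStation", "name": "Leipzig Hbf"}}}
{"@context": "http://schema.org", "@type": "EventReservation",
 "reservationFor": {"@type": "Event", "name": "36C3"}}
</script>
<p>Thank you for your booking.</p>
</body></html>
"""

reservations = extract_reservations(sample_email)
for res in reservations:
    print(res["@type"])
```

A strict json.loads call would reject this sample outright, which is why the sketch decodes one object at a time and simply skips whatever sits between them.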
This one is even more experimental. I can
only show you a screenshot of Web
Inspector proving that it managed to
extract something. That's the integration
with Nextcloud. I hope we'll have an
actual working prototype for this in
January then. Those two things are, of
course, important for you to even get
to the data, the booking data, that then
the app or other tools you built on top
can consume. OK, so where to get this
from? There's the wiki link up there. The
app is currently not yet in the Play Store
or in the F-Droid master repository. We
have an F-Droid nightly build repository.
I hope that within the next month we'll
get actual official releases in stores
that are easier to reach than what we have
right now. If you are interested in
helping with that, there's some stuff in
Wikidata where improvement on the data
directly benefits this work, and that is
specifically around train stations. I
think in Germany, last time I checked, we
still had a few hundred train stations
that didn't have geo coordinates or even a
human readable label. So that's something
to look at. Vendor-specific or even the
more or less standard train station
identifiers are something to look at. So
UIC or IBNR codes for train stations,
that helps a lot. Yeah. And then, we kind
of need test data for the extractions. So,
forget everything I said about privacy. If
you have any kind of booking documents or
emails you want to donate to support this
and get the providers you're using
supported in the extraction engine,
talk to me. That would be extremely
useful. Yeah, that's it. Thank you.
Applause
Herald: Hello, hello? Yeah. That's a very
impressive project, I think. Do we have
questions? Then I'll hand you my
microphone. Yes.
Q: Would it be possible to extract
platform lift data for train stations?
A: Sorry? Platform….
Q: Platform lift data.
A: Oh, I think Deutsche Bahn has an Open
Data API for the live status of lifts.
That would, of course, in theory be
possible. What we are trying to do is to
be generic enough so that this might not
be applicable in just one country,
although it is very European focused
because most of the team is there. But
lifts is something that is easy enough to
generalize in a data model, right? Its
location on the platform, and, are they
working or not? So, yeah, that would
be a nice addition. That goes into the
entire direction of, ya, indoor navigation
or navigation around larger train stations
and airports. So that's probably something
where we could use a better overall
display with the OpenStreetMap data and
then augment that with, like the, where
exactly is your train stopping and in
which coach is your seat, and then have
the lift data so we can basically guide
you to the right place in a better
way. Yeah.
Herald: Any more questions? Yes.
Q: Is the mobile app written in Qt as
well?
A: Yes, most of this is C++ code, because
that's what we use at KDE. The mobile
client as well. There's a bit of Java for
platform integration with Android. I don't
think anyone has ever tried to build it on
iOS, but of course it works on Linux based
mobile platforms as well, thanks to Qt and
C++, yeah.
Q: So you mostly talked about the mobile
app so far, which is understandable, but
as it's a QML application does it also run
on desktop? And, a second question, how
do, how do all the plugins and the
different instances of the app share their
data?
A: So, yes, the app runs on desktop. I was
trying to see if I can actually start it
here. I'm not sure on which screen it will
end up. That's where we do most of the
development. Let me see if I can move it
over. Oh, thank you. And I need to find my
mouse cursor on the two screens. Uh. I
think I need to end the presentation
first, but, yeah, short answer, of course.
There we go. And let me switch to… to…
yeah, so that's it, running on
desktop. It has a mobile UI there. That
could, of course, be extended to be more
useful on the desktop as well. And in
terms of storage, that is currently
internal to the app, there is no second
process accessing the actual data storage.
That would just unnecessarily complicate
it for now. But if there is a use for
that, yeah, we'll need to see.
Q: Yeah, but, but, but there was an
option, in the e-mail plugin, for example,
to send it to the app. Can I then only
send it to my local app and not to the
mobile app?
A: Oh, the "send to app", that's using
KDE Connect. That's an integration
software that allows you to remote control
your phone from the desktop. So that
basically bundles up all the information
and sends it to the app on the phone. And…
or it can import it locally, so.
Herald: OK, do we have other questions?
No, we don't have time? So then, thank you
very much, Volker, maybe you can tell
people where they can find you if they
have anything more they want to talk
about. But….
A: Yeah, I mean, there's my email
address and otherwise I'll be around all
day, all four days.
Herald: Around where?
Volker Krause: Probably somewhere. So it
just is a bit tricky.
Herald: …catch him before he runs away,
then! All right. So give a round of
applause again and thank you, Volker!
Applause
postroll music
Subtitles created by c3subtitles.de
in the year 2021.