<i>36c3 preroll music</i>

Herald: So, hey, we're finally ready to
start, we have Volker Krause here with a

privacy by design travel assistant and
it's going to be about building Open

Source travel assistants, I think, and
this talk will be in English. And if you

want translations, if you would like a German
translation, we also have some great

interpreters back here in our booth; you
can listen in on c3lingo.org to hear how

they interpret everything live. Right.
Now. Let's have

a warm welcome for Volker here and have
fun with his talk.

<i>Applause</i>

Volker Krause: Thank you. OK, so what is
this about? You probably know those

features in, most prominently Google
Mail, but I think TripIt was the one that

pioneered this. So GMail reads your
email and then detects any kind of booking

information in there, like your boarding
passes, your train tickets, your hotel

bookings and so on. And it can integrate
that into your calendar and can present

you a unified itinerary for your entire
trip and monitor that for changes. And

all of that doesn't cost you anything.
Maybe apart from a bit of your privacy.

Well, not too bad, you might think. But if
you look at what kind of data is actually

involved in just your travel. Right. 
The obvious things that come to

mind, your name, your birthday, your
credit card number, your passport number,

that kind of information. Right. But that
isn't even the worst part of this,

because those operators don't just get to
see your specific data for one trip,

right? They get to see every… everyone's
trip. And now if you combine that

information, that actually uncovers a lot
of information about... relations

between people, your interests, who
you work for, where you live and all of

that. Right. So pretty much everyone here
traveled to Leipzig for the last four days

in the year. If that happens for
two of us, once, right, that might be

coincidence. If that happens two or three
years in a row, that is some kind of

information. But yeah, what to do
about that, right? The easy solution

is to just not use those services. It's like
first world luxury stuff anyway. That

works until you end up in a foreign
country where you don't speak any of the

local languages and then get introduced to
their counterpart of Schienenersatzverkehr

or Tarifzonenrandgebiet. And at that
point, you might be interested in actually

understanding what's happening on your
trip in some form that you

actually understand and that you are
familiar with, ideally without installing

15 different vendor applications for
wherever you actually might be

traveling, right? So we need something
better. And that obviously leads us to,

let's do it ourselves. Then, we can at
least design this for privacy right from

the start. Build it on top of Free
Software and Open Data. Well, of course,

we need to... at least it's not entirely
obvious that this will actually work,

right? Google and Apple, they have a
totally different amount of resources

available for this. So, can we actually
build this ourselves? So let's have a look

at what those services actually need to
function. And it turns out it's primarily

about data, not so much about code.
There are some difficult parts

in terms of code involved as well, like
the image processing in a PDF to detect a

barcode in your boarding pass. But all of
that exists as ready-made building blocks.

So you basically just need to put this
nicely together. So let's look at the

data. That's the more interesting part.
And in general, that breaks down to

three different categories. The first one
is what I call personal data here. So

that's basically booking information,
documents or tickets, boarding passes,

specific for you. So there at least you
don't have a problem with access because

that is sent to you and you need to have
access to that. But it comes in all kinds

of forms and shapes. So there are the
challenges to actually extract that. The

second kind of data is what I would call
static data. So, for example, the location

of an airport. Now, you could argue that
that could change and there are rumors

that some people apparently managed to
build new airports. I live in Berlin, so I

don't believe this. Jokes aside, so,
"static" here means static within

the release cycle of the software. So
several weeks or a few months. So this is

stuff that we can ship as offline
databases. And offline, of course, helps

us with privacy because then you're not
observable from the outside. And the third

category is dynamic data. So stuff that is
very, very short lived, such as delay

information. There is no way we can do
that offline. If we want that kind of

information, we will always need some kind
of online querying. Then let's look

through those three categories in a bit
more detail. For the booking data, Google

was faced with the same problem, so they
used their monopoly and defined a standard

in which operators should ideally have
machine readable annotations on their

booking information. And that's awesome,
because we can just use the same, the same

system. That's what nowadays became
schema.org, which I think Lucas mentioned

in the morning as well. At least in
the US and Europe, you'll find

that in about 30 to 50% of booking emails
you get from hotels, airlines or event

brokers. So that's a good start. But then
there's the rest, which is basically

unstructured data, random PDF files or
HTML emails we have to work with. There's

Apple wallet boarding passes. They are
somewhat semi structured and most

widespread for flight tickets. Well,
that's somewhat usable. And barcodes, so

that's what you, again, see on boarding
passes or train tickets. I could

probably fill an entire talk just with the
various details on the different

barcode systems, the one for boarding
passes, I think, Karsten Nohl had a talk

at Congress a few years back, where he
showed how they work and what you can do

with them. Instagram #boardingpass is a
very nice source of test data. The one

that you find on, on German railway
tickets is also pretty much researched

already. The one we actually had to
break ourselves was the one for Italy. I

think to my knowledge, we are the first
ones to publish the content of those

binary barcodes. And we are currently
working on the VDV Kernapplikation

E-Ticket, which is the standard for German
local transportation tickets. That

actually has some crypto that you need to
get around to actually see the content. So

there is, if you're interested in that
kind of stuff, there is quite some

interesting detail to be found in this.
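
To make the fixed-width nature of such barcodes concrete, here is a minimal Python sketch that splits the mandatory part of an IATA boarding pass barcode (BCBP) into named fields. The field names and widths reflect one reading of the BCBP layout and should be checked against the IATA specification; the function names and the sample pass are made up for illustration, and this is not the code used by the extraction library.

```python
from datetime import date, timedelta

# (field name, width in characters) for the mandatory section of one leg.
# Assumption: layout as commonly documented for IATA BCBP, 60 characters total.
BCBP_FIELDS = [
    ("format_code", 1), ("number_of_legs", 1), ("passenger_name", 20),
    ("eticket_indicator", 1), ("pnr", 7), ("from_airport", 3),
    ("to_airport", 3), ("carrier", 3), ("flight_number", 5),
    ("day_of_year", 3), ("compartment", 1), ("seat", 4),
    ("checkin_sequence", 5), ("passenger_status", 1),
    ("conditional_size_hex", 2),
]


def parse_bcbp_mandatory(raw):
    """Split the mandatory BCBP section into a dict of stripped strings."""
    needed = sum(width for _, width in BCBP_FIELDS)
    if len(raw) < needed or raw[0] != "M":
        raise ValueError("not a BCBP 'M' format barcode")
    fields, pos = {}, 0
    for name, width in BCBP_FIELDS:
        fields[name] = raw[pos:pos + width].strip()
        pos += width
    return fields


def flight_date(day_of_year, year):
    """Resolve the day-of-year field against a year supplied by the caller."""
    return date(year, 1, 1) + timedelta(days=int(day_of_year) - 1)


# Hypothetical sample pass, assembled field by field so the widths are exact.
SAMPLE_PASS = (
    "M" + "1" + "DOE/JANE".ljust(20) + "E" + "ABC123".ljust(7)
    + "TXL" + "ZRH" + "LX".ljust(3) + "0975".ljust(5) + "361"
    + "Y" + "012A" + "0042".ljust(5) + "1" + "00"
)

if __name__ == "__main__":
    leg = parse_bcbp_mandatory(SAMPLE_PASS)
    print(leg["from_airport"], leg["to_airport"], leg["seat"])
    print(flight_date(leg["day_of_year"], 2019))
```

The flight date is stored only as a day of the year, so the year has to come from context, for example the date of the booking email.
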
But let's continue with the

static data. There, of course, we have
Wikidata. That has almost everything we

need. And we are making heavy use of that.
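
As a rough illustration of how such static data could be turned into an offline table, the following Python sketch pairs a SPARQL query for the Wikidata query service with a parser for its JSON result format. The property IDs (P238 for the IATA airport code, P625 for the coordinate location, P421 for the time zone) are Wikidata's; the function names and the sample response values are hypothetical, and the project's real build tooling may work quite differently.

```python
import re

# Query text one could send to the Wikidata SPARQL endpoint (not executed here).
AIRPORT_QUERY = """
SELECT ?airport ?iata ?coord ?tz WHERE {
  ?airport wdt:P238 ?iata ;             # P238: IATA airport code
           wdt:P625 ?coord .            # P625: coordinate location
  OPTIONAL { ?airport wdt:P421 ?tz . }  # P421: located in time zone
}
"""

# Coordinates come back as WKT literals with longitude first: "Point(lon lat)".
_WKT_POINT = re.compile(
    r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)", re.IGNORECASE
)


def parse_wkt_point(wkt):
    """Return (latitude, longitude) from a WKT point literal."""
    match = _WKT_POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError("unsupported WKT literal: %r" % wkt)
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


def build_airport_table(sparql_json):
    """Map IATA code -> (lat, lon) from a SPARQL JSON result document."""
    table = {}
    for row in sparql_json["results"]["bindings"]:
        table[row["iata"]["value"]] = parse_wkt_point(row["coord"]["value"])
    return table


# Illustrative response in the SPARQL JSON results format; the coordinate is
# a rounded placeholder, not real query output.
SAMPLE_RESPONSE = {
    "head": {"vars": ["airport", "iata", "coord", "tz"]},
    "results": {"bindings": [
        {"iata": {"type": "literal", "value": "LEJ"},
         "coord": {"type": "literal", "value": "Point(12.24 51.42)"}},
    ]},
}

if __name__ == "__main__":
    print(build_airport_table(SAMPLE_RESPONSE))
```

A table like this can be generated at build time and shipped with the application, so looking up an airport never needs a network request.
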
And that's also why I'm here today on the

Wikimedia stage. One thing that
Wikidata doesn't do perfectly is timezone

information. That's why we're using the
OpenStreetMap data for this. There are, in

Wikidata, three different time zone… or
ways of specifying the time zone. UTC

offsets, some kind of coarse, human
readable naming like Central European

Summer Time, and then the actual IANA time
zone specifications like Europe/Berlin.
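
The practical difference between a plain UTC offset and an IANA zone identifier can be shown with Python's standard `zoneinfo` module. In this sketch the flight, its duration and the function name are invented for illustration; only the zone identifiers and their transition rules come from the IANA database, which has to be available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def local_arrival(departure, flight_time, destination_tz):
    """Add the flight time in UTC, then express the result in the given zone."""
    arrival_utc = departure.astimezone(timezone.utc) + flight_time
    return arrival_utc.astimezone(destination_tz)


# Hypothetical overnight flight on the night Europe switched to summer time
# in 2019 (the US had already switched three weeks earlier).
DEPARTURE = datetime(2019, 3, 30, 18, 0, tzinfo=ZoneInfo("America/New_York"))
FLIGHT_TIME = timedelta(hours=8)

# A bare "UTC+1" annotation knows nothing about the transition ...
ARRIVAL_FIXED = local_arrival(DEPARTURE, FLIGHT_TIME, timezone(timedelta(hours=1)))
# ... while the IANA zone carries the daylight saving rules.
ARRIVAL_IANA = local_arrival(DEPARTURE, FLIGHT_TIME, ZoneInfo("Europe/Berlin"))

if __name__ == "__main__":
    print("fixed offset:", ARRIVAL_FIXED.isoformat())
    print("IANA zone:   ", ARRIVAL_IANA.isoformat())
```

With the fixed +01:00 offset the arrival comes out as 07:00 local time, with the IANA zone as 08:00, which is exactly the one-hour error in question.
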

And that's the one we actually need
because they contain daylight saving time

transitions. And that is actually crucial
for travel assistance, because you

can have a flight from, say, the US to
Europe, on the night when there is a

daylight saving time transition on one
end. And if we get that wrong, right, we

are off by one hour. And that could mean
you miss your flight. So that we

need to get absolutely right. And
Wikidata there mixes the three timezone

variations. So that's why we fall back to
OpenStreetMap there. Another area

that still needs work is vendor specific
station identifiers. So there's a number

of train companies that have their own
numeric identifier, or alphanumeric

identifiers, which you find, for example,
in barcodes of tickets. So that's our

way to actually find out where people are
traveling. So that's something we are

trying to feed into Wikidata as we get our
hands on those identifiers. For airports,

that's easy because they are
internationally standardized. For train

stations, that's a bit more messy. And
finally, the dynamic data. That's again,

an area where we benefit from Google using
their monopoly. They wanted to have local

public transportation information in
Google Maps. So they defined the GTFS

format, which is a way for local transport
operators to send their schedules to

Google. But most of the time, that is done
in a way that they basically publish this

as Open Data. And that way, all of us get
access to it. And then there's Navitia,

which is a Free Software implementation of
like a routing and journey query service

that consumes all of that Open Data
schedule information. And that then in

turn, we can use again to, yeah, find our
departure schedules, delays and that

kind of live information. Apple Wallet
also has some kind of live updating

polling mechanism. But that is somewhat
dangerous because it leaks personally

identifiable information. So the,
basically, a unique identifier for your

pass is sent out with the API request to
pull an update. So that is basically

just a last resort mechanism if you have
nothing else. And then there's a bunch of

vendor specific, more or less proprietary
APIs that we could use. They are

unfortunately often not compatible with
Free Software and Open Source, because,

they might require API keys that you're
not allowed to share, or they have terms

and conditions that are simply
incompatible with what we are trying to

do. So for some, this works, but there's
still some room for improvement in those

vendors' understanding of the value of proper
Open Data access. OK, so that's the

theory, let's have a look at what we have
actually built for this. So there's two,

yeah, backend components, so to say. There is
the extraction library that implements the

schema.org data model for flights,
for trains, for hotels, for restaurants

and for events. It can do the structured
data extraction. That might sound easy at

first, but it turns out that for some of
the operators, doing proper JSON array

encoding is somewhat hard. So, I mean, you
need to do a... need to have a comma in

between two objects and brackets around
it. Some of them struggle with that. So we

have to have lots of workarounds in, in
parsing the data we receive. Then we have

an unstructured extraction system that's
basically small scripts per provider

or per operator that then, yeah, use
regular expressions or XPath queries

depending on the input and turn that into
our data model. We currently, I think,

have 50, slightly more than 50 of those. I
know that Apple has about 600, so that is

still one order of magnitude more. But
it's not impossible. Right. So I think we,

we have the means there with Free Software
to come to a similar result as

people that have an Apple or Google scale
budget for this. The service coverage is

actually quite different. So, for Apple,
I've seen their custom extractor.

So they have a lot of like US car rental
services. We have somewhat more important

stuff like CCC tickets. So the Congress
ticket is actually recognized and I

managed to get in with the app. What
the extraction engine also does is it

augments whatever we find in the input
documents by information we have on

Wikidata. So we usually have time zones,
countries, geo coordinates, all that

useful stuff for then offering assistance
features on top. And the input formats are

basically everything I mentioned. The
usual stuff you're getting in an email

from a transport operator or any kind
of booking document. The second piece on

like, on backend components is the public
transportation library. That's basically

our client API for Navitia mainly, but
also for some of the proprietary

widespread backends like HAFAS. That's the
stuff Deutsche Bahn is using. And it can

aggregate the results from multiple
backends. And if you're using Open Data in

the backend - <i>interference noise</i> - it
propagates the attribution information

correctly. So. And just a few days ago, it
also gained support for querying train and

platform layouts or "Wagenstandsanzeiger"
in German so we can have all of that in

the app. And now of course there's the KDE
Itinerary app itself. So it has, oh… it's

very hard to read here. It's basically a
timeline with the various booking

information you have grouped together by
trip. It can insert the live weather

information. Again, that's online access,
so it's optional, but yeah, it's kind of

useful. And this is… you probably can't
read that. But that's my train to Leipzig

this morning and that's actually the
Congress entry ticket. And the box at the

top is the collapsible group for my trip
to Leipzig for Congress. And it can

show the actual tickets and barcodes,
including Apple Wallet passes. So, if you

sometimes have a, like a manual inspection
at an airport where they don't scan your

boarding pass, but look at it, apparently
that looks reasonable enough that you can

board an aircraft with it. At least, I
wasn't arrested so far. And then we have

one of my favorite features, also powered
by Wikidata. It's the power plug

incompatibility warning. <i>interference
noise</i> - So, I mean, if you're traveling

to, say, the US, or UK, you're probably
aware that they have like incompatible

power plugs. But there are some
countries where this isn't – at least to

me, isn't that obvious, like Switzerland
or Italy, where only half of my power

plugs work. So this is the Italy example.
It tells me that my Schuko plugs won't

work, only my Europlugs and. <i>interference
noise</i> - And the right one is, I

think for the U.K., where nothing is
compatible. If you occasionally forget

your power plug convertor while traveling,
that is super useful. And then, of course,

we have the integration with real
time data. So we can show the delay

information and platform changes. The part
in the middle is the alternative

connection selection for trains. So
if you have a, like a train ticket that

isn't bound to a specific connection,
right, then the app lets you pick the one

you actually want to take. Or if you're
missing a connection, you need to move to

a different train, you can do that right
in the app as well. The screenshot on the

right hand side is the, like your overall
travel statistics. So if you're interested

in, like, seeing the carbon impact of
all your trips and the year over year

changes, right, the app shows that to you.
And I wasn't really successful, but that's

largely because the old data is
incomplete. So if you're interested in

that, right, since we have all the data,
that can help you see if you're

actually on the right track there. And
then to get data into that, we also have a

plugin for email clients. This one is for
KMail. So it basically then runs the

extraction on the email you're
currently looking at and it shows you a

summary of what's in there. In this case,
my train to Leipzig this morning,

including the option to add that to the
calendar or send it to the app on the

phone. We also have the browser extension.
So this is the website of the yearly KDE

conference, which has the schema.org
annotations on it. And the browser

extension recognizes that. And again,
offers to add that either to my

calendar or to the itinerary app. And that
also works on many restaurant websites or

event websites. They have those
annotations on the website for the Google

search. So again, we benefit a bit from
the, Google <i>incomprehensible</i>. OK, then

we get to the more experimental stuff that
basically just was finished in the last

couple of days, that we haven't shown
anywhere else publicly yet. The first one

is, and that's a bit better to read, at
least, if you saw the timeline earlier,

right, it had my train booking to Leipzig
and then the Congress ticket. But that

still leaves two gaps, right. I need to
get from home to the station in Berlin,

and I need to get from the station in
Leipzig to Congress. And what we have now

is a way for the app to automatically
recognize those gaps and fill them with

suggestions on what kind of local
transport you could take. So here the one

for Leipzig to Congress is expanded
and shows the tram. That still needs

some work to do live tracking so that
it accounts for delays and changes your

alarm clock in the morning if there are
delays on that trip. But we have

all the building blocks to make the
whole thing much smarter in this

area now. And that, I think was literally
done yesterday. So that's why the graphics

still are very basic. That's the train
layout, coach layout display for

your trip. So that you know where your
reserved seat on the train can actually be

found. Then, I only showed the KMail
plugin so far. We also have a work-in-

progress Thunderbird integration, which is
probably the much more widespread email

client. Featurewise, more or less the same
I showed for KMail, so it scans the email

and displays a summary and offers
to put that into the app or, possibly

later on also into the calendar.
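
The scanning step behind both mail plugins can be illustrated for the structured-data case. This Python sketch pulls schema.org JSON-LD blocks out of an HTML email body and tolerates the malformed arrays mentioned earlier in the talk (missing commas or brackets between objects). The class and function names and the sample email are hypothetical, and the lenient parsing shown here is just one possible workaround, not necessarily the one the extraction library uses.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data


def parse_lenient(text):
    """Decode one or more JSON values, even without commas or outer brackets."""
    decoder = json.JSONDecoder()
    items, pos = [], 0
    while pos < len(text):
        if text[pos] in " \t\r\n,":  # skip separators between top-level values
            pos += 1
            continue
        value, pos = decoder.raw_decode(text, pos)
        items.extend(value if isinstance(value, list) else [value])
    return items


def extract_reservations(html):
    """Return all JSON-LD objects whose @type ends in 'Reservation'."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    found = []
    for block in collector.blocks:
        for item in parse_lenient(block):
            if isinstance(item, dict) and str(item.get("@type", "")).endswith("Reservation"):
                found.append(item)
    return found


# Made-up email body: two objects with neither a comma nor array brackets.
SAMPLE_EMAIL = """<html><body><p>Your booking</p>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "TrainReservation", "reservationNumber": "ABC123"}
{"@context": "http://schema.org", "@type": "TrainReservation", "reservationNumber": "DEF456"}
</script></body></html>"""

if __name__ == "__main__":
    for reservation in extract_reservations(SAMPLE_EMAIL):
        print(reservation["@type"], reservation["reservationNumber"])
```

Because `raw_decode` reports where each value ends, well-formed arrays and sloppy concatenations go through the same code path.
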
This one is even more experimental. I can

only show you a screenshot of Web
Inspector proving that it managed to

extract something. That's the integration
with Nextcloud. I hope we'll have an

actual working prototype for this in
January then. Those two things are, of

course, important for you to even get
to the data, the booking data, that then

the app or other tools you built on top
can consume. OK, so where to get this

from? There's the wiki link up there. The
app is currently not yet in the Play Store

or in the F-Droid master repository. We
have an F-Droid nightly build repository.

I hope that within the next month we'll
get actual official releases in the easier

to reach stores than what we have
right now. If you are interested in

helping with that, there's some stuff in
Wikidata where improvement on the data

directly benefits this work, and that is
specifically around train stations. I

think in Germany, last time I checked, we
still had a few hundred train stations

that didn't have geo coordinates or even a
human readable label. So that's something

to look at. Vendor-specific or even the
more or less standard train station

identifiers are something to look at. So
UIC or IBNR codes for train stations,

that helps a lot. Yeah. And then, we kind
of need test data for the extractions. So,

forget everything I said about privacy. If
you have any kind of booking documents or

emails you want to donate to support this
and get the providers you're using

supported in the extraction engine,
talk to me. That would be extremely

useful. Yeah, that's it. Thank you.

<i>Applause</i>

Herald: Hello, hello? Yeah. That's a very
impressive project, I think. Do we have

questions? Then I'll hand you my
microphone. Yes.

Q: Would it be possible to extract
platform lift data for train stations?

A: Sorry? Platform….
Q: Platform lift data.

A: Oh, I think Deutsche Bahn has an Open
Data API for the live status of lifts.

That would, of course, in theory be
possible. What we are trying to do is to

be generic enough so that this might not
be applicable in just one country,

although it is very European focused
because most of the team is there. But

lifts is something that is easy enough to
generalize in a data model, right? Its

location on the platform, and, are they
working or not? So, yeah, that would

be a nice addition. That goes into the
entire direction of, ya, indoor navigation

or navigation around larger train stations
and airports. So that's probably something

where we could use a better overall
display with the OpenStreetMap data and

then augment that with, like the, where
exactly is your train stopping and in

which coach is your seat, and then have
the lift data so we can basically guide

you to the right place in a better
way. Yeah.

Herald: Any more questions? Yes.
Q: Is the mobile app written in Qt as

well?
A: Yes, most of this is C++ code, because

that's what we use at KDE. The mobile
client as well. There's a bit of Java for

platform integration with Android. I don't
think anyone has ever tried to build it on

iOS, but of course it works on Linux based
mobile platforms as well, thanks to Qt and

C++, yeah.
Q: So you mostly talked about the mobile

app so far, which is understandable, but
as it's a QML application does it also run

on desktop? And, a second question, how
do, how do all the plugins and the

different instances of the app share their
data?

A: So, yes, the app runs on desktop. I was
trying to see if I can actually start it

here. I'm not sure on which screen it will
end up. That's where we do most of the

development. Let me see if I can move it
over. Oh, thank you. And I need to find my

mouse cursor on the two screens. Uh. I
think I need to end the presentation

first, but, yeah, short answer, of course.
There we go. And let me switch to… to…

yeah, so that's it, running on
desktop. It has a mobile UI there. That

could, of course, be extended to be more
useful on the desktop as well. And in

terms of storage, that is currently
internal to the app, there is no second

process accessing the actual data storage.
That would just unnecessarily complicate

it for now. But if there is a use for
that, yeah, we'll need to see.

Q: Yeah, but, but, but there was an
option, in the e-mail plugin, for example,

to send it to the app. Can I then only
send it to my local app and not to the

mobile app?
A: Oh, the "send to app", that's using

KDE Connect. That's an integration
software that allows you to remote control

your phone from the desktop. So that
basically bundles up all the information

and sends it to the app on the phone. And…
or it can import it locally, so.

Herald: OK, do we have other questions?
No, we don't have time? So then, thank you

very much, Volker, maybe you can tell
people where they can find you if they

have anything more they want to talk
about. But….

A: Yeah, I mean, there's my email
address and otherwise I'll be around all

day, all four days.
Herald: Around where?

Volker Krause: Probably somewhere. So it
just is a bit tricky.

Herald: …catch him before he runs away,
then! All right. So give a round of

applause again and thank you, Volker!

<i>Applause</i>

<i>postroll music</i>

Subtitles created by c3subtitles.de
in the year 2021. Join, and help us!