36C3 Wikipaka WG: Free Software for Open Science

0:00 - 0:24

36C3 preroll music
0:24 - 0:30

purine:bitter: Thanks a lot to WikiPakaWG
for hosting this and for keeping us all
0:30 - 0:39

awake. So probably it's not wrong to say
good morning, everyone. Okay, what I would
0:39 - 0:45

like to do: all of this has been
announced as a discussion, so there's
0:45 - 0:52

probably no point in me talking to you for
something like 55 minutes straight. So I
0:52 - 0:59

would just like to give you a couple of
slides on what we could discuss and then
0:59 - 1:08

see where we want to go with this one,
okay? So to start off with: Who of you
1:08 - 1:17

considers him- or herself to be a
scientist? Okay, who has the pleasure to
1:17 - 1:25

work within the European scientific
system? Okay, and within the German one?
1:25 - 1:34

Okay, so negative control: Who knows what
the capital of North Dakota is? Okay, so
1:34 - 1:42

there is no rigor mortis in your arms.
Okay, so the topic today is Free Software for
1:42 - 1:47

Open Science and as I have some
association with the Free Software
1:47 - 1:55

Foundation Europe, we should probably
start with the definitions. So, number one,
1:55 - 2:00

what do we consider to be Free Software in
this context? It's pretty much any software
2:00 - 2:07

that would be released under either an
FSF- or OSI-compliant license. So this is
2:07 - 2:17

what most people also know as Open Source,
and the main point here is, as the FSF and OSI
2:17 - 2:21

definitions pretty much standardize the
same things and just have different
2:21 - 2:32

ways of saying it, that the license
guarantees the Four Freedoms to
2:32 - 2:39

the user, so to use, to study, to improve
and to share the piece of software and of
2:39 - 2:46

course this does require the existence and
openness of the source code and the ability
2:46 - 2:55

to actually create derivatives. Okay,
and I think for everyone who has been
2:55 - 3:00

working in science it's pretty clear that
those four core freedoms are very well
3:00 - 3:05

aligned with what we're trying to do in
science: we're trying to build upon
3:05 - 3:12

the work of others, to move humanity
forward and increase our overall knowledge.
3:12 - 3:20

So what we're doing there
is exactly exercising those
3:20 - 3:25

four freedoms, just not necessarily
in a digital or code-based
3:25 - 3:31

manner. Okay so that's the first thing.
Then what actually is Open Science? So
3:31 - 3:37

first of all, Open Science is a Class A
buzzword. Nevertheless, the European
3:37 - 3:45

Commission took the liberty of setting up a
committee, in this case the OSPP,
3:45 - 3:53

the Open Science Policy Platform, and
those people developed a lot of bits of
3:53 - 4:01

paper, whatever. And what they defined are
eight key areas, sometimes
4:01 - 4:08

called "ambitions", sometimes
called "priorities", which are the key
4:08 - 4:14

things that need to be addressed in the
medium term to move European science to what
4:14 - 4:21

they consider to be Open Science. And this
is not only, and that's very important,
4:21 - 4:26

about the classical things that you might
know like Open Access and Open Data. Open
4:26 - 4:30

Access and Open Data are basically
incorporated in here, so scholarly
4:30 - 4:35

communication, it says "Future of
Scholarly Communication", which can be
4:35 - 4:43

everything from Open Access to just going
digital. However, we should all be aware
4:43 - 4:51

that the European Commission has now endorsed
Plan S, which is a rather far-reaching
4:51 - 4:56

push towards a rather radical
program in terms of publishing
4:56 - 5:02

requirements, so we can assume that this
part on scholarly communication is really
5:02 - 5:09

meant to be Open Access. And then, among the
other things, Open Data is what is
5:09 - 5:16

called FAIR Data here, because the
Commission typically tries to avoid the
5:16 - 5:21

term "Open", because "Open" of course
is not FAIR, and FAIR unfortunately is not
5:21 - 5:26

"Open". But this is where we lead our
discussions. So this means that we only
5:26 - 5:32

have two of the classical Open Science
points that are in here. Everything else
5:32 - 5:38

are things like "Incentives", so this is about
how we can generate better citation or how
5:38 - 5:43

we can make sure that the people who do
the work get the credit, so we might need
5:43 - 5:57

some reform in how we do citations. Then
"Indicators" is -- was that me or was that
5:57 - 6:05

okay -- so "Indicators" is kind of a way
to try to overcome the simple citation
6:05 - 6:13

indices and of course especially the
impact factor. "EOSC", for those of you who
6:13 - 6:16

have not heard the term, is a very
large project: the European Open
6:16 - 6:22

Science Cloud. It's still rather ill-
defined what it should be, it's getting
6:22 - 6:27

better along the way but the term has been
out there for three years. In the end what
6:27 - 6:33

this is about is to really create a large
federated European infrastructure for
6:33 - 6:41

scientific data. The main funding for it
will come from the member states, and
6:41 - 6:48

so, for example, the German implementation
is called NFDI, the National Research Data
6:48 - 6:53

Infrastructure, and will be heavily funded,
with nearly 1 billion euros over the next 10
6:53 - 7:03

years, so this is the scale that we are
talking about. "Integrity" means how to
7:03 - 7:10

ensure integrity, "Skills" is about how to train
the next generation of scientists, and "CS"
7:10 - 7:16

is the abbreviation for "Citizen Science".
So with all of this you see that Open
7:16 - 7:20

Science is not just about ticking
boxes; what they're really trying to push
7:20 - 7:29

for is a rather fundamental change in the
way we do our work, towards what's really
7:29 - 7:36

becoming a more egalitarian system and a
more open and participatory system. Okay,
7:36 - 7:43

so now the question is: what is the role
that Free Software can play in this? And
7:43 - 7:47

so one of the things that we need to
define here is: are we talking about Free
7:47 - 7:54

Software for Open Science, which is the
thing that this talk was announced for.
7:54 - 7:58

But of course we could also, if that's the
general interest, talk about Free
7:58 - 8:04

Software in Open Science or in science in
general. The distinction would be that
8:04 - 8:09

"for Open Science" is mainly where we're
talking about software as a research
8:09 - 8:14

product, so the main focus is
software that is created by the scientists
8:14 - 8:22

themselves, and here we then of course have
issues like how to sustain it, how to
8:22 - 8:30

ensure quality and how to choose proper
licensing models for it. While the "in
8:30 - 8:35

science" is more generally talking about
generic software tools, so this is
8:35 - 8:41

operating systems, office suites and so on
that are used by scientists more
8:41 - 8:51

generally. In both cases, the main way
Free Software can contribute
8:51 - 8:57

to the scientific endeavor is, of course, by
promoting reproducibility, because
8:57 - 9:05

everyone can use these tools; there is no
paywall in that case. So you
9:05 - 9:12

don't need to purchase a given Microsoft
Office version to recreate an Excel table
9:12 - 9:19

or something like this, and of course it also
helps to reduce black-boxing. The
9:19 - 9:29

other thing that is more specific to Free
Software for Open Science is the general
9:29 - 9:36

point that we already made: some
of the ideas of Free Software align well
9:36 - 9:41

with what we're trying to do in science.
But more importantly the question right
9:41 - 9:47

now is: Does it fit the policies under
which we are operating? And so of course
9:47 - 9:56

the main policy that most people know is
FAIR. So FAIR stands for Findable,
9:56 - 10:02

Accessible, Interoperable and Reusable, and
it's a kind of paradigm that was
10:02 - 10:12

published in 2016, was in the
making for a couple of years before that,
10:12 - 10:18

and was primarily
geared towards data. The nice thing about
10:18 - 10:25

FAIR is that the 2016 paper also
operationalizes it, so they give criteria
10:25 - 10:33

on what you need to do or what you need to
ensure so that, for example, a data set is
10:33 - 10:39

findable, how it needs to be
accessible and so on and so forth. And of
10:39 - 10:45

course reuse also says something about,
well you need to put a license on it, but
10:45 - 10:53

otherwise it's not that specific. Okay,
now, importantly for this topic:
10:53 - 10:59

FAIR does not necessarily align with
Free Software, because Free Software means
10:59 - 11:04

that there are
basically no restrictions on use, while
11:04 - 11:17

the reusability for FAIR simply says:
People somehow need to be able to reuse
11:17 - 11:23

it, so there needs to be a clear pathway.
That can still be a proprietary license,
11:23 - 11:30

and that license might still not
allow you to do everything with it, there
11:30 - 11:36

just needs to be this ability. So that's
one of the main things where FAIR does not
11:36 - 11:42

fit the usual Free Software
definitions. On the other hand, of course,
11:42 - 11:54

Free Software doesn't say anything about
-- Oh no! I killed the alpaca! --
11:54 - 12:00

Applause
Okay, I'm probably gonna be kicked off the
12:00 - 12:14

stage any minute, okay, sorry. Alright, so
on the other hand, I can write beautiful
12:14 - 12:18

code and put it under an Open Source
license and put it on a USB stick and bury
12:18 - 12:25

it somewhere in my garden. Okay, so then
it's neither findable nor accessible and
12:25 - 12:31

this is of course also something where the
classical definitions for Free Software
12:31 - 12:35

don't necessarily match these two
criteria, which nevertheless also
12:35 - 12:43

make sense for software. Finally, one last
thing is that FAIR defines a product, so
12:43 - 12:46

it says: Okay, so the outcome of your
research needs to comply with different
12:46 - 12:51

criteria and that's of course a relatively
easy thing to test. What it does not do
12:51 - 12:56

and maybe from a software development
perspective this is something that is more
12:56 - 13:01

important, it doesn't define a process for how
we do things. And this is one of the
13:01 - 13:09

things that one of the German
committees, the RfII, has recently
13:09 - 13:15

started to point out about FAIR:
FAIR data just says this, but
13:15 - 13:20

you can have completely rubbish data and
it can still be FAIR. But what we want to
13:20 - 13:28

have is high-quality FAIR data. So FAIR
clearly is some kind of minimal consensus,
13:28 - 13:35

it's a condicio sine qua non, but we
probably need to extend it at this point,
13:35 - 13:41

and of course with this one we can also
discuss how we want to continue, how we
13:41 - 13:49

want to get this into, or align this with,
Free Software. Okay, so that's more or
13:49 - 13:55

less the brief introduction, now there are
a couple of things that we can discuss
13:55 - 14:02

further, depending on your interest. And
that would basically be: what about the
14:02 - 14:06

current European policies, what about the
current German
14:06 - 14:16

policies, what about generic Free Software
tools? But maybe that's the point where
14:16 - 14:32

you could say something to
get us going a bit.
14:32 - 14:35

Question: I think it's working -- You
mentioned that the current software
14:35 - 14:40

standards might not be in line with the
policies, what were you exactly referring
14:40 - 14:42

to?
Answer: Can you repeat this?
14:42 - 14:46

Q: You mentioned before that the current
software procedures or standards might not
14:46 - 14:51

be in line with the policies in the
European Union. What exactly did you mean
14:51 - 15:04

by that?
A: So the thing is that I can
15:04 - 15:11

comply with OSI regulations for Open
Source Software, but none of our funding
15:11 - 15:18

bodies says you need to be OSI compliant.
What they say typically is you should do
15:18 - 15:24

stuff that is FAIR, but right now one of
the issues, and this is basically what this
15:24 - 15:32

slide then says, is the question whether
any of the policy makers really define
15:32 - 15:38

code as a primary research object. And
that's currently not the case, so
15:38 - 15:44

everyone assumes that code behaves like
data, and equating code with data is
15:44 - 15:50

something where some people get cold
shivers and others don't, because it is an
15:50 - 15:55

operation that you can do; it's a lossy
operation, but it might help
15:55 - 16:03

us in some ways. And the main point here
is that code has some idiosyncrasies that
16:03 - 16:07

make it distinct from data and this is
where our policies break. On the other
16:07 - 16:12

hand, some of the policies that we came up with
-- not for research but in general, so
16:12 - 16:18

from the Free Software
perspective -- that we made up there,
16:18 - 16:23

didn't make it into the policy documents
and so therefore are not incorporated
16:23 - 16:30

there. Okay, so FAIR criteria and the
other ones don't completely overlap. So
16:30 - 16:34

most people might write code, but it still
won't align with the FAIR criteria if you
16:34 - 16:48

took it one to one.
Q: So, a question about the topic item
16:48 - 16:53

on licensing. So say we
have a commercial company like
16:53 - 16:59

Microsoft that develops an office package,
and when you say Free Software for Open
16:59 - 17:05

Science, it would be better to invest
the money not into license costs that are
17:05 - 17:10

recurring, but rather, for a
bigger entity like a country, to invest more
17:10 - 17:18

in open code or open programs.
Is this kind of tackled by what you
17:18 - 17:25

mean by FAIR or Open Source?
A: This is one of the things that
17:25 - 17:32

is not necessarily covered, although you
could construct it in a way that it
17:32 - 17:37

actually overlaps with FAIR. Because
you're talking about reproducibility, oh
17:37 - 17:42

well so okay, FAIR doesn't say
reproducibility but it says accessibility
17:42 - 17:46

and if you're using formats that are
proprietary you could say okay well this
17:46 - 17:51

is not accessible to everyone because you
need to pay for it. Now the thing is that
17:51 - 17:55

there are a lot of things that you have
to pay for, so this was one of the things
17:55 - 18:03

that were never on the agenda to be
eradicated. The generic
18:03 - 18:09

software part is just something
that came into this whole process later;
18:09 - 18:17

initially it was really geared towards
the question: how can scientists make sure that
18:17 - 18:21

the software they produce
is both Free Software and
18:21 - 18:27

contributes to Open Science, and what do we
need to do to create potential
18:27 - 18:33

additional funding opportunities, because
this is where it typically breaks, to
18:33 - 18:40

say: well, I can write better code if I have
more man or woman power, if I have people
18:40 - 18:46

who curate, if I have people who do
issue fixing and so on and so forth, which
18:46 - 18:53

right now is not considered part of the
research process by the
18:53 - 18:58

policy makers, but in reality it already
has become that. Now if you're saying you
18:58 - 19:04

are using generic software or generic
office suites for that, then yes, we
19:04 - 19:09

are investing a lot in these things in
tertiary education and in the research
19:09 - 19:16

sector and, personal opinion, yes, we
should spend this on things that don't
19:16 - 19:22

nudge people towards proprietary
solutions. But that is
19:22 - 19:29

something that has
a stronger education component, also for
19:29 - 19:35

student education, so I wanted to bring it
up here because I thought okay, maybe it's
19:35 - 19:41

something that more people here are
interested in. But I agree that it doesn't
19:41 - 19:49

overlap completely, or even strongly,
with the Open Science
19:49 - 20:00

part.
Q: Right, okay. I've heard some people
20:00 - 20:05

work on FAIR principles specific to
software. Have you heard about it, and do you
20:05 - 20:14

know what the differences are?
A: Yes, so thanks for this input. So let
20:14 - 20:24

me check. Okay I've missed that one. So
yeah, there's a recent paper that just
20:24 - 20:33

came out a couple of weeks ago by Anna-
Lena Lamprecht, she's from the Netherlands
20:33 - 20:42

eScience Center. So what they try to do
is to use the catalog of the
20:42 - 20:48

original FAIR criteria and check for each
of them: does it apply to software,
20:48 - 20:59

yes or no? And then they change, or amend,
them in a way that makes sure they then
20:59 - 21:04

better fit into the process. So
they, for example, say that there needs
21:04 - 21:10

to be some kind of documented quality
control; they of course talk more
21:10 - 21:14

about software repositories, they then
include versioning, which is one of the
21:14 - 21:19

huge things that set code apart from
data, which, once it's released, is
21:19 - 21:25

typically a rather static object. So
they're trying to get somewhere and I
21:25 - 21:35

think it's a good document to start
with, but in my personal opinion, I think
21:35 - 21:39

it wasn't bold enough. You might have
been there; we had this discussion at the
21:39 - 21:48

RSE19 conference too, where Anna-Lena
was also present, and it tries to stick very
21:48 - 21:53

closely to FAIR, because they assume that
this is what people know. Which I think is
21:53 - 21:57

good. On the other hand there's a very
clear recommendation from most bodies that
21:57 - 22:02

FAIR should not be extended, so we don't
need, as they say, we don't need
22:02 - 22:07

"additional letters" for FAIR and they
really want to have those basically as one
22:07 - 22:15

concept that sticks with data. So
therefore I think it would have been
22:15 - 22:23

necessary to take a bolder step, to try to
work in all the established development
22:23 - 22:29

policies that we already have than just to
stick as close as possible to FAIR and
22:29 - 22:34

then just change the nitty-gritty details,
which is what they did. But nevertheless I
22:34 - 22:38

think it's something that is clearly
worth reading.
22:38 - 22:43

Q: Thanks a lot for your talk, this
resonated a lot with me and as someone
22:43 - 22:50

working in research infrastructure I think
it's super important that we focus on
22:50 - 22:56

recognizing research infrastructure, so all
kinds of services like sustainable data
22:56 - 23:02

storage for researchers, tools that help
make data discoverable and things like
23:02 - 23:05

that. That this should be considered a
public good right?
23:05 - 23:09

A: Yes
Q: And so next to what you mentioned and
23:09 - 23:14

rightly so with Microsoft, the other risk
that I currently see, is that legacy
23:14 - 23:21

publishers like Elsevier, like Springer-
Nature and so on, try to capture the whole
23:21 - 23:30

market, trying to deliver on
all the needs that researchers have in the
23:30 - 23:38

digital area with huge platforms. And this
is like a battle that we have almost lost
23:38 - 23:45

already, as it seems. There are many
interesting, very good free and open source
23:45 - 23:51

alternatives to what they deliver but it's
really not recognized very well why this
23:51 - 23:57

is so important. This is my impression.
A: Yeah, I would second
23:57 - 24:03

that. I think it's
interesting to see the large publishing
24:03 - 24:08

companies now really moving away from
their traditional business because
24:08 - 24:12

apparently they have recognized that they
might be on a losing path there, and
24:12 - 24:19

really towards offering wholesale data
management solutions to institutes. I mean
24:19 - 24:23

there is, this is probably just an
anecdote, but so apparently Elsevier
24:23 - 24:29

made an offer to, I think, the Netherlands or the
Dutch government, saying:
24:29 - 24:35

Okay, we do all of your data management or
basically you get everything for free, but
24:35 - 24:41

each and every institution has to deliver,
and we become your central data deposition
24:41 - 24:50

platform. Which, well, unfortunately it
might appeal to some politicians; I think
24:50 - 24:56

it doesn't appeal to anyone else in this
room given that probably Elsevier is a
24:56 - 25:03

company that is even more hated than
Microsoft for reasons completely unknown I
25:03 - 25:08

mean they just make a profit margin of thirty-five
percent every year, so maybe we should
25:08 - 25:18

just buy stock options.
Q: Oh, thank you for your talk. What I don't
25:18 - 25:24

completely understand is why we use the
FAIR concept as a point of reference
25:24 - 25:29

at all. Because I feel like the
concept of Open Access in science is far
25:29 - 25:34

more applicable to code. So in the end
code is text and it's part of the
25:34 - 25:39

scientific publication system, so we have
references from and to code and such
25:39 - 25:47

things. And the concept of
Open Access has the
25:47 - 25:52

same ancestors like the scientific
publication system with the Mertonian
25:52 - 26:00

norms of science and such, so why don't we
treat code like scientific publications?
26:00 - 26:05

A: Okay, honestly, I'm relatively open to
this idea, because that is, I mean, the
26:05 - 26:11

reason why we're having this discussion.
What I'm presenting to you now
26:11 - 26:16

is mainly developed out of the existing EU
policies and the EU talks about FAIR a
26:16 - 26:21

lot. Because for them it's an
operationalized thing, it's something that
26:21 - 26:23

they would like to test in the end,
something that they would like to
26:23 - 26:30

score and so on and so forth, so that paper
pushers have something to do. But I
26:30 - 26:37

agree that we can simply say well in the
end the openness is more important and
26:37 - 26:47

FAIR, as we already said, isn't open, so
therefore Open Access would maybe be the
26:47 - 26:53

better point to hook this up to, so yeah, I
agree on that.
26:53 - 26:57

postroll music
26:57 - 27:20

Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!

Video Language:: English
Duration:: 27:15

	Bar Sch edited English subtitles for 36C3 Wikipaka WG: Free Software for Open Science
	C3Subtitles edited English subtitles for 36C3 Wikipaka WG: Free Software for Open Science