36C3 preroll music
purine:bitter: Thanks a lot to WikiPakaWG for hosting this and for keeping us all awake. So it's probably not wrong to say: Good morning, everyone. Okay, all of this has been announced as a discussion, so there's probably no point in me talking at you for something like 55 minutes straight. I would just like to give you a couple of slides on what we could discuss and then see where we want to go with this, okay? So, to start off with: Who of you considers him- or herself to be a scientist? Okay. Who has the pleasure of working within the European scientific system? Okay, and within the German one? Okay, so, negative control: Who knows what the capital of North Dakota is? Okay, so there is no rigor mortis in your arms.
Okay, so the topic today is Free Software for Open Science, and as I have some association with the Free Software Foundation Europe, we should probably start with the definitions. Number one: what do we consider to be Free Software here? It's pretty much any software released under a license that is either FSF- or OSI-approved. This is what most people also know as Open Source. The main point is that the FSF and OSI definitions standardize pretty much the same things, they just say it in different ways: the license must guarantee the Four Freedoms to the user, that is, to use, to study, to improve and to share the piece of software. Of course, this requires that the source code exists and is open, and that you are actually able to create derivatives. And I think for everyone who has been working in science it's pretty clear that those four core freedoms are very well aligned with what we're trying to do in science: we're trying to build upon the work of others, to move humanity along and to increase our overall knowledge. So what we're doing there is exactly exercising those four freedoms, just not necessarily in a digital or code-based manner. Okay, so that's the first thing.
Then: what actually is Open Science? First of all, Open Science is a Class A buzzword. Nevertheless, the European Commission took the liberty of setting up a committee for it, in this case the OSPP, the Open Science Policy Platform, and those people produced a lot of paper. What they defined are eight key areas, sometimes called "ambitions" and sometimes called "priorities", which are the key things that need to be addressed in the midterm to move European science towards what they consider to be Open Science. And, this is very important, it is not only about the classical things you might know, like Open Access and Open Data. Open Access and Open Data are basically incorporated in here: scholarly communication appears as "Future of Scholarly Communication", which can be anything from Open Access to just going digital. However, we should all be aware that the European Commission has now endorsed Plan S, which is a rather far-reaching, even radical, program in terms of publishing requirements, so we can assume that this scholarly communication part really is meant to be Open Access. Open Data, in turn, is called FAIR Data here, because the Commission typically tries to avoid the term "Open": "Open" is of course not FAIR, and FAIR unfortunately is not "Open". But this is where we lead our discussions. So this means that only two of the classical Open Science points are in here. Everything else is things like "Incentives", that is, how we can generate better citation practices or how we can make sure that the people who do the work get the credit, so we might need some reform in how we do citations. Then "Indicators" is (was that me, or was that okay?), so "Indicators" is a way to try to overcome simple citation indices and, of course, especially the impact factor. "EOSC", for those of you who have not heard the term, is a very large project: the European Open Science Cloud. It's still rather ill-defined what it should be; it's getting better along the way, but the term has been out there for three years. In the end, this is about creating a large, federated European infrastructure for scientific data. The main funding for it will come from the member states; for example, the German implementation is called NFDI, the National Research Data Infrastructure, and will be funded with nearly 1 billion euros over the next 10 years, so this is the scale we are talking about. "Integrity" is about how to assure research integrity, "Skills" is about how to train the next generation of scientists, and "CS" is the abbreviation for "Citizen Science".
So with all of this you see that Open Science is not just about ticking boxes; what they're really trying to push for is a rather fundamental change in the way we do our work, towards a more egalitarian, more open and more participatory system. Okay, so now the question is: what role can Free Software play in this? One of the things we need to define here is whether we are talking about Free Software for Open Science, which is what this talk was announced as.
But of course, if there's general interest, we could also talk about Free Software in Open Science, or in science in general. The distinction would be that "for Open Science" mainly means software as a research product: the main focus is software created by the scientists themselves, and here we of course have issues like how to sustain it, how to ensure its quality and how to choose proper licensing models for it. "In science", by contrast, is more generally about generic software tools, like operating systems, office suites and so on, that scientists simply use. In both cases, the main way Free Software can contribute to the scientific endeavor is by promoting reproducibility, because everyone can use these tools; there is no paywall. So you don't need to purchase a given Microsoft Office version to recreate an Excel table or something like that. And there is, of course, also the attempt to reduce black-boxing. The other point, which is more specific to Free Software for Open Science, is the general thing we already said: some of the ideas of Free Software align well with what we're trying to do in science.
But more importantly, the question right now is: does it fit the policies under which we are operating? And of course the main policy that most people know is FAIR. FAIR stands for Findable, Accessible, Interoperable and Reusable; it's a paradigm that was published in 2016, was in the making for a couple of years before that, and is primarily geared towards data. The nice thing about FAIR is that the 2016 paper also operationalizes it: it gives criteria for what you need to do to ensure, for example, that a data set is findable, how it needs to be accessible, and so on. And of course, reusability also says something like "you need to put a license on it", but otherwise it's not that specific. Now, importantly for this discussion, FAIR does not necessarily align with Free Software, because Free Software means that there are basically no restrictions on use, while reusability in FAIR simply says that people somehow need to be able to reuse it, so there needs to be a clear pathway. That can still be a proprietary license, and that license might not allow you to do everything with it; the ability to reuse just needs to exist. So that's one of the main points where FAIR does not fit the usual Free Software definitions. On the other hand, of course, Free Software doesn't say anything about... Oh no, I killed the alpaca!
Applause
Okay, I'm probably going to be kicked off the stage any minute. Sorry. All right, so on the other hand: I can write beautiful code, put it under an Open Source license, put it on a USB stick and bury it somewhere in my garden. Then it's neither findable nor accessible, and this is where the classical definitions of Free Software don't necessarily match these two criteria, which nevertheless make sense for software too. Finally, one last thing: FAIR defines a product. It says that the outcome of your research needs to comply with certain criteria, and that's a relatively easy thing to test. What it does not do, and from a software development perspective this may be the more important part, is define a process for how we do things. This is also something that one of the German committees, the RfII, has recently started to criticize about FAIR: FAIR only says this much, but you can have complete rubbish data and it can still be FAIR, whereas what we want is high-quality FAIR data. So FAIR is clearly some kind of minimal consensus, a condicio sine qua non, but we probably need to extend it at this point, and with that we can also discuss how we want to continue and how we want to align this with Free Software. Okay, so that's more or less the brief introduction. Now there are a couple of things we can discuss further, depending on your interest: what about the current European policies, what about the current German policies, what about generic Free Software tools? But maybe this is the point where you could say something to get us going a bit.
Question: I think it's working. You mentioned that the current software standards might not be in line with the policies. What exactly were you referring to?
Answer: Can you repeat that?
Q: You mentioned before that the current software procedures or standards might not be in line with the policies of the European Union. What exactly did you mean by that?
A: The thing is that I can comply with the OSI's Open Source Definition, but none of our funding bodies says you need to be OSI-compliant. What they typically say is that you should do things that are FAIR. But right now one of the issues, and this is basically what this slide says, is whether any of the policy makers really define code as a primary research object. Right now that's not the case, so everyone assumes that code behaves like data. Equating code with data gives some people cold shivers and others not, because it is an operation you can do; it's a lossy operation, but it might help us in some ways. The main point here is that code has some idiosyncrasies that make it distinct from data, and this is where our policies break. On the other hand, some of the principles we came up with, not for research but in general, from the Free Software perspective, didn't make it into the policy documents and are therefore not incorporated there. So FAIR and the other criteria don't completely overlap: you might write code that is Free Software, and it still won't meet the FAIR criteria if you apply them one to one.
Q: A question about the licensing topic. Say we have a commercial company like Microsoft that develops an office package. When you say Free Software for Open Science: would it be better not to invest the money in recurring license costs, but, for a bigger entity like a country, to invest more in open code or open programs? Is this kind of thing covered by what you mean with FAIR or Open Source?
A: This is one of the things that is not necessarily covered, but you could construct it in a way that it actually overlaps with FAIR, because you're talking about reproducibility. Well, okay, FAIR doesn't say reproducibility, but it does say accessibility, and if you're using proprietary formats, you could argue that this is not accessible to everyone because you need to pay for it. The thing is that there are a lot of things you have to pay for, and eradicating that was never on the agenda. The generic software part is something that came into this whole process later. Initially it was really geared towards the question: how can scientists make sure that the software they produce is both Free Software and contributes to Open Science, and what do we need to do to create additional funding opportunities? Because this is typically where it breaks: I can write better code if I have more manpower, if I have people who curate, people who do issue fixing and so on. By the policy makers, that is currently not considered part of the research process, but in reality it already has become that. Now, if you're talking about generic software or generic office suites, then yes, we are investing a lot in these things in tertiary education and in the research sector, and, personal opinion, yes, we should spend this money on things that don't nudge people towards proprietary solutions. But that question has a stronger education component, also for student education; I wanted to bring it up here because I thought maybe more people here are interested in it. But I agree that it doesn't strongly overlap with the Open Science part.
Q: Right, okay. I've heard that some people are working on FAIR principles specifically for software. Have you heard about it, and do you know what the differences are?
A: Yes, thanks for this input. Let me check. Okay, I've missed that one. So yes, there's a recent paper that came out just a couple of weeks ago by Anna-Lena Lamprecht, who is from the Netherlands eScience Center. What they try to do is take the original catalog of FAIR criteria, check for each of them whether it applies to software, yes or no, and then change or amend them so that they better fit the process. For example, they say there needs to be some kind of documented quality control; they talk more about software repositories, of course; and they include versioning, which is one of the huge things that sets code apart from data, which, once released, is typically a rather static object. So they're trying to get somewhere, and I think it's a good document to start with, but in my personal opinion it wasn't bold enough. We had this discussion at the RSE19 conference, where Anna-Lena was also present: it tries to stick very closely to FAIR, because they assume that this is what people know, which I think is good. On the other hand, there's a very clear recommendation from most bodies that FAIR should not be extended; as they say, we don't need "additional letters" for FAIR, and they really want to keep it as one concept tied to data. Therefore I think a bolder step would have been necessary, to try to work in all the established development policies we already have, rather than sticking as close as possible to FAIR and just changing the nitty-gritty details, which is what they did. But nevertheless, I think it's clearly worth reading.
Q: Thanks a lot for your talk; this resonated a lot with me. As someone working in research infrastructure, I think it's super important that we focus on recognizing research infrastructure, so all kinds of services like sustainable data storage for researchers, tools that help make data discoverable and things like that, as a public good, right?
A: Yes
Q: And next to what you mentioned, rightly so, with Microsoft, the other risk I currently see is that legacy publishers like Elsevier, Springer Nature and so on are trying to capture the whole market, trying to deliver on all the needs researchers have in the digital area with huge platforms. And this seems like a battle we have almost lost already. There are many interesting, very good free and open source alternatives to what they deliver, but it's really not well recognized why this is so important. That's my impression.
A: Yeah, I would second that. It's interesting to see the large publishing companies now really moving away from their traditional business, because apparently they have recognized that they might be on a losing path there, and towards offering wholesale data management solutions to institutes. This is probably just an anecdote, but apparently Elsevier made an offer to, I think, the Dutch government, saying: okay, we do all of your data management, you basically get everything for free, but each and every institution has to deliver its data to us, and we become your central data deposition platform. Well, unfortunately that might appeal to some politicians, but I think it doesn't appeal to anyone else in this room, given that Elsevier is probably a company that is even more hated than Microsoft, for reasons completely unknown. I mean, they just make a profit margin of about thirty-five percent every year, so maybe we should just buy stock options.
Q: Oh, thank you for your talk. What I don't completely understand is why we use the FAIR concept as a point of reference at all, because I feel that the concept of Open Access in science is far more applicable to code. In the end, code is text, and it's part of the scientific publication system: we have references from and to code and such things. And the concept of Open Access has the same ancestors as the scientific publication system, with the Mertonian norms of science and so on. So why don't we treat code like scientific publications?
A: Okay, honestly, I'm relatively open to this idea; that is the reason why we're having this discussion. What I'm presenting to you now is mainly derived from the existing EU policies, and the EU talks about FAIR a lot, because for them it's an operationalized thing: something they would like to test and score in the end, so that paper pushers have something to do. But I agree that we can simply say that, in the end, openness is more important, and FAIR, as we already said, isn't open, so Open Access might be the better point to hook this onto. So yes, I agree on that.
postroll music
Subtitles created by c3subtitles.de
in the year 2020. Join, and help us!