36C3 Wikipaka WG: Free Software for Open Science

  • 0:00 - 0:24
    36C3 preroll music
  • 0:24 - 0:30
    purine:bitter: Thanks a lot to WikiPakaWG
    for hosting this and for keeping us all
  • 0:30 - 0:39
    awake. So probably it's not wrong to say
    Good Morning everyone. Okay, what I would
  • 0:39 - 0:45
    like to do, so all of this has been
    announced as a discussion so there's
  • 0:45 - 0:52
    probably no point in me talking to you for
    something like 55 minutes straight. So I
  • 0:52 - 0:59
    would just like to give you a couple of
    slides on what we could discuss and then
  • 0:59 - 1:08
    see where we want to go with this one,
    okay? So to start off with: Who of you
  • 1:08 - 1:17
    considers him- or herself to be a
    scientist? Okay, who has the pleasure to
  • 1:17 - 1:25
    work within the European scientific
    system? Okay, and within the German one?
  • 1:25 - 1:34
    Okay, so negative control: Who knows what
    the capital of North Dakota is? Okay, so
  • 1:34 - 1:42
    there is no rigor mortis in your arms.
    Okay, so topic today is Free Software for
  • 1:42 - 1:47
    Open Science and as I have some
    association with the Free Software
  • 1:47 - 1:55
    Foundation Europe, well we should probably
    start with the definitions: So number one,
  • 1:55 - 2:00
    what do we consider to be Free Software in
    this one: It's pretty much every software
  • 2:00 - 2:07
    that would be released under an either
    FSF- or OSI-compliant license. So this is
  • 2:07 - 2:17
    what most people know also as Open Source
    and main point here is, as the FSF and OSI
  • 2:17 - 2:21
    definitions pretty much standardized the
    same things that they just have different
  • 2:21 - 2:32
    ways to say it, it should be made sure
    that it guarantees the Four Freedoms to
  • 2:32 - 2:39
    the user, so to use, to study, to improve
    and to share the piece of software and of
  • 2:39 - 2:46
    course this does require the existence and
    openness of a source code and the ability
  • 2:46 - 2:55
    to actually create derivatives. Okay so
    and I think for everyone who has been
  • 2:55 - 3:00
    working in science it's pretty clear that
    those four core freedoms are very well
  • 3:00 - 3:05
    aligned with what we're trying to do in
    science okay we're trying to build up on
  • 3:05 - 3:12
    the work of others and to get humanity
    along and increase our overall knowledge.
  • 3:12 - 3:20
    So for that reason what we're doing there
    is exactly that we're exercising those
  • 3:20 - 3:25
    four freedoms just not necessarily that
    we're doing it in a digital or code-based
  • 3:25 - 3:31
    manner. Okay so that's the first thing.
    Then what actually is Open Science? So
  • 3:31 - 3:37
    first of all, Open Science is a Class A
    buzzword. Nevertheless, the European
  • 3:37 - 3:45
    Commission took the liberty to get a
    committee in there, in that case the OSPP,
  • 3:45 - 3:53
    the Open Science Policy Platform, and
    those people developed a lot of bits of
  • 3:53 - 4:01
    paper, whatever. And what they defined is
    eight key areas, they are sometimes
  • 4:01 - 4:08
    called "ambitions", sometimes they're
    called "priorities", which is the key
  • 4:08 - 4:14
    things that need to be addressed in the
    midterm to move European science to what
  • 4:14 - 4:21
    they consider to be Open Science. And this
    is not only, and that's very important,
  • 4:21 - 4:26
    about the classical things that you might
    know like Open Access and Open Data. Open
  • 4:26 - 4:30
    Access and Open Data are basically
    incorporated in here, so scholarly
  • 4:30 - 4:35
    communication, it says "Future of
    Scholarly Communication", which can be
  • 4:35 - 4:43
    everything from Open Access to just going
    digital. However, we should all be aware
  • 4:43 - 4:51
    that the European Commission now has endorsed
    Plan S, which is a rather far-reaching
  • 4:51 - 4:56
    push towards a more or rather radical
    program in terms of publishing
  • 4:56 - 5:02
    requirements, so we can consider that this
    part for scholarly communication is really
  • 5:02 - 5:09
    meant to be Open Access. And then the
    other things, so Open Data is what is
  • 5:09 - 5:16
    called here to be FAIR Data, because the
    Commission typically tries to avoid the
  • 5:16 - 5:21
    term "Open", because "Open" of course
    is not FAIR and FAIR unfortunately is not
  • 5:21 - 5:26
    "Open". But this is where we lead our
    discussions. So this means that we only
  • 5:26 - 5:32
    have two of the classical Open Science
    points that are in here. Everything else
  • 5:32 - 5:38
    are things like "Incentives", so this is
    how can we generate better citation or how
  • 5:38 - 5:43
    can we make sure that the people who do
    the work get the credit, so we might need
  • 5:43 - 5:57
    some reform in how we do citations. Then
    "Indicators" is -- was that me or was that
  • 5:57 - 6:05
    okay -- so "Indicators" is kind of a way
    to try to overcome the simple citation
  • 6:05 - 6:13
    indices and of course especially the
    impact factor. "EOSC", for those of you who
  • 6:13 - 6:16
    have not heard that term that's a very
    large project, that's the European Open
  • 6:16 - 6:22
    Science Cloud. It's still rather ill-
    defined what it should be, it's getting
  • 6:22 - 6:27
    better along the way but the term has been
    out there for three years. In the end what
  • 6:27 - 6:33
    this is about is to really create a large
    federated European infrastructure for
  • 6:33 - 6:41
    scientific data. The main funding for that
    one will come from the National States and
  • 6:41 - 6:48
    so for example the German implementation
    is called NFDI, National Research Data
  • 6:48 - 6:53
    Infrastructure, and will be heavily funded
    by nearly 1 billion Euros over the next 10
  • 6:53 - 7:03
    years so this is the scale that we are
    talking about. "Integrity" means how to
  • 7:03 - 7:10
    assure integrity, "Skills" is how to train
    the next generation of scientists and CS
  • 7:10 - 7:16
    is the abbreviation for "Citizen Science".
    So with all of this you see that Open
  • 7:16 - 7:20
    Science is not just trying to do tick
    marks, what they're really trying to push
  • 7:20 - 7:29
    for is a rather fundamental change in the
    way how we do our work to what's really
  • 7:29 - 7:36
    becoming a more egalitarian system and a
    more open and participatory system. Okay,
  • 7:36 - 7:43
    so now the question is, what is the role
    that free software can play in this. And
  • 7:43 - 7:47
    so one of the things that we need to
    define here is: are we talking about Free
  • 7:47 - 7:54
    Software for Open Science, which is the
    thing that this talk was announced for.
  • 7:54 - 7:58
    But of course we could also, if that's the
    general interest, talk about Free
  • 7:58 - 8:04
    Software in Open Science or in science in
    general. So distinction would be that the
  • 8:04 - 8:09
    "for Open Science" is mainly, here we're
    talking about software as a research
  • 8:09 - 8:14
    product, so this is mainly the main focus
    software that is created by the scientists
  • 8:14 - 8:22
    themselves and here we then have of course
    issues like how to sustain it how to
  • 8:22 - 8:30
    ensure quality and how to choose proper
    licensing models for it. While the "in
  • 8:30 - 8:35
    science" is more generally talking about
    generic software tools so this is
  • 8:35 - 8:41
    operating system, office suites and so on
    that are just used by scientists more
  • 8:41 - 8:51
    generally. In both cases the main point of
    course is how Free Software can contribute
  • 8:51 - 8:57
    to the scientific endeavor is of course by
    promoting the reproducibility because
  • 8:57 - 9:05
    everyone can use these tools, there is
    no paywall in that case. So you
  • 9:05 - 9:12
    don't need to purchase a given Microsoft
    Office version to recreate an Excel table
  • 9:12 - 9:19
    or something like this and of course also
    the attempt to reduce black boxing. The
  • 9:19 - 9:29
    other thing that is more specific for Free
    Software for Open Science is the general
  • 9:29 - 9:36
    thing that we already said: Okay, so some
    of the ideas of Free Software align well
  • 9:36 - 9:41
    with what we're trying to do in science.
    But more importantly the question right
  • 9:41 - 9:47
    now is: Does it fit the policies under
    which we are operating? And so of course
  • 9:47 - 9:56
    the main policy that most people know is
    FAIR. So FAIR stands for Findable,
  • 9:56 - 10:02
    Accessible, Interoperable and Reusable and
    it's a kind of a paradigm that was
  • 10:02 - 10:12
    defined, so published in 2016, was in the
    making for a couple of years before that
  • 10:12 - 10:18
    and this is something that was primarily
    geared towards data. The nice thing about
  • 10:18 - 10:25
    FAIR is that the 2016 paper also
    operationalizes this so they give criteria
  • 10:25 - 10:33
    on what you need to do or what you need to
    ensure that for example a data set is
  • 10:33 - 10:39
    findable, what it means how it needs to be
    accessible and so on so forth. And of
  • 10:39 - 10:45
    course reuse also says something about,
    well you need to put a license on it, but
  • 10:45 - 10:53
    otherwise it's not that specific. Okay,
    now importantly for this one: stuff that
  • 10:53 - 10:59
    is FAIR does not necessarily align with
    Free Software because Free Software means
  • 10:59 - 11:04
    that there are no restrict- that there are
    basically no restrictions in use, while
  • 11:04 - 11:17
    the reusability for FAIR simply says:
    People somehow need to be able to reuse
  • 11:17 - 11:23
    it, so there needs to be a clear pathway.
    That can still be a proprietary license,
  • 11:23 - 11:30
    okay and that license might still not
    allow you to do everything with it, there
  • 11:30 - 11:36
    just needs to be this ability. So that's
    one of the main things where FAIR does not
  • 11:36 - 11:42
    fit the usual - the Free Software
    definitions. On the other hand of course,
  • 11:42 - 11:54
    Free Software doesn't say anything about
    -- Oh No! I killed the alpaca! --
  • 11:54 - 12:00
    Applause
    Okay, I'm probably gonna be kicked off the
  • 12:00 - 12:14
    stage any minute, okay sorry. Alright, so
    on the other hand, I can write beautiful
  • 12:14 - 12:18
    code and put it under an Open Source
    license and put it on a USB stick and bury
  • 12:18 - 12:25
    it somewhere in my garden. Okay, so then
    it's neither findable nor accessible and
  • 12:25 - 12:31
    this is of course also something where the
    classical definitions for Free Software
  • 12:31 - 12:35
    don't necessarily match these two
    criteria, which nevertheless also for
  • 12:35 - 12:43
    software do make sense. Finally one last
    thing is that FAIR defines a product, so
  • 12:43 - 12:46
    it says: Okay, so the outcome of your
    research needs to comply with different
  • 12:46 - 12:51
    criteria and that's of course a relatively
    easy thing to test. What it does not do
  • 12:51 - 12:56
    and maybe from a software development
    perspective this is something that is more
  • 12:56 - 13:01
    important, it doesn't define a process how
    we do things. And this is one of the
  • 13:01 - 13:09
    things that also one of the German
    committees so the RfII has recently
  • 13:09 - 13:15
    started to criticize for FAIR that we say
    okay, FAIR data just says this one, but
  • 13:15 - 13:20
    you can have completely rubbish data and
    it can still be FAIR. But what we want to
  • 13:20 - 13:28
    have is high quality FAIR data. So FAIR
    clearly is some kind of minimal consensus
  • 13:28 - 13:35
    it's condicio sine qua non, but we
    probably need to extend it at this point
  • 13:35 - 13:41
    and of course with this one we can also
    discuss on how we want to continue, how we
  • 13:41 - 13:49
    want to get this into or align this with
    Free Software. Okay, so that's more or
  • 13:49 - 13:55
    less the brief introduction, now there are
    a couple of things that we can discuss
  • 13:55 - 14:02
    further, depending on your interest. And
    that would be basically what about the
  • 14:02 - 14:06
    current European policies, before we
    review what about the current German
  • 14:06 - 14:16
    policies, what about generic Free Software
    tools. But maybe that's the point where
  • 14:16 - 14:32
    you could say something to
    get us going a bit.
  • 14:32 - 14:35
    Question: I think it's working -- You
    mentioned that the current software
  • 14:35 - 14:40
    standards might not be in line with the
    policies, what were you exactly referring
  • 14:40 - 14:42
    to?
    Answer: Can you repeat this?
  • 14:42 - 14:46
    Q: You mentioned before that the current
    software procedures or standards might not
  • 14:46 - 14:51
    be in line with the policies in the
    European Union. What exactly did you mean
  • 14:51 - 15:04
    by that?
    A: So the thing is that the so I can
  • 15:04 - 15:11
    comply with OSI regulations for Open
    Source Software, but none of our funding
  • 15:11 - 15:18
    bodies says you need to be OSI compliant.
    What they say typically is you should do
  • 15:18 - 15:24
    stuff that is FAIR but right now one of
    the issues, this is what basically this
  • 15:24 - 15:32
    slide then says, is the question whether
    any of the policy makers really define
  • 15:32 - 15:38
    code as a primary research object. And
    that's right now not the case so therefore
  • 15:38 - 15:44
    everyone assumes that code behaves like
    data and to equate code with data is
  • 15:44 - 15:50
    something where some people get cold
    shivers, others don't because it is an
  • 15:50 - 15:55
    operation that you can do, it's a lossy
    operation, but it might help
  • 15:55 - 16:03
    us in some ways. And the main point here
    is that code has some idiosyncrasies that
  • 16:03 - 16:07
    make it distinct from data and this is
    where our policies break. On the other
  • 16:07 - 16:12
    hand, some of the policies that we came up
    -- not for research but in general, so
  • 16:12 - 16:18
    from the Free Software
    perspective -- that we made up there,
  • 16:18 - 16:23
    didn't make it into the policy documents
    and so therefore are not incorporated
  • 16:23 - 16:30
    there. Okay, so FAIR criteria and the
    other ones don't completely overlap. So
  • 16:30 - 16:34
    most people might write code but it still
    won't align with a FAIR criterion if you
  • 16:34 - 16:48
    would take it one to one.
    Q: So a question about the topic item to
  • 16:48 - 16:53
    start the licensing. So when we say we
    have a commercial company who like
  • 16:53 - 16:59
    Microsoft who develops an office package
    and when you say Free Software for Open
  • 16:59 - 17:05
    Science it would be better to like invest
    the money not into license costs that are
  • 17:05 - 17:10
    recurring, but better for, like, a
    bigger thing like country to invest more
  • 17:10 - 17:18
    in like open code or like open programs.
    Is this kind of like tackled by what you
  • 17:18 - 17:25
    mean with the FAIR or the Open Source?
    A: This is one of the things that
  • 17:25 - 17:32
    is not necessarily, so you
    could construct it in a way that it
  • 17:32 - 17:37
    actually overlaps with FAIR. Because
    you're talking about reproducibility, oh
  • 17:37 - 17:42
    well so okay, FAIR doesn't say
    reproducibility but it says accessibility
  • 17:42 - 17:46
    and if you're using formats that are
    proprietary you could say okay well this
  • 17:46 - 17:51
    is not accessible to everyone because you
    need to pay for it. Now the thing is that
  • 17:51 - 17:55
    there are a lot of things where you have
    to pay for so this was one of the things
  • 17:55 - 18:03
    that was never on the agenda to try to be
    eradicated. This is, so the generic
  • 18:03 - 18:09
    software part is just something
    that came into this whole process later,
  • 18:09 - 18:17
    initially it was really geared towards
    the: How can scientists make sure that or
  • 18:17 - 18:21
    how can the software produced by
    scientists be both Free Software and
  • 18:21 - 18:27
    contribute to Open Science and what do we
    need to do to create potentially
  • 18:27 - 18:33
    additional funding opportunities for,
    because this is where it typically breaks, to
  • 18:33 - 18:40
    say well I can write better code if I have
    more man or woman power, if I have people
  • 18:40 - 18:46
    who curate, if I have people who do
    issue fixing and so on and so forth. Which
  • 18:46 - 18:53
    right now is not considered part of the
    research process but in reality, so by the
  • 18:53 - 18:58
    policy makers, but in reality it already
    has become that. Now if you're saying you
  • 18:58 - 19:04
    are using generic software or generic
    office suites for that one, then yes, we
  • 19:04 - 19:09
    are investing a lot in these things in
    the tertiary education and in the research
  • 19:09 - 19:16
    sector and, personal opinion, yes we
    should spend this on things that don't
  • 19:16 - 19:22
    nudge people towards proprietary
    solutions. But the question there but
  • 19:22 - 19:29
    that's something that is because it has
    a stronger education component also for
  • 19:29 - 19:35
    student education, so I wanted to bring it
    up here because I thought okay maybe it's
  • 19:35 - 19:41
    something that more people here are
    interested in. But I agree that it doesn't
  • 19:41 - 19:49
    overlap completely, doesn't strongly
    overlap with the Open Science
  • 19:49 - 20:00
    part.
    Q: Right, okay. I've heard some people
  • 20:00 - 20:05
    work on the FAIR principles specific for
    software. Have you heard about it and do you
  • 20:05 - 20:14
    know what the differences are?
    A: Yes, so thanks for this input. So let
  • 20:14 - 20:24
    me check. Okay I've missed that one. So
    yeah, there's a recent paper that just
  • 20:24 - 20:33
    came out a couple of weeks ago by Anna-
    Lena Lamprecht, she's from the Netherlands
  • 20:33 - 20:42
    eScience Center. So what they try to do
    is, they use the catalog of the
  • 20:42 - 20:48
    original FAIR criteria and check for each
    of those ones does it apply to software,
  • 20:48 - 20:59
    yes or no? And then change them, amend
    them in a way to make sure that it then,
  • 20:59 - 21:04
    well, better fits into the process. So
    they for example say well so there needs
  • 21:04 - 21:10
    to be some kind of documented quality
    control, they're more talking of course
  • 21:10 - 21:14
    about software repositories, they then
    include versioning, which is one of the
  • 21:14 - 21:19
    huge things that sets code apart from
    data, which is once it's released
  • 21:19 - 21:25
    typically a rather static object. So
    they're trying to get somewhere and I
  • 21:25 - 21:35
    think it's, it's a good document to start
    with but in my personal opinion, I think
  • 21:35 - 21:39
    it wasn't bold enough. You might have
    been, I mean we had this discussion at the
  • 21:39 - 21:48
    RSE19 conference also, where Anna-Lena
    also was there, and it tries to stick very
  • 21:48 - 21:53
    closely to FAIR, because they assume that
    this is what people know. Which I think is
  • 21:53 - 21:57
    good. On the other hand there's a very
    clear recommendation from most bodies that
  • 21:57 - 22:02
    FAIR should not be extended, so we don't
    need, as they say, we don't need
  • 22:02 - 22:07
    "additional letters" for FAIR and they
    really want to have those basically as one
  • 22:07 - 22:15
    concept to stick with data. So
    therefore I think it would have been
  • 22:15 - 22:23
    necessary to have a bolder step to try to
    work in all the established development
  • 22:23 - 22:29
    policies that we already have than just to
    stick as close as possible to FAIR and
  • 22:29 - 22:34
    then just change the nitty-gritty details,
    which is what they did. But nevertheless I
  • 22:34 - 22:38
    think it's something that is clearly
    worth reading.
  • 22:38 - 22:43
    Q: Thanks a lot for your talk, this
    resonated a lot with me and as someone
  • 22:43 - 22:50
    working in research infrastructure I think
    it's super important that we focus on
  • 22:50 - 22:56
    recognizing research infrastructure so all
    kinds of services like sustainable data
  • 22:56 - 23:02
    storage for researchers, tools that help
    make data discoverable and things like
  • 23:02 - 23:05
    that. That this should be considered a
    public good right?
  • 23:05 - 23:09
    A: Yes
    Q: And so next to what you mentioned and
  • 23:09 - 23:14
    rightly so with Microsoft, the other risk
    that I currently see, is that legacy
  • 23:14 - 23:21
    publishers like Elsevier, like Springer-
    Nature and so on, try to capture the whole
  • 23:21 - 23:30
    market, so this is all about trying to deliver on
    all the needs that researchers have in the
  • 23:30 - 23:38
    digital area with huge platforms. And this
    is like a battle that we almost have lost
  • 23:38 - 23:45
    already, as it seems. So there are many
    interesting very good free and open source
  • 23:45 - 23:51
    alternatives to what they deliver but it's
    really not recognized very well why this
  • 23:51 - 23:57
    is so important. This is my impression.
    A: Yeah, I mean, I would second
  • 23:57 - 24:03
    that. So, I think it's
    interesting to see the large publishing
  • 24:03 - 24:08
    companies now really moving away from
    their traditional business because
  • 24:08 - 24:12
    apparently they have recognized that they
    might be on a losing path there. But
  • 24:12 - 24:19
    really to offer wholesale data
    management solutions to institutes. I mean
  • 24:19 - 24:23
    there is, this is probably just an
    anecdote, but so apparently Elsevier
  • 24:23 - 24:29
    offered to I think the Netherlands or the
    Dutch government to say that they said:
  • 24:29 - 24:35
    Okay, we do all of your data management or
    basically you get everything for free, but
  • 24:35 - 24:41
    each and every institution has to deliver
    but we become your central data deposition
  • 24:41 - 24:50
    platform. Which well, unfortunately it
    might appeal to some politicians, I think
  • 24:50 - 24:56
    it doesn't appeal to anyone else in this
    room given that probably Elsevier is a
  • 24:56 - 25:03
    company that is even more hated than
    Microsoft for reasons completely unknown I
  • 25:03 - 25:08
    mean they just make a revenue of thirty-
    five percent every year so maybe we should
  • 25:08 - 25:18
    just buy stock options.
    Q: Oh thank you for your talk. What I do not
  • 25:18 - 25:24
    completely understand is why we use the
    FAIR concept as a point of reference
  • 25:24 - 25:29
    at all. Because I feel like the
    concept of Open Access in science is far
  • 25:29 - 25:34
    more applicable to code. So in the end
    code is text and it's part of the
  • 25:34 - 25:39
    scientific publication system, so we have
    references from and to code and such
  • 25:39 - 25:47
    things. And the Open Access, yeah,
    the concept of Open Access has the
  • 25:47 - 25:52
    same ancestors as the scientific
    publication system with the Mertonian
  • 25:52 - 26:00
    norms of science and such, so why not
    treat code like scientific publications?
  • 26:00 - 26:05
    A: Ok, honestly I'm relatively open to
    this idea because this is, I mean, the
  • 26:05 - 26:11
    reason why we're having this discussion.
    What I'm presenting to you now
  • 26:11 - 26:16
    is mainly developed out of the existing EU
    policies and the EU talks about FAIR a
  • 26:16 - 26:21
    lot. Because for them it's an
    operationalized thing, it's something that
  • 26:21 - 26:23
    they would like to test in the end,
    it's something that they would like to
  • 26:23 - 26:30
    score and so on so forth so that paper
    pushers have something to do with. But I
  • 26:30 - 26:37
    agree that we can simply say well in the
    end the openness is more important and
  • 26:37 - 26:47
    FAIR, as we already said, isn't open, so
    therefore Open Access would maybe be the
  • 26:47 - 26:53
    better point to hook this up, so yeah I
    agree on that.
  • 26:53 - 26:57
    postroll music
  • 26:57 - 27:20
    Subtitles created by c3subtitles.de
    in the year 2020. Join, and help us!
Video Language:
English
Duration:
27:15
