Well, it's almost time
to begin the presentation.
We will begin this last session
with a presentation on WikiCite,
led by Elizabeth Seiver,
Simon Cobb, and Liam Wyatt.
And I'll just let you introduce yourself.
Please don't hesitate
to take notes on Etherpad.
Thank you for everything.
Alright, let's get started.
So, I'm Elizabeth Seiver.
I'm the outgoing
program manager for WikiCite.
And I wanted to tell you all
a little bit about it.
Just as a show of hands, how many people
are already familiar with WikiCite?
That's great. I'm just glad
that so many of you are.
I was wondering how many people here--
I was thinking about it--
are just like, "Who are all these people
putting all the citations
in Wikidata and filling it up?"
And WikiCite is so much more.
So, we're all excited
to tell you about it today.
So, what is WikiCite?
The goal of WikiCite
is to collect all citations
for the sum of all human knowledge.
You know, just a little something.
And we're doing this in a number of ways.
And one of them
is via conferences and workshops
and getting together
the community of people
who are interested
in working on citations.
And it's a very diverse group of people.
So, of course, we have people
who are working in Wikidata,
and other Wikimedians.
We have librarians,
people into linked open data,
software engineers, data scientists,
open knowledge advocates--
coming together about
linked open bibliographic data.
So, in terms of the history of WikiCite,
it was founded as an initiative in 2016.
And we secured dedicated funding
for events for three years in 2018.
And as I mentioned,
you're probably familiar with the big--
the millions of citations
that we already have
that are hosted on Wikidata.
So, what are we doing in WikiCite
and with all these citations?
It's not just about collecting them.
It's about using them.
And it creates so many opportunities
for new projects.
So, one of the things
you can do with this data
is build data models
for bibliographic item types,
which should be exciting for people
who are into schemas.
You can also do open cataloging
and disambiguation--
sorry, my notes are not in sync with this.
And people are also building
tools on top of this.
Visualization tools, such as Scholia.
If you're interested at all
in open cataloging,
or author disambiguation,
or just even figuring out
how sources link together,
WikiCite is a good way to do that.
So, in terms of the direction
that WikiCite is heading in,
one of the things
we're trying to do is expand
all the types of things that are cited.
Right now, in Wikidata,
it's mostly journal articles.
We'd like to keep growing our community,
especially outside of the Global North
and outside of English
language publications.
And I realize this is actually something
that Liam will be talking about.
So, what we wanted to do now
is sort of a deep dive
into one of the uses of Wikidata.
So, for that, I would like
to introduce Simon Cobb.
Hi, everyone.
So, what I want to talk about
is an example of something
we could potentially focus on
within the scope of WikiCite.
And that's the data quality issues
that I've been encountering
over the last year, as I've been editing
on scholarly papers.
The three issues I'm going
to briefly touch on
are the quality of the author items
that are getting attached
to scholarly articles,
issues around DOI formats,
and just general curation
of the data that we're creating.
Firstly, we look at some authors.
Oh, sorry, firstly,
I'll provide some context.
We've got 26 million
scholarly article items now.
And the data quality issues
I'm going to talk about
affect a very small proportion of these;
we are generally creating
quite good quality data.
We have a lot of external identifiers--
21.65 million PubMed IDs,
19 million DOIs, and we've added
8.3 million author statements,
although we still have 105.5 million
author name strings to replace.
In terms of the authors,
we've been creating a lot
of items from ORCID IDs.
We've got over half a million items
with an ORCID ID now.
But over 50% of those
do not have any affiliation data yet.
And that's none in employer
or in educated at.
I found 25,000
where we only have two statements.
That's an ORCID ID,
and instance of human.
This isn't particularly
useful in terms of
reuse by anyone else
beyond Wikidata.
If we're serious about approaching
a bibliographic database
and providing open data for people,
we really need to be focusing
on quality, I believe.
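
[Editor's sketch] The author-item checks described above can be expressed as two small tests. This is only an illustration under stated assumptions: each item is reduced to a plain mapping from property ID to a list of values (not the full Wikidata JSON), the function names are hypothetical, and the sample values are placeholders. The property IDs are the Wikidata ones for ORCID iD (P496), instance of (P31), employer (P108), and educated at (P69).

```python
# Wikidata property IDs used by the checks.
ORCID = "P496"        # ORCID iD
INSTANCE_OF = "P31"   # instance of
EMPLOYER = "P108"     # employer
EDUCATED_AT = "P69"   # educated at


def is_two_statement_author(claims: dict) -> bool:
    """True if the item carries only an ORCID iD and an instance-of statement."""
    return set(claims) == {ORCID, INSTANCE_OF}


def lacks_affiliation(claims: dict) -> bool:
    """True if the item has neither an employer nor an educated-at statement."""
    return EMPLOYER not in claims and EDUCATED_AT not in claims


# Placeholder items; the identifiers below are made up for illustration.
bare_item = {ORCID: ["0000-0000-0000-0000"], INSTANCE_OF: ["Q5"]}
richer_item = {
    ORCID: ["0000-0000-0000-0000"],
    INSTANCE_OF: ["Q5"],
    EMPLOYER: ["Q_EXAMPLE_EMPLOYER"],
}

for name, item in [("bare", bare_item), ("richer", richer_item)]:
    print(name, is_two_statement_author(item), lacks_affiliation(item))
```

Items flagged by both checks are the ones that most need affiliation and name data added before they are useful outside Wikidata.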
So, there's a lot of work to be done.
We've done really well
with automatic input,
but I think we need to, in the future,
step back and think
how can we really make this data useful.
And one of the ways to do that
is by making our author items
better quality
by adding affiliation information,
adding first names, surnames,
and just moving beyond
occupation researcher,
trying to get what field people
are working in, for example.
Moving on to DOIs.
When I was looking at how many
scholarly papers we have now,
I immediately noticed that we have DOIs
that are just four characters.
And that is not a correct DOI.
We've got about 110 items
with this DOI format.
In the grand scheme of things,
not that big a problem.
But that's never been a correct DOI;
it's been created
by an automatic process.
No one's checked that and realized
we had this error and corrected it.
So, it's kind of an appeal
I want to make to people--
if you're doing batch imports,
to check what you're doing,
look for these obvious
data quality problems.
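
[Editor's sketch] The kind of pre-import check asked for here could look like the following. It is a loose heuristic, not the DOI specification: it assumes a DOI starts with "10.", followed by a 4 to 9 digit registrant code, a slash, and a non-empty suffix. The function names and the sample values are hypothetical.

```python
import re

# Loose shape check: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
# This is an assumed heuristic and will not catch every malformed DOI.
DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value: str) -> bool:
    """Return True if the value has the basic shape of a DOI."""
    return bool(DOI_SHAPE.match(value.strip()))


def split_batch(values):
    """Split a batch into plausible DOIs and values that need manual review."""
    plausible, needs_review = [], []
    for value in values:
        (plausible if looks_like_doi(value) else needs_review).append(value)
    return plausible, needs_review


# Illustrative batch: one well-shaped value and two that are too short to be DOIs.
plausible, needs_review = split_batch(
    ["10.1371/journal.pone.0000001", "10.1", "abcd"]
)
print("plausible:", plausible)
print("needs review:", needs_review)
```

Running a check like this before a batch import would set aside four-character values for a human to look at instead of writing them to Wikidata.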
And a final issue
that I've noticed is errata.
We have over 13 thousand items
that are instance of errata,
but they're not linked
to the paper they're correcting.
So, I've also produced a table
of the top ten titles of the--
these are errata items.
You will notice they're not
particularly informative.
So, at some point,
we're going to have to go back
and look at how we can actually
get the information
about what these errata are correcting,
because they're not really
of much use to anyone at the moment.
So, in the future, I hope this is one area
that we can work on as a community,
and we can coordinate a bit better
with what data imports we're doing,
and how we can curate all our data,
bring it all together,
and combine our expertise.
I'm going to pass over to Liam now
to talk a bit about how we might be able
to coordinate our efforts in the future.
Thank you.
So, as mentioned
in the final slide from Elizabeth,
WikiCite is trying to be
more and more diverse,
and high quality, and more widely spread.
The idea is over the next year or so,
with the dedicated funding
that's been provided and is available
over a three-year period,
of which we've entered,
to change WikiCite-- the conference--
which there's been a few--
into a series of proposals from you,
into what we're calling
"satellite events" around the world.
This will be focusing--
there'll be a call-for-proposals system,
with a review procedure
that is currently not yet invented
for deciding how to--
what's the word I'm after--
prioritize these requests.
And see if we can't get a wider diversity
of content, contributor, and topic
supported under the WikiCite umbrella,
through this series of satellite events.
To that end,
the WikiCite grant--
was successfully applied for and received
through the work
of WikiCite's father, Dario,
who many of you might know
from the Wikimedia Foundation.
Dario no longer works
with the Wikimedia Foundation,
and so this grant has a--
needed a home.
What has happened
is that the WikiCite steering committee,
primarily made up of the organizing team
from last year's WikiCite conference,
will continue to oversee this work,
and the Wikimedia Foundation
has hired a temporary
or a part-time coordinator,
to oversee and support that work,
and to promote and receive
those applications
for the satellite events.
And that will be me.
(laughter and cheers)
So, I got the call yesterday
so that I could be able
to like confirm that in--
among an audience
which is highly relevant to that topic.
Which is helpful, so I can talk to you
here and now about that.
So, this is listed as a panel
in the program.
Even though it's a bit of a--
I think panel is a generous way
of describing the three of us
in this context.
But the idea is we would like
to hear from you
on that immediate thought about--
or questions to Simon, as well--
if you have questions
for Simon, specifically--
about what you think are good directions
that should be addressed
or should be attempted
in this forthcoming year,
either individually, online--
and things that not
necessarily you can do,
but think should be done.
And specifically, to start thinking about
what a satellite event would mean
with relation to open citations
and how the community at large
would best be served
by that kind of support.
Beyond merely financial,
but what does support mean
for satellite events in open citations
according to you.
If you want to come back up, and we can--
Did you have a question?
(woman) Ah, yes. I do research
on predatory publishing
and on retractions.
You only mentioned errata.
So, how are you dealing
with expressions of concern
and retractions?
And what is your policy on trying
to identify predatory publishers?
Okay, so, within the scope
of preparing for this,
I wasn't looking at retractions,
but people have been doing work on that
and trying to-- we have a property--
notice of retractions--
so we can be creating those links.
I don't know to what extent
that's happened. In the same way,
not all the errata are linked
to the paper that's being corrected.
I suspect that's a similar case with--
- (woman) It's exactly the same.
- Yeah.
As I said, I wasn't looking at that,
but we can potentially link the retraction
to the retracted article,
the retraction notice
to the retracted article.
In terms of predatory publishers,
I'm not aware of anyone
having done any work in this area,
but I wouldn't like to say
that hasn't happened.
We have Charles, whose hand
is going up there.
Do you want to comment
on predatory publishers, Charles?
(Charles) Well, I encountered
this problem in the ScienceSource project.
And first of all, I did what I could
to put Beall's list in Wikidata format.
Beall's list isn't sort of what everybody
wants to be dealing with,
but it was a starting point.
So, that has been done,
as far as I was able to.
But the thing I rely on more, perhaps,
is DOAJ IDs.
That is, if we put all the DOAJ IDs
into Wikidata,
we'd have made a really good attempt
to isolate the predatory publishers.
And that is not the whole story,
but these days,
it's the bulk of the story.
(woman) [Is the directory
of open access there?]
- (Charles) Directory of open access, yes.
- (woman) Alright, good.
(man) To start with, I just spent a year
traveling around New Zealand
trying to explain Wikidata
to the library community,
and as soon as I mentioned WikiCite,
their eyes rolled,
because they've just been told
they have to be [up] with Wikipedia,
Wiki Commons, Wikidata.
Here's another Wiki project
that they need to know about.
"Why can't we just do it all
with Wikidata?" they were saying.
So, there's a public perception
problem straightaway,
and that's the very community
that we need to have onboard
for this to work.
I'm interested in thinking
how we are going to reach
the library community, educate them,
and get them integrally involved
in this process?
I have thoughts, but I'd like
to hear your thoughts first.
- Sure, I think--
- (assistant) [This one is on.]
This better? Alright.
Feel like I'm in a concert.
So, one of the things we've tried to do
is incorporate librarians and libraries
into WikiCite in everything that we do.
So, on the steering committee,
we have at least
two librarians, if not more.
And at our actual WikiCite events,
one of the things that's actually
pretty great about WikiCite
is that we end up getting
both speakers and participants,
who maybe are not actually involved
in any Wiki projects.
So, we don't have Wiki fatigue.
And a lot of times, they're coming
from the perspective of...
"Well, I'm interested in linked open data,
I love to use citations at my university,
can you tell me a little bit more
about how Wikidata works,
and how I might use the citations
that are in Wikidata?"
So, I think it's very much
about bringing these communities together,
which might seem disparate,
around these common goals
for people who are really concerned
about curating data,
and then, people who might already know
about how to do that on Wikidata.
I would say, in terms of the confusion,
the complexity implied by the question
of well, there's WikiCite,
and there's Wikidata, and there's this...
WikiCite is a brand name,
it's a project-- GLAM-Wiki--
GLAM-Wiki also uses the word Wiki,
but it's not pretending to be a Wiki
or competing with Wikipedia and Wikidata.
It's the particular focus area
of reference information,
"referenceable" information.
Now, particularly in the context
of a series of conferences
that have happened
over the last few years,
and the conference is called WikiCite--
particularly within this community,
the Wikidata core group,
WikiCite is seen, known, understood
as a large number of items
uploaded to Wikidata
about scholarly publications.
That is what is understood as WikiCite
by this community, mostly.
I would like to--
there is a question about,
could WikiCite be made
into its own Wikibase
of just citation stuff?
Not Wikidata, and then there's federation,
and funky things like that,
and you could put a lot more
very specific information
about individual, citable things there,
which is a perfectly valid way
of dealing with questions
of notability and properties.
But the technology for doing that
is not yet ready in any way.
We need a lot more work,
particularly on federation in Wikibase
to make sure everything syncs neatly.
So, until such time
as that would be a viable outcome,
in the meantime, all of the things
that would serve that kind of outcome
also serve just improving
the quality on Wikidata
and improving the links
with Wikipedia and Wikisource.
The brand name is,
as far as I'm concerned, irrelevant.
It's just the project to make
better footnotes.
(woman 2) Just a comment
in relation to your query
about satellite proposals
for satellite conferences--
I don't think you realize
the level of ignorance
about Wiki-anything
from our country in New Zealand.
I mean, seriously.
As an Australian, I recognize
the ignorance of New Zealanders--
(laughter)
(woman 2) Oh, [inaudible], come on!
What I'm trying to say
is that if we have a satellite
or somehow organize
a joint satellite conference,
from my perspective, what I'm looking for
is strategies and how
to engage the community.
They aren't even at the level of being--
they don't know enough
to even be enthusiastic
about Wikidata and WikiCite yet.
They look at it with a lot of skepticism,
if they're even aware of it.
So I, in particular, want to be able
to have a meeting
in order to be able to learn from those
who've already engaged
more successfully with the community
to get a skill base in order to build
some collaborations in New Zealand.
You're talking about extra people
to actually engage with.
I just want the core library community
to get on board,
and then go the extra step.
It's like I'm looking at you saying
that we want to reach out
to other communities,
and I'm saying, I just want
to reach out to a community.
You know, we're a lot further
behind where you are.
So, yeah.
I would not wish to pretend that WikiCite
and open bibliographic information
is the be-all and end-all of Wikidata
or Wikimedia outreach.
It's a specific subset.
And I would not wish to try
and make WikiCite, a brand,
appear to be overriding or replacing
or somehow getting in the way
of just general, good quality outreach
about Wikimedia,
and working with libraries,
in general, and Wikidata,
even more specifically.
This is a subset of Wikidata.
So, particularly, for WikiCite
satellite events,
I don't want to make it appear
like there's a competition for Wiki--
so, everything about Wikidata now
has to be called WikiCite-- no.
This is a really quite niche--
in the scheme of things-- topic area.
Supporting general awareness-raising
about Wikidata,
open access information,
and Wikimedia is far beyond the scope
of this kind of particular
specialist outreach.
And that's not to say
that it's not a good thing, too.
(woman 2) I just perceived--
sorry, one more comment--
WikiCite as the possible inroad
to the wider community
for the people we want to get on board.
So, to me, WikiCite is--
yes, it's a subset,
and really a much smaller set
of beliefs and information, et cetera--
but I see it as an easy steppingstone
to get them addicted,
and then you can open it up.
So, yeah.
(assistant) We have just time
for one short question.
So, does one of you have another question
for the WikiCite team?
Thank you for sharing
this feedback with us.
Oh, somebody has a question.
(assistant) Which one of you wants to...
(woman 3) Hi, thank you so much for this.
I was just wondering,
is there ever going to be
a pairing of the bibliography
used in Wikipedia articles and WikiCite?
Are you planning to move
all those references and parse them
so that we can do some analyses
of which references we're using
in the Wikipedia articles--
and when you create an article
in another language,
just to get suggestions like,
"These are the references that have been used,"
kind of like that.
I know one of the short-term goals
of WikiCite is to have all citations
in Wikimedia projects represented in Wikidata.
Currently, there's not
an automatic pipeline
that keeps that updated,
but that's definitely one
of our primary goals.
And ultimately, there
is not specific support
in the developer community
for that kind of activity in particular.
That's on the interests
of individual community members
to do exports-- like all this work
that's been demonstrated
that's not from the foundation--
people doing individual work
on their interests.
So, that could be a good satellite event
to try and explore that kind of work.
Getting a good pipeline
so that you can make references
in Wikipedias easily hook
into Wikidata items,
multilingual, et cetera--
does not yet exist technologically,
and certain languages
have concerns about that.
The larger the Wikipedia language,
the more defensive they are
about using Wikidata directly.
But that'll come.
Yeah, I was just going to say
when Liam's finished with that--
that citations are something
that is very much within scope,
and what we would like to work toward,
but that needs community to build this,
to take on that challenge, I think.
And also, we need to be doing the outreach
to the Wikipedians to show them
that we can provide good
quality data consistently.
(assistant) We are running out of time.
So, if someone has another question,
I think that these nice people
will answer you privately after.
So, it's time for us,
for the last edition,
and we are welcoming on stage
Jean-Fred, Envel, and...
(applause)