-
Well, it's almost time
to begin the presentation.
-
We will begin this last session
with a presentation on WikiCite,
-
led by Elizabeth Seiver,
Simon Cobb, and Liam Wyatt.
-
And I'll just let you introduce yourselves.
-
Please don't hesitate
to take notes on Etherpad.
-
Thank you for everything.
-
Alright, let's get started.
-
So, I'm Elizabeth Seiver.
-
I'm the outgoing
program manager for WikiCite.
-
And I wanted to tell you all
a little bit about it.
-
Just as a show of hands, how many people
are already familiar with WikiCite?
-
That's great. I'm just glad
that so many of you are.
-
I was wondering how many people here--
I was thinking about it--
-
are just like, "Who are all these people
putting all the citations
-
in Wikidata and filling it up?"
-
And WikiCite is so much more.
-
So, we're all excited
to tell you about it today.
-
So, what is WikiCite?
-
The goal of WikiCite
is to collect all citations
-
for the sum of all human knowledge.
-
You know, just a little something.
-
And we're doing this in a number of ways.
-
And one of them
is via conferences and workshops
-
and getting together
the community of people
-
who are interested
in working on citations.
-
And it's a very diverse group of people.
-
So, of course, we have people
who are working in Wikidata,
-
and other Wikimedians.
-
We have librarians,
people into linked open data,
-
software engineers, data scientists,
open knowledge advocates--
-
coming together about
linked open bibliographic data.
-
So, in terms of the history of WikiCite,
-
it was founded as an initiative in 2016.
-
And we secured dedicated funding
for events for three years in 2018.
-
And as I mentioned,
you're probably familiar with the big--
-
the millions of citations
that we already have
-
that are hosted on Wikidata.
-
So, what are we doing in WikiCite
and with all these citations?
-
It's not just about collecting them.
It's about using them.
-
And it creates so many opportunities
for new projects.
-
So, one of the things
you can do with this data
-
is build data models
for bibliographic item types,
-
which should be exciting for people
who are into schemas.
-
You can also do open cataloging
and disambiguation--
-
sorry, my notes are not in sync with this.
-
And people are also building
tools on top of this.
-
Visualization tools, such as Scholia.
-
If you're interested at all
in open cataloging,
-
or author disambiguation,
-
or just even figuring out
how sources link together,
-
WikiCite is a good way to do that.
-
So, in terms of the direction
that WikiCite is heading in,
-
one of the things
we're trying to do is expand
-
all the types of things that are cited.
-
Right now, in Wikidata,
it's mostly journal articles.
-
We'd like to keep growing our community,
-
especially outside of the Global North
-
and outside of English
language publications.
-
And I realize this is actually something
that Liam will be talking about.
-
So, what we wanted to do now
-
is to do sort of a deep dive
into one of the uses of Wikidata.
-
So, for that, I would like
to introduce Simon Cobb.
-
Hi, everyone.
-
So, what I want to talk about
is an example of something
-
we could potentially focus on
within the scope of WikiCite.
-
And that's the data quality issues
that I've been encountering
-
over the last year, as I've been editing
on scholarly papers.
-
The three issues I'm going
to briefly touch on
-
are the quality of the author items
that are getting attached
-
to scholarly articles,
issues around DOI formats,
-
and just general curation
of the data that we're creating.
-
Firstly, we look at some authors.
-
Oh, sorry, firstly,
I'll provide some context.
-
We've got 26 million
scholarly article items now.
-
And the data quality issues
I'm going to talk about
-
affect a very small proportion of these;
-
we are generally creating
quite good quality data.
-
We have a lot of external identifiers--
21.65 million PubMed IDs,
-
19 million DOIs, and we've added
8.3 million author statements,
-
although we still have 105.5 million
author name strings to replace.
-
In terms of the authors,
-
we've been creating a lot
of items from ORCID IDs.
-
We've got over half a million items
with an ORCID ID now.
-
But over 50% of those
do not have any affiliation data yet.
-
That is, nothing in employer
or in educated at.
-
I found 25,000
-
where we only have two statements.
-
That's an ORCID ID,
and an instance of a human.
-
This isn't particularly
useful in terms of--
-
reuse for anyone else
beyond Wikidata.
-
If we're serious about approaching
a bibliographic database
-
and providing open data for people,
-
we really need to be focusing
on quality, I believe.
-
So, there's a lot of work to be done.
-
We've done really well
with automatic input,
-
but I think we need to, in the future,
-
step back and think
how can we really make this data useful.
-
And one of the ways to do that
-
is by making our author items
better quality
-
by adding affiliation information,
adding first names, surnames,
-
and just moving beyond
occupation researcher,
-
trying to get what field people
are working in, for example.
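-
The kind of author-item triage described here could be sketched roughly as follows. This is only an illustration under assumptions: the input shape (a dict mapping a property ID to a list of values) and the labels "stub", "no-affiliation", and so on are hypothetical, not an existing WikiCite tool. The property IDs are the Wikidata ones for ORCID iD (P496), instance of (P31), employer (P108), and educated at (P69).

```python
# Wikidata property IDs used by the check.
ORCID = "P496"        # ORCID iD
INSTANCE_OF = "P31"   # instance of
EMPLOYER = "P108"     # employer
EDUCATED_AT = "P69"   # educated at


def author_quality(claims):
    """Classify a simplified author record.

    `claims` is a hypothetical, simplified shape: a dict mapping a
    property ID to a list of values. Properties with no values are ignored.
    """
    props = {prop for prop, values in claims.items() if values}
    if ORCID not in props:
        return "no-orcid"
    # Nothing beyond an ORCID iD and "instance of": the two-statement case.
    if props <= {ORCID, INSTANCE_OF}:
        return "stub"
    # Some extra statements, but neither employer nor educated at.
    if not props & {EMPLOYER, EDUCATED_AT}:
        return "no-affiliation"
    return "ok"
```

A record holding only an ORCID iD and "instance of: human" comes back as "stub", which is the case with two statements mentioned above.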
-
Moving on to DOIs.
-
When I was looking at how many
scholarly papers we have now,
-
I immediately noticed that we have DOIs
that are just four characters.
-
And that is not a correct DOI.
-
We've got about 110 items
with this DOI format.
-
In the grand scheme of things,
not that big a problem.
-
But that's never been a correct DOI;
-
it's been created
by an automatic process.
-
No one's checked that and realized
we had this error and corrected it.
-
So, it's kind of an appeal
I want to make to people--
-
if you're doing batch imports,
to check what you're doing,
-
look for these obvious
data quality problems.
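-
A pre-import check of the kind being asked for might look like the sketch below. It is a loose heuristic, not a full DOI validator: it assumes a DOI starts with "10.", then a registrant code of four to nine digits, then a slash and a non-empty suffix. The function names and the (item ID, DOI) row shape are made up for the example.

```python
import re

# Loose shape check: "10." + 4-9 digit registrant code + "/" + non-empty suffix.
# This is a heuristic assumption; some valid DOIs may not match it.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(value):
    """Return True if the string has the rough shape of a DOI."""
    return bool(DOI_PATTERN.match(value.strip()))


def suspicious_dois(rows):
    """Return the (item_id, doi) pairs whose DOI fails the shape check.

    `rows` is a hypothetical iterable of (item_id, doi) pairs, for
    example the rows of a batch about to be imported.
    """
    return [(item_id, doi) for item_id, doi in rows if not looks_like_doi(doi)]
```

Any four-character value fails this check, because the prefix alone already needs at least seven characters. Running `suspicious_dois` over a batch before uploading would surface such values for manual review.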
-
And a final issue
that I've noticed is errata.
-
We have over 13 thousand items
that are instance of errata,
-
but they're not linked
to the paper they're correcting.
-
So, I've also produced a table
of the top ten titles of the--
-
these are errata items.
-
You will notice they're not
particularly informative.
-
So, at some point,
we're going to have to go back
-
and look at how we can actually
get the information
-
about what these errata are correcting,
-
because they're not really
of much use to anyone at the moment.
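-
Finding the errata that nothing points to is a simple set difference once the data is exported. The sketch below assumes two hypothetical inputs: a list of erratum item IDs, and a mapping from each paper's ID to the erratum IDs it links to. Which Wikidata property carries that link is left to whoever prepares the export.

```python
def unlinked_errata(errata_ids, paper_links):
    """Return erratum IDs that no paper links to, sorted for stable output.

    errata_ids:  iterable of erratum item IDs (hypothetical export).
    paper_links: dict mapping a paper ID to the list of erratum IDs
                 that the paper's item points to.
    """
    linked = {erratum for targets in paper_links.values() for erratum in targets}
    return sorted(set(errata_ids) - linked)
```

The resulting list is the backlog: each ID in it is an erratum whose corrected paper still has to be identified and linked.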
-
So, in the future, I hope this is one area
that we can work on as a community,
-
and we can coordinate a bit better
with what data imports we're doing,
-
and how we can curate all our data,
bring it all together,
-
and combine our expertise.
-
I'm going to pass over to Liam now
-
to talk a bit about how we might be able
to coordinate our efforts in the future.
-
Thank you.
-
So, as mentioned
in the final slide from Elizabeth,
-
WikiCite is trying to be
more and more diverse,
-
and high quality, and more widely spread.
-
The idea is over the next year or so,
with the dedicated funding
-
that's been provided and is available
over a three-year period,
-
of which we've entered,
-
to change WikiCite-- the conference--
which there's been a few--
-
into a series of proposals from you,
-
into what we're calling
"satellite events" around the world.
-
This will be focusing--
there'll be a call-for-proposals system--
-
with, like, a review procedure
-
that is currently not yet invented
-
for deciding on how to--
-
what's the word I'm after--
-
prioritize these requests.
-
And see if we can't get a wider diversity
-
of content, contributor, and topic
-
supported in the WikiCite umbrella,
-
through this series of satellite events.
-
To that end,
-
the WikiCite grant--
-
was successfully applied for and received
-
through the work
of WikiCite's father, Dario,
-
who many of you might know
from the Wikimedia Foundation.
-
Dario no longer works
with the Wikimedia Foundation,
-
and so this grant has a--
-
needed a home.
-
What has happened
is that the WikiCite steering committee,
-
primarily made up of the organizing team
-
from last year's WikiCite conference,
-
will continue to oversee this work,
-
and the Wikimedia Foundation
has hired a temporary
-
or a part-time coordinator,
-
to oversee and support that work,
-
and to promote and receive
those applications
-
for the satellite events.
-
And that will be me.
-
(laughter and cheers)
-
So, I got the call yesterday
-
so that I could be able
to like confirm that in--
-
among an audience
which is highly relevant to that topic.
-
Which is helpful, so I can talk to you
here and now about that.
-
So, this is listed as a panel
in the program.
-
Even though it's a bit of a--
-
I think panel is a generous way
-
of describing the three of us
in this context.
-
But the idea is we would like
to hear from you
-
on that immediate thought about--
-
or questions to Simon, as well--
-
if you have questions
for Simon, specifically--
-
about what you think are good directions
-
that should be addressed
or should be attempted
-
in this forthcoming year,
-
either individually, online--
-
and things that not
necessarily you can do,
-
but think should be done.
-
And specifically, to start thinking about
-
what a satellite event would mean
-
with relation to open citations
-
and how the community at large
would best be served
-
by that kind of support.
-
Beyond merely financial,
but what does support mean
-
for satellite events in open citations
according to you.
-
If you want to come back up, and we can--
-
Did you have a question?
-
(woman) Ah, yes. I do research
on predatory publishing
-
and on retractions.
-
You only mentioned errata.
-
So, how are you dealing
with expressions of concern
-
and retractions?
-
And what is your policy on trying
to identify predatory publishers?
-
Okay, so, within the scope
of preparing for this,
-
I wasn't looking at retractions,
but people have been doing work on that
-
and trying to-- we have a property--
notice of retractions--
-
so we can be creating those links.
-
I don't know to what extent
that's happened.
-
In the same way that not all the errata are linked
to the paper that's being corrected,
-
I suspect that's a similar case with--
-
- (woman) It's exactly the same.
- Yeah.
-
As I said, I wasn't looking at that,
-
but we can potentially link the retraction
to the retracted article,
-
the retraction notice
to the retracted article.
-
In terms of predatory publishers,
-
I'm not aware of anyone
having done any work in this area,
-
but I wouldn't like to say
that hasn't happened.
-
We have Charles, whose hand
is going up there.
-
Do you want to comment
on predatory publishers, Charles?
-
(Charles) Well, I encountered
this problem in the ScienceSource project.
-
And first of all, I did what I could
to put Beall's list in Wikidata format.
-
Beall's list isn't sort of what everybody
wants to be dealing with,
-
but it was a starting point.
-
So, that has been done,
as far as I was able to.
-
But the thing I rely on more, perhaps,
-
is DOAJ IDs.
-
That is, if we put all the DOAJ IDs
-
into Wikidata,
-
we'd have made a really good attempt
-
to isolate the predatory publishers.
-
And that is not the whole story,
-
but these days,
it's the bulk of the story.
-
(woman) [Is the directory
of open access there?]
-
- (Charles) Directory of open access, yes.
- (woman) Alright, good.
-
(man) To start with, I just spent a year
traveling around New Zealand
-
trying to explain Wikidata
to the library community,
-
and as soon as I mentioned WikiCite,
their eyes rolled,
-
because they've just been told
they have to be [up] with Wikipedia,
-
Wiki Commons, Wikidata.
-
Here's another Wiki project
that they need to know about.
-
"Why can't we just do it all
with Wikidata?" they were saying.
-
So, there's a public perception
problem straightaway,
-
and that's the very community
that we need to have onboard
-
for this to work.
-
I'm interested in thinking
how we are going to reach
-
the library community, educate them,
and get them integrally involved
-
in this process?
-
I have thoughts, but I'd like
to hear your thoughts first.
-
- Sure, I think--
- (assistant) [This one is on.]
-
This better? Alright.
-
Feel like I'm in a concert.
-
So, one of the things we've tried to do
-
is incorporate librarians and libraries
-
into WikiCite in everything that we do.
-
So, on the steering committee,
-
we have at least
two librarians, if not more.
-
And at our actual WikiCite events,
-
one of the things that's actually
pretty great about WikiCite
-
is that we end up getting
both speakers and participants,
-
who maybe are not actually involved
in any Wiki projects.
-
So, we don't have Wiki fatigue.
-
And a lot of times, they're coming
from the perspective of...
-
"Well, I'm interested in linked open data,
-
I love to use citations at my university,
-
can you tell me a little bit more
about how Wikidata works,
-
and how I might use the citations
that are in Wikidata?"
-
So, I think it's very much
about bringing these communities together,
-
which might seem disparate,
around these common goals
-
for people who are really concerned
about curating data,
-
and then, people who might already know
about how to do that on Wikidata.
-
I would say, in terms of the confusion,
-
the complexity implied by the question
of well, there's WikiCite,
-
and there's Wikidata, and there's this...
-
WikiCite is a brand name,
it's a project-- GLAM-Wiki--
-
GLAM-Wiki also uses the word Wiki,
-
but it's not pretending to be a Wiki
-
or competing with Wikipedia and Wikidata.
-
It's the particular focus area
of reference information,
-
"referenceable" information.
-
Now, particularly in the context
of a series of conferences
-
that have happened
over the last few years,
-
and the conference is called WikiCite--
-
particularly within this community,
the Wikidata core group,
-
WikiCite is seen, known, understood
as a large number of items
-
uploaded to Wikidata
about scholarly publications.
-
That is what is understood as WikiCite
by this community, mostly.
-
I would like to--
-
there is a question about,
-
could WikiCite be made
into its own Wikibase
-
of just citation stuff?
-
Not Wikidata, and then there's federation,
-
and funky things like that,
-
and you could put a lot more
very specific information
-
about individual, citable things there,
-
which is a perfectly valid way
-
of dealing with questions
of notability and properties.
-
But the technology for doing that
-
is not yet ready in any way.
-
We need a lot more work,
particularly on federation in Wikibase
-
to make sure everything syncs neatly.
-
So, until such time
as that would be a viable outcome,
-
in the meantime, all of the things
that would serve that kind of outcome
-
also serve just improving
the quality on Wikidata
-
and improving the links
with Wikipedia and Wikisource.
-
The brand name is,
as far as I'm concerned, irrelevant.
-
It's just the project to make
better footnotes.
-
(woman 2) Just a comment
in relation to your query
-
about satellite proposals
-
for satellite conferences--
-
I don't think you realize
the level of ignorance
-
about Wiki-anything
from our country in New Zealand.
-
I mean, seriously.
-
As an Australian, I recognize
the ignorance of New Zealanders--
-
(laughter)
-
(woman 2) Oh, [inaudible], come on!
-
What I'm trying to say
is that if we have a satellite
-
or somehow organize
a joint satellite conference,
-
from my perspective, what I'm looking for
-
is strategies and how
to engage the community.
-
They aren't even at the level of being--
-
they don't know enough
to even be enthusiastic
-
about Wikidata and WikiCite yet.
-
They look at it with a lot of skepticism,
if they're even aware of it.
-
So I, in particular, want to be able
to have a meeting
-
in order to be able to learn from those
-
who've already engaged
more successfully with the community,
-
to get a skill base in order to build
some collaborations in New Zealand.
-
You're talking about extra people
to actually engage with.
-
I just want the core library community
to get on board,
-
and then go the extra step.
-
It's like I'm looking at you saying
that we want to reach out
-
to other communities,
-
and I'm saying, I just want
to reach out to a community.
-
You know, we're a lot further
behind where you are.
-
So, yeah.
-
I would not wish to pretend that WikiCite
and open bibliographic information
-
is the be-all and end-all of Wikidata
or Wikimedia outreach.
-
It's a specific subset.
-
And I would not wish to try
and make WikiCite a brand,
-
appear to be overriding or replacing
-
or somehow getting in the way
-
of just general, good quality outreach
about Wikimedia,
-
and working with libraries,
in general, and Wikidata,
-
even more specific.
-
This is a subset of Wikidata.
-
So, particularly, for WikiCite
satellite events,
-
I don't want to make it appear
like there's a competition for Wiki--
-
so, everything about Wikidata now
has to be called WikiCite-- no.
-
This is a really quite niche--
in the scheme of things-- topic area.
-
Supporting general awareness-raising
-
about Wikidata
and open access information
-
and Wikimedia is far beyond the scope
-
of this kind of particular
specialist outreach.
-
And that's not to say
that it's not a good thing, too.
-
(woman 2) I just perceived--
sorry, one more comment--
-
WikiCite as the possible inroad
-
into the wider community,
-
for the people we want to get on board.
-
So, to me, WikiCite is--
yes, it's a subset,
-
and really a much smaller set
of beliefs and information, et cetera--
-
but I see it as an easy steppingstone
to get them addicted,
-
and then you can open it up.
-
So, yeah.
-
(assistant) We have just time
for one short question.
-
So, does one of you have another question
for the WikiCite team?
-
Thank you for sharing
this feedback with us.
-
Oh, somebody has a question.
-
(assistant) Which one of you wants to...
-
(woman 3) Hi, thank you so much for this.
-
I was just wondering,
is there ever going to be
-
a pairing of the bibliography
used in Wikipedia articles and WikiCite?
-
Are you planning to move
all those references and parse them
-
so that we can do some analyses
of which references we're using
-
in the Wikipedia articles--
-
and when you create an article
in another language
-
just to get suggestions of, "These
are the references that have been used,"
-
kind of like that.
-
I know one of the short-term goals
of WikiCite is to have all citations
-
in Wikimedia projects represented in Wikidata.
-
Currently, there's not
an automatic pipeline
-
that keeps that updated,
-
but that's definitely one
of our primary goals.
-
And ultimately, there
is not specific support
-
in the developer community
for that kind of activity in particular.
-
That's on the interests
of individual community members
-
to do exports-- like all this work
that's been demonstrated
-
that's not from the foundation--
-
people doing individual work
on their interests.
-
So, that could be a good satellite event
-
to try and explore that kind of work.
-
Getting a good pipeline
so that you can make references
-
in Wikipedias easily hook
into Wikidata items,
-
multilingual, et cetera--
-
does not yet exist technologically,
-
and certain languages
have concerns about that.
-
The larger the Wikipedia language,
the more defensive they are
-
about using Wikidata directly.
-
But that'll come.
-
Yeah, I was just going to say
when Liam's finished with that--
-
that it's strictly citations or something
that are very much within scope,
-
and what we would like to work for,
but that needs community to build this,
-
to take on that challenge, I think.
-
And also, we need to be doing the outreach
to the Wikipedians to show them
-
that we can provide good
quality data consistently.
-
(assistant) We are running out of time.
-
So, if someone has another question,
-
I think that these nice people
will answer you privately after.
-
So, it's time for us,
for the last edition,
-
and we are welcoming on stage
-
Jean-Fred, Envel, and...
-
(applause)