Welcome to digital dialogues.

We have a speaker today who I think
has one of the most interesting minds

in the field, and it will be a treat
to hear what Allan has to say

and take it on board,

and our associate director Trevor Muñoz
will be introducing Allan.

What I'd like to do is to have you all
introduce yourselves to begin.

Your name and where you're from,
and after you do that,

I'll ask Trevor to come up.

So Stephanie, do you want to start?

(audience members introduce themselves)

I'm gonna turn it over to Trevor now
who will introduce Allan

and then we'll get on with the show.

For those of you who came in afterwards,

my name's Trevor Muñoz.

I'm an associate director here.

It's my great pleasure to introduce
our Digital Dialogue speaker today.

Allan Renear is interim dean and professor

at the Graduate School of Library
and Information Science

at the University of Illinois,
Urbana-Champaign.

Allan also has a long and storied career
in the digital humanities.

Before he went to GSLIS,

he was the director of the Scholarly
Technology Group at Brown University

where he did some groundbreaking
digital humanities research.

He wrote some of the, I would say,
seminal papers of digital humanities

around text encoding
and our ideas about documents.

I know he's updated some
of the ideas about documents.

I think we'll hear a little
about this today.

After leading a digital humanities group
at Brown for many years,

he went to the Graduate School of Library
and Information Science at Illinois

and while he's been there he's done
a long string of interesting work

around data curation,
foundational concepts

in our understanding
of digital systems,

digital objects, and this
recent work has taken him

from digital humanities into considering
objects such as scientific data sets,

and the systems we use
to manage and curate them.

As Neil mentioned, Allan has
one of the most interesting minds

in digital humanities, and I think we'll
all benefit from his incisive perspective

on things that we thought we knew.

So at this point I'll turn it over
to Allan to talk about

an Eliminativist Ontology
of the digital world,

and what it means for data curation.

So, welcome Allan.

Thank you.

Thank you for inviting me.
Thank you.

It's great to be here with my old friends
and my new friends.

An eliminativist,
it's a hard word to pronounce,

ontology of the digital world
and what it means for data curation.

You know, you always get these titles
well in advance of the actual talk

and you're sure you're going
to accomplish so much

by the time the talk rolls around.

Never quite do, so I'm not
going to [inaudible]

an Ontology of the digital world,

but I will say enough to suggest

how a particular kind
of Ontology might develop.

So this is more like <i>towards</i>

an eliminativist ontology
of the digital world.

It will be a kind of unapologetic,
reflective, almost philosophical

meditation on the conceptual foundations
of information science.

As Trevor indicated I was in the workplace
in digital humanities for about 20 years.

In the last few years I've enjoyed
indulging my penchant

for the philosophy of the things
I've been doing for so long.

My work such as it is now is so social,

I cannot figure out what's mine
and what's other people's,

and I've practically given up.

Most of what I'm presenting here
has been collaboratively developed

by these people, and probably some others

and there are quite a number
of papers out there

in this vein if you want to read more.

But I make the slides and I also
am totally responsible,

not only for the mistakes and infelicities,

but for anything that seems
just a little over the top,

that's probably mine.

I'm not sure that my colleagues
would agree with everything that I say,

but that's the problem
when you work collaboratively.

Deeply collaboratively.

You really sign on for most
of what's being asserted,

not necessarily for all of it.

I also should give credit.
A lot of the projects that you'll see here

are funded by NSF and also IMLS, and located

at the Center for Informatics Research
in Science and Scholarship at GSLIS,

directed by Carole Palmer.

Where do I point this?
I feel like a geezer.

At the screen? There we go.

I think it's fair to say
I'm going to be doing ontology.

I don't mean a lot by the word ontology.

I probably could say
conceptual modeling,

and that would work just as well,

so don't read too much
into the word ontology.

To make sure that you don't read
too much into the word ontology,

I'm going to talk a little bit
about something I'm not going to do,

and that is Meta-Ontology.

You may wonder, "why bother?",
but you'll see in a minute.

Meta-Ontology is,
as you can probably guess,

about ontology: assertions, analysis,
arguments, claims, etc. about ontology.

A claim in meta-ontology might be,

"when it comes to ontology,
there's no fact of the matter."

There's no theory independent,
society independent fact of the matter.

Ontologies are constructed
by people, by theories,

by shared interests, and so on.

That's a meta-ontological claim.
It's about the nature of ontology.

It's claiming that it's
in some broad sense relative.

A relativist claim about ontology.

Another common
meta-ontological claim,

well, actually, every meta-ontological
claim has, of course,

a companion claim that denies it,
so here are two meta-ontological claims.

One, there's a sharp distinction
between science and ontology.

Two: there's no sharp distinction
between science and ontology.

That's a meta-ontological claim.

So Willard Van Orman Quine,
probably the leading philosopher

of the second half of the 20th Century,

was a relativist. He did not believe
there is any fact of the matter

with respect to ontology.

He also did not think there was
a sharp line between science and ontology.

He was a relativist
about everything, so [inaudible]

he was a relativist about ontology.

These are examples of issues with which
I am not going to concern myself.

And the reason I'm not going
to concern myself with these issues

is that they're very distracting.

No one ever changes their mind.
I no longer think that they're much fun.

I also don't think
that they are very important.

For the most part, no matter
what your meta-ontological views,

you [inaudible] ontology the same way.

Relativists and absolutists do ontology
more or less the same way.

Those who believe there is
a sharp dividing line

between ontology and science
and those who don't,

more or less do ontology the same way.

The actual practice of ontology,
apart from meta-ontology,

I find to be engaging and practical,
useful, an important thing to do.

So how is ontology done, typically?

By people with different,
<i>or no</i> meta-ontological views.

For the most part we start
with our beliefs about the world.

The beliefs we actually have.

These could be common sense,
ordinary beliefs

or they could be scientific beliefs.
They could be mathematical beliefs.

We start with those beliefs
and we ask ourselves,

"what must there be in the world
if these things that I believe are true?

"What kinds of things must there be
in the world if my beliefs are true?

What kinds of relationships
do they have to one another?"

and, when you make a list of the things

that apparently you think
that there are in the world,

sometimes that list looks too long.
It looks like you have some duplicates.

Perhaps you've been misled by language,

and you have two different words
for the same thing.

Perhaps you realized
that some kind of thing

was composed of other things.

Perhaps you also discover
you don't have enough things on the list,

and maybe you were confused by synonyms.

I did start by saying you think about
your beliefs, your concepts, beliefs,

and go on from there
but typically it's hard to do that

without looking carefully
at the sentences

that express our beliefs, and that's
where the synonyms and ambiguity come in.

When we do ontology, most of the time

we're thinking about what we believe,
but the device that assists us

in examining what we believe
are the sentences that we use

to express our belief.

So, starting from that point, we go on
to try to create a picture of the world

that is consistent
and simple and accurate

and reflects the world
that must be out there

if the beliefs that we have
are in fact true.

And from what I can see, it doesn't matter
what your meta-ontology is.

That's how you do ontology.

That's how a lot of ontology is done.

I've decided that if I've ever done
meta-ontology in the past,

I'm not going to be doing it anymore.

I'm sticking to ontology.

Those were preliminaries, and maybe
this one is a little bit as well.

The theme of this presentation
is Eliminativism,

and the basic idea here
is that with respect

to our beliefs about the world,

with respect to our common sense
conceptual scheme,

some of the things
we think exist, don't.

Now, you may have encountered
this perspective in the past.

One place where it's
particularly prominent,

where it's been called Eliminativism

is in cognitive science,
where in the last 20 to 30 years

a number of cognitive scientists
have argued that our folk psychology

of desire, belief, action,
is profoundly misleading.

That in fact, there really are no beliefs,
desires, intentions [inaudible].

Instead, there are other things
that are more scientifically respectable,

that are more explanatory,
that will give a better account

of the same phenomena in the world

that we've been using belief,
desire, intention to describe.

Those cognitive scientists
were characterized

as eliminating folk beliefs,

and the word elimination, and I think
it was really in cognitive science

that it became particularly prominent,

the reason elimination
was important as a concept

to the cognitive scientists
doing this elimination

is that it contrasted with what
behaviorists were doing

when they reduced beliefs to behavior.

So instead of reducing beliefs
to dispositions to behave,

the more advanced cognitive scientists
instead wanted to give

an alternative account
of folk psychological notions.

An alternative account.

One that discarded them,
in a sense, completely,

unlike behaviorists, who were saying,

"I'll tell you what belief really is:
it's a disposition to behave,"

the cognitive scientists I'm alluding to,
maybe idealizing a bit or something,

were saying, "I'm not going to tell you

what belief really is,
because there are none."

You need to let it go
and adopt these other notions,

which you'll find much more serviceable.

Most of my intellectual life
I have detested Eliminativists.

I now find myself
on the edge of becoming one.

In information science,
when we develop models

that presumably describe precisely
some process, for instance, or some domain

and we use a language that is intended
to be understood literally,

we discover problems that are such

that elimination of an entity,
of an entity type,

becomes a tempting solution.

This is, in my experience, particularly
the case where our models or ontologies

are representing processes
that involve change and identity.

Eliminativist strategies become
very tempting, at least to me.

I'm now going to explore some elimination

and it's hard to let these things go.

You may not want to.
Hence, the courage.

As I talk now, please feel free
to interject at any point,

now that we're sort of getting
to the interesting part.

I'm not sure exactly
how long this will take,

and how much time
there'll be for questions

so just speak up if you have a question
or clarification, or if you wish

to contradict me.

If I want to put your contradiction off,
I'll just do it.

(audience chuckles)

So change. We are often told, those of us
who've been in digital humanities

for a long time, and been through
the whole hypertext excitement,

and all the excitement
around things virtual,

are told repeatedly
that digital objects are fluid,

malleable.

More generally, that the digital world
is a place of constant change.

And even if you're not caught up
in the breathless hype of hypertext

and virtual worlds and such,

it does seem that the digital world
is a place of constant change.

After all, we add records to databases.

We edit documents.

Our files get larger and smaller.

We add things to our digital collections,
and we take them away.

A lot of stuff seems to be happening.

A lot of stuff seems to be changing
in the digital world,

and these changes
are absolutely essential

to the practical work that we do.

When we add a record to a database.

When we remove an item
from a collection.

When we edit a document.

Those are modifications
to digital objects, apparently

and you might say it's the whole reason
for having digital objects

so we can do things like that more easily.

So the digital world does seem to be
a place of constant change.

I'm going to argue that you are,
we are all, deluded.

Digital objects
are absolutely immutable.

So, questions before us:

When a digital object changes,
exactly what changes?

If digital objects can't change,
then what is really going on in the world

when we say, speaking loosely,
that they change?

And, what is a digital object anyway?

So here we go.

This is the beginning of the argument

that digital objects cannot change.
That they are immutable

and I can give several different
versions of this argument.

This is in a way the most general.

It relies upon your ordinary intuitions
about sentences.

Unlike some arguments
to the same conclusion,

it's not based on set theory
or discrete mathematics

nor is it restricted to the digital world.

Consider the sentence,
"I remember Verona."

Imagine that it's the first sentence
of the first chapter or draft of a novel.

Now, suppose the author edits
that sentence to read,

"I remember, but dimly, Verona".

The first sentence of the draft
has been modified.

It's been changed.

It's now longer.

I submit that if you weren't
on your guard,

none of those sentences
would have seemed suspicious.

The problem is,
exactly what got longer?

Something used to be three words
and is now five words.

Seems like it ought to be a sentence.
It consists of words, after all,

but what sentences would it be?

"I remember Verona"?
No.

That sentence did not get longer.

It's true, that sentence at one time
consisted only of three words,

but it still consists only of three words.

That sentence, "I remember Verona",

has not gotten longer.

"I remember, but dimly, Verona."
Is that the thing that got longer?

It's true that it's five words,
so it's longer than "I remember Verona",

but it's always been longer
than "I remember Verona."

It has not become longer
than "I remember Verona."

Did the paragraph get longer,
or the chapter, or the entire text?

The arguments I just gave here
apply equally to those things as well.

Just think of them
as a longer string of words.
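
(illustrative aside, not part of the talk:
a minimal Python sketch of the Verona argument,
assuming sentences are modeled as built-in str values.
The names first, second, word_count and draft_first_sentence
are made up for this illustration.)

```python
# Python's built-in str is immutable, so "editing" a sentence
# builds a second string and leaves the first one untouched.
first = "I remember Verona"
second = first.replace("remember", "remember, but dimly,")


def word_count(sentence: str) -> int:
    """Count whitespace-separated words in a sentence."""
    return len(sentence.split())


# The variable name stands in for the role
# "first sentence of the draft". Rebinding it changes
# which string fills the role, not either string.
draft_first_sentence = first
draft_first_sentence = second

# In-place modification is not available on str at all.
try:
    first[0] = "J"  # type: ignore[index]
    in_place_edit_allowed = True
except TypeError:
    in_place_edit_allowed = False

print(word_count(first), word_count(second), in_place_edit_allowed)
```

(on this sketch, nothing "got longer":
the three-word string is still three words,
and the five-word string was always five words.)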

I'm pausing for just a moment in case
somebody wants to interject something.

This is the, you might say,
the simple argument for immutability

for certain kinds of objects.

It's reasonable, but I just
have to stop and say,

"Wait a second! What do we actually mean
by modification or change anyway?"

I would submit that we mean
by modification or change

that something loses or gains a property.

In the case of the Verona sentence,
the point of the last slide

is to suggest that there's
no plausible candidate

on the landscape for that,
for a thing of that kind.

(audience member) That which the author
is trying to project into the mind

of the reader.

Has that changed?

(audience member) Yes.

I would posit that as one
of the things that's changed.

So if you think of writing
as a communicative intent

of projecting my thoughts
into your thoughts,

mediated by the paper,
word processor or whatever.

That's what's changed.

I have two kind of conflicting
answers to that,

and I have a slide devoted
to that particular assertion.

I don't actually disagree
with you on a deep level.

The author is, you might say,
to use your phrasing,

trying to project something else
into the mind of the reader,

and so there's a sense in which
what the author's trying to project

into the mind of the reader
has changed.

I admit that.
There's a sense.

But it's not the right sense,
because the thing

that the author was
trying to project at time T1

hasn't changed. It's still
"I remember Verona."

The thing the author was trying
to project in T2 hasn't changed,

it's still "I remember,
but dimly, Verona."

The author's trying to project
a new thing into the mind of the reader.

I would agree with that.

There's a slide I wasn't going to show
but since this came up I'll show later,

that uses a coffee queue
to illustrate the same point.

We say the first person
in line has changed,

the first person in line
used to be 50 years old,

but the first person in line
is now 20 years old.

We're not claiming that in the interim
somebody who was 50 became 20.

It's a good classic, and not very often
heard response to this, I think.

But in the end, we end up agreeing.

(audience member) Well, I don't know
how much you want to derail--

I don't want to derail your discussion,
but I might argue that the first sentence

was an imperfect projection
of that which I'm referring to,

and the second was the most accurate,

so it's not this thing T1, T2,

it's that the first try was a poor one.

So, you may complete
this line of reasoning, I think,

by saying, "you know, Allan, no one
was ever confused about this

"in the first place.

"We never really thought
that there was a thing that changed,"

and I'll say, I won't contest that,

but I'll be suspicious, and part
of my suspicion has to do

with UML diagrams that I have seen

that imply change where both
you and I would say there is none.

(audience member 2) Is there
a philosophically rigorous way

for identifying things
by their structural location

or what we may call, some kind
of abstract properties that they have

by virtue of the space
they mark out within a thing,

so for example, the classicists
have their [inaudible] of stuff,

and they, for them, Plato consists
of the following structure

where there may be disagreements
about the exact wording of line one,

but line one of the such and such
work is identifiable as a thing,

regardless of what words or specific words
or characters we think occupy that space,

in the same way, if we marked up
the text of Moby Dick,

what are we saying
of the first paragraph?

is something that if this were
a digital edition we could get an ID,

we could put to it, even if we had
disagreements about specific words

that are in there and I feel as though
our intuitions about the first sentence

of this thing
are sort of along those lines.

We're not talking
about those specific words,

we're talking about that structural piece
which then we apply our words

that you're going to destabilize.

But I wonder if this is just
kind of an intuitional way,

or if there is a philosophically
rigorous way of talking about that

and if so, I imagine it wouldn't change
what you're saying

but it would feel more satisfying
if you could speak in those terms.

So actually again I think that perspective
that you're taking right now

was one that was consistent
with where I'm going,

but it's actually harder than you think
to identify the paragraph

apart from its particular contents.

That is, to identify it
in a way that is consistent

with the logic-based modeling languages
that we typically use,

and I think that will become
apparent as we go.

So where was I?

So, what we're claiming, and this is,
especially after these two comments

is sort of becoming
[inaudible] now, but still,

we're claiming that the following
[inaudible] is false.

There exists something <i>x</i>,

and if you're familiar
with first order logic

you know this is a drum roll
that's needed right here,

because this is what indicates
our ontological commitments,

the fact that we're using
an existential quantifier

to say in a serious
ontological tone of voice,

"there is an <i>x</i> such that <i>x</i>
at T1 had length three,

and <i>x</i> at T2 had length five."

So the claim here is that
an assertion of this kind

is false, there's no such thing.
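
(illustrative aside, not part of the talk:
one way to write out the logical form being denied,
assuming a two-place function length(x, t)
giving the length of x at time t.
The symbol names are chosen for this illustration only.)

```latex
\[
  \exists x \, \bigl( \mathrm{length}(x, t_1) = 3 \;\land\; \mathrm{length}(x, t_2) = 5 \bigr)
\]
```

(the existential quantifier is what carries
the ontological commitment: the formula is true
only if some single x satisfies both conjuncts.)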

This is a topic that
Aristotle actually takes up

in the Physics and also
I think in the Metaphysics

and here's a quote from the Physics

where he's considering
a similar problem:

"There must be
a substrate <i>ὑποκείμενον</i>

"underlying all processes
of becoming and changing,

but what can it be
in the present case?"

He's asking about something
very similar to what we are discussing.

What can it be in the present case?

We are totally insane,
because, guess what?

We agree that "the first sentence
was three words and is now five"

can express a true proposition,

so now we're really taken aback, right?

But what we deny is that this
is the proposition that it asserts.

So we know that this sentence
can express a true proposition,

but we're denying that the proposition
that's expressed by this sentence

understood as true,
is this proposition.

The one that has
this logical form.

And yes, I'm distinguishing
proposition and sentence,

[inaudible]

We're denying that the sentence
is literally true

and in a way the notion of literal truth
is [inaudible] throughout.

So continuing in the same vein,
the claim is that sentences

like "Jane lengthened the first sentence
of her novel" are idioms,

like "the average plumber
has 3.2 children."

If you were to represent that in logic,
if you were doing a logic exercise,

you might be tempted if you
were in a hurry and it was like [inaudible]

to simply say, "well, there is something
that's an average plumber,"

but of course, you're off
on the wrong foot already.

That would not be the right way
to formalize the proposition

expressed by that sentence

despite the fact that the surface
syntax of the sentence

might suggest that it is.

"There's a scarcity
of common sense in the room,"

I'm not saying there's something
which is the scarcity of common sense,

but even more ordinary sentences
like "Lumbergh revised the TPS memo."

(my favorite movie)

Sentences like that, yes,
they can express true propositions,

but the true proposition that they express
is not one that looks like "there is an <i>x</i>

such that <i>x</i> is the TPS memo,
and <i>x</i> was revised by Lumbergh."

So it's obvious that the average plumber
is a kind of logical fiction,

but I don't think it's obvious
that the TPS memo is a logical fiction.

Our claim is that it is.

It is, and that means
that if you're going to use

a logic based representation language,

like RDF, OWL, Classic,
whatever your favorite is,

you have a lot of work to do
to get from sentences like this

into a formalism that you can trust.

The great biologist Richard Lewontin
said, in more or less a reprise of remarks

by Rosenblueth and Wiener,

"The price of metaphor
is eternal vigilance."

If you want to get
from an ordinary sentence like this

into a representation in a logic based
knowledge representation language,

and you want to be able
to really trust that representation

to never lead you astray in inferencing,

it's hard.

But if you don't get there,
you'll be relying on metaphors

and idioms and logical fictions,

and the price of metaphor is eternal
vigilance against confusing yourself.

Drawing UML rectangles
for things that don't exist.

So I'm going to move quickly
through some of these slides.

Sort of taking the temperature
of my audience,

I think we've assimilated
this basic argument.

I'm capable of belaboring things
at great length,

so I think I'll not.

I do want to suggest that
if you still find the argument irritating,

and are sure there must be
some way out,

you might see your problem
as trying to decide

which one of these
three things to reject:

documents are strings,
strings cannot be modified,

documents can be modified.

You can reject
more than one, but why?

If you can justify rejecting one,
you've gotten around the puzzle

that I presented you.
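
(illustrative aside, not part of the talk:
a sketch of why the three assertions cannot all be true,
assuming predicates Doc, Str and Mod for
"is a document", "is a string" and "can be modified",
and reading the third assertion as
"some document can be modified".)

```latex
\begin{align*}
 &(1)\quad \forall x \, \bigl( \mathrm{Doc}(x) \rightarrow \mathrm{Str}(x) \bigr) \\
 &(2)\quad \forall x \, \bigl( \mathrm{Str}(x) \rightarrow \neg \mathrm{Mod}(x) \bigr) \\
 &(3)\quad \exists x \, \bigl( \mathrm{Doc}(x) \land \mathrm{Mod}(x) \bigr) \\
 &\text{From (3), take } d \text{ with } \mathrm{Doc}(d) \land \mathrm{Mod}(d). \\
 &\text{By (1), } \mathrm{Str}(d); \text{ by (2), } \neg \mathrm{Mod}(d), \text{ contradicting } \mathrm{Mod}(d).
\end{align*}
```

(any two of the three are jointly consistent,
which is why rejecting just one is enough.)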

But, for each one that you reject,
you have an obligation.

If you reject the first, you need to offer
an alternative definition of document.

One that supports modification.

If you reject the second,
you need to reconcile modification

with the extensionality,
with the apparent immutability of strings

and if you reject the third,
then you have to give some account

of what's really going on

in cases of modification,
such as editing.

If editing is not the modification
of the document,

strictly speaking, then what is it?

So whichever one you reject,

you've got a kind of an obligation
in order to make your rejection

credible, plausible.

Just for fun, I'm going to call this
the MITH feud, 2013

and going to ask you,
those of you who think,

I'm going to ask you which
of the assertions

in the inconsistent triad
you would reject.

I'm slowing down
just to give you a chance to form your--

They can't all be true, right?

To form your opinion.

Alright.

Who wants to reject one?

(man in audience)
Documents are not strings.

Documents are not strings.

The party of documents are not strings?

Okay. Who wants to reject
"strings cannot be modified"?

Wow! Three. Okay.

Who wants to reject
"documents can be modified"?

Interesting. I've never had
such an even distribution.

With respect to the first assertion,
"documents are strings",

I have to confess that it was
a convenience to some extent

to assert that documents are strings.

Karen Wickett and I first presented this
at Extreme Markup, now called Balisage,

those are XML zealots,
and so we used the XML definition

from the XML standard:

"A textual object
is a well formed XML document if:

taken as a whole, it matches
the production labeled document..."

the only kind of thing that can match
the production is a string.

It's harder than you might think.

I shouldn't say that
you might think, but--

it's harder than sometimes, I think

to get out of this simply by denying
that documents are strings,

because most of the definitions
of document or text,

even when they're not definitions
in terms of strings,

are nevertheless similar enough
in the right respects

that they're also unmodifiable.

At this conference, for instance,
it's very common to say

that a document is a graph,
meaning this kind of graph, you know?

And they mean that
in the mathematical sense.

But a graph is a set of tuples,

and sets of tuples can't change because
sets can't lose their members.

So graphs don't work.
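
(illustrative aside, not part of the talk:
a minimal Python sketch of a graph as a set of tuples,
assuming frozenset as a stand-in for a mathematical set.
The node labels are invented for this illustration.)

```python
# A graph in the mathematical sense: a set of (source, target) tuples.
# frozenset has no add() or remove(), much like a mathematical set.
graph_before = frozenset({
    ("draft", "chapter1"),
    ("chapter1", "sentence1"),
})

# "Adding an edge" is a union that yields a different set.
graph_after = graph_before | {("chapter1", "sentence2")}

print(len(graph_before), len(graph_after))
print(graph_before is graph_after)
print(hasattr(graph_before, "add"))
```

(the two-edge set still has two edges afterwards;
what we have is a second set with three.)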

If you look closely at FRBR's notion
of the expression

as symbolic notation,
it's pretty much string like,

even if it's not a string.

A string in the mathematical sense
is a function from integers

into some domain of elements.

The notion of expression
is not exactly mathematical,

but it's clearly a sequence of elements

and our intuitions about
the Verona sentence, I think,

count against FRBR's
notion of expression.

Similarly, in textual criticism,
Tanselle's notion of a text

I also think is not the kind of thing
that can be changed.

I'm going to come back
to that in a minute,

so this is not the end of [inaudible].

Strings cannot be modified.

Some of you rejected "strings
cannot be modified"

and said strings can be modified.

So modification on my account
is a losing or a gaining of a property.

I would claim that a string
like "13571" has properties,

but it has no properties
that it can lose.

It has the property of having
[inaudible] five tokens,

of having one token [inaudible] twice,

[inaudible] 35, and so on,

but I would say that that string,
that string, can't lose these properties.

That string cannot lose those properties.

We cannot identify a thing that once
had one of those properties

and later, did not.

Now, I realize there are some properties
a string can have, and lose,

for instance, "13571" has the property
of being talked about in college [inaudible]

and it will lose that property.

But that's a pretty thin change, right?

That's not a change to the string.

That's a change in the relationship
between the string and some other thing.

It's like you might not be
the tallest person in the room,

but when the tallest person
in the room leaves,

you might become
the tallest person in the room.

Have you changed?
I would say no.
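
(illustrative aside, not part of the talk:
a small Python sketch of a purely relational change.
The names and heights are invented for this illustration.)

```python
# Heights are inherent to each person in this toy model.
heights = {"Ana": 170, "Ben": 185}


def is_tallest(person: str, room: frozenset) -> bool:
    """True if nobody in the room is taller than the given person."""
    return heights[person] == max(heights[p] for p in room)


room_before = frozenset({"Ana", "Ben"})
room_after = room_before - {"Ben"}  # Ben leaves the room

height_before = heights["Ana"]
tallest_before = is_tallest("Ana", room_before)
tallest_after = is_tallest("Ana", room_after)
height_after = heights["Ana"]

print(tallest_before, tallest_after, height_before == height_after)
```

(being the tallest is computed from the room,
so it can flip while nothing about Ana is altered.)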

So the thing about strings is that
although they have some properties,

all of their inherent properties,
they have essentially

so they can't lose them.

They only have their relational
properties contingently.

So that is sort of the interesting thing
about things like strings.

They have some contingent properties,

but all of their contingent
properties are relational.

They have some inherent properties
which could count as properties

generating modification

if you could lose them,
but all of their inherent properties

are essential, so they can't
lose them, so they don't change.

In favor of "documents
can be modified", we all believe it.

It's part of what we say and do.

So these last three slides
are supposed to suggest

that it's not easy
to get out of this problem.

There are, in my mind,
four relatively significant responses.

One is to deny that documents are anything
of the kind I've been saying they are.

That they're material
objects in the world,

and material objects
in the world can change.

Another is to say that documents
are social objects

and social objects can change.

Another is to say that every time
we edit a document

there's actually
a new document being created,

so documents aren't really changing,

but what's happening is that
new documents are being created,

so this does deny that documents
can be modified.

The last one, which I gave
an asterisk to because I think

it's the one-- I'm not an eliminativist,
it's the one I'm going for,

the string-in-a-role strategy,

which argues that documents
are things like strings,

but they are not just strings,
they are strings

in a particular communicative role.

(man in audience) Can I
try another way out?

Yeah.

(man in audience) So, what if
I take the argument

that the three assertions
are <i>not</i> contradictory?

So take a look at the second one.
If we think of a string as an element

drawn from the set of all possible
combinations of characters,

then you're simply drawing
a new element from that set,

so if you look at it from that perspective
the three are not contradictory.

I guess this is closest
to the new document theory,

is that when you modify a document,
just simply drawing another string

from the set, you're not
modifying the string.

I would say that is
the new document theory.

Which is I think
the most popular response,

particularly from [inaudible]
computer scientists.

It does deny that documents
can be modified,

which is, I think, that strictly speaking,
literally speaking,

documents can be modified.

[inaudible]

So the string-in-a-role strategy.

String-in-a-role is somewhat harsh
in that it does deny

our common sense belief
that documents can be modified.

It also doesn't just do that, by the way.
It also finesses the definition

of document in a very subtle
and important way.

This response claims that a document
is a string in a particular role.

That in fact, being a document
is a property that strings come to have

in particular contingent
social situations.

And here's the finessing,
and it's an ontological

maneuver, you might say.

On this account, document
is not a type of entity.

Being a document is a role
that some entities come to have

in particular circumstances.

So document is a kind
of nominalization of a relationship,

the kind of thing
you would not express as--

at least it's plausible
that it would be inappropriate

to express in your UML diagram,
it would be inappropriate to have

a rectangle for documents.
Instead you would have

a rectangle for strings, and an arc
for being in a documentary role,

or something like that.

So compare this, and I get
the example from Guarino and Welty,

this is very well known,
the concepts of person and student.

A student is a person
in a particular role.

A person who has enrolled, let's say.

But a person is not a role
that something else takes on.

That's the intuition here.

A person can become a student,
and later cease to be a student.

We'll see this example again in a bit.

So just summarizing,
documents in a role.

This is consistent with not just
Guarino and Welty but also John Searle,

if you're familiar with his writing
about the ontology of the social world.

Documents are strings,

but strings are only documents
while they are in a communicative role.

Because documents are strings,
they're going to be immutable.

The thing that is a document can't change.

I mentioned the burden
that one has to bear

if one denies modification.

How do we give an account
of what apparent modification must be?

And I know I'm waving my hands
a bit at this point,

but roughly, when we say
that a document is being modified

what's going on is that a person
or persons comes to prefer

a different string for a particular
communicative role

than the string previously
preferred for that role.
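
(illustrative sketch, not from the talk) The paraphrase above can be
pictured in a few lines of Python, assuming a hypothetical table of
communicative roles. The names `roles`, `prefer`, and "tps-memo" are
invented for illustration. Python strings are immutable, so the only
thing that changes is the record of which string is preferred.

```python
# Hypothetical names throughout: `roles`, `prefer`, "tps-memo".
# The mapping stands in for the community's current preferences.
roles: dict[str, str] = {}  # communicative role -> preferred string


def prefer(role: str, text: str) -> None:
    """Record that `text` is now the preferred string for `role`."""
    roles[role] = text


draft = "Please use the new cover sheet."
prefer("tps-memo", draft)

# "Editing the memo" builds a second string; `draft` itself is untouched.
revised = draft + " Thank you."
prefer("tps-memo", revised)

print(draft)              # the first string, exactly as it was
print(roles["tps-memo"])  # the role now points at a different string
```

On this reading the mutable thing is the preference table, which
stands in for us, not either string.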

I think I may have heard that
even from a couple of you already.

Apparent changes in digital documents,
and you can generalize this account

to all digital objects,

apparent changes in digital objects.

Remember the constantly changing
digital world I referred to

at the beginning.

Apparent changes in digital objects
are actually changes in us,

in the person or persons interacting
with those objects.

They're not changes
in the documents themselves.

So what changes
when a digital object changes?

To answer the question
posed earlier, you do.

I promised you some Eliminativism.

If you find it hard to accept
that documents cannot change,

and you should find it hard to accept,

because it is part
of our conceptual scheme, I think.

There is another way out.

(voice in audience) This is
such a relief to me, Allan.

(laughter)

There's another way out, trust me!

You're not going to be happy without it.

To rehearse where we are,
it is commonly believed that documents

can be revised, edited, shortened,
lengthened, and modified in various ways.

That belief is widespread
and deeply rooted.

I characterized it as part
of our conceptual scheme.

Perhaps it is so deeply rooted
that it's actually integral

to our concept of a document.

If that's the case, then we can
express this relationship this way.

If there are documents, then there
are modifiable documents.

It may be more natural to say, if there
are documents, then they are modifiable.

But we've shown that there are
no modifiable documents.

From the claim that if there are documents
there are modifiable documents,

and the assertion "there are
no modifiable documents",

the conclusion is only
that there are no documents,

and that's elimination.
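
(illustrative sketch, not from the talk) The eliminativist argument
just stated has the form of modus tollens. Writing D for "there are
documents" and M for "there are modifiable documents" (both letters
are labels chosen here, and the block assumes the amsmath and amssymb
packages), it can be set out as:

```latex
\begin{align*}
&\text{(1)}\quad D \rightarrow M
  && \text{if there are documents, there are modifiable documents}\\
&\text{(2)}\quad \neg M
  && \text{there are no modifiable documents}\\
&\therefore\quad \neg D
  && \text{there are no documents}
\end{align*}
```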

Let me just briefly say that there is
another line of reasoning

to the same conclusion, that looks at
the constructs in discrete mathematics

that are typically used
to define digital documents.

All of those concepts, whether they
be strings or graphs or relations

all are eventually defined
in terms of sets

and our standard set theory holds
that membership in a set

is essential to the identity of the set.

Sets cannot lose or gain members.

Sometimes mathematicians speak loosely,

but when they're not speaking loosely,

they do recognize that one set <i>S</i> and one
set <i>T</i> are identical if, and only if,

they have exactly the same members

and that's a forward and back,
that's not just at a time.

Sets are used to define strings.

They're used to define
the relations in a--

actually, let me expand on that a bit.

In a relational database model we see
information as organized in a table.

And our textbooks tell us that table
is understood as a mathematical relation,

which is a set of <i>n</i>-tuples.

We speak of adding or deleting
records from tables.

That corresponds to adding
or deleting tuples from a set,

and having the set survive the change.

Sets cannot lose or gain elements,
whatever they are.

The conclusion is documents
can never change.

You can't add a record to a database.

You can't delete a record
from a database.

I switched from database to table here,
but it doesn't make any difference.

A database table is a relation.

A relation is a set.

Sets have their members essentially.

They can't lose their [inaudible].
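
(illustrative sketch, not from the talk) Python's built-in `frozenset`
behaves much like the sets described here, so it gives a rough picture
of a relation as an immutable set of tuples. The rows below are
invented for illustration.

```python
# A relation modeled as an immutable set of tuples (rows are hypothetical).
employees = frozenset({("ada", "engineering"), ("grace", "navy")})

# "Adding a record" builds a different set; `employees` is not altered.
employees_after = employees | {("alan", "bletchley")}

print(employees == employees_after)  # False: different members, different set
print(len(employees))                # 2: the original still has two tuples
print(hasattr(employees, "add"))     # False: there is no way to add in place

# Identity goes by membership alone: same members, same set.
print(frozenset({1, 2}) == frozenset({2, 1}))  # True
```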

Same goes for collections.

Collections are often defined as sets.

I think I've got them coming up here.

Yeah, there we go!

Good old [Ed Fox]'s students gave
this account of the digital library.

Collection is a set. There, they say it.
They even use curly braces.

If a collection is a set,
you can't add anything to it.

Nor can you remove anything from it.

[inaudible]

Suddenly all these things
that we had in our digital world

that we're very familiar with,
very familiar,

talked about all the time,

seem to incorporate logical
inconsistency in their very nature.

One response is to say, no,

it's not inconsistent, it's just that
our notion of those things was inadequate

and we have to face the fact that
you can't add something to a collection.

You can't subtract
a record from a database.

You can't edit a document.

That's one response.

The eliminativist says, you know what?

If you're going to go
that far, you give up--

rather than adopt a position that is
that repugnant to my conceptual scheme,

my notion of a document,
a collection, a database,

I'd rather just say there aren't any,

because the idea of a database table
that you can't add a record to

is just not consistent
with my notion of database table.

(man in audience) As a computer scientist,
can I offer what I seem to think

is an easier way out?

When you're modifying a table,

this is actually going back to my attempt
at going down the path of a new document.

What you're doing is you're actually
choosing a new relationship

whose new properties
reflect the differences

that adding [inaudible] a table.

So when we say
we're adding a new table,

that's a shorthand for saying
we're manifesting a new relationship

in which the only difference
between this relationship

and the previous relationship is a table
that is the row that I modified.

And again, I actually am not
going to contest that view

because the point I want to make
is that literally speaking,

the relation is not modified.

(man in audience) Yeah, you're choosing
a new relationship from the universe

of all possible relationships

and when you're saying
we modified the table,

that's just a shorthand for doing that

and I think that doesn't deny
the existence of documents

or tables or anything else,
but gets us out of this jam.

So I would say it doesn't
get us out of the jam,

because what we're agreeing on
is what's really going on.

But, I maintain that a relation cannot--

that you cannot add
a record to a relation.

(man in audience) That's right,
it's a new relation.

Well, it's a different relation in a way.

So we actually agree, I think.

(man in audience) But you're denying
the existence of the document.

What I'm saying is that
if the immutability of relations

is repugnant to your
concept of a relation,

then there is another approach,
and that is to deny

that there are, I have to say
tables in this case,

that there are database tables.

So a database table
is a modifiable relation,

but there are no modifiable relations.

Therefore, there are no database tables.

That's how the argument goes.

In Codd's original paper
on the relational model

he sets up this near convergence
that we have here.

He says something like,
I can't remember exactly,

but he talks about how an actual database
over time is really a function

from times
to sets of tuples.

And you could say to me,
"Allan, you're completely confused here.

"A database is a function as Codd says.
A database is a function from times

to sets of tuples."

I'd say, yes, that may be true,
but there's still nothing

in the landscape that's mutable.
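
(illustrative sketch, not from the talk) One way to picture a database
as a function from times to sets of tuples, assuming integer
timestamps and invented rows. `HISTORY` and `table_at` are hypothetical
names. Every value involved is immutable; asking for a later time
simply returns a different set.

```python
from bisect import bisect_right

# Hypothetical history: (time, relation) pairs in ascending time order.
# Both the outer tuple and the frozensets are immutable.
HISTORY = (
    (1, frozenset({("memo", "v1")})),
    (5, frozenset({("memo", "v1"), ("report", "v1")})),
)


def table_at(t: int) -> frozenset:
    """Return the relation in force at time t (latest entry at or before t)."""
    times = [time for time, _ in HISTORY]
    i = bisect_right(times, t)
    if i == 0:
        raise KeyError(f"no relation recorded at or before time {t}")
    return HISTORY[i - 1][1]


print(table_at(3) == table_at(1))  # True: nothing "happened" between 1 and 3
print(table_at(5) == table_at(1))  # False: a different set, not a changed one
```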

So when you start writing assertions
in a modeling framework,

UML, RDF, whatever, you had better not
have variables ranging over tables

that are modifiable because that would be
a literal interpretation of the sentence

you and I have agreed on
interpreting with a paraphrase.

(man in audience) Yeah,
I guess as a computer scientist,

if you are working in the domain
of functional programming

I don't think any of this
would come as a shock.

I guess I don't see the cognitive
dissonance that should spring in my head,

that you're saying
should spring in my head--

So I think maybe you're right.

That at this point, having talked
about the paraphrases [inaudible],

the kind of dissonance
started dissipating.

The problem is most acute
when we're trying to actually develop

a conceptual model for a repository

or a preservation system
or a document management system

and we're drawing boxes
and arrows and have an interpretation

[inaudible] watching.

The decisions that we have
to make are actually hard.

Let me take a specific example.

So, Planets, which is based on PREMIS,

has a nice UML diagram
of its preservation model.

And they classify
documents as bitstreams,

and they also attach a modification date
to the document class

but if a document is actually
a particular bitstream,

then it is not going to be modifiable.

You think of the class of bitstreams
as the class of every

combinatorially possible bitstream.

A document is one of them.

That document cannot become
some other bitstream.
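
(illustrative sketch, not from the talk) One hypothetical way to keep
a date without claiming that a bitstream changes is to attach the date
to the assignment of a bitstream to a role. This is not the PREMIS or
Planets model. `RoleAssignment`, its fields, and the dates are all
invented for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import date


@dataclass(frozen=True)
class RoleAssignment:
    """Hypothetical record: which bitstream filled which role, and since when."""
    role: str          # the communicative role, e.g. "annual-report"
    bitstream: bytes   # an immutable value; it never becomes another one
    assigned_on: date  # when this bitstream took on the role


first = RoleAssignment("annual-report", b"draft", date(2012, 1, 10))
second = RoleAssignment("annual-report", b"final", date(2012, 2, 1))

# Trying to "modify the document" in place is rejected by the frozen class.
try:
    first.bitstream = b"final"
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)                                 # False
print(second.assigned_on > first.assigned_on)  # True: a later assignment
```

What looked like a modification date on the document becomes the date
of a second assignment to the same role.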

To me, that just says well,
there's interesting work to do here,

if we're going to have a UML diagram
that matches our intuitions

a little more closely or that lets us work
with these a little better,

but my general point is, if you take
the sentences we are likely to articulate,

and try to represent them in a logic-based

conceptual modeling language,

even if you're pretty good at it,
even if you try hard,

you will end up just like
the PREMIS and Planets people did,

not creating a system
like you just described

in your paraphrases,

but creating one that actually
has contradictions in it.

Most of the time it doesn't matter

because there's so much English involved,

there's so much human
intervention involved,

we're able to navigate these problems,

but the more we move towards
automatic inferencing

over our ontologies
and over our assertions,

the more likely it is that we start
to replicate every paradox

of the last 2,000 years
in these lights-out

automated inferencing systems
that are just completely unforgiving,

that don't understand
what we really mean.

(man 3 in audience) Can I add
an element of time management here--

(man in audience) Feel free
to shut me up if you--

(man 3 in audience) No, it's just
that we are running out of time,

so, Allan, could we shift over
to a couple of questions before we stop?

(man 4 in audience) I think I need a bit

of clarification on what
is meant by document

because you talked about documents
as a sentence, even a database.

It seems that it could apply
to any sort of digital object.

Is that what you mean,
and if that is the case,

then although I could agree that
a digital object or document

is definitely a set or a bitstream,

I keep on disagreeing that it's a string,

because although a string is a set,
there are other properties and restraints

that are associated
with the fact that it's a string.

For example, it has
a certain order, a sequentiality.

And that doesn't exist in every document.

First of all, [inaudible]

so I think I need a bit more clarification
on what you mean by document

and [inaudible].

So, I don't have a--

you may have noticed in the beginning.

I don't necessarily want to tie myself
to any particular account of document,

either specific definition
or colloquial notion,

so I would say, I'll take candidates
for what a document is.

Presumably, in ordinary circumstances,
something like the TPS memo.

Something that can be revised,
something that can be authored,

something that communicates,

and when we look at definitions,
whether it's in FRBR, or in [inaudible]

or the XML standard, we often see
accounts of a document

that do make it look
like a structure of some kind,

often a string of symbols.

But maybe one clarification,
though, is that clearly,

I'm focusing attention
on the repeatable abstraction,

not on the material object
that embodies the abstraction.

(man 4 in audience) Yeah, absolutely.
That's why I think [inaudible] really works.

I'm not trying to contradict
the conclusions that you attempt to draw.

I think that you managed to convince me
that a document is an immutable object.

I just don't think it's a good idea
to call it a string,

because a lot of documents
will not be strings.

So I'm going to take an example
from what I know better than [inaudible].

I consider them documents in that
they are revised, they're edited,

they communicate a set of instructions
and a lot more, if you will.

Although there is definitely some sense
of sequence, they do not operate

as a sequence only because even though
they communicate something

that is supposed to happen in time,

so there's one event after the other,

they also represent [inaudible]

events that happen at the same time

so you have at least
two sequences that are concurrent.

And this is not just thinking
in terms of a graph.

It's not just an overlapping of qualities,

it's an ontological problem
and you cannot just model it

as something that happens
in a sequence, in a line.

It's not a string.

So let me try this as a response.

In the end, despite all this talk
about strings and such,

it's the fact that abstract objects
have no contingent inherent properties

that drives the argument forward.

I refer to specific constructs
from discrete mathematics

like strings and so on,

because they're so common in the books
that we read about digital objects.

But, however you conceptualize your score,
if it's a repeatable abstraction,

it's going to be implausible
that it's mutable.

It is plausible because
we talked about [inaudible] score,

but for the same reasons given here,

after reflection, it becomes
implausible that it's mutable.

And so, the kinds of paraphrases
that we use for strings

will also involve social,
community intention.

Looks like we're shifting
change to communities.

Is what we're doing.

Convention, intention, all that stuff

is going to have to happen
for scores as well.

Even though I don't have a snappy answer
for what a score is,

I'm still fairly confident
that whatever it is,

if it's a repeatable abstraction,
it will not have any inherent properties

that are contingent
and therefore will not be modifiable,

and therefore its apparent modification

will be a social construction.

A genuine social construction
that's dependent upon

our intentional effort as a community.

We have to stop there, but if your brain
is like mine right now it's racing

in a lot of different directions.

Our mind.

I'm sure that Allan will be
at the front of the room

to talk to you if you
would like to talk more.

Let's thank him
for a great digital dialogue.

(applause)