Irene Eleta: Multilingual Users of Twitter: Social Ties Across Language Borders or How a Story Could Travel the World

0:00 - 0:02

Good morning, everyone.
0:03 - 0:06

Thank you for coming here
[unclear] of the semester.
0:08 - 0:10

So, I'm going to start.
0:11 - 0:14

Access to the internet
is greater than ever before
0:14 - 0:17

and as a consequence,
it's becoming more multilingual.
0:19 - 0:23

However, there's evidence of segmentation
of cyberspace
0:23 - 0:25

due to language and national borders.
0:28 - 0:31

This image serves to illustrate that.
0:32 - 0:36

These are the language communities
of Twitter in Europe.
0:37 - 0:41

So, what you can see are tweets
geolocated over a map of Europe
0:41 - 0:44

and the different colors
represent the different languages.
0:45 - 0:51

You can even see regional languages
like Catalan in the Catalan region of Spain
0:52 - 0:56

And this is going to be useful
for an example I'm going to use later.
1:02 - 1:04

I'm interested in Twitter in particular,
1:04 - 1:07

because of the speed
of information dissemination
1:07 - 1:11

and because most of this information
is publicly accessible.
1:14 - 1:19

I'm going to illustrate this
with a capture
1:19 - 1:22

of a dynamic visualization
you can find on the Twitter blog
1:22 - 1:25

by Miguel Rios.
1:25 - 1:29

And what you can see here
is the global flow of tweets
1:29 - 1:31

after the earthquake in Japan.
1:32 - 1:35

In pink, there are the tweets
coming out of Japan
1:35 - 1:37

and, in green, the retweets
all over the world.
1:39 - 1:45

This illustrates that on Twitter
information is spreading across countries.
1:46 - 1:48

But how can this happen?
1:49 - 1:55

Expatriates, migrants, minorities,
diaspora communities, language learners
1:55 - 1:59

all play an important role
in building transnational networks
1:59 - 2:03

and cultural bridges
between nations and communities.
2:04 - 2:06

They are the multilingual users
on the internet.
2:08 - 2:11

The overarching research question is:
2:11 - 2:17

how are multilingual users of Twitter
connecting different language groups?
2:22 - 2:27

In 2009, the Berkman Center for Internet
and Society at Harvard University
2:27 - 2:30

mapped the Arabic blogosphere
2:30 - 2:33

and they described a key concept
for my research.
2:35 - 2:41

They discovered an English bridge
and a French bridge of bloggers
2:41 - 2:46

that were writing in their native
Arabic language and in English or French.
2:47 - 2:52

And they were connecting the different
national blogospheres
2:52 - 2:53

with the international one.
2:55 - 3:00

This might have played a role in the Arab
popular uprisings in 2011
3:00 - 3:03

for reaching out to the world.
3:05 - 3:09

And this is connected with a concept
that first appeared in 2008
3:09 - 3:12

of the bridge bloggers.
3:14 - 3:16

So, bridge bloggers are bloggers
3:16 - 3:20

that are trying to connect
their local communities
3:20 - 3:23

to a wider global audience.
3:25 - 3:29

The image you can see here
is actually the visualization they created
3:29 - 3:33

of mapping the Arabic blogosphere.
3:35 - 3:37

Each dot is a blogger, or a blog.
3:39 - 3:43

The size represents their popularity,
so how many incoming links they have
3:43 - 3:45

and they grouped them
3:45 - 3:48

into neighborhoods
3:48 - 3:52

based on the linking
between them.
3:53 - 3:56

So, the ones that are grouped together
are linking among each other.
3:57 - 3:59

The colors are a different question.
3:59 - 4:06

The colors represent "attentive clusters",
that's what they call them.
4:06 - 4:12

They looked at the online resources
and media outlets
4:12 - 4:14

these blogs were linking to.
4:15 - 4:19

So, blogs of the same color
are following the same media outlets
4:19 - 4:21

and online resources.
4:21 - 4:25

And they did human coding
to label those groups.
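A rough sketch of the idea behind attentive clusters, assuming each blog is represented by the set of media outlets it links to. This is not the Berkman Center's method: it swaps in a simple Jaccard similarity with single-linkage grouping, and the 0.5 threshold and the example domains are hypothetical.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two blogs' outlet sets: shared outlets / all outlets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def attentive_clusters(links: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Group blogs that link to largely the same outlets (single linkage via union-find)."""
    parent = {blog: blog for blog in links}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in combinations(links, 2):
        if jaccard(links[a], links[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for blog in links:
        clusters.setdefault(find(blog), set()).add(blog)
    return list(clusters.values())


# Hypothetical blogs and the outlets each one links to.
links = {
    "blog_a": {"aljazeera.net", "bbc.co.uk"},
    "blog_b": {"aljazeera.net", "bbc.co.uk", "cnn.com"},
    "blog_c": {"lemonde.fr", "france24.com"},
    "blog_d": {"lemonde.fr"},
}
print(sorted(sorted(c) for c in attentive_clusters(links)))
```

The grouping only finds clusters; giving them names such as "English bridge" would still be the human-coding step described in the talk.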
4:26 - 4:30

And here is where we see
the label "English bridge"
4:30 - 4:32

the responses from Cuba
in English
4:32 - 4:34

and up there, there's [unclear] France.
4:36 - 4:41

And so I think it's important to retain
the concept of attentive clusters.
4:44 - 4:49

Now, let's go back to 2011
during the Arab popular uprisings.
4:50 - 4:55

And I'll show you a visualization
of the influence network
4:55 - 4:57

of Twitter users in Egypt.
4:58 - 5:00

So, what you're seeing here
5:00 - 5:04

just imagine people down the street
at Tahrir Square
5:04 - 5:08

tweeting in Arabic about what's going on
on the ground.
5:08 - 5:12

And those are the people in red.
5:13 - 5:18

So, these red dots represent users
that are tweeting in Arabic.
5:18 - 5:20

Then we have the international community
5:20 - 5:26

or even Americans, British and so on
tweeting in English.
5:26 - 5:28

And they are in blue,
those blue dots.
5:29 - 5:33

And then, interestingly, we have
people in between them,
5:33 - 5:38

who are illustrated in different
degrees of violet, or violet shades.
5:39 - 5:43

This represents the fact that they
are tweeting in both Arabic and English.
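One way to produce the red, blue and violet shading described here, assuming each user's tweets already carry a language label ("ar" or "en"). The function names and the linear red-to-blue blend are illustrative choices, not the code behind the original visualization.

```python
def arabic_share(tweet_languages: list[str]) -> float:
    """Fraction of a user's Arabic/English tweets that are in Arabic."""
    relevant = [lang for lang in tweet_languages if lang in ("ar", "en")]
    if not relevant:
        raise ValueError("user has no tweets labeled Arabic or English")
    return relevant.count("ar") / len(relevant)


def blend_color(share_arabic: float) -> str:
    """Linear mix from blue (all English) to red (all Arabic); mixtures come out violet."""
    if not 0.0 <= share_arabic <= 1.0:
        raise ValueError("share must be between 0 and 1")
    red = round(255 * share_arabic)
    blue = round(255 * (1 - share_arabic))
    return f"#{red:02x}00{blue:02x}"


print(blend_color(arabic_share(["ar", "ar", "en", "en"])))  # -> #800080
```

A user tweeting half in each language lands on an even violet; users leaning toward one language get shades closer to red or blue.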
5:45 - 5:47

So, what we're seeing
are the bridge Twitter users,
5:48 - 5:53

by analogy with what Ethan Zuckerman called
"bridge bloggers".
5:56 - 5:59

So, another context.
5:59 - 6:05

The same year, 2011, a lot
of big protests were going on in Europe.
6:05 - 6:07

And in particular, in Spain.
6:07 - 6:12

They started on May 15th, 2011,
with massive protests.
6:13 - 6:17

And because of this context,
this situation
6:17 - 6:23

new attentive clusters were emerging
in the social media landscape of Spain.
6:28 - 6:34

Now, this is a visualization you can find
on the Socialflow blog, a research blog
6:34 - 6:35

on social networks.
6:35 - 6:41

What it does is track the origin
and the initial spread
6:41 - 6:45

of the hashtag #occupywallstreet
on Twitter.
6:47 - 6:52

They detected that one of the first uses
of the hashtag #occupywallstreet
6:52 - 6:57

was on July 13th, 2011, linking to a blog
post by Adbusters.
6:58 - 7:02

So you have the Twitter account
of Adbusters there, very big
7:02 - 7:05

because it's being retweeted a lot.
7:06 - 7:07

And mentioned a lot.
7:08 - 7:14

And they collected these mentions
and the tweets that had these mentions
7:14 - 7:17

and the retweets with the hashtag,
7:18 - 7:21

from July 13th to July 23rd.
7:21 - 7:24

So, the first 10 days
of the use of this hashtag,
7:24 - 7:27

from the very beginning of the use
of this hashtag on Twitter.
7:31 - 7:32

They just mapped the accounts
7:33 - 7:39

and the series of posts with the hashtag
and mentions with the hashtag
7:41 - 7:43

and the users that were connecting
7:43 - 7:45

because of these mentions
and retweets.
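The collection step described above can be sketched as building a weighted mention/retweet graph. This is a hypothetical reconstruction, not Socialflow's code: it assumes tweets are already available as dictionaries with `author`, `date` and `text` fields, and it treats an "RT @user" prefix as just another mention.

```python
import re
from collections import Counter
from datetime import date

MENTION = re.compile(r"@(\w+)")  # matches both "RT @user:" and "@user"
HASHTAG = re.compile(r"#(\w+)")


def mention_graph(tweets, hashtag, start, end):
    """Weighted directed edges author -> mentioned or retweeted account,
    using only tweets that carry `hashtag` and fall within [start, end]."""
    edges = Counter()
    wanted = hashtag.lower().lstrip("#")
    for t in tweets:
        if not start <= t["date"] <= end:
            continue
        tags = {tag.lower() for tag in HASHTAG.findall(t["text"])}
        if wanted not in tags:
            continue
        for target in MENTION.findall(t["text"]):
            if target.lower() != t["author"].lower():  # ignore self-mentions
                edges[(t["author"], target)] += 1
    return edges


def in_degree(edges):
    """How often each account is mentioned or retweeted (node size in the map)."""
    sizes = Counter()
    for (_, target), weight in edges.items():
        sizes[target] += weight
    return sizes


# Hypothetical tweets; the third falls outside the 10-day window.
tweets = [
    {"author": "alice", "date": date(2011, 7, 13), "text": "RT @Adbusters: #occupywallstreet"},
    {"author": "bob", "date": date(2011, 7, 14), "text": "@Adbusters @alice #OccupyWallStreet"},
    {"author": "carol", "date": date(2011, 7, 30), "text": "RT @Adbusters: #occupywallstreet"},
]
edges = mention_graph(tweets, "#occupywallstreet", date(2011, 7, 13), date(2011, 7, 23))
print(in_degree(edges).most_common(1))
```

Weighting nodes by in-degree is what would make a heavily retweeted and mentioned account, like Adbusters in the visualization, appear very big.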
7:46 - 7:49

Now the interesting thing
in this visualization
7:49 - 7:51

is that they
7:51 - 7:54

the Socialflow people
particularly in [inaudible]
7:55 - 8:00

detected that this Spanish branch
of users
8:00 - 8:03

was forming an attentive cluster.
8:06 - 8:09

Mentioning and retweeting about it
in Spanish
8:09 - 8:13

using the hashtag in their messages
in Spanish.
8:14 - 8:17

And they point out in the blog
8:18 - 8:20

that this Spanish contingent
8:20 - 8:24

helped post and spread the word
about Occupy Wall Street
8:24 - 8:29

even before most of the United States
was aware of it.
8:32 - 8:34

So, I found that very interesting.
8:34 - 8:37

And it was due to the context
in Spain at that moment
8:38 - 8:45

with big protests and new clusters
forming in the social media landscape.
8:57 - 9:01

Now I have shown you the importance
of these multilingual users
9:01 - 9:06

in connecting language communities
and spreading information
9:06 - 9:09

across countries, acting as mediators.
9:11 - 9:16

But let's focus on another aspect
of connecting language groups
9:16 - 9:17

which is language choice.
9:18 - 9:23

So I'm going to devote a moment
to speak about languages
9:23 - 9:24

and language choice.
9:28 - 9:31

To understand languages in the world
9:31 - 9:33

I'm going to use a telescope.
9:37 - 9:40

So de Swaan...
9:41 - 9:45

...proposed a theory called
the world language system
9:45 - 9:46

back in the 1990s,
9:47 - 9:51

to explain the languages in the world.
9:52 - 9:55

And he used a very beautiful metaphor,
the constellation.
9:57 - 10:02

So, in his theory there's about a dozen
languages in the world
10:02 - 10:05

that are the hearts of the system,
or the suns.
10:06 - 10:07

The suns of the system.
10:08 - 10:11

For instance, English, French, Spanish,
Arabic and more.
10:12 - 10:17

And then there are hundreds,
maybe more than 100, 200...
10:17 - 10:23

national languages that are orbiting
around these suns like planets.
10:24 - 10:28

And finally we have regional
and minority languages
10:28 - 10:32

that are orbiting these planets
like satellites.
10:33 - 10:38

And he used this metaphor
to explain the power relationships
10:38 - 10:40

between languages.
10:40 - 10:43

This is a theory of what he called
10:43 - 10:47

"communication potential
and language competition"
10:48 - 10:51

A key point he made
10:52 - 10:55

is that the system holds together
10:55 - 10:59

thanks to multilingual people
and interpreters.
11:00 - 11:03

This is what's providing cohesion
to the system.
11:04 - 11:07

He also made a controversial proposal
11:07 - 11:11

about the communication potential
of a language.
11:12 - 11:15

So, he proposed a formula,
a mathematical formula
11:15 - 11:20

where he could estimate the communication
potential of a language
11:20 - 11:25

and supposedly a person chooses languages
for learning and usage
11:25 - 11:28

based on the communication potential of each.
11:28 - 11:34

For example, a person might decide
to learn English and use English
11:35 - 11:41

because not only does it provide
communication with English native speakers
11:42 - 11:46

but also, adding to that, it provides
the possibility to communicate
11:46 - 11:50

with all the second-language learners
of English
11:50 - 11:53

from many different languages,
many different countries.
11:53 - 11:56

So, supposedly, in his theory
11:56 - 12:00

English provides
the greatest communication potential.
12:01 - 12:05

And he received some criticism,
because of the central role of English
12:05 - 12:07

in his theory
12:07 - 12:10

He said it was the central hub
of the whole system.
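De Swaan's communication potential is usually stated as the Q-value of a language i within a constellation S: Q_i = p_i × c_i, where the prevalence p_i is the share of all speakers in S who speak i, and the centrality c_i is the share of multilingual speakers in S who speak i. The sketch below computes it for a made-up population of language repertoires; the population is purely illustrative. It shows the mechanism behind the English example: second-language speakers raise both prevalence and centrality.

```python
from collections import Counter


def q_values(repertoires: list[set[str]]) -> dict[str, float]:
    """Q_i = p_i * c_i for every language in the constellation.

    p_i: speakers of i / all speakers                    (prevalence)
    c_i: multilinguals who speak i / all multilinguals   (centrality)
    """
    n = len(repertoires)
    multilinguals = [r for r in repertoires if len(r) > 1]
    m = len(multilinguals)
    speakers = Counter(lang for r in repertoires for lang in r)
    central = Counter(lang for r in multilinguals for lang in r)
    return {
        lang: (speakers[lang] / n) * (central[lang] / m if m else 0.0)
        for lang in speakers
    }


# Hypothetical constellation: each set is one person's language repertoire.
population = [{"en"}, {"en"}, {"es"}, {"en", "es"}, {"en", "ca"}, {"en", "es", "ca"}]
for lang, q in sorted(q_values(population).items(), key=lambda kv: -kv[1]):
    print(lang, round(q, 3))  # en 0.833, es 0.333, ca 0.222
```

English comes out on top here because every multilingual person in the toy population includes it in their repertoire, which pushes its centrality to 1.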
12:13 - 12:20

There's also the language ecology paradigm
first proposed by Haugen in 1972
12:22 - 12:25

and there's this idea of an ecosystem
of languages
12:25 - 12:30

and, again, it's using another metaphor
12:30 - 12:32

and from this metaphor
12:32 - 12:34

also came the idea
of endangered languages.
12:36 - 12:40

I'm going to briefly just read
the definition.
12:40 - 12:43

He defined language ecology as:
12:43 - 12:47

"the study of interactions between
any given language and its environment"
12:48 - 12:49

and what I think is very important:
12:49 - 12:54

"language exists only in the minds
of its users"
12:57 - 13:00

which leads me to my research.
13:02 - 13:06

In my research, I'm using a microscope
to see the cells
13:06 - 13:09

and my cells in my study
are the Twitter users.
13:12 - 13:13

Why is that?
13:16 - 13:20

Because as Haugen explains,
there's a psychological dimension
13:20 - 13:22

to language ecology
13:22 - 13:25

where language interacts
with other languages
13:25 - 13:28

in the minds of multilingual people.
13:29 - 13:32

And there's a sociological dimension
to language ecology
13:32 - 13:38

where we use language to communicate
and interact with other people.
13:38 - 13:44

And this language ecology is generated
by the people
13:44 - 13:46

that decide to use that language,
13:46 - 13:50

learning it and interacting
with people using it.
13:51 - 13:55

And this is where language choice
comes into play.
13:56 - 14:00

So, I focus on the connections of people
and the language choice.
14:04 - 14:08

So, these are the four points
I'm going to be speaking about.
14:08 - 14:13

But actually the main focus
is going to be the first point
14:13 - 14:19

Social network analysis and the taxonomy
of intersections between language groups
14:20 - 14:23

This is where I'm going to be spending
most of the time.
14:23 - 14:27

And then very briefly,
just for completeness
14:27 - 14:30

I'm going to speak about another
small study that I did
14:30 - 14:31

the factor analysis
14:31 - 14:34

looking at the influence
of the social network
14:34 - 14:39

in the language choices of the users.
14:39 - 14:42

So, how the social network
influences language choice
14:42 - 14:43

of our multilingual users.
14:45 - 14:50

And then I'm going to briefly also talk
about the last study of my dissertation
14:50 - 14:52

that is still ongoing.
14:53 - 14:55

So, I still have new research
to talk about.
14:55 - 14:58

And it's content analysis
14:58 - 15:00

and in this case I'm focusing on
intrinsic factors
15:00 - 15:03

intrinsic to the messages,
15:03 - 15:06

such as the topic
and the type of exchange.
15:06 - 15:08

If it's a reply,
if it's a public post
15:08 - 15:10

and how that influences
the language choice as well.
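A minimal sketch of how the exchange type could be coded before comparing language choice across types. The prefix rules ("RT @" for retweets, a leading "@" for replies) are a rough heuristic assumed here, not the study's coding scheme; with full API data, the reply fields Twitter returns (such as `in_reply_to_status_id`) would be a more reliable signal.

```python
from collections import Counter, defaultdict


def exchange_type(text: str) -> str:
    """Rough heuristic: classic retweet prefix, reply-style leading mention, else public post."""
    stripped = text.lstrip()
    if stripped.startswith("RT @"):
        return "retweet"
    if stripped.startswith("@"):
        return "reply"
    return "public post"


def language_by_exchange(tweets: list[tuple[str, str]]) -> dict[str, Counter]:
    """Cross-tabulate (text, language) pairs by exchange type."""
    table: dict[str, Counter] = defaultdict(Counter)
    for text, lang in tweets:
        table[exchange_type(text)][lang] += 1
    return dict(table)


# Hypothetical tweets with language labels already assigned.
sample = [
    ("@jordi bon dia!", "ca"),
    ("Off to the conference in Barcelona", "en"),
    ("RT @elpais: Noticias de hoy", "es"),
    ("@anna thanks for the link", "en"),
]
for kind, counts in language_by_exchange(sample).items():
    print(kind, dict(counts))
```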
15:11 - 15:12

And finally I will...
15:14 - 15:17

I'm going to give you my reflections
15:18 - 15:23

so I can invite your thoughts
and suggestions and discussions about it.
15:27 - 15:29

Briefly, I'm going to start
with the sampling
15:29 - 15:32

so I can talk about the rest
of the research.
15:35 - 15:37

So my focus is on multilingual users,
15:37 - 15:40

how did I identify multilingual users
on Twitter?
15:42 - 15:44

It was giving me a headache.
15:44 - 15:47

Finally what we decided...
15:49 - 15:50

this research has been--
15:51 - 15:54

I have always had the help
of Jennifer Golbeck,
15:54 - 15:55

she was my adviser.
15:55 - 15:57

And I did this with her help.
15:58 - 16:02

So what we did was gather lists
of what are called stopwords
16:03 - 16:05

from different languages,
and you have a list over there.
16:06 - 16:10

You can find stopword lists
on the internet.
16:10 - 16:13

They are created
for computational linguistics
16:13 - 16:16

so they're used for filtering purposes.
16:17 - 16:19

And they are common words
in a language.
16:20 - 16:21

Very common words in a language.
16:21 - 16:26

So, sometimes they're used precisely
for eliminating them from texts
16:26 - 16:29

for example,
in Google searches:
16:29 - 16:33

they eliminate the stopwords,
the stopwords that you type
16:33 - 16:35

in the search.
16:35 - 16:38

But in this case I wanted
to find the stopwords
16:38 - 16:41

that are very common in the language
to represent the language.
16:41 - 16:44

And so we had to select words
that were not written the same
16:44 - 16:46

as in another language.
16:46 - 16:48

Otherwise, they could be confusing
and ambiguous.
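The filtering step, keeping only stopwords that are not spelled the same in another language, can be sketched as a set difference. The three tiny word lists are hypothetical stand-ins for the published stopword lists.

```python
def distinctive_stopwords(lists: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each language, drop stopwords that also appear in any other language's list."""
    result = {}
    for lang, words in lists.items():
        others = set().union(*(w for other, w in lists.items() if other != lang))
        result[lang] = words - others
    return result


# Hypothetical excerpts; real stopword lists are much longer.
lists = {
    "en": {"the", "and", "with", "of"},
    "es": {"el", "la", "de", "que", "y", "con"},
    "ca": {"el", "la", "de", "que", "i", "amb", "els"},
}
distinct = distinctive_stopwords(lists)
print(sorted(distinct["es"]), sorted(distinct["ca"]))
```

Spanish and Catalan share many function words, so only a few survive for each. A real check would also compare case-folded forms: Catalan "i" collides with the English pronoun "I" once text is lowercased.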
16:51 - 16:54

Then I typed in Google...
16:56 - 16:59

one word in one language
and one word in another language.
16:59 - 17:01

I was usually using
one English word
17:01 - 17:04

and one word in a different language.
17:05 - 17:08

And I restricted the search to the Twitter domain.
17:09 - 17:12

So the search results from Google
will give me the profiles
17:13 - 17:20

of people on Twitter that in theory
wrote messages in both languages.
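The query construction can be sketched as pairing each English stopword with each stopword from the other language and restricting results to Twitter with Google's `site:` operator. The function name and the exact query format are assumptions; the talk does not specify them, and the queries here are only built, not sent.

```python
from itertools import product


def search_queries(english: list[str], other: list[str], domain: str = "twitter.com") -> list[str]:
    """One query per (English word, other-language word) pair, limited to a single site.

    Quoting each word asks the search engine to match that exact term.
    """
    return [f'"{en}" "{word}" site:{domain}' for en, word in product(english, other)]


for q in search_queries(["the", "with"], ["amb"]):
    print(q)
```

Each query only surfaces candidate profiles; as the talk explains, every hit still had to be checked by hand.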
17:20 - 17:24

We had to do a lot of hand-combing
to actually see if it was in two languages
17:25 - 17:28

or it was just that they were mentioning
an English song
17:28 - 17:32

the title of an English song
but they had no English in the rest.
17:32 - 17:36

So we had to ensure
that they were authoring tweets
17:36 - 17:38

in two languages.
17:38 - 17:40

So writing them, not just retweeting them
17:40 - 17:43

they were not just automatic postings
from Facebook.
17:43 - 17:48

So we had a long set of criteria
a lot of manual combing
17:48 - 17:53

and then finally we selected
92 multilingual users
17:53 - 17:58

and in total they used 19 languages,
2 or 3 languages per person.
18:01 - 18:05

Now, I don't know if you want to ask
some questions about the sampling
18:05 - 18:08

because there's a lot of details about it.
18:13 - 18:15

No doubts?
18:15 - 18:17

Or maybe they'll come later!
18:19 - 18:22

Now, how do I do
the social network analysis?
18:23 - 18:28

Well, now I have my 92 multilingual users
technically they are called the ego
18:28 - 18:30

of an egocentric network.
18:31 - 18:33

This is the cell of my study.
18:34 - 18:36

It starts with the nucleus of the cell
18:36 - 18:38

which is my multilingual user
18:38 - 18:40

and then I go to Twitter
18:40 - 18:43

and first of all I have extracted--
18:45 - 18:47

so in this case my ego
is called the Painter
18:48 - 18:54

and I have extracted the last 50 messages
that he posted on Twitter
18:54 - 18:57

to see the languages
this person used-- is using.
18:57 - 19:02

And I see that he is using English,
Spanish and Catalan.
19:03 - 19:05

Catalan is a regional language in Spain
19:06 - 19:08

and I showed you on the map
earlier
19:08 - 19:09

where the region was.
19:09 - 19:12

And people there speak both Catalan
and Spanish.
19:14 - 19:17

So, this person is tweeting
in a minority language
19:17 - 19:18

a national language
19:18 - 19:21

and also an international one.
19:27 - 19:32

So, I already found the Painter
and I know what languages this person speaks
19:32 - 19:34

well, uses on Twitter,
19:34 - 19:36

and then I extract
the whole social network.
19:36 - 19:38

So, the followers on Twitter
19:38 - 19:40

you know that on Twitter
you have followers
19:40 - 19:41

and you follow people.
19:41 - 19:43

I extracted both.
19:43 - 19:48

The followers of the Painter
the people that are following him on Twitter
19:48 - 19:52

and also how the friends
are connecting to each other.
19:52 - 19:57

So, all of them, all of these dots
are the followers
19:57 - 19:59

the people following the Painter
on Twitter
19:59 - 20:03

and also I see how they connect
among each other, ok?
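A minimal sketch of the extraction step, assuming the follow relations have already been downloaded into a dictionary that maps each account to the set of accounts it follows. The data and names are hypothetical; in practice this information would come from Twitter's API.

```python
def ego_network(ego: str, follows: dict[str, set[str]]) -> tuple[set[str], set[tuple[str, str]]]:
    """Alters = accounts the ego follows plus accounts following the ego;
    edges = follow relations among the alters themselves."""
    followees = follows.get(ego, set())
    followers = {user for user, targets in follows.items() if ego in targets}
    alters = (followees | followers) - {ego}
    edges = {
        (u, v)
        for u in alters
        for v in follows.get(u, set())
        if v in alters and v != u
    }
    return alters, edges


# Hypothetical follow data: follows[u] is the set of accounts u follows.
follows = {
    "painter": {"eduard", "anna"},
    "eduard": {"painter", "anna"},
    "anna": {"eduard"},
    "joan": {"painter", "eduard"},
    "stranger": {"eduard"},
}
alters, edges = ego_network("painter", follows)
print(sorted(alters), sorted(edges))
```

The ego itself is left out of the edge set, since every alter is connected to it by definition; "stranger" drops out because it has no tie to the ego.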
20:05 - 20:09

So the Painter follows Eduard
in the center
20:10 - 20:12

and it seems he's very popular.
20:14 - 20:17

And then I extract the last 30 posts
of Eduard--
20:17 - 20:19

there's a reason for that
20:19 - 20:22

but it's mostly
a question of economy!
20:25 - 20:26

I will tell you why!
20:26 - 20:29

So I extracted the last 30 posts of Eduard
20:29 - 20:32

and then I do
automatic language identification
20:32 - 20:37

with the Google API
for language identification
20:39 - 20:40

which costs money.
20:41 - 20:43

So you have to really think
about how many posts you want to send
20:43 - 20:46

to Google and how much money
you have available
20:46 - 20:48

and what is the accuracy
you're going to have
20:48 - 20:51

according to how many posts you send.
20:51 - 20:53

There's a lot of testing going on there.
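The trade-off can be made concrete with a back-of-the-envelope calculation. The unit price is a placeholder, not Google's actual rate, and real services may bill by characters rather than by posts; the point is only that the number of requests, and so the cost, grows linearly with the posts sent per user.

```python
def api_budget(users: int, posts_per_user: int, price_per_1000_posts: float) -> tuple[int, float]:
    """Number of identification requests and their cost, if each post is sent once."""
    requests = users * posts_per_user
    return requests, requests / 1000 * price_per_1000_posts


# Roughly 80,000 people (the figure given later in the talk) at 30 vs 50 posts each,
# with a placeholder price of 1.0 currency units per 1,000 posts.
for posts in (30, 50):
    print(posts, api_budget(80_000, posts, 1.0))
```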
20:54 - 20:58

I do the same with everybody
in the social network.
20:59 - 21:00

I extract the last 30 posts
21:00 - 21:02

use the Google identification
21:02 - 21:08

and an algorithm I built decides,
based on the languages of these 30 posts:
21:08 - 21:12

is this person monolingual?
Is this person multilingual?
21:12 - 21:13

Which languages?
21:13 - 21:15

And then I labeled them, ok.
21:17 - 21:19

This is just a visualization behind the--
21:20 - 21:27

Perhaps person 1 is monolingual,
or bilingual in two languages.
21:32 - 21:36

Now that I have all the friends
of the Painter
21:36 - 21:37

how they connect,
21:37 - 21:41

I color code them
depending on the languages they are using.
21:42 - 21:45

And here, what you can see
is very interesting.
21:46 - 21:49

I don't know if you can distinguish
the colors well
21:49 - 21:54

because up here, this area,
that is like a triangle
21:54 - 21:58

there's a group of users
writing in English.
21:59 - 22:01

And it's pink.
Sort of pinkish.
22:01 - 22:05

And then, down here
there's this Spanish group
22:05 - 22:07

in light green.
22:08 - 22:12

And, in the middle, the one
that perhaps doesn't stand out as well
22:12 - 22:15

from the English
is the Catalan group.
22:16 - 22:19

So the users writing in Catalan
are in dark blue.
22:20 - 22:22

And then there's a set of violets
in between
22:22 - 22:26

and these violets represent
the bilingual users
22:26 - 22:29

either English and Catalan
or English and Spanish.
22:30 - 22:33

And then there's darker green
around here,
22:33 - 22:36

they are using both Catalan and Spanish.
22:36 - 22:38

So there are a lot of bilinguals
going on.
22:38 - 22:40

And there's an interesting dynamic
22:40 - 22:43

in that you have this English group
up there
22:43 - 22:44

and the Spanish group up here
22:44 - 22:46

and the Catalan group in the middle.
22:46 - 22:49

And this Catalan group is very mixed up
with the Spanish group
22:50 - 22:52

which makes sense,
because it's a bilingual community.
23:01 - 23:07

So, this is how I built the egocentric
networks of my 92 multilingual users.
23:09 - 23:11

The Painter is just one of them.
I have 92.
23:11 - 23:17

I have 92 cells or egocentric networks
that I studied with my microscope.
23:18 - 23:22

Do you want to ask some questions
about this process
23:22 - 23:23

or this visualization?
23:25 - 23:30

(person 1) Of the bilingual units,
are they users or tweets?
23:31 - 23:32

They are users, yeah.
23:32 - 23:36

So, the dots represent people.
23:36 - 23:40

So, like Eduard here.
They represent people.
23:42 - 23:45

Now, for each dot, to determine the language
and the color,
23:45 - 23:48

I extracted 30 posts.
23:48 - 23:53

So, it's an interesting question
because the 30 posts
23:53 - 23:56

have different language labels
assigned to them,
23:56 - 23:57

especially if they were bilingual,
23:57 - 24:02

and I had to decide which language label
I was going to assign to the user.
24:02 - 24:05

So, I had to build an algorithm
with a set of rules
24:10 - 24:11

basically saying--
24:11 - 24:17

the Google identification system
would give me a language
24:17 - 24:18

and a confidence level
24:18 - 24:19

So if the confidence level was very low
24:19 - 24:24

I would say "discard that"
because I had a series of heuristics
24:24 - 24:30

based on both the number of tweets
using a particular language
24:30 - 24:33

and also on the confidence level.
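The rules described above can be sketched as follows. The thresholds (minimum confidence 0.5; a language must cover at least 10 percent of the retained posts and at least 2 posts) are hypothetical placeholders, not the values used in the study. Each post is assumed to arrive as a (language code, confidence) pair from the identification service, and the `describe` helper is likewise illustrative.

```python
from collections import Counter
from typing import Optional


def assign_languages(
    labels: list[tuple[str, float]],
    min_confidence: float = 0.5,
    min_share: float = 0.10,
    min_posts: int = 2,
) -> Optional[tuple[str, ...]]:
    """Turn per-post (language, confidence) labels into the user's language set.

    Returns None when nothing survives the filters (treated as "no data").
    """
    kept = [lang for lang, conf in labels if conf >= min_confidence]
    if not kept:
        return None
    counts = Counter(kept)
    langs = sorted(
        lang for lang, n in counts.items()
        if n >= min_posts and n / len(kept) >= min_share
    )
    return tuple(langs) or None


def describe(langs: Optional[tuple[str, ...]]) -> str:
    if langs is None:
        return "no data"
    return "monolingual" if len(langs) == 1 else "multilingual"


# Hypothetical 30 labeled posts: low-confidence English labels get discarded.
friend = [("ca", 0.9)] * 20 + [("es", 0.8)] * 8 + [("en", 0.2)] * 2
print(assign_languages(friend), describe(assign_languages(friend)))
```

Requiring a minimum number of posts per language keeps a single misidentified tweet from turning a monolingual user into a bilingual one.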
24:34 - 24:38

And there are a lot
of technical challenges there as well.
24:40 - 24:42

(woman) So, it's possible
that some of these posts
24:42 - 24:46

many of these posts would be multilingual,
I'm sorry, monolingual in one language or the other?
24:46 - 24:52

So it's also possible that some
of these individual posts
24:52 - 24:54

would mix languages?
24:55 - 24:57

Yes, it is possible.
It's very possible!
24:57 - 25:00

It's very challenging
for the automatic system!
25:02 - 25:04

(woman) Right, ok.
I just wanted to be clear--
25:04 - 25:05

Yes, exactly.
25:05 - 25:11

So it's not as frequent as I expected,
having bilingual posts,
25:11 - 25:13

as I would call them.
25:13 - 25:14

But it's happening.
25:15 - 25:21

And so, for a series of tests,
I had to do manual combing
25:21 - 25:23

and I saw that sometimes
it was the case
25:23 - 25:27

that they were doing some sort
of translation in the same tweet
25:27 - 25:32

and sometimes it was just the case
that they were mentioning titles of things
25:32 - 25:34

or places in a different language.
25:35 - 25:39

So, there are a lot of issues
surrounding the automatic handling of this
25:39 - 25:44

but you are dealing with 92 networks
25:44 - 25:51

and they have between 30
and 5,000 nodes in them.
25:53 - 25:56

So, I don't remember the numbers exactly,
25:56 - 25:59

but I'm talking about
around 80,000 people.
26:01 - 26:05

So you're detecting the language of 80,000 people,
and this is small-scale.
26:05 - 26:08

If you go to millions,
you need an automatic system.
26:08 - 26:11

And one of the things I'm having
to write up in my dissertation
26:11 - 26:14

is what the challenges are.
26:14 - 26:18

You have to be prepared for them,
to solve those problems.
26:19 - 26:22

And one of them is what do you do
with bilingual posts
26:22 - 26:24

which language do you assign to that post?
26:24 - 26:28

Automatic posts, spam...
there's a lot of problems.
26:30 - 26:31

Challenges, I mean.
26:31 - 26:35

That's what makes it interesting
because you cannot do manual combing
26:35 - 26:36

on these scales.
26:39 - 26:41

Do you have another question?
26:45 - 26:48

So, now, what am I doing with this?
26:51 - 26:56

I'm going to classify my social networks,
looking at the patterns
26:56 - 26:59

of overlaps between the language groups.
27:00 - 27:02

Overlaps, or intersections.
27:03 - 27:08

I'm looking specifically at the networks
that have only two language groups
27:08 - 27:12

I had five of these networks
that were trilingual
27:12 - 27:16

so I put them aside to go simple
first with just two language groups
27:16 - 27:18

to see how they interconnect.
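A possible way to implement this filter, assuming each network is stored as a mapping from node to the language tuple assigned to it (None for nodes without data). Counting a language as a "group" only when at least 5 percent of labeled nodes use it is a hypothetical rule, added so that a handful of stray labels does not turn a bilingual network into a trilingual one.

```python
from collections import Counter
from typing import Optional


def language_groups(
    node_languages: dict[str, Optional[tuple[str, ...]]], min_share: float = 0.05
) -> set[str]:
    """Languages used by at least `min_share` of the nodes that have language data."""
    labeled = [langs for langs in node_languages.values() if langs]
    if not labeled:
        return set()
    counts = Counter(lang for langs in labeled for lang in langs)
    return {lang for lang, n in counts.items() if n / len(labeled) >= min_share}


def two_language_networks(networks):
    """Keep only egocentric networks whose alters form exactly two language groups."""
    return {ego: nodes for ego, nodes in networks.items() if len(language_groups(nodes)) == 2}


# Hypothetical networks: node -> languages assigned to that node.
networks = {
    "maria": {"a": ("es",), "b": ("en",), "c": ("en", "es"), "d": None},
    "painter": {"e": ("ca",), "f": ("es",), "g": ("en",), "h": ("ca", "es")},
}
print(list(two_language_networks(networks)))  # -> ['maria']
```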
27:19 - 27:21

And then I classified them
27:22 - 27:24

first following a qualitative analysis
27:24 - 27:29

and then I used network statistics
that I developed with my adviser
27:29 - 27:30

for this purpose.
27:31 - 27:34

And I will talk later a little more
about it.
27:34 - 27:38

So, I tried to provide
more robust measures for that.
27:39 - 27:44

I classified them and I came up
with some types.
27:46 - 27:50

This is what I call the gatekeeper
and language bridge type.
27:51 - 27:53

And there's some variants of it,
obviously.
27:54 - 27:56

What you can see here
is the network of a person
27:56 - 28:00

and I'm going to assume this person
is in the United States
28:00 - 28:02

and speaks both Spanish and English.
28:04 - 28:06

Let's call her Maria.
28:06 - 28:12

So she's Maria and she has two groups
of friends using Spanish on Twitter
28:13 - 28:16

and then that big group of friends
using English.
28:17 - 28:20

And, as you can see,
there's just a few nodes
28:20 - 28:22

connecting the two language groups.
28:22 - 28:28

You can see that the social structure
can be different from the language groups
28:29 - 28:32

so you can have maybe a group of friends
and a group of coworkers
28:32 - 28:36

inside the same language group,
so it can be more complex
28:36 - 28:41

than just dividing the social network
by language groups.
28:41 - 28:46

There can be more grouping
because of other social factors.
28:47 - 28:51

But the interesting thing is that
there are only a few nodes
28:51 - 28:53

where people are connecting,
holding the two groups together.
28:55 - 29:01

I think this was friends
with English here.
29:01 - 29:05

You can see, in this case, it seems
like the two groups
29:05 - 29:08

are holding closely together
29:09 - 29:14

because there are much more links
holding the two groups together.
29:15 - 29:18

Of course, this is going to depend
on the size of the networks
29:18 - 29:23

so I had to account for the size
when coming up with measures
29:23 - 29:26

of network connections:
29:26 - 29:28

I had to provide ratios.
29:28 - 29:32

Now, the ratio of cross-language linking
here and here
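A size-independent measure of this kind can be sketched as the share of links that cross a language border. The definition used here is an assumption, not the statistic developed in the study: a link counts as cross-language when its two endpoints share no language, so a bilingual node linked to either group does not count, and nodes without language data are skipped.

```python
from typing import Optional


def cross_language_ratio(
    edges: list[tuple[str, str]],
    node_languages: dict[str, Optional[tuple[str, ...]]],
) -> float:
    """Share of links (among those with data at both ends) whose endpoints share no language."""
    considered = cross = 0
    for u, v in edges:
        lu, lv = node_languages.get(u), node_languages.get(v)
        if not lu or not lv:
            continue  # private or closed accounts: no language data
        considered += 1
        if not set(lu) & set(lv):
            cross += 1
    return cross / considered if considered else 0.0


# Hypothetical network: "e" is bilingual, "f" has no data.
langs = {"a": ("es",), "b": ("es",), "c": ("en",), "d": ("en",), "e": ("en", "es"), "f": None}
edges = [("a", "b"), ("c", "d"), ("a", "c"), ("e", "a"), ("e", "c"), ("f", "a")]
print(cross_language_ratio(edges, langs))  # -> 0.2
```

Because the result is a ratio rather than a raw count, networks with 30 nodes and networks with 5,000 nodes can be compared on the same scale.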
29:32 - 29:34

and you have these types--
29:36 - 29:40

These types are not just clear-cut.
29:40 - 29:42

There's an evolution.
29:42 - 29:43

There are people that have
very few connections
29:43 - 29:45

between the language groups
29:45 - 29:47

and then progressively there are people
with more and more.
29:48 - 29:49

And this increases.
29:49 - 29:52

Which points to the fact
that my cells are static.
29:53 - 29:57

Which means I don't see the evolution
over time, ok?
29:58 - 30:00

This is a limitation of my research.
30:00 - 30:05

I just see what the social network
of this person looked like
30:05 - 30:07

at a particular point in time.
30:08 - 30:10

I don't know how it evolves over time.
30:10 - 30:13

So, for myself, it's just there.
30:14 - 30:19

It would be interesting
to see these different patterns
30:19 - 30:21

that I have been observing.
30:21 - 30:27

Maybe over time these connections
between languages may be increasing.
30:29 - 30:32

Now we have the integration
and union type
30:33 - 30:37

where in this case you have a person
from an Arab country
30:37 - 30:41

and green represents the friends
that are using Arabic
30:41 - 30:45

and the friends using English are in pink,
but there's also violet
30:45 - 30:47

there are bilinguals.
30:47 - 30:52

That means there's a group
of English users
30:52 - 30:57

and bilingual English-Arabic users
embedded inside the Arabic group.
31:00 - 31:01

That's the integration,
so they're integrated.
31:02 - 31:08

And then I have a Greek guy,
who uses Greek and English
31:08 - 31:09

and his friends.
31:09 - 31:12

And in this case, you can see
it's sort of light blue
31:12 - 31:17

representing Greek, so the friends
that tweet in Greek
31:17 - 31:21

Pink again represents people tweeting
in English
31:21 - 31:23

and there's a lot of bilinguals.
31:23 - 31:27

So these kind of dark blues
represent the bilinguals.
31:27 - 31:29

And these are two groups
31:29 - 31:33

that, as you've seen before
in the gatekeeper and the language bridge,
31:33 - 31:35

are progressively getting closer and closer
31:35 - 31:41

with more and more links
across languages.
31:41 - 31:43

In this case, this is like the extreme.
31:43 - 31:46

The links between the two languages
are so dense
31:46 - 31:51

that you can hardly distinguish
where the border is
31:51 - 31:53

between the two language groups.
31:53 - 31:59

And, interestingly, the border might
only be noticeable
31:59 - 32:01

because there are a lot of bilinguals
around it.
32:02 - 32:05

And this is the union type
where they unite.
32:07 - 32:10

And finally, the peripheral language type.
32:10 - 32:14

This is a Brazilian guy,
the network of a Brazilian guy
32:15 - 32:17

where you have--
32:17 - 32:19

probably he lives in the United States
or something like that--
32:19 - 32:23

because this guy has mostly
all this big group of friends
32:23 - 32:25

tweeting in English.
32:27 - 32:32

And then there's the side tentacle
running outside, using Portuguese.
32:35 - 32:36

And this is like a periphery landscape.
32:36 - 32:39

So, in the periphery there's a small group
of Portuguese-language users.
32:40 - 32:45

Now, I forgot to mention that there are dots
that are light yellow or white.
32:45 - 32:48

Those are the ones that have no data.
32:49 - 32:51

So, I don't know
the language they're using
32:51 - 32:53

because either their accounts are closed
32:53 - 32:58

or for some reason, in between the collection
of data they closed the account.
32:59 - 33:03

Mostly, the reason
is that they're private accounts
33:04 - 33:06

where you cannot get the data from.
33:06 - 33:09

I think somewhere I read
it was about 5 percent.
33:09 - 33:10

I'm not sure.
33:10 - 33:14

But for one reason or another,
I don't have that information.
33:17 - 33:21

Now, why am I classifying them?
These networks?
33:23 - 33:26

Well, the reason is that--
33:26 - 33:29

well, there are some studies
that demonstrate that the social structure
33:29 - 33:34

the structure of the social networks
influences the spread of information.
33:34 - 33:36

How information disseminates
in the network.
33:39 - 33:43

So, I'm just assuming
that these different structures
33:43 - 33:46

are going to influence the spread
of information.
33:47 - 33:50

But this is a study that has to be done.
33:50 - 33:53

I cannot demonstrate that one
of these types
33:53 - 33:56

facilitates the spread of information.
33:56 - 34:02

I can only say that I am assuming,
so that potential study
34:04 - 34:09

could just look at, for example,
whether gatekeeper and language bridge types
34:11 - 34:16

are not as good for spreading information
as union and integration types.
34:20 - 34:25

Right, we can just assume
because of the cross-language links
34:28 - 34:33

so, how many links there are
or the ratio of cross-language links
34:33 - 34:38

may potentially facilitate information
diffusion in these cases.
34:40 - 34:43

So, that study needs to be done.
34:43 - 34:45

I cannot say what's going to happen!
34:45 - 34:47

I just assume it's going to be like that.
34:49 - 34:52

So that is the reason why I classify them.
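As an illustration of the cross-language linking mentioned above, here is a minimal sketch of one way to measure it: the share of links that join users with different primary languages. It assumes each user has already been given a single language label and that the network is a list of follow edges. The function name and data layout are hypothetical and are not the study's code.

```python
from typing import Dict, Iterable, Tuple

def cross_language_ratio(
    edges: Iterable[Tuple[str, str]],
    language: Dict[str, str],
) -> float:
    """Share of edges connecting users whose primary languages differ.

    Hypothetical helper. Edges with an unlabeled endpoint (e.g. private or
    closed accounts, the "no data" nodes) are skipped.
    """
    total = 0
    cross = 0
    for a, b in edges:
        la, lb = language.get(a), language.get(b)
        if la is None or lb is None:
            continue  # no language data for one endpoint
        total += 1
        if la != lb:
            cross += 1
    return cross / total if total else 0.0
```

For example, with `language = {"ana": "es", "ben": "en", "carla": "es", "dan": "en"}` and edges `("ana", "ben"), ("ana", "carla"), ("ben", "dan"), ("carla", "dan")`, two of the four labeled links cross languages, so the ratio is 0.5.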
34:52 - 34:55

I have some network statistics.
34:56 - 35:01

We got about 80 percent accuracy
in our guesses, which is quite good,
35:01 - 35:02

but the sample is small.
35:08 - 35:11

So now, do you have any more questions
before I move on to the next study?
35:14 - 35:15

(man) I was curious as to how many--
35:15 - 35:19

what was the selection process like
to find the 92 users?
35:20 - 35:23

Well, this is what I spent
the beginning on:
35:23 - 35:27

just using two stopwords
from two different languages
35:27 - 35:31

typing them into the Google search box
and searching Twitter
35:31 - 35:33

and then once--
35:33 - 35:36

Basically you just go through
the list of results
35:36 - 35:42

and start opening the profiles,
counting the tweets.
35:42 - 35:45

How many in this language,
how many in the other.
35:45 - 35:47

And we set a threshold of 10 percent:
35:47 - 35:53

they had to have written 10 percent
of the tweets in a second language
35:53 - 35:57

and you couldn't count retweets
or automatic posts.
35:58 - 36:00

We also had to manually discard
spammers.
36:02 - 36:04

So, that was the process.
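The selection rule just described can be written as a small check. This sketch reads the 10 percent threshold as "at least 10 percent of the user's own tweets are in a second language", excludes retweets and automatic posts as described, and treats the most frequent language as the primary one. The `Tweet` fields are hypothetical. In the study, this counting was done by hand.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

@dataclass
class Tweet:
    lang: str                   # language label (assigned by hand or by a detector)
    is_retweet: bool = False
    is_automatic: bool = False  # hypothetical flag for app-generated posts

def is_multilingual(tweets: List[Tweet], threshold: float = 0.10) -> bool:
    """True if the user's second most used language reaches `threshold`
    of their own tweets (retweets and automatic posts excluded)."""
    own = [t for t in tweets if not t.is_retweet and not t.is_automatic]
    if not own:
        return False
    counts = Counter(t.lang for t in own)
    if len(counts) < 2:
        return False  # only one language in the user's own tweets
    second_count = counts.most_common(2)[1][1]
    return second_count / len(own) >= threshold
```

With nine English tweets and one original Spanish tweet, the user qualifies (1/10 meets the 10 percent mark). If the Spanish tweet is a retweet, the user does not qualify.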
36:06 - 36:10

(woman) And that's a paid search
through Google?
36:10 - 36:13

No, we did that manually
36:13 - 36:14

and then once--
36:14 - 36:20

So the other thing you can do is
use these core multilingual users
36:21 - 36:24

and then do what I did to study behavior
in their social networks
36:24 - 36:29

which is once you extract the friends
and extract the messages of the friends
36:31 - 36:34

and automatically find the language
36:34 - 36:37

then you can say "Oh, this person
is multilingual" automatically.
36:37 - 36:41

You just process it and you can detect
a lot more multilingual people
36:41 - 36:43

through that process.
36:43 - 36:46

The paid part of the process was sending these posts
36:46 - 36:49

to the Google language
identification tool.
36:50 - 36:55

So, what I did was clean each message
automatically.
36:56 - 37:00

Basically, eliminating the hashtags
37:01 - 37:05

and the mentions
that had an @ in front,
37:05 - 37:10

symbols, URLs: all those things
I would eliminate automatically
37:10 - 37:14

and then with the rest of the message,
I'd send that to the Google API
37:14 - 37:16

for language identification
37:16 - 37:22

and the Google API would give me
a language label and a confidence value.
37:22 - 37:23

And that was for each message.
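The cleaning step can be sketched with regular expressions. The patterns below are assumptions about what counts as a hashtag, a mention, a URL or a symbol, since the exact rules are not given. The call to the language-identification service is left out, because its interface is not described here. Only the cleaned text would be sent on.

```python
import re

# Hypothetical patterns approximating the cleaning described in the talk.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SYMBOL_RE = re.compile(r"[^\w\s]")  # punctuation, emoji and other symbols
SPACE_RE = re.compile(r"\s+")

def clean_tweet(text: str) -> str:
    """Remove URLs, @mentions, #hashtags and symbols, keeping running text."""
    text = URL_RE.sub(" ", text)      # URLs first, so no fragments survive
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)   # \w is Unicode-aware, so accented letters stay
    return SPACE_RE.sub(" ", text).strip()
```

For instance, `clean_tweet("@maria ¡Qué bonito día! #Madrid http://t.co/abc")` leaves only `"Qué bonito día"` for language identification.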
37:23 - 37:26

And then I built the algorithm
with the help of Jen Golbeck
37:26 - 37:31

to decide: well, I have 30 messages,
so many in English,
37:31 - 37:35

so many in Spanish, and then one in Swahili,
which is unlikely
37:37 - 37:40

and you had to decide
the confidence value--
37:40 - 37:43

So I used rules, defined rules
37:43 - 37:46

but it could be done
statistically I think.
37:46 - 37:48

You could write some statistical method
to decide
37:48 - 37:52

"well this person actually is bilingual"
or whatever.
37:53 - 37:54

That's the process.
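The kind of rule-based decision described above can be sketched as follows. This is not the algorithm built with Jen Golbeck, whose rules are not given in the talk. The confidence cutoff and minimum share are illustrative assumptions. The idea is to keep only confidently detected messages and to ignore any language, such as a stray Swahili label, that accounts for too small a share of a user's messages to be credible.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def user_languages(
    detections: Iterable[Tuple[str, float]],
    min_confidence: float = 0.5,  # assumed cutoff for a usable detection
    min_share: float = 0.10,      # assumed minimum share for a "real" language
) -> List[str]:
    """Languages a user plausibly writes in, most frequent first.

    `detections` holds one (language_label, confidence) pair per message,
    as returned by some language-identification service.
    """
    confident = [lang for lang, conf in detections if conf >= min_confidence]
    if not confident:
        return []
    counts = Counter(confident)
    total = len(confident)
    return [lang for lang, n in counts.most_common() if n / total >= min_share]

def is_bilingual(detections: Iterable[Tuple[str, float]]) -> bool:
    return len(user_languages(detections)) >= 2
```

With 15 confident English detections, 10 Spanish and a single Swahili one, the single Swahili message falls below the share cutoff, so the user is reported as writing in English and Spanish. A statistical model could replace these fixed thresholds.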
37:54 - 37:56

It's long!
37:56 - 37:57

Yes.
37:58 - 38:00

(woman) Hi, I understand
that you did it manually
38:00 - 38:05

but currently, in the existing research field,
is there any software
38:05 - 38:08

that we can use to capture,
38:08 - 38:12

to have access to all
these different tweets?
38:12 - 38:15

And to capture the different categories?
[inaudible]
38:15 - 38:18

Ok, so you mean the extraction?
38:19 - 38:20

(woman) Yeah.
38:20 - 38:21

No, I didn't do it manually.
38:21 - 38:23

(woman) And the other,
I think the other part
38:23 - 38:26

of your data presentation
is visualizations coming out
38:26 - 38:27

like this graph.
38:27 - 38:33

Can you tell us what kind of resources
we have for social scientists
38:33 - 38:35

to present the data in a visual form?
38:35 - 38:37

This is a tool I would recommend.
38:37 - 38:39

[inaudible]
38:39 - 38:41

So, the first question.
38:43 - 38:46

All the extraction from Twitter,
it was automatic.
38:46 - 38:49

I didn't copy the tweets,
it was automatic.
38:49 - 38:51

I used the Twitter API.
38:51 - 38:55

They have a process
for registered developers
38:55 - 38:57

and I extracted it automatically.
39:02 - 39:06

Now, the tools, and I forgot
to put that in this slide
39:06 - 39:09

but in the beginning,
when I showed you the first visualization
39:09 - 39:12

I put the name of the tool in--
39:13 - 39:18

I don't know if I translate well,
but I think it's G-E--
39:18 - 39:24

You can see here, G-E-P-H-I,
I don't know how to pronounce it!
39:24 - 39:27

["Jefy" I think...]
39:28 - 39:32

So, this is the one I've used
for the visualizations
39:34 - 39:37

and it's good because you can use it
on any platform.
39:37 - 39:42

So on a Mac, a PC or Linux.
39:45 - 39:47

Now, it has limitations for...
39:47 - 39:51

mostly for network statistics
in my opinion.
39:54 - 39:57

The other one, that is very popular
is Node XL.
39:57 - 40:01

And in fact it was developed
here in the HCI lab.
40:02 - 40:04

In the lab where I work.
40:05 - 40:07

So, they collaborated with Microsoft.
40:07 - 40:10

It's a template for Excel
40:11 - 40:13

and it allows--
40:13 - 40:18

In fact they are still adding new features
and there are two people working on it
40:18 - 40:20

in the lab.
40:20 - 40:24

But the reason I haven't used it here,
is because I have a Mac
40:24 - 40:29

and there's also another reason:
I like this positioning algorithm
40:31 - 40:33

and this is...
40:33 - 40:37

this is another issue
I haven't talked about
40:37 - 40:40

is how you actually place the dots.
40:40 - 40:47

And actually these algorithms for layout
use force-directed schemes
40:49 - 40:51

like in physics.
40:51 - 40:54

So if a node has a lot of links
with another node
40:54 - 40:57

they put it closer,
so it's like there are forces
40:57 - 41:00

or strings attaching the nodes.
41:01 - 41:04

And depending on how many strings
there are, they're closer or farther.
41:05 - 41:08

There are physics rules
for placing them.
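To make the spring analogy concrete, here is a minimal force-directed layout in the style of Fruchterman and Reingold. Every pair of nodes repels, linked nodes attract along their "strings", and nodes move in steps that shrink over a fixed number of iterations. This is a simplified teaching sketch. It is not the specific algorithm used by Gephi or NodeXL, and the parameters (starting step of 0.1, cooling factor of 0.98) are arbitrary choices.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Dict[str, Tuple[float, float]]

def spring_layout(nodes: List[str], edges: List[Tuple[str, str]],
                  iterations: int = 200, seed: int = 0) -> Pos:
    """Fruchterman-Reingold-style layout starting from a random unit square."""
    rng = random.Random(seed)
    pos = {n: (rng.random(), rng.random()) for n in nodes}
    k = math.sqrt(1.0 / max(len(nodes), 1))  # ideal distance between linked nodes
    temp = 0.1                                # maximum step per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, strength k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along each edge ("string"), strength d^2 / k.
        # Listing an edge twice doubles its pull, so more links mean closer nodes.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node along its net force, capped at the current temperature.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temp)
            pos[n] = (pos[n][0] + dx / length * step,
                      pos[n][1] + dy / length * step)
        temp *= 0.98  # cool down so the layout settles
    return pos
```

On a chain a-b-c-d, the layout settles into a stretched line or arc, so linked neighbours end up closer to each other than the two ends of the chain are.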
41:08 - 41:10

But there's different algorithms
41:10 - 41:15

but the other reason I chose Gephi
is that it has an algorithm
41:15 - 41:21

specifically in this tool
that places my language groups separately
41:21 - 41:24

more than any other algorithm
that I could use in Node XL.
41:24 - 41:29

And it was more useful
to see the groups separated.
41:30 - 41:33

But you can use both
depending on what you want to do.
41:33 - 41:36

They both have weaknesses and strengths,
41:36 - 41:39

different depending
on what you have to do.
41:41 - 41:47

Node XL has more features
for processing many networks
41:48 - 41:51

and extracting network statistics
for many networks at the same time.
41:52 - 41:57

And it has a lot of interesting features,
maybe this is more manual.
41:59 - 42:00

I don't know.
42:00 - 42:05

Somebody called it
"the Photoshop of visualization".
42:09 - 42:14

So I'm going to briefly comment
on the factor analysis.
42:14 - 42:19

The point here, what I want to see,
is whether multilingual users of Twitter
42:21 - 42:24

are aware of their audience in a way.
42:25 - 42:29

And they somehow perceive
how many followers
42:29 - 42:32

of this language or the other they have.
42:33 - 42:36

Maybe not very consciously,
42:38 - 42:40

but they perceive something.
42:40 - 42:42

So, I wanted to see how the social network,
42:42 - 42:47

the fact that there are many languages
or just one in the social network,
42:48 - 42:53

can affect the choice of language of this person,
the ego.
42:55 - 42:58

So, I actually did a lot of testing,
different variables,
42:58 - 43:01

but I'm just going to focus
on the essence,
43:01 - 43:06

which is I have my dependent variable
which is the proportion of English
43:06 - 43:11

used by the ego: if the ego has 50 posts,
maybe 60 percent of them are in English
43:12 - 43:14

and 40 percent in Spanish,
I don't know.
43:15 - 43:19

And then there's the factor
of how many users in the network
43:19 - 43:21

are using English
and how many are using other languages.
43:22 - 43:24

And then the multilingual index
of the network
43:24 - 43:26

- and this is my favorite part -
43:26 - 43:30

because it's basically saying
43:30 - 43:36

"is multilingualism encouraging English
as a lingua franca?"
43:37 - 43:42

especially on Twitter, where we have these
public posts that anybody can read.
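The dependent variable and the main network factor described above could be computed for each ego roughly as follows. The data layout is hypothetical, and English is assumed to carry the label `en`. The multilingual index and the statistical model fitted to these variables are not shown here.

```python
from typing import Dict, List

def english_share(labels: List[str]) -> float:
    """Fraction of language labels that are English ('en')."""
    return labels.count("en") / len(labels) if labels else 0.0

def ego_variables(ego_posts: List[str],
                  network_language: Dict[str, str]) -> Dict[str, float]:
    """Per-ego variables (hypothetical layout).

    ego_posts: the language label of each of the ego's posts.
    network_language: the primary language of each user in the ego's network.
    """
    return {
        # dependent variable: proportion of the ego's posts in English
        "ego_english": english_share(ego_posts),
        # factor: proportion of network members using English
        "network_english": english_share(list(network_language.values())),
    }
```

Using the example from the talk, an ego with 50 posts, 30 in English and 20 in Spanish, gets `ego_english = 0.6`.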
43:43 - 43:47

So anyway... I'm not going to go
into the technical details
43:48 - 43:51

of bi-nodal statistical interpretation.
43:51 - 43:55

What I wanted to know was:
in these combined effects
43:56 - 44:00

of the factors,
which one was more important?
44:01 - 44:03

Which one weighed more than the others?
44:03 - 44:07

Which had more weight in defining this
proportion of [inaudible] used by the ego?
44:09 - 44:11

I tried other factors,
44:11 - 44:14

I also looked at the use
of non-English language
44:15 - 44:18

In the end... there are certain,
44:20 - 44:21

I mean, they're obvious somehow.
44:21 - 44:24

I think the process,
what I've learned,
44:24 - 44:26

is more interesting than the results themselves.
44:27 - 44:30

Because basically what I've learned
is that, yeah,
44:31 - 44:33

the use of English by the ego
44:33 - 44:36

is encouraged by the use
of English in the network
44:36 - 44:41

and in a certain way it's so important
that any other factor
44:41 - 44:44

is really not that important.
44:45 - 44:49

And even the second most important,
the multilingual index
44:50 - 44:55

was so light compared with
the heavy impact of English
44:56 - 44:57

used in the network.
44:58 - 45:00

But what I thought was really interesting
45:00 - 45:03

was how do you define
the multilinguality of a network?
45:04 - 45:07

And with this I got help
from Jordan Boyd-Graber
45:07 - 45:09

who is also in the iSchool
45:09 - 45:14

and in the Computational Linguistics
and Information Processing lab
45:14 - 45:15

here in Maryland.
45:15 - 45:18

He helped me
with all these technical aspects.
45:18 - 45:21

And he was the one suggesting
"Well, why don't you look--"
45:21 - 45:25

"instead of just looking at the number
of languages in the network...
45:25 - 45:29

"because sometimes you get
wrongly detected languages...
45:29 - 45:30

like Swahili. Well, no one was really
speaking Swahili in this network.
45:33 - 45:37

There were technical challenges,
like I explained to you.
45:38 - 45:42

So maybe there's a high number
of languages in the network
45:42 - 45:44

but the network is mostly monolingual.
45:44 - 45:49

Mostly everybody uses English
and just a few people maybe use others
45:50 - 45:52

or maybe just it got wrongly detected.
45:52 - 45:55

And maybe you're just saying
45:55 - 45:57

"Oh yeah, there's ten languages
in the network!"
45:57 - 46:00

and actually it's not
a very multilingual network at all.
46:00 - 46:03

So, we came up with this, the entropy.
46:03 - 46:06

And this is a physics concept
that measures the disorder
46:06 - 46:08

in a system.
46:08 - 46:11

And in this case, the entropy
would be my multilingual index
46:11 - 46:17

and what it's doing is providing a value
between 0 and 1
46:17 - 46:23

So, at 0 it's a very homogeneous system:
everyone speaks the same language
46:24 - 46:27

and if it's closer to 1,
it's really heterogeneous,
46:27 - 46:29

and it gives weight
46:29 - 46:32

to how many people
are using each language.
46:32 - 46:36

So, this is the equation,
just to show you it.
46:38 - 46:41

And it takes into account the number
of languages in the network
46:41 - 46:45

and then one of the variables
is how many nodes in that language
46:45 - 46:48

there are, divided by the total number,
46:48 - 46:51

and this is what gives the proportion
for example.
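The equation itself is only shown on the slide. One standard way to get an index with exactly these properties is Shannon entropy normalized by the logarithm of the number of languages: H = -sum(p_i * log p_i) / log N. Here p_i is the number of nodes using language i divided by the total number of nodes, and N is the number of languages present. The sketch below assumes that form. It should be read as an illustration, not as a reproduction of the slide.

```python
import math
from collections import Counter
from typing import Iterable

def multilingual_index(node_languages: Iterable[str]) -> float:
    """Normalized Shannon entropy of a network's language distribution.

    p_i = nodes using language i / total nodes; N = number of languages.
    0 means everyone uses one language; 1 means all N languages are used
    by equally many nodes.
    """
    counts = Counter(node_languages)
    total = sum(counts.values())
    n_langs = len(counts)
    if n_langs < 2:
        return 0.0  # one language (or no data): perfectly homogeneous
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_langs)
```

This captures the point about wrongly detected languages. A network where 98 nodes use English and one node each is labeled Spanish or Swahili still scores only about 0.1, despite having three languages. A network split evenly across three languages scores 1.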
46:53 - 46:57

So just to let you know
that there are interesting lessons
46:57 - 46:58

from this study.
46:58 - 47:00

Even if the results are not exciting!
47:01 - 47:03

And this is what I'm doing right now.
47:05 - 47:08

So, the intrinsic characteristics
of the message:
47:08 - 47:11

how that influences the language choice.
47:11 - 47:16

First, I'm wondering,
because I just saw it in the content
47:19 - 47:22

are replies encouraging people
to use their native language?
47:23 - 47:27

And public posts encouraging people
to use English as a lingua franca?
47:28 - 47:30

This is one that showed up the same.
47:30 - 47:34

And I changed the handle,
for privacy reasons...
47:35 - 47:38

So this is the reply to somebody
and it's in Arabic.
47:38 - 47:41

And this is a public posting
and it's in English.
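One simple way to test whether replies favor native languages and public posts favor English is to split a user's posts by type and compare the English shares. In this sketch, a reply is approximated as a post whose text starts with "@". That rule is an assumption, not necessarily how replies were identified in the study. The language labels are assumed to come from an earlier detection step.

```python
from typing import List, Tuple

def english_share_by_type(posts: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (English share among replies, English share among public posts).

    posts: (text, language_label) pairs. A post counts as a reply if its
    text starts with '@' (a simplifying assumption).
    """
    replies = [lang for text, lang in posts if text.lstrip().startswith("@")]
    public = [lang for text, lang in posts if not text.lstrip().startswith("@")]

    def share(labels: List[str]) -> float:
        return labels.count("en") / len(labels) if labels else 0.0

    return share(replies), share(public)
```

For a user whose replies are half in Arabic and half in English but whose public posts are all in English, this returns `(0.5, 1.0)`. Comparing these pairs across many users is one way to approach the question.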
47:42 - 47:46

Now, the thing I'm looking at
is topic analysis
47:46 - 47:50

and I'm considering with Jordan
to do some automatic topic analysis
47:51 - 47:54

because there are many languages,
and I cannot decode them all
47:55 - 47:57

in many of them.
47:57 - 47:58

Only in three, maybe four...
48:00 - 48:01

So, I'm wondering,
48:01 - 48:04

are technology topics favoring
the use of English?
48:05 - 48:10

And other topics,
international news maybe?
48:11 - 48:16

Whereas other topics
like national news or songs
48:16 - 48:19

they might be encouraging the use
of native languages.
48:21 - 48:23

And then I'm looking at
whether there are translations
48:23 - 48:27

or cross-cultural words
that you can detect.
48:27 - 48:29

For instance, this person
is writing in English
48:29 - 48:33

but is recommending a visit to a museum
in the city of Lille in France.
48:34 - 48:39

So this person knows the city in France,
knows that to visit the museum
48:39 - 48:41

you go there.
48:41 - 48:43

And this is what I call
cross-cultural words.
48:44 - 48:49

[What I kind of found] is that surprisingly
there aren't many translation behaviors
48:49 - 48:53

going on, despite these people
being multilingual.
48:53 - 48:56

And this is what is going to trigger
some reflections.
49:00 - 49:02

How am I doing on time?
49:04 - 49:06

(woman) 1:22.
49:06 - 49:10

(man) Umm, it's usually an hour long...
49:10 - 49:14

So, I will go on with my reflections
49:14 - 49:18

to encourage some thoughts.
49:18 - 49:22

So the greatest connecting power
is the will of users who want
49:22 - 49:23

to be connected.
49:23 - 49:28

This is a really nice quality,
because the communities of interest
49:28 - 49:32

in social media, in Twitter,
are what is bringing people
49:32 - 49:34

from different countries together.
49:35 - 49:41

And also experiences,
like the Voluntweeters,
49:42 - 49:46

so after the earthquake in Haiti,
there were these spontaneous
49:46 - 49:49

self-organizations of Twitter users
for translating tweets
49:50 - 49:54

and they called themselves Voluntweeters,
there's a paper about that--
49:54 - 49:59

So this is the triggering
of social connections
50:01 - 50:04

across countries, across borders
and across languages.
50:07 - 50:10

But even when the social structure
could potentially facilitate
50:10 - 50:13

information diffusion
and cross-language linking
50:15 - 50:17

this condition is not sufficient.
50:17 - 50:20

There are other factors
like the design of the interfaces
50:20 - 50:22

and the design of systems
that can influence...
50:23 - 50:27

can promote, or not, translation behaviors
and cross-cultural awareness.
50:28 - 50:32

Take Wikipedia's
cross-language linking:
50:32 - 50:35

you have links for many languages
for every article.
50:37 - 50:41

We should also acknowledge the dynamic
language preferences of multilingual users
50:42 - 50:44

so they could address their messages
to the appropriate audience.
50:44 - 50:47

I like the solution of Google+
with their circles
50:48 - 50:52

where I can put my friends and family
in Spain in a circle
50:52 - 50:55

and write them in Spanish.
50:55 - 51:01

And then the recommendation of people
based on language profile
51:01 - 51:04

would be useful for this spontaneous
self-organization.
51:06 - 51:08

So, these are some of the things.
51:08 - 51:10

The impact of mediation.
51:11 - 51:13

Global Voices is
an international community of bloggers
51:13 - 51:18

that connect bloggers and citizens
from around the world
51:19 - 51:21

in different languages.
51:21 - 51:23

And Scott Hale
51:23 - 51:27

a student from Oxford University
led a very interesting study
51:27 - 51:34

after the earthquake in Haiti about blogs
in Spanish, Japanese and English
51:36 - 51:39

and he looked
at the cross-language linking
51:39 - 51:41

and focusing on this topic
over time.
51:41 - 51:45

And he discovered that 50 percent
of the cross-language linking
51:45 - 51:48

was happening through this platform,
Global Voices.
51:49 - 51:52

So, it had a very big impact
on the cross-language links.
51:54 - 51:58

And finally, social media,
big media outlets,
51:58 - 52:02

people are interconnected
in these complex networks
52:05 - 52:09

and underlying is this language ecosystem.
52:09 - 52:13

So we have the language ecosystem,
and on top of that
52:13 - 52:15

we have the social media ecosystem.
52:15 - 52:20

People would share a video from YouTube
on Twitter, or news on Facebook.
52:21 - 52:26

What would happen if we integrated
in this ecosystem
52:27 - 52:31

these platforms, like Global Voices,
like Universal Subtitles
52:31 - 52:34

which is a platform
for crowdsourcing subtitling of videos
52:34 - 52:37

and translation of subtitles
for videos.
52:38 - 52:42

If you integrate that and this
starts connecting, starts building paths
52:42 - 52:46

between languages
that didn't exist before.
52:46 - 52:51

So I think we should make it easy
for multilingual people to translate
52:51 - 52:55

and subtitle all the content they like,
their favorite content
52:56 - 53:00

and share it with the appropriate audience
so they can start connecting
53:00 - 53:03

the language islands of the internet.
53:03 - 53:06

And that way stories will travel
all over the world.
53:09 - 53:12

Particularly I would like to thank
Jen Golbeck, my adviser
53:12 - 53:14

and Fulbright for supporting
this research.
53:14 - 53:19

And then I open the space
for questions and your ideas
53:19 - 53:22

if this has triggered some thoughts.
53:24 - 53:26

(woman) I have a question
about how this relates
53:26 - 53:28

to your Yahoo award.
53:29 - 53:35

Well, they have the Internet Experiences
lab in California.
53:35 - 53:36

And they--
53:36 - 53:40

So, we tend to think
maybe it's a super tiny place
53:40 - 53:43

but actually there are fields
53:43 - 53:45

and I applied for the social systems.
53:45 - 53:49

The social systems are a category.
53:49 - 53:55

And I think that was embedded
in the Internet Experience lab
53:57 - 53:58

and yeah, they liked it.
53:59 - 54:02

(man) But is it this
work that they are interested in?
54:02 - 54:03

Yes.
54:03 - 54:04

- The languages?
- Yes.
54:04 - 54:08

Well, now I have results,
because I wrote up reports
54:09 - 54:12

about what my work was about.
54:17 - 54:18

Great.
54:22 - 54:23

Yes?
54:23 - 54:26

(woman) I was thinking about
whether you analyzed the place...
54:26 - 54:31

like if there's any relationship
between tweeters and tweets
54:31 - 54:34

and the place where the people are.
54:36 - 54:40

I mean, because it's not the same
being a Brazilian in Brazil
54:40 - 54:43

and tweeting in Portuguese
or being Brazilian in the US
54:43 - 54:45

and tweeting in Portuguese--
54:46 - 54:49

There's many, many factors
that I haven't looked at.
54:50 - 54:52

It's not part of your study?
54:52 - 54:54

No, because I had to scope it somehow.
54:54 - 54:56

There's so many factors.
54:57 - 55:00

Geography was one that I was originally
intending to look at
55:00 - 55:04

but I found there were so many problems
to actually get the right geography
55:04 - 55:07

the right geolocation.
55:08 - 55:12

The problem is that I didn't originally
collect the geolocation.
55:12 - 55:16

I think only a small percentage
of messages have...
55:16 - 55:18

geolocated information.
55:19 - 55:21

I'm not sure about the percentage there.
55:21 - 55:25

So there's only a small percentage
of messages that have geolocation.
55:25 - 55:28

There are issues with the accuracy...
55:28 - 55:31

What I have collected is the information
in their profile
55:32 - 55:35

they can put the information
about the place,
55:35 - 55:40

but sometimes it's more
or less trustworthy,
55:40 - 55:43

sometimes there's nothing,
and sometimes there's just crazy stuff.
55:43 - 55:45

(audience laughs)
55:47 - 55:50

So, something absolutely has to be there.
55:50 - 55:55

If I wanted to expand this,
geography would be a nice place to go!
55:55 - 55:57

(woman) Ok.
56:00 - 56:01

Yes?
56:01 - 56:02

(man) Could you say a little bit more
56:02 - 56:05

about the visualization
choices I think you mentioned?
56:05 - 56:06

Oh yes, well...
56:08 - 56:11

I tried this tool, the Node XL,
56:11 - 56:13

I used both Node XL and Gephi.
56:14 - 56:15

There's more...
56:16 - 56:20

I think there's, I don't remember the name
there's one that was developed
56:20 - 56:22

here in Maryland
56:22 - 56:24

but it's not as user-friendly.
56:26 - 56:30

But I've forgotten the name,
I will have to look it up.
56:30 - 56:34

And there's a lot of tools
that are for really technical people
56:35 - 56:37

that are handling millions of nodes.
56:38 - 56:41

Because with these tools,
for social scientists or humanists
56:41 - 56:42

maybe they are not.
56:42 - 56:49

With some tools, a graph of maybe 300-400 nodes
can still be understandable.
56:51 - 56:56

But if you go beyond that,
actually visualizations get crazy
56:56 - 57:02

and even for more technical tools
for more technical people
57:03 - 57:07

there are hundreds or millions,
they cannot do visualizations
57:08 - 57:12

at some point they just give you
statistical measures.
57:14 - 57:15

I have to leave it out.
57:15 - 57:17

I have a list of tools and that
57:17 - 57:21

but if I need the names,
I need to go through everything.
57:23 - 57:25

(woman) But yours was Mac-accessible?
57:25 - 57:32

Yes, this Gephi tool is Mac-accessible,
you can use it with Windows,
57:32 - 57:34

with Mac and with Linux.
57:36 - 57:38

And I forgot to say,
it's open source.
57:43 - 57:49

(woman) Did you find
studying languages and internet
57:49 - 57:53

was like a place, unexplored?
57:53 - 57:55

Like here in the United States?
57:55 - 58:00

Like when you began studying
or analyzing this
58:00 - 58:04

you felt that a lot of people
are doing this
58:04 - 58:06

or nobody is doing this
58:06 - 58:08

and I'm the first one trying to--
58:08 - 58:13

I'm not the first one,
but it's a very new area
58:13 - 58:15

to be exploring.
58:15 - 58:17

So, it's very exciting
because of that.
58:17 - 58:19

Because there's so many
unanswered questions
58:19 - 58:24

and I find that, surprisingly enough,
the United States is not paying so much attention
58:24 - 58:26

to multilinguality issues
58:26 - 58:31

And I think that language policies
are very monolingual-oriented
58:31 - 58:33

but it's terrible
58:33 - 58:37

because there's a whole lot
of multilinguality in this country.
58:37 - 58:41

There's so many people
speaking different languages
58:43 - 58:45

that I'm so amazed
about that contradiction.
58:46 - 58:49

Because in Europe,
it's an obvious challenge for us
58:49 - 58:52

because we need to understand each other
between all these countries
58:52 - 58:54

of the European Union.
58:54 - 58:58

And there's a lot of money invested
in research that relates to multilinguality
58:59 - 59:01

and communication in languages
59:01 - 59:05

and technology in particular,
cross-language systems
59:05 - 59:09

and in libraries there's a lot of work
going on.
59:09 - 59:14

There's investment in the research.
59:15 - 59:18

So yeah, maybe in terms of investment
59:18 - 59:22

the European Union is
not a bad place to be.
59:22 - 59:24

Better than the United States!
59:24 - 59:27

But at the same time,
what I find interesting
59:27 - 59:33

is that here when I talk about it
people are really interested
59:35 - 59:38

and interested in the subject
and excited about it.
59:38 - 59:41

Maybe in Europe it looks more
like old news.
59:41 - 59:44

Like "yeah, we already know that."
59:44 - 59:46

(audience laughs)
59:46 - 59:50

So I find that it's exciting
to see the audience
59:50 - 59:52

like "Oh yeah!"
It's so new.
59:53 - 59:54

(woman) Yes.
59:59 - 60:03

(woman) As the emerging view
of research in the United States
60:03 - 60:10

can you tell me which institutions
or which area of academic institutions
60:12 - 60:15

actually have more invested
in this topic in the US?
60:16 - 60:19

I'm not sure about the institutions.
60:21 - 60:26

What I know, particularly,
in Indiana there's work
60:27 - 60:29

because Susan Herring
is a researcher there.
60:31 - 60:33

She has inspired my work.
60:33 - 60:36

She published a book
The Multilingual Internet
60:36 - 60:41

and she has done research on blogs,
also communities
60:42 - 60:45

of different languages connecting blogs
in the blogosphere.
60:45 - 60:51

So she has been one of the ones,
one of the first tackling these issues
60:51 - 60:55

and she's still going
and she's doing something.
60:55 - 60:59

So, it's Indiana University,
I think.
61:01 - 61:03

Yeah, Susan Herring.
Look for her!
61:06 - 61:09

And also at the same university
there's Paolillo.
61:10 - 61:13

He's also doing research
in this area
61:13 - 61:19

and he actually published research for UNESCO
on language diversity
61:19 - 61:20

on the internet.
61:22 - 61:23

So Susan Herring and Paolillo,
61:23 - 61:25

they are at the same university.
61:27 - 61:30

Those are my inspiring ones.
61:34 - 61:37

Well, at Harvard, the Berkman Center
for Internet & Society also did
61:37 - 61:39

this mapping of the blogs.
61:39 - 61:41

But they don't focus on languages.
61:42 - 61:45

But there's tangential work
around there.
61:49 - 61:51

(man) One more question?
61:54 - 61:55

Well, thank you very much!
61:55 - 61:56

Thanks!
61:56 - 61:58

(audience applauds)

Video Language:: English
Team:: MITH Captions (Amara)
Project:: BATCH 1
