Irene Eleta: Multilingual Users of Twitter: Social Ties Across Language Borders or How a Story Could Travel the World
-
0:00 - 0:02Good morning, everyone.
-
0:03 - 0:06Thank you for coming here
[unclear] of the semester. -
0:08 - 0:10So, I'm going to start.
-
0:11 - 0:14Access to the internet
is greater than ever before -
0:14 - 0:17and as a consequence,
it's becoming more multilingual. -
0:19 - 0:23However, there's evidence of segmentation
of cyberspace -
0:23 - 0:25due to language and national borders.
-
0:28 - 0:31This image serves to illustrate that.
-
0:32 - 0:36This is the language communities
of Twitter in Europe. -
0:37 - 0:41So, what you can see are tweets
geolocated over a map of Europe -
0:41 - 0:44and the different colors
represent the different languages. -
0:45 - 0:51You can even see regional languages
like Catalan in the Catalan region of Spain -
0:52 - 0:56And this is going to be useful
for an example I'm going to use later. -
1:02 - 1:04I'm interested in Twitter in particular,
-
1:04 - 1:07because of the speed
of information dissemination -
1:07 - 1:11and that most of this information
is publicly accessible. -
1:14 - 1:19I'm going to illustrate this
with a capture -
1:19 - 1:22of a dynamic visualization
you can find on the Twitter blog -
1:22 - 1:25by Miguel Rios.
-
1:25 - 1:29And what you can see here
is the global flow of tweets -
1:29 - 1:31after the earthquake in Japan.
-
1:32 - 1:35In pink, there are the tweets
coming out of Japan -
1:35 - 1:37and, in green, the retweets
all over the world. -
1:39 - 1:45This illustrates that in Twitter
information is spreading across countries. -
1:46 - 1:48But how can this happen?
-
1:49 - 1:55Expatriates, migrants, minorities,
diaspora communities, language learners -
1:55 - 1:59all play an important role
in building transnational networks -
1:59 - 2:03and cultural bridges
between nations and communities. -
2:04 - 2:06They are the multilingual users
on the internet. -
2:08 - 2:11The overarching research question is:
-
2:11 - 2:17how are multilingual users of Twitter
connecting different language groups? -
2:22 - 2:27In 2009, the Berkman Center for Internet
and Society at Harvard University -
2:27 - 2:30mapped the Arabic blogosphere
-
2:30 - 2:33and they described a key concept
for my research. -
2:35 - 2:41They discovered an English bridge
and a French bridge of bloggers -
2:41 - 2:46that were writing in their native
Arabic language and in English or French. -
2:47 - 2:52And they were connecting the different
national blogospheres -
2:52 - 2:53with the international one.
-
2:55 - 3:00This might have played a role in the Arab
popular uprisings in 2011 -
3:00 - 3:03for reaching out to the world.
-
3:05 - 3:09And this is connected with a concept
that first appeared in 2008 -
3:09 - 3:12of the bridge bloggers.
-
3:14 - 3:16So, bridge bloggers are bloggers
-
3:16 - 3:20that are trying to connect
their local communities -
3:20 - 3:23to a wider global audience.
-
3:25 - 3:29The image you can see here
is actually the visualization they created -
3:29 - 3:33of mapping the Arabic blogosphere.
-
3:35 - 3:37Each dot is a blogger, or a blog.
-
3:39 - 3:43The size represents their popularity,
so how many incoming links they have -
3:43 - 3:45and they grouped them--
-
3:45 - 3:48the neighborhoods they created
-
3:48 - 3:52in relation to the linking
between them. -
3:53 - 3:56So, the ones that are grouped together
are linking among each other. -
3:57 - 3:59The colors are a different question.
-
3:59 - 4:06The colors represent "attentive clusters",
that's how they call it. -
4:06 - 4:12And they looked at the online resources
and media outlets -
4:12 - 4:14these blogs were linking to.
-
4:15 - 4:19So, blogs of the same colors
are following the same media outlets -
4:19 - 4:21and online resources.
-
4:21 - 4:25And they did human coding
to label those groups. -
4:26 - 4:30And here is where we see
the label English bridge -
4:30 - 4:32the responses from Cuba
in English -
4:32 - 4:34and up there, there's [unclear] France.
-
4:36 - 4:41And so I think it's important to retain
the concept of attentive clusters. -
4:44 - 4:49Now, let's go back to 2011
during the Arab popular uprisings. -
4:50 - 4:55And I'll show you a visualization
of the influence network -
4:55 - 4:57of Twitter users in Egypt.
-
4:58 - 5:00So, what you're seeing here
-
5:00 - 5:04just imagine people down the street
at Tahrir Square -
5:04 - 5:08tweeting in Arabic about what's going on
on the ground. -
5:08 - 5:12And those are the people in red.
-
5:13 - 5:18So, these red dots represent users
that are tweeting in Arabic. -
5:18 - 5:20Then we have the international community
-
5:20 - 5:26or even Americans, British and so on
tweeting in English. -
5:26 - 5:28And they are in blue,
those blue dots. -
5:29 - 5:33And then, interestingly, we have
people in between them, -
5:33 - 5:38which are illustrated in different
degrees of violet, or violet shades. -
5:39 - 5:43This represents the fact that they
are tweeting in both Arabic and English. -
5:45 - 5:47So, what we're seeing
is the bridge tweeters -
5:48 - 5:53because, like Ethan Zuckerman called them
"bridge bloggers". -
5:56 - 5:59So, another context.
-
5:59 - 6:05The same year, 2011, a lot
of big protests were going on in Europe. -
6:05 - 6:07And in particular, in Spain.
-
6:07 - 6:12They started on May 15th, 2011;
there were massive protests. -
6:13 - 6:17And because of this context,
this situation -
6:17 - 6:23new attentive clusters were emerging
in the social media landscape of Spain. -
6:28 - 6:34Now, this is a visualization you can find
in the SocialFlow blog, a research blog -
6:34 - 6:35on social networks.
-
6:35 - 6:41And what it is, is it tracks the origin
and the initial spread -
6:41 - 6:45of the hashtag #occupywallstreet
in Twitter. -
6:47 - 6:52They detected that one of the first uses
of the hashtag #occupywallstreet -
6:52 - 6:57was on July 13th 2011, linking to a blog
post of Adbusters. -
6:58 - 7:02So you have the Twitter account
of Adbusters there, very big -
7:02 - 7:05because it's being retweeted a lot.
-
7:06 - 7:07And mentioned a lot.
-
7:08 - 7:14And they collected these mentions
and the tweets that had these mentions -
7:14 - 7:17and these retweets with the hashtag
during July 13th. -
7:18 - 7:21From July 13th to July 23rd.
-
7:21 - 7:24So, from the first 10 days
of the use of this hashtag -
7:24 - 7:27it was from the very beginning of the use
of this hashtag on Twitter. -
7:31 - 7:32They just mapped the accounts
-
7:33 - 7:39and the series of posts with the hashtag
and mentions with the hashtag -
7:41 - 7:43and the users that were connecting
-
7:43 - 7:45because of these mentions
and retweets. -
7:46 - 7:49Now the interesting thing
in this visualization -
7:49 - 7:51is that they
-
7:51 - 7:54the SocialFlow people
particularly in [inaudible] -
7:55 - 8:00detected that this Spanish branch
of users -
8:00 - 8:03were forming an attentive cluster.
-
8:06 - 8:09Mentioning and retweeting about it
in Spanish -
8:09 - 8:13using the hashtag in their messages
in Spanish. -
8:14 - 8:17And they point out in the blog
-
8:18 - 8:20that this Spanish contingent
-
8:20 - 8:24helped post and spread the word
about Occupy Wall Street -
8:24 - 8:29even before most of the United States
was aware of it. -
8:32 - 8:34So, I found that very interesting.
-
8:34 - 8:37And it was due to the context
in Spain at that moment -
8:38 - 8:45with big protests and new clusters
forming in the social media landscape. -
8:57 - 9:01Now I have shown you the importance
of these multilingual users -
9:01 - 9:06in connecting language communities
and spreading information -
9:06 - 9:09across countries, acting as mediators.
-
9:11 - 9:16But let's focus on another aspect
of connecting language groups -
9:16 - 9:17which is language choice.
-
9:18 - 9:23So I'm going to devote a moment
to speak about languages -
9:23 - 9:24and language choice.
-
9:28 - 9:31To understand languages in the world
-
9:31 - 9:33I'm going to use a telescope.
-
9:37 - 9:40So de Swaan...
-
9:41 - 9:45...proposed a theory called
the world language system -
9:45 - 9:46back in the 1990s
-
9:47 - 9:51to explain the languages in the world.
-
9:52 - 9:55And he used a very beautiful metaphor,
the constellation. -
9:57 - 10:02So, in his theory there's about a dozen
languages in the world -
10:02 - 10:05that are the hearts of the system,
or the suns. -
10:06 - 10:07The suns of the system.
-
10:08 - 10:11For instance, English, French, Spanish,
Arabic and more. -
10:12 - 10:17And then there are hundreds,
maybe more than 100, 200... -
10:17 - 10:23national languages that are orbiting
around these suns like planets. -
10:24 - 10:28And finally we have regional
and minority languages -
10:28 - 10:32that are orbiting these planets
like satellites. -
10:33 - 10:38And he used this metaphor
to explain the power relationships -
10:38 - 10:40between languages.
-
10:40 - 10:43This is a theory of what he called
-
10:43 - 10:47"communication potential
and language competition" -
10:48 - 10:51A key point he made
-
10:52 - 10:55is that the system holds together
-
10:55 - 10:59thanks to multilingual people
and interpreters. -
11:00 - 11:03This is what's providing cohesion
to the system. -
11:04 - 11:07He also made a controversial proposal
-
11:07 - 11:11about the communication potential
of a language. -
11:12 - 11:15So, he proposed a formula,
a mathematical formula -
11:15 - 11:20where he could estimate the communication
potential of a language -
11:20 - 11:25and supposedly a person would choose
what to learn and use -
11:25 - 11:28based on that communication potential.
-
11:28 - 11:34For example, a person might decide
to learn English and use English -
11:35 - 11:41because not only does it provide
communication with English native speakers -
11:42 - 11:46but also, adding to that, it provides
the possibility to communicate -
11:46 - 11:50with all the second-language learners
of English -
11:50 - 11:53from many different languages,
many different countries. -
11:53 - 11:56So, supposedly, in his theory
-
11:56 - 12:00English provides
the greatest communication potential. -
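(Illustrative sketch, not part of the talk.) The talk does not state de Swaan's formula. The version below assumes the commonly cited definition of his Q-value: prevalence (the share of all speakers who know a language) multiplied by centrality (the share of multilingual speakers who know it). The function name, the input format and the toy numbers are invented for illustration.

```python
def q_value(repertoires, language):
    """Q-value of `language`, assumed here to be prevalence times centrality.

    repertoires: list of sets, one per speaker, holding the languages
    that speaker can use (hypothetical input format).
    """
    if not repertoires:
        return 0.0
    # Prevalence: share of all speakers who know the language.
    prevalence = sum(language in r for r in repertoires) / len(repertoires)
    # Centrality: share of multilingual speakers who know the language.
    multilinguals = [r for r in repertoires if len(r) > 1]
    if not multilinguals:
        return 0.0
    centrality = sum(language in r for r in multilinguals) / len(multilinguals)
    return prevalence * centrality


# Toy constellation of five speakers (invented numbers).
speakers = [{"en"}, {"en"}, {"en", "es"}, {"es"}, {"en", "ca"}]
scores = {lang: q_value(speakers, lang) for lang in ("en", "es", "ca")}
```

In this toy data every multilingual speaker knows English, so English gets the highest score. That mirrors the point about second-language speakers adding to a language's reach.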
12:01 - 12:05And he received some criticism,
because of the central role of English -
12:05 - 12:07in his theory.
-
12:07 - 12:10He said it was the central hub
of all the system. -
12:13 - 12:20There's also the language ecology paradigm
first proposed by Haugen in 1972 -
12:22 - 12:25and there's this idea of an ecosystem
of languages -
12:25 - 12:30and, again, it's using another metaphor
-
12:30 - 12:32and because of this metaphor
-
12:32 - 12:34the idea of endangered languages
also appeared. -
12:36 - 12:40I'm going to briefly just read
the definition. -
12:40 - 12:43He defined the language ecology as:
-
12:43 - 12:47"the study of interactions between
any given language and its environment" -
12:48 - 12:49and what I think is very important:
-
12:49 - 12:54"language exists only in the minds
of its users" -
12:57 - 13:00which leads me to point at my research.
-
13:02 - 13:06In my research, I'm using a microscope
to see the cells -
13:06 - 13:09and my cells in my study
are the Twitter users. -
13:12 - 13:13Why is that?
-
13:16 - 13:20Because as Haugen explains,
there's a psychological dimension -
13:20 - 13:22to language ecology
-
13:22 - 13:25where language interacts
with other languages -
13:25 - 13:28in the minds of multilingual people.
-
13:29 - 13:32And there's a sociological dimension
to language ecology -
13:32 - 13:38where we use language to communicate
and interact with other people. -
13:38 - 13:44And this language ecology generates
because of the people -
13:44 - 13:46that decide to use that language
-
13:46 - 13:50learning and interacting
with people using it. -
13:51 - 13:55And this is the point
of language choice in languages. -
13:56 - 14:00So, I focus on the connections of people
and the language choice. -
14:04 - 14:08So, these are the four points
I'm going to be speaking about. -
14:08 - 14:13But actually the main focus
is going to be the first point -
14:13 - 14:19Social network analysis and the taxonomy
of intersections between language groups -
14:20 - 14:23This is where I'm going to be spending
most of the time. -
14:23 - 14:27And then very briefly,
just for completion purposes -
14:27 - 14:30I'm going to speak about another
small study that I did -
14:30 - 14:31the factor analysis
-
14:31 - 14:34looking at the influence
of the social network -
14:34 - 14:39in the language choices of the users.
-
14:39 - 14:42So, how the social network
influences language choice -
14:42 - 14:43of our multilingual users.
-
14:45 - 14:50And then I'm going to briefly also talk
about the last study of my dissertation -
14:50 - 14:52that is still ongoing.
-
14:53 - 14:55So, I still have new research
to talk about. -
14:55 - 14:58And it's content analysis
-
14:58 - 15:00and in this case I'm focusing on
intrinsic factors -
15:00 - 15:03intrinsic to the messages
-
15:03 - 15:06about the topic,
and the type of exchange. -
15:06 - 15:08If it's a reply,
if it's a public post -
15:08 - 15:10and how that influences
the language choice as well. -
15:11 - 15:12And finally I will...
-
15:14 - 15:17I'm going to give you my reflections
-
15:18 - 15:23so I can invite your thoughts
and suggestions and discussions about it. -
15:27 - 15:29Briefly, I'm going to start
with the sampling -
15:29 - 15:32so I can talk about the rest
of the research. -
15:35 - 15:37So my focus is on multilingual users,
-
15:37 - 15:40how did I identify multilingual users
on Twitter? -
15:42 - 15:44It was giving me a headache.
-
15:44 - 15:47Finally what we decided...
-
15:49 - 15:50this research has been--
-
15:51 - 15:54I have always had the help
of Jennifer Golbeck, -
15:54 - 15:55she was my adviser.
-
15:55 - 15:57And I did this with her help.
-
15:58 - 16:02So what we did, was gather a list
of what is called stopwords. -
16:03 - 16:05From different languages
and you have a list over there. -
16:06 - 16:10And then the stopword lists
you can find them on the internet. -
16:10 - 16:13They are created
for computational linguistics -
16:13 - 16:16so they use it for filtering purposes.
-
16:17 - 16:19And they are common words
in a language. -
16:20 - 16:21Very common words in a language.
-
16:21 - 16:26So, sometimes they're used precisely
for eliminating them from texts -
16:26 - 16:29when they're in, for example,
searches in Google -
16:29 - 16:33they eliminate the stopwords,
the stopwords that you type -
16:33 - 16:35in the search.
-
16:35 - 16:38But in this case I wanted
to find the stopwords -
16:38 - 16:41that are very common in the language
to represent the language. -
16:41 - 16:44And so we had to select words
that were not written the same -
16:44 - 16:46as in another language.
-
16:46 - 16:48Sometimes, it could be confusing
and ambiguous. -
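(Illustrative sketch, not part of the talk.) The filtering step described here, keeping only stopwords that are not spelled the same in another language, could look roughly like this. The word lists are tiny invented samples, not the lists used in the study. The `site:twitter.com` query form is an assumption about how the search was restricted to the Twitter domain.

```python
from itertools import product

# Tiny invented stopword samples; the real lists come from
# computational-linguistics resources and are much longer.
STOPWORDS = {
    "english": {"the", "and", "with", "no", "me"},
    "spanish": {"el", "con", "pero", "no", "me"},
    "catalan": {"el", "amb", "però", "no"},
}


def distinctive_stopwords(stopwords):
    """Keep only the words that appear in exactly one language's list."""
    result = {}
    for lang, words in stopwords.items():
        others = set().union(
            *(other_words for other, other_words in stopwords.items() if other != lang)
        )
        result[lang] = words - others
    return result


def query_pairs(distinct, base="english"):
    """Pair each distinctive base-language word with one from another language."""
    pairs = []
    for lang, words in distinct.items():
        if lang == base:
            continue
        for a, b in product(sorted(distinct[base]), sorted(words)):
            pairs.append((lang, f"{a} {b} site:twitter.com"))
    return pairs


distinct = distinctive_stopwords(STOPWORDS)
queries = query_pairs(distinct)
```

Ambiguous words such as "no" drop out of every list, so each remaining word points to one language only.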
16:51 - 16:54Then I typed in Google...
-
16:56 - 16:59one word in one language
and one word in another language. -
16:59 - 17:01Usually I was always using
one English word -
17:01 - 17:04and one word in a different language.
-
17:05 - 17:08And I looked in the Twitter domain.
-
17:09 - 17:12So the search results from Google
will give me the profiles -
17:13 - 17:20of people on Twitter that in theory
wrote messages in both languages. -
17:20 - 17:24We had to do a lot of hand-coding
to actually see if it was in two languages -
17:25 - 17:28or it was just that they were mentioning
an English song -
17:28 - 17:32the title of an English song
but they had no English in the rest. -
17:32 - 17:36So we had to ensure
that they were authoring tweets -
17:36 - 17:38in two languages.
-
17:38 - 17:40So writing them, not just retweeting them
-
17:40 - 17:43they were not just automatic postings
from Facebook. -
17:43 - 17:48So we had a long set of criteria,
a lot of manual coding -
17:48 - 17:53and then finally we selected
92 multilingual users -
17:53 - 17:58and in total they used 19 languages,
2 or 3 languages per person. -
18:01 - 18:05Now, I don't know if you want to ask
some questions about the sampling -
18:05 - 18:08because there's a lot of details about it.
-
18:13 - 18:15No doubts?
-
18:15 - 18:17Or maybe they'll come later!
-
18:19 - 18:22Now, how do I do
the social networks analysis? -
18:23 - 18:28Well, now I have my 92 multilingual users
technically they are called the ego -
18:28 - 18:30of an egocentric network.
-
18:31 - 18:33This is the cell of my study.
-
18:34 - 18:36It started with the nucleus of the cell
-
18:36 - 18:38which is my multilingual user
-
18:38 - 18:40and then I go to Twitter
-
18:40 - 18:43and first of all I have instructed--
-
18:45 - 18:47so in this case my ego
is called the Painter -
18:48 - 18:54and I have extracted the last 50 messages
that he posted on Twitter -
18:54 - 18:57to see the languages
this person used-- is using. -
18:57 - 19:02And I see that he is using English,
Spanish and Catalan. -
19:03 - 19:05Catalan is a regional language in Spain
-
19:06 - 19:08and I have shown you on the map
the region before -
19:08 - 19:09where the region was.
-
19:09 - 19:12And they speak both Catalan
and Spanish. -
19:14 - 19:17So, this person is tweeting
in a minority language -
19:17 - 19:18a national language
-
19:18 - 19:21and also international.
-
19:27 - 19:32So, I already found the Painter
and I know what languages this person speaks -
19:32 - 19:34well, uses on Twitter,
-
19:34 - 19:36and then I extract
all the social networks. -
19:36 - 19:38So, the followers on Twitter
-
19:38 - 19:40you know that on Twitter
you have followers -
19:40 - 19:41and you follow people.
-
19:41 - 19:43I extracted both.
-
19:43 - 19:48The followers of the Painter
the people that are following him on Twitter -
19:48 - 19:52and also how the friends
are connecting to each other. -
19:52 - 19:57So, all of them, all of these dots
are the followers -
19:57 - 19:59the people following the Painter
on Twitter -
19:59 - 20:03and also I see how they connect
among each other, ok? -
20:05 - 20:09So the Painter follows Eduard
in the center -
20:10 - 20:12and it seems he's very popular.
-
20:14 - 20:17And then I extract the last 30 posts
of Eduard-- -
20:17 - 20:19there's a reason for that
-
20:19 - 20:22but vernacular
is mostly economy questions! -
20:25 - 20:26I will tell you why!
-
20:26 - 20:29So I extracted the last 30 posts of Eduard
-
20:29 - 20:32and then I do
automatic language identification -
20:32 - 20:37with the Google API
for language identification -
20:39 - 20:40which costs money.
-
20:41 - 20:43So you have to really think
about how many posts you want to send -
20:43 - 20:46to Google and how much money
you have available -
20:46 - 20:48and what is the accuracy
you're going to have -
20:48 - 20:51according to how many posts you send.
-
20:51 - 20:53There's a lot of testing going on there.
-
20:54 - 20:58I do the same with everybody
in the social network. -
20:59 - 21:00I extract the last 30 posts
-
21:00 - 21:02use the Google identification
-
21:02 - 21:08build an algorithm that decides
based on the languages of these 30 posts -
21:08 - 21:12is this person monolingual?
Is this person multilingual? -
21:12 - 21:13Which languages?
-
21:13 - 21:15And then I labeled them, ok.
-
21:17 - 21:19This is just a visualization behind the--
-
21:20 - 21:27Perhaps person 1 is monolingual,
or bilingual of two languages. -
21:32 - 21:36Now that I have all the friends
of the Painter -
21:36 - 21:37how they connect,
-
21:37 - 21:41I color code them
depending on the languages they are using. -
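(Illustrative sketch, not part of the talk.) Building one egocentric network and grouping its nodes by language could be sketched as below. The account names, the follow lists and the language labels are invented, and the data structures are an assumption. The talk does not say which tools were used for this step.

```python
from collections import Counter


def ego_network(ego, follows):
    """Return the ego's contacts and the follow links among those contacts."""
    # Accounts the ego follows, plus accounts that follow the ego.
    contacts = set(follows.get(ego, []))
    contacts |= {user for user, targets in follows.items() if ego in targets}
    contacts.discard(ego)
    # Keep only links where both ends are contacts of the ego.
    edges = {(u, v) for u in contacts for v in follows.get(u, []) if v in contacts}
    return contacts, edges


def language_group(label):
    """Turn a set of language codes into a group name used for coloring."""
    if not label:
        return "no data"  # e.g. private or closed accounts
    return "+".join(sorted(label))


# Invented example data.
follows = {
    "painter": ["eduard", "anna", "john", "private_account"],
    "eduard": ["anna", "painter"],
    "anna": ["eduard"],
    "john": [],
}
labels = {"eduard": {"ca", "es"}, "anna": {"ca"}, "john": {"en"}}

contacts, edges = ego_network("painter", follows)
groups = Counter(language_group(labels.get(user)) for user in contacts)
```

Each group name would then map to one color in the visualization, with a separate color for bilingual combinations and for accounts without data.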
21:42 - 21:45And here, what you can see
is very interesting. -
21:46 - 21:49I don't know if you can distinguish
the colors well -
21:49 - 21:54because up here, this area,
that is like a triangle -
21:54 - 21:58there's a group of users
writing in English. -
21:59 - 22:01And it's pink.
Sort of pinkish. -
22:01 - 22:05And then, down here
there's this Spanish group -
22:05 - 22:07in light green.
-
22:08 - 22:12And, in the middle, the one
that perhaps doesn't distinguish as well -
22:12 - 22:15from the English,
is the Catalan group. -
22:16 - 22:19So the users writing in Catalan
in dark blue. -
22:20 - 22:22And then there's a set of violets
in between -
22:22 - 22:26and these violets represent
the bilingual users -
22:26 - 22:29either English and Catalan
or English and Spanish. -
22:30 - 22:33And then there's darker green
around here, -
22:33 - 22:36they are using both Catalan and Spanish.
-
22:36 - 22:38So there's a lot of bilinguals
going on. -
22:38 - 22:40And there's an interesting dynamic
-
22:40 - 22:43in that you have this English group
up there -
22:43 - 22:44and the Spanish group down here
-
22:44 - 22:46and the Catalan group in the middle.
-
22:46 - 22:49And this Catalan group is very mixed up
with the Spanish group -
22:50 - 22:52which makes sense,
because it's a bilingual community. -
23:01 - 23:07So, this is how I built the egocentric
network of my 92 multilingual users. -
23:09 - 23:11The Painter is just one of them.
I have 92. -
23:11 - 23:17I have 92 cells or egocentric networks
that I studied with my microscope. -
23:18 - 23:22Do you want to ask some questions
about this process -
23:22 - 23:23or this visualization?
-
23:25 - 23:30(person 1) Of the bilingual units,
are they users or tweets? -
23:31 - 23:32They are users, yeah.
-
23:32 - 23:36So, the dots represent people.
-
23:36 - 23:40So, like Eduard here.
They represent people. -
23:42 - 23:45Now, for each dot, to determine the language
and the color -
23:45 - 23:48I extracted 30 posts.
-
23:48 - 23:53So, it's an interesting question
because the 30 posts -
23:53 - 23:56have different language labels
assigned to them -
23:56 - 23:57especially if they were bilingual
-
23:57 - 24:02and I had to decide which language label
I was going to assign to the user. -
I was going to assign to the user. -
24:02 - 24:05So, I had to build an algorithm
with a set of rules -
24:10 - 24:11basically saying--
-
24:11 - 24:17the Google identification system
would give me a language -
24:17 - 24:18and a confidence level.
-
24:18 - 24:19So if the confidence level was very low
-
24:19 - 24:24I would say "discard that",
because I had a series of heuristics -
24:24 - 24:30based on both the number of tweets
using a particular language -
24:30 - 24:33and also on the confidence level.
-
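(Illustrative sketch, not part of the talk.) A rule set of the kind described, combining per-post confidence with the number of posts per language, might look like this. Both thresholds are assumptions chosen for the example. The talk does not give the actual values or rules.

```python
from collections import Counter


def assign_user_languages(detections, min_confidence=0.5, min_share=0.1):
    """Decide which languages a user writes in.

    detections: list of (language_code, confidence) pairs, one per post,
    as an automatic language identifier might return them.
    min_confidence and min_share are hypothetical thresholds.
    """
    # Discard posts whose detection confidence is too low.
    kept = [lang for lang, confidence in detections if confidence >= min_confidence]
    if not kept:
        return []
    counts = Counter(kept)
    total = len(kept)
    # Keep a language only if it covers a large enough share of the posts.
    return sorted(lang for lang, n in counts.items() if n / total >= min_share)


# Thirty invented detections for one user.
posts = [("en", 0.9)] * 20 + [("es", 0.8)] * 8 + [("fr", 0.2)] * 2
user_languages = assign_user_languages(posts)
is_multilingual = len(user_languages) > 1
```

Here the two low-confidence "fr" detections are dropped, so the user is labeled as writing in English and Spanish.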
24:34 - 24:38And there are a lot
of technical challenges there as well. -
24:40 - 24:42(woman) So, it's possible
that some of these posts -
24:42 - 24:46many of these posts would be multilingual,
I'm sorry monolingual in one language or the other? -
24:46 - 24:52So it's also possible that some
of these individual posts -
24:52 - 24:54would mix languages?
-
24:55 - 24:57Yes, it is possible.
It's very possible! -
24:57 - 25:00It's very challenging
for the automatic system! -
25:02 - 25:04(woman) Right, ok.
I just wanted to be clear-- -
25:04 - 25:05Yes, exactly.
-
25:05 - 25:11So it's not as frequent as I expected,
having bilingual posts -
25:11 - 25:13that I would call.
-
25:13 - 25:14But it's happening.
-
25:15 - 25:21And so, for a series of tests,
I had to do manual coding -
25:21 - 25:23and I saw that sometimes
it was the case -
25:23 - 25:27that they were doing some sort
of translation in the same tweet -
25:27 - 25:32and sometimes it was just the case
that they were mentioning titles of things -
25:32 - 25:34or places in a different language.
-
25:35 - 25:39So, there's a lot of issues
surrounding the automatic handling of this -
25:39 - 25:44but you are dealing with 92 networks
-
25:44 - 25:51and they have between 30
and 5,000 nodes in them. -
25:53 - 25:56So, I don't remember the numbers exactly,
-
25:56 - 25:59but I'm talking about
around 80,000 people. -
26:01 - 26:05So detecting the language of 80,000 people
and this is small-scale. -
26:05 - 26:08If you go to millions,
you need an automatic system. -
26:08 - 26:11And one of the things I'm having
to write up in my dissertation -
26:11 - 26:14is what are the challenges.
-
26:14 - 26:18You have to be prepared for them,
to solve those problems. -
26:19 - 26:22And one of them is what do you do
with bilingual posts -
26:22 - 26:24which language do you assign to that post?
-
26:24 - 26:28Automatic posts, spam...
there's a lot of problems. -
26:30 - 26:31Challenges, I mean.
-
26:31 - 26:35That's what makes it interesting
because you cannot do manual coding -
26:35 - 26:36on these scales.
-
26:39 - 26:41Do you have another question?
-
26:45 - 26:48So, now, what am I doing with this?
-
26:51 - 26:56I'm going to classify my social networks,
looking at the patterns -
of overlaps between the language groups. -
-
27:00 - 27:02And overlaps or intersections.
-
27:03 - 27:08I'm looking specifically at the networks
that have only two language groups. -
27:08 - 27:12I had five of these networks
that were trilingual -
27:12 - 27:16so I put them aside to go simple
first with just two language groups -
27:16 - 27:18to see how they interconnect.
-
27:19 - 27:21And then I classified them
-
27:22 - 27:24first following a qualitative analysis
-
27:24 - 27:29and then I used network statistics
that I developed with my adviser -
27:29 - 27:30for this purpose.
-
27:31 - 27:34And I will talk later a little more
about it. -
27:34 - 27:38So, I tried to provide
more robust measures for that. -
27:39 - 27:44I classified them and I came up
with some types. -
27:46 - 27:50This is what I call the gatekeeper
language bridge type. -
27:51 - 27:53And there's some variants of it,
obviously. -
27:54 - 27:56What you can see here
is the network of a person -
27:56 - 28:00and I'm going to assume this person
is in the United States -
28:00 - 28:02and speaks both Spanish and English.
-
28:04 - 28:06Let's call her Maria.
-
28:06 - 28:12So she's Maria and she has two groups
of friends using Spanish on Twitter -
28:13 - 28:16and then that big group of friends
using English. -
28:17 - 28:20And, as you can see,
there's just a few nodes -
28:20 - 28:22connecting the two language groups.
-
28:22 - 28:28You can see that the social structure
can be different from the language groups -
28:29 - 28:32so you can have maybe a group of friends
and a group of coworkers -
28:32 - 28:36inside the same language group,
so it can be more complex -
28:36 - 28:41than just dividing the social network
by language groups. -
28:41 - 28:46There can be more grouping
because of other social resources. -
28:47 - 28:51But the interesting thing is that
there are only a few nodes -
28:51 - 28:53where people are connecting
holding together these Twitters. -
28:55 - 29:01I think this was friends
with English here. -
29:01 - 29:05You can see, in this case, it seems
like the two groups -
29:05 - 29:08are holding closely together
-
29:09 - 29:14because there are many more links
holding the two groups together. -
29:15 - 29:18Of course, this is going to depend
on the size of the networks -
29:18 - 29:23so I had to account for the size
when coming up with measures -
29:23 - 29:26with network connections
-
29:26 - 29:28I had to provide ratios.
-
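(Illustrative sketch, not part of the talk.) One simple size-independent measure is the share of links that cross a language border. This is an assumed example of such a ratio, not the statistic developed for the study. It also assigns each node to a single language group, which leaves out the bilingual nodes.

```python
def cross_language_ratio(edges, language):
    """Share of links whose two ends belong to different language groups.

    edges: iterable of (user, user) pairs.
    language: dict mapping a user to one language group; users without
    a label (no data) are skipped.
    """
    labeled = [(u, v) for u, v in edges if language.get(u) and language.get(v)]
    if not labeled:
        return 0.0
    cross = sum(1 for u, v in labeled if language[u] != language[v])
    return cross / len(labeled)


# Invented network: three Spanish-group users and two English-group users.
language = {"a": "es", "b": "es", "c": "es", "d": "en", "e": "en"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]
ratio = cross_language_ratio(edges, language)
```

Because the count is divided by the number of links, a small network and a large one can be compared on the same scale.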
29:28 - 29:32Now, the ratio of cross-language linking
here and here -
29:32 - 29:34and you have these types--
-
29:36 - 29:40These types are not just clear-cut.
-
29:40 - 29:42There's an evolution.
-
29:42 - 29:43There's people that have
very few connections -
29:43 - 29:45with the language groups
-
29:45 - 29:47and then progressively there's people
with more and more. -
29:48 - 29:49And this increases.
-
29:49 - 29:52Which points to the fact,
that my cells are there. -
29:53 - 29:57Which means I don't see the evolution
over time, ok? -
29:58 - 30:00This is a limitation of my research.
-
30:00 - 30:05I just see how the social network
of this person looked -
30:05 - 30:07at a particular point in time.
-
30:08 - 30:10I don't know how it evolves over time.
-
30:10 - 30:13So, for myself, it's just there.
-
30:14 - 30:19It would be interesting
to see these different patterns -
30:19 - 30:21that I have been observing.
-
30:21 - 30:27Maybe over time these connections
between languages may be increasing. -
30:29 - 30:32Now we have the integration
and union type -
30:33 - 30:37where in this case you have a person
from an Arab country -
30:37 - 30:41and green represents the friends
that are using Arabic -
30:41 - 30:45and the friends using English are in pink,
but there's also violet -
30:45 - 30:47there are bilinguals.
-
30:47 - 30:52That means there's a group
of English users -
30:52 - 30:57and bilingual English - Arabic users
inserted in the group of Arabic, inside. -
31:00 - 31:01That's the integration,
so they're integrated. -
31:02 - 31:08And then I have a Greek guy,
who uses Greek and English -
31:08 - 31:09and his Arabic friends.
-
31:09 - 31:12And in this case, you can see
it's sort of light blue -
31:12 - 31:17representing Greek, so the friends
that tweet in Greek -
31:17 - 31:21Pink again represents people tweeting
in English -
31:21 - 31:23and there's a lot of bilinguals.
-
31:23 - 31:27So these kind of dark blues
represent the bilinguals. -
31:27 - 31:29And these are two groups
-
31:29 - 31:33that if you've seen before,
the gatekeeper and the language bridge -
31:33 - 31:35progressively getting closer and closer
-
31:35 - 31:41with more and more links
across languages. -
31:41 - 31:43In this case, this is like the extreme.
-
31:43 - 31:46The links between the two languages
are so dense -
31:46 - 31:51that you almost cannot distinguish
where the border is -
31:51 - 31:53between the two language groups.
-
31:53 - 31:59And, interestingly, the border might be
even only noticeable -
31:59 - 32:01because there's a lot of bilinguals
around it. -
32:02 - 32:05And this is the union type
where they unite. -
32:07 - 32:10And finally, the peripheral language type.
-
32:10 - 32:14This is a Brazilian guy,
the network of a Brazilian guy -
32:15 - 32:17where you have--
-
32:17 - 32:19probably he lives in the United States
or something like that-- -
32:19 - 32:23because this guy has mostly
all this big group of friends -
32:23 - 32:25tweeting in English.
-
32:27 - 32:32And then there's the side tentacle
running outside, using Portuguese. -
32:35 - 32:36And this is like a peripheral language.
-
32:36 - 32:39So, in the periphery there's a small group
of Portuguese language. -
32:40 - 32:45Now, I forgot to mention that there's dots
that are light yellow or white. -
32:45 - 32:48Those are the ones that have no data.
-
32:49 - 32:51So, I don't know
the language they're using -
32:51 - 32:53because either their accounts are closed
-
32:53 - 32:58or for some reason, in between the collection
of data they closed the account. -
32:59 - 33:03Mostly, the reason
is that they're private accounts -
33:04 - 33:06where you cannot get the data from.
-
33:06 - 33:09I think somewhere I read
it was about 5 percent. -
33:09 - 33:10I'm not sure.
-
33:10 - 33:14But for one reason or another,
I don't have that information. -
33:17 - 33:21Now, why am I classifying them?
These networks? -
33:23 - 33:26Well, the reason is that--
-
33:26 - 33:29well, there are some studies
that demonstrate that the social structure -
33:29 - 33:34the structure of the social networks
influences the spread of information. -
33:34 - 33:36How information disseminates
in the network. -
33:39 - 33:43So, I'm just assuming
that these different structures -
33:43 - 33:46are going to influence the spread
of information. -
33:47 - 33:50But this is a study that has to be done.
-
33:50 - 33:53I cannot demonstrate that one
of these types -
33:53 - 33:56facilitates the spread of information.
-
33:56 - 34:02I can only say that I am assuming,
so that potential study -
34:04 - 34:09could just look at, for example,
if gatekeeper and language bridges -
34:11 - 34:16are not as good for spreading information
as union and integration types. -
34:20 - 34:25Right, we can just assume
because of the cross-language links -
34:28 - 34:33so, how many links there are
or the ratio of cross-language links -
34:33 - 34:38may potentially facilitate information
diffusion in these cases. -
34:40 - 34:43So, that study needs to be done.
-
34:43 - 34:45I cannot say what's going to happen!
-
34:45 - 34:47I just assume it's going to be like that.
-
34:49 - 34:52So that is the reason why I classify them.
-
34:52 - 34:55I have some network statistics.
-
34:56 - 35:01We've made about an 80 percent accuracy
guess, which is quite good, -
35:01 - 35:02but the sample is small.
-
35:08 - 35:11So now, do you have any more questions
before I move on to the next study? -
35:14 - 35:15(man) I was curious as to how many--
-
35:15 - 35:19what was the selection process like
to find the 92 users? -
35:20 - 35:23Well, this is what I was explaining
at the beginning -
35:23 - 35:27about just using two stopwords
from two different languages -
35:27 - 35:31typing that in the search box in Google
and searching Twitter -
35:31 - 35:33and then once--
-
35:33 - 35:36Basically you just go through
the list of results -
35:36 - 35:42and start opening the profile,
counting the tweets. -
35:42 - 35:45How many in this language,
how many in the other. -
35:45 - 35:47And we put a threshold of 10 percent
-
35:47 - 35:53they had to have written 10 percent
of the tweets in a second language -
35:53 - 35:57and you couldn't count retweets
or automatic posting. -
35:58 - 36:00We also had to manually discard
these spammers. -
36:02 - 36:04So, that was the process.
-
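A minimal sketch of the 10 percent selection rule described above. In the talk the counting was done by hand; the `Tweet` record, the function names, and the choice to leave retweets and automatic posts out of the denominator are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    language: str               # label assigned by a human coder (hypothetical field)
    is_retweet: bool = False
    is_automatic: bool = False  # e.g. a post generated by another application


def second_language_share(tweets):
    """Share of original tweets written in the second most-used language."""
    original = [t for t in tweets if not (t.is_retweet or t.is_automatic)]
    if not original:
        return 0.0
    counts = {}
    for t in original:
        counts[t.language] = counts.get(t.language, 0) + 1
    ranked = sorted(counts.values(), reverse=True)
    second = ranked[1] if len(ranked) > 1 else 0
    return second / len(original)


def is_multilingual(tweets, threshold=0.10):
    """Apply the 10 percent threshold mentioned in the talk."""
    return second_language_share(tweets) >= threshold
```

For example, a profile with nine English tweets and one Spanish tweet would pass the threshold, while a profile whose only Spanish posts are retweets would not.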
36:06 - 36:10(woman) And that's a paid search
through Google? -
36:10 - 36:13No, that we did manually
-
36:13 - 36:14and then once--
-
36:14 - 36:20So the other thing you can say is you can
use these core multilingual users -
36:21 - 36:24and then do what I did for behavior
in these social networks -
36:24 - 36:29which is once you extract the friends
and extract the messages of the friends -
36:31 - 36:34and automatically find the language
-
36:34 - 36:37then you can say "Oh, this person
is multilingual" automatically. -
36:37 - 36:41You just process it and you can detect
a lot more multilingual people -
36:41 - 36:43through that process.
-
36:43 - 36:46The paid process was sending these posts
-
36:46 - 36:49to the Google language
identification tool. -
36:50 - 36:55So, what I did was clean each message
automatically. -
36:56 - 37:00Basically, eliminating the hashtags
-
37:01 - 37:05and the mentions
that had an @ in front, -
37:05 - 37:10symbols, URLs, all those things
I would automatically eliminate them -
37:10 - 37:14and then with the rest of the message,
I'd send that to the Google API -
37:14 - 37:16for language identification
-
37:16 - 37:22and the Google API would give me
a language label and a confidence binary. -
37:22 - 37:23And that for each message.
-
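The cleaning step described above could look roughly like this. It is only a sketch: the regular expressions and the name `clean_tweet` are assumptions, not the code used in the study, and the call to the language identification service is deliberately left out.

```python
import re

URL = re.compile(r"https?://\S+|www\.\S+")
MENTION_OR_HASHTAG = re.compile(r"[@#]\w+")
# Anything that is not a word character, whitespace, or an apostrophe.
# Caveat: this may also strip combining marks used in some scripts.
SYMBOLS = re.compile(r"[^\w\s']")


def clean_tweet(text):
    """Remove URLs, @mentions, #hashtags and symbols before language detection."""
    text = URL.sub(" ", text)
    text = MENTION_OR_HASHTAG.sub(" ", text)
    text = SYMBOLS.sub(" ", text)
    return " ".join(text.split())
```

Only the text that remains would then be sent for language identification, so that handles and hashtags in another language do not confuse the detector.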
37:23 - 37:26And then I built the algorithm
with the help of Jen Golbeck -
37:26 - 37:31to decide, well I have 30 messages,
500 English -
37:31 - 37:3510 million Spanish and then one in Swahili
which is unlikely -
37:37 - 37:40and you had to decide
the confidence value-- -
37:40 - 37:43So I used rules, defined rules
-
37:43 - 37:46but it could be done
statistically I think. -
37:46 - 37:48And write some statistical method
to decide -
37:48 - 37:52"well this person actually is bilingual"
or whatever. -
37:53 - 37:54That's the process.
-
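The talk does not spell out the actual rules, so the following is a hypothetical illustration of what a rule-based decision could look like: drop low-confidence detections, then keep only languages that account for a minimum share of a user's messages. The thresholds and the name `user_languages` are assumptions.

```python
from collections import Counter


def user_languages(detections, min_confidence=0.5, min_share=0.10):
    """Decide which languages a user really writes in.

    detections: one (language_label, confidence) pair per message.
    Both thresholds are illustrative, not the values used in the study.
    """
    trusted = [lang for lang, conf in detections if conf >= min_confidence]
    if not trusted:
        return []
    counts = Counter(trusted)
    total = len(trusted)
    # most_common() keeps the result ordered from most to least used.
    return [lang for lang, n in counts.most_common() if n / total >= min_share]
```

With rules like these, a single message detected as Swahili among dozens in English and Spanish would be ignored, and the user would be classified as bilingual.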
37:54 - 37:56It's long!
-
37:56 - 37:57Yes.
-
37:58 - 38:00(woman) Hi, I understand
that you did it manually -
38:00 - 38:05but currently in existing research field
is there any software -
38:05 - 38:08that we can use to capture,
-
38:08 - 38:12to have access to all
these different tweets? -
38:12 - 38:15And to capture the different categories?
[inaudible] -
38:15 - 38:18Ok, so you mean the extraction?
-
38:19 - 38:20(woman) Yeah.
-
38:20 - 38:21No, I didn't do it manually.
-
38:21 - 38:23(woman) And the other,
I think the other part -
38:23 - 38:26of your data presentation
is visualizations coming out -
38:26 - 38:27like this graph.
-
38:27 - 38:33Can you show us what kind of research
do we have for social scientists -
38:33 - 38:35to present the data in a visual form?
-
38:35 - 38:37This is a tool I would recommend.
-
38:37 - 38:39[inaudible]
-
38:39 - 38:41So, the first question.
-
38:43 - 38:46All the extraction from Twitter,
it was automatic. -
38:46 - 38:49I didn't copy the tweets,
it was automatic. -
38:49 - 38:51I used the Twitter API.
-
38:51 - 38:55They have a process
for registered developers -
38:55 - 38:57and I extracted it automatically.
-
39:02 - 39:06Now, the tools, and I forgot
to put that in this slide -
39:06 - 39:09but in the beginning,
when I showed you the first visualization -
39:09 - 39:12I put the name of the tool in--
-
39:13 - 39:18I don't know if I translate well,
but I think it's G-E-- -
39:18 - 39:24You can see here, G-E-P-H-I,
I don't know how to pronounce it! -
39:24 - 39:27["Jefy" I think...]
-
39:28 - 39:32So, this is the one I've used
for the visualizations -
39:34 - 39:37and it's good because you can use it
on any platform. -
39:37 - 39:42So both on a Mac or a PC or Linux.
-
39:45 - 39:47Now, it has limitations for...
-
39:47 - 39:51mostly for network statistics
in my opinion. -
39:54 - 39:57The other one, that is very popular
is NodeXL. -
39:57 - 40:01And in fact it was developed
here in the HCI lab. -
40:02 - 40:04In the lab where I work.
-
40:05 - 40:07So, they collaborated with Microsoft.
-
40:07 - 40:10It's a template for Excel
-
40:11 - 40:13and it allows--
-
40:13 - 40:18In fact they are still adding new features
and there's two people working on it -
40:18 - 40:20in the lab.
-
40:20 - 40:24But the reason I haven't used it here,
is because I have a Mac -
40:24 - 40:29and also there's another reason
I like this positioning algorithm -
40:31 - 40:33and this is...
-
40:33 - 40:37this is another issue
I haven't talked about -
40:37 - 40:40is how you actually place the dots.
-
40:40 - 40:47And actually these algorithms for layout
use force-directed schemes -
40:49 - 40:51like in physics science.
-
40:51 - 40:54So if a node has a lot of links
with another node -
40:54 - 40:57they put it closer,
so it's like there's forces -
40:57 - 41:00or strings attaching the nodes.
-
41:01 - 41:04And depending on how many strings
there are, they're closer or farther. -
41:05 - 41:08There's physics science rules
for placing them. -
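To make the idea of forces and strings concrete, here is a small force-directed layout in the style of Fruchterman and Reingold's classic scheme. It is a teaching sketch, not the algorithm implemented in Gephi or NodeXL; the function name, step size, and cooling rate are assumptions.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Place nodes in 2D so that linked nodes end up close together."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between linked nodes
    step = 0.1                       # maximum movement per iteration

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Every pair of nodes repels, like charged particles.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                disp[a][0] += dx / dist * push
                disp[a][1] += dy / dist * push
                disp[b][0] -= dx / dist * push
                disp[b][1] -= dy / dist * push

        # Every link pulls its two endpoints together, like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            disp[a][0] -= dx / dist * pull
            disp[a][1] -= dy / dist * pull
            disp[b][0] += dx / dist * pull
            disp[b][1] += dy / dist * pull

        # Move each node along its net force, capped by the current step.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            move = min(length, step)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
        step *= 0.98  # cool down so the layout settles

    return pos
```

Given two tightly linked groups joined by a single bridge, a layout like this tends to draw each group as its own cluster, which is the visual effect the talk relies on to separate language groups.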
41:08 - 41:10But there's different algorithms
-
41:10 - 41:15but the other reason I chose Gephi
is that it has an algorithm -
41:15 - 41:21specifically in this tool
that places my language groups separately -
41:21 - 41:24more than any other algorithm
that I could use in NodeXL. -
41:24 - 41:29And it was more useful
to see the groups separated. -
41:30 - 41:33But you can use both
depending on what you want to do. -
41:33 - 41:36They both have weaknesses and strengths,
-
41:36 - 41:39different depending
on what you have to do. -
41:41 - 41:47NodeXL has more features
for processing many networks -
41:48 - 41:51and extracting network statistics
for many networks at the same time. -
41:52 - 41:57And it has a lot of interesting features,
maybe this is more manual. -
41:59 - 42:00I don't know.
-
42:00 - 42:05Somebody called it
"the Photoshop of visualization". -
42:09 - 42:14So I'm going to briefly comment
on the factor analysis. -
42:14 - 42:19The point here, what I want to see
is multilingual users of Twitter -
42:21 - 42:24are aware of their audience in a way.
-
42:25 - 42:29And they somehow perceive
how many followers -
42:29 - 42:32of this language or the other they have.
-
42:33 - 42:36Maybe not very consciously,
-
42:38 - 42:40but they perceive something.
-
42:40 - 42:42So, I went to see how this social network
-
42:42 - 42:47the fact that there's many languages
or just one in the social network -
42:48 - 42:53can affect the choice of language in this person,
the ego person. -
42:55 - 42:58So, I actually did a lot of testing,
different variables, -
42:58 - 43:01but I'm just going to focus
on the essence, -
43:01 - 43:06which is I have my dependent variable
which is the proportion of English -
43:06 - 43:11used by the ego. Say the ego has 50 posts,
maybe 60 percent of them are in English -
43:12 - 43:14and 40 percent in Spanish,
I don't know. -
43:15 - 43:19And then they have the factor
of how many users in the network -
43:19 - 43:21are in English
and how many are using other languages. -
43:22 - 43:24And then the multilingual index
of the network -
43:24 - 43:26- and this is my favorite part -
-
43:26 - 43:30because it's basically saying
-
43:30 - 43:36"is multilingualism encouraging English
as a lingua franca?" -
43:37 - 43:42especially on Twitter, where we have these
public posts that anybody can read. -
43:43 - 43:47So anyway... I'm not going to go
into the technical details -
43:48 - 43:51of bi-nodal statistical interpretation.
-
43:51 - 43:55What I wanted to do is
that in these combined effects -
43:56 - 44:00of the factors,
which one was more important? -
44:01 - 44:03Was heavier than the others?
-
44:03 - 44:07Had more weight in defining these
proportional [inaudible] used by the ego. -
44:09 - 44:11I tried other factors,
-
44:11 - 44:14I also looked at the use
of non-English language -
44:15 - 44:18In the end... there are certain,
-
44:20 - 44:21I mean, they're obvious somehow.
-
44:21 - 44:24I think it's more interesting the process
of what I've learned -
44:24 - 44:26than the results themselves.
-
44:27 - 44:30Because basically what I've learned
is that, yeah, -
44:31 - 44:33the English use of the network
-
44:33 - 44:36is encouraged by the use
of English by the ego -
44:36 - 44:41and in a certain way it's so important
that any other factor -
44:41 - 44:44is really not that important.
-
44:45 - 44:49And even the second most important,
the multilingual index -
44:50 - 44:55was so light compared with
the heavy impact of English -
44:56 - 44:57used in the network.
-
44:58 - 45:00But what I thought was really interesting
-
45:00 - 45:03was how do you define
the multilinguality of a network? -
45:04 - 45:07And with this I got help
from Jordan Boyd-Graber -
45:07 - 45:09who is also in the iSchool
-
45:09 - 45:14and in the Computational Linguistics
and Information Processing lab -
45:14 - 45:15here in Maryland.
-
45:15 - 45:18He helped me
with all these technical aspects. -
45:18 - 45:21And he was the one suggesting
"Well, why don't you look--" -
45:21 - 45:25"instead of just looking at the number
of languages in the network... -
45:25 - 45:29"because sometimes you get
wrongly detected languages... -
45:29 - 45:30like Swahili. Well, no one was really
speaking Swahili in this network. -
45:33 - 45:37There were technical challenges,
like I explained to you. -
45:38 - 45:42So maybe there's a high number
of languages in the network -
45:42 - 45:44but the network is mostly monolingual.
-
45:44 - 45:49Mostly everybody uses English
and just a few people maybe use others -
45:50 - 45:52or maybe just it got wrongly detected.
-
45:52 - 45:55And maybe you're just saying
-
45:55 - 45:57"Oh yeah, there's ten languages
in the network!" -
45:57 - 46:00and actually it's not
a very multilingual network at all. -
46:00 - 46:03So, we came up with this, the entropy.
-
46:03 - 46:06And this is a physics concept
that measures the disorder -
46:06 - 46:08in a system.
-
46:08 - 46:11And in this case, the entropy
would be my multilingual index -
46:11 - 46:17and what it's doing is providing a value
between 0 and 1 -
46:17 - 46:23So, with 0 it's a very homogeneous system,
everyone speaks the same language, -
46:24 - 46:27and if it's closer to 1,
it's really heterogeneous, -
46:27 - 46:29and it places importance
-
46:29 - 46:32on how many people
are using each language. -
46:32 - 46:36So, this is the equation,
just to show you it. -
46:38 - 46:41And it takes into account the number
of languages in the network -
46:41 - 46:45and then one of the variables
is how many nodes in that language -
46:45 - 46:48that there are divided by the total number
-
46:48 - 46:51and this is what gives the proportion
for example. -
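The equation on the slide is not reproduced in the transcript. One common way to get an entropy value between 0 and 1 is normalized Shannon entropy, H = -sum(p_i * ln p_i) / ln N, where p_i is the proportion of nodes using language i and N is the number of languages. The sketch below assumes that form; the function name is hypothetical.

```python
import math
from collections import Counter


def multilingual_index(languages):
    """Normalized Shannon entropy of the languages used by the nodes of a network.

    languages: one language label per node.
    Returns 0.0 when everyone uses the same language and 1.0 when all
    observed languages are used by equally many nodes.
    """
    counts = Counter(languages)
    total = sum(counts.values())
    if len(counts) < 2:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return entropy / math.log(len(counts))
```

Under this assumption, a network of 99 English users and one node wrongly detected as Swahili scores close to 0, even though a simple count would report two languages.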
46:53 - 46:57So just to let you know
that there's interesting lessons -
46:57 - 46:58from this study.
-
46:58 - 47:00Despite the research not being exciting!
-
47:01 - 47:03And this is what I'm doing right now.
-
47:05 - 47:08So, the intrinsic characteristic
of the message -
47:08 - 47:11how that influences the language choice.
-
47:11 - 47:16First, I'm wondering,
because I just saw it in the content -
47:19 - 47:22are replies encouraging people
to use their native language? -
47:23 - 47:27And public posts encouraging people
to use English as a lingua franca? -
47:28 - 47:30This is one that showed up the same.
-
47:30 - 47:34And I changed the handle,
for privacy reasons... -
47:35 - 47:38So this is the reply to somebody
and it's in Arabic. -
47:38 - 47:41And this is a public posting
and it's in English. -
47:42 - 47:46Now, the thing I'm looking at
is topic analysis -
47:46 - 47:50and I'm considering with Jordan
to do some automatic topic analysis -
47:51 - 47:54because there's many languages,
so I cannot decode it all -
47:55 - 47:57in many of them.
-
47:57 - 47:58Only in three, maybe four...
-
48:00 - 48:01So, I'm wondering,
-
48:01 - 48:04are technology topics favoring
the use of English? -
48:05 - 48:10And other topics,
international news maybe? -
48:11 - 48:16Whereas other topics
like national news or songs -
48:16 - 48:19they might be encouraging the use
of native languages. -
48:21 - 48:23And then I'm looking
if there's translations -
48:23 - 48:27or if there's cross-cultural words
that you can detect. -
48:27 - 48:29For instance, this person
is writing in English -
48:29 - 48:33but is recommending a visit to a museum
in the city of Lille in France. -
48:34 - 48:39So this person knows the city in France,
knows that to visit the museum -
48:39 - 48:41you go there.
-
48:41 - 48:43And this is what I call
cross-cultural words. -
48:44 - 48:49[What I kind of found] is that surprisingly
there's not many translation behaviors -
48:49 - 48:53going on, despite these people
being multilingual. -
48:53 - 48:56And this is what is going to trigger
some reflections. -
49:00 - 49:02How am I doing on time?
-
49:04 - 49:06(woman) 1:22.
-
49:06 - 49:10(man) Umm, it's usually an hour long...
-
49:10 - 49:14So, I will go on with my reflections
-
49:14 - 49:18to encourage some thoughts.
-
49:18 - 49:22So the greatest connecting power
is the will of users who want -
49:22 - 49:23to be connected.
-
49:23 - 49:28This is a really nice quality,
because the communities of interest -
49:28 - 49:32in social media, in Twitter
is what is bringing people -
49:32 - 49:34from different countries, together.
-
49:35 - 49:41And also experiences,
like the Voluntweeters, -
49:42 - 49:46so after the earthquake in Haiti,
there were these spontaneous -
49:46 - 49:49self-organizations of Twitter users
for translating tweets -
49:50 - 49:54and they called themselves Voluntweeters,
there's a paper about that-- -
49:54 - 49:59So this is the triggering
of social connections -
50:01 - 50:04across countries, across borders
and across languages. -
50:07 - 50:10But even when the social structure
could potentially facilitate -
50:10 - 50:13information diffusion
and cross-language linking -
50:15 - 50:17this condition is not sufficient.
-
50:17 - 50:20There are other factors
like the design of the interfaces -
50:20 - 50:22and the design of systems
that can influence... -
50:23 - 50:27can promote, or not translation behaviors
and cross-cultural awareness. -
50:28 - 50:32And with Wikipedia's
cross-language linking -
50:32 - 50:35you have links for many languages
for every article. -
50:37 - 50:41We should also acknowledge the dynamic
language preferences of multilingual users -
50:42 - 50:44so they could address their messages
to the appropriate audience. -
50:44 - 50:47I like the solution of Google+
with their circles -
50:48 - 50:52where I can put my friends and family
in Spain in a circle -
50:52 - 50:55and write them in Spanish.
-
50:55 - 51:01And then the recommendation of people
based on language profile -
51:01 - 51:04would be useful for this spontaneous
self-organization. -
51:06 - 51:08So, these are some of the things.
-
51:08 - 51:10The impact of mediation.
-
51:11 - 51:13Global Voices is
an international community of bloggers -
51:13 - 51:18that connects bloggers and citizens
from around the world -
51:19 - 51:21in different languages.
-
51:21 - 51:23And Scott Hale
-
51:23 - 51:27a student from Oxford University
led a very interesting study -
51:27 - 51:34after the earthquake in Haiti about blogs
in Spanish, Japanese and English -
51:36 - 51:39and he looked
at the cross-language linking -
51:39 - 51:41and focusing on this topic
over time. -
51:41 - 51:45And he discovered that 50 percent
of the cross-language linking -
51:45 - 51:48was happening through this platform,
Global Voices. -
51:49 - 51:52So, it had a very big impact
in the language links. -
51:54 - 51:58And finally, social media,
big media outlets, -
51:58 - 52:02people are interconnected
in these complex networks -
52:05 - 52:09and underlying is this language ecosystem.
-
52:09 - 52:13So we have the language ecosystem,
and on top of that -
52:13 - 52:15we have the social media ecosystem.
-
52:15 - 52:20People would share a video from YouTube
on Twitter, or news on Facebook. -
52:21 - 52:26What happens if we integrate
in this ecosystem -
52:27 - 52:31these platforms, like Global Voices,
like Universal Subtitles -
52:31 - 52:34which is a platform
for crowdsourcing subtitling of videos -
52:34 - 52:37and translation of subtitles
for videos. -
52:38 - 52:42If you integrate that and this
starts connecting, starts building paths -
52:42 - 52:46between languages,
that didn't exist before. -
52:46 - 52:51So I think we should make it easy
for multilingual people to translate -
52:51 - 52:55and subtitle all the content they like,
their favorite content -
52:56 - 53:00and share it with the appropriate audience
so they can start connecting -
53:00 - 53:03the language islands of the internet.
-
53:03 - 53:06And that way stories will travel
all over the world. -
53:09 - 53:12Particularly I would like to thank
Jen Golbeck, my adviser -
53:12 - 53:14and Fulbright for supporting
this research. -
53:14 - 53:19And then I open the space
for questions and your ideas -
53:19 - 53:22if this has triggered some thoughts.
-
53:24 - 53:26(woman) I have a question
about how this relates -
53:26 - 53:28to your Yahoo award.
-
53:29 - 53:35Well, they have the Internet Experiences
lab in California. -
53:35 - 53:36And they--
-
53:36 - 53:40So, we tend to think
maybe it's a super tiny place -
53:40 - 53:43but actually there are fields
-
53:43 - 53:45and I applied for the social systems.
-
53:45 - 53:49The social systems are a category.
-
53:49 - 53:55And I think that was embedded
in the Internet Experiences lab -
53:57 - 53:58and yeah, they liked it.
-
53:59 - 54:02(man) But is it this
work that they are interested in? -
54:02 - 54:03Yes.
-
54:03 - 54:04- The languages?
- Yes. -
54:04 - 54:08Well, now I have results,
because I wrote up reports -
54:09 - 54:12about what my work was about.
-
54:17 - 54:18Great.
-
54:22 - 54:23Yes?
-
54:23 - 54:26(woman) I was thinking about
if you analyzed the place... -
54:26 - 54:31like if there's any relationship
between tweeters and tweets -
54:31 - 54:34and the place that the people are.
-
54:36 - 54:40I mean, because it's not the same
being a Brazilian in Brazil -
54:40 - 54:43and tweeting in Portuguese
or being Brazilian in the US -
54:43 - 54:45and tweeting in Portuguese--
-
54:46 - 54:49There's many, many factors
that I haven't looked at. -
54:50 - 54:52It's not part of your study?
-
54:52 - 54:54But because I had to scope it somehow.
-
54:54 - 54:56There's so many factors.
-
54:57 - 55:00Geography was one that I was originally
intending to look at -
55:00 - 55:04but I found there were so many problems
to actually get the right geography -
55:04 - 55:07the right geolocation.
-
55:08 - 55:12The problem is that I didn't originally
collect the geolocation. -
55:12 - 55:16I think only a small percentage
of messages have... -
55:16 - 55:18geolocated information.
-
55:19 - 55:21I'm not sure about the percentage there.
-
55:21 - 55:25So there's only a small percentage
of messages that have geolocation. -
55:25 - 55:28There's issues with the accuracy...
-
55:28 - 55:31What I have collected is the information
in their profile -
55:32 - 55:35they can put the information
about the place, -
55:35 - 55:40but sometimes it's more
or less trustworthy, -
55:40 - 55:43sometimes there's nothing,
and sometimes there's just crazy stuff. -
55:43 - 55:45(audience laughs)
-
55:47 - 55:50So, something absolutely has to be there.
-
55:50 - 55:55If I wanted to expand this,
geography would be a nice place to go! -
55:55 - 55:57(woman) Ok.
-
56:00 - 56:01Yes?
-
56:01 - 56:02(man) Could you say a little bit more
-
56:02 - 56:05I think you said about the visualization
choices you made? -
56:05 - 56:06Oh yes, well...
-
56:08 - 56:11I tried this tool, NodeXL,
-
56:11 - 56:13I used both NodeXL and Gephi.
-
56:14 - 56:15There's more...
-
56:16 - 56:20I think there's, I don't remember the name
there's one that was developed -
56:20 - 56:22here in Maryland
-
56:22 - 56:24but it's not as user-friendly.
-
56:26 - 56:30But I've forgotten the name,
I will have to look it up. -
56:30 - 56:34And there's a lot of tools
that are for really technical people -
56:35 - 56:37that are handling millions of nodes.
-
56:38 - 56:41Because with these tools,
for social scientists or humanists -
56:41 - 56:42maybe they are not.
-
56:42 - 56:49Some tools can have maybe 300-400 nodes
and still be understandable. -
56:51 - 56:56But if you go beyond that,
actually visualizations get crazy -
56:56 - 57:02and even for more technical tools
for more technical people -
57:03 - 57:07there are hundreds or millions,
they cannot do visualizations -
57:08 - 57:12at some point they just give you
statistical measures. -
57:14 - 57:15I have to leave it out.
-
57:15 - 57:17I have a list of tools and that
-
57:17 - 57:21but if I need the names,
I need to go through everything. -
57:23 - 57:25(woman) But yours was Mac-accessible?
-
57:25 - 57:32Yes, this Gephi tool is Mac-accessible,
you can use it with Microsoft -
57:32 - 57:34with Mac and with Linux.
-
57:36 - 57:38And I forgot to say,
it's open source. -
57:43 - 57:49(woman) Did you find
studying languages and internet -
57:49 - 57:53was like a place, unexplored?
-
57:53 - 57:55Like here in the United States?
-
57:55 - 58:00Like when you began studying
or analyzing this -
58:00 - 58:04you felt that a lot of people
are doing this -
58:04 - 58:06or nobody is doing this
-
58:06 - 58:08and I'm the first one trying to--
-
58:08 - 58:13I'm not the first one,
but it's a very new area -
58:13 - 58:15to be exploring.
-
58:15 - 58:17So, it's very exciting
because of that. -
58:17 - 58:19Because there's so many
unanswered questions -
58:19 - 58:24and I find that surprisingly enough
the United States is not paying so much attention -
58:24 - 58:26to multilinguality issues.
-
58:26 - 58:31And I think that language policies
are very monolingual-oriented -
58:31 - 58:33but it's terrible
-
58:33 - 58:37because there's a whole lot
of multilinguality in this country. -
58:37 - 58:41There's so many people
speaking different languages -
58:43 - 58:45that I'm so amazed
about that contradiction. -
58:46 - 58:49Because in Europe,
it's an obvious challenge for us -
58:49 - 58:52because we need to understand each other
between all these countries -
58:52 - 58:54of the European Union.
-
58:54 - 58:58And there's a lot of money invested
in research that relates to multilinguality -
58:59 - 59:01and communication in languages
-
59:01 - 59:05and technology in particular,
cross-language systems -
59:05 - 59:09and in libraries there's a lot of work
going on. -
59:09 - 59:14There's investment in the research.
-
59:15 - 59:18So yeah, maybe in terms of investment
-
59:18 - 59:22the European Union is
not a bad place to be. -
59:22 - 59:24Better than the United States!
-
59:24 - 59:27But at the same time,
what I find interesting -
59:27 - 59:33is that here when I talk about it
people are really interested -
59:35 - 59:38and interested in the subject
and excited about it. -
59:38 - 59:41Maybe in Europe it looks more
like old news. -
59:41 - 59:44Like "yeah, we already know that."
-
59:44 - 59:46(audience laughs)
-
59:46 - 59:50So I find that it's exciting
to be seeing the audience -
59:50 - 59:52like "Oh yeah!"
It's so new. -
59:53 - 59:54(woman) Yes.
-
59:59 - 60:03(woman) As the emerging view
of research in the United States -
60:03 - 60:10can you show me which institutions
or which area of academic institutions -
60:12 - 60:15actually have more invested
in this topic in the US? -
60:16 - 60:19I'm not sure about the institutions.
-
60:21 - 60:26What I know, particularly,
in Indiana there's work -
60:27 - 60:29because Susan Herring
is a researcher there. -
60:31 - 60:33She has inspired my work.
-
60:33 - 60:36She published a book
The Multilingual Internet -
60:36 - 60:41and she has done research on blogs,
also communities -
60:42 - 60:45of different languages connecting blogs
in the blogosphere. -
60:45 - 60:51So she has been one of the ones,
one of the first tackling these issues -
60:51 - 60:55and she's still going
and she's doing something. -
60:55 - 60:59So, it's the University of Indiana,
I think. -
61:01 - 61:03Yeah, Susan Herring.
Look for her! -
61:06 - 61:09And also at the same university
there's Paolillo. -
61:10 - 61:13He's also doing research
in this area -
61:13 - 61:19and he actually published for UNESCO
for research on language diversity -
61:19 - 61:20on the internet.
-
61:22 - 61:23So Susan Herring and Paolillo,
-
61:23 - 61:25they are at the same university.
-
61:27 - 61:30Those are my inspiring ones.
-
61:34 - 61:37Well, at Harvard the Berkman Center
for Internet and Society also did -
61:37 - 61:39this mapping of the blogs.
-
61:39 - 61:41But they don't focus on languages.
-
61:42 - 61:45But there's tangential thing
around there. -
61:49 - 61:51(man) One more question?
-
61:54 - 61:55Well, thank you very much!
-
61:55 - 61:56Thanks!
-
61:56 - 61:58(audience applauds)
- Video Language:
- English
- Team:
- MITH Captions (Amara)