-
Welcome to my talk. Thank you for the kind introduction and the warm welcome!
-
As you can see, the talk has the allusive title "Surveillance and Language",
-
which obviously alludes to Foucault's "Surveillance and Punish" ("Discipline and Punish" in English).
-
However, long before Foucault presented the genesis of the disciplinary society,
-
you find a lovely moral tale in a children's book,
-
called "The Kid in the Glass House", by Heinrich Oswalt, written in 1877 and remarkably prescient.
-
In Frankfurt lives a glazier master,
Mr. Lebrecht Scheibenmann his name;
-
He had a little daughter,
Who never wanted to be washed.
-
And Gretchen came with sponge and soap,
So the bad girl ran away;
-
It even flipped the washing table -
The water flooded the house.
-
So Mr. Lebrecht Scheibenmann began
to build a strange house,
-
A house made only of glass, that, alas!
Was transparent throughout.
-
And in this glass house
the bad daughter was then seated.
-
So that, in order to see,
People stopped on the street.
-
So the kid was ashamed and ran around
In the entire house and screamed:
-
"Where can I hide?
You can see me from everywhere!
-
The roof, the cellar, every room
Is made of glass! you can always see me!"
-
The mother said: "My dear child!
There is a quick fix to that:
-
If people see you decent
They will pass by;
-
[...]
The daughter remembered that;
And tried to be seemly.
-
And because it no longer screamed while washing,
Other people never laughed;
-
Since everyone who peeked into the house,
Sees a kid that's very seemly.
-
And if you have your own child, you people,
That always screams while washing,
-
Just tell it Mr. Lebrecht Scheibenmann,
He will deliver you a glass house immediately.
-
Yes, there... tentative beginnings of applause and laughter
Applause
-
Yes, an interesting story, and one that certainly fits our times,
-
except that today Lebrecht Scheibenmann is named Keith Alexander and works for the NSA.
-
The NSA has made glass houses out of all our homes
-
we can all be seen in these glass houses
-
and we don't know, or at least I'm quite sure, that this pursues educational purposes:
-
that certain actions are no longer acceptable
-
and that we internalize this observation
-
Regarding this observation, language obviously plays a very important role
-
Many of our statements take place in the medium of language
-
This has also given hackers the idea to trick the NSA with a site like "Hello NSA"
-
A website which assembles suspicious words into messages like a "bullshitter"
-
and these are then tweeted, mailed or posted in chats
-
to achieve something like "Operation Troll the NSA":
-
jamming the NSA's scanners, in effect executing a DDoS attack
-
simply by sending too much content, which is basically suspicious on the basis of keywords
-
The point of my presentation is to show that this image of the NSA is wrong.
-
We cannot assume that people at the NSA really print something out
-
as soon as a keyword shows up laughter and then start to analyse everything
-
and look at it closer and do a qualitative evaluation
-
and this certainly is a very intensive task
-
and therefore a keyword spam DDoS would certainly be ineffective
-
You have probably all read the NSA's Thanksgiving talking points.
-
I don't know if you noticed that under the fourth point there is something utterly important:
-
"NSA brings together the best linguists, analysts, mathematicians, engineers and computer scientists
-
in the United States."
and the linguists are named first.
-
slight laughter
-
So you can see that the NSA is definitely aware of language as an important medium,
-
one that is also very important to them. So it surely makes sense to deal with it.
-
It so happens that the Secretary of the Interior has leaked the latest analysis software, the "Advanced Security Toolkit"
-
Developed by the Von-Leitner-Institute for distributed realtime java. laughter
-
First, we'll look at today's mission.
-
Today's task is to check out the German blogosphere
-
which seems to have been radicalizing since the grand coalition took over the government.
-
It's important to check whether actions are being prepared and, if necessary, to identify radical subjects
-
that are especially conspicuous. As a start, we choose our targets; of course, some are suggested to us.
-
Unfortunately I can only present a small selection of possible targets. I would have loved to take more
-
There are a few socio-critical blogs and news sites
-
like blog.fefe.de, Indymedia, Mädchenmannschaft, Netzpolitik.org, rebellmarkt.blogger.de
-
And religiously motivated websites like kreuz.net, the islambruderschaft.com blog and a Salafist discussion board
-
and of course we confirm the selection. This is a very sensitive selection
-
The following analyses are possible. Naturally, I can only show a selection of possible analytic tools today
-
I wish I could show lots more, but there won't be enough time.
-
First we'll look at what authors write about possible sensitive targets
-
Meaning we'll make a target analysis.
-
On the basis of Named Entity Recognition, it examines the collocations of possible terror targets
-
We have to... what is this? ...let's have a look in the manual, what Named Entities are
-
since it is our first day today
-
First of all, Named Entities are expressions which distinguish one entity clearly from other entities with similar attributes
-
Spontaneously one thinks of names, but it's not trivial to say what a name is
-
Accordingly, Named Entity Recognition is the procedure with which one identifies such Named Entities
-
There are, of course, different classes of Named Entities, e.g. people, organisations, places
-
Sometimes it's not very clear what belongs to a certain Named Entity, e.g. "der Bundestag" (Lower House of German Parliament)
-
this can be a geographical place as well as an organisation
-
Now we still need to know what collocations are
-
They are word combinations that occur together more frequently than chance would predict
-
so "we define a collocation as a combination of two words that exhibit a tendency to occur near each other in natural language, that is, to cooccur"
-
like "take a road", "go down a road"
-
Those are typical connections between the words "road", "go down", or "take"
-
and these connections form collocations if they occur more often than chance would predict,
-
as we could determine with statistical tests
-
and we can observe them in natural language
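-
One common choice for such a statistical test is the log-likelihood ratio on a 2x2 contingency table. The sketch below is an illustration under assumptions, not necessarily the test used in the talk: the function name and all counts are hypothetical. A pair counts as a candidate collocation when it cooccurs more often than expected and the score exceeds the chi-square critical value for one degree of freedom.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """Log-likelihood ratio G2 for a 2x2 contingency table.

    o11: contexts containing both words
    o12: contexts containing word A but not word B
    o21: contexts containing word B but not word A
    o22: contexts containing neither word
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect if the two words were independent.
            expected = rows[i] * cols[j] / n
            if observed[i][j] > 0:
                g2 += observed[i][j] * math.log(observed[i][j] / expected)
    return 2 * g2

CRITICAL_VALUE = 3.84  # chi-square, 1 degree of freedom, p = 0.05

# Hypothetical counts: the first table matches chance exactly,
# the second shows two words that cooccur far more often than expected.
independent = log_likelihood(10, 90, 90, 810)
associated = log_likelihood(50, 50, 50, 850)
print(independent, associated > CRITICAL_VALUE)
```

Note that the score alone does not say whether two words attract or repel each other, so the observed joint count still has to be compared with the expected one.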
-
One example - you don't need to read that now - I wanted to show an example for the word "Spezialexperte"
-
here you can see the "keyword in context" view for the requested keyword
-
and you can see the contexts of this word, so apparently they haven't found a "chosen special expert for internet issues"
-
We don't need to play a guessing game about which blog this could come from
-
What you do then, for a collocation analysis you examine contexts
-
e.g. here five words on the left, five words on the right till the beginning or end of a sentence
-
You just count the words that are in the blue area
-
and you compare their relative frequency with that of the words further to the left and right, in the white area
-
If a word appears significantly more frequently in the blue area, you can say it is a collocation of the word "Spezialexperte"
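-
The counting described here can be sketched roughly as follows. This is a minimal illustration under assumptions: the function name, the example sentences, the whitespace tokenization and the thresholds are all hypothetical, and a simple frequency ratio stands in for a proper significance test. Words within `span` positions of the keyword, inside the same sentence, play the role of the blue area; everything else is the white area.

```python
from collections import Counter

def window_collocates(sentences, node, span=5, min_count=2, min_ratio=2.0):
    """Return words that are relatively more frequent near `node` than elsewhere."""
    inside, outside = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        positions = [i for i, tok in enumerate(tokens) if tok == node]
        for i, tok in enumerate(tokens):
            if tok == node:
                continue
            # "Blue area": at most `span` words left or right of the keyword.
            near = any(abs(i - p) <= span for p in positions)
            (inside if near else outside)[tok] += 1
    n_in, n_out = sum(inside.values()), sum(outside.values())
    collocates = {}
    for word, count in inside.items():
        rel_in = count / n_in
        # Add-one smoothing so unseen words do not cause a division by zero.
        rel_out = (outside[word] + 1) / (n_out + 1)
        if count >= min_count and rel_in / rel_out >= min_ratio:
            collocates[word] = round(rel_in / rel_out, 2)
    return collocates

# Hypothetical mini-corpus, already split into sentences.
sentences = [
    "the spezialexperte from adobe fixed nothing again",
    "another spezialexperte from adobe joined the meeting",
    "the weather in the city was fine and the train was late",
    "the meeting about the budget took the whole day in the city",
]
collocates = window_collocates(sentences, "spezialexperte", span=2)
print(collocates)
```

A frequent word like "the" shows up next to the keyword too, but it is just as frequent everywhere else, so it drops out.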
-
What is striking here for example is "kriegen" or "Adobe-Spezialexperten" laughter
-
You can visualize collocation as graphs laughter
-
The nodes denote lexemes (I'm not sure what's there to laugh about)
-
(that's serious linguistics!) and the edges denote "is collocation of"
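-
Such a graph can be held in a plain adjacency structure. The following is a sketch under assumptions: the pairs are hypothetical examples, the function names are made up, and the relation "is collocation of" is treated as symmetric, which the talk does not specify.

```python
from collections import defaultdict, deque

def build_graph(pairs):
    """Nodes are lexemes; an edge means "is collocation of" (assumed symmetric)."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def within(graph, start, max_steps):
    """Breadth-first search: lexemes reachable in at most `max_steps` edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# Hypothetical collocation pairs.
pairs = [("spezialexperte", "adobe"), ("spezialexperte", "kriegen"), ("adobe", "backup")]
graph = build_graph(pairs)
print(within(graph, "spezialexperte", 1))
print(within(graph, "spezialexperte", 2))
```

Raising `max_steps` lets the graph branch out further from the starting lexeme.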
-
So here you see "the best of the best, sir", Sarrazin and Mehdorn belong there.
-
It branches out a little further. „Adobe-Backup“, „Backup-Spezialexperten“ … interesting
-
Ok. Now we are in the area of the target analysis. Let's start the analysis.
-
What is it we are doing there? What we're doing is recognizing all Named Entities in all corpora
-
We first calculate them with machine learning methods.
-
Meaning you examine certain contexts in which the Named Entities stand.
-
We have a training corpus in which the Named Entities are already marked,
-
e.g. that "Bundestag" is an organisation and the software learns from these contexts
-
what typical contexts for such Named Entities are and tries to apply them to new Corpora
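-
The idea of learning from contexts can be illustrated with a deliberately tiny classifier. This is a toy sketch, not a real Named Entity Recognition system: the training examples, the labels `ORG` and `LOC`, and the choice of only the previous and next word as features are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def context_features(tokens, i):
    """Describe the context of token i by its left and right neighbour."""
    prev_word = tokens[i - 1].lower() if i > 0 else "<s>"
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return [f"prev={prev_word}", f"next={next_word}"]

def train(examples):
    """Count, per label, how often each context feature was seen."""
    counts = defaultdict(Counter)
    for tokens, i, label in examples:
        counts[label].update(context_features(tokens, i))
    return counts

def predict(counts, tokens, i):
    """Pick the label whose training contexts overlap most with this context."""
    features = context_features(tokens, i)
    scores = {label: sum(seen[f] for f in features) for label, seen in counts.items()}
    return max(scores, key=scores.get)

# Hypothetical annotated training corpus: (tokens, index of the entity, label).
training = [
    (["The", "Bundestag", "passed", "the", "law"], 1, "ORG"),
    (["The", "SPD", "passed", "a", "motion"], 1, "ORG"),
    (["She", "lives", "in", "Berlin", "."], 3, "LOC"),
    (["He", "travels", "to", "Frankfurt", "."], 3, "LOC"),
]
model = train(training)
print(predict(model, ["The", "CDU", "passed", "nothing"], 1))
print(predict(model, ["We", "are", "in", "Hamburg", "."], 3))
```

Neither "CDU" nor "Hamburg" appears in the training data; the decision rests on the surrounding words alone.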
-
What we're doing here: we identify in all corpora, in all blogs, that we examine, the Named Entities.
-
we categorize these Named Entities into people, organisations, geographical locations and other
-
and then we calculate the collocations of the relevant Named Entities.
-
e.g. "Angela Merkel" could be interesting or something
-
And then we also check whether the collocations contain any danger words,
-
meaning words that indicate terror plans or the like. Now we'll do that.
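-
The last step, checking collocations against danger words, might look like this in outline. Everything specific here is an assumption: the word list, the example collocations and above all the mapping to a level between 1 and 5 are hypothetical, since the talk does not say how the toolkit derives its danger level.

```python
DANGER_WORDS = {"attack", "bomb", "hang", "force"}  # hypothetical list

def danger_hits(collocations_by_entity, danger_words=DANGER_WORDS):
    """For each Named Entity, list the collocates that are danger words."""
    hits = {}
    for entity, collocates in collocations_by_entity.items():
        found = sorted(set(collocates) & danger_words)
        if found:
            hits[entity] = found
    return hits

def danger_level(hits, max_level=5):
    """Hypothetical scale: level 1 means no hits, each hit adds one level."""
    total = sum(len(words) for words in hits.values())
    return min(max_level, 1 + total)

# Hypothetical collocations per Named Entity.
collocations = {
    "SPD": ["hang", "force", "top candidate", "betrayer party"],
    "Berlin": ["slum", "hipster"],
}
hits = danger_hits(collocations)
print(hits, danger_level(hits))
```

A hit only says that a word from the list occurs near the entity; whether it signals a plan or just an accusation still has to be judged from the context.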
-
The analysis seems to be finished and the result is danger level 1 of 5, so it's not really dramatic
-
the software suggests a check of the danger level regarding Berlin
-
which is the location flagged for donalphonso, the blogger of Rebellmarkt
-
A potential target of Fefe is the SPD (Social Democratic Party) laughter and that of Mädchenmannschaft is "Kristina Schroeder" (Minister of Family Affairs)
-
As an example, we have now received an order to see what bad things donalphonso writes about Berlin and whether he is planning something
-
Now we can display collocation graphs or geo-collocations
-
This means that we have a map, and at the places which donalphonso writes about there are the corresponding collocations
-
In America he writes about Boyd and culture, lone perpetrators, confused and "hate mail" and stuff
-
Germany and Central Europe are the focus, of course. It goes down as far as Italy
-
There you can also see what donalphonso writes about
-
We're approaching Berlin. There are too many collocations to evaluate
-
So we look at our collocation graph and look for references to terror that could take place
-
I'll read out some: „Berlin“, „Slum“, „Reichshauptslum“, „arm“, „Transferleistung“, „abscheulich“, „Berliner Hipster“ laughter
-
While this may show quite a negative attitude towards the subject, it hardly suggests terror.
-
The other potential target was the organisation "SPD", in Fefe's case
-
We'll look at the collocation graph. Fefe and the SPD. laughter, applause
-
hey „betrayer party“, „fall-over party“, let's turn back briefly
-
In total, in the entire list we really found words such as:
-
„hang“, „force“, „top candidate“, „betrayer party“, „fall-over party“, „pest“, „cholera“ laughter, applause
-
If we look at the collocation graph, we can already see that those are accusations
-
But Fefe is not planning to finish the top candidate off
-
Let's continue with the ideology monitor. We'd want to take some measurements now...
-
It has been proven that the NSA has filed many software patents for Named Entity Recognition algorithms
-
There has been quite some research going on some time ago
-
But first you find out what interesting targets are and what is said about them
-
You can certainly improve that by measuring ideologies.
-
What we want to calculate now is the similarity of texts, from blogs to certain ideologies
-
We have the possibility of measuring extreme leftist, rightist or Islamist attitudes
-
We do this by calculating typical collocations... for a certain corpus
-
From this corpus we learn. So that's our model of comparison.
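-
One way to compare a blog with such a model is to treat the typical collocations of each corpus as a weighted profile and measure how similar two profiles are. The sketch below uses cosine similarity, which is an assumption; the talk does not name the measure, and the model names, collocation pairs and weights are purely hypothetical.

```python
import math
from collections import Counter

def cosine(profile_a, profile_b):
    """Cosine similarity between two collocation profiles (pair -> weight)."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[k] * profile_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical comparison models learned from reference corpora.
models = {
    "ideology_a": Counter({("class", "struggle"): 3, ("capital", "exploit"): 2}),
    "ideology_b": Counter({("nation", "pride"): 3, ("border", "close"): 2}),
}
# Hypothetical profile of the blog under examination.
blog = Counter({("class", "struggle"): 1, ("coffee", "good"): 4})

scores = {name: cosine(blog, model) for name, model in models.items()}
closest = max(scores, key=scores.get)
print(scores, closest)
```

A score of 0 means no typical collocation is shared; a score near 1 means the blog uses the same collocations with similar weights.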
-
subtitles created by c3subtitles.de