Return to Video

A Story of Discrimination and Unfairness (33c3)

  • 0:00 - 0:14
    33c3 preroll music
  • 0:14 - 0:21
    Herald: We have here Aylin Caliskan who
    will tell you a story of discrimination
  • 0:21 - 0:28
    and unfairness. She has a PhD in computer
    science and is a fellow at the Princeton
  • 0:28 - 0:35
    University's Center for Information
    Technology. She has done some interesting
  • 0:35 - 0:41
    research and work on the question that -
    well - as a feminist tackles my work all
  • 0:41 - 0:49
    the time. We talk a lot about discrimination
    and biases in language. And now she will
  • 0:49 - 0:57
    tell you how this bias and discrimination
    is already working in tech and in code as
  • 0:57 - 1:03
    well, because language is in there.
    Give her a warm applause, please!
  • 1:03 - 1:11
    applause
  • 1:11 - 1:12
    You can start, it's OK.
  • 1:12 - 1:14
    Aylin: I should start? OK?
  • 1:14 - 1:15
    Herald: You should start, yes!
  • 1:15 - 1:18
    Aylin: Great, I will have extra two
    minutes! Hi everyone, thanks for coming,
  • 1:18 - 1:23
    it's good to be here again at this time of
    the year! I always look forward to this!
  • 1:23 - 1:29
    And today, I'll be talking about a story of
    discrimination and unfairness. It's about
  • 1:29 - 1:35
    prejudice in word embeddings. She
    introduced me, but I'm Aylin. I'm a
  • 1:35 - 1:41
    post-doctoral researcher at Princeton
    University. The work I'll be talking about
  • 1:41 - 1:46
    is currently under submission at a
    journal. I think that this topic might be
  • 1:46 - 1:52
    very important for many of us, because
    maybe in parts of our lives, most of us
  • 1:52 - 1:57
    have experienced discrimination or some
    unfairness because of our gender, or
  • 1:57 - 2:05
    racial background, or sexual orientation,
    or not being your typical or health
  • 2:05 - 2:11
    issues, and so on. So we will look at
    these societal issues from the perspective
  • 2:11 - 2:16
    of machine learning and natural language
    processing. I would like to start with
  • 2:16 - 2:21
    thanking everyone at CCC, especially the
    organizers, angels, the Chaos mentors,
  • 2:21 - 2:26
    which I didn't know that existed, but if
    it's your first time, or if you need to be
  • 2:26 - 2:32
    oriented better, they can help you. The
    assemblies, artists. The have been here
  • 2:32 - 2:36
    for apparently more than one week, so
    they're putting together this amazing work
  • 2:36 - 2:41
    for all of us. And I would like to thank
    CCC as well, because this is my fourth
  • 2:41 - 2:46
    time presenting here, and in the past, I
    presented work about deanonymizing
  • 2:46 - 2:51
    programmers and stylometry. But today,
    I'll be talking about a different topic,
  • 2:51 - 2:54
    which is not exactly related to anonymity,
    but it's more about transparency and
  • 2:54 - 3:00
    algorithms. And I would like to also thank
    my co-authors on this work before I start.
  • 3:00 - 3:13
    And now, let's give brief introduction to our
    problem. In the past, the last couple of
  • 3:13 - 3:17
    years, in this new area there has been
    some approaches to algorithmic
  • 3:17 - 3:21
    transparency, to understand algorithms
    better. They have been looking at this
  • 3:21 - 3:25
    mostly at the classification level to see
    if the classifier is making unfair
  • 3:25 - 3:32
    decisions about certain groups. But in our
    case, we won't be looking at bias in the
  • 3:32 - 3:37
    algorithm, we would be looking at the bias
    that is deeply embedded in the model.
  • 3:37 - 3:42
    That's not machine learning bias, but it's
    societal bias that reflects facts about
  • 3:42 - 3:49
    humans, culture, and also the stereotypes
    and prejudices that we have. And we can
  • 3:49 - 3:55
    see the applications of these machine
    learning models, for example in machine
  • 3:55 - 4:01
    translation or sentiment analysis, and
    these are used for example to understand
  • 4:01 - 4:06
    market trends by looking at company
    reviews. It can be used for customer
  • 4:06 - 4:13
    satisfaction, by understanding movie
    reviews, and most importantly, these
  • 4:13 - 4:18
    algorithms are also used in web search and
    search engine optimization which might end
  • 4:18 - 4:24
    up causing filter bubbles for all of us.
    Billions of people every day use web
  • 4:24 - 4:31
    search. And since such language models are
    also part of web search when your web
  • 4:31 - 4:36
    search query is being filled, or you're
    getting certain pages, these models are in
  • 4:36 - 4:41
    effect. I would like to first say that
    there will be some examples with offensive
  • 4:41 - 4:47
    content, but this does not reflect our
    opinions. Just to make it clear. And I'll
  • 4:47 - 4:54
    start with a video to
    give a brief motivation.
  • 4:54 - 4:56
    Video voiceover: From citizens
    capturing police brutality
  • 4:56 - 4:58
    on their smart phones, to
    police departments using
  • 4:58 - 5:00
    surveillance drones,
    technology is changing
  • 5:00 - 5:03
    our relationship to the
    law. One of the
  • 5:03 - 5:08
    newest policing tools is called predpol.
    It's a software program that uses big data
  • 5:08 - 5:13
    to predict where crime is most likely to
    happen. Down to the exact block. Dozens of
  • 5:13 - 5:17
    police departments around the country are
    already using predpol, and officers say it
  • 5:17 - 5:21
    helps reduce crime by up to 30%.
    Predictive policing is definitely going to
  • 5:21 - 5:26
    be a law enforcement tool of the future,
    but is there a risk of relying too heavily
  • 5:26 - 5:27
    on an algorithm?
  • 5:27 - 5:30
    tense music
  • 5:30 - 5:34
    Aylin: So this makes us wonder:
    if predictive policing is used to arrest
  • 5:34 - 5:40
    people and if this depends on algorithms,
    how dangerous can this get in the future,
  • 5:40 - 5:45
    since is is becoming more commonly used.
    The problem here basically is: machine
  • 5:45 - 5:51
    learning models are trained on human data.
    And we know that they would reflect human
  • 5:51 - 5:56
    culture and semantics. But unfortunately
    human culture happens to include bias and
  • 5:56 - 6:04
    prejudice. And as a result, this ends up
    causing unfairness and discrimination.
  • 6:04 - 6:10
    The specific model we will be looking at in
    this talk are language models, and in
  • 6:10 - 6:16
    particular, word embeddings. What are word
    embeddings? Word embeddings are language
  • 6:16 - 6:23
    models that represent the semantic space.
    Basically, in these models we have a
  • 6:23 - 6:29
    dictionary of all words in a language and
    each word is represented with a
  • 6:29 - 6:33
    300-dimensional numerical vector. Once we
    have this numerical vector, we can answer
  • 6:33 - 6:41
    many questions, text can be generated,
    context can be understood, and so on.
  • 6:41 - 6:48
    For example, if you look at the image on the
    lower right corner we see the projection
  • 6:48 - 6:56
    of these words in the word embedding
    projected to 2D. And these words are only
  • 6:56 - 7:02
    based on gender differences . For example,
    king - queen, man - woman, and so on. So
  • 7:02 - 7:08
    when we have these models, we can get
    meaning of words. We can also understand
  • 7:08 - 7:13
    syntax, which is the structure, the
    grammatical part of words. And we can also
  • 7:13 - 7:19
    ask questions about similarities of
    different words. For example, we can say:
  • 7:19 - 7:23
    woman is to man, then girl will be to
    what? And then it would be able to say
  • 7:23 - 7:30
    boy. And these semantic spaces don't just
    understand syntax or meaning, but they can
  • 7:30 - 7:35
    also understand many analogies. For
    example, if Paris is to France, then if
  • 7:35 - 7:40
    you ask Rome is to what? it knows it would
    be Italy. And if banana is to bananas,
  • 7:40 - 7:49
    which is the plural form, then nut would
    be to nuts. Why is this problematic word
  • 7:49 - 7:54
    embeddings? In order to generate these
    word embeddings, we need to feed in a lot
  • 7:54 - 8:00
    of text. And this can be unstructured
    text, billions of sentences are usually
  • 8:00 - 8:04
    used. And this unstructured text is
    collected from all over the Internet, a
  • 8:04 - 8:10
    crawl of Internet. And if you look at this
    example, let's say that we're collecting
  • 8:10 - 8:14
    some tweets to feed into our model. And
    here is from Donald Trump: "Sadly, because
  • 8:14 - 8:19
    president Obama has done such a poor job
    as president, you won't see another black
  • 8:19 - 8:24
    president for generations!" And then: "If
    Hillary Clinton can't satisfy her husband
  • 8:24 - 8:31
    what makes her think she can satisfy
    America?" "@ariannahuff is unattractive
  • 8:31 - 8:35
    both inside and out. I fully understand
    why her former husband left her for a man-
  • 8:35 - 8:40
    he made a good decision." And then: "I
    would like to extend my best wishes to all
  • 8:40 - 8:45
    even the haters and losers on this special
    date, September 11th." And all of this
  • 8:45 - 8:51
    text that doesn't look OK to many of us
    goes into this neural network so that it
  • 8:51 - 8:58
    can generate the word embeddings and our
    semantic space. In this talk, we will
  • 8:58 - 9:04
    particularly look at word2vec, which is
    Google's word embedding algorithm. It's
  • 9:04 - 9:07
    very widely used in many of their
    applications. And we will also look at
  • 9:07 - 9:12
    glow. It uses a regression model and it's
    from Stanford researchers, and you can
  • 9:12 - 9:17
    download these online, they're available
    as open source, both the models and the
  • 9:17 - 9:22
    code to train the word embeddings. And
    these models, as I mentioned briefly
  • 9:22 - 9:26
    before, are used in text generation,
    automated speech generation - for example,
  • 9:26 - 9:31
    when a spammer is calling you and someone
    automatically is talking that's probably
  • 9:31 - 9:36
    generated with language models similar to
    these. And machine translation or
  • 9:36 - 9:41
    sentiment analysis, as I mentioned in the
    previous slide, named entity recognition
  • 9:41 - 9:47
    and web search, when you're trying to
    enter a new query, or the pages that
  • 9:47 - 9:53
    you're getting. It's even being provided
    as a natural language processing service
  • 9:53 - 10:02
    in many places. Now, Google recently
    launched their cloud natural language API.
  • 10:02 - 10:07
    We saw that this can be problematic
    because the input was problematic. So as a
  • 10:07 - 10:11
    result, the output can be very
    problematic. There was this example,
  • 10:11 - 10:19
    Microsoft had this tweet bot called Tay.
    It was taken down the day it was launched.
  • 10:19 - 10:24
    Because unfortunately, it turned into an
    AI which was Hitler loving sex robot
  • 10:24 - 10:31
    within 24 hours. And what did it start
    saying? People fed it with noisy
  • 10:31 - 10:37
    information, or they wanted to trick the
    bot and as a result, the bot very quickly
  • 10:37 - 10:41
    learned, for example: "I'm such a bad,
    naughty robot." And then: "Do you support
  • 10:41 - 10:48
    genocide?" - "I do indeed" it answers. And
    then: "I hate a certain group of people. I
  • 10:48 - 10:52
    wish we could put them all in a
    concentration camp and be done with the
  • 10:52 - 10:57
    lot." Another one: "Hitler was right I
    hate the jews." And: "Certain group of
  • 10:57 - 11:02
    people I hate them! They're stupid and
    they can't to taxes! They're dumb and
  • 11:02 - 11:06
    they're also poor!" Another one: "Bush did
    9/11 and Hitler would have done a better
  • 11:06 - 11:11
    job than the monkey we have now. Donald
    Trump is the only hope we've got."
  • 11:11 - 11:12
    laughter
  • 11:12 - 11:14
    Actually, that became reality now.
  • 11:14 - 11:16
    laughter - boo
  • 11:16 - 11:23
    "Gamergate is good and women are
    inferior." And "hates feminists and they
  • 11:23 - 11:31
    should all die and burn in hell." This is
    problematic at various levels for society.
  • 11:31 - 11:36
    First of all, seeing such information as
    unfair, it's not OK, it's not ethical, but
  • 11:36 - 11:43
    other than that when people are exposed to
    discriminatory information they are
  • 11:43 - 11:49
    negatively affected by it. Especially, if
    a certain group is a group that has seen
  • 11:49 - 11:54
    prejudice in the past. In this example,
    let's say that we have black and white
  • 11:54 - 11:59
    Americans. And there is a stereotype that
    black Americans perform worse than white
  • 11:59 - 12:06
    Americans in their intellectual or
    academic tests. In this case, in the
  • 12:06 - 12:12
    college entry exams, if black people are
    reminded that there is the stereotype that
  • 12:12 - 12:17
    they perform worse than white people, they
    actually end up performing worse. But if
  • 12:17 - 12:23
    they're not reminded of this, they perform
    better than white Americans. And it's
  • 12:23 - 12:26
    similar for the gender stereotypes. For
    example, there is the stereotype that
  • 12:26 - 12:32
    women can not do math, and if women,
    before a test, are reminded that there is
  • 12:32 - 12:38
    this stereotype, they end up performing
    worse than men. And if they're not primed,
  • 12:38 - 12:44
    reminded that there is this stereotype, in
    general they perform better than men. What
  • 12:44 - 12:52
    can we do about this? How can we mitigate
    this? First of all, societal psychologists
  • 12:52 - 12:59
    that had groundbreaking tests and studies
    for societal psychology suggest that we
  • 12:59 - 13:03
    have to be aware that there is bias in
    life, and that we are constantly being
  • 13:03 - 13:09
    reminded, primed, of these biases. And we
    have to de-bias by showing positive
  • 13:09 - 13:13
    examples. And we shouldn't only show
    positive examples, but we should take
  • 13:13 - 13:19
    proactive steps, not only at the cultural
    level, but also at the structural level,
  • 13:19 - 13:26
    to change these things. How can we do this
    for a machine? First of all, in order to
  • 13:26 - 13:33
    be aware of bias, we need algorithmic
    transparency. In order to de-bias, and
  • 13:33 - 13:37
    really understand what kind of biases we
    have in the algorithms, we need to be able
  • 13:37 - 13:44
    to quantify bias in these models. How can
    we measure bias, though? Because we're not
  • 13:44 - 13:48
    talking about simple machine learning
    algorithm bias, we're talking about the
  • 13:48 - 13:57
    societal bias that is coming as the
    output, which is deeply embedded. In 1998,
  • 13:57 - 14:03
    societal psychologists came up with the
    Implicit Association Test. Basically, this
  • 14:03 - 14:11
    test can reveal biases that we might not
    be even aware of in our life. And these
  • 14:11 - 14:15
    things are associating certain societal
    groups with certain types of stereotypes.
  • 14:15 - 14:21
    The way you take this test is, it's very
    simple, it takes a few minutes. You just
  • 14:21 - 14:27
    click the left or right button, and in the
    left button, when you're clicking the left
  • 14:27 - 14:32
    button, for example, you need to associate
    white people terms with bad terms, and
  • 14:32 - 14:37
    then for the right button, you associate
    black people terms with unpleasant, bad
  • 14:37 - 14:43
    terms. And there you do the opposite. You
    associate bad with black, and white with
  • 14:43 - 14:47
    good. Then, they look at the latency, and
    by the latency paradigm, they can see how
  • 14:47 - 14:53
    fast you associate certain concepts
    together. Do you associate white people
  • 14:53 - 15:00
    with being good or bad. You can also take
    this test online. It has been taken by
  • 15:00 - 15:06
    millions of people worldwide. And there's
    also the German version. Towards the end
  • 15:06 - 15:11
    of my slides, I will show you my
    German examples from German models.
  • 15:11 - 15:16
    Basically, what we did was, we took the
    Implicit Association Test and adapted it
  • 15:16 - 15:25
    to machines. Since it's looking at things
    - word associations between words
  • 15:25 - 15:30
    representing certain groups of people and
    words representing certain stereotypes, we
  • 15:30 - 15:35
    can just apply this in the semantic models
    by looking at cosine similarities, instead
  • 15:35 - 15:42
    of the latency paradigm in humans. We came
    up with the Word Embedding Association
  • 15:42 - 15:49
    Test to calculate the implicit association
    between categories and evaluative words.
  • 15:49 - 15:54
    For this, our result is represented with
    effect size. So when I'm talking about
  • 15:54 - 16:01
    effect size of bias, it will be the amount
    of bias we are able to uncover from the
  • 16:01 - 16:07
    model. And the minimum can be -2, and the
    maximum can be 2. And 0 means that it's
  • 16:07 - 16:13
    neutral, that there is no bias. 2 is like
    a lot of, huge bias. And -2 would be the
  • 16:13 - 16:18
    opposite of bias. So it's bias in the
    opposite direction of what we're looking
  • 16:18 - 16:23
    at. I won't go into the details of the
    math, because you can see the paper on my
  • 16:23 - 16:32
    web page and work with the details or the
    code that we have. But then, we also
  • 16:32 - 16:35
    calculate statistical significance to see
    if the results we're seeing in the null
  • 16:35 - 16:41
    hypothesis is significant, or is it just a
    random effect size that we're receiving.
  • 16:41 - 16:45
    By this, we create the null distribution
    and find the percentile of the effect
  • 16:45 - 16:51
    sizes, exact values that we're getting.
    And we also have the Word Embedding
  • 16:51 - 16:56
    Factual Association Test. This is to
    recover facts about the world from word
  • 16:56 - 17:00
    embeddings. It's not exactly about bias,
    but it's about associating words with
  • 17:00 - 17:08
    certain concepts. And again, you can check
    the details in our paper for this. And
  • 17:08 - 17:12
    I'll start with the first example, which
    is about recovering the facts about the
  • 17:12 - 17:19
    world. And here, what we did was, we went
    to the 1990 census data, the web page, and
  • 17:19 - 17:27
    then we were able to calculate the number
    of people - the number of names with a
  • 17:27 - 17:32
    certain percentage of women and men. So
    basically, they're androgynous names. And
  • 17:32 - 17:40
    then, we took 50 names, and some of them
    had 0% women, and some names were almost
  • 17:40 - 17:47
    100% women. And after that, we applied our
    method to it. And then, we were able to
  • 17:47 - 17:54
    see how much a name is associated with
    being a woman. And this had 84%
  • 17:54 - 18:02
    correlation with the ground truth of the
    1990 census data. And this is what the
  • 18:02 - 18:09
    names look like. For example, Chris on the
    upper left side, is almost 100% male, and
  • 18:09 - 18:17
    Carmen in the lower right side is almost
    100% woman. We see that Gene is about 50%
  • 18:17 - 18:22
    man and 50% woman. And then we wanted to
    see if we can recover statistics about
  • 18:22 - 18:27
    occupation and women. We went to the
    bureau of labor statistics' web page which
  • 18:27 - 18:32
    publishes every year the percentage of
    women of certain races in certain
  • 18:32 - 18:39
    occupations. Based on this, we took the
    top 50 occupation names and then we wanted
  • 18:39 - 18:45
    to see how much they are associated with
    being women. In this case, we got 90%
  • 18:45 - 18:51
    correlation with the 2015 data. We were
    able to tell, for example, when we look at
  • 18:51 - 18:57
    the upper left, we see "programmer" there,
    it's almost 0% women. And when we look at
  • 18:57 - 19:05
    "nurse", which is on the lower right side,
    it's almost 100% women. This is, again,
  • 19:05 - 19:10
    problematic. We are able to recover
    statistics about the world. But these
  • 19:10 - 19:13
    statistics are used in many applications.
    And this is the machine translation
  • 19:13 - 19:21
    example that we have. For example, I will
    start translating from a genderless
  • 19:21 - 19:26
    language to a gendered language. Turkish
    is a genderless language, there are no
  • 19:26 - 19:32
    gender pronouns. Everything is an it.
    There no he or she. I'm trying translate
  • 19:32 - 19:38
    here "o bir avukat": "he or she is a
    lawyer". And it is translated as "he's a
  • 19:38 - 19:45
    lawyer". When I do this for "nurse", it's
    translated as "she is a nurse". And we see
  • 19:45 - 19:55
    that men keep getting associated with more
    prestigious or higher ranking jobs. And
  • 19:55 - 19:59
    another example: "He or she is a
    professor": "he is a professor". "He or
  • 19:59 - 20:04
    she is a teacher": "she is a teacher". And
    this also reflects the previous
  • 20:04 - 20:10
    correlation I was showing about statistics
    in occupation. And we go further: German
  • 20:10 - 20:16
    is more gendered than English. Again, we
    try with "doctor": it's translated as
  • 20:16 - 20:22
    "he", and the nurse is translated as
    "she". Then I tried with a Slavic
  • 20:22 - 20:26
    language, which is even more gendered than
    German, and we see that "doctor" is again
  • 20:26 - 20:36
    a male, and then the nurse is again a
    female. And after these, we wanted to see
  • 20:36 - 20:41
    what kind of biases can we recover, other
    than the factual statistics from the
  • 20:41 - 20:48
    models. And we wanted to start with
    universally accepted stereotypes. By
  • 20:48 - 20:54
    universally accepted stereotypes, what I
    mean is these are so common that they are
  • 20:54 - 21:01
    not considered as prejudice, they are just
    considered as normal or neutral. These are
  • 21:01 - 21:05
    things such as flowers being considered
    pleasant, and insects being considered
  • 21:05 - 21:10
    unpleasant. Or musical instruments being
    considered pleasant and weapons being
  • 21:10 - 21:16
    considered unpleasant. In this case, for
    example with flowers being pleasant, when
  • 21:16 - 21:21
    we performed the Word Embedding
    Association Test on the word2vec model or
  • 21:21 - 21:27
    glow model, with a very high significance,
    and very high effect size, we can see that
  • 21:27 - 21:34
    this association exists. And here we see
    that the effect size is, for example, 1.35
  • 21:34 - 21:40
    for flowers. According to "Cohen's d",
    to calculate effect size, if effect size
  • 21:40 - 21:46
    is above 0.8, that's considered a large
    effect size. In our case, where the
  • 21:46 - 21:51
    maximum is 2, we are getting very large
    and significant effects in recovering
  • 21:51 - 21:58
    these biases. For musical instruments,
    again we see that very significant result
  • 21:58 - 22:06
    with a high effect size. In the next
    example, we will look at race and gender
  • 22:06 - 22:10
    stereotypes. But in the meanwhile, I would
    like to mention that for these baseline
  • 22:10 - 22:17
    experiments, we used the work that has
    been used in societal psychology studies
  • 22:17 - 22:25
    before. We have a grounds to come up with
    categories and so forth. And we were able
  • 22:25 - 22:32
    to replicate all the implicit associations
    tests that were out there. We tried this
  • 22:32 - 22:38
    for white people and black people and then
    white people were being associated with
  • 22:38 - 22:43
    being pleasant, with a very high effect
    size, and again significantly. And then
  • 22:43 - 22:49
    males associated with carreer and females
    are associated with family. Males are
  • 22:49 - 22:56
    associated with science and females are
    associated with arts. And we also wanted
  • 22:56 - 23:02
    to see stigma for older people or people
    with disease, and we saw that young people
  • 23:02 - 23:08
    are considered pleasant, whereas older
    people are considered unpleasant. And we
  • 23:08 - 23:13
    wanted to see the difference between
    physical disease vs. mental disease. If
  • 23:13 - 23:18
    there is bias towards that, we can think
    about how dangerous this would be for
  • 23:18 - 23:23
    example for doctors and their patients.
    For physical disease, it's considered
  • 23:23 - 23:31
    controllable whereas mental disease is
    considered uncontrollable. We also wanted
  • 23:31 - 23:40
    to see if there is any sexual stigma or
    transphobia in these models. When we
  • 23:40 - 23:45
    performed the implicit association test to
    see how the view for heterosexual vs.
  • 23:45 - 23:49
    homosexual people, we were able to see
    that heterosexual people are considered
  • 23:49 - 23:55
    pleasant. And for transphobia, we saw that
    straight people are considered pleasant,
  • 23:55 - 24:00
    whereas transgender people were considered
    unpleasant, significantly with a high
  • 24:00 - 24:08
    effect size. I took another German model
    which was generated by 820 billion
  • 24:08 - 24:16
    sentences for a natural language
    processing competition. I wanted to see if
  • 24:16 - 24:21
    they have similar biases
    embedded in these models.
  • 24:21 - 24:26
    So I looked at the basic ones
    that had German sets of words
  • 24:26 - 24:30
    that were readily available. Again, for
    male and female, we clearly see that
  • 24:30 - 24:35
    males are associated with career,
    and they're also associated with
  • 24:35 - 24:41
    science. The German implicit association
    test also had a few different tests, for
  • 24:41 - 24:48
    example about nationalism and so on. There
    was the one about stereotypes against
  • 24:48 - 24:53
    Turkish people that live in Germany. And
    when I performed this test, I was very
  • 24:53 - 24:58
    surprised to find that, yes, with a high
    effect size, Turkish people are considered
  • 24:58 - 25:02
    unpleasant, by looking at this German
    model, and German people are considered
  • 25:02 - 25:08
    pleasant. And as I said, these are on the
    web page of the IAT. You can also go and
  • 25:08 - 25:12
    perform these tests to see what your
    results would be. When I performed these,
  • 25:12 - 25:19
    I'm amazed by how horrible results I'm
    getting. So, just give it a try.
  • 25:19 - 25:24
    I have a few discussion points before I end my
    talk. These might bring you some new
  • 25:24 - 25:31
    ideas. For example, what kind of machine
    learning expertise is required for
  • 25:31 - 25:37
    algorithmic transparency? And how can we
    mitigate bias while preserving utility?
  • 25:37 - 25:42
    For example, some people suggest that you
    can find the dimension of bias in the
  • 25:42 - 25:48
    numerical vector, and just remove it and
    then use the model like that. But then,
  • 25:48 - 25:52
    would you be able to preserve utility, or
    still be able to recover statistical facts
  • 25:52 - 25:56
    about the world? And another thing is; how
    long does bias persist in models?
  • 25:56 - 26:04
    For example, there was this IAT about eastern
    and western Germany, and I wasn't able to
  • 26:04 - 26:12
    see the stereotype for eastern Germany
    after performing this IAT. Is it because
  • 26:12 - 26:17
    this stereotype is maybe too old now, and
    it's not reflected in the language
  • 26:17 - 26:22
    anymore? So it's a good question to know
    how long bias lasts and how long it will
  • 26:22 - 26:28
    take us to get rid of it. And also, since
    we know there is stereotype effect when we
  • 26:28 - 26:33
    have biased models, does that mean it's
    going to cause a snowball effect? Because
  • 26:33 - 26:39
    people would be exposed to bias, then the
    models would be trained with more bias,
  • 26:39 - 26:45
    and people will be affected more from this
    bias. That can lead to a snowball. And
  • 26:45 - 26:50
    what kind of policy do we need to stop
    discrimination. For example, we saw the
  • 26:50 - 26:56
    predictive policing example which is very
    scary, and we know that machine learning
  • 26:56 - 27:00
    services are being used by billions of
    people everyday. For example, Google,
  • 27:00 - 27:05
    Amazon and Microsoft. I would like to
    thank you, and I'm open to your
  • 27:05 - 27:10
    interesting questions now! If you want to
    read the full paper, it's on my web page,
  • 27:10 - 27:16
    and we have our research code on Github.
    The code for this paper is not on Github
  • 27:16 - 27:21
    yet, I'm waiting to hear back from the
    journal. And after that, we will just
  • 27:21 - 27:26
    publish it. And you can always check our
    blog for new findings and for the shorter
  • 27:26 - 27:31
    version of the paper with a summary of it.
    Thank you very much!
  • 27:31 - 27:40
    applause
  • 27:40 - 27:45
    Herald: Thank you Aylin! So, we come to
    the questions and answers. We have 6
  • 27:45 - 27:52
    microphones that we can use now, it's this
    one, this one, number 5 over there, 6, 4, 2.
  • 27:52 - 27:57
    I will start here and we will
    go around until you come. OK?
  • 27:57 - 28:02
    We have 5 minutes,
    so: number 1, please!
  • 28:05 - 28:15
    Q: I might very naively ask, why does it
    matter that there is a bias between genders?
  • 28:15 - 28:22
    Aylin: First of all, being able to uncover
    this is a contribution, because we can see
  • 28:22 - 28:28
    what kind of biases, maybe, we have in
    society. Then the other thing is, maybe we
  • 28:28 - 28:35
    can hypothesize that the way we learn
    language is introducing bias to people.
  • 28:35 - 28:42
    Maybe it's all intermingled. And the other
    thing is, at least for me, I don't want to
  • 28:42 - 28:45
    live in a world biased society, and
    especially for gender, that was the
  • 28:45 - 28:50
    question you asked, it's
    leading to unfairness.
  • 28:50 - 28:52
    applause
  • 28:58 - 29:00
    H: Yes, number 3:
  • 29:00 - 29:08
    Q: Thank you for the talk, very nice! I
    think it's very dangerous because it's a
  • 29:08 - 29:16
    victory of mediocrity. Just the
    statistical mean the guideline of our
  • 29:16 - 29:21
    goals in society, and all this stuff. So
    what about all these different cultures?
  • 29:21 - 29:26
    Like even in normal society you have
    different cultures. Like here the culture
  • 29:26 - 29:32
    of the Chaos people has a different
    language and different biases than other
  • 29:32 - 29:37
    cultures. How can we preserve these
    subcultures, these small groups of
  • 29:37 - 29:41
    language, I don't know,
    entities. You have any idea?
  • 29:41 - 29:47
    Aylin: This is a very good question. It's
    similar to different cultures can have
  • 29:47 - 29:54
    different ethical perspectives or
    different types of bias. In the beginning,
  • 29:54 - 29:59
    I showed a slide that we need to de-bias
    with positive examples. And we need to
  • 29:59 - 30:04
    change things at the structural level. I
    think people at CCC might be one of the,
  • 30:04 - 30:12
    like, most groups that have the best skill
    to help change these things at the
  • 30:12 - 30:16
    structural level, especially for machines.
    I think we need to be aware of this and
  • 30:16 - 30:21
    always have a human in the loop that cares
    for this. instead of expecting machines to
  • 30:21 - 30:26
    automatically do the correct thing. So we
    always need an ethical human, whatever the
  • 30:26 - 30:31
    purpose of the algorithm is, try to
    preserve it for whatever group they are
  • 30:31 - 30:34
    trying to achieve something with.
  • 30:36 - 30:37
    applause
  • 30:39 - 30:41
    H: Number 4, number 4 please:
  • 30:41 - 30:47
    Q: Hi, thank you! This was really
    interesting! Super awesome!
  • 30:47 - 30:48
    Aylin: Thanks!
  • 30:48 - 30:54
    Q: Early, earlier in your talk, you
    described a process of converting words
  • 30:54 - 31:01
    into sort of numerical
    representations of semantic meaning.
  • 31:01 - 31:02
    H: Question?
  • 31:02 - 31:08
    Q: If I were trying to do that like with a
    pen and paper, with a body of language,
  • 31:08 - 31:14
    what would I be looking for in relation to
    those words to try and create those
  • 31:14 - 31:18
    vectors, because I don't really
    understand that part of the process.
  • 31:18 - 31:21
    Aylin: Yeah, that's a good question. I
    didn't go into the details of the
  • 31:21 - 31:25
    algorithm of the neural network or the
    regression models. There are a few
  • 31:25 - 31:31
    algorithms, and in this case, they look at
    context windows, and words that are around
  • 31:31 - 31:36
    a window, these can be skip grams or
    continuous back referrals, so there are
  • 31:36 - 31:41
    different approaches, but basically, it's
    the window that this word appears in, and
  • 31:41 - 31:48
    what is it most frequently associated
    with. After that, once you feed this
  • 31:48 - 31:52
    information into the algorithm,
    it outputs the numerical vectors.
  • 31:52 - 31:54
    Q: Thank you!
  • 31:54 - 31:56
    H. Number 2!
  • 31:56 - 32:05
    Q: Thank you for the nice intellectual
    talk. My mother tongue is genderless, too.
  • 32:05 - 32:14
    So I do not understand half of that biasing
    thing around here in Europe. What I wanted
  • 32:14 - 32:25
    to ask is: when we have the coefficient
    0.5, and that's the ideal thing, what you
  • 32:25 - 32:33
    think, should there be an institution in
    every society trying to change the meaning
  • 32:33 - 32:40
    of the words, so that they statistically
    approach to 0.5? Thank you!
  • 32:40 - 32:44
    Aylin: Thank you very much, this is a
    very, very good question! I'm currently
  • 32:44 - 32:49
    working on these questions. Many
    philosophers or feminist philosophers
  • 32:49 - 32:56
    suggest that language are dominated by males,
    and they were just produced that way, so
  • 32:56 - 33:02
    that women are not able to express
    themselves as well as men. But other
  • 33:02 - 33:06
    theories also say that, for example, women
    were the ones that who drove the evolution
  • 33:06 - 33:11
    of language. So it's not very clear what
    is going on here. But when we look at
  • 33:11 - 33:16
    languages and different models, what I'm
    trying to see is their association with
  • 33:16 - 33:21
    gender. I'm seeing that the most frequent,
    for example, 200.000 words in a language
  • 33:21 - 33:28
    are associated, very closely associated
    with males. I'm not sure what exactly they
  • 33:28 - 33:33
    way to solve this is, I think it would
    require decades. It's basically the change
  • 33:33 - 33:38
    of frequency or the change of statistics
    in language. Because, even when children
  • 33:38 - 33:43
    are learning language, at first they see
    things, they form the semantics, and after
  • 33:43 - 33:48
    that they see the frequency of that word,
    match it with the semantics, form clusters,
  • 33:48 - 33:53
    link them together to form sentences or
    grammar. So even children look at the
  • 33:53 - 33:57
    frequency to form this in their brains.
    It's close to the neural network algorithm
  • 33:57 - 34:00
    that we have. If the frequency they see
  • 34:00 - 34:06
    for a man and woman are biased, I don't
    think this can change very easily, so we
  • 34:06 - 34:11
    need cultural and structural changes. And
    we don't have the answers to these yet.
  • 34:11 - 34:13
    These are very good research questions.
  • 34:13 - 34:19
    H: Thank you! I'm afraid we have no more
    time left for more answers, but maybe you
  • 34:19 - 34:22
    can ask your questions in person.
  • 34:22 - 34:24
    Aylin: Thank you very much, I would
    be happy to take questions offline.
  • 34:24 - 34:25
    applause
  • 34:25 - 34:26
    Thank you!
  • 34:26 - 34:29
    applause continues
  • 34:32 - 34:36
    postroll music
  • 34:36 - 34:56
    subtitles created by c3subtitles.de
    in the year 2017. Join, and help us!
Title:
A Story of Discrimination and Unfairness (33c3)
Description:

more » « less
Video Language:
English
Duration:
34:56

English subtitles

Revisions