Lecture 4.1: Heuristic Evaluation — Why and How? (16:41)

  • 0:00 - 0:05
    In this video we are going to introduce a technique called Heuristic Evaluation.
  • 0:05 - 0:11
    As we talked about at the beginning of the course, there’s lots of different ways to evaluate software.
  • 0:11 - 0:14
    One that you might be most familiar with is empirical methods,
  • 0:14 - 0:19
    where, with some level of formality, you have actual people trying out your software.
  • 0:19 - 0:25
    It’s also possible to have formal methods, where you’re building a model
  • 0:25 - 0:28
    of how people behave in a particular situation,
  • 0:28 - 0:32
    and that enables you to predict how different user interfaces will work.
  • 0:32 - 0:36
    Or, if you can’t build a closed-form formal model,
  • 0:36 - 0:40
    you can also try out your interface with simulation and have automated tests —
  • 0:40 - 0:44
    that can detect usability bugs in your designs.
  • 0:44 - 0:49
    This works especially well for low-level stuff; it’s harder to do for higher-level stuff.
  • 0:49 - 0:52
    And what we’re going to talk about today is critique-based approaches,
  • 0:52 - 1:00
    where people are giving you feedback directly, based on their expertise or a set of heuristics.
  • 1:00 - 1:03
    As any of you who have ever taken an art or design class know,
  • 1:03 - 1:06
    peer critique can be an incredibly effective form of feedback,
  • 1:06 - 1:09
    and it can help you make your designs even better.
  • 1:09 - 1:12
    You can get peer critique really at any stage of your design process,
  • 1:12 - 1:17
    but I’d like to highlight a couple that I think can be particularly valuable.
  • 1:17 - 1:21
    First, it’s really valuable to get peer critique before user testing,
  • 1:21 - 1:27
    because that helps you not waste your users on problems that critique would have caught anyway.
  • 1:27 - 1:30
    You want to be able to focus the valuable resources of user testing
  • 1:30 - 1:34
    on stuff that other people wouldn’t be able to pick up on.
  • 1:34 - 1:37
    The rich qualitative feedback that peer critique provides
  • 1:37 - 1:41
    can also be really valuable before redesigning your application,
  • 1:41 - 1:45
    because what it can do is it can show you what parts of your app you probably want to keep,
  • 1:45 - 1:49
    and what are other parts that are more problematic and deserve redesign.
  • 1:49 - 1:51
    Third, sometimes, you know there are problems,
  • 1:51 - 1:56
    and you need data to be able to convince other stakeholders to make the changes.
  • 1:56 - 2:00
    And peer critique can be a great way, especially if it’s structured,
  • 2:00 - 2:05
    to be able to get the feedback that you need, to make the changes that you know need to happen.
  • 2:06 - 2:11
    And lastly, this kind of structured peer critique can be really valuable before releasing software,
  • 2:11 - 2:16
    because it helps you do a final sanding of the entire design, and smooth out any rough edges.
  • 2:16 - 2:21
    As with most types of evaluation, it’s usually helpful to begin with a clear goal,
  • 2:21 - 2:24
    even if what you ultimately learn is completely unexpected.
  • 2:26 - 2:31
    And so, what we’re going to talk about today is a particular technique called Heuristic Evaluation.
  • 2:31 - 2:35
    Heuristic Evaluation was created by Jakob Nielsen and colleagues, about twenty years ago now.
  • 2:36 - 2:42
    And the goal of Heuristic Evaluation is to be able to find usability problems in the design.
  • 2:43 - 2:44
    I first learned about Heuristic Evaluation
  • 2:44 - 2:50
    when I TA’d James Landay’s Intro to HCI course, and I’ve been using it and teaching it ever since.
  • 2:50 - 2:54
    It’s a really valuable technique because it lets you get feedback really quickly
  • 2:54 - 2:58
    and it’s a high bang-for-the-buck strategy.
  • 2:58 - 3:02
    And the slides that I have here are based off James’ slides for this course,
  • 3:02 - 3:06
    and the materials are all available on Jakob Nielsen’s website.
  • 3:06 - 3:10
    The basic idea of heuristic evaluation is that you’re going to provide a set of people —
  • 3:10 - 3:15
    often other stakeholders on the design team or outside design experts —
  • 3:15 - 3:18
    with a set of heuristics or principles,
  • 3:18 - 3:23
    and they’re going to use those to look for problems in your design.
  • 3:24 - 3:26
    Each of them is first going to do this independently
  • 3:26 - 3:31
    and so they’ll walk through a variety of tasks using your design to look for these bugs.
  • 3:33 - 3:37
    And you’ll see that different evaluators are going to find different problems.
  • 3:37 - 3:41
    And then they’re going to communicate and talk together only at the end, afterwards.
  • 3:43 - 3:47
    At the end of the process, they’re going to get back together and talk about what they found.
  • 3:47 - 3:51
    And this “independent first, gather afterwards”
  • 3:51 - 3:57
    is how you get a “wisdom of crowds” benefit in having multiple evaluators.
  • 3:57 - 3:59
    And one of the reasons that we’re talking about this early in the class
  • 3:59 - 4:05
    is that it’s a technique that you can use, either on a working user interface or on sketches of user interfaces.
  • 4:05 - 4:10
    And so heuristic evaluation works really well in conjunction with paper prototypes
  • 4:10 - 4:16
    and other rapid, low fidelity techniques that you may be using to get your design ideas out quick and fast.
  • 4:18 - 4:22
    Here are Nielsen’s ten heuristics, and they’re a pretty darn good set.
  • 4:22 - 4:25
    That said, there’s nothing magic about these heuristics.
  • 4:25 - 4:30
    They do a pretty good job of covering many of the problems that you’ll see in many user interfaces;
  • 4:30 - 4:33
    but you can add on any that you want
  • 4:33 - 4:38
    and get rid of any that aren’t appropriate for your system.
  • 4:38 - 4:41
    We’re going to go over the content of these ten heuristics in the next couple lectures,
  • 4:41 - 4:46
    and in this lecture I’d like to introduce the process that you’re going to use with these heuristics.
  • 4:46 - 4:49
    So here’s what you’re going to have your evaluators do:
  • 4:49 - 4:52
    Give them a couple of tasks to use your design for,
  • 4:52 - 4:57
    and have them do each task, stepping through carefully several times.
  • 4:57 - 5:01
    When they’re doing this, they’re going to keep the list of usability principles
  • 5:01 - 5:03
    as a reminder of things to pay attention to.
  • 5:03 - 5:06
    Now which principles will you use?
  • 5:06 - 5:09
    I think Nielsen’s ten heuristics are a fantastic start,
  • 5:09 - 5:13
    and you can augment those with anything else that’s relevant for your domain.
  • 5:13 - 5:19
    So, if you have particular design goals that you would like your design to achieve, include those in the list.
  • 5:19 - 5:22
    Or, if you have particular goals that you’ve set up
  • 5:22 - 5:26
    from competitive analysis of designs that are out there already,
  • 5:26 - 5:27
    that’s great too.
  • 5:27 - 5:33
    Or if there are things that you’ve seen your own or other designs excel at,
  • 5:33 - 5:37
    those are important goals too and can be included in your list of heuristics.
  • 5:39 - 5:43
    And then obviously, the important part is that you’re going to take what you learn from these evaluators
  • 5:43 - 5:49
    and use those violations of the heuristics as a way of fixing problems and redesigning.
  • 5:49 - 5:55
    Let’s talk a little bit more about why you might want to have multiple evaluators rather than just one.
  • 5:55 - 6:00
    The graph on this slide is adapted from Jakob Nielsen’s work on heuristic evaluation
  • 6:00 - 6:07
    and what you see is each black square is a bug that a particular evaluator found.
  • 6:08 - 6:12
    An individual evaluator represents a row of this matrix
  • 6:12 - 6:15
    and there are about twenty evaluators in this set.
  • 6:15 - 6:17
    The columns represent the problems.
  • 6:17 - 6:22
    And what you can see is that there’s some problems that were found by relatively few evaluators
  • 6:22 - 6:25
    and other stuff which almost everybody found.
  • 6:25 - 6:29
    So we’re going to call the stuff on the right the easy problems, and the stuff on the left the hard problems.
  • 6:30 - 6:35
    And so, in aggregate, what we can say is that no evaluator found every problem,
  • 6:35 - 6:41
    and some evaluators found more than others, and so there are better and worse people to do this.
  • 6:43 - 6:45
    So why not have lots of evaluators?
  • 6:45 - 6:49
    Well, as you add more evaluators, they do find more problems;
  • 6:50 - 6:53
    but it kind of tapers off over time — you lose that benefit eventually.
  • 6:54 - 6:58
    And so from a cost-benefit perspective it just stops making sense after a certain point.
  • 6:59 - 7:01
    So where’s the peak of this curve?
  • 7:01 - 7:04
    It’s of course going to depend on the user interface that you’re working with,
  • 7:04 - 7:08
    how much you’re paying people, how much time is involved — all sorts of factors.
  • 7:08 - 7:13
    Jakob Nielsen’s rule of thumb for these kinds of user interfaces and heuristic evaluation
  • 7:13 - 7:19
    is that three to five people tends to work pretty well; and that’s been my experience too.
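
As a rough picture of why that rule of thumb works, here is a small sketch that is not from the lecture: it assumes the simple diminishing-returns model Nielsen and Landauer describe, and the 31% per-evaluator detection rate and the 100-problem total are purely illustrative assumptions.

    # Sketch only: expected number of distinct problems found by n independent
    # evaluators, assuming each evaluator catches any given problem with
    # probability p. The values p = 0.31 and total_problems = 100 are made up.
    def problems_found(n_evaluators, total_problems=100, p=0.31):
        """Expected number of distinct problems found by n evaluators."""
        return total_problems * (1 - (1 - p) ** n_evaluators)

    for n in range(1, 11):
        print(f"{n} evaluator(s): ~{problems_found(n):.0f} of 100 problems found")

With these made-up numbers, five evaluators already surface roughly 84 of the 100 problems, which is why adding more people quickly stops paying for itself.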
  • 7:20 - 7:24
    And I think that definitely one of the reasons that people use heuristic evaluation
  • 7:24 - 7:28
    is because it can be an extremely cost-effective way of finding problems.
  • 7:29 - 7:32
    In one study that Jakob Nielsen ran,
  • 7:32 - 7:37
    he estimated that the value of the problems found with heuristic evaluation was about $500,000
  • 7:37 - 7:41
    and the cost of performing it was just over $10,000,
  • 7:41 - 7:49
    and so he estimates a 48-fold benefit-cost ratio for this particular user interface.
  • 7:49 - 7:55
    Obviously, these numbers are back of the envelope, and your mileage will vary.
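
As a back-of-the-envelope check on that ratio (the exact cost figure here is an assumption, since the lecture only says “just over $10,000”): if the evaluation cost roughly $10,400, then $500,000 / $10,400 ≈ 48, which is where a 48-fold benefit-cost ratio comes from.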
  • 7:55 - 7:59
    You can think about how to estimate the benefit that you get from something like this
  • 7:59 - 8:03
    if you have an in-house software tool, using something like productivity increases:
  • 8:03 - 8:07
    if you’re making an expense reporting system
  • 8:07 - 8:12
    or another in-house system that uses people’s time more efficiently,
  • 8:12 - 8:14
    that’s a big usability win.
  • 8:14 - 8:18
    And if you’ve got software that you’re making available on the open market,
  • 8:18 - 8:22
    you can think about the benefit from sales or other measures like that.
  • 8:24 - 8:28
    One thing that we can get from that graph is that evaluators are more likely to find severe problems
  • 8:28 - 8:30
    and that’s good news;
  • 8:30 - 8:32
    and so with a relatively small number of people,
  • 8:32 - 8:36
    you’re pretty likely to stumble across the most important stuff.
  • 8:36 - 8:41
    However, as we saw with just one person in this particular case,
  • 8:41 - 8:46
    even the best evaluator found only about a third of the problems in the system.
  • 8:46 - 8:51
    And so that’s why ganging up a number of evaluators, say five,
  • 8:51 - 8:55
    is going to get you most of the benefit that you’re going to be able to achieve.
  • 8:56 - 9:00
    If we compare heuristic evaluation and user testing, one of the things that we see
  • 9:00 - 9:07
    is that heuristic evaluation can often be a lot faster (it takes just an hour or two for an evaluator),
  • 9:07 - 9:11
    and the mechanics of getting a user test up and running can take longer,
  • 9:11 - 9:16
    not even accounting for the fact that you may have to build software.
  • 9:18 - 9:21
    Also, the heuristic evaluation results come pre-interpreted
  • 9:21 - 9:26
    because your evaluators are directly providing you with problems and things to fix,
  • 9:26 - 9:34
    and so it saves you the time of having to infer from the usability tests what might be the problem or solution.
  • 9:36 - 9:39
    Now conversely, experts walking through your system
  • 9:39 - 9:44
    can generate false positives that wouldn’t actually happen in a real environment.
  • 9:44 - 9:50
    And this indeed does happen, and so user testing is, sort of, by definition going to be more accurate.
  • 9:52 - 9:55
    At the end of the day I think it’s valuable to alternate methods:
  • 9:55 - 10:00
    All of the different techniques that you’ll learn in this class for getting feedback can each be valuable,
  • 10:00 - 10:05
    and by cycling through them you can often get the benefits of each.
  • 10:05 - 10:11
    That’s partly because with heuristic evaluation and user testing, you’ll find different problems,
  • 10:11 - 10:15
    and by running heuristic evaluation or something like that early in the design process,
  • 10:15 - 10:20
    you’ll avoid wasting real users that you may bring in later on.
  • 10:21 - 10:25
    So now that we’ve seen the benefits, what are the steps?
  • 10:25 - 10:30
    The first thing to do is to get all of your evaluators up to speed,
  • 10:30 - 10:36
    on what the story is behind your software — any necessary domain knowledge they might need —
  • 10:36 - 10:40
    and tell them about the scenario that you’re going to have them step through.
  • 10:41 - 10:45
    Then obviously, you have the evaluation phase where people are working through the interface.
  • 10:45 - 10:50
    Afterwards, each person is going to assign a severity rating,
  • 10:50 - 10:53
    and you do this individually first,
  • 10:53 - 10:56
    and then you’re going to aggregate those into a group severity rating
  • 10:56 - 11:00
    and produce an aggregate report out of that.
  • 11:01 - 11:06
    And finally, once you’ve got this aggregated report, you can share that with the design team,
  • 11:06 - 11:10
    and the design team can discuss what to do with that.
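
To make that aggregation step concrete, here is a minimal hypothetical sketch; the issue names, evaluator labels, and the choice of a simple mean are all assumptions for illustration, not part of the method as described.

    # Sketch only: combine each evaluator's individual 0-4 severity ratings
    # into a group severity per issue. All data below is made up.
    from statistics import mean

    individual_ratings = {
        "Weight value cannot be edited after entry": {"evaluator_1": 3, "evaluator_2": 2, "evaluator_3": 3},
        "Inconsistent wording across screens": {"evaluator_1": 1, "evaluator_2": 2, "evaluator_3": 1},
    }

    # Averaging is one simple convention; a team might instead discuss and
    # agree on a consensus number during the debrief.
    for issue, ratings in individual_ratings.items():
        print(f"{issue}: group severity {mean(ratings.values()):.1f} "
              f"(individual ratings: {sorted(ratings.values())})")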
  • 11:10 - 11:13
    Doing this kind of expert review can be really taxing,
  • 11:13 - 11:16
    and so for each of the scenarios that you lay out in your design,
  • 11:16 - 11:22
    it can be valuable to have the evaluator go through that scenario twice.
  • 11:22 - 11:28
    The first time, they’ll just get a sense of it; and the second time, they can focus on more specific elements.
  • 11:30 - 11:35
    If you’ve got some walk-up-and-use system, like a ticket machine somewhere,
  • 11:35 - 11:39
    then you may want to not give people any background information at all,
  • 11:39 - 11:42
    because if you’ve got people that are just getting off the bus or the train,
  • 11:42 - 11:45
    and they walk up to your machine without any prior information,
  • 11:45 - 11:49
    that’s the experience you’ll want your evaluators to have.
  • 11:49 - 11:53
    On the other hand, if you’re going to have a genomic system or other expert user interface,
  • 11:53 - 11:57
    you’ll want to make sure that whatever training you would give to real users,
  • 11:57 - 12:00
    you’re going to give to your evaluators as well.
  • 12:00 - 12:04
    In other words, whatever the background is, it should be realistic.
  • 12:06 - 12:09
    When your evaluators are walking through your interface,
  • 12:09 - 12:13
    it’s going to be important to produce a list of very specific problems
  • 12:13 - 12:17
    and explain those problems with regard to one of the design heuristics.
  • 12:17 - 12:21
    You don’t want people to just say, “I don’t like it.”
  • 12:21 - 12:26
    And in order to most effectively communicate these results to the design team,
  • 12:26 - 12:31
    you’ll want to list each one of these separately so that they can be dealt with efficiently.
  • 12:31 - 12:37
    Separate listings can also help you avoid listing the same problem over and over again.
  • 12:37 - 12:42
    If there’s a repeated element on every single screen, you don’t want to list it at every single screen;
  • 12:42 - 12:46
    you want to list it once so that it can be fixed once.
  • 12:47 - 12:52
    And these problems can be very detailed, like “the name of something is confusing,”
  • 12:52 - 12:56
    or it can be something that has to do more with the flow of the user interface,
  • 12:56 - 13:02
    or the architecture of the user experience and that’s not specifically tied to an interface element.
  • 13:03 - 13:07
    Your evaluators may also find that something is missing that ought to be there,
  • 13:07 - 13:11
    and this can sometimes be ambiguous with early prototypes, like paper prototypes.
  • 13:11 - 13:17
    And so you’ll want to clarify ahead of time whether the user interface is something that you believe to be complete,
  • 13:17 - 13:22
    or whether there are elements that are intentionally missing.
  • 13:22 - 13:26
    And, of course, sometimes there are features that are going to be obviously there
  • 13:26 - 13:28
    that are implied by the user interface.
  • 13:28 - 13:32
    And so, mellow out, and relax on those.
  • 13:35 - 13:37
    After your evaluators have gone through the interface,
  • 13:37 - 13:41
    they can each independently assign a severity rating to all of the problems that they’ve found.
  • 13:41 - 13:45
    And that’s going to enable you to allocate resources to fix those problems.
  • 13:45 - 13:48
    It can also help give you feedback about how well you’re doing
  • 13:48 - 13:51
    in terms of the usability of your system in general,
  • 13:51 - 13:55
    and give you a kind of a benchmark of your efforts in this vein.
  • 13:56 - 14:01
    The severity measure that your evaluators are going to come up with is going to combine several things:
  • 14:01 - 14:05
    It’s going to combine the frequency, the impact,
  • 14:05 - 14:09
    and the pervasiveness of the problem that they’re seeing on the screen.
  • 14:09 - 14:14
    So, something that is only in one place may be a less big deal
  • 14:14 - 14:19
    than something that shows up throughout the entire user interface.
  • 14:19 - 14:23
    Similarly, there are going to be some things like misaligned text,
  • 14:23 - 14:28
    which may be inelegant, but aren’t a deal killer in terms of your software.
  • 14:29 - 14:34
    And here is the severity rating system that Nielsen created; you can obviously use anything that you want:
  • 14:34 - 14:37
    It ranges from zero to four,
  • 14:37 - 14:42
    where zero is “at the end of the day your evaluator decides it actually is not a usability problem,”
  • 14:42 - 14:48
    all the way up to it being something really catastrophic that has to get fixed right away.
  • 14:49 - 14:51
    And here is an example of a particular problem
  • 14:51 - 14:56
    that our TA Robby found when he was taking CS147 as a student.
  • 14:56 - 15:01
    He walked through somebody’s mobile interface that had a “weight” entry element to it;
  • 15:01 - 15:06
    and he realized that once you entered your weight, there is no way to edit it after the fact.
  • 15:06 - 15:12
    So, that’s kind of clunky and you wish you could fix it, but maybe not a disaster.
  • 15:12 - 15:17
    And so what you see here is he’s listed the issue, he’s given it a severity rating,
  • 15:17 - 15:23
    he’s got the heuristic that it violates, and then he describes exactly what the problem is.
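
If it helps to picture what such a write-up contains, here is a small hypothetical sketch of one way to record a single finding; the field names, the severity value, and the named heuristic are illustrative guesses that echo the example just described, not Robby’s actual report.

    # Hypothetical record format for one heuristic-evaluation finding.
    from dataclasses import dataclass

    @dataclass
    class HeuristicFinding:
        issue: str         # short name for the problem
        severity: int      # 0 (not a problem) .. 4 (usability catastrophe)
        heuristic: str     # which heuristic the problem violates
        description: str   # exactly what goes wrong, and where

    finding = HeuristicFinding(
        issue="Weight cannot be edited after entry",
        severity=2,
        heuristic="User control and freedom",
        description="Once a weight value has been entered, there is no way "
                    "to go back and change it afterwards.",
    )
    print(finding)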
  • 15:24 - 15:27
    And finally, after all your evaluators have gone through the interface,
  • 15:27 - 15:31
    listed their problems, and combined them in terms of the severity and importance,
  • 15:31 - 15:34
    you’ll want to debrief with the design team.
  • 15:34 - 15:39
    This is a nice chance to be able to discuss general issues in the user interface and qualitative feedback,
  • 15:39 - 15:42
    and it gives you a chance to go through each of these items
  • 15:42 - 15:46
    and suggest improvements on how you can address these problems.
  • 15:48 - 15:51
    In this debrief session, it can be valuable for the development team
  • 15:51 - 15:56
    to estimate the amount of effort that it would take to fix one of these problems.
  • 15:56 - 16:01
    So, for example, if you’ve got something that is one on your severity scale and not too big a deal —
  • 16:01 - 16:06
    it might have something to do with wording and it’s dirt simple to fix —
  • 16:06 - 16:08
    that tells you “go ahead and fix it.”
  • 16:08 - 16:11
    Conversely, you may have something which is a catastrophe
  • 16:11 - 16:15
    which takes a lot more effort, but its importance will lead you to fix it.
  • 16:15 - 16:20
    And there are other things where, given the importance relative to the cost involved,
  • 16:20 - 16:23
    it just doesn’t make sense to deal with them right now.
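
One way to picture that trade-off (a sketch of my own, not a procedure from the lecture) is to rank findings by severity relative to the team’s effort estimate, so that cheap but important fixes float to the top; all of the example findings and numbers below are made up.

    # Illustrative only: order findings by severity per unit of estimated effort.
    findings = [
        {"issue": "Confusing wording on submit button", "severity": 1, "effort_days": 0.1},
        {"issue": "Weight cannot be edited after entry", "severity": 2, "effort_days": 1.0},
        {"issue": "Data lost when the session times out", "severity": 4, "effort_days": 5.0},
    ]

    for item in sorted(findings, key=lambda x: x["severity"] / x["effort_days"], reverse=True):
        print(f'{item["issue"]}: severity {item["severity"]}, about {item["effort_days"]} days to fix')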
  • 16:23 - 16:27
    And this debrief session can be a great way to brainstorm future design ideas,
  • 16:27 - 16:30
    especially while you’ve got all the stakeholders in the room,
  • 16:30 - 16:34
    and the ideas about what the issues are with the user interface are fresh in their minds.
  • 16:34 - 16:41
    In the next two videos we’ll go through Nielsen’s ten heuristics and talk more about what they mean.