In this video we're going to introduce a technique called heuristic evaluation.

As we talked about at the beginning of the course, there are lots of different ways to evaluate software. The one you might be most familiar with is empirical methods, where, with some level of formality, you have actual people trying out your software. It's also possible to use formal methods, where you build a model of how people behave in a particular situation, and that enables you to predict how different user interfaces will work. Or, if you can't build a closed-form formal model, you can try out your interface in simulation and run automated tests that can detect usability bugs and identify effective designs. This works especially well for low-level stuff; it's harder to do for higher-level stuff. And what we're going to talk about today is critique-based approaches, where people give you feedback directly, based on their expertise or on a set of heuristics.

As any of you who have ever taken an art or design class know, peer critique can be an incredibly effective form of feedback, and it can help you make your designs even better. You can get peer critique at just about any stage of your design process, but I'd like to highlight a couple of points where I think it can be particularly valuable. First, it's really valuable to get peer critique before user testing, because that keeps you from wasting your users on problems that critique would pick up anyway. You want to focus the valuable resource of user testing on the things that other people wouldn't be able to pick up on. Second, the rich qualitative feedback that peer critique provides can be really valuable before redesigning your application, because it can show you which parts of your app you probably want to keep, and which parts are more problematic and deserve redesign. Third, sometimes you know there are problems, and you need data to convince other stakeholders to make the changes. Peer critique, especially if it's structured, can be a great way to get the feedback you need to make the changes that you know need to happen. And lastly, this kind of structured peer critique can be really valuable before releasing software, because it helps you do a final sanding of the entire design and smooth out any rough edges.
As with most types of evaluation, it's usually helpful to begin with a clear goal, even if what you ultimately learn is completely unexpected.

So what we're going to talk about today is a particular technique called heuristic evaluation. Heuristic evaluation was created by Jakob Nielsen and colleagues, about twenty years ago now, and its goal is to find usability problems in a design.

I first learned about heuristic evaluation when I TA'd James Landay's Intro to HCI course, and I've been using it and teaching it ever since. It's a really valuable technique because it lets you get feedback really quickly, and it's a high bang-for-the-buck strategy. The slides that I have here are based on James' slides for this course, and the materials are all available on Jakob Nielsen's website.

The basic idea of heuristic evaluation is that you provide a set of people, often other stakeholders on the design team or outside design experts, with a set of heuristics or principles, and they use those to look for problems in your design. Each of them first does this independently, walking through a variety of tasks with your design to look for these bugs. You'll see that different evaluators find different problems, and they communicate and talk together only at the end, afterwards. At the end of the process, they get back together and discuss what they found. This "independent first, gather afterwards" structure is how you get a "wisdom of crowds" benefit from having multiple evaluators.

One of the reasons we're talking about this early in the class is that it's a technique you can use either on a working user interface or on sketches of user interfaces. So heuristic evaluation works really well in conjunction with paper prototypes and other rapid, low-fidelity techniques that you may be using to get your design ideas out quickly.

Here are Nielsen's ten heuristics, and they're a pretty darn good set. That said, there's nothing magic about these heuristics. They do a pretty good job of covering many of the problems that you'll see in many user interfaces, but you can add on any that you want and get rid of any that aren't appropriate for your system.
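Since the examples later in this lecture refer to the heuristics by name, here is the standard set as a simple Python checklist. This is just a sketch for reference; the exact phrasings vary slightly across editions of Nielsen's materials.

```python
# Nielsen's ten usability heuristics, one string per principle.
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]
```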
We're going to go over the content of these ten heuristics in the next couple of lectures; in this lecture I'd like to introduce the process that you'll use with them.

So here's what you're going to have your evaluators do. Give them a couple of tasks to use your design for, and have them do each task, stepping through it carefully several times. While they're doing this, they keep the list of usability principles handy as a reminder of things to pay attention to.

Now, which principles will you use? I think Nielsen's ten heuristics are a fantastic start, and you can augment those with anything else that's relevant for your domain. So if you have particular design goals that you would like your design to achieve, include those in the list. Or if you have particular goals that you've set from a competitive analysis of designs that are already out there, that's great too. Or if there are things that you've seen your own or other designs excel at, those are important goals too and can be included in your list of heuristics.

And then, obviously, the important part is that you take what you learn from these evaluators and use those violations of the heuristics as a way of fixing problems and redesigning.

Let's talk a little bit more about why you might want to have multiple evaluators rather than just one. The graph on this slide is adapted from Jakob Nielsen's work on heuristic evaluation. Each black square is a bug that a particular evaluator found. An individual evaluator is a row of this matrix, and there are about twenty evaluators in this set; the columns represent the problems. What you can see is that some problems were found by relatively few evaluators, and there's other stuff that almost everybody found. We'll call the stuff on the right the easy problems and the stuff on the left the hard problems. In aggregate, what we can say is that no evaluator found every problem, and some evaluators found more than others, so there are better and worse people to do this.

So why not have lots of evaluators? Well, as you add more evaluators, they do find more problems, but it tapers off over time; you lose that benefit eventually. From a cost-benefit perspective, it just stops making sense after a certain point.
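To make that concrete, here is a toy version of such an evaluator-by-problem matrix in Python. The matrix values are made up purely for illustration; the point is that the union of problems found by a panel grows with panel size, but with diminishing returns.

```python
import itertools

# Toy evaluator-by-problem matrix: rows are evaluators, and True
# means that evaluator found that problem. Values are invented.
matrix = [
    [True,  False, True,  True,  False, True],
    [False, False, True,  True,  True,  True],
    [True,  False, False, True,  False, True],
    [False, True,  False, True,  True,  True],
    [False, False, True,  False, False, True],
]
n_problems = len(matrix[0])

def fraction_found(panel):
    """Fraction of all known problems found by at least one panel member."""
    found = set()
    for row in panel:
        found |= {j for j, hit in enumerate(row) if hit}
    return len(found) / n_problems

# Average over every possible panel of each size. The numbers rise
# quickly and then flatten, which is the taper the lecture describes.
for k in range(1, len(matrix) + 1):
    panels = list(itertools.combinations(matrix, k))
    avg = sum(fraction_found(p) for p in panels) / len(panels)
    print(f"{k} evaluators -> {avg:.0%} of problems found on average")
```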
So where's the peak of this curve? It's of course going to depend on the user interface that you're working with, how much you're paying people, how much time is involved, all sorts of factors. Jakob Nielsen's rule of thumb for these kinds of user interfaces and heuristic evaluation is that three to five people tend to work pretty well, and that's been my experience too.

And I think one of the reasons that people use heuristic evaluation is that it can be an extremely cost-effective way of finding problems. In one study that Jakob Nielsen ran, he estimated that the value of the problems found with heuristic evaluation was $500,000, while the cost of performing it was just over $10,000, so he estimates a 48-fold benefit-cost ratio for this particular user interface. Obviously, these numbers are back-of-the-envelope, and your mileage will vary. You can think about how to estimate the benefit you'd get from something like this: for an in-house software tool, use something like productivity increases; if you're making an expense-reporting system or another in-house system that helps people use their time more efficiently, that's a big usability win. And if you've got software that you're making available on the open market, you can think about the benefit in terms of sales or other measures like that.

One thing that we can get from that graph is that evaluators are more likely to find the severe problems, and that's good news: with a relatively small number of people, you're pretty likely to stumble across the most important stuff. However, as we saw, in this particular case even the best evaluator found only about a third of the problems in the system. And that's why ganging up a number of evaluators, say five, is going to get you most of the benefit that you'll be able to achieve.

If we compare heuristic evaluation and user testing, one of the things that we see is that heuristic evaluation can often be a lot faster: it takes just an hour or two per evaluator, while the mechanics of getting a user test up and running can take longer, not even accounting for the fact that you may have to build working software first. Also, heuristic evaluation results come pre-interpreted: your evaluators directly provide you with problems and things to fix, and so it saves you the time of having to infer from a usability test what the problem or the solution might be.
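As a quick arithmetic check of those cost-benefit figures: the lecture only says the cost was "just over $10,000", so the exact number below is an assumption chosen to reproduce the roughly 48-fold ratio.

```python
# Back-of-the-envelope check of Nielsen's example figures.
estimated_benefit = 500_000  # estimated value of the problems found, in dollars
estimated_cost = 10_500      # assumed value for "just over $10,000", in dollars
print(f"benefit-cost ratio: about {estimated_benefit / estimated_cost:.0f}-fold")
```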
Now, conversely, experts walking through your system can generate false positives: problems that wouldn't actually come up in a real environment. And this indeed does happen, so user testing is, sort of by definition, going to be more accurate.

At the end of the day, I think it's valuable to alternate methods. All of the different techniques that you'll learn in this class for getting feedback can be valuable, and by cycling through them you can often get the benefits of each. With heuristic evaluation and user testing you'll find different problems, and by running heuristic evaluation early in the design process, you'll avoid wasting the real users that you may bring in later on.

So now that we've seen the benefits, what are the steps? The first thing to do is get all of your evaluators up to speed on the story behind your software, along with any domain knowledge they might need, and tell them about the scenario that you're going to have them step through. Then, obviously, you have the evaluation phase, where people work through the interface. Afterwards, each person assigns severity ratings, individually first; then you aggregate those into group severity ratings and produce an aggregate report. And finally, once you've got this aggregated report, you can share it with the design team, and the design team can discuss what to do with it.

Doing this kind of expert review can be really taxing, so for each of the scenarios that you lay out in your design, it can be valuable to have the evaluator go through the scenario twice. The first time, they'll just get a sense of it; the second time, they can focus on more specific elements.

If you've got a walk-up-and-use system, like a ticket machine, you may want to give people no background information at all: if your real users are just getting off the bus or the train and walking up to your machine without any prior information, that's the experience you'll want your evaluators to have. On the other hand, if you've got a genomics system or some other expert user interface, you'll want to make sure that whatever training you would give to real users, you give to your evaluators as well. In other words, whatever the background is, it should be realistic.
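Jumping back to the aggregation step in that process: here is a minimal sketch of one way to combine individual ratings into a group report. Using the median as the combining rule, and the example issues themselves, are assumptions for illustration; teams also average ratings or discuss their way to a consensus number.

```python
from statistics import median

# Each evaluator rates each problem independently on a 0-4 scale;
# the group severity here is the median of those ratings.
individual_ratings = {
    "No way to edit weight after entry": [3, 2, 3],
    "Inconsistent button labels":        [1, 2, 1],
    "Destructive action has no undo":    [4, 3, 4],
}

# Sort the aggregate report so the most severe problems come first.
report = sorted(
    ((median(r), issue) for issue, r in individual_ratings.items()),
    reverse=True,
)
for severity, issue in report:
    print(f"group severity {severity}: {issue}")
```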
When your evaluators are walking through your interface, it's going to be important for them to produce a list of very specific problems and to explain each problem with reference to one of the design heuristics. You don't want people to just say, "I don't like it." And in order to make these results maximally useful to the design team, you'll want each problem listed separately, so that each one can be dealt with efficiently. Separate listings also help you avoid recording the same problem over and over again: if there's a problematic element repeated on every single screen, you don't want to list it at every single screen; you want to list it once so that it can be fixed once.

These problems can be very detailed, like "the name of something is confusing," or they can have more to do with the flow of the user interface, or the architecture of the user experience, and not be tied to a specific interface element.

Your evaluators may also find that something is missing that ought to be there, and this can sometimes be ambiguous with early prototypes, like paper prototypes. So you'll want to clarify ahead of time whether the user interface is something that you believe to be complete, or whether there are elements that are intentionally missing. And of course, sometimes there are features that are obviously implied by the user interface; mellow out, and relax on those.

After your evaluators have gone through the interface, they can each independently assign a severity rating to all of the problems that they've found. That's going to enable you to allocate resources to fix those problems. It can also give you feedback about how well you're doing in terms of the usability of your system in general, and give you a kind of benchmark for your efforts in this vein.

The severity measure that your evaluators come up with combines several things: the frequency, the impact, and the pervasiveness of the problem that they're seeing. So something that appears in only one place may be less of a big deal than something that shows up throughout the entire user interface. Similarly, there are going to be some things, like misaligned text, that may be inelegant but aren't a deal killer for your software.
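The lecture says severity blends frequency, impact, and pervasiveness but doesn't give a formula, so here is a small sketch of how a finding might be recorded, with a plain average standing in for the combination rule purely as an assumption.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """One specific, separately listed problem, tied to a heuristic."""
    screen: str         # where it occurs ("global" if it is pervasive)
    heuristic: str      # which design principle it violates
    description: str
    frequency: int      # how often users will run into it (0-4)
    impact: int         # how badly it hurts when they do (0-4)
    pervasiveness: int  # how much of the interface it affects (0-4)

    def severity_hint(self) -> float:
        # Assumed combination rule: a simple average of the three factors.
        return (self.frequency + self.impact + self.pervasiveness) / 3

finding = Finding(
    screen="profile",
    heuristic="User control and freedom",
    description="Weight cannot be edited once entered",
    frequency=3, impact=2, pervasiveness=1,
)
print(round(finding.severity_hint(), 1))  # 2.0
```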
And here is the severity rating system that Nielsen created; you can obviously use anything that you want. It ranges from zero to four, where zero means that at the end of the day your evaluator decides it actually isn't a usability problem, all the way up to four, something really catastrophic that has to get fixed right away.

And here is an example of a particular problem that our TA Robby found when he was taking CS147 as a student. He walked through somebody's mobile interface that had a "weight" entry element, and he realized that once you entered your weight, there was no way to edit it after the fact. So that's kind of clunky, you wish you could fix it, but maybe not a disaster. What you see here is that he's listed the issue, given it a severity rating, named the heuristic that it violates, and then described exactly what the problem is.

And finally, after all your evaluators have gone through the interface, listed their problems, and combined them in terms of severity and importance, you'll want to debrief with the design team. This is a nice chance to discuss general issues in the user interface and qualitative feedback, and it gives you a chance to go through each of these items and suggest improvements for how you can address these problems.

In this debrief session, it can be valuable for the development team to estimate the amount of effort that it would take to fix each of these problems. So, for example, if you've got something that's a one on your severity scale and not too big a deal, maybe a wording issue that's dirt simple to fix, that tells you to go ahead and fix it. Conversely, you may have something that's a catastrophe and takes a lot more effort, but its importance will lead you to fix it. And there are other things where the importance, relative to the cost involved, means it just doesn't make sense to deal with them right now. This debrief session can also be a great way to brainstorm future design ideas, especially while you've got all the stakeholders in the room and the issues with the user interface are fresh in their minds.
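Here is a small sketch of that severity-versus-effort triage. The issues, severities, and effort estimates are invented, and the value-for-effort ratio is just one possible way to order the discussion; as noted above, a catastrophic item may get fixed regardless of its cost.

```python
# Severity is on Nielsen's 0-4 scale; effort is an estimated cost in
# person-days. The ratio is a crude value-for-effort score for triage.
issues = [
    ("Reword a confusing label",          1, 0.1),
    ("No way to edit weight after entry", 3, 1.0),
    ("Crash loses all entered data",      4, 5.0),
]

for name, severity, effort in sorted(
        issues, key=lambda i: i[1] / i[2], reverse=True):
    print(f"{name}: severity {severity}, ~{effort} person-days to fix")
```

In the next two videos, we'll go through Nielsen's ten heuristics and talk more about what they mean.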