In this video we are going to introduce a technique called Heuristic Evaluation. As we talked about at the beginning of the course, there are lots of different ways to evaluate software. One that you might be most familiar with is empirical methods, where, with some level of formality, you have actual people trying out your software. It's also possible to have formal methods, where you build a model of how people behave in a particular situation, and that enables you to predict how different user interfaces will work. Or, if you can't build a closed-form formal model, you can also try out your interface in simulation and run automated tests that can detect usability bugs and assess how effective a design is. This works especially well for low-level stuff; it's harder to do for higher-level stuff. And what we're going to talk about today is critique-based approaches, where people give you feedback directly, based on their expertise or a set of heuristics.

As any of you who have ever taken an art or design class know, peer critique can be an incredibly effective form of feedback, and it can help you make your designs even better. You can get peer critique at really any stage of your design process, but I'd like to highlight a couple of points where I think it can be particularly valuable. First, it's really valuable to get peer critique before user testing, because that helps you not waste your users on stuff that a critique would pick up anyway. You want to be able to focus the valuable resources of user testing on things that other people wouldn't be able to pick up on. The rich qualitative feedback that peer critique provides can also be really valuable before redesigning your application, because it can show you which parts of your app you probably want to keep, and which parts are more problematic and deserve redesign. Third, sometimes you know there are problems, and you need data to be able to convince other stakeholders to make the changes. Peer critique can be a great way, especially if it's structured, to get the feedback that you need to make the changes that you know need to happen. And lastly, this kind of structured peer critique can be really valuable before releasing software, because it helps you do a final sanding of the entire design and smooth out any rough edges. As with most types of evaluation, it's usually helpful to begin with a clear goal, even if what you ultimately learn is completely unexpected.

And so, what we're going to talk about today is a particular technique called Heuristic Evaluation. Heuristic Evaluation was created by Jakob Nielsen and colleagues, about twenty years ago now, and its goal is to find usability problems in a design. I first learned about Heuristic Evaluation when I TA'd James Landay's Intro to HCI course, and I've been using it and teaching it ever since. It's a really valuable technique because it lets you get feedback really quickly, and it's a high bang-for-the-buck strategy. The slides that I have here are based on James's slides for this course, and the materials are all available on Jakob Nielsen's website. The basic idea of heuristic evaluation is that you're going to provide a set of people, often other stakeholders on the design team or outside design experts, with a set of heuristics or principles, and they're going to use those to look for problems in your design.
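To make the raw output of that idea concrete, here is a minimal sketch, in Python, of how one evaluator's findings might be recorded against the list of heuristics. The field names and the example finding are illustrative choices of mine, not part of Nielsen's method; the heuristic names shown are a few of Nielsen's actual ten.

```python
from dataclasses import dataclass

# A few of Nielsen's heuristic names, used here only as labels.
HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
]

@dataclass
class Finding:
    """One specific usability problem reported by one evaluator."""
    evaluator: str    # who found it
    task: str         # the scenario being stepped through
    heuristic: str    # the principle the problem violates
    description: str  # a concrete, separately listed problem

example = Finding(
    evaluator="Evaluator A",
    task="Buy a ticket",
    heuristic="Visibility of system status",
    description="No feedback after pressing 'Purchase'; unclear whether it worked.",
)
```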
Each of them is first going to do this independently, and so they'll walk through a variety of tasks using your design to look for these bugs. You'll see that different evaluators are going to find different problems. They're only going to communicate at the end: once everyone has finished, they'll get back together and talk about what they found. This "independent first, gather afterwards" approach is how you get a "wisdom of crowds" benefit from having multiple evaluators. One of the reasons that we're talking about this early in the class is that it's a technique you can use either on a working user interface or on sketches of user interfaces. And so heuristic evaluation works really well in conjunction with paper prototypes and other rapid, low-fidelity techniques that you may be using to get your design ideas out quick and fast.

Here are Nielsen's ten heuristics, and they're a pretty darn good set. That said, there's nothing magic about these heuristics. They do a pretty good job of covering many of the problems that you'll see in many user interfaces, but you can add any that you want and drop any that aren't appropriate for your system. We're going to go over the content of these ten heuristics in the next couple of lectures; in this lecture I'd like to introduce the process that you're going to use with them.

So here's what you're going to have your evaluators do: give them a couple of tasks to use your design for, and have them do each task, stepping through it carefully several times. While they're doing this, they're going to keep the list of usability principles as a reminder of things to pay attention to. Now which principles will you use? I think Nielsen's ten heuristics are a fantastic start, and you can augment those with anything else that's relevant for your domain. So, if you have particular design goals that you would like your design to achieve, include those in the list. Or, if you have particular goals that you've set up from competitive analysis of designs that are out there already, that's great too. Or if there are things that you've seen your own or other designs excel at, those are important goals too and can be included in your list of heuristics. And then, obviously, the important part is that you're going to take what you learn from these evaluators and use those violations of the heuristics as a way of fixing problems and redesigning.

Let's talk a little bit more about why you might want to have multiple evaluators rather than just one. The graph on this slide is adapted from Jakob Nielsen's work on heuristic evaluation, and what you see is that each black square is a bug that a particular evaluator found. An individual evaluator represents a row of this matrix, and there are about twenty evaluators in this set. The columns represent the problems. What you can see is that there are some problems that were found by relatively few evaluators, and other stuff that almost everybody found. So we're going to call the stuff on the right the easy problems and the stuff on the left the hard problems. In aggregate, what we can say is that no evaluator found every problem, and some evaluators found more than others, so there are better and worse people to do this. So why not have lots of evaluators? Well, as you add more evaluators, they do find more problems; but it kind of tapers off over time, and you lose that benefit eventually.
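That tapering-off is easy to see with a small back-of-the-envelope model. The sketch below assumes each evaluator independently finds a given problem with some fixed probability; the probabilities are made up for illustration and are not data from Nielsen's study.

```python
# Assumed per-evaluator detection probabilities for a handful of problems:
# "easy" problems that most evaluators catch, "hard" ones that few do.
problems = {"easy-1": 0.8, "easy-2": 0.7, "hard-1": 0.3, "hard-2": 0.2, "hard-3": 0.1}

def expected_fraction_found(num_evaluators: int) -> float:
    """Expected fraction of problems found by a panel of independent evaluators."""
    per_problem = [1 - (1 - p) ** num_evaluators for p in problems.values()]
    return sum(per_problem) / len(per_problem)

for k in (1, 3, 5, 10, 20):
    print(f"{k:2d} evaluators -> {expected_fraction_found(k):.0%} of problems expected")
```

Under these made-up numbers, going from one evaluator to three or five adds a lot, while going from ten to twenty adds very little; that is the shape behind the rule of thumb discussed next.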
And so from a cost-benefit perspective it just stops making sense after a certain point. So where's the peak of this curve? It's of course going to depend on the user interface that you're working with, how much you're paying people, how much time is involved, and all sorts of other factors. Jakob Nielsen's rule of thumb for these kinds of user interfaces and heuristic evaluation is that three to five people tends to work pretty well, and that's been my experience too. And I think that definitely one of the reasons people use heuristic evaluation is that it can be an extremely cost-effective way of finding problems. In one study that Jakob Nielsen ran, he estimated that the value of the problems found with heuristic evaluation was $500,000 and the cost of performing it was just over $10,000, so he estimates a 48-fold benefit-cost ratio for this particular user interface. Obviously, these numbers are back of the envelope, and your mileage will vary. If you have an in-house software tool, you can estimate the benefit of something like this using productivity increases: if you're making an expense reporting system or another in-house system, anything that lets people use their time more efficiently is a big usability win. And if you've got software that you're making available on the open market, you can think about the benefit in terms of sales or other measures like that.

One thing that we can get from that graph is that evaluators are more likely to find severe problems, and that's good news; with a relatively small number of people, you're pretty likely to stumble across the most important stuff. However, as we saw, with just one person in this particular case, even the best evaluator found only about a third of the problems in the system. And so that's why ganging up a number of evaluators, say five, is going to get you most of the benefit that you're going to be able to achieve.

If we compare heuristic evaluation and user testing, one of the things that we see is that heuristic evaluation can often be a lot faster: it takes just an hour or two for an evaluator, while the mechanics of getting a user test up and running can take longer, not even accounting for the fact that you may have to build software. Also, the heuristic evaluation results come pre-interpreted, because your evaluators are directly providing you with problems and things to fix, and so it saves you the time of having to infer from a usability test what the problem or solution might be. Conversely, experts walking through your system can generate false positives, problems that wouldn't actually come up in a real environment. This does indeed happen, and so user testing is, sort of by definition, going to be more accurate. At the end of the day I think it's valuable to alternate methods: all of the different techniques that you'll learn in this class for getting feedback can each be valuable, and by cycling through them you can often get the benefits of each. That's because with heuristic evaluation and user testing you'll find different problems, and by running heuristic evaluation or something like it early in the design process, you'll avoid wasting the real users that you may bring in later on. So now that we've seen the benefits, what are the steps?
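The benefit-cost claim is simple arithmetic, and it's worth seeing the numbers side by side. In the sketch below, the $10,500 cost figure is an assumed stand-in for "just over $10,000"; only the $500,000 estimate and the roughly 48-fold ratio are the figures quoted above.

```python
# Back-of-the-envelope benefit-cost calculation for the case study cited above.
estimated_value_of_problems = 500_000   # estimated value of the problems found
cost_of_evaluation = 10_500             # assumed stand-in for "just over $10,000"

ratio = estimated_value_of_problems / cost_of_evaluation
print(f"benefit-cost ratio ~ {ratio:.0f}x")   # ~ 48x
```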
The first thing to do is to get all of your evaluators up to speed on what the story is behind your software and any domain knowledge they might need, and tell them about the scenario that you're going to have them step through. Then, obviously, you have the evaluation phase, where people are working through the interface. Afterwards, each person is going to assign severity ratings; you do this individually first, and then you're going to aggregate those into group severity ratings and produce an aggregate report out of that. And finally, once you've got this aggregated report, you can share it with the design team, and the design team can discuss what to do with it.

Doing this kind of expert review can be really taxing, and so for each of the scenarios that you lay out in your design, it can be valuable to have the evaluator go through that scenario twice. The first time, they'll just get a sense of it; the second time, they can focus on more specific elements. If you've got some walk-up-and-use system, like a ticket machine somewhere, then you may want to not give people any background information at all, because if you've got people that are just getting off the bus or the train and walking up to your machine without any prior information, that's the experience you'll want your evaluators to have. On the other hand, if you've got a genomics system or some other expert user interface, you'll want to make sure that whatever training you would give to real users, you also give to your evaluators. In other words, whatever the background is, it should be realistic.

When your evaluators are walking through your interface, it's going to be important for them to produce a list of very specific problems and to explain each problem with regard to one of the design heuristics. You don't want people to just say, "I don't like it." And to make these results as useful as possible to the design team, you'll want each problem listed separately so that it can be dealt with efficiently. Separate listings also help you avoid repeating the same problem over and over again: if there's a problematic element repeated on every single screen, you don't want to list it at every single screen; you want to list it once so that it can be fixed once. These problems can be very detailed, like "the name of something is confusing," or they can be something that has more to do with the flow of the user interface, or the architecture of the user experience, and that's not specifically tied to an interface element. Your evaluators may also find that something is missing that ought to be there, and this can sometimes be ambiguous with early prototypes, like paper prototypes. So you'll want to clarify ahead of time whether the user interface is something that you believe to be complete, or whether there are elements intentionally missing. And, of course, sometimes there are features that are obviously going to be there because they're implied by the user interface; so mellow out, and relax on those.

After your evaluators have gone through the interface, they can each independently assign a severity rating to all of the problems that they've found. That's going to enable you to allocate resources to fix those problems. It can also give you feedback about how well you're doing in terms of the usability of your system in general, and give you a kind of benchmark of your efforts in this vein.
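Here is one way that "list each problem once, then aggregate the individual severity ratings" might look in practice. This is a sketch under my own assumptions: the field names are invented, and combining ratings with a mean is just one option; teams also use the median or simply talk their way to a consensus number.

```python
from statistics import mean

# One problem, listed once even though the offending element appears on every screen.
problem = {
    "issue": "Repeated navigation label is ambiguous",
    "heuristic": "Match between system and the real world",
    "description": "The label appears on every screen but never explains "
                   "what will happen when it is pressed.",
}

# Severity ratings (0-4) assigned independently by each evaluator after the walkthrough.
individual_ratings = {"Evaluator A": 3, "Evaluator B": 2, "Evaluator C": 3}

# Aggregate into a single group severity for the report to the design team.
problem["group_severity"] = round(mean(individual_ratings.values()), 1)
print(problem["group_severity"])  # 2.7
```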
The severity measure that your evaluators come up with combines several things: the frequency, the impact, and the pervasiveness of the problem that they're seeing on the screen. So, something that appears in only one place may be a smaller deal than something that shows up throughout the entire user interface. Similarly, there are going to be some things, like misaligned text, which may be inelegant but aren't a deal killer in terms of your software. And here is the severity rating system that Nielsen created; you can obviously use anything that you want. It ranges from zero to four, where zero is "at the end of the day, your evaluator decides it actually is not a usability problem," all the way up to four being something really catastrophic that has to get fixed right away.

And here is an example of a particular problem that our TA Robby found when he was taking CS147 as a student. He walked through somebody's mobile interface that had a "weight" entry element to it, and he realized that once you entered your weight, there was no way to edit it after the fact. So, that's kind of clunky, you wish you could fix it, but maybe not a disaster. And what you see here is he's listed the issue, he's given it a severity rating, he's noted the heuristic that it violates, and then he describes exactly what the problem is.

And finally, after all your evaluators have gone through the interface, listed their problems, and combined them in terms of severity and importance, you'll want to debrief with the design team. This is a nice chance to discuss general issues in the user interface and qualitative feedback, and it gives you a chance to go through each of these items and suggest improvements for how you can address these problems. In this debrief session, it can be valuable for the development team to estimate the amount of effort that it would take to fix each of these problems. So, for example, if you've got something that is a one on your severity scale and not too big a deal, maybe it has something to do with wording and it's dirt simple to fix, that tells you to go ahead and fix it. Conversely, you may have something which is a catastrophe and takes a lot more effort, but its importance will lead you to fix it. And there are other things where the importance, relative to the cost involved, means it just doesn't make sense to deal with them right now (the sketch below shows one way to line severity up against effort). This debrief session can also be a great way to brainstorm future design ideas, especially while you've got all the stakeholders in the room and the issues with the user interface are fresh in their minds. In the next two videos we'll go through Nielsen's ten heuristics and talk more about what they mean.
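As a closing illustration, here is a minimal sketch of that severity-versus-effort triage. The zero and four endpoints of the scale follow the description above; the intermediate labels are paraphrases of Nielsen's published scale, and the severity numbers, effort estimates, and triage rule are illustrative assumptions of mine. Only the "weight" issue is Robby's example from above.

```python
# Nielsen-style 0-4 severity scale; labels 1-3 are paraphrases, not quotes.
SEVERITY_SCALE = {
    0: "not actually a usability problem",
    1: "cosmetic problem only",
    2: "minor usability problem",
    3: "major usability problem",
    4: "usability catastrophe: fix before release",
}

# Issues with an aggregated severity and the development team's effort estimates (days).
issues = [
    {"issue": "Awkward wording on one screen",            "severity": 1, "effort_days": 0.1},
    {"issue": "Weight cannot be edited once entered",     "severity": 2, "effort_days": 1.0},
    {"issue": "Flow loses entered data partway through",  "severity": 4, "effort_days": 5.0},
]

def priority(item):
    # Catastrophes get fixed regardless of cost; otherwise favor severity per unit effort.
    return (item["severity"] == 4, item["severity"] / item["effort_days"])

for item in sorted(issues, key=priority, reverse=True):
    print(f"{item['issue']}: severity {item['severity']} "
          f"({SEVERITY_SCALE[item['severity']]}), ~{item['effort_days']} days to fix")
```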