In many ways, the most creative, challenging, and under-appreciated aspect of interaction design is evaluating designs with people. The insights you'll get from testing designs with people can help you get new ideas, make changes, decide wisely, and fix bugs. One reason I think design is such an interesting field is its relationship to truth and objectivity. I find design so fascinating because we can say more in response to a question like “How can we measure success?” than “It's just personal preference” or “Whatever feels right.” At the same time, the answers are more complex, more open-ended, more subjective, and require more wisdom than just a number like 7 or 3. One of the things we're going to learn in this class is the different kinds of knowledge you can get out of different kinds of methods.

Why evaluate designs with people? Why learn about how people use interactive systems? One major reason is that it can be difficult to tell how good a user interface is until you've tried it out with actual users. Clients, designers, and developers may know too much about the domain and the user interface, or may have acquired blinders through designing and building it; at the same time, they may not know enough about users' actual tasks. And while experience and theory can help, it can still be hard to predict what real users will actually do. You might want to know, “Can people figure out how to use it?”, “Do they swear or giggle when using this interface?”, “How does this design compare to that design?”, “If we changed the interface, how would that change people's behaviour?”, “What new practices might emerge?”, or “How do things change over time?” These are all great questions to ask about an interface, and the answers to each will come from different methods. A broad toolbox of methods is especially valuable in emerging areas like mobile and social software, where people's use practices can be particularly context-dependent and can evolve significantly over time in response to how other people use the software, through network effects and the like.

To give you a flavour of this, I'd like to quickly run through some common types of empirical research in HCI. The examples I'll show are mostly published work of one sort or another, because that's the easiest material to share. If you have good examples from current systems out in the world, post them to the forum! I keep an archive of user interface examples, and the other students and I would love to see what you come up with.

One way to learn about the user experience of a design is to bring people into your lab or office and have them try it out. We often call these usability studies. This “watch someone use my interface” approach is a common one in HCI. The basic strategy of traditional user-centred design is to iteratively bring people into your lab or office until you run out of time, and then release. If you had deep pockets, these rooms had a one-way mirror, with the development team on the other side. In a leaner environment, this may just mean bringing people into your dorm room or office. You'll learn a huge amount by doing this. Every single time that I or a student, friend, or colleague has watched somebody use a new interactive system, we've learned something, because as designers we acquire blinders to a system's quirks, bugs, and false assumptions. However, there are some major shortcomings to this approach.
First, the setting probably isn't very ecologically valid. In the real world, people may have different tasks, goals, motivations, and physical settings than your office or lab. This can be especially true for user interfaces that you think people might use on the go, like at a bus stop or while waiting in line. Second, there can be a “please me” experimental bias: when you bring somebody in to try out a user interface, they know they're trying out technology that you developed, so they may work harder or be nicer than they would be using it outside the constraints of a lab setting, without the person who developed it watching right over them. Third, in its most basic form, where you're trying out just one user interface, there is no comparison point. So while you can track when people laugh, or swear, or smile with joy, you won't know whether they would've laughed more, sworn less, or smiled more if you'd had a different user interface. And finally, it requires bringing people to your physical location, which is often harder than a lot of people think; it can be a psychological burden, even if nothing else.

A very different way of getting feedback from people is to use a survey. Here's an example of a survey I got recently from San Francisco asking about different streetlight designs. Surveys are great because you can quickly get feedback from a large number of people, and it's relatively easy to compare multiple alternatives. You can also automatically tally the results. You don't even need to build anything; you can just show screenshots or mock-ups. One of the things that I've learned the hard way, though, is the difference between what people say they're going to do and what they actually do. Ask people how often they exercise and you'll probably get a much more optimistic answer than how often they really do exercise. The same holds for the streetlight example here: trying to imagine what a number of different streetlight designs might be like is really different from actually seeing them on the street and having them become part of normal, everyday life. Still, surveys can be a valuable way to get feedback.

Another self-report strategy is the focus group. In a focus group, you gather a small group of people to discuss a design or idea. The fact that focus groups involve a group of people is a double-edged sword. On one hand, people can tease out of their colleagues things that they might not have thought to say on their own; on the other hand, for a variety of psychological reasons, people may be inclined to say polite things or generate answers completely on the spot that are totally uncorrelated with what they believe or what they would actually do. Focus groups can be a particularly problematic method when you're trying to gather data about taboo topics or cultural biases. With those caveats in mind (right now we're just making a laundry list of methods), I think focus groups, like almost any other method, can play an important role in your toolbelt.

Our third category of techniques is to get feedback from experts. For example, in this class we're going to do a bunch of peer critique of your weekly project assignments. In addition to having users try your interface, it can be important to eat your own dog food and use the tools that you built yourself.
When you're getting feedback from experts, it can often be helpful to have some kind of structured format, much like the rubrics you'll see in your project assignments. For getting feedback on user interfaces, one common approach to this kind of structured feedback is heuristic evaluation, pioneered by Jakob Nielsen; you'll learn how to do it in this class.

Our next genre is comparative experiments: taking two or more distinct options and comparing their performance against each other. These comparisons can take place in lots of different ways: in the lab, in the field, or online. The experiments can be more or less controlled, and they can take place over shorter or longer durations. What you're trying to learn is which option is more effective and, more often, what the active ingredients are: the variables that matter in creating the user experience that you seek. (I'll sketch what analyzing a very simple comparison like this might look like in just a bit.) Here's an example: my former PhD student Joel Brandt and his colleagues at Adobe ran a number of studies comparing help interfaces for programmers. In particular, they compared a more traditional search-style user interface for finding programming help with a search interface that integrated programming help directly into the development environment. By running these comparisons, they were able to see how programmers' behaviour differed depending on the help interface. Comparative experiments have an advantage over surveys in that you get to see actual behaviour as opposed to self-report, and they can be better than usability studies because you're comparing multiple alternatives. This enables you to see what works better or worse, or at least what works differently. I find that comparative feedback is also often much more actionable. However, if you're running controlled experiments online, you don't get to see much about the person on the other side of the screen. And if you're inviting people into your office or lab, the behaviour you're measuring might not be very realistic.

If realistic, longitudinal behaviour is what you're after, participant observation may be the approach for you. This approach is just what it sounds like: observing what people actually do in their actual work environment. This more long-term evaluation can be important for uncovering things you might not see in shorter-term, more controlled scenarios. For example, my colleagues Bob Sutton and Andrew Hargadon studied brainstorming. The prior literature on brainstorming had focused mostly on questions like “Do people come up with more ideas?” What Bob and Andrew realized by going into the field was that brainstorming serves a number of other functions as well: it provides a way for members of the design team to demonstrate their creativity to their peers, it allows them to pass along knowledge that can then be reused in other projects, and it creates a fun, exciting environment that people like to work in and that clients like to participate in. In a real ecosystem, all of these things are important, in addition to the ideas that people come up with.
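To make the comparative-experiment idea a little more concrete, here is a minimal sketch of what analyzing a simple two-condition comparison might look like. Everything in it is made up for illustration: the task times, the condition names, and the choice of a Welch's t-test are my assumptions, not what Joel and his colleagues actually did.

```python
from statistics import mean
from scipy import stats  # Welch's t-test lives in scipy.stats

# Hypothetical task-completion times (seconds) for two help interfaces.
traditional_search = [48.2, 61.5, 55.0, 70.3, 52.8, 66.1, 58.4, 49.9]
integrated_help = [41.7, 50.2, 44.9, 57.6, 46.3, 53.0, 48.8, 43.5]

print(f"mean, traditional search: {mean(traditional_search):.1f} s")
print(f"mean, integrated help:    {mean(integrated_help):.1f} s")

# Welch's t-test: is the difference larger than chance variation would suggest?
result = stats.ttest_ind(traditional_search, integrated_help, equal_var=False)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

In practice you'd look beyond a single p-value, at effect sizes, individual differences, and what people actually did qualitatively, but the basic move is the same: measure the same behaviour under each alternative and ask whether the difference is bigger than chance variation.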
Nearly all experiments seek to build a theory on some level. I don't mean anything fancy by this; I just mean that we take some things to be more relevant and other things to be less relevant. We might, for example, assume that the ordering of search results plays an important role in what people click on, but that the batting average of the Detroit Tigers doesn't, unless, of course, somebody's searching for baseball. If you have a theory that's sufficiently formal and mathematical that you can make predictions, then you can compare alternative interfaces using that model, without having to bring people in at all. We'll go over this a little bit in this class, with respect to input models. This makes it possible to try out a number of alternatives really fast; consequently, when people use simulations, it's often in conjunction with something like Monte Carlo optimization. One example of this can be found in the ShapeWriter system, where Shumin Zhai and colleagues figured out how to build a keyboard where people could enter an entire word in a single stroke. They were able to do this with the benefit of formal models and optimization-based approaches. Simulation has mostly been used for input techniques, because people's motor performance is probably the best-quantified area of HCI. And while we won't get to it much in this intro course, simulation can also be used for higher-level cognitive tasks; for example, Pete Pirolli and colleagues at PARC have built impressive models of people's web-searching behaviour. These models enable them to estimate, for example, which link somebody is most likely to click on by looking at the relevant link text. (There's a tiny sketch of this model-based idea at the very end of this segment.)

That's our whirlwind tour of a number of the empirical methods this class will introduce. You'll want to pick the right method for the right task, and here are some issues to consider. One is reliability: if you did it again, would you get the same thing? Another is generalizability and realism: does this hold for people other than 18-year-old, upper-middle-class students who are doing this for course credit or a gift certificate? Is this behaviour also what you'd see in the real world, or only in a more stilted lab environment? Comparisons are important, because they can tell you how the user experience would change with different interface choices, as opposed to just giving you a “people liked it” study. It's also important to think about how to gain these insights efficiently, without chewing up a lot of resources, especially when your goal is practical.

My experience as a designer, researcher, teacher, consultant, advisor, and mentor has taught me that evaluating designs with people is both easier and more valuable than many people expect, and there's an incredible lightbulb moment that happens when you actually get designs in front of people and see how they use them. So, to sum up this video, I'd like to ask what may be the most important question: “What do you want to learn?”
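Since I mentioned using formal input models to compare designs, here is a minimal sketch of that idea under made-up assumptions. It scores two hypothetical button layouts with a Fitts'-law-style model of pointing time; the constants and the layout geometries are invented for illustration, and this is my own toy example, not how ShapeWriter was actually built. A real optimization would search over many candidate layouts (that's where something like Monte Carlo optimization comes in) rather than scoring just two by hand.

```python
import math

# Hypothetical Fitts' law constants: MT = a + b * log2(D/W + 1)
A, B = 0.2, 0.1  # seconds, seconds per bit (made-up values)

def movement_time(distance: float, width: float) -> float:
    """Predicted time (seconds) to acquire a target of `width` at `distance`."""
    return A + B * math.log2(distance / width + 1)

# Each layout is a list of (distance-to-target, target-width) pairs in pixels.
small_far_targets = [(400, 30), (520, 30), (610, 30)]
large_near_targets = [(250, 60), (330, 60), (410, 60)]

for name, layout in [("small, far targets", small_far_targets),
                     ("large, near targets", large_near_targets)]:
    avg = sum(movement_time(d, w) for d, w in layout) / len(layout)
    print(f"{name}: predicted mean movement time {avg:.2f} s")
```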