1
00:00:01,426 --> 00:00:06,083
In many ways, the most creative, challenging, and under-appreciated aspect of interaction design

2
00:00:06,083 --> 00:00:08,464
is evaluating designs with people.

3
00:00:08,464 --> 00:00:11,566
The insights that you’ll get from testing designs with people

4
00:00:11,566 --> 00:00:16,017
can help you get new ideas, make changes, decide wisely, and fix bugs.

5
00:00:16,017 --> 00:00:20,811
One reason I think design is such an interesting field is its relationship to truth and objectivity.

6
00:00:20,811 --> 00:00:26,407
I find design so incredibly fascinating because we can say more in response to a question like:

7
00:00:26,407 --> 00:00:32,961
“How can we measure success?” than “It’s just personal preference” or “Whatever feels right.”

8
00:00:32,961 --> 00:00:37,474
At the same time, the answers are more complex and more open-ended, more subjective,

9
00:00:37,474 --> 00:00:41,545
and require more wisdom than just a number like 7 or 3.

10
00:00:41,545 --> 00:00:43,850
One of the things that we’re going to learn in this class

11
00:00:43,850 --> 00:00:48,372
is the different kinds of knowledge that you can get out of different kinds of methods.

12
00:00:48,372 --> 00:00:53,319
Why evaluate designs with people? Why learn about how people use interactive systems?

13
00:00:53,319 --> 00:00:58,444
I think one major reason for this is that it can be difficult to tell how good a user interface is

14
00:00:58,444 --> 00:01:03,974
until you’ve tried it out with actual users, and that’s because clients and designers and developers,

15
00:01:03,974 --> 00:01:07,107
they may know too much about the domain and the user interface,

16
00:01:07,107 --> 00:01:11,376
or have acquired blinders through designing and building the user interface.

17
00:01:11,376 --> 00:01:15,415
At the same time they may not know enough about the user’s actual tasks.

18
00:01:15,415 --> 00:01:20,965
And while experience and theory can help, it can still be hard to predict what real users will actually do.

19
00:01:21,888 --> 00:01:25,002
You might want to know, “Can people figure out how to use it?”

20
00:01:25,002 --> 00:01:28,859
or “Do they swear or giggle when using this interface?”

21
00:01:28,859 --> 00:01:31,224
“How does this design compare to that design?”

22
00:01:31,224 --> 00:01:35,337
and, “If we changed the interface, how does that change people’s behaviour?”

23
00:01:35,337 --> 00:01:39,499
“What new practices might emerge?” “How do things change over time?”

24
00:01:39,499 --> 00:01:44,714
These are all great questions to ask about an interface, and each will come from different methods.

25
00:01:44,714 --> 00:01:49,932
Having a broad toolbox of different methods can be especially valuable in emerging areas

26
00:01:49,932 --> 00:01:56,178
like mobile and social software where people’s use practices can be particularly context-dependent

27
00:01:56,178 --> 00:02:00,681
and also evolve significantly over time in response to how other people use software

28
00:02:00,681 --> 00:02:03,197
through network effects and things like that.

29
00:02:03,197 --> 00:02:08,534
To give you a flavour of this, I’d like to quickly run through some common types of empirical research in HCI.

30
00:02:08,534 --> 00:02:11,741
The examples I’ll show are mostly published work of one sort or another,

31
00:02:11,741 --> 00:02:14,024
because that’s the easiest stuff to share.

32
00:02:14,024 --> 00:02:18,654
If you have good examples from current systems out in the world, post them to the forum!

33
00:02:18,654 --> 00:02:21,130
I keep an archive of user interface examples,

34
00:02:21,130 --> 00:02:24,434
and I and the other students would love to see what you can come up with.

35
00:02:24,434 --> 00:02:27,176
One way to learn about the user experience of a design

36
00:02:27,176 --> 00:02:30,811
is to bring people into your lab or office and have them try it out.

37
00:02:30,811 --> 00:02:32,978
We often call these usability studies.

38
00:02:32,978 --> 00:02:37,458
This “watch someone use my interface” approach is a common one in HCI.

39
00:02:37,458 --> 00:02:43,622
The basic strategy for traditional user-centred design is to iteratively bring people

40
00:02:43,622 --> 00:02:48,221
into your lab or office until you run out of time. And then release.

41
00:02:48,221 --> 00:02:52,312
And, if you had deep pockets, these rooms had a one-way glass mirror,

42
00:02:52,312 --> 00:02:54,684
and the development team was on the other side.

43
00:02:54,684 --> 00:02:59,245
In a leaner environment, this may be just bringing people into your dorm room or office.

44
00:02:59,245 --> 00:03:01,672
You’ll learn a huge amount by doing this.

45
00:03:01,672 --> 00:03:04,702
Every single time that I or a student, friend, or colleague

46
00:03:04,702 --> 00:03:07,731
has watched somebody use a new interactive system,

47
00:03:07,731 --> 00:03:14,185
we learn something. As designers, we get blinders to systems’ quirks, bugs, and false assumptions.

48
00:03:15,308 --> 00:03:19,562
However, there are some major shortcomings to this approach.

49
00:03:19,562 --> 00:03:24,122
In particular, the setting probably isn’t very ecologically valid.

50
00:03:24,122 --> 00:03:29,463
In the real world, people may have different tasks, goals, motivations, and physical settings

51
00:03:29,463 --> 00:03:32,288
than they do in your office or lab.

52
00:03:32,288 --> 00:03:35,354
This can be especially true for user interfaces that you think people might use on the go,

53
00:03:35,354 --> 00:03:38,405
like at a bus stop or while waiting in line.

54
00:03:38,405 --> 00:03:40,827
Second, there can be a “please me” experimental bias,

55
00:03:40,827 --> 00:03:44,122
where when you bring somebody in to try out a user interface,

56
00:03:44,122 --> 00:03:47,339
they know that they’re trying out the technology that you developed

57
00:03:47,339 --> 00:03:50,966
and so they may work harder or be nicer

58
00:03:50,966 --> 00:03:54,593
than they would if they had to use it without the constraints of a lab setup

59
00:03:54,593 --> 00:03:58,497
with the person who developed it watching right over them.

60
00:03:58,497 --> 00:04:03,338
Third, in its most basic form, where you’re trying out just one user interface, there is no comparison point.

61
00:04:03,338 --> 00:04:09,177
So while you can track when people laugh, or swear, or smile with joy,

62
00:04:09,177 --> 00:04:12,456
you won’t know whether they would’ve laughed more, or sworn less, or smiled more

63
00:04:12,456 --> 00:04:14,974
if you’d had a different user interface.

64
00:04:14,974 --> 00:04:18,176
And finally it requires bringing people to your physical location.

65
00:04:18,176 --> 00:04:20,596
This is often a whole lot easier than a lot of people think.

66
00:04:20,596 --> 00:04:23,845
It can be a psychological burden, even if nothing else.

67
00:04:24,307 --> 00:04:28,172
A very different way of getting feedback from people is to use a survey.

68
00:04:28,172 --> 00:04:31,150
Here is an example of a survey that I got recently from San Francisco

69
00:04:31,150 --> 00:04:34,127
asking about different street light designs.

70
00:04:34,127 --> 00:04:38,151
Surveys are great because you can quickly get feedback from a large number of respondents.

71
00:04:38,151 --> 00:04:41,353
And it’s relatively easy to compare multiple alternatives.

72
00:04:41,353 --> 00:04:44,385
You can also automatically tally the results.

73
00:04:44,385 --> 00:04:48,390
You don’t even need to build anything; you can just show screenshots or mock-ups.

74
00:04:48,390 --> 00:04:50,532
One of the things that I’ve learned the hard way, though,

75
00:04:50,532 --> 00:04:55,144
is the difference between what people say they’re going to do and what they actually do.

76
00:04:55,144 --> 00:04:59,026
Ask people how often they exercise and you’ll probably get a much more optimistic answer

77
00:04:59,026 --> 00:05:02,060
than how often they really do exercise.

78
00:05:02,060 --> 00:05:05,173
The same holds for the street light example here.

79
00:05:05,173 --> 00:05:08,999
Trying to imagine what a number of different street light designs might be like

80
00:05:08,999 --> 00:05:12,191
is really different than actually observing them on the street

81
00:05:12,191 --> 00:05:15,384
and having them become part of normal everyday life.

82
00:05:15,384 --> 00:05:18,085
Still, it can be valuable to get feedback.

83
00:05:18,085 --> 00:05:20,439
Another type of respondent strategy is focus groups.

84
00:05:20,439 --> 00:05:26,046
In a focus group, you’ll gather together a small group of people to discuss a design or idea.

85
00:05:26,046 --> 00:05:31,372
The fact that focus groups involve a group of people is a double-edged sword.

86
00:05:31,372 --> 00:05:37,541
On one hand, you can get people to tease out of their colleagues things that they might not have thought

87
00:05:37,541 --> 00:05:44,579
to say on their own; on the other hand, for a variety of psychological reasons, people may be inclined

88
00:05:44,579 --> 00:05:48,774
to say polite things or generate answers completely on the spot

89
00:05:48,774 --> 00:05:53,785
that are totally uncorrelated with what they believe or what they would actually do.

90
00:05:54,662 --> 00:05:59,982
Focus groups can be a particularly problematic method when you are looking at trying to gather data

91
00:05:59,982 --> 00:06:04,135
about taboo topics or about cultural biases.

92
00:06:04,135 --> 00:06:06,723
With those caveats — right now we’re just making a laundry list, and —

93
00:06:06,723 --> 00:06:12,312
I think that focus groups, like almost any other method, can play an important role in your toolbelt.

94
00:06:13,420 --> 00:06:16,574
Our third category of techniques is to get feedback from experts.

95
00:06:16,574 --> 00:06:22,905
For example, in this class we’re going to do a bunch of peer critique for your weekly project assignments.

96
00:06:22,905 --> 00:06:25,370
In addition to having users try your interface,

97
00:06:25,370 --> 00:06:29,775
it can be important to eat your own dog food and use the tools that you built yourself.

98
00:06:29,775 --> 00:06:35,069
When you are getting feedback from experts, it can often be helpful to have some kind of structured format,

99
00:06:35,069 --> 00:06:38,558
much like the rubrics you’ll see in your project assignments.

100
00:06:38,558 --> 00:06:44,881
And, for getting feedback on user interfaces, one common approach to this structured feedback

101
00:06:44,881 --> 00:06:48,390
is called heuristic evaluation, and you’ll learn how to do that in this class;

102
00:06:48,390 --> 00:06:51,051
it was pioneered by Jakob Nielsen.

103
00:06:51,051 --> 00:06:53,496
Our next genre is comparative experiments:

104
00:06:53,496 --> 00:06:57,565
taking two or more distinct options and comparing their performance to each other.

105
00:06:57,565 --> 00:07:00,183
These comparisons can take place in lots of different ways:

106
00:07:00,183 --> 00:07:04,061
They can be in the lab; they can be in the field; they can be online.

107
00:07:04,061 --> 00:07:06,543
These experiments can be more-or-less controlled,

108
00:07:06,543 --> 00:07:10,125
and they can take place over shorter or longer durations.

109
00:07:10,125 --> 00:07:14,235
What you’re trying to learn here is which option is the more effective,

110
00:07:14,235 --> 00:07:16,998
and, more often, what are the active ingredients,

111
00:07:16,998 --> 00:07:21,422
what are the variables that matter in creating the user experience that you seek.

112
00:07:22,006 --> 00:07:26,714
Here’s an example: My former PhD student Joel Brandt, and his colleague at Adobe,

113
00:07:26,714 --> 00:07:30,847
ran a number of studies comparing help interfaces for programmers.

114
00:07:32,139 --> 00:07:38,319
In particular they compared a more traditional search-style user interface for finding programming help

115
00:07:38,319 --> 00:07:43,443
with a search interface that integrated programming help directly into your environment.

116
00:07:43,443 --> 00:07:46,979
By running these comparisons they were able to see how programmers’ behaviour differed

117
00:07:46,979 --> 00:07:50,588
based on the changing help user interface.

118
00:07:50,588 --> 00:07:53,698
Comparative experiments have an advantage over surveys

119
00:07:53,698 --> 00:07:57,230
in that you get to see the actual behaviour as opposed to self report,

120
00:07:57,230 --> 00:08:02,329
and they can be better than usability studies because you’re comparing multiple alternatives.

121
00:08:02,329 --> 00:08:06,780
This enables you to see what works better or worse, or at least what works differently.

122
00:08:06,780 --> 00:08:10,366
I find that comparative feedback is also often much more actionable.

123
00:08:11,166 --> 00:08:13,938
However, if you are running controlled experiments online,

124
00:08:13,938 --> 00:08:18,079
you don’t get to see much about the person on the other side of the screen.

125
00:08:18,079 --> 00:08:20,774
And if you are inviting people into your office or lab,

126
00:08:20,774 --> 00:08:24,111
the behaviour you’re measuring might not be very realistic.

127
00:08:24,111 --> 00:08:30,283
If realistic longitudinal behaviour is what you’re after, participant observation may be the approach for you.

128
00:08:30,283 --> 00:08:36,419
This approach is just what it sounds like: observing what people actually do in their actual work environment.

129
00:08:36,419 --> 00:08:40,226
And this more long-term evaluation can be important for uncovering things

130
00:08:40,226 --> 00:08:44,131
that you might not see in shorter term, more controlled scenarios.

131
00:08:44,131 --> 00:08:48,015
For example, my colleagues Bob Sutton and Andrew Hargadon studied brainstorming.

132
00:08:48,015 --> 00:08:51,655
The prior literature on brainstorming had focused mostly on questions like

133
00:08:51,655 --> 00:08:54,402
“Do people come up with more ideas?”

134
00:08:54,402 --> 00:08:56,829
What Bob and Andrew realized by going into the field

135
00:08:56,829 --> 00:09:00,517
was that brainstorming served a number of other functions also,

136
00:09:00,517 --> 00:09:05,365
like, for example, brainstorming provides a way for members of the design team

137
00:09:05,365 --> 00:09:08,081
to demonstrate their creativity to their peers;

138
00:09:08,081 --> 00:09:13,210
it allows them to pass along knowledge that then can be reused in other projects;

139
00:09:13,210 --> 00:09:19,057
and it creates a fun, exciting environment that people like to work in and that clients like to participate in.

140
00:09:19,057 --> 00:09:22,206
In a real ecosystem, all of these things are important,

141
00:09:22,206 --> 00:09:25,514
in addition to just having the ideas that people come up with.

142
00:09:26,191 --> 00:09:32,908
Nearly all experiments seek to build a theory on some level — I don’t mean anything fancy by this,

143
00:09:32,908 --> 00:09:37,309
just that we take some things to be more relevant, and other things less relevant.

144
00:09:37,309 --> 00:09:39,250
We might, for example, assume

145
00:09:39,250 --> 00:09:43,068
that the ordering of search results may play an important role in what people click on,

146
00:09:43,068 --> 00:09:46,415
but that the batting average of the Detroit Tigers doesn’t,

147
00:09:46,415 --> 00:09:49,763
unless, of course, somebody’s searching for baseball.

148
00:09:49,763 --> 00:09:55,093
If you have a theory that’s sufficiently formal mathematically that you can make predictions,

149
00:09:55,093 --> 00:10:00,037
then you can compare alternative interfaces using that model, without having to bring people in.

150
00:10:00,037 --> 00:10:05,576
And we’ll go over that in this class a little bit, with respect to input models.

151
00:10:05,576 --> 00:10:10,072
This makes it possible to try out a number of alternatives really fast.

152
00:10:10,072 --> 00:10:12,286
Consequently, when people use simulations,

153
00:10:12,286 --> 00:10:16,378
it’s often in conjunction with something like Monte Carlo optimization.

154
00:10:16,378 --> 00:10:19,934
One example of this can be found in the ShapeWriter system,

155
00:10:19,934 --> 00:10:22,735
where Shumin Zhai and colleagues figured out how to build a keyboard

156
00:10:22,735 --> 00:10:26,122
where people could enter an entire word in a single stroke.

157
00:10:26,122 --> 00:10:31,247
They were able to do this with the benefit of formal models and optimization-based approaches.

158
00:10:31,247 --> 00:10:34,402
Simulation has mostly been used for input techniques

159
00:10:34,402 --> 00:10:39,795
because people’s motor performance is probably the most well-quantified area of HCI.

160
00:10:39,795 --> 00:10:42,701
And, while we won’t get much into it in this intro course,

161
00:10:42,701 --> 00:10:46,266
simulation can also be used for higher-level cognitive tasks;

162
00:10:46,266 --> 00:10:48,497
for example, Pete Pirolli and colleagues at PARC

163
00:10:48,497 --> 00:10:51,528
have built impressive models of people’s web-searching behaviour.

164
00:10:52,467 --> 00:10:57,253
These models enable them to estimate, for example, which links somebody is most likely to click on

165
00:10:57,253 --> 00:11:00,238
by looking at the relevant link texts.

166
00:11:00,238 --> 00:11:05,072
That’s our whirlwind tour of a number of empirical methods that this class will introduce.

167
00:11:05,072 --> 00:11:09,481
You’ll want to pick the right method for the right task, and here are some issues to consider:

168
00:11:09,481 --> 00:11:13,187
One is reliability: if you did it again, would you get the same thing?

169
00:11:13,187 --> 00:11:18,544
Another is generalizability and realism — Does this hold for people other than 18-year-old

170
00:11:18,544 --> 00:11:23,135
upper-middle-class students who are doing this for course credit or a gift certificate?

171
00:11:23,135 --> 00:11:28,546
Is this behaviour also what you’d see in the real world, or only in a more stilted lab environment?

172
00:11:28,546 --> 00:11:30,864
Comparisons are important, because they can tell you

173
00:11:30,879 --> 00:11:34,351
how the user experience would change with different interface choices,

174
00:11:34,351 --> 00:11:38,553
as opposed to just a “people liked it” study.

175
00:11:38,553 --> 00:11:42,784
It’s also important to think about how to achieve these insights efficiently,

176
00:11:42,784 --> 00:11:48,747
and not chew up a lot of resources, especially when your goal is practical.

177
00:11:48,747 --> 00:11:54,252
My experience as a designer, researcher, teacher, consultant, advisor and mentor has taught me

178
00:11:54,252 --> 00:12:01,340
that evaluating designs with people is both easier and more valuable than many people expect,

179
00:12:01,340 --> 00:12:04,704
and there’s an incredible lightbulb moment that happens

180
00:12:04,704 --> 00:12:08,831
when you actually get designs in front of people and see how they use them.

181
00:12:08,831 --> 00:12:12,945
So, to sum up this video, I’d like to ask what could be the most important question:

182
00:12:12,945 --> 99:59:59,999
“What do you want to learn?”