-
Not Synced
1
00:00:04,750 --> 00:00:13,540
Welcome back. In this video, I want to talk with you about what I'm going to call questioning questions. I want us to dig deeper into this:
-
Not Synced
2
00:00:13,540 --> 00:00:21,340
We have a question. We have a candidate operationalization. How do we understand what it's actually going to be doing and how do we evaluate it?
-
Not Synced
3
00:00:21,340 --> 00:00:25,570
So our learning outcomes here are to be able to identify stakeholders and subgroups for a
-
Not Synced
4
00:00:25,570 --> 00:00:31,930
problem and to evaluate the operationalization of a question to see who and what it prioritizes.
-
Not Synced
5
00:00:31,930 --> 00:00:37,030
A key concept I want you to take away from this lecture is that we can understand
-
Not Synced
6
00:00:37,030 --> 00:00:42,640
a metric or a measurement by asking what it takes to make it improve.
-
Not Synced
7
00:00:42,640 --> 00:00:49,000
So to return to the example from the previous video, we want to assess whether a new change has improved
-
Not Synced
8
00:00:49,000 --> 00:00:51,910
our introductory computer science class.
-
Not Synced
9
00:00:51,910 --> 00:01:00,820
One way we can try to do that is to look at students' grades in the next class,
-
Not Synced
10
00:01:00,820 --> 00:01:05,200
CS 221, and see if they're more likely to pass the next class.
-
Not Synced
11
00:01:05,200 --> 00:01:11,500
We can measure this by looking at the grades. But this is not the only way that
-
Not Synced
12
00:01:11,500 --> 00:01:16,030
we can try to measure whether we've improved CS 121.
-
Not Synced
13
00:01:16,030 --> 00:01:21,570
So this option, I'm going to call it option one: we're looking at the pass rate. What fraction of students pass
-
Not Synced
14
00:01:21,570 --> 00:01:26,290
CS 221, for whatever definition of pass we use, say they get a C minus or better?
-
Not Synced
15
00:01:26,290 --> 00:01:31,270
We could raise it. We could say what fraction get a B, or a B minus or better.
-
Not Synced
16
00:01:31,270 --> 00:01:36,460
But what fraction of students pass, and does the new CS 121 method improve it?
-
Not Synced
17
00:01:36,460 --> 00:01:42,400
Another way we could look at it would be to look at the grades students receive in CS 221.
-
Not Synced
18
00:01:42,400 --> 00:01:47,920
So what is the average grade in CS 221 for students with the new intro
-
Not Synced
19
00:01:47,920 --> 00:01:55,930
versus students with the previous intro? Does the new technique in intro improve it?
-
Not Synced
20
00:01:55,930 --> 00:02:04,330
We then have a question. Do we want to look at letter grades, and maybe compute the average with the same formula used for GPA?
-
Not Synced
21
00:02:04,330 --> 00:02:10,930
Do we want to look at course points? So actually look at what they got: a ninety-five, a ninety-nine?
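As a rough illustration of the two options, here is a minimal Python sketch that computes both from letter grades. The grade-point table is the common 4.0 scale and the sample cohort is made up for illustration; neither comes from the lecture. The C minus cutoff is the definition of passing mentioned earlier.

```python
# Common 4.0-scale grade points (an assumption; institutions vary).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}
PASSING_GRADE = "C-"  # definition of "pass" used in this sketch


def pass_rate(grades):
    """Option 1: fraction of students at or above the passing grade."""
    threshold = GRADE_POINTS[PASSING_GRADE]
    return sum(GRADE_POINTS[g] >= threshold for g in grades) / len(grades)


def average_grade(grades):
    """Option 2: mean grade points, computed the way a GPA is."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


# Hypothetical CS 221 letter grades for one cohort.
cohort = ["A", "B", "C-", "D", "F"]
print("pass rate:", pass_rate(cohort))
print("average grade:", average_grade(cohort))
```

Comparing a cohort that took the previous intro with one that took the new intro would mean calling both functions on each cohort's CS 221 grades and looking at the difference.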
-
Not Synced
22
00:02:10,930 --> 00:02:17,350
But we can look at the movement in the average grades. Now, to evaluate, we have to figure out
-
Not Synced
23
00:02:17,350 --> 00:02:29,260
which of these we want to do. And a key tool that I want you to become familiar with for evaluating and understanding different measurements,
-
Not Synced
24
00:02:29,260 --> 00:02:35,140
metrics, statistics, etc., is to ask how do I improve this measurement?
-
Not Synced
25
00:02:35,140 --> 00:02:38,680
So we're computing a statistic: we're computing a pass rate, or we're computing an average grade.
-
Not Synced
26
00:02:38,680 --> 00:02:49,000
How do I make this measurement better or worse? And this is going to give us a lot of insight into how the measurement works, what it prioritizes.
-
Not Synced
27
00:02:49,000 --> 00:02:53,020
This way of thinking about a measurement or a problem,
-
Not Synced
28
00:02:53,020 --> 00:02:58,030
I think is going to serve you well throughout this class and throughout the rest of your education and work.
-
Not Synced
29
00:02:58,030 --> 00:03:02,710
So let's talk about these measurements and how to improve them.
-
Not Synced
30
00:03:02,710 --> 00:03:08,710
So suppose we want to improve the pass rate. If we were willing to change CS 221 itself,
-
Not Synced
31
00:03:08,710 --> 00:03:12,100
we could improve the pass rate by just passing everybody in CS 221.
-
Not Synced
32
00:03:12,100 --> 00:03:18,390
But if we're leaving CS 221 alone and we're trying to improve the pass rate
-
Not Synced
33
00:03:18,390 --> 00:03:23,010
by making a change to the class that prepares students for it,
-
Not Synced
34
00:03:23,010 --> 00:03:28,340
the only way to improve this is by helping those students who are going to have the most difficulty with
-
Not Synced
35
00:03:28,340 --> 00:03:35,040
CS 221. If a student goes through CS 121 and they're going to get a B or an A in
-
Not Synced
36
00:03:35,040 --> 00:03:40,110
CS 221, then
-
Not Synced
37
00:03:40,110 --> 00:03:45,630
no change that we make to their CS 121 experience is going to improve our metric.
-
Not Synced
38
00:03:45,630 --> 00:03:51,710
The only way to improve it is by helping more students move
-
Not Synced
39
00:03:51,710 --> 00:04:03,550
from a D to a C minus. With option two, measuring the grade, we can improve this by making students do better,
-
Not Synced
40
00:04:03,550 --> 00:04:06,670
or rather, we do it by enabling students to do better.
-
Not Synced
41
00:04:06,670 --> 00:04:14,110
If a student who would have gotten an A minus under the previous CS 121 is now prepared to the point where they'll get an A,
-
Not Synced
42
00:04:14,110 --> 00:04:19,270
we improve it, and we improve it just as much as if we move a student from a C minus to a C.
-
Not Synced
43
00:04:19,270 --> 00:04:23,650
And so there's more opportunity to make the grade better.
-
Not Synced
44
00:04:23,650 --> 00:04:31,690
But we can improve this metric by helping only the students who were already well-prepared for CS 221.
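To make the comparison concrete, the sketch below changes one student's grade at a time and reports how each metric responds. It is only an illustration under stated assumptions: the 4.0-scale grade points, the four-student baseline, and the helper name `effect_of_move` are all hypothetical, not from the lecture.

```python
# Common 4.0-scale grade points (an assumption; institutions vary).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}
PASS_THRESHOLD = GRADE_POINTS["C-"]  # passing means C minus or better


def pass_rate(grades):
    """Fraction of students at or above the passing threshold."""
    return sum(GRADE_POINTS[g] >= PASS_THRESHOLD for g in grades) / len(grades)


def average_grade(grades):
    """Mean grade points across all students."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)


def effect_of_move(grades, index, new_grade):
    """Return (change in pass rate, change in average) when one grade changes."""
    changed = list(grades)
    changed[index] = new_grade
    return (pass_rate(changed) - pass_rate(grades),
            average_grade(changed) - average_grade(grades))


# Hypothetical CS 221 grades under the previous intro class.
baseline = ["A-", "B", "C-", "D"]
moves = {"A- to A": (0, "A"), "C- to C": (2, "C"), "D to C-": (3, "C-")}
for label, (index, new_grade) in moves.items():
    d_pass, d_avg = effect_of_move(baseline, index, new_grade)
    print(f"{label}: pass rate {d_pass:+.2f}, average {d_avg:+.3f}")
```

With this baseline, the first two moves leave the pass rate unchanged and raise the average by the same amount, since each adds 0.3 grade points spread over four students. Only the move from a D to a C minus changes the pass rate.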
-
Not Synced
45
00:04:31,690 --> 00:04:36,730
So this brings us to a key point: you get what you measure.
-
Not Synced
46
00:04:36,730 --> 00:04:43,390
Consider what happens if you set something up as the evaluation criterion. We're going to see this really clearly once we start optimizing machine learning models:
-
Not Synced
47
00:04:43,390 --> 00:04:48,600
when you set something up as your optimization criterion, that's what you get.
-
Not Synced
48
00:04:48,600 --> 00:04:58,470
If you evaluate changes to the introductory class by people passing the next class, then
-
Not Synced
49
00:04:58,470 --> 00:05:08,970
that structures the pedagogy, the course design, and the teaching evaluations to favor preparing students to pass the next class.
-
Not Synced
50
00:05:08,970 --> 00:05:18,330
If you measure the increase in average grade, that structures it to improve average grades.
-
Not Synced
51
00:05:18,330 --> 00:05:24,870
But it might focus the attention more on helping the students who were going to get a pretty good grade
-
Not Synced
52
00:05:24,870 --> 00:05:26,740
anyway.
-
Not Synced
53
00:05:26,740 --> 00:05:34,930
So far we're looking at just two subgroups. One metric clearly favors the students who are on the edge. Pass rate:
-
Not Synced
54
00:05:34,930 --> 00:05:42,240
you can only improve it by helping the students who are on the edge. Average grade: you can help across the board.
-
Not Synced
55
00:05:42,240 --> 00:05:50,260
Going beyond that, there are a variety of different stakeholders in the design of an introductory class.
-
Not Synced
56
00:05:50,260 --> 00:05:54,130
There's the students themselves who are going to be learning. There's the faculty who have to teach it.
-
Not Synced
57
00:05:54,130 --> 00:06:02,770
Either they teach it themselves, directly, or they depend on students being prepared by that class as a prerequisite for the class
-
Not Synced
58
00:06:02,770 --> 00:06:09,990
they do teach. The department, obviously, is a stakeholder because it has an interest in producing a good education
-
Not Synced
59
00:06:09,990 --> 00:06:15,820
that students want to come for and that employers want to hire for.
-
Not Synced
60
00:06:15,820 --> 00:06:18,970
Employers want students to come out of the program well-prepared.
-
Not Synced
61
00:06:18,970 --> 00:06:28,360
The university has an interest in having programs that produce well-prepared students
-
Not Synced
62
00:06:28,360 --> 00:06:33,970
and that are also attractive to students, to increase enrollment numbers.
-
Not Synced
63
00:06:33,970 --> 00:06:40,180
But then even within the broad stakeholder categories, we have different subgroups.
-
Not Synced
64
00:06:40,180 --> 00:06:43,870
So just within students, we can talk about high performing students.
-
Not Synced
65
00:06:43,870 --> 00:06:48,520
We can talk about students who are underprepared for one or another class.
-
Not Synced
66
00:06:48,520 --> 00:06:51,700
We can talk about students who are in some way or another marginalized.
-
Not Synced
67
00:06:51,700 --> 00:07:02,110
And they may experience changes in the structure of the class, the delivery of the class, and the assessment differently.
-
Not Synced
68
00:07:02,110 --> 00:07:09,950
So even once we've identified a stakeholder group, not every subset of that stakeholder group is going to experience
-
Not Synced
69
00:07:09,950 --> 00:07:14,780
what we're trying to study, or be reflected in the data, in the same way.
-
Not Synced
70
00:07:14,780 --> 00:07:20,360
We need to be able to identify these different groups to understand what it is that we're actually measuring.
-
Not Synced
71
00:07:20,360 --> 00:07:26,600
So I want to return, though, to this key question: ask "how do I improve this?" when
-
Not Synced
72
00:07:26,600 --> 00:07:31,620
faced with a metric. That really helps us clarify how a metric or a measurement behaves.
-
Not Synced
73
00:07:31,620 --> 00:07:35,390
What changes to improve it? And also, what can remain the same
-
Not Synced
74
00:07:35,390 --> 00:07:43,150
while it is improved? And then another important question is, how can it be gamed or manipulated?
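One way to probe the gaming question is to hold the data fixed and vary only the definition of the metric. The sketch below does that for pass rate by moving the passing cutoff. The grade points and the sample grades are assumptions for illustration, and nothing here claims that any course actually does this.

```python
# Common 4.0-scale grade points (an assumption; institutions vary).
GRADE_POINTS = {
    "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "F": 0.0,
}


def pass_rate(grades, cutoff):
    """Fraction of students whose grade is at or above the cutoff grade."""
    threshold = GRADE_POINTS[cutoff]
    return sum(GRADE_POINTS[g] >= threshold for g in grades) / len(grades)


# Hypothetical CS 221 grades; these never change in this example.
grades = ["A", "B", "C-", "D", "D", "F"]

# Same students, same performance, three different definitions of "pass".
for cutoff in ["B-", "C-", "D"]:
    print(f"pass means {cutoff} or better: {pass_rate(grades, cutoff):.2f}")
```

The reported pass rate rises as the cutoff drops even though no student learned anything more, which is the kind of movement to rule out before reading an improvement as real.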
-
Not Synced
75
00:07:43,150 --> 00:07:46,960
Metrics are always what we call lossy and reductive.
-
Not Synced
76
00:07:46,960 --> 00:07:54,600
Lossy means that no measurement captures everything about a phenomenon we care about.
-
Not Synced
77
00:07:54,600 --> 00:07:59,850
And reductive means that we're taking a complex phenomenon and
-
Not Synced
78
00:07:59,850 --> 00:08:04,980
reducing it down to one or a handful of measurements. We always lose something there.
-
Not Synced
79
00:08:04,980 --> 00:08:14,670
But this question helps us evaluate what we're losing, iterate and improve our metrics, and also
-
Not Synced
80
00:08:14,670 --> 00:08:21,390
assess whether those weaknesses are actual challenges to validity or just something we need to keep in mind.
-
Not Synced
81
00:08:21,390 --> 00:08:28,980
So to wrap up, it's crucial to appropriately define our measurements and then to question our definitions.
-
Not Synced
82
00:08:28,980 --> 00:08:34,650
A useful way to do that is to ask how do we improve a metric that we're thinking about using?
-
Not Synced
83
00:08:34,650 --> 00:08:39,480
And then also we want to look at who or what a metric prioritizes,
-
Not Synced
84
00:08:39,480 --> 00:08:54,933
and whether that prioritization is consistent with what we want to accomplish in our organizational, business, or scientific goals.
-
Not Synced