1 00:00:04,700 --> 00:00:08,900 Hello and welcome to see us five thirty three, introduction to data science. 2 00:00:08,900 --> 00:00:12,650 This is the introduction video that's going to give you an overview of what it is going to we're going 3 00:00:12,650 --> 00:00:17,960 to be talking about this semester and provide you with guidance for how to get started with the course. 4 00:00:17,960 --> 00:00:21,650 Are learning outcomes for this video are to introduce the class subject. 5 00:00:21,650 --> 00:00:24,290 What is data science for our purposes? 6 00:00:24,290 --> 00:00:30,650 For you to understand the learning outcomes of the course, for you to understand how this class is structured at a high level, 7 00:00:30,650 --> 00:00:39,890 we'll be supplementing this with discussion in class and then also to know where to get help throughout the semester as you're in the course. 8 00:00:39,890 --> 00:00:44,030 To get started, though, I want to talk about what data science is. 9 00:00:44,030 --> 00:00:49,460 And there are many different people are going to have many different definitions of data science. 10 00:00:49,460 --> 00:00:59,730 But one of the things that. But the definition I've been using as I've built out this class is that data science is the use 11 00:00:59,730 --> 00:01:06,150 of data to provide quantitative insights on questions of scientific business or social interest. 12 00:01:06,150 --> 00:01:13,420 There may be many different things to which we want to apply the science we may have maybe in a business context, 13 00:01:13,420 --> 00:01:18,570 but we want to gain understanding about the effectiveness of our business processes. 14 00:01:18,570 --> 00:01:26,040 We may want to evaluate a change to some aspect of how we carry out our business or how we manufacture, provide our products. 15 00:01:26,040 --> 00:01:30,990 We may want to understand the impacts of some of our business decisions. 16 00:01:30,990 --> 00:01:38,310 We may have a scientific question. What we are trying to produce generalizable knowledge to understand the world around us. 17 00:01:38,310 --> 00:01:40,170 We may have a social interest, 18 00:01:40,170 --> 00:01:47,670 particularly if we're deploying data science in the context of a nonprofit or of a government agency or of an educational institution. 19 00:01:47,670 --> 00:02:00,640 What we're trying to understand, social dynamics of human behavior. We're trying to understand maybe the impact of a policy on on. 20 00:02:00,640 --> 00:02:13,880 For example. We may have social interest where particularly if we are deploying. 21 00:02:13,880 --> 00:02:19,640 We may have social interest, particularly if we are deploying data science in a nonprofit or a government agency 22 00:02:19,640 --> 00:02:23,720 or an educational setting where we're trying to understand social dynamics, 23 00:02:23,720 --> 00:02:28,700 we're trying to understand the effectiveness and the impact of policies and programs 24 00:02:28,700 --> 00:02:35,110 and various other purposes to inform our social mission with quantitative insights. 25 00:02:35,110 --> 00:02:43,780 To give you some more about how I go about thinking about this and some more of the background of how I've been designing the class, 26 00:02:43,780 --> 00:02:51,310 I find this quote from Sergios Mondo's Introduction to science and Technology Studies useful in a book. 27 00:02:51,310 --> 00:02:55,990 He says, By itself, some piece of data has no meaning. 28 00:02:55,990 --> 00:03:01,330 Data is only given meaning as evidence by the people who make use of it. 29 00:03:01,330 --> 00:03:04,990 And this means that we can't just go find a bunch of data. 30 00:03:04,990 --> 00:03:15,610 And, oh, we have the answer. We have to do work to get our data into a form where it can actually provide answers to the questions we care about. 31 00:03:15,610 --> 00:03:23,610 We need to do work to frame our questions in a way that we can actually. 32 00:03:23,610 --> 00:03:31,770 Answer them with the data that we have available or that we can obtain, and this is a human process and it's also a social process, 33 00:03:31,770 --> 00:03:37,200 which is one of the reasons why we're going to be using our own class time for 34 00:03:37,200 --> 00:03:44,820 discussion and application exercises is because we need the data of the data, 35 00:03:44,820 --> 00:03:50,310 gains its meaning and becomes evidence for particular conclusions of particular answers. 36 00:03:50,310 --> 00:03:56,370 Because we go through the process of interpretation and we go through the process of presenting our 37 00:03:56,370 --> 00:04:02,250 interpretations for other to others and discussing and debating how we came to the conclusions, 38 00:04:02,250 --> 00:04:06,030 the conclusions. We came to the level of support they have from the data. 39 00:04:06,030 --> 00:04:15,900 So we're going to be practicing that a lot in this class of going from data to evidentiary meaning as a human process and pursuit of all of this. 40 00:04:15,900 --> 00:04:24,180 There are a number of learning outcomes for this course. The first one is that I want you to be able to explore a data set. 41 00:04:24,180 --> 00:04:29,550 Someone gives you some data and you first need to be able to get your bearings and understand what data you have, 42 00:04:29,550 --> 00:04:35,520 be able to assess whether or not it would answer particular questions and what questions you might be able to answer with it. 43 00:04:35,520 --> 00:04:39,960 You need to be able to define a question that we can answer. 44 00:04:39,960 --> 00:04:48,200 So this is a. This is not an easy process when you're talking about quite a bit more about it in the next two videos. 45 00:04:48,200 --> 00:04:53,720 But taking a goal that we have and turning it into a question that we can actually answer 46 00:04:53,720 --> 00:04:59,990 through a data analysis is a process that takes work and we're going to be learning that. 47 00:04:59,990 --> 00:05:09,350 I want you to be able to then actually carry out your analysis in a way that is documented so other people can understand what you did, 48 00:05:09,350 --> 00:05:18,710 it's reproducible, so others can do the same analysis. And it also doesn't unnecessarily waste computing resources to come to the answers. 49 00:05:18,710 --> 00:05:27,730 You make efficient use of the resources we have available to us. And also then we can scale to being doing analysis on very large data sets. 50 00:05:27,730 --> 00:05:35,320 It's not enough to just do an analysis, I want you to be able to present the results of your analysis both through visuals, 51 00:05:35,320 --> 00:05:42,130 charts and graphics and the written argument to be able to communicate to other people, 52 00:05:42,130 --> 00:05:48,910 your peers and the class myself in the future, your advisers, your supervisors at work, 53 00:05:48,910 --> 00:05:56,650 what it is that you learn from the data and why your conclusions are a reliable and defensible interpretation of the data. 54 00:05:56,650 --> 00:06:00,640 I want you to be able to identify weaknesses in data analysis. 55 00:06:00,640 --> 00:06:03,990 No data analysis is perfect. 56 00:06:03,990 --> 00:06:10,710 There are going to be weaknesses and downsides and we'll have to make tradeoffs when we make decisions of how we analyze the data, 57 00:06:10,710 --> 00:06:19,480 but also not all weaknesses are created equal. I want you to be able to assess whether the impact a weakness has on the correctness and utility. 58 00:06:19,480 --> 00:06:25,530 The result? Some weaknesses are fatal flaws, so we can no longer trust the results we get. 59 00:06:25,530 --> 00:06:29,640 Other weaknesses, however, are. 60 00:06:29,640 --> 00:06:37,650 Caveats that we need to acknowledge and account for when we apply the results, but they don't fundamentally undermine their validity. 61 00:06:37,650 --> 00:06:42,600 I also want you to be able to reason about the ethical implications of your work. 62 00:06:42,600 --> 00:06:46,560 There's a variety of different frameworks and perspectives that we can take on the 63 00:06:46,560 --> 00:06:49,920 ethics of data science work that we're going to be talking about this semester. 64 00:06:49,920 --> 00:06:57,450 But I want you to be able to think about and account for and assess the ethical implications of data science work. 65 00:06:57,450 --> 00:07:03,900 And then finally, I want you to understand how various the broader picture of data science, 66 00:07:03,900 --> 00:07:11,580 the various techniques and applications fit into a framework to give you a map of the of the space, 67 00:07:11,580 --> 00:07:16,560 and particularly with the role this class plays in Boise State's graduate curriculum. 68 00:07:16,560 --> 00:07:22,890 I want you I want to give you a framework that you can use to relate what you're going to learn in other classes, 69 00:07:22,890 --> 00:07:27,480 such as machine learning or large scale data analysis or recommender systems or social 70 00:07:27,480 --> 00:07:34,790 media mining together and develop a coherent picture of what it is to do data science. 71 00:07:34,790 --> 00:07:39,590 Support these outcomes, there are a number of components of the class. The first is the videos and readings. 72 00:07:39,590 --> 00:07:44,990 You're watching one of those videos right now. This is going to be the primary mechanism for content delivery. 73 00:07:44,990 --> 00:07:51,200 I'm not planning to lecture live in class. I'll be doing a little bit of lecture style things here and there as we need to 74 00:07:51,200 --> 00:07:55,610 clear up things that that are confusing or things you have questions about. 75 00:07:55,610 --> 00:08:00,800 But our primary content delivery is going to be through these prerecorded lecture videos. 76 00:08:00,800 --> 00:08:05,780 The website is your guide to all of this. Each week is going to have a page that lays out the videos, 77 00:08:05,780 --> 00:08:12,760 the readings that you need to do in order to be prepared for class, prepared for the assignments and to learn the material. 78 00:08:12,760 --> 00:08:19,120 Are in class, meetings are going to be focused on discussing and applying the materials, some of that will be open discussion, 79 00:08:19,120 --> 00:08:24,790 some of that will be guided application exercises particularly carried out with your classmates. 80 00:08:24,790 --> 00:08:27,860 We're going to be having some quizzes, particularly each Thursday. 81 00:08:27,860 --> 00:08:33,250 There's going to be a quiz before class that assesses your understanding of the material and give both 82 00:08:33,250 --> 00:08:38,730 you and I an initial check on how well your understanding it and whether you're ready to apply it. 83 00:08:38,730 --> 00:08:41,560 In our work and in the assignment, 84 00:08:41,560 --> 00:08:47,110 we're going to have assignments that come at a relatively steady pace throughout the semester, one every other week. 85 00:08:47,110 --> 00:08:51,550 They give you more extended place to practice and develop your skills and demonstrate 86 00:08:51,550 --> 00:08:55,990 your mastery of the material that we're going to be covering this semester. And then finally, 87 00:08:55,990 --> 00:09:06,680 we have two midterm exams that a final exam to give an additional check of your conceptual knowledge of what it is that we're doing in this class. 88 00:09:06,680 --> 00:09:08,450 As we go through the semester, 89 00:09:08,450 --> 00:09:13,950 I expect you're probably going to have questions and you're going to need some help and there's a variety of places you can get it. 90 00:09:13,950 --> 00:09:19,760 You can get it in our class meetings, both from your peers in the class and from me. 91 00:09:19,760 --> 00:09:23,630 You can get it at any time through the class forum on Piazza. 92 00:09:23,630 --> 00:09:29,330 I strongly encourage you to ask questions and answer each other's questions on Piazza when you ask a question, 93 00:09:29,330 --> 00:09:35,510 I'm probably not going to answer it immediately. I'm going to give some time in order to see if others have an answer first. 94 00:09:35,510 --> 00:09:41,150 Oftentimes, helping other people answer questions strengthens our own understanding of the material. 95 00:09:41,150 --> 00:09:46,790 So I encourage you to make use of Piazza both to answer your get your questions answered 96 00:09:46,790 --> 00:09:50,990 and to develop and practice your understanding of the material through helping each other. 97 00:09:50,990 --> 00:09:56,180 I am going to be having office hours. I'm going to be having them physically in my office so long as health permits. 98 00:09:56,180 --> 00:10:00,710 And I will also be able to have remote office hours by appointment. 99 00:10:00,710 --> 00:10:04,670 I do ask that if you have any inquiries about the class that you direct them through Piazza. 100 00:10:04,670 --> 00:10:09,080 If it's something that's just for me, perhaps about grades or something, 101 00:10:09,080 --> 00:10:13,020 you can send a private message to the instructors on Piazza that's going to let me keep all 102 00:10:13,020 --> 00:10:18,170 my course communications in one place so I don't accidentally lose something in my email. 103 00:10:18,170 --> 00:10:23,180 And finally, you can use the Internet as a resource where there's a lot of documentation, resources. 104 00:10:23,180 --> 00:10:31,860 Many of our readings are going to come from various places on the Internet. There's many more sites where you can search for solutions and help also. 105 00:10:31,860 --> 00:10:37,140 I encourage you to make use of public Q&A sites when you're a practicing data scientist after this class. 106 00:10:37,140 --> 00:10:39,960 The Internet is at your disposal for getting your work done. 107 00:10:39,960 --> 00:10:47,600 There's no reason not to go ahead and practice, make use of the resources that are available to you in this class. 108 00:10:47,600 --> 00:10:54,920 I just ask that you do it in a way that's in keeping with the principles of academic integrity, 109 00:10:54,920 --> 00:10:59,240 you can go to a public Q&A site and you can ask, I'm trying to do this thing. 110 00:10:59,240 --> 00:11:05,100 I'm having a problem. And ask a specific question about the problem that you're getting stuck on. 111 00:11:05,100 --> 00:11:12,660 I don't want you to just copy copying a piece of the assignment description pasted in a question answer site and ask, how do I do this? 112 00:11:12,660 --> 00:11:19,410 Demonstrate your learning of the material and how you go about identifying where it is that you're stuck and 113 00:11:19,410 --> 00:11:26,520 asking a question that's going to help uncover the missing piece that you need in order to move forward. 114 00:11:26,520 --> 00:11:34,030 So it's all of that. This class is designed to provide flexibility in learning and to use each modality to its best advantage. 115 00:11:34,030 --> 00:11:39,180 There are tradeoffs. I'm not going to pretend like there's no trade off the how I've designed this class. 116 00:11:39,180 --> 00:11:46,920 But with this design, we can do the content delivery me lecturing in a way where you can go back and watch videos later. 117 00:11:46,920 --> 00:11:51,330 You can catch up. If you have to miss some material, you can speed up the video. 118 00:11:51,330 --> 00:12:02,400 If I talk to slow. And we're going to focus our in class time on things that can only be done through Synchronoss discussion. 119 00:12:02,400 --> 00:12:06,270 Where we're actually engaging with the material together and talking to each 120 00:12:06,270 --> 00:12:10,560 other about it and these things together with the various parts of the class, 121 00:12:10,560 --> 00:12:16,020 I hope are going to allow us to achieve our learning outcomes for the semester. I hope you have a great semester. 122 00:12:16,020 --> 00:12:20,070 I hope you learn a lot. I also expect to learn from you every time I teach. 123 00:12:20,070 --> 00:12:28,050 It's a learning experience for me as well. And I learn more about the subject and about how to teach and communicate about it from my students. 124 00:12:28,050 --> 00:12:34,900 So let's learn together.