This presentation is delivered by the Stanford Center for Professional Development.

Welcome back. What I want to do today is continue our discussion of principal components analysis, or PCA. In particular, there's one more application that I didn't get to in the last lecture, on latent semantic indexing, LSI. Then I want to spend just a little time talking about how to implement PCA, especially for very large problems. In particular, I'll spend just a little bit of time talking about singular value decomposition, or the SVD implementation of principal component analysis. Then in the second half of today's lecture, I want to talk about a different algorithm called independent component analysis, which is in some ways related to PCA, but in many other ways also manages to accomplish very different things than PCA. So with this lecture, we'll actually wrap up our discussion of unsupervised learning, and in the next lecture we'll start to talk about reinforcement learning algorithms.

Just to recap where we were with PCA, principal component analysis: I said that in PCA, we imagine that we have some very high dimensional data that perhaps lies approximately on some low dimensional subspace. So if you had a data set like this, you might find that that's the first principal component of the data, and that's the second principal component of this 2-D data.
To summarize the algorithm, we have three steps. The first step of PCA was to normalize the data to zero mean and unit variance: you subtract out the mean of your training examples, so the data now has zero mean, and then you normalize each of your features so that the variance of each feature is now one.

The next step was to compute the covariance matrix sigma of your zero-mean data, which you compute as the sum of the outer products of your training examples. Then you find the top K eigenvectors of sigma.

So last time we saw applications of this. For example, one of the applications was eigenfaces, where each of your training examples x(i) is an image. So if you have 100 by 100 images, if your pictures of faces are 100 pixels by 100 pixels, then each of your training examples x(i) will be a 10,000 dimensional vector, corresponding to the 10,000 grayscale intensity pixel values in each of your 100 by 100 images. So the eigenfaces application was one where the training examples comprised pictures of faces of people. Then we ran PCA, and to measure the distance between, say, a face here and a face there, we would project both of the face images onto the subspace and then measure the distance along the subspace.
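To make the three steps just recapped concrete, here is a minimal numpy sketch of the naive implementation (the function and variable names are my own illustration, not the notation from the board); it forms the full covariance matrix, so it is only practical when the dimension n is modest:

```python
import numpy as np

def pca_top_k(X, k):
    """Naive PCA as described: normalize, form the covariance matrix,
    then take its top-k eigenvectors.  X is (m, n), one example per row."""
    # Step 1: normalize to zero mean and unit variance per feature
    # (assumes no feature is constant, otherwise the division blows up).
    X = X - X.mean(axis=0)
    X = X / X.std(axis=0)
    # Step 2: covariance matrix, Sigma = (1/m) * sum_i x_i x_i^T, an n x n matrix.
    m = X.shape[0]
    Sigma = (X.T @ X) / m
    # Step 3: top-k eigenvectors of Sigma (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(Sigma)
    U = eigvecs[:, ::-1][:, :k]        # (n, k) principal directions
    return U, X @ U                    # directions, and the projected (reduced) data
```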
So in eigenfaces, you use something like 50 principal components.

The difficulty of working with problems like these is that in step two of the algorithm, we construct the covariance matrix sigma. The covariance matrix now becomes a 10,000 by 10,000 matrix; that's 100 million entries, which is huge. So we'd like to apply PCA to very, very high dimensional data with the goal of reducing its dimension, but step two of this algorithm requires constructing sigma, this extremely large matrix, which you can't do. I'll come back to this in a second.

It turns out one of the other frequently-used applications of PCA is actually to text data. So here's what I mean. Remember our vectorial representation of emails? This is from way back when we were talking about supervised learning algorithms for spam classification. You remember I said that given a piece of email, or a text document, you can represent it using a very high-dimensional vector by writing down a list of all the words in your dictionary. Somewhere you have the word learn, somewhere you have the word study, and so on. Depending on whether each word appears or does not appear in your text document, you put either a one or a zero there.
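As a tiny illustration of that representation (the four-word dictionary below is invented purely for the example; a real one would have tens of thousands of entries), a sketch:

```python
# Hypothetical toy dictionary; entry j of the vector corresponds to word j.
dictionary = ["aardvark", "learn", "study", "zygote"]

def to_binary_vector(document, dictionary):
    """0/1 vector: entry j is 1 if the j-th dictionary word occurs in the text."""
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in dictionary]

print(to_binary_vector("I want to learn how to study better", dictionary))
# -> [0, 1, 1, 0]
```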
This is the representation we used in lecture five or lecture six for representing text documents, when we were building Naive Bayes classifiers for spam classification. So it turns out one of the common applications of PCA is actually to this text data representation as well. When you apply PCA to this sort of data, the resulting algorithm often just goes by a different name, latent semantic indexing.

For the sake of completeness, I should say that in LSI, you usually skip the preprocessing steps. For various reasons, in LSI you usually don't normalize the data to zero mean, and you usually don't normalize the variance of the features to one. These are relatively minor differences, it turns out, so it does something very similar to PCA. Normalizing the variance to one for text data would actually be a bad idea, because that would have the effect of dramatically scaling up the weight of rarely occurring words. So for example, the word aardvark hardly ever appears in any document, so to normalize the variance of that feature to one, you end up scaling up the weight of the word aardvark dramatically. I don't understand why [inaudible].

So let's see. In natural language processing, something that we want to do quite often is, given two documents x(i) and x(j), to measure how similar they are.
So for example, I may give you a document and ask you to find me more documents like this one. You're reading some article about some news event of today and want to find out what other news articles there are, so I give you a document and ask you to look at all the other documents you have in this large set of documents and find the ones similar to it.

So this is a typical text application. To measure the similarity between two documents x(i) and x(j), each of these documents is represented as one of these high-dimensional vectors. One common way to do this is to view each of your documents as some sort of very high-dimensional vector, a vector in a very high-dimensional space where the dimension of the vector is equal to the number of words in your dictionary. So maybe each of these documents lives in some 50,000-dimensional space, if you have 50,000 words in your dictionary. One measure of the similarity between these two documents that's often used is the angle between them. In particular, if the angle between these two vectors is small, then we'll consider the two documents to be similar, and if the angle between these two vectors is large, then we consider the documents to be dissimilar. So more formally, one commonly used heuristic in natural language processing is to say that the similarity between the two documents is the cosine of the angle theta between them.
For the relevant range of values, anyway, the cosine is a decreasing function of theta, so the smaller the angle between them, the larger the similarity. The cosine between two vectors is, of course, just the inner product x(i) transpose x(j) divided by the product of the norms of the two vectors; that's just the linear algebra, or standard geometry, definition of the cosine between two vectors.

Here's the intuition behind what LSI is doing. The hope, as usual, is that there may be some interesting axes of variation in the data, and there may be some other axes that are just noise. So by projecting all of your data onto a lower-dimensional subspace, by running PCA on your text data this way, the hope is that you can remove some of the noise in the data and get better measures of the similarity between pairs of documents. Let's delve a little deeper into an example to convey more intuition about what LSI is doing.

So look further at the definition of the cosine similarity measure. The numerator, the similarity between the two documents, was this inner product, which is the sum over k of x(i)k times x(j)k. So this inner product would be equal to zero if the two documents have no words in common.
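Here is a minimal sketch of that similarity measure (the tiny binary vectors are invented just for illustration), including the case where two documents share no words:

```python
import numpy as np

def cosine_similarity(xi, xj):
    """sim(xi, xj) = (xi . xj) / (||xi|| * ||xj||), the cosine of the angle
    between two document vectors (assumes neither vector is all zeros)."""
    return float(xi @ xj) / (np.linalg.norm(xi) * np.linalg.norm(xj))

doc_i = np.array([1, 0, 1, 0])   # contains dictionary words 0 and 2
doc_j = np.array([0, 1, 0, 1])   # contains dictionary words 1 and 3
print(cosine_similarity(doc_i, doc_j))   # 0.0, since no words are shared
```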
So this is really the sum over k of an indicator of whether documents i and j both contain the word k, because x(i)k indicates whether document i contains word k, and x(j)k indicates whether document j contains word k. The product is one only if the word k appears in both documents. Therefore, the similarity between these two documents would be zero if the two documents have no words in common.

For example, suppose your document x(i) has the word study and your document x(j) has the word learn. Then these two documents would be considered entirely dissimilar. Say you read a news article about effective study strategies and you ask, what other documents are similar to this? If there are a bunch of other documents about good methods to learn, but they share no words with this one, then the similarity under this measure is zero.

So here's a cartoon of what we hope PCA will do. Suppose that on the horizontal axis I plot the word learn, and on the vertical axis I plot the word study, where the values take on either the value zero or one. So if a document contains the word learn but not study, then I'll plot that document there, and if a document contains neither the word study nor learn, then I'll plot it at zero, zero.
So here's the cartoon of what PCA is doing: we identify a lower-dimensional subspace; that would be some eigenvector we get out of PCA. Now, suppose we have a document about learning and a document about studying. The document about learning points to the right; the document about studying points up. So the inner product, or the cosine of the angle, between these two documents, excuse me, the inner product between these two documents, will be zero. So these two documents are entirely unrelated, which is not what we want. Documents about studying and documents about learning are related. But if we take these two documents and project them onto this subspace, then these two documents now become much closer together, and the algorithm will recognize that when you take the inner product between these two documents, you actually end up with a positive number. So LSI enables our algorithm to recognize that these two documents have some positive similarity between them.

So that's just intuition about what PCA may be doing to text data. The same thing goes for other examples beyond the words study and learn. Say you find a document about politicians and a document with the names of prominent politicians; that will also bring the documents closer together. For just about any related topics, they end up being mapped to points closer together in this lower-dimensional space.
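Here is a minimal sketch of that idea under an illustrative setup of my own (documents as rows of a matrix X, numpy's SVD used to get the subspace): project two document vectors onto the top-k subspace and measure the cosine similarity there rather than in the raw word space.

```python
import numpy as np

def lsi_similarity(X, xi, xj, k):
    """Project documents xi, xj onto the top-k subspace found from the
    document matrix X (one document per row, no mean/variance normalization,
    as in LSI), then return their cosine similarity in that subspace.
    Requires k <= min(X.shape); assumes neither projection is the zero vector."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # (n, k): basis for the top-k subspace
    zi, zj = xi @ V_k, xj @ V_k         # k-dimensional representations
    return float(zi @ zj) / (np.linalg.norm(zi) * np.linalg.norm(zj))
```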
Any questions about this?

Student: [Inaudible].

Which ones? This one? No, the line. Oh, this one. Oh, yes. Thank you. [Inaudible].

So let's talk about how to actually implement this now. Okay, how many of you know what an SVD, or singular value decomposition, is? Wow, that's a lot of you. That's a lot more than I thought. Curious, did you guys learn it as undergrads or as graduate students?

All right, let me talk about it anyway. I wasn't expecting so many of you to know what SVD is, but I want to get this on tape, just so everyone else can learn about this, too. So I'll say a little bit about how to implement PCA. The problem I was alluding to just now was that when you have these very high-dimensional vectors, sigma is a large matrix. In particular, for our text example, if the vectors x(i) are 50,000 dimensional, then the covariance matrix will be 50,000 by 50,000, which is much too big to represent explicitly.

I guess many of you already know this, but I'll just say it anyway. It turns out there's another way to implement PCA, which is this: if A is any N by N matrix, then one of the most remarkable results of linear algebra is that the matrix A can be decomposed into a singular value decomposition.
What that means is that the matrix A, which is N by N, can always be decomposed into a product of three matrices, A = U D V transpose: U is N by N, D is a square matrix, which is N by N, and V is also N by N. D is going to be diagonal, with zeros on the off-diagonals, and the values sigma i on its diagonal are called the singular values of the matrix A.

Almost all of you said you learned this as a graduate student rather than as an undergrad, and it turns out that when you take a class in undergraduate linear algebra, you usually learn a bunch of decompositions. So you usually learn about the QR decomposition, maybe the LU factorization of matrices. Most undergrad courses don't get to talk about singular value decompositions, but in almost everything I do in machine learning, you actually find that you end up using SVDs much more than any of the decompositions you learned in a typical undergrad linear algebra class. Personally, I've used an SVD dozens of times in the last year, but for LU and QR decompositions, I think I used the QR decomposition once and an LU decomposition in the last year.

So let's see, I'll say a bit more about this. I'm going to draw the picture, I guess. For example, if A is an N by N matrix, it can be decomposed into another matrix, U, which is also N by N.
It's the same size. D is N by N, another square matrix, and V transpose is also N by N. Furthermore, in a singular value decomposition, the columns of the matrix U will be the eigenvectors of A A transpose, and the columns of V will be the eigenvectors of A transpose A.

To compute it, you just use the SVD command in Matlab or Octave. Today, the state of the art in numerical linear algebra is that SVDs, singular value decompositions of matrices, can be computed extremely [inaudible]. We've used packages like Matlab or Octave to compute, say, the eigenvectors of a matrix; SVD routines are even more numerically stable than eigenvector routines for finding the eigenvectors of a matrix. So you can safely use a routine like this, similar to the way you use a square root command without thinking about how it's computed: you can compute the square root of something and just not worry about it, because you know the computer will give you the right answer. For most reasonably-sized matrices, even up to thousands by thousands, I think of the SVD routine like a square root function: if you call it, it'll give you back the right answer, and you don't have to worry too much about it.
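As a quick numerical check of those two facts (this is just an illustrative numpy snippet, not anything from the lecture): numpy's svd returns the three factors, and the columns of U do behave as eigenvectors of A A transpose.

```python
import numpy as np

A = np.random.randn(5, 5)                      # any square matrix, for illustration
U, d, Vt = np.linalg.svd(A)                    # A = U @ diag(d) @ Vt
print(np.allclose(A, U @ np.diag(d) @ Vt))     # True: the decomposition is exact

# A A^T = U diag(d**2) U^T, so the first column of U is an eigenvector of A A^T
# with eigenvalue d[0]**2:
w = (A @ A.T) @ U[:, 0]
print(np.allclose(w, (d[0] ** 2) * U[:, 0]))   # True
```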
If you have extremely large matrices, like a million by a million, I might start to worry a bit, but for a few thousand by a few thousand matrices, this is implemented very well today.

Student: [Inaudible].

What's the complexity of SVD? That's a good question. I actually don't know; I want to guess it's roughly on the order of N cubed, but I'm not sure. [Inaudible] algorithms, so I don't know what's known about the convergence of these algorithms.

The example I drew out was for a fat matrix, a matrix that is wider than it is tall. In the same way, you can also call SVD on a tall matrix, one that's taller than it is wide, and it would decompose it into a product of three matrices like that, okay?

The nice thing about this is that we can use it to compute eigenvectors and PCA very efficiently. In particular, the covariance matrix sigma was this: it was the sum of all the outer products. So if you go back and recall the definition of the design matrix, which I think I described in lecture two when we derived the closed-form solution to least squares, the design matrix was this matrix where I took my training examples and stacked them in rows; we call this the design matrix X. So if you construct the design matrix, then the covariance matrix sigma can be written as just X transpose X.
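If you want to convince yourself of that identity numerically, here is a tiny illustrative check in numpy (the random matrix is just a stand-in for a real design matrix):

```python
import numpy as np

X = np.random.randn(100, 7)                    # toy design matrix, examples as rows

# Sum over i of the outer products x^(i) x^(i)^T, computed explicitly ...
outer_sum = sum(np.outer(x, x) for x in X)
# ... equals X^T X, so sigma can be formed (or worked with) via X directly.
print(np.allclose(outer_sum, X.T @ X))         # True
```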
Okay? I hope you see why X transpose X gives you the sum of outer products of the vectors. If you aren't seeing this right now, just go home and convince yourself that it's true.

To get the top K eigenvectors of sigma, you would take sigma and decompose it using the - excuse me. You would take the matrix X and compute its SVD, so you get U D V transpose. Then the top K columns of U are the top K eigenvectors of X transpose X, which are therefore the top K eigenvectors of your covariance matrix sigma. So in our example, if you have 50,000 words in your dictionary, the design matrix would be in R m by 50,000, say 100 by 50,000 if you have 100 examples. So X would be quite tractable to represent, and to compute the SVD of, whereas the matrix sigma would be much harder to represent; it's 50,000 by 50,000. So this gives you an efficient way to implement PCA.

The reason I want to talk about this is that in previous years, I didn't talk about it [inaudible].
In the class projects, I found a number of students trying to implement this on huge problems and [inaudible], so this is a much better way to implement PCA if you have extremely high dimensional data. If you have low dimensional data, if you have 50 or 100 dimensional data, then computing sigma is no problem; you can do it the old way. But otherwise, use the SVD to implement this.

Questions about this? The last thing I want to say is that in practice, when you want to implement this, I want to add a note of caution. It turns out that for many applications of - let's see. When you apply SVD to these wide - yeah?

Student: Just a quick question. Is it the top K columns of U or of V? Because X transpose X is V [inaudible], right?

Let's see. Oh, yes. I think you're right. I think you're right. Let's see, is it the top K columns of U or the top K of V? Yeah, I think you're right. Is that right? Something bothers me about that, but I think you're right. So then X transpose X should be V D D V transpose: X is U D V transpose, so X transpose X would be V D transpose U transpose U D V transpose, which is V D squared V transpose. If anyone thinks about this and has another opinion, let me know, but I think you're right. I'll make sure I get the details and let you know. Everyone's still looking at that. Tom, can you figure out the right answer and let me know? That sounds right. Okay, cool.

Okay. So just one last note, a note of caution.
It turns out that in this example, I was implementing SVD with a wide matrix; the matrix X was m by n with m smaller than n. It turns out that when you find the SVD decomposition of this - let's see. Yeah, I think you're definitely right. So it turns out that when we find the SVD of this, the right-most portion of this block of the matrix would be all zeros. Also, when you compute the matrix D, a large part of this matrix would be zeros. Then you have the matrix V transpose. So it depends on what convention you use. For example, I think Matlab actually uses a convention of just cutting off the zero elements: Matlab uses the convention of chopping off the right-most half of the U matrix and chopping off the bottom portion of the D matrix. I'm not sure if this even depends on the version of Matlab, but when you call SVD in Matlab or some other numerical algebra packages, there are slightly different conventions for how to define your SVD when the matrix is wider than it is tall. So just watch out for this, and make sure you map whatever convention your numerical algebra library uses back to the original computations. Whether you're using Matlab [inaudible] or writing C code, there are many scientific libraries that can compute SVDs for you, but they differ slightly in their conventions for the dimensions of these matrices. So just make sure you figure this out for the package that you use.
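To make the caution concrete, here is an illustrative numpy sketch of SVD-based PCA on a wide design matrix (numpy's analogue of these conventions is the full_matrices flag; the random matrix is only a stand-in for real data, and for PCA proper you would normalize X first as in step one):

```python
import numpy as np

m, n, k = 100, 50_000, 50                      # e.g. 100 documents, 50,000 words
X = np.random.rand(m, n)                       # stand-in for a real design matrix

# The "economy" SVD drops the parts that would be all zeros in the full SVD.
# Conventions differ between packages, so always check the shapes you get back.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, d.shape, Vt.shape)              # (100, 100) (100,) (100, 50000)

# Since X^T X = V diag(d^2) V^T, the top-k principal directions are the first
# k right singular vectors (rows of Vt), per the discussion above; the n x n
# covariance matrix is never formed.
V_k = Vt[:k].T                                 # (n, k)
Z = X @ V_k                                    # k-dimensional representation of each row
```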
Finally, I just want to take the unsupervised learning algorithms we've talked about and put them in a little bit of broader context. This is partly in response to the questions I've gotten from students in office hours and elsewhere about when to use each of these algorithms. So I'm going to draw a two by two matrix; this is a little cartoon that I find useful.

One of the algorithms we talked about earlier, right before this, was factor analysis. I hope you remember that picture I drew, where I had a bunch of points z on a line and then I drew these ellipses; I hope you remember that picture. That was the factor analysis model, which models the density of x, right? And then there was PCA, just now. So the difference between factor analysis and PCA, the way I think about it, is that factor analysis is a density estimation algorithm: it tries to model the density of the training examples x. Whereas PCA is not a probabilistic algorithm; in particular, it does not endow your training examples with any probabilistic distribution, and it directly goes to find the subspace. So in terms of when to use factor analysis and when to use PCA: if your goal is to reduce the dimension of the data, if your goal is to find the subspace that the data lies on, then PCA directly tries to find that subspace.
I think I would tend to use PCA. Factor analysis sort of assumes the data lies on a subspace; let me write "subspace" here. So both of these algorithms assume the data maybe lies close to, or on, some low dimensional subspace. But fundamentally, factor analysis, I think of as a density estimation algorithm: if I have some very high dimensional distribution and I want to model P of X, then factor analysis is the algorithm I'm more inclined to use. So even though you could in theory, I would tend to avoid trying to use factor analysis to identify the subspace the data set lies on. Conversely, if you want to do anomaly detection, if you want to model P of X so that when an example has very low probability you can flag it as an anomaly, then I would tend to use factor analysis to do that density estimation. So factor analysis and PCA are both algorithms that assume your data lies on a subspace.

The other class of algorithms we talked about was algorithms that assume the data lies in clumps, or that the data has a few coherent groups. So let me just fill in the rest of this picture. If you think your data lies in clumps or groups, and your goal is density estimation, then I would tend to use a mixture of Gaussians algorithm. But again, you don't necessarily want to endow your data with any probabilistic semantics, so if you just want to find the clumps or the groups, then I'd be inclined to use a k-means algorithm.
So Dialogue: 0,0:37:29.77,0:37:33.29,Default,,0000,0000,0000,,haven't seen anyone else draw this picture before, but I tend to organize these things Dialogue: 0,0:37:33.29,0:37:34.99,Default,,0000,0000,0000,,this way in my brain. Dialogue: 0,0:37:34.99,0:37:36.42,Default,,0000,0000,0000,,Hopefully this helps guide Dialogue: 0,0:37:36.42,0:37:40.46,Default,,0000,0000,0000,,when you might use each of these algorithms as well, depending Dialogue: 0,0:37:40.46,0:37:44.72,Default,,0000,0000,0000,,on whether you believe the data might lie in the subspace or whether it might bind in Dialogue: 0,0:37:44.72,0:37:47.90,Default,,0000,0000,0000,,clumps or groups. Dialogue: 0,0:37:50.72,0:37:53.79,Default,,0000,0000,0000,,All right. Dialogue: 0,0:37:53.79,0:38:00.79,Default,,0000,0000,0000,,That wraps up the discussion on Dialogue: 0,0:38:02.63,0:38:08.72,Default,,0000,0000,0000,,PCA. What I want to do next is talk about Dialogue: 0,0:38:08.72,0:38:15.33,Default,,0000,0000,0000,,independent component analysis, or ICA. Yeah. Interviewee: I have Dialogue: 0,0:38:15.33,0:38:17.58,Default,,0000,0000,0000,,a Dialogue: 0,0:38:17.58,0:38:21.59,Default,,0000,0000,0000,,question about the upper right [inaudible]. So once you have all of the eigen vectors, Dialogue: 0,0:38:21.59,0:38:26.18,Default,,0000,0000,0000,,[inaudible] how similar is feature I to Dialogue: 0,0:38:26.18,0:38:29.96,Default,,0000,0000,0000,,feature J. You pick some eigen vector, and you take some dot products between the Dialogue: 0,0:38:29.96,0:38:31.57,Default,,0000,0000,0000,,feature I and Dialogue: 0,0:38:31.57,0:38:35.02,Default,,0000,0000,0000,,feature J and the eigen vector. But Dialogue: 0,0:38:35.02,0:38:39.63,Default,,0000,0000,0000,,there's a lot of eigen vectors to choose from. Instructor Dialogue: 0,0:38:39.63,0:38:42.05,Default,,0000,0000,0000,,(Andrew Ng):Right. So Justin's question was Dialogue: 0,0:38:42.05,0:38:45.88,Default,,0000,0000,0000,,having found my eigen vectors, how do I choose what eigen vector to use to Dialogue: 0,0:38:45.88,0:38:47.54,Default,,0000,0000,0000,,measure distance. I'm Dialogue: 0,0:38:47.54,0:38:48.95,Default,,0000,0000,0000,,going to Dialogue: 0,0:38:48.95,0:38:51.02,Default,,0000,0000,0000,,start Dialogue: 0,0:38:51.02,0:38:53.29,Default,,0000,0000,0000,,this up. Dialogue: 0,0:38:53.29,0:38:57.30,Default,,0000,0000,0000,,So the Dialogue: 0,0:38:57.30,0:38:58.32,Default,,0000,0000,0000,,answer is really Dialogue: 0,0:38:58.32,0:39:02.03,Default,,0000,0000,0000,,- in this cartoon, I would avoid thinking about Dialogue: 0,0:39:02.03,0:39:03.92,Default,,0000,0000,0000,,eigen vectors one other time. Dialogue: 0,0:39:03.92,0:39:08.05,Default,,0000,0000,0000,,A better way to view this cartoon is that this is actually - Dialogue: 0,0:39:08.05,0:39:11.60,Default,,0000,0000,0000,,if I decide to choose 100 eigen vectors, this is really 100 D Dialogue: 0,0:39:11.60,0:39:18.60,Default,,0000,0000,0000,,subspace. Dialogue: 0,0:39:19.26,0:39:20.34,Default,,0000,0000,0000,,So Dialogue: 0,0:39:20.34,0:39:24.88,Default,,0000,0000,0000,,I'm not actually projecting my data onto one eigen vector. Dialogue: 0,0:39:24.88,0:39:29.77,Default,,0000,0000,0000,,This arrow, this cartoon, this denotes the 100-dimensional Dialogue: 0,0:39:29.77,0:39:32.09,Default,,0000,0000,0000,,subspace [inaudible] by all my eigen vectors. 
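To make that answer concrete: as the next lines spell out, the distance is measured between the projections onto the span of all k chosen eigenvectors, not onto any single eigenvector. A minimal sketch, assuming U is a hypothetical n-by-k matrix whose (orthonormal) columns are the top k eigenvectors of the covariance matrix:

```python
import numpy as np

def project(x, U):
    """Coordinates of x in the subspace spanned by the columns of U (assumed orthonormal)."""
    return U.T @ x

def subspace_distance(x1, x2, U):
    """Distance between two examples measured after projecting both onto span(U)."""
    return np.linalg.norm(project(x1, U) - project(x2, U))

# Hypothetical usage: U built from the top-k eigenvectors of Sigma, x1 and x2 two face images.
# d = subspace_distance(x1, x2, U)
```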
Dialogue: 0,0:39:32.09,0:39:36.12,Default,,0000,0000,0000,,So what I actually do is project my data onto Dialogue: 0,0:39:36.12,0:39:40.15,Default,,0000,0000,0000,,the span, the linear span of eigen vectors. Then I Dialogue: 0,0:39:40.15,0:39:41.44,Default,,0000,0000,0000,,measure distance or take Dialogue: 0,0:39:41.44,0:39:43.49,Default,,0000,0000,0000,,inner products of the distance between Dialogue: 0,0:39:43.49,0:39:49.83,Default,,0000,0000,0000,,the projections of the two points of the eigen vectors. Okay. Dialogue: 0,0:39:49.83,0:39:54.14,Default,,0000,0000,0000,,So let's talk about ICA, Dialogue: 0,0:39:54.14,0:39:58.75,Default,,0000,0000,0000,,independent component analysis. Dialogue: 0,0:39:58.75,0:40:00.75,Default,,0000,0000,0000,,So whereas PCA Dialogue: 0,0:40:00.75,0:40:02.60,Default,,0000,0000,0000,,was an algorithm for finding Dialogue: 0,0:40:02.60,0:40:06.70,Default,,0000,0000,0000,,what I call the main axis of variations of data, Dialogue: 0,0:40:06.70,0:40:11.20,Default,,0000,0000,0000,,in ICA, we're going to try find the independent of components of variations in the Dialogue: 0,0:40:11.20,0:40:12.04,Default,,0000,0000,0000,,data. Dialogue: 0,0:40:12.04,0:40:14.94,Default,,0000,0000,0000,,So switch it to the laptop there, please. Dialogue: 0,0:40:14.94,0:40:16.12,Default,,0000,0000,0000,,We'll just Dialogue: 0,0:40:16.12,0:40:21.90,Default,,0000,0000,0000,,take a second to motivate that. I'm Dialogue: 0,0:40:21.90,0:40:26.77,Default,,0000,0000,0000,,going to do so by Dialogue: 0,0:40:26.77,0:40:32.43,Default,,0000,0000,0000,,- although if you put on the - okay. This is Dialogue: 0,0:40:32.43,0:40:36.62,Default,,0000,0000,0000,,actually a slide that I showed in Dialogue: 0,0:40:36.62,0:40:39.78,Default,,0000,0000,0000,,lecture one of the cocktail party problem. Dialogue: 0,0:40:39.78,0:40:42.62,Default,,0000,0000,0000,,Suppose you have two speakers at a cocktail party, Dialogue: 0,0:40:42.62,0:40:45.02,Default,,0000,0000,0000,,and you have two microphones in the Dialogue: 0,0:40:45.02,0:40:46.12,Default,,0000,0000,0000,,room, overlapping Dialogue: 0,0:40:46.12,0:40:47.96,Default,,0000,0000,0000,,sets of two conversations. Dialogue: 0,0:40:47.96,0:40:51.64,Default,,0000,0000,0000,,Then can you separate out the two original speaker sources? Dialogue: 0,0:40:51.64,0:40:55.65,Default,,0000,0000,0000,,So I actually played this audio as well in the very first lecture, which is Dialogue: 0,0:40:55.65,0:40:59.07,Default,,0000,0000,0000,,suppose microphone one records this. Dialogue: 0,0:40:59.07,0:41:05.49,Default,,0000,0000,0000,,[Recording] Dialogue: 0,0:41:13.23,0:41:16.65,Default,,0000,0000,0000,,So the question is, these are really two speakers, Dialogue: 0,0:41:16.65,0:41:20.81,Default,,0000,0000,0000,,speaking independently of each other. So each speaker is outputting Dialogue: 0,0:41:20.81,0:41:24.70,Default,,0000,0000,0000,,a series of sound signals as independent of the other conversation Dialogue: 0,0:41:24.70,0:41:26.12,Default,,0000,0000,0000,,going on in the room. 
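A toy version of that setup can be simulated directly: two independent sources and two microphones, each microphone hearing a different linear mix. The signals and the mixing matrix below are arbitrary stand-ins, a sketch of the data-generating story rather than of any particular recording.

```python
import numpy as np

t = np.linspace(0, 1, 8000)

# Two independent "speakers": a tone and a sawtooth-like signal (stand-ins for voices).
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 2 * (t * 5 % 1) - 1
S = np.column_stack([s1, s2])     # each row: what both speakers emit at one instant

A = np.array([[0.8, 0.3],         # hypothetical mixing matrix: how loudly each
              [0.4, 0.7]])        # speaker reaches each microphone
X = S @ A.T                       # each row: what the two microphones record
```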
Dialogue: 0,0:41:26.12,0:41:27.84,Default,,0000,0000,0000,,So Dialogue: 0,0:41:27.84,0:41:31.88,Default,,0000,0000,0000,,this being an supervised learning problem, the question is, can we take these two microphone Dialogue: 0,0:41:31.88,0:41:33.90,Default,,0000,0000,0000,,recordings and feed it to Dialogue: 0,0:41:33.90,0:41:37.33,Default,,0000,0000,0000,,an algorithm to find the independent components in Dialogue: 0,0:41:37.33,0:41:38.45,Default,,0000,0000,0000,,this Dialogue: 0,0:41:38.45,0:41:40.59,Default,,0000,0000,0000,,data? This is the output Dialogue: 0,0:41:42.44,0:41:48.62,Default,,0000,0000,0000,,when we do so. Dialogue: 0,0:41:48.62,0:41:55.05,Default,,0000,0000,0000,,[Recording] This is the other one. [Recording] Dialogue: 0,0:41:55.94,0:41:59.86,Default,,0000,0000,0000,,Just for fun. [Inaudible]. These are audio clips I got Dialogue: 0,0:42:01.41,0:42:04.41,Default,,0000,0000,0000,,from [inaudible]. Just for fun, let me play the other ones as well. This Dialogue: 0,0:42:04.41,0:42:11.41,Default,,0000,0000,0000,,is overlapping microphone one. [Recording] Dialogue: 0,0:42:13.44,0:42:20.44,Default,,0000,0000,0000,,Here's microphone two. [Recording] Dialogue: 0,0:42:21.74,0:42:24.25,Default,,0000,0000,0000,,So given this as input, here's output one. Dialogue: 0,0:42:24.25,0:42:27.46,Default,,0000,0000,0000,, Dialogue: 0,0:42:27.46,0:42:30.91,Default,,0000,0000,0000,,[Recording] Dialogue: 0,0:42:30.91,0:42:33.64,Default,,0000,0000,0000,,It's not perfect, but it's largely cleaned up the music. Dialogue: 0,0:42:33.64,0:42:40.64,Default,,0000,0000,0000,,Here's number two. [Recording] Okay. Switch back to Dialogue: 0,0:42:42.98,0:42:44.90,Default,,0000,0000,0000,,[inaudible], please. Dialogue: 0,0:42:44.90,0:42:46.98,Default,,0000,0000,0000,,So Dialogue: 0,0:42:46.98,0:42:53.98,Default,,0000,0000,0000,,what I want to do now is describe an algorithm that does that. Dialogue: 0,0:42:54.83,0:42:58.30,Default,,0000,0000,0000,,Before Dialogue: 0,0:42:58.30,0:43:03.24,Default,,0000,0000,0000,,I actually jump into the algorithm, I want to say two minutes Dialogue: 0,0:43:03.24,0:43:03.80,Default,,0000,0000,0000,,of Dialogue: 0,0:43:03.80,0:43:10.80,Default,,0000,0000,0000,,CDF, so cumulative distribution functions. I know most Dialogue: 0,0:43:18.67,0:43:21.09,Default,,0000,0000,0000,,of you know what these are, but I'm Dialogue: 0,0:43:21.09,0:43:23.60,Default,,0000,0000,0000,,just going to remind you of what they are. Dialogue: 0,0:43:24.58,0:43:30.35,Default,,0000,0000,0000,,Let's say you have a one-D random variable S. So suppose you have Dialogue: 0,0:43:30.35,0:43:35.90,Default,,0000,0000,0000,,a random variable, S, Dialogue: 0,0:43:35.90,0:43:41.47,Default,,0000,0000,0000,,and suppose it has a property density function [inaudible]. Dialogue: 0,0:43:41.47,0:43:43.41,Default,,0000,0000,0000,,Then Dialogue: 0,0:43:43.41,0:43:45.86,Default,,0000,0000,0000,,the CDF Dialogue: 0,0:43:45.86,0:43:50.14,Default,,0000,0000,0000,,is defined as a function, or rather as F, Dialogue: 0,0:43:50.14,0:43:53.73,Default,,0000,0000,0000,,which is the probability that the random variable, Dialogue: 0,0:43:53.73,0:43:55.92,Default,,0000,0000,0000,,S, is less than the value Dialogue: 0,0:43:55.92,0:43:58.54,Default,,0000,0000,0000,,given by that lower-case Dialogue: 0,0:43:58.54,0:43:59.87,Default,,0000,0000,0000,,value, Dialogue: 0,0:43:59.87,0:44:01.93,Default,,0000,0000,0000,,S. 
Dialogue: 0,0:44:01.93,0:44:03.27,Default,,0000,0000,0000,,For example, Dialogue: 0,0:44:03.27,0:44:06.10,Default,,0000,0000,0000,,if this is your [inaudible] density, Dialogue: 0,0:44:06.10,0:44:10.23,Default,,0000,0000,0000,,than the density of the [inaudible] usually Dialogue: 0,0:44:10.23,0:44:14.61,Default,,0000,0000,0000,,to note it lower-case phi. That's roughly a bell-shaped density. Then Dialogue: 0,0:44:14.61,0:44:20.32,Default,,0000,0000,0000,,the CDF or the Gaussian Dialogue: 0,0:44:20.32,0:44:22.27,Default,,0000,0000,0000,,will look something like this. Dialogue: 0,0:44:22.27,0:44:24.96,Default,,0000,0000,0000,,There'll be a capital function Dialogue: 0,0:44:24.96,0:44:27.34,Default,,0000,0000,0000,,pi. So if I pick a value Dialogue: 0,0:44:27.34,0:44:29.08,Default,,0000,0000,0000,,S like that, then the Dialogue: 0,0:44:29.08,0:44:30.45,Default,,0000,0000,0000,,height of this - Dialogue: 0,0:44:30.45,0:44:32.57,Default,,0000,0000,0000,,this is [inaudible] probability that Dialogue: 0,0:44:32.57,0:44:35.41,Default,,0000,0000,0000,,my Gaussian random variable is less than Dialogue: 0,0:44:35.41,0:44:37.42,Default,,0000,0000,0000,,that value there. In other words, Dialogue: 0,0:44:37.42,0:44:40.55,Default,,0000,0000,0000,,the height of the function at that point is Dialogue: 0,0:44:40.55,0:44:44.23,Default,,0000,0000,0000,,less Dialogue: 0,0:44:44.23,0:44:46.27,Default,,0000,0000,0000,,than the area of the Gaussian density, Dialogue: 0,0:44:46.27,0:44:48.12,Default,,0000,0000,0000,,up to the point S. Dialogue: 0,0:44:48.12,0:44:48.89,Default,,0000,0000,0000,,As you Dialogue: 0,0:44:48.89,0:44:52.69,Default,,0000,0000,0000,,move further and further to the right, this function will approach one, as Dialogue: 0,0:44:52.69,0:44:59.69,Default,,0000,0000,0000,,you integrate more and more of this area of the Gaussian. So another way to write Dialogue: 0,0:45:04.84,0:45:11.84,Default,,0000,0000,0000,,F Dialogue: 0,0:45:21.11,0:45:28.11,Default,,0000,0000,0000,,of Dialogue: 0,0:45:30.66,0:45:34.62,Default,,0000,0000,0000,,S is the integral, the minus infinity Dialogue: 0,0:45:34.62,0:45:35.73,Default,,0000,0000,0000,,to S of Dialogue: 0,0:45:35.73,0:45:41.74,Default,,0000,0000,0000,,the density, DT. Dialogue: 0,0:45:41.74,0:45:43.86,Default,,0000,0000,0000,,So something that'll come later is Dialogue: 0,0:45:43.86,0:45:48.32,Default,,0000,0000,0000,,suppose I have a random variable, S, and I want to model the distribution of the random Dialogue: 0,0:45:48.32,0:45:49.44,Default,,0000,0000,0000,,variable, S. Dialogue: 0,0:45:49.44,0:45:53.45,Default,,0000,0000,0000,,So one thing I could do is I can specify Dialogue: 0,0:45:53.45,0:45:56.55,Default,,0000,0000,0000,,what I think the density Dialogue: 0,0:45:56.55,0:45:58.05,Default,,0000,0000,0000,,is. Dialogue: 0,0:45:58.05,0:46:03.20,Default,,0000,0000,0000,,Or I can specify Dialogue: 0,0:46:03.20,0:46:04.45,Default,,0000,0000,0000,,what the Dialogue: 0,0:46:04.45,0:46:08.10,Default,,0000,0000,0000,,CDF Dialogue: 0,0:46:08.10,0:46:11.36,Default,,0000,0000,0000,,is. These are related by this equation. F is the integral of P of S. You Dialogue: 0,0:46:11.36,0:46:13.99,Default,,0000,0000,0000,,can also Dialogue: 0,0:46:13.99,0:46:15.72,Default,,0000,0000,0000,,recover the density Dialogue: 0,0:46:15.72,0:46:20.47,Default,,0000,0000,0000,,by taking the CDF and taking the derivative. 
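A quick numerical check of that relationship for the Gaussian case just drawn, assuming scipy is available: differentiating the standard normal CDF recovers the bell-shaped density, and integrating the density up to a point recovers the CDF value there.

```python
import numpy as np
from scipy.stats import norm

s = np.linspace(-4, 4, 2001)
ds = s[1] - s[0]

Phi = norm.cdf(s)                  # the CDF: area under the density up to s
phi = norm.pdf(s)                  # the density

# The derivative of the CDF gives back the density (finite-difference check):
dPhi = np.gradient(Phi, ds)
print(np.max(np.abs(dPhi - phi)))  # close to zero

# And the CDF is the running integral of the density:
print(np.trapz(phi[s <= 1.0], s[s <= 1.0]), norm.cdf(1.0))   # both roughly 0.84
```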
So F prime, take the derivative Dialogue: 0,0:46:20.47,0:46:21.73,Default,,0000,0000,0000,,of the CDF, Dialogue: 0,0:46:21.73,0:46:23.44,Default,,0000,0000,0000,,you get back the Dialogue: 0,0:46:23.44,0:46:24.71,Default,,0000,0000,0000,,density. So this has come up Dialogue: 0,0:46:24.71,0:46:28.18,Default,,0000,0000,0000,,in the middle of when I derive ICA, which is that Dialogue: 0,0:46:28.18,0:46:32.17,Default,,0000,0000,0000,,there'll be a step where they need to assume a distribution for random variable, S. Dialogue: 0,0:46:32.17,0:46:36.36,Default,,0000,0000,0000,,I can either specify the density for S directly, or I can specify the CDF. I Dialogue: 0,0:46:36.36,0:46:38.82,Default,,0000,0000,0000,,choose to specify the Dialogue: 0,0:46:39.92,0:46:41.53,Default,,0000,0000,0000,,CDF. Dialogue: 0,0:46:41.53,0:46:46.92,Default,,0000,0000,0000,,It has to be some function increasing from zero to one. Dialogue: 0,0:46:46.92,0:46:48.03,Default,,0000,0000,0000,,So you can Dialogue: 0,0:46:48.03,0:46:50.68,Default,,0000,0000,0000,,choose any function that looks like that, and in particular, Dialogue: 0,0:46:51.97,0:46:55.47,Default,,0000,0000,0000,,pulling functions out of a hat that look like that. You can, for instance, choose a Dialogue: 0,0:46:55.47,0:46:58.99,Default,,0000,0000,0000,,sigmoid function of Dialogue: 0,0:46:58.99,0:47:04.22,Default,,0000,0000,0000,,CDF. That would be one way of specifying the distribution of the densities for the random variable S. So Dialogue: 0,0:47:04.22,0:47:05.11,Default,,0000,0000,0000,,this Dialogue: 0,0:47:05.11,0:47:12.11,Default,,0000,0000,0000,,will come up later. Dialogue: 0,0:47:30.30,0:47:33.58,Default,,0000,0000,0000,,Just [inaudible], just raise your hand if that is familiar to you, if you've seen Dialogue: 0,0:47:33.58,0:47:40.58,Default,,0000,0000,0000,,that before. Great. So Dialogue: 0,0:47:42.47,0:47:43.24,Default,,0000,0000,0000,,let's Dialogue: 0,0:47:43.24,0:47:48.63,Default,,0000,0000,0000,,start to derive our RCA, or our independent component analysis Dialogue: 0,0:47:48.63,0:47:50.43,Default,,0000,0000,0000,,algorithm. Dialogue: 0,0:47:50.43,0:47:53.99,Default,,0000,0000,0000,,Let's assume that the Dialogue: 0,0:47:55.86,0:47:59.82,Default,,0000,0000,0000,,data comes from Dialogue: 0,0:47:59.82,0:48:01.98,Default,,0000,0000,0000,,N original Dialogue: 0,0:48:01.98,0:48:03.32,Default,,0000,0000,0000,,sources. Dialogue: 0,0:48:03.32,0:48:07.01,Default,,0000,0000,0000,,So let's say there are N speakers in a cocktail party. Dialogue: 0,0:48:07.01,0:48:09.82,Default,,0000,0000,0000,,So the original sources, I'm Dialogue: 0,0:48:09.82,0:48:11.33,Default,,0000,0000,0000,,going to write as a vector, S Dialogue: 0,0:48:11.33,0:48:13.62,Default,,0000,0000,0000,,as in RN. Dialogue: 0,0:48:13.62,0:48:17.45,Default,,0000,0000,0000,,So just to be concrete about what I mean about that, I'm going to use Dialogue: 0,0:48:17.45,0:48:22.50,Default,,0000,0000,0000,,SIJ to denote the signal Dialogue: 0,0:48:22.50,0:48:25.85,Default,,0000,0000,0000,,from speaker Dialogue: 0,0:48:27.14,0:48:30.22,Default,,0000,0000,0000,,J Dialogue: 0,0:48:30.22,0:48:32.66,Default,,0000,0000,0000,,at time Dialogue: 0,0:48:32.66,0:48:34.08,Default,,0000,0000,0000,,I. Here's what I mean. Dialogue: 0,0:48:34.08,0:48:37.94,Default,,0000,0000,0000,,So what is sound? When you hear sound waves, sound is created Dialogue: 0,0:48:37.94,0:48:39.28,Default,,0000,0000,0000,,by a pattern Dialogue: 0,0:48:39.28,0:48:43.16,Default,,0000,0000,0000,,of expansions and compressions in air. 
So the way you're hearing my voice is Dialogue: 0,0:48:43.16,0:48:44.62,Default,,0000,0000,0000,,my Dialogue: 0,0:48:44.62,0:48:47.72,Default,,0000,0000,0000,,mouth is causing certain Dialogue: 0,0:48:47.72,0:48:50.96,Default,,0000,0000,0000,,changes in the air pressure, and then your ear is hearing my voice as Dialogue: 0,0:48:50.96,0:48:53.54,Default,,0000,0000,0000,,detecting those changes in air Dialogue: 0,0:48:53.54,0:48:57.73,Default,,0000,0000,0000,,pressure. So what a microphone records, what my mouth is generating, is Dialogue: 0,0:48:57.73,0:48:59.16,Default,,0000,0000,0000,,a pattern. Dialogue: 0,0:48:59.16,0:49:01.46,Default,,0000,0000,0000,,I'm going to draw a cartoon, Dialogue: 0,0:49:01.46,0:49:04.82,Default,,0000,0000,0000,,I guess. Dialogue: 0,0:49:04.82,0:49:06.06,Default,,0000,0000,0000,,Changes in Dialogue: 0,0:49:06.06,0:49:06.97,Default,,0000,0000,0000,,air pressure. So Dialogue: 0,0:49:06.97,0:49:11.12,Default,,0000,0000,0000,,this is what sound is. You look at a microphone recording, you see these roughly periodic Dialogue: 0,0:49:11.12,0:49:13.29,Default,,0000,0000,0000,,signals that comprise of Dialogue: 0,0:49:13.29,0:49:16.23,Default,,0000,0000,0000,,changes in air pressure over time as the air pressure goes Dialogue: 0,0:49:16.23,0:49:18.54,Default,,0000,0000,0000,,above and below some baseline air pressure. Dialogue: 0,0:49:18.54,0:49:19.67,Default,,0000,0000,0000,,So this Dialogue: 0,0:49:19.67,0:49:22.37,Default,,0000,0000,0000,,is what the speech signal looks like, say. Dialogue: 0,0:49:22.37,0:49:26.40,Default,,0000,0000,0000,,So this is speaker one. Dialogue: 0,0:49:26.40,0:49:29.04,Default,,0000,0000,0000,,Then what I'm saying is that Dialogue: 0,0:49:29.04,0:49:31.19,Default,,0000,0000,0000,,- this is some time, T. Dialogue: 0,0:49:31.19,0:49:34.48,Default,,0000,0000,0000,,What I'm saying is that the value of that point, Dialogue: 0,0:49:34.48,0:49:36.99,Default,,0000,0000,0000,,I'm going to denote as S, super Dialogue: 0,0:49:36.99,0:49:40.23,Default,,0000,0000,0000,,script T, sub script one. Dialogue: 0,0:49:40.23,0:49:41.73,Default,,0000,0000,0000,,Similarly, Dialogue: 0,0:49:41.73,0:49:44.89,Default,,0000,0000,0000,,speaker two, it's Dialogue: 0,0:49:44.89,0:49:46.86,Default,,0000,0000,0000,,outputting some sound wave. Speaker voice Dialogue: 0,0:49:46.86,0:49:49.75,Default,,0000,0000,0000,,will play that. It'll actually sound like Dialogue: 0,0:49:49.75,0:49:52.92,Default,,0000,0000,0000,,a single tone, I guess. Dialogue: 0,0:49:52.92,0:49:56.10,Default,,0000,0000,0000,,So in the same way, at the same time, T, Dialogue: 0,0:49:56.10,0:49:59.05,Default,,0000,0000,0000,,the value of the air Dialogue: 0,0:49:59.05,0:50:02.59,Default,,0000,0000,0000,,pressure generated by speaker two, I'll denote as Dialogue: 0,0:50:02.59,0:50:09.59,Default,,0000,0000,0000,,ST Dialogue: 0,0:50:16.58,0:50:23.58,Default,,0000,0000,0000,,2. Dialogue: 0,0:50:29.86,0:50:36.86,Default,,0000,0000,0000,,So we observe Dialogue: 0,0:50:37.77,0:50:40.45,Default,,0000,0000,0000,,XI equals A times SI, where Dialogue: 0,0:50:40.45,0:50:43.41,Default,,0000,0000,0000,,these XIs Dialogue: 0,0:50:43.41,0:50:45.99,Default,,0000,0000,0000,,are vectors in RN. 
Dialogue: 0,0:50:45.99,0:50:50.27,Default,,0000,0000,0000,,So I'm going to assume Dialogue: 0,0:50:50.27,0:50:53.26,Default,,0000,0000,0000,,that I have N microphones, Dialogue: 0,0:50:53.26,0:50:53.58,Default,,0000,0000,0000,,and Dialogue: 0,0:50:53.58,0:50:58.49,Default,,0000,0000,0000,,each of my microphones records some linear combination Dialogue: 0,0:50:58.49,0:51:01.87,Default,,0000,0000,0000,,of what the speakers are saying. So each microphone records some overlapping Dialogue: 0,0:51:01.87,0:51:04.50,Default,,0000,0000,0000,,combination of what the speakers are saying. Dialogue: 0,0:51:04.50,0:51:07.62,Default,,0000,0000,0000,,For Dialogue: 0,0:51:10.35,0:51:12.67,Default,,0000,0000,0000,,example, XIJ, which is - this Dialogue: 0,0:51:12.67,0:51:16.25,Default,,0000,0000,0000,,is what microphone J records at time, I. So Dialogue: 0,0:51:16.25,0:51:17.35,Default,,0000,0000,0000,,by definition of Dialogue: 0,0:51:17.35,0:51:21.52,Default,,0000,0000,0000,,the matrix multiplication, this is sum Dialogue: 0,0:51:21.52,0:51:23.98,Default,,0000,0000,0000,,of AIKSJ. Dialogue: 0,0:51:23.98,0:51:29.37,Default,,0000,0000,0000,,Oh, excuse me. Dialogue: 0,0:51:29.37,0:51:36.37,Default,,0000,0000,0000,,Okay? So what my J - sorry. Dialogue: 0,0:51:37.18,0:51:41.05,Default,,0000,0000,0000,,So what my J microphone is recording is Dialogue: 0,0:51:42.19,0:51:43.94,Default,,0000,0000,0000,,some linear combination of Dialogue: 0,0:51:43.94,0:51:45.57,Default,,0000,0000,0000,,all of the speakers. So Dialogue: 0,0:51:45.57,0:51:49.78,Default,,0000,0000,0000,,at time I, what microphone J is recording is some linear combination of Dialogue: 0,0:51:49.78,0:51:52.75,Default,,0000,0000,0000,,what all the speakers are saying at time I. Dialogue: 0,0:51:52.75,0:51:54.36,Default,,0000,0000,0000,,So K here Dialogue: 0,0:51:54.36,0:51:57.82,Default,,0000,0000,0000,,indexes over the N speakers. Dialogue: 0,0:51:57.82,0:52:01.24,Default,,0000,0000,0000,,So our goal Dialogue: 0,0:52:02.87,0:52:06.42,Default,,0000,0000,0000,,is to find the matrix, W, equals A inverse, and Dialogue: 0,0:52:06.42,0:52:10.13,Default,,0000,0000,0000,,just defining W that way. Dialogue: 0,0:52:10.13,0:52:17.13,Default,,0000,0000,0000,,So Dialogue: 0,0:52:18.14,0:52:21.35,Default,,0000,0000,0000,,we can recover the original sources Dialogue: 0,0:52:21.35,0:52:23.31,Default,,0000,0000,0000,,as a linear combination of Dialogue: 0,0:52:23.31,0:52:23.56,Default,,0000,0000,0000,,our Dialogue: 0,0:52:23.55,0:52:30.55,Default,,0000,0000,0000,,microphone recordings, XI. Dialogue: 0,0:52:33.06,0:52:35.33,Default,,0000,0000,0000,,Just as a point of notation, Dialogue: 0,0:52:35.33,0:52:42.33,Default,,0000,0000,0000,,I'm going to write the matrix W this way. I'm going to use Dialogue: 0,0:52:50.89,0:52:55.10,Default,,0000,0000,0000,,lower case W subscript one, subscript two and so on to denote the roles Dialogue: 0,0:52:55.10,0:53:02.10,Default,,0000,0000,0000,,of this matrix, W. Dialogue: 0,0:53:13.91,0:53:14.56,Default,,0000,0000,0000,,Let's Dialogue: 0,0:53:14.56,0:53:18.72,Default,,0000,0000,0000,,see. Dialogue: 0,0:53:18.72,0:53:23.54,Default,,0000,0000,0000,,So let's look at why IC is possible. Given these overlapping voices, Dialogue: 0,0:53:23.54,0:53:28.25,Default,,0000,0000,0000,,let's think briefly why it might be possible Dialogue: 0,0:53:28.25,0:53:30.76,Default,,0000,0000,0000,,to recover the original sources. 
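In code, the model and the goal look like this (a sketch; A is a hypothetical invertible n-by-n mixing matrix). Each microphone sample is a linear combination of all the speakers at that instant, x^(i) = A s^(i), and knowing W = A^-1 would let us undo the mixing exactly via s^(i) = W x^(i).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 5000                          # n speakers/microphones, m time samples
S = rng.uniform(-1, 1, size=(m, n))     # S[i, k]: speaker k at time i (non-Gaussian sources)
A = np.array([[0.8, 0.3],
              [0.4, 0.7]])              # hypothetical mixing matrix

X = S @ A.T                             # X[i, j]: what microphone j records at time i
# Component-wise, X[i, j] == sum over k of A[j, k] * S[i, k]:
assert np.allclose(X[3, 1], sum(A[1, k] * S[3, k] for k in range(n)))

W = np.linalg.inv(A)                    # the unmixing matrix we would like to learn
S_hat = X @ W.T                         # rows: s^(i) = W x^(i)
assert np.allclose(S_hat, S)            # with the true W, the sources come back exactly
```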
Dialogue: 0,0:53:30.76,0:53:33.06,Default,,0000,0000,0000,,So for the next example, I want Dialogue: 0,0:53:33.06,0:53:36.51,Default,,0000,0000,0000,,to say Dialogue: 0,0:53:42.74,0:53:46.53,Default,,0000,0000,0000,,- let's say that each of my speakers Dialogue: 0,0:53:46.53,0:53:50.38,Default,,0000,0000,0000,,outputs - this will sound like white noise. Can I switch Dialogue: 0,0:53:50.38,0:53:53.38,Default,,0000,0000,0000,,the laptop display, Dialogue: 0,0:53:53.38,0:53:56.71,Default,,0000,0000,0000,,please? For this example, let's say that Dialogue: 0,0:53:57.22,0:54:01.46,Default,,0000,0000,0000,,each of my speakers outputs uniform white noise. So Dialogue: 0,0:54:01.46,0:54:05.46,Default,,0000,0000,0000,,if that's the case, these are my axis, S1 and S2. Dialogue: 0,0:54:05.46,0:54:08.82,Default,,0000,0000,0000,,This is what my two speakers would be uttering. Dialogue: 0,0:54:08.82,0:54:11.29,Default,,0000,0000,0000,,The parts of what they're Dialogue: 0,0:54:11.29,0:54:14.98,Default,,0000,0000,0000,,uttering will look like a line in a square box if the two speakers are independently Dialogue: 0,0:54:14.98,0:54:16.09,Default,,0000,0000,0000,,outputting Dialogue: 0,0:54:16.09,0:54:18.39,Default,,0000,0000,0000,,uniform minus one random variables. Dialogue: 0,0:54:18.39,0:54:20.29,Default,,0000,0000,0000,,So this is part of Dialogue: 0,0:54:20.29,0:54:24.01,Default,,0000,0000,0000,,S1 and S2, my original sources. Dialogue: 0,0:54:24.01,0:54:28.100,Default,,0000,0000,0000,,This would be a typical sample of what my microphones record. Here, at Dialogue: 0,0:54:28.100,0:54:31.40,Default,,0000,0000,0000,,the axis, are X1 and X2. Dialogue: 0,0:54:31.40,0:54:35.09,Default,,0000,0000,0000,,So these are images I got from [inaudible] on Dialogue: 0,0:54:35.09,0:54:37.27,Default,,0000,0000,0000,,ICA. Dialogue: 0,0:54:38.71,0:54:43.69,Default,,0000,0000,0000,,Given a picture like this, you can sort of look at this box, and you can sort of tell what the axis of Dialogue: 0,0:54:43.69,0:54:44.94,Default,,0000,0000,0000,,this Dialogue: 0,0:54:44.94,0:54:45.81,Default,,0000,0000,0000,,parallelogram Dialogue: 0,0:54:45.81,0:54:48.15,Default,,0000,0000,0000,,are. You can figure out Dialogue: 0,0:54:48.15,0:54:51.100,Default,,0000,0000,0000,,what linear transformation would transform the parallelogram back Dialogue: 0,0:54:51.100,0:54:54.36,Default,,0000,0000,0000,,to a box. Dialogue: 0,0:54:54.36,0:54:58.77,Default,,0000,0000,0000,,So it turns out there are some inherent ambiguities in ICA. Dialogue: 0,0:54:58.77,0:55:00.51,Default,,0000,0000,0000,,I'll just say what they are. Dialogue: 0,0:55:00.51,0:55:01.57,Default,,0000,0000,0000,,One is that Dialogue: 0,0:55:01.57,0:55:05.71,Default,,0000,0000,0000,,you can't recover the original indexing of the sources. In particular, Dialogue: 0,0:55:05.71,0:55:07.38,Default,,0000,0000,0000,,if Dialogue: 0,0:55:07.38,0:55:10.81,Default,,0000,0000,0000,,I generated the data for speaker one and speaker two, Dialogue: 0,0:55:10.81,0:55:14.47,Default,,0000,0000,0000,,you can run ICA, and then you may end up with the order of the speakers Dialogue: 0,0:55:14.47,0:55:17.53,Default,,0000,0000,0000,,reversed. What that corresponds to is if you take this Dialogue: 0,0:55:17.53,0:55:21.81,Default,,0000,0000,0000,,picture and you flip this picture along a 45-degree Dialogue: 0,0:55:21.81,0:55:26.13,Default,,0000,0000,0000,,axis. 
You take a 45-degree axis and reflect this picture across the 45-degree axis, you'll still Dialogue: 0,0:55:26.13,0:55:28.28,Default,,0000,0000,0000,,get a box. So Dialogue: 0,0:55:28.28,0:55:31.32,Default,,0000,0000,0000,,there's no way for the algorithms to tell which was speaker No. 1 and Dialogue: 0,0:55:31.32,0:55:32.91,Default,,0000,0000,0000,,which Dialogue: 0,0:55:32.91,0:55:37.70,Default,,0000,0000,0000,,was speaker No. 2. The numbering or the ordering of the speakers is Dialogue: 0,0:55:37.70,0:55:40.84,Default,,0000,0000,0000,,ambiguous. The other source of ambiguity, and these are the only ambiguities Dialogue: 0,0:55:40.84,0:55:42.09,Default,,0000,0000,0000,,in this example, Dialogue: 0,0:55:42.09,0:55:44.47,Default,,0000,0000,0000,,is the sign of the sources. So Dialogue: 0,0:55:44.47,0:55:49.12,Default,,0000,0000,0000,,given my speakers' recordings, Dialogue: 0,0:55:49.12,0:55:53.19,Default,,0000,0000,0000,,you can't tell whether you got a positive SI or whether you got Dialogue: 0,0:55:53.19,0:55:56.18,Default,,0000,0000,0000,,back a negative SI. Dialogue: 0,0:55:56.18,0:55:58.21,Default,,0000,0000,0000,,In this picture, what that corresponds to Dialogue: 0,0:55:58.21,0:56:02.10,Default,,0000,0000,0000,,is if you take this picture, and you reflect it along the vertical axis, if Dialogue: 0,0:56:02.10,0:56:04.66,Default,,0000,0000,0000,,you reflect it along the horizontal axis, Dialogue: 0,0:56:04.66,0:56:05.91,Default,,0000,0000,0000,,you still get a box. Dialogue: 0,0:56:05.91,0:56:08.72,Default,,0000,0000,0000,,You still get back [inaudible] speakers. Dialogue: 0,0:56:08.72,0:56:09.65,Default,,0000,0000,0000,,So Dialogue: 0,0:56:09.65,0:56:11.72,Default,,0000,0000,0000,,it turns out that in this example, Dialogue: 0,0:56:11.72,0:56:16.60,Default,,0000,0000,0000,,you can't guarantee that you've recovered positive SI rather Dialogue: 0,0:56:16.60,0:56:19.69,Default,,0000,0000,0000,,than negative SI. Dialogue: 0,0:56:19.69,0:56:21.93,Default,,0000,0000,0000,,So it turns out that these are the only Dialogue: 0,0:56:21.93,0:56:25.74,Default,,0000,0000,0000,,two ambiguities in this example. What is the permutation of the speakers, and the Dialogue: 0,0:56:25.74,0:56:28.14,Default,,0000,0000,0000,,other is the sign of the speakers. Dialogue: 0,0:56:28.14,0:56:30.75,Default,,0000,0000,0000,,Permutation of the speakers, there's not much you can do about that. Dialogue: 0,0:56:30.75,0:56:34.91,Default,,0000,0000,0000,,It turns out that if you take the audio Dialogue: 0,0:56:34.91,0:56:35.61,Default,,0000,0000,0000,,source Dialogue: 0,0:56:35.61,0:56:39.20,Default,,0000,0000,0000,,and if you flip the sign, and you take negative S, and if you play that through a Dialogue: 0,0:56:39.20,0:56:43.82,Default,,0000,0000,0000,,microphone it'll sound indistinguishable. Dialogue: 0,0:56:43.82,0:56:44.88,Default,,0000,0000,0000,,So Dialogue: 0,0:56:44.88,0:56:47.83,Default,,0000,0000,0000,,for many of the applications we care about, the sign Dialogue: 0,0:56:47.83,0:56:51.26,Default,,0000,0000,0000,,as well as the permutation Dialogue: 0,0:56:51.26,0:56:55.08,Default,,0000,0000,0000,,is ambiguous, but you don't really care Dialogue: 0,0:56:55.08,0:57:02.08,Default,,0000,0000,0000,,about it. Let's switch back Dialogue: 0,0:57:03.53,0:57:08.99,Default,,0000,0000,0000,,to Dialogue: 0,0:57:08.99,0:57:11.18,Default,,0000,0000,0000,,chalk board, please. 
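Those two ambiguities are easy to see numerically: if W unmixes the data, then so does any version of W with its rows reordered and/or negated, since the recovered signals are the same sources up to renumbering and sign. A small self-contained sketch, where P and D are an arbitrary permutation and sign-flip:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(5000, 2))            # two independent uniform sources
A = np.array([[0.8, 0.3], [0.4, 0.7]])            # hypothetical mixing matrix
X = S @ A.T
W = np.linalg.inv(A)                              # one valid unmixing matrix

P = np.array([[0., 1.], [1., 0.]])                # renumber the speakers
D = np.diag([1., -1.])                            # flip the sign of one of them
W_alt = P @ D @ W                                 # an equally valid unmixing matrix

S_alt = X @ W_alt.T
assert np.allclose(S_alt[:, 0], -S[:, 1])         # speaker two, negated
assert np.allclose(S_alt[:, 1],  S[:, 0])         # speaker one
```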
Dialogue: 0,0:57:11.18,0:57:15.64,Default,,0000,0000,0000,,It turns out, and I don't want to spend too much time on this, but I do want to say it briefly. Dialogue: 0,0:57:15.64,0:57:17.29,Default,,0000,0000,0000,,It turns out the Dialogue: 0,0:57:17.29,0:57:19.20,Default,,0000,0000,0000,,reason why those are the only Dialogue: 0,0:57:19.20,0:57:25.81,Default,,0000,0000,0000,,sources of ambiguity - so the ambiguities were the Dialogue: 0,0:57:25.81,0:57:29.87,Default,,0000,0000,0000,,permutation of the speakers Dialogue: 0,0:57:29.87,0:57:31.96,Default,,0000,0000,0000,,and the signs. Dialogue: 0,0:57:31.96,0:57:35.40,Default,,0000,0000,0000,,It turns out that Dialogue: 0,0:57:35.40,0:57:39.92,Default,,0000,0000,0000,,the reason these were the only ambiguities was because Dialogue: 0,0:57:39.92,0:57:44.100,Default,,0000,0000,0000,,the SIJs were Dialogue: 0,0:57:44.100,0:57:46.69,Default,,0000,0000,0000,, Dialogue: 0,0:57:46.69,0:57:50.51,Default,,0000,0000,0000,,non-Gaussian. I don't want to spend too much time on this, but I'll say it briefly. Dialogue: 0,0:57:50.51,0:57:54.09,Default,,0000,0000,0000,,Suppose my original sources, S1 and S2, were Gaussian. Dialogue: 0,0:57:54.09,0:57:55.91,Default,,0000,0000,0000,,So Dialogue: 0,0:57:58.33,0:58:02.20,Default,,0000,0000,0000,,suppose SI is Dialogue: 0,0:58:02.20,0:58:04.34,Default,,0000,0000,0000,,Gaussian, would mean zero Dialogue: 0,0:58:04.34,0:58:07.02,Default,,0000,0000,0000,,and identity covariance. Dialogue: 0,0:58:07.02,0:58:10.96,Default,,0000,0000,0000,,That just means that each of my speakers outputs a Gaussian random variable. Here's a typical Dialogue: 0,0:58:10.96,0:58:12.62,Default,,0000,0000,0000,,example of Gaussian Dialogue: 0,0:58:12.62,0:58:18.48,Default,,0000,0000,0000,,data. Dialogue: 0,0:58:18.48,0:58:22.87,Default,,0000,0000,0000,,You will recall the contours of a Gaussian distribution with identity covariants Dialogue: 0,0:58:22.87,0:58:25.09,Default,,0000,0000,0000,,looks like Dialogue: 0,0:58:25.09,0:58:27.74,Default,,0000,0000,0000,,this, right? The Gaussian is a Dialogue: 0,0:58:27.74,0:58:30.57,Default,,0000,0000,0000,,spherically symmetric distribution. Dialogue: 0,0:58:30.57,0:58:35.22,Default,,0000,0000,0000,,So if my speakers were outputting Gaussian random variables, than if Dialogue: 0,0:58:35.22,0:58:38.18,Default,,0000,0000,0000,,I observe a linear combination of this, Dialogue: 0,0:58:38.18,0:58:40.48,Default,,0000,0000,0000,,there's actually no way to recover the Dialogue: 0,0:58:40.48,0:58:43.42,Default,,0000,0000,0000,,original distribution because there's no way for me to tell Dialogue: 0,0:58:43.42,0:58:46.12,Default,,0000,0000,0000,,if the axis are at this angle or if they're at Dialogue: 0,0:58:46.12,0:58:48.35,Default,,0000,0000,0000,,that angle and so Dialogue: 0,0:58:48.35,0:58:52.43,Default,,0000,0000,0000,,on. The Gaussian is a rotationally symmetric Dialogue: 0,0:58:52.43,0:58:56.77,Default,,0000,0000,0000,,distribution, so I would no be able to recover the orientation in the Dialogue: 0,0:58:56.77,0:58:58.84,Default,,0000,0000,0000,,rotation Dialogue: 0,0:58:58.84,0:59:02.28,Default,,0000,0000,0000,,of this. So I don't want to prove this too much. 
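A quick numerical way to see the problem with Gaussian sources: a zero-mean, identity-covariance Gaussian looks exactly the same after any rotation, so the observed data carries no information about the orientation of the original axes. A sketch, with an arbitrary rotation R:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((100000, 2))      # Gaussian sources, zero mean, identity covariance

theta = 0.7                               # any angle at all
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S_rot = S @ R.T                           # the same sources, rotated

print(np.cov(S, rowvar=False))            # approximately the identity
print(np.cov(S_rot, rowvar=False))        # also approximately the identity:
                                          # nothing distinguishes S from its rotation
```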
I don't want to spend too much time dwelling on this, but it turns Dialogue: 0,0:59:02.28,0:59:02.90,Default,,0000,0000,0000,,out Dialogue: 0,0:59:02.90,0:59:04.70,Default,,0000,0000,0000,,if your source is a Gaussian, Dialogue: 0,0:59:04.70,0:59:07.93,Default,,0000,0000,0000,,then it's actually impossible to do Dialogue: 0,0:59:07.93,0:59:12.05,Default,,0000,0000,0000,,ICA. ICA relies critically on your data being non-Gaussian because if the data Dialogue: 0,0:59:12.05,0:59:16.94,Default,,0000,0000,0000,,were Gaussian, then the rotation of the data would be ambiguous. So Dialogue: 0,0:59:16.94,0:59:19.08,Default,,0000,0000,0000,,regardless of how much data you have, Dialogue: 0,0:59:19.08,0:59:23.55,Default,,0000,0000,0000,,even if you had infinitely large amounts of data, you would not be able to recover Dialogue: 0,0:59:23.55,0:59:26.74,Default,,0000,0000,0000,,the matrix A or W. Dialogue: 0,0:59:32.78,0:59:39.78,Default,,0000,0000,0000,,Let's go ahead and divide the algorithm. Dialogue: 0,0:59:56.78,1:00:00.94,Default,,0000,0000,0000,,To do this, I need just one more result, and then the derivation will be Dialogue: 0,1:00:03.03,1:00:07.73,Default,,0000,0000,0000,,three lines. [Inaudible] many variables as N, which is the joint vector of the sound that all of my Dialogue: 0,1:00:07.73,1:00:11.31,Default,,0000,0000,0000,,speakers that are emitting at any time. Dialogue: 0,1:00:11.31,1:00:12.46,Default,,0000,0000,0000,,So Dialogue: 0,1:00:12.46,1:00:15.62,Default,,0000,0000,0000,,let's say the density of S is Dialogue: 0,1:00:15.62,1:00:17.34,Default,,0000,0000,0000,,P subscript S, Dialogue: 0,1:00:17.34,1:00:19.57,Default,,0000,0000,0000,,capital S. Dialogue: 0,1:00:19.57,1:00:23.40,Default,,0000,0000,0000,,So my microphone recording records S equals AS, Dialogue: 0,1:00:23.40,1:00:25.32,Default,,0000,0000,0000,,equals W inverse Dialogue: 0,1:00:25.32,1:00:31.02,Default,,0000,0000,0000,,S. Equivalently, S equals W sign of X. Dialogue: 0,1:00:31.02,1:00:34.53,Default,,0000,0000,0000,,So let's think about what is the density of Dialogue: 0,1:00:34.53,1:00:38.21,Default,,0000,0000,0000,,X. So I have P of S. I know the density of Dialogue: 0,1:00:38.21,1:00:41.36,Default,,0000,0000,0000,,S, and X is a linear combination of the S's. Dialogue: 0,1:00:41.36,1:00:45.17,Default,,0000,0000,0000,,So let's figure out what is the density of X. Dialogue: 0,1:00:45.17,1:00:48.67,Default,,0000,0000,0000,,One thing we could do is Dialogue: 0,1:00:48.67,1:00:51.34,Default,,0000,0000,0000,,figure out what S is. So this is just - Dialogue: 0,1:00:51.34,1:00:55.76,Default,,0000,0000,0000,,apply the density of Dialogue: 0,1:00:55.76,1:00:58.07,Default,,0000,0000,0000,,S to W of S. So let's Dialogue: 0,1:00:58.07,1:01:01.100,Default,,0000,0000,0000,,see. This is the probability of S, so we just Dialogue: 0,1:01:02.91,1:01:06.56,Default,,0000,0000,0000,,figure out what S is. S is W times X, so the probability of S is Dialogue: 0,1:01:06.56,1:01:09.94,Default,,0000,0000,0000,,W times X, so the probability of X must be [inaudible]. Dialogue: 0,1:01:09.94,1:01:11.62,Default,,0000,0000,0000,,So this is wrong. Dialogue: 0,1:01:11.62,1:01:14.75,Default,,0000,0000,0000,,It turns out you can do this for probably mass functions but not for Dialogue: 0,1:01:14.75,1:01:16.92,Default,,0000,0000,0000,,continuous density. 
So in particular, Dialogue: 0,1:01:16.92,1:01:20.97,Default,,0000,0000,0000,,it's not correct to say that the probability of X is - well, you just figure out what Dialogue: 0,1:01:20.97,1:01:22.50,Default,,0000,0000,0000,,S is. Dialogue: 0,1:01:22.50,1:01:26.19,Default,,0000,0000,0000,,Then you say the probability of S is applied to that. This is wrong. You Dialogue: 0,1:01:26.19,1:01:27.82,Default,,0000,0000,0000,,can't do this with densities. Dialogue: 0,1:01:27.82,1:01:30.97,Default,,0000,0000,0000,,You can't say the probability of S is that because it's a property density Dialogue: 0,1:01:30.97,1:01:32.97,Default,,0000,0000,0000,,function. Dialogue: 0,1:01:32.97,1:01:34.46,Default,,0000,0000,0000,,In particular, Dialogue: 0,1:01:34.46,1:01:35.51,Default,,0000,0000,0000,,the Dialogue: 0,1:01:35.51,1:01:37.85,Default,,0000,0000,0000,,right formula is the Dialogue: 0,1:01:37.85,1:01:40.44,Default,,0000,0000,0000,,density of S applied to W times X, Dialogue: 0,1:01:40.44,1:01:41.73,Default,,0000,0000,0000,,times the determinant Dialogue: 0,1:01:41.73,1:01:44.21,Default,,0000,0000,0000,,of the matrix, W. Dialogue: 0,1:01:44.21,1:01:47.19,Default,,0000,0000,0000,,Let me just illustrate that with an example. Dialogue: 0,1:01:47.19,1:01:49.92,Default,,0000,0000,0000,,Let's say Dialogue: 0,1:01:49.92,1:01:51.55,Default,,0000,0000,0000,,the Dialogue: 0,1:01:51.55,1:01:58.20,Default,,0000,0000,0000,,density for S is that. In Dialogue: 0,1:01:58.20,1:02:03.47,Default,,0000,0000,0000,,this example, S is uniform Dialogue: 0,1:02:03.47,1:02:05.54,Default,,0000,0000,0000,,over the unit interval. Dialogue: 0,1:02:07.68,1:02:14.68,Default,,0000,0000,0000,,So the density for S looks like that. It's Dialogue: 0,1:02:15.19,1:02:18.14,Default,,0000,0000,0000,,just density for the uniform Dialogue: 0,1:02:18.14,1:02:20.75,Default,,0000,0000,0000,,distribution of zero one. Dialogue: 0,1:02:20.75,1:02:24.15,Default,,0000,0000,0000,,So let me let X be equal to two times Dialogue: 0,1:02:24.15,1:02:30.01,Default,,0000,0000,0000,,S. So this means A equals two. Dialogue: 0,1:02:30.01,1:02:33.71,Default,,0000,0000,0000,,W equals one half. So if Dialogue: 0,1:02:33.71,1:02:36.72,Default,,0000,0000,0000,,S is a uniform distribution over zero, one, Dialogue: 0,1:02:36.72,1:02:40.32,Default,,0000,0000,0000,,then X, which is two times that, will be the uniform distribution over the Dialogue: 0,1:02:40.32,1:02:43.30,Default,,0000,0000,0000,,range from zero to two. Dialogue: 0,1:02:43.30,1:02:50.30,Default,,0000,0000,0000,,So the density for X will be - Dialogue: 0,1:02:54.36,1:02:57.29,Default,,0000,0000,0000,,that's one, that's two, Dialogue: 0,1:02:57.29,1:03:01.41,Default,,0000,0000,0000,,that's one half, Dialogue: 0,1:03:02.53,1:03:04.95,Default,,0000,0000,0000,,and Dialogue: 0,1:03:04.95,1:03:07.94,Default,,0000,0000,0000,,that's one. Okay? Density for X will be indicator Dialogue: 0,1:03:07.94,1:03:12.73,Default,,0000,0000,0000,,zero [inaudible] for X [inaudible] two Dialogue: 0,1:03:12.73,1:03:15.74,Default,,0000,0000,0000,,times W, times one half. Dialogue: 0,1:03:15.74,1:03:20.23,Default,,0000,0000,0000,,So Dialogue: 0,1:03:20.23,1:03:21.73,Default,,0000,0000,0000,,does that make Dialogue: 0,1:03:21.73,1:03:25.02,Default,,0000,0000,0000,,sense? [Inaudible] computer density for X because X is now spread out Dialogue: 0,1:03:25.02,1:03:28.65,Default,,0000,0000,0000,,across a wider range. 
The density of X is now smaller, Dialogue: 0,1:03:28.65,1:03:35.65,Default,,0000,0000,0000,,and therefore, the density of X has this one half Dialogue: 0,1:03:37.86,1:03:38.92,Default,,0000,0000,0000,,term Dialogue: 0,1:03:38.92,1:03:42.58,Default,,0000,0000,0000,,here. Okay? This is an illustration for the case of one-dimensional random variables, Dialogue: 0,1:03:42.58,1:03:44.29,Default,,0000,0000,0000,, Dialogue: 0,1:03:44.29,1:03:45.16,Default,,0000,0000,0000,,or S Dialogue: 0,1:03:45.16,1:03:49.49,Default,,0000,0000,0000,,and X of one D. I'm not going to show it, but the generalization of this to vector value random variables is that the Dialogue: 0,1:03:49.49,1:03:51.65,Default,,0000,0000,0000,,density of X is given by this Dialogue: 0,1:03:51.65,1:03:53.95,Default,,0000,0000,0000,,times the determinant of the matrix, W. Over here, Dialogue: 0,1:03:53.95,1:04:00.95,Default,,0000,0000,0000,,I showed the one dimensional [inaudible] generalization. Dialogue: 0,1:04:21.44,1:04:28.44,Default,,0000,0000,0000,,So we're nearly there. Here's Dialogue: 0,1:04:28.75,1:04:33.97,Default,,0000,0000,0000,,how I can implement ICA. Dialogue: 0,1:04:33.97,1:04:37.04,Default,,0000,0000,0000,,So my distribution on Dialogue: 0,1:04:37.04,1:04:44.04,Default,,0000,0000,0000,,S, Dialogue: 0,1:04:50.26,1:04:52.96,Default,,0000,0000,0000,,so I'm going to assume that my density on S Dialogue: 0,1:04:52.96,1:04:55.10,Default,,0000,0000,0000,,is given by this as a product over the Dialogue: 0,1:04:55.10,1:04:59.95,Default,,0000,0000,0000,,N speakers of the density - the product of speaker Dialogue: 0,1:04:59.95,1:05:00.89,Default,,0000,0000,0000,,I Dialogue: 0,1:05:00.89,1:05:03.66,Default,,0000,0000,0000,,emitting a certain sound. This is a product of densities. Dialogue: 0,1:05:03.66,1:05:07.66,Default,,0000,0000,0000,,This is a product of distributions because I'm going to assume that the Dialogue: 0,1:05:07.66,1:05:11.47,Default,,0000,0000,0000,,speakers are having independent conversations. So the SI's independent Dialogue: 0,1:05:11.47,1:05:15.87,Default,,0000,0000,0000,,for different values of I. Dialogue: 0,1:05:15.87,1:05:18.06,Default,,0000,0000,0000,,So by the formula we just worked out, Dialogue: 0,1:05:18.06,1:05:22.36,Default,,0000,0000,0000,,the density for X would be equal to that. Dialogue: 0,1:05:36.60,1:05:39.31,Default,,0000,0000,0000,,I'll just remind you, W was A Dialogue: 0,1:05:39.31,1:05:42.58,Default,,0000,0000,0000,,inverse. It was Dialogue: 0,1:05:42.58,1:05:43.93,Default,,0000,0000,0000,,this matrix Dialogue: 0,1:05:43.93,1:05:47.62,Default,,0000,0000,0000,,I defined previously Dialogue: 0,1:05:47.62,1:05:50.43,Default,,0000,0000,0000,,so that SI Dialogue: 0,1:05:50.43,1:05:52.52,Default,,0000,0000,0000,,equals WI [inaudible] Dialogue: 0,1:05:52.52,1:05:59.21,Default,,0000,0000,0000,,X. So that's what's in Dialogue: 0,1:05:59.21,1:06:02.30,Default,,0000,0000,0000,,there. To complete my formulation for this model, Dialogue: 0,1:06:02.30,1:06:06.36,Default,,0000,0000,0000,,the final thing I need to do is Dialogue: 0,1:06:06.36,1:06:10.18,Default,,0000,0000,0000,,choose Dialogue: 0,1:06:10.18,1:06:11.55,Default,,0000,0000,0000,,a density Dialogue: 0,1:06:11.55,1:06:14.26,Default,,0000,0000,0000,,for what I think each speaker is Dialogue: 0,1:06:14.26,1:06:17.95,Default,,0000,0000,0000,,saying. I need to assume some density over Dialogue: 0,1:06:17.95,1:06:21.66,Default,,0000,0000,0000,,the sounds emitted by an individual speaker. 
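The one-dimensional example can be checked directly: with S uniform on [0, 1] and X = 2S (so W = 1/2), the correct density is p_X(x) = p_S(Wx) * |W| = 1/2 on [0, 2], whereas the "wrong" formula p_S(Wx) alone does not even integrate to one. A sketch of that check, plus the vector-valued form of the model density just written down:

```python
import numpy as np

# 1-D check: S ~ Uniform[0, 1], X = 2 S, so A = 2 and W = 1/2.
def p_S(s):
    return ((0.0 <= s) & (s <= 1.0)).astype(float)

W = 0.5
x = np.linspace(-1, 3, 400001)
p_X_wrong = p_S(W * x)               # forgets the |W| factor
p_X_right = p_S(W * x) * abs(W)      # correct change of variables

print(np.trapz(p_X_wrong, x))        # ~2.0: not a valid density
print(np.trapz(p_X_right, x))        # ~1.0, and equal to 1/2 on [0, 2] as expected

# Vector case, as in the model: p_X(x) = ( prod_j p_S(w_j . x) ) * |det W|.
def log_p_X(x_vec, W_mat, log_p_S):
    return sum(log_p_S(w_j @ x_vec) for w_j in W_mat) + np.log(abs(np.linalg.det(W_mat)))
```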
Dialogue: 0,1:06:21.66,1:06:25.63,Default,,0000,0000,0000,,So following the discussion I had right when the [inaudible] Dialogue: 0,1:06:25.63,1:06:27.65,Default,,0000,0000,0000,,ICA, Dialogue: 0,1:06:27.65,1:06:30.56,Default,,0000,0000,0000,,one thing I could do is I could choose Dialogue: 0,1:06:30.56,1:06:32.02,Default,,0000,0000,0000,,the density for S, Dialogue: 0,1:06:32.02,1:06:35.51,Default,,0000,0000,0000,,or equivalently, I could choose the CDF, the cumulative distribution Dialogue: 0,1:06:35.51,1:06:37.17,Default,,0000,0000,0000,,function for Dialogue: 0,1:06:37.17,1:06:38.22,Default,,0000,0000,0000,,S. Dialogue: 0,1:06:38.22,1:06:41.49,Default,,0000,0000,0000,,In this case, I'm going to choose Dialogue: 0,1:06:41.49,1:06:44.82,Default,,0000,0000,0000,,a CDF, probably for historical reasons and probably for Dialogue: 0,1:06:44.82,1:06:46.57,Default,,0000,0000,0000,,convenience. Dialogue: 0,1:06:46.57,1:06:50.02,Default,,0000,0000,0000,,I need to choose the CDF for S, so Dialogue: 0,1:06:50.02,1:06:54.78,Default,,0000,0000,0000,,what that means is I just need to choose some function that increases from zero to Dialogue: 0,1:06:54.78,1:06:59.44,Default,,0000,0000,0000,,what. I know I can't choose a Gaussian because we know you can't Dialogue: 0,1:06:59.44,1:07:02.20,Default,,0000,0000,0000,,do ICA on Gaussian data. Dialogue: 0,1:07:02.20,1:07:04.65,Default,,0000,0000,0000,,So I need some function increasing from zero to one Dialogue: 0,1:07:04.65,1:07:08.64,Default,,0000,0000,0000,,that is not the cumulative distribution function for a Dialogue: 0,1:07:08.64,1:07:10.36,Default,,0000,0000,0000,,Gaussian distribution. Dialogue: 0,1:07:10.36,1:07:14.01,Default,,0000,0000,0000,,So what other functions do I know that increase from zero to one? I Dialogue: 0,1:07:14.01,1:07:16.14,Default,,0000,0000,0000,,just choose the Dialogue: 0,1:07:16.14,1:07:18.33,Default,,0000,0000,0000,,CDF to be Dialogue: 0,1:07:18.33,1:07:21.98,Default,,0000,0000,0000,,the Dialogue: 0,1:07:21.98,1:07:23.04,Default,,0000,0000,0000,,sigmoid function. Dialogue: 0,1:07:23.04,1:07:24.73,Default,,0000,0000,0000,,This is a Dialogue: 0,1:07:24.73,1:07:27.23,Default,,0000,0000,0000,,commonly-made choice that Dialogue: 0,1:07:27.23,1:07:31.05,Default,,0000,0000,0000,,is made for convenience. There is actually no great reason for why you Dialogue: 0,1:07:31.05,1:07:34.08,Default,,0000,0000,0000,,choose a sigmoid function. It's just a convenient function that we all know Dialogue: 0,1:07:34.08,1:07:35.29,Default,,0000,0000,0000,,and are familiar with Dialogue: 0,1:07:35.29,1:07:37.85,Default,,0000,0000,0000,,that happens to increase from zero to one. Dialogue: 0,1:07:37.85,1:07:44.85,Default,,0000,0000,0000,,When you take the derivative Dialogue: 0,1:07:45.79,1:07:49.39,Default,,0000,0000,0000,,of the sigmoid, and that will give you back Dialogue: 0,1:07:49.39,1:07:50.12,Default,,0000,0000,0000,,your Dialogue: 0,1:07:50.12,1:07:55.46,Default,,0000,0000,0000,,density. This is just not Gaussian. This is the main virtue of choosing the sigmoid. Dialogue: 0,1:07:55.46,1:08:02.46,Default,,0000,0000,0000,,So Dialogue: 0,1:08:19.02,1:08:21.96,Default,,0000,0000,0000,,there's really no rational for the choice of sigma. Lots of other things will Dialogue: 0,1:08:21.96,1:08:23.60,Default,,0000,0000,0000,,work fine, too. Dialogue: 0,1:08:23.60,1:08:26.66,Default,,0000,0000,0000,,It's just a common, reasonable default. 
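Concretely, with the sigmoid g taken as the CDF of each source, the implied density is its derivative, g'(s) = g(s)(1 - g(s)): symmetric and bell-shaped, but not Gaussian. A small sketch; any other smooth, increasing function from zero to one that is not the Gaussian CDF would do just as well.

```python
import numpy as np

def g(s):                       # sigmoid, used as the CDF of each source
    return 1.0 / (1.0 + np.exp(-s))

def p_s(s):                     # implied density: the derivative of the sigmoid
    return g(s) * (1.0 - g(s))

s = np.linspace(-8, 8, 4001)
print(np.trapz(p_s(s), s))      # ~1.0: a valid density
print(p_s(0.0))                 # 0.25 at the peak; bell-shaped, but not Gaussian
```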
Dialogue: 0,1:08:38.04,1:08:40.28,Default,,0000,0000,0000,,It turns out that Dialogue: 0,1:08:40.28,1:08:44.63,Default,,0000,0000,0000,,one reason the sigma works well for a lot of data sources is that Dialogue: 0,1:08:44.63,1:08:49.08,Default,,0000,0000,0000,,if this is the Gaussian. Dialogue: 0,1:08:49.08,1:08:52.19,Default,,0000,0000,0000,,If you actually take the sigmoid and you take its derivative, Dialogue: 0,1:09:02.30,1:09:06.64,Default,,0000,0000,0000,,you find that the sigmoid has [inaudible] than the Gaussian. By this I mean Dialogue: 0,1:09:06.64,1:09:10.51,Default,,0000,0000,0000,,the density of the sigmoid dies down to zero much more slowly than Dialogue: 0,1:09:10.51,1:09:12.30,Default,,0000,0000,0000,,the Dialogue: 0,1:09:12.30,1:09:13.49,Default,,0000,0000,0000,,Gaussian. Dialogue: 0,1:09:13.49,1:09:18.08,Default,,0000,0000,0000,,The magnitudes of the tails dies down as E to the minus S squared. Dialogue: 0,1:09:18.08,1:09:21.96,Default,,0000,0000,0000,,For the sigmoid, the tails look like E to the minus Dialogue: 0,1:09:21.96,1:09:26.95,Default,,0000,0000,0000,,S. So the tails die down as E to the minus S, around E Dialogue: 0,1:09:26.95,1:09:29.53,Default,,0000,0000,0000,,to the minus S squared. It turns out that most distributions of this property Dialogue: 0,1:09:29.53,1:09:34.36,Default,,0000,0000,0000,,with [inaudible] tails, where the distribution decays to zero relatively slowly Dialogue: 0,1:09:34.36,1:09:38.44,Default,,0000,0000,0000,,compared to Gaussian will Dialogue: 0,1:09:38.44,1:09:39.92,Default,,0000,0000,0000,,work fine for your data. Dialogue: 0,1:09:39.92,1:09:43.94,Default,,0000,0000,0000,,Actually, one other choice you can sometimes us is what's called the Laplacian Dialogue: 0,1:09:43.94,1:09:46.23,Default,,0000,0000,0000,,distribution, which is Dialogue: 0,1:09:46.23,1:09:53.23,Default,,0000,0000,0000,,that. This will work fine, too, for many data sources. Dialogue: 0,1:10:06.54,1:10:08.11,Default,,0000,0000,0000,,Sticking with the sigmoid for now, I'll just Dialogue: 0,1:10:08.11,1:10:09.42,Default,,0000,0000,0000,,write Dialogue: 0,1:10:09.42,1:10:14.48,Default,,0000,0000,0000,,down the algorithm in two steps. So given Dialogue: 0,1:10:14.48,1:10:17.15,Default,,0000,0000,0000,,my training set, and Dialogue: 0,1:10:17.15,1:10:21.18,Default,,0000,0000,0000,,as you show, this is an unlabeled training set, I can Dialogue: 0,1:10:21.18,1:10:25.77,Default,,0000,0000,0000,,write down the log likelihood of my parameters. So that's - assembled my training Dialogue: 0,1:10:25.77,1:10:27.21,Default,,0000,0000,0000,,examples, log of - times Dialogue: 0,1:10:27.21,1:10:34.21,Default,,0000,0000,0000,,that. Dialogue: 0,1:10:42.87,1:10:44.88,Default,,0000,0000,0000,,So that's my log Dialogue: 0,1:10:44.88,1:10:51.88,Default,,0000,0000,0000,,likelihood. Dialogue: 0,1:10:53.34,1:10:59.38,Default,,0000,0000,0000,,To learn the parameters, W, of this model, I can use the [inaudible] assent, Dialogue: 0,1:10:59.38,1:11:06.38,Default,,0000,0000,0000,,which is Dialogue: 0,1:11:06.57,1:11:08.58,Default,,0000,0000,0000,,just that. Dialogue: 0,1:11:08.58,1:11:11.49,Default,,0000,0000,0000,,It turns out, if you work through the math, Dialogue: 0,1:11:11.49,1:11:13.97,Default,,0000,0000,0000,,let's see. If P of S Dialogue: 0,1:11:13.97,1:11:19.82,Default,,0000,0000,0000,,is equal to the derivative of the Dialogue: 0,1:11:19.82,1:11:23.78,Default,,0000,0000,0000,,sigmoid, then if you just work through the math to compute the [inaudible] there. 
You've all Dialogue: 0,1:11:23.78,1:11:27.41,Default,,0000,0000,0000,,done this a lot of times. I won't bother to show Dialogue: 0,1:11:27.41,1:11:34.41,Default,,0000,0000,0000,,the details. You find that is equal to this. Dialogue: 0,1:11:46.63,1:11:49.58,Default,,0000,0000,0000,,Okay? That's just - you can work those out yourself. It's just math to Dialogue: 0,1:11:49.58,1:11:54.50,Default,,0000,0000,0000,,compute the derivative of this with respect to Dialogue: 0,1:11:54.50,1:11:59.31,Default,,0000,0000,0000,,W. So to summarize, given the training set, Dialogue: 0,1:11:59.31,1:12:02.10,Default,,0000,0000,0000,,here's my [inaudible] update rule. So you run the Dialogue: 0,1:12:02.10,1:12:06.31,Default,,0000,0000,0000,,[inaudible] to learn the parameters W. Dialogue: 0,1:12:06.31,1:12:08.38,Default,,0000,0000,0000,,After you're Dialogue: 0,1:12:08.38,1:12:09.72,Default,,0000,0000,0000,,done, you then Dialogue: 0,1:12:12.37,1:12:14.11,Default,,0000,0000,0000,,output SI equals Dialogue: 0,1:12:14.11,1:12:16.99,Default,,0000,0000,0000,,WXI, and you've separated your sources Dialogue: 0,1:12:16.99,1:12:18.17,Default,,0000,0000,0000,,of your Dialogue: 0,1:12:18.17,1:12:21.78,Default,,0000,0000,0000,,data back out into the original independent sources. Dialogue: 0,1:12:21.78,1:12:26.20,Default,,0000,0000,0000,,Hopefully up to only a permutation and a plus/minus Dialogue: 0,1:12:26.20,1:12:30.65,Default,,0000,0000,0000,,sign ambiguity. Dialogue: 0,1:12:30.65,1:12:34.56,Default,,0000,0000,0000,,Okay? So just switch back to the laptop, please? Dialogue: 0,1:12:34.56,1:12:41.56,Default,,0000,0000,0000,,So we'll just wrap up with a couple of examples of applications of ICA. Dialogue: 0,1:12:42.21,1:12:43.15,Default,,0000,0000,0000,,This is Dialogue: 0,1:12:43.15,1:12:46.72,Default,,0000,0000,0000,,actually a picture of our TA, Katie. Dialogue: 0,1:12:46.72,1:12:49.98,Default,,0000,0000,0000,,So one of the applications of ICA is Dialogue: 0,1:12:49.98,1:12:52.01,Default,,0000,0000,0000,,to process Dialogue: 0,1:12:52.01,1:12:56.53,Default,,0000,0000,0000,,various types of [inaudible] recording data, so [inaudible]. This Dialogue: 0,1:12:56.53,1:12:58.78,Default,,0000,0000,0000,,is a picture of Dialogue: 0,1:12:58.78,1:13:02.47,Default,,0000,0000,0000,,a EEG cap, in which there are a number of electrodes Dialogue: 0,1:13:02.47,1:13:04.53,Default,,0000,0000,0000,,you place Dialogue: 0,1:13:04.53,1:13:07.96,Default,,0000,0000,0000,,on the - in this case, on Katie's brain, on Katie's scalp. Dialogue: 0,1:13:07.96,1:13:13.37,Default,,0000,0000,0000,,So where each electrode measures changes in voltage over time Dialogue: 0,1:13:13.37,1:13:15.06,Default,,0000,0000,0000,,on the scalp. Dialogue: 0,1:13:15.06,1:13:18.41,Default,,0000,0000,0000,,On the right, it's a typical example of [inaudible] data Dialogue: 0,1:13:18.41,1:13:22.57,Default,,0000,0000,0000,,where each electrode measures - just changes in voltage over Dialogue: 0,1:13:22.57,1:13:23.89,Default,,0000,0000,0000,,time. So Dialogue: 0,1:13:23.89,1:13:27.95,Default,,0000,0000,0000,,the horizontal axis is time, and the vertical axis is voltage. So here's the same thing, Dialogue: 0,1:13:27.95,1:13:29.56,Default,,0000,0000,0000,,blown up a little bit. Dialogue: 0,1:13:29.56,1:13:32.68,Default,,0000,0000,0000,,You notice there are artifacts in this Dialogue: 0,1:13:32.68,1:13:36.34,Default,,0000,0000,0000,,data. 
Where the circle is, where the data is circled, all Dialogue: 0,1:13:36.34,1:13:37.67,Default,,0000,0000,0000,,the Dialogue: 0,1:13:37.67,1:13:41.18,Default,,0000,0000,0000,,electrodes seem to measure in these very synchronized recordings. Dialogue: 0,1:13:41.18,1:13:44.70,Default,,0000,0000,0000,,It turns out that we look at [inaudible] data as well as a number of other Dialogue: 0,1:13:44.70,1:13:47.02,Default,,0000,0000,0000,,types of data, there are Dialogue: 0,1:13:47.02,1:13:51.55,Default,,0000,0000,0000,,artifacts from heartbeats and from human eye blinks and so on. So the Dialogue: 0,1:13:51.55,1:13:55.03,Default,,0000,0000,0000,,cartoonist, if you imagine, placing the Dialogue: 0,1:13:55.03,1:13:56.73,Default,,0000,0000,0000,,electrodes, or Dialogue: 0,1:13:56.73,1:13:58.32,Default,,0000,0000,0000,,microphones, on my scalp, Dialogue: 0,1:13:58.32,1:14:01.84,Default,,0000,0000,0000,,then each microphone is recording some overlapping combination of all the Dialogue: 0,1:14:01.84,1:14:04.92,Default,,0000,0000,0000,,things happening in my brain or in my body. Dialogue: 0,1:14:04.92,1:14:08.38,Default,,0000,0000,0000,,My brain has a number of different processes going on. My body's [inaudible] Dialogue: 0,1:14:08.38,1:14:10.52,Default,,0000,0000,0000,,going on, and Dialogue: 0,1:14:10.52,1:14:13.43,Default,,0000,0000,0000,,each electrode measures a sum Dialogue: 0,1:14:13.43,1:14:15.68,Default,,0000,0000,0000,,of the different voices in my brain. Dialogue: 0,1:14:15.68,1:14:19.79,Default,,0000,0000,0000,,That didn't quite come out the way I wanted it to. Dialogue: 0,1:14:19.79,1:14:21.53,Default,,0000,0000,0000,,So we can just take this data Dialogue: 0,1:14:21.53,1:14:25.40,Default,,0000,0000,0000,,and run ICA on it and find out one of the independent components, what the Dialogue: 0,1:14:25.40,1:14:26.13,Default,,0000,0000,0000,,independent Dialogue: 0,1:14:26.13,1:14:30.33,Default,,0000,0000,0000,,process are going on in my brain. This is an example of running ICA. Dialogue: 0,1:14:30.33,1:14:33.24,Default,,0000,0000,0000,,So you find that a small number of components, like those shown up there, Dialogue: 0,1:14:33.24,1:14:37.74,Default,,0000,0000,0000,,they correspond to heartbeat, where the arrows - so those are very periodic Dialogue: 0,1:14:37.74,1:14:42.33,Default,,0000,0000,0000,,signals. They come on occasionally and correspond to [inaudible] components of Dialogue: 0,1:14:42.33,1:14:43.05,Default,,0000,0000,0000,,heartbeat. Dialogue: 0,1:14:43.05,1:14:47.46,Default,,0000,0000,0000,,You also find things like an eye blink component, corresponding to a Dialogue: 0,1:14:47.46,1:14:49.78,Default,,0000,0000,0000,,sigmoid generated when you blink your eyes. Dialogue: 0,1:14:49.78,1:14:53.82,Default,,0000,0000,0000,,By doing this, you can then subtract out the heartbeat and the eye blink Dialogue: 0,1:14:53.82,1:14:56.18,Default,,0000,0000,0000,,artifacts from the data, and now Dialogue: 0,1:14:56.18,1:15:01.22,Default,,0000,0000,0000,,you get much cleaner ICA data - get much cleaner EEG readings. You can Dialogue: 0,1:15:01.22,1:15:03.70,Default,,0000,0000,0000,,do further scientific studies. So this is a Dialogue: 0,1:15:03.70,1:15:06.18,Default,,0000,0000,0000,,pretty commonly used preprocessing step Dialogue: 0,1:15:06.18,1:15:09.70,Default,,0000,0000,0000,,that is a common application of ICA. Dialogue: 0,1:15:09.70,1:15:13.03,Default,,0000,0000,0000,,[Inaudible] example is Dialogue: 0,1:15:13.03,1:15:16.30,Default,,0000,0000,0000,,the application, again, from [inaudible]. 
Dialogue: 0,1:15:16.30,1:15:20.90,Default,,0000,0000,0000,,This is a result of running ICA on small natural image patches. Suppose I take
Dialogue: 0,1:15:20.90,1:15:22.05,Default,,0000,0000,0000,,natural images
Dialogue: 0,1:15:22.05,1:15:25.91,Default,,0000,0000,0000,,and run ICA on the data and ask what the independent components of the data are.
Dialogue: 0,1:15:25.91,1:15:30.04,Default,,0000,0000,0000,,It turns out that these are the bases you get. So this is a plot of the
Dialogue: 0,1:15:30.04,1:15:32.53,Default,,0000,0000,0000,,sources you get.
Dialogue: 0,1:15:32.53,1:15:36.27,Default,,0000,0000,0000,,This algorithm is saying that a natural image patch,
Dialogue: 0,1:15:36.27,1:15:37.75,Default,,0000,0000,0000,,shown
Dialogue: 0,1:15:37.75,1:15:39.79,Default,,0000,0000,0000,,on the left,
Dialogue: 0,1:15:39.79,1:15:45.33,Default,,0000,0000,0000,,is often expressed as a sum, or a linear combination, of
Dialogue: 0,1:15:45.33,1:15:46.68,Default,,0000,0000,0000,,independent sources of
Dialogue: 0,1:15:46.68,1:15:48.16,Default,,0000,0000,0000,,things that make up images.
Dialogue: 0,1:15:48.16,1:15:52.78,Default,,0000,0000,0000,,So this models natural images as generated by independent objects
Dialogue: 0,1:15:52.78,1:15:55.34,Default,,0000,0000,0000,,that generate different edges in the image.
Dialogue: 0,1:15:55.34,1:16:01.26,Default,,0000,0000,0000,,One of the fascinating things about this is that, in neuroscience, this has also been
Dialogue: 0,1:16:01.26,1:16:04.79,Default,,0000,0000,0000,,hypothesized as a model for how the human brain processes image
Dialogue: 0,1:16:04.79,1:16:06.00,Default,,0000,0000,0000,,data. It
Dialogue: 0,1:16:06.00,1:16:10.14,Default,,0000,0000,0000,,turns out this is similar, in many ways, to computations
Dialogue: 0,1:16:10.14,1:16:15.08,Default,,0000,0000,0000,,happening in early visual processing in the human brain,
Dialogue: 0,1:16:15.08,1:16:17.66,Default,,0000,0000,0000,,in the mammalian
Dialogue: 0,1:16:17.66,1:16:19.80,Default,,0000,0000,0000,,brain. It's just
Dialogue: 0,1:16:19.80,1:16:25.26,Default,,0000,0000,0000,,interesting to see that edges are the independent components of images.
Dialogue: 0,1:16:25.26,1:16:30.64,Default,,0000,0000,0000,,Are there any quick questions, because I'm running late? Quick questions before I close? Interviewee: [Inaudible] square matrix? Instructor (Andrew
Dialogue: 0,1:16:30.64,1:16:31.93,Default,,0000,0000,0000,,Ng): Oh,
Dialogue: 0,1:16:31.93,1:16:35.41,Default,,0000,0000,0000,,yes. For the algorithms I describe, I assume A is a square matrix.
Dialogue: 0,1:16:35.41,1:16:38.59,Default,,0000,0000,0000,,It turns out if you have more microphones than speakers, you can also apply very
Dialogue: 0,1:16:38.59,1:16:39.61,Default,,0000,0000,0000,,similar algorithms. If
Dialogue: 0,1:16:39.61,1:16:43.92,Default,,0000,0000,0000,,you have fewer microphones than speakers, that's sort of an open research problem. The odds
Dialogue: 0,1:16:43.92,1:16:48.46,Default,,0000,0000,0000,,are that if you have one male and one female speaker, but one microphone, you can
Dialogue: 0,1:16:48.46,1:16:51.82,Default,,0000,0000,0000,,sometimes sort of separate them because one is high, one is low. If you have two
Dialogue: 0,1:16:51.82,1:16:55.46,Default,,0000,0000,0000,,male speakers or two female speakers, then it's beyond the state of the art now to separate them
Dialogue: 0,1:16:55.46,1:16:57.05,Default,,0000,0000,0000,,with one
Dialogue: 0,1:16:57.05,1:17:00.50,Default,,0000,0000,0000,,microphone. It's a great research problem. Okay.
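Connecting this back to the natural image example above, here is a hypothetical visualization snippet (the function name, patch_size, and grid dimensions are all assumptions, not from the lecture): each row of a learned unmixing matrix W is reshaped back into a patch-sized filter, which for small natural image patches typically looks like a localized edge detector.

```python
import matplotlib.pyplot as plt

def show_ica_filters(W, patch_size, n_rows=8, n_cols=8):
    """Reshape each row of W into a patch_size x patch_size filter and plot a grid."""
    _, axes = plt.subplots(n_rows, n_cols, figsize=(8, 8))
    for k, ax in enumerate(axes.ravel()):
        if k < W.shape[0]:
            ax.imshow(W[k].reshape(patch_size, patch_size), cmap="gray")
        ax.axis("off")
    plt.tight_layout()
    plt.show()

# e.g. show_ica_filters(W, patch_size=12) if each x(i) is a flattened 12x12 patch
```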
Dialogue: 0,1:17:00.50,1:17:04.87,Default,,0000,0000,0000,,Sorry about running late again. Let's close now, and we'll
Dialogue: 0,1:17:04.87,1:17:05.75,Default,,0000,0000,0000,,continue with reinforcement learning.