This presentation is delivered by the Stanford Center for Professional Development.

Welcome back. What I want to do today is continue our discussion of principal components analysis, or PCA. In particular, there's one more application that I didn't get to in the last lecture, on latent semantic indexing, LSI. Then I want to spend just a little time talking about how to implement PCA, especially for very large problems. In particular, I'll spend just a little bit of time talking about singular value decomposition, or the SVD implementation of principal component analysis. Then in the second half of today's lecture, I want to talk about a different algorithm called independent component analysis, which is in some ways related to PCA, but in many other ways also manages to accomplish very different things than PCA. So with this lecture, we'll actually wrap up our discussion of unsupervised learning, and in the next lecture we'll start to talk about reinforcement learning algorithms.

Just to recap where we were with PCA, principal component analysis: I said that in PCA, we imagine that we have some very high dimensional data that perhaps lies approximately on some low dimensional subspace. So if you had a data set like this, you might find that that's the first principal component of the data, and that's the second principal component of this 2-D data.
To summarize the algorithm, we have three steps. The first step of PCA was to normalize the data to zero mean and unit variance: you subtract out the mean of your training examples, so the data now has zero mean, and then you normalize each of your features so that the variance of each feature is now one.

The next step was to compute the covariance matrix sigma of your zero-mean data, which you compute as the sum of the outer products of your training examples. Then you find the top K eigenvectors of sigma.

So last time we saw applications of this. For example, one of the applications was eigenfaces, where each of your training examples x(i) is an image. So if you have 100 by 100 images, if your pictures of faces are 100 pixels by 100 pixels, then each of your training examples x(i) will be a 10,000 dimensional vector, corresponding to the 10,000 grayscale intensity pixel values in each of your 100 by 100 images. So the eigenfaces application was one where the training examples comprised pictures of faces of people. Then we ran PCA, and to measure the distance between, say, a face here and a face there, we would project both of the face images onto the subspace and then measure the distance along the subspace.
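To make the three steps just recapped concrete, here is a minimal numpy sketch of the naive implementation (the function and variable names are my own illustration, not the notation from the board); it forms the full covariance matrix, so it is only practical when the dimension n is modest:

```python
import numpy as np

def pca_top_k(X, k):
    """Naive PCA as described: normalize, form the covariance matrix,
    then take its top-k eigenvectors.  X is (m, n), one example per row."""
    # Step 1: normalize to zero mean and unit variance per feature
    # (assumes no feature is constant, otherwise the division blows up).
    X = X - X.mean(axis=0)
    X = X / X.std(axis=0)
    # Step 2: covariance matrix, Sigma = (1/m) * sum_i x_i x_i^T, an n x n matrix.
    m = X.shape[0]
    Sigma = (X.T @ X) / m
    # Step 3: top-k eigenvectors of Sigma (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(Sigma)
    U = eigvecs[:, ::-1][:, :k]        # (n, k) principal directions
    return U, X @ U                    # directions, and the projected (reduced) data
```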
So in eigenfaces, you use something like 50 principal components.

The difficulty of working with problems like these is that in step two of the algorithm, we construct the covariance matrix sigma. The covariance matrix now becomes a 10,000 by 10,000 matrix; that's 100 million entries, which is huge. So we'd like to apply PCA to very, very high dimensional data with the goal of reducing its dimension, but step two of this algorithm requires constructing sigma, this extremely large matrix, which you can't do. I'll come back to this in a second.

It turns out one of the other frequently-used applications of PCA is actually to text data. So here's what I mean. Remember our vectorial representation of emails? This is from way back when we were talking about supervised learning algorithms for spam classification. You remember I said that given a piece of email, or a text document, you can represent it using a very high-dimensional vector by writing down a list of all the words in your dictionary. Somewhere you have the word learn, somewhere you have the word study, and so on. Depending on whether each word appears or does not appear in your text document, you put either a one or a zero there.
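As a tiny illustration of that representation (the four-word dictionary below is invented purely for the example; a real one would have tens of thousands of entries), a sketch:

```python
# Hypothetical toy dictionary; entry j of the vector corresponds to word j.
dictionary = ["aardvark", "learn", "study", "zygote"]

def to_binary_vector(document, dictionary):
    """0/1 vector: entry j is 1 if the j-th dictionary word occurs in the text."""
    words = set(document.lower().split())
    return [1 if w in words else 0 for w in dictionary]

print(to_binary_vector("I want to learn how to study better", dictionary))
# -> [0, 1, 1, 0]
```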
This is the representation we used in lecture five or lecture six for representing text documents, when we were building Naive Bayes classifiers for spam classification. So it turns out one of the common applications of PCA is actually to this text data representation as well. When you apply PCA to this sort of data, the resulting algorithm often just goes by a different name, latent semantic indexing.

For the sake of completeness, I should say that in LSI, you usually skip the preprocessing steps. For various reasons, in LSI you usually don't normalize the data to zero mean, and you usually don't normalize the variance of the features to one. These are relatively minor differences, it turns out, so it does something very similar to PCA. Normalizing the variance to one for text data would actually be a bad idea, because that would have the effect of dramatically scaling up the weight of rarely occurring words. So for example, the word aardvark hardly ever appears in any document, so to normalize the variance of that feature to one, you end up scaling up the weight of the word aardvark dramatically. I don't understand why [inaudible].

So let's see. In natural language processing, something that we want to do quite often is, given two documents x(i) and x(j), to measure how similar they are.
So for example, I may give you a document and ask you to find me more documents like this one. You're reading some article about some news event of today and want to find out what other news articles there are, so I give you a document and ask you to look at all the other documents you have in this large set of documents and find the ones similar to it.

So this is a typical text application. To measure the similarity between two documents x(i) and x(j), each of these documents is represented as one of these high-dimensional vectors. One common way to do this is to view each of your documents as some sort of very high-dimensional vector, a vector in a very high-dimensional space where the dimension of the vector is equal to the number of words in your dictionary. So maybe each of these documents lives in some 50,000-dimensional space, if you have 50,000 words in your dictionary. One measure of the similarity between these two documents that's often used is the angle between them. In particular, if the angle between these two vectors is small, then we'll consider the two documents to be similar, and if the angle between these two vectors is large, then we consider the documents to be dissimilar. So more formally, one commonly used heuristic in natural language processing is to say that the similarity between the two documents is the cosine of the angle theta between them.
For the relevant range of values, anyway, the cosine is a decreasing function of theta, so the smaller the angle between them, the larger the similarity. The cosine between two vectors is, of course, just the inner product x(i) transpose x(j) divided by the product of the norms of the two vectors; that's just the linear algebra, or standard geometry, definition of the cosine between two vectors.

Here's the intuition behind what LSI is doing. The hope, as usual, is that there may be some interesting axes of variation in the data, and there may be some other axes that are just noise. So by projecting all of your data onto a lower-dimensional subspace, by running PCA on your text data this way, the hope is that you can remove some of the noise in the data and get better measures of the similarity between pairs of documents. Let's delve a little deeper into an example to convey more intuition about what LSI is doing.

So look further at the definition of the cosine similarity measure. The numerator, the similarity between the two documents, was this inner product, which is the sum over k of x(i)k times x(j)k. So this inner product would be equal to zero if the two documents have no words in common.
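Here is a minimal sketch of that similarity measure (the tiny binary vectors are invented just for illustration), including the case where two documents share no words:

```python
import numpy as np

def cosine_similarity(xi, xj):
    """sim(xi, xj) = (xi . xj) / (||xi|| * ||xj||), the cosine of the angle
    between two document vectors (assumes neither vector is all zeros)."""
    return float(xi @ xj) / (np.linalg.norm(xi) * np.linalg.norm(xj))

doc_i = np.array([1, 0, 1, 0])   # contains dictionary words 0 and 2
doc_j = np.array([0, 1, 0, 1])   # contains dictionary words 1 and 3
print(cosine_similarity(doc_i, doc_j))   # 0.0, since no words are shared
```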
So this is really the sum over k of an indicator of whether documents i and j both contain the word k, because x(i)k indicates whether document i contains word k, and x(j)k indicates whether document j contains word k. The product is one only if the word k appears in both documents. Therefore, the similarity between these two documents would be zero if the two documents have no words in common.

For example, suppose your document x(i) has the word study and your document x(j) has the word learn. Then these two documents would be considered entirely dissimilar. Say you read a news article about effective study strategies and you ask, what other documents are similar to this? If there are a bunch of other documents about good methods to learn, but they share no words with this one, then the similarity under this measure is zero.

So here's a cartoon of what we hope PCA will do. Suppose that on the horizontal axis I plot the word learn, and on the vertical axis I plot the word study, where the values take on either the value zero or one. So if a document contains the word learn but not study, then I'll plot that document there, and if a document contains neither the word study nor learn, then I'll plot it at zero, zero.
So here's the cartoon of what PCA is doing: we identify a lower-dimensional subspace; that would be some eigenvector we get out of PCA. Now, suppose we have a document about learning and a document about studying. The document about learning points to the right; the document about studying points up. So the inner product, or the cosine of the angle, between these two documents, excuse me, the inner product between these two documents, will be zero. So these two documents are entirely unrelated, which is not what we want. Documents about studying and documents about learning are related. But if we take these two documents and project them onto this subspace, then these two documents now become much closer together, and the algorithm will recognize that when you take the inner product between these two documents, you actually end up with a positive number. So LSI enables our algorithm to recognize that these two documents have some positive similarity between them.

So that's just intuition about what PCA may be doing to text data. The same thing goes for other examples beyond the words study and learn. Say you find a document about politicians and a document with the names of prominent politicians; that will also bring the documents closer together. For just about any related topics, they end up being mapped to points closer together in this lower-dimensional space.
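Here is a minimal sketch of that idea under an illustrative setup of my own (documents as rows of a matrix X, numpy's SVD used to get the subspace): project two document vectors onto the top-k subspace and measure the cosine similarity there rather than in the raw word space.

```python
import numpy as np

def lsi_similarity(X, xi, xj, k):
    """Project documents xi, xj onto the top-k subspace found from the
    document matrix X (one document per row, no mean/variance normalization,
    as in LSI), then return their cosine similarity in that subspace.
    Requires k <= min(X.shape); assumes neither projection is the zero vector."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # (n, k): basis for the top-k subspace
    zi, zj = xi @ V_k, xj @ V_k         # k-dimensional representations
    return float(zi @ zj) / (np.linalg.norm(zi) * np.linalg.norm(zj))
```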
Any questions about this?

Student: [Inaudible].

Which ones? This one? No, the line. Oh, this one. Oh, yes. Thank you. [Inaudible].

So let's talk about how to actually implement this now. Okay, how many of you know what an SVD, or singular value decomposition, is? Wow, that's a lot of you. That's a lot more than I thought. Curious, did you guys learn it as undergrads or as graduate students?

All right, let me talk about it anyway. I wasn't expecting so many of you to know what SVD is, but I want to get this on tape, just so everyone else can learn about this, too. So I'll say a little bit about how to implement PCA. The problem I was alluding to just now was that when you have these very high-dimensional vectors, sigma is a large matrix. In particular, for our text example, if the vectors x(i) are 50,000 dimensional, then the covariance matrix will be 50,000 by 50,000, which is much too big to represent explicitly.

I guess many of you already know this, but I'll just say it anyway. It turns out there's another way to implement PCA, which is this: if A is any N by N matrix, then one of the most remarkable results of linear algebra is that the matrix A can be decomposed into a singular value decomposition.
What that means is that the matrix A, which is N by N, can always be decomposed into a product of three matrices, A = U D V transpose: U is N by N, D is a square matrix, which is N by N, and V is also N by N. D is going to be diagonal, with zeros on the off-diagonals, and the values sigma i on its diagonal are called the singular values of the matrix A.

Almost all of you said you learned this as a graduate student rather than as an undergrad, and it turns out that when you take a class in undergraduate linear algebra, you usually learn a bunch of decompositions. So you usually learn about the QR decomposition, maybe the LU factorization of matrices. Most undergrad courses don't get to talk about singular value decompositions, but in almost everything I do in machine learning, you actually find that you end up using SVDs much more than any of the decompositions you learned in a typical undergrad linear algebra class. Personally, I've used an SVD dozens of times in the last year, but for LU and QR decompositions, I think I used the QR decomposition once and an LU decomposition in the last year.

So let's see, I'll say a bit more about this. I'm going to draw the picture, I guess. For example, if A is an N by N matrix, it can be decomposed into another matrix, U, which is also N by N.
It's the same size. D is N by N, another square matrix, and V transpose is also N by N. Furthermore, in a singular value decomposition, the columns of the matrix U will be the eigenvectors of A A transpose, and the columns of V will be the eigenvectors of A transpose A.

To compute it, you just use the SVD command in Matlab or Octave. Today, the state of the art in numerical linear algebra is that SVDs, singular value decompositions of matrices, can be computed extremely [inaudible]. We've used packages like Matlab or Octave to compute, say, the eigenvectors of a matrix; SVD routines are even more numerically stable than eigenvector routines for finding the eigenvectors of a matrix. So you can safely use a routine like this, similar to the way you use a square root command without thinking about how it's computed: you can compute the square root of something and just not worry about it, because you know the computer will give you the right answer. For most reasonably-sized matrices, even up to thousands by thousands, I think of the SVD routine like a square root function: if you call it, it'll give you back the right answer, and you don't have to worry too much about it.
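As a quick numerical check of those two facts (this is just an illustrative numpy snippet, not anything from the lecture): numpy's svd returns the three factors, and the columns of U do behave as eigenvectors of A A transpose.

```python
import numpy as np

A = np.random.randn(5, 5)                      # any square matrix, for illustration
U, d, Vt = np.linalg.svd(A)                    # A = U @ diag(d) @ Vt
print(np.allclose(A, U @ np.diag(d) @ Vt))     # True: the decomposition is exact

# A A^T = U diag(d**2) U^T, so the first column of U is an eigenvector of A A^T
# with eigenvalue d[0]**2:
w = (A @ A.T) @ U[:, 0]
print(np.allclose(w, (d[0] ** 2) * U[:, 0]))   # True
```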
If you have extremely large matrices, like a million by a million, I might start to worry a bit, but for a few thousand by a few thousand matrices, this is implemented very well today.

Student: [Inaudible].

What's the complexity of SVD? That's a good question. I actually don't know; I want to guess it's roughly on the order of N cubed, but I'm not sure. [Inaudible] algorithms, so I don't know what's known about the convergence of these algorithms.

The example I drew out was for a fat matrix, a matrix that is wider than it is tall. In the same way, you can also call SVD on a tall matrix, one that's taller than it is wide, and it would decompose it into a product of three matrices like that, okay?

The nice thing about this is that we can use it to compute eigenvectors and PCA very efficiently. In particular, the covariance matrix sigma was this: it was the sum of all the outer products. So if you go back and recall the definition of the design matrix, which I think I described in lecture two when we derived the closed-form solution to least squares, the design matrix was this matrix where I took my training examples and stacked them in rows; we call this the design matrix X. So if you construct the design matrix, then the covariance matrix sigma can be written as just X transpose X.
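If you want to convince yourself of that identity numerically, here is a tiny illustrative check in numpy (the random matrix is just a stand-in for a real design matrix):

```python
import numpy as np

X = np.random.randn(100, 7)                    # toy design matrix, examples as rows

# Sum over i of the outer products x^(i) x^(i)^T, computed explicitly ...
outer_sum = sum(np.outer(x, x) for x in X)
# ... equals X^T X, so sigma can be formed (or worked with) via X directly.
print(np.allclose(outer_sum, X.T @ X))         # True
```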
Okay? I hope you see why X transpose X gives you the sum of outer products of the vectors. If you aren't seeing this right now, just go home and convince yourself that it's true.

To get the top K eigenvectors of sigma, you would take sigma and decompose it using the - excuse me. You would take the matrix X and compute its SVD, so you get U D V transpose. Then the top K columns of U are the top K eigenvectors of X transpose X, which are therefore the top K eigenvectors of your covariance matrix sigma. So in our example, if you have 50,000 words in your dictionary, the design matrix would be in R m by 50,000, say 100 by 50,000 if you have 100 examples. So X would be quite tractable to represent, and to compute the SVD of, whereas the matrix sigma would be much harder to represent; it's 50,000 by 50,000. So this gives you an efficient way to implement PCA.

The reason I want to talk about this is that in previous years, I didn't talk about it [inaudible].
In the class projects, I found a number of students trying to implement this on huge problems and [inaudible], so this is a much better way to implement PCA if you have extremely high dimensional data. If you have low dimensional data, if you have 50 or 100 dimensional data, then computing sigma is no problem; you can do it the old way. But otherwise, use the SVD to implement this.

Questions about this? The last thing I want to say is that in practice, when you want to implement this, I want to add a note of caution. It turns out that for many applications of - let's see. When you apply SVD to these wide - yeah?

Student: Just a quick question. Is it the top K columns of U or of V? Because X transpose X is V [inaudible], right?

Let's see. Oh, yes. I think you're right. I think you're right. Let's see, is it the top K columns of U or the top K of V? Yeah, I think you're right. Is that right? Something bothers me about that, but I think you're right. So then X transpose X should be V D D V transpose: X is U D V transpose, so X transpose X would be V D transpose U transpose U D V transpose, which is V D squared V transpose. If anyone thinks about this and has another opinion, let me know, but I think you're right. I'll make sure I get the details and let you know. Everyone's still looking at that. Tom, can you figure out the right answer and let me know? That sounds right. Okay, cool.

Okay. So just one last note, a note of caution.
It turns out that in this example, I was implementing SVD with a wide matrix; the matrix X was m by n with m smaller than n. It turns out that when you find the SVD decomposition of this - let's see. Yeah, I think you're definitely right. So it turns out that when we find the SVD of this, the right-most portion of this block of the matrix would be all zeros. Also, when you compute the matrix D, a large part of this matrix would be zeros. Then you have the matrix V transpose. So it depends on what convention you use. For example, I think Matlab actually uses a convention of just cutting off the zero elements: Matlab uses the convention of chopping off the right-most half of the U matrix and chopping off the bottom portion of the D matrix. I'm not sure if this even depends on the version of Matlab, but when you call SVD in Matlab or some other numerical algebra packages, there are slightly different conventions for how to define your SVD when the matrix is wider than it is tall. So just watch out for this, and make sure you map whatever convention your numerical algebra library uses back to the original computations. Whether you're using Matlab [inaudible] or writing C code, there are many scientific libraries that can compute SVDs for you, but they differ slightly in their conventions for the dimensions of these matrices. So just make sure you figure this out for the package that you use.
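To make the caution concrete, here is an illustrative numpy sketch of SVD-based PCA on a wide design matrix (numpy's analogue of these conventions is the full_matrices flag; the random matrix is only a stand-in for real data, and for PCA proper you would normalize X first as in step one):

```python
import numpy as np

m, n, k = 100, 50_000, 50                      # e.g. 100 documents, 50,000 words
X = np.random.rand(m, n)                       # stand-in for a real design matrix

# The "economy" SVD drops the parts that would be all zeros in the full SVD.
# Conventions differ between packages, so always check the shapes you get back.
U, d, Vt = np.linalg.svd(X, full_matrices=False)
print(U.shape, d.shape, Vt.shape)              # (100, 100) (100,) (100, 50000)

# Since X^T X = V diag(d^2) V^T, the top-k principal directions are the first
# k right singular vectors (rows of Vt), per the discussion above; the n x n
# covariance matrix is never formed.
V_k = Vt[:k].T                                 # (n, k)
Z = X @ V_k                                    # k-dimensional representation of each row
```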
Finally, I just want to take the unsupervised learning algorithms we've talked about and put them in a little bit of broader context. This is partly in response to the questions I've gotten from students in office hours and elsewhere about when to use each of these algorithms. So I'm going to draw a two by two matrix; this is a little cartoon that I find useful.

One of the algorithms we talked about earlier, right before this, was factor analysis. I hope you remember that picture I drew, where I had a bunch of points z on a line and then I drew these ellipses; I hope you remember that picture. That was the factor analysis model, which models the density of x, right? And then there was PCA, just now. So the difference between factor analysis and PCA, the way I think about it, is that factor analysis is a density estimation algorithm: it tries to model the density of the training examples x. Whereas PCA is not a probabilistic algorithm; in particular, it does not endow your training examples with any probabilistic distribution, and it directly goes to find the subspace. So in terms of when to use factor analysis and when to use PCA: if your goal is to reduce the dimension of the data, if your goal is to find the subspace that the data lies on, then PCA directly tries to find that subspace.
I think I would tend to use PCA. Factor analysis sort of assumes the data lies on a subspace; let me write "subspace" here. So both of these algorithms assume the data maybe lies close to, or on, some low dimensional subspace. But fundamentally, factor analysis, I think of as a density estimation algorithm: if I have some very high dimensional distribution and I want to model P of X, then factor analysis is the algorithm I'm more inclined to use. So even though you could in theory, I would tend to avoid trying to use factor analysis to identify the subspace the data set lies on. Conversely, if you want to do anomaly detection, if you want to model P of X so that when an example has very low probability you can flag it as an anomaly, then I would tend to use factor analysis to do that density estimation. So factor analysis and PCA are both algorithms that assume your data lies on a subspace.

The other class of algorithms we talked about was algorithms that assume the data lies in clumps, or that the data has a few coherent groups. So let me just fill in the rest of this picture. If you think your data lies in clumps or groups, and your goal is density estimation, then I would tend to use a mixture of Gaussians algorithm. But again, you don't necessarily want to endow your data with any probabilistic semantics, so if you just want to find the clumps or the groups, then I'd be inclined to use a k-means algorithm.
So Dialogue: 0,0:37:29.77,0:37:33.29,Default,,0000,0000,0000,,haven't seen anyone else draw this picture before, but I tend to organize these things Dialogue: 0,0:37:33.29,0:37:34.99,Default,,0000,0000,0000,,this way in my brain. Dialogue: 0,0:37:34.99,0:37:36.42,Default,,0000,0000,0000,,Hopefully this helps guide Dialogue: 0,0:37:36.42,0:37:40.46,Default,,0000,0000,0000,,when you might use each of these algorithms as well, depending Dialogue: 0,0:37:40.46,0:37:44.72,Default,,0000,0000,0000,,on whether you believe the data might lie in the subspace or whether it might bind in Dialogue: 0,0:37:44.72,0:37:47.90,Default,,0000,0000,0000,,clumps or groups. Dialogue: 0,0:37:50.72,0:37:53.79,Default,,0000,0000,0000,,All right. Dialogue: 0,0:37:53.79,0:38:00.79,Default,,0000,0000,0000,,That wraps up the discussion on Dialogue: 0,0:38:02.63,0:38:08.72,Default,,0000,0000,0000,,PCA. What I want to do next is talk about Dialogue: 0,0:38:08.72,0:38:15.33,Default,,0000,0000,0000,,independent component analysis, or ICA. Yeah. Interviewee: I have Dialogue: 0,0:38:15.33,0:38:17.58,Default,,0000,0000,0000,,a Dialogue: 0,0:38:17.58,0:38:21.59,Default,,0000,0000,0000,,question about the upper right [inaudible]. So once you have all of the eigen vectors, Dialogue: 0,0:38:21.59,0:38:26.18,Default,,0000,0000,0000,,[inaudible] how similar is feature I to Dialogue: 0,0:38:26.18,0:38:29.96,Default,,0000,0000,0000,,feature J. You pick some eigen vector, and you take some dot products between the Dialogue: 0,0:38:29.96,0:38:31.57,Default,,0000,0000,0000,,feature I and Dialogue: 0,0:38:31.57,0:38:35.02,Default,,0000,0000,0000,,feature J and the eigen vector. But Dialogue: 0,0:38:35.02,0:38:39.63,Default,,0000,0000,0000,,there's a lot of eigen vectors to choose from. Instructor Dialogue: 0,0:38:39.63,0:38:42.05,Default,,0000,0000,0000,,(Andrew Ng):Right. So Justin's question was Dialogue: 0,0:38:42.05,0:38:45.88,Default,,0000,0000,0000,,having found my eigen vectors, how do I choose what eigen vector to use to Dialogue: 0,0:38:45.88,0:38:47.54,Default,,0000,0000,0000,,measure distance. I'm Dialogue: 0,0:38:47.54,0:38:48.95,Default,,0000,0000,0000,,going to Dialogue: 0,0:38:48.95,0:38:51.02,Default,,0000,0000,0000,,start Dialogue: 0,0:38:51.02,0:38:53.29,Default,,0000,0000,0000,,this up. Dialogue: 0,0:38:53.29,0:38:57.30,Default,,0000,0000,0000,,So the Dialogue: 0,0:38:57.30,0:38:58.32,Default,,0000,0000,0000,,answer is really Dialogue: 0,0:38:58.32,0:39:02.03,Default,,0000,0000,0000,,- in this cartoon, I would avoid thinking about Dialogue: 0,0:39:02.03,0:39:03.92,Default,,0000,0000,0000,,eigen vectors one other time. Dialogue: 0,0:39:03.92,0:39:08.05,Default,,0000,0000,0000,,A better way to view this cartoon is that this is actually - Dialogue: 0,0:39:08.05,0:39:11.60,Default,,0000,0000,0000,,if I decide to choose 100 eigen vectors, this is really 100 D Dialogue: 0,0:39:11.60,0:39:18.60,Default,,0000,0000,0000,,subspace. Dialogue: 0,0:39:19.26,0:39:20.34,Default,,0000,0000,0000,,So Dialogue: 0,0:39:20.34,0:39:24.88,Default,,0000,0000,0000,,I'm not actually projecting my data onto one eigen vector. Dialogue: 0,0:39:24.88,0:39:29.77,Default,,0000,0000,0000,,This arrow, this cartoon, this denotes the 100-dimensional Dialogue: 0,0:39:29.77,0:39:32.09,Default,,0000,0000,0000,,subspace [inaudible] by all my eigen vectors. 
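To make that answer concrete: as the next lines spell out, the distance is measured between the projections onto the span of all k chosen eigenvectors, not onto any single eigenvector. A minimal sketch, assuming U is a hypothetical n-by-k matrix whose (orthonormal) columns are the top k eigenvectors of the covariance matrix:

```python
import numpy as np

def project(x, U):
    """Coordinates of x in the subspace spanned by the columns of U (assumed orthonormal)."""
    return U.T @ x

def subspace_distance(x1, x2, U):
    """Distance between two examples measured after projecting both onto span(U)."""
    return np.linalg.norm(project(x1, U) - project(x2, U))

# Hypothetical usage: U built from the top-k eigenvectors of Sigma, x1 and x2 two face images.
# d = subspace_distance(x1, x2, U)
```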
Dialogue: 0,0:39:32.09,0:39:36.12,Default,,0000,0000,0000,,So what I actually do is project my data onto Dialogue: 0,0:39:36.12,0:39:40.15,Default,,0000,0000,0000,,the span, the linear span of eigen vectors. Then I Dialogue: 0,0:39:40.15,0:39:41.44,Default,,0000,0000,0000,,measure distance or take Dialogue: 0,0:39:41.44,0:39:43.49,Default,,0000,0000,0000,,inner products of the distance between Dialogue: 0,0:39:43.49,0:39:49.83,Default,,0000,0000,0000,,the projections of the two points of the eigen vectors. Okay. Dialogue: 0,0:39:49.83,0:39:54.14,Default,,0000,0000,0000,,So let's talk about ICA, Dialogue: 0,0:39:54.14,0:39:58.75,Default,,0000,0000,0000,,independent component analysis. Dialogue: 0,0:39:58.75,0:40:00.75,Default,,0000,0000,0000,,So whereas PCA Dialogue: 0,0:40:00.75,0:40:02.60,Default,,0000,0000,0000,,was an algorithm for finding Dialogue: 0,0:40:02.60,0:40:06.70,Default,,0000,0000,0000,,what I call the main axis of variations of data, Dialogue: 0,0:40:06.70,0:40:11.20,Default,,0000,0000,0000,,in ICA, we're going to try find the independent of components of variations in the Dialogue: 0,0:40:11.20,0:40:12.04,Default,,0000,0000,0000,,data. Dialogue: 0,0:40:12.04,0:40:14.94,Default,,0000,0000,0000,,So switch it to the laptop there, please. Dialogue: 0,0:40:14.94,0:40:16.12,Default,,0000,0000,0000,,We'll just Dialogue: 0,0:40:16.12,0:40:21.90,Default,,0000,0000,0000,,take a second to motivate that. I'm Dialogue: 0,0:40:21.90,0:40:26.77,Default,,0000,0000,0000,,going to do so by Dialogue: 0,0:40:26.77,0:40:32.43,Default,,0000,0000,0000,,- although if you put on the - okay. This is Dialogue: 0,0:40:32.43,0:40:36.62,Default,,0000,0000,0000,,actually a slide that I showed in Dialogue: 0,0:40:36.62,0:40:39.78,Default,,0000,0000,0000,,lecture one of the cocktail party problem. Dialogue: 0,0:40:39.78,0:40:42.62,Default,,0000,0000,0000,,Suppose you have two speakers at a cocktail party, Dialogue: 0,0:40:42.62,0:40:45.02,Default,,0000,0000,0000,,and you have two microphones in the Dialogue: 0,0:40:45.02,0:40:46.12,Default,,0000,0000,0000,,room, overlapping Dialogue: 0,0:40:46.12,0:40:47.96,Default,,0000,0000,0000,,sets of two conversations. Dialogue: 0,0:40:47.96,0:40:51.64,Default,,0000,0000,0000,,Then can you separate out the two original speaker sources? Dialogue: 0,0:40:51.64,0:40:55.65,Default,,0000,0000,0000,,So I actually played this audio as well in the very first lecture, which is Dialogue: 0,0:40:55.65,0:40:59.07,Default,,0000,0000,0000,,suppose microphone one records this. Dialogue: 0,0:40:59.07,0:41:05.49,Default,,0000,0000,0000,,[Recording] Dialogue: 0,0:41:13.23,0:41:16.65,Default,,0000,0000,0000,,So the question is, these are really two speakers, Dialogue: 0,0:41:16.65,0:41:20.81,Default,,0000,0000,0000,,speaking independently of each other. So each speaker is outputting Dialogue: 0,0:41:20.81,0:41:24.70,Default,,0000,0000,0000,,a series of sound signals as independent of the other conversation Dialogue: 0,0:41:24.70,0:41:26.12,Default,,0000,0000,0000,,going on in the room. 
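A toy version of that setup can be simulated directly: two independent sources and two microphones, each microphone hearing a different linear mix. The signals and the mixing matrix below are arbitrary stand-ins, a sketch of the data-generating story rather than of any particular recording.

```python
import numpy as np

t = np.linspace(0, 1, 8000)

# Two independent "speakers": a tone and a sawtooth-like signal (stand-ins for voices).
s1 = np.sin(2 * np.pi * 440 * t)
s2 = 2 * (t * 5 % 1) - 1
S = np.column_stack([s1, s2])     # each row: what both speakers emit at one instant

A = np.array([[0.8, 0.3],         # hypothetical mixing matrix: how loudly each
              [0.4, 0.7]])        # speaker reaches each microphone
X = S @ A.T                       # each row: what the two microphones record
```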
Dialogue: 0,0:41:26.12,0:41:27.84,Default,,0000,0000,0000,,So Dialogue: 0,0:41:27.84,0:41:31.88,Default,,0000,0000,0000,,this being an supervised learning problem, the question is, can we take these two microphone Dialogue: 0,0:41:31.88,0:41:33.90,Default,,0000,0000,0000,,recordings and feed it to Dialogue: 0,0:41:33.90,0:41:37.33,Default,,0000,0000,0000,,an algorithm to find the independent components in Dialogue: 0,0:41:37.33,0:41:38.45,Default,,0000,0000,0000,,this Dialogue: 0,0:41:38.45,0:41:40.59,Default,,0000,0000,0000,,data? This is the output Dialogue: 0,0:41:42.44,0:41:48.62,Default,,0000,0000,0000,,when we do so. Dialogue: 0,0:41:48.62,0:41:55.05,Default,,0000,0000,0000,,[Recording] This is the other one. [Recording] Dialogue: 0,0:41:55.94,0:41:59.86,Default,,0000,0000,0000,,Just for fun. [Inaudible]. These are audio clips I got Dialogue: 0,0:42:01.41,0:42:04.41,Default,,0000,0000,0000,,from [inaudible]. Just for fun, let me play the other ones as well. This Dialogue: 0,0:42:04.41,0:42:11.41,Default,,0000,0000,0000,,is overlapping microphone one. [Recording] Dialogue: 0,0:42:13.44,0:42:20.44,Default,,0000,0000,0000,,Here's microphone two. [Recording] Dialogue: 0,0:42:21.74,0:42:24.25,Default,,0000,0000,0000,,So given this as input, here's output one. Dialogue: 0,0:42:24.25,0:42:27.46,Default,,0000,0000,0000,, Dialogue: 0,0:42:27.46,0:42:30.91,Default,,0000,0000,0000,,[Recording] Dialogue: 0,0:42:30.91,0:42:33.64,Default,,0000,0000,0000,,It's not perfect, but it's largely cleaned up the music. Dialogue: 0,0:42:33.64,0:42:40.64,Default,,0000,0000,0000,,Here's number two. [Recording] Okay. Switch back to Dialogue: 0,0:42:42.98,0:42:44.90,Default,,0000,0000,0000,,[inaudible], please. Dialogue: 0,0:42:44.90,0:42:46.98,Default,,0000,0000,0000,,So Dialogue: 0,0:42:46.98,0:42:53.98,Default,,0000,0000,0000,,what I want to do now is describe an algorithm that does that. Dialogue: 0,0:42:54.83,0:42:58.30,Default,,0000,0000,0000,,Before Dialogue: 0,0:42:58.30,0:43:03.24,Default,,0000,0000,0000,,I actually jump into the algorithm, I want to say two minutes Dialogue: 0,0:43:03.24,0:43:03.80,Default,,0000,0000,0000,,of Dialogue: 0,0:43:03.80,0:43:10.80,Default,,0000,0000,0000,,CDF, so cumulative distribution functions. I know most Dialogue: 0,0:43:18.67,0:43:21.09,Default,,0000,0000,0000,,of you know what these are, but I'm Dialogue: 0,0:43:21.09,0:43:23.60,Default,,0000,0000,0000,,just going to remind you of what they are. Dialogue: 0,0:43:24.58,0:43:30.35,Default,,0000,0000,0000,,Let's say you have a one-D random variable S. So suppose you have Dialogue: 0,0:43:30.35,0:43:35.90,Default,,0000,0000,0000,,a random variable, S, Dialogue: 0,0:43:35.90,0:43:41.47,Default,,0000,0000,0000,,and suppose it has a property density function [inaudible]. Dialogue: 0,0:43:41.47,0:43:43.41,Default,,0000,0000,0000,,Then Dialogue: 0,0:43:43.41,0:43:45.86,Default,,0000,0000,0000,,the CDF Dialogue: 0,0:43:45.86,0:43:50.14,Default,,0000,0000,0000,,is defined as a function, or rather as F, Dialogue: 0,0:43:50.14,0:43:53.73,Default,,0000,0000,0000,,which is the probability that the random variable, Dialogue: 0,0:43:53.73,0:43:55.92,Default,,0000,0000,0000,,S, is less than the value Dialogue: 0,0:43:55.92,0:43:58.54,Default,,0000,0000,0000,,given by that lower-case Dialogue: 0,0:43:58.54,0:43:59.87,Default,,0000,0000,0000,,value, Dialogue: 0,0:43:59.87,0:44:01.93,Default,,0000,0000,0000,,S. 
Dialogue: 0,0:44:01.93,0:44:03.27,Default,,0000,0000,0000,,For example, Dialogue: 0,0:44:03.27,0:44:06.10,Default,,0000,0000,0000,,if this is your [inaudible] density, Dialogue: 0,0:44:06.10,0:44:10.23,Default,,0000,0000,0000,,than the density of the [inaudible] usually Dialogue: 0,0:44:10.23,0:44:14.61,Default,,0000,0000,0000,,to note it lower-case phi. That's roughly a bell-shaped density. Then Dialogue: 0,0:44:14.61,0:44:20.32,Default,,0000,0000,0000,,the CDF or the Gaussian Dialogue: 0,0:44:20.32,0:44:22.27,Default,,0000,0000,0000,,will look something like this. Dialogue: 0,0:44:22.27,0:44:24.96,Default,,0000,0000,0000,,There'll be a capital function Dialogue: 0,0:44:24.96,0:44:27.34,Default,,0000,0000,0000,,pi. So if I pick a value Dialogue: 0,0:44:27.34,0:44:29.08,Default,,0000,0000,0000,,S like that, then the Dialogue: 0,0:44:29.08,0:44:30.45,Default,,0000,0000,0000,,height of this - Dialogue: 0,0:44:30.45,0:44:32.57,Default,,0000,0000,0000,,this is [inaudible] probability that Dialogue: 0,0:44:32.57,0:44:35.41,Default,,0000,0000,0000,,my Gaussian random variable is less than Dialogue: 0,0:44:35.41,0:44:37.42,Default,,0000,0000,0000,,that value there. In other words, Dialogue: 0,0:44:37.42,0:44:40.55,Default,,0000,0000,0000,,the height of the function at that point is Dialogue: 0,0:44:40.55,0:44:44.23,Default,,0000,0000,0000,,less Dialogue: 0,0:44:44.23,0:44:46.27,Default,,0000,0000,0000,,than the area of the Gaussian density, Dialogue: 0,0:44:46.27,0:44:48.12,Default,,0000,0000,0000,,up to the point S. Dialogue: 0,0:44:48.12,0:44:48.89,Default,,0000,0000,0000,,As you Dialogue: 0,0:44:48.89,0:44:52.69,Default,,0000,0000,0000,,move further and further to the right, this function will approach one, as Dialogue: 0,0:44:52.69,0:44:59.69,Default,,0000,0000,0000,,you integrate more and more of this area of the Gaussian. So another way to write Dialogue: 0,0:45:04.84,0:45:11.84,Default,,0000,0000,0000,,F Dialogue: 0,0:45:21.11,0:45:28.11,Default,,0000,0000,0000,,of Dialogue: 0,0:45:30.66,0:45:34.62,Default,,0000,0000,0000,,S is the integral, the minus infinity Dialogue: 0,0:45:34.62,0:45:35.73,Default,,0000,0000,0000,,to S of Dialogue: 0,0:45:35.73,0:45:41.74,Default,,0000,0000,0000,,the density, DT. Dialogue: 0,0:45:41.74,0:45:43.86,Default,,0000,0000,0000,,So something that'll come later is Dialogue: 0,0:45:43.86,0:45:48.32,Default,,0000,0000,0000,,suppose I have a random variable, S, and I want to model the distribution of the random Dialogue: 0,0:45:48.32,0:45:49.44,Default,,0000,0000,0000,,variable, S. Dialogue: 0,0:45:49.44,0:45:53.45,Default,,0000,0000,0000,,So one thing I could do is I can specify Dialogue: 0,0:45:53.45,0:45:56.55,Default,,0000,0000,0000,,what I think the density Dialogue: 0,0:45:56.55,0:45:58.05,Default,,0000,0000,0000,,is. Dialogue: 0,0:45:58.05,0:46:03.20,Default,,0000,0000,0000,,Or I can specify Dialogue: 0,0:46:03.20,0:46:04.45,Default,,0000,0000,0000,,what the Dialogue: 0,0:46:04.45,0:46:08.10,Default,,0000,0000,0000,,CDF Dialogue: 0,0:46:08.10,0:46:11.36,Default,,0000,0000,0000,,is. These are related by this equation. F is the integral of P of S. You Dialogue: 0,0:46:11.36,0:46:13.99,Default,,0000,0000,0000,,can also Dialogue: 0,0:46:13.99,0:46:15.72,Default,,0000,0000,0000,,recover the density Dialogue: 0,0:46:15.72,0:46:20.47,Default,,0000,0000,0000,,by taking the CDF and taking the derivative. 
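A quick numerical check of that relationship for the Gaussian case just drawn, assuming scipy is available: differentiating the standard normal CDF recovers the bell-shaped density, and integrating the density up to a point recovers the CDF value there.

```python
import numpy as np
from scipy.stats import norm

s = np.linspace(-4, 4, 2001)
ds = s[1] - s[0]

Phi = norm.cdf(s)                  # the CDF: area under the density up to s
phi = norm.pdf(s)                  # the density

# The derivative of the CDF gives back the density (finite-difference check):
dPhi = np.gradient(Phi, ds)
print(np.max(np.abs(dPhi - phi)))  # close to zero

# And the CDF is the running integral of the density:
print(np.trapz(phi[s <= 1.0], s[s <= 1.0]), norm.cdf(1.0))   # both roughly 0.84
```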
So F prime, take the derivative Dialogue: 0,0:46:20.47,0:46:21.73,Default,,0000,0000,0000,,of the CDF, Dialogue: 0,0:46:21.73,0:46:23.44,Default,,0000,0000,0000,,you get back the Dialogue: 0,0:46:23.44,0:46:24.71,Default,,0000,0000,0000,,density. So this has come up Dialogue: 0,0:46:24.71,0:46:28.18,Default,,0000,0000,0000,,in the middle of when I derive ICA, which is that Dialogue: 0,0:46:28.18,0:46:32.17,Default,,0000,0000,0000,,there'll be a step where they need to assume a distribution for random variable, S. Dialogue: 0,0:46:32.17,0:46:36.36,Default,,0000,0000,0000,,I can either specify the density for S directly, or I can specify the CDF. I Dialogue: 0,0:46:36.36,0:46:38.82,Default,,0000,0000,0000,,choose to specify the Dialogue: 0,0:46:39.92,0:46:41.53,Default,,0000,0000,0000,,CDF. Dialogue: 0,0:46:41.53,0:46:46.92,Default,,0000,0000,0000,,It has to be some function increasing from zero to one. Dialogue: 0,0:46:46.92,0:46:48.03,Default,,0000,0000,0000,,So you can Dialogue: 0,0:46:48.03,0:46:50.68,Default,,0000,0000,0000,,choose any function that looks like that, and in particular, Dialogue: 0,0:46:51.97,0:46:55.47,Default,,0000,0000,0000,,pulling functions out of a hat that look like that. You can, for instance, choose a Dialogue: 0,0:46:55.47,0:46:58.99,Default,,0000,0000,0000,,sigmoid function of Dialogue: 0,0:46:58.99,0:47:04.22,Default,,0000,0000,0000,,CDF. That would be one way of specifying the distribution of the densities for the random variable S. So Dialogue: 0,0:47:04.22,0:47:05.11,Default,,0000,0000,0000,,this Dialogue: 0,0:47:05.11,0:47:12.11,Default,,0000,0000,0000,,will come up later. Dialogue: 0,0:47:30.30,0:47:33.58,Default,,0000,0000,0000,,Just [inaudible], just raise your hand if that is familiar to you, if you've seen Dialogue: 0,0:47:33.58,0:47:40.58,Default,,0000,0000,0000,,that before. Great. So Dialogue: 0,0:47:42.47,0:47:43.24,Default,,0000,0000,0000,,let's Dialogue: 0,0:47:43.24,0:47:48.63,Default,,0000,0000,0000,,start to derive our RCA, or our independent component analysis Dialogue: 0,0:47:48.63,0:47:50.43,Default,,0000,0000,0000,,algorithm. Dialogue: 0,0:47:50.43,0:47:53.99,Default,,0000,0000,0000,,Let's assume that the Dialogue: 0,0:47:55.86,0:47:59.82,Default,,0000,0000,0000,,data comes from Dialogue: 0,0:47:59.82,0:48:01.98,Default,,0000,0000,0000,,N original Dialogue: 0,0:48:01.98,0:48:03.32,Default,,0000,0000,0000,,sources. Dialogue: 0,0:48:03.32,0:48:07.01,Default,,0000,0000,0000,,So let's say there are N speakers in a cocktail party. Dialogue: 0,0:48:07.01,0:48:09.82,Default,,0000,0000,0000,,So the original sources, I'm Dialogue: 0,0:48:09.82,0:48:11.33,Default,,0000,0000,0000,,going to write as a vector, S Dialogue: 0,0:48:11.33,0:48:13.62,Default,,0000,0000,0000,,as in RN. Dialogue: 0,0:48:13.62,0:48:17.45,Default,,0000,0000,0000,,So just to be concrete about what I mean about that, I'm going to use Dialogue: 0,0:48:17.45,0:48:22.50,Default,,0000,0000,0000,,SIJ to denote the signal Dialogue: 0,0:48:22.50,0:48:25.85,Default,,0000,0000,0000,,from speaker Dialogue: 0,0:48:27.14,0:48:30.22,Default,,0000,0000,0000,,J Dialogue: 0,0:48:30.22,0:48:32.66,Default,,0000,0000,0000,,at time Dialogue: 0,0:48:32.66,0:48:34.08,Default,,0000,0000,0000,,I. Here's what I mean. Dialogue: 0,0:48:34.08,0:48:37.94,Default,,0000,0000,0000,,So what is sound? When you hear sound waves, sound is created Dialogue: 0,0:48:37.94,0:48:39.28,Default,,0000,0000,0000,,by a pattern Dialogue: 0,0:48:39.28,0:48:43.16,Default,,0000,0000,0000,,of expansions and compressions in air. 
So the way you're hearing my voice is Dialogue: 0,0:48:43.16,0:48:44.62,Default,,0000,0000,0000,,my Dialogue: 0,0:48:44.62,0:48:47.72,Default,,0000,0000,0000,,mouth is causing certain Dialogue: 0,0:48:47.72,0:48:50.96,Default,,0000,0000,0000,,changes in the air pressure, and then your ear is hearing my voice as Dialogue: 0,0:48:50.96,0:48:53.54,Default,,0000,0000,0000,,detecting those changes in air Dialogue: 0,0:48:53.54,0:48:57.73,Default,,0000,0000,0000,,pressure. So what a microphone records, what my mouth is generating, is Dialogue: 0,0:48:57.73,0:48:59.16,Default,,0000,0000,0000,,a pattern. Dialogue: 0,0:48:59.16,0:49:01.46,Default,,0000,0000,0000,,I'm going to draw a cartoon, Dialogue: 0,0:49:01.46,0:49:04.82,Default,,0000,0000,0000,,I guess. Dialogue: 0,0:49:04.82,0:49:06.06,Default,,0000,0000,0000,,Changes in Dialogue: 0,0:49:06.06,0:49:06.97,Default,,0000,0000,0000,,air pressure. So Dialogue: 0,0:49:06.97,0:49:11.12,Default,,0000,0000,0000,,this is what sound is. You look at a microphone recording, you see these roughly periodic Dialogue: 0,0:49:11.12,0:49:13.29,Default,,0000,0000,0000,,signals that comprise of Dialogue: 0,0:49:13.29,0:49:16.23,Default,,0000,0000,0000,,changes in air pressure over time as the air pressure goes Dialogue: 0,0:49:16.23,0:49:18.54,Default,,0000,0000,0000,,above and below some baseline air pressure. Dialogue: 0,0:49:18.54,0:49:19.67,Default,,0000,0000,0000,,So this Dialogue: 0,0:49:19.67,0:49:22.37,Default,,0000,0000,0000,,is what the speech signal looks like, say. Dialogue: 0,0:49:22.37,0:49:26.40,Default,,0000,0000,0000,,So this is speaker one. Dialogue: 0,0:49:26.40,0:49:29.04,Default,,0000,0000,0000,,Then what I'm saying is that Dialogue: 0,0:49:29.04,0:49:31.19,Default,,0000,0000,0000,,- this is some time, T. Dialogue: 0,0:49:31.19,0:49:34.48,Default,,0000,0000,0000,,What I'm saying is that the value of that point, Dialogue: 0,0:49:34.48,0:49:36.99,Default,,0000,0000,0000,,I'm going to denote as S, super Dialogue: 0,0:49:36.99,0:49:40.23,Default,,0000,0000,0000,,script T, sub script one. Dialogue: 0,0:49:40.23,0:49:41.73,Default,,0000,0000,0000,,Similarly, Dialogue: 0,0:49:41.73,0:49:44.89,Default,,0000,0000,0000,,speaker two, it's Dialogue: 0,0:49:44.89,0:49:46.86,Default,,0000,0000,0000,,outputting some sound wave. Speaker voice Dialogue: 0,0:49:46.86,0:49:49.75,Default,,0000,0000,0000,,will play that. It'll actually sound like Dialogue: 0,0:49:49.75,0:49:52.92,Default,,0000,0000,0000,,a single tone, I guess. Dialogue: 0,0:49:52.92,0:49:56.10,Default,,0000,0000,0000,,So in the same way, at the same time, T, Dialogue: 0,0:49:56.10,0:49:59.05,Default,,0000,0000,0000,,the value of the air Dialogue: 0,0:49:59.05,0:50:02.59,Default,,0000,0000,0000,,pressure generated by speaker two, I'll denote as Dialogue: 0,0:50:02.59,0:50:09.59,Default,,0000,0000,0000,,ST Dialogue: 0,0:50:16.58,0:50:23.58,Default,,0000,0000,0000,,2. Dialogue: 0,0:50:29.86,0:50:36.86,Default,,0000,0000,0000,,So we observe Dialogue: 0,0:50:37.77,0:50:40.45,Default,,0000,0000,0000,,XI equals A times SI, where Dialogue: 0,0:50:40.45,0:50:43.41,Default,,0000,0000,0000,,these XIs Dialogue: 0,0:50:43.41,0:50:45.99,Default,,0000,0000,0000,,are vectors in RN. 
Dialogue: 0,0:50:45.99,0:50:50.27,Default,,0000,0000,0000,,So I'm going to assume Dialogue: 0,0:50:50.27,0:50:53.26,Default,,0000,0000,0000,,that I have N microphones, Dialogue: 0,0:50:53.26,0:50:53.58,Default,,0000,0000,0000,,and Dialogue: 0,0:50:53.58,0:50:58.49,Default,,0000,0000,0000,,each of my microphones records some linear combination Dialogue: 0,0:50:58.49,0:51:01.87,Default,,0000,0000,0000,,of what the speakers are saying. So each microphone records some overlapping Dialogue: 0,0:51:01.87,0:51:04.50,Default,,0000,0000,0000,,combination of what the speakers are saying. Dialogue: 0,0:51:04.50,0:51:07.62,Default,,0000,0000,0000,,For Dialogue: 0,0:51:10.35,0:51:12.67,Default,,0000,0000,0000,,example, XIJ, which is - this Dialogue: 0,0:51:12.67,0:51:16.25,Default,,0000,0000,0000,,is what microphone J records at time, I. So Dialogue: 0,0:51:16.25,0:51:17.35,Default,,0000,0000,0000,,by definition of Dialogue: 0,0:51:17.35,0:51:21.52,Default,,0000,0000,0000,,the matrix multiplication, this is sum Dialogue: 0,0:51:21.52,0:51:23.98,Default,,0000,0000,0000,,of AIKSJ. Dialogue: 0,0:51:23.98,0:51:29.37,Default,,0000,0000,0000,,Oh, excuse me. Dialogue: 0,0:51:29.37,0:51:36.37,Default,,0000,0000,0000,,Okay? So what my J - sorry. Dialogue: 0,0:51:37.18,0:51:41.05,Default,,0000,0000,0000,,So what my J microphone is recording is Dialogue: 0,0:51:42.19,0:51:43.94,Default,,0000,0000,0000,,some linear combination of Dialogue: 0,0:51:43.94,0:51:45.57,Default,,0000,0000,0000,,all of the speakers. So Dialogue: 0,0:51:45.57,0:51:49.78,Default,,0000,0000,0000,,at time I, what microphone J is recording is some linear combination of Dialogue: 0,0:51:49.78,0:51:52.75,Default,,0000,0000,0000,,what all the speakers are saying at time I. Dialogue: 0,0:51:52.75,0:51:54.36,Default,,0000,0000,0000,,So K here Dialogue: 0,0:51:54.36,0:51:57.82,Default,,0000,0000,0000,,indexes over the N speakers. Dialogue: 0,0:51:57.82,0:52:01.24,Default,,0000,0000,0000,,So our goal Dialogue: 0,0:52:02.87,0:52:06.42,Default,,0000,0000,0000,,is to find the matrix, W, equals A inverse, and Dialogue: 0,0:52:06.42,0:52:10.13,Default,,0000,0000,0000,,just defining W that way. Dialogue: 0,0:52:10.13,0:52:17.13,Default,,0000,0000,0000,,So Dialogue: 0,0:52:18.14,0:52:21.35,Default,,0000,0000,0000,,we can recover the original sources Dialogue: 0,0:52:21.35,0:52:23.31,Default,,0000,0000,0000,,as a linear combination of Dialogue: 0,0:52:23.31,0:52:23.56,Default,,0000,0000,0000,,our Dialogue: 0,0:52:23.55,0:52:30.55,Default,,0000,0000,0000,,microphone recordings, XI. Dialogue: 0,0:52:33.06,0:52:35.33,Default,,0000,0000,0000,,Just as a point of notation, Dialogue: 0,0:52:35.33,0:52:42.33,Default,,0000,0000,0000,,I'm going to write the matrix W this way. I'm going to use Dialogue: 0,0:52:50.89,0:52:55.10,Default,,0000,0000,0000,,lower case W subscript one, subscript two and so on to denote the roles Dialogue: 0,0:52:55.10,0:53:02.10,Default,,0000,0000,0000,,of this matrix, W. Dialogue: 0,0:53:13.91,0:53:14.56,Default,,0000,0000,0000,,Let's Dialogue: 0,0:53:14.56,0:53:18.72,Default,,0000,0000,0000,,see. Dialogue: 0,0:53:18.72,0:53:23.54,Default,,0000,0000,0000,,So let's look at why IC is possible. Given these overlapping voices, Dialogue: 0,0:53:23.54,0:53:28.25,Default,,0000,0000,0000,,let's think briefly why it might be possible Dialogue: 0,0:53:28.25,0:53:30.76,Default,,0000,0000,0000,,to recover the original sources. 
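In code, the model and the goal look like this (a sketch; A is a hypothetical invertible n-by-n mixing matrix). Each microphone sample is a linear combination of all the speakers at that instant, x^(i) = A s^(i), and knowing W = A^-1 would let us undo the mixing exactly via s^(i) = W x^(i).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 5000                          # n speakers/microphones, m time samples
S = rng.uniform(-1, 1, size=(m, n))     # S[i, k]: speaker k at time i (non-Gaussian sources)
A = np.array([[0.8, 0.3],
              [0.4, 0.7]])              # hypothetical mixing matrix

X = S @ A.T                             # X[i, j]: what microphone j records at time i
# Component-wise, X[i, j] == sum over k of A[j, k] * S[i, k]:
assert np.allclose(X[3, 1], sum(A[1, k] * S[3, k] for k in range(n)))

W = np.linalg.inv(A)                    # the unmixing matrix we would like to learn
S_hat = X @ W.T                         # rows: s^(i) = W x^(i)
assert np.allclose(S_hat, S)            # with the true W, the sources come back exactly
```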
Dialogue: 0,0:53:30.76,0:53:33.06,Default,,0000,0000,0000,,So for the next example, I want Dialogue: 0,0:53:33.06,0:53:36.51,Default,,0000,0000,0000,,to say Dialogue: 0,0:53:42.74,0:53:46.53,Default,,0000,0000,0000,,- let's say that each of my speakers Dialogue: 0,0:53:46.53,0:53:50.38,Default,,0000,0000,0000,,outputs - this will sound like white noise. Can I switch Dialogue: 0,0:53:50.38,0:53:53.38,Default,,0000,0000,0000,,the laptop display, Dialogue: 0,0:53:53.38,0:53:56.71,Default,,0000,0000,0000,,please? For this example, let's say that Dialogue: 0,0:53:57.22,0:54:01.46,Default,,0000,0000,0000,,each of my speakers outputs uniform white noise. So Dialogue: 0,0:54:01.46,0:54:05.46,Default,,0000,0000,0000,,if that's the case, these are my axis, S1 and S2. Dialogue: 0,0:54:05.46,0:54:08.82,Default,,0000,0000,0000,,This is what my two speakers would be uttering. Dialogue: 0,0:54:08.82,0:54:11.29,Default,,0000,0000,0000,,The parts of what they're Dialogue: 0,0:54:11.29,0:54:14.98,Default,,0000,0000,0000,,uttering will look like a line in a square box if the two speakers are independently Dialogue: 0,0:54:14.98,0:54:16.09,Default,,0000,0000,0000,,outputting Dialogue: 0,0:54:16.09,0:54:18.39,Default,,0000,0000,0000,,uniform minus one random variables. Dialogue: 0,0:54:18.39,0:54:20.29,Default,,0000,0000,0000,,So this is part of Dialogue: 0,0:54:20.29,0:54:24.01,Default,,0000,0000,0000,,S1 and S2, my original sources. Dialogue: 0,0:54:24.01,0:54:28.100,Default,,0000,0000,0000,,This would be a typical sample of what my microphones record. Here, at Dialogue: 0,0:54:28.100,0:54:31.40,Default,,0000,0000,0000,,the axis, are X1 and X2. Dialogue: 0,0:54:31.40,0:54:35.09,Default,,0000,0000,0000,,So these are images I got from [inaudible] on Dialogue: 0,0:54:35.09,0:54:37.27,Default,,0000,0000,0000,,ICA. Dialogue: 0,0:54:38.71,0:54:43.69,Default,,0000,0000,0000,,Given a picture like this, you can sort of look at this box, and you can sort of tell what the axis of Dialogue: 0,0:54:43.69,0:54:44.94,Default,,0000,0000,0000,,this Dialogue: 0,0:54:44.94,0:54:45.81,Default,,0000,0000,0000,,parallelogram Dialogue: 0,0:54:45.81,0:54:48.15,Default,,0000,0000,0000,,are. You can figure out Dialogue: 0,0:54:48.15,0:54:51.100,Default,,0000,0000,0000,,what linear transformation would transform the parallelogram back Dialogue: 0,0:54:51.100,0:54:54.36,Default,,0000,0000,0000,,to a box. Dialogue: 0,0:54:54.36,0:54:58.77,Default,,0000,0000,0000,,So it turns out there are some inherent ambiguities in ICA. Dialogue: 0,0:54:58.77,0:55:00.51,Default,,0000,0000,0000,,I'll just say what they are. Dialogue: 0,0:55:00.51,0:55:01.57,Default,,0000,0000,0000,,One is that Dialogue: 0,0:55:01.57,0:55:05.71,Default,,0000,0000,0000,,you can't recover the original indexing of the sources. In particular, Dialogue: 0,0:55:05.71,0:55:07.38,Default,,0000,0000,0000,,if Dialogue: 0,0:55:07.38,0:55:10.81,Default,,0000,0000,0000,,I generated the data for speaker one and speaker two, Dialogue: 0,0:55:10.81,0:55:14.47,Default,,0000,0000,0000,,you can run ICA, and then you may end up with the order of the speakers Dialogue: 0,0:55:14.47,0:55:17.53,Default,,0000,0000,0000,,reversed. What that corresponds to is if you take this Dialogue: 0,0:55:17.53,0:55:21.81,Default,,0000,0000,0000,,picture and you flip this picture along a 45-degree Dialogue: 0,0:55:21.81,0:55:26.13,Default,,0000,0000,0000,,axis. 
You take a 45-degree axis and reflect this picture across the 45-degree axis, you'll still Dialogue: 0,0:55:26.13,0:55:28.28,Default,,0000,0000,0000,,get a box. So Dialogue: 0,0:55:28.28,0:55:31.32,Default,,0000,0000,0000,,there's no way for the algorithms to tell which was speaker No. 1 and Dialogue: 0,0:55:31.32,0:55:32.91,Default,,0000,0000,0000,,which Dialogue: 0,0:55:32.91,0:55:37.70,Default,,0000,0000,0000,,was speaker No. 2. The numbering or the ordering of the speakers is Dialogue: 0,0:55:37.70,0:55:40.84,Default,,0000,0000,0000,,ambiguous. The other source of ambiguity, and these are the only ambiguities Dialogue: 0,0:55:40.84,0:55:42.09,Default,,0000,0000,0000,,in this example, Dialogue: 0,0:55:42.09,0:55:44.47,Default,,0000,0000,0000,,is the sign of the sources. So Dialogue: 0,0:55:44.47,0:55:49.12,Default,,0000,0000,0000,,given my speakers' recordings, Dialogue: 0,0:55:49.12,0:55:53.19,Default,,0000,0000,0000,,you can't tell whether you got a positive SI or whether you got Dialogue: 0,0:55:53.19,0:55:56.18,Default,,0000,0000,0000,,back a negative SI. Dialogue: 0,0:55:56.18,0:55:58.21,Default,,0000,0000,0000,,In this picture, what that corresponds to Dialogue: 0,0:55:58.21,0:56:02.10,Default,,0000,0000,0000,,is if you take this picture, and you reflect it along the vertical axis, if Dialogue: 0,0:56:02.10,0:56:04.66,Default,,0000,0000,0000,,you reflect it along the horizontal axis, Dialogue: 0,0:56:04.66,0:56:05.91,Default,,0000,0000,0000,,you still get a box. Dialogue: 0,0:56:05.91,0:56:08.72,Default,,0000,0000,0000,,You still get back [inaudible] speakers. Dialogue: 0,0:56:08.72,0:56:09.65,Default,,0000,0000,0000,,So Dialogue: 0,0:56:09.65,0:56:11.72,Default,,0000,0000,0000,,it turns out that in this example, Dialogue: 0,0:56:11.72,0:56:16.60,Default,,0000,0000,0000,,you can't guarantee that you've recovered positive SI rather Dialogue: 0,0:56:16.60,0:56:19.69,Default,,0000,0000,0000,,than negative SI. Dialogue: 0,0:56:19.69,0:56:21.93,Default,,0000,0000,0000,,So it turns out that these are the only Dialogue: 0,0:56:21.93,0:56:25.74,Default,,0000,0000,0000,,two ambiguities in this example. What is the permutation of the speakers, and the Dialogue: 0,0:56:25.74,0:56:28.14,Default,,0000,0000,0000,,other is the sign of the speakers. Dialogue: 0,0:56:28.14,0:56:30.75,Default,,0000,0000,0000,,Permutation of the speakers, there's not much you can do about that. Dialogue: 0,0:56:30.75,0:56:34.91,Default,,0000,0000,0000,,It turns out that if you take the audio Dialogue: 0,0:56:34.91,0:56:35.61,Default,,0000,0000,0000,,source Dialogue: 0,0:56:35.61,0:56:39.20,Default,,0000,0000,0000,,and if you flip the sign, and you take negative S, and if you play that through a Dialogue: 0,0:56:39.20,0:56:43.82,Default,,0000,0000,0000,,microphone it'll sound indistinguishable. Dialogue: 0,0:56:43.82,0:56:44.88,Default,,0000,0000,0000,,So Dialogue: 0,0:56:44.88,0:56:47.83,Default,,0000,0000,0000,,for many of the applications we care about, the sign Dialogue: 0,0:56:47.83,0:56:51.26,Default,,0000,0000,0000,,as well as the permutation Dialogue: 0,0:56:51.26,0:56:55.08,Default,,0000,0000,0000,,is ambiguous, but you don't really care Dialogue: 0,0:56:55.08,0:57:02.08,Default,,0000,0000,0000,,about it. Let's switch back Dialogue: 0,0:57:03.53,0:57:08.99,Default,,0000,0000,0000,,to Dialogue: 0,0:57:08.99,0:57:11.18,Default,,0000,0000,0000,,chalk board, please. 
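Those two ambiguities are easy to see numerically: if W unmixes the data, then so does any version of W with its rows reordered and/or negated, since the recovered signals are the same sources up to renumbering and sign. A small self-contained sketch, where P and D are an arbitrary permutation and sign-flip:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(5000, 2))            # two independent uniform sources
A = np.array([[0.8, 0.3], [0.4, 0.7]])            # hypothetical mixing matrix
X = S @ A.T
W = np.linalg.inv(A)                              # one valid unmixing matrix

P = np.array([[0., 1.], [1., 0.]])                # renumber the speakers
D = np.diag([1., -1.])                            # flip the sign of one of them
W_alt = P @ D @ W                                 # an equally valid unmixing matrix

S_alt = X @ W_alt.T
assert np.allclose(S_alt[:, 0], -S[:, 1])         # speaker two, negated
assert np.allclose(S_alt[:, 1],  S[:, 0])         # speaker one
```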
Dialogue: 0,0:57:11.18,0:57:15.64,Default,,0000,0000,0000,,It turns out, and I don't want to spend too much time on this, but I do want to say it briefly. Dialogue: 0,0:57:15.64,0:57:17.29,Default,,0000,0000,0000,,It turns out the Dialogue: 0,0:57:17.29,0:57:19.20,Default,,0000,0000,0000,,reason why those are the only Dialogue: 0,0:57:19.20,0:57:25.81,Default,,0000,0000,0000,,sources of ambiguity - so the ambiguities were the Dialogue: 0,0:57:25.81,0:57:29.87,Default,,0000,0000,0000,,permutation of the speakers Dialogue: 0,0:57:29.87,0:57:31.96,Default,,0000,0000,0000,,and the signs. Dialogue: 0,0:57:31.96,0:57:35.40,Default,,0000,0000,0000,,It turns out that Dialogue: 0,0:57:35.40,0:57:39.92,Default,,0000,0000,0000,,the reason these were the only ambiguities was because Dialogue: 0,0:57:39.92,0:57:44.100,Default,,0000,0000,0000,,the SIJs were Dialogue: 0,0:57:44.100,0:57:46.69,Default,,0000,0000,0000,, Dialogue: 0,0:57:46.69,0:57:50.51,Default,,0000,0000,0000,,non-Gaussian. I don't want to spend too much time on this, but I'll say it briefly. Dialogue: 0,0:57:50.51,0:57:54.09,Default,,0000,0000,0000,,Suppose my original sources, S1 and S2, were Gaussian. Dialogue: 0,0:57:54.09,0:57:55.91,Default,,0000,0000,0000,,So Dialogue: 0,0:57:58.33,0:58:02.20,Default,,0000,0000,0000,,suppose SI is Dialogue: 0,0:58:02.20,0:58:04.34,Default,,0000,0000,0000,,Gaussian, would mean zero Dialogue: 0,0:58:04.34,0:58:07.02,Default,,0000,0000,0000,,and identity covariance. Dialogue: 0,0:58:07.02,0:58:10.96,Default,,0000,0000,0000,,That just means that each of my speakers outputs a Gaussian random variable. Here's a typical Dialogue: 0,0:58:10.96,0:58:12.62,Default,,0000,0000,0000,,example of Gaussian Dialogue: 0,0:58:12.62,0:58:18.48,Default,,0000,0000,0000,,data. Dialogue: 0,0:58:18.48,0:58:22.87,Default,,0000,0000,0000,,You will recall the contours of a Gaussian distribution with identity covariants Dialogue: 0,0:58:22.87,0:58:25.09,Default,,0000,0000,0000,,looks like Dialogue: 0,0:58:25.09,0:58:27.74,Default,,0000,0000,0000,,this, right? The Gaussian is a Dialogue: 0,0:58:27.74,0:58:30.57,Default,,0000,0000,0000,,spherically symmetric distribution. Dialogue: 0,0:58:30.57,0:58:35.22,Default,,0000,0000,0000,,So if my speakers were outputting Gaussian random variables, than if Dialogue: 0,0:58:35.22,0:58:38.18,Default,,0000,0000,0000,,I observe a linear combination of this, Dialogue: 0,0:58:38.18,0:58:40.48,Default,,0000,0000,0000,,there's actually no way to recover the Dialogue: 0,0:58:40.48,0:58:43.42,Default,,0000,0000,0000,,original distribution because there's no way for me to tell Dialogue: 0,0:58:43.42,0:58:46.12,Default,,0000,0000,0000,,if the axis are at this angle or if they're at Dialogue: 0,0:58:46.12,0:58:48.35,Default,,0000,0000,0000,,that angle and so Dialogue: 0,0:58:48.35,0:58:52.43,Default,,0000,0000,0000,,on. The Gaussian is a rotationally symmetric Dialogue: 0,0:58:52.43,0:58:56.77,Default,,0000,0000,0000,,distribution, so I would no be able to recover the orientation in the Dialogue: 0,0:58:56.77,0:58:58.84,Default,,0000,0000,0000,,rotation Dialogue: 0,0:58:58.84,0:59:02.28,Default,,0000,0000,0000,,of this. So I don't want to prove this too much. 
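A quick numerical way to see the problem with Gaussian sources: a zero-mean, identity-covariance Gaussian looks exactly the same after any rotation, so the observed data carries no information about the orientation of the original axes. A sketch, with an arbitrary rotation R:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((100000, 2))      # Gaussian sources, zero mean, identity covariance

theta = 0.7                               # any angle at all
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
S_rot = S @ R.T                           # the same sources, rotated

print(np.cov(S, rowvar=False))            # approximately the identity
print(np.cov(S_rot, rowvar=False))        # also approximately the identity:
                                          # nothing distinguishes S from its rotation
```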
I don't want to spend too much time dwelling on this, but it turns Dialogue: 0,0:59:02.28,0:59:02.90,Default,,0000,0000,0000,,out Dialogue: 0,0:59:02.90,0:59:04.70,Default,,0000,0000,0000,,if your source is a Gaussian, Dialogue: 0,0:59:04.70,0:59:07.93,Default,,0000,0000,0000,,then it's actually impossible to do Dialogue: 0,0:59:07.93,0:59:12.05,Default,,0000,0000,0000,,ICA. ICA relies critically on your data being non-Gaussian because if the data Dialogue: 0,0:59:12.05,0:59:16.94,Default,,0000,0000,0000,,were Gaussian, then the rotation of the data would be ambiguous. So Dialogue: 0,0:59:16.94,0:59:19.08,Default,,0000,0000,0000,,regardless of how much data you have, Dialogue: 0,0:59:19.08,0:59:23.55,Default,,0000,0000,0000,,even if you had infinitely large amounts of data, you would not be able to recover Dialogue: 0,0:59:23.55,0:59:26.74,Default,,0000,0000,0000,,the matrix A or W. Dialogue: 0,0:59:32.78,0:59:39.78,Default,,0000,0000,0000,,Let's go ahead and divide the algorithm. Dialogue: 0,0:59:56.78,1:00:00.94,Default,,0000,0000,0000,,To do this, I need just one more result, and then the derivation will be Dialogue: 0,1:00:03.03,1:00:07.73,Default,,0000,0000,0000,,three lines. [Inaudible] many variables as N, which is the joint vector of the sound that all of my Dialogue: 0,1:00:07.73,1:00:11.31,Default,,0000,0000,0000,,speakers that are emitting at any time. Dialogue: 0,1:00:11.31,1:00:12.46,Default,,0000,0000,0000,,So Dialogue: 0,1:00:12.46,1:00:15.62,Default,,0000,0000,0000,,let's say the density of S is Dialogue: 0,1:00:15.62,1:00:17.34,Default,,0000,0000,0000,,P subscript S, Dialogue: 0,1:00:17.34,1:00:19.57,Default,,0000,0000,0000,,capital S. Dialogue: 0,1:00:19.57,1:00:23.40,Default,,0000,0000,0000,,So my microphone recording records S equals AS, Dialogue: 0,1:00:23.40,1:00:25.32,Default,,0000,0000,0000,,equals W inverse Dialogue: 0,1:00:25.32,1:00:31.02,Default,,0000,0000,0000,,S. Equivalently, S equals W sign of X. Dialogue: 0,1:00:31.02,1:00:34.53,Default,,0000,0000,0000,,So let's think about what is the density of Dialogue: 0,1:00:34.53,1:00:38.21,Default,,0000,0000,0000,,X. So I have P of S. I know the density of Dialogue: 0,1:00:38.21,1:00:41.36,Default,,0000,0000,0000,,S, and X is a linear combination of the S's. Dialogue: 0,1:00:41.36,1:00:45.17,Default,,0000,0000,0000,,So let's figure out what is the density of X. Dialogue: 0,1:00:45.17,1:00:48.67,Default,,0000,0000,0000,,One thing we could do is Dialogue: 0,1:00:48.67,1:00:51.34,Default,,0000,0000,0000,,figure out what S is. So this is just - Dialogue: 0,1:00:51.34,1:00:55.76,Default,,0000,0000,0000,,apply the density of Dialogue: 0,1:00:55.76,1:00:58.07,Default,,0000,0000,0000,,S to W of S. So let's Dialogue: 0,1:00:58.07,1:01:01.100,Default,,0000,0000,0000,,see. This is the probability of S, so we just Dialogue: 0,1:01:02.91,1:01:06.56,Default,,0000,0000,0000,,figure out what S is. S is W times X, so the probability of S is Dialogue: 0,1:01:06.56,1:01:09.94,Default,,0000,0000,0000,,W times X, so the probability of X must be [inaudible]. Dialogue: 0,1:01:09.94,1:01:11.62,Default,,0000,0000,0000,,So this is wrong. Dialogue: 0,1:01:11.62,1:01:14.75,Default,,0000,0000,0000,,It turns out you can do this for probably mass functions but not for Dialogue: 0,1:01:14.75,1:01:16.92,Default,,0000,0000,0000,,continuous density. 
So in particular, Dialogue: 0,1:01:16.92,1:01:20.97,Default,,0000,0000,0000,,it's not correct to say that the probability of X is - well, you just figure out what Dialogue: 0,1:01:20.97,1:01:22.50,Default,,0000,0000,0000,,S is. Dialogue: 0,1:01:22.50,1:01:26.19,Default,,0000,0000,0000,,Then you say the probability of S is applied to that. This is wrong. You Dialogue: 0,1:01:26.19,1:01:27.82,Default,,0000,0000,0000,,can't do this with densities. Dialogue: 0,1:01:27.82,1:01:30.97,Default,,0000,0000,0000,,You can't say the probability of S is that because it's a property density Dialogue: 0,1:01:30.97,1:01:32.97,Default,,0000,0000,0000,,function. Dialogue: 0,1:01:32.97,1:01:34.46,Default,,0000,0000,0000,,In particular, Dialogue: 0,1:01:34.46,1:01:35.51,Default,,0000,0000,0000,,the Dialogue: 0,1:01:35.51,1:01:37.85,Default,,0000,0000,0000,,right formula is the Dialogue: 0,1:01:37.85,1:01:40.44,Default,,0000,0000,0000,,density of S applied to W times X, Dialogue: 0,1:01:40.44,1:01:41.73,Default,,0000,0000,0000,,times the determinant Dialogue: 0,1:01:41.73,1:01:44.21,Default,,0000,0000,0000,,of the matrix, W. Dialogue: 0,1:01:44.21,1:01:47.19,Default,,0000,0000,0000,,Let me just illustrate that with an example. Dialogue: 0,1:01:47.19,1:01:49.92,Default,,0000,0000,0000,,Let's say Dialogue: 0,1:01:49.92,1:01:51.55,Default,,0000,0000,0000,,the Dialogue: 0,1:01:51.55,1:01:58.20,Default,,0000,0000,0000,,density for S is that. In Dialogue: 0,1:01:58.20,1:02:03.47,Default,,0000,0000,0000,,this example, S is uniform Dialogue: 0,1:02:03.47,1:02:05.54,Default,,0000,0000,0000,,over the unit interval. Dialogue: 0,1:02:07.68,1:02:14.68,Default,,0000,0000,0000,,So the density for S looks like that. It's Dialogue: 0,1:02:15.19,1:02:18.14,Default,,0000,0000,0000,,just density for the uniform Dialogue: 0,1:02:18.14,1:02:20.75,Default,,0000,0000,0000,,distribution of zero one. Dialogue: 0,1:02:20.75,1:02:24.15,Default,,0000,0000,0000,,So let me let X be equal to two times Dialogue: 0,1:02:24.15,1:02:30.01,Default,,0000,0000,0000,,S. So this means A equals two. Dialogue: 0,1:02:30.01,1:02:33.71,Default,,0000,0000,0000,,W equals one half. So if Dialogue: 0,1:02:33.71,1:02:36.72,Default,,0000,0000,0000,,S is a uniform distribution over zero, one, Dialogue: 0,1:02:36.72,1:02:40.32,Default,,0000,0000,0000,,then X, which is two times that, will be the uniform distribution over the Dialogue: 0,1:02:40.32,1:02:43.30,Default,,0000,0000,0000,,range from zero to two. Dialogue: 0,1:02:43.30,1:02:50.30,Default,,0000,0000,0000,,So the density for X will be - Dialogue: 0,1:02:54.36,1:02:57.29,Default,,0000,0000,0000,,that's one, that's two, Dialogue: 0,1:02:57.29,1:03:01.41,Default,,0000,0000,0000,,that's one half, Dialogue: 0,1:03:02.53,1:03:04.95,Default,,0000,0000,0000,,and Dialogue: 0,1:03:04.95,1:03:07.94,Default,,0000,0000,0000,,that's one. Okay? Density for X will be indicator Dialogue: 0,1:03:07.94,1:03:12.73,Default,,0000,0000,0000,,zero [inaudible] for X [inaudible] two Dialogue: 0,1:03:12.73,1:03:15.74,Default,,0000,0000,0000,,times W, times one half. Dialogue: 0,1:03:15.74,1:03:20.23,Default,,0000,0000,0000,,So Dialogue: 0,1:03:20.23,1:03:21.73,Default,,0000,0000,0000,,does that make Dialogue: 0,1:03:21.73,1:03:25.02,Default,,0000,0000,0000,,sense? [Inaudible] computer density for X because X is now spread out Dialogue: 0,1:03:25.02,1:03:28.65,Default,,0000,0000,0000,,across a wider range. 
The density of X is now smaller, Dialogue: 0,1:03:28.65,1:03:35.65,Default,,0000,0000,0000,,and therefore, the density of X has this one half Dialogue: 0,1:03:37.86,1:03:38.92,Default,,0000,0000,0000,,term Dialogue: 0,1:03:38.92,1:03:42.58,Default,,0000,0000,0000,,here. Okay? This is an illustration for the case of one-dimensional random variables, Dialogue: 0,1:03:42.58,1:03:44.29,Default,,0000,0000,0000,, Dialogue: 0,1:03:44.29,1:03:45.16,Default,,0000,0000,0000,,or S Dialogue: 0,1:03:45.16,1:03:49.49,Default,,0000,0000,0000,,and X of one D. I'm not going to show it, but the generalization of this to vector value random variables is that the Dialogue: 0,1:03:49.49,1:03:51.65,Default,,0000,0000,0000,,density of X is given by this Dialogue: 0,1:03:51.65,1:03:53.95,Default,,0000,0000,0000,,times the determinant of the matrix, W. Over here, Dialogue: 0,1:03:53.95,1:04:00.95,Default,,0000,0000,0000,,I showed the one dimensional [inaudible] generalization. Dialogue: 0,1:04:21.44,1:04:28.44,Default,,0000,0000,0000,,So we're nearly there. Here's Dialogue: 0,1:04:28.75,1:04:33.97,Default,,0000,0000,0000,,how I can implement ICA. Dialogue: 0,1:04:33.97,1:04:37.04,Default,,0000,0000,0000,,So my distribution on Dialogue: 0,1:04:37.04,1:04:44.04,Default,,0000,0000,0000,,S, Dialogue: 0,1:04:50.26,1:04:52.96,Default,,0000,0000,0000,,so I'm going to assume that my density on S Dialogue: 0,1:04:52.96,1:04:55.10,Default,,0000,0000,0000,,is given by this as a product over the Dialogue: 0,1:04:55.10,1:04:59.95,Default,,0000,0000,0000,,N speakers of the density - the product of speaker Dialogue: 0,1:04:59.95,1:05:00.89,Default,,0000,0000,0000,,I Dialogue: 0,1:05:00.89,1:05:03.66,Default,,0000,0000,0000,,emitting a certain sound. This is a product of densities. Dialogue: 0,1:05:03.66,1:05:07.66,Default,,0000,0000,0000,,This is a product of distributions because I'm going to assume that the Dialogue: 0,1:05:07.66,1:05:11.47,Default,,0000,0000,0000,,speakers are having independent conversations. So the SI's independent Dialogue: 0,1:05:11.47,1:05:15.87,Default,,0000,0000,0000,,for different values of I. Dialogue: 0,1:05:15.87,1:05:18.06,Default,,0000,0000,0000,,So by the formula we just worked out, Dialogue: 0,1:05:18.06,1:05:22.36,Default,,0000,0000,0000,,the density for X would be equal to that. Dialogue: 0,1:05:36.60,1:05:39.31,Default,,0000,0000,0000,,I'll just remind you, W was A Dialogue: 0,1:05:39.31,1:05:42.58,Default,,0000,0000,0000,,inverse. It was Dialogue: 0,1:05:42.58,1:05:43.93,Default,,0000,0000,0000,,this matrix Dialogue: 0,1:05:43.93,1:05:47.62,Default,,0000,0000,0000,,I defined previously Dialogue: 0,1:05:47.62,1:05:50.43,Default,,0000,0000,0000,,so that SI Dialogue: 0,1:05:50.43,1:05:52.52,Default,,0000,0000,0000,,equals WI [inaudible] Dialogue: 0,1:05:52.52,1:05:59.21,Default,,0000,0000,0000,,X. So that's what's in Dialogue: 0,1:05:59.21,1:06:02.30,Default,,0000,0000,0000,,there. To complete my formulation for this model, Dialogue: 0,1:06:02.30,1:06:06.36,Default,,0000,0000,0000,,the final thing I need to do is Dialogue: 0,1:06:06.36,1:06:10.18,Default,,0000,0000,0000,,choose Dialogue: 0,1:06:10.18,1:06:11.55,Default,,0000,0000,0000,,a density Dialogue: 0,1:06:11.55,1:06:14.26,Default,,0000,0000,0000,,for what I think each speaker is Dialogue: 0,1:06:14.26,1:06:17.95,Default,,0000,0000,0000,,saying. I need to assume some density over Dialogue: 0,1:06:17.95,1:06:21.66,Default,,0000,0000,0000,,the sounds emitted by an individual speaker. 
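The one-dimensional example can be checked directly: with S uniform on [0, 1] and X = 2S (so W = 1/2), the correct density is p_X(x) = p_S(Wx) * |W| = 1/2 on [0, 2], whereas the "wrong" formula p_S(Wx) alone does not even integrate to one. A sketch of that check, plus the vector-valued form of the model density just written down:

```python
import numpy as np

# 1-D check: S ~ Uniform[0, 1], X = 2 S, so A = 2 and W = 1/2.
def p_S(s):
    return ((0.0 <= s) & (s <= 1.0)).astype(float)

W = 0.5
x = np.linspace(-1, 3, 400001)
p_X_wrong = p_S(W * x)               # forgets the |W| factor
p_X_right = p_S(W * x) * abs(W)      # correct change of variables

print(np.trapz(p_X_wrong, x))        # ~2.0: not a valid density
print(np.trapz(p_X_right, x))        # ~1.0, and equal to 1/2 on [0, 2] as expected

# Vector case, as in the model: p_X(x) = ( prod_j p_S(w_j . x) ) * |det W|.
def log_p_X(x_vec, W_mat, log_p_S):
    return sum(log_p_S(w_j @ x_vec) for w_j in W_mat) + np.log(abs(np.linalg.det(W_mat)))
```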
Dialogue: 0,1:06:21.66,1:06:25.63,Default,,0000,0000,0000,,So following the discussion I had right when the [inaudible] Dialogue: 0,1:06:25.63,1:06:27.65,Default,,0000,0000,0000,,ICA, Dialogue: 0,1:06:27.65,1:06:30.56,Default,,0000,0000,0000,,one thing I could do is I could choose Dialogue: 0,1:06:30.56,1:06:32.02,Default,,0000,0000,0000,,the density for S, Dialogue: 0,1:06:32.02,1:06:35.51,Default,,0000,0000,0000,,or equivalently, I could choose the CDF, the cumulative distribution Dialogue: 0,1:06:35.51,1:06:37.17,Default,,0000,0000,0000,,function for Dialogue: 0,1:06:37.17,1:06:38.22,Default,,0000,0000,0000,,S. Dialogue: 0,1:06:38.22,1:06:41.49,Default,,0000,0000,0000,,In this case, I'm going to choose Dialogue: 0,1:06:41.49,1:06:44.82,Default,,0000,0000,0000,,a CDF, probably for historical reasons and probably for Dialogue: 0,1:06:44.82,1:06:46.57,Default,,0000,0000,0000,,convenience. Dialogue: 0,1:06:46.57,1:06:50.02,Default,,0000,0000,0000,,I need to choose the CDF for S, so Dialogue: 0,1:06:50.02,1:06:54.78,Default,,0000,0000,0000,,what that means is I just need to choose some function that increases from zero to Dialogue: 0,1:06:54.78,1:06:59.44,Default,,0000,0000,0000,,what. I know I can't choose a Gaussian because we know you can't Dialogue: 0,1:06:59.44,1:07:02.20,Default,,0000,0000,0000,,do ICA on Gaussian data. Dialogue: 0,1:07:02.20,1:07:04.65,Default,,0000,0000,0000,,So I need some function increasing from zero to one Dialogue: 0,1:07:04.65,1:07:08.64,Default,,0000,0000,0000,,that is not the cumulative distribution function for a Dialogue: 0,1:07:08.64,1:07:10.36,Default,,0000,0000,0000,,Gaussian distribution. Dialogue: 0,1:07:10.36,1:07:14.01,Default,,0000,0000,0000,,So what other functions do I know that increase from zero to one? I Dialogue: 0,1:07:14.01,1:07:16.14,Default,,0000,0000,0000,,just choose the Dialogue: 0,1:07:16.14,1:07:18.33,Default,,0000,0000,0000,,CDF to be Dialogue: 0,1:07:18.33,1:07:21.98,Default,,0000,0000,0000,,the Dialogue: 0,1:07:21.98,1:07:23.04,Default,,0000,0000,0000,,sigmoid function. Dialogue: 0,1:07:23.04,1:07:24.73,Default,,0000,0000,0000,,This is a Dialogue: 0,1:07:24.73,1:07:27.23,Default,,0000,0000,0000,,commonly-made choice that Dialogue: 0,1:07:27.23,1:07:31.05,Default,,0000,0000,0000,,is made for convenience. There is actually no great reason for why you Dialogue: 0,1:07:31.05,1:07:34.08,Default,,0000,0000,0000,,choose a sigmoid function. It's just a convenient function that we all know Dialogue: 0,1:07:34.08,1:07:35.29,Default,,0000,0000,0000,,and are familiar with Dialogue: 0,1:07:35.29,1:07:37.85,Default,,0000,0000,0000,,that happens to increase from zero to one. Dialogue: 0,1:07:37.85,1:07:44.85,Default,,0000,0000,0000,,When you take the derivative Dialogue: 0,1:07:45.79,1:07:49.39,Default,,0000,0000,0000,,of the sigmoid, and that will give you back Dialogue: 0,1:07:49.39,1:07:50.12,Default,,0000,0000,0000,,your Dialogue: 0,1:07:50.12,1:07:55.46,Default,,0000,0000,0000,,density. This is just not Gaussian. This is the main virtue of choosing the sigmoid. Dialogue: 0,1:07:55.46,1:08:02.46,Default,,0000,0000,0000,,So Dialogue: 0,1:08:19.02,1:08:21.96,Default,,0000,0000,0000,,there's really no rational for the choice of sigma. Lots of other things will Dialogue: 0,1:08:21.96,1:08:23.60,Default,,0000,0000,0000,,work fine, too. Dialogue: 0,1:08:23.60,1:08:26.66,Default,,0000,0000,0000,,It's just a common, reasonable default. 
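Concretely, with the sigmoid g taken as the CDF of each source, the implied density is its derivative, g'(s) = g(s)(1 - g(s)): symmetric and bell-shaped, but not Gaussian. A small sketch; any other smooth, increasing function from zero to one that is not the Gaussian CDF would do just as well.

```python
import numpy as np

def g(s):                       # sigmoid, used as the CDF of each source
    return 1.0 / (1.0 + np.exp(-s))

def p_s(s):                     # implied density: the derivative of the sigmoid
    return g(s) * (1.0 - g(s))

s = np.linspace(-8, 8, 4001)
print(np.trapz(p_s(s), s))      # ~1.0: a valid density
print(p_s(0.0))                 # 0.25 at the peak; bell-shaped, but not Gaussian
```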
Dialogue: 0,1:08:38.04,1:08:40.28,Default,,0000,0000,0000,,It turns out that Dialogue: 0,1:08:40.28,1:08:44.63,Default,,0000,0000,0000,,one reason the sigma works well for a lot of data sources is that Dialogue: 0,1:08:44.63,1:08:49.08,Default,,0000,0000,0000,,if this is the Gaussian. Dialogue: 0,1:08:49.08,1:08:52.19,Default,,0000,0000,0000,,If you actually take the sigmoid and you take its derivative, Dialogue: 0,1:09:02.30,1:09:06.64,Default,,0000,0000,0000,,you find that the sigmoid has [inaudible] than the Gaussian. By this I mean Dialogue: 0,1:09:06.64,1:09:10.51,Default,,0000,0000,0000,,the density of the sigmoid dies down to zero much more slowly than Dialogue: 0,1:09:10.51,1:09:12.30,Default,,0000,0000,0000,,the Dialogue: 0,1:09:12.30,1:09:13.49,Default,,0000,0000,0000,,Gaussian. Dialogue: 0,1:09:13.49,1:09:18.08,Default,,0000,0000,0000,,The magnitudes of the tails dies down as E to the minus S squared. Dialogue: 0,1:09:18.08,1:09:21.96,Default,,0000,0000,0000,,For the sigmoid, the tails look like E to the minus Dialogue: 0,1:09:21.96,1:09:26.95,Default,,0000,0000,0000,,S. So the tails die down as E to the minus S, around E Dialogue: 0,1:09:26.95,1:09:29.53,Default,,0000,0000,0000,,to the minus S squared. It turns out that most distributions of this property Dialogue: 0,1:09:29.53,1:09:34.36,Default,,0000,0000,0000,,with [inaudible] tails, where the distribution decays to zero relatively slowly Dialogue: 0,1:09:34.36,1:09:38.44,Default,,0000,0000,0000,,compared to Gaussian will Dialogue: 0,1:09:38.44,1:09:39.92,Default,,0000,0000,0000,,work fine for your data. Dialogue: 0,1:09:39.92,1:09:43.94,Default,,0000,0000,0000,,Actually, one other choice you can sometimes us is what's called the Laplacian Dialogue: 0,1:09:43.94,1:09:46.23,Default,,0000,0000,0000,,distribution, which is Dialogue: 0,1:09:46.23,1:09:53.23,Default,,0000,0000,0000,,that. This will work fine, too, for many data sources. Dialogue: 0,1:10:06.54,1:10:08.11,Default,,0000,0000,0000,,Sticking with the sigmoid for now, I'll just Dialogue: 0,1:10:08.11,1:10:09.42,Default,,0000,0000,0000,,write Dialogue: 0,1:10:09.42,1:10:14.48,Default,,0000,0000,0000,,down the algorithm in two steps. So given Dialogue: 0,1:10:14.48,1:10:17.15,Default,,0000,0000,0000,,my training set, and Dialogue: 0,1:10:17.15,1:10:21.18,Default,,0000,0000,0000,,as you show, this is an unlabeled training set, I can Dialogue: 0,1:10:21.18,1:10:25.77,Default,,0000,0000,0000,,write down the log likelihood of my parameters. So that's - assembled my training Dialogue: 0,1:10:25.77,1:10:27.21,Default,,0000,0000,0000,,examples, log of - times Dialogue: 0,1:10:27.21,1:10:34.21,Default,,0000,0000,0000,,that. Dialogue: 0,1:10:42.87,1:10:44.88,Default,,0000,0000,0000,,So that's my log Dialogue: 0,1:10:44.88,1:10:51.88,Default,,0000,0000,0000,,likelihood. Dialogue: 0,1:10:53.34,1:10:59.38,Default,,0000,0000,0000,,To learn the parameters, W, of this model, I can use the [inaudible] assent, Dialogue: 0,1:10:59.38,1:11:06.38,Default,,0000,0000,0000,,which is Dialogue: 0,1:11:06.57,1:11:08.58,Default,,0000,0000,0000,,just that. Dialogue: 0,1:11:08.58,1:11:11.49,Default,,0000,0000,0000,,It turns out, if you work through the math, Dialogue: 0,1:11:11.49,1:11:13.97,Default,,0000,0000,0000,,let's see. If P of S Dialogue: 0,1:11:13.97,1:11:19.82,Default,,0000,0000,0000,,is equal to the derivative of the Dialogue: 0,1:11:19.82,1:11:23.78,Default,,0000,0000,0000,,sigmoid, then if you just work through the math to compute the [inaudible] there. 
You've all Dialogue: 0,1:11:23.78,1:11:27.41,Default,,0000,0000,0000,,done this a lot of times. I won't bother to show Dialogue: 0,1:11:27.41,1:11:34.41,Default,,0000,0000,0000,,the details. You find that is equal to this. Dialogue: 0,1:11:46.63,1:11:49.58,Default,,0000,0000,0000,,Okay? That's just - you can work those out yourself. It's just math to Dialogue: 0,1:11:49.58,1:11:54.50,Default,,0000,0000,0000,,compute the derivative of this with respect to Dialogue: 0,1:11:54.50,1:11:59.31,Default,,0000,0000,0000,,W. So to summarize, given the training set, Dialogue: 0,1:11:59.31,1:12:02.10,Default,,0000,0000,0000,,here's my [inaudible] update rule. So you run the Dialogue: 0,1:12:02.10,1:12:06.31,Default,,0000,0000,0000,,[inaudible] to learn the parameters W. Dialogue: 0,1:12:06.31,1:12:08.38,Default,,0000,0000,0000,,After you're Dialogue: 0,1:12:08.38,1:12:09.72,Default,,0000,0000,0000,,done, you then Dialogue: 0,1:12:12.37,1:12:14.11,Default,,0000,0000,0000,,output SI equals Dialogue: 0,1:12:14.11,1:12:16.99,Default,,0000,0000,0000,,WXI, and you've separated your sources Dialogue: 0,1:12:16.99,1:12:18.17,Default,,0000,0000,0000,,of your Dialogue: 0,1:12:18.17,1:12:21.78,Default,,0000,0000,0000,,data back out into the original independent sources. Dialogue: 0,1:12:21.78,1:12:26.20,Default,,0000,0000,0000,,Hopefully up to only a permutation and a plus/minus Dialogue: 0,1:12:26.20,1:12:30.65,Default,,0000,0000,0000,,sign ambiguity. Dialogue: 0,1:12:30.65,1:12:34.56,Default,,0000,0000,0000,,Okay? So just switch back to the laptop, please? Dialogue: 0,1:12:34.56,1:12:41.56,Default,,0000,0000,0000,,So we'll just wrap up with a couple of examples of applications of ICA. Dialogue: 0,1:12:42.21,1:12:43.15,Default,,0000,0000,0000,,This is Dialogue: 0,1:12:43.15,1:12:46.72,Default,,0000,0000,0000,,actually a picture of our TA, Katie. Dialogue: 0,1:12:46.72,1:12:49.98,Default,,0000,0000,0000,,So one of the applications of ICA is Dialogue: 0,1:12:49.98,1:12:52.01,Default,,0000,0000,0000,,to process Dialogue: 0,1:12:52.01,1:12:56.53,Default,,0000,0000,0000,,various types of [inaudible] recording data, so [inaudible]. This Dialogue: 0,1:12:56.53,1:12:58.78,Default,,0000,0000,0000,,is a picture of Dialogue: 0,1:12:58.78,1:13:02.47,Default,,0000,0000,0000,,a EEG cap, in which there are a number of electrodes Dialogue: 0,1:13:02.47,1:13:04.53,Default,,0000,0000,0000,,you place Dialogue: 0,1:13:04.53,1:13:07.96,Default,,0000,0000,0000,,on the - in this case, on Katie's brain, on Katie's scalp. Dialogue: 0,1:13:07.96,1:13:13.37,Default,,0000,0000,0000,,So where each electrode measures changes in voltage over time Dialogue: 0,1:13:13.37,1:13:15.06,Default,,0000,0000,0000,,on the scalp. Dialogue: 0,1:13:15.06,1:13:18.41,Default,,0000,0000,0000,,On the right, it's a typical example of [inaudible] data Dialogue: 0,1:13:18.41,1:13:22.57,Default,,0000,0000,0000,,where each electrode measures - just changes in voltage over Dialogue: 0,1:13:22.57,1:13:23.89,Default,,0000,0000,0000,,time. So Dialogue: 0,1:13:23.89,1:13:27.95,Default,,0000,0000,0000,,the horizontal axis is time, and the vertical axis is voltage. So here's the same thing, Dialogue: 0,1:13:27.95,1:13:29.56,Default,,0000,0000,0000,,blown up a little bit. Dialogue: 0,1:13:29.56,1:13:32.68,Default,,0000,0000,0000,,You notice there are artifacts in this Dialogue: 0,1:13:32.68,1:13:36.34,Default,,0000,0000,0000,,data. 
Where the circle is, where the data is circled, all Dialogue: 0,1:13:36.34,1:13:37.67,Default,,0000,0000,0000,,the Dialogue: 0,1:13:37.67,1:13:41.18,Default,,0000,0000,0000,,electrodes seem to measure in these very synchronized recordings. Dialogue: 0,1:13:41.18,1:13:44.70,Default,,0000,0000,0000,,It turns out that we look at [inaudible] data as well as a number of other Dialogue: 0,1:13:44.70,1:13:47.02,Default,,0000,0000,0000,,types of data, there are Dialogue: 0,1:13:47.02,1:13:51.55,Default,,0000,0000,0000,,artifacts from heartbeats and from human eye blinks and so on. So the Dialogue: 0,1:13:51.55,1:13:55.03,Default,,0000,0000,0000,,cartoonist, if you imagine, placing the Dialogue: 0,1:13:55.03,1:13:56.73,Default,,0000,0000,0000,,electrodes, or Dialogue: 0,1:13:56.73,1:13:58.32,Default,,0000,0000,0000,,microphones, on my scalp, Dialogue: 0,1:13:58.32,1:14:01.84,Default,,0000,0000,0000,,then each microphone is recording some overlapping combination of all the Dialogue: 0,1:14:01.84,1:14:04.92,Default,,0000,0000,0000,,things happening in my brain or in my body. Dialogue: 0,1:14:04.92,1:14:08.38,Default,,0000,0000,0000,,My brain has a number of different processes going on. My body's [inaudible] Dialogue: 0,1:14:08.38,1:14:10.52,Default,,0000,0000,0000,,going on, and Dialogue: 0,1:14:10.52,1:14:13.43,Default,,0000,0000,0000,,each electrode measures a sum Dialogue: 0,1:14:13.43,1:14:15.68,Default,,0000,0000,0000,,of the different voices in my brain. Dialogue: 0,1:14:15.68,1:14:19.79,Default,,0000,0000,0000,,That didn't quite come out the way I wanted it to. Dialogue: 0,1:14:19.79,1:14:21.53,Default,,0000,0000,0000,,So we can just take this data Dialogue: 0,1:14:21.53,1:14:25.40,Default,,0000,0000,0000,,and run ICA on it and find out one of the independent components, what the Dialogue: 0,1:14:25.40,1:14:26.13,Default,,0000,0000,0000,,independent Dialogue: 0,1:14:26.13,1:14:30.33,Default,,0000,0000,0000,,process are going on in my brain. This is an example of running ICA. Dialogue: 0,1:14:30.33,1:14:33.24,Default,,0000,0000,0000,,So you find that a small number of components, like those shown up there, Dialogue: 0,1:14:33.24,1:14:37.74,Default,,0000,0000,0000,,they correspond to heartbeat, where the arrows - so those are very periodic Dialogue: 0,1:14:37.74,1:14:42.33,Default,,0000,0000,0000,,signals. They come on occasionally and correspond to [inaudible] components of Dialogue: 0,1:14:42.33,1:14:43.05,Default,,0000,0000,0000,,heartbeat. Dialogue: 0,1:14:43.05,1:14:47.46,Default,,0000,0000,0000,,You also find things like an eye blink component, corresponding to a Dialogue: 0,1:14:47.46,1:14:49.78,Default,,0000,0000,0000,,sigmoid generated when you blink your eyes. Dialogue: 0,1:14:49.78,1:14:53.82,Default,,0000,0000,0000,,By doing this, you can then subtract out the heartbeat and the eye blink Dialogue: 0,1:14:53.82,1:14:56.18,Default,,0000,0000,0000,,artifacts from the data, and now Dialogue: 0,1:14:56.18,1:15:01.22,Default,,0000,0000,0000,,you get much cleaner ICA data - get much cleaner EEG readings. You can Dialogue: 0,1:15:01.22,1:15:03.70,Default,,0000,0000,0000,,do further scientific studies. So this is a Dialogue: 0,1:15:03.70,1:15:06.18,Default,,0000,0000,0000,,pretty commonly used preprocessing step Dialogue: 0,1:15:06.18,1:15:09.70,Default,,0000,0000,0000,,that is a common application of ICA. Dialogue: 0,1:15:09.70,1:15:13.03,Default,,0000,0000,0000,,[Inaudible] example is Dialogue: 0,1:15:13.03,1:15:16.30,Default,,0000,0000,0000,,the application, again, from [inaudible]. 
Dialogue: 0,1:15:16.30,1:15:20.90,Default,,0000,0000,0000,,This is a result of running ICA on small natural image patches. Suppose I take
Dialogue: 0,1:15:20.90,1:15:22.05,Default,,0000,0000,0000,,natural images
Dialogue: 0,1:15:22.05,1:15:25.91,Default,,0000,0000,0000,,and run ICA on the data and ask what the independent components of the data are.
Dialogue: 0,1:15:25.91,1:15:30.04,Default,,0000,0000,0000,,It turns out that these are the bases you get. So this is a plot of the
Dialogue: 0,1:15:30.04,1:15:32.53,Default,,0000,0000,0000,,sources you get.
Dialogue: 0,1:15:32.53,1:15:36.27,Default,,0000,0000,0000,,This algorithm is saying that a natural image patch,
Dialogue: 0,1:15:36.27,1:15:37.75,Default,,0000,0000,0000,,shown
Dialogue: 0,1:15:37.75,1:15:39.79,Default,,0000,0000,0000,,on the left,
Dialogue: 0,1:15:39.79,1:15:45.33,Default,,0000,0000,0000,,is often expressed as a sum, or a linear combination, of
Dialogue: 0,1:15:45.33,1:15:46.68,Default,,0000,0000,0000,,independent sources of
Dialogue: 0,1:15:46.68,1:15:48.16,Default,,0000,0000,0000,,things that make up images.
Dialogue: 0,1:15:48.16,1:15:52.78,Default,,0000,0000,0000,,So this models natural images as generated by independent objects
Dialogue: 0,1:15:52.78,1:15:55.34,Default,,0000,0000,0000,,that generate different edges in the image.
Dialogue: 0,1:15:55.34,1:16:01.26,Default,,0000,0000,0000,,One of the fascinating things about this is that, in neuroscience, this has also been
Dialogue: 0,1:16:01.26,1:16:04.79,Default,,0000,0000,0000,,hypothesized as a model for how the human brain processes image
Dialogue: 0,1:16:04.79,1:16:06.00,Default,,0000,0000,0000,,data. It
Dialogue: 0,1:16:06.00,1:16:10.14,Default,,0000,0000,0000,,turns out this is similar, in many ways, to computations
Dialogue: 0,1:16:10.14,1:16:15.08,Default,,0000,0000,0000,,happening in early visual processing in the human brain,
Dialogue: 0,1:16:15.08,1:16:17.66,Default,,0000,0000,0000,,in the mammalian
Dialogue: 0,1:16:17.66,1:16:19.80,Default,,0000,0000,0000,,brain. It's just
Dialogue: 0,1:16:19.80,1:16:25.26,Default,,0000,0000,0000,,interesting to see that edges are the independent components of images.
Dialogue: 0,1:16:25.26,1:16:30.64,Default,,0000,0000,0000,,Are there any quick questions, because I'm running late? Quick questions before I close? Interviewee: [Inaudible] square matrix? Instructor (Andrew
Dialogue: 0,1:16:30.64,1:16:31.93,Default,,0000,0000,0000,,Ng): Oh,
Dialogue: 0,1:16:31.93,1:16:35.41,Default,,0000,0000,0000,,yes. For the algorithms I describe, I assume A is a square matrix.
Dialogue: 0,1:16:35.41,1:16:38.59,Default,,0000,0000,0000,,It turns out if you have more microphones than speakers, you can also apply very
Dialogue: 0,1:16:38.59,1:16:39.61,Default,,0000,0000,0000,,similar algorithms. If
Dialogue: 0,1:16:39.61,1:16:43.92,Default,,0000,0000,0000,,you have fewer microphones than speakers, that's sort of an open research problem. The odds
Dialogue: 0,1:16:43.92,1:16:48.46,Default,,0000,0000,0000,,are that if you have one male and one female speaker, but one microphone, you can
Dialogue: 0,1:16:48.46,1:16:51.82,Default,,0000,0000,0000,,sometimes sort of separate them because one is high, one is low. If you have two
Dialogue: 0,1:16:51.82,1:16:55.46,Default,,0000,0000,0000,,male speakers or two female speakers, then it's beyond the state of the art now to separate them
Dialogue: 0,1:16:55.46,1:16:57.05,Default,,0000,0000,0000,,with one
Dialogue: 0,1:16:57.05,1:17:00.50,Default,,0000,0000,0000,,microphone. It's a great research problem. Okay.
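Connecting this back to the natural image example above, here is a hypothetical visualization snippet (the function name, patch_size, and grid dimensions are all assumptions, not from the lecture): each row of a learned unmixing matrix W is reshaped back into a patch-sized filter, which for small natural image patches typically looks like a localized edge detector.

```python
import matplotlib.pyplot as plt

def show_ica_filters(W, patch_size, n_rows=8, n_cols=8):
    """Reshape each row of W into a patch_size x patch_size filter and plot a grid."""
    _, axes = plt.subplots(n_rows, n_cols, figsize=(8, 8))
    for k, ax in enumerate(axes.ravel()):
        if k < W.shape[0]:
            ax.imshow(W[k].reshape(patch_size, patch_size), cmap="gray")
        ax.axis("off")
    plt.tight_layout()
    plt.show()

# e.g. show_ica_filters(W, patch_size=12) if each x(i) is a flattened 12x12 patch
```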
Dialogue: 0,1:17:00.50,1:17:04.87,Default,,0000,0000,0000,,Sorry about running late again. Let's close now, and we'll
Dialogue: 0,1:17:04.87,1:17:05.75,Default,,0000,0000,0000,,continue with reinforcement learning.