0:00:00.043,0:00:04.012 In this video, we'll see a demonstration of JSON data. 0:00:04.012,0:00:05.075 As a reminder, JSON stands for 0:00:05.075,0:00:08.002 JavaScript Object Notation, and 0:00:08.002,0:00:09.077 it's a standard for writing 0:00:09.077,0:00:13.081 data objects in a human-readable format, typically in a file. 0:00:13.081,0:00:16.007 It's useful for exchanging data 0:00:16.007,0:00:18.011 between programs, and generally 0:00:18.011,0:00:20.007 because it's quite flexible, it's useful 0:00:20.007,0:00:23.012 for representing and for storing data that's semi-structured. 0:00:23.012,0:00:24.082 As a reminder of the 0:00:24.082,0:00:26.087 basic constructs in JSON, we 0:00:26.087,0:00:28.003 have atomic values, such 0:00:28.003,0:00:30.032 as integers and strings and so on. 0:00:30.032,0:00:31.066 And then we have two types of 0:00:31.066,0:00:33.015 composite things; we have 0:00:33.015,0:00:34.002 objects that are sets of 0:00:34.002,0:00:38.048 label-value pairs and then we have arrays that are lists of values. 0:00:38.048,0:00:40.028 In the demonstration, we'll go through 0:00:40.028,0:00:41.086 in more detail the basic constructs 0:00:41.086,0:00:44.001 of JSON and we'll look at 0:00:44.001,0:00:46.034 syntactic correctness, we'll demonstrate 0:00:46.034,0:00:47.009 the flexibility of the data 0:00:47.009,0:00:49.018 model and then we'll 0:00:49.018,0:00:50.093 look briefly at JSON Schema, 0:00:50.093,0:00:52.081 not widely used yet but 0:00:52.081,0:00:54.029 still fairly interesting to look at 0:00:54.029,0:00:55.092 and we'll look at some validation 0:00:55.092,0:00:59.038 of JSON data against a particular schema. 0:00:59.038,0:01:00.062 So, here's the JSON 0:01:00.062,0:01:03.002 data that we're gonna be working with during this demo.
0:01:03.002,0:01:04.072 It's the same data that appeared 0:01:04.072,0:01:07.002 in the slides, in the introduction 0:01:07.002,0:01:08.038 to JSON, but now we're going 0:01:08.038,0:01:11.008 to look into the components of the data. 0:01:11.008,0:01:13.021 It's also, by the way, the 0:01:13.021,0:01:14.007 same example pretty much that we 0:01:14.007,0:01:17.007 used for XML; it's reformatted 0:01:17.007,0:01:18.055 of course to meet the JSON 0:01:18.055,0:01:22.008 data model, but you can compare the two directly. 0:01:22.008,0:01:23.064 Lastly, we do have 0:01:23.064,0:01:25.062 the file for the data on 0:01:25.062,0:01:26.071 the website, and I do 0:01:26.071,0:01:28.002 suggest that you download the 0:01:28.002,0:01:29.008 file so that you can 0:01:29.008,0:01:31.053 take a look at it closely on your own computer. 0:01:31.053,0:01:32.028 All right. 0:01:32.028,0:01:33.046 So, let's see what we have. 0:01:33.046,0:01:34.082 Right now we're in 0:01:34.082,0:01:36.066 an editor for JSON data. 0:01:36.066,0:01:38.008 It happens to be the Eclipse 0:01:38.008,0:01:38.088 editor and we're going to 0:01:38.088,0:01:39.094 make some edits to the 0:01:39.094,0:01:41.094 file after we look through 0:01:41.094,0:01:43.074 the constructs of the file. 0:01:43.074,0:01:45.069 So, this is JSON 0:01:45.069,0:01:48.012 data representing books and 0:01:48.012,0:01:49.065 magazines, and we have 0:01:49.065,0:01:52.039 a little more information about our books and our magazines. 0:01:52.039,0:01:53.092 So, at the outermost, the 0:01:53.092,0:01:57.039 curly brace indicates that this is a JSON object. 0:01:57.039,0:01:59.014 And as a reminder, an object 0:01:59.014,0:02:01.043 is a set of label-value 0:02:01.043,0:02:03.062 pairs, separated by commas. 0:02:03.062,0:02:07.007 So, our first value is the label "books".
And 0:02:07.007,0:02:09.066 then our first element in 0:02:09.066,0:02:11.056 the object is the label books 0:02:11.056,0:02:14.049 and this big value and the 0:02:14.049,0:02:16.004 second, so there's only two label-value 0:02:16.004,0:02:17.075 pairs here, is the 0:02:17.075,0:02:21.018 label magazines and this big value here. 0:02:21.018,0:02:23.091 And let's take a look first at magazines. 0:02:23.091,0:02:25.018 So magazines, again, is the 0:02:25.018,0:02:26.045 label and the value we 0:02:26.045,0:02:27.085 can see with the square 0:02:27.085,0:02:30.022 brackets here is an array. 0:02:30.022,0:02:31.062 An array is a list of 0:02:31.062,0:02:33.034 values and here we 0:02:33.034,0:02:35.008 have two values in our array. 0:02:35.008,0:02:37.002 They're still composite values. 0:02:37.002,0:02:38.059 So, we have two values, each 0:02:38.059,0:02:40.028 of which is an object, 0:02:40.028,0:02:42.007 a set of label-value pairs. 0:02:42.007,0:02:46.074 Let me mention, sometimes people call these labels 'properties', by the way. 0:02:46.074,0:02:48.028 Okay. So, now we are inside 0:02:48.028,0:02:49.007 our 2 objects that are 0:02:49.007,0:02:53.011 the 2 elements in the array that's the value of magazines. 0:02:53.011,0:02:54.008 And each one of those has 0:02:54.008,0:02:56.092 3 labels and 3 values. 0:02:56.092,0:02:58.086 And now we're finally down to the base values. 0:02:58.086,0:03:00.004 So, we have the title being "National 0:03:00.004,0:03:02.026 Geographic", a string, the 0:03:02.026,0:03:04.005 month being January, a string 0:03:04.005,0:03:06.033 and the year 2009, where 2009 is an integer. 0:03:06.033,0:03:08.065 And again, we have 0:03:08.065,0:03:12.014 another object here that's a different magazine 0:03:12.014,0:03:15.027 with a different name, month and happens to be the same year. 
0:03:15.027,0:03:16.067 Now, these two have exactly the 0:03:16.067,0:03:18.013 same structure but they don't 0:03:18.013,0:03:19.038 have to and we will 0:03:19.038,0:03:21.099 see that as we start editing the file. 0:03:21.099,0:03:23.019 But before we edit the file, 0:03:23.019,0:03:24.059 let's go and look at 0:03:24.059,0:03:26.014 our books here. 0:03:26.014,0:03:28.055 The value of our other 0:03:28.055,0:03:30.011 label-value pair inside the 0:03:30.011,0:03:32.001 outermost object, "books", is 0:03:32.001,0:03:34.027 also an array, and 0:03:34.027,0:03:35.087 the array in this case also 0:03:35.087,0:03:38.095 has just two elements, so we've represented two books here. 0:03:38.095,0:03:40.039 It's a little more complicated than the 0:03:40.039,0:03:42.056 magazines, but those elements 0:03:42.056,0:03:45.072 are still objects that are sets of label-value pairs. 0:03:45.072,0:03:47.044 So, we have now the ISBN, 0:03:47.044,0:03:49.007 the price, the edition, the title, 0:03:49.007,0:03:51.097 all either integers or strings, 0:03:51.097,0:03:54.075 and then we have one nested composite 0:03:54.075,0:03:56.021 value, which is the authors, 0:03:56.021,0:03:57.081 and that's an array again. 0:03:57.081,0:04:02.029 So, the array, again, is indicated by the square brackets. 0:04:02.029,0:04:04.003 And inside this array, we 0:04:04.003,0:04:06.013 have two authors and each 0:04:06.013,0:04:07.036 of the authors has a first 0:04:07.036,0:04:08.085 name and a last name, 0:04:08.085,0:04:10.038 but again, that uniformity is 0:04:10.038,0:04:13.031 not required by the model itself, as we'll see. 0:04:13.031,0:04:15.037 So, as I mentioned, 0:04:15.037,0:04:16.073 this is actually an editor for 0:04:16.073,0:04:19.028 JSON data and we're going to come back to this editor in a moment.
0:04:19.028,0:04:20.003 But what I wanted to do is 0:04:20.003,0:04:22.006 show the same data 0:04:22.006,0:04:23.069 in a browser because browsers 0:04:23.069,0:04:25.038 actually offer some nice features 0:04:25.038,0:04:27.086 for navigating in JSON. 0:04:27.086,0:04:28.068 So here we are in the 0:04:28.068,0:04:30.029 Chrome browser, which has nice 0:04:30.029,0:04:32.072 features for navigating JSON, 0:04:32.072,0:04:34.057 and other browsers do as well. 0:04:34.057,0:04:35.099 We can see here again that we 0:04:35.099,0:04:37.053 have an object in 0:04:37.053,0:04:39.002 our JSON data that consists 0:04:39.002,0:04:40.082 of two label-value pairs: 0:04:40.082,0:04:42.073 books and magazines, which are 0:04:42.073,0:04:43.062 currently closed, and then 0:04:43.062,0:04:47.016 this plus allows us to open them up, and see the structure. 0:04:47.016,0:04:48.082 For example, we open magazines 0:04:48.082,0:04:52.021 and we see that magazines is an array containing two objects. 0:04:52.021,0:04:53.004 We can open one of those 0:04:53.004,0:04:55.094 objects, and see the three label-value pairs. 0:04:55.094,0:04:59.059 Now we're at the lowest levels and similarly for the other object. 0:04:59.059,0:05:00.072 We can see here that Books 0:05:00.072,0:05:03.007 is also an array, and we go ahead and open it up. 0:05:03.007,0:05:05.011 It's an array of two objects. 0:05:05.011,0:05:06.004 We open one of those 0:05:06.004,0:05:07.088 objects and we see again 0:05:07.088,0:05:09.061 the set of label-value pairs, 0:05:09.061,0:05:10.008 where one of the values 0:05:10.008,0:05:12.061 is a further nesting. 0:05:12.061,0:05:14.044 It's an array and we open 0:05:14.044,0:05:15.045 that array, and we see 0:05:15.045,0:05:16.069 two objects, and we open 0:05:16.069,0:05:19.042 them and finally see the data at the lowest levels.
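The nesting explored above can also be navigated programmatically. The following is a minimal sketch in Python, not the course's actual file: the label spellings (`isbn`, `first_name`, and so on) and most values are assumptions made for illustration, and the document is abbreviated to one book and one magazine. Only the magazine title, month, and year come from the demonstration.

```python
import json

# Illustrative stand-in for the bookstore file. Label spellings and most
# values are assumptions; the real file has two books and two magazines.
bookstore_text = """
{
  "books": [
    {
      "isbn": "ISBN-0-00-000000-0",
      "price": 100,
      "edition": 1,
      "title": "An Example Book",
      "authors": [
        {"first_name": "Jeffrey", "last_name": "Ullman"},
        {"first_name": "Example", "last_name": "Author"}
      ]
    }
  ],
  "magazines": [
    {"title": "National Geographic", "month": "January", "year": 2009}
  ]
}
"""

bookstore = json.loads(bookstore_text)

# JSON objects become dicts, arrays become lists, atomic values become
# Python strings and integers.
print(type(bookstore).__name__)
print(type(bookstore["magazines"]).__name__)

# Opening up "magazines": an array whose elements are objects.
first_magazine = bookstore["magazines"][0]
print(first_magazine["title"], first_magazine["year"])

# Opening up "books": the authors value is a further nested array of objects.
first_author = bookstore["books"][0]["authors"][0]
print(first_author["last_name"])
```

Each indexing step corresponds to one click on the plus sign in the browser view.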
0:05:19.042,0:05:21.048 So again, the browser 0:05:21.048,0:05:22.065 here gives us a nice way 0:05:22.065,0:05:26.091 to navigate the JSON data and see its structure. 0:05:26.091,0:05:28.099 So now we're back to our JSON editor. 0:05:28.099,0:05:30.075 By the way, this editor, Eclipse, does 0:05:30.075,0:05:32.058 also have some features for 0:05:32.058,0:05:34.015 opening and closing the structure 0:05:34.015,0:05:35.096 of the data, but it's 0:05:35.096,0:05:38.013 not quite as nice as the browser that we use. 0:05:38.013,0:05:39.082 So we decided to use the browser instead. 0:05:39.082,0:05:40.009 What we are going to 0:05:40.009,0:05:42.034 use the editor for is to 0:05:42.034,0:05:43.049 make some changes to the 0:05:43.049,0:05:44.084 JSON data and see which 0:05:44.084,0:05:47.058 changes are legal and which aren't. 0:05:47.058,0:05:50.043 So, let's take a look at the first change, a very simple one. 0:05:50.043,0:05:52.001 What if we forgot a comma. 0:05:52.001,0:05:53.032 Well, when we try to 0:05:53.032,0:05:54.043 save that file, we get a 0:05:54.043,0:05:55.051 little notice that we have an 0:05:55.051,0:05:56.093 error, we expected an 0:05:56.093,0:05:58.085 N value, so that's a 0:05:58.085,0:06:02.066 pretty straightforward mistake, let's put that comma back. 0:06:02.066,0:06:04.068 Let's say insert an 0:06:04.068,0:06:07.056 extra brace somewhere here, for whatever reason. 0:06:07.056,0:06:09.079 We accidentally put in an extra brace. 0:06:09.079,0:06:13.016 Again we see that that's marked as an error. 0:06:13.016,0:06:13.099 So an error that can 0:06:13.099,0:06:15.033 be fairly common to make is 0:06:15.033,0:06:18.009 to forget to put quotes around strings. 0:06:18.009,0:06:20.017 So, for example, this ISBN 0:06:20.017,0:06:23.065 number here, if we don't quote it, we're gonna get an error. 
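The three mistakes just shown (a missing comma, an extra brace, an unquoted string) can be reproduced with any JSON parser. Here is a small sketch using Python's standard `json` module rather than the Eclipse editor from the demo; the snippets and the helper name `parse_error` are made up for illustration, and the error wording will differ from what the editor displays.

```python
import json

def parse_error(text):
    """Return the parser's error message, or None if text is valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        return err.msg

# Hypothetical snippets, each with one of the mistakes from the demo.
missing_comma = '{"title": "A Book" "price": 100}'
extra_brace = '{"title": "A Book"}}'
unquoted_string = '{"isbn": ISBN-123, "price": 100}'
well_formed = '{"isbn": "ISBN-123", "price": 100}'

for name, text in [
    ("missing comma", missing_comma),
    ("extra brace", extra_brace),
    ("unquoted string", unquoted_string),
    ("well formed", well_formed),
]:
    print(name, "->", parse_error(text) or "valid")
```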
0:06:23.065,0:06:24.077 As we'll see, the only things that can 0:06:24.077,0:06:27.003 be unquoted are numbers and 0:06:27.003,0:06:29.006 the values null, true and false. 0:06:29.006,0:06:31.066 So, let's put our quotes back there. 0:06:31.066,0:06:33.016 Now, actually, even more 0:06:33.016,0:06:34.044 common is to forget to 0:06:34.044,0:06:37.071 put quotes around the labels in label-value pairs. 0:06:37.071,0:06:40.077 But if we forget to quote that, that's going to be an error as well. 0:06:40.077,0:06:41.079 You might have noticed, by the 0:06:41.079,0:06:43.051 way, when we used the browser 0:06:43.051,0:06:44.093 that the browser didn't even show 0:06:44.093,0:06:46.008 us the quotes in the labels. 0:06:46.008,0:06:47.097 But when you make 0:06:47.097,0:06:51.071 the raw JSON data, you do need to include those quotes. 0:06:51.071,0:06:56.021 Speaking of quotes, what if we quoted our price here? 0:06:56.021,0:06:57.019 Well, that's actually not an 0:06:57.019,0:06:58.067 error, because now we've simply turned 0:06:58.067,0:07:00.057 price into a string, and 0:07:00.057,0:07:03.031 string values are perfectly well allowed anywhere. 0:07:03.031,0:07:04.048 Now we'll see when we use 0:07:04.048,0:07:05.059 JSON Schema that we 0:07:05.059,0:07:07.018 can make restrictions that don't allow 0:07:07.018,0:07:08.089 strings in certain places, but 0:07:08.089,0:07:10.007 just for syntactic correctness of 0:07:10.007,0:07:15.038 JSON data, any of our values can be strings. 0:07:15.038,0:07:16.058 Now, as I mentioned, there are 0:07:16.058,0:07:17.096 a few values that are 0:07:17.096,0:07:20.045 sort of reserved words in JSON. 0:07:20.045,0:07:22.039 For example, true is a 0:07:22.039,0:07:24.042 reserved word for a Boolean value. 0:07:24.042,0:07:25.076 That means we don't need to 0:07:25.076,0:07:27.019 quote it because it's actually 0:07:27.019,0:07:28.088 its own special type of value. 0:07:28.088,0:07:30.039 And so is false.
0:07:30.039,0:07:32.027 And the third one is null, 0:07:32.027,0:07:35.003 so there's a built-in concept of null. 0:07:35.003,0:07:36.063 Now, if we wanted to 0:07:36.063,0:07:38.016 use nil for whatever reason 0:07:38.016,0:07:39.065 instead of null, well, now 0:07:39.065,0:07:40.007 we're going to get an error because 0:07:40.007,0:07:42.014 nil is not a reserved word, 0:07:42.014,0:07:43.053 and if we really wanted nil 0:07:43.053,0:07:47.088 then we would need to actually make it a quoted string. 0:07:47.088,0:07:50.018 Now, let's take a look inside our author list. 0:07:50.018,0:07:51.024 And I'm going to show you 0:07:51.024,0:07:52.009 that arrays do not have 0:07:52.009,0:07:54.007 to have the same type of 0:07:54.007,0:07:56.035 value for every element in the array. 0:07:56.035,0:07:58.004 So here we have a homogeneous 0:07:58.004,0:07:59.067 list of authors. Both of them 0:07:59.067,0:08:01.017 are objects with a first 0:08:01.017,0:08:02.022 name and a last name as 0:08:02.022,0:08:04.026 separate label-value pairs, 0:08:04.026,0:08:05.009 but if I change that 0:08:05.009,0:08:07.098 first one, the entire value 0:08:07.098,0:08:09.053 to be, instead of a 0:08:09.053,0:08:11.033 composite one, simply the string, 0:08:11.033,0:08:13.027 Jefferey Ullman. Oops, sorry 0:08:13.027,0:08:15.067 about my typing there, and that 0:08:15.067,0:08:17.002 is not an error, it 0:08:17.002,0:08:18.083 is allowed to have a string, 0:08:18.083,0:08:20.002 and then a composite object. 0:08:20.002,0:08:22.092 And we could even have an array, and anything we want. 0:08:22.092,0:08:24.003 In an array, when you 0:08:24.003,0:08:25.037 have a list of values, all 0:08:25.037,0:08:26.066 you need is for each one 0:08:26.066,0:08:30.022 to be syntactically a correct value in JSON. 0:08:30.022,0:08:32.000 Now let's go visit our magazines 0:08:32.000,0:08:33.008 for a moment here and let 0:08:33.008,0:08:35.074 me show that empty objects are okay. 
0:08:35.074,0:08:37.079 So a list of label-value 0:08:37.079,0:08:41.046 pairs, comprising an object, can be the empty list. 0:08:41.046,0:08:42.097 And so now I've turned this magazine 0:08:42.097,0:08:44.055 into having no information about 0:08:44.055,0:08:46.042 it, but that is legal in JSON. 0:08:46.042,0:08:50.091 And similarly, arrays are allowed to be of zero length. 0:08:50.091,0:08:52.036 So I can take these authors 0:08:52.036,0:08:53.043 here and I can just take 0:08:53.043,0:08:54.081 out all of the authors, and 0:08:54.081,0:08:58.004 make that an empty list, but that's still valid JSON. 0:08:58.004,0:09:01.093 Now, what if I took this array out altogether? 0:09:01.093,0:09:02.098 In that case, now we 0:09:02.098,0:09:04.036 have an error because this is 0:09:04.036,0:09:05.067 an object where we have 0:09:05.067,0:09:08.001 label-value pairs and every 0:09:08.001,0:09:09.029 label-value pair has to 0:09:09.029,0:09:12.003 have both a label and a value. 0:09:12.003,0:09:13.091 So let's put our array back 0:09:13.091,0:09:15.001 and we can have anything in 0:09:15.001,0:09:16.038 there so let's just make it 0:09:16.038,0:09:19.016 "foo" and that corrects the error. 0:09:19.016,0:09:20.003 What if we didn't want an 0:09:20.003,0:09:21.049 array here and instead we 0:09:21.049,0:09:24.064 tried to make it, say, an object? 0:09:24.064,0:09:26.008 Well, we're going to see an 0:09:26.008,0:09:28.001 error there, because an object, 0:09:28.001,0:09:29.029 as a reminder, and this is an 0:09:29.029,0:09:30.082 easy mistake to make: objects 0:09:30.082,0:09:33.029 are always label-value pairs. 0:09:33.029,0:09:34.006 So if you want just a value, 0:09:34.006,0:09:36.014 that should be an array; if 0:09:36.014,0:09:37.061 you want an object, then we're 0:09:37.061,0:09:39.008 talking about a label-value pair, so 0:09:39.008,0:09:40.024 we can just add "foo" as 0:09:40.024,0:09:42.084 our value, and then we're all set.
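The edits tried so far can be summarized as a table of snippets and whether each one parses. This is a sketch using Python's standard `json` module as the syntax checker, which is an assumption standing in for the editor used in the demo; the label names and the placeholder string `foo` are illustrative.

```python
import json

def parses(text):
    """Return True if text is syntactically valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# (snippet, expected to parse?) pairs mirroring the edits in the demo.
checks = [
    ('{price: 100}', False),                       # labels must be quoted
    ('{"price": "100"}', True),                    # a quoted price is just a string
    ('{"a": true, "b": false, "c": null}', True),  # the three reserved words
    ('{"c": nil}', False),                         # nil is not a reserved word
    ('{"c": "nil"}', True),                        # ...but it is fine as a string
    ('{"authors": ["Jeffrey Ullman", {"last_name": "X"}, [1, 2]]}', True),  # mixed array
    ('{"magazines": [{}]}', True),                 # empty object
    ('{"authors": []}', True),                     # empty array
    ('{"authors": }', False),                      # label without a value
    ('{"authors": {"foo"}}', False),               # object holding a bare value
    ('{"authors": {"foo": "foo"}}', True),         # object holding a label-value pair
]

for text, expected in checks:
    result = parses(text)
    print("valid  " if result else "error  ", text)
    assert result == expected, text
```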
0:09:42.084,0:09:46.004 So what we've seen so far is syntactic correctness. 0:09:46.004,0:09:48.005 Again, there's no required 0:09:48.005,0:09:50.073 uniformity across values in 0:09:50.073,0:09:52.095 arrays or in the 0:09:52.095,0:09:55.011 label-value pairs in objects; we 0:09:55.011,0:09:56.052 just need to ensure that 0:09:56.052,0:09:57.078 all of our values, our basic 0:09:57.078,0:09:59.022 values, are of the right types, 0:09:59.022,0:10:00.045 and things like our commas and 0:10:00.045,0:10:02.088 curly braces are all in place. 0:10:02.088,0:10:04.023 What we're gonna do next is look 0:10:04.023,0:10:05.077 at JSON Schema, where we 0:10:05.077,0:10:08.008 have a mechanism for enforcing certain 0:10:08.008,0:10:11.083 constraints beyond simple syntactic correctness. 0:10:11.083,0:10:13.001 If you've been very observant, you 0:10:13.001,0:10:14.026 might even have noticed that we 0:10:14.026,0:10:15.047 have a second tab up 0:10:15.047,0:10:17.027 here in our editor for a 0:10:17.027,0:10:18.096 second JSON file, and this file 0:10:18.096,0:10:20.034 is going to be the schema 0:10:20.034,0:10:22.077 for our bookstore data. We're using 0:10:22.077,0:10:25.013 JSON Schema, and JSON 0:10:25.013,0:10:27.008 Schema, like XML Schema, 0:10:27.008,0:10:29.004 is expressed in the data model itself. 0:10:29.004,0:10:31.031 So, our schema description for 0:10:31.031,0:10:33.018 this JSON data is itself 0:10:33.018,0:10:35.037 JSON data, and here it is. 0:10:35.037,0:10:37.011 And it's going to take a bit of time to explain. 0:10:37.011,0:10:37.098 Now the first thing that you might 0:10:37.098,0:10:39.042 notice is wow, the schema 0:10:39.042,0:10:41.005 looks more complicated and in 0:10:41.005,0:10:43.049 fact longer than the data itself. 0:10:43.049,0:10:47.008 Well, that is true, but that's mostly because our data file is tiny.
0:10:47.008,0:10:49.023 So, if we had thousands, you know, tens 0:10:49.023,0:10:51.046 of thousands of books and magazines, 0:10:51.046,0:10:53.014 our schema file wouldn't 0:10:53.014,0:10:54.031 change, but our data file would 0:10:54.031,0:10:57.058 be much longer and that's the typical case, in reality. 0:10:57.058,0:10:58.079 Now, this video is not a 0:10:58.079,0:11:01.003 complete tutorial about JSON Schema. 0:11:01.003,0:11:02.075 There are many constructs in JSON 0:11:02.075,0:11:04.056 Schema that weren't needed to 0:11:04.056,0:11:06.076 describe the bookstore data, for example. 0:11:06.076,0:11:08.053 And even this file here, 0:11:08.053,0:11:11.057 I'm not gonna go through every detail of it right here. 0:11:11.057,0:11:12.072 You can download the file and 0:11:12.072,0:11:15.023 take a look, read a little more about JSON Schema. 0:11:15.023,0:11:16.032 I'm just going to give the 0:11:16.032,0:11:17.074 flavor of the schema 0:11:17.074,0:11:19.029 specification and then we're 0:11:19.029,0:11:20.087 going to work with validating the data 0:11:20.087,0:11:24.005 itself to see how the schema and data work together. 0:11:24.005,0:11:28.002 But to give you the flavor here, let's go through at least some portions of the schema. 0:11:28.002,0:11:29.056 So, in some sense, 0:11:29.056,0:11:31.062 the structure of the schema file 0:11:31.062,0:11:34.055 reflects the structure of the data file that it's describing. 0:11:34.055,0:11:37.018 So, the outermost constructs in 0:11:37.018,0:11:38.033 the schema file are the 0:11:38.033,0:11:39.056 outermost in the data file and 0:11:39.056,0:11:42.046 as we nest, it parallels the nesting. 0:11:42.046,0:11:43.068 Let me just show a little 0:11:43.068,0:11:48.047 bit here; we'll probably look at most of it in the context of validation. 0:11:48.047,0:11:52.068 So, we see here that our outermost construct in our data file is an object.
0:11:52.068,0:11:53.078 And that's told to us, 0:11:53.078,0:11:55.008 because we have "type" as 0:11:55.008,0:11:57.027 one of our built-in labels for the schema. 0:11:57.027,0:11:58.054 So we have an 0:11:58.054,0:12:00.061 object with two properties, as 0:12:00.061,0:12:02.063 we can see here, the books property 0:12:02.063,0:12:04.065 and the magazines property. 0:12:04.065,0:12:05.077 And I use the word 0:12:05.077,0:12:07.088 "labels" frequently for label-value 0:12:07.088,0:12:11.064 pairs; that's synonymous with property-value pairs. 0:12:11.064,0:12:13.055 Then inside the books property, 0:12:13.055,0:12:15.001 for example, we see that 0:12:15.001,0:12:16.055 the type of that is array, 0:12:16.055,0:12:19.025 so we've got a label-value pair where the value is an array. 0:12:19.025,0:12:22.042 And then we follow the nesting and see that it's an array of objects. 0:12:22.042,0:12:24.016 And we go further down and we 0:12:24.016,0:12:26.001 see the different label-value pairs 0:12:26.001,0:12:27.051 of the objects that make up 0:12:27.051,0:12:31.000 the books and nesting further into the authors and so on. 0:12:31.000,0:12:32.096 We see similarly for magazines 0:12:32.096,0:12:34.077 that the value of the 0:12:34.077,0:12:36.023 label-value pair for 0:12:36.023,0:12:37.094 magazines is an array, and 0:12:37.094,0:12:41.003 that array consists of objects with further nesting. 0:12:41.003,0:12:42.015 So what we're looking at here is 0:12:42.015,0:12:45.055 an online JSON Schema validator. We have two windows. 0:12:45.055,0:12:46.054 On the left we have our 0:12:46.054,0:12:47.065 schema and on the 0:12:47.065,0:12:49.007 right we have our data, and 0:12:49.007,0:12:50.059 this is exactly the same data 0:12:50.059,0:12:54.001 file and schema file that we were looking at earlier. 0:12:54.001,0:12:55.071 If we hit the validate button, 0:12:55.071,0:12:58.015 hopefully everything should work, and it does.
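To make the parallel between schema nesting and data nesting concrete, here is a minimal sketch in Python. It is not the course's schema file and not a real JSON Schema validator: it handles only the `type`, `properties`, and `items` keywords, the helper names `has_type` and `check_types` are invented for this example, and the lowercase label spellings are assumptions.

```python
def has_type(value, type_name):
    """Check a value against a JSON Schema type name (subset only)."""
    if type_name == "object":
        return isinstance(value, dict)
    if type_name == "array":
        return isinstance(value, list)
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        # Python treats True/False as ints, so exclude bool explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError("type not handled in this sketch: " + type_name)

def check_types(value, schema):
    """Recursively check only "type", "properties" and "items"."""
    if "type" in schema and not has_type(value, schema["type"]):
        return False
    if isinstance(value, dict):
        for label, subschema in schema.get("properties", {}).items():
            if label in value and not check_types(value[label], subschema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(check_types(item, schema["items"]) for item in value)
    return True

# The schema nests exactly as the data does: object -> array -> object.
magazines_schema = {
    "type": "object",
    "properties": {
        "magazines": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "month": {"type": "string"},
                    "year": {"type": "integer"},
                },
            },
        }
    },
}

good = {"magazines": [{"title": "National Geographic", "month": "January", "year": 2009}]}
bad = {"magazines": [{"title": "National Geographic", "month": "January", "year": "2009"}]}

print(check_types(good, magazines_schema))
print(check_types(bad, magazines_schema))
```

A full validator supports many more keywords than this sketch does.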
0:12:58.015,0:12:59.029 This tells us that the 0:12:59.029,0:13:03.001 JSON data is valid with respect to the schema. 0:13:03.001,0:13:04.059 Now, this system will of 0:13:04.059,0:13:06.047 course find basic syntactic errors, 0:13:06.047,0:13:07.076 so I can take away a comma 0:13:07.076,0:13:09.003 just like I did before and 0:13:09.003,0:13:10.038 when I validate I'll get a 0:13:10.038,0:13:13.028 parsing error that really has nothing to do with the schema. 0:13:13.028,0:13:14.002 What I'm going to focus on 0:13:14.002,0:13:16.049 now is actually validating 0:13:16.049,0:13:18.044 semantic correctness of the JSON 0:13:18.044,0:13:19.075 with respect to the constructs 0:13:19.075,0:13:21.039 that we've specified in this schema. 0:13:21.039,0:13:25.033 Let me first put that comma back so we start with a valid file. 0:13:25.033,0:13:26.077 So, the first thing I'll show is 0:13:26.077,0:13:28.041 the ability to constrain basic 0:13:28.041,0:13:29.067 types, and then the ability 0:13:29.067,0:13:32.018 to constrain the range of values of those basic types. 0:13:32.018,0:13:34.047 And let's focus on price. 0:13:34.047,0:13:35.056 So here we're talking about the 0:13:35.056,0:13:37.076 price property inside books and 0:13:37.076,0:13:39.058 we specify in our schema 0:13:39.058,0:13:42.005 that the type of the price must be an integer. 0:13:42.005,0:13:44.001 So, for example, if our 0:13:44.001,0:13:46.022 price were instead a string 0:13:46.022,0:13:47.026 and we went ahead and tried 0:13:47.026,0:13:49.095 to validate that, we would get an error. 0:13:49.095,0:13:51.003 Let's make it back into an 0:13:51.003,0:13:53.068 integer but let's make 0:13:53.068,0:13:56.006 it into the integer 300 now instead of 100. 0:13:56.006,0:13:58.061 And why am I doing that? 0:13:58.061,0:14:00.041 Because JSON Schema also 0:14:00.041,0:14:01.097 lets me constrain the range of 0:14:01.097,0:14:05.014 values that are allowed if we have a numeric value.
0:14:05.014,0:14:06.066 So, not only in price did I 0:14:06.066,0:14:08.026 say that it's an integer but 0:14:08.026,0:14:09.046 I also said that it 0:14:09.046,0:14:11.006 has a minimum and maximum value: 0:14:11.006,0:14:13.025 the integer price must 0:14:13.025,0:14:15.043 be between 0 and 200. 0:14:15.043,0:14:16.037 So, if I try to make 0:14:16.037,0:14:18.024 the price 300, and I 0:14:18.024,0:14:20.004 validate, I'm again getting an error. 0:14:20.004,0:14:21.091 Now it's not a type error, 0:14:21.091,0:14:23.002 but it's an error that my 0:14:23.002,0:14:26.034 integer was outside of the allowed range. 0:14:26.034,0:14:27.048 I've put the price back to 0:14:27.048,0:14:28.081 a hundred, and now let's 0:14:28.081,0:14:32.016 look at constraints on string values. 0:14:32.016,0:14:33.054 JSON Schema actually has 0:14:33.054,0:14:35.016 a little pattern-matching language that 0:14:35.016,0:14:36.063 can be used to constrain the 0:14:36.063,0:14:40.023 allowable strings for a specific type of value. 0:14:40.023,0:14:43.038 We'll look at the ISBN number here as an example of that. 0:14:43.038,0:14:45.024 We've said that ISBN is 0:14:45.024,0:14:47.008 of type string, and then 0:14:47.008,0:14:48.056 we've further constrained in the 0:14:48.056,0:14:50.005 schema that the string values for 0:14:50.005,0:14:52.072 ISBN must satisfy a certain pattern. 0:14:52.072,0:14:56.021 I'm not gonna go into the details of this pattern-matching language. 0:14:56.021,0:14:57.068 I'm just gonna give an example. 0:14:57.068,0:14:59.014 And in fact, this entire demo is 0:14:59.014,0:15:00.062 really just an example; there are lots of 0:15:00.062,0:15:03.043 things in JSON Schema that we're not seeing. 0:15:03.043,0:15:05.035 What this pattern here says is 0:15:05.035,0:15:06.098 that the string value for 0:15:06.098,0:15:08.092 ISBN must start with 0:15:08.092,0:15:13.008 the four characters ISBN and then can be followed by anything else.
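The price and ISBN constraints just described can be sketched as two small checks in Python. This is an illustration, not the validator used in the demo: the function names are invented, and the regular expression `^ISBN` is an assumption that merely captures "starts with the four characters ISBN", since the schema's actual pattern is not shown here. The keywords `minimum`, `maximum`, and `pattern` are JSON Schema's own names.

```python
import re

# Schema fragments as Python dicts. The bounds 0 and 200 come from the demo;
# the pattern string is an assumed stand-in for the one in the real schema.
price_schema = {"type": "integer", "minimum": 0, "maximum": 200}
isbn_schema = {"type": "string", "pattern": "^ISBN"}

def check_price(value, schema):
    """Return "ok", "type error", or "out of range" for a price value."""
    if not isinstance(value, int) or isinstance(value, bool):
        return "type error"
    if value < schema["minimum"] or value > schema["maximum"]:
        return "out of range"
    return "ok"

def check_isbn(value, schema):
    """Return "ok", "type error", or "pattern mismatch" for an ISBN value."""
    if not isinstance(value, str):
        return "type error"
    # JSON Schema patterns match anywhere in the string unless anchored,
    # which is why re.search is used together with an explicit ^ anchor.
    if re.search(schema["pattern"], value) is None:
        return "pattern mismatch"
    return "ok"

print(check_price(100, price_schema))
print(check_price("100", price_schema))
print(check_price(300, price_schema))
print(check_isbn("ISBN-0-00-000000-0", isbn_schema))
print(check_isbn("SBN-0-00-000000-0", isbn_schema))
```

The two failure modes for price are reported separately, matching the distinction drawn above between a type error and a range error.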
0:15:13.008,0:15:14.011 So, if we go over to our 0:15:14.011,0:15:15.036 data and we look at 0:15:15.036,0:15:17.006 the ISBN number here and 0:15:17.006,0:15:18.003 say we have a typo, we 0:15:18.003,0:15:20.079 forgot the "I", and we try to validate. 0:15:20.079,0:15:22.042 Then we'll see that our data 0:15:22.042,0:15:25.074 no longer matches our schema specification. 0:15:25.074,0:15:29.045 Now let's look at some other constraints we can specify in JSON Schema. 0:15:29.045,0:15:32.015 We can constrain the number of elements in an array. 0:15:32.015,0:15:35.004 We can give a minimum or maximum or both. 0:15:35.004,0:15:38.012 And I've done that here in the context of the authors array. 0:15:38.012,0:15:39.069 Remember, the authors are 0:15:39.069,0:15:40.073 an array that's a list of 0:15:40.073,0:15:42.089 objects and here I've said that 0:15:42.089,0:15:44.008 we have a minimum number of 0:15:44.008,0:15:45.043 items of 1 and a 0:15:45.043,0:15:46.075 maximum number of items of 10. 0:15:46.075,0:15:48.044 In other words, every book 0:15:48.044,0:15:51.042 has to have between one and ten authors. 0:15:51.042,0:15:53.019 So let's try, for example, 0:15:53.019,0:15:56.018 taking out all of our authors here in our first book. 0:15:56.018,0:15:57.085 We actually looked at this before in terms 0:15:57.085,0:15:59.027 of syntactic validity, and it 0:15:59.027,0:16:01.045 was perfectly valid to have an empty array. 0:16:01.045,0:16:02.086 But when we try to validate 0:16:02.086,0:16:03.088 now we do get an 0:16:03.088,0:16:05.014 error, and the reason is 0:16:05.014,0:16:06.024 that we said that we needed 0:16:06.024,0:16:10.046 between one and ten array elements in the case of authors. 0:16:10.046,0:16:12.002 Now let's fix that, 0:16:12.002,0:16:13.054 not by putting our authors back 0:16:13.054,0:16:14.091 but let's say we actually decide 0:16:14.091,0:16:17.063 we would like to be able to have books that have no authors.
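The item-count constraint on the authors array can be sketched as follows. `minItems` and `maxItems` are the JSON Schema keyword names; the function `check_item_count` and the sample author objects are hypothetical and only illustrate the rule.

```python
def check_item_count(array, schema):
    """Check only the minItems/maxItems keywords; either may be absent."""
    if "minItems" in schema and len(array) < schema["minItems"]:
        return False
    if "maxItems" in schema and len(array) > schema["maxItems"]:
        return False
    return True

# Between one and ten authors, as stated in the demo's schema.
authors_schema = {"type": "array", "minItems": 1, "maxItems": 10}

two_authors = [
    {"first_name": "Example", "last_name": "One"},
    {"first_name": "Example", "last_name": "Two"},
]
no_authors = []

print(check_item_count(two_authors, authors_schema))
print(check_item_count(no_authors, authors_schema))

# Relaxing the lower bound, or dropping it entirely, admits the empty array.
relaxed_schema = {"type": "array", "minItems": 0, "maxItems": 10}
unbounded_below = {"type": "array", "maxItems": 10}
print(check_item_count(no_authors, relaxed_schema))
print(check_item_count(no_authors, unbounded_below))
```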
0:16:17.063,0:16:19.007 So, we can simply fix 0:16:19.007,0:16:21.004 that by changing that minimum 0:16:21.004,0:16:23.004 item to zero and that 0:16:23.004,0:16:24.095 makes our data valid again and 0:16:24.095,0:16:26.027 in fact, we could actually take that 0:16:26.027,0:16:28.047 minimum constraint out altogether, 0:16:28.047,0:16:32.082 and if we do that our data is still going to be valid. 0:16:32.082,0:16:33.007 Now let's see what happens when we 0:16:33.007,0:16:36.092 add something to our data that isn't mentioned in the schema. 0:16:36.092,0:16:38.027 If you look carefully you'll see 0:16:38.027,0:16:39.044 that everything that we have 0:16:39.044,0:16:42.021 in the data so far has been specified in the schema. 0:16:42.021,0:16:43.083 Let's say we come along 0:16:43.083,0:16:46.033 and decide we're gonna also have ratings for our books. 0:16:46.033,0:16:47.093 So let's add here a 0:16:47.093,0:16:51.085 rating label property with the value 5. 0:16:51.085,0:16:53.061 We go ahead and validate; you 0:16:53.061,0:16:54.066 probably think it's not going to 0:16:54.066,0:16:57.036 validate properly but actually it did. 0:16:57.036,0:16:59.027 The definition of JSON 0:16:59.027,0:17:00.092 Schema is that it can constrain things by 0:17:00.092,0:17:02.034 describing them but you 0:17:02.034,0:17:04.008 can also have components in 0:17:04.008,0:17:06.007 the data that aren't present in this schema. 0:17:06.007,0:17:08.027 If we want to insist 0:17:08.027,0:17:10.052 that every property that is 0:17:10.052,0:17:11.067 present in the data is 0:17:11.067,0:17:12.009 also described in this 0:17:12.009,0:17:14.003 schema, then we can 0:17:14.003,0:17:17.012 actually add a constraint to the schema that tells us that. 0:17:17.012,0:17:20.074 Specifically, under the object 0:17:20.074,0:17:22.039 here, we can put in 0:17:22.039,0:17:24.009 a special flag which itself 0:17:24.009,0:17:27.008 is specified as a label called additional properties.
0:17:27.008,0:17:29.025 And this flag, if we 0:17:29.025,0:17:31.043 set it to false, and remember 0:17:31.043,0:17:32.096 false is actually a keyword 0:17:32.096,0:17:34.052 in JSON, tells us 0:17:34.052,0:17:36.016 that in our data we're not 0:17:36.016,0:17:37.077 allowed to have any properties 0:17:37.077,0:17:40.049 beyond those that are specified in the schema. 0:17:40.049,0:17:41.093 So now we validate and we 0:17:41.093,0:17:43.064 get an error, because the property 0:17:43.064,0:17:46.018 rating hasn't been defined in the schema. 0:17:46.018,0:17:48.046 If additional properties is missing, 0:17:48.046,0:17:50.019 or has the default value 0:17:50.019,0:17:53.083 of "true", then the validation goes through. 0:17:53.083,0:17:56.096 Now let's take a look at our authors that are still here. 0:17:56.096,0:17:58.005 Let's suppose that we don't 0:17:58.005,0:18:01.053 have a first name for our middle author here. 0:18:01.053,0:18:02.088 If we take that away and 0:18:02.088,0:18:04.062 we try to validate, we do 0:18:04.062,0:18:06.065 get an error, because we specified 0:18:06.065,0:18:08.044 in our schema, and it's right 0:18:08.044,0:18:11.081 down here, that author objects must 0:18:11.081,0:18:14.075 have both a first name and a last name. 0:18:14.075,0:18:16.029 It turns out that we can 0:18:16.029,0:18:20.021 specify for every property that the property is optional. 0:18:20.021,0:18:21.083 So, we can add to the 0:18:21.083,0:18:23.041 description of the first 0:18:23.041,0:18:24.059 name, not only that the 0:18:24.059,0:18:26.044 type is a string but that that 0:18:26.044,0:18:27.084 property is optional so we 0:18:27.084,0:18:31.021 say optional, true. 0:18:31.021,0:18:34.038 Now let's validate, and now we're in good shape. 0:18:34.038,0:18:35.018 Now, let's take a look 0:18:35.018,0:18:36.007 at what happens when we have 0:18:36.007,0:18:37.093 an object that has more than 0:18:37.093,0:18:41.022 one instance of the same label or same property.
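The effect of the additional-properties flag discussed above can be sketched in Python. `additionalProperties` is the JSON Schema keyword name; the function `extra_properties`, the sample book, and its labels are assumptions for illustration. This sketch handles only the true/false form of the flag described in the demo.

```python
def extra_properties(obj, schema):
    """Return the labels in obj that the schema does not describe.

    Extra labels are reported only when the schema sets
    additionalProperties to False; a missing flag defaults to True.
    """
    if schema.get("additionalProperties", True):
        return []
    described = schema.get("properties", {})
    return [label for label in obj if label not in described]

book = {"title": "An Example Book", "price": 100, "rating": 5}

open_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "price": {"type": "integer"}},
}
# Same schema with the flag switched off.
closed_schema = dict(open_schema, additionalProperties=False)

print(extra_properties(book, open_schema))
print(extra_properties(book, closed_schema))
```

With the flag absent, the undescribed `rating` label is simply ignored, which matches the behavior seen when the rating was first added.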
0:18:41.022,0:18:43.016 So let's suppose, for example, in 0:18:43.016,0:18:45.019 our magazine, the magazine 0:18:45.019,0:18:46.056 has two different years, 2009 and 2011. 0:18:46.056,0:18:52.037 This is syntactically valid JSON; 0:18:52.037,0:18:55.077 it meets the structure of having a list of label-value pairs. 0:18:55.077,0:18:57.016 When we validate it, we 0:18:57.016,0:19:00.019 see that we can't add a second property, year. 0:19:00.019,0:19:02.073 So this validator doesn't permit 0:19:02.073,0:19:04.013 two copies of the same 0:19:04.013,0:19:05.089 property, and it's actually kind 0:19:05.089,0:19:07.018 of a parsing thing and not 0:19:07.018,0:19:09.003 so much related to JSON schema. 0:19:09.003,0:19:12.054 Many parsers actually do enforce 0:19:12.054,0:19:14.015 that labels or properties need 0:19:14.015,0:19:15.072 to be unique within objects, even 0:19:15.072,0:19:18.003 though technically syntactically correct 0:19:18.003,0:19:20.002 JSON does allow multiple copies. 0:19:20.002,0:19:22.034 So that's just something to remember, 0:19:22.034,0:19:23.092 the typical use of objects is 0:19:23.092,0:19:26.017 to have unique labels; sometimes they 0:19:26.017,0:19:30.004 are even called keys, which evokes the concept of them being unique. 0:19:30.004,0:19:32.002 So typically they are unique. 0:19:32.002,0:19:34.021 They don't have to be for syntactic validity. 0:19:34.021,0:19:35.047 Usually when you wanna have 0:19:35.047,0:19:39.044 repeated values, it actually makes more sense to create an array. 0:19:39.044,0:19:41.084 I've taken away the second year in order to make the JSON valid again. 0:19:41.084,0:19:44.067 Now let's take a look at months.
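How a repeated label is handled really does depend on the parser. As a small illustration, here is what Python's standard json module does with a magazine that has two years. This is a different tool from the validator in the demo, and the labels `title` and `year` and the helper `reject_duplicates` are assumptions made for the example.

```python
import json

# A magazine object with the "year" label repeated (labels are illustrative).
text = '{"title": "National Geographic", "year": 2009, "year": 2011}'

# Python's json module accepts the duplicate and silently keeps the last value.
print(json.loads(text)["year"])  # 2011


def reject_duplicates(pairs):
    """object_pairs_hook that raises if a label repeats within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate label: {key}")
        seen[key] = value
    return seen


# With the hook installed, the same text is rejected at parse time.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate label: year
```

If both years genuinely belong to the magazine, the array approach mentioned above avoids the problem entirely, for example a single label whose value is `[2009, 2011]`.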
0:19:44.067,0:19:46.065 I've used months to illustrate 0:19:46.065,0:19:48.087 the enumeration constraint. So we 0:19:48.087,0:19:50.006 saw that we could constrain the 0:19:50.006,0:19:52.018 values of integers, and we 0:19:52.018,0:19:54.006 saw that we can constrain strings 0:19:54.006,0:19:55.054 using a pattern, but we can 0:19:55.054,0:19:57.004 also constrain any type by 0:19:57.004,0:19:59.034 enumerating the values that are allowed. 0:19:59.034,0:20:00.085 So, for the month, we've set 0:20:00.085,0:20:02.019 it to a string type, which it 0:20:02.019,0:20:03.083 is, but we've further constrained it 0:20:03.083,0:20:05.093 by saying that the string must be 0:20:05.093,0:20:08.002 either January or February. 0:20:08.002,0:20:09.004 So, if we try to, say, 0:20:09.004,0:20:14.053 put in the string March, we 0:20:14.053,0:20:17.034 validate and we get the obvious error here. 0:20:17.034,0:20:18.065 We can fix that by changing the 0:20:18.065,0:20:19.089 month back, but maybe it 0:20:19.089,0:20:21.026 makes more sense that March 0:20:21.026,0:20:23.001 would be part of our enumeration type, 0:20:23.001,0:20:24.031 so we'll add March to 0:20:24.031,0:20:27.063 the possible values for months, and now we're good. 0:20:27.063,0:20:28.086 As a next example, let's take 0:20:28.086,0:20:30.008 a look at something that we 0:20:30.008,0:20:31.094 saw was syntactically correct but 0:20:31.094,0:20:33.062 isn't going to be semantically 0:20:33.062,0:20:34.086 correct, which is when 0:20:34.086,0:20:36.056 we have the author list 0:20:36.056,0:20:39.093 be a mixture of objects and strings. 0:20:39.093,0:20:43.051 So, let's put Jeffrey Ullman here just as a string.
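The enumeration constraint on months amounts to a type check followed by a membership test. A minimal sketch of that idea in Python follows; the helper name `check_enum` and the `month_spec` dictionary are assumptions for illustration, not part of the demo's validator.

```python
# Hypothetical helper illustrating the enumeration constraint on a string value.
def check_enum(value, spec):
    """True if value has the declared type and appears in the enum list, if any."""
    if spec.get("type") == "string" and not isinstance(value, str):
        return False
    # Without an "enum" entry, any value of the right type is accepted.
    return "enum" not in spec or value in spec["enum"]


# Assumed schema fragment for the month property, as described in the demo.
month_spec = {"type": "string", "enum": ["January", "February"]}

print(check_enum("January", month_spec))  # True
print(check_enum("March", month_spec))    # False

# Adding March to the enumeration makes the same value acceptable.
month_spec["enum"].append("March")
print(check_enum("March", month_spec))    # True
```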
0:20:43.051,0:20:44.073 We saw that that was still 0:20:44.073,0:20:46.014 valid JSON, but when we 0:20:46.014,0:20:47.069 try to validate now, we're gonna 0:20:47.069,0:20:49.077 get an error because we expected 0:20:49.077,0:20:50.088 to see an object, we have 0:20:50.088,0:20:52.029 specified that the authors 0:20:52.029,0:20:54.097 are objects, and instead we got a string. 0:20:54.097,0:20:56.095 Now JSON schema does allow 0:20:56.095,0:20:58.068 us to specify that we 0:20:58.068,0:21:00.099 can have different types of data 0:21:00.099,0:21:02.059 in the same context, and I'm 0:21:02.059,0:21:05.022 going to show that with a little bit of a simpler example here. 0:21:05.022,0:21:06.062 So, let's first take away our 0:21:06.062,0:21:09.093 author there so that we're back with a valid file. 0:21:09.093,0:21:13.027 And what I am going to look at is simply the year values. 0:21:13.027,0:21:15.029 So, let's suppose for whatever 0:21:15.029,0:21:16.054 reason that in our 0:21:16.054,0:21:17.083 magazines, one of the 0:21:17.083,0:21:21.019 years was a string and the other year was an integer. 0:21:21.019,0:21:22.036 So that's not gonna work out 0:21:22.036,0:21:23.064 right now because we have 0:21:23.064,0:21:27.012 specified clearly that the year must be an integer. 0:21:27.012,0:21:29.034 In JSON schema specifications, when we 0:21:29.034,0:21:31.037 want to allow multiple types 0:21:31.037,0:21:32.081 for values that are 0:21:32.081,0:21:34.064 used in the same context, we 0:21:34.064,0:21:36.055 actually make the type be an array. 0:21:36.055,0:21:37.075 So instead of just saying 0:21:37.075,0:21:38.098 integer, if we put 0:21:38.098,0:21:40.054 an array here that has 0:21:40.054,0:21:42.087 both integer and string, that's 0:21:42.087,0:21:43.098 telling us that our year 0:21:43.098,0:21:45.011 value can be either an 0:21:45.011,0:21:46.011 integer or a string, 0:21:46.011,0:21:48.005 and now when we validate, 0:21:48.005,0:21:50.023 we get a correct JSON file.
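Making the type an array simply means "the value may match any one of these types." A rough Python sketch of that rule is below; the names `TYPE_MAP` and `matches_type` are assumptions, and only the two type names used in the demo are covered.

```python
# Maps the two JSON schema type names used here to Python types (sketch only).
TYPE_MAP = {"integer": int, "string": str}


def matches_type(value, declared):
    """True if value matches the declared type, or any type in a declared list."""
    names = declared if isinstance(declared, list) else [declared]
    for name in names:
        py_type = TYPE_MAP[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, py_type) and not isinstance(value, bool):
            return True
    return False


print(matches_type("2009", "integer"))              # False
print(matches_type("2009", ["integer", "string"]))  # True
print(matches_type(2011, ["integer", "string"]))    # True
```

With a single type name the year given as a string is rejected; once the type is the array of integer and string, both forms of the year pass.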
0:21:50.023,0:21:53.072 That concludes our demo of JSON schema validation. 0:21:53.072,0:21:54.087 Again, we've just seen 0:21:54.087,0:21:56.073 one example with a number 0:21:56.073,0:21:58.025 of the constructs that are available 0:21:58.025,0:21:59.074 in JSON schema, but it's not 0:21:59.074,0:22:01.025 nearly exhaustive; there are many 0:22:01.025,0:22:02.097 others, and I encourage you 0:22:02.097,0:22:04.081 to read a bit more about it. 0:22:04.081,0:22:06.036 You can download this data and 0:22:06.036,0:22:07.083 this schema as a starting 0:22:07.083,0:22:09.056 point, and start adding things and playing around, 0:22:09.056,0:22:10.044 and I think you'll get a 0:22:10.044,0:22:12.018 good feel for how JSON 0:22:12.018,0:22:13.037 schema can be used to 0:22:13.037,9:59:59.000 constrain the allowable data in a JSON file.