In this video, we'll see a demonstration of JSON data.
As a reminder, JSON stands for
JavaScript Object Notation, and
it's a standard for writing
data objects in a human-readable format, typically in a file.
It's useful for exchanging data
between programs, and generally
because it's quite flexible, it's useful
for representing and for storing data that's semi-structured.
As a reminder of the
basic constructs in JSON: we
have atomic values, such
as integers and strings and so on.
And then we have two types of
composite values: we have
objects, which are sets of
label-value pairs, and then we have arrays, which are lists of values.
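As a rough illustration of these constructs, here is a minimal sketch using Python's standard json module. The sample values are made up for the example; the point is only how atomic values, objects, and arrays come out when parsed.

```python
import json

# Atomic values: numbers, strings, and the literals true, false, and null.
atoms = json.loads('[2009, "January", true, false, null]')

# An object is a set of label-value pairs (a dict in Python);
# an array is a list of values (a list in Python).
magazine = json.loads('{"Title": "National Geographic", "Year": 2009}')

print(atoms)
print(magazine)
print(type(magazine).__name__, type(atoms).__name__)
```

Python is just one convenient way to poke at the data; any language with a JSON parser would show the same three kinds of constructs.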
In the demonstration, we'll go through
the basic constructs of JSON
in more detail and we'll look at
syntactic correctness, we'll demonstrate
the flexibility of the data
model, and then we'll
look briefly at JSON Schema,
not widely used yet but
still fairly interesting to look at,
and we'll look at some validation
of JSON data against a particular schema.
So, here's the JSON
data that we're gonna be working with during this demo.
It's the same data that appeared
in the slides, in the introduction
to JSON, but now we're going
to look into the components of the data.
It's also, by the way, pretty much the
same example that we
used for XML. It's reformatted,
of course, to fit the JSON
data model, but you can compare the two directly.
Lastly, we do have
the file for the data on
the website, and I do
suggest that you download the
file so that you can
take a look at it closely on your own computer.
All right.
So, let's see what we have,
right now we're in
an editor for JSON data.
It happens to be the Eclipse
editor and we're going to
make some edits to the
file after we look through
the constructs of the file.
So, this is JSON
data representing books and
magazines, and we have
a little more information about our books and our magazines.
So, at the outermost, the
curly brace indicates that this is a JSON object.
And as a reminder, an object
is a set of label-value
pairs, separated by commas.
So, the first element in
the object is the label "books"
with this big value, and the
second (there are only two label-value
pairs here) is the
label "magazines" with this big value here.
And let's take a look first at magazines.
So magazines, again, is the
label and the value we
can see with the square
brackets here is an array.
An array is a list of
values and here we
have two values in our array.
They're still composite values.
So, we have two values, each
of which is an object,
a set of label-value pairs.
Let me mention, sometimes people call these labels 'properties', by the way.
Okay. So, now we are inside
our 2 objects that are
the 2 elements in the array that's the value of magazines.
And each one of those has
3 labels and 3 values.
And now we're finally down to the base values.
So, we have the title being "National
Geographic", a string, the
month being January, a string
and the year 2009, where 2009 is an integer.
And again, we have
another object here that's a different magazine
with a different name, month and happens to be the same year.
Now, these two have exactly the
same structure but they don't
have to and we will
see that as we start editing the file.
But before we edit the file,
let's go and look at
our books here.
The value of our other
label-value pair inside the
outermost object, "books" is
also an array, and
the array in this case also
has just two elements, so we've represented two books here.
It's a little more complicated than the
magazines, but those elements
are still objects that are label-value pairs.
So, we have now the ISBN,
the price, the edition, the title,
all either integers or strings,
and then we have one nested composite
value, which is the authors,
and that's an array again.
So, the array again, is indicated by the square brackets.
And inside this array, we
have two authors and each
of the authors has a first
name and a last name,
but again, that uniformity is
not required by the model itself, as we'll see.
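To make the nesting concrete, here is a hypothetical reconstruction of the kind of data described above, loaded with Python's json module. Only the National Geographic title, the month January, the year 2009, and the price of 100 mentioned later come from the demo; the label spellings (such as "First_Name"), the second magazine, and the book details are placeholders, and only one book is shown to keep it short.

```python
import json

# Hypothetical reconstruction of the demo data. Label names and most
# values are assumptions made for illustration.
BOOKSTORE = """
{
  "Books": [
    {
      "ISBN": "ISBN-0-00-000000-1",
      "Price": 100,
      "Edition": 1,
      "Title": "An Example Book",
      "Authors": [
        {"First_Name": "Ada", "Last_Name": "Example"},
        {"First_Name": "Bob", "Last_Name": "Sample"}
      ]
    }
  ],
  "Magazines": [
    {"Title": "National Geographic", "Month": "January", "Year": 2009},
    {"Title": "Example Weekly", "Month": "February", "Year": 2009}
  ]
}
"""

data = json.loads(BOOKSTORE)

# Outermost object -> array -> object -> atomic value.
first_magazine = data["Magazines"][0]
print(first_magazine["Title"], first_magazine["Year"])

# Outermost object -> array -> object -> nested array -> object -> string.
author_last_names = [a["Last_Name"] for a in data["Books"][0]["Authors"]]
print(author_last_names)
```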
So, as I mentioned,
this is actually an editor for
JSON data and we're going to come back to this editor in a moment.
But what I wanted to do is
show the same data
in a browser because browsers
actually offer some nice features
for navigating in JSON.
So here we are in the
Chrome browser, which has nice
features for navigating JSON,
and other browsers do as well.
We can see here again that we
have an object in
our JSON data, that consists
of two label-value pairs;
books and magazines, which are
currently closed, and then
this plus allows us to open them up and see the structure.
For example, we open magazines
and we see that magazines is an array containing two objects.
We can open one of those
objects and see the three label-value pairs.
Now we're at the lowest levels, and similarly for the other object.
We can see here that Books
is also an array, and we go ahead and open it up.
It's an array of two objects.
We open one of those
objects and we see again
the set of label-value pairs,
where one of the values
is a further nesting.
It's an array and we open
that array, and we see
two objects, and we open
them and finally see the data at the lowest levels.
So again, the browser
here gives us a nice way
to navigate the JSON data and see its structure.
So now we're back to our JSON editor.
By the way, this editor, Eclipse, does
also have some features for
opening and closing the structure
of the data, but it's
not quite as nice as the browser that we use.
So we decided to use the browser instead.
What we are going to
use the editor for is to
make some changes to the
JSON data and see which
changes are legal and which aren't.
So, let's take a look at the first change, a very simple one.
What if we forgot a comma?
Well, when we try to
save that file, we get a
little notice that we have an
error, we expected an
N value, so that's a
pretty straightforward mistake, let's put that comma back.
Let's say we insert an
extra brace somewhere here, for whatever reason.
We accidentally put in an extra brace.
Again we see that that's marked as an error.
So an error that can
be fairly common to make is
to forget to put quotes around strings.
So, for example, this ISBN
number here, if we don't quote it, we're gonna get an error.
As we'll see, the only things that can
be unquoted are numbers and
the values null, true and false.
So, let's put our quotes back there.
Now, actually, even more
common is to forget to
put quotes around the labels in label-value pairs.
But if we forget to quote that, that's going to be an error as well.
You might have noticed, by the
way, when we use the browser
that the browser didn't even show
us the quotes in the labels.
But when you write
the raw JSON data, you do need to include those quotes.
Speaking of quotes, what if we quoted our price here?
Well that's actually not an
error, because now we've simply turned
price into a string, and
string values are perfectly well allowed anywhere.
Now, we'll see when we use
JSON Schema that we
can make restrictions that don't allow
strings in certain places, but
just for syntactic correctness of
JSON data, any of our values can be strings.
Now, as I mentioned, there are
a few values that are
sort of reserved words in JSON.
For example, true is a
reserved word for a boolean value.
That means we don't need to
quote it because it's actually
its own special type of value.
And so is false.
And the third one is null,
so there's a built-in concept of null.
Now, if we wanted to
use nil for whatever reason
instead of null, well, now
we're going to get an error because
nil is not a reserved word,
and if we really wanted nil
then we would need to actually make it a quoted string.
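The edits tried so far can be reproduced with any JSON parser. The following is a small sketch using Python's json module rather than the Eclipse editor from the demo; the helper name is_valid_json and the sample snippets are made up for illustration.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Each snippet mimics one of the edits tried in the editor.
checks = {
    "missing comma": '{"Price": 100 "Edition": 3}',
    "extra brace": '{"Price": 100}}',
    "unquoted string": '{"ISBN": ISBN-123}',
    "unquoted label": '{Price: 100}',
    "quoted price": '{"Price": "100"}',
    "true/false/null": '{"a": true, "b": false, "c": null}',
    "unquoted nil": '{"c": nil}',
    "quoted nil": '{"c": "nil"}',
}

results = {name: is_valid_json(text) for name, text in checks.items()}
for name, ok in results.items():
    print(f"{name}: {'valid' if ok else 'error'}")
```

The error messages will differ from the ones the editor shows, but the split between legal and illegal edits should be the same.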
Now, let's take a look inside our author list.
And I'm going to show you
that arrays do not have
to have the same type of
value for every element in the array.
So here we have a homogeneous
list of authors. Both of them
are objects with a first
name and a last name as
separate label-value pairs,
but if I change that
first one, the entire value,
to be, instead of a
composite one, simply the string
Jeffrey Ullman. Oops, sorry
about my typing there. That
is not an error; it
is allowed to have a string
and then a composite object.
And we could even have an array, and anything we want.
In an array, when you
have a list of values, all
you need is for each one
to be syntactically a correct value in JSON.
Now let's go visit our magazines
for a moment here and let
me show that empty objects are okay.
So a list of label-value
pairs, comprising an object, can be the empty list.
And so now I've turned this magazine
into having no information about
it, but that is legal in JSON.
And similarly, arrays are allowed to be of zero length.
So I can take these authors
here and I can just take
out all of the authors, and
make that an empty list, but that's still valid JSON.
Now, what if I took this array out altogether?
In that case, now we
have an error because this is
an object where we have
label-value pairs and every
label-value pair has to
have both a label and a value.
So let's put our array back,
and we can have anything in
there, so let's just make it
"foo", and that corrects the error.
What if we didn't want an
array here and instead we
tried to make it, say, an object?
Well, we're going to see an
error there, because, as a reminder
(and this is an
easy mistake to make), objects
are always label-value pairs.
So if you want just a value,
that should be an array. If
you want an object, then we're
talking about a label-value pair, so
we can just add "foo" as
our value, and then we're all set.
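Here is a similar sketch for the flexibility edits: mixed arrays, empty objects and arrays, and the two mistakes with a missing value and a bare value inside curly braces. As before, this uses Python's json module, and the helper name and snippets are made up for the example.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


checks = {
    # A string, an object, and an array side by side in one array.
    "mixed array": '{"Authors": ["Jeffrey Ullman", {"Last_Name": "X"}, [1, 2]]}',
    "empty object": '{"Magazines": [{}]}',
    "empty array": '{"Authors": []}',
    # A label must always come with a value.
    "label without value": '{"Authors": }',
    # Curly braces need label-value pairs, not bare values.
    "bare value in object": '{"Authors": {"foo"}}',
    "pair in object": '{"Authors": {"foo": "foo"}}',
    "bare value in array": '{"Authors": ["foo"]}',
}

results = {name: is_valid_json(text) for name, text in checks.items()}
for name, ok in results.items():
    print(f"{name}: {'valid' if ok else 'error'}")
```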
So what we've seen so far is syntactic correctness.
Again, there's no required
uniformity across values in
arrays or in the
label-value pairs in objects. We
just need to ensure that
all of our values, our basic
values, are of the right types,
and things like our commas and
curly braces are all in place.
What we're gonna do next is look
at JSON Schema, where we
have a mechanism for enforcing certain
constraints beyond simple syntactic correctness.
If you've been very observant, you
might even have noticed that we
have a second tab up
here in our editor for a
second JSON file, and this file
is going to be the schema
for our bookstore data. We're using
JSON Schema, and JSON
Schema, like XML Schema,
is expressed in the data model itself.
So, our schema description for
this JSON data is itself
JSON data, and here it is.
And it's going to take a bit of time to explain.
Now the first thing that you might
notice is wow, the schema
looks more complicated and in
fact longer than the data itself.
Well, that is true, but that's mostly because our data file is tiny.
So, if we had thousands, you know, tens
of thousands of books and magazines,
our schema file wouldn't
change, but our data file would
be much longer and that's the typical case, in reality.
Now, this video is not a
complete tutorial about JSON Schema.
There are many constructs in JSON
Schema that weren't needed to
describe the bookstore data, for example.
And even this file here,
I'm not gonna go through every detail of it right here.
You can download the file and
take a look, read a little more about JSON schema.
I'm just going to give the
flavor of the schema
specification and then we're
going to work with validating the data
itself to see how the schema and data work together.
But to give you the flavor here, let's go through at least some portions of the schema.
So, in some sense,
the structure of the schema file
reflects the structure of the data file that it's describing.
So, the outermost constructs in
the schema file are the
outermost in the data file and
as we nest it parallels the nesting.
Let me just show a little
bit here, we'll probably look at most of it in the context of validation.
So, we see here that our outermost construct in our data file is an object.
And that's told to us,
because we have "type" as
one of our built-in labels for the schema.
So we have an
object with two properties, as
we can see here, the books property
and the magazines property.
And I've been using the word
"label" in label-value
pairs; that's synonymous with "property" in property-value pairs.
Then inside the books property
for example, we see that
the type of that is array,
so we've got a label-value pair where the value is an array.
And then we follow the nesting and see that it's an array of objects.
And we go further down and we
see the different label-value pairs
of the object that make up
the books and nesting further into the authors and so on.
We see similarly for magazines
that the value of the
label-value pair for
magazines is an array, and
that array consists of objects with further nesting.
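To give a feel for how the schema parallels the data, here is a trimmed-down sketch of what such a schema might look like, loaded as ordinary JSON in Python. The keywords "type", "properties", and "items" are JSON Schema keywords; the property names are guesses at the demo file, and most of the real schema's constraints are left out.

```python
import json

# A hypothetical, heavily trimmed schema. It is itself JSON, and its
# nesting mirrors the nesting of the data it describes.
SCHEMA = """
{
  "type": "object",
  "properties": {
    "Books": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "ISBN": {"type": "string"},
          "Price": {"type": "integer"},
          "Authors": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "First_Name": {"type": "string"},
                "Last_Name": {"type": "string"}
              }
            }
          }
        }
      }
    },
    "Magazines": {
      "type": "array",
      "items": {"type": "object"}
    }
  }
}
"""

schema = json.loads(SCHEMA)
books = schema["properties"]["Books"]
print(schema["type"])                          # the outermost construct
print(books["type"], books["items"]["type"])   # an array of objects
print(sorted(books["items"]["properties"]))    # labels inside each book
```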
So what we're looking at here is
an online JSON schema validator. We have two windows.
On the left we have our
schema and on the
right we have our data, and
this is exactly the same data
file and schema file that we were looking at earlier.
If we hit the validate button,
hopefully everything should work and it does.
This tells us that the
JSON data is valid with respect to the schema.
Now, this system will of
course find basic syntactic errors
so I can take away a comma
just like I did before and
when I validate I'll get a
parsing error that really has nothing to do with the schema.
What I'm going to focus on
now is actually validating
semantic correctness of the JSON
with respect to the constructs
that we've specified in this schema.
Let me first put that comma back so we start with a valid file.
So, the first thing I'll show is
the ability to constrain basic
types, and then the ability
to constrain the range of values of those basic types.
And let's focus on price.
So here we're talking about the
price property inside books and
we specify in our schema
that the type of the price must be an integer.
So, for example, if our
price were instead a string
and we went ahead and tried
to validate that, we would get an error.
Let's make it back into an
integer but let's make
it into the integer 300 now instead of 100.
And why am I doing that?
Because the JSON schema also
lets me constrain the range of
values that are allowed if we have a numeric value.
So, not only in price did I
say that it's an integer, but
I also said that it
has a minimum and maximum value:
the integer for price must
be between 0 and 200.
So, if I try to make
the price 300, and I
validate, I'm again getting an error.
Now it's not a type error,
but it's an error that my
integer was outside of the allowed range.
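The type and range checks can be sketched in a few lines of Python. This is a hand-written illustration of what a validator does for the "type", "minimum", and "maximum" keywords, not the online validator from the demo; the function name check_integer is made up.

```python
def check_integer(value, schema):
    """Return a list of error messages; an empty list means valid."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        return [f"type error: expected integer, got {type(value).__name__}"]
    errors = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{value} is below the minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{value} is above the maximum {schema['maximum']}")
    return errors


# The constraints described for price: an integer between 0 and 200.
price_schema = {"type": "integer", "minimum": 0, "maximum": 200}

print(check_integer(100, price_schema))    # within range
print(check_integer("100", price_schema))  # a string: type error
print(check_integer(300, price_schema))    # an integer, but out of range
```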
I've put the price back to
a hundred, and now let's
look at constraints on string values.
JSON schema actually has
a little pattern matching language that
can be used to constrain the
allowable strings for a specific type of value.
We'll look at ISBN number here as an example of that.
We've said that ISBN is
of type string, and then
we've further constrained in the
schema that the string values for
ISBN must satisfy a certain pattern.
I'm not gonna go into the details of this pattern-matching language.
I'm just gonna give an example.
And in fact, this entire demo is
really just an example; there are lots of
things in JSON Schema that we're not seeing.
What this pattern here says is
that the string value for
ISBN must start with
the four characters ISBN and then can be followed by anything else.
So, if we go over to our
data and we look at
the ISBN number here and
say we have a typo, we
forgot the "I" and we try to validate.
Then we'll see that our data
no longer matches our schema specification.
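The "pattern" keyword in JSON Schema takes a regular expression. Here is a small sketch of the check in Python, assuming the pattern is something like "^ISBN"; the exact pattern in the demo schema isn't shown here, and the ISBN strings below are made up.

```python
import re

# Assumed pattern: the value must start with the four characters "ISBN".
isbn_schema = {"type": "string", "pattern": "^ISBN"}


def matches_pattern(value, schema):
    """Return True if value is a string that matches the schema's pattern."""
    # re.search is used because the pattern itself carries the ^ anchor.
    return isinstance(value, str) and re.search(schema["pattern"], value) is not None


print(matches_pattern("ISBN-0-00-000000-1", isbn_schema))  # starts with ISBN
print(matches_pattern("SBN-0-00-000000-1", isbn_schema))   # the "I" is missing
```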
Now let's look at some other constraints we can specify in JSON Schema.
We can constrain the number of elements in an array.
We can give a minimum or maximum or both.
And I've done that here in the context of the authors array.
Remember the authors are
an array that's a list of
objects, and here I've said that
we have a minimum number of
items of 1 and a
maximum number of items of 10.
In other words, every book
has to have between one and ten authors.
So let's try, for example,
taking out all of our authors here in our first book.
We actually looked at this before in terms
of syntactic validity, and it
was perfectly valid to have an empty array.
But when we try to validate
now we do get an
error, and the reason is
that we said that we needed
between one and ten array elements in the case of authors.
Now let's fix that,
not by putting our authors back
but let's say we actually decide
we would like to be able to have books that have no authors.
So, we can simply fix
that by changing that minimum
number of items to zero, and that
makes our data valid again. And
in fact, we could actually take that
minimum constraint out altogether,
and if we do that our data is still going to be valid.
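The array-length constraint uses the JSON Schema keywords "minItems" and "maxItems". The following sketch hand-rolls that one check in Python to show the three situations just described; the function name check_item_count is made up.

```python
def check_item_count(array, schema):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    if "minItems" in schema and len(array) < schema["minItems"]:
        errors.append(f"too few items: {len(array)} < {schema['minItems']}")
    if "maxItems" in schema and len(array) > schema["maxItems"]:
        errors.append(f"too many items: {len(array)} > {schema['maxItems']}")
    return errors


no_authors = []

# Between one and ten authors: the empty array is rejected.
print(check_item_count(no_authors, {"minItems": 1, "maxItems": 10}))
# Minimum lowered to zero: the empty array is accepted.
print(check_item_count(no_authors, {"minItems": 0, "maxItems": 10}))
# Minimum removed altogether: still accepted.
print(check_item_count(no_authors, {"maxItems": 10}))
```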
Now let's see what happens when we
add something to our data that isn't mentioned in the schema.
If you look carefully you'll see
that everything that we have
in the data so far has been specified in the schema.
Let's say we come along
and decide we're gonna also have ratings for our books.
So let's add here a
rating property with the value 5.
We go ahead and validate. You
probably think it's not going to
validate properly, but actually it did.
The definition of JSON
Schema is that it can constrain things by
describing them, but you
can also have components in
the data that aren't present in the schema.
If we want to insist
that every property that is
present in the data is
also described in this
schema, then we can
actually add a constraint to the schema that tells us that.
Specifically, under the object
here, we can put in
a special flag, which itself
is specified as a label called additionalProperties.
If we set this flag
to false (and remember,
false is actually a keyword
in JSON), it tells us
that in our data we're not
allowed to have any properties
beyond those that are specified in the schema.
So now we validate and we
get an error, because the property
rating hasn't been defined in the schema.
If additionalProperties is missing,
or has the default value
of true, then the validation goes through.
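Here is a rough sketch of that check in Python. It only handles additionalProperties as a true/false flag, which is how the demo uses it; the function name and the sample book are made up, and "Rating" is a guess at how the label was spelled.

```python
def check_additional_properties(obj, schema):
    """Return errors for properties in obj that the schema doesn't describe."""
    # A missing flag behaves like true: extra properties are allowed.
    if schema.get("additionalProperties", True) is not False:
        return []
    described = set(schema.get("properties", {}))
    extra = sorted(set(obj) - described)
    return [f"property not defined in schema: {name}" for name in extra]


book = {"Title": "An Example Book", "Price": 100, "Rating": 5}
properties = {"Title": {"type": "string"}, "Price": {"type": "integer"}}

open_schema = {"type": "object", "properties": properties}
closed_schema = {"type": "object", "properties": properties,
                 "additionalProperties": False}

print(check_additional_properties(book, open_schema))    # Rating is allowed
print(check_additional_properties(book, closed_schema))  # Rating is rejected
```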
Now lets take a look at our authors that are still here.
Let's suppose that we don't
have a first name for our middle author here.
If we take that away and
we try to validate, we do
get an error, because we specified
in our schema, and it's right
down here, that author objects must
have both a first name and a last name.
It turns out that we can
specify for every property that the property is optional.
So, we can add to the
description of the first
name, not only that the
type is a string but that that
property is optional so we
say optional, true.
Now let's validate, and now we're in good shape.
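A sketch of this behavior in Python follows, under the assumption that every described property is required unless it carries "optional": true, as in the demo. Later drafts of JSON Schema express this with a "required" keyword instead, so treat the flag here as specific to the version used in the video. The function name and label spellings are made up.

```python
def check_missing_properties(obj, schema):
    """Return errors for described properties that are absent and not optional."""
    errors = []
    for name, subschema in schema["properties"].items():
        if name not in obj and not subschema.get("optional", False):
            errors.append(f"missing property: {name}")
    return errors


author = {"Last_Name": "Example"}  # no first name

strict_schema = {"properties": {
    "First_Name": {"type": "string"},
    "Last_Name": {"type": "string"},
}}
relaxed_schema = {"properties": {
    "First_Name": {"type": "string", "optional": True},
    "Last_Name": {"type": "string"},
}}

print(check_missing_properties(author, strict_schema))   # first name is missing
print(check_missing_properties(author, relaxed_schema))  # now accepted
```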
Now, let's take a look
at what happens when we have
an object that has more than
one instance of the same label or same property.
So let's suppose, for example, in
our magazine, the magazine
has two different years, 2009 and 2011.
This is syntactically valid JSON;
it meets the structure of having a list of label-value pairs.
When we validate it, we
see that we can't add a second property, year.
So this validator doesn't permit
two copies of the same
property, and it's actually kind
of a parsing thing and not
so much related to JSON Schema.
Many parsers actually do enforce
that labels or properties need
to be unique within objects, even
though, technically, syntactically correct
JSON does allow multiple copies.
So that's just something to remember:
the typical use of objects is
to have unique labels. They're sometimes
even called keys, which evokes the idea of them being unique.
So typically they are unique.
They don't have to be for syntactic validity.
Usually when you wanna have
repeated values, it actually makes more sense to create an array.
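Parsers differ in how they treat repeated labels. As one data point, Python's json module accepts them and keeps the last value, but it can be made strict with the object_pairs_hook parameter. The helper name reject_duplicates and the sample magazine are made up for this sketch.

```python
import json

text = '{"Title": "National Geographic", "Year": 2009, "Year": 2011}'

# By default, Python's parser accepts the duplicate and keeps the last value.
lenient = json.loads(text)
print(lenient)


def reject_duplicates(pairs):
    """Build a dict from (label, value) pairs, refusing repeated labels."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate label: {key}")
        result[key] = value
    return result


# With the hook installed, the same text is rejected.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
print(strict_error)

# Repeated values usually fit better in an array.
as_array = json.loads('{"Title": "National Geographic", "Years": [2009, 2011]}')
print(as_array["Years"])
```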
I've taken away the second year in order to make the JSON valid again.
Now let's take a look at months.
I've used months to illustrate
the enumeration constraint so we
saw that we could constrain the
values of integers, and we
saw that we can constrain strings
using a pattern, but we can
also constrain any type by
enumerating the values that are allowed.
So, for the month, we've said
it's a string type, which it
is, but we've further constrained it
by saying that the string must be
either January or February.
So, if we try to say
put in the string March, we
validate and we get the obvious error here.
We can fix that by changing the
month back, but maybe it
makes more sense that March
would be part of our enumeration type,
so we'll add March to
the possible values for months, and now we're good.
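The enumeration constraint uses the JSON Schema keyword "enum", which lists the allowed values. Here is a minimal hand-written sketch of that check in Python; the function name check_enum is made up.

```python
def check_enum(value, schema):
    """Return a list of error messages; an empty list means valid."""
    if "enum" in schema and value not in schema["enum"]:
        return [f"{value!r} is not one of {schema['enum']}"]
    return []


month_schema = {"type": "string", "enum": ["January", "February"]}

print(check_enum("January", month_schema))  # allowed
print(check_enum("March", month_schema))    # not in the enumeration

# Adding March to the enumeration makes the same value acceptable.
wider_schema = {"type": "string", "enum": ["January", "February", "March"]}
print(check_enum("March", wider_schema))
```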
As a next example, let's take
a look at something that we
saw was syntactically correct but
isn't going to be semantically
correct, which is when
we have the author list
be a mixture of objects and strings.
So, let's put Jeffrey Ullman here just as a string.
We saw that that was still
valid JSON, but when we
try to validate now, we're gonna
get an error because we expected
to see an object, we have
specified that the authors
are objects, and instead we got a string.
Now JSON schema does allow
us to specify that we
can have different types of data
in the same context, and I'm
going to show that with a little bit of a simpler example here.
So, let's first take away our
author there so that we're back with a valid file.
And what I am going to look at is simply the year values.
So, let's suppose, for whatever
reason, that in our
magazines, one of the
years was a string and the other year was an integer.
So that's not gonna work out
right now because we have
specified clearly that the year must be an integer.
In JSON schema specifications, when we
want to allow multiple types
for values that are
used in the same context, we
actually make the type be an array.
So instead of just saying
integer, if we put
an array here that has
both integer and string that's
telling us that our year
value can be either an
integer or a string
and now when we validate,
we get a correct JSON file.
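Here is a small sketch of how a validator might treat a "type" that is either a single name or an array of names. Only the two types used in this example are handled, and the names TYPE_CHECKS and check_type are made up.

```python
# Python tests for the two JSON Schema type names used in this example.
TYPE_CHECKS = {
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def check_type(value, schema):
    """Return a list of error messages; an empty list means valid."""
    declared = schema["type"]
    # A single type name is treated as a one-element list of alternatives.
    names = declared if isinstance(declared, list) else [declared]
    if any(TYPE_CHECKS[name](value) for name in names):
        return []
    return [f"expected {' or '.join(names)}, got {type(value).__name__}"]


integer_only = {"type": "integer"}
integer_or_string = {"type": ["integer", "string"]}

print(check_type(2009, integer_only))         # accepted
print(check_type("2009", integer_only))       # rejected: a string
print(check_type("2009", integer_or_string))  # accepted
print(check_type(2009, integer_or_string))    # accepted
```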
That concludes our demo of JSON schema validation.
Again, we've just seen
one example with a number
of the constructs that are available
in JSON schema, but it's not
nearly exhaustive, there are many
others, and I encourage you
to read a bit more about it.
You can download this data and
this schema as a starting
point, and start adding things and playing around,
and I think you'll get a
good feel for how JSON
Schema can be used to
constrain the allowable data in a JSON file.