01-01-introduction.mp4

  • 0:01 - 0:04
    Welcome to the course Introduction to Databases.
  • 0:04 - 0:06
    I'm Jennifer Widom from Stanford University.
  • 0:06 - 0:08
    In this course we'll be learning
  • 0:08 - 0:09
    about databases and the use
  • 0:09 - 0:12
    of database management systems, primarily
  • 0:12 - 0:14
    from the viewpoint of the designer,
  • 0:14 - 0:18
    user and developer of database applications.
  • 0:18 - 0:21
    I'm going to start by describing in
  • 0:21 - 0:22
    one very long sentence what
  • 0:22 - 0:27
    a database management system provides for applications.
  • 0:27 - 0:29
    It provides a means of handling large amounts
  • 0:29 - 0:32
    of data primarily, but let's look at it in a little more detail.
  • 0:32 - 0:33
    What it provides, in a
  • 0:33 - 0:36
    long sentence, is efficient, reliable,
  • 0:36 - 0:40
    convenient and safe multi-user
  • 0:40 - 0:41
    storage of and access to
  • 0:41 - 0:45
    massive amounts of persistent data.
  • 0:45 - 0:46
    So, I'm going to go
  • 0:46 - 0:47
    into each one of those adjectives in
  • 0:47 - 0:49
    a little bit more detail in a moment.
  • 0:49 - 0:51
    But I did want to mention that database
  • 0:51 - 0:54
    systems are extremely prevalent in the world today.
  • 0:54 - 0:56
    They sit behind many websites
  • 0:56 - 0:58
    that will run your banking systems,
  • 0:58 - 1:01
    your telecommunications, deployments of
  • 1:01 - 1:04
    sensors, scientific experiments and much, much more.
  • 1:04 - 1:05
    Highly prevalent.
  • 1:05 - 1:06
    So let's talk a little
  • 1:06 - 1:08
    bit about why database systems are
  • 1:08 - 1:13
    so popular and so prevalent by looking at these seven adjectives.
  • 1:13 - 1:15
    The first aspect of database
  • 1:15 - 1:16
    systems is that they handle
  • 1:16 - 1:19
    data at a massive scale.
  • 1:19 - 1:20
    So if you think about
  • 1:20 - 1:22
    the amount of data that is
  • 1:22 - 1:24
    being produced today, database systems
  • 1:24 - 1:25
    are handling terabytes of data,
  • 1:25 - 1:29
    sometimes even terabytes of data every day.
  • 1:29 - 1:30
    And one of the critical
  • 1:30 - 1:31
    aspects is that the data
  • 1:31 - 1:33
    that's handled by database management systems
  • 1:33 - 1:35
    is much larger than can
  • 1:35 - 1:38
    fit in the memory of a typical computing system.
  • 1:38 - 1:39
    So memories are indeed growing
  • 1:39 - 1:41
    very, very fast, but the
  • 1:41 - 1:42
    amount of data in the world
  • 1:42 - 1:43
    and data to be handled by
  • 1:43 - 1:46
    database systems is growing much faster.
  • 1:46 - 1:48
    So database systems are
  • 1:48 - 1:52
    designed to handle data that is residing outside of memory.
  • 1:52 - 1:54
    Secondly, the data that's
  • 1:54 - 1:58
    handled by database management systems is typically persistent.
  • 1:58 - 1:59
    And what I mean by that is
  • 1:59 - 2:00
    that the data in the database
  • 2:00 - 2:04
    outlives the programs that execute on that data.
  • 2:04 - 2:06
    So if you run
  • 2:06 - 2:08
    a typical computer program the program
  • 2:08 - 2:11
    will start, the variables will be created.
  • 2:11 - 2:13
    There will be data that's operated on by
  • 2:13 - 2:16
    the program, the program will finish and the data will go away.
  • 2:16 - 2:18
    It's sort of the other way with databases.
  • 2:18 - 2:20
    The data is what sits there
  • 2:20 - 2:21
    and then the program will start
  • 2:21 - 2:22
    up, it will operate on the
  • 2:22 - 2:25
    data, the program will stop and the data will still be there.
  • 2:25 - 2:27
    Very often actually multiple programs
  • 2:27 - 2:31
    will be operating on the same data.
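
As a minimal sketch of persistence, the snippet below uses Python's built-in `sqlite3` module, with two functions standing in for two separate program runs. The file path, the `balance` table, and the function names are all hypothetical, chosen only for illustration. The point is that the second "program" finds data that the first one wrote before it finished.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file; the path and table name are illustrative only.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

def writer_program(path):
    """First 'program': stores a row, then ends. Its local variables go away."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS balance (owner TEXT, amount INTEGER)")
    conn.execute("INSERT INTO balance VALUES (?, ?)", ("alice", 100))
    conn.commit()
    conn.close()

def reader_program(path):
    """Second 'program': starts later and still finds the stored data."""
    conn = sqlite3.connect(path)
    rows = conn.execute("SELECT owner, amount FROM balance").fetchall()
    conn.close()
    return rows

writer_program(db_path)
rows_seen_later = reader_program(db_path)
print(rows_seen_later)  # [('alice', 100)]
```

Both functions run inside one Python process here purely to keep the example short; running them as two separate scripts against the same file would behave the same way.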
  • 2:31 - 2:32
    Next, safety.
  • 2:32 - 2:34
    So database systems, since
  • 2:34 - 2:36
    they run critical applications such as
  • 2:36 - 2:39
    telecommunications and banking systems,
  • 2:39 - 2:40
    have to have guarantees that
  • 2:40 - 2:42
    the data managed by the system
  • 2:42 - 2:44
    will stay in a consistent
  • 2:44 - 2:45
    state, it won't be lost or
  • 2:45 - 2:47
    overwritten when there are
  • 2:47 - 2:50
    failures, and there can be hardware failures.
  • 2:50 - 2:53
    There can be software failures.
  • 2:53 - 2:55
    Even simple power outages.
  • 2:55 - 2:57
    You don't want your bank
  • 2:57 - 2:58
    balance to change because the
  • 2:58 - 3:00
    power went out at your bank branch.
  • 3:00 - 3:02
    And of course there is the problem
  • 3:02 - 3:05
    of malicious users that may try to corrupt data.
  • 3:05 - 3:06
    So database systems have a
  • 3:06 - 3:08
    number of built-in mechanisms that
  • 3:08 - 3:10
    ensure that the data remains consistent,
  • 3:10 - 3:12
    regardless of what happens.
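
One common mechanism behind this kind of guarantee is the transaction: a group of changes that either all take effect or none do. The sketch below uses Python's `sqlite3` module and simulates a software failure with an exception; it does not model hardware faults or power loss. The `account` table and the `transfer` helper are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two accounts as a single all-or-nothing unit."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, src)
        )
        if fail_midway:
            raise RuntimeError("simulated failure between the two updates")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, dst)
        )

def balances(conn):
    return dict(conn.execute("SELECT owner, balance FROM account"))

try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
balances_after_failure = balances(conn)   # the half-done transfer was undone

transfer(conn, "alice", "bob", 30)
balances_after_success = balances(conn)

print(balances_after_failure)  # {'alice': 100, 'bob': 50}
print(balances_after_success)  # {'alice': 70, 'bob': 80}
```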
  • 3:12 - 3:14
    Next multi-user. So I
  • 3:14 - 3:18
    mentioned that multiple programs may operate on the same database.
  • 3:18 - 3:20
    And even with one program operating
  • 3:20 - 3:22
    on a database, that program may
  • 3:22 - 3:23
    allow many different users or
  • 3:23 - 3:27
    applications to access the data concurrently.
  • 3:27 - 3:28
    So when you have
  • 3:28 - 3:30
    multiple applications working on
  • 3:30 - 3:32
    the same data, the system
  • 3:32 - 3:33
    has to have some mechanisms, again,
  • 3:33 - 3:36
    to ensure that the data stays consistent.
  • 3:36 - 3:37
    That you don't have, for example,
  • 3:37 - 3:39
    half of a data item
  • 3:39 - 3:41
    overwritten by one person and
  • 3:41 - 3:43
    the other half overwritten by another.
  • 3:43 - 3:45
    So there's mechanisms in database
  • 3:45 - 3:48
    systems called concurrency control.
  • 3:48 - 3:49
    And the idea there is
  • 3:49 - 3:53
    that we control the way multiple users access the database.
  • 3:53 - 3:55
    Now we don't control it by
  • 3:55 - 3:57
    only having one user have
  • 3:57 - 3:58
    exclusive access to the database
  • 3:58 - 4:01
    or the performance would slow down considerably.
  • 4:01 - 4:03
    So the control actually occurs at
  • 4:03 - 4:05
    the level of the data items in the database.
  • 4:05 - 4:07
    So many users might be operating
  • 4:07 - 4:09
    on the same database but be
  • 4:09 - 4:11
    operating on different individual data items.
  • 4:11 - 4:12
    It's a little bit similar
  • 4:12 - 4:14
    to, say, file system concurrency or
  • 4:14 - 4:16
    even variable concurrency in programs,
  • 4:16 - 4:21
    except it's more centered around the data itself.
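
To make the analogy with variable concurrency in programs concrete, here is a small Python sketch of control at the level of individual data items. It is only an analogy: real database systems use more elaborate concurrency-control schemes, and the `items` dictionary and `increment` function are hypothetical stand-ins for a database and its users.

```python
import threading

# Hypothetical in-memory "database" holding two independent data items.
items = {"x": 0, "y": 0}

# One lock per data item, rather than a single lock for the whole database.
item_locks = {key: threading.Lock() for key in items}

def increment(key, times):
    """A 'user' that repeatedly updates one data item."""
    for _ in range(times):
        # Only users touching the same item wait for each other.
        with item_locks[key]:
            items[key] += 1

threads = [
    threading.Thread(target=increment, args=("x", 10_000)),
    threading.Thread(target=increment, args=("x", 10_000)),  # contends on "x"
    threading.Thread(target=increment, args=("y", 10_000)),  # independent of "x"
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(items)  # {'x': 20000, 'y': 10000}
```

The two threads updating `x` take turns, while the thread updating `y` never has to wait for them, which is the benefit of not giving one user exclusive access to everything.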
  • 4:21 - 4:24
    The next adjective is convenience, and
  • 4:24 - 4:26
    convenience is actually one of the
  • 4:26 - 4:28
    critical features of database systems.
  • 4:28 - 4:29
    They really are designed to make
  • 4:29 - 4:31
    it easy to work with large
  • 4:31 - 4:32
    amounts of data and to
  • 4:32 - 4:35
    do very powerful and interesting processing on that data.
  • 4:35 - 4:39
    So there's a couple levels at which that happens.
  • 4:39 - 4:44
    There's a notion in databases called Physical Data Independence.
  • 4:44 - 4:45
    It's kind of a mouthful, but
  • 4:45 - 4:47
    what that's saying is that
  • 4:47 - 4:49
    the way that data is actually
  • 4:49 - 4:51
    stored and laid out on
  • 4:51 - 4:53
    disk is independent of the
  • 4:53 - 4:56
    way that programs think about the structure of the data.
  • 4:56 - 4:57
    So you could have a program that
  • 4:57 - 4:59
    operates on a database and
  • 4:59 - 5:00
    underneath there could be a
  • 5:00 - 5:02
    complete change in the
  • 5:02 - 5:04
    way the data is stored, yet
  • 5:04 - 5:06
    the program itself would not have to be changed.
  • 5:06 - 5:07
    So the operations on the
  • 5:07 - 5:11
    data are independent from the way the data is laid out.
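
A small sketch of physical data independence, assuming Python's `sqlite3` module and a hypothetical `student` table: adding an index changes how the system organizes and reaches the data underneath, yet the query text and its result stay exactly the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Amy", 3.9), (2, "Bob", 3.2), (3, "Cal", 3.7)],
)

# The "program": it only names the data it wants, not where or how it is stored.
query = "SELECT name FROM student WHERE gpa > 3.5 ORDER BY name"

before = conn.execute(query).fetchall()

# Change the physical organization underneath by adding an index on gpa.
conn.execute("CREATE INDEX student_gpa_idx ON student (gpa)")

after = conn.execute(query).fetchall()

print(before)           # [('Amy',), ('Cal',)]
print(before == after)  # True
```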
  • 5:11 - 5:12
    And somewhat related to
  • 5:12 - 5:15
    that is the notion of high level query languages.
  • 5:15 - 5:17
    So, the databases are
  • 5:17 - 5:20
    usually queried by languages
  • 5:20 - 5:23
    that are relatively compact
  • 5:23 - 5:24
    to describe, really at a
  • 5:24 - 5:28
    very high level what information you want from the database.
  • 5:28 - 5:31
    Specifically, they obey a
  • 5:31 - 5:33
    notion that's called declarative, and what
  • 5:33 - 5:36
    declarative is saying is that
  • 5:36 - 5:37
    in the query, you describe
  • 5:37 - 5:38
    what you want out of the
  • 5:38 - 5:40
    database but you don't need
  • 5:40 - 5:42
    to describe the algorithm to
  • 5:42 - 5:44
    get the data out, and that's a really nice feature.
  • 5:44 - 5:45
    It allows you to write queries in
  • 5:45 - 5:47
    a very simple way, and then
  • 5:47 - 5:48
    the system itself will find
  • 5:48 - 5:52
    the algorithm to get that data out efficiently.
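
The contrast between describing what you want and describing how to get it can be sketched as follows. The example assumes Python's `sqlite3` module; the `student` data and the `names_by_hand` function are hypothetical. The hand-written function spells out the algorithm step by step, while the SQL query states only the desired result.

```python
import sqlite3

rows = [(1, "Amy", 3.9), (2, "Bob", 3.2), (3, "Cal", 3.7)]

def names_by_hand(rows, min_gpa):
    """Procedural version: we choose the scan, the filter, and the sort ourselves."""
    result = []
    for _id, name, gpa in rows:
        if gpa > min_gpa:
            result.append(name)
    result.sort()
    return result

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, gpa REAL)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)

# Declarative version: state the result wanted; the system picks the algorithm.
cursor = conn.execute(
    "SELECT name FROM student WHERE gpa > ? ORDER BY name", (3.5,)
)
names_declarative = [name for (name,) in cursor]

print(names_by_hand(rows, 3.5))  # ['Amy', 'Cal']
print(names_declarative)         # ['Amy', 'Cal']
```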
  • 5:52 - 5:54
    And speaking of efficiency, that's
  • 5:54 - 5:56
    number six, but certainly not
  • 5:56 - 5:59
    sixth in importance. There's in
  • 5:59 - 6:00
    real estate as a little
  • 6:00 - 6:02
    aside here, an old saying
  • 6:02 - 6:03
    that when you have a piece of
  • 6:03 - 6:05
    property, the most important three
  • 6:05 - 6:06
    aspects of the property are
  • 6:06 - 6:10
    the location of the property, the location and the location.
  • 6:10 - 6:12
    And people say the same
  • 6:12 - 6:13
    thing about databases, a similar
  • 6:13 - 6:15
    parallel joke, which is that the
  • 6:15 - 6:17
    three most important things in
  • 6:17 - 6:19
    a database system is first
  • 6:19 - 6:23
    performance, second performance and again performance.
  • 6:23 - 6:24
    So database systems have
  • 6:24 - 6:28
    to do really thousands of queries
  • 6:28 - 6:31
    or updates per second.
  • 6:31 - 6:34
    These are not simple queries necessarily.
  • 6:34 - 6:36
    These may be very complex operations.
  • 6:36 - 6:39
    So, constructing a
  • 6:39 - 6:40
    database system, that can execute
  • 6:40 - 6:42
    queries, complex queries, at that
  • 6:42 - 6:44
    rate, over gigantic amounts of
  • 6:44 - 6:46
    data, terabytes of data is no
  • 6:46 - 6:47
    simple task, and that is
  • 6:47 - 6:49
    one of the major features also, provided
  • 6:49 - 6:51
    by a database management system.
  • 6:51 - 6:55
    And lastly, but again not last in importance is reliability.
  • 6:55 - 6:56
    Again, looking back at say
  • 6:56 - 6:58
    your banking system or your telecommunications
  • 6:58 - 7:00
    system, it's critically important
  • 7:00 - 7:03
    that those are up all the time.
  • 7:03 - 7:07
    So 99.99999% uptime
  • 7:07 - 7:08
    is the type of guarantee that
  • 7:08 - 7:13
    database management systems are making for their applications.
  • 7:13 - 7:14
    So that gives us an idea
  • 7:14 - 7:17
    of all the terrific things that a database system provides.
  • 7:17 - 7:18
    I hope you're already convinced that
  • 7:18 - 7:21
    if you have an application you
  • 7:21 - 7:22
    want to build that involves data, it
  • 7:22 - 7:23
    would be great to have all
  • 7:23 - 7:27
    of these features provided for you in a database system.
  • 7:27 - 7:28
    Now let me mention a few
  • 7:28 - 7:30
    of the aspects surrounding database
  • 7:30 - 7:31
    systems and scope a little
  • 7:31 - 7:34
    bit what we're going to be covering in this course.
  • 7:34 - 7:37
    When people build database applications,
  • 7:37 - 7:40
    sometimes they program them with what's known as a framework.
  • 7:40 - 7:41
    Currently at the time of
  • 7:41 - 7:42
    this video, some of the
  • 7:42 - 7:44
    popular frameworks are Django
  • 7:44 - 7:46
    or Ruby on Rails, and these
  • 7:46 - 7:48
    are environments that help you
  • 7:48 - 7:49
    develop your programs, and help
  • 7:49 - 7:51
    you generate, say the calls
  • 7:51 - 7:53
    to the database system. We're
  • 7:53 - 7:54
    not, in this set of
  • 7:54 - 7:55
    videos, going to be talking
  • 7:55 - 7:56
    about the frameworks, but rather we're
  • 7:56 - 7:58
    going to be talking about the database
  • 7:58 - 8:02
    system itself and how it is used and what it provides.
  • 8:02 - 8:04
    Second of all, database systems are
  • 8:04 - 8:07
    often used in conjunction with what's known as middleware.
  • 8:07 - 8:08
    Again, at the time of this
  • 8:08 - 8:10
    video, typical middleware might
  • 8:10 - 8:12
    be application servers, web servers,
  • 8:12 - 8:14
    so this middleware helps
  • 8:14 - 8:16
    applications interact with database
  • 8:16 - 8:18
    systems in certain types of ways.
  • 8:18 - 8:20
    Again, that's sort of outside the scope of the course.
  • 8:20 - 8:24
    We won't be talking about middleware in the course.
  • 8:24 - 8:25
    Finally, it's not the
  • 8:25 - 8:27
    case that every application that
  • 8:27 - 8:29
    involves data necessarily uses
  • 8:29 - 8:33
    the database system, so historically,
  • 8:33 - 8:34
    a lot of data has been stored
  • 8:34 - 8:37
    in files, I think that's a little bit less so these days.
  • 8:37 - 8:40
    Still, there's a lot of data out there that's simply sitting in files.
  • 8:40 - 8:43
    Excel spreadsheets is another
  • 8:43 - 8:45
    domain where there's a lot
  • 8:45 - 8:47
    of data sitting out there, and
  • 8:47 - 8:49
    it's useful in certain ways, and the
  • 8:49 - 8:51
    processing of data is not always
  • 8:51 - 8:54
    done through query languages associated with database systems.
  • 8:54 - 8:56
    For example, Hadoop is
  • 8:56 - 8:59
    a processing framework for running
  • 8:59 - 9:02
    operations on data that's stored in files.
  • 9:02 - 9:04
    Again, in this set of
  • 9:04 - 9:05
    videos we're going to focus
  • 9:05 - 9:07
    on the database management system
  • 9:07 - 9:09
    itself and on storing
  • 9:09 - 9:13
    and operating on data through a database management system.
  • 9:13 - 9:16
    So there are four key concepts that we're going to cover for now.
  • 9:16 - 9:18
    The first one is the data model.
  • 9:18 - 9:20
    The data model is a
  • 9:20 - 9:23
    description of, in general, how the data is structured.
  • 9:23 - 9:24
    One of the most common
  • 9:24 - 9:26
    data models is the relational
  • 9:26 - 9:28
    data model, we'll spend quite a bit of time on that.
  • 9:28 - 9:30
    In the relational data model
  • 9:30 - 9:33
    the data in the database is thought of as a set of records.
  • 9:33 - 9:35
    Now another popular way to
  • 9:35 - 9:37
    store data is for example,
  • 9:37 - 9:39
    in XML documents, so, an XML
  • 9:39 - 9:40
    document captures data, instead
  • 9:40 - 9:42
    of a set of records, as a
  • 9:42 - 9:45
    hierarchical structure, of labeled values.
  • 9:45 - 9:47
    Another possible data model
  • 9:47 - 9:49
    would be a graph data model, where
  • 9:49 - 9:52
    all data in the database is in the form of nodes and edges.
  • 9:52 - 9:54
    So again, a data model is
  • 9:54 - 9:55
    telling you the general form of
  • 9:55 - 9:58
    data that's going to be stored in the database.
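
To illustrate the three data models just mentioned, the sketch below holds roughly the same small facts in three shapes using plain Python structures. All names and values are hypothetical, and real database systems for each model provide far more than these containers do.

```python
import xml.etree.ElementTree as ET

# Relational model: the data is a set of records with the same fields.
student_records = {(1, "Amy", 3.9), (2, "Bob", 3.2)}

# XML: the data is a hierarchical structure of labeled values.
doc = ET.fromstring(
    "<students>"
    "<student><id>1</id><name>Amy</name><gpa>3.9</gpa></student>"
    "<student><id>2</id><name>Bob</name><gpa>3.2</gpa></student>"
    "</students>"
)
names_from_xml = [s.findtext("name") for s in doc.findall("student")]

# Graph model: the data is nodes plus labeled edges between them.
nodes = {
    "s1": {"name": "Amy"},
    "s2": {"name": "Bob"},
    "c1": {"name": "Example College"},
}
edges = [("s1", "applied_to", "c1"), ("s2", "applied_to", "c1")]
applicants = sorted(
    nodes[src]["name"]
    for src, label, dst in edges
    if label == "applied_to" and dst == "c1"
)

print(sorted(name for _id, name, _gpa in student_records))  # ['Amy', 'Bob']
print(names_from_xml)                                        # ['Amy', 'Bob']
print(applicants)                                            # ['Amy', 'Bob']
```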
  • 9:58 - 10:02
    Next is the concept of schema versus data.
  • 10:02 - 10:03
    One can think of this kind
  • 10:03 - 10:07
    of like types and variables in a programming language.
  • 10:07 - 10:09
    The schema sets up
  • 10:09 - 10:11
    the structure of the database.
  • 10:11 - 10:12
    Maybe I'm going to have information about
  • 10:12 - 10:15
    students with IDs and
  • 10:15 - 10:17
    GPAs, or about colleges,
  • 10:17 - 10:18
    and it's just going to tell
  • 10:18 - 10:19
    me the structure of the database
  • 10:19 - 10:21
    whereas the data is the actual
  • 10:21 - 10:25
    data stored within the schema.
  • 10:25 - 10:26
    Again, in a program, you
  • 10:26 - 10:27
    set up types and then you
  • 10:27 - 10:28
    have variables of those types, we'll
  • 10:28 - 10:29
    set up a schema, and then
  • 10:29 - 10:32
    we will have a whole bunch of data that adheres to that schema.
  • 10:32 - 10:34
    Typically the schema is set
  • 10:34 - 10:36
    up at the beginning, and doesn't change
  • 10:36 - 10:39
    very much, whereas the data changes rapidly.
  • 10:39 - 10:41
    Now to set up the schema,
  • 10:41 - 10:44
    one normally uses what's known as a data definition language.
  • 10:44 - 10:46
    Sometimes people use higher level design
  • 10:46 - 10:48
    tools that help them think
  • 10:48 - 10:49
    about the design and then from
  • 10:49 - 10:52
    there go to the data definition language.
  • 10:52 - 10:53
    But it's used in general to set up
  • 10:53 - 10:57
    a schema or structure for a particular database.
  • 10:57 - 10:58
    Once the schema has been set up
  • 10:58 - 11:00
    and data has been loaded, then
  • 11:00 - 11:01
    it's possible to start querying
  • 11:01 - 11:03
    and modifying the data and
  • 11:03 - 11:04
    that's typically done with what's
  • 11:04 - 11:07
    known as the data manipulation language,
  • 11:07 - 11:15
    so for querying and modifying the database.
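
The split between schema and data, and between the data definition language and the data manipulation language, can be sketched with SQL statements issued through Python's `sqlite3` module. The `student` and `college` tables and their columns are hypothetical, loosely following the students-and-colleges example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition language: set up the schema (the "types").
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("CREATE TABLE college (name TEXT PRIMARY KEY, state TEXT)")

# Data manipulation language: load, modify, and query the data (the "values").
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)", [(1, "Amy", 3.9), (2, "Bob", 3.2)]
)
conn.execute("UPDATE student SET gpa = 3.4 WHERE id = 2")
conn.execute("DELETE FROM student WHERE id = 1")
remaining = conn.execute("SELECT id, name, gpa FROM student").fetchall()

# The schema is unchanged no matter how much the data changes.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

print(remaining)  # [(2, 'Bob', 3.4)]
print(tables)     # ['college', 'student']
```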
  • 11:15 - 11:16
    Okay, so those are some key concepts.
  • 11:16 - 11:17
    Certainly we're going to get into
  • 11:17 - 11:21
    much more detail in later videos about each of these concepts.
  • 11:21 - 11:22
    Now let's talk about the
  • 11:22 - 11:25
    people that are involved in a database system. So
  • 11:25 - 11:26
    the first person we'll mention
  • 11:26 - 11:28
    is the person who implements the
  • 11:28 - 11:31
    database system itself, the database implementer.
  • 11:31 - 11:32
    That's the person who builds the
  • 11:32 - 11:35
    system, that's not going to be the focus of this course.
  • 11:35 - 11:37
    We're going to be focusing more on
  • 11:37 - 11:38
    the types of things that are
  • 11:38 - 11:41
    done by the other three people that I'm going to describe.
  • 11:41 - 11:43
    The next one is the database designer.
  • 11:43 - 11:45
    So the database designer is the
  • 11:45 - 11:47
    person who establishes the schema
  • 11:47 - 11:48
    for a database.
  • 11:48 - 11:51
    So, let's suppose we have an application.
  • 11:51 - 11:51
    We know there's going to be a
  • 11:51 - 11:53
    lot of data involved in the
  • 11:53 - 11:54
    application and we want to
  • 11:54 - 11:55
    figure out how we are gonna structure
  • 11:55 - 11:57
    that data before we build
  • 11:57 - 11:59
    the application. That's the job of the database designer.
  • 11:59 - 12:01
    It's a surprisingly difficult job
  • 12:01 - 12:03
    when you have very complex
  • 12:03 - 12:05
    data involved in an application.
  • 12:05 - 12:07
    Once you've established the
  • 12:07 - 12:08
    structure of the database
  • 12:08 - 12:10
    then it's time to build the
  • 12:10 - 12:11
    applications or programs that
  • 12:11 - 12:13
    are going to run on the
  • 12:13 - 12:15
    database, often interfacing between
  • 12:15 - 12:16
    the eventual user and the
  • 12:16 - 12:18
    data itself, and that's
  • 12:18 - 12:20
    the job of the application developer,
  • 12:20 - 12:26
    so those are the programs that operate on the database.
  • 12:26 - 12:28
    And again I've mentioned already
  • 12:28 - 12:29
    that you can have a database
  • 12:29 - 12:33
    with many different programs that operate on it; that's very common.
  • 12:33 - 12:34
    You might, for example, have a
  • 12:34 - 12:37
    sales database where some applications
  • 12:37 - 12:39
    are actually inserting the sales
  • 12:39 - 12:41
    as they happen, while others are analyzing the sales.
  • 12:41 - 12:43
    So it's not necessary to have
  • 12:43 - 12:46
    a one-to-one coupling between programs and databases.
  • 12:46 - 12:50
    And the last person is the database administrator.
  • 12:50 - 12:51
    So the database administrator is the
  • 12:51 - 12:53
    person who loads the data,
  • 12:53 - 12:57
    sort of gets the whole thing running and keeps it running smoothly.
  • 12:57 - 12:59
    So, this actually turns
  • 12:59 - 13:00
    out to be a very important job
  • 13:00 - 13:03
    for large database applications.
  • 13:03 - 13:04
    For better or worse, database systems
  • 13:04 - 13:06
    do tend to have a
  • 13:06 - 13:07
    number of tuning parameters
  • 13:07 - 13:09
    associated with them, and getting
  • 13:09 - 13:11
    those tuning parameters right can
  • 13:11 - 13:12
    make a significant difference in the
  • 13:12 - 13:15
    all important performance of the database system.
  • 13:15 - 13:17
    So database administrators are
  • 13:17 - 13:20
    actually, highly valued, very important, highly
  • 13:20 - 13:22
    paid as a matter of fact,
  • 13:22 - 13:24
    and are, for large deployments,
  • 13:24 - 13:26
    an important person in the entire process.
  • 13:26 - 13:28
    So those are the people that
  • 13:28 - 13:29
    are involved, again, in this
  • 13:29 - 13:31
    class we'll be focusing mostly on
  • 13:31 - 13:33
    designing and developing applications,
  • 13:33 - 13:36
    a little bit on administration, but in
  • 13:36 - 13:37
    general thinking about databases and
  • 13:37 - 13:40
    the use of database management systems
  • 13:40 - 13:43
    from the perspective of the application builder and user.
  • 13:43 - 13:45
    To conclude, we're going to
  • 13:45 - 13:47
    be learning about databases and whether
  • 13:47 - 13:48
    you know it or not, you're
  • 13:48 - 13:50
    already using a database every day.
  • 13:50 - 13:52
    In fact, more likely than not
  • 13:52 -
    you're using a database every hour.
Title:
01-01-introduction.mp4
Video Language:
English
Duration:
13:55