1
00:00:02,719 --> 00:00:07,360
The Internet: HTTP and HTML

2
00:00:07,360 --> 00:00:11,740
I'm Jasmine and I'm a program manager on the XBOX One engineering

3
00:00:11,759 --> 00:00:18,700
team. One of our biggest features is called XBOX Live. It's an online service that connects

4
00:00:18,700 --> 00:00:24,099
gamers from all around the world, and we rely on the internet to make that happen. This

5
00:00:24,099 --> 00:00:30,500
is no easy task and there are a lot of things happening behind the scenes. The internet

6
00:00:30,500 --> 00:00:36,280
is totally changing how people interact and connect. But how does it work? How do the

7
00:00:36,280 --> 00:00:43,489
computers all across the world actually communicate with each other? Let's look at web browsing.

8
00:00:43,489 --> 00:00:50,199
First, you open a web browser. It's the app you use to access web pages. Next, you

9
00:00:50,199 --> 00:00:55,899
type in the web address, or URL (which stands for Uniform Resource Locator), of the website

10
00:00:55,899 --> 00:01:06,810
you want to visit, like tumblr.com. Hi, I'm David Karp, the founder of Tumblr, and we're

11
00:01:06,810 --> 00:01:12,560
here today to talk about how those web browsers we use every day actually work. So you've probably

12
00:01:12,560 --> 00:01:16,350
wondered what actually happens when you type an address into your web browser and then

13
00:01:16,350 --> 00:01:21,020
hit enter. And it really is about as crazy as you can imagine. So in that moment your

14
00:01:21,020 --> 00:01:25,930
computer starts talking to another computer, called a server, that's usually thousands

15
00:01:25,930 --> 00:01:32,450
of miles away. And in milliseconds your computer asks that server for a website, and that server

16
00:01:32,450 --> 00:01:39,530
starts to talk back to your computer in a language called HTTP. HTTP stands for HyperText

17
00:01:39,530 --> 00:01:43,680
Transfer Protocol.
You can kind of think of it as the language that one computer uses

18
00:01:43,680 --> 00:01:48,009
to ask another computer for a document. And it's actually really pretty straightforward.

19
00:01:48,009 --> 00:01:52,540
If you were to intercept the conversation between your computer and a web server on

20
00:01:52,540 --> 00:01:56,670
the internet, it's mainly made up of something called "GET" requests. Those are really very

21
00:01:56,670 --> 00:02:01,590
simply the word GET and the name of the document that you're requesting. So if you try to log

22
00:02:01,590 --> 00:02:06,360
into Tumblr and load our login page, all you're doing is sending a GET request to Tumblr's

23
00:02:06,360 --> 00:02:14,290
server that says GET /login. And that tells Tumblr's server that you want all of the HTML

24
00:02:14,290 --> 00:02:21,800
code for the Tumblr login page. So HTML stands for HyperText Markup Language and you can

25
00:02:21,800 --> 00:02:26,470
think of that as the language you use to tell a web browser how to make a page look. If

26
00:02:26,470 --> 00:02:30,540
you think about something like Wikipedia, it's really just a big simple document,

27
00:02:30,540 --> 00:02:35,630
and HTML is the language that you use to make that title big and bold, to make the font

28
00:02:35,630 --> 00:02:42,690
the right font, to link certain text to certain other pages, to make some text bold, to make some

29
00:02:42,690 --> 00:02:46,740
text italic, to put an image in the middle of the page, to align the image to the right,

30
00:02:46,740 --> 00:02:52,990
to align the image to the left. The text of a web page is included directly in the HTML,

31
00:02:52,990 --> 00:02:58,380
but other parts like images or videos are separate files with their own URLs that need

32
00:02:58,380 --> 00:03:04,540
to be requested. The browser sends a separate HTTP request for each of these and displays

33
00:03:04,540 --> 00:03:11,670
them as they arrive.
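As a rough illustration of the GET request and the separate image requests described above, here is a minimal Python sketch. It only builds the request text and scans a made-up page; it does not contact any server. The host `example.com`, the sample HTML, and the helper names `build_get_request` and `ResourceFinder` are assumptions for illustration, not Tumblr's actual markup or code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # a blank line marks the end of the request headers
    )


class ResourceFinder(HTMLParser):
    """Collect the URLs of images and videos that a page refers to."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "video"):
            src = dict(attrs).get("src")
            if src:
                # Resolve relative addresses against the page's own URL.
                self.resources.append(urljoin(self.base_url, src))


# Hypothetical page: the text is in the HTML, the images are separate files.
SAMPLE_HTML = """
<html><body>
  <h1>Log in</h1>
  <p>Some <b>bold</b> and <i>italic</i> text with a <a href="/about">link</a>.</p>
  <img src="/images/logo.png">
  <img src="banner.jpg">
</body></html>
"""

request_text = build_get_request("example.com", "/login")
finder = ResourceFinder("http://example.com/login")
finder.feed(SAMPLE_HTML)

print(request_text.splitlines()[0])  # GET /login HTTP/1.1
for url in finder.resources:
    print("needs its own GET request:", url)
```

In this sketch the page text arrives with the first response, while each entry in `finder.resources` stands for one more request the browser would have to send.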
If a web page has a lot of different images, each of them causes a

34
00:03:11,670 --> 00:03:20,780
separate HTTP request and the page loads more slowly. Now sometimes when you browse the web, you're

35
00:03:20,780 --> 00:03:25,880
not just requesting pages with GET requests. Sometimes you send information, like when you

36
00:03:25,880 --> 00:03:32,300
fill out a form or type a search query. Your browser sends this information in plain text

37
00:03:32,300 --> 00:03:39,090
to the web server using an HTTP POST request. Let's say you log in to Tumblr. Well, the first

38
00:03:39,090 --> 00:03:45,360
thing you do is you make a POST request, that is a POST to Tumblr's login page that has

39
00:03:45,360 --> 00:03:49,680
some data attached to it. It has your email address, it has your password. That goes to

40
00:03:49,680 --> 00:03:55,350
Tumblr's server. Tumblr's server figures out that okay, you're David. It sends a web page

41
00:03:55,350 --> 00:04:00,480
back to your browser that says, "Success! Logged in as David." But along with that web page,

42
00:04:00,480 --> 00:04:07,000
it also attaches a little bit of invisible cookie data that your browser sees and knows to save.

43
00:04:07,000 --> 00:04:11,360
And it's really important because it's really the only way that a website can remember who

44
00:04:11,360 --> 00:04:16,940
you are. All that cookie data really is, is an ID card for Tumblr. It's a number that

45
00:04:16,940 --> 00:04:21,790
identifies you as David. And your web browser holds on to that number and the next time

46
00:04:21,790 --> 00:04:26,660
you refresh Tumblr, the next time you go to Tumblr.com, your web browser knows to automatically

47
00:04:26,660 --> 00:04:30,930
attach that ID number to the request that it sends over to Tumblr's servers. So now

48
00:04:30,930 --> 00:04:35,970
Tumblr's server sees the request coming from your browser, sees the ID number, and knows

49
00:04:35,970 --> 00:04:43,940
"OK, this is a request from David."
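The login and cookie exchange described above can be sketched in a few lines of Python. This is a toy, in-memory simulation under stated assumptions: the user table, the cookie name `session_id`, and the functions `server_handle_login` and `server_handle_page` are all hypothetical, and a real site would never store or compare passwords in plain text like this.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlencode

# Hypothetical server-side state (made-up account, toy password handling).
USERS = {"david@example.com": ("hunter2", "David")}
SESSIONS = {}  # session ID -> user name


def server_handle_login(body):
    """Handle the body of a POST to the login page.

    Returns (page text, cookie string or None).
    """
    form = parse_qs(body)
    email = form.get("email", [""])[0]
    password = form.get("password", [""])[0]
    record = USERS.get(email)
    if record is None or record[0] != password:
        return "Login failed", None
    session_id = secrets.token_hex(16)  # the "ID card" number
    SESSIONS[session_id] = record[1]
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    return f"Success! Logged in as {record[1]}", cookie["session_id"].OutputString()


def server_handle_page(cookie_header):
    """Handle a later request that carries the saved cookie."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    name = SESSIONS.get(morsel.value) if morsel else None
    return f"This is a request from {name}" if name else "Not logged in"


# "Browser" side: POST the form data, save the cookie, send it back later.
body = urlencode({"email": "david@example.com", "password": "hunter2"})
page, cookie_jar = server_handle_login(body)
print(page)
print(server_handle_page(cookie_jar))
```

The server never sends the password back; after login, the random ID in the cookie is the only thing the browser needs to attach for the server to recognize the user.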
Now, the internet is completely open. All

50
00:04:43,940 --> 00:04:49,350
of its connections are shared and information is sent in plain text. This makes it possible

51
00:04:49,350 --> 00:04:55,630
for hackers to snoop on any personal information that you send over the internet. But safe

52
00:04:55,630 --> 00:05:00,970
websites prevent this by asking your web browser to communicate on a secure channel

53
00:05:00,970 --> 00:05:07,630
using something called Secure Sockets Layer (SSL) and its successor, Transport Layer Security (TLS).

54
00:05:07,630 --> 00:05:14,000
You can think of SSL and TLS as a layer of security wrapped around your communications

55
00:05:14,000 --> 00:05:20,530
to protect them from snooping or tampering. SSL and TLS are active when you see the little

56
00:05:20,530 --> 00:05:27,440
lock that appears in your browser address bar, next to the HTTPS. The HTTPS protocol

57
00:05:27,440 --> 00:05:33,840
ensures that your HTTP requests are secure and protected. When a website asks your browser

58
00:05:33,840 --> 00:05:39,500
to engage in a secure connection, it first provides a digital certificate, which is like

59
00:05:39,500 --> 00:05:45,140
an official ID card proving that it's the website it claims to be. Digital certificates

60
00:05:45,140 --> 00:05:49,900
are published by certificate authorities, which are trusted entities that verify the

61
00:05:49,900 --> 00:05:55,280
identities of websites and issue certificates for them, just like a government can issue

62
00:05:55,280 --> 00:06:01,030
IDs or passports. Now, if a website tries to start a secure connection without a properly

63
00:06:01,030 --> 00:06:09,590
issued digital certificate, your browser will warn you. That's the basics of web browsing,

64
00:06:09,590 --> 00:06:17,010
the part of the internet we see day to day. To summarize, HTTP and DNS manage the sending

65
00:06:17,010 --> 00:06:23,450
and receiving of HTML, media files, or anything on the web.
What makes this possible under

66
00:06:23,450 --> 00:06:30,370
the hood are TCP/IP and router networks that break down and transport information in small

67
00:06:30,370 --> 00:06:36,670
packets. Those packets themselves are made up of binary: sequences of 1s and 0s that

68
00:06:36,670 --> 00:06:42,550
are physically sent through electric wires, fiber optic cables, and wireless networks.

69
00:06:42,550 --> 00:06:47,440
Fortunately, once you've learned how one layer of the internet works, you can rely on it

70
00:06:47,440 --> 00:06:52,070
without remembering all the details. And we can trust that all those layers will work

71
00:06:52,070 --> 00:06:59,090
together to successfully deliver information at scale and with reliability.
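To make the idea of packets and binary a little more concrete, here is a small Python sketch. It is a deliberate simplification and not the real TCP/IP packet format: the packet size, the `(sequence number, bytes)` tuples, and the helper names `to_packets`, `to_bits`, and `reassemble` are assumptions chosen for illustration.

```python
def to_packets(message, size=4):
    """Split a message into numbered chunks of at most `size` bytes."""
    data = message.encode("utf-8")
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]


def to_bits(chunk):
    """Show the 1s and 0s that make up a chunk of bytes."""
    return " ".join(f"{byte:08b}" for byte in chunk)


def reassemble(packets):
    """Put packets back in order by sequence number and rebuild the message."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered).decode("utf-8")


packets = to_packets("GET /login")
for seq, chunk in packets:
    print(seq, chunk, to_bits(chunk))

# Packets may arrive out of order; the sequence numbers let us restore them.
print(reassemble(list(reversed(packets))))  # GET /login
```

Real networks add addresses, checksums, and retransmission on top of this, but the core idea is the same: small numbered pieces of binary data that are put back together at the other end.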