
Lecture 11: Q&A (2020)

  • 0:00 - 0:07
    I guess we should do an intro to this as well,
  • 0:07 - 0:10
    so this is just sort of a
  • 0:10 - 0:15
    free-form Q&A lecture where you, as in
    the two people sitting here, but also
  • 0:15 - 0:20
    everyone at home who did not come here
    in person get to ask questions and we
  • 0:20 - 0:23
    have a bunch of questions people asked
    in advance but you can also ask
  • 0:23 - 0:27
    additional questions during the lecture; for the two
    of you who are here, you can do it either
  • 0:27 - 0:34
    by raising your hand or you can submit it on
    the forum and be anonymous, it's up to you
  • 0:34 - 0:36
    regardless though, what we're gonna
    do is just go through some of the
  • 0:36 - 0:40
    questions that have been asked and try to
    give as helpful answers as we can
  • 0:40 - 0:44
    although they are unprepared on our side and
  • 0:44 - 0:46
    yeah that's the plan I guess we go
  • 0:46 - 0:49
    from most popular to least popular
  • 0:49 - 0:50
    fire away
  • 0:50 - 0:52
    all right so for our first question any
  • 0:52 - 0:56
    recommendations on learning operating
    system related topics like processes,
  • 0:56 - 1:00
    virtual memory, interrupts,
    memory management, etc
  • 1:00 - 1:02
    so I think this
  • 1:02 - 1:07
    is an interesting question because these
    are really low level concepts that often
  • 1:07 - 1:11
    do not matter, unless you have to
    deal with this in some capacity,
  • 1:11 - 1:13
    right so
  • 1:13 - 1:18
    one instance where this matters is you're
    writing really low level code like
  • 1:18 - 1:20
    you're implementing a kernel or something
    like that, or you want to
  • 1:20 - 1:23
    just hack on the Linux kernel.
  • 1:23 - 1:25
    It's rare otherwise that you need to work with
  • 1:25 - 1:28
    especially like virtual memory and
    interrupts and stuff yourself
  • 1:28 - 1:32
    processes, I think are a more general concept
    that we've talked a little bit about in
  • 1:32 - 1:37
    this class as well and tools like
    htop, pgrep, kill, and signals and
  • 1:37 - 1:38
    that sort of stuff
  • 1:38 - 1:39
    in terms of learning it
  • 1:39 - 1:45
    maybe one of the best ways, is to try to
    take either an introductory class on the
  • 1:45 - 1:51
    topic, so for example MIT has a class
    called 6.828, which is where
  • 1:51 - 1:55
    you essentially build and develop your
    own operating system based on some code
  • 1:55 - 1:59
    that you're given, and all of those labs
    are publicly available and all the
  • 1:59 - 2:02
    resources for the class are publicly available,
    and so that is a good way to
  • 2:02 - 2:04
    really learn them: by doing them yourself.
  • 2:04 - 2:05
    There are also various
  • 2:05 - 2:11
    tutorials online that basically guide
    you through how to write a kernel
  • 2:11 - 2:15
    from scratch. Not necessarily a very
    elaborate one, not one you would want
  • 2:15 - 2:21
    to run any real software on, but just to
    teach you the basics and so that would
  • 2:21 - 2:22
    be another thing to look up.
  • 2:22 - 2:24
    Like "how do I write a kernel in" and then your
  • 2:24 - 2:28
    language of choice. You will probably not
    find one that lets you do it in Python
  • 2:28 - 2:34
    but in like C, C++, Rust, there
    are a bunch of tutorials like this
  • 2:34 - 2:37
    one other note on operating systems
  • 2:37 - 2:40
    so like Jon mentioned MIT has a 6.828 class but
  • 2:40 - 2:43
    if you're looking for a more high-level
    overview, not necessarily programming
  • 2:43 - 2:46
    an operating system, but just learning about
    the concepts another good resource
  • 2:46 - 2:51
    is a book called "Modern Operating
    Systems" by Andy Tannenbaum
  • 2:51 - 2:58
    there's also actually a book called "The Design and Implementation
    of the FreeBSD Operating System" which is really good,
  • 2:58 - 3:03
    It doesn't go through Linux, but it goes
    through FreeBSD and the BSD kernel is
  • 3:03 - 3:07
    arguably better organized than the Linux
    one and better documented and so it
  • 3:07 - 3:12
    might be a gentler introduction to some of those
    topics than trying to understand Linux
  • 3:12 - 3:15
    You want to check it as answered?
  • 3:15 - 3:17
    - Yes + Nice
  • 3:17 - 3:17
    Answered
  • 3:17 - 3:19
    For our next question,
  • 3:19 - 3:24
    What are some of the tools you'd
    prioritize learning first?
  • 3:24 - 3:30
    - Maybe we can all go through and
    give our opinion on this? + Yeah
  • 3:30 - 3:32
    Tools to prioritize learning first?
  • 3:32 - 3:36
    I think learning your editor well,
    just serves you in all capacities
  • 3:37 - 3:41
    like being efficient at editing files,
    is just like a majority of
  • 3:41 - 3:45
    what you're going to spend your time doing.
    And in general, just using your
  • 3:45 - 3:49
    keyboard more and your mouse less. It means
    that you get to spend more of your
  • 3:49 - 3:54
    time doing useful things and
    less of your time moving
  • 3:54 - 3:56
    I think that would be my top priority,
  • 4:05 - 4:07
    so I would say that what
  • 4:07 - 4:10
    tool to prioritize will depend
    on what exactly you're doing
  • 4:10 - 4:16
    I think the core idea is you should try
    to find the types of tasks that you are
  • 4:16 - 4:18
    doing repetitively and so
  • 4:18 - 4:24
    if you are doing some sort of like
    machine learning workload and
  • 4:24 - 4:27
    you find yourself using Jupyter notebooks,
    like the one we presented
  • 4:27 - 4:33
    yesterday, a lot. Then again, using
    a mouse for that might not be
  • 4:33 - 4:36
    the best idea and you want to familiarize yourself
    with the keyboard shortcuts
  • 4:36 - 4:41
    and with pretty much anything you will
    end up figuring out that there are some
  • 4:41 - 4:46
    repetitive tasks when you're using a
    computer, and find yourself thinking
  • 4:46 - 4:48
    oh there's probably a better way to do this
  • 4:48 - 4:51
    be it a terminal, be it an editor
  • 4:51 - 4:56
    And it might be really interesting to
    learn to use some of the topics that
  • 4:56 - 5:01
    we have covered, but if they're not
    extremely useful on an everyday
  • 5:01 - 5:05
    basis then it might not be worth prioritizing them
  • 5:07 - 5:07
    Out of the topics
  • 5:08 - 5:12
    covered in this class, in my opinion, two
    of the most useful things are version
  • 5:12 - 5:15
    control and text editors, and I think they're
    a little bit different from each
  • 5:15 - 5:19
    other, in the sense that text editors I
    think are really useful to learn well,
  • 5:19 - 5:22
    but it was probably the case that before
    you started using Vim and all its fancy
  • 5:22 - 5:25
    keyboard shortcuts you had some other
    text editor you were using, and
  • 5:25 - 5:30
    you could edit text just fine maybe a little
    bit inefficiently, whereas I think
  • 5:30 - 5:33
    version control is another really useful
    skill and that's one where if you don't
  • 5:33 - 5:37
    really know the tool properly, it can actually
    lead to some problems like loss
  • 5:37 - 5:39
    of data or just inability to collaborate
    properly with people. So I
  • 5:39 - 5:43
    think version control is one of the first
    things that's worth learning well.
  • 5:43 - 5:47
    Yeah, I agree with that. I think
    learning a tool like Git is just
  • 5:47 - 5:50
    gonna save you so much heartache down the line.
  • 5:50 - 5:51
    Also, to add on to that,
  • 5:52 - 5:57
    it really helps you collaborate with others,
    and Anish touched a little bit on GitHub
  • 5:57 - 6:01
    in the last lecture, and just learning
    to use that tool well in order
  • 6:01 - 6:05
    to work on larger software projects
    that other people are working on is
  • 6:05 - 6:06
    an invaluable skill.
  • 6:10 - 6:11
    For our next question,
  • 6:11 - 6:13
    "When do I use Python versus a
  • 6:13 - 6:16
    Bash script versus some other language?"
  • 6:16 - 6:20
    This is tough, because I think this comes
  • 6:20 - 6:22
    down to what Jose was saying earlier too,
  • 6:22 - 6:24
    that it really depends on
    what you're trying to do.
  • 6:24 - 6:27
    For me, I think for Bash scripts in particular,
  • 6:27 - 6:29
    Bash scripts are for
  • 6:29 - 6:33
    automating running a bunch of commands.
    You don't want to write any
  • 6:33 - 6:35
    other, like, business logic in Bash.
  • 6:35 - 6:39
    Like, it is just for, 'I want to run these
  • 6:39 - 6:44
    commands, in this order... maybe with
    arguments?' But - but, like, even that,
  • 6:44 - 6:48
    it's unclear that you want a Bash script
    once you start taking arguments.
  • 6:48 - 6:53
    Similarly, like, once you start doing any
    kind of, like, text processing, or
  • 6:53 - 6:55
    configuration, all that,
  • 6:55 - 6:59
    reach for a language that is... a more serious
  • 6:59 - 7:01
    programming language than Bash is.
  • 7:01 - 7:03
    Bash is really for short, one-off
  • 7:03 - 7:10
    scripts, or ones that have a very well-defined
    use case, on the terminal, in
  • 7:10 - 7:13
    the shell, probably.
  • 7:13 - 7:16
    For a slightly more concrete guideline,
    you might say, 'Write a
  • 7:16 - 7:19
    Bash script if it's less than a hundred
    lines of code or so', but once it gets
  • 7:19 - 7:22
    beyond that point, Bash is kind of
    unwieldy, and it's probably worth
  • 7:22 - 7:25
    switching to a more serious programming
    language, like Python.
  • 7:25 - 7:27
    And, to add to that,
  • 7:27 - 7:32
    I would say, like, I found myself writing,
    sometimes, scripts in Python, because
  • 7:32 - 7:37
    if I have already solved some subproblem
    that covers part of the problem in Python,
  • 7:37 - 7:41
    I find it much easier to compose the
    previous solution that I found in
  • 7:41 - 7:46
    Python than to try to reuse Bash code,
    that I don't find as reusable as Python.
  • 7:46 - 7:50
    And in the same way it's kind of nice that
    a lot of people have written something
  • 7:50 - 7:53
    like Python libraries or like Ruby libraries
    to do a lot of these things,
  • 7:53 - 7:58
    whereas, in Bash, it's kind of hard
    to have, like, code reuse.
  • 7:58 - 8:02
    And, in fact,
  • 8:02 - 8:08
    I think to add to that, usually, if you
    find a library, in some language that
  • 8:08 - 8:12
    helps with the task you're trying to
    do, use that language for the job.
  • 8:12 - 8:16
    And in Bash, there are no libraries. There
    are only the programs on your computer.
  • 8:16 - 8:19
    So you probably don't want to use
    it, unless like there's a program
  • 8:19 - 8:24
    you can just invoke. I do think another
    thing worth remembering about Bash is:
  • 8:24 - 8:26
    Bash is really hard to get right.
  • 8:26 - 8:31
    It's very easy to get it right for the particular
    use case you're trying to solve right now,
  • 8:31 - 8:32
    but, like, things like,
  • 8:32 - 8:36
    "What if one of the filenames has a space in it?"
  • 8:36 - 8:39
    That has caused so many bugs, and so
  • 8:39 - 8:43
    many problems in Bash scripts. And, if you
    use a - a real programming language, then
  • 8:43 - 8:47
    those problems just go away.
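
    For illustration, a minimal sketch of that pitfall (the filenames and commands here are made up):

        # word-splitting breaks on "my file.txt": rm sees two arguments, "my" and "file.txt"
        for f in $(ls); do rm "$f"; done
        # safer: let the shell glob and keep the variable quoted
        for f in *; do rm -- "$f"; done
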
  • 8:47 - 8:50
    Yes! Checked it.
  • 8:51 - 8:52
    For our next question,
  • 8:52 - 8:56
    what is the difference between sourcing
    a script, and executing that script?
  • 8:57 - 9:03
    Ooh. So, this, actually, we got in office
    hours a - a while back, as well, which is,
  • 9:03 - 9:07
    'Aren't they the same? Like, aren't they
    both just running the Bash script?'
  • 9:07 - 9:08
    And, it is true
  • 9:08 - 9:12
    both of these will end up executing the
    lines of code that are in the script.
  • 9:12 - 9:17
    The ways in which they differ is that
    sourcing a script is telling your
  • 9:17 - 9:23
    current Bash session to execute that program,
  • 9:23 - 9:29
    whereas the other one is, 'Start up a new instance
    of Bash, and run the program there, instead.'
  • 9:29 - 9:35
    And, this matters for things like... Imagine that
    "script.sh" tries to change directories.
  • 9:35 - 9:38
    If you are running the script,
    as in the second invocation,
  • 9:38 - 9:43
    "./script.sh", then the new
    process is going to change
  • 9:43 - 9:47
    directories. But, by the time that script
    exits, and returns to your shell,
  • 9:47 - 9:52
    your shell still remains in the same place. However,
    if you do "cd" in a script, and you "source" it,
  • 9:52 - 9:55
    your current instance of Bash is the
    one that ends up running it, and
  • 9:55 - 9:58
    so, it ends up "cd"-ing where you are.
  • 9:58 - 10:01
    This is also why, if you define functions,
  • 10:01 - 10:05
    for example, that you may want to
    execute in your shell session,
  • 10:05 - 10:07
    you need to source the script, not run it,
  • 10:07 - 10:10
    because if you run it, that function
    will be defined in the
  • 10:10 - 10:12
    instance of Bash,
  • 10:12 - 10:17
    in the Bash process that gets launched, but it
    will not be defined in your current shell.
  • 10:17 - 10:23
    I think those are two of the biggest
    differences between the two.
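
    A minimal sketch of that difference, assuming a throwaway script.sh that starts with a bash shebang and has been marked executable:

        # script.sh
        cd /tmp
        greet() { echo "hello from script.sh"; }

        ./script.sh        # a child bash cd's and defines greet, then exits;
                           # your shell stays where it was and has no greet
        source script.sh   # your current shell runs it: you are now in /tmp
                           # and greet is defined
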
  • 10:29 - 10:30
    Next question...
  • 10:30 - 10:35
    "What are the places where various packages and tools
    are stored and how does referencing them work?
  • 10:35 - 10:39
    What even is /bin or /lib?"
  • 10:39 - 10:45
    So, as we covered in the first lecture,
    there is this PATH environment variable,
  • 10:45 - 10:50
    which is like a colon-separated
    string of all the places
  • 10:50 - 10:55
    where your shell is gonna look for binaries.
    And, if you just do something like
  • 10:55 - 10:58
    "echo $PATH", you're gonna get this list;
  • 10:58 - 11:02
    all these places are gonna
    be consulted, in order.
  • 11:02 - 11:04
    It's gonna go through all of them, and, in fact,
  • 11:04 - 11:07
    - There is already... Did we cover which? + Yeah
  • 11:07 - 11:10
    So, if you run "which", and a specific command,
  • 11:10 - 11:14
    the shell is actually gonna tell
    you where it's finding this (command).
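
    For example (the paths shown are just typical values, not what every machine will print):

        echo $PATH     # e.g. /usr/local/bin:/usr/bin:/bin
        which python   # e.g. /usr/bin/python -- the first match found along $PATH
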
  • 11:14 - 11:15
    Beyond that,
  • 11:15 - 11:20
    there are some conventions where a lot
    of programs will install their binaries
  • 11:20 - 11:24
    in directories like /usr/bin (or at
    least they will include symlinks)
  • 11:24 - 11:26
    in /usr/bin so you can find them
  • 11:26 - 11:28
    There's also a /usr/local/bin
  • 11:28 - 11:34
    There are special directories. For example,
    /usr/sbin, which is only for the superuser, and
  • 11:34 - 11:38
    some of these conventions are slightly
    different between different distros so
  • 11:38 - 11:48
    I know some distros, for example, install
    user libraries under /opt
  • 11:51 - 11:55
    Yeah I think one thing just
    to talk a little bit more
  • 11:56 - 12:01
    about /bin and then Anish maybe you can
    do the other folders so when it comes to
  • 12:01 - 12:03
    /bin the convention
  • 12:03 - 12:10
    There are conventions, and the conventions are
    usually: /bin is for essential system utilities,
  • 12:10 - 12:13
    /usr/bin is for user programs, and
  • 12:13 - 12:17
    /usr/local/bin is for user
    compiled programs, sort of
  • 12:17 - 12:22
    so things that you installed that you intend
    the user to run, are in /usr/bin
  • 12:22 - 12:27
    things that a user has compiled themselves and stuck
    on your system, probably go in /usr/local/bin
  • 12:27 - 12:30
    but again, this varies a lot from machine
    to machine, and distro to distro
  • 12:30 - 12:34
    On Arch Linux, for example, /bin
    is a symlink to /usr/bin
  • 12:34 - 12:40
    They're the same and as Jose mentioned, there's
    also /sbin which is for programs that are
  • 12:40 - 12:44
    intended to only be run as root, that
    also varies from distro to distro
  • 12:44 - 12:47
    whether you even have that directory, and
    on many systems like /usr/local/bin
  • 12:47 - 12:51
    might not even be in your PATH, or
    might not even exist on your system
  • 12:51 - 12:56
    On BSD on the other hand /usr/local/bin
    is often used a lot more heavily
  • 12:57 - 12:57
    yeah so
  • 12:57 - 13:01
    What we were talking about so far, these
    are all ways that files and folders are
  • 13:01 - 13:05
    organized on Linux or BSD; things
    vary a little bit between
  • 13:05 - 13:07
    that and macOS or other platforms
  • 13:07 - 13:09
    I think for the specific locations,
  • 13:09 - 13:11
    if you want to know exactly what it's
    used for, you can look it up
  • 13:11 - 13:17
    But some general patterns to keep in mind are: anything
    with /bin in it has binary executable programs in it,
  • 13:17 - 13:20
    anything with /lib in it has
    libraries in it so things that
  • 13:20 - 13:25
    programs can link against, and then some
    other things that are useful to know are
  • 13:25 - 13:29
    there's a /etc on many systems, which
    has configuration files in it and
  • 13:29 - 13:34
    then there's /home, which underneath that directory
    contains each user's home directory
  • 13:34 - 13:39
    so like on a Linux box my username,
    or if it's Anish, will
  • 13:39 - 13:41
    correspond to a home directory /home/anish
  • 13:42 - 13:43
    Yeah I guess there are
  • 13:43 - 13:48
    a couple of others like /tmp is usually
    a temporary directory that gets
  • 13:48 - 13:51
    erased when you reboot (not always, but sometimes;
    you should check on your system)
  • 13:52 - 13:59
    There's a /var which often holds like
    files that change over time, so
  • 13:59 - 14:06
    these are usually going to be things
    like lock files for package managers
  • 14:06 - 14:12
    they're gonna be things like log files,
    files to keep track of process IDs
  • 14:12 - 14:16
    then there's /dev which shows devices so
  • 14:16 - 14:21
    usually these are special files that
    correspond to devices on your system. We
  • 14:21 - 14:27
    talked about /sys, Anish mentioned /etc
  • 14:29 - 14:36
    /opt is a common one for just like third-party
    software that basically it's usually for
  • 14:36 - 14:41
    companies that ported their software to Linux
    but they don't actually understand what
  • 14:41 - 14:45
    running software on Linux is like, and
    so they just have a directory with all
  • 14:45 - 14:51
    their stuff in it and when those get installed
    they usually get installed into /opt
  • 14:51 - 14:56
    I think those are the ones off the top of my head
  • 14:56 - 14:58
    yeah
  • 14:58 - 15:02
    And we will list these in our lecture notes
    which we will produce after this lecture
  • 15:02 - 15:04
    Next question
  • 15:04 - 15:07
    Should I apt-get install a Python whatever
  • 15:07 - 15:11
    package or pip install that package
  • 15:11 - 15:14
    so this is a good question that I think at
  • 15:14 - 15:17
    a higher level this question is asking
    should I use my systems package manager
  • 15:17 - 15:21
    to install things or should I use some other
    package manager. Like in this case
  • 15:21 - 15:25
    one that's more specific to a particular
    language. And the answer here is also
  • 15:25 - 15:29
    kind of it depends, sometimes it's nice
    to manage things using a system package
  • 15:29 - 15:32
    manager so everything can be installed
    and upgraded in a single place but
  • 15:32 - 15:35
    I think oftentimes whatever is available
    in the system repositories the things
  • 15:35 - 15:38
    you can get via a tool like
    apt-get or something similar
  • 15:38 - 15:41
    might be slightly out of date compared to
    the more language specific repository
  • 15:41 - 15:45
    so for example a lot of the Python packages
    I use I really want the most
  • 15:45 - 15:48
    up-to-date version and so
    I use pip to install them
  • 15:49 - 15:51
    Then, to extend on that, it is
  • 15:51 - 15:58
    sometimes the case that the system packages
    might require some other
  • 15:58 - 16:02
    dependencies that you might not have realized
    about, and it might also be
  • 16:02 - 16:07
    the case that for some systems,
    at least for Alpine Linux, they
  • 16:07 - 16:11
    don't have wheels for like a lot of the
    Python packages so it will just take
  • 16:11 - 16:15
    longer to compile them, it will take more
    space because they have to compile them
  • 16:15 - 16:21
    from scratch. Whereas if you just go
    to pip, pip has binaries for a lot of
  • 16:21 - 16:23
    different platforms and that will probably work
  • 16:23 - 16:29
    You also should be aware that pip might not do
    the exact same thing on different computers
  • 16:29 - 16:34
    So, for example, if you are in a kind of laptop
    or like a desktop that is running like
  • 16:34 - 16:39
    a x86 or x86_64 you probably have binaries,
    but if you're running something
  • 16:39 - 16:43
    like Raspberry Pi or some other kind of
    embedded device. These are running on a
  • 16:43 - 16:48
    different kind of hardware architecture
    and you might not have binaries
  • 16:48 - 16:52
    I think that's also good to take into account,
    in that case it might be worthwhile to
  • 16:52 - 16:59
    use the system packages just because they
    will take much less time to get
  • 16:59 - 17:02
    than to compile from scratch
    the entire Python installation
  • 17:02 - 17:07
    Apart from that, I don't think I can think of any exceptions
    where I would actually use the system packages
  • 17:07 - 17:09
    instead of the Python provided ones
  • 17:19 - 17:21
    So, one other thing to keep in mind is that
  • 17:21 - 17:26
    sometimes you will have more than one
    program on your computer and you might
  • 17:26 - 17:30
    be developing more than one program on
    your computer and for some reason not
  • 17:30 - 17:34
    all programs are always built with the latest
    version of things, sometimes they
  • 17:34 - 17:39
    are a little bit behind, and when you
    install something system-wide you can
  • 17:39 - 17:45
    only... it depends on your exact system,
    but often you just have one version
  • 17:45 - 17:50
    what pip lets you do, especially combined
    with something like python's virtualenv,
  • 17:50 - 17:55
    and similar concepts exist for other
    languages, where you can sort of say
  • 17:55 - 18:00
    I want to (NPM does the same thing as well
    with its node modules, for example) where
  • 18:00 - 18:06
    I'm gonna compile the dependencies of
    this package in sort of a subdirectory
  • 18:06 - 18:10
    of its own, and all of the versions that it
    requires are going to be built in there
  • 18:10 - 18:14
    and you can do this separately for separate
    projects so there they have
  • 18:14 - 18:17
    different dependencies or the same dependencies
    with different versions
  • 18:17 - 18:21
    they still sort of kept separate. And that
    is one thing that's hard to achieve
  • 18:21 - 18:23
    with system packages
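
    A rough sketch of that per-project workflow using Python's built-in venv module (the package name and version are just examples):

        python3 -m venv .venv           # create an isolated environment in ./.venv
        source .venv/bin/activate       # use it in the current shell
        pip install requests==2.23.0    # installed only inside .venv, not system-wide
        deactivate                      # go back to the system Python
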
  • 18:27 - 18:28
    Next question
  • 18:28 - 18:33
    What's the easiest and best profiling tools
    to use to improve performance of my code?
  • 18:34 - 18:39
    This is a topic we could talk
    about for a very long time
  • 18:39 - 18:43
    The easiest and best is to print stuff using time
  • 18:43 - 18:48
    Like, I'm not joking, very often
    the easiest thing is in your code
  • 18:49 - 18:54
    At the top you figure out what the current
    time is, and then you do sort of
  • 18:54 - 18:58
    a binary search over your program where you add
    a print statement that prints how much
  • 18:58 - 19:03
    time has elapsed since the start of your
    program and then you do that until you
  • 19:03 - 19:06
    find the segment of code that took the
    longest. And then you go into that
  • 19:06 - 19:10
    function and then you do the same thing
    again and you keep doing this until you
  • 19:10 - 19:14
    find roughly where the time was spent. It's
    not foolproof, but it is really easy
  • 19:14 - 19:17
    and it gives you good information quickly
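
    As a rough sketch, in a shell script that approach might look something like this (the two step functions are placeholders for whatever your program actually does, and %N needs GNU date):

        start=$(date +%s.%N)
        elapsed() { echo "$1: $(echo "$(date +%s.%N) - $start" | bc)s" >&2; }

        load_data                      # hypothetical slow step
        elapsed "after load_data"
        process_data                   # hypothetical step
        elapsed "after process_data"
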
  • 19:17 - 19:25
    if you do need more advanced information
    Valgrind has a tool called Callgrind
  • 19:25 - 19:29
    (there's also a related tool called Cachegrind)
  • 19:29 - 19:33
    and this tool lets you run your program and
  • 19:33 - 19:39
    measure how long everything takes and
    all of the call stacks, like which
  • 19:39 - 19:43
    function called which function, and what
    you end up with is a really neat
  • 19:43 - 19:47
    annotation of your entire program source
    with the heat of every line basically
  • 19:47 - 19:52
    how much time was spent there. It does
    slow down your program by like an order
  • 19:52 - 19:56
    of magnitude or more, and it doesn't really
    support threads but it is really
  • 19:56 - 20:01
    useful if you can use it. If you can't,
    then tools like perf or similar tools
  • 20:01 - 20:05
    for other languages that do usually some
    kind of sampling profiling like we
  • 20:05 - 20:10
    talked about in the profiler lecture, can
    give you pretty useful data quickly,
  • 20:10 - 20:15
    but there's a lot of data to sift
    through, and they're a little bit
  • 20:15 - 20:19
    biased in what kind of things they usually
    highlight as a problem and it
  • 20:19 - 20:23
    can sometimes be hard to extract meaningful
    information about what should
  • 20:23 - 20:28
    I change in response to them. Whereas the
    sort of print approach very quickly
  • 20:28 - 20:32
    gives you like this section
    of code is bad or slow
  • 20:32 - 20:35
    That, I think, would be my answer
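
    The rough command-line shape of the Callgrind and perf tools mentioned above, for a natively compiled program (./myprog is a stand-in):

        valgrind --tool=callgrind ./myprog    # writes callgrind.out.<pid>, roughly 10x slower
        callgrind_annotate callgrind.out.*    # per-function cost summary of that run

        perf record ./myprog                  # sampling profiler, much lower overhead
        perf report                           # interactive breakdown of where time went
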
  • 20:35 - 20:40
    Flamegraphs are great, they're a good way
    to visualize some of this information
  • 20:41 - 20:46
    Yeah I just have one thing to add,
    oftentimes programming languages
  • 20:46 - 20:49
    have language specific tools for profiling
    so figure out what's the
  • 20:49 - 20:52
    right tool to use for your language like if
    you're doing JavaScript in the web browser
  • 20:52 - 20:55
    the web browser has a really nice tool for
    doing profiling you should just use that
  • 20:55 - 21:00
    or if you are using Go, for example, Go has a built-in
    profiler that is really good; you should just use that
  • 21:02 - 21:04
    A last thing to add to that
  • 21:04 - 21:10
    Sometimes you might find that doing this binary
    search over time, you're kind of
  • 21:10 - 21:14
    finding where the time is going, but this
    time is sometimes happening because
  • 21:14 - 21:18
    you're waiting on the network, or you're
    waiting for some file, and in that case
  • 21:18 - 21:23
    you want to make sure that the time
    that is, if I want to write
  • 21:23 - 21:27
    like 1 gigabyte file or like read 1
    gigabyte file and put it into memory
  • 21:27 - 21:32
    you want to check that the actual time
    there, is the minimum amount of time
  • 21:32 - 21:36
    you actually have to wait. If it's ten times
    longer, you should try to use some
  • 21:36 - 21:39
    other tools that we covered in the debugging
    and profiling section to see
  • 21:39 - 21:46
    why you're not utilizing all your
    resources because that might...
  • 21:51 - 21:56
    Because that might be a lot of what's happening
    like, for example, in my research
  • 21:56 - 21:59
    in machine learning workloads, a lot of
    time is loading data and you have to
  • 21:59 - 22:03
    make sure well like the time it takes to
    load data is actually the minimum amount
  • 22:03 - 22:08
    of time you want to have that happening
  • 22:08 - 22:13
    And to build on that, there are actually
    specialized tools for doing things like
  • 22:13 - 22:17
    analyzing wait times. Very often when
    you're waiting for something what's
  • 22:17 - 22:21
    really happening is you're issuing your
    system call, and that system call takes
  • 22:21 - 22:24
    some amount of time to respond. Like you do
    a really large write, or a really large read
  • 22:24 - 22:28
    or you do many of them, and one thing
    that can be really handy here is
  • 22:28 - 22:32
    to try to get information out of the
    kernel about where your program is
  • 22:32 - 22:37
    spending its time. And so there's (it's
    not new), but there's a relatively
  • 22:37 - 22:43
    newly available thing called BPF or eBPF,
    which is essentially kernel tracing
  • 22:43 - 22:49
    and you can do some really cool things with
    it, and that includes tracing user programs.
  • 22:49 - 22:52
    It can be a little bit awkward to
    get started with, there's a tool
  • 22:52 - 22:56
    called BPF trace that i would recommend
    you looking to, if you need to do like
  • 22:56 - 23:00
    this kind of low-level performance debugging.
    But it is really good for this
  • 23:00 - 23:05
    kind of stuff. You can get things like
    histograms over how much time was spent
  • 23:05 - 23:07
    in particular system calls
  • 23:07 - 23:10
    It's a great tool
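
    For a flavor of what that looks like, a classic bpftrace one-liner (requires root and a kernel with BPF support) that prints a histogram of how long vfs_read calls take:

        sudo bpftrace -e 'kprobe:vfs_read { @start[tid] = nsecs; }
            kretprobe:vfs_read /@start[tid]/ { @ns = hist(nsecs - @start[tid]); delete(@start[tid]); }'
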
  • 23:12 - 23:15
    What browser plugins do you use?
  • 23:17 - 23:20
    I try to use as few as I can get away with
  • 23:20 - 23:26
    because I don't like things being in
    my browser, but there are a couple of
  • 23:26 - 23:30
    ones that are sort of staples.
    The first one is uBlock Origin.
  • 23:30 - 23:37
    So uBlock Origin is one of many ad blockers but
    it's a little bit more than an ad blocker.
  • 23:37 - 23:43
    It is (a what do they call it?) a
    network filtering tool so it lets
  • 23:43 - 23:47
    you do more things than just block ads.
    It also lets you like block connections
  • 23:47 - 23:51
    to certain domains, block connections
    for certain types of resources
  • 23:51 - 23:56
    So I have mine set up in what they call
    the Advanced Mode, where basically
  • 23:56 - 24:02
    you can disable all network requests.
    But it's not just network requests,
  • 24:02 - 24:07
    It's also like I have disabled all inline
    scripts on every page and all
  • 24:07 - 24:12
    third-party images and resources, and then
    you can sort of create a whitelist
  • 24:12 - 24:16
    for every page so it gives you really
    low-level tools around how to
  • 24:16 - 24:20
    improve the security of your browsing.
    But you can also set it in not the
  • 24:20 - 24:24
    advanced mode, and then it does much of
    the same as a regular ad blocker would
  • 24:24 - 24:28
    do, although in a fairly efficient way.
    If you're looking for an ad blocker, it's
  • 24:28 - 24:32
    probably the one to use and it
    works on like every browser
  • 24:32 - 24:34
    That would be my top pick I think,
  • 24:39 - 24:44
    I think probably the one I
    use like the most actively
  • 24:44 - 24:50
    is one called Stylus. It lets you modify
    the CSS or like the stylesheets
  • 24:50 - 24:55
    that webpages have. And it's pretty
    neat, because sometimes you're
  • 24:55 - 24:59
    looking at a website and you want
    to hide some part of the website
  • 24:59 - 25:04
    you don't care about. Like maybe an ad, maybe
    some sidebar you're not finding useful
  • 25:04 - 25:06
    The thing is, at the end of
    the day these things are
  • 25:06 - 25:10
    displayed in your browser, and you
    have control of what code is
  • 25:10 - 25:13
    executing and similar to what Jon was
    saying, like you can customize this
  • 25:13 - 25:18
    to no end, and what I have for a lot of
    web pages is like, hide this part, or
  • 25:18 - 25:23
    also trying to make like dark modes for
    them like you can change pretty much the
  • 25:23 - 25:27
    color for every single website. And what
    is actually pretty neat is that there's
  • 25:27 - 25:31
    like a repository online of people that
    have contributed these stylesheets
  • 25:31 - 25:35
    for the websites. So someone probably
    has (done) one for GitHub
  • 25:35 - 25:39
    Like I want dark GitHub and someone has
    already contributed one that makes
  • 25:39 - 25:45
    that much more pleasing to browse. Apart
    from that, one that it's not really
  • 25:45 - 25:49
    fancy, but I have found incredibly helpful
    is one that just takes a screenshot of an
  • 25:49 - 25:53
    entire website. And it will
    scroll for you and make a
  • 25:53 - 25:58
    compound image of the entire website and that's
    really great for when you're trying to
  • 25:58 - 26:00
    print a website and it's just terrible.
  • 26:00 - 26:01
    (It's built into Firefox)
  • 26:01 - 26:03
    oh interesting
  • 26:03 - 26:06
    oh now that you mention it's built into Firefox,
    another one that I really like about
  • 26:06 - 26:09
    Firefox is the multi account containers
  • 26:09 - 26:11
    (Oh yeah, it's fantastic)
  • 26:11 - 26:12
    Which kind of lets you
  • 26:12 - 26:17
    By default a lot of web browsers, like
    for example Chrome, have this
  • 26:17 - 26:21
    notion of like there's a session that you
    have, where you have all your cookies
  • 26:21 - 26:25
    and they are kind of all shared across the
    different websites in the sense of
  • 26:25 - 26:31
    you keep opening new tabs and unless you go into
    incognito you kind of have the same profile
  • 26:31 - 26:34
    And that profile is the same for
    all websites, there is this
  • 26:34 - 26:36
    Is it an extension or is it built in?
  • 26:36 - 26:41
    (it's a mix, it's complicated)
  • 26:41 - 26:46
    So I think you actually have to say you want
    to install it or enable it and again
  • 26:46 - 26:50
    the name is Multi Account Containers and
    these let you tell Firefox to have
  • 26:50 - 26:54
    separate isolated sessions. So
    for example, you want to say
  • 26:54 - 26:59
    I have a separate session for whenever I
    visit Google or whenever I visit Amazon
  • 26:59 - 27:02
    and that can be pretty neat, because then you can
  • 27:02 - 27:08
    At a browser level it's ensuring that no information
    sharing is happening between the two of them
  • 27:08 - 27:12
    And it's much more convenient than
    having to open an incognito window
  • 27:12 - 27:14
    which is gonna clear out all that stuff every time
  • 27:14 - 27:17
    (One thing to mention is Stylus vs Stylish)
  • 27:18 - 27:20
    Oh yeah, I forgot about that
  • 27:20 - 27:25
    One important thing is the browser extension
    for side loading CSS Stylesheets
  • 27:25 - 27:32
    it's called Stylus and that's different
    from the older one that was
  • 27:32 - 27:37
    called Stylish, because that one got
    bought at some point by some shady
  • 27:37 - 27:41
    company, that started abusing it not only to have
  • 27:41 - 27:46
    that functionality, but also to read your
    entire browser history and send that
  • 27:46 - 27:48
    back to their servers so they could data mine it.
  • 27:48 - 27:54
    So, then people just built this open-source alternative
    that is called Stylus, and that's the one
  • 27:54 - 27:59
    we recommend. That said, I think the repository
    of styles is the same for the
  • 27:59 - 28:04
    two of them, but I would have
    to double check that.
  • 28:04 - 28:06
    Do you have any browser plugins Anish?
  • 28:06 - 28:09
    Yes, so I also have some recommendations
    for browser plugins
  • 28:09 - 28:14
    I also use uBlock Origin and I also use Stylus,
  • 28:14 - 28:19
    but one other one that I'd recommend is
    integration with a password manager
  • 28:19 - 28:22
    So this is a topic that we have in
    the lecture notes for the security
  • 28:22 - 28:25
    lecture, but we didn't really get to talk
    about in detail. But basically password
  • 28:25 - 28:28
    managers do a really good job of increasing
    your security when working
  • 28:28 - 28:32
    with online accounts, and having browser
    integration with your password manager
  • 28:32 - 28:34
    can save you a lot of time like you
    can open up a website then it can
  • 28:34 - 28:37
    autofill your login information for you,
    versus having to copy and paste it
  • 28:37 - 28:40
    back and forth between a separate program
    if it's not integrated with your
  • 28:40 - 28:43
    web browser, and it can also, this integration,
    can save you from certain
  • 28:43 - 28:48
    attacks that would otherwise be possible if
    you were doing this manual copy pasting.
  • 28:48 - 28:51
    For example, phishing attacks. So
    you find a website that looks very
  • 28:51 - 28:54
    similar to Facebook and you go to log in
    with your facebook login credentials and
  • 28:54 - 28:57
    you go to your password manager and copy
    paste the correct credentials into this
  • 28:57 - 29:00
    funny web site and now all of a sudden
    it has your password but if you have
  • 29:00 - 29:03
    browser integration then the extension
    can automatically check
  • 29:03 - 29:07
    like, am I on F-A-C-E-B-O-O-K.com, or
    is it some other domain
  • 29:07 - 29:11
    that maybe looks similar, and it will not enter
    the login information if it's the wrong domain
  • 29:11 - 29:16
    so browser extension for
    password managing is good
  • 29:16 - 29:18
    Yeah I agree
  • 29:19 - 29:21
    Next question
  • 29:21 - 29:24
    What are other useful data wrangling tools?
  • 29:24 - 29:32
    So in yesterday's lecture, I mentioned curl, so
    curl is a fantastic tool for just making web
  • 29:32 - 29:36
    requests and dumping them to your terminal.
    You can also use it for things
  • 29:36 - 29:41
    like uploading files which is really handy.
  • 29:41 - 29:48
    In the exercises of that lecture we also talked about
    JQ and pup which are command line tools that let you
  • 29:48 - 29:53
    basically write queries over JSON
    and HTML documents respectively
  • 29:53 - 30:00
    that can be really handy. Other
    data wrangling tools?
  • 30:00 - 30:04
    Ah Perl, the Perl programming language is
  • 30:04 - 30:08
    often referred to as a write only
    programming language because it's
  • 30:08 - 30:13
    impossible to read even if you wrote it.
    But it is fantastic at doing just like
  • 30:13 - 30:22
    straight up text processing, like nothing
    beats it there, so maybe worth learning
  • 30:22 - 30:24
    some very rudimentary Perl just
    to write some of those scripts
  • 30:24 - 30:29
    It's easier often than writing some like hacked-up
    combination of grep and awk and sed,
  • 30:29 - 30:36
    and it will be much faster to just hack something
    up than writing it up in Python, for example
  • 30:36 - 30:44
    but apart from that, other data wrangling
  • 30:44 - 30:47
    No, not off the top of my head really
  • 30:47 - 30:54
    column -t, if you pipe any white space separated
  • 30:54 - 30:59
    input into column -t it will align all
    the white space of the columns so that
  • 30:59 - 31:06
    you get nicely aligned columns that's, and
    head and tail but we talked about those
  • 31:09 - 31:14
    I think a couple of additions to that,
    that I find myself using commonly
  • 31:14 - 31:20
    one is vim. Vim can be pretty useful
    for like data wrangling on its own
  • 31:20 - 31:22
    Sometimes you might find that the operation
    that you're trying to do is
  • 31:22 - 31:28
    hard to put down in terms of piping
    different operators but if you
  • 31:28 - 31:33
    can just open the file and just record
  • 31:33 - 31:37
    a couple of quick vim macros to do what you
    want it to do, it might be like much,
  • 31:37 - 31:42
    much easier. That's one, and then the other
    one, if you're dealing with tabular
  • 31:42 - 31:46
    data and you want to do more complex operations
    like sorting by one column,
  • 31:46 - 31:51
    then grouping and then computing some sort
    of statistic, I think a lot of that
  • 31:51 - 31:56
    workload I ended up just using Python
    and pandas because it's built for that
  • 31:56 - 32:00
    And one of the pretty neat features that
    I find myself also using is that it
  • 32:00 - 32:04
    will export to many different formats.
    So this intermediate state
  • 32:04 - 32:09
    has its own kind of pandas dataframe
    object but it can
  • 32:09 - 32:14
    export to HTML, LaTeX, a lot of different
    like table formats so if your end
  • 32:14 - 32:20
    product is some sort of summary table, then pandas
    I think it's a fantastic choice for that
  • 32:21 - 32:25
    I would second the vim and also
    Python I think those are
  • 32:25 - 32:29
    two of my most used data wrangling tools.
    For the vim one, last year we had a demo
  • 32:29 - 32:32
    in the series in the lecture notes, but
    we didn't cover it in class we had a
  • 32:32 - 32:38
    demo of turning an XML file into a JSON version
    of that same data using only vim macros
  • 32:38 - 32:40
    And I think that's actually the
    way I would do it in practice
  • 32:40 - 32:43
    I don't want to go find a tool that does
    this conversion; if it is actually simple
  • 32:43 - 32:45
    to encode as a vim macro,
    then I just do it that way
  • 32:45 - 32:49
    And then also Python especially in an interactive
    tool like a Jupyter notebook
  • 32:49 - 32:51
    is a really great way of doing data wrangling
  • 32:51 - 32:53
    A third tool I'd mention which
    I don't remember if we
  • 32:53 - 32:55
    covered in the data wrangling
    lecture or elsewhere
  • 32:55 - 32:59
    is a tool called pandoc which can do transformations
    between different text
  • 32:59 - 33:03
    document formats so you can convert from
    plaintext to HTML or HTML to markdown
  • 33:03 - 33:07
    or LaTeX to HTML or many other formats
    it actually supports a large
  • 33:07 - 33:10
    list of input formats and a
    large list of output formats
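
    For example (the file names are arbitrary; pandoc infers the formats from the extensions):

        pandoc notes.md -o notes.html      # Markdown to HTML
        pandoc paper.tex -o paper.docx     # LaTeX to Word
        pandoc --list-input-formats        # see everything it can read
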
  • 33:10 - 33:16
    I think there's one last one which I mentioned briefly
    in the lecture on data wrangling which is
  • 33:16 - 33:20
    the R programming language, it's
    an awful (I think it's an awful)
  • 33:20 - 33:25
    language to program in. And I would never
    use it in the middle of a data wrangling
  • 33:25 - 33:31
    pipeline, but at the end, in order to like produce
    pretty plots and statistics R is great
  • 33:31 - 33:36
    Because R is built for doing
    statistics and plotting
  • 33:36 - 33:41
    there's a library for R called
    ggplot which is just amazing
  • 33:41 - 33:47
    ggplot2 I guess, technically. It's
    great, it produces very
  • 33:47 - 33:51
    nice visualizations and it lets you
    very easily do things like
  • 33:51 - 33:58
    If you have a data set that has like multiple
    facets like it's not just X and Y
  • 33:58 - 34:03
    it's like X Y Z and some other variable,
    and then you want to plot like the
  • 34:03 - 34:08
    throughput grouped by all of those parameters
    at the same time and produce
  • 34:08 - 34:12
    a visualization. R very easily lets you
    do this and I haven't seen anywhere
  • 34:12 - 34:15
    that lets you do that as easily
  • 34:17 - 34:18
    Next question,
  • 34:18 - 34:21
    What's the difference between
    Docker and a virtual machine
  • 34:23 - 34:28
    What's the easiest way to explain this? So docker
  • 34:28 - 34:31
    starts something called containers and
    docker is not the only program that
  • 34:31 - 34:37
    starts containers. There are many others
    and usually they rely on some feature of
  • 34:37 - 34:40
    the underlying kernel in the case of
    docker they use something called LXC
  • 34:40 - 34:48
    which are Linux containers and the basic
    premise there is if you want to start
  • 34:48 - 34:53
    what looks like a virtual machine that
    is running roughly the same operating
  • 34:53 - 34:57
    system as you are already running on your
    computer then you don't really need
  • 34:57 - 35:05
    to run another instance of the kernel
    really that other virtual machine can
  • 35:05 - 35:10
    share a kernel. And you can just use the
    kernel's built-in isolation mechanisms to
  • 35:10 - 35:14
    spin up a program that thinks it's
    running on its own hardware but in
  • 35:14 - 35:19
    reality it's sharing the kernel and so this
    means that containers can often run
  • 35:19 - 35:23
    with much lower overhead than a full virtual
    machine will do but you should
  • 35:23 - 35:26
    keep in mind that it also has somewhat weaker
    isolation because you are sharing
  • 35:26 - 35:31
    a kernel between the two if you spin up
    a virtual machine the only thing that's
  • 35:31 - 35:36
    shared is sort of the hardware and to
    some extent the hypervisor, whereas
  • 35:36 - 35:41
    with a docker container you're sharing
    the full kernel, and that is a
  • 35:41 - 35:45
    different threat model that you
    might have to keep in mind
  • 35:47 - 35:52
    One other small note there, as Jon pointed
    out, to use containers something
  • 35:52 - 35:56
    like Docker you need the underlying operating
    system to be roughly the same
  • 35:56 - 36:00
    as whatever the program that's running
    on top of the container expects and so
  • 36:00 - 36:04
    if you're using macOS for example, the
    way you use docker is you run Linux
  • 36:04 - 36:08
    inside a virtual machine and then you can
    run Docker on top of Linux so maybe
  • 36:08 - 36:12
    if you're going for containers in order
    to get better performance (you're trading
  • 36:12 - 36:15
    isolation for performance), if you're running
    on Mac OS that may not work out
  • 36:15 - 36:17
    exactly as expected
  • 36:17 - 36:21
    And one last note is that there
    is a slight difference, so
  • 36:21 - 36:26
    with Docker and containers,
    one of the gotchas you have
  • 36:26 - 36:29
    to be familiar with is that, unlike virtual
  • 36:29 - 36:33
    machines, which will persist
    all the storage that you
  • 36:33 - 36:36
    have, Docker by default won't do that.
  • 36:36 - 36:38
    Like Docker is supposed to be running
  • 36:38 - 36:42
    So the main idea is like I want
    to run some software and
  • 36:42 - 36:46
    I get the image and it runs and if you
    want to have any kind of persistent
  • 36:46 - 36:50
    storage that links to the host system
    you have to kind of manually specify
  • 36:50 - 36:56
    that, whereas a virtual machine is using
    some virtual disk that is being provided
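
    A minimal sketch of making container storage persistent (the image name and paths are just examples):

        # without -v, anything written inside the container disappears when it's removed;
        # with it, /data inside the container maps to ./data on the host and survives
        docker run -it --rm -v "$PWD/data":/data ubuntu:20.04 bash
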
  • 36:56 - 37:03
    Next question
  • 37:03 - 37:05
    What are the advantages of each operating system
  • 37:05 - 37:09
    and how can we choose between them?
    For example, choosing the best Linux
  • 37:09 - 37:11
    distribution for our purposes
  • 37:14 - 37:17
    I will say that for many, many tasks the
  • 37:17 - 37:20
    specific Linux distribution that you're
    running is not that important
  • 37:20 - 37:24
    the thing is, it's just kind of
  • 37:24 - 37:28
    knowing that there are different types
    or like groups of distributions,
  • 37:28 - 37:32
    So for example, there are some distributions
    that have really frequent updates
  • 37:32 - 37:39
    but they kind of break more easily. So for
    example Arch Linux has a rolling update
  • 37:39 - 37:44
    way of pushing updates, where things might
    break but they're fine with the things
  • 37:44 - 37:48
    being that way. Whereas maybe if you
    have some really important web server
  • 37:48 - 37:51
    that is hosting all your business
    analytics you want that thing
  • 37:51 - 37:56
    to have like a much more steady way of
    updates. So that's for example why you
  • 37:56 - 37:58
    will see distributions like Debian being
  • 37:58 - 38:03
    much more conservative about what they push, or
    even for example Ubuntu makes a difference
  • 38:03 - 38:07
    between the Long Term Support releases
    that they only update every
  • 38:07 - 38:12
    two years and the more periodic
    releases, of which there are
  • 38:12 - 38:17
    like two a year that they make.
    So, kind of knowing that there's the
  • 38:17 - 38:21
    difference. Apart from that, some distributions
    have different ways
  • 38:21 - 38:27
    of providing the binaries
    to you and the way they
  • 38:27 - 38:34
    have the repositories so I think a lot of Red
    Hat Linux distros don't want non-free drivers in
  • 38:34 - 38:37
    their official repositories whereas I
    think Ubuntu is fine with some of
  • 38:37 - 38:42
    them, apart from that I think like just
    a lot of what is core to most Linux
  • 38:42 - 38:47
    distros is kind of shared between them
    and there's a lot of learning in the
  • 38:47 - 38:51
    common ground. So you don't have
    to worry about the specifics
  • 38:52 - 38:56
    Keeping with the theme of this class being somewhat
    opinionated, I'm gonna go ahead and say
  • 38:56 - 39:00
    that if you're using Linux especially for
    the first time choose something like
  • 39:00 - 39:04
    Ubuntu or Debian. So Ubuntu is a
    Debian based distribution but maybe is a
  • 39:04 - 39:07
    little bit more friendly, Debian is a little
    bit more minimalist. I use Debian
  • 39:07 - 39:10
    on all my servers, for example. And I use
    Debian desktop on my desktop computers
  • 39:10 - 39:15
    that run Linux. If you're going for maybe
    trying to learn more things and you want
  • 39:15 - 39:19
    a distribution that trades stability for
    having more up-to-date software maybe
  • 39:19 - 39:22
    at the expense of you having to fix a
    broken distribution every once in a
  • 39:22 - 39:27
    while then maybe you can consider something
    like Arch Linux or Gentoo
  • 39:27 - 39:33
    or Slackware. Oh man, I'd say that like
    if you're installing Linux and just like
  • 39:33 - 39:35
    want to get work done Debian is a great choice
  • 39:36 - 39:38
    Yeah I think I agree with that.
  • 39:38 - 39:41
    The other observation is like
    you could install BSD
  • 39:41 - 39:47
    BSD has come a long way from
    where it was. There's still a bunch of
  • 39:47 - 39:51
    software you can't really get for BSD but
    it gives you a very well-documented
  • 39:51 - 39:56
    experience, and one thing that's different
    about BSD compared to Linux is
  • 39:56 - 40:03
    that in BSD, when you install BSD you
    get a full operating system, mostly
  • 40:03 - 40:08
    So many of the programs are maintained by
    the same team that maintains the kernel
  • 40:08 - 40:11
    and everything is sort of upgraded together,
    which is a little different
  • 40:11 - 40:13
    than how things work in the Linux world. It does
  • 40:13 - 40:17
    mean that things often move a little bit
    slower. I would not use it for things
  • 40:17 - 40:22
    like gaming either, because driver support
    is meh. But it is an interesting
  • 40:22 - 40:31
    environment to look at. And then for things
    like Mac OS and Windows I think
  • 40:31 - 40:36
    If you are a programmer, I don't know why
    you are using Windows unless you are
  • 40:36 - 40:42
    building things for Windows; or you want
    to be able to do gaming and stuff
  • 40:42 - 40:47
    but in that case, maybe try dual booting,
    even though that's a pain too
  • 40:47 - 40:52
    Mac OS is a good sort of middle point
    between the two where you get a system
  • 40:52 - 40:58
    that is like relatively nicely polished
    for you. But you still have access to
  • 40:58 - 41:01
    some of the lower-level bits
    at least to a certain extent.
  • 41:01 - 41:07
    it's also really easy to dual boot Mac OS and Windows
    it is not quite the case with like Mac OS and
  • 41:07 - 41:10
    Linux or Linux and Windows
  • 41:14 - 41:16
    Alright, for the rest of the
    questions so these are
  • 41:16 - 41:19
    all 0 upvote questions so maybe we can go
    through them quickly in the last five
  • 41:19 - 41:23
    or so minutes of class. So the next
    one is Vim versus Emacs? Vim!
  • 41:23 - 41:31
    Easy answer, but a more serious answer is like I think
    all three of us use vim as our primary editor
  • 41:31 - 41:35
    I use Emacs for some research specific
    stuff which requires Emacs but
  • 41:35 - 41:39
    at a higher level both editors have interesting
    ideas behind them and if you
  • 41:39 - 41:43
    have the time it's worth exploring both
    to see which fits you better and also
  • 41:43 - 41:47
    you can use Emacs and run it in a vim
    emulation mode. I actually know a
  • 41:47 - 41:49
    good number of people who do that so
    they get access to some of the cool
  • 41:49 - 41:53
    Emacs functionality and some of the cool
    philosophy behind that like Emacs is
  • 41:53 - 41:55
    programmable through Lisp which is kind of cool.
  • 41:55 - 41:59
    Much better than vimscript, but people like
    vim's modal editing, so there's an
  • 41:59 - 42:04
    emacs plugin called evil mode which gives
    you vim modal editing within Emacs so
  • 42:04 - 42:08
    it's not necessarily a binary choice you
    can kind of combine both tools if you
  • 42:08 - 42:11
    want to. And it's worth exploring
    both if you have the time.
  • 42:11 - 42:13
    Next question
  • 42:13 - 42:16
    Any tips or tricks for machine
    learning applications?
  • 42:19 - 42:22
    I think, like knowing how to use
  • 42:22 - 42:25
    a lot of these tools, mainly the data wrangling
  • 42:25 - 42:30
    a lot of the shell tools, it's really
    important because it seems a lot
  • 42:30 - 42:34
    of what you're doing as a machine learning
    researcher is trying different things
  • 42:34 - 42:39
    but I think one core aspect of doing that,
    and like a lot of scientific work is being
  • 42:39 - 42:45
    able to have reproducible results
    and logging them in a sensible way
  • 42:45 - 42:48
    So for example, instead of trying to come
    up with really hacky solutions of how
  • 42:48 - 42:51
    you name your folders to make
    sense of the experiments
  • 42:51 - 42:53
    Maybe it's just worth having for example
  • 42:53 - 42:56
    what I do is have like a JSON
    file that describes the
  • 42:56 - 43:00
    entire experiment, like all the parameters
    that are within, and then I can
  • 43:00 - 43:05
    really quickly, using the tools that
    we have covered, query for all the
  • 43:05 - 43:10
    experiments that have some specific
    purpose or use some data set
  • 43:10 - 43:15
    Things like that. Apart from that, the other
    side of this is, if you are running
  • 43:15 - 43:20
    kind of things for training machine
    learning applications and you
  • 43:20 - 43:24
    are not already using some sort of
    cluster, like one your university or your
  • 43:24 - 43:28
    company is providing and you're just kind
    of manually sshing, like a lot of
  • 43:28 - 43:31
    labs do, because that's kind of the easy way
  • 43:31 - 43:37
    It's worth automating a lot of that job
    because it might not seem like it but
  • 43:37 - 43:41
    manually doing a lot of these operations
    takes away a lot of your time and also
  • 43:41 - 43:45
    kind of your mental energy
    for running these things
  • 43:49 - 43:52
    Anymore vim tips?
  • 43:52 - 43:57
    I have one. So in the vim lecture we tried
    not to link you to too many different
  • 43:57 - 44:00
    vim plugins because we didn't want that
    lecture to be overwhelming but I think
  • 44:00 - 44:03
    it's actually worth exploring vim plugins
    because there are lots and lots
  • 44:03 - 44:07
    of really cool ones out there.
    One resource you can use is the
  • 44:07 - 44:11
    different instructors' dotfiles; like a lot
    of us, I think I use like two dozen
  • 44:11 - 44:14
    vim plugins and I find a lot of them quite
    helpful and I use them every day
  • 44:14 - 44:18
    we all use slightly different subsets of
    them. So go look at what we use or look
  • 44:18 - 44:22
    at some of the other resources we've linked
    to and you might find some stuff useful
  • 44:23 - 44:27
    A thing to add to that is, I don't think
    we went into a lot of detail on this in the
  • 44:27 - 44:32
    lecture, correct me if I'm wrong: it's
    getting familiar with the leader key
  • 44:32 - 44:35
    which is kind of a special key
    that a lot of programs,
  • 44:35 - 44:39
    especially plugins, will bind to.
    And for a lot of the common operations
  • 44:39 - 44:45
    vim has short ways of doing them, but you
    can just figure out like quicker
  • 44:45 - 44:50
    versions for doing them. So for example,
    I know that you can do colon wq
  • 44:50 - 44:56
    to save and exit, or that you
    can do capital ZZ, but I
  • 44:56 - 44:59
    actually just do leader (which for
    me is space) and then w. And I have
  • 44:59 - 45:04
    done that for a lot of kind of
    common operations that I keep doing all
  • 45:04 - 45:08
    the time. Because just saving one keystroke
    for an extremely common operation
  • 45:08 - 45:11
    is saving thousands of keystrokes a month
  • 45:11 - 45:13
    Yeah just to expand a little bit
  • 45:13 - 45:17
    on what the leader key is. So in vim you
    can bind some keys: I can make ctrl+J
  • 45:17 - 45:20
    do something, like holding one key and
    then pressing another, I can bind that to
  • 45:20 - 45:24
    something, or I can bind a single keystroke
    to something. What the leader
  • 45:24 - 45:26
    key lets you do is bind more:
  • 45:26 - 45:28
    you can assign any key
    to be the leader key and
  • 45:28 - 45:33
    then you can assign leader followed by
    some other key to some action so for
  • 45:33 - 45:37
    example, Jose's leader key is space,
    and they can bind pressing and then
  • 45:37 - 45:42
    releasing space followed by some other
    key to an arbitrary vim command, so it
  • 45:42 - 45:46
    just gives you yet another way of binding
    like a whole set of key combinations.
  • 45:46 - 45:50
    Leader key plus kind of any key on
    the keyboard to some functionality
  • 45:50 - 45:54
    I think... I forget whether
    we covered macros in the vim
  • 45:54 - 45:59
    lecture, but vim macros are worth
    learning. They're not that complicated,
  • 45:59 - 46:03
    but knowing that they're there and knowing
    how to use them is going to save
  • 46:03 - 46:10
    you so much time. The other one is something
    called marks. So in vim you can
  • 46:10 - 46:13
    press m and then any letter on your keyboard
    to make a mark in that file and
  • 46:13 - 46:18
    then you can press apostrophe on the
    same letter to jump back to the same
  • 46:18 - 46:22
    place. This is really useful if you're
    like moving back and forth
  • 46:22 - 46:25
    between two different parts of your code
    for example. You can mark one as A and
  • 46:25 - 46:30
    one as B and you can then jump between
    them with tick A and tick B.
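    Typed out, the macros and marks just mentioned look something like
    this (a and b are arbitrary letters):
        qa      start recording a macro into register a
        q       stop recording; @a replays it, 5@a replays it five times
        ma      set mark a at the current cursor position
        'a      jump back to the line of mark a
        `a      jump back to the exact position of mark a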
  • 46:30 - 46:35
    There's also Ctrl+O which jumps to the previous
    place you were in the file no matter
  • 46:35 - 46:41
    what caused you to move. So for example
    if I am on some line and then I jump
  • 46:41 - 46:45
    to B and then I jump to A, Ctrl+O will
    take me back to B and then back to the
  • 46:45 - 46:49
    place I originally was. This can also be
    handy for things like if you're doing a
  • 46:49 - 46:53
    search then the place that you
    started the search is a part of
  • 46:53 - 46:56
    that stack. So I can do a search I can
    then like step through the results
  • 46:56 - 47:01
    and like change them and then Ctrl+O
    all the way back up to the search
  • 47:01 - 47:06
    Ctrl+O also lets you move across files so
    if I go from one file to somewhere else in
  • 47:06 - 47:10
    different file and somewhere else in the
    first file Ctrl+O will move me back
  • 47:10 - 47:15
    through that stack and then there's
    Ctrl+I to move forward in that
  • 47:15 - 47:21
    stack and so it's not as though you
    pop it and it goes away forever
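    One command not mentioned above: :jumps shows that stack, so you can
    see where those motions will take you:
        :jumps    list the jump list for the current window
        Ctrl+O    go back to an older position in the list
        Ctrl+I    go forward to a newer position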
  • 47:21 - 47:27
    The command :earlier is really handy.
    So, :earlier gives you an earlier
  • 47:27 - 47:33
    version of the same file, and it does
    this based on time, not based on actions,
  • 47:33 - 47:37
    so for example if you press a bunch of
    undo and redo and make some changes
  • 47:37 - 47:43
    and stuff, :earlier will take a literally
    earlier (as in time) version of your file
  • 47:43 - 47:47
    and restore it to your buffer. This can
    sometimes be good if you like undid and
  • 47:47 - 47:51
    then rewrote something and then realize
    you actually wanted the version that was
  • 47:51 - 47:55
    there before you started undoing; :earlier
    lets you do this. And there's a plug-in
  • 47:55 - 48:02
    called undo tree or something like
    that. There are several of these
  • 48:02 - 48:06
    that let you actually explore the full
    tree of undo history that vim keeps
  • 48:06 - 48:09
    because it doesn't just keep a linear history
    it actually keeps the full tree
  • 48:09 - 48:13
    and letting you explore that might in
    some cases save you from having to
  • 48:13 - 48:16
    re-type stuff you typed in the past, or
    stuff where you just forgot exactly what you
  • 48:16 - 48:21
    had there that used to work and no longer
    works. And this is one final one I
  • 48:21 - 48:27
    want to mention which is, we mentioned
    how in vim you have verbs and nouns
  • 48:27 - 48:33
    right, so your verbs are like delete or yank,
    and then you have nouns like "next occurrence of
  • 48:33 - 48:37
    this character" or percent to jump to the matching bracket,
    and that sort of stuff. The
  • 48:37 - 48:45
    search command is a noun, so you can do
    things like d, slash, and then a string,
  • 48:45 - 48:50
    and it will delete up to the next match
    of that pattern. This is extremely useful
  • 48:50 - 48:54
    and I use it all the time
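    A couple of those commands typed out (the pattern foo is just a
    placeholder):
        :earlier 10m     restore the buffer to how it was 10 minutes ago
        :later 5m        move forward in time again
        d/foo            then Enter: delete from the cursor up to the next match of foo
        y/foo            then Enter: same idea, but yank instead of delete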
  • 48:58 - 49:04
    Another neat addition on the undo stuff
    that I find incredibly valuable on
  • 49:04 - 49:08
    an everyday basis is that one of
    the built-in functionalities of vim
  • 49:08 - 49:14
    is that you can specify an undo directory.
    By default in vim, if you
  • 49:14 - 49:18
    don't have an undo directory
    enabled, whenever you
  • 49:18 - 49:23
    enter a file your undo history is
    clean, there's nothing in there
  • 49:23 - 49:26
    and as you make changes and then
    undo them you kind of create this
  • 49:26 - 49:33
    history but as soon as you exit the
    file that's lost. Sorry, as soon
  • 49:33 - 49:37
    as you exit vim, that's lost. However
    if you have an undodir, vim is
  • 49:37 - 49:42
    gonna persist all those changes into
    this directory so no matter how many
  • 49:42 - 49:46
    times you enter and leave that history
    is persisted and it's incredibly
  • 49:46 - 49:48
    helpful.
  • 49:48 - 49:50
    It can be very helpful for
    some files that you modify
  • 49:50 - 49:55
    often because then you can kind of keep
    the flow. But it's also sometimes really
  • 49:55 - 50:00
    helpful if you modify your bashrc and
    something broke like five days later and
  • 50:00 - 50:03
    then you open it in vim again. Like, what actually
    did I change? If you don't
  • 50:03 - 50:07
    have, say, version control, then
    you can just check the undo history and
  • 50:07 - 50:11
    see what actually happened. And
    the last one, it's also really
  • 50:11 - 50:15
    worth familiarizing yourself with registers
    and what different special
  • 50:15 - 50:20
    registers vim uses. So for example when
    you copy/paste, that really
  • 50:20 - 50:26
    goes into a specific register, and if you
    want to, for example, use
  • 50:26 - 50:30
    the OS clipboard, you should
    be yanking,
  • 50:30 - 50:36
    copying and pasting from a different register,
    and there's a lot of them, and yeah
  • 50:36 - 50:41
    I think that you should explore, there's
    a lot of things to know about registers
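    A minimal sketch of the persistent-undo setup and the clipboard
    register usage just described (the directory path is just an example
    and has to exist; the + register needs a vim built with clipboard
    support):
        " in your vimrc: keep undo history across vim sessions
        set undofile
        set undodir=~/.vim/undodir
    and then in normal or visual mode:
        "+y     yank into the system clipboard register
        "+p     paste from the system clipboard register
        :reg    list all registers and what they currently hold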
  • 50:42 - 50:45
    The next question is asking about two-factor
    authentication and I'll just give
  • 50:45 - 50:48
    a very quick answer to this one in the interest
    of time. So it's worth using two
  • 50:48 - 50:52
    factor auth for anything security sensitive
    so I use it for my GitHub
  • 50:52 - 50:57
    account and for my email and stuff like
    that. And there's a bunch of different
  • 50:57 - 51:01
    types of two-factor auth. From SMS-based
    two-factor auth, where you get
  • 51:01 - 51:05
    a number texted to you when you try
    to log in and you have to type that number in,
  • 51:05 - 51:09
    to other tools like Universal 2nd
    Factor (U2F), this is like those YubiKeys
  • 51:09 - 51:11
    that you plug into your computer and have
    to tap every time you log in
  • 51:11 - 51:18
    so not all (yeah, Jon is holding a
    YubiKey), not all two-factor auth is
  • 51:18 - 51:22
    created equal and you really want to be
    using something like U2F rather than SMS-
  • 51:22 - 51:25
    based two-factor auth. There's also something
    based on one-time passcodes that you
  • 51:25 - 51:29
    have to type in. We don't have time to get
    into the details of why some methods
  • 51:29 - 51:32
    are better than others but at a high
    level use U2F and the Internet has
  • 51:32 - 51:38
    plenty of explanations for why other
    methods are not a great idea
  • 51:38 - 51:42
    Last question, any comments on differences
    between web browsers?
  • 51:48 - 51:50
    Yes
  • 51:55 - 52:00
    Differences between web browsers, there
    are fewer and fewer differences between
  • 52:00 - 52:06
    web browsers these days. At this point
    almost all web browsers are Chrome,
  • 52:06 - 52:10
    either because you're using Chrome or
    because you're using a browser that's
  • 52:10 - 52:16
    using the same browser engine as Chrome.
    It's a little bit sad, one might say, but
  • 52:16 - 52:21
    I think these days the choice is really between Chrome and Firefox.
  • 52:21 - 52:24
    Chrome is a great browser for security reasons;
  • 52:24 - 52:28
    if you want to have something
    that's more customizable or
  • 52:28 - 52:39
    you don't want to be tied to Google, then
    use Firefox. Don't use Safari, it's a
  • 52:39 - 52:46
    worse version of Chrome. The new Internet
    Explorer, Edge, is pretty decent and also
  • 52:46 - 52:51
    uses the same browser engine as
    Chrome and that's probably fine
  • 52:51 - 52:55
    although avoid it if you can because it
    has some like legacy modes you don't
  • 52:55 - 52:58
    want to deal with. I think that's...
  • 52:58 - 53:03
    Oh, there's a cool new browser called Flow
  • 53:03 - 53:06
    that you can't use for anything useful
    yet but they're actually writing
  • 53:06 - 53:09
    their own browser engine and that's really neat
  • 53:09 - 53:15
    Firefox also has this project called Servo, where
    they're reimplementing their browser engine
  • 53:15 - 53:20
    in Rust in order to write it to be like
    super concurrent, and what they've done
  • 53:20 - 53:25
    is they've started to take modules
    from that version and port them
  • 53:25 - 53:29
    over to Gecko or integrate them with Gecko,
    which is the main browser engine
  • 53:29 - 53:32
    for Firefox just to get those
    speed ups there as well
  • 53:32 - 53:37
    and that's a neat thing
    you can be watching out for
  • 53:39 - 53:42
    That is all the questions, hey we did it. Nice
  • 53:42 - 53:51
    I guess thanks for taking the Missing Semester
    class and let's do it again next year