-
alright let's get started with today's
-
lecture so actually before we get
-
started one quick note about office
-
hours it seemed from the poll that some
-
people were under the impression that
-
the office hours that follow each
-
lecture is just about that day's
-
lecture's topics and this is not the case
-
you can come to office hours and ask us
-
questions about any lecture whether it's
-
the previous day or from the previous
-
week or even things not exactly covered
-
in this class that you're just curious
-
about so yeah come to office hours with
-
questions about anything office hours
-
are in the 32-G9 lounge so building 32
-
also known as the Stata Center has two
-
towers the G tower and the D tower so
-
we're in the Gates tower on the ninth
-
floor so if you take the elevator all
-
the way up there's the lounge right in
-
front of you okay cool so today we're
-
going to be talking about version
-
control systems so I just want to get a
-
sense of whether you guys have used
-
version control systems before so could
-
you raise your hand if you have any
-
experience with git or any other version
-
control system like subversion or
-
mercurial or anything else oh great so
-
that's a good number of you so I won't
-
talk about version control systems in
-
general way too much then we'll pretty
-
quickly get into the details of git and
-
like its data model and its internals
-
but just as a quick summary version
-
control systems are tools that are used
-
to keep track of changes to source code
-
or other collections of files or folders
-
and as the name implies these tools help
-
track the history of changes to some set
-
of documents and in addition to doing
-
that they facilitate collaboration so
-
they're really useful for working with a
-
group of people on a software project
-
version control systems track changes to a
-
folder and its contents in a series of
-
snapshots so you capture the entire
-
state of a folder and everything inside
-
like a software project and you have
-
multiple of these in a series of
-
snapshots each snapshot encapsulates the
-
entire set of files and folders
-
contained within some top-level
-
directory and then version control
-
systems also maintain a bunch of
-
metadata along with the actual changes
-
to the content and this is to make it
-
possible to figure things out like who
-
authored a particular change to a
-
particular file or when was a particular
-
change made
-
and so version control systems
-
maintain metadata like authors and
-
commit timestamps and you can also
-
attach extra messages to these snapshots
-
and things like that and so why is
-
version control useful
-
well it's useful even when you're
-
working on projects by yourself so you
-
can use it to look at old versions of
-
code you've written figure out like why
-
something was changed by looking at
-
commit messages work on different things
-
in parallel without conflicts by using
-
different branches of development or be
-
able to work on bug fixes while keeping
-
work on different features independent
-
things like that and so it's an
-
invaluable tool even if you're working
-
just by yourself even on a small scale
-
project like I think the instructors of
-
this course use git even on things like
-
homework assignments or class projects
-
even small scale things in addition to
-
our research or larger software projects
-
and then of course version control is a
-
really powerful tool for working with
-
other people so it's useful for sending
-
patches of code around resolving
-
conflicts when different people are
-
working on the same piece of code at the
-
same time things like that and so it's a
-
really powerful tool for working by
-
yourself or with others and it also has
-
a neat functionality to let you answer
-
questions that would otherwise be kind
-
of hard to answer like who wrote a
-
particular module in a software project
-
or who edited a particular line in a
-
particular software project why was this
-
particular line changed when was it
-
changed by whom things like that and
-
version control systems also have some
-
really powerful functionality that we
-
might cover at the end of today's
-
lecture
-
or you can find it in the lecture notes if we
-
don't have time to do things like
-
suppose you have some project you've
-
been working on for a couple years and
-
then you notice that some funny thing
-
about the project was broken like you
-
have some unit test that doesn't pass
-
anymore and it wasn't broken just now it
-
was broken some time ago and you don't
-
know exactly when this regression was
-
introduced well version control systems
-
have a way of automatically identifying
-
this like you can take it and give it a
-
unit test that's currently failing but
-
you know was passing at some point in
-
the past and it can binary search your
-
history and figure out exactly what
-
change to your code made it break so
-
lots of really powerful fancy features
-
if you know how to use these tools
-
properly
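The history search described above is an ordinary binary search (Git exposes it as `git bisect`). Here is a minimal sketch, assuming a linear list of snapshots ordered oldest to newest and a history in which the test passes up to some point and fails from then on. The `passes` callback and the snapshot names are hypothetical stand-ins for running a real unit test against a real checkout.

```python
from typing import Callable, Sequence


def first_bad(history: Sequence[str], passes: Callable[[str], bool]) -> str:
    """Return the earliest snapshot at which `passes` is False.

    Assumes history[0] passes, history[-1] fails, and that once the
    test starts failing it keeps failing (a single regression point).
    """
    good, bad = 0, len(history) - 1  # known-good and known-bad indices
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(history[mid]):
            good = mid  # the regression was introduced after mid
        else:
            bad = mid   # the regression is at mid or earlier
    return history[bad]


# Hypothetical snapshot names; the test breaks starting at "v5".
snapshots = [f"v{i}" for i in range(8)]
culprit = first_bad(snapshots, lambda s: int(s[1:]) < 5)
print(culprit)
```

Each step halves the range that can still contain the regression, so even a history with thousands of snapshots needs only a handful of test runs.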
-
there are a number of version control
-
systems out there and git has become
-
kind of the de facto standard for
-
version control so that's what we're
-
going to be covering in today's lecture
-
one comic I want to show you which was
-
on the screen before hand let me bring
-
it back up so this is an xkcd comic that
-
illustrates git's reputation so let me
-
read it out loud for you
-
this is git it tracks collaborative work
-
on projects through a beautiful
-
distributed graph theory tree model cool
-
how do we use it no idea just memorize
-
these shell commands and type them to
-
sync up if you get errors save your work
-
elsewhere delete the project and
-
download a fresh copy so maybe some
-
people may not want to raise their hands
-
for this but raise your hand if you've
-
ever done this before
-
I certainly have when I was learning
-
this tool so good number of you here
-
have done this before so the goal of
-
this lecture is to make it so you don't
-
have to do this anymore unfortunately as
-
this comic illustrates git's interface is
-
a pretty terribly designed interface
-
it's a leaky abstraction and so for this
-
reason we believe that learning git
-
top-down starting with the interface
-
is maybe not the best way to go and it
-
can lead to some confusion it's possible
-
like this comic shows to memorize a
-
handful of commands and think of them as
-
magic incantations and while everything's
-
working all right it kind of works out
-
all right but then you have to follow
-
the approach of this comic whenever
-
things go wrong
-
so while git has an ugly interface its
-
underlying design and ideas are actually
-
pretty beautiful an ugly interface has
-
to be memorized but the beautiful ideas
-
underlying git can actually be
-
understood and once you understand git's
-
internals its data model which is
-
actually not that complicated then you
-
can learn the interface to git
-
you'll have to memorize some things but
-
you can understand what exactly certain
-
commands do by understanding how they
-
manipulate the underlying data model and
-
so the way we're going to teach git
-
today is first talk about the data model
-
almost in abstract talk about how we
-
might model files and folders snapshots
-
of history and how they relate to each
-
other
-
then after that we'll walk you through
-
some git commands and then finally in
-
the resources and exercises will link
-
you to tutorials that'll teach you all
-
the specifics because there are lots of
-
different commands that you will need to
-
learn eventually any questions so far
-
about our teaching approach for today
-
cool great so let's get started there
-
are probably many ad hoc approaches you
-
could take to version control and I'm
-
guessing some of you may have done this
-
before like say you have some file or
-
folder with a bunch of different
-
files corresponding to some software
-
project and you want to track changes
-
you could just say every day make a copy
-
of that entire folder and give that
-
folder a timestamp when you want to do
-
things like collaborate with other
-
people you could take the entire folder
-
turn it into a zip archive and email
-
it to somebody and then whenever you and
-
your buddy are working on two different
-
features of a software project you can
-
work on them in parallel then one of you
-
emails the zip file to the other person
-
and then you manually copy and paste the
-
appropriate segments from their code
-
into your code so that eventually you
-
end up with one piece of code that has
-
both of your features in it this kind of
-
sort of works raise your hand if you've
-
done this before
-
I certainly have still a decent number
-
of you git lets us not do this sort of
-
thing
-
it has a well-thought-out model that kind
-
of facilitates these sorts of
-
interactions things that you might want
-
to do like tracking your own history on
-
your own project or collaboration or
-
things like that so git has a well
-
thought-out model that enables things
-
like branches and collaboration and
-
merging changes from other people all
-
sorts of neat stuff git models history
-
as a collection of files and folders
-
within some top-level directory so
-
you're probably familiar with this
-
abstraction just from files and folders
-
on your own computer and so here's one
-
example like you might have some
-
top-level directory I'll just call this
-
like root in parentheses and this
-
directory might have say a folder in it
-
called foo and this folder inside of it
-
might have a file called bar dot txt and
-
this might have some contents in it like
-
say this says hello world and then maybe
-
this top-level directory it has one
-
folder in it it can also have another file
-
in it so say there's some other file
-
and this file also has some contents in
-
it all right
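The example directory described above can be sketched as a nested mapping. This is only an illustration, assuming directories are modeled as dicts and files as byte strings. The name `baz.txt` and its contents are placeholders, since the second file is not named in the lecture.

```python
# Directories are dicts mapping names to entries; files are bytes.
root = {
    "foo": {
        "bar.txt": b"hello world",
    },
    "baz.txt": b"some other contents",  # hypothetical name and contents
}


def list_files(tree: dict, prefix: str = "") -> list:
    """Recursively collect the path of every file under `tree`."""
    paths = []
    for name, entry in sorted(tree.items()):
        path = prefix + name
        if isinstance(entry, dict):   # a sub-directory: recurse into it
            paths.extend(list_files(entry, path + "/"))
        else:                         # a file: record its path
            paths.append(path)
    return paths


print(list_files(root))
```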
-
simple enough the terminology git uses
-
for these different things for files and
-
folders is this and the top-level thing
-
are called trees so this is a folder and
-
then these things what we normally call
-
files are called blobs all right ok so
-
now we have a model of files and folders
-
and this is a recursive data structure
-
trees can contain other trees and then
-
trees can contain both trees and files
-
obviously files can't contain trees all
-
right so now we have a model of files
-
and folders and the kind of top-level of
-
this thing the thing I've just labeled
-
root is the directory being tracked like
-
you might have some folder on your
-
computer corresponding to a software
-
project now how do you model history
-
once you have a model of files and
-
folders well you can imagine one way of
-
doing it which is you take a snapshot of
-
this entire thing and then history is
-
just a linear sequence of snapshots like
-
you might imagine that it's you can
-
almost think of it like you have copies
-
of the folder which are dated and
-
time-stamped well git doesn't use a
-
simple linear model like that it uses
-
something a little bit fancier you might
-
have heard this terminology before but
-
git uses a directed acyclic graph to
-
model history and this might sound like
-
a bunch of fancy math words but it's
-
actually not all that complicated
-
so in git each snapshot has some number
-
of parents and basically we want to know
-
like what change preceded what other
-
change so suppose here I'm going to use
-
circles to refer to individual snapshots
-
this is the entire contents within this
-
tree so all the files and folders in my
-
project my entire project may be in some
-
state and then I edit some files and now
-
it's in some other state and then I add
-
some more files and that's in some other
-
state and every state points back to
-
which state preceded it this so far is a
-
linear history
-
but git lets us do something a little bit
-
fancier than this you can also from a
-
certain snapshot fork your history and
-
say I want to base changes off of this
-
version and create a new snapshot like
-
this so this way of modeling history
-
allows you to do things like okay I'm
-
working on my project this is my main
-
line of development I go up to here and
-
now I have two different tasks I want to
-
work on suppose on one hand I have some
-
fancy new feature I want to add to my
-
project and so I'm going to be working
-
on that for a couple days but separately
-
from that somebody's reported a bug to
-
me and I need to go chase down that bug
-
and fix it well instead of working on
-
all that stuff kind of concurrently at
-
the same time in the same line of
-
development
-
git has a way of branching the history
-
into two separate forks and working on
-
different things in parallel temporarily
-
in a way that are unrelated to each
-
other so I could take this base snapshot
-
like my project is in some state where
-
it works and then from here I could
-
implement a new feature that creates a
-
new snapshot so this has the base
-
project plus a new feature so I'll do
-
like plus feature and then similarly
-
separately from this I could go back to
-
this original snapshot because I don't
-
want to do bug fixing while implementing
-
my new feature go here and then work on
-
my bug fix and create a different
-
snapshot so this has only the bug fix
-
but not the feature and then finally
-
once I've done these two separate things
-
in parallel eventually I want to
-
incorporate them all into my common
-
source code that has both the feature
-
and the bug fix so eventually I might
-
author a new snapshot by merging the
-
changes present in these two different
-
snapshots and so this one will have both
-
of these snapshots as parents and this
-
version here will have both the feature
-
and my bug fix so does it make sense why
-
git models history in a way that's a
-
little bit fancier than just a sequence
-
of snapshots of my files and folders why
-
I want to be able to support branching
-
to work on things in parallel and then
-
also merging to combine changes from
-
different parallel branches of
-
development question
-
yeah so that's an excellent point it
-
seems that when you merge things you
-
could create errors that weren't
-
anticipated you could imagine here that
-
this feature actually changes something
-
that makes this bug-fix redundant or you
-
could imagine this bug fix breaking this
-
feature or something like that oh that's
-
a really good point that's something
-
known as merge conflicts and this is
-
something that git will try to do like
-
when you merge your parallel branches of
-
development it will try to automatically
-
combine the changes in a way such that
-
it retains all the important changes but
-
if it gets confused it will report a
-
merge conflict and then leave it up to
-
you the programmer to figure out how to
-
combine kind of concurrent changes to
-
the same files or things like that and
-
then git has some tools for facilitating
-
this any other questions great ok so now
-
we have a model of files and folders and
-
then we have a model of history how
-
different snapshots of our code relate
-
to each other
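The branch-and-merge history described above can be sketched as a small graph in which each snapshot records its parents. This is a toy illustration only. The snapshot names are hypothetical labels, and the snapshots carry no file contents.

```python
# Each snapshot records which snapshots it was based on.
parents = {
    "base": [],
    "feature": ["base"],              # base plus the new feature
    "bugfix": ["base"],               # base plus the bug fix, no feature
    "merge": ["feature", "bugfix"],   # two parents: has both changes
}


def ancestors(snapshot: str) -> set:
    """Return every snapshot reachable by following parent links."""
    seen = set()
    stack = list(parents[snapshot])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("merge")))
print(sorted(ancestors("bugfix")))
```

Because parent links only ever point backwards in time, following them can never loop, which is what makes the graph acyclic.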
-
one little detail here is that each of
-
these circles so they kind of correspond
-
to a snapshot like a tree with files and
-
folders but they also have a little bit
-
of metadata so like inside here we might
-
have like the author of this commit is
-
me and we might have other metadata like
-
some message associated with this commit
-
I might describe what kinds of changes
-
I've made that are present in this
-
snapshot but not the previous one
-
that is not really the chair class so
-
next we're going to talk about kind of
-
one level lower than this like how
-
exactly is this represented as a as a
-
data structure inside git and so I'm
-
actually going to write down pseudocode
-
because I think it's actually easiest to
-
understand this way so first we have
-
files so a blob is just a bunch of bytes
-
so I'll say this is an array of bytes
-
okay then what is a tree remember that
-
this is just a folder and what are
-
folders they're mappings from the
-
filename or directoryname to the actual
-
contents and the contents are either
-
another tree like a subtree or the file
-
and then finally we have the last thing
-
there what I've been calling snapshots
-
so far in git terminology those are
-
called commits and so what does a commit
-
have
-
it's a bunch of stuff commits have
-
parents that describe what preceded them
-
so in the case of most normal commits
-
they have one parent like what they came
-
from but merge commits can have
-
multiple parents so parents are an array
-
of commits and then I have some metadata
-
like the author and maybe a message
-
and then finally the actual contents the
-
snapshot which is a tree that's the
-
top-level tree corresponding to a
-
particular commit so this is a
-
really clean simple model of history and
-
this is basically all there is to how
-
git models history any questions about
-
that all right so now we have that going
-
a little bit deeper let's talk about how
-
git actually stores and addresses this
-
actual data like at some point this
-
actually has to turn to data on disk
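The three kinds of things described above (blobs, trees and commits) can be written down as types. This is a sketch of the lecture's pseudocode in Python, not Git's actual implementation. The field names follow the spoken description, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Blob = bytes                            # a file: just an array of bytes
Tree = Dict[str, Union["Tree", Blob]]   # a folder: name -> subtree or file


@dataclass
class Commit:
    parents: List["Commit"]   # usually one; merge commits have several
    author: str
    message: str
    snapshot: Tree            # the top-level tree for this commit


first = Commit(parents=[], author="me", message="first snapshot",
               snapshot={"foo": {"bar.txt": b"hello world"}})
second = Commit(parents=[first], author="me", message="add another file",
                snapshot={"foo": {"bar.txt": b"hello world"},
                          "baz.txt": b"more contents"})  # hypothetical file
print(second.parents[0].message)
```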
-
right so git defines an object kind of a
-
big standing term but an object is any
-
one of those three things so it's a blob
-
or a tree or a commit and then in
-
git all objects are content-addressed
-
so what git maintains on disk and you
-
can actually we can look at this later
-
is a set of objects maintained as this
-
content-addressed store so if you have any
-
one of these objects the way you put it
-
into this store is its key is the hash
-
of the object so like in pseudocode I
-
might say that to store a particular
-
object o what I do is I compute its ID
-
by taking the sha-1 hash of o and then I
-
put it into my objects map store it to
-
disk a quick show of hands who here
-
knows what a hash function is all right
-
so I'll quickly summarize basically a
-
hash function is you can think of it as
-
like this magical function that takes a
-
big piece of data and turns it into a
-
short string at a high level these are
-
used to or maybe that's like a
-
sufficient
-
definition I won't go into too much more
-
detail here but you can ask me
-
afterwards if you're if you're curious
-
so basically they give you a way to name
-
a thing in a way that's kind of
-
deterministic based on the contents of
-
the thing it takes the thing as input
-
and gives you a short name for it and
-
then the opposite of store is load the way
-
we can load things from the store you
-
might have just guessed you can look
-
them up by their ID and this is just we
-
retrieve it from the object store by ID
-
and it gives us back the contents any
-
questions about this so far question
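The store and load operations described above can be sketched as follows. This is a minimal illustration, assuming an in-memory dict in place of the on-disk store and objects that are already plain bytes. Real Git's on-disk format differs in its details.

```python
import hashlib

objects = {}  # in-memory stand-in for the on-disk object store


def store(obj: bytes) -> str:
    """Save `obj` under the SHA-1 hash of its bytes and return that ID."""
    object_id = hashlib.sha1(obj).hexdigest()
    objects[object_id] = obj
    return object_id


def load(object_id: str) -> bytes:
    """Look an object up by its ID."""
    return objects[object_id]


blob_id = store(b"hello world")
print(blob_id, len(blob_id))
print(load(blob_id))
```

Storing the same bytes twice produces the same ID, so identical content is only ever kept once.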
-
that's a good question what language is
-
it's all written in it's written in the
-
language I just made up so it's
-
pseudocode the git implementation itself
-
is a mix of C it's mostly C and then
-
some bash and Perl scripts I think any
-
other questions is this made-up language
-
clear enough or do I need to explain any
-
aspects of it great okay so blobs trees
-
and commits in git are unified in this
-
way they're all objects and also as you
-
might think given my description here
-
like it looks like commits contain a
-
whole bunch of other commits and contain
-
a snapshot and things like that in
-
practice it doesn't actually work that
-
way instead all these are pointers so a
-
commit will be able to reference a bunch
-
of parents by their IDs so this is
-
actually not an array of commits
-
themselves but IDs and similarly the
-
snapshot inside a commit is not the
-
actual tree object it's the ID of the
-
tree and so all these objects are kind
-
of stored on their own in this object
-
store and then all the references to
-
different objects are just by their ID
-
by their sha-1 hash does that make sense
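The idea that objects refer to each other only by ID can be sketched like this. It is an illustration under assumptions: objects are serialized as JSON here purely for convenience, which is not how Git encodes them, and the field names are made up.

```python
import hashlib
import json

objects = {}  # ID -> serialized object


def store(obj: dict) -> str:
    """Serialize `obj` deterministically and file it under its SHA-1."""
    data = json.dumps(obj, sort_keys=True).encode()
    object_id = hashlib.sha1(data).hexdigest()
    objects[object_id] = data
    return object_id


def load(object_id: str) -> dict:
    """Fetch an object by ID and decode it."""
    return json.loads(objects[object_id])


# The tree holds the blob's ID, and the commit holds the tree's ID.
blob_id = store({"type": "blob", "data": "hello world"})
tree_id = store({"type": "tree", "entries": {"bar.txt": blob_id}})
commit_id = store({"type": "commit", "parents": [], "author": "me",
                   "message": "first snapshot", "snapshot": tree_id})

# Follow the IDs: commit -> tree -> blob.
commit = load(commit_id)
tree = load(commit["snapshot"])
blob = load(tree["entries"]["bar.txt"])
print(blob["data"])
```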
-
you can almost in your head map it to
-
like these are objects in a programming
-
language like Java and then this is a
-
reference to a tree so it's like a
-
pointer and then that is your realm
-
maybe this analogy helps maybe it
-
doesn't
-
yeah yeah exactly so I'll just repeat
-
that for everybody to hear on the
-
microphone
-
this is git's on-disk data store it's a
-
content-addressed store where objects are
-
addressed by their hash all right any
-
questions about that so far ok so now we
-
have a way of identifying we've unified
-
all the different types of objects into
-
one type of thing we call object and we
-
have a way of identifying objects by
-
their sha-1 hash what do these actual
-
sha-1 hashes look like well they're
-
hexadecimal strings that are 40
-
characters long
-
like sha-1 is a 160-bit hash and so one
-
of the actual IDs returned by that sha-1
-
function is going to be a really long
-
string and so given that we'll have ways
-
of identifying these different things
-
like this we'll have corresponding to it
-
an ID like 4a32ceb or something
-
something so now we have a way of naming
-
everything in this commit graph but
-
these names are really inconvenient
-
because they're super long and they're
-
like text strings they're not meaningful
-
to humans in any way so git's solution to
-
this problem is one other thing so git
-
maintains a set of objects and then it
-
maintains a set of references what are
-
references here I'll erase this bit on
-
the left this parts pretty logical
-
that's the irony another time so
-
references all right here
-
so this is another piece of data that
-
git maintains internally references is a
-
map from string to string and you can
-
think of this as mapping human readable
-
names like I might have a name like
-
fix-encoding-bug fix-encoding-bug is a
-
human readable name and this would be
-
mapped to like that long hexadecimal
-
string there and so with these
-
references and you can imagine how we
-
might have ways of creating new
-
references and updating references and
-
things like that
-
with this I can now refer to things in
-
my commit graph by name so I might have
-
this thing be called like fix-bug or I
-
might have a name for something over
-
here things like that and so yeah with
-
this git can use human readable names
-
to refer to particular snapshots in the
-
history instead of these long
-
hexadecimal strings one other thing to
-
be aware of here is given git's design
-
for history this entire graph is
-
actually immutable you can add new stuff
-
to it but you can't actually manipulate
-
anything in here I won't go into the
-
details of exactly how or why but just
-
assume that that's the case however
-
references are mutable so as you're
-
updating the history like suppose you
-
keep working on this piece of software
-
you create a new commit so I'm
-
representing that by the circle this
-
points to the previous commit I can
-
actually have say my fix-bug reference
-
is pointing here I can update this
-
reference to now point over here however
-
I can't for example make this point over
-
here that's not even a meaningful thing
-
to say because this is just the hash of
-
this object to change this hash I'd need
-
to change the contents of the object
-
which doesn't really make sense
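The contrast described above, an add-only object store next to reassignable references, can be sketched as two maps. This is a toy illustration in which the commit contents are placeholder byte strings rather than real commit objects.

```python
import hashlib

objects = {}      # ID -> bytes; entries are only ever added
references = {}   # human-readable name -> ID; entries can be reassigned


def store(obj: bytes) -> str:
    """File `obj` under the SHA-1 of its contents and return the ID."""
    object_id = hashlib.sha1(obj).hexdigest()
    objects[object_id] = obj
    return object_id


first = store(b"commit 1")
references["fix-encoding-bug"] = first

# A new commit that names the previous one as its parent.
second = store(b"commit 2, parent " + first.encode())
references["fix-encoding-bug"] = second   # move the name to the new commit

print(references["fix-encoding-bug"] == second)
print(first in objects)
```

Moving the reference leaves the old object untouched. Only the name-to-ID mapping changed.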
-
all right any questions about that so
-
far that's basically it for git's data
-
model and then we'll go into actually
-
interacting with git via the command
-
line and we'll see how git commands
-
correspond with manipulations of a graph
-
data structure so any questions about
-
modeling history as trees of trees and
-
blobs and then snapshots these things
-
called commits being chained together
-
and you have references that can point
-
to particular nodes in this graph cool
-
no questions so basically once we have
-
objects and references like that's
-
basically all there is to a git
-
repository those are the two pieces of
-
data that it stores and at a high level
-
all git command line commands are just
-
manipulations of either the references
-
data or the objects data okay so for the
-
rest of this lecture I'm going to go
-
through some git commands it's basically
-
going to be an interactive demo similar
-
to the vim lecture and then you can
-
refer to the notes for full information
-
on these commands look of course it's a
-
really powerful tool we can't cover
-
everything in what 20 minutes all right
-
so I'm going to go over to this folder
-
called playground and I'm going to make
-
a new directory called demo cd into demo
-
and this directory is going to represent
-
the top level of my project it's
-
currently empty because I just created
-
it if I want to turn this into a git
-
repository I use the git init command
-
git init stands for git initialize and
-
we see that it says initialized empty
-
git repository in blah blah /.git
-
if I do ls I still see nothing but
-
if I do ls -a there's a hidden directory in
-
this directory called .git if I do ls .git
-
there's a bunch of stuff in here
-
this is the directory on disk where git
-
stores all of its internal data
-
namely the objects and the references
-
and you actually see here objects and
-
refs as two directories in here and
-
all the repository data will be stored
-
underneath those two directories one
-
other command to keep in mind as we're
-
going through these is something called
-
git help git help takes a subcommand as
-
an argument it gives you some help on it
-
so if I do git help init for example
-
it'll tell me about the git init command
-
so now there are some commands for
-
figuring out what's going on with a git
-
repository like git status at a high
-
level says what is going on right now
-
and we see here let's ignore the first
-
line for now the second line says no
-
commits yet that's because we just
-
initialized a fresh repository and so
-
there is no history yet I'm actually
-
going to does anybody still want this
-
or can I clear this part of the board
-
I'm going to as we go along draw how the
-
underlying objects and references data
-
is changing when I type in certain git
-
commands so right now this picture or
-
lack of picture represents the current
-
state of our repository it's empty there
-
are no snapshots so let's fix that let's
-
add something to our history here we
-
have no files so let me just go ahead
-
and create a file hello.txt with the
-
content hello world normally you'd have
-
your source code with actually useful
-
stuff in it now what I want to do is I
-
want to take the current contents of
-
this directory and turn it into a new
-
snapshot to represent say the first
-
state my project was in you might
-
imagine an interface for doing this
-
where there is like a git snapshot
-
command or git something else command
-
which takes a snapshot of the entire
-
state of the current directory for a
-
number of reasons git doesn't have a
-
command that works exactly like that
-
because git wants to give you a little
-
bit of flexibility as to what changes to
-
include in the next snapshot you take
-
this is something that's kind of
-
confusing to beginners sometimes so I'll
-
try to explain it right now git has a
-
concept of something called a staging
-
area and at a high level it's where you
-
tell git what changes should be included
-
in the next snapshot you take if we do
-
git status here we'll see that git says
-
no commits yet like it said before and
-
it says untracked files hello.txt
-
so this is saying that git notices that
-
there's a new file in the current
-
directory but it is not going to be
-
included in the next
-
snapshot git's kind of ignoring it for
-
now but if I do git add hello.txt and if
-
I do git status again it says now
-
changes to be committed new file
-
hello.txt and so now if I do the git
-
snapshot command which is actually git
-
commit which creates a new one of those
-
circles I drew on the board over there
-
this file will be included in that
-
snapshot that I take so let me go ahead
-
and run git commit what this does is it
-
pops up my text editor and it lets me
-
type in a message that will be
-
associated with this commit and it's
-
really good to write high-quality commit
-
messages because then later when you're
-
looking back at your project's version
-
history you'll know why you made certain
-
changes I'm going to add this relatively
-
useless commit message but we have a
-
link in the lecture notes for a guide on
-
how to write high-quality commit
-
messages so now that I've done that git
-
prints out some output master ignore
-
that bit for now this thing is the hash
-
of the commit I just created so now I
-
have in my history a single node this
-
has in it a tree that has a single blob
-
a single file hello.txt with the
-
contents hello world and then this has
-
the sha-1 hash 42fb
-
something something something it's
-
actually truncated in the git interface
-
as well this is just printing out my
-
commit message again and it says as a
-
reminder I just added hello.txt and
-
so now if I use the git log command so
-
the git log command is really useful in
-
that it helps you visualize the history
-
the commit graph if I do question
-
that's a great question so the question
-
is what exactly does this hash
-
correspond to so this is the hash of the
-
commit the commit contains inside of it
-
the hash of the tree along with whatever
-
other information so I can actually use
-
git cat-file -p this number this is
-
kind of like a git internals command
-
that will print out the contents of this
-
commit so you can see this kind of maps
-
to the
-
data structure I drew on the board over
-
there so this commit has inside of it
-
this tree and then I'm the author and
-
this is the commit message and so on and
-
I can continue digging down here so you
-
can take this hash of this tree and do
-
git cat-file -p this hash here it says
-
that this tree has inside of it a single
-
entry hello.txt and that file it's
-
a blob and it has this hash I can do git
-
cat-file -p this thing and it will show
-
me the actual contents of that file so
-
these are like internal git commands to
-
explore objects in the object store
-
question that's a great question so the
-
question is why did I have to use git
-
add why can't you just commit all
-
changes and the answer is well there
-
kind of is a way to commit all changes
-
if you do git commit -a this commits
-
all the changes that were made to files
-
that are already being tracked by git so
-
anything that was included in the
-
previous snapshot but has been modified
-
since then it doesn't include new things
-
there's also variants of git add like if
-
you do git add :/ this will add
-
everything from the top level
-
down of your repository but at a higher
-
level the reason we have this separation
-
between git add and git commit and why
-
git commit doesn't just snapshot the
-
entire directory is that there are often
-
situations where you don't want to
-
include everything in the current
-
snapshot like here's a couple examples
-
one is that I might be hacking on my
-
project and I go ahead and implement two
-
features maybe I don't want to have a
-
single snapshot that comes after this
-
one that's like I implemented feature a
-
and feature B maybe I want to create two
-
separate nodes in the history so that it
-
looks like first I implemented feature a
-
and then after that I implemented
-
feature B so I have one snapshot that
-
only includes a and then the next one
-
includes both a and B git add is a tool
-
and like the staging area in general is
-
a tool that will allow me to do that
-
sort of thing
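The two-feature example above can be sketched with a toy staging area. This is a simplified model, not how Git implements its index. The file names, the `add` and `commit` helpers and the three dictionaries are all hypothetical.

```python
working_dir = {"a.py": "feature A", "b.py": "feature B"}  # files on disk
staging = {}    # changes selected for the next snapshot
history = []    # list of snapshots, oldest first


def add(path: str) -> None:
    """Stage the current contents of one file."""
    staging[path] = working_dir[path]


def commit(message: str) -> None:
    """Snapshot whatever is staged; unstaged files are left out."""
    history.append({"message": message, "snapshot": dict(staging)})


# Both features exist on disk, but they are committed one at a time.
add("a.py")
commit("implement feature A")
add("b.py")
commit("implement feature B")

for entry in history:
    print(entry["message"], sorted(entry["snapshot"]))
```

The first snapshot contains only `a.py` even though `b.py` was already on disk, and the second contains both.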
-
another example is suppose I'm working
-
on a bug fix and I have printf
-
statements I've put all over my code and
-
then finally I find the bug and there's
-
a plus one somewhere where there
-
shouldn't be a plus one so go fix that
-
and then I want to take a new snapshot
-
right with my fix
-
but the snapshot probably shouldn't include
-
all of my print statements it just needs
-
to include the fix of removing that plus
-
one so one way I could solve that issue
-
is I can go in
-
and manually remove all the print statements
-
but git has a much better way of doing
-
that there's actually a way to specify
-
that I only want to add the change of
-
removing that plus one then I can commit
-
that take the new snapshot and then I
-
can throw away all the other changes
-
there are commands for doing that and
-
some of them are linked in the lecture
-
notes so those are two ways in which you
-
can use the staging area to help you and
-
why there isn't just like a snapshot
-
everything command yeah so
-
John points out that yet
-
another example is you might have log
-
files in your current directory that
-
your program writes when you run it and
-
you probably don't want to include those
-
when you take a snapshot there's
-
probably other things like if you
-
compile your project you end up with a
-
bunch of .o and like ELF files you
-
probably don't want those to be part of
-
your history so going back to what I was
-
showing you before I'm going to clear
-
the terminal screen and then show you
-
the git log command so git log lets
-
you visualize the version history and
-
this is an incredibly helpful command by
-
default git log shows you a flattened
-
version of the version history so even
-
though the version history is a graph
-
this will linearize it and just show
-
things in order I personally find that
-
confusing so I almost never use plain git log
-
and instead git log takes some arguments
-
that actually show the history as a
-
graph so you can treat this as a magic
-
incantation for now and you can read the
-
documentation if you want to figure out
-
exactly what each of those flags does
-
but for now this doesn't look all that
-
different because we only have one node
-
in our graph so visualizing it as a
-
flattened thing versus a graph doesn't
-
look all that different let me go ahead
-
and create a new snapshot and then we
-
can run this command again and then see
-
exactly what it does so I will put
-
another line into hello.txt and so
-
if I cat hello.txt it has the thing
-
it had before plus this I can do git
-
commit and notice this doesn't do
-
anything it just says
-
no changes staged for commit why is that
-
it's because I didn't add this to the
-
staging area I didn't tell git that like
-
this is something that should be
-
included in the next snapshot
-
so if I do git add hello.txt git status
-
it says okay this change is ready to be
-
committed this modification to this file
-
and now I can do git commit I'm gonna
-
put in a useless commit message and the
-
new changes have been made and so now my
-
history has another node in it and then
-
this node has some hash that's shown on
-
the screen and now if I rerun that
-
command from earlier the git log with
-
all these arguments it actually starts
-
looking more like a graph here notice
-
that this is like that graph turned this
-
way the more recent so it's shown
-
vertically not horizontally and the more
-
recent commits are shown at the top this
-
is showing one commit it shows the commit
-
hash shows a bunch of metadata including
-
the commit message and then this is the
-
part I want to talk about next so
-
remember we talked about objects like
-
the actual contents of your repository
-
and then we talked about references ways
-
of naming things in the repository with
-
human readable names so master is one
-
reference that's created by default when
-
you initialize a git repository and by
-
convention it generally refers to like
-
the main branch of development in your
-
code so master will represent like the
-
most up-to-date version of your project
-
so here you can think of master as a
-
pointer to this commit and as we add
-
more commits this pointer will be
-
mutated to point to later commits then
-
we also see here head this is a special
-
reference in git it's a reference like
-
master but it's special in some way and
-
head basically is used to refer to where
-
you are currently looking right now any
-
questions so far yeah question
-
that's an excellent question so the
-
question is you've worked with github before and
-
you have to create an account to do that
-
how does github relate to git and the
-
answer to that question is github is a
-
repository host for git so you can
-
create an account on github and store a
-
git repository there and use that to
-
collaborate with other people but git as
-
a command-line tool is just independent
-
from github so you don't have to use
-
github to use git you don't have to use
-
github to collaborate with git either like
-
there are other providers of git
-
repositories like bitbucket or gitlab
-
or things like that and so yeah github
-
is a host for git repositories any
-
other questions yeah so the question is
-
if you want this repository to end up on
-
github how do you do that yeah there's a
-
separate set of commands for doing that
-
there's a concept of having your
-
local copy of version history interact
-
with another copy so the other copy is
-
called a remote and then there's a set of
-
commands for interacting with git
-
remotes and sending data from your
-
local copy to git remotes
-
and getting data from git remotes into
-
your local copy and we'll cover that
-
later in this lecture or maybe in the
-
lecture notes or we might make a
-
supplemental video to go along with this
-
lecture any other questions okay a
-
couple other basic commands to show you
-
so so far I've shown you a version
-
history and we've taken a file and
-
modified it but we haven't really made
-
use of the history in any way besides
-
reading the messages so one useful git
-
command is something called git checkout
-
and this is a kind of wacky command it
-
lets you do a bunch of different things
-
but one thing it lets you do is move
-
around in your version history so one
-
thing I can do is give git checkout the
-
commit hash of a previous commit and I
-
don't need to type the whole thing I can
-
give it a prefix and it's able
-
to figure out what I'm talking about and
-
what this will do is it will change the
-
state of my working directory to how it
-
was at that commit so here if I do cat
-
hello.txt recall that I had only one
-
line in here before at the first commit
-
and later I added that second line now
-
if I do that git log command and this
-
command is super helpful like it shows
-
you all the things if I do this command
-
notice that this output looks a little
-
bit different than before like my actual
-
history contents the commits themselves
-
and the way they relate to each other and
-
all that have not changed but the
-
references have so notice that head is
-
down here even though master is still up
-
here so at a high level what this is
-
telling me is this is what I'm looking
-
at right now if I want to go back here I
-
could type git checkout and this commit
-
hash
-
does anybody know a different thing I
-
could type here instead of this long
-
hash in order to go back to this commit
-
yeah you can give it the name of this is
-
a branch colored in green here and it
-
refers to this commit so I can give it
-
the short name or the human readable
-
name instead and now if I do cat
-
hello.txt notice that it has that second line
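
Here is one way to picture what just happened, as a small Python sketch. Checking out a commit changes where head points and rewrites the working directory, while master stays put. The commit IDs, file contents, and class name are made up for illustration and are not what was on screen.

```python
class CheckoutDemo:
    """Toy model of commits, branch references, head, and a working directory."""

    def __init__(self, commits, refs, head):
        self.commits = commits   # commit ID -> {"parent": ID or None, "files": {...}}
        self.refs = refs         # branch name -> commit ID
        self.head = head         # a branch name, or a bare commit ID
        self.working = {}        # path -> content

    def resolve(self, name):
        """Turn a branch name or an unambiguous hash prefix into a commit ID."""
        if name in self.refs:
            return self.refs[name]
        matches = [cid for cid in self.commits if cid.startswith(name)]
        if len(matches) != 1:
            raise ValueError(f"{name!r} matches {len(matches)} commits")
        return matches[0]

    def checkout(self, name):
        """Move head and replace the working directory with that snapshot."""
        commit_id = self.resolve(name)
        self.head = name if name in self.refs else commit_id
        self.working = dict(self.commits[commit_id]["files"])
        return commit_id

demo = CheckoutDemo(
    commits={
        # Hypothetical IDs and contents.
        "a1b2c3": {"parent": None, "files": {"hello.txt": "hello world\n"}},
        "d4e5f6": {"parent": "a1b2c3",
                   "files": {"hello.txt": "hello world\nanother line\n"}},
    },
    refs={"master": "d4e5f6"},
    head="master",
)

demo.checkout("a1b")      # a prefix of the first commit's ID
one_line = demo.working["hello.txt"]
demo.checkout("master")   # the human-readable name works too
two_lines = demo.working["hello.txt"]
print(repr(one_line), repr(two_lines), demo.refs)
```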
-
yeah yeah so to repeat that git checkout
-
actually changes the contents of your
-
working directory and so in that way it
-
can be a somewhat dangerous command if
-
you misuse it for example you can see if
-
I modify hello.txt and then try that
-
git checkout command from earlier
-
actually notice here that it says error
-
it says there's a file that's been
-
modified and the git checkout would
-
destroy your modification you probably
-
want to do something about that but
-
there are flags like for example git
-
checkout -f does this forcibly and
-
now it's throwing away my changes so
-
yeah git checkout has the potential to
-
well it certainly does modify things in
-
your working directory and it can
-
actually destroy changes if you're not
-
careful question
-
exactly yeah this is exactly what I want
-
you to be thinking about how these like
-
the crazy git interface commands
-
correspond to mutations to this graph
-
and mutations to the references or like
-
additions to the graph and mutations to
-
the references map so yeah exactly
-
git checkout moves the head pointer and
-
then also mutates the contents of your
-
working directory with the contents that
-
the head pointer now points to of course
-
my name for that commit any other
-
questions all right so one other basic
-
command I want to show you is the git
-
diff command so I'm going to modify this
-
file and put some changes in it the git
-
diff command can show you what's changed
-
since the last snapshot it's just
-
helpful for like knowing what's going on
-
with your project git diff can also take
-
extra arguments like you can do git diff
-
and say compute a diff not with respect
-
to the last snapshot the last commit but
-
with respect to this and say ok two
-
lines have been added since this point
-
to hello.txt
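
To show the kind of comparison git diff is doing, here is a short Python sketch that compares a file's content in a snapshot against its content in the working directory using the standard `difflib` module. This is only an analogy: git has its own diff machinery, and the file contents and the function name here are assumptions.

```python
import difflib

def diff_against_snapshot(snapshot_text, working_text, path):
    """Return unified-diff lines from the snapshot version to the working version."""
    return list(difflib.unified_diff(
        snapshot_text.splitlines(keepends=True),
        working_text.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))

# Hypothetical contents: one line in the old snapshot, three lines now.
old = "hello world\n"
new = "hello world\nanother line\nand a third\n"

for line in diff_against_snapshot(old, new, "hello.txt"):
    print(line, end="")
```

Lines that start with a single plus sign are the ones added since the snapshot being compared against.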
-
question so the question is what
-
does this command do without this extra
-
argument here that's a good question so
-
what this does is it computes a diff with
-
respect to head and looking at my git
-
log head is pointing to here so it's
-
doing a git diff with respect to this
-
commit and you can actually specify that
-
explicitly you can do git diff HEAD
-
hello.txt okay yes
-
so that's a good question it's like how
-
can hello.txt be different than
-
head because head refers to where you
-
currently are so to clarify head refers
-
to the last snapshot so like in my
-
picture here head and master are both
-
here and the current working directory
-
is kind of independent of this like
-
you could delete all the files in
-
here it doesn't change the history graph
-
or the references and so yeah you can
-
have differences between here and here
-
and at a high level this is how you work
-
on a project like you make some changes
-
here you git add them to stage them and
-
then you git commit and that creates a
-
new snapshot here good question
-
any other questions yep
-
so the question is does git actually
-
save all this stuff kind of in the
-
obvious way or is it doing something
-
fancier the answer is it is doing
-
something a little bit fancier but
-
it has an interface that lets you
-
think of it like it's stored that way in
-
practice git uses delta compression it
-
also does some other stuff but yeah the
-
on disk representation is actually
-
reasonably efficient question yeah
-
that's a good question so the question
-
is here we were comparing the current
-
working directory with a particular
-
snapshot in the past can we compare two
-
snapshots with each other like at two
-
different points in the history and yeah
-
git diff can take yet another argument
-
here so I can for example compare head
-
with oh I did it in the wrong order I can
-
compare what changed from here to head in
-
hello.txt and it shows me that I added
-
the second line in there any other
-
questions
-
yeah so the question is you're working
-
on a shared project in a Dropbox folder
-
and you want to migrate to git does it make
-
sense to turn the Dropbox folder into a
-
git repo do not use git inside dropbox
-
dropbox will corrupt your git repo
-
there are good solutions to doing that
-
one is just use github otherwise talk
-
to me after class and there are ways of
-
using Dropbox as a git remote safely any
-
other questions
-
next we're going to talk about branching
-
and merging which is another powerful
-
feature of git that you'll almost certainly
-
use both when working on your own
-
projects and when collaborating with
-
others for this series of demos we're
-
going to rather than work with a simple
-
text file actually write a simple
-
computer program because it'll better
-
illustrate the concepts of branching and
-
merging and as we go through this
-
demonstration we'll keep in mind how the
-
git interface commands connect to the
-
underlying data model connect to objects
-
and references and how these commands
-
modify those two data structures let me
-
do a git status to see the current state
-
of my repository here I've modified
-
hello.txt I actually don't really care
-
about this modification anymore this is
-
some random file if I do git checkout
-
hello.txt this is another different use
-
of the checkout command which basically
-
throws away the changes that I've made
-
in the working directory and sets the
-
contents of hello.txt back to the way
-
it was in the snapshot that head points
-
to if I look at git log --all
-
--graph --decorate it'll show me that
-
here I added the initial text and then I
-
added that single line here and so now
-
hello.txt doesn't have that third line
-
I'd added it just has the original two
-
next let me write a very simple
-
program we'll call this program
-
animal.py and let me just go ahead and
-
write a program that prints a little
-
bit of output when I run it let's see
-
so when I run this program it runs main
-
main calls default and then let me go
-
ahead and define default
-
and default is just going to
-
print hello so this is a program that
-
greets its user and so if I run
-
animal.py I'll see that it just prints
-
hello so that'll be our starting point
-
if I do get status it shows me that
-
animal.py is an untracked file to
-
begin with I want this to be
-
part of the snapshot so I'm going to git
-
add animal.py to add it to the
-
staging area and then do a git commit
-
here I'm going to write yet another
-
useless commit message don't actually
-
write commit messages like this in real
-
projects but for now this is fine so now
-
I have this basic animal.py and if I
-
look at my git history now I have this
-
latest snapshot this is the commit hash
-
and this is where the master branch is
-
pointing now we're actually going to
-
demonstrate how to use git branches to
-
have parallel lines of development the
-
git branch command or the branch
-
sub-command is used to access
-
functionality related to branching just
-
running git branch by itself lists all
-
the branches that are present in the
-
local repository it can also take an
-
extra argument -vv to be extra verbose
-
and print some extra information if we
-
do git branch and then specify the name
-
for a new branch git will create a new
-
branch which is just a reference that
-
points to the same place where we're
-
currently looking so now there's a new
-
reference called cat reference in this
-
case is the same as branch there's a new
-
branch called cat which points to
-
wherever head was pointing if I look at
-
the git log again I'll see that here head
-
points to master master's over here and
-
this is also where the cat branch is so
-
now I have two branches two references
-
that resolve to the same commit git is
-
actually aware of not only which
-
snapshot in the history we're currently
-
looking at so head points to this commit
-
but it's also aware of head kind of being
-
associated with a branch so here
-
head is associated with master and it's
-
the case that if I create a new snapshot
-
if I type git commit at this point the
-
next snapshot will be created and
-
head will point to that new snapshot master
-
will be updated along with head if I do
-
git checkout cat what this does is it
-
switches to the branch cat it replaces
-
the contents of the working directory
-
with whatever cat's pointing to
-
which in this case is the same as the
-
contents before but now if I look at the
-
git log again now I have head pointing to
-
cat instead of master and then master
-
also points to the same place the same
-
underlying commit and now at this point
-
if I make changes to my current working
-
directory and make a new commit the cat
-
branch the cat pointer will be updated
-
to point to the new commit whereas master
-
will continue pointing wherever it
-
pointed before so let me go ahead and
-
modify animal.py to add some cat related
-
functionality so I'm going to say that
-
if sys.argv[1] is cat then run the cat
-
function otherwise run the default
-
function and then let me go ahead and
-
define the cat function so cats
-
don't say hello they meow so cat
-
prints meow straightforward enough so
-
now if I run animal.py and give it
-
the cat argument it says meow if I give
-
it some other argument it defaults back
-
to the hello all right so simple change
-
I made if I do a git status it says
-
that animal.py has been modified
-
if I run git diff it'll show me what's
-
changed since the last commit
-
so here I've added this cat function
-
highlighted in green and also changed
-
the main function a little bit now here
-
if I do git add animal.py git commit
-
I'll actually write a slightly
-
more useful commit message this time I'm
-
going to add cat functionality and now
-
if I look at the git log I see a little
-
more stuff I'm going to show you one
-
more argument to this git log command
-
there's an argument --oneline
-
oneline spelled correctly which shows a
-
more compact representation of the graph
-
so this will be a more useful thing to use
-
because we're super zoomed into the
-
screen and there isn't that much space
-
to show a long commit history so here we
-
see the sequence of commits is still
-
linear and we have master still pointing
-
wherever it pointed before where we just
-
had the basic underlying animal.py
-
functionality but now we have this cat
-
branch which adds the cat functionality
-
we could for example git checkout master
-
to go back to the master branch and then
-
here if we look at animal.py it
-
doesn't have the cat functionality
-
anymore if we look at the git log we'll
-
see that head is pointing to master so
-
we can jump back and forth between
-
parallel lines of development so now
-
that we have the cat functionality
-
suppose that we want to work on adding
-
dog functionality in parallel and
-
suppose that in this case like the cat
-
functionality is under development or
-
maybe somebody else is working on it so
-
we just want to start from the base
-
master commit and build the dog
-
functionality starting from there so now
-
what do I want to do I want to create a
-
new branch dog for adding the dog
-
related functionality and I'll
-
eventually merge it in later so I can
-
use the git branch dog command followed
-
by the git checkout dog command to
-
create a new dog branch and then check
-
it out there's actually a short form for
-
this git checkout -b dog so this does
-
git branch dog git checkout dog and now
-
if I look at my graph I have cat where
-
it was before master where it was before
-
but now head instead of pointing to
-
master as it did before now head points
-
to this newly created dog reference
-
which is also at the same commit so at
-
this base commit and now I'll go ahead
-
and add my dog functionality so let me
-
go and define my dog function dogs don't
-
say hello they say woof and then I'll
-
add some similar functionality here to
-
decide whether to run default or dog so
-
if the first argument is dog then I want
-
to run the dog function otherwise whoops
-
otherwise I want to run the default
-
function so here's what I've changed
-
with respect to the base commit wherever
-
master is pointing so I've added the dog
-
function and I've changed main a little
-
bit so a kind of parallel modification
-
to what I did in the cat branch let me
-
go ahead and git add animal.py to add it
-
to the staging area if I do git status
-
I'll see that this change will be
-
committed when I make the next commit
-
and then I do git commit add dog
-
functionality now when I look at the git
-
graph it actually looks kind of
-
interesting compared to the ones we've
-
looked at before this shows that these
-
three commits
-
are in common with the ones that come
-
after it but then the history is
-
actually forked after this point and I
-
have this one commit that adds cat
-
functionality in one line of development
-
and then I have this other commit that
-
adds dog functionality in this other
-
line of development and then using the
-
git checkout command I can switch back
-
and forth between dog and cat and master
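
The forked history can be modeled in a few lines of Python. In this sketch a branch is just a name that maps to a commit, and committing moves whichever branch head is attached to. The commit IDs are simple counters rather than hashes, the history starts from a single base commit instead of the three in the demo, and the class name is an assumption.

```python
class BranchDemo:
    """Toy model: branches are names pointing at commits; commit moves one of them."""

    def __init__(self):
        self.parents = {}      # commit ID -> list of parent commit IDs
        self.refs = {}         # branch name -> commit ID
        self.head = "master"   # name of the branch head is attached to
        self._counter = 0

    def commit(self):
        """Create a commit on top of the current branch and advance that branch."""
        self._counter += 1
        commit_id = f"c{self._counter}"
        tip = self.refs.get(self.head)
        self.parents[commit_id] = [tip] if tip else []
        self.refs[self.head] = commit_id
        return commit_id

    def branch(self, name):
        """Like git branch NAME: a new reference to wherever head is now."""
        self.refs[name] = self.refs[self.head]

    def checkout(self, name):
        """Like git checkout NAME: attach head to an existing branch."""
        if name not in self.refs:
            raise KeyError(name)
        self.head = name

repo = BranchDemo()
repo.commit()            # c1: the base commit on master
repo.branch("cat")
repo.checkout("cat")
repo.commit()            # c2: add cat functionality
repo.checkout("master")
repo.branch("dog")
repo.checkout("dog")
repo.commit()            # c3: add dog functionality
print(repo.refs, repo.parents)
```

Both c2 and c3 have c1 as their parent, which is the fork in the graph, and master has not moved.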
-
so this is great I can do development in
-
parallel on different features but this
-
is only really useful if I can
-
eventually combine those things back
-
into my original line of development to
-
have both features in a single version
-
of my source code so the command that's
-
used to do that is git merge so like git
-
branch and git merge can kind of be
-
thought of as opposites let me check out
-
git checkout master let me check out my
-
master branch so now you see head points
-
to master and then I want to merge the
-
cat functionality and the dog
-
functionality into master and to do that
-
I can use the git merge command and git
-
merge is actually pretty fancy and I can
-
actually merge cat and dog at the same
-
time but for this demonstration we're
-
going to only merge one thing at a time
-
so first I'll type git merge cat and
-
git says some stuff here it says
-
fast-forward so what is going on here
-
well this is one interesting thing that
-
git can do when you're at a particular
-
commit and you merge some other branch
-
in where that other branch has the
-
current commit as a predecessor it's not
-
necessary to create any new snapshots or
-
do any other fancy stuff basically this
-
master branch here this pointer to
-
this commit can just be moved to point
-
here instead to incorporate that cat
-
functionality and so if we look at the
-
git log again we see that master is
-
basically pointing to the same place as
-
wherever cat was pointing all right so
-
now we're on the master branch and it
-
has the cat functionality great we're
-
halfway there if we look at
-
animal.py it has the cat functionality but it's
-
missing the dog stuff so let's try git
-
merge dog next something a little bit
-
more interesting happens this time so
-
this time the branch can't be fast
-
forwarded like it was before it's not
-
that one thing is strictly older
-
than the other thing there's been
-
parallel development that may be kind of
-
incompatible with the current set of
-
changes and
-
git does its best job at automatically
-
merging the changes from this other
-
branch
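
The difference between the two merges can be sketched in Python as a question about ancestry. If the current commit is an ancestor of the branch being merged, the pointer can simply move forward; otherwise a new commit with two parents is needed. The commit names and function names are hypothetical, and the sketch ignores file contents entirely, so it never produces a conflict.

```python
def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False

def merge(parents, refs, current, other):
    """Merge branch `other` into branch `current`, updating refs in place."""
    ours, theirs = refs[current], refs[other]
    if is_ancestor(parents, theirs, ours):
        return "already up to date"
    if is_ancestor(parents, ours, theirs):
        refs[current] = theirs           # just move the pointer
        return "fast-forward"
    merge_id = f"merge({ours},{theirs})"
    parents[merge_id] = [ours, theirs]   # a merge commit has two parents
    refs[current] = merge_id
    return "merge commit"

# Hypothetical history: cat1 and dog1 both branch off base.
parents = {"base": [], "cat1": ["base"], "dog1": ["base"]}
refs = {"master": "base", "cat": "cat1", "dog": "dog1"}

first = merge(parents, refs, "master", "cat")
second = merge(parents, refs, "master", "dog")
print(first, second, refs["master"])
```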
-
so it says auto-merging animal.py
-
but in this particular case there's
-
what's called a merge conflict so it
-
wasn't able to automatically resolve
-
the conflicts between these two
-
parallel branches of development and
-
this is something you'll see in practice
-
when you're working on real software
-
projects and there are complicated
-
slightly incompatible changes happening
-
in parallel so at this point it's left
-
up to the developer to fix this issue
-
and git offers some functionality in
-
order to help resolve merge conflicts
-
there's a program called git mergetool
-
and in my particular setup this will
-
launch vimdiff actually this is not
-
configured vimdiff I think will start
-
the right program let me set up my git
-
to launch the correct tool actually
-
let's skip that part and let's just
-
manually look at this so
-
there's a program called vimdiff which
-
can be set up to be launched when you
-
type in git mergetool which is a tool
-
that you use when you try git merge and
-
there are merge conflicts but in this
-
particular case we'll just manually
-
resolve them
-
so I did git merge --abort so
-
it put me back in the state I was before
-
I tried that git merge so this is the
-
current state of my repository I'm back
-
to the case where master is at the same
-
place as cat and I'm about to merge in
-
dog so I do git merge dog and it says
-
conflict merge conflict in animal.py so
-
let's just look at animal.py
-
directly so it looks like this top part
-
looks pretty reasonable it has both the
-
cat function and the dog function which
-
is exactly what I want but now I see
-
some weird stuff in main and this is
-
where I had slightly incompatible
-
changes so here it says that in one
-
thing like basically the branch you were
-
on you had this content and then the
-
branch you're trying to merge had this
-
content and then these things here the
-
angle brackets and the equals signs are
-
conflict markers so this is where you
-
were and this is the thing you're trying
-
to merge in and it's basically saying
-
that it was this in one case this in the
-
other case and it doesn't really know
-
how to resolve
-
these two and it's left up to the
-
programmer to fix this problem so in
-
this particular case we can go ahead and
-
delete the conflict markers and then
-
it turns out that we can actually
-
concatenate this code together and it does
-
the right thing maybe we want to make a
-
small change like this should be an if
-
this should be an elif and this
-
should be an else that might make a
-
little bit more sense actually I think
-
it's necessary for correctness here so
-
the programmer needed to modify the code
-
a little bit in order to make it
-
sensible when it's merged together but
-
once the programmer has fixed the merge
-
conflicts fixed the stuff between the
-
conflict markers you can save this file
-
and we can do git merge --continue to
-
tell git that we've fixed the issues it's
-
necessary to git add animal.py to tell
-
git that we've actually fixed these
-
issues and then we can git merge
-
--continue it pops up an editor and we
-
can give a commit message for this new
-
commit that we're about to create and
-
now if we look at the git history we
-
have the single commit that represents
-
our merge commit that we just made which
-
merges in the dog functionality and here
-
this has as parents both the dog commit
-
and the cat commit
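
For reference, the resolved animal.py might look roughly like the Python below. This is a reconstruction from the description, not the exact file from the demo: the `choose` helper and the length check on the argument list are assumptions added so the dispatch is easy to check and so running with no argument falls back to the default.

```python
import sys

def default():
    print("hello")

def cat():
    print("meow")

def dog():
    print("woof")

def choose(argv):
    """Pick the function to run: an if / elif / else chain, as in the resolved merge."""
    if len(argv) > 1 and argv[1] == "cat":
        return cat
    elif len(argv) > 1 and argv[1] == "dog":
        return dog
    else:
        return default

def main():
    choose(sys.argv)()

if __name__ == "__main__":
    main()
```

Simply concatenating the two conflicting hunks would leave two separate if / else blocks, so running with cat would also hit the second block's else and print the default greeting, which is why the elif matters for correctness.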
-
so both these branches appear in our
-
history from this point backwards and
-
this current commit that we're on
-
incorporates the functionality from both
-
of these branches so if we run
-
animal.py with cat it does the cat
-
thing if we run it with dog it does the
-
dog thing and if we run it with anything
-
else it falls back to the default
-
implementation so this is a
-
demonstration of how you branch in git
-
to do development on different things in
-
parallel and then how you can use the
-
merge command in git to resolve those
-
different branches and combine them
-
together into a single snapshot that
-
includes all the functionality that was
-
developed in parallel with each other
-
and then one thing that can happen when
-
you're doing git branching and merging
-
is you run into merge conflicts and
-
these conflicts show up as conflict
-
markers in text files you can manually
-
resolve them and git also has some tools
-
that can help with this though these
-
tools are kind of advanced and we'll only
-
refer to them in the lecture notes and
-
not actually demonstrate them for you
-
so that's git branching and merging
-
any questions no great so moving on to
-
the next topic of this lecture we will
-
talk about git remotes so this is
-
basically how you collaborate with other
-
people using git a git repository the
-
stuff contained in this dot git folder
-
represents kind of an entire copy of the
-
history it has the objects and the
-
references and contains all the previous
-
snapshots and the way you collaborate
-
with other people using git is that
-
other people can also have copies of the
-
entire git repository and then your git
-
copy your local instantiation of the
-
repository can be aware of the existence
-
of other clones of the same repository
-
and this is a concept known as remotes
-
so the git remote command will list all
-
the remotes that git is aware of for the
-
current repository and in our case with
-
this repository right here this command
-
git remote just doesn't print anything
-
because we haven't configured any
-
remotes it is only aware of the single
-
local copy of the repository that we're
-
working with here but in practice if
-
you're collaborating with other people
-
your git might be aware of the copy of
-
the code that is on github and then
-
there's a set of commands to send
-
changes from your local copy of the
-
repository to a remote that your git is
-
aware of so sending stuff from your
-
computer to github for example and
-
there's another set of commands for
-
fetching changes made in a remote
-
repository to get changes from github
-
into your own local copy in this
-
demonstration here we actually won't go
-
and configure a github account and log
-
in and create a new repository on there
-
you can find other tutorials for doing
-
that we'll actually just use a separate
-
folder on the same computer and treat it
-
like a git remote so let me I'm in the
-
demo folder here let me go up one
-
directory I have a directory called
-
playground that has this demo folder and
-
I'll go ahead and create a new directory
-
in here and I'll call it remote and then
-
do git init --bare in here this is
-
a command that you'll probably never
-
need to use in regular usage but now
-
what I've done is made remote into a
-
folder that's appropriate to use as a
-
git remote
-
so now going back into my demo folder
-
here my main repository I can do git
-
remote to list the remotes
-
there's nothing yet but I can use the
-
git remote add functionality to make my
-
local repository aware of the existence
-
of a remote so I can do git remote add
-
and then the format for this is that
-
remotes have names and then they have a
-
URL so in this case I'll use the name
-
origin this is often used by convention as
-
the name of the remote if you're only
-
using one and then for the URL normally
-
this will be like a github URL or
-
something like that or bitbucket URL or
-
gitlab URL if you're using an online
-
repository hosting service but in this
-
case it's just a path to a folder on my
-
local machine there's a folder in the
-
parent directory called remote that will
-
act as the git remote for this
-
repository so now once I've done that
-
there's a set of commands for
-
interacting with this remote one command
-
that's useful is the git push command
-
this command can send the changes from
-
your computer to the remote and the
-
format for this command is that git push
-
takes in the name of a remote and then
-
it takes in a local branch name : a
-
remote branch name and what it does is
-
it creates a new branch or updates a
-
branch on the remote with the name
-
specified here and sets it to the
-
contents of the branch specified here so
-
a concrete use of this might look like
-
git push I have only one remote called
-
origin and then what should I push let
-
me look at my history graph I have a
-
bunch of things I could push let me git
-
push to origin the master branch from
-
my local machine :
-
master so I want to create a branch on
-
the remote machine with the name master
-
that is going to be the same as the
-
master branch on my local machine so let
-
me go ahead and run that command it
-
prints out some stuff and it says on the
-
remote I created a new branch remote
-
master points to the same branch as
-
master on my local machine and now if I
-
do a git log it shows me so in blue is
-
head where I currently am in green are
-
all the branches in my local git
-
repository and now we see one new color
-
here that we hadn't
-
seen before so in red git shows
-
references that are present on the
-
remotes that my local copy is aware of
-
so on the remote origin there's also a
-
branch that happens to have the name
-
master that points to the same place as
-
my local branch master points and so now
-
if I make updates to my local copies
-
like suppose here I go in and change the
-
capitalization of these things and then
-
git add animal.py git commit here's
-
a short form for commit with a message
-
so it doesn't pop up the editor I'll
-
give it a commit message and
-
now if I look at the git graph now I see
-
that I've created this new snapshot here
-
that has this lower casing stuff in it
-
but origin master is still back here so
-
if somebody else looks at the remote
-
they will only see the changes up to
-
here and we can actually demonstrate
-
this functionality so let me go ahead
-
and open up a new tab here and go into
-
my playground directory the git clone
-
command is a command that somebody can
-
use to start from some copy of a
-
repository somewhere and make their own
-
local copy so this is often a command to
-
use when starting out with a git repo
-
like there might be something available
-
on github and you want to copy it all in
-
your machine in order to look at it or
-
start doing development and so the
-
format for git clone is that it takes in
-
a URL and then it takes in a name for a
-
folder for where to clone it so in our
-
case here we're just going to clone from
-
this remote directory we're pretending
-
that this remote folder is actually a
-
remote machine and then we'll clone
-
it into the folder called demo2 so
-
cloning into demo2 done and I'm going
-
to CD into that directory and then now
-
here I'm going to rename these tabs at
-
the bottom I will say this one's machine
-
one and this one's machine two so you
-
can think of these as two different
-
people on different machines with their
-
own copy of the repository and they're
-
both interacting with the single remote
-
so if I do my git log command that I've
-
been doing on machine one I see on
-
Machine 2 I see this portion of the
-
history so master on machine 2 is
-
pointing to the same place as origin
-
master
-
and it says merge branch dog so if I
-
look at animal.py here
-
it doesn't have the changes that I made
-
on machine one where I have this
-
new commit that is only present on this
-
machine but not on the remote and not on
-
machine two so if I want to fix that if
-
I want to send these changes up to the
-
remote like think of it as sending it up
-
to GitHub or up to the machine that's
-
holding or maintaining the source code I
-
can use the git push command again git
-
push origin master colon master and this
-
will work but this is kind of annoying
-
to type every time you want to do this
-
like this is a really common operation
-
so git has a way of making this a little
-
bit simpler it has a way of maintaining
-
relationships between branches on your
-
own local machine and branches on remote
-
machines it is a way of knowing what
-
branch on a remote machine a local
-
branch corresponds to so that you can
-
type in a shortened version of git push
-
and it'll know what all the arguments to
-
the expanded form would have been and
-
there are a couple different syntaxes for
-
doing this one way is to use the git
-
branch --set-upstream-to command and
-
what this does is for the branch that's
-
currently checked out which is master it
-
will set the upstream to and I'll type in
-
origin/master and see now it says branch
-
master set up to track remote branch
-
master from origin now if I type in git
-
branch -vv remember this says tell me
-
about all the branches that I know about
-
in a very verbose way that's what the
-
-vv means I have three branches on my
-
local machine on machine one I have cat
-
dog and master and master on my local
-
machine corresponds to origin master so
-
now I can type in just git push without
-
all the extra arguments I could have
-
done this as git push origin master
-
colon master but it wasn't necessary
-
it'll know that I want to push to origin
-
master and it will make that change
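As a rough sketch of the push workflow just described, the commands below rebuild a small sandbox and run the long form of git push, then the upstream setup, then the short form. The temp folder, the bare repository standing in for the remote, the placeholder identity and the file contents are all assumptions made for illustration; the lecture's own demo repository is not reproduced here.

```shell
# Hypothetical sandbox: a bare repository in a temp folder stands in for the remote.
work=$(mktemp -d) && cd "$work"
git init -q --bare remote
git -C remote symbolic-ref HEAD refs/heads/master   # make sure its default branch is master

git init -q demo && cd demo
git symbolic-ref HEAD refs/heads/master             # same for the local repository
git config user.name "Demo User"                    # placeholder identity
git config user.email "demo@example.com"
echo 'print("hello")' > animal.py
git add animal.py
git commit -q -m "Add animal.py"

git remote add origin "$work/remote"
# Long form: git push <remote> <local branch>:<remote branch>
git push -q origin master:master

# Record that local master corresponds to origin/master ...
git branch --set-upstream-to=origin/master
git branch -vv                                      # master is now listed with [origin/master]

# ... so that a later push needs no arguments.
echo 'print("goodbye")' >> animal.py
git commit -q -a -m "Say goodbye"
git push -q
```

After the last command the remote's master and the local master should point at the same commit.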
-
so now these changes are present on the
-
remote we can go over to machine two
-
pretend we're the other guy interacting
-
with this repository and if I do my
-
git log command I still don't see the
-
changes so what's going on here
-
well it's necessary to run a separate
-
command in order to have these changes
-
present here by default all the git
-
commands don't talk to the internet it
-
all works locally which means it works
-
very fast but then there are special
-
commands for saying that you want to
-
retrieve changes that have been made
-
somewhere else and the command that's
-
used for doing that is a command called
-
git fetch git fetch takes as an
-
argument the name of the remote but if
-
there's only one it'll just use that so
-
you can type in git fetch and then it's
-
talked to this remote repository and it
-
says that there's some update on the
-
remote and we can visualize it by
-
running git log and now we see here
-
another situation that we hadn't seen
-
before we have master on our local
-
machine the master branch doesn't change
-
the git fetch command doesn't change any
-
of our local history our local
-
references like our branches but now
-
it's aware that origin master has been
-
updated to point to this new commit and
-
there's a separate command we can do git
-
merge in order to move master up to here
-
or there's another command called git
-
pull which is the same as doing git
-
fetch and then git merge so if we just
-
do git pull here for example it will say
-
it's fast-forwarding it's merging in
-
origin master into our master and now if
-
we look at the git history graph we've
-
currently checked out master master
-
points to the same place as the origin
-
master that we're aware of and all the
-
changes between Machine 2 and Machine 1
-
are in sync so those are the basic
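The fetch-then-merge sequence from this part of the demo can be replayed in a sandbox. This is a sketch under assumptions: the folder names machine1 and machine2, the file hello.txt, and the placeholder identity are made up for illustration, and a bare repository on disk plays the role of the remote.

```shell
# Hypothetical sandbox with one remote and two clones.
work=$(mktemp -d) && cd "$work"
git init -q --bare remote
git -C remote symbolic-ref HEAD refs/heads/master

# "Machine 1": create a commit and publish it.
git init -q machine1 && cd machine1
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > hello.txt
git add hello.txt
git commit -q -m "v1"
git remote add origin "$work/remote"
git push -q origin master:master

# "Machine 2": clone, so it starts out in sync.
cd "$work" && git clone -q remote machine2

# Machine 1 publishes a second commit.
cd "$work/machine1"
echo "v2" >> hello.txt
git commit -q -a -m "v2"
git push -q origin master:master

# Machine 2 sees nothing new until it fetches.
cd "$work/machine2"
git fetch -q
before=$(git rev-parse master)              # local branch: untouched by fetch
remote_tip=$(git rev-parse origin/master)   # remote-tracking ref: updated by fetch
git merge -q origin/master                  # fast-forwards master to origin/master
# git pull would have done the fetch and the merge in one step.
```

Comparing `$before` with `$remote_tip` shows that fetch only moved origin/master; the local master moved when the merge ran.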
-
commands for interacting with git
-
remotes so there's the git remote
-
command for listing remotes and adding
-
and removing them and things like that
-
and then there's the git push command
-
for sending changes from your local copy
-
of the repository to the remote and then
-
there's the git fetch command which is
-
for retrieving changes to a repository
-
that are present on a remote and getting
-
the changes on your local machine and
-
once you retrieve those changes you can
-
use git merge to update your local
-
branch to point to the same place where
-
the remote branch does or you can use
-
the git pull command which does
-
basically the same thing as git fetch
-
plus git merge and then of course
-
separate from all these commands
-
is the clone command that we talked
-
about a little while ago which is for
-
taking a copy of a remote repository and
-
initializing the local repository from
-
that copy so that's a quick overview of
-
the different commands used to interact
-
with git remotes and now these are kind
-
of complicated and it takes a while to
-
master all the different variations of
-
this and understand how they're actually
-
used in practice but hopefully this acts
-
as a quick introduction and you can see
-
how the different commands relate to the
-
underlying data model all these commands
-
all they do is fetch new objects from
-
other places or send objects from the
-
local machine to other places and these
-
commands mutate references so relating
-
the interface of git and
-
some of these kind of badly designed
-
commands to the underlying data model
-
can help it make a lot more sense the
-
final topic we're going to cover today
-
is a kind of overview of other
-
things that git can do that we're not
-
going to go into detail in teaching you
-
how to do but we just want to tell you
-
that these functionalities exist in case
-
you need to do these things yourself you
-
can look up the documentation and find
-
out exactly how to do it one thing is
-
the git config command like a lot of
-
tools we've looked at like the shell and
-
tmux and things like that
-
git is highly configurable and it's
-
configured using a plain text file which
-
can be edited either through the
-
command-line interface so git config can
-
take in flags that will modify this text
-
file or you can edit the .gitconfig
-
file in the home folder with plain text
-
configuration and so for this lecture
-
I've actually cut out most of my git
-
config and only left in my username and
-
email for what will go into git commits
-
but there's a lot of stuff you can put
-
in here which will make it behave nicer
-
or behave the way you want it to and
-
you can look online for different ways
-
people have configured their git configs
-
oftentimes people have documentation in
-
their git configs
-
which can be found on github there's a
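To make the point about the plain text file concrete, here is a small sketch that writes two settings through the command line and then prints the file that git produced. It deliberately uses a scratch file via `--file` rather than the real `.gitconfig` in the home folder, and the name and email are placeholders.

```shell
# Scratch file so the real ~/.gitconfig is left alone (assumption for this sketch).
cfg=$(mktemp)
git config --file "$cfg" user.name "Demo User"          # placeholder values
git config --file "$cfg" user.email "demo@example.com"
cat "$cfg"                                   # the settings are stored as plain text
git config --file "$cfg" --get user.email    # read one value back through the CLI
```

Dropping `--file "$cfg"` and using `--global` instead would apply the same edits to the per-user configuration.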
-
couple other random commands that could
-
be useful one is for when you want to
-
clone a repository with git clone that's
-
really gigantic git clone by default
-
copies the entire version history for
-
the remote it's downloading the
-
repository from but there's an argument
-
you can pass it which is --depth=1
-
which will avoid doing that so if
-
there's some copy of some code on github
-
say that you want to get a copy of
-
on your local machine but that
-
repository is really gigantic and has a
-
billion commits use git clone
-
--depth=1 this will be much faster but
-
then of course you won't have the version
-
history on your local machine you'll
-
just have the latest snapshot another
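A shallow clone can be tried locally. In git the option for this is spelled `--depth`, and the sketch below compares a clone made with `--depth 1` against a full one. The source repository with three commits, its file name and the placeholder identity are assumptions for illustration; the `file://` URL is used because git ignores `--depth` when cloning from a plain local path.

```shell
work=$(mktemp -d) && cd "$work"
git init -q src && cd src
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
for i in 1 2 3; do
  echo "$i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done
cd "$work"

git clone -q --depth 1 "file://$work/src" shallow   # latest snapshot only
git clone -q "file://$work/src" full                # entire history

shallow_count=$(git -C shallow rev-list --count HEAD)
full_count=$(git -C full rev-list --count HEAD)
echo "shallow: $shallow_count commit(s), full: $full_count commit(s)"
```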
-
command that we find really useful when
-
doing development on real software
-
projects is an interactive version of
-
the git add command so to demonstrate
-
this I'm going to go ahead and make a
-
couple different changes to my animal.py
-
one change I'll make here I'll change
-
some text here and then I'll put a new
-
print statement here so let's pretend
-
that this first change was some real
-
change I wanted to make say it's a bug
-
fix and this other change here was a
-
printf that I added for debugging but I
-
don't actually want to commit in the
-
next snapshot if I do a git diff it'll
-
show me that yes I've made these two
-
changes and if I do git add
-
animal.py it will stage both of those changes
-
for a commit and that's not what I want
-
I could go manually remove this debug
-
print and then do
-
git add animal.py but there's an
-
easier way to do it
-
there's this git add -p command which
-
lets me interactively stage pieces of
-
files for a commit and so there's some
-
interface for working with this so here
-
it's saying do I want to stage both of
-
these changes and no I don't but I'm
-
going to split it into two smaller
-
changes this one I do want to keep so I
-
say Y for yes and this one I don't want
-
to keep so I say n for no and then if I
-
do git diff --cached this will show me
-
what changes are staged for commit so
-
now it shows only the actual change I
-
wanted to keep if I do git diff it'll
-
still show me the other change that is
-
not going to be part of the
-
next commit which is the change I didn't
-
want to keep and then with this I can do
-
git commit specify some commit message
-
now I only have this change left and
-
then I can do git checkout animal.py
-
to throw away this change so git
-
add -p for interactive staging is a
-
useful thing a couple other commands
-
that you can look up on your own are the
-
git blame command so this command's kind
-
of ominous but it can be used to figure out
-
who edited what line of a file and you
-
can also find the corresponding commit
-
that was responsible for modifying that
-
particular line of that file and then
-
you can look up commit messages
-
associated with that and whatnot so this
-
is not that interesting to do in our
-
current toy repository but I'll go over
-
to the repository for the class website
-
and we can look at some particular file
-
here and let me go to some particular
-
line here and I can be looking at this
-
and be like oh why was this particular line
-
added what does it mean and I can look
-
at the git blame for this so if I do git
-
blame config.yml it'll print out all
-
the lines kind of in the right column
-
and then in the left side it'll show me
-
what commit that change was made in and
-
by whom and then looking at this like I
-
can go down to this collections line it
-
was made in this commit that's the last
-
commit that modified that line and now I
-
can use the git show command to get
-
information for that particular commit
-
oh and this is kind of useful redo
-
lectures as a collection that's probably
-
what was related to that collections
-
line and then beyond just showing the
-
commit and the commit message it also
-
shows me the actual changes introduced
-
in that particular commit and I can
-
go look through them and understand
-
what's going on another kind of cool
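The blame-then-show lookup can be sketched on a toy repository. The file contents and commit messages below are invented for illustration and are not taken from the class website; `-L 2,2` restricts blame to the second line and `--porcelain` puts the full commit hash first on the output line.

```shell
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "title: demo" > config.yml              # hypothetical file contents
git add config.yml
git commit -q -m "Add config"
echo "collections: lectures" >> config.yml
git commit -q -a -m "Add collections setting"

git blame config.yml                         # per line: commit, author, date, content

# Find the last commit that touched line 2, then inspect it.
sha=$(git blame -L 2,2 --porcelain config.yml | head -n 1 | cut -d ' ' -f 1)
git --no-pager show "$sha"                   # commit message plus the diff it introduced
```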
-
command is a command called git stash so
-
let's go back to our demo repository and
-
demonstrate that here so say I have some
-
changes here and I temporarily want to
-
put them away if I do git stash it will
-
revert my working directory to the state
-
it was in at the last commit so if I do
-
cat hello.txt that change is gone but
-
it's not just deleted it's saved
-
somewhere and if I do git stash pop it
-
will undo the stash so now if I look at
-
hello.txt it has the changes I made so
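The stash round trip looks roughly like this in a sandbox. The repository, the file contents and the placeholder identity are assumptions for the sketch.

```shell
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"
echo "hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

echo "work in progress" >> hello.txt         # an uncommitted change
git stash                                    # working directory goes back to the last commit
during=$(cat hello.txt)                      # just the committed line
git stash pop                                # the saved change comes back
cat hello.txt                                # both lines again
```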
-
yet another useful command another
-
really neat command is something called
-
git bisect and this has a complicated
-
interface that we're not going to
-
demonstrate in detail but basically this
-
is a tool that can be used to solve a
-
bunch of problems where you need to
-
manually search history for something
-
suppose you're in a scenario where
-
you've been working on a project for a
-
long time you have lots and lots of
-
snapshots you're a thousand commits in
-
and then you notice that some unit test
-
doesn't pass anymore but you know that
-
this was passing like
-
year ago and you're trying to figure out
-
at what point did it break like at what
-
point was this regression in your code
-
introduced so one thing you could do is
-
manually check out like go back one
-
commit and see if the unit test is still
-
failing go back one commit see if it's
-
still failing and eventually you'll find
-
the first commit where the test stopped
-
working and it'll probably tell you like
-
what broke but that's kind of annoying
-
to do manually git bisect automates
-
that process and it actually binary
-
searches your history so it does this in
-
the most efficient way possible and not
-
only that git bisect can take in a
-
script that it uses to try to figure
-
out whether a commit it's looking at is
-
good or bad so it can be a fully
-
automated process like you can give git
-
bisect a unit test and say find the
-
first commit where this unit test
-
stopped passing it's a really powerful
-
tool another random thing that's kind of
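Since git bisect is not demonstrated in the lecture, here is a minimal sketch of the fully automated form. Everything in it is invented for illustration: a ten-commit history in which a line reading BUG appears at commit 7, and a one-line check that plays the role of the unit test. `git bisect run` treats exit status 0 as a good commit and a non-zero status (other than 125) as a bad one.

```shell
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
git config user.name "Demo User"             # placeholder identity
git config user.email "demo@example.com"

# Ten commits; the "regression" sneaks in at commit 7.
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "change $i" >> log.txt
  if [ "$i" -eq 7 ]; then echo "BUG" >> log.txt; fi
  git add log.txt
  git commit -q -m "commit $i"
done

git bisect start HEAD HEAD~9                   # bad = newest commit, good = the first commit
git bisect run sh -c '! grep -q BUG log.txt'   # the "test": passes while BUG is absent
first_bad=$(git rev-parse refs/bisect/bad)     # where the search ended up
git log -1 --format=%s "$first_bad"            # subject of the first bad commit
git bisect reset                               # go back to where we started
```

With ten commits the binary search needs only a handful of checkouts rather than stepping back one commit at a time.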
-
useful is something called a .gitignore
-
file so by default if you have random
-
files in a directory like let me create
-
the .DS_Store file whoops
-
create the .DS_Store file
-
and then do git status so .DS_Store is
-
like some nuisance file that macOS
-
creates I don't know exactly what goes
-
in here but basically once this file is
-
in this directory now whenever I do git
-
status it says oh there's this new file
-
that I've never heard of before but
-
it's apparently here like do you want to
-
add it and this sort of stuff gets
-
annoying and there's a lot of other
-
stuff beyond OS specific garbage that
-
might be in a directory like for example
-
if you're working with C code you might
-
compile it and produce .o files or
-
executable files or things like that and
-
you probably don't want binaries to be
-
part of your commit history you only
-
want the source code and so git has a
-
way of you being able to tell the tool
-
that you don't care about a particular
-
set of files and to ignore them and
-
that's something called a .gitignore
-
file so if I go and modify the file
-
called .gitignore in the current
-
directory I can specify particular file
-
names or patterns of file names like say
-
I can specify *.o so any file
-
ending in .o along with .DS_Store and
-
now if I touch foo.o and now do a git
-
status
-
I'll see that git says okay I have
-
hello.txt which I've modified
-
sure and
-
I have .gitignore so you should
-
track your .gitignore file using git but
-
notice that it doesn't mention my
-
.DS_Store file or my foo.o file
-
that's present in the current directory
-
because that has been git-ignored so
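The ignore behaviour is easy to check in an empty sandbox. The file main.c is an extra placeholder added here so that there is something git should still report.

```shell
work=$(mktemp -d) && cd "$work"
git init -q demo && cd demo
touch .DS_Store foo.o main.c                  # main.c is a hypothetical source file

# One pattern per line: any file ending in .o, plus .DS_Store.
printf '%s\n' '*.o' '.DS_Store' > .gitignore

status=$(git status --short)
echo "$status"                                # lists .gitignore and main.c only
git check-ignore foo.o .DS_Store              # prints the paths that match an ignore rule
```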
-
that's a quick overview of a little bit
-
of advanced git functionality just to
-
give you a flavor of what sorts of cool
-
things this tool can do and then finally
-
we have a couple other topics that are
-
covered in the lecture notes in more
-
detail I'll just quickly list them here
-
so you know what to look for one is that
-
there are many graphical clients for git
-
we don't personally use them we like the
-
git command line tool but some of them
-
are kind of ok and you might want to
-
check them out just to see if you prefer
-
using those another thing is shell
-
integration so you've noticed that in
-
this tutorial I've done git status a
-
whole bunch to see kind of what's going
-
on with my repository well that's kind
-
of annoying to do and a lot of people
-
have their shell prompts set up so that
-
just within the shell prompt itself
-
like on every line it will show me a
-
very succinct summary of what's going on
-
with my repository so it might show me a
-
summary of what branch I have currently
-
checked out along with maybe if I've
-
modified files or untracked files and so
-
we have a link in the lecture notes on
-
how to get some nice shell integration
-
for displaying kind of git-related
-
information in your shell prompt similar
-
to that you can get integrations with
-
your text editor so for example I use
-
vim and I have a plug-in for vim that
-
does all sorts of interesting git
-
related stuff one thing I can do with
-
this plug-in is look at git blame
-
information remember we just looked at
-
this through the command line instead I
-
can look at it with this plug-in and it
-
lets me work with it a lot faster I can
-
look at git blame press enter when
-
hovering over a specific commit and it
-
shows me that particular commit in my
-
text editor it even hides all the other
-
files and shows me just the one file I
-
was looking at which is presumably what
-
I care about so we have links to that in
-
the lecture notes as well and there are
-
a couple of other interesting things you
-
could look at there if you're interested
-
finally this lecture by itself is
-
probably not enough to teach you
-
everything you need to know about git
-
it's a good start we think that the
-
right way of learning git was to learn
-
about
-
the underlying data model the whole
-
objects and references and how git
-
models history and then we gave you an
-
introduction to using the git commands
-
and if you want to become really
-
proficient at this tool in the resources
-
section in the lecture notes for today
-
we have a link to a book called pro git
-
so this is a free book it's nicely
-
written it's pretty short and I think
-
going through the first couple chapters
-
of that book should teach you basically
-
everything you need to know in order to
-
use git proficiently for real software
-
projects and for contributing
-
to projects on GitHub and things like
-
that and then finally just like all the
-
other lectures we have a number of
-
exercises you can go through if
-
you want some interesting and
-
challenging problems that you can figure
-
out how to do