1025_gitify_your_life.ogv

  • 0:00 - 0:03
It's time to start with the next talk
  • 0:04 - 0:06
    I welcome Richard Hartmann
  • 0:06 - 0:10
He has been involved in Debian for many years
  • 0:10 - 0:14
and he recently became a Debian Developer
  • 0:14 - 0:18
    and he will talk about gitify your life.
  • 0:18 - 0:22
    ?, blogs, configs, data and backup. gitify everything
  • 0:22 - 0:24
    Richard Hartmann
  • 0:24 - 0:25
    Thank you. [applause]
  • 0:31 - 0:32
    Thank you for coming
  • 0:32 - 0:35
especially those who ? years attended all ?
  • 0:37 - 0:39
    Short thing about myself
  • 0:40 - 0:42
    As ? said I'm Richard Hartmann
  • 0:42 - 0:46
    In my day job I am backbone manager at Globalways
  • 0:46 - 0:49
    I'm involved in freenode and OFTC and...
  • 0:49 - 0:51
    should I speak louder?
  • 0:52 - 0:53
    I'm not...
  • 0:56 - 0:58
    test, test... good back there?
  • 1:01 - 1:03
    Can you turn up the volume a little bit?
  • 1:06 - 1:08
    test, test... ok, perfect.
  • 1:08 - 1:13
Since about a week ago I've been a Debian Developer (yay)
  • 1:13 - 1:21
    [applause] and I'm the author of vcsh.
  • 1:21 - 1:25
Raise of hands: who of you knows what git is?
  • 1:26 - 1:27
    perfect
  • 1:27 - 1:31
    That's just as in ? perfect, we can skip it.
  • 1:32 - 1:34
    Let's move to the first tool, etckeeper.
  • 1:34 - 1:37
Some, or maybe most, of this audience will have heard of it,
  • 1:37 - 1:46
it's a tool to basically store your /etc in pretty much every version control system you can think of
  • 1:46 - 1:48
    It's implemented in POSIX shell
  • 1:48 - 1:53
it autocommits everything in /etc basically at every opportunity
  • 1:53 - 1:55
    you may need to write excludes, for example
  • 1:55 - 1:58
    before your network config ?
  • 1:58 - 2:00
    but else, yeah, that's really cool
  • 2:00 - 2:01
    the autocommit
  • 2:01 - 2:07
    it hooks into most of the important or maybe even all of the important package management systems
  • 2:07 - 2:11
    so when you install your packages, even on SuSE or whatever
  • 2:11 - 2:14
    you can just have it commit automatically, which is very nice
  • 2:15 - 2:18
    You can obviously commit manually
  • 2:18 - 2:20
    if you for example change your X config
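For reference, a minimal sketch of that manual workflow, assuming etckeeper's default git backend (the edited file is just an example):

```sh
sudo etckeeper init                            # one-time: put /etc under version control
sudoedit /etc/X11/xorg.conf                    # change some config by hand
sudo etckeeper commit "tweak X configuration"  # record the change explicitly
cd /etc && sudo git log --oneline              # with the git backend, /etc is a plain git repository
```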
  • 2:21 - 2:23
    it supports as I said various backends
  • 2:23 - 2:26
    it's quite nice to recover from failures
  • 2:26 - 2:31
    for example ? used it to recover from saturday's power outages
  • 2:31 - 2:36
    because some servers lost stuff and with etckeeper you can just replay all the data which was...
  • 2:36 - 2:38
    rather nice.
  • 2:38 - 2:39
    Then there is bup.
  • 2:39 - 2:43
    bup is a backup tool based on the git pack file format
  • 2:43 - 2:44
    it's written in python
  • 2:44 - 2:46
    it's very very fast
  • 2:46 - 2:47
    and it's very space efficient.
  • 2:48 - 2:53
    The author of bup managed to reduce his own personal backup size
  • 2:53 - 2:56
    from 120 GiB to 45 GiB
  • 2:56 - 3:00
    just by migrating away from rsnapshot over to bup
  • 3:00 - 3:02
    which is quite good
  • 3:02 - 3:05
    I mean, it's almost or a little bit more than a third, so
  • 3:05 - 3:06
    very good
  • 3:07 - 3:10
    This happens because it has built-in deduplication
  • 3:10 - 3:14
    because obviously git pack files also have deduplication
  • 3:15 - 3:17
    You can restore every single mount point
  • 3:17 - 3:19
    or every single point in time
  • 3:19 - 3:23
every single backup can be mounted as a FUSE filesystem or a ? filesystem
  • 3:23 - 3:25
    independently of each other
  • 3:25 - 3:28
    so you can even compare different versions of what you have in your backups
  • 3:29 - 3:30
    which again is very nice
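For illustration, a minimal bup session along these lines (paths and the backup name are invented):

```sh
bup init                           # create the default repository (~/.bup)
bup index -u ~/Documents           # build or update the file index
bup save -n documents ~/Documents  # store a deduplicated snapshot named "documents"

mkdir -p /tmp/bup
bup fuse /tmp/bup                  # mount all backups as a FUSE filesystem
ls /tmp/bup/documents              # one directory per snapshot, plus "latest"
diff -r /tmp/bup/documents/latest ~/Documents   # compare a backup against the live data
fusermount -u /tmp/bup
```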
  • 3:30 - 3:35
    the one thing which is a real downside for serious deployments
  • 3:36 - 3:43
there is no way to delete data from your... archive or from your backups
  • 3:43 - 3:47
    which again is a direct consequence of using git pack files
  • 3:47 - 3:50
    there is a branch which supports deleting old data
  • 3:50 - 3:53
    but this is not in mainline and it hasn't been in mainline for...
  • 3:54 - 3:56
    I think one or two years
  • 3:56 - 3:59
    so I'm not sure if it will ever happen but...
  • 3:59 - 4:01
    yeah
  • 4:01 - 4:03
    at least in theory it would exist.
  • 4:04 - 4:10
    Then for your websites, for your wikis, for your whatever there is ikiwiki.
  • 4:11 - 4:13
    ikiwiki is a wiki compiler,
  • 4:13 - 4:14
    as the name implies,
  • 4:14 - 4:19
    and it converts various different files into HTML files
  • 4:20 - 4:21
    it's written in Perl
  • 4:21 - 4:23
    it supports various backends
  • 4:23 - 4:25
    again most of the ones you can possibly think of
  • 4:27 - 4:29
    oh, I can even slow down, good
  • 4:30 - 4:35
    it's able to parse various markup languages, more on that on the next slide
  • 4:35 - 4:42
    there are several different ways to actually edit any kind of content within ikiwiki
  • 4:43 - 4:46
    it has templating support, it has CSS support
  • 4:46 - 4:52
    these are quite extensive, but they may be improved, but that's for another time
  • 4:52 - 4:57
    it acts as a wiki, as a CMS, as a blog, as a lot of different things
  • 4:57 - 5:03
    it automatically generates RSS and Atom feeds for every single page, for every subdirectory
  • 5:03 - 5:06
    so you can easily subscribe to topical content
  • 5:06 - 5:10
    if you are for example only interested in one part of a particular page
  • 5:10 - 5:12
    just subscribe to this part by RSS
  • 5:12 - 5:15
and you don't have to check if there are updates for it
  • 5:15 - 5:20
    which is very convenient to keep track of comments somewhere or something
  • 5:20 - 5:26
It supports OpenID, which means you don't have to go through all the trouble of...
  • 5:26 - 5:29
    having a user database or doing very...
  • 5:30 - 5:31
    or doing a lot of antispam measures
  • 5:31 - 5:35
    because it turns out OpenID is relatively well...
  • 5:35 - 5:36
    suited for just...
  • 5:36 - 5:39
    stopping spam. For some reason, maybe they just
  • 5:39 - 5:41
    haven't picked it up yet, I don't know
  • 5:41 - 5:44
    but it's quite nice, because you don't have to do any actual work
  • 5:44 - 5:50
    and people can still edit your content, and you can track back changes at least to some extent
  • 5:52 - 5:58
it supports various markup languages; the best one, well, that's debatable, but in my opinion it's Markdown
  • 5:58 - 6:07
    it supports WikiText, reStructuredText, Textile and HTML and there are ikiwiki specific extensions
  • 6:07 - 6:12
for example normal wikilinks which are a lot more powerful than the normal linking style in Markdown
  • 6:12 - 6:15
    which kind of sucks, but... whatever
  • 6:17 - 6:23
    it also supports directives, which basically tell ikiwiki to do special things with the page
  • 6:23 - 6:24
    for example you can tag your blog pages
  • 6:24 - 6:27
    or you can make...
  • 6:27 - 6:33
    generate pages which automatically pull in content from different other pages and stuff like this.
  • 6:33 - 6:35
    that's all done by directives.
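As a small example of such directives (page and tag names are invented; the directive syntax is ikiwiki's own):

```sh
# tag an individual blog post
cat >> posts/debconf13.mdwn <<'EOF'
[[!tag debian talks]]
EOF

# a front page that automatically pulls in the latest posts
cat > index.mdwn <<'EOF'
[[!inline pages="posts/* and !*/Discussion" show="10"]]
EOF
```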
  • 6:38 - 6:40
    How does it work?
  • 6:40 - 6:45
    You can edit webpages directly, if you want to, on the web
  • 6:45 - 6:50
    then you will have a rebuild of the content
  • 6:50 - 6:52
    but only the parts with changes
  • 6:52 - 6:56
    so if you... hello?
  • 6:56 - 6:59
    if you change only one single file it will only rebuild one single file
  • 6:59 - 7:04
    if you change for example the navigation it will rebuild everything because obviously...
  • 7:04 - 7:06
    it is used by everything.
  • 7:16 - 7:20
    If it has to generate pages automatically, for example the index pages or something
  • 7:20 - 7:23
    if you just create a new subdirectory, or if you have...
  • 7:24 - 7:26
if you have comments which have to appear on your site
  • 7:26 - 7:29
it will automatically generate those Markdown files and commit them
  • 7:29 - 7:34
or you put them in your source directory and you just commit them and...
  • 7:34 - 7:38
    and have them part of your site, or you can autocommit them if you want.
  • 7:38 - 7:40
    That's possible as well.
  • 7:40 - 7:46
    You can obviously change... pull in changes in your local repository if you want to look at them
  • 7:47 - 7:49
    Common uses would be public wiki...
  • 7:49 - 7:53
    private notes, for just note keeping of your personal TODO list or whatever
  • 7:54 - 7:58
    having an actual blog, which a lot of people in this room probably do
  • 7:58 - 8:04
    that's, yeah, I mean a lot of people on Planet Debian have their blog on ikiwiki, for good reasons
  • 8:05 - 8:09
    and an actual CMS for company websites or stuff
  • 8:09 - 8:12
    which also tends to work quite well.
  • 8:14 - 8:22
The three main ways to interact with ikiwiki are web-based text editing, which is quite useful for new users, but is quite boring, in my opinion,
  • 8:22 - 8:28
    there is also a WYSIWYG editor which is even more fancy for non-technical users
  • 8:29 - 8:33
    and there is just plain old CLI-based editing way:
  • 8:33 - 8:39
just edit files, commit them back into the repository, push up, and everything gets rebuilt automatically, which is...
  • 8:39 - 8:42
    in my opinion the best way to interact with ikiwiki, because
  • 8:42 - 8:46
    you are able to stay on the command line and simply push out your...
  • 8:46 - 8:50
your stuff onto the web and you don't actually have to leave the command line
  • 8:51 - 8:53
    which is pretty kinda neat.
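A sketch of that command-line workflow (repository URL and page name are placeholders; the server-side hook is set up when the wiki is created):

```sh
git clone ssh://git.example.org/srv/git/wiki.git
cd wiki
"$EDITOR" posts/gitify_your_life.mdwn        # write or edit a page in Markdown
git add posts/gitify_your_life.mdwn
git commit -m "notes on gitifying your life"
git push                                     # a post-update hook on the server rebuilds the affected pages
```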
  • 8:54 - 8:57
    There are also some more advanced use cases
  • 8:59 - 9:03
    as I said you can interface with the source files directly
  • 9:03 - 9:04
    you can maintain...
  • 9:05 - 9:06
    something is wrong
  • 9:06 - 9:10
for example you can maintain your wiki and your docs and your...
  • 9:10 - 9:12
    source code in one single directory
  • 9:13 - 9:15
    and it would simply...
  • 9:15 - 9:19
    and simply have parts of your subdirectory structure rendered.
  • 9:19 - 9:21
    for example git-annex does this
  • 9:21 - 9:24
    there is a doc directory, which is rendered to the website
  • 9:25 - 9:27
    but is also part of the normal source directory
  • 9:27 - 9:31
    which means that everybody who checks out a copy of the repository
  • 9:31 - 9:34
    will have the complete forum, bug reports, TODO lists
  • 9:34 - 9:35
    user comments,
  • 9:35 - 9:40
    everything on their local filesystem, without having to leave - again - their command line,
  • 9:40 - 9:49
which means there is no media break, and so it's just very convenient to have one single resource for everything regarding one single program.
  • 9:50 - 9:53
    And another nice thing is if you create different branches
  • 9:53 - 9:59
    for preview, staging areas you can have workflows where some people are just allowed to create ...
  • 9:59 - 10:05
    pages, other people then look over those pages and merge them back into master and then push them on the website
  • 10:05 - 10:08
    which basically allows you to...
  • 10:09 - 10:14
    to have content control or real publishing workflow, if you have a need to do this
  • 10:16 - 10:18
    Next stop: git-annex.
  • 10:19 - 10:20
    The beef.
  • 10:22 - 10:29
    It's basically a tool to manage files with git without checking those files into git
  • 10:30 - 10:32
    ?
  • 10:35 - 10:36
    Yeah, what is git-annex?
  • 10:36 - 10:36
    It's based on git,
  • 10:36 - 10:39
    it maintains the metadata about files,
  • 10:39 - 10:43
    as in location, and file names and everything, in your git repository
  • 10:44 - 10:49
    but it doesn't actually maintain the file content within the git repository
  • 10:49 - 10:50
    more on that later
  • 10:50 - 10:53
    this saves a lot of time and space.
  • 10:54 - 10:59
You're still able to use any git-annex repository as a normal git repository
  • 10:59 - 11:02
    which ? means you're even able to have a mix of...
  • 11:02 - 11:05
    for example, say, all your ? files
  • 11:05 - 11:08
    should be maintained by normal git,
  • 11:08 - 11:12
    and then you have all the merging which git does for you and everything
  • 11:12 - 11:14
    and then you have for example your photographs,
  • 11:14 - 11:16
    or your videos for web publishing
  • 11:16 - 11:19
    which are maintained in the annex
  • 11:19 - 11:24
    which means you don't have to have a copy of those files in each and every single location
  • 11:26 - 11:31
    A very nice thing about git-annex is that it's written with very low bandwidth and flaky connections in mind
  • 11:32 - 11:36
    quite a lot of you will know that Joey lives basically in the middle of nowhere
  • 11:36 - 11:40
    which is a great thing to be forced to write really efficient code
  • 11:41 - 11:43
    which doesn't use a lot of data, and that shows:
  • 11:44 - 11:44
    it's really quick
  • 11:44 - 11:48
    and even if you had a really really bad connection
  • 11:48 - 11:50
    in backwaters or whatever...
  • 11:50 - 11:52
    during holidays or during normal living
  • 11:53 - 11:56
    it's still able to transfer the data which you need to transfer,
  • 11:56 - 11:58
    it's very very nice
  • 11:58 - 12:02
    There are various workflows: we'll see four of them in a few minutes
  • 12:04 - 12:09
    So. It's written in Haskell, so it's probably strongly typed and nobody can write patches for it
  • 12:11 - 12:14
    it uses rsync to actually transfer the data,
  • 12:14 - 12:17
    which means it doesn't try to reinvent any wheels
  • 12:17 - 12:24
it's really just based on top of established and well known and well debugged programs
  • 12:24 - 12:29
    In indirect mode, which in my personal opinion is the better mode,
  • 12:29 - 12:30
    what it does is
  • 12:30 - 12:36
    it moves the actual files into a different location, namely .git/annex/objects
  • 12:37 - 12:42
it then makes those files read-only, so you cannot even accidentally delete those files
  • 12:42 - 12:47
    even if you rm -f them, it will still tell you no, I can't delete them,
  • 12:47 - 12:48
    which is very secure
  • 12:49 - 12:52
may be inconvenient, but you can work around this
  • 12:52 - 12:57
    it replaces those files with symlinks of the same name, and those just point at the object
  • 12:57 - 13:00
and whether there is an object behind this symlink or not...
  • 13:00 - 13:06
basically tells you whether the content is available on this particular machine, or in this particular repository
  • 13:07 - 13:13
but you will definitely have the information about the name of the file, the theoretical location of the file...
  • 13:13 - 13:17
    the hash of the file will be in every single repository
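A small sketch of what that looks like on disk (file name and repository description are examples; the exact key format depends on the chosen backend):

```sh
git init photos && cd photos
git annex init "laptop"
cp ~/camera/IMG_0001.CR2 .
git annex add IMG_0001.CR2     # content moves under .git/annex/objects, a symlink stays behind
ls -l IMG_0001.CR2             # IMG_0001.CR2 -> .git/annex/objects/.../SHA256E-s...--<hash>.CR2
git commit -m "add raw photo"
```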
  • 13:17 - 13:19
    There is also a direct mode
  • 13:19 - 13:22
    initially mainly written for windows and Mac OS X
  • 13:22 - 13:25
    because Windows just doesn't support symlinks properly
  • 13:25 - 13:28
and OS X, while it supports symlinks,
  • 13:28 - 13:32
    apparently has lots of developers who think it is a great idea to follow symlinks...
  • 13:32 - 13:35
    and display the actual target of the symlink instead of the symlink
  • 13:35 - 13:39
    so you have cryptic filenames which are very hard to deal with
  • 13:39 - 13:46
    obviously people who are used to GUI tools which then only display really really cryptic names ?
  • 13:46 - 13:50
    so there is direct mode which doesn't do the symlink stuff
  • 13:50 - 13:53
    it basically rewrites the files on the fly
  • 13:53 - 13:58
git still thinks it is managing symlinks, but...
  • 13:58 - 14:04
git-annex just pulls them out from under git, and pushes in the actual content.
  • 14:05 - 14:09
    You keep on nodding, so... I'm probably doing good
  • 14:10 - 14:14
    and if you want you can always delete old data, or you can keep it...
  • 14:14 - 14:17
    or you can just... for example what I'm doing:
  • 14:17 - 14:20
    you can have one or two machines which slurp up all your data...
  • 14:20 - 14:26
    and have an everlasting archive of everything which you've ever put into your annexes...
  • 14:26 - 14:30
    and other machines, for example laptops with smaller SSDs
  • 14:31 - 14:34
    those just have the data which you are actually interested in at the moment
  • 14:36 - 14:38
    How does this work in the background?
  • 14:38 - 14:41
    Each repository has a UUID
  • 14:41 - 14:46
    It also has a name, which makes it easier for you to actually interact with the repository...
  • 14:46 - 14:49
    but in the background it's just the UUID for obvious reasons...
  • 14:49 - 14:55
    because it just makes ? and synchronization easy, period
  • 14:55 - 14:59
It also keeps tracking information in a special branch called git-annex
  • 14:59 - 15:03
    this branch means that all...
  • 15:06 - 15:11
this branch ? every single repository has full and complete information...
  • 15:11 - 15:16
    about all files, about the locations of all files, about the last status of those files...
  • 15:16 - 15:19
    if those files have been added to some repository
  • 15:19 - 15:19
    or they have been deleted,
  • 15:19 - 15:22
or if they have been there forever
  • 15:22 - 15:31
so in every single repository you can just look up the status of this file or of all files in all of your repositories
  • 15:31 - 15:33
    which is, yeah, convenient
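Looking that up from any clone is a one-liner (the file name and repository names are examples; the output shape is only indicative):

```sh
git annex whereis IMG_0001.CR2
# whereis IMG_0001.CR2 (2 copies)
#   xxxxxxxx-... -- laptop [here]
#   yyyyyyyy-... -- archive-disk
# ok
```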
  • 15:34 - 15:38
    The tracking information is very simple
  • 15:38 - 15:41
    and it's designed to be merged very...
  • 15:41 - 15:43
    it's a little bit more complicated than applying union merge,
  • 15:43 - 15:46
    but basically what it does is it adds a timestamp
  • 15:47 - 15:53
    and tells if the file is there or not and it has the UUID of the repository
  • 15:53 - 15:57
and from this information, along with the timestamps, you can simply reproduce...
  • 15:57 - 16:04
    the whole lifecycle of your files through your whole cloud of git-annex repositories
  • 16:04 - 16:06
    in this one particular annex.
  • 16:07 - 16:10
One really nice thing which you can do is...
  • 16:10 - 16:13
    if you are on the command line, which again in my opinion is the better mode...
  • 16:13 - 16:15
    you can simply run git-annex sync
  • 16:15 - 16:17
    which basically does a commit...
  • 16:17 - 16:20
    oh, it does a git-annex add, then it does a commit,
  • 16:20 - 16:24
    then it merges from the other repositories
  • 16:24 - 16:27
    into your own master, into your own git-annex branch
  • 16:27 - 16:29
    then it merges the log files
  • 16:29 - 16:31
    that's where the git-annex branch comes in
  • 16:31 - 16:34
    and then it pushes to all other known repositories
  • 16:34 - 16:42
which is basically a one-shot command to synchronize all the metadata about all the files with all the other repositories
  • 16:43 - 16:45
    and it takes no time at all
  • 16:45 - 16:47
    given a network connection
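In other words, roughly:

```sh
# one-shot metadata synchronisation, as described above
git annex sync    # add + commit, merge from the other repositories, push to all known remotes
```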
  • 16:48 - 16:52
    Data integrity is something which is very important for...
  • 16:52 - 16:57
    yeah, for all of the tools, but git-annex was really designed with data integrity in mind
  • 16:58 - 17:04
by default it uses SHA-256 plus the file extension...
  • 17:04 - 17:08
    to store the objects, so it renames the file to its own shasum
  • 17:08 - 17:13
    which allows you to always verify the data even without git-annex
  • 17:13 - 17:17
    you are able to say by means of globbing...
  • 17:17 - 17:22
    which files, or which directory, or which types of files should have how many copies in different repositories
  • 17:22 - 17:24
    so for example what I do:
  • 17:24 - 17:28
all my raw files, all the raw photographs are in at least three different locations,
  • 17:28 - 17:32
    all the JPEGs are only in two, because JPEGs can be regenerated
  • 17:32 - 17:33
    raws can not.
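One way to express such rules is a per-glob annex.numcopies attribute (the globs mirror the raw/JPEG split just described):

```sh
cat >> .gitattributes <<'EOF'
*.cr2 annex.numcopies=3
*.jpg annex.numcopies=2
EOF
git add .gitattributes
git commit -m "copy requirements per file type"
git annex fsck          # verifies checksums and warns when a file has too few copies
```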
  • 17:34 - 17:38
    All remotes and all special remotes can always be verified
  • 17:38 - 17:41
    with special remotes this may take quite some bandwidth
  • 17:41 - 17:46
    with actual normal git-annex remotes you run the verification locally
  • 17:46 - 17:52
and just report back the results, which obviously saves a lot of bandwidth and transfer time
  • 17:54 - 17:58
verification obviously takes the amount of required copies into account
  • 17:58 - 18:01
    so if you would have to have 3 different copies
  • 18:01 - 18:05
    and your whole repository cloud only has 2, it will complain
  • 18:05 - 18:09
    it will tell you "yes, checksum is great, but you don't have enough copies, please do something about it".
  • 18:11 - 18:15
    and even if you ? right now, delete all copies from git annex
  • 18:15 - 18:19
    you would still be able to get all your data out of git annex
  • 18:19 - 18:24
    because what it boils down to, in indirect mode, it's just symlinks to other objects
  • 18:24 - 18:28
    these objects have their own checksum as their file name
  • 18:28 - 18:31
    so you'll even be able to verify, without git-annex,
  • 18:31 - 18:33
    just by means of a little bit of shell scripting,
  • 18:33 - 18:35
    that all your files are correct,
  • 18:35 - 18:39
    that you don't have any bit flips or anything on your local disk.
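A rough version of that "little bit of shell scripting", assuming the default SHA256E backend and GNU coreutils (an illustration, not a robust tool):

```sh
#!/bin/sh
# verify one annexed file against the checksum embedded in its object name
f="IMG_0001.CR2"
obj=$(readlink -f "$f")    # .git/annex/objects/.../SHA256E-s<size>--<hash>.CR2
want=$(basename "$obj" | sed 's/^SHA256E-s[0-9]*--//; s/\..*$//')
have=$(sha256sum "$obj" | cut -d' ' -f1)
[ "$want" = "$have" ] && echo "OK: $f" || echo "CORRUPT: $f"
```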
  • 18:40 - 18:44
    direct mode doesn't really need a recovery ?, because...
  • 18:45 - 18:48
    the actual file is just in place of the symlink
  • 18:52 - 18:55
    but on the other hand you won't be...
  • 18:55 - 18:59
    you still need to look at the git-annex branch to determine the actual checksums
  • 18:59 - 19:02
    which you wouldn't have to do with the indirect mode.
  • 19:03 - 19:08
    There are a lot of special remotes. And what are special remotes?
  • 19:08 - 19:11
    these are able to store data in non git-annex remotes
  • 19:11 - 19:16
    because, let's face it, on most servers, or most servers where you could store data
  • 19:16 - 19:19
    you aren't actually able to get a shell and execute commands
  • 19:19 - 19:22
    you can just push data to it, you can receive data
  • 19:22 - 19:25
    but you cannot actually execute anything on this computer.
  • 19:27 - 19:29
    That's what special remotes are for.
  • 19:30 - 19:34
    All special remotes support encrypted data storage
  • 19:34 - 19:37
    so you just gpg encrypt your data and then send it off
  • 19:37 - 19:42
    which means that the remotes can only see the file names
  • 19:42 - 19:46
    but they cannot see anything else about the contents of your files
  • 19:46 - 19:52
    obviously you don't want to trust amazon or anyone to store your plain text data
  • 19:52 - 19:54
    that would just be stupid
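Setting up such a remote looks roughly like this (the remote name is invented; S3 credentials are read from the environment, and other remote types take different parameters):

```sh
git annex initremote cloud type=S3 encryption=shared   # content sent to this remote gets encrypted
git annex copy big-video.ogv --to cloud                # encrypt locally, then upload
```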
  • 19:54 - 19:59
    There is a hook system, which allows you to write a lot of new special remotes
  • 19:59 - 20:06
    you'll see a list of... quite an extensive list of stuff in a second
  • 20:06 - 20:11
    Normal, built-in, special remotes which are supported by haskell out of the box
  • 20:11 - 20:13
    by git-annex out of the box
  • 20:13 - 20:15
    and actually implemented in haskell
  • 20:16 - 20:22
are Amazon Glacier, Amazon S3, bup, directory — a normal directory on your system
  • 20:22 - 20:27
    rsync, webdav, http or ftp and the hook system
  • 20:28 - 20:32
there is a guy who wrote most of those
  • 20:32 - 20:37
    we can support archive.org, IMAP, box.com, Google Drive... you can read them yourself, I mean...
  • 20:37 - 20:41
    but those are quite a lot of different special remotes, if you...
  • 20:41 - 20:49
    already have storage on any of those services, just start pushing encrypted data to it if you want, and you're basically done.
  • 20:52 - 20:55
    There is an ongoing project called the git-annex assistant
  • 20:55 - 20:59
    last year, and I think this year it just ended, didn't it?
  • 21:00 - 21:05
so, pretty much exactly one year ago Joey started to raise funds
  • 21:05 - 21:12
    by means of a kickstarter to just focus on writing git-annex assistant for a few months
  • 21:13 - 21:15
    he got so much that he could do it for a whole year
  • 21:15 - 21:23
    and he's just restarted the whole thing with his own fundraising campaign without kickstarter and he got another full year
  • 21:24 - 21:33
    yeah... are you still accepting funds?
  • 21:34 - 21:38
    ok, so, if you use it at least consider donating
  • 21:38 - 21:44
because honestly you can't write patches for it anyway, because it's in Haskell, so...
  • 21:44 - 21:49
    that's... the other means of actually contributing
  • 21:53 - 21:57
the git-annex assistant boils down to being a daemon which runs in the background
  • 21:58 - 22:03
    and keeps track of all of your files, of newly added files
  • 22:03 - 22:09
    and then starts transferring those files, if configured to do so
  • 22:09 - 22:15
    it starts transferring files to other people or to other repositories
  • 22:15 - 22:18
    this is all managed by means of a web gui
  • 22:18 - 22:26
which in turn means that it's really, well, not easy, but easier to port to for example Windows or Android
  • 22:26 - 22:28
    which both work, to some extent
  • 22:29 - 22:33
    not fully, but they are useful, or useable, more or less
  • 22:34 - 22:40
    at least on android it's really quite good, I couldn't test it on windows, because...
  • 22:41 - 22:45
    and it also makes it accessible for non technical users
  • 22:45 - 22:50
    so for example if you want to share some of your photographs with your parents
  • 22:50 - 22:54
    or with friends, or if you want to share, I don't know, videos with other people
  • 22:54 - 22:57
    you just put them into one of those repositories
  • 22:57 - 23:02
    and even those non-technical people just magically see stuff appear in their own repository
  • 23:02 - 23:04
    and can just pull the data if they want to
  • 23:04 - 23:08
or if you configured it to do so, it would even transfer all the data automatically
  • 23:09 - 23:13
    which is... it's ?
  • 23:15 - 23:20
    It supports content notifications, but not content transfer
  • 23:20 - 23:22
    by means of xmpp or jabber
  • 23:22 - 23:27
    which used to work quite well with google talk, I think it's not...
  • 23:28 - 23:29
    oh, it still works, ok
  • 23:30 - 23:37
    at least at the moment, we'll see when they just ? google ? with google+, but...
  • 23:38 - 23:43
    at least at the moment it still works, if you have a google account you can simply transfer all your data
  • 23:43 - 23:49
    you can transfer the metadata about your data, you cannot actually transfer the files through jabber
  • 23:49 - 23:54
    but that's probably something which will happen within the next year
  • 23:55 - 23:58
    there are quite ? rulesets for content distribution
  • 23:58 - 24:04
    so for example I can show you...
  • 24:04 - 24:11
    you can say "put all raw files into this archive, and all jpegs on my laptop", or whatever
  • 24:11 - 24:16
    or "if I still have more than 500 GB free on this please put data in
  • 24:16 - 24:21
and as soon as I only have 20 left stop putting data into this one repository"
  • 24:21 - 24:24
    which obviously is quite convenient
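In current git-annex versions the command-line counterpart of such rules are preferred-content expressions, roughly like this (repository names and globs are invented):

```sh
git annex wanted archive-disk "include=*.cr2"
git annex wanted laptop "include=*.jpg"
git annex sync --content      # transfers then follow these rules
```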
  • 24:24 - 24:28
    as I said there is a windows port, and now on to usecases.
  • 24:28 - 24:30
    First usecase: the archivist.
  • 24:31 - 24:34
    What the archivist does is: basically he just collects data
  • 24:34 - 24:38
    either to ? or just to collect
  • 24:38 - 24:43
    and if you have this usecase what you probably want to do, you want to have offline disks
  • 24:43 - 24:47
    to store at your mom's, or to put into a drawer
  • 24:47 - 24:53
    or just you don't have enough sata ports in your computer because you just have so much data
  • 24:53 - 25:00
    so, what you can do is you can just push this data to either connected machines or to disconnected drives...
  • 25:00 - 25:02
    or to some webservice, and just store data
  • 25:02 - 25:06
    but normally you would have the problem of keeping track of where your data lives
  • 25:06 - 25:09
    if it's still ok, if it's still there, everything.
  • 25:09 - 25:16
    With git-annex you can automate all this administrative side of archiving your stuff.
  • 25:17 - 25:22
    Even if you only have one of those disks, if they're a proper remote...
  • 25:22 - 25:27
you'll have full information about all the data in your annex cloud up to this point
  • 25:27 - 25:33
so even if you only pull out one random disk you still have information on all the other disks on this one disk
  • 25:33 - 25:36
    which obviously is a nice thing.
  • 25:37 - 25:38
    Media consumption.
  • 25:38 - 25:45
    Let's say you pull a video of this talk, or you get some slides...
  • 25:45 - 25:48
    maybe also from this talk, you can get some podcasts...
  • 25:48 - 25:53
and git-annex has become a native podcatcher quite recently, I think two or three weeks ago
  • 25:53 - 25:56
which means you don't even need a separate podcatcher
  • 25:57 - 26:02
you just tell git-annex "this is all of my rss feeds" and it will just pull in all the content.
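That podcatcher mode is the importfeed command (the feed URL is a placeholder):

```sh
git annex importfeed http://example.org/podcast/feed.rss
git annex sync      # tell the other repositories about the new episodes
```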
  • 26:03 - 26:08
    Then you can synchronize all this data for example to your cellphone, or your tablet, or whatever
  • 26:08 - 26:14
    consume the data on any of your devices, even if you have 10 copies of this particular podcast
  • 26:14 - 26:17
    because you didn't get around to listen to it on your computer...
  • 26:17 - 26:20
    and you didn't get around to listen to it on your cellphone
  • 26:20 - 26:22
    but then on your tablet you did listen to it
  • 26:22 - 26:25
    you have three copies of this file which you don't need anymore...
  • 26:25 - 26:28
    because you have listened to the content and you don't care about the content anymore
  • 26:28 - 26:34
    what you do is you drop this content on one random repository
  • 26:34 - 26:38
    and this information that you have dropped the actual content,
  • 26:38 - 26:42
    not the metadata about the content, but the actual content, you don't need the content anymore...
  • 26:42 - 26:47
    will slowly propagate to all of the annexes and if they have the data they will also drop the data
  • 26:47 - 26:53
    so you don't have to really care about keeping track of those things
  • 26:53 - 26:56
    you can simply have this message propagate
  • 26:57 - 27:01
    do you want to comment? can someone give Joey a microphone?
  • 27:07 - 27:10
    Just a minor correction
  • 27:10 - 27:12
    it doesn't propagate that you've dropped the content
  • 27:12 - 27:15
    but you can move it around in ways that have exactly the effect you described
  • 27:16 - 27:22
    ? get the wrong idea that if you accidentally remove one thing it will vanish from everything ?
  • 27:23 - 27:26
    but if you deliberately drop the content and tell the annex...
  • 27:26 - 27:28
    no. that's not how it works.
  • 27:28 - 27:30
    I want to talk about it later, but it's...
  • 27:30 - 27:32
    you looked at the slides, but...
  • 27:32 - 27:33
    sorry, ?
  • 27:35 - 27:37
    He watches for everything which is ?
  • 27:47 - 27:55
    Next thing, if you are on the road, and one usecase which is probably quite common: taking pictures while you are on the road ?
  • 27:55 - 27:58
    You take your pictures, you save them to your annex
  • 27:58 - 28:01
    where you are able to store them back to your server or wherever
  • 28:01 - 28:07
    if you want to, and even if for example one disk gets ?
  • 28:07 - 28:09
    and you lose part of your content,
  • 28:09 - 28:14
    you'll still at least be able to have an overview of what content used to be in your annex
  • 28:14 - 28:21
    and if you then pull out your old SD card and see "oh, that photo is still there" you can simply reimport it and it will magically reappear.
  • 28:21 - 28:22
    What it also does is:
  • 28:22 - 28:24
    if you have a very tiny computer with you
  • 28:24 - 28:29
    you can, as soon as you are at an internet cafe, just sync up with your server or your storage, whatever
  • 28:29 - 28:34
    and push out the data to your remotes
  • 28:34 - 28:39
    which then means you'll have two or three or five copies of the data
  • 28:39 - 28:41
    and git-annex keeps track of what is where for you
  • 28:41 - 28:45
    so you don't have to worry about copying stuff around.
  • 28:48 - 28:51
    And then there is one personal usecase, for photographs
  • 28:52 - 28:56
    I have a very specific way of organizing my photographs
  • 28:56 - 28:58
    my wife disagrees violently
  • 29:00 - 29:03
    she likes to do her photo storage in a completely different way
  • 29:03 - 29:05
    she doesn't care about the raw files
  • 29:05 - 29:12
    and she doesn't care about all the documentation pictures of signposts or whatever which I just took to remember which cities we went through
  • 29:12 - 29:19
    so what she can do is she can simply delete the actual files or ? the symlink of this file
  • 29:19 - 29:22
    and it will disappear from her own annex
  • 29:22 - 29:24
    she can then commit all this
  • 29:24 - 29:30
    normally if she would sync back the data I would also have the same layout, which I don't want
  • 29:30 - 29:34
especially since she tends to rename everything a lot
  • 29:34 - 29:39
    but what I did, I set up a rebasing branch on top of my normal git-annex repository
  • 29:39 - 29:43
    so what she gets is: she has her own view of the whole data
  • 29:43 - 29:45
    or the part she cares about
  • 29:45 - 29:47
    and when I add new content
  • 29:47 - 29:51
    she will see the new content, she will rearrange the content however she pleases
  • 29:51 - 29:53
    but as it's a rebasing branch
  • 29:53 - 29:56
    all her changes will always be replayed on top of master
  • 29:59 - 30:02
    so she has her own view, and I don't even notice her own view
  • 30:02 - 30:08
    but even if she uses one of the other computers she would have the same view which she herself has
  • 30:08 - 30:12
so basically she has her own view of all of the data
  • 30:12 - 30:15
    This is very convenient to keep the peace at home.
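A minimal sketch of such a rebasing view branch (branch and path names are invented, and merge conflicts are ignored here):

```sh
git checkout -b curated master
git rm 2013/raw/IMG_0001.CR2      # drop the symlink from this view; the content itself is untouched
git commit -m "hide raw files from this view"

# later, once master has gained new photos:
git checkout curated
git rebase master                 # replay the curation commits on top of the new content
```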
  • 30:17 - 30:19
    Next topic: vcsh.
  • 30:20 - 30:23
    Most of you here probably have some sort of system...
  • 30:23 - 30:27
where you have one Subversion or CVS or whatever repository
  • 30:27 - 30:30
    and they have it somewhere in their home directory
  • 30:30 - 30:36
    you symlink into various places in your home directory, and it kind of keeps working so you don't throw it away, but...
  • 30:36 - 30:39
    to be honest it sucks. Here is why.
  • 30:41 - 30:43
    Or, here's why in a second.
  • 30:44 - 30:47
vcsh is implemented in POSIX shell, which is very very portable
  • 30:47 - 30:52
    it's based on git, but it's not directly git
  • 30:52 - 30:57
The one thing which git is not able to do is maintain several different working copies in one directory
  • 30:57 - 31:00
    which is a safety feature, more on that later
  • 31:00 - 31:06
    but this really sucks if you want to maintain your mplayer, your shell, your whatever configuration
  • 31:06 - 31:11
    in your home directory, which is the obvious and only real place where it makes sense to put your configuration
  • 31:11 - 31:14
you don't want to put it into some .dotfiles directory and then symlink back
  • 31:14 - 31:18
    you want to have it in your home directory as actual files.
  • 31:18 - 31:21
    So, vcsh uses fake bare git repositories
  • 31:21 - 31:23
    again, more on that on the next slide
  • 31:23 - 31:25
    and it's basically a wrapper around git
  • 31:25 - 31:31
    which makes git do stuff which it normally wouldn't do
  • 31:31 - 31:36
    and it has a quite extensible and useful hook system which ? will care about
  • 31:37 - 31:42
With a normal git repository you have two really defining variables within git
  • 31:42 - 31:44
    you have the work tree
  • 31:44 - 31:46
    which is where your actual files live
  • 31:47 - 31:51
    and you have the $GIT_DIR, where the actual data lives
  • 31:51 - 31:56
normally in a normal checkout you just have your directory and .git under this
  • 31:57 - 32:02
    If you have a bare repository you obviously don't have an actual checkout of your data
  • 32:02 - 32:06
    you have just all the objects and the configuration stuff
  • 32:06 - 32:09
    so that's what a bare repository boils down to being
  • 32:10 - 32:13
    A fake bare git repository on the other hand has both
  • 32:13 - 32:15
    it has a $GIT_WORK_TREE and it has a $GIT_DIR
  • 32:15 - 32:17
    but those are detached from each other
  • 32:17 - 32:20
    they don't have to be closely tied together
  • 32:20 - 32:26
    and also sets core.bare = false, to actually tell git that "yes, this is a real setup, but..."
  • 32:26 - 32:31
    "yes, you still have a work tree, even thought you don't really expect it"
  • 32:31 - 32:33
    "to have one, you still have a work tree".
  • 32:35 - 32:38
    By default vcsh puts your work tree into home
  • 32:38 - 32:40
    and your git dir into...
  • 32:40 - 32:45
    it's based on .config/vcsh/repo.d and then the name of the repository
  • 32:45 - 32:50
which just puts it away and out of the way of you actually seeing stuff
  • 32:50 - 32:55
    but it follows the cross desktop specifications so if you move stuff around it will also follow
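Conceptually, each vcsh repository boils down to an environment like this before git is called (paths follow the defaults just described; the repository name is an example, and doing this by hand carries the risks discussed next):

```sh
export GIT_DIR="$HOME/.config/vcsh/repo.d/zsh.git"
export GIT_WORK_TREE="$HOME"
git config core.bare false
git status                      # git now treats $HOME as the work tree of this repository
unset GIT_DIR GIT_WORK_TREE     # leaving these set is exactly the danger described below
```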
  • 32:55 - 32:57
    Fake bare repositories are really...
  • 32:58 - 33:02
are messy to set up, and it's very easy to get them wrong
  • 33:02 - 33:07
    that is also the reason why git normally disallows this kind of stuff
  • 33:08 - 33:10
    because all of a sudden you have a lot of...
  • 33:10 - 33:13
    context-dependency on when you do what
  • 33:13 - 33:15
just imagine you set git workdir...
  • 33:15 - 33:16
$GIT_WORK_TREE, sorry
  • 33:16 - 33:20
    and run random commands like git add, that's...
  • 33:20 - 33:26
kind of ok, if you git reset --hard you'll probably not be too happy
  • 33:26 - 33:29
if you check out the current version, that's also quite bad
  • 33:29 - 33:32
and if you git clean -f, yeah, you just throw away your home directory
  • 33:32 - 33:34
    congratulations
  • 33:34 - 33:39
    So, it's really risky to run with these variables set
  • 33:39 - 33:44
    which is why I wrote vcsh to wrap around git
  • 33:44 - 33:50
    to hide all this complexity and do quite some sanity checks to make sure everything's set up correctly
  • 33:50 - 33:57
    again it allows you to have several repositories and it also manages really the complete lifecycle of all your repositories
  • 33:57 - 34:03
it's very easy to just create a new repository, you just init, just as with git
  • 34:03 - 34:08
    you add stuff, you commit it, and you define a remote and start pushing to this remote
  • 34:09 - 34:10
    simple
  • 34:11 - 34:14
This looks like git because it's very closely tied to git
  • 34:14 - 34:19
    and it uses a lot of the power or of the syntax of git, for obvious reasons
  • 34:19 - 34:22
because... it's closely tied to git
  • 34:22 - 34:25
    you can simply clone as you would with git
  • 34:25 - 34:28
    you can simply show your files as you would with git
  • 34:28 - 34:32
    you can rename the repository, which git can't do, but you don't have to
  • 34:32 - 34:34
    you can show the status of all your files
  • 34:34 - 34:36
    or just of one of your repositories
  • 34:36 - 34:38
    or of all repositories
  • 34:38 - 34:44
    you can pull in all your repositories at once, you can push all of your repositories at once
  • 34:44 - 34:46
    with one single command
  • 34:47 - 34:52
    so, if you are on the road, or you just want to sync up a new machine it's really quick, it's really easy
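A typical lifecycle might look like this (repository name and remote URL are placeholders):

```sh
vcsh init zsh
vcsh zsh add ~/.zshrc
vcsh zsh commit -m "initial zsh config"
vcsh zsh remote add origin git@example.org:dotfiles-zsh.git
vcsh zsh push -u origin master

# on another machine
vcsh clone git@example.org:dotfiles-zsh.git zsh
```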
  • 34:53 - 34:57
    There are three modes of dealing with your repositories
  • 34:57 - 34:59
    default mode is the quickest to type
  • 34:59 - 35:04
    you just say vcsh zsh commit whatever or any random git command
  • 35:05 - 35:06
    but you cannot really run gitk
  • 35:06 - 35:10
    you can do this by using the run mode, which is the second mode
  • 35:10 - 35:14
    we simply ? here run is missing and here git is missing
  • 35:14 - 35:19
    so you say simply vcsh run zsh git commit whatever
  • 35:19 - 35:26
and this is exactly the same command, it's literally the same command once it arrives at the shell level, so to speak
  • 35:26 - 35:29
    here you can also run gitk, because...
  • 35:29 - 35:34
    with this, you set up the whole environment for one single command to run with this context
  • 35:34 - 35:37
    of the changed environment variables
  • 35:37 - 35:42
    or you could even enter the repository, then you set all the variables
  • 35:42 - 35:46
    and then you can just use normal git commands as you would normally
  • 35:46 - 35:48
    this is the most powerful mode,
  • 35:48 - 35:52
    but it's also the most likely to hurt you if you don't know what you're doing
  • 35:52 - 35:55
    so I don't recommend working ? down this way.
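The three styles side by side (the repository name is an example):

```sh
vcsh zsh log --oneline      # default mode: "run ... git" is implied
vcsh run zsh gitk --all     # run mode: any command, executed with the repository's environment
vcsh enter zsh              # enter mode: a subshell with GIT_DIR/GIT_WORK_TREE set
git status                  #   plain git now operates on this repository
exit                        #   leave again before doing anything destructive
```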
  • 35:57 - 36:04
    You should have your shell display prompt information about being in a vcsh repository or not
  • 36:04 - 36:08
simply because otherwise you may forget that you entered something
  • 36:08 - 36:14
    and then if you run those commands, there will be pain
  • 36:18 - 36:22
Advanced use cases, which will be possible quite soon:
  • 36:22 - 36:29
    we can just combine vcsh with git-annex to manage everything which is not configuration files in your own home directory
  • 36:29 - 36:35
    ? basically two programs to sync everything about all of your home directory
  • 36:35 - 36:37
    without having to do any extra work
  • 36:38 - 36:41
you can also use it to do really weird stuff
  • 36:41 - 36:46
for example you can back up the .git of a different repository with the help of vcsh
  • 36:46 - 36:52
    so you can just go in, change objects or anything, break stuff and just replay whatever you're doing
  • 36:52 - 36:56
    just to try and see how it breaks in interesting ways.
  • 36:56 - 37:02
You can just back up a working copy which is maintained by a different repository or a different system
  • 37:02 - 37:07
    you can even put a whole repository, including the .git,
  • 37:07 - 37:08
into a different git repository
  • 37:08 - 37:13
    or you can even put other VCSs like subversion or something into git, if you want to.
  • 37:14 - 37:16
    Then there is mr.
  • 37:16 - 37:18
    mr ties all those...
  • 37:18 - 37:23
    hopefully by now you have about twenty new repositories
  • 37:23 - 37:26
    because you have configuration, you have ikiwiki, you have everything
  • 37:26 - 37:29
so now you need something to synchronize all those repositories
  • 37:29 - 37:32
    because doing it by hand is just a lot of work
  • 37:35 - 37:41
    mr supports push, pull, commit operations for all the major known version control systems
  • 37:41 - 37:45
    allowing you to have one single interface to operate on all your systems
  • 37:45 - 37:49
    It's quite trivial to write support for new systems
  • 37:49 - 37:52
    I think it took me about two hours to support vcsh natively
  • 37:52 - 37:54
    so, that's really quick
  • 37:54 - 37:57
    If you want to try, the stuff which I told you about...
  • 37:57 - 38:05
    in the links later there will be the possibility to just clone a subrepository for vcsh
  • 38:05 - 38:10
which will then set up a suggested mr directory layout
  • 38:10 - 38:12
    and you can just work from there
  • 38:12 - 38:16
    This is the... alright, it's my suggested layout
  • 38:16 - 38:18
    which basically...
  • 38:18 - 38:22
    you just include everything in config.d you maintain...
  • 38:22 - 38:30
    your available.d, by means of vcsh, so you simply sync around all your content between all the different computers
  • 38:30 - 38:35
    and then you simply soft link from available to the actual config
  • 38:35 - 38:39
which is basically what Apache does with sites-enabled and sites-available
  • 38:39 - 38:43
    or modules.available and modules.enabled
  • 38:43 - 38:45
    which is really really powerful
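On disk that suggested layout looks roughly like this (paths follow the vcsh/mr template; the repository name is an example):

```sh
ls ~/.config/mr/available.d      # one mr snippet per repository, itself synced around via vcsh
ls ~/.config/mr/config.d         # symlinks to the snippets enabled on this machine
ln -s ../available.d/zsh.vcsh ~/.config/mr/config.d/zsh.vcsh

mr update     # clone or pull every enabled repository
mr push       # push them all with one command
mr status     # see which ones still have uncommitted changes
```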
  • 38:45 - 38:48
    Last thing is not git based, but zsh.
  • 38:49 - 38:52
    It's a really powerful shell, you should consider using it
  • 38:52 - 38:56
it has very good tab completion for all the tools listed here, more than bash
  • 38:56 - 39:00
    it has a right prompt, which will automatically disappear if it needs to
  • 39:00 - 39:05
    which is very convenient to display not important but still useful information
  • 39:05 - 39:11
    and it will automatically, if you tell it to, tell you about you being in a git repository or subversion repository or whatever
  • 39:11 - 39:12
by means of vcs_info
  • 39:13 - 39:18
    which also means you'll be told that at the moment you are in a vcsh repository
  • 39:18 - 39:21
    and you may kill your stuff if you do things wrong
  • 39:21 - 39:23
    it can mimic all the major shells
  • 39:23 - 39:26
    and there's just too many reasons to list
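A minimal right-prompt setup along those lines (an assumed snippet, not from the talk):

```zsh
autoload -Uz vcs_info
precmd() { vcs_info }               # refresh VCS information before every prompt
zstyle ':vcs_info:*' enable git svn
setopt prompt_subst
RPROMPT='${vcs_info_msg_0_}'        # right-hand prompt, hidden when the line needs the space
```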
  • 39:27 - 39:29
    So... final pitch
  • 39:29 - 39:34
    This is true: I've tried it earlier, I can demo it, I still have five minutes left
  • 39:34 - 39:39
it takes me less than five minutes to synchronize my complete, whole, digital life while on the road
  • 39:39 - 39:44
so if I'm at the airport and just want to update all my stuff, and push out all my stuff...
  • 39:44 - 39:47
    it'll take a few minutes, but then I can hop on the airplane...
  • 39:47 - 39:51
    and I'll know everything is fine, everything is up-to-date on my local machine
  • 39:51 - 39:57
    on my laptop machine, I can continue working, and have a backup on my remote systems
  • 39:57 - 39:59
    These are the websites
  • 40:00 - 40:08
    The slides will be linked from penta, so you are more than welcome to look at these links later
  • 40:08 - 40:12
    There are previous talks, which you can also look at, if you want to
  • 40:12 - 40:14
    and that's pretty much it
  • 40:14 - 40:17
    and if you have any more questions afterwards either catch me...
  • 40:17 - 40:21
    or there is an IRC channel, and there is a mailing list
  • 40:21 - 40:27
    ok, we can take a few questions, we have still a few minutes
  • 40:27 - 40:31
    then if there are more questions ask Ritchie afterwards
  • 40:32 - 40:36
    And while we are doing this just look here, because that's a complete sync of everything I have
  • 40:37 - 40:40
    Just to make sure I understood this correctly,
  • 40:40 - 40:49
    with git-annex the point is that the data is stored dispersed over different local destinations, so to speak
  • 40:49 - 40:53
    but the metadata ? exists, ? complete git history
  • 40:53 - 41:00
    so git is able to tell me, "well, this version at that destination was changed at that time and so on and so on"
  • 41:00 - 41:03
    did I get this right or...
  • 41:03 - 41:05
    git will be able to tell you about changes...
  • 41:05 - 41:08
    ok, I don't have internet, sorry
  • 41:08 - 41:12
    git will be able to tell you about changes in the filenames, or directory structure
  • 41:12 - 41:16
    git-annex will be able to tell you about changes in the actual file content
  • 41:16 - 41:17
    or in moving around the files
  • 41:17 - 41:22
    but as one single unit, more or less, then yes...
  • 41:22 - 41:25
    the answer is yes, but not quite, but yes
  • 41:25 - 41:32
    yes, but ? all the things you asked about are in git, you know the previous location, all that stuff
  • 41:32 - 41:39
    but in a separate branch which you should use git-annex to access, but you can do it by hand if you want to
  • 41:52 - 41:55
    I'm not familiar with tracking branches,
  • 41:55 - 41:58
    yet you mention that the workflow for your wife has a different view of the data than you
  • 41:58 - 42:07
    with that workflow is it possible for your wife to upload photos that you will have in your view as well, or is it a oneway street?
  • 42:07 - 42:13
    minor correction: tracking branches track a different repository,
  • 42:13 - 42:18
    what I meant was rebasing branches, which rebase on top of a different branch
  • 42:18 - 42:24
    which basically just keeps the patches always on top of the branch, no matter where the head moves to
  • 42:27 - 42:32
    if she wanted to do that she would need to simply git checkout master
  • 42:32 - 42:39
    do whatever she wanted to do, and then git checkout her own branch, and then she's...
  • 42:39 - 42:44
    she is able to, but she would need to change into the master branch and then back
  • 42:49 - 42:50
    microphone
  • 42:51 - 42:57
    she never pushes her private branch? it always lives on her own machine?
  • 42:57 - 43:02
    no, she does push it, but I don't display this view of the data
  • 43:03 - 43:08
because otherwise she wouldn't be able to synchronize this view between different computers
  • 43:08 - 43:12
    I seem to have internet now, so let's just let this run in the background
  • 43:14 - 43:15
    any more questions?
  • 43:23 - 43:25
    no more questions?
  • 43:27 - 43:27
then we...
  • 43:27 - 43:29
    ? more minutes for questions?
  • 43:36 - 43:41
    ok, so thanks to Richard Hartmann, we will continue...