[Script Info]
Title:

[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Hi, thank you.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm Nicolas Dandrimont and I will indeed\Nbe talking to you about
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Software Heritage.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm a software engineer for this project.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I've been working on it for 3 years now.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we'll see what this thing is all about.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Mic not working]
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I guess the batteries are out.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, let's try that again.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, we all know, we've been doing\Nfree software for a while,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that software source code is something\Nspecial.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Why is that?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,As Harold Abelson has said in SICP, his\Ntextbook on programming,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,programs are meant to be read by people\Nand then incidentally for machines to execute.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, what software source code\Nprovides us is a way inside
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the mind of the designer of the program.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For instance, you can have,\Nyou can get inside very crazy algorithms
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that can do very fast reverse square roots\Nfor 3D, that kind of stuff,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,like in the Quake 2 source code.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,You can also get inside the algorithms\Nthat are underpinning the internet,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for instance seeing the net queue\Nalgorithm in the Linux kernel.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we are building as the free software\Ncommunity is the free software commons.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, the commons is all the cultural\Nand social and natural resources
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we share and that everyone\Nhas access to.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,More specifically, the software commons\Nis what we are building
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with software that is open and that is\Navailable for all to use, to modify,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,to execute, to distribute.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We know that those commons are a really\Ncritical part of our commons.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Who's taking care of it?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The software is fragile.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Like all digital information, you can lose\Nsoftware.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,People can decide to shut down hosting\Nspaces because of business decisions.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,People can hack into software hosting\Nplatforms and remove the code maliciously
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,or just inadvertently.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And, of course, for the obsolete stuff,\Nthere's rot.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you don't care about the data, then\Nit rots and it decays and you lose it.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, where is the archive we go to\Nwhen something is lost,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,when GitLab goes away, when GitHub\Ngoes away?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Where do we go?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Finally, there's one last thing that we\Nnoticed, it's that
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,there's a lot of teams that work on\Nresearch on software
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and there's no real big infrastructure\Nfor research on code.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's tons of critical issues around\Ncode: safety, security, verification, proofs.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Nobody's doing this at a very large scale.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you want to see the stars, you go to\Nthe Atacama Desert and
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you point a telescope at the sky.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Where is the telescope for source code?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's what Software Heritage wants to be.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we do is we collect, we preserve\Nand we share all the software
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that is publicly available.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Why do we do that? We do that to\Npreserve the past, to enhance the present
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and to prepare for the future.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we're building is a base infrastructure\Nthat can be used
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for cultural heritage, for industry,\Nfor research and for education purposes.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How do we do it? We do it with an open\Napproach.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Every single line of code that we write\Nis free software.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We do it transparently, everything that\Nwe do, we do it in the open,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,be that on a mailing list or on\Nour issue tracker.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we strive to do it for the very long\Nhaul, so we do it with replication in mind
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so that no single entity has full control\Nover the data that we collect.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we do it in a non-profit fashion\Nso that we avoid
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,business-driven decisions impacting\Nthe project.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, what do we do concretely?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We do archiving of version control systems.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What does that mean?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It means we archive file contents, so\Nsource code, files.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We archive revisions, which means all the\Nmetadata of the history of the projects,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we try to download it and we put it inside\Na common data model that is
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,shared across all the archive.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We archive releases of the software,\Nreleases that have been tagged
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in a version control system as well as\Nreleases that we can find as tarballs
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,because sometimes… boof, views of\Nthis source code differ.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Of course, we archive where and when\Nwe've seen the data that we've collected.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All of this, we put inside a canonical,\NVCS-agnostic, data model.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you have a Debian package, with its\Nhistory, if you have a Git repository,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if you have a Subversion repository, if\Nyou have a Mercurial repository,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,it all looks the same and you can work\Non it with the same tools.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we don't do is archive what's around\Nthe software, for instance
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the bug tracking systems or the homepages\Nor the wikis or the mailing lists.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There are some projects that work\Nin this space, for instance
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the Internet Archive does a lot of\Nreally good work around archiving the web.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our goal is not to replace them, but to\Nwork with them and be able to do
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,linking across all the archives that exist.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We can, for instance for the mailing lists,\Nthere's the Gmane project
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that does a lot of archiving of free\Nsoftware mailing lists.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So our long-term vision is to play a part\Nin a semantic Wikipedia of software,
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,a Wikidata of software where we can\Nhyperlink all the archives that exist
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and do stuff in the area.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Quick tour of our infrastructure.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, all the way to the right is\Nour archive.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our archive consists of a huge graph\Nof all the metadata about
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the files, the directories, the revisions,\Nthe commits and the releases and
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,all the projects that are on top\Nof the graph.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We separate the file storage into another\Nobject storage because of
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the size discrepancy: we have lots and lots\Nof file contents that we need to store
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we do that outside the database\Nthat is used to store the graph.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, what we archive is a set of\Nsoftware origins that are
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Git repositories, Mercurial repositories,\Netc. etc.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All those origins are loaded on a\Nregular schedule.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If there is a very active software origin,\Nwe're gonna archive it more often
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,than stale things that don't get\Na lot of updates