[Script Info] Title: [Events] Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Hi, thank you. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm Nicolas Dandrimont and I will indeed\Nbe talking to you about Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Software Heritage. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm a software engineer for this project. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I've been working on it for 3 years now. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we'll see what this thing is all about. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Mic not working] Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I guess the batteries are out. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, let's try that again. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, we all know, we've been doing\Nfree software for a while, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that software source code is something\Nspecial. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Why is that? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,As Harold Abelson has said in SICP, his\Ntextbook on programming, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,programs are meant to be read by people\Nand then incidentally for machines to execute. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, what software source code\Nprovides us is a way inside Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the mind of the designer of the program. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For instance, you can have,\Nyou can get inside very crazy algorithms Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that can do very fast reverse square roots\Nfor 3D, that kind of stuff Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Like in the Quake 2 source code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,You can also get inside the algorithms\Nthat are underpinning the internet, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for instance seeing the net queue\Nalgorithm in the Linux kernel. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we are building as the free software\Ncommunity is the free software commons. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, the commons is all the cultural\Nand social and natural resources Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we share and that everyone\Nhas access to. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,More specifically, the software commons\Nis what we are building Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with software that is open and that is\Navailable for all to use, to modify, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,to execute, to distribute. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We know that those commons are a really\Ncritical part of our commons. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Who's taking care of it? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The software is fragile. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Like all digital information, you can lose\Nsoftware. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,People can decide to shut down hosting\Nspaces because of business decisions. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,People can hack into software hosting\Nplatforms and remove the code maliciously Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,or just inadvertently. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And, of course, for the obsolete stuff,\Nthere's rot. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you don't care about the data, then\Nit rots and it decays and you lose it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, where is the archive we go to\Nwhen something is lost, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,when GitLab goes away, when GitHub\Ngoes away. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Where do we go? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Finally, there's one last thing that we\Nnoticed, it's that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,there's a lot of teams that work on\Nresearch on software Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and there's no real big infrastructure\Nfor research on code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's tons of critical issues around\Ncode: safety, security, verification, proofs. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Nobody's doing this at a very large scale. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you want to see the stars, you go\Nto the Atacama Desert and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you point a telescope at the sky. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Where is the telescope for source code? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's what Software Heritage wants to be. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we do is we collect, we preserve\Nand we share all the software Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that is publicly available. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Why do we do that? We do that to\Npreserve the past, to enhance the present Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and to prepare for the future. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we're building is a base infrastructure\Nthat can be used Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for cultural heritage, for industry,\Nfor research and for education purposes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How do we do it? We do it with an open\Napproach. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Every single line of code that we write\Nis free software. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We do it transparently, everything that\Nwe do, we do it in the open, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,be that on a mailing list or on\Nour issue tracker. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we strive to do it for the very long\Nhaul, so we do it with replication in mind Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so that no single entity has full control\Nover the data that we collect. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we do it in a non-profit fashion\Nso that we avoid Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,business-driven decisions impacting\Nthe project. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, what do we do concretely? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We do archiving of version control systems. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What does that mean? 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It means we archive file contents, so\Nsource code, files. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We archive revisions, which means all the\Nmetadata of the history of the projects, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we try to download it and we put it inside\Na common data model that is Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,shared across all the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We archive releases of the software,\Nreleases that have been tagged Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in a version control system as well as\Nreleases that we can find as tarballs Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,because sometimes… boof, views of\Nthis source code differ. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Of course, we archive where and when\Nwe've seen the data that we've collected. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All of this, we put inside a canonical,\NVCS-agnostic, data model. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you have a Debian package, with its\Nhistory, if you have a git repository, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if you have a subversion repository, if\Nyou have a mercurial repository, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,it all looks the same and you can work\Non it with the same tools. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we don't do is archive what's around\Nthe software, for instance Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the bug tracking systems or the homepages\Nor the wikis or the mailing lists. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There are some projects that work\Nin this space, for instance Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the internet archive does a lot of\Nreally good work around archiving the web. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our goal is not to replace them, but to\Nwork with them and be able to do Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,linking across all the archives that exist. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We can, for instance for the mailing lists\Nthere's the gmane project Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that does a lot of archiving of free\Nsoftware mailing lists. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So our long term vision is to play a part\Nin a semantic wikipedia of software, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,a wikidata of software where we can\Nhyperlink all the archives that exist Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and do stuff in the area. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Quick tour of our infrastructure. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, all the way to the right is\Nour archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our archive consists of a huge graph\Nof all the metadata about Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the files, the directories, the revisions,\Nthe commits and the releases and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,all the projects that are on top\Nof the graph. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We separate the file storage into another\Nobject storage because of Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the size discrepancy: we have lots and lots\Nof file contents that we need to store Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we do that outside the database\Nthat is used to store the graph. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, what we archive is a set of\Nsoftware origins that are Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,git repositories, mercurial repositories,\Netc. etc. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All those origins are loaded on a\Nregular schedule. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If there is a very active software origin,\Nwe're gonna archive it more often Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,than stale things that don't get\Na lot of updates. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we do to get the list of software\Norigins that we archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We have a bunch of listers that can\Nscroll through the list of repositories, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for instance on GitHub or other\Nhosting platforms. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We have code that can read Debian archive\Nmetadata to make a list of the packages Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that are inside this archive and can be\Narchived, etc. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All of this is done on a regular basis. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We are currently working on some kind\Nof push mechanism so that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,people or other systems can notify us\Nof updates. 
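[Editor's illustration] The talk says that very active origins are archived more often than stale ones, but it does not describe the scheduling policy itself. The sketch below is only one plausible way to get that behaviour: the function name `next_visit_interval`, the halve/double rule and both bounds are assumptions made for illustration, not the Software Heritage scheduler.

```python
from datetime import timedelta

# Hypothetical bounds, chosen only for this illustration.
MIN_INTERVAL = timedelta(hours=1)
MAX_INTERVAL = timedelta(days=64)


def next_visit_interval(current: timedelta, changed: bool) -> timedelta:
    """Return the delay before the next visit of an origin.

    If the last visit found new content, visit twice as often;
    otherwise back off and visit half as often. The result is
    clamped to [MIN_INTERVAL, MAX_INTERVAL].
    """
    proposed = current / 2 if changed else current * 2
    return max(MIN_INTERVAL, min(MAX_INTERVAL, proposed))


if __name__ == "__main__":
    interval = timedelta(days=1)
    # Two visits that found updates, then one that found nothing.
    for changed in (True, True, False):
        interval = next_visit_interval(interval, changed)
        print(interval)
```

With a rule of this shape, a busy repository converges towards the minimum interval while an abandoned one drifts towards the maximum, so the load shifts to the origins that actually change.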
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our goal is not to do real time archiving,\Nwe're really in it for the long run Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,but we still want to be able to prioritize\Nstuff that people tell us is Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,important to archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The Internet Archive has a "save now"\Nbutton and we want to implement Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,something along those lines as well, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so if we know that some software project\Nis in danger for a reason or another, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,then we can prioritize archiving it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So this is the basic structure of a revision\Nin the Software Heritage archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,You'll see that it's very similar to\Na git commit. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The format of the metadata is pretty much\Nwhat you'll find in a git commit Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with some extensions that you don't\Nsee here because this is from a git commit. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So basically what we do is we take the\Nidentifier of the directory Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that the revision points to, we take the\Nidentifier of the parent of the revision Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we can keep track of the history Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and then we add some metadata,\Nauthorship and committership information Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and the revision message and then we take\Na hash of this, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,it makes an identifier that's probably\Nunique, very very probably unique. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Using those identifiers, we can retrace\Nall the origins, all the history of Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,development of the project and we can\Ndeduplicate across all the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All the identifiers are intrinsic, which\Nmeans that we compute them Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,from the contents of the things that\Nwe are archiving, which means that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we can deduplicate very efficiently\Nacross all the data that we archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How much data do we archive? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,A bit. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, we have passed the billion revision\Nmark a few weeks ago. 
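[Editor's illustration] The intrinsic revision identifier described above (directory identifier, parent identifiers, authorship and committership, message, then a hash of the whole) can be sketched with git's own object hashing, since the speaker says a revision is very similar to a git commit. This is a sketch under that assumption: the exact serialization used by Software Heritage, including its extensions, is not given in the talk, and the helper names `git_object_id` and `revision_payload` are made up for the example.

```python
import hashlib
from typing import Sequence


def git_object_id(object_type: str, payload: bytes) -> str:
    """Hash a payload the way git does: SHA-1 over '<type> <size>\\0<payload>'."""
    header = f"{object_type} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def revision_payload(directory_id: str, parent_ids: Sequence[str],
                     author: str, committer: str, message: str) -> bytes:
    """Serialize revision metadata in a git-commit-like layout."""
    lines = [f"tree {directory_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


if __name__ == "__main__":
    # The directory identifier here is git's well-known empty tree;
    # the person and the timestamps are placeholders.
    who = "Jane Doe <jane@example.org> 1500000000 +0000"
    payload = revision_payload(
        "4b825dc642cb6eb9a060e54bf8d69288fbee4904", [], who, who,
        "Initial revision\n",
    )
    print(git_object_id("commit", payload))
```

Because the identifier depends only on the content, two loaders that meet the same revision in two different origins compute the same identifier, which is what makes deduplication across the whole archive cheap.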
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,This graph is a bit old, but anyway,\Nyou have a live graph on our website. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's more than 4.5 billion unique\Nsource code files. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We don't actually discriminate between\Nwhat we would consider is source code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and what upstream developers consider\Nas source code, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so everything that's in a git repository,\Nwe consider as source code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if it's below a size threshold. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,A billion revisions across 80 million\Nprojects. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What do we archive? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We archive GitHub, we archive Debian. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, Debian we run the archival process\Nevery day, every day we get the new packages Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that have been uploaded in the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,GitHub, we try to keep up, we are currently\Nworking on some performance improvements, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,some scalability improvements to make sure\Nthat we can keep up Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with the development on GitHub. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We have archived as a one-off thing\Nthe former content of Gitorious and Google Code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which are two prominent code hosting\Nspaces that closed recently Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and we've been working on archiving\Nthe contents of Bitbucket Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is kind of a challenge because\Nthe API is a bit buggy and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Atlassian isn't too interested\Nin fixing it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,In concrete storage terms, we have 175TB\Nof blobs, so the files take 175TB Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and kind of big database, 6TB. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The database only contains the graph of\Nthe metadata for the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is basically an 8 billion nodes and\N70 billion edges graph. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And of course it's growing daily. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We are pretty sure this is the richest\Nsource code archive that's available now Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and it keeps growing. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So how do we actually… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What kind of stack do we use to store\Nall this? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We use Debian, of course. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All our deployment recipes are in Puppet\Nin public repositories. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've started using Ceph\Nfor the blob storage. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We use PostgreSQL for the metadata storage\Nwith some of the standard tools that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,live around PostgreSQL for backups\Nand replication. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We use standard Python stack for\Nscheduling of jobs Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and for web interface stuff, basically\Npsycopg2 for the low level stuff, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Django for the web stuff Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and Celery for the scheduling of jobs. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,In house, we've written an ad hoc\Nobject storage system which has Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,a bunch of backends that you can use. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, we are agnostic between a UNIX\Nfilesystem, Azure, Ceph, or tons of… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It's a really simple object storage system\Nwhere you can just put an object, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,get an object, put a bunch of objects,\Nget a bunch of objects. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've implemented removal but we don't\Nreally use it yet. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,All the data model implementation,\Nall the listers, the loaders, the schedulers Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,everything has been written by us,\Nit's a pile of Python code. 
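[Editor's illustration] The object storage is described above as a very small interface: put an object, get an object, put a bunch of objects, get a bunch of objects, with interchangeable backends. A minimal in-memory sketch of such an interface might look like the following. The class and method names are hypothetical, and keying objects by the SHA-1 of their content is an assumption made to match the talk's intrinsic identifiers; this is not the project's actual code.

```python
import hashlib
from typing import Dict, Iterable, List


class InMemoryObjectStorage:
    """Toy content-addressed object storage backed by a dict.

    A real backend (filesystem, Ceph, cloud blob store) would expose
    the same four operations and differ only in where bytes are kept.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        """Store one object and return its content-derived identifier."""
        object_id = hashlib.sha1(content).hexdigest()
        self._objects[object_id] = content  # same content, same key
        return object_id

    def get(self, object_id: str) -> bytes:
        """Return the bytes stored under object_id (KeyError if absent)."""
        return self._objects[object_id]

    def put_batch(self, contents: Iterable[bytes]) -> List[str]:
        """Store several objects, returning their identifiers in order."""
        return [self.put(content) for content in contents]

    def get_batch(self, object_ids: Iterable[str]) -> List[bytes]:
        """Fetch several objects, in the order of the given identifiers."""
        return [self.get(object_id) for object_id in object_ids]

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    storage = InMemoryObjectStorage()
    ids = storage.put_batch([b"int main() {}\n", b"int main() {}\n", b"README\n"])
    print(len(storage), "distinct objects for", len(ids), "puts")
```

Keeping the interface this narrow is what lets the same loaders run unchanged against different backends.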
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, basically 20 Python packages and\Naround 30 Puppet modules Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,to deploy all that and we've done everything\Nas a copyleft license, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,GPLv3 for the backend and AGPLv3\Nfor the frontend. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Even if people try and make their own\NSoftware Heritage using our code, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,they have to publish their changes. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Hardware-wise, we run for now everything\Non a few hypervisors in house and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,our main storage is currently still\Non a very high density, very slow, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,very bulky storage array, but we've\Nstarted to migrate all this thing Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,into a Ceph storage cluster which\Nwe're gonna grow as we need Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in the next few months. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've also been granted by Microsoft\Nsponsorship, ??? sponsorship Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for their cloud services. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've started putting mirrors of everything\Nin their infrastructure as well Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which means full object storage mirror,\Nso 170TB of stuff mirrored on Azure Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,as well as a database mirror for graph. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And we're also doing all the content\Nindexing and all the things that need Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,scalability on Azure now. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Finally, at the university of Bologna,\Nwe have a backend storage for the download Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so currently our main storage is\Nquite slow so if you want to download Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,a bundle of things that we've archived,\Nthen we actually keep a cache of Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,what we've done so that it doesn't take\Na million years to download stuff. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We do our development in a classic free\Nand open source software way, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we talk on our mailing list, on IRC,\Non a forge. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Everything is in English, everything is\Npublic, there is more information Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,on our website if you want to actually\Nhave a look and see what we do. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, all that is very interesting but how\Ndo we actually look into it? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,One of the ways that you can browse,\Nthat you can use the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,is using a REST API. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, this API allows you to do\Npointwise browsing of the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so you can go and follow the links\Nin a graph, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is very slow but gives you a pretty\Nmuch full access of the data. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's an index for the API that you can\Nlook at, but that's not really convenient, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we also have a web user interface. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It's in preview right now, we're gonna do\Na full launch in the month of June. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you go to \Nhttps://archive.softwareheritage.org/browse/ Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with the given credentials, you can\Nhave a look and see what's going on. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Basically, we have a web interface that\Nallows you to look at Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,what origins we have downloaded, when\Nwe have downloaded the origins Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with a kind of graph view of how often\Nwe visited the origins Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and a calendar view of when we have\Nvisited the origins. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And then, inside the visits, you can\Nactually browse the contents Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we've archived. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, for instance, this is the Python\Nrepository as of May 2017 Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and you can have the list of files,\Nthen drill down, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,it should be pretty intuitive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you look at the history of a project,\Nyou can see the differences Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,between two revisions of a project. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Oh no, that's the syntax highlighting,\Nbut anyway the diffs arrive right after. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, yeah, pretty cool stuff. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I should be able to do a demo as well,\Nit should work. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm gonna zoom in. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So this is the main archive, you can see\Nsome statistics about the objects Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we've downloaded. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,When you zoom in, you get some kind of\Noverflows, because… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Yeah, why would you do that. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you want to browse, we can try to find\Nan origin. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,"glibc". Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So there's lots and lots of, like, random\NGitHub forks of things… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We don't discriminate and we don't really\Nfilter what we download. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We are looking into doing some relevance\Nkind of sorting of the results, here. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Next. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Xilinx, why not. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, this has been downloaded for the last\Ntime on August 3rd 2016, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so it's probably a dead repository, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,but yeah, you can see a bunch of source\Ncode, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you can read the README of the glibc. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If we go back to a more interesting origin Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,here's the repository for git. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I've selected voluntarily an old visit\Nof the repo so that we can see Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,what was going on then. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If I look at the calendar view, you can see\Nthat we've had some issues actually Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,updating this, but anyway. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If I look at the last visit, then we can\Nactually browse the contents, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you can get syntax highlighting as well. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,This is a big big file with lots of comments Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Let's see the actual source code… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Anyway, so, that's the browsing interface. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We can also now get back what we've\Narchived and download it, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is kind of something that you might\Nwant to do Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if a repository is lost, you can actually\Ndownload it Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and get the source code back again. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How we do that. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you go on the top right of this browsing\Ninterface, you have actions and download Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and you can download a directory that\Nyou are currently looking at. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It's an asynchronous process, which means\Nthat if there is a lot of load, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,then it's gonna take some time to get\Nactually, to be able to download the content. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So you can put in your email address so we\Ncan notify you when the download is ready. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm gonna try my luck and say just "ok"\Nand it's gonna appear at some point Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in the list of things that I've requested. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I've already requested some things that\Nwe can actually get and open as a tarball. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Yeah, I think that's the thing that I was\Nactually looking at, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is this revision of the git\Nsource code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and then I can open it Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Yay, emacs, that's what you want. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Yay, source code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,This seems to work. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And then, of course, if you want to\Nactually script what you're doing, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,there's an API that allows you to do\Nthe downloads as well, so you can. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The source code is deduplicated a lot,\Nwhich means that for one single repository Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you get tons of files that we have to\Ncollect if you want to actually download Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,an archive of a directory. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,It takes a while but we have an asynchronous\NAPI so you can POST Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,the identifier of a revision to this URL\Nand then get status updates Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and at some point, it will tell you that\Nthe… here Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The status will tell you that the object\Nis available. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,You can download it and you can even\Ndownload the full history of a project Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and get that as a git-fast-export archive\Nthat you can reimport into Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,a new git repository. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So any kind of VCS that we've imported,\Nyou can export as a git repository Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and reimport on your machine. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How to get involved in the project? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We have a lot of features that we're\Ninterested in, lots of them are now Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in early access or have been done. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's some stuff that we would like\Nhelp with. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,This is some stuff that we're working on: Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,provenance information, you have a content Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you want to know which repository\Nit comes from, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that's something we're on.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Full text search, the end goal is to be\Nable even to trace Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,source of snippets of code that have\Nbeen copied from one project to another. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's something that we can look into\Nwith the wealth of information that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we have inside the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's a lot of things that, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I mean… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's a lot of things that people want\Nto do with the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Our goal is to enable people to do things,\Nto do interesting things Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,with a lot of source code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you have an idea of what you want to do\Nwith such an archive, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,please you can come talk to us Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and we'll be happy to help you help us. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we want to do is to diversify\Nthe sources of things that we archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Currently, we have good support for git,\Nwe have OK support for subversion Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and mercurial. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If your project of choice is in another\Nversion control system, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we are gonna miss it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So people can contribute in this area.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For the listing part, we have coverage of\NDebian, we have coverage of Github, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if your code is somewhere else, we won't\Nsee it, so we need people to contribute Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,stuff that can list for instance Gitlab\Ninstances, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and then we can integrate that in our\Ninfrastructure and actually have Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,people be able to archive their Gitlab\Ninstances. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And of course, we need to spread\Nthe word, make the project sustainable. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We have a few sponsors now, Microsoft,\NNokia, Huawei, Github has joined as a sponsor Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The University of Bologna, of course Inria\Nis sponsoring. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,But we need to keep spreading the word\Nand keep the project sustainable. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And, of course, we need to save endangered\Nsource code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For that, we have a suggestion box on\Nthe wiki that you can add things to. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For instance, we have in the back of\Nour minds archiving SourceForge, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,because we know that this isn't very\Nsustainable and that's at risk of being Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,taken down at some point. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,If you want to join us, we also have\Nsome job openings that are available.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For now it's in Paris, so if you want to\Nconsider coming to work with us in Paris, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you can look into that. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's Software Heritage. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We are building a reference archive of\Nall the free software Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that's ever been written Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in an international, open, non-profit and\Nmutualised infrastructure Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we have opened up to everyone,\Nall users, vendors, developers can use it. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The idea is to be at the service of\Nthe community and for society Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,as a whole. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So if you want to join us, you can look at\Nour website, you can look at our code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,You can also talk to me, so if you have\Nany questions, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I think we have 10, 12 minutes for questions. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Applause] Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Do you have questions? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Q] How do you protect the archive\Nagainst stuff that you don't want to Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,have in the archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I think of stuff that is copyright-\Nprotected and that Github will also Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,delete after a while.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Worse, if I would misuse the archive\Nas my private backup Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and store encrypted blocks on Github\Nand you will eventually backup them Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for me. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[A] There's, I think, two sides of the\Nquestion. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The first side is Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Do we really archive only stuff that is\Nfree software and Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we can redistribute and how do we\Nmanage, for instance, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,copyright takedown stuff. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Currently, most of the infrastructure\Nof the project is under French law. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's a defined process to do\Ncopyright takedown in the French legal system. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We would be really annoyed to have to\Ntake down content from the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,What we do, however, is to mirror public\Ninformation that is publicly available. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Of course I'm not a lawyer for the project,\Nso I can't really… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I'm not 100% sure of what I'm about to say\Nbut Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,what I know is that in the current French\Nlegislation status, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,if the source of the data is still available Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so for instance if the data is still on\NGithub, then you need to have Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Github take it down before we have to\Ntake it down. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We're not currently filtering content for\Nmisuse of the archive, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so the only thing that we do is put\Na limit on the size of the files Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that are archived in Software Heritage. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The limit is pretty high, like 100MB. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We can't really decide ourselves Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,what is source code,\Nwhat is not source code Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,because for instance if your project is\Na cryptography library, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,you might want to have some encrypted\Nblocks of data that are stored Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in your source code repository as\Ntest fixtures. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,And then, you need them to build the code\Nand to make sure that it works. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, how would that be any different than\Nyour encrypted backup on Github?
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,How could we, Software Heritage,\Ndistinguish between proper use and misuse Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,of the resources. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I guess our long term goal is to not have\Nto care about misuse because Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,it's gonna be a drop in the ocean. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We're gonna have so much… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We want to have enough space and\Nenough resources Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that we don't really need to ask ourselves\Nthis question, basically. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Thanks. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Other questions? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Q] Have you looked at some form of\Nauthentication to provide additional Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,insurance that the archived source code\Nhasn't been modified or tampered with Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in some form? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[A] First of all, all the identifiers for\Nthe objects that are inside the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,are cryptographic hashes of the contents\Nthat we've archived. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, for files, for instance, we take\Nthe SHA1, the SHA256, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,one of the BLAKE hashes and the git\Nmodified SHA1 of the file, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and we use that in the manifest for\Nthe directories. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So the directories, the directory identifiers\Nare a hash of the manifest Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,of the list of files that are inside\Nthe directory, etc. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, recursively, you can make sure that\Nthe data that we give back to you Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,has not been, at least altered, by bitflip\Nor anything. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We regularly run a scrub of the data\Nthat we have in the archive, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so we make sure that there's no rot\Ninside our archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've not looked into, basically,\Nattestation of… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,for instance, making sure that the code\Nthat we've downloaded… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,I mean, we're not doing anything more\Nthan taking a picture of the data Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and we say "We've computed this hash.\NMaybe the code that's been presented Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,by Github to Software Heritage is different\Nthan what you've uploaded to Github, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we can't tell." Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,In the case of git, you can always use\Nthe identifiers of the objects Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that you've pushed so you have\Nthe commit hash, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,which is itself a cryptographic identifier\Nof the contents of the commit. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,In turn, if the commit is signed, then\Nthe signature is still stored Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in the Software Heritage metadata and\Nyou can reproduce the original git object Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and check the signature, but we've not\Ndone anything specific for Software Heritage Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,in this area. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Does that answer your question? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Cool. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Other questions? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's one in front. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Q] It's partially question, partially\Ncomment. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Your initial idea was to have a telescope,\Nor something like this for source code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For now, for me, it looks a little bit\Nmore like microscope, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so you can focus on one thing, but that's\Nnot much. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So have you sorted things about how to\Nanalyze entire ecosystem Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,or something like this. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,For example, now we have Django 2 which is\NPython 3 only so it would be interesting to Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,look at all Django modules to see when\Nthey start moving to this Django.
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So we would need to start analyzing\Nthousands or millions of files, but then Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,we would need some SQL like, or some\Nmap reduce jobs Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,or something like this for this. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[A] Yes Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So, we've started… Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,The two initiators of the project, Roberto\NDi Cosmo and Stefano Zacchiroli Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,are both researchers in computer science\Nso they have a strong background in Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,actually mining software repositories and\Ndoing some large scale analysis Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,on source code. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've been talking with research groups\Nwhose main goal is to do analysis on Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,large scale source code archives. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,One of the first mirrors outside of our\Ncontrol of the archive Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,will be in Grenoble (France). Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,There's a few teams that work on\Nactually doing large scale research Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,on source code over there, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,so that's what the mirror will be\Nused for. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,We've also been looking at what\Nthe Google open source team does. 
Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,They have this big repository with all\Nthe code that Google uses Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,and they've started to push back,\Nlike do large scale analysis of Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,security vulnerabilities, issues with\Nstatic and dynamic analysis Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,of the code and they've started pushing\Ntheir fixes upstream. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,That's something that we want to enable\Nusers to do, Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that's not something that we want to do\Nourselves, but we want to make sure Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,that people can do it using our archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,So we'd be happy to work with people\Nwho already do that so that Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,they can use their knowledge and their\Ntools inside our archive. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Does that answer your question? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Cool. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Any more questions? Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,No? Then thank you very much Nicolas. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,Thank you. Dialogue: 0,9:59:59.99,9:59:59.99,Default,,0000,0000,0000,,[Applause]