
Analyze Reports

  • 0:01 - 0:04
    This Web Archiving Service video tutorial
  • 0:04 - 0:06
    will show you how to use the various
  • 0:06 - 0:11
    report options available to you.
  • 0:11 - 0:12
    The reports you can run are listed
  • 0:12 - 0:14
    under the "Reports" tab
  • 0:14 - 0:35
    when viewing the overview of an individual capture.
  • 0:35 - 0:37
    Each report has a brief description
  • 0:37 - 0:40
    explaining its function.
  • 0:40 - 0:41
    The most important report is
  • 0:41 - 0:45
    the "Crawl Log."
  • 0:45 - 0:48
    This report gives you detailed information
  • 0:48 - 0:50
    about your capture and can help you determine
  • 0:50 - 0:52
    any errors that were encountered
  • 0:52 - 0:54
    during the crawl.
  • 0:54 - 0:55
    You can search for a particular item
  • 0:55 - 0:58
    to see whether or not it was captured.
  • 0:58 - 1:00
    Use Ctrl-f on a PC or Command-f on a Mac
  • 1:00 - 1:04
    to search for the filename.
  • 1:04 - 1:06
    Then find the corresponding Heritrix
  • 1:06 - 1:08
    or HTTP status code
  • 1:08 - 1:11
    that is in the second column.
  • 1:11 - 1:13
    The main status codes that you will encounter
  • 1:13 - 1:18
    are "1," which means a successful DNS lookup was performed;
  • 1:18 - 1:23
    "200," which indicates that the item was successfully captured;
  • 1:23 - 1:25
    "403," which tells you that access to the item was forbidden
  • 1:25 - 1:28
    (for example, because it requires authorization), and therefore it
  • 1:28 - 1:29
    was not captured;
  • 1:29 - 1:32
    "404," which means that the item could not be found,
  • 1:32 - 1:34
    and therefore was not captured;
  • 1:34 - 1:39
    "-9998," which means that there was a robots.txt exclusion
  • 1:39 - 1:42
    for this item, and it was not captured.
  • 1:42 - 1:43
    These tools are great for browsing
  • 1:43 - 1:46
    and troubleshooting on your own,
  • 1:46 - 1:48
    but know that we're happy to work with you
  • 1:48 - 1:50
    to research any errors or problems
  • 1:50 - 1:52
    that you come across.
  • 1:52 - 1:54
    In addition to the "Reports" tab,
  • 1:54 - 1:57
    we also offer a useful page of quality assurance tools.
  • 1:57 - 2:00
    Choosing "QA Tools" from the dropdown menu
  • 2:00 - 2:02
    beneath the "Captures" tab will take you to the list
  • 2:02 - 2:05
    of quality assurance tools.
  • 2:05 - 2:07
    These tools will help pinpoint areas
  • 2:07 - 2:10
    that are causing problems within your captures.
  • 2:10 - 2:13
    Each tool has a brief explanation.
  • 2:13 - 2:16
    For example, checking the list of redirected seed URLs
  • 2:16 - 2:19
    will clue you in as to which sites
  • 2:19 - 2:20
    may need updated URLs in order to continue
  • 2:20 - 2:24
    capturing correctly in the future.
  • 2:24 - 2:26
    In order to update a URL,
  • 2:26 - 2:28
    simply click on the "Edit Site" link,
  • 2:28 - 2:31
    and then change the seed URL information
  • 2:31 - 2:33
    on the "Edit Site" screen.
  • 2:33 - 2:36
    This has been a tutorial on analyzing reports.
  • 2:36 - 2:38
    Check out our additional tutorials
  • 2:38 - 2:40
    on analyzing capture results
  • 2:40 - 2:41
    and comparing captures
  • 2:41 - 2:44
    to better understand your capture results.
  • 2:44 - 2:46
    As always, if you have questions,
  • 2:46 - 2:50
    feel free to contact us at washelp@ucop.edu.
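If you have a copy of a crawl log as a plain-text file, the lookup described in the tutorial (search for a filename, then read the status code in the second column) can also be scripted. Below is a minimal Python sketch, assuming the standard Heritrix crawl.log layout, in which each line is whitespace-separated, the second field is the status code and the fourth field is the fetched URI. The function names (`parse_line`, `find_item`), the `STATUS_MEANINGS` table, and the sample log lines are hypothetical illustrations, not part of the Web Archiving Service, which may present its "Crawl Log" report differently.

```python
"""Minimal sketch: look up items in a Heritrix-style crawl log and explain their codes.

Assumption: each log line is whitespace-separated, the second field is the
status code, and the fourth field is the URI (standard Heritrix crawl.log layout).
"""

# Hypothetical lookup table covering only the codes mentioned in the tutorial.
# 1 and -9998 are Heritrix-specific codes; 200, 403 and 404 are HTTP status codes.
STATUS_MEANINGS = {
    1: "successful DNS lookup",
    200: "captured successfully",
    403: "forbidden (e.g. requires authorization); not captured",
    404: "not found; not captured",
    -9998: "excluded by robots.txt; not captured",
}


def parse_line(line):
    """Return (status_code, uri) for one log line, or None if it is malformed."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        status = int(fields[1])  # second column: Heritrix or HTTP status code
    except ValueError:
        return None
    return status, fields[3]  # fourth column: the URI that was fetched


def find_item(log_lines, filename):
    """Scripted Ctrl-F / Command-F: list every logged URI containing `filename`.

    Returns (uri, status, meaning) tuples in log order.
    """
    results = []
    for line in log_lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip blank or malformed lines
        status, uri = parsed
        if filename in uri:
            meaning = STATUS_MEANINGS.get(status, "other code; see the Heritrix documentation")
            results.append((uri, status, meaning))
    return results


if __name__ == "__main__":
    # Invented sample lines for illustration only; not real crawl data.
    sample_log = [
        "2024-01-01T00:00:01.000Z 1 52 dns:www.example.edu P http://www.example.edu/ text/dns #001",
        "2024-01-01T00:00:02.000Z 200 5120 http://www.example.edu/report.pdf L http://www.example.edu/ application/pdf #002",
        "2024-01-01T00:00:03.000Z -9998 - http://www.example.edu/private/report.pdf L http://www.example.edu/ unknown #003",
    ]
    for uri, status, meaning in find_item(sample_log, "report.pdf"):
        print(f"{status:>6}  {uri}  ({meaning})")
```

Matching on a substring of the URI mirrors a browser text search; it keeps the sketch simple but can also match unrelated URIs that happen to contain the same text.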
Title:
Analyze Reports
Description:

This Web Archiving Service video tutorial will help you understand how to analyze your reports.

Video Language:
English