<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Guide to Preservation of Web Resources</title>
	<atom:link href="http://jiscpowrguide.jiscpress.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://jiscpowrguide.jiscpress.org</link>
	<description>Practical advice for web and records managers</description>
	<lastBuildDate>Mon, 07 Feb 2011 21:47:09 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Tools for capturing</title>
		<link>http://jiscpowrguide.jiscpress.org/tools-for-capturing/</link>
		<comments>http://jiscpowrguide.jiscpress.org/tools-for-capturing/#comments</comments>
		<pubDate>Fri, 18 Jun 2010 11:11:24 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=93</guid>
		<description><![CDATA[Tools for capturing web resources Web harvesting engines are essentially web search engine crawlers with special processing to extract specific fields of content from web pages. The shortcomings of crawlers are described in Chapter 6. Heritrix Heritrix is a free, open-source, extensible, archiving quality web crawler. It was developed, and is used, by the Internet [...]]]></description>
			<content:encoded><![CDATA[
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="2" width="616" valign="top"><p><strong>Tools for capturing web resources</strong></p>
<p>Web harvesting engines are essentially web search engine crawlers with special processing to extract specific fields of content from web pages. The shortcomings of crawlers are described in Chapter 6.</p></td>
</tr>
<tr>
<td width="142" valign="top">
<h3>Heritrix</h3>
</td>
<td width="474" valign="top"><p>Heritrix is a free, open-source, extensible, archival-quality web crawler. It was developed, and is used, by the Internet Archive and is freely available for download and use in web preservation projects under the terms of the GNU GPL. It is implemented in Java, and can therefore run on any system that supports Java (Windows, Apple, Linux/Unix).</p>
<p><strong><em>More information:</em></strong> <a href="http://crawler.archive.org/">http://crawler.archive.org</a></p>
<p><strong><em>Download:</em></strong> <a href="http://sourceforge.net/projects/archive-crawler">http://sourceforge.net/projects/archive-crawler</a></p></td>
</tr>
<tr>
<td width="142" valign="top">
<h3>HTTrack</h3>
</td>
<td width="474" valign="top"><p>HTTrack is a free offline browser utility, available to use and modify under the terms of the GNU GPL. Distributions are available for Windows, Apple, and Linux/Unix. It enables the download of a website from the Internet to a local directory, capturing HTML, images, and other files from the server, and recursively building all directories locally. It can arrange the original site&#8217;s relative link structure so that the entire site can be viewed locally as if online. It can also update an existing mirrored site, and resume interrupted downloads.</p>
<p>Like many crawlers, HTTrack may have problems capturing some parts of websites, particularly those that use Flash, Java, JavaScript, and complex CGI.</p>
<p><strong><em>More information:</em></strong> <a href="http://www.httrack.com/">http://www.httrack.com/</a></p>
<p><strong><em>Download:</em></strong> <a href="http://www.httrack.com/page/2/en/index.html">http://www.httrack.com/page/2/en/index.html</a></p></td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="153" valign="top">
<h3>Wget</h3>
</td>
<td width="463" valign="top"><p>GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP. It is a non-interactive command line tool, so it can easily be used with other scripts, or run automatically at scheduled intervals. It is freely available under the GNU GPL and versions are available for Windows, Apple and Linux/Unix.</p>
<p>GNU Wget&#8217;s features include:</p>
<ul>
<li>Converting absolute links in   downloaded documents to relative links, so that downloaded documents may link   to each other locally.</li>
<li>Using filename wild cards,   and recursively mirroring directories.</li>
<li>Resuming aborted downloads.</li>
<li>Multilingual message files.</li>
<li>Support for cookies, proxies   and persistent HTTP connections.</li>
<li>Using local file timestamps   to determine whether documents need to be re-downloaded when mirroring.</li>
</ul>
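<p>As an illustration of the features listed above, the following Python sketch assembles a Wget command line for mirroring part of a site. The function name, start URL and target directory are hypothetical; the options shown are standard Wget long options, but the right combination will depend on the site being captured.</p>

```python
import shlex


def build_wget_mirror_command(start_url, target_dir, wait_seconds=1):
    """Return an argument list for a browsable local mirror made with Wget."""
    return [
        "wget",
        "--mirror",            # recursive retrieval with time-stamp checking
        "--convert-links",     # rewrite links so the local copy is browsable
        "--page-requisites",   # also fetch images and stylesheets used by pages
        "--no-parent",         # do not climb above the starting directory
        "--wait", str(wait_seconds),        # pause between requests
        "--directory-prefix", target_dir,   # where the copy is stored
        start_url,
    ]


if __name__ == "__main__":
    # Hypothetical project site and output directory.
    command = build_wget_mirror_command(
        "http://www.example.ac.uk/project/", "project-mirror"
    )
    print(shlex.join(command))
```

<p>The resulting list can be handed to a scheduler or to Python's subprocess module. Building the command separately makes the chosen options easy to review and to record alongside the capture.</p>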
<p><strong><em>More information:</em></strong> <a href="http://www.gnu.org/software/wget/">http://www.gnu.org/software/wget/</a></p>
<p><strong><em>Download:</em></strong> <a href="http://www.gnu.org/software/software.html">http://www.gnu.org/software/software.html</a></p></td>
</tr>
<tr>
<td width="153" valign="top">
<h3>DeepArc</h3>
</td>
<td width="463" valign="top"><p>DeepArc was developed by the Bibliothèque Nationale de France to archive objects from database-driven deep websites (particularly documentary gateways). It uses a database to store object metadata, while storing the objects themselves in a file system. Users are offered a form-based search interface where they may input keywords to query the database. DeepArc has to be installed by the web publisher, who maps the structure of the application database to the DeepArc target data model. DeepArc will then retrieve the metadata and objects from the target site.</p>
<p><strong><em>More information:</em></strong> <a href="http://bibnum.bnf.fr/downloads/deeparc/">http://bibnum.bnf.fr/downloads/deeparc/</a></p>
<p><strong><em>Download:</em></strong> <a href="http://sourceforge.net/projects/deeparc/">http://sourceforge.net/projects/deeparc/</a></p></td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="2" width="616" valign="top">
<h1>Workflow systems or   curatorial tools</h1>
<p>These are used for controlling a web harvest, conducting quality assurance checking, initiating and scheduling archiving processes, managing the metadata (including access restrictions), and producing management reports. They may also interface with access tools, for repositories that publish their harvested copies.</p></td>
</tr>
<tr>
<td width="153" valign="top">
<h3>Web   Curator Tool</h3>
</td>
<td width="463" valign="top"><p>Web Curator Tool (WCT) is for managing the selective web harvesting process. It is designed for use in libraries and other collecting organisations, and supports collection by non-technical users while still allowing complete control of the web harvesting process. The WCT is now available under the terms of the Apache Public License.</p>
<p>WCT is a tool that interfaces with the Heritrix crawler, allowing a certain amount of configuration of the target&#8217;s profile, the addition of extra seed URLs, and enabling filters to be applied to gather more (or less) material from the target. It also generates several log files which are more accessible than HTTrack&#8217;s, and can help determine why gathers are going wrong and how to fix them.</p>
<p>It was developed by the National Library of New Zealand and the British Library, and initiated by the International Internet Preservation Consortium. Since December 2007, Web Curator Tool has been used by the UK Web Archive. (See ARIADNE issue 50, January 2007, www.ariadne.ac.uk/issue50/beresford/).</p>
<p><strong><em>More information and download:</em></strong> <a href="http://webcurator.sourceforge.net/">http://webcurator.sourceforge.net/</a></p></td>
</tr>
<tr>
<td width="153" valign="top">
<h3>PANDORA   Digital Archiving System (PANDAS)</h3>
</td>
<td width="463" valign="top"><p>The PANDORA Digital Archiving System, known as PANDAS, was developed by the National Library of Australia as an integrated, web-based, web archiving management system. The need for such a system arose from the scale of the Library&#8217;s archiving activity and the necessity to enable other PANDORA participants to contribute to the Archive from various geographic locations.<br />
Like WCT, PANDAS is a workflow system, with the crawling being done by HTTrack. It was created to enable very selective harvesting and is not intended for large-scale automated harvests. Its main functions include managing workflow, creating publisher and title entities, access permissions, gather schedules, and metadata. One caveat is that the tool has a very strong bias towards library models (it treats websites and web pages as titles that have authors and subjects).</p>
<p>Also, the software is built from web objects and lacks robustness; its interface with HTTrack is far from clear, particularly when it comes to applying filters to the gather.</p>
<p><strong><em>More information:</em></strong> <a href="http://pandora.nla.gov.au/pandas.html">http://pandora.nla.gov.au/pandas.html</a></p></td>
</tr>
<tr>
<td width="153" valign="top">
<h3>NetarchiveSuite</h3>
</td>
<td width="463" valign="top"><p>NetarchiveSuite is a curator tool which allows librarians to define and control harvests of web material. The system scales from small selective harvests to harvests of entire national domains. The system is fully distributable on any number of machines and includes a secure storage module handling multiple copies of the harvested material as well as a quality assurance tool automating the quality assurance process.</p>
<p>It was developed by the Royal Library and the State and University Library in the virtual organisation netarchive.dk.</p>
<p><strong><em>More information and download:</em></strong> <a href="http://netarchive.dk/suite">http://netarchive.dk/suite</a></p></td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="2" width="616" valign="top">
<h2>Snapshot tools</h2>
</td>
</tr>
<tr>
<td width="137" valign="top">
<h3>Adobe   Acrobat web capture tool</h3>
</td>
<td width="479" valign="top"><p>Adobe Acrobat Web Capture generates tagged accessible PDF files from web pages. Acrobat adds the Adobe PDF toolbar and the Convert Current Web Page To An Adobe PDF File button to Internet Explorer 5.01 and later, which allow the user to convert the currently displayed web page to a tagged Adobe PDF file.</p>
<p>This tool allows web pages, or entire sites, to be captured to a PDF file. Tools like this have their place, but (like all web capture and preservation technologies) they also have their drawbacks. A PDF’s print-oriented format isn’t a good match to some sites, much as some sites don’t look good when printed. Acrobat Web Capture effectively uses the browser’s print engine combined with a PDF writer pseudo-printer to do its work, so the captured PDF will closely resemble the printed page.</p>
<p><strong><em>More information:</em></strong></p>
<p><a href="http://www.wap.org/journal/acrobat4capture.html">http://www.wap.org/journal/acrobat4capture.html</a></p>
<p><a href="http://www.planetpdf.com/enterprise/article.asp?ContentID=6057">http://www.planetpdf.com/enterprise/article.asp?ContentID=6057</a></p></td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="137" valign="top">
<h3>OpenOffice   web wizard</h3>
</td>
<td width="479" valign="top">OpenOffice has many advanced features, including the ability to use some of its conversion features in batch mode, so it could be used to mass-convert web pages into PDF.</td>
</tr>
<tr>
<td width="137" valign="top">
<h3>A.nnotate</h3>
</td>
<td width="479" valign="top">A.nnotate allows the user to capture web pages. Given a URL, or via a bookmarklet, it takes a snapshot of a web page and stores a copy of the HTML in a private space on the a.nnotate.com site, giving a record of the page at a particular point in time. Currently it makes a shallow copy (i.e. just the HTML), so any images that are required would need to be downloaded separately. The A.nnotate server is also available for local installation (with an API) if it is to be integrated with a CMS. PDFs can also be uploaded to A.nnotate; these are converted to images and rendered in the browser using pure HTML / AJAX (without any dependency on Flash or Adobe Reader).</td>
</tr>
<tr>
<td width="137" valign="top">
<h3>SnagIt   9</h3>
</td>
<td width="479" valign="top"><p>SnagIt is an example of an advanced, commercial screen-capture tool that includes features to capture images and linked files from a web page, and save the source code and URL of web pages.</p>
<p><strong><em>More information:</em></strong></p>
<p><a href="http://www.techsmith.com/screen-capture.asp">http://www.techsmith.com/screen-capture.asp</a></p></td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/tools-for-capturing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The JISC-funded PoWR (Preservation of Web Resources) project</title>
		<link>http://jiscpowrguide.jiscpress.org/the-jisc-funded-powr-preservation-of-web-resources-project/</link>
		<comments>http://jiscpowrguide.jiscpress.org/the-jisc-funded-powr-preservation-of-web-resources-project/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:43:18 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=91</guid>
		<description><![CDATA[This Guide is one of the outputs from the JISC-funded PoWR project carried out jointly by ULCC (University of London Computer Centre) and UKOLN (University of Bath). One of the goals of PoWR is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web [...]]]></description>
			<content:encoded><![CDATA[<p>This Guide is one of the outputs from the JISC-funded PoWR project carried out jointly by ULCC (University of London Computer Centre) and UKOLN (University of Bath).</p>
<p>One of the goals of PoWR is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources. Anyone coming for the first time to the field of digital preservation can find it a daunting area, with very distinct terminology and concepts. Some of these are drawn from time-honoured approaches to managing things like government records or Institutional archives, while others have been developed exclusively in the digital domain.</p>
<h2>PoWR workshops</h2>
<p>The Project ran three workshops: in London, Aberdeen and Manchester. The workshops, organised by UKOLN, were a mixture of presentations and break-out groups, where a great deal of useful discussion took place and many ideas were generated. Much valuable and interesting input was gleaned from the mixture of professionals who participated, including people from a records management background, web managers, and other information professionals with an interest in web preservation, or experience of the difficulties and issues.</p>
<h2>The PoWR blog</h2>
<p>A blog was built (<a href="http://jiscpowr.jiscinvolve.org/">http://jiscpowr.jiscinvolve.org/</a>) at the very start of the project in April 2008. Several key chapters of the Handbook (see below) originated on this blog, many of them starting life as a series of ‘what if’ scenarios or actual case studies, focusing on various challenging aspects of web content and the actual use made of systems in a higher and further education (HFE) context. The resulting discussions and comments provided a great deal of content to assess and assimilate.</p>
<h2>The Handbook</h2>
<p>The Handbook, written by ULCC staff, is a distillation and synthesis of the material gathered via the workshops and blog; it also draws heavily on the expertise of the PoWR team in the areas of website management, records management, digital preservation, etc. The Handbook aims to provide suggestions for best practice and advice specifically for UK higher and further educational institutions, to enable the preservation of websites and web-based resources.</p>
<h2>The Guide</h2>
<p>This Guide was developed using content from the Handbook and turning it into a practical guide for all those who would benefit from knowing how to go about preserving web resources.</p>
<p><strong>A note on the authors</strong></p>
<p>This Guide has contributions from a range of people but was primarily written and reviewed by the JISC PoWR team:</p>
<p>University of London Computer Centre (ULCC), Senate House, South Block, Malet Street, London WC1E 7HU (<a href="http://www.ulcc.ac.uk/">http://www.ulcc.ac.uk/</a>)</p>
<ul>
<li>Ed Pinsent</li>
<li>Richard Davis</li>
<li>Kevin Ashley</li>
</ul>
<p>UKOLN, University of Bath, Bath BA2 7AY (<a href="http://www.ukoln.ac.uk/">http://www.ukoln.ac.uk/</a>)</p>
<ul>
<li>Brian Kelly</li>
<li>Marieke Guy</li>
</ul>
<p>This Guide was edited by:</p>
<p>Susan Farrell Consulting Ltd (<a href="http://www.farrellconsulting.co.uk/">http://www.farrellconsulting.co.uk/</a>)</p>
<ul>
<li>Susan Farrell</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/the-jisc-funded-powr-preservation-of-web-resources-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Case study – Capturing wiki contents</title>
		<link>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-capturing-wiki-contents/</link>
		<comments>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-capturing-wiki-contents/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:40:57 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=89</guid>
		<description><![CDATA[Scenario: Wikis are now at the online heart of innumerable projects &#8211; for teaching, research, publishing and business. When the project is complete what should be done with the content to ensure it is retained? Does wiki software allow for this? Issues Many wikis have a backup option which will enable the capture of [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">Scenario: Wikis are now at the online heart of innumerable projects &#8211; for teaching, research, publishing and business. When the project is complete, what should be done with the content to ensure it is retained? Does wiki software allow for this?</td>
</tr>
<tr>
<td width="616" valign="top">
<h3>Issues</h3>
<p>Many wikis, such as Wetpaint, Wikidot, MediaWiki, and Confluence, have a backup option that enables the capture of wiki content. However, all these options produce imperfect results. In contrast, if a spidering engine like HTTrack or Wget (see Appendix C) is used to harvest the site remotely, the result will be a working local copy of the wiki, looking much as it does on the web. This might be an attractive option if a record of what it looked like on a certain date is required.</p>
<p>However, it may not be necessary to gather every web   page since the wiki contains many automatically generated pages: versioning,   indexing, admin etc. So a <strong>selection decision</strong> is needed. For example, the edit   history and discussion pages may be excluded as the user community only wants   to look at the finished content.</p>
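<p>A selection decision of this kind can be expressed as a simple filter over the URLs a crawler discovers. The Python sketch below assumes MediaWiki-style addresses, in which the action, oldid and diff parameters mark editing, history and old-revision views, and the Talk: and Special: prefixes mark discussion and automatically generated pages. The function names and the example wiki address are hypothetical, and other wiki software will need different rules.</p>

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Views and namespaces left out of the capture (assumed MediaWiki conventions).
EXCLUDED_ACTIONS = {"history", "edit", "raw"}
EXCLUDED_PREFIXES = ("Talk:", "Special:", "User_talk:")


def page_title(url):
    """Return the page title from a title= parameter or the last path segment."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "title" in query:
        return query["title"][0]
    return unquote(parts.path.rsplit("/", 1)[-1])


def is_selected(url):
    """True if the URL looks like finished content rather than wiki machinery."""
    query = parse_qs(urlsplit(url).query)
    if EXCLUDED_ACTIONS.intersection(query.get("action", [])):
        return False
    if "oldid" in query or "diff" in query:
        return False
    return not page_title(url).startswith(EXCLUDED_PREFIXES)


if __name__ == "__main__":
    candidates = [
        "http://wiki.example.ac.uk/wiki/Project_plan",
        "http://wiki.example.ac.uk/wiki/Project_plan?action=history",
        "http://wiki.example.ac.uk/wiki/Talk:Project_plan",
        "http://wiki.example.ac.uk/index.php?title=Special:RecentChanges",
    ]
    for url in candidates:
        if is_selected(url):
            print(url)
```

<p>Keeping the exclusion rules in one place means the selection decision is documented as well as applied, and can be revisited if the user community later asks for the edit history.</p>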
<p>The change history is important to the current owner-operators of the wiki, but is it really needed for long-term (or even permanent) preservation? Indeed, could their access requirement be satisfied merely by allowing the wiki (presuming it is reasonably secure, backed-up etc.) to go on operating the way it is, as a self-documenting collaborative editing tool?</p>
<h3>Approaches</h3>
<p>All this suggests some basic questions to ask when   setting up a wiki for a project:</p>
<ul>
<li>What aspects of the wiki do   we want to preserve and for how long?</li>
<li>Is there a business need to   capture the wiki change history, and for how long?</li>
<li>Will it need preserving at   intervals, or at a completion date?</li>
<li>Is it more important to   preserve text content, complete functionality, or its look?</li>
<li>Should we back it up? If so,   what should we back up?</li>
<li>Does the wiki provide backup   features? If so, what does it back up (e.g. attachments, discussions,   revisions)?</li>
<li>Once &#8216;backed up&#8217;, how easily   can it be restored?</li>
<li>Will the links still work in   our preservation or backup copy?</li>
<li>If the backup includes raw wiki markup, do we have the capabilities to re-render this as HTML?</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-capturing-wiki-contents/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Case study – Student blogs</title>
		<link>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-student-blogs/</link>
		<comments>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-student-blogs/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:39:51 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=87</guid>
		<description><![CDATA[Description: Your Institution runs a blog service for students and staff. One enthusiastic alumna wants to migrate the extensive blog she has kept for three years, but your Institution systematically deletes files and accounts held by students on Institution servers shortly after they graduate. How should the Institution respond if students wish to maintain or [...]]]></description>
			<content:encoded><![CDATA[<p>Description: Your Institution runs   a blog service for students and staff. One enthusiastic alumna wants to   migrate the extensive blog she has kept for three years, but your Institution   systematically deletes files and accounts held by students on Institution   servers shortly after they graduate. How should the Institution respond if   students wish to maintain or migrate the contents of their blog (plus   embedded resources, comments, etc.)?</p>
<h3>Issues</h3>
<ul>
<li>Should the option be open to   students to have their resources persist on Institutional servers after they   leave &#8211; perhaps as part of an Alumni programme?</li>
<li>Should this be an opt-in or   opt-out process, and should fees be involved?</li>
<li>Does an Institution have   permission to archive the content of blogs? This might include permission not   only from the blog’s author but also from creators of third party content.</li>
<li>Is it possible to excise   potentially offending material, or is the risk (probably negligible) that an   Institution might be sued for copyright breaches acceptable?</li>
<li>Are Institutional staff and   students well-informed about the issues of online copyright? Is it possible   to include a default Creative Commons licence in the terms of use of the   system?</li>
<li>Is it more sustainable for   the Institution to host and manage a blogging service or to use third party   providers such as Blogger.com or WordPress.com?</li>
</ul>
<h3>Approaches</h3>
<ul>
<li>Decide that the issue is   predominantly one of policy, not of in-house hosting versus third party   hosting. If an educational Institution is encouraging the use of blogs to   support reflection, discourse and deep learning, it has a responsibility to   make that online environment as safe as it tries to make its physical campus.</li>
<li>Institutions could recommend   the use of mature hosted blogging services for students who will normally   only be at the Institution for a short period. Third party hosting might be a   reasonable alternative to the costs of service development and maintenance,   but the Institution must examine the terms and conditions and functionality   very carefully to ensure they meet standards it can recommend to those in its   charge.</li>
<li>Seek permission from the   owners of the blog content before making copies, investigate wider   application of Creative Commons licences and work towards resolving third   party issues.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-student-blogs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Case study – Vanishing domain names</title>
		<link>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-vanishing-domain-names/</link>
		<comments>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-vanishing-domain-names/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:39:07 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=85</guid>
		<description><![CDATA[Scenario: A project team in the Institution has registered a domain name under the .org top-level domain, outside the main Institution domain, in order to expose and store its project outputs. The project is now developing into a successful service, there are numerous dependencies, and users have come to trust the domain. But the project [...]]]></description>
			<content:encoded><![CDATA[<p>Scenario: A project team in the Institution has registered a domain name under the .org top-level domain, outside the main Institution domain, in order to expose and store its project outputs. The project is now developing into a successful service, there are numerous dependencies, and users have come to trust the domain. But the project manager failed to renew the domain name registration, and it has now been purchased by a third party. This third party is requesting a significant fee to release the domain name back to the Institution.</p>
<h3>Issues</h3>
<p>If your resources are located on the main   Institutional website (usually in the .ac.uk second level domain, managed by   JANET), then your domain is unlikely to disappear unless there are major   changes affecting your institution.</p>
<p>If, however, you are using an alternative domain name   (such as .org, .org.uk, .co.uk or .com) then care is needed in managing   domain registrations. Internal administrative management procedures will need   to be in place to ensure that the domain name is renewed prior to the expiry.</p>
<p>You may ask why anyone would wish to make use of a   non-.ac.uk domain in light of such possible dangers. JANET does not sell off   its domains to the highest bidder. Instead it has strict eligibility   guidelines that may not be met by short-term, collaborative or cross-sectoral   projects and services. Equally within Institutions, the allocation of fourth level   sub-domains (e.g. specialproject.london.ac.uk) is often tightly controlled or   subject to considerable bureaucracy.</p>
<h3>Approaches</h3>
<ul>
<li>Carry out an audit of the   Institution&#8217;s use of non- .ac.uk domains.</li>
<li>Ensure that such domains have   adequate administrative processes in place to ensure that the domain name is   not lost if, for example, project funding ceases and staff involved in the   project leave the Institution.</li>
<li>Carry out a risk assessment   of the dangers of losing such domains, and the costs your Institution may be   willing to pay to claim back the domain.</li>
</ul>
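<p>The audit and renewal checks suggested above could be supported by a small script run at regular intervals. The Python sketch below assumes the Institution keeps a register mapping each domain name to its expiry date; the register contents, the function names and the 90-day warning period are all hypothetical.</p>

```python
from datetime import date


def is_external(domain):
    """True for domain names outside the .ac.uk second level domain."""
    return not domain.lower().endswith(".ac.uk")


def renewals_due(register, today, warning_days=90):
    """List (domain, days_left) for external domains expiring soon, soonest first.

    Domains that have already expired appear with a negative days_left.
    """
    due = []
    for domain, expiry in register.items():
        if not is_external(domain):
            continue
        days_left = (expiry - today).days
        if days_left > warning_days:
            continue
        due.append((domain, days_left))
    return sorted(due, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical register of domains and expiry dates.
    register = {
        "specialproject.org": date(2010, 7, 1),
        "sharedservice.org.uk": date(2011, 6, 1),
        "dept.example.ac.uk": date(2010, 6, 20),
    }
    for domain, days_left in renewals_due(register, date.today()):
        print(domain, days_left)
```

<p>A report of this kind is only useful if it is sent to a role or shared mailbox rather than to an individual, so that the warning still reaches someone after project staff have left the Institution.</p>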
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-vanishing-domain-names/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Case study – Institutional use of Twitter</title>
		<link>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-institutional-use-of-twitter/</link>
		<comments>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-institutional-use-of-twitter/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:37:47 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=82</guid>
		<description><![CDATA[Scenario: An Institution has set up an Institutional Twitter account for disseminating news on activities and events. The unspoken expectation is that Twitter will be used across the Institution as an individual productivity and social tool. However, one department has become an early adopter of the technology and is using it in teaching and learning, [...]]]></description>
			<content:encoded><![CDATA[<p>Scenario: An Institution has set up   an Institutional Twitter account for disseminating news on activities and   events. The unspoken expectation is that Twitter will be used across the   Institution as an individual productivity and social tool. However, one   department has become an early adopter of the technology and is using it in   teaching and learning, and research contexts. The Head of this department is   now suggesting that a formal policy for capture and preservation of Twitter   messages be enacted.</p>
<h3>Approaches</h3>
<p>Capturing tweets is straightforward as they are easily downloaded from the RSS feeds that the service generates. These could be converted into text documents or web pages, or even imported into a database or blogging software. A number of solutions are available for tweet preservation such as Twapper Keeper (<a href="http://twapperkeeper.com/index.php">http://twapperkeeper.com/index.php</a>), FriendFeed (<a href="http://friendfeed.com/">http://friendfeed.com/</a>), the WordPress Lifestream plugin (<a href="http://wordpress.org/extend/plugins/lifestream/">http://wordpress.org/extend/plugins/lifestream/</a>), What the Hashtag (<a href="http://wthashtag.com/Main_Page">http://wthashtag.com/Main_Page</a>) and Tweetdoc (<a href="http://www.tweetdoc.org/">http://www.tweetdoc.org/</a>). More significant is the need to reach an Institutional decision about the value of these resources. Why keep tweets?</p>
<ul>
<li>Corporate record: Is a Twitter feed a digital resource that information professionals should be interested in capturing or preserving? At what point does a Twitter feed turn into a record that requires the attention of the records manager?</li>
<li>Scholarly record: can it be   demonstrated that tweets are part of the scholarly record that is not   captured in other forms?</li>
<li>Legal reasons: Is Twitter   being used to deliver learning? Is there a legal requirement to record what   has been sent, as part of the assessment record?</li>
</ul>
<h3>Possible outcomes from the decision:</h3>
<ul>
<li>It is agreed that Twitter posts are transient and do not need to be preserved. However, records of corporate activity are something Institutions consider within their archiving policies, and tweets could come to be considered part of that corporate record, so the decision needs to be reviewed as some information and communication media take over the role of others.</li>
<li>As a recognised official   Institution publication, the posts need to be subject to Quality Assurance   and editorial processes, which include keeping a record of the posts.</li>
<li>An informal log of posts is   kept, in order to have a record of topics that have been covered, audit the   number of posts, be able to identify any significant impact from these   services, etc.</li>
</ul>
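<p>Where the decision is to keep a log of posts, the RSS-based capture mentioned above can be sketched in a few lines. The following is a minimal illustration in Python, not a description of how any of the services listed works: the sample feed, its URLs and the function name <code>feed_to_log</code> are all hypothetical, and a real capture would read the feed from the service rather than from a string.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real capture would download the RSS
# document that the service generates for an account or a search.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example University tweets</title>
    <item>
      <title>Open day on Saturday</title>
      <link>http://example.org/status/1</link>
      <pubDate>Fri, 18 Jun 2010 11:11:24 +0000</pubDate>
    </item>
    <item>
      <title>Library opening hours have changed</title>
      <link>http://example.org/status/2</link>
      <pubDate>Sat, 19 Jun 2010 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def feed_to_log(feed_xml):
    """Turn each RSS item into one tab-separated line: date, text, link."""
    root = ET.fromstring(feed_xml)
    lines = []
    for item in root.iter("item"):
        date = item.findtext("pubDate", default="")
        text = item.findtext("title", default="")
        link = item.findtext("link", default="")
        lines.append("\t".join([date, text, link]))
    return lines


if __name__ == "__main__":
    # Print the log; it could equally be appended to a text file.
    for line in feed_to_log(SAMPLE_FEED):
        print(line)
```

<p>A plain-text log of this kind is enough to audit the number of posts and the topics covered; whether it is an adequate record is a matter for the Institutional decision discussed above.</p>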
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-institutional-use-of-twitter/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Case Study – Home page history</title>
		<link>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-home-page-history/</link>
		<comments>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-home-page-history/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:36:57 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=80</guid>
		<description><![CDATA[Scenario: Your institution is about to commemorate an important event and the Vice Chancellor  wants to highlight that the Institution is actively engaging with new technologies, and so would like to provide an example of how the Institution&#8217;s website has developed since it was launched. Issues How has your Institutional home page changed over time? [...]]]></description>
			<content:encoded><![CDATA[<p>Scenario: Your institution is about to commemorate an important event and   the Vice Chancellor  wants to highlight   that the Institution is actively engaging with new technologies, and so would   like to provide an example of how the Institution&#8217;s website has developed   since it was launched.</p>
<h3>Issues</h3>
<ul>
<li>How has your Institutional   home page changed over time?</li>
<li>Have you kept records of the   changes and the decisions which were made (and how they were made)?</li>
<li>If you needed to do this for   your Institution, do you feel you would be able to deliver a solution? How   far back could you go?</li>
</ul>
<h3>Approaches</h3>
<ul>
<li>The Internet Archive (see Chapter 7) has taken snapshots of websites since 1996 and may have captured web pages from your Institution. The University of Bath used the snapshots of its home page captured by the Internet Archive to illustrate how it had changed between 1997 and 2007: an animated visualisation of the changes, linking to the IA&#8217;s snapshots, is available on UKOLN&#8217;s website. However, there is no guarantee that the Internet Archive will have captured every iteration of your Institution&#8217;s website, nor that the copies it has are complete and fully functional.</li>
<li>Even if there are few, or no,   surviving copies of previous versions of your website, there is no time like   the present to start making sure snapshots are kept, either by taking your   own copies, or ensuring the Internet Archive takes a copy. You can use an   online form to nominate a site for crawling by the Internet Archive. It is   also possible to nominate your site for capture by the UK Web Archive (see   Chapter 7).</li>
<li>Another approach is to build a compiled online history. The University of Virginia maintains a web page detailing 14 years of its website history. It includes fascinating statistical information based on analysis of the web server logs. Copies of the website are not available before 1996 and the image of the website in 1996 is taken from the Internet Archive. All subsequent snapshots are hosted on the main U.Va website, in subdirectories (/virginia1999, etc.). Some years are missing: whether because the changes were insignificant, or no copy survives, is not clear. Although the archived sites contain broken links, and anachronistic links to current versions of pages, the archived snapshots provide a valuable view of the evolution of the Institution&#8217;s web presence.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/case-study-%e2%80%93-home-page-history/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 9 &#8211; What policies need to be developed?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-9-what-policies-need-to-be-developed/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-9-what-policies-need-to-be-developed/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:35:38 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=78</guid>
		<description><![CDATA[Summary: All existing policies which may have an impact on web preservation should be located and assessed. A policy review should be carried out. It is unlikely that any Institution will have a single policy or mission statement that governs everything that should be happening in relation to websites and web resources. Any relevant Institutional [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">Summary:</p>
<ul>
<li>All existing policies which   may have an impact on web preservation should be located and assessed.</li>
<li>A policy review should be   carried out.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>It is unlikely that any Institution will have a single policy or mission statement that governs everything that should be happening in relation to websites and web resources. Any relevant Institutional statements are probably scattered across several places and departments; further, any guidance relating to the creation, storage and preservation of web-based materials may only be implied rather than made explicit.</p>
<p>However, the first step is to investigate the following sources (where available) as they may prove helpful.</p>
<ul>
<li>Institutional mission statement.</li>
<li>Legal or legislative mandate.</li>
<li>Regulatory requirements.</li>
<li>Change management policy and procedures.</li>
<li>Terms and conditions of website use, website privacy statement, accessibility policy, disclaimer, and copyright notice.</li>
<li>Acceptable use policy / regulations concerning use of Institutional computing.</li>
<li>Code of conduct for work areas and use of software.</li>
<li>Web publishing policies and guidelines.</li>
<li>IT security policy.</li>
<li>System administration code of practice.</li>
<li>Blogging terms and conditions.</li>
<li>Records management policy.</li>
<li>Archivist&#8217;s collection and preservation policies.</li>
<li>Digital library guidelines.</li>
<li>Institutional Repository deposit agreements.</li>
<li>E-learning object repository policies.</li>
<li>Any institutional or departmental policies governing information management, asset management, or knowledge management.</li>
</ul>
<p>It may also be useful to locate the Minutes of any Committees or Advisory Groups in the Institution which formulate web development strategies or advise on policy and current development activities.</p>
<h2>Assessing policies</h2>
<p>Once all relevant policies are located, the following should be considered:</p>
<ul>
<li>Do any policies refer explicitly to web resources?</li>
<li>Do the policies refer to the three proposed web resource classes? See Chapter 2.</li>
<li>Do the policies suggest any action with regard to keeping web resources?</li>
<li>Is there any scope for influencing the behaviour of those who create and use web resources?</li>
<li>Is there any scope for assigning responsibilities for creation, capture and management of web resources to individuals?</li>
<li>Would these policies allow the carrying out of preservation actions?</li>
<li>Would these policies prevent the carrying out of preservation actions?</li>
</ul>
<p>For example, a records manager&#8217;s retention schedules may not explicitly mention web resources by name, as schedules tend to identify the content of a record rather than the form it is in. Therefore these, and other policies, will need to be translated, or adapted, to address the preservation of web resources. The adapted policies should cover all web resources and should stand the test of time without the need for endless revision.</p>
<h2>Policy review</h2>
<p>Reviewing policies and procedures is vital and should be embedded as a continual review action within the preservation process itself. As part of its long-term and evolving strategy, the Institution should:</p>
<ul>
<li>Strive to define technology-neutral policies. The policies should not be dependent on a choice of software, nor the format of the resource.</li>
<li>Apply the policies to emerging systems.</li>
<li>Make sure that its web resources and their management are explicitly covered by appropriate policies.</li>
<li>Separate decisions about what policy says would be ideal from decisions about what is achievable using current resources and technology.</li>
</ul>
<p>Decisions made at these stages should be taken at Institutional level, so that ways can be found of embedding the decisions in practice, or matching them up to existing policies.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top">Action</p>
<ul>
<li>Review all        related strategies, policies and guidelines, and identify any gaps.</li>
<li>Start the        process of ensuring web preservation is covered by all appropriate        strategies, policies and guidelines. This is important if the        Institution is to be the owner of the web preservation programme.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-9-what-policies-need-to-be-developed/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 8 – What approaches should I take?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-8-%e2%80%93-what-approaches-should-i-take/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-8-%e2%80%93-what-approaches-should-i-take/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:34:41 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=76</guid>
		<description><![CDATA[Summary: Four main operational models can be considered. The quick win approaches include domain harvesting, carrying out pilot projects and considering using the EDRMS. Strategic approaches include information lifecycle management, adapting records management approaches, and the continuity approach. Operational models There are various operational models for developing and implementing a web preservation programme. Selecting a [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">Summary:</p>
<ul>
<li>Four main operational models can   be considered.</li>
<li>The quick win approaches   include domain harvesting, carrying out pilot projects and considering using   the EDRMS.</li>
<li>Strategic approaches include   information lifecycle management, adapting records management approaches, and   the continuity approach.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<h2>Operational models</h2>
<p>There are various operational models for developing and implementing a web preservation programme. Selecting a model will depend on balancing such factors as costs, risks, priorities, and available resources and infrastructure.</p>
<p><strong>In-house</strong>: the programme is resourced, managed and implemented within the Institution. This approach offers the most flexibility and control, assuming the necessary skills and resources are available. In the USA, the Harvard Web Archive Collection Service (WAX) is an example of a successful in-house project that relied heavily on the active participation of the creators of the resources. 48 Harvard websites were put into the collection, representing Departments, Committees, Schools, Libraries, Museums, and educational programmes.</p>
<p><strong>Contracted-out</strong>: all or some of the work is performed by a contractor. It is unlikely a single contractor will have all the skills and resources to perform the entire package, and Institutions may need to look at contracting out services such as storage or website hosting, while retaining the curatorial and selection elements in-house.</p>
<p><strong>Collaborative</strong>: two or more Institutions work together, pooling skills and resources towards a common goal. JISC often fund such projects which bring together initiatives from two or more HFE Institutions, particularly for collaborative tools and e-learning projects. However, as yet, no funded programme for website preservation exists.</p>
<p><strong>Consortium</strong>: the programme is implemented by a consortium of organisations, using some shared resource or infrastructure. The IIPC is an example of the consortium approach. In this country, the UK Web Archive used to fill this role, although it appears to be less active as a consortium than it once was. The activities of the Digital Preservation Coalition&#8217;s Web Archiving and Preservation Task Force will also prove valuable in this collaborative area. It aims to ‘provide a mutually supportive environment for continued policy development for members and a mechanism through which non-members can engage with web archiving policy’.</p>
<h2>Approaches</h2>
<p>Once the operational model is decided, there are two main classes of approach:</p>
<p>1) <strong>Quick win: </strong>This can be used to protect the resources identified as being most at risk; the approaches include domain harvesting, remote harvesting, pilot projects, and using an EDRMS. They may be attractive because they are quick, and some of them can be performed without involving other people or requiring changes in working practices. However, they may become expensive to sustain if they do not evolve into strategy.</p>
<p>2) <strong>Strategic: </strong>This includes longer-term solutions which take more time to implement, involve some degree of change, and affect more people in the Institution. These approaches are adapted from Lifecycle Management and records management, and also may involve working with external organisations that will do the work (or some of it) for the Institution. The pay-off may be delayed in some cases, but the more these solutions become embedded in the workflow, the more web archiving and preservation becomes a matter of course.</p>
<h2>Quick win</h2>
<p>Domain harvesting refers to two possible approaches:</p>
<ul>
<li>The Institution conducts its own domain harvest, sweeping the entire domain (or domains), using appropriate web crawling tools.</li>
<li>The Institution works in partnership with an external agency to do domain harvesting on its behalf.</li>
</ul>
<p>Domain harvesting is only ever a partial solution to the preservation of web content, as there are limitations to the systems which currently exist. Too much content may be gathered, including that which does not need to be preserved. Conversely, content which ought to be collected may be missed, including hidden links, secure and encrypted pages, external domains, database-driven content, and databases. Simply harvesting the material and storing a copy of it may also fail to address all the issues associated with preservation.</p>
<p>Instead of trying to solve the web resource problem for an entire Institution, a pilot project could be carried out from which a visible result will be seen quite quickly. This may make it more persuasive for other departments to participate and will add credibility to the programme. Pilot projects can also generate useful reports about lessons learned that can be used to make the next project even more successful. In addition, fewer stakeholders may be involved if the project is scoped tightly enough, thus saving time on consulting users and owners of the content. Pilot projects could be targeted as outlined in Chapter 5.</p>
<p>Migration of resources from one operating system to another or from one storage/management system to another is a form of preservation. This may raise questions about emulation and performance. Can the resource be successfully extracted from its old system, and behave in an acceptable way in the new system?</p>
<p>It is not yet known how feasible it is to use an EDRMS (Electronic Document and Records Management System) for the management of web resources. These systems seem to work best with static documents; authors of reports, for example, understand that a good time to declare their report as a record is when the final approved version has been accepted. Yet one of the distinctive features of Web 2.0 content is that the information is very fluid, and often there is no obvious point at which to draw this line and fix content.</p>
<p>It is technically feasible, for example, to capture Instant Messaging outputs as text or HTML files which could be saved into an EDRMS. The question remains whether there is a defined policy that supports doing this; one that recognises use of IM as a legitimate record-keeping tool, and as a practice that is acceptable to the Institution. The attraction of storing certain web-based output in an EDRMS is that then such resources could be managed in line with agreed retention schedules; and that related records are filed together, like with like.</p>
<h2>Strategic Approaches</h2>
<p>These include Information Lifecycle Management, adapting records management approaches, or following the web continuity methodology. Any approach taken should ensure that web resources are protected from careless or wrongful destruction, deletion, or removal of the resource.</p>
<p>Information Lifecycle Management (ILM) involves recognised professional standards and practices, leading to better management of information, and is one possible approach. If a lifecycle model can be applied to web resources, they will be created, managed, stored and disposed of in a more efficient and consistent way. It can also assist with the process of identifying what should and should not be retained, and why; and that in turn will help with making preservation decisions. ILM makes no assumptions about software or IT systems, nor does it assume that all information will be managed through a single software tool; rather, it is a conceptual framework to help ensure consistency within an organisation. It can be especially helpful when introducing new systems, or reviewing existing ones.</p>
<p>Information moves through a series of phases over time. JISC&#8217;s approach to ILM proposes four distinct phases: creation; active use; semi-active use; and final outcome.</p>
<p>Information should be managed throughout each phase, and there are pertinent issues which apply. ILM can also be aligned very closely to the records management programme. An ILM approach always takes a start-to-finish, cradle-to-grave view. A model can be adapted according to your Institutional needs. The model should have a chronological structure, clearly defined phases, user identification, and consistency.</p>
<p>There is a lot of literature available, including the guidance published by JISCInfoNet, <em>Managing The Information Lifecycle</em>, which is geared towards the HFE sector. (See <a href="http://www.jiscinfonet.ac.uk/infokits/information-lifecycle">http://www.jiscinfonet.ac.uk/infokits/information-lifecycle</a>.)</p>
<p>If a records model can be applied to web resources, the same benefits associated with ILM apply: web resources will be created, managed, stored and disposed of in a more efficient and consistent way. The RM programme will already be established, and through the agreed retention schedules it can assist with the process of identifying what should and should not be retained, and why. All of that in turn will help with making preservation decisions. Under records management, these things will take place within a legislative and regulatory framework that enables and obliges the creation and disposal of records.</p>
<p>The ‘Web Continuity’ project involves a ‘comprehensive archiving of the government web estate by The National Archives’, and aims to address both &#8216;persistence&#8217; and &#8216;preservation&#8217; in a way that is seamless and robust. Web continuity offers concepts and ways of working that may be adaptable to a web preservation programme in an Institution, particularly as a main area of focus is the integrity of website links. The project’s use of digital object identifiers (DOIs) can marry a live URL to a persistent identifier. To achieve persistence of links, a redirection component derived from open source software is used. This component will ‘deliver the information requested by the user whether it is on the live website, or retrieved from the web archive and presented appropriately’. Of course, this redirection component only works if the domains are still being maintained, but it will do much to ensure that links persist over time.</p>
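<p>The idea behind such a redirection component can be illustrated with a short sketch. This is not the project&#8217;s actual component: the archive address, the function names and the way a page is judged to be &#8216;live&#8217; are all assumptions made for the purpose of the example, which is written in Python.</p>

```python
from typing import Callable

# Hypothetical address under which archived copies are served;
# it is not the address of any real web archive.
ARCHIVE_PREFIX = "http://webarchive.example.org/"


def resolve(url: str, is_live: Callable[[str], bool]) -> str:
    """Return the live URL if the page still exists, else its archived copy."""
    if is_live(url):
        return url
    # Fall back to the archive, keeping the original URL in the address
    # so that the user can see which page was requested.
    return ARCHIVE_PREFIX + url


# Stand-in for a real check against the live web server (assumption).
LIVE_PAGES = {"http://www.example.ac.uk/"}


def page_exists(url: str) -> bool:
    return url in LIVE_PAGES


if __name__ == "__main__":
    print(resolve("http://www.example.ac.uk/", page_exists))
    print(resolve("http://www.example.ac.uk/old-report.html", page_exists))
```

<p>In practice the check would be made by the web server when a requested page is not found, which is why the approach depends on the original domain still being maintained.</p>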
<p>Part of the project involves building a centralised registry database of Government websites, which is a means of auditing the website crawls that are undertaken. Such a registry approach is worth considering on a smaller scale for an Institution, as is the project’s methodology for rolling out XML site maps across government. These site maps can help preservation because they help to expose hidden content that is not linked to by navigation, or dynamic pages created by a CMS or database.</p>
<p>The intended presentation method will make it much clearer to users that they are accessing an archived page instead of a live one, and will help to address any potential liability issues arising from members of the public acting upon outdated information.</p>
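<p>An XML site map is simply a list of page addresses in an agreed format, so it can be generated from whatever list of URLs a CMS or database is able to supply. The sketch below, in Python, assumes the widely used sitemaps.org format; the URLs and the function name <code>build_sitemap</code> are hypothetical, and it is not a description of the method used by the project.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a site map document listing every URL in `urls`."""
    # Make the sitemap namespace the default, so no prefixes are written.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for address in urls:
        url_element = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url_element, "{%s}loc" % SITEMAP_NS)
        loc.text = address
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages, including one that no navigation menu links to
    # and one that is generated from a database.
    pages = [
        "http://www.example.ac.uk/",
        "http://www.example.ac.uk/unlinked/annual-report-2009.html",
        "http://www.example.ac.uk/courses/view?id=42",
    ]
    print(build_sitemap(pages))
```

<p>A crawler that reads the site map can reach the second and third pages even though it would not find them by following links.</p>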
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top">Action</p>
<ul>
<li>Decide on the most appropriate approach for your Institution. Options include: doing everything in-house; contracting out all or some of the work; collaborating with other Institutions; working within a consortium.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-8-%e2%80%93-what-approaches-should-i-take/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 7 – Who should be involved?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-7-%e2%80%93-who-should-be-involved/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-7-%e2%80%93-who-should-be-involved/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:33:25 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=74</guid>
		<description><![CDATA[Summary: The Institution must take ownership of the web preservation programme at the highest level so that staff and students are motivated to take part. There are some national and international initiatives which might help. Success with the preservation of web resources will potentially involve the participation and collaboration of a wide range of experts: [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">Summary:</p>
<ul>
<li>The Institution must take   ownership of the web preservation programme at the highest level so that   staff and students are motivated to take part.</li>
<li>There are some national and   international initiatives which might help.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Success with the preservation of web resources will potentially involve the participation and collaboration of a wide range of experts: information managers, asset managers, web managers, IT specialists, system administrators, records managers, and archivists. Each of the participants will have an interest derived from their role which means some may be more driven in terms of the process than others.</p>
<p>For example, the interest of records managers will be in legal compliance, long-term RM goals, retention, disposal, and classification of those web resources classed as records. This is a central activity in their role, whereas web managers may be less driven as their role is more about managing current content and new technological initiatives. Each needs to consider the other’s view of preservation and work together to make the web preservation programme a success.</p>
<p>Records managers need to:</p>
<ul>
<li>recognise that the web is a potential place where records can occur, and identify where, how, when and by what agency this is happening;</li>
<li>rethink some of their traditional models as centralised control over web resources is not possible;</li>
<li>overcome any fear of IT, and forge relations with people like the web manager, system administrator, and IT manager.</li>
</ul>
<p>Web managers:</p>
<ul>
<li>need to recognise that the web is a potential place where records can occur, and that they have some responsibility to ensure they are protected;</li>
<li>should think twice before deleting everything or disabling an account;</li>
<li>should exploit software for better ways to capture and manage the content;</li>
<li>should consider preservation-friendly software for the next purchase of web tools.</li>
</ul>
<p>As well as those with the specific roles identified above, the preservation programme must also involve the stakeholders identified during the appraisal process (see Chapter 5).</p>
<h2>The Institution as owner</h2>
<p>It is vital that the Institution takes ownership at the highest level as this will motivate all Institutional staff and students to take part in the web preservation programme. Also, effective web preservation needs to be policy-driven as it is about changing behaviour, and consistently working to policies. A clear policy is needed that states the importance and value of the Institution’s web resources, and makes it clear why some of them are being preserved. There should be a sense of corporate ownership of the Institution&#8217;s website, the web publication programme, web resources that have value and need to be preserved, and all other issues associated with making the resources ready for preservation such as capture, storage and management.</p>
<p>There are many drivers for an Institution to take ownership of the web preservation programme and these are discussed in Chapter 3. However, the strategic, legal, financial and contractual obligations coupled with the risk issues should provide enough impetus for the Institution to take action.</p>
<h2>Technology is not the answer</h2>
<p>Web resources have existed in UK HFE Institutions for many years, and so have the tools that would help an Institution capture, manage and store those resources. The fact that these tools are not being widely used is an indicator that web preservation is not primarily a technological problem: the solution does not lie in buying new software or more software. There is no single tool that addresses all possible web preservation issues (behaviour, dynamic content, scripts, versioning, etc.), so any programme of work associated with web preservation needs to be properly resourced, with team-based and collaborative approaches drawn from across more than one discipline.</p>
<h2>Can other people do it for you?</h2>
<p>The Internet Archive (<a href="http://www.archive.org/">http://www.archive.org</a>), also known as the &#8216;Wayback Machine&#8217;, is unique in that it has been gathering pages from websites for so long that it holds web material that cannot be retrieved or found anywhere else, and would otherwise have been lost.</p>
<p>The Internet Archive offers several ways for anyone to submit a website for inclusion in the Archive, including a subscription service called Archive-It (<a href="http://www.archive-it.org/">http://www.archive-it.org/</a>). The advantage of this service is that distinct web archives (collections) are created containing only the content selected for harvesting and at the chosen frequency. These can be catalogued and managed directly by the subscriber. The assumption is that archived copies will be made public, via the Internet Archive, although arrangements can be made to keep them private.</p>
<p>Additionally, people are encouraged to use the Internet Archive as a sort of &#8216;People&#8217;s Repository&#8217;. By registering, it is possible to upload images, texts, moving images, and audio material, thus making use of IA&#8217;s considerable storage capacity. Again, in return for free storage, it is expected that your resource is made public.</p>
<p>A few caveats about the suitability of this solution to UK HFE Institutions:</p>
<ul>
<li>IA lacks an explicit preservation principle or policy, and has no real mandate to capture websites outside of a societal desire to see it happening and to share the results with the public. This may cause severe problems for HFE Institutions, as it may not cover everything the Institution needs to do within its remit.</li>
<li>There is potential for legal difficulties and litigation. IPR issues may not be adequately dealt with by Creative Commons and the IA&#8217;s &#8216;notice and take down&#8217; approach.</li>
<li>IA may not have a sustainable funding model and its continuance is largely dependent on the generosity of its creator, Brewster Kahle.</li>
</ul>
<p>There are additional caveats about the technical failings of Internet Archive:</p>
<ul>
<li>IA will not capture all web-based assets and cannot capture any site or service that depends on a database, or a login.</li>
<li>IA cannot guarantee capture to a reliable depth, or reliable quality. This applies particularly to dynamic content.</li>
<li>There can be large gaps between capture dates.</li>
<li>The image assets in IA are always smaller than archive quality copies.</li>
<li>IA may not be preserving the resources it captures to OAIS standards.</li>
<li>There is little in the way of contextual information in its catalogues.</li>
</ul>
<p>A number of Institutional assets are missed out by the IA approach including library catalogues, image collections, e-print collections with a database, and interactive teaching materials.</p>
<p>The UK Web Archive (UKWA) is a British Library initiative which started gathering and curating websites in 2004. The archive is free to view, accessed directly from the web itself and has collected thousands of websites. UKWA&#8217;s approach is selective, and determined by written selection policies. The Archive contains UK websites that publish research, reflect the diversity of lives, interests and activities throughout the UK, and demonstrate web innovation.</p>
<p>It may be possible to nominate an Institutional website for capture with UKWA but it may not be selected or archived. The current position on this process is that owners of UK websites are especially encouraged to nominate their sites but that UKWA reserves the right to decide whether to include the nominated sites.</p>
<p>Institutions may also want to bear the following in mind:</p>
<ul>
<li>The capture will be a snapshot of the website at a certain date and time.</li>
<li>Certain resources will be beyond the reach of the Heritrix crawler (e.g. databases, secure and password-protected pages, and hidden links).</li>
<li>Similarly, if the website depends heavily on server-side architecture, then remote capture may fail.</li>
</ul>
<p>If the website is selected by UKWA, it will involve a few practical things:</p>
<ul>
<li>Signing a permissions agreement to allow remote harvesting and copying.</li>
<li>Agreeing to have the archived copy made publicly available.</li>
<li>Allowing the remote harvester to ignore robot exclusions.</li>
</ul>
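<p>The last point matters because a crawler that honours robot exclusions will skip anything that the site&#8217;s robots.txt file disallows. A minimal sketch using the <code>urllib.robotparser</code> module from Python&#8217;s standard library shows the effect; the rules, the crawler name and the URLs are invented for the example, and it says nothing about how UKWA&#8217;s own harvester is configured.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an Institutional website.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(robots_txt, agent, url):
    """Report whether a crawler that obeys robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.ac.uk/index.html",
        "http://www.example.ac.uk/private/minutes.html",
    ):
        print(url, allowed(ROBOTS_TXT, "example-crawler", url))
```

<p>Pages under <code>/private/</code> would be left out of the capture unless the harvester is permitted to ignore the exclusion, so it is worth reviewing the robots.txt file before agreeing to a harvest.</p>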
<p>A UKWA solution is better than nothing but there are limitations, and it may not constitute a quality solution to preservation of all the Institution’s web resources. (<a href="http://www.webarchive.org.uk/ukwa/">http://www.webarchive.org.uk/ukwa/</a>)</p>
<p>The International Internet Preservation Consortium (IIPC) will not help to harvest an Institution&#8217;s website, but they are an internationally-recognised body of excellence for website preservation. The mission of the IIPC is to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.</p>
<p>The goals of the consortium are (<a href="http://www.netpreserve.org/about/index.php">http://www.netpreserve.org/about/index.php</a>):</p>
<ul>
<li>To enable the collection, preservation and long-term access of a rich body of Internet content from around the world.</li>
<li>To foster the development and use of common tools, techniques and standards for the creation of international archives.</li>
<li>To be a strong international advocate for initiatives and legislation that encourage the collection, preservation and access to Internet content.</li>
<li>To encourage and support libraries, archives, museums and cultural heritage institutions everywhere to address Internet content collecting and preservation.</li>
</ul>
<p>Other external initiatives, mostly library-based and sponsored at a National level, aim to complete selective web collections, often based on the aim of archiving the entire &#8216;national&#8217; domain. They are listed here because they may offer some useful lessons learned. However, they will not be able to assist with the archiving of an Institution’s website. Some of the National Library collections are not open to the public.</p>
<p>MINERVA, Library of Congress (<a href="http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html">http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html</a>). A selective web archive based on themes of national importance, i.e. national political elections, wars and terrorism.</p>
<p>PANDORA (Preserving and Accessing Networked Documentary Resources of Australia) (<a href="http://pandora.nla.gov.au/">http://pandora.nla.gov.au/</a>). National Library of Australia with nine other Australian libraries and cultural collecting organisations.</p>
<p>UK Government Web Archive (<a href="http://www.nationalarchives.gov.uk/webarchive/">http://www.nationalarchives.gov.uk/webarchive/</a>). A selective collection of UK Government websites, archived at regular intervals from August 2003, developed by The National Archives using Internet Archive.</p>
<p>Austrian On-line Archive (AOLA), Austrian National Library and Technical University of Vienna (<a href="http://www.ifs.tuwien.ac.at/%7Eaola/">http://www.ifs.tuwien.ac.at/~aola/</a>).</p>
<p>DAMP (Digital Archive for Web Publications) (<a href="http://www.nsk.hr/DigitalLib.aspx?id=80">http://www.nsk.hr/DigitalLib.aspx?id=80</a>), University of Zagreb and the National and University Library (NUL) in Zagreb, Croatia.</p>
<p>Kulturarw3 &#8211; KB Web Archive, Royal Library &#8211; the National Library of Sweden.</p>
<p>The State and University Library and the Royal Library, Denmark (<a href="http://netarchive.dk/">http://netarchive.dk/</a>).</p>
<p>WebArchiv, National Library of the Czech Republic and Masaryk University in Brno (<a href="http://en.webarchiv.cz/">http://en.webarchiv.cz/</a>).</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top"><p>Action</p>
<ul>
<li>Consider who needs to be involved to ensure the web preservation programme is a success not just in the short-term but also in the long-term.</li>
<li>Discuss the programme with all the relevant people, perhaps setting up a working group to facilitate the discussions.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-7-%e2%80%93-who-should-be-involved/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 6 – How do I capture them?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-6-%e2%80%93-how-do-i-capture-them/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-6-%e2%80%93-how-do-i-capture-them/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:32:16 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=72</guid>
		<description><![CDATA[Summary: The capture of web resources can be carried out within the authoring system or server, at the browser or with a crawler. Each process has its advantages and disadvantages. Tools are available for each type of capture. Tools A number of tools are available for capturing web resources: Tools for capturing web resources. Workflow [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top"><p>Summary:</p>
<ul>
<li>The capture of web resources can be carried out within the authoring system or server, at the browser or with a crawler. Each process has its advantages and disadvantages.</li>
<li>Tools are available for each type of capture.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<h2>Tools</h2>
<p>A number of tools are available for capturing web resources:</p>
<ul>
<li>Tools for capturing web resources.</li>
<li>Workflow systems and curatorial tools.</li>
<li>Snapshot tools.</li>
</ul>
<p>For more see Appendix C. Netpreserve.org and Harvard University Library also have lists of tools.</p>
<h2>Point of capture</h2>
<p>There are three points in the journey of a web page, from server to user, where its capture is likely to be most feasible.</p>
<p>1. Capture within the authoring system or server: This involves retrieving web pages directly at their point of origin, usually the content management system, or the server on which web pages are held. It is possible to get all the content (HTML, CSS, GIFs, JPEGs, etc.). However, web pages are increasingly formatted &#8216;on-the-fly&#8217; to suit the specific needs of the user requesting them (e.g. type of browser, small screen or large screen, desktop device or mobile phone), so this method raises the question of which of these possible versions should be captured.</p>
<p>2. Capture at the browser: This could also be described as capture post-rendering, or at the point of the HTTP transaction. It implies something of a snapshotting approach, and such a snapshot is going to result in frozen content.</p>
<p>3. Capture with a crawler: Using a crawler resolves some, but not all, of the problems of the other methods. Crawlers are unlikely to capture everything, as they miss other sources such as document servers, databases and datafeeds, internal databases, subscription databases, and file management platforms. Web content management systems, access methods, protocols, and security and logins may also present barriers.</p>
<p>Many crawlers, including Heritrix, are also prone to the &#8216;collateral harvesting&#8217; problem. This means they can gather lots of content which is not needed, by blindly following links. There are ways of setting exclusion filters to prevent this, but a crawler&#8217;s behaviour can still be unexpected.</p>
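<p>As a rough illustration of how an exclusion filter narrows a crawl, the sketch below checks each discovered link against a list of permitted hosts and a set of exclusion patterns before it would be queued for capture. It is written in Python for illustration only and is not Heritrix configuration; the host names, the patterns and the <code>in_scope</code> function are all hypothetical.</p>
<pre>
```python
import re
from urllib.parse import urlparse

# Hypothetical scope rules for illustration; not Heritrix configuration syntax.
ALLOWED_HOSTS = {"www.example.ac.uk", "blogs.example.ac.uk"}
EXCLUDE_PATTERNS = [
    re.compile(r"/calendar/"),          # calendars can generate endless date links
    re.compile(r"sessionid=", re.I),    # session identifiers multiply duplicate URLs
    re.compile(r"\.(iso|zip)$", re.I),  # large downloads not wanted in this crawl
]


def in_scope(url):
    """Return True if a discovered link should be queued for capture."""
    parts = urlparse(url)
    # Only follow ordinary web links (skips mailto:, javascript:, etc.).
    if parts.scheme not in ("http", "https"):
        return False
    # Stay on the Institution's own hosts to limit collateral harvesting.
    if parts.hostname not in ALLOWED_HOSTS:
        return False
    # Apply the exclusion patterns to the path plus any query string.
    target = parts.path + ("?" + parts.query if parts.query else "")
    return not any(pattern.search(target) for pattern in EXCLUDE_PATTERNS)
```
</pre>
<p>Rules of this kind only cover the cases that have been anticipated, which is one reason why the results of a crawl should still be checked after capture.</p>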
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="108" valign="top">
<h3>Types of capture</h3>
</td>
<td width="228" valign="top">
<h3>Advantages</h3>
</td>
<td width="224" valign="top">
<h3>Disadvantages</h3>
</td>
</tr>
<tr>
<td width="108" valign="top">
<h3>Capture within the authoring system or server</h3>
</td>
<td width="228" valign="top"><p>Easy to perform if the server is owned by the Institution.</p>
<p>Works in the short to medium term, for internal purposes.</p></td>
<td width="224" valign="top"><p>Captures raw information, not presentation.</p>
<p>May be too dependent on authoring infrastructure or CMS.</p>
<p>Not good for external access.</p></td>
</tr>
<tr>
<td width="108" valign="top">
<h3>Capture at the browser</h3>
</td>
<td width="228" valign="top"><p>Relatively simple for well-contained sites.</p>
<p>Commercial tools for doing it exist.</p></td>
<td width="224" valign="top"><p>What the user sees is captured (but it is not necessarily known why).</p>
<p>Treats web content like a publication: frozen.</p>
<p>Loses behaviour and other attributes.</p></td>
</tr>
<tr>
<td width="108" valign="top">
<h3>Capture with a crawler</h3>
</td>
<td width="228" valign="top"><p>Most widely-used method.</p>
<p>Defers some access issues.</p>
<p>Provides link re-writing.</p>
<p>Provides embedded external content: from archive or live.</p></td>
<td width="224" valign="top"><p>Lots of work, tools and experience are necessary.</p>
<p>Presents many problems for capture: often not everything is captured, or too much is captured.</p></td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top"><p>Action</p>
<ul>
<li>Decide on your capture approach.</li>
<li>Assess the tools and decide which is/are the most appropriate for your Institution.</li>
<li>Define policies for the capture of, and access to, preserved web resources.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-6-%e2%80%93-how-do-i-capture-them/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 5 – How do I decide what to preserve?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-5-%e2%80%93-how-do-i-decide-what-to-preserve/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-5-%e2%80%93-how-do-i-decide-what-to-preserve/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:31:37 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=70</guid>
		<description><![CDATA[Summary: The appraisal process looks at the location, use and ownership of the web resources, and particularly aims to ensure that web resources containing unique information are preserved. Resources which are managed elsewhere (e.g. in asset collections or Institutional repositories), or are of little or no value, can be omitted from the web preservation programme. [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top"><p>Summary:</p>
<ul>
<li>The appraisal process looks at the location, use and ownership of the web resources, and particularly aims to ensure that web resources containing unique information are preserved.</li>
<li>Resources which are managed elsewhere (e.g. in asset collections or Institutional repositories), or are of little or no value, can be omitted from the web preservation programme.</li>
<li>The MoSCoW method enables prioritisation of resources.</li>
<li>The selection approach can be unselective (domain harvesting) or selective (criteria-based or event-based).</li>
<li>Which aspects of resources (information and experience), and which elements (content, appearance and behaviours) to preserve must be considered.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Careful appraisal and selection of web resources will make the task of web preservation more manageable as this will make it clear what should, and should not, be included in the scope of the web preservation programme. Also, within that list, each web resource can be assigned a priority for action.</p>
<p>Some issues to bear in mind when deciding which of the resources to preserve are:</p>
<ul>
<li>The Institutional structure and its aims.</li>
<li>The policies and drivers for preservation.</li>
<li>The legal record-keeping and audit requirements for the Institution.</li>
<li>The potential reuse value of resources.</li>
<li>Is the resource needed by staff to perform a specific task?</li>
<li>Has the resource been accessed in the last six months?</li>
<li>Does the resource represent a significant financial investment in terms of staff cost and time spent creating it?</li>
</ul>
<p>It is particularly important to facilitate the survival of web resources which contain unique information such as those:</p>
<ul>
<li>which only exist in web-based form &#8211; for example, teaching materials designed as web pages;</li>
<li>which do not exist anywhere else but on the website;</li>
<li>whose ownership or responsibility is unclear, or lacking altogether;</li>
<li>that constitute records, according to the definitions in Chapter 2;</li>
<li>that have potential archival value, according to definitions supplied by the archivist.</li>
</ul>
<h2>Appraisal and selection</h2>
<p>Appraisal identifies those web resources which constitute records, publications and artefacts (as defined in Chapter 2) for preservation. The web resources selected should provide information about, and evidence of, what the Institution has done and why, what it and its staff and students have achieved, and its impact locally and in the wider world.</p>
<p>In simple terms, appraisal of HFI resources for preservation should focus on:</p>
<ul>
<li>substantive functions (i.e. teaching, research, academic award administration); and</li>
<li>substantive elements (e.g. strategy development, policy development) of facilitative functions (e.g. governance, estate management, public relations).</li>
</ul>
<p>(From JISC Infonet (2007) <em>Guidance on Archival Appraisal</em>)</p>
<p>The questions that need to be answered for appraisal relate to location, use and ownership. However, there are some resources which can immediately be excluded from the selection list.</p>
<p>As indicated in Chapter 2, web resources can be found on web server(s), in managed systems or in less well-managed systems which may be hosted externally. The locations of these servers and systems must be discovered, so the first step is to consult the web manager and the IT manager. This should result in finding out the:</p>
<ul>
<li>number and names of domains and sub-domains being used, including staff and student intranets and portals, funded project websites and any others. It may prove difficult to track them all down as departments and other areas of the Institution may register their own domains;</li>
<li>number and location of web servers. Again this may be difficult to do because of the possible autonomy of departments but the IT Security staff should be able to help as they will manage the firewall;</li>
<li>managed systems and their locations;</li>
<li>less managed systems (where known). Discussions with stakeholders are most likely to track these down;</li>
<li>backup schedules for all relevant systems and servers;</li>
<li>resources with external dependencies.</li>
</ul>
<p>Once the locations of the web resources are known, each should be appraised to find out its purpose and how it is being created. It is particularly important to find out if original resources are being created (i.e. those that are not available in any other format).</p>
<p>Finding out the owners of the web resources will identify the stakeholders of the web preservation programme. Each owner will be able to provide valuable help in finalising the programme as they can say which resources they would like to be kept and why, and for how long they need to be kept. This approach will also help get people on board and start to embed a culture of good practice and web management within the Institution.</p>
<p>However, it is important to be objective when considering the information provided by the stakeholders, as they may well want to keep everything indefinitely. Conversely, it is all too easy to sweep resources away simply because the stakeholder is not around to defend their interests.</p>
<p>There are some resources which can be immediately excluded from the preservation programme including those that are already being managed elsewhere, and those which have little or no value.</p>
<p><strong><em>Asset Collections</em>.</strong> For some asset collections, or e-resource collections, the web is often just an access tool for the underlying information resource so preservation actions are best concentrated directly on that resource. This class might include: digitised images; research databases; electronic journals; ebooks; digitised periodicals; examples of past examination papers; and theses.</p>
<p><strong><em>Institutional repositories</em></strong> (examples include DSpace, EPrints or Fedora). Institutional repositories (IRs) are web-based tools, but the materials stored in an IR are already being managed, as there are elements such as metadata profiling, secure and managed storage, backup procedures, audit trails of use, and recognised ownership. A well-managed IR therefore already constitutes a recognised digital preservation method in itself.</p>
<p><strong><em>Duplicate copies</em></strong><strong>.</strong> In some cases, the website is a pointer to resources that are stored and managed somewhere else. Or the resource has been uploaded from a drive which is owned and maintained by another department. If it is ascertained that the &#8216;somewhere else&#8217; is already being preserved, then it may not be necessary to keep the website copies.</p>
<p><strong><em>Institutional web-based applications which deliver a common service. </em></strong>In this case the web application is an incidental component used in the management of such services. Quite often the important record component in such instances is actually stored or managed elsewhere, for example in a database of underlying data.</p>
<p><strong><em>Services which do not generate any informational material of lasting value to the Institution. </em></strong>Some examples of this are room booking systems, systems which allow automated submission of student work for assessment, or circulation of examination results.</p>
<p><strong><em>Resources which clearly fall outside the scope of an agreed records retention policy, or an archival selection policy. </em></strong>Examples of this might include Twitter and Instant Messaging, unless evidence can be found of a strong Institutional driver to retain and manage such outputs. (See case study on page 13.)</p>
<p>Prioritisation is fundamental to successful preservation &#8211; keeping everything is rarely possible. So, when considering what to preserve and what not to preserve, the MoSCoW method can be used as this classifies the requirements as one of the following:</p>
<ul>
<li>M: resources which must be preserved.</li>
<li>S: resources which should be preserved, if at all possible.</li>
<li>C: resources which could be preserved, if it does not affect anything else.</li>
<li>W: resources which won&#8217;t be preserved.</li>
</ul>
<p>Even after carrying out the MoSCoW method, it is possible that the list of web resources in the M and S categories will be long and potentially unmanageable. So the MoSCoW method can then be carried out again with just those in the M category to find out the order in which the resources should be tackled.</p>
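<p>The two passes described above can be sketched as follows. This is a minimal illustration in Python, assuming that each resource has already been given a MoSCoW rating by the stakeholders; the resource names, the ratings and the <code>group_by_category</code> helper are hypothetical.</p>
<pre>
```python
from collections import defaultdict

CATEGORIES = "MSCW"  # Must, Should, Could, Won't

# Hypothetical first-pass ratings, for illustration only.
first_ratings = {
    "Prospectus pages": "M",
    "Committee minutes": "M",
    "Departmental blogs": "S",
    "Student home pages": "C",
    "Room booking system": "W",
}


def group_by_category(ratings):
    """Group resource names under their MoSCoW category."""
    groups = defaultdict(list)
    for resource, category in ratings.items():
        if category not in CATEGORIES:
            raise ValueError(f"Unknown MoSCoW category: {category!r}")
        groups[category].append(resource)
    return dict(groups)


first_pass = group_by_category(first_ratings)

# Second pass: re-rate only the resources that were rated M the first time,
# then sort them so the most pressing are tackled first.
second_ratings = {
    "Prospectus pages": "S",
    "Committee minutes": "M",
}
work_order = sorted(
    first_pass["M"],
    key=lambda resource: CATEGORIES.index(second_ratings[resource]),
)
```
</pre>
<p>With these example ratings, <code>work_order</code> places the committee minutes ahead of the prospectus pages, giving an order of attack within the original M list.</p>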
<h2>Selection approaches</h2>
<p>Three main approaches (one unselective and two selective) have arisen from the work carried out by National Libraries and these can be adapted to the requirements of an HFE Institution.</p>
<p>1. Unselective approach &#8211; bulk / domain harvesting: This could involve harvesting the entire website, and/or all its associated domains. Some argue that it is cheaper and quicker to be unselective than to go through the time-consuming selection route; that it is demonstrably less &#8216;subjective&#8217; and will produce a more accurate picture of the web resource collections; and that since it is technically feasible, why not?</p>
<p>However, aspects of those arguments are more applicable to a digital archive or repository trying to scope its collection within certain affordable and pragmatic boundaries. In addition, there is no point in capturing &#8216;everything&#8217; if it has already been established that there are significant quantities of web resources in the Institution that do not even need capture, let alone preservation. In running a frequent domain-wide harvest of Institutional networks, there is a risk of creating large amounts of unsorted and potentially useless data, and committing additional resources to its storage.</p>
<p>2. Selective approach &#8211; criteria-based selection: This could entail selecting web resources according to a pre-defined set of criteria, for example:</p>
<ul>
<li>All resources owned by one Department.</li>
<li>One genre of web resource (e.g. all blogs).</li>
<li>Resources that share a common subject, or related subjects.</li>
<li>All resources that affect students or staff only.</li>
<li>All funded projects with web-based deliverables.</li>
<li>All resources thought to be at risk of loss.</li>
<li>All records or all publications.</li>
<li>Resources that would most benefit an external user community (e.g. alumni).</li>
<li>Resources covering a pre-determined theme.</li>
</ul>
<p>3. Selective approach &#8211; event-based: Consider if there would be value in taking &#8216;before and after&#8217; snapshots of certain web pages, if agents of change are known to be at work. The sort of time-based events which might trigger a decision to capture are:</p>
<ul>
<li>End or beginning of term, or beginning of a new academic year.</li>
<li>Appointment, or departure, of a senior official.</li>
<li>Completion of a major piece of research.</li>
<li>Publication of the new prospectus.</li>
<li>Purchase of new authoring software which affects web content.</li>
<li>Corporate or Institutional re-branding.</li>
<li>Formation of a new department.</li>
</ul>
<p>Having decided on the approach, the next step is to decide which aspects and elements of web resources must be captured.</p>
<h2>Aspects</h2>
<p>It is possible to make a distinction between preserving an <em>experience</em> (experience of accessing web content including all its attendant behaviours and aspects) and preserving the <em>information</em> (all meaningful content including words, figures, images, audio) which the experience makes available. Both are valid preservation approaches and both achieve different ends.</p>
<p>Deciding which aspects of web resources to capture can be informed to a large extent by the Institutional drivers, and the agreed policies for retention and preservation.</p>
<p>A few examples are given below.</p>
<p><strong>Evidential and record-keeping</strong>: As well as the content, this would involve preserving some form of change history, with as much contextual information as possible. This may not apply to all the web resources, just to those which are needed for legal purposes, to protect the Institution, where decision-making is involved, etc. For such resources the following should be captured and preserved:</p>
<ul>
<li>an audit trail of changes and a change history;</li>
<li>contextual information about people (authors, users etc), and dates and times (creation, change and publication dates etc);</li>
<li>the content, appearance and behaviour of the resource.</li>
</ul>
<p><strong>Repurposing and reuse:</strong> For web resources which are being reused and potentially repurposed in a different context (or even on a different server), it would make sense to preserve:</p>
<ul>
<li>the content, appearance and behaviour of the resource;</li>
<li>contextual metadata about its creation, its original location, its authorship, its access rights, etc.</li>
</ul>
<p><strong>Social history:</strong> For web resources which are not needed for evidential purposes, but are being preserved to retain something about the history of the Institution, the capture requirements may not be as exacting. For example, if it was decided to preserve a sample of student home pages, appearance of the resource could be preserved to demonstrate how home pages looked five years ago.</p>
<h2>Elements</h2>
<p>The elements of web resources that need to be considered are content, appearance and behaviour.</p>
<p>Content: This is just the words. No links, no behaviour, no framesets, no stylesheets, no images &#8211; just plain text.</p>
<p>Appearance: This is the look and feel of the page including navigational devices, images, page layout etc.</p>
<p>Behaviour: This covers web resources which have dynamic or animated features. One example of a website with a mixture of behaviours would be a blog, which might have behaviours such as a live feed, comments which can change, site administration features, and bookmarking and tagging features.</p>
<p>If the preservation of content, appearance and behaviour is required, the job of preservation becomes more complex. It may not be feasible or desirable to capture all of these elements, so it is important to specify which are most significant for preservation.</p>
<h2>DPC decision tree</h2>
<p>A potentially useful tool is the Decision Tree produced by the Digital Preservation Coalition. It is intended to help build a selection policy for digital resources, although it should be pointed out that it was intended for use in a digital archive or repository. The Decision Tree may have some value for appraising web resources if it is suitably adapted.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top"><p>Action</p>
<ul>
<li>Define a policy for the appraisal and selection of resources, consulting stakeholders as appropriate.</li>
<li>Refine the list of resources for preservation drafted after Chapter 2.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-5-%e2%80%93-how-do-i-decide-what-to-preserve/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 4 &#8211; What is a web preservation programme?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-4-what-is-a-web-preservation-programme/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-4-what-is-a-web-preservation-programme/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:30:31 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=66</guid>
		<description><![CDATA[A web preservation programme must involve planning and activities for all the stages in the lifecycle. It involves two main phases: analysis and planning; and execution. The table below shows these phases divided into sequential stages, and the tasks included in each stage. The column on the right points to the relevant chapter(s) in this [...]]]></description>
			<content:encoded><![CDATA[<p>A web preservation programme must involve planning and activities for <em>all the stages in the lifecycle.</em> It involves two main phases: analysis and planning; and execution. The table below shows these phases divided into sequential stages, and the tasks included in each stage. The column on the right points to the relevant chapter(s) in this Guide which will provide help.</p>
<p>It may prove more successful for an Institution to manage the individual stages as a series of clearly defined projects, with owners. Indeed, some of the suggested actions fall into the category of generic project management actions, and do not have a corresponding chapter in the Guide.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td colspan="2" width="552" valign="top">
<h2>Analysis and planning phase</h2>
</td>
</tr>
<tr>
<td width="444" valign="top"><strong><br />
</strong><strong>Tasks</strong></td>
<td width="108" valign="top"><strong>Chapter</strong></td>
</tr>
<tr>
<td colspan="2" width="552" valign="top"><strong>Stage 1A – Institutional   analysis</strong></td>
</tr>
<tr>
<td width="444" valign="top">Define web preservation and web resources</td>
<td width="108" valign="top">1, 2</td>
</tr>
<tr>
<td width="444" valign="top">Discuss preservation with relevant staff</td>
<td width="108" valign="top">7</td>
</tr>
<tr>
<td width="444" valign="top">Set up a working group or project, with an owner</td>
<td width="108" valign="top">7</td>
</tr>
<tr>
<td width="444" valign="top">Define the legal requirements and Institutional priorities</td>
<td width="108" valign="top">App F</td>
</tr>
<tr>
<td width="444" valign="top">Review all related strategies, policies and guidelines</td>
<td width="108" valign="top">9</td>
</tr>
<tr>
<td colspan="2" width="552" valign="top"><strong>Stage 1B – Resource   analysis</strong></td>
</tr>
<tr>
<td width="444" valign="top">Assess the existing infrastructure and technical skills</td>
<td width="108" valign="top">2</td>
</tr>
<tr>
<td width="444" valign="top">Assess the resourcing implications of the programme</td>
<td width="108" valign="top">3, 7, 8</td>
</tr>
<tr>
<td width="444" valign="top">Consider the desired speed of implementation of the programme, and   define its aims, scope and duration</td>
<td width="108" valign="top">8</td>
</tr>
<tr>
<td width="444" valign="top"><strong>Stage 2 – Getting buy-in</strong></td>
<td width="108" valign="top"></td>
</tr>
<tr>
<td width="444" valign="top">Choose which approach model you will use (in-house, contracted   out, collaborative, consortial)</td>
<td width="108" valign="top">7, 8</td>
</tr>
<tr>
<td width="444" valign="top">Develop a business case and secure buy-in</td>
<td width="108" valign="top">3</td>
</tr>
<tr>
<td width="444" valign="top">Assess the cost effectiveness of your approach</td>
<td width="108" valign="top">7, 8</td>
</tr>
<tr>
<td width="444" valign="top">Perform a risk assessment</td>
<td width="108" valign="top">3</td>
</tr>
<tr>
<td width="444" valign="top"><strong>Stage 3 – Decide on   methodology</strong></td>
<td width="108" valign="top"></td>
</tr>
<tr>
<td width="444" valign="top">Define policies for the appraisal and selection of web resources,   identifying the key functions which need owners or managers</td>
<td width="108" valign="top">5</td>
</tr>
<tr>
<td width="444" valign="top">Consult stakeholders</td>
<td width="108" valign="top">5</td>
</tr>
<tr>
<td width="444" valign="top">Decide on selection and capture approaches</td>
<td width="108" valign="top">6, App C</td>
</tr>
<tr>
<td width="444" valign="top">Choose preservation tools</td>
<td width="108" valign="top">6</td>
</tr>
<tr>
<td width="444" valign="top">Determine an access policy</td>
<td width="108" valign="top">6</td>
</tr>
<tr>
<td width="444" valign="top">Complete web preservation programme contents</td>
<td width="108" valign="top">All chapters</td>
</tr>
</tbody>
</table>
<table border="1" cellspacing="0" cellpadding="0" width="552">
<tbody>
<tr>
<td colspan="2" width="552" valign="top">
<h2>Execution phase</h2>
</td>
</tr>
<tr>
<td width="444" valign="top"><strong><br />
</strong><strong>Tasks</strong></td>
<td width="108" valign="top"><strong>Chapter</strong></td>
</tr>
<tr>
<td width="444" valign="top">Carry out the appraisal of web resources and identify those to be included</td>
<td width="108" valign="top">5</td>
</tr>
<tr>
<td width="444" valign="top">Capture the selected resources</td>
<td width="108" valign="top">6, App C</td>
</tr>
<tr>
<td width="444" valign="top">Manage the storage and access of the resources, with appropriate   metadata</td>
<td width="108" valign="top"></td>
</tr>
<tr>
<td width="444" valign="top">Develop, and ensure adoption of, policies to embed preservation   into the life of the Institution</td>
<td width="108" valign="top">9</td>
</tr>
<tr>
<td width="444" valign="top">Manage the digital repository where web resources are stored, and   take such actions over time as are needed to ensure their continued access   and permanency</td>
<td width="108" valign="top"></td>
</tr>
</tbody>
</table>
<p><strong>Resources required</strong></p>
<p>The resources required are: staff; hardware and infrastructure; and software.</p>
<p>Chapter 7 covers which staff should be involved in the programme, but it is useful to bear in mind that the skills required are:</p>
<ul>
<li>technicians who understand websites;</li>
<li>selection and curator skills (archivists or librarians);</li>
<li>digital preservation skills;</li>
<li>permissions management.</li>
</ul>
<p>The following hardware and infrastructure may be required:</p>
<ul>
<li>dedicated servers for data storage;</li>
<li>sufficient web server capacity for delivery of resources;</li>
<li>high speed and high bandwidth Internet connection for capture;</li>
<li>sufficient connection bandwidth for users.</li>
</ul>
<p>The following software may be required:</p>
<ul>
<li>collection tools (See Appendix C);</li>
<li>harvesting methodologies;</li>
<li>database archiving;</li>
<li>specialised software for access and delivery;</li>
<li>specialised software for metadata management.</li>
</ul>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">
<div>Action:</div>
<ul>
<li>Consider the desired speed of   implementation of the programme, and define its aims, scope and duration.</li>
<li>Draw up a task plan with a   timeline.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-4-what-is-a-web-preservation-programme/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 3 – Why do I have to preserve them?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-3-%e2%80%93-why-do-i-have-to-preserve-them/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-3-%e2%80%93-why-do-i-have-to-preserve-them/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:24:32 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=64</guid>
		<description><![CDATA[Summary: The drivers for carrying out web preservation are strategic, legal, financial, contractual and reputational. Web preservation also has a role to play in business continuity planning. The espida project provides a useful methodology for quantifying the value of web preservation which is helpful when developing a business case for a web preservation project. There [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">
<div>Summary:</div>
<ul>
<li>The drivers for carrying out web preservation are strategic, legal, financial, contractual and reputational.</li>
<li>Web preservation also has a role to play in business continuity planning.</li>
<li>The espida project provides a useful methodology for quantifying the value of web preservation which is helpful when developing a business case for a web preservation project.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>There are many internal and external drivers for undertaking web preservation within an HFE Institution and some of these are summarised below.</p>
<h2>Strategic, legal, financial and contractual obligations</h2>
<p>Institutional websites contain evidence of Institutional activity which is not recorded elsewhere and may be lost if the website is not archived or regular snapshots are not taken. This loss could be construed as a threat to the business continuity of the organisation, as reference to this content may be required for the checking of strategic, legal, financial, contractual or scholarly information. For example, if certain information is not recorded or protected, the Institution is in danger of failing to comply with legislation such as Freedom of Information and Data Protection. The Institution may also be breaching contractual and auditing obligations, and putting itself at risk.</p>
<h2>Reputational risk</h2>
<p>The Institution’s reputation can be put at risk by poor website continuity, broken links, and missing resources, but more damaging is the risk that website users may make decisions based on misleading or out-of-date content.</p>
<h2>Supporting the Institutional Mission</h2>
<p>The case can be strengthened for carrying out web preservation by showing what it delivers to support the Institution’s mission statement, and its associated strategies in the areas of research, teaching and learning, information, libraries, and records management. A policy for preserved web resources could feasibly assist in supporting Institution-wide aims and generic objectives, even when they do not explicitly mention digital resources by name, for example ‘attracting a wide variety of students’.</p>
<h2>Risk management and risk analysis</h2>
<p>Possible risks associated with websites and web resources include: loss of data, records or resources; failure to be information compliant; the risk of litigation from students or the public; and the risk of breaching copyright.</p>
<p>Bringing web preservation into line with business continuity planning may help to change Institutional practice. The risks associated with possible IPR infringements are put in perspective by Charles Oppenheim’s lightweight risk formula: R = A x B x C x D, where:</p>
<ul>
<li>A: probability that what you are doing is illegal;</li>
<li>B: probability that you are found out;</li>
<li>C: probability that action will be taken against you;</li>
<li>D: extent of financial risk.</li>
</ul>
<p>The aim is to keep all of these values as low as possible, but it is also the case that if any of these is zero, the overall risk is effectively nullified.</p>
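<p>As a minimal sketch, the formula R = A x B x C x D can be worked through in a few lines of Python. The function name, the validation rules and the example figures below are assumptions made for illustration only; they are not part of Oppenheim's formula or of this Guide.</p>

```python
def oppenheim_risk(a, b, c, d):
    """Return R = A x B x C x D.

    a, b and c are probabilities between 0 and 1.
    d is the extent of financial risk (the unit is whatever the
    Institution chooses; this sketch does not assume a currency).
    """
    for name, p in (("a", a), ("b", b), ("c", c)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability between 0 and 1")
    if d < 0:
        raise ValueError("d must not be negative")
    return a * b * c * d


# Hypothetical figures, for illustration only.
risk = oppenheim_risk(a=0.5, b=0.1, c=0.2, d=10000)

# If any one factor is zero, the product is zero.
zero_risk = oppenheim_risk(a=0.5, b=0.0, c=0.2, d=10000)
print(risk, zero_risk)
```

<p>Because the factors are multiplied rather than added, reducing any single factor reduces the overall figure in proportion, which is why a zero in any position nullifies the result.</p>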
<p>If a risk management strategy is required to enable web preservation, using the JISC InfoKit on Risk Management is a good place to start. The kit takes the view that Risk Management is an essential part of project management.</p>
<h2>Saving money</h2>
<p>Web resources cost money to create and store; failure to repurpose and reuse them would constitute a waste of money. Although web preservation may have an initial cost, once the process has begun the savings can be great. Having a good strategy in place (which means selection, retention, and deletion where appropriate) will save both money and energy in the long run. The website may also contain digital assets and electronic resources – assets which may be of continued value, and which may increase in value through sharing and repurposing.</p>
<h2>Responsibility to staff and students</h2>
<p>The Institution has a responsibility to the people who use the resources. Students and staff may make serious choices about their academic careers or their jobs based on website information, and the Institution has a responsibility to make sure a record is kept of the publication programme.</p>
<h2>Responsibility to users</h2>
<p>The Institution has a responsibility to the people who may need to use the resources in the future. Many of the resources which the Institution publishes are unique, and deleting them may mean that invaluable scholarly, cultural and scientific resources (heritage records) will not be available to future generations.</p>
<h2>Gaining a competitive edge</h2>
<p>Starting a web preservation programme will make the Institution look ‘forward thinking’ and therefore give it an edge over its competitors. The Institution could be one of the first to start an official web preservation programme, which will be great marketing fodder. Embedding web preservation strategies will also help the Institution think about the continuity of resources, broken links and other aspects of web management.</p>
<h2>Quantifying the value of web preservation</h2>
<p>The espida project in Glasgow (<a href="http://www.gla.ac.uk/espida/">http://www.gla.ac.uk/espida/</a>) offers a useful methodology which could be used to quantify the value of web preservation. It takes a pragmatic view of the way that HFE Institutions operate in the real world, recognising that preservation activities will continue to vie with other services for funds.</p>
<p>espida can help:</p>
<ul>
<li>demonstrate the value of websites and web resources;</li>
<li>communicate the intangible benefits of web preservation and web resource preservation to senior management, and articulate those benefits;</li>
<li>make a case for a web preservation programme, based on a formalised and transparent communication process between the proposer and the funder;</li>
<li>identify costs and benefits of web preservation, using scorecards and cost templates;</li>
<li>produce a decision-making process that is transparent and based on all relevant information.</li>
</ul>
<p>The results of the process will enable the development of a business case which not only answers the question ‘how much does web preservation cost?’, but also ‘why do we need web preservation?’ and ‘why should we spend money on web preservation, rather than on the primary business of the organisation?’.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top">
<div>Action</div>
<ul>
<li>Think about the drivers and how they relate to your Institution so that you can develop a business case to get senior management buy-in for your web preservation programme.</li>
<li>Carry out a risk assessment.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-3-%e2%80%93-why-do-i-have-to-preserve-them/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 2 – What are web resources?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-2-%e2%80%93-what-are-web-resources/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-2-%e2%80%93-what-are-web-resources/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:23:43 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=62</guid>
		<description><![CDATA[Summary: Web resources are those delivered by a web browser, and can be found on web servers, in managed systems (including content management systems, Institutional repositories and digital collections), and less well-managed systems (such as Web 2.0 applications and services). To help with determining whether they should be preserved, web resources should be categorised into [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">
<div>Summary:</div>
<ul>
<li>Web resources are those   delivered by a web browser, and can be found on web servers, in managed   systems (including content management systems, Institutional repositories and   digital collections), and less well-managed systems (such as Web 2.0   applications and services).</li>
<li>To help with determining   whether they should be preserved, web resources should be categorised into   records, publications or artefacts.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>Web resources are those delivered by a web browser so they may appear not only on the Institutional website but also on other websites and web systems. Some examples are:</p>
<ul>
<li>Institutional and departmental records, with legal and business requirements governing their retention and good maintenance.</li>
<li>Content affecting students, such as prospectuses and e-learning objects.</li>
<li>Administrative, research, teaching and project outputs.</li>
<li>Evidence of other activities (e.g. conferences).</li>
</ul>
<p>Many of the resources of an Institutional website may be stored in a CMS or within well-established managed systems, such as:</p>
<ul>
<li>Those for managing assessments and examinations.</li>
<li>Online libraries.</li>
<li>Virtual Learning Environments (VLEs).</li>
<li>Online teaching courses and course content.</li>
<li>Institutional repositories.</li>
<li>Digital collections used for study.</li>
<li>E-learning objects and teaching materials.</li>
<li>E-portfolios.</li>
</ul>
<p>In these cases, the resources are simply being accessed or delivered by a web browser so, except for those in a CMS, the preservation should be of the system rather than the pages as they appear on the website. In the case of a CMS both the system and the pages produced need to be preserved.</p>
<p>But the increasing use of Web 2.0 services and applications means that many resources exist in less well-managed systems, many of which are hosted outside the Institution, including:</p>
<ul>
<li>Blogs (e.g. Blogger, WordPress, Edublogs, Warwick Blogs).</li>
<li>Wikis (e.g. Mediawiki, Wetpaint, Tiddlywiki).</li>
<li>Social bookmarking (e.g. Delicious, CiteULike, Connotea).</li>
<li>Media sharing services (e.g. Flickr, Slideshare, YouTube, Scribd, DeviantArt).</li>
<li>Social networking systems (e.g. Facebook, Twitter, Ning, Elgg, Crowdvine, LinkedIn).</li>
<li>Collaborative editing tools (e.g. Google Docs).</li>
<li>Syndication and notification technologies (e.g. Netvibes, Technorati).</li>
<li>Instant messaging (e.g. Facebook Chat, Google Chat, Skype, Jabber, Windows Messenger).</li>
</ul>
<h2>Which web resources need to be preserved?</h2>
<p>To determine which web resources to preserve it is helpful to consider which of three categories best describes the resource – record, publication or artefact. If it is a record or a publication, the resource should be considered in the context of existing policies and procedures for these types of document.</p>
<p>A record is ‘recorded information, in any form, created or received and maintained by an organisation or person in the transaction of business or conduct of affairs and kept as evidence of such activity.’ (<a href="http://www.recordsmanagement.ed.ac.uk/InfoStaff/RMstaff/RM_framework.htm">http://www.recordsmanagement.ed.ac.uk/InfoStaff/RMstaff/RM_framework.htm</a>)</p>
<p>A web resource can be considered a record if it:</p>
<ul>
<li>constitutes evidence of business activity that needs to be referred to again;</li>
<li>is evidence of a transaction;</li>
<li>needs to be kept for legal reasons.</li>
</ul>
<p>Some examples:</p>
<ul>
<li>The website contains the only copy of an important record. Web resources should not be removed or deleted without establishing if they are the only copy.</li>
<li>The website, or a set of web pages, in itself constitutes evidence of Institutional activity. The history of this evidence is visible through the various iterations and changes of the website.</li>
<li>The website is in itself evidence of the publication programme, or has such evidence embedded within its systems. If it is necessary to show, as evidence, that the Institution published a particular document on a certain date, then the logs in the CMS constitute an evidential record. In some cases, this may be needed to protect against liability.</li>
<li>A transaction of some sort that has taken place through the website (transaction does not just mean money has changed hands). If these transactions need to be kept for legal or evidential reasons, then they are records too. The transaction may generate some form of documentation (e.g. automated email responses), which may in turn need to be captured out of the process and stored in a place where it can be retrieved and accessed.</li>
</ul>
<p>A publication is defined as follows: ‘a work is deemed to have been published if reproductions of the work or edition have been made available (whether by sale or otherwise) to the public’. (National Library of Australia <a href="http://www.nla.gov.au/services/ldeposit.html">http://www.nla.gov.au/services/ldeposit.html</a>)</p>
<p>A web resource might be considered a publication if it is:</p>
<ul>
<li>a web page that is exposed to the public on the website;</li>
<li>an attachment to a web page (e.g. a PDF or Word Document) that is exposed on the website;</li>
<li>a copy of a digital resource, e.g. a report or dissertation that has already been published by other means.</li>
</ul>
<p>Some examples:</p>
<ul>
<li>Websites containing the only copy of an important publication.</li>
<li>Web pages constituting a version of information that is available elsewhere. By version, it is meant that it has been rendered in some way to bring it into the website. This rendering may include, for example, the addition of navigation elements that make it different from the original source.</li>
<li>A web page constituting a mix of published information. For example, a page of original Institutional material combined with an RSS feed from outside the Institution.</li>
</ul>
<p>An artefact is defined as follows: ‘anything else that isn&#8217;t a record or a publication by the above definitions, but which is still worth preserving, can be understood as an artefact’.</p>
<p>A web resource might be considered an artefact if, for example, it:</p>
<ul>
<li>has intrinsic value to the Institution for historical or heritage purposes;</li>
<li>is an example of a significant milestone in the Institution&#8217;s technical progress, for example the first instance of using a particular type of software.</li>
</ul>
<p>Preserved artefacts could include image collections (still and moving), databases, e-learning objects, digitised objects or research objects.</p>
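<p>The three categories above can be sketched as a simple decision procedure. This is only an illustration: the field names, the order in which the checks are applied, and the "uncategorised" fallback are assumptions of the sketch, and a real appraisal would rely on the judgement of records managers and archivists rather than on boolean flags.</p>

```python
from dataclasses import dataclass


@dataclass
class WebResource:
    # All field names are hypothetical, chosen to mirror the criteria above.
    name: str
    evidence_of_business_activity: bool = False
    evidence_of_transaction: bool = False
    needed_for_legal_reasons: bool = False
    exposed_to_public: bool = False
    published_by_other_means: bool = False
    heritage_or_milestone_value: bool = False


def categorise(resource):
    """Return 'record', 'publication', 'artefact' or 'uncategorised'.

    Records are checked first, then publications; an artefact is anything
    else that is still worth preserving. A resource that meets more than
    one definition is reported under the first match only.
    """
    if (resource.evidence_of_business_activity
            or resource.evidence_of_transaction
            or resource.needed_for_legal_reasons):
        return "record"
    if resource.exposed_to_public or resource.published_by_other_means:
        return "publication"
    if resource.heritage_or_milestone_value:
        return "artefact"
    return "uncategorised"


# Hypothetical examples.
print(categorise(WebResource("CMS publication log", evidence_of_business_activity=True)))
print(categorise(WebResource("Prospectus PDF", exposed_to_public=True)))
print(categorise(WebResource("First departmental wiki", heritage_or_milestone_value=True)))
```

<p>Resources that come back as a record or a publication would then be considered under the existing policies and procedures for those types of document.</p>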
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top">
<div>Actions</div>
<ul>
<li>Investigate the web resources   in your Institution so that you can see the range of types and locations,   particularly concentrating on those which are more hidden such as the Web 2.0   applications and services. This will be refined following Chapter 5.</li>
<li>Assess the existing technical   infrastructure and technical skills.</li>
<li>Produce an outline list of   the resources to be preserved.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-2-%e2%80%93-what-are-web-resources/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chapter 1 – What is preservation?</title>
		<link>http://jiscpowrguide.jiscpress.org/chapter-1-%e2%80%93-what-is-preservation-2/</link>
		<comments>http://jiscpowrguide.jiscpress.org/chapter-1-%e2%80%93-what-is-preservation-2/#comments</comments>
		<pubDate>Wed, 16 Jun 2010 13:22:53 +0000</pubDate>
		<dc:creator>jiscpowrguide</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/?p=60</guid>
		<description><![CDATA[Summary: Definition of ‘web preservation’. To ensure that everyone in the Institution agrees on what should be preserved and how, a web preservation programme should be developed. All resources must be managed in order to preserve them. There are issues to bear in mind which are specific to web resources, Web 2.0 resources and content [...]]]></description>
			<content:encoded><![CDATA[<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="616" valign="top">
<div>Summary:</div>
<ul>
<li>Definition of ‘web   preservation’.</li>
<li>To ensure that everyone in   the Institution agrees on what should be preserved and how, a web   preservation programme should be developed.</li>
<li>All resources must be managed   in order to preserve them.</li>
<li>There are issues to bear in   mind which are specific to web resources, Web 2.0 resources and content   management systems.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<p>For the purposes of this Guide, we define web preservation as ‘the capture, management and preservation of websites and web resources’. Web preservation must be a <em>start-to-finish</em> activity, and it should encompass the <em>entire lifecycle</em> of the web resource.</p>
<p>Another definition to consider in this context is JISC’s definition of ‘digital preservation’ &#8211; ‘the set of processes and activities that ensures long-term, sustained storage of, access to and interpretation of, digital information’.</p>
<p>Institutional views of preservation requirements may vary so it is important for an Institution to agree on a web preservation programme which defines the web resources which will be preserved. When considering this bear in mind that:</p>
<ul>
<li>Resources must be managed in order to preserve them.</li>
<li>Preservation will not apply to all web resources: a selective approach is recommended.</li>
<li>Preserving every version of every resource is not always necessary.</li>
<li>Permanent preservation (as defined by the OAIS model) is not the only viable option. Short-term protection of a resource from loss or damage is an acceptable form of preservation.</li>
<li>Preservation actions do not have to result in a &#8216;perfect&#8217; solution.</li>
</ul>
<p>When considering web resources there are a number of specific preservation issues which apply. In addition, Web 2.0 and content management systems present unique issues.</p>
<h2>Web resource preservation issues</h2>
<p>1 &#8211; Frequency of change: Web resources change to a greater or lesser extent every day, and periodically change dramatically because of events such as re-branding, the implementation of a content management system, or changes to content providers.</p>
<p>2 &#8211; Quantity and range of resources: The quantity and range of resources potentially needing preservation are so large it is vital to: know what resources there are; where they are; and what to do about them.</p>
<p>3 &#8211; Continuity: Because of the ease with which websites and pages can be edited, the possible impact on users expecting continuity in web resources can be overlooked. For example, a page may stay the same, but no longer be available from the same URL, or it may remain at the same URL but its content changes. So the issues are: persistence of resources at a given URL; and persistence of resources within a domain.</p>
<p>Ideally it should be possible to support versioning across a whole site, so that old versions of a page link to their associated contemporary versions, but this represents a large overhead.</p>
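<p>The two continuity issues (a page that stays the same but moves to a different URL, and a URL that stays the same while its content changes) can be detected by comparing two captures of a site. The sketch below assumes each capture is available as a mapping from URL to page content; that data structure, the function names and the use of SHA-256 fingerprints are choices made for illustration, not a method prescribed by this Guide.</p>

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 digest of the page content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def compare_snapshots(old, new):
    """Compare two captures, each a dict mapping URL to page content.

    Returns a report listing URLs that are unchanged, changed in place,
    moved (same content found at a different URL) or missing.
    """
    old_hashes = {url: fingerprint(body) for url, body in old.items()}
    new_hashes = {url: fingerprint(body) for url, body in new.items()}

    # Index the new capture by content so moved pages can be found.
    new_by_hash = {}
    for url, digest in new_hashes.items():
        new_by_hash.setdefault(digest, []).append(url)

    report = {"unchanged": [], "changed": [], "moved": [], "missing": []}
    for url, digest in sorted(old_hashes.items()):
        if url in new_hashes:
            key = "unchanged" if new_hashes[url] == digest else "changed"
            report[key].append(url)
        elif digest in new_by_hash:
            report["moved"].append((url, sorted(new_by_hash[digest])))
        else:
            report["missing"].append(url)
    return report


# Hypothetical captures of a small site.
old_capture = {"/a": "About us", "/b": "Courses 2009", "/c": "Contact", "/d": "Old news"}
new_capture = {"/a": "About us", "/b": "Courses 2010", "/c2": "Contact"}
print(compare_snapshots(old_capture, new_capture))
```

<p>An exact-match fingerprint treats any edit, however small, as a change, so a page that is moved and lightly edited at the same time would be reported as missing rather than moved.</p>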
<p>4 &#8211; Integrity of web resources: Websites and pages need to be protected from careless or wrongful amendment, deletion, or removal, whether by malevolent hackers/crackers, or well-intentioned Institutional staff.</p>
<p>5 &#8211; Ownership: There may be issues of ownership resulting from web resources being managed by many different departments or members of staff, or from sub-sites sometimes being temporary or ad hoc (for example, a project site).</p>
<p>6 &#8211; Databases and deep websites: Databases present particular issues because preserving an underlying database may not preserve the user&#8217;s experience on the web. Also, database-driven websites are not always easy to capture by remote harvesting.</p>
<p>7 &#8211; Streaming and multimedia: The quantity and quality of data, and the range of formats, can cause issues when dealing with multimedia. In addition, these resources can be hosted elsewhere and therefore the same set of issues applies as for Web 2.0 applications (see below).</p>
<p>8 &#8211; Personalised websites: Some websites offer users customisable features. This raises the issue of whether every possible combination of every user&#8217;s custom view should be preserved.</p>
<p>9 &#8211; Appraisal and selection: Appraising and selecting which web resources should be preserved raises many questions which are dealt with in Chapter 5.</p>
<p>10 &#8211; Providing access: Once resources are preserved, it has to be considered how access to them will be provided and how to deal with issues of IPR and ownership.</p>
<p>11 &#8211; Resources for preservation: Both personnel and technical resource issues also have to be considered. Preservation work can be an overhead on day-to-day web and records management activities so assigning people to the preservation work needs to be balanced with routine web and records management.</p>
<p>In technical terms, it is necessary to estimate how much storage space will be required to store the old web resources and where this will be located.</p>
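<p>A first storage estimate can be made with simple arithmetic. The model below is an assumption made for illustration and does not come from this Guide: the first snapshot is stored in full, and each later snapshot adds a fixed fraction of the site size (1.0 meaning a full copy every time). The function name, parameters and figures are all hypothetical.</p>

```python
def estimate_storage_gb(site_size_gb, snapshots_per_year, years, change_fraction=1.0):
    """Rough storage estimate, in GB, for periodic website snapshots.

    The first snapshot is counted in full. Every later snapshot is assumed
    to add change_fraction of the site size; 1.0 means full copies,
    smaller values approximate storing only what has changed.
    """
    if snapshots_per_year < 1 or years < 1:
        raise ValueError("snapshots_per_year and years must be at least 1")
    if not 0.0 <= change_fraction <= 1.0:
        raise ValueError("change_fraction must be between 0 and 1")
    later_snapshots = snapshots_per_year * years - 1
    return site_size_gb + later_snapshots * site_size_gb * change_fraction


# Hypothetical figures: a 20 GB site captured quarterly for five years.
full_copies = estimate_storage_gb(20, 4, 5)
changes_only = estimate_storage_gb(20, 4, 5, change_fraction=0.25)
print(full_copies, changes_only)
```

<p>The estimate ignores growth of the site over time, compression, and any additional copies kept for backup, so it is best treated as a starting point for discussion with those responsible for storage.</p>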
<h2>Web 2.0 preservation issues</h2>
<p>The two most important issues with Web 2.0 software and applications are ownership and retention.</p>
<p>1 &#8211; Ownership and responsibility: Often individuals create and manage their own Web 2.0 resources such as external (personal) accounts for Flickr, Slideshare or WordPress.com. So it is possible for academics to conduct a significant amount of Institutional business outside any known Institution network. In these cases, the Institution either does not know this activity is taking place, or ownership of the resources is not recognised officially. In such a scenario, it is likely the resources are at risk.</p>
<p>2 &#8211; Retention of &#8216;master copies&#8217;: Third party sites such as Slideshare or YouTube are excellent for dissemination, but they cannot be relied on to preserve materials permanently. So, if a resource is created on one of these third party sites and it requires retention or preservation, arrangements must be made within the Institution for the &#8216;master copy&#8217;.</p>
<h2>Content management system preservation issues</h2>
<p>With digital preservation in mind, the features of particular value which content management systems (CMSs) may offer are:</p>
<ul>
<li>Version control &#8211; when changes are made to items in the CMS, the previous version is kept.</li>
<li>Change logging &#8211; when changes are made to items in the CMS, the system records who made the change and when.</li>
<li>Rollback/reversion &#8211; the facility to restore the website, or a part of it, to a previous state.</li>
<li>Creating a snapshot of the website at a particular point in time.</li>
</ul>
<p>Many CMSs offer one or more of these features, but the extent to which they can easily be used to reinstate older versions of a website, or to find what changes happened when, varies dramatically. Version control information is easy to create and store, but less easy to put to practical use. Discussions with web managers suggest that these features are rarely tested very rigorously.</p>
<p>The particular preservation issues of CMSs are:</p>
<ul>
<li>Page names and numbers.</li>
<li>Rollback function is limited.</li>
<li>Lifespan of system.</li>
<li>Compatibility between systems.</li>
</ul>
<p>Page names and numbers: Some CMSs may present problems to a remote harvesting engine, or crawler. For example, pages that are identified with numerical tags instead of page names may not be recognised, and hence may be missed by the remote harvester. This is especially true if the CMS generates pages dynamically. The severity of this behaviour may also depend on how the site was built in the first place.</p>
<p>Rollback function is limited: A rollback may not be the same as restoring a full snapshot as it will tend to focus on a particular page or content element, but not its entire context. Web pages usually have many objects that they relate to &#8211; for example embedded images and stylesheets &#8211; so the rollback cannot be used to view the content of the whole page as seen by the user. The content is held in the database as layers of time-stamped pages and a script is required to retrieve it. It is therefore not clear to what extent the rollback functions and version control tools produce useful, tangible outputs that could be captured, managed or preserved.</p>
<p>Compatibility between systems: A CMS may not be supported indefinitely so the question arises about whether the new version will be compatible with the old version. Also, the Institution may decide to change the CMS and, as CMS internal management of content, data and metadata tends to be application-specific, this may mean that moving large quantities of interlinked website content between CMS packages is likely to be a manual and intensive process.</p>
<p>Backing up is not enough: A CMS is a database full of content, but simply backing up the database will not constitute preservation of the content. The backup action would capture a change history of the website for as long as it was kept in that CMS; it would not constitute a usable collection of page snapshots, or an archived website.</p>
<p>Metadata: The change history metadata would be extremely useful for records management and preservation purposes, but access to that metadata is not guaranteed: it would need to be exportable in a form that could be preserved.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="604" valign="top">
<div>Action</div>
<ul>
<li>Define web preservation in   the context of your Institution.</li>
<li>Consider   to what extent the issues raised by Web 2.0 and content management systems   affect your Institution.</li>
</ul>
</td>
</tr>
</tbody>
</table>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/chapter-1-%e2%80%93-what-is-preservation-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction</title>
		<link>http://jiscpowrguide.jiscpress.org/introduction/</link>
		<comments>http://jiscpowrguide.jiscpress.org/introduction/#comments</comments>
		<pubDate>Fri, 11 Jun 2010 08:59:29 +0000</pubDate>
		<dc:creator>Joss Winn</dc:creator>
				<category><![CDATA[Web preservation]]></category>

		<guid isPermaLink="false">http://jiscpowrguide.jiscpress.org/hello-world/</guid>
		<description><![CDATA[The Preservation of Web Resources project (JISC PoWR) was funded by the JISC in order to identify emerging best practices for the preservation of web resources. The project was provided by UKOLN and ULCC and ran from April through to November 2008. A number of workshops were organised to help identify emerging best practices, and [...]]]></description>
			<content:encoded><![CDATA[<p>The Preservation of Web Resources project (JISC PoWR) was funded by the JISC in order to identify emerging best practices for the preservation of web resources. The project was provided by UKOLN and ULCC and ran from April through to November 2008. A number of workshops were organised to help identify emerging best practices, and a blog was established to raise awareness of this work and to gain feedback on the approaches being taken by the JISC PoWR team.</p>
<p>The project handbook was published in November 2008. Since then we have seen a growing awareness of the importance of digital preservation in general and in the preservation of web resources (including web pages, web-based applications and websites) in particular. The current economic crisis and the expected cuts across public sector organisations mean that a decade of growth and optimism is now over – instead we can expect to see reduced levels of funding available within the sector which will have an impact on the networked services which are used to support teaching and learning and research activities.</p>
<p>The need to manage the implications of these cutbacks is likely to result in a renewed interest in digital preservation. We are therefore pleased to be able to publish this new guide, based on the original <em>PoWR: The Preservation of Web Resources Handbook</em>, which provides practical advice to practitioners and policy makers responsible for the provision of web services.</p>
<p>Brian Kelly, UKOLN, Project Director, JISC PoWR project</p>
]]></content:encoded>
			<wfw:commentRss>http://jiscpowrguide.jiscpress.org/introduction/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

