Table of Contents
Practical advice for web and records managers
The Preservation of Web Resources project (JISC PoWR) was funded by JISC to identify emerging best practices for the preservation of web resources. The project was carried out by UKOLN and ULCC and ran from April to November 2008. A number of workshops were organised to help identify emerging best practices, and a blog was established to raise awareness of this work and to gain feedback on the approaches being taken by the JISC PoWR team. The project handbook was published i [...]
Summary: Definition of ‘web preservation’. To ensure that everyone in the Institution agrees on what should be preserved and how, a web preservation programme should be developed. All resources must be managed in order to preserve them. There are issues to bear in mind which are specific to web resources, Web 2.0 resources and content management systems. For the purposes of this Guide, we define web preservation as ‘the capture, management and prese [...]
Summary: Web resources are those delivered by a web browser, and can be found on web servers, in managed systems (including content management systems, Institutional repositories and digital collections), and less well-managed systems (such as Web 2.0 applications and services). To help with determining whether they should be preserved, web resources should be categorised into records, publications or artefacts. Web resources are those delivered by a web browse [...]
Summary: The drivers for carrying out web preservation are strategic, legal, financial, contractual and reputational. Web preservation also has a role to play in business continuity planning. The espida project provides a useful methodology for quantifying the value of web preservation, which is helpful when developing a business case for a web preservation project. There are many internal and external drivers for undertaking web preservation within an HFE Institution and some of these [...]
A web preservation programme must involve planning and activities for all the stages in the lifecycle. It involves two main phases: analysis and planning; and execution. The table below shows these phases divided into sequential stages, and the tasks included in each stage. The column on the right points to the relevant chapter(s) in this Guide which will provide help. It may prove more successful for an Institution to manage the individual stages as a series of clearly defined projects, with [...]
Summary: The appraisal process looks at the location, use and ownership of the web resources, and particularly aims to ensure that web resources containing unique information are preserved. Resources which are managed elsewhere (e.g. in asset collections or Institutional repositories), or are of little or no value, can be omitted from the web preservation programme. The MoSCoW method enables prioritisation of resources. The selection approach can be unselective (domai [...]
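As an illustration of the MoSCoW prioritisation mentioned in the summary above, the following is a minimal sketch, not a method taken from the Guide. It assumes the usual MoSCoW categories (Must, Should, Could, Won't); the `WebResource` class, its fields and the example resource names are all hypothetical.

```python
from dataclasses import dataclass

# MoSCoW categories in descending order of priority (standard reading of the acronym).
MOSCOW_ORDER = ["Must", "Should", "Could", "Won't"]


@dataclass
class WebResource:
    name: str                        # hypothetical label for a web resource
    priority: str                    # one of MOSCOW_ORDER, assigned during appraisal
    managed_elsewhere: bool = False  # e.g. already held in an Institutional repository


def preservation_shortlist(resources):
    """Omit resources managed elsewhere or rated Won't, then order by priority."""
    kept = [
        r for r in resources
        if not r.managed_elsewhere and r.priority != "Won't"
    ]
    # sorted() is stable, so resources with equal priority keep their input order.
    return sorted(kept, key=lambda r: MOSCOW_ORDER.index(r.priority))


# Hypothetical appraisal results, for illustration only.
EXAMPLE = [
    WebResource("departmental news archive", "Could"),
    WebResource("prospectus pages", "Must"),
    WebResource("e-prints", "Must", managed_elsewhere=True),
    WebResource("staff intranet notices", "Should"),
    WebResource("test pages", "Won't"),
]

SHORTLIST = [r.name for r in preservation_shortlist(EXAMPLE)]
```

Here `SHORTLIST` would hold the three remaining resources in Must, Should, Could order. A real appraisal would rest on the location, use and ownership questions described in the chapter rather than on a single priority field.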
Summary: The capture of web resources can be carried out within the authoring system or server, at the browser or with a crawler. Each process has its advantages and disadvantages. Tools are available for each type of capture. Tools: A number of tools are available for capturing web resources: tools for capturing web resources; workflow systems and curatorial tools; and snapshot tools. For more, see Appendix C. Netpreserve.org and Harvard University Library als [...]
Summary: The Institution must take ownership of the web preservation programme at the highest level so that staff and students are motivated to take part. There are some national and international initiatives which might help. Success with the preservation of web resources will potentially involve the participation and collaboration of a wide range of experts: information managers, asset managers, web managers, IT specialists, system administrators, records managers, [...]
Summary: Four main operational models can be considered. The quick win approaches include domain harvesting, carrying out pilot projects and considering using the EDRMS. Strategic approaches include information lifecycle management, adapting records management approaches, and the continuity approach. Operational models: There are various operational models for developing and implementing a web preservation programme. Selecting a model will depend on balancing s [...]
Summary: All existing policies which may have an impact on web preservation should be located and assessed. A policy review should be carried out. It is unlikely that any Institution will have a single policy or mission statement that governs everything that should be happening in relation to websites and web resources. Any relevant Institutional statements are probably scattered across several places and departments; further, any guidance relating to the creation, sto [...]
Scenario: Your Institution is about to commemorate an important event and the Vice Chancellor wants to highlight that the Institution is actively engaging with new technologies, and so would like to provide an example of how the Institution's website has developed since it was launched. Issues: How has your Institutional home page changed over time? Have you kept records of the changes and the decisions which were made (and how they were made)? If you needed to do this for [...]
Scenario: An Institution has set up an Institutional Twitter account for disseminating news on activities and events. The unspoken expectation is that Twitter will be used across the Institution as an individual productivity and social tool. However, one department has become an early adopter of the technology and is using it in teaching and learning, and research contexts. The Head of this department is now suggesting that a formal policy for capture and preservation of Twitter me [...]
Scenario: A project team in the Institution has purchased a top level domain with a .org suffix, outside the main Institution domain, in order to expose and store its project outputs. The project is now developing into a successful service, there are numerous dependencies, and users have come to trust the domain. But the project manager failed to renew the domain name subscription, and it has now been purchased by a third party. This third party is requesting a significant fee to r [...]
Description: Your Institution runs a blog service for students and staff. One enthusiastic alumna wants to migrate the extensive blog she has kept for three years, but your Institution systematically deletes files and accounts held by students on Institution servers shortly after they graduate. How should the Institution respond if students wish to maintain or migrate the contents of their blog (plus embedded resources, comments, etc.)? Issues: Should the option be open to stud [...]
Scenario: Wikis are now at the online heart of innumerable projects - for teaching, research, publishing and business. When the project is complete, what should be done with the content to ensure it is retained? Does wiki software allow for this? Issues: Many wikis, such as Wetpaint, Wikidot, MediaWiki and Confluence, have a backup option which will enable the capture of wiki content. However, all these options produce imperfect results. In contrast to this, if a spide [...]
This Guide is one of the outputs from the JISC-funded PoWR project carried out jointly by ULCC (University of London Computer Centre) and UKOLN (University of Bath). One of the goals of PoWR is to make current trends in digital preservation meaningful and relevant to information professionals with the day-to-day responsibility for looking after web resources. Anyone coming for the first time to the field of digital preservation can find it a daunting area, with very distinct terminology and c [...]
Tools for capturing web resources: Web harvesting engines are essentially web search engine crawlers with special processing to extract specific fields of content from web pages. The shortcomings of crawlers are described in Chapter 6. Heritrix: Heritrix is a free, open-source, extensible, archival-quality web crawler. It was developed, and is used, by the Internet Archive and is freely available for download and use in web preservation projects under the terms of t [...]