Digital Preservation Plan for Personal Digital Archive
The purpose of this report is to identify the contents of a personal digital archive (PDA), provide a risk assessment, and produce a preservation plan that ensures continued access to and usability of the content. Digital preservation comprises the strategies and actions that can be applied to digital files and objects to secure long-term access.
A PDA preservation plan is similar to an enterprise or institutional preservation plan but on a smaller scale. It can share the same conceptual models, which help to implement preservation actions and processes and to locate the archiver within a complex lifecycle. The risks that these actions bring to the workflow and to the archive itself are also assessed.
The PDA consists of photographs in JPEG, TIF, and PNG formats on an Apple iMac (Intel iMac, 2020); the collection contains duplicates, which are stored under different file names. The Documents folder on an HP Pavilion 500-336 Desktop Computer (2014) contains a range of documents, from PDFs of old invoices to spreadsheets of family members’ addresses and Word documents of old university content. There are a dozen videos on a Samsung Galaxy 9 mobile phone, and older content is stored on CD-ROMs and 3.5-inch floppy disks. A portable USB thumb drive holds the outputs and working files of a personal creative project that includes video, photos, GIFs, PDF documents, and Photoshop, Illustrator, and InDesign files. See Table 1 for more information about the files and their locations.
Table 1: Personal Digital Archive (PDA) Collection Summary
The content of the PDA files is a mixture of images, video, graphics, and text. Given how quickly file formats can become obsolete with software upgrades that phase out support for older file types (Digital Preservation Coalition [DPC], 2015a), the actions needed to care for this collection will be further expanded upon in the Preservation Plan.
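The duplicates in the photograph collection, which are stored under different file names, can be located by comparing file contents rather than names. The following is a minimal sketch only, assuming the files have been copied to a single folder and are small enough to read into memory; the function name `find_duplicates` is illustrative and does not come from any preservation tool. It detects only byte-for-byte identical copies, so an image that has been re-saved or converted to another format would not be matched.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Assumption: files are small enough to read whole into memory.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    # Keep only the digests that more than one file shares.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists the paths that share identical content, which leaves the decision about which copy to keep with the preserver.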
The major risks to the continued accessibility and usability of the digital files in the PDA centre on the point that “file formats encode information into forms that can only be processed and rendered comprehensible by very specific combinations of hardware and software. The accessibility of that information is therefore highly vulnerable in today’s rapidly evolving technological environment” (Brown, 2008).
Current, specific risks to the PDA include: technology obsolescence; technology fragility; means of access; natural or technological disaster; insufficient resources; significance of items may not be recognised; inadequate metadata; maintaining critical aspects of functionality (Oliver & Harvey, 2016). Table 2 expands on the risks.
Table 2: Risks to Personal Digital Archive
Note: Table adapted from Personal Digital Archiving (Redwine, 2015), and Risk Management of Digital Information: A File Format Investigation (Lawrence et al., 2000).
For this PDA, the files reside on several hardware devices and rely on particular software to be rendered usable. A goal of managing the risk to these files is to migrate them to a state that is suitable for preservation (Wheaton College Library and Archives, 2019, p. 3). Risk mitigation for the migration of digital files for preservation needs to consider file ubiquity, support, stability, metadata support, interoperability, viability, and re-usability (Brown, 2008).
Migration tools carry risks, which can include changes to the content that are invisible to the preserver. Mitigation actions include setting an acceptance criterion that defines the extent to which the preserver tolerates changes to the format or functionality of the content (Digital Preservation Coalition [DPC], 2015a).
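An acceptance criterion of this kind can be written down as a list of significant properties, each with the amount of change that is tolerated, and then checked after every migration. The following is a sketch under stated assumptions: the `Criterion` structure, the `check_migration` function, and the property names in the usage example are all hypothetical, and the property values are assumed to have been recorded separately with a characterisation tool before and after migration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One acceptance rule for a significant property (hypothetical structure)."""
    name: str
    tolerance: float = 0.0  # largest absolute numeric change the preserver accepts


def check_migration(before: dict, after: dict, criteria: list[Criterion]) -> list[str]:
    """Describe each property that changed by more than the tolerated amount."""
    failures = []
    for rule in criteria:
        old, new = before.get(rule.name), after.get(rule.name)
        if old is None or new is None:
            failures.append(f"{rule.name}: missing from before or after record")
        elif isinstance(old, (int, float)) and isinstance(new, (int, float)):
            # Numeric properties may drift within the stated tolerance.
            if abs(new - old) > rule.tolerance:
                failures.append(f"{rule.name}: {old} -> {new}")
        elif old != new:
            # Non-numeric properties must match exactly.
            failures.append(f"{rule.name}: {old!r} -> {new!r}")
    return failures
```

For example, with criteria for `width_px`, `height_px`, and `colour_space`, a migrated image that has lost one row of pixels would be reported as `height_px: 3000 -> 2999`, while an empty list means that the migration meets the criteria.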
The mitigation of these risks is explored as part of the Preservation Plan and is upheld by the Levels of Digital Preservation described under Benchmarking. The goal of a PDA preservation plan is to ensure continued access to the files contained in it over a long-term time frame. This section of the report outlines the actions needed to preserve the PDA’s digital files while taking the risk factors into account.
A PDA preservation plan is not simply a backup of content but a series of managed activities and actions that safeguard digital content and enable it to be rendered as needed, regardless of technical failure or obsolescence (Purdue University, 2020). For a PDA, common advice for novices is first to identify and decide which files are most important and where those files are located (Library of Congress & NDIIPP, 2013).
To put that in terms of the professional DCC Curation Lifecycle Model, seen in Appendix A (Digital Curation Centre, 2021), identifying and deciding sit within the Full Lifecycle Action of Preservation Planning and within the Curate part of the Curate and Preserve action. They also sit within the Sequential Action of Appraise and Select. Locating the PDA collection within the DCC Curation Lifecycle Model allows a preservationist to understand where their content is in the cycle and allows better decision-making about other actions that need to be taken.
Benchmarking is a method to measure both a preservation archive’s organisational capabilities and the technological facets of its actions, strategies, and policies (Carden et al., n.d.). While various models are available, this report uses a fairly simple benchmark model, the National Digital Stewardship Alliance’s Levels of Digital Preservation, to implement best practices (NDSA, n.d.). Table 3 describes the baseline practices against which this report’s PDA should be benchmarked, within the Levels of Digital Preservation model, seen in Appendix B (NDSA, n.d.). The model “features actions independent of specific formats, content type, and storage systems, thus enhancing their usability across domains” (NDSA, 2019).
Table 3: Personal Digital Archive (PDA) Project: Levels of Digital Preservation
Note: Table is adapted from elements of the Levels of Digital Preservation (NDSA, n.d.), seen in Appendix B. Levels indicated here are to be achieved at the end of the Store stage in the DCC Lifecycle.
From this point in the implementation of the preservation plan, as seen in the Curation Lifecycle Model (see Appendix A), the stages that follow Appraise and Select are Ingest, Preservation Action, and Store. This report ends with the PDA entering the Access, Use, and Reuse stage. It is recommended that any future files created for this PDA follow the DCC Curation Lifecycle from conceptualisation, in the Create or Receive step, so that they include robust metadata (Digital Curation Centre, 2021).
The files coming into the archive environment for the PDA are part of the Ingest stage. They now need actions that preserve the content and prepare it for long-term storage, which is the Preservation Action stage of the DCC Lifecycle (see Appendix A; Digital Curation Centre, 2021). Preservation Actions are the process of combining the files and their corresponding metadata (known as a Submission Information Package, or SIP) in preparation for taking them into preservation storage (Wheaton College Library and Archives, 2019, p. 10).
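The idea of combining files with their metadata into one package can be illustrated with a small sketch. The folder layout used here (a `data` folder beside a `metadata.json` file) and the function name `build_sip` are assumptions made for illustration only; they are not a layout prescribed by the sources cited in this report.

```python
import json
import shutil
from pathlib import Path


def build_sip(sources: list[Path], metadata: dict, sip_dir: Path) -> Path:
    """Copy files into sip_dir/data and write their metadata beside them."""
    data_dir = sip_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=False)  # refuse to reuse an existing package
    for source in sources:
        # Assumption: source file names are unique, so no copy overwrites another.
        shutil.copy2(source, data_dir / source.name)  # copy2 also tries to keep timestamps
    record = {"files": sorted(s.name for s in sources), "metadata": metadata}
    (sip_dir / "metadata.json").write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sip_dir
```

Keeping the metadata in a plain-text file beside the content means that the package can still be understood without any particular software.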
The file formats contained in the PDA collection are arranged by content type in Table 4, regardless of their location, together with the migration that some files will need to undergo to be best suited for preservation (Digital Preservation Coalition [DPC], 2015a). This is a process of standardising the files allowed into the PDA to those formats that are most resistant to degradation and obsolescence over a longer time frame (Carden et al., n.d.). It is recommended that a master file is created in a preservation-appropriate format.
Table 4: Personal Digital Archive File Format Migration
Note: The file migrations are based on the recommendations found in National heritage digitization strategy – Digital preservation file format recommendations (Canadian Heritage Information Network, 2019), and Sustainability of digital formats: Planning for Library of Congress collections (Library of Congress, 2021).
During the Ingest stage, format identification, validation, and characterisation are necessary steps in migration. They allow the preservationist to identify a file’s format, detect broken files, and find the significant properties that characterise the file, such as technical metadata, respectively (Carden et al., n.d.). A freely downloadable software tool that allows the preservationist of this PDA to carry out these actions is JHOVE (https://jhove.openpreservation.org/) (Open Preservation Foundation, 2015).
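Format identification generally relies on the first bytes of a file (its signature) rather than on its file extension, which may be missing or wrong. The sketch below illustrates the principle for a few of the formats in this PDA. It is not how JHOVE is implemented, it covers only a handful of well-known signatures, and it performs no validation or characterisation; the function name `identify_format` is illustrative.

```python
from pathlib import Path
from typing import Optional

# Leading byte signatures for a few of the formats named in this report.
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian byte order
    (b"MM\x00*", "TIFF"),  # big-endian byte order
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
]


def identify_format(path: Path) -> Optional[str]:
    """Return a format name based on the file's first bytes, or None if unknown."""
    with path.open("rb") as handle:
        header = handle.read(16)
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None
```

A file whose extension disagrees with the identified format, or for which `None` is returned, is a candidate for closer inspection with a dedicated tool.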
The UK National Archives provides a free-to-use resource of technical information called PRONOM (The National Archives, n.d.). Many useful tools are available, including DROID, which identifies file formats. External projects endorsed by PRONOM include a tool registry called COPTR, which is useful for finding the tools needed for long-term digital preservation tasks; it can be searched by stage, function, content type, or file format, and by DCC Lifecycle stage (COPTR contributors, 2021b). It is recommended to search COPTR and use the available tools that address the risks identified in the Risk Assessment.
As an example of what is available from COPTR, its Ingest page lists functions that help with Ingest, such as Dependency Analysis, File Format Identification, Fixity, Metadata Extraction, Metadata Processing, Quality Assurance, and Validation. It also lists tools that carry out those functions within Ingest, or at any other stage of the DCC Lifecycle (COPTR contributors, 2021a).
Metadata, which is largely descriptive and administrative information about the file, is needed to continue to understand a preserved file. Metadata is essential for managing information resources, resource identification and description, and retrieving information (Hilder, 2018, p. 5). The tools listed in COPTR, recommended previously, are well suited to extracting existing metadata. Metadata capture is a function of the Ingest stage of the Lifecycle and supports a file’s Authenticity (original, complete, intended), Provenance (chain of custody), and Integrity (checksum) (Oliver & Harvey, 2016).
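Integrity is commonly checked by recording a checksum for each file at ingest and recomputing it later; a mismatch shows that the file has changed. The following is a minimal sketch, assuming SHA-256 as the checksum algorithm and using the illustrative function names `build_manifest` and `verify_manifest`; dedicated fixity tools offer considerably more than this.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Return the files that fail fixity, marked as 'missing' or 'changed'."""
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "changed"
    return problems
```

The manifest would be stored with the archive's metadata, and `verify_manifest` run on a regular schedule; an empty result means every recorded file is present and unchanged.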
As explained in Table 3, three complete copies of a file are recommended, with at least one copy in a separate geographic location. It is also recommended that the storage include a cloud-based location, such as Dropbox or another professional cloud repository; an external storage device, such as an external hard drive or a thumb drive; and a local, accessible location, such as a desktop or laptop (Digital Preservation Coalition [DPC], 2015b).
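The storage recommendation can be expressed as a simple rule and checked against a written list of where the copies are kept. This is a sketch only: the `StoredCopy` structure, the function name, and the free-text `site` labels are hypothetical, and "a separate geographic location" is interpreted here as the copies being spread over at least two distinct sites.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCopy:
    """One complete copy of the archive (hypothetical record structure)."""
    medium: str  # e.g. "cloud", "external drive", "desktop"
    site: str    # free-text label for where the copy physically resides


def meets_storage_baseline(copies: list[StoredCopy], min_copies: int = 3) -> bool:
    """Check for enough copies, spread over at least two distinct sites."""
    distinct_sites = {copy.site for copy in copies}
    return len(copies) >= min_copies and len(distinct_sites) >= 2
```

Three copies that all sit in the same house would fail this check, which reflects the risk of a single natural or technological disaster affecting every copy at once.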
The goal of this report is to identify and assess theoretical and practical issues related to digital curation and preservation in the context of a PDA, and to identify the risks, lifecycles, concepts, tools, and technology needed to carry out a PDA project effectively. Moving the PDA files through the DCC Lifecycle from Appraise and Select to Store ensures that the appropriate actions are taken to safeguard the content of the files and the processes that they go through.
Appendix A: The DCC Curation Lifecycle Model
Appendix B: Levels of Digital Preservation
Digital Preservation Plan for Personal Digital Archive report created for the Digital Curation & Preservation course in the Master of Information Studies at Charles Sturt University.