
Digitization & Digital Preservation


Digitizing historic materials has many benefits. It can be part of a preservation strategy for at-risk materials because it limits handling of the originals, and it can greatly increase access if the digital versions are made freely available online. However, not everything can or should be digitized: digitization takes resources and requires the right to reproduce the materials, and some collections don’t warrant item-level attention.

Following the basic framework below can help to focus and plan a digitization project. In addition, find introductory information about digitization from the State Archives of North Carolina, the Pennsylvania Historical and Museum Commission, the Historical Society of Pennsylvania, and the Utah Division of Archives and Records Service.

It is crucial to plan the entire project in some detail before you begin. Understand why you want to digitize and what the outcomes will be. Then, set your benchmarks and procedures for scanning, and detail how you will capture metadata. Finally, you will need to have a plan for storage, access, and preservation. Connecting to Collections Care has a webinar about the basics of a digitization project, and Recollection Wisconsin and the Minnesota Historical Society both have basic digital project planning worksheets that can be very helpful in planning your project.

Selection

You should ask some basic questions about the materials you are considering for digitization before moving forward.

  • Do the materials meet the mission and goals of your institution?
  • Are the materials important enough to warrant the time and money spent on digitization?
  • Will there be enough research demand for these materials? Or will the materials be relevant to a community event or historic milestone?
  • Do you have the right to copy the materials? Do you have the right to make them available on site and/or online?
  • Are there any privacy or confidentiality restrictions within the materials?
  • Are the materials stable enough to go through the physical handling of scanning or photographing?

If the answer is no to any of these questions, the materials may not be a good candidate for digitization.

If the answer is yes to the following questions, then the materials may not be a good candidate for digitization:

  • Do the materials need a lot of work (flattening, removal of fasteners, arrangement and description) before they can be digitized?
  • Have the materials already been digitized by another organization?

The Northeast Document Conservation Center has a helpful leaflet on selection for digitization.

Copyright

Copyright, or owning the right to reproduce and distribute a work, is an essential consideration when selecting materials for digitization. Was it clear that your institution was allowed to share the materials freely when they were acquired? The person with physical custody can give permission to use the materials, but if that person is not the copyright holder, they cannot transfer the copyright to your organization. If the materials are being scanned simply to create digital backups for preservation, then there should be no issue, but if the goal is to allow public access, either on site or especially via the internet, copyright needs to be considered. The Cornell University Library has a helpful chart to guide you through the complexities of U.S. copyright law and to help determine when works fall into the public domain.

Even if the images are under copyright, there are still some situations where they can be used with minimal risk, if the use falls under what’s called fair use. This is determined on a case-by-case basis by considering four factors: the nature of the work (e.g., news vs. creative work), the purpose and character of the use (e.g., educational vs. commercial), the amount to be used, and the effect on the potential market for the work (e.g., if free copies of the work are widely available, fewer people will purchase it, which reduces the copyright holder’s potential to profit from the work). The Cornell University Library has a checklist to help in determining if something is fair use. Copyright is a complex issue; Recollection Wisconsin has a helpful overview of copyright, and you can learn more from this University of Illinois Libraries guide and from the Cornell University Library.

Reformatting

Scanning, photographing, or otherwise digitally capturing the materials is the technical piece of any digitization project. You will need to determine the resolution of the image capture and the format in which the resulting file will be saved. This will depend upon the type of material you are digitizing (e.g., photographs vs. text) and the ultimate use of those materials (e.g., public access via the web vs. print-quality images for publication); you may need a different approach for different types of material. You can find a good overview of scanning from Recollection Wisconsin and a basic quick guide to scanning requirements in their digital project planning worksheet. The Library of Congress has a brief tutorial on the basics of scanning, and Cornell University has a more in-depth digital imaging tutorial. The State Archives of North Carolina has a helpful video on technical concepts used in digitization.
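To make the trade-off between resolution and storage concrete, here is a minimal sketch that estimates pixel dimensions and uncompressed file size for a scan. The 300 and 600 ppi figures and the 8 x 10 inch photograph are illustrative assumptions, not requirements taken from any of the guides mentioned above; check the linked worksheets for recommended settings.

```python
def scan_dimensions(width_in, height_in, ppi):
    """Return (width_px, height_px) for an original measured in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_size_mb(width_px, height_px, bits_per_pixel=24):
    """Approximate uncompressed size in megabytes (1 MB = 1,000,000 bytes).

    24 bits per pixel corresponds to 8-bit RGB color.
    """
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Hypothetical example: an 8 x 10 inch photograph scanned at two resolutions.
for label, ppi in [("access copy", 300), ("master copy", 600)]:
    w, h = scan_dimensions(8, 10, ppi)
    print(f"{label}: {w} x {h} pixels, about {uncompressed_size_mb(w, h):.1f} MB")
```

Doubling the resolution quadruples the pixel count, so storage needs grow quickly; this is worth estimating before a large project begins.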

While many digitization projects can be accomplished in-house, you may have a project that, due to scale, condition, or format, would be best served by contracting with a professional vendor. The VHRP polled stakeholders for recommendations on digitization vendors who serve the Vermont area. Contact us for information on vendors with general digitization services, newspaper digitization services, and audio-visual media digitization services.

Metadata

Metadata is often defined as “data about data.” It documents what is known about the materials and makes them easier to find, use, and manage. There are several kinds of metadata that can be created, including:

  • Descriptive metadata describes a resource so the user can search for it and find it. This information is typically entered manually in a catalog or content management system.
  • Technical metadata is captured automatically by the computer software that created the file and includes information about the technical processes used to produce it. This is the information that is embedded in your file when you take a photograph on your digital camera or phone.
  • Structural metadata is what ties multiple digital objects together and explains their relationship, such as a book with each page digitized as a separate image.
  • Administrative metadata is information used to manage the materials, such as how you acquired the item, its copyright status, and any preservation actions taken on the materials.

There are also different types of standards for metadata. There are structural standards, which define the fields or the pieces of information you are going to collect about an item. For example, when describing an item in your catalog, you may need to capture the title, the author or creator, and the dates. A common descriptive metadata structure is Dublin Core. Then, there are content standards, which govern how you format the information within those fields. For example, you might enter the author’s or creator’s name as last name, first name. A common content standard for archival materials is Describing Archives: A Content Standard (DACS). Finally, there are controlled vocabularies, such as the Library of Congress Subject Headings, that provide consistency to the values entered. Using the same, consistent terms helps ensure that, when your materials are searched, all relevant results will be returned.
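As a rough illustration of how these three layers fit together, the sketch below builds one descriptive record using Dublin Core element names (the structure), a "Last, First" name rule (the content standard assumed here), and a subject term standing in for a controlled-vocabulary value. The item, its values, the `<record>` wrapper element, and the helper names are all hypothetical; a catalog or content management system would normally handle this for you.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def format_name(first, last):
    """Content rule assumed for this sketch: enter names as 'Last, First'."""
    return f"{last}, {first}"


def dublin_core_record(fields):
    """Build a <record> element with one dc:* child per (element, value) pair."""
    record = ET.Element("record")  # wrapper name is an arbitrary choice
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Hypothetical item; every value below is invented for illustration.
record = dublin_core_record([
    ("title", "Town band on Main Street"),
    ("creator", format_name("Jane", "Doe")),
    ("date", "1912"),
    ("subject", "Bands (Music)"),  # stand-in for a controlled-vocabulary term
])
print(ET.tostring(record, encoding="unicode"))
```

Keeping the name rule in one function means every record formats creators the same way, which is the point of a content standard.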

Recollection Wisconsin has an overview of metadata, and the Getty Research Institute has a more in-depth primer.

Storage and Access

What will you use for long-term storage of the digital files once you’ve captured them? If the goal of the digitization project is to give the public access to the files, how will that happen? Will you upload them to your website, or use an online discovery or exhibit platform? And if your materials are for public use, you’ll need to plan how to get the word out! How will you promote this new resource to potential users? Recollection Wisconsin has a great overview of storage for digital collections.

Storage of digital collections is intimately connected to their preservation; you will also need to consider how to preserve these digital files over time. You can learn more about that in the next section.

Digital Preservation

There is a common misconception that “going digital” preserves records forever, but this is not the case; digital files are faced with many conditions that threaten their longevity. Digital files are dependent upon software that can be replaced by newer versions, leaving those older files unusable. The media that files are stored on can deteriorate over time, leaving the files corrupted. And digital files are particularly vulnerable to loss of context; for example, if you have a folder of file names with no other information, then you have no context for how they were created or how the files relate to one another. Digital materials may be either “born digital” or analog materials that have been digitized. In either case, they require a set of strategies and skills to ensure that they persist over time.

There are several strategies for preserving digital files, and they are often used in tandem, as no single approach solves every problem. Digital preservation can include:

  • Replication, which involves making authentic copies and storing them in different geographic locations.
  • Refreshing, which involves copying materials from an unstable medium to a stable one (e.g., going from an old hard drive to a new one).
  • Migration, which involves transferring data in a file from an older generation of software to a new one (e.g., migrating from Microsoft Word 97 to Microsoft Word 2016).
  • Normalization, which involves migrating files to standardized formats (e.g., all Word documents become PDF/A).
  • Emulation, which involves imitating the old software environment to run older files on newer equipment. This is especially important for files where the look, feel, and behavior need to be retained (e.g., databases).
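Replication only helps if each copy really is identical to the original. One common way to confirm this, assumed here rather than prescribed by the guides above, is to compare checksums before and after copying. The following is a minimal sketch using Python's standard library; the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destination_dir):
    """Copy source into destination_dir and confirm the copy is bit-identical."""
    destination_dir = Path(destination_dir)
    destination_dir.mkdir(parents=True, exist_ok=True)
    copy_path = destination_dir / Path(source).name
    shutil.copy2(source, copy_path)  # copy2 also tries to preserve timestamps
    if sha256_of(source) != sha256_of(copy_path):
        raise IOError(f"Checksum mismatch for {copy_path}")
    return copy_path
```

For example, `replicate("scans/letter_001.tif", "/mnt/offsite/scans")` would copy a single file and raise an error if the copy differs; both paths are placeholders. Saving the checksum alongside the file also lets you re-check the copy later for signs of media deterioration.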

Digital preservation is more complex than preserving analog materials, but don’t wait until you have all the answers before getting started. There are actions you can take right away, such as making an inventory of all the digital files in your institution’s custody. You can’t preserve everything at once, so your inventory can help identify priority materials, such as those that exist in few copies, are in non-standard file formats, or require specialized software to run. A good rule to remember is the 3-2-1 rule in digital preservation: 3 copies on at least 2 different kinds of media, with 1 copy stored somewhere different from the other copies. There are many free, open-source tools to help with digital preservation efforts in-house, as well as commercial or consortial platforms that come with a cost.
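Both the inventory and the 3-2-1 rule can be sketched in a few lines. The example below is a starting point under stated assumptions, not a substitute for a dedicated preservation tool: the function names, CSV columns, and the sample list of copies are hypothetical, and the 3-2-1 check reads "1 copy stored somewhere different" as "the copies are spread over at least two locations."

```python
import csv
import os


def inventory(root):
    """Yield one row (path, extension, size in bytes) per file under root."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            extension = os.path.splitext(name)[1].lower()
            yield {"path": path, "extension": extension,
                   "bytes": os.path.getsize(path)}


def write_inventory(root, csv_path):
    """Save the inventory of root as a CSV file that opens in a spreadsheet."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["path", "extension", "bytes"])
        writer.writeheader()
        writer.writerows(inventory(root))


def satisfies_3_2_1(copies):
    """copies: list of (media_kind, location) pairs describing each copy."""
    media_kinds = {media for media, _location in copies}
    locations = {location for _media, location in copies}
    return len(copies) >= 3 and len(media_kinds) >= 2 and len(locations) >= 2


# Hypothetical set of copies for one collection.
copies = [
    ("external hard drive", "office"),
    ("network server", "office"),
    ("cloud storage", "offsite"),
]
print(satisfies_3_2_1(copies))  # prints True
```

Sorting the resulting CSV by extension is a quick way to spot unusual file formats. Write the CSV to a location outside the folder being inventoried so that it does not list itself.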

There are many resources available to you to learn more about digital preservation. The Northeast Document Conservation Center has an overview of digital preservation and the Wisconsin Historical Society has an overview of storage and preservation for digital materials. For more in-depth information, consult the Digital Preservation Coalition’s Digital Preservation Handbook, or you can take a series of webinars about caring for digital records through Connecting to Collections Care or the Digital POWRR Project. To help you understand the terminology used in the digital preservation world, consult the National Digital Stewardship Alliance’s glossary.



Contact Information

Vermont State Archives & Records Administration

1078 Route 2, Middlesex

Montpelier, VT 05633-7701

Contact VSARA

Phone & Hours

Main Line: 802-828-3700

Fax: 802-828-3710

Office Hours: 7:45 AM to 4:30 PM, M-F

Reference Room: 9 AM to 4 PM, M-F

Closed State Holidays

Tanya Marshall, State Archivist & Director

