DH Open Office Hours: Digitization
On April 23, 2025, the Digital Scholarship Group hosted a DH Open Office Hours session introducing digitization, held in the presentation space within the Centers for Digital Scholarship. For many digital scholarship research projects, one of the first steps is digitizing analog and paper materials, and managing that digitization well is crucial to a successful digital project. Kim Kennedy, the Digital Production Librarian, gave an introduction to planning and managing a digitization project. The talk focused on best practices for digitization and on the digitization resources available within Snell Library. The plan is to hold a larger, more workshop-style event on digitization this Fall.
Kim described the different technical tools available for digitization, including flatbed scanners, sheet-fed scanners, planetary scanners, camera-based systems, slide scanners, and microfilm scanners. Two flatbed scanners are available for use within the CDS Digitization Lab: the Epson Perfection V850 Pro and the Epson Expression 13000XL. The V850 Pro can capture 8.5×11 in. documents, as well as slides and negatives. The 13000XL can capture 11×17 in. documents. Digital Production Services in Snell also has a planetary scanner, which is especially useful for large-format scanning.
After discussing the various equipment used to digitize printed materials, Kim spoke in more detail about best practices for file types. TIFF is an industry standard for digitization because it allows for high-quality images. Some projects may require images in JPEG format (to share on Wikimedia Commons, for example), while others may require PDF. Kim explained that PDF is the required format for using OCR (Optical Character Recognition), a technology that creates searchable PDFs by adding a plain-text layer to the images (and can also create separate plain-text files for uses such as text analysis). She emphasized the importance of manually adding tags (such as those identifying headers and other document structure) to searchable PDFs to make them accessible to screen readers.
Kim defined “resolution” as the number of pixels the scanner captures per inch of the original material, usually expressed in ppi (pixels per inch) or dpi (dots per inch). She suggested 400 dpi for most materials, increasing to 600 dpi if the image will be displayed at a large size. Interestingly, she revealed that slides and negatives require between 3,000 and 4,000 dpi because the originals are so small.
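To make the resolution arithmetic concrete, here is a minimal sketch (not something shown in the talk) that converts an original's physical size and a scan resolution into pixel dimensions. The function name pixel_dimensions is hypothetical, and the 36×24 mm frame size for a 35mm slide is an assumption added for illustration.

```python
MM_PER_INCH = 25.4

def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for an original measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)

# A letter-size page at the two resolutions suggested in the talk.
letter_400 = pixel_dimensions(8.5, 11, 400)
letter_600 = pixel_dimensions(8.5, 11, 600)

# Assumption: a 35mm slide frame is roughly 36 x 24 mm.
slide_4000 = pixel_dimensions(36 / MM_PER_INCH, 24 / MM_PER_INCH, 4000)

print("Letter page at 400 dpi:", letter_400)
print("Letter page at 600 dpi:", letter_600)
print("35mm slide at 4000 dpi:", slide_4000)
```

Under these assumptions, the slide scanned at 4,000 dpi yields an image of about 5669×3780 pixels, which is in the same range as a letter page scanned at 600 dpi (5100×6600). That is the reason small originals need such high dpi settings.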
She then turned to the digitization of A/V materials, which follows a different process from that of printed materials. A/V digitization often involves finding an older machine that can play the content and then using an analog-to-digital converter to capture the information in digital form. Because the material is captured in real time, A/V digitization is often very time-consuming. Unlike printed materials, video has no agreed-upon standard file type. DPS uses uncompressed video, which can result in very large files: up to 1 GB per minute of material!
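For planning purposes, the two figures above (real-time capture and up to 1 GB per minute) can be turned into a rough estimate. The sketch below is illustrative only: the function names and the example collection of ten 60-minute tapes are hypothetical, and the 1 GB per minute rate is the upper figure quoted in the talk, not a measured value.

```python
GB_PER_MINUTE = 1.0  # upper figure quoted in the talk for uncompressed video

def estimate_storage_gb(total_minutes, gb_per_minute=GB_PER_MINUTE):
    """Upper-bound storage estimate in GB for the given running time."""
    return total_minutes * gb_per_minute

def minimum_capture_hours(total_minutes):
    """Capture happens in real time, so playback time is the floor."""
    return total_minutes / 60

# Hypothetical collection: ten tapes of 60 minutes each.
total_minutes = 10 * 60

print("Storage (GB, upper bound):", estimate_storage_gb(total_minutes))
print("Capture time (hours, minimum):", minimum_capture_hours(total_minutes))
```

For this hypothetical collection the estimate comes to as much as 600 GB of storage and at least 10 hours of capture time, before any setup, cleaning, or quality checking.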
Kim also highlighted two preservation issues unique to A/V materials: sticky shed syndrome and vinegar syndrome. Sticky shed syndrome affects magnetic tape: the magnetic layer comes unstuck from the base layer, which eventually makes the tape unusable. Fascinatingly, one way to mitigate sticky shed syndrome is to bake the tape (in an oven!) immediately prior to digitization. Vinegar syndrome causes film to smell like vinegar and start to shrink. Because of these issues, Kim suggested that A/V items are often better sent to a vendor for digitization.
She concluded by highlighting different resources for digitization available in Snell Library, namely: Digital Production Services (DPS), the Digital Scholarship Group (DSG), and Research Data Services (RDS). DPS supports the library’s Digital Repository Service (DRS): a secure repository system, designed to store and share scholarly, administrative, and archival materials on behalf of the Northeastern University community. Kim shared that the DRS currently has over 400,000 items with 10 million streams. Kim Kennedy can be reached at [email protected].
