Accelerated PDF Remediation with Augmented Classification AI


Scheduled at 9:00 am in Penrose 1 on Friday, November 15.

#39789

Speaker(s)

  • Greg Suprock, Head of Solutions Architecture, Apex CoVantage

Session Details

  • Length of Session: 1-hr
  • Format: Lecture
  • Expertise Level: Intermediate
  • Type of session: General Conference

Summary

We will describe the strengths and weaknesses of a novel software approach to PDF remediation and discuss the importance of human-in-the-loop design. We will share two case studies: first, the workflow for institutional customers with many files to remediate; second, an online platform designed for on-demand service, for customers with a small number of files to remediate or with documents on an accelerated schedule.

Abstract


The goal in developing a new software suite to assist with accessibility remediation of complex PDF documents was to reduce production time without loss of quality. The solution we developed is a novel approach that uses classification artificial intelligence (AI) augmented with human-in-the-loop verification.

We will discuss the development and evaluation of classification AI models trained on science, technology, engineering, and mathematics (STEM) content and on commercial documents. We will describe the workflow design, including how the output of the classification AI is paired with data from algorithmic software extraction to create a software roadmap for accessibly tagging the input PDF. We will also explain why we built humans into the loop for content identification and tagging verification.
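The abstract does not say how the classifier output and the extraction data are combined, so the following is only a minimal sketch under stated assumptions, not Apex CoVantage's implementation. It assumes a hypothetical block-level classifier that returns a label and a confidence score for each content block found by algorithmic extraction. Each block is mapped to a standard PDF structure tag (H1, P, Figure, Table, L, Formula), and any block that is unclassified, has an unknown label, or falls below a confidence threshold is flagged for human review. The names `ExtractedBlock`, `Prediction`, `build_roadmap`, and `apply_review`, the label set, and the 0.9 threshold are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from classifier labels to standard PDF structure tags.
LABEL_TO_TAG = {
    "heading": "H1",
    "paragraph": "P",
    "figure": "Figure",
    "table": "Table",
    "list": "L",
    "equation": "Formula",
}


@dataclass
class ExtractedBlock:
    """A content block found by algorithmic extraction (hypothetical schema).

    bbox is (x0, top, x1, bottom) in a top-left-origin coordinate system.
    """
    block_id: int
    page: int
    bbox: tuple[float, float, float, float]
    text: str


@dataclass
class Prediction:
    """Output of a hypothetical classifier for one block."""
    block_id: int
    label: str
    confidence: float


@dataclass
class RoadmapEntry:
    """One step of the tagging roadmap: which tag to apply to which block."""
    block_id: int
    page: int
    tag: str
    text: str
    needs_review: bool


def build_roadmap(blocks, predictions, threshold=0.9):
    """Pair extraction data with classifier output to produce a tagging roadmap."""
    preds = {p.block_id: p for p in predictions}
    roadmap = []
    # Naive reading order: page, then top-to-bottom, then left-to-right.
    for b in sorted(blocks, key=lambda b: (b.page, b.bbox[1], b.bbox[0])):
        p = preds.get(b.block_id)
        tag = LABEL_TO_TAG.get(p.label, "P") if p else "P"
        # Route anything uncertain to a human instead of trusting the model.
        needs_review = (
            p is None or p.label not in LABEL_TO_TAG or p.confidence < threshold
        )
        roadmap.append(RoadmapEntry(b.block_id, b.page, tag, b.text, needs_review))
    return roadmap


def apply_review(roadmap, corrections):
    """Record human reviewer decisions: {block_id: confirmed_tag}."""
    for entry in roadmap:
        if entry.block_id in corrections:
            entry.tag = corrections[entry.block_id]
            entry.needs_review = False
    return roadmap


if __name__ == "__main__":
    blocks = [
        ExtractedBlock(1, 1, (72, 400, 540, 460), "Body text of the article."),
        ExtractedBlock(0, 1, (72, 72, 540, 100), "Introduction"),
        ExtractedBlock(2, 1, (72, 500, 300, 700), ""),
    ]
    predictions = [
        Prediction(0, "heading", 0.97),
        Prediction(1, "paragraph", 0.95),
        Prediction(2, "figure", 0.62),
    ]
    roadmap = build_roadmap(blocks, predictions)
    queue = [e.block_id for e in roadmap if e.needs_review]
    print("review queue:", queue)
    apply_review(roadmap, {2: "Figure"})
    for e in roadmap:
        print(e.block_id, e.tag, e.needs_review)
```

The threshold is the lever that trades speed for human effort: raising it sends more blocks to reviewers, while lowering it tags more content automatically. A production system would also need a real reading-order model for multi-column layouts and tag nesting (for example, `L` containing `LI`), which this sketch omits.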

Keypoints

  1. Classification AI accelerates accessible tagging of PDFs.
  2. Pairing humans in the loop with classification AI facilitates delivery of accessible PDFs.
  3. Balancing classification AI with algorithmic programming creates a document-tagging roadmap.

Disability Areas

Vision

Topic Areas

Artificial Intelligence, Uncategorized, Web/Media/App Access

Speaker Bio(s)

Greg Suprock

Mr. Suprock is the Head of Solutions Architecture at Apex CoVantage. He is an experienced technologist with a track record of introducing positive change and exploiting disruptive technology. He is an expert at creating new products, new services, and novel approaches to problem solving, with a unique ability to work at all levels of an organization and inspire others to achieve success. Additional professional experience includes:

Implementing High-Volume, High-Quality Workflows:

  • Designed and implemented XML and product workflows for the U.S. Copyright Office, PLOS, JSTOR, and Taylor & Francis.
  • Designed and implemented high-volume library workflows for capturing archival formats for the State Library of New South Wales, the National Library of Australia, and the National Institutes of Health.
  • Responsible for the design and implementation of National Library of Medicine book conversions.
  • Led the development of Apex’s online production and tracking system.

Thought Leadership:

  • Supports PLOS and the University of Michigan through technical and strategic consulting engagements.
  • Presentations at SSP, the Council of Science Editors, Charleston, AXEcon, mEnabling, and Digital Book World.
  • SSP Board Election; Co-chair, SSP Education Committee.
  • National Information Standards Organization (NISO) Board Member; Information Discovery and Interchange (IDI) Committee; NISO Audit Committee; and NISO Architecture Committee.
  • Regular contributor to the Apex blog.

Introducing Disruptive Technology:

  • Introduced PACE (novel online graphics inspection and repair software) for use by PLOS. PACE solved art-processing problems caused by author limitations and reduced the burden on PLOS production staff, dramatically reducing source issues and costs and increasing throughput.
  • Deployed EZ-Edit for PLOS, Taylor & Francis, and Brill to move XML as far forward in the production workflow as possible without requiring XML expertise from end users.
  • Introduced AI/ML technology to address complex issues in the publishing and conversion workspace, including copyediting-level assessment tools, enhanced OCR and Intelligent Character Recognition (ICR), and classification AIs for accelerating production.
