Going Beyond OCR: Achieve End-To-End Automation with Unstructured Data

August 27, 2019

I remember how, even at one of the top three pharmaceutical companies, submitting a new drug application to the FDA used to involve loading box upon box of paper documents into tractor trailers. Tractor trailers of data. Think about how long it took for someone to review those boxes of paper. Now we submit electronically, and that in itself is faster, but there are ways we can speed up the process further and make it safer for patients than ever before.

This is where our company, Court Square Group (CSG), comes in. We’re a managed service firm that specializes in managing applications in our Audit Ready Compliant Cloud (ARCC) environment for companies in the pharmaceutical, biotech, and medical devices industries. We manage those applications, or the integration between applications, used in the drug development process and submissions to regulatory bodies like the FDA, Health Canada, or the EU authorities. Our goal is to keep pushing the envelope to implement technology solutions for our clients, while keeping them audit-ready. We are always looking for new and innovative solutions to incorporate into our service offerings for our clients.

When it comes to clinical testing and approvals, the auditor is worried about data integrity. They need to ensure the drug’s safety data hasn’t been compromised or changed in any way. That affects how you validate your systems and how you handle change control. While most other IT vendors don’t even know what IQ, OQ, and PQ are, CSG lives and breathes these qualification procedures. We qualify all our infrastructure and make sure it stays qualified.

Whether we’re working on a client’s clinical files or helping them submit to regulatory bodies, we host several document management solutions. That’s where we first started working with Adlib Software a number of years ago: they were the de facto standard for PDF rendering among all the different solutions we managed.

Since then, we’ve continued our relationship with more advanced projects that have opened my eyes to what’s possible in content management solutions. Adlib Software has made documents come alive for us. We can now use that content to facilitate easier handling of the documents. I’ll give you an example of the first use case that really made me think of the numerous ways that we could utilize the Adlib tools.

Working with Documents the Smart Way: The Case of FDA Form 1572

FDA Form 1572 is a government form that our clients need to complete as part of their clinical trials. The form has multiple fields, including a list of the principal investigators at a clinical site, as well as blocks of text. In this use case, we deal with up to 150 clinical sites, and each one of these sites has multiple documents for its principal investigators because clinical trials can last years. Every time a principal investigator leaves or joins, the site has to generate a new 1572 document.

To submit their clinical data to the FDA, our clients have to pull together all the data from these documents, reconcile the data, and produce a summary report. We do this work for them.

In the past, this process amounted to someone manually going through every single document, pulling the individual names off each one, and putting them into the summary document. Verifying the data manually was tedious: producing the final summary could take up to a week and a half, depending on the total number of sites and the number of documents per site to be reviewed. It also introduced the potential for human error, which needed to be minimized.

Manually processing documents WILL lead to human error. Automation = quality.

I thought: There has to be a way to automate this. When we first started getting into these document management use cases, I believed what we were trying to solve was a fairly simple problem. We did look at the other tools available, and as we looked at those other solutions, I realized this is a far more complex problem than I originally thought.

Enter Adlib Elevate. As an enterprise-grade data enrichment platform, Adlib Elevate not only handles OCR for documents, but also intelligently pulls data from several documents and ties it together. With the support of the iterative tools my team put together, we decided to apply Adlib Elevate to the problem of the 1572 forms and summary reports.

The change from manual to automated processing was dramatic. Where this task used to take us over a week, Adlib Elevate now reads the documents, determines which ones are 1572 forms, identifies the different fields, pulls the relevant data, and creates a summary report in less than an hour.

Effectively discovering, cleansing, and enriching data means consistency and higher-quality data for your company.

More than just time savings, this process also adds credibility. Sometimes, a name is listed twice on the original form or spelled slightly differently. We can add logic to our processing that will look for doubled or incorrect names and create an exception report for review.
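As a rough illustration of the kind of logic described above, here is a minimal Python sketch that flags doubled and near-identical investigator names for an exception report. It is not Adlib Elevate's API or our production code: the function names, the 0.85 similarity threshold, and the use of the standard library's `difflib` are all assumptions made for the example, and it presumes the names have already been extracted from the forms.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and collapse whitespace."""
    cleaned = name.lower().replace(".", " ").replace(",", " ")
    return " ".join(cleaned.split())


def name_exceptions(names, threshold=0.85):
    """Return (kind, name_a, name_b) tuples for pairs a reviewer should check.

    The threshold is an illustrative value, not a validated setting.
    """
    exceptions = []
    for a, b in combinations(names, 2):
        na, nb = normalize(a), normalize(b)
        if na == nb:
            # Same name once punctuation and case are ignored.
            exceptions.append(("duplicate", a, b))
        elif SequenceMatcher(None, na, nb).ratio() >= threshold:
            # Close but not identical: likely a spelling variation.
            exceptions.append(("possible misspelling", a, b))
    return exceptions


if __name__ == "__main__":
    extracted = ["Jane A. Smith", "Jane A Smith", "Jon Doe", "John Doe"]
    for kind, a, b in name_exceptions(extracted):
        print(f"{kind}: {a!r} vs {b!r}")
```

The point of the sketch is that nothing is corrected automatically. Questionable pairs are only surfaced, and a person decides what the right name is.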

Often, we have to verify that a site’s principal investigator is properly certified, and that means having that person’s CV to check that their certification is up to date. We can now add a natural extension so Adlib Elevate automatically checks that we have the CV. It will then read the document to verify the lead investigator’s certification hasn’t expired.
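The CV check can be pictured the same way. The sketch below is a hypothetical illustration in Python, assuming an earlier extraction step has already recorded whether a CV is on file and what expiry date, if any, was read from it. The `InvestigatorRecord` type and its field names are invented for the example and do not come from Adlib Elevate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class InvestigatorRecord:
    """Hypothetical result of an upstream extraction step."""
    name: str
    cv_on_file: bool
    certification_expires: Optional[date]  # None if no date could be read


def certification_issues(records, as_of):
    """Return (name, problem) pairs that need follow-up as of the given date."""
    issues = []
    for record in records:
        if not record.cv_on_file:
            issues.append((record.name, "CV missing"))
        elif record.certification_expires is None:
            issues.append((record.name, "expiry date not found"))
        elif record.certification_expires < as_of:
            issues.append((record.name, "certification expired"))
    return issues


if __name__ == "__main__":
    records = [
        InvestigatorRecord("Investigator A", True, date(2021, 1, 31)),
        InvestigatorRecord("Investigator B", True, date(2018, 6, 30)),
        InvestigatorRecord("Investigator C", False, None),
    ]
    for name, problem in certification_issues(records, as_of=date(2019, 8, 27)):
        print(f"{name}: {problem}")
```

Passing the reference date in explicitly, rather than reading the system clock inside the function, keeps the check reproducible, which matters when the result may later be shown to an auditor.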

There are two big benefits for our clients in this. The first is that it’s going to be faster and cheaper. But the bigger benefit is the assurance of consistency and quality in the data and the final summary. They don’t have to worry that the FDA will reject their submission because the data was incorrect, or that they’ll have to resubmit, because the FDA will certainly check that their data lines up. That’s a huge benefit to our clients when they’ve already invested a great deal in the FDA submission process.

Automation and Pre-Processing

Using Adlib Elevate, we’re able to produce summary reports for clients’ institutional review boards for their clinical trials, because, again, there’s a wealth of data for us to tie together.

Clients have to produce summaries for their clinical trial submission. Some of the data comes from those 1572s, some from other forms. The nice part is that Elevate can determine the type of document based on its content. That means we can automate the process of putting together summary reports.

In addition to summary reports, we can pre-process documents to determine their content. Maybe that sounds obvious—shouldn’t the client already know what data they have? But a lot of companies buy data from other companies. When they buy this data, it amounts to perhaps thousands of documents. Until someone reads all those documents and identifies every one, they can’t be sure, for example, that a particular form referenced in a given document is actually present.

This is where we can offer a solution by pre-processing the documents with Adlib Elevate. We run OCR, identify each document, tag it with metadata, and cross-check the set to find any referenced documents that are missing. That means we can go back to the sponsoring company that sold the data and have them fill in the gaps.
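To make the cross-check concrete, here is a small Python sketch, assuming the identification step has already produced an identifier for each document and a list of the identifiers it references. The data shape and the document identifiers are hypothetical, chosen only to show the idea, and do not reflect how Adlib Elevate represents documents.

```python
def missing_references(documents):
    """Map each missing document id to the ids of the documents that cite it.

    `documents` maps a document id to the list of ids that document references.
    """
    present = set(documents)
    missing = {}
    for doc_id, references in documents.items():
        for ref in references:
            if ref not in present:
                # Record which document pointed at the gap.
                missing.setdefault(ref, []).append(doc_id)
    return missing


if __name__ == "__main__":
    purchased = {
        "protocol-001": ["form-1572-site-12", "consent-v2"],
        "form-1572-site-12": [],
        "site-report-12": ["consent-v2", "lab-cert-12"],
    }
    for ref, cited_by in missing_references(purchased).items():
        print(f"missing {ref}, referenced by {', '.join(cited_by)}")
```

Keeping track of which documents cite each gap gives the seller something specific to act on, rather than a bare list of missing file names.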

That represents a huge benefit to our clients because not only does it ensure they get everything they purchased, but they get that assurance right away. If a client discovers months after the purchase that a document is missing, they are potentially looking at an additional delay in receiving that document from the seller. By automating the data pre-processing, we save our clients what could be a lot of wasted time.

One of the biggest problems in this industry right now is the fact that electronic documents lose fidelity as they’re transferred from one person to another. This is because a lot of people print a document, and then the other person will scan it, and it’s not a real document anymore. Now it’s just a scanned image. With Adlib Elevate, we can OCR these images and make them searchable documents again.

The right OCR solution can turn your static documents into living documents.

We’re now thinking about these documents as living content where information can be pulled and processed in ways we’ve never discovered before. This opens up endless possibilities for our customers and for our business. Adlib Elevate gives us that.

Getting Better Medicine Into Patients’ Hands

Let’s take a step back and look at the big picture of how this translates for the patient who needs an innovative new treatment. Maybe they’ve been diagnosed with a disorder that affects only a small segment of the population. Or maybe they have complications that render available medicine unusable.

It used to be that a pharmaceutical company had to be massive. They needed thousands of employees to funnel all this data from the clinical trials through the approval process to get that drug to those patients. Now, the technology we’ve put in place means pharmaceutical companies can do more with less.

It opens up space for smaller companies with only a couple hundred people to get their drugs approved—and it’s even safer than before because we can substantiate that the clinical data is uncompromised. These medicines can be approved and start changing people’s lives so much faster than they ever could in the past. Speed often comes at the expense of quality, but as this industry evolves, we’re discovering it doesn’t have to.

Keith Parent

Founder and CEO at Court Square Group
