As much as we like to think of document scanning as mature technology, its application can still be a huge variable in advanced technology implementations. This was illustrated in a recent AI implementation at an international bank with branches throughout Europe and Latin America.
“Our data extraction software was working well when we did testing and then put it into production; but when we rolled it out to the branch offices, the quality of extraction degraded: it went from 95% accuracy to around 30%,” said Sinuhé Arroyo, Founder and CEO of Singapore-based AI specialist TAIGER. “When I received this information, I started to connect some dots and found out that the scanners at the branches were lower quality. The results really proved the old adage of ‘garbage in, garbage out.’ So, we changed the scanners and the problem was resolved.”
The resolution of this issue led to a partnership with Alaris, the information management (including scanner manufacturing) division of Kodak Alaris. TAIGER and Kodak Alaris recently announced a global strategic alliance through which TAIGER will integrate its software with scanners from Kodak Alaris and offer them as part of its data extraction solutions. Initially, TAIGER will focus on certifying a pair of network scanners, the Kodak Scan Station 730EX and the Kodak S2060w.
“The network scanners meet the needs for a particular customer and allow us to embed the TAIGER solution into the interface of the device,” said Vanilda Grando, global sales development director for Kodak Alaris. “This gave us our initial sales model, but we are not necessarily limited to only working with TAIGER on network scanners.”
The high quality of images produced by the scanners was the dealmaker from Arroyo’s standpoint. “The technology that Alaris brings to the table in the area of image processing was superior to what we saw from other players,” he said. “It really made them stand out.”
TAIGER (pronounced “tiger”) is one of several AI specialists that we have spoken with recently who are applying their technology to document capture applications. Arroyo founded the company in 2009. “I started programming in BASIC when I was 10 years old,” Arroyo told DIR. “I continued to study computer science and got my PhD in AI. I was working in areas like machine learning and automated reasoning. Natural language processing was one of the disciplines.
“When I started looking for ways to apply my research to business, I realized that text understanding was an area that remained largely unsolved. With TAIGER, we went there initially with search technology and then added a chatbot assistant. Later we added information extraction, which can read documents and extract information. We convert that information into data to be ingested by information systems.”
TAIGER is based in Singapore and is part of a growing tech scene there. In 2017, TAIGER closed a round of funding worth the equivalent of U.S. $5.87 million. (Just this week, TAIGER announced a Series B round of $25 million in funding, which values the company at $110 million.) It currently has branch offices in New York, Madrid, Hong Kong, and Sydney, with plans to continue to expand worldwide. “So far, we have mostly focused on the financial services market with five of the seven largest global banks as our customers,” said Arroyo. “We also have some select government contracts and have just started working with some law firms.”
TAIGER has three separate product lines: iSearch, iConverse for chatbots, and iMatch for data extraction. “Our extraction technology can work with documents that come in digital or physical formats,” said Arroyo. “If the document is an e-mail or another type of electronic file, there is no need to scan it. If it’s paper, we rely on a scanner to generate an image file, to which we apply OCR. We then work with the ASCII text to interpret the document for the benefit of the defined application.” (TAIGER is not currently utilizing Kodak Capture Pro Software for OCR, but Arroyo said it is under strong consideration.)
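The intake flow Arroyo describes can be sketched in a few lines of Python. This is only an illustration of the routing idea (paper goes through a scanner and OCR, electronic files skip both), not TAIGER's implementation. The `Document` class, `to_text`, and the `fake_scan` and `fake_ocr` stand-ins are hypothetical names invented for this sketch; a real deployment would plug in an actual scanner interface and OCR engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    name: str
    is_paper: bool  # True if the document arrives as physical paper
    payload: str    # text of a digital file, or a stand-in for the page content


def to_text(doc: Document,
            scan: Callable[[Document], bytes],
            ocr: Callable[[bytes], str]) -> str:
    """Return plain text for a document, scanning and running OCR only for paper."""
    if doc.is_paper:
        image = scan(doc)   # the scanner generates an image file
        return ocr(image)   # OCR converts the image to text
    return doc.payload      # e-mail or other electronic file: no scan needed


# Hypothetical stand-ins so the sketch runs without a scanner or OCR engine.
def fake_scan(doc: Document) -> bytes:
    return doc.payload.encode("ascii")


def fake_ocr(image: bytes) -> str:
    return image.decode("ascii")


if __name__ == "__main__":
    email = Document("notice.eml", is_paper=False, payload="Account opened")
    form = Document("power_of_attorney", is_paper=True, payload="Power of Attorney")
    for doc in (email, form):
        print(doc.name, "->", to_text(doc, fake_scan, fake_ocr))
```

In this sketch the quality of the text that reaches the extraction step depends entirely on what `scan` and `ocr` return for paper documents, which is the dependency the bank rollout exposed.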
According to Arroyo, the main differentiator between iMatch and traditional capture is the spectrum of documents that iMatch can address. “Traditional capture does well on structured documents and semistructured ones, if there is not much variation or complexity,” he said. “When you get to completely unstructured documents—they are not even in the realm. Its machine learning technology cannot achieve the accuracy we do, so the vendors cannot guarantee the KPIs that we can. We can offer 90-95% accuracy with no false positives and that’s a tremendous value.”
For the aforementioned global bank, TAIGER has deployed iMatch to help with the onboarding process. For corporate clients, this involves capturing data from at least 10 different documents, including unstructured documents like Powers of Attorney and Articles of Incorporation. The specific phrase that grants an individual power of attorney for a corporation, for example, can be vastly different depending on who drew up the form.
“One of the challenges with language is that the meaning of something you say can vary so much based on context,” said Arroyo. “For example, if I ask where you live, that can mean I want to know in what neighborhood your house is, or what city or country. Or, we can ask the same thing with different verbalization. This is why pattern matching does not work for extraction from unstructured documents. The tree of possibilities can be phenomenally large.
“Our technology has the ability to identify the subjects, predicates, objects, etc., of sentences. It then applies semantic understanding to determine which of the five to seven different meanings is right. Then we try and contextualize the phrase within the document to really try and get to its meaning.”
Arroyo explained that there are three elements to an iMatch implementation. “You have the underlying engine which is going to be the same in every instance,” he said. “Then, you have the documents. Finally, there is a knowledge base, such as an ontology, that helps tune the engine to process a particular type of document. The value proposition is in being able to very accurately capture mission critical data and guarantee there will be no false positives.”
The application at the global bank reduced onboarding time for corporate customers from several weeks in some cases to just seven minutes. The cost of onboarding customers has been reduced by 85%. Mistakes were also reduced.
iMatch currently supports capturing data in English, Spanish and Chinese. “We are building support for many more languages,” said Arroyo. “It’s not really that challenging of a process. It takes us about a couple of months to add a new language.”
To date, iMatch has been deployed only on premises, but TAIGER is striving to move its technology to the cloud. “We see many organizations, especially in the financial services market, looking to move to cloud-based software,” said Arroyo. “I think it benefits everyone through reduced time to market and lower cost of implementation for the customer. Working with the network scanners from Alaris is a no-brainer when it comes to enabling integration with cloud software.”
TAIGER continues to develop its technology, recently adding new redaction capabilities. “As we add features and potentially move our technology to the cloud, it should enable us to better target the mid-market, although I don’t see us ever going much further downstream than that,” said Arroyo. “We are really a B2B company.”
Initially, at least, Alaris will look for opportunities to recommend the TAIGER software to its customers. “We are not ready to act as a reseller,” said Alaris’ Grando. “But, these are the type of global alliances that we are looking to develop. When you combine TAIGER’s software with our scanners, it enables us to create something really different in the market. I like to say that we are the gasoline and they are the engine. We start the process by providing their engine with good quality images, which makes a difference in how well they can automate transactions.”
Document Imaging Report (DIR) is the premier management and marketing newsletter on opportunities and trends in converting paper processes to electronic format. DIR is an 8-page newsletter produced twice monthly that helps you stay on top of critical developments in just minutes, saving you hours of reading and research every month. It is written for vendors, end users, and the distribution channel for the imaging industry. DIR provides the most accurate and unbiased inside information in the market and will keep you at the forefront of industry changes and advancements. To subscribe to DIR, or for a free trial, please visit https://www.documentimagingreport.com/?page_id=1345