
Polydocs

Automate Data Capture for Infor ERP


data

Jan 17 2023

Data extraction

17. January 2023 by Leni

Data extraction

from a tedious activity to an efficient process


In today’s world, which is characterized by a wealth of data, the analysis and further use of data is an omnipresent topic. In this context, extracting data from a wide variety of documents plays an indispensable role. In many companies, however, data is still extracted and processed manually, which is laborious, time-consuming and error-prone.

So how can these data extraction processes be simplified and optimized so that unstructured data is turned into a structured format suitable for easy reuse?

The biggest challenges in data extraction

The biggest challenges in data extraction lie in the large number of form variants and document versions containing unstructured data and, in some cases, irrelevant information. 

An adaptable content capture and data integration system is therefore necessary to extract the required data and process it into a structured format. And this is exactly what DOC² offers you! With DOC²’s flexible content capture, data from a wide variety of documents can be recognized, extracted and structured. This orderly digitization allows the data to be easily processed, stored and analyzed elsewhere, resulting in greater control and increased accuracy of the captured data.

The use of AI enables DOC² to provide simplified and structured data extraction, which increases the efficiency of these processes. It saves time, shortens turnaround times, and improves the quality of the captured data.

So say goodbye to tedious and lengthy manual data typing and take advantage of DOC² in your business.

Feel free to contact us.

We’ll help you as best we can.


Image credit: Header & post image from Polydocs

Written by Stephanie Propstmeier · Categorized: artificial intelligence, Blog, data, DOC², English · Tagged: AI, analysis, data extraction, DOC², efficiency, extracted, optimization, process, Process optimization, recognized, structured

Jan 10 2023

Optimize your processes

10. January 2023 by Stephanie Propstmeier



Process optimization: who doesn’t want that? If there is a way to improve and simplify certain processes and procedures in a company, it would be pretty stupid not to use it. Manual routine activities in particular, such as processing invoices and order documents, are often lengthy and, unfortunately, error-prone. After all, we are all only human and do our best. Fixing weak points can streamline workflows and lead to cost savings. Integrating modern technology can make these very processes more efficient and everyday work much easier.

The partial or complete automation of certain processes can thus improve the overall workflow within a company. Artificial intelligence can quickly digitize important information and make it available for further use. Data can then be analyzed sooner, so that weaknesses or errors are uncovered and corrected earlier, saving costs and time. In practice, this means, for example, that invoices or order documents can be read in easily and lengthy processing procedures avoided, reducing manual activities and shortening throughput times. That sounds good, doesn’t it?

Increase efficiency in your company with DOC²!

By digitizing documents, DOC² can help you shorten tedious processes, giving you more time to focus on more important tasks. DOC² can read and process all important information from different documents using image recognition. This means that the necessary data is quickly available for subsequent processes. DOC²’s easy and fast extraction and processing of important document information into a clear format ensures improved data quality and a reduced error rate. Through DOC²’s cloud-based API architecture and ongoing process optimization, workflows are constantly improved and made more efficient. Bundling the necessary data from different documents into a clear format also simplifies tedious manual routine activities, increasing overall productivity. See for yourself how DOC² can help you optimize processes and increase the efficiency and performance of your company.

Feel free to contact us.

We’ll help you as best we can.


Image credit: Header & post image from Dirk Wouters on Pixabay

Written by Stephanie Propstmeier · Categorized: Blog, data, DOC², English · Tagged: automation, cost savings, digitizing, DOC², Process optimization, time is money, Workflow

Dec 13 2022

Can an AI-supported OCR really read “handwriting” better than a human?

13. December 2022 by Leni



Paper processing is tedious as it takes a lot of time and resources to manually enter all of the data into the system, but it is unfortunately a necessary evil for many companies. This is exactly why more and more companies are using OCR (optical character recognition) software to support the process. With traditional OCR, up to 80% of document workflows can be handled. It is able to recognize almost any variant of machine-made and clearly printed text, based on font and symbol recognition, but as soon as the text is smeared or crooked, it becomes difficult. This is the 20% in which a human being has to intervene again, which inevitably means that we are faced with three major challenges:

  1. Inaccuracy: Typos and Handling of Exceptions (we’re all just humans)
  2. Resources: It is difficult to find employees who are willing, and most importantly, able to extract text from low-quality documents.
  3. Security: The hand-off from machine to human and back to the machine poses security risks. This plays a major role, especially in strictly regulated industries with sensitive data, such as financial services, public authorities or organizations in the healthcare sector.

If we are being honest, we need a better solution for this 20%. But before we get into problem solving, we should clarify an important question first.

What exactly are low-quality documents?

A low-quality document can be a fax or a scanned document of poor quality. Likewise, it can also be a delivery note, a timesheet, or a form for patient registration, which has been filled out by hand.

Traditional OCR is no longer sufficient to extract data from these document types. AI-powered OCR, on the other hand, uses more advanced technologies, such as well-trained machine learning models and advanced computer vision engines. Combining these two technologies yields OCR software that can replicate the way people read low-quality documents. If the model is good enough, it may actually be able to extract (handwritten) text better than humans. But each model is only as good as the data set on which it is trained.

How do I find the right provider for me among all the different vendors?

Terms like ‘artificial intelligence’ and ‘machine learning’ are used too often and not every vendor can prove the functionality of their technology. Look out for providers with transparent numbers and ask yourself the following questions in advance: 

  • Is the vendor’s solution AI-powered or is it just a well-marketed mix of human data entry and machine learning? 
  • Can the vendor provide accuracy figures for each process performed, as well as for each document read and extracted?
  • Why is the provider active in the OCR industry and how much experience do they have in this field?
  • Does the vendor offer a cloud-based SaaS solution or do you need to host the solution on-premises?
  • How long does it take until the product can be used? How much training is needed? Is special expertise required?

Fellow Consulting AG / PolyDocs GmbH started out by having humans collect and audit data. As a result, we have the largest human-audited dataset (over one billion fields) in the industry. We offer high accuracy from day one of product use.

That was a lot of information and a lot of questions for now! Take your time, think about it, weigh all the pros and cons and decide correctly – for DOC² (Polydocs) 😉

Written by Stephanie Propstmeier · Categorized: artificial intelligence, Blog, data, DOC², English

Nov 15 2022

Digitalization – Industry 4.0 – Artificial Intelligence

15. November 2022 by Leni



These are all terms that we hear or read very often these days. Today we want to reflect on the topic of artificial intelligence (AI).

What exactly is artificial intelligence?

Where and how to use it? What benefits can it bring me and my company?

Artificial intelligence is not that easy to explain, for the simple reason that there is not even a precise definition for the term “intelligence”. Nevertheless, let’s try to explain it :-).

We can say that AI is a mix of different technologies that enable machines to understand, act and learn with human-like intelligence.

Technologies like machine learning (ML), natural language processing (NLP) and deep learning (DL) are all part of the AI landscape.

In all areas where large amounts of data are generated, the use of AI makes sense and is beneficial. Using sophisticated algorithms, an AI is able to objectively analyze huge amounts of unstructured data within a very short time, to recognize patterns in it, and to make decisions independently on the basis of these patterns. In addition, the error rate continues to fall as the system gains experience. AI takes over recurring everyday routine tasks, so employees become more productive and have more time for tasks that require empathy. In other words, AI should not and cannot fully replace a human, but merely support and complement them.

When combined with analytics and automation, AI can help a business achieve its goals, such as improved customer service or an optimized supply chain, faster and more easily.

Automations reduce costs and bring a new level of consistency, speed and scalability to business processes.

With DOC², we want to help you automate your document processing. By using artificial swarm intelligence, DOC² offers you many advantages.

You want to know what they are?

Then feel free to write to us


Image credit: Header & Featured image from Gerd Altmann on Pixabay

Written by Stephanie Propstmeier · Categorized: artificial intelligence, Blog, DOC², English, Machine Learning · Tagged: AI, Artificial Intelligence, Deep Learning, digitalization, DOC², future, Industry 4.0, machine learning, ML, Natural Language Processing, NLP

Nov 01 2022

Document processing with Doc2 Version 2.0

1. November 2022 by Daniel Jordan



The extraction of data from PDFs and scanned documents might not be the most interesting or challenging issue of the century. It doesn’t give you the opportunity to control a robot, play virtual games or express your creativity. Instead, it’s plain diligent work, something that AI promised to automate but hasn’t achieved yet. Nonetheless, processing documents, meaning the conversion of analogue data into a digital format, is a subtle challenge: a complex task that seems so easy and yet is so difficult to solve.

Document processing with Doc2 Version 2.0, the conversion of analogue data into a digital format – an easy and yet so difficult task to resolve.

After completing several different projects, we at Polydocs realized that document processing is omnipresent. From companies to non-governmental organizations, from small businesses to large corporations, there is always a PDF that needs to be digitised!

Document processing is therefore not just difficult, it is also urgently needed.

This blog post explains the framework for the development of document processing solutions and describes what we are working on for Doc2 Version 2.0.

Document processing with Doc2 Version 2.0 is based on the following principles:

Annotations are indispensable: There is no simple cure-all. Even if you have a good model, you still have to fine-tune it on your own data. Ideally, you have an annotation tool with integrated fine-tuning, or you are flexible enough to integrate such a mechanism yourself.

Generate multi-modal models: When analysing a document, we do not rely on the text alone. Instead, we use all available information (position, font size, etc.) as context. Plain OCR (optical character recognition) or a simple text-based approach is insufficient for this task.

Always correct: OCR and document layout models are not always perfect, so humans need to correct the results in the system. The corrections can be used to train your model or serve as the penultimate step before the results are saved to a database.


Understanding the form

All of the principles above are reflected in Doc2. The figure shows the typical document processing workflow:

[Figure: document processing workflow]

Annotations are indispensable

Labels are indispensable for every solution in document processing. Documents tend to be very diverse, even if they have distinguishable patterns. A tool for reliable document labelling is therefore required.
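To make the idea of a label concrete, here is a minimal sketch of what a single annotation record could look like. The field names, the `Annotation` class and the JSON Lines export are assumptions chosen for illustration; they are not the format Doc2 actually uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Annotation:
    """One labelled span on a page (hypothetical schema, not Doc2's format)."""
    page: int
    text: str
    label: str
    box: tuple  # (x0, y0, x1, y1) in page pixels


def to_jsonl(annotations):
    """Serialize annotations as one JSON object per line, e.g. for later fine-tuning."""
    return "\n".join(json.dumps(asdict(a)) for a in annotations)


# Two made-up labels on the first page of an invoice.
annotations = [
    Annotation(page=1, text="INV-1001", label="invoice_number", box=(420, 80, 510, 98)),
    Annotation(page=1, text="118.50", label="total_amount", box=(430, 700, 490, 718)),
]
jsonl = to_jsonl(annotations)
```

Storing the bounding box next to the text and the label keeps the positional information that a layout-aware model can use later on.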

Generation of multi-modal models

An additional reason why document processing is such an attractive problem lies in its multi-modal nature: textual and visual information is readily available. Unfortunately, crude solutions for document processing tend to use only one or the other:

Image-centred approaches involve a lot of complex business rules around bounding boxes and text placement to obtain the required information. They often rely on templates that aren’t scalable.

Text-centred approaches are based on NLP pipelines for OCR-captured text. However, the text blocks in documents differ from the domain on which these models were originally trained, leading to mediocre performance.

Fortunately, multi-modal models like Doc2 are able to learn from textual and visual information. Not only the words and the image themselves, but also their positions within a given document are embedded. The interactions between them are then learned, guided by predefined training objectives.

The Doc² model learns from textual as well as visual information and learns about the interaction between them.
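As a rough illustration of combining text with layout, the sketch below builds one feature vector per token from its text, position and font size. This is not the Doc2 model: the `Token` class, the hand-written text features and the page dimensions are assumptions, and a real multi-modal model would learn embeddings instead of using fixed features.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A word from OCR output with its bounding box and font size (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float
    font_size: float


def layout_features(token, page_width, page_height, max_font_size):
    """Scale position and font size to the range 0..1 so they are comparable across pages."""
    return [
        token.x0 / page_width,
        token.y0 / page_height,
        token.x1 / page_width,
        token.y1 / page_height,
        token.font_size / max_font_size,
    ]


def text_features(token):
    """Toy text features standing in for a learned text embedding."""
    if not token.text:
        return [0.0, 0.0]
    digit_share = sum(ch.isdigit() for ch in token.text) / len(token.text)
    return [digit_share, float(token.text.isupper())]


def multimodal_features(token, page_width, page_height, max_font_size):
    """Concatenate text and layout features into a single vector for one token."""
    return text_features(token) + layout_features(
        token, page_width, page_height, max_font_size
    )


# A made-up amount near the bottom right of an A4-sized page (595 x 842 points).
token = Token("118.50", x0=430, y0=700, x1=490, y1=718, font_size=12)
features = multimodal_features(token, page_width=595, page_height=842, max_font_size=24)
```

A downstream classifier can then see that a mostly numeric token sits near the bottom right of the page, which neither the text nor the position would reveal on its own.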

Always correct

We think that even with the most efficient document processing systems, human knowledge and experience for corrections and assessment must be incorporated. Human-in-the-loop can serve as a final inspection for the output of a model. We can reuse the corrected annotations to further refine the model and close the loop.
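The human-in-the-loop step can be sketched as a simple routing rule: predictions the model is unsure about go to a reviewer, and the reviewer's corrections become new training examples. The threshold value, the dictionary layout and the function names are assumptions for illustration, not a description of how Doc2 implements this.

```python
REVIEW_THRESHOLD = 0.9  # assumed cut-off; in practice tuned per field and document type


def route(predictions, threshold=REVIEW_THRESHOLD):
    """Split predictions into auto-accepted ones and ones a human should check."""
    accepted, needs_review = [], []
    for prediction in predictions:
        if prediction["confidence"] >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review


def to_training_example(prediction, corrected_value):
    """Turn a human correction into a labelled example for the next fine-tuning round."""
    return {
        "field": prediction["field"],
        "predicted": prediction["value"],
        "label": corrected_value,
    }


# Made-up model output for two fields of one invoice.
predictions = [
    {"field": "invoice_number", "value": "INV-1001", "confidence": 0.98},
    {"field": "total_amount", "value": "11850", "confidence": 0.62},
]
accepted, needs_review = route(predictions)

# The reviewer fixes the missing decimal point; the correction closes the loop.
training_examples = [to_training_example(needs_review[0], "118.50")]
```

A fixed confidence threshold is the simplest possible policy; it trades review effort against the risk of accepting a wrong value.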

Final remarks for document processing with Doc2 Version 2.0

This blog post gives an outlook on our Version 2.0, describing the most important aspects of a document processing solution: an annotation mechanism, a multi-modal model and a step for assessment.

Machine learning promised to automate manual labour. However, it looks like we hit a wall and started to automate creative work instead. In my opinion, we have optimised the search for cure-alls: a big model is fed with input to produce the desired output. Manual labour, such as the processing of documents, is not like that. Such solutions are generally custom-made: you must label the data, make sure all elements of the document are incorporated and correct the output of the models. One big model does not suffice; there are several models, each extracting different things.


Image credits: Header & Featured Image from Freepik and Freepik

Written by Daniel Jordan · Categorized: Blog, data, DOC², English, Machine Learning · Tagged: AI, DOC², document processing, OCR


Copyright © 2023 · Polydocs Gmbh · Log in