Over the years, IT Creative Labs has delivered many projects of varying complexity across many different niches. Our development team consists solely of senior professionals, and today we asked our lead backend engineer, Vlad, to share some insights into his work, using one of those projects as an example.
The client, an insurance company, wanted to semi-automate the review of insurance claims, track statistics, and semi-automate filling out the forms for claims approved for coverage. To achieve that, the company needed to automate the processing of user documents and obtain structured user data as a result. The automated process also had to integrate with the systems the company was already using. The first thing we did with the client was to break their requirements down into smaller tasks and identify the key objectives.
Useful side note from Vlad: Any task, even the most difficult one, can be broken down into smaller sub-tasks. This concept is called Microproductivity.
Here is the breakdown Vlad & the team proposed:
Task 1
Problem: Documents were scanned in different orientations, and some were upside down.
Solution: We built an automated step that rotates every document to the correct orientation, using a simple computer vision (CV) algorithm.
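To give a feel for what such a rotation step can look like, here is a minimal sketch, not the code from the project. It assumes the scan is off by a multiple of 90 degrees, uses plain Python lists in place of OpenCV image arrays so it runs without dependencies, and relies on a hypothetical scoring function (header_band_score) that prefers images with a dark header band at the top. A real pipeline would plug in a score suited to its own documents.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def best_orientation(img: Image, score: Callable[[Image], float]) -> Tuple[int, Image]:
    """Try 0, 90, 180 and 270 degrees and keep the highest-scoring rotation."""
    best_deg, best_img, best_score = 0, img, score(img)
    candidate = img
    for deg in (90, 180, 270):
        candidate = rotate90(candidate)
        current = score(candidate)
        if current > best_score:
            best_deg, best_img, best_score = deg, candidate, current
    return best_deg, best_img


def header_band_score(img: Image) -> float:
    """Hypothetical score: ink in the top half minus ink in the bottom half."""
    half = len(img) // 2

    def ink(rows: Image) -> int:
        return sum(255 - pixel for row in rows for pixel in row)

    return ink(img[:half]) - ink(img[half:])
```

Calling best_orientation(scan, header_band_score) returns the clockwise angle that was applied together with the corrected image.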
Task 2
Problem: A scan can be the size of an A4 page while the document itself is only the size of an ID card.
Solution: We normalized every document to a fixed-scale template by cropping along the edge of the document and removing the surrounding white space.
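The cropping idea can be sketched in a few lines. This is an illustration under stated assumptions rather than the project code: the image is a non-empty grayscale grid stored as plain Python lists (the project itself used OpenCV), anything brighter than a hypothetical white_threshold counts as background, and the fixed scale is reached with a simple nearest-neighbor resize.

```python
from typing import List

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def crop_to_content(img: Image, white_threshold: int = 240) -> Image:
    """Cut away the white margin around the darkest rectangular region."""
    rows = [y for y, row in enumerate(img) if any(p < white_threshold for p in row)]
    cols = [
        x for x in range(len(img[0]))
        if any(row[x] < white_threshold for row in img)
    ]
    if not rows:  # blank page: nothing to crop
        return img
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in img[top:bottom + 1]]


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Scale the image to a fixed size using nearest-neighbor sampling."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

Chaining the two, resize_nearest(crop_to_content(scan), width, height), gives every document the same dimensions no matter how much empty page surrounded it.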
Task 3
Problem: The type of each document had to be identified so that the relevant fields could be extracted from it.
Solution: We implemented a machine learning (ML) classifier: a convolutional neural network trained on a high volume of documents, with its errors corrected through backpropagation.
Task 4
Problem: There were many different document types, including a lot of older ones and handwritten documentation.
Solution: We implemented text recognition from images, using a classic recognition model for Latin-alphabet text.
Task 5
Problem: The recognized information had to be extracted, structured, and mapped to the correct fields so that it is presented in a cohesive and standardized manner.
Solution: We created an automated machine learning model, trained on specific types of documents, to extract information from the document and fill in the fields. When fields were left empty, an additional filling algorithm placed all the appropriately identified elements on the coordinate grid.
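The fallback for empty fields can be pictured roughly like this. It is a simplified sketch, not the algorithm from the project: it assumes each recognized word comes with the coordinates of its center, and that every document type has a layout mapping field names to rectangles on the normalized template. The Token class, the fill_missing function, and the field names in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    text: str
    x: float  # horizontal center of the recognized word
    y: float  # vertical center of the recognized word


Box = Tuple[float, float, float, float]  # left, top, right, bottom


def fill_missing(
    fields: Dict[str, str],
    tokens: List[Token],
    layout: Dict[str, Box],
) -> Dict[str, str]:
    """Fill only the empty fields, using tokens that fall inside each field's box."""
    filled = dict(fields)
    for name, (left, top, right, bottom) in layout.items():
        if filled.get(name):
            continue  # the extraction model already produced a value
        inside = [
            t for t in tokens
            if left <= t.x <= right and top <= t.y <= bottom
        ]
        # Approximate reading order: top to bottom, then left to right.
        inside.sort(key=lambda t: (t.y, t.x))
        filled[name] = " ".join(t.text for t in inside)
    return filled
```

For example, with a layout such as {"name": (0, 0, 100, 20)} and the tokens "John" and "Doe" centered inside that box, an empty "name" field is filled with "John Doe", while fields the model already filled are left untouched.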
In Vlad’s own words: “Everything was simple: just several models, algorithms, training and the task was done.”
For this project, we used the following tech stack: Python, OpenCV, TensorFlow, and Keras, with React for the frontend, Flask for the backend, and PostgreSQL for data storage.
As a result of creating this semi-automated machine learning-based process and integrating it with the rest of the existing systems, the insurance company was able to significantly cut down processing time and human error while being able to process larger volumes without increasing its staff count.
If you have a project in mind that you’d like to chat about – reach out!