Automanager Spreadsheets Recognizer

Business Solution

Automatically recognizes data from Excel tables with different structures, multiple mistakes and crazy remarks

This use case is part of the larger "Automanager" platform, a marketplace for wholesalers and retailers of auto tires and rims.

Automanager Scheme

Created to collect and refresh prices from market players

Wholesalers

The idea of the project is to receive unstructured information from the Excel pricelists of many wholesalers, standardize it, and prepare it for retailers.

More than 100 wholesalers regularly send their pricelists to a dedicated service email address. The Email Scraper extracts them from there and puts them into a task queue for processing.

Excel pricelists

Let's look at those pricelists:

  • Their structures are entirely different.
  • There are no standards for specifying product parameters or product titles.
  • There are a lot of errors in the names.
  • Some names mix Latin and Cyrillic letters that look identical.
  • Some use special symbols or signs instead of a stock amount.
  • Some even use a cell's background color to indicate whether the product is in stock.
  • Prices are given in different currencies.
Prices with mistakes
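
To illustrate the mixed Latin and Cyrillic problem from the list above, here is a minimal Python sketch. The mapping table and the function name `normalize_lookalikes` are assumptions made for this example, not the code used in Automanager. It should only be applied to fields that are expected to be Latin, such as brand names or model codes.

```python
# Hypothetical table of Cyrillic letters that look like Latin ones.
# Unicode escapes are used so the two alphabets cannot be confused here.
CYRILLIC_TO_LATIN = str.maketrans({
    "\u0410": "A", "\u0412": "B", "\u0415": "E", "\u041a": "K",
    "\u041c": "M", "\u041d": "H", "\u041e": "O", "\u0420": "P",
    "\u0421": "C", "\u0422": "T", "\u0423": "Y", "\u0425": "X",
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",
    "\u0441": "c", "\u0443": "y", "\u0445": "x",
})


def normalize_lookalikes(text: str) -> str:
    """Replace Cyrillic look-alike letters with their Latin twins."""
    return text.translate(CYRILLIC_TO_LATIN)


# A brand name typed with a Cyrillic first letter becomes plain Latin.
brand = normalize_lookalikes("\u0421ontinental")
print(brand)
```

After this step, two names that look the same on screen also compare as equal strings, which makes the later matching more reliable.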
PriceList Profile

Searching for a more optimal way to recognize

Moving step by step, we started by recognizing the first, most difficult pricelists with Python scripts, which we wrote separately for each new pricelist.

This way we gained more and more experience and a better understanding of the situations we faced. After the first 20 pricelists, we understood what kind of mess to expect further on.

Connecting each Python script to its related PriceList Profile may not look like a perfect or very secure solution, but it is quite enough for a startup with a limited budget.

Synonyms Vocabulary

How to handle it without coding

The next goal was to avoid writing code for each new pricelist. As we recognized more and more similarities between the structures of different pricelists, we extracted them into a set of settings, so the Client's staff can handle new pricelists themselves without special training.

The recognition process also relies on the Synonyms Vocabulary, which is continuously improving.
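
As a rough sketch of what such per-pricelist settings could look like, the Python example below describes a pricelist by column positions, currency, and stock markers instead of a custom script. The class `PriceListProfile`, its fields, and `parse_row` are hypothetical names chosen for illustration. The actual settings in Automanager are not described in this article.

```python
from dataclasses import dataclass, field


@dataclass
class PriceListProfile:
    """Hypothetical settings that describe one wholesaler's pricelist."""
    title_col: int
    price_col: int
    stock_col: int
    currency: str
    # Cell values that this wholesaler uses to mean "in stock".
    in_stock_markers: set = field(default_factory=lambda: {"+", "yes"})


def parse_row(row, profile):
    """Turn one raw spreadsheet row into a standardized record."""
    raw_price = str(row[profile.price_col]).replace(",", ".").replace(" ", "")
    raw_stock = str(row[profile.stock_col]).strip().lower()
    has_marker = raw_stock in profile.in_stock_markers
    has_amount = raw_stock.isdigit() and int(raw_stock) > 0
    return {
        "title": str(row[profile.title_col]).strip(),
        "price": float(raw_price),
        "currency": profile.currency,
        "in_stock": has_marker or has_amount,
    }


# The same parser handles a new layout once its profile is filled in.
profile = PriceListProfile(title_col=1, price_col=3, stock_col=2, currency="EUR")
record = parse_row(["", " Michelin 205/55 R16 ", "+", "1 234,50"], profile)
print(record)
```

The design point is that adding a wholesaler means filling in a profile, which non-programmers can do, rather than writing a new script.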
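
A synonyms vocabulary can be pictured as a lookup from every known spelling to one canonical name. The Python sketch below is an assumption about the general idea, not the real Automanager vocabulary. The entries and the function names `canonical_brand` and `learn_synonym` are invented for the example.

```python
# Hypothetical vocabulary: lowercase variant -> canonical brand name.
SYNONYMS = {
    "conti": "Continental",
    "continental": "Continental",
    "mich": "Michelin",
    "michelin": "Michelin",
}


def canonical_brand(token, vocabulary=SYNONYMS):
    """Return the canonical name for a variant, or None if it is unknown."""
    return vocabulary.get(token.strip().lower())


def learn_synonym(variant, canonical, vocabulary=SYNONYMS):
    """Store a new variant, for example after a manual correction."""
    vocabulary[variant.strip().lower()] = canonical


print(canonical_brand(" Conti "))   # known variant
print(canonical_brand("Mishlen"))   # unknown until someone teaches it
learn_synonym("Mishlen", "Michelin")
print(canonical_brand("mishlen"))
```

Each manual correction can be fed back through something like `learn_synonym`, which is one way a vocabulary keeps improving over time.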

Manual work still remains

After the automatic script runs, some percentage of records cannot be recognized and requires manual work. To simplify this process, we show the Percentage of Similarity.
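
The article does not say how the similarity percentage is calculated. As one possible approach, the Python sketch below uses `difflib.SequenceMatcher` from the standard library to score an unrecognized title against catalog titles. The function names and the sample catalog are assumptions for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(a, b):
    """Return a 0-100 similarity score for two strings, ignoring case."""
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return round(ratio * 100, 1)


def best_candidates(raw_title, catalog, limit=3):
    """Rank catalog titles by similarity to an unrecognized title."""
    scored = [(similarity_percent(raw_title, item), item) for item in catalog]
    scored.sort(reverse=True)  # highest percentage first
    return scored[:limit]


catalog = ["Michelin Primacy 4", "Continental PremiumContact 6"]
for percent, title in best_candidates("Michlin Primacy 4", catalog):
    print(f"{percent}% {title}")
```

Showing the top few candidates with their percentages lets an operator confirm the right match with one click instead of searching the catalog by hand.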

Similarity Percentage

A possible better solution

Of course, we could improve this process substantially by using Machine Learning and other technologies. But that requires a vast volume of data first, and a budget for development as well.

How much and how long?

Maybe you are interested in the cost and duration of this solution. It was just one part of a broader, continuous development and research process. It took approximately 30 hours of research and development, not all at once but spread out over time. The whole improvement process took 2 months.

How does it help the business?

The whole process works autonomously, without our maintenance. The Client uses it for their own retail store and sells these services to others.

Are you interested in a similar solution?

Let's find out together where the bottleneck in your business is and how best to solve it.