Success stories
Automatic de-typesetting of print media at Rebold
Rebold is a Spanish company in the marketing and communication sector that specialises in analysing data of interest to its clients. By fully understanding their environment, Rebold’s clients can identify opportunities for growth. The printed press is one of the sources studied.
01. Challenge
One of the most time-consuming tasks in press analysis is de-typesetting: splitting each page of each medium into its individual news items.
This de-typesetting is carried out manually by several operators, who view hundreds of pages from various media and digitally cut each page into separate news items. The work is necessary, but it is of little value in itself without further analysis of the news.
It is also slow and costly. Moreover, as news changes ever faster and becomes more ephemeral, a much faster process is needed.
Rebold was looking for experts in artificial intelligence and computer vision who could identify a set of text boxes of various shapes and sizes and group them into a single news item.
Datision has extensive experience in computer vision and, as a spin-off of the Institut de Robòtica i Informàtica Industrial (IRI), had participated in multiple research projects that could help achieve these objectives.
02. Solution
To solve this problem, it was decided to develop a system based on deep learning. The system learns the current de-typesetting process autonomously, which is possible thanks to the large corpus of existing data.
Using these algorithms, the system generates images from the various media, correlates the text boxes by position, typography, size and similar features, and determines which of them, together with any images, belong to the same news item.
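To make the grouping step concrete, here is a minimal Python sketch. It is not Datision's system: the project uses deep learning, whereas this sketch substitutes a hand-written proximity rule (the hypothetical `same_item` function) for the learned decision on whether two boxes belong together. The `Box` type, the function names and the `max_gap` threshold are all assumptions made for illustration. Boxes that are linked pairwise are merged into news items with a simple union-find.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text or image box on a page (hypothetical layout-unit type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def gap(a: Box, b: Box) -> float:
    """Largest axis-wise distance between two boxes (0 if they overlap)."""
    dx = max(a.x - (b.x + b.w), b.x - (a.x + a.w), 0.0)
    dy = max(a.y - (b.y + b.h), b.y - (a.y + a.h), 0.0)
    return max(dx, dy)


def same_item(a: Box, b: Box, max_gap: float) -> bool:
    """Stand-in for the learned model: link boxes that lie close together.

    A trained model would also weigh typography, size and other features.
    """
    return gap(a, b) <= max_gap


def group_boxes(boxes: list[Box], max_gap: float = 10.0) -> list[list[int]]:
    """Return groups of box indices, one group per inferred news item."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of boxes that the rule links.
    for i, j in combinations(range(len(boxes)), 2):
        if same_item(boxes[i], boxes[j], max_gap):
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Illustrative page: a headline over two columns, then a second item below.
page = [
    Box(0, 0, 200, 30),     # headline of item A
    Box(0, 35, 95, 100),    # left column of item A
    Box(105, 35, 95, 100),  # right column of item A
    Box(0, 300, 200, 30),   # headline of item B
    Box(0, 335, 200, 80),   # body of item B
]
items = group_boxes(page)
```

In a learned system, `same_item` would be replaced by the model's prediction for each pair of boxes; the merging step could stay the same.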
This system increases the productivity of the operators who carry out the de-typesetting. They can now focus on analysing the news, which raises the value of their work and shortens the analysis time.
03. Current scenario
This research project is in its final phase, with results satisfactory enough for it to go into production soon. Manual de-typesetting times of 20-30 minutes have been reduced to just seconds.
Once the project is complete, further progress is expected in classifying and analysing the text with natural language processing systems.