Automatic de-layout of print media at Rebold
Rebold is a Spanish marketing and communication company that specialises in analysing data of interest to its clients. By fully understanding their environment, Rebold’s clients can identify opportunities for growth. The written press is one of the sources of information studied.
One of the most time-consuming tasks in press analysis is de-layout: identifying the individual news items in each publication.
To solve this problem, the decision was made to develop a system based on deep learning. The system learns the current de-layout process autonomously, which is possible thanks to the large corpus of data currently available.
Using these algorithms, the system generates images from the various media, correlates the text boxes according to their position, typography, size and other features, and determines which of them, together with any images, belong to the same news item.
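The grouping step can be illustrated with a minimal sketch. It is not Rebold's system: the text describes a deep learning model, whereas this example substitutes a fixed distance rule and uses position only, purely to show what "deciding which boxes belong to the same news item" means. The `Box` type, the `gap` and `group_boxes` functions, and the `max_gap` threshold are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text or image box on a page, in page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def gap(a: Box, b: Box) -> float:
    """Separation between the edges of two boxes (0 if they touch or overlap)."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return max(dx, dy)


def group_boxes(boxes, max_gap=10.0):
    """Group boxes that are chained together by gaps of at most max_gap.

    A hand-written stand-in for the learned model: a simple union-find
    over pairs of nearby boxes. Returns sorted lists of box indices.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        if gap(boxes[i], boxes[j]) <= max_gap:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Two made-up articles: a headline, body and image on the left,
# and a headline and body on the right.
page = [
    Box(0, 0, 200, 30),     # 0: headline of article A
    Box(0, 35, 95, 200),    # 1: body text of article A
    Box(105, 35, 200, 120), # 2: image of article A
    Box(300, 0, 500, 30),   # 3: headline of article B
    Box(300, 36, 500, 200), # 4: body text of article B
]
news_items = group_boxes(page)
```

A rule like this fails on dense pages where unrelated items sit close together, which is one reason a model that also learns from typography and size, as described above, is attractive.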
This system increases the productivity of the operators involved in the de-layout process, who can instead focus on analysing the news, which increases the value of their work and reduces analysis time.