Introduction to Video Subtitles
Video subtitles are generally divided into two categories:
- Caption: Text added to the video stream in post-production that describes the meaning of the current video content, such as online course subtitles and news broadcast subtitles.
- Scene text: Text that belongs to the environment or to objects in the video, such as the road name on a street sign, a name printed on clothing, or a product trademark.
This article covers the first type, captions.
Unlike document images, on which optical character recognition (OCR) can be performed directly, video caption extraction faces several problems:
- The complex background of video images makes it difficult to extract and segment subtitles.
- In order to avoid occlusion of the main part of the image, many video captions have very small character sizes, resulting in low resolution.
- Digital video is usually stored in a lossy compressed format, which further reduces the effective resolution.
However, captions also have the following notable features:
- Character size stays within a limited range, and the characters within a caption are roughly equal in size.
- Captions are mostly arranged horizontally.
- The edge shadow of the characters is in a color complementary to the caption foreground or to the background.
Exploiting these features reduces the difficulty of extracting captions and improves the accuracy of the extracted text.
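As a rough illustration of how the size and layout features can be used, the sketch below filters candidate bounding boxes by relative height and aspect ratio. The `Box` type, the function names, and all threshold values are assumptions chosen for this example, not values from the text; in practice they would be tuned for the video source.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of a candidate text region, in pixels."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


def is_caption_candidate(box: Box, frame_h: int,
                         min_rel_h: float = 0.03,
                         max_rel_h: float = 0.12,
                         min_aspect: float = 2.0) -> bool:
    """Apply two caption heuristics (thresholds are illustrative assumptions).

    1. Limited size: box height must be a bounded fraction of the frame height.
    2. Horizontal layout: the box must be clearly wider than it is tall.
    """
    if box.h <= 0 or frame_h <= 0:
        return False  # degenerate input cannot be a caption line
    rel_h = box.h / frame_h
    return min_rel_h <= rel_h <= max_rel_h and box.w / box.h >= min_aspect


def filter_caption_boxes(boxes: List[Box], frame_h: int) -> List[Box]:
    """Keep only the boxes that pass both heuristics, preserving order."""
    return [b for b in boxes if is_caption_candidate(b, frame_h)]


if __name__ == "__main__":
    candidates = [
        Box(100, 620, 400, 40),  # wide, moderately tall: caption-like
        Box(50, 50, 30, 200),    # tall and narrow: rejected
        Box(0, 0, 20, 10),       # too small relative to the frame: rejected
    ]
    print(filter_caption_boxes(candidates, frame_h=720))
```

The color feature (a shadow complementary to the foreground) is left out here because it needs pixel data; the same filter structure could take an additional color check.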
Recognition of Video Subtitles
Video subtitle recognition mainly consists of the following steps: subtitle detection, subtitle positioning, subtitle extraction, and subtitle recognition.
This is illustrated below:
General flow of video caption recognition
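The four steps can be sketched as a small pipeline. This is a minimal sketch under strong assumptions, not a definitive implementation: a frame is a plain list of grayscale rows, text-like rows are found by summing horizontal intensity changes (a simple stand-in for a real text detector), and the `ocr` argument is a hypothetical callable standing in for whatever OCR engine is used. All function names and threshold values are illustrative.

```python
from typing import Callable, List, Optional, Tuple

Frame = List[List[int]]  # grayscale image: rows of pixel values in 0..255


def row_edge_strength(frame: Frame) -> List[int]:
    """Sum of absolute horizontal differences for each row."""
    return [sum(abs(row[i + 1] - row[i]) for i in range(len(row) - 1))
            for row in frame]


def detect(frame: Frame, threshold: int) -> bool:
    """Step 1, detection: does any row show many sharp transitions?"""
    return any(s >= threshold for s in row_edge_strength(frame))


def locate(frame: Frame, threshold: int) -> Optional[Tuple[int, int]]:
    """Step 2, positioning: longest run of text-like rows as (top, bottom).

    `bottom` is exclusive. `threshold` is assumed to be positive.
    """
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    strengths = row_edge_strength(frame) + [-1]  # sentinel closes the last run
    for y, s in enumerate(strengths):
        if s >= threshold:
            if start is None:
                start = y
        elif start is not None:
            if best is None or y - start > best[1] - best[0]:
                best = (start, y)
            start = None
    return best


def extract(frame: Frame, rows: Tuple[int, int], level: int = 128) -> Frame:
    """Step 3, extraction: crop the located rows and binarize them."""
    top, bottom = rows
    return [[1 if p >= level else 0 for p in row] for row in frame[top:bottom]]


def recognize_caption(frame: Frame,
                      ocr: Callable[[Frame], str],
                      threshold: int = 500) -> Optional[str]:
    """Run all four steps; step 4 (recognition) is delegated to `ocr`."""
    if not detect(frame, threshold):
        return None  # no caption in this frame
    rows = locate(frame, threshold)
    if rows is None:
        return None
    return ocr(extract(frame, rows))
```

A real system would replace the row-based heuristic with a proper detector and pass an actual OCR engine as `ocr`; the point of the sketch is only the order of the stages and what each one hands to the next.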