Content-based video tagging emerged because traditional retrieval techniques can no longer meet the demands of today's massive video collections, which led researchers to study content-based video retrieval. Video tagging annotates a video with keywords for different semantic concepts, derived from the video's content. These annotations are the necessary basis for building a video index and achieving efficient video retrieval. Current video tagging methods fall into three main categories: manual tagging, rule-based tagging, and machine-learning-based tagging.
Rule-based tagging uses expert knowledge of a specific domain to define classification rules, which are then applied to assign tags. However, such rules generally cannot cover all semantic content, so this approach does not achieve the generality and practicality that video tagging requires.
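The rule-based approach can be illustrated with a minimal sketch. The features (`green_ratio`, `motion_energy`, `face_count`), thresholds, and concepts ("sports", "interview") below are hypothetical examples chosen for illustration and are not taken from any particular system. The sketch also shows the limitation noted above: a shot that no hand-written rule anticipates receives no tag at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ShotFeatures:
    # Hypothetical low-level features extracted from one video shot
    green_ratio: float     # fraction of predominantly green pixels, in [0, 1]
    motion_energy: float   # average motion magnitude, normalized to [0, 1]
    face_count: int        # number of detected faces


# A rule pairs a condition on the features with a semantic concept.
Rule = Tuple[Callable[[ShotFeatures], bool], str]

# Hypothetical expert rules; real systems would encode domain knowledge here.
RULES: List[Rule] = [
    (lambda f: f.green_ratio > 0.6 and f.motion_energy > 0.5, "sports"),
    (lambda f: f.face_count >= 1 and f.motion_energy < 0.2, "interview"),
]


def rule_based_tags(features: ShotFeatures, rules: List[Rule] = RULES) -> List[str]:
    """Return every concept whose rule fires; an empty list means no rule covers the shot."""
    return [concept for condition, concept in rules if condition(features)]


print(rule_based_tags(ShotFeatures(0.7, 0.8, 0)))  # ['sports']
print(rule_based_tags(ShotFeatures(0.1, 0.5, 0)))  # [] (content the rules do not cover)
```

Adding coverage means writing more rules by hand, which is why this approach scales poorly to the open-ended range of semantic content found in real videos.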
Machine-learning-based tagging builds a model for each semantic concept by learning from manually annotated training videos. The model then classifies unlabeled video data and assigns the corresponding semantic concepts. Because machine learning theory is relatively mature, it is generally considered well suited to video tagging, and research on video tagging focuses mainly on applying machine learning to improve tagging accuracy.
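The train-then-classify workflow can be sketched as follows. This sketch assumes that each video has already been reduced to a fixed-length feature vector. It uses a nearest-centroid classifier as a deliberately simple stand-in, because the text does not name a specific learning model. The two-dimensional features and the concepts "sports" and "news" are hypothetical.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(labeled: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Learn one mean feature vector (centroid) per concept from manually annotated videos."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, concept in labeled:
        if concept not in sums:
            sums[concept] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[concept][i] += x
        counts[concept] += 1
    return {c: [s / counts[c] for s in vec] for c, vec in sums.items()}


def label_videos(model: Dict[str, List[float]], unlabeled: List[Vector]) -> List[str]:
    """Assign each unlabeled video the concept whose centroid is nearest (Euclidean distance)."""
    return [min(model, key=lambda c: dist(model[c], features)) for features in unlabeled]


# Hypothetical annotated training set: (feature vector, semantic concept)
training = [
    ([0.9, 0.1], "sports"),
    ([0.8, 0.2], "sports"),
    ([0.1, 0.9], "news"),
    ([0.2, 0.8], "news"),
]
model = train_centroids(training)
print(label_videos(model, [[0.7, 0.3], [0.3, 0.6]]))  # ['sports', 'news']
```

In practice, the classifier, the features, and the concept vocabulary would be chosen to fit the task. The learning step (fitting on annotated data) and the labeling step (predicting concepts for unlabeled data) are the two stages described above.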
Depending on the level of the video structure at which a concept resides, video tagging can be divided into three kinds: video-type-level tagging, key-frame-image-level tagging, and object-level tagging within images. These levels of video tagging are as follows: