Unlike unimodal retrieval, in cross-modal retrieval the modality of the results differs from the modality of the query. For example, you can use an image to retrieve text, video, or audio. The key to cross-modal retrieval is modeling the relationships between different modalities, and the difficulty lies in bridging the semantic gap between them. However, when the documents to be retrieved themselves contain several modalities, general cross-modal methods cannot be applied directly to multi-modal retrieval.
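To make the idea of modeling relationships between modalities concrete, here is a minimal sketch of one common approach: map both modalities into a shared vector space and rank by similarity there. The source does not prescribe this method. The function names and the toy embedding values are hypothetical, and the learned mapping into the shared space (the part that actually bridges the semantic gap) is assumed to exist and is not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_modal_search(query_vec, text_index):
    """Rank text documents against an image query in a shared space.

    Assumes both modalities were already projected into the same vector
    space by some learned model (hypothetical; not part of this sketch).
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in text_index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-made embeddings (illustrative values only).
image_query = [0.9, 0.1, 0.0]
text_index = {
    "caption_dog": [0.8, 0.2, 0.1],
    "caption_car": [0.0, 0.1, 0.9],
}

ranking = cross_modal_search(image_query, text_index)
print(ranking[0][0])  # id of the best-matching text document
```

The query is an image embedding and every candidate is a text embedding, so the comparison only makes sense because both live in the same space.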
Multi-modal retrieval methods handle multimedia data with several modalities: both the query and the documents to be retrieved may contain more than one modality. Multi-modal methods can also be used to improve the accuracy of unimodal retrieval. The main difference between multi-modal and cross-modal retrieval is that in multi-modal retrieval the query and the documents must share at least one modality. Multi-modal methods usually combine the different modalities for retrieval rather than modeling the relationships between them.
For example, in many multi-modal image retrieval systems, the query image comes with associated text, and the images to be retrieved also carry associated text. If the query and the documents to be retrieved have no modality in common, the problem becomes a cross-modal one, and traditional multi-modal methods cannot handle it.
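The contrast can be sketched in a few lines. The example below assumes a simple late-fusion scheme: score each shared modality separately, then combine the scores with weights. This is only one way to combine modalities and is not taken from the source. The names `fused_score` and `weights` and all vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def fused_score(query, doc, weights):
    """Late fusion: compare like with like, then combine per-modality scores.

    `query` and `doc` map a modality name to a feature vector.
    Returns None when the two share no modality, which is the case that
    multi-modal fusion cannot handle and cross-modal retrieval addresses.
    """
    shared = [m for m in query if m in doc]
    if not shared:
        return None
    total_weight = sum(weights[m] for m in shared)
    return sum(weights[m] * cosine(query[m], doc[m]) for m in shared) / total_weight


# Toy feature vectors (illustrative values only).
query = {"image": [1.0, 0.0], "text": [0.0, 1.0]}
doc_both = {"image": [1.0, 0.0], "text": [0.0, 1.0]}
doc_text_only = {"text": [1.0, 0.0]}
doc_audio_only = {"audio": [1.0, 0.0]}
weights = {"image": 0.5, "text": 0.5}

print(fused_score(query, doc_both, weights))        # image and text both contribute
print(fused_score(query, doc_text_only, weights))   # only the text modality is compared
print(fused_score(query, doc_audio_only, weights))  # no shared modality
```

Each modality is only ever compared with itself, so nothing here models how image features relate to text features. That is why the function has nothing to return once the query and the document share no modality.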
Translated from "A Semantic Model for Cross-Modal and Multi-Modal Retrieval"