Cloud storage faces encrypted data retrieval challenge

Source: Internet
Author: User
Keywords: algorithms, security, cloud storage

Cloud computing is a form of distributed computing: a model for delivering and consuming services online, in which the required services are obtained over the network on demand and in a scalable way. The term also covers the data-center hardware and software that provide these services. Cloud computing evolved from parallel computing, distributed computing, and grid computing. Its implementations include software as a service, utility computing, platform as a service, and infrastructure as a service. Cloud computing already has applications such as Google Docs, and Microsoft and Amazon offer similar cloud services.

The primary goal of cloud computing is to provide efficient computing services. One element of cloud infrastructure is a reliable and secure data storage center, so storage security is one of the main security topics in the cloud computing field. The common way to protect data privacy is for the user to encrypt the data and store only the ciphertext on the server. As the volume of encrypted data stored in the cloud grows, retrieving that encrypted data becomes an urgent problem.

Research on encrypted information retrieval has produced several algorithms, such as single-user linear search, keyword-based public key search, and the security index. These algorithms can retrieve the required information quickly, but their cost is high and they do not scale to large data sets. In cloud storage, ranking the retrieved documents by relevance is a further problem, and none of these algorithms solves it.

Encrypting the term frequencies in a document with order-preserving encryption allows documents to be sorted by relevance, which improves retrieval accuracy and recall. However, some keywords occur very frequently in documents yet carry little information. Such words are called common words, and their presence distorts the measured relevance between a document and the query. The vector space model, which reflects the relevance of documents to queries accurately, cannot be applied directly. Fully homomorphic encryption provides cryptographic algorithms that can operate on ciphertext. On the one hand it ensures that the ciphertext cannot be analyzed statistically; on the other hand, addition and multiplication can be performed on the encrypted data while the order of the corresponding plaintext is preserved.

1. Encrypted storage technology in cloud storage applications

Large-scale, high-performance storage systems have demanding security requirements, especially in cloud storage applications. Scalable, high-performance storage security technology is the most fundamental guarantee for promoting storage applications in networked environments (such as cloud storage), and it has become a research hotspot in network storage. Storage security in cloud storage applications includes authentication services, encrypted data storage, security management, and security logging and auditing.

The access control service implements user authentication and authorization and prevents illegal and unauthorized access. Its main features are as follows: users can perform only the operations for which the administrator or the file owner has granted permission, and the administrator can perform only necessary administrative operations, such as user management, data backup, and hotspot object migration, but cannot access users' encrypted private data.

Encrypted storage encrypts specified directories and files before saving them, protecting the confidentiality of sensitive data during storage and transmission. The main function of security management is to maintain user information and permissions, such as registering and cancelling user accounts, authorizing users, and revoking user rights in an emergency.

Security logging and auditing records major security-related events involving users and the system, giving system administrators the audit information they need to monitor the system and its active users.

For users, encrypted storage is the most important of the four types of storage security services described above. It is the core technology that guarantees the confidentiality of users' private data on a shared storage platform.

As storage systems and storage devices become more networked, a storage system must provide suitable technology for sharing encrypted data while keeping sensitive data confidential. Protecting the privacy of users requires that storage security be based on trust in the storage system. It is therefore necessary to study encrypted storage technology suited to networked storage systems, and to provide end-to-end encrypted storage together with mechanisms for long-term key storage and key sharing. These ensure the confidentiality and privacy of user data and improve the security of key storage, the efficiency of key distribution, and the flexibility of encryption policy. When large amounts of encrypted information are stored, encrypted retrieval is the main means of sharing information, and it is one of the problems that encrypted storage must solve.

2. Encrypted information retrieval technology

Research on encrypted information retrieval began in 2000, when Song et al. proposed a practical algorithm for searching encrypted data. Boneh et al. and others subsequently proposed keyword-based search algorithms, including public key search and the security index.




2.1 Linear Search algorithm

In the linear search algorithm, the plaintext is first encrypted with a symmetric encryption algorithm. For the ciphertext of each keyword, a pseudo-random sequence shorter than the ciphertext is generated, together with a check sequence determined by the pseudo-random sequence and the ciphertext. The combined length of the pseudo-random sequence and the check sequence equals the length of the ciphertext. The ciphertext is then encrypted a second time with the pseudo-random sequence and the check sequence. To search, the user submits the ciphertext sequence that corresponds to the plaintext keyword. The server adds this sequence modulo 2 to each stored sequence. If the result satisfies the check relation, the keyword is present at that position; otherwise it is not.

The linear search method is an encrypted information retrieval algorithm with strong resistance to statistical analysis. Its fatal disadvantage is that ciphertexts must be matched one after another, which makes the method hard to apply to large data sets.
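The steps of section 2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published scheme: the first encryption layer is replaced by a keyed hash (so it cannot be decrypted), a single check key is shared by all words, and the names prf, pre_encrypt, encrypt_words, and server_search, as well as the block sizes, are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16               # fixed length of one encrypted word, in bytes (assumption)
STREAM = 8               # length of the pseudo-random part S
CHECK = BLOCK - STREAM   # length of the check part F(S)


def prf(key: bytes, data: bytes, length: int) -> bytes:
    """Keyed pseudo-random function built from HMAC-SHA256."""
    return hmac.new(key, data, hashlib.sha256).digest()[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise addition modulo 2 of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def pre_encrypt(word_key: bytes, word: str) -> bytes:
    """Deterministic stand-in for the first (symmetric) encryption layer."""
    return prf(word_key, word.encode("utf-8"), BLOCK)


def encrypt_words(word_key, stream_key, check_key, words):
    """Second layer: store X xor (S || F(S)) for the word at each position."""
    out = []
    for i, word in enumerate(words):
        x = pre_encrypt(word_key, word)
        s = prf(stream_key, i.to_bytes(8, "big"), STREAM)  # pseudo-random sequence
        mask = s + prf(check_key, s, CHECK)                # S followed by its check
        out.append(xor(x, mask))
    return out


def server_search(ciphertexts, x, check_key):
    """Linear scan: positions where (C xor X) satisfies the check relation."""
    hits = []
    for i, c in enumerate(ciphertexts):
        mask = xor(c, x)
        s, t = mask[:STREAM], mask[STREAM:]
        if hmac.compare_digest(t, prf(check_key, s, CHECK)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    wk, sk, ck = b"word-key", b"stream-key", b"check-key"
    stored = encrypt_words(wk, sk, ck, ["cloud", "storage", "cloud", "index"])
    print(server_search(stored, pre_encrypt(wk, "cloud"), ck))  # [0, 2]
```

The loop in server_search touches every stored ciphertext, which is the one-by-one matching cost that limits the method on large data sets.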

2.2 Keyword-based public key search

The keyword-based public key search algorithm was proposed by Boneh. Its aim is to let a user obtain data from a remote database when the user's own storage and computing resources are insufficient. Storage and computing resources are distributed asymmetrically; that is, the user's own computing and storage capacity cannot meet the user's needs in real time. At the same time, mobile users increasingly need to store and index data, for example in email services. In these situations the user's data privacy must be protected. Because the encrypted data comes from several different sources, the solution is to build the scheme on public key cryptography.

The algorithm first generates a public key and a private key. The plaintext keywords to be stored are then encrypted with the public key to produce searchable ciphertext.

2.3 Security Index

The security index was proposed by Park and others, and it solved the problem that simple indexing is vulnerable to statistical attack. In this mechanism, the key used for each encryption is taken from a set of reverse hash sequences generated in advance, and the encrypted index is placed in a Bloom filter. To retrieve, multiple trapdoors are first generated with the keys of the reverse hash sequence, and the Bloom filter test is then carried out. The returned ciphertext documents are decrypted to obtain the documents being sought.

This scheme addresses multi-user encrypted information retrieval in which new users join and existing users leave. Its defect is that a large number of key sequences must be generated, and the computational cost of each retrieval grows linearly with the number of retrievals performed. This is hard to accept in practical applications.
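The index-and-test part of section 2.3 can be sketched with a per-document Bloom filter and keyed trapdoors. This is only an illustrative sketch: it uses a single fixed key instead of the reverse hash sequence of keys described above, and the function names and the parameters M_BITS and N_HASHES are assumptions made for the example.

```python
import hashlib
import hmac

M_BITS = 1024   # Bloom filter size in bits (assumption)
N_HASHES = 4    # number of keyed hash functions (assumption)


def trapdoor(key: bytes, word: str) -> list:
    """One keyed digest per hash function; only the key holder can compute it."""
    return [
        hmac.new(key, bytes([i]) + word.encode("utf-8"), hashlib.sha256).digest()
        for i in range(N_HASHES)
    ]


def positions(doc_id: str, td: list) -> list:
    """Bind a trapdoor to one document, so the same word sets different bits
    in different documents and bit patterns cannot be compared across them."""
    return [
        int.from_bytes(
            hmac.new(t, doc_id.encode("utf-8"), hashlib.sha256).digest(), "big"
        ) % M_BITS
        for t in td
    ]


def build_index(key: bytes, doc_id: str, words) -> int:
    """Client side: return the Bloom filter of a document as an integer bitmap."""
    bits = 0
    for word in words:
        for p in positions(doc_id, trapdoor(key, word)):
            bits |= 1 << p
    return bits


def server_test(bits: int, doc_id: str, td: list) -> bool:
    """Server side: Bloom filter membership test using only the trapdoor."""
    return all((bits >> p) & 1 for p in positions(doc_id, td))


if __name__ == "__main__":
    key = b"index-key"
    index = {
        "doc1": build_index(key, "doc1", ["cloud", "storage", "security"]),
        "doc2": build_index(key, "doc2", ["retrieval", "index"]),
    }
    td = trapdoor(key, "cloud")
    print([d for d, bits in index.items() if server_test(bits, d, td)])
```

A Bloom filter can report false positives but never false negatives, so the user may occasionally receive and decrypt a document that does not contain the keyword.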

All of the encrypted retrieval algorithms above use a Boolean retrieval model, so they cannot rank results by the relevance between the query and the documents. In practice, especially in large-scale cloud storage applications, many documents may contain a given query keyword, and finding the one or few most relevant documents among them is a problem that needs to be addressed. Whether the mature vector space model can be applied to encrypted documents, and relevance ranking performed on that basis, remains an open question.

2.4 Encryption search algorithm with relevance ordering

Swaminathan and others put forward a ranked search algorithm that protects privacy. In this algorithm, the term frequency of each keyword in each document is encrypted with an order-preserving encryption algorithm. Once the encrypted documents have been submitted to the server, a search first retrieves the encrypted documents that contain the keyword ciphertext. The term-frequency ciphertexts produced by the order-preserving algorithm are then sorted, and the encrypted documents with the highest values are returned to the user, who decrypts them.
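Ranking by encrypted term frequency can be illustrated with a toy order-preserving mapping. The sketch below is not the encryption used by Swaminathan and others; it is a hypothetical construction (a running sum of keyed positive gaps) chosen only because it is strictly increasing, and it is practical only for small integers such as term frequencies.

```python
import hashlib
import hmac


def ope_encrypt(key: bytes, value: int) -> int:
    """Toy order-preserving map: sum of keyed positive gaps up to `value`."""
    total = 0
    for i in range(1, value + 1):
        digest = hmac.new(key, i.to_bytes(4, "big"), hashlib.sha256).digest()
        total += 1 + int.from_bytes(digest[:2], "big")  # each gap is at least 1
    return total


def server_rank(encrypted_tf: dict, top_k: int) -> list:
    """Server side: sort document ids by encrypted term frequency, highest first."""
    ranked = sorted(encrypted_tf, key=encrypted_tf.get, reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    key = b"ope-key"
    term_freq = {"doc1": 3, "doc2": 7, "doc3": 1}   # plaintext, known to the client only
    encrypted = {d: ope_encrypt(key, f) for d, f in term_freq.items()}
    print(server_rank(encrypted, 2))  # ['doc2', 'doc1']
```

Because a larger plaintext always maps to a larger ciphertext, the server can sort without the key, but it also learns the relative order of the frequencies.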

This approach ranks a given set of possibly relevant encrypted documents and returns the most likely matches to the user. However, it has two limitations. First, it is not suitable for queries that contain several query words. Second, it uses only the term frequencies within a document and cannot use inverse document frequency, so the vector space model cannot be applied directly. One way to solve the former problem is to encrypt the term frequencies with an additively homomorphic encryption algorithm.
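To show what additive homomorphism buys for multi-word queries, the sketch below uses a toy version of the Paillier cryptosystem. The text does not name a specific scheme, so Paillier is an assumption here, and the primes are far too small for real use. The server multiplies ciphertexts to add the hidden term frequencies of several query words; in this sketch only the key holder can decrypt the combined score.

```python
import math
import random


def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key pair from two small primes (insecure demo sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)        # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt m as (1 + n)^m * r^n mod n^2 with a fresh random r."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) * pow(r, n, n2) % n2


def add(n: int, c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


if __name__ == "__main__":
    n, priv = keygen()
    # Encrypted term frequencies of the query words "cloud" and "storage" in one document.
    c_cloud, c_storage = encrypt(n, 3), encrypt(n, 4)
    score = add(n, c_cloud, c_storage)      # computed without the private key
    print(decrypt(n, priv, score))          # 7
```

Requires Python 3.8 or later for the three-argument pow with a negative exponent.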




3. A retrieval method based on fully homomorphic encryption

In research on encrypted information retrieval, the ranking of results is one of the important measures of a retrieval algorithm's performance. As cloud computing technology spreads, the number of encrypted documents will grow explosively. Accurate ranking then becomes an objective requirement for a retrieval system, its main purpose being to improve service quality and retrieval efficiency. An analysis of existing encrypted retrieval algorithms shows that they pay too little attention to ranking and accuracy while ensuring recall.

To solve this problem, this paper proposes a retrieval method based on fully homomorphic encryption for cloud storage applications. The method uses the vector space model from information retrieval to compute the relevance between the stored documents and the query. It counts term frequencies and inverse document frequencies, then encrypts the documents and builds the index using fully homomorphic encryption. The encrypted documents are uploaded to the server together with the ciphertext of the index entries.

The retrieval and ranking process under fully homomorphic encryption is shown in Figure 1. Before a search is submitted, the query statement first undergoes word segmentation and stemming to obtain the plaintext keyword sequence, which is then encrypted. The encrypted query terms are submitted to the cloud server, which searches the ciphertext sequences.

Each document is represented by a vector of keyword weights, where each weight is the normalized product of the term frequency and the logarithm of the inverse document frequency. The weights can be computed from the term frequencies and inverse document frequencies after they have been encrypted with fully homomorphic encryption.

The query is represented in the same way. The inner product of the two vectors gives the relevance score; documents are then sorted by score and the ranked results are returned to the user. After receiving the encrypted documents, the user decrypts them with the private key to recover the originals.
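The scoring itself can be sketched on plaintext. The example below leaves out encryption entirely and only illustrates the vector space computation (term frequency times the logarithm of inverse document frequency, normalization, inner product) that the method would carry out on ciphertexts. The function names and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict):
    """Return (normalized weight vector per document, idf table)."""
    n_docs = len(docs)
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = {}
    for doc_id, tokens in docs.items():
        weights = {t: f * idf[t] for t, f in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors[doc_id] = {t: w / norm for t, w in weights.items()}
    return vectors, idf


def rank(query_tokens, vectors: dict, idf: dict) -> list:
    """Score each document by the inner product with the query vector."""
    query = {t: f * idf.get(t, 0.0) for t, f in Counter(query_tokens).items()}
    scores = {
        doc_id: sum(w * query.get(t, 0.0) for t, w in vec.items())
        for doc_id, vec in vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    docs = {
        "a": ["cloud", "storage", "cloud"],
        "b": ["cloud", "index"],
        "c": ["index", "search", "search"],
    }
    vectors, idf = tfidf_vectors(docs)
    print(rank(["storage"], vectors, idf)[0])  # a
```

A term that occurs in every document gets an inverse document frequency of log(1) = 0, so very common words contribute nothing to the score.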

Plaintext data encrypted with a fully homomorphic encryption algorithm can be searched effectively without recovering the plaintext; that is, the most relevant documents are still returned to the user. This both protects the user's data security and improves retrieval performance.

4. Concluding remarks

This paper has analyzed the importance of encrypted retrieval technology in cloud storage and surveyed the current state and problems of encrypted retrieval and related technologies. On this basis, it has proposed a retrieval method based on fully homomorphic encryption and briefly introduced its basic principle. Existing experimental data show that this method can improve retrieval efficiency to some extent compared with other encrypted retrieval algorithms.
