Data Mining: A Basic Tool That Librarians Should Master


Wang Green Park Cammeying Guangzhou PLA Institute of Physical Education 510502





Abstract: This paper describes a way for librarians to provide information services in the future digital library, discusses the basic principles and methods of data mining and web mining, and emphasizes why librarians need to master the new technology of data mining.


Keywords: data mining, Web, Internet, information services, librarian





I. Introduction


With the advent of the information age and the development of Internet technology, the library's future functions are becoming more and more diverse. Thanks to the great success of web technology, people increasingly rely on the Internet to acquire knowledge and information. As a result, people visit the library less and less often, and one day they may no longer go to the library in person to borrow a book or ask a question. Faced with such a huge challenge, library professionals put forward the idea of the digital library, and after painstaking research, digital libraries have been successfully established. A digital library is a standardized electronic information infrastructure. It adopts distributed data storage, users can search for and retrieve the information they need through a variety of links and operations, and the whole operation is open to users. In other words, a digital library can store data at multiple sites, and users can search all of these sites with a simple operation. The digital library is the development of the traditional library in the information age: it not only retains the functions of the traditional library but also provides a comprehensive information access service. It is hoped that the digital library will become the information center and hub of the future library.





However, from the point of view of network technology, a digital library is no different from any other web information source: both are information sites built on the Internet. Users care about getting information and knowledge from the Internet, and they cannot tell whether a given source is a digital library or an ordinary Internet site. Yet it is difficult, and sometimes impossible, to retrieve useful information from tens of thousands of web sites, and information seekers often have little time to roam the vast ocean of data. In the digital library, librarians are information specialists: they organize, produce, provide, and manage information. It is the information service work of librarians that distinguishes the digital library from other information sites. How, then, should librarians, especially those in college and university libraries, provide information services in the digital library of the 21st century? What advanced technical tools do librarians need in order to serve users? These are important questions that the library community should study now.





II. The Choice of Technology


The web is a very successful information system. It makes it possible to disseminate information globally, allowing anyone to publish and access information at any time and from any location. The web's unstructured way of publishing and acquiring information has caused an information explosion, and a large amount of unstructured information is now dispersed across the Internet. This abundance brings convenience, but it also brings many problems: information overload that is difficult to digest, authenticity that is difficult to verify, security that is difficult to guarantee, and inconsistent formats that are difficult to process uniformly. In this ocean of information, even the most advanced search engines cover only about one third of indexable web sites. People have begun to put forward a new slogan: "Learn to discard information." They have begun to ask: "How can we avoid being overwhelmed by information, find useful knowledge in time, and improve information utilization?" In fact, for someone who is not a professional information manager, it is almost impossible to obtain the latest, most authoritative, and most comprehensive information from this vast ocean and to extract useful knowledge from it. Experts and professors in institutions of higher learning and research urgently need the latest and most authoritative knowledge in their fields because of the pressure of their research projects, and they have no time to search countless pages across thousands of sites. There should therefore be an intermediate link between the providers and the consumers of information, through which consumers can quickly and easily get the information they want. This urgent need gives librarians the opportunity to show their talents. Librarians can use their information management skills to provide users with useful information directly. They should seize the opportunity and give serious consideration to how to broaden their services in the Internet environment, extending from desk-based consulting services to electronic consulting services in the web-based information space.





The WWW is one of many types of information source, and it has distinctive characteristics. Its information is distributed around the world and can change at any time, and anyone can publish and obtain information at any time and from any location. Librarians must therefore select a tool that can effectively organize and obtain information on the WWW. Traditional search engines do not evaluate the content of a site; they only mechanically match the keywords provided by web designers. Even the best search engines require users to visit the different sites themselves and verify the information. Web mining is a promising tool for overcoming these drawbacks.





III. Data Mining and Web Mining


Data mining is one of the most popular topics in the field of information technology (IT). Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy, and random real-world application data. What is knowledge? In a broad sense, data and information are also forms of knowledge, but people more commonly regard concepts, rules, patterns, regularities, and constraints as knowledge. Data is seen as the source of knowledge, which is extracted much as gold is mined or panned from ore. Raw data can be structured, such as data in relational databases; semi-structured, such as text, graphics, and image data; or even heterogeneous data distributed across networks. The methods used to discover knowledge can be mathematical or non-mathematical, deductive or inductive. Discovered knowledge can be used for information management, query optimization, decision support, and process control, and also for maintaining the data itself. Data mining is therefore an interdisciplinary subject. It raises the use of data from simple low-level queries to the mining of knowledge from data in support of decision making. Driven by this demand, researchers and engineers from different fields, especially database technology, artificial intelligence, mathematical statistics, visualization, and parallel computing, have devoted themselves to data mining and made it a technology hot spot. When data mining technology is applied to the web in a network environment, it becomes web mining. Web mining can be defined broadly as the discovery and analysis of useful information from the WWW. This definition has two parts. On the one hand, it covers the automatic search for and acquisition of information and data from millions of web sites and online databases, which is called web content mining. On the other hand, it covers the discovery and analysis of the patterns with which users access one or more sites and online services, which is called web usage mining.
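To make the idea of "rules" and "patterns" as knowledge more concrete, the following is a minimal sketch of one common data mining technique, association rule discovery over pairs of items, written in Python. The loan records, the function name `pair_rules`, and the support threshold are all hypothetical and chosen only for illustration; the paper does not prescribe any particular algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical loan records: each set lists the subjects one reader borrowed.
LOANS = [
    {"statistics", "databases"},
    {"statistics", "databases", "physiology"},
    {"statistics", "physiology"},
    {"databases", "statistics"},
]


def pair_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules "a implies b".

    support    = share of transactions containing both a and b
    confidence = share of transactions containing a that also contain b
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


rules = pair_rules(LOANS)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every reader who borrowed a databases title also borrowed a statistics title, so the rule "databases implies statistics" has confidence 1.0. A rule of this kind is an example of the knowledge that cannot be obtained with a simple query for a known item.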





The heterogeneous and unstructured data on the web makes it difficult to discover, organize, and manage information. Traditional search and indexing tools, such as Lycos, Alta Vista, WebCrawler, and Aliweb, offer users some convenience, but they do not provide structured data, nor do they provide basic functions such as classification, filtering, and document translation. In recent years, researchers have been working on web content mining and developing intelligent information retrieval tools. Agent-based retrieval is one such tool. It is an artificial intelligence system that acts on behalf of a specific user to discover and organize web-based information automatically or semi-automatically. Based on the user's profile, it can automatically retrieve information that interests the user and then organize and translate it. Some agents can even learn the user's interests automatically and retrieve relevant information accordingly. Another approach to web content mining is database-based. This approach integrates and organizes the heterogeneous, unstructured data on the web into structured data, as in a relational database, and then accesses and analyzes the information using standard database query mechanisms and data mining techniques.
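The database-based approach can be illustrated with a minimal sketch in Python, using only the standard library. It assumes a very simple schema (one table of pages, one table of links) and two hypothetical sample pages in place of documents fetched from the web; the class and function names are illustrative and not part of any system described in this paper.

```python
import sqlite3
from html.parser import HTMLParser


class TitleAndLinkParser(HTMLParser):
    """Collect the <title> text and all href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def load_pages(pages):
    """Store (url, html) pairs as rows in an in-memory relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (url TEXT PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE link (src TEXT, dst TEXT)")
    for url, html in pages:
        parser = TitleAndLinkParser()
        parser.feed(html)
        conn.execute("INSERT INTO page VALUES (?, ?)", (url, parser.title))
        conn.executemany(
            "INSERT INTO link VALUES (?, ?)",
            [(url, dst) for dst in parser.links],
        )
    conn.commit()
    return conn


# Hypothetical sample pages; a real system would fetch these from the web.
SAMPLE_PAGES = [
    ("http://example.org/a",
     "<html><title>Data Mining Notes</title>"
     "<a href='http://example.org/b'>b</a></html>"),
    ("http://example.org/b",
     "<html><title>Library News</title>"
     "<a href='http://example.org/a'>a</a>"
     "<a href='http://example.org/c'>c</a></html>"),
]

conn = load_pages(SAMPLE_PAGES)
# A standard SQL query over what used to be unstructured HTML:
# each page title together with its number of outgoing links.
rows = conn.execute(
    "SELECT p.title, COUNT(l.dst) FROM page p "
    "LEFT JOIN link l ON l.src = p.url "
    "GROUP BY p.url ORDER BY p.url"
).fetchall()
print(rows)
```

Once the pages are stored as rows, any standard query or data mining technique that works on relational data can be applied to them, which is the point of the database-based approach.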





Web usage mining is the discovery of user access patterns (or access habits) from data that is collected automatically in daily access logs. Web usage mining is critical to building user profiles. Studying the behavior of users on one or more servers is essential to improving a web site so that it serves users more effectively.
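As a rough illustration of usage mining, the sketch below counts page hits and page-to-page transitions from access-log lines in Python. It assumes the server writes logs in the Common Log Format and treats each client host as one user, which is a simplification; the sample log lines, paths, and function names are hypothetical.

```python
import re
from collections import Counter

# Matches the start of a Common Log Format line: host, timestamp, request.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|POST) (?P<path>\S+)'
)


def parse_log(lines):
    """Yield (host, path) pairs from access-log lines, skipping bad lines."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.group("host"), match.group("path")


def mine_usage(lines):
    """Return per-page hit counts and counts of page-to-page transitions.

    Assumption: each host is one user and log lines are in time order.
    """
    hits = Counter()
    transitions = Counter()
    last_page = {}
    for host, path in parse_log(lines):
        hits[path] += 1
        if host in last_page:
            transitions[(last_page[host], path)] += 1
        last_page[host] = path
    return hits, transitions


# Hypothetical log lines for illustration only.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2000:10:00:00 +0800] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [01/Jan/2000:10:00:30 +0800] "GET /catalog.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [01/Jan/2000:10:01:00 +0800] "GET /index.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [01/Jan/2000:10:01:20 +0800] "GET /catalog.html HTTP/1.0" 200 2048',
    '10.0.0.2 - - [01/Jan/2000:10:02:00 +0800] "GET /journals.html HTTP/1.0" 200 512',
    'this line is malformed and is skipped',
]

hits, transitions = mine_usage(SAMPLE_LOG)
print(hits.most_common())
print(transitions.most_common())
```

The most frequent transitions suggest which pages users tend to visit in sequence, which is the kind of access pattern a site designer could use when deciding how to link pages together.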





IV. Information Service


Web mining is a promising tool. Traditional search engines are inefficient: their indexes are often incomplete, their results contain a large amount of irrelevant information, and they do not verify reliability. Yet the most basic requirement of an information system is that users can quickly and easily retrieve relevant and reliable information from the web. Web mining can not only discover information in the large volume of WWW data but also monitor and predict users' access habits, which gives designers more reliable information when they design a web site. Web mining technology can thus help librarians design sites that are user-friendly, time-saving, and efficient. It gives librarians an advanced tool for information services. With this tool, librarians can organize more information of better quality for users, in accordance with the requirements or habits of each user.





For example, college librarians can use web mining technology to retrieve relevant information from the WWW for the research topics of the university's different disciplines. The technology can automatically retrieve information and classify it by subject area, making it easier to access. Librarians can establish a set of characteristic features for each subject area and search and classify on the basis of these features, which helps ensure that the information obtained is reliable and authoritative. Because web mining technology can discover and organize information from the WWW automatically, without human intervention, librarians need to spend only a small amount of time maintaining the database. Users no longer have to spend a lot of time browsing hundreds of documents; they can get the information they want in a very short time and are well satisfied. More importantly, they can access this information from anywhere in the world at any time. In this way, librarians move their consulting services from the desk to the Internet.
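The idea of classifying retrieved documents against a feature set for each subject area can be sketched as follows in Python. The subject names, the keyword lists, and the overlap threshold are hypothetical; in practice a librarian would curate the features, and a real system would likely use a more robust classifier than simple keyword overlap.

```python
import re

# Hypothetical feature sets: a librarian would curate these per subject area.
SUBJECT_FEATURES = {
    "sports science": {"training", "athlete", "exercise", "physiology"},
    "computer science": {"algorithm", "database", "network", "mining"},
}


def tokenize(text):
    """Split text into a set of lowercase alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, min_overlap=2):
    """Return the subject whose features overlap the text most, or None.

    A document is left unclassified when fewer than `min_overlap` feature
    words match, so that unrelated pages are not forced into a subject.
    """
    words = tokenize(text)
    scores = {
        subject: len(words & features)
        for subject, features in SUBJECT_FEATURES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_overlap else None


# Hypothetical retrieved documents.
documents = [
    "Exercise physiology and athlete training loads",
    "A database algorithm for mining network logs",
    "Cafeteria opening hours",
]
labels = [classify(doc) for doc in documents]
print(labels)
```

Documents that match no subject well are returned as `None` rather than forced into a category, so a librarian can review them by hand while maintaining the database.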





V. Concluding Remarks


In the future digital library, how to give full play to the role of librarians is an important issue that every librarian should consider. Data mining will be a main technology for information retrieval in the future. To this end, we librarians should keep learning new technologies and new methods, provide good information services, and strive to break new ground and become true information experts.

