NABCD Analysis of the Genius Website


1. Need (requirements)

As society develops, the Internet industry has grown rapidly and now plays a vital role in every field. The sheer volume of data, however, also puts pressure on each of us in our studies and daily life. How do we find what we need in this data? How can we locate the relevant parts of this ocean of information more efficiently and conveniently? All of this tests the ability of IT practitioners to process data.

Core requirements:

To make the site simpler and easier for users to understand, the project develops two techniques: keyword extraction and tagging.

    • Keywords. According to Baidu Baike, keywords come from library science, where they are the index vocabulary used when media produce indexes. Keyword search is one of the main methods of web search indexing; there, keywords are the specific names of the products, services, and companies you want visitors to learn about. On our Genius website, keywords serve document indexing: they are the words or terms selected from a report or paper to represent the topic of the full text. Automatic keyword extraction uses a computer to select the words that reflect a document's subject. It is also called automatic keyword indexing and has important applications in literature retrieval, automatic summarization, and text clustering and classification. Keywords give a brief summary of a document, letting readers grasp its general content in a short time, and they are also the basis for indexing and clustering documents in information retrieval systems.

    • Tags. Baidu Baike defines a tag as follows: "tag" has no unified Chinese name; some call it "classification", "open classification", or "mass classification", and others call it a "label". A tag is a more flexible and interesting way to classify blog posts: you can add one or more tags to each post and then see every post on Blogbus that uses the same tag, which creates more contact and communication with other users. A tag is therefore a user-defined keyword that describes a piece of information, and tagging is the act of a user assigning tags to information.

Killer features:

    • As our team understands the current project, every display interface (login, file upload, file translation, and so on) is written in WPF, that is, as a desktop client. We want to turn it into a full website.

Peripheral Features:

    • A good UI design

    • Scalability: Enhance functionality without destroying the underlying structure

Auxiliary features:

    • A range of skins for users to choose from
    • Background music to help users relax

2. Approach

First, a good keyword extraction and tagging algorithm improves the user experience

    • To keep the algorithm simple, effective, and cost-effective, we followed last year's approach and use TF-IDF.
    • The main idea of TF-IDF is that if a word or phrase appears frequently in one article (high TF) but rarely in other articles, it distinguishes categories well and is suitable for classification. TF-IDF is the product $\mathrm{TF}(t, d) \times \mathrm{IDF}(t)$, where TF is the term frequency and IDF is the inverse document frequency.
      $\mathrm{TF}(t, d)$ is the frequency with which term $t$ appears in document $d$.
    • IDF is usually computed as $\mathrm{IDF}(t) = \log(N / n)$, where $N$ is the total number of documents and $n$ is the number of documents containing term $t$. To smooth the inverse document frequency, we use $\mathrm{IDF}(t) = \log(N / n + 0.01)$.
    • The main idea of IDF is that the fewer documents contain term $t$ (the smaller $n$), the larger its IDF and the better $t$ distinguishes categories. Now suppose $m$ documents in class $C_i$ contain $t$ and $k$ documents in other classes contain it, so $n = m + k$. When $m$ is large, $n$ is also large, and the IDF formula gives a small value, suggesting that $t$ is a weak category indicator. In fact, a term that appears frequently in the documents of one class represents the characteristics of that class's texts well; it should receive a higher weight and be selected as a feature word that distinguishes the class from others. This is where IDF falls short.
    • Tagging: we have two candidate algorithms. One is an adaptive multi-label classification algorithm weighted by correlation information; the other is a tagging algorithm based on TF-IDF keyword extraction. If time allows, we will use the first algorithm, which we expect to give the best results.
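The TF-IDF weighting and the TF-IDF-based tagging described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the team's actual implementation: documents are assumed to be already tokenized (Chinese text would first need word segmentation), TF is taken as a term's count divided by document length, the logarithm is natural, and the function names `tf_idf` and `top_k_tags` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document (hypothetical helper).

    TF is the term's count divided by the document length; IDF uses the
    smoothed form log(N / n + 0.01), where N is the number of documents and
    n is the number of documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term] + 0.01)
            for term, count in counts.items()
        })
    return weights


def top_k_tags(weights, k=3):
    """Pick the k highest-weighted terms as candidate keywords/tags.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["tfidf", "keyword", "extraction", "keyword"],
    ["tag", "user", "keyword"],
    ["music", "skin", "user"],
]
weights = tf_idf(docs)
print(top_k_tags(weights[0], k=2))  # ['extraction', 'tfidf']
```

In this toy corpus "keyword" occurs twice in the first document, yet it ranks below "extraction" and "tfidf" because it also appears in the second document, which lowers its IDF. The smoothing term also keeps a term that appears in every document at a small positive weight, $\log(1.01)$, rather than zero.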

Second, good interaction is the foundation of a great user experience

    • The interface is the user's first impression of the site, and good interface design gives a website intangible value. The three principles of user interface design are: keep the user in control of the interface, reduce the user's memory load, and keep the interface consistent. We will follow these three principles to design a good, user-friendly interface.
    • The purpose of interaction design is to build an organic relationship between the product and its users through the product's interface and behavior, so that users achieve their goals effectively. The personalized interface, music settings, question categories, and feedback features all aim to create a more comfortable user experience.

3. Benefit

Our product's audience is mainly school students, and it provides them with query-related information services. Unlike general-purpose search sites, which return large amounts of loosely relevant information, our product processes crawled text data to extract effective keywords and tags. Together with our bilingual translation, these features are more relevant to students.

For our team, we plan to cooperate with educational institutions once the product matures. On the one hand, we can obtain larger and more valuable teaching materials from these institutions; on the other, we help them promote their services on our learning website. Because students make up a high proportion of our users, such promotion should be considerably more effective and successful. The key is cooperating with educational institutions to form a mutually beneficial, win-win business model.

4. Competitors

Many similar platforms already exist, and the major search engines have launched related products, such as the well-known "Baidu Know" and "360 Search and Answer". Besides these search giants, some mainstream forums have launched related search services. Our technology is far less mature than theirs, but we have our own competitive advantages, which give us reason to believe we can win a share of this competitive market.

First, a specific user group: the site is strongly targeted at its field and highly professional.

The website uses a membership system that binds users to accounts. Most members are students or professionals in related fields, so the questions, answers, and content they contribute will carry professional reference value and academic rigor. At the same time, the system collects relevant specialized information to provide professional solutions to questions.

Second, the product is small, easy to modify, highly malleable, and flexible

Compared with the search engine giants, we serve a small, targeted user group, so feedback reaches us quickly and we can respond in time with effective fixes to problems in the site's features and structure.

Third, auxiliary features and personalized design

The product adds auxiliary features such as a music section, a user-defined home page background, and personalized skins, so that users can relax visually and aurally while they learn on the site.

5. Delivery

Promotion relies mainly on word of mouth among students, supplemented by online promotion.

First, to open up the market, we will announce the site through QQ groups, group chats, Weibo, and other mainstream social networks, starting by recommending it to the students around us. Each registered user receives an exclusive invitation code generated from their ID; users who bring in new registrations through their invitation code earn exclusive rewards, and a positive reward-feedback mechanism also applies to users who successfully publish the invitation message. Students recommending the product to each other will build our user base. According to social computing theory, a student's social circle is relatively homogeneous and consists mostly of other students, so peer recommendation is effective, has a high success rate, spreads quickly, and can take many forms.
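The invitation scheme above says each code is generated from the user's ID. One possible way to do that, offered only as a sketch under assumptions, is to compute an HMAC-SHA256 of the ID under a server-side secret and keep the first few Base32 characters, so codes are reproducible by the site but hard for outsiders to guess. The secret, the 8-character length, and the names `make_invite_code` and `build_invite_index` are all hypothetical.

```python
import base64
import hashlib
import hmac


def make_invite_code(user_id: int, secret: bytes, length: int = 8) -> str:
    """Derive a short invitation code from a user ID (hypothetical scheme)."""
    # HMAC ties the code to a server-side secret, so someone who only knows
    # the user ID cannot recompute it.
    digest = hmac.new(secret, str(user_id).encode("utf-8"), hashlib.sha256).digest()
    # Base32 output uses only A-Z and 2-7, which are easy to type and read aloud.
    return base64.b32encode(digest).decode("ascii")[:length]


def build_invite_index(user_ids, secret: bytes) -> dict:
    """Map each invitation code back to the user who owns it."""
    return {make_invite_code(uid, secret): uid for uid in user_ids}


SECRET = b"replace-with-a-real-server-secret"  # hypothetical placeholder
index = build_invite_index([1, 2, 3], SECRET)
code = make_invite_code(2, SECRET)
# When a new user signs up with `code`, the inviter to reward is index[code].
print(code, "->", index[code])
```

Eight Base32 characters carry 40 bits, so collisions are unlikely for a campus-sized user base, but a real site would still store issued codes and check for duplicates.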

Second, through cooperation with other websites, we will place advertisements on their pages; mutual promotion increases traffic for both sides in a mutually beneficial model.

Publish location:

We initially intend to release the product at Beihang University, announcing it in QQ groups, group chats, Weibo, and other social networks whose members are mainly students. From there it will spread to friends' groups at other schools, bringing the product to other universities, expanding the user base, and accelerating user growth. After that, we plan to promote it on CSDN, GitHub, and other important IT platforms; the more platforms we reach, the better our promotion.

Expected users:

Through promotion by our team members, we expect about 500 users.
