Graduation Project Concept: Discovering User Groups on tdrd

Source: Internet
Author: User

Following my own interests, I turned down the dean and signed up under Teacher Liu Qing to do web data mining. In practice I want to work independently on tdrd data, so I did not take part in the group's unified task assignment and am attached to the group in name only. If I get stuck, I can still ask the teacher.

The central idea is to discover common interest groups, circles of friends, and celebrities by observing the relationships between tdrd users. Users of tdrd can interact in only a limited number of ways: browsing the same boards, replying to posts, in-site mail, on-site messages (MSG), and viewing each other's profile information. A PHP program at the web layer can record each user's actions and write them to MySQL for later analysis and mining.
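As a rough illustration of what such an action log could look like, here is a minimal sketch. It uses Python with the standard-library `sqlite3` module as a stand-in for the PHP and MySQL setup described above; the table name `user_action`, its columns, and the action labels are all hypothetical.

```python
import sqlite3
import time

# Hypothetical action labels, one per interaction channel listed above.
ACTIONS = {"browse_board", "reply", "mail", "msg", "view_profile"}

SCHEMA = """
CREATE TABLE IF NOT EXISTS user_action (
    id     INTEGER PRIMARY KEY,
    actor  TEXT NOT NULL,   -- user who performed the action
    action TEXT NOT NULL,   -- one of ACTIONS
    target TEXT NOT NULL,   -- board name, or the other user's id
    ts     REAL NOT NULL    -- Unix timestamp
)
"""


def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, actor, action, target, ts=None):
    """Append one user action to the log."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    conn.execute(
        "INSERT INTO user_action (actor, action, target, ts) VALUES (?, ?, ?, ?)",
        (actor, action, target, time.time() if ts is None else ts),
    )
    conn.commit()


def contact_counts(conn):
    """Count directed user-to-user contacts (everything except board browsing)."""
    rows = conn.execute(
        "SELECT actor, target, COUNT(*) FROM user_action "
        "WHERE action != 'browse_board' GROUP BY actor, target"
    )
    return {(actor, target): n for actor, target, n in rows}
```

Keeping the raw events in one table and deriving counts with a query leaves room to weight the channels differently later, for example counting a mail more heavily than a profile view.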

1. Common interest groups can be discovered from the boards that users browse in common. This feature is now very common; the smth system itself has it, but tdrd does not. I want to use this task to familiarize myself with clustering algorithms for the subsequent work.
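One simple way to start on item 1 is sketched below. It is only an assumption about how the grouping might be done: users are compared by the Jaccard similarity of the sets of boards they browse, and any two users above a threshold end up in the same group (single-linkage clustering). The function names, the threshold of 0.5, and the input format are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of board names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def interest_groups(boards_by_user, threshold=0.5):
    """Group users whose board sets are similar.

    Single-linkage: groups are the connected components of the graph
    that links two users when their Jaccard similarity >= threshold.
    """
    parent = {u: u for u in boards_by_user}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, v in combinations(boards_by_user, 2):
        if jaccard(boards_by_user[u], boards_by_user[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in boards_by_user:
        groups.setdefault(find(u), set()).add(u)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])
```

Comparing every pair of users is quadratic in the number of users, so this is only practical for a small sample; a real run would need to prune candidate pairs first.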

2. Circles of friends can be found from replies, in-site mail, MSG, and profile views. This calls for a clustering algorithm that builds a matrix of contact strength between users and computes its eigenvalues. Afterwards, a questionnaire can be sent to users to assess whether the results are accurate.
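The eigenvalue-based clustering in item 2 could take many forms. The sketch below shows one of them, spectral bisection, and is not necessarily the algorithm intended here. It assumes that contact strength between two users is the smaller of the two directed contact counts, builds the graph Laplacian, and approximates its second eigenvector (the Fiedler vector) by power iteration using only the standard library. All names are hypothetical, and at least two users are assumed.

```python
import random


def mutual_strength(counts, users):
    """Symmetric contact-strength matrix: W[i][j] = min(i->j, j->i) counts."""
    n = len(users)
    w = [[0.0] * n for _ in range(n)]
    for i, u in enumerate(users):
        for j, v in enumerate(users):
            if i != j:
                w[i][j] = float(min(counts.get((u, v), 0), counts.get((v, u), 0)))
    return w


def fiedler_vector(w, iters=1000, seed=0):
    """Approximate the Laplacian eigenvector of the second-smallest eigenvalue.

    Runs power iteration on (c*I - L), where L = D - W, while projecting
    out the constant vector (the eigenvector for eigenvalue 0 of L).
    """
    n = len(w)
    deg = [sum(row) for row in w]
    c = 2.0 * max(deg) + 1.0  # strictly above the largest eigenvalue of L
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant component
        v = [
            (c - deg[i]) * v[i] + sum(w[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v


def split_circles(counts, users):
    """Split users into two circles by the sign of the Fiedler vector."""
    v = fiedler_vector(mutual_strength(counts, users))
    left = sorted(u for u, x in zip(users, v) if x < 0)
    right = sorted(u for u, x in zip(users, v) if x >= 0)
    return left, right
```

Taking the minimum of the two directions means one-sided attention contributes no strength, so a user whom everyone contacts but who never replies is not pulled into any circle. Applying the split recursively to each half would give more than two circles.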

3. From the user contacts described above, we can also find the "celebrities" of tdrd. Clustering algorithms require both parties to contact each other frequently before a relationship counts as strong. If a person does not pay equal attention to those who follow him, so that everyone pays attention to him but he does not pay attention to them, he may be a "celebrity". In this case we need a PageRank-style method that treats the attention directed at a user as inbound links to that user.
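The PageRank idea in item 3 can be sketched as follows. This is a plain power-iteration PageRank over a directed "pays attention to" graph; the damping factor of 0.85 is the conventional default rather than anything tuned for tdrd, and the edge format and function name are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100):
    """Rank users by the attention they receive.

    edges: iterable of (follower, followed) pairs.
    Returns a dict mapping each user to a score; scores sum to 1.
    """
    edges = list(edges)
    nodes = sorted({u for edge in edges for u in edge})
    out = {u: [] for u in nodes}
    for src, dst in edges:
        out[src].append(dst)

    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        # Users who follow nobody spread their score evenly over everyone.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank
```

A user with a high score but few outgoing edges is a candidate "celebrity" in the sense described above. Repeated contacts appear as repeated edges, so they carry proportionally more weight.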

4. If there is still time, I would like to examine whether it is reasonable to generate a static smth main page every 15 minutes. To reduce the load that repeated, concurrent requests for a dynamic page place on page interpretation and database queries, a website typically regenerates static pages from its dynamic pages (including their database queries) at regular intervals, by redirecting the interpreter's output to a file. How should the interval be determined, and which factors does it depend on? I have thought of these: the frequency of database updates, the number of concurrent page views, and how fresh users expect the data to be (that is, how much delay they can accept). We need a function, or at least an empirical formula, that derives the regeneration interval from these factors, so that the interval is set on reasonable grounds rather than by the programmer's estimate and imagination. This would involve data collection, a questionnaire survey, multi-dimensional curve fitting, and checking whether the last two factors are correlated, with what correlation coefficient, and so on. The problem looks complicated and could take a long time to think through. I have not reviewed the literature yet, so I do not know whether any conclusions are already available.
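As a starting point for item 4, here is a toy model, not an established result. It assumes database updates arrive as a Poisson process and that requests are spread evenly over each regeneration interval; under those assumptions the expected fraction of requests that see out-of-date data is 1 - (1 - e^(-x)) / x, where x is the update rate times the interval. All function names and parameters are hypothetical.

```python
import math


def stale_fraction(update_rate, interval):
    """Expected fraction of requests served a stale page.

    update_rate: database updates per second (Poisson assumption).
    interval: seconds between static regenerations.
    """
    x = update_rate * interval
    if x == 0:
        return 0.0
    return 1.0 - (-math.expm1(-x)) / x  # 1 - (1 - e^{-x}) / x


def longest_interval(update_rate, max_stale, hi=86400.0):
    """Largest interval (seconds, capped at hi) with stale fraction <= max_stale."""
    if stale_fraction(update_rate, hi) <= max_stale:
        return hi
    lo = 0.0
    # Bisection works because stale_fraction increases with the interval.
    for _ in range(60):
        mid = (lo + hi) / 2
        if stale_fraction(update_rate, mid) <= max_stale:
            lo = mid
        else:
            hi = mid
    return lo


def renders_saved_per_second(request_rate, interval):
    """Dynamic renders avoided per second compared with rendering every request."""
    return request_rate - 1.0 / interval
```

In this model the update frequency and the users' tolerance for delay fix an upper bound on the interval, while the request rate determines how much load the static page saves. Whether real traffic fits these assumptions is exactly what the data collection and questionnaire would have to check.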
