Reading 1: I mostly read news and information, such as huxiu, 36kr, and technology and sports news, so I also use the related apps. toutiao.com does a good job, and many people around me use it. After a while, though, I noticed that many of these apps share the following problems:
1. There is too much information, and it is too messy. Even if I subscribe to or follow only a few categories, too many messages are pushed, and irrelevant ones come along with them. I can't consume that much information, and the irrelevant items put me off.
2. Current apps claim to offer accurate, personalized recommendations. Toutiao.com does this well, but it still does not seem to capture changes in user interests in a timely manner, and the recommendation results cover a narrow range, with few surprises.
3. Aggregated news contains a lot of repeated content, much of it simply crawled and displayed, which does little to improve how people read or the reading experience.
These are the main inconveniences I have run into. I previously worked on recommendation and text processing for a while, so I want to build a simple system based on my own ideas, try it out, and verify my thinking. My thoughts on the problems above are:
1. Present each user with a limited amount of valuable news per day, that is, cap the number of items pushed. Relevance requires user feature modeling; since its effect is not expected to be obvious, it can only be controlled through policies, such as combining the most popular items with the most relevant ones, or showing only one item per event or category.
2. Update the user's feature weights promptly based on user behavior, so that changes take effect closer to real time.
3. Many people read only the gist of an article and seldom the full text. If articles could be summarized, that would suit an app better. However, there seems to be no good summarization method for Chinese at present, so I can only try to improve on what exists. I will experiment with the summarization algorithm introduced in the previous article and try it with Chinese lexical and semantic features.
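The first idea (a capped daily list that mixes popular and relevant items and shows one item per event or category) can be sketched roughly as follows. This is only a minimal illustration, not the system's actual code: the `NewsItem` fields, the `select_daily_news` function, the 0-to-1 score ranges, and the `popularity_weight` blend are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    news_id: str
    category: str      # event or category label (hypothetical field)
    popularity: float  # assumed to be normalized to [0, 1]
    relevance: float   # assumed match against the user's features, in [0, 1]


def select_daily_news(items, limit=10, popularity_weight=0.4):
    """Pick at most `limit` items, one per category, by a blended score."""

    def score(item):
        # Linear blend of popularity and relevance (an assumed policy).
        return (popularity_weight * item.popularity
                + (1 - popularity_weight) * item.relevance)

    selected = []
    seen_categories = set()
    for item in sorted(items, key=score, reverse=True):
        if item.category in seen_categories:
            continue  # keep only the best-scoring item per event/category
        selected.append(item)
        seen_categories.add(item.category)
        if len(selected) == limit:
            break
    return selected
```

Raising `popularity_weight` pushes the list toward what is hot, and lowering it pushes the list toward the user's own interests. The per-category filter is also a cheap way to suppress repeated stories.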
These are purely personal opinions, and some of them are surely wrong. If you have ideas on this topic, please share them.
Some development work is already under way. I previously used Java for web services and design, but running Java on ECS is generally costly, so Python is used for this development. The system is designed as follows:
The system is divided mainly into an OnLine Service and an OffLine Service. The OnLine part mainly performs the following operations:
A). Fetcher uses UA and PA to obtain the recommended news data. It first requests the related data from redis for computation and then obtains the data from MySql. For now, MySql is assumed to handle a certain number of concurrent requests; in the future, a cache layer can be added in front of MySql based on the data type.
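The Fetcher's cache-then-database flow might look something like the sketch below. It reflects one reading of the description, namely that the cache yields the candidate news ids and the database supplies the full records. To keep the example self-contained, a plain dict stands in for redis and a lookup function stands in for a MySql query; the `Fetcher` class and its method names are hypothetical.

```python
class Fetcher:
    """Sketch of the cache-first lookup: ids from the cache, rows from the DB."""

    def __init__(self, cache, db_lookup):
        # Stand-in for redis: maps user_id -> list of candidate news ids.
        self.cache = cache
        # Stand-in for a MySql query: news_id -> row dict, or None if missing.
        self.db_lookup = db_lookup

    def fetch(self, user_id, limit=10):
        news_ids = self.cache.get(user_id) or []
        rows = []
        for news_id in news_ids[:limit]:
            row = self.db_lookup(news_id)
            if row is not None:  # skip ids whose record no longer exists
                rows.append(row)
        return rows
```

Because the database access is passed in as a function, a cache layer in front of MySql could later be added by wrapping `db_lookup` without touching `fetch`.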
B). Updater updates the UA weights in the cache based on user behaviors, so that recommendations reflect the latest user behaviors.
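One simple way to make weights follow recent behavior is to decay the old weights and then add a gain for the features of the item the user just interacted with. The sketch below assumes this scheme; the post does not say which update rule the Updater uses, and the behavior names, gain values, and `update_weights` function are all hypothetical.

```python
# Hypothetical gains per behavior type; real values would need tuning.
BEHAVIOR_GAIN = {"click": 1.0, "share": 2.0, "skip": -0.5}


def update_weights(weights, behavior, features, decay=0.9):
    """Return new feature weights after one user behavior.

    weights:  dict of feature name -> current weight
    behavior: key into BEHAVIOR_GAIN (unknown behaviors add nothing)
    features: feature names of the news item the user acted on
    decay:    factor applied to all old weights so older interests fade
    """
    gain = BEHAVIOR_GAIN.get(behavior, 0.0)
    updated = {name: value * decay for name, value in weights.items()}
    for name in features:
        # Clamp at zero so repeated skips cannot drive a weight negative.
        updated[name] = max(0.0, updated.get(name, 0.0) + gain)
    return updated
```

In the real system the resulting dict would be written back to the cache after each behavior event, so the next fetch sees the updated weights.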
Currently, the tornado framework provides web services, redis serves as the cache, mysql is the underlying data storage, rabbitmq is the message queue, and the jieba tokenizer performs Chinese word segmentation. The redis + mysql part has been implemented so far. The design and implementation of the main remaining parts (web pages, feature extraction, and summarization) are in progress. Because there is a lot to do, the final implementation may differ considerably from what is described in this article. Later posts will cover the implementation process and the results of some of these ideas, depending on progress and workload. If you are interested, feel free to get in touch.