Predicting the forward count and propagation depth of microblog posts: based on PySpark and regression algorithms

Last Update:2016-09-02 Source: Internet

Author: User

Tags pyspark

On 8/28 I decided to take part in the "thousands data" processing task, since the scenario was almost the same as a regression prediction I had done before. On the first day I began preparing with small-scale data.

## Second major revision
### Date 20160829
Processed the raw data to obtain the users' follower relationships, the number of times each Weibo post was forwarded in each time period, and the overall forwarding depth of each post.
Goal for the next stage: build the model and make predictions based on the time series.
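
As a rough illustration of the per-period forward counts mentioned above, the sketch below buckets forwarding events into fixed time windows. The record layout `(post_id, seconds_since_post)`, the 900-second window, and the sample events are all assumptions made for this example; the original data format is not given here. It uses plain Python, and in PySpark the same grouping would map onto a `map` followed by `reduceByKey`.

```python
from collections import Counter

def forwards_per_period(events, period_seconds=900):
    """Count forwards of each post in fixed-width time windows.

    events: iterable of (post_id, seconds_since_post) pairs (hypothetical layout).
    Returns {post_id: {window_index: forward_count}}.
    """
    counts = Counter()
    for post_id, elapsed in events:
        window = int(elapsed // period_seconds)
        counts[(post_id, window)] += 1

    # Regroup the flat (post_id, window) counts by post.
    result = {}
    for (post_id, window), n in counts.items():
        result.setdefault(post_id, {})[window] = n
    return result

# Made-up sample events, for illustration only.
sample_events = [("p1", 30), ("p1", 600), ("p1", 1000), ("p2", 50), ("p1", 1900)]
period_counts = forwards_per_period(sample_events)
```

Windows in which a post was not forwarded are simply absent from the result; a model that needs a dense series would have to fill them with zeros.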

## Third major revision
### Date 20160830
Moved these operations to a Linux platform, because some of the iterations use more memory than my computer can handle.
The main purpose of this version is to calculate how the forwarding depth of a microblog post changes over time.
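
One possible way to compute such a depth series is sketched below. It assumes each forwarding record is a `(user, forwarded_from, seconds_since_post)` tuple and that the original poster has depth 0; both the layout and the sample records are hypothetical, not taken from the original data.

```python
def depth_over_time(root_user, forwards, period_seconds=900):
    """Return [(window_index, max_depth_so_far)] for one post.

    forwards: iterable of (user, forwarded_from, seconds_since_post) tuples
    (hypothetical layout). The original poster has depth 0.
    """
    depth = {root_user: 0}
    series = {}
    max_depth = 0
    # Process forwards in time order so a parent is seen before its children.
    for user, parent, elapsed in sorted(forwards, key=lambda f: f[2]):
        if parent not in depth:
            # Parent never seen (e.g. a missing record): skip rather than guess.
            continue
        # Keep the earliest depth if a user forwards more than once.
        depth.setdefault(user, depth[parent] + 1)
        max_depth = max(max_depth, depth[user])
        series[int(elapsed // period_seconds)] = max_depth
    return sorted(series.items())

# Made-up forwarding chain: a -> b -> c -> e, and a -> d.
sample = [("b", "a", 100), ("c", "b", 500), ("d", "a", 1000), ("e", "c", 2000)]
depth_series = depth_over_time("a", sample)
```

The running maximum never decreases, so the series records the deepest forwarding chain observed up to the end of each window.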

## Fourth major revision
### Date 20160831
Finished testing the extraction, from the raw data, of the time series of forwarding depth and forward count.
This revision has two tasks: first, consolidate the functions into two parts; second, replace the sampled data with the original test data and run the basic data processing end to end.
The main goal of the next revision is to build a prediction model from these known relationships: train it on the training data, test it on the test data, and then adjust the parameters to get the best model.
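
The train, test, and tune loop described above could look roughly like the following. The post does not say which regression algorithm is used, so this sketch substitutes a one-feature ridge regression written in plain Python; the feature (forwards in the first hour), the target (final forward count), and all numbers are made up for illustration.

```python
def fit_ridge_1d(xs, ys, lam=0.0):
    """Fit y = slope*x + intercept with an L2 penalty lam on the slope.

    lam=0 gives ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + lam)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given data."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: x = forwards in the first hour, y = final forward count.
train_x, train_y = [10, 20, 30, 40, 50], [52, 98, 151, 203, 248]
test_x, test_y = [15, 35, 60], [76, 174, 301]

# Try a few penalty values and keep the one with the lowest held-out error.
candidates = [0.0, 1.0, 10.0, 100.0]
best_lam = min(
    candidates,
    key=lambda lam: mse(fit_ridge_1d(train_x, train_y, lam), test_x, test_y),
)
best_model = fit_ridge_1d(train_x, train_y, best_lam)
```

Selecting the parameter on the same data used for the final evaluation tends to overstate accuracy, so a separate validation split would be a safer choice when enough data is available.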

## Fifth major revision
### Date 20160901
The serious problem this morning was running out of memory, because I had cached the RDDs of the intermediate computations, especially the initial data, which is too large to fit.
The fix is to cache only the important results, such as the RDDs of the time series, the forward counts, and the forwarding depth, so that the program can now run almost to completion.
However, the depth calculation from the second revision still has some problems and will need further changes when it is used later, especially to determine, for a specific time period, who is forwarding and which forwarder has the most followers.
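
The selective caching described above might be sketched as follows. This is not the author's code: the helper name `build_series`, the `(post_id, seconds_since_post)` record layout, and the sample data are assumptions. Only the small aggregated RDD is cached, while the raw and intermediate RDDs are left uncached so Spark can discard them.

```python
def build_series(raw_rdd, period_seconds=900):
    """Return a cached RDD of ((post_id, window_index), forward_count).

    raw_rdd: RDD of (post_id, seconds_since_post) pairs (hypothetical layout).
    """
    # Intermediate RDD: deliberately NOT cached, it is as large as the raw data.
    keyed = raw_rdd.map(lambda rec: ((rec[0], int(rec[1] // period_seconds)), 1))
    # Small aggregated result: cached because later steps reuse it.
    counts = keyed.reduceByKey(lambda a, b: a + b).cache()
    return counts

def main():
    # Imported here so the helper above can be loaded without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "cache-sketch")
    # Made-up sample records, for illustration only.
    raw = sc.parallelize([("p1", 30), ("p1", 600), ("p1", 1000), ("p2", 50)])
    counts = build_series(raw)
    print(sorted(counts.collect()))
    counts.unpersist()  # release the cached partitions when done
    sc.stop()
```

Calling `main()` on a machine with PySpark installed runs the sketch on a local two-thread master. Calling `unpersist()` on results that are no longer needed frees memory for later stages.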

The main task of this version is to save the results of the calculation to a file, so that the regression model can read the processed data from the file for training and prediction.
The plan is to implement prediction for a specific time period first and do the overall prediction later.
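
A minimal sketch of that hand-off through a file, assuming the processed results are small enough to write as a single CSV. The column names, the file name `series.csv`, and the sample rows are hypothetical; the post does not specify the file format.

```python
import csv
import os
import tempfile

def save_series(path, rows):
    """Write (post_id, window, forward_count, depth) rows to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["post_id", "window", "forward_count", "depth"])
        writer.writerows(rows)

def load_series(path):
    """Read the rows back, converting the numeric columns to int."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [
            (r["post_id"], int(r["window"]), int(r["forward_count"]), int(r["depth"]))
            for r in reader
        ]

# Made-up rows, for illustration only.
rows = [("p1", 0, 2, 1), ("p1", 1, 1, 2)]
path = os.path.join(tempfile.mkdtemp(), "series.csv")
save_series(path, rows)
loaded = load_series(path)
```

Writing the processed data once means the regression step can be rerun with different parameters without repeating the expensive Spark computation.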

## Sixth major revision
### Date 20160901
The biggest gain this afternoon was seeing the first light of dawn.

But success is still some distance from what I had expected.
This version will finish calculating all the required data and save it to a file; I hope to complete it today.
