Projections for the 2016 presidential election

Source: Internet
Author: User
Tags: ggplot

The ASA's US presidential election prediction contest

In this election year, the American Statistical Association (ASA) organized a student contest to predict the exact percentage of the vote that the winner of the 2016 presidential election would receive. For details, see:

http://thisisstatistics.org/electionprediction2016/

Get Data

A lot of public polling data is available online. The relevant data for the presidential election can be obtained from the following website:

http://projects.fivethirtyeight.com/2016-election-forecast/national-polls/

Other good data sources are:

http://www.realclearpolitics.com/epolls/latest_polls/

http://elections.huffingtonpost.com/pollster/2016-general-election-trump-vs-clinton

http://www.gallup.com/products/170987/gallup-analytics.aspx

Note that the data is updated daily, so the results you get may differ from those shown in this article.

Because the original data is a JSON file, R reads it in as a list of lists.
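As a rough illustration of what "a list of lists" means in practice, the sketch below builds a small hand-made stand-in for the parsed JSON and flattens it into a data frame with the same pad-and-bind idiom the article's code uses. The field names mirror the ones in the article, but the values are invented and no real polling data is read.

```r
# Toy stand-in for the parsed JSON: a list of polls, each itself a list.
# Values are made up for illustration only.
polls <- list(
  list(pollster = "Pollster A", sampleSize = 1000, endDate = "2016-08-10"),
  list(pollster = "Pollster B", sampleSize = 800)   # endDate is missing here
)

# Pad every record to the same length so the rows line up, then bind them.
n_fields <- max(sapply(polls, length))
padded   <- lapply(polls, `length<-`, n_fields)
polls_df <- as.data.frame(do.call(rbind, padded))

# Each column is still a list column; unlist() turns a complete column
# into an ordinary vector.
polls_df$sampleSize <- as.numeric(unlist(polls_df$sampleSize))
str(polls_df$sampleSize)
```

Note that unlist() silently drops missing (NULL) entries, so it is only safe on columns where every record has a value.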

The original GitHub address: https://github.com/hardin47/prediction2016/blob/master/predblog.Rmd

## Load the required packages
require(XML)
require(dplyr)
require(tidyr)
require(readr)
require(mosaic)
require(RCurl)
require(ggplot2)
require(lubridate)
require(RJSONIO)

## Fetch the data
url = "http://projects.fivethirtyeight.com/2016-election-forecast/national-polls/"
doc <- htmlParse(url, useInternalNodes = TRUE)

# scrape the page content: find the script node that holds the poll data
sc = xpathSApply(doc, "//script[contains(., 'race.model')]",
                 function(x) c(xmlValue(x), xmlAttrs(x)[["href"]]))

# cut the JSON object out of the JavaScript source
jsobj = gsub(".*race.stateData = (.*);race.pathPrefix.*", "\\1", sc)

data = fromJSON(jsobj)
allpolls <- data$polls

# unlisting the whole thing
indx <- sapply(allpolls, length)
pollsdf <- as.data.frame(do.call(rbind, lapply(allpolls, `length<-`, max(indx))))

## Data cleaning
# unlisting the weights
pollswt <- as.data.frame(t(as.data.frame(do.call(cbind,
             lapply(pollsdf$weight, data.frame, stringsAsFactors=FALSE)))))
names(pollswt) <- c("wtpolls", "wtplus", "wtnow")
row.names(pollswt) <- NULL

pollsdf <- cbind(pollsdf, pollswt)

# unlisting the voting
indxv <- sapply(pollsdf$votingAnswers, length)
pollsvot <- as.data.frame(do.call(rbind, lapply(pollsdf$votingAnswers,
                                                `length<-`, max(indxv))))
pollsvot1 <- rbind(as.data.frame(do.call(rbind, lapply(pollsvot$V1, data.frame,
                                                       stringsAsFactors=FALSE))))
pollsvot2 <- rbind(as.data.frame(do.call(rbind, lapply(pollsvot$V2, data.frame,
                                                       stringsAsFactors=FALSE))))

# split the row names into a poll type ("now", "plus", "polls") and a poll index
# (extract_numeric() is deprecated in newer tidyr; readr::parse_number() replaces it)
pollsvot1 <- cbind(polltype = rownames(pollsvot1), pollsvot1,
                   polltypeA = gsub('[0-9]+', '', rownames(pollsvot1)),
                   polltype1 = extract_numeric(rownames(pollsvot1)))

pollsvot1$polltype1 <- ifelse(is.na(pollsvot1$polltype1), 1, pollsvot1$polltype1 + 1)

pollsvot2 <- cbind(polltype = rownames(pollsvot2), pollsvot2,
                   polltypeA = gsub('[0-9]+', '', rownames(pollsvot2)),
                   polltype1 = extract_numeric(rownames(pollsvot2)))

pollsvot2$polltype1 <- ifelse(is.na(pollsvot2$polltype1), 1, pollsvot2$polltype1 + 1)

# turn the list columns into ordinary vectors and keep the columns we need
pollsdf <- pollsdf %>%
  mutate(population = unlist(population),
         sampleSize = as.numeric(unlist(sampleSize)),
         pollster = unlist(pollster),
         startDate = ymd(unlist(startDate)),
         endDate = ymd(unlist(endDate)),
         pollsterRating = unlist(pollsterRating)) %>%
  select(population, sampleSize, pollster, startDate, endDate, pollsterRating,
         wtpolls, wtplus, wtnow)

# repeat each poll row three times (one per poll type) for each of the two candidates
allpolldata <- cbind(rbind(pollsdf[rep(seq_len(nrow(pollsdf)), each=3),],
                           pollsdf[rep(seq_len(nrow(pollsdf)), each=3),]),
                     rbind(pollsvot1, pollsvot2))

allpolldata <- allpolldata %>%
  arrange(polltype1, choice)

View all of the polling data: allpolldata

Quick Visualization

Before estimating each candidate's projected share of the vote in the 2016 U.S. presidential election, it is worth taking a quick look at the data. The cleaned data set is plotted with the ggplot2 package (using polls that ended after August 1, 2016; the x axis is endDate, the y axis is adj_pct, the color is mapped to choice, giving one color each for Clinton and Trump, and the point size is set by wtnow):

## Quick visualization
ggplot(subset(allpolldata, ((polltypeA == "now") & (endDate > ymd("2016-08-01")))),
       aes(y=adj_pct, x=endDate, color=choice)) +
  geom_line() + geom_point(aes(size=wtnow)) +
  labs(title = "Vote percentage by date and poll weight\n",
       y = "Percent Vote if Election Today", x = "Poll Date",
       color = "Candidate", size="538 Poll\nWeight")

Quick analysis

Taking each candidate's vote share from the "now" polls (the percentage of the vote if the election were held today), each poll is weighted by three things: the weight assigned by 538 (wtnow), the sample size (sampleSize), and the number of days since the poll closed (dayssince). The weights are calculated as follows:

wt = wtnow * sqrt(sampleSize) / dayssince
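To make the weighting concrete, here is a minimal sketch that applies it to three hypothetical polls. The variable names follow the article's code; the numbers are invented for illustration.

```r
# Three made-up polls: 538 "now-cast" weight, sample size, days since the poll closed.
wtnow      <- c(1.2, 0.8, 1.5)
sampleSize <- c(900, 1600, 400)
dayssince  <- c(2, 10, 30)

# Larger and more recent polls with a higher 538 weight count for more.
wt <- wtnow * sqrt(sampleSize) / dayssince
wt
```

In this toy case the first poll dominates mostly because it is recent, even though the second poll has the largest sample.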

Using the calculated weights, I compute the weighted average of the predicted vote percentages and its standard error (SE). The SE formula comes from Cochran (1977).
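For reference, the variance estimator implemented by the article's weighted.var.se function can be written out as follows. This is transcribed from the code rather than from Cochran's text, so the notation is an assumption: x_i are the poll percentages, w_i the weights, n the number of polls, \bar{w} the mean weight, and \bar{x}_w the weighted mean.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
\bar{w} = \frac{1}{n}\sum_{i=1}^{n} w_i

\widehat{\mathrm{Var}}(\bar{x}_w) =
\frac{n}{(n-1)\left(\sum_{i=1}^{n} w_i\right)^{2}}
\left[
  \sum_{i=1}^{n} \left(w_i x_i - \bar{w}\,\bar{x}_w\right)^{2}
  - 2\,\bar{x}_w \sum_{i=1}^{n} \left(w_i - \bar{w}\right)\left(w_i x_i - \bar{w}\,\bar{x}_w\right)
  + \bar{x}_w^{2} \sum_{i=1}^{n} \left(w_i - \bar{w}\right)^{2}
\right]

\mathrm{SE}(\bar{x}_w) = \sqrt{\widehat{\mathrm{Var}}(\bar{x}_w)}
```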

## Quick analysis
# references
# code found at http://stats.stackexchange.com/questions/25895/computing-standard-error-in-weighted-mean-estimation
# cited from http://www.cs.tufts.edu/~nr/cs257/archive/donald-gatz/weighted-standard-error.pdf
# Donald F. Gatz and Luther Smith, "The Standard Error of a Weighted Mean Concentration-I. Bootstrapping vs Other Methods"

weighted.var.se <- function(x, w, na.rm=FALSE)
  # Computes the variance of a weighted mean following Cochran 1977 definition
{
  if (na.rm) { w <- w[i <- !is.na(x)]; x <- x[i] }
  n = length(w)
  xWbar = weighted.mean(x, w, na.rm=na.rm)
  wbar = mean(w)
  out = n/((n-1)*sum(w)^2) * (sum((w*x-wbar*xWbar)^2) -
          2*xWbar*sum((w-wbar)*(w*x-wbar*xWbar)) +
          xWbar^2*sum((w-wbar)^2))
  return(out)
}

# calculate the cumulative mean and the cumulative weighted mean
# (a poll that closed today has dayssince = 0 and therefore an infinite weight)
allpolldata2 <- allpolldata %>%
  filter(wtnow > 0) %>%
  filter(polltypeA == "now") %>%
  mutate(dayssince = as.numeric(today() - endDate)) %>%
  mutate(wt = wtnow * sqrt(sampleSize) / dayssince) %>%
  mutate(votewt = wt*pct) %>%
  group_by(choice) %>%
  arrange(choice, -dayssince) %>%
  mutate(cum.mean.wt = cumsum(votewt) / cumsum(wt)) %>%
  mutate(cum.mean = cummean(pct))

View(allpolldata2)
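One quick way to build confidence in the variance function is to check a case with a known answer: when all weights are equal, the estimator should reduce to the usual variance of a sample mean, var(x)/n. The sketch below repeats the function so that it runs on its own; the poll percentages are made up.

```r
# Same estimator as in the article, repeated here so the check is self-contained.
weighted.var.se <- function(x, w, na.rm = FALSE) {
  if (na.rm) { w <- w[i <- !is.na(x)]; x <- x[i] }
  n     <- length(w)
  xWbar <- weighted.mean(x, w, na.rm = na.rm)
  wbar  <- mean(w)
  n / ((n - 1) * sum(w)^2) *
    (sum((w * x - wbar * xWbar)^2) -
       2 * xWbar * sum((w - wbar) * (w * x - wbar * xWbar)) +
       xWbar^2 * sum((w - wbar)^2))
}

x       <- c(43, 45, 41, 44, 42)   # made-up poll percentages
w_equal <- rep(2, length(x))       # every poll gets the same weight

v_weighted <- weighted.var.se(x, w_equal)
v_plain    <- var(x) / length(x)   # textbook variance of the sample mean

all.equal(v_weighted, v_plain)
```

With unequal weights the two quantities differ, which is the reason for using the weighted formula in the first place.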

Visualize the cumulative mean and the cumulative weighted mean

## Plot the cumulative mean / weighted mean
# cumulative mean
ggplot(subset(allpolldata2, (endDate > ymd("2016-01-01"))),
       aes(y=cum.mean, x=endDate, color=choice)) +
  geom_line() + geom_point(aes(size=wt)) +
  labs(title = "Cumulative Mean Vote Percentage\n",
       y = "Cumulative Percent Vote if Election Today", x = "Poll Date",
       color = "Candidate", size="Calculated Weight")

# cumulative weighted mean
ggplot(subset(allpolldata2, (endDate > ymd("2016-01-01"))),
       aes(y=cum.mean.wt, x=endDate, color=choice)) +
  geom_line() + geom_point(aes(size=wt)) +
  labs(title = "Cumulative Weighted Mean Vote Percentage\n",
       y = "Cumulative Weighted Percent Vote if Election Today", x = "Poll Date",
       color = "Candidate", size="Calculated Weight")
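The two plotted quantities can be reproduced on a tiny example in base R. This is only a sketch with invented numbers: cumsum(pct) / seq_along(pct) stands in for dplyr's cummean(), and the polls are assumed to be sorted from oldest to newest already.

```r
# Three made-up polls for one candidate, oldest first.
pct <- c(40, 44, 42)   # vote percentage in each poll
wt  <- c(1, 3, 2)      # calculated weight of each poll

# Running unweighted mean: every poll seen so far counts equally.
cum.mean <- cumsum(pct) / seq_along(pct)

# Running weighted mean: polls seen so far, weighted by wt.
cum.mean.wt <- cumsum(wt * pct) / cumsum(wt)

rbind(cum.mean, cum.mean.wt)
```

The heavily weighted second poll pulls the weighted series toward 44 more strongly than it pulls the plain running mean.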

Vote percentage Forecast

In addition, the standard error of the weighted mean (Cochran (1977)) can be calculated for each candidate. Using this formula, we can predict the final vote percentage of each main candidate!

pollsummary <- allpolldata2 %>%
  select(choice, pct, wt, votewt, sampleSize, dayssince) %>%
  group_by(choice) %>%
  summarise(mean.vote = weighted.mean(pct, wt, na.rm=TRUE),
            std.vote = sqrt(weighted.var.se(pct, wt, na.rm=TRUE)))

pollsummary
## # A tibble: 2 x 3
##    choice mean.vote  std.vote
##     <chr>     <dbl>     <dbl>
## 1 Clinton  43.48713 0.5073771
## 2   Trump  38.95760 1.0717574

Clearly, the two main candidates are Clinton and Trump. Clinton's weighted average vote percentage is higher than Trump's, and her standard error is smaller than Trump's, which suggests that her support is more stable and that she is the more likely eventual winner. The larger variability in Trump's numbers, however, means that a Trump victory cannot be ruled out. Trump's vote was seen to reach a high of 51%.
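One rough way to read the summary table is to turn each estimate into an interval of plus or minus two standard errors. This is an added illustration, not part of the original analysis: it assumes the weighted means are approximately normal, and it describes the polls on the day the data was pulled rather than the election result.

```r
# Estimates copied from the pollsummary output shown in the article.
mean.vote <- c(Clinton = 43.48713, Trump = 38.95760)
std.vote  <- c(Clinton = 0.5073771, Trump = 1.0717574)

# Approximate interval: estimate +/- 2 standard errors (normality is an assumption).
lower <- mean.vote - 2 * std.vote
upper <- mean.vote + 2 * std.vote

round(cbind(lower, upper), 2)
```

Under these assumptions the two intervals do not overlap, which is consistent with the reading that Clinton led in the polls at that time.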

Original link: https://www.r-statistics.com/2016/08/presidential-election-predictions-2016/

This article link: http://www.cnblogs.com/homewch/p/5811945.html
