Python and R Data Analysis/Mining Tools: A Cross-Reference


If you are already familiar with how Python modules and R packages are loaded, the tables below should be easy to search. Python entries are written as module.function. Some of the modules are not part of the standard library; install those with

pip install *

Similarly, to make indexing easier, R entries are written as package::function, giving both the function and the package that contains it. An entry without :: belongs to R's default packages; for an entry with ::, install the named package before use.



Connectors and IO

Databases

| Category | Python | R |
|---|---|---|
| MySQL | mysql-connector-python (official) | RMySQL |
| Oracle | cx_Oracle | ROracle |
| Redis | redis | rredis |
| MongoDB | pymongo | RMongo, rmongodb |
| Neo4j | py2neo | RNeo4j |
| Cassandra | cassandra-driver | RJDBC |
| ODBC | pyodbc | RODBC |
| JDBC | Unknown (Jython only) | RJDBC |

IO

| Category | Python | R |
|---|---|---|
| Excel | XlsxWriter, pandas.read_excel, pandas.DataFrame.to_excel, openpyxl | openxlsx::read.xlsx, xlsx::read.xlsx, xlsx::read.xlsx2 |
| CSV | csv.writer | read.csv, read.csv2, read.table |
| JSON | json | jsonlite |
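
As a rough illustration of the Python column for CSV and JSON, the sketch below round-trips a small table through csv.writer and the json module. The sample rows and the in-memory buffer are assumptions made for the example; a real file object opened with open() would be used the same way.

```python
import csv
import io
import json

# Hypothetical sample records, used only for illustration.
rows = [
    {"name": "alice", "score": 90},
    {"name": "bob", "score": 85},
]

# Write a header and the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "score"])
for row in rows:
    writer.writerow([row["name"], row["score"]])
csv_text = buffer.getvalue()

# Read the CSV back; csv.reader returns every field as a string.
parsed = list(csv.reader(io.StringIO(csv_text)))

# Serialize the same records to JSON and load them again.
json_text = json.dumps(rows)
restored = json.loads(json_text)
```

Note that the CSV round trip loses the numeric type (the score comes back as a string), while the JSON round trip preserves it.
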

Statistics

Descriptive statistics

| Category | Python | R |
|---|---|---|
| Descriptive summary | scipy.stats.describe | summary |
| Mean | scipy.stats.gmean (geometric mean), scipy.stats.hmean (harmonic mean), numpy.mean, numpy.nanmean, pandas.Series.mean | mean |
| Median | numpy.median, numpy.nanmedian, pandas.Series.median | median |
| Mode | scipy.stats.mode, pandas.Series.mode | Unknown |
| Quantiles | numpy.percentile, numpy.nanpercentile, pandas.Series.quantile | quantile |
| Empirical cumulative distribution function (ECDF) | | ecdf |
| Standard deviation | scipy.stats.std, scipy.stats.nanstd (both removed in later SciPy releases), numpy.std, pandas.Series.std | sd |
| Variance | numpy.var, pandas.Series.var | var |
| Coefficient of variation | scipy.stats.variation | Unknown |
| Covariance | numpy.cov, pandas.Series.cov | cov |
| (Pearson) correlation coefficient | scipy.stats.pearsonr, numpy.corrcoef, pandas.Series.corr | cor |
| Kurtosis | scipy.stats.kurtosis, pandas.Series.kurt | e1071::kurtosis |
| Skewness | scipy.stats.skew, pandas.Series.skew | e1071::skewness |
| Histogram | numpy.histogram, numpy.histogram2d, numpy.histogramdd | Unknown |
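
To show how a few of the NumPy entries in the descriptive statistics table are called, here is a minimal sketch on a small made-up array. The data values and the choice of four histogram bins are assumptions for the example only.

```python
import numpy as np

# Hypothetical sample data, chosen so the results are easy to check by hand.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = np.mean(data)                      # arithmetic mean
median = np.median(data)                  # middle value
q1, q3 = np.percentile(data, [25, 75])    # first and third quartiles
std = np.std(data)                        # population standard deviation (ddof=0)
sample_std = np.std(data, ddof=1)         # sample standard deviation (n - 1 denominator)
var = np.var(data)                        # population variance
counts, edges = np.histogram(data, bins=4)  # bin counts and bin edges
```

One difference to keep in mind when comparing columns: numpy.std divides by n by default, while R's sd divides by n - 1, so ddof=1 is needed to reproduce the R result.
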

Regression (including statistics and machine learning)

| Category | Python | R |
|---|---|---|
| Ordinary least squares (OLS) | statsmodels.api.OLS, sklearn.linear_model.LinearRegression | lm |
| Generalized least squares (GLS) | statsmodels.api.GLS | nlme::gls, MASS::lm.gls |
| Quantile regression | statsmodels.api.QuantReg | quantreg::rq |
| Ridge regression | sklearn.linear_model.Ridge | MASS::lm.ridge, ridge::linearRidge |
| LASSO | sklearn.linear_model.Lasso | lars::lars |
| Least-angle regression | sklearn.linear_model.LassoLars | lars::lars |
| Robust regression | statsmodels.api.RLM | MASS::rlm |
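
The scikit-learn entries in the regression table share the same fit/predict interface. The sketch below fits ordinary least squares and ridge regression on synthetic, noise-free data; the data-generating coefficients, the random seed, and alpha=1.0 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data (an assumption for this example): y = 3*x0 - 2*x1 + 1, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0

# Ordinary least squares recovers the generating coefficients on noise-free data.
ols = LinearRegression().fit(X, y)

# Ridge adds an L2 penalty, which shrinks the coefficients toward zero.
ridge = Ridge(alpha=1.0).fit(X, y)

ols_prediction = ols.predict(X[:3])
```

On this data the OLS coefficients come out close to 3 and -2, while the ridge coefficients are slightly smaller in magnitude because of the penalty.
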

Hypothesis testing

| Category | Python | R |
|---|---|---|
| t-test | statsmodels.stats.weightstats.ttest_ind, statsmodels.stats.weightstats.ttost_ind, statsmodels.stats.weightstats.ttost_paired; scipy.stats.ttest_1samp, scipy.stats.ttest_ind, scipy.stats.ttest_ind_from_stats, scipy.stats.ttest_rel | t.test |
| KS test (tests a distribution) | scipy.stats.kstest, scipy.stats.ks_2samp | ks.test |
| Wilcoxon (non-parametric test of differences) | scipy.stats.wilcoxon, scipy.stats.mannwhitneyu | wilcox.test |
| Shapiro-Wilk normality test | scipy.stats.shapiro | shapiro.test |
| Pearson correlation coefficient test | scipy.stats.pearsonr | cor.test |
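
The scipy.stats tests listed above all return a test statistic together with a p-value. A minimal sketch, assuming two synthetic normal samples whose means differ by one standard deviation (the sample sizes and seed are arbitrary choices for the example):

```python
import numpy as np
from scipy import stats

# Two synthetic samples (assumed for illustration): same spread, shifted mean.
rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=1.0, scale=1.0, size=200)

# Two-sample t-test for a difference in means.
t_stat, t_p = stats.ttest_ind(a, b)

# Two-sample Kolmogorov-Smirnov test for a difference in distribution.
ks_stat, ks_p = stats.ks_2samp(a, b)

# Mann-Whitney U test, the unpaired counterpart of the Wilcoxon test.
u_stat, u_p = stats.mannwhitneyu(a, b, alternative="two-sided")

# Shapiro-Wilk test of normality on a single sample.
w_stat, w_p = stats.shapiro(a)
```

With a shift this large, the three two-sample tests should all give very small p-values, which corresponds to what t.test, ks.test, and wilcox.test would report in R on the same data.
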

Time series

| Category | Python | R |
|---|---|---|
| AR | statsmodels.tsa.ar_model.AR | ar |
| ARIMA | statsmodels.tsa.arima_model.ARIMA | arima |
| VAR | statsmodels.tsa.vector_ar.var_model.VAR | Unknown |

For Python, time series tools can also be found in PyFlux.

Survival analysis

| Category | Python | R |
|---|---|---|
| Proportional hazards (PH) regression | statsmodels.formula.api.phreg | Unknown |

Modules dedicated to survival analysis:

Machine learning

Regression

See the regression table in the Statistics section.

Classifiers

LDA, QDA

| Category | Python | R |
|---|---|---|
| LDA | sklearn.discriminant_analysis.LinearDiscriminantAnalysis | MASS::lda |
| QDA | sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis | MASS::qda |

SVM (support vector machine)

| Category | Python | R |
|---|---|---|
| Support vector classifier (SVC) | sklearn.svm.SVC | e1071::svm |
| Nu-support vector classifier (NuSVC) | sklearn.svm.NuSVC | Unknown |
| Linear support vector classifier (LinearSVC) | sklearn.svm.LinearSVC | Unknown |

Nearest-neighbor methods

| Category | Python | R |
|---|---|---|
| k-nearest neighbors classifier | sklearn.neighbors.KNeighborsClassifier | Unknown |
| Radius neighbors classifier | sklearn.neighbors.RadiusNeighborsClassifier | Unknown |
| Nearest centroid classifier | sklearn.neighbors.NearestCentroid | Unknown |

Naive Bayes

| Category | Python | R |
|---|---|---|
| Naive Bayes (Gaussian) | sklearn.naive_bayes.GaussianNB | e1071::naiveBayes |
| Multinomial naive Bayes | sklearn.naive_bayes.MultinomialNB | Unknown |
| Bernoulli naive Bayes | sklearn.naive_bayes.BernoulliNB | Unknown |

Decision trees

| Category | Python | R |
|---|---|---|
| Decision tree classifier | sklearn.tree.DecisionTreeClassifier | tree::tree, party::ctree |
| Decision tree regressor | sklearn.tree.DecisionTreeRegressor | tree::tree, party::ctree |
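
Because the scikit-learn classifiers in the tables above expose the same fit/score interface, they can be swapped for one another with little code. The following sketch trains one model from each table on two synthetic, well-separated clusters; the data, the seed, and the hyperparameters (linear kernel, 5 neighbors) are assumptions made for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Two well-separated synthetic classes (assumed data, not from the article).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-3.0, 1.0, size=(50, 2)),
    rng.normal(3.0, 1.0, size=(50, 2)),
])
y = np.array([0] * 50 + [1] * 50)

# One estimator per table; each follows the same fit/score interface.
models = {
    "lda": LinearDiscriminantAnalysis(),
    "svc": SVC(kernel="linear"),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "naive_bayes": GaussianNB(),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Training accuracy for each model (fit returns the estimator itself).
scores = {name: model.fit(X, y).score(X, y) for name, model in models.items()}
```

The scores here are training accuracies on an easy problem, so they say nothing about which classifier generalizes better; the point is only that the calls are interchangeable.
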

Ensemble methods

| Category | Sub-category | Python | R |
|---|---|---|---|
| Bagging | Random forest classifier | sklearn.ensemble.RandomForestClassifier | randomForest::randomForest, party::cforest |
| Bagging | Random forest regressor | sklearn.ensemble.RandomForestRegressor | randomForest::randomForest, party::cforest |
| Boosting | Gradient boosting | xgboost module | xgboost package |
| Boosting | AdaBoost | sklearn.ensemble.AdaBoostClassifier | adabag, fastAdaboost, ada |
| Stacking | Unknown | Unknown | Unknown |

Clustering

| Category | Python | R |
|---|---|---|
| k-means | scipy.cluster.vq.kmeans | (stats::)kmeans |
| Hierarchical clustering | scipy.cluster.hierarchy.fcluster | (stats::)hclust |
| Bagged clustering | Unknown | e1071::bclust |
| DBSCAN | sklearn.cluster.DBSCAN | dbscan::dbscan |
| Birch | sklearn.cluster.Birch | Unknown |
| K-medoids clustering | pyclust.KMedoids (reliability unknown) | cluster::pam |
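
For the SciPy clustering entries, a minimal sketch on two synthetic blobs is given below. The data and the explicit initial centroids (one point taken from each blob so the run is deterministic) are assumptions for the example; in practice the number of clusters can be passed instead of an initial guess.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.cluster.vq import kmeans, vq

# Two synthetic blobs around (0, 0) and (10, 10) (assumed data).
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0.0, 0.5, size=(30, 2)),
    rng.normal(10.0, 0.5, size=(30, 2)),
])

# k-means: pass initial centroids explicitly to keep the result deterministic.
initial = points[[0, -1]]
centroids, distortion = kmeans(points, initial)

# Assign each point to its nearest centroid.
kmeans_labels, _ = vq(points, centroids)

# Hierarchical clustering: build a Ward linkage, then cut it into 2 flat clusters.
tree = linkage(points, method="ward")
hier_labels = fcluster(tree, t=2, criterion="maxclust")
```

Note that fcluster only cuts an existing linkage into flat clusters, so it is used together with linkage, much as cutree is used with hclust in R.
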

Association rules

| Category | Python | R |
|---|---|---|
| Apriori algorithm | apriori (reliability unknown, Python 3 not supported), PyFIM (reliability unknown, cannot be installed with pip) | arules::apriori |
| FP-growth algorithm | fp-growth (reliability unknown, Python 3 not supported), PyFIM (reliability unknown, cannot be installed with pip) | Unknown |

Neural networks

| Category | Python | R |
|---|---|---|
| Neural network | keras.* | nnet::nnet, neuralnet::neuralnet |
| Deep learning | keras.* | Mostly unreliable packages; unknown |

The theano module is also worth mentioning, but because Theano is not designed primarily around neural networks, it is not listed under this category.

Probabilistic graphical models


Text and NLP

Basic operations

| Category | Python | R |
|---|---|---|
| Tokenization | nltk.tokenize (English), jieba.tokenize (Chinese) | tau::tokenize |
| Stemming | nltk.stem | RTextTools::wordStem, SnowballC::wordStem |
| Stop words | stop_words.get_stop_words | tm::stopwords, qdap::stopwords |
| Chinese word segmentation | jieba.cut, smallseg, Yaha, finalseg, genius | jiebaR |
| TF-IDF | gensim.models.TfidfModel | Unknown |

Topic models

| Category | Python | R |
|---|---|---|
| LDA | lda.LDA, gensim.models.ldamodel.LdaModel | topicmodels::LDA |
| LSI | gensim.models.lsimodel.LsiModel | Unknown |
| RP | gensim.models.rpmodel.RpModel | Unknown |
| HDP | gensim.models.hdpmodel.HdpModel | Unknown |

Python's newer third-party module spaCy is also worth noting.

Interaction with other analysis/visualization/mining/reporting tools

| Category | Python | R |
|---|---|---|
| Weka | python-weka-wrapper | RWeka |
| Tableau | tableausdk | Rserve (strictly speaking, a server package for R) |

Reproduced in: 1190000005041649
