Video Notes: Go for Data Science - Daniel Whitenack

    • Video info
      • What is data science
      • The struggles of data science
        • Integrity
        • Deployment
      • A data science project
      • Arithmetic and visualization
      • Prediction
      • Other data-related projects

Video Info #

Go for Data Science
by Daniel Whitenack
at GopherCon 2016

https://www.youtube.com/watch?v=D5tDubyXLrQ

Slides: https://github.com/dwhitena/slides/tree/master/gophercon2016

What is Data Science #

    • ETL, Data cleaning, Organization
    • Parsing, Extraction of Patterns
    • Arithmetic

The Struggles of Data Science #

Integrity #

Using Python often produces unexpected results.

For example, here both Python and Go read the values of a CSV file and find the maximum value of the first column.

Python code:

    import pandas as pd

    cols = ['integercolumn', 'stringcolumn']
    data = pd.read_csv('example.csv', names=cols)
    print(data['integercolumn'].max())

Go code:

    package main

    import (
        "bufio"
        "encoding/csv"
        "fmt"
        "log"
        "os"
        "strconv"

        "github.com/pkg/errors" // provides errors.Wrap
    )

    func main() {
        f, err := os.Open("example.csv")
        if err != nil {
            err = errors.Wrap(err, "Could not open CSV")
            log.Fatal(err)
        }
        defer f.Close()

        // Read every record of the CSV file into memory.
        r := csv.NewReader(bufio.NewReader(f))
        records, err := r.ReadAll()
        if err != nil {
            err = errors.Wrap(err, "Could not parse CSV")
            log.Fatal(err)
        }

        // Track the largest integer seen in the first column.
        var intMax int
        for _, record := range records {
            intVal, err := strconv.Atoi(record[0])
            if err != nil {
                err = errors.Wrap(err, "Parse failed")
                log.Fatal(err)
            }
            if intVal > intMax {
                intMax = intVal
            }
        }
        fmt.Println(intMax)
    }

Under normal conditions both versions work. The difference shows up when data is missing, for example when some column values are absent. Go fails strictly with an error, whereas Python raises no error and keeps running. Over a long time and a large amount of data, such inconsistencies go unnoticed, because nothing tells you where the error occurred.

For example, take an input with three rows and three columns in which the first column of the third row is missing. pandas reads the missing value as NaN, converts the column to floating point, and skips the NaN when computing the maximum.

Python returns:

    $ python example2.py
    2.0

and Go returns:

    $ go run example2.go
    /to/from:1
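The strict behavior described above can be sketched with the standard library alone. This is a minimal illustration, not the code from the talk: the function name maxFirstColumn and the inline sample data are assumptions made for the example, and the CSV is read from a string rather than from a file.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// maxFirstColumn parses CSV text and returns the largest integer found in
// the first column. A first field that is not an integer (including an
// empty one) is reported as an error instead of being skipped.
// The name and signature are hypothetical, chosen for this sketch.
func maxFirstColumn(data string) (int, error) {
	r := csv.NewReader(strings.NewReader(data))
	records, err := r.ReadAll()
	if err != nil {
		return 0, fmt.Errorf("could not parse CSV: %w", err)
	}
	if len(records) == 0 {
		return 0, fmt.Errorf("no records")
	}

	best := 0
	for i, record := range records {
		v, err := strconv.Atoi(record[0])
		if err != nil {
			return 0, fmt.Errorf("row %d: parse failed: %w", i+1, err)
		}
		// Seed the maximum with the first row so negative values work too.
		if i == 0 || v > best {
			best = v
		}
	}
	return best, nil
}

func main() {
	// Made-up sample data; the second input has an empty first column
	// in its third row.
	complete := "1,a\n3,b\n2,c\n"
	missing := "1,a\n2,b\n,c\n"

	for _, data := range []string{complete, missing} {
		m, err := maxFirstColumn(data)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("max:", m)
	}
}
```

With the complete input the function returns a maximum; with the incomplete input it returns an error that names the offending row, so the problem is visible at the point where it happens.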

Deployment #

    • GCE or AWS
    • Scikit
    • Numpy
    • Pandas

Even when a data science application is well written, every deployment means sitting down with ops to discuss which cloud to use, which libraries are installed, which versions, and so on. The whole process becomes cumbersome and complex. Even if you write a very long Dockerfile to build the required environment, maintaining that image yourself later becomes impractical.

Go does not have this problem. Go programs are statically compiled, so once the program builds, the binary contains all the dependencies it needs. Deployment suddenly becomes easy, and the Dockerfile can be extremely simple:

    FROM scratch
    ADD myservice /myservice
    CMD ["/myservice"]

A Data Science Project #

    • Use the github.com/google/go-github/github package to get information about Go projects on GitHub.
    • Use Pachyderm to organize the data. Pachyderm is also written in Go and is built on Kubernetes (k8s). It provides a file-system-like layer called PFS.

Arithmetic and visualization #

gonum is recommended. Its packages include:

    • plot
    • graph
    • stat
    • integrate
    • lapack
    • unit
    • matrix
    • mathext
    • floats
    • blas
    • optimize

You can perform matrix operations (matrix) or draw charts (plot).

Prediction #

Linear regression is used here, via the github.com/sajari/regression package.

The model predicts 195 projects created on the day of the 2016 talk, and 240 projects in 2017.
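To show what a linear-regression prediction involves, here is a minimal sketch of ordinary least squares for a single variable, written with the standard library only. It does not use github.com/sajari/regression and is not the code from the talk: the function name fitLine is hypothetical and the numbers in main are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine computes the ordinary least-squares line y = slope*x + intercept
// for the given points. The name and signature are assumptions of this sketch.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two (x, y) pairs of equal length")
	}

	// Means of x and y.
	n := float64(len(xs))
	var sumX, sumY float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
	}
	meanX, meanY := sumX/n, sumY/n

	// slope = sum((x-meanX)*(y-meanY)) / sum((x-meanX)^2)
	var sxy, sxx float64
	for i := range xs {
		dx := xs[i] - meanX
		sxy += dx * (ys[i] - meanY)
		sxx += dx * dx
	}
	if sxx == 0 {
		return 0, 0, errors.New("all x values are identical")
	}

	slope = sxy / sxx
	intercept = meanY - slope*meanX
	return slope, intercept, nil
}

func main() {
	// Made-up yearly counts for illustration only, not the data from the talk.
	years := []float64{2012, 2013, 2014, 2015}
	counts := []float64{10, 20, 30, 40}

	slope, intercept, err := fitLine(years, counts)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Extrapolate the fitted line to a later year.
	fmt.Printf("predicted for 2016: %.1f\n", slope*2016+intercept)
}
```

The fitted slope and intercept are then used to extrapolate to a year outside the observed range, which is the same kind of forecast as the one quoted above.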

Other data-related projects #

    • Gophernotes: interactive Go notebooks
    • GoLearn: general-purpose machine learning
    • Glow: distributed computing system (MapReduce)
    • Gobot: a Go library for sensors and IoT
    • Go is expected to gain built-in support for multidimensional slices
    • TensorFlow provides a Go API

The Gophers Slack also has a data science channel.
