- Video Info
- What is Data Science
- Data Science's Struggle
- Doing a Data Science Project
- Arithmetic and visualization
- Prediction
- Other data-related items
Video Info #
Go for Data Science
by Daniel Whitenack
at GopherCon 2016
https://www.youtube.com/watch?v=D5tDubyXLrQ
Slides: https://github.com/dwhitena/slides/tree/master/gophercon2016
What is Data Science #
- ETL, data cleaning, organization
- Parsing, pattern extraction
- Arithmetic
Data Science's Struggle #
Integrity #
Python code often produces unexpected results.
For example, the following Python and Go programs each read a CSV file and find the maximum value of the first column.
Python code:

    import pandas as pd

    # Read the CSV file, naming its two columns.
    cols = ['integercolumn', 'stringcolumn']
    data = pd.read_csv('example.csv', names=cols)

    # Print the maximum of the integer column.
    print(data['integercolumn'].max())
Go code:

    package main

    import (
        "bufio"
        "encoding/csv"
        "fmt"
        "log"
        "os"
        "strconv"

        "github.com/pkg/errors" // provides errors.Wrap
    )

    func main() {
        // Open the CSV file.
        f, err := os.Open("example.csv")
        if err != nil {
            err = errors.Wrap(err, "Could not open CSV")
            log.Fatal(err)
        }
        defer f.Close()

        // Read all records into memory.
        r := csv.NewReader(bufio.NewReader(f))
        records, err := r.ReadAll()
        if err != nil {
            err = errors.Wrap(err, "Could not parse CSV")
            log.Fatal(err)
        }

        // Find the maximum of the first column; stop on any value
        // that is not an integer.
        var intMax int
        for _, record := range records {
            intVal, err := strconv.Atoi(record[0])
            if err != nil {
                err = errors.Wrap(err, "Parse failed")
                log.Fatal(err)
            }
            if intVal > intMax {
                intMax = intVal
            }
        }
        fmt.Println(intMax)
    }
Under normal conditions both programs work fine. But when data is missing, for example when some column values are absent, Go reports a strict error, while Python raises no error and keeps running. Over a long period and with a large amount of data, such inconsistencies go unchecked, because nothing tells you where the error occurred.
For example, take an input with three rows and three columns in which the first column of the third row is missing.
Python returns:

    $ python example2.py
    2.0
and Go returns:

    $ go run example2.go
    /to/from:1
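The difference in behavior can be reproduced without any files. The following is a minimal sketch, not code from the talk: `maxFirstColumn` is a hypothetical helper, and the CSV text passed to it is made-up sample data. It shows that a missing value in the first column surfaces as an explicit error rather than being skipped silently.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// maxFirstColumn parses CSV text and returns the largest integer found in
// the first column. An empty or non-numeric cell is reported as an error
// instead of being silently skipped.
func maxFirstColumn(csvText string) (int, error) {
	records, err := csv.NewReader(strings.NewReader(csvText)).ReadAll()
	if err != nil {
		return 0, fmt.Errorf("could not parse CSV: %v", err)
	}
	if len(records) == 0 {
		return 0, fmt.Errorf("no records")
	}
	var best int
	for i, record := range records {
		val, err := strconv.Atoi(record[0])
		if err != nil {
			return 0, fmt.Errorf("row %d: parse failed: %v", i+1, err)
		}
		if i == 0 || val > best {
			best = val
		}
	}
	return best, nil
}

func main() {
	// Made-up sample data: the first column of the third row is empty.
	_, err := maxFirstColumn("1,a\n2,b\n,c\n")
	fmt.Println(err)
}
```

Returning the error to the caller, rather than calling `log.Fatal` inside the helper, is a design choice of this sketch: it lets the caller decide whether a bad row should stop the whole job.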
Deployment #
- GCE or AWS
- Scikit
- NumPy
- Pandas
Even when a data science application is well written, every deployment means sitting down with Ops to discuss which cloud to use, which libraries are installed, which versions, and so on, and the whole process becomes cumbersome and complex. Even if you write an extra-long Dockerfile to build the desired environment, maintaining that image yourself later becomes impractical.
Go does not have this problem. Go is statically compiled, so once the program builds, all the necessary dependencies are inside the binary, and deployment suddenly becomes easy. The Dockerfile can become very simple:

    FROM scratch
    ADD myservice /myservice
    CMD ["/myservice"]
Doing a Data Science Project #
- Use the github.com/google/go-github/github package to get information about Go projects on GitHub.
- Use pachyderm to organize the data. Pachyderm is also written in Go and is based on k8s. It provides PFS, a file-system-like storage layer.
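Once repository information has been collected, a typical next step is to aggregate it. The following is a minimal sketch, assuming the quantity of interest is the number of projects created per day and that each repository carries an RFC 3339 creation timestamp (the format GitHub's API uses for `created_at`). `countPerDay` is a hypothetical helper and the timestamps are made up; the sketch uses only the standard library rather than go-github or pachyderm.

```go
package main

import (
	"fmt"
	"time"
)

// countPerDay takes repository creation timestamps in RFC 3339 form and
// returns how many repositories were created on each UTC calendar day,
// keyed by "YYYY-MM-DD".
func countPerDay(createdAt []string) (map[string]int, error) {
	counts := make(map[string]int)
	for _, ts := range createdAt {
		t, err := time.Parse(time.RFC3339, ts)
		if err != nil {
			return nil, fmt.Errorf("bad timestamp %q: %v", ts, err)
		}
		counts[t.UTC().Format("2006-01-02")]++
	}
	return counts, nil
}

func main() {
	// Made-up timestamps, for illustration only.
	counts, err := countPerDay([]string{
		"2016-07-11T09:30:00Z",
		"2016-07-11T17:45:10Z",
		"2016-07-12T08:00:00Z",
	})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(counts["2016-07-11"], counts["2016-07-12"])
}
```

As in the CSV example, a malformed timestamp stops the aggregation with an error instead of being dropped from the counts.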
Arithmetic and visualization #
gonum is recommended. Its packages include:
- plot
- graph
- stat
- integrate
- lapack
- unit
- matrix
- mathext
- floats
- blas
- optimize
You can perform matrix operations (matrix) or draw charts (plot).
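To make "matrix operations" concrete, here is a minimal sketch of matrix multiplication in plain Go. It deliberately does not use gonum, whose matrix package offers this and much more on top of BLAS and LAPACK; `matMul` is a hypothetical helper written only for illustration, and it assumes every row of a matrix has the same length.

```go
package main

import "fmt"

// matMul multiplies an m×n matrix a by an n×p matrix b, both stored as
// slices of rows, and returns the m×p product. It assumes that all rows
// within a matrix have equal length.
func matMul(a, b [][]float64) ([][]float64, error) {
	if len(a) == 0 || len(b) == 0 || len(a[0]) != len(b) {
		return nil, fmt.Errorf("dimension mismatch")
	}
	out := make([][]float64, len(a))
	for i := range a {
		out[i] = make([]float64, len(b[0]))
		for j := range b[0] {
			for k := range b {
				out[i][j] += a[i][k] * b[k][j]
			}
		}
	}
	return out, nil
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	product, err := matMul(a, b)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(product)
}
```

For real workloads a library is the better choice: the triple loop above is easy to read but makes no attempt at cache-friendly access or numerical care.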
Prediction #
Linear regression is used here, via the github.com/sajari/regression package.
The model predicts that 195 projects would be created on the day of the 2016 talk, and 240 projects on the same day in 2017.
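The idea behind such a forecast can be sketched with ordinary least squares on a single variable. This is an assumption-laden illustration, not the talk's code: it uses only the standard library instead of github.com/sajari/regression, `fitLine` is a hypothetical helper, and the data points are made up rather than taken from GitHub.

```go
package main

import (
	"errors"
	"fmt"
)

// fitLine returns the slope and intercept of the least-squares line
// y = slope*x + intercept through the given points.
func fitLine(xs, ys []float64) (slope, intercept float64, err error) {
	if len(xs) != len(ys) || len(xs) < 2 {
		return 0, 0, errors.New("need at least two (x, y) pairs of equal length")
	}
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	denom := n*sumXX - sumX*sumX
	if denom == 0 {
		return 0, 0, errors.New("all x values are identical")
	}
	slope = (n*sumXY - sumX*sumY) / denom
	intercept = (sumY - slope*sumX) / n
	return slope, intercept, nil
}

func main() {
	// Made-up points lying exactly on y = 2x + 1; not data from the talk.
	xs := []float64{0, 1, 2, 3}
	ys := []float64{1, 3, 5, 7}
	slope, intercept, err := fitLine(xs, ys)
	if err != nil {
		fmt.Println(err)
		return
	}
	// Extrapolate to an x value outside the observed range.
	fmt.Println(slope*10 + intercept)
}
```

In a setting like the one described above, x would be a day index and y the number of projects created on that day, so extrapolating to a future x yields the predicted count for that date.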
Other data-related items #
- Gophernotes: interactive Go notebooks
- Golearn: general-purpose machine learning
- Glow: distributed computing system (MapReduce)
- Gobot: a Go library built specifically for sensors and IoT
- Native support for multidimensional slices has been proposed for Go
- TensorFlow provides a Go API
The Gophers Slack also has a data science channel.