To become a programmer, working through tutorials is not enough: getting familiar with the tools that are actually used in production environments will help you grow faster!
There are 7 Python tools that are essential for all data professionals. A working understanding of them will give you a clear advantage when looking for a job! Let's go through them:
IPython
IPython is an interactive interpreter built on the Python shell, with much more powerful editing and interactive capabilities than the default shell. IPython provides the following features:
- A more powerful interactive shell (including a Qt-based console)
- A browser-based notebook that supports code, plain text, math formulas, inline charts, and other rich media
- Support for interactive data visualization and GUI toolkits
- A flexible, embeddable interpreter that can be loaded into your own projects
- Easy-to-use, high-performance tools for parallel computing
When you are not yet familiar with a library, you can write some test code in IPython to quickly learn its methods and usage.
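As a rough illustration of exploring an unfamiliar library, the sketch below uses only the standard library. The `json` module stands in for "a library you don't know yet", and the helper names `public_functions` and `first_doc_line` are made up for this example. In IPython itself you would get the same information interactively, with tab completion and by typing `json.dumps?`.

```python
import inspect
import json  # stand-in for "a library you don't know yet"


def public_functions(module):
    """Return the sorted names of the public functions found on a module."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        if not name.startswith("_")
    )


def first_doc_line(obj):
    """Return the first line of an object's docstring, or an empty string."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""


if __name__ == "__main__":
    # In IPython: type `json.` and press Tab to see a similar list.
    print(public_functions(json))
    # In IPython: `json.dumps?` shows the signature and docstring.
    print(inspect.signature(json.dumps))
    print(first_doc_line(json.dumps))
```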
GraphLab Create
GraphLab Create is a Python library backed by a C++ engine for quickly building large-scale, high-performance data products.
Features of GraphLab Create:
- Analyze terabyte-scale data on your own computer at interactive speeds.
- Analyze tabular data, graphs, text, and images on a single platform.
- Use state-of-the-art machine learning algorithms, including deep learning, boosted trees, and factorization machines.
- Run the same code on your laptop or on a distributed system, using a Hadoop YARN or EC2 cluster.
- Focus on tasks or on machine learning with the flexible API.
- Easily deploy data products in the cloud with predictive services.
- Visualize data for exploration and production monitoring.
Spark
Spark is a big data processing framework built around speed, ease of use, and sophisticated analytics. It provides a comprehensive, unified framework for managing big data processing needs across datasets that differ in nature (text data, graph data, and so on) and in source (batch data or real-time streaming data).
Spark's distributed computing is based on the MapReduce model and retains the advantages of Hadoop MapReduce. Unlike MapReduce, however, a job's intermediate output and results can be kept in memory, which removes the need to read and write HDFS between steps. As a result, Spark is better suited to iterative algorithms such as those used in data mining and machine learning.
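To make the map and reduce steps concrete, here is a minimal plain-Python sketch of the pattern. It does not use Spark's API; the function names and the sample lines are made up for illustration. Spark applies the same idea across a cluster and can keep the intermediate pairs in memory instead of writing them to disk between steps.

```python
from collections import Counter
from functools import reduce


def map_phase(lines):
    """Map step: emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_phase(pairs):
    """Reduce step: sum the counts of all pairs that share the same key."""

    def merge(counts, pair):
        word, n = pair
        counts[word] += n
        return counts

    return reduce(merge, pairs, Counter())


lines = ["spark keeps data in memory", "hadoop writes data to disk"]
pairs = map_phase(lines)  # intermediate output, held in memory
word_counts = reduce_phase(pairs)
print(word_counts["data"])  # prints 2
```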
Pandas
Pandas is a very useful library built on NumPy and, like the animal it shares a name with, it is much loved. The reason is that it makes reading and processing data very simple.
Pandas has two basic data structures of its own. Because it is still a Python library, the built-in Python data types still apply, and you can also define your own types with classes. On top of these, Pandas defines two data structures, Series and DataFrame, which make data easier to manipulate.
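A small sketch of the two structures may help. The fruit names, prices, and quantities below are invented sample data; the calls used (`pd.Series`, `pd.DataFrame`, `Series.map`, `groupby`) are standard pandas.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
prices = pd.Series([3.5, 2.0, 4.25], index=["apple", "banana", "cherry"])

# A DataFrame is a two-dimensional table whose columns are Series.
orders = pd.DataFrame(
    {
        "fruit": ["apple", "banana", "apple", "cherry"],
        "quantity": [2, 5, 1, 4],
    }
)

# Look up each order's unit price by label, then compute the line total.
orders["total"] = orders["fruit"].map(prices) * orders["quantity"]

# Aggregate the totals per fruit.
totals_by_fruit = orders.groupby("fruit")["total"].sum()
print(totals_by_fruit)
```

Because a Series carries its own labels, the price lookup needs no explicit loop or dictionary.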
Scikit-learn
Scikit-learn is a machine learning library written in Python. It contains a large number of machine learning algorithms and datasets, and is a convenient tool for data mining. Its basic functionality falls into six areas: classification, regression, clustering, dimensionality reduction, model selection, and data preprocessing. Scikit-learn depends on other packages such as NumPy and SciPy.
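The sketch below touches three of those areas (preprocessing, model selection, and classification) using the iris dataset bundled with scikit-learn. The choice of logistic regression, the 25% test split, and the random seed are assumptions made for this example, not recommendations from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Built-in dataset: 150 iris flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Model selection: hold out a quarter of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing + classification chained into a single estimator.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```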
PuLP
Linear programming is a kind of optimization in which a linear objective function is maximized (or minimized) subject to linear constraints. PuLP is a linear programming modeler written in Python. It can generate LP files and call highly optimized solvers such as GLPK, COIN CLP/CBC, CPLEX, and Gurobi to solve these linear problems.
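As a minimal sketch, here is a toy problem modeled with PuLP. The problem itself (coefficients, constraint names, the "production plan" framing) is hypothetical, and the example assumes the CBC solver that ships with the `pulp` package is available on your platform.

```python
from pulp import LpMaximize, LpProblem, LpStatus, LpVariable, PULP_CBC_CMD, value

# Hypothetical problem: maximize 3x + 2y subject to two resource limits.
problem = LpProblem("toy_production_plan", LpMaximize)
x = LpVariable("x", lowBound=0)
y = LpVariable("y", lowBound=0)

problem += 3 * x + 2 * y, "profit"          # objective function
problem += x + y <= 4, "labor_limit"         # constraint
problem += x + 3 * y <= 6, "material_limit"  # constraint

# Solve with the bundled CBC solver, suppressing its log output.
problem.solve(PULP_CBC_CMD(msg=False))

print(LpStatus[problem.status])
print(value(x), value(y), value(problem.objective))
```

Checking the corner points of the feasible region by hand, (4, 0) gives the largest objective value, 12, which is what the solver should report.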
Matplotlib
Matplotlib is a plotting toolkit for 2D graphics in Python that makes visualizing data very simple.
Matplotlib tries to make easy things easy and hard things possible. With just a few lines of code, you can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, and more.
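For instance, a histogram and a scatter plot side by side take only a few lines. The data below is made up, and the `Agg` backend is selected on the assumption that the script may run without a display; drop that line when working interactively.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up sample data for illustration.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(8, 3))

ax_hist.hist(values, bins=5)
ax_hist.set_title("Histogram")

ax_scatter.scatter(xs, ys)
ax_scatter.set_title("Scatter plot")

fig.tight_layout()

# Render to an in-memory PNG; use fig.savefig("plots.png") to write a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```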
Those are the 7 recommended Python tools. Get to know them, and you can learn faster than others!