Python Data Analysis 1

Source: Internet
Author: User

Summary of this section

  Basic Environment

  IPython Basics

Objective

This is my first blog post of 2018. My boss has new expectations for my job, so I need to start doing some data analysis work, and that is why I am starting this series. The posts in this category are mostly the key points I wrote down while reading the book "Python for Data Analysis".

First, the big picture needs little explanation: why use Python for this, and why choose NumPy and pandas for data processing and analysis? If you are a complete beginner, patiently read through all of the content and you will find that many things that are very hard to code by hand become remarkably simple with these modules. I, for one, studied this book with my jaw on the floor. Also, the posts in this category assume you know basic Python; if you do not, please read the Python basics category first.

I wish for world peace, and I hope that every reader of this blog can one day, through their own efforts, reach a small goal and move toward success. Keep it up, everybody.

Environment issues

Let me first solve a problem that I spent 2 days searching Baidu for without finding an exact solution. The business requirement is this: my company uses a SQL Server database, and all of the data analysis is built on that data. I ran into a very unfortunate problem: installing the pymssql module directly on Windows fails. Specifically, I do not know why, but running

pip install pymssql or pip3 install pymssql

does not install it and always ends in an error. The error says the file sqlfront.h is missing.

Search Baidu for freetds-dev.

My computer runs 64-bit Windows and the Python version is 3.6. (You can check the Python version yourself; that is basic, and not knowing how would really be something.)

So I chose x86_64_vs2015 and downloaded it.

After downloading, unzip it. In the include folder you will find sqlfront.h, the file named in the error; several files are needed in total. If you do not mind the clutter, copy all of the files to the include folder in the Python root directory. Perfectionists can copy just this one file, run pip again to see which file is missing next, and drag them over one by one.

That's it....

Or so I thought. Running pip install pymssql again produces a different error.

????? WHF????

This is where I got stuck, unable to fix it. Finally, after two full days of trying, I came up with a method that works.

Search Baidu for the file pymssql-2.1.3.tar.gz.

Or use this URL: https://pypi.python.org/pypi/pymssql/2.1.3, scroll to the bottom, and download the matching version.

Not sure which version matches?

Honestly, I did not know either; I found all of this on Baidu, which really is a good teacher. And if typing python directly does not give you the output in the screenshot, go add Python to your environment variables, and add pip while you are at it.

Executing the statement above shows which version of the wheel needs to be installed.

Download the matching version.

In cmd, change to the download directory, unzip the file, and install it with pip.

Now, in site-packages under the Python root directory, you can find the folder installed from the wheel file.

Then download the last file on the previous page, the main pymssql archive.

Unzip the file and rename the folder to

Then copy the folder into site-packages under the Python root directory. Don't ask why; I don't know. Looking at the file names inside, you can probably guess. (Imagination did the work here; I can't put it into words.)

That should do it, really.... This is simply how I got it working by trial and error; if it still fails for you, then I am sorry for wasting your time.

Installing modules

Install the numpy, matplotlib, pandas, and ipython modules directly with pip.

Looking around, many people say Anaconda is the better choice for data analysis, but I am used to PyCharm, which I found very powerful for Django projects. So I will keep using PyCharm and only need to install Jupyter Notebook.

Some other common conventions:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

The rest of this series uses these conventions.

Importing everything from a large library like NumPy directly (from numpy import *) is not recommended.
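To see why a star import is risky, here is a small sketch. It uses the standard library's os module as a stand-in for NumPy (an assumption, chosen so the example runs without third-party packages). The mechanism is the same: a star import can silently replace names you already rely on.

```python
import builtins
import os

# Run the star import inside a throwaway namespace so the
# current session is not polluted.
namespace = {}
exec("from os import *", namespace)

# os exports its own low-level open(), which now hides the builtin.
shadowed = namespace["open"] is os.open
print(shadowed)                             # True
print(namespace["open"] is builtins.open)   # False
```

With an alias such as np, every borrowed name stays visibly qualified, so this kind of collision cannot happen by accident.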

IPython Basics

If you already know Python, none of this is much of a problem.

Starting the IPython interpreter is similar to starting the Python interpreter: just change the command to ipython, and make sure it is on your PATH environment variable.

You can enter any Python statement.

Tab completion

If you use PyCharm, which has its own completion hints, this matters less.

The Tab key is not only for searching namespaces, objects, and module attributes. When you type something that looks like a file path (even inside a Python string), pressing Tab finds matching entries in your computer's file system.

Introspection

Add a question mark (?) before or after a variable to display some general information about the object. If the object is a function or an instance method, its docstring is displayed as well. Use a double question mark (??) to display the function's source code, if possible.

Combined with the wildcard character *, ? is also very powerful: for example, np.*load*? lists all names that match the wildcard expression.

%run command

In Python, we can run a file directly with python followed by the *.py file name.

In the IPython environment, you can run it with %run *.py.

The script runs in an empty namespace (no imports, no other variables defined; I assume you know this already).
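The "empty namespace" behavior can be reproduced in plain Python with runpy.run_path, which, like %run, executes a file in a fresh namespace. This is a sketch of the idea, not IPython's actual implementation; the file name script_test.py and the variable names are made up for illustration.

```python
import os
import runpy
import tempfile

secret = 42  # defined in the calling session only

# A tiny throwaway script that checks whether it can see `secret`.
script_source = (
    "seen = 'secret' in globals()\n"
    "result = sum(range(5))\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "script_test.py")
    with open(path, "w") as fh:
        fh.write(script_source)
    # run_path returns the globals the script ended up with.
    script_globals = runpy.run_path(path)

print(script_globals["seen"])    # False: the caller's variables are invisible
print(script_globals["result"])  # 10
```

After %run finishes, IPython makes the script's variables available in the interactive session, which corresponds to reading them out of script_globals here.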

Interrupt Code Execution

CTRL + C

Exceptions and tracebacks

When a script run with %run raises an exception, IPython by default prints the trace of the entire call stack, with a few lines of code around each point in the stack as context.
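For comparison, the standard library's traceback module produces the plain version of such a trace (without the extra context lines IPython adds). A minimal sketch, with made-up function names:

```python
import traceback

def divide(a, b):
    return a / b

def compute():
    return divide(1, 0)

try:
    compute()
except ZeroDivisionError:
    # format_exc() returns the whole call stack as text, innermost call last.
    trace_text = traceback.format_exc()

print(trace_text)
```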

Magic Command

IPython has some special commands called magic commands. Some of them make common tasks more convenient, and others let you easily control the behavior of the IPython system.

Magic commands are prefixed with a percent sign (%). For example, you can use the magic command %timeit to measure the execution time of any Python statement (such as a matrix operation).

By default, magic commands can be used without the percent sign, as long as no variable with the same name exists. This feature is called automagic and can be turned on or off with %automagic.

The documentation is available directly in IPython, so I recommend taking a look at the special commands (enter %quickref or %magic; I have not had time to read them myself).

IPython magic commands

%quickref                Display the IPython quick reference
%magic                   Show detailed documentation for all magic commands
%debug                   Enter the interactive debugger at the bottom of the latest exception traceback
%hist                    Print the command history
%pdb                     Automatically enter the debugger after an exception occurs
%paste                   Execute Python code from the clipboard
%cpaste                  Open a special prompt for manually pasting the Python code you want to execute
%reset                   Delete all variables/names in the interactive namespace
%page OBJECT             Print OBJECT through a pager
%run script.py           Execute a Python script file in IPython
%prun statement          Run statement under cProfile and print the profiler output
%time statement          Report the execution time of statement
%timeit statement        Run statement multiple times to compute an average execution time; useful for code that runs very quickly
%who, %who_ls, %whos     Display the variables defined in the interactive namespace, with varying levels of detail
%xdel variable           Delete variable and try to clear all references to its object in IPython

IPython shell keyboard shortcuts

Ctrl-P or Up arrow       Search backward in command history for commands starting with the currently entered text
Ctrl-N or Down arrow     Search forward in command history for commands starting with the currently entered text
Ctrl-R                   Readline-style reverse history search (partial match)
Ctrl-Shift-V             Paste text from the clipboard
Ctrl-C                   Interrupt the code that is currently executing
Ctrl-A                   Move the cursor to the beginning of the line
Ctrl-E                   Move the cursor to the end of the line
Ctrl-K                   Delete the text from the cursor to the end of the line
Ctrl-U                   Clear all text on the current line
Ctrl-F                   Move the cursor forward one character
Ctrl-B                   Move the cursor back one character
Ctrl-L                   Clear the screen

IPython system interaction commands

For example, with dir_info = !dir an IPython variable stores the result returned by the system shell; just put ! in front of the shell command when invoking it.

You can use %bookmark db /home/wesm/dropbox to save db as a bookmark permanently, while %alias ll ls -l saves ll as an alias for ls -l only temporarily.

!cmd                     Execute cmd in the system shell
output = !cmd args       Execute cmd and store its stdout in output
%alias alias_name cmd    Define an alias for a system shell command
%bookmark                Use IPython's directory bookmark system
%cd directory            Change the system working directory to directory
%pwd                     Return the current system working directory
%pushd directory         Push the current directory onto the stack and change to the target directory
%popd                    Pop the directory at the top of the stack and change to it
%dirs                    Return a list containing the current directory stack
%dhist                   Print the directory visit history
%env                     Return the system environment variables as a dict

Search and reuse command history

For example, enter %run script_test.py to execute a script. After finding and fixing an error, just type the first few characters of the %run command and press Ctrl + P or the Up arrow key; this searches the command history for the first command that starts with the characters you typed. Press Ctrl + P repeatedly to keep searching backward, and Ctrl + N to search forward. Ctrl + R performs a partial-match incremental search. On Windows, IPython emulates the readline functionality: press Ctrl-R and type a few characters from the line you want to find. Pressing Ctrl-R again cycles through each history line that matches the input.

Input and output variables

Forgetting to assign a function result to a variable is a headache, but IPython saves references to both the input and the output (the returned object) in special variables. The last two outputs are stored in the variables _ (one underscore) and __ (two underscores).

The text you enter is saved in a variable named _iX, where X is the input line number. Each such input variable has a corresponding output variable _X.

Because the input variable is a string, it can be executed again directly with Python's exec (you can re-run it from scratch using ordinary Python syntax):

exec(_i27)
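The idea can be tried in plain Python as well. In the sketch below, input_history and session are hypothetical stand-ins for IPython's input history and interactive namespace; only exec and eval are real Python features.

```python
# Hypothetical stand-in for IPython's input history: index X plays
# the role of _iX, and each entry is the raw text that was typed.
input_history = ["", "total = sum(range(10))", "total * 2"]

session = {}                      # stand-in for the interactive namespace
exec(input_history[1], session)   # re-run input 1, like exec(_i1)
print(session["total"])           # 45

# An expression can be re-evaluated with eval to get its value back.
value = eval(input_history[2], session)
print(value)                      # 90
```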

Record output and input

IPython can record the entire console session, including input and output. Execute %logstart to begin logging.

Shell commands and aliases

In IPython, a line that starts with an exclamation point (!) means that everything after it is executed in the system shell. That is, you can delete files, change directories, or run any other process. You can even start a process that takes control away from IPython.

In addition, the console output of a shell command can be stored in a variable: simply assign the expression that starts with ! to the variable.

ip_info = !ifconfig eth0 | grep "inet"
ip_info[0].strip()
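In an ordinary Python script, where the ! syntax is not available, the subprocess module gives a similar result. The sketch below is an approximation rather than what IPython does internally; it runs the Python interpreter itself as the command so that it behaves the same on Windows and Linux, and the address it prints is a made-up placeholder.

```python
import subprocess
import sys

# Run a child process and capture its stdout as text, similar in
# spirit to `lines = !command` in IPython.
completed = subprocess.run(
    [sys.executable, "-c", "print(' inet 10.0.0.1 '); print('done')"],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)

# IPython returns a list of output lines; splitlines() gives the same shape.
lines = completed.stdout.splitlines()
print(lines[0].strip())  # inet 10.0.0.1
```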

When using !, IPython also lets you substitute Python values defined in the current environment. Just put a dollar sign ($) in front of the variable name.

foo = 'test*'
!ls $foo    # lists all matching files

%alias defines a shortcut for a shell command.

%alias ll ls -l
ll /usr

To run more than one command in a single alias, write them on one line separated by semicolons:

%alias test_alias (cd ch08; ls; cd ..)
test_alias

IPython forgets all the aliases you have defined as soon as the session ends.

Directory bookmarks

IPython has a simple directory bookmark system that stores aliases for frequently used directories so you can jump to them quickly.

%bookmark bookmark1 /home/wes/xxx
cd bookmark1    ==> goes to the /home/wes/xxx directory

If the bookmark name conflicts with a directory name in the current working directory, use the -b flag to override it and go to the bookmarked directory. The %bookmark -l option lists all bookmarks:

%bookmark -l

The difference between bookmarks and aliases is that bookmarks are persisted automatically.

Interactive debugger

One of the best times to debug code is right after an error has occurred. The %debug command invokes the post-mortem debugger and jumps directly to the stack frame where the exception was raised.

In this debugger, you can execute arbitrary Python code and inspect all objects and data in each stack frame. By default you start at the lowest level (where the error occurred). Enter u (or up) and d (or down) to move between the levels of the stack trace.

Executing the %pdb command makes IPython invoke the debugger automatically after an exception occurs.

The debugger also helps during development, especially when you want to set breakpoints or step through a function or script to watch each statement execute. There are several ways to do this. First, use the %run command with the -d option, which opens the debugger before any code in the script file is executed. You must immediately enter s (step) to enter the script.

run -d ch03/ipython_bug.py
s

After that, how you step through the file is up to you. c (continue) keeps the script running until a breakpoint is reached. Enter n (or next) to execute through to the next line.

Note that debugger commands take precedence over variable names. When a variable has the same name as a command, put an exclamation mark (!) in front of the variable to view its contents.

Other usage scenarios for the debugger

1. set_trace (the poor man's breakpoint).

import sys

def set_trace():
    from IPython.core.debugger import Pdb
    # Start the debugger in the caller's frame, not inside set_trace itself
    Pdb(color_scheme='Linux').set_trace(sys._getframe().f_back)

def debug(f, *args, **kwargs):
    from IPython.core.debugger import Pdb
    pdb = Pdb(color_scheme='Linux')
    # Call f under debugger control and return its result
    return pdb.runcall(f, *args, **kwargs)

The first function (set_trace) is very simple: put it anywhere in your code where you want to stop and take a look. Pressing c (continue) resumes execution normally, with no side effects.

2. Debug function

The debug function lets you use the debugger directly on any function call. Suppose you have a function like this:

def f(a, b, c=1):
    temp = a + b
    return temp / c

Now you want to step through it. Note how debug is used: pass f as the first argument to debug, followed by the positional and keyword arguments to be passed on to f.

debug(f, 1, 1, c=4)
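The same runcall mechanism exists in the standard library's pdb module, which IPython's debugger builds on. The sketch below uses plain pdb instead of IPython's Pdb (an assumption made so it runs without IPython) and feeds the debugger its commands from a string instead of the keyboard, so it can run unattended.

```python
import io
import pdb

def f(a, b, c=1):
    temp = a + b
    return temp / c

# Scripted debugger input: print an expression, then continue.
commands = io.StringIO("p a + b\nc\n")
output = io.StringIO()

debugger = pdb.Pdb(stdin=commands, stdout=output, nosigint=True, readrc=False)
# runcall stops at the first line of f, reads the commands above,
# and finally returns whatever f returns.
result = debugger.runcall(f, 1, 1, c=4)

print(result)             # 0.5
print(output.getvalue())  # debugger transcript, including the value of a + b
```

In an interactive session you would omit stdin and stdout and type the commands yourself.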

The debugger can also be combined with %run: running a script with %run -d drops you directly into the debugger, where you can set breakpoints and start the script. If you add -b with a line number, the debugger starts with a breakpoint already set. (For this part I suggest looking up the book and reading it there.)

Test the execution time of the code

IPython is aimed at data analysis work. For large-scale, long-running data analysis applications, we may need to measure the execution time of individual parts of the code to learn which functions take the most time in the whole process.

The built-in time module, with its time.clock and time.time functions, can be used to time code manually (time.clock was removed in Python 3.8; use time.perf_counter instead). But this is tedious, because you have to write the same boring boilerplate over and over:

import time

iterations = 1000  # number of repetitions to average over
start = time.time()
for i in range(iterations):
    pass  # code to test goes here
end = (time.time() - start) / iterations  # average seconds per iteration

Because this is such a common need, IPython provides two magic functions that automate it: %time and %timeit. %time runs a statement once and reports the overall execution time.

# Suppose we have a very large list of strings and need to find all the strings that start with 'foo'
strings = ['foo', 'foo1', 'baz', 'python', 'Xiaoxiao'] * 100000
method1 = [x for x in strings if x.startswith('foo')]
method2 = [x for x in strings if x[:3] == 'foo']

It seems their performance should be similar; let's use %time to verify.

Judging by this run, the first method looks much faster than the second, but this is not a very precise measurement: if you run %time on the same statement several times, the result changes each time. To get more accurate results, use the magic function %timeit, which runs any statement multiple times to produce a much more precise average execution time.

This deserves attention: it pays to understand the performance characteristics of the Python standard library, NumPy, pandas, and the other libraries used in the book. In a large data analysis application, these seemingly insignificant milliseconds add up! %timeit is especially useful for analyzing statements and functions with very short execution times. Even though each individual time is small enough to ignore, the difference becomes significant over a large amount of data: across 1,000,000 calls, the total times can diverge considerably. For the example above, we can compare the two methods directly to understand their performance characteristics.
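Outside IPython, the standard library's timeit module does the same job as %timeit. The sketch below compares the two methods under a few assumptions: the list is 100 times smaller so it finishes quickly, and it reports the best of three trials rather than the average that %timeit prints. No particular winner is claimed, since the numbers depend on the machine and the Python version.

```python
import timeit

strings = ["foo", "foo1", "baz", "python", "Xiaoxiao"] * 1000

def method1():
    return [x for x in strings if x.startswith("foo")]

def method2():
    return [x for x in strings if x[:3] == "foo"]

# Both approaches must select the same items before timing is meaningful.
same_result = method1() == method2()

# repeat() runs each function `number` times per trial; taking the best
# trial reduces noise from other processes.
best1 = min(timeit.repeat(method1, number=20, repeat=3)) / 20
best2 = min(timeit.repeat(method2, number=20, repeat=3)) / 20
print(f"startswith: {best1 * 1e6:.1f} us per call")
print(f"slicing:    {best2 * 1e6:.1f} us per call")
```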

Basic profiling: %prun and %run -p

Profiling a function line by line

For these two topics I suggest reading the book; writing a .py file to test them is quite tedious 0.0. In short, use %prun (cProfile) for macro-level performance analysis and %lprun (line_profiler) for micro-level performance analysis.
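For readers who want to try the macro-level analysis without writing a separate file, here is a minimal sketch using cProfile directly, which is the profiler behind %prun. The functions slow_sum and workload are made up for illustration.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return [slow_sum(20_000) for _ in range(10)]

profiler = cProfile.Profile()
results = profiler.runcall(workload)

# Sort by cumulative time and keep only the top few entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists, per function, the number of calls and the time spent; line-by-line timing requires the separate line_profiler package and is not shown here.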
