High Performance Python: Contents
- 1 Understanding Performant Python
- 2 Profiling
- 3 Lists and tuples
- 4 Dictionaries and sets
- 5 iterators and Generators
- 6 Matrix and Vector computation
- 7 Compiling to C
- 8 Concurrency
- 9 Multiprocessing
- 10 Clusters and Job Queues
- 11 Using less RAM
- 12 Lessons from the Field
Understanding Performant Python
Profiling
Lists and tuples
- Is the internal implementation an array?
Dictionaries and sets
- Dictionary elements: __hash__ + __eq__/__cmp__
- Entropy
- locals(), globals(), __builtin__ (namespace lookup order)
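- A minimal sketch (in the spirit of the book's Point example; the details here are mine) of a class that defines __hash__ and __eq__ so its instances behave as dictionary/set keys:
-
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __hash__(self):
        # equal objects must hash equally; hashing the tuple of fields is the usual idiom
        return hash((self.x, self.y))
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

lookup = {Point(1, 2): "found"}
print(Point(1, 2) in lookup)  # True only because __hash__ and __eq__ agree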
- List comprehension / generator expression (one uses [], the other ()):
-
[<value> for <item> in <sequence> if <condition>] vs (<value> for <item> in <sequence> if <condition>)
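- A quick illustration (my own, with arbitrary numbers): the list comprehension materializes every element, the generator expression produces them lazily:
-
divisible = [n for n in range(100000) if n % 3 == 0]        # full list held in memory
divisible_lazy = (n for n in range(100000) if n % 3 == 0)   # generator, values produced on demand
print(sum(divisible), sum(divisible_lazy))                   # same answer, very different peak RAM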
- itertools:
- imap, ireduce, ifilter, izip, islice, chain, takewhile, cycle
- p. 95: Knuth's online mean algorithm?
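- A sketch of an online (streaming) mean in the spirit of the Knuth reference above; the incremental update is the standard one, the function name is mine:
-
def online_mean(stream):
    """Yield the running mean without storing the stream (Knuth-style incremental update)."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        mean += (x - mean) / n   # new_mean = old_mean + (x - old_mean) / n
        yield mean

for running in online_mean([2, 4, 6, 8]):
    pass
print(running)  # 5.0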
Iterators and Generators
Matrix and Vector computation
- The 'loop invariant' example keeps coming up; wouldn't the compiler optimize that anyway?
- $ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,\
cache-references,cache-misses,branches,branch-misses,task-clock,faults,\
minor-faults,cs,migrations -r 3 python diffusion_python_memory.py
- numpy
- np.roll([[1,2,3],[4,5,6]], 1, axis=1)
- ? Can Cython optimize data structures, or does it only work on the code?
- in-place operations, such as +=, *=
- numexpr (see the sketch after this list)
- from numexpr import evaluate
- evaluate("next_grid*D*dt+grid", out=next_grid)
- ? Creating our own roll function
- scipy
- from scipy.ndimage.filters import laplace
- laplace(grid, out, mode='wrap')
- page-faults suggests scipy allocates a lot of memory? instructions suggests the scipy function is too general-purpose?
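- A minimal sketch of the numexpr call in context (grid size and constants are illustrative, not the book's exact diffusion setup):
-
import numpy as np
from numexpr import evaluate

grid = np.random.random((640, 640))
next_grid = np.copy(grid)    # pretend this already holds the Laplacian term
D, dt = 1.0, 0.1

# numexpr evaluates the expression in cache-sized chunks and writes the result
# in place, avoiding the temporaries that the equivalent numpy expression creates
evaluate("next_grid*D*dt+grid", out=next_grid)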
Compiling to C
- Compile to C:
- Cython
- ZMQ also uses it?
- setup.py
-
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("calculate", ["cythonfn.pyx"])])
- $ python setup.py build_ext --inplace
- Cython annotations: the more yellow a line of code, the more calls into the Python virtual machine
- Add type annotations
- cdef unsigned int i, n
- disable bounds checking: #cython: boundscheck=False (directive, or as a decorator)
- buffer protocol?
- def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs): ...
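- A sketch of cythonfn.pyx pulling the items above together (reconstructed from memory of the book's Julia-set style kernel, so details may differ):
-
# cython: boundscheck=False
def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):
    cdef unsigned int i, n
    cdef double complex z, c
    output = [0] * len(zs)
    for i in range(len(zs)):
        n = 0
        z = zs[i]
        c = cs[i]
        while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
            z = z * z + c
            n += 1
        output[i] = n
    return output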
- OpenMP
- prange
- -fopenmp (for gcc?)
- schedule="guided"
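- A fragment showing how the serial loop in the sketch above could switch to prange (assumes the extension is compiled and linked with -fopenmp; the parallel body must not touch Python objects, so output would need to become a typed buffer):
-
from cython.parallel import prange
# inside calculate_z, replacing the serial range() loop (length = len(zs), declared as a C int):
for i in prange(length, nogil=True, schedule="guided"):
    ...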
- Shed Skin: for non-numpy code
- shedskin --extmod test.py
- An extra ~0.05 s per call: used to copy data across from the Python environment
- Pythran
- Numba: specialized for numpy, based on LLVM
- Use Continuum's Anaconda distribution
- from numba import jit
- @jit()
- Experimental GPU support is also available?
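- A minimal @jit sketch (a standard numba-style example, not the book's diffusion code):
-
import numpy as np
from numba import jit

@jit()   # numba infers the argument types and compiles on first call
def sum2d(arr):
    total = 0.0
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            total += arr[i, j]
    return total

print(sum2d(np.ones((1000, 1000))))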
- #pythran export evolve(float64[][], float) (a Pythran annotation)
- VMs & JITs: PyPy
- GC behavior: whereas CPython uses reference counting, PyPy uses a modified mark-and-sweep (so memory may not be reclaimed promptly)
- Note that PyPy 2.3 runs as Python 2.7.3.
- STM: an attempt to remove the GIL
- Other tools: Theano, Parakeet, PyViennaCL, Nuitka, Pyston (Dropbox), PyCUDA (low-level code is not portable?)
- ctypes, cffi (from PyPy), f2py, a native CPython module
- $ f2py -c -m diffusion --fcompiler=gfortran --opt='-O3' diffusion.f90
- JIT Versus AOT
Concurrency
- Concurrency: avoid wasting time on I/O waits
- In Python, coroutines are implemented as generators.
- For Python 2.7 implementations of future-based concurrency, ...?
- gevent (suitable for mostly CPU-based problems that sometimes involve heavy I/O)
- gevent monkey-patches the standard I/O functions so they become asynchronous
- Greenlet
- Wait
- The futures are created with gevent.spawn
- Control the number of simultaneously open resources: from gevent.coros import Semaphore
- requests = [gevent.spawn(download, u, semaphore) for u in urls]
- import grequests?
- 69x speedup? Does that mean a corresponding amount of unnecessary I/O waiting was removed?
- The event loop may be either underutilized or overutilized
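- A sketch of the gevent pattern in the notes above, close to the book's example as I remember it (URLs and chunk size are placeholders; newer gevent moves Semaphore to gevent.lock):
-
import gevent
from gevent import monkey
monkey.patch_socket()                 # make standard socket I/O cooperative
from gevent.coros import Semaphore
import urllib2                        # Python 2, matching the book's era

def download(url, semaphore):
    with semaphore:                   # cap the number of simultaneously open connections
        return urllib2.urlopen(url).read()

def chunked_requests(urls, chunk_size=100):
    semaphore = Semaphore(chunk_size)
    requests = [gevent.spawn(download, u, semaphore) for u in urls]
    for response in gevent.iwait(requests):
        yield response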
- tornado (by Facebook, suitable for mostly I/O-bound asynchronous applications)
- from tornado import ioloop, gen
- from functools import partial
- AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=100)
- @gen.coroutine
- ... responses = yield [http_client.fetch(url) for url in urls]  # generates Future objects?
- response_sum = sum(len(r.body) for r in responses)
- raise gen.Return(value=response_sum)
- _ioloop = ioloop.IOLoop.instance()
- run_func = partial(run_experiment, base_url, num_iter)
- result = _ioloop.run_sync(run_func)
- Disadvantage: tracebacks can no longer hold valuable information
- In Python 3.4, new machinery was introduced to easily create coroutines and have them still return values
- asyncio
- yield from: raising an exception is no longer required in order to return results from a coroutine
- very low-level => import aiohttp
-
@asyncio.coroutine
def http_get(url):
    nonlocal semaphore
    with (yield from semaphore):
        response = yield from aiohttp.request('GET', url)
        body = yield from response.content.read()
        yield from response.wait_for_close()
    return body
return http_get

tasks = [http_client(url) for url in urls]
for future in asyncio.as_completed(tasks):
    data = yield from future

loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_experiment(base_url, num_iter))
- Allows us to unify modules like tornado and gevent by having them run in the same event loop
Multiprocessing
- Process, Pool, Queue, Pipe, Manager, ctypes (for IPC?)
- In Python 3.2, the concurrent.futures module was introduced (via PEP 3148)
- PyPy fully supports multiprocessing and runs faster
- from multiprocessing.dummy import Pool (the thread-based version? see the Pool sketch below)
- hyperthreading can give up to a 30% performance gain if there are enough spare compute resources
- It is worth noting that the negative effect of threads on CPU-bound problems is reasonably solved in Python 3.2+
- Using external queue implementations: Gearman, 0MQ, Celery (using RabbitMQ as the message broker), PyRes, SQS, or HotQueue
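- A minimal Pool sketch (my own example); swapping in the multiprocessing.dummy import gives the thread-backed version mentioned above:
-
from multiprocessing import Pool
# from multiprocessing.dummy import Pool   # same API, backed by threads instead of processes

def work(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)
    print(pool.map(work, range(10)))
    pool.close()
    pool.join()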
- manager = multiprocessing.Manager()
value = manager.Value(b'c', flag_clear)
- rds = redis.StrictRedis()
rds[flag_name] = flag_set
- value = multiprocessing.RawValue(b'c', flag_clear)  # no synchronization mechanism?
- sh_mem = mmap.mmap(-1, 1)  # memory-map 1 byte as a flag
sh_mem.seek(0)
flag = sh_mem.read_byte()
- Using mmap as a Flag Redux (? a bit hard to follow, skipping)
- $ ps -A -o pid,size,vsize,cmd | grep np_shared
- lock = lockfile.FileLock(filename)
lock.acquire() / lock.release()
- lock = multiprocessing.Lock()
value = multiprocessing.Value('i', 0)
lock.acquire()
value.value += 1
lock.release()
Clusters and Job Queues
- $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
- Did the version upgrade cause inconsistencies? But the API should have been versioned ...
- Skype's 24-hour global outage
- Some versions of the Windows client didn't properly handle the delayed responses and crashed.
- To reliably start the cluster's components when the machine boots, we tend to use either a cron job, Circus or supervisord, or sometimes upstart (which is being replaced by systemd)
- Might want to introduce a random-killer tool like Netflix's ChaosMonkey
- Make sure it is cheap in time and money to deploy updates to the system
- Make sure you use a deployment system like Fabric, Salt, Chef, or Puppet
- Early warning: Pingdom and Server Density
- Status monitoring: Ganglia
- 3 Clustering Solutions
- Parallel Python
- ppservers = ("*",)  # set IP list to be autodiscovered
- job_server = pp.Server(ppservers=ppservers, ncpus=nbr_local_cpus)
- ... job = job_server.submit(calculate_pi, (input_args,), (), ("random",))
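- A self-contained sketch of the Parallel Python flow above (the Monte Carlo pi function and worker count here are illustrative; calling job() blocks until the result is ready):
-
import pp

def calculate_pi(n):
    from random import random
    inside = sum(1 for _ in range(n) if random() ** 2 + random() ** 2 <= 1)
    return 4.0 * inside / n

ppservers = ("*",)                                   # autodiscover remote ppserver instances
job_server = pp.Server(ppservers=ppservers, ncpus=2)
job = job_server.submit(calculate_pi, (100000,), (), ("random",))
print(job())                                         # blocks until the job finishes
job_server.print_stats()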
- IPython Parallel
- Via ipcluster
- ? schedulers hide the synchronous nature of the engines and provide an asynchronous interface
- NSQ (distributed messaging system, written in Go)
- Pub/sub: topics, channels, consumers
- writer = nsq.Writer(['127.0.0.1:4150', ])
- handler = partial(calculate_prime, writer=writer)
- reader = nsq.Reader(message_handler=handler, nsqd_tcp_addresses=['127.0.0.1:4150', ], topic='numbers', channel='worker_group_a')
- nsq.run()
- Other cluster tools
Using less RAM
- IPython %memit
- array module
- DAWG/DAFSA
- Marisa trie (static trie)
- datrie (needs an alphabet that contains all the keys?)
- HAT trie
- HTTP microservices (using flask): https://github.com/j4mie/postcodeserver/
- Probabilistic Data Structures
- HyperLogLog++ structure?
- Very approximate counting with a 1-byte Morris counter
- Estimate is 2^exponent; update with a probabilistic rule: increment exponent if random(0,1) <= 2^-exponent
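- A sketch of the 1-byte Morris counter rule above (class name and method layout are mine):
-
import random

class MorrisCounter(object):
    """Approximate counter that stores only a small exponent."""
    def __init__(self):
        self.exponent = 0
    def add(self):
        # increment the exponent with probability 2**-exponent
        if random.random() <= 2.0 ** -self.exponent:
            self.exponent += 1
    def estimate(self):
        # the count is estimated as 2**exponent
        return 2 ** self.exponent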
- K-Minimum Values / KMV (keep the K smallest hash values, assuming hash values are evenly distributed)
- Bloom Filters
- This method gives us no false negatives and a controllable rate of false positives (an item may be wrongly reported as present)
- ? Use 2 real hashes to simulate any number of hash functions (see the sketch below)
- very sensitive to initial capacity
- Scalable Bloom filters: by chaining together multiple Bloom filters ...
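- One common way to implement the "2 hashes simulate many" idea above is the Kirsch-Mitzenmacher construction, h_i(x) = h1(x) + i*h2(x); the details below are mine, not the book's exact code:
-
import hashlib

def bloom_indices(item, num_hashes, num_bits):
    """Derive num_hashes bit positions from two real hashes: h_i(x) = h1(x) + i*h2(x)."""
    digest = hashlib.md5(item.encode('utf8')).hexdigest()
    h1, h2 = int(digest[:16], 16), int(digest[16:], 16)
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]

bits = set()                     # stand-in for a real bit array
for idx in bloom_indices("example", num_hashes=5, num_bits=2 ** 20):
    bits.add(idx)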
- LogLog counter
-
bit_index = trailing_zeros(item_hash)
if bit_index > self.counter:
    self.counter = bit_index
- variants: SuperLogLog, HyperLogLog
Lessons from the Field
- Sentry is used to log and diagnose Python stack traces
- Aho-Corasick trie?
- We use Graphite with collectd and statsd to allow us to draw pretty graphs of what's going on
- Gunicorn is used as a WSGI server and its I/O loop is executed by Tornado
High Performance Python notes (Python is a good language, and full-stack programmers use it!)