High Performance Python notes (Python is a good language, and full-stack programmers use it!)

Source: Internet
Author: User
Tags: semaphore, numba

High Performance Python: Contents
  • 1 Understanding Performant Python
  • 2 Profiling
  • 3 Lists and tuples
  • 4 Dictionaries and sets
  • 5 iterators and Generators
  • 6 Matrix and Vector computation
  • 7 Compiling to C
  • 8 Concurrency
  • 9 Multiprocessing
  • 10 Clusters and Job Queues
  • 11 Using less RAM
  • 12 Lessons from the Field
Understanding Performant Python
Profiling
Lists and tuples
    1. Is the internal implementation an array?
Dictionaries and sets
    1. Dictionary elements: __hash__ + __eq__/__cmp__
    2. Entropy
    3. locals(), globals(), __builtin__
    4. List comprehension / generator expression (one with [], one with ())
      [<value> for <item> in <sequence> if <condition>] vs (<value> for <item> in <sequence> if <condition>)
    5. itertools:
      1. imap, ireduce, ifilter, izip, islice, chain, takewhile, cycle
    6. p. 95: Knuth's online mean algorithm? (see the sketch below)
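    The algorithm above is presumably Knuth's running-mean recurrence, which updates the mean one sample at a time instead of storing the whole sequence. A minimal sketch (the function name is mine):
      def online_mean(samples):
          # Knuth's running mean: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
          mean = 0.0
          for k, x in enumerate(samples, 1):
              mean += (x - mean) / k
          return mean

      assert abs(online_mean([1, 2, 3, 4]) - 2.5) < 1e-9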
Iterators and Generators
Matrix and Vector computation
    1. The 'loop invariant' example keeps coming up; isn't that something the compiler already optimizes?
    2. $ perf stat -e cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,\
      cache-references,cache-misses,branches,branch-misses,task-clock,faults,\
      minor-faults,cs,migrations -r 3 python diffusion_python_memory.py
    3. numpy
      1. np.roll([[1,2,3],[4,5,6]], 1, axis=1)
      2. ? Can Cython optimize data structures, or does it only speed up code?
      3. in-place operations, such as +=, *=
        1. => numexpr (see the sketch at the end of this section)
          1. from numexpr import evaluate
          2. evaluate("next_grid*D*dt+grid", out=next_grid)
      4. ? Creating our own roll function
    4. scipy
      1. from scipy.ndimage.filters import laplace
      2. laplace(grid, out, mode='wrap')
      3. page-faults suggests scipy allocates a lot of memory? instructions suggests the scipy function is too generic?
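    To make the in-place/numexpr item above concrete, a minimal sketch (the arrays here are random stand-ins; in the diffusion example next_grid would hold the Laplacian): plain NumPy allocates a full-size temporary per operator, while numexpr evaluates the whole expression in cache-sized chunks and writes straight into out:
      import numpy as np
      from numexpr import evaluate

      D, dt = 1.0, 0.1
      grid = np.random.random((640, 640))
      next_grid = np.random.random((640, 640))  # stand-in for the Laplacian of grid

      expected = next_grid * D * dt + grid                  # NumPy: builds temporaries
      evaluate("next_grid * D * dt + grid", out=next_grid)  # numexpr: chunked, in place
      assert np.allclose(expected, next_grid)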
Compiling to C
  1. Compile to C:
    1. Cython
        1. ZMQ also uses it?
        2. setup.py
          from distutils.core import setup
          from distutils.extension import Extension
          from Cython.Distutils import build_ext
          setup(cmdclass={'build_ext': build_ext},
                ext_modules=[Extension("calculate", ["cythonfn.pyx"])]
          )
        3. $ python setup.py build_ext --inplace
        4. Cython annotations: the more yellow a line, the more calls into the Python virtual machine
        5. Add type annotations (see the sketch at the end of this chapter's notes)
          1. cdef unsigned int i, n
        6. disable bounds checking: #cython: boundscheck=False (directive)
        7. buffer protocol?
          1. def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs): ...
        8. OpenMP
          1. prange
          2. -fopenmp (for gcc?)
          3. schedule="guided"
    2. Shed Skin: for non-numpy code
      1. shedskin --extmod test.py
      2. An extra 0.05s overhead: used to copy data from the Python environment
    3. Pythran
  2. Numba: specialized for NumPy, based on LLVM
    1. Use Continuum's Anaconda distribution
    2. from numba import jit
      1. @jit()
    3. Experimental GPU support is also available?
    4. #pythran export evolve(float64[][], float) (a Pythran annotation)
  3. VMs & JITs: PyPy
    1. GC behavior: whereas CPython uses reference counting, PyPy uses a modified mark-and-sweep (so objects may not be reclaimed promptly)
    2. Note that PyPy 2.3 runs as Python 2.7.3.
    3. STM: an attempt to remove the GIL
  4. Other tools: Theano, Parakeet, PyViennaCL, Nuitka, Pyston (Dropbox), PyCUDA (low-level code is not portable?)
  5. ctypes, cffi (from PyPy), f2py, native CPython extension modules
    1. $ f2py -c -m diffusion --fcompiler=gfortran --opt='-O3' diffusion.f90
  6. JIT versus AOT
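  Pulling the Cython notes together, a minimal cythonfn.pyx sketch in the shape of the book's Julia-set kernel (treat the loop body as illustrative), combining the boundscheck directive, cdef type annotations, and typed memoryviews:
    #cython: boundscheck=False
    def calculate_z(int maxiter, double complex[:] zs, double complex[:] cs):
        # typed locals keep the hot loop out of the Python virtual machine
        cdef unsigned int i, n
        cdef double complex z, c
        output = [0] * len(zs)
        for i in range(len(zs)):
            n = 0
            z = zs[i]
            c = cs[i]
            while n < maxiter and (z.real * z.real + z.imag * z.imag) < 4:
                z = z * z + c
                n += 1
            output[i] = n
        return output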
Concurrency
  1. Concurrency: avoid wasting time on I/O waits
  2. In Python, coroutines are implemented as generators.
  3. For Python 2.7 implementations of future-based concurrency, ...?
    1. gevent (suitable for mainly CPU-based problems that sometimes involve heavy I/O)
      1. gevent monkey-patches the standard I/O functions to be asynchronous
      2. Greenlet
        1. Wait
        2. The futures are created with gevent.spawn
        3. Control the number of simultaneously open resources: from gevent.coros import Semaphore (see the sketch after this chapter's notes)
          1. requests = [gevent.spawn(download, u, semaphore) for u in urls]
      3. import grequests?
      4. 69x speedup? Does that imply there was a corresponding amount of unnecessary I/O wait?
      5. The event loop may either underutilize or overutilize resources
    2. Tornado (by Facebook, suitable for mostly I/O-bound asynchronous applications)
      1. from tornado import ioloop, gen
      2. from functools import partial
      3. AsyncHTTPClient.configure("tornado.curl_httpclient.CurlAsyncHTTPClient", max_clients=100)
      4. @gen.coroutine
        1. ... responses = yield [http_client.fetch(url) for url in urls]  # creates Future objects?
        2. response_sum = sum(len(r.body) for r in responses)
        3. raise gen.Return(value=response_sum)
      5. _ioloop = ioloop.IOLoop.instance()
      6. run_func = partial(run_experiment, base_url, num_iter)
      7. result = _ioloop.run_sync(run_func)
      8. Disadvantage: tracebacks no longer hold valuable information
  4. In Python 3.4, new machinery was introduced to easily create coroutines and have them still return values
    1. asyncio
      1. yield from: raising an exception is no longer required in order to return results from a coroutine
      2. very low-level => import aiohttp
         @asyncio.coroutine
         def http_get(url):
             nonlocal semaphore
             with (yield from semaphore):
                 response = yield from aiohttp.request('GET', url)
                 body = yield from response.content.read()
                 yield from response.wait_for_close()
             return body
         return http_get

         tasks = [http_client(url) for url in urls]
         for future in asyncio.as_completed(tasks):
             data = yield from future

         loop = asyncio.get_event_loop()
         result = loop.run_until_complete(run_experiment(base_url, num_iter))
      3. Allows us to unify modules like Tornado and gevent by having them run in the same event loop
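  A sketch of the gevent semaphore pattern noted above (modern gevent puts Semaphore in gevent.lock rather than gevent.coros, and urllib.request stands in for the book's Python 2 urllib2):
    import gevent
    from gevent import monkey
    monkey.patch_socket()  # make blocking socket I/O yield to the event loop

    from gevent.lock import Semaphore
    from urllib.request import urlopen

    def download(url, semaphore):
        with semaphore:  # caps the number of simultaneously open connections
            with urlopen(url) as response:
                return response.read()

    def chunked_requests(urls, chunk_size=100):
        semaphore = Semaphore(chunk_size)
        requests = [gevent.spawn(download, u, semaphore) for u in urls]
        for request in gevent.iwait(requests):  # yields greenlets as they finish
            yield request.value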
Multiprocessing
  1. Process, Pool, Queue, Pipe, Manager, ctypes (for IPC?)
  2. In Python 3.2, the concurrent.futures module was introduced (via PEP 3148); see the sketch after this chapter's notes
  3. PyPy fully supports multiprocessing and runs faster
  4. from multiprocessing.dummy import Pool (the multi-threaded version?)
  5. hyperthreading can give up to a 30% performance gain if there are enough compute resources
  6. It is worth noting that the negatives of threads on CPU-bound problems are reasonably solved in Python 3.2+
  7. Using external queue implementations: Gearman, 0MQ, Celery (using RabbitMQ as the message broker), PyRes, SQS, or HotQueue
  8. manager = multiprocessing.Manager()
    value = manager.Value(b'c', FLAG_CLEAR)
  9. rds = redis.StrictRedis()
    rds[FLAG_NAME] = FLAG_SET
  10. value = multiprocessing.RawValue(b'c', FLAG_CLEAR)  # no synchronization mechanism?
  11. sh_mem = mmap.mmap(-1, 1)  # memory-map 1 byte as a flag
    sh_mem.seek(0)
    flag = sh_mem.read_byte()
  12. Using mmap as a Flag Redux (? a bit hard to follow, skipped)
  13. $ ps -A -o pid,size,vsize,cmd | grep np_shared
  14. lock = lockfile.FileLock(filename)
    lock.acquire() / lock.release()
  15. lock = multiprocessing.Lock()
    value = multiprocessing.Value('i', 0)
    lock.acquire()
    value.value += 1
    lock.release()
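  A sketch of the concurrent.futures interface mentioned in item 2 (the worker function is a made-up stand-in): ProcessPoolExecutor wraps a process pool behind the standard Executor API, and ThreadPoolExecutor offers the thread-based equivalent (cf. multiprocessing.dummy above):
    from concurrent import futures

    def work(x):
        return x * x  # stand-in for a CPU-bound task

    if __name__ == "__main__":
        with futures.ProcessPoolExecutor(max_workers=4) as pool:
            print(list(pool.map(work, range(10))))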
Clusters and Job Queues
  1. $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
    1. Did the version upgrade cause inconsistencies? But the API should be versioned ...
  2. Skype's 24-hour global outage
    1. Some versions of the Windows client didn't properly handle the delayed responses and crashed.
  3. To reliably start the cluster's components when the machine boots, we tend to use either a cron job, Circus or supervisord, or sometimes upstart (which is being replaced by systemd)
  4. Might want to introduce a random-killer tool like Netflix's ChaosMonkey
  5. Make sure it is cheap in time and money to deploy updates to the system
  6. Make sure you use a deployment system like Fabric, Salt, Chef, or Puppet
  7. Early warning: Pingdom and ServerDensity
  8. Status monitoring: Ganglia
  9. 3 Clustering Solutions
    1. Parallel Python
      1. ppservers = ("*",)  # set IP list to be autodiscovered
      2. job_server = pp.Server(ppservers=ppservers, ncpus=nbr_local_cpus)
      3. ... job = job_server.submit(calculate_pi, (input_args,), (), ("random",))
    2. IPython Parallel
      1. via ipcluster
      2. Schedulers hide the synchronous nature of the engines and provide an asynchronous interface
    3. NSQ (distributed messaging system, written in Go)
      1. Pub/sub: topics, channels, consumers
      2. writer = nsq.Writer(['127.0.0.1:4150', ])
      3. handler = partial(calculate_prime, writer=writer)
      4. reader = nsq.Reader(message_handler=handler, nsqd_tcp_addresses=['127.0.0.1:4150', ], topic='numbers', channel='worker_group_a',)
      5. nsq.run()
  10. Other cluster tools
Using less RAM
  1. IPython %memit
  2. Array module
  3. DAWG/DAFSA
  4. Marisa trie (static trie)
  5. datrie (needs an alphabet that contains all the keys?)
  6. HAT trie
  7. HTTP microservices (using flask): https://github.com/j4mie/postcodeserver/
  8. Probabilistic Data Structures
    1. HyperLogLog++ structure?
    2. Very approximate counting with a 1-byte Morris counter (see the first sketch after this list)
      1. stores 2^exponent, updated with the probabilistic rule: random(0,1) <= 2^-exponent
    3. K-Minimum Values / KMV (remember the k smallest hash values; assumes hash values are evenly distributed)
    4. Bloom Filters
      1. This method gives us no false negatives and a controllable rate of false positives
      2. ? Use 2 independent hashes to simulate any number of hash functions (see the second sketch after this list)
      3. very sensitive to initial capacity
      4. Scalable Bloom Filters: By chaining together multiple bloom filters ...
    5. LogLog counter
      bit_index = trailing_zeros(item_hash)
      if bit_index > self.counter:
          self.counter = bit_index
      1. variants: SuperLogLog, HyperLogLog
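    First sketch: the 1-byte Morris counter rule quoted above, storing only an exponent and estimating the count as 2^exponent:
      import random

      class MorrisCounter(object):
          def __init__(self):
              self.exponent = 0  # the only state, so ~1 byte

          def add(self):
              # increment with probability 2**-exponent
              if random.random() <= 2.0 ** -self.exponent:
                  self.exponent += 1

          def __len__(self):
              return 2 ** self.exponent  # estimated count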
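    Second sketch: the "2 hashes" note refers to the standard Kirsch-Mitzenmacher trick of deriving any number of Bloom filter hash functions from two base hashes, h_i(x) = h1(x) + i*h2(x) mod m. Salting Python's built-in hash() below is for illustration only; a real filter would use independent hash functions (e.g. mmh3):
      def bloom_indexes(item, num_hashes, num_bits):
          h1 = hash(item)
          h2 = hash(str(item) + "salt")  # illustrative second hash
          return [(h1 + i * h2) % num_bits for i in range(num_hashes)]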
Lessons from the Field
    1. Sentry is used to log and diagnose Python stack traces
    2. Aho-Corasick trie?
    3. We use Graphite with collectd and statsd to allow us to draw pretty graphs of what's going on
    4. Gunicorn is used as a WSGI server and its I/O loop is executed by Tornado
