Recently, while reading "Python Source Code Analysis", I came to understand Python's internal mechanisms more deeply than before, to the point of feeling I might have a chance at writing a small dynamic scripting language myself (hehe, just bragging, of course). The goal is not to create a dynamic language, though; there is only one purpose: to use Python better. I spent some time on the module import part and finally understand the import mechanism better, so I am writing it down here in case I forget.
Module search path
The module search path is stored in the sys.path list. If the default sys.path does not contain the path of your own module or package, you can add it dynamically with sys.path.append. The rules for populating sys.path on the Windows platform are as follows.
1. The first entry of sys.path is usually the directory containing the main module. In the interactive environment an empty entry is added instead, which stands for the current directory.
2. If the PYTHONPATH environment variable exists, sys.path includes the directories specified by that variable.
3. Python then tries to find the Python home. If the PYTHONHOME environment variable is set, its value is taken as the Python home; otherwise the Python home is inferred by locating lib/os.py, starting from the directory of python.exe.
If the Python home is found, the relevant subdirectories (Lib, plat-win, lib-tk, etc.) are added to sys.path relative to the Python home, and lib/site.py is imported (executed), which adds the site-specific directories and the packages under them.
If the Python home is not found, the entries under Software\Python\PythonCore\2.5\PythonPath in the registry are added to sys.path (HKLM and HKCU merged), but the related subdirectories are not added automatically.
4. If the Python home is not found, there is no PYTHONPATH environment variable, and no PythonPath can be found in the registry, then default relative paths are added (such as ./lib;./plat-win, etc.).
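Rule 2 can be checked with a small experiment. The sketch below, written for Python 3, starts a child interpreter with PYTHONPATH pointing at a temporary directory and asks it whether that directory ended up on sys.path. The helper name pythonpath_reaches_sys_path is made up for this illustration, and the check is platform-independent rather than specific to Windows.

```python
import os
import subprocess
import sys
import tempfile

# Child program: report whether the directory given as argv[1] is on sys.path.
CHILD_CODE = (
    "import os, sys\n"
    "target = os.path.realpath(sys.argv[1])\n"
    "print(target in [os.path.realpath(p) for p in sys.path])\n"
)


def pythonpath_reaches_sys_path(directory):
    """Hypothetical helper: run a child interpreter with PYTHONPATH=directory."""
    env = dict(os.environ)
    env["PYTHONPATH"] = directory
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, directory],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == "True"


extra_dir = tempfile.mkdtemp()  # stands in for a directory of your own modules
found = pythonpath_reaches_sys_path(extra_dir)
print(found)
```

Running the check in a child process matters here: PYTHONPATH is read when the interpreter starts, so changing it inside an already running process has no effect on that process's sys.path.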
In summary:
When python.exe is run from its installed home directory, the Python home is inferred first. If the Python home is found, the PythonPath in the registry is ignored; otherwise the registry PythonPath is added.
If the PYTHONPATH environment variable exists, sys.path always includes the directories specified by that variable.
If python.exe is in some other directory (for example, when Python is embedded into another program through COM), the Python home is not inferred, and the registry PythonPath is used instead.
If python.exe cannot find its home directory (PYTHONHOME) and the registry has no PythonPath, the default relative directories are added.
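Whatever the startup rules produce, sys.path is an ordinary list that can be inspected and extended at run time. A minimal sketch for Python 3 follows; add_search_dir is a hypothetical helper, and a temporary directory stands in for the directory holding your own modules.

```python
import os
import sys
import tempfile


def add_search_dir(path):
    """Hypothetical helper: append path to sys.path once, as an absolute path."""
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.append(path)
    return path


# A throwaway directory stands in for "the directory of my own modules".
my_modules_dir = add_search_dir(tempfile.mkdtemp())
add_search_dir(my_modules_dir)  # calling again does not add a duplicate

print("first entry:", repr(sys.path[0]))
print("entries for my directory:", sys.path.count(my_modules_dir))
```

Appending puts the directory at the end of the search order, so modules found earlier on sys.path still win; sys.path.insert(0, ...) would give the new directory priority instead.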
Standard import
All modules that Python has loaded into memory are stored in sys.modules. When a module is imported, Python first looks in this dictionary to see whether the module has already been loaded. If it has, Python simply adds the module's name to the local namespace of the module that executes the import. If it has not, Python searches the sys.path directories for the module file (which can be a .py, .pyc, or .pyd file), loads the module into memory once it is found, adds it to sys.modules, and binds its name in the current local namespace.
As you can see, a module is never loaded twice. Many different modules can bring the same module into their own local namespace with import, but there is only one PyModuleObject behind them all.
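The caching behaviour is easy to observe from Python itself. The following sketch, assuming Python 3 and using the standard json module purely as a convenient example, imports the same module three different ways and checks that every route leads to the one object held in sys.modules.

```python
import importlib
import sys

import json                 # first import: loads the module if it is not cached
import json as json_again   # second import: served from sys.modules

same_object = json is json_again
cached = sys.modules["json"] is json
via_importlib = importlib.import_module("json") is json

print(same_object, cached, via_importlib)
```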
One point that is easy to overlook: import can only import modules; it cannot import objects inside a module (classes, functions, variables, etc.). If module a (a.py) contains a function getName, another module cannot bring getName into its namespace with import a.getName; it can only use import a. To import only specific classes, functions, or variables, use from a import getName.
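The same restriction can be seen with the standard library. In this sketch (Python 3 assumed), os plays the role of module a and os.getcwd plays the role of getName; neither name comes from the text above.

```python
import os

# "import module.attribute" treats the last part as a submodule name,
# so it fails when that part is a function rather than a module.
try:
    import os.getcwd  # os.getcwd is a function, not a module
    import_stmt_failed = False
except ImportError:
    import_stmt_failed = True

# The from-import form binds the function itself in the local namespace.
from os import getcwd

print(import_stmt_failed)
print(getcwd is os.getcwd)
```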
Nested import
I distinguish two cases of nested import. The first: this module imports module a (import a), and an import statement inside a triggers another import, such as import b; module b can in turn import other modules, and so on.
This kind of nesting is easy to understand. Note that each module's local namespace is independent, so in the example above, after this module executes import a, it can refer only to the name a; the name b is not bound in its namespace. Although module b has already been loaded into memory, this module still has to execute import b explicitly before it can refer to b.
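A short experiment shows the difference between "loaded into memory" and "bound in this namespace". The sketch below assumes Python 3 and writes two throwaway modules to a temporary directory; the names nest_outer and nest_inner are invented for the illustration and stand for modules a and b.

```python
import importlib
import os
import sys
import tempfile

# Write two throwaway modules: nest_outer imports nest_inner.
tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "nest_inner.py"), "w") as fh:
    fh.write("VALUE = 42\n")
with open(os.path.join(tmp_dir, "nest_outer.py"), "w") as fh:
    fh.write("import nest_inner\n")

sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import nest_outer  # triggers the import of nest_inner as a side effect

inner_is_loaded = "nest_inner" in sys.modules       # loaded into memory
inner_is_bound_here = "nest_inner" in globals()     # but not named here

import nest_inner  # explicit import: a sys.modules lookup plus a name binding

print(inner_is_loaded, inner_is_bound_here, nest_inner.VALUE)
```

The explicit import at the end does not execute nest_inner.py a second time; it only binds the already cached module object to a local name.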
The second kind of nesting is when module a contains import b and module b contains import a. What happens then? Robertchen explained this in detail on the Python mailing list; the exchange is transcribed below:
[a.py]
from b import D
class C: pass
[b.py]
from a import C
class D: pass
Why can't D be loaded when a.py is executed?
If a.py is changed to use import b instead, it works.
What is going on here?
Robertchen: This has to do with Python's internal import mechanism. For from b import D, Python goes through the following steps internally:
- Look up the symbol "b" in sys.modules.
- If the symbol "b" exists, get the corresponding module object <module b>, then get the object for the symbol "D" from the __dict__ of <module b>. If "D" does not exist, raise an exception.
- If the symbol "b" does not exist, create a new module object <module b> and register it in sys.modules. Note that at this point the __dict__ of the new module object is empty. Then execute the statements in b.py to populate the __dict__ of <module b>, and get the object for "D" from the __dict__ of <module b>. If "D" does not exist, raise an exception.
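The steps above can be written out as a toy model. This is only a sketch of the logic as described, not CPython's actual implementation: module_cache stands in for sys.modules, the sources dictionary stands in for files on disk, and the names toy_from_import and toy_b are invented for the example.

```python
import types

module_cache = {}  # stands in for sys.modules in this toy model


def toy_from_import(module_name, attr_name, sources):
    """Toy model of "from module_name import attr_name".

    sources maps module names to source text and stands in for files on disk.
    """
    module = module_cache.get(module_name)
    if module is None:
        # Not loaded yet: create the module object and cache it *before*
        # running its code, so it holds no user-defined names at this point.
        module = types.ModuleType(module_name)
        module_cache[module_name] = module
        exec(sources[module_name], module.__dict__)
    try:
        return module.__dict__[attr_name]
    except KeyError:
        raise ImportError(
            "cannot import name %r from %r" % (attr_name, module_name)
        )


sources = {"toy_b": "class D:\n    pass\n"}
D = toy_from_import("toy_b", "D", sources)        # loads toy_b, then fetches D
D_again = toy_from_import("toy_b", "D", sources)  # served from the cache
print(D is D_again)
```

The detail that matters for circular imports is the order inside the if branch: the module is cached first and executed second, so any import that runs in between sees a module that exists but is not yet populated.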
So the order of execution for this example is as follows:
1. Execute from b import D in a.py
Because python a.py is what is being run, there is no <module b> in sys.modules yet, so a module object (<module b>) is first created for b.py. Note that the module object created at this point is empty; there is nothing in it. After creating the module object, Python parses and executes b.py in order to populate the __dict__ of <module b>.
2. Execute from a import C in b.py
While executing b.py, Python encounters this statement. It first checks whether <module a> already exists in the sys.modules module cache. Since <module a> is not in the cache yet (the script being run is registered as __main__, not as a), Python creates a module object (<module a>) for a.py and then, similarly, executes the statements in a.py.
3. Execute from b import D in a.py again
At this point the <module b> object created in step 1 is already in sys.modules, so Python gets <module b> directly. But notice that, as the whole process shows, <module b> is still an empty object with nothing in it, so fetching the symbol "D" from this module raises an exception. If the statement were only import b, no exception would be raised, because the symbol "b" already exists in sys.modules.
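The walkthrough can be reproduced by actually running python a.py. The sketch below, assuming Python 3, writes the two files from the example into a temporary directory and runs a.py in a child interpreter, once in its original form and once with import b. The helper run_script_a is hypothetical, and the exact wording of the error message differs between Python versions, so only the exception type is checked.

```python
import os
import subprocess
import sys
import tempfile


def run_script_a(a_source, b_source):
    """Write a.py and b.py to a fresh directory and run "python a.py" there."""
    work_dir = tempfile.mkdtemp()
    for name, text in (("a.py", a_source), ("b.py", b_source)):
        with open(os.path.join(work_dir, name), "w") as fh:
            fh.write(text)
    return subprocess.run(
        [sys.executable, "a.py"],
        cwd=work_dir, capture_output=True, text=True,
    )


B_SOURCE = "from a import C\nclass D: pass\n"

# Original a.py: "from b import D" needs D while <module b> is still empty.
failing = run_script_a("from b import D\nclass C: pass\n", B_SOURCE)
# Modified a.py: "import b" only needs the name b to exist in sys.modules.
working = run_script_a("import b\nclass C: pass\n", B_SOURCE)

print(failing.returncode != 0, "ImportError" in failing.stderr)
print(working.returncode == 0)
```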
The explanation above has been archived in the Woodpecker community by Zoom.quiet, together with a diagram, which you can refer to.
Package import
A package can be seen as a collection of modules: any folder that contains an __init__.py file can be treated as a package. A folder inside a package can itself be a package (a subpackage). Going further, several smaller packages can be aggregated into a larger one. This structure makes classes easier to manage and maintain, and makes the code more convenient for users. SQLAlchemy, for example, is released to users in the form of a package.
Packages and modules are actually very similar things. If you check the type of a package (import sqlalchemy, then type(sqlalchemy)), you will see that it is also <type 'module'>. The path searched when importing a package is also sys.path.
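The same check can be made without installing anything. In this sketch (Python 3 assumed, where the type prints as <class 'module'>), the standard library package json stands in for SQLAlchemy and the single-file module os is used for comparison.

```python
import json
import os
import types

# json is a package (a directory with __init__.py); os is a single-file module.
package_is_module = type(json) is types.ModuleType
module_is_module = type(os) is types.ModuleType

# What sets a package apart is its __path__ attribute, which is where
# Python looks for the package's submodules.
package_has_path = hasattr(json, "__path__")
module_has_path = hasattr(os, "__path__")

print(package_is_module, module_is_module)
print(package_has_path, module_has_path)
```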
Importing a package works in basically the same way as importing a module, except that what gets executed is the __init__.py in the package directory rather than the statements of a module file. In addition, if only the package itself is imported and its __init__.py performs no explicit initialization, the modules under the package are not imported automatically. For example:
PA
--__init__.py
--wave.py
--PB1
----__init__.py
----pb1_m.py
--PB2
----__init__.py
----pb2_m.py
All the __init__.py files are empty. Suppose we have the following program:
import sys
import PA.wave #1
import PA.PB1 #2
import PA.PB1.pb1_m as m1 #3
import PA.PB2.pb2_m #4
PA.wave.getName() #5
m1.getName() #6
PA.PB2.pb2_m.getName() #7
When #1 is executed, sys.modules contains both PA and PA.wave, and any class or function of PA.wave can be called. However, nothing in the modules under PA.PB1 or PA.PB2 can be called yet. The current local namespace contains the name PA.
When #2 is executed, only PA.PB1 is loaded into memory. sys.modules now contains the three modules PA, PA.wave, and PA.PB1, but the modules under PA.PB1 are not loaded automatically. Calling PA.PB1.pb1_m.getName() directly at this point fails, because PA.PB1 has no pb1_m yet. The current local namespace still contains only the name PA; there is no separate PA.PB1 name.
When #3 is executed, pb1_m under PA.PB1 is loaded into memory. sys.modules now contains the four modules PA, PA.wave, PA.PB1, and PA.PB1.pb1_m, and PA.PB1.pb1_m.getName() can be called. Because of the as clause, the current local namespace gains the name m1 as an alias for PA.PB1.pb1_m, in addition to PA.
When #4 is executed, PA.PB2 and PA.PB2.pb2_m are loaded into memory. sys.modules now contains the six modules PA, PA.wave, PA.PB1, PA.PB1.pb1_m, PA.PB2, and PA.PB2.pb2_m. The current local namespace still contains only PA and m1.
The statements #5, #6, and #7 that follow all run correctly.
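The whole walkthrough can be replayed against a real directory tree. The sketch below assumes Python 3 and builds the PA package in a temporary directory; since the text does not show what getName does, the body used here (returning the module's own name) is an assumption made for the demonstration, as is the helper loaded_pa_modules.

```python
import importlib
import os
import sys
import tempfile

# Assumed body for getName; the original modules are not shown in the text.
GETNAME_SOURCE = "def getName():\n    return __name__\n"

layout = {
    "PA/__init__.py": "",
    "PA/wave.py": GETNAME_SOURCE,
    "PA/PB1/__init__.py": "",
    "PA/PB1/pb1_m.py": GETNAME_SOURCE,
    "PA/PB2/__init__.py": "",
    "PA/PB2/pb2_m.py": GETNAME_SOURCE,
}

root = tempfile.mkdtemp()
for relative, text in layout.items():
    full_path = os.path.join(root, *relative.split("/"))
    os.makedirs(os.path.dirname(full_path), exist_ok=True)
    with open(full_path, "w") as fh:
        fh.write(text)

sys.path.insert(0, root)
importlib.invalidate_caches()


def loaded_pa_modules():
    """Hypothetical helper: names in sys.modules that belong to the PA package."""
    return sorted(n for n in sys.modules if n == "PA" or n.startswith("PA."))


import PA.wave                  #1
after_1 = loaded_pa_modules()
import PA.PB1                   #2
after_2 = loaded_pa_modules()
pb1_m_visible_after_2 = hasattr(PA.PB1, "pb1_m")
import PA.PB1.pb1_m as m1       #3
after_3 = loaded_pa_modules()
import PA.PB2.pb2_m             #4
after_4 = loaded_pa_modules()

results = (
    PA.wave.getName(),          #5
    m1.getName(),               #6
    PA.PB2.pb2_m.getName(),     #7
)
print(after_1, after_2, after_3, after_4, sep="\n")
print(pb1_m_visible_after_2, results)
```

Recording sys.modules after each statement makes it possible to compare the module counts (two, three, four, six) directly with the description.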
Note: if PA.PB2.pb2_m wants to import PA.PB1.pb1_m or PA.wave, it can do so directly and the import will succeed. It is best to use explicit import paths; relative import paths of the ./.. kind are not recommended.
Python Import mechanism