Python 3 Features Introduction
Python is the mainstream language in machine learning and other sciences, and we often need to use it to process large amounts of data. Python is compatible with a variety of deep learning frameworks and has a number of great tools to perform data preprocessing and visualization.
Better path handling with pathlib
pathlib is a standard-library module in Python 3 that helps you avoid long chains of os.path.join calls.
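A minimal sketch of the difference. The directory and file names are made up for illustration and nothing is read from disk:

```python
import os.path
from pathlib import Path

# Hypothetical dataset layout, used only for illustration.
dataset = 'wiki_images'
datasets_root = Path('/path/to/datasets')

# Old style: everything goes through os.path.join.
old_style = os.path.join('/path/to/datasets', dataset, 'train', 'labels.csv')

# pathlib: the / operator joins path components.
train_path = datasets_root / dataset / 'train'
labels_file = train_path / 'labels.csv'

print(labels_file.name)    # file name with extension
print(labels_file.suffix)  # extension only
print(labels_file.parent)  # containing directory
```

Path objects are accepted by open() and by most of the standard library, so they can usually replace string paths directly.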
[Figure: example of type hinting in PyCharm]
Python is no longer just a language for small scripts: data pipelines today include numerous steps, each of which may involve different frameworks (and sometimes very different logic).
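To make the discussion concrete, here is a minimal sketch of a short, unannotated helper of the kind that shows up in such pipelines. It assumes NumPy is installed; the function name repeat_each_entry and the sample data are illustrative.

```python
import numpy as np

def repeat_each_entry(data):
    """Return a copy of data in which every entry appears twice in a row."""
    # Build the index [0, 0, 1, 1, 2, 2, ...] and use it for fancy indexing.
    index = np.repeat(np.arange(len(data)), 2)
    return data[index]

doubled = repeat_each_entry(np.array([10, 20, 30]))
```

Nothing in the signature says what kind of object data is supposed to be, which is exactly the problem.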
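The code above applies to numpy.array (including multidimensional arrays), astropy.Table and astropy.Column, bcolz, cupy, mxnet.ndarray, and so on.
The code also runs on a pandas.Series, but in the wrong way: when the Series has an integer index, the integers used for indexing are treated as labels rather than positions.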
That was just two lines of code. Imagine how unpredictable the behavior of a complex system can be when a single function may misbehave like this. Stating explicitly which types a method expects is very helpful in large systems, because tools can then warn you when a function is passed unexpected arguments.
If you have a sizable code base, type checkers such as MyPy are likely to become part of your continuous integration pipeline. Unfortunately, hinting is not yet powerful enough to provide fine-grained types for ndarrays/tensors, but perhaps we will have that soon, and it will be a great feature for data science.
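A minimal sketch of what a hinted function looks like, using only the standard typing module. The helper is a hypothetical pure-Python function chosen for illustration; hints are not checked at run time, so the check itself would come from a tool such as mypy.

```python
from typing import List, Sequence, TypeVar

T = TypeVar('T')

def repeat_each_entry(data: Sequence[T]) -> List[T]:
    """Return a list in which every entry of data appears twice in a row."""
    result: List[T] = []
    for item in data:
        result.extend([item, item])
    return result

doubled = repeat_each_entry([1, 2, 3])

# A static checker would reject a call such as repeat_each_entry(42),
# because an int is not a Sequence. The interpreter does not enforce hints.
```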
Other uses of function annotations
As mentioned earlier, annotations do not affect code execution; they only provide meta-information that you can use however you like.
For example, units of measurement are a common pain point in the scientific community, and the Astropy package provides a simple decorator to check the units of input quantities and convert the output to the required unit.
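The idea can be sketched with the standard library alone. The decorator below is a toy written for this example, not Astropy's API: the name convert_lengths, the (value, unit) calling convention, and the small table of units are all assumptions.

```python
import functools
import inspect

# Conversion factors to metres for a few length units (illustrative subset).
TO_METRES = {'m': 1.0, 'km': 1000.0, 'cm': 0.01}

def convert_lengths(func):
    """Toy decorator: read each parameter's annotation as a target unit.

    Callers pass (value, unit) pairs; the wrapped function receives plain
    numbers expressed in the unit named by the parameter's annotation.
    """
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, (value, unit) in list(bound.arguments.items()):
            target = signature.parameters[name].annotation
            bound.arguments[name] = value * TO_METRES[unit] / TO_METRES[target]
        return func(*bound.args, **bound.kwargs)

    return wrapper

@convert_lengths
def rectangle_area(width: 'm', height: 'm') -> 'm2':
    return width * height

# 2 km by 50 cm, converted to metres before the multiplication.
area = rectangle_area((2, 'km'), (50, 'cm'))
```

The annotations stay available as rectangle_area.__annotations__, so other tools can read the same metadata.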
Python 3 uses the @ operator for matrix multiplication. It is more readable and makes code easier to port between deep learning frameworks, because the same expression, such as X @ W + b[None, :], describes a single-layer perceptron in NumPy, CuPy, PyTorch, and TensorFlow alike.
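The operator works on any object that defines __matmul__, which is how each library plugs in its own implementation. A minimal sketch with a toy Matrix class (written for this example, with no shape checks or broadcasting; real code would use one of the libraries above):

```python
class Matrix:
    """Toy dense matrix backed by nested lists, just to show the @ protocol."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    def __matmul__(self, other):
        # Python calls this method for the expression `self @ other`.
        columns = list(zip(*other.rows))
        return Matrix(
            [[sum(a * b for a, b in zip(row, col)) for col in columns]
             for row in self.rows]
        )

    def __add__(self, other):
        # Element-wise addition of two matrices of the same shape.
        return Matrix(
            [[a + b for a, b in zip(r1, r2)]
             for r1, r2 in zip(self.rows, other.rows)]
        )

X = Matrix([[1, 2], [3, 4]])      # two samples, two features
W = Matrix([[1, 0], [0, 1]])      # identity weights
b = Matrix([[10, 20], [10, 20]])  # bias, already repeated for each sample

output = X @ W + b                # @ binds tighter than +
```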
Globbing with **
Recursive folder globbing is not convenient in Python 2, which is why the custom glob2 module exists. The recursive flag has been supported in the standard library since Python 3.5.
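A minimal sketch of recursive globbing. The example builds a throwaway directory tree so that it can run anywhere; the file names are illustrative.

```python
import glob
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a small nested tree: root/a.jpg and root/sub/deeper/b.jpg
    nested = Path(root) / 'sub' / 'deeper'
    nested.mkdir(parents=True)
    (Path(root) / 'a.jpg').touch()
    (nested / 'b.jpg').touch()

    # glob: ** matches any number of nested directories when recursive=True.
    pattern = str(Path(root) / '**' / '*.jpg')
    found_with_glob = sorted(
        Path(p).name for p in glob.glob(pattern, recursive=True)
    )

    # pathlib offers the same through Path.glob with a ** pattern.
    found_with_pathlib = sorted(p.name for p in Path(root).glob('**/*.jpg'))
```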
In Jupyter, it is desirable to log each output to a separate file so that you can trace what happened when something goes wrong. Because print is an ordinary function in Python 3, you can now override it.
A context manager can be used to temporarily override the behavior of the print function.
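A minimal sketch of such a context manager. The helper names replace_print and record are chosen for this example, and the replacement simply stores messages in a list where a real setup might write to a file:

```python
import builtins
import contextlib

@contextlib.contextmanager
def replace_print(new_print):
    """Temporarily swap builtins.print for new_print inside a with block."""
    original_print = builtins.print
    builtins.print = new_print
    try:
        yield
    finally:
        # Restore the real print even if the block raises.
        builtins.print = original_print

captured = []

def record(*args, **kwargs):
    # Stand-in for "write to a separate file": keep the message in a list.
    captured.append(' '.join(str(arg) for arg in args))

with replace_print(record):
    print('epoch', 1, 'done')

print('printing works normally again')
```

Overriding a builtin affects every module in the process, so this is best kept to short, clearly delimited blocks.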
f-strings for simple and reliable formatting
The default formatting system provides flexibility that data experiments do not need, and the resulting code is either too verbose or too fragile when anything changes. Data scientists typically output logging information iteratively in a fixed format, and such code usually has to repeat every variable name several times.
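A minimal sketch comparing the two styles on a typical log line. The variable names and values are made up for illustration:

```python
batch, epoch, total_epochs = 7, 3, 10
acc_mean, acc_std, avg_time = 0.91234, 0.01512, 1.23456

# str.format: every value has to be named again in the call.
old_line = (
    '{batch:3} {epoch:3} / {total_epochs:3}  '
    'accuracy: {acc_mean:0.4f} +/- {acc_std:0.4f}  time: {avg_time:3.2f}'
).format(
    batch=batch, epoch=epoch, total_epochs=total_epochs,
    acc_mean=acc_mean, acc_std=acc_std, avg_time=avg_time,
)

# f-string (Python 3.6+): values are picked up from the enclosing scope.
new_line = (
    f'{batch:3} {epoch:3} / {total_epochs:3}  '
    f'accuracy: {acc_mean:0.4f} +/- {acc_std:0.4f}  time: {avg_time:3.2f}'
)

print(new_line)
```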
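A clear difference between "true division" and "integer division"
In Python 3, / always performs true division and // performs integer (floor) division. This change is definitely convenient for data science (though I suspect it is less so for system programming).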
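A short illustration with made-up numbers:

```python
total_seconds = 7
count = 2

true_quotient = total_seconds / count    # always a float in Python 3
floor_quotient = total_seconds // count  # floor division, stays an int
negative_floor = -7 // 2                 # floors toward negative infinity
```

Here true_quotient is 3.5, floor_quotient is 3, and negative_floor is -4.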
Strict ordering
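In Python 3, ordering comparisons between unrelated types raise TypeError instead of returning an arbitrary result. A minimal sketch; the helper can_compare exists only for this example:

```python
def can_compare(left, right):
    """Return True if `left < right` is a defined comparison in Python 3."""
    try:
        left < right
    except TypeError:
        return False
    return True

mixed_ok = can_compare(3, '3')               # int vs str
tuple_ok = can_compare((3, 4), (3, None))    # falls through to 4 < None
same_type_ok = can_compare(2, 10)            # int vs int
```

Here mixed_ok and tuple_ok are False while same_type_ok is True. The same rule makes sorted() fail loudly on a list that mixes numbers and strings.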
Output (string length, then the printed string) for a two-character non-ASCII string:
- Python 2: 6, then unreadable bytes
- Python 3: 2, then the string itself
Output for counting the characters of the word Möbelstück:
- Python 2: Counter({'\xc3': 2, 'b': 1, 'e': 1, 'c': 1, 'k': 1, 'M': 1, 'l': 1, 's': 1, 't': 1, '\xb6': 1, '\xbc': 1})
- Python 3: Counter({'M': 1, 'ö': 1, 'b': 1, 'e': 1, 'l': 1, 's': 1, 't': 1, 'ü': 1, 'c': 1, 'k': 1})
You can handle all of this correctly in Python 2, but Python 3 is friendlier.
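A minimal sketch of the Python 3 behavior. The sample strings are chosen for illustration:

```python
from collections import Counter

greeting = '您好'
word = 'Möbelstück'

greeting_length = len(greeting)              # counts characters, not bytes
utf8_length = len(greeting.encode('utf-8'))  # size of the UTF-8 byte string
letter_counts = Counter(word)                # one entry per character
```

Strings are sequences of Unicode characters in Python 3, so len() and Counter see ö and ü as single letters rather than as pairs of bytes.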
Preserving the order of dictionaries and **kwargs
In CPython 3.6+, dict behaves like OrderedDict by default (and this is guaranteed by the language in Python 3.7+). Order is preserved in dict comprehensions (and in other operations such as JSON serialization/deserialization).
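A minimal sketch of both guarantees. The function keyword_names and the layer-like names are illustrative:

```python
import json

def keyword_names(**kwargs):
    """Return keyword argument names in the order the caller wrote them."""
    return list(kwargs)

names = keyword_names(conv1=1, relu1=2, conv2=3, relu2=4)

# Insertion order survives a dict comprehension and a JSON round trip.
lengths = {name: len(name) for name in ['gamma', 'alpha', 'beta']}
round_trip = json.loads(json.dumps(lengths))
```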
Did you notice? When items are passed as keyword arguments, the uniqueness of their names is also checked automatically.
Iterable unpacking
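A starred name on the left-hand side of an assignment collects whatever the other names do not take. A minimal sketch with made-up values; iter_scores is a hypothetical generator standing in for a training loop:

```python
values_history = [0.71, 0.78, 0.83, 0.86, 0.88]

# Take the first value and collect everything else into a list.
first, *rest = values_history

# Pick the last two values; the starred name absorbs the prefix.
*previous, next_to_last, last = values_history

def iter_scores():
    """Hypothetical generator standing in for a training loop."""
    yield from values_history

# The same syntax works for any iterable, including generators.
*_, final_score = iter_scores()
```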
Pickling with the Python 3 default protocol takes about three times less space, and it is faster. In fact, similar compression (but not speed) can be achieved in Python 2 with the protocol=2 parameter, but users usually ignore this option (or are simply not aware of it).
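The size difference between protocols can be checked directly. A minimal sketch on a list of random floats; the exact ratio depends on the data, so no particular figure is implied:

```python
import pickle
import random

random.seed(0)
values = [random.random() for _ in range(10000)]

text_protocol = pickle.dumps(values, protocol=0)    # the Python 2 default
binary_protocol = pickle.dumps(values, protocol=2)  # opt-in binary format in Python 2
default_protocol = pickle.dumps(values)             # whatever this Python 3 defaults to

sizes = {
    'protocol 0': len(text_protocol),
    'protocol 2': len(binary_protocol),
    'Python 3 default': len(default_protocol),
}
restored = pickle.loads(default_protocol)
print(sizes)
```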
Safer comprehensions
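In Python 3 the loop variable of a comprehension no longer leaks into the surrounding scope. A minimal sketch with illustrative names:

```python
labels = ['cat', 'dog', 'bird']
label = 'unchanged'

# The loop variable of a comprehension lives in its own scope in Python 3.
lengths = [len(label) for label in labels]

# A plain for loop, by contrast, still rebinds the outer name.
for item in labels:
    pass
```

After this runs, label still holds 'unchanged', while item holds the last element of the list.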
For more information on super() and the method resolution order, see Stack Overflow: https://stackoverflow.com/questions/576169/understanding-python-super-with-init-methods
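As a reminder of what Python 3 simplifies here, super() can be called without arguments. A minimal sketch with hypothetical classes:

```python
class Base:
    def __init__(self, name):
        self.name = name

class Model(Base):
    def __init__(self, name, depth):
        # Python 3: no need to repeat the class name and self.
        super().__init__(name)
        self.depth = depth

model = Model('tree', depth=3)

# The method resolution order that super() follows.
resolution_order = [cls.__name__ for cls in Model.__mro__]
```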
Better IDE suggestions with variable annotations
The most enjoyable thing about programming in languages such as Java and C# is that the IDE can make very good suggestions, because the type of every identifier is known before the code is executed. This is hard to achieve in Python, but annotations help you:
- Write down your expectations in a clear form
- Get good suggestions from the IDE
This is an example of PyCharm suggestions with variable annotations. They work even when the function you are using is not annotated (for example, because of backward compatibility).
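A minimal sketch of variable annotations. The function load_scores and its data are hypothetical:

```python
from typing import Dict, List

def load_scores():
    # Un-annotated helper, for example kept that way for backward compatibility.
    return {'train': [0.9, 0.92], 'test': [0.85]}

# The annotation tells the IDE (and the reader) what to expect.
scores: Dict[str, List[float]] = load_scores()

best_test: float = max(scores['test'])
```

With the annotation in place, an IDE can offer dict methods on scores and list methods on scores['test'], even though load_scores itself says nothing about its return type.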
Obviously, the author of code that passes every SVC argument by position is not yet familiar with Python coding style (most likely they have just moved to Python from C++ or Rust). Unfortunately, this is not just a matter of taste, because changing the order of parameters in SVC (or adding/deleting one) breaks such code. In particular, scikit-learn reorders or renames parameters of its algorithms from time to time to provide a consistent API. Each such refactoring can break the code.
In Python 3, library authors can require explicitly named parameters by using *.
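A minimal sketch of keyword-only parameters. ToySVC is a hypothetical stand-in written for this example; it is not scikit-learn's class, and its defaults are illustrative:

```python
class ToySVC:
    """Hypothetical stand-in for an estimator with keyword-only parameters."""

    def __init__(self, *, C=1.0, kernel='rbf', degree=3):
        # Everything after the bare * must be passed by keyword.
        self.C = C
        self.kernel = kernel
        self.degree = degree

model = ToySVC(C=2, kernel='poly', degree=2)

try:
    ToySVC(2, 'poly', 2)
    positional_allowed = True
except TypeError:
    # Positional arguments are rejected at call time.
    positional_allowed = False
```

Because callers are forced to name each argument, the library can later reorder or insert parameters without silently changing what existing calls mean.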
Minor: single integer type
Python 2 provides two basic integer types: int (a machine-sized signed integer, typically 64-bit) and long for arbitrary-precision arithmetic (quite confusing for those coming from C++).
Python 3 has a single int type, which also covers long arithmetic.
Here is how to check whether a value is an integer.
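A minimal sketch of the check in Python 3, with an illustrative value:

```python
import numbers

big = 2 ** 100  # would have been a `long` in Python 2

is_int = isinstance(big, int)                    # Python 3: one built-in integer type
is_integral = isinstance(big, numbers.Integral)  # abstract check, also valid in Python 2
float_is_int = isinstance(3.0, int)              # a float is not an int, even if whole
```

In Python 2 the concrete check needed both types, as in isinstance(x, (int, long)).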
Data science-specific code migration issues (and how to resolve them)
- Support for nested arguments (tuple unpacking in function signatures) was dropped.
- map(), .keys(), .values(), .items(), etc. return iterators or views instead of lists. The main problems are that trivial slicing is not available and that an iterator, such as the result of map(), cannot be iterated twice. Converting the result to a list solves almost all of these problems.
- See the Python FAQ: How do I port to Python 3? (https://eev.ee/blog/2016/07/31/python-faq-how-do-i-port-to-python-3/)
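The first two items can be sketched as follows. The data is made up for illustration:

```python
pairs = {'a': (1, 2), 'b': (3, 4)}

# Python 2 allowed nested parameters such as `lambda (key, (x, y)): ...`.
# In Python 3, unpack in a comprehension (or inside the function body) instead.
sums = [key + str(x + y) for key, (x, y) in pairs.items()]

doubled = map(lambda v: v * 2, [1, 2, 3])
first_pass = list(doubled)
second_pass = list(doubled)   # the iterator is already exhausted

# Materialising once gives slicing and repeated iteration back.
keys = list(pairs.keys())
first_key = keys[0]
```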
Main issues when teaching machine learning and data science with Python
- Course authors should first take the time to explain what an iterator is and why it cannot be sliced/concatenated/multiplied/iterated twice like a string (and how to deal with that).
- I'm sure most course authors would be happy to avoid these details, but now that is almost impossible.
Conclusion
Python 2 and Python 3 have coexisted for almost 10 years, and today we have to say: it is time to move to Python 3.
Research and production code should become shorter, more readable, and significantly safer after moving to a Python 3 code base.
Most libraries now support both 2.x and 3.x. But we should not wait until popular packages start dropping Python 2 support before taking action; start enjoying the new language features now.
After the migration, I'm sure things will be smoother: "we will never do this kind of backwards-incompatible change again" (https://snarky.ca/why-python-3-exists/).