File Operations, Path Operations, StringIO and BytesIO, Serialization and Deserialization, and Regular Expressions in Python

Last Update: 2018-09-09 Source: Internet

Author: User

Tags object serialization parent directory


File operations

The open function
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Opens a file and returns a file object (a stream object) backed by an operating-system file descriptor. If the file cannot be opened, an exception (OSError or a subclass such as FileNotFoundError) is raised.
Basic use: create a file named test, open it, read its contents, and close it.
The most common file operations are reading and writing. Files can be accessed in two modes, text mode and binary mode; the same functions behave differently, and produce different results, depending on the mode.
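A minimal sketch of the basic workflow. It assumes a temporary directory (from tempfile.mkdtemp) stands in for the current path so that no real files are touched; the file name test comes from the text above.

```python
import os
import tempfile

# Use a throwaway directory instead of the current path.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "test")

# Create the file "test" and write one line of text to it.
f = open(path, "w")
f.write("hello\n")
f.close()

# Open it again with the default mode ("rt") and read everything back.
f = open(path)
content = f.read()
f.close()
print(content, end="")  # hello
```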
Parameters of open
file
The name of the file to open or create. If no directory is given, the path is relative to the current working directory.
mode
By default, open uses read-only mode r, which opens a file that already exists.
r: open for reading only. Calling write raises an exception (io.UnsupportedOperation). If the file does not exist, FileNotFoundError is raised.
w: open for writing only. Calling read raises an exception. If the file does not exist, it is created; if it exists, its contents are erased.
x: create a new file and open it for writing only. If the file already exists, FileExistsError is raised.
a: open for writing only and append to the end. If the file does not exist, it is created first.
r is read-only, while w, x, and a are write-only. All of w, x, and a can create new files: w always produces a file with fresh contents whether or not it existed; a appends to the end of the opened file; x requires that the file not exist beforehand and creates it.
t: text mode, a character stream. The bytes of the file are decoded with a character encoding and handled as characters. The default mode of open is rt.
b: binary mode, a byte stream. The file is treated as raw bytes regardless of character encoding; reads and writes use the bytes type.
+: adds the missing read or write capability to r, w, a, or x, but the resulting file object keeps the other characteristics of r, w, a, or x (for example, w+ still erases the file). + cannot be used alone; think of it as a modifier on the preceding mode character.
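The sketch below exercises each mode on a scratch file so the rules above can be checked directly. The temporary directory and the file names modes.txt and missing.txt are illustrative assumptions.

```python
import io
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes.txt")

# "x": create a new file for writing; a second "x" open fails.
with open(path, "x") as f:
    f.write("first\n")
x_failed = False
try:
    open(path, "x")
except FileExistsError:
    x_failed = True

# "a": keep the old content and append at the end.
with open(path, "a") as f:
    f.write("second\n")

# "r+": read and write; like "r", the file must exist and is not erased.
with open(path, "r+") as f:
    before = f.read()

# "r": writing through a read-only file object is not supported.
write_failed = False
try:
    with open(path) as f:
        f.write("oops")
except io.UnsupportedOperation:
    write_failed = True

# "r" on a missing file raises FileNotFoundError.
missing_failed = False
try:
    open(os.path.join(workdir, "missing.txt"))
except FileNotFoundError:
    missing_failed = True

# "w": erase the file and start over.
with open(path, "w") as f:
    f.write("fresh\n")

# "rb": binary mode returns bytes instead of str
# (on Windows, text-mode writes turn "\n" into "\r\n").
with open(path, "rb") as f:
    raw = f.read()
```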
File pointer
The modes described above already imply that an open file has a pointer.
The file pointer points to the current byte position.
With mode='r' the pointer starts at 0; with mode='a' it starts at EOF.
tell() returns the current position of the pointer.
seek(offset[, whence]) moves the file pointer. offset is the number of bytes to move, and whence is the position to move from.

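In text mode, whence=0 (the default) means from the start of the file, and offset must be a non-negative integer; whence=1 means from the current position, and offset only accepts 0; whence=2 means from EOF, and offset only accepts 0.
In binary mode, whence=1 and whence=2 accept any integer offset, including negative ones.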
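A small sketch of tell() and seek() in both modes. The scratch file seek.bin and its six-byte content are assumptions made for the example.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")
with open(path, "wb") as f:
    f.write(b"abcdef")

# Binary mode: whence=1 (current) and whence=2 (EOF) accept any offset.
with open(path, "rb") as f:
    start = f.tell()      # 0 for mode "rb"
    f.seek(2)             # whence=0: 2 bytes from the start
    two = f.read(2)       # b"cd"
    f.seek(-1, 2)         # 1 byte before EOF
    last = f.read()       # b"f"; the pointer is now at 6
    f.seek(-3, 1)         # back 3 bytes from the current position
    mid = f.read(1)       # b"d"

# Text mode: whence=2 only accepts offset 0.
text_seek_failed = False
with open(path, "r") as f:
    f.seek(0, 2)          # jump to EOF
    end = f.tell()        # 6
    try:
        f.seek(-1, 2)
    except io.UnsupportedOperation:
        text_seek_failed = True

# Mode "a" starts the pointer at EOF.
with open(path, "a") as f:
    append_start = f.tell()  # 6
```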
buffering: the buffering policy
-1 (the default) means a buffer of the default size. In binary mode, the size is io.DEFAULT_BUFFER_SIZE, typically 4096 or 8192 bytes.
In text mode, if the file is a terminal device, line buffering is used; otherwise the binary-mode policy is used.
0 is only allowed in binary mode and turns buffering off.
1 is only meaningful in text mode and enables line buffering: the buffer is flushed whenever a newline is written.
A value greater than 1 specifies the buffer size.
Buffers
A buffer is a region of memory, usually a FIFO queue. Data is flushed to disk only when the buffer is full or a threshold is reached.
flush() writes the buffered data out (more precisely, hands it to the operating system; os.fsync() forces it onto the disk); close() calls flush() before closing.
1 In text mode, the default buffer size is normally used.
2 Binary mode works on bytes, so you can specify the buffer size.
3 In general, the default buffer size is a good choice; don't adjust it unless you clearly know what you need.
In everyday programming, when you know data must reach the disk, call flush() explicitly instead of waiting for an automatic flush or close().
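A sketch of explicit flushing and of unbuffered binary I/O, assuming a scratch file buffered.txt in a temporary directory. A second, independent file object is used to observe what has actually been handed to the operating system.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.txt")

writer = open(path, "w")        # default buffering (buffering=-1)
writer.write("line 1\n")
writer.flush()                  # hand the buffered text to the OS now
with open(path) as reader:      # an independent file object
    after_flush = reader.read()
writer.close()                  # close() would also have flushed

# buffering=0 (unbuffered) is rejected in text mode...
text_unbuffered_rejected = False
try:
    open(path, "w", buffering=0)
except ValueError:
    text_unbuffered_rejected = True

# ...but allowed in binary mode, where each write goes straight to the OS.
with open(path, "wb", buffering=0) as raw:
    raw.write(b"unbuffered\n")
with open(path, "rb") as f:
    final = f.read()
```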
Read
read(size=-1)
size is the number of characters (text mode) or bytes (binary mode) to read; a negative value or None reads until EOF.
Reading lines
readline(size=-1)
Reads the file one line at a time. If size is given, at most size characters or bytes of the line are read.
readlines(hint=-1)
Reads all remaining lines into a list. If hint is given, reading stops once the total size of the lines read so far exceeds hint.
Write
write(s) writes the string s to the file and returns the number of characters written. writelines(lines) writes a list of strings to the file; it does not add line separators.
Close
close() flushes and closes the file object.
Closing a file that is already closed has no effect.
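A short sketch of the read and write methods on a scratch file; the file name lines.txt and its three lines are assumptions for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")

with open(path, "w") as f:
    n = f.write("alpha\n")               # returns characters written: 6
    f.writelines(["beta\n", "gamma\n"])  # no separators are added for you

with open(path) as f:
    first_chars = f.read(3)              # "alp": size counts characters in text mode
    rest_of_line = f.readline()          # "ha\n": continues from the pointer
    partial = f.readline(2)              # "be": at most 2 characters of the next line
    remaining = f.readlines()            # ["ta\n", "gamma\n"]

with open(path) as f:
    hinted = f.readlines(3)              # ["alpha\n"]: 6 characters already exceed the hint
    everything = f.read()                # the rest: "beta\ngamma\n"

f.close()                                # already closed by with; closing again is a no-op
```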
Context management
1 Use the with ... as statement.
2 A with block does not create a new scope.
3 When the with block finishes executing, the file object is closed automatically, even if an exception occurred.
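A sketch of the three points above, assuming a scratch file ctx.txt: the file is closed after the block, a name bound inside the block is still visible afterwards, and an exception does not prevent closing.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ctx.txt")

with open(path, "w") as f:
    f.write("inside\n")
    message = "set inside the with block"  # no new scope: visible afterwards

print(f.closed)  # True: closed automatically when the block ended
print(message)

# The file is also closed when the block exits because of an exception.
try:
    with open(path) as g:
        raise RuntimeError("boom")
except RuntimeError:
    pass
print(g.closed)  # True
```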
StringIO
A class in the io module
from io import StringIO
Opens a text-mode buffer in memory that can be operated on like a file object
When close() is called, the buffer is freed
getvalue() returns the entire content, regardless of the file pointer position
Benefits
Disk operations are generally much slower than memory operations. When memory is sufficient, a common optimization is to write to disk less often; reducing disk I/O can greatly improve a program's efficiency.
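A minimal StringIO sketch showing that it behaves like a text file, that getvalue() ignores the pointer, and that the buffer is unusable after close().

```python
from io import StringIO

buf = StringIO()          # text buffer that lives in memory
buf.write("hello\n")
buf.write("world\n")
pos = buf.tell()          # 12: the pointer sits at the end after writing
leftover = buf.read()     # "": nothing left after the pointer
whole = buf.getvalue()    # "hello\nworld\n": ignores the pointer
buf.seek(0)               # rewind like a real file
first = buf.readline()    # "hello\n"

buf.close()               # frees the buffer
freed = False
try:
    buf.getvalue()
except ValueError:        # using a closed StringIO raises ValueError
    freed = True
```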
BytesIO
A class in the io module
from io import BytesIO
Opens a binary-mode buffer in memory that can be operated on like a file object
When close() is called, the buffer is freed
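A matching BytesIO sketch. The sample text (two Chinese characters, 3 bytes each in UTF-8) is an assumption chosen to show that positions count bytes, not characters.

```python
from io import BytesIO

bio = BytesIO()                   # binary buffer that lives in memory
bio.write("中文".encode("utf-8"))  # only bytes-like data can be written
bio.write(b"\x00\x01")
size = bio.tell()                 # 8: 6 bytes of UTF-8 plus 2 raw bytes

rejects_str = False
try:
    bio.write("text")             # a str is not accepted
except TypeError:
    rejects_str = True

bio.seek(0)
head = bio.read(3)                # the 3 UTF-8 bytes of the first character
data = bio.getvalue()             # all bytes, regardless of the pointer
bio.close()                       # frees the buffer
```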
File-like objects
Objects that can be operated on like file objects
Socket objects and the standard input and output objects (sys.stdin, sys.stdout) are file-like objects
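Because file-like objects share the same interface, code written against that interface accepts any of them. The helper count_lines below is hypothetical, written only for this sketch.

```python
import io

def count_lines(stream):
    """Hypothetical helper: count lines in any iterable text stream."""
    return sum(1 for _ in stream)

# An in-memory StringIO behaves like a file opened in text mode...
n_mem = count_lines(io.StringIO("a\nb\nc\n"))
# ...and so does a text wrapper around a binary buffer.
n_wrapped = count_lines(io.TextIOWrapper(io.BytesIO(b"x\ny\n"), encoding="utf-8"))
# A real file from open(...) or sys.stdin could be passed in the same way.
```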
Path operations
Path operation modules
Before Python 3.4
os.path module
Since Python 3.4
The pathlib module is recommended; it provides Path objects for working with both directories and files.
pathlib module
from pathlib import Path
Joining and splitting paths
Operators
Path object / Path object
Path object / string, or string / Path object
Splitting
The parts property returns each component of a path.
joinpath
joinpath(*other) joins one or more segments (strings or paths) onto the Path object.
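A sketch of joining and splitting. It uses PurePosixPath instead of Path so the printed output is the same on every operating system; on POSIX systems Path supports the same operators and methods.

```python
from pathlib import PurePosixPath

base = PurePosixPath("/etc")
p1 = base / "sysconfig"               # Path / str
p2 = "/usr" / PurePosixPath("local")  # str / Path
p3 = p1 / PurePosixPath("network")    # Path / Path
p4 = base.joinpath("init.d", "apache2")

print(p3)        # /etc/sysconfig/network
print(p3.parts)  # ('/', 'etc', 'sysconfig', 'network')
print(p2)        # /usr/local
print(p4)        # /etc/init.d/apache2
```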

Getting the path
str(p) returns the path as a string
bytes(p) returns the path string as bytes
Parent directory
parent: the logical parent directory
parents: the sequence of ancestor directories; index 0 is the direct parent
Path components
name, stem, suffix, suffixes, with_suffix(suffix), with_name(name)
name: the last component of the path
suffix: the extension of the last component
stem: the last component without its suffix
suffixes: a list of all extensions of the last component
with_suffix(suffix): returns a new path with the extension replaced, or added if there is none
with_name(name): returns a new path with the last component replaced
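A sketch of the component properties on a sample path. The path /home/user/archive.tar.gz is made up for the example, and PurePosixPath is again used for OS-independent output.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/archive.tar.gz")

print(str(p))                    # /home/user/archive.tar.gz
print(p.parent)                  # /home/user
print(list(p.parents))           # ancestors, nearest first
print(p.name, p.stem, p.suffix)  # archive.tar.gz archive.tar .gz
print(p.suffixes)                # ['.tar', '.gz']
print(p.with_suffix(".zip"))     # /home/user/archive.tar.zip (only the last suffix changes)
print(p.with_name("notes.txt"))  # /home/user/notes.txt
print(PurePosixPath("/tmp/readme").with_suffix(".md"))  # /tmp/readme.md (suffix added)
```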
Class methods
cwd() returns the current working directory
home() returns the current user's home directory
Checking methods
is_dir(): True if the path is an existing directory
is_file(): True if the path is an existing regular file
is_symlink(): whether it is a symbolic (soft) link
is_socket(): whether it is a socket file
is_block_device(): whether it is a block device
is_char_device(): whether it is a character device
is_absolute(): whether it is an absolute path
resolve(): returns a new path that is the absolute form of the current path, with any symbolic links resolved
absolute(): returns the absolute path without resolving symbolic links
exists(): whether the file or directory exists
rmdir(): deletes an empty directory; there is no method to check whether a directory is empty
touch(mode=0o666, exist_ok=True): creates a file
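A sketch of the checking methods, touch(), and rmdir() inside a temporary directory. mkdir() and unlink() are real Path methods not covered above; they are used here only to set up and clean up the example.

```python
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())   # mkdtemp returns an absolute path
d = root / "subdir"
d.mkdir()                         # setup only
f = d / "empty.txt"
f.touch()                         # create the file
f.touch()                         # exist_ok=True by default, so this is fine

checks = (d.is_dir(), f.is_file(), f.exists(), f.is_absolute())  # all True
relative = Path("subdir", "empty.txt").is_absolute()           # False
same = (d / ".." / "subdir").resolve() == d.resolve()           # True: ".." is collapsed

not_empty = False
try:
    d.rmdir()                     # fails: the directory still contains a file
except OSError:
    not_empty = True

f.unlink()                        # remove the file first
d.rmdir()                         # now the empty directory can be removed
gone = not d.exists()
```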
as_uri(): returns the path as a file URI
Wildcards
glob(pattern): yields the paths that match the given pattern
rglob(pattern): like glob, but searches subdirectories recursively
Both return generators
match(pattern)
Matches the path against a glob-style pattern; returns True on success
stat(): equivalent to the stat command
lstat(): like stat(), but if the path is a symbolic link, returns information about the link itself
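A sketch of as_uri(), the wildcard methods, match(), and stat(). The small directory tree (a.py, pkg/b.py, pkg/notes.txt) is invented for the example; write_text() is used only to give one file a known size.

```python
import tempfile
from pathlib import Path, PurePosixPath

root = Path(tempfile.mkdtemp())
(root / "a.py").touch()
(root / "pkg").mkdir()
(root / "pkg" / "b.py").touch()
(root / "pkg" / "notes.txt").write_text("hi")

uri = root.as_uri()                                     # 'file:///...'
top_level = sorted(p.name for p in root.glob("*.py"))   # ['a.py']
recursive = sorted(p.name for p in root.rglob("*.py"))  # ['a.py', 'b.py']

# match() compares relative patterns from the right end of the path.
m1 = PurePosixPath("/a/b/c.py").match("*.py")    # True
m2 = PurePosixPath("/a/b/c.py").match("b/*.py")  # True
m3 = PurePosixPath("/a/b/c.py").match("a/*.py")  # False

size = (root / "pkg" / "notes.txt").stat().st_size  # 2 bytes
```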
File operations
Path.open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)
Used like the built-in open function; returns a file object.
Added in Python 3.5
Path.read_bytes()
Opens the file in 'rb' mode and returns its contents as bytes (see the source code).
Path.read_text(encoding=None, errors=None)
Opens the file in 'rt' mode and returns its contents as text.
Path.write_bytes(data)
Writes data to the file in 'wb' mode.
Path.write_text(data, encoding=None, errors=None)
Writes the string to the file in 'wt' mode.
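A sketch of the convenience methods, assuming a scratch file data.txt. The non-ASCII sample "héllo" is chosen so that the character count and the byte count differ.

```python
import tempfile
from pathlib import Path

p = Path(tempfile.mkdtemp()) / "data.txt"

chars = p.write_text("héllo", encoding="utf-8")  # returns characters written: 5
text = p.read_text(encoding="utf-8")             # 'héllo'
raw = p.read_bytes()                             # b'h\xc3\xa9llo': 6 bytes
nbytes = p.write_bytes(b"\x00\xff")              # returns bytes written: 2
with p.open("rb") as f:                          # same as open(p, "rb")
    back = f.read()
```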
Serialization and deserialization
Definitions
serialization
Converts an object in memory into a sequence of bytes. -> binary
deserialization
Restores the bytes in a file to an object in memory. <- binary
Saving serialized data to a file is called persistence.
Data can be serialized and then persisted or sent over the network; conversely, a byte sequence received from a file or the network can be deserialized.
Python provides the pickle library.
pickle library
Python's serialization and deserialization module.
dumps: serializes an object to a bytes object
dump: serializes an object to a file object, i.e., stores it in a file
loads: deserializes an object from a bytes object
load: deserializes an object by reading the data from a file object
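A sketch of the four pickle functions. The sample record is made up, and a BytesIO stands in for a file opened in binary mode.

```python
import pickle
from io import BytesIO

record = {"name": "tom", "scores": [90, 85], "tags": ("a", "b")}

data = pickle.dumps(record)    # object -> bytes
restored = pickle.loads(data)  # bytes -> a new, equal object

buf = BytesIO()                # any binary file object works, e.g. open(..., "wb")
pickle.dump(record, buf)       # object -> file
buf.seek(0)                    # rewind before reading back
from_file = pickle.load(buf)   # file -> object

# Only unpickle data from trusted sources: loading can run arbitrary code.
```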
JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is based on a subset of ECMAScript (the language standard developed by Ecma International) and stores and represents data in a text format that is completely independent of any programming language.
JSON data types
Value
Strings in double quotes, numbers, true and false, null, objects, and arrays are all values.
String
Any sequence of characters enclosed in double quotes; it may contain escape sequences.
Number
Positive or negative; integer or floating-point.
Object
An unordered collection of key-value pairs
Format: {key1: value1, ..., keyN: valueN}
A key must be a string enclosed in double quotes.
A value can be any valid value.
Array
An ordered collection of values
Python supports converting a small number of built-in data types to JSON types:
Python type    JSON type
dict           object
list, tuple    array
str            string
int, float     number
True           true
False          false
None           null
Common methods
dumps: JSON encoding; returns a str
dump: JSON encoding and writing to a file
loads: JSON decoding of a str (or bytes)
load: JSON decoding of data read from a file
JSON-encoded data is seldom stored on disk; it is usually transmitted over the network, and you may consider compressing it during transmission.
It is essentially text, i.e., a string.
JSON is simple and supported by almost every programming language, so it is very widely used.
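A sketch of the json functions and the type conversions listed above. The sample object is made up; note how the tuple comes back as a list, because JSON only has arrays.

```python
import json
from io import StringIO

obj = {"name": "tom", "ok": True, "missing": None, "point": (1, 2)}

s = json.dumps(obj)       # Python object -> JSON text (a str)
print(s)                  # {"name": "tom", "ok": true, "missing": null, "point": [1, 2]}
back = json.loads(s)      # JSON text -> Python; the tuple comes back as a list

f = StringIO()            # stands in for a file opened with open(..., "w")
json.dump(obj, f)         # encode and write to the file object
f.seek(0)                 # rewind before reading it back
from_file = json.load(f)  # read from the file object and decode
```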
MessagePack
MessagePack is an efficient binary object serialization library that can be used for cross-language communication.
Like JSON, it can exchange structured objects among many languages.
But it is faster and more compact than JSON.
It supports Python, Ruby, Java, C++, and many other languages, and claims to be 4 times faster than Google Protocol Buffers.
Its Python package (msgpack, installed separately) offers an API compatible with json and pickle.
Common methods
packb: serializes an object to bytes. dumps is provided as an alias for compatibility with pickle and json.
unpackb: deserializes an object from bytes. loads is provided for compatibility.
pack: serializes an object and writes it to a file object. dump is provided for compatibility.
unpack: deserializes an object read from a file object. load is provided for compatibility.
Regular expressions

. matches any character except a newline
[abc] character set, matching one character position: any one of the listed characters
[^abc] negated character set, matching one character position: any character not in the set
[a-z] character range, matching one character position: any character in the range
[^a-z] negated character range, matching one character position: any character outside the range
\b matches a word boundary
\B matches a position that is not a word boundary
\d equivalent to [0-9]; matches one digit
\D equivalent to [^0-9]; matches one non-digit
\s matches one whitespace character, including newline, tab, and space; equivalent to [ \f\r\n\t\v]
\S matches one non-whitespace character
\w equivalent to [a-zA-Z0-9_]; in Python 3 str patterns it also matches other Unicode word characters, such as Chinese characters
\W matches one character that \w does not match
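A sketch that runs each character class above through re.findall. The sample strings are invented; the last \w line shows that Chinese characters count as word characters unless re.ASCII is given.

```python
import re

text = "cat hat bat 2018-09 Tab\tend 中文"

dot = re.findall(r"a.c", "abc a\nc")        # ['abc']: . skips the newline
in_set = re.findall(r"[ch]at", text)        # ['cat', 'hat']
not_in_set = re.findall(r"[^ch]at", text)   # ['bat']
lower_run = re.findall(r"[a-z]+", "Tab")    # ['ab']
not_lower = re.findall(r"[^a-z]", "Tab")    # ['T']
whole_word = re.findall(r"\bbat\b", text)   # ['bat']
inner_at = re.findall(r"\Bat", text)        # ['at', 'at', 'at']
digits = re.findall(r"\d+", text)           # ['2018', '09']
non_digits = re.findall(r"\D", "a1b2")      # ['a', 'b']
spaces = re.findall(r"\s", text)            # five spaces and one tab
no_space = re.findall(r"\S+", "a b\tc")     # ['a', 'b', 'c']
words = re.findall(r"\w+", text)            # includes '中文'
ascii_words = re.findall(r"\w+", "中文 ok", re.ASCII)  # ['ok']
non_words = re.findall(r"\W", "a-b c")      # ['-', ' ']
```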

To match a character that has a special meaning in regular expressions literally, escape it with \.
To match the backslash itself, use \\.
\r and \n are escapes that represent carriage return and newline.
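A sketch of escaping, using raw strings (r"...") so Python itself does not consume the backslashes. re.escape() is a real re function that escapes metacharacters for you.

```python
import re

literal = re.findall(r"1\.5\+2", "1.5+2 and 105+2")  # ['1.5+2']
backslashes = re.findall(r"\\", r"C:\temp\new")      # two single backslashes
escaped = re.escape("1.5+2")                         # '1\.5\+2'
crlf = re.findall(r"\r\n", "a\r\nb\nc")              # ['\r\n']
```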

Repetition

* the preceding expression repeats 0 or more times
+ the preceding expression repeats at least once
? the preceding expression repeats 0 or 1 times
{n} repeats exactly n times
{n,} repeats at least n times
{n,m} repeats n to m times
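A sketch of each quantifier with re.findall; the sample strings are made up for the example.

```python
import re

star = re.findall(r"ab*", "a ab abbb")          # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")          # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")      # ['a', 'ab', 'ab']
exactly = re.findall(r"\d{3}", "12 123 12345")  # ['123', '123']
at_least = re.findall(r"\d{2,}", "1 12 12345")  # ['12', '12345']
between = re.findall(r"\d{2,3}", "1 12 12345")  # ['12', '123', '45']
```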

Grouping (capturing) and assertions

x|y matches x or y
(pattern) groups and captures; groups are numbered automatically starting from 1. Grouping can change precedence. \number refers back to the text previously matched by the corresponding group
(?:pattern) only changes precedence; non-capturing group
(?<name>exp) or (?'name'exp) names the captured group; Python's syntax is (?P<name>exp)
(?=exp) zero-width positive lookahead assertion: exp must appear to the right of the match position
(?<=exp) zero-width positive lookbehind assertion: exp must appear to the left of the match position
(?!exp) zero-width negative lookahead assertion: exp must not appear to the right
(?<!exp) zero-width negative lookbehind assertion: exp must not appear to the left
(?#comment) comment
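A sketch of alternation, groups, backreferences, and the four lookaround assertions in Python syntax. All sample strings are invented; each lookaround checks context without consuming it.

```python
import re

alt = re.findall(r"cat|dog", "cat dog cow")                      # ['cat', 'dog']
dup = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)   # 'is': \1 repeats group 1
non_capturing = re.findall(r"(?:ab)+", "ababx ab")               # ['abab', 'ab']
year = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "2018-09-09").group("year")  # '2018'
ahead = re.findall(r"\d+(?=px)", "10px 20em 30px")               # ['10', '30']
not_ahead = re.findall(r"\d+(?!\d|px)", "10px 20em 30px")        # ['20']
behind = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")           # ['5', '12']
not_behind = re.findall(r"(?<!\$)\b\d+", "$5 and 7 and $12")     # ['7']
commented = re.findall(r"a(?#this is ignored)b", "ab")           # ['ab']
```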

Greedy and non-greedy
Matching is greedy by default: it matches as many characters as possible.

*? matches any number of times, repeating as few times as possible
+? matches at least once, repeating as few times as possible
?? matches 0 or 1 times, repeating as few times as possible
{n,}? matches at least n times, repeating as few times as possible
{n,m}? matches at least n and at most m times, repeating as few times as possible
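A sketch contrasting greedy and lazy quantifiers; the HTML-like string is only a convenient sample, not a recommendation to parse HTML with regular expressions.

```python
import re

html = "<b>bold</b><i>it</i>"
greedy = re.findall(r"<.+>", html)    # ['<b>bold</b><i>it</i>']: one big match
lazy = re.findall(r"<.+?>", html)     # ['<b>', '</b>', '<i>', '</i>']
star_lazy = re.search(r"a.*?c", "abcabc").group()      # 'abc'
opt_lazy = re.search(r"ab??", "abb").group()           # 'a'
min_lazy = re.search(r"\d{2,}?", "12345").group()      # '12'
range_greedy = re.search(r"\d{2,4}", "12345").group()  # '1234'
range_lazy = re.search(r"\d{2,4}?", "12345").group()   # '12'
```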

Engine options

IgnoreCase: case-insensitive matching; re.I or re.IGNORECASE
Singleline: single-line mode, in which . also matches \n; re.S or re.DOTALL
Multiline: multi-line mode; re.M or re.MULTILINE
IgnorePatternWhitespace: ignores whitespace in the pattern; to match whitespace, escape it; re.X or re.VERBOSE

Single-line mode:
. matches any character, including a newline.
^ denotes the start of the whole string and $ the end of the whole string.
Multi-line mode:
. matches any character except a newline.
^ denotes the start of a line and $ the end of a line.
^ still matches the start of the whole string and $ the end of the whole string; in addition, a line starts at the character right after \n, and a line ends at the character right before \n.
As you can see, single-line mode sees through newlines: the whole text is treated as one long line, so ^ is the start of that line and $ is its end.
Multi-line mode cannot see through newlines: ^ and $ still mean the start and end of a line, but they apply to each line.
Note: watch out for invisible line-break characters in the string. \r\n affects tests such as e$, because $ only matches right before \n: e$ matches e\n but not e\r\n.
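A sketch of the flags on a three-line sample string, including the \r\n pitfall from the note above.

```python
import re

text = "first line\nSecond line\nthird line"

starts = re.findall(r"^\w+", text)                  # ['first']
line_starts = re.findall(r"^\w+", text, re.M)       # ['first', 'Second', 'third']
ends = re.findall(r"line$", text)                   # ['line']: only the very end
line_ends = re.findall(r"line$", text, re.M)        # ['line', 'line', 'line']
no_dotall = re.search(r"line.Second", text)         # None: . skips \n
dotall = re.search(r"line.Second", text, re.S)      # matches across the newline
caseless = re.findall(r"^s\w+", text, re.M | re.I)  # ['Second']
verbose = re.fullmatch(r"""
    \d{4}   # year
    -
    \d{2}   # month
""", "2018-09", re.X)                               # whitespace and comments ignored
lf = re.search(r"e$", "line\n")                     # matches: $ is right before \n
crlf = re.search(r"e$", "line\r\n")                 # None: \r sits between e and \n
```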

Regular expressions in Python
re module
Compiling
re.compile(pattern, flags=0)
Returns a compiled regular expression object (regex).
The compiled result is kept, so the same pattern does not need to be recompiled the next time it is used.
Single match
regex.match(string[, pos[, endpos]])
Matches only at the beginning of the string, or at pos if given; pos and endpos limit the range. Returns a match object, or None.
regex.search(string[, pos[, endpos]])
Scans from the beginning until the first match; pos and endpos set the start and end positions. Returns a match object, or None.
regex.fullmatch(string[, pos[, endpos]])
The whole string (or the slice from pos to endpos) must match the regular expression.
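A sketch of compile(), match(), search(), and fullmatch() with the optional pos and endpos arguments; the sample string is made up.

```python
import re

regex = re.compile(r"\d+")       # compile once, then reuse the object
s = "abc 123 def 456"

at_start = regex.match(s)                # None: match() only tries at the start
at_pos = regex.match(s, 4).group()       # '123': matching starts at pos=4
first = regex.search(s).group()          # '123': first match anywhere
later = regex.search(s, 8).group()       # '456': search starts at pos=8
clipped = regex.search(s, 0, 6).group()  # '12': endpos=6 hides the rest
full_ok = regex.fullmatch("2018")        # a match object: the whole string is digits
full_bad = regex.fullmatch("2018a")      # None
```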
Full-text search
regex.findall(string[, pos[, endpos]])
Scans the whole string from left to right and returns a list of all non-overlapping matches.
regex.finditer(string[, pos[, endpos]])
Scans the whole string from left to right and returns an iterator over all matches; each item is a match object.
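A sketch of findall() and finditer(). One detail worth knowing: when the pattern contains groups, findall() returns the groups (as tuples) rather than the whole match, while finditer() always yields full match objects.

```python
import re

regex = re.compile(r"(\w+)@(\w+)\.com")
s = "tom@example.com, amy@test.com"

pairs = regex.findall(s)            # [('tom', 'example'), ('amy', 'test')]
plain = re.findall(r"\d", "a1b22")  # ['1', '2', '2']: no groups, so whole matches
spans = [(m.group(), m.span()) for m in regex.finditer(s)]
# [('tom@example.com', (0, 15)), ('amy@test.com', (17, 29))]
```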
Match and replace
regex.sub(replacement, string, count=0)
Matches the pattern against string and replaces the matches with replacement; count=0 replaces all matches.
replacement can be a string, bytes, or a function.
regex.subn(replacement, string, count=0)
Works like sub, but returns a tuple (new_string, number_of_subs_made).
Splitting strings
regex.split(string, maxsplit=0)
Returns a list.
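A sketch of sub(), subn(), and split(), including a function replacement, a group reference in the replacement string, and a bytes pattern; the inputs are invented.

```python
import re

regex = re.compile(r"\d+")
s = "a1b22c333"

all_replaced = regex.sub("#", s)                           # 'a#b#c#'
first_only = regex.sub("#", s, count=1)                    # 'a#b22c333'
with_count = regex.subn("#", s)                            # ('a#b#c#', 3)
by_function = regex.sub(lambda m: str(len(m.group())), s)  # 'a1b2c3'
by_group = re.compile(r"(\d+)").sub(r"<\1>", "a1b22")      # 'a<1>b<22>'
on_bytes = re.compile(rb"\d+").sub(b"#", b"x1y2")          # b'x#y#'
parts = re.compile(r"[,;\s]+").split("a, b;c  d")          # ['a', 'b', 'c', 'd']
one_split = regex.split(s, maxsplit=1)                     # ['a', 'b22c333']
```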
Groups
Data captured with (pattern) is stored in a group.
Match object methods
group(N)
1 to n return the corresponding groups; 0 returns the whole match.
If named groups are used, they can also be retrieved with group(name).
groups()
Returns a tuple of all groups.
groupdict()
Returns a dictionary of all named groups.
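A sketch of the match object methods on a date-like sample string, mixing two named groups with one unnamed group.

```python
import re

m = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})", "2018-09-09 update")

whole = m.group(0)        # '2018-09-09': group 0 is the entire match
year = m.group(1)         # '2018'
day = m.group(3)          # '09': the unnamed third group
month = m.group("month")  # '09': named groups can be fetched by name
all_groups = m.groups()   # ('2018', '09', '09')
named = m.groupdict()     # {'year': '2018', 'month': '09'}: unnamed groups are omitted
```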


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
