Python Novice Advanced version: How to read unstructured, image, video, voice data


Opening the file returns an image object, and all subsequent image processing is based on that object. After the above code is executed, img.show() calls the system's default image viewer to display the opened image.

This object exposes a number of attributes that can be printed to describe the file, such as size, format, and color mode.

print('img format: ', img.format)  # Print the image format
print('img size: ', img.size)  # Print the image size
print('img mode: ', img.mode)  # Print the image color mode

The results returned after the above code execution are as follows:

    • RGB: Colors are produced by combining three primaries: red, green, and blue. This mode is very widely used in digital displays.

    • CMYK: The standard for industrial four-color printing. The four letters refer to cyan, magenta, yellow, and black.

    • HSB: This mode describes a color by its hue, saturation, and brightness, which is closer to how people perceive color.

    • Other modes: These include grayscale, indexed, and bitmap modes, which are also common in certain scenarios.
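The relationship between the RGB and HSB modes can be sketched with Python's standard colorsys module, which calls the same model HSV. This is only an illustration: the helper name rgb_to_hsb and the choice of degrees and percentages as units are assumptions, not part of the original article.

```python
import colorsys


def rgb_to_hsb(r, g, b):
    """Convert 8-bit RGB values to (hue in degrees, saturation %, brightness %)."""
    # colorsys works on floats in the range 0.0-1.0
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360), round(s * 100), round(v * 100)


for name, rgb in [("red", (255, 0, 0)), ("blue", (0, 0, 255)), ("gray", (128, 128, 128))]:
    print(name, rgb, "->", rgb_to_hsb(*rgb))
```

The same color is simply described along different axes in each mode, which is why conversions such as this one are possible.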

In addition, other operations can be performed on the image object, such as format conversion, rotation, cropping, merging, filtering, color processing, and thumbnail generation. Due to space constraints, they are not covered here.

2. Reading images using OpenCV

OpenCV offers two main ways to read and display images: the first uses the cv library, and the second uses the cv2 library.

First: Reading images using cv

Syntax

cv2.imread(filename[, flags])

Description

Reads the image content. If the image cannot be read, an empty result is returned. The supported image formats cover almost all everyday scenarios, including:

    • Windows bitmap files: *.bmp, *.dib

    • JPEG files: *.jpeg, *.jpg, *.jpe

    • JPEG 2000 files: *.jp2

    • PNG files: *.png

    • WebP files: *.webp

    • Portable image formats: *.pbm, *.pgm, *.ppm, *.pxm, *.pnm

    • Sun raster files: *.sr, *.ras

    • TIFF files: *.tiff, *.tif

    • OpenEXR files: *.exr

    • Radiance HDR files: *.hdr, *.pic

Parameters

    • filename: required, string, the image path.

    • flags: optional, int or the corresponding named constant, the color read mode. If flags>0 or cv2.IMREAD_COLOR, a three-channel color image is read; if flags=0 or cv2.IMREAD_GRAYSCALE, a grayscale image is read; if flags<0 or cv2.IMREAD_UNCHANGED, the original image is read, including its alpha channel.

Return

The image content; an empty result is returned if the image cannot be read.

Tip: Besides OpenCV's own image display methods, OpenCV is often combined with matplotlib, which is the more common way to display images. The combination borrows matplotlib's powerful display capabilities for side-by-side image comparison and for showing images in different modes.

03 Reading Video Data

The most commonly used library for reading video in Python is OpenCV. This article uses a video named tree.avi as an example; the sample code below reads the video content:

import cv2  # Import the library
cap = cv2.VideoCapture("tree.avi")  # Get the video object
status = cap.isOpened()  # Check whether the file was opened correctly

if status:  # If it was opened correctly, get the video's attribute information
    frame_width = cap.get(3)  # Get the frame width
    frame_height = cap.get(4)  # Get the frame height
    frame_count = cap.get(7)  # Get the total number of frames
    frame_fps = cap.get(5)  # Get the frame rate
    print('Frame width: ', frame_width)  # Print the frame width
    print('Frame height: ', frame_height)  # Print the frame height
    print('Frame count: ', frame_count)  # Print the frame count
    print('Frame fps: ', frame_fps)  # Print the frame rate

success, frame = cap.read()  # Read the first frame of the video
while success:  # While the read status is True
    cv2.imshow('Video frame', frame)  # Display the frame
    success, frame = cap.read()  # Get the next frame
    k = cv2.waitKey(int(1000 / frame_fps))  # Delay each frame by a whole number of milliseconds while waiting for key input
    if k == 27:  # If the ESC key is detected during the wait
        break  # Exit the loop

cv2.destroyAllWindows()  # Close all windows
cap.release()  # Release the video file object

The above code is divided into 4 parts, separated by blank lines.

The first part, the first 3 lines, imports the library, reads the video file to get the video object, and then gets the video's open status. The key method is VideoCapture, which is used to read the video.

Syntax

cv2.VideoCapture(VideoCapture ID | filename | apiPreference)

Description

Reads a video device or file and creates a video object instance.

Parameters

Required: VideoCapture ID or filename.

VideoCapture ID: int, the ID of the system-assigned device object; the default device object has ID 0.

filename:

    • The name of a video file, a string such as abc.avi. In the version described here, only the AVI format is supported.

    • An image sequence, a string such as img_%02d.jpg (the sequence includes img_00.jpg, img_01.jpg, img_02.jpg, ...).

    • A video URL, a string such as protocol://host:port/script_name?script_params|auth

apiPreference: int, the API backend to use.

Return

A video object instance.
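The image-sequence file name is a printf-style template. The sketch below uses Python's own % formatting to show which names a zero-padded two-digit field (%02d) stands for. expand_sequence is a hypothetical helper written for this illustration, not an OpenCV function.

```python
def expand_sequence(pattern, count, start=0):
    """List the file names that a printf-style pattern stands for."""
    return [pattern % index for index in range(start, start + count)]


names = expand_sequence("img_%02d.jpg", 3)
print(names)

# Without the leading 0 in the field, the number is padded with a space instead.
print(repr("img_%2d.jpg" % 0))
```

The zero in %02d is what makes the generated names match files such as img_00.jpg.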

The second part is the 9-line if block, which checks that the file was opened correctly and then prints the video's overall information. Besides the parameter values passed to the get method in the code, OpenCV supports more properties, as shown in the following table.

Value  Property                   Description
0      CV_CAP_PROP_POS_MSEC       Current position (unit: ms)
1      CV_CAP_PROP_POS_FRAMES     Current position (unit: frame index, starting from 0)
2      CV_CAP_PROP_POS_AVI_RATIO  Current position (unit: ratio, 0 = start, 1 = end)
3      CV_CAP_PROP_FRAME_WIDTH    Frame width
4      CV_CAP_PROP_FRAME_HEIGHT   Frame height
5      CV_CAP_PROP_FPS            Frame rate
6      CV_CAP_PROP_FOURCC         4-character video codec code (e.g. 'M', 'J', 'P', 'G')
7      CV_CAP_PROP_FRAME_COUNT    Total number of frames
8      CV_CAP_PROP_FORMAT         Format of the matrix returned by retrieve()
9      CV_CAP_PROP_MODE           Backend-specific value indicating the current capture mode
10     CV_CAP_PROP_BRIGHTNESS     Brightness (cameras only)
11     CV_CAP_PROP_CONTRAST       Contrast (cameras only)
12     CV_CAP_PROP_SATURATION     Saturation (cameras only)
13     CV_CAP_PROP_HUE            Hue (cameras only)
14     CV_CAP_PROP_GAIN           Gain (cameras only)
15     CV_CAP_PROP_EXPOSURE       Exposure (cameras only)
16     CV_CAP_PROP_CONVERT_RGB    Whether the image should be converted to RGB (Boolean)
17     CV_CAP_PROP_WHITE_BALANCE  White balance (not yet supported in v2.4.3)

▲ Image properties supported by the get method
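Property 6 in the table is returned as a single number rather than as four characters. The sketch below shows how such a code can be packed and unpacked, assuming the usual layout with the first character in the lowest byte. The function names are illustrative and not part of OpenCV.

```python
def fourcc_to_int(code):
    """Pack a 4-character code such as 'MJPG' into one integer (first char in the low byte)."""
    if len(code) != 4:
        raise ValueError("a FOURCC must have exactly 4 characters")
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(code))


def int_to_fourcc(value):
    """Unpack the integer form back into its 4 characters."""
    value = int(value)  # the get method returns a float
    return "".join(chr((value >> (8 * i)) & 0xFF) for i in range(4))


packed = fourcc_to_int("MJPG")
print(packed, int_to_fourcc(float(packed)))
```

Applied to the value returned by the get method for property 6, int_to_fourcc gives the readable codec name.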

The third part reads and displays each frame of the video. It first reads the first frame; while the read status is True, it displays the frame and reads the next one. The argument to cv2.waitKey controls the delay between frames, and during that delay the program waits for keyboard input; if ESC is pressed, the loop that reads the frames exits.
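The delay passed to cv2.waitKey has to be a whole number of milliseconds, and in Python 3 the / operator returns a float, so the value needs an explicit conversion. The following sketch derives the delay from the frame rate. The function names and the 40 ms fallback are assumptions made for illustration.

```python
ESC_KEY = 27  # key code checked by the playback loop


def frame_delay_ms(fps, fallback_ms=40):
    """Return the whole-millisecond delay for one frame at the given frame rate."""
    if not fps or fps <= 0:
        return fallback_ms  # used when the frame rate is unknown or reported as 0
    # Never return 0: a delay of 0 makes waitKey wait indefinitely for a key press.
    return max(1, int(round(1000.0 / fps)))


def should_stop(key):
    """True if the key code returned by waitKey is ESC (only the low byte is compared)."""
    return (key & 0xFF) == ESC_KEY


for fps in (25.0, 29.97, 0):
    print(fps, "->", frame_delay_ms(fps), "ms")
```

In the playback loop this would be used as cv2.waitKey(frame_delay_ms(frame_fps)), with should_stop applied to the returned key code.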

The fourth part runs after all operations are complete: it closes all windows created by OpenCV and releases the video file object.

For more information on OpenCV, refer to opencv.org

04 Reading Voice Data

Voice files can be read with Python libraries such as audioop, aifc, and wave. For speech processing specifically, however, very mature solutions already exist on the market, such as iFlytek and Baidu Voice. In most cases we call their APIs to analyze and process speech, or use them as a preprocessing step before analysis.
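As a small illustration of reading a voice file locally, the sketch below uses the standard wave module. So that it runs without any existing recording, it first writes a one-second test tone to a temporary file (tone.wav, a made-up name) and then reads it back. The tone and its parameters are assumptions chosen for the example.

```python
import math
import os
import struct
import tempfile
import wave

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
sample_rate = 16000

# Write one second of a 440 Hz tone: mono, 16-bit samples.
with wave.open(path, "wb") as writer:
    writer.setnchannels(1)
    writer.setsampwidth(2)  # bytes per sample, i.e. 16 bit
    writer.setframerate(sample_rate)
    samples = (int(10000 * math.sin(2 * math.pi * 440 * n / sample_rate))
               for n in range(sample_rate))
    writer.writeframes(b"".join(struct.pack("<h", s) for s in samples))

# Read the file back and inspect its basic properties.
with wave.open(path, "rb") as reader:
    channels = reader.getnchannels()
    sample_width = reader.getsampwidth()
    frame_rate = reader.getframerate()
    frame_count = reader.getnframes()
    raw_pcm = reader.readframes(frame_count)  # raw PCM bytes

print(channels, sample_width * 8, frame_rate, frame_count, len(raw_pcm))
```

The channel count, sample width, and frame rate read here are the properties a speech service typically places requirements on.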

In practice, you can either download the SDK for offline use or use the online service.

This article takes the Baidu Voice API service as an example and explains how to call the Baidu Speech API to convert voice data into text.

Before using the Baidu Speech API, create a Baidu account and register as a Baidu developer.

With that done, the next step is to enable the speech recognition service. Here's how:

Go to http://yuyin.baidu.com/app and, on the screen that opens, choose the app for which the speech recognition service should be enabled. We use the previously created Api_for_python application by default, so click "Activate Service" for that app.

▲ Open Service

In the pop-up window, select "Speech Recognition" and click OK.

▲ Select Open Speech recognition service

Once the service is enabled, the system shows the message "service is open". Click "View Key" on the right, and the following information pops up:

▲ Figure 2-32 Application key Information

The API key and secret key in the popup above are the information to be used in subsequent speech recognition.

The following is the full code:

# Import libraries
import json  # For converting JSON strings
import base64  # For Base64-encoding the voice file
import requests  # For sending server requests

# Get token
api_key = 'DDOYOKO0VZBGDDFQNYHINKYDGKZBKUQR'  # Obtained from the application's key information
secret_key = 'OIIBOC5ULLUMUMPWS3M0LUWB00HQIDPX'  # Obtained from the application's key information
token_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=%s&client_secret=%s"  # Token request address
res = requests.get(token_url % (api_key, secret_key))  # Send the request
res_text = res.text  # Get the text of the response
token = json.loads(res_text)['access_token']  # Extract the token

# Define the voice to send
voice_file = 'baidu_voice_test.pcm'  # Voice file to recognize
voice_fn = open(voice_file, 'rb')  # Open the file in binary mode
org_voice_data = voice_fn.read()  # Read the file contents
voice_fn.close()  # Close the file once its contents are in memory
org_voice_len = len(org_voice_data)  # Get the file length
base64_voice_data = base64.b64encode(org_voice_data).decode('ascii')  # Base64-encode the speech content; decode to text so json.dumps accepts it

# Send the message
# Define the data body to send
headers = {'Content-Type': 'application/json'}  # Header information
payload = {
    "format": "pcm",  # Must match the extension of the voice file to be recognized
    "rate": 8000,  # Sample rate; 8000 and 16000 are supported
    "channel": 1,  # Fixed value, mono
    "token": token,  # The token obtained above
    "cuid": "b8-76-3f-41-3e-2b",  # Local MAC address or unique device identifier
    "len": org_voice_len,  # The original file length obtained above
    "speech": base64_voice_data  # The encoded voice data
}
data = json.dumps(payload)  # Convert to JSON format
vop_url = 'http://vop.baidu.com/server_api'  # Speech recognition API
voice_res = requests.post(vop_url, data=data, headers=headers)  # Send the speech recognition request
api_data = voice_res.text  # Get the response text
text_data = json.loads(api_data)['result']  # Extract the recognized text
print(api_data)  # Print the full response
print(text_data)  # Print the recognized text

The code consists of 4 parts, separated by blank lines:

The first part imports the required libraries; see the code comments for what each one is used for.

The second part obtains the token needed to use the Baidu speech recognition API. api_key and secret_key come from the "Application key Information". token_url defines the complete string with placeholders, and the actual values are filled in when the request is sent; the token is read directly from the response for use later in the program. For more information on obtaining tokens, refer to http://yuyin.baidu.com/docs/asr/56.
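The token address can also be assembled with urllib.parse.urlencode, which escapes any special characters in the values. This is a minimal sketch: build_token_url is a hypothetical helper, the key values are placeholders, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TOKEN_ENDPOINT = "https://openapi.baidu.com/oauth/2.0/token"


def build_token_url(api_key, secret_key):
    """Assemble the token request URL from the two application keys."""
    query = urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    return TOKEN_ENDPOINT + "?" + query


url = build_token_url("my-api-key", "my-secret-key")  # placeholder values
print(url)
```

Compared with filling %s placeholders by hand, this keeps the parameter names and values together and avoids stray spaces inside the query string.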

Tip: The token can be requested with either GET or POST (recommended). A token is valid for 1 month by default; request a new one when it expires.
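Since a token stays valid for a long time, it does not need to be requested before every call. One possible way to reuse it is sketched below. TokenCache and fake_fetch are hypothetical names, and the 30-day lifetime is only an approximation of the one-month validity; a real fetch function would take the lifetime from the token response.

```python
import time


class TokenCache:
    """Keep a token until shortly before it expires, then fetch a new one."""

    def __init__(self, fetch, margin_seconds=60, clock=time.time):
        self._fetch = fetch            # callable returning (token, lifetime_seconds)
        self._margin = margin_seconds  # renew slightly early to be safe
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token


# Stand-in for the real token request.
calls = []


def fake_fetch():
    calls.append(1)
    return "token-%d" % len(calls), 30 * 24 * 3600


cache = TokenCache(fake_fetch)
first_token = cache.get()
second_token = cache.get()  # served from the cache, no second fetch
print(first_token, second_token, len(calls))
```

Passing the clock in as a parameter makes the expiry logic easy to check without waiting for real time to pass.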

The third part obtains and processes the voice file data. The standard open method reads the speech data in binary mode; the length of the original data is then taken from the data read, and the original data is converted to Base64 encoding.
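One detail matters when running this step in Python 3: base64.b64encode returns bytes, which json.dumps cannot serialize, so the result has to be decoded to text first. The sketch below shows the step in isolation, with made-up bytes standing in for the voice file contents.

```python
import base64
import json

raw_audio = bytes(range(256)) * 4  # stand-in for the bytes read from the voice file
raw_length = len(raw_audio)        # length of the original, un-encoded data

encoded = base64.b64encode(raw_audio)   # bytes in Python 3
encoded_text = encoded.decode("ascii")  # str, so it can be placed in a JSON document

body = json.dumps({"len": raw_length, "speech": encoded_text})
restored = base64.b64decode(encoded_text)  # decoding restores the original bytes

print(raw_length, len(encoded_text), restored == raw_audio)
```

Note that the length sent alongside the data is the length of the original bytes, not of the Base64 text, which is roughly a third longer.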

Note: The Baidu speech recognition API has requirements for the audio to be recognized. The original PCM recording must use an 8k or 16k sampling rate, 16-bit depth, and a single channel (mono). Supported formats: PCM (uncompressed), WAV, Opus, AMR, x-flac.
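These requirements can be checked locally before any data is uploaded. The following is a sketch only: check_pcm_parameters is a hypothetical helper, and the constants simply restate the note above.

```python
SUPPORTED_RATES = (8000, 16000)  # 8k or 16k sampling rate
REQUIRED_SAMPLE_WIDTH = 2        # bytes per sample, i.e. 16-bit depth
REQUIRED_CHANNELS = 1            # mono


def check_pcm_parameters(rate, sample_width, channels):
    """Return a list of problems; an empty list means the parameters conform."""
    problems = []
    if rate not in SUPPORTED_RATES:
        problems.append("sample rate %d is not one of %s" % (rate, SUPPORTED_RATES))
    if sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append("sample width is %d bytes, expected %d" % (sample_width, REQUIRED_SAMPLE_WIDTH))
    if channels != REQUIRED_CHANNELS:
        problems.append("found %d channels, expected mono" % channels)
    return problems


print(check_pcm_parameters(16000, 2, 1))
print(check_pcm_parameters(44100, 2, 2))
```

A recording that fails the check would need to be resampled or converted before it is sent.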

The fourth part is the core of this section: sending the request and obtaining the speech recognition result. It defines the request headers, then defines a dictionary holding the key-value pairs to send and converts it to JSON, uploads it with the POST method, and finally prints both the full response and the recognized text it contains. For more details on this part, see the Baidu Speech API development documentation at http://yuyin.baidu.com/docs/asr/57.

As for cuid, the author tested on a local computer, so the MAC address was used. To get the MAC address, open a command-line window (Win+R, type cmd, and press Enter), run ipconfig /all, and among the listed connections find the one whose media state is not "Media disconnected" and which is the current connection; its physical address is the MAC address. The author's computer shows the following MAC information:

▲ Get MAC address information
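The address can also be read from within Python using the standard uuid module, which avoids looking it up by hand. This is a sketch: format_mac is a hypothetical helper that produces the hyphen-separated form used for cuid in the code, and uuid.getnode() may fall back to a random number if no hardware address can be determined.

```python
import uuid


def format_mac(node):
    """Format a 48-bit integer as six lowercase hex pairs joined by hyphens."""
    raw = "%012x" % node
    return "-".join(raw[i:i + 2] for i in range(0, 12, 2))


cuid = format_mac(uuid.getnode())  # hardware address of this machine as an integer
print(cuid)
```

Any value that uniquely identifies the device serves the same purpose; the MAC address is just a convenient choice on a local machine.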

For more information on voice services, refer to http://www.xfyun.cn/.

The above code executes and returns the following result:

{"corpus_no": "6409809149574448654", "err_msg": "success.", "err_no": 0, "result": ["Baidu Voice provides technical support,"], "sn": "83327679891492399988"}

[u'\u767e\u5ea6\u8bed\u97f3\u63d0\u4f9b\u6280\u672f\u652f\u6301\uff0c']

On success the system returns the recognition result; the recording says "Baidu Voice provides technical support". The second line is the Unicode escape form of the Chinese text.
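When handling the response, it helps to check the error code before reading the result. The sketch below parses a response shaped like the sample output above. extract_text is a hypothetical helper, the lowercase key names are an assumption based on that sample, and the failing response is invented for the demonstration.

```python
import json

sample_response = (
    '{"corpus_no": "6409809149574448654", "err_msg": "success.", "err_no": 0, '
    '"result": ["\\u767e\\u5ea6\\u8bed\\u97f3\\u63d0\\u4f9b\\u6280\\u672f\\u652f\\u6301\\uff0c"], '
    '"sn": "83327679891492399988"}'
)


def extract_text(response_text):
    """Return the list of recognized sentences, or raise if an error was reported."""
    data = json.loads(response_text)
    if data.get("err_no") != 0:
        raise RuntimeError("recognition failed: %s" % data.get("err_msg"))
    return data["result"]


texts = extract_text(sample_response)
print(len(texts))
print(ascii(texts))  # ascii() shows the \u escapes, as in the output above

try:
    extract_text('{"err_no": 1, "err_msg": "example error"}')  # made-up failure
except RuntimeError as error:
    print(error)
```

In Python 3 the parsed strings hold the Chinese characters themselves; the escaped form only appears when the text is printed with ascii() or by Python 2.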

Summary: The speech recognition above only converts speech to text. In fact, speech itself carries a great deal of information: besides relatively shallow physiological and physical characteristics such as speaking rate, pitch, syllables, timbre, and prosody, it also carries deeper social attributes, and extracting those requires deeper applications of natural speech understanding. The main application directions for reading voice data currently include:

    • Speech to text. This is speech recognition in the broad sense: voice information is converted directly into text. Some applications offer it as a small built-in feature.

    • Speech recognition. Here this refers to recognizing the speaker's role and identity by selecting speech recognition units, extracting speech feature parameters, training models, and matching models; for example, identifying who is speaking from a piece of speech.

    • Speech semantic comprehension. Building on speech recognition, the semantic features are analyzed to obtain the underlying knowledge or intent of the speech, and a corresponding response or action is then provided. The difference between the two is that speech recognition determines the literal, surface meaning of what was said, while speech comprehension digs out its deeper meaning.

    • Speech synthesis. Speech synthesis lets the computer "speak" and is a technique for making machines more human-like. Also known as text-to-speech, it converts text into speech that people can understand by mechanical and electronic means.

    • Application integration. After analysis and recognition, the information can be integrated with hardware so that instructions are issued directly by voice. For example, when "communicating" with Siri, besides everyday conversation it can also tell you the weather, help you set up calendar entries, recommend restaurants, and so on. This is a typical application of intelligent robots in pattern recognition.

Given these complex application scenarios, the follow-up analysis, processing, and modeling of speech usually cannot be done by a data engineer alone. It also requires large amounts of corpus material and draws on sociology, signal engineering, grammar, phonetics, natural speech processing, machine learning, knowledge search, knowledge processing, and other interdisciplinary and related fields; only then is it possible to unlock its secrets.
