Yiibai tutorial: AI with Python - Data preparation - Labeling data

Source: Internet
Author: User

We already know that machine learning algorithms require data in a specific format. Another important requirement is that the data must be labeled correctly before it is passed to the algorithm as input. For example, in a classification problem the data carries many labels. These labels can take the form of words, numbers, and so on. The machine learning functions in sklearn expect the data to have numeric labels. Therefore, if the labels are in any other form, they must be converted to numbers. The process of converting word labels into numeric form is called label encoding.
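As a rough illustration of the idea, label encoding can be sketched in plain Python as a mapping from each distinct label to an integer. The helper name `build_label_mapping` is hypothetical and not part of any library, and sorting the distinct labels alphabetically is an assumption chosen to mirror how sklearn's LabelEncoder orders its classes.

```python
def build_label_mapping(labels):
    """Map each distinct label to an integer code, in sorted order (hypothetical helper)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Learn the mapping once, then apply it to every label
mapping = build_label_mapping(labels)
encoded = [mapping[label] for label in labels]

print(mapping)
print(encoded)
```

Repeated labels such as 'red' and 'black' always receive the same code, which is the property an encoder has to guarantee.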

Label Encoding Steps

Follow these steps to encode data labels in Python:

1st Step - Import the required packages

In Python, importing the packages is the first step in converting the data into a specific format, that is, preprocessing. It can be done as follows:

import numpy as np
from sklearn import preprocessing

2nd Step - Define the sample labels

After the packages are imported, we need to define some sample labels so that the label encoder can be created and trained. The following sample labels are now defined:

# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

3rd Step - Create and train the label encoder object

In this step, we need to create the label encoder and train it. The following Python code does this:

# Creating the label encoder
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

Run in an interactive session, the code above displays the following output:

LabelEncoder()

4th Step - Check performance by encoding a randomly ordered list

This step checks the encoder by encoding a randomly ordered list of labels. The following Python code does this:

# Encoding a set of labels
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)

The labels are printed as follows:

Labels = ['green', 'red', 'black']

Now, we can get the list of encoded values, that is, the word labels converted to numbers, as follows:

print("Encoded values =", list(encoded_values))

The output is printed as follows:

Encoded values = [1, 2, 0]

5th Step - Check performance by decoding a set of random numbers

This step checks the encoder by decoding a random set of numbers. The following Python code does this:

# Decoding a set of values
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)

Now, the encoded values will be printed as follows:

Encoded values = [3, 0, 4, 1]

print("\nDecoded labels =", list(decoded_list))

Now, the decoded values will be printed as follows:

Decoded labels = ['white', 'black', 'yellow', 'green']
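Putting the five steps together, a minimal end-to-end sketch might look like the following. It assumes scikit-learn is installed. Inspecting `classes_`, converting results with `.tolist()`, and the unseen-label check at the end are additions for illustration, and the label 'purple' is made up.

```python
from sklearn import preprocessing

# Same sample labels as in the 2nd step
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Create and train the label encoder
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# classes_ holds the distinct labels in sorted order;
# a label's position in this array is its integer code
classes = encoder.classes_.tolist()

# Encode and decode, converting the NumPy results to plain Python lists
encoded = encoder.transform(['green', 'red', 'black']).tolist()
decoded = encoder.inverse_transform([3, 0, 4, 1]).tolist()

print("Classes =", classes)
print("Encoded values =", encoded)
print("Decoded labels =", decoded)

# A label that was not seen during fit cannot be encoded:
# transform raises ValueError for it
try:
    encoder.transform(['purple'])
except ValueError as error:
    print("Unseen label:", error)
```

Reading `classes_` makes the mapping explicit: 'black' is first in sorted order and so gets code 0, which is why the encoded and decoded values above line up the way they do.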

Labeled and unlabeled data

Unlabeled data mainly consists of samples of natural or human-created objects that can easily be obtained from the world. They include audio, video, photos, news articles, and more.

Labeled data, on the other hand, takes a set of unlabeled data and augments each piece of it with a meaningful tag, label, or class. For example, if we have a photo, a label can be assigned based on its content, such as whether it shows a boy, a girl, an animal, or something else. Labeling data requires human expertise or judgment about a given piece of unlabeled data.

In many scenarios unlabeled data is plentiful and easy to obtain, but labeled data often requires a human or an expert to annotate it. Semi-supervised learning attempts to combine labeled and unlabeled data to build better models.
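One way to hold a mix of labeled and unlabeled samples in code is a single target list in which unlabeled samples carry a placeholder. The sketch below assumes the convention used by scikit-learn's semi-supervised estimators, where -1 marks an unlabeled sample. The photo names and the `UNLABELED` constant are made up for illustration.

```python
UNLABELED = -1  # placeholder code for samples without a label (assumed convention)

# Hypothetical photo collection: None means the photo has not been labeled yet
photos = [
    ('photo_01', 'boy'),
    ('photo_02', None),
    ('photo_03', 'animal'),
    ('photo_04', None),
    ('photo_05', 'girl'),
]

# Build the label-to-integer mapping from the labeled samples only
known_labels = sorted({label for _, label in photos if label is not None})
mapping = {label: index for index, label in enumerate(known_labels)}

# Encode every sample; unlabeled ones receive the placeholder
targets = [mapping[label] if label is not None else UNLABELED for _, label in photos]

print(mapping)
print(targets)
```

A target list of this shape keeps the labeled and unlabeled samples in one dataset, which is the starting point a semi-supervised method needs.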
