We already know that machine learning algorithms need data in a particular format. Another important requirement is that the data must be properly labeled before it is passed to the algorithm. For example, in classification the data carries many labels. These labels can be words, numbers, and so on. The machine learning functions in sklearn expect the labels to be numeric. Therefore, if the labels are in any other form, they must be converted to numbers. The process of converting word labels into numeric form is called label encoding.
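To make the idea concrete, here is a minimal pure-Python sketch of what label encoding does. It is not sklearn's implementation. The helper name fit_label_mapping and the sample colour labels are assumptions chosen for illustration. It assigns each distinct label an integer based on sorted order.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, using sorted order.

    Hypothetical helper for illustration only.
    """
    return {label: index for index, label in enumerate(sorted(set(labels)))}


# Sample word labels (illustrative)
labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']

# Build the label -> number mapping, then encode every label with it
mapping = fit_label_mapping(labels)
encoded = [mapping[label] for label in labels]

print(mapping)  # {'black': 0, 'green': 1, 'red': 2, 'white': 3, 'yellow': 4}
print(encoded)  # [2, 0, 2, 1, 0, 4, 3]
```

Repeated labels such as 'red' always receive the same number, which is what lets the numbers stand in for the words.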
Label Encoding Steps
Follow these steps to encode data labels in Python:
1st Step - Import the required packages
If you use Python, this is the first step in converting the data to a specific format, that is, preprocessing. It can be done as follows:
import numpy as np
from sklearn import preprocessing
2nd Step - Define the sample labels
After the packages are imported, we need to define some sample labels so that the label encoder can be created and trained. The following sample labels are defined:
# Sample input labels
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
3rd Step - Create and train the label encoder object
In this step, we need to create a label encoder and train it. The following Python code does this:
# Creating the label encoder
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)
The following is the output after running the Python code above:
LabelEncoder()
4th Step - Check the performance by encoding a randomly ordered list
This step checks the performance of the encoder by encoding a randomly ordered list of labels. The following Python code does this:
# Encoding a set of labels
test_labels = ['green', 'red', 'black']
encoded_values = encoder.transform(test_labels)
print("\nLabels =", test_labels)
The labels are printed as follows:
Labels = ['green', 'red', 'black']
Now you can get the list of encoded values, that is, the word labels converted to numbers, as shown below:
print("Encoded values =", list(encoded_values))
The output is printed as follows:
Encoded values = [1, 2, 0]
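The numbers above follow from the order in which the encoder stores the labels it has seen. As a small sketch, assuming the same sample labels as in the 2nd step, the fitted encoder's classes_ attribute can be turned into a dictionary to inspect the mapping. The variable name mapping is an assumption introduced here.

```python
from sklearn import preprocessing

# Same sample labels as in the 2nd step
input_labels = ['red', 'black', 'red', 'green', 'black', 'yellow', 'white']
encoder = preprocessing.LabelEncoder()
encoder.fit(input_labels)

# classes_ holds the unique labels in sorted order; the position of each
# label in this array is the integer that transform() assigns to it.
mapping = {str(label): index for index, label in enumerate(encoder.classes_)}
print(mapping)  # {'black': 0, 'green': 1, 'red': 2, 'white': 3, 'yellow': 4}
```

This is why 'green', 'red' and 'black' are encoded as 1, 2 and 0: the labels are numbered alphabetically, not in the order they first appear.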
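5th Step - Check the performance by decoding a random set of numbers
This step checks the performance of the encoder by decoding a random set of numbers. The following Python code does this:
# Decoding a set of values
encoded_values = [3, 0, 4, 1]
decoded_list = encoder.inverse_transform(encoded_values)
print("\nEncoded values =", encoded_values)
The encoded values are printed as follows:
Encoded values = [3, 0, 4, 1]
The decoded labels can then be printed with the following code:
print("\nDecoded labels =", list(decoded_list))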
Now, the decoded labels are printed as follows:
Decoded labels = ['white', 'black', 'yellow', 'green']
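As a further check, the following sketch (again assuming the sample labels from the 2nd step) encodes and then decodes a list to confirm the round trip, and shows what happens with a label the encoder has never seen. The label 'purple' and the names round_trip and unseen_ok are assumptions used only for illustration.

```python
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
encoder.fit(['red', 'black', 'red', 'green', 'black', 'yellow', 'white'])

# Round trip: decoding the encoded values should return the original labels.
test_labels = ['green', 'red', 'black']
encoded = encoder.transform(test_labels)
round_trip = [str(label) for label in encoder.inverse_transform(encoded)]
print(round_trip == test_labels)

# A label that was not present during fit() cannot be encoded;
# transform() raises a ValueError for it.
try:
    encoder.transform(['purple'])
    unseen_ok = True
except ValueError as err:
    unseen_ok = False
    print("Could not encode:", err)
```

In practice this means the encoder should be fitted on every label that may appear later, or unseen labels need to be handled before calling transform().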
Labeled and unlabeled data
Unlabeled data is primarily made up of samples of natural or man-made objects that can easily be obtained from the real world. They include audio, video, photos, news articles, and more.
Labeled data, on the other hand, takes a set of unlabeled data and augments each piece of it with a meaningful tag, label, or class. For example, if you have a photo, a label can be assigned based on its content, such as whether it shows a boy, a girl, an animal, or something else. Labeling data requires human expertise or judgment about a given piece of unlabeled data.
In many cases unlabeled data is plentiful and easy to obtain, whereas labeled data often requires a human or an expert to annotate it. Semi-supervised learning attempts to combine labeled and unlabeled data to build better models.