Feature extraction -- conversion of labels and indexes: OneHotEncoder

Source: Internet
Author: User

One-hot encoding maps a column of categorical features (also called nominal features) to a series of binary continuous features. If the original categorical feature has several possible values, it is mapped to that many binary features, one per value: a feature takes 1 if the sample has that value and 0 otherwise.
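The mapping described above can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: the helper name one_hot is hypothetical, and ordering the categories alphabetically is a choice made for this sketch, not the order Spark uses.

```python
def one_hot(values):
    """Map each categorical value to a list of 0/1 indicators, one per distinct category."""
    # Assumption: categories are ordered alphabetically for this illustration.
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1  # exactly one slot is set to 1 per sample
        encoded.append(row)
    return categories, encoded


categories, encoded = one_hot(["a", "b", "c", "a", "a", "c"])
print(categories)
for row in encoded:
    print(row)
```

Each row contains a single 1, in the position of the category that the sample belongs to.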

One-hot encoding suits algorithms that expect continuous features, such as logistic regression.

First, create a DataFrame that contains a column of categorical features. Note that before converting with OneHotEncoder, the DataFrame must first be passed through StringIndexer to convert the original labels to numeric indices:

    # Import the required classes
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import OneHotEncoder, StringIndexer

    # Create the SparkSession and configure Spark
    spark = SparkSession.builder.master('local').appName('OneHotEncoderDemo').getOrCreate()

    # Create a simple training DataFrame
    df = spark.createDataFrame([
        (0, "a"),
        (1, "b"),
        (2, "c"),
        (3, "a"),
        (4, "a"),
        (5, "c")
    ], ["id", "category"])

    # Create the StringIndexer and set its input and output columns
    indexer = StringIndexer(inputCol='category', outputCol='categoryIndex')

    # Fit the indexer to produce a model
    model = indexer.fit(df)

    # Use the fitted model to transform the DataFrame
    indexed = model.transform(df)

    # Create the OneHotEncoder and set its input and output columns
    encoder = OneHotEncoder(inputCol='categoryIndex', outputCol='categoryVec')

    # Encode the indexed DataFrame. The encoded binary features are sparse
    # vectors, in the same order as the StringIndexer indices. Note that the
    # last category ("b") is encoded as an all-zero vector. If "b" should also
    # get its own binary feature, pass dropLast=False (or call
    # setDropLast(False)) when creating the OneHotEncoder.
    # In Spark 3.0 and later OneHotEncoder is an Estimator, so it is fitted
    # first. In Spark 2.x it was a Transformer and encoder.transform(indexed)
    # could be called directly.
    encoded = encoder.fit(indexed).transform(indexed)
    encoded.show()
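To see why the least frequent category ends up as the all-zero vector, the two steps can be emulated in plain Python. This is a rough sketch, not Spark's implementation. The helpers index_by_frequency and encode are hypothetical names, dense lists stand in for Spark's sparse vectors, and breaking ties between equally frequent labels alphabetically is an assumption of the sketch.

```python
from collections import Counter


def index_by_frequency(labels):
    """Assign index 0 to the most frequent label, 1 to the next, and so on."""
    counts = Counter(labels)
    # Sort by descending count; ties are broken alphabetically (an assumption of this sketch).
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: i for i, label in enumerate(ordered)}


def encode(index, num_categories, drop_last=True):
    """Return a dense 0/1 list; with drop_last the final category becomes all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vec = [0] * size
    if index < size:
        vec[index] = 1
    return vec


labels = ["a", "b", "c", "a", "a", "c"]
mapping = index_by_frequency(labels)
for label in labels:
    idx = mapping[label]
    # Print the label, its index, and both encodings side by side.
    print(label, idx, encode(idx, len(mapping)), encode(idx, len(mapping), drop_last=False))
```

With the last category dropped, the vectors are one element shorter and the category with the highest index has no slot of its own. With drop_last=False every category gets its own position.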
