PySpark DataFrame Learning: "DataFrame Queries" (3)

Source: Internet
Author: User
Tags: pyspark

To view the data in a DataFrame, you can use collect(), show(), or take(). The last two accept an argument that limits the number of rows returned.

1. View the number of rows

You can use the count() method to get the number of rows in a DataFrame.

from pyspark.sql import SparkSession
# Import the data types used to define the schema
from pyspark.sql.types import *

spark = SparkSession \
    .builder \
    .appName("DataFrame") \
    .getOrCreate()

# Generate the sample data as an RDD of tuples
stringCSVRDD = spark.sparkContext.parallelize([
    (123, "Katie", 19, "brown"),
    (234, "Michael", 22, "green"),
    (345, "Simone", 23, "blue")
])
# Specify the schema. StructField(name, dataType, nullable), where name is the field name, dataType is the field's data type, and nullable indicates whether the field may contain null values
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])
# Apply the schema to the RDD and create the DataFrame
swimmers = spark.createDataFrame(stringCSVRDD, schema)
# Create a temporary view from the DataFrame (registerTempTable in Spark 1.x)
swimmers.createOrReplaceTempView("swimmers")
# Show the number of rows in the DataFrame
print(swimmers.count())
3

2. Filter statements

To run a filter statement, use the filter() method.

# Get the id and age of the rows where age = 22
swimmers.select("id", "age").filter("age = 22").show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+

Another way to write the query above:

swimmers.select(swimmers.id, swimmers.age).filter(swimmers.age == 22).show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+

If you only want the names of the people whose eye color begins with the letter b, you can use a like expression, as in SQL.

# Get the name and eyeColor of the rows where eyeColor like 'b%'
swimmers.select("name", "eyeColor").filter("eyeColor like 'b%'").show()
+------+--------+
|  name|eyeColor|
+------+--------+
| Katie|   brown|
|Simone|    blue|
+------+--------+

3. Using SQL queries

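SQL queries can be run against the swimmers data because the DataFrame was registered as a temporary view named swimmers (with registerTempTable, or with createOrReplaceTempView in Spark 2.0 and later).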

Query the number of rows:

spark.sql("select count(1) from swimmers").show()
+--------+
|count(1)|
+--------+
|       3|
+--------+

Use the where clause to run a filter statement:

# Use SQL to get the id and age of the rows where age = 22
spark.sql("select id, age from swimmers where age = 22").show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+

If you only want to retrieve the names of the people whose eye color begins with the letter b, you can use like:

spark.sql("select name, eyeColor from swimmers where eyeColor like 'b%'").show()
+------+--------+
|  name|eyeColor|
+------+--------+
| Katie|   brown|
|Simone|    blue|
+------+--------+

