To view the data in a DataFrame, you can use collect(), show(), or take(); show() and take() accept an argument that limits the number of rows returned.
1. View the number of rows
You can use the count() method to get the number of rows in a DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("DataFrame") \
    .getOrCreate()
# Import the data types
from pyspark.sql.types import *
# Generate comma-delimited data
stringCSVRDD = spark.sparkContext.parallelize([
    (123, "Katie", 19, "brown"),
    (234, "Michael", 22, "green"),
    (345, "Simone", 23, "blue")
])
# Specify the schema with StructField(name, dataType, nullable), where
# name is the field name, dataType is the field's data type, and
# nullable indicates whether the field may contain null values
schema = StructType([
    StructField("id", LongType(), True),
    StructField("name", StringType(), True),
    StructField("age", LongType(), True),
    StructField("eyeColor", StringType(), True)
])
# Apply the schema to the RDD and create the DataFrame
swimmers = spark.createDataFrame(stringCSVRDD, schema)
# Create a temporary view from the DataFrame
swimmers.registerTempTable("swimmers")
# Get the number of rows in the DataFrame
print(swimmers.count())
3
2. Filter statements
To run a filter statement, use the filter() method:
# Get the id of the swimmer whose age is 22
swimmers.select("id", "age").filter("age = 22").show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+
Another way of writing the above query, using column objects instead of a SQL string:
swimmers.select(swimmers.id, swimmers.age).filter(swimmers.age == 22).show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+
If you only want the names of the swimmers whose eye color begins with the letter b, you can use SQL-style like syntax:
# Get the name and eyeColor where eyeColor is like 'b%'
swimmers.select("name", "eyeColor").filter("eyeColor like 'b%'").show()
+------+--------+
|  name|eyeColor|
+------+--------+
| Katie|   brown|
|Simone|    blue|
+------+--------+
3. Using SQL queries
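SQL queries can be run against the swimmers data because the registerTempTable method was called on the swimmers DataFrame. (On Spark 2.0 and later, registerTempTable is deprecated in favor of createOrReplaceTempView.)
To query the number of rows: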
spark.sql("select count(1) from swimmers").show()
+--------+
|count(1)|
+--------+
|       3|
+--------+
To run a filter statement, use the where clause:
# Use SQL to get the id and age where age = 22
spark.sql("select id, age from swimmers where age = 22").show()
+---+---+
| id|age|
+---+---+
|234| 22|
+---+---+
If you only want the names of the swimmers whose eye color begins with the letter b, you can use like in SQL as well:
spark.sql("select name, eyeColor from swimmers where eyeColor like 'b%'").show()
+------+--------+
|  name|eyeColor|
+------+--------+
| Katie|   brown|
|Simone|    blue|
+------+--------+