scala> val data1 = data.toDF("affairs", "gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")
data1: org.apache.spark.sql.DataFrame = [affairs: string, gender: string ... 7 more fields]

scala> data1.limit(10).show
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
|      0|  male| 37|          10|      no|            3|       18|         7|     4|
|      0|  null| 27|        null|      no|            4|       14|         6|  null|
|      0|  null| 32|        null|     yes|            1|       12|         1|  null|
|      0|  null| 57|        null|     yes|            5|       18|         6|  null|
|      0|  null| 22|        null|      no|            2|       17|         6|  null|
|      0|  null| 32|        null|      no|            2|       17|         5|  null|
|      0|female| 22|        null|      no|            2|       12|         1|  null|
|      0|  male| 57|          15|     yes|            2|       14|         4|     4|
|      0|female| 32|          15|     yes|            4|       16|         1|     2|
|      0|  male| 22|         1.5|      no|            4|       14|         4|     5|
+-------+------+---+------------+--------+-------------+---------+----------+------+

scala> val res = data1.select("yearsmarried").na.drop()
res: org.apache.spark.sql.DataFrame = [yearsmarried: string]

scala> res.limit(10).show()
+------------+
|yearsmarried|
+------------+
|          10|
|          15|
|          15|
|         1.5|
|          15|
|           4|
|          15|
|         1.5|
|           4|
|          15|
+------------+

scala> val res123 = data1.na.fill("wangxiao123")
res123: org.apache.spark.sql.DataFrame = [affairs: string, gender: string ... 7 more fields]

scala> res123.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|affairs|     gender|age|yearsmarried|children|religiousness|education|occupation|     rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+
|      0|       male| 37|          10|      no|            3|       18|         7|          4|
|      0|wangxiao123| 27| wangxiao123|      no|            4|       14|         6|wangxiao123|
|      0|wangxiao123| 32| wangxiao123|     yes|            1|       12|         1|wangxiao123|
|      0|wangxiao123| 57| wangxiao123|     yes|            5|       18|         6|wangxiao123|
|      0|wangxiao123| 22| wangxiao123|      no|            2|       17|         6|wangxiao123|
|      0|wangxiao123| 32| wangxiao123|      no|            2|       17|         5|wangxiao123|
|      0|     female| 22| wangxiao123|      no|            2|       12|         1|wangxiao123|
|      0|       male| 57|          15|     yes|            2|       14|         4|          4|
|      0|     female| 32|          15|     yes|            4|       16|         1|          2|
|      0|       male| 22|         1.5|      no|            4|       14|         4|          5|
+-------+-----------+---+------------+--------+-------------+---------+----------+-----------+

scala> val res2 = data1.na.fill(value = "wangxiao111", cols = Array("gender", "yearsmarried"))
res2: org.apache.spark.sql.DataFrame = [affairs: string, gender: string ... 7 more fields]

scala> res2.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+------+
|affairs|     gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+------+
|      0|       male| 37|          10|      no|            3|       18|         7|     4|
|      0|wangxiao111| 27| wangxiao111|      no|            4|       14|         6|  null|
|      0|wangxiao111| 32| wangxiao111|     yes|            1|       12|         1|  null|
|      0|wangxiao111| 57| wangxiao111|     yes|            5|       18|         6|  null|
|      0|wangxiao111| 22| wangxiao111|      no|            2|       17|         6|  null|
|      0|wangxiao111| 32| wangxiao111|      no|            2|       17|         5|  null|
|      0|     female| 22| wangxiao111|      no|            2|       12|         1|  null|
|      0|       male| 57|          15|     yes|            2|       14|         4|     4|
|      0|     female| 32|          15|     yes|            4|       16|         1|     2|
|      0|       male| 22|         1.5|      no|            4|       14|         4|     5|
+-------+-----------+---+------------+--------+-------------+---------+----------+------+

scala> val res3 = data1.na.fill(Map("gender" -> "wangxiao222", "yearsmarried" -> "wangxiao567"))
res3: org.apache.spark.sql.DataFrame = [affairs: string, gender: string ... 7 more fields]

scala> res3.limit(10).show()
+-------+-----------+---+------------+--------+-------------+---------+----------+------+
|affairs|     gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+-----------+---+------------+--------+-------------+---------+----------+------+
|      0|       male| 37|          10|      no|            3|       18|         7|     4|
|      0|wangxiao222| 27| wangxiao567|      no|            4|       14|         6|  null|
|      0|wangxiao222| 32| wangxiao567|     yes|            1|       12|         1|  null|
|      0|wangxiao222| 57| wangxiao567|     yes|            5|       18|         6|  null|
|      0|wangxiao222| 22| wangxiao567|      no|            2|       17|         6|  null|
|      0|wangxiao222| 32| wangxiao567|      no|            2|       17|         5|  null|
|      0|     female| 22| wangxiao567|      no|            2|       12|         1|  null|
|      0|       male| 57|          15|     yes|            2|       14|         4|     4|
|      0|     female| 32|          15|     yes|            4|       16|         1|     2|
|      0|       male| 22|         1.5|      no|            4|       14|         4|     5|
+-------+-----------+---+------------+--------+-------------+---------+----------+------+

scala> data1.filter("gender is null").limit(10).show
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
|      0|  null| 27|        null|      no|            4|       14|         6|  null|
|      0|  null| 32|        null|     yes|            1|       12|         1|  null|
|      0|  null| 57|        null|     yes|            5|       18|         6|  null|
|      0|  null| 22|        null|      no|            2|       17|         6|  null|
|      0|  null| 32|        null|      no|            2|       17|         5|  null|
+-------+------+---+------------+--------+-------------+---------+----------+------+

scala> data1.filter(data1("gender").isNull).limit(10).show
+-------+------+---+------------+--------+-------------+---------+----------+------+
|affairs|gender|age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+---+------------+--------+-------------+---------+----------+------+
|      0|  null| 27|        null|      no|            4|       14|         6|  null|
|      0|  null| 32|        null|     yes|            1|       12|         1|  null|
|      0|  null| 57|        null|     yes|            5|       18|         6|  null|
|      0|  null| 22|        null|      no|            2|       17|         6|  null|
|      0|  null| 32|        null|      no|            2|       17|         5|  null|
+-------+------+---+------------+--------+-------------+---------+----------+------+

scala> math.sqrt(-1.0)
res32: Double = NaN

scala> math.sqrt(-1.0).isNaN()
res33: Boolean = true
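The same null-handling calls can be run outside the shell. Below is a minimal sketch, assuming Spark 2.x or later on the classpath and a small hypothetical three-column dataset standing in for the original `data`; the object name `NullHandlingSketch`, the sample rows, and the placeholder `"unknown"` are illustrative assumptions, not part of the original session.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical standalone version of the shell session; in spark-shell, `spark` already exists.
object NullHandlingSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")               // local mode, for illustration only
    .appName("null-handling-sketch")
    .getOrCreate()

  def counts(): Map[String, Long] = {
    import spark.implicits._

    // Hypothetical sample rows; every column is a string, as in data1 above.
    val df = Seq[(String, String, String)](
      ("male", "10", "4"),
      (null, null, null),
      ("female", "15", null),
      ("male", "1.5", "5")
    ).toDF("gender", "yearsmarried", "rating")

    Map(
      // na.drop() with no arguments removes rows that have a null in any column.
      "completeRows" -> df.na.drop().count(),
      // Restricting drop to one column keeps rows whose other columns are null.
      "withYears" -> df.na.drop(Seq("yearsmarried")).count(),
      // na.fill(String) replaces nulls in every string column.
      "nullsAfterFillAll" -> df.na.fill("unknown").filter(col("rating").isNull).count(),
      // Filling only two columns leaves the nulls in rating untouched.
      "nullRatingsAfterPartialFill" ->
        df.na.fill("unknown", Array("gender", "yearsmarried")).filter(col("rating").isNull).count(),
      // A Map gives each column its own replacement value.
      "zeroYearsAfterMapFill" ->
        df.na.fill(Map("gender" -> "unknown", "yearsmarried" -> "0"))
          .filter(col("yearsmarried") === "0").count(),
      // The SQL-string and Column forms of the null filter select the same rows.
      "nullGenderSql" -> df.filter("gender is null").count(),
      "nullGenderColumn" -> df.filter(col("gender").isNull).count()
    )
  }

  def main(args: Array[String]): Unit = {
    counts().foreach { case (name, n) => println(s"$name = $n") }
    spark.stop()
  }
}
```

In spark-shell you can paste only the body of `counts()` and rely on the shell's existing `spark` session.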
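The last two commands show that `math.sqrt(-1.0)` yields NaN, which is an ordinary `Double` value rather than a null. A minimal sketch of why that distinction matters in a DataFrame, assuming Spark 2.x or later and a hypothetical numeric `score` column (the object name `NaNVersusNullSketch` and the sample values are assumptions): `isNull` does not match NaN, `isnan` from `org.apache.spark.sql.functions` does, and `na.drop()` and the numeric `na.fill` overloads treat both null and NaN as missing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, isnan}

// Hypothetical standalone sketch; in spark-shell, reuse the existing `spark` session.
object NaNVersusNullSketch {
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")               // local mode, for illustration only
    .appName("nan-vs-null-sketch")
    .getOrCreate()

  def counts(): Map[String, Long] = {
    import spark.implicits._

    // Hypothetical scores: one ordinary value, one NaN (as from math.sqrt(-1.0)), one null.
    val scores = Seq[(String, Option[Double])](
      ("a", Some(1.0)),
      ("b", Some(math.sqrt(-1.0))),
      ("c", None)
    ).toDF("id", "score")

    Map(
      // isNull matches only the missing value, not NaN.
      "isNull" -> scores.filter(col("score").isNull).count(),
      // isnan matches only the NaN value; isnan(null) evaluates to false.
      "isNaN" -> scores.filter(isnan(col("score"))).count(),
      // na.drop() discards rows containing either null or NaN.
      "afterDrop" -> scores.na.drop().count(),
      // na.fill(Double) replaces both null and NaN in numeric columns.
      "filledWithZero" -> scores.na.fill(0.0).filter(col("score") === 0.0).count()
    )
  }

  def main(args: Array[String]): Unit = {
    counts().foreach { case (name, n) => println(s"$name = $n") }
    spark.stop()
  }
}
```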
Spark DataFrame: checking for and handling null values