products is a table of commodity price changes; orders records each purchase of a commodity and its date. The goal is to match orders to products with a non-equi join in Spark SQL and look up the price of the item in each order.
The slowly changing price table. Wangzai Milk (旺仔牛奶) had a price change, so it appears twice with different validity periods:
scala> val products = sc.parallelize(Array(
     |   ("旺仔牛奶", "2017-01-01", "2018-01-01", 4),
     |   ("旺仔牛奶", "2018-01-02", "2020-01-01", 5),
     |   ("王老吉", "2017-01-02", "2019-01-01", 5),
     |   ("卫龙辣条", "2010-01-01", "2020-01-01", 2)
     | )).toDF("name", "startDate", "endDate", "price")
products: org.apache.spark.sql.DataFrame = [name: string, startDate: string ... 2 more fields]

scala> products.show()
+----+----------+----------+-----+
|name| startDate|   endDate|price|
+----+----------+----------+-----+
|旺仔牛奶|2017-01-01|2018-01-01|    4|
|旺仔牛奶|2018-01-02|2020-01-01|    5|
| 王老吉|2017-01-02|2019-01-01|    5|
|卫龙辣条|2010-01-01|2020-01-01|    2|
+----+----------+----------+-----+
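Note that the date columns here are plain strings; the range comparisons in the join below work only because yyyy-MM-dd strings sort in the same order as the dates they represent. A minimal sketch of converting them to real DateType columns instead, assuming a Spark version where the one-argument org.apache.spark.sql.functions.to_date is available (the productsTyped/ordersTyped names are chosen here for illustration):

import org.apache.spark.sql.functions.to_date

// Hypothetical variant: cast the yyyy-MM-dd strings to DateType so the
// range predicates compare actual dates instead of strings.
val productsTyped = products
  .withColumn("startDate", to_date($"startDate"))
  .withColumn("endDate", to_date($"endDate"))
val ordersTyped = orders.withColumn("date", to_date($"date"))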
The orders table (order date, product name). Wangzai Milk has one order in each of its two price periods:
scala> val orders = sc.parallelize(Array(
     |   ("2017-06-01", "旺仔牛奶"),
     |   ("2017-07-01", "王老吉"),
     |   ("2018-03-01", "旺仔牛奶")
     | )).toDF("date", "product")
orders: org.apache.spark.sql.DataFrame = [date: string, product: string]

scala> orders.show
+----------+-------+
|      date|product|
+----------+-------+
|2017-06-01|   旺仔牛奶|
|2017-07-01|    王老吉|
|2018-03-01|   旺仔牛奶|
+----------+-------+
A non-equi join computes each commodity's price at the time of the order. Note that the two Wangzai Milk orders, placed in different periods, match different prices:
scala> orders.join(products, $"product" === $"name" && $"date" >= $"startDate" && $"date" <= $"endDate").show()
+----------+-------+----+----------+----------+-----+
|      date|product|name| startDate|   endDate|price|
+----------+-------+----+----------+----------+-----+
|2017-07-01|    王老吉| 王老吉|2017-01-02|2019-01-01|    5|
|2017-06-01|   旺仔牛奶|旺仔牛奶|2017-01-01|2018-01-01|    4|
|2018-03-01|   旺仔牛奶|旺仔牛奶|2018-01-02|2020-01-01|    5|
+----------+-------+----+----------+----------+-----+
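For comparison, the same non-equi join can be written as a SQL string; a minimal sketch, assuming the DataFrames are registered as temporary views (the view names orders and products are chosen here for illustration):

orders.createOrReplaceTempView("orders")
products.createOrReplaceTempView("products")

// BETWEEN expresses the same range condition as the two comparisons above.
spark.sql("""
  SELECT o.date, o.product, p.price
  FROM orders o
  JOIN products p
    ON o.product = p.name
   AND o.date BETWEEN p.startDate AND p.endDate
""").show()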
This is Spark SQL's non-equi join: the join condition combines an equality on the product name with range comparisons on the order date.
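Because the condition above still contains one equality (product = name), Spark can usually plan it as a hash or sort-merge join on that key and apply the range predicates as additional join filters, whereas a join with no equality condition at all typically falls back to a nested loop join. A sketch of checking which strategy was chosen:

// Print the physical plan; look for SortMergeJoin, BroadcastHashJoin,
// or BroadcastNestedLoopJoin in the output.
orders.join(products,
  $"product" === $"name" && $"date" >= $"startDate" && $"date" <= $"endDate"
).explain()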