Duplicate columns after a Spark join (org.apache.spark.sql.AnalysisException: Reference '*' is ambiguous)

Source: Internet
Author: User

Problem

DataFrame provides a powerful join operation, but the joined result often contains duplicate columns. If you are not careful, later operations that refer to those columns fail.

When two columns with the same name exist side by side and you refer to one of them by name, Spark fails with an error like: org.apache.spark.sql.AnalysisException: Reference 'key2' is ambiguous

Example

1. Create two demo DataFrames

// spark-shell (provides sc and the toDF implicits); the scores are illustrative placeholders
val df = sc.parallelize(Array(("yuwen", "zhangsan", 80), ("yuwen", "lisi", 90), ("shuxue", "zhangsan", 90), ("shuxue", "lisi", 95))).toDF("course", "name", "score")

Display: df.show()

// spark-shell; the scores are illustrative placeholders
val df2 = sc.parallelize(Array(("yuwen", "zhangsan", 85), ("shuxue", "zhangsan", 92))).toDF("course", "name", "score")

Display: df2.show()

Join the two DataFrames:

val joined = df.join(df2, df("course") === df2("course") && df("name") === df2("name"), "left_outer")

Display: joined.show()

This is where the problem arises. The joined result contains three pairs of identically named columns (course, name and score), and any operation that refers to one of them by name fails.
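A minimal, self-contained sketch of the failure, written as a Scala script against a local Spark session. The session settings (`local[*]`, the app name) and the score values are assumptions for local testing, not taken from the original post. It rebuilds the two DataFrames, performs the same left outer join, and shows that resolving `name` by string raises `AnalysisException`. The exact message text differs between Spark versions.

```scala
import org.apache.spark.sql.{AnalysisException, SparkSession}
import scala.util.{Failure, Success, Try}

// Assumed local setup; spark-shell already provides `spark`
val spark = SparkSession.builder().master("local[*]").appName("ambiguous-join-demo").getOrCreate()
import spark.implicits._

// Hypothetical placeholder scores
val df = Seq(("yuwen", "zhangsan", 80), ("yuwen", "lisi", 90),
  ("shuxue", "zhangsan", 90), ("shuxue", "lisi", 95)).toDF("course", "name", "score")
val df2 = Seq(("yuwen", "zhangsan", 85), ("shuxue", "zhangsan", 92)).toDF("course", "name", "score")

// Join on column expressions: every column from both sides is kept
val joined = df.join(df2, df("course") === df2("course") && df("name") === df2("name"), "left_outer")

// Six columns, but only three distinct names
println(joined.columns.mkString(", "))

// select() analyzes its plan eagerly, so the ambiguous name fails right here
val attempt = Try(joined.select("name"))
attempt match {
  case Failure(e: AnalysisException) => println(s"AnalysisException: ${e.getMessage}")
  case Failure(e)                    => println(s"Other failure: $e")
  case Success(_)                    => println("Resolved without error")
}
```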

Solutions

1. When selecting, specify which DataFrame each column comes from

joined.select(df("course"), df("name")).show()
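A runnable sketch of this approach. The local session and placeholder scores are assumptions. It also shows a common alternative that the original post does not use: aliasing each side with `as(...)` and qualifying columns by alias. In both variants, `df2`'s score is renamed to the hypothetical name `score2` so that the output has no duplicate names.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local setup with hypothetical placeholder scores
val spark = SparkSession.builder().master("local[*]").appName("qualified-select").getOrCreate()
import spark.implicits._

val df = Seq(("yuwen", "zhangsan", 80), ("yuwen", "lisi", 90),
  ("shuxue", "zhangsan", 90), ("shuxue", "lisi", 95)).toDF("course", "name", "score")
val df2 = Seq(("yuwen", "zhangsan", 85), ("shuxue", "zhangsan", 92)).toDF("course", "name", "score")

val joined = df.join(df2, df("course") === df2("course") && df("name") === df2("name"), "left_outer")

// Qualify each column through the DataFrame it came from
val picked = joined.select(df("course"), df("name"), df("score"), df2("score").as("score2"))
picked.show()

// Alternative: alias both sides and qualify columns as "alias.column"
val aliased = df.as("a")
  .join(df2.as("b"), $"a.course" === $"b.course" && $"a.name" === $"b.name", "left_outer")
  .select($"a.course", $"a.name", $"a.score", $"b.score".as("score2"))
aliased.show()
```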


2. Drop the redundant columns. In practice you rarely join two tables with identical schemas; usually only a few column names overlap, so you can drop the duplicates you do not need.

joined.drop(df2("name")).show()
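Dropping `df2("name")` removes only one of the three duplicated pairs; `course` and `score` are still doubled. The following sketch drops every `df2` column whose name also occurs in `df`. It uses the same assumed local session and placeholder data and calls `drop(Column)` once per column inside a `foldLeft`. Dropping `df2`'s key columns is safe in a left outer join because the left-side keys are never null. However, this also discards `df2`'s `score`, so rename that column instead if you need it.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local setup with hypothetical placeholder scores
val spark = SparkSession.builder().master("local[*]").appName("drop-duplicates").getOrCreate()
import spark.implicits._

val df = Seq(("yuwen", "zhangsan", 80), ("yuwen", "lisi", 90),
  ("shuxue", "zhangsan", 90), ("shuxue", "lisi", 95)).toDF("course", "name", "score")
val df2 = Seq(("yuwen", "zhangsan", 85), ("shuxue", "zhangsan", 92)).toDF("course", "name", "score")

val joined = df.join(df2, df("course") === df2("course") && df("name") === df2("name"), "left_outer")

// Dropping one column still leaves duplicate names for course and score
val dropOne = joined.drop(df2("name"))
println(dropOne.columns.mkString(", "))

// Drop every df2 column whose name clashes with a df column
val overlap = df.columns.intersect(df2.columns)
val deduped = overlap.foldLeft(joined)((acc, c) => acc.drop(df2(c)))

// Names are unique now, so plain string references resolve
deduped.select("course", "name", "score").show()
```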


3. Avoid the problem entirely by changing how the join is expressed: pass the join keys as a Seq of column names.

df.join(df2, Seq("course", "name")).show()
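Passing a `Seq` of names uses the `join(right, usingColumns, joinType)` overload, which keeps one copy of each join key. Without a join type it is an inner join, unlike the `left_outer` join in the example. Only the keys are merged, so both `score` columns still survive. The sketch below uses the same assumed local session and placeholder scores. It keeps the left outer semantics and renames the non-key column on one side with `withColumnRenamed` before joining. The name `score2` is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local setup with hypothetical placeholder scores
val spark = SparkSession.builder().master("local[*]").appName("using-columns-join").getOrCreate()
import spark.implicits._

val df = Seq(("yuwen", "zhangsan", 80), ("yuwen", "lisi", 90),
  ("shuxue", "zhangsan", 90), ("shuxue", "lisi", 95)).toDF("course", "name", "score")
val df2 = Seq(("yuwen", "zhangsan", 85), ("shuxue", "zhangsan", 92)).toDF("course", "name", "score")

// Left outer join on key names: course and name appear once each
val byName = df.join(df2, Seq("course", "name"), "left_outer")
println(byName.columns.mkString(", ")) // course, name, score, score

// Rename the clashing non-key column on one side before joining
val clean = df.join(df2.withColumnRenamed("score", "score2"), Seq("course", "name"), "left_outer")
clean.select("name", "score", "score2").show()
```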


Reposted from: https://www.cnblogs.com/chushiyaoyue/p/6927488.html
