Background
Sqoop is a tool used to transfer data from Hadoop and relational databases (RDBMS) to each other. When using Sqoop, we need to provide the access password for the database. Currently Sqoop supports 4 ways to enter passwords:
- Clear text mode.
- Interactive mode.
- File mode.
- Alias mode.
The author uses the Sqoop in CDH5.10, the version is 1.4.6. In the upcoming demo, we will use MySQL as our relational database.
Clear text mode
Clear text mode is the simplest way. We can use the--password parameter when executing the Sqoop command, so we can access the database by entering the password directly on the command line.
# sqoop list-databases --connect jdbc:mysql://your_mysql_host --username your_mysql_username --password your_mysql_password
Because we typed the plaintext database password in the command line, this introduces the risk of leaking the password. Let's imagine that a malicious "hacker" hacked into your server, and as soon as he tapped the "history" command, he could see the command we had last time and, of course, see the password for our database. So we try not to use this dangerous way.
Interactive mode
Interactive mode is a common way to provide passwords. When you execute the sqoop command with the-p parameter, the terminal will prompt you for the password after you press ENTER:
# sqoop list-databases --connect jdbc:mysql://your_mysql_host --username your_mysql_username -P
In this way, there is no risk of leaking passwords, because no one can see our passwords, only the Sqoop program knows. However, this approach has a disadvantage, because it needs to artificially enter the password (interactive), so it can only be executed under the terminal. So this is a good way to do some simple tests on the command line, and if you want to execute the Sqoop script in some backend services (like Oozie), we need to do it in a different way.
File mode
File mode is commonly used in the background to perform Sqoop script scenarios. It does not require an artificially typed password and is more secure than the clear text mode. Here's how it works:
# echo -n "your_mysql_password" > /home/xxx/.mysql.password# chmod 400 /home/xxx/.mysql.password# sqoop list-databases --connect jdbc:mysql://your_mysql_host --username your_mysql_username --password-file file:///home/xxx/.mysql.password
First we need to create a file to hold our passwords, such as the. mysql.password file in the example. There's a hole in here. You cannot use VIM to create the file, because vim automatically adds a newline character to the end of the file, and Sqoop does not ignore the end of the newline character, so the password that contains the newline character is submitted to the database, causing the access to fail. So we can use the Echo-n command to avoid the appearance of end-of-line newline characters.
We then set the access permission for the file to 400, that is, only the current user has the readable permission. Finally, the path to the password file is specified by the--password-file parameter when executing the Sqoop command. We can also specify a password file located in HDFs, as long as the path is specified by replacing "file" with "HDFs".
But the file mode also has the disadvantage, that is the password or in the form of plaintext stored in the file, as long as the "hacker" can read the contents of the password file, it can still get our password.
Alias mode
The alias pattern is a relatively new way to introduce articles on the web with less. In this way, you can perfectly solve the problem of storing passwords in plaintext in file mode. Starting with sqoop1.4.5, Sqoop supports the use of passwords stored in Java KeyStore so that we do not have to store passwords in plaintext in the file.
First we use the Hadoop credential create [alias_name]-provider [hdfs_location] command (which is only available after Hadoop 2.6.0) for creating passwords and password aliases in KeyStore:
# hadoop credential create mysql.pwd.alias -provider jceks://hdfs/user/password/mysql.pwd.jceks
The command executes as shown:
Enter the password for our database after enter alias password. After execution, the program creates a Mysql.pwd.jceks file under the/user/password/of HDFs, and Mysql.pwd.alias is our password alias. We can use Mysql.pwd.alias instead of our real database password. When executing the sqoop command, we can use the--password-alias parameter, the value of which is the password alias we just specified:
# sqoop list-databases -Dhadoop.security.credential.provider.path=jceks://hdfs/user/password/mysql.pwd.jceks --connect jdbc:mysql://master --username root --password-alias mysql.pwd.alias
So is this a way to hide our passwords? Open the Mysql.pwd.jceks file, we can only see a piece of garbled characters, which means that the alias mode is good to hide our real database password.
Summarize
This article introduces 4 kinds of ways to provide database password in Sqoop, it is suggested that we should use alias mode to provide the password of database in future work, because this method is most safe and convenient.
Sqoop 4 ways to provide database passwords