Main excerpt from http://dblab.xmu.edu.cn/blog/290-2/
Brief introduction
This guide describes the Hadoop Distributed File System (HDFS) and walks the reader through hands-on operation of it. HDFS is one of the core components of Hadoop; if Hadoop is already installed, the HDFS component is included and does not need to be installed separately.
Using the Java API to interact with HDFS
The shell commands described earlier are essentially applications of the Java API: they interact with Hadoop's various file systems by invoking Java API calls. If you want to learn more about Hadoop, the official Hadoop API documentation describes the capabilities of each API, and you can visit the Hadoop website to browse it.
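For example, the shell command "./bin/hdfs dfs -ls /" corresponds roughly to a FileSystem.listStatus() call in the Java API. The following is a minimal sketch of this correspondence (the address "hdfs://192.168.3.236:9000" is the one used later in this guide and should be adjusted to your own cluster; the class name ListRootDir is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRootDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.3.236:9000"); // adjust to your cluster
        FileSystem fs = FileSystem.get(conf);
        // Equivalent of "hdfs dfs -ls /": list the entries under the root directory
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}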
Interacting with HDFS through the Java API requires writing Java programs in an IDE; this guide uses IntelliJ IDEA.
(i) Installing IntelliJ IDEA on Ubuntu
Download the trial version ideaiu-2018.1.1.tar.gz directly from the JetBrains website.
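A typical way to install from the downloaded archive (assuming it was saved to the current directory; the extracted directory name varies by build, so the wildcard below is illustrative):

sudo tar -zxf ideaiu-2018.1.1.tar.gz -C /usr/local
# Launch IDEA from the extracted directory's bin subdirectory
cd /usr/local/idea-IU-*/bin
./idea.sh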
(ii) Creating a project in IDEA
Click Create New Project
Select a Java project. If the SDK drop-down does not show 1.8, click the New button next to it to add the appropriate SDK; the default location is /usr/lib/jvm/java-8-openjdk-amd64.
Enter the project name "HDFSExample" after "Project name" and select "Use default location" so that all files of this Java project are saved to the "/home/hadoop/HDFSExample" directory, then click the "Next >" button at the bottom of the interface to complete the setup.
(iii) Add the jar packages needed for the project
Add the required jar packages under File > Project Structure.
In this interface, load the jar packages the Java project needs; they contain the Java APIs that can access HDFS. These jar packages are located in the Hadoop installation directory of the Linux system, which for this tutorial is "/usr/local/hadoop/share/hadoop". Click the add button in the interface to select them.
To write a Java application that interacts with HDFS, you generally need to add the following jar packages to the Java project:
(1) hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar under the "/usr/local/hadoop/share/hadoop/common" directory;
(2) all jar packages under the "/usr/local/hadoop/share/hadoop/common/lib" directory;
(3) hadoop-hdfs-2.7.1.jar and hadoop-hdfs-nfs-2.7.1.jar under the "/usr/local/hadoop/share/hadoop/hdfs" directory;
(4) All jar packages in the "/usr/local/hadoop/share/hadoop/hdfs/lib" directory.
For example, you would add hadoop-common-2.7.1.jar and hadoop-nfs-2.7.1.jar from the "/usr/local/hadoop/share/hadoop/common" directory to the current Java project, and likewise for the others; a command-line check is sketched below.
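If you want to verify these jar locations outside the IDE, the hadoop classpath command prints the full set of jar directories listed above. A minimal sketch, assuming the Hadoop installation at /usr/local/hadoop and the source file from the next section:

cd /usr/local/hadoop
# Print the classpath covering all of the jar directories listed above
./bin/hadoop classpath
# Compile the example from the next section against that classpath
javac -cp "$(./bin/hadoop classpath)" HDFSFileIfExist.java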
(iv) Writing Java application code
Enter the name of the new Java class file; here the name "HDFSFileIfExist" is used. The other settings can keep their defaults; then click the "OK" button in the bottom right corner of the interface.
A source code file named "HDFSFileIfExist.java" is created. Enter the following code in the file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileIfExist {
    public static void main(String[] args) {
        try {
            String fileName = "test";
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.3.236:9000"); // adjust to your actual environment
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            if (fs.exists(new Path(fileName))) {
                System.out.println("File exists");
            } else {
                System.out.println("File does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
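Beyond the existence check, the same FileSystem handle supports the other basic file operations. As a minimal sketch under the same assumptions (the fs.defaultFS address below is the one used above and must match your cluster; the class name HDFSFileCreate is illustrative), the following program creates a file named "test" on HDFS and writes a line of text to it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HDFSFileCreate {
    public static void main(String[] args) {
        try {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://192.168.3.236:9000"); // adjust to your cluster
            conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
            FileSystem fs = FileSystem.get(conf);
            // Create (or overwrite) the file "test" and write one line of text
            FSDataOutputStream out = fs.create(new Path("test"));
            out.writeUTF("Hello HDFS");
            out.close();
            fs.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Running this before HDFSFileIfExist would change the output of the existence check to "File exists".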
(v) Compile and run the program
Before you start compiling the program, make sure Hadoop is up and running. If it is not, open a Linux terminal and enter the following commands to start it:
cd /usr/local/hadoop
./sbin/start-dfs.sh
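To confirm that HDFS started successfully, you can run the JDK's jps command; on the pseudo-distributed setup assumed here, the HDFS daemon processes should appear in its output:

jps
# Should list NameNode, DataNode, and SecondaryNameNode (plus Jps itself)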
In the Project window, select the HDFSFileIfExist class, right-click, and choose Run to see the result. In the Java code we test whether HDFS contains a file named "test"; adjust the name to your actual situation.
(vi) Deployment of applications
Here is how to build a jar package from the Java application so it can run on the Hadoop platform. First, create a new directory named myapp under the Hadoop installation directory to hold our own Hadoop applications, executing the following commands in a Linux terminal:
cd /usr/local/hadoop
mkdir myapp
Then, in the IDEA interface, open File > Project Structure and make the appropriate selections:
Then select the class to export, and remove the other dependent jar entries from the artifact so that only your own code is included.
Select Build on the menu bar and choose Build Artifacts.
Then test the program:
cp out/artifacts/HDFSExample_jar/HDFSExample.jar /usr/local/hadoop/
cd /usr/local/hadoop
./bin/hadoop jar HDFSExample.jar
The output here is:
File does not exist
If you did not remove the dependent classes when exporting above, you can also run the jar directly:
java -jar ./HDFSExample.jar
and get the same result.