Hadoop Learning 2
These exercises assume that a pseudo-distributed Hadoop system has already been built. For an introduction to pseudo-distributed installation, see: http://www.powerxing.com/install-hadoop/
Exercise 1: Write a Java program that implements the following functions:
1. Upload files to HDFS
2. Download files from HDFS to the local file system
3. Show the file directory
4. Move files
5. Create a folder
6. Remove a folder
package cn.itcast.hadoop.hdfs;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.commons.compress.utils.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.junit.Before;
import org.junit.Test;

public class temp {

    static FileSystem fs = null;

    /*
     * Initialization: connect to HDFS before each test.
     */
    @Before
    public void init() throws IOException {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://zpfbuaa:9000/");
        fs = FileSystem.get(configuration);
    }

    /*
     * Upload a file by copying a local input stream into an HDFS output stream.
     */
    @Test
    public void upload() throws IOException {
        // init() already runs through @Before, so it is not called again here.
        Path dstPath = new Path("hdfs://zpfbuaa:9000/aa/my.jar");
        // try-with-resources closes both streams so the data is flushed to HDFS.
        try (FileInputStream is = new FileInputStream("/home/hadoop/download/my.jar");
             FSDataOutputStream os = fs.create(dstPath)) {
            IOUtils.copy(is, os);
        }
    }

    /*
     * Upload a file to HDFS with the built-in helper.
     */
    @Test
    public void upload2() throws IOException {
        fs.copyFromLocalFile(new Path("/home/hadoop/download/my.jar"),
                new Path("hdfs://zpfbuaa:9000/aaa/bbb/ccc/my3.jar"));
    }

    /*
     * Download a file to the local file system.
     * Left empty here; main() below performs the download with streams.
     */
    public void download() {
    }

    /*
     * List file information.
     */
    @Test
    public void listfile() throws FileNotFoundException, IllegalArgumentException, IOException {
        // Recursively list every file under the root directory.
        RemoteIterator<LocatedFileStatus> filesIterator = fs.listFiles(new Path("/"), true);
        while (filesIterator.hasNext()) {
            LocatedFileStatus fileStatus = filesIterator.next();
            Path path = fileStatus.getPath();
            String filename = path.getName();
            System.out.println(filename);
        }

        System.out.println("---------------------------------------------");

        // List only the direct children of the root directory.
        FileStatus[] listStatus = fs.listStatus(new Path("/"));
        for (FileStatus status : listStatus) {
            String name = status.getPath().getName();
            System.out.println(name + (status.isDirectory() ? " is a dir" : " is a file"));
        }
    }

    /*
     * Create a directory (missing parent directories are created as well).
     */
    @Test
    public void makdir() throws IllegalArgumentException, IOException {
        fs.mkdirs(new Path("/aaa/bbb/ccc"));
    }

    /*
     * Delete a directory recursively.
     */
    @Test
    public void rm() throws IllegalArgumentException, IOException {
        fs.delete(new Path("/aaa/bbb"), true);
    }

    /*
     * Download a file from HDFS by copying an HDFS input stream into a local file.
     */
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.set("fs.defaultFS", "hdfs://zpfbuaa:9000/");
        fs = FileSystem.get(configuration);
        try (FSDataInputStream is = fs.open(new Path("/jdk-7u65-linux-i586.tar.gz"));
             FileOutputStream os = new FileOutputStream("/home/hadoop/download/my.jar")) {
            IOUtils.copy(is, os);
        }
    }
}
Exercise 2: Write a Java program that implements socket-based communication and function calls between a client and a server
LoginServiceImpl class (the server-side implementation):
package cn.itcast.hadoop.rpc;

public class LoginServiceImpl implements LoginServiceInterface {

    @Override
    public String Login(String username, String password) {
        return username + " logged in successfully!";
    }
}
LoginServiceInterface (the interface shared by the server side and the local side):
package cn.itcast.hadoop.rpc;

public interface LoginServiceInterface {

    // Protocol version; the client must request the same value.
    public static final long versionID = 1L;

    public String Login(String username, String password);
}
starter class (creates and starts the server):
package cn.itcast.hadoop.rpc;

import java.io.IOException;

import org.apache.hadoop.HadoopIllegalArgumentException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.RPC.Builder;
import org.apache.hadoop.ipc.RPC.Server;

public class starter {

    public static void main(String[] args) throws HadoopIllegalArgumentException, IOException {
        Builder builder = new RPC.Builder(new Configuration());
        builder.setBindAddress("zpfbuaa")
               .setPort(10000)
               .setProtocol(LoginServiceInterface.class)
               .setInstance(new LoginServiceImpl());
        Server server = builder.build();
        // build() only creates the server; start() makes it listen on the port.
        server.start();
    }
}
LoginController class (the local client that logs in):
package cn.itcast.hadoop.rpc;

import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class LoginController {

    public static void main(String[] args) throws IOException {
        // The second argument is the client version; it must match versionID on the server.
        LoginServiceInterface proxy = RPC.getProxy(LoginServiceInterface.class, 1L,
                new InetSocketAddress("zpfbuaa", 10000), new Configuration());
        String result = proxy.Login("zpfbuaa", "123456789");
        System.out.println(result);
        // Release the connection held by the proxy.
        RPC.stopProxy(proxy);
    }
}
Notes:
1. To simulate a remote call, place the LoginServiceImpl class, the LoginServiceInterface interface, and the starter class on the virtual machine. Keep the LoginController class and a copy of LoginServiceInterface on the local machine.
2. Start the service first. In the example above, the server listens on port 10000 of the VM.
3. Easy to overlook: versionID. Different versions of a protocol carry different version numbers. In the example above, the version number is declared as a final long with the value 1L.
4. Pay attention to jar packages and version control.
5. The local side and the server side must use the same interface class. To prevent a version mismatch during a call, the version number (versionID) must be declared when the proxy instance is created, so that different versions can be told apart by their version numbers.
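To make notes 3 and 5 more concrete, here is a minimal sketch of a version check in plain Java. It is only an illustration under stated assumptions, not Hadoop's actual code: the class VersionCheckSketch and the helpers protocolVersion and versionsMatch are hypothetical names, and the nested interface is a local copy of LoginServiceInterface. The sketch reads versionID through reflection and compares it with the version the client asks for.

```java
import java.lang.reflect.Field;

class VersionCheckSketch {

    /** Local copy of the protocol interface, carrying its version number. */
    interface LoginServiceInterface {
        long versionID = 1L; // interface fields are implicitly public static final

        String Login(String username, String password);
    }

    /** Reads the static versionID field of a protocol interface via reflection. */
    static long protocolVersion(Class<?> protocol) {
        try {
            Field field = protocol.getField("versionID");
            return field.getLong(null); // static field, so no instance is needed
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    protocol.getName() + " does not declare a readable versionID", e);
        }
    }

    /** Returns true when the version requested by the client equals the protocol's version. */
    static boolean versionsMatch(Class<?> protocol, long clientVersion) {
        return protocolVersion(protocol) == clientVersion;
    }

    public static void main(String[] args) {
        System.out.println(versionsMatch(LoginServiceInterface.class, 1L)); // prints "true"
        System.out.println(versionsMatch(LoginServiceInterface.class, 2L)); // prints "false"
    }
}
```

In this sketch a client that passes 2L is rejected, because the interface it is checked against declares 1L.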
Hadoop's remote call mechanism: RPC
The main steps are as follows:
1. The local socket and the interface class are wrapped in a proxy, which produces a dynamic local proxy instance.
2. The instance calls the desired function and passes in the parameters.
3. The local socket obtains the called function and its input parameters from the dynamic proxy.
4. A network transport protocol connects the local socket to the remote server socket so that the information can be transmitted.
5. The server socket receives the called function and input parameters and generates a dynamic server-side proxy instance.
6. The server-side instance calls the server-side function and passes in the received parameters.
7. The result of the function call is returned to the server socket.
8. The server socket sends the result back to the local socket over the network transport protocol.
9. The local socket passes the result to the local dynamic proxy.
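The steps above can be sketched in plain Java with java.lang.reflect.Proxy. This is a simplified illustration, not Hadoop's implementation: RpcSketch, LoginService, getProxy, serve, and toBytes are hypothetical names, and serialized byte arrays stand in for the socket traffic so that the example runs inside one process.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class RpcSketch {

    /** Protocol shared by both sides (hypothetical stand-in for LoginServiceInterface). */
    interface LoginService {
        String login(String username, String password);
    }

    /** Server-side implementation of the protocol. */
    static class LoginServiceImpl implements LoginService {
        @Override
        public String login(String username, String password) {
            return username + " logged in successfully!";
        }
    }

    /** Serializes the given values in order; the bytes play the role of a network message. */
    static byte[] toBytes(Object... values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (Object value : values) {
                out.writeObject(value);
            }
        }
        return buffer.toByteArray();
    }

    /** Steps 5 to 7: decode the request, invoke the real function, encode the result. */
    static byte[] serve(Class<?> protocol, Object instance, byte[] request) throws Exception {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(request))) {
            String methodName = (String) in.readObject();
            Class<?>[] paramTypes = (Class<?>[]) in.readObject();
            Object[] args = (Object[]) in.readObject();
            Method method = protocol.getMethod(methodName, paramTypes);
            Object result = method.invoke(instance, args);
            return toBytes(result);
        }
    }

    /** Steps 1 to 4 and 8 to 9: a dynamic proxy that turns each call into a request message. */
    static <T> T getProxy(Class<T> protocol, Object serverInstance) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Step 3: capture the called function and its parameters.
            byte[] request = toBytes(method.getName(), method.getParameterTypes(), args);
            // Steps 4 and 8: in a real system this round trip would go over sockets.
            byte[] response = serve(protocol, serverInstance, request);
            // Step 9: hand the decoded result back to the caller.
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(response))) {
                return in.readObject();
            }
        };
        Object proxy = Proxy.newProxyInstance(
                protocol.getClassLoader(), new Class<?>[] {protocol}, handler);
        return protocol.cast(proxy);
    }

    public static void main(String[] args) {
        LoginService proxy = getProxy(LoginService.class, new LoginServiceImpl());
        // Step 2: the call looks local, but it is routed through the handler above.
        System.out.println(proxy.login("zpfbuaa", "123456789")); // prints "zpfbuaa logged in successfully!"
    }
}
```

The caller only ever sees the LoginService interface. Everything about encoding the call and reaching the implementation is hidden inside the proxy's handler, which is the part a real RPC framework replaces with network code.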
Advantages of RPC:
1. It separates the controller (the caller) from the implementation.
2. Information can be transmitted effectively through the RPC mechanism.
3. It helps ensure data reliability (each DataNode regularly reports the blocks it stores to the NameNode so that the NameNode can maintain its block information).
Underlying implementation of remote calls:
To see how the RPC mechanism is implemented, start from this line:
FileSystem fs = FileSystem.get(new Configuration());
Set a breakpoint on it and step through the creation of fs in a debugger.