Learning Hadoop from Zero to Job-Ready: A Guide (Programming)

Guiding questions:
1. What are the basics of Hadoop programming?
2. What problems does Hadoop programming need to be aware of?
3. How do you create a MapReduce program, and what parts does it contain?
4. How do you connect to a cluster remotely from Eclipse, and what problems might you encounter?
5. How do you compile the Hadoop source code?

Reading this article requires some foundation from the two articles below:
Learning Hadoop from Zero to Job-Ready: A Guide (Elementary)
http://www.aboutyun.com/thread-6780-1-1.html
Learning Hadoop from Zero to Job-Ready: A Guide (Intermediate)
http://www.aboutyun.com/thread-7567-1-1.html
If you have read those, this article will pose no problem; it covers the programming side of Hadoop.

Hadoop is a Java framework, but it is also a shift in how we program: instead of a program running on a single client (one computer), work runs across multiple machines, so a task can be decomposed, which greatly improves efficiency.

Since Hadoop is a Java framework, we have to understand Java. There is plenty of material on the web, so learning Java itself is not a difficult task; how far you need to take it is probably what concerns students starting from zero. Languages are interlinked in many ways, but if you are a student still at the foundation stage, the difficulty is not small.

1. Requirements for beginners: you must have the theoretical basics and be able to complete a small project, or at least a few small cases, such as a library management system. Concretely:
(1) Understand objects, interfaces, inheritance, and polymorphism
(2) Be familiar with Java syntax
(3) Master some of the common packages
(4) Be able to use Maven to download code
(5) Be able to use Eclipse, including its shortcut keys and how to open a project

Traditional programmers, because of their rich programming experience, only need to master the development tools:
(1) Be able to use Maven to download code
(2) Be able to use Eclipse, including its shortcut keys and how to open a project
(3) Have a simple familiarity with Java syntax

The above is just the foundation. If you want to develop Hadoop, you also need to:
(1) Be able to compile Hadoop
(2) Be able to use the hadoop-eclipse-plugin to connect to a cluster remotely
(3) Be able to run a Hadoop program

These are the things we need to learn. Whether you are a traditional developer or a student starting from zero, the following articles get you into development:

Learn Hadoop: Java Zero-Basics Learning Route Guide with Video (1)
http://www.aboutyun.com/thread-6920-1-1.html
This article discusses which development tools to use, and even which operating system to consider, followed by basic Java knowledge: variables, functions, and so on.

Learn Hadoop: Getting Started with Java, a Quick Primer (2)
http://www.aboutyun.com/thread-6921-1-1.html
The first article is conceptual; this one is hands-on and shows, in a different way, how to write your first small program.
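For orientation, the kind of "first small program" such a primer builds toward is no more complicated than the sketch below (the class name HelloHadoop is an arbitrary choice, not taken from the linked posts):

```java
// Save as HelloHadoop.java, compile with `javac HelloHadoop.java`,
// and run with `java HelloHadoop`.
public class HelloHadoop {
    public static void main(String[] args) {
        System.out.println("Hello, Hadoop!");
    }
}
```

If javac or java is not found on the command line, that is exactly the environment-variable problem discussed further below.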
Java Zero-Basics: Creating a Project with Eclipse Step by Step, with a Small Example Program
http://www.aboutyun.com/thread-6963-1-1.html
Because the articles above do not really walk through Eclipse concretely, this one specifically describes how to create a project in Eclipse and how to write a small example program on a Java foundation.

Java foundations: the Eclipse programming skills you have to know
http://www.aboutyun.com/thread-6964-1-1.html
These Eclipse basics will serve us well once we are working on a project, and we will use them often.

I. Considering the development environment

With the groundwork above laid, we can start learning to develop Hadoop. But how do we set up the environment? Windows comes in 32-bit and 64-bit versions, and so does the JDK, so match them: use a 32-bit JDK on 32-bit Windows and a 64-bit JDK on 64-bit Windows. The same applies to Linux: 32-bit Linux uses a 32-bit JDK, 64-bit Linux a 64-bit JDK. In more detail, refer to:
Issues a zero-basics developer needs to consider when developing Hadoop with Java
http://www.aboutyun.com/thread-6824-1-1.html

II. J2SE, J2EE, J2ME

We need to understand the Java language as a whole. Java comes in three editions:
J2EE, the Java Enterprise Edition, mainly used for web development.
J2SE, the Java Standard Edition, which lacks some of the Enterprise Edition's features; under normal circumstances, "Java application development" refers to J2SE development.
J2ME, the Java Micro Edition, mainly used for mobile-phone development.
If we want to process and display data, we can use Java EE. For more detail, refer to:
Hadoop development: the differences between the Java editions (J2SE, J2EE, J2ME) for zero-basics learners
http://www.aboutyun.com/thread-6904-1-1.html

III. Using Java

Now that we have a certain understanding of Java, we begin to use it.

1. Environment variable configuration. This takes some adapting for developers used to an integrated environment such as .NET, where you install the development environment (Visual Studio) and develop directly. Why do I need to configure environment variables at all? They let the system find the JDK commands; one of the conveniences of .NET is simply that this is encapsulated, so you never worry about it. For Java we need to configure JAVA_HOME and the PATH. For more detail, refer to: Hadoop development: Java zero-basics development tools and environment variable configuration. (A small verification snippet follows at the end of this section.)

2. Development tools. There are many kinds, different people have different habits, and the tools differ accordingly. Common choices include: (1) Eclipse (2) MyEclipse (3) Maven. For more on choosing a tool, see:
Hadoop development: which development tools are most suitable for zero-basics Java development
http://www.aboutyun.com/thread-6892-1-1.html
Maven can be used together with Eclipse or on its own, and it is used heavily in later development, for example when we download the Hadoop source code and compile Hadoop. For learning Maven, refer to:
Source compilation: Maven series video tutorial roundup
http://www.aboutyun.com/thread-7972-1-1.html

3. Java compilation. Compiled Java can run anywhere because of the JVM. For why Java source code is compiled and what the compiled result looks like, see:
Java zero-basics, learning Hadoop: why compile Java source code and what the effect is
http://www.aboutyun.com/thread-7620-1-1.html

4. How to open a Java project. Opening a Java project is somewhat special: unlike, say, a .NET project, which you open by clicking its icon directly, a Java project is brought into Eclipse through Import. For details, refer to:
Zero-basics: how to import a Java project into Eclipse
http://www.aboutyun.com/thread-8213-1-1.html
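As a minimal check of the environment-variable setup from item 1 (the class name EnvCheck is an arbitrary choice): if this compiles and runs at all, the JDK is on the PATH, and the output shows what the process actually sees.

```java
// Print the environment a Java process sees, to verify that
// JAVA_HOME and PATH were configured as described above.
public class EnvCheck {
    public static void main(String[] args) {
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
        System.out.println("PATH         = " + System.getenv("PATH"));
        System.out.println("java.version = " + System.getProperty("java.version"));
    }
}
```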
5. Java resources download. The above covers some basic knowledge, perhaps not comprehensively. If you find yourself lacking in an area, there are two approaches: 1. search Baidu and watch videos on whatever topic you need; 2. if you prefer, download the resources below:
JavaWeb books: library management system source code, MySQL version
http://www.aboutyun.com/thread-7269-1-1.html
Java foundations for Hadoop development: JavaWeb video sharing
http://www.aboutyun.com/thread-7117-1-1.html
Hundreds of GB of Java files, shared
http://www.aboutyun.com/thread-7955-1-1.html
Hundreds of GB of Java content for download: self-study, introduction, advanced applications, cases, etc.
http://www.aboutyun.com/thread-6195-1-1.html

With the Java foundation in place, we can finally develop. Development itself is not that difficult; the problems frequently encountered are these:

1. Cannot connect to the cluster from Windows when using the plugin. One cause is an inconsistent user name. Resolutions:
(1) In a test environment, you can cancel the HDFS user permission check: open conf/hdfs-site.xml, find the dfs.permissions property, and change it to false (the default is true). (For version 1.2.1, only this method works.) A concrete snippet is given at the end of this section.
(2) Modify the Hadoop location parameters: in the Advanced Parameters tab, locate the hadoop.job.ugi key and change it to the user name that starts Hadoop.
(3) Change the Windows machine's user name to the Hadoop user name.

2. Permission checks when running a MapReduce program. See:
Summary of Hadoop development methods and operation guidance
http://www.aboutyun.com/thread-6950-1-1.html
Hadoop can be developed with or without the plugin; if you develop without the plugin you may also need to modify permissions:
How to resolve the permissions problem when running MapReduce from Eclipse on Windows
http://www.aboutyun.com/thread-7660-1-1.html

3. Missing hadoop.dll and winutils.exe.
(1) Missing winutils.exe returns the error: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. See:
Problems (and resolutions) when using the hadoop-eclipse-plugin on Windows to remotely develop and run MapReduce
http://www.aboutyun.com/thread-8311-1-1.html
(2) Missing hadoop.dll gives an error like: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Workaround: 1. put hadoop.dll into Hadoop's bin directory; 2. configure HADOOP_HOME (an absolute path) and add Hadoop's bin directory to the PATH, then restart Eclipse (or the machine) so the new environment variables take effect.

For packages and plugin downloads, you can find Hadoop-family, Storm, Spark, Linux, Flume and other jar packages and installation packages here:
Installation package roundup download (continuously updated)
http://www.aboutyun.com/thread-8178-1-1.html

The above summarizes the problems our development environment frequently runs into. With these preventive shots taken, the connection step later will go much more smoothly. Hadoop can be developed under Linux or under Windows.
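For reference, the permission-check change in problem 1 above amounts to adding this property to conf/hdfs-site.xml on the cluster; this is a sketch of the edit the post describes, for test environments only:

```xml
<!-- conf/hdfs-site.xml: disable HDFS permission checking (test clusters only).
     The default is true; on Hadoop 2.x the equivalent key is dfs.permissions.enabled. -->
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
```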
Here is how to connect to a Hadoop cluster remotely. The remote-connection configuration differs across versions, especially the ports, but the overall steps are similar. The following cover Hadoop 1.x and Hadoop 2.x respectively.

1. Plugin-based remote connection.
For hadoop1.x: One way to develop Hadoop: plugin development guidance
http://www.aboutyun.com/thread-6947-1-1.html
For hadoop2.x: Novice guide: using Eclipse on Windows to remotely connect to Hadoop for program development
http://www.aboutyun.com/thread-6001-1-1.html
hadoop2.2 Eclipse linking to HDFS (Hadoop)
http://www.aboutyun.com/thread-8190-1-1.html
Configuring a Hadoop 2.x development environment (Eclipse)
http://www.aboutyun.com/thread-7538-1-1.html

2. Problems with remote connections. Part of these were summarized above: plugins, the missing .dll files, version mismatches, and so on. See also:
Experience summary: using Eclipse on Win7 to connect to Hadoop 2.4 in an Ubuntu virtual machine
http://www.aboutyun.com/thread-7784-1-1.html
Windows 7 + Eclipse: Hadoop application development environment construction and problem summary
http://www.aboutyun.com/thread-8179-1-1.html

3. Running MapReduce. Once connected to the cluster, we can start programming. We can operate on HDFS, as in the following examples:
Hadoop in action: Java programming for HDFS
http://www.aboutyun.com/thread-6500-1-1.html
A Java instance of creating an HDFS file
http://www.aboutyun.com/thread-6779-1-1.html
Summary of errors when operating HDFS from Java
http://www.aboutyun.com/thread-6261-1-1.html
Of course, operating HDFS can run into the permissions problem; modify hdfs-site.xml as above, which we will not repeat. Besides uploading and downloading files on HDFS, we also need to implement some actual functionality, such as WordCount and other simple jobs. Programming such a job involves three parts: 1. the map function, which does the splitting; 2. the reduce function, which processes and then summarizes; 3. the main() driver (a concrete sketch follows below).

4. How to take parameters: implement the Tool interface and run with parameters. For details, refer to: How to write a Hadoop program that takes the input and output paths as parameters.

(1) Creating a MapReduce program. For running MapReduce with parameters, refer to the material below. We can first implement a specific piece of functionality:
MapReduce primary case (1): deduplication with MapReduce
http://www.aboutyun.com/thread-7041-1-1.html
MapReduce primary case (2): sorting data with MapReduce
http://www.aboutyun.com/thread-7046-1-1.html
MapReduce primary case (3): computing an average with MapReduce
http://www.aboutyun.com/thread-7048-1-1.html
Building on the above, here is an example that can be dropped into a project and run directly; of course, you need to create the data files and modify the URI to your actual situation, i.e. change hdfs://... to your own address:
Novice guide: how to create a MapReduce program in the development environment
http://www.aboutyun.com/thread-7945-1-1.html
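To make the three-part structure concrete, here is a hedged sketch of the classic WordCount against the Hadoop 2.x org.apache.hadoop.mapreduce API; it is not the code from the linked post. The driver implements the Tool interface (cf. item 4) so the input and output paths arrive as command-line parameters; all paths and class names are placeholders.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    // 1. The map function: splits each input line into words.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // 2. The reduce function: sums the counts emitted for each word.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // 3. The main() driver: configures and submits the job. Input and
    // output paths come in as parameters via the Tool interface.
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. hdfs://master:9000/input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not exist yet
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new WordCount(), args));
    }
}
```

Run it in Eclipse with two program arguments, e.g. hdfs://master:9000/input and hdfs://master:9000/output, adjusted to your own cluster address as described above.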
If we understand MapReduce deeply, we can convert most traditional programs into MapReduce. For details, refer to:
How to convert a traditional program into MapReduce
http://www.aboutyun.com/thread-8314-1-1.html

Things Hadoop programming needs to be aware of: although Hadoop is written in the Java language, it has its own data types, you may encounter encoding problems, and MapReduce partitioning uses hash computation. The following content is worth understanding:
Hadoop series programming basics: data type introduction and conversion to and from Java data types
http://www.aboutyun.com/thread-7036-1-1.html
Encoding problems to be aware of when debugging Hadoop from Eclipse
http://www.aboutyun.com/thread-6910-1-1.html
Hadoop basics: an introduction to hash values in Java
http://www.aboutyun.com/thread-7560-1-1.html

(2) Running MapReduce. Once the program is created, we have two ways to run it: package it and run it on the cluster, or run it in Eclipse.
Hadoop cluster: how to run a Java jar package, i.e. how to run a MapReduce program
http://www.aboutyun.com/thread-7408-1-1.html
For packaging and running on the cluster (typically something like hadoop jar wordcount.jar WordCount /input /output), refer to:
Java zero-basics: various ways to package Java source code into a jar
http://www.aboutyun.com/thread-7058-1-1.html
Hadoop programming: experience summary on ClassNotFoundException when running from Eclipse versus packaging onto the cluster
http://www.aboutyun.com/thread-7086-1-1.html

(3) Problems encountered while running. Some classic questions were covered at the beginning; here are some related posts.
1. Roundup of developing hadoop2.x Map/Reduce projects in Eclipse
http://www.aboutyun.com/thread-7541-1-1.html
It explains the following issues: 1. How do I create an MR program? 2. How do I configure the run parameters? 3. What happens when HADOOP_HOME is empty? 4. What is hadoop-common-2.2.0-bin-master/bin for? Extension: what is winutils.exe?
2. Win7 Eclipse debugging a CentOS hadoop2.2 MapReduce job: problem solutions
http://www.aboutyun.com/thread-8030-1-1.html
It explains the following questions: 1. After building a MapReduce project, the run fails with "Could not locate executable null"; how do I resolve it? 2. "Could not locate executable ....\hadoop-2.2.0\hadoop-2.2.0\bin\winutils.exe in the Hadoop binaries"; how do I fix it?
3. On Win7 with the hadoop-eclipse-plugin, the added hadoop.dll does not take effect
http://www.aboutyun.com/thread-8322-1-1.html
4. Why uploads through the Java API in Eclipse default to 3 replicas, and how to change the setting
http://www.aboutyun.com/thread-7085-1-1.html
5. Eclipse run fails with "Call From ... to master:8020 failed on connection exception"
http://www.aboutyun.com/thread-8321-1-1.html
6. A Hadoop Eclipse plugin problem
http://www.aboutyun.com/thread-8269-1-1.html
7. Eclipse connection error under Linux, asking for help
http://www.aboutyun.com/thread-5104-1-1.html

(4) Debugging MapReduce. There are many ways to debug; here is a simple, primitive one. JavaScript could not be debugged at first either, so we used alert() to check whether results were what we wanted; we can debug MapReduce with a similar approach, outputting debug information through a counter, e.g. Counter countPrint1 = context.getCounter("Loop strscore in Map", "output Information"). getCounter can surface variables from inside the program to the Java console, which achieves the debugging effect; a short sketch follows.
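A minimal sketch of that counter trick, assuming a comma-separated text input; the group and counter names are illustrative, not the ones from the post above. Counters are aggregated by the framework and printed with the job's counter summary, which makes them visible even when the map tasks run on remote nodes.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class DebugMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        // Surface debug information through a counter, since System.out
        // from map tasks is not visible in the local console.
        Counter badLines = context.getCounter("Map Debug", "malformed lines");
        if (fields.length < 2) {
            badLines.increment(1);   // shows up in the job's counter output
            return;                  // skip records we cannot parse
        }
        context.write(new Text(fields[0]), new LongWritable(1));
    }
}
```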
For details, refer to:
How to output debugging information from map and reduce in Hadoop (MapReduce)
http://www.aboutyun.com/thread-7682-1-1.html
Of course, there are other debugging methods; you can refer to:
Win7 Eclipse debugging a CentOS hadoop2.2 MapReduce job: problem solutions
http://www.aboutyun.com/thread-8030-1-1.html
Debugging the Hadoop source code: Eclipse debugging and log printing
http://www.aboutyun.com/thread-8021-1-1.html

4. Getting and reading the source code. The source code can be obtained through Git, Maven, and other means.
(1) Maven. Maven can be used alone or as a plugin inside Eclipse, and since the Hadoop source uses Maven, we need to learn and use it:
Eclipse Maven plugin installation and configuration, plus a "Maven in Action" book download
http://www.aboutyun.com/thread-8014-1-1.html
Source compilation: Maven series video tutorial roundup
http://www.aboutyun.com/thread-7972-1-1.html
If you have gone through the above, you are familiar enough with Maven to obtain the Hadoop source through it; while fetching, at least keep the network connection unobstructed. To view and read the source in Eclipse, we may also need to attach it, otherwise "source not found" appears. For more detail, refer to:
From zero: how to obtain the hadoop2.4 source code and associate it in Eclipse
http://www.aboutyun.com/thread-8211-1-1.html
Eclipse shows "source not found" for the Hadoop source because the source .zip is not attached
http://www.aboutyun.com/thread-7047-1-1.html
Guidance for editing the hadoop2.2.0 source code in Eclipse
http://www.aboutyun.com/thread-7283-1-1.html
With the source obtained, how do we read it: how do we jump to a class's definition and a function's implementation through Eclipse? The following post achieves that goal:
How to view and read the hadoop2.4 source code through Eclipse
http://www.aboutyun.com/thread-8225-1-1.html
(2) Other tools for getting the source: git and svn source management.
Tool for getting web source: TortoiseSVN user manual
http://www.aboutyun.com/thread-7982-1-1.html
Eclipse git plugin EGit manual
http://www.aboutyun.com/thread-8034-1-1.html

5. Compiling the Hadoop source code. Compilation is rather complex at the beginning: a lot of software must be installed first, including Maven, Protobuf, CMake, Ant and other tools; once they are in place the build itself is typically a single Maven command (something like mvn package -Pdist,native -DskipTests -Dtar), and after compiling we can install Hadoop. In more detail:
From zero: how to compile Hadoop 2.4
http://www.aboutyun.com/thread-8130-1-1.html
Tutorial: compiling the Hadoop source code with Eclipse in a Linux (Ubuntu) environment
http://www.aboutyun.com/thread-5653-1-1.html
For compiled .class files, if you want to view the source code, you can use a decompiler:
Decompiling Java class files, and installing and using decompiler plugins for Eclipse and MyEclipse
http://www.aboutyun.com/thread-7053-1-1.html

6. Plugin making. Some students are interested in making the Eclipse plugin themselves; they can view the following:
Making the Hadoop 2.4.0 Eclipse plugin
http://www.aboutyun.com/thread-8117-1-1.html
hadoop2.4.0 Eclipse plugin authoring and problem log
http://www.aboutyun.com/thread-7780-1-1.html
Hadoop 2.2.0: compiling the hadoop-eclipse-plugin
http://www.aboutyun.com/thread-7516-1-1.html

7. Resources. Because some students often cannot find the installation packages, plugins, and so on,
here is a summary of some resources:
Hadoop family, Storm, Spark, Linux, Flume and other jar packages and installation packages: roundup download (continuously updated)
http://www.aboutyun.com/thread-8178-1-1.html
hadoop2.4 summary: hadoop2.4 plugin downloads, fully distributed and pseudo-distributed setups, and Eclipse plugin development, all in one place
http://www.aboutyun.com/thread-7795-1-1.html
hadoop-eclipse-plugin-2.2.0.jar plugin package share
http://www.aboutyun.com/thread-6342-1-1.html

Original link: http://www.aboutyun.com/thread-8329-1-1.html