Learn to program: should you learn Java, big data, or Android? Many students are torn over this. Recently a lot of beginners have asked: if I learn big data and learn Spark, which languages do companies mainly use to write it? Every time I hear this question I am actually pleased, because it proves that you have started to learn.
If you are confident that you can stick with your learning, you can start taking action right now!
I. Big Data Technology Basics
1. Linux Operations Basics
Introduction and installation of Linux
Common Linux commands - file operations
Common Linux commands - user management and permissions
Common Linux commands - system management
Common Linux commands - password-free login configuration and network management
Insta
~ slowly rolled out across the various provinces and cities, which got me thinking about those things again (a dangerous omen). 7. In early 2016, for various reasons, I came to a bank in Shanghai. Here was a complete big data environment, and at the time I was actually a little afraid. Why? Because although the big data
EasyReport is an easy-to-use web reporting tool (supporting Hadoop, HBase, and various relational databases). Its main function is to convert the row-and-column result set returned by a SQL statement into an HTML table, with support for cross-row (rowspan) and cross-column (colspan) cells. It also supports exporting reports to Excel, chart display, and fixed header rows and left columns. The overall architecture looks like this: (EasyReport overall architecture diagram)
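To give a rough sense of the core transformation (this is only an illustrative sketch, not EasyReport's actual code; the JDBC URL and query are placeholder assumptions), the following Java snippet renders a JDBC result set as a plain HTML table, leaving out the rowspan/colspan merging, Excel export, and fixed-header features:

import java.sql.*;

public class ResultSetToHtml {
    // Render a JDBC ResultSet as a plain HTML <table>; cell merging and HTML escaping are omitted.
    static String toHtmlTable(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();
        StringBuilder html = new StringBuilder("<table>\n<tr>");
        for (int i = 1; i <= columns; i++) {
            html.append("<th>").append(meta.getColumnLabel(i)).append("</th>");
        }
        html.append("</tr>\n");
        while (rs.next()) {
            html.append("<tr>");
            for (int i = 1; i <= columns; i++) {
                html.append("<td>").append(rs.getString(i)).append("</td>");
            }
            html.append("</tr>\n");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder connection and query, purely for illustration.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/demo", "user", "password");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT lbl, score FROM scores")) {
            System.out.println(toHtmlTable(rs));
        }
    }
}

The span-merging and export features that EasyReport describes would sit on top of this kind of row-and-column walk.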
Developmen
databases, and more.
Big Data - survey results
MongoDB - a very popular, cross-platform, document-oriented database.
Elasticsearch - a distributed RESTful search engine designed for cloud computing.
Cassandra - an open-source distributed database management system, originally designed and developed by Facebook and deployed on large numbers of commodity servers to process large amounts of data.
2. Basic Big Data Knowledge Preparation
Environment: several servers (a single machine works too; it is only a matter of efficiency).
Basics: Hadoop
Algorithms: understand the "divide and conquer" idea from classic algorithms (a sorting sketch follows below).
For big data sorting tasks, we
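To make the "divide and conquer" idea concrete for sorting, here is a minimal Java sketch of an external merge sort (the file names and chunk size are illustrative assumptions, not from the original text): split a file that does not fit in memory into sorted chunks on disk, then merge the chunk files with a priority queue.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class ExternalSort {
    static final int CHUNK_LINES = 1_000_000;  // illustrative in-memory chunk size

    public static void main(String[] args) throws IOException {
        List<Path> chunks = splitIntoSortedChunks(Paths.get("big-input.txt"));  // assumed input
        mergeChunks(chunks, Paths.get("sorted-output.txt"));                    // assumed output
    }

    // Divide: read the input in memory-sized chunks, sort each chunk, spill it to a temp file.
    static List<Path> splitIntoSortedChunks(Path input) throws IOException {
        List<Path> chunks = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == CHUNK_LINES) {
                    chunks.add(spill(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) {
                chunks.add(spill(buffer));
            }
        }
        return chunks;
    }

    static Path spill(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("sort-chunk-", ".txt");
        Files.write(chunk, buffer, StandardCharsets.UTF_8);
        return chunk;
    }

    // Conquer: k-way merge of the sorted chunk files using a priority queue of cursors.
    static void mergeChunks(List<Path> chunks, Path output) throws IOException {
        PriorityQueue<ChunkCursor> heap =
                new PriorityQueue<>(Comparator.comparing((ChunkCursor c) -> c.line));
        for (Path chunk : chunks) {
            BufferedReader reader = Files.newBufferedReader(chunk, StandardCharsets.UTF_8);
            String first = reader.readLine();
            if (first != null) {
                heap.add(new ChunkCursor(first, reader));
            } else {
                reader.close();
            }
        }
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            while (!heap.isEmpty()) {
                ChunkCursor cursor = heap.poll();
                out.write(cursor.line);
                out.newLine();
                String next = cursor.reader.readLine();
                if (next != null) {
                    cursor.line = next;
                    heap.add(cursor);
                } else {
                    cursor.reader.close();
                }
            }
        }
    }

    static final class ChunkCursor {
        String line;
        final BufferedReader reader;
        ChunkCursor(String line, BufferedReader reader) {
            this.line = line;
            this.reader = reader;
        }
    }
}

The same split-sort-merge shape is what Hadoop MapReduce generalizes across machines: map tasks produce sorted partitions and reduce tasks merge the sorted runs.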
strategy is to keep an object inside the JVM and do the concurrency control at the code level, similar to the following. In Spark 1.3 and later, the Kafka Direct API was introduced to try to solve the data-accuracy problem; using the Direct approach can alleviate the accuracy problem to a certain degree, but consistency issues are still unavoidable. Why? Because the Direct API hands the management of the Kafka consumer offsets over to you (for
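As an illustrative sketch only (not the original author's code), the following Java snippet uses the later spark-streaming-kafka-0-10 integration, which grew out of the 1.3-era Direct API, to read the offset ranges of each batch and commit them back to Kafka after processing. The broker address, topic name, and group id are placeholder assumptions.

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.CanCommitOffsets;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.HasOffsetRanges;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.spark.streaming.kafka010.OffsetRange;

public class DirectOffsetDemo {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("direct-offset-demo");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "demo-group");                // placeholder group id
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", false);             // we commit offsets ourselves

        JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(
                                Collections.singletonList("demo-topic"), kafkaParams));

        stream.foreachRDD(rdd -> {
            // The direct stream exposes the Kafka offset range of each partition in this batch.
            OffsetRange[] offsetRanges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();

            rdd.foreach(record -> System.out.println(record.key() + " -> " + record.value()));

            // Commit offsets back to Kafka only after the batch has been processed.
            ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
        });

        jssc.start();
        jssc.awaitTermination();
    }
}

Committing only after a batch succeeds gives at-least-once behavior; true exactly-once still requires storing the offsets together with the results in a transactional sink, which is where the consistency question above comes from.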
(',') AS (lbl:chararray, desc:chararray, score:int);
-- Build the index and store it on HDFS; note that you need to configure the simple Lucene index options (store the field? index/tokenize it?)
STORE A INTO '/tmp/data/20150303/luceneindex' USING Lucenestore('store[true]:tokenize[true]');
At this point we have successfully stored the index on HDFS. Don't get too excited yet; this is just the beginning. You may have a question here: can the index stored on HDFS be queried or accessed directly? The answer is yes, but it is not recommended that you read the HDFS index directly, even if the bloc
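As one illustrative way to query it (a sketch under assumed paths, reusing the lbl field from the Pig schema above; this is not the original article's code): copy the index directory from HDFS down to local disk and open the copy with Lucene's ordinary FSDirectory, rather than reading the HDFS blocks directly.

import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class SearchHdfsIndex {
    public static void main(String[] args) throws Exception {
        // 1. Pull the index built by the Pig job down to local disk (local path is an assumption).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.copyToLocalFile(new Path("/tmp/data/20150303/luceneindex"),
                           new Path("/tmp/local-luceneindex"));

        // 2. Open the local copy with the normal Lucene reader and run a simple term query.
        try (DirectoryReader reader =
                     DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/local-luceneindex")))) {
            IndexSearcher searcher = new IndexSearcher(reader);
            TopDocs hits = searcher.search(new TermQuery(new Term("lbl", "spark")), 10);
            for (ScoreDoc hit : hits.scoreDocs) {
                System.out.println(searcher.doc(hit.doc));
            }
        }
    }
}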
test data matters not for how much of it there is, but for how comprehensive its coverage is. If you have prepared thousands of records but they are all of the same data type, the code-branch coverage is still one: only one of those records counts as effective test data, and all the rest are invalid test data. Th
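A tiny, hypothetical Java example of that point: for a method with two branches, a thousand passing scores still exercise only one branch, while one passing and one failing score cover both.

import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

class GradeTest {
    // Method under test: two branches, so two data shapes are needed to cover it.
    static String grade(int score) {
        return score >= 60 ? "pass" : "fail";
    }

    @Test
    void effectiveDataCoversBothBranches() {
        assertEquals("pass", grade(75));  // covers the score >= 60 branch
        assertEquals("fail", grade(30));  // covers the score < 60 branch; a thousand more
                                          // passing scores would add no new branch coverage
    }
}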
"Winning the Cloud Computing Big Data Era": Spark Asia-Pacific Research Institute Stage 1 public-welfare lecture hall [Stage 1 interactive Q&A sharing]. Q1: Can Spark shuffle point SPARK_LOCAL_DIRS to a solid-state drive to speed things up?
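The setting in question is spark.local.dir (the SPARK_LOCAL_DIRS environment variable controls the same thing), which decides where shuffle and spill files are written. A minimal Java sketch, with assumed SSD mount points:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ShuffleOnSsd {
    public static void main(String[] args) {
        // spark.local.dir accepts a comma-separated list of local directories used for
        // shuffle and spill files; the SSD mount points below are assumptions.
        SparkConf conf = new SparkConf()
                .setAppName("shuffle-on-ssd")
                .set("spark.local.dir", "/mnt/ssd1/spark-local,/mnt/ssd2/spark-local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... job code ...
        sc.stop();
    }
}

Note that on YARN the cluster manager's own local-directory settings take precedence over this property.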
From: http://www.csdn.net/article/2013-12-04/2817707-Impala-Big-Data-Engine
Big data processing is a very important field in cloud computing. Since Google proposed the MapReduce distributed processing framework, open source softw
In the blog post "Agile Management of the Various Hadoop Releases", we introduced vSphere Big Data Extensions (BDE) as a powerful tool for enterprise deployment and management of Hadoop releases. It makes it easy and reliable to operate the many mainstream commercial distributions of
1. Yes, in big data we also write ordinary Java code and ordinary SQL.
For example, the Java API version of a Spark program looks much the same as the Java 8 Stream API:
JavaRDD<String> lines = sc.textFile("data.txt");
JavaRDD<Integer> lineLengths = lines.map(s -> s.length());
int totalLength = lineLengths.reduce((a, b) -> a + b);
Another example is deleting a Hive table:
DROP TABLE pokes;
2. Yes,
Source: http://www.aboutyun.com/thread-6855-1-1.html
Personal opinion: when it comes to big data, everyone knows Hadoop, but Hadoop is not all of it. How do we build a large data project? For offline processing, Hadoop is still the more appropriate choice, but for scenarios with strong real-time requirements and large data volumes, we can use Storm. So then, Storm and wha
(This article was also published on my public WeChat account "dotnet daily essence articles"; you are welcome to scan the QR code on the right to follow it.) Preface: It has been a long while since Build 2016, and only now am I getting around to reviewing it and talking about the sessions related to big data, partly because I have recently been doing in-depth research in this area. The content of the Microsoft developer conference Build 2016, held from March 30 to April 1, exploded throug
management software at IBM's China R&D center shares information about the IBM big data platform. Zhu Hui believes that enterprises must face the 3V challenges of the big data era, namely Variety, Velocity, and Volume. Currently, users need to manage various data
Microsoft's recent open positions:
Title: Senior SDE
The big data tooling team is looking for a talented and passionate developer to work on the development and debugging experiences for Cosmos and HDInsight.
Cosmos is a massively parallel supercomputer comprised of tens of thousands of commodity servers, coordinating to provide vast, reliable storage and stunning computation power. Our internal service proce
One: Cause. (0) Personally, I was at first very resistant to the MATLAB programming language, which is probably a common problem among programmers: once you have learned C++ or Java, you look down on other languages and are too lazy to try them. Until one day... you find that the language you are proficient in cannot solve the problem at hand, and only then do you make a change. (1) Recently I have been dealing with big