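Hadoop learning notes: inverted index
Development tool: Eclipse
Objective: build an inverted index of the phone numbers in the following file. Each line holds two numbers separated by a space. The job maps each number in the second column to every number in the first column that appears with it: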
13599999999 10086
13899999999 120
13944444444 13800138000
13722222222 13800138000
18800000000 120
13722222222 10086
18944444444 10086
Code:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Test_1 extends Configured implements Tool
{
    // Counter for malformed input lines that are skipped
    enum Counter
    {
        LINESKIP, // error lines
    }

    // Mapper: for each line "anum bnum", emit (bnum, anum),
    // i.e. invert the pair so the second number becomes the key
    public static class Map extends Mapper<LongWritable, Text, Text, Text>
    {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
        {
            String line = value.toString(); // read one line of raw input

            try
            {
                // split the line into its two numbers
                String[] lineSplit = line.split(" ");
                String anum = lineSplit[0];
                String bnum = lineSplit[1];

                context.write(new Text(bnum), new Text(anum)); // map output
            }
            catch (ArrayIndexOutOfBoundsException e)
            {
                // fewer than two fields on this line: count it and skip it
                context.getCounter(Counter.LINESKIP).increment(1);
            }
        }
    }

    // Reducer: join all numbers that share the same key into "a|b|c|"
    public static class Reduce extends Reducer<Text, Text, Text, Text>
    {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
        {
            StringBuilder out = new StringBuilder();

            for (Text value : values)
            {
                out.append(value.toString()).append('|');
            }

            context.write(key, new Text(out.toString())); // reduce output
        }
    }

    @Override
    public int run(String[] args) throws Exception
    {
        Configuration conf = getConf();

        Job job = Job.getInstance(conf, "Test_1"); // job name
        job.setJarByClass(Test_1.class);            // jar that contains this job

        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // block until the job finishes; exit code 0 on success, 1 on failure
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception
    {
        int res = ToolRunner.run(new Configuration(), new Test_1(), args);
        System.exit(res);
    }
}
```
Running result:
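As a rough stand-in for the run output, the plain-Java sketch below replays the same map, shuffle/sort and reduce steps in memory on the sample data, so the shape of the result is visible without a cluster. The class name InvertedIndexSimulation is hypothetical and no Hadoop classes are used. One assumption to keep in mind: on a real cluster the order of values within one key is not guaranteed, whereas here it simply follows the input order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndexSimulation
{
    // Same sample data as the input file above
    static final String[] LINES = {
        "13599999999 10086",
        "13899999999 120",
        "13944444444 13800138000",
        "13722222222 13800138000",
        "18800000000 120",
        "13722222222 10086",
        "18944444444 10086",
    };

    // Runs map -> shuffle/sort -> reduce in memory; returns key -> joined values
    static Map<String, String> run(String[] lines)
    {
        // "Shuffle": group map outputs by key; TreeMap keeps keys sorted, as the framework does
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines)
        {
            String[] f = line.split(" ");
            if (f.length < 2)
            {
                continue; // the Hadoop job counts such lines in LINESKIP
            }
            // map step: emit (bnum, anum)
            groups.computeIfAbsent(f[1], k -> new ArrayList<>()).add(f[0]);
        }

        // Reduce step: join every value of a key with "|", like the Reduce class
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : groups.entrySet())
        {
            StringBuilder out = new StringBuilder();
            for (String v : e.getValue())
            {
                out.append(v).append('|');
            }
            result.put(e.getKey(), out.toString());
        }
        return result;
    }

    public static void main(String[] args)
    {
        for (Map.Entry<String, String> e : run(LINES).entrySet())
        {
            // TextOutputFormat separates key and value with a tab by default
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running it prints three tab-separated lines, one per called number. For example, the line for 10086 lists 13599999999|13722222222|18944444444|.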
What is B*Tree inverted index technology?
So far, this is the most common type of index in Oracle and most other databases. Note that the "B" does not stand for binary but for balanced: a B*Tree index is not a binary tree. Its implementation is, however, similar to a binary search tree, and its goal is to minimize the time Oracle spends searching for data.
The lowest-level blocks of the tree are called leaf nodes or leaf blocks; they contain every indexed key value together with a rowid that points to the indexed row. The internal blocks above the leaf nodes are called branch blocks and are used to navigate through the structure.
Interestingly, the leaf nodes of the index actually form a doubly linked list, which makes an index range scan (a sequential scan of values) easy. Once the first value has been found, there is no need to navigate the index structure again; Oracle simply scans forward or backward through the leaf nodes as needed. A predicate such as
where x between 20 and 30
is therefore quite simple to satisfy: Oracle finds the first index leaf block that contains the smallest key value greater than or equal to 20, then walks horizontally through the linked list of leaf nodes until it hits a value greater than 30.
There are really no non-unique entries in a B*Tree index. In a non-unique index, Oracle appends the rowid to the key as an extra column to make the key unique. In an index you have defined as unique, Oracle does not add the rowid to the index key. One property of B*Trees is that all leaf blocks are at the same level of the tree.
(This passage appears to have some minor translation problems; the original English article is recommended for reference.)
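To make the leaf-level walk concrete, here is a small Java sketch (Java 16 or later, because it uses a record). It is not how Oracle stores blocks. The class LeafRangeScan, the int keys, the long rowids and the single "branch" array of first keys are all simplifying assumptions. It illustrates two points from the text: entries are ordered by (key, rowid), so duplicate keys still form unique entries, and a BETWEEN query descends once and then follows next pointers until a key exceeds the upper bound.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LeafRangeScan
{
    // One index entry: the key value plus the rowid it points to (rowid modeled as a long)
    record Entry(int key, long rowid) {}

    // A leaf block: sorted entries plus links to both neighbours (doubly linked list)
    static final class Leaf
    {
        final List<Entry> entries = new ArrayList<>();
        Leaf prev, next; // prev would allow a descending scan; this sketch only walks forward
    }

    final List<Leaf> leaves = new ArrayList<>();       // leaf level, left to right
    final List<Integer> firstKeys = new ArrayList<>(); // stands in for one branch level

    // Build the leaf level; capacity = number of entries per leaf block
    LeafRangeScan(List<Entry> input, int capacity)
    {
        List<Entry> sorted = new ArrayList<>(input);
        // Sort by key, then rowid: (key, rowid) is unique even when keys repeat
        sorted.sort(Comparator.comparingInt(Entry::key).thenComparingLong(Entry::rowid));
        Leaf current = null;
        for (Entry e : sorted)
        {
            if (current == null || current.entries.size() == capacity)
            {
                Leaf leaf = new Leaf();
                leaf.prev = current;
                if (current != null) current.next = leaf;
                leaves.add(leaf);
                firstKeys.add(e.key());
                current = leaf;
            }
            current.entries.add(e);
        }
    }

    // WHERE key BETWEEN lo AND hi: descend once, then walk the leaf list
    List<Long> between(int lo, int hi)
    {
        List<Long> rowids = new ArrayList<>();
        if (leaves.isEmpty()) return rowids;

        // Branch step: binary search for the last leaf whose first key is < lo
        // (a run of lo values may begin in that leaf); default to the first leaf
        int left = 0, right = firstKeys.size() - 1, start = 0;
        while (left <= right)
        {
            int mid = (left + right) >>> 1;
            if (firstKeys.get(mid) < lo) { start = mid; left = mid + 1; }
            else right = mid - 1;
        }

        // Leaf step: scan horizontally until a key greater than hi is seen
        for (Leaf leaf = leaves.get(start); leaf != null; leaf = leaf.next)
        {
            for (Entry e : leaf.entries)
            {
                if (e.key() > hi) return rowids;    // past the range: stop
                if (e.key() >= lo) rowids.add(e.rowid());
            }
        }
        return rowids;
    }

    public static void main(String[] args)
    {
        List<Entry> data = new ArrayList<>();
        long rowid = 100;
        for (int key : new int[] {35, 5, 20, 25, 20, 40, 10, 30, 15, 31})
        {
            data.add(new Entry(key, rowid++));
        }
        LeafRangeScan index = new LeafRangeScan(data, 3);
        System.out.println(index.between(20, 30));
    }
}
```

With the sample keys in main and three entries per leaf, between(20, 30) returns the rowids of both 20s, the 25 and the 30 in key order, and stops as soon as it reads 31.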
Is creating a database index essentially creating an inverted table?
You are probably thinking of search engine databases. The inverted index is currently the most common storage structure used by search engine companies, and it is the core of a search engine. In practice, a search engine needs to find records by the values of their keywords, so an index is built on those keywords. This index is called an inverted index; a file that stores one is called an inverted index file, or simply an inverted file. It enables fast, efficient retrieval.
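A minimal sketch of the search-engine sense of the term, assuming a naive tokenizer and purely in-memory storage (TinyInvertedIndex, add and search are hypothetical names): each term maps to the sorted set of document ids that contain it, its posting list, and a multi-word query intersects those sets.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

public class TinyInvertedIndex
{
    // term -> sorted set of ids of the documents containing that term (posting list)
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Naive tokenizer: lower-case, then split on anything that is not a letter or digit
    static List<String> tokenize(String text)
    {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
        {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    // Index a document: add its id to the posting list of every term it contains
    void add(int docId, String text)
    {
        for (String term : tokenize(text))
        {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Boolean AND query: ids of documents that contain every query term
    TreeSet<Integer> search(String query)
    {
        TreeSet<Integer> result = null;
        for (String term : tokenize(query))
        {
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) result = new TreeSet<>(docs);
            else result.retainAll(docs);
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args)
    {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Hadoop builds an inverted index");
        index.add(2, "Oracle uses B*Tree indexes");
        index.add(3, "A search engine queries an inverted index");
        System.out.println(index.search("inverted index")); // prints [1, 3]
    }
}
```

Real engines keep posting lists compressed on disk and rank the matches; this sketch only answers unranked AND queries, which is enough to show why lookup by keyword is fast.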
A database built with ordinary database software, however, should not be understood this way.
Reference: tag.csdn.net/..6.html