"Search Engine" BerkeleyDB implementing the Queue database

Source: Internet
Author: User

A web crawler needs a queue to hold the URLs it has yet to visit. For a simple example, writing a queue class in Java is enough, but such a queue lives only in memory: after a power failure all of its data is lost and the crawl has to start over. An in-memory queue is therefore not sufficient, and we need a queue that persists its data.

Here is a queue I implemented on top of Berkeley DB. Berkeley DB is an embedded database that keeps data in an in-memory cache and automatically persists it to disk when the data held in memory exceeds the cache size.

Berkeley DB stores data as key-value pairs, so I use a Java BigInteger as the key and store the URL as the value. The key is incremented by 1 for each new entry. Because a BigInteger can hold a 1 followed by thousands of zeros, it comfortably covers even a very large number of URLs.
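To make the point about key range concrete, here is a minimal sketch of incrementing a BigInteger key past the limit of a long. The class name BigIntegerKeys and the helper nextKey are hypothetical names chosen for this illustration; they are not part of the queue class.

```java
import java.math.BigInteger;

// Illustration only: BigIntegerKeys and nextKey are made-up names.
class BigIntegerKeys {
    /** Returns the key that follows the given key. */
    static BigInteger nextKey(BigInteger key) {
        return key.add(BigInteger.ONE);
    }

    public static void main(String[] args) {
        BigInteger atLongLimit = BigInteger.valueOf(Long.MAX_VALUE);
        // A long would overflow here; a BigInteger simply keeps growing.
        System.out.println(nextKey(atLongLimit)); // 9223372036854775808
    }
}
```

BigInteger values are immutable, so add returns a new object and the old key can be shared safely.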

The queue maintains two BigInteger values, head and tail, which hold the keys of the first and last elements of the queue. Dequeuing deletes the element at the head and adds 1 to the head value; enqueuing adds 1 to the tail value and stores the element under the new key. size returns the length of the queue. There are also several cursor operations for traversing the queue: first, current, next, prev and last.
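The head and tail bookkeeping can be tried out without Berkeley DB at all. The sketch below is an assumption-laden stand-in, not the author's class: a HashMap plays the role of the database, the class name HeadTailQueueSketch is hypothetical, and nothing is persisted.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in: a HashMap plays the role of the database.
class HeadTailQueueSketch {
    private static final BigInteger ONE = BigInteger.ONE;
    private final Map<BigInteger, Object> store = new HashMap<>();
    private BigInteger head = BigInteger.ZERO;        // key of the next element to dequeue
    private BigInteger tail = BigInteger.valueOf(-1); // key of the most recently enqueued element
    private BigInteger current = head;                // cursor used for traversal

    void enQueue(Object value) {
        if (value == null) return;   // null values are ignored
        tail = tail.add(ONE);        // advance the tail, then store under the new key
        store.put(tail, value);
    }

    Object deQueue() {
        Object value = store.remove(head);        // null when the queue is empty
        if (value != null) head = head.add(ONE);  // advance the head only on success
        return value;
    }

    long size() { return store.size(); }

    Object first() { current = head; return store.get(current); }

    Object last() { current = tail; return store.get(current); }

    Object next() {
        if (current.compareTo(tail) < 0) {
            current = current.add(ONE);
            return store.get(current);
        }
        return null; // already at the tail
    }

    Object prev() {
        if (current.compareTo(head) > 0) {
            current = current.subtract(ONE);
            return store.get(current);
        }
        return null; // already at the head
    }

    public static void main(String[] args) {
        HeadTailQueueSketch q = new HeadTailQueueSketch();
        q.enQueue("http://example.com/a");
        q.enQueue("http://example.com/b");
        System.out.println(q.first());   // http://example.com/a
        System.out.println(q.next());    // http://example.com/b
        System.out.println(q.deQueue()); // http://example.com/a
        System.out.println(q.size());    // 1
    }
}
```

Starting tail at -1 means the first enqueued element lands on key 0, the same key head starts at, so an empty queue is simply the state where tail is one less than head.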

The queue is built on the MyBerkeleyDB class I wrapped up earlier, which exposes just a few simple, handy APIs, so this is also a case of code reuse. Here's the code:

package com.nosql;

import java.math.BigInteger;

/*********************************
 * Uses Berkeley DB to encapsulate some database operations,
 * including setting the cache, setting the encoding, setting the
 * database path, storing key-value pairs, looking up values by key,
 * and closing the database.
 * @author Administrator
 *********************************/
public class MyBerkeleyDBQueue {
    private MyBerkeleyDB database; // database
    private static final BigInteger bigIntegerIncrement = BigInteger.valueOf(1); // increment applied to keys
    private BigInteger head;    // key of the head of the queue
    private BigInteger tail;    // key of the tail of the queue
    private BigInteger current; // cursor position used when traversing the database
    private static final String headString = "Head";
    private static final String tailString = "Tail";

    public MyBerkeleyDBQueue() {
        database = new MyBerkeleyDB();
    }

    // Initialize the database
    public void open(String dbName) {
        database.setEnvironment(database.getPath(), database.getChacheSize());
        database.open(dbName); // open the database
        head = (BigInteger) database.get(headString);
        tail = (BigInteger) database.get(tailString);
        if (head == null || tail == null) {
            head = BigInteger.valueOf(0);
            tail = BigInteger.valueOf(-1);
            database.put(headString, head);
            database.put(tailString, tail);
        }
        current = head; // BigInteger is immutable, so the reference can be shared
    }

    // Set the encoding
    public void setCharset(String charset) {
        database.setCharset(charset);
    }

    // Set the path
    public void setPath(String path) {
        database.setPath(path);
    }

    // Set the cache size
    public boolean setChacheSize(long size) {
        return database.setChacheSize(size);
    }

    // Enqueue: add 1 to the tail, then store the value under the new key
    public void enQueue(Object value) {
        if (value == null) return;
        tail = tail.add(bigIntegerIncrement);
        database.put(tailString, tail);
        database.put(tail, value);
    }

    // Dequeue: fetch and delete the head element, then add 1 to the head
    public Object deQueue() {
        Object value = database.del(head);
        if (value != null) {
            head = head.add(bigIntegerIncrement);
            database.put(headString, head);
        }
        return value;
    }

    // Head key
    public Object head() {
        return head;
    }

    // Tail key
    public Object tail() {
        return tail;
    }

    // Close the database
    public void close() {
        this.database.close();
    }

    // Number of queued elements (excludes the two head/tail records)
    public long size() {
        return database.size() - 2;
    }

    // Get the value at the current cursor position
    public Object current() {
        return database.get(current);
    }

    // Move the cursor to the head and get the first value
    public Object first() {
        current = head;
        return current();
    }

    // Move the cursor to the tail and get the last value
    public Object last() {
        current = tail;
        return current();
    }

    // Move the cursor forward and get the next value
    public Object next() {
        if (current.compareTo(tail) < 0) {
            current = current.add(bigIntegerIncrement);
            return current();
        }
        return null;
    }

    // Move the cursor back and get the previous value
    public Object prev() {
        if (current.compareTo(head) > 0) {
            current = current.subtract(bigIntegerIncrement); // subtract, not divide: dividing by 1 would leave the cursor in place
            return current();
        }
        return null;
    }
}

The head and tail values are stored as String/BigInteger key-value pairs, while each URL is stored as a BigInteger/String pair keyed by its position (for the sake of reuse the code uses Object everywhere; the types are spelled out here for clarity). The head and tail records live in the same database as the URLs, so the size function returns the record count minus 2.
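The arithmetic behind the minus 2 is easy to check with a plain map standing in for the database. This is only an illustration: the class name SharedStoreSize, the helper queueLength and the example URLs are hypothetical.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Illustration only: a HashMap stands in for the Berkeley DB database.
class SharedStoreSize {
    static final int BOOKKEEPING_RECORDS = 2; // the "Head" and "Tail" entries

    /** Queue length = total records minus the two bookkeeping records. */
    static long queueLength(Map<Object, Object> store) {
        return store.size() - BOOKKEEPING_RECORDS;
    }

    public static void main(String[] args) {
        Map<Object, Object> store = new HashMap<>();
        store.put("Head", BigInteger.ZERO);                 // String -> BigInteger
        store.put("Tail", BigInteger.ONE);                  // String -> BigInteger
        store.put(BigInteger.ZERO, "http://example.com/a"); // BigInteger -> String
        store.put(BigInteger.ONE, "http://example.com/b");  // BigInteger -> String
        System.out.println(store.size());       // 4
        System.out.println(queueLength(store)); // 2
    }
}
```

Because the String keys "Head" and "Tail" can never equal a BigInteger key, the bookkeeping records and the queue elements can share one key space without colliding.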

Below is a functional test file:

package com.test;

import com.nosql.MyBerkeleyDBQueue;

public class Test_MyBerkeleyDBQueue {
    public static void main(String[] args) {
        MyBerkeleyDBQueue queue = new MyBerkeleyDBQueue();
        queue.setPath("Webroot\\data\\db\\queue");
        queue.open("Queue");
        System.out.println("Head:" + queue.head());
        System.out.println("Tail:" + queue.tail());
        System.out.println("Size:" + queue.size());
        System.out.println("===================");

        // Enqueue test
        for (int i = 0; i < 10; i++) {
            queue.enQueue(i);
            System.out.println("Head:" + queue.head());
            System.out.println("Tail:" + queue.tail());
            System.out.println("Size:" + queue.size());
            System.out.println("===================");
        }

        // Cursor test
        System.out.println("first element:" + queue.first());
        System.out.println("last element:" + queue.last());
        long size1 = queue.size() - 1;     // first() below already prints one element
        System.out.println(queue.first()); // reset the cursor to the head
        while (size1-- > 0) {
            System.out.println(queue.next());
        }
        System.out.println("===================");
        System.out.println("Size:" + queue.size());
        System.out.println("===================");

        // Dequeue test: dequeue more times than there are elements; the extra calls return null
        long size = queue.size() + 3;
        for (int i = 0; i < size; i++) {
            System.out.println("Delete:" + queue.deQueue());
            System.out.println("Head:" + queue.head());
            System.out.println("Tail:" + queue.tail());
            System.out.println("Size:" + queue.size());
            System.out.println("===================");
        }
        queue.close();
    }
}

"Search Engine" BerkeleyDB implementing the Queue database
