ZooKeeper: How to modify the source code ("Big data, five minutes a day")

Source: Internet
Author: User

This article is only meant to get the discussion started, using one example to show how to modify the ZooKeeper source code. It was inspired by ZOOKEEPER-2784.

First, a question

An earlier article covered the design of the ZXID. Let's review it first:

The ZXID is 64 bits long and is divided into two parts:
The high 32 bits hold the leader's epoch, the election clock: each time a new leader is elected, the epoch is incremented by 1.
The low 32 bits hold the transaction ID within the epoch: the cluster increments it by 1 for each update operation issued by a user.
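The layout described above can be sketched in a few lines of Java. This is only an illustration under the stated 32/32 split: the class `ZxidLayout32` and its method names are hypothetical and are not taken from the ZooKeeper code base.

```java
// Hypothetical helper (not ZooKeeper's own class) for the 32/32 ZXID layout.
final class ZxidLayout32 {
    private static final int COUNTER_BITS = 32;
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1; // 0xffffffffL

    // Pack an epoch and a per-epoch counter into one 64-bit ZXID.
    static long makeZxid(long epoch, long counter) {
        return (epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    // High 32 bits: the leader epoch.
    static long epochOf(long zxid) {
        return zxid >>> COUNTER_BITS;
    }

    // Low 32 bits: the transaction counter within the epoch.
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(5, 42);
        System.out.println(Long.toHexString(zxid)); // 50000002a
        System.out.println(epochOf(zxid));          // 5
        System.out.println(counterOf(zxid));        // 42
    }
}
```

Because the epoch sits in the high bits, comparing two ZXIDs as plain numbers orders them first by epoch and then by counter.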

What is the problem with this design?

The transaction ID in ZooKeeper may outgrow 32 bits.

The epoch grows very slowly; it would take an extremely long time to exceed 32 bits, so that case can practically be ignored. The transaction ID is a different story. Let's work it out.

If we issue 1000 operations per second against ZooKeeper, i.e. 1k ops/s, then

2^32 / (86400 × 1000) ≈ 49.7

After 49.7 days the transaction ID will overflow. To see what happens on overflow, look at the code:

src/java/main/org/apache/zookeeper/server/quorum/Leader.java, line 1037

    /**
     * create a proposal and send it out to all the members
     *
     * @param request
     * @return the proposal that is queued to send to all the members
     */
    public Proposal propose(Request request) throws XidRolloverException {
        /**
         * Address the rollover issue. All lower 32bits set indicate a new leader
         * election. Force a re-election instead. See ZOOKEEPER-1277
         */
        if ((request.zxid & 0xffffffffL) == 0xffffffffL) {
            String msg =
                    "zxid lower 32 bits have rolled over, forcing re-election, and therefore new epoch start";
            shutdown(msg);
            throw new XidRolloverException(msg);
        }
        // ... (the rest of the method is not shown)

As the code above shows,

when the low 32 bits are exhausted, the ZooKeeper leader shuts itself down and throws an XidRolloverException to force a re-election.

That means the service is unavailable for a while. In some scenarios this happens too often to be tolerable, so let's look at how to fix it.
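To make the rollover condition concrete, here is a minimal stand-alone model of the check. It assumes the 32-bit counter described earlier; the class `RolloverCheck` and the method `counterExhausted` are hypothetical names used for illustration and are not part of the Leader class.

```java
// Hypothetical stand-alone model of the check; it is not the Leader class itself.
final class RolloverCheck {
    private static final long LOW_32_MASK = 0xffffffffL;

    // True when the low 32 bits (the counter) are all ones, i.e. the counter is used up.
    static boolean counterExhausted(long zxid) {
        return (zxid & LOW_32_MASK) == LOW_32_MASK;
    }

    public static void main(String[] args) {
        long lastInEpoch3 = (3L << 32) | 0xffffffffL;
        System.out.println(counterExhausted(lastInEpoch3));     // true
        System.out.println(counterExhausted(lastInEpoch3 - 1)); // false

        // A ZXID in the next epoch with the counter back at zero.
        long firstInEpoch4 = 4L << 32;
        System.out.println(counterExhausted(firstInEpoch4));    // false
    }
}
```

The check fires on the last usable counter value rather than after a wrap, so the counter never actually carries into the epoch bits.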

How to solve it?

As noted above, the epoch grows slowly enough that its overflow can be ignored, so the ZXID can be redesigned:

the high 24 bits are used for the epoch and the low 40 bits for the transaction ID.

Let's do the math again:

2^40 / (86400 × 1000) ≈ 12725.8 days, i.e. 12725.8 / 365 ≈ 34.9 years

At 1k ops/s, a forced election would happen only after 34.9 years.
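Both estimates (a 32-bit and a 40-bit counter at 1k ops/s) can be reproduced with a short calculation. The sketch below assumes a constant write rate; `RolloverEstimate` and `daysUntilRollover` are hypothetical names chosen for this example.

```java
final class RolloverEstimate {
    private static final double SECONDS_PER_DAY = 86_400.0;

    // Days until a counter of the given width is exhausted at a constant write rate.
    static double daysUntilRollover(int counterBits, double opsPerSecond) {
        double capacity = Math.pow(2, counterBits);
        return capacity / (SECONDS_PER_DAY * opsPerSecond);
    }

    public static void main(String[] args) {
        double days32 = daysUntilRollover(32, 1000);
        double days40 = daysUntilRollover(40, 1000);
        System.out.printf("32-bit counter: %.1f days%n", days32);
        System.out.printf("40-bit counter: %.1f days (%.1f years)%n", days40, days40 / 365);
    }
}
```

Each extra counter bit doubles the time to rollover, so the 8 additional bits stretch it by a factor of 256.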

The idea looks good and solves our problem, so let's continue.

One more concern

At the operating-system level, a 32-bit system can process at most 32 bits in a single operation, while a long is 8 bytes (64 bits), so reading or writing a long takes two instructions, each handling 32 of the 64 bits.

Why bring this up? Because someone might connect it with the ZXID design, and ZOOKEEPER-2784 itself raises this issue:

"However, I thought the zxid is long type, reading and writing the long type (and double type the same) in the JVM, is divided into high 32bit and low 32bit part of the operation, and because the zxid variable is not modified with volatile and is not boxed for the corresponding reference type (Long / Double), so it belongs to [non-atomic operation]"

In other words:

zxid is a long, and on a 32-bit JVM a read or write of a long (like a double) is split into a high 32-bit half and a low 32-bit half. Because the zxid variable is not declared volatile and is not boxed as the corresponding reference type (Long/Double), these reads and writes are non-atomic operations.
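For readers unfamiliar with this rule, the sketch below shows the three ways a 64-bit value is commonly held in Java and which of them guarantee atomic access. It is a generic illustration, independent of how ZooKeeper stores its zxid; the class `ZxidHolder` and its fields are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder class for illustration; not taken from ZooKeeper.
final class ZxidHolder {
    // Plain long: the Java Language Specification allows a JVM to read or
    // write it as two separate 32-bit halves.
    private long plainZxid;

    // volatile long: single reads and writes are guaranteed to be atomic.
    private volatile long volatileZxid;

    // AtomicLong: atomic reads and writes plus atomic read-modify-write operations.
    private final AtomicLong atomicZxid = new AtomicLong();

    void set(long zxid) {
        plainZxid = zxid;     // may be observed half-written by another thread
        volatileZxid = zxid;  // never observed half-written
        atomicZxid.set(zxid); // never observed half-written
    }

    long nextZxid() {
        // volatileZxid++ would still be a non-atomic read-modify-write;
        // incrementAndGet performs the whole increment atomically.
        return atomicZxid.incrementAndGet();
    }

    long getPlain() { return plainZxid; }

    long getVolatile() { return volatileZxid; }
}
```

Note that the atomicity question concerns the 64-bit variable as a whole; it does not depend on where the boundary between epoch and counter is drawn inside it.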

The reporter was worried that changing the ZXID from a high 32-bit / low 32-bit split to a high 24-bit / low 40-bit split could cause concurrency problems.

Is that really a problem? Let's look at the source code first:

    Iterator<Integer> iterator = servers.iterator();
    long zxid = Long.valueOf(m.group(2));
    int count = (int)zxid;// & 0xFFFFFFFFL;
    int epoch = (int)Long.rotateRight(zxid, 32);// >> 32;

Note the & 0xFFFFFFFFL here. The code that follows contains many more bitwise AND operations like it, which are not reproduced here.

Reading through this part of the source shows that the worry is unnecessary: every operation that splits or combines the ZXID is a bitwise operation rather than a plain "=" assignment of its halves, so moving the boundary does not cause concurrency problems at the JVM level.

How to Modify

Next we follow the bitwise AND approach used in the source code and change the 32 bits to 40 bits.

That is, the low 40 bits of the zxid are extracted by ANDing (&) it with 0xffffffffffL (a 40-bit mask).

Note that the type of count must change from int to long: an int is 32 bits and a long is 64 bits, and count now carries 40 bits, so it no longer fits in an int.

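    Iterator<Integer> iterator = servers.iterator();
    long zxid = Long.valueOf(m.group(2));
    // Original 32/32 split:
    // int count = (int)zxid;// & 0xFFFFFFFFL;
    // int epoch = (int)Long.rotateRight(zxid, 32);// >> 32;
    // 24/40 split: the counter no longer fits in an int.
    long count = zxid & 0xffffffffffL;
    // Use an unsigned shift here: (int)Long.rotateRight(zxid, 40) would leave
    // the low 8 counter bits in the top byte of the int.
    int epoch = (int)(zxid >>> 40);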
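A round trip is a quick way to sanity-check a 24/40 split. The sketch below is a self-contained illustration; `ZxidLayout40` and its methods are hypothetical names and do not come from the ZooKeeper code base. It also shows why the epoch is best extracted with an unsigned shift: rotating by 40 bits and truncating to int drags the lowest 8 counter bits into the result.

```java
// Hypothetical helper for the 24/40 layout; names are illustrative only.
final class ZxidLayout40 {
    private static final int COUNTER_BITS = 40;
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1; // 0xffffffffffL

    // Pack a 24-bit epoch and a 40-bit counter into one 64-bit ZXID.
    static long makeZxid(int epoch, long counter) {
        return ((long) epoch << COUNTER_BITS) | (counter & COUNTER_MASK);
    }

    // High 24 bits. The unsigned shift discards the counter bits entirely.
    static int epochOf(long zxid) {
        return (int) (zxid >>> COUNTER_BITS);
    }

    // Low 40 bits; needs a long because an int holds only 32 bits.
    static long counterOf(long zxid) {
        return zxid & COUNTER_MASK;
    }

    public static void main(String[] args) {
        long zxid = makeZxid(7, 0xffffffffffL);
        System.out.println(epochOf(zxid));                     // 7
        System.out.println(Long.toHexString(counterOf(zxid))); // ffffffffff
        // Rotating by 40 and truncating to int keeps 8 counter bits in the top byte:
        System.out.println(Integer.toHexString((int) Long.rotateRight(zxid, 40))); // ff000007
    }
}
```

With a 32-bit rotation the truncation to int happens to drop exactly the counter, which is why the difference only shows up once the boundary moves to 40 bits.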

Many similar places further on need the same change; they are not listed here. If you are interested, see the changes on GitHub.

That's the end of this ZooKeeper article. If there is anything else about ZooKeeper you would like to know, leave a comment; if I think the topic is valuable, I will write more articles.
