"Original" KAKFA Utils source Code Analysis (ii)


We continue studying the kafka.utils package.

Eight. KafkaScheduler.scala

The file first defines a trait, Scheduler: a scheduler for running tasks. It supports both recurring background tasks and one-time delayed tasks. The trait declares three abstract methods:

1. startup: start the scheduler so that it is ready to accept tasks.
2. shutdown: shut the scheduler down. Once shut down, it no longer executes any scheduled tasks, including tasks that were scheduled to run after the shutdown time.
3. schedule: schedule a task for execution. The method takes five parameters:
   3.1 the task name;
   3.2 the task itself, a pure side-effect function (it returns Unit) that is invoked when the task fires;
   3.3 the initial delay;
   3.4 the execution period; if it is negative, the task is a one-time delayed task rather than a recurring one;
   3.5 the time unit for the delay and the period, defaulting to milliseconds.
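For reference, here is a condensed sketch of what the trait declares, reconstructed from the description above; the exact signatures may differ slightly from the Kafka source.

```scala
import java.util.concurrent.TimeUnit

trait Scheduler {
  // start the scheduler so it is ready to accept tasks
  def startup()

  // stop the scheduler; no further tasks are executed after this returns
  def shutdown()

  // schedule a task: a negative period means "run once after the delay",
  // a non-negative period means "run repeatedly at that interval"
  def schedule(name: String,
               fun: () => Unit,
               delay: Long = 0,
               period: Long = -1,
               unit: TimeUnit = TimeUnit.MILLISECONDS)
}
```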
The file then defines KafkaScheduler, a thread-safe class (marked @threadsafe) that implements the Scheduler trait. The scheduler is built on the ScheduledThreadPoolExecutor class from the java.util.concurrent package, that is, it schedules tasks through a thread pool. Since a thread pool is involved, the constructor takes the number of threads (threads), the name prefix for the pool's threads (threadNamePrefix, defaulting to "kafka-scheduler-"), and a daemon flag specifying whether the threads are daemon threads, i.e. threads that do not prevent the JVM from shutting down.

The class defines two fields. One is the ScheduledThreadPoolExecutor object that holds the thread pool; it is marked @volatile, guaranteeing that reads of the reference do not come from a register but go straight to main memory, ensuring memory visibility. The other is schedulerThreadId, an AtomicInteger that is combined with the thread name prefix to form the thread names; the AtomicInteger type guarantees that access to this counter is thread-safe.

Since KafkaScheduler implements the Scheduler trait, it must implement startup, shutdown and schedule; it also adds a helper, ensureStarted:

1. startup: if the scheduler was shut down cleanly, the executor field should be null, so the method first checks it; if executor is not null it throws an exception, because the scheduler may already be running. Otherwise it creates a thread pool with `threads` threads and configures the pool so that, once shut down, it stops executing any kind of scheduled task (both recurring background tasks and one-time delayed tasks). It then installs a thread factory that creates the pool's threads via the newThread method in Utils.scala (we will cover Utils.scala later).
2. ensureStarted: a pure side-effect function used only inside shutdown. Its sole purpose is to make sure the scheduler has been started: it simply checks whether executor is null and throws an exception if it is.
3. shutdown: calls ScheduledThreadPoolExecutor.shutdown and then waits, in a blocking fashion, for up to one day for the shutdown to complete (note that the one-day timeout is hard-coded, not configurable). According to "Java Concurrency in Practice", a blocking method should, strictly speaking, allow the caller to cancel it via interruption; presumably the developers felt shutdown would never block for long. Of course, I may be reading too much into it :)
4. schedule: the core logic of the scheduler. It first makes sure the scheduler is running (by calling ensureStarted), then wraps the given function into a Runnable using the runnable helper in Utils.scala (again, covered later). Finally it determines the type of task: if the period parameter is non-negative it is a recurring task, otherwise a one-time delayed task, and it calls the corresponding ScheduledThreadPoolExecutor method (scheduleAtFixedRate or schedule) to run the Runnable.
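To make the calling pattern concrete, here is a small hypothetical usage sketch; flushLogs and cleanupOnce are placeholder task bodies, not Kafka functions, and the parameter names follow the description above rather than the exact source.

```scala
import java.util.concurrent.TimeUnit
import kafka.utils.KafkaScheduler

def flushLogs(): Unit = println("flushing logs")    // placeholder task body
def cleanupOnce(): Unit = println("cleaning up")    // placeholder task body

val scheduler = new KafkaScheduler(threads = 1)
scheduler.startup()

// recurring background task: first run after 10 seconds, then every 30 seconds
scheduler.schedule("flush-logs", () => flushLogs(), delay = 10, period = 30, unit = TimeUnit.SECONDS)

// one-time delayed task: a negative period means "run exactly once"
scheduler.schedule("cleanup-once", () => cleanupOnce(), delay = 5, period = -1, unit = TimeUnit.SECONDS)

// blocks until pending tasks finish, up to the hard-coded one-day timeout
scheduler.shutdown()
```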
Nine. Log4jController.scala

The name alone tells you this is about log4j management. The code structure is also clear: a class/companion-object pair plus a private trait. Let us start with the private trait, Log4jControllerMBean. Being a trait, it resembles a Java interface and simply declares a few abstract methods:

1. getLoggers: returns the list of logger names as a List[String].
2. getLogLevel: gets the log level of a logger.
3. setLogLevel: sets the log level. Unlike an ordinary setter, this method returns a Boolean; the reason becomes clear in the implementing class.

Since the trait is defined, there is naturally a concrete class implementing it: Log4jController, which allows log4j log levels to be changed dynamically at run time. On top of the three abstract methods declared by Log4jControllerMBean, the class adds two private helper methods, newLogger and existingLogger, five methods in total:

1. newLogger: creates a logger, possibly the root logger but more often an ordinary one.
2. existingLogger: returns the logger corresponding to a given loggerName.
3. getLoggers: returns the current set of loggers (including the root logger), each entry formatted as "logger name = log level".
4. getLogLevel: returns the log level for the given logger name.
5. setLogLevel: sets the log level. Notably, if the logger name or the level is empty, it returns false to signal that the update did not succeed - which is why the method returns a Boolean.

The Log4jController companion object is simple: it just instantiates a Log4jController and registers it with the platform MBean server using the registerMBean method in Utils.scala, under the name kafka:type=kafka.Log4jController.

10. Logging.scala

This trait has not been mentioned much so far, yet a great many classes mix it in. The name is self-descriptive: it provides the logging methods used throughout the code base. The trait creates a logger object as a lazy val. In Scala, lazy means lazily initialized: the logger is only constructed the first time it is used. Because so many classes mix in Logging, making logger a lazy val matters; otherwise a logger object would be constructed every time an instance of an implementing class is created, which is completely unnecessary - initializing it only when it is actually used is clearly better. The trait also has a logIdent field, initialized to null; since it is a protected var, it is obviously meant to be set by the classes that mix the trait in. As the name suggests, it is a prefix string identifying the source of each log line, and many classes later in the code set their own logIdent. The trait then defines a large number of logging methods, one family per level: trace, debug, info, warn, error and fatal. Interestingly, each level also has a swallow* variant: it takes a function with no return value (strictly speaking, a function returning Unit) and runs it; if an exception is thrown, the method merely logs it and swallows it instead of rethrowing. The swallow helper in Utils.scala implements this behavior. Since most of the methods in the Logging trait are simple and repetitive, I will not go through them one by one; a condensed sketch of the pattern follows.
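The sketch below shows the pattern with only one log level; the real trait repeats it for trace/debug/info/warn/error/fatal and routes the swallow variants through the swallow helper in Utils.scala rather than an inline try/catch, so treat this as an illustration, not the actual source.

```scala
import org.apache.log4j.Logger

trait Logging {
  // lazily initialized: the underlying log4j Logger is created on first use only
  lazy val logger = Logger.getLogger(this.getClass.getName)

  // prefix identifying the log source; classes mixing in the trait set it
  protected var logIdent: String = null

  private def withIdent(msg: String): String =
    if (logIdent == null) msg else logIdent + msg

  def info(msg: => String): Unit =
    if (logger.isInfoEnabled) logger.info(withIdent(msg))

  // run a side-effecting action; if it throws, log the exception and swallow it
  def swallowInfo(action: => Unit): Unit =
    try action
    catch { case e: Throwable => logger.info(e.getMessage, e) }
}
```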
11. Mx4jLoader.scala

As the name suggests, this file works with the mx4j-tools open-source library (website: http://mx4j.sourceforge.net/), but the Kafka source tree does not ship the corresponding JAR; if you want to use it, download it yourself and put it on the classpath. The latest version is 3.0.2: http://sourceforge.net/projects/mx4j/files/MX4J%20Binary/3.0.2/

The file provides an object whose main job is to enable JMX; the feature is switched on with -Dmx4jenable=true. The default address and port are 0.0.0.0 and 8082 respectively, and -Dmx4jaddress=127.0.0.1 and -Dmx4jport=8083 can be used to override the defaults. Mx4jLoader.maybeLoad is called later in KafkaServer to load the JMX settings.

maybeLoad: as the name says, it "maybe" loads - it may well not, either because the mx4j-tools jar is not on the classpath or because the feature is not enabled in the configuration (it is off by default). The flow is: first load the system properties (Kafka wraps the Java Properties object in its own VerifiableProperties class), then check for the kafka_mx4jenable property. If it is absent, return false immediately - JMX is not to be loaded. Otherwise read the mx4jaddress and mx4jport properties if they are present, instantiate an HttpAdaptor object and an XSLTProcessor object via reflection (both classes come from mx4j-tools), and register them. If a ClassNotFoundException is caught along the way, the method returns false, meaning the mx4j-tools jar is not on the classpath; if an MBean-registration-related exception is caught, it is recorded and the method again returns false.

12. Os.scala

A very compact object that only provides two members, the name string and isWindows, used respectively to obtain the operating system name and to determine whether the platform is Windows.

13. Pool.scala

The name says pool, but the field backing the pool is actually a ConcurrentHashMap; the class is more a thin wrapper around ConcurrentHashMap, so many of the methods it offers are implemented by directly delegating to the ConcurrentHashMap method of the same name. It is also generic in [K, V]. It is worth noting that the primary constructor takes a parameter of type Option[(K) => V] - an optional function that takes a K and returns a V - with None as the default. There is also an auxiliary constructor that copies the [K, V] pairs of a given Map into the underlying ConcurrentHashMap. Since most of the methods are plain delegations to ConcurrentHashMap, I will not repeat them, and will instead single out the getAndMaybePut method.

getAndMaybePut: the name is self-descriptive - get the value for a given key, and if the key is absent, add an entry for it, that is, generate a value through valueFactory, put it into the pool, and return it. But if an insert does happen, how is the value computed? Let's take a look at the code.
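Roughly, the method looks like this; it is a paraphrase rather than a verbatim copy of the source, and the field names (pool, valueFactory, createLock) and the KafkaException follow the description above.

```scala
// Paraphrased sketch of Pool.getAndMaybePut; valueFactory is the Option[(K) => V]
// passed to the constructor, pool is the underlying ConcurrentHashMap,
// createLock is an internal lock object, KafkaException is Kafka's own exception type.
def getAndMaybePut(key: K): V = {
  if (valueFactory.isEmpty)
    throw new KafkaException("Empty value factory in pool.")
  val curr = pool.get(key)
  if (curr == null) {
    createLock synchronized {
      // re-check under the lock: another thread may have inserted the key meanwhile
      val existing = pool.get(key)
      if (existing == null)
        pool.put(key, valueFactory.get(key))
      pool.get(key)
    }
  } else {
    curr
  }
}
```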
As you can see, the code first checks whether valueFactory is empty and throws an exception right away if it is. It is also worth noting that although this method uses a synchronization block, the class exposes other methods (such as put) that can also add entries to the ConcurrentHashMap. So by the time getAndMaybePut returns, the value you get back may not be the one computed by valueFactory - another thread may have won the race and inserted its own [key, value] pair first; all of this rests on ConcurrentHashMap being CAS-based.

As for the null check itself: in fact we could first check whether a value already exists for the key; if it does, we can return it right away, and it would not matter that valueFactory is empty, because we never need to generate a value from it. So I think the code could be rewritten to postpone the non-null check of valueFactory to the moment it is actually needed.
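One possible shape of that rewrite, keeping the same assumed field names as the sketch above:

```scala
// Possible rewrite: only consult valueFactory when the key is actually missing.
def getAndMaybePut(key: K): V = {
  val curr = pool.get(key)
  if (curr != null) {
    curr                                  // value already there; valueFactory is never touched
  } else {
    if (valueFactory.isEmpty)             // the factory is needed only on this path
      throw new KafkaException("Empty value factory in pool.")
    createLock synchronized {
      val existing = pool.get(key)
      if (existing == null)
        pool.put(key, valueFactory.get(key))
      pool.get(key)
    }
  }
}
```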
14. ReplicationUtils.scala

For Kafka's messages to be persisted durably across the cluster, some level of redundancy has to be provided - that is, the replica mechanism. Much like Hadoop, Kafka has a corresponding replication factor. The concrete implementation will be covered when we get to replication; this file just provides an object with common utility methods used by the replica mechanism. Let us go through them one by one:

1. parseLeaderAndIsr: ISR stands for in-sync replicas, the set of replicas that are still alive and whose state has not fallen too far behind the leader replica. Naturally we need to define how much lag behind the leader is acceptable, which is configurable through two parameters: replica.lag.time.max.ms and replica.lag.max.messages. The method receives a JSON-formatted string containing the leader, leader_epoch, the ISR list and controller_epoch, and after parsing returns a LeaderIsrAndControllerEpoch object. The latter lives in the kafka.controller package and is a simple case class whose main purpose is to print basic information about the leader and the ISR - ids, epochs and so on - information that is stored in ZooKeeper.
2. checkLeaderAndIsrZkData: as the name implies, checks the leader and ISR list data at a given ZooKeeper path. It reads the data at that path with ZkUtils.readDataMaybeNull (the result may of course be null); if data is present it calls the first method, parseLeaderAndIsr, to try to parse it. On success it returns (true, ZooKeeper version); if any exception occurs it returns (false, -1), meaning the check failed.
3. updateLeaderAndIsr: uses the ZooKeeper client object to update the leader and ISR information stored in ZK. Because the replica mechanism Kafka provides operates on topic partitions, the method also receives a partition id. It returns a Boolean indicating whether the update succeeded. The logic is straightforward: first obtain the ZooKeeper path to update, then call the leaderAndIsrZkData method in ZkUtils to assemble the new JSON string, and finally perform the update with the conditionalUpdatePersistentPath method. As the name says, the update is conditional, which means it may fail - for example when the path does not exist or the current version does not match. We will cover those two ZkUtils methods when we study ZkUtils.scala. In any case, a Boolean comes back to say whether the update succeeded.

Overall, the first two methods mainly serve updateLeaderAndIsr, and updateLeaderAndIsr is in turn called from kafka.cluster.Partition. A sketch of its flow is given below.
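To tie the three steps together, here is a hedged sketch of the update flow. The helpers ZkUtils.leaderAndIsrZkData and ZkUtils.conditionalUpdatePersistentPath are the ones mentioned above; the path helper, the parameter list and the plain Boolean return type are simplifications for illustration rather than the exact source.

```scala
import org.I0Itec.zkclient.ZkClient
import kafka.api.LeaderAndIsr
import kafka.utils.ZkUtils

// Sketch only: the real signature and return type may differ.
def updateLeaderAndIsr(zkClient: ZkClient, topic: String, partitionId: Int,
                       newLeaderAndIsr: LeaderAndIsr, controllerEpoch: Int,
                       expectedZkVersion: Int): Boolean = {
  // 1. the ZK path holding this partition's leader/ISR state (assumed helper name)
  val path = ZkUtils.getTopicPartitionLeaderAndIsrPath(topic, partitionId)
  // 2. assemble the new JSON payload: leader, leader_epoch, isr, controller_epoch
  val newData = ZkUtils.leaderAndIsrZkData(newLeaderAndIsr, controllerEpoch)
  // 3. conditional update: fails if the path is missing or the version does not match
  val (updateSucceeded, _) = ZkUtils.conditionalUpdatePersistentPath(zkClient, path, newData, expectedZkVersion)
  updateSucceeded
}
```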

"Original" KAKFA Utils source Code Analysis (ii)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.