Nutch 1.3 study notes 10-3: analysis of the plug-in mechanism
-------------------------------------
1. Some object descriptions
- PluginRepository: a repository that stores all plug-in descriptors (PluginDescriptor), all extension points (ExtensionPoint), and the activated plug-ins.
- PluginDescriptor: describes the meta information of a single plug-in. Its content comes mainly from plugin.xml.
- Plugin: the abstraction of a plug-in. Each Plugin holds exactly one PluginDescriptor (a one-to-one relationship).
- ExtensionPoint: an extension point is essentially an interface, which any number of extensions can implement. Extension points are themselves declared by a plug-in, for example nutch-extensionpoints.
- Extension: an extension is one implementation of an extension point. A plug-in can contain multiple extensions.
- PluginManifestParser: parses the plugin.xml file in each plug-in directory and generates the corresponding PluginDescriptor object.
- PluginClassLoader: extends URLClassLoader and is used to load plug-in implementation classes dynamically from URLs.
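To make the role of the class loader more concrete, here is a minimal sketch of loading and instantiating a class by name through a URLClassLoader subclass. ToyPluginClassLoader and its newInstance method are hypothetical names for illustration only, not Nutch's PluginClassLoader; with an empty URL array the lookup simply falls through to the parent loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Toy stand-in for a plug-in class loader (hypothetical, not Nutch's class). */
final class ToyPluginClassLoader extends URLClassLoader {

  ToyPluginClassLoader(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  /** Loads the named class through this loader and calls its no-arg constructor. */
  Object newInstance(String className) {
    try {
      Class<?> clazz = Class.forName(className, true, this);
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot instantiate " + className, e);
    }
  }

  public static void main(String[] args) {
    // No plug-in jars are listed here, so the parent loader resolves the class.
    ToyPluginClassLoader loader = new ToyPluginClassLoader(
        new URL[0], ToyPluginClassLoader.class.getClassLoader());
    Object list = loader.newInstance("java.util.ArrayList");
    System.out.println(list.getClass().getName());
  }
}
```

In a real plug-in setup the URL array would presumably point at the jars listed in the plug-in's manifest, so that each plug-in can see its own libraries.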
2. Plug-in repository initialization process
There are two ways to obtain a PluginRepository: create one directly with its constructor, or call the static PluginRepository.get method, which returns a cached instance. Nutch itself generally uses the second approach, so that one repository is shared by all the code that needs it instead of being rebuilt each time.
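The cached lookup can be sketched roughly as follows. This is a toy illustration that assumes the cache is keyed by the configuration object; ToyRepository, its CACHE field, and the use of a plain Object as the configuration are all hypothetical and are not taken from the Nutch source.

```java
import java.util.Map;
import java.util.WeakHashMap;

/** Toy repository with a static cache (hypothetical, not Nutch's PluginRepository). */
final class ToyRepository {

  // One repository per configuration object; entries vanish when the key is collected.
  private static final Map<Object, ToyRepository> CACHE = new WeakHashMap<>();

  ToyRepository(Object conf) {
    // A real repository would scan the plug-in folders named in conf here.
    // The toy deliberately does not keep a reference to conf, because a value
    // that strongly references its key would defeat the weak map.
  }

  /** Returns the cached repository for conf, creating it on first use. */
  static synchronized ToyRepository get(Object conf) {
    ToyRepository repo = CACHE.get(conf);
    if (repo == null) {
      repo = new ToyRepository(conf);
      CACHE.put(conf, repo);
    }
    return repo;
  }

  public static void main(String[] args) {
    Object conf = new Object();
    System.out.println(get(conf) == get(conf));              // same cached instance
    System.out.println(new ToyRepository(conf) == get(conf)); // direct creation bypasses the cache
  }
}
```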
2.1 PluginRepository is initialized in its constructor
The source code is as follows:
fActivatedPlugins = new HashMap<String, Plugin>();
fExtensionPoints = new HashMap<String, ExtensionPoint>();
this.conf = conf;
// Whether a plug-in that was filtered out (and so would not be loaded) is
// activated automatically when another plug-in depends on it. Default: true.
this.auto = conf.getBoolean("plugin.auto-activation", true);
// Plug-in directory names; more than one directory may be given
String[] pluginFolders = conf.getStrings("plugin.folders");
PluginManifestParser manifestParser = new PluginManifestParser(conf, this);
Map<String, PluginDescriptor> allPlugins = manifestParser.parsePluginFolder(pluginFolders);
// Plug-in names to exclude, given as a regular expression
Pattern excludes = Pattern.compile(conf.get("plugin.excludes", ""));
// Plug-in names to include, given as a regular expression
Pattern includes = Pattern.compile(conf.get("plugin.includes", ""));
// Filter out the unused plug-ins and return the ones that remain
Map<String, PluginDescriptor> filteredPlugins = filter(excludes, includes, allPlugins);
// Check the plug-in dependencies
fRegisteredPlugins = getDependencyCheckedPlugins(filteredPlugins,
    this.auto ? allPlugins : filteredPlugins);
// Install the extension points, mainly those of the nutch-extensionpoints plug-in
installExtensionPoints(fRegisteredPlugins);
try {
  // Install the extensions that belong to each extension point
  // Note: both extension points and extensions are in fact packaged as plug-ins
  installExtensions(fRegisteredPlugins);
} catch (PluginRuntimeException e) {
  LOG.fatal(e.toString());
  throw new RuntimeException(e.getMessage());
}
displayStatus();

2.2 Generating a plug-in descriptor
Plug-in descriptors are generated by calling the parsePluginFolder method of the PluginManifestParser object. The source code is as follows:
/**
 * Returns a list of all found plugin descriptors.
 *
 * @param pluginFolders
 *          folders to search plugins from
 * @return A {@link Map} of all found {@link PluginDescriptor}s.
 */
public Map<String, PluginDescriptor> parsePluginFolder(String[] pluginFolders) {
  Map<String, PluginDescriptor> map = new HashMap<String, PluginDescriptor>();
  if (pluginFolders == null) {
    throw new IllegalArgumentException("plugin.folders is not defined");
  }
  for (String name : pluginFolders) {
    // Iterate over all plug-in directories; getPluginFolder also resolves
    // directories given as relative resource paths
    File directory = getPluginFolder(name);
    if (directory == null) {
      continue;
    }
    LOG.info("Plugins: looking in: " + directory.getAbsolutePath());
    // Iterate over the plug-ins in this directory
    for (File oneSubFolder : directory.listFiles()) {
      if (oneSubFolder.isDirectory()) {
        String manifestPath = oneSubFolder.getAbsolutePath() + File.separator
            + "plugin.xml";
        try {
          LOG.debug("parsing: " + manifestPath);
          // Parse the plugin.xml file
          PluginDescriptor p = parseManifestFile(manifestPath);
          map.put(p.getPluginId(), p);
        } catch (MalformedURLException e) {
          LOG.warn(e.toString());
        } catch (SAXException e) {
          LOG.warn(e.toString());
        } catch (IOException e) {
          LOG.warn(e.toString());
        } catch (ParserConfigurationException e) {
          LOG.warn(e.toString());
        }
      }
    }
  }
  return map;
}

private PluginDescriptor parseManifestFile(String pManifestPath)
    throws MalformedURLException, SAXException, IOException,
    ParserConfigurationException {
  // Parse the XML file into a Document object
  Document document = parseXML(new File(pManifestPath).toURL());
  String pPath = new File(pManifestPath).getParent();
  // Analyze the XML
  return parsePlugin(document, pPath);
}

private PluginDescriptor parsePlugin(Document pDocument, String pPath)
    throws MalformedURLException {
  Element rootElement = pDocument.getDocumentElement();
  // Parse the following information from the XML:
  // <plugin id="index-anchor" name="Anchor Indexing Filter"
  //         version="1.0.0" provider-name="nutch.org">
  String id = rootElement.getAttribute(ATTR_ID);
  String name = rootElement.getAttribute(ATTR_NAME);
  String version = rootElement.getAttribute("version");
  String providerName = rootElement.getAttribute("provider-name");
  // The plug-in class attribute; it does not appear to be used in practice
  String pluginClazz = null;
  if (rootElement.getAttribute(ATTR_CLASS).trim().length() > 0) {
    pluginClazz = rootElement.getAttribute(ATTR_CLASS);
  }
  // Create the plug-in descriptor object
  PluginDescriptor pluginDescriptor = new PluginDescriptor(id, version, name,
      providerName, pluginClazz, pPath, this.conf);
  LOG.debug("plugin: id=" + id + " name=" + name + " version=" + version
      + " provider=" + providerName + " class=" + pluginClazz);
  // Parse the extensions, for example:
  // <extension id="org.apache.nutch.indexer.anchor"
  //            name="Nutch Anchor Indexing Filter"
  //            point="org.apache.nutch.indexer.IndexingFilter">
  //   <implementation id="AnchorIndexingFilter"
  //                   class="org.apache.nutch.indexer.anchor.AnchorIndexingFilter"/>
  // </extension>
  parseExtension(rootElement, pluginDescriptor);
  // Parse the extension points; this mainly applies to the
  // nutch-extensionpoints plug-in, whose XML looks like this:
  // <extension-point id="org.apache.nutch.indexer.IndexingFilter" name="Nutch Indexing Filter"/>
  // <extension-point id="org.apache.nutch.parse.Parser" name="Nutch Content Parser"/>
  // <extension-point id="org.apache.nutch.parse.HtmlParseFilter" name="HTML Parse Filter"/>
  parseExtensionPoints(rootElement, pluginDescriptor);
  // Parse the plug-in's own library and the third-party libraries it uses:
  // <runtime>
  //   <library name="parse-tika.jar">
  //     <export name="*"/>
  //   </library>
  //   <library name="apache-mime4j-0.6.jar"/>
  //   <library name="asm-3.1.jar"/>
  //   <library name="bcmail-jdk15-1.45.jar"/>
  //   <library name="bcprov-jdk15-1.45.jar"/>
  // </runtime>
  parseLibraries(rootElement, pluginDescriptor);
  // Parse the plug-ins that this plug-in depends on:
  // <requires>
  //   <import plugin="nutch-extensionpoints"/>
  //   <import plugin="lib-regex-filter"/>
  // </requires>
  parseRequires(rootElement, pluginDescriptor);
  return pluginDescriptor;
}
Note that PluginManifestParser parses each plugin.xml file and generates the PluginDescriptor objects that the PluginRepository stores. One odd aspect of the design is that a single PluginDescriptor can contain both extension points and extensions. Why not keep the extension points separate, so that a PluginDescriptor contains only one or more implementations of extension points? After all, an extension point is the interface definition of a plug-in.
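As a rough illustration of the parsing described in this section, the following sketch reads the plug-in id and the extension point of each extension from a manifest with the standard DOM API. ToyManifestReader and readExtensions are hypothetical names, and the sketch covers only two attributes; it is not the PluginManifestParser implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Toy manifest reader (hypothetical, much simpler than PluginManifestParser). */
final class ToyManifestReader {

  /** Returns one "pluginId -> extensionPointId" entry per <extension> element. */
  static List<String> readExtensions(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
      Element root = doc.getDocumentElement();
      String pluginId = root.getAttribute("id");
      List<String> result = new ArrayList<>();
      // Matches <extension> only; <extension-point> has a different tag name.
      NodeList extensions = root.getElementsByTagName("extension");
      for (int i = 0; i < extensions.getLength(); i++) {
        Element extension = (Element) extensions.item(i);
        result.add(pluginId + " -> " + extension.getAttribute("point"));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException("Cannot parse manifest", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<plugin id='index-anchor' name='Anchor Indexing Filter'"
        + " version='1.0.0' provider-name='nutch.org'>"
        + "<extension id='org.apache.nutch.indexer.anchor'"
        + " point='org.apache.nutch.indexer.IndexingFilter'>"
        + "<implementation id='AnchorIndexingFilter'"
        + " class='org.apache.nutch.indexer.anchor.AnchorIndexingFilter'/>"
        + "</extension></plugin>";
    System.out.println(readExtensions(xml));
  }
}
```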
2.3 check plug-in dependency
The dependency check is interesting. Its behavior depends mainly on the plugin.auto-activation parameter. Part of the source code is as follows:
/**
 * @param filtered
 *          is the list of plugin filtred
 * @param all
 *          is the list of all plugins found.
 * @return List
 */
private List<PluginDescriptor> getDependencyCheckedPlugins(
    Map<String, PluginDescriptor> filtered, Map<String, PluginDescriptor> all) {
  if (filtered == null) {
    return null;
  }
  Map<String, PluginDescriptor> checked = new HashMap<String, PluginDescriptor>();
  // Iterate over all filtered plug-ins
  for (PluginDescriptor plugin : filtered.values()) {
    try {
      // Save the descriptors of the plug-ins that the current plug-in depends on
      checked.putAll(getPluginCheckedDependencies(plugin, all));
      // Save the descriptor of the current plug-in
      checked.put(plugin.getPluginId(), plugin);
    } catch (MissingDependencyException mde) {
      // Log exception and ignore plugin
      LOG.warn(mde.getMessage());
    } catch (CircularDependencyException cde) {
      // Simply ignore this plugin
      LOG.warn(cde.getMessage());
    }
  }
  return new ArrayList<PluginDescriptor>(checked.values());
}
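The effect of plugin.auto-activation on this check can be illustrated with a small self-contained sketch. It assumes that dependencies are resolved recursively against the second map and that a plug-in with a missing or circular dependency is skipped; ToyDependencyCheck, check, and collect are hypothetical names, plug-ins are reduced to plain ids, and the real getPluginCheckedDependencies may differ in detail.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy dependency check over plug-in ids (hypothetical, not the Nutch code). */
final class ToyDependencyCheck {

  /**
   * @param filtered ids selected by the include/exclude patterns
   * @param visible  id -> required ids for every plug-in a dependency may resolve to
   *                 (all plug-ins with auto-activation on, only the filtered ones with it off)
   * @return the ids that pass the check, together with their dependencies
   */
  static Set<String> check(Set<String> filtered, Map<String, List<String>> visible) {
    Set<String> checked = new TreeSet<>();
    for (String id : filtered) {
      try {
        Set<String> dependencies = new HashSet<>();
        collect(id, visible, dependencies, new HashSet<>());
        checked.addAll(dependencies);
        checked.add(id);
      } catch (IllegalStateException e) {
        // Report the problem and skip this plug-in, as the code above does
        System.err.println("Skipping " + id + ": " + e.getMessage());
      }
    }
    return checked;
  }

  /** Walks the dependency graph depth-first; branch holds the ids on the current path. */
  private static void collect(String id, Map<String, List<String>> visible,
      Set<String> dependencies, Set<String> branch) {
    branch.add(id);
    for (String dep : visible.getOrDefault(id, Collections.<String>emptyList())) {
      if (!visible.containsKey(dep)) {
        throw new IllegalStateException("missing dependency " + dep);
      }
      if (branch.contains(dep)) {
        throw new IllegalStateException("circular dependency on " + dep);
      }
      dependencies.add(dep);
      collect(dep, visible, dependencies, branch);
    }
    branch.remove(id);
  }
}
```

With auto-activation on, a filtered plug-in "a" that requires an unfiltered plug-in "b" pulls "b" in. With it off, "b" is not visible, so "a" is reported as having a missing dependency and is dropped.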
3. Plug-in invocation process
Invoking a plug-in takes the following steps:
- Obtain the extension point object from the plug-in repository by its extension point ID.
- Get the set of extensions registered for that extension point object.
- Iterate over the extensions and instantiate the implementation behind each Extension object. The instantiation goes through PluginClassLoader.
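The three steps can be sketched in a self-contained toy form. Everything here is hypothetical (ToyPluginLookup, ToyFilter, the two sample filters, and the id "toy.URLFilter"); the registry is a plain map rather than the Nutch repository, and the ordinary class loader stands in for PluginClassLoader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy registry that mimics the lookup steps (hypothetical, not the Nutch API). */
final class ToyPluginLookup {

  /** Stand-in for an extension point interface such as URLFilter. */
  interface ToyFilter {
    String filter(String url);
  }

  static final class PassThroughFilter implements ToyFilter {
    public String filter(String url) { return url; }
  }

  static final class HttpOnlyFilter implements ToyFilter {
    public String filter(String url) { return url.startsWith("http://") ? url : null; }
  }

  // Extension point id -> class names of the registered implementations
  private final Map<String, List<String>> extensionsByPoint = new HashMap<>();

  void register(String pointId, Class<? extends ToyFilter> implementation) {
    extensionsByPoint.computeIfAbsent(pointId, k -> new ArrayList<>())
        .add(implementation.getName());
  }

  /** Returns one instance per implementation class registered for pointId. */
  Map<String, ToyFilter> load(String pointId) {
    // Steps 1 and 2: find the extension point and its extensions
    List<String> extensions = extensionsByPoint.get(pointId);
    if (extensions == null) {
      throw new IllegalStateException(pointId + " not found.");
    }
    Map<String, ToyFilter> filters = new LinkedHashMap<>();
    for (String className : extensions) {
      if (filters.containsKey(className)) {
        continue; // keep a single instance per implementation class
      }
      try {
        // Step 3: instantiate by name through a class loader
        Object instance = Class.forName(className, true,
            ToyPluginLookup.class.getClassLoader()).getDeclaredConstructor().newInstance();
        filters.put(className, (ToyFilter) instance);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot instantiate " + className, e);
      }
    }
    return filters;
  }
}
```

Unlike this toy, which checks for duplicates before instantiating, the Nutch snippet in this section instantiates each extension first and then deduplicates on the class name of the resulting instance.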
The following code instantiates the URLFilter plug-ins:
// (1) Look up the extension point by its id
ExtensionPoint point = PluginRepository.get(conf).getExtensionPoint(URLFilter.X_POINT_ID);
if (point == null)
  throw new RuntimeException(URLFilter.X_POINT_ID + " not found.");
// (2) Get the extensions registered for that extension point
Extension[] extensions = point.getExtensions();
Map<String, URLFilter> filterMap = new HashMap<String, URLFilter>();
for (int i = 0; i < extensions.length; i++) {
  Extension extension = extensions[i];
  // (3) Instantiate the implementation behind the extension
  URLFilter filter = (URLFilter) extension.getExtensionInstance();
  if (!filterMap.containsKey(filter.getClass().getName())) {
    filterMap.put(filter.getClass().getName(), filter);
  }
}

4. Summary
Nutch's plug-in mechanism is a classic design. The above is only a brief analysis; a deeper understanding requires more hands-on practice.