http://www.ibm.com/developerworks/cn/linux/l-gperf.html
The role of command-line processing and gperf
Command-line processing has long been one of the most overlooked areas of software development. Almost every moderately complex program offers some command-line options. In practice, a large number of if-else statements are often used to handle user input, and maintaining this kind of legacy code is time-consuming even for senior programmers. In this situation, many C developers write lengthy (usually nested) if-else statements, supplemented by C library functions such as strcmp, strcasecmp, and strtok, as shown in Listing 1.
Listing 1. C-style command-line processing
    if (strtok(cmdstring, "+dumpdirectory"))
      {
      // code for printing help messages goes here
      }
    else if (strtok(cmdstring, "+dumpfile"))
      {
      // code for printing version info goes here
      }
Instead of an ANSI C-based application programming interface, C++ developers typically use strings from the Standard Template Library (STL). However, the nested sequence of if-else statements still cannot be avoided. Clearly, this approach does not scale as the number of command-line options grows. For a typical program invocation with N options, the code ends up performing O(N^2) comparisons. To generate code that runs faster and is easier to maintain, it makes sense to store the command-line options in a hash table and use hashing to validate user-specified input.
This is where gperf comes in. It generates a hash table from a list of valid command-line options, together with a lookup function whose time complexity is O(1). For a typical program invocation with N options, the code therefore needs only O(N) [N * O(1)] comparisons, a huge improvement over legacy code.
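As a rough illustration of the difference, the following sketch counts the string comparisons made by a linear if-else style scan and contrasts it with a hashed lookup. It is not gperf output: std::unordered_set stands in for the hash table, and the names kOptions, comparisons_linear, total_linear_comparisons, and is_valid_option are hypothetical. The option strings are borrowed from the sample project later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_set>

// Hypothetical option list (same strings as the sample project).
static const char* const kOptions[] = {
    "+helpverbose", "+password", "+nocopyright", "+nolog", "+_64bit"};
static const std::size_t kOptionCount = sizeof(kOptions) / sizeof(kOptions[0]);

// Linear if-else style matching: returns how many strcmp calls were made
// before the argument matched (or the list was exhausted).
std::size_t comparisons_linear(const char* arg) {
    assert(arg != nullptr);
    std::size_t count = 0;
    for (std::size_t i = 0; i < kOptionCount; ++i) {
        ++count;
        if (std::strcmp(arg, kOptions[i]) == 0) break;
    }
    return count;
}

// Total strcmp calls when every option is passed exactly once.
// With N options this grows as N * (N + 1) / 2, that is, O(N^2).
std::size_t total_linear_comparisons() {
    std::size_t total = 0;
    for (std::size_t i = 0; i < kOptionCount; ++i)
        total += comparisons_linear(kOptions[i]);
    return total;
}

// Hash-based validation: one hashed lookup per argument on average.
bool is_valid_option(const std::string& arg) {
    static const std::unordered_set<std::string> valid(kOptions,
                                                       kOptions + kOptionCount);
    return valid.count(arg) != 0;
}
```

With five options, the linear scan needs 1 + 2 + 3 + 4 + 5 comparisons in total when each option appears once, while the hashed version performs one lookup per argument.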
Gperf usage pattern
Gperf reads a set of keywords from a user-supplied file (typically with a .gperf extension, although this is not mandatory), for example commandoptions.gperf, and generates C/C++ source code for the hash table, the hashing function, and the lookup method. All code is written to standard output, which you must then redirect to a file, as follows:
    gperf -L C++ command_line_options.gperf > perfecthash.hpp
Note: The -L option instructs gperf to generate C++ code.
Gperf input file format
Listing 2 shows the typical format of the Gperf input file.
Listing 2. Gperf input file format
    %{
    /* C code, goes verbatim in output */
    %}
    declarations
    %%
    keywords
    %%
    functions
The file format consists of several elements: C code content, declarations, keywords, and functions.
C Code Content
The C code content is optional and is enclosed between %{ and %}. The C code and comments inside are copied verbatim to the output file generated by gperf. (Note that this is similar to the GNU flex and bison utilities.)
Declarations
The declarations section is also optional; you can omit it entirely if you do not invoke gperf with the -t option. If you do enable this option, however, the last component of the declarations section must be a structure whose first field is a char* or const char* identifier called name.
You can, however, override the name of the first field with the -K option of gperf. For example, if you want to name the field command_option, use the following gperf invocation:
    gperf -t -K command_option
Listing 3 shows the C Code content and Declarations section.
Listing 3. C Code content and Declarations section
    %{
    struct CommandOptionCode {
      enum {
        HELPVERBOSE = 1,
        ..., // more option codes here
        _64BIT = 5
      };
    };
    typedef struct CommandOptionCode CommandOptionCode;
    %}
    struct CommandOption
      {
      const char* command_option;
      int OptionCode;
      };
    %%
Keywords
The keywords section contains the keywords, which in this example are the predefined command-line arguments. In this section, a line that starts with a number sign (#) in the first column is a comment. The keyword must be the first field of each non-comment line; the quotation marks normally associated with a char* string are optional. Additional fields can follow the keyword; they must be separated by commas and end at the end of the line. These fields correspond directly to the members of the last structure in the declarations section, as shown in Listing 4.
Listing 4. Keyword section
    %%
    +helpverbose, CommandOptionCode::HELPVERBOSE
    +append_log, CommandOptionCode::APPEND_LOG
    +compile, CommandOptionCode::COMPILE
C++/STL-style initialization
The typical C++/STL-style initialization creates a std::map and calls its insert() method repeatedly to fill the map. Anyone maintaining such code then has to debug it to find out exactly where each command-line option is initialized, a situation that is common in poorly written code. Gperf provides a much cleaner interface for this.
In each keyword line, the first entry corresponds to the const char* command_option field of the CommandOption structure shown in Listing 3, and the second entry corresponds to the int OptionCode field of the same structure. So what does this mean? In effect, this is how gperf initializes the hash table that stores the command-line options and their associated attributes.
Functions
The functions section is also optional. All text in this section, from the second %% to the end of the file, is copied verbatim to the generated file. As with the declarations section, the user is responsible for providing valid C/C++ code in the functions section.
Gperf output
Gperf hashes a predefined set of keywords and then performs fast lookups on those keywords. Accordingly, gperf outputs two functions: hash() and in_word_set(). The former is the hashing routine, and the latter performs the lookup. The gperf output can be in either C or C++, whichever you specify. If the output is C, two C functions with the names above are generated. If it is C++, gperf generates a class named Perfect_Hash that contains the two methods.
Note: You can use the -Z option to change the name of the generated class.
The prototype of the hash function is:
    unsigned int hash (const char *str, unsigned int len);
Here str represents the command-line option and len is its length. For example, if the command-line argument is +helpverbose, then str is +helpverbose and len is 12.
The lookup function in the gperf-generated hash is in_word_set(). The prototype of this routine depends on whether the user specified the -t option. If this option is not specified, you deal only with the user-specific command strings stored as data in the gperf-generated hash, not with any structure associated with those strings.
For example, in Listing 3 the CommandOption structure is associated with the user command argument, and a pointer to it is returned by the in_word_set() routine. You can change the name of this routine with the -N option. The arguments to the routine are similar to those of the hash() function explained earlier:
    const struct CommandOption* in_word_set (const char *str, unsigned int len);
Common Gperf Options
Gperf is a highly customizable tool that can accept different options. The Gperf online manual (see the links in the Resources section) describes all the options available in Gperf, including:
- -L language-name: Instructs gperf to generate output in the specified language. The following values are currently supported:
    - KR-C: Old-style K&R C. It is supported by both old and new C compilers, but newer ANSI C-compliant compilers may generate warnings or, in some cases, even flag errors.
    - C: Generates C code, which may not compile with some old C compilers unless you adjust the existing sources.
    - ANSI-C: Generates ANSI C-compliant code, which can be compiled only with ANSI C-compliant compilers or C++ compilers.
    - C++: Generates C++ code.
- -N: Lets the user change the name of the lookup function. The default name is in_word_set().
- -H: Lets the user change the name of the hash routine. The default name is hash().
- -Z: Useful when the -L C++ option is used. It lets the user specify the name of the generated C++ class that contains the in_word_set() and hash() functions. The default name is Perfect_Hash.
- -G: Generates the lookup table as a static global variable instead of generating it inside the lookup function to hide it (the default behavior).
- -C: Gperf generates lookup tables, as discussed earlier. With the -C option, they are declared with the const keyword, so the contents of all generated lookup tables are constant, that is, read-only. Many compilers can generate more efficient code by placing the tables in read-only memory.
- -D: Handles keywords that hash to duplicate values.
- -t: Allows the inclusion of a keyword structure.
- -K: Lets the user select the name of the keyword component in the keyword structure.
- -p: Provided for compatibility with earlier versions of gperf. In earlier versions, it changed the return value of the generated in_word_set() function from its default Boolean value (that is, 0 or 1) to a pointer to the wordlist array type. This was especially useful together with the -t option (which allows user-defined structs). In the latest versions of gperf this option is not required and can be removed.
Overview of gperf internals
A static search set is an abstract data type with the operations initialize, insert, and retrieve. A perfect hash function is an implementation of a static search set that is very efficient in both time and space. Gperf is a perfect hash function generator that builds perfect hash functions from a user-supplied list of keywords. Gperf translates a list of n user-supplied keywords into source code containing a k-element lookup table and two functions:
- hash: This routine uniquely maps keywords onto the range 0 .. k - 1, where k >= n. If k = n, hash() is considered a minimal perfect hash() function. The hash() function has two properties:
    - perfect property: A table entry is located with O(1) time complexity, that is, at most one string comparison is required to recognize a keyword in the static search set.
    - minimal property: The memory allocated for storing the keywords is the minimum required for the keyword set.
- in_word_set: This routine uses hash() to determine whether a particular string belongs to the user-supplied list, using at most one string comparison in most cases.
The internal implementation of gperf is centered on two internal data structures: the keyword signatures list (Key_List) and the associated values array (asso_values). All user-specified keywords and their attributes are read from the user-specified file and stored as nodes in a linked list called Key_List. While searching for a perfect hash() function, gperf considers only a subset of each keyword's characters as the search key. This subset of characters is called the keyword signature, or keysig.
The associated values array is generated inside the hash() function and is indexed by keysig characters. Gperf repeatedly searches for an associated values configuration that maps all n keysigs onto distinct hash values. A perfect hash() function is produced when gperf finds a configuration that assigns each keysig to a unique location within the generated lookup table. The resulting perfect hash() function returns an unsigned int value in the range 0 .. (k - 1), where k is the maximum keyword hash value plus 1.
When k = n, a minimal perfect hash() function is produced. A keyword's hash value is typically computed by combining the associated values of its keysig with its length. By default, the hash() function adds the associated value of the keyword's first index position and the associated value of its last index position to its length; for example:
    hash_value = length + asso_values[(unsigned char)keyword[1]];
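To make the idea of searching for an associated values configuration more concrete, here is a deliberately naive sketch. It is not the algorithm gperf actually implements: it simply tries every combination of small associated values for the character at index 1 of each keyword (treated here as the whole keysig) until all hash values are distinct. The names sketch_hash, is_perfect, and find_asso_values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hash in the style shown above: keyword length plus the associated value
// of the character at index 1 (the keysig in this sketch).
unsigned int sketch_hash(const std::string& kw,
                         const std::vector<unsigned int>& asso) {
    assert(kw.size() >= 2 && asso.size() == 256);
    return static_cast<unsigned int>(kw.size()) +
           asso[static_cast<unsigned char>(kw[1])];
}

// True when every keyword maps to a distinct hash value.
bool is_perfect(const std::vector<std::string>& kws,
                const std::vector<unsigned int>& asso) {
    std::set<unsigned int> seen;
    for (std::size_t i = 0; i < kws.size(); ++i)
        if (!seen.insert(sketch_hash(kws[i], asso)).second) return false;
    return true;
}

// Exhaustively tries associated values in [0, max_value] for each keysig
// character. Returns true and leaves a working configuration in `asso`,
// or returns false when no configuration in that range is collision-free.
bool find_asso_values(const std::vector<std::string>& kws,
                      unsigned int max_value,
                      std::vector<unsigned int>& asso) {
    std::set<unsigned char> sig_set;  // distinct keysig characters
    for (std::size_t i = 0; i < kws.size(); ++i) {
        assert(kws[i].size() >= 2);
        sig_set.insert(static_cast<unsigned char>(kws[i][1]));
    }
    const std::vector<unsigned char> sig(sig_set.begin(), sig_set.end());

    asso.assign(256, 0);
    for (;;) {
        if (is_perfect(kws, asso)) return true;
        // Odometer-style increment over the keysig characters.
        std::size_t pos = 0;
        while (pos < sig.size()) {
            if (asso[sig[pos]] < max_value) { ++asso[sig[pos]]; break; }
            asso[sig[pos]] = 0;
            ++pos;
        }
        if (pos == sig.size()) return false;  // every combination tried
    }
}
```

The exhaustive search is only practical for a handful of keywords, and it fails outright for keywords that share both length and keysig character, which is one reason a real generator needs the ability to choose different key positions.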
Sample Project
The following is a simple project that explains the concepts discussed so far. Consider the Gperf file as shown in Listing 5.
Listing 5. command_options.gperf
    %{
    #include "command_options.h"

    typedef struct CommandOptionCode CommandOptionCode;
    %}
    struct CommandOption
      {
      const char *Option;
      int OptionCode;
      };
    %%
    +helpverbose, CommandOptionCode::HELPVERBOSE
    +password, CommandOptionCode::PASSWORD
    +nocopyright, CommandOptionCode::NOCOPYRIGHT
    +nolog, CommandOptionCode::NOLOG
    +_64bit, CommandOptionCode::_64BIT
Listing 6 shows the command_options.h header file that is included in the gperf file.
Listing 6. command_options.h header file
    #ifndef __COMMANDOPTIONS_H
    #define __COMMANDOPTIONS_H
    struct CommandOptionCode
      {
      enum
        {
        HELPVERBOSE = 1,
        PASSWORD = 2,
        NOCOPYRIGHT = 3,
        NOLOG = 4,
        _64BIT = 5
        };
      };
    #endif
The Gperf command line looks like this:
    gperf -CGD -N IsValidCommandLineOption -K Option -L C++ -t command_line_options.gperf > perfecthash.hpp
The hash table is generated as part of the perfecthash.hpp file. Because the -G option was specified on the command line, the hash table is generated at global scope. Because gperf was invoked with the -C option, the hash table is defined with the const attribute. Listing 7 shows the generated source code in detail.
Listing 7. The generated perfecthash.hpp
    /* C++ code produced by gperf version 3.0.3 */
    /* Command-line: 'C:\\gperf\\gperf.exe' -CGD -N IsValidCommandLineOption -K Option -L C++ -t command_line_options.gperf  */
    /* Computed positions: -k'2' */

    #if !((' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
          && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
          && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
          && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
          && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
          && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
          && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
          && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
          && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
          && ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
          && ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
          && ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
          && ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
          && ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
          && ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
          && ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
          && ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
          && ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
          && ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
          && ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
          && ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
          && ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
          && ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126))
    /* The character set is not based on ISO-646.  */
    #error "gperf generated tables don't work with this execution character set. Please report a bug to <[email protected]>."
    #endif

    #line 1 "command_line_options.gperf"

    #include "command_options.h"

    typedef struct CommandOptionCode CommandOptionCode;
    #line 6 "command_line_options.gperf"
    struct CommandOption
      {
      const char *Option;
      int OptionCode;
      };

    #define TOTAL_KEYWORDS 5
    #define MIN_WORD_LENGTH 6
    #define MAX_WORD_LENGTH 12
    #define MIN_HASH_VALUE 6
    #define MAX_HASH_VALUE 17
    /* maximum key range = 12, duplicates = 0 */

    class Perfect_Hash
    {
    private:
      static inline unsigned int hash (const char *str, unsigned int len);
    public:
      static const struct CommandOption *IsValidCommandLineOption (const char *str, unsigned int len);
    };

    inline unsigned int
    Perfect_Hash::hash (register const char *str, register unsigned int len)
    {
      static const unsigned char asso_values[] =
        {
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18,  0, 18, 18, 18, 18,
          18, 18, 18, 18,  5, 18, 18, 18, 18, 18,
           0, 18,  0, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18, 18, 18, 18, 18,
          18, 18, 18, 18, 18, 18
        };
      return len + asso_values[(unsigned char)str[1]];
    }

    static const struct CommandOption wordlist[] =
      {
    #line 15 "command_line_options.gperf"
        {"+nolog", CommandOptionCode::NOLOG},
    #line 16 "command_line_options.gperf"
        {"+_64bit", CommandOptionCode::_64BIT},
    #line 13 "command_line_options.gperf"
        {"+password", CommandOptionCode::PASSWORD},
    #line 14 "command_line_options.gperf"
        {"+nocopyright", CommandOptionCode::NOCOPYRIGHT},
    #line 12 "command_line_options.gperf"
        {"+helpverbose", CommandOptionCode::HELPVERBOSE}
      };

    static const signed char lookup[] =
      {
        -1, -1, -1, -1, -1, -1,  0,  1, -1,  2, -1, -1,  3, -1,
        -1, -1, -1,  4
      };

    const struct CommandOption *
    Perfect_Hash::IsValidCommandLineOption (register const char *str, register unsigned int len)
    {
      if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
        {
          register int key = hash (str, len);

          if (key <= MAX_HASH_VALUE && key >= 0)
            {
              register int index = lookup[key];

              if (index >= 0)
                {
                  register const char *s = wordlist[index].Option;

                  if (*str == *s && !strcmp (str + 1, s + 1))
                    return &wordlist[index];
                }
            }
        }
      return 0;
    }
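To see how the generated tables resolve a keyword, the following sketch recomputes the hash values of Listing 7 by hand. The associated values (0 for '_', 'n', and 'p', 5 for 'h', and 18 for everything else) and the lookup[] contents are copied from the listing; the helper names asso_value, listing7_hash, and kLookup are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// Associated values from Listing 7: every entry is 18 except the four
// characters that occur at index 1 of a keyword.
unsigned int asso_value(unsigned char c) {
    switch (c) {
        case '_': return 0;
        case 'h': return 5;
        case 'n': return 0;
        case 'p': return 0;
        default:  return 18;
    }
}

// Same formula as the generated hash(): length plus asso_values[str[1]].
unsigned int listing7_hash(const char* str) {
    assert(str != nullptr && std::strlen(str) >= 2);
    return static_cast<unsigned int>(std::strlen(str)) +
           asso_value(static_cast<unsigned char>(str[1]));
}

// lookup[] from Listing 7: an index into wordlist[], or -1 for unused slots.
static const signed char kLookup[] = {-1, -1, -1, -1, -1, -1, 0, 1, -1,
                                      2,  -1, -1, 3,  -1, -1, -1, -1, 4};
```

For example, +nocopyright and +helpverbose both have length 12, but the associated value 5 for 'h' moves +helpverbose to slot 17, while +nocopyright stays at slot 12. The five keywords therefore land on hash values 6, 7, 9, 12, and 17, which matches MIN_HASH_VALUE, MAX_HASH_VALUE, and the non-negative entries of lookup[].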
Finally, Listing 8 shows the main source listing.
Note: Listing 8 demonstrates that you can look up any command-line option among the given keywords in constant time and then take the appropriate steps to handle it. The lookup time complexity of IsValidCommandLineOption is O(1).
Listing 8. gperf.cpp, which defines the application entry point
    #include <string.h>   // strcmp, used by the generated lookup code
    #include <iostream>
    #include <string>
    #include "command_options.h"
    #include "perfecthash.hpp"

    using namespace std;

    int main(int argc, char* argv[])
      {
      if (argc < 2)       // no command-line argument was supplied
        return 1;

      string cmdLineOption = argv[1]; // First command line argument
      const CommandOption* option =
        Perfect_Hash::IsValidCommandLineOption(cmdLineOption.c_str(),
                                               cmdLineOption.length());
      if (option == 0)    // the argument is not a recognized option
        return 1;

      switch (option->OptionCode)
        {
        case CommandOptionCode::HELPVERBOSE :
          cout << "Application specific detailed help goes here";
          break;
        default:
          break;
        }
      return 0;
      }
Note: All the examples in this article were tested with gperf version 3.0.3. If you are using an earlier version, you may need to use the -p option in the command-line invocation.
Conclusion
The gperf utility is designed to quickly generate perfect hashes for small to medium data sets. However, gperf has other uses as well. In fact, it is used in the GNU compilers to maintain perfect hashes for language keywords, and its latest enhancements let you work with larger data sets. So consider using gperf in your next development project.
About gperf: from O(N^2) to O(1)