The C standard library on Linux includes a regular-expression library that can be used simply by including regex.h. However, this built-in library does not support Perl-style shorthand character classes such as \d, nor non-greedy matching. For example:
STR: 1.1.1.1
regex: (\d*.\d*.\d*.\d*)
This regular expression uses the \d shorthand to match digits, which the regex.h library does not support.
STR: \123\456\
regex: \\(.+?)\\
This regular expression uses a non-greedy match and should capture only "123", but the regex.h library matches greedily and captures "123\456".
Here is a test example that uses the PCRE regular-expression library:
/*************************************************************************
    > File Name: example3.c
    > Author: Ge.zhang
    > Mail:
    > Created Time: Thursday, March 29, 2018, 12:07:18
 ************************************************************************/
#include <stdio.h>
#include <string.h>
#include <pcre.h>

#define OVECCOUNT 30    /* should be a multiple of 3 */
#define BUFLEN 1024

int main(void)
{
    pcre *re;
    const char *error;
    int erroffset;
    int ovector[OVECCOUNT];
    int rc, i, j;
    char src[] = "123.123.123.123:80|1.1.1.1:88";
    char pattern[] = "(\\d*.\\d*.\\d*.\\d*):(\\d*)";
    char *p = src;

    printf("String : %s\n", src);
    printf("Pattern: \"%s\"\n", pattern);
    re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
    if (re == NULL) {
        printf("PCRE compilation failed at offset %d: %s\n", erroffset, error);
        return 1;
    }
    /* Match repeatedly, restarting after the end of the previous match.
       Stop on PCRE_ERROR_NOMATCH or any other negative error code. */
    while ((rc = pcre_exec(re, NULL, p, (int)strlen(p), 0, 0, ovector, OVECCOUNT)) >= 0) {
        printf("ovector is {");
        for (j = 0; j < OVECCOUNT; j++) {
            printf("%d,", ovector[j]);
        }
        printf("}\n");
        printf("\nOK, has matched ...\n\n");
        /* Result 0 is the whole match; results 1..rc-1 are the groups. */
        for (i = 0; i < rc; i++) {
            char *substring_start = p + ovector[2*i];
            int substring_length = ovector[2*i+1] - ovector[2*i];
            char matched[BUFLEN];
            memset(matched, 0, BUFLEN);
            strncpy(matched, substring_start, substring_length);
            printf("match:%s\n", matched);
        }
        printf("iamhere p is %s\n", p);
        p += ovector[1];    /* skip past the end of the whole match */
        printf("iamhere p+= is %s\n", p);
        if (*p == '\0')     /* reached the end of the subject string */
            break;
    }
    pcre_free(re);
    return 0;
}
The following screenshot shows the output of the above code:
The code above relies mainly on two functions, pcre_compile and pcre_exec, whose prototypes are as follows:
(1) pcre_compile
pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr);
Function: compiles the specified regular expression
Parameters:
pattern, input parameter, the regular expression string to compile
options, input parameter, specifies compile-time options
errptr, output parameter, used to return the error message
erroffset, output parameter, the offset in pattern at which the error was found
tableptr, input parameter, specifies a character table; usually NULL, which selects the default character table
Return value: a pointer to PCRE's internal representation of the compiled regular expression, or NULL if compilation fails
(2) pcre_exec
int pcre_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, int options, int *ovector, int ovecsize);
Function: checks whether a string matches the specified regular expression
Parameters:
code, input parameter, pointer to the compiled regular expression returned by pcre_compile
extra, input parameter, pointer to a structure that passes additional data to pcre_exec (may be NULL)
subject, input parameter, the string to match against
length, input parameter, the length of the string to match against
startoffset, input parameter, the offset in subject at which matching starts
options, input parameter, specifies options for the matching process
ovector, output parameter, the array that receives the offsets of the matched positions
ovecsize, input parameter, the number of elements in the ovector array
Return value: a non-negative number if the match succeeds, a negative number if it fails
The ovector parameter deserves some explanation: when pcre_exec matches successfully, it writes the start and end offsets of each matched string into ovector. For example, the value of ovector in the code above is as follows:
$ = {0, 18, 0, 15, 16, 18, 11508, 22708, 6, 4096, 2, 1752488, 1756584, 1756584, 240, 240, 6, 4, 4, 372, 372, 372, 68, 68, 4, 4, 7, 1745352, 16, 0}
Because OVECCOUNT is defined as 30 in the code, ovector holds 30 values, all of which are listed here. In fact pcre_exec produced only 3 results, and the variable rc holds that count, so only the first six values are meaningful; the rest are not match offsets. The start and end positions of the three results are:
0,18 = 123.123.123.123:80
0,15 = 123.123.123.123
16,18 = 80
Thus, the matching results can be extracted from the subject string using the values in ovector.
In addition, the regular expression "(\d*.\d*.\d*.\d*):(\d*)" in the code contains two pairs of parentheses. Because the text matched inside each pair of parentheses is saved in the results, this expression yields three results: the whole match plus the two groups. If the goal is only to match the IP address and port number together, the parentheses can be removed, giving "\d*.\d*.\d*.\d*:\d*", so that only one result is returned.