How can I design a regular-expression-like syntax for HTML tags?
Regular expressions hold an irreplaceable position in string processing, and most modern high-level programming languages ship with a regular expression library.
If we treat the HTML code of a Web page as a structured string, how can we extract the HTML tags we want with something similar to a regular expression?
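As a baseline, here is a minimal sketch of what an ordinary regular expression can do when HTML is treated as a flat string. It uses Python's standard re module; the sample markup and the helper name opening_divs are made up for illustration. The pattern finds opening div tags and their attribute text, but it has no notion of where a tag sits in the nesting, which is the gap a tag-oriented syntax would have to fill.

```python
import re

# Sample markup, invented for this sketch.
html = (
    '<div id="Rightwrapper" class="wrapper">'
    '<div class="column"></div>'
    '<div class="column"></div>'
    '</div>'
)

# Match opening <div ...> tags and capture the raw attribute text.
OPEN_DIV = re.compile(r'<div\b([^>]*)>')

def opening_divs(source):
    """Return the attribute text of every opening <div> tag, in order."""
    return [m.group(1).strip() for m in OPEN_DIV.finditer(source)]

divs = opening_divs(html)
# Filtering by class works on the flat string, but nothing here
# knows which div is the parent or child of which.
columns = [attrs for attrs in divs if 'class="column"' in attrs]
```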
Before thinking about how to implement this, consider how to describe an HTML tag in a regex-like expression. To get an HTML tag, you need to know two things about it:
1. the position of the tag;
2. the properties of the tag itself.
For example, in the following HTML page:
<body>
  <div id="Bodywrapper" class="wrapper">
    <div id="Leftwrapper" class="wrapper">
    </div>
    <div id="Rightwrapper" class="wrapper">
      <div class="column">
      </div>
      <div class="column">
      </div>
    </div>
  </div>
</body>
To extract the second div tag whose class is "column" from the HTML code above, you need to specify the id, class, and position of that div. The question now is: how do you design an expression syntax that satisfies these requirements?
For example:
gettag:div{tag-name:div;tag-position:2;tag-class:column;tag-id:;tag-content:;}
tag-parent{tag-name:div;tag-position:2;tag-class:column;tag-id:;tag-content:;}
tag-child{...}
The above is a descriptive syntax that is easy to understand.
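To get a feel for how easy this brace form would be to read by machine, here is a minimal sketch of a parser for a single line of it. The grammar it assumes (a head, then key:value pairs separated by semicolons inside braces) is only my reading of the example, and the function name parse_block is hypothetical.

```python
import re

# Assumed shape of one line: head{key:value;key:value;...}
BLOCK = re.compile(r'^\s*([\w:-]+)\s*\{(.*)\}\s*$')

def parse_block(line):
    """Split one 'head{key:value;...}' line into (head, fields)."""
    match = BLOCK.match(line)
    if match is None:
        raise ValueError(f"not a block: {line!r}")
    head, body = match.groups()
    fields = {}
    for item in body.split(';'):
        item = item.strip()
        if not item:
            continue  # skip the empty piece after the final ';'
        key, _, value = item.partition(':')
        fields[key.strip().lower()] = value.strip()
    return head.lower(), fields

head, fields = parse_block(
    'gettag:div{tag-name:div;tag-position:2;tag-class:column;tag-id:;tag-content:;}'
)
```

Empty values such as tag-id: come back as empty strings, which a matcher could treat as "no constraint".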
Or a Python-like syntax:
gettag:div
    tag-name:
    tag-id:
    tag-position:2
    tag-class:column
    tag-content:
    tag-parent:
        tag-name:
        tag-id:
        tag-position:2
        tag-class:column
        tag-content:
    tag-child:
        tag-type:table
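A Python-like form could be read with a small indentation-tracking parser. The sketch below assumes that nesting is expressed by leading spaces, as in Python, and that every line is key:value; both are assumptions on my part, and parse_indented and the sample query are illustrative only.

```python
def parse_indented(text):
    """Turn indented 'key:value' lines into nested dicts."""
    root = {}
    # Each stack entry: (indent width, dict that owns deeper lines).
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip(' '))
        key, _, value = raw.strip().partition(':')
        # Leave every block indented at least as far as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        node = {'value': value.strip(), 'children': {}}
        parent[key.strip().lower()] = node
        stack.append((indent, node['children']))
    return root

query = """\
gettag:div
    tag-position:2
    tag-class:column
    tag-parent:
        tag-position:2
        tag-class:column
    tag-child:
        tag-type:table
"""

tree = parse_indented(query)
```

The result is a tree in which tag-parent and tag-child are ordinary nodes with their own children, so constraints on a parent or child tag can be nested to any depth.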
I don't know whether anyone has a more suitable form for this grammar. As long as the expression grammar is designed well, the subsequent work will go smoothly. The ultimate goal is for programmers to be able to parse HTML code with these expressions as easily as they work with ordinary strings.
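To make that goal concrete, here is a rough sketch of what evaluating such an expression might look like once it has been parsed. It uses Python's standard html.parser module; TagFinder and get_tag are hypothetical names, and only three of the proposed fields (tag name, class, position) are handled. Parent and child constraints are left out.

```python
from html.parser import HTMLParser

class TagFinder(HTMLParser):
    """Collect opening tags that match a name and, optionally, a class."""

    def __init__(self, name, css_class=None):
        super().__init__()
        self.name = name
        self.css_class = css_class
        self.matches = []  # (line, column, attribute dict) per match

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get('class') or '').split()
        if tag == self.name and (
            self.css_class is None or self.css_class in classes
        ):
            self.matches.append((*self.getpos(), attr_map))

def get_tag(html, name, css_class=None, position=1):
    """Return the position-th match (1-based, document order) or None."""
    finder = TagFinder(name, css_class)
    finder.feed(html)
    finder.close()
    if 1 <= position <= len(finder.matches):
        return finder.matches[position - 1]
    return None

page = """<body>
<div id="Bodywrapper" class="wrapper">
<div id="Leftwrapper" class="wrapper">
</div>
<div id="Rightwrapper" class="wrapper">
<div class="column">
</div>
<div class="column">
</div>
</div>
</div>
</body>"""

# Roughly what "tag-name:div; tag-class:column; tag-position:2" asks for.
second_column = get_tag(page, 'div', css_class='column', position=2)
third_wrapper = get_tag(page, 'div', css_class='wrapper', position=3)
missing = get_tag(page, 'div', css_class='column', position=3)
```

Here position counts matches in document order across the whole page. Whether tag-position should instead count among siblings of the same parent is one of the things the grammar would need to pin down.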
I hope everyone will join in and share ideas. I will refine the grammar based on your feedback and then build an implementation of it.
Reference:
A basic introduction to regular expressions can be found here:
http://www.webjx.com/htmldata/2006-03-16/1142469074.html
http://www.webjx.com/htmldata/2006-03-16/1142468929.html
Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, in his work on nerve nets and finite automata. Regular expressions with a full syntax were then used for pattern matching on characters and later applied throughout information technology. Since then, regular expressions have evolved through several stages, and the current standard is approved by ISO (International Organization for Standardization) and defined by the Open Group.
A regular expression is not a language in its own right, but it can be used to find and replace text in a file or a string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE largely covers the functionality of BRE and adds further operators.
Regular expressions are used in many programs, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They are also adopted by many languages, such as HTML and XML, which usually support only a subset of the full standard.