How to replace html tags and use MySQL functions to capture articles _ mysql

Source: Internet
Author: User
Tags mysql functions types of functions mysql locate mysql substring
MySQL solves the html tag replacement for crawling articles and the usage of its mysql function description has just completed the Html5 Web App page of a mobile client strategy. new requirements have emerged: because the articles are captured, there are many external links that are not filtered at the beginning. So I first wrote a filter function in PHP, and then executed batch update related database records.

Public static function filter_newslink ($ aid) {$ content = mod_news: get_newscont ($ aid); // filter the outer links of an image first $ content = preg_replace ('/()
 /I ',' $ {2} ', $ content); // filter the text of the outer link to the text $ content = preg_replace ('/(.*)
 /I ',' $ {2} ', $ content); $ data = array ('news _ id' => $ aid, "content" => $ content ); $ status = mod_news: update_newscontent ($ data); return $ status ;}

You can use mysql to export, process, and import the keyword tags of an article to the database, or use stored procedures. This stored procedure will not be written.

Later, I found that some images were not captured, and the product was launched. it was unrealistic to crawl data again. After negotiation with colleagues, remove the block where the image is located. The article will no longer show the "card number" picture table. The implementation method is to use some of the less commonly used functions of MySQL.

SELECT replace (content, SUBSTRING (content from position ("card number" IN content) for position ("-->" IN content )),"

") As x from CONTENT_TABLE cwhere c. news_id in (select news_id from NEWS_TABLE where col_id = 66) update CONTENT_TABLE cset c. content = replace (c. content, SUBSTRING (c. content from position ("card number" IN c. content) for position ("-->" IN c. content )),"

") Where c. news_id in (select news_id from NEWS_TABLE where col_id = 66)

Given that the captured html structure is messy and complex, this is the only way to do this. Although the efficiency is average, there are only a few hundred articles and acceptable solutions. Here we will repeat the SQL functions involved:

I. MySQL LOCATE and alias function POSITION

The LOCATE (substr, str) function is the same as POSITION (substr IN str) and INSTR (str, substr)

Purpose: return the position where the substring substr first appears in the string str. If the substring substr does not exist in str, the return value is 0. There is also a form of LOCATE, which contains three parameters: LOCATE (substr, str, pos ), returns the position where the substring substr first appears after the position pos in the str string. INSTR (str, substr) and LOCATE () have the same two-parameter format, but the parameter order is different.

Mysql> select locate ('bar', 'foobarbar ');
-> 4
Mysql> select instr ('foobar', 'bar ');
-> 4
Mysql> select locate ('xbar', 'foobar ');
-> 0
Mysql> select instr ('xbar', 'foobar ');
-> 0
Mysql> select locate ('bar', 'foobarbarbar ', 7 );
-> 7

This function is multi-byte secure. In MySQL 3.23, this function is case-sensitive. in MySQL 4.0, if any parameter is a binary string, it is case-sensitive.

The following statement can implement the same query function:

SELECT 'column' FROM 'table' where 'condition' like '% keyword %'

SELECT 'column' from 'table' where locate ('keyword', 'condition ')> 0

SELECT 'column' from 'table' where position ('keyword' IN 'condition ')

SELECT 'column' from 'table' where instr ('condition ', 'keyword')> 0

In terms of speed, the last three are a little faster than using like.


Usage 1: REPLACE (str, from_str, to_str)
All the from_str strings in the str string are replaced by to_str, and the following string is returned:
Mysql> select REPLACE ('www', 'A', 'list ');

For example, replace the "Three Kingdoms" in the name field of the table with "Three Kingdoms"
Mysql> update table set name = replace (name, 'Three Kingdoms, Three Kingdoms for all ')

This function is also multi-byte secure.

Usage 2: REPLACE

When inserting data into a table, you often encounter this situation: 1. First, determine whether the data exists; 2. if the data does not exist, insert the data; 3. if the data exists, update the data. Programmers, including me, have three common practices:

First, MySQL is a very common practice. many new users and even many senior coder have written such a statement, which encapsulates three functions in the code, whether a function query record exists. one function is inserted directly, and the other function updates existing records. Call in different situations. This method is used to execute data operations multiple times, which will inevitably cause a large overhead.

Type 2: replace the encapsulation with an SQL statement to perform on-demand operations, insert new records, or update old data. The SQL Server statements are as follows:
If not exists (select 1 from NEWS_bak where fig = 1008)
Insert into NEWS_bak (title, keyword, description) values ('Sun Quan ', 'Three Kingdoms', 'Sun Quan-Wu guoda ')
Update NEWS_bak set title = "Sun Quan" and keyword = 'Three Kingdoms 'and description = 'Sun Quan-Wu Guoba' where news_id = 1008

Note: For the same expression of if not exists, MySQL is generally used as the condition where not exists (); because exists (select null) also returns true, so select exists (select null) the result is 1.

But how does one implement this logic in MySQL? Methods are available, and the syntax is more concise-replace.

MySQL replace into has three forms:

1. replace into tbl_name (col_name,...) values (...)
2. replace into tbl_name (col_name,...) select...
3. replace into tbl_name set col_name = value ,...

The first two forms are frequently used. The "into" keyword can be omitted, but it is better to add "into" to make it more intuitive. The values of all columns are equal to the value specified in the REPLACE statement. All missing columns are set as their default values, which is the same as INSERT. You cannot reference a value from the current row or use a value in a new row. If you use a value such as "SET col_name = col_name + 1", the reference to the column name on the right will be processed as DEFAULT (col_name. Therefore, the value is equivalent to SET col_name = DEFAULT (col_name) + 1.


Replace into tableName (id, index) values ('1', 'index-A'), ('2', 'index-B ')
This statement inserts two records into the tableName table. The replace into function is similar to the insert function. The difference is that replace into first tries to insert data into the table. if this row of data already exists in the table (determined based on the primary key or unique index), the data is deleted first, insert new data. otherwise, insert new data directly.

To use REPLACE, you must have both the INSERT and DELETE permissions for the table.

The REPLACE statement returns a number to indicate the number of affected rows. This is the sum of the number of deleted and inserted rows. If this number is 1 for a single row, one row is inserted and no row is deleted. If the number is greater than 1, one or more old rows are deleted before the new row is inserted. If the table contains multiple unique indexes, and the new row copies the values of different old rows in different unique indexes, it is possible that a single row replaces multiple old rows.

PHP can use the mysql_affected_rows () function to obtain the number of affected rows. The number of affected rows can be easily determined whether only one row is added for REPLACE, or whether REPLACE replaces other rows: you only need to check whether the number is 1 (added) or larger (replaced ).

Note: the table to which data is inserted must have a primary key or a UNIQUE index UNIQUE! Otherwise, using a REPLACE statement is meaningless. This statement is the same as INSERT, because no index is used to determine whether other rows have been copied in the new row. In this case, replace into inserts data directly, which leads to duplicate data in the table.

III. Use of MySQL SUBSTRING functions

SUBSTRING (expression, start, length)
Expression is a string, binary string, text, image, column, or expression containing columns. Do not use expressions that contain aggregate functions.
Start is an integer that indicates the start position of the substring. If it is a negative number, it is counted from the end of the string.
Length is an integer (optional) length of the substring (number of characters or bytes to return ).

There are four types of functions:

SUBSTRING (str, pos ),
SUBSTRING (str FROM pos ),
SUBSTRING (str, pos, len ),
SUBSTRING (str FROM pos FOR len)

Returns a substring from the str string in a format that does not contain the len parameter, starting from the position pos. The format with the len parameter returns a substring with the same length as the len character from the string str, starting from the position pos. Use the FROM format as the standard SQL syntax. It may also use a negative value for pos. In this case, the position of the substring starts with the pos character at the end of the string, rather than the start position of the string. You can use a negative value for pos in the following format.


Because start and length are specified when SUBSTRING is used in text data, DBCS data (such as Japanese characters) may split characters at the start or end of the result. This behavior is consistent with the way that READTEXT processes DBCS. However, due to occasional strange results, we recommend that you use ntext instead of text for DBCS characters.

Return type

If expression is a supported character data type, character data is returned. If expression is a supported binary data type, binary data is returned. Generally, text data is returned in varchar format, while image data is returned in varbinary format.

The type of the returned string is the same as that of the given expression (except for those displayed in the table ).


The offset (start and length) of the ntext, char, or varchar data type must be specified in the number of characters ). The offset that uses the text, image, binary, or varbinary data type must be specified in the number of bytes.

The compatibility level may affect the return value. For more information about the compatibility level, see sp_dbcmptlevel.

Other string truncation functions include:
1. left (str, length) truncates a string from the left
Example: select left (title, 1) AS abstract FROM 'News _ bak'

2, Right (str, length) truncates a string from the right

3, Substring_index (str, delim, count) truncate string by keyword
Note: substring_index (intercepted field, keyword, number of times the keyword appears)
Example: select substring_index ("", ".", 2) as domain from news_table: www.8783
(Note: If the number of times a keyword appears is negative, for example,-2, it is counted from the last to the end of the string)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.