Submitting a Site's 404 Dead Links via Shell Script under Apache

Source: Internet
Author: User
Tags: apache, log

Website operators are no strangers to dead links: deleting content or redesigning pages easily leaves them behind. Beyond hurting the user experience, too many dead links can also drag down a site's overall weight or its search rankings.


Baidu Webmaster Platform provides a dead-link submission tool: you report the dead links that exist on your site (protocol dead links and 404 pages), and Baidu can then remove them quickly, which helps the site's SEO. Filling in every dead link in the submission file by hand is far too tedious, and repetitive work should be automated. In this article we therefore use a shell script, on a server running Apache, to collect the site's dead links automatically so they are easy to submit.



1. Configure Apache to record search engine crawlers

Apache is currently the most widely used web server, but its default log format does not record the User-Agent, so crawls by Baidu, Google, and other major search engine spiders cannot be identified in the log. We therefore need to adjust the Apache configuration first.
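For reference, the difference comes from Apache's stock LogFormat definitions in httpd.conf: the combined format appends the Referer and User-Agent fields, and the User-Agent is what lets us recognize a spider (the exact lines below are Apache's shipped defaults and may differ slightly in your build):

```apacheconf
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
```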

Open the Apache configuration file httpd.conf and locate the following two lines:


CustomLog "logs/access_log" common
#CustomLog "logs/access_log" combined


The common line is active by default. Simply comment it out by adding a # in front, remove the # before the combined line, then save and restart the Apache service.
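After the edit, the two lines should read as follows (the log path may differ on your system):

```apacheconf
#CustomLog "logs/access_log" common
CustomLog "logs/access_log" combined
```

It is a good habit to check the syntax with apachectl configtest before restarting Apache.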

Note: if your server hosts multiple sites and each site has its own configuration file, you only need to change the CustomLog entry in the corresponding site's configuration file, for example:


vim /usr/local/apache/conf/vhost/www.chanzhi.org.conf

ServerAdmin [emailprotected]
DocumentRoot "/data/wwwroot/www.chanzhi.org"
ServerName www.chanzhi.org
ServerAlias chanzhi.org
ErrorLog "/data/wwwlogs/www.chanzhi.org_error_apache.log"
CustomLog "/data/wwwlogs/www.chanzhi.org_apache.log" combined
<Directory "/data/wwwroot/www.chanzhi.org">
    SetOutputFilter DEFLATE
    Options FollowSymLinks ExecCGI
    Require all granted
    AllowOverride All
    Order allow,deny
    Allow from all
    DirectoryIndex index.html index.php
</Directory>

Here is the site's log format before and after the configuration change:


Before configuration: [screenshot of a common-format log entry, with no User-Agent field]

After configuration: [screenshot of a combined-format log entry, ending with the crawler's User-Agent]


2. Write the shell script

The shell script extracts the specified crawler's requests from the site log and collects the 404 URLs into a single file for later use. The code is as follows; save it as, for example, deathlink.sh:


#!/bin/bash
# Spider UA signature to match (default: Baiduspider)
UA='+http://www.baidu.com/search/spider.html'
# Yesterday's date, matching the rotated Apache log file name
DATE=$(date +%Y%m%d -d "1 day ago")
# Path to the site's access log
LOGFILE=/data/wwwlogs/www.chanzhi.org_apache.log-${DATE}.log
# Path of the dead-link file to generate
DEATHFILE=/data/wwwroot/www.chanzhi.org/deathlink.txt
# Site base URL
WEBSITE=http://www.chanzhi.org
# Scan the log for 404 responses served to the spider and
# append each URL to the dead-link file, skipping duplicates
for url in $(awk -v str="${UA}" '$9=="404" && $15~str {print $7}' ${LOGFILE})
do
    grep -q "$url" "${DEATHFILE}" 2>/dev/null || echo "${WEBSITE}${url}" >> "${DEATHFILE}"
done
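To see how the awk filter works, here is a quick check against a single hand-written combined-format log line (the IP, date, and path are made up for illustration). In the combined format, field $9 is the HTTP status code, $7 is the requested path, and $15 falls inside the quoted User-Agent, where Baiduspider carries its +http://... signature; the leading + is dropped from the pattern below so it stays a valid regular expression in all awk variants.

```shell
# A hypothetical combined-format log line recording a Baiduspider 404 hit
line='1.2.3.4 - - [10/Oct/2023:13:55:36 +0800] "GET /old-page.html HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"'

# Same filter as the script: status 404 and spider UA -> print the path
echo "$line" | awk -v str='http://www.baidu.com/search/spider.html' \
    '$9=="404" && $15~str {print $7}'
# prints: /old-page.html
```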

When you use the script, adjust the paths and, if necessary, the field numbers to match your own server, then run it:


bash deathlink.sh
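Since the script reads the previous day's rotated log, it lends itself to a daily cron job; a crontab entry might look like the following (the time and script path are placeholders to adapt to your setup):

```crontab
# Run every day at 00:30, after log rotation has produced yesterday's file
30 0 * * * /bin/bash /root/scripts/deathlink.sh
```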


3. Submit the dead links

Running the script produces, in the specified directory, a file containing all the collected 404 page links, one link per line.
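With the paths used above, the resulting deathlink.txt might look like this (the URLs here are hypothetical examples):

```text
http://www.chanzhi.org/old-page.html
http://www.chanzhi.org/tag/removed.html
```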

Finally, on the Webmaster Platform's dead-link submission page, fill in the address of your own dead-link file (for example, the URL of deathlink.txt under your site root) and submit it.


Once the submission passes review, Baidu removes the invalid links from its index, preventing the dead pages from adversely affecting the site.


Summary:

This article showed how, in an Apache environment, a shell script can automatically collect the dead links crawled by Baidu's spider and generate a summary file to submit to the search engine. If you have better approaches or any questions, you are welcome to share and discuss.

