Linux: use a shell script to automatically submit Web 404 dead links to search engines

Source: Internet
Author: User


Shell Script

No sooner said than done: a simple shell script takes care of it!

Script name: Website dead link generation script

Script function: Every day, analyze the previous day's Nginx log for the site, extract the request paths that returned a 404 status code to the Baidu spider UA, and write them to the death.txt file in the site root directory, ready to be submitted to Baidu as dead links.
Scripting code:

#!/bin/bash
#Desc: Death Chain File Script
#Author: Zhangge
#Blog: http://your domain/5038.html
#Date: 2015-05-03

# Initialize variables
# Spider UA string (the default is the Baidu spider)
ua=+http://www.baidu.com/search/spider.html

# Date of the previous day (matches the rotated Nginx log)
date=$(date +%Y-%m-%d -d "1 day ago")

# Path of the log file to analyze
logfile=/home/wwwlogs/your domain_${date}.log

# Path of the dead link file
deathfile=/home/wwwroot/your domain/death.txt

# Site address
website=http://your domain name.

# Analyze the log and save the dead link data
# (in the default "combined" log format, field 9 is the status code and field 7 is the request path)
for url in $(cat ${logfile} | grep -i "${ua}" | awk '{print $9" "$7}' | grep "404" | awk '{print $2}')
do
    grep "$url" ${deathfile} >/dev/null || echo ${website}${url} >> ${deathfile}
done
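To see what the pipeline extracts, here is a quick manual check against a single fabricated log line in Nginx's default "combined" format (the IP, path, and scratch file /tmp/test.log are made up for illustration):

# A made-up log line in the default "combined" format
echo '1.2.3.4 - - [02/May/2015:10:00:00 +0800] "GET /missing-page/ HTTP/1.1" 404 162 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"' > /tmp/test.log

# The same pipeline the script uses: keep Baidu-spider lines, print "status path",
# keep the 404s, then print just the path
cat /tmp/test.log | grep -i "+http://www.baidu.com/search/spider.html" | awk '{print $9" "$7}' | grep "404" | awk '{print $2}'
# Output: /missing-page/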

Instructions for use:

①. The script assumes that the Nginx log is cut (rotated) once a day.

②. Save the code as a shell script, e.g. death.sh, and then set up a scheduled task as follows:


#Edit the crontab with the following command
[root@mars_server ~]# crontab -e

#Run this script at 1:00 AM every day
0 1 */1 * * /root/death.sh >/dev/null 2>&1

#Press ESC, then type :wq to save and exit
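To confirm the job is installed, and to test the script once without waiting for 1:00 AM (assuming it was saved as /root/death.sh as in the crontab line above):

# List the current user's cron jobs to confirm the new entry
crontab -l

# Run the script once by hand and check the resulting dead link file
# (the path follows the deathfile setting in the script)
bash /root/death.sh
cat /home/wwwroot/your domain/death.txt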
③. After it runs, the dead link file death.txt will be generated in the site root directory; you can view its contents in a browser, for example:
http://your domain/death.txt
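The file simply lists one full URL per line; a hypothetical example of its contents:

http://your domain/some-deleted-post/
http://your domain/old-page.html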

④. Then simply go and submit this dead link file to Baidu.


From then on, the system will run the script every day, writing the 404 paths that the Baidu spider hit the previous day into death.txt in the site root directory, ready for Baidu's dead link tool to crawl.
Note that these dead link records are cumulative: entries that have already been saved are kept even if the Baidu spider no longer requests them, so they have to be cleaned out manually, although leaving them alone generally causes no problems.
Note: If your Nginx server does not have a suitable access log configured, add an access log for the site in question inside its server block; otherwise the script has nothing to analyze.
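For reference, a minimal sketch of such a server block is shown below; the domain and log path are the same placeholders used throughout this article, and the access_log directive's default "combined" format is what the awk field numbers above assume:

server {
    listen      80;
    server_name your domain name.;

    # The script analyzes this file after daily rotation renames it to your domain_YYYY-MM-DD.log
    access_log  /home/wwwlogs/your domain.log;

    root        /home/wwwroot/your domain;
}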
Other extensions

①. If you have not set up Nginx log cutting yet, the following script handles both the log cut and the dead link file in one go:

#!/bin/bash
#Desc: Cut Nginx Log and Create Death Chain File
#Author: Zhangge
#Blog: http://your domain/5038.html
#Date: 2015-05-03

# ① Initialize variables
# Directory where the access logs are stored
logs_path=/home/wwwlogs

# Spider UA string (the default is the Baidu spider)
ua=+http://www.baidu.com/search/spider.html

# Site domain (the site's Nginx log must be named after the domain, e.g. your domain name.log)
domain=your domain name.

# Date of the previous day
date=$(date +%Y-%m-%d -d "1 day ago")

# Path of the log file to analyze
logfile=/home/wwwlogs/your domain_${date}.log

# Path of the dead link file
deathfile=/home/wwwroot/your domain/death.txt

# Site address
website=http://your domain name.

# ② Cut the Nginx log
mv ${logs_path}/${domain}.log ${logs_path}/${domain}_${date}.log
kill -USR1 $(ps axu | grep "nginx: master process" | grep -v grep | awk '{print $2}')
# Optional: automatically delete logs older than 30 days; adjust the retention period as needed.
cd ${logs_path}
find . -mtime +30 -name "*20[1-9][3-9]*" | xargs rm -f

# ③ Generate the dead link file (for Baidu)
# Analyze the log and save the dead link data
for url in $(cat ${logfile} | grep -i "${ua}" | awk '{print $9" "$7}' | grep "404" | awk '{print $2}')
do
    grep "$url" ${deathfile} >/dev/null || echo ${website}${url} >> ${deathfile}
done
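Like the simpler version, this combined script should be scheduled once a day; since it rotates the log itself, running it just after midnight lets it move the previous day's log aside and analyze it right away. A sketch of the crontab setup, assuming the script was saved as /root/cut_nginx_log.sh (the name is just an example):

#Make the script executable
chmod +x /root/cut_nginx_log.sh

#Edit the crontab and add an entry that runs it at 00:01 every day
crontab -e
1 0 * * * /root/cut_nginx_log.sh >/dev/null 2>&1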

②. For other web servers such as Apache or IIS, the same idea applies: adjust the paths and log fields to match your environment and you can write a shell or batch script with the same function. A rough sketch for Apache is shown below for anyone who wants to experiment.
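This minimal sketch assumes Apache's default combined LogFormat (status code in field 9, request path in field 7, the same positions as Nginx's combined format); the log and site-root paths are example placeholders that vary by distribution:

#!/bin/bash
# Sketch: extract Baidu-spider 404 paths from an Apache combined-format access log
ua=+http://www.baidu.com/search/spider.html
date=$(date +%Y-%m-%d -d "1 day ago")
# Example paths only; adjust to the real log location and site root
logfile=/var/log/httpd/access_${date}.log
deathfile=/var/www/html/death.txt
website=http://your domain name.

for url in $(cat ${logfile} | grep -i "${ua}" | awk '{print $9" "$7}' | grep "404" | awk '{print $2}')
do
    grep "$url" ${deathfile} >/dev/null || echo ${website}${url} >> ${deathfile}
done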
