With the summer vacation free, here is my first project: a Django-based grade query system for Yangtze University.

Source: Internet
Author: User


This article touches on: Python crawlers, the MySQL database, HTML/CSS/JS basics, selenium and phantomjs basics, the MVC design pattern, the Django framework (a Python web development framework), and basic operation of the Apache server and Linux (CentOS 7 as the example). It is therefore best suited to readers who already have these foundations.

Statement: This blog post is intended purely for technical exchange, and sensitive information has been filtered out. (I accept no responsibility for any problems that occur on the website of the Yangtze University Academic Affairs Office, whatever the cause.)

Implementation: the Academic Affairs Office provides no data interface (to protect student information), so the only option is to write a crawler that simulates a login to the Academic Affairs Office site and then scrapes the data. The Academic Affairs Office website may crash, in which case the crawler fails, so the scraped data is cached in our own database and later queries can be served directly from it. What remains is to update that data regularly so that it stays synchronized with the Academic Affairs Office.
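The caching idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in this project: it is written for Python 3 and uses the standard-library sqlite3 module in place of MariaDB so that it runs anywhere, and the names GradeCache, refresh, and fetch_remote are hypothetical.

```python
import sqlite3


class GradeCache:
    """Local copy of scraped grades; every query is answered from here."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS grades ("
            "sno TEXT, course TEXT, grade INTEGER, "
            "PRIMARY KEY (sno, course))"
        )

    def save(self, sno, rows):
        # INSERT OR REPLACE overwrites a stored grade when the site reports a new one.
        self.db.executemany(
            "INSERT OR REPLACE INTO grades (sno, course, grade) VALUES (?, ?, ?)",
            [(sno, course, grade) for course, grade in rows],
        )
        self.db.commit()

    def load(self, sno):
        cur = self.db.execute(
            "SELECT course, grade FROM grades WHERE sno = ? ORDER BY course",
            (sno,),
        )
        return cur.fetchall()


def refresh(cache, sno, fetch_remote):
    """Run periodically: pull fresh grades, keep the old rows if the site is down.

    fetch_remote is a hypothetical callable (the crawler) that returns a list
    of (course, grade) pairs or raises an exception on failure.
    """
    try:
        rows = fetch_remote(sno)
    except Exception:
        return False
    cache.save(sno, rows)
    return True
```

With this split, the web views would only ever call load(), and a scheduled job would call refresh(); a failed crawl then leaves the previous data in place instead of breaking the query page.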

Technical Architecture: CentOS 7 + Apache 2.4 + MariaDB 5.5 + Python 2.7.5 + mod_wsgi 3.4 + Django 1.11

------------------------------------------------------------------------

I. Python crawler:

1. First, look at the login page.

Here we use Firefox for packet capture analysis. The login turns out to be a POST request with 7 parameters plus a verification code. There are two ways to handle the verification code: one is to recognize the image with deep learning (DL), which is very popular right now, and the other is to have users type it in themselves. The first is relatively costly; try it if you have the time. Python's Pillow (PIL) library can be used to process the image, and I may try TensorFlow (TF) over the summer vacation. The second is crude.

2. There is also a fancier way that avoids dealing with the verification code altogether; I won't go into it here. Let's simulate the login:

# coding: utf8
from bs4 import BeautifulSoup
import urllib
import urllib2
import requests
import sys

reload(sys)
sys.setdefaultencoding('gbk')

loginURL = ""
cjcxURL = "http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx"

# Fetch the login page and read the hidden ASP.NET form fields.
html = urllib2.urlopen(loginURL)
soup = BeautifulSoup(html, "lxml")
__VIEWSTATE = soup.find(id="__VIEWSTATE")["value"]
__EVENTVALIDATION = soup.find(id="__EVENTVALIDATION")["value"]

data = {
    "__VIEWSTATE": __VIEWSTATE,
    "__EVENTVALIDATION": __EVENTVALIDATION,
    "txtUid": "Account",
    "btLogin": "%B5%C7%C2%BC",  # label of the login button, GBK percent-encoded
    "txtPwd": "password",
    "selKind": "1"
}
header = {
#    "Host": "jwc2.yangtzeu.edu.cn:8080",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0;… Gecko/20100101 Firefox/54.0",
    "Accept": "text/html,application/xhtml+x…lication/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/x-www-form-urlencoded",
#    "Content-Length": "644",
    "Referer": "http://jwc2.yangtzeu.edu.cn:8080/login.aspx",
#    "Cookie": "ASP.NET_SessionId=3zjuqi0cnk5514l241csejgx",
#    "Connection": "keep-alive",
#    "Upgrade-Insecure-Requests": "1",
}

UserSession = requests.Session()
# Pass the headers by keyword: the third positional argument of post() is json, not headers.
Request = UserSession.post(loginURL, data=data, headers=header)
Response = UserSession.get(cjcxURL, cookies=Request.cookies, headers=header)
soup = BeautifulSoup(Response.content, "lxml")
print soup

Next we can see:

POST again (this code continues from the block above):

__VIEWSTATE2 = soup.find(id="__VIEWSTATE")["value"]
__EVENTVALIDATION2 = soup.find(id="__EVENTVALIDATION")["value"]

AllcjData = {
    "__EVENTTARGET": "btAllcj",
    "__EVENTARGUMENT": "",
    "__VIEWSTATE": __VIEWSTATE2,
    "__EVENTVALIDATION": __EVENTVALIDATION2,
    "selYear": "2017",
    "selTerm": "1",
#    "Button2": "%B1%D8%D0%DE%BF%CE%B3%C9%BC%A8"
}
AllcjHeader = {
#    "Host": "jwc2.yangtzeu.edu.cn:8080",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0;… Gecko/20100101 Firefox/54.0",
    "Accept": "text/html,application/xhtml+x…lication/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3",
    "Accept-Encoding": "gzip, deflate",
    "Content-Type": "application/x-www-form-urlencoded",
#    "Content-Length": "644",
    "Referer": "http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx",
#    "Cookie":,
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
}

# Pass the headers by keyword: the third positional argument of post() is json, not headers.
Request1 = UserSession.post(cjcxURL, data=AllcjData, headers=AllcjHeader)
# Note: the page printed below comes from this follow-up GET, not from the POST response.
Response1 = UserSession.get(cjcxURL, cookies=Request.cookies, headers=AllcjHeader)
soup = BeautifulSoup(Response1.content, "lxml")
print soup

No luck: the page returned by this GET is still the original page. I think there are two possible reasons the POST fails: first, ASP.NET's __VIEWSTATE and __EVENTVALIDATION variables cause the POST to be rejected; second, the form has multiple buttons and uses JS to decide which one was pressed, so the crawler fails. For dynamically loaded pages like this, an ordinary crawler still does not work.
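For reference, an ASP.NET postback button does not submit its own name. The page's JS writes the button id into the hidden __EVENTTARGET field and then submits the form together with whatever hidden fields the current page carries. A minimal sketch of assembling that form data is given below, in Python 3 with only the standard library. HiddenFieldParser and build_postback are hypothetical helper names, and whether such a payload is enough for this particular site has not been verified.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, extra=None):
    """Return the form data that __doPostBack(event_target, '') would submit."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)          # __VIEWSTATE, __EVENTVALIDATION, ...
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = ""
    payload.update(extra or {})            # visible fields such as selYear
    return payload
```

The hidden values change with every response, so the HTML passed in should be the most recent page received in the same session, for example build_postback(page, "btAllcj", {"selYear": "2017", "selTerm": "1"}).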

3. Use selenium (a web automated-testing tool that can simulate mouse clicks) + phantomjs (a headless browser, faster than Chrome and Firefox).

Selenium installation: pip install selenium

Install phantomjs:

(1) Address: http://phantomjs.org/download.html (I downloaded Linux 64-bit)

(2) Decompression: tar -jxvf phantomjs-2.1.1-linux-x86_64.tar.bz2 -C /usr/share/

(3) Install dependencies: yum install fontconfig freetype libfreetype.so.6 libfontconfig.so.1

(4) Configure the environment variable: export PATH=$PATH:/usr/share/phantomjs-2.1.1-linux-x86_64/bin

(5) Run phantomjs in a shell. If you get its interactive prompt, the installation succeeded.

Ignore the commented-out lines:

# coding: utf8
# Written for Python 2 and the Selenium 2/3 API; PhantomJS support and the
# find_element_by_* helpers were removed in later Selenium 4 releases.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
import urllib
import urllib2
import sys

reload(sys)
sys.setdefaultencoding('utf8')

driver = webdriver.PhantomJS()
driver.get("")
driver.find_element_by_name('txtUid').send_keys('account')
driver.find_element_by_name('txtPwd').send_keys('Password')
driver.find_element_by_id('btLogin').click()
cookie = driver.get_cookies()
driver.get("http://jwc2.yangtzeu.edu.cn:8080/cjcx.aspx")

# print driver.page_source
# driver.find_element_by_xpath("//input[@name='btAllcj'][@type='button']")
# js = "document.getElementById('btAllcj').onclick = function() {__doPostBack('btAllcj', '')}"
# js = "var ob; ob = document.getElementById('btAllcj'); ob.focus(); ob.click();)"
# driver.execute_script("document.getElementById('btAllcj').click();")
# time.sleep(2)  # Let the operation stop a bit
# driver.find_element_by_link_text("all scores").click()  # Find the 'login' button and click
# time.sleep(2)
# js1 = "document.form1.__EVENTTARGET.value = 'btAllcj';"
# js2 = "document.form1.__EVENTARGUMENT.value = '';"
# driver.execute_script(js1)
# driver.execute_script(js2)
# driver.find_element_by_name('__EVENTTARGET').send_keys('btAllcj')
# driver.find_element_by_name('__EVENTARGUMENT').send_keys('')
# js = "var input = document.createElement('input'); input.setAttribute('type', 'hidden'); input.setAttribute('name', '__EVENTTARGET'); input.setAttribute('value', ''); document.getElementById('form1').appendChild(input); var input = document.createElement('input'); input.setAttribute('type', 'hidden'); input.setAttribute('name', '__EVENTARGUMENT'); input.setAttribute('value', ''); document.getElementById('form1').appendChild(input); var theForm = document.forms['form1']; if (!theForm) { theForm = document.form1; } function __doPostBack(eventTarget, eventArgument) { if (!theForm.onsubmit || (theForm.onsubmit() != false)) { theForm.__EVENTTARGET.value = eventTarget; theForm.__EVENTARGUMENT.value = eventArgument; theForm.submit(); } } __doPostBack('btAllcj', '')"
# js = "var script = document.createElement('script'); script.type = 'text/javascript'; script.text = 'if (!theForm) { theForm = document.form1; } function __doPostBack(eventTarget, eventArgument) { if (!theForm.onsubmit || (theForm.onsubmit() != false)) { theForm.__EVENTTARGET.value = eventTarget; theForm.__EVENTARGUMENT.value = eventArgument; theForm.submit(); } }'; document.body.appendChild(script);"
# driver.execute_script(js)

# Button2 is the "required course scores" button, which submits the form directly.
driver.find_element_by_name("Button2").click()
html = driver.page_source
soup = BeautifulSoup(html, "lxml")
print soup

# Print the first three cells of every table row.
tables = soup.findAll("table")
for tab in tables:
    for tr in tab.findAll("tr"):
        print "--------------------"
        for td in tr.findAll("td")[0:3]:
            print td.getText()

 

For now only the required-course scores can be fetched, because the button for all scores is triggered by JS that ASP.NET generates rather than by a direct submit. I am still looking for a solution. In the meantime, let's start designing the database.

II. MariaDB student database design. Here I borrow from the SQL Server database used in our lab sessions.

 

My database creation statement:

create database jwc character set utf8;
use jwc;

create table Student(
    Sno   char(9) primary key,
    Sname varchar(20) unique,
    Sdept char(20),
    Spwd  char(20)
);

create table Course(
    Cno    char(2) primary key,
    Cname  varchar(30) unique,
    Credit numeric(2,1)
);

-- Note: MariaDB 5.5 parses the check clause below but does not enforce it.
create table SC(
    Sno   char(9) not null,
    Cno   char(2) not null,
    Grade int check(Grade>=0 and Grade<=100),
    primary key(Sno,Cno),
    foreign key(Sno) references Student(Sno),
    foreign key(Cno) references Course(Cno)
);
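To see how the three tables fit together, the schema can be tried out without a MariaDB server. The sketch below is an assumption-laden illustration in Python 3: it loads the same table definitions into an in-memory SQLite database (SQLite is a stand-in here, not the article's database), and open_db and transcript are hypothetical helper names.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Student (
    Sno   CHAR(9) PRIMARY KEY,
    Sname VARCHAR(20) UNIQUE,
    Sdept CHAR(20),
    Spwd  CHAR(20)
);
CREATE TABLE Course (
    Cno    CHAR(2) PRIMARY KEY,
    Cname  VARCHAR(30) UNIQUE,
    Credit NUMERIC(2,1)
);
CREATE TABLE SC (
    Sno   CHAR(9) NOT NULL,
    Cno   CHAR(2) NOT NULL,
    Grade INT CHECK (Grade >= 0 AND Grade <= 100),
    PRIMARY KEY (Sno, Cno),
    FOREIGN KEY (Sno) REFERENCES Student(Sno),
    FOREIGN KEY (Cno) REFERENCES Course(Cno)
);
"""


def open_db():
    """Create an in-memory database holding the three tables."""
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    db.executescript(SCHEMA)
    return db


def transcript(db, sno):
    """Return (course name, credit, grade) rows for one student."""
    return db.execute(
        "SELECT c.Cname, c.Credit, sc.Grade "
        "FROM SC sc JOIN Course c ON c.Cno = sc.Cno "
        "WHERE sc.Sno = ? ORDER BY c.Cno",
        (sno,),
    ).fetchall()
```

SC is the link table: each row ties one student to one course, so a grade query is a join from SC to Course filtered by Sno. The composite primary key (Sno, Cno) means a student can hold only one grade per course.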

III. Python web environment setup (LAMP):

1. Because the chosen HTTP server is Apache, you need to install mod_wsgi (the Apache module for WSGI, Python's Web Server Gateway Interface) so that Apache can talk to Python programs. If you use nginx instead, install and configure uwsgi. This plays a role similar to servlets for Java and php-fpm for PHP.

Install: yum install mod_wsgi

Configuration: vim /etc/httpd/conf/httpd.conf

This configuration cost me a lot of time and thought, and there are many wrong versions on the internet. Here is a working configuration for Django development; feel free to take it.

# config python web
LoadModule wsgi_module modules/mod_wsgi.so

<VirtualHost *:8080>
    ServerAdmin root@Vito-Yan
    ServerName www.yuol.online
    ServerAlias yuol.online

    Alias /media/ /var/www/html/jwc/media/
    Alias /static/ /var/www/html/jwc/static/
    <Directory /var/www/html/jwc/static/>
        Require all granted
    </Directory>

    WSGIScriptAlias / /var/www/html/jwc/jwc/wsgi.py
#    DocumentRoot "/var/www/html/jwc/jwc"
    ErrorLog "logs/www.yuol.online-error_log"
    CustomLog "logs/www.yuol.online-access_log" common

    <Directory "/var/www/html/jwc/jwc">
        <Files wsgi.py>
            AllowOverride All
            Options Indexes FollowSymLinks Includes ExecCGI
            Require all granted
        </Files>
    </Directory>
</VirtualHost>

2. Next, install Django: pip install django. Done.

Check the Django version: python -m django --version

Official site: https://www.djangoproject.com

Create a project: django-admin startproject jwc (I created it under /var/www/html, my Apache document root)

3. Apache configuration: I won't paste it again. Change jwc in the configuration above to jwc2, change the port to 9000, and add Listen 9000. (Why 9000? The first project, jwc, uses 8080. Django's built-in server, started with python manage.py runserver, defaults to port 8000, so I avoid 8000 to prevent a conflict. The Tomcat server for my JSP project uses 9090, so that is best avoided too. I generally use 9000 and do not recommend the others.)
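The port choice above is about avoiding collisions between services on the same machine. One hedged way to check whether a candidate port is already taken is to try binding to it, as in the Python 3 sketch below. port_is_free is a hypothetical helper; it only probes the loopback address by default and says nothing about firewall rules or SELinux port policy.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": some service holds the port.
            return False
    return True
```

For example, calling port_is_free(9000) before adding Listen 9000 gives a quick indication of whether Apache would be able to bind to that port.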

4. settings.py configuration:

DEBUG = True    # debug mode enabled

ALLOWED_HOSTS = ['192.168.47.128']    # add the server's host
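Hard-coding DEBUG = True and a single IP address is fine on a test VM, but both values usually differ between machines. One possible alternative, sketched here in Python 3 under stated assumptions, is to read them from environment variables. The variable names DJANGO_DEBUG and DJANGO_ALLOWED_HOSTS and the helper read_settings are made up for illustration and are not Django conventions; how the variables reach the Apache process depends on the deployment.

```python
import os


def read_settings(environ=os.environ):
    """Derive DEBUG and ALLOWED_HOSTS from environment variables."""
    # Debug stays off unless it is switched on explicitly.
    debug = environ.get("DJANGO_DEBUG", "0") == "1"
    # Comma-separated host list, e.g. "192.168.47.128,yuol.online".
    hosts = [
        h.strip()
        for h in environ.get("DJANGO_ALLOWED_HOSTS", "").split(",")
        if h.strip()
    ]
    return debug, hosts


# In settings.py this would replace the two hard-coded lines:
DEBUG, ALLOWED_HOSTS = read_settings()
```

Defaulting to DEBUG off is the safer direction, since Django's debug pages expose settings and stack traces to anyone who triggers an error.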

5. Configure wsgi.py. Don't ask me why; I am not entirely sure either. This change is needed when the Django project is started by the Apache server (the added lines put the project directory on sys.path, which appears to be what mod_wsgi needs in order to import jwc2.settings). If you use the server that comes with Django, you don't need to change anything.

"""
WSGI config for jwc2 project.

It exposes the WSGI callable as a module-level variable named ``application``.

For more information on this file, see
https://docs.djangoproject.com/en/1.11/howto/deployment/wsgi/
"""

# Original file as generated by Django:
#import os
#from django.core.wsgi import get_wsgi_application
#os.environ.setdefault("DJANGO_SETTINGS_MODULE", "jwc2.settings")
#application = get_wsgi_application()

import os
import sys
from os.path import join, dirname, abspath

# Directory that contains the jwc2 package (the parent of this file's directory).
PROJECT_DIR = dirname(dirname(abspath(__file__)))
sys.path.insert(0, PROJECT_DIR)

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "jwc2.settings")

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

After that, it works. The Python web environment is complete.
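As for why the wsgi.py change matters: python manage.py runserver is started from the project directory, so the settings package is importable, whereas Apache loads wsgi.py by file path and the project directory is not necessarily on sys.path. The Python 3 sketch below reproduces that effect with a throwaway package in a temporary directory. The helpers make_project, can_import, and project_dir_for are hypothetical names, and the sketch does not involve Apache or Django at all.

```python
import importlib
import os
import sys
import tempfile


def make_project(root, package="demo_site"):
    """Create root/<package>/ with __init__.py, settings.py, wsgi.py; return the wsgi.py path."""
    pkg_dir = os.path.join(root, package)
    os.makedirs(pkg_dir)
    files = (("__init__.py", ""), ("settings.py", "DEBUG = True\n"), ("wsgi.py", ""))
    for name, body in files:
        with open(os.path.join(pkg_dir, name), "w") as fh:
            fh.write(body)
    return os.path.join(pkg_dir, "wsgi.py")


def can_import(module_name):
    """Return True if module_name is importable with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def project_dir_for(wsgi_file):
    """Same computation as PROJECT_DIR in wsgi.py: two levels above the file."""
    return os.path.dirname(os.path.dirname(os.path.abspath(wsgi_file)))
```

Before the sys.path.insert call, can_import("demo_site.settings") fails because nothing on sys.path contains the package; after inserting project_dir_for(wsgi_path), the same import succeeds. That mirrors what happens to jwc2.settings.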

IV. Start our first Django application...

 
