"Python" starts from 0 to write crawlers--to store the bean paste data into the database

Source: Internet
Author: User
Tags mongodb

1. What did we get?

ID, name, release year, release date, category, Director, starring, length, rating, star rating, number of reviews

2. Make a classification of the data.

A.. Basic information: Name, Director, release year, category, length

B. Evaluation information: score, star rating, number of reviews

C. Starring table: starring (I'm struggling to make a single list)

3. Table Design. Now a little bit of a sore is the primary key. Use the self-increment primary key or the movie ID key. After all my many considerations, I decided carefully (hastily) to use the movie ID (to build a unique index anyway, why not take it as the primary key). ), so I just turned the data in the ID again.

    m_id = Re.search ("[0-9]+", Movie_url). Group ()    movie["ID "] = Int (m_id)

Write to play, is so casual, have any ideas on change!!!

4. Build a table. is also the egg ache, I actually tangled up is uses MySQL or MongoDB, immediately deletes oneself a slap, you will mongodb? I feel that if I continue to open a mongodb pit, I will never come back to this reptile.

I'm not stupid, not a few to write my own table SQL. SQLyog directly. A little tangle of InnoDB and MyISAM. I began to want to each time a movie information into three tables, or to use the transaction, but also want to spam data will not deduct my money, so used MyISAM. After all, it's a lot of inserts and select

Build a Table statement

CREATE DATABASE' Douban '; Use' Douban ';--Basic Information TableCREATE TABLE' T_movie_info ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, watercress movie ID', ' type 'tinyint(4)DEFAULT NULLCOMMENT'Genre 0: Drama, 1: Movie', ' name 'varchar( -)DEFAULT NULLCOMMENT'Movie Name', ' director 'varchar( -)DEFAULT NULLCOMMENT'Director',  ` Year`int(4)DEFAULT NULLCOMMENT'release Year',  `Month`int(2)DEFAULT NULLCOMMENT'Release Month',  ` Day`int(2)DEFAULT NULLCOMMENT'Release date', ' categories1 'varchar( -)DEFAULT NULLCOMMENT'belongs to Category 1', ' Categories2 'varchar( -)DEFAULT NULLCOMMENT'belongs to Category 2', ' time 'int(3)DEFAULT NULLCOMMENT'Length of Time',  PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=UTF8--Starring TableCREATE TABLE' t_movie_actors ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, watercress movie ID', ' Actor1 'varchar( -)DEFAULT NULL, ' Actor2 'varchar( -)DEFAULT NULL, ' Actor3 'varchar( -)DEFAULT NULL, ' Actor4 'varchar( -)DEFAULT NULL, ' Actor5 'varchar( -)DEFAULT NULL,  PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=UTF8--Evaluation Data SheetCREATE TABLE' t_movie_scores ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, movie ID', ' score 'DoubleUnsignedDEFAULT '0'COMMENT'Ratings', ' votes 'int(Ten) unsignedDEFAULT '0'COMMENT'Number of reviews', ' star1 'DoubleUnsignedDEFAULT '0'COMMENT'1 star ratio', ' star2 'DoubleUnsignedDEFAULT '0'COMMENT'2 star ratio', ' Star3 'DoubleUnsignedDEFAULT '0'COMMENT'3 star ratio', ' STAR4 'DoubleUnsignedDEFAULT '0'COMMENT'4 star ratio', ' STAR5 'DoubleUnsignedDEFAULT '0'COMMENT'5 star ratio',  PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=Utf8

5. Write a method to insert the data into the database. is also the first time to write data to the database with Python, tossing the night. I do not know which side of the cake than the%s do not add quotation marks, make me see this for the first time, and then went wrong always think this is right, looking for another reason

definset_data (movie):#get a database connectionconn = Pymysql. Connect (host="localhost", port=3306, user="Root", password="Root", db="Douban")    #Conn.autocommit (True) # set autocommitcursor = Conn.cursor ()#Get cursorSql_insert_info = ("insert INTO ' t_movie_info ' (' id ', ' type ', ' name ', ' director ', ' Year ', ' Month ', ' Day ',"                       "' categories1 ', ' categories2 ', ' time ') values (%d,%d, '%s ', '%s ',%d,%d,%d, '%s ', '%s ',%d)") Categories= movie["Categories"] Ca_len=len (categories) Categories1= Categories[0]ifCa_len > 0ElseNone Categories2= Categories[1]ifCa_len > 1ElseNone cursor.execute (sql_insert_info% (movie["ID"], movie["type"], movie["name"], movie["Directer"], movie["Date"].year, movie["Date"].month, movie["Date"].day, Categories1, Categories2, movie[" Time"]))    #Write SQL do not forget to add double quotation marks%s, otherwise it will error Unknown column ' A ' in ' Field List 'Sql_insert_actors = ("insert INTO ' t_movie_actors ' (ID, Actor1, Actor2, Actor3, Actor4, Actor5)"                         "values (%d, '%s ', '%s ', '%s ', '%s ', '%s ')") Actors= movie["actors"] Actors_len=Len (actors) Actor1= Actors[0]ifActors_len > 0ElseNone Actor2= Actors[1]ifActors_len > 1ElseNone Actor3= Actors[2]ifActors_len > 2ElseNone Actor4= Actors[3]ifActors_len > 3ElseNone Actor5= Actors[4]ifActors_len > 4ElseNone cursor.execute (sql_insert_actors% (movie["ID"], Actor1, Actor2, Actor3, Actor4, Actor5)) Sql_insert_scores= ("insert INTO ' t_movie_scores ' (ID, score, votes, star1, Star2, Star3, STAR4, STAR5)"                         "values (%d,%f,%d,%f,%f,%f,%f,%f)") Stars= movie["stars"] Stars_len=Len (stars) Star1= Stars[0]ifStars_len > 0Else0.0star2= Stars[1]ifStars_len > 1Else0.0Star3= Stars[2]ifStars_len > 2Else0.0STAR4= Stars[3]ifStars_len > 3Else0.0STAR5= Stars[4]ifStars_len > 4Else0.0Cursor.execute (Sql_insert_scores% (movie["ID"], movie["score"], movie["vote"], star1, Star2, Star3, STAR4, STAR5) conn.commit () data1= Douban_movie ("https://movie.douban.com/subject/30236775/?from=showing") Inset_data (data1) data2= Douban_movie ("Https://movie.douban.com/subject/26842702/?tag=%E7%83%AD%E9%97%A8&from=gaia") Inset_data (data2) data3= Douban_movie ("Https://movie.douban.com/subject/26973784/?tag=%E6%9C%80%E6%96%B0&from=gaia") Inset_data (data3) data4= Douban_movie ("Https://movie.douban.com/subject/30249296/?tag=%E7%83%AD%E9%97%A8&from=gaia") Inset_data (DATA4)

After executing the database:

"Python" starts from 0 to write crawlers--to store the bean paste data into the database

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.