1. What did we get?
ID, name, release year, release date, category, Director, starring, length, rating, star rating, number of reviews
2. Make a classification of the data.
A.. Basic information: Name, Director, release year, category, length
B. Evaluation information: score, star rating, number of reviews
C. Starring table: starring (I'm struggling to make a single list)
3. Table Design. Now a little bit of a sore is the primary key. Use the self-increment primary key or the movie ID key. After all my many considerations, I decided carefully (hastily) to use the movie ID (to build a unique index anyway, why not take it as the primary key). ), so I just turned the data in the ID again.
m_id = Re.search ("[0-9]+", Movie_url). Group () movie["ID "] = Int (m_id)
Write to play, is so casual, have any ideas on change!!!
4. Build a table. is also the egg ache, I actually tangled up is uses MySQL or MongoDB, immediately deletes oneself a slap, you will mongodb? I feel that if I continue to open a mongodb pit, I will never come back to this reptile.
I'm not stupid, not a few to write my own table SQL. SQLyog directly. A little tangle of InnoDB and MyISAM. I began to want to each time a movie information into three tables, or to use the transaction, but also want to spam data will not deduct my money, so used MyISAM. After all, it's a lot of inserts and select
Build a Table statement
CREATE DATABASE' Douban '; Use' Douban ';--Basic Information TableCREATE TABLE' T_movie_info ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, watercress movie ID', ' type 'tinyint(4)DEFAULT NULLCOMMENT'Genre 0: Drama, 1: Movie', ' name 'varchar( -)DEFAULT NULLCOMMENT'Movie Name', ' director 'varchar( -)DEFAULT NULLCOMMENT'Director', ` Year`int(4)DEFAULT NULLCOMMENT'release Year', `Month`int(2)DEFAULT NULLCOMMENT'Release Month', ` Day`int(2)DEFAULT NULLCOMMENT'Release date', ' categories1 'varchar( -)DEFAULT NULLCOMMENT'belongs to Category 1', ' Categories2 'varchar( -)DEFAULT NULLCOMMENT'belongs to Category 2', ' time 'int(3)DEFAULT NULLCOMMENT'Length of Time', PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=UTF8--Starring TableCREATE TABLE' t_movie_actors ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, watercress movie ID', ' Actor1 'varchar( -)DEFAULT NULL, ' Actor2 'varchar( -)DEFAULT NULL, ' Actor3 'varchar( -)DEFAULT NULL, ' Actor4 'varchar( -)DEFAULT NULL, ' Actor5 'varchar( -)DEFAULT NULL, PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=UTF8--Evaluation Data SheetCREATE TABLE' t_movie_scores ' (' ID ')bigint( -) unsigned not NULLCOMMENT'primary key, movie ID', ' score 'DoubleUnsignedDEFAULT '0'COMMENT'Ratings', ' votes 'int(Ten) unsignedDEFAULT '0'COMMENT'Number of reviews', ' star1 'DoubleUnsignedDEFAULT '0'COMMENT'1 star ratio', ' star2 'DoubleUnsignedDEFAULT '0'COMMENT'2 star ratio', ' Star3 'DoubleUnsignedDEFAULT '0'COMMENT'3 star ratio', ' STAR4 'DoubleUnsignedDEFAULT '0'COMMENT'4 star ratio', ' STAR5 'DoubleUnsignedDEFAULT '0'COMMENT'5 star ratio', PRIMARY KEY(' id ')) ENGINE=MyISAMDEFAULTCHARSET=Utf8
5. Write a method to insert the data into the database. is also the first time to write data to the database with Python, tossing the night. I do not know which side of the cake than the%s do not add quotation marks, make me see this for the first time, and then went wrong always think this is right, looking for another reason
definset_data (movie):#get a database connectionconn = Pymysql. Connect (host="localhost", port=3306, user="Root", password="Root", db="Douban") #Conn.autocommit (True) # set autocommitcursor = Conn.cursor ()#Get cursorSql_insert_info = ("insert INTO ' t_movie_info ' (' id ', ' type ', ' name ', ' director ', ' Year ', ' Month ', ' Day '," "' categories1 ', ' categories2 ', ' time ') values (%d,%d, '%s ', '%s ',%d,%d,%d, '%s ', '%s ',%d)") Categories= movie["Categories"] Ca_len=len (categories) Categories1= Categories[0]ifCa_len > 0ElseNone Categories2= Categories[1]ifCa_len > 1ElseNone cursor.execute (sql_insert_info% (movie["ID"], movie["type"], movie["name"], movie["Directer"], movie["Date"].year, movie["Date"].month, movie["Date"].day, Categories1, Categories2, movie[" Time"])) #Write SQL do not forget to add double quotation marks%s, otherwise it will error Unknown column ' A ' in ' Field List 'Sql_insert_actors = ("insert INTO ' t_movie_actors ' (ID, Actor1, Actor2, Actor3, Actor4, Actor5)" "values (%d, '%s ', '%s ', '%s ', '%s ', '%s ')") Actors= movie["actors"] Actors_len=Len (actors) Actor1= Actors[0]ifActors_len > 0ElseNone Actor2= Actors[1]ifActors_len > 1ElseNone Actor3= Actors[2]ifActors_len > 2ElseNone Actor4= Actors[3]ifActors_len > 3ElseNone Actor5= Actors[4]ifActors_len > 4ElseNone cursor.execute (sql_insert_actors% (movie["ID"], Actor1, Actor2, Actor3, Actor4, Actor5)) Sql_insert_scores= ("insert INTO ' t_movie_scores ' (ID, score, votes, star1, Star2, Star3, STAR4, STAR5)" "values (%d,%f,%d,%f,%f,%f,%f,%f)") Stars= movie["stars"] Stars_len=Len (stars) Star1= Stars[0]ifStars_len > 0Else0.0star2= Stars[1]ifStars_len > 1Else0.0Star3= Stars[2]ifStars_len > 2Else0.0STAR4= Stars[3]ifStars_len > 3Else0.0STAR5= Stars[4]ifStars_len > 4Else0.0Cursor.execute (Sql_insert_scores% (movie["ID"], movie["score"], movie["vote"], star1, Star2, Star3, STAR4, STAR5) conn.commit () data1= Douban_movie ("https://movie.douban.com/subject/30236775/?from=showing") Inset_data (data1) data2= Douban_movie ("Https://movie.douban.com/subject/26842702/?tag=%E7%83%AD%E9%97%A8&from=gaia") Inset_data (data2) data3= Douban_movie ("Https://movie.douban.com/subject/26973784/?tag=%E6%9C%80%E6%96%B0&from=gaia") Inset_data (data3) data4= Douban_movie ("Https://movie.douban.com/subject/30249296/?tag=%E7%83%AD%E9%97%A8&from=gaia") Inset_data (DATA4)
After executing the database:
"Python" starts from 0 to write crawlers--to store the bean paste data into the database