Extract the required rows from a DataFrame
Code features:
Use .loc on the DataFrame to select the rows we want, then sort them by the values in one of its columns.
The code also shows how to add a column to a DataFrame by direct assignment, name_dataframe['diff'] = ___, after which the DataFrame can be sorted by the values of the newly added column.
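The two features above can be sketched on a small table before reading the full script. The data, titles, and variable names here are hypothetical stand-ins, not the MovieLens files used in the script; only the .loc selection, the column assignment, and the sort mirror what the post describes.

```python
import pandas as pd

# Hypothetical toy data standing in for a table of mean ratings by gender.
ratings = pd.DataFrame(
    {"F": [4.0, 3.0, 4.5, 2.5], "M": [3.0, 3.5, 4.5, 3.5]},
    index=["Title A", "Title B", "Title C", "Title D"],
)

# Select only the rows whose index labels we want (label-based selection).
wanted = pd.Index(["Title A", "Title B", "Title D"])
subset = ratings.loc[wanted].copy()

# Add a new column by direct assignment, then sort by that column.
subset["diff"] = subset["M"] - subset["F"]
sorted_by_diff = subset.sort_values(by="diff")
print(sorted_by_diff)
```

The .copy() call is a precaution so that the new column is added to an independent table rather than to something pandas might treat as a view of the original.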
import pandas as pd

# A multi-character separator such as '::' needs the Python parsing engine.
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('users.dat', sep='::', header=None, names=unames, engine='python')

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ratings.dat', sep='::', header=None, names=rnames, engine='python')

mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('movies.dat', sep='::', header=None, names=mnames, engine='python')

# Join the three tables on their shared columns.
data = pd.merge(pd.merge(ratings, users), movies)

# Mean rating of each title, split by gender.
mean_ratings = pd.pivot_table(data, index=['title'], values='rating', columns='gender')
print(mean_ratings[:10])

# Number of ratings per title; keep titles with at least 250 ratings.
ratings_by_title = data.groupby('title').size()
print(ratings_by_title[:10])
active_titles = ratings_by_title.index[ratings_by_title >= 250]
print(active_titles)

# Use .loc to extract only the rows for the active titles.
active_mean_ratings = mean_ratings.loc[active_titles]

# sort_values replaces the old sort_index(by=...) form, which current pandas no longer accepts.
top_female_ratings = active_mean_ratings.sort_values(by='F', ascending=False)

# Add a new column by direct assignment, then sort by it.
active_mean_ratings['diff'] = active_mean_ratings['M'] - active_mean_ratings['F']
sorted_by_diff = active_mean_ratings.sort_values(by='diff')
print(sorted_by_diff[::-1][:])  # Note how the DataFrame is accessed in reverse order
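The reverse-order access at the end of the script can be compared with a descending sort. This is a minimal sketch on hypothetical values, assuming there are no ties in the sort column; with ties, the two approaches may order the tied rows differently.

```python
import pandas as pd

# Hypothetical values; all entries in 'diff' are distinct.
df = pd.DataFrame({"diff": [0.5, -1.0, 1.0]}, index=["B", "A", "D"])

# Sort ascending, then step through the rows backwards with [::-1].
ascending = df.sort_values(by="diff")
reversed_rows = ascending[::-1]

# Sorting in descending order directly.
descending = df.sort_values(by="diff", ascending=False)

# Compare the two results.
print(reversed_rows.equals(descending))
```

Slicing with [::-1] reverses whatever row order the DataFrame already has, so it is also useful when the rows were not produced by a sort.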