1. Extract the user list of each program
2. Match each user ID in the list to the user's profile.
The implementation script is as follows:
#!/bin/bash

program_dir=/home/minelab/liweibo/raw_data
user_file=/home/minelab/liweibo/springnightuser/sina_user.data

program_list=`ls $program_dir`

for program in $program_list
do
    # Generate two files for each program:
    # <program>_userid_times.map fields: user ID, number of Weibo posts related to the program
    # <program>_userid_times_profile.map fields: user ID, number of mentions, nickname, gender,
    #   region, birthday, number of accounts followed, number of fans, number of Weibo posts, tags
    rm -rf "$program_dir/$program/${program}_userid_times.map"
    rm -rf "$program_dir/$program/${program}_userid_times_profile.map"
    # Count posts per user ID (column 2), rewrite as "user ID <TAB> count", and sort by user ID for join
    cat "$program_dir/$program/$program.data" | awk -F'\t' '{print $2}' | sort | uniq -c | sort -r -n | sed 's/^ *//g' | sed 's/ /\t/g' | awk -F'\t' '{print $2"\t"$1}' | sort > "$program_dir/$program/${program}_userid_times.map"
    # Attach the profile fields; both inputs must be sorted on the user ID
    join -t $'\t' "$program_dir/$program/${program}_userid_times.map" "$user_file" > "$program_dir/$program/${program}_userid_times_profile.map"

    echo "$program is done!"
done

echo "all is done!"
Program user information processing: extractuserforlargepaper.sh
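To make the count-then-join idea in the script above easier to follow, here is a minimal sketch that runs on made-up sample data. The function name `count_and_join`, the file names, and the sample rows are all assumptions for illustration; only the layout (user ID in column 2 of the post file, user ID in column 1 of the profile file) follows the script.

```shell
#!/bin/bash
# Sketch: count posts per user ID, then join the counts with a profile file.
# All file names and sample rows below are made up for illustration.
set -eu

count_and_join() {
    # $1: tab-separated post file with the user ID in column 2
    # $2: tab-separated profile file with the user ID in column 1
    local data_file=$1 profile_file=$2
    local tab=$'\t'
    # join needs both inputs sorted on the join field in the same locale,
    # so every sort and the join itself run under LC_ALL=C
    LC_ALL=C join -t "$tab" \
        <(cut -f2 "$data_file" | LC_ALL=C sort | uniq -c \
            | awk -v OFS='\t' '{print $2, $1}') \
        <(LC_ALL=C sort -t "$tab" -k1,1 "$profile_file")
}

demo_dir=$(mktemp -d)
# Three posts: two by u2, one by u1
printf 'p1\tu2\thello\np2\tu1\thi\np3\tu2\tagain\n' > "$demo_dir/demo.data"
# Hypothetical profile rows: user ID, nickname, gender
printf 'u1\talice\tF\nu2\tbob\tM\n' > "$demo_dir/profile.data"

count_and_join "$demo_dir/demo.data" "$demo_dir/profile.data"
```

Each output row starts with the user ID and the post count, followed by that user's profile fields. Sorting before `uniq -c` already leaves the counts ordered by user ID, so the sketch skips the extra sort by count that the original pipeline performs before re-sorting.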
3. Assign a numeric ID to each program.
#!/bin/bash
program_dir=/home/minelab/liweibo/raw_data
inter_dir=/home/minelab/liweibo/inter_data
result_file=$inter_dir/id_program.map
program_list=`ls $program_dir`
rm -rf $result_file
i=1
for program in $program_list
do
    # One line per program: sequential ID, then the program name
    echo $i" "$program >> $result_file
    i=$((i+1))
done
echo "done"
Assign an ID to a program
The resulting id_program.map file:
1 hundred flowers competing for Yan
2 better fun
3 Spring Festival is what
4 answer
5 help not help
6 symbol China
7 glory and dream
8 joy song
9 sword heart book rhyme
10 volume Pearl curtain
11 Kangding love song
12 empty New Year
13 old aunt
14 trainer dance
15 rose life
16 dream butterfly
17 magic three brothers
18 hard to forget this evening
19 years taste
20 youth dance music
21 feelings have to have
22 groups I don't return
23 disturbing the masses
24 people to salute to
25 tip on the Spring Festival
26 time all where to go
27 Say what are you doing
28 Horse pole
29 the world's Yellow River 9th road bend
30 days Yao China
31 with light 13 absolutely
32 reunion dinner
33 ten thousand horses galloping
34 Wanquan River water
35 my requirements are not high
36 my Chinese Dream
37 I that's the case.
38 think about your 365 day
39 pony huanteng
40 wild bees flying
41 hero song
42 hero group song
43 stand in the distant place
44 on the high position
45 on the bright
46 best night
id_program.map
4. Create a program ID to user matrix.
#!/bin/bash
# The final file format is: program ID <TAB> number of users who commented on the program <TAB> list of IDs of those users (IDs separated by spaces)
# If a user comments on a program multiple times, it is counted once
program_dir=/home/minelab/liweibo/raw_data
inter_dir=/home/minelab/liweibo/inter_data
result_file=$inter_dir/...    # file name is missing from the listing
tmp_file=...                  # temporary file; path is missing from the listing
program_list=`ls $program_dir`
rm -rf $result_file
rm -rf $tmp_file
i=1
for program in $program_list
do
    # Space-separated user IDs (column 1 of the per-program profile file)
    user_list=`cat $program_dir/$program/$program"_userid_times_profile.map" | awk -F'\t' '{printf("%s ", $1);} END {print "";}'`
    # Number of distinct users: one line per user in the profile file
    line_num=`cat $program_dir/$program/$program"_userid_times_profile.map" | wc -l | awk '{print $1}'`
    # Append (not overwrite) one tab-separated row per program
    printf '%s\t%s\t%s\n' "$i" "$line_num" "$user_list" >> $tmp_file
    i=$((i+1))
done
# Sort by program popularity (number of users, descending)
cat $tmp_file | sort -t $'\t' -k 2 -r -n > $result_file
rm -rf $tmp_file
echo "done"
Build id_user Matrix
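The row format described in the script above (program ID, user count, space-separated user IDs) can also be produced in a single awk pass per program. The following is only a sketch under assumptions: the helper name `matrix_row`, the file names, and the sample rows are hypothetical, and duplicate user IDs are dropped inside awk rather than relying on the input already being unique.

```shell
#!/bin/bash
# Sketch: build "id <TAB> user_count <TAB> space-separated user IDs" rows
# and sort them by user_count, largest first. Sample data is made up.
set -eu

matrix_row() {
    # $1: numeric program ID
    # $2: tab-separated file with the user ID in column 1
    awk -F'\t' -v id="$1" '
        # Keep each user ID once; n counts the distinct users seen so far
        !seen[$1]++ { list = (n++ ? list " " : "") $1 }
        END { printf "%s\t%d\t%s\n", id, n, list }
    ' "$2"
}

demo_dir=$(mktemp -d)
printf 'u1\t3\nu2\t1\n' > "$demo_dir/a.map"          # program 1: two users
printf 'u3\t2\nu1\t1\nu4\t5\n' > "$demo_dir/b.map"   # program 2: three users

build_matrix() {
    {
        matrix_row 1 "$demo_dir/a.map"
        matrix_row 2 "$demo_dir/b.map"
    } | sort -t $'\t' -k2,2nr   # most-commented program first
}

build_matrix
```

With this sample data the row for program 2 (three users) sorts ahead of the row for program 1 (two users).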
5. Create a user to program ID matrix.
6. Measure the popularity of programs.
7. Measure user activity.