Thesis work: microblog (Weibo) personalization

Source: Internet
Author: User

1. Extract the user list of each program

2. Match each program's user ID list to the users' profiles.

The implementation script is as follows:

#!/bin/bash
# bash is used because the $'\t' quoting below is not available in plain sh

program_dir=/home/minelab/liweibo/raw_data
user_file=/home/minelab/liweibo/springnightuser/sina_user.data

program_list=`ls $program_dir`

for program in $program_list
do
    # Generate two files for each program:
    #   <program>_userid_times.map          fields: user ID, number of Weibo posts related to the program
    #   <program>_userid_times_profile.map  fields: user ID, number of mentions, nickname, gender, region,
    #                                       birthday, following count, fan count, post count, tags
    rm -rf $program_dir/$program/$program"_userid_times.map"
    rm -rf $program_dir/$program/$program"_userid_times_profile.map"
    # Field 2 of <program>.data is the user ID: count posts per user, then emit "user ID<TAB>count" sorted by user ID
    cat $program_dir/$program/$program.data | awk -F'\t' '{print $2}' | sort | uniq -c | sort -r -n | sed 's/^ *//g' | sed 's/ /\t/g' | awk -F'\t' '{print $2"\t"$1}' | sort > $program_dir/$program/$program"_userid_times.map"
    # join expects both inputs to be sorted on the user ID field
    join -t $'\t' $program_dir/$program/$program"_userid_times.map" $user_file > $program_dir/$program/$program"_userid_times_profile.map"

    echo $program is done!
done

echo "all is done!"
Program user information processing: extractuserforlargepaper.sh
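
The counting and joining done by the script above can be tried on a few sample rows. The following is a minimal sketch, not the author's code: the file names, the sample rows, and the helper `count_posts_per_user` are hypothetical, and it assumes, as the script does, that the second tab-separated field of the post file is the user ID.

```shell
#!/bin/bash
# Sketch: count posts per user ID, then join the counts with a profile table.
# File names, sample rows and the helper name are made up for illustration.
export LC_ALL=C            # keep sort and join in agreement on ordering
tab=$(printf '\t')

count_posts_per_user() {
    # $1: tab-separated post file whose 2nd field is the user ID.
    # Prints "user_id<TAB>count", sorted by user ID as join requires.
    cut -f2 "$1" | sort | uniq -c | awk '{printf "%s\t%s\n", $2, $1}'
}

work_dir=$(mktemp -d)
printf 'p1\tu2\thello\np2\tu1\thi\np3\tu2\tagain\n' > "$work_dir/demo.data"
printf 'u1\tAlice\tF\nu2\tBob\tM\n' > "$work_dir/profiles.data"

count_posts_per_user "$work_dir/demo.data" > "$work_dir/demo_userid_times.map"
# Each output row is: user ID, post count, then the profile fields.
joined=$(join -t "$tab" "$work_dir/demo_userid_times.map" "$work_dir/profiles.data")
printf '%s\n' "$joined"
rm -rf "$work_dir"
```

Users that have no row in the profile file are dropped by `join`, so the profile map can be shorter than the count map.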

3. Assign a numeric ID to each program.

#!/bin/sh
program_dir=/home/minelab/liweibo/raw_data
inter_dir=/home/minelab/liweibo/inter_data
result_file=$inter_dir/id_program.map
program_list=`ls $program_dir`
rm -rf $result_file
i=1
for program in $program_list
do
    # One line per program: ID<TAB>program name
    printf '%s\t%s\n' "$i" "$program" >> $result_file
    i=$((i+1))
done
echo "done"
Assign an ID to a program

The resulting id_program.map file:

1    hundred flowers competing for Yan
2    better fun
3    Spring Festival is what
4    answer
5    help not help
6    symbol China
7    glory and dream
8    joy song
9    sword heart book rhyme
10    volume Pearl curtain
11    Kangding love song
12    empty New Year
13    old aunt
14    trainer dance
15    rose life
16    dream butterfly
17    magic three brothers
18    hard to forget this evening
19    years taste
20    youth dance music
21    feelings have to have
22    groups I don't return
23    disturbing the masses
24    people to salute to
25    tip on the Spring Festival
26    time all where to go
27    Say what are you doing
28    Horse pole
29    the world's Yellow River 9th road bend
30    days Yao China
31    thirteen masters of Tongguang
32    reunion dinner
33    ten thousand horses galloping
34    Wanquan River water
35    my requirements are not high
36    my Chinese Dream
37    I that's the case
38    think about your 365 day
39    pony huanteng
40    wild bees flying
41    hero song
42    hero group song
43    stand in the distant place
44    on the high position
45    on the bright
46    best night
id_program.map

4. Build the program ID to user matrix.

 

#!/bin/bash
# Final file format:
#   program ID<TAB>number of users who commented on the program<TAB>IDs of those users (separated by spaces)
# A user who comments on a program several times is counted once.
program_dir=/home/minelab/liweibo/raw_data
inter_dir=/home/minelab/liweibo/inter_data
# The output and temporary file names are lost in the source; fill them in before running.
result_file=$inter_dir/...
tmp_file=...
program_list=`ls $program_dir`
rm -rf $result_file
rm -rf $tmp_file
i=1
for program in $program_list
do
    # The input file name is garbled in the source; "_userid_times.map" is assumed here
    # because its first field is the user ID.
    user_list=`cat $program_dir/$program/$program"_userid_times.map" | awk -F'\t' '{printf("%s ", $1);} END{print "";}'`
    line_num=`cat $program_dir/$program/$program"_userid_times_profile.map" | wc -l | awk '{print $1}'`
    # Append (>>) so that earlier programs are not overwritten
    printf '%s\t%s\t%s\n' "$i" "$line_num" "$user_list" >> $tmp_file
    i=$((i+1))
done
# Sort by program popularity (number of commenting users, descending)
cat $tmp_file | sort -t $'\t' -k 2,2 -r -n > $result_file
rm -rf $tmp_file
echo "done"
Build the id_user matrix

5. Build the user to program ID matrix.

 

6. Measure the popularity of programs.
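
The source does not define popularity beyond the sort in step 4, which orders programs by the number of users who commented on them. Taking that count as the measure (an assumption), the sketch below attaches program names from id_program.map and ranks the programs. The helper `rank_programs` and the sample rows are hypothetical.

```shell
#!/bin/bash
# Sketch: rank programs by number of commenting users and show their names.
# Assumed inputs: id_program.map ("id<TAB>name") and the step 4 matrix
# ("id<TAB>user_count<TAB>user IDs").
export LC_ALL=C
tab=$(printf '\t')

rank_programs() {
    # $1: id_program.map, $2: program-to-user matrix.
    # Prints "id<TAB>name<TAB>user_count", most popular first.
    awk -F'\t' '
        NR == FNR { name[$1] = $2; next }     # first file: remember names
        { printf "%s\t%s\t%s\n", $1, name[$1], $2 }
    ' "$1" "$2" | sort -t "$tab" -k3,3nr
}

work_dir=$(mktemp -d)
printf '1\tprogram_a\n2\tprogram_b\n' > "$work_dir/id_program.map"
printf '1\t2\tu1 u2 \n2\t5\tu1 u2 u3 u4 u5 \n' > "$work_dir/program_user.map"
ranked=$(rank_programs "$work_dir/id_program.map" "$work_dir/program_user.map")
printf '%s\n' "$ranked"
rm -rf "$work_dir"
```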

7. Measure user activity.
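
The source does not say how activity is measured. As one hedged possibility, a user's activity could be taken as the total number of program-related posts summed over all per-program `_userid_times.map` files from step 2, together with the number of programs the user posted about. The helper `user_activity` and the sample files below are hypothetical.

```shell
#!/bin/bash
# Sketch: sum per-program post counts into a per-user activity table.
# Assumed input: one or more files of "user_id<TAB>post_count" lines.
export LC_ALL=C
tab=$(printf '\t')

user_activity() {
    # $@: per-program count files.
    # Prints "user_id<TAB>total_posts<TAB>program_count", most active first.
    awk -F'\t' '
        { posts[$1] += $2; progs[$1]++ }
        END { for (u in posts) printf "%s\t%d\t%d\n", u, posts[u], progs[u] }
    ' "$@" | sort -t "$tab" -k2,2nr -k1,1
}

work_dir=$(mktemp -d)
printf 'u1\t3\nu2\t1\n' > "$work_dir/a_userid_times.map"
printf 'u2\t1\n' > "$work_dir/b_userid_times.map"
activity=$(user_activity "$work_dir/a_userid_times.map" "$work_dir/b_userid_times.map")
printf '%s\n' "$activity"
rm -rf "$work_dir"
```

The program count relies on each user appearing at most once per file, which holds for the maps produced in step 2.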

 
