How do I use Python to perform data analysis on Instagram?

Last Update:2017-10-09 Source: Internet

Author: User

Tags jupyter jupyter notebook


I'm writing this article to show the basic ways to use Instagram programmatically. The approach can be used for data analysis, computer vision, and any other project you can think of.

Instagram is the largest photo-sharing social media platform, with about 500 million monthly active users and 95 million photos and videos uploaded every day. That is a large body of data with great potential. This article shows you how to use Instagram as a data source rather than as a platform, and how to apply the approach described here in your own projects.

Introduction to APIs and tools
Instagram provides an official API, but it is somewhat outdated and currently offers very limited functionality. So in this article I use the unofficial Instagram API provided by LevPasha. It supports all the key features, such as liking, following, uploading pictures and videos, and more. It is written in Python, and in this article I focus only on the data side.

I recommend using Jupyter Notebook and IPython. The plain Python interpreter works too, but it lacks features such as inline picture display.

Installation

You can use pip to install the library with the following command:
    python -m pip install -e git+https://github.com/LevPasha/Instagram-API-python.git#egg=InstagramAPI

If FFmpeg is not already installed on the system, you can install it on Linux with the following command:

    sudo apt-get install ffmpeg

On Windows, run the following in the Python interpreter:

    import imageio
    imageio.plugins.ffmpeg.download()

Use the API to log in to Instagram as follows:

    from InstagramAPI import InstagramAPI

    username = "YourUserName"
    # Note: this rebinds the name InstagramAPI to the client instance
    InstagramAPI = InstagramAPI(username, "YourPassword")
    InstagramAPI.login()

If the login is successful, you will receive a "Login successful" message.

Basic request

With the preparation done, we can make our first request:

    InstagramAPI.getProfileData()
    result = InstagramAPI.LastJson

    {u'status': u'ok',
     u'user': {u'biography': u'',
      u'birthday': None,
      u'country_code': 20,
      u'email': u'[email protected]om',
      u'external_url': u'',
      u'full_name': u'Nour Galaby',
      u'gender': 1,
      u'has_anonymous_profile_picture': False,
      u'hd_profile_pic_url_info': {u'height': 1080,
       u'url': u'https://instagram.fcai2-1.fna.fbcdn.net/t51.2885-1aaa7448121591_1aa.jpg',
       u'width': 1080},
      u'hd_profile_pic_versions': [{u'height': 320,
        u'url': u'https://instagram.fcai2-1.fna.fbcdn.net/t51.2885-19/s320x320/19aa23237_4337448121591_195310aaa32_a.jpg',
        u'width': 320},
       {u'height': 640,
        u'url': u'https://instagram.fcai2-1.fna.fbcdn.net/t51.2885-19/s640x640/19623237_45581744812153_44_a.jpg',
        u'width': 640}],
      u'is_private': True,
      u'is_verified': False,
      u'national_number': 122,
      u'phone_number': u'+201220',
      u'pk': 22412229,
      u'profile_pic_id': u'1550239680720880455_22',
      u'profile_pic_url': u'https://instagram.fcai2-1.fna.fbcdn.net/t51.2885-19/s150x150/19623237_455817448121591_195310166162_a.jpg',
      u'show_conversion_edit_entry': False,
      u'username': u'nourgalaby'}}

As shown above, the result is given in JSON format and includes all the requested data. You can access the values by key, as with any dictionary. For example:

[Image: example of accessing values in the result by key]

You can also use a tool such as Notepad++ to view and explore the JSON data.

Get and view the Instagram timeline

Now let's implement something more useful. We will request the most recent posts on the timeline and view them in Jupyter Notebook. The following code fetches the timeline:

    InstagramAPI.timelineFeed()

As with the previous request, we use LastJson to view the result. Looking at the resulting JSON, we can see that it includes a list under a key called "entries". Each element in the list holds information about one post on the timeline, including the following elements:
[Text]: The caption text under the post, including hashtags.
[Likes]: The number of likes on the post.
[Created_at]: The time the post was created.
[Comments]: The comments on the post.
[Image_versions]: Links to the actual JPG files; you can use a link to display the picture in Jupyter Notebook.
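
Fields like these can be read with ordinary dictionary access. The following is a minimal sketch on a hand-written stand-in for one post, not a real response. Only "like_count" and "media_type" appear in this article's own code; the other key names ("caption", "image_versions2", "candidates") and the helpers get_url and summarize are assumptions made for illustration and should be checked against an actual response.

```python
# Hand-written stand-in for one timeline entry (hypothetical data).
# Key names other than "like_count" and "media_type" are assumptions.
sample_post = {
    "caption": {"text": "Sunset at the beach #nofilter"},
    "like_count": 42,
    "media_type": 1,
    "image_versions2": {
        "candidates": [
            {"url": "https://example.com/photo_1080.jpg", "width": 1080},
            {"url": "https://example.com/photo_320.jpg", "width": 320},
        ]
    },
}


def get_url(post):
    """Return the first image URL of a post, or None if it has none."""
    candidates = post.get("image_versions2", {}).get("candidates", [])
    return candidates[0]["url"] if candidates else None


def summarize(post):
    """Collect a few fields from a post, tolerating missing keys."""
    caption = post.get("caption") or {}
    return {
        "text": caption.get("text", ""),
        "likes": post.get("like_count", 0),
        "url": get_url(post),
    }


summary = summarize(sample_post)
print(summary)
```

Using .get() with defaults keeps the loop from failing on entries that lack a caption or image data, for example video-only posts.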
The functions get_posts_from_list() and get_url() loop through the list of posts, find the URL in each post, and append it to an initially empty list. When they finish, we have a list of URLs:

[Image: the resulting list of URLs]

We can use the IPython.display module to view the pictures:

[Image: code that displays the pictures with IPython.display]

Viewing images in an IPython notebook is a useful feature, and we'll use these functions again to look at results later, so stay tuned.

Get the most popular posts

Now that we know how to make basic requests, how do we make more complex ones? Let's find the most popular of our own posts. To do this, we first need to get all the posts of the currently logged-in user, and then sort them by number of likes.

Get all posts for a user

To get all posts, we loop over the results using the next_max_id and more_available values:

    import time

    myposts = []
    has_more_posts = True
    max_id = ""

    while has_more_posts:
        InstagramAPI.getSelfUserFeed(maxid=max_id)
        if InstagramAPI.LastJson['more_available'] is not True:
            has_more_posts = False  # stop condition
            print("stopped")
        max_id = InstagramAPI.LastJson.get('next_max_id', '')
        myposts.extend(InstagramAPI.LastJson['items'])  # merge lists
        time.sleep(2)  # slow the script down to avoid flooding the servers

    print(len(myposts))

Save and load data to disk

Because the request above may take a long time to complete, we don't want to run it when it isn't necessary. It is good practice to save the results and load them again as you continue to work. For this we use pickle, which can serialize most Python objects to a file and load them back later.
Here is a working example.

Save:

    import pickle

    filename = username + "_posts"
    with open(filename, "wb") as f:
        pickle.dump(myposts, f)

Load:

    import pickle

    filename = "nourgalaby_posts"
    with open(filename, "rb") as f:  # pickle files must be opened in binary mode
        myposts = pickle.load(f)

Sort by number of likes

Now we have a variable called myposts, which is a list of dictionaries. To sort it by a key inside each dictionary, we can use a lambda expression:

    myposts_sorted = sorted(myposts, key=lambda k: k['like_count'], reverse=True)
    top_posts = myposts_sorted[:10]
    bottom_posts = myposts_sorted[-10:]

The results can be displayed in the same way as above:

    image_urls = get_images_from_list(top_posts)
    display_images_from_url(image_urls)

Filter images

We may want to filter our posts. For example, some posts may be videos when we only want picture posts. We can filter like this:

    # list() keeps len() working on Python 3, where filter() returns an iterator
    myposts_photos = list(filter(lambda k: k['media_type'] == 1, myposts))
    myposts_vids = list(filter(lambda k: k['media_type'] == 2, myposts))
    print(len(myposts))
    print(len(myposts_photos))
    print(len(myposts_vids))

Of course, you can filter on any field in the result. Be creative!

Notifications

    InstagramAPI.getRecentActivity()
    get_recent_activity_response = InstagramAPI.LastJson

    for notification in get_recent_activity_response['old_stories']:
        print(notification['args']['text'])

The output may look like this:

    userohamed3 liked your post.
    userhacker32 liked your post.
    user22 liked your post.
    userz77 liked your post.
    userwww77 started following you.
    user2222 liked your post.
    user23553 liked your post.

Notifications from specific users

Now we can work with the response and play with the notifications.
For example, I can get the notifications from a specific user:

    username = "diana"
    for notification in get_recent_activity_response['old_stories']:
        text = notification['args']['text']
        if username in text:
            print(text)

Let's try something more interesting, such as finding out at what time of day you receive the most likes. To do this, we plot the hour of the day against the number of likes received. The following code plots the notification times:

    import pandas as pd

    # `dates` is assumed to be a list of datetime values, one per notification
    df = pd.DataFrame({"date": dates})
    df.groupby(df["date"].dt.hour).count().plot(kind="bar", title="Hour")

[Image: bar chart of notification counts per hour of the day]

As you can see, in this example I get the most likes between 6 p.m. and 10 p.m. If you know social media, you'll know that this is peak usage time, and most businesses choose this period to post in order to get the most engagement.

Get followers and following lists

Next I'll get the lists of followers and followings and do some work on them. To use the getUserFollowings and getUserFollowers functions, you first need the user_id. Here's a way to get it:

[Image: code that obtains user_id]

Now you can call the functions as follows. Note that if the number of followers is very large, you will need to make multiple requests (described in more detail below). Here we make one request for each list. The JSON result contains a list of users, with information about each follower or followed account.

    InstagramAPI.getUserFollowings(user_id)
    print(len(InstagramAPI.LastJson['users']))
    following_list = InstagramAPI.LastJson['users']

    InstagramAPI.getUserFollowers(user_id)
    print(len(InstagramAPI.LastJson['users']))
    followers_list = InstagramAPI.LastJson['users']

If the number of followers is large, the result may not be the complete list.
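
A truncated response can be recognized by the next_max_id cursor it carries, and the same cursor pattern applies to posts and followers alike. The following is a minimal sketch of that pattern in isolation, assuming each response is a dictionary with a list of items and an optional "next_max_id" key. The names fetch_all and fetch_page and the fake pages are hypothetical and stand in for the real API so that the loop can be run without logging in.

```python
def fetch_all(fetch_page, items_key):
    """Collect items from every page of a cursor-paginated source.

    fetch_page(max_id) must return a dict with a list under items_key and,
    while more pages exist, a non-empty "next_max_id" value.
    """
    collected = []
    max_id = ""
    while True:
        page = fetch_page(max_id)
        collected.extend(page.get(items_key, []))
        max_id = page.get("next_max_id", "")
        if not max_id:  # no cursor means this was the last page
            return collected


# Fake three-page source standing in for the real API (hypothetical data).
FAKE_PAGES = {
    "": {"users": [{"username": "a"}, {"username": "b"}], "next_max_id": "p2"},
    "p2": {"users": [{"username": "c"}], "next_max_id": "p3"},
    "p3": {"users": [{"username": "d"}]},
}

all_users = fetch_all(lambda max_id: FAKE_PAGES[max_id], "users")
print(len(all_users))
```

With the real client, fetch_page could be a small function that calls getUserFollowers(user_id, maxid=max_id) and returns LastJson, ideally with a short time.sleep() between requests.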
Get all followers

Getting the full list of followers is similar to getting all posts. We make a request and then iterate over the results using the next_max_id value. Thanks to Francesc Garcia for his support here.

    import time

    followers = []
    next_max_id = True
    while next_max_id:
        print(next_max_id)
        # first iteration hack
        if next_max_id is True:
            next_max_id = ''
        _ = InstagramAPI.getUserFollowers(user_id, maxid=next_max_id)
        followers.extend(InstagramAPI.LastJson.get('users', []))
        next_max_id = InstagramAPI.LastJson.get('next_max_id', '')
        time.sleep(1)

    followers_list = followers

The same can be done for the following list, but I don't do it because, in my case, one request is enough to get everyone I follow.

Now we have the list data for all followers and followings in JSON format. I will convert the lists to a friendlier data type, a set, which makes a range of operations on the data easy. I take only the "username" value and call set() on it.

    user_list = map(lambda x: x['username'], following_list)
    following_set = set(user_list)
    print(len(following_set))

    user_list = map(lambda x: x['username'], followers_list)
    followers_set = set(user_list)
    print(len(followers_set))

Here I built sets of all the usernames. The same can be done with "full_name", which reads better. However, the results may not be unique, because some users may not provide a full name.

Now we have two sets. We can do the following:

[Image: set operations on the followers and following sets]

Here I gave some statistics about the followers. You can do many other things, such as saving the follower list and comparing it later to see how it has changed.

We've shown what you can do with your Instagram data. I hope you've learned how to use the Instagram API and have some basic ideas of what you can do with it. Keep an eye on the official API, which is still under development and may let you do more in the future.
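
As a concrete illustration of the set comparisons mentioned above, here is a minimal sketch with made-up usernames. The variable names mirror followers_set and following_set from the article, but the data and the result names are hypothetical.

```python
# Made-up usernames standing in for the real sets built from the API results.
followers_set = {"alice", "bob", "carol"}
following_set = {"bob", "carol", "dave"}

# People who follow me and whom I follow back.
mutual = followers_set & following_set

# People I follow who do not follow me.
not_following_back = following_set - followers_set

# People who follow me but whom I do not follow.
not_followed_back = followers_set - following_set

print(sorted(mutual))
print(sorted(not_following_back))
print(sorted(not_followed_back))
```

Saving followers_set to disk and subtracting a newer set from it later would show, in the same way, which accounts are no longer in the list.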
If you have any questions or suggestions, please feel free to contact me.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
