Topics -> instagram, python, selenium, automation
Preview Link -> AutoInsta
Source Code Link -> GitHub
What are we going to do?
- Making a Login Script
- Accessing the timeline feeds and downloading the media files
- Managing the followers and followings
- How to run our code
Libraries/Tools required:
- Chrome WebDriver (same version as your Chrome browser)
- Requests library
- Selenium library
- Urllib3
Installing Libraries/Tools
For Chrome Web driver
You can download it from here
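If you are unable to install it, you can follow this link Youtube
Installing libraries

    pip install requests
    pip install urllib3
    pip install selenium

Note: the scripts below call Selenium's find_element_by_* helpers, which were removed in Selenium 4.3. Either install an older release (for example pip install "selenium<4.3") or replace those calls with find_element(By.NAME, ...) and its siblings.
Before moving ahead, we should be familiar with CSS selectors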
You can visit Here
Step 1 -> Making a Login Script
We will write a script that reads the username and password from a config file and logs in to the user account so that we can perform the other activities.
First, create a config file containing the username and password.

    USERNAME = ""
    PASSWORD = ""

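Credentials stored in a file are easy to commit to version control by accident. As a minimal sketch of an alternative, the config file could read both values from the environment. The variable names INSTA_USERNAME and INSTA_PASSWORD and the helper load_credentials are assumptions made for this example, not part of the original project.

```python
import os


def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    INSTA_USERNAME and INSTA_PASSWORD are hypothetical names chosen for
    this sketch; missing variables fall back to empty strings.
    """
    env = os.environ if env is None else env
    return env.get("INSTA_USERNAME", ""), env.get("INSTA_PASSWORD", "")


# The rest of the script can keep importing USERNAME and PASSWORD as before.
USERNAME, PASSWORD = load_credentials()
```

With this in place, the values are set in the shell before starting Python and never appear in the source tree.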
Writing script
First, initialize the Chrome WebDriver, the base path, and the user URL

    BASE_DIR = os.path.dirname(__file__)
    browser = webdriver.Chrome()
    user_url = f"https://www.instagram.com/{USERNAME}"

Login Function
We locate the username and password fields by their name attributes. After entering the username and password in the correct fields, we click the submit button, which we locate with the CSS selector button[type='submit']

    # log in to the Instagram account
    def login(browser=browser, username=USERNAME, password=PASSWORD):
        browser.get("https://www.instagram.com/")
        time.sleep(2)
        username_field = browser.find_element_by_name("username")
        username_field.send_keys(username)
        password_field = browser.find_element_by_name("password")
        password_field.send_keys(password)
        btn = browser.find_element_by_css_selector("button[type='submit']")
        time.sleep(2)
        btn.click()

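The fixed time.sleep calls assume the page has loaded within two seconds, which may not hold on a slow connection. A small polling helper is one possible alternative. The sketch below is an illustration only: wait_until is a hypothetical helper, not part of the original code, and Selenium also ships its own WebDriverWait for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the timeout is reached first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the login function it could be used as wait_until(lambda: browser.find_elements_by_name("username")), which returns as soon as the field appears, because an empty list is falsy.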
Step 2 -> Accessing the timeline feeds and downloading the media files
The get_media function opens the user's timeline and downloads media files depending on the arguments passed.

    def get_media(username=USERNAME, media_type=None, count=3, browser=browser) .....

It takes four keyword arguments.
- count : The number of posts to search for media to download.
- username : Omit this argument to download posts from your own account; otherwise pass the username of the other account.
- media_type : Either "img" or "video", in lower case. Omit this argument to download both media types.
- browser : The web driver instance.
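Because media_type must be exactly "img" or "video" in lower case, a misspelled value silently falls through to "download both". A minimal sketch of an input check follows; normalize_media_type is a hypothetical helper that the original code does not contain.

```python
def normalize_media_type(media_type):
    """Return 'img', 'video', or None (meaning both); raise ValueError otherwise."""
    if media_type is None:
        return None
    value = str(media_type).strip().lower()
    if value not in ("img", "video"):
        raise ValueError(
            f"media_type must be 'img', 'video', or None, got {media_type!r}"
        )
    return value


print(normalize_media_type("IMG"))   # img
print(normalize_media_type(None))    # None
```

Calling it at the top of get_media would turn a typo such as "vidoe" into an immediate error instead of an unexpected download.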

    def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
        person_url = f"https://www.instagram.com/{username}/"
        browser.get(person_url)
        # creating user media directory
        src_path = os.path.join(BASE_DIR, f"{username}")
        os.makedirs(src_path, exist_ok=True)
        # getting media links from the function
        media_elements = getting_all_post(media_type, count)
        print(media_elements)
        time.sleep(1.2)
        ...
        ...

It first opens the profile URL, then creates the directory where the media files will be saved.
Then it calls the getting_all_post function to retrieve all the media URLs.
getting_all_post Function
It takes two arguments
- media_type : Either "img" or "video", in lower case. Pass None to get both media types.
- count : The number of posts to search for media to download.
A nested list named media is initialized. Index 0 holds the image URLs found in the posts, while index 1 holds the video URLs.

    # return the src URLs from the first `count` posts
    def getting_all_post(media_type, count):
        media = [[], []]  # media[0]: image URLs, media[1]: video URLs
        post_xpath = "//a[contains(@href,'/p/')]"
        post_media = browser.find_elements_by_xpath(post_xpath)
        search_upto = len(post_media)
        if count and count > search_upto:
            # fewer posts than requested: keep search_upto at the available number
            print("only", search_upto, "media files available")
            print("downloading---", search_upto, "files only")
        elif count:
            search_upto = count
        for post in post_media[0:search_upto]:
            browser.execute_script("arguments[0].click()", post)
            time.sleep(2.5)
            media[0] += get_src_from_post()
            try:
                media[1] += get_src_from_post(post_type="video")
            except NoSuchElementException:
                pass
            close_btn()
        # return only the requested media type
        if media_type == 'video':
            return media[1]
        elif media_type == 'img':
            return media[0]
        else:
            return media[0] + media[1]

It collects the src URLs of the videos and images with the get_src_from_post function so that we can download them later.
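The way the nested list is filled and then filtered can be tried without a browser. The sketch below uses made-up per-post results. collect_media and the sample file names are assumptions for illustration, standing in for what get_src_from_post would return.

```python
def collect_media(posts, media_type=None):
    """Merge per-post (image_urls, video_urls) pairs and filter by media_type."""
    media = [[], []]  # media[0]: image URLs, media[1]: video URLs
    for image_urls, video_urls in posts:
        media[0] += image_urls
        media[1] += video_urls
    if media_type == "video":
        return media[1]
    if media_type == "img":
        return media[0]
    return media[0] + media[1]


# Hypothetical scraped results for two posts
posts = [(["a.jpg"], []), (["b.jpg"], ["b.mp4"])]
print(collect_media(posts, "img"))    # ['a.jpg', 'b.jpg']
print(collect_media(posts, "video"))  # ['b.mp4']
print(collect_media(posts))           # ['a.jpg', 'b.jpg', 'b.mp4']
```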
get_src_from_post Function
It returns the source URLs found in the currently open post

    # return the src URLs of the currently open post
    def get_src_from_post(post_type="img"):
        links = []
        file_path = "//div[@role='button']//img[@style]"
        if post_type == "video":
            file_path = "//video[@type='video/mp4']"
        print(post_type)
        files = browser.find_elements_by_xpath(file_path)
        links += [file.get_attribute('src') for file in files]
        return links

Once we get all the urls, we will now download all the media files required
Continuing with the get_media function

    def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
        person_url = f"https://www.instagram.com/{username}/"
        browser.get(person_url)
        # creating user media directory
        src_path = os.path.join(BASE_DIR, f"{username}")
        os.makedirs(src_path, exist_ok=True)
        # getting media links from the function
        media_elements = getting_all_post(media_type, count)
        print(media_elements)
        time.sleep(1.2)
        # looping through the media links
        for element_url in media_elements:
            # parsing the correct file name
            base_url = urlparse(element_url).path
            filename = os.path.basename(base_url)
            print(base_url, "-------", filename)
            # creating the get request to download the media files
            with requests.get(element_url, stream=True) as r:
                if r.status_code not in range(200, 300):
                    print("downloading fails, trying next")
                    continue  # skip this URL instead of saving an error page
                filepath = os.path.join(src_path, f'{username}{filename}')
                if os.path.exists(filepath):
                    print("file already exists")
                    continue
                # open created file to write as binary
                with open(filepath, 'wb') as f:
                    # read in 8 KB chunks; the default chunk size is 1 byte
                    for chunk in r.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)
                print(element_url)
        print("-----------Done--------")

We now iterate over the list of all media URLs.
We use urlparse from the standard library's urllib.parse module to derive the file name for each media file.
Then we request the URL with the Requests library to download it. If the status code is in the 2xx range (the response is OK), we write the file to disk at the preferred location.
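The file-naming step can be checked on its own. The sketch below shows how urlparse drops the query string so that only the last path segment is used as the file name. The helper name filename_from_url and the example URL are made up for illustration.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url, prefix=""):
    """Build a local file name from the last path segment of `url`."""
    path = urlparse(url).path  # the path excludes the query string and fragment
    return f"{prefix}{os.path.basename(path)}"


url = "https://cdn.example.com/v/t51/12345_n.jpg?_nc_ht=abc&oh=def"
print(filename_from_url(url, prefix="therock"))  # therock12345_n.jpg
```

Prefixing the username, as get_media does, keeps files from different accounts distinguishable even if they end up in the same folder.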
Step 3 -> Managing the followers and followings
We can follow other suggested users or unfollow the ones that are not following us.
Following Management
It unfollows a given number of the accounts we are following
How does the code work?
- Opens the user profile/timeline page
- Opens the following section
- If more than 12 followings are to be removed, it scrolls the list to load more of them, as the first page contains only 12 followings
- It keeps unfollowing until the requested number of followings has been removed

    # removing a particular no of following
    def remove_following(remove=0, browser=browser):
        browser.get(user_url)
        time.sleep(1.2)
        open_following_section()
        following_removed = 0
        if remove > 12:
            time.sleep(1.3)
            scroll_to_bottom("((//nav)[3]//following::div)[1]")
        time.sleep(1)
        following_path = "//a[@title]"
        following_sel_list = browser.find_elements_by_xpath(following_path)
        while following_removed < remove:
            try:
                user_name = following_sel_list[following_removed].text
                following_x_path = f"//a[@title='{user_name}']//following::button"
                following_btn = browser.find_element_by_xpath(following_x_path)
                following_btn.click()
                # print("removed----", user_name)
                following_removed += 1
                time.sleep(1.7)
                unfollow_btn()
                # print("un follow", following_removed)
            except Exception:
                # stop at the first failure (missing element, end of list, ...)
                break
        else:
            # runs only when the loop finished without hitting `break`
            print("successfully removed", following_removed, "followings")
        close_btn()

It uses the open_following_section function to open the following section
open_following_section Function

    def open_following_section():
        follow_section_x_path = "//a[contains(@href,'following')]"
        follow_section = browser.find_element_by_xpath(follow_section_x_path)
        follow_section.click()

To scroll to the bottom, we use the scroll_to_bottom function, which runs JavaScript through execute_script

    # it scrolls a particular section of the site
    def scroll_to_bottom(scroll_xpath):
        scroll_time = 1.3
        scroll_element = browser.find_element_by_xpath(scroll_xpath)
        last_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
        while True:
            browser.execute_script("arguments[0].scrollTo(0, arguments[1]);", scroll_element, last_height)
            time.sleep(scroll_time)
            new_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
            if new_height == last_height:
                break
            last_height = new_height

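The stopping rule in scroll_to_bottom (keep scrolling until the height stops growing) can be exercised without a browser. The sketch below separates the loop from Selenium by passing in callables. scroll_until_stable, FakeList, and the max_rounds safety cap are assumptions added for this illustration and do not appear in the original code.

```python
def scroll_until_stable(get_height, scroll_to, pause=lambda: None, max_rounds=50):
    """Scroll repeatedly until the reported height stops growing.

    get_height() returns the current scroll height, scroll_to(h) scrolls to h,
    and pause() waits for new content to load. max_rounds caps the loop so an
    endlessly growing list cannot hang the script. Returns the final height.
    """
    last_height = get_height()
    for _ in range(max_rounds):
        scroll_to(last_height)
        pause()
        new_height = get_height()
        if new_height == last_height:
            break
        last_height = new_height
    return last_height


class FakeList:
    """Stand-in for a lazily loaded list that grows by 100px per scroll up to 300px."""

    def __init__(self):
        self.height = 100

    def scroll_to(self, y):
        if self.height < 300:
            self.height += 100


page = FakeList()
final_height = scroll_until_stable(lambda: page.height, page.scroll_to)
print(final_height)  # 300
```

With Selenium, get_height and scroll_to would wrap the two execute_script calls, and pause would be a short time.sleep.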
We also use the unfollow_btn function to click the unfollow button

    # finds the unfollow btn and click it.
    def unfollow_btn():
        un_path = "//button[contains(text(),'Unfollow')]"
        un_follow = browser.find_element_by_xpath(un_path)
        un_follow.click()

Once the task is done, we close that section using close_btn

    def close_btn():
        close_btn_path = "//*[name()='svg' and @aria-label='Close']"
        close = browser.find_element_by_xpath(close_btn_path)
        close.click()

Managing Followers
It helps us unfollow the users that are not following us back
How does this code work?
- Opens the user profile page
- Then opens the following section
- Collects all the users we follow so that we can compare them with the followers later
- Then opens the followers section
- Collects all the followers and compares them with the followings
- Unfollows the ones who are not following us
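The comparison step is a plain set difference: accounts in the following set that are missing from the followers set. A minimal sketch with made-up usernames follows. pick_unfollow_targets is a hypothetical helper, and the result is sorted because a set cannot be sliced and has no stable order.

```python
def pick_unfollow_targets(following, followers, count=None):
    """Return accounts we follow that do not follow back, at most `count` of them."""
    not_following_back = sorted(set(following) - set(followers))
    if count is None:
        return not_following_back
    return not_following_back[:count]


following = {"alice", "bob", "carol", "dave"}
followers = {"alice", "dave", "erin"}
print(pick_unfollow_targets(following, followers))           # ['bob', 'carol']
print(pick_unfollow_targets(following, followers, count=1))  # ['bob']
```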

    # it removes the following that are not your follower
    def remove_following_not_followers(count=None, browser=browser):
        browser.get(user_url)
        time.sleep(1.3)
        open_following_section()
        time.sleep(1.3)
        scroll_to_bottom("((//nav)[3]//following::div)[1]")
        time.sleep(1)
        following_path = "//a[@title]"
        following_sel_list = browser.find_elements_by_xpath(following_path)
        following_list = {x.text for x in following_sel_list}
        print("followings are ", len(following_list), "--------------", following_list)
        close_btn()
        time.sleep(1)
        open_follower_section()
        time.sleep(1)
        scroll_to_bottom("((//div[@role='dialog']//div)[1]//div)[5]")
        followers_path = "(((//div[@role='dialog']//div)[1]//div)[5]//ul)[1]//a[@title]"
        followers_sel_list = browser.find_elements_by_xpath(followers_path)
        followers_list = {x.text for x in followers_sel_list}
        close_btn()
        print("followers are ", len(followers_list), "-------------------", followers_list)
        # a set cannot be sliced, so convert the difference to a sorted list
        unwanted_follower = sorted(following_list - followers_list)
        no_of_unwanted_follower = len(unwanted_follower)
        print("length of unwanted follower", no_of_unwanted_follower)
        time.sleep(1)
        open_following_section()
        time.sleep(1)
        scroll_to_bottom("((//nav)[3]//following::div)[1]")
        if count and count > no_of_unwanted_follower:
            print("the count exceeded only", no_of_unwanted_follower, "unwanted followers are there")
        elif count:
            no_of_unwanted_follower = count
        for following in unwanted_follower[0:no_of_unwanted_follower]:
            print("removing---------", following)
            following_x_path = f"//a[@title='{following}']//following::button"
            following_btn = browser.find_element_by_xpath(following_x_path)
            following_btn.click()
            time.sleep(1.7)
            unfollow_btn()  # confirm in the dialog, as remove_following does
        close_btn()

We have already discussed the open_following_section, unfollow_btn, and close_btn functions. Let's look at the remaining function used in this program.
open_follower_section Function

    def open_follower_section():
        follow_section_x_path = "//a[contains(@href,'followers')]"
        follow_section = browser.find_element_by_xpath(follow_section_x_path)
        follow_section.click()

Let's merge all the code together

    import os
    import requests
    import time
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException
    from urllib.parse import urlparse
    from config1 import USERNAME, PASSWORD

    BASE_DIR = os.path.dirname(__file__)
    browser = webdriver.Chrome()
    user_url = f"https://www.instagram.com/{USERNAME}"


    # log in to the Instagram account
    def login(browser=browser, username=USERNAME, password=PASSWORD):
        browser.get("https://www.instagram.com/")
        time.sleep(2)
        username_field = browser.find_element_by_name("username")
        username_field.send_keys(username)
        password_field = browser.find_element_by_name("password")
        password_field.send_keys(password)
        btn = browser.find_element_by_css_selector("button[type='submit']")
        time.sleep(2)
        btn.click()


    # return the src URLs of the currently open post
    def get_src_from_post(post_type="img"):
        links = []
        file_path = "//div[@role='button']//img[@style]"
        if post_type == "video":
            file_path = "//video[@type='video/mp4']"
        print(post_type)
        files = browser.find_elements_by_xpath(file_path)
        links += [file.get_attribute('src') for file in files]
        return links


    # return the src URLs from the first `count` posts
    def getting_all_post(media_type, count):
        media = [[], []]  # media[0]: image URLs, media[1]: video URLs
        post_xpath = "//a[contains(@href,'/p/')]"
        post_media = browser.find_elements_by_xpath(post_xpath)
        search_upto = len(post_media)
        if count and count > search_upto:
            # fewer posts than requested: keep search_upto at the available number
            print("only", search_upto, "media files available")
            print("downloading---", search_upto, "files only")
        elif count:
            search_upto = count
        for post in post_media[0:search_upto]:
            browser.execute_script("arguments[0].click()", post)
            time.sleep(2.5)
            media[0] += get_src_from_post()
            try:
                media[1] += get_src_from_post(post_type="video")
            except NoSuchElementException:
                pass
            close_btn()
        # return only the requested media type
        if media_type == 'video':
            return media[1]
        elif media_type == 'img':
            return media[0]
        else:
            return media[0] + media[1]


    def get_media(username=USERNAME, media_type=None, count=3, browser=browser):
        person_url = f"https://www.instagram.com/{username}/"
        browser.get(person_url)
        # creating user media directory
        src_path = os.path.join(BASE_DIR, f"{username}")
        os.makedirs(src_path, exist_ok=True)
        # getting media links from the function
        media_elements = getting_all_post(media_type, count)
        print(media_elements)
        time.sleep(1.2)
        # looping through the media links
        for element_url in media_elements:
            # parsing the correct file name
            base_url = urlparse(element_url).path
            filename = os.path.basename(base_url)
            print(base_url, "-------", filename)
            # creating the get request to download the media files
            with requests.get(element_url, stream=True) as r:
                if r.status_code not in range(200, 300):
                    print("downloading fails, trying next")
                    continue  # skip this URL instead of saving an error page
                filepath = os.path.join(src_path, f'{username}{filename}')
                if os.path.exists(filepath):
                    print("file already exists")
                    continue
                # open created file to write as binary
                with open(filepath, 'wb') as f:
                    # read in 8 KB chunks; the default chunk size is 1 byte
                    for chunk in r.iter_content(chunk_size=8192):
                        if chunk:
                            f.write(chunk)
                print(element_url)
        print("-----------Done--------")


    # removing a particular no of following
    def remove_following(remove=0, browser=browser):
        browser.get(user_url)
        time.sleep(1.2)
        open_following_section()
        following_removed = 0
        if remove > 12:
            time.sleep(1.3)
            scroll_to_bottom("((//nav)[3]//following::div)[1]")
        time.sleep(1)
        following_path = "//a[@title]"
        following_sel_list = browser.find_elements_by_xpath(following_path)
        while following_removed < remove:
            try:
                user_name = following_sel_list[following_removed].text
                following_x_path = f"//a[@title='{user_name}']//following::button"
                following_btn = browser.find_element_by_xpath(following_x_path)
                following_btn.click()
                # print("removed----", user_name)
                following_removed += 1
                time.sleep(1.7)
                unfollow_btn()
                # print("un follow", following_removed)
            except Exception:
                # stop at the first failure (missing element, end of list, ...)
                break
        else:
            # runs only when the loop finished without hitting `break`
            print("successfully removed", following_removed, "followings")
        close_btn()


    # finds the unfollow btn and click it.
    def unfollow_btn():
        un_path = "//button[contains(text(),'Unfollow')]"
        un_follow = browser.find_element_by_xpath(un_path)
        un_follow.click()


    def open_following_section():
        follow_section_x_path = "//a[contains(@href,'following')]"
        follow_section = browser.find_element_by_xpath(follow_section_x_path)
        follow_section.click()


    def open_follower_section():
        follow_section_x_path = "//a[contains(@href,'followers')]"
        follow_section = browser.find_element_by_xpath(follow_section_x_path)
        follow_section.click()


    def close_btn():
        close_btn_path = "//*[name()='svg' and @aria-label='Close']"
        close = browser.find_element_by_xpath(close_btn_path)
        close.click()


    # it scrolls a particular section of the site
    def scroll_to_bottom(scroll_xpath):
        scroll_time = 1.3
        scroll_element = browser.find_element_by_xpath(scroll_xpath)
        last_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
        while True:
            browser.execute_script("arguments[0].scrollTo(0, arguments[1]);", scroll_element, last_height)
            time.sleep(scroll_time)
            new_height = browser.execute_script("return arguments[0].scrollHeight;", scroll_element)
            if new_height == last_height:
                break
            last_height = new_height


    # it removes the following that are not your follower
    def remove_following_not_followers(count=None, browser=browser):
        browser.get(user_url)
        time.sleep(1.3)
        open_following_section()
        time.sleep(1.3)
        scroll_to_bottom("((//nav)[3]//following::div)[1]")
        time.sleep(1)
        following_path = "//a[@title]"
        following_sel_list = browser.find_elements_by_xpath(following_path)
        following_list = {x.text for x in following_sel_list}
        print("followings are ", len(following_list), "--------------", following_list)
        close_btn()
        time.sleep(1)
        open_follower_section()
        time.sleep(1)
        scroll_to_bottom("((//div[@role='dialog']//div)[1]//div)[5]")
        followers_path = "(((//div[@role='dialog']//div)[1]//div)[5]//ul)[1]//a[@title]"
        followers_sel_list = browser.find_elements_by_xpath(followers_path)
        followers_list = {x.text for x in followers_sel_list}
        close_btn()
        print("followers are ", len(followers_list), "-------------------", followers_list)
        # a set cannot be sliced, so convert the difference to a sorted list
        unwanted_follower = sorted(following_list - followers_list)
        no_of_unwanted_follower = len(unwanted_follower)
        print("length of unwanted follower", no_of_unwanted_follower)
        time.sleep(1)
        open_following_section()
        time.sleep(1)
        scroll_to_bottom("((//nav)[3]//following::div)[1]")
        if count and count > no_of_unwanted_follower:
            print("the count exceeded only", no_of_unwanted_follower, "unwanted followers are there")
        elif count:
            no_of_unwanted_follower = count
        for following in unwanted_follower[0:no_of_unwanted_follower]:
            print("removing---------", following)
            following_x_path = f"//a[@title='{following}']//following::button"
            following_btn = browser.find_element_by_xpath(following_x_path)
            following_btn.click()
            time.sleep(1.7)
            unfollow_btn()  # confirm in the dialog, as remove_following does
        close_btn()


    if __name__ == '__main__':
        login(browser)

Some notes:
- The downloaded files are saved next to the script, in a folder named after the user. The download function checks whether a file already exists and skips it, which prevents duplicate downloads.
- Don't interact with the browser window that the script opens, as that will interfere with the running code.
- If you find that the download is not working, there might be two reasons:
  - Instagram might have changed the element position or element attribute
  - Your internet connection might be too slow
- Videos take longer to download than images, depending on your internet speed.
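The duplicate check mentioned in the notes can be tried in isolation. The sketch below writes byte chunks to a file only if it does not exist yet. save_chunks is a hypothetical helper, and the byte strings stand in for the chunks that a streamed download would yield.

```python
import os
import tempfile


def save_chunks(chunks, filepath):
    """Write an iterable of byte chunks to filepath unless the file already exists.

    Returns True if the file was written, False if it was skipped.
    """
    if os.path.exists(filepath):
        return False
    with open(filepath, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "therock12345_n.jpg")
    first = save_chunks([b"abc", b"", b"def"], target)  # file is written
    second = save_chunks([b"other"], target)            # skipped, file exists
    with open(target, "rb") as f:
        content = f.read()

print(first, second, content)  # True False b'abcdef'
```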
Step 4 -> How to set up and run the code?
How to set up:
- First of all, install all the dependencies by running `pip install -r requirements.txt` in the terminal.
- Then install the Chrome WebDriver on your system. You can follow this link Youtube
- Now set the username and password in the config file (the script imports it as config1), or pass them later when logging in.
- Our project provides three features:
  - Remove any following
  - Remove followings that are not your followers
  - Download media
- First you need to log in to your account. Run the program in an interactive shell. If you have already filled in the config file, you can log in with the simple command `login()`. If you have not entered the credentials there, you can log in with:
login(username="ENTER YOUR USERNAME", password="ENTER YOUR SECRET KEY")
- (A) Remove following
Command:
remove_following(remove=<number of accounts you want to unfollow>)
Sample: remove_following(remove=6)
- (B) Remove followings that are not your followers
Basic syntax:
remove_following_not_followers(count=<number of accounts to unfollow>)
Sample code: remove_following_not_followers(count=5)
- (C) Download media: It can download any media, whether IGTV, video, or image.
Basic syntax: get_media(username=USERNAME, media_type="img" or "video", count=3)
Sample:
get_media(username="therock", media_type="img", count=3) or get_media(count=3)
Deployment
You can easily deploy it on Heroku
You can read more about it on the Medium Blog
Web Preview / Output
Placeholder text by Praveen Chaudhary · Images by Binary Beast