Topics -> discord-bot, python, requests-html, bot
Preview Link ->
Source Code Link -> GitHub
What are we going to do?
- Extract opportunities from Internshala and Freelancer.
- Initialize the Discord client.
- Create commands, cache the results for later use, and respond to users in real time.
Some Important Concepts
We will be using requests-html for scraping.
But what is requests-html?
This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible.
- Full JavaScript support!
- CSS Selectors (a.k.a jQuery-style, thanks to PyQuery).
- XPath Selectors, for the faint of heart.
- Mocked user-agent (like a real web browser).
- Automatic following of redirects.
- Connection–pooling and cookie persistence.
- The Requests experience you know and love, with magical parsing abilities.
- Async Support
Requests?
Requests is a Python HTTP library, released under the Apache License 2.0. The goal of the project is to make HTTP requests simpler and more human-friendly.
Discord Python
A modern, easy to use, feature-rich, and async ready API wrapper for Discord written in Python.
Installing the required libraries:
pip install requests
pip install requests_html
pip install discord
Step 1 => Extracting the Opportunities from Internshala and Freelancer
We will use requests-html to extract the opportunities from Freelancer and Internshala, and CSS selectors to locate the elements.
The URL depends on the keyword the user supplies, so we build it with our own helper functions.
For the Internshala URL
# Starts the scraper. If a keyword is given, the URL is built from it.
def start_scraper(keyword=None):
    if keyword:
        url = f"https://internshala.com/internships/keywords-{keyword}"
    else:
        url = "https://internshala.com/internships"
    return get_internship(url)
For the Freelancer URL
import random

# Starter function for the Freelancer scraper
def get_freelance(keyword=None):
    random_keywords = ['python', 'java', 'web', 'javascript', 'graphics']
    if keyword:
        url = f"https://www.freelancer.com/jobs/?keyword={keyword}"
    else:
        # No keyword supplied: search for a randomly chosen one
        random_keyword = random.choice(random_keywords)
        url = f"https://www.freelancer.com/jobs/?keyword={random_keyword}"
    res_html = pharse_and_extract(url)
    freelance_works = extract_from_freelancer(res_html)
    return freelance_works
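Keywords typed by users can contain spaces or characters such as & that would break the query string. The following is a minimal sketch of one way to handle that, assuming the Freelancer search accepts a standard URL-encoded keyword parameter. The helper name build_freelancer_url is hypothetical and is not part of the original code.

```python
from urllib.parse import urlencode

FREELANCER_JOBS_URL = "https://www.freelancer.com/jobs/"


def build_freelancer_url(keyword):
    """Return the jobs search URL with the keyword URL-encoded.

    Hypothetical helper: the original code interpolates the keyword directly.
    """
    query = urlencode({"keyword": keyword})
    return f"{FREELANCER_JOBS_URL}?{query}"
```

For example, build_freelancer_url("machine learning") encodes the space as a plus sign instead of leaving it raw in the URL.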
Fetching the data from the URL using Requests
import requests

def url_to_text(url):
    r = requests.get(url)
    if r.status_code == 200:
        html_text = r.text
        return html_text
    # Any other status code: signal failure to the caller
    return None
r.status_code holds the HTTP status code of the response. If it is 200 (OK), the function returns the page text; otherwise it returns None, which the parsing step checks for.
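A status check alone does not cover network failures or a server that never answers. The sketch below shows one possible variant that adds a timeout and treats connection errors like a failed fetch. The name fetch_text and the injected get parameter are assumptions made for illustration; in the bot you would pass requests.get, whose exceptions derive from OSError.

```python
def fetch_text(url, get, timeout=10):
    """Return the body of a 200 response, or None on any failure.

    `get` is any callable shaped like requests.get(url, timeout=...).
    Hypothetical helper, not part of the original code.
    """
    try:
        response = get(url, timeout=timeout)
    except OSError:
        # Connection errors and timeouts raised by requests are OSError subclasses
        return None
    if response.status_code == 200:
        return response.text
    return None
```

Usage in the bot would look like fetch_text(url, requests.get). Passing the getter in also makes the function easy to exercise without network access.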
Parsing the HTML using the HTML class from requests-html
from requests_html import HTML

# Parses the HTML text into a structured, searchable object
def pharse_and_extract(url, name=2020):
    html_text = url_to_text(url)
    # The fetch failed: return an empty string instead of a parsed page
    if html_text is None:
        return ""
    r_html = HTML(html=html_text)
    return r_html
Getting internship from Internshala
It finds all the posts using their CSS class, then loops through them and extracts the required details such as stipend, duration, organisation name and so on.
# Loops through all the internships and extracts the useful data
def get_internship(url):
    internships = []
    res_data = pharse_and_extract(url)
    # pharse_and_extract returns "" when the page could not be fetched
    if res_data == "":
        return internships
    opportunities = res_data.find(".individual_internship")
    for opportunity in opportunities:
        title = opportunity.find(".company a", first=True).text
        internship_link = opportunity.find(".profile a", first=True).attrs['href']
        organisation = opportunity.find(".company .company_name", first=True).text
        organisation_internships = opportunity.find(".company_name a", first=True).attrs['href']
        location = opportunity.find(".location_link", first=True).text
        start_data = opportunity.find("#start-date-first", first=True).text.split("\xa0immediately")[-1]
        ctc = opportunity.find(".stipend", first=True).text
        apply_lastes_by = opportunity.xpath(".//span[contains(text(),'Apply By')]/../../div[@class='item_body']", first=True).text
        duration = opportunity.xpath(".//span[contains(text(),'Duration')]/../../div[@class='item_body']", first=True).text
        internships.append({
            'title': title,
            'organisation': organisation,
            'location': location,
            'start_data': start_data,
            'ctc': ctc,
            'apply_lastes_by': apply_lastes_by,
            'duration': duration,
            # The scraped links are relative, so prefix the site root
            'organisation_internships': f"https://internshala.com{organisation_internships}",
            'internship_link': f"https://internshala.com{internship_link}"
        })
    return internships
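In requests-html, find(..., first=True) returns None when nothing matches, so a single missing field on one post would raise an AttributeError on .text and abort the whole loop. A minimal sketch of a guard for that case follows. The helper name safe_text is hypothetical, and the default string mirrors the "Not mentioned" fallback used in the Freelancer scraper.

```python
def safe_text(element, selector, default="Not mentioned"):
    """Return the text of the first match for `selector`, or `default`.

    Works with any object exposing find(selector, first=True), such as a
    requests-html element. Hypothetical helper, not in the original code.
    """
    match = element.find(selector, first=True)
    if match is None:
        return default
    return match.text
```

With such a helper, a line like ctc = safe_text(opportunity, ".stipend") would fall back to the default instead of failing when a post has no stipend element.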
Getting jobs from Freelancer
As above, it first finds all the posts using their common CSS class (.JobSearchCard-item) and then loops through them.
# Extracts the freelancing opportunities
def extract_from_freelancer(res_html):
    freelance_works = []
    # pharse_and_extract returns "" when the page could not be fetched
    if res_html == "":
        return freelance_works
    opportunities = res_html.find(".JobSearchCard-item")
    for opportunity in opportunities:
        title = opportunity.find(".JobSearchCard-primary-heading a", first=True).text
        freelance_link = opportunity.find(".JobSearchCard-primary-heading a", first=True).attrs['href']
        # The price element is optional, so fall back to a default text
        avg = opportunity.find(".JobSearchCard-primary-price")
        if avg:
            avg_proposal = avg[0].text
        else:
            avg_proposal = "Not mentioned"
        apply_lastes_by = opportunity.find(".JobSearchCard-primary-heading-days", first=True).text
        desc = opportunity.find(".JobSearchCard-primary-description", first=True).text
        freelance_works.append({
            'title': title,
            'description': desc,
            'apply_lastes_by': apply_lastes_by,
            'avg_proposal': avg_proposal,
            'freelance_link': f"https://www.freelancer.com/{freelance_link}"
        })
    return freelance_works
Step 2 => Initializing the Discord client
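This sets up the client so that we can use it later when needed. The snippet assumes a client object has already been created with the Discord library.
Please make sure to put in your channel ID.
@client.event
async def on_ready():
    # Replace <> with your channel ID
    channel = client.get_channel(<>)
    print("We have logged in as", client.user)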
Step 3 => Making commands, caching it for further use and providing response in real time to users
What is Replit Database?
Replit Database is a simple, user-friendly key-value store inside of every repl. No configuration is required; you can get started right away!
What are we going to do in this step?
- Create commands, so that we know what the user wants from the input supplied
- Check whether the data is already present in the database
- If present, provide the response using our custom formatter
- If not, scrape, then provide the response using the formatter
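The check-then-scrape steps above can be sketched as a single helper. This is only an illustration under stated assumptions: a plain dict stands in for the Replit database, and get_cached_or_scrape and its parameters are hypothetical names that do not appear in the original code.

```python
def get_cached_or_scrape(db, category, keyword, scraper):
    """Return results for (category, keyword), scraping only on a cache miss.

    `db` is any dict-like store (a plain dict here), and `scraper` is a
    callable such as get_freelance that takes the keyword.
    """
    if category not in db.keys():
        db[category] = {}
    bucket = db[category]
    if keyword not in bucket.keys():
        # Cache miss: scrape once and remember the result
        bucket[keyword] = scraper(keyword)
    return bucket[keyword]
```

A call such as get_cached_or_scrape(db, 'freelance', 'python', get_freelance) would then cover both the "present" and "not present" branches in one place.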
1. Initializing the commands
@client.event
async def on_message(message):
    # Ignore the bot's own messages
    if message.author == client.user:
        return

    if message.content.startswith('$hello'):
        await message.channel.send(f"Hello {message.author}")
    # Check the specific reset commands first: '$reset internship' also starts with '$reset'
    elif message.content.startswith('$reset internship'):
        del db['internship']
        await message.channel.send("cleared internship")
    elif message.content.startswith('$reset freelance'):
        del db['freelance']
        await message.channel.send("cleared freelance")
    elif message.content.startswith('$reset'):
        db.clear()
        await message.channel.send("cleared all")
    elif message.content.startswith('$help'):
        # Only show the usage text; the cache is left untouched
        await message.channel.send(
            "------------------\nwrite $internship with space separated field or keyword \n\n"
            "Example \n$internship python \n\n"
            "For Freelance \n----------------------\nwrite $freelance with space separated\n----------------------- \n\n"
            "Example \n$freelance python \n\nOr \n\n $freelance \nfor random freelance work")

    if message.content.startswith('$internship'):
        ....
    if message.content.startswith('$freelance'):
        ....
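Prefix checks with startswith can overlap: '$reset internship' also starts with '$reset'. One way to avoid depending on the order of the checks is to split the message into a command name and its arguments first. The sketch below is an assumption-labeled illustration; parse_command is a hypothetical helper and not part of the original bot.

```python
def parse_command(content, prefix="$"):
    """Split a message like '$freelance python' into (command, args).

    Returns (None, []) when the message is not a command.
    """
    if not content.startswith(prefix):
        return None, []
    parts = content[len(prefix):].split()
    if not parts:
        return None, []
    return parts[0].lower(), parts[1:]
```

The handler could then compare the command name exactly, for example command == "reset" with args == ["internship"], instead of testing prefixes.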
2. Checking Repl Database
....
    if message.content.startswith('$freelance'):
        key_list = message.content.split(" ")
        if len(key_list) > 1:
            keyword = key_list[1]
            if 'freelance' in db.keys():
                if keyword in db['freelance'].keys():
                    # Cached: pick one of the stored results
                    free_result = random.choice(db['freelance'][keyword])
                else:
                    # Not cached for this keyword: scrape and store
                    freelance_works = get_freelance(keyword=keyword)
                    db['freelance'][keyword] = freelance_works
                    free_result = random.choice(freelance_works)
....
If the data is found in the database
....
            result_message = format_message(free_result)
            await message.channel.send(result_message)
....
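The format_message function is called here but its body is not shown in this article. Purely as an illustration, a formatter could turn each scraped dictionary into one "Label: value" line per field. The sketch below is hypothetical and may differ from the author's actual implementation.

```python
def format_message(opportunity):
    """Render a scraped opportunity dict as one 'Label: value' line per field.

    Hypothetical stand-in for the formatter used by the bot.
    """
    lines = []
    for key, value in opportunity.items():
        # 'avg_proposal' becomes 'Avg Proposal'
        label = key.replace("_", " ").title()
        lines.append(f"{label}: {value}")
    return "\n".join(lines)
```

Because it only relies on the dictionary keys, the same function would work for both the internship and the freelance results.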
If not, scrape and then respond
...
            else:
                # No freelance cache yet: create it, scrape and store
                db['freelance'] = {}
                freelance_works = get_freelance(keyword=keyword)
                db['freelance'][keyword] = freelance_works
                free_result = random.choice(freelance_works)
            result_message = format_message(free_result)
            await message.channel.send(result_message)
...
Whole Code at Once
import random
from replit import db

@client.event
async def on_message(message):
    # Ignore the bot's own messages
    if message.author == client.user:
        return

    if message.content.startswith('$hello'):
        await message.channel.send(f"Hello {message.author}")
    # Check the specific reset commands first: '$reset internship' also starts with '$reset'
    elif message.content.startswith('$reset internship'):
        del db['internship']
        await message.channel.send("cleared internship")
    elif message.content.startswith('$reset freelance'):
        del db['freelance']
        await message.channel.send("cleared freelance")
    elif message.content.startswith('$reset'):
        db.clear()
        await message.channel.send("cleared all")
    elif message.content.startswith('$help'):
        # Only show the usage text; the cache is left untouched
        await message.channel.send(
            "------------------\nwrite $internship with space separated field or keyword \n\n"
            "Example \n$internship python \n\n"
            "For Freelance \n----------------------\nwrite $freelance with space separated\n----------------------- \n\n"
            "Example \n$freelance python \n\nOr \n\n $freelance \nfor random freelance work")

    if message.content.startswith('$internship'):
        keyword = message.content.split(" ")[-1]
        print(keyword)
        if 'internship' in db.keys():
            if keyword in db['internship'].keys():
                # Cached results live under db['internship'], not at the top level
                result = random.choice(db['internship'][keyword])
            else:
                opportunities = start_scraper(keyword=keyword)
                db['internship'][keyword] = opportunities
                result = random.choice(opportunities)
        else:
            db['internship'] = {}
            opportunities = start_scraper(keyword=keyword)
            db['internship'][keyword] = opportunities
            result = random.choice(opportunities)
        result_message = format_message(result)
        await message.channel.send(result_message)

    if message.content.startswith('$freelance'):
        key_list = message.content.split(" ")
        if len(key_list) > 1:
            # A keyword was supplied
            keyword = key_list[1]
            if 'freelance' in db.keys():
                if keyword in db['freelance'].keys():
                    free_result = random.choice(db['freelance'][keyword])
                else:
                    freelance_works = get_freelance(keyword=keyword)
                    db['freelance'][keyword] = freelance_works
                    free_result = random.choice(freelance_works)
            else:
                db['freelance'] = {}
                freelance_works = get_freelance(keyword=keyword)
                db['freelance'][keyword] = freelance_works
                free_result = random.choice(freelance_works)
            result_message = format_message(free_result)
            await message.channel.send(result_message)
        else:
            # No keyword: serve random freelance work, cached under 'random'
            if 'freelance' in db.keys():
                if 'random' in db['freelance'].keys():
                    free_result = random.choice(db['freelance']['random'])
                else:
                    data = get_freelance()
                    db['freelance']['random'] = data
                    free_result = random.choice(data)
            else:
                db['freelance'] = {}
                data = get_freelance()
                db['freelance']['random'] = data
                free_result = random.choice(data)
            result_message = format_message(free_result)
            await message.channel.send(result_message)
Deployment
Because the bot uses the Replit Database, it can only be deployed on Replit as written.
Web Preview / Output
Placeholder text by Praveen Chaudhary · Images by Binary Beast