Fetching and Maintaining Fresh Job Listings
Efficiently fetch and maintain fresh job listings using the jobdata API by retrieving recent jobs and checking for expired listings daily, ensuring a clean and relevant local job database.
In this tutorial, we will explore a strategy for fetching the latest job listings from the jobdata API while keeping our data fresh and up-to-date. We will use the /api/jobs/ endpoint to retrieve recent job listings and the /api/jobsexpired/ endpoint to check for expired jobs. This approach will help you maintain a clean and relevant job database.
Prerequisites
Before we begin, ensure you have the following:
- API Key: You need a valid API key from jobdata API with an active API access (pro) subscription. If you don't have one, you can generate it from your dashboard after subscribing to a plan. A quick way to verify the key follows this list.
- Python Environment: Make sure you have Python installed along with the requests library. You can install it with pip if you haven't already:
pip install requests
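Before moving on, you can verify the key with a single small authenticated request. This is a minimal sketch using the endpoint and Api-Key header scheme from this tutorial:

import requests

API_KEY = "YOUR_API_KEY"  # Replace with your actual API key

# One tiny authenticated request: page_size=1 keeps the response small.
response = requests.get(
    "https://jobdataapi.com/api/jobs/",
    headers={"Authorization": f"Api-Key {API_KEY}"},
    params={"page_size": 1},
)
print(response.status_code)  # 200 means the key and subscription are working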
Step 1: Fetch Recent Job Listings
We will start by fetching job listings that are not expired and have been published within the last 30 days, using the exclude_expired=true and max_age=30 parameters in our request to the /api/jobs/ endpoint.
Example Code to Fetch Jobs with Pagination
import requests
from datetime import datetime, timedelta

# Constants
API_KEY = "YOUR_API_KEY"  # Replace with your actual API key
JOBS_URL = "https://jobdataapi.com/api/jobs/"
EXPIRED_URL = "https://jobdataapi.com/api/jobsexpired/"
FETCH_DATE = datetime.now()  # Current date for fetching jobs

# Function to fetch recent job listings with pagination
def fetch_recent_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    params = {
        "exclude_expired": "true",
        "max_age": 30,
        "page_size": 1000,  # Fetch 1000 jobs per request
        "page": 1,          # Start from the first page
    }
    all_jobs = []  # List to store all fetched jobs

    while True:
        response = requests.get(JOBS_URL, headers=headers, params=params)
        if response.status_code == 200:
            job_data = response.json()
            all_jobs.extend(job_data['results'])  # Add fetched jobs to the list
            # ...or directly import jobs into your local DB here...
            print(f"Fetched {len(job_data['results'])} jobs from page {params['page']}.")
            # Check if there is a next page
            if job_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break  # Exit loop if no more pages
        else:
            print("Error fetching jobs:", response.status_code, response.text)
            break

    return all_jobs

# Fetch jobs
recent_jobs = fetch_recent_jobs()
print(f"Total fetched job listings: {len(recent_jobs)}")
# Fetch jobs
recent_jobs = fetch_recent_jobs()
print(f"Total fetched job listings: {len(recent_jobs)}")
Explanation
- We set the exclude_expired parameter to true to filter out expired jobs.
- The max_age parameter is set to 30 to fetch only jobs published in the last 30 days.
- We specify page_size as 1000 to minimize the number of requests.
- The while loop handles pagination, fetching all available job listings sequentially.
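If you would rather persist jobs as they arrive (the "...or directly import jobs into your local DB here..." comment in the loop), an upsert keyed on the job ID keeps re-runs idempotent. Here is a minimal sketch using sqlite3; the jobs table and the title field are illustrative assumptions, so adapt them to the actual response schema and your own database:

import sqlite3

def save_jobs(jobs, db_path="jobs.db"):
    # Hypothetical schema: one row per job, keyed on the API's job ID.
    con = sqlite3.connect(db_path)
    con.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id INTEGER PRIMARY KEY,
            title TEXT,
            status TEXT DEFAULT 'active'
        )
    """)
    # INSERT OR REPLACE makes repeated fetches idempotent: a refetched job
    # overwrites its previous row instead of creating a duplicate.
    con.executemany(
        "INSERT OR REPLACE INTO jobs (id, title, status) VALUES (?, ?, 'active')",
        [(job["id"], job.get("title", "")) for job in jobs],
    )
    con.commit()
    con.close()

save_jobs(recent_jobs)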
Step 2: Check for Expired Jobs
Next, we will check for expired jobs daily. We will use the /api/jobsexpired/ endpoint to retrieve jobs that have expired since the last fetch date minus one day. This one-day buffer ensures we don't miss any jobs that expired just before our check.
Example Code to Check Expired Jobs
# Function to check for expired jobs
def check_expired_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    expired_since = (FETCH_DATE - timedelta(days=1)).strftime('%Y-%m-%d')  # One day buffer
    params = {
        "expired_since": expired_since,
        "page_size": 1000,  # Fetch 1000 expired jobs per request
        "page": 1,          # Start from the first page
    }
    expired_job_items = []  # List to store all expired job items

    while True:
        response = requests.get(EXPIRED_URL, headers=headers, params=params)
        if response.status_code == 200:
            expired_data = response.json()
            print(f"Fetched {len(expired_data['results'])} expired job items from page {params['page']}.")
            expired_job_items.extend(expired_data['results'])  # Add expired jobs to the list
            # ...or directly update jobs by their ID in your local DB here...
            # Check if there is a next page
            if expired_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break  # Exit loop if no more pages
        else:
            print("Error fetching expired jobs:", response.status_code, response.text)
            break

    return expired_job_items

# Check for expired jobs
expired_jobs = check_expired_jobs()
# Check for expired jobs
expired_jobs = check_expired_jobs()
Explanation
- We calculate the expired_since date by subtracting one day from the current fetch date.
- We use a while loop to paginate through the results, fetching up to 1000 expired jobs at a time.
- The next link in the response tells us whether there are more pages to fetch.
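Note that FETCH_DATE above is simply datetime.now(), which is fine when the script reliably runs once a day. For a sturdier daily job, you can persist the timestamp of the last successful run and derive expired_since from that instead. A minimal sketch, where the last_fetch.json state file is an assumption for illustration:

import json
from datetime import datetime, timedelta
from pathlib import Path

STATE_FILE = Path("last_fetch.json")  # Hypothetical location for run state

def load_last_fetch_date(default_days_back=1):
    # Fall back to "yesterday" if the script has never run before.
    if STATE_FILE.exists():
        data = json.loads(STATE_FILE.read_text())
        return datetime.fromisoformat(data["last_fetch"])
    return datetime.now() - timedelta(days=default_days_back)

def save_last_fetch_date(when):
    STATE_FILE.write_text(json.dumps({"last_fetch": when.isoformat()}))

# Usage: derive the buffer from the real last run, not from "now".
last_fetch = load_last_fetch_date()
expired_since = (last_fetch - timedelta(days=1)).strftime('%Y-%m-%d')
save_last_fetch_date(datetime.now())  # Record this run for next time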
Step 3: Update Your Database
After (or while) fetching the expired jobs, you should update your local database to remove or mark these jobs as expired. You can match the job IDs from the expired jobs response against your local database entries.
Example Code to Update Database
# Example function to update the database (pseudo-code)
def update_database(expired_jobs):
    for job in expired_jobs:
        # Pseudo-code for database update
        print(f"Updating database for expired job ID: {job['id']}")
        # db.update_job_status(job['id'], status='expired')

# Call the update function with the expired job IDs
update_database(expired_jobs)
Explanation
- The update_database function is a placeholder for your actual database update logic; replace the print statement with your database update code.
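As a concrete illustration, here is what that placeholder might look like against the sqlite3 schema sketched after Step 1; the jobs table and its status column are assumptions of this tutorial's sketches, not part of the jobdata API:

import sqlite3

def update_database(expired_jobs, db_path="jobs.db"):
    con = sqlite3.connect(db_path)
    # Mark jobs as expired rather than deleting them, preserving history.
    con.executemany(
        "UPDATE jobs SET status = 'expired' WHERE id = ?",
        [(job["id"],) for job in expired_jobs],  # field name follows the pseudo-code above
    )
    con.commit()
    con.close()

update_database(expired_jobs)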
Conclusion
By following this tutorial, you have learned how to fetch recent job listings from the jobdata API while ensuring that you keep your data fresh by checking for expired jobs daily. This approach allows you to maintain a clean and relevant job database, enhancing the user experience for job seekers.
Important Notes
- Ensure that you do not make parallel requests to the API; instead, make requests sequentially to avoid hitting rate limits.
- Consider implementing error handling and logging for production-level applications to track issues and performance; a retry sketch follows this list.
- Regularly review and optimize your fetching and updating logic to ensure efficiency and accuracy.
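For the error-handling note, a retry wrapper with exponential backoff is a common starting point. This is a minimal sketch, assuming rate limits surface as HTTP 429 (check the jobdata API docs for the exact behavior); you could call it in place of the bare requests.get in the functions above:

import time
import requests

def get_with_retries(url, headers, params, max_retries=3):
    # A simple backoff wrapper; tune retries and delays for your workload.
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.get(url, headers=headers, params=params, timeout=30)
            if response.status_code == 429:  # Rate limited: back off and retry
                time.sleep(2 ** attempt)
                continue
            return response
        except requests.RequestException as exc:
            print(f"Attempt {attempt} failed: {exc}")
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")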
Feel free to modify the code snippets to fit your specific use case and database structure!