
Fetching and Maintaining Fresh Job Listings

Efficiently fetch and maintain fresh job listings using the jobdata API by retrieving recent jobs and checking for expired listings daily, ensuring a clean and relevant local job database.

4 min read · Oct. 12, 2024

In this tutorial, we will walk through a strategy for fetching the latest job listings from the jobdata API and keeping a local copy of that data fresh. We will use the /api/jobs/ endpoint to retrieve recent job listings and the /api/jobsexpired/ endpoint to find jobs that have since expired. Together, these two calls let you maintain a clean and relevant job database.

Prerequisites

Before we begin, ensure you have the following:

  1. API Key: You need a valid jobdata API key with an active access pro subscription. If you don't have a key yet, you can generate one from your dashboard after subscribing to the plan.
  2. Python Environment: Make sure you have Python installed along with the requests library. You can install it using pip if you haven't already:
pip install requests
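
The examples in this tutorial place the key in an API_KEY constant. As an optional alternative, the key can be read from an environment variable so that it stays out of source control. The sketch below is only one way to do this; the variable name JOBDATA_API_KEY and the helper names load_api_key and auth_headers are hypothetical and not part of the jobdata API.

```python
import os

def load_api_key(env_var="JOBDATA_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    The variable name JOBDATA_API_KEY is a hypothetical choice for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key

def auth_headers(api_key):
    # Same "Api-Key <key>" header format used by the examples in this tutorial.
    return {"Authorization": f"Api-Key {api_key}"}
```

With this in place, API_KEY = load_api_key() could replace the hard-coded constant in the scripts that follow.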

Step 1: Fetch Recent Job Listings

We will start by fetching job listings that are not expired and have been published within the last 30 days. We will use the exclude_expired=true and max_age=30 parameters in our request to the /api/jobs/ endpoint.

Example Code to Fetch Jobs with Pagination

import requests
from datetime import datetime, timedelta

# Constants
API_KEY = "YOUR_API_KEY"  # Replace with your actual API key
JOBS_URL = "https://jobdataapi.com/api/jobs/"
EXPIRED_URL = "https://jobdataapi.com/api/jobsexpired/"
FETCH_DATE = datetime.now()  # Current date for fetching jobs

# Function to fetch recent job listings with pagination
def fetch_recent_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    params = {
        "exclude_expired": "true",
        "max_age": 30,
        "page_size": 1000,  # Fetch 1000 jobs per request
        "page": 1  # Start from the first page
    }

    all_jobs = []  # List to store all fetched jobs
    while True:
        # A timeout keeps the loop from hanging forever on a stalled connection
        response = requests.get(JOBS_URL, headers=headers, params=params, timeout=30)
        if response.status_code == 200:
            job_data = response.json()
            all_jobs.extend(job_data['results'])  # Add fetched jobs to the list
            # ...or directly import jobs into your local DB here...
            print(f"Fetched {len(job_data['results'])} jobs from page {params['page']}.")
            # Check if there is a next page
            if job_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break  # Exit loop if no more pages
        else:
            print("Error fetching jobs:", response.status_code, response.text)
            break

    return all_jobs

# Fetch jobs
recent_jobs = fetch_recent_jobs()
print(f"Total fetched job listings: {len(recent_jobs)}")

Explanation

  • We set the exclude_expired parameter to true to filter out expired jobs.
  • The max_age parameter is set to 30 to only fetch jobs published in the last 30 days.
  • We specify page_size as 1000 to minimize the number of requests.
  • The while loop handles pagination, fetching all available job listings sequentially.
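
The pagination pattern described in these points can also be written as a generator, which lets you process each job as it arrives instead of collecting everything in one list first. This is a minimal sketch under stated assumptions: fetch_page is a hypothetical callable that returns the decoded JSON body for a given page number (a dict with 'results' and 'next', as in the response used above), and FAKE_PAGES is made-up data standing in for the API so the example runs without network access.

```python
def iter_results(fetch_page, start_page=1):
    """Yield items page by page until the response has no 'next' link.

    fetch_page(page) is a hypothetical callable that returns the decoded
    JSON body for that page (a dict with 'results' and 'next').
    """
    page = start_page
    while True:
        data = fetch_page(page)
        yield from data["results"]
        if not data.get("next"):
            break  # No further pages
        page += 1

# Stand-in for the API: three pages of fake data, no network access needed.
FAKE_PAGES = {
    1: {"results": [{"id": 1}, {"id": 2}], "next": "page=2"},
    2: {"results": [{"id": 3}], "next": "page=3"},
    3: {"results": [{"id": 4}], "next": None},
}

job_ids = [job["id"] for job in iter_results(FAKE_PAGES.__getitem__)]
print(job_ids)  # [1, 2, 3, 4]
```

In real use, fetch_page would wrap the requests.get call from the example above and return response.json() for the requested page.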

Step 2: Check for Expired Jobs

Next, we will check for expired jobs daily. We will use the /api/jobsexpired/ endpoint to retrieve jobs that have expired since the fetch date minus one day. The one-day buffer ensures we don't miss jobs that expired shortly before our check.

Example Code to Check Expired Jobs

# Function to check for expired jobs
def check_expired_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    expired_since = (FETCH_DATE - timedelta(days=1)).strftime('%Y-%m-%d')  # One day buffer
    params = {
        "expired_since": expired_since,
        "page_size": 1000,  # Fetch 1000 expired jobs per request
        "page": 1  # Start from the first page
    }

    expired_job_items = []  # List to store all expired job items
    while True:
        # A timeout keeps the loop from hanging forever on a stalled connection
        response = requests.get(EXPIRED_URL, headers=headers, params=params, timeout=30)
        if response.status_code == 200:
            expired_data = response.json()
            print(f"Fetched {len(expired_data['results'])} expired job items from page {params['page']}.")
            expired_job_items.extend(expired_data['results'])  # Add expired jobs to the list
            # ...or directly update jobs by their ID in your local DB here...
            # Check if there is a next page
            if expired_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break
        else:
            print("Error fetching expired jobs:", response.status_code, response.text)
            break

    return expired_job_items

# Check for expired jobs
expired_jobs = check_expired_jobs()

Explanation

  • We calculate the expired_since date by subtracting one day from the current fetch date.
  • We use a while loop to paginate through the results, fetching up to 1000 expired jobs at a time.
  • The next link in the response helps us determine if there are more pages to fetch.
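
The date arithmetic behind expired_since can be isolated in a small helper. This is useful if the daily job stores the date of its last successful run rather than relying on the current time. The function name expired_since_param and its buffer_days parameter are hypothetical; the YYYY-MM-DD format matches the strftime call in the example above.

```python
from datetime import date, datetime, timedelta

def expired_since_param(last_fetch, buffer_days=1):
    """Format the expired_since value: the last fetch date minus a buffer."""
    # datetime is a subclass of date, so reduce it to a plain date first
    if isinstance(last_fetch, datetime):
        last_fetch = last_fetch.date()
    return (last_fetch - timedelta(days=buffer_days)).strftime("%Y-%m-%d")

print(expired_since_param(date(2024, 10, 12)))          # 2024-10-11
print(expired_since_param(datetime(2024, 3, 1, 8, 30)))  # 2024-02-29
```

A larger buffer_days value returns more overlap between runs, which is harmless as long as the database update is idempotent.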

Step 3: Update Your Database

After (or while) fetching the expired jobs, you should update your local database to remove or mark these jobs as expired. You can match the job IDs from the expired jobs response against your local database entries.

Example Code to Update Database

# Example function to update the database (pseudo-code)
def update_database(expired_jobs):
    for job in expired_jobs:
        # Pseudo-code for database update
        print(f"Updating database for expired job ID: {job['id']}")
        # db.update_job_status(job['id'], status='expired')

# Call the update function with the expired job items
update_database(expired_jobs)

Explanation

  • The update_database function is a placeholder for your actual database update logic. You would replace the print statement with your database update code.
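
As one possible way to fill in that placeholder, the sketch below marks jobs as expired in a SQLite database using Python's built-in sqlite3 module. The jobs table, its columns, and the mark_expired function are assumptions made for illustration; adapt them to your own schema. It relies only on each expired item carrying an 'id' key, as in the example above.

```python
import sqlite3

def mark_expired(conn, expired_items):
    """Set status='expired' for every local row whose id appears in expired_items."""
    ids = [(item["id"],) for item in expired_items]
    with conn:  # Commits on success, rolls back if an error is raised
        conn.executemany("UPDATE jobs SET status = 'expired' WHERE id = ?", ids)

# Hypothetical local schema, created in memory for demonstration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, status TEXT DEFAULT 'active')"
)
conn.executemany(
    "INSERT INTO jobs (id, title) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C")],
)

# ID 99 does not exist locally, so it is simply ignored by the UPDATE
mark_expired(conn, [{"id": 2}, {"id": 99}])
rows = conn.execute("SELECT id, status FROM jobs ORDER BY id").fetchall()
print(rows)  # [(1, 'active'), (2, 'expired'), (3, 'active')]
```

Because the UPDATE is idempotent, running it again with overlapping results (for example, due to the one-day buffer) leaves the table unchanged.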

Conclusion

In this tutorial, you have learned how to fetch recent job listings from the jobdata API and keep them fresh by checking for expired jobs daily. This approach lets you maintain a clean and relevant job database, which improves the experience for job seekers.

Important Notes

  • Ensure that you do not make parallel requests to the API; instead, make requests sequentially to avoid hitting rate limits.
  • Consider implementing error handling and logging for production-level applications to track issues and performance.
  • Regularly review and optimize your fetching and updating logic to ensure efficiency and accuracy.
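
To illustrate the first two notes, here is a minimal sketch of sequential requests with retries and exponential backoff. It is not taken from the jobdata documentation: call_with_retry and do_request are hypothetical names, and treating HTTP 429 and 5xx responses as retryable is an assumption about how rate limiting and transient errors would surface. The request itself is injected as a callable, so the example runs without network access.

```python
import time

def call_with_retry(do_request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call do_request() until it returns a non-retryable status.

    do_request is a hypothetical zero-argument callable returning
    (status_code, body). 429 and 5xx responses are retried with
    exponential backoff; anything else is returned immediately.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = do_request()
        retryable = status == 429 or 500 <= status < 600
        if not retryable or attempt == max_attempts:
            return status, body
        sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...

# Simulated responses: one rate-limit reply, then success.
responses = iter([(429, ""), (200, "ok")])
waits = []  # Records the delays instead of actually sleeping
print(call_with_retry(lambda: next(responses), sleep=waits.append))  # (200, 'ok')
print(waits)  # [1.0]
```

In real use, do_request could wrap a single requests.get call and return (response.status_code, response.text); because each call blocks until it finishes, requests stay sequential.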

Feel free to modify the code snippets to fit your specific use case and database structure!
