Fetching and Maintaining Fresh Job Listings

Efficiently fetch and maintain fresh job listings using the jobdata API by retrieving recent jobs and checking for expired listings daily, ensuring a clean and relevant local job database.

5 min read · Dec. 13, 2024

Background and Design Rationale
Prerequisites
Step 1: Fetch Recent Job Listings
- Example Code to Fetch Jobs with Pagination
- Explanation
Step 2: Check for Expired Jobs
- Example Code to Check Expired Jobs
- Explanation
Step 3: Update Your Database
- Example Code to Update Database
- Explanation
Conclusion
- Important Notes

In this tutorial, we will explore a strategy to fetch the latest job listings from the jobdata API while ensuring that we keep our data fresh and up-to-date. We will utilize the /api/jobs/ endpoint (see docs) to retrieve recent job listings and the /api/jobsexpired/ endpoint (see docs) to check for expired jobs. This approach will help you maintain a clean and relevant job database.

Background and Design Rationale

The process described here is designed to keep job listing management efficient and cost-effective by separating active job retrieval and expiration tracking into two endpoints. The /api/jobs/ endpoint provides detailed job data for initial imports (limited to non-expired jobs when the exclude_expired=true flag is set), while the /api/jobsexpired/ endpoint offers a lightweight feed of recently expired jobs, including their expiration dates.

This two-step process optimizes resource usage by avoiding repeated fetching of full job details and focusing only on updating expiration statuses in your local database. By periodically checking the /api/jobsexpired/ feed and matching it against your database, you can maintain a fresh and accurate dataset with minimal API calls.

Prerequisites

Before we begin, ensure you have the following:

API Key: You need a valid API key from jobdata API with an active access pro subscription. If you don't have one, you can generate it from your dashboard after subscribing to the plan.
Python Environment: Make sure you have Python installed along with the requests library. You can install it using pip if you haven't already:

pip install requests

Step 1: Fetch Recent Job Listings

We will start by fetching job listings that are not expired and have been published within the last 30 days. We will use the exclude_expired=true and max_age=30 parameters in our request to the /api/jobs/ endpoint.

Example Code to Fetch Jobs with Pagination

import requests
from datetime import datetime, timedelta

# Constants
API_KEY = "YOUR_API_KEY"  # Replace with your actual API key
JOBS_URL = "https://jobdataapi.com/api/jobs/"
EXPIRED_URL = "https://jobdataapi.com/api/jobsexpired/"
FETCH_DATE = datetime.now()  # Current date for fetching jobs

# Function to fetch recent job listings with pagination
def fetch_recent_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    params = {
        "exclude_expired": "true",
        "max_age": 30,
        "page_size": 1000,  # Fetch 1000 jobs per request
        "page": 1  # Start from the first page
    }

    all_jobs = []  # List to store all fetched jobs
    while True:
        response = requests.get(JOBS_URL, headers=headers, params=params, timeout=30)  # Timeout avoids hanging on a stalled connection
        if response.status_code == 200:
            job_data = response.json()
            all_jobs.extend(job_data['results'])  # Add fetched jobs to the list
            # ...or directly import jobs into your local DB here...
            print(f"Fetched {len(job_data['results'])} jobs from page {params['page']}.")
            # Check if there is a next page
            if job_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break  # Exit loop if no more pages
        else:
            print("Error fetching jobs:", response.status_code, response.text)
            break

    return all_jobs

# Fetch jobs
recent_jobs = fetch_recent_jobs()
print(f"Total fetched job listings: {len(recent_jobs)}")

Explanation

We set the exclude_expired parameter to true to filter out expired jobs.
The max_age parameter is set to 30 to only fetch jobs published in the last 30 days.
We specify page_size as 1000 to minimize the number of requests.
The while loop handles pagination, fetching all available job listings sequentially.
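
The inline comment in the code above notes that jobs can be imported directly instead of being collected in one list. A minimal sketch of that variant follows, assuming a generator that yields results page by page. The names iter_results, get_page, and fake_get_page are hypothetical, and the fake page data only stands in for real API responses so the example runs offline.

```python
from typing import Any, Callable, Dict, Iterator


def iter_results(get_page: Callable[[int], Dict[str, Any]],
                 start_page: int = 1) -> Iterator[Dict[str, Any]]:
    """Yield items one by one, following pages until 'next' is empty."""
    page = start_page
    while True:
        data = get_page(page)
        yield from data.get("results", [])
        if not data.get("next"):
            break  # No further pages
        page += 1


def fake_get_page(page: int) -> Dict[str, Any]:
    """Offline stand-in for one paginated API response (made-up data)."""
    pages = {
        1: {"results": [{"id": 1}, {"id": 2}], "next": "page-2"},
        2: {"results": [{"id": 3}], "next": None},
    }
    return pages[page]


# Each job could be written to the local DB here instead of being collected.
job_ids = [job["id"] for job in iter_results(fake_get_page)]
print(job_ids)
```

In a real run, get_page would wrap the requests.get call from fetch_recent_jobs and return response.json(), so memory use stays at roughly one page of results at a time.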

Step 2: Check for Expired Jobs

Next, we will check for expired jobs daily. We will use the /api/jobsexpired/ endpoint to retrieve jobs that have expired since one day before the last fetch date. This one-day buffer ensures we don't miss jobs that expired shortly before the previous check.

Example Code to Check Expired Jobs

# Function to check for expired jobs
def check_expired_jobs():
    headers = {"Authorization": f"Api-Key {API_KEY}"}
    expired_since = (FETCH_DATE - timedelta(days=1)).strftime('%Y-%m-%d')  # One day buffer
    params = {
        "expired_since": expired_since,
        "page_size": 1000,  # Fetch 1000 expired jobs per request
        "page": 1  # Start from the first page
    }

    expired_job_items = []  # List to store all expired job items
    while True:
        response = requests.get(EXPIRED_URL, headers=headers, params=params, timeout=30)  # Timeout avoids hanging on a stalled connection
        if response.status_code == 200:
            expired_data = response.json()
            print(f"Fetched {len(expired_data['results'])} expired job items from page {params['page']}.")
            expired_job_items.extend(expired_data['results'])  # Add expired jobs to the list
            # ...or directly update jobs by their ID in your local DB here...
            # Check if there is a next page
            if expired_data['next']:
                params['page'] += 1  # Increment page number for the next request
            else:
                break
        else:
            print("Error fetching expired jobs:", response.status_code, response.text)
            break

    return expired_job_items

# Check for expired jobs
expired_jobs = check_expired_jobs()

Explanation

We calculate the expired_since date by subtracting one day from the current fetch date.
We use a while loop to paginate through the results, fetching up to 1000 expired jobs at a time.
The next field in the response tells us whether there are more pages to fetch.
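
FETCH_DATE in the code above is set when the script starts, which matches the last fetch only if the check really runs every day. One possible refinement, sketched here under that assumption, is to store the time of the last successful check and derive expired_since from it. The helper names and the JSON state file layout are hypothetical and not part of the jobdata API.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def expired_since_param(last_check: datetime, buffer_days: int = 1) -> str:
    """Return the expired_since value: last check date minus a safety buffer."""
    return (last_check - timedelta(days=buffer_days)).strftime("%Y-%m-%d")


def load_last_check(path: Path, default: datetime) -> datetime:
    """Read the stored timestamp, or fall back to a default on the first run."""
    if path.exists():
        return datetime.fromisoformat(json.loads(path.read_text())["last_check"])
    return default


def save_last_check(path: Path, when: datetime) -> None:
    """Store the timestamp after a check has completed successfully."""
    path.write_text(json.dumps({"last_check": when.isoformat()}))


# A check last run on the morning of Dec. 13 asks for jobs expired since Dec. 12.
print(expired_since_param(datetime(2024, 12, 13, 9, 30)))  # 2024-12-12
# The date arithmetic also handles month and year boundaries.
print(expired_since_param(datetime(2025, 1, 1, 0, 5)))  # 2024-12-31
```

If a daily run is skipped, the stored timestamp is simply older, so the next request reaches further back and covers the gap.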

Step 3: Update Your Database

After (or while) fetching the expired jobs, you should update your local database to remove or mark these jobs as expired. You can match the job IDs from the expired jobs response against your local database entries.

Example Code to Update Database

# Example function to update the database (pseudo-code)
def update_database(expired_jobs):
    for job in expired_jobs:
        # Pseudo-code for database update
        print(f"Updating database for expired job ID: {job['id']}")
        # db.update_job_status(job['id'], status='expired')

# Call the update function with the expired job items
update_database(expired_jobs)

Explanation

The update_database function is a placeholder for your actual database update logic. You would replace the print statement with your database update code that matches jobs by IDs existing in your database and sets their expired date accordingly.
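
As one way to fill in the placeholder, here is a minimal sketch using Python's built-in sqlite3 module. It assumes a hypothetical local table named jobs with id, title, and expired_at columns, and it assumes each feed item carries its expiration date under an "expired" key. Neither the schema nor that key name is confirmed by this tutorial, so adjust both to your setup.

```python
import sqlite3
from typing import Any, Dict, Iterable


def mark_expired(conn: sqlite3.Connection,
                 expired_items: Iterable[Dict[str, Any]]) -> int:
    """Set expired_at for matching local jobs and return how many rows changed."""
    # "expired" is an assumed key name for the expiration date in each item.
    rows = [(item.get("expired"), item["id"]) for item in expired_items]
    before = conn.total_changes
    with conn:  # Commits on success, rolls back if an error is raised
        conn.executemany(
            "UPDATE jobs SET expired_at = ? WHERE id = ? AND expired_at IS NULL",
            rows,
        )
    return conn.total_changes - before


# In-memory demo database with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, expired_at TEXT)")
conn.executemany(
    "INSERT INTO jobs (id, title) VALUES (?, ?)",
    [(1, "Data Engineer"), (2, "Backend Developer"), (3, "QA Analyst")],
)

# Job 99 is not in the local table, so it is silently skipped.
feed = [{"id": 2, "expired": "2024-12-12"}, {"id": 99, "expired": "2024-12-12"}]
updated = mark_expired(conn, feed)
print(f"Marked {updated} local job(s) as expired")
```

Because the UPDATE only touches rows whose expired_at is still empty, re-processing overlapping feed entries from the one-day buffer does not overwrite earlier values.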

Conclusion

By following this tutorial, you have learned how to fetch recent job listings from the jobdata API while ensuring that you keep your data fresh by checking for expired jobs daily. This approach allows you to maintain a clean and relevant job database, enhancing the user experience for job seekers.

Important Notes

Ensure that you do not make parallel requests to the API; instead, make requests sequentially to avoid hitting rate limits.
Consider implementing error handling and logging for production-level applications to track issues and performance.
Regularly review and optimize your fetching and updating logic to ensure efficiency and accuracy.
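
The notes on sequential requests, error handling, and logging could be combined roughly as follows. This is only a sketch: call_with_retries, its parameters, and the logger name are hypothetical, and the flaky function below simulates a failing request so the example runs without network access.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("jobdata_sync")  # Hypothetical logger name


def call_with_retries(request_fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run one request at a time, retrying failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request_fn()
        except Exception as exc:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # Give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # Wait 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")


# Simulated request that fails twice before succeeding
attempts = {"count": 0}


def flaky_request() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # Record the waits instead of actually sleeping in this demo
result = call_with_retries(flaky_request, sleep=delays.append)
print(result, delays)
```

In practice, request_fn would wrap a single requests.get call followed by response.raise_for_status(), so HTTP errors raise exceptions and are retried like connection failures.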

Feel free to modify the code snippets to fit your specific use case and database structure!
