A Two-Step Approach to Precision Job Filtering
This tutorial presents a two-step method for optimizing job API queries in the Life Sciences and Biotechnology industries, so that highly relevant job listings are identified more efficiently and accurately.
Fetching the right jobs from the API in specialized industries like Life Sciences and Biotechnology can be challenging. To streamline this process, we can use a two-step approach: first filter by title keywords through the API, then verify each result locally with description keywords. This way, we don't need the API's full-text search capabilities at all.
In this article, we'll walk through a Python script that fetches job listings based on title keywords and filters them further using highly relevant keywords from job descriptions.
Overview of the Method
Our approach involves two main steps:
- API Query with Title Keywords:
We fetch job listings that match a set of predefined title keywords. These keywords are combined using the |OR| operator to capture a broad range of relevant job titles (you can combine up to 50 different keywords).
- Local Filtering with Description Keywords:
Once we have the raw job listings, we filter them locally so that each job description contains at least a minimum number of highly relevant keywords (the final script uses two). This ensures that the jobs are relevant not only by title but also by their actual content.
By combining these two steps, we can efficiently narrow down job listings to those that are most relevant to the Life Sciences and Biotechnology industries.
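The 50-keyword limit on the title query starts to matter once your title list grows. A minimal sketch, assuming you stay under that limit by splitting a longer list into several |OR| queries (one API request each) and merging the results afterwards. The helper names are hypothetical, and so is the assumption that every job carries a unique "id" field to deduplicate on:

```python
from typing import Iterable

MAX_KEYWORDS_PER_QUERY = 50  # per-query limit for |OR| title keywords, as stated in this tutorial


def build_title_queries(keywords: list[str], chunk_size: int = MAX_KEYWORDS_PER_QUERY) -> list[str]:
    """Hypothetical helper: split keywords into chunks of at most chunk_size, join each with |OR|."""
    return [
        "|OR|".join(keywords[i:i + chunk_size])
        for i in range(0, len(keywords), chunk_size)
    ]


def merge_unique(result_lists: Iterable[list[dict]], key: str = "id") -> list[dict]:
    """Hypothetical helper: merge result lists, keeping the first job seen per key.

    Assumption: each job dictionary has a unique value under `key` ("id" by default).
    """
    seen = set()
    merged = []
    for results in result_lists:
        for job in results:
            if job[key] not in seen:
                seen.add(job[key])
                merged.append(job)
    return merged


queries = build_title_queries([f"Keyword{n}" for n in range(120)])
print(len(queries))  # 3 queries: 50 + 50 + 20 keywords
```

Each query string would then be passed as the title parameter of a separate request, and the per-query result lists combined with merge_unique.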
The Python Script
Below is the Python script that implements this method. We'll break it down into its key components and explain each part in detail.
Step 1: Define Keyword Lists
We start by defining two lists of keywords:
- Title Keywords:
These are substrings that are likely to appear in job titles within the Life Sciences and Biotechnology industries.
title_keywords = [
    "Biotech", "Life Sciences", "Biopharma", "Pharmaceutical", "Genomics",
    "Proteomics", "Bioinformatics", "Molecular Biology", "Cell Biology",
    "Biochemistry", "Clinical Research", "Regulatory Affairs", "Quality Assurance",
    "Biomanufacturing", "Biostatistics", "Pharmacology", "Toxicology",
    "Biomedical Engineer", "Scientist", "Research Associate", "Lab Technician",
]
- Description Keywords:
These are highly relevant keywords that typically appear in job descriptions for positions in these industries.
description_keywords = [
    "PCR", "ELISA", "CRISPR", "NGS", "RNA", "DNA", "cell culture", "flow cytometry",
    "mass spectrometry", "HPLC", "LC-MS", "GMP", "GLP", "FDA", "ICH", "clinical trials",
    "drug development", "bioprocessing", "fermentation", "protein purification",
    "assay development", "statistical modeling", "bioanalytical", "pharmacokinetics",
    "pharmacodynamics", "toxicology studies", "regulatory submissions",
]
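One caveat with this list: the local filter lowercases both sides and uses plain substring checks, so short acronyms can match inside ordinary English words (for example "ich" inside "which" or "ngs" inside "things"). A quick sketch for spotting such keywords by testing them against deliberately generic text; the sample sentence and the risky_keywords helper are illustrative only:

```python
def risky_keywords(keywords: list[str], sample_text: str) -> list[str]:
    """Illustrative helper: return keywords found as case-insensitive substrings of sample_text.

    If sample_text is unrelated to life sciences, every hit is a likely
    false positive of plain substring matching.
    """
    text = sample_text.lower()
    return [kw for kw in keywords if kw.lower() in text]


# Deliberately generic text with no life-science content (illustrative sample)
generic_text = (
    "We offer an internal training journal, which covers many things "
    "such as meetings with external partners."
)
print(risky_keywords(["PCR", "ELISA", "NGS", "RNA", "ICH", "GMP"], generic_text))
# ['NGS', 'RNA', 'ICH']
```

Keywords flagged this way are candidates for whole-word matching or for removal from the list.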
Step 2: Fetch Jobs Using the API
We define a function fetch_jobs_by_title to fetch job listings from the API based on the title keywords. The keywords are combined using the |OR| operator into a single query string.
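Before the full function, it can help to see what the combined title query looks like once it is URL-encoded. A minimal sketch using the standard library's urllib.parse.urlencode, which percent-encodes "|" as %7C and spaces as "+"; requests encodes query parameters in a similar way, but treat the exact string as illustrative. preview_query is a hypothetical helper and only includes two of the parameters used below:

```python
from urllib.parse import urlencode


def preview_query(title_keywords: list[str], max_age: int = 90) -> str:
    """Hypothetical helper: build the query string for a title search without calling the API."""
    params = {
        "title": "|OR|".join(title_keywords),  # same |OR| join as fetch_jobs_by_title
        "max_age": max_age,
    }
    return urlencode(params)


print(preview_query(["Biotech", "Life Sciences"]))
# title=Biotech%7COR%7CLife+Sciences&max_age=90
```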
import requests
API_URL = "https://jobdataapi.com/api/jobs/"
API_KEY = "your_api_key_here"  # Replace with your actual API key
def fetch_jobs_by_title(title_keywords):
    """
    Fetches job listings from the API based on title keywords using the |OR| operator.
    :param title_keywords: List of keyword substrings for job title search (up to 50)
    :return: List of job listings (each listing is a dictionary with keys such as
             'title', 'description_md' and 'application_url')
    """
    # Join up to 50 title keywords with |OR| for the API query
    title_query = "|OR|".join(title_keywords)
    # Make the API request
    params = {
        "title": title_query,
        "description_md": True,   # get Markdown version of job description
        "description_off": True,  # switch off HTML version
        "exclude_expired": True,  # only open positions
        "max_age": 90,            # only jobs published in the past 90 days
        "api_key": API_KEY,
    }
    # A timeout prevents the script from hanging indefinitely on a stalled connection
    response = requests.get(API_URL, params=params, timeout=30)
    # Check if the request was successful
    if response.status_code == 200:
        res_json = response.json()
        print(f"Found {res_json['count']} jobs based on title keywords.")
        return res_json.get("results", [])
    else:
        print(f"Error fetching jobs: {response.status_code}")
        return []
Note: If you're not on an access pro subscription, remove the exclude_expired parameter from the params dictionary above; the script then works without one.
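If you switch between plans, it can be convenient to build the parameter dictionary in one place and add exclude_expired only when your subscription supports it. A sketch; build_params and the has_pro_plan flag are hypothetical names, while the parameter names themselves are the ones used in fetch_jobs_by_title:

```python
def build_params(title_query: str, api_key: str, has_pro_plan: bool = False) -> dict:
    """Hypothetical helper: assemble the query parameters used by fetch_jobs_by_title."""
    params = {
        "title": title_query,
        "description_md": True,   # Markdown version of the job description
        "description_off": True,  # switch off the HTML version
        "max_age": 90,            # jobs published in the past 90 days
        "api_key": api_key,
    }
    if has_pro_plan:
        # exclude_expired is only added when the subscription supports it
        params["exclude_expired"] = True
    return params


print(sorted(build_params("Biotech|OR|Genomics", "your_api_key_here")))
```

Inside fetch_jobs_by_title, the literal params dictionary could then be replaced by a call such as build_params(title_query, API_KEY, has_pro_plan=False).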
Step 3: Filter Jobs Locally
Next, we define a function filter_jobs to filter the fetched job listings by the description keywords. Each job description is checked to ensure it contains at least min_keywords of the highly relevant keywords (three by default).
def filter_jobs(job_listings, description_keywords, min_keywords=3):
    """
    Filters job listings to ensure each job description contains at least a minimum number
    of highly relevant keywords.
    :param job_listings: List of job listings (each listing is a dictionary with a 'description_md' key)
    :param description_keywords: List of highly relevant keywords to check in job descriptions
    :param min_keywords: Minimum number of description keywords required for a job to be considered relevant
    :return: Filtered list of job listings
    """
    filtered_jobs = []
    for job in job_listings:
        # Fall back to an empty string if a listing has no Markdown description
        description = (job.get("description_md") or "").lower()
        # Count how many description keywords appear in the job description (case-insensitive substring match)
        keyword_count = sum(keyword.lower() in description for keyword in description_keywords)
        # If the job description contains at least min_keywords, add it to the filtered list
        if keyword_count >= min_keywords:
            filtered_jobs.append(job)
    return filtered_jobs
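Because filter_jobs uses plain substring checks, short acronyms such as "ICH" or "NGS" can also match inside unrelated words like "which" or "things". If that inflates your counts, one option is to match whole words with regular expressions instead. This is a variant of the approach above, not part of the original script, and filter_jobs_whole_words is a hypothetical name:

```python
import re


def compile_keyword_patterns(keywords: list[str]) -> list[re.Pattern]:
    """Compile one case-insensitive, whole-word pattern per keyword."""
    # Lookarounds are used instead of \b so the check still behaves sensibly
    # if a keyword starts or ends with punctuation.
    return [
        re.compile(r"(?<!\w)" + re.escape(kw) + r"(?!\w)", re.IGNORECASE)
        for kw in keywords
    ]


def filter_jobs_whole_words(job_listings: list[dict], description_keywords: list[str],
                            min_keywords: int = 3) -> list[dict]:
    """Hypothetical variant of filter_jobs that counts whole-word keyword matches only."""
    patterns = compile_keyword_patterns(description_keywords)
    filtered = []
    for job in job_listings:
        description = job.get("description_md") or ""
        keyword_count = sum(1 for pattern in patterns if pattern.search(description))
        if keyword_count >= min_keywords:
            filtered.append(job)
    return filtered


jobs = [
    {"description_md": "Experience with PCR, ELISA and NGS workflows."},
    {"description_md": "Which things matter? Internal journal meetings."},
]
print(len(filter_jobs_whole_words(jobs, ["PCR", "ELISA", "NGS", "RNA", "ICH"], min_keywords=2)))
```

The trade-off is that whole-word matching no longer counts deliberate partial matches (for example "NGS" inside "NGS-based"), so it is worth checking a sample of results either way.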
Step 4: Putting It All Together
Finally, we combine the two functions to fetch and filter job listings. Placed after the definitions above, this main block completes the script:
if __name__ == "__main__":
    # Step 1: Fetch jobs using the title keywords
    job_listings = fetch_jobs_by_title(title_keywords)
    print(f"Fetched {len(job_listings)} jobs based on title keywords.")
    # Step 2: Filter jobs based on description keywords
    filtered_jobs = filter_jobs(job_listings, description_keywords, min_keywords=2)
    print(f"Filtered down to {len(filtered_jobs)} highly relevant jobs.")
    # Print the filtered jobs (for demonstration purposes)
    for job in filtered_jobs:
        print(f"Title: {job['title']}")
        print(f"Description: {job['description_md'][:100]}...")  # Print first 100 chars of Markdown description
        print(f"URL: {job['application_url']}")
        print("-" * 50)
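When tuning the keyword lists, it helps to see which description keywords actually triggered each match. A small sketch that reuses the same lowercase substring logic as filter_jobs and sorts jobs by their number of hits; matched_keywords and rank_jobs are hypothetical helpers, and the sample jobs are made up for illustration:

```python
def matched_keywords(description: str, keywords: list[str]) -> list[str]:
    """Hypothetical helper: keywords found in the description (case-insensitive substring match)."""
    text = description.lower()
    return [kw for kw in keywords if kw.lower() in text]


def rank_jobs(jobs: list[dict], keywords: list[str]) -> list[tuple[int, dict, list[str]]]:
    """Hypothetical helper: pair each job with its matched keywords, most matches first."""
    ranked = []
    for job in jobs:
        hits = matched_keywords(job.get("description_md") or "", keywords)
        ranked.append((len(hits), job, hits))
    # Sort only on the hit count so job dictionaries are never compared
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Made-up sample listings for illustration
sample = [
    {"title": "Scientist", "description_md": "Lead CRISPR screens and NGS analysis."},
    {"title": "Lab Technician", "description_md": "Run PCR and ELISA assays under GLP."},
]
for count, job, hits in rank_jobs(sample, ["PCR", "ELISA", "GLP", "CRISPR", "NGS"]):
    print(count, job["title"], hits)
```

In the main block, the same loop could run over filtered_jobs and description_keywords to reveal which keywords carry most of the filtering.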
Example Output
When you run the script, you'll see output similar to the following:
Found 35572 jobs based on title keywords.
Fetched 100 jobs based on title keywords.
Filtered down to 59 highly relevant jobs.
Title: Immunology/Inflammation Expert (Postdoctoral Fellow / Scientist)
Description: **Position Summary:**
We are seeking a highly motivated Immunology/Inflammation Expert to join our ...
URL: https://f.zohorecruit.com/jobs/Careers/3...
--------------------------------------------------
Title: Data Scientist
Description: **Knowledge, Skills, Competencies and Responsibilities: - Technical Competency: -**
* Play a key ro...
URL: https://v.zohorecruit.com/jobs/Careers/5...
--------------------------------------------------
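The output above shows the API reporting 35572 matching jobs while only 100 are fetched, because the script reads just the first page of results. A minimal sketch for collecting more pages, assuming the response follows the common count/next/results pagination layout: the script already relies on count and results, but a next field holding the URL of the following page is an assumption to verify against the API documentation. The page fetcher is passed in as a function so the loop can be tested offline; iter_job_pages is a hypothetical name:

```python
from typing import Callable, Iterator


def iter_job_pages(
    first_page: dict,
    fetch_page: Callable[[str], dict],
    max_pages: int = 5,
) -> Iterator[dict]:
    """Hypothetical helper: yield jobs from first_page, then follow 'next' links.

    Assumption: each page is a dict with a 'results' list and a 'next' URL
    (None or missing on the last page). At most max_pages pages are read.
    """
    page = first_page
    for page_number in range(1, max_pages + 1):
        yield from page.get("results", [])
        next_url = page.get("next")
        if not next_url or page_number == max_pages:
            break
        page = fetch_page(next_url)  # only fetch when another page is allowed


# Offline demo with fake pages keyed by URL
fake_pages = {"page2": {"results": [{"id": 3}], "next": None}}
first = {"count": 3, "results": [{"id": 1}, {"id": 2}], "next": "page2"}
print([job["id"] for job in iter_job_pages(first, fake_pages.__getitem__)])
```

With the real API, fetch_page could be something like lambda url: requests.get(url, timeout=30).json(), provided the next URL still carries the query parameters, including the API key; that too is an assumption worth checking.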
Conclusion
This script is highly customizable: you can adjust the keyword lists, modify the filtering criteria, or integrate it with other tools to extend its functionality. The goal here is to demonstrate a simple yet effective way to get a large number of relevant listings very quickly.
The fastest way to create initial title and description keyword lists is to ask your favorite LLM to generate them. You can then review the results and refine them based on your experience or industry knowledge.
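LLM-generated lists often contain duplicates, stray whitespace, or more title keywords than a single query accepts, so a small cleanup pass before your manual review can help. A sketch; clean_keywords is a hypothetical helper, and the optional max_items check mirrors the 50-keyword limit for title queries mentioned earlier:

```python
from typing import Optional


def clean_keywords(raw: list[str], max_items: Optional[int] = None) -> list[str]:
    """Hypothetical helper: strip whitespace, drop empty entries and
    case-insensitive duplicates, and keep the original order."""
    seen = set()
    cleaned = []
    for item in raw:
        kw = item.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            cleaned.append(kw)
    if max_items is not None and len(cleaned) > max_items:
        raise ValueError(f"{len(cleaned)} keywords exceed the limit of {max_items}")
    return cleaned


print(clean_keywords([" Biotech", "biotech", "", "Genomics "], max_items=50))
# ['Biotech', 'Genomics']
```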