Introduction to Using Vector Search and Embeddings through the jobdata API

Unlocking semantic job search and candidate matching by combining pre-generated embeddings with scalable vector databases.

12 min read · Sept. 5, 2025

The way job search platforms, HR systems, and market researchers find and categorize jobs is rapidly changing. Traditional keyword search methods are still useful, but they often fall short when it comes to understanding the meaning behind words. A search for “machine learning engineer,” for example, may miss valuable postings that use phrases like “AI specialist” or “data scientist.”

This is where vector embeddings and semantic search capabilities offered by the jobdata API come into play. Instead of matching only keywords, these tools allow you to match concepts, making it possible to uncover relationships between job postings and queries that would otherwise remain hidden.

This guide explains how our vector search and embeddings work, how to use them effectively, and how to avoid common pitfalls when building on top of the service.

Understanding Vector Embeddings in jobdata API

At the core of the API’s semantic search capabilities are vector embeddings. An embedding is a numerical representation of text—such as a job title and description—that captures its semantic meaning. In the jobdata API, embeddings are generated using OpenAI’s text-embedding-3-small model, which produces vectors with 768 dimensions in our case. Every job post in our database is preprocessed and assigned such an embedding.

Because these embeddings are already generated and stored, you do not need to manage that process yourself. Instead, you can request embeddings for job listings directly via the API (embed_3sh=true), or you can run semantic searches by providing either a text query or your own embedding.

The API supports two main approaches to semantic matching:

  1. Text-based vector search (vec_text): You provide a text string—for instance, “remote software engineer with AI experience.” The API transforms this string into an embedding in real time and compares it to the pre-generated embeddings of job postings. The response includes jobs that are semantically similar to your input text.

  2. Embedding-based vector search (vec_embedding): You provide a 768-dimensional embedding array directly. This is particularly useful if you already have embeddings from another source—for example, a candidate’s resume represented as an embedding—and want to match it against available jobs. This method requires a POST request, since the embedding itself is sent in the request body.

Both approaches return results ordered by cosine distance, a measure of similarity between vectors. Lower values indicate greater similarity.
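For intuition, here is a minimal, self-contained sketch of the metric (the API computes this server-side, so you never need to implement it yourself). Cosine distance is commonly defined as one minus the cosine similarity of two vectors:

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional vectors; real embeddings have 768 dimensions
query = np.array([0.2, 0.7, 0.1])
job = np.array([0.25, 0.65, 0.05])
print(f"cosine distance: {cosine_distance(query, job):.4f}")  # near 0 -> very similar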

Example API Calls

To illustrate, consider the following examples in Python:

Retrieve jobs with embeddings included:

import requests

url = "https://jobdataapi.com/api/jobs/"
params = {"embed_3sh": True, "page_size": 5}
headers = {"Authorization": "Api-Key YOUR_API_KEY"}

response = requests.get(url, headers=headers, params=params)
print(response.json())

This is useful if you want to download job data along with embeddings for local processing or clustering.

Perform a semantic search with a text query:

import requests

url = "https://jobdataapi.com/api/jobs/"
params = {"vec_text": "remote software engineer with AI experience", "max_dist": 0.45}
headers = {"Authorization": "Api-Key YOUR_API_KEY"}

response = requests.get(url, headers=headers, params=params)
print(response.json())

Match jobs against an existing embedding:

import requests

url = "https://jobdataapi.com/api/jobs/"
headers = {"Authorization": "Api-Key YOUR_API_KEY"}

embedding = [-0.0649, 0.0485, 0.0700, 0.0242, ...]  # shortened for clarity

params = {"max_dist": 0.45, "country": "US"}
payload = {"vec_embedding": embedding}

response = requests.post(url, headers=headers, params=params, json=payload)
print(response.json())

Combining Semantic Search with Filters

One of the strengths of the jobdata API is that vector search does not exist in isolation. You can combine semantic queries with any of the API’s filters—for example, location, salary, experience level, or job type. This makes it possible to construct highly precise queries such as:

  • “Find jobs similar to a data scientist role in fintech (job profile provided as embedding), but only in the United States, with a salary of at least $100,000, and with remote work options available (narrowing down through additional query filters).”

This combination of broad semantic understanding and fine-grained filtering produces results that are both relevant and tailored to your needs.
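As a rough sketch, such a combined query might look like the following. The vec_text, max_dist, and country parameters appear in the examples above; the salary and remote filter names below are placeholders, so check the API reference for the exact ones:

import requests

url = "https://jobdataapi.com/api/jobs/"
headers = {"Authorization": "Api-Key YOUR_API_KEY"}

params = {
    "vec_text": "data scientist fintech",
    "max_dist": 0.45,
    "country": "US",
    "salary_min": 100000,  # hypothetical filter name
    "remote": True,        # hypothetical filter name
}

response = requests.get(url, headers=headers, params=params)
print(response.json())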

Practical Applications

The use cases for vector embeddings extend beyond straightforward job search:

  • Improved candidate-job matching: HR platforms can represent candidate profiles as embeddings and then match them against job postings semantically. This makes it easier to surface opportunities that align with a candidate’s skills and experience, even if the wording differs.

  • Market research and analysis: Analysts can cluster job embeddings to identify trends (a short clustering sketch follows this list). For example, they might discover a growing cluster of roles related to “quantum computing” or notice that demand for “sustainability officers” is spreading across industries.

  • Personalized recommendations: A job platform can build an embedding that represents a user’s search history or profile, and then recommend jobs that are semantically closest. This provides a much more sophisticated recommendation system than simple keyword alerts.

  • Salary benchmarking: Since embeddings can cluster jobs with similar responsibilities, you can use them to identify salary ranges across comparable roles, even when the job titles differ.
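To make the market research idea concrete, here is a minimal, self-contained clustering sketch using scikit-learn’s KMeans. The random vectors and cluster count are purely illustrative; in practice you would feed it the embedding_3sh vectors fetched from the API, as in the pipeline example later in this guide:

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Stand-in data: in practice, jobs_df comes from the API fetch shown later,
# with real 768-dimensional embedding_3sh vectors
rng = np.random.default_rng(42)
jobs_df = pd.DataFrame({
    "title": [f"role {i}" for i in range(100)],
    "embedding_3sh": list(rng.normal(size=(100, 768)).astype("float32")),
})

embeddings = np.vstack(jobs_df["embedding_3sh"].values)

# Cluster the embedding space; the cluster count is illustrative
kmeans = KMeans(n_clusters=10, random_state=42, n_init="auto")
labels = kmeans.fit_predict(embeddings)

# Inspect a few titles per cluster to spot emerging role families
for cluster_id in range(3):
    titles = jobs_df.loc[labels == cluster_id, "title"].head(5).tolist()
    print(f"Cluster {cluster_id}: {titles}")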

Why the API Should Be Used as a Data Source Only

A critical point for new users to understand is that the jobdata API is not intended to serve as a live backend for high-frequency, concurrent queries. Although the service is reliable, it enforces practical limits: no more than two requests in parallel and a maximum of ten requests per second (or somewhat higher with a certain top-tier subscription). Sending frequent queries directly from the client side is likely to cause bottlenecks or rate-limit issues.

Instead, the API is best used as a data source:

  • Fetch embeddings or search results in batches.
  • Store them in your own database or vector search system (such as FAISS, Pinecone, or PostgreSQL with pgvector).
  • Serve results to your end users from your own infrastructure, ensuring fast response times and reduced API overhead.

For example, you might run a daily job that fetches all “software engineering” positions with embeddings enabled. These can be stored locally, indexed in your vector database, and served instantly to users searching your platform. Meanwhile, the jobdata API continues to provide fresh data, but you avoid overwhelming it with repeated live queries.
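A minimal sketch of such a batch job follows. The title filter is an assumed keyword parameter, and the loop assumes the common paginated-response convention with a "next" URL (consistent with the "results" key used later in this guide); verify both against the API reference:

import time
import requests

API_KEY = "YOUR_API_KEY"
url = "https://jobdataapi.com/api/jobs/"
headers = {"Authorization": f"Api-Key {API_KEY}"}
# "title" as a keyword filter is an assumption - check the API reference
params = {"embed_3sh": True, "title": "software engineer", "page_size": 100}

all_jobs = []
while url:
    response = requests.get(url, headers=headers, params=params)
    response.raise_for_status()
    payload = response.json()
    all_jobs.extend(payload["results"])
    url = payload.get("next")  # assumes a paginated response with a "next" URL
    params = None              # the "next" URL already carries the query string
    time.sleep(0.5)            # stay well under the documented rate limits

print(f"Fetched {len(all_jobs)} jobs for local indexing.")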

Combining Vector Search with Full-Text Search

Vector search excels at capturing meaning, but it is not always the right tool. In situations where you need an exact keyword match—such as requiring a specific certification or technology—full-text search remains invaluable. For example, “CFA certification” or “Kubernetes administrator” are precise requirements that might not be captured well by embeddings.

The most effective approach is often hybrid: use vector search to cast a wide net based on concepts, and then refine results with filters or keyword searches to ensure precision.
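One lightweight way to implement this hybrid pattern is to post-filter semantic matches with an exact keyword check. The stand-in data below represents job dicts returned by a vector search (via vec_text, or a local index like the FAISS example later in this guide):

# Stand-in for job dicts returned by a vector search
semantic_results = [
    {"title": "Platform Engineer", "description": "Operate Kubernetes clusters at scale."},
    {"title": "DevOps Engineer", "description": "Own CI/CD pipelines and Terraform modules."},
]

# Keep only semantically similar jobs that also mention the hard requirement
required_term = "kubernetes"
hybrid_results = [
    job for job in semantic_results
    if required_term in job["description"].lower()
]
print(f"{len(hybrid_results)} semantically similar jobs mention '{required_term}'")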


Example: Building a Semantic Job Search Pipeline

The jobdata API provides access to up-to-date job postings with rich metadata and semantic embeddings. However, as explained earlier, it is not designed to serve as a real-time backend for a high number of concurrent queries. The best practice is to use the API as a data ingestion layer: fetch jobs with embeddings, store them locally, and build your own semantic search layer on top.

This pipeline demonstrates how to:

  1. Pull job postings with embeddings from the jobdata API.
  2. Store and index them in a local vector database (FAISS).
  3. Perform efficient semantic searches on your own infrastructure.
  4. Periodically refresh the local dataset with new jobs from the API.

1. Prerequisites

  • A jobdata API key (with an active API access pro+ subscription or higher, required for embeddings).
  • Python environment with the following packages:
pip install requests faiss-cpu pandas numpy openai
  • Basic familiarity with Python and data structures.

2. Fetch Jobs with Embeddings

We start by requesting job postings with embeddings included (embed_3sh=true). To keep the example simple, we’ll fetch a single page of 50 results, but in practice you could paginate through thousands.

import requests
import pandas as pd

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://jobdataapi.com/api/jobs/"

headers = {"Authorization": f"Api-Key {API_KEY}"}
params = {"embed_3sh": True, "page_size": 50}  # adjust page_size as needed

response = requests.get(BASE_URL, headers=headers, params=params)
data = response.json()["results"]

# Convert to DataFrame for easy handling
jobs_df = pd.DataFrame(data, columns=["id", "title", "description", "embedding_3sh"])
print(jobs_df.head())

At this point, you have a structured dataset containing job IDs, titles, descriptions, and their embeddings.

3. Store Jobs in a Vector Index

We will use FAISS, an efficient open-source library for similarity search. Each job’s embedding is inserted into an index, and we keep a mapping from index positions back to job metadata.

import numpy as np
import faiss

# Extract embeddings and convert to numpy array
embeddings = np.vstack(jobs_df["embedding_3sh"].values).astype("float32")

# Create FAISS index (cosine similarity implemented via inner product on normalized vectors)
dimension = embeddings.shape[1]
index = faiss.IndexFlatIP(dimension)

# Normalize vectors for cosine similarity
faiss.normalize_L2(embeddings)

# Add to index
index.add(embeddings)

# Store mapping of FAISS indices to job metadata
id_map = dict(enumerate(jobs_df.to_dict("records")))
print(f"Stored {index.ntotal} jobs in vector index.")

4. Run Semantic Queries with External Embeddings

Now we can run queries directly against the local FAISS index. A query embedding can come from three possible sources:

  1. Direct jobdata query (vec_text): Provide a natural language query string to the jobdata API, which will generate an embedding on the fly and return matching jobs. This is the quickest way to test semantic search, but it requires an API request every time.

  2. Externally generated embeddings (OpenAI API): Use OpenAI’s embedding endpoint (text-embedding-3-small) to generate vectors for any query text, candidate profile, or resume. This is often better for production workflows because you can search against your local store of job embeddings, avoiding repeated calls to the jobdata API.

  3. Embeddings from already imported jobs (embedding_3sh): You can reuse imported job post embeddings directly as your query vector to find other semantically similar jobs. This enables “related jobs” or “more like this” features without regenerating embeddings.

The second option is often preferred when building scalable applications, since it avoids hitting the jobdata API for every query. Here’s how you might do it with OpenAI and FAISS:

import numpy as np
import faiss
from openai import OpenAI

# Assume you have already:
#  - Populated FAISS with job embeddings (from jobdata API)
#  - Built an `id_map` linking FAISS indices back to job metadata

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

### OPTION 1: vec_text via jobdata API (returns jobs directly)
# Good for one-off searches, but not efficient as a query source for FAISS

### OPTION 2: External embedding (OpenAI)
query_text = "remote data scientist in healthcare"
embedding_response = client.embeddings.create(
    model="text-embedding-3-small",
    input=query_text,
    dimensions=768,
    encoding_format="float",
)
query_embedding = np.array(embedding_response.data[0].embedding, dtype="float32")

### OPTION 3: Use an existing job’s embedding as the query
# Suppose you already stored job embeddings locally.
# Note: running this overrides the OPTION 2 embedding above; keep only
# whichever option fits your use case.
job_embedding = id_map[42]["embedding_3sh"]  # pick a job to use as the "query"
query_embedding = np.array(job_embedding, dtype="float32")

# Normalize for cosine similarity (required for FAISS inner-product search);
# reshape once so the in-place normalization and the search use the same array
query_vec = query_embedding.reshape(1, -1)
faiss.normalize_L2(query_vec)

# Search the FAISS index
k = 5  # number of nearest jobs to return
distances, indices = index.search(query_vec, k)

# Display results
for rank, (dist, idx) in enumerate(zip(distances[0], indices[0]), start=1):
    job = id_map[idx]
    print(f"{rank}. {job['title']} (cosine similarity: {dist:.3f})")

By supporting all three strategies, you can tailor your system depending on the use case:

  • vec_text → Best for direct ad-hoc search via the API.
  • OpenAI embeddings → Best for user-driven queries and resumes, combined with your cached job embeddings.
  • Imported job embeddings (embedding_3sh) → Best for “more like this job” or clustering similar jobs.

This layered approach makes it possible to build a highly flexible job discovery system without overloading the jobdata API.

5. Keeping the Data Fresh

To maintain a relevant index, you can run a scheduled job (e.g., nightly or every couple of hours) that:

  1. Fetches new postings from jobdata API with embed_3sh=true.
  2. Adds them to your FAISS index and local metadata store.
  3. Optionally removes expired or outdated postings.

This way, your semantic search layer always reflects the most current data, but end users interact only with your infrastructure, ensuring fast response times.
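Here is a sketch of such a refresh task. The created_after parameter is a placeholder for whatever incremental filter the API actually exposes, and the pagination again assumes a "next"-URL convention:

import time

import faiss
import numpy as np
import requests

def refresh_index(index, id_map, last_run_iso):
    """Fetch postings added since the last run and add them to the FAISS index.

    `created_after` is a placeholder - check the API docs for the actual
    incremental filter. Expired-job removal is omitted; with IndexFlatIP it
    is usually simplest to rebuild the index periodically instead.
    """
    url = "https://jobdataapi.com/api/jobs/"
    headers = {"Authorization": "Api-Key YOUR_API_KEY"}
    params = {"embed_3sh": True, "created_after": last_run_iso, "page_size": 100}

    while url:
        payload = requests.get(url, headers=headers, params=params).json()
        jobs = payload["results"]
        if jobs:
            vecs = np.vstack([j["embedding_3sh"] for j in jobs]).astype("float32")
            faiss.normalize_L2(vecs)  # keep cosine-compatible with the existing index
            start = index.ntotal
            index.add(vecs)
            id_map.update({start + i: job for i, job in enumerate(jobs)})
        url = payload.get("next")
        params = None
        time.sleep(0.5)  # respect rate limits

Scheduled via cron or a task runner, this keeps the local index in sync while end users never touch the jobdata API directly.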

6. Advantages of This Approach

  • Performance: End users query your local index, avoiding API latency and rate limits.
  • Scalability: FAISS or similar tools can handle millions of vectors efficiently.
  • Flexibility: You can combine semantic similarity with additional business logic (e.g., filtering by salary, location, or employer).
  • Efficiency: API usage is minimized, since you fetch and enrich data in bulk rather than on-demand for every query.

7. Next Steps

  • Experiment with hybrid search: combine semantic similarity from FAISS with keyword filters or metadata constraints.
  • Explore alternative vector databases such as Pinecone, Weaviate, or PostgreSQL with pgvector if you need cloud-native scalability.
  • Extend the pipeline to support candidate matching (embedding resumes) or market research (clustering embeddings to reveal trends).

Summary

Instead of being limited to keyword matches, you can now work with representations of meaning and context. This enables systems that not only “find jobs with the same words,” but also “understand what the job is about” — a critical leap forward for both employers and job seekers.

With embeddings, you can:

  • Deliver relevance that feels human. A candidate searching for “AI researcher” can also be shown roles labeled “machine learning scientist” or “deep learning engineer.” Semantic similarity bridges the gap between rigid job titles and the fluid way people actually describe work.
  • Build stronger candidate-job matching. By embedding both job postings and candidate resumes/profiles, you can measure alignment on a deeper level than simple keyword overlap — highlighting opportunities where skills and requirements are semantically related, even if phrased differently.
  • Uncover market insights. Embeddings allow clustering, trend analysis, and anomaly detection across large sets of postings. This makes it possible to track how job roles are evolving, identify emerging skills, or compare regional labor markets more precisely.
  • Enable personalized recommendations. By combining semantic similarity with user history and preferences, you can recommend jobs that fit not just what a candidate typed into the search bar, but what they truly mean — and what they’re most likely to engage with.

To get the most from these capabilities, it’s best to treat the jobdata API as a data enrichment and ingestion layer, rather than a live backend for end-user queries. The optimal architecture is:

  1. Fetch enriched jobs (with embeddings) from the API.
  2. Cache them in your own infrastructure — e.g., FAISS, Pinecone, Weaviate, or pgvector.
  3. Serve queries locally, generating new embeddings as needed (e.g., from OpenAI) to compare against your stored data.
  4. Apply filters and business logic (location, salary, seniority) on top of semantic similarity to maximize result relevance.

For the vast majority of scenarios, this approach provides the right balance of speed, scalability, and cost-efficiency.

Related Docs

Integrating the jobdata API with Excel
Integrating the jobdata API with n8n
Fetching and Maintaining Fresh Job Listings
Merging Job Listings from Multiple Company Entries
Converting Annual FTE Salary to Monthly, Weekly, Daily, and Hourly Rates
Using the jobdata API for Machine Learning with Cleaned Job Descriptions
Integrating the jobdata API with Zapier
A Two-Step Approach to Precision Job Filtering
How to Determine if a Job Post Requires Security Clearance
Integrating the jobdata API with Make
Retrieving and Working with Industry Data for Imported Jobs
Optimizing API Requests: A Guide to Efficient jobdata API Usage