CSV File Downloads Documentation

Full job data download information and sample files.

Introduction
Overview
Data Download URLs
- Vector embedding data
Sample Files and SQLite Database Demo
Examples

Introduction

The Jobdata API access ultra package provides a comprehensive solution for accessing and managing job data. With both API and CSV download options, you can integrate up-to-date job and company information into your projects efficiently. Using the data download feature instead of the API offers several key advantages, particularly for users who need comprehensive datasets for analysis, offline access, or bulk data processing.

The CSV file downloads provide a full snapshot of all job data updated weekly, ensuring that you have access to all recent information without the need for continuous API calls. This method might be more efficient for large-scale data handling, allowing for quicker integration into local databases and systems.

Additionally, data downloads are particularly beneficial for data scientists and analysts who require complete datasets for advanced analytics and reporting, offering a more streamlined and cost-effective solution compared to the incremental data retrieval typically done via API calls.

Overview

The Jobdata API access ultra package provides extensive data access and download capabilities, including:

Unlimited access to the Jobdata API feed with comprehensive job post backfill and related full company info.
Expired jobs info.
Job description text search.
Advanced company data search.
Job post embeddings and vector search.
Weekly-updated job data CSV file downloads.

A fresh export of the latest data happens every Sunday around 6 p.m. UTC.
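If you schedule downloads automatically, it can help to compute when the next export is due. The following is a minimal sketch based only on the schedule stated above; the function name `next_export` and the constants are illustrative and not part of the Jobdata API.

```python
from datetime import datetime, timedelta, timezone

EXPORT_WEEKDAY = 6    # datetime.weekday(): Monday is 0, so Sunday is 6
EXPORT_HOUR_UTC = 18  # nominal export time, "around 6 p.m. UTC"


def next_export(now: datetime) -> datetime:
    """Return the next nominal export time (Sunday 18:00 UTC) strictly after `now`.

    `now` should be a timezone-aware datetime.
    """
    now = now.astimezone(timezone.utc)
    days_ahead = (EXPORT_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=EXPORT_HOUR_UTC, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # It is already Sunday after 18:00 UTC, so the next export is a week away.
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_export(datetime.now(timezone.utc)))
```

Because the export only happens around that time, a scheduled job should probably wait some time after the returned timestamp before downloading.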

Data Download URLs

Data download URLs are generated with unique download access keys to maintain privacy and allow programmatic access. Every file represents its API endpoint counterpart with mostly the same attributes that are available through the access pro subscription package.

Additional CSV file versions of the jobs data are also available: one with all job descriptions in Markdown format (instead of the original HTML variant) and one with the pre-generated vector embeddings (see the Vector Embeddings and Search API Documentation for more info). Splitting these variants into separate files keeps each file as small as possible, so you can pick and choose the data you actually need.

In addition, to maintain all object relationships, the following tables reflect the many-to-many relationships between the job table and its counterparts:

job_job_types.csv: Contains all applicable job types for every job.
job_job_cities.csv: Contains all associated city selections for every job.
job_job_states.csv: Contains all state/canton/administrative region data.
job_job_countries.csv: Contains all associated country selections for every job.
job_job_regions.csv: Contains all associated region selections for every job.
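As a rough illustration of how such a relation table can be resolved, the sketch below joins jobs to their job types through a linking table with pandas. The data is made up, and the column names (`id`, `title`, `name`, `job_id`, `jobtype_id`) are assumptions; check the header row of the actual files before adapting it.

```python
import pandas as pd

# Tiny in-memory stand-ins for jobs.csv, job_types.csv and job_job_types.csv.
# All column names here are assumptions, not the documented schema.
jobs = pd.DataFrame({"id": [1, 2], "title": ["Data Engineer", "Nurse"]})
job_types = pd.DataFrame({"id": [10, 20], "name": ["Full Time", "Contract"]})
job_job_types = pd.DataFrame({"job_id": [1, 1, 2], "jobtype_id": [10, 20, 10]})


def attach_job_types(jobs, job_types, relations):
    """Return one row per (job, job type) pair by joining through the relation table."""
    linked = relations.merge(jobs, left_on="job_id", right_on="id").drop(columns="id")
    linked = linked.merge(
        job_types.rename(columns={"name": "job_type"}),
        left_on="jobtype_id",
        right_on="id",
    ).drop(columns="id")
    return (
        linked[["job_id", "title", "job_type"]]
        .sort_values(["job_id", "job_type"])
        .reset_index(drop=True)
    )


result = attach_job_types(jobs, job_types, job_job_types)
print(result)
```

The same pattern should carry over to the city, state, country, and region relation files, with the column names adjusted accordingly.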

Here are the formats of all available download links:

Industries: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/industries.csv.gz
Company types: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/company_types.csv.gz
Companies: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/companies.csv.gz
Job types: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_types.csv.gz
Job regions: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_regions.csv.gz
Job countries: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_countries.csv.gz
Job states: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_states.csv.gz
Job cities: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_cities.csv.gz
Jobs: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs.csv.gz
Jobs (descr. as Markdown): https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs_md.csv.gz
Job embeddings only: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs_em.csv.gz
Job type relations: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_job_types.csv.gz
Job city relations: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_job_cities.csv.gz
Job state relations: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_job_states.csv.gz
Job country relations: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_job_countries.csv.gz
Job region relations: https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/job_job_regions.csv.gz

These links are designed for easy integration into automated systems, ensuring that your data remains up-to-date with minimal effort.
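For automated setups, the URL pattern above can be wrapped in a small helper. This is a minimal sketch: the helper names `download_url` and `all_download_urls` are illustrative, and only the URL pattern and file names come from the list above.

```python
BASE_URL = "https://jobdataapi.com/download"

# File stems taken from the list of download links above.
DOWNLOAD_FILES = (
    "industries", "company_types", "companies", "job_types",
    "job_regions", "job_countries", "job_states", "job_cities",
    "jobs", "jobs_md", "jobs_em",
    "job_job_types", "job_job_cities", "job_job_states",
    "job_job_countries", "job_job_regions",
)


def download_url(access_key: str, name: str) -> str:
    """Build the download URL for one file, e.g. name="jobs"."""
    if name not in DOWNLOAD_FILES:
        raise ValueError(f"unknown download file: {name!r}")
    return f"{BASE_URL}/{access_key}/{name}.csv.gz"


def all_download_urls(access_key: str) -> dict:
    """Map every known file stem to its download URL."""
    return {name: download_url(access_key, name) for name in DOWNLOAD_FILES}


if __name__ == "__main__":
    # "YOUR_DOWNLOAD_ACCESS_KEY" is a placeholder, not a working key.
    for name, url in all_download_urls("YOUR_DOWNLOAD_ACCESS_KEY").items():
        print(name, url)
```

Each resulting URL can then be passed to a downloader of your choice, for example `urllib.request.urlretrieve` from the Python standard library.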

You get access to them right after subscribing to the access ultra package by clicking "Generate new links" from your dashboard. Note that the download access key is always different from your API access key and can only be used as part of your download link URLs. Generating a new download access key invalidates the previous one along with all download URLs that use it.

Vector embedding data

The jobs_em.csv file contains all our pre-generated vector embeddings. They are currently generated with OpenAI's text-embedding-3-small model, using 768 dimensions and half-precision floats (read more about it in the Vector Embeddings and Search API Documentation). The download file contains the corresponding job ID and its embedding (field: embedding_3sh).

The CSV file might contain empty values as embedding data for some listings. This can happen due to a lag in subsequent processing of the latest jobs that have just been imported.
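The sketch below shows one way to load embeddings while skipping the empty values mentioned above. It assumes that each embedding is serialized as a JSON-style array string and that the job ID column is named `id`; neither detail is confirmed here, so inspect a few rows of the real file first. The three-dimensional vectors are made-up stand-ins for the 768-dimensional ones.

```python
import io
import json

import numpy as np
import pandas as pd

# Stand-in for a few rows of jobs_em.csv. The serialization (JSON-style array)
# and the "id" column name are assumptions; the real file may differ.
SAMPLE = io.StringIO(
    "id,embedding_3sh\n"
    '1,"[0.5, 0.5, 0.0]"\n'
    "2,\n"
    '3,"[0.0, 1.0, 0.0]"\n'
)


def parse_embedding(raw):
    """Turn one CSV cell into a float16 vector, or None if the cell is empty."""
    if not isinstance(raw, str) or not raw.strip():
        return None
    return np.asarray(json.loads(raw), dtype=np.float16)


def cosine_similarity(a, b):
    """Cosine similarity, computed in float32 to limit float16 rounding error."""
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


frame = pd.read_csv(SAMPLE)
vectors = {
    int(job_id): parse_embedding(raw)
    for job_id, raw in zip(frame["id"], frame["embedding_3sh"])
}
# Drop listings whose embedding has not been generated yet.
usable = {job_id: vec for job_id, vec in vectors.items() if vec is not None}

print(sorted(usable))
print(round(cosine_similarity(usable[1], usable[3]), 4))
```

Storing the vectors as `float16` matches the half-precision format described above and keeps memory use down, while the similarity itself is computed at higher precision.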

Sample Files and SQLite Database Demo

To help you get started, we provide sample files containing recent information on 100 companies and their 5,000 most recent jobs. Additionally, an SQLite database example demonstrates how the data can be reassembled.

Sample Files

The following sample CSV files have exactly the same format as the full data files, which subscribers can download from the dashboard under "Download access":

Industries: industries.csv
Company types: company_types.csv
Companies: companies.csv
Job types: job_types.csv
Job regions: job_regions.csv
Job countries: job_countries.csv
Job states: job_states.csv
Job cities: job_cities.csv
Jobs: jobs.csv
Jobs (descr. as Markdown): jobs_md.csv
Job embeddings only: jobs_em.csv
Job type relations: job_job_types.csv
Job city relations: job_job_cities.csv
Job state relations: job_job_states.csv
Job country relations: job_job_countries.csv
Job region relations: job_job_regions.csv

For more details on attributes and field types you can also refer to the corresponding API endpoint documentation.
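A quick way to check which attributes a file actually contains is to read only its header row. The sketch below does this with pandas; the helper name `csv_columns` is illustrative, and the demo data with its two columns is made up.

```python
import gzip
import io

import pandas as pd


def csv_columns(source, compression="infer"):
    """Read only the header row of a CSV (optionally gzip-compressed) and return its column names."""
    header = pd.read_csv(source, compression=compression, nrows=0)
    return list(header.columns)


# Self-contained demo: a gzip-compressed CSV built in memory with made-up columns.
demo = io.BytesIO(gzip.compress(b"id,title\n1,Example\n"))
print(csv_columns(demo, compression="gzip"))
```

For a file on disk, calling `csv_columns("jobs.csv.gz")` or `csv_columns("jobs.csv")` should be enough, since pandas infers the compression from the file extension by default.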

SQLite Demo Database

A full SQLite3 database filled with the sample data from the CSV files above (without the Markdown version and embeddings) can be downloaded here: demo.sqlite3

SQLite Database Schema

The example schema for the SQLite database can be found here: sqlite_demo_schema.sql
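To explore the demo database without knowing its schema in advance, you can list its tables and row counts with Python's built-in sqlite3 module. The sketch below runs against an in-memory database with a made-up table so that it is self-contained; for the real file, connect with `sqlite3.connect("demo.sqlite3")` instead.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_counts(conn):
    """Return a {table name: number of rows} mapping."""
    # The names come from sqlite_master itself, so quoting them as identifiers is enough here.
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in list_tables(conn)
    }


# Demo with a hypothetical table; the real table names are defined in sqlite_demo_schema.sql.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example_jobs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO example_jobs (title) VALUES (?)", [("A",), ("B",)])
print(row_counts(conn))
```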

Examples

Here are some simple use cases for the data download feature, along with corresponding code examples:

Use Case 1: Market Analysis

Objective: Analyze job market trends to identify high-demand skills and emerging industries.

Steps:

1. Download and load data:

# Replace <DOWNLOAD_ACCESS_KEY> with your own key; the quotes keep the shell from interpreting < and >
curl -o jobs.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs.csv.gz"
curl -o companies.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/companies.csv.gz"
gunzip jobs.csv.gz
gunzip companies.csv.gz

2. Analyze job postings:

import pandas as pd

jobs = pd.read_csv('jobs.csv')
companies = pd.read_csv('companies.csv')

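# Merge jobs with companies to get complete job data; company columns that
# clash with job columns (such as id) get a "_company" suffix
job_data = pd.merge(jobs, companies, left_on='company_id', right_on='id', suffixes=('', '_company'))

# Identify high-demand skills by counting the job descriptions that mention them
# (plain substring match, so 'Java' also counts 'JavaScript'; na=False skips empty descriptions)
skills = ['Python', 'JavaScript', 'SQL', 'Java', 'AWS']
skill_counts = {skill: job_data['description'].str.contains(skill, case=False, regex=False, na=False).sum() for skill in skills}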

print(skill_counts)

3. Visualize results:

import matplotlib.pyplot as plt

plt.bar(list(skill_counts.keys()), list(skill_counts.values()))
plt.xlabel('Skills')
plt.ylabel('Demand Count')
plt.title('High-Demand Skills in Job Postings')
plt.show()
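The substring matching in Use Case 1 counts a posting that only mentions "JavaScript" as a hit for "Java" too. One possible refinement, sketched below on made-up data, is to match whole skill names only. The helper name `count_skill_mentions` and the choice of boundary characters are illustrative and may need tuning for your skill list.

```python
import re

import pandas as pd


def count_skill_mentions(descriptions, skills):
    """Count descriptions that mention each skill as a whole token (case-insensitive)."""
    counts = {}
    for skill in skills:
        # Lookarounds instead of \b so that names ending in symbols (e.g. "C++") still match,
        # while "Java" inside "JavaScript" does not.
        pattern = rf"(?<![A-Za-z0-9]){re.escape(skill)}(?![A-Za-z0-9+#])"
        matches = descriptions.str.contains(pattern, case=False, regex=True, na=False)
        counts[skill] = int(matches.sum())
    return counts


sample = pd.Series([
    "We use Java and SQL.",
    "JavaScript and AWS experience required.",
    None,  # missing description
    "C++ or Java",
])
print(count_skill_mentions(sample, ["Java", "JavaScript", "C++"]))
# → {'Java': 2, 'JavaScript': 1, 'C++': 1}
```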

Use Case 2: Competitive Analysis

Objective: Analyze competitors' hiring patterns and job distribution.

Steps:

1. Download and load data:

# Replace <DOWNLOAD_ACCESS_KEY> with your own key; the quotes keep the shell from interpreting < and >
curl -o jobs.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs.csv.gz"
curl -o companies.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/companies.csv.gz"
gunzip jobs.csv.gz
gunzip companies.csv.gz

2. Analyze competitor hiring patterns:

import pandas as pd

jobs = pd.read_csv('jobs.csv')
companies = pd.read_csv('companies.csv')

# Filter for competitors' job postings (assuming competitor IDs are known)
competitor_ids = [101, 102, 103]  # Example competitor IDs
competitor_jobs = jobs[jobs['company_id'].isin(competitor_ids)]

# Count job postings by competitor
competitor_counts = competitor_jobs['company_id'].value_counts()

# Map company names for better readability
company_names = companies.set_index('id')['name'].to_dict()
competitor_counts.index = competitor_counts.index.map(company_names)

print(competitor_counts)

3. Visualize results:

import matplotlib.pyplot as plt

competitor_counts.plot(kind='bar')
plt.xlabel('Competitor')
plt.ylabel('Number of Job Postings')
plt.title('Competitor Hiring Patterns')
plt.show()

Use Case 3: Salary Analysis

Objective: Analyze salary distributions across different job titles and industries.

Steps:

1. Download and load data:

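# Replace <DOWNLOAD_ACCESS_KEY> with your own key; the quotes keep the shell from interpreting < and >
curl -o jobs.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs.csv.gz"
gunzip jobs.csv.gz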

2. Analyze salary data:

import pandas as pd

jobs = pd.read_csv('jobs.csv')

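# Filter out jobs with missing salary information
# (.copy() avoids pandas' SettingWithCopyWarning when adding a column below)
salary_data = jobs.dropna(subset=['salary_min', 'salary_max']).copy()

# Calculate the average salary as the midpoint of the salary range
salary_data['average_salary'] = (salary_data['salary_min'] + salary_data['salary_max']) / 2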

# Analyze by job title
title_salary = salary_data.groupby('title')['average_salary'].mean().sort_values(ascending=False)

print(title_salary.head(10))  # Top 10 job titles by average salary

3. Visualize results:

import matplotlib.pyplot as plt

title_salary.head(10).plot(kind='bar')
plt.xlabel('Job Title')
plt.ylabel('Average Salary')
plt.title('Top 10 Job Titles by Average Salary')
plt.show()
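Averages per title can be skewed by titles that appear only once or by a few extreme values. A possible variation, sketched below on made-up data, uses the median and ignores titles with too few postings. It reuses the `title`, `salary_min`, and `salary_max` columns from the example above and assumes, without confirmation, that the salary figures share the same currency and pay period. The helper name `salary_by_title` and the `min_postings` threshold are illustrative.

```python
import pandas as pd


def salary_by_title(jobs, min_postings=3):
    """Median salary midpoint per title, limited to titles with at least `min_postings` postings."""
    priced = jobs.dropna(subset=["salary_min", "salary_max"]).copy()
    priced["midpoint"] = (priced["salary_min"] + priced["salary_max"]) / 2
    summary = priced.groupby("title")["midpoint"].agg(["median", "count"])
    return summary[summary["count"] >= min_postings].sort_values("median", ascending=False)


# Made-up sample: title "A" has three complete postings, "B" only one.
sample_jobs = pd.DataFrame({
    "title": ["A", "A", "A", "B", "A"],
    "salary_min": [90, 180, 280, 500, None],
    "salary_max": [110, 220, 320, 600, 100],
})
print(salary_by_title(sample_jobs))
```

In this sample, title "B" is dropped because it has a single posting, and the row with a missing minimum salary is ignored.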

Use Case 4: Job Posting Trends Over Time

Objective: Analyze trends in job postings over time to identify seasonal patterns and growth areas.

Steps:

1. Download and load data:

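# Replace <DOWNLOAD_ACCESS_KEY> with your own key; the quotes keep the shell from interpreting < and >
curl -o jobs.csv.gz "https://jobdataapi.com/download/<DOWNLOAD_ACCESS_KEY>/jobs.csv.gz"
gunzip jobs.csv.gz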

2. Analyze job posting trends:

import pandas as pd

jobs = pd.read_csv('jobs.csv')

# Convert published date to datetime
jobs['published'] = pd.to_datetime(jobs['published'])

# Group by month and count job postings
monthly_trends = jobs.groupby(jobs['published'].dt.to_period('M')).size()

print(monthly_trends)

3. Visualize results:

import matplotlib.pyplot as plt

monthly_trends.plot(kind='line')
plt.xlabel('Month')
plt.ylabel('Number of Job Postings')
plt.title('Job Posting Trends Over Time')
plt.show()

These use cases illustrate how various stakeholders can use the data download feature to derive actionable insights from comprehensive job and company datasets, enabling informed decision-making and strategic planning.
