Warning

Zyte Automatic Extraction will be discontinued starting April 30th, 2024. It is replaced by Zyte API. See Migrating from Automatic Extraction to Zyte API.

Job Posting Extraction

Job posting extraction supports pages that contain a single job posting, as found on job boards, career sections of company websites, or other sites. Many fields are extracted, such as job title, description, salary information, and publication date.

This supports use cases such as market, technology, and competitor analysis, lead generation, and many others.

Request example

If you request a job posting extraction and the extraction succeeds, the jobPosting field is available in the query result:

from autoextract.sync import request_raw

query = [{
    'url': 'http://example.com/example-job-page',
    'pageType': 'jobPosting'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['jobPosting'])
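
Because jobPosting is only available when extraction succeeds, it can help to guard the lookup instead of indexing the result directly. The following is a minimal sketch, assuming each query result is a plain dictionary as in the example above. The helper name get_job_posting and the stand-in results are hypothetical and not part of the autoextract client:

```python
def get_job_posting(result):
    """Return the jobPosting dictionary from one query result, or None.

    Hypothetical helper: it relies only on the jobPosting key being
    absent when extraction did not succeed.
    """
    job_posting = result.get('jobPosting')
    return job_posting if isinstance(job_posting, dict) else None


# Stand-in results (made-up data) used here instead of a real API call.
results = [
    {'jobPosting': {'title': 'Regional Manager', 'url': 'https://example.com/job'}},
    {'query': {'userQuery': {'url': 'https://example.com/missing'}}},
]
titles = [
    posting.get('title')
    for posting in map(get_job_posting, results)
    if posting is not None
]
print(titles)
```

In real use, the results list would come from request_raw, as shown in the request example.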

Available fields

The following fields are available for jobPosting:

title: string

The title of the job.

datePosted: string

Publication date of the online listing. ISO-formatted with a ‘T’ separator; may contain a timezone.

validThrough: string

The date after which the posting is no longer valid, e.g. the end of an offer. ISO-formatted with a ‘T’ separator; may contain a timezone.
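
Since datePosted and validThrough are ISO-formatted strings that may or may not carry a timezone, comparing them needs some care. The sketch below is one possible approach using only the Python standard library. The helper names parse_iso and is_still_valid are hypothetical, and treating timezone-less values as UTC is an assumption, not documented behavior:

```python
from datetime import datetime, timezone


def parse_iso(value):
    """Parse an ISO-formatted string such as datePosted or validThrough.

    Assumption: values without a timezone are treated as UTC.
    """
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    if value.endswith('Z'):
        value = value[:-1] + '+00:00'
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_still_valid(job_posting, now=None):
    """Return True unless validThrough is present and already in the past."""
    valid_through = job_posting.get('validThrough')
    if valid_through is None:
        return True  # The field is optional; absence is not treated as expiry.
    now = now or datetime.now(timezone.utc)
    return now <= parse_iso(valid_through)


posting = {
    'datePosted': '2019-06-19T00:00:00',
    'validThrough': '2019-08-19T00:00:00',
}
check_time = datetime(2019, 7, 1, tzinfo=timezone.utc)
print(is_still_valid(posting, now=check_time))
```

Passing now explicitly keeps the check reproducible; omitting it compares against the current time.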

description: string

A description of the job posting including sub-headings, with newline separators.

descriptionHtml: string

Simplified HTML of the description, including sub-headings, image captions and embedded content.

employmentType: string

Type of employment (e.g. full-time, part-time, contract, temporary, seasonal, internship).

hiringOrganization: dictionary with a raw string field

Information about the organization offering the job position. Example:

{"raw": "ACME Corp."}

baseSalary: dictionary

The base salary of the job or of an employee in the proposed role. It is a dictionary with the following fields:

raw - string, the salary amount as it appears on the website.
value - float, the numeric value of the base salary.
currency - string, the currency associated with the amount.

Example:

{
    "raw": "$53,251 a year",
    "value": 53251.0,
    "currency": "$"
}

All fields are optional, except for raw.
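
Because only raw is guaranteed within baseSalary, code that displays a salary needs a fallback for missing value or currency. The following is a small sketch of that idea. The function name describe_salary, the output format, and the 'Competitive' sample value are illustrative assumptions:

```python
def describe_salary(base_salary):
    """Return a display string for a baseSalary dictionary.

    Uses value and currency when both are present, and otherwise falls
    back to raw, which is the only required field.
    """
    value = base_salary.get('value')
    currency = base_salary.get('currency')
    if value is not None and currency is not None:
        return '{} {:,.2f}'.format(currency, value)
    return base_salary['raw']


full = describe_salary({'raw': '$53,251 a year', 'value': 53251.0, 'currency': '$'})
raw_only = describe_salary({'raw': 'Competitive'})  # made-up raw-only value
print(full)      # $ 53,251.00
print(raw_only)  # Competitive
```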

jobLocation: dictionary with a raw string field

A (typically single) geographic location associated with the job position. Example:

{"raw": "West New York, NJ 07093"}

probability: float

Probability that this is a single job posting page.

url: string

URL of the page from which this job posting was extracted.

All fields are optional, except for url and probability. Fields without a valid value (null or empty array) are excluded from extraction results.
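
Since only url and probability are always present, and fields without a valid value are left out entirely, consumers should read the remaining fields defensively. A minimal sketch, assuming a jobPosting dictionary shaped as described above. The function name summarize and the 0.5 threshold are arbitrary assumptions, not recommended values:

```python
def summarize(job_posting, min_probability=0.5):
    """Return a flat summary of a jobPosting, or None below the threshold.

    The default threshold of 0.5 is an arbitrary assumption.
    """
    # url and probability are always present, so direct indexing is safe.
    if job_posting['probability'] < min_probability:
        return None
    return {
        'url': job_posting['url'],
        # Optional fields may be missing entirely, so use .get().
        'title': job_posting.get('title'),
        'employmentType': job_posting.get('employmentType'),
        # Nested dictionaries carry their text in the raw field.
        'organization': job_posting.get('hiringOrganization', {}).get('raw'),
        'location': job_posting.get('jobLocation', {}).get('raw'),
    }


sparse = {'url': 'https://example.com/job', 'probability': 0.95}
summary = summarize(sparse)
print(summary)
```

Missing optional fields come back as None in the summary rather than raising KeyError.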

Response example

Below is an example response with all job posting fields present:

[
  {
    "jobPosting": {
      "title": "Regional Manager",
      "datePosted": "2019-06-19T00:00:00",
      "validThrough": "2019-08-19T00:00:00",
      "description": "Job Description ...",
      "descriptionHtml": "<article>HTML for Job Description ...",
      "baseSalary": {
        "currency": "$",
        "raw": "$90000 gross",
        "value": 90000.0
      },
      "jobLocation": {
        "raw": "North Pole"
      },
      "hiringOrganization": {
        "raw": "ACME Corporation"
      },
      "employmentType": "Full-time",
      "probability": 0.95,
      "url": "https://example.com/job"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "jobPosting",
        "url": "https://example.com/job"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]