Scrapinghub Reference Documentation

Job Posting Extraction

If you request a job posting extraction and the extraction succeeds, the jobPosting field is available in the query result:

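import requests

# The API key is the HTTP basic auth username; the password is left empty.
response = requests.post(
    'https://autoextract.scrapinghub.com/v1/extract',
    auth=('[api key]', ''),
    # One query per list item; pageType selects job posting extraction.
    json=[{'url': 'http://example.com/example-job-page',
           'pageType': 'jobPosting'}])
# The response body is a list of results; print the first one's job posting.
print(response.json()[0]['jobPosting'])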

The following fields are available for job postings:

| Name | Type | Description |
|---|---|---|
| title | String | The title of the job. |
| datePosted | String | Publication date of the online listing. ISO-formatted with ‘T’ separator, may contain a timezone. |
| validThrough | String | The date after which the item is no longer valid, e.g. the end of an offer. ISO-formatted with ‘T’ separator, may contain a timezone. |
| description | String | A description of the job posting including sub-headings, with newline separators. |
| descriptionHtml | String | Simplified HTML of the description, including sub-headings, image captions and embedded content. |
| employmentType | String | Type of employment (e.g. full-time, part-time, contract, temporary, seasonal, internship). |
| hiringOrganization | Dictionary with a raw string field | Information about the organization offering the job position. |
| baseSalary | Dictionary with structure described below | The base salary of the job or of an employee in the proposed role. |
| jobLocation | Dictionary with a raw string field | A (typically single) geographic location associated with the job position. |
| probability | Float | Probability that this is a single job posting page. |
| url | String | URL of the page from which this job posting was extracted. |

All fields are optional, except for url and probability. Fields without a valid value (null or empty array) are excluded from extraction results.
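Because most fields can be missing, client code should avoid indexing them directly. The following is a minimal sketch of one way to read a result defensively; the helper name summarize_job_posting and the 0.5 probability threshold are assumptions made for illustration and are not part of the API. It also assumes that a result without a jobPosting key should simply be skipped.

```python
def summarize_job_posting(result, min_probability=0.5):
    """Return a small summary dict for one query result, or None.

    `min_probability` is an illustrative threshold, not an API default.
    """
    job = result.get("jobPosting")
    if job is None:
        # Assumption: a result without jobPosting is treated as "nothing extracted".
        return None
    # url and probability are the only fields documented as always present.
    if job["probability"] < min_probability:
        return None
    return {
        "url": job["url"],
        # Optional fields are read with .get() so a missing key yields None.
        "title": job.get("title"),
        "employmentType": job.get("employmentType"),
        # Nested dictionaries may be absent as well, so default to {}.
        "organization": job.get("hiringOrganization", {}).get("raw"),
        "location": job.get("jobLocation", {}).get("raw"),
    }


example = {
    "jobPosting": {
        "url": "https://example.com/job",
        "probability": 0.95,
        "title": "Regional Manager",
    }
}
print(summarize_job_posting(example))
```

Fields that were excluded from the result show up as None in the summary instead of raising a KeyError.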

The following fields are available for baseSalary:

| Name | Type | Description |
|---|---|---|
| raw | String | The monetary amount as it appears on the website. |
| value | Number | The exact value of the amount. |
| currency | String | Currency associated with the amount. |

All fields are optional, except for raw.
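Since raw is the only guaranteed field, it makes a natural fallback when value or currency is missing. The sketch below shows one possible way to build a display string from a baseSalary dictionary; the helper name format_salary and the output format are illustrative assumptions, not something the API prescribes.

```python
def format_salary(base_salary):
    """Return a display string for a baseSalary dictionary (hypothetical helper)."""
    value = base_salary.get("value")
    currency = base_salary.get("currency")
    if value is not None and currency:
        # Both optional fields are present: build a normalized string.
        return f"{value:,.2f} {currency}"
    # raw is the only required field, so it is always available as a fallback.
    return base_salary["raw"]


print(format_salary({"raw": "$90000 gross", "value": 90000.0, "currency": "$"}))
print(format_salary({"raw": "Competitive salary"}))
```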

Below is an example response with all job posting fields present:

[
  {
    "jobPosting": {
      "title": "Regional Manager",
      "datePosted": "2019-06-19T00:00:00",
      "validThrough": "2019-08-19T00:00:00",
      "description": "Job Description ...",
      "descriptionHtml": "<article>HTML for Job Description ...",
      "baseSalary": {
        "currency": "$",
        "raw": "$90000 gross",
        "value": 90000.0
      },
      "jobLocation": {
        "raw": "North Pole"
      },
      "hiringOrganization": {
        "raw": "ACME Corporation"
      },
      "employmentType": "Full-time",
      "probability": 0.95,
      "url": "https://example.com/job"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "jobPosting",
        "url": "https://example.com/job"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
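The datePosted and validThrough strings in a response like the one above can be turned into datetime objects with the Python standard library. This is a hedged sketch: the helper names are hypothetical, treating timestamps without a timezone as UTC is an assumption, and so is treating a posting without validThrough as still open. The trailing "Z" handling is a precaution, since the documentation only says the value may contain a timezone.

```python
from datetime import datetime, timezone


def parse_listing_date(value):
    """Parse an ISO-formatted date string into an aware datetime."""
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
    # so rewrite it as an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without a timezone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_still_valid(job, now=None):
    """Return True if the posting has not passed its validThrough date."""
    valid_through = job.get("validThrough")
    if valid_through is None:
        # Assumption: a posting without an expiry date is considered open.
        return True
    now = now or datetime.now(timezone.utc)
    return now <= parse_listing_date(valid_through)


job = {"datePosted": "2019-06-19T00:00:00", "validThrough": "2019-08-19T00:00:00"}
print(parse_listing_date(job["datePosted"]).isoformat())
print(is_still_valid(job, now=datetime(2019, 7, 1, tzinfo=timezone.utc)))
```

Passing now explicitly keeps the check deterministic, which is convenient when testing against stored responses.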