Scrapinghub API Reference

Unified Schema

The Unified Schema project aims to provide a standard definition for the different types of data such as products, articles, reviews, jobs etc. extracted across websites.

Note

All fields in the AutoExtract have the exact same definition in the Unified Schema. We also aim to maintain backward compatibility while adding new fields. We also try our best to adhere to schema.org, only diverging when there is a reasonable benefit in doing so.

Product Schema

The following fields are available for products:

Field

Format

Description

aggregateRating

  • Type: Dictionary

  • Fields:

    1. ratingValue Number

    2. bestRating Number

    3. reviewCount Number

The overall rating, based on a collection of reviews or ratings

rating

{
 'ratingValue': 4.0,
 'bestRating': 5.0,
 'reviewCount': 23
}

additionalProperty

  • Type: List

  • Items: Dictionary

  • Fields:
    1. name String

    2. value String Float List Dictionary

This name-value pair field holds information pertaining to product specific features that have no matching property in the Product schema.

product_info

[{"name": "batteries",
  "value": "1 Lithium ion batteries required. (included)"},
 {"name": "Item model number",
  "value": "SM-A105G/DS"}]

brand

  • Type: String

The brand associated with the product

brand

{"brand": "Samsung"}

brand_not

No brand is returned

breadCrumbs

  • Type: List

  • Fields:
    1. name String

    2. link String

A list of breadcrumbs with optional name and URL.

[{"name": ""Cell Phones & Accessories"",
  "link": "https://mjz.com/cell-phones-accessories"}...]

description

  • Type: String

A description of the product

gtin

  • Type: List

  • Items: Dictionary

  • Fields:
    1. type String

    2. value String

Standardized GTIN product identifier which is unique for a product across different sellers. It includes the following type: isbn10, isbn13, issn, ean13, upc, ismn, gtin8, gtin14. gtin14 corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code.ean13 also includes the jan (japnese article number)

[{'type': 'isbn13', 'value': '9781933624341'}]

images

  • Type: List

  • Items: String

A list of URL or data URL values of all images of the product (may include the main image).

mainImage

  • Type: String

A URL or data URL value of the main image of the product.

mpn

  • Type: String

The Manufacturer Part Number (MPN) of the product. The product would have the same MPN across different e-commerce websites.

name

  • Type: String

The name of the product

offers

This field contains rich information pertaining to all the buying options offered on a product. Detailed information regarding all the properties returned in this field is available in the offers section. offers2

[{
    "availability":"InStock",
    "price":"129.99",
    "currency":"$"
    "itemCondition":{
    "type":"used",
 "description":"Used - Very Good"
 },
 "seller":{
 "name":"Merch Store",
 "url":"https://mzi.com/dr/amg/seller=A8K32FFKI51FKN",
 "identifier":"A8K32FFKI51FKN",
 "aggregateRating":{
 "reviewCount":479,
 "bestRating":5
 },
 "shippingInfo":{
 "minDays":"15",
 "maxDays":"30",
 "description":"Arrives between September 3-18."
 }
 }
 }]

ratingHistogram

  • Type: List

  • Items: Dictionary

  • Fields:
    1. ratingValue String

    2. ratingCount Number

    3. ratingPercentage Number

This fields provides the detailed distribution of ratings across the entire rating scale

histogram

[{"ratingValue": "5", "ratingPercentage": 61},
 {"ratingValue": "4", "ratingPercentage": 12}
 {"ratingValue": "3", "ratingPercentage": 6},
 {"ratingValue": "2", "ratingPercentage": 5}
 {"ratingValue": "1", "ratingPercentage": 16}]

releaseDate

Date on which the product was released or listed on the website in ISO 8601 date format

{"releaseDate": "2016-12-18"}

relatedProducts

  • Type: List

  • Items: Dictionary

  • Fields:
    1. relationshipName String

    2. products List

This field captures all products that are recommended by the website while browsing the product of interest. Related products can thus be used to gauge customer buying behaviour, sponsored products as well best sellers in the same category. The relationshipName field describes the relationship while the products field contains a list of items have the same product schema, thus extracting all available fields as defined in this table

related_products

variants

  • Type: List

  • Items: Product

This field returns a list of variants of the product. Each variant has the same schema as the Product schema defined in this table.

sku

  • Type: String

The Stock Keeping Unit (SKU) i.e. a merchant-specific identifier for the product

sku

{"sku": "A123DK9823"}

width

  • Type: String

The width of the product

height

  • Type: String

The height of the product

depth

  • Type: String

The depth of the product

weight

  • Type: String

The weight of the product

volume

  • Type: String

The volume of the product

url

  • Required

  • Type: String

The URL of the product

offers

The offers field contains several fields as explained below that can be leveraged to get deep insights into the various product offerings, associated seller information as well as inventory.

eligibleQuantity

This field gives details about bulk purchase offers available for the product.

Field

Format

Description

maxValue

Number

Maximum value allowed.

minValue

Number

Minimum value required

value

Number

Exact value required

unitText

String

Unit of measurement

description

String

Free text from where this range was extracted

Let’s take the following example to examine the aforementioned fields

bulk_offer

{'offers': [
   {'price': '11,98', 'currency': '$'},
   {'price': '10,78', 'currency': '$', 'eligibleQuantity': {'min_value': '48', 'description': 'Buy 44 or more $9.33'}}
  ]
}

availableAtOrFrom

The place(s) from which the offer can be obtained (e.g. store locations). It could contain a string, i.e.: online_only

Field

Format

Description

postalCode

String

Postal code of the address

streetAddress

String

The street address. For example, 1600 Amphitheatre Pkwy.

addressCountry

String

The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code. https://en.wikipedia.org/wiki/ISO_3166-1

addressLocality

String

The locality in which the street address is, and which is in the region. For example, Mountain View.

addressRegion

String

The region in which the locality is, and which is in the country. For example, California.

areaServed

The geographic area where a service or offered item is provided. The fields and the definition is the same as availableAtOrFrom.

shippingInfo

Field

Format

Description

currency

String

Currency associated to the price

price

String

Cost of shipping

minDays

Number

Minimum number of days estimated for the delivery

maxDays

Number

Maximum number of days estimated for the delivery

averageDays

Number

Average days for a delivery

description

String

Any associated text describing the shipping info

originAddress

String or postalAddress

Location of the warehouse where the item is shipped from

seller

This field provides the seller details including rating.

Field

Format

Description

name

String

Name of the seller

url

String

URL for the seller’s page

identifier

String

Unique identifier assigned to the seller on the website

aggregateRating

Dictionary

The sellers rating. Same as aggregateRating in the product schema.

itemCondition

A predefined value and a textual description of the condition of the product included

Field

Format

Description

type

String

A predefined value of the condition of the product included in the offer. Takes on one of the following enumerated values ['NewCondition', 'DamagedCondition', 'RefurbishedCondition', 'UsedCondition']

description

String

A textual description of the condition of the product included in the offer

Article Schema

The following fields are available for articles:

Name

Type

Description

headline

String

Article headline or title.

datePublished

String

Date, ISO-formatted with ‘T’ separator, may contain a timezone.

datePublishedRaw

String

Same date but before parsing, as it appeared on the site.

author

String

Author (or authors) of the article.

authorsList

List of strings

All authors of the article split into separate strings, for example the author value might be "Alice and Bob" and authorList value ["Alice", "Bob"], while for a single author author value might be "Alice Johnes" and authorList value ["Alice Johnes"].

inLanguage

String

Language of the article, as an ISO 639-1 language code.

breadcrumbs

List of dictionaries with name and link optional string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL.

mainImage

String

A URL or data URL value of the main image of the article.

images

List of strings

A list of URL or data URL values of all images of the article (may include the main image).

description

String

A short summary of the article, human-provided if available, or auto-generated.

articleBody

String

Text of the article, including sub-headings and image captions, with newline separators.

articleBodyRaw

String

html of the article body.

videoUrls

List of strings

A list of URLs of all videos inside the article body.

audioUrls

List of strings

A list of URLs of all audios inside the article body.

probability

Float

Probability that this is a single article page.

url

String

URL of page where this article was extracted.