Scrapinghub Reference Documentation

Product Extraction

If you request a product extraction and the extraction succeeds, the product field is available in the query result:

from autoextract.sync import request_raw

query = [{
    'url': 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
    'pageType': 'product'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['product'])
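Because the product field is present only when the extraction succeeds, it can help to guard the lookup instead of indexing directly. The following is a minimal sketch, assuming each result is a plain dictionary as in the snippet above. The helper name get_product and the sample data are hypothetical and not part of the autoextract client.

```python
def get_product(result):
    """Return the 'product' dictionary from one query result, or None.

    get_product is a hypothetical helper, not part of autoextract.
    """
    product = result.get('product')
    # Treat a missing or non-dictionary value as "no product extracted".
    return product if isinstance(product, dict) else None


# Hand-written stand-ins for real query results.
sample_results = [
    {'product': {'name': 'Product name', 'probability': 0.95,
                 'url': 'https://example.com/product'}},
    {'query': {'id': 'no-product-here'}},
]
products = [get_product(r) for r in sample_results]
```

With this data, the first entry of products is the product dictionary and the second is None, so callers can skip results that carry no product.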

The following fields are available for product:

Name

Type

Description

name

String

The name of the product.

offers

List of dictionaries with price, currency, regularPrice and availability string fields

Offers of the product.

All fields are optional, but currency is present only if price is also present.

The price field is a string with a valid number (a dot is used as decimal separator). It is the price a customer has to pay after discounts or special offers.

currency is the currency as given on the website, without extra normalization (for example, both “$” and “USD” are possible currencies). It is present only if price is also present.

regularPrice is the price before any discount or special offer. It is present only when the price is different from regularPrice.

availability is the product availability. It can be:

  • "InStock"

    Includes limited availability, presale, preorder, and in-store only.

  • "OutOfStock"

    Includes discontinued and sold out.

sku

String

Stock Keeping Unit identifier for the product assigned by the seller.

mpn

String

Manufacturer part number identifier for the product. It is issued by the manufacturer and is the same across different websites for a given product.

gtin

List of dictionaries with type and value string fields

Standardized GTIN product identifier, which is unique for a product across different sellers. The following types are supported: isbn10, isbn13, issn, ean13, upc, ismn, gtin8, gtin14. gtin14 corresponds to the former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, and UPC Shipping Container Code. ean13 also includes JAN (Japanese Article Number). For example: [{'type': 'isbn13', 'value': '9781933624341'}]

brand

String

Brand or manufacturer of the product.

breadcrumbs

List of dictionaries with optional name and link string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL.

mainImage

String

A URL or data URL value of the main image of the product.

images

List of strings

A list of URL or data URL values of all images of the product (may include the main image).

description

String

Description of the product.

aggregateRating

Dictionary with ratingValue, bestRating float fields and reviewCount int field

ratingValue is the average rating value. bestRating is the best possible rating value. reviewCount is the number of reviews or ratings for the product. All fields are optional, but at least one of reviewCount or ratingValue is present.

additionalProperty

List of dictionaries with name and value fields

A list of product properties or characteristics. The name field contains the property name, and the value field contains the property value.

probability

Float

Probability that the requested page is a single product page.

url

String

URL of the page from which this product was extracted.

All fields are optional, except for url and probability. Fields without a valid value (null or empty array) are excluded from extraction results.
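Since most fields can be absent and price is delivered as a string, consumers typically need to handle missing keys and convert prices explicitly. The following is a minimal sketch, assuming an offer dictionary shaped as described for the offers field. The helper parse_offer and its output keys are hypothetical, and falling back to price when regularPrice is absent is one possible reading of the rule that regularPrice appears only when it differs from price.

```python
from decimal import Decimal, InvalidOperation


def parse_offer(offer):
    """Convert one offer dictionary into normalized Python values.

    Hypothetical helper; every key of the offer is treated as optional.
    """
    def to_decimal(value):
        # price and regularPrice are strings with a dot decimal separator.
        if value is None:
            return None
        try:
            return Decimal(value)
        except (InvalidOperation, TypeError, ValueError):
            return None

    price = to_decimal(offer.get('price'))
    regular = to_decimal(offer.get('regularPrice'))
    availability = offer.get('availability')
    return {
        'price': price,
        # currency is kept as given on the website ("$", "USD", ...).
        'currency': offer.get('currency'),
        # Assumption: an absent regularPrice means no discount applies.
        'regular_price': regular if regular is not None else price,
        # None means the availability is unknown.
        'in_stock': None if availability is None else availability == 'InStock',
    }


parsed = parse_offer({'price': '42', 'currency': 'USD', 'availability': 'InStock'})
discounted = parse_offer({'price': '8.50', 'regularPrice': '10.00', 'currency': '$'})
```

Decimal is used rather than float so that prices such as "8.50" keep their exact decimal value. The currency is passed through unchanged because it is not normalized by the extraction.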

Below is an example response with all product fields present:

[
  {
    "product": {
      "name": "Product name",
      "offers": [
        {
          "price": "42",
          "currency": "USD",
          "availability": "InStock"
        }
      ],
      "sku": "product sku",
      "mpn": "product mpn",
      "gtin": [
        {
          "type": "ean13",
          "value": "978-3-16-148410-0"
        }
      ],
      "brand": "product brand",
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "description": "product description",
      "aggregateRating": {
        "ratingValue": 4.5,
        "bestRating": 5.0,
        "reviewCount": 31
      },
      "additionalProperty": [
        {
          "name": "property 1",
          "value": "value of property 1"
        }
      ],
      "probability": 0.95,
      "url": "https://example.com/product"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a2",
      "domain": "example.com",
      "userQuery": {
        "pageType": "product",
        "url": "https://example.com/product"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
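A response shaped like this example can be consumed with the standard json module. The sketch below works on a trimmed copy of the example response. The function names confident_products and gtin_by_type, as well as the 0.5 threshold, are illustrative assumptions and not values recommended by the service.

```python
import json

# Trimmed copy of the example response.
RESPONSE_TEXT = '''
[
  {
    "product": {
      "name": "Product name",
      "gtin": [{"type": "ean13", "value": "978-3-16-148410-0"}],
      "probability": 0.95,
      "url": "https://example.com/product"
    },
    "query": {"userQuery": {"pageType": "product",
                            "url": "https://example.com/product"}}
  }
]
'''


def confident_products(results, threshold=0.5):
    """Yield products whose probability is at least `threshold`.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    for result in results:
        product = result.get('product')
        # probability is one of the two fields that are always present.
        if product and product['probability'] >= threshold:
            yield product


def gtin_by_type(product):
    """Map each GTIN type (e.g. 'ean13') to its value; gtin is optional."""
    return {g['type']: g['value'] for g in product.get('gtin', [])}


results = json.loads(RESPONSE_TEXT)
for product in confident_products(results):
    print(product['url'], gtin_by_type(product))
```

Filtering on probability first keeps pages that are unlikely to be single product pages out of later processing. A suitable threshold depends on the use case.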