Product Extraction

Request example

If you request a product extraction, and the extraction succeeds, then the product field is available in the query result:

from autoextract.sync import request_raw

query = [{
    'url': 'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
    'pageType': 'product'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['product'])
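
Because the product field is only available when the extraction succeeds, it may be worth reading it defensively. The following is a minimal sketch, assuming each result is a plain dictionary shaped like the response example; get_product is a hypothetical helper for illustration and is not part of the autoextract client:

```python
def get_product(result):
    """Return the 'product' dictionary of one query result, or None.

    Hypothetical helper: assumes `result` is a dict like the items
    returned by request_raw.
    """
    product = result.get('product')
    return product if isinstance(product, dict) else None


# Illustrative inputs, not real API output.
ok = {'product': {'name': 'A Light in the Attic',
                  'url': 'http://example.com/p',
                  'probability': 0.9}}
failed = {'query': {'userQuery': {'pageType': 'product'}}}

print(get_product(ok)['name'])
print(get_product(failed))
```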

Available fields

The following fields are available for product:

name: string

The name of the product.

offers: list of dictionaries

Product offers. Each offer may contain price, currency, regularPrice and availability string fields. All fields are optional but currency is present only if price is also present.

  • price is a string containing a valid number (a dot is used as the decimal separator). It is the price a customer has to pay after discounts or special offers.

  • currency is the currency as given on the website, without extra normalization (for example, both “$” and “USD” are possible currencies). It is present only if price is also present.

  • regularPrice is the price before any discount or special offer. It is present only when the price is different from regularPrice.

  • availability is the product availability, as a string. Allowed values:

    • "InStock" - includes limited availability, presale, preorder, and in-store only.

    • "OutOfStock" - includes discontinued and sold out.

Example:

[
  {
    "price": "42",
    "regularPrice": "45.00",
    "currency": "USD",
    "availability": "InStock"
  }
]
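
Since price and regularPrice are strings, they need to be converted before any arithmetic. A minimal sketch, assuming the values parse as decimal numbers as described above; the discount helper is hypothetical, and Decimal is used only to avoid floating-point rounding:

```python
from decimal import Decimal


def discount(offer):
    """Return regularPrice - price as a Decimal, or None if either is missing."""
    price = offer.get('price')
    regular = offer.get('regularPrice')
    if price is None or regular is None:
        return None
    return Decimal(regular) - Decimal(price)


offer = {
    "price": "42",
    "regularPrice": "45.00",
    "currency": "USD",
    "availability": "InStock",
}
print(discount(offer))  # 3.00
```

The currency value is left untouched here because it is not normalized ("$" and "USD" are both possible).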

sku: string

Stock Keeping Unit identifier for the product assigned by the seller.

mpn: string

Manufacturer part number identifier for the product. It is issued by the manufacturer and is the same across different websites for a given product.

gtin: list of dictionaries with type and value string fields

Standardized GTIN product identifier, which is unique for a product across different sellers. The following types are supported: isbn10, isbn13, issn, ean13, upc, ismn, gtin8, gtin14.

gtin14 corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code.

ean13 also includes the JAN (Japanese Article Number). Example:

[{"type": "isbn13", "value": "9781933624341"}]
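
If you want to sanity-check an extracted value, the standard GTIN check digit can be verified locally. This is a sketch under assumptions: it covers only the 8, 12, 13 and 14 digit forms (gtin8, upc, ean13, isbn13, gtin14), not isbn10 or issn, which use a different scheme. The function name is hypothetical, and hyphens and spaces are stripped because values may contain separators:

```python
def gtin_check_digit_ok(value):
    """Check the final digit of an 8, 12, 13 or 14 digit GTIN-style code."""
    cleaned = value.replace('-', '').replace(' ', '')
    if len(cleaned) not in (8, 12, 13, 14):
        return False
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    *body, check = [int(c) for c in cleaned]
    # Weights alternate 3, 1, 3, ... starting from the digit
    # immediately to the left of the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


gtin = [{"type": "isbn13", "value": "9781933624341"}]
print(gtin_check_digit_ok(gtin[0]["value"]))  # True
```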

brand: string

Brand or manufacturer of the product.

breadcrumbs: list of dictionaries with name and link optional string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL. Example:

[
  {"name": "Foo", "link": "http://example.com/foo"},
  {"name": "Bar", "link": "http://example.com/foo/bar"},
  {"name": "Baz"}
]
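
Because both name and link are optional, code that consumes breadcrumbs should not assume either key exists. A minimal sketch; breadcrumb_path and the " > " separator are illustrative choices, not part of the API:

```python
def breadcrumb_path(breadcrumbs, separator=' > '):
    """Join breadcrumb names in order, skipping entries without a name."""
    return separator.join(b['name'] for b in breadcrumbs if b.get('name'))


breadcrumbs = [
    {"name": "Foo", "link": "http://example.com/foo"},
    {"name": "Bar", "link": "http://example.com/foo/bar"},
    {"name": "Baz"},
]
print(breadcrumb_path(breadcrumbs))  # Foo > Bar > Baz
```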

mainImage: string

A URL or data URL value of the main image of the product.

images: list of strings

A list of URL or data URL values of all images of the product (may include the main image).

description: string

Description of the product.

aggregateRating: dictionary

Aggregate information about the product rating and reviews.

  • ratingValue is the average rating value, as a float.

  • bestRating is the best possible rating value, as a float.

  • reviewCount is the number of reviews or ratings for the product, as an integer.

Example - 4.5 out of 5, based on 12 reviews:

{
  "ratingValue": 4.5,
  "bestRating": 5,
  "reviewCount": 12
}

All fields are optional, but at least one of reviewCount or ratingValue must be present.
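
Since any of these fields may be missing, a rating is best normalized only when both ratingValue and bestRating are available. A minimal sketch; rating_fraction is a hypothetical helper:

```python
def rating_fraction(rating):
    """Return ratingValue / bestRating, or None when it cannot be computed."""
    value = rating.get('ratingValue')
    best = rating.get('bestRating')
    # A missing or zero bestRating makes the ratio undefined.
    if value is None or not best:
        return None
    return value / best


rating = {"ratingValue": 4.5, "bestRating": 5, "reviewCount": 12}
print(rating_fraction(rating))  # 0.9
```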

additionalProperty: list of dictionaries with name and value fields

A list of product properties or characteristics.

  • name field contains the property name,

  • value field contains the property value.

Example:

[
  {"name": "color", "value": "blue"},
  {"name": "brand", "value": "McBrand"},
  {"name": "best for", "value": "special events"}
]
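
For lookups by property name, the list can be folded into a dictionary. A minimal sketch, assuming property names are unique within a product; if a name repeats, the last value wins. properties_to_dict is a hypothetical helper:

```python
def properties_to_dict(properties):
    """Map property names to values, skipping incomplete entries."""
    return {
        p['name']: p['value']
        for p in properties
        if 'name' in p and 'value' in p
    }


additional_property = [
    {"name": "color", "value": "blue"},
    {"name": "brand", "value": "McBrand"},
    {"name": "best for", "value": "special events"},
]
print(properties_to_dict(additional_property)["color"])  # blue
```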

probability: float

Probability that the requested page is a single product page.

url: string

URL of the page from which this product was extracted.

All fields are optional, except for url and probability.

Fields without a valid value (null or empty array) are excluded from extraction results.
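
In practice this means url and probability can be indexed directly, while every other field should be read with a default. A minimal sketch; the summarize helper and the 0.5 threshold are assumptions chosen for illustration, not recommended values:

```python
def summarize(product, min_probability=0.5):
    """Return a small summary, or None if the page is unlikely to be a product."""
    # url and probability are always present; other fields may be absent.
    if product['probability'] < min_probability:
        return None
    return {
        'url': product['url'],
        'name': product.get('name'),
        'offers': product.get('offers', []),
    }


product = {'url': 'https://example.com/product', 'probability': 0.95}
print(summarize(product))
```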

Response example

Below is an example response with all product fields present:

[
  {
    "product": {
      "name": "Product name",
      "offers": [
        {
          "price": "42",
          "regularPrice": "45.00",
          "currency": "USD",
          "availability": "InStock"
        }
      ],
      "sku": "product sku",
      "mpn": "product mpn",
      "gtin": [
        {
          "type": "ean13",
          "value": "978-3-16-148410-0"
        }
      ],
      "brand": "product brand",
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "description": "product description",
      "aggregateRating": {
        "ratingValue": 4.5,
        "bestRating": 5.0,
        "reviewCount": 31
      },
      "additionalProperty": [
        {
          "name": "property 1",
          "value": "value of property 1"
        }
      ],
      "probability": 0.95,
      "url": "https://example.com/product"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a2",
      "domain": "example.com",
      "userQuery": {
        "pageType": "product",
        "url": "https://example.com/product"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]