Scrapinghub Reference Documentation

Product review extraction (beta)

If you request product review extraction and the extraction succeeds, the productReviews field is available in the query result:

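import requests

# Request product review extraction for a single URL.
# The API key is sent as the HTTP Basic Auth username, with an empty password.
response = requests.post(
    'https://autoextract.scrapinghub.com/v1/extract',
    auth=('[api key]', ''),
    json=[{'url': 'https://example.com/product-review',
           'pageType': 'productReviews'}])
# The response body is a JSON list; take the first query result.
print(response.json()[0]['productReviews'])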
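The request example above indexes the result directly, which raises a KeyError when productReviews is absent, for instance when extraction did not succeed. A minimal sketch of a more defensive lookup follows; the helper name get_product_reviews is hypothetical and not part of the API.

```python
from typing import Any, Optional


def get_product_reviews(results: list) -> Optional[Any]:
    """Return the productReviews dictionary of the first query result.

    Hypothetical helper: returns None when the result list is empty or
    when the productReviews field is missing from the first result.
    """
    if not results:
        return None
    return results[0].get("productReviews")
```

With the request above, it would be called as get_product_reviews(response.json()).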

The following fields are available for product reviews:

| Name | Type | Description |
|------|------|-------------|
| url | String | URL of a page where reviews were extracted. |
| reviews | List of dictionaries | List of reviews, individual fields described below. |

The url field is required.

Each review in the reviews field has the following fields available:

| Name | Type | Description |
|------|------|-------------|
| name | String | Title, header or name of the review. |
| reviewBody | String | Text of the review, with newline separators. |
| reviewRating | Dictionary with ratingValue and optional bestRating float fields | ratingValue is a number representing the rating given by the reviewer, and bestRating is the best possible rating, if known. |
| datePublished | String | Publication date. ISO-formatted with ‘T’ separator, may contain a timezone. |
| datePublishedRaw | String | Same date but before parsing, as it appeared on the site. |
| votedHelpful | Integer | Number of votes that consider this review helpful (upvotes). |
| votedUnhelpful | Integer | Number of votes that consider this review unhelpful (downvotes). |
| isVerified | Boolean | Whether the reviewer has been verified as a genuine person or a real owner or buyer of the reviewed product. |
| probability | Float | Probability that this is a review. |

Reviews refer to a product available on the same page. All fields are optional, except for probability.
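Because probability is the only field guaranteed to be present on a review, it is a natural key for filtering. The sketch below keeps only reviews at or above a cutoff; the function name and the default threshold of 0.5 are assumptions chosen for illustration, not values recommended by the API.

```python
def confident_reviews(product_reviews, min_probability=0.5):
    """Return reviews whose probability is at least min_probability.

    product_reviews is the productReviews dictionary from a query result.
    The reviews list may be missing, so an empty list is used as a fallback.
    """
    return [
        review
        for review in product_reviews.get("reviews", [])
        if review["probability"] >= min_probability
    ]
```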

Fields without a valid value (null or empty array) are excluded from extraction results.
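Since fields without a valid value are left out of the result, client code should not assume any review key other than probability exists. The following sketch reads a review defensively, scales the rating to the 0 to 1 range when bestRating is known, and parses datePublished. The function name and the output keys are hypothetical. Note that datetime.fromisoformat accepts a trailing "Z" timezone only from Python 3.11 onward.

```python
from datetime import datetime


def summarize_review(review):
    """Build a flat summary of one review, using None for absent fields."""
    rating = review.get("reviewRating") or {}
    value = rating.get("ratingValue")
    best = rating.get("bestRating")
    # Normalize only when both values are known and bestRating is non-zero.
    normalized = value / best if value is not None and best else None

    published = review.get("datePublished")
    return {
        "title": review.get("name"),
        "normalized_rating": normalized,
        "published": datetime.fromisoformat(published) if published else None,
        "helpful": review.get("votedHelpful"),
        "verified": review.get("isVerified"),
        "probability": review["probability"],
    }
```

A missing votedHelpful is reported as None rather than 0 here, because an excluded field means the value is unknown, not that there were no votes.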

Below is an example response with all product review fields present:

[
  {
    "productReviews": {
      "url": "https://example.com/product-review",
      "reviews": [
        {
          "name": "A great tool!",
          "reviewBody": "AutoExtract is a great tool for review extraction",
          "reviewRating": {
            "ratingValue": 5.0,
            "bestRating": 5.0
          },
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "votedHelpful": 12,
          "votedUnhelpful": 1,
          "isVerified": true,
          "probability": 0.95
        },
        {
          "name": "Another review",
          "probability": 0.95
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "productReviews",
        "url": "https://example.com/product-review"
      }
    }
  }
]
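The response is a list of results, and each result echoes the original request under query.userQuery, as the example shows. When several URLs are sent in one request, this can be used to pair reviews with the URL that was asked for. The sketch below assumes every result carries the query.userQuery.url key seen in the example; the function name is hypothetical.

```python
def reviews_by_requested_url(results):
    """Map each requested URL to the list of reviews extracted for it."""
    indexed = {}
    for result in results:
        requested_url = result["query"]["userQuery"]["url"]
        # productReviews or its reviews list may be absent; fall back to [].
        product_reviews = result.get("productReviews", {})
        indexed[requested_url] = product_reviews.get("reviews", [])
    return indexed
```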