Product Review Extraction (beta)

Request example

If you request product review extraction and the extraction succeeds, the productReviews field is available in the query result:

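from autoextract.sync import request_raw

# One query per page; pageType selects product review extraction.
query = [{
    'url': 'https://example.com/product-review',
    'pageType': 'productReviews'
}]
# Replace '[api key]' with your own API key.
results = request_raw(query, api_key='[api key]')
# productReviews is only present if the extraction succeeded.
print(results[0]['productReviews'])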

Available fields

Top-level

The following fields are available for productReviews:

url: string, required

URL of the page from which the reviews were extracted.

reviews: list of dictionaries

List of reviews. Individual fields are described below.

Individual reviews

Each review in the reviews field has the following fields available:

name: String

Title, header, or name of the review.

reviewBody: String

Text of the review, with newline separators.

reviewRating: Dictionary

Information about the rating of the review. Fields:

  • ratingValue is a number representing the rating given by the reviewer.

  • bestRating is the best possible rating, if known.

For example, “3 out of 5” would look like this:

{"ratingValue": 3, "bestRating": 5}
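
Because sites use different rating scales, it can help to express a rating as a fraction of bestRating before comparing reviews. The following is a minimal sketch, not part of the API: normalized_rating is a hypothetical helper name, and it returns None whenever reviewRating, ratingValue, or bestRating is missing.

```python
from typing import Optional


def normalized_rating(review: dict) -> Optional[float]:
    """Return ratingValue as a fraction of bestRating, or None if unknown.

    Hypothetical helper: the API itself only returns the raw values.
    """
    rating = review.get("reviewRating") or {}
    value = rating.get("ratingValue")
    best = rating.get("bestRating")
    # bestRating is only present "if known"; also guard against a zero scale.
    if value is None or not best:
        return None
    return value / best


# "3 out of 5" from the example above
print(normalized_rating({"reviewRating": {"ratingValue": 3, "bestRating": 5}}))  # 0.6
```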

datePublished: String

Publication date. ISO-formatted with ‘T’ separator, may contain a timezone.

datePublishedRaw: String

Same date as datePublished, but before parsing/normalisation, i.e. as it appears on the website.
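
Since datePublished is ISO-formatted, it can usually be parsed with the Python standard library. The sketch below assumes the value is accepted by datetime.fromisoformat; parse_date_published is a hypothetical helper name. Values without a timezone yield a naive datetime, and a trailing "Z" suffix is only accepted from Python 3.11 onwards, so unparseable values fall back to None here.

```python
from datetime import datetime
from typing import Optional


def parse_date_published(review: dict) -> Optional[datetime]:
    """Parse the optional datePublished field, or return None.

    Hypothetical helper: returns None if the field is missing
    or cannot be parsed by datetime.fromisoformat.
    """
    raw = review.get("datePublished")
    if not raw:
        return None
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        return None


print(parse_date_published({"datePublished": "2020-01-30T00:00:00"}))  # 2020-01-30 00:00:00
```

When parsing fails, datePublishedRaw still holds the date as it appeared on the website.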

votedHelpful: Integer

Number of votes marking this review as helpful (upvotes).

votedUnhelpful: Integer

Number of votes marking this review as unhelpful (downvotes).

isVerified: Boolean

Whether the reviewer has been verified as a genuine person or as a real owner or buyer of the reviewed product.

probability: Float

Probability that this is a review.
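
One possible use of probability is to discard low-confidence items before further processing. The sketch below is illustrative only: confident_reviews is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption, not a value recommended by the API. It relies on probability being the one review field that is always present.

```python
def confident_reviews(reviews, threshold=0.5):
    """Keep only the reviews whose probability is at least `threshold`.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    return [review for review in reviews if review["probability"] >= threshold]


sample = [
    {"name": "A great tool!", "probability": 0.95},
    {"name": "Possibly not a review", "probability": 0.2},
]
print([review["name"] for review in confident_reviews(sample)])  # ['A great tool!']
```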

Reviews refer to a product available on the same page. All fields are optional, except for probability.

Fields without a valid value (null or empty array) are excluded from extraction results.
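
Because missing fields are omitted rather than set to null, client code should not index optional keys directly. A minimal sketch of one way to handle this, using dict.get to fill in None for absent fields; review_to_row and REVIEW_FIELDS are hypothetical names introduced for illustration:

```python
# Optional scalar fields of an individual review, as listed above.
REVIEW_FIELDS = (
    "name", "reviewBody", "datePublished", "datePublishedRaw",
    "votedHelpful", "votedUnhelpful", "isVerified",
)


def review_to_row(review: dict) -> dict:
    """Flatten one review into a dict with a fixed set of keys.

    Hypothetical helper: absent optional fields become None.
    """
    row = {field: review.get(field) for field in REVIEW_FIELDS}
    rating = review.get("reviewRating") or {}
    row["ratingValue"] = rating.get("ratingValue")
    row["bestRating"] = rating.get("bestRating")
    # probability is the only field documented as always present.
    row["probability"] = review["probability"]
    return row


row = review_to_row({"name": "Another review", "probability": 0.95})
print(row["name"], row["reviewBody"], row["probability"])  # Another review None 0.95
```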

Response example

Below is an example response with all product review fields present:

[
  {
    "productReviews": {
      "url": "https://example.com/product-review",
      "reviews": [
        {
          "name": "A great tool!",
          "reviewBody": "AutoExtract is a great tool for review extraction",
          "reviewRating": {
            "ratingValue": 5.0,
            "bestRating": 5.0
          },
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "votedHelpful": 12,
          "votedUnhelpful": 1,
          "isVerified": true,
          "probability": 0.95
        },
        {
          "name": "Another review",
          "probability": 0.95
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "productReviews",
        "url": "https://example.com/product-review"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
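
A response of this shape can be walked with plain dictionary access. The sketch below assumes only the structure shown in the example above; iter_reviews is a hypothetical helper, and results without a productReviews field (for instance, when extraction did not succeed) are simply skipped.

```python
def iter_reviews(results):
    """Yield (page_url, review) pairs from a list of query results.

    Hypothetical helper: skips results that have no productReviews field.
    """
    for result in results:
        product_reviews = result.get("productReviews")
        if not product_reviews:
            continue
        # reviews is optional, so default to an empty list.
        for review in product_reviews.get("reviews", []):
            yield product_reviews.get("url"), review


# Trimmed-down version of the example response above.
sample_results = [
    {
        "productReviews": {
            "url": "https://example.com/product-review",
            "reviews": [
                {"name": "A great tool!", "probability": 0.95},
                {"name": "Another review", "probability": 0.95},
            ],
        },
        "query": {"userQuery": {"pageType": "productReviews"}},
    },
    {"query": {"userQuery": {"pageType": "productReviews"}}},
]

for page_url, review in iter_reviews(sample_results):
    print(page_url, review.get("name"))
```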