Product Review Extraction (beta)
Request example
If you requested product review extraction and the extraction succeeds, the productReviews field will be available in the query result:
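from autoextract.sync import request_raw

# One query per page; pageType selects product review extraction
query = [{
    'url': 'https://example.com/product-review',
    'pageType': 'productReviews'
}]
# Replace '[api key]' with your own API key
results = request_raw(query, api_key='[api key]')
print(results[0]['productReviews'])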
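Because productReviews is only present when extraction succeeds, indexing it directly can raise a KeyError. The following is a minimal sketch of a more defensive lookup. The helper name get_reviews and the sample dictionaries are hypothetical and only mimic the shape of a query result.

```python
def get_reviews(result):
    """Return the list of reviews from one query result, or [] if absent."""
    # 'productReviews' is missing when extraction did not succeed
    product_reviews = result.get('productReviews') or {}
    # 'reviews' is omitted when no valid value was extracted
    return product_reviews.get('reviews') or []


# Hypothetical results shaped like items returned by request_raw
succeeded = {
    'productReviews': {
        'url': 'https://example.com/product-review',
        'reviews': [{'name': 'A great tool!', 'probability': 0.95}],
    }
}
without_reviews = {'query': {'domain': 'example.com'}}

print(len(get_reviews(succeeded)))
print(len(get_reviews(without_reviews)))
```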
Available fields
Top-level
The following fields are available for productReviews:
url
: string, required. URL of the page from which reviews were extracted.
reviews
: list of dictionaries. List of reviews. Individual fields are described below.
Individual reviews
Each review inside the reviews field has the following fields available:
name
: String. Title, header or name of the review.
reviewBody
: String. Text of the review, with newline separators.
reviewRating
: Dictionary. Information about the rating of the review. Fields:
ratingValue is a number representing the rating given by the reviewer.
bestRating is the best possible rating, if known.
For example, “3 out of 5” would look like this:
{"ratingValue": 3, "bestRating": 5}
datePublished
: String. Publication date. ISO-formatted with a ‘T’ separator; may contain a timezone.
datePublishedRaw
: String. Same date as datePublished, but before parsing/normalisation, i.e. as it appears on the website.
votedHelpful
: Integer. Number of votes that consider this review helpful (upvotes).
votedUnhelpful
: Integer. Number of votes that consider this review unhelpful (downvotes).
isVerified
: Boolean. Whether the reviewer has been verified as a legitimate person or a real owner / buyer of the reviewed product.
probability
: Float. Probability that this is a review.
Reviews refer to a product available on the same page.
All fields are optional, except for probability.
Fields without a valid value (null or empty array) are excluded from extraction results.
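Since every field except probability may be missing, client code usually has to tolerate absent keys. The sketch below shows one way to do that. The function name normalize_review and the output keys are hypothetical, and the 0 to 1 score is an illustrative convention rather than part of the API. It assumes datePublished is in a form that datetime.fromisoformat accepts, such as the value in the response example.

```python
from datetime import datetime


def normalize_review(review):
    """Flatten one review dict, tolerating missing optional fields."""
    rating = review.get('reviewRating') or {}
    value = rating.get('ratingValue')
    best = rating.get('bestRating')
    # Only compute a 0-1 score when both numbers are known and best is non-zero
    score = value / best if value is not None and best else None

    published = review.get('datePublished')
    # Parse the ISO-formatted date only when the field is present
    published_at = datetime.fromisoformat(published) if published else None

    return {
        'title': review.get('name'),
        'score': score,
        'published_at': published_at,
        'probability': review['probability'],  # the only required field
    }


full = {
    'name': 'A great tool!',
    'reviewRating': {'ratingValue': 3, 'bestRating': 5},
    'datePublished': '2020-01-30T00:00:00',
    'probability': 0.95,
}
minimal = {'name': 'Another review', 'probability': 0.95}

print(normalize_review(full))
print(normalize_review(minimal))
```

Missing values are returned as None rather than 0, so that "not extracted" stays distinguishable from a real zero.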
Response example
Below is an example response with all product review fields present:
[
  {
    "productReviews": {
      "url": "https://example.com/product-review",
      "reviews": [
        {
          "name": "A great tool!",
          "reviewBody": "AutoExtract is a great tool for review extraction",
          "reviewRating": {
            "ratingValue": 5.0,
            "bestRating": 5.0
          },
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "votedHelpful": 12,
          "votedUnhelpful": 1,
          "isVerified": true,
          "probability": 0.95
        },
        {
          "name": "Another review",
          "probability": 0.95
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "productReviews",
        "url": "https://example.com/product-review"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
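As an illustration of consuming a response like the one above, the sketch below keeps only reviews whose probability meets a cut-off. The function name confident_reviews and the threshold values are hypothetical choices, not documented defaults. RESPONSE is a trimmed copy of the example response.

```python
import json

# Trimmed copy of the example response above
RESPONSE = '''
[{"productReviews": {"url": "https://example.com/product-review",
  "reviews": [
    {"name": "A great tool!", "probability": 0.95},
    {"name": "Another review", "probability": 0.95}
  ]},
  "query": {"userQuery": {"pageType": "productReviews",
                          "url": "https://example.com/product-review"}}}]
'''


def confident_reviews(results, threshold=0.5):
    """Yield (page_url, review) pairs whose probability meets the threshold.

    The threshold is an illustrative choice, not a documented default.
    """
    for result in results:
        page = result.get('productReviews') or {}
        for review in page.get('reviews') or []:
            if review['probability'] >= threshold:
                yield page.get('url'), review


results = json.loads(RESPONSE)
for url, review in confident_reviews(results, threshold=0.9):
    print(url, review.get('name'))
```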