Scrapinghub Reference Documentation

Comment Extraction (beta)

If you request comment extraction and the extraction succeeds, the comments field is available in the query result:

from autoextract.sync import request_raw

query = [{
    'url': 'https://example.com/article-with-comments',
    'pageType': 'comments'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['comments'])

The following fields are available for comments:

| Name | Type | Description |
|------|------|-------------|
| url | String | URL of the page from which the comments were extracted. |
| comments | List of dictionaries | List of comments; the individual fields are described below. |

The url field is required.

Each comment inside the comments field has the following fields available:

| Name | Type | Description |
|------|------|-------------|
| text | String | Text of the comment. |
| datePublished | String | Comment date. ISO-formatted with a ‘T’ separator; may contain a timezone. |
| datePublishedRaw | String | The same date before parsing, as it appeared on the site. |
| upvoteCount | Integer | Number of upvotes received by the comment. |
| downvoteCount | Integer | Number of downvotes received by the comment. |
| probability | Float | Probability that this is a comment. |

Comments refer to an article or blog post available on the same page.

All fields are optional, except for probability. Fields without a valid value (null or empty array) are excluded from extraction results.
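Because optional fields are omitted rather than set to null, client code that indexes them directly can raise a KeyError. The following is a minimal sketch of one way to handle this, assuming the comment dictionaries have the shape described above. The helper name normalize_comment and the OPTIONAL_FIELDS tuple are illustrative and not part of the API.

```python
# Optional comment fields as listed in the table above (illustrative constant).
OPTIONAL_FIELDS = (
    "text",
    "datePublished",
    "datePublishedRaw",
    "upvoteCount",
    "downvoteCount",
)


def normalize_comment(comment):
    """Return a copy of a comment dict with every documented field present.

    Optional fields that were excluded from the result are set to None,
    so callers can tell "not extracted" apart from a real value such as 0.
    """
    # probability is the only field that is always present.
    normalized = {"probability": comment["probability"]}
    for name in OPTIONAL_FIELDS:
        # dict.get returns None when the key is missing.
        normalized[name] = comment.get(name)
    return normalized


full = normalize_comment({"text": "Another comment", "probability": 0.95})
```

Using None rather than 0 for a missing upvoteCount or downvoteCount avoids reporting a vote count that was never extracted.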

Below is an example response with all comment fields present:

[
  {
    "comments": {
      "url": "https://example.com/article-with-comments",
      "comments": [
        {
          "text": "A comment on article",
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "upvoteCount": 12,
          "downvoteCount": 1,
          "probability": 0.95
        },
        {
          "text": "Another comment",
          "probability": 0.95
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "comments",
        "url": "https://example.com/article-with-comments"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
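A response like the one above could be post-processed roughly as follows. This is a sketch under stated assumptions: the function name collect_comments and the min_probability threshold of 0.5 are hypothetical choices for illustration, and the sample data is a trimmed copy of the example response.

```python
from datetime import datetime


def collect_comments(results, min_probability=0.5):
    """Flatten query results into a list of (url, text, published) tuples."""
    collected = []
    for result in results:
        # The comments field is only present when extraction succeeded.
        extraction = result.get("comments")
        if extraction is None:
            continue
        for comment in extraction.get("comments", []):
            # Skip items unlikely to be comments (threshold is an assumption).
            if comment["probability"] < min_probability:
                continue
            iso_date = comment.get("datePublished")
            # fromisoformat handles the 'T' separator and numeric UTC offsets;
            # a trailing 'Z' is only accepted from Python 3.11 onward.
            published = datetime.fromisoformat(iso_date) if iso_date else None
            collected.append((extraction["url"], comment.get("text"), published))
    return collected


sample = [{
    "comments": {
        "url": "https://example.com/article-with-comments",
        "comments": [
            {
                "text": "A comment on article",
                "datePublished": "2020-01-30T00:00:00",
                "probability": 0.95,
            },
            {"text": "Another comment", "probability": 0.95},
        ],
    },
}]
rows = collect_comments(sample)
```

The second comment has no datePublished, so its published value is None instead of a datetime.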