Scrapinghub Reference Documentation

Forum Post Extraction (beta)

Forum posts are the posts made on an internet forum page where a specific topic is discussed (a thread).

If you request forum post extraction and the extraction succeeds, the forumPosts field will be available in the query result:

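from autoextract.sync import request_raw

# One query per page; pageType selects forum post extraction.
query = [{
    'url': 'https://example.com/forum-post-page',
    'pageType': 'forumPosts'
}]
# Replace '[api key]' with your own API key.
results = request_raw(query, api_key='[api key]')
# Results are returned as a list, one entry per query.
print(results[0]['forumPosts'])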
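Because forumPosts is only present when extraction succeeds, it can help to look the field up defensively rather than index it directly. The following is a minimal sketch, not part of the client library: get_forum_posts is a hypothetical helper, and the results list is hand-written sample data shaped like the output of request_raw, so no API call is made.

```python
def get_forum_posts(result):
    """Return the forumPosts dict from one query result, or None if absent."""
    return result.get("forumPosts")


# Hypothetical sample data: the first result contains forumPosts,
# the second one does not (assumed shape, for illustration only).
results = [
    {"forumPosts": {"url": "https://example.com/forum-post-page"}},
    {"query": {"userQuery": {"url": "https://example.com/other-page"}}},
]

for result in results:
    forum_posts = get_forum_posts(result)
    if forum_posts is None:
        print("no forumPosts in this result")
    else:
        print(forum_posts["url"])
```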

The following fields are available for forumPosts:

Name    Type                                 Description
url     String                               URL of the page from which the posts were extracted.
topic   Dictionary with a name string field  Name of the topic that is discussed on the page.
posts   List of dictionaries                 List of posts; the individual fields are described below.

The url field is required.

Each post in the posts field has the following fields available:

Name              Type     Description
text              String   Text of the post.
datePublished     String   Post date, ISO-formatted with a ‘T’ separator; may contain a timezone.
datePublishedRaw  String   The same date before parsing, as it appeared on the site.
upvoteCount       Integer  Number of upvotes (likes) received by the post.
replyCount        Integer  Number of replies received by the post.
probability       Float    Probability that this is a post.

Posts refer to the topic extracted from the same page.

All fields are optional, except for probability. Fields without a valid value (null or empty array) are excluded from extraction results.
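Since every post field except probability may be missing, client code usually needs to tolerate absent keys. The sketch below shows one way to do that; summarize_post and its output keys are hypothetical names chosen for illustration. It assumes datePublished has the form shown in this document (for example 2020-01-30T00:00:00), which datetime.fromisoformat accepts; other timezone notations may need a different parser depending on the Python version.

```python
from datetime import datetime


def summarize_post(post):
    """Flatten one post dict, tolerating absent optional fields."""
    raw_date = post.get("datePublished")
    return {
        "text": post.get("text", ""),
        # Parse the ISO date only when the field is present.
        "published": datetime.fromisoformat(raw_date) if raw_date else None,
        "upvotes": post.get("upvoteCount"),
        "replies": post.get("replyCount"),
        # probability is the only field documented as always present.
        "probability": post["probability"],
    }


full = summarize_post({
    "text": "Finland is often considered the best for it.",
    "datePublished": "2020-01-30T00:00:00",
    "upvoteCount": 12,
    "replyCount": 1,
    "probability": 0.95,
})
sparse = summarize_post({"text": "Depends on the person", "probability": 0.80})
print(full["published"], sparse["published"], sparse["upvotes"])
```

Missing counts are returned as None rather than 0 here, so that "not extracted" stays distinguishable from "zero upvotes"; whether that distinction matters depends on the application.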

Below is an example response with all forum post fields present:

[
  {
    "forumPosts": {
      "url": "https://example.com/forum-topic-1",
      "topic": {
        "name": "Which is the best country to work in?"
      },
      "posts": [
        {
          "text": "Finland is often considered the best for it.",
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "upvoteCount": 12,
          "replyCount": 1,
          "probability": 0.95
        },
        {
          "text": "Switzerland has good work life balance.",
          "upvoteCount": 2,
          "probability": 0.80
        },
        {
          "text": "Depends on the person",
          "replyCount": 1,
          "probability": 0.80
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "forumPosts",
        "url": "https://example.com/forum-topic-1"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
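A response like the one above can be post-processed with the probability field, for instance to keep only posts the extractor is fairly confident about. This is a minimal sketch under stated assumptions: confident_posts and the 0.9 default threshold are hypothetical choices, and response_text is a trimmed copy of the example response rather than live API output.

```python
import json

# Trimmed copy of the example response (illustrative data only).
response_text = """
[
  {
    "forumPosts": {
      "url": "https://example.com/forum-topic-1",
      "topic": {"name": "Which is the best country to work in?"},
      "posts": [
        {"text": "Finland is often considered the best for it.",
         "upvoteCount": 12, "replyCount": 1, "probability": 0.95},
        {"text": "Switzerland has good work life balance.",
         "upvoteCount": 2, "probability": 0.80},
        {"text": "Depends on the person",
         "replyCount": 1, "probability": 0.80}
      ]
    }
  }
]
"""


def confident_posts(response, threshold=0.9):
    """Collect posts whose probability is at least the threshold."""
    selected = []
    for result in response:
        # forumPosts and posts may both be absent from a result.
        forum_posts = result.get("forumPosts", {})
        for post in forum_posts.get("posts", []):
            if post["probability"] >= threshold:
                selected.append(post)
    return selected


response = json.loads(response_text)
for post in confident_posts(response):
    print(post["probability"], post.get("text", ""))
```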