Warning

Zyte Automatic Extraction will be discontinued starting April 30th, 2024. It is replaced by Zyte API. See Migrating from Automatic Extraction to Zyte API.

General Web Page Information

General web page information is returned for any requested page type.

Request example

If you request any kind of extraction (e.g. article or job posting) and the extraction succeeds, the query result contains the webPage field alongside the type-specific field (e.g. article or jobPosting):

from autoextract.sync import request_raw

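# One query per page; pageType selects the extraction type.
query = [{
    'url': 'http://example.com/article?id=24',
    'pageType': 'article'
}]
results = request_raw(query, api_key='[api key]')
# Type-specific fields, returned under the 'article' key.
print(results[0]['article'])
# General web page information, returned for any page type.
print(results[0]['webPage'])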

Available fields

The following fields are available for webPage:

inLanguages: list of dictionaries with code field

The list of languages used on the page, ordered from the most prominently used language to the least used. code denotes the IETF BCP 47 language tag. If no language is detected, the field is omitted.

Example:

[{"code": "en"}]
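
Because inLanguages is ordered by prominence and is omitted when no language is detected, client code usually needs a guarded lookup. The following is a minimal sketch, assuming web_page is the already-parsed webPage dictionary. The helper name primary_language is hypothetical and is not part of the autoextract client:

```python
from typing import Optional


def primary_language(web_page: dict) -> Optional[str]:
    """Return the most prominent language tag, or None if none was detected."""
    # inLanguages is omitted when no language is detected, so use .get().
    languages = web_page.get("inLanguages") or []
    # The list is ordered from most to least prominent; take the first entry.
    return languages[0]["code"] if languages else None


print(primary_language({"inLanguages": [{"code": "en"}]}))  # en
print(primary_language({}))  # None
```

Returning None rather than raising keeps the "no language detected" case explicit for the caller.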

Response example

Below is an example response to an article extraction request, with all web page fields present. Article fields are omitted in this example; the webPage field is present for other kinds of extraction as well:

[
  {
    "article": {
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "article",
        "url": "https://example.com/article"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
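
As a sketch of how such a response could be consumed, the snippet below parses a trimmed copy of the example above with the standard json module and maps each queried URL to its detected language codes. The function name languages_by_url is hypothetical, and the code assumes only the field layout shown in this section:

```python
import json

# Trimmed copy of the example response above.
RESPONSE = """
[
  {
    "article": {},
    "webPage": {"inLanguages": [{"code": "en"}, {"code": "es"}]},
    "query": {
      "userQuery": {"pageType": "article", "url": "https://example.com/article"}
    }
  }
]
"""


def languages_by_url(results: list) -> dict:
    """Map each queried URL to its language codes, most prominent first."""
    mapping = {}
    for result in results:
        url = result["query"]["userQuery"]["url"]
        # Assumption: treat webPage and inLanguages as possibly absent
        # and fall back to an empty list in that case.
        languages = result.get("webPage", {}).get("inLanguages", [])
        mapping[url] = [entry["code"] for entry in languages]
    return mapping


print(languages_by_url(json.loads(RESPONSE)))
# {'https://example.com/article': ['en', 'es']}
```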