Scrapinghub Reference Documentation

Crawlera Fetch API (experimental)

Warning

The Crawlera Fetch API is considered experimental. Use it only if you’re getting assistance from a Scrapinghub engineer. This documentation is subject to change without prior notice.

Please note that Crawlera had an old Fetch API that is no longer supported. This is an entirely new API with different parameters.

To use the Fetch API, you will need a Crawlera API key with Browser Execution functionality enabled, even if you don’t use the render and screenshot parameters. Otherwise you will get a 401 Unauthorized response.

We are currently testing the Fetch API under a limited private beta. Please fill out this form if you’re interested in trying it out. We don’t have a date for the public release yet.

The Crawlera Fetch API allows you to download web pages using an HTTP API, instead of a Proxy API. It provides server-side browser execution capabilities, and better browser emulation than requests processed through the standard proxy API.

Authentication is done through standard HTTP auth, using your Crawlera API key as the user name and an empty password.

Here is an example of a working request (replace API_KEY with your API key):

curl -u <API_KEY>: http://fetch.crawlera.com:8010/fetch/v2 -d url=https://toscrape.com

The examples in this documentation are provided as commands to execute in a terminal. You will need the curl and jq command line tools. curl often comes installed with your operating system, while jq needs to be downloaded.

You can download jq at: https://stedolan.github.io/jq/download/

Request Endpoint & Parameters

  • Endpoint: http://fetch.crawlera.com:8010/fetch/v2

  • Method: POST

  • Parameter values should be URL encoded.
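URL encoding matters in particular when the target URL contains a query string of its own, since its ?, =, and & characters would otherwise be parsed as part of the Fetch API request. A minimal sketch using jq's @uri filter (jq is already a stated dependency of these docs); the sample search URL is illustrative, and the actual request line is left commented out because it needs a real API key:

```shell
# Percent-encode a target URL before passing it as the url parameter.
# jq's @uri filter encodes everything outside the unreserved character set.
target='https://toscrape.com/search?q=scraping books'
encoded=$(printf '%s' "$target" | jq -sRr @uri)
echo "$encoded"

# Equivalent, letting curl do the encoding (replace <API_KEY> with your key):
# curl -u <API_KEY>: http://fetch.crawlera.com:8010/fetch/v2 \
#      --data-urlencode "url=$target"
```

curl's --data-urlencode does the same encoding on the fly, which avoids the intermediate variable when you don't need the encoded value for anything else.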

  • url (required): The URL to fetch. Example: https://toscrape.com

  • region (optional, default: auto): The region to route the request through, specified as a country code. If auto or omitted, Crawlera will pick the best region to route the request through based on the target website. Example: es

  • render (optional, default: false): Pass true to render the URL in a browser. Example: true

  • screenshot (optional, default: false): Pass true to return the screenshot field in the response, with a screenshot of the page. Implies render=true. Example: true

Example:

curl -u <API_KEEY>: http://fetch.crawlera.com:8010/fetch/v2 -d url=https://toscrape.com/

Response Format

The status code of the Fetch API response is always 200 (regardless of the response from the target website) unless there is a problem with the Fetch API itself.
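Because the API's own status code carries no information about the fetch outcome, error handling has to inspect the JSON body rather than the HTTP status. A minimal sketch against a hand-written sample response (the field values here are illustrative, not captured from the live API):

```shell
# The Fetch API replies 200 even for banned or failed fetches, so check
# the crawlera_status field in the JSON body, not the HTTP status code.
response='{"crawlera_status":"ban","original_status":503}'
status=$(printf '%s' "$response" | jq -r '.crawlera_status')
if [ "$status" = "success" ]; then
  echo "fetched; website replied $(printf '%s' "$response" | jq -r '.original_status')"
else
  echo "fetch failed: $status"
fi
```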

The response size limit is 10 MB.

The API response is a UTF-8 encoded JSON object with the following fields:

  • url (String): The URL of the page fetched.

  • body (String): The body of the response, encoded using body_encoding.

  • body_encoding (String): The encoding used for the body of the response. Either plain or base64.

  • headers (Object): The HTTP headers of the response.

  • original_status (Integer): The HTTP status of the response received from the website.

  • crawlera_status (String): Crawlera status, one of:

      • success: successful request (counts towards the monthly quota)

      • ban: the request was banned after trying multiple proxies

      • fail: another error (not a ban) prevented fulfilling the request; see crawlera_error.

  • screenshot (String): A screenshot of the page, encoded in base64.

Example Fetch API response:

{
  "url": "https://toscrape.com",
  "screenshot": "",
  "original_status": 200,
  "headers": {
    "server": "nginx/1.14.0 (Ubuntu)",
    "date": "Mon, 25 May 2020 17:40:16 GMT",
    "content-type": "text/html",
    "last-modified": "Wed, 29 Jun 2016 21:51:37 GMT",
    "x-upstream": "toscrape-sites-master_web",
    "transfer-encoding": "chunked"
  },
  "crawlera_status": "success",
  "body_encoding": "plain",
  "body": "...HTML of the response goes here..."
}
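Consumers of the response should branch on body_encoding before using body, since some pages come back base64-encoded rather than plain. A sketch against a hand-written sample response (the base64 payload here is just an encoded <html></html> stub, not live API output):

```shell
# Decode the body field according to body_encoding (plain or base64).
response='{"body_encoding":"base64","body":"PGh0bWw+PC9odG1sPg=="}'
encoding=$(printf '%s' "$response" | jq -r '.body_encoding')
if [ "$encoding" = "base64" ]; then
  body=$(printf '%s' "$response" | jq -r '.body' | base64 -d)
else
  body=$(printf '%s' "$response" | jq -r '.body')
fi
echo "$body"
```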

Use Cases

Fetch the HTML of a page rendered in a browser

To run this example you will need:

  • a Crawlera API key with Browser Execution enabled

  • the curl and jq command line utilities

Example:

curl -u <API_KEY>: http://fetch.crawlera.com:8010/fetch/v2 -d url=https://toscrape.com/ -d render=true | jq '.body' -r > page.html

Fetching a screenshot

To run this example you will need:

  • a Crawlera API key with Browser Execution enabled

  • the curl, jq and base64 command line utilities

Example:

curl -u <API_KEY>: http://fetch.crawlera.com:8010/fetch/v2 -d url=https://toscrape.com/ -d screenshot=true -d render=true | jq '.screenshot' -r | base64 -d > image.jpg

Scrapy Middleware For Crawlera Fetch API

There is an official Scrapy downloader middleware for downloading pages through the Crawlera Fetch API:

https://github.com/scrapy-plugins/scrapy-crawlera-fetch

Installation and usage instructions can be found in the README of the project.