Warning

Zyte API is replacing Smart Proxy Manager. See Migrating from Smart Proxy Manager to Zyte API.

Smart Proxy Manager API

Note

To fetch HTTPS web pages, you need to either download and install the Zyte CA certificate or disable SSL certificate verification. See Fetching HTTPS pages with Zyte Smart Proxy Manager.

Warning

The Proxy-Authorization header is required on ports 8010, 8011 and 8014; otherwise, an HTTP 407 response is returned.

Proxy API

Smart Proxy Manager works with a standard HTTP proxy API. The only credential you need is your API key, passed as the proxy username with an empty password (note the trailing colon in -U <API key>:). This is the standard way to perform a request via Smart Proxy Manager:

curl -vkx proxy.zyte.com:8011 -U <API key>: https://httpbin.org/get
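For clients other than curl, the same request can be assembled by hand. The following is a minimal sketch using only the Python standard library (urllib), not an official integration; the helper names proxy_auth_header, build_request and build_opener are hypothetical. It sets the Proxy-Authorization header explicitly because urllib's ProxyHandler only derives that header from a proxy URL when the password is non-empty, while the Smart Proxy Manager password is empty. Like curl -k, it disables certificate verification; loading the Zyte CA certificate is the safer option.

```python
import base64
import ssl
import urllib.request

PROXY_URL = "http://proxy.zyte.com:8011"


def proxy_auth_header(api_key: str) -> str:
    """Basic credentials: the API key as username, an empty password."""
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return "Basic " + token


def build_request(url: str, api_key: str) -> urllib.request.Request:
    req = urllib.request.Request(url)
    # For https:// URLs urllib moves this header onto the CONNECT request;
    # for http:// URLs it is sent along with the proxied request itself.
    req.add_header("Proxy-Authorization", proxy_auth_header(api_key))
    return req


def build_opener() -> urllib.request.OpenerDirector:
    # Equivalent of curl -k: no certificate verification.
    # Safer: ssl.create_default_context(cafile=<path to the Zyte CA certificate>).
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": PROXY_URL, "https": PROXY_URL}),
        urllib.request.HTTPSHandler(context=ctx),
    )
```

Usage: `build_opener().open(build_request("https://httpbin.org/get", "YOUR_API_KEY"), timeout=200)`, where YOUR_API_KEY is a placeholder for your own key.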

Errors

See Errors Reference.

Sessions

See Sessions.

Request Headers

Smart Proxy Manager supports multiple HTTP headers to control its behaviour.

Note

The total size of HTTP headers for a request is limited to 100 KiB.
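A client can check this limit before sending a request. The following is a rough, hypothetical pre-flight check (header_block_size and check_header_size are illustrative names); it assumes the size is counted over the serialized "Name: value" lines, which may not match Smart Proxy Manager's exact accounting.

```python
MAX_HEADER_BYTES = 100 * 1024  # 100 KiB limit on the total size of request headers


def header_block_size(headers: dict[str, str]) -> int:
    # Approximation: each header counted as "Name: value\r\n" encoded in UTF-8.
    return sum(len(f"{name}: {value}\r\n".encode("utf-8")) for name, value in headers.items())


def check_header_size(headers: dict[str, str]) -> int:
    """Return the approximate size, or raise if it exceeds the limit."""
    size = header_block_size(headers)
    if size > MAX_HEADER_BYTES:
        raise ValueError(f"request headers take {size} bytes, over the {MAX_HEADER_BYTES}-byte limit")
    return size
```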

X-Crawlera-Profile

X-Crawlera-Profile mimics real browsers by applying a set of browser-specific headers. For example, all modern browsers set User-Agent, Accept and Accept-Language headers. Also, some browsers set DNT and Upgrade-Insecure-Requests headers.

Example:

X-Crawlera-Profile: pass

Supported values for this header are:

  • pass - do not use any browser profile, use all headers from the request

  • desktop - use a random desktop browser profile ignoring browser-specific headers from the request (default on Starter, Basic and Advanced plans)

  • mobile - use a random mobile browser profile ignoring browser-specific headers from the request

If an unsupported value is passed in the X-Crawlera-Profile header, Smart Proxy Manager responds with 540 Bad Header Value.

This header is intended to replace the legacy X-Crawlera-UA header, so if you pass both X-Crawlera-UA and X-Crawlera-Profile, X-Crawlera-Profile takes precedence.
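Client code can reject unsupported profile values before they reach the proxy. The following is a minimal sketch with a hypothetical helper, with_profile; it also drops any legacy X-Crawlera-UA header, a design choice that has the same effect as letting X-Crawlera-Profile take precedence but keeps the request unambiguous.

```python
SUPPORTED_PROFILES = {"pass", "desktop", "mobile"}


def with_profile(headers: dict[str, str], profile: str) -> dict[str, str]:
    """Return a copy of headers with X-Crawlera-Profile set and X-Crawlera-UA removed."""
    if profile not in SUPPORTED_PROFILES:
        # Smart Proxy Manager would answer this with 540 Bad Header Value.
        raise ValueError(f"unsupported X-Crawlera-Profile value: {profile!r}")
    # Drop the legacy header and any existing profile header (names are case-insensitive).
    result = {
        k: v for k, v in headers.items()
        if k.lower() not in {"x-crawlera-ua", "x-crawlera-profile"}
    }
    result["X-Crawlera-Profile"] = profile
    return result
```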

X-Crawlera-Profile-Pass

Smart Proxy Manager profiles already provide correct default values for the headers sent by the mimicked browser. If you want to use your own value for one of those headers, use the complementary header X-Crawlera-Profile-Pass, whose value is the name of the header you want to keep. Smart Proxy Manager then won’t override your value. You can list several header names, separated by commas.

Example

Suppose you want to use your own browser locale (German, de-DE) instead of the default en-US. In that case, put Accept-Language as the value of X-Crawlera-Profile-Pass and provide de-DE as the value of Accept-Language:

X-Crawlera-Profile: desktop
X-Crawlera-Profile-Pass: Accept-Language
Accept-Language: de-DE
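Keeping X-Crawlera-Profile-Pass in sync with the headers you set yourself is easy to get wrong by hand. The following is a minimal sketch with a hypothetical helper, profile_with_own_headers, that lists every header you supply; it assumes a plain comma (no space) is an acceptable delimiter.

```python
def profile_with_own_headers(profile: str, own_headers: dict[str, str]) -> dict[str, str]:
    """Set a browser profile while protecting own_headers from being overridden."""
    headers = dict(own_headers)
    headers["X-Crawlera-Profile"] = profile
    if own_headers:
        # Comma-delimited names of the headers Smart Proxy Manager must not override.
        headers["X-Crawlera-Profile-Pass"] = ",".join(own_headers)
    return headers
```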

Note

X-Crawlera-Profile-Pass was introduced on February 5th 2019. Before that date, any additional header passed along with X-Crawlera-Profile, such as Accept, overrode the value set by Smart Proxy Manager.

X-Crawlera-No-Bancheck

This header instructs Smart Proxy Manager not to check responses against its ban rules and pass any received response to the client. The presence of this header (with any value) is assumed to be a flag to disable ban checks.

Example:

X-Crawlera-No-Bancheck: 1

X-Crawlera-Cookies

This header controls internal cookie tracking performed by Smart Proxy Manager.

Supported values for this header are:

  • enable - internal cookies override cookies from the request

  • disable - cookies from the request override internal cookies

  • discard - all cookies are discarded (default on Starter, Basic and Advanced plans)

Example:

X-Crawlera-Cookies: disable

X-Crawlera-Session

This header instructs Smart Proxy Manager to use sessions, which tie requests to a particular outgoing IP until that IP gets banned.

Example:

X-Crawlera-Session: create

When the value create is passed, Smart Proxy Manager creates a new session and returns its ID in the response header of the same name. Send that session ID in all subsequent requests so that they keep the same outgoing IP instead of switching randomly between requests. Smart Proxy Manager sessions currently have a maximum lifetime of 30 minutes. See Sessions for the maximum number of sessions.
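The create-then-reuse flow can be kept in one small object. The following is a sketch using a hypothetical SessionTracker class: it sends create when it has no session, stores the ID returned in the response, and asks for a new session once the documented 30-minute lifetime has elapsed. Detecting bans is left to the caller (via discard), and response headers are assumed to be looked up by exact name (http.client's HTTPMessage does this case-insensitively; a plain dict does not).

```python
import time
from typing import Callable, Mapping, Optional

SESSION_HEADER = "X-Crawlera-Session"
MAX_SESSION_AGE_S = 30 * 60  # documented maximum session lifetime


class SessionTracker:
    """Hypothetical helper that remembers the current Smart Proxy Manager session."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self.session_id: Optional[str] = None
        self._created_at = 0.0

    def request_headers(self) -> dict[str, str]:
        # Request a new session if there is none or the current one has expired.
        if self.session_id is None or self._clock() - self._created_at >= MAX_SESSION_AGE_S:
            self.session_id = None
            return {SESSION_HEADER: "create"}
        return {SESSION_HEADER: self.session_id}

    def update(self, response_headers: Mapping[str, str]) -> None:
        # The session ID comes back in the response header with the same name.
        returned = response_headers.get(SESSION_HEADER)
        if returned and returned != self.session_id:
            self.session_id = returned
            self._created_at = self._clock()

    def discard(self) -> None:
        # Call when you decide the session's IP is no longer usable (for example, banned).
        self.session_id = None
```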

X-Crawlera-JobId

This header sets the job ID for the request (useful for tracking requests in the Smart Proxy Manager logs).

Example:

X-Crawlera-JobId: 999

X-Crawlera-Max-Retries

This header limits the number of retries performed by Smart Proxy Manager.

Example:

X-Crawlera-Max-Retries: 1

Passing 1 in the header instructs Smart Proxy Manager to do up to 1 retry. The default number of attempts is 3 (the maximum allowed value is 5 and the minimum is 0). Passing 0 or 1 in this header has the same effect (only one attempt to execute the request).

X-Crawlera-Timeout

This header sets the Smart Proxy Manager timeout, in milliseconds, for receiving a response from the target website. The value must be between 30,000 and 180,000. Values above 180,000 are rounded down to 180,000, and values below 30,000 are rounded up to 30,000.

Example:

X-Crawlera-Timeout: 40000

The example above sets the response timeout to 40,000 milliseconds. In the case of a streaming response, each chunk has 40,000 milliseconds to be received. If no response is received within 40,000 milliseconds, a 504 response is returned. If the header is not specified, the timeout defaults to 30,000 milliseconds.
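The rounding rule can be mirrored on the client so that you know which timeout will actually apply. The following is a minimal sketch; effective_timeout_ms and timeout_headers are hypothetical names.

```python
from typing import Optional

MIN_TIMEOUT_MS = 30_000
MAX_TIMEOUT_MS = 180_000
DEFAULT_TIMEOUT_MS = 30_000


def effective_timeout_ms(requested_ms: Optional[int]) -> int:
    """Timeout Smart Proxy Manager applies for a requested value (None = header omitted)."""
    if requested_ms is None:
        return DEFAULT_TIMEOUT_MS
    # Out-of-range values are rounded to the nearest limit.
    return max(MIN_TIMEOUT_MS, min(MAX_TIMEOUT_MS, requested_ms))


def timeout_headers(requested_ms: int) -> dict[str, str]:
    return {"X-Crawlera-Timeout": str(effective_timeout_ms(requested_ms))}
```

A reasonable design choice is to give your own HTTP client a socket timeout somewhat longer than the effective value, so that the 504 response from Smart Proxy Manager arrives before the client gives up on its own.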

[Deprecated] X-Crawlera-Use-Https

Previously, performing HTTPS requests required the HTTP variant of the URL plus the X-Crawlera-Use-Https header with the value 1, as in the following example:

curl -x proxy.zyte.com:8011 -U <API key>: http://twitter.com -H x-crawlera-use-https:1

Now you can use the HTTPS URL directly and drop the X-Crawlera-Use-Https header:

curl -x proxy.zyte.com:8011 -U <API key>: https://twitter.com

If you use Smart Proxy Manager from a client other than curl, check the rest of this documentation and update your scripts so that they keep working. Depending on the programming language and HTTP client, you may also need the Zyte CA certificate.
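When updating scripts, old-style requests can be converted mechanically. The following is a minimal sketch with a hypothetical helper, migrate_use_https: it switches the URL scheme to https when the deprecated header was set to 1 and removes that header in every case. It assumes URLs carry no explicit :80 port, which would need adjusting by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def migrate_use_https(url: str, headers: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Turn (http:// URL + X-Crawlera-Use-Https: 1) into a plain https:// request."""
    flag_keys = [k for k in headers if k.lower() == "x-crawlera-use-https"]
    use_https = any(headers[k].strip() == "1" for k in flag_keys)
    # The deprecated header is dropped whatever its value.
    new_headers = {k: v for k, v in headers.items() if k not in flag_keys}
    parts = urlsplit(url)
    if use_https and parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts), new_headers
```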

Response Headers

X-Crawlera-Next-Request-In

This header is returned when the response delay reaches the soft limit (120 seconds), and it contains the calculated delay value. If the client ignores this header, the hard limit (1000 seconds) may be reached, after which Smart Proxy Manager returns HTTP status code 503 with the delay value in the Retry-After header.
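A client can react to both limits. The following is a minimal sketch with hypothetical helpers: hard_limit_delay_s reads Retry-After from a 503 response, assuming the delta-seconds form rather than an HTTP date, and soft_limit_reached only detects X-Crawlera-Next-Request-In, because the unit of its value is not stated here.

```python
from typing import Mapping, Optional


def hard_limit_delay_s(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Seconds to wait after a 503 carrying Retry-After, otherwise None."""
    if status == 503 and "Retry-After" in headers:
        # Assumes the delta-seconds form of Retry-After.
        return float(headers["Retry-After"])
    return None


def soft_limit_reached(headers: Mapping[str, str]) -> bool:
    """True when Smart Proxy Manager signals that the request rate should drop."""
    return "X-Crawlera-Next-Request-In" in headers
```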

X-Crawlera-Debug

This header activates tracking of additional debug values, which are returned in response headers. Currently only the request-time and ua values are supported; separate multiple values with a comma. For example, to start tracking request time, send:

X-Crawlera-Debug: request-time

or, to track both request time and User-Agent send:

X-Crawlera-Debug: request-time,ua

The request-time option makes Smart Proxy Manager return, in a response header, the request time (in seconds) of the last request attempt, that is, the time between Smart Proxy Manager sending the request to an outgoing IP and receiving the response headers from that outgoing IP:

X-Crawlera-Debug-Request-Time: 1.112218

The ua option returns the actual User-Agent applied to the last request (useful, for instance, for finding out why a target website redirects):

X-Crawlera-Debug-UA: Mozilla/5.0 (Windows; U; Windows NT 6.1; zh-CN) AppleWebKit/533+ (KHTML, like Gecko)
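Reading these values back is straightforward. The following is a minimal sketch with a hypothetical helper, parse_debug_headers, and a request header dict that enables both options; the result keys request_time_s and user_agent are illustrative names.

```python
from typing import Mapping

DEBUG_REQUEST_HEADERS = {"X-Crawlera-Debug": "request-time,ua"}


def parse_debug_headers(headers: Mapping[str, str]) -> dict:
    """Collect the debug values Smart Proxy Manager returned, if any."""
    result = {}
    if "X-Crawlera-Debug-Request-Time" in headers:
        # Seconds between sending the last attempt and receiving its response headers.
        result["request_time_s"] = float(headers["X-Crawlera-Debug-Request-Time"])
    if "X-Crawlera-Debug-UA" in headers:
        result["user_agent"] = headers["X-Crawlera-Debug-UA"]
    return result
```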

X-Crawlera-Error

This header is returned when an error occurs. It names the specific Smart Proxy Manager error behind a 4xx or 5xx HTTP status code. The error message is sent in the response body.

Example:

X-Crawlera-Error: user_session_limit

Note

Returned errors are internal to Smart Proxy Manager and are subject to change at any time, so they should not be relied on.

Using Smart Proxy Manager with headless browsers

See our articles:

Using Smart Proxy Manager from different languages

Use our code examples provided in Smart Proxy Manager Integrations.

Accessing HTTPS URLs

See Fetching HTTPS pages with Zyte Smart Proxy Manager.

Working with Cookies

See the FAQ entry “Does Zyte Smart Proxy Manager handle cookies?”.

HTTPS Proxy Endpoint

As described in the curl documentation:

An HTTPS proxy receives all transactions over an SSL/TLS connection. Once a secure connection with the proxy is established, the user agent uses the proxy as usual, including sending CONNECT requests to instruct the proxy to establish a [usually secure] TCP tunnel with an origin server. HTTPS proxies protect nearly all aspects of user-proxy communications as opposed to HTTP proxies that receive all requests (including CONNECT requests) in vulnerable clear text.

Smart Proxy Manager supports an HTTPS proxy interface on port 8014.

Here is an example of how to use it with curl:

curl -vx https://proxy.zyte.com:8014 -U <API key>: http://httpbin.org/ip

You can use Smart Proxy Manager HTTPS Proxy with all HTTP clients that support HTTPS proxies.

Note

On some operating systems, or when using certain HTTP clients, you may need the Zyte CA certificate to use the HTTPS proxy endpoint.