Scrapinghub Reference Documentation

Crawlera Sessions

Sessions allow reusing the same outgoing IP across multiple requests.

Session Limits

There is a default delay of 12 seconds between each request using the same IP. These delays can differ for more popular domains. If the requests-per-second limit is exceeded, further requests will be delayed for up to 15 minutes. Each request made after exceeding the limit will increase the request delay. If the request delay reaches the soft limit (120 seconds), then the response to each subsequent request will contain the X-Crawlera-Next-Request-In header with the calculated delay as the value.
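A client can read the delay hint from the response headers instead of guessing when to send the next request. The following is a minimal sketch, not part of Crawlera itself: next_request_delay is a hypothetical helper, and the unit of the header value is not stated here, so the unit parameter (default 1.0, meaning "already in seconds") is an assumption to verify.

```python
def next_request_delay(headers, unit=1.0):
    """Return the delay hint in seconds, or None if no usable hint is present.

    `headers` is a mapping of response header names to values.
    `unit` converts the header value to seconds. The default of 1.0 assumes
    the value is already in seconds, which is an assumption, not documented.
    """
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-crawlera-next-request-in":
            try:
                return max(float(value) * unit, 0.0)
            except ValueError:
                return None  # the value was not a plain number
    return None
```

A caller would typically sleep for the returned number of seconds before the next request in the same session, and carry on as usual when the result is None.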

Sessions expire 30 minutes after the last request sent through that session.
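Because expiry depends on the time since the last request, a client can track that locally and create a fresh session when the old one has probably lapsed. This is a rough sketch under that reading. SessionClock is a hypothetical name, and the local estimate can disagree with the server's view.

```python
import time

SESSION_TTL = 30 * 60  # seconds of inactivity before a session expires


class SessionClock:
    """Tracks the last use of one session to estimate whether it has expired."""

    def __init__(self, clock=time.monotonic):
        # `clock` is injectable so the class can be tested without waiting.
        self._clock = clock
        self._last_used = clock()

    def touch(self):
        """Record that a request was just sent through the session."""
        self._last_used = self._clock()

    def probably_expired(self):
        """True if at least SESSION_TTL seconds passed since the last request."""
        return self._clock() - self._last_used >= SESSION_TTL
```

Call touch() after every request sent through the session, and check probably_expired() before reusing a stored session ID.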

Crawlera sessions are available on Advanced and Enterprise plans, and the maximum number of sessions that can be created at any point in time is 5000.

Sessions and retries

When using sessions, retries are automatically disabled under the assumption that it is not helpful to retry a request through the same outgoing IP. However, this behaviour can be overridden through the X-Crawlera-Max-Retries header. If this header is passed, Crawlera will automatically retry, even inside sessions.
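As an illustration, the two headers can be built together for a request that should be retried inside a session. The helper name session_headers is hypothetical, and the sketch assumes the X-Crawlera-Max-Retries value is a plain integer retry count.

```python
def session_headers(session_id, max_retries=None):
    """Build request headers for a session, optionally re-enabling retries."""
    headers = {"X-Crawlera-Session": str(session_id)}
    if max_retries is not None:
        if max_retries < 0:
            raise ValueError("max_retries must be non-negative")
        # Assumed value format: a plain integer retry count.
        headers["X-Crawlera-Max-Retries"] = str(int(max_retries))
    return headers
```

Leaving max_retries as None keeps the default behaviour, in which retries stay disabled for the session.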

Using Sessions with the Proxy API

Sessions are managed using the X-Crawlera-Session header. To create a new session send:

X-Crawlera-Session: create

Crawlera will respond with the session ID in the same header:

X-Crawlera-Session: <session ID>

From then onward, subsequent requests can be made through the same outgoing IP by sending the session ID in the request header:

X-Crawlera-Session: <session ID>
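The create-then-reuse flow above can be sketched with two small helpers that work with any HTTP client. The function names are hypothetical. Only the header name and the create value come from this page.

```python
CREATE = "create"


def request_headers(session_id=None):
    """Headers for the next request: create a session, or reuse `session_id`."""
    return {"X-Crawlera-Session": session_id if session_id else CREATE}


def session_id_from(response_headers):
    """Return the session ID Crawlera sent back, or None if the header is absent."""
    for name, value in response_headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "x-crawlera-session":
            return value.strip()
    return None
```

The first request goes out with request_headers(). The ID read by session_id_from() from its response is then passed to request_headers(session_id) for every later request that should use the same outgoing IP.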

Another way to create sessions is using the /sessions endpoint:

curl -u <API key>: proxy.crawlera.com:8010/sessions -X POST

This also returns a session ID, which you can pass in the X-Crawlera-Session header of later requests as before. This is helpful when you can’t get the next request using X-Crawlera-Session.
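For reference, the same POST can be issued from Python's standard library. This is a sketch only: the function names are hypothetical, and because the exact format of the response body is not described here, create_session returns it as plain text without interpreting it.

```python
import base64
import urllib.request

# curl treats a URL without a scheme as http://, so the same is used here.
SESSIONS_URL = "http://proxy.crawlera.com:8010/sessions"


def create_session_request(api_key):
    """Build (but do not send) the POST request that creates a session."""
    # `curl -u <API key>:` sends HTTP Basic auth with an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode("ascii")
    return urllib.request.Request(
        SESSIONS_URL,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


def create_session(api_key, timeout=30):
    """Send the request and return the response body as text."""
    request = create_session_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode().strip()
```

Splitting request construction from sending keeps the first function free of network access, which makes it easy to inspect or test.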

If an incorrect session ID is sent, Crawlera responds with a bad_session_id error.

Sessions API

List sessions

Call the endpoint /sessions with the GET method to list your sessions. The endpoint returns a JSON document in which each key is a session ID and the associated value is an outgoing IP.

Example:

curl -u <API key>: proxy.crawlera.com:8010/sessions
{"1836172": "<OUTGOING_IP1>", "1691272": "<OUTGOING_IP2>"}
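Since the listing maps session IDs to outgoing IPs, it is easy to regroup on the client side, for example to see which sessions share an IP. This is a small sketch with a hypothetical helper name. The IP addresses in the usage note are made-up placeholders.

```python
import json


def sessions_by_ip(listing_json):
    """Group session IDs by outgoing IP from a /sessions listing."""
    grouped = {}
    for session_id, ip in json.loads(listing_json).items():
        grouped.setdefault(ip, []).append(session_id)
    return grouped
```

For a listing such as {"1836172": "192.0.2.1", "1691272": "192.0.2.2"}, the result has one key per IP, each holding a list with the matching session ID.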

Delete a session

Call the endpoint /sessions/SESSION_ID with the DELETE method in order to delete a session.

Example:

curl -u <API key>: proxy.crawlera.com:8010/sessions/1836172 -X DELETE
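When many sessions need to be cleaned up, the DELETE call can be prepared once for each entry of a session listing. The sketch below only builds the requests. The helper name delete_requests is hypothetical, and sending them (for example with urllib.request.urlopen) is left to the caller.

```python
import base64
import urllib.parse
import urllib.request

# curl treats a URL without a scheme as http://, so the same is used here.
BASE_URL = "http://proxy.crawlera.com:8010"


def delete_requests(api_key, sessions):
    """Build one DELETE request per session ID in `sessions`.

    `sessions` is any iterable of session IDs, such as the dict parsed
    from the JSON returned by the list endpoint.
    """
    # The API key is the Basic auth username; the password is empty.
    token = base64.b64encode(f"{api_key}:".encode()).decode("ascii")
    requests = []
    for session_id in sessions:
        path = urllib.parse.quote(str(session_id), safe="")
        requests.append(
            urllib.request.Request(
                f"{BASE_URL}/sessions/{path}",
                method="DELETE",
                headers={"Authorization": f"Basic {token}"},
            )
        )
    return requests
```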