Scrapinghub API Reference

Requests API

The requests API allows you to work with request and response data from your crawls.

Note

Most of the features provided by the API are also available through the python-scrapinghub client library.

Request object

Field     Description                               Required
time      Request start timestamp in milliseconds.  Yes
method    HTTP method. Default: GET.                Yes
url       Request URL.                              Yes
status    HTTP response code.                       Yes
duration  Request duration in milliseconds.         Yes
rs        Response size in bytes.                   Yes
parent    The index of the parent request.          No
fp        Request fingerprint.                      No

Note

Seed requests from start URLs will have no parent field.
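A request record can be checked against the required fields above; a minimal Python sketch, with illustrative field values (not taken from a real job):

```python
import json

# Illustrative request record in the shape documented above.
record = json.loads(
    '{"time": 1351521736957, "method": "GET", "url": "http://scrapy.org/",'
    ' "status": 200, "duration": 12, "rs": 1024, "parent": 0}'
)

# The six fields every request record must carry.
REQUIRED_FIELDS = {"time", "method", "url", "status", "duration", "rs"}

def missing_required(rec):
    """Return the set of required fields absent from a request record."""
    return REQUIRED_FIELDS - rec.keys()

# A seed request omits "parent" but is still a complete record.
seed = {k: v for k, v in record.items() if k != "parent"}
```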

requests/:project_id[/:spider_id][/:job_id][/:request_no]

Retrieve or insert request data for a project, spider, or job, where request_no is the index of a single request.

Parameter  Description                                                      Required
format     Results format. See Result formats.                              No
meta       Meta keys to show.                                               No
nodata     If set, no data is returned other than the specified meta keys.  No

Note

Pagination and meta parameters are supported; see Pagination and Meta parameters.
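The endpoint path and query parameters can be assembled mechanically; a standard-library sketch (the helper name requests_url is ours, not part of the API):

```python
from urllib.parse import urlencode

BASE = "https://storage.scrapinghub.com/requests"

def requests_url(project_id, spider_id=None, job_id=None, request_no=None, **params):
    """Build requests/:project_id[/:spider_id][/:job_id][/:request_no]
    with optional query parameters such as format, meta, or nodata."""
    parts = [str(p) for p in (project_id, spider_id, job_id, request_no)
             if p is not None]
    url = "/".join([BASE] + parts)
    if params:
        url += "?" + urlencode(params)
    return url
```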

requests/:project_id/:spider_id/:job_id

Examples

Get the requests from a given job

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7
{"parent":0,"duration":12,"status":200,"method":"GET","rs":1024,"url":"http://scrapy.org/","time":1351521736957}

Adding requests

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7 -X POST -T requests.jl
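The requests.jl payload is JSON Lines: one request record per line. A sketch that writes such a file with illustrative record values:

```python
import json
import os
import tempfile

# Two illustrative request records; the second points at the first via "parent".
records = [
    {"time": 1351521736957, "method": "GET", "url": "http://scrapy.org/",
     "status": 200, "duration": 12, "rs": 1024},
    {"time": 1351521737060, "method": "GET", "url": "http://scrapy.org/doc/",
     "status": 200, "duration": 20, "rs": 2048, "parent": 0},
]

path = os.path.join(tempfile.gettempdir(), "requests.jl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line
```

A file built this way can then be uploaded with a curl -T invocation like the one above.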

requests/:project_id/:spider_id/:job_id/stats

Retrieve request stats for a given job.

Field                Description
counts[field]        The number of times the field occurs.
totals.input_bytes   The total size of all requests in bytes.
totals.input_values  The total number of requests.

Example

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7/stats
{"counts":{"url":21,"parent":19,"status":21,"method":21,"rs":21,"duration":21,"fp":21},"totals":{"input_bytes":2397,"input_values":21}}