AutoExtract Get Started
The AutoExtract API is a service for automatically extracting information from web pages.
You provide the page URLs that you are interested in, and what type of content you expect to find there: article, comments, forum posts, job posting, product, product list, product reviews, real estate or vehicle.
The service will then fetch the content, and apply a number of techniques behind the scenes to extract as much information as possible. Finally, the extracted information is returned to you in structured form.
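As a rough illustration of the input described above (a list of URLs plus the expected content type), the following sketch builds one query per URL. The field names `url` and `pageType`, the camelCase page type identifiers, and the helper `build_queries` are assumptions made for this example; the AutoExtract API reference is the authority on the actual request format.

```python
import json

# Page types named in the text, mapped to hypothetical identifiers.
# These identifiers are assumptions for illustration only.
PAGE_TYPE_IDS = {
    "article": "article",
    "comments": "comments",
    "forum posts": "forumPosts",
    "job posting": "jobPosting",
    "product": "product",
    "product list": "productList",
    "product reviews": "reviews",
    "real estate": "realEstate",
    "vehicle": "vehicle",
}


def build_queries(urls, page_type):
    """Return one query dict per URL for the given page type."""
    if page_type not in PAGE_TYPE_IDS:
        raise ValueError(f"unsupported page type: {page_type!r}")
    # Assumed query shape: the page URL plus the expected page type.
    return [{"url": url, "pageType": PAGE_TYPE_IDS[page_type]} for url in urls]


queries = build_queries(["https://example.com/blog/post-1"], "article")
print(json.dumps(queries, indent=2))
```

Rejecting an unknown page type before any request is made keeps a typo from turning into a wasted API call.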
Page types
The following page types are supported:
Price intelligence & ecommerce
Media & discussion monitoring
Market research
In addition to that, AutoExtract returns some general information about a web page.
Getting started
1. Sign up (or log in)
To get started, you need to sign up for an Automatic Extraction subscription, which begins with a free 14-day trial. You will need a credit card to subscribe, but you won’t be charged if you cancel within the first 14 days.
2. Get API Key
Once subscribed to the free trial, you will receive an API key. If you haven’t received one, you can contact the AutoExtract support team directly at autoextract-support@scrapinghub.com.
3. Integrate AutoExtract
Our recommendations for integrating AutoExtract:
use scrapinghub-autoextract if you want to extract multiple URLs from the command line,
use scrapinghub-autoextract if you want to use the API from Python,
use scrapy-autoextract if you want to use the API from a Scrapy spider,
check out the sample code in Node.js if you want to query AutoExtract in JavaScript with Node.js,
check out the sample code in PHP if you want to query AutoExtract in PHP with cURL,
check out the sample code in Java if you want to query AutoExtract in Java.
See AutoExtract API for the detailed description of the AutoExtract HTTP API.
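If none of the integrations above fit, the HTTP API can also be called directly. The sketch below prepares such a request with Python's standard library only. It rests on several assumptions that should be checked against the AutoExtract API reference: the endpoint URL in `API_URL`, the use of the API key as the HTTP Basic username with an empty password, and the JSON query shape (`url`, `pageType`). The helper name `build_request` is hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm it in the AutoExtract API reference.
API_URL = "https://autoextract.scrapinghub.com/v1/extract"


def build_request(api_key, queries):
    """Prepare (but do not send) a POST request carrying the queries as JSON."""
    # Assumption: the API key is the Basic auth username, with an empty password.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(queries).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "YOUR_API_KEY",
    [{"url": "https://example.com/item", "pageType": "product"}],
)

# Sending is left commented out because it needs a valid key and network access:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Building the request separately from sending it makes it possible to inspect the headers and body before spending any of the trial quota.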