Web Scraper Tutorial: How to Manage Scraping Jobs Through the API

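The Web Scraper API is a feature of Web Scraper Cloud that lets you manage Sitemaps, run scraping jobs, and download scraped data through API calls. Node.js and PHP SDKs are available for quick integration, which helps you automate the scraping workflow and manage jobs efficiently.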

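How do you use the Web Scraper API?

1. API access token

To access the Web Scraper API, find your API access token on the API page of Web Scraper Cloud. The token authenticates each API call; in the examples in this article it is passed as the api_token query parameter.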

2. API call limits

By default, each user can make 200 API calls per 15 minutes. The API response headers let you track how many calls remain:

X-RateLimit-Limit: 200
X-RateLimit-Remaining: number of calls remaining
X-RateLimit-Reset: timestamp at which the limit resets (returned only once the limit has been reached)

If the limit is hit, the SDK automatically sleeps and resumes requests once the limit has reset.
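If you call the API without the SDK, the same bookkeeping can be done by hand. The following is a minimal Python sketch, not the SDK's actual code: the function name `seconds_until_reset` is hypothetical, and it assumes that `X-RateLimit-Reset` holds a Unix timestamp in seconds, which the text above does not confirm.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds to wait before the next API call.

    `headers` is a plain dict of response headers, keyed exactly as shown
    above. Assumption: X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    reset = headers.get("X-RateLimit-Reset")
    if remaining > 0 or reset is None:
        return 0.0  # calls are still available, no need to wait
    # Never return a negative wait if the reset time is already in the past.
    return max(0.0, float(reset) - now)
```

A caller would pass the result to `time.sleep` before retrying.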

Common API operations

1. Create a Sitemap

Send a POST request to create a new Sitemap:

    
    POST https://api.webscraper.io/api/v1/sitemap?api_token=<YOUR API TOKEN>

Example JSON:

    
    {
        "_id": "webscraper-io-landing",
        "startUrl": ["http://webscraper.io/"],
        "selectors": [{
            "parentSelectors": ["_root"],
            "type": "SelectorText",
            "multiple": false,
            "id": "title",
            "selector": "h1",
            "regex": "",
            "delay": ""
        }]
    }

Response:

    
    {
        "success": true,
        "data": {
            "id": 123
        }
    }
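As a rough illustration, the request above could be assembled in Python with only the standard library. This is a sketch under assumptions: the helper name `build_create_sitemap_request` is hypothetical, and the `Content-Type: application/json` header is an assumption, since the article does not state which headers the API expects. The request is built but not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.webscraper.io/api/v1"


def build_create_sitemap_request(api_token, sitemap):
    """Build (but do not send) the POST request that creates a Sitemap."""
    query = urllib.parse.urlencode({"api_token": api_token})
    return urllib.request.Request(
        f"{API_BASE}/sitemap?{query}",
        data=json.dumps(sitemap).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


sitemap = {
    "_id": "webscraper-io-landing",
    "startUrl": ["http://webscraper.io/"],
    "selectors": [{
        "parentSelectors": ["_root"],
        "type": "SelectorText",
        "multiple": False,
        "id": "title",
        "selector": "h1",
        "regex": "",
        "delay": "",
    }],
}
request = build_create_sitemap_request("YOUR_API_TOKEN", sitemap)

# To send it for real, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect before any API call is spent.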

2. List Sitemaps

Send a GET request to retrieve all Sitemaps:

    
    GET https://api.webscraper.io/api/v1/sitemaps?api_token=<YOUR API TOKEN>

Response:

    
    {
        "success": true,
        "data": [
            {
                "id": 123,
                "name": "webscraper-io-landing"
            },
            {
                "id": 124,
                "name": "webscraper-io-landing2"
            }
        ],
        "current_page": 1,
        "last_page": 1,
        "total": 2,
        "per_page": 100
    }
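The listing is paginated, so client code typically needs to read both the `data` array and the paging fields. Here is a small Python sketch of that handling; the function names are hypothetical, and how the next page is requested (for example via a `page` query parameter) is an assumption not covered by this article.

```python
def sitemap_ids_by_name(response):
    """Map each Sitemap name to its numeric id for one page of the listing."""
    if not response.get("success"):
        raise ValueError("API reported failure")
    return {item["name"]: item["id"] for item in response["data"]}


def has_more_pages(response):
    """Return True when the listing has pages beyond the current one."""
    return response["current_page"] < response["last_page"]


# The response shown above, as a Python dict.
page = {
    "success": True,
    "data": [
        {"id": 123, "name": "webscraper-io-landing"},
        {"id": 124, "name": "webscraper-io-landing2"},
    ],
    "current_page": 1,
    "last_page": 1,
    "total": 2,
    "per_page": 100,
}
ids = sitemap_ids_by_name(page)
```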

3. Create a scraping job

Send a POST request to start a new scraping job:

    
    POST https://api.webscraper.io/api/v1/scraping-job?api_token=<YOUR API TOKEN>

Example JSON:

    
    {
        "sitemap_id": 123,
        "driver": "fast",
        "page_load_delay": 2000,
        "request_interval": 2000,
        "proxy": 0
    }

Response:

    
    {
        "success": true,
        "data": {
            "id": 500,
            "custom_id": "custom-scraping-job-12"
        }
    }
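One way to avoid malformed job requests is to build the body through a small helper that fills in defaults and checks the numeric fields. The sketch below is illustrative only: `build_scraping_job_payload` is a hypothetical name, the defaults simply mirror the example above, and the non-negative integer check is my own sanity rule rather than a documented API constraint.

```python
import json


def build_scraping_job_payload(sitemap_id, driver="fast", page_load_delay=2000,
                               request_interval=2000, proxy=0):
    """Return the JSON body for the scraping-job request as a string."""
    for name, value in (("page_load_delay", page_load_delay),
                        ("request_interval", request_interval)):
        # Sanity check chosen for this sketch, not taken from the API docs.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    return json.dumps({
        "sitemap_id": sitemap_id,
        "driver": driver,
        "page_load_delay": page_load_delay,
        "request_interval": request_interval,
        "proxy": proxy,
    })


payload = build_scraping_job_payload(123)
```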

4. Download scraped data

You can download scraped data through the API in either JSON or CSV format.

Download as JSON:

    
    GET https://api.webscraper.io/api/v1/scraping-job/<SCRAPING JOB ID>/json?api_token=<YOUR API TOKEN>

Response:

    
    {"title":"Nokia 123","price":"$24.99","description":"7 day battery"}
    {"title":"ProBook","price":"$739.99","description":"14\", Core i5 2.6GHz, 4GB, 500GB, Win7 Pro 64bit"}
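Note that the sample response is one JSON object per line rather than a single JSON array, so passing the whole body to a JSON parser in one go would fail. A minimal Python sketch for reading it line by line follows; the function name `parse_json_lines` is hypothetical.

```python
import json


def parse_json_lines(text):
    """Parse a newline-delimited JSON export into a list of dicts."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# The sample response shown above, kept as a raw string so that the
# backslash-escaped quote reaches the JSON parser unchanged.
sample = r'''{"title":"Nokia 123","price":"$24.99","description":"7 day battery"}
{"title":"ProBook","price":"$739.99","description":"14\", Core i5 2.6GHz, 4GB, 500GB, Win7 Pro 64bit"}'''

records = parse_json_lines(sample)
```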

Download as CSV:

    
    GET https://api.webscraper.io/api/v1/scraping-job/<SCRAPING JOB ID>/csv?api_token=<YOUR API TOKEN>

Response:

    
    web-scraper-order,title,price,description
    1494492462-1,Nokia 123,$24.99,7 day battery
    1494492462-2,ProBook,$739.99,"14"", Core i5 2.6GHz, 4GB, 500GB, Win7 Pro 64bit"
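Because a field such as the ProBook description contains both commas and a double quote, the export should be read with a real CSV parser rather than split on commas. The sketch below uses Python's standard `csv` module and assumes the export quotes such fields in the usual CSV way (the field wrapped in double quotes, embedded quotes doubled); `parse_csv_export` is a hypothetical name.

```python
import csv
import io


def parse_csv_export(text):
    """Parse a CSV export into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Sample export, with the description field quoted as assumed above.
sample = (
    "web-scraper-order,title,price,description\n"
    "1494492462-1,Nokia 123,$24.99,7 day battery\n"
    '1494492462-2,ProBook,$739.99,"14"", Core i5 2.6GHz, 4GB, 500GB, Win7 Pro 64bit"\n'
)
rows = parse_csv_export(sample)
```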

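Other API features

1. Monitoring job progress

You can query the status of a scraping job through the API to check that it is progressing as expected: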

    
    GET https://api.webscraper.io/api/v1/scraping-job/<SCRAPING JOB ID>?api_token=<YOUR API TOKEN>

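Job statuses include:

waiting-to-be-scheduled: the job is waiting to be queued.
started: the job is in progress.
failed: the job failed, usually because of network errors or because pages returned a large number of 4xx/5xx responses.
finished: the job has completed.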
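In practice these statuses are usually consumed by a polling loop that stops at `finished` or `failed`. The following Python sketch shows one possible shape. The names `wait_for_job` and `fetch_status` are hypothetical; `fetch_status` is a function you supply that calls the status endpoint and returns the status string, since this article does not show the layout of the status response. The polling interval and limit are arbitrary defaults.

```python
import time

# Statuses from the list above after which the job will not change again.
TERMINAL_STATUSES = {"finished", "failed"}


def wait_for_job(fetch_status, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Poll until the job reaches a terminal status and return that status.

    `fetch_status` is a caller-supplied function returning the current
    status string; `sleep` is injectable so the loop can be tested offline.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("job did not reach a terminal status in time")
```

Keep `poll_seconds` generous: every poll counts against the 200 calls per 15 minutes limit.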

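2. Sitemap scheduling

You can enable or disable a Sitemap's scheduler through the API so that scraping jobs run on a schedule. For example, to run a job every 10 minutes, every day: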

    
    POST https://api.webscraper.io/api/v1/sitemap/<Sitemap ID>/enable-scheduler?api_token=<YOUR API TOKEN>

Example JSON:

    
    {
        "cron_minute": "*/10",
        "cron_hour": "*",
        "cron_day": "*",
        "cron_month": "*",
        "cron_weekday": "*",
        "request_interval": 2000,
        "page_load_delay": 2000,
        "driver": "fast",
        "proxy": 0
    }
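The five `cron_*` fields correspond to the five positions of a standard cron expression. A quick way to sanity-check a schedule before sending it is to join them into that familiar form, as in this Python sketch; `cron_expression` is a hypothetical helper, not part of the API or the SDKs.

```python
# Field order of a standard five-field cron expression.
CRON_FIELDS = ("cron_minute", "cron_hour", "cron_day", "cron_month", "cron_weekday")


def cron_expression(schedule):
    """Join the scheduler's cron_* fields into a standard cron string."""
    return " ".join(schedule[field] for field in CRON_FIELDS)


schedule = {
    "cron_minute": "*/10",
    "cron_hour": "*",
    "cron_day": "*",
    "cron_month": "*",
    "cron_weekday": "*",
    "request_interval": 2000,
    "page_load_delay": 2000,
    "driver": "fast",
    "proxy": 0,
}
expression = cron_expression(schedule)  # "*/10 * * * *", i.e. every 10 minutes
```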

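Final thoughts

With the Web Scraper API you can automate much of the scraping workflow and manage jobs and data flexibly. Creating Sitemaps, starting scraping jobs, and downloading results are all available as API calls. If you are building an application that needs automated scraping, the Web Scraper API is a strong option.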
