
python - Scrapy: Crawled 0 pages (works in the scrapy shell, but not with the scrapy crawl spider command)


I'm having some trouble with scrapy: it isn't returning any results. I tried copying and pasting the following spider into the scrapy shell, and there it does work. I'm really not sure where the problem is, but when I run it with "scrapy crawl rxomega" it doesn't work.

from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from iherb.items import IherbItem

class RxomegaSpider(CrawlSpider):
    name = 'rxomega'
    allowed_domains = ['http://www.iherb.com/']
    start_urls = ['http://www.iherb.com/product-reviews/Natural-Factors-RxOmega-3-Factors-EPA-400-mg-DHA-200-mg-240-Softgels/4251/',
                  'http://www.iherb.com/product-reviews/Now-Foods-Omega-3-Cardiovascular-Support-200-Softgels/323/']
    #rules = (
    #    Rule(SgmlLinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    #)

    def parse_item(self, response):
        print('hello')
        sel = Selector(response)
        sites = sel.xpath('//*[@id="mainContent"]/div[3]/div[2]/div')
        items = []
        for site in sites:
            i = IherbItem()
            i['review'] = site.xpath('div[5]/p/text()').extract()
            items.append(i)
        return items
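(For reference, the "works in the scrapy shell" test the question describes would look roughly like the sketch below; this is an assumption about the exact session, using the first start URL. The shell pre-populates response, so the selector logic can be run directly:)

$ scrapy shell 'http://www.iherb.com/product-reviews/Natural-Factors-RxOmega-3-Factors-EPA-400-mg-DHA-200-mg-240-Softgels/4251/'
>>> from scrapy.selector import Selector
>>> sel = Selector(response)   # response is provided by the shell
>>> sites = sel.xpath('//*[@id="mainContent"]/div[3]/div[2]/div')
>>> for site in sites:
...     print(site.xpath('div[5]/p/text()').extract())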

The messages I see from running scrapy crawl rxomega are:

2014-02-16 17:00:55-0800 [scrapy] INFO: Scrapy 0.22.0 started (bot: iherb)
2014-02-16 17:00:55-0800 [scrapy] INFO: Optional features available: ssl, http11, django
2014-02-16 17:00:55-0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'iherb.spiders', 'SPIDER_MODULES': ['iherb.spiders'], 'BOT_NAME': 'iherb'}
2014-02-16 17:00:55-0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-02-16 17:00:55-0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-02-16 17:00:55-0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-02-16 17:00:55-0800 [scrapy] INFO: Enabled item pipelines:
2014-02-16 17:00:55-0800 [rxomega] INFO: Spider opened
2014-02-16 17:00:55-0800 [rxomega] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-02-16 17:00:55-0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6026
2014-02-16 17:00:55-0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6083
2014-02-16 17:00:55-0800 [rxomega] DEBUG: Crawled (200) (referer: None)
2014-02-16 17:00:56-0800 [rxomega] DEBUG: Crawled (200) (referer: None)
2014-02-16 17:00:56-0800 [rxomega] INFO: Closing spider (finished)
2014-02-16 17:00:56-0800 [rxomega] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 588,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 37790,
'downloader/response_count': 2,
'downloader/response_status_count/200': 2,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 2, 17, 1, 0, 56, 22065),
'log_count/DEBUG': 4,
'log_count/INFO': 7,
'response_received_count': 2,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2014, 2, 17, 1, 0, 55, 256404)}
2014-02-16 17:00:56-0800 [rxomega] INFO: Spider closed (finished)

Best answer

The genspider command generates a CrawlSpider with a parse_item callback, but the tutorial uses Spider with parse; both are from version 0.22. Change the code above to Spider and parse, and it works.
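In other words, parse_item never runs here: CrawlSpider only dispatches to a callback through its rules, and the rules are commented out, so the crawl fetches the two start URLs and scrapes nothing. A minimal sketch of the spider rewritten with Spider and a parse callback (Scrapy 0.22 style), keeping the original XPaths; switching allowed_domains to a bare domain is an extra, unrelated cleanup:

from scrapy.selector import Selector
from scrapy.spider import Spider  # Scrapy 0.22: plain Spider instead of CrawlSpider
from iherb.items import IherbItem

class RxomegaSpider(Spider):
    name = 'rxomega'
    allowed_domains = ['iherb.com']  # a bare domain, not a full URL
    start_urls = [
        'http://www.iherb.com/product-reviews/Natural-Factors-RxOmega-3-Factors-EPA-400-mg-DHA-200-mg-240-Softgels/4251/',
        'http://www.iherb.com/product-reviews/Now-Foods-Omega-3-Cardiovascular-Support-200-Softgels/323/',
    ]

    # Spider calls parse() for every start URL response,
    # so this callback actually runs (unlike parse_item on a rule-less CrawlSpider).
    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//*[@id="mainContent"]/div[3]/div[2]/div')
        items = []
        for site in sites:
            i = IherbItem()
            i['review'] = site.xpath('div[5]/p/text()').extract()
            items.append(i)
        return items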

Regarding "python - Scrapy: Crawled 0 pages (works in the scrapy shell, but not with the scrapy crawl spider command)", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21819055/
