python - Iterating over links in Selenium with Scrapy
I am learning to scrape with Selenium and Scrapy. I have a page with a list of links. I want to click the first link, visit that page, crawl its items, then come back to the main page (the previous page with the list of links), click the second link, crawl it, and repeat the process until all the desired links are done. When I click the first link, the crawler stops. What should be done so that the second link and the remaining ones are crawled as well?
My spider looks like this:
import time

from scrapy.contrib.spiders.init import InitSpider
from scrapy.selector import Selector
from selenium import webdriver


class Test(InitSpider):
    name = "test"
    start_urls = ["http://www.somepage.com"]

    def __init__(self):
        InitSpider.__init__(self)
        self.browser = webdriver.Firefox()

    def parse(self, response):
        self.browser.get(response.url)
        time.sleep(2)
        items = []
        sel = Selector(text=self.browser.page_source)
        links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
        for link in links:
            link.click()
            time.sleep(10)
            # do the crawling, go back, and repeat the process
            self.browser.back()
Thanks.
You can take this approach: instead of clicking, call browser.get() on the href of every link in the loop. Clicking navigates away and reloads the DOM, so the WebElements you collected beforehand become stale and the loop dies after the first link; collecting the hrefs up front and navigating by URL avoids that:
links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
# grab the hrefs first: the elements go stale once the browser navigates away
hrefs = [link.get_attribute('href') for link in links]
for href in hrefs:
    self.browser.get(href)
    # crawl the page here
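Putting it together, your parse() could look like the sketch below. This is a minimal sketch under the question's setup; PageItem and the //h1 title XPath are hypothetical stand-ins for your real items:

import time

from scrapy.contrib.spiders.init import InitSpider
from scrapy.item import Item, Field
from scrapy.selector import Selector
from selenium import webdriver


class PageItem(Item):
    # hypothetical item; define the fields your pages actually have
    title = Field()


class Test(InitSpider):
    name = "test"
    start_urls = ["http://www.somepage.com"]

    def __init__(self):
        InitSpider.__init__(self)
        self.browser = webdriver.Firefox()

    def parse(self, response):
        self.browser.get(response.url)
        time.sleep(2)

        # collect every href before navigating anywhere
        links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
        hrefs = [link.get_attribute('href') for link in links]

        for href in hrefs:
            self.browser.get(href)
            time.sleep(2)
            sel = Selector(text=self.browser.page_source)
            # hypothetical extraction; swap in the real XPath for your items
            for title in sel.xpath('//h1/text()').extract():
                yield PageItem(title=title)

There is no need for browser.back() here: because you navigate by URL rather than by clicking, each iteration starts from a fresh get() and nothing depends on the previous page's DOM.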
If the href is relative, you need to join it with http://www.somepage.com.
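For example, with urljoin from the standard library (the relative path below is just an illustration):

from urlparse import urljoin  # Python 2; on Python 3 use: from urllib.parse import urljoin

href = link.get_attribute('href')  # e.g. '/items/42', a relative link
self.browser.get(urljoin('http://www.somepage.com', href))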