python - Iterating over links in Selenium with Scrapy
I am learning to scrape with Selenium and Scrapy. I have a page with a list of links. I want to click the first link, visit that page, crawl its items, then come back to the main page (the previous page with the list of links), click the second link, crawl it, and repeat the process until all the desired links are done. When I click the first link, the crawler stops. What should be done so that the second link and the remaining ones are crawled as well?
My spider looks like this:
import time

from scrapy.contrib.spiders.init import InitSpider
from scrapy.selector import Selector
from selenium import webdriver


class Test(InitSpider):
    name = "test"
    start_urls = ["http://www.somepage.com"]

    def __init__(self):
        InitSpider.__init__(self)
        self.browser = webdriver.Firefox()

    def parse(self, response):
        self.browser.get(response.url)
        time.sleep(2)
        items = []
        sel = Selector(text=self.browser.page_source)
        links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
        for link in links:
            link.click()
            time.sleep(10)
            # do the crawling, go back, and repeat the process
            self.browser.back()
Thanks.
You can take this approach: instead of clicking, call browser.get() on the href of every link in the loop. Clicking navigates away and reloads the DOM, so the WebElements you collected beforehand become stale and the loop dies after the first link; collecting the hrefs up front and navigating by URL avoids that:
links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
# grab the hrefs first: the elements go stale once the browser navigates away
hrefs = [link.get_attribute('href') for link in links]
for href in hrefs:
    self.browser.get(href)
    # crawl the page here
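Putting it together, your parse() could look like the sketch below. This is a minimal sketch under the question's setup; PageItem and the //h1 title XPath are hypothetical stand-ins for your real items:

import time

from scrapy.contrib.spiders.init import InitSpider
from scrapy.item import Item, Field
from scrapy.selector import Selector
from selenium import webdriver


class PageItem(Item):
    # hypothetical item; define the fields your pages actually have
    title = Field()


class Test(InitSpider):
    name = "test"
    start_urls = ["http://www.somepage.com"]

    def __init__(self):
        InitSpider.__init__(self)
        self.browser = webdriver.Firefox()

    def parse(self, response):
        self.browser.get(response.url)
        time.sleep(2)

        # collect every href before navigating anywhere
        links = self.browser.find_elements_by_xpath('//ol[@class="listing"]/li/h4/a')
        hrefs = [link.get_attribute('href') for link in links]

        for href in hrefs:
            self.browser.get(href)
            time.sleep(2)
            sel = Selector(text=self.browser.page_source)
            # hypothetical extraction; swap in the real XPath for your items
            for title in sel.xpath('//h1/text()').extract():
                yield PageItem(title=title)

There is no need for browser.back() here: because you navigate by URL rather than by clicking, each iteration starts from a fresh get() and nothing depends on the previous page's DOM.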
If the href is relative, you need to join it with http://www.somepage.com.
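For example, with urljoin from the standard library (the relative path below is just an illustration):

from urlparse import urljoin  # Python 2; on Python 3 use: from urllib.parse import urljoin

href = link.get_attribute('href')  # e.g. '/items/42', a relative link
self.browser.get(urljoin('http://www.somepage.com', href))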