Web crawler - Python 3.6 crawler keeps scraping the first page over and over

Views: 145  Date: 2022-06-30 17:08:03

Problem description

As the title says: I switched the loop to a while loop and tried many other things, but nothing worked. Any help would be appreciated.

```python
# coding:utf-8
import os

import lxml.html
import requests


class MyError(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # phones = html.xpath('//span[@class="law-tel"]')
    phones = html.xpath('//span[@class="phone pull-right"]')
    # names = html.xpath('//p[@class="fl"]/p/a')
    names = html.xpath('//h4[@class="text-center"]')
    if len(phones) != len(names):
        error = 'Number of lawyers does not match the number of phone numbers: ' + url
        raise MyError(error)
    # Pair each name element with its phone element
    phone_infos = [(name.text, phone.text_content()) for name, phone in zip(names, phones)]

    phone_infos_list = []
    for name, phone in phone_infos:
        # lxml returns None (not '') for an empty <h4>, so test falsiness
        if not name:
            info = 'No name given' + ': ' + phone + '\r\n'
        else:
            info = name + ': ' + phone + '\r\n'
        print(info)
        phone_infos_list.append(info)
    return phone_infos_list


dir_path = os.path.abspath(os.path.dirname(__file__))
print(dir_path)
file_path = os.path.join(dir_path, 'lawyers_info.txt')
print(file_path)
if os.path.exists(file_path):
    os.remove(file_path)

# Open the same absolute path that was just cleared; a bare 'lawyers_info.txt'
# resolves against the current working directory instead
with open(file_path, 'ab') as file:
    for i in range(1000):
        url = 'http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page=' + str(i + 1)
        # r = requests.get(url)
        # html = lxml.html.fromstring(r.content)
        # phones = html.xpath('//span[@class="phone pull-right"]')
        # names = html.xpath('//h4[@class="text-center"]')
        # if phones or names:
        info = get_lawyers_info(url)
        for each in info:
            file.write(each.encode('gbk'))
```
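A fixed `range(1000)` keeps requesting pages even when every response is a copy of page 1. One way to at least detect that symptom, offered as a hedged sketch rather than a confirmed fix, is a `while` loop that stops as soon as a page comes back empty or identical to one already seen. `crawl_until_repeat` and `fetch_page` are hypothetical names; `fetch_page` is assumed to return the (name, phone) pairs parsed from one page, for example with the XPath expressions in `get_lawyers_info`.

```python
from typing import Callable, Iterator, List, Tuple

Record = Tuple[str, str]  # (name, phone) pair


def crawl_until_repeat(fetch_page: Callable[[int], List[Record]],
                       max_pages: int = 1000) -> Iterator[Tuple[int, List[Record]]]:
    """Yield (page_number, records) until a page is empty or repeats an earlier page.

    fetch_page is a hypothetical callable that takes a 1-based page number
    and returns the (name, phone) pairs parsed from that page.
    """
    seen = set()  # contents of pages already yielded, as hashable tuples
    page = 1
    while page <= max_pages:
        records = fetch_page(page)
        if not records:
            break  # empty page: assume we ran past the last one
        key = tuple(records)
        if key in seen:
            # Same records as an earlier page: the server is probably
            # ignoring the page parameter, so stop instead of looping on.
            break
        seen.add(key)
        yield page, records
        page += 1
```

If this stops right after page 1, the `page` value is likely not reaching the server in the form it expects. Building the query with `requests.get(base_url, params={..., 'page': n})` and printing `r.url` for each response shows the URL actually fetched after any redirects. If the listing is filled in by JavaScript, the request shown in the browser's network panel would have to be replicated instead. These are possibilities to check, not a diagnosis of this particular site.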

Answers

Answer 1:

```python
# coding: utf-8
import requests
from pyquery import PyQuery as Q

url = 'http://www.51myd.com/cooperative_merchants?industry=100&provinceId=19&cityId=0&areaId=0&page='

with open('lawyers_info.txt', 'ab') as f:
    for i in range(1, 5):
        r = requests.get('{}{}'.format(url, i))
        doc = Q(r.text)  # parse each page once
        usernames = doc.find('.username').text().split()
        phones = doc.find('.phone').text().split()
        # Python 3: print() is a function and zip() returns a lazy iterator
        print(list(zip(usernames, phones)))
```
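The answer opens `lawyers_info.txt` but its loop only prints the pairs, and `zip()` silently drops leftover items when the two lists differ in length (the question's `get_lawyers_info` raised an error in that case). A minimal sketch of the missing write step follows; `write_pairs` is a hypothetical helper, and GBK with `\r\n` line endings is assumed only because the question's code used them.

```python
from typing import List


def write_pairs(path: str, usernames: List[str], phones: List[str]) -> int:
    """Append 'name: phone' lines to path and return how many were written.

    Hypothetical helper: GBK and '\\r\\n' are assumptions copied from the question.
    """
    # zip() would quietly truncate mismatched lists, so check first
    if len(usernames) != len(phones):
        raise ValueError("got {} names but {} phone numbers".format(len(usernames), len(phones)))
    # newline='' keeps Python from translating the explicit '\r\n' again
    with open(path, "a", encoding="gbk", newline="") as f:
        for name, phone in zip(usernames, phones):
            # Same placeholder the question uses for a missing name
            f.write("{}: {}\r\n".format(name or "No name given", phone))
    return len(usernames)
```

Inside the answer's loop this would replace the `print` call, e.g. `write_pairs('lawyers_info.txt', usernames, phones)`, with the `with open(...)` line dropped since the helper opens the file itself.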
