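Downloading images from ivsky.com (天堂图片网): extract the src attribute of each img tag and pass it to
urllib.request.urlretrieve (urllib.urlretrieve in Python 2). During the download, urlretrieve automatically calls back the Schedule function, which prints the current download progress.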
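Schedule takes three parameters:
blocknum: the number of data blocks downloaded so far
blocksize: the size of each data block, in bytes
totalsize: the size of the remote file, in bytes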
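The percentage such a callback reports can be sketched as a small standalone function. The name percent_done is hypothetical and not part of urllib. The sketch assumes that urlretrieve passes a totalsize of -1 when the server does not report a file size, so that case is handled separately instead of dividing by it.

```python
def percent_done(blocknum, blocksize, totalsize):
    """Return download progress as a percentage, or None if the size is unknown.

    percent_done is a hypothetical helper used only for illustration.
    """
    if totalsize <= 0:
        # No usable file size was reported, so no percentage can be computed.
        return None
    # The last block is usually only partly filled, so cap the value at 100.
    return min(100.0, 100.0 * blocknum * blocksize / totalsize)


if __name__ == '__main__':
    # Two 8192-byte blocks of a 32768-byte file have been transferred.
    print(percent_done(2, 8192, 32768))
```

Returning None instead of printing keeps the arithmetic separate from the display, so a callback like Schedule can decide what to show when the size is unknown.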
import urllib.request
from urllib.parse import urljoin

import requests
from lxml import etree

PAGE_URL = 'http://www.ivsky.com/tupian/ziranfengguang/'


def Schedule(blocknum, blocksize, totalsize):
    '''
    blocknum: number of data blocks downloaded so far
    blocksize: size of each data block, in bytes
    totalsize: size of the remote file, in bytes
    '''
    if totalsize <= 0:
        # The server did not report a file size, so no percentage can be shown
        return
    per = min(100.0, 100.0 * blocknum * blocksize / totalsize)
    print('Current download progress: %d%%' % per)


headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}
r = requests.get(PAGE_URL, headers=headers)
# Parse the page with lxml
html = etree.HTML(r.text)
# Collect the src attribute of every img tag
img_urls = html.xpath('.//img/@src')
for i, img_url in enumerate(img_urls):
    # src may be relative or protocol-relative, so resolve it against the page URL
    urllib.request.urlretrieve(urljoin(PAGE_URL, img_url), 'img' + str(i) + '.jpg', Schedule)
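The src values collected from a page are not always complete URLs. They may be protocol-relative (starting with //) or relative to the page, and urlretrieve cannot fetch those as they are. A minimal sketch of normalizing them with urllib.parse.urljoin follows. The helper name absolute_urls and the sample src values (including the example.com hosts) are made up for illustration and were not taken from the actual page.

```python
from urllib.parse import urljoin


def absolute_urls(page_url, srcs):
    """Resolve each src value against the URL of the page it came from.

    absolute_urls is a hypothetical helper used only for illustration.
    """
    return [urljoin(page_url, src) for src in srcs]


if __name__ == '__main__':
    page = 'http://www.ivsky.com/tupian/ziranfengguang/'
    # Hypothetical src values covering the common forms
    samples = [
        '//img.example.com/a.jpg',    # protocol-relative
        '/static/b.jpg',              # relative to the site root
        'c.jpg',                      # relative to the page directory
        'http://x.example.com/d.jpg'  # already absolute
    ]
    for url in absolute_urls(page, samples):
        print(url)
```

Already absolute URLs pass through urljoin unchanged, so the helper can be applied to every src value without checking its form first.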