Day17 The requests module, part 2

Today's video covers the possible reasons why a request to a web server can fail,
and how to deal with anti-scraping mechanisms.

Below is the code used in the video.

# Use raise_for_status() to find the cause of the error
import requests

url = "https://www.kingstone.com.tw/"
htmlfile = requests.get(url)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page")
    htmlfile.raise_for_status()  # raises requests.HTTPError describing the failure

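One point worth spelling out: raise_for_status() does not return an error message. It returns None on success and raises requests.HTTPError when the status code is 4xx/5xx. A minimal offline sketch of that behaviour, using a hand-built Response object (the URL is a placeholder) instead of a real request:

```python
import requests

# Build a Response by hand so no network access is needed.
resp = requests.Response()
resp.status_code = 404
resp.url = "https://example.com/missing"  # hypothetical URL, only used in the error message

try:
    resp.raise_for_status()            # raises because 404 is a client error
except requests.HTTPError as exc:
    error = exc
    print("Failure reason:", exc)
```

This is why wrapping the call in print() is misleading: on failure the exception propagates before anything is printed, so the usual pattern is to call it bare or inside try/except.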
# Add headers to disguise the request as coming from a browser
import requests

url = "https://www.kingstone.com.tw/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'}
htmlfile = requests.get(url, headers=headers)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page")
    htmlfile.raise_for_status()  # raises requests.HTTPError describing the failure

# Pause between requests with time.sleep
import requests
import time
import random

url = "https://new.ntpu.edu.tw/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}

for i in range(3):
    htmlfile = requests.get(url, headers=headers)
    if htmlfile.status_code == 200:
        print("Page content:\n", htmlfile.text)
    else:
        print("Failed to download the page")
        htmlfile.raise_for_status()  # raises requests.HTTPError describing the failure
    time.sleep(random.randint(1, 5))  # wait 1-5 seconds before the next request

# Use a proxy IP
import requests

url = "http://icanhazip.com"  # a site that reports the IP you are connecting from
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}

proxies = {"http": "http://194.5.193.183:80"}  # a free proxy IP; it may no longer work
htmlfile = requests.get(url, headers=headers, proxies=proxies)

print(htmlfile.text)
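One detail to keep in mind: requests selects the proxy by the scheme of the target URL, so a dict with only an "http" key is ignored for https:// URLs. A sketch of a dict that covers both schemes (the proxy address is the same placeholder as above, not guaranteed to work):

```python
# Map each URL scheme to the proxy that should handle it.
proxies = {
    "http":  "http://194.5.193.183:80",   # used for http:// targets
    "https": "http://194.5.193.183:80",   # used for https:// targets
}

# Passed as requests.get(url, proxies=proxies); the key is chosen
# from the target URL's scheme, not from the proxy's own scheme.
scheme_for_ptt = "https"  # e.g. https://www.ptt.cc/... would use the "https" entry
print(proxies[scheme_for_ptt])
```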
# Try to download a PTT page
import requests

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}
htmlfile = requests.get(url, headers=headers)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page")
    htmlfile.raise_for_status()  # raises requests.HTTPError describing the failure

# Add cookies
import requests

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}
cookies = {'over18': '1'}  # tells PTT we have already confirmed we are over 18
htmlfile = requests.get(url, headers=headers, cookies=cookies)

if htmlfile.status_code == 200:
    print("Page content:\n", htmlfile.text)
else:
    print("Failed to download the page")
    htmlfile.raise_for_status()  # raises requests.HTTPError describing the failure
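To see what the headers and cookies actually do without hitting PTT, one can prepare the request locally and inspect what requests would send on the wire (a sketch, no network needed):

```python
import requests

url = "https://www.ptt.cc/bbs/Gossiping/index.html"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36'}
cookies = {'over18': '1'}

session = requests.Session()
req = requests.Request('GET', url, headers=headers, cookies=cookies)
prepared = session.prepare_request(req)  # builds the request without sending it

print(prepared.headers['user-agent'])    # the disguised browser identity
print(prepared.headers['Cookie'])        # -> over18=1
```

The cookies dict ends up as a Cookie header, and the custom user-agent replaces the default "python-requests/..." value that many sites (PTT included) block.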

A recommended source of proxy IPs:

http://www.google-proxy.net/

If anything in the video was unclear or incorrect, feel free to let me know in a comment. Thank you for the feedback.

