Day 18: The urllib Module

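Today's video introduces urllib, the module built into Python for downloading web page data.
Some of its concepts are much the same as those of the requests module covered over the previous two days.
At the end, the video briefly introduces robots.txt, the convention that governs requesting data from a web server.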

Below is the code used in the video.

# Check the data type of the returned object
import urllib.request

url = "https://new.ntpu.edu.tw/"
htmlfile = urllib.request.urlopen(url) 
print(type(htmlfile))

# Read the object's contents with read()
import urllib.request

url = "https://new.ntpu.edu.tw/"
htmlfile = urllib.request.urlopen(url)
print(htmlfile.read())

# Decode the downloaded bytes as UTF-8 text
import urllib.request

url = "https://new.ntpu.edu.tw/"
htmlfile = urllib.request.urlopen(url)
print(htmlfile.read().decode('utf-8'))
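
Hard-coding 'utf-8' works for this page, but not every site uses that encoding. As a minimal sketch, the helper below (`fetch_text` is a hypothetical name, not something from the video) reads the charset the server declares in its Content-Type header and falls back to UTF-8 when none is declared.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Download url and return its body as str (hypothetical helper)."""
    # The with-block closes the connection once the body has been read.
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # the body arrives as bytes
        # Charset declared in the Content-Type header, or None if absent.
        charset = response.headers.get_content_charset()
    # errors="replace" keeps going even if a few bytes do not decode cleanly.
    return raw.decode(charset or default_charset, errors="replace")
```

Calling `fetch_text("https://new.ntpu.edu.tw/")` should return the same text as the snippet above, as long as the server reports UTF-8 or no charset at all.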

# Commonly used attributes of the HTTPResponse object
import urllib.request

url = "https://new.ntpu.edu.tw/"
htmlfile = urllib.request.urlopen(url)
print("URL:", htmlfile.geturl())
print("Status:", htmlfile.status)  # the integer 200 means the page was retrieved successfully
print("Headers:", htmlfile.getheaders())

# Try another website!
import urllib.request

url = "https://www.gamer.com.tw/"
htmlfile = urllib.request.urlopen(url)
print(htmlfile.read())
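
If a request like the one above fails, urlopen() raises an exception rather than returning a response. Some sites turn away urllib's default "Python-urllib" User-Agent, which may be what happens here. The sketch below shows one way to catch the two standard exceptions; `try_fetch` is a hypothetical helper name and the 10-second timeout is an arbitrary choice.

```python
import urllib.error
import urllib.request


def try_fetch(url, timeout=10):
    """Return (ok, detail) instead of raising (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        # HTTPError is a subclass of URLError, so it must be caught first.
        return False, f"HTTP {err.code}: {err.reason}"
    except urllib.error.URLError as err:
        # No usable answer at all: DNS failure, refused connection, unknown scheme.
        return False, f"Request failed: {err.reason}"
```

With this, `ok, detail = try_fetch("https://www.gamer.com.tw/")` gives either the page bytes or a short message explaining why the download did not succeed.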

# Add request headers
import urllib.request

url = "https://www.gamer.com.tw/"
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.54 Safari/537.36'}

req = urllib.request.Request(url, headers=headers)
htmlfile = urllib.request.urlopen(req)
print(htmlfile.read().decode('utf-8'))
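
The robots.txt convention mentioned at the start can also be checked from Python, because the standard library ships urllib.robotparser. The sketch below is an illustration and not code from the video: the rules in `SAMPLE_ROBOTS_TXT` and the example.com URLs are made up, and `build_parser` is a hypothetical helper name.

```python
import urllib.robotparser

# Made-up rules for illustration; a real site publishes its own at /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser (hypothetical helper)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
# Paths outside /private/ are allowed for every crawler in this sample.
print(parser.can_fetch("*", "https://example.com/index.html"))
# Anything under /private/ is disallowed.
print(parser.can_fetch("*", "https://example.com/private/data.html"))
```

For a live site, calling `set_url()` with the site's robots.txt address followed by `read()` downloads and parses the real rules instead of a sample string.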

If anything in the video was unclear or incorrect, please leave a comment and let me know. Thank you for your feedback.
