My own GitHub: Eterna-E
Project source link: Eterna-E/ScrabbleWordFinder
As the title says, I originally wrote these practice notes to record the process of building a Python web-scraping exercise. The idea behind the Scrabble Word Finder project is to find the meaningful arrangements of a set of English letters: given a few randomly entered letters (roughly 5 to 12 of them; with many more, the search takes a very long time), how many meaningful words can be formed? Then, as a bonus, a Python scraper looks each word up in the Cambridge Dictionary and prints the definitions to the command prompt (cmd). It has no real practical use, but it felt like a fun thing to build. XD
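The core idea, finding which arrangements of a handful of letters are real words, can be sketched offline with only the standard library. The tiny `WORDLIST` here is a hypothetical stand-in for a real dictionary file:

```python
from itertools import permutations

# Hypothetical miniature dictionary standing in for a real word list.
WORDLIST = {'bead', 'bade', 'dace', 'cab', 'bad'}

def find_words(letters, length):
    """Return the dictionary words of the given length that can be
    formed from the input letters (each letter used at most once)."""
    candidates = {''.join(p) for p in permutations(letters, length)}
    return sorted(candidates & WORDLIST)

print(find_words('abcde', 4))  # → ['bade', 'bead', 'dace']
```

Note that the number of permutations grows factorially with the letter count, which is why the search slows down sharply past roughly a dozen letters.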
I am writing this article as a record so it can be quickly looked up again whenever needed. Even though I have had little occasion to use it since, it was still a worthwhile experience.
The code is as follows:
```python
import requests
from bs4 import BeautifulSoup

getword = {
    'words': 'abcde'
}
res = requests.post("https://wordfind.com/", data=getword)
soup = BeautifulSoup(res.text, 'html.parser')

def getWord(words, num):  # collect the words with the given letter count
    for word in soup.select('.defLink'):
        if len(word.select('a')[0].text) == num:
            words.append(word.select('a')[0].text)

def printWord(words):
    print(str(len(words[0])) + ' Letter Words')
    for word in words:
        print(" " + str(words.index(word) + 1) + '. ' + word)

word4 = []
getWord(word4, 4)
printWord(word4)
```
The output is shown in the figure below:
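The scraping step above depends entirely on the structure of the wordfind.com results page. As a network-free illustration of the same selection logic, here is a static snippet reproducing the assumed markup (each hit being a `.defLink` element wrapping an `<a>` tag; the structure is an assumption inferred from the selectors used above):

```python
from bs4 import BeautifulSoup

# Static HTML mimicking the assumed shape of the results page.
html = """
<div class="defLink"><a>bead</a></div>
<div class="defLink"><a>bade</a></div>
<div class="defLink"><a>cab</a></div>
"""
soup = BeautifulSoup(html, 'html.parser')

def get_words(soup, num):
    # Same filter as getWord(): keep only the anchors of the wanted length.
    return [tag.select('a')[0].text
            for tag in soup.select('.defLink')
            if len(tag.select('a')[0].text) == num]

print(get_words(soup, 4))  # → ['bead', 'bade']
```

If the site ever changes its CSS classes, this selector is the only part that needs updating.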
The code is as follows:
```python
from urllib import request
from bs4 import BeautifulSoup

def getHTML(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()

word4 = ['abed', 'aced', 'bade', 'bead', 'cade', 'dace']
for word in word4:
    soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E/" + word), 'html.parser')
    if soup.select('.def.ddef_d.db'):
        print(word)
        print(soup.select('.def.ddef_d.db')[0].text.replace('\n', ' '))
# English-definition scraper
```
The output is shown in the figure below:
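The long `%E8%A9%9E...` runs in the Cambridge Dictionary URLs are just the percent-encoded form of the Chinese path segments. Rather than pasting them by hand, they can be built (or decoded) with the standard library:

```python
from urllib.parse import quote, unquote

base = "https://dictionary.cambridge.org/zht/"
# "詞典/英語/" percent-encodes to the %E8%A9%9E... path used above.
path = quote("詞典/英語/", safe="/")
print(base + path + "bead")
print(unquote(path))  # decodes back to the readable Chinese segments
```

This makes the URLs self-documenting and avoids copy-paste mistakes in the encoded bytes.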
The code is as follows:
```python
from urllib import request
from bs4 import BeautifulSoup

def getHTML(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'}
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()

word4 = ['abed', 'aced', 'bade', 'bead', 'cade', 'dace']
for word in word4:
    soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/" + word), 'html.parser')
    if soup.select('.def.ddef_d.db'):
        print(word + soup.select('.def-body.ddef_b')[0].text)
# Chinese-definition scraper
```
The output is shown in the figure below:
The finished version, with some small optimizations: fake-useragent support was added, and the total search time is measured.
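The timing part can be sketched in isolation. The listing below uses `time.time()`, which works, though `time.perf_counter()` is the clock recommended for measuring intervals; the summing loop here is just a hypothetical stand-in for the real scraping work:

```python
import time

start = time.perf_counter()
total = sum(range(1_000_000))  # stand-in for the real scraping work
elapsed = time.perf_counter() - start
print(format(elapsed, '.2f'), 'seconds')
```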
The code is as follows:
```python
import requests
from bs4 import BeautifulSoup
from urllib import request
from fake_useragent import UserAgent
import time

ua = UserAgent()
inputword = 'terofs'  # lyeirwa alocilg nsiore eseatt outgfh kidnr
wordfind = {
    'words': inputword
}
startTime = time.time()
headers = {'User-Agent': ua.random}
res = requests.post("https://wordfind.com/", data=wordfind, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')
# ---------------------------------------------------------------------------
def getWord(words, num):  # collect the words of the given letter count from the Scrabble Word Finder page source
    for word in soup.select('.defLink'):
        if len(word.select('a')[0].text) == num:
            words.append(word.select('a')[0].text)

def getHTML(url):
    headers = {'User-Agent': ua.random}
    req = request.Request(url, headers=headers)
    return request.urlopen(req).read()

def wordCheck(words, word_checked, wordMeaning):
    if words:  # skip the dictionary lookup when no words of this length were found
        for word in words:
            soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E-%E6%BC%A2%E8%AA%9E-%E7%B9%81%E9%AB%94/" + word), 'html.parser')
            if soup.select('.def.ddef_d.db'):
                word_checked.append(word)
                wordMeaning.append(soup.select('.def-body.ddef_b')[0].text)
            else:  # no English-Chinese entry: fall back to the English-only dictionary
                soup = BeautifulSoup(getHTML("https://dictionary.cambridge.org/zht/%E8%A9%9E%E5%85%B8/%E8%8B%B1%E8%AA%9E/" + word), 'html.parser')
                if soup.select('.def.ddef_d.db'):
                    word_checked.append(word)
                    wordMeaning.append(soup.select('.def.ddef_d.db')[0].text.replace(':', '.'))

def printWord(words):
    if words:
        print(str(len(words[0])) + ' Letter Words ( ' + str(len(words)) + ' words Found )')
        for word in words:
            print(" " + str(words.index(word) + 1) + '. ' + word)

def printWordMeaning(words, WordMeaning):
    if words:
        print('Word Meaning :')
        for wordmeaning in WordMeaning:
            print(str(WordMeaning.index(wordmeaning) + 1) + '. ' + words[WordMeaning.index(wordmeaning)])
            print(wordmeaning)
# ---------------------------------------------------------------------------
def wordGetter(num, wordstmp, words, wordsMeaning):
    getWord(wordstmp, num)
    wordCheck(wordstmp, words, wordsMeaning)
    printWord(words)
    printWordMeaning(words, wordsMeaning)
# ---------------------------------------------------------------------------
wordstmp = [[] for _ in range(10)]
words = [[] for _ in range(10)]
wordsMeaning = [[] for _ in range(10)]
print("From the letters you entered, '" + inputword + "', the following words can be formed:")
for i in reversed(range(10)):  # longest words (12 letters) first, down to 3 letters
    wordGetter(i + 3, wordstmp[i], words[i], wordsMeaning[i])
totalNum = sum(len(w) for w in words)
print(str(totalNum) + ' words Found')
endTime = time.time()
elapsed = endTime - startTime
print("Total search time:", format(elapsed, '.2f'), 'seconds, about ' + format(elapsed / 60, '.2f') + ' minutes')
print(words)
```
An excerpt of the output is shown in the figure below:
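Most of the total search time is spent on the sequential dictionary requests, one per word. A possible optimization (not part of the original project) is to run the lookups concurrently, since they are network-bound and can overlap. A minimal sketch using only the standard library, with a pure stand-in for the network call:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the per-word dictionary lookup
# (the real version would fetch and parse the dictionary page).
def lookup(word):
    return word, len(word)

words = ['abed', 'aced', 'bade', 'bead', 'cade', 'dace']

# Threads let the waits on several lookups overlap instead of adding up.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(lookup, words))

print(results['bead'])  # → 4
```

With real requests, a small `max_workers` value keeps the load on the dictionary site polite while still cutting the total wait substantially.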