Day 15 Web Application Analysis (Web Page Snapshots with cutycapt)

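Introduction

Today's tool, cutycapt, does not feel much like web analysis, but Kali files it under Web Application Analysis, in the Web Crawlers & Directory Bruteforce subcategory. It is a crawler-style tool that takes snapshot screenshots of web pages. It is typically used to watch how a page changes over a long period: weather maps, Google Maps, a news site's daily front page, and so on.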

Tool overview

Usage is simple: specify the page and the output filename.

cutycapt --url=http://example.com --out=example.png
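Beyond --url and --out, cutycapt accepts a few options that help with pages that render slowly or need a wider viewport. The sketch below wraps them in a hypothetical helper named capture_page; the width, delay, and timeout values are illustrative assumptions, not recommendations from the tool's authors.

```shell
#!/bin/sh
# capture_page URL OUT
# Hypothetical helper around cutycapt with assumed example values:
#   --min-width=1280   render at least 1280 px wide
#   --delay=2000       wait 2000 ms after the page has loaded before capturing
#   --max-wait=30000   give up after 30000 ms
capture_page() {
    cutycapt --url="$1" --out="$2" --min-width=1280 --delay=2000 --max-wait=30000
}
```

Calling capture_page http://example.com example.png would then produce the same kind of file as the command above, just with the extra options applied.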

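A more advanced use is to capture the page periodically through a scheduler. To name each file by the time it was taken, first write a script, say /home/kali/script.sh, with the following contents: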

timestamp=$(date +%s)
cutycapt --url=http://example.com --out="/tmp/$timestamp.png"

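Running the script directly with sh /home/kali/script.sh produces a file whose name is the Unix timestamp (seconds since the epoch) at the moment of capture.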
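Epoch seconds sort correctly but are hard to read at a glance. As a minimal sketch, the hypothetical helper snapshot_path below builds a filename from a human-readable date instead; the YYYYMMDD-HHMMSS layout is an assumption chosen so that names still sort chronologically.

```shell
#!/bin/sh
# snapshot_path DIR
# Hypothetical helper: print DIR/<YYYYMMDD-HHMMSS>.png for the current time.
snapshot_path() {
    printf '%s/%s.png\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Show the path that a capture started right now would be written to.
snapshot_path /tmp
```

In the script, the result can be passed to cutycapt as --out="$(snapshot_path /tmp)".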

The schedule can be set up with crontab. First run crontab -e; it asks which editor to use for editing the schedule. Here I chose 2.

no crontab for root - using an empty one

Select an editor.  To change later, run 'select-editor'.
  1. /bin/nano        <---- easiest
  2. /usr/bin/vim.basic
  3. /usr/bin/vim.tiny

Choose 1-3 [1]:

The editor then opens, and here we can define when the schedule fires and which command it runs.

# Edit this file to introduce tasks to be run by cron.
# 
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
# 
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').
# 
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
# 
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
# 
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
# 
# For more information see the manual pages of crontab(5) and cron(8)
# 
# m h  dom mon dow   command

# Add the following line; it runs the command after it every minute
* * * * * sh /home/kali/script.sh
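Once a minute is handy for testing but is likely more frequent than long-term observation needs. As a minimal sketch, the hypothetical helper cron_line below (not part of cron or cutycapt) prints a crontab entry that uses cron's */N step syntax in the minute field; the 5-minute interval is only an illustrative choice.

```shell
#!/bin/sh
# cron_line MINUTES SCRIPT
# Hypothetical helper: print a crontab entry that runs SCRIPT every MINUTES minutes.
cron_line() {
    printf '*/%s * * * * sh %s\n' "$1" "$2"
}

cron_line 5 /home/kali/script.sh
# prints: */5 * * * * sh /home/kali/script.sh
```

The printed line can be pasted into the crontab file in place of the every-minute entry.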

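As you can see, the schedule is * * * * *, which means the command that follows runs every minute. Type :wq to leave the vim editor, and cron will start running the script once a minute.

We also have to modify the script we just wrote. cutycapt needs a windowing (X) environment to work, so when it is launched from cron it cannot take the screenshot. Following the official recommendation, we use xvfb-run to avoid this problem. Modify the script as follows and it will work.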

timestamp=$(date +%s)
xvfb-run --server-args="-screen 0, 1024x768x24" cutycapt --url=http://example.com --out="/tmp/$timestamp.png"
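If the same script should work both from a desktop terminal and from cron, one option is to choose the launcher at run time. The sketch below assumes that an empty DISPLAY variable means no X server is available, which is a simplification; capture is a hypothetical function name. It also passes --auto-servernum so that xvfb-run picks a free display number, which may help if two scheduled runs overlap.

```shell
#!/bin/sh
# capture URL OUT
# Hypothetical helper: call cutycapt directly when DISPLAY is set,
# otherwise wrap it in xvfb-run so it gets a virtual X server.
capture() {
    url="$1"
    out="$2"
    if [ -n "${DISPLAY:-}" ]; then
        cutycapt --url="$url" --out="$out"
    else
        xvfb-run --auto-servernum --server-args="-screen 0, 1024x768x24" \
            cutycapt --url="$url" --out="$out"
    fi
}
```

The script body would then become capture http://example.com "/tmp/$timestamp.png".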

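Other tools

Snapshot screenshots are a fun feature, but sometimes we want to archive the text of a page so it can be searched later. cutycapt cannot do that; wget can download the whole page instead.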

wget is most commonly used to download files. In earlier posts, when we needed to fetch a file from the internet, we used it like this:

wget --no-check-certificate https://wordpress.org/wordpress-4.7-RC1.tar.gz

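Beyond files, wget can also download web pages, and it usually works best on static pages. Here -c resumes an interrupted download and -r downloads recursively.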

wget -c -r http://example.com

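When the content is large and you are worried about affecting the rest of the network, wget can also limit its download speed through an option. Run wget -h for the full list.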
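As a minimal sketch of a throttled download, the hypothetical helper mirror_page below combines the -c and -r options from above with --limit-rate, and adds --level and --no-parent to keep the recursion bounded. The 200k default rate and the depth of 2 are assumed example values.

```shell
#!/bin/sh
# mirror_page URL [RATE]
# Hypothetical helper: resumable, recursive download with a bandwidth cap.
# RATE uses wget's --limit-rate syntax (for example 200k or 1m).
mirror_page() {
    url="$1"
    rate="${2:-200k}"   # assumed default cap
    wget -c -r --level=2 --no-parent --limit-rate="$rate" "$url"
}
```

For example, mirror_page http://example.com 100k would cap the transfer at roughly 100 KB/s.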
