How can I take a screenshot/image of a website using Python?

Author: 手机用户2402852307 | 2023-08-22 11:50

What I want to achieve is to take a screenshot of any website from Python.

Environment: Linux

1> hoju:

Here is a simple solution using webkit: http://webscraping.com/blog/Webpage-screenshots-with-webkit/

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def capture(self, url, output_file):
        self.load(QUrl(url))
        self.wait_load()
        # set to webpage size
        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())
        # render image
        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        print('saving %s' % output_file)
        image.save(output_file)

    def wait_load(self, delay=0):
        # process app events until page loaded
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

s = Screenshot()
s.capture('http://webscraping.com', 'website.png')
s.capture('http://webscraping.com/blog', 'blog.png')

Has anyone run into problems using @hoju's method? It doesn't work for every web page...
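The wait_load loop above has no upper bound, so a page that never emits loadFinished would keep it spinning forever. Whether that explains the failures mentioned in the comment is not established here. As a minimal sketch (Python 3, standard library only), a polling loop with a timeout could look like the following. The names wait_until, is_done, and pump_events are hypothetical helpers, not part of PyQt.

```python
import time


def wait_until(is_done, pump_events, timeout=30.0, poll_interval=0.01):
    """Poll is_done() until it returns True or `timeout` seconds elapse.

    pump_events() is called once per iteration; in the Screenshot class it
    would be self.app.processEvents (an assumption about how it is wired in).
    Returns True if is_done() became true, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while not is_done():
        if time.monotonic() >= deadline:
            return False
        pump_events()
        time.sleep(poll_interval)
    return True
```

Inside wait_load, this would be called as wait_until(lambda: self._loaded, self.app.processEvents), with the caller deciding what to do when it returns False.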

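2> Aamir Adnan:

Here is my solution, pieced together with help from various sources. It captures the full web page, optionally crops it, and generates a thumbnail from the cropped image.

Requirements:

Install NodeJS

Using Node's package manager, install phantomjs: npm -g install phantomjs

Install selenium (in your virtualenv, if you are using one)

Install ImageMagick

Add phantomjs to the system path (on Windows)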

import os
from subprocess import Popen, PIPE
from selenium import webdriver

abspath = lambda *p: os.path.abspath(os.path.join(*p))
ROOT = abspath(os.path.dirname(__file__))


def execute_command(command):
    result = Popen(command, shell=True, stdout=PIPE).stdout.read()
    if len(result) > 0 and not result.isspace():
        raise Exception(result)


def do_screen_capturing(url, screen_path, width, height):
    print("Capturing screen..")
    # PhantomJS saves its service log file in the same directory.
    # To store the log file elsewhere, initialize the driver as
    # driver = webdriver.PhantomJS(service_log_path='/var/log/phantomjs/ghostdriver.log')
    driver = webdriver.PhantomJS()
    try:
        driver.set_script_timeout(30)
        if width and height:
            driver.set_window_size(width, height)
        driver.get(url)
        driver.save_screenshot(screen_path)
    finally:
        # always shut down the PhantomJS process, even if the capture fails
        driver.quit()


def do_crop(params):
    print("Cropping captured image..")
    command = [
        'convert',
        params['screen_path'],
        '-crop', '%sx%s+0+0' % (params['width'], params['height']),
        params['crop_path']
    ]
    execute_command(' '.join(command))


def do_thumbnail(params):
    print("Generating thumbnail from cropped captured image..")
    command = [
        'convert',
        params['crop_path'],
        '-filter', 'Lanczos',
        '-thumbnail', '%sx%s' % (params['width'], params['height']),
        params['thumbnail_path']
    ]
    execute_command(' '.join(command))


def get_screen_shot(**kwargs):
    url = kwargs['url']
    width = int(kwargs.get('width', 1024)) # screen width to capture
    height = int(kwargs.get('height', 768)) # screen height to capture
    filename = kwargs.get('filename', 'screen.png') # file name e.g. screen.png
    path = kwargs.get('path', ROOT) # directory path to store screen

    crop = kwargs.get('crop', False) # crop the captured screen
    crop_width = int(kwargs.get('crop_width', width)) # the width of crop screen
    crop_height = int(kwargs.get('crop_height', height)) # the height of crop screen
    crop_replace = kwargs.get('crop_replace', False) # does crop image replace original screen capture?

    thumbnail = kwargs.get('thumbnail', False) # generate thumbnail from screen, requires crop=True
    thumbnail_width = int(kwargs.get('thumbnail_width', width)) # the width of thumbnail
    thumbnail_height = int(kwargs.get('thumbnail_height', height)) # the height of thumbnail
    thumbnail_replace = kwargs.get('thumbnail_replace', False) # does thumbnail image replace crop image?

    screen_path = abspath(path, filename)
    crop_path = thumbnail_path = screen_path

    if thumbnail and not crop:
        raise Exception('Thumbnail generation requires a cropped image, set crop=True')

    do_screen_capturing(url, screen_path, width, height)

    if crop:
        if not crop_replace:
            crop_path = abspath(path, 'crop_'+filename)
        params = {
            'width': crop_width, 'height': crop_height,
            'crop_path': crop_path, 'screen_path': screen_path}
        do_crop(params)

        if thumbnail:
            if not thumbnail_replace:
                thumbnail_path = abspath(path, 'thumbnail_'+filename)
            params = {
                'width': thumbnail_width, 'height': thumbnail_height,
                'thumbnail_path': thumbnail_path, 'crop_path': crop_path}
            do_thumbnail(params)
    return screen_path, crop_path, thumbnail_path


if __name__ == '__main__':
    '''
        Requirements:
        Install NodeJS
        Using Node's package manager install phantomjs: npm -g install phantomjs
        install selenium (in your virtualenv, if you are using that)
        install imageMagick
        add phantomjs to system path (on windows)
    '''

    url = 'http://stackoverflow.com/questions/1197172/how-can-i-take-a-screenshot-image-of-a-website-using-python'
    screen_path, crop_path, thumbnail_path = get_screen_shot(
        url=url, filename='sof.png',
        crop=True, crop_replace=False,
        thumbnail=True, thumbnail_replace=False,
        thumbnail_width=200, thumbnail_height=150,
    )

These are the generated images:

Full web page screen

Cropped image from the captured screen

Thumbnail of the cropped image

The question is about Python, not NodeJS.
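The answer's do_crop and do_thumbnail join their arguments into one string and run it through a shell, which would split any path containing spaces. A possible variation, sketched below, keeps the same convert options (-crop WxH+0+0, -filter Lanczos, -thumbnail WxH) but passes them as an argument list. The function names crop_command, thumbnail_command, and run_command are hypothetical. Unlike execute_command, this version treats a non-zero exit status as the failure signal rather than any output on stdout.

```python
import subprocess


def crop_command(screen_path, crop_path, width, height):
    """Argument list that crops a width x height region from the top-left corner."""
    return ['convert', screen_path,
            '-crop', '%sx%s+0+0' % (width, height),
            crop_path]


def thumbnail_command(crop_path, thumbnail_path, width, height):
    """Argument list that resizes the cropped image into a thumbnail."""
    return ['convert', crop_path,
            '-filter', 'Lanczos',
            '-thumbnail', '%sx%s' % (width, height),
            thumbnail_path]


def run_command(args):
    """Run args without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(args)
```

With ImageMagick installed, run_command(crop_command('sof.png', 'crop_sof.png', 1024, 768)) would perform the crop step.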

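3> ars:

On the Mac, there is webkit2png, and on Linux + KDE you can use khtml2png. I've tried the former and it works quite well, and I've heard of the latter being put to use.

I recently came across QtWebKit, which claims to be cross-platform (Qt rolled WebKit into their library, I guess). But I've never tried it, so I can't tell you much more.

The QtWebKit links show how to access it from Python. You should at least be able to use subprocess to do the same with the other tools.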
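As a rough sketch of the subprocess approach (Python 3.5+), the helper below runs an external screenshot command and checks that the expected file was written. The name run_screenshot_tool is hypothetical, and the command line itself is left to the caller because the exact arguments for webkit2png or khtml2png are not given in this thread.

```python
import os
import subprocess


def run_screenshot_tool(argv, output_file, timeout=60):
    """Run an external screenshot command and confirm it wrote output_file.

    argv is the complete command line as a list; its contents depend on the
    tool being called and are deliberately not specified here.
    """
    completed = subprocess.run(argv, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, timeout=timeout)
    if completed.returncode != 0:
        raise RuntimeError('command failed (%d): %s'
                           % (completed.returncode,
                              completed.stderr.decode(errors='replace')))
    if not os.path.isfile(output_file):
        raise RuntimeError('command succeeded but %s was not created'
                           % output_file)
    return output_file
```

Passing the arguments as a list avoids shell quoting issues with URLs that contain characters such as & or ?.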

4> Joolah:

You can use Selenium.

from selenium import webdriver

DRIVER = 'chromedriver'
driver = webdriver.Chrome(DRIVER)
driver.get('https://www.spotify.com')
screenshot = driver.save_screenshot('my_screenshot.png')
driver.quit()

https://sites.google.com/a/chromium.org/chromedriver/getting-started
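In the snippet above, driver.quit() is skipped if driver.get or save_screenshot raises, which can leave a browser process running. One way to guard against that, sketched here with hypothetical names (managed_driver, capture), is a small context manager that takes a driver factory. It relies only on the get, save_screenshot, and quit methods already used above.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver with factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


def capture(url, output_file, factory):
    """Load url and save a screenshot; returns whatever save_screenshot returns."""
    with managed_driver(factory) as driver:
        driver.get(url)
        return driver.save_screenshot(output_file)
```

Assuming Selenium is installed and chromedriver can be located, a call such as capture('https://www.spotify.com', 'my_screenshot.png', webdriver.Chrome) would reproduce the example above.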

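5> aezell:

I can't comment on ars's answer, but I actually got Roland Tapken's code running with QtWebkit and it works quite well.

Just wanted to confirm that what Roland posted on his blog works great on Ubuntu. Our production version ended up not using any of what he wrote, but we are using the PyQt/QtWebKit bindings with much success.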
