
How to extract img src, title and alt from HTML using PHP?


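I would like to create a page that lists every image on my website together with its title and alternative text.

I have already written a small program that finds and loads all the HTML files, but now I am stuck on how to extract src, title and alt from this HTML: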

<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" />

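I guess this should be done with a regular expression, but since the order of the attributes may vary and I need all of them, I don't really know how to parse this in an elegant way (I could do it character by character, but that's painful).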



1> 小智..:
$url="http://example.com";

$html = file_get_contents($url);

$doc = new DOMDocument();
@$doc->loadHTML($html);

$tags = $doc->getElementsByTagName('img');

foreach ($tags as $tag) {
    echo $tag->getAttribute('src');
}


I like how easy this is to read! XPath and regex would work too, but they are never quite as easy to read 18 months later.
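The snippet above prints only src. As a minimal sketch of how the same DOMDocument approach could collect all three attributes the question asks for, the example below gathers them into an array. The function name `extractImageAttributes` and the sample markup are assumptions made for illustration, and `libxml_use_internal_errors()` is swapped in for the `@` operator to keep parser warnings quiet.

```php
<?php
// Hypothetical helper: returns one array per <img> with src, title and alt.
function extractImageAttributes(string $html): array
{
    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $images = [];
    foreach ($doc->getElementsByTagName('img') as $tag) {
        // getAttribute() returns an empty string when the attribute is absent.
        $images[] = [
            'src'   => $tag->getAttribute('src'),
            'title' => $tag->getAttribute('title'),
            'alt'   => $tag->getAttribute('alt'),
        ];
    }
    return $images;
}

// Sample markup, assumed for this example only.
$html = '<html><body><p><img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny" /></p></body></html>';

print_r(extractImageAttributes($html));
```

Because missing attributes come back as empty strings, a page listing can flag images that lack a title or alt text by checking for `''`.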

2> e-satis..:

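EDIT: now that I know better

Using regexp to solve this kind of problem is a bad idea and will likely lead to unmaintainable and unreliable code. Better use an HTML parser.

Solution with regexp

In that case it's better to split the process into two parts:

get all the img tags

extract their metadata

I will assume your doc is not strict xHTML, so you can't use an XML parser. E.g. with this web page's source code: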

/* preg_match_all matches the regexp against the whole $html string and stores every
match as an array in $result. The "i" option makes it case-insensitive. */

preg_match_all('/<img[^>]+>/i', $html, $result);

print_r($result);
Array
(
    [0] => Array
        (
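            [0] => <img src="/Content/Img/stackoverflow-logo-250.png" alt="logo link to homepage" />
            [1] => <img src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />
            [2] => <img src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" />
            [3] => <img src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />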

[...]
        )

)

Then we get all the img tag attributes with a loop:

$img = array();
foreach( $result[0] as $img_tag)
{
    preg_match_all('/(alt|title|src)=("[^"]*")/i',$img_tag, $img[$img_tag]);
}

print_r($img);

Array
(
    [<img src="/Content/Img/stackoverflow-logo-250.png" alt="logo link to homepage" />] => Array
        (
            [0] => Array
                (
                    [0] => src="/Content/Img/stackoverflow-logo-250.png"
                    [1] => alt="logo link to homepage"
                )

            [1] => Array
                (
                    [0] => src
                    [1] => alt
                )

            [2] => Array
                (
                    [0] => "/Content/Img/stackoverflow-logo-250.png"
                    [1] => "logo link to homepage"
                )

        )

    [<img src="/content/img/vote-arrow-up.png" alt="vote up" title="This was helpful (click again to undo)" />] => Array
        (
            [0] => Array
                (
                    [0] => src="/content/img/vote-arrow-up.png"
                    [1] => alt="vote up"
                    [2] => title="This was helpful (click again to undo)"
                )

            [1] => Array
                (
                    [0] => src
                    [1] => alt
                    [2] => title
                )

            [2] => Array
                (
                    [0] => "/content/img/vote-arrow-up.png"
                    [1] => "vote up"
                    [2] => "This was helpful (click again to undo)"
                )

        )

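    [<img src="/content/img/vote-arrow-down.png" alt="vote down" title="This was not helpful (click again to undo)" />] => Array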
        (
            [0] => Array
                (
                    [0] => src="/content/img/vote-arrow-down.png"
                    [1] => alt="vote down"
                    [2] => title="This was not helpful (click again to undo)"
                )

            [1] => Array
                (
                    [0] => src
                    [1] => alt
                    [2] => title
                )

            [2] => Array
                (
                    [0] => "/content/img/vote-arrow-down.png"
                    [1] => "vote down"
                    [2] => "This was not helpful (click again to undo)"
                )

        )

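    [...]
)

3> [...]..:

$doc = new DOMDocument();
// The opening of this snippet is truncated in the source; the markup below is a
// stand-in. Only the trailing alt=\"alt\" attribute survives from the original string.
$doc->loadHTML("<html><body><img src=\"/image/fluffybunny.jpg\" title=\"Harvey the bunny\" alt=\"alt\"></body></html>");
$xml = simplexml_import_dom($doc); // just to make XPath simpler
$images = $xml->xpath('//img');
foreach ($images as $img) {
    echo $img['src'] . ' ' . $img['alt'] . ' ' . $img['title'];
}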

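I did use the DOMDocument::loadHTML() method because it can cope with HTML syntax and does not force the input document to be XHTML. Strictly speaking, the conversion to a SimpleXMLElement is not necessary; it just makes using XPath, and working with the XPath results, simpler.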
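Since the conversion to SimpleXMLElement is described as optional, here is a rough sketch of the same query run through `DOMXPath` directly on the DOMDocument. The sample markup is an assumption for this example, not taken from the answer.

```php
<?php
// Sample markup, assumed for this example only.
$html = '<html><body><img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny"></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// DOMXPath queries the DOMDocument directly, with no SimpleXML conversion.
$xpath = new DOMXPath($doc);

$rows = [];
foreach ($xpath->query('//img') as $img) {
    // Each result is a DOMElement, so attributes are read with getAttribute().
    $rows[] = $img->getAttribute('src') . ' '
            . $img->getAttribute('alt') . ' '
            . $img->getAttribute('title');
}

echo implode("\n", $rows), "\n";
```

The trade-off is slightly more verbose attribute access (`getAttribute('src')` instead of `$img['src']`) in exchange for skipping one conversion step.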



4> DreamWerx..:

If it's XHTML, which your example is, you only need simpleXML.

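<?php
$input = '<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny"/>';
$sx = simplexml_load_string($input);
var_dump($sx);
?>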

Output:

object(SimpleXMLElement)#1 (1) {
  ["@attributes"]=>
  array(3) {
    ["src"]=>
    string(22) "/image/fluffybunny.jpg"
    ["title"]=>
    string(16) "Harvey the bunny"
    ["alt"]=>
    string(26) "a cute little fluffy bunny"
  }
}
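The dump shows the attributes but not how to read them. As a small sketch, assuming the same single-tag input as in the question, the values can be pulled out with array-style access or by iterating `attributes()`. The variable names `$src`, `$title`, `$alt` and `$all` are chosen here for illustration.

```php
<?php
$input = '<img src="/image/fluffybunny.jpg" title="Harvey the bunny" alt="a cute little fluffy bunny"/>';
$sx = simplexml_load_string($input);

// Array-style access returns a SimpleXMLElement; cast to string to get the value.
$src   = (string) $sx['src'];
$title = (string) $sx['title'];
$alt   = (string) $sx['alt'];

// attributes() iterates over every attribute as name => value.
$all = [];
foreach ($sx->attributes() as $name => $value) {
    $all[$name] = (string) $value;
}

print_r($all);
```

Note that `simplexml_load_string()` returns `false` on markup that is not well-formed XML, so real pages that are not valid XHTML would need a check before the attribute access.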



5> Bakudan..:

The script must be edited like this

foreach( $result[0] as $img_tag)

because preg_match_all returns an array of arrays
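To make the "array of arrays" point concrete, here is a minimal sketch under assumed sample markup. It iterates `$result[0]` and, as a deliberate variation on the regexp answer, uses `PREG_SET_ORDER` and moves the quotes outside the capture group so each tag ends up as a plain name => value array. The regexp approach stays as fragile as that answer itself warns.

```php
<?php
// Sample markup, assumed for this example only.
$html = '<p><img src="/a.png" alt="first"><img src="/b.png" alt="second" title="B"></p>';

preg_match_all('/<img[^>]+>/i', $html, $result);

// $result[0] holds the full matches; iterating $result itself would hand
// preg_match_all() an array instead of a string in the next step.
$img = [];
foreach ($result[0] as $img_tag) {
    // PREG_SET_ORDER groups each attribute match as [full, name, value].
    preg_match_all('/(alt|title|src)="([^"]*)"/i', $img_tag, $attrs, PREG_SET_ORDER);

    $row = [];
    foreach ($attrs as $attr) {
        $row[strtolower($attr[1])] = $attr[2];
    }
    $img[] = $row;
}

print_r($img);
```

With the default flag (`PREG_PATTERN_ORDER`) the matches are grouped per capture group instead, which is the shape shown in the regexp answer's output.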



6> Nauphal..:

You can use simplehtmldom. It supports most jQuery selectors. An example is given below.

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';

// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
