当前位置:  开发笔记 > 编程语言 > 正文

如何通过PHP检查URL是否存在?

如何解决《如何通过PHP检查URL是否存在?》经验,为你挑选了7个好方法。

如何在PHP中检查URL是否存在(而不是404)?



1> karim79..:

这里:

$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if(!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
}
else {
    $exists = true;
}

从这里和上面的帖子正下方,有一个卷曲解决方案:

function url_exists($url) {
    if (!$fp = curl_init($url)) return false;
    return true;
}


也许使用`strpos($ headers [0],'404 Not Found')`可能会成功
我担心CURL方式不会这样.看看这个:http://stackoverflow.com/questions/981954/how-can-one-check-to-see-if-a-remote-file-exists-using-php/982045#982045
@Mark同意了!为了澄清,`strpos($ headers [0],'404')`更好!
一些网站在错误页面上有一个不同的`$ file_headers [0]`.例如,youtube.com.它的错误页面的值为"HTTP/1.0 404未找到"(差异为1.0和1.1).该怎么办?

2> MoonLite..:

在确定php中是否存在url时,有几点需要注意:

url本身是否有效(字符串,不是空的,良好的语法),这很快检查服务器端.

等待响应可能需要时间并阻止代码执行.

并非get_headers()返回的所有头文件都是格式良好的.

使用卷曲(如果可以).

防止获取整个正文/内容,但仅请求标题.

考虑重定向网址:

你想要第一个代码返回吗?

或者按照所有重定向并返回最后一个代码?

您可能最终得到200,但它可以使用元标记或javascript重定向.弄清楚之后会发生什么是艰难的.

请记住,无论您使用何种方法,都需要时间等待响应.
所有代码可能(并且可能会)停止,直到您知道结果或请求已超时.

例如:如果网址无效或无法访问,则以下代码可能需要很长时间才能显示该网页:

$url){
  // this could potentially take 0-30 seconds each
  // (more or less depending on connection, target site, timeout settings...)
  if( ! isValidUrl($url) ){
    unset($urls[$k]);
  }
}

echo "yay all done! now show my site";
foreach($urls as $url){
  echo "{$url}
"; }

以下功能可能会有所帮助,您可能需要修改它们以满足您的需求:

    function isValidUrl($url){
        // first do some quick sanity checks:
        if(!$url || !is_string($url)){
            return false;
        }
        // quick check url is roughly a valid http request: ( http://blah/... ) 
        if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
            return false;
        }
        // the next bit could be slow:
        if(getHttpResponseCode_using_curl($url) != 200){
//      if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you cant use curl
            return false;
        }
        // all good!
        return true;
    }

    function getHttpResponseCode_using_curl($url, $followredirects = true){
        // returns int responsecode, or false (if url does not exist or connection timeout occurs)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $ch = @curl_init($url);
        if($ch === false){
            return false;
        }
        @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
        @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // dont need body
        @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
        if($followredirects){
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
            @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
        }else{
            @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
        }
//      @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//      @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
        @curl_exec($ch);
        if(@curl_errno($ch)){   // should be 0
            @curl_close($ch);
            return false;
        }
        $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
        @curl_close($ch);
        return $code;
    }

    function getHttpResponseCode_using_getheaders($url, $followredirects = true){
        // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
        // NOTE: could potentially take up to 0-30 seconds , blocking further code execution (more or less depending on connection, target site, and local timeout settings))
        // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
        // if $followredirects == true : return the LAST  known httpcode (when redirected)
        if(! $url || ! is_string($url)){
            return false;
        }
        $headers = @get_headers($url);
        if($headers && is_array($headers)){
            if($followredirects){
                // we want the the last errorcode, reverse array so we start at the end:
                $headers = array_reverse($headers);
            }
            foreach($headers as $hline){
                // search for things like "HTTP/1.1 200 OK" , "HTTP/1.0 200 OK" , "HTTP/1.1 301 PERMANENTLY MOVED" , "HTTP/1.1 400 Not Found" , etc.
                // note that the exact syntax/version/output differs, so there is some string magic involved here
                if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])\s+.*/', $hline, $matches) ){// "HTTP/*** ### ***"
                    $code = $matches[1];
                    return $code;
                }
            }
            // no HTTP/xxx found in headers:
            return false;
        }
        // no headers :
        return false;
    }


如果有人有同样的问题,请检查dns-nameservers ..使用没有后续行为的opendns http://stackoverflow.com/a/11072947/1829460

3> lunarnet76..:
$headers = @get_headers($this->_value);
if(strpos($headers[0],'200')===false)return false;

所以,只要你联系一个网站并得到200以上的其他东西就可以了


但是,如果它是重定向怎么办?该域名仍然有效,但将被排除在外.
上面一行:`return strpos(@get_headers($ url)[0],'200')=== false?false:是的.可能有用.

4> mebjas..:

你不能在某些服务器上使用curl你可以使用这个代码




5> Randy Skretk..:
$url = 'http://google.com';
$not_url = 'stp://google.com';

if (@file_get_contents($url)): echo "Found '$url'!";
else: echo "Can't find '$url'.";
endif;
if (@file_get_contents($not_url)): echo "Found '$not_url!";
else: echo "Can't find '$not_url'.";
endif;

// Found 'http://google.com'!Can't find 'stp://google.com'.


如果关闭allow-url-fopen,这将不起作用. - http://www.php.net/manual/en/filesystem.configuration.php#ini.allow-url-fopen
我建议只读取第一个字节... if(@file_get_contents($ url,false,NULL,0,1))

6> 小智..:
function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    $InvalidHeaders = array('404', '403', '500');
    foreach($InvalidHeaders as $HeaderVal)
    {
            if(strstr($file_headers[0], $HeaderVal))
            {
                    $exists = false;
                    break;
            }
    }
    return $exists;
}



7> Ehsan..:

我用这个函数:

/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }

    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}

推荐阅读
贴进你的心聆听你的世界
这个屌丝很懒,什么也没留下!
DevBox开发工具箱 | 专业的在线开发工具网站    京公网安备 11010802040832号  |  京ICP备19059560号-6
Copyright © 1998 - 2020 DevBox.CN. All Rights Reserved devBox.cn 开发工具箱 版权所有