How can I check in PHP whether a URL exists (i.e. does not return a 404)?
Here:
$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
// Note: this only matches this exact status line; "HTTP/1.0 404 Not Found" would not match.
if (!$file_headers || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
    $exists = false;
} else {
    $exists = true;
}
From here, and right below the post above, there is a cURL solution:
function url_exists($url) {
    // Caution: curl_init() never contacts the server, so this almost always returns true.
    if (!$fp = curl_init($url)) return false;
    return true;
}
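A version of the cURL approach that actually performs a request: it sends a HEAD request (`CURLOPT_NOBODY`), follows redirects, bounds the wait with timeouts, and treats any final status below 400 as "exists". This is a sketch assuming the cURL extension is installed; the name `url_exists_curl()`, the `$timeout` default and the below-400 rule are illustrative choices, and some servers answer HEAD requests differently from GET.

```php
<?php
// Sketch (hypothetical name): HEAD request, follow redirects, and report
// whether the final status code is in the 2xx/3xx range.
// Assumes the cURL extension is installed.
function url_exists_curl(string $url, int $timeout = 10): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt($ch, CURLOPT_NOBODY, true);             // HEAD request, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // don't print anything
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);     // report the final code
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);            // avoid endless redirect loops
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // cap time spent connecting
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // cap total time

    // curl_exec() returns false on DNS, connection or timeout errors.
    $ok = curl_exec($ch) !== false;
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return $ok && $code >= 200 && $code < 400;
}

// Usage (performs a network request):
// var_dump(url_exists_curl('http://www.domain.com/somefile.jpg'));
```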
When determining in PHP whether a URL exists, there are a few things to keep in mind:
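- Whether the URL itself is valid (a string, not empty, good syntax); this is quick to check on the server side.
- Waiting for a response can take time and blocks code execution.
- Not all headers returned by get_headers() are well-formed.
- Use cURL (if you can).
- Avoid fetching the whole body/content; request only the headers.
- Consider redirected URLs:
  - Do you want the first status code returned?
  - Or do you want to follow all redirects and return the last code?
  - You may end up with a 200, but the page could still redirect via a meta tag or JavaScript. Figuring out what happens after that is hard.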
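To make the first-code/last-code choice concrete: when PHP follows redirects, get_headers() returns the headers of every response in the chain in one flat array. The sketch below (hypothetical helper `statusCodeChain()`, run on a hand-written sample array rather than a live request) pulls out every status code in order. It cannot see meta-tag or JavaScript redirects, since those live in the body.

```php
<?php
// Hypothetical helper: collect every status code from get_headers()-style
// output, in order. With redirects followed, the first element is the code
// of the first response and the last element is the final one.
function statusCodeChain(array $headers): array
{
    $codes = [];
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 301 Moved Permanently".
        if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Sample data shaped like get_headers() output for one redirect.
$headers = [
    'HTTP/1.1 301 Moved Permanently',
    'Location: https://example.com/new',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];
$chain = statusCodeChain($headers);
$firstCode = $chain[0] ?? null;                         // the redirect itself
$lastCode  = $chain ? $chain[count($chain) - 1] : null; // where the redirects ended
```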
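Keep in mind that whatever method you use, waiting for the response takes time.
All code may (and probably will) halt until you know the result or the request has timed out.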
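Because each check blocks until it gets an answer or times out, checking many URLs one after another adds those waits together. If cURL is available, one option (not part of the original answer) is to send the requests concurrently through cURL's multi interface, so the total wait is closer to that of the slowest single URL. A sketch assuming the cURL extension is installed; `statusCodesParallel()` and its `$timeout` parameter are hypothetical, and a code of 0 means no HTTP response was received.

```php
<?php
// Sketch (hypothetical name): HEAD-check several URLs concurrently with
// curl_multi. Returns [key => final HTTP code], with 0 when no response
// arrived (DNS failure, refused connection, timeout...).
function statusCodesParallel(array $urls, int $timeout = 10): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_NOBODY, true);          // headers only
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't print output
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // report the final code
        curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($status !== CURLM_OK) {
            break; // multi-interface error: unfinished handles report code 0
        }
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running > 0);

    $codes = [];
    foreach ($handles as $key => $ch) {
        $codes[$key] = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $codes;
}

// Usage (performs network requests), keeping only URLs that answered 200:
// $codes = statusCodesParallel($urls);
// $urls  = array_filter($urls, fn($k) => $codes[$k] === 200, ARRAY_FILTER_USE_KEY);
```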
For example, if the URLs are invalid or unreachable, the following code could take a long time before it shows the page:
foreach($urls as $k => $url){
    // this could potentially take 0-30 seconds each
    // (more or less depending on connection, target site, timeout settings...)
    if( ! isValidUrl($url) ){
        unset($urls[$k]);
    }
}

echo "yay all done! now show my site";
foreach($urls as $url){
    echo "{$url}\n";
}
The following functions may help; you may need to modify them to suit your needs:
function isValidUrl($url){
    // first do some quick sanity checks:
    if(!$url || !is_string($url)){
        return false;
    }
    // quick check url is roughly a valid http request: ( http://blah/... )
    if( ! preg_match('/^http(s)?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(\/.*)?$/i', $url) ){
        return false;
    }
    // the next bit could be slow:
    if(getHttpResponseCode_using_curl($url) != 200){
//  if(getHttpResponseCode_using_getheaders($url) != 200){  // use this one if you can't use curl
        return false;
    }
    // all good!
    return true;
}

function getHttpResponseCode_using_curl($url, $followredirects = true){
    // returns int responsecode, or false (if url does not exist or connection timeout occurs)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $ch = @curl_init($url);
    if($ch === false){
        return false;
    }
    @curl_setopt($ch, CURLOPT_HEADER         ,true);    // we want headers
    @curl_setopt($ch, CURLOPT_NOBODY         ,true);    // don't need body
    @curl_setopt($ch, CURLOPT_RETURNTRANSFER ,true);    // catch output (do NOT print!)
    if($followredirects){
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,true);
        @curl_setopt($ch, CURLOPT_MAXREDIRS      ,10);  // fairly random number, but could prevent unwanted endless redirects with followlocation=true
    }else{
        @curl_setopt($ch, CURLOPT_FOLLOWLOCATION ,false);
    }
//  @curl_setopt($ch, CURLOPT_CONNECTTIMEOUT ,5);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//  @curl_setopt($ch, CURLOPT_TIMEOUT        ,6);   // fairly random number (seconds)... but could prevent waiting forever to get a result
//  @curl_setopt($ch, CURLOPT_USERAGENT      ,"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1");   // pretend we're a regular browser
    @curl_exec($ch);
    if(@curl_errno($ch)){   // should be 0
        @curl_close($ch);
        return false;
    }
    $code = @curl_getinfo($ch, CURLINFO_HTTP_CODE); // note: php.net documentation shows this returns a string, but really it returns an int
    @curl_close($ch);
    return $code;
}

function getHttpResponseCode_using_getheaders($url, $followredirects = true){
    // returns string responsecode, or false if no responsecode found in headers (or url does not exist)
    // NOTE: could potentially take up to 0-30 seconds, blocking further code execution (more or less depending on connection, target site, and local timeout settings)
    // if $followredirects == false: return the FIRST known httpcode (ignore redirects)
    // if $followredirects == true : return the LAST known httpcode (when redirected)
    if(! $url || ! is_string($url)){
        return false;
    }
    $headers = @get_headers($url);
    if($headers && is_array($headers)){
        if($followredirects){
            // we want the last status code, so reverse the array and start at the end:
            $headers = array_reverse($headers);
        }
        foreach($headers as $hline){
            // search for things like "HTTP/1.1 200 OK", "HTTP/1.0 200 OK", "HTTP/1.1 301 PERMANENTLY MOVED", "HTTP/1.1 404 Not Found", etc.
            // note that the exact syntax/version/output differs, so there is some string magic involved here
            // (the code may be followed by a reason phrase or by the end of the line)
            if(preg_match('/^HTTP\/\S+\s+([1-9][0-9][0-9])(\s|$)/', $hline, $matches) ){ // "HTTP/*** ### ***"
                $code = $matches[1];
                return $code;
            }
        }
        // no HTTP/xxx found in headers:
        return false;
    }
    // no headers:
    return false;
}
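For the quick syntax check, PHP also ships a built-in URL validator, `filter_var()` with `FILTER_VALIDATE_URL`, which could stand in for the hand-written regex in isValidUrl(). A sketch; `looksLikeHttpUrl()` is a hypothetical name. The filter accepts other schemes (such as `ftp://`) and, per the PHP manual, only ASCII URLs, so the scheme is restricted separately and internationalized domain names are rejected.

```php
<?php
// Hypothetical helper: fast, offline syntax check for http(s) URLs using
// PHP's built-in URL filter. It does not contact the server.
function looksLikeHttpUrl($url): bool
{
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL accepts any scheme, so allow only http and https.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}
```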
$headers = @get_headers($this->_value);
if ($headers === false || strpos($headers[0], '200') === false) return false;
So whenever you reach the site and get back anything other than a 200, it returns false.
On some servers you can't use cURL; in that case you can use this code:
$url = 'http://google.com';
$not_url = 'stp://google.com';

// Note: this downloads the whole body; an existing page with an empty body
// would also be reported as not found.
if (@file_get_contents($url)):
    echo "Found '$url'!";
else:
    echo "Can't find '$url'.";
endif;

if (@file_get_contents($not_url)):
    echo "Found '$not_url'!";
else:
    echo "Can't find '$not_url'.";
endif;

// Output: Found 'http://google.com'!Can't find 'stp://google.com'.
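file_get_contents() downloads the whole response body just to test for existence. Without cURL, a lighter option is to send a HEAD request through a stream context and read only the status lines with get_headers(), which accepts a context argument since PHP 7.1. A sketch; `urlStatusWithoutCurl()` is a hypothetical name, it returns `null` when no response was received, and some servers answer HEAD differently from GET.

```php
<?php
// Sketch for servers without cURL (hypothetical name): HEAD request via a
// stream context, returning the last status code seen, or null if the
// request failed (DNS error, refused connection, timeout, bad scheme...).
function urlStatusWithoutCurl(string $url, float $timeout = 10.0): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'HEAD',   // ask for headers only
            'timeout' => $timeout, // read timeout in seconds
        ],
    ]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false) {
        return null;
    }
    // Redirects are followed, so keep the last status line in the array.
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})\b#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Usage (performs a network request):
// $exists = ($code = urlStatusWithoutCurl('http://google.com')) !== null && $code < 400;
```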
function URLIsValid($URL)
{
    $exists = true;
    $file_headers = @get_headers($URL);
    if ($file_headers === false) {
        return false; // the request failed, so there are no headers to inspect
    }
    $InvalidHeaders = array('404', '403', '500');
    foreach ($InvalidHeaders as $HeaderVal) {
        if (strstr($file_headers[0], $HeaderVal)) {
            $exists = false;
            break;
        }
    }
    return $exists;
}
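URLIsValid() rejects only 404, 403 and 500, so other failures such as 410 Gone or 502 Bad Gateway still count as valid. A sketch that rejects any 4xx or 5xx status instead, by comparing the numeric code rather than searching the status line for substrings. `headersLookValid()` is a hypothetical name, and like the original it inspects only the first status line.

```php
<?php
// Hypothetical variant of URLIsValid(): treat a failed request or any
// 4xx/5xx status on the first status line as invalid.
function headersLookValid($fileHeaders): bool
{
    if (!is_array($fileHeaders) || !isset($fileHeaders[0])) {
        return false; // get_headers() failed: DNS error, refused connection, ...
    }
    if (!preg_match('#^HTTP/\S+\s+(\d{3})\b#', $fileHeaders[0], $m)) {
        return false; // not a recognizable status line
    }
    return (int) $m[1] < 400;
}

// Usage (performs a network request):
// $exists = headersLookValid(@get_headers($URL));
```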
I use this function:
/**
 * @param $url
 * @param array $options
 * @return string
 * @throws Exception
 */
function checkURL($url, array $options = array()) {
    if (empty($url)) {
        throw new Exception('URL is empty');
    }

    // list of HTTP status codes
    $httpStatusCodes = array(
        100 => 'Continue',
        101 => 'Switching Protocols',
        102 => 'Processing',
        200 => 'OK',
        201 => 'Created',
        202 => 'Accepted',
        203 => 'Non-Authoritative Information',
        204 => 'No Content',
        205 => 'Reset Content',
        206 => 'Partial Content',
        207 => 'Multi-Status',
        208 => 'Already Reported',
        226 => 'IM Used',
        300 => 'Multiple Choices',
        301 => 'Moved Permanently',
        302 => 'Found',
        303 => 'See Other',
        304 => 'Not Modified',
        305 => 'Use Proxy',
        306 => 'Switch Proxy',
        307 => 'Temporary Redirect',
        308 => 'Permanent Redirect',
        400 => 'Bad Request',
        401 => 'Unauthorized',
        402 => 'Payment Required',
        403 => 'Forbidden',
        404 => 'Not Found',
        405 => 'Method Not Allowed',
        406 => 'Not Acceptable',
        407 => 'Proxy Authentication Required',
        408 => 'Request Timeout',
        409 => 'Conflict',
        410 => 'Gone',
        411 => 'Length Required',
        412 => 'Precondition Failed',
        413 => 'Payload Too Large',
        414 => 'Request-URI Too Long',
        415 => 'Unsupported Media Type',
        416 => 'Requested Range Not Satisfiable',
        417 => 'Expectation Failed',
        418 => 'I\'m a teapot',
        422 => 'Unprocessable Entity',
        423 => 'Locked',
        424 => 'Failed Dependency',
        425 => 'Unordered Collection',
        426 => 'Upgrade Required',
        428 => 'Precondition Required',
        429 => 'Too Many Requests',
        431 => 'Request Header Fields Too Large',
        449 => 'Retry With',
        450 => 'Blocked by Windows Parental Controls',
        500 => 'Internal Server Error',
        501 => 'Not Implemented',
        502 => 'Bad Gateway',
        503 => 'Service Unavailable',
        504 => 'Gateway Timeout',
        505 => 'HTTP Version Not Supported',
        506 => 'Variant Also Negotiates',
        507 => 'Insufficient Storage',
        508 => 'Loop Detected',
        509 => 'Bandwidth Limit Exceeded',
        510 => 'Not Extended',
        511 => 'Network Authentication Required',
        599 => 'Network Connect Timeout Error'
    );

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    if (isset($options['timeout'])) {
        $timeout = (int) $options['timeout'];
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
    }
    curl_exec($ch);
    $returnedStatusCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if (array_key_exists($returnedStatusCode, $httpStatusCodes)) {
        return "URL: '{$url}' - Error code: {$returnedStatusCode} - Definition: {$httpStatusCodes[$returnedStatusCode]}";
    } else {
        return "'{$url}' does not exist";
    }
}