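I've been looking for a simple regex for URLs. Does anybody have one handy that works well? I didn't find one in the Zend Framework validation classes, and I have seen several implementations.
Thanks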
Use the filter_var() function to validate whether a string is a URL:
var_dump(filter_var('example.com', FILTER_VALIDATE_URL));
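Using regular expressions when they are not necessary is bad practice.
Edit: Be careful, this solution is not Unicode-safe and not XSS-safe. If you need complex validation, it may be better to look elsewhere.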
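As a rough sketch of how filter_var() might be wrapped in practice: the helper below (the name is_http_url is made up for this example) first lets FILTER_VALIDATE_URL reject strings that lack a scheme or host, then allows only http and https, because filter_var() on its own accepts other schemes as well. It illustrates the idea under those assumptions and is not a complete validator.

```php
<?php
// Hypothetical helper, not part of PHP: accept only absolute http(s) URLs.
function is_http_url(string $candidate): bool
{
    // FILTER_VALIDATE_URL returns the input string on success, false on failure.
    if (filter_var($candidate, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // filter_var() does not restrict the scheme, so check it explicitly.
    $scheme = parse_url($candidate, PHP_URL_SCHEME);

    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(is_http_url('http://example.com'));
var_dump(is_http_url('example.com'));
```

The bare host example.com is rejected because FILTER_VALIDATE_URL requires a scheme.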
I've used this on a few projects. I don't believe I've run into problems, but I'm sure it's not exhaustive:
$text = preg_replace( '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i', "'$3$4'", $text );
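Most of the random junk at the end is there to deal with situations like http://domain.com. in a sentence (to avoid matching the trailing period). I'm sure it could be cleaned up, but since it works, I've more or less just copied it from project to project.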
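To make the trailing-period handling concrete, here is a small sketch that reuses the same pattern with preg_match_all() instead of preg_replace(). The helper name find_urls_in_text and the sample sentence are invented for this illustration; group 1 of the pattern holds the URL and group 4 holds whatever terminated it.

```php
<?php
// Same pattern as in the answer: group 1 is the URL, group 4 the terminator.
$pattern = '#((https?|ftp)://(\S*?\.\S*?))([\s)\[\]{},;"\':<]|\.\s|$)#i';

// Hypothetical helper: return every URL (capture group 1) found in $text.
function find_urls_in_text(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

$sample = 'Visit http://domain.com. Also ftp://files.example.org/pub, thanks';
print_r(find_urls_in_text($sample, $pattern));
```

The period after domain.com and the comma after /pub end up in the terminator group, so neither becomes part of the extracted URL.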
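According to the PHP manual, parse_url() should not be used to validate a URL.
Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.
Both parse_url() and filter_var() will pass malformed URLs such as http://...
Therefore, in this case, a regex is the better method.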
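One possible way to act on this, sketched under my own assumptions: pull the host out with parse_url() and check its shape with a small regex. The helper name has_plausible_host and the host pattern are illustrative only. The pattern deliberately rejects IP addresses, localhost and internationalized names, so it is stricter than a general URL check.

```php
<?php
// Hypothetical helper: require an http(s) scheme and a host made of
// dot-separated labels that ends in an alphabetic top-level label.
function has_plausible_host(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // Each label starts and ends with a letter or digit; hyphens only inside.
    $hostPattern = '/^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$/i';

    return preg_match($hostPattern, $parts['host']) === 1;
}

var_dump(has_plausible_host('http://...'));
var_dump(has_plausible_host('http://example.com/path'));
```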
In case you want to know whether the URL really exists:
function url_exist($url) {
    // Returns true if a request to the URL gets a response.
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_HEADER, 1);         // get the header
    curl_setopt($c, CURLOPT_NOBODY, 1);         // and *only* get the header
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1); // get the response as a string from curl_exec(), rather than echoing it
    curl_setopt($c, CURLOPT_FRESH_CONNECT, 1);  // don't use a cached version of the url
    if (!curl_exec($c)) {
        // echo $url.' does not exist';
        return false;
    } else {
        // echo $url.' exists';
        return true;
    }
    // Alternative check based on the HTTP status code:
    // $httpcode = curl_getinfo($c, CURLINFO_HTTP_CODE);
    // return ($httpcode < 400);
}
According to John Gruber (Daring Fireball):
The regex:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))
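To use it in preg_match(), choose a delimiter other than /, since the pattern itself contains slashes; the u modifier makes the non-ASCII quote characters in the last character class match as whole characters:
preg_match("#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#u", $url)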
Here is the extended regex pattern (with comments):
(?xi)
\b
(                                       # Capture 1: entire matched URL
  (?:
    https?://                           # http or https protocol
    |                                   #   or
    www\d{0,3}[.]                       # "www.", "www1.", "www2." … "www999."
    |                                   #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/          # looks like domain name followed by a slash
  )
  (?:                                   # One or more:
    [^\s()<>]+                          # Run of non-space, non-()<>
    |                                   #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                                   # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]      # not a space or one of these punct chars
  )
)
For more details, see: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
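For illustration, the single-line form of the pattern can also be used to pull URLs out of running text. In this sketch the ~ delimiter, the u modifier, the nowdoc string (which avoids escaping the quote characters) and the helper name gruber_extract_urls are my own choices, not part of Gruber's article.

```php
<?php
// Gruber's pattern wrapped in ~ delimiters; a nowdoc keeps it free of PHP escaping.
$gruberPattern = <<<'REGEX'
~(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~u
REGEX;

// Hypothetical helper: return every full match found in $text.
function gruber_extract_urls(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(gruber_extract_urls('See http://example.com/foo, ok', $gruberPattern));
print_r(gruber_extract_urls('(see http://example.com/a_(b))', $gruberPattern));
```

In the first sample the trailing comma is left out of the match; in the second, the balanced (b) stays in the URL while the closing parenthesis of the surrounding sentence does not.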
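I don't think using a regex is a smart thing to do in this case. It is impossible to match all the possibilities, and even if you did, there is still a chance that the URL simply doesn't exist.
Here is a very simple way to test whether a URL actually exists and is readable: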
if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";
(Without the preg_match, this would also validate any filename on your server.)
function validateURL($URL) {
    // The host must end in one of the listed TLDs; an optional port and a
    // trailing slash may follow. Paths and query strings are not accepted.
    $pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)*\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
    // Same check for hosts written without a scheme, starting with "www".
    $pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
    return preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL);
}
I've used this with good success. I don't remember where I got it from:
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
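A brief usage sketch, assuming the pattern is meant for finding links inside free text: the helper name find_links and the sample sentence are invented for this example.

```php
<?php
// The pattern from the answer, unchanged.
$pattern = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

// Hypothetical helper: return every link the pattern finds in $text.
function find_links(string $text, string $pattern): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$sample = 'Go to www.example.com, or https://example.org/path?x=1.';
print_r(find_links($sample, $pattern));
```

Because the final character class leaves out punctuation such as the comma and the period, the trailing "," and "." in the sample are not included in the matches.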
And there is your answer =) Try to break it, you can't!
function link_validate_url($text) {
    $LINK_DOMAINS = 'aero|arpa|asia|biz|com|cat|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|mobi|local';

    // Extra (non-ASCII) characters allowed in domain names.
    $LINK_ICHARS_DOMAIN = (string) html_entity_decode(implode("", array( // @TODO completing letters ...
        "æ", "Æ", "À", "à", "Á", "á", "Â", "â", "å", "Å", "ä", "Ä", "Ç", "ç",
        "Ð", "ð", "È", "è", "É", "é", "Ê", "ê", "Ë", "ë", "Î", "î", "Ï", "ï",
        "ø", "Ø", "ö", "Ö", "Ô", "ô", "Õ", "õ", "Œ", "œ", "ü", "Ü", "Ù", "ù",
        "Û", "û", "Ÿ", "ÿ", "Ñ", "ñ", "þ", "Þ", "ý", "Ý", "¿",
    )), ENT_QUOTES, 'UTF-8');

    // Extra characters allowed everywhere else in the link.
    $LINK_ICHARS = $LINK_ICHARS_DOMAIN . (string) html_entity_decode(implode("", array(
        "ß",
    )), ENT_QUOTES, 'UTF-8');

    $allowed_protocols = array('http', 'https', 'ftp', 'news', 'nntp', 'telnet', 'mailto', 'irc', 'ssh', 'sftp', 'webcal');

    // Starting a parenthesis group with (?: means that it is grouped, but is not captured
    $protocol = '((?:'. implode("|", $allowed_protocols) .'):\/\/)';
    $authentication = "(?:(?:(?:[\w\.\-\+!$&'\(\)*\+,;=" . $LINK_ICHARS . "]|%[0-9a-f]{2})+(?::(?:[\w". $LINK_ICHARS ."\.\-\+%!$&'\(\)*\+,;=]|%[0-9a-f]{2})*)?)?@)";
    $domain = '(?:(?:[a-z0-9' . $LINK_ICHARS_DOMAIN . ']([a-z0-9'. $LINK_ICHARS_DOMAIN . '\-_\[\]])*)(\.(([a-z0-9' . $LINK_ICHARS_DOMAIN . '\-_\[\]])+\.)*('. $LINK_DOMAINS .'|[a-z]{2}))?)';
    $ipv4 = '(?:[0-9]{1,3}(\.[0-9]{1,3}){3})';
    $ipv6 = '(?:[0-9a-fA-F]{1,4}(\:[0-9a-fA-F]{1,4}){7})';
    $port = '(?::([0-9]{1,5}))';

    // Pattern specific to external links.
    $external_pattern = '/^'. $protocol .'?'. $authentication .'?('. $domain .'|'. $ipv4 .'|'. $ipv6 .' |localhost)'. $port .'?';

    // Pattern specific to internal links.
    $internal_pattern = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]]+)";
    $internal_pattern_file = "/^(?:[a-z0-9". $LINK_ICHARS ."_\-+\[\]\.]+)$/i";

    $directories = "(?:\/[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'#!():;*@\[\]]*)*";
    // Yes, four backslashes == a single backslash.
    $query = "(?:\/?\?([?a-z0-9". $LINK_ICHARS ."+_|\-\.~\/\\\\%=&,$'():;*@\[\]{} ]*))";
    $anchor = "(?:#[a-z0-9". $LINK_ICHARS ."_\-\.~+%=&,$'():;*@\[\]\/\?]*)";

    // The rest of the path for a standard URL.
    $end = $directories .'?'. $query .'?'. $anchor .'?'.'$/i';

    $message_id = '[^@].*@'. $domain;
    $newsgroup_name = '(?:[0-9a-z+-]*\.)*[0-9a-z+-]*';
    $news_pattern = '/^news:('. $newsgroup_name .'|'. $message_id .')$/i';

    $user = '[a-zA-Z0-9'. $LINK_ICHARS .'_\-\.\+\^!#\$%&*+\/\=\?\`\|\{\}~\'\[\]]+';
    $email_pattern = '/^mailto:'. $user .'@'.'(?:'. $domain .'|'. $ipv4 .'|'. $ipv6 .'|localhost)'. $query .'?$/';

    // Note: as written, the function returns false as soon as $text matches one
    // of the recognized link forms, and true only when nothing matches.
    // '<front>' is Drupal's front-page placeholder.
    if (strpos($text, '<front>') === 0) {
        return false;
    }
    if (in_array('mailto', $allowed_protocols) && preg_match($email_pattern, $text)) {
        return false;
    }
    if (in_array('news', $allowed_protocols) && preg_match($news_pattern, $text)) {
        return false;
    }
    if (preg_match($internal_pattern . $end, $text)) {
        return false;
    }
    if (preg_match($external_pattern . $end, $text)) {
        return false;
    }
    if (preg_match($internal_pattern_file, $text)) {
        return false;
    }
    return true;
}