我在编写匹配有效IPv6地址的正则表达式时遇到问题,包括压缩形式的那些(::
从每个字节对中省略或前导零).
有人可以建议一个符合要求的正则表达式吗?
我正在考虑扩展每个字节对,并将结果与更简单的正则表达式进行匹配.
我无法得到@Factor Mystic使用POSIX正则表达式的答案,所以我写了一个与POSIX正则表达式和PERL正则表达式一起使用的答案.
它应该匹配:
IPv6地址
零压缩IPv6地址(rfc5952的第2.2节)
链路本地IPv6地址与区域索引(rfc4007的第11节)
IPv4嵌入式IPv6地址(rfc6052的第2部分)
IPv4映射的IPv6地址(rfc2765的2.1节)
IPv4转换地址(rfc2765第2.1节)
IPv6正则表达式:
(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))
为便于阅读,以下是将主要OR点分割成单独行的上述正则表达式:
# IPv6 RegEx ( ([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}| # 1:2:3:4:5:6:7:8 ([0-9a-fA-F]{1,4}:){1,7}:| # 1:: 1:2:3:4:5:6:7:: ([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8 ([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8 ([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8 ([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8 ([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8 [0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8 :((:[0-9a-fA-F]{1,4}){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 :: fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index) ::(ffff(:0{1,4}){0,1}:){0,1} ((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3} (25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses) ([0-9a-fA-F]{1,4}:){1,4}: ((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3} (25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address) ) # IPv4 RegEx ((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])
为了使上述内容更容易理解,以下"伪"代码复制了上述内容:
IPV4SEG = (25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]) IPV4ADDR = (IPV4SEG\.){3,3}IPV4SEG IPV6SEG = [0-9a-fA-F]{1,4} IPV6ADDR = ( (IPV6SEG:){7,7}IPV6SEG| # 1:2:3:4:5:6:7:8 (IPV6SEG:){1,7}:| # 1:: 1:2:3:4:5:6:7:: (IPV6SEG:){1,6}:IPV6SEG| # 1::8 1:2:3:4:5:6::8 1:2:3:4:5:6::8 (IPV6SEG:){1,5}(:IPV6SEG){1,2}| # 1::7:8 1:2:3:4:5::7:8 1:2:3:4:5::8 (IPV6SEG:){1,4}(:IPV6SEG){1,3}| # 1::6:7:8 1:2:3:4::6:7:8 1:2:3:4::8 (IPV6SEG:){1,3}(:IPV6SEG){1,4}| # 1::5:6:7:8 1:2:3::5:6:7:8 1:2:3::8 (IPV6SEG:){1,2}(:IPV6SEG){1,5}| # 1::4:5:6:7:8 1:2::4:5:6:7:8 1:2::8 IPV6SEG:((:IPV6SEG){1,6})| # 1::3:4:5:6:7:8 1::3:4:5:6:7:8 1::8 :((:IPV6SEG){1,7}|:)| # ::2:3:4:5:6:7:8 ::2:3:4:5:6:7:8 ::8 :: fe80:(:IPV6SEG){0,4}%[0-9a-zA-Z]{1,}| # fe80::7:8%eth0 fe80::7:8%1 (link-local IPv6 addresses with zone index) ::(ffff(:0{1,4}){0,1}:){0,1}IPV4ADDR| # ::255.255.255.255 ::ffff:255.255.255.255 ::ffff:0:255.255.255.255 (IPv4-mapped IPv6 addresses and IPv4-translated addresses) (IPV6SEG:){1,4}:IPV4ADDR # 2001:db8:3:4::192.0.2.33 64:ff9b::192.0.2.33 (IPv4-Embedded IPv6 Address) )
我在GitHub上发布了一个测试正则表达式的脚本:https://gist.github.com/syzdek/6086792
以下内容将验证IPv4,IPv6(完整和压缩)以及IPv6v4(完整和压缩)地址:
'/^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD'
来自" IPv6正则表达式 ":
(\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,6}\Z)| (\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,5}\Z)| (\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,4}\Z)| (\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,3}\Z)| (\A([0-9a-f]{1,4}:){1,5}(:[0-9a-f]{1,4}){1,2}\Z)| (\A([0-9a-f]{1,4}:){1,6}(:[0-9a-f]{1,4}){1,1}\Z)| (\A(([0-9a-f]{1,4}:){1,7}|:):\Z)| (\A:(:[0-9a-f]{1,4}){1,7}\Z)| (\A((([0-9a-f]{1,4}:){6})(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)| (\A(([0-9a-f]{1,4}:){5}[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3})\Z)| (\A([0-9a-f]{1,4}:){5}:[0-9a-f]{1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A([0-9a-f]{1,4}:){1,1}(:[0-9a-f]{1,4}){1,4}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A([0-9a-f]{1,4}:){1,2}(:[0-9a-f]{1,4}){1,3}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A([0-9a-f]{1,4}:){1,3}(:[0-9a-f]{1,4}){1,2}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A([0-9a-f]{1,4}:){1,4}(:[0-9a-f]{1,4}){1,1}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A(([0-9a-f]{1,4}:){1,5}|:):(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)| (\A:(:[0-9a-f]{1,4}){1,5}:(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}\Z)
听起来你可能正在使用Python.如果是这样,你可以使用这样的东西:
import socket def check_ipv6(n): try: socket.inet_pton(socket.AF_INET6, n) return True except socket.error: return False print check_ipv6('::1') # True print check_ipv6('foo') # False print check_ipv6(5) # TypeError exception print check_ipv6(None) # TypeError exception
我认为您不必将Python编译到Python中inet_pton
,如果您socket.AF_INET
作为第一个参数传入,它也可以解析IPv4地址.注意:这可能不适用于非Unix系统.
我必须强烈反对Frank Krueger的答案.
虽然您说需要正则表达式来匹配IPv6地址,但我假设您真正需要的是能够检查给定字符串是否是有效的IPv6地址.这里有一个微妙但重要的区别.
有多种方法可以检查给定字符串是否是有效的IPv6地址,而正则表达式匹配只是一种解决方案.
如果可以,请使用现有库.该库将有更少的错误,它的使用将导致更少的代码供您维护.
因子神秘主义者建议的正则表达是漫长而复杂的.它很可能有效,但你也应该考虑如果它出乎意料地失败你将如何应对.我想在这里提出的观点是,如果你不能自己形成一个必需的正则表达式,你将无法轻松调试它.
如果您没有合适的库,那么编写自己的不依赖于正则表达式的IPv6验证例程可能会更好.如果您编写它,您就会理解它,如果您理解它,您可以添加注释来解释它,以便其他人也可以理解并随后维护它.
使用正则表达式时要谨慎行事,其功能无法向其他人解释.
我不是Ipv6专家,但我认为你可以用这个更轻松地获得一个非常好的结果:
^([0-9A-Fa-f]{0,4}:){2,7}([0-9A-Fa-f]{1,4}$|((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4})$
回答"是一个有效的ipv6"它对我来说似乎没问题.把它分解成几部分......忘掉它吧.我省略了未指定的一个(::),因为我的数据库中没有"未指定的地址".
开头:
^([0-9A-Fa-f]{0,4}:){2,7}
< - 匹配可压缩部分,我们可以将其翻译为:2到7之间的冒号,它们之间可能有heaxadecimal数.
接下来是:
[0-9A-Fa-f]{1,4}$
< - 一个十六进制数字(省略前导0)或
((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}
< - 一个Ipv4地址
此正则表达式将根据使用REGULAR EXTENDED模式的正则表达式的GNU C++实现匹配有效的IPv6和IPv4地址:
"^\s*((([0-9A-Fa-f]{1,4}:){7}([0-9A-Fa-f]{1,4}|:))|(([0-9A-Fa-f]{1,4}:){6}(:[0-9A-Fa-f]{1,4}|((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){5}(((:[0-9A-Fa-f]{1,4}){1,2})|:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3})|:))|(([0-9A-Fa-f]{1,4}:){4}(((:[0-9A-Fa-f]{1,4}){1,3})|((:[0-9A-Fa-f]{1,4})?:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){3}(((:[0-9A-Fa-f]{1,4}){1,4})|((:[0-9A-Fa-f]{1,4}){0,2}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){2}(((:[0-9A-Fa-f]{1,4}){1,5})|((:[0-9A-Fa-f]{1,4}){0,3}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(([0-9A-Fa-f]{1,4}:){1}(((:[0-9A-Fa-f]{1,4}){1,6})|((:[0-9A-Fa-f]{1,4}){0,4}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:))|(:(((:[0-9A-Fa-f]{1,4}){1,7})|((:[0-9A-Fa-f]{1,4}){0,5}:((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])){3}))|:)))(%.+)?\s*$"