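I want to use the .NET Regex.Split method to split this input string into an array. It must split on whitespace, except where the whitespace is enclosed in quotes.
Input: Here is "my string" it has "six matches"
Expected output:
Here
is
my string
it
has
six matches
What pattern do I need? Do I also need to specify any RegexOptions?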
No options are required.
Regex:
\w+|"[\w\s]*"
C#:
Regex regex = new Regex(@"\w+|""[\w\s]*""");
Or, if you need to exclude the " characters:
Regex
    .Matches(input, @"(?<match>\w+)|\""(?<match>[\w\s]*)""")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));
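For reference, a minimal self-contained sketch of the quote-stripping variant above might look like the following. The class and method names (QuotedTokenizer, Tokenize) are hypothetical and exist only so the snippet compiles on its own; the pattern is the one from the answer, relying on the fact that .NET lets both alternatives capture into the same named group.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedTokenizer
{
    // Hypothetical helper: returns bare words and quoted runs (without the quotes).
    public static List<string> Tokenize(string input)
    {
        // Both alternatives capture into the named group "match":
        // either a bare word, or the contents of a double-quoted run.
        const string pattern = @"(?<match>\w+)|""(?<match>[\w\s]*)""";

        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["match"].Value)
                    .ToList();
    }

    public static void Main()
    {
        foreach (string token in Tokenize("Here is \"my string\" it has \"six matches\""))
        {
            Console.WriteLine(token);
        }
    }
}
```

Because the quoted alternative only allows \w and \s inside the quotes, punctuation inside a quoted run will not be captured by this pattern.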
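Lieven's solution gets most of the way there and, as he states in his comments, it is just a matter of changing the ending to Bartek's solution. The end result is the following working regex: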
(?<=")\w[\w\s]*(?=")|\w+|"[\w\s]*"
Input: Here is "my string" it has "six matches"
Output:
Here
is
"my string"
it
has
"six matches"
Unfortunately, it includes the quotes. If you instead use the following:
(("((?<token>.*?)(?<!\\)")|(?<token>[\w]+))(\s)*)
And explicitly capture the "token" matches as follows:
RegexOptions options = RegexOptions.None;
Regex regex = new Regex(@"((""((?<token>.*?)(?<!\\)"")|(?<token>[\w]+))(\s)*)", options);
string input = @" Here is ""my string"" it has "" six matches"" ";
var result = (from Match m in regex.Matches(input)
              where m.Groups["token"].Success
              select m.Groups["token"].Value).ToList();

for (int i = 0; i < result.Count; i++)
{
    Debug.WriteLine(string.Format("Token[{0}]: '{1}'", i, result[i]));
}
Debug output:
Token[0]: 'Here'
Token[1]: 'is'
Token[2]: 'my string'
Token[3]: 'it'
Token[4]: 'has'
Token[5]: ' six matches'
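The top answer doesn't quite work for me. I was trying to split this sort of string on spaces, but it looks like it splits on the dots ('.') as well.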
"the lib.lib" "another lib".lib
I know the question asks about regular expressions, but I ended up writing a non-regex function to do this:
// Requires: using System.Collections.Generic; using System.Linq; using System.Text;

/// <summary>
/// Splits the string passed in by the delimiters passed in.
/// Quoted sections are not split, and all tokens have whitespace
/// trimmed from the start and end.
/// </summary>
public static List<string> split(string stringToSplit, params char[] delimiters)
{
    List<string> results = new List<string>();
    bool inQuote = false;
    StringBuilder currentToken = new StringBuilder();

    for (int index = 0; index < stringToSplit.Length; ++index)
    {
        char currentCharacter = stringToSplit[index];
        if (currentCharacter == '"')
        {
            // When we see a ", we need to decide whether we are
            // at the start or end of a quoted section...
            inQuote = !inQuote;
        }
        else if (delimiters.Contains(currentCharacter) && inQuote == false)
        {
            // We've come to the end of a token, so we find the token,
            // trim it and add it to the collection of results...
            string result = currentToken.ToString().Trim();
            if (result != "") results.Add(result);

            // We start a new token...
            currentToken = new StringBuilder();
        }
        else
        {
            // We've got a 'normal' character, so we add it to
            // the current token...
            currentToken.Append(currentCharacter);
        }
    }

    // We've come to the end of the string, so we add the last token...
    string lastResult = currentToken.ToString().Trim();
    if (lastResult != "") results.Add(lastResult);

    return results;
}
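To see how this approach treats the dotted input from this answer, here is a short usage sketch. It restates the function in condensed form so that it compiles on its own; the class name QuoteAwareSplitter and the private Flush helper are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Condensed restatement of the function above (same logic, hypothetical names).
    public static List<string> Split(string text, params char[] delimiters)
    {
        var results = new List<string>();
        var current = new StringBuilder();
        bool inQuote = false;

        foreach (char c in text)
        {
            if (c == '"')
            {
                // Quote characters toggle the state and are not kept in the token.
                inQuote = !inQuote;
            }
            else if (!inQuote && Array.IndexOf(delimiters, c) >= 0)
            {
                // A delimiter outside quotes ends the current token.
                Flush(current, results);
            }
            else
            {
                current.Append(c);
            }
        }

        Flush(current, results); // add the last token, if any
        return results;
    }

    // Trims the pending token, stores it if non-empty, and resets the buffer.
    private static void Flush(StringBuilder current, List<string> results)
    {
        string token = current.ToString().Trim();
        if (token.Length > 0) results.Add(token);
        current.Clear();
    }

    public static void Main()
    {
        foreach (string token in Split("\"the lib.lib\" \"another lib\".lib", ' '))
        {
            Console.WriteLine(token);
        }
        // prints:
        // the lib.lib
        // another lib.lib
    }
}
```

Since only the delimiters end a token, the trailing .lib stays attached to the quoted section before it, and the dots are never treated as split points.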
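I was using Bartek Szabat's answer, but I needed to capture more than just "\w" characters in my tokens. To solve the problem, I modified his regex slightly, similar to Grzenio's answer: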
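Regular Expression: (?<match>[^\s"]+)|(?<match>"[^"]*")
C# String: (?<match>[^\\s\"]+)|(?<match>\"[^\"]*\")
Bartek's code (which returned tokens with the enclosing quotes stripped) becomes the following. Note that in this pattern the quotes fall inside the "match" group, so quoted tokens keep them:
Regex
    .Matches(input, "(?<match>[^\\s\"]+)|(?<match>\"[^\"]*\")")
    .Cast<Match>()
    .Select(m => m.Groups["match"].Value)
    .ToList()
    .ForEach(s => Console.WriteLine(s));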
I found the regex in this answer quite useful. To make it work in C#, you have to use the MatchCollection class.
// \s must be written as \\s in a regular (non-verbatim) C# string
string pattern = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'";
MatchCollection parsedStrings = Regex.Matches(line, pattern);

for (int i = 0; i < parsedStrings.Count; i++)
{
    // print parsed strings
    Console.Write(parsedStrings[i].Value + " ");
}
Console.WriteLine();
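The loop above prints each match's Value, which still includes the surrounding quotes for quoted tokens. If the quotes should be dropped, one possible sketch is to read the capture groups instead. This is an assumption about what a caller might want, not part of the original answer, and the names QuoteStrippingParser and Parse are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuoteStrippingParser
{
    // Hypothetical helper: same pattern as above, but returns quoted
    // tokens without their enclosing quotes.
    public static List<string> Parse(string line)
    {
        const string pattern = "[^\\s\"']+|\"([^\"]*)\"|'([^']*)'";
        var tokens = new List<string>();

        foreach (Match m in Regex.Matches(line, pattern))
        {
            if (m.Groups[1].Success)
            {
                tokens.Add(m.Groups[1].Value);   // contents of a double-quoted token
            }
            else if (m.Groups[2].Success)
            {
                tokens.Add(m.Groups[2].Value);   // contents of a single-quoted token
            }
            else
            {
                tokens.Add(m.Value);             // bare token, no quotes involved
            }
        }

        return tokens;
    }

    public static void Main()
    {
        foreach (string token in Parse("Here is \"my string\" it has 'six matches'"))
        {
            Console.WriteLine(token);
        }
    }
}
```

Checking Success on each group distinguishes an empty quoted token ("") from a group that did not take part in the match.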