What is the best way to tokenize/split an NSString in Objective-C?
Found this at http://borkware.com/quickies/one?topic=NSString (a useful link):
NSString *string = @"oop:ack:bork:greeble:ponies";
NSArray *chunks = [string componentsSeparatedByString: @":"];
Hope this helps!
Adam
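Everyone has mentioned componentsSeparatedByString:, but you can also use CFStringTokenizer (remember that NSString and CFString are interchangeable through toll-free bridging). It can also tokenize natural language, which matters for languages such as Chinese and Japanese that do not separate words with spaces.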
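As a rough illustration of the CFStringTokenizer approach, here is a minimal sketch that collects word tokens into an array. The helper name WordsInString is hypothetical (it is not from the answer above), and the sketch assumes ARC and the current locale; actual word boundaries depend on the tokenizer's language support.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original answer): returns the word tokens
// that CFStringTokenizer finds in `text`. Assumes ARC.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    if (text.length == 0) {
        return words; // nothing to tokenize (also covers nil)
    }

    CFStringRef cfText = (__bridge CFStringRef)text; // toll-free bridge, no copy
    CFLocaleRef locale = CFLocaleCopyCurrent();
    CFStringTokenizerRef tokenizer =
        CFStringTokenizerCreate(kCFAllocatorDefault, cfText,
                                CFRangeMake(0, CFStringGetLength(cfText)),
                                kCFStringTokenizerUnitWord, locale);

    // Advance until the tokenizer reports that no token is left.
    while (CFStringTokenizerAdvanceToNextToken(tokenizer) != kCFStringTokenizerTokenNone) {
        CFRange range = CFStringTokenizerGetCurrentTokenRange(tokenizer);
        [words addObject:[text substringWithRange:NSMakeRange(range.location, range.length)]];
    }

    // Core Foundation objects created or copied here are not managed by ARC.
    CFRelease(tokenizer);
    CFRelease(locale);
    return words;
}
```

Calling WordsInString(@"私は猫が好きです") should return the individual Japanese words rather than one unsplit run, although the exact segmentation is up to the system tokenizer.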
If you just want to split a string, use -[NSString componentsSeparatedByString:]. For more complex tokenization, use the NSScanner class.
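To give a feel for what NSScanner adds over a plain split, here is a small sketch that parses "key=value" pairs separated by semicolons. The function name ParsePairs and the input format are assumptions made for illustration only, and the code assumes ARC and a non-nil input.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: parses "key=value; key=value" into a dictionary.
// The scanner skips leading whitespace by default; trailing whitespace
// around keys and values is trimmed explicitly.
static NSDictionary<NSString *, NSString *> *ParsePairs(NSString *input) {
    NSMutableDictionary<NSString *, NSString *> *pairs = [NSMutableDictionary dictionary];
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSScanner *scanner = [NSScanner scannerWithString:input];

    while (![scanner isAtEnd]) {
        NSString *key = nil;
        NSString *value = nil;
        // Read "key", then "=", then everything up to the next ";".
        if ([scanner scanUpToString:@"=" intoString:&key] &&
            [scanner scanString:@"=" intoString:NULL] &&
            [scanner scanUpToString:@";" intoString:&value]) {
            pairs[[key stringByTrimmingCharactersInSet:whitespace]] =
                [value stringByTrimmingCharactersInSet:whitespace];
        }
        // Consume the separator; stop if there is none, so the loop always ends.
        if (![scanner scanString:@";" intoString:NULL]) {
            break;
        }
    }
    return pairs;
}
```

With this sketch, ParsePairs(@"name=Adam; lang = ObjC") yields a dictionary mapping "name" to "Adam" and "lang" to "ObjC". Malformed input is handled only loosely; a real parser would need stricter error handling.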
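If your tokenization needs are more complex, check out my open source Cocoa string tokenizing/parsing toolkit, ParseKit:
http://parsekit.com
For simply splitting a string on a delimiter character (such as ':'), ParseKit would definitely be overkill. But for complex tokenization needs, ParseKit is very powerful and flexible.
See also the ParseKit Tokenization documentation.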
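If you want to tokenize on multiple characters, you can use NSString's componentsSeparatedByCharactersInSet:. NSCharacterSet has some handy pre-made sets such as whitespaceCharacterSet and illegalCharacterSet. It also has initializers for Unicode ranges.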
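As a brief sketch of both options, the helpers below split on an explicit list of characters and on a Unicode range. The function names are hypothetical, and the choice of the CJK Symbols and Punctuation block (U+3000 to U+303F) is just an illustrative range.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: split on either ',' or ';'.
static NSArray<NSString *> *SplitOnCommaOrSemicolon(NSString *text) {
    NSCharacterSet *separators =
        [NSCharacterSet characterSetWithCharactersInString:@",;"];
    return [text componentsSeparatedByCharactersInSet:separators];
}

// Hypothetical helper: split on any character in a Unicode range.
static NSArray<NSString *> *SplitOnCJKPunctuation(NSString *text) {
    // U+3000..U+303F is the "CJK Symbols and Punctuation" block (0x40 code points).
    NSCharacterSet *cjkPunctuation =
        [NSCharacterSet characterSetWithRange:NSMakeRange(0x3000, 0x40)];
    return [text componentsSeparatedByCharactersInSet:cjkPunctuation];
}
```

For example, SplitOnCommaOrSemicolon(@"a,b;c") returns the three components "a", "b" and "c".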
You can also combine character sets and tokenize with them, like this:
// Tokenize sSourceEntityName on both whitespace and punctuation.
NSMutableCharacterSet *mcharsetWhitePunc = [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
[mcharsetWhitePunc formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];
NSArray *sarrTokenizedName = [self.sSourceEntityName componentsSeparatedByCharactersInSet:mcharsetWhitePunc];
// Manual reference counting only; remove this line when compiling with ARC.
[mcharsetWhitePunc release];
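Note that componentsSeparatedByCharactersInSet: produces empty strings when it encounters several members of the character set in a row, so you may want to check for and discard components whose length is less than 1.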
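One way to drop those empty components is to filter the result with a predicate. The sketch below assumes ARC; the helper name NonEmptyTokens is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: tokenize on whitespace and punctuation, then remove
// the empty strings produced by consecutive separators.
static NSArray<NSString *> *NonEmptyTokens(NSString *text) {
    NSMutableCharacterSet *separators =
        [[NSCharacterSet whitespaceAndNewlineCharacterSet] mutableCopy];
    [separators formUnionWithCharacterSet:[NSCharacterSet punctuationCharacterSet]];

    NSArray<NSString *> *raw = [text componentsSeparatedByCharactersInSet:separators];

    // Keep only components that contain at least one character.
    NSPredicate *nonEmpty = [NSPredicate predicateWithFormat:@"length > 0"];
    return [raw filteredArrayUsingPredicate:nonEmpty];
}
```

For the input @"hello,  world!" the raw split contains several empty strings (between the comma and the spaces, and after the exclamation mark); after filtering, only "hello" and "world" remain.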
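If you want to tokenize a string into search terms while preserving "quoted phrases", this NSString category should respect various types of quote pairs: "" '' ‘’ “”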
Usage:
NSArray *terms = [@"This is my \"search phrase\" I want to split" searchTerms];
// results in: ["This", "is", "my", "search phrase", "I", "want", "to", "split"]
Code:
@interface NSString (Search)
- (NSArray *)searchTerms;
@end

@implementation NSString (Search)

- (NSArray *)searchTerms {
    // Strip whitespace and set up the scanner
    NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSString *searchString = [self stringByTrimmingCharactersInSet:whitespace];
    NSScanner *scanner = [NSScanner scannerWithString:searchString];
    [scanner setCharactersToBeSkipped:nil]; // we'll handle whitespace ourselves

    // A few types of quote pairs to check
    NSDictionary *quotePairs = @{@"\"": @"\"",
                                 @"'": @"'",
                                 @"\u2018": @"\u2019",
                                 @"\u201C": @"\u201D"};

    // Scan
    NSMutableArray *results = [[NSMutableArray alloc] init];
    while (scanner.scanLocation < searchString.length) {
        // Reset on every pass so a failed scan cannot re-add the previous term
        NSString *substring = nil;

        // Check for a quote at the current position. Index into searchString
        // (the trimmed copy the scanner walks), not self.
        unichar unicharacter = [searchString characterAtIndex:scanner.scanLocation];
        NSString *startQuote = [NSString stringWithFormat:@"%C", unicharacter];
        NSString *endQuote = [quotePairs objectForKey:startQuote];
        if (endQuote != nil) { // if it's a valid start quote we'll have an end quote
            // Scan quoted phrase into substring (skipping start & end quotes)
            [scanner scanString:startQuote intoString:nil];
            [scanner scanUpToString:endQuote intoString:&substring];
            [scanner scanString:endQuote intoString:nil];
        } else {
            // Single word that is non-quoted
            [scanner scanUpToCharactersFromSet:whitespace intoString:&substring];
        }

        // Process and add the substring to results
        if (substring) {
            substring = [substring stringByTrimmingCharactersInSet:whitespace];
            if (substring.length) [results addObject:substring];
        }

        // Skip to next word
        [scanner scanCharactersFromSet:whitespace intoString:nil];
    }

    // Return non-mutable array
    return results.copy;
}

@end