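Is there a simple way to do HTML escaping/unescaping in Objective-C? What I want is something like this pseudocode:
    NSString *string = @"&lt;span&gt;Foo&lt;/span&gt;";
    [string stringByUnescapingHTML];
Which returns
    <span>Foo</span>
Ideally it would unescape all other HTML entities as well, including numeric character references such as &#1234;.
Is there a method in Cocoa Touch/UIKit that does this?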
Check out my NSString category for XMLEntities. It has methods to decode XML entities (including all HTML character references), encode XML entities, strip tags, and remove newlines and whitespace from a string:
    - (NSString *)stringByStrippingTags;
    - (NSString *)stringByDecodingXMLEntities; // Including all HTML character references
    - (NSString *)stringByEncodingXMLEntities;
    - (NSString *)stringWithNewLinesAsBRs;
    - (NSString *)stringByRemovingNewLinesAndWhitespace;
Another HTML NSString category comes from Google Toolbox for Mac.
Despite the name, it works on iOS too.
http://google-toolbox-for-mac.googlecode.com/svn/trunk/Foundation/GTMNSString+HTML.h
    /// Get a string where internal characters that are escaped for HTML are unescaped
    //
    ///  For example, '&amp;' becomes '&'
    ///  Handles &#32; and &#x32; cases as well
    ///
    //  Returns:
    //    Autoreleased NSString
    //
    - (NSString *)gtm_stringByUnescapingFromHTML;
I only had to include three files in the project: the header, the implementation, and GTMDefines.h.
This link contains the solution below. Core Foundation has the CFXMLCreateStringByUnescapingEntities function, but it is not available on the iPhone.
    @interface MREntitiesConverter : NSObject <NSXMLParserDelegate> {
        NSMutableString *resultString;
    }
    @property (nonatomic, retain) NSMutableString *resultString;
    - (NSString *)convertEntitiesInString:(NSString *)s;
    @end

    @implementation MREntitiesConverter

    @synthesize resultString;

    - (id)init
    {
        if ((self = [super init])) {
            resultString = [[NSMutableString alloc] init];
        }
        return self;
    }

    // NSXMLParser hands over character data with entities already decoded.
    - (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)s
    {
        [self.resultString appendString:s];
    }

    - (NSString *)convertEntitiesInString:(NSString *)s
    {
        if (!s) {
            NSLog(@"ERROR : Parameter string is nil");
            return nil;
        }
        [self.resultString setString:@""]; // start fresh on every call
        // Wrap the input in a root element so it parses as an XML document.
        NSString *xmlStr = [NSString stringWithFormat:@"<d>%@</d>", s];
        NSData *data = [xmlStr dataUsingEncoding:NSUTF8StringEncoding allowLossyConversion:YES];
        NSXMLParser *xmlParse = [[[NSXMLParser alloc] initWithData:data] autorelease];
        [xmlParse setDelegate:self];
        [xmlParse parse];
        return [NSString stringWithFormat:@"%@", resultString];
    }

    - (void)dealloc
    {
        [resultString release];
        [super dealloc];
    }

    @end
This is an incredibly hacked-together solution of mine, but if you simply want to unescape a string without worrying about parsing, do this:
    -(NSString *)htmlEntityDecode:(NSString *)string
    {
        string = [string stringByReplacingOccurrencesOfString:@"&quot;" withString:@"\""];
        string = [string stringByReplacingOccurrencesOfString:@"&apos;" withString:@"'"];
        string = [string stringByReplacingOccurrencesOfString:@"&lt;" withString:@"<"];
        string = [string stringByReplacingOccurrencesOfString:@"&gt;" withString:@">"];
        string = [string stringByReplacingOccurrencesOfString:@"&amp;" withString:@"&"]; // Do this last so that, e.g. @"&amp;lt;" goes to @"&lt;" not @"<"
        return string;
    }
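I know it is by no means elegant, but it gets the job done. You can then decode an element by calling:
    string = [self htmlEntityDecode:string];
Like I said, it's hacky, but it works. If you want to encode a string, just reverse the stringByReplacingOccurrencesOfString parameters (and replace & first rather than last, so the ampersands in the newly added entities are not escaped again).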
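The encoding direction mentioned above can be sketched as follows. This is a minimal illustration, not part of the original answer: `HTMLEntityEncode` is a hypothetical helper name, and it covers only the same five characters as `htmlEntityDecode:`.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (an assumption for illustration): the inverse of
// htmlEntityDecode:, covering only &, ", ', < and >.
static NSString *HTMLEntityEncode(NSString *string)
{
    // Replace "&" first. If it were done last, the "&" introduced by the
    // other replacements (e.g. "&lt;") would be escaped again to "&amp;lt;".
    string = [string stringByReplacingOccurrencesOfString:@"&" withString:@"&amp;"];
    string = [string stringByReplacingOccurrencesOfString:@"\"" withString:@"&quot;"];
    string = [string stringByReplacingOccurrencesOfString:@"'" withString:@"&apos;"];
    string = [string stringByReplacingOccurrencesOfString:@"<" withString:@"&lt;"];
    string = [string stringByReplacingOccurrencesOfString:@">" withString:@"&gt;"];
    return string;
}
```

With this sketch, HTMLEntityEncode(@"<b>Tom & Jerry</b>") yields &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;. Like the decoder, it ignores every other entity.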
In iOS 7 you can use NSAttributedString's ability to import HTML to convert HTML entities into an NSString.
For example:
    @interface NSAttributedString (HTML)
    + (instancetype)attributedStringWithHTMLString:(NSString *)htmlString;
    @end

    @implementation NSAttributedString (HTML)
    + (instancetype)attributedStringWithHTMLString:(NSString *)htmlString
    {
        NSDictionary *options = @{ NSDocumentTypeDocumentAttribute : NSHTMLTextDocumentType,
                                   NSCharacterEncodingDocumentAttribute : @(NSUTF8StringEncoding) };
        NSData *data = [htmlString dataUsingEncoding:NSUTF8StringEncoding];
        return [[NSAttributedString alloc] initWithData:data options:options documentAttributes:nil error:nil];
    }
    @end
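Then, in the code where you want to clean up the entities:
    NSString *cleanString = [[NSAttributedString attributedStringWithHTMLString:question.title] string];
This is probably the simplest way, but I don't know how well it performs. You should be very sure that the content you are "cleaning" does not contain any <img> tags or the like, because this method will download those images during the HTML to NSAttributedString conversion. :)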
Here is a solution that neutralizes all characters (by turning each of them into the HTML-encoded entity for its Unicode value). I used it for my own need: making sure that a string that came from the user but was placed inside a web view could not carry any XSS attack.
Interface:
    @interface NSString (escape)
    - (NSString *)stringByEncodingHTMLEntities;
    @end
Implementation:
    @implementation NSString (escape)
    - (NSString *)stringByEncodingHTMLEntities
    {
        // Rather than mapping each individual entity and checking if it needs to be replaced, we simply replace every character with the hex entity
        NSMutableString *resultString = [NSMutableString string];
        for (NSUInteger pos = 0; pos < [self length]; pos++)
            [resultString appendFormat:@"&#x%x;", [self characterAtIndex:pos]];
        return [NSString stringWithString:resultString];
    }
    @end
Usage example:
    UIWebView *webView = [[UIWebView alloc] init];
    NSString *userInput = @"<script>alert('This is an XSS ATTACK!');</script>";
    NSString *safeInput = [userInput stringByEncodingHTMLEntities];
    [webView loadHTMLString:safeInput baseURL:nil];
Your mileage may vary.
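For the opposite direction, and for the numeric references the question asks about (such as &#1234;), here is a rough sketch that decodes decimal and hexadecimal character references using only Foundation. `DecodeNumericCharacterReferences` is a hypothetical helper name, not an API from any of the libraries mentioned in this thread, and it deliberately leaves named entities such as &amp; untouched.

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper (an assumption for illustration): decodes "&#1234;"
// and "&#x4D2;" style references. Named entities are not handled.
static NSString *DecodeNumericCharacterReferences(NSString *input)
{
    if (input == nil) {
        return nil;
    }
    // Group 1 captures hex digits, group 2 captures decimal digits.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"&#(?:[xX]([0-9A-Fa-f]{1,6})|([0-9]{1,7}));"
                                                  options:0
                                                    error:NULL];
    NSMutableString *result = [NSMutableString stringWithString:input];
    NSArray *matches = [regex matchesInString:input options:0 range:NSMakeRange(0, input.length)];

    // Walk the matches back to front so earlier ranges stay valid while replacing.
    for (NSTextCheckingResult *match in [matches reverseObjectEnumerator]) {
        NSRange hexRange = [match rangeAtIndex:1];
        NSRange decRange = [match rangeAtIndex:2];
        BOOL isHex = (hexRange.location != NSNotFound);
        NSString *digits = [input substringWithRange:(isHex ? hexRange : decRange)];
        unsigned long codePoint = strtoul([digits UTF8String], NULL, isHex ? 16 : 10);

        // Leave references alone if they are not valid Unicode scalar values.
        if (codePoint == 0 || codePoint > 0x10FFFF ||
            (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            continue;
        }

        // Convert the code point to one or two UTF-16 code units.
        unichar units[2];
        NSUInteger count = 1;
        if (codePoint < 0x10000) {
            units[0] = (unichar)codePoint;
        } else {
            unsigned long v = codePoint - 0x10000;
            units[0] = (unichar)(0xD800 + (v >> 10));
            units[1] = (unichar)(0xDC00 + (v & 0x3FF));
            count = 2;
        }
        [result replaceCharactersInRange:match.range
                              withString:[NSString stringWithCharacters:units length:count]];
    }
    return result;
}
```

With this sketch, both &#1234; and &#x4D2; decode to Ӓ (U+04D2). One caveat: a per-character hex encoder based on characterAtIndex: emits characters outside the Basic Multilingual Plane as two surrogate references, which this decoder skips rather than recombines.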