I'm working on a text summarization method. To test it I use a benchmark called doc 2007, which contains many XML files that I need to clean.
For example, I have an XML file like this:
<sentence id="s0">The nature of the proceeding 1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises. In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.</sentence>
I need to extract the string between <sentence id="s0"> and </sentence>. In other words, the result should look like this:
The nature of the proceeding 1 The principal issue in this proceeding is whether the Victorian Arts Centre falls within the category of 'premises of State Government Departments and Instrumentalities', for the purposes of provisions in industrial awards relating to rates of payment for persons employed in cleaning those premises. In turn, this depends upon whether the Victorian Arts Centre Trust, a statutory corporation established by the Victorian Arts Centre Act 1979 (Vic) ('the VAC Act'), is properly described as a State Government department or instrumentality, for the purposes of the award provisions.
I found something like this:
Regex.Match("User name (sales)", @"\(([^)]*)\)").Groups[1].Value
which uses Regex, but it doesn't work for my case. Can you give me a quick solution?
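For comparison, the snippet above captures the text between "(" and ")", so it cannot match XML tags. A minimal sketch of the same regex idea with the sentence tags as delimiters is shown below. It assumes every sentence is written exactly as <sentence id="s0">...</sentence> (same attribute name, double quotes); the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SentenceRegex
{
    // Hypothetical helper: returns the inner text of <sentence id="s0">...</sentence>,
    // or null when no such element is found.
    public static string ExtractS0(string xml)
    {
        // Same pattern shape as the "(...)" snippet, but the delimiters are now the
        // opening and closing tags. (.*?) is lazy, so it stops at the first </sentence>;
        // RegexOptions.Singleline lets "." also match newlines inside the sentence.
        Match match = Regex.Match(
            xml,
            @"<sentence\s+id=""s0""\s*>(.*?)</sentence>",
            RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value.Trim() : null;
    }
}
```

Note that a regex does not decode XML entities (for example &amp; stays as the literal text "&amp;") and breaks as soon as the markup differs slightly, which is why an XML parser is usually the safer choice.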
This is easier with LINQ to XML (namespace System.Xml.Linq):
var res = XElement.Parse(xml)
    .Descendants("sentence")
    .Where(e => (string)e.Attribute("id") == "s0")
    .FirstOrDefault()?.Value;
Or, as Yeldar pointed out, a cleaner way is:
var s0 = XElement.Parse(xml)
    .Descendants("sentence")
    .FirstOrDefault(e => (string)e.Attribute("id") == "s0")?.Value;
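Since the benchmark has many files to clean, it may help to wrap this in a small reusable helper. The sketch below is one possible way to do it, not a definitive implementation; the class and method names are hypothetical. It uses XDocument instead of XElement so that a root element that is itself a <sentence> is also found, casts the attribute to string so sentences without an id are skipped instead of throwing, and returns null when no match exists.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class SentenceExtractor
{
    // Returns the text of the <sentence> element whose id attribute equals `id`,
    // or null if there is no such element (instead of a NullReferenceException).
    public static string GetSentence(string xml, string id)
    {
        XElement sentence = XDocument.Parse(xml)
            .Descendants("sentence")
            // (string) on an XAttribute yields null when the attribute is missing.
            .FirstOrDefault(e => (string)e.Attribute("id") == id);
        // Value concatenates the element's text and decodes entities such as &amp;.
        return sentence?.Value.Trim();
    }

    // Returns the text of every <sentence> element, in document order.
    public static string[] GetAllSentences(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("sentence")
            .Select(e => e.Value.Trim())
            .ToArray();
    }
}
```

To process files on disk rather than strings, XDocument.Load(path) can replace XDocument.Parse(xml).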