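I'd like to parse (for now only recognize, keeping the symbols) LaTeX math. Right now I'm having trouble with super- and subscripts combined with curly braces (e.g. a^{bc} and combinations thereof; I've got the basic a^b working fine). A minimal example (as short as possible while staying readable):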
    #include <iostream>
    using std::cout;
    #include <string>
    using std::string;
    #include <boost/spirit/home/x3.hpp>
    namespace x3 = boost::spirit::x3;
    using x3::space;
    using x3::char_;
    using x3::lit;
    using x3::repeat;

    x3::rule<class scripts, string> scripts = "super- and subscripts";
    x3::rule<class braced_thing, string> braced_thing = "thing optionally surrounded by curly braces";
    x3::rule<class superscript, string> superscript = "superscript";
    x3::rule<class subscript, string> subscript = "subscript";

    // main rule: any number of items with or without braces
    auto const scripts_def = *braced_thing;
    // second level main rule: optional braces, and any number of characters or sub/superscripts
    auto const braced_thing_def = -lit('{') >> *(subscript | superscript | repeat(1)[(char_ - "_^{}")]) >> -lit('}');
    // superscript: things of the form a^b where a and b can be surrounded by curly braces
    auto const superscript_def = braced_thing >> '^' >> braced_thing;
    // subscript: things of the form a_b where a and b can be surrounded by curly braces
    auto const subscript_def = braced_thing >> '_' >> braced_thing;

    BOOST_SPIRIT_DEFINE(scripts)
    BOOST_SPIRIT_DEFINE(braced_thing)
    BOOST_SPIRIT_DEFINE(superscript)
    BOOST_SPIRIT_DEFINE(subscript)

    int main()
    {
        const string input = "a^{b_x y}_z {v_x}^{{x^z}_y}";
        string output; // will only contain the characters as the grammar is defined above
        auto first = input.begin();
        auto last = input.end();
        const bool result = x3::phrase_parse(first, last, scripts, space, output);
        if(first != last)
            std::cout << "partial match only:\n" << output << '\n';
        else if(!result)
            std::cout << "parse failed!\n";
        else
            std::cout << "parsing succeeded:\n" << output << '\n';
    }
It's also available on Coliru.
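The problem is that this segfaults (for what I'm sure are obvious reasons), and I know of no other way to, well, express this in ... expression grammar.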
I haven't looked at the suggestion by @cv_and_he; instead I debugged your grammar myself. I came up with this:
    auto token        = lexeme [ +~char_("_^{} \t\r\n") ];
    auto simple       = '{' >> sequence >> '}' | token;
    auto expr         = lexeme [ simple % char_("_^") ];
    auto sequence_def = expr % +space;
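What got me there was basically rethinking, step by step, what the grammar actually looks like.
It took me two tries to arrive at the right way to parse "a b". At first I "hacked" it in as just another subscript-like operator, char_(" _^"), but I got the impression that this would not lead to the AST you expect. The clue was that you used the space skipper.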
For now there is no AST; we just "harvest" the raw matched string using x3::raw[...].
Live On Coliru

    //#define BOOST_SPIRIT_X3_DEBUG
    #include <iostream>
    #include <string>
    #include <boost/spirit/home/x3.hpp>

    namespace x3 = boost::spirit::x3;

    namespace grammar {
        using namespace x3;

        // The tag types (sequence_tag etc.) are arbitrary; they only give each rule a unique ID.
        rule<class sequence_tag> sequence { "sequence" };

        // a braced sub-sequence, or a run of characters that are neither operators, braces nor whitespace
        auto simple = rule<class simple_tag> {"simple"} = '{' >> sequence >> '}' | lexeme [ +~char_("_^{} \t\r\n") ];
        // one or more simples joined by '_' or '^'
        auto expr   = rule<class expr_tag> {"expr"} = lexeme [ simple % char_("_^") ];

        // expressions separated by whitespace
        auto sequence_def = expr % +space;
        BOOST_SPIRIT_DEFINE(sequence)
    }

    int main() {
        for (const std::string input : {
                "a",
                "a^b",
                "a_b",
                "a b",
                "{a}^{b}",
                "{a}_{b}",
                "{a} {b}",
                "a^{b_x y}",
                "a^{b_x y}_z {v_x}^{{x^z}_y}"
            })
        {
            std::string output; // will only contain the characters as the grammar is defined above
            auto first = input.begin(), last = input.end();
            bool result = x3::parse(first, last, x3::raw[grammar::sequence], output);

            if (result)
                std::cout << "Parse success: '" << output << "'\n";
            else
                std::cout << "parse failed!\n";

            if (last!=first)
                std::cout << "remaining unparsed: '" << std::string(first, last) << "'\n";
        }
    }
Output:

    Parse success: 'a'
    Parse success: 'a^b'
    Parse success: 'a_b'
    Parse success: 'a b'
    Parse success: '{a}^{b}'
    Parse success: '{a}_{b}'
    Parse success: '{a} {b}'
    Parse success: 'a^{b_x y}'
    Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}'
Output with debug information enabled (only the remaining-input fragments of the trace are shown):

    Parse success: 'a' a a a
    Parse success: 'a^b' a^b a^b a^b ^b b
    Parse success: 'a_b' a_b a_b a_b _b b
    Parse success: 'a b' a b a b a b b b b b
    Parse success: '{a}^{b}' {a}^{b} {a}^{b} {a}^{b} a}^{b} a}^{b} a}^{b} }^{b} }^{b} }^{b} ^{b} {b} b} b} b} } } }
    Parse success: '{a}_{b}' {a}_{b} {a}_{b} {a}_{b} a}_{b} a}_{b} a}_{b} }_{b} }_{b} }_{b} _{b} {b} b} b} b} } } }
    Parse success: '{a} {b}' {a} {b} {a} {b} {a} {b} a} {b} a} {b} a} {b} } {b} } {b} } {b} {b} {b} {b} {b} b} b} b} } } }
    Parse success: 'a^{b_x y}' a^{b_x y} a^{b_x y} a^{b_x y} ^{b_x y} {b_x y} b_x y} b_x y} b_x y} _x y} x y} y} y} y} y} } } }
    Parse success: 'a^{b_x y}_z {v_x}^{{x^z}_y}' a^{b_x y}_z {v_x}^{{ a^{b_x y}_z {v_x}^{{ a^{b_x y}_z {v_x}^{{ ^{b_x y}_z {v_x}^{{x {b_x y}_z {v_x}^{{x^ b_x y}_z {v_x}^{{x^z b_x y}_z {v_x}^{{x^z b_x y}_z {v_x}^{{x^z _x y}_z {v_x}^{{x^z} x y}_z {v_x}^{{x^z}_ y}_z {v_x}^{{x^z}_y y}_z {v_x}^{{x^z}_y y}_z {v_x}^{{x^z}_y} y}_z {v_x}^{{x^z}_y} }_z {v_x}^{{x^z}_y} }_z {v_x}^{{x^z}_y} }_z {v_x}^{{x^z}_y} _z {v_x}^{{x^z}_y} z {v_x}^{{x^z}_y} {v_x}^{{x^z}_y} {v_x}^{{x^z}_y} {v_x}^{{x^z}_y} {v_x}^{{x^z}_y} v_x}^{{x^z}_y} v_x}^{{x^z}_y} v_x}^{{x^z}_y} _x}^{{x^z}_y} x}^{{x^z}_y} }^{{x^z}_y} }^{{x^z}_y} }^{{x^z}_y} ^{{x^z}_y} {{x^z}_y} {x^z}_y} {x^z}_y} {x^z}_y} x^z}_y} x^z}_y} x^z}_y} ^z}_y} z}_y} }_y} }_y} }_y} _y} y} } } }