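I have an application that needs to let users write Excel-like expressions:

    (H1 +(D1/C3))*I8

and more complex things like

    If(H1 = 'True', D3*.2, D3*.5)

I can only get so far with regular expressions. Any suggestions on the right approach, and any resources I can learn from, would be greatly appreciated.

Thanks!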
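Some other questions you might find useful:

How do you write a programming language?
Learning to write a compiler
Writing a compiler for a DSL in Python

Good luck!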
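When I faced a similar situation, needing to handle short one-line expressions, I wrote a parser. The expressions were boolean logic, of the form

    n1 = y and n2 > z
    n2 != x or (n3 > y and n4 = z)

and so on. In English, you could say there are atoms joined by AND and OR, and each atom has three elements: a left-hand-side attribute, an operator, and a value. Because the format was so succinct, I think the parsing was easier. The set of possible attributes is known and limited (e.g. name, size, time). The operators vary by attribute: different attributes take different sets of operators. The range and format of possible values vary by attribute as well.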
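To make the atom idea concrete, here is a minimal sketch of how such atoms could be modelled. The type names NameAtom, CompoundAtom, LogicalConjunction and Comparison are the ones used by the parser loop in this answer, but the class bodies, the enum members and the ToString() overrides are assumptions added for illustration.

```csharp
using System;

// Assumed members; the real enums may contain more values.
public enum LogicalConjunction { AND, OR, XOR }
public enum Comparison { EqualTo, NotEqualTo, GreaterThan, LesserThan }

// Common base type so simple and compound atoms can be mixed in one tree.
public abstract class Atom { }

// A simple atom for the "name" attribute: the attribute is implied by the
// type, so only the operator and the value need to be stored.
public class NameAtom : Atom
{
    public Comparison Operator { get; set; }
    public string Value { get; set; }
    public override string ToString() => $"name {Operator} {Value}";
}

// Two atoms joined by AND, OR or XOR. Right stays null until the
// right-hand atom has been parsed.
public class CompoundAtom : Atom
{
    public Atom Left { get; set; }
    public Atom Right { get; set; }
    public LogicalConjunction Conjunction { get; set; }
    public override string ToString() => $"({Left} {Conjunction} {Right})";
}
```

Because a CompoundAtom holds Atoms on both sides, nesting compound atoms yields an expression tree of any depth.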
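To parse, I split the string on whitespace using String.Split(). I later realized that before the Split(), I needed to normalize the input string by inserting whitespace before and after parens. I did that with a regex.Replace().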
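The normalize-then-split step might look roughly like the sketch below. The Tokenizer class and its Tokenize method are hypothetical names, and the regex is one possible way to pad the parens, not necessarily the pattern originally used.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not taken from the original answer.
public static class Tokenizer
{
    public static string[] Tokenize(string input)
    {
        // Surround every paren with spaces so that "(n3" becomes " ( n3".
        string normalized = Regex.Replace(input, @"([()])", " $1 ");

        // A null separator array makes Split() break on any whitespace.
        // RemoveEmptyEntries drops the "" tokens that runs of spaces
        // would otherwise produce.
        return normalized.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

With this sketch, Tokenizer.Tokenize("n2 != x or (n3 > y and n4 = z)") yields 13 tokens, with "(" and ")" as tokens of their own. Splitting without RemoveEmptyEntries also works, as long as the parser then ignores empty tokens.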
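The output of the split is an array of tokens. Parsing then happens in one big for loop, with a switch on the left-hand-side attribute value. On each pass through the loop, I slurp in a group of tokens. If the first token is an open paren, the group is just one token long: the paren itself. For tokens that are well-known names (my attribute values), the parser has to slurp in a group of 3 tokens, one each for the name, the operator, and the value. If at any point there are not enough tokens, the parser throws an exception. The parser state changes based on the stream of tokens. A conjunction (AND, OR, XOR) means pushing the prior atom onto a stack; when the next atom is finished, I pop the prior atom and join the two atoms into a compound atom. And so on.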
    Atom current = null;
    for (int i = 0; i < tokens.Length; i++)
    {
        switch (tokens[i].ToLower())
        {
            case "name":
                // a simple atom is three tokens: name, operator, value
                if (tokens.Length <= i + 2) throw new ArgumentException();
                Comparison o = (Comparison) EnumUtil.Parse(typeof(Comparison), tokens[i+1]);
                current = new NameAtom { Operator = o, Value = tokens[i+2] };
                i += 2;
                stateStack.Push(ParseState.AtomDone);
                break;

            case "and":
            case "or":
                // the prior atom becomes the left side of a new compound atom
                if (tokens.Length <= i + 3) throw new ArgumentException();
                pendingConjunction = (LogicalConjunction)Enum.Parse(typeof(LogicalConjunction), tokens[i].ToUpper());
                current = new CompoundAtom { Left = current, Right = null, Conjunction = pendingConjunction };
                atomStack.Push(current);
                break;

            case "(":
                state = stateStack.Peek();
                if (state != ParseState.Start && state != ParseState.ConjunctionPending && state != ParseState.OpenParen)
                    throw new ArgumentException();
                if (tokens.Length <= i + 4) throw new ArgumentException();
                stateStack.Push(ParseState.OpenParen);
                break;

            case ")":
                state = stateStack.Pop();
                if (stateStack.Peek() != ParseState.OpenParen)
                    throw new ArgumentException();
                stateStack.Pop();
                stateStack.Push(ParseState.AtomDone);
                break;

            // more like that...

            case "":
                // do nothing in the case of whitespace
                break;

            default:
                throw new ArgumentException(tokens[i]);
        }

        // insert housekeeping for parse states here
    }
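That's simplified, just a bit. But the idea is that each case statement is fairly simple. It's easy to parse an atomic unit of the expression. The tricky part is joining them all together appropriately.

That trick is accomplished in the housekeeping section at the end of each slurp loop, using the state stack and the atom stack. Different things can happen depending on the parser state. As I said, in each case statement the parser state may change, with the previous state getting pushed onto a stack. Then, at the end of the switch statement, if the state says I have just finished parsing an atom and there is a pending conjunction, I move the just-parsed atom into the CompoundAtom. The code looks like this: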
    state = stateStack.Peek();
    if (state == ParseState.AtomDone)
    {
        stateStack.Pop();
        if (stateStack.Peek() == ParseState.ConjunctionPending)
        {
            while (stateStack.Peek() == ParseState.ConjunctionPending)
            {
                // the compound atom that was pushed when the conjunction was seen
                var cc = atomStack.Pop() as CompoundAtom;
                cc.Right = current;
                current = cc;               // mark the parent as current (walk up the tree)
                stateStack.Pop();           // the conjunction is no longer pending
                state = stateStack.Pop();
                if (state != ParseState.AtomDone)
                    throw new ArgumentException();
            }
        }
        else stateStack.Push(ParseState.AtomDone);
    }
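The push-on-conjunction, pop-and-join-on-next-atom idea can also be shown in isolation. The sketch below is a deliberately reduced variant, not the parser above: it has no parens and no state stack, it treats any non-conjunction token as the start of a three-token atom, and it uses plain strings in place of atom objects. MiniParser and its Parse method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, reduced illustration of the stack-based joining.
public static class MiniParser
{
    public static string Parse(string[] tokens)
    {
        var atomStack = new Stack<string>();        // atoms waiting for a right-hand side
        var conjunctionStack = new Stack<string>(); // the conjunction that goes with each
        string current = null;

        for (int i = 0; i < tokens.Length; i++)
        {
            string t = tokens[i].ToLower();
            if (t == "and" || t == "or" || t == "xor")
            {
                // A conjunction pushes the prior atom until its partner arrives.
                if (current == null)
                    throw new ArgumentException("conjunction without a left side");
                atomStack.Push(current);
                conjunctionStack.Push(t);
                current = null;
            }
            else
            {
                // Slurp three tokens: attribute, operator, value.
                if (current != null)
                    throw new ArgumentException("missing conjunction before " + tokens[i]);
                if (tokens.Length <= i + 2)
                    throw new ArgumentException("incomplete atom");
                current = tokens[i] + " " + tokens[i + 1] + " " + tokens[i + 2];
                i += 2;

                // Atom finished: if one is waiting, pop it and join the two.
                if (atomStack.Count > 0)
                    current = "(" + atomStack.Pop() + " " + conjunctionStack.Pop() + " " + current + ")";
            }
        }

        if (current == null)
            throw new ArgumentException("expression is empty or ends with a conjunction");
        return current;
    }
}
```

For the tokens of "a = 1 and b > 2 or c != 3", this returns "((a = 1 and b > 2) or c != 3)", which shows that joining strictly in token order nests to the left. Supporting parens is what makes the extra state stack necessary.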
One other bit of magic is EnumUtil.Parse. It lets me parse things like "<" into an enum value. Suppose you define your enums like this:
    internal enum Operator
    {
        [Description(">")]  GreaterThan,
        [Description(">=")] GreaterThanOrEqualTo,
        [Description("<")]  LesserThan,
        [Description("<=")] LesserThanOrEqualTo,
        [Description("=")]  EqualTo,
        [Description("!=")] NotEqualTo
    }
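Normally, Enum.Parse looks for the symbolic name of the enum value, and < is not a valid symbolic name. EnumUtil.Parse() looks for the thing in the description instead. The code looks like this: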
    using System;
    using System.ComponentModel;
    using System.Reflection;

    internal sealed class EnumUtil
    {
        /// <summary>
        /// Returns the value of the DescriptionAttribute if the specified Enum value has one.
        /// If not, returns the ToString() representation of the Enum value.
        /// </summary>
        /// <param name="value">The Enum to get the description for</param>
        /// <returns>The description, or the name of the value if it has none.</returns>
        internal static string GetDescription(System.Enum value)
        {
            FieldInfo fi = value.GetType().GetField(value.ToString());
            var attributes = (DescriptionAttribute[])fi.GetCustomAttributes(typeof(DescriptionAttribute), false);
            if (attributes.Length > 0)
                return attributes[0].Description;
            else
                return value.ToString();
        }

        /// <summary>
        /// Converts the string representation of the name or numeric value of one or more
        /// enumerated constants to an equivalent enumerated object.
        /// Note: Utilises the DescriptionAttribute for values that use it.
        /// </summary>
        /// <param name="enumType">The System.Type of the enumeration.</param>
        /// <param name="value">A string containing the name or value to convert.</param>
        /// <returns>The matching enumerated object.</returns>
        internal static object Parse(Type enumType, string value)
        {
            return Parse(enumType, value, false);
        }

        /// <summary>
        /// Converts the string representation of the name or numeric value of one or more
        /// enumerated constants to an equivalent enumerated object.
        /// A parameter specifies whether the operation is case-sensitive.
        /// Note: Utilises the DescriptionAttribute for values that use it.
        /// </summary>
        /// <param name="enumType">The System.Type of the enumeration.</param>
        /// <param name="stringValue">A string containing the name or value to convert.</param>
        /// <param name="ignoreCase">Whether the operation is case-sensitive or not.</param>
        /// <returns>The matching enumerated object.</returns>
        internal static object Parse(Type enumType, string stringValue, bool ignoreCase)
        {
            if (ignoreCase)
                stringValue = stringValue.ToLower();

            // first try to match the text against each value's description
            foreach (System.Enum enumVal in System.Enum.GetValues(enumType))
            {
                string description = GetDescription(enumVal);
                if (ignoreCase)
                    description = description.ToLower();
                if (description == stringValue)
                    return enumVal;
            }

            // no description matched: fall back to the symbolic name
            return System.Enum.Parse(enumType, stringValue, ignoreCase);
        }
    }
I got the EnumUtil.Parse() thing from somewhere else. Maybe here?
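For comparison, the same description-lookup idea can be written as a small generic helper. This is a sketch of a variant, not the EnumUtil class above: DescriptionParser and its Parse<T> method are hypothetical names, the enum is shortened, and the generic Enum constraint assumes C# 7.3 or later.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

public enum Operator
{
    [Description(">")] GreaterThan,
    [Description("<")] LesserThan,
    [Description("=")] EqualTo,
    NotEqualTo   // deliberately left without a description to show the fallback
}

// Hypothetical generic variant of the description lookup.
public static class DescriptionParser
{
    public static T Parse<T>(string text) where T : struct, Enum
    {
        // Each enum member is a public static field on the enum type.
        foreach (FieldInfo field in typeof(T).GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            var attr = field.GetCustomAttribute<DescriptionAttribute>();
            if (attr != null && attr.Description == text)
                return (T)field.GetValue(null);
        }

        // No description matched: fall back to the symbolic name.
        return (T)Enum.Parse(typeof(T), text);
    }
}
```

Here DescriptionParser.Parse<Operator>("<") returns Operator.LesserThan, while DescriptionParser.Parse<Operator>("NotEqualTo") falls through to Enum.Parse and returns Operator.NotEqualTo.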