Friday, July 15

  1. Constructing ParseResults
    Thanks but I’m not actually parsing C. I just chose a small part of it as a simplified view on the problem I was facing.

    The actual project will have next to no keywords and a line- and indentation-based grammar with shell-style line-end comments. The element I replaced with word is in fact an Or of three alternatives: a double-quoted single-line string, a triple-double-quoted multiline string, and an unquoted non-empty sequence of non-whitespace Unicode characters other than #, " and ,. I did not include any of that in the example because it wasn't relevant to the question.
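    For illustration only, here is one way that three-way alternative might be spelled in pyparsing. The name value is hypothetical, and using QuotedString(..., multiline=True) for the triple-quoted form and a Regex for the unquoted form is an assumption, not the poster's actual grammar. The ^ operator builds an Or, which tries every alternative and keeps the longest match. That matters here because the single-line QuotedString('"') would otherwise accept the empty "" at the start of a triple-quoted string.

```python
from pyparsing import QuotedString, Regex

# Hypothetical sketch of the three-way alternative described above.
triple_quoted = QuotedString('"""', multiline=True)  # may span several lines
single_quoted = QuotedString('"')                     # single line only (the default)
# Non-empty run of non-whitespace characters other than '#', '"' and ','.
unquoted = Regex(r'[^\s#",]+')

# '^' builds an Or: all alternatives are tried and the longest match wins.
value = triple_quoted ^ single_quoted ^ unquoted

print(value.parseString('"""two\nlines"""', parseAll=True)[0])
print(value.parseString('"one line"', parseAll=True)[0])
print(value.parseString('bare-word', parseAll=True)[0])
```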
    4:32 am
  2. Constructing ParseResults
    Here are some style suggestions for your parser - use any or none as you prefer:

    • changed a few names to be a little more explicit
    • some idioms to simplify creating keyword expressions, instead of line after line of "KEYWORD = Keyword('keyword')"
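    • changed 'word' to 'ident', and expanded it to support the more common identifier form (a letter or underscore, followed by letters, digits or underscores)
    • added support for nested struct definitions
    • added skipping of comments (cppStyleComment handles both // and /* */ forms)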

    #!/usr/bin/python3
     
    from pyparsing import *
     
    def expand_multiple_member_names(tokens):
        items = []
        for token in tokens:
            for name in token.names:
                item = ParseResults([token.type, name])
                item['type'] = token.type
                item['name'] = name
                items.append(item)
        return ParseResults(items)
     
    # suppressable punctuation
    SEMI,LBRACE,RBRACE = map(Suppress, ";{}")
     
    # define keyword expressions like STRUCT=Keyword("struct"), etc.
    keywords = "union,struct,typedef".split(',')
    for kw in keywords:
        globals()[kw.upper()] = Keyword(kw)
        # or use exec if you prefer:
        # exec("{} = Keyword('{}')".format(kw.upper(), kw))
     
    # generic identifier - if using latest pyparsing version, use 
    # ident = pyparsing_common.identifier
    ident = Word(alphas+'_', alphanums+'_')
     
    # define a Forward for types, since they can be recursive
    type_decl = Forward()('type')
     
    struct_members_decl = Group(type_decl
        + Group(delimitedList(ident))('names')
        + SEMI).setParseAction(expand_multiple_member_names)
     
    struct_type = Group(
        STRUCT + Optional(ident, '<none>')('name') + LBRACE
        + Group(ZeroOrMore(struct_members_decl))('members') + RBRACE
        )
     
    # expand as necessary to include '*'s, '&'s, etc.
    type_decl <<= Group(struct_type('struct')) | ident
     
    struct_decl = struct_type('struct') + SEMI
     
    # skip over comments, wherever they occur - only need to
    # make this call once, at the topmost level, will propagate down
    # to all embedded expressions
    struct_decl.ignore(cppStyleComment)
     
     
    testString = '''
    struct Foo {
        struct {
            float a,b,c;
            } values;
        int x, y;
        float z;
        // char* s;
    };
    '''
     
    result = struct_decl.parseString(testString, parseAll=True)
    print(result.dump())
     

    Gives:

    [['struct', 'Foo', [[[['struct', '<none>', [['float', 'a'], ['float', 'b'], etc. ...
    - struct: ['struct', 'Foo', [[[['struct', '<none>', [['float', 'a'],  etc. ...
      - members: [[[['struct', '<none>', [['float', 'a'], ['float', 'b'],  etc. ...
        [0]:
          [[['struct', '<none>', [['float', 'a'], ['float', 'b'], ['float', 'c']]]], 'values']
          - name: values
          - type: [['struct', '<none>', [['float', 'a'], ['float', 'b'], ['float', 'c']]]]
            - struct: ['struct', '<none>', [['float', 'a'], ['float', 'b'], ['float', 'c']]]
              - members: [['float', 'a'], ['float', 'b'], ['float', 'c']]
                [0]:
                  ['float', 'a']
                  - name: a
                  - type: float
                [1]:
                  ['float', 'b']
                  - name: b
                  - type: float
                [2]:
                  ['float', 'c']
                  - name: c
                  - type: float
              - name: <none>
        [1]:
          ['int', 'x']
          - name: x
          - type: int
        [2]:
          ['int', 'y']
          - name: y
          - type: int
        [3]:
          ['float', 'z']
          - name: z
          - type: float
      - name: Foo
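    If writing into globals() feels too magical, the same keyword expressions can be created by unpacking map(Keyword, ...), mirroring the SEMI,LBRACE,RBRACE line in the code above. This is a sketch of an alternative idiom, not what the code above uses:

```python
from pyparsing import Keyword, ParseException

# One Keyword per word, bound to explicit upper-case names (no globals() tricks).
UNION, STRUCT, TYPEDEF = map(Keyword, "union struct typedef".split())

print(STRUCT.parseString("struct")[0])  # struct
try:
    # Keyword refuses a longer identifier that merely starts with the keyword.
    STRUCT.parseString("structure")
except ParseException:
    print("'structure' is not the keyword 'struct'")
```

    Explicit names also make it obvious to a reader (and to linters) which keyword variables exist.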
    2:37 am
  3. Constructing ParseResults
    Thank you for the reply.

    I am not particularly attached to asXML; I just found its output easier to read initially. Something in its element name assignment did strike me as strange, but I didn't expect it to be downright misleading.

    Now that I don't have to assign results names in a way that makes the asXML output pretty, I find it is enough to drop the results name on members (which I didn't like anyway) and do away with the _ParseResultsWithOffset invocation. Moving the action up to ZeroOrMore(members) seems to be unnecessary, which is just as well, because in the actual grammar I'm implementing, members can be interleaved with other rules.

    Iterating over tokens does look nicer than accessing tokens[0], although it makes no difference to the result as long as the action stays attached to members.
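    A quick way to see why the two spellings behave the same: when the action is attached to the Group, it receives exactly one token (the group) per match, so "for token in tokens" runs once and sees the same object as tokens[0]. A minimal check, where the record action and the seen list are hypothetical names used only for this demonstration:

```python
from pyparsing import Group, Suppress, Word, ZeroOrMore, alphas, delimitedList

word = Word(alphas)
seen = []  # number of tokens the parse action received on each call

def record(tokens):
    seen.append(len(tokens))  # returning None leaves the tokens unchanged

member = Group(word('type')
    + Group(delimitedList(word))('names')
    + Suppress(';')).setParseAction(record)

ZeroOrMore(member).parseString('int x, y; float z;')
print(seen)  # [1, 1]
```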

    Here is the final implementation I ended up with:

    #!/usr/bin/python3
     
    from pyparsing import *
     
    def expand(tokens):
        items = []
        for token in tokens:
            for name in token.names:
                item = ParseResults([token.type, name])
                item['type'] = token.type
                item['name'] = name
                items.append(item)
        return ParseResults(items)
     
    word = Word(alphas)
    members = Group(word('type')
        + Group(delimitedList(word))('names')
        + Suppress(';')).setParseAction(expand)
    structKeyword = Suppress(Keyword('struct'))
    struct = Group(
        structKeyword + word('name') + Suppress('{')
        + Group(ZeroOrMore(members))('members') + Suppress('}')
        + Suppress(';'))('struct')
     
    testString = '''
    struct Foo {
        int x, y;
        float z;
    };
    '''
     
    result = struct.parseString(testString, parseAll=True)
    print(result.dump())
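    The version above still writes each value twice: once in the ParseResults constructor and once as a named item. A small helper (named_results is a hypothetical name, not part of pyparsing) can fold the two steps together using only the public ParseResults constructor and item assignment. It relies on keyword arguments keeping their order, which Python guarantees from 3.6 on, so it should not be relied upon on the Python 3.5.1 mentioned in the original question:

```python
from pyparsing import (Group, Keyword, ParseResults, Suppress, Word, ZeroOrMore,
                       alphas, delimitedList)

def named_results(**fields):
    # Hypothetical helper: list the values in order, then name each one.
    # Assumes keyword-argument order is preserved (Python 3.6+).
    pr = ParseResults(list(fields.values()))
    for key, value in fields.items():
        pr[key] = value
    return pr

def expand(tokens):
    return ParseResults([named_results(type=token.type, name=name)
                         for token in tokens
                         for name in token.names])

word = Word(alphas)
members = Group(word('type')
    + Group(delimitedList(word))('names')
    + Suppress(';')).setParseAction(expand)
struct = Group(
    Suppress(Keyword('struct')) + word('name') + Suppress('{')
    + Group(ZeroOrMore(members))('members') + Suppress('}')
    + Suppress(';'))('struct')

result = struct.parseString('struct Foo { int x, y; float z; };', parseAll=True)
print(result.dump())
```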
    12:54 am

Thursday, July 14

  1. Constructing ParseResults
    Here is an après-parse converter for these ParseResults to XML:

    import xml.etree.ElementTree as ET
    def to_struct_XML(pr):
        ret = ET.Element(pr.struct.name)
        members = ET.Element('members')
        for member in pr.struct.members:
            member_element = ET.Element('member')
            type_element = ET.Element('type')
            type_element.text = member.type
            name_element = ET.Element('name')
            name_element.text = member.name
            member_element.append(type_element)
            member_element.append(name_element)
            members.append(member_element)
        ret.append(members)
        return ret
     
    import io
    out = io.BytesIO()
    # result: the ParseResults returned by struct.parseString(testString, parseAll=True)
    ET.ElementTree(to_struct_XML(result)).write(out)
    xml = out.getvalue().decode('UTF-8')
     
    xml = xml.replace('><', '>\n<')
    print(xml)

    Gives:

    <Foo>
    <members>
    <member>
    <type>int</type>
    <name>x</name>
    </member>
    <member>
    <type>int</type>
    <name>y</name>
    </member>
    <member>
    <type>float</type>
    <name>z</name>
    </member>
    </members>
    </Foo>
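    For reference, a self-contained variant that can be run as-is. It repeats the grammar from the final implementation posted later in the thread, builds the same tree with ET.SubElement, and serializes it with ET.tostring(..., encoding='unicode') instead of writing to a BytesIO buffer. The function name to_struct_xml is illustrative:

```python
import xml.etree.ElementTree as ET
from pyparsing import (Group, Keyword, ParseResults, Suppress, Word, ZeroOrMore,
                       alphas, delimitedList)

def expand(tokens):
    # Turn one "type a, b;" group into one [type, name] result per name.
    items = []
    for token in tokens:
        for name in token.names:
            item = ParseResults([token.type, name])
            item['type'] = token.type
            item['name'] = name
            items.append(item)
    return ParseResults(items)

word = Word(alphas)
members = Group(word('type')
    + Group(delimitedList(word))('names')
    + Suppress(';')).setParseAction(expand)
struct = Group(
    Suppress(Keyword('struct')) + word('name') + Suppress('{')
    + Group(ZeroOrMore(members))('members') + Suppress('}')
    + Suppress(';'))('struct')

def to_struct_xml(pr):
    # SubElement creates a child and attaches it to its parent in one step.
    root = ET.Element(pr.struct.name)
    members_el = ET.SubElement(root, 'members')
    for member in pr.struct.members:
        member_el = ET.SubElement(members_el, 'member')
        ET.SubElement(member_el, 'type').text = member.type
        ET.SubElement(member_el, 'name').text = member.name
    return root

result = struct.parseString('struct Foo { int x, y; float z; };', parseAll=True)
xml_text = ET.tostring(to_struct_xml(result), encoding='unicode')
print(xml_text)
```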
    1:58 pm
  2. Constructing ParseResults (deleted)
    1:57 pm
  3. Constructing ParseResults
    Are you absolutely tied to using asXML() to list out the contents of your parsed data? I think I am going to deprecate this method, as it really is much less reliable than dump(): asXML() has to match up results values with results names after the fact.

    Using dump() with your original code gives this:

    [['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]]
    - struct: ['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]
      - members: [['int', 'x'], ['int', 'y'], ['float', 'z']]
        - members: [['float', 'z']]
          [0]:
            ['float', 'z']
            - member: ['float']
            - name: z
            - type: float
      - name: Foo

    We can see all the expanded type-name pairs in the list of members, but they aren't in the named sublist (only 'z', the last matching member, is there). This usually indicates that multiple expressions are being matched with the same name, and only the last one is being kept. With the old .setResultsName() form, this would be remedied using listAllMatches=True. With the new callable short form, you can fix this by appending a '*' to the name (which I will also change from 'members' to 'member'):

    members = Group(word('type')
        + Group(delimitedList(word))('names')
        + Suppress(';'))('member*').setParseAction(expand)
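    To see the two naming behaviours side by side, outside the struct grammar, here is a minimal example; the expression names are illustrative. word('w*') is the short form of word.setResultsName('w', listAllMatches=True):

```python
from pyparsing import Word, ZeroOrMore, alphas

word = Word(alphas)

# Plain name: every match is recorded under 'w', but lookup returns only the last one.
last_only = ZeroOrMore(word('w'))
# Trailing '*': all matches are kept and returned as a list.
all_matches = ZeroOrMore(word('w*'))
# The long form of the same thing, for comparison.
all_matches_long = ZeroOrMore(word.setResultsName('w', listAllMatches=True))

print(last_only.parseString('a b c').w)         # c
print(all_matches.parseString('a b c').w)       # ['a', 'b', 'c']
print(all_matches_long.parseString('a b c').w)  # ['a', 'b', 'c']
```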

    This now gets us closer:

    [['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]]
    - struct: ['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]
      - members: [['int', 'x'], ['int', 'y'], ['float', 'z']]
        - member: [[['int', 'x'], ['int', 'y']], [['float', 'z']]]
          [0]:
            [['int', 'x'], ['int', 'y']]
            [0]:
              ['int', 'x']
              - member: ['int']
              - name: x
              - type: int
            [1]:
              ['int', 'y']
              - member: ['int']
              - name: y
              - type: int
          [1]:
            [['float', 'z']]
            [0]:
              ['float', 'z']
              - member: ['float']
              - name: z
              - type: float
      - name: Foo

    But now the expansion of "int x, y" into [['int', 'x'], ['int', 'y']] is buried within the 0th element of member, instead of forming the first 2 entries of a 3-element members list. At this point, it seems that the solution is to attach the expand() parse action not to the individual member expression, but to the collective members expression:

    def expand(tokens):
        ret = []
        for token in tokens:
            for name in token.names:
                mem_pr = ParseResults([token.type, name])
                mem_pr['type'] = token.type
                mem_pr['name'] = name
                ret.append(mem_pr)
        return ParseResults(ret)
     
    word = Word(alphas)
    members = Group(word('type')
        + Group(delimitedList(word))('names')
        + Suppress(';'))
    structKeyword = Suppress(Keyword('struct'))
    struct = Group(
        structKeyword + word('name') + Suppress('{')
        + Group(ZeroOrMore(members).setParseAction(expand))('members') + Suppress('}')
        + Suppress(';'))('struct')

    Now parsing your test string and printing out the results using dump() gives:

    [['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]]
    - struct: ['Foo', [['int', 'x'], ['int', 'y'], ['float', 'z']]]
      - members: [['int', 'x'], ['int', 'y'], ['float', 'z']]
        [0]:
          ['int', 'x']
          - name: x
          - type: int
        [1]:
          ['int', 'y']
          - name: y
          - type: int
        [2]:
          ['float', 'z']
          - name: z
          - type: float
      - name: Foo

    This looks closer to your desired expanded struct. If you *absolutely* need XML output from this, then I would write a custom XML serializer for this structure; it will be much more reliable at picking out names, members, member types and member names than the guessing game that asXML() plays.

    -- Paul
    1:26 pm
  4. Constructing ParseResults
    I have trouble creating ParseResults in my program.

    Versions: Python 3.5.1 and pyparsing 2.0.3 as packaged in Ubuntu 16.04.

    Consider a grammar very much like C structures. Basically, a structure has a name and a bunch of members. A member has a type and a name. For simplicity, assume that types and names are arbitrary words. As a convenience, several consecutive members of the same type can be introduced by listing their names separated by commas.

    struct Foo {
        int x, y;
        float z;
    };

    The following pyparsing grammar naturally follows:

    from pyparsing import *
     
    word = Word(alphas)
    members = Group(word('type')
        + Group(delimitedList(word))('names')
        + Suppress(';'))('members')
    structKeyword = Suppress(Keyword('struct'))
    struct = Group(
        structKeyword + word('name') + Suppress('{')
        + Group(ZeroOrMore(members))('members') + Suppress('}')
        + Suppress(';'))('struct')

    This grammar produces ParseResults of the following kind:

    <root>
      <struct>
        <name>Foo</name>
        <members>
          <members>
            <type>int</type>
            <names>
              <ITEM>x</ITEM>
              <ITEM>y</ITEM>
            </names>
          </members>
          <members>
            <type>float</type>
            <names>
              <ITEM>z</ITEM>
            </names>
          </members>
        </members>
      </struct>
    </root>

    However, this is a nuisance to work with later. I would like to desugar the comma-separated definitions, to get the following tree:

    <root>
      <struct>
        <name>Foo</name>
        <members>
          <member>
            <type>int</type>
            <name>x</name>
          </member>
          <member>
            <type>int</type>
            <name>y</name>
          </member>
          <member>
            <type>float</type>
            <name>z</name>
          </member>
        </members>
      </struct>
    </root>

    I could do that as a postprocessing step, by walking the ParseResults and building a data structure of my own. This is straightforward but boring, especially considering that the real program has quite a few more grammar rules.
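    For comparison, the postprocessing route might look like the sketch below. It keeps the original grammar without any parse action and walks the ParseResults once, expanding each comma-separated declaration as it goes. Member, Struct and to_struct are hypothetical names, and namedtuple is just one possible choice of plain container:

```python
from collections import namedtuple
from pyparsing import Group, Keyword, Suppress, Word, ZeroOrMore, alphas, delimitedList

# Hypothetical plain-Python containers for the parsed data.
Member = namedtuple('Member', ['type', 'name'])
Struct = namedtuple('Struct', ['name', 'members'])

word = Word(alphas)
members = Group(word('type')
    + Group(delimitedList(word))('names')
    + Suppress(';'))
struct = Group(
    Suppress(Keyword('struct')) + word('name') + Suppress('{')
    + Group(ZeroOrMore(members))('members') + Suppress('}')
    + Suppress(';'))('struct')

def to_struct(result):
    """Walk the ParseResults once, expanding 'int x, y;' into one Member per name."""
    s = result.struct
    return Struct(s.name, [Member(m.type, name)
                           for m in s.members
                           for name in m.names])

parsed = to_struct(struct.parseString('struct Foo { int x, y; float z; };', parseAll=True))
print(parsed)
```

    The trade-off is the one described above: each grammar rule needs its own conversion code, but the result is an ordinary Python structure that no longer depends on ParseResults.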

    The next obvious way is to add a parsing action, and that’s where I get stumped.

    I imagine the action needs to be attached to the members grammar rule. It receives a “list” of one element, which is a “dictionary” with two keys, type and names. type is a string while names is a list of strings. The action needs to return a “list” of “dictionaries”, one for each name in the original’s names.

    The following kind of works:

    def expand(tokens):
        return [{'type': token.type, 'name': name}
                for token in tokens
                for name in token.names]

    Namely, it produces the following structure:

    <root>
      <struct>
        <name>Foo</name>
        <members>
          <members>{&apos;name&apos;: &apos;x&apos;, &apos;type&apos;: &apos;int&apos;}</members>
          <ITEM>{&apos;name&apos;: &apos;y&apos;, &apos;type&apos;: &apos;int&apos;}</ITEM>
          <members>{&apos;name&apos;: &apos;z&apos;, &apos;type&apos;: &apos;float&apos;}</members>
        </members>
      </struct>
    </root>

    Notice how individual member definitions are rendered as a text representation of a Python dictionary. When accessed as Python dictionaries (x['type']), they work as intended. But they cannot be accessed as namespaces (x.type) or lists (x[0]), and the XML rendition is ugly.

    It becomes clear that I have to construct a proper ParseResults structure. I sort of managed to do this:

    import pyparsing
     
    def expand(tokens):
        items = []
        for name in tokens[0].names:
            item = ParseResults([tokens[0].type, name], 'member')
            item['type'] = tokens[0].type
            item['name'] = pyparsing._ParseResultsWithOffset(name, 1)
            items.append(item)
        return ParseResults(items)

    I don’t like it because (1) I have to duplicate the element values in the constructor call and in subsequent item assignments, and (2) I am forced to delve into undocumented private implementation details (_ParseResultsWithOffset).

    So what I’d like to ask is:

    • Is my goal (to apply structural transformations during parsing, while keeping the whole tree accessible as ParseResults) sane? Or should I fall back to transforming the complete parsed AST to a different data structure after the fact?
    • If it is sane, what is the proper approach that does not suffer from the deficiencies outlined above?

    For easy reproduction, here’s the complete test program:

    #!/usr/bin/python3
     
    import pyparsing
    from pyparsing import *
     
    def expand1(tokens):
        return [{'type': token.type, 'name': name}
                for token in tokens
                for name in token.names]
     
    def expand(tokens):
        items = []
        for name in tokens[0].names:
            item = ParseResults([tokens[0].type, name], 'member')
            item['type'] = tokens[0].type
            item['name'] = pyparsing._ParseResultsWithOffset(name, 1)
            items.append(item)
        return ParseResults(items)
     
    word = Word(alphas)
    members = Group(word('type')
        + Group(delimitedList(word))('names')
        + Suppress(';'))('members').setParseAction(expand)
    structKeyword = Suppress(Keyword('struct'))
    struct = Group(
        structKeyword + word('name') + Suppress('{')
        + Group(ZeroOrMore(members))('members') + Suppress('}')
        + Suppress(';'))('struct')
     
    testString = '''
    struct Foo {
        int x, y;
        float z;
    };
    '''
     
    result = struct.parseString(testString, parseAll=True)
    print(result.asXML('root'))
    11:33 am
