Recent Changes

Saturday, April 22

  1. SkipTo combined with CaselessKeyword gives unexpected results (possible bug)
    Yes, this looks like a bug, I'll get a fix into 2.2.1, but no ETA on that yet.

    Thanks,
    -- Paul
    8:26 pm

Friday, April 14

  1. SkipTo combined with CaselessKeyword gives unexpected results (possible bug)
    Hi,

    If I combine CaselessKeyword with SkipTo, SkipTo stops at occurrences of the keyword that are not on word boundaries. The normal Keyword does not seem to have this problem.

    Reproducer:
    from pyparsing import *
    kw = CaselessKeyword("KEY")
    st = SkipTo(kw)
    st.parseString("hello KEY")  # matches at a word boundary, as expected
    st.parseString("helloKEY") # false positive here
    2:06 am
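    Until a fix ships, one possible stop-gap (not discussed in the thread, so treat it as an assumption) is to skip to a Regex that enforces the word boundary itself instead of relying on CaselessKeyword:

    from pyparsing import ParseException, Regex, SkipTo

    # assumption: a regex stand-in for CaselessKeyword("KEY");
    # (?i) makes it case-insensitive and \b enforces word boundaries
    kw = Regex(r"(?i)\bKEY\b")
    st = SkipTo(kw)

    print(st.parseString("hello KEY"))   # -> ['hello '], stops right before the keyword

    try:
        st.parseString("helloKEY")       # no word boundary, so SkipTo finds no target
    except ParseException as exc:
        print("no match:", exc)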

Monday, April 10

  1. Parsing markdown-styled text
    In the end, the "StopOnSuffix" class above turned out to be incorrect, because we have no way to know how much whitespace was skipped and hence cannot backtrack properly:
    https://github.com/Lucas-C/linux_configuration/blob/master/languages/python/pyparsing_StopOnSuffix.py#L36

    Hence I made another implementation in the same spirit as CharsNotIn: https://github.com/Lucas-C/linux_configuration/blob/master/languages/python/mindmaps/pseudo_markdown_parser.py#L39
    It is more limited (the forbidden character sequences are stop words), but it works properly.
    4:16 am
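    For readers without access to that file, here is a minimal sketch of the idea as I understand it; the class name CharsNotBefore and its details are assumptions, not the linked code. It consumes characters up to, but not including, the nearest of several multi-character stop words:

    from pyparsing import ParseException, Token

    class CharsNotBefore(Token):
        """Match a run of characters ending just before the nearest stop word
        (hypothetical sketch in the spirit of CharsNotIn, not the linked code)."""
        def __init__(self, stop_words):
            super(CharsNotBefore, self).__init__()
            self.stop_words = stop_words
            self.name = 'CharsNotBefore'

        def parseImpl(self, instring, loc, doActions=True):
            # locate the nearest occurrence of any stop word at or after loc
            hits = [instring.find(sw, loc) for sw in self.stop_words]
            hits = [h for h in hits if h >= 0]
            end = min(hits) if hits else len(instring)
            if end == loc:
                raise ParseException(instring, loc, "stop word found immediately", self)
            return end, instring[loc:end]

    The obvious limitation, as noted above, is that a stop word can never appear inside the matched text; in exchange there is no whitespace bookkeeping to undo. A quick usage example:

    from pyparsing import Literal, Suppress

    Bold = Suppress(Literal('**'))
    BoldText = Bold + CharsNotBefore(['**']) + Bold
    print(BoldText.parseString('**a text**'))   # -> ['a text']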

Friday, April 7

  1. Parsing markdown-styled text
    But after some tests, such an alternative approach does not allow for nested markers.
    7:04 am
  2. Parsing markdown-styled text
    I guess a cleaner approach to this form of backtracking would be to create an alternative to "CharsNotIn" that considers more than one character at a time.
    6:47 am
  3. Parsing markdown-styled text
    I think I've actually come up with a solution, and found a bug in the process.

    Here is the bug: https://sourceforge.net/p/pyparsing/code/HEAD/tree/trunk/src/pyparsing.py#l380
    The full toklist should be passed to the ParseResults constructor, not just its first element:
    self[name] = _ParseResultsWithOffset(ParseResults(toklist),0)

    And now a solution to my problem:
    #!/usr/bin/python3
     
    from pyparsing import Forward, Literal, OneOrMore, Suppress, Token, Word, basestring, printables
     
    class StopOnSuffix(Token): # cannot be a TokenConverter because .postParse does not alter loc
        def __init__( self, token_matcher, suffixes ):
            super(StopOnSuffix,self).__init__()
            self.name = 'StopOnSuffix'
            self.mayReturnEmpty = token_matcher.mayReturnEmpty
            self.mayIndexError = token_matcher.mayIndexError
            self.saveAsList = token_matcher.saveAsList
            self.token_matcher = token_matcher
            self.suffixes = suffixes
        def parseImpl( self, instring, loc, doActions=True ):
            loc, tokens = self.token_matcher.parseImpl(instring, loc, doActions)
            try:
                # find the first matched token that contains one of the stop suffixes
                suffix, match_index = next((suffix, i) for i, match in enumerate(tokens)
                                                       for suffix in self.suffixes if suffix in match)
                match = tokens[match_index]
                match_trun_len = match.index(suffix)
                if match_trun_len > 0:
                    # truncate that token at the suffix and rewind loc by the amount cut off
                    loc -= len(match) - match_trun_len
                    match = match[:match_trun_len]
                    tokens = tokens[:match_index] + [match] + tokens[match_index+1:]
            except StopIteration:
                # no suffix found in any token: keep the match unchanged
                pass
            return loc, tokens
     
    Bold = Suppress(Literal('**'))
    Italic = Suppress(Literal('__'))
    Text = OneOrMore(Word(printables))
     
    StyledText = Forward()
    BoldText = Bold + StopOnSuffix(StyledText, ['**'])('is_bold') + Bold
    ItalicText = Italic + StopOnSuffix(StyledText, ['__'])('is_italic') + Italic
    StyledText << (BoldText | ItalicText | Text)
    StyledText.resultsName = 'text'
    StyledText.saveAsList = True  # must be done at this point, not before
     
    def test(msg):
        parsed = StyledText.parseString(msg, parseAll=True)
        #print(parsed.dump())
        print('msg: {} => tokens={} is_bold={} is_italic={}'.format(msg, parsed.text, bool(parsed.is_bold), bool(parsed.is_italic)))
    test('**a text**')
    test('**__a text__**')
    test('__**a text**__')
    test('a **text**')
    test('__**a text__**')

    What do you think? Does it make sense as a Token subclass?
    6:31 am
  4. Parsing markdown-styled text
    Damn, wikispaces interpreted the style markers in my examples ^^

    I cannot edit my message, so I am copying them again here with the style markers escaped:
    - **a text** => bold
    - **__a text__** => bold italic
    - __**a text**__ => bold italic
    - a **text** => just text, I don't want to handle stylers inside strings
    - __**a text__** => just text, style markers are messed up
    1:17 am
  5. Parsing markdown-styled text
    Hello!

    I've been stuck on this challenge since yesterday, so I'm asking for help here in the hope that someone can give me some tips & tricks on how to solve it.

    My goal is to detect when a string has Markdown-like style markers at the beginning and the end. Examples:

    - **a text** => bold
    - **__a text__** => bold italic
    - __**a text**__ => bold italic
    - a **text** => just text, I don't want to handle stylers inside strings
    - __**a text__** => just text, style markers are messed up

    Here is what I initially came up with:

    #!/usr/bin/python3
     
    from pyparsing import *
     
    Bold = Suppress(Literal('**'))
    Italic = Suppress(Literal('__'))
     
    Text = OneOrMore(Word(printables))('text')
     
    StyledText = Forward()
    BoldText = (Bold + StyledText + Bold)('is_bold')
    ItalicText = (Italic + StyledText + Italic)('is_italic')
    StyledText << (BoldText | ItalicText | Text)
     
    print(StyledText.parseString('**toto tata**', parseAll=True).dump())
    print(StyledText.parseString('**__toto tata__**', parseAll=True).dump())

    Then I realized from this SO answer that I needed to make my "Text" parser stop when it detects a "**" or "__":
    http://stackoverflow.com/questions/39840633/parsing-pyparsing-group-of-mixed-character-words

    I haven't found any class or function that does this among pyparsing's built-in tools. Should I subclass Token to build my own parser, like in this example?
    http://stackoverflow.com/questions/2212860/pyparsing-question

    Regards.
    1:14 am
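    For context, the reason the grammar above never sees the closing marker is that '*' and '_' are themselves in printables, so Word(printables) swallows the marker along with the last word. A quick check, using only the stock pyparsing classes already imported above:

    from pyparsing import Word, printables

    # the closing '**' is consumed as part of the word, so the trailing
    # Bold in BoldText can never match
    print(Word(printables).parseString('tata**'))   # -> ['tata**']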
