Python library for creating PEG parsers

Overview

PyParsing -- A Python Parsing Module

Introduction

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. The pyparsing module provides a library of classes that client code uses to construct the grammar directly in Python code.

[Since first writing this description of pyparsing in late 2003, this technique for developing parsers has become more widespread, under the name Parsing Expression Grammars - PEGs. See more information on PEGs at https://en.wikipedia.org/wiki/Parsing_expression_grammar .]

Here is a program to parse "Hello, World!" (or any greeting of the form "salutation, addressee!"):

from pyparsing import Word, alphas
greet = Word(alphas) + "," + Word(alphas) + "!"
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))

The program outputs the following:

Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions.

The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.
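
The three access styles can be sketched as follows. This is a minimal example, assuming pyparsing 3.x (where parse_string, as_list and as_dict are the PEP 8 spellings of parseString, asList and asDict); the results names "salutation" and "addressee" are labels chosen here for illustration.

```python
from pyparsing import Word, alphas

# Attach results names (illustrative labels) to the two words of the greeting
greet = Word(alphas)("salutation") + "," + Word(alphas)("addressee") + "!"
result = greet.parse_string("Hello, World!")

print(result.as_list())      # ['Hello', ',', 'World', '!']
print(result.as_dict())      # {'salutation': 'Hello', 'addressee': 'World'}
print(result.addressee)      # World  (attribute access)
print(result["salutation"])  # Hello  (dict-style access)
```

Only tokens that were given a results name appear in the dict and attribute views; the list view always holds every matched token.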

The pyparsing module handles some of the problems that are typically vexing when writing text parsers:

  • extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
  • quoted strings
  • embedded comments
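
A short sketch of the three points above, assuming pyparsing 3.x. The `name = "..."` assignment grammar is made up for this illustration and is not taken from the examples directory.

```python
from pyparsing import Word, alphas, quoted_string, c_style_comment

# Whitespace between tokens is skipped automatically
greet = Word(alphas) + "," + Word(alphas) + "!"
tight = greet.parse_string("Hello,World!").as_list()
spaced = greet.parse_string("Hello , World !").as_list()
print(tight == spaced)  # True

# Hypothetical assignment grammar: a quoted string may contain commas and
# spaces, and C-style comments are ignored wherever they appear between tokens
assignment = Word(alphas) + "=" + quoted_string
assignment.ignore(c_style_comment)
parsed = assignment.parse_string('name /* who to greet */ = "World, the"').as_list()
print(parsed)  # ['name', '=', '"World, the"']
```

By default quoted_string keeps the surrounding quote characters in the matched token.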

The examples directory includes a simple SQL parser, simple CORBA IDL parser, a config file parser, a chemical formula parser, and a four-function algebraic notation parser, among many others.

Documentation

There are many examples in the online docstrings of the classes and methods in pyparsing. You can find them compiled into online docs at https://pyparsing-docs.readthedocs.io/en/latest/. Additional documentation resources and project info are listed in the online GitHub wiki, at https://github.com/pyparsing/pyparsing/wiki. An entire directory of examples is at https://github.com/pyparsing/pyparsing/tree/master/examples.

License

MIT License. See header of pyparsing.py

History

See CHANGES file.

Comments
  • Railroad diagram future updates

    I'm making this issue so that I and anyone else can suggest improvements to the railroad diagram generation feature.

    TODOs:

    • [ ] (from #225) Suppress FollowedBy/_FB subclass
    • [ ] Fix the root element not being called Unnamed and instead being called Forward e.g. in the SQL example
    • [ ] Fix the root element not being labelled "root" as it was previously
    • [ ] Consider ways to use the element class to disambiguate the Unnamed classes
    opened by multimeric 23
  • Storing the default names of elements

    So currently elements have their name stored in the name field, which is assigned to elements in their __init__(), but can then be overwritten with setName().

    The problem with this approach is that we then lose access to the original name, which can be useful as a description of what the element parses. In particular I want access to this for the railroad diagram generator.

    So I'm proposing a change that allows us to keep the original names. Here's my current suggestion:

    from abc import ABC, abstractmethod
    
    class Token(ABC):
        # Other methods
        def __init__(self):
            self.name = None
    
        def setName(self, name):
            self.name = name
        
        def __str__(self):
            # Fall back to the class-specific default when no name was set
            return self.name or self.defaultName
    
        @property
        @abstractmethod
        def defaultName(self):
            pass
    
    class Literal(Token):
        def __init__(self, matchString):
            super().__init__()
            self.match = matchString
    
        @property
        def defaultName(self):
            return '"%s"' % str(self.match)
    

    This way, I can use element.defaultName in my diagram generator. Thoughts?

    opened by multimeric 23
  • Adding assert()-class methods for List/Dict verification via .asDict()/.asList()

    Since I've been writing lots of unit tests involving pyparsing and Python List/Dict results, there might be some benefit in creating a couple of self.assertParserElementListTrue-style functions to help the pyparsing community ensure that nothing gets broken as we go along.

    I've got a rough prototype, which I used heavily in my ISC Bind9/DHCP work to verify that the List/Dict got constructed ... exactly and precisely:

    def assertParseElement(a_parse_element, a_test_data, a_expected_result,
                           a_assert_flag=True):
        """
        A nice unit test tool which provides an assert()-like function
        that takes a string, parses it, takes the computed
        Pythonized list/dict and compares the result against its
        expected Pythonized result.
        :param a_parse_element:  ParserElement class to exercise
        :param a_test_data:  A string to be parsed by a_parse_element
        :param a_expected_result:  The Python list to expect
        :param a_assert_flag:  If True, then expected result must match or an
                               exception gets raised.
                               If False, then the parse MUST fail or the
                               result must not match the expected result,
                               else an exception gets raised
        :return: Always returns True (exception handles the False, like
                 an assert() class would do)
        """
    

    Is this something that our wonderful pyparsing community can use to ensure that such List/Dict constructs get built correctly?

    https://github.com/egberts/pyparsing/blob/9a06cc2e5c47228db612a17fc68d6a931b4425db/test/test_isc_bind_aml.py#L72
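
    The prototype above shows only the signature and docstring. One possible body, as a rough sketch and not the code behind the link: it assumes pyparsing 3.x names, assumes the whole input must be consumed (parse_all=True), and uses a hypothetical snake_case function name.

```python
import pyparsing as pp

def assert_parse_element(parse_element, test_data, expected_result,
                         assert_flag=True):
    """Hypothetical helper: parse test_data and compare as_list() output."""
    try:
        # Assumption: require the whole string to match the grammar
        actual = parse_element.parse_string(test_data, parse_all=True).as_list()
    except pp.ParseBaseException:
        matched = False  # a failed parse counts as "did not match"
    else:
        matched = actual == expected_result
    if matched != assert_flag:
        raise AssertionError(
            "parse of %r %s expected result %r"
            % (test_data, "did not match" if assert_flag else "matched",
               expected_result)
        )
    return True

# Usage with a tiny illustrative grammar
word_pair = pp.Word(pp.alphas) + pp.Word(pp.alphas)
assert_parse_element(word_pair, "hello world", ["hello", "world"])
assert_parse_element(word_pair, "hello 123", ["hello", "123"], assert_flag=False)
```

    Comparing as_dict() output instead would follow the same pattern for grammars that use results names.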

    opened by egberts 22
  • unexpected "warn_ungrouped_named_tokens_in_collection" warnings in versions >2.4.0

    Hi Paul, I wanted to alert you to this issue in case you didn't hear about it yet: (https://github.com/sdispater/poetry/issues/1244, https://github.com/pypa/packaging/issues/170). The poetry people are working around it by pinning pyparsing to 2.4.0 😕

    opened by a-recknagel 21
  • Start refactor

    pyparsing can be refactored while maintaining backward compatibility. Here is a small refactor as a proof-of-concept. I contend that moving code out into modules will make it easier to navigate, and make the overall architecture easier to learn.

    If this PR is accepted, then I will propose more code be moved out to logical modules; eventually reducing pyparsing.__init__.py to the single function of declaring the namespace (the __all__ export).

    I am uncertain about what tests (and examples) must be run to confirm correctness, so I used the .travis.yml file for inspiration:

    SET PYTHONPATH=.
    c:\python36\python.exe simple_unit_tests.py
    c:\python36\python.exe unitTests.py
    c:\python36\python.exe examples/numerics.py
    c:\python36\python.exe examples/TAP.py
    c:\python36\python.exe examples/romanNumerals.py
    c:\python36\python.exe examples/sexpParser.py
    c:\python36\python.exe examples/oc.py
    c:\python36\python.exe examples/delta_time.py
    c:\python36\python.exe examples/eval_arith.py
    
    opened by klahnakoski 19
  • Regression in 3.0.2, 3.0.3, and 3.0.4

    In translate-toolkit we use pyparsing for parsing Windows RC files.

    I did port the code to 3.0.1 and everything worked fine. On 3.0.2 and 3.0.3 we get testsuite failures (see https://github.com/translate/translate/runs/4028529279?check_suite_focus=true).

    The guilty commit is https://github.com/pyparsing/pyparsing/commit/4ab17bb55d1ba72adef66c01232711d421650767, reverting it makes it work again.

    opened by nijel 18
  • Failed to parse a combination of WITH, CASE and EXTRACT(... FROM ...)

    Example (for BigQuery):

    with t as (CASE EXTRACT(dayofweek FROM CURRENT_DATETIME()) when 1 then "S" end) select * from t
    

    error:

      File "/Library/Python/3.8/site-packages/pyparsing.py", line 1955, in parseString
        raise exc
      File "/Library/Python/3.8/site-packages/pyparsing.py", line 2969, in parseImpl
        raise ParseException(instring, loc, self.errmsg, self)
    pyparsing.ParseException: Expected {select statement | {Suppress:("(") select statement Suppress:(")")}}, found 'w'  (at char 13), (line:2, col:13)
    
    opened by qs 16
  • Left-Recursion support

    This PR adds support for direct and indirect left-recursion, according to the "bounded left-recursion" scheme. This is similar to but neither identical nor directly compatible with Packrat.

    • [x] basic LR implementation
    • [x] LR must be enabled like Packrat
    • [x] tests for direct left recursion
    • [x] tests for indirect left recursion
    • [x] tests for non-PEG clause interactions
    • [x] docs
    • [x] ? non-action left recursion lookahead ?

    Closes #287.
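
    For readers unfamiliar with the feature, a minimal sketch of a directly left-recursive grammar, assuming the enable_left_recursion() API that shipped in pyparsing 3.x. The word-list grammar is only an illustration.

```python
import pyparsing as pp

# Left recursion is opt-in and must be enabled before parsing
# (it cannot be combined with packrat memoization)
pp.ParserElement.enable_left_recursion()

item = pp.Word(pp.alphas)
item_list = pp.Forward()
# Directly left-recursive rule: item_list ::= item_list item | item
item_list <<= item_list + item | item

words = item_list.parse_string("To parse or not to parse", parse_all=True).as_list()
print(words)  # ['To', 'parse', 'or', 'not', 'to', 'parse']
```

    Without enable_left_recursion(), a rule that refers to itself in its leftmost position would recurse without consuming input.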

    opened by maxfischer2781 16
  • indentedBlock not clearing the indent stack when it partially matches, and then fails

    First of all, thanks for this amazingly powerful and expressive library. I'm so glad to have found it.

    I want to parse a language that uses indentation semantically, and thus I am using the indentedBlock function. However, I want to use it in combination with scanString(), because not all text in my input string belongs to this language.

    However, in doing this I have noticed a bug in the indentedBlock. If you try to parse a string that does include an indented block, but doesn't completely parse, the indent stack will not be reverted to how it was, and thus it will fail to parse all correct statements.

    To demonstrate this, I've made a simple test case: https://github.com/TMiguelT/PyparsingIndent/blob/master/indent.py. The comments explain that, if the indentedBlock expression matches, but then the rest of the parsing fails, then the parser will fail to match anything from then on.

    opened by multimeric 15
  • single character Word not getting captured when directly adjacent to a Literal in pp 3.0.0

    from pyparsing import Literal, Word, alphas, alphanums

    x = Literal("<@>")
    id = Word(alphas + '_', alphanums + '_')
    loop = id + x + id
    loop.parse_string('a<@>b2')
    

    This will cause an error for me.

    Here I have the diff for parse_string and my vendored PP 2.4.3 parseString:

    [Screenshot: diff between parse_string and the vendored pyparsing 2.4.3 parseString]

    The version is 3.10.0 for Python and here is the error I get:

    
    self = {{Forward: {Group:({Group:({[{Combine:({'#' cython.binding | cython.boundscheck | cython.wraparound | cython.initializ...(A-Za-z)} '='} {Forward: None | {{{{{{{{{{{{Forward: None | Forward: operator term} | Combine:(Forward: bitwise o}...]}
    instring = '`first-class functions`\n(Add(x = 3 -> int; y = 3 -> int)) int\n        <*>x+y\n\n`generators`\n(Yield123()) coroutin...1 % denom\ndiv_by_zero2 = 1 / 0\nmod_zero2 = 1 % 0\ntruth = 1 + 1 == 2 >> 0\n\nmain:\nHello()\nprint("I\'m a binary.")'
    parse_all = False
    
        def parse_string(
            self, instring: str, parse_all: bool = False, *, parseAll: bool = False
        ) -> ParseResults:
            """
            Parse a string with respect to the parser definition. This function is intended as the primary interface to the
            client code.
        
            :param instring: The input string to be parsed.
            :param parse_all: If set, the entire input string must match the grammar.
            :param parseAll: retained for pre-PEP8 compatibility, will be removed in a future release.
            :raises ParseException: Raised if ``parse_all`` is set and the input string does not match the whole grammar.
            :returns: the parsed data as a :class:`ParseResults` object, which may be accessed as a `list`, a `dict`, or
              an object with attributes if the given parser includes results names.
        
            If the input string is required to match the entire grammar, ``parse_all`` flag must be set to ``True``. This
            is also equivalent to ending the grammar with :class:`StringEnd`().
        
            To report proper column numbers, ``parse_string`` operates on a copy of the input string where all tabs are
            converted to spaces (8 spaces per tab, as per the default in ``string.expandtabs``). If the input string
            contains tabs and the grammar uses parse actions that use the ``loc`` argument to index into the string
            being parsed, one can ensure a consistent view of the input string by doing one of the following:
        
            - calling ``parse_with_tabs`` on your grammar before calling ``parse_string`` (see :class:`parse_with_tabs`),
            - define your parse action using the full ``(s,loc,toks)`` signature, and reference the input string using the
              parse action's ``s`` argument, or
            - explicitly expand the tabs in your input string before calling ``parse_string``.
        
            Examples:
        
            By default, partial matches are OK.
        
            >>> res = Word('a').parse_string('aaaaabaaa')
            >>> print(res)
            ['aaaaa']
        
            The parsing behavior varies by the inheriting class of this abstract class. Please refer to the children
            directly to see more examples.
        
            It raises an exception if parse_all flag is set and instring does not match the whole grammar.
        
            >>> res = Word('a').parse_string('aaaaabaaa', parse_all=True)
            Traceback (most recent call last):
            ...
            pyparsing.ParseException: Expected end of text, found 'b'  (at char 5), (line:1, col:6)
            """
            parseAll = parse_all or parseAll
        
            ParserElement.reset_cache()
            if not self.streamlined:
                self.streamline()
            for e in self.ignoreExprs:
                e.streamline()
            if not self.keepTabs:
                instring = instring.expandtabs()
            try:
                loc, tokens = self._parse(instring, 0)
                if parseAll:
                    loc = self.preParse(instring, loc)
                    se = Empty() + StringEnd()
                    se._parse(instring, loc)
            except ParseBaseException as exc:
                if ParserElement.verbose_stacktrace:
                    raise
                else:
                    # catch and re-raise exception from here, clearing out pyparsing internal stack trace
    >               raise exc.with_traceback(None)
    E               pyparsing.exceptions.ParseException: Expected end of text, found 'i'  (at char 520), (line:27, col:1)
    
    ../../.pyenv/versions/3.10.0/lib/python3.10/site-packages/pyparsing/core.py:1101: ParseException
    
    

    I reran the code totally isolated and did not find the error. It appears to be coming from an IndentedBlock on the next line that has the literal as '(!)' rather than '<!>'.

    opened by rjdbcm 14
  • 2.4.1 release removed from PyPi?

    The 2.4.1 release seems to have been removed from PyPi sometime on July 24th, but as far as I can tell there is no announcement of the reason for this and master branch still appears to contain the 2.4.1 changes.

    Currently PyPi is reporting that the latest release is 2.4.0 released April 8, 2019.

    Just raising this issue to confirm that this is deliberate / alert if it is not.

    opened by orthanc 14
  • Current stable release documentation

    It sounds the same as #297 but I have the version selector. There just isn't any option to view the docs for 3.0.9, which is the current stable release. I was very confused because e.g. pp.python_quoted_string is documented but doesn't exist.

    Also, I'm not sure if this is a recent change, but the docs say there should be a pp.comma_separated_list when it is actually under pp.common.comma_separated_list.

    opened by dave-kennedy 0
  • ZeroOrMore typing rejects strings

    When typechecking code with a ZeroOrMore literal (e.g. ZeroOrMore(".")), mypy complains:

    error: Argument 1 to "ZeroOrMore" has incompatible type "str"; expected "ParserElement"  [arg-type]
    

    ZeroOrMore is annotated with expr: ParserElement, shouldn't that be Union[ParserElement, str] ?
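
    Until the annotation accepts strings, one possible workaround (a sketch, not a fix to the annotation itself) is to wrap the string in a Literal explicitly, which already satisfies the ParserElement type:

```python
import pyparsing as pp

# Passing an explicit Literal avoids relying on str being accepted for expr
dots = pp.ZeroOrMore(pp.Literal("."))

tokens = dots.parse_string("...").as_list()
print(tokens)  # ['.', '.', '.']
```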

    opened by ydirson 1
  • Tests failing in lucene_grammar.py example

    This is nothing too urgent, I am just flagging this. I have noticed that when running the lucene_grammar.py example, some of the failure tests are actually succeeding, which means that either the expression or the tests aren't correct. If people are using the code from this example to parse lucene queries, then I think it is important that it can be trusted.

    For example, parsing this test query that is expected to fail: a\:b\+c\~ yields ['a:b+c~'] (it doesn't fail)

    opened by lijenicol 0
  • Please revert changes that lead downstream projects to add extra spaces

    Downstream projects like pipenv are now forced to add extra spaces:

    -charset-normalizer==2.1.1; python_full_version >= '3.6.0'
    +charset-normalizer==2.1.1 ; python_full_version >= '3.6.0'
    

    refs:

    • https://github.com/pypa/pipenv/issues/5506
    • https://github.com/pypa/pipenv/issues/5506#issuecomment-1332138018
    • https://github.com/sarugaku/requirementslib/commit/422cdae898f8b09aca9264daf803dfcc57d58549
    opened by glensc 3
  • I got an unexpected parsing result when using Forward()

    import pyparsing as pp
    pp.ParserElement.enable_left_recursion()
    
    # Parser 1
    A = pp.Forward()
    B = pp.Forward()
    
    A <<= B + pp.Literal("a") + B
    B <<= A | pp.Literal("b")
    print(B.parse_string("bab"))
    
    # Parser 2
    B = pp.Forward()
    
    B <<= B + pp.Literal("a") + B | pp.Literal("b")
    print(B.parse_string("bab"))
    
    # Parser 1 output
    ['b']
    # Parser 2 output
    ['b', 'a', 'b']
    

    I don't know why the outputs are different.

    opened by JacobiSong 0
  • Add CIFuzz to Github actions

    Add CIFuzz workflow action to have fuzzers build and run on each PR. This is a service offered by OSS-Fuzz, where pyparsing was recently integrated (https://github.com/pyparsing/pyparsing/issues/441). CIFuzz can help catch regressions and fuzzing build issues early, and has a variety of features (see the URL above). In the current PR the fuzzers get built on a pull request and will run for 300 seconds.

    Signed-off-by: David Korczynski

    opened by DavidKorczynski 0
Releases(pyparsing_3.0.9)
  • pyparsing_3.0.9(May 10, 2022)

    • Added Unicode set BasicMultilingualPlane (may also be referenced as BMP) representing the Basic Multilingual Plane (Unicode characters up to code point 65535). Can be used to parse most language characters, but omits emojis, wingdings, etc. Raised in discussion with Dave Tapley (issue #392).

    • To address mypy confusion of pyparsing.Optional and typing.Optional resulting in error: "_SpecialForm" not callable message reported in issue #365, fixed the import in exceptions.py. Nice sleuthing by Iwan Aucamp and Dominic Davis-Foster, thank you! (Removed definitions of OptionalType, DictType, and IterableType and replaced them with typing.Optional, typing.Dict, and typing.Iterable throughout.)

    • Fixed typo in jinja2 template for railroad diagrams, thanks for the catch Nioub (issue #388).

    • Removed use of deprecated pkg_resources package in railroad diagramming code (issue #391).

    • Updated bigquery_view_parser.py example to parse examples at https://cloud.google.com/bigquery/docs/reference/legacy-sql

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.9-py3-none-any.whl(96.03 KB)
    pyparsing-3.0.9.tar.gz(1.90 MB)
  • pyparsing_3.0.8(Apr 10, 2022)

    Version 3.0.8 -

    • API CHANGE: modified pyproject.toml to require Python version 3.6.8 or later for pyparsing 3.x. Earlier minor versions of 3.6 fail in evaluating the version_info class (implemented using typing.NamedTuple). If you are using an earlier version of Python 3.6, you will need to use pyparsing 2.4.7.

    • Improved pyparsing import time by deferring regex pattern compiles. PR submitted by Anthony Sottile to fix issue #362, thanks!

    • Updated build to use flit, PR by Michał Górny, added BUILDING.md doc and removed old Windows build scripts - nice cleanup work!

    • More type-hinting added for all arithmetic and logical operator methods in ParserElement. PR from Kazantcev Andrey, thank you.

    • Fixed infix_notation's definitions of lpar and rpar, to accept parse expressions such that they do not get suppressed in the parsed results. PR submitted by Philippe Prados, nice work.

    • Fixed bug in railroad diagramming with expressions containing Combine elements. Reported by Jeremy White, thanks!

    • Added show_groups argument to create_diagram to highlight grouped elements with an unlabeled bounding box.

    • Added unicode_denormalizer.py to the examples as a demonstration of how Python's interpreter will accept Unicode characters in identifiers, but normalizes them back to ASCII so that identifiers print and 𝕡𝓻ᵢ𝓃𝘁 and 𝖕𝒓𝗂𝑛ᵗ are all equivalent.

    • Removed imports of deprecated sre_constants module for catching exceptions when compiling regular expressions. PR submitted by Serhiy Storchaka, thank you.
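
    The identifier normalization mentioned in the unicode_denormalizer.py note above can be reproduced with the standard library alone. This sketch does not use the example script itself; it relies on the fact that Python applies NFKC normalization to identifiers.

```python
import unicodedata

# "print" written with styled Unicode letters (double-struck p, bold script r,
# subscript i, script n, sans-serif bold t), given as escapes for portability
fancy = "\U0001d561\U0001d4fb\u1d62\U0001d4c3\U0001d601"

# NFKC folds these compatibility characters back to plain ASCII letters
normalized = unicodedata.normalize("NFKC", fancy)
print(normalized)             # print
print(normalized == "print")  # True
```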

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.8-py3-none-any.whl(96.19 KB)
    pyparsing-3.0.8.tar.gz(1.87 MB)
  • pyparsing_3.0.7(Jan 21, 2022)

    • Fixed bug #345, in which delimitedList changed expressions in place using expr.streamline(). Reported by Kim Gräsman, thanks!

    • Fixed bug #346, when a string of word characters was passed to WordStart or WordEnd instead of just taking the default value. Originally posted as a question by Parag on StackOverflow, good catch!

    • Fixed bug #350, in which White expressions could fail to match due to unintended whitespace-skipping. Reported by Fu Hanxi, thank you!

    • Fixed bug #355, when a QuotedString is defined with characters in its quoteChar string containing regex-significant characters such as ., *, ?, [, ], etc.

    • Fixed bug in ParserElement.run_tests where comments would be displayed using with_line_numbers.

    • Added optional "min" and "max" arguments to delimited_list. PR submitted by Marius, thanks!

    • Added new API change note in whats_new_in_pyparsing_3_0_0, regarding a bug fix in the bool() behavior of ParseResults.

      Prior to pyparsing 3.0.x, the ParseResults class implementation of __bool__ would return False if the ParseResults item list was empty, even if it contained named results. In 3.0.0 and later, ParseResults will return True if either the item list is not empty or if the named results dict is not empty.

      # generate an empty ParseResults by parsing a blank string with
      # a ZeroOrMore
      result = Word(alphas)[...].parse_string("")
      print(result.as_list())
      print(result.as_dict())
      print(bool(result))
      
      # add a results name to the result
      result["name"] = "empty result"
      print(result.as_list())
      print(result.as_dict())
      print(bool(result))
      

      Prints:

      []
      {}
      False
      
      []
      {'name': 'empty result'}
      True
      

      In previous versions, the second call to bool() would return False.

    • Minor enhancement to Word generation of internal regular expression, to emit consecutive characters in range, such as "ab", as "ab", not "a-b".

    • Fixed character ranges for search terms using non-Western characters in booleansearchparser, PR submitted by tc-yu, nice work!

    • Additional type annotations on public methods.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.7-py3-none-any.whl(95.75 KB)
    pyparsing-3.0.7.tar.gz(863.97 KB)
  • pyparsing_3.0.6(Nov 12, 2021)

  • pyparsing_3.0.5(Nov 7, 2021)

    • Added return type annotations for col, line, and lineno.

    • Fixed bug when warn_ungrouped_named_tokens_in_collection warning was raised when assigning a results name to an original_text_for expression. (Issue #110, would raise warning in packaging.)

    • Fixed internal bug where ParserElement.streamline() would not return self if already streamlined.

    • Changed run_tests() output to default to not showing line and column numbers. If line numbering is desired, call with with_line_numbers=True. Also fixed minor bug where separating line was not included after a test failure.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.5-py3-none-any.whl(94.91 KB)
    pyparsing-3.0.5.tar.gz(859.77 KB)
  • pyparsing_3.0.4(Oct 30, 2021)

    • Fixed bug in which Dict classes did not correctly return tokens as nested ParseResults, reported by and fix identified by Bu Sun Kim, many thanks!!!

    • Documented API-changing side-effect of converting ParseResults to use __slots__ to pre-define instance attributes. This means that code written like this (which was allowed in pyparsing 2.4.7):

      result = Word(alphas).parseString("abc")
      result.xyz = 100
      

      now raises this Python exception:

      AttributeError: 'ParseResults' object has no attribute 'xyz'
      

      To add new attribute values to ParseResults object in 3.0.0 and later, you must assign them using indexed notation:

      result["xyz"] = 100
      

      You will still be able to access this new value as an attribute or as an indexed item.

    • Fixed bug in railroad diagramming where the vertical limit would count all expressions in a group, not just those that would create visible railroad elements.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.4-py3-none-any.whl(94.66 KB)
    pyparsing-3.0.4.tar.gz(859.10 KB)
  • pyparsing_3.0.3(Oct 27, 2021)

  • pyparsing_3.0.2(Oct 27, 2021)

    • Reverted change in behavior with LineStart and StringStart, which changed the interpretation of when and how LineStart and StringStart should match when a line starts with spaces. In 3.0.0, the xxxStart expressions were not really treated like expressions in their own right, but as modifiers to the following expression when used like LineStart() + expr, so that if there were whitespace on the line before expr (which would match in versions prior to 3.0.0), the match would fail.

      3.0.0 implemented this by automatically promoting LineStart() + expr to AtLineStart(expr), which broke existing parsers that did not expect expr to necessarily be right at the start of the line, but only be the first token found on the line. This was reported as a regression in Issue #317.

      In 3.0.2, pyparsing reverts to the previous behavior, but will retain the new AtLineStart and AtStringStart expression classes, so that parsers can choose whichever behavior applies in their specific instance. Specifically:

      # matches expr if it is the first token on the line (allows for leading whitespace)
      LineStart() + expr
      
      # matches only if expr is found in column 1
      AtLineStart(expr)
      
    • Performance enhancement to one_of to always generate an internal Regex, even if caseless or as_keyword args are given as True (unless explicitly disabled by passing use_regex=False).

    • IndentedBlock class now works with recursive flag. By default, the results parsed by an IndentedBlock are grouped. This can be disabled by constructing the IndentedBlock with grouped=False.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.2-py3-none-any.whl(94.42 KB)
    pyparsing-3.0.2.tar.gz(857.57 KB)
  • pyparsing_3.0.1(Oct 24, 2021)

  • pyparsing_3.0.0(Oct 23, 2021)

    Version 3.0.0 -

    • A consolidated list of all the changes in the 3.0.0 release can be found in docs/whats_new_in_3_0_0.rst. (https://github.com/pyparsing/pyparsing/blob/master/docs/whats_new_in_3_0_0.rst)

    Version 3.0.0.final -

    • Added support for python -W warning option to call enable_all_warnings() at startup. Also detects setting of PYPARSINGENABLEALLWARNINGS environment variable to any non-blank value.

    • Fixed named results returned by url to match fields as they would be parsed using urllib.parse.urlparse.

    • Early response to with_line_numbers was positive, with some requested enhancements:

      . added a trailing "|" at the end of each line (to show presence of trailing spaces); can be customized using eol_mark argument
      . added expand_tabs argument, to control calling str.expandtabs (defaults to True to match parseString)
      . added mark_spaces argument to support display of a printing character in place of spaces, or Unicode symbols for space and tab characters
      . added mark_control argument to support highlighting of control characters using '.' or Unicode symbols, such as "␍" and "␊".

    • Modified helpers common_html_entity and replace_html_entity() to use the HTML entity definitions from html.entities.html5.

    • Updated the class diagram in the pyparsing docs directory, along with the supporting .puml file (PlantUML markup) used to create the diagram.

    • Added global method autoname_elements() to call set_name() on all locally defined ParserElements that haven't been explicitly named using set_name(), using their local variable name. Useful for setting names on multiple elements when creating a railroad diagram.

            a = pp.Literal("a")
            b = pp.Literal("b").set_name("bbb")
            pp.autoname_elements()
      

      a will get named "a", while b will keep its name "bbb".

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0-py3-none-any.whl(93.70 KB)
    pyparsing-3.0.0.tar.gz(855.07 KB)
  • pyparsing_3.0.0rc2(Oct 2, 2021)

    • Added url expression to pyparsing_common. (Sample code posted by Wolfgang Fahl, very nice!)

      This new expression has been added to the urlExtractorNew.py example, to show how it extracts URL fields into separate results names.

    • Added method to pyparsing_testing to help debugging, with_line_numbers. Returns a string with line and column numbers corresponding to values shown when parsing with expr.set_debug():

      import pyparsing as pp
      ppt = pp.testing

      data = """\
         A
            100"""
      expr = pp.Word(pp.alphanums).set_name("word").set_debug()
      print(ppt.with_line_numbers(data))
      expr[...].parseString(data)
      

      prints:

                    1
           1234567890
         1:   A
         2:      100
        Match word at loc 3(1,4)
             A
             ^
        Matched word -> ['A']
        Match word at loc 11(2,7)
                100
                ^
        Matched word -> ['100']
      
    • Added new example cuneiform_python.py to demonstrate creating a new Unicode range, and writing a Cuneiform->Python transformer (inspired by zhpy).

    • Fixed issue #272, reported by PhasecoreX, when LineStart() expressions would match expressions that were not necessarily at the beginning of a line.

      As part of this fix, two new classes have been added: AtLineStart and AtStringStart. The following expressions are equivalent:

      LineStart() + expr      and     AtLineStart(expr)
      StringStart() + expr    and     AtStringStart(expr)
      
    • Fixed ParseFatalExceptions failing to override normal exceptions or expression matches in MatchFirst expressions. Addresses issue #251, reported by zyp-rgb.

    • Fixed bug in which ParseResults replaces a collection type value with an invalid type annotation (changed behavior in Python 3.9). Addresses issue #276, reported by Rob Shuler, thanks.

    • Fixed bug in ParseResults when calling __getattr__ for special double-underscored methods. Now raises AttributeError for non-existent results when accessing a name starting with '__'. Addresses issue #208, reported by Joachim Metz.

    • Modified debug fail messages to include the expression name to make it easier to sync up match vs success/fail debug messages.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0rc2-py3-none-any.whl(92.04 KB)
    pyparsing-3.0.0rc2.tar.gz(721.85 KB)
  • pyparsing_3.0.0rc1(Sep 9, 2021)

    • Railroad diagrams have been reformatted:

      • creating diagrams is easier - call

          expr.create_diagram("diagram_output.html")

        create_diagram() takes 3 arguments:

        • the filename to write the diagram HTML
        • optional 'vertical' argument, to specify the minimum number of items in a path to be shown vertically; default=3
        • optional 'show_results_names' argument, to specify whether results name annotations should be shown; default=False

      • every expression that gets a name using setName() gets separated out as a separate subdiagram
      • results names can be shown as annotations to diagram items
      • Each, FollowedBy, and PrecededBy elements get [ALL], [LOOKAHEAD], and [LOOKBEHIND] annotations
      • removed annotations for Suppress elements
      • some diagram cleanup when a grammar contains Forward elements
      • check out the examples make_diagram.py and railroad_diagram_demo.py

    • Type annotations have been added to most public API methods and classes.

    • Better exception messages to show full word where an exception occurred.

      Word(alphas)[...].parseString("abc 123", parseAll=True)
      

      Was:

      pyparsing.ParseException: Expected end of text, found '1'  (at char 4), (line:1, col:5)
      

      Now:

      pyparsing.exceptions.ParseException: Expected end of text, found '123'  (at char 4), (line:1, col:5)
      
    • Suppress can be used to suppress text skipped using "...".

      source = "lead in START relevant text END trailing text"
      start_marker = Keyword("START")
      end_marker = Keyword("END")
      find_body = Suppress(...) + start_marker + ... + end_marker
      print(find_body.parseString(source).dump())
      

      Prints:

      ['START', 'relevant text ', 'END']
      - _skipped: ['relevant text ']
      
    • New string constants identchars and identbodychars to help in defining identifier Word expressions

      Two new module-level strings have been added to help when defining identifiers, identchars and identbodychars.

      Instead of writing:

      import pyparsing as pp
      identifier = pp.Word(pp.alphas + "_", pp.alphanums + "_")
      

      you will be able to write:

      identifier = pp.Word(pp.identchars, pp.identbodychars)
      

      Those constants have also been added to all the Unicode string classes:

      import pyparsing as pp
      ppu = pp.pyparsing_unicode
      
      cjk_identifier = pp.Word(ppu.CJK.identchars, ppu.CJK.identbodychars)
      greek_identifier = pp.Word(ppu.Greek.identchars, ppu.Greek.identbodychars)
      
    • Added a caseless parameter to the CloseMatch class to allow for casing to be ignored when checking for close matches. (Issue #281) (PR by Adrian Edwards, thanks!)
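
      A short sketch of the caseless option, using a hypothetical target string. The "mismatches" results name holds the positions that differ from the target:

```python
import pyparsing as pp

# Hypothetical target string, for illustration only.
target = pp.CloseMatch("ATCATCGAATGGA", caseless=True)

# Same letters in a different case: no mismatches are reported.
exact = target.parse_string("atcatcgaatgga")

# One differing character is tolerated (the default allows 1 mismatch).
near = target.parse_string("atcatcgaaxgga")

print(exact["mismatches"])
print(near["mismatches"])
```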

    • Fixed bug in Located class when used with a results name. (Issue #294)

    • Fixed bug in QuotedString class when the escaped quote string is not a repeated character. (Issue #263)

    • parseFile() and create_diagram() methods now will accept pathlib.Path arguments.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0rc1-py3-none-any.whl(90.17 KB)
    pyparsing-3.0.0rc1.tar.gz(715.14 KB)
  • pyparsing_3.0.0b3(Aug 8, 2021)

    • PEP-8 compatible names are being introduced in pyparsing version 3.0! All methods such as parseString have been replaced with the PEP-8 compliant name parse_string. In addition, arguments such as parseAll have been renamed to parse_all. For backward-compatibility, synonyms for all renamed methods and arguments have been added, so that existing pyparsing parsers will not break. These synonyms will be removed in a future release.

      In addition, the Optional class has been renamed to Opt, since it clashes with the common typing.Optional type specifier that is used in the Python type annotations. A compatibility synonym is defined for now, but will be removed in a future release.
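
      As an illustration (a sketch, not taken from the release itself), the same hypothetical grammar can be driven through either naming style while the compatibility synonyms remain available:

```python
import pyparsing as pp

# Opt is the new name for Optional; the comma is optional in this grammar.
greeting = pp.Word(pp.alphas) + pp.Opt(",") + pp.Word(pp.alphas)

# PEP-8 style method and argument names.
new_style = greeting.parse_string("Hello, World", parse_all=True)

# Pre-3.0 camelCase synonyms, kept for backward compatibility for now.
old_style = greeting.parseString("Hello, World", parseAll=True)

print(new_style.as_list())
print(old_style.as_list())
```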

    • HUGE NEW FEATURE - Support for left-recursive parsers! Following the method used in Python's PEG parser, pyparsing now supports left-recursive parsers when left recursion is enabled.

        import pyparsing as pp
        pp.ParserElement.enable_left_recursion()
      
        # a common left-recursion definition
        # define a list of items as 'list + item | item'
        # BNF:
        #   item_list := item_list item | item
        #   item := word of alphas
        item_list = pp.Forward()
        item = pp.Word(pp.alphas)
        item_list <<= item_list + item | item
      
        item_list.run_tests("""\
            To parse or not to parse that is the question
            """)
      

      Prints:

        ['To', 'parse', 'or', 'not', 'to', 'parse', 'that', 'is', 'the', 'question']
      

      Great work contributed by Max Fischer!

    • delimited_list now supports an additional flag allow_trailing_delim, to optionally parse an additional delimiter at the end of the list. Contributed by Kazantcev Andrey, thanks!
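
      A small sketch of the difference the flag makes, using a hypothetical list of integers:

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
strict = pp.delimited_list(integer)
lenient = pp.delimited_list(integer, allow_trailing_delim=True)

# With the flag set, the trailing comma is consumed as part of the list.
lenient_items = lenient.parse_string("1, 2, 3,", parse_all=True).as_list()
print(lenient_items)

# Without it, the trailing comma is left over and parse_all=True fails.
strict_rejected = False
try:
    strict.parse_string("1, 2, 3,", parse_all=True)
except pp.ParseException as err:
    strict_rejected = True
    print("strict list rejected the trailing comma:", err)
```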

    • Removed internal comparison of results values against b"", which raised a BytesWarning when run with python -bb. Fixes issue #271 reported by Florian Bruhin, thank you!

    • Fixed STUDENTS table in sql2dot.py example, fixes issue #261 reported by legrandlegrand - much better.

    • Python 3.5 will not be supported in the pyparsing 3 releases. This will allow for future pyparsing releases to add parameter type annotations, and to take advantage of dict key ordering in internal results name tracking.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0b3-py3-none-any.whl(86.60 KB)
    pyparsing-3.0.0b3.tar.gz(705.93 KB)
  • pyparsing_3.0.0b2(Dec 30, 2020)

  • pyparsing_3.0.0b1(Nov 3, 2020)

    • API CHANGE Diagnostic flags have been moved to an enum, pyparsing.Diagnostics, and they are enabled through module-level methods:

      • pyparsing.enable_diag()
      • pyparsing.disable_diag()
      • pyparsing.enable_all_warnings()
    • API CHANGE Most previous SyntaxWarnings that were warned when using pyparsing classes incorrectly have been converted to TypeError and ValueError exceptions, consistent with Python calling conventions. All warnings warned by diagnostic flags have been converted from SyntaxWarnings to UserWarnings.

    • To support parsers that are intended to generate native Python collection types such as lists and dicts, the Group and Dict classes now accept an additional boolean keyword argument aslist and asdict respectively. See the jsonParser.py example in the pyparsing/examples source directory for how to return types as ParseResults and as Python collection types, and the distinctions in working with the different types.

      In addition parse actions that must return a value of list type (which would normally be converted internally to a ParseResults) can override this default behavior by returning their list wrapped in the new ParseResults.List class:

      # this parse action tries to return a list, but pyparsing
      # will convert to a ParseResults
      def return_as_list_but_still_get_parse_results(tokens):
          return tokens.asList()
      
      # this parse action returns the tokens as a list, and pyparsing will
      # maintain its list type in the final parsing results
      def return_as_list(tokens):
          return ParseResults.List(tokens.asList())
      

      This is the mechanism used internally by the Group class when defined using aslist=True.
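
      To make the distinction concrete, here is a sketch with a hypothetical key=value grammar, comparing a default Group against one created with aslist=True:

```python
import pyparsing as pp

key = pp.Word(pp.alphas)
value = pp.Word(pp.nums)
pair = key + pp.Suppress("=") + value

# Default: each group is returned as a nested ParseResults.
as_results = pp.Group(pair)[1, ...].parse_string("a=1 b=2")

# aslist=True: each group is returned as a plain Python list.
as_lists = pp.Group(pair, aslist=True)[1, ...].parse_string("a=1 b=2")

print(type(as_results[0]).__name__)
print(type(as_lists[0]).__name__)
```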

    • A new IndentedBlock class is introduced, to eventually replace the current indentedBlock helper method. The interface is largely the same, however, the new class manages its own internal indentation stack, so it is no longer necessary to maintain an external indentStack variable.

    • API CHANGE Added cache_hit keyword argument to debug actions. Previously, if packrat parsing was enabled, the debug methods were not called in the event of cache hits. Now these methods will be called, with an added argument cache_hit=True.

      If you are using packrat parsing and enable debug on expressions using a custom debug method, you can add the cache_hit=False keyword argument, and your method will be called on packrat cache hits. If you choose not to add this keyword argument, the debug methods will fail silently, behaving as they did previously.
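
      A sketch of custom debug actions that accept the new keyword argument. The function and parameter names are hypothetical; only the cache_hit keyword comes from the note above. Packrat parsing is not enabled here, so every call is recorded with cache_hit=False:

```python
import pyparsing as pp

events = []

# Each debug action takes an optional cache_hit argument so that it can
# also be called for packrat cache hits.
def on_try(instring, loc, expr, cache_hit=False):
    events.append(("try", expr.name, cache_hit))

def on_match(instring, start, end, expr, tokens, cache_hit=False):
    events.append(("match", expr.name, cache_hit))

def on_fail(instring, loc, expr, exc, cache_hit=False):
    events.append(("fail", expr.name, cache_hit))

integer = pp.Word(pp.nums).set_name("integer")
integer.set_debug_actions(on_try, on_match, on_fail)

integer.parse_string("123")
print(events)
```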

    • When using setDebug with packrat parsing enabled, packrat cache hits will now be included in the output, shown with a leading '*'. (Previously, cache hits and responses were not included in debug output.) For those using custom debug actions, see the previous item regarding an optional API change for those methods.

    • setDebug output will also show more details about what expression is about to be parsed (the current line of text being parsed, and the current parse position):

        Match integer at loc 0(1,1)
          1 2 3
          ^
        Matched integer -> ['1']
      

      The current debug location will also be indicated after whitespace has been skipped (was previously inconsistent, reported in Issue #244, by Frank Goyens, thanks!).

    • Modified the repr() output for ParseResults to include the class name as part of the output. This is to clarify for new pyparsing users who misread the repr output as a tuple of a list and a dict. pyparsing results will now read like:

      ParseResults(['abc', 'def'], {'qty': 100})
      

      instead of just:

      (['abc', 'def'], {'qty': 100})
      
    • Fixed bugs in Each when passed OneOrMore or ZeroOrMore expressions:

      • first expression match could be enclosed in an extra nesting level
      • out-of-order expressions now handled correctly if mixed with required expressions
      • results names are maintained correctly for these expressions

    • Fixed traceback trimming, and added ParserElement.verbose_traceback save/restore to reset_pyparsing_context().

    • Default strings for Word expressions now also include indications of min and max length specifications, if applicable, similar to regex length specifications:

        Word(alphas)             -> "W:(A-Za-z)"
        Word(nums)               -> "W:(0-9)"
        Word(nums, exact=3)      -> "W:(0-9){3}"
        Word(nums, min=2)        -> "W:(0-9){2,...}"
        Word(nums, max=3)        -> "W:(0-9){1,3}"
        Word(nums, min=2, max=3) -> "W:(0-9){2,3}"
      

      For expressions of the Char class (similar to Word(..., exact=1)), the expression is simply the character range in parentheses:

        Char(nums)               -> "(0-9)"
        Char(alphas)             -> "(A-Za-z)"
      
    • Removed copy() override in Keyword class which did not preserve definition of ident chars from the original expression. PR #233 submitted by jgrey4296, thanks!

    • In addition to pyparsing.__version__, there is now also a pyparsing.__version_info__, following the same structure and field names as in sys.version_info.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0b1-py3-none-any.whl(81.64 KB)
    pyparsing-3.0.0b1.tar.gz(1014.21 KB)
  • pyparsing_3.0.0a2(Jun 28, 2020)

    Version 3.0.0a2 - June, 2020

    • Summary of changes for 3.0.0 can be found in "What's New in Pyparsing 3.0.0" documentation.

    • API CHANGE Changed result returned when parsing using countedArray, the array items are no longer returned in a doubly-nested list.

    • An excellent new enhancement is the new railroad diagram generator for documenting pyparsing parsers:

        import pyparsing as pp
        from pyparsing.diagram import to_railroad, railroad_to_html
        from pathlib import Path
      
        # define a simple grammar for parsing street addresses such
        # as "123 Main Street"
        #     number word...
        number = pp.Word(pp.nums).setName("number")
        name = pp.Word(pp.alphas).setName("word")[1, ...]
      
        parser = number("house_number") + name("street")
        parser.setName("street address")
      
        # construct railroad track diagram for this parser and
        # save as HTML
        rr = to_railroad(parser)
        Path('parser_rr_diag.html').write_text(railroad_to_html(rr))
      

      Very nice work provided by Michael Milton, thanks a ton!

    • Enhanced default strings created for Word expressions, now showing string ranges if possible. Word(alphas) would formerly print as W:(ABCD...), now prints as W:(A-Za-z).

    • Added ignoreWhitespace(recurse:bool = True) and added a recurse argument to leaveWhitespace, both added to provide finer control over pyparsing's whitespace skipping. Also contributed by Michael Milton.

    • The unicode range definitions for the various languages were recalculated by interrogating the unicodedata module by character name, selecting characters that contained that language in their Unicode name. (Issue #227)

      Also, pyparsing_unicode.Korean was renamed to Hangul (Korean is also defined as a synonym for compatibility).

    • Enhanced ParseResults dump() to show both results names and list subitems. Fixes bug where adding a results name would hide lower-level structures in the ParseResults.

    • Added new __diag__ warnings:

      "warn_on_parse_using_empty_Forward" - warns that a Forward has been included in a grammar, but no expression was attached to it using '<<=' or '<<'

      "warn_on_assignment_to_Forward" - warns that a Forward has been created, but was probably later overwritten by erroneously using '=' instead of '<<=' (this is a common mistake when using Forwards) (currently not working on PyPy)

    • Added ParserElement.recurse() method to make it simpler for grammar utilities to navigate through the tree of expressions in a pyparsing grammar.
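
      A sketch of how a grammar utility might use recurse() to walk an expression tree. The walk() helper and the small grammar are hypothetical, and the helper assumes a grammar without recursive Forward definitions:

```python
import pyparsing as pp

def walk(expr, depth=0):
    """Yield (depth, element) pairs for expr and all of its sub-expressions.

    This naive walk does not track visited elements, so it would not
    terminate on a recursive grammar built with Forward.
    """
    yield depth, expr
    for sub_expr in expr.recurse():
        yield from walk(sub_expr, depth + 1)

name = pp.Word(pp.alphas).set_name("name")
number = pp.Word(pp.nums).set_name("number")
assignment = pp.Group(name + pp.Suppress("=") + number)

# Print the class of each element, indented by its depth in the tree.
for depth, element in walk(assignment):
    print("  " * depth + type(element).__name__)

# Collect the names of the Word leaves, in grammar order.
leaf_names = [el.name for _, el in walk(assignment) if isinstance(el, pp.Word)]
print(leaf_names)
```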

    • Fixed bug in ParseResults repr() which showed all matching entries for a results name, even if listAllMatches was set to False when creating the ParseResults originally. Reported by Nicholas42 on GitHub, good catch! (Issue #205)

    • Modified refactored modules to use relative imports, as pointed out by setuptools project member jaraco, thank you!

    • Off-by-one bug found in the roman_numerals.py example, a bug that has been there for about 14 years! PR submitted by Jay Pedersen, nice catch!

    • A simplified Lua parser has been added to the examples (lua_parser.py).

    • Added make_diagram.py to the examples directory to demonstrate creation of railroad diagrams for selected pyparsing examples. Also restructured some examples to make their parsers importable without running their embedded tests.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-3.0.0a2-py3-none-any.whl(79.11 KB)
    pyparsing-3.0.0a2.tar.gz(762.95 KB)
  • pyparsing_2.4.7(Apr 5, 2020)

    Version 2.4.7 - April, 2020

    • Backport of selected fixes from 3.0.0 work:

      • Each bug with Regex expressions
      • And expressions not properly constructing with generator
      • Traceback abbreviation
      • Bug in delta_time example
      • Fix regexen in pyparsing_common.real and .sci_real
      • Avoid FutureWarning on Python 3.7 or later
      • Cleanup output in runTests if comments are embedded in test string

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.7-py2.py3-none-any.whl(66.25 KB)
    pyparsing-2.4.7.tar.gz(634.49 KB)
  • pyparsing_2.4.6(Dec 25, 2019)

    Version 2.4.6 - December, 2019

    • Fixed typos in White mapping of whitespace characters, to use correct "\u" prefix instead of "u".

    • Fix bug in left-associative ternary operators defined using infixNotation. First reported on StackOverflow by user Jeronimo.

    • Backport of pyparsing_test namespace from 3.0.0, including TestParseResultsAsserts mixin class defining unittest-helper methods:

      • def assertParseResultsEquals(self, result, expected_list=None, expected_dict=None, msg=None)
      • def assertParseAndCheckList(self, expr, test_string, expected_list, msg=None, verbose=True)
      • def assertParseAndCheckDict(self, expr, test_string, expected_dict, msg=None, verbose=True)
      • def assertRunTestResults(self, run_tests_report, expected_parse_results=None, msg=None)
      • def assertRaisesParseException(self, exc_type=ParseException, msg=None)

      To use the methods in this mixin class, declare your unittest classes as:

      import unittest
      from pyparsing import pyparsing_test as ppt

      class MyParserTest(ppt.TestParseResultsAsserts, unittest.TestCase):
          ...

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.6-py2.py3-none-any.whl(66.14 KB)
    pyparsing-2.4.6.tar.gz(633.96 KB)
  • pyparsing_2.4.5(Nov 10, 2019)

  • pyparsing_2.4.4(Nov 5, 2019)

  • pyparsing_2.4.3(Nov 4, 2019)

    Version 2.4.3 - November, 2019

    (Backport of selected critical items from 3.0.0 development branch.)

    • Fixed a bug in ParserElement.__eq__ that would for some parsers create a recursion error at parser definition time. Thanks to Michael Clerx for the assist. (Addresses issue #123)

    • Fixed bug in indentedBlock where a block that ended at the end of the input string could cause pyparsing to loop forever. Raised as part of discussion on StackOverflow with geckos.

    • Backports from pyparsing 3.0.0:

      • __diag__.enable_all_warnings()
      • Fixed bug in PrecededBy which caused infinite recursion, issue #127
      • support for using regex-compiled RE to construct Regex expressions

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.3-py2.py3-none-any.whl(66.40 KB)
    pyparsing-2.4.3.tar.gz(629.40 KB)
  • pyparsing_2.4.2(Jul 30, 2019)

    Version 2.4.2 - July, 2019

    • Updated the shorthand notation that has been added for repetition expressions: expr[min, max], with '...' valid as a min or max value:

      • expr[...] and expr[0, ...] are equivalent to ZeroOrMore(expr)
      • expr[1, ...] is equivalent to OneOrMore(expr)
      • expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as "n or more instances of expr")
      • expr[..., n] is equivalent to expr*(0, n)
      • expr[m, n] is equivalent to expr*(m, n)

      Note that expr[..., n] and expr[m, n] do not raise an exception if more than n exprs exist in the input stream. If this behavior is desired, then write expr[..., n] + ~expr.

      Better interpretation of [...] as ZeroOrMore raised by crowsonkb, thanks for keeping me in line!

      If upgrading from 2.4.1 or 2.4.1.1 and you have used expr[...] for OneOrMore(expr), it must be updated to expr[1, ...].
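
      The equivalences above can be tried out with a sketch like the following. It is written with the PEP-8 method names from pyparsing 3 (under 2.4.x the camelCase equivalents such as parseString apply), and the sample inputs are hypothetical:

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

zero_or_more = word[...]      # same as ZeroOrMore(word)
one_or_more = word[1, ...]    # same as OneOrMore(word)
at_most_two = word[..., 2]    # same as word*(0, 2)

empty_ok = zero_or_more.parse_string("").as_list()
all_words = one_or_more.parse_string("a b c").as_list()

# Extra input beyond the maximum is simply left unparsed.
first_two = at_most_two.parse_string("a b c").as_list()
print(empty_ok, all_words, first_two)

# Adding a negative lookahead rejects input with more than two words.
strict_two = word[..., 2] + ~word
too_many_rejected = False
try:
    strict_two.parse_string("a b c")
except pp.ParseException:
    too_many_rejected = True
print(too_many_rejected)
```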

    • The defaults on all the __diag__ switches have been set to False, to avoid getting alarming warnings. To use these diagnostics, set them to True after importing pyparsing.

      Example:

      import pyparsing as pp
      pp.__diag__.warn_multiple_tokens_in_named_alternation = True
      
    • Fixed bug introduced by the use of __getitem__ for repetition, overlooking Python's legacy implementation of iteration by sequentially calling __getitem__ with increasing numbers until getting an IndexError. Found during investigation of problem reported by murlock, merci!

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.2-py2.py3-none-any.whl(63.91 KB)
    pyparsing-2.4.2.tar.gz(627.79 KB)
  • pyparsing_2.4.1.1(Jul 25, 2019)

    This is a re-release of version 2.4.1 to restore the release history in PyPI, since the 2.4.1 release was deleted.

    There are 3 known issues in this release, which are fixed in the upcoming 2.4.2:

    • API change adding support for expr[...] - the original code in 2.4.1 incorrectly implemented this as OneOrMore. Code using this feature under this release should explicitly use expr[0, ...] for ZeroOrMore and expr[1, ...] for OneOrMore. In 2.4.2 you will be able to write expr[...] as equivalent to ZeroOrMore(expr).

    • Bug if composing And, Or, MatchFirst, or Each expressions using a single expression instead of a list. This only affects code which uses explicit expression construction using the And, Or, etc. classes instead of using overloaded operators '+', '^', and so on. If constructing an And using a single expression, you may get an error that "cannot multiply ParserElement by 0 or (0, 0)" or a Python IndexError. Change code like

      cmd = Or(Word(alphas))
      

      to

      cmd = Or([Word(alphas)])
      

      (Note that this is not the recommended style for constructing Or expressions.)

    • Some newly-added __diag__ switches are enabled by default, which may give rise to noisy user warnings for existing parsers. You can disable them using:

      import pyparsing as pp
      pp.__diag__.warn_multiple_tokens_in_named_alternation = False
      pp.__diag__.warn_ungrouped_named_tokens_in_collection = False
      pp.__diag__.warn_name_set_on_empty_Forward = False
      pp.__diag__.warn_on_multiple_string_args_to_oneof = False
      pp.__diag__.enable_debug_on_named_expressions = False
      

      In 2.4.2 these will all be set to False by default.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.1.1-py2.py3-none-any.whl(62.82 KB)
    pyparsing-2.4.1.1.tar.gz(611.66 KB)
  • pyparsing_2.4.2a1(Jul 25, 2019)

  • pyparsing_2.4.1(Jul 21, 2019)

    For a minor point release, this release contains many new features!

    • A new shorthand notation has been added for repetition expressions: expr[min, max], with ... valid as a min or max value:

      • expr[...] is equivalent to OneOrMore(expr)
      • expr[0, ...] is equivalent to ZeroOrMore(expr)
      • expr[1, ...] is equivalent to OneOrMore(expr)
      • expr[n, ...] or expr[n,] is equivalent to expr*n + ZeroOrMore(expr) (read as "n or more instances of expr")
      • expr[..., n] is equivalent to expr*(0, n)
      • expr[m, n] is equivalent to expr*(m, n)

      Note that expr[..., n] and expr[m, n] do not raise an exception if more than n exprs exist in the input stream. If this behavior is desired, then write expr[..., n] + ~expr.

    • ... can also be used as short hand for SkipTo when used in adding parse expressions to compose an And expression.

      Literal('start') + ... + Literal('end')
      And(['start', ..., 'end'])
      

      are both equivalent to:

      Literal('start') + SkipTo('end')("_skipped*") + Literal('end')
      

      The ... form has the added benefit of not requiring repeating the skip target expression. Note that the skipped text is returned with '_skipped' as a results name, and that the contents of _skipped will contain a list of text from all ...s in the expression.

    • ... can also be used as a "skip forward in case of error" expression:

        expr = "start" + (Word(nums).setName("int") | ...) + "end"
      
        expr.parseString("start 456 end")
        ['start', '456', 'end']
      
        expr.parseString("start 456 foo 789 end")
        ['start', '456', 'foo 789 ', 'end']
        - _skipped: ['foo 789 ']
      
        expr.parseString("start foo end")
        ['start', 'foo ', 'end']
        - _skipped: ['foo ']
      
        expr.parseString("start end")
        ['start', '', 'end']
        - _skipped: ['missing <int>']
      

      Note that in all the error cases, the '_skipped' results name is present, showing a list of the extra or missing items.

      This form is only valid when used with the '|' operator.

    • Improved exception messages to show what was actually found, not just what was expected.

        word = pp.Word(pp.alphas)
        pp.OneOrMore(word).parseString("aaa bbb 123", parseAll=True)
      

      Former exception message:

        pyparsing.ParseException: Expected end of text (at char 8), (line:1, col:9)
      

      New exception message:

        pyparsing.ParseException: Expected end of text, found '1' (at char 8), (line:1, col:9)
      
    • Added diagnostic switches to help detect and warn about common parser construction mistakes, or enable additional parse debugging. Switches are attached to the pyparsing.__diag__ namespace object:

      • warn_multiple_tokens_in_named_alternation - flag to enable warnings when a results name is defined on a MatchFirst or Or expression with one or more And subexpressions (default=True)
      • warn_ungrouped_named_tokens_in_collection - flag to enable warnings when a results name is defined on a containing expression with ungrouped subexpressions that also have results names (default=True)
      • warn_name_set_on_empty_Forward - flag to enable warnings when a Forward is defined with a results name, but has no contents defined (default=False)
      • warn_on_multiple_string_args_to_oneof - flag to enable warnings when oneOf is incorrectly called with multiple str arguments (default=True)
      • enable_debug_on_named_expressions - flag to auto-enable debug on all subsequent calls to ParserElement.setName() (default=False)

      warn_multiple_tokens_in_named_alternation is intended to help those who currently have set __compat__.collect_all_And_tokens to False as a workaround for using the pre-2.3.1 code with named MatchFirst or Or expressions containing an And expression.

    • Added ParseResults.from_dict classmethod, to simplify creation of a ParseResults with results names using a dict, which may be nested. This makes it easy to add a sub-level of named items to the parsed tokens in a parse action.
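
      A minimal sketch of building named results from a nested dict. The data is hypothetical, and the PEP-8 method names from pyparsing 3 are assumed:

```python
import pyparsing as pp

# Hypothetical nested data, for illustration only.
extra = pp.ParseResults.from_dict({
    "units": "mm",
    "size": {"width": 10, "height": 20},
})

# Top-level and nested keys become results names.
print(extra.units)
print(extra.size.width, extra.size.height)
print(extra.as_dict())
```

      In a parse action, a ParseResults built this way could be added to the parsed tokens to attach a sub-level of named items.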

    • Added asKeyword argument (default=False) to oneOf, to force keyword-style matching on the generated expressions.
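
      The effect can be seen in a sketch like this one, written with the pyparsing 3 names one_of and as_keyword (oneOf and asKeyword under 2.4.x); the sample text is hypothetical:

```python
import pyparsing as pp

text = "if x in index"

plain = pp.one_of("if in")
keyword = pp.one_of("if in", as_keyword=True)

# Plain matching also finds the "in" at the start of "index".
plain_hits = [tokens[0] for tokens in plain.search_string(text)]

# Keyword-style matching only accepts whole words.
keyword_hits = [tokens[0] for tokens in keyword.search_string(text)]

print(plain_hits)
print(keyword_hits)
```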

    • ParserElement.runTests now accepts an optional 'file' argument to redirect test output to a file-like object (such as a StringIO, or opened file). Default is to write to sys.stdout.
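
      For instance, the test report can be captured in a StringIO instead of being printed. This is a sketch using the pyparsing 3 name run_tests and a hypothetical one-line test:

```python
import io
import pyparsing as pp

integer = pp.Word(pp.nums).set_name("integer")

# Collect the report in memory rather than writing to sys.stdout.
report = io.StringIO()
success, results = integer.run_tests(
    """
    # a valid integer
    42
    """,
    file=report,
)

print(success)
print(report.getvalue())
```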

    • conditionAsParseAction is a helper method for constructing a parse action method from a predicate function that simply returns a boolean result. Useful for those places where a predicate cannot be added using addCondition, but must be converted to a parse action (such as in infixNotation). May be used as a decorator if default message and exception types can be used. See ParserElement.addCondition for more details about the expected signature and behavior for predicate condition methods.
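
      A sketch of wrapping a boolean predicate as a parse action, using the pyparsing 3 name condition_as_parse_action. The is_even predicate and its message are hypothetical:

```python
import pyparsing as pp

def is_even(tokens):
    """Predicate: True if the parsed integer is even."""
    return int(tokens[0]) % 2 == 0

# Convert the predicate into a parse action that raises ParseException
# with the given message when the predicate returns False.
even_check = pp.condition_as_parse_action(is_even, message="expected an even number")
even_number = pp.Word(pp.nums).add_parse_action(even_check)

accepted = even_number.parse_string("42").as_list()
print(accepted)

odd_rejected = False
try:
    even_number.parse_string("7")
except pp.ParseException as err:
    odd_rejected = True
    print("rejected:", err)
```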

    • While investigating issue #93, I found that Or and addCondition could interact to select an alternative that is not the longest match. This is because Or first checks all alternatives for matches without running attached parse actions or conditions, orders by longest match, and then rechecks for matches with conditions and parse actions. Some expressions, when checking with conditions, may end up matching on a shorter token list than originally matched, but would be selected because of its original priority. This matching code has been expanded to do more extensive searching for matches when a second-pass check matches a smaller list than in the first pass.

    • Fixed issue #87, a regression in indented block. Reported by Renz Bagaporo, who submitted a very nice repro example, which makes the bug-fixing process a lot easier, thanks!

    • Fixed MemoryError issue #85 and #91 with str generation for Forwards. Thanks decalage2 and Harmon758 for your patience.

    • Modified setParseAction to accept None as an argument, indicating that all previously-defined parse actions for the expression should be cleared.
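
      A brief sketch of clearing parse actions this way, written with the pyparsing 3 name set_parse_action:

```python
import pyparsing as pp

# The parse action converts the matched digits to an int.
integer = pp.Word(pp.nums).set_parse_action(lambda tokens: int(tokens[0]))
converted = integer.parse_string("42")[0]

# Passing None clears all previously defined parse actions,
# so the raw matched string is returned again.
integer.set_parse_action(None)
raw = integer.parse_string("42")[0]

print(repr(converted), repr(raw))
```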

    • Modified pyparsing_common.real and sci_real to parse reals without leading integer digits before the decimal point, consistent with Python real number formats. Original PR #98 submitted by ansobolev.

    • Modified runTests to call postParse function before dumping out the parsed results - allows for postParse to add further results, such as indications of additional validation success/failure.

    • Updated statemachine example: refactored state transitions to use overridden classmethods; added <statename>Mixin class to simplify definition of application classes that "own" the state object and delegate to it to model state-specific properties and behavior.

    • Added example nested_markup.py, showing a simple wiki markup with nested markup directives, and illustrating the use of ... for skipping over input to match the next expression. (This example uses syntax that is not valid under Python 2.)

    • Rewrote delta_time.py example (renamed from deltaTime.py) to fix some omitted formats and upgrade to latest pyparsing idioms, beginning with writing an actual BNF.

    • With the help and encouragement from several contributors, including Matej Cepl and Cengiz Kaygusuz, I've started cleaning up the internal coding styles in core pyparsing, bringing it up to modern coding practices from pyparsing's early development days dating back to 2003. Whitespace has been largely standardized along PEP8 guidelines, removing extra spaces around parentheses, and adding them around arithmetic operators and after colons and commas. I was going to hold off on doing this work until after 2.4.1, but after cleaning up a few trial classes, the difference was so significant that I continued on to the rest of the core code base. This should facilitate future work and submitted PRs, allowing them to focus on substantive code changes, and not get sidetracked by whitespace issues.

    • NOTE: Deprecated functions and features that will be dropped in pyparsing 2.5.0 (planned next release):

      • support for Python 2 - ongoing users running with Python 2 can continue to use pyparsing 2.4.1

      • ParseResults.asXML() - if used for debugging, switch to using ParseResults.dump(); if used for data transfer, use ParseResults.asDict() to convert to a nested Python dict, which can then be converted to XML or JSON or other transfer format

      • operatorPrecedence synonym for infixNotation - convert to calling infixNotation

      • commaSeparatedList - convert to using pyparsing_common.comma_separated_list

      • upcaseTokens and downcaseTokens - convert to using pyparsing_common.upcaseTokens and downcaseTokens

      • __compat__.collect_all_And_tokens will not be settable to False to revert to pre-2.3.1 results name behavior - review use of names for MatchFirst and Or expressions containing And expressions, as they will return the complete list of parsed tokens, not just the first one. Use __diag__.warn_multiple_tokens_in_named_alternation to help identify those expressions in your parsers that will have changed as a result.

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.4.1-py2.py3-none-any.whl(63.65 KB)
    pyparsing-2.4.1.tar.gz(611.12 KB)
  • pyparsing_2.4.0(Apr 8, 2019)

    • Well, it looks like the API change that was introduced in 2.3.1 was more drastic than expected, so for a friendlier forward upgrade path, this release:

      • Bumps the current version number to 2.4.0, to reflect this incompatible change.

      • Adds a pyparsing.__compat__ object for specifying compatibility with future breaking changes.

      • Conditionalizes the API-breaking behavior, based on the value pyparsing.__compat__.collect_all_And_tokens. By default, this value will be set to True, reflecting the new bugfixed behavior. To set this value to False, add to your code:

          import pyparsing
          pyparsing.__compat__.collect_all_And_tokens = False

      • User code that is dependent on the pre-bugfix behavior can restore it by setting this value to False.

      In 2.5 and later versions, the conditional code will be removed and setting the flag to True or False in these later versions will have no effect.

    • Updated unitTests.py and simple_unit_tests.py to be compatible with python setup.py test. To run tests using setup, do:

      python setup.py test
      python setup.py test -s unitTests.suite
      python setup.py test -s simple_unit_tests.suite
      

      Prompted by issue #83 and PR submitted by bdragon28, thanks.

    • Fixed bug in ParserElement.runTests handling '\n' literals in quoted strings.

    • Added tag_body attribute to the start tag expressions generated by makeHTMLTags, so that you can avoid using SkipTo to roll your own tag body expression:

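      import pyparsing as pp

      # html_page is assumed to hold the HTML source text to scan
      a, aEnd = pp.makeHTMLTags('a')
      link = a + a.tag_body("displayed_text") + aEnd
      # scan the page for complete <a>...</a> links
      for t in link.searchString(html_page):
          print(t.displayed_text, '->', t.startA.href)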
      
    • indentedBlock failure handling was improved; PR submitted by TMiguelT, thanks!

    • Address Py2 incompatibility in simple_unit_tests, plus explain() and Forward str() cleanup; PRs graciously provided by eswald.

    • Fixed docstring with embedded '\w', which creates SyntaxWarnings in Py3.8, issue #80.

    • Examples:

      • Added example parser for rosettacode.org tutorial compiler.

      • Added example to show how an HTML table can be parsed into a collection of Python lists or dicts, one per row.

      • Updated SimpleSQL.py example to handle nested selects, reworked 'where' expression to use infixNotation.

      • Added include_preprocessor.py, similar to macroExpander.py.

      • Examples using makeHTMLTags use new tag_body expression when retrieving a tag's body text.

      • Updated examples that are runnable as unit tests:

        python setup.py test -s examples.antlr_grammar_tests
        python setup.py test -s examples.test_bibparse
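
The tag_body attribute described in the 2.4.0 notes can be tried on a small inline document. This is only a sketch: the html_page string, its URLs, and the links variable are made up for illustration.

```python
import pyparsing as pp

# Hypothetical page content, invented for this example.
html_page = (
    '<p>See the <a href="https://example.com/docs">docs</a> '
    'and the <a href="https://example.com/wiki">wiki</a>.</p>'
)

a, a_end = pp.makeHTMLTags("a")

# tag_body skips ahead to the matching close tag, so no hand-written SkipTo is needed.
link = a + a.tag_body("displayed_text") + a_end

# Collect (link text, href) pairs for every anchor found in the page.
links = [(t.displayed_text, t.startA.href) for t in link.searchString(html_page)]
print(links)
```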
        
  • pyparsing_2.3.1(Jan 13, 2019)

    New features in Pyparsing 2.3.1 -

    • ParseException.explain() method, to convert a raw Python traceback into a list of the parse expressions leading up to a parse mismatch.

    • New unicode sets Latin-A and Latin-B, and the ability to define custom sets using multiple inheritance.

        class Turkish_set(pp.pyparsing_unicode.Latin1, pp.pyparsing_unicode.LatinA):
            pass
      
        turkish_word = pp.Word(Turkish_set.alphas)
      
    • State machine examples, showing how to extend Python with your own pyparsing-enabled syntax. The examples implement a 'statemachine' keyword to define a set of classes and transition attribute to implement a State pattern:

        statemachine TrafficLightState:
            Red -> Green
            Green -> Yellow
            Yellow -> Red
      

      Transitions can be named also:

        statemachine LibraryBookState:
            New -(shelve)-> Available
            Available -(reserve)-> OnHold
            OnHold -(release)-> Available
            Available -(checkout)-> CheckedOut
            CheckedOut -(checkin)-> Available
      
    • Example parser for decaf language. This language is commonly used in university CS compiler classes.

    • Fixup of docstrings to Sphinx format, so pyparsing docs are now available on Read the Docs! (https://pyparsing-docs.readthedocs.io/en/latest/)
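
ParseException.explain(), listed among the 2.3.1 features above, can be exercised with a deliberately failing parse. The date grammar and the input string below are hypothetical. The call is written as ParseException.explain(err), a form that should work whether explain is a static method or an instance method in the installed version.

```python
import pyparsing as pp

# Hypothetical grammar: a date written as year/month/day.
integer = pp.Word(pp.nums).setName("integer")
date = integer("year") + "/" + integer("month") + "/" + integer("day")

report = None
try:
    # "xx" is not an integer, so this parse is expected to fail.
    date.parseString("2019/01/xx")
except pp.ParseException as err:
    # explain() turns the exception into a readable, multi-line description.
    report = pp.ParseException.explain(err)
    print(report)
```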

    Source code(tar.gz)
    Source code(zip)
    pyparsing-2.3.1-py2.py3-none-any.whl(60.30 KB)
    pyparsing-2.3.1.tar.gz(582.51 KB)
  • pyparsing_2.3.0(Oct 31, 2018)

    • NEW SUPPORT FOR UNICODE CHARACTER RANGES

      This release introduces the pyparsing_unicode namespace class, defining a series of language character sets to simplify the definition of alphas, nums, alphanums, and printables in the following language sets:

      • Arabic
      • Chinese
      • Cyrillic
      • Devanagari
      • Greek
      • Hebrew
      • Japanese (including Kanji, Katakana, and Hiragana subsets)
      • Korean
      • Latin1 (includes 7 and 8-bit Latin characters)
      • Thai
      • CJK (combination of Chinese, Japanese, and Korean sets)
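
As a small sketch of how these character sets might be used, the greeting parser from the introduction can be rebuilt with the Greek set. The sample greeting and the variable names are chosen only for illustration.

```python
import pyparsing as pp

ppu = pp.pyparsing_unicode

# Use the Greek alphas in place of the default ASCII-only pp.alphas.
greek_word = pp.Word(ppu.Greek.alphas)
greet = greek_word + "," + greek_word + "!"

hello = "Καλημέρα, κόσμε!"
tokens = greet.parseString(hello).asList()
print(hello, "->", tokens)
```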

    POSSIBLE API CHANGES:

    • IndexErrors raised in parse actions are now wrapped in ParseExceptions
    • ParseResults have had several bugfixes which remove erroneous nesting levels

    See the CHANGES file for more details.

    New classes:

    • PrecededBy - lookbehind match
    • Char - single character match (similar to Word(exact=1))
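
A minimal sketch of the two new classes follows. The sample strings and the names digits, dollar_amount, and amounts are assumptions made for the example.

```python
import pyparsing as pp

# Char matches exactly one character from the given set,
# so a run of digits comes back one character per token.
digits = pp.OneOrMore(pp.Char(pp.nums)).parseString("2018").asList()

# PrecededBy is a lookbehind: it checks the text just before the current
# position and adds no tokens of its own.
dollar_amount = pp.PrecededBy("$") + pp.Word(pp.nums)

# Only the number that directly follows "$" should be reported.
amounts = dollar_amount.searchString("cost $100 and 200").asList()

print(digits)
print(amounts)
```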
  • pyparsing_2.2.2(Sep 30, 2018)

    Version 2.2.2 - September, 2018

    • Fixed bug in SkipTo: when a SkipTo expression skipped to an expression that returns a list (such as an And) and the SkipTo was saved as a named result, the named result could be saved as a ParseResults. It should always be saved as a string. Issue #28, reported by seron.

    • Added simple_unit_tests.py, as a collection of easy-to-follow unit tests for various classes and features of the pyparsing library. The primary intent is instructional rather than rigorous testing. Complex tests can still be added in the unitTests.py file.

    • New features added to the Regex class:

      • optional asGroupList parameter, returns all the capture groups as a list

      • optional asMatch parameter, returns the raw re.match result

      • new sub(repl) method, which adds a parse action calling re.sub(pattern, repl, parsed_result). Simplifies creating Regex expressions to be used with transformString. Like re.sub, repl may be an ordinary string (similar to using pyparsing's replaceWith), or may contain references to capture groups by group number, or may be a callable that takes an re match group and returns a string.

        For instance:

        expr = pp.Regex(r"([Hh]\d):\s*(.*)").sub(r"<\1>\2</\1>")
        expr.transformString("h1: This is the title")
        

        will return

        <h1>This is the title</h1>
        
    • Fixed omission of LICENSE file in source tarball, also added CODE_OF_CONDUCT.md per GitHub community standards. Issue #31
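
The asGroupList and asMatch options added to the Regex class in this release can be sketched as follows. The pattern, the input text, and the group names low and high are invented for the example, and the exact shape of the asGroupList tokens may vary between versions, so the sketch only prints them.

```python
import pyparsing as pp

text = "10-20"

# asGroupList: the tokens are the regex capture groups rather than the whole match.
as_groups = pp.Regex(r"(\d+)-(\d+)", asGroupList=True).parseString(text)
groups = as_groups.asList()
print(groups)

# asMatch: the single token is the raw re match object, so the usual
# match methods such as group() are available.
as_match = pp.Regex(r"(?P<low>\d+)-(?P<high>\d+)", asMatch=True).parseString(text)
match = as_match[0]
low, high = match.group("low"), match.group("high")
print(low, high)
```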

  • pyparsing_2.2.1(Sep 18, 2018)

    • Updates to migrate source repo to GitHub
    • Fix deprecation warning in Python 3.7 re: importing collections.abc
    • Fix Literal/Keyword bug raising IndexError instead of ParseException