Pyparsing Wiki Home

Welcome to the Pyparsing Wiki Home!
Download now from SourceForge!

The pyparsing module is an alternative approach to creating and executing simple grammars, compared with the traditional lex/yacc approach or the use of regular expressions. It provides a library of classes that client code uses to construct the grammar directly in Python code.

Pyparsing is freely licensed for non-commercial or commercial use, but donations are greatly appreciated - 10% goes to the Python Software Foundation! (requires free registration with SourceForge) IF YOU ARE USING A BOOTLEG COPY OF "GETTING STARTED WITH PYPARSING" A DONATION HERE IS STRONGLY ENCOURAGED!
O'Reilly has released "Getting Started with Pyparsing" as part of its Short Cut series! This e-book includes topics such as:
  • "Hello, World!" on Steroids
  • The Zen of Pyparsing (free sample chapter online)
  • Scraping data from a complex web page
  • Parsing S-expressions
  • Writing a Search Engine in 100 lines of code

  • NOTE - Pyparsing 2.x supports Python versions 2.6, 2.7, and 3.x. If you are using Python 2.5 or older, you must specifically install version 1.5.7.
  • See more info on the News page

Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):
from pyparsing import Word, alphas
greet = Word( alphas ) + "," + Word( alphas ) + "!" # <-- grammar defined here
hello = "Hello, World!"
print (hello, "->", greet.parseString( hello ))

The program outputs the following:
Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions.
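
As a small sketch of how the alternation operators differ: '|' builds a first-match alternative, while '^' tries every alternative and keeps the longest match. The names `short`, `long_`, `first_match` and `longest_match` are made up for this illustration.

```python
from pyparsing import Literal

short = Literal("in")
long_ = Literal("int")

# '|' tries the alternatives left to right and stops at the first that matches
first_match = short | long_
# '^' evaluates all alternatives and keeps the longest match
longest_match = short ^ long_

print(first_match.parseString("int").asList())    # ['in']
print(longest_match.parseString("int").asList())  # ['int']
```
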
The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.
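
For illustration, here is one way the three access styles might look for the greeting grammar. The results names "salutation" and "addressee" are assumptions chosen to match the greeting form described above; they are attached with setResultsName().

```python
from pyparsing import Word, alphas

# Same greeting grammar, with (assumed) results names attached to each word
greet = (Word(alphas).setResultsName("salutation") + ","
         + Word(alphas).setResultsName("addressee") + "!")

result = greet.parseString("Hello, World!")

print(result.asList())      # list access: ['Hello', ',', 'World', '!']
print(result[0])            # index access: Hello
print(result["addressee"])  # dictionary-style access: World
print(result.salutation)    # attribute-style access: Hello
```
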
The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
  • extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
  • quoted strings
  • embedded comments
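
The three points above can be sketched with the greeting grammar. This is only an illustration: the choice of quotedString and cStyleComment (both provided by pyparsing) and the sample input strings are assumptions, not part of the original example.

```python
from pyparsing import Word, alphas, quotedString, cStyleComment

greet = Word(alphas) + "," + Word(alphas) + "!"

# Extra or missing whitespace is skipped automatically
for text in ["Hello,World!", "Hello , World !"]:
    print(greet.parseString(text).asList())

# A quoted string as the salutation; the quotes are kept in the token
quoted_greet = quotedString + "," + Word(alphas) + "!"
quoted_result = quoted_greet.parseString('"Hi there", World!').asList()
print(quoted_result)

# Embedded C-style comments are skipped once registered with ignore()
commented_greet = (Word(alphas) + "," + Word(alphas) + "!").ignore(cStyleComment)
commented_result = commented_greet.parseString("Hello, /* a comment */ World!").asList()
print(commented_result)
```
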

The .zip file includes examples of a simple SQL parser, simple CORBA IDL parser, a config file parser, a chemical formula parser, an HTTP server log parser, a comma-separated list parser, and a four-function algebraic notation parser. It also includes a simple how-to document, and a UML class diagram of the library's classes.

Subversion access

This project's Subversion repository can be checked out through SVN with the following instruction set:

Or you can browse the repository with your browser, at:

(updated 1 Jun 2008)

Please let me know if you find this package helpful.

-- Paul McGuire


Thanks to Dave Kuhlman, Dr. Mark E. Light, and Boris Boutillier for their early feedback and suggestions on this module. Thanks also to Sverrir Valgeirsson, Chirag Wazir, Harald Armin Massa, Tony Shadwick, Maarten van Reeuwijk, Thomas Kalka, Lee SangYeong, Jim Richardson, Brad Clements, Mike Kelly, John Hunter, Seo Sanghyeon, Eric van der Vlist, 'Dang' Daniel Griffith, James Reeves, Jean-Guillaume Paradis, Wilson Fowlie, Carl Reitschuster, Petri Savolainen, Rick Walia, Andrea Griffini, Alberto Santini, Duncan McGreggor, Ravi Bhalotia, Kent Johnson, Gavin Panella, Alex Martelli, and Raymond Hettinger for their contributions, testing, feedback, and helpful comments and support.

Special thanks to JetBrains for providing PyCharm to this Open Source Project free-of-charge.
