Pyparsing Wiki Home


Welcome to the Pyparsing Wiki Home!
Download now from SourceForge!

The pyparsing module is an alternative approach to creating and executing simple grammars, as opposed to the traditional lex/yacc approach or the use of regular expressions. It provides a library of classes that client code uses to construct the grammar directly in Python code.

Pyparsing is freely licensed for non-commercial or commercial use, but donations are greatly appreciated - 10% goes to the Python Software Foundation! (requires free registration with SourceForge) IF YOU ARE USING A BOOTLEG COPY OF "GETTING STARTED WITH PYPARSING" A DONATION HERE IS STRONGLY ENCOURAGED!
O'Reilly has released "Getting Started with Pyparsing" as part of its Short Cut series! This e-book includes topics such as:
  • "Hello, World!" on Steroids
  • The Zen of Pyparsing (free sample chapter online)
  • Scraping data from a complex web page
  • Parsing S-expressions
  • Writing a Search Engine in 100 lines of code


  • 20 July 2013 - NOTE - Pyparsing 2.0.1 was just released, and supports Python versions 2.6 and later. If you are using Python 2.5 or older, you must specifically install version 1.5.7.
  • IF YOU ARE USING PYTHON 2.6 OR LATER AND ARE HAVING DIFFICULTIES INSTALLING THE LATEST PYPARSING, PLEASE MESSAGE ME IMMEDIATELY!
  • See more info on the News page


Here is a program to parse "Hello, World!" (or any greeting of the form "<salutation>, <addressee>!"):
from pyparsing import Word, alphas
greet = Word(alphas) + "," + Word(alphas) + "!"  # <-- grammar defined here
hello = "Hello, World!"
print(hello, "->", greet.parseString(hello))

The program outputs the following:
Hello, World! -> ['Hello', ',', 'World', '!']

The Python representation of the grammar is quite readable, owing to the self-explanatory class names, and the use of '+', '|' and '^' operator definitions.
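As a rough illustration of those three operators, the sketch below combines a few small expressions. The grammars and variable names here are made up for this example and are not part of the pyparsing distribution; the point is only that '+' builds a sequence, '|' keeps the first alternative that matches, and '^' keeps the longest match.

```python
from pyparsing import Literal, Word, alphas, nums

# '+' builds a sequence: every piece must match, in order.
assignment = Word(alphas) + "=" + Word(nums)
assignment_tokens = assignment.parseString("x = 42").asList()

# '|' tries the alternatives left to right and keeps the first that matches.
first_match = Literal("in") | Literal("int")
first_tokens = first_match.parseString("int").asList()

# '^' tries every alternative and keeps the longest match.
longest_match = Literal("in") ^ Literal("int")
longest_tokens = longest_match.parseString("int").asList()

print(assignment_tokens)  # ['x', '=', '42']
print(first_tokens)       # ['in']
print(longest_tokens)     # ['int']
```

When two alternatives share a prefix, as "in" and "int" do here, the choice between '|' and '^' changes the result, so the ordering of alternatives matters with '|'.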
The parsed results returned from parseString() can be accessed as a nested list, a dictionary, or an object with named attributes.
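A minimal sketch of those access styles, reusing the greeting grammar from above. The results names "salutation" and "addressee" are chosen for this example only; they are attached with setResultsName(), and without names only list-style access is available.

```python
from pyparsing import Word, alphas

# Attach names to the two words so they can be looked up later.
salutation = Word(alphas).setResultsName("salutation")
addressee = Word(alphas).setResultsName("addressee")
greet = salutation + "," + addressee + "!"

result = greet.parseString("Hello, World!")

as_list = result.asList()        # list-style view of all tokens
by_index = result[0]             # positional access
by_key = result["addressee"]     # dictionary-style access
by_attr = result.salutation      # attribute-style access
as_dict = result.asDict()        # only the named tokens

print(as_list)   # ['Hello', ',', 'World', '!']
print(by_index)  # Hello
print(by_key)    # World
print(by_attr)   # Hello
```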
The pyparsing module handles some of the problems that are typically vexing when writing text parsers:
  • extra or missing whitespace (the above program will also handle "Hello,World!", "Hello , World !", etc.)
  • quoted strings
  • embedded comments
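The three items above can be sketched as follows. The "name = ..." assignment grammar is a hypothetical example invented here; quotedString, removeQuotes, cStyleComment and ignore() are pyparsing helpers, but treat this as an illustration under those assumptions rather than a recommended pattern.

```python
from pyparsing import Word, alphas, quotedString, removeQuotes, cStyleComment

# Whitespace: the same grammar accepts tight or loose spacing.
greet = Word(alphas) + "," + Word(alphas) + "!"
spacing_results = [
    greet.parseString(text).asList()
    for text in ("Hello,World!", "Hello , World !")
]

# Quoted strings: match a quoted value and strip the surrounding quotes.
quoted_value = quotedString.copy().setParseAction(removeQuotes)
assignment = Word(alphas) + "=" + quoted_value

# Embedded comments: skip C-style comments wherever they appear.
assignment.ignore(cStyleComment)
assignment_tokens = assignment.parseString(
    'name = /* a comment */ "Paul McGuire"'
).asList()

print(spacing_results)
print(assignment_tokens)  # ['name', '=', 'Paul McGuire']
```

Both greeting strings should yield the same four tokens as the original "Hello, World!" example.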

The .zip file includes examples of a simple SQL parser, simple CORBA IDL parser, a config file parser, a chemical formula parser, an HTTP server log parser, a comma-separated list parser, and a four-function algebraic notation parser. It also includes a simple how-to document, and a UML class diagram of the library's classes.

Subversion access

This project's SourceForge.net Subversion repository can be checked out through SVN with the following instruction set:


Or you can browse the repository with your browser, at:


(updated 1 Jun 2008)


Please let me know if you find this package helpful.

Regards,
-- Paul McGuire


Acknowledgements


Thanks to Dave Kuhlman, Dr. Mark E. Light, and Boris Boutillier for their early feedback and suggestions on this module. Thanks also to Sverrir Valgeirsson, Chirag Wazir, Harald Armin Massa, Tony Shadwick, Maarten van Reeuwijk, Thomas Kalka, Lee SangYeong, Jim Richardson, Brad Clements, Mike Kelly, John Hunter, Seo Sanghyeon, Eric van der Vlist, 'Dang' Daniel Griffith, James Reeves, Jean-Guillaume Paradis, Wilson Fowlie, Carl Reitschuster, Petri Savolainen, Rick Walia, Andrea Griffini, Alberto Santini, Duncan McGreggor, Ravi Bhalotia, Kent Johnson, Gavin Panella, Alex Martelli, and Raymond Hettinger for their contributions, testing, feedback, and helpful comments and support.