Detailed Solutions in Eight Programming Languages

Take the guesswork out of using regular expressions. With more than 140 practical recipes, this cookbook provides everything you need to solve a wide range of real-world problems. Novices will learn basic skills and tools, and programmers and experienced users will find a wealth of detail. Each recipe provides samples you can use right away.

This revised edition covers the regular expression flavors used by C#, Java, JavaScript, Perl, PHP, Python, Ruby, and VB.NET. You’ll learn powerful new tricks, avoid flavor-specific gotchas, and save valuable time with this huge library of practical solutions.

  • Learn regular expression basics through a detailed tutorial
  • Use code listings to implement regular expressions with your language of choice
  • Understand how regular expressions differ from language to language
  • Handle common user input with recipes for validation and formatting
  • Find and manipulate words, special characters, and lines of text
  • Detect integers, floating-point numbers, and other numerical formats
  • Parse source code and process log files
  • Use regular expressions in URLs, paths, and IP addresses
  • Manipulate HTML, XML, and data exchange formats
  • Discover little-known regular expression tricks and techniques

Jan Goyvaerts

Jan Goyvaerts runs Just Great Software, where he designs and develops some of the most popular regular expression software. His products include RegexBuddy, the world's only regular expression editor that emulates the peculiarities of 15 regular expression flavors, and PowerGREP, the most feature-rich grep tool for Microsoft Windows.

Steven Levithan

Steven Levithan works at Facebook as a JavaScript engineer. He has enjoyed programming for nearly 15 years, working in Tokyo, Washington, D.C., Baghdad, and Silicon Valley. Steven is a leading JavaScript regular expression expert, and has created a variety of open source regular expression tools including RegexPal and the XRegExp library.

  1. Chapter 1 Introduction to Regular Expressions

    1. Regular Expressions Defined

    2. Search and Replace with Regular Expressions

    3. Tools for Working with Regular Expressions

  2. Chapter 2 Basic Regular Expression Skills

    1. Match Literal Text

    2. Match Nonprintable Characters

    3. Match One of Many Characters

    4. Match Any Character

    5. Match Something at the Start and/or the End of a Line

    6. Match Whole Words

    7. Unicode Code Points, Categories, Blocks, and Scripts

    8. Match One of Several Alternatives

    9. Group and Capture Parts of the Match

    10. Match Previously Matched Text Again

    11. Capture and Name Parts of the Match

    12. Repeat Part of the Regex a Certain Number of Times

    13. Choose Minimal or Maximal Repetition

    14. Eliminate Needless Backtracking

    15. Prevent Runaway Repetition

    16. Test for a Match Without Adding It to the Overall Match

    17. Match One of Two Alternatives Based on a Condition

    18. Add Comments to a Regular Expression

    19. Insert Literal Text into the Replacement Text

    20. Insert the Regex Match into the Replacement Text

    21. Insert Part of the Regex Match into the Replacement Text

    22. Insert Match Context into the Replacement Text

  3. Chapter 3 Programming with Regular Expressions

    1. Programming Languages and Regex Flavors

    2. Literal Regular Expressions in Source Code

    3. Import the Regular Expression Library

    4. Create Regular Expression Objects

    5. Set Regular Expression Options

    6. Test If a Match Can Be Found Within a Subject String

    7. Test Whether a Regex Matches the Subject String Entirely

    8. Retrieve the Matched Text

    9. Determine the Position and Length of the Match

    10. Retrieve Part of the Matched Text

    11. Retrieve a List of All Matches

    12. Iterate over All Matches

    13. Validate Matches in Procedural Code

    14. Find a Match Within Another Match

    15. Replace All Matches

    16. Replace Matches Reusing Parts of the Match

    17. Replace Matches with Replacements Generated in Code

    18. Replace All Matches Within the Matches of Another Regex

    19. Replace All Matches Between the Matches of Another Regex

    20. Split a String

    21. Split a String, Keeping the Regex Matches

    22. Search Line by Line

    23. Construct a Parser

  4. Chapter 4 Validation and Formatting

    1. Validate Email Addresses

    2. Validate and Format North American Phone Numbers

    3. Validate International Phone Numbers

    4. Validate Traditional Date Formats

    5. Validate Traditional Date Formats, Excluding Invalid Dates

    6. Validate Traditional Time Formats

    7. Validate ISO 8601 Dates and Times

    8. Limit Input to Alphanumeric Characters

    9. Limit the Length of Text

    10. Limit the Number of Lines in Text

    11. Validate Affirmative Responses

    12. Validate Social Security Numbers

    13. Validate ISBNs

    14. Validate ZIP Codes

    15. Validate Canadian Postal Codes

    16. Validate U.K. Postcodes

    17. Find Addresses with Post Office Boxes

    18. Reformat Names From “FirstName LastName” to “LastName, FirstName”

    19. Validate Password Complexity

    20. Validate Credit Card Numbers

    21. European VAT Numbers

  5. Chapter 5 Words, Lines, and Special Characters

    1. Find a Specific Word

    2. Find Any of Multiple Words

    3. Find Similar Words

    4. Find All Except a Specific Word

    5. Find Any Word Not Followed by a Specific Word

    6. Find Any Word Not Preceded by a Specific Word

    7. Find Words Near Each Other

    8. Find Repeated Words

    9. Remove Duplicate Lines

    10. Match Complete Lines That Contain a Word

    11. Match Complete Lines That Do Not Contain a Word

    12. Trim Leading and Trailing Whitespace

    13. Replace Repeated Whitespace with a Single Space

    14. Escape Regular Expression Metacharacters

  6. Chapter 6 Numbers

    1. Integer Numbers

    2. Hexadecimal Numbers

    3. Binary Numbers

    4. Octal Numbers

    5. Decimal Numbers

    6. Strip Leading Zeros

    7. Numbers Within a Certain Range

    8. Hexadecimal Numbers Within a Certain Range

    9. Integer Numbers with Separators

    10. Floating-Point Numbers

    11. Numbers with Thousand Separators

    12. Add Thousand Separators to Numbers

    13. Roman Numerals

  7. Chapter 7 Source Code and Log Files

    1. Keywords

    2. Identifiers

    3. Numeric Constants

    4. Operators

    5. Single-Line Comments

    6. Multiline Comments

    7. All Comments

    8. Strings

    9. Strings with Escapes

    10. Regex Literals

    11. Here Documents

    12. Common Log Format

    13. Combined Log Format

    14. Broken Links Reported in Web Logs

  8. Chapter 8 URLs, Paths, and Internet Addresses

    1. Validating URLs

    2. Finding URLs Within Full Text

    3. Finding Quoted URLs in Full Text

    4. Finding URLs with Parentheses in Full Text

    5. Turn URLs into Links

    6. Validating URNs

    7. Validating Generic URLs

    8. Extracting the Scheme from a URL

    9. Extracting the User from a URL

    10. Extracting the Host from a URL

    11. Extracting the Port from a URL

    12. Extracting the Path from a URL

    13. Extracting the Query from a URL

    14. Extracting the Fragment from a URL

    15. Validating Domain Names

    16. Matching IPv4 Addresses

    17. Matching IPv6 Addresses

    18. Validate Windows Paths

    19. Split Windows Paths into Their Parts

    20. Extract the Drive Letter from a Windows Path

    21. Extract the Server and Share from a UNC Path

    22. Extract the Folder from a Windows Path

    23. Extract the Filename from a Windows Path

    24. Extract the File Extension from a Windows Path

    25. Strip Invalid Characters from Filenames

  9. Chapter 9 Markup and Data Formats

    1. Processing Markup and Data Formats with Regular Expressions

    2. Find XML-Style Tags

    3. Replace <b> Tags with <strong>

    4. Remove All XML-Style Tags Except <em> and <strong>

    5. Match XML Names

    6. Convert Plain Text to HTML by Adding <p> and <br> Tags

    7. Decode XML Entities

    8. Find a Specific Attribute in XML-Style Tags

    9. Add a cellspacing Attribute to <table> Tags That Do Not Already Include It

    10. Remove XML-Style Comments

    11. Find Words Within XML-Style Comments

    12. Change the Delimiter Used in CSV Files

    13. Extract CSV Fields from a Specific Column

    14. Match INI Section Headers

    15. Match INI Section Blocks

    16. Match INI Name-Value Pairs

  10. Colophon