Regular Expression

  • November 21, 2020

Introduction

Regular expressions (also called REs, regexes, or regex patterns) are used to match and extract patterns from a given text.
They are largely language independent: most programming languages support them, with only minor differences in syntax.
Regular expressions are widely used in applications such as the ones listed below, though they are not restricted to these.

  • To find a given pattern in a text, e.g. to find or replace a word in a word processor.
  • To find or extract all phone numbers or email addresses from a given text file.
  • To extract or scrape information from websites (web scraping).
  • To validate input, for example:
      • whether an email ID is valid
      • whether a phone or mobile number is valid
  • For many kinds of text analytics
  • In Natural Language Processing
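As a rough illustration of the extraction and validation use cases above, here is a minimal Python sketch. The sample text is made up, and both patterns are simplified assumptions for this example (an email is taken to be word characters, dots, plus or hyphen signs around an `@` and a dot; a phone number is assumed to be exactly 10 digits). Real-world email validation is considerably more involved.

```python
import re

# Hypothetical sample text, for illustration only
text = "Contact alice@example.com or bob.smith@test.org, or call 9876543210."

# Simplified patterns (assumptions, not complete validators)
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.-]+"
PHONE = r"\b\d{10}\b"  # assumes a 10-digit mobile number


def is_valid_email(candidate: str) -> bool:
    """Validation: the whole string must match the email pattern."""
    return re.fullmatch(EMAIL, candidate) is not None


# Extraction: find every occurrence in the text
print(re.findall(EMAIL, text))  # ['alice@example.com', 'bob.smith@test.org']
print(re.findall(PHONE, text))  # ['9876543210']

# Validation of single values
print(is_valid_email("alice@example.com"))  # True
print(is_valid_email("not-an-email"))       # False
```

Note the difference between the two tasks: `re.findall()` scans for matches anywhere in the text, while `re.fullmatch()` only succeeds if the entire string fits the pattern, which is what validation usually needs.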

 

For Free, Demo classes Call: 7507414653
Registration Link: Click Here!

In short, regular expressions can be used for almost any kind of pattern matching or pattern extraction on text,
which is why they appear in so many applications.

Simple Patterns

We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to
operate on strings, we’ll begin with the most common task: matching characters.

Matching Characters

Most letters and characters simply match themselves. For example, the regular expression Python will
match the string Python exactly. (If case-insensitive mode is enabled, this RE will also match python or
PYTHON; we shall discuss this later.)
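A minimal Python sketch of literal matching, using `re.search()` (which looks for the pattern anywhere in the string) and the `re.IGNORECASE` flag for case-insensitive mode. The sample strings are made up for illustration.

```python
import re

# Ordinary characters match themselves
print(re.search("Python", "I love Python") is not None)   # True

# Matching is case-sensitive by default
print(re.search("Python", "I love PYTHON") is not None)   # False

# re.IGNORECASE enables case-insensitive matching
print(re.search("Python", "I love PYTHON", re.IGNORECASE) is not None)  # True
```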


Note:

However, there are exceptions to this rule: some characters are special metacharacters and don't match
themselves.

Here's the complete list of metacharacters; their meanings will be discussed in the next part of this series:

. ^ $ * + ? { } [ ] \ | ( )
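To see why metacharacters need special care, here is a small sketch using one of them, the dot. The sample strings are made up; `re.escape()` is the standard-library helper that backslash-escapes special characters so a string is matched literally.

```python
import re

# '.' is a metacharacter: it matches any single character except a newline
print(re.findall("a.c", "abc a-c a.c"))    # ['abc', 'a-c', 'a.c']

# Escaping it with a backslash makes it match a literal '.'
print(re.findall(r"a\.c", "abc a-c a.c"))  # ['a.c']

# re.escape() escapes the metacharacters in a string for you
literal = re.escape("1+1=2")
print(re.fullmatch(literal, "1+1=2") is not None)  # True
```

Raw strings (`r"..."`) are used for patterns containing backslashes so that Python does not interpret the backslash itself before the regex engine sees it.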

Implementation in Python

Step 1:

Import the re module

In [1]: import re


Step 2:

Create a regex object using the compile() function of the re module, passing the pattern we want to match as
its argument, as shown below.

In [2]: pattern = re.compile('Python')

To check the type of pattern, we use the following code:

In [3]: type(pattern)

This tells us that pattern is an object of type re.Pattern, i.e. a compiled regex object.
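A short sketch of what a compiled pattern object offers. It assumes Python 3.7 or later, where the type is exposed as `re.Pattern`; the sample string "Hello Python" is made up for illustration.

```python
import re

pattern = re.compile("Python")

print(isinstance(pattern, re.Pattern))  # True
print(pattern.pattern)                  # Python  (the source string of the regex)

# A compiled pattern can be reused; its methods mirror the module-level functions
print(pattern.search("Hello Python").span())       # (6, 12)
print(re.search("Python", "Hello Python").span())  # (6, 12)
```

Compiling once and reusing the object is convenient when the same pattern is applied to many strings.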

Step 3:

Once step 2 is done, we call the finditer() method of the pattern object, which returns a callable_iterator
object, as shown below:

In [4]: matches = pattern.finditer('Welcome to Python Regular Expression.')

We pass the target string to finditer() as its argument; this is the string in which we search for the pattern
Python compiled in Step 2.

Let’s now check the type of matches

In [5]: type(matches)

As we can see it is an object of type callable_iterator. Now we can iterate over it using a for loop.
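Because finditer() returns an iterator, its matches can be consumed only once. In a notebook this means re-running the loop in In [6] without re-running In [4] prints nothing. A small sketch with a made-up target string:

```python
import re

pattern = re.compile("Python")
matches = pattern.finditer("Python is fun. I like Python.")

first = next(matches)                # pull the first match by hand
print(first.start(), first.group())  # 0 Python

rest = list(matches)                 # consume the remaining matches
print([m.start() for m in rest])     # [22]

print(list(matches))                 # []  (the iterator is now exhausted)
```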

In [6]: for match in matches:
    print('Match found at', 'start Index:', match.start(), '–>',
          'End Index:', match.end(), '–>', 'Match Pattern:', match.group())

Here:

start() –> returns the index at which the match begins

end() –> returns the index just past the end of the match (the index of its last character + 1)

group() –> returns the matched text
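The relationship between these three methods can be checked directly. This sketch reuses the target string from In [4] and also shows `span()`, a Match method that returns `(start(), end())` as a tuple:

```python
import re

text = "Welcome to Python Regular Expression."
m = re.search("Python", text)

print(m.start(), m.end(), m.group())  # 11 17 Python
print(m.span())                       # (11, 17), same as (m.start(), m.end())

# end() is exclusive, so slicing with start() and end() gives back group()
print(text[m.start():m.end()] == m.group())  # True
```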

Now, let's check the type of match:

In [7]: type(match)

 


Running the cells above produces the following output:

Out[3]: re.Pattern

Out[5]: callable_iterator

Match found at start Index: 11 –> End Index: 17 –> Match Pattern: Python

Out[7]: re.Match

As the last line shows, match is an object of type re.Match.

Let’s look at another example

Suppose we have a text string and want to count how many times a particular phrase occurs in it. It can be done as shown below:

In [8]: target = """Regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and write the RE in a certain way in order to produce bytecode that runs faster. Optimization isn't covered in this document, because it requires that you have a good understanding of the matching engine's internals. The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that can be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable."""

In [9]: # Let's do some text analytics: count how many times "regular expression" appears in the target string
# Step 1
import re

# Step 2: compile the pattern; re.IGNORECASE makes the match case-insensitive
pattern = re.compile('regular expression', re.IGNORECASE)

# Step 3: create an iterator of Match objects over the target string
matcher = pattern.finditer(target)

In [10]: # Now let's iterate through it to find the matches
count = 0
for match in matcher:
    count += 1
    print(f"Match of '{match.group()}' found at start index: {match.start()}")
print(f"Total occurrences of '{pattern.pattern}' found: {count}")

Because the pattern is compiled with re.IGNORECASE, the capitalized "Regular expression" at the very start of the text is counted as well, giving 5 occurrences in total:

Match of 'Regular expression' found at start index: 0
Match of 'regular expression' found at start index: 454
Match of 'regular expression' found at start index: 580
Match of 'regular expression' found at start index: 644
Match of 'regular expression' found at start index: 849
Total occurrences of 'regular expression' found: 5
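A common pitfall worth checking when making a search case-insensitive: on a compiled pattern, the second positional argument of finditer() is a start position (`pos`), not a flag. The sketch below, with a made-up sample string, shows what happens if re.IGNORECASE is passed there, and two correct alternatives.

```python
import re

text = "Regular expression, regular expression"
pattern = re.compile("regular expression")

# Pattern.finditer(string, pos, endpos): a second positional argument is the
# start position, not a flag. re.IGNORECASE equals 2, so this call searches
# case-sensitively starting at index 2.
wrong = [m.start() for m in pattern.finditer(text, re.IGNORECASE)]
print(wrong)  # [20]

# Correct: give the flag to re.compile() ...
ci_pattern = re.compile("regular expression", re.IGNORECASE)
right = [m.start() for m in ci_pattern.finditer(text)]
print(right)  # [0, 20]

# ... or to the module-level re.finditer(pattern, string, flags)
module_level = [m.start() for m in re.finditer("regular expression", text, re.IGNORECASE)]
print(module_level)  # [0, 20]
```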


Conclusion

Let’s recap what we have learned so far in this article.

If you want to represent a group of strings that follow a particular pattern, regular expressions are the tool
to use.

Steps to follow

Step 1: import re –> import the module

Step 2: pattern = re.compile('Python') –> create a regex (Pattern) object

Step 3: matcher = pattern.finditer('Welcome to Python Regular Expression') –> create an iterator of Match objects

After you have completed the above three steps, iterate through the callable_iterator (matcher in our case)
using a for loop and, for each Match object, extract its:

start index –> by calling the start() method

end index –> by calling the end() method

matched text –> by calling the group() method

Stay tuned for part 2 of this series, where we shall discuss more regular expression features, many of which
make use of the metacharacters listed above:

  • character classes
  • predefined character classes
  • quantifiers
  • important methods of the re module

 

Author:
Titus Newton | SevenMentor Pvt Ltd.

 


 
