Python RegEx

A RegEx, or regular expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.

RegEx Module

Python has a built-in package called re, which can be used to work with regular expressions.

Import the re module:

import re

RegEx in Python

Once you have imported the re module, you can start using regular expressions:

Example

Search the string to see if it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
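The example above stores the result in x without inspecting it. A minimal sketch of how the result might be checked, relying on re.search() returning a Match object on success and None otherwise (the printed messages are illustrative only):

```python
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

# A Match object is truthy and None is falsy,
# so the result can be tested directly in an if statement.
if x:
    print("YES! We have a match!")
else:
    print("No match")
```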

RegEx Functions

The re module offers a set of functions that let us search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string

Metacharacters

Metacharacters are characters with a special meaning:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence (can also be used to escape special characters)	"\d"
.	Any character (except newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"planet$"
*	Zero or more occurrences	"he.*o"
+	One or more occurrences	"he.+o"
?	Zero or one occurrences	"he.?o"
{}	Exactly the specified number of occurrences	"he{2}o"
|	Either or	"falls|stays"
()	Capture and group
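To make the table more concrete, the sketch below runs several of the metacharacters against one sample string. The sample text "hello planet" and the variable names are assumptions chosen for illustration; only re.findall() from the re module is used.

```python
import re

txt = "hello planet"  # hypothetical sample text

# Each call returns the list of matches for one metacharacter from the table.
in_range = re.findall("[a-m]", txt)      # any single character from a to m
two_any = re.findall("he..o", txt)       # "he", two arbitrary characters, "o"
at_start = re.findall("^hello", txt)     # "hello" only at the start
at_end = re.findall("planet$", txt)      # "planet" only at the end
greedy = re.findall("he.*o", txt)        # "he", zero or more characters, "o"
optional = re.findall("he.?o", txt)      # "he", at most one character, "o"
either = re.findall("planet|moon", txt)  # either alternative

print(in_range)  # ['h', 'e', 'l', 'l', 'l', 'a', 'e']
print(two_any)   # ['hello']
print(at_start)  # ['hello']
print(at_end)    # ['planet']
print(greedy)    # ['hello']
print(optional)  # []
print(either)    # ['planet']
```

Note that "he.?o" finds nothing here, because two characters ("ll") sit between "he" and "o" and ? allows at most one.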

Special Sequences

A special sequence is a \ followed by one of the characters in the list below, and has a special meaning:

Character	Description	Example
\A	Returns a match if the specified characters are at the beginning of the string	"\AThe"
\b	Returns a match where the specified characters are at the beginning or at the end of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\bain" r"ain\b"
\B	Returns a match where the specified characters are present, but NOT at the beginning (or at the end) of a word (the "r" in the beginning is making sure that the string is being treated as a "raw string")	r"\Bain" r"ain\B"
\d	Returns a match where the string contains digits (numbers from 0-9)	"\d"
\D	Returns a match where the string DOES NOT contain digits	"\D"
\s	Returns a match where the string contains a white space character	"\s"
\S	Returns a match where the string DOES NOT contain a white space character	"\S"
\w	Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)	"\w"
\W	Returns a match where the string DOES NOT contain any word characters	"\W"
\Z	Returns a match if the specified characters are at the end of the string	"Spain\Z"
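The following sketch tries a few of the special sequences from the table on one string. The sample text and variable names are assumptions made for illustration. All patterns are written as raw strings (r"..."), so that Python passes the backslash through to the re module unchanged.

```python
import re

txt = "The rain in Spain: 59 mm"  # hypothetical sample text

starts = re.findall(r"\AThe", txt)      # "The" at the very beginning
word_start = re.findall(r"\bain", txt)  # "ain" at the beginning of a word
word_end = re.findall(r"ain\b", txt)    # "ain" at the end of a word
digits = re.findall(r"\d", txt)         # every single digit
words = re.findall(r"\w+", txt)         # runs of word characters
ends = re.findall(r"mm\Z", txt)         # "mm" at the very end

print(starts)      # ['The']
print(word_start)  # []
print(word_end)    # ['ain', 'ain']
print(digits)      # ['5', '9']
print(words)       # ['The', 'rain', 'in', 'Spain', '59', 'mm']
print(ends)        # ['mm']
```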

Sets

A set is a set of characters inside a pair of square brackets [] with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (`a`, `r`, or `n`) is present
[a-n]	Returns a match for any lower case character, alphabetically between `a` and `n`
[^arn]	Returns a match for any character EXCEPT `a`, `r`, and `n`
[0123]	Returns a match where any of the specified digits (`0`, `1`, `2`, or `3`) is present
[0-9]	Returns a match for any digit between `0` and `9`
[0-5][0-9]	Returns a match for any two-digit number from `00` to `59`
[a-zA-Z]	Returns a match for any character alphabetically between `a` and `z`, lower case OR upper case
[+]	In sets, `+`, `*`, `.`, `|`, `()`, `$`, and `{}` have no special meaning, so `[+]` means: return a match for any `+` character in the string
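As a rough illustration of the table, the sketch below applies several sets with re.findall(). The sample strings and variable names are assumptions picked to keep the results easy to follow.

```python
import re

txt = "8 times before 11:45 AM"  # hypothetical sample text

any_of = re.findall("[arn]", "Spain")    # one of a, r, or n
none_of = re.findall("[^arn]", "rain")   # anything except a, r, and n
digits = re.findall("[0-9]", txt)        # every single digit
two_digit = re.findall("[0-5][0-9]", txt)  # two-digit numbers from 00 to 59
letters = re.findall("[a-zA-Z]+", txt)   # runs of lower or upper case letters
plus = re.findall("[+]", "2+2=4")        # a literal + character

print(any_of)     # ['a', 'n']
print(none_of)    # ['i']
print(digits)     # ['8', '1', '1', '4', '5']
print(two_digit)  # ['11', '45']
print(letters)    # ['times', 'before', 'AM']
print(plus)       # ['+']
```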

The findall() Function

The findall() function returns a list containing all matches.

Example

Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)

The list contains the matches in the order they are found.
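With the pattern "ai" both matches look identical, so the ordering is hard to see. A small sketch with a different, purely illustrative pattern makes the left-to-right order visible:

```python
import re

txt = "The rain in Spain"

# \w*ain matches any run of word characters that ends in "ain".
x = re.findall(r"\w*ain", txt)
print(x)  # ['rain', 'Spain']
```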

If no matches are found, an empty list is returned:

Example

Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x)

The search() Function

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match will be returned:

Example

Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
# Raw string, so that \s is not treated as a Python string escape
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value None is returned:

Example

Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x)
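Because search() may return None, calling a method such as x.start() directly on the result raises an AttributeError when nothing was found. One possible way to guard against that is sketched below; the helper name first_match_position is hypothetical and not part of the re module.

```python
import re

def first_match_position(pattern, text):
    """Return the start index of the first match, or None if there is none."""
    match = re.search(pattern, text)
    # Only call .start() when a Match object was actually returned.
    return match.start() if match else None

txt = "The rain in Spain"
print(first_match_position(r"\s", txt))       # 3
print(first_match_position("Portugal", txt))  # None
```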

The split() Function

The split() function returns a list where the string has been split at each match:

Example

Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)

You can control the number of splits by specifying the maxsplit parameter:

Example

Split the string only at the first occurrence:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

The sub() Function

The sub() function replaces the matches with the text of your choice:

Example

Replace every white-space character with the number 9:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)

You can control the number of replacements by specifying the count parameter:

Example

Replace the first 2 occurrences:

import re

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x)

Match Object

A Match Object is an object containing information about the search and the result.

Note: If there is no match, the value None will be returned, instead of the Match Object.

Example

Do a search that will return a Match Object:

import re

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) #this will print an object

The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.

Example

Print the position (start and end position) of the first match.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.span())

Example

Print the string passed into the function:

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.string)

Example

Print the part of the string where there was a match.

The regular expression looks for any word that starts with an upper case "S":

import re

txt = "The rain in Spain"
x = re.search(r"\bS\w+", txt)
print(x.group())

Note: If there is no match, the value None will be returned, instead of the Match Object.
