A regular expression (regex) is a sequence of characters that specifies a pattern to search for in text or data. Regular expressions are commonly used to search, edit, and manipulate text and data.
The Java regex API provides one interface and three classes in the java.util.regex package.
1. MatchResult interface
The MatchResult interface represents the result of a match operation. It contains query methods used to determine the results of a match against a regex.
2. Matcher class
It is the regex engine that interprets the pattern and performs match operations on a character sequence.
3. Pattern class
A Pattern object is a compiled representation of a regex.
4. PatternSyntaxException class
A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regex pattern.
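
The four types above usually appear together. The following is a minimal sketch of how they relate; the class name `RegexApiSketch` and the helper methods `firstMatch` and `isValid` are illustrative names chosen here, not part of the Java API.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RegexApiSketch {

    // Returns "text@start-end" for the first match, or "no match".
    static String firstMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);     // compiled representation of the regex
        Matcher matcher = pattern.matcher(input);     // engine bound to this input
        if (!matcher.find()) {
            return "no match";
        }
        MatchResult result = matcher.toMatchResult(); // snapshot of the current match
        return result.group() + "@" + result.start() + "-" + result.end();
    }

    // Returns true when the regex compiles without a syntax error.
    static boolean isValid(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Unchecked exception, so catching it is optional.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("\\d+", "order 42 shipped")); // 42@6-8
        System.out.println(isValid("[a-z"));                        // false (unclosed class)
    }
}
```

The end index reported by MatchResult is exclusive, which is why the two-digit match starting at index 6 ends at 8.
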
Predefined Character Classes
Construct | Description |
---|---|
. | Any character (may or may not match line terminators) |
\d | A digit: [0-9] |
\D | A non-digit: [^0-9] |
\s | A whitespace character: [ \t\n\x0B\f\r] |
\S | A non-whitespace character: [^\s] |
\w | A word character: [a-zA-Z_0-9] |
\W | A non-word character: [^\w] |
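
As a rough illustration of the predefined classes in the table above, the sketch below checks whole-input matches. The class name `PredefinedClassSketch` and the helper `matchesWhole` are hypothetical; the flag argument is passed straight to `Pattern.compile`. Note that each backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class PredefinedClassSketch {

    // True when the entire input matches the regex (no flags).
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Same check, but with the DOTALL flag optionally enabled.
    static boolean matchesWhole(String regex, String input, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("\\d\\d\\d", "123"));             // true
        System.out.println(matchesWhole("\\w+", "user_01"));              // true
        System.out.println(matchesWhole("\\w+", "user-01"));              // false: '-' is not a word character
        System.out.println(matchesWhole("\\S+\\s\\S+", "hello world"));   // true
        // "." skips line terminators unless DOTALL is set.
        System.out.println(matchesWhole(".", "\n", false));               // false
        System.out.println(matchesWhole(".", "\n", true));                // true
    }
}
```
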
Character Classes
Construct | Description |
---|---|
[abc] | a, b, or c (simple class) |
[^abc] | Any character except a, b, or c (negation) |
[a-zA-Z] | a through z, or A through Z, inclusive (range) |
[a-d[m-p]] | a through d, or m through p: [a-dm-p] (union) |
[a-z&&[def]] | d, e, or f (intersection) |
[a-z&&[^bc]] | a through z, except for b and c: [ad-z] (subtraction) |
[a-z&&[^m-p]] | a through z, and not m through p: [a-lq-z] (subtraction) |
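
To make the union, intersection, and subtraction rows above concrete, here is a small sketch that keeps only the characters of an input that a character class matches. `CharacterClassSketch` and `keepMatching` are made-up names for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassSketch {

    // Concatenates every character of the input that the character class matches.
    static String keepMatching(String charClass, String input) {
        Matcher matcher = Pattern.compile(charClass).matcher(input);
        StringBuilder kept = new StringBuilder();
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepMatching("[^abc]", "abcxyz"));           // xyz
        System.out.println(keepMatching("[a-d[m-p]]", "abcdefmnopqz")); // abcdmnop
        System.out.println(keepMatching("[a-z&&[def]]", "abcdefg"));    // def
        System.out.println(keepMatching("[a-z&&[^bc]]", "abcd"));       // ad
    }
}
```
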
Quantifiers:
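Quantifiers let you specify how many occurrences to match. Greedy quantifiers match as much as possible and give characters back if the rest of the pattern fails, reluctant quantifiers match as little as possible, and possessive quantifiers match as much as possible without giving anything back.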
Greedy | Reluctant | Possessive | Meaning |
---|---|---|---|
X? | X?? | X?+ | X, once or not at all |
X* | X*? | X*+ | X, zero or more times |
X+ | X+? | X++ | X, one or more times |
X{n} | X{n}? | X{n}+ | X, exactly n times |
X{n,} | X{n,}? | X{n,}+ | X, at least n times |
X{n,m} | X{n,m}? | X{n,m}+ | X, at least n but not more than m times |
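
The difference between the three columns is easiest to see on a single input. The sketch below applies a greedy, a reluctant, and a possessive form of the same pattern; `QuantifierSketch` and `firstMatch` are illustrative names, and the sample input is just one convenient choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierSketch {

    // Returns the text of the first match, or null when nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible.
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never backs off, so "foo" cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
        // Counted quantifiers follow the same rule.
        System.out.println(firstMatch("a{2,3}", "aaaa")); // aaa
        System.out.println(firstMatch("a{2,3}?", "aaaa")); // aa
    }
}
```
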
Boundary Matchers:
Boundary Construct | Description |
---|---|
^ | The beginning of a line |
$ | The end of a line |
\b | A word boundary |
\B | A non-word boundary |
\A | The beginning of the input |
\G | The end of the previous match |
\Z | The end of the input but for the final terminator, if any |
\z | The end of the input |
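
A few of the boundary matchers above behave differently depending on flags and trailing line terminators. The following sketch shows this under simple assumptions; `BoundarySketch`, `countMatches`, and `found` are hypothetical helper names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {

    // Counts non-overlapping matches, optionally with the MULTILINE flag.
    static int countMatches(String regex, String input, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Matcher matcher = Pattern.compile(regex, flags).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // True when the regex matches somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \b keeps "cat" from matching inside "concat".
        System.out.println(countMatches("\\bcat\\b", "cat concat cat.", false)); // 2
        // Without MULTILINE, ^ only matches at the start of the whole input.
        System.out.println(countMatches("^\\w+", "one\ntwo", false));            // 1
        System.out.println(countMatches("^\\w+", "one\ntwo", true));             // 2
        // \Z ignores a final line terminator; \z does not.
        System.out.println(found("end\\Z", "the end\n"));                        // true
        System.out.println(found("end\\z", "the end\n"));                        // false
    }
}
```
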
One of the important methods used with regex:
public String replaceAll(String replacement)
Declared in the Matcher class, this method replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
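
A short sketch of `Matcher.replaceAll` in use follows. The class name `ReplaceAllSketch` and its three helper methods are invented for this example. In the replacement string, `$1` and `$2` refer to capture groups, so a literal dollar sign needs `Matcher.quoteReplacement`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceAllSketch {

    // Collapses every run of whitespace into a single space.
    static String collapseWhitespace(String input) {
        Matcher matcher = Pattern.compile("\\s+").matcher(input);
        return matcher.replaceAll(" ");
    }

    // Swaps each pair of adjacent words using capture-group references.
    static String swapWords(String input) {
        return Pattern.compile("(\\w+) (\\w+)").matcher(input).replaceAll("$2 $1");
    }

    // Inserts the replacement literally, even if it contains '$' or '\'.
    static String replaceLiteral(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a  b\t\tc"));                  // a b c
        System.out.println(swapWords("John Smith"));                          // Smith John
        System.out.println(replaceLiteral("PRICE", "cost: PRICE", "$5"));     // cost: $5
    }
}
```
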
Examples:
Count how many times the word "cat" appears in the input string.

    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PatternMatchDemo {

        // \b on both sides matches "cat" only as a whole word.
        private static final String REGEX = "\\bcat\\b";

        public static void main(String[] args) {
            Scanner scanner = new Scanner(System.in);
            System.out.println("Please enter a string");
            String str = scanner.nextLine();

            Pattern pattern = Pattern.compile(REGEX);
            Matcher matcher = pattern.matcher(str);

            // Each successful find() advances to the next occurrence.
            int count = 0;
            while (matcher.find()) {
                count++;
            }

            System.out.println("Total matches of " + REGEX + " in string \"" + str + "\": " + count);
        }
    }