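Regular Expressions, Character Classes, and Quantifiers
Regular Expression
Concept
- A regular expression is a string that defines a pattern over character sequences.
- The pattern can be matched against another character sequence (the input) through a Matcher object.
- To work with regular expressions, Java provides two classes in the java.util.regex package: Pattern and Matcher.
The package contains two classes: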
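1. Pattern Class
Defines a pattern (used in a search).
It has no public constructor; obtain a Pattern instance through the static factory method Pattern.compile().
For example:
Pattern myPattern = Pattern.compile("expression");
This compiles the expression into a pattern.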
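2. Matcher Class
Performs the matching. A Matcher object is created from a Pattern object.
Once the Pattern object exists, a Matcher can be created for a given input.
For example:
// myPattern is the pattern defined above
Matcher myMatcher = myPattern.matcher("text to search");
Use the Pattern class to define the regular expression.
Use the Matcher class to match the pattern against other character sequences.
Once a Matcher object exists, its methods can be used to perform the match.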
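Putting the two classes together, the compile-then-match workflow described above can be sketched as follows. The class name `RegexBasics` and the helper `countMatches` are illustrative names, not part of java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBasics {
    // Counts how many times the regex occurs in the input text.
    static int countMatches(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compile the expression into a pattern
        Matcher matcher = pattern.matcher(input);  // bind the pattern to the text to search
        int count = 0;
        while (matcher.find()) {                   // each call moves to the next match
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("a", "banana")); // prints 3
    }
}
```

A Pattern can be reused to create any number of Matcher objects, so compiling once and matching many inputs is the usual arrangement.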
Methods of Matcher Class
Method | Description |
---|---|
boolean matches() | Attempts to match the entire input sequence against the pattern. |
boolean find() | Finds the next subsequence of the input that matches the pattern. |
Matcher appendReplacement(StringBuffer sb, String replacement) | Appends to the StringBuffer the input text between the previous match and the current match, followed by the replacement string in place of the current match. The input after the current match is not appended. |
StringBuffer appendTail(StringBuffer sb) | Appends the input remaining after the last match to the StringBuffer. |
String replaceAll(String replacement) | Replaces every subsequence of the input that matches the pattern. |
String replaceFirst(String replacement) | Replaces the first subsequence of the input that matches the pattern. |
String group() | Returns the subsequence matched by the previous match operation. |
int start(int group) | Returns the start index of the subsequence captured by the given group in the previous match. |
int end(int group) | Returns the index just past the last character of the subsequence captured by the given group. |
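Explanation:
- boolean matches()
Determines whether the input matches the pattern.
The entire sequence must match, not just a subsequence.
Pattern pat1 = Pattern.compile("Java");
Matcher mat1 = pat1.matcher("java");
System.out.println(mat1.matches()); // prints false: matching is case-sensitive by default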
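To make the whole-sequence requirement concrete, here is a small sketch that contrasts matches() with find() on the same input. The class `MatchesVsFind` and its two helpers are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    // True only if the entire input matches the regex.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if any subsequence of the input matches the regex.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("Java", "Java"));      // true
        System.out.println(wholeMatch("Java", "Java 17"));   // false: trailing text is left over
        System.out.println(partialMatch("Java", "Java 17")); // true: a subsequence matches
    }
}
```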
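- boolean find()
Returns true if some subsequence matches the pattern, false otherwise.
It can be called repeatedly to find all matches; each call starts where the previous match ended.
It is typically used together with the next method.
- String group()
Returns the subsequence matched by the previous match operation. Throws IllegalStateException if no match has been attempted yet or the previous match failed.
Pattern pattern = Pattern.compile(".+?is");
Matcher matcher = pattern.matcher("This is a this");
while (matcher.find()) {
    System.out.println(matcher.group());
}
/*
output (the second and third matches begin with a space):
This
 is
 a this
*/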
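The IllegalStateException mentioned for group() is easy to trigger by calling it before find(). The sketch below guards against that; `GroupDemo`, `firstMatch`, and `groupThrowsWithoutMatch` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns the first match, or null when the input contains none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null; // group() is only called after a successful find()
    }

    // Returns true if group() throws when no match has been attempted.
    static boolean groupThrowsWithoutMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        try {
            m.group(); // no find() or matches() has been called yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(".+?is", "This is a this"));              // This
        System.out.println(groupThrowsWithoutMatch(".+?is", "This is a this")); // true
    }
}
```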
- Matcher appendReplacement(StringBuffer sb, String replacement)
- StringBuffer appendTail(StringBuffer sb)
appendReplacement:
Appends to the StringBuffer the input text between the end of the previous match and the start of the current match, followed by the replacement string in place of the current match.
appendTail:
Appends the input remaining after the last match to the StringBuffer.
appendTail is used together with appendReplacement.
Pattern pat = Pattern.compile("Chang", Pattern.CASE_INSENSITIVE);
StringBuffer str = new StringBuffer();
Matcher mat = pat.matcher("Chang is a girl");
while (mat.find()) {
    mat.appendReplacement(str, "chang");
    System.out.println(str);
}
/*
output:
chang
*/
mat.appendTail(str);
System.out.println(str);
/*
output:
chang is a girl
*/
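The appendReplacement/appendTail pair is most useful when the replacement depends on the matched text, which replaceAll with a fixed string cannot express. A minimal sketch, assuming the goal is to double every integer in a sentence (`AppendDemo` and `doubleNumbers` are made-up names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendDemo {
    // Replaces every run of digits with twice its numeric value.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Copies the text since the previous match, then the computed replacement.
            m.appendReplacement(out, Integer.toString(doubled));
        }
        m.appendTail(out); // copies whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("2 apples and 10 pears")); // 4 apples and 20 pears
    }
}
```

If the replacement text may contain `$` or `\`, wrapping it in Matcher.quoteReplacement() avoids having those characters interpreted as group references or escapes.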
- String replaceAll(String replacement)
Replaces every match with the new string.
Pattern pat = Pattern.compile("Chang", Pattern.CASE_INSENSITIVE);
Matcher mat = pat.matcher("Chang is a chang");
// replaceAll
String str1 = mat.replaceAll("c");
System.out.println(str1);
/*
output:
c is a c
*/
- String replaceFirst(String replacement)
Replaces only the first match.
Pattern pat = Pattern.compile("Chang", Pattern.CASE_INSENSITIVE);
Matcher mat = pat.matcher("Chang is a chang");
// replaceFirst
String str1 = mat.replaceFirst("c");
System.out.println(str1);
/*
output:
c is a chang
*/
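- int start(int group)
Returns the start index, within the input, of the subsequence captured by the given group in the current match. The no-argument start() used below refers to the whole match.
- int end(int group)
Returns the index just past the last character captured by the given group in the current match. The no-argument end() refers to the whole match.
Pattern pat = Pattern.compile("Chang", Pattern.CASE_INSENSITIVE);
Matcher mat = pat.matcher("Chang is a chang");
while (mat.find()) {
    System.out.println("start:" + mat.start());
    System.out.println("end:" + mat.end());
}
/*
output:
start:0
end:5
start:11
end:16
*/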
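The example above uses the no-argument start() and end(). The group-indexed forms listed in the table only differ once the pattern contains capturing groups. The following sketch assumes a simple `name@host` pattern purely for illustration; `GroupIndexDemo` and `groupSpan` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Returns {start, end} of the given group in the first match, or null if nothing matches.
    static int[] groupSpan(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        return new int[] { m.start(group), m.end(group) };
    }

    public static void main(String[] args) {
        String regex = "(\\w+)@(\\w+)"; // group 1: text before '@', group 2: text after it
        String input = "mail: bob@example";
        int[] whole = groupSpan(regex, input, 0); // group 0 is the entire match
        int[] user = groupSpan(regex, input, 1);
        int[] host = groupSpan(regex, input, 2);
        System.out.println(whole[0] + "-" + whole[1]); // 6-17
        System.out.println(user[0] + "-" + user[1]);   // 6-9
        System.out.println(host[0] + "-" + host[1]);   // 10-17
    }
}
```

Because end() is exclusive, `input.substring(m.start(g), m.end(g))` yields the same text as `m.group(g)`.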
Methods of Pattern Class
Method | Description |
---|---|
String pattern() | Returns the regular expression from which this pattern was compiled. |
String[] split(CharSequence input, int limit) | Splits the input sequence around matches of the pattern, subject to the limit. |
- String[] split(CharSequence input, int limit)
Splits the given input sequence around matches of the pattern. A positive limit caps the number of pieces in the result.
Pattern mypattern = Pattern.compile(":");
// pattern() returns the regular expression this pattern was compiled from
String s = mypattern.pattern();
// split() divides the string around matches; the limit caps the number of pieces
String[] split = mypattern.split("one:two:three:four", 2);
for (String ele : split) {
    System.out.println("Element: " + ele);
}
/*
output:
Element: one
Element: two:three:four
*/
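The limit argument also accepts zero and negative values, which change how trailing empty strings are handled. The sketch below shows the three cases on an input that ends with separators; `SplitLimitDemo` and `splitOnColon` are illustrative names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitDemo {
    // Splits the input on ':' using the given limit.
    static String[] splitOnColon(String input, int limit) {
        return Pattern.compile(":").split(input, limit);
    }

    public static void main(String[] args) {
        String input = "a:b::";
        // Positive limit: at most that many pieces; the last piece holds the rest.
        System.out.println(Arrays.toString(splitOnColon(input, 2)));  // [a, b::]
        // Zero: no cap on pieces, trailing empty strings are discarded.
        System.out.println(Arrays.toString(splitOnColon(input, 0)));  // [a, b]
        // Negative: no cap on pieces, trailing empty strings are kept.
        System.out.println(Arrays.toString(splitOnColon(input, -1))); // [a, b, , ]
    }
}
```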
Character Classes
Character Class | Description | Example |
---|---|---|
[def] | Matches a single character that is d, e, or f (simple class). | Pattern.matches("[abc]at", "bat") // true; Pattern.matches("[mno]at", "rat") // false |
[^def] | Matches any single character except d, e, or f (negation). | Pattern.matches("[^pqr]", "a") // true; Pattern.matches("[^pqr]", "abcd") // false, the class matches one character only; Pattern.matches("[^pqr]", "p") // false |
[a-zA-Z] | Matches any single character from a to z or A to Z, inclusive (range). | Pattern.matches("[a-zA-Z]", "A") // true; Pattern.matches("[a-zA-Z]", "a") // true |
[b-e[n-q]] | Matches any single character from b to e or from n to q (union). | Pattern.matches("[d-j[m-q]]", "p") // true; Pattern.matches("[d-j[m-q]]", "x") // false |
[a-z&&[abc]] | Matches a, b, or c (intersection). | Pattern.matches("[a-z&&[abc]]", "a") // true; Pattern.matches("[a-z&&[abc]]", "g") // false |
[a-z&&[^bcd]] | Matches any single character from a to z except b, c, and d; equivalent to [ae-z] (subtraction). | Pattern.matches("[a-z&&[^bcd]]", "g") // true; Pattern.matches("[a-z&&[^bcd]]", "b") // false |
[a-z&&[^n-p]] | Matches any single character from a to z but not from n to p; equivalent to [a-mq-z] (subtraction). | Pattern.matches("[a-z&&[^c-e]]at", "mat") // true; Pattern.matches("[a-z&&[^c-e]]at", "cat") // false |
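A point the table relies on is that a character class consumes exactly one character, and that Pattern.matches() requires the whole input to match. The sketch below checks a few rows and shows that adding a quantifier lets a class cover several characters. `CharClassDemo` and `matches` are illustrative names.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // Thin wrapper around Pattern.matches, which tests the entire input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]at", "bat"));     // true: 'b' is in the class
        System.out.println(matches("[^pqr]", "abcd"));     // false: one class, four characters
        System.out.println(matches("[^pqr]+", "abcd"));    // true: '+' repeats the class
        System.out.println(matches("[d-j[m-q]]", "p"));    // true: union of two ranges
        System.out.println(matches("[a-z&&[^bcd]]", "b")); // false: 'b' is subtracted
    }
}
```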
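Quantifiers
A quantifier specifies how many times a character or group may occur.
There are three types:
- Greedy: matches the longest possible string that satisfies the pattern, backtracking if the rest of the pattern fails.
- Reluctant: matches the shortest possible string that satisfies the pattern.
- Possessive: matches as much as possible, like greedy, but never backtracks; if the rest of the pattern then fails, the whole match fails.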
Greedy | Reluctant | Possessive | Description |
---|---|---|---|
X? | X?? | X?+ | X occurs once or not at all |
X* | X*? | X*+ | X occurs zero or more times |
X+ | X+? | X++ | X occurs one or more times |
X{n} | X{n}? | X{n}+ | X occurs exactly n times |
X{n,} | X{n,}? | X{n,}+ | X occurs at least n times |
X{n,m} | X{n,m}? | X{n,m}+ | X occurs at least n but not more than m times |
For example, with a greedy quantifier:
Pattern pattern=Pattern.compile(".+is");
Matcher matcher=pattern.matcher("This is a this");
while(matcher.find()) {
System.out.println(matcher.group());
}
/*
output:
This is a this
*/
After changing the quantifier to reluctant:
Pattern pattern=Pattern.compile(".+?is");
Matcher matcher=pattern.matcher("This is a this");
while(matcher.find()) {
System.out.println(matcher.group());
}
/*
output (the second and third matches begin with a space):
This
 is
 a this
*/
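The two runs above cover greedy and reluctant matching. For completeness, the sketch below adds the possessive form on the same input: `.++` consumes the rest of the input and never gives any of it back, so the trailing `is` can never match. `QuantifierDemo` and `firstMatch` are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of the regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "This is a this";
        System.out.println(firstMatch(".+is", input));  // greedy: This is a this
        System.out.println(firstMatch(".+?is", input)); // reluctant: This
        System.out.println(firstMatch(".++is", input)); // possessive: null (no backtracking)
    }
}
```

Possessive quantifiers are mainly useful when backtracking could never produce a match anyway, in which case skipping it lets the matcher fail faster.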