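Java Regular Expression Match Flags Explained

For the Java regex syntax, see: Pattern (Java Platform SE 8 )

Difference between matches and find

matches: the entire input string must match the regex, much like a string equality check: "b".equals("b");

find: the input only needs to contain a substring that matches the regex, much like a containment check: "b".contains("b");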

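        // needs java.util.regex.Pattern and java.util.regex.Matcher
        String word = "my number is 188"; // matches=false, find=true
        String word1 = "188999"; // matches=false, find=true
        String word2 = "8"; // matches=true, find=true
        Pattern p = Pattern.compile("\\d");
        Matcher m = p.matcher(word2);
        System.out.println(m.matches()); // true
        m.reset(); // a successful matches() moves the search position, so rewind before find()
        System.out.println(m.find()); // true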
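
The comments on the three sample strings can be checked with a small sketch. The class name MatchesVsFind and its two helper methods are made up for this illustration. Each helper creates a fresh Matcher, so one call cannot shift the search position of the next.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // true only if the entire input is exactly one digit
    static boolean wholeInputMatches(String input) {
        return DIGIT.matcher(input).matches();
    }

    // true if a digit occurs anywhere in the input
    static boolean containsMatch(String input) {
        return DIGIT.matcher(input).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"my number is 188", "188999", "8"}) {
            System.out.println(s + " -> matches=" + wholeInputMatches(s)
                    + ", find=" + containsMatch(s));
        }
    }
}
```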

Default matching example

       String word = "ac\nab";
       Pattern p = Pattern.compile("^a.*");
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }
// Output
// ac

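The code above prints only ac; ab is not printed. By default, ^ and $ do not treat line terminators as line boundaries: ^ matches only at the very beginning of the input, and $ matches only at the very end (or just before a final line terminator). Only the first line can therefore satisfy ^a.*, and every later line is skipped. To anchor on each line, use multiline mode.

Without ^ and $, the pattern a.* finds a match on every line. If we want to keep the strict ^ and $ anchors and still match line by line, multiline mode is required.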
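
To see which positions ^ and $ accept when no flag is set, the following sketch runs three patterns over the same two-line input. DefaultAnchors and findAll are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DefaultAnchors {
    // Collects every match of regex in input, with no flags set.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String word = "ac\nab";
        System.out.println(findAll("a.*", word));   // [ac, ab]  no anchors: one match per line
        System.out.println(findAll("^a.*", word));  // [ac]      ^ matches only at the start of the input
        System.out.println(findAll("a.*$", word));  // [ab]      $ matches only at the end of the input
    }
}
```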

Multiline mode: MULTILINE

Multiline mode can be enabled in two ways.

The first is the embedded flag expression (?m):

       String word = "ac\nab";
       Pattern p = Pattern.compile("(?m)^a.*"); 
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }
// Output
// ac
// ab

The second is the flag argument Pattern.MULTILINE:

       String word = "ac\nab";
       Pattern p = Pattern.compile("^a.*", Pattern.MULTILINE);
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }

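Dot-all mode: DOTALL

In Java regex syntax the metacharacter . matches any character except line terminators. What if we want to match content that spans line terminators?

Multiline mode does not help here, because it changes only ^ and $, not what . matches.

Pattern.DOTALL, or the embedded flag expression (?s), makes . match every character, line terminators included: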

       String word = "run\nhad\noop";
//       Pattern p = Pattern.compile("h.*p", Pattern.DOTALL);
       Pattern p = Pattern.compile("(?s)h.*p");
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }
// Output (a single match that contains the newline)
// had
// oop
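
The claim that multiline mode does not help can be checked by running the same pattern under different flags. This is only a sketch; DotAndNewline and findAll are names invented for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotAndNewline {
    // Collects every match of regex in input under the given flags (0 means no flags).
    static List<String> findAll(String regex, int flags, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String word = "run\nhad\noop";
        // No flags: . stops at the newline, so h...p is never completed.
        System.out.println(findAll("h.*p", 0, word).size());                 // 0
        // MULTILINE changes only ^ and $, so the result is the same.
        System.out.println(findAll("h.*p", Pattern.MULTILINE, word).size()); // 0
        // DOTALL lets . cross the newline: one match spanning two lines.
        System.out.println(findAll("h.*p", Pattern.DOTALL, word).size());    // 1
    }
}
```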

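Combining modes: MULTILINE & DOTALL

Sometimes the matching rule is more complex and several modes have to be combined.

Take the following rule, for example:

That works well. Now suppose the requirement changes: ignoring line terminators, match only a string that starts with h and ends with p. Let's analyze:

MULTILINE alone does not work, because there are line terminators between h and p.

DOTALL alone does not work either: ^ and $ no longer recognize individual lines, and the input is treated as one large string.

So the only option is to combine MULTILINE and DOTALL: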

       String word = "run\nhad\noop\nhi\nspx";
//     Pattern p = Pattern.compile("(?ms)^h.*p$"); // embedded flag expression
       Pattern p = Pattern.compile("^h.*p$", Pattern.DOTALL | Pattern.MULTILINE);
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }
// Output (a single match that contains the newline)
// had
// oop
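
The analysis of the three flag combinations can be verified with a short sketch. CombinedFlags and findAll are hypothetical names; the input and pattern are the ones used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedFlags {
    static final String WORD = "run\nhad\noop\nhi\nspx";

    // Collects every match of ^h.*p$ in WORD under the given flags.
    static List<String> findAll(int flags) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("^h.*p$", flags).matcher(WORD);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // MULTILINE only: no single line both starts with h and ends with p.
        System.out.println(findAll(Pattern.MULTILINE).size());                  // 0
        // DOTALL only: ^ matches only at the start of the input, which is 'r'.
        System.out.println(findAll(Pattern.DOTALL).size());                     // 0
        // Both: ^ and $ work per line and . crosses the newline.
        System.out.println(findAll(Pattern.MULTILINE | Pattern.DOTALL).size()); // 1
    }
}
```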

Case-insensitive matching: CASE_INSENSITIVE

       String word = "cAt";
       Pattern p = Pattern.compile("(?i)cat");
//       Pattern p = Pattern.compile("cat", Pattern.CASE_INSENSITIVE);
       Matcher m = p.matcher(word);
       while (m.find()){
           System.out.println(m.group());
       }
// Output
// cAt

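Unix line endings: UNIX_LINES

In the default mode, \r, \n and the pair \r\n are all treated as line terminators (as are \u0085, \u2028 and \u2029):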

       String input= "This is the first line\r"
               + "This is the second line\r"
               + "This is the third line\r";
       Pattern p = Pattern.compile("^T.*e");
       Matcher m = p.matcher(input);
       while (m.find()){
           System.out.println("["+m.group()+"]");
       }
// Output
// [This is the first line]

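When UNIX_LINES is specified, only \n is recognized as a line terminator by ., ^ and $; every other line-terminator character is treated as an ordinary character:

       String input= "This is the first line\r"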
               + "This is the second line\r"
               + "This is the third line\r";
       Pattern p = Pattern.compile("^T.*e", Pattern.UNIX_LINES);
       Matcher m = p.matcher(input);
       while (m.find()){
           System.out.println("["+m.group()+"]");
       }
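// Output (as displayed by a terminal)
// This is the third line]

Note that \r is a carriage return: the terminal moves back to the start of the line and overwrites what was printed before. The single match actually spans all three segments, but only the last one remains visible.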
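
Because the carriage returns hide most of the output, it can help to print the match with each \r made visible. The sketch below does that; UnixLinesVisible and firstMatchEscaped are names invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnixLinesVisible {
    static final String INPUT = "This is the first line\r"
            + "This is the second line\r"
            + "This is the third line\r";

    // Returns the first match of ^T.*e, with every carriage return
    // replaced by the two visible characters backslash and r.
    static String firstMatchEscaped(int flags) {
        Matcher m = Pattern.compile("^T.*e", flags).matcher(INPUT);
        return m.find() ? m.group().replace("\r", "\\r") : "";
    }

    public static void main(String[] args) {
        // Default mode: . stops at the first carriage return.
        System.out.println("[" + firstMatchEscaped(0) + "]");
        // UNIX_LINES: the carriage return is an ordinary character, so . runs through it.
        System.out.println("[" + firstMatchEscaped(Pattern.UNIX_LINES) + "]");
    }
}
```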

Adding comments: COMMENTS

Explanatory comments can be embedded in the regex:

       String input= "abc\nbbc";
       Pattern p = Pattern.compile("a.*c # find text that starts with a and ends with c", Pattern.COMMENTS);
       Matcher m = p.matcher(input);
       while (m.find()){
           System.out.println("["+m.group()+"]");
       }
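
In COMMENTS mode, whitespace in the pattern is ignored and a comment runs from # to the end of the line, so a longer pattern can be laid out one piece per line. The following is a sketch with a made-up date pattern; CommentedPattern, DATE and parts are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedPattern {
    // Each line of the pattern ends with \n so that the # comment stops there.
    static final Pattern DATE = Pattern.compile(
            "(\\d{4})  # year\n"
          + "-         # separator\n"
          + "(\\d{2})  # month\n"
          + "-         # separator\n"
          + "(\\d{2})  # day",
            Pattern.COMMENTS);

    // Returns {year, month, day}, or null if the whole input is not a date.
    static String[] parts(String input) {
        Matcher m = DATE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(String.join("/", parts("2024-01-15"))); // 2024/01/15
        System.out.println(parts("2024 01 15") == null);           // true
    }
}
```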

Literal parsing mode: LITERAL

       String input= "abc\nbbc";
       // Only CASE_INSENSITIVE and UNICODE_CASE keep their effect when combined with LITERAL
       Pattern p = Pattern.compile("a.*", Pattern.LITERAL);
       Matcher m = p.matcher(input);
       while (m.find()){
           System.out.println("["+m.group()+"]");
       }
// Prints nothing: every metacharacter is treated as an ordinary character
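
To make the literal behaviour visible, the sketch below searches an input that really contains the three characters a.* and compares the flag with Pattern.quote, which produces an equivalent literal pattern. LiteralDemo and findAll are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralDemo {
    // Collects every match of an already compiled pattern in input.
    static List<String> findAll(Pattern p, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern literal = Pattern.compile("a.*", Pattern.LITERAL);
        Pattern quoted = Pattern.compile(Pattern.quote("a.*"));

        System.out.println(findAll(literal, "abc\nbbc"));    // []     the text a.* does not occur
        System.out.println(findAll(literal, "abc a.* xyz")); // [a.*]  found as plain text
        System.out.println(findAll(quoted, "abc a.* xyz"));  // [a.*]  same result via Pattern.quote
    }
}
```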

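Case-insensitive matching beyond ASCII: UNICODE_CASE

By default, case-insensitive matching covers only US-ASCII characters. For non-ASCII characters, UNICODE_CASE has to be combined with CASE_INSENSITIVE: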

       String input= "À";
       Pattern p = Pattern.compile("à", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
       Matcher m = p.matcher(input);
       while (m.find()){
           System.out.println("["+m.group()+"]");
       }
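
A short comparison shows what happens when UNICODE_CASE is left out. This is a sketch only; UnicodeCaseDemo and its matches helper are invented names, and the characters are written as Unicode escapes so the source file stays ASCII.

```java
import java.util.regex.Pattern;

class UnicodeCaseDemo {
    static final String LOWER = "\u00E0"; // a with grave accent, lowercase
    static final String UPPER = "\u00C0"; // A with grave accent, uppercase

    // true if the whole input matches regex under the given flags
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // ASCII letters: CASE_INSENSITIVE alone is enough.
        System.out.println(matches("CAT", Pattern.CASE_INSENSITIVE, "cat"));  // true
        // Non-ASCII letters: CASE_INSENSITIVE alone does not fold case.
        System.out.println(matches(LOWER, Pattern.CASE_INSENSITIVE, UPPER));  // false
        // Adding UNICODE_CASE makes the pair match.
        System.out.println(matches(LOWER,
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, UPPER));     // true
        // The embedded form of the same two flags is (?iu).
        System.out.println(matches("(?iu)" + LOWER, 0, UPPER));               // true
    }
}
```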

UNICODE_CHARACTER_CLASS mode

With this mode enabled, the predefined and POSIX character classes follow their Unicode definitions:

Classes	Matches
\p{Lower}	A lowercase character:\p{IsLowercase}
\p{Upper}	An uppercase character:\p{IsUppercase}
\p{ASCII}	All ASCII:[\x00-\x7F]
\p{Alpha}	An alphabetic character:\p{IsAlphabetic}
\p{Digit}	A decimal digit character:\p{IsDigit}
\p{Alnum}	An alphanumeric character:[\p{IsAlphabetic}\p{IsDigit}]
\p{Punct}	A punctuation character:\p{IsPunctuation}
\p{Graph}	A visible character: [^\p{IsWhite_Space}\p{gc=Cc}\p{gc=Cs}\p{gc=Cn}]
\p{Print}	A printable character: [\p{Graph}\p{Blank}&&[^\p{Cntrl}]]
\p{Blank}	A space or a tab: [\p{IsWhite_Space}&&[^\p{gc=Zl}\p{gc=Zp}\x0a\x0b\x0c\x0d\x85]]
\p{Cntrl}	A control character: \p{gc=Cc}
\p{XDigit}	A hexadecimal digit: [\p{gc=Nd}\p{IsHex_Digit}]
\p{Space}	A whitespace character:\p{IsWhite_Space}
\d	A digit: \p{IsDigit}
\D	A non-digit: [^\d]
\s	A whitespace character: \p{IsWhite_Space}
\S	A non-whitespace character: [^\s]
\w	A word character: [\p{Alpha}\p{gc=Mn}\p{gc=Me}\p{gc=Mc}\p{Digit}\p{gc=Pc}\p{IsJoin_Control}]
\W	A non-word character: [^\w]

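These classes use the Unicode property definitions; see: UTS #18: Unicode Regular Expressions

Search that document for the keyword punctuation to see how punctuation is defined under Unicode. Without the flag, \p{Punct} is the ASCII set of keyboard symbols that are neither digits nor letters: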

       Pattern p = Pattern.compile("\\p{Punct}");
       Matcher m = p.matcher("`");
       System.out.println(m.matches()); // returns true
       
       Pattern p1 = Pattern.compile("\\p{Punct}", Pattern.UNICODE_CHARACTER_CLASS);
       Matcher m1 = p1.matcher("`");
       System.out.println(m1.matches()); // returns false

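Note that the second call does not match. With UNICODE_CHARACTER_CLASS enabled, \p{Punct} means \p{IsPunctuation}, and the backtick belongs to the Unicode category Sk (modifier symbol), not to a punctuation category.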
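
The flag also widens \w and \d, as the table above indicates. The sketch below checks this with one accented word and one non-ASCII digit. UnicodeClassDemo and matches are hypothetical names, and the test characters are written as Unicode escapes.

```java
import java.util.regex.Pattern;

class UnicodeClassDemo {
    static final String WORD = "caf\u00E9"; // "cafe" ending in e with acute accent
    static final String DIGIT = "\u0663";   // ARABIC-INDIC DIGIT THREE

    // true if the whole input matches regex under the given flags
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Default: \w is [a-zA-Z_0-9], so the accented letter is rejected.
        System.out.println(matches("\\w+", 0, WORD));                               // false
        System.out.println(matches("\\w+", Pattern.UNICODE_CHARACTER_CLASS, WORD)); // true
        // Default: \d is [0-9]; (?U) is the embedded form of the flag.
        System.out.println(matches("\\d", 0, DIGIT));                               // false
        System.out.println(matches("(?U)\\d", 0, DIGIT));                           // true
    }
}
```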

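Canonical equivalence: CANON_EQ mode

This mode is generally used with Unicode characters. For example:

“◌̇” U+0307 Combining Dot Above Unicode Character

The Unicode character U+0307 is a dot placed above a letter, as in ḃ

b followed by \u0307 composes ḃ, and ḃ also has its own precomposed Unicode character: \u1E03

In other words, b\u0307 = \u1E03

For a Java regex to treat the two forms as equivalent, the pattern must be compiled with CANON_EQ: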

        String regex = "b\u0307";
        System.out.println(regex);
        System.out.println("\u1E03");
        Pattern pattern = Pattern.compile(regex, Pattern.CANON_EQ);
        Matcher matcher = pattern.matcher("\u1E03");
        if(matcher.matches()) {
            System.out.println("Match found");
        } else {
            System.out.println("Match not found");
        }
// Output
// ḃ
// ḃ
// Match found
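
For comparison, the sketch below runs the same match with and without the flag. It also shows a different technique that is not part of the regex API: normalizing both strings to NFC with java.text.Normalizer before comparing them. CanonEqDemo, matches and nfc are names invented for this illustration.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class CanonEqDemo {
    static final String DECOMPOSED = "b\u0307";  // b followed by COMBINING DOT ABOVE
    static final String PRECOMPOSED = "\u1E03";  // LATIN SMALL LETTER B WITH DOT ABOVE

    // Matches the decomposed pattern against the precomposed input.
    static boolean matches(int flags) {
        return Pattern.compile(DECOMPOSED, flags).matcher(PRECOMPOSED).matches();
    }

    // Alternative without regex flags: bring a string into NFC form.
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(matches(0));                           // false: the code points differ
        System.out.println(matches(Pattern.CANON_EQ));            // true: canonically equivalent
        System.out.println(nfc(DECOMPOSED).equals(PRECOMPOSED));  // true: NFC composes the pair
    }
}
```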
