tag:blogger.com,1999:blog-18755363961715981762023-06-20T06:05:46.464-07:00Java Regular ExpressionJava Regular Expression - Examples, Resources, Tutorials Blogisharahttp://www.blogger.com/profile/13758237633809870651noreply@blogger.comBlogger4125tag:blogger.com,1999:blog-1875536396171598176.post-4926852147909587582008-05-25T10:12:00.000-07:002008-05-25T10:17:09.921-07:00Address: ZIP code (US) Regular Expression<script type="text/javascript"><!-- function regexbuddyhighlight(regexoffset) { document.getElementById("regex" + regexoffset).style.background = "#FFFF00"; document.getElementById("tree" + regexoffset).style.background = "#FFFF00"; } function regexbuddyclear(regexoffset) { document.getElementById("regex" + regexoffset).style.background = "white"; document.getElementById("tree" + regexoffset).style.background = "white"; } function regexbuddyhighlightdup(regexoffset) { document.getElementById("regex" + regexoffset).style.background = "#FFFF00"; document.getElementById("tree" + regexoffset + "dup").style.background = "#FFFF00"; } function regexbuddycleardup(regexoffset) { document.getElementById("regex" + regexoffset).style.background = "white"; document.getElementById("tree" + regexoffset + "dup").style.background = "white"; } --></script><br /><style type="text/css"><!-- body { margin: 0; padding: 0; background-color: white; color: black; height: 100%; } div.regexbuddyregex { margin: 10pt; } div.regexbuddytree { margin: 10pt; overflow-y: scroll; height: 70%; } div.regexbuddyfooter { margin: 10pt; } --></style><br /><br /><br /><div class="regexbuddyregex"><h1>Address: ZIP code (US)</h1><br /><tt class="regexbuddy"><span id="regex0" onmouseover="regexbuddyhighlight('0')" onmouseout="regexbuddyclear('0')">\b</span><span id="regex4" onmouseover="regexbuddyhighlight('4')" onmouseout="regexbuddyclear('4')">[0-9]</span><span id="regex14" onmouseover="regexbuddyhighlight('14')" onmouseout="regexbuddyclear('14')">{5}</span><span id="regex20" 
onmouseover="regexbuddyhighlight('20')" onmouseout="regexbuddyclear('20')">(?:</span><span id="regex26" onmouseover="regexbuddyhighlight('26')" onmouseout="regexbuddyclear('26')">-</span><span id="regex28" onmouseover="regexbuddyhighlight('28')" onmouseout="regexbuddyclear('28')">[0-9]</span><span id="regex38" onmouseover="regexbuddyhighlight('38')" onmouseout="regexbuddyclear('38')">{4}</span><span onmouseover="regexbuddyhighlight('20')" onmouseout="regexbuddyclear('20')">)</span><span id="regex46" onmouseover="regexbuddyhighlight('46')" onmouseout="regexbuddyclear('46')">?</span><span id="regex48" onmouseover="regexbuddyhighlight('48')" onmouseout="regexbuddyclear('48')">\b</span></tt><br /><br /></div><br /><div class="regexbuddytree"><ul class="regexbuddy"><li class="regexanchor"><span id="tree0" onmouseover="regexbuddyhighlight('0')" onmouseout="regexbuddyclear('0')">Assert position at a word boundary</span></li><li class="regexcharclassrange"><span id="tree4" onmouseover="regexbuddyhighlight('4')" onmouseout="regexbuddyclear('4')">Match a single character in the range between "0" and "9"</span><ul class="regexbuddy"><li class="regexrepeat"><span id="tree14" onmouseover="regexbuddyhighlight('14')" onmouseout="regexbuddyclear('14')">Exactly 5 times</span></li></ul></li><li class="regexgroup"><span id="tree20" onmouseover="regexbuddyhighlight('20')" onmouseout="regexbuddyclear('20')">Match the regular expression below</span><ul class="regexbuddy"><li class="regexrepeat"><span id="tree46" onmouseover="regexbuddyhighlight('46')" onmouseout="regexbuddyclear('46')">Between zero and one times, as many times as possible, giving back as needed (greedy)</span></li><li class="regexliteral"><span id="tree26" onmouseover="regexbuddyhighlight('26')" onmouseout="regexbuddyclear('26')">Match the character "-" literally</span></li><li class="regexcharclassrange"><span id="tree28" onmouseover="regexbuddyhighlight('28')" onmouseout="regexbuddyclear('28')">Match a single 
character in the range between "0" and "9"</span><ul class="regexbuddy"><li class="regexrepeat"><span id="tree38" onmouseover="regexbuddyhighlight('38')" onmouseout="regexbuddyclear('38')">Exactly 4 times</span></li></ul></li></ul></li><li class="regexanchor"><span id="tree48" onmouseover="regexbuddyhighlight('48')" onmouseout="regexbuddyclear('48')">Assert position at a word boundary</span></li></ul></div><br /><br /><div class="regexbuddyfooter"><p>Created with <a href="http://www.regexbuddy.com/">RegexBuddy</a></p></div>isharahttp://www.blogger.com/profile/13758237633809870651noreply@blogger.com0tag:blogger.com,1999:blog-1875536396171598176.post-68093129381987874452008-04-05T22:37:00.000-07:002008-04-05T22:39:54.098-07:00Java regular expression email validations<table bg cellpadding="5" cellspacing="0" cols="1" width="100%" style="color:#c0c0c0;"> <tbody><tr><td><center> <p> <span style="font-family:Arial, Helvetica;color:#000000;"> JavaRegxEmailValidations.java</span> </p> </center></td></tr></tbody></table> <pre><a name="l1"><span class="ln">1 </span></a><span class="s0">import </span><span class="s1">java.util.regex.Matcher;<br /><a name="l2"><span class="ln">2 </span></a></span><span class="s0">import </span><span class="s1">java.util.regex.Pattern;<br /><a name="l3"><span class="ln">3 </span></a></span><span class="s0">import </span><span class="s1">java.util.regex.PatternSyntaxException;<br /><a name="l4"><span class="ln">4 </span></a><br /><a name="l5"><span class="ln">5 </span></a></span><span class="s2">/**<br /><a name="l6"><span class="ln">6 </span></a> * Created by IntelliJ IDEA.<br /><a name="l7"><span class="ln">7 </span></a> * User: Ishara Samantha<br /><a name="l8"><span class="ln">8 </span></a> * Date: Apr 6, 2008<br /><a name="l9"><span class="ln">9 </span></a> * Time: 10:34:06 AM<br /><a name="l10"><span class="ln">10 </span></a> * To change this template use File | Settings | File Templates.<br /><a name="l11"><span class="ln">11 </span></a> 
*/</span><span class="s1"><br /><a name="l12"><span class="ln">12 </span></a></span><span class="s0">public class </span><span class="s1">JavaRegxEmailValidations<br /><a name="l13"><span class="ln">13 </span></a>{<br /><a name="l14"><span class="ln">14 </span></a><br /><a name="l15"><span class="ln">15 </span></a></span><span class="s2">// \b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b</span><span class="s1"><br /><a name="l16"><span class="ln">16 </span></a></span><span class="s2">// </span><span class="s1"><br /><a name="l17"><span class="ln">17 </span></a></span><span class="s2">// Options: case insensitive</span><span class="s1"><br /><a name="l18"><span class="ln">18 </span></a></span><span class="s2">// </span><span class="s1"><br /><a name="l19"><span class="ln">19 </span></a></span><span class="s2">// Assert position at a word boundary «\b»</span><span class="s1"><br /><a name="l20"><span class="ln">20 </span></a></span><span class="s2">// Match a single character present in the list below «[A-Z0-9._%+-]+»</span><span class="s1"><br /><a name="l21"><span class="ln">21 </span></a></span><span class="s2">// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»</span><span class="s1"><br /><a name="l22"><span class="ln">22 </span></a></span><span class="s2">// A character in the range between “A” and “Z” «A-Z»</span><span class="s1"><br /><a name="l23"><span class="ln">23 </span></a></span><span class="s2">// A character in the range between “0” and “9” «0-9»</span><span class="s1"><br /><a name="l24"><span class="ln">24 </span></a></span><span class="s2">// One of the characters “._%” «._%»</span><span class="s1"><br /><a name="l25"><span class="ln">25 </span></a></span><span class="s2">// The character “+” «+»</span><span class="s1"><br /><a name="l26"><span class="ln">26 </span></a></span><span class="s2">// The character “-” «-»</span><span class="s1"><br /><a name="l27"><span class="ln">27 </span></a></span><span 
class="s2">// Match the character “@” literally «@»</span><span class="s1"><br /><a name="l28"><span class="ln">28 </span></a></span><span class="s2">// Match a single character present in the list below «[A-Z0-9.-]+»</span><span class="s1"><br /><a name="l29"><span class="ln">29 </span></a></span><span class="s2">// Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»</span><span class="s1"><br /><a name="l30"><span class="ln">30 </span></a></span><span class="s2">// A character in the range between “A” and “Z” «A-Z»</span><span class="s1"><br /><a name="l31"><span class="ln">31 </span></a></span><span class="s2">// A character in the range between “0” and “9” «0-9»</span><span class="s1"><br /><a name="l32"><span class="ln">32 </span></a></span><span class="s2">// The character “.” «.»</span><span class="s1"><br /><a name="l33"><span class="ln">33 </span></a></span><span class="s2">// The character “-” «-»</span><span class="s1"><br /><a name="l34"><span class="ln">34 </span></a></span><span class="s2">// Match the character “.” literally «\.»</span><span class="s1"><br /><a name="l35"><span class="ln">35 </span></a></span><span class="s2">// Match a single character in the range between “A” and “Z” «[A-Z]{2,4}»</span><span class="s1"><br /><a name="l36"><span class="ln">36 </span></a></span><span class="s2">// Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}»</span><span class="s1"><br /><a name="l37"><span class="ln">37 </span></a></span><span class="s2">// Assert position at a word boundary «\b»</span><span class="s1"><br /><a name="l38"><span class="ln">38 </span></a> <br /><a name="l39"><span class="ln">39 </span></a> </span><span class="s0">public static void </span><span class="s1">main(String[] args)<br /><a name="l40"><span class="ln">40 </span></a> {<br /><a name="l41"><span class="ln">41 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span 
class="s0">\"</span><span class="s3">ishara@gmail.com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"ishara@gmail.com"</span><span class="s1">));<br /><a name="l42"><span class="ln">42 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">ip@1.2.3.123</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"ip@1.2.3.123"</span><span class="s1">));<br /><a name="l43"><span class="ln">43 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">pharaoh@egyptian.museum</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"pharaoh@egyptian.museum"</span><span class="s1">));<br /><a name="l44"><span class="ln">44 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">john.doe+regexbuddy@gmail.com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"john.doe+regexbuddy@gmail.com"</span><span class="s1">));<br /><a name="l45"><span class="ln">45 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">Mike.O'Dell@ireland.com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"Mike.O'Dell@ireland.com"</span><span class="s1">));<br /><a name="l46"><span class="ln">46 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"\\\"</span><span class="s3">Mike</span><span class="s0">\\\\</span><span class="s3"> O'Dell</span><span class="s0">\\\"</span><span class="s3">@ireland.com</span><span class="s0">\"</span><span class="s3">) = " 
</span><span class="s1">+ emailValidation(</span><span class="s3">"</span><span class="s0">\"</span><span class="s3">Mike</span><span class="s0">\\</span><span class="s3"> O'Dell</span><span class="s0">\"</span><span class="s3">@ireland.com"</span><span class="s1">));<br /><a name="l47"><span class="ln">47 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">IPguy@[1.2.3.4]</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"IPguy@[1.2.3.4]"</span><span class="s1">));<br /><a name="l48"><span class="ln">48 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">ishara.samantha@gmail.com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"ishara.samantha@gmail.com"</span><span class="s1">));<br /><a name="l49"><span class="ln">49 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">ishara@ac.lk</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"ishara@ac.lk"</span><span class="s1">));<br /><a name="l50"><span class="ln">50 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">1024x768@60Hz</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"1024x768@60Hz"</span><span class="s1">));<br /><a name="l51"><span class="ln">51 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">not.a.valid.email</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"not.a.valid.email"</span><span 
class="s1">));<br /><a name="l52"><span class="ln">52 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">not@valid.email</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"not@valid.email"</span><span class="s1">));<br /><a name="l53"><span class="ln">53 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">john@aol...com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"john@aol...com"</span><span class="s1">));<br /><a name="l54"><span class="ln">54 </span></a> System.out.println(</span><span class="s3">"emailValidation(</span><span class="s0">\"</span><span class="s3">Mike</span><span class="s0">\\\\</span><span class="s3"> O'Dell@ireland.com</span><span class="s0">\"</span><span class="s3">) = " </span><span class="s1">+ emailValidation(</span><span class="s3">"Mike</span><span class="s0">\\</span><span class="s3"> O'Dell@ireland.com"</span><span class="s1">));<br /><a name="l55"><span class="ln">55 </span></a><br /><a name="l56"><span class="ln">56 </span></a> }<br /><a name="l57"><span class="ln">57 </span></a><br /><a name="l58"><span class="ln">58 </span></a> </span><span class="s0">private static boolean </span><span class="s1">emailValidation(String email)<br /><a name="l59"><span class="ln">59 </span></a> {<br /><a name="l60"><span class="ln">60 </span></a> </span><span class="s0">boolean </span><span class="s1">foundMatch = </span><span class="s0">false</span><span class="s1">;<br /><a name="l61"><span class="ln">61 </span></a> </span><span class="s0">try </span><span class="s1">{<br /><a name="l62"><span class="ln">62 </span></a> Pattern regex = Pattern.compile(</span><span class="s3">"</span><span class="s0">\\</span><span class="s3">b[A-Z0-9._%+-]+@[A-Z0-9.-]+</span><span 
class="s0">\\</span><span class="s3">.[A-Z]{2,4}</span><span class="s0">\\</span><span class="s3">b"</span><span class="s1">, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /><a name="l63"><span class="ln">63 </span></a> Matcher regexMatcher = regex.matcher(email);<br /><a name="l64"><span class="ln">64 </span></a> foundMatch = regexMatcher.find();<br /><a name="l65"><span class="ln">65 </span></a> } </span><span class="s0">catch </span><span class="s1">(PatternSyntaxException ex) {<br /><a name="l66"><span class="ln">66 </span></a> </span><span class="s2">// Syntax error in the regular expression</span><span class="s1"><br /><a name="l67"><span class="ln">67 </span></a> }<br /><a name="l68"><span class="ln">68 </span></a> </span><span class="s0">return </span><span class="s1">foundMatch;<br /><a name="l69"><span class="ln">69 </span></a> }<br /><a name="l70"><span class="ln">70 </span></a>}<br /><a name="l71"><span class="ln">71 </span></a></span></pre> <pre>Email address<br />Use this version to seek out email addresses in random documents and texts.<br />Does not match email addresses using an IP address instead of a domain name.<br />Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum. 
Including these increases the risk of false positives when applying the regex to random documents.<br />Requires the "case insensitive" option to be ON.</pre> <pre><span class="ln">emailValidation("ishara@gmail.com") = true<br />emailValidation("ip@1.2.3.123") = false<br />emailValidation("pharaoh@egyptian.museum") = false<br />emailValidation("john.doe+regexbuddy@gmail.com") = true<br />emailValidation("Mike.O'Dell@ireland.com") = true<br />emailValidation("\"Mike\\ O'Dell\"@ireland.com") = false<br />emailValidation("IPguy@[1.2.3.4]") = false<br />emailValidation("ishara.samantha@gmail.com") = true<br />emailValidation("ishara@ac.lk") = true<br />emailValidation("1024x768@60Hz") = false<br />emailValidation("not.a.valid.email") = false<br />emailValidation("not@valid.email") = false<br />emailValidation("john@aol...com") = true<br />emailValidation("Mike\\ O'Dell@ireland.com") = true</span></pre> <pre><span class="ln">More email validation patterns follow:</span></pre> <ol><li> <pre><span class="ln">Email address (anchored)<br />Use this anchored version to check if a valid email address was entered.<br />Does not match email addresses using an IP address instead of a domain name.<br />Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum.<br />Requires the "case insensitive" option to be ON.</span></pre> <ul><li> <pre><span class="ln">^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$</span></pre> </li></ul> </li><li> <pre>Email address (anchored; no consecutive dots)<br />Use this anchored version to check if a valid email address was entered.<br />Improves on the original email address regex by excluding addresses with consecutive dots such as john@aol...com<br />Does not match email addresses using an IP address instead of a domain name.<br />Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum. 
Including these increases the risk of false positives when applying the regex to random documents.<br />Requires the "case insensitive" option to be ON.</pre> <ul><li> <pre><span class="ln">^[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}$</span></pre> </li></ul> </li><li> <pre><span class="ln">Email address (no consecutive dots)<br />Use this version to seek out email addresses in random documents and texts.<br />Improves on the original email address regex by excluding addresses with consecutive dots such as john@aol...com<br />Does not match email addresses using an IP address instead of a domain name.<br />Does not match email addresses on new-fangled top-level domains with more than 4 letters such as .museum. Including these increases the risk of false positives when applying the regex to random documents.<br />Requires the "case insensitive" option to be ON.</span></pre> <ul><li> <pre><span class="ln">\b[A-Z0-9._%+-]+@(?:[A-Z0-9-]+\.)+[A-Z]{2,4}\b</span></pre> </li></ul> </li><li> <pre>Email address (specific TLDs)<br />Does not match email addresses using an IP address instead of a domain name.<br />Matches all country code top level domains, and specific common top level domains.<br />Requires the "case insensitive" option to be ON.</pre> <ul><li> <pre><span class="ln">^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.(?:com|org|net|gov|mil|biz|info|name|aero|biz|info|mobi|jobs|museum|[A-Z]{2})$</span></pre> </li></ul> </li><li> <pre><span class="ln">Email address: RFC 2822<br />This regular expression implements the official RFC 2822 standard for email addresses. Using this regular expression in actual applications is NOT recommended. 
It is shown to illustrate that with regular expressions there's always a trade-off between what's exact and what's practical.</span></pre> <ul><li> <pre><span class="ln">(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])</span></pre> </li></ul> </li><li> <pre><span class="ln">Email address: RFC 2822 (simplified)<br />Matches a normal email address. Does not check the top-level domain.<br />Requires the "case insensitive" option to be ON.</span></pre> <ul><li> <pre><span class="ln">[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?</span></pre> </li></ul> </li><li> <pre><span class="ln">Email address: RFC 2822 (specific TLDs)<br />Matches all country code top level domains, and specific common top level domains.<br />Requires the "case insensitive" option to be ON.</span></pre> <ul><li> <pre><span class="ln">[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|name|aero|biz|info|mobi|jobs|museum)\b</span></pre> </li></ul> </li></ol> <pre> </pre> <pre> </pre> <pre> </pre> <pre> </pre> <pre> </pre> <pre> </pre>isharahttp://www.blogger.com/profile/13758237633809870651noreply@blogger.com0tag:blogger.com,1999:blog-1875536396171598176.post-27861709664285309532008-04-05T10:11:00.000-07:002008-04-05T10:18:10.248-07:00Java Regular Expression Samples<ul><li>Check if the regex matches a string entirely</li><li>IF/else branch whether the regx matches a sring entirely</li><li>Create an object to use the same regx for many 
operations</li><li>Create an object to apply a regex repeatedly to a given string</li><li>Use regex object to test if (part of) a string can be matched</li><li>Use regex object to test if a string can be matched entirely</li><li>Use regex object to get the part of a string matched by the regex</li><li>Use regex object to get the part of a string matched by a numbered group</li><li>Use regex object to get a list of all text matched by a numbered group</li><li>Iterate over all matches in a string</li><li>Iterate over all matches and capturing groups in a string</li></ul>import java.util.ArrayList;<br />import java.util.List;<br />import java.util.regex.Matcher;<br />import java.util.regex.Pattern;<br />import java.util.regex.PatternSyntaxException;<br /><br />/**<br /> * Created by IntelliJ IDEA.<br /> * User: Ishara Samantha<br /> * Date: Apr 5, 2008<br /> * Time: 8:46:45 PM<br /> * To change this template use File | Settings | File Templates.<br /> */<br />public class JavaRegX<br />{<br /> private static String subjectString;<br /> private static String subjectString1;<br /> private static String anotherSubjectString;<br /><br /> public static void main(String[] args)<br /> {<br /> subjectString = "ishara@hoofoo.net";<br /> anotherSubjectString = "test@hoofoo.net";<br /> subjectString1 = subjectString + "," + anotherSubjectString;<br /><br /> //Check if the regex matches a string entirely<br /> try<br /> {<br /> boolean foundMatch = subjectString.matches("(?i)\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b");<br /> System.out.println("foundMatch = " + foundMatch);<br /> } catch (PatternSyntaxException ex)<br /> {<br /> ex.printStackTrace();<br /> }<br /><br /> //If/else branch on whether the regex matches a string entirely<br /> try<br /> {<br /> if (subjectString.matches("(?i)\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b"))<br /> {<br /> System.out.println("Match");<br /> } else<br /> {<br /> System.out.println("Match failed");<br /> }<br /> } catch (PatternSyntaxException 
ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString);<br /><br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> try<br /> {<br /> //Create an object to use the same regex for many operations<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> //Create an object to apply a regex repeatedly to a given string<br /> Matcher regexMatcher = regex.matcher(subjectString);<br /> //Apply the same regex to more than one string<br /> regexMatcher.reset(anotherSubjectString);<br /><br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> //Use regex object to test if (part of) a string can be matched<br /> boolean foundMatch = false;<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString1);<br /> foundMatch = regexMatcher.find();<br /> System.out.println("regexMatcher = " + regexMatcher);<br /> System.out.println("foundMatch = " + foundMatch);<br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> //Use regex object to test if a string can be matched entirely<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString);<br /> foundMatch = regexMatcher.matches();<br /> System.out.println("Use regex object to test if a string can be matched entirely");<br /> 
System.out.println("regexMatcher = " + regexMatcher);<br /> System.out.println("foundMatch = " + foundMatch);<br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> //Use regex object to get the part of a string matched by the regex<br /> //Use regex object to get the part of a string matched by a numbered group<br /> String ResultString = null;<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString1);<br /> if (regexMatcher.find())<br /> {<br /> ResultString = regexMatcher.group(0);<br /> System.out.println("ResultString = " + ResultString);<br /> }<br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /> //Use regex object to get a list of all text matched by a numbered group<br /> List<String> matchList = new ArrayList<String>();<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString1);<br /> while (regexMatcher.find())<br /> {<br /> matchList.add(regexMatcher.group(0));<br /> }<br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /> System.out.println("matchList.size() = " + matchList.size());<br /><br /> //Iterate over all matches in a string<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString);<br /> while (regexMatcher.find())<br /> {<br /> // matched text: regexMatcher.group()<br /> // match start: regexMatcher.start()<br /> // match end: regexMatcher.end()<br /> }<br /> } catch (PatternSyntaxException ex)<br /> 
{<br /> // Syntax error in the regular expression<br /> }<br /><br /> //Iterate over all matches and capturing groups in a string<br /> try<br /> {<br /> Pattern regex = Pattern.compile("\\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);<br /> Matcher regexMatcher = regex.matcher(subjectString);<br /> while (regexMatcher.find())<br /> {<br /> for (int i = 1; i <= regexMatcher.groupCount(); i++)<br /> {<br /> // matched text: regexMatcher.group(i)<br /> // match start: regexMatcher.start(i)<br /> // match end: regexMatcher.end(i)<br /> }<br /> }<br /> } catch (PatternSyntaxException ex)<br /> {<br /> // Syntax error in the regular expression<br /> }<br /><br /><br /> }<br />}isharahttp://www.blogger.com/profile/13758237633809870651noreply@blogger.com0tag:blogger.com,1999:blog-1875536396171598176.post-11351262483354473932008-04-01T10:24:00.000-07:002008-04-01T10:25:09.127-07:00Using Regular Expressions in Java<h1>Using Regular Expressions in Java</h1> <p>Java 4 (JDK 1.4) and later have comprehensive support for regular expressions through the standard <tt class="code">java.util.regex</tt> package. Because Java lacked a regex package for so long, there are also many 3rd party regex packages available for Java. I will only discuss Sun's regex library that is now part of the JDK. Its quality is excellent, better than most of the 3rd party packages. Unless you need to support older versions of the JDK, the <tt class="code"> java.util.regex</tt> package is the way to go.</p> <p>Java 5 and 6 use the same regular expression flavor (with a few minor fixes), and provide the same regular expression classes. They add a few advanced functions not discussed on this page.</p> <h2>Quick Regex Methods of The String Class</h2> <p>The Java String class has several methods that allow you to perform an operation using a regular expression on that string in a minimal amount of code. 
The downside is that you cannot pass options such as "case insensitive" or "dot matches newline" as a separate parameter. For performance reasons, you should also not use these methods if you will be using the same regular expression often.</p> <p><tt class="code">myString.matches("regex")</tt> returns true or false depending on whether the string can be matched entirely by the regular expression. It is important to remember that String.matches() only returns true if the entire string can be matched. In other words: "regex" is applied as if you had written "^regex$" with start and end of string anchors. This is different from most other regex libraries, where the "quick match test" method returns true if the regex can be matched anywhere in the string. If myString is <tt class="string">abc</tt> then <tt class="code">myString.matches("bc")</tt> returns false. <tt class="regex">bc</tt> matches <tt class="string">abc</tt>, but <tt class="regex">^bc$</tt> (which is really being used here) does not.</p> <p><tt class="code">myString.replaceAll("regex", "replacement")</tt> replaces all regex matches inside the string with the replacement string you specified. No surprises here. All parts of the string that match the regex are replaced. You can use the contents of capturing parentheses in the replacement text via $1, $2, $3, etc. $0 (dollar zero) inserts the entire regex match. $12 is replaced with the 12th backreference if it exists, or with the 1st backreference followed by the literal "2" if there are fewer than 12 backreferences. If there are 12 or more backreferences, it is not possible to insert the first backreference immediately followed by the literal "2" in the replacement text.</p> <p>In the replacement text, a dollar sign not followed by a digit causes an IllegalArgumentException to be thrown. If there are fewer than 9 backreferences, a dollar sign followed by a digit greater than the number of backreferences throws an IndexOutOfBoundsException. 
So be careful if the replacement string is a user-specified string. To insert a dollar sign as literal text, use <tt>\$</tt> in the replacement text. When coding the replacement text as a literal string in your source code, remember that the backslash itself must be escaped too: <tt> "\\$"</tt>.</p> <p><tt class="code">myString.split("regex")</tt> splits the string at each regex match. The method returns an array of strings where each element is a part of the original string between two regex matches. The matches themselves are not included in the array. Use <tt class="code">myString.split("regex", n)</tt> to get an array containing at most n items. The result is that the string is split at most n-1 times. The last item in the array is the unsplit remainder of the original string.</p> <h2>Using The Pattern Class</h2> <p>In Java, you compile a regular expression by using the <tt class="code"> Pattern.compile()</tt> class factory. This factory returns an object of type <tt class="code">Pattern</tt>. E.g.: <tt class="code">Pattern myPattern = Pattern.compile("regex");</tt> You can specify certain options as an optional second parameter. <tt class="code">Pattern.compile("regex", Pattern.CASE_INSENSITIVE | Pattern.DOTALL | Pattern.MULTILINE)</tt> makes the regex case insensitive for US ASCII characters, causes the dot to match line breaks and causes the start and end of string anchors to match at embedded line breaks as well. When working with Unicode strings, specify <tt class="code"> Pattern.UNICODE_CASE</tt> if you want to make the regex case insensitive for all characters in all languages. You should always specify <tt class="code"> Pattern.CANON_EQ</tt> so that canonically equivalent Unicode sequences (such as a precomposed accented letter and the same letter followed by a combining accent) match each other, unless you are sure your strings contain only US ASCII characters and you want to increase performance.</p> <p>If you will be using the same regular expression often in your source code, you should create a <tt class="code">Pattern</tt> object to increase performance. 
Creating a <tt class="code">Pattern</tt> object also allows you to pass matching options as a second parameter to the <tt class="code"> Pattern.compile()</tt> class factory. If you use one of the <tt class="code"> String</tt> methods above, the only way to specify options is to embed a mode modifier into the regex. Putting <tt class="regex">(?i)</tt> at the start of the regex makes it case insensitive. <tt class="regex">(?m)</tt> is the equivalent of <tt class="code">Pattern.MULTILINE</tt>, <tt class="regex">(?s)</tt> equals <tt class="code">Pattern.DOTALL</tt> and <tt class="regex">(?u)</tt> is the same as <tt class="code">Pattern.UNICODE_CASE</tt>. Unfortunately, <tt class="code"> Pattern.CANON_EQ</tt> does not have an embedded mode modifier equivalent.</p> <p>Use <tt class="code">myPattern.split("subject")</tt> to split the subject string using the compiled regular expression. This call has exactly the same results as <tt class="code">myString.split("regex")</tt>. The difference is that the former is faster since the regex was already compiled.</p> <h2>Using The Matcher Class</h2> <p>Except for splitting a string (see previous paragraph), you need to create a <tt class="code">Matcher</tt> object from the <tt class="code">Pattern</tt> object. The <tt class="code">Matcher</tt> will do the actual work. The advantage of having two separate classes is that you can create many <tt class="code"> Matcher</tt> objects from a single <tt class="code">Pattern</tt> object, and thus apply the regular expression to many subject strings simultaneously.</p> <p>To create a <tt class="code">Matcher</tt> object, simply call <tt class="code">matcher()</tt> on your <tt class="code">Pattern</tt> object like this: <tt class="code">myMatcher = myPattern.matcher("subject")</tt>. Note that <tt class="code">matcher()</tt> is an instance method, so it must be called on the compiled pattern rather than on the <tt class="code">Pattern</tt> class itself. If you already created a <tt class="code"> Matcher</tt> object from the same pattern, call <tt class="code"> myMatcher.reset("newsubject")</tt> instead of creating a new matcher object, for reduced garbage and increased performance. 
Either way, <tt class="code"> myMatcher</tt> is now ready for duty.</p> <p>To find the first match of the regex in the subject string, call <tt class="code">myMatcher.find()</tt>. To find the next match, call <tt class="code">myMatcher.find()</tt> again. When <tt class="code"> myMatcher.find()</tt> returns false, indicating there are no further matches, the next call to <tt class="code">myMatcher.find()</tt> will find the first match again. The <tt class="code">Matcher</tt> is automatically reset to the start of the string when <tt class="code">find()</tt> fails.</p> <p>The <tt class="code">Matcher</tt> object holds the results of the last match. Call its methods <tt class="code">start()</tt>, <tt class="code">end()</tt> and <tt class="code">group()</tt> to get details about the entire regex match and the matches between capturing parentheses. Each of these methods accepts a single int parameter indicating the number of the backreference. Omit the parameter to get information about the entire regex match. <tt class="code"> start()</tt> is the index of the first character in the match. <tt class="code"> end()</tt> is the index of the first character after the match. Both are relative to the start of the subject string. So the length of the match is <nobr> <tt class="code">end() - start()</tt></nobr>. <tt class="code">group()</tt> returns the string matched by the regular expression or pair of capturing parentheses.</p> <p><tt class="code">myMatcher.replaceAll("replacement")</tt> has exactly the same results as <tt class="code">myString.replaceAll("regex", "replacement")</tt>. Again, the difference is speed.</p> <p>The <tt class="code">Matcher</tt> class allows you to do a search-and-replace and compute the replacement text for each regex match in your own code. 
You can do this with the <tt class="code">appendReplacement()</tt> and <tt class="code"> appendTail()</tt> methods. Here is how:</p> <pre>StringBuffer myStringBuffer = <span class="reservedword">new</span> StringBuffer();<br />myMatcher = myPattern.matcher(<span class="characterstring">"subject"</span>);<br /><span class="reservedword">while</span> (myMatcher.find()) <span class="bracket">{</span><br /> <span class="reservedword">if</span> (checkIfThisMatchShouldBeReplaced()) <span class="bracket">{</span><br /> myMatcher.appendReplacement(myStringBuffer, computeReplacementString());<br /> <span class="bracket">}</span><br /><span class="bracket">}</span><br />myMatcher.appendTail(myStringBuffer);</pre> <p>Obviously, <tt class="code">checkIfThisMatchShouldBeReplaced()</tt> and <tt class="code">computeReplacementString()</tt> are placeholders for methods that you supply. The first returns true or false indicating if a replacement should be made at all. Note that skipping replacements is much faster than replacing a match with exactly the same text as was matched. <tt class="code"> computeReplacementString()</tt> returns the actual replacement string.</p> <h2>Regular Expressions, Literal Strings and Backslashes</h2> <p>In literal Java strings the backslash is an escape character. The literal string <tt class="code">"\\"</tt> is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression <tt class="regex">\\</tt> matches a single backslash. Written as a Java string, this regular expression becomes <tt class="code">"\\\\"</tt>. That's right: 4 backslashes to match a single one.</p> <p>The regex <tt class="regex">\w</tt> matches a word character. As a Java string, this is written as <tt class="code">"\\w"</tt>.</p> <p>The same backslash-mess occurs when providing replacement strings for methods like String.replaceAll() as literal Java strings in your Java code. 
In the replacement text, a dollar sign must be encoded as \$ and a backslash as \\ when you want to replace the regex match with an actual dollar sign or backslash. However, backslashes must also be escaped in literal Java strings. So a single dollar sign in the replacement text becomes <tt class="code">"\\$"</tt> when written as a literal Java string. The single backslash becomes <tt class="code"> "\\\\"</tt>. Right again: 4 backslashes to insert a single one.</p>isharahttp://www.blogger.com/profile/13758237633809870651noreply@blogger.com0
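<p>The escaping rules for replacement text can be tried out with a small sketch like the one below. The class name <tt class="code">ReplacementEscapes</tt> and its three helper methods are hypothetical names chosen for this illustration only. The last helper assumes Java 5 or later, where <tt class="code">Matcher.quoteReplacement()</tt> is available to escape dollar signs and backslashes in replacement text that comes from a user.</p>

```java
import java.util.regex.Matcher;

// Hypothetical demo class; the names are illustrative, not part of any library.
class ReplacementEscapes {

    // Regex \d+ matches a run of digits. The replacement text is \$$0:
    // an escaped (literal) dollar sign followed by $0, the entire match.
    static String priceTag(String input) {
        return input.replaceAll("\\d+", "\\$$0");
    }

    // The replacement text is \\ (one literal backslash),
    // which takes four backslashes in a Java string literal.
    static String toBackslashes(String input) {
        return input.replaceAll("/", "\\\\");
    }

    // Inserts user-supplied text literally, whatever it contains.
    // Matcher.quoteReplacement() (Java 5+) escapes $ and \ for us.
    static String replaceWithLiteral(String input, String regex, String userText) {
        return input.replaceAll(regex, Matcher.quoteReplacement(userText));
    }

    public static void main(String[] args) {
        System.out.println(priceTag("costs 5"));                   // costs $5
        System.out.println(toBackslashes("a/b"));                  // a\b
        System.out.println(replaceWithLiteral("aXb", "X", "$1"));  // a$1b
    }
}
```

<p>Without the call to <tt class="code">Matcher.quoteReplacement()</tt>, the last example would throw an IndexOutOfBoundsException, because <tt>$1</tt> would be read as a backreference to a group that the regex <tt class="regex">X</tt> does not have.</p>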