Here is some information gathered from the Apache site.
Package org.apache.oro.text.regex
This package used to be the OROMatcher library and provides both generic regular expression interfaces and Perl5 regular expression compatible implementation classes.
| Interface Summary | |
| MatchResult | The MatchResult interface allows PatternMatcher implementors to return results storing match information in whatever format they like, while presenting a consistent way of accessing that information. |
| Pattern | The Pattern interface allows multiple representations of a regular expression to be defined. |
| PatternCompiler | The PatternCompiler interface defines the operations a regular expression compiler must implement. |
| PatternCompilerOptions | PatternCompilerOptions abstracts the options used by regular expression engines into a common set. |
| PatternMatcher | The PatternMatcher interface defines the operations a regular expression matcher must implement. |
| PatternMatchingEngine | PatternMatchingEngine is an interface that abstracts a regular expression implementation into a PatternCompiler, a PatternMatcher, and PatternCompilerOptions. |
| Substitution | The Substitution interface provides a means for you to control how a substitution is performed when using the Util.substitute method. |
| Class Summary | |
| PatternMatcherInput | The PatternMatcherInput class is used to preserve state across calls to the contains() methods of PatternMatcher instances. |
| Perl5Compiler | The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. |
| Perl5CompilerOptions | |
| Perl5Debug | The Perl5Debug class is not intended for general use and should not be instantiated, but is provided because some users may find the output of its single method to be useful. |
| Perl5Engine | |
| Perl5Matcher | The Perl5Matcher class is used to match regular expressions (conforming to the Perl5 regular expression syntax) generated by Perl5Compiler. |
| Perl5MatchResult | A class used to store and access the results of a Perl5Pattern match. |
| Perl5Pattern | An implementation of the Pattern interface for Perl5 regular expressions. |
| Perl5Substitution | Perl5Substitution implements a Substitution consisting of a literal string, but allowing Perl5 variable interpolation referencing saved groups in a match. |
| StringSubstitution | StringSubstitution implements a Substitution consisting of a simple literal string. |
| Util | The Util class is a holder for useful static utility methods that can be generically applied to Pattern and PatternMatcher instances. |
| Exception Summary | |
| MalformedPatternException | A class used to signify the occurrence of a syntax error in a regular expression that is being compiled. |
Package org.apache.oro.text.regex Description
Note: The following information will be moved into the user’s guide.
Perl5 regular expressions
Here we summarize the syntax of Perl5.003 regular expressions, all of which is supported by the Perl5 classes in this package. However, for a definitive reference, you should consult the perlre man page that accompanies the Perl5 distribution and also the book Programming Perl, 2nd Edition from O’Reilly & Associates. We are working toward implementing the features added after Perl5.003 up to and including Perl 5.6. Please remember, we only guarantee support for Perl5.003 expressions in version 2.0.
- Alternatives separated by |
- Quantified atoms
- {n,m}
- Match at least n but not more than m times.
- {n,}
- Match at least n times.
- {n}
- Match exactly n times.
- *
- Match 0 or more times.
- +
- Match 1 or more times.
- ?
- Match 0 or 1 times.
- Atoms
- regular expression within parentheses
- a . matches everything except \n
- a ^ is a null token matching the beginning of a string or line (i.e., the position right after a newline or right before the beginning of a string)
- a $ is a null token matching the end of a string or line (i.e., the position right before a newline or right after the end of a string)
- Character classes (e.g., [abcd]) and ranges (e.g. [a-z])
- Special backslashed characters work within a character class (except for backreferences and boundaries).
- \b is backspace inside a character class
- Special backslashed characters
- \b
- null token matching a word boundary (\w on one side and \W on the other)
- \B
- null token matching a boundary that isn’t a word boundary
- \A
- Match only at beginning of string
- \Z
- Match only at end of string (or before newline at the end)
- \n
- newline
- \r
- carriage return
- \t
- tab
- \f
- formfeed
- \d
- digit [0-9]
- \D
- non-digit [^0-9]
- \w
- word character [0-9a-z_A-Z]
- \W
- a non-word character [^0-9a-z_A-Z]
- \s
- a whitespace character [ \t\n\r\f]
- \S
- a non-whitespace character [^ \t\n\r\f]
- \xnn
- hexadecimal representation of character
- \cD
- matches the corresponding control character
- \nn or \nnn
- octal representation of character unless a backreference.
- \1, \2, \3, etc.
- match whatever the first, second, third, etc. parenthesized group matched. This is called a backreference. If there is no corresponding group, the number is interpreted as an octal representation of a character.
- \0
- matches null character
- Any other backslashed character matches itself
- Expressions within parentheses are matched as subpattern groups and saved for use by certain methods.
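A few of the constructs listed above can be tried out with a short sketch. Note that this uses the JDK's java.util.regex package as a stand-in rather than the Perl5Compiler and Perl5Matcher classes described here, on the assumption that both engines treat these particular constructs the same way. The class name BasicSyntaxDemo and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BasicSyntaxDemo {
    // Hypothetical helper: returns the first substring of input matched by
    // regex, or null if nothing matches. Uses java.util.regex, not ORO.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // {n,m}: at least 2 but not more than 3 occurrences of 'a'
        System.out.println(firstMatch("a{2,3}", "caaaat"));
        // character class range followed by \d digits
        System.out.println(firstMatch("[a-z]+\\d+", "ID: abc123"));
        // \b word boundary: "cat" inside "concat" does not start at a boundary
        System.out.println(firstMatch("\\bcat\\b", "concat"));
        // \1 backreference: a word character followed by the same character
        System.out.println(firstMatch("(\\w)\\1", "hello"));
    }
}
```

Backslashes are doubled only because the patterns are written as Java string literals; the regular expressions themselves are a{2,3}, [a-z]+\d+, \bcat\b and (\w)\1.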
By default, a quantified subpattern is greedy. In other words, it matches as many times as possible without causing the rest of the pattern not to match. To change the quantifiers to match the minimum number of times possible, without causing the rest of the pattern not to match, you may use a “?” right after the quantifier.
- *?
- Match 0 or more times
- +?
- Match 1 or more times
- ??
- Match 0 or 1 time
- {n}?
- Match exactly n times
- {n,}?
- Match at least n times
- {n,m}?
- Match at least n but not more than m times
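The difference between greedy and minimal matching is easiest to see side by side. The sketch below again assumes java.util.regex as a stand-in for the Perl5 classes in this package; GreedyDemo and firstMatch are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Hypothetical helper: first match of regex in input, or null.
    // Uses java.util.regex as a stand-in, not the ORO classes.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        // Greedy: .* runs to the last </b> in the input
        System.out.println(firstMatch("<b>.*</b>", html));
        // Minimal: .*? stops at the first </b>
        System.out.println(firstMatch("<b>.*?</b>", html));
        // {n,m}? takes the minimum count that still lets the pattern match
        System.out.println(firstMatch("a{2,4}?", "aaaa"));
    }
}
```

Minimal quantifiers are usually what you want when extracting one delimited field out of several on the same line.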
Perl5 extended regular expressions are fully supported.
- (?#text)
- An embedded comment causing text to be ignored.
- (?:regexp)
- Groups things like “()” but doesn’t cause the group match to be saved.
- (?=regexp)
- A zero-width positive lookahead assertion. For example, \w+(?=\s) matches a word followed by whitespace, without including whitespace in the MatchResult.
- (?!regexp)
- A zero-width negative lookahead assertion. For example foo(?!bar) matches any occurrence of “foo” that isn’t followed by “bar”. Remember that this is a zero-width assertion, which means that a(?!b)d will match ad because a is followed by a character that is not b (the d) and a d follows the zero-width assertion.
- (?imsx)
- One or more embedded pattern-match modifiers. i enables case insensitivity, m enables multiline treatment of the input, s enables single line treatment of the input, and x enables extended whitespace comments.
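Here is a rough sketch of the lookahead, non-saving group and modifier constructs from the list above. It is written against java.util.regex as a stand-in, assuming these constructs behave the same way there as in the Perl5 classes. The (?#text) comment form is left out because the stand-in engine may not accept it. ExtendedSyntaxDemo and its helper methods are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtendedSyntaxDemo {
    // Hypothetical helper: first match of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: start index of the first match, or -1.
    static int firstIndex(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.start() : -1;
    }

    // Hypothetical helper: number of groups the pattern saves.
    static int savedGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (?=regexp): the whitespace is required but not part of the match
        System.out.println(firstMatch("\\w+(?=\\s)", "hello world"));
        // (?!regexp): skips the "foo" in "foobar", finds the one in "foobaz"
        System.out.println(firstIndex("foo(?!bar)", "foobar foobaz"));
        // zero-width: a(?!b)d matches "ad"
        System.out.println(firstMatch("a(?!b)d", "ad"));
        // (?:regexp) groups without saving, so only (c) is counted
        System.out.println(savedGroups("(?:ab)+(c)"));
        // (?i) enables case insensitivity
        System.out.println(firstMatch("(?i)hello", "Say HeLLo"));
        // (?s) lets . match a newline as well
        System.out.println(firstMatch("(?s)a.b", "a\nb") != null);
    }
}
```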
January 4, 2011 at 6:42 pm
http://www.testingminded.com/2009/01/tutorial-on-testing-webservices-with.html
January 4, 2011 at 6:44 pm
A great site for JMeter tutorials: http://www.testingminded.com/2009/02/tutorial-on-functional-testing-with_3268.html
January 5, 2011 at 8:11 pm
What is a regular expression?
Part of this discussion is based on page 94 of “Compilers, Principles, Techniques, and Tools” by Aho, Sethi and Ullman
A regular expression is a pattern denoted by a sequence of symbols representing a state-machine or mini-program that is capable of matching particular sequences of characters. Regular expressions have their roots in lexical analysis and tokenization, where a set of lexemes had to be recognized before being passed on to a parser. Since then, regular expressions have taken on a life of their own, appearing in such languages as AWK, TCL, and of course Perl, for all sorts of textual data extraction and manipulation purposes.
The most basic regular expression syntax consists of 4 operations. Let A and B each represent an alphabet (a set of characters) and s and t represent members of those alphabets.
| Operation | Representation | Meaning |
| Union of A and B | A\|B | s is such that s is in A or s is in B |
| Concatenation of A and B | AB | st are such that s is in A and t is in B |
| Kleene closure of A | A* | Zero or more concatenations of A |
| Positive closure of A | A+ | One or more concatenations of A |
Using this notation you can define a regular expression for positive integers as follows:
digit+
Here digit represents the set of characters 0-9. A range of characters like this can be represented in most regular expression languages as [0-9]. Because this is such a common expression, some languages have a special character for it: \d.
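The four basic operations and the integer pattern can be checked with a small sketch. It uses java.util.regex purely as a convenient engine for illustration; BasicOperationsDemo and matchesWhole are made-up names, not part of any library discussed here.

```java
import java.util.regex.Pattern;

class BasicOperationsDemo {
    // Hypothetical helper: true when regex matches the entire input.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Union: a|b accepts either "a" or "b"
        System.out.println(matchesWhole("a|b", "b"));
        // Concatenation: ab accepts "a" followed by "b"
        System.out.println(matchesWhole("ab", "ab"));
        // Kleene closure: a* accepts zero or more, including the empty string
        System.out.println(matchesWhole("a*", ""));
        // Positive closure: a+ needs at least one, so the empty string fails
        System.out.println(matchesWhole("a+", ""));
        // digit+ written as [0-9]+ and with the \d shorthand
        System.out.println(matchesWhole("[0-9]+", "2011"));
        System.out.println(matchesWhole("\\d+", "2011"));
    }
}
```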
Learning a new regular expression language is quite simple once you’ve learned one, because most of the operations are the same. Only the notation changes.
Perl5 regular expressions
Here we summarize the syntax of Perl5 regular expressions, all of which is supported by the OROMatcher™ Perl5 classes. However, for a definitive reference, you should consult the perlre man page that accompanies the Perl5 distribution and also the book Programming Perl, 2nd Edition from O’Reilly & Associates. We need to point out here that, for efficiency reasons, the character set operator [...] is limited to 8-bit characters (Unicode characters 0 through 255). Other than that restriction, all Unicode characters should be usable in the package’s regular expressions.
Alternatives separated by |
Quantified atoms
{n,m}
Match at least n but not more than m times.
{n,}
Match at least n times.
{n}
Match exactly n times.
*
Match 0 or more times.
+
Match 1 or more times.
?
Match 0 or 1 times.
Atoms
regular expression within parentheses
a . matches everything except \n
a ^ is a null token matching the beginning of a string or line (i.e., the position right after a newline or right before the beginning of a string)
a $ is a null token matching the end of a string or line (i.e., the position right before a newline or right after the end of a string)
Character classes (e.g., [abcd]) and ranges (e.g. [a-z])
Special backslashed characters work within a character class (except for backreferences and boundaries).
\b is backspace inside a character class
Special backslashed characters
\b
null token matching a word boundary (\w on one side and \W on the other)
\B
null token matching a boundary that isn’t a word boundary
\A
Match only at beginning of string
\Z
Match only at end of string (or before newline at the end)
\n
newline
\r
carriage return
\t
tab
\f
formfeed
\d
digit [0-9]
\D
non-digit [^0-9]
\w
word character [0-9a-z_A-Z]
\W
a non-word character [^0-9a-z_A-Z]
\s
a whitespace character [ \t\n\r\f]
\S
a non-whitespace character [^ \t\n\r\f]
\xnn
hexadecimal representation of character
\cD
matches the corresponding control character
\nn or \nnn
octal representation of character unless a backreference.
\1, \2, \3, etc.
match whatever the first, second, third, etc. parenthesized group matched. This is called a backreference. If there is no corresponding group, the number is interpreted as an octal representation of a character.
\0
matches null character
Any other backslashed character matches itself
Expressions within parentheses are matched as subpattern groups and saved for use by certain methods.
By default, a quantified subpattern is greedy. In other words, it matches as many times as possible without causing the rest of the pattern not to match. To change the quantifiers to match the minimum number of times possible, without causing the rest of the pattern not to match, you may use a “?” right after the quantifier.
*?
Match 0 or more times
+?
Match 1 or more times
??
Match 0 or 1 time
{n}?
Match exactly n times
{n,}?
Match at least n times
{n,m}?
Match at least n but not more than m times
Perl5 extended regular expressions are fully supported.
(?#text)
An embedded comment causing text to be ignored.
(?:regexp)
Groups things like “()” but doesn’t cause the group match to be saved.
(?=regexp)
A zero-width positive lookahead assertion. For example, \w+(?=\s) matches a word followed by whitespace, without including whitespace in the MatchResult.
(?!regexp)
A zero-width negative lookahead assertion. For example foo(?!bar) matches any occurrence of “foo” that isn’t followed by “bar”. Remember that this is a zero-width assertion, which means that a(?!b)d will match ad because a is followed by a character that is not b (the d) and a d follows the zero-width assertion.
(?imsx)
One or more embedded pattern-match modifiers. i enables case insensitivity, m enables multiline treatment of the input, s enables single line treatment of the input, and x enables extended whitespace comments.
Copyright © 1997 ORO, Inc. All rights reserved. Original Reusable Objects, ORO, the ORO logo, and “Component software for the Internet” are trademarks or registered trademarks of ORO, Inc. in the United States and other countries.
Java is a trademark of Sun Microsystems. All other trademarks are the property of their respective holders.