Java Regular Expressions

Revision as of 16:19, 7 October 2015 by Claude (talk | contribs) (Inversion)

Test program

Attached is a small program for testing Java regular expressions: RegexTester. The code is very simple:

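 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 Pattern pattern = Pattern.compile("myregex");
 Matcher matcher = pattern.matcher("text to search");
 // find() advances to the next match; group() returns the text it matched
 while (matcher.find()) {
   String group = matcher.group();
 }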

Metacharacters

([{\^-$|]})?*+
Metacharacters can be escaped in two ways: 1. put a backslash in front of the character, or 2. enclose the text in \Q ... \E.
Example: entered in a text box, \[test\] searches for the string [test]. In Java source code the backslashes must be doubled: String regex = "\\[test\\]";
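
The two escaping styles can be tried out with a small sketch like the following. The class and method names are made up for illustration; Pattern.quote is the standard library helper that wraps a literal in \Q ... \E.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // Style 1: escape each metacharacter; the Java string literal doubles every backslash.
    static final String BACKSLASH = "\\[test\\]";
    // Style 2: quote a whole literal region with \Q ... \E.
    static final String QUOTED = "\\Q[test]\\E";

    // True if the regex is found anywhere in the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(contains(BACKSLASH, "a [test] b")); // true
        System.out.println(contains(QUOTED, "a [test] b"));    // true
        // Pattern.quote builds the quoted form for an arbitrary literal.
        System.out.println(contains(Pattern.quote("[test]"), "a [test] b")); // true
        // Unescaped, [test] is a character class that matches a single t, e or s.
        System.out.println(contains("[test]", "s")); // true
    }
}
```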

Character Classes

  • [abc] a, b, or c (simple class)
  • [^abc] Any character except a, b, or c (negation)
  • [a-zA-Z] a through z, or A through Z, inclusive (range)
  • [a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] d, e, or f (intersection)
  • [a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
  • [a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
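
As a quick check of the union, intersection and subtraction forms listed above, a minimal sketch (the helper name accepts is hypothetical) could look like this:

```java
class CharClassDemo {
    // True if the whole input is accepted by the given character class.
    static boolean accepts(String charClass, String input) {
        return input.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(accepts("[a-d[m-p]]", "n"));    // true  (union)
        System.out.println(accepts("[a-d[m-p]]", "e"));    // false
        System.out.println(accepts("[a-z&&[def]]", "e"));  // true  (intersection)
        System.out.println(accepts("[a-z&&[def]]", "a"));  // false
        System.out.println(accepts("[a-z&&[^bc]]", "b"));  // false (subtraction)
        System.out.println(accepts("[a-z&&[^m-p]]", "q")); // true
    }
}
```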

Predefined Character Classes

  • . Any character (may or may not match line terminators)
  • (?:X) Non-capturing group: no backreference is created for the group
  • \d A digit: [0-9]
  • \D A non-digit: [^0-9]
  • \s A whitespace character: [ \t\n\x0B\f\r]
  • \S A non-whitespace character: [^\s]
  • \w A word character: [a-zA-Z_0-9]
  • \W A non-word character: [^\w]
  • (?i) Case-insensitive matching (upper and lower case are not distinguished)
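
A few of these entries in action, as a rough sketch with invented method names. It also shows one reading of "may or may not match line terminators": by default the dot does not match a line terminator, and with the Pattern.DOTALL flag it does.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Keep only the digits by deleting every non-digit (\D).
    static String digitsOnly(String input) {
        return input.replaceAll("\\D", "");
    }

    // The embedded flag (?i) switches on case-insensitive matching.
    static boolean isJava(String input) {
        return input.matches("(?i)java");
    }

    // Matches "a", any one character, "b"; DOTALL lets "." match a line terminator too.
    static boolean dotMatches(String input, boolean dotAll) {
        Pattern p = Pattern.compile("a.b", dotAll ? Pattern.DOTALL : 0);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel. 044 123-45")); // 04412345
        System.out.println(isJava("JaVa"));                // true
        System.out.println(dotMatches("a\nb", false));     // false
        System.out.println(dotMatches("a\nb", true));      // true
    }
}
```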

Quantifiers

  • X? (greedy) X?? (reluctant) X?+ (possessive) X, once or not at all
  • X* (greedy) X*? (reluctant) X*+ (possessive) X, zero or more times
  • X+ (greedy) X+? (reluctant) X++ (possessive) X, one or more times
  • X{n} (greedy) X{n}? (reluctant) X{n}+ (possessive) X, exactly n times
  • X{n,} (greedy) X{n,}? (reluctant) X{n,}+ (possessive) X, at least n times
  • X{n,m} (greedy) X{n,m}? (reluctant) X{n,m}+ (possessive) X, at least n but not more than m times
  • ? The question mark makes the preceding token in the regular expression optional. E.g.: colou?r matches both colour and color.
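
The difference between greedy, reluctant and possessive quantifiers is easiest to see on one input. The sketch below uses a hypothetical helper findAll that collects every match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collect the text of every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off until "foo" fits.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? takes as little as possible.
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never gives anything back.
        System.out.println(findAll(".*+foo", input)); // []
        System.out.println("color".matches("colou?r") && "colour".matches("colou?r")); // true
    }
}
```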

Boundaries

  • ^ The beginning of a line
  • $ The end of a line
  • \b A word boundary
  • \B A non-word boundary
  • \A The beginning of the input
  • \G The end of the previous match
  • \Z The end of the input but for the final terminator, if any
  • \z The end of the input
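
The boundary matchers can be compared by counting matches, as in this sketch (the count helper is an invented name). Note that in Java, ^ and $ only match at line breaks inside the input when Pattern.MULTILINE is set; otherwise they match only at the start and end of the whole input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Count how many times regex (compiled with the given flags) is found in input.
    static int count(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // \b: the "cat" inside "concat" is not at a word boundary.
        System.out.println(count("\\bcat\\b", 0, "cat concat cat."));       // 2
        // ^ without and with MULTILINE.
        System.out.println(count("^\\w+", 0, "one\ntwo"));                  // 1
        System.out.println(count("^\\w+", Pattern.MULTILINE, "one\ntwo"));  // 2
        // \Z ignores a final line terminator, \z does not.
        System.out.println(count("abc\\Z", 0, "abc\n"));                    // 1
        System.out.println(count("abc\\z", 0, "abc\n"));                    // 0
    }
}
```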

Backreferences

A backreference is written as a backslash followed by the number of the group it refers to, and it must come after that group. Example: (\d\d)\1 matches e.g. 3434 but not 1234.

The regex Set(Value)? matches Set or SetValue. In the first case, the first backreference will be empty, because it did not match anything. In the second case, the first backreference will contain Value.

If you do not use the backreference, you can optimize this regular expression into Set(?:Value)?. The question mark and the colon after the opening round bracket are the special syntax that you can use to tell the regex engine that this pair of brackets should not create a backreference.
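
In Java these cases can be checked as follows. This is only a sketch with invented helper names; note that for a group that did not take part in the match, Matcher.group(1) returns null rather than an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Return the content of group 1 after a full match, or throw if the input does not match.
    static String groupOne(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match: " + input);
        }
        return m.group(1);
    }

    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println("3434".matches("(\\d\\d)\\1"));      // true
        System.out.println("1234".matches("(\\d\\d)\\1"));      // false
        System.out.println(groupOne("Set(Value)?", "SetValue")); // Value
        System.out.println(groupOne("Set(Value)?", "Set"));      // null
        System.out.println(groupCount("Set(Value)?"));           // 1
        System.out.println(groupCount("Set(?:Value)?"));         // 0
    }
}
```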

Backreferences in .NET

(?<name>subexpression) or (?'name'subexpression)

        [TestMethod]
        public void TestRegEx()
        {
            // The named group "bookmark" captures the word that follows MERGEFIELD.
            string pattern = @".*MERGEFIELD\W*(?<bookmark>\w+)\W*";
            string input = "MERGEFIELD xyz";
            Match m = Regex.Match(input, pattern);
            string myval = m.Groups["bookmark"].Value; // "xyz"
        }
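
For comparison, Java (from version 7) supports the (?<name>X) form as well, but not the (?'name'X) form. A minimal Java counterpart of the test above might look like this; the class and method names are illustrative only, and \k<name> is the named backreference syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    // Return the word following MERGEFIELD, or null if the input does not match.
    static String bookmark(String input) {
        Matcher m = Pattern.compile(".*MERGEFIELD\\W*(?<bookmark>\\w+)\\W*").matcher(input);
        return m.matches() ? m.group("bookmark") : null;
    }

    // Named backreference: two digits followed by the same two digits.
    static boolean repeatedPair(String input) {
        return input.matches("(?<pair>\\d\\d)\\k<pair>");
    }

    public static void main(String[] args) {
        System.out.println(bookmark("MERGEFIELD xyz")); // xyz
        System.out.println(repeatedPair("3434"));       // true
        System.out.println(repeatedPair("1234"));       // false
    }
}
```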

Alternation (OR)

Alternatives are combined with a vertical bar. Example: "text1|text2|text3"
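
A short sketch of alternation in Java (helper name invented). It also shows that the vertical bar binds more loosely than concatenation, so a group is needed to limit its scope.

```java
class AlternationDemo {
    // True if the whole input matches the regex.
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("text1|text2|text3", "text2")); // true
        System.out.println(matches("text1|text2|text3", "text4")); // false
        // Grouping restricts the alternation to the last character.
        System.out.println(matches("text(1|2|3)", "text3"));       // true
        // Without a group, the alternatives are "text1" and "2".
        System.out.println(matches("text1|2", "2"));               // true
        System.out.println(matches("text1|2", "text2"));           // false
    }
}
```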

Inversion

  • Example: foo
  • Inversion: ^((?!foo).)*$ (matches every line that does not contain foo)
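
The inversion pattern works because the negative lookahead (?!foo) is checked before every single character, so the match fails as soon as foo starts anywhere in the line. One possible use, sketched with hypothetical names, is filtering lines:

```java
import java.util.ArrayList;
import java.util.List;

class InversionDemo {
    // Matches a whole line only if "foo" does not start at any position in it.
    static final String NOT_FOO = "^((?!foo).)*$";

    // Keep the lines of text that do not contain "foo".
    static List<String> linesWithoutFoo(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\n")) {
            if (line.matches(NOT_FOO)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(linesWithoutFoo("foo\nbar\nbaz foo\nqux")); // [bar, qux]
    }
}
```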