Regular Expression Format

Regular expressions are used to match the user names attached to print jobs when a user name has been "mangled", i.e. altered by having extra characters added to the beginning and/or end of it. Each regular expression has three parts: a pre-match string, which matches any characters that appear before the actual user name; a match string, which matches the actual user name; and a post-match string, which matches any characters that appear after the actual user name.

Regular expressions are defined in Pharos Administrator on the Regular Expressions tab at System > System Settings. They are written in a special format and may contain a number of special characters.

Special Characters

The following special characters can be used to specify a match in a regular expression:

Special Character

Usage

. period

Represents a match for any one character.  For example, .at matches "cat", "bat", "hat", etc.

* asterisk

Means match zero or more of what goes before it in the Regular Expression. For example:

  • [a-z]* matches any name or word in lower-case
  • [A-Z]* matches any name or word in upper-case
  • .* matches anything at all, including nothing

+ plus

Means match one or more of what goes before it in the Regular Expression. This is similar to *; the only difference is that * matches zero or more occurrences, while + needs at least one. So .+ matches one or more characters of any type, but does not match an empty string.
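The difference between the period, * and + is easiest to see on short strings. The following is a minimal sketch using Python's re module, on the assumption that it treats these characters in the same way as the Regular Expressions described here. The helper name matches is hypothetical, and re.fullmatch is used so that the whole string has to match.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# A period stands for exactly one character, so "at" on its own is too short.
dot_results = [matches(".at", word) for word in ("cat", "bat", "at")]
# [True, True, False]

# * accepts zero occurrences, whereas + requires at least one.
star_on_empty = matches(".*", "")    # True
plus_on_empty = matches(".+", "")    # False
plus_on_text = matches(".+", "john")  # True
```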

( ) parentheses

Identifies a part of a Regular Expression that is treated as one unit. For example, [a-z] matches a single lower-case letter, but (and)[a-z] matches a string with the letters "and" followed by a lower-case letter, e.g. "andy".

[ ] square brackets

Represents a set of characters, any one of which can be matched. For example, [gpr] matches either "g", "p" or "r", whereas [ ] matches a space.

Using a dash between characters within the brackets indicates a match for that range of characters. For example:

  • [g-r] matches any lower-case character between "g" and "r" inclusive
  • [a-z] matches any lower-case character
  • [0-9] matches any single digit in that range
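The bracket examples above can be checked one character at a time. This is a sketch only, assuming that Python's re module handles sets and ranges in the same way as the Regular Expressions described here. The test characters are chosen for illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole of text matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# A plain set matches only the characters listed.
set_results = {ch: matches("[gpr]", ch) for ch in "gpx"}
# {'g': True, 'p': True, 'x': False}

# A range is inclusive at both ends.
range_results = {ch: matches("[g-r]", ch) for ch in "fgrs"}
# {'f': False, 'g': True, 'r': True, 's': False}

# [0-9] matches a single digit and nothing else.
digit_results = {ch: matches("[0-9]", ch) for ch in "07a"}
# {'0': True, '7': True, 'a': False}

# A space inside brackets matches a single space.
space_result = matches("[ ]", " ")  # True
```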

^ caret

Used as the first character in the set between square brackets, the caret means the Regular Expression matches any character that is not listed within the brackets. For example, [^gprw] matches any character other than "g", "p", "r" or "w".

Used at the beginning of an expression or sub-expression, the caret means the matched string must be at the beginning of the string being searched. For example, ^abc matches "abc" at the beginning of a string. ^abc.* means a match for any string beginning with "abc".

$ dollar sign

Used at the end of an expression or sub-expression, the dollar sign means the matched string must be at the end of the string being searched.  For example, abc$ matches "abc" at the end of a string. .*abc$ means a match for any string ending with "abc".
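Both uses of the caret, and the dollar sign, can be illustrated briefly. The sketch below assumes that Python's re module behaves like the Regular Expressions described here. It uses re.search, which looks for a match anywhere in the string, so that the effect of anchoring to the beginning or end is visible.

```python
import re

# Inside brackets, a leading caret excludes the listed characters.
negated = [bool(re.fullmatch("[^gprw]", ch)) for ch in "gwa"]
# [False, False, True]

# At the beginning of an expression, ^ requires the match to start the string.
starts = [bool(re.search("^abc", s)) for s in ("abcdef", "xabc")]
# [True, False]

# At the end of an expression, $ requires the match to finish the string.
ends = [bool(re.search("abc$", s)) for s in ("xyzabc", "abcx")]
# [True, False]
```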

? question mark

Identifies the preceding character as an optional element that may occur once or not at all in the string being matched. For example, ABC? matches "AB" or "ABC".

| pipe

Means a choice exists for a match. For example, the Regular Expression .*(s|es) means any characters followed by "s" or "es" so it matches "dogs" or "horses".
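The question mark and pipe examples can be tried out as follows. This is a minimal sketch, assuming Python's re module treats ? and | in the same way as the Regular Expressions described here. The non-matching strings "ABCC" and "dog" are added for contrast and do not come from the descriptions above.

```python
import re

# The trailing "C" is optional, but only one "C" is allowed.
optional = [bool(re.fullmatch("ABC?", s)) for s in ("AB", "ABC", "ABCC")]
# [True, True, False]

# The parentheses group the two alternatives "s" and "es" into one unit.
plural = [bool(re.fullmatch(".*(s|es)", s)) for s in ("dogs", "horses", "dog")]
# [True, True, False]
```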

\ backslash

When followed by a special character, the backslash returns that character to its literal form. For example, \* represents a literal asterisk in the string being matched.
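The effect of escaping can be seen by comparing a pattern with and without the backslash. The patterns a\*b and a*b are hypothetical, chosen only for illustration, and the sketch assumes Python's re module escapes special characters in the same way as the Regular Expressions described here.

```python
import re

# With the backslash, the asterisk is an ordinary character.
escaped_hit = bool(re.fullmatch(r"a\*b", "a*b"))   # True
escaped_miss = bool(re.fullmatch(r"a\*b", "aab"))  # False

# Without it, the asterisk means "zero or more of the preceding a".
unescaped_hit = bool(re.fullmatch("a*b", "aab"))   # True
unescaped_miss = bool(re.fullmatch("a*b", "a*b"))  # False
```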

Where consecutive groups of regular expressions are used, the string being matched must mirror the grouping. For example:

  • [a-z]*[ ][a-z]* matches any string comprising two groups of letter combinations separated by a space, such as "john doe"
  • [A-Z][a-z]* matches any word beginning with a capital, such as "Pharos" or "John"
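The two grouped expressions above can be checked against matching and non-matching strings. This sketch assumes Python's re module behaves like the Regular Expressions described here. The non-matching strings are added for contrast.

```python
import re

# Two runs of lower-case letters separated by exactly one space.
two_words = [
    bool(re.fullmatch("[a-z]*[ ][a-z]*", s))
    for s in ("john doe", "johndoe", "John Doe")
]
# [True, False, False]

# One capital letter followed by any number of lower-case letters.
capitalised = [
    bool(re.fullmatch("[A-Z][a-z]*", s))
    for s in ("Pharos", "John", "john")
]
# [True, True, False]
```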

Examples

The following examples illustrate the regular expressions for the pre-match, match and post-match required to match user names in a variety of situations.

 

| # | Name Format Received by Pharos | Required Name Format | Pre-match | Match | Post-match |
|---|---|---|---|---|---|
| 1 | ".john smith.abc.efg" | "john smith" | [.] | [a-zA-Z0-9_]*[ ][a-zA-Z0-9_]* | [.][a-zA-Z0-9_]* |
| 2 | "/o=.../on=.../un=.../n=john" | "john" | .*/n= | [a-zA-Z0-9_]* | (empty) |
| 3 | "john (192.168.2.1)" | "john" | (empty) | [^ ]* | [ ][(].* |
| 4 | " (john)" | "john" | [ ]*[(] | [^)]+ | [)].* |

Example 1 matches a two-word name that is preceded by a period, and followed by a period and any number of other characters. This sort of user name string can be produced by Novell systems.

Example 2 matches any number of characters that occur after "/n=". This sort of user name string can be produced by Novell systems.

Example 3 matches a user name that is followed by a space, then any number of characters in parentheses. This sort of user name string can be produced by Mac and LPR printing systems.

Example 4 matches a user name preceded by any number of spaces and enclosed in parentheses. This is one of the default Regular Expressions installed with Pharos Administrator.
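The four examples can be tried out by joining the three parts into a single expression and keeping only the text matched by the middle part. The sketch below is an illustration in Python, not a description of how Pharos applies the expressions internally: the function name extract_user_name, the way the parts are joined, and the choice to anchor the match at the start of the string only are all assumptions.

```python
import re
from typing import Optional


def extract_user_name(
    pre_match: str, name_match: str, post_match: str, received: str
) -> Optional[str]:
    """Hypothetical helper: return the text matched by name_match, or None."""
    # Each part is wrapped in its own group so that a | inside one part
    # cannot spill over into its neighbours. Only the middle group captures.
    pattern = f"(?:{pre_match})({name_match})(?:{post_match})"
    # re.match anchors at the start of the string only (an assumption).
    found = re.match(pattern, received)
    return found.group(1) if found else None


# (pre-match, match, post-match, name format received), as in Examples 1 to 4.
EXAMPLES = [
    ("[.]", "[a-zA-Z0-9_]*[ ][a-zA-Z0-9_]*", "[.][a-zA-Z0-9_]*", ".john smith.abc.efg"),
    (".*/n=", "[a-zA-Z0-9_]*", "", "/o=.../on=.../un=.../n=john"),
    ("", "[^ ]*", "[ ][(].*", "john (192.168.2.1)"),
    ("[ ]*[(]", "[^)]+", "[)].*", " (john)"),
]

names = [extract_user_name(*row) for row in EXAMPLES]
# ['john smith', 'john', 'john', 'john']
```

Under these assumptions, each row yields the required name format. An empty pre-match or post-match simply contributes nothing to the joined expression.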