10 Digit Numeric Wordlist

The preprocessor is used to combine similar rules into one source line. For example, if you need to make John try lowercased words with digits appended, you could write a rule for each digit, 10 rules total. Now imagine appending two-digit numbers - the configuration file would get large and ugly. With the preprocessor you can do these things more easily.

Regular Expression Examples is a list, roughly sorted by complexity, of regular expression examples. It also serves as a library of useful expressions to include in your own code.

For advanced examples, see Advanced Regular Expression Examples. You can also find some regular expressions on the Regular Expressions and Bag of algorithms pages.

  • The four-digit password could probably be broken in a day, while the 10-digit password would take a millennium to break given current processing power. If your password used only the 26 lowercase letters of the alphabet, a four-character password would have 26 to the fourth power, or 456,976, possible combinations.
  • Oct 12, 2010: Here is the code; I don't think there is any method for this in SE. It basically converts the number to a string, parses the string, and associates each digit with a weight. In 1000, the 1 is in the thousands position, so it is mapped to 'one' and to 'thousand' because of its position.
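The combination count in the first bullet is just the alphabet size raised to the password length. A minimal Python sketch of that arithmetic (the function name keyspace is made up for illustration; cracking times are not modelled here):

```python
def keyspace(alphabet_size, length):
    """Number of candidates when each position is chosen independently."""
    return alphabet_size ** length

lower4 = keyspace(26, 4)     # four lowercase letters
digits10 = keyspace(10, 10)  # ten decimal digits

print(lower4)    # 456976
print(digits10)  # 10000000000
```

Each extra character multiplies the count by the alphabet size, which is why length matters so much.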

See Also

Example Regexes to Match Common Programming Language Constructs
Extracting numbers from text strings, removing unwanted characters, comp.lang.tcl, 2002-06-23
a delightful explication by Michael Cleverly
re_syntax
URI detector for arbitrary text as a regular expression
Arts and crafts of Tcl-Tk programming
Regular Expressions
Regular Expression Debugging Tips
Visual Regexp
A terrific way to learn about REs.
Redet
Another tool for learning about and working with REs.
Regular Expression Debugging Tips
More tools.

Simple regexp Examples

regexp has syntax:

regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

If matchVar is specified, its value will be only the part of the string that was matched by the exp. As an example:


If any subMatchVars are specified, their values will be the part of the string that were matched by parenthesized bits in the exp, counting open parentheses from left to right. For example:
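The original Tcl snippets are missing here. As a rough analogue only, the same two ideas can be shown with Python's re module, where group(0) plays the role of matchVar and the numbered groups play the role of the subMatchVars. The sample strings and variable names are invented for illustration, and Tcl's regexp command is called differently.

```python
import re

m = re.search(r"([a-z]+)=([0-9]+)", "set width=640 now")
whole = m.group(0)   # only the matched part, like matchVar
name = m.group(1)    # first parenthesized bit
value = m.group(2)   # second parenthesized bit

# Groups are numbered by their opening parenthesis, left to right.
nested = re.search(r"((a)(b))", "xaby").groups()

print(whole, name, value, nested)
```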

Many times, people only care about the subMatchVars and want to ignore matchVar. They use a 'dummy' variable as a placeholder in the command for the matchVar. You will often see things like

where ${->} holds the matched part. It is a sneaky but legal Tcl variable name.

PYK 2015-10-29: As a matter of fact, every string is a legal Tcl variable name.

Splitting a String Into Words

'How do I split an arbitrary string into words?' is a frequently asked question. If you use split $string { }, then multiple spaces will produce a list with empty elements. If you try to use foreach or lindex or some other list operation, then you must be sure that the string is a well-formed list. (Braces could cause problems.) So use a regular expression like this very simple shorthand for non-space characters:

You can even split a string of text with arbitrary spaces and special characters into a list of words by using the -inline and -all switches to regexp:
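The Tcl examples are not preserved above. A minimal sketch of the same idea in Python (a stand-in, not Tcl): splitting on a single space leaves empty elements, while collecting every run of non-space characters does not, and a stray brace is harmless. The sample text is invented.

```python
import re

text = "  the   quick {brown  fox"

naive = text.split(" ")            # empty strings appear between repeated spaces
words = re.findall(r"\S+", text)   # every maximal run of non-space characters

print(naive)
print(words)
```

re.findall here corresponds loosely to using regexp with the -inline and -all switches.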

Split into Words, Respecting Acronyms

from Tcl Chatroom, 2013-10-09

Floating Point Number

This expression includes options for leading +/- character, digits, decimal points, and a trailing exponent. Note the use of nearly duplicate expressions joined with the or operator | to permit the decimal point to lead or follow digits.
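The expression itself did not survive on this page. The following Python sketch is one pattern that fits the description (optional sign, digits with a leading or trailing decimal point joined by |, optional exponent); it is an assumption, not necessarily the expression originally posted.

```python
import re

FLOAT_RE = re.compile(
    r"[-+]?"                       # optional leading sign
    r"(?:[0-9]+\.?[0-9]*"          # digits, optionally followed by a point
    r"|\.[0-9]+)"                  # or a point followed by digits
    r"(?:[eE][-+]?[0-9]+)?"        # optional trailing exponent
)

def is_float(s):
    """True if the whole string looks like a floating point number."""
    return FLOAT_RE.fullmatch(s) is not None

for s in ["3.14", "-.5", "+10.", "6.02e23", ".", "e5", "1.2.3"]:
    print(s, is_float(s))
```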

Expression to find whether a string has any substring matching a floating point number (this was posted to comp.lang.tcl by Roland B. Roberts):

More information (http://www.regular-expressions.info/floatingpoint.html )

Letters

Thanks to Brent Welch for these examples, showing the difference between a traditional character matching and 'the Unicode way.'

Only letters:

Only letters, the Unicode way:

Special Characters

Thanks again to Brent Welch for these two examples.

The set of Tcl special characters: ] [ $ { } :

The set of regular expression special characters: '] [ $ ^ ? + * ( ) | '

CHARACTER   DESCRIPTION
*           The sub-pattern before '*' can occur zero or more times
+           The sub-pattern before '+' can occur one or more times
?           The sub-pattern before '?' can only occur zero or one time
|           (Alternation) Matches any one sub-pattern separated by '|'s. Similar to logical 'OR'.
()          Groups a pattern
[]          Defines a set of characters, or range of characters [a-z,A-Z,0-9]

I don't understand these examples. Why have [, ], and then the rest of the characters inside a [] - that just makes the string have [ and ] there twice, right?

LV: the first regular expression should be seen like this:

{ ... }
Protect the 9 inner characters.
[ ... ]
Define a set of characters to process.
]
If your set of characters is going to include the right bracket character ] as a specific matching character, then it needs to be first in the set/class definition.
[${}
More individual characters.
\\
Doubled because when regexp goes to evaluate the characters, it would otherwise treat a single backslash as a request to quote the next character, the ending right bracket of the set/class.

The second regular expression is interpreted in a similar fashion. There are more characters because there are more metacharacters.

Also, not all characters are there - where are the period, equals, bang (exclamation sign), dash, colon, alphas that are a part of character entry escapes or classes, 0, hash/pound sign, and angle brackets (< and >)? These special characters all have meta meanings within regular expressions...

LV: Apparently no one has come along and updated the above expression to cover these.

Example posted by KC:

A set containing both angle brackets:

newline/carriage return

Could someone replace this line with some verbiage regarding the way one uses regular expressions for specific newline-carriage return handling (as opposed to the use of the $ metacharacter)?

Janos Holanyi: I would really need to build up an RE that would match one line and only one line - that is, excluding carriage-return-newlines (\r\n) from matching... What would such an RE look like?

LV: how about something like this?

If you want to keep carriage returns or newlines by themselves, but not when they are together, you need something like:

This allows plain carriage return or plain newline.

Thanks to bbh and Donal Fellows for this regular expression.

Back References

From comp.lang.tcl:

I did some experimenting with other strings, like 'just a HHHHEEEEAAAADDDDEEEERRRR'. The regular expression (.)\1\1\1 does the job I would have wanted, whereas (.){4} will return the last of each four characters - as posted as well.

That surprised me too -- being able to place backreferences within the regex is an extremely powerful technique.

for exactly 4 char repeats, and (.)\1+ for arbitrary repeats.
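A minimal sketch of the difference, written with Python's re module as a stand-in for Tcl's regsub (back-reference syntax is the same for these patterns; the second and third sample strings are invented):

```python
import re

s = "just a HHHHEEEEAAAADDDDEEEERRRR"

# A back-reference requires the SAME character four times in a row.
collapsed = re.sub(r"(.)\1\1\1", r"\1", s)

# (.){4} matches ANY four characters and keeps only the last one captured.
last_of_four = re.sub(r"(.){4}", r"\1", "aabbccdd")

# (.)\1+ collapses runs of any length greater than one.
runs = re.sub(r"(.)\1+", r"\1", "aaabccdddd")

print(collapsed)     # just a HEADER
print(last_of_four)  # bd
print(runs)          # abcd
```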

Whitespace After a Newline

PYK 2019-02-21: How does one capture any whitespace followed by a newline, except for newlines? The key is to use a negative lookahead to match empty space not followed by a newline. That bears repeating: Parentheses are used to isolate the negative lookahead so that what matches immediately prior is the empty string:

This mechanism is effectively an and not operator.

bll 2019-02-21: I find:

much easier.

PYK 2019-02-21: That picks up much more than whitespace, so not quite the same thing.

IP Numbers

You can create a regular expression to check an IP address for correct syntax. Note that this regular expression only checks for groups of 1-3 digits separated by periods. If you want to ensure that the digit groups are from 0-255, or that you have a valid IP address, you'll have to do additional (non regexp) work. This code was posted to comp.lang.tcl by George Peter Staplin:

The above regular expression matches any string where there are four groups of 1-3 digits separated by periods. Since it's not anchored to the start and end of the string (with ^ and $) it will match any string that contains four groups of 1-3 digits separated by periods, such as: '66.70.7.154.9'.

If you don't mind a longer regexp, there is no reason you can't ensure that each group of 1-3 digits is in the range of 0-255. For example (broken up a bit to make it more readable):
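The posted expressions are missing from this page. As a hedged sketch of both points, here is a Python version (not the original Tcl): a loose, unanchored pattern that matches inside '66.70.7.154.9', and a longer pattern, built from an OCTET piece of my own, that restricts each group to 0-255 and is applied to the whole string.

```python
import re

# Loose check: four groups of 1-3 digits, not anchored.
LOOSE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

# Strict check: each octet is 0-255 with no leading zeroes.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
STRICT = re.compile(OCTET + r"(?:\." + OCTET + r"){3}")

def is_ipv4(s):
    """True only if the entire string is a dotted quad with octets 0-255."""
    return STRICT.fullmatch(s) is not None

print(LOOSE.search("66.70.7.154.9") is not None)  # True: matches a substring
print(is_ipv4("66.70.7.154.9"))                   # False
print(is_ipv4("245.254.253.2"))                   # True
print(is_ipv4("265.254.243.2"))                   # False
```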

Recently on comp.lang.tcl, someone mentioned that http://www.oreilly.com/catalog/regex/chapter/ch04.html#Be_Specific talks about matching IP addresses.

Gururajesh: A Perfect regular expression to validate ip address with a single expression.

For 245.254.253.2, output is 245.254.253.2

For 265.254.243.2, output is none, as an IP address can't have a number greater than 255.

Lars H: Perfect? No, it looks like it would accept 99a99b99c99, since . will match any character. Also, it can be shortened significantly by making use of {4} and the like (see Regular expressions).

Better is

Tcllib should be useful

freethomas: I think this regexp is much simpler and easier for IP numbers

AMG: This expression allows any character to separate the octets, not just period. I sincerely doubt this is what you want. Use \. instead of \D. Also it's not anchored with ^ and $, so it works on substrings rather than requiring that the whole string match. Though maybe this is what you want since you explicitly capture the matching substring.

I already fixed the syntax issue of saying { at the beginning but leaving out the closing }, also of leaving out the first (.

I see no reason to use ( and ) grouping. You don't give variables into which the subexpressions would be captured, and it's pointless to capture the dots between the octets. (See what I did there?) Try this:

AMG: Here's a very similar script (to Lars H's contribution) that uses scan instead of regexp. It's much more readable, in my opinion.

There are a few differences. One, the trailing dot is omitted from the first three output variables (which I call a, b, c, d instead of v1, v2, v3, v4). Two, leading zeroes are permitted and discarded. Three, -0 is accepted as 0. Four, garbage at the end of $string is silently discarded. Five, each octet can have a leading +, e.g. +255.+255.+255.+255. Six, it's OVER FIVE TIMES FASTER! On this machine, my version using scan takes 15 microseconds, whereas your version using regexp takes 78 microseconds. Use time to measure performance. (I replaced puts with return when testing.)

Now, here's a hybrid version that uses regexp.

This version takes 46 microseconds to execute. It doesn't accept leading + or -. It rejects garbage at the end of the string. It treats the octets as octal if they are given leading zeroes, and invalid octal is always accepted. The reason for this last point is that the if command treats strings containing invalid octal as nonnumeric text, so the <= operator is used to sort text rather than compare numbers. Corrected version:

This version takes 47 microseconds and it rejects invalid octal. However, it still interprets numbers as octal if leading zeroes are given, so 0377.255.255.255 is accepted (but 0400.255.255.255 is rejected). To fix this, it would be necessary to make a pattern that rejects leading zeroes unless the octet is exactly zero, something like: (0|[1-9]\d*). But this is getting clumsy and slow; I prefer the scan solution. regexp: not always the right tool!

Gururajesh:

This will be ok... for above mentioned issue.

AMG: Why call scan four times? A single invocation can do the job:

I don't see any drawbacks to this approach. The regular expression is simple and is used only to reject + and - signs and garbage at the end, scan does the job of splitting and converting to integers, and math expressions check ranges. Three tools, each doing what they're designed for.
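The division of labour described here can be sketched in Python (a stand-in for the missing Tcl script; parse_ipv4 is an invented name, and int() takes the place of scan's %d conversion): a simple pattern rejects signs and trailing garbage, the conversion produces integers, and a plain comparison checks the range.

```python
import re

SHAPE = re.compile(r"([0-9]+)\.([0-9]+)\.([0-9]+)\.([0-9]+)")

def parse_ipv4(s):
    """Return the four octets as integers, or None if s is not valid."""
    m = SHAPE.fullmatch(s)          # rejects '+', '-', and trailing garbage
    if m is None:
        return None
    octets = [int(g) for g in m.groups()]   # leading zeroes read as decimal
    if all(0 <= o <= 255 for o in octets):
        return octets
    return None

print(parse_ipv4("245.254.253.2"))
print(parse_ipv4("+255.255.255.255"))
print(parse_ipv4("0377.255.255.255"))
```

Because int() reads digits as decimal, 0377 becomes 377 and is rejected by the range check rather than being treated as octal.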

CJB: Here is a pure regexp version with comparable performance. It matches any valid ip, rejecting octals. However it does not split the integers and is therefore only useful for validation. The timings on my computer were about 22 microseconds for this version compared to 28 microseconds for the regexp/scan combo (I removed the puts statements for the comparison because they are slow and tend to vary).

Note that the pure scan version is still fastest (about 20 microseconds), splits, and has the same rejections (%d stores integers and ignores extra leading 0 characters).

fh 2012-02-13 11:54:30:

To search IP ADDRESS using Regular Expression

Domain names

(First shot)

This code does NOT attempt, obviously, to ensure that the last level of the regular expression matches a known domain...

Regular Expression for parsing http string

the above author should remember this is a Tcl wiki, and not an aolserver one, but thanks for the submission ;)

PYK 2016-02-28: In the previous edit, a - character was added to the regular expression, prohibiting the occurrence of - in scheme component of a URL. As far as I can tell, - is allowed in the scheme component, so I've reverted that change in the expression above.

E-mail addresses

RS: No warranty, just a first shot:

Understand that this expression is an attempt to see if a string has a format that is compatible with normal RFC SMTP email address formats. It does not attempt to see whether the email address is correct. Also, it does not account for comments embedded within email addresses, which are defined even though seldom used.

bll 2017-6-30 E-mail addresses are quite complicated. You must be careful not to reject valid e-mail addresses. For example, % and + characters are valid. Nobody uses the % sign any more as it is not secure. The + character is very useful, but unfortunately, there are a lot of incorrect e-mail validation routines that reject it.

The following pattern will still reject an e-mail address of the form local-part@[ip-address]. No lengths are checked. It does not check that the top-level domain (e.g. .org, .com, .solutions) is valid.

Reference: https://en.wikipedia.org/wiki/Email_address#Examples


XML-like data

To match something similar to XML-tags you can use regular-expressions, too. Let's assume we have this text:

We can match the body of bo with this regexp:

Now we extend our XML-text with some attributes for the tags, say:

If we try to match this with:

it won't work anymore. This is because \s+ is greedy (in contrast to the non-greedy (.+?) and (.*?)) and that (the one greedy-operator) makes the whole expression greedy.

See Henry Spencer's reply in tcl 8.2 regexp not doing non-greedy matching correctly , comp.lang.tcl, 1999-09-20.

The correct way is:

Now we can write a more general XML-to-whatever-translator like this:

  1. Substitute [ and ] with their corresponding \[ and \] to avoid confusion with subst in 3.
  2. Substitute the tags and attributes with commands
  3. Do a subst on the whole text, thereby calling the inserted commands

Call the parser with:

You have to be careful, though. Don't do this for large texts or texts with many nested XML tags, because the regular expression machine is not the right tool to parse large, nested files efficiently. (Stefan Vogel)

DKF: I agree with that last point. If you are really dealing with XML, it is better to use a proper tool like TclDOM or tDOM.

PYK 2015-10-30: I patched the regular expression to fix an issue where the attributes group could pick up part of the tag in documents containing tags with similar prefixes. The fix is to use whitespace followed by non-whitespace other than > to detect the beginning of attributes. There are other things

Negated string

Bruce Hartweg wrote in comp.lang.tcl: You can't negate a regular expression, but you CAN negate a regular expression that is only a simple string. Logically, it's the following:

  • match any single char except first letter in the string.
  • match the first char in string if followed by any letter except the 2nd
  • match the first two if followed by any but the third, et cetera

Then the only thing more is to allow a partial match of the string at end of line. So for a regexp that matches

The following proc will build the expression for any given string

Donal Fellows followed up with:

That's just set me thinking; you can do this by specifying that the whole string must be either not the character of the antimatch*, or the first character of the antimatch so long as it is not followed by the rest of the antimatch. This leads to a fairly simply expressed pattern.
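A sketch of the pattern this paragraph describes, in Python rather than Tcl (the helper name not_containing is invented; the original pattern is not preserved on this page): every character is either not the first character of the antimatch, or is that first character when not followed by the rest of the antimatch.

```python
import re

def not_containing(antimatch):
    """Build a pattern matching strings that do not contain antimatch."""
    first = re.escape(antimatch[0])
    rest = re.escape(antimatch[1:])
    if not rest:
        # Single-character antimatch: just forbid that character.
        return "[^" + first + "]*"
    return "(?:[^" + first + "]|" + first + "(?!" + rest + "))*"

pattern = re.compile(not_containing("foo"))

for s in ["fo", "fofo", "ffoo", "a food b", ""]:
    print(repr(s), pattern.fullmatch(s) is not None)
```

The lookahead is what keeps 'ffoo' from slipping through: the second f is followed by 'oo', so it cannot be consumed.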

In fact, this allows us to strengthen what you say above to allow the matching of any negated regular expression directly so long as the first component of the antimatch is a literal, and the rest of the antimatch is expressible in an ERE lookahead constraint (which imposes a number of restrictions, but still allows for some fairly sophisticated patterns.)

* Anything's better than overloading 'string' here!

JMN 2005-12-22: Could someone please explain what is meant by a 'negated string' here? Specifically - what do the above achieve that isn't satisfied by the simpler:

Doesn't the following snippet from the regexp manpage indicate that a regexp can be negated? Where does (or did?) the 'simple string' requirement come in? Is this info no longer current?

Lars H: It indeed seems the entire problem is rather trivial. In Tcl 7 (before AREs) one sometimes had to do funny tricks like the ones Bruce Hartweg performs above, but his use of {0,2} means he must be assuming AREs. Perhaps there was a transitory period where one was available but not the other.

Oleg 2009-12-11: If one needs to match any string but 'foo', then the following will do the work:

And in general case when one needs to match any string that is neither 'foo' nor 'bar', then the following will do the work:

CRML 2013-11-06 In general case when one needs to match any string that is neither 'foo' nor 'bar' might be done using:

AMG: Oleg's regexps confuse me. Translated literally, I read them as 'match any string that does not begin with foo (or bar) unless that string has more characters after the foo (or bar).' Very indirect, I must say. CRML's suggestion I like better, though I would drop the extra parentheses to obtain: ^(?!(foo|bar)$). This says, 'match any string that does not begin with either foo or bar when immediately followed by end of string.' In other words, 'match any string that is not exactly foo or bar.'
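The pattern ^(?!(foo|bar)$) quoted above uses only a negative lookahead and an alternation, so its behaviour can be checked with Python's re module as a stand-in for Tcl (the helper name is invented):

```python
import re

NOT_FOO_OR_BAR = re.compile(r"^(?!(foo|bar)$)")

def is_other_string(s):
    """True for any string that is not exactly 'foo' or 'bar'."""
    return NOT_FOO_OR_BAR.match(s) is not None

for s in ["foo", "bar", "food", "foobar", ""]:
    print(repr(s), is_other_string(s))
```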

Turn a string into %hex-escaped (url encoded) characters:

e.g. Csan -> %43%73%61%6E

This demonstrates the power of using regsub together with subst, which is regarded as one of the most powerful ways to use regular expressions in Tcl.
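The Tcl one-liner is missing here. The closest Python equivalent of the regsub-plus-subst idea is to pass a function to re.sub, which is called for every match. This is a sketch under the assumption that the input is ASCII; the function name is invented.

```python
import re

def hex_escape(s):
    """Replace every character with %XX, its code in uppercase hex."""
    return re.sub(r".", lambda m: "%%%02X" % ord(m.group(0)), s, flags=re.DOTALL)

print(hex_escape("Csan"))  # %43%73%61%6E
```

Unlike subst, the replacement function never evaluates the matched text as code, so there is nothing for malicious input to inject.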

Turn a string into %hex-escaped (url encoded) characters (part 2)

This one makes the result more readable and still quite safe to use in URLs, e.g. http://wiki.tcl.tk -> http%3A%2F%2Fwiki%2Etcl%2Etk

The inverse of the above (not optimized):

Caveats about using regsub with subst

glennj 2008-12-16: It can be dangerous to blindly apply subst to the results of regsub, particularly if you have not validated the input string. Here's an example that's not too contrived:

This results in invalid command name 'Some'. What if $string was [exec format c:]?

See DKF's 'proc regsub-eval' contribution in regsub to properly prepare the input string for substitution. Paraphrased:

which results in what you'd expect: the string '[Some Malicious Command]'

APN I don't follow why all the extra backslashes are needed in the string map. The following should work just as well?


PYK 2016-05-28: Indeed:

Maintain proper spacing when formatting for HTML

DG got this from Kevin Kenny on c.l.t.

And the output is:

Tabs require replacement, too:

glennj: Taken from comp.lang.perl.misc, transform variable names into StudlyCapsNames:

When using ASED's syntax checker you get an error if you don't use the -- option to regexp. Instead of regexp {([^A-Za-z0-9_-])} $string you have to write regexp -- {([^A-Za-z0-9_-])} $string

LV: A user recently asked:

I have a string that I'm trying to parse. Why doesn't this seem to work?


It looks to me like the *? causes the subsequent \d+ to also be non-greedy and only match the first hit. Did I figure that out correctly? I presume that we currently don't have a way to turn off the greediness item?

Of course, in this simplified problem, one could just drop the greediness and code

I'll let the user decide if that suffices.

PYK 2019-08-15: See 'greediness' at Regular Expressions. In short the greediness of every quantifier is the greediness of its branch, regardless of the default preference of the quantifier. A branch, in turn, picks up its greediness from the first quantifier.

How do you select from two words?

LES: You got the regexp syntax wrong and tried to match the regular expression with the string 'match'. There is no 'zzz' variable (the actual match variable in your code) because your regular expression does not match the string 'match'. Try this:

Note that I could have dropped the 'zzz' variable, but left it there as a second match variable, as an exercise to you. You should understand why and what it does if you read the regexp page and assimilate the syntax.

Infinite spaces at start and end

RUJ: Could you match the following pattern in the following string: any number of spaces at the start and end?

LV: try

which should have a value of 1 (in other words, it matched). Of course, if those leading and trailing spaces are optional, then change the + to a *.

CRML non greedy or greedy does not give the same result. In the previous example, the .* matches all the string up to the last but one char.

URL Parser

See URL Parser.

Match a 'quoted string'

AMG: Adapted from Wibble:

This recognizes strings starting and ending with double quote characters. Any character can be embedded in the string, even double quotes, when preceded by an odd number of backslashes.
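The expression adapted from Wibble is not preserved on this page. The following Python sketch is one common pattern with the behaviour described (it is an assumption, not the original): inside the quotes, each step consumes either an ordinary character or a backslash together with whatever follows it.

```python
import re

QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)

def is_quoted(s):
    """True if the whole string is one double-quoted string."""
    return QUOTED.fullmatch(s) is not None

print(is_quoted(r'"a \"quoted\" word"'))  # escaped quotes are embedded
print(is_quoted(r'"ends with \\"'))       # two backslashes, then a real closing quote
print(is_quoted(r'"ends with \"'))        # the last quote is escaped, so no closing quote
```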

Word Splitting, Respecting Quoted Strings


given some text, e.g.

how to parse it into

see KBK, #tcl irc channel, 2012-12-02

split a string into n-length substrings

evilotto, #tcl, 2013-02-07
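The snippet credited above is missing. One way to get n-length substrings with a regular expression, sketched in Python (not necessarily the approach originally posted), is to collect every match of "between 1 and n characters":

```python
import re

def chunks(s, n):
    """Split s into pieces of n characters; the last piece may be shorter."""
    return re.findall(r".{1,%d}" % n, s, flags=re.DOTALL)

print(chunks("abcdefgh", 3))  # ['abc', 'def', 'gh']
```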

At Least 1 Alpha Character Interspersed with 0 or More Digits

Matching a group of strings

We can match a group of strings or subjects in a single regular expression

Sqlite Numeric Literal

ak - 2017-08-08 03:32:33

Regarding negation of regular expressions.

While the regular expression syntax does not allow for simple negation, the underlying formalism of (non)deterministic finite automata does. Simply swap final and non-final states to negate, i.e. complement it.

See for example the grammar::fa package in Tcllib, which provides a complement method. It is implemented in the operations package. As are methods to convert from and to regular expressions.


Wordlist rules syntax.

Each wordlist rule consists of optional rule reject flags followed by one or more simple commands, listed all on one line and optionally separated with spaces. There's also a preprocessor, which generates multiple rules for a single source line. Below you will find descriptions of the rule reject flags, the rule commands (many of them are compatible with those of Crack 5.0a), and the preprocessor syntax.

Rule reject flags.

Numeric constants and variables.

Numeric constants may be specified and variables referred to with the following characters:


Here max_length is the maximum plaintext length supported for the current hash type.

These may be used to specify character positions, substring lengths, and other numeric parameters to rule commands as appropriate for a given command. Character positions are numbered starting with 0. Thus, for example, the initial value of 'm' (last character position) is one less than that of 'l' (length).

Character classes.

The complement of a class can be specified by uppercasing its name. For example, '?D' matches everything but digits.


Simple commands.

String commands.

To append a string, specify 'z' for the position. To prefix the word with a string, specify '0' for the position.

Although the use of the double-quote character is good for readability, you may use any other character not found in STR instead. This is particularly useful when STR contains the double-quote character. There's no way to escape your quotation character of choice within a string (preventing it from ending the string and the command), but you may achieve the same effect by specifying multiple commands one after another. For example, if you choose to use the forward slash as your quotation character, yet it happens to be found in a string and you don't want to reconsider your choice, you may write 'Az/yes/$/Az/no/', which will append the string 'yes/no'. Of course, it is simpler and more efficient to use, say, 'Az,yes/no,' for the same effect.

Length control commands.

English grammar commands.

Insert/delete commands.

Also see the 'X' command (extract and insert substring) under 'Memory access commands' below.

Note that square brackets ('[' and ']') are special characters to the preprocessor: you should escape them with a backslash ('\') if using these commands.

Charset conversion commands.

Memory access commands.

If 'Q' or 'X' are used without a preceding 'M', they read from the initial 'word'. In other words, you may assume an implied 'M' at the start of each rule, and there's no need to ever start a rule with 'M' (that 'M' would be a no-op). The only reasonable use for 'M' is in the middle of a rule, after some commands have possibly modified the word.

The intended use for the 'Q' command is to help avoid duplicate candidate passwords that could result from multiple similar rules. For example, if you have the rule 'l' (lowercase) somewhere in your ruleset and you want to add the rule 'lr' (lowercase and reverse), you could instead write the latter as 'lMrQ' in order to avoid producing duplicate candidate passwords for palindromes.

The 'X' command extracts a substring from memory (or from the initial word if 'M' was never used) starting at position N (in the memorized or initial word) and going for up to M characters. It inserts the substring into the current word at position I. The target position may be 'z' for appending the substring, '0' for prefixing the word with it, or it may be any other valid numeric constant or variable. Some example uses, assuming that we're at the start of a rule or after an 'M', would be 'X011' (duplicate the first character), 'Xm1z' (duplicate the last character), 'dX0zz' (triplicate the word), '<4X011X113X215' (duplicate every character in a short word), '>9x5zX05z' (rotate long words left by 5 characters, same as '>9{{{{{'), '>9vam4Xa50'l' (rotate right by 5 characters, same as '>9}}}}}').
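To make the first few examples concrete, here is a small Python model of the extract-and-insert step only. It is a sketch, not John's implementation: the function name x_command and the constant Z (standing in for 'z') are invented, and length limits and rule rejection are ignored.

```python
Z = 1 << 30  # stand-in for 'z': effectively "as far as possible"

def x_command(word, memory, n, m, i):
    """Model of 'XNMI': take up to m chars of memory from position n, insert at i."""
    piece = memory[n:n + m]
    return word[:i] + piece + word[i:]

word = "pass"
first_dup = x_command(word, word, 0, 1, 1)             # 'X011'
last_dup = x_command(word, word, len(word) - 1, 1, Z)  # 'Xm1z'
tripled = x_command("ab" + "ab", "ab", 0, Z, Z)        # 'dX0zz' applied to 'ab'

# '<4X011X113X215' on the short word 'abc'
w = "abc"
for n, m, i in [(0, 1, 1), (1, 1, 3), (2, 1, 5)]:
    w = x_command(w, "abc", n, m, i)

print(first_dup, last_dup, tripled, w)
```

In the 'dX0zz' case the memory is still the initial word 'ab', because the implied 'M' happens before 'd' doubles the word.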

Numeric commands.

'l' is set to the current word's length, and its new value is usable by this same command (if N or/and M is also 'l').

V must be one of 'a' through 'k'. N and M may be any valid numeric constants or initialized variables. It is OK to refer to the same variable in the same command more than once, even three times. For example, 'va00' and 'vaaa' will both set the variable 'a' to zero (but the latter will require 'a' to have been previously initialized), whereas 'vil2' will set the variable 'i' to the current word's length minus 2. If 'i' is then used as a character position before the word is modified further, it will refer to the second character from the end. It is OK for intermediate variable values to become negative, but such values should not be directly used as positions or lengths. For example, if we follow our 'vil2' somewhere later in the same rule with 'vj02vjij', we'll set 'j' to 'i' plus 2, or to the word's length as of the time of processing of the 'vil2' command earlier in the rule.
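The examples in this paragraph all fit the reading that the command stores N minus M in V. Under that assumption (inferred from the examples, since the command table itself is not reproduced here), the arithmetic can be traced in a few lines of Python; the helper name v is invented.

```python
def v(n, m):
    """Assumed arithmetic of 'vVNM': the new value of V is N - M."""
    return n - m

word = "password"
l = len(word)      # 'l' is the current word's length

i = v(l, 2)        # 'vil2': length minus 2
j = v(0, 2)        # 'vj02': intermediate value is negative
j = v(i, j)        # 'vjij': i - (-2) = i + 2

print(i, word[i], j)  # 6 r 8
```

Position 6 in 'password' is 'r', the second character from the end, and j ends up equal to the word's length.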

Character class commands.

Extra 'single crack' mode commands.

When defining 'single crack' mode rules, extra commands are available for word pairs support, to control if other commands are applied to the first, the second, or to both words:

If you use some of the above commands in a rule, it will only process word pairs (e.g., full names from the GECOS field) and reject single words. A '+' is assumed at the end of any rule that uses some of these commands, unless you specify it manually. For example, '1l2u' will convert the first word to lowercase, the second one to uppercase, and use the concatenation of both. The use for a '+' might be to apply some more commands: '1l2u+r' will reverse the concatenation of both words, after applying some commands to them separately.

The rule preprocessor.

The preprocessor is used to combine similar rules into one source line. For example, if you need to make John try lowercased words with digits appended, you could write a rule for each digit, 10 rules total. Now imagine appending two-digit numbers - the configuration file would get large and ugly.

With the preprocessor you can do these things more easily. Simply write one source line containing the common part of these rules followed by the list of characters you would have put into separate rules, in square brackets (the way you would do in a regexp). The preprocessor will then generate the rules for you (at John startup for syntax checking, and once again while cracking, but never keeping all of the expanded rules in memory). For the examples above, the source lines will be 'l$[0-9]' (lowercase and append a digit) and 'l$[0-9]$[0-9]' (lowercase and append two digits). These source lines will be expanded to 10 and 100 rules, respectively. By the way, preprocessor commands are processed right-to-left while character lists are processed left-to-right, which results in natural ordering of numbers in the above examples and in other typical cases. Note that arbitrary combinations of character ranges and character lists are valid. For example, '[aeiou]' will use vowels, whereas '[aeiou0-9]' will use vowels and digits. If you need to have John try vowels followed by all other letters, you can use '[aeioua-z]' - the preprocessor is smart enough not to produce duplicate rules in such cases (although this behavior may be disabled with the '\r' magic escape sequence described below).
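The expansion just described can be modelled in a few lines of Python. This is a simplified sketch, not John's preprocessor: the function names are invented, and backslash escapes and the magic escape sequences are not handled. It expands each bracketed list (dropping duplicate characters) and combines the lists so that the rightmost one varies fastest, which gives the natural number ordering.

```python
import itertools
import re

def expand_chars(spec):
    """Expand a list body such as 'aeiou0-9' into unique characters, in order."""
    chars = []
    i = 0
    while i < len(spec):
        if i + 2 < len(spec) and spec[i + 1] == "-":
            # A range such as 0-9 or a-z
            chars.extend(chr(c) for c in range(ord(spec[i]), ord(spec[i + 2]) + 1))
            i += 3
        else:
            chars.append(spec[i])
            i += 1
    return list(dict.fromkeys(chars))  # drop duplicates, keep first occurrence

def expand_rule(line):
    """Expand one source line into the list of rules it stands for."""
    parts = re.split(r"\[([^\]]+)\]", line)   # literal, list, literal, list, ...
    literals = parts[0::2]
    choices = [expand_chars(p) for p in parts[1::2]]
    rules = []
    for combo in itertools.product(*choices):
        rule = literals[0]
        for ch, lit in zip(combo, literals[1:]):
            rule += ch + lit
        rules.append(rule)
    return rules

two_digits = expand_rule("l$[0-9]$[0-9]")
print(len(two_digits), two_digits[:3], two_digits[-1])
print(expand_rule("$[aeioua-z]")[:7])
```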

There are some special characters in rules ('[' starts a preprocessor character list, '-' marks a range inside the list, etc.) You should prefix them with a backslash ('\') if you want to put them inside a rule without using their special meaning. Of course, the same applies to '\' itself. Also, if you need to start a preprocessor character list at the very beginning of a line, you'll have to prefix it with a ':' (the no-op rule command), or it would be treated as a new section start.

Finally, the preprocessor supports some magic escape sequences. These start with a backslash and use characters that you would not normally need to escape. In the following paragraph describing the escapes, the word 'range' refers to a single instance of a mix of character lists and/or ranges placed in square brackets as illustrated above.

Currently supported are '\1' through '\9' for back-references to prior ranges (these will be substituted by the same character that is currently substituted for the referenced range, with ranges numbered from 1, left-to-right), '\0' for back-reference to the immediately preceding range, '\p' before a range to have that range processed 'in parallel' with all preceding ranges, '\p1' through '\p9' to have the range processed 'in parallel' with the specific referenced range, '\p0' to have the range processed 'in parallel' with the immediately preceding range, and '\r' to allow the range to produce repeated characters. The '\r' escape is only useful if the range is 'parallel' to another one or if there's at least one other range 'parallel' to this one, because you should not want to actually produce duplicate rules.

Please refer to the default configuration file for John the Ripper for many example uses of the features described in here.

$Owl: Owl/packages/john/john/doc/RULES,v 1.14 2017/05/14 12:16:07 solar Exp $