uCalc String Library

uCalc String Library consists of a set of string-handling routines with names that will seem very familiar, especially to those who use either .NET's Microsoft.VisuaBasic.Strings module, VB 6, or PowerBASIC, but with support for uCalc Patterns, which infuses each routine with advanced parsing capabilities.

If you are familiar with such routines as InStr, Mid(), Left(), Right(), etc, you already know most of what you need to know to get started. This documentation focuses mainly on the additional functionality that uCalc brings to the table. The general description of each routine is similar to what you may be used to. The main differences are that text can be accessed or manipulated using special patterns, instead of just plain strings; and numeric arguments can be measured either in terms of characters, tokens, blocks, or expressions; and all routines have an additional Options parameter, which lets you select such things as whether the function call should be whitespace-sensitive and case-sensitive. By default case-sensitivity and white-space sensitivity is off, and tokens (not characters) are the unit of measurement, except where indicated otherwise. Some routines might be more familiar particularly to PowerBASIC users. As is common in BASIC dialects, character positions start at 1 (not 0).

For details about uCalc Patterns, visit http://www.ucalc.com/docs/Patterns.htm

Note: uCalc String Library routines are preceded by "uc" when called from your source code. However, when called from uCalc Console Calculator or uCalc Transform, the routines are not preceded by "uc". So for instance you would use ucInStr() if you are calling it from your source code, whereas you'd call InStr() if you are using uCalc Transform. With uCalc Fast Math Parser your source code would use ucInStr(), whereas your end-user expressions would use InStr(). In order for your uCalcFMP expressions to use the String Library, first place a call to ucLoadStringLibInFMP()

New: ucDelim, ucSplit, ucEquals, ucRange

ucInStr (Start, MainString, Pattern [, Options])

Returns the location of text within MainString that matches Pattern.

Start - Starting character position from where to begin the search

MainString - String within which to search for Pattern

Pattern - String or pattern to find within MainString

Example: InStr with default setting

ucInStr(1, MainString, "a test")

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 9

Explanation: With the default setting, "a test" (third argument) is a pattern and not a literal string. This pattern consists of the tokens "a" followed by "test". Spacing between the two does not matter, nor does casing (upper case/lower case) by default.

Example: InStr with special patterns

ucInStr(1, MainString, "({whatever})")

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 37

ucInStr(1, MainString, "is {etc:2}")

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 6

Explanation: In the first part, it matches the first occurrence of a pair of parenthesis with some text inside. {whatever} is a pattern variable. The name in this example was arbitrarily selected; instead of {whatever}, it could have been named {MyPattern}, or {item} just the same. The second pattern matches the token "is" followed by the next two tokens. {etc} is another pattern variable. The ":2" in {etc} tells it to pick up the next two tokens. See http://www.ucalc.com/docs/Patterns.htm for more details on uCalc Patterns.

Example: InStr with ucChar property

ucInStr(1, MainString, "a test", ucChar)

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 28

Explanation: ucChar makes it search character by character (the way traditional VB's InStr() function would do it), instead of token by token (which is the default in uCalc). This is similar to not using uCalc Patterns.

Example: InStr with ucCase (case-sensitive) property

ucInStr(1, MainString, "a test", ucCase)

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 38

Explanation: Here we introduce ucCase, which makes the search case sensitive. It skips over "A TEST", since it is all caps. Note that it also skips over the second literal occurrence, because it is within quotes. By default comparisons are done token by token, and "This is a test" (including surrounding quotes) represents one token, so it does not match the two separate tokens: "a" followed by "test". Likewise, ucInStr(1, MainString, "is") would match the "IS" token following "THIS", and not the "IS" that occurs within the word "THIS".

Example: InStr with ucSpace (whitespace-sensitive) property

ucInStr(1, MainString, "a test", ucSpace)

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 57

Explanation: Here the pattern is whitespace-sensitive. The given pattern has exactly one space, so it skips over occurrences where there is more than one space between "a" and "test".

Example: InStr with ucBlock property

ucInStr(1, MainString, "a test", ucBlock)

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 48

ucInStr(1, MainString, "(a test)", ucBlock)

THIS IS A TEST, "This is a test", (a test), a test, a test.

Returns 37

Explanation: So far, we've searched character by character, or token by token. Here, we are searching block by block. A block consists of either one token, or if the token was defined with the block property, the block starts with that token and ends with the closing token associated with it. By default the following block token pairs are defined ( and ), { and }, and ( and ). The first part of the example skips over "(a test)", and matches the following occurrence because (a test), including the parenthesis, represents one block unit, and it is not equal to "a test". In the second example "(a test)" as a block matches the given pattern "(a test)".

Example: InStr with combined properties

ucInStr(1, MainString, "testing 123")

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 1

ucInStr(1, MainString, "testing 123", ucCase)

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 16

ucInStr(1, MainString, "testing 123", ucChar)

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 31

ucInStr(1, MainString, "testing 123", ucCase+ucChar)

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 47

ucInStr(1, MainString, "(a test)", ucCase+ucSpace)

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 60

ucInStr(1, MainString, "(a test)", ucCase+ucSpace+ucBlock)

TESTING 123, testing 123, TESTING 123, "testing 123", (testing 123), testing 123.

Returns 74

Note: Not all combinations will produce a logically meaningful result. For instance ucChar cannot be combined with ucToken or ucBlock, because a character search is exclusive of token patterns.

ucLeft (MainString, n [, Options])

Returns the left-most n characters, tokens, blocks, or expressions in MainString.

Example: Counting tokens with ucLeft() (which is the default, and not characters)

ucLeft(MainString, 5)

This "is a test" using (the quick) brown fox, and more.

Returns: [the text that's highlighted]

Explanation: By default functions from this library count tokens (not characters). The 5 left-most tokens are:

1. This

2. "

3. Is

4. A

5. Test

This example is not spacing-sensitive.

Example: Counting tokens with ucLeft() (which is the default, and not characters)

ucLeft(MainString, 5, ucQuote)

This "is a test" using (the quick) brown fox, and more.

Returns: [the text that's highlighted]

Explanation: By default functions from this library count tokens (not characters). The 5 left-most tokens are:

1. This

2. "is a test"

3. using

4. (

5. the

This example is not spacing-sensitive. When ucQuote is on, text within quotes (including the quotes) count as one token.

Example: Counting characters with ucLeft()

ucLeft(MainString, 5, ucChar)

This͘"is a test" using (the quick) brown fox, and more.

Returns: "This " (without quotes, but including space)

Explanation: The 5 left-most characters are:

1. T

2. h

3. i

4. s

5. [space]

Example: Counting blocks with ucLeft()

ucLeft(MainString, 5, ucQuote+ucBlock)

This "is a test" using (the quick) brown fox, and more.

Returns: [the text that's highlighted]

Explanation: The 5 left-most blocks are:

1. This

2. "is a test"

3. using

4. (the quick)

5. brown

Example: Counting tokens, space-sensitive with ucLeft()

ucLeft(MainString, 5, ucQuote+ucSpace)

This "is a test" using (the quick) brown fox, and more.

Returns: [the text that's highlighted]

Explanation: The 5 left-most whitespace-sensitive tokens are:

1. This

2. [space]

3. "is a test"

4. [space]

5. using

ucRight (MainString, n [, Options])

Returns the right-most n characters, tokens, blocks, or expressions in MainString.

The rules for ucRight are very similar to those for ucLeft. One important difference to consider (especially if speed is a consideration) is that unlike ucLeft, which parses only until it reaches the number of characters or tokens specified, ucRight always parses the entire MainString text before returning the right-most n characters/tokens.

Examples: See ucLeft for the concept of counting by tokens, and blocks, etc.

ucLen (MainString [, Options])

Returns the length of a string in terms of characters, tokens, blocks, or expressions.

Example: counting tokens

ucLen(MainString, ucQuote)

This "is a test" using (the quick) brown fox, and more.

Returns: 13

Explanation: The counted tokens are:

1. This

2. "is a test"

3. Using

4. (

5. The

6. Quick

7. )

8. Brown

9. Fox

10. ,

11. And

12. More

13. .

Example: Counting characters with ucLen

ucLen(MainString, ucChar)

This͘"is a test" using (the quick) brown fox, and more.

Returns: 56

Explanation: This count each character much the same way VB's Len() function would.

Example: counting space-sensitive tokens with ucLen

ucLen(MainString, ucQuote+ucSpace)

This͘"is a test" using (the quick) brown fox, and more.

Returns: 21

Explanation: The counting here is similar to the first ucLen example except that each block of whitespace also counts as one unit. The space before "using" for instance counts as one, and the two consecutive spaces after "using" also count as one.

Example: Counting blocks with ucLen

ucLen(MainString, ucQuote+ucBlock)

This͘"is a test" using (the quick) brown fox, and more.

Returns: 10

Explanation: The text blocks that are counted are:

1. This͘

2. "is a test"

3. using

4. (the quick)

5. brown

6. fox

7. ,

8. and

9. more

10. .

ucUCase(MainString [, Pattern] [, Options])

Returns an uppercase version of MainString.

Pattern – ucUCase without the optional arguments behaves the same way as VB's UCase() function. If the optional pattern argument is used, then only text within MainString that matches Pattern is changed to uppercase.

Example:

MainString value:

This "is a test" using (the quick) brown "fox", (and) more.

ucUCase(MainString, "({text})")

Returns:

This "is a test" using (THE QUICK) brown "fox", (AND) more.

ucUCase(MainString, "{Q}{text}{Q}")

Returns:

This "IS A TEST" using (the quick) brown "FOX", (and) more.

ucUCase(MainString, "", ucSkipOver("({item})")

Returns:

THIS "IS A TEST" USING (the quick) BROWN "FOX", (and) MORE.

Explanation: Only text matching the pattern within the given string was changed to uppercase. SkipOver() causes it to match everything except the pattern passed to SkipOver.

ucLCase(MainString [, Pattern] [, Options])

Returns a lower case version of MainString.

See ucUCase.

ucMCase(MainString [, Pattern] [, Options])

Returns a mixed case version of MainString.

See ucUCase.

ucMid (MainString, Start [, Count] [, Options])

Returns a subset of a string.

Start – Starting position (counted by character, token, block, or expression)

Count – Length (in terms of characters, tokens, blocks, or expressions) of substring that should be returned.

Note: In this version, the second arg is required for it to work; use -1 to make it go to the end. [to be fixed.]

Example:

ucMid(MainString, 6)

THIS IS A TEST, "This is a test", (a test), a test, a test.

^ ^ ^ ^ ^ ^

1 2 3 4 5 6

Returns

"This is a test", (a test), a test, a test.

ucMid(MainString, 6, 4, ucQuote)

THIS IS A TEST, "This is a test", (a test), a test, a test.

^ ^ ^ ^ ^ ^

1 2 3 4 5 6

Returns

"This is a test", (a

The 4 tokens counted are

1. "This is a test"

2. ,

3. (

4. a

ucMid(MainString, 6, 4, ucChar)

THIS IS A TEST, "This is a test", (a test), a test, a test.

^^^^^^

123456

ucReplace (MainString, Pattern , Replacement [, Options])

Replaces all occurrences of one string pattern within MainString with another string.

Pattern – Pattern of text to find within MainString

Replacement – Replacement string for text that matches the Pattern argument. This string can contain pattern variables from the Pattern argument.

Example: ucReplace with default setting

ucReplace(MainString, "is {tokens:2}", "was {tokens}?")

THIS IS A TEST, This is a test, (is a test).

Returns

THIS was A TEST? This was a test?, (was a test?).

ucReplace(MainString, "is {tokens:2}", "was {tokens}?", ucSkipOver("This {etc},"))

THIS IS A TEST, This is a test, (is a test).

Returns

THIS IS A TEST, This is a test, (was a test?).

ucTally(MainString, Pattern [, Options])

Counts the number of occurrences of text matching a given pattern within a string. The count can be based on characters, tokens, blocks, or expressions. You can select whether or not to make the result whitespace-sensitive.

Examples:

ucTally(MainString, "is")

THIS IS A TEST, "This is a test", (is a test).

Returns: 3

ucTally(MainString, "is", ucQuote)

THIS IS A TEST, "This is a test", (is a test).

Returns: 2

ucTally(MainString, "is", ucChar)

THIS IS A TEST, "This is a test", (is a test).

Returns: 5

ucExtract(MainString, Pattern [, Options])

Extracts characters, tokens, blocks, or expressions up to a given pattern in a string. Similar to PowerBASIC's Extract, except that that one extracts only characters.

Example:

ucExtract(MainString, "<{tag}>{word:2}</{tag}>")

This is Test . This is another test. Testing 123.

Returns: This is Test . This is

Explanation: Returns text leading up to the first occurrence of text matching a pattern consisting of an HTML tag pair, containing exactly two words (tokens). It skips over Test which has 4 tokens ("Test", "<", "br", and ">"), and continues up until the second tag pair.

ucRemain(MainString, Pattern [, Options])

Returns text following a given pattern in a string. Similar to PowerBASIC's Remain, except that that one only deals with characters.

Example:

ucRemain(MainString, "<{tag}>{anything}</{tag}>")

This is Test . This is another test. Testing 123.

Returns: . This is another test. Testing 123.

Explanation: This returns text following (but not including) text that matches the pattern.

ucRetain(MainString, Pattern [, Options])

Returns a string consisting of only text that matches a given pattern. Similar to PowerBASIC's Retain() function except that that one deals only with characters.

Example:

ucRetain(ucFile("\Test\ucFile\cdcatalog.xml"), "<artist>{etc}</artist>", ucNth(22))

Returns: <artist>Luciano Pavarotti</artist>

Explanation: This example is meant to work with cdcatalog.xml, downloaded from http://www.w3schools.com/xml/cd_catalog.xml . ucFile() is a quick way to obtain all the text from one file. ucNth(22) tells it to retain only the 22th occurrence of the pattern of text between <artist> and </artist>.

ucRange(start, finish, Expression)

Returns a string consisting of a computed range of values. The variable x is the default iterator.

Example:

ucRange(65, 67, "Chr(x)")

Returns: ABC

ucRange (65, 67, "Chr(x)+Chr(x+32)", ucDelim(", ", "{", "}"))

Returns: {Aa, Bb, Cc}

Explanation: The expression in quotes is evaluated with values between 65 and 67. A concatenation of the ASCII Chr() codes for each is returned. In the second example, a upper and lower case pairs are computed, and they are separated with a delimiter defined by ucDelim. The first argument of ucDelim is the separator. The optional second and third arguments are the opening and closing string that will surround the computed range.

ucEquals(String1, String2)

Compares two strings. This powerful function compares two strings, not character by character (unless you use ucChar), but by pattern. Just as with other library routines, you can toggle space, quote, block, and case sensitivity, and you can also conveniently use ucSkip() to have it disregard certain parts when doing the operation (string comparison in this case).

Example:

ucEquals("THIS is a test", "this is a test") ' True

ucEquals("THIS, is a test?", "This is a test.") ' False

ucEquals("THIS, is a test?", "This is a test.", ucSkip("{ , | ? | . | <{tag}> }")) ' True

ucSetStringDefaults(Options)

All library routines allow a last argument that configures properties such as case sensitivity, delimiters, etc. If you will generally be using the same combination of properties, instead of passing it to each call, you can configure it before placing calls. Multiple properties are combined using the "+" operator.

Examples:

ucSetStringDefaults(ucSkip("<{tag}>"))

ucEquals("This is a test", "This is a TEST") ' True

ucEquals("This is a test", "This is a test") ' True

ucEquals("This is a test", "This is a test!") ' False

ucSetStringDefaults(ucCase+ucBlock+ucQuote)

ucLen("This (is a) ""test of tests"" ") ' Returns 3

ucLeft("This (is a) ""test of tests"" ", 2) ' Returns "This (is a)"

ucEqual("THIS IS A TEST", "This is a test") ' False

' In uCalc ConsoleCalculator

C:\> uCalc

.> SetStringDefaults(ucChar)

.> Len("This is a test")

. 14

.> Left("This is a test", 3)

. Thi

Other functions, which can be used in the Options argument here (or with all the other routines):

ucStart(Position)

ucLimit(Limit)

ucStartAfter(number)

ucStopAfter(number)

ucBetween(a, b)

ucNth(number)

ucPos(x)

ucLength(x)

ucText(x)

ucDelim(separator [, OpeningString, ClosingString])

ucSplit

This splits a text into parts. By default it splits the text into tokens. But you can use some of the same properties described above, or you can split based on a pattern. Once the text is split, you can retrieve the parts with ucStringItem(n), where n is a value between 1 and the number of parts, which is ucStringItemCount().