CS 211 Lesson 19
Strings and String Functions
Quote:
Nearly all men can stand adversity, but if you want to test a man's character, give him power. Abraham Lincoln
Lesson Objectives:
Understand how MATLAB represents string values
Be able to concatenate multiple string values
Be able to compare strings
Be able to find and replace substrings within strings
Be able to convert between numbers and strings
Lesson:
I. MATLAB Concepts
A. MATLAB Strings
MATLAB represents a text string as a row vector of characters (char data type), where each character is stored in 2 bytes of memory.
String literals are represented as a sequence of characters enclosed in single quotes, e.g., 'Jane Doe'.
Individual characters are stored as 16-bit Unicode values.
Consider the output of the following code:
Letters = 'abcde';
disp(Letters); % display Letters (as a char array)
fprintf('%s\n', Letters); % display Letters as a string
fprintf('%c\n', Letters); % display each character of Letters
fprintf('%d\n', Letters); % display Unicode values of Letters
When using disp(), MATLAB displays a character array variable as a string.
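The relationship between a string and its underlying character codes can be checked directly. The following is a minimal sketch; the variable names are illustrative, and sprintf() (the string-returning counterpart of fprintf()) is used here only so the formatted text can be stored in a variable.

```matlab
Letters = 'abcde';
Class_name = class(Letters);          % 'char'
Dimensions = size(Letters);           % [1 5] - a row vector with 5 columns
Codes = double(Letters);              % [97 98 99 100 101]
Code_text = sprintf('%d ', Letters);  % '97 98 99 100 101 ' (format reused per character)
```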
To input a value as a character array, you must include 's' as the second argument of the input() function. For example,
Name = input('Please enter your full name: ', 's');
You can reference individual or multiple characters within a char array in the same way you access elements within other row vectors. For example,
Letters = 'abcdefghijklmnopqrstuvwxyz';
A = Letters(2) % A contains the character 'b'
B = Letters(1:3) % B contains the characters 'abc'
C = Letters(3:end) % C contains the characters 'cdefghijklmnopqrstuvwxyz'
B. 2-D Character matrices
You can create 2-D character arrays, but all rows must have the same number of columns.
Consider the output of the following code:
Names1 = ['jane'; 'jim'; 'johnny']; % error - all strings not the same length
Names2 = ['jane  '; 'jim   '; 'johnny']; % notice the trailing blanks to make them all the same length
Names3 = char('jane', 'jim', 'johnny'); % the char() function adds the required trailing blanks for you
Cell arrays (which will be covered in Lesson 21) are the best way to store an array of strings.
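The padding added by char() stays in each row when you index into the matrix, so a row usually needs its trailing blanks removed before use. A small sketch of this, with illustrative variable names:

```matlab
Names = char('jane', 'jim', 'johnny');
[Rows, Cols] = size(Names);   % 3 rows, 6 columns (the longest name sets the width)
Second = Names(2, :);         % 'jim   ' - still padded to 6 characters
Trimmed = deblank(Second);    % 'jim'    - trailing blanks removed
```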
C. String Functions
MATLAB provides many useful functions for string manipulation. Only a few of them are presented in this lesson. If you would like to investigate additional string functions, search for "Characters and Strings" in the MATLAB help system.
Use the length() function (not size()) to get the number of characters in a string.
Use the num2str() and str2num() functions to convert between numbers and strings.
disp(['pi = ' num2str(pi)]);
x = str2num('23');
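A short sketch of round-tripping between numbers and strings follows. The variable names are illustrative; the second argument to num2str() requests a number of significant digits.

```matlab
Pi_text = num2str(pi);               % '3.1416' (default precision)
Pi_long = num2str(pi, 8);            % '3.1415927' (8 significant digits)
X = str2num('23') + 1;               % 24 - the result is a number, so arithmetic works
Message = ['Total = ' num2str(X)];   % 'Total = 24'
```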
To manipulate a character's Unicode value as a number, cast the character to a number.
x = double('A');
To convert a number into its equivalent Unicode character, cast the number into a char.
Letter = char(67);
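Because characters are stored as numeric codes, arithmetic on them is possible. The sketch below is illustrative only; the last line assumes its input contains nothing but upper case ASCII letters (in practice lower() is the safer choice).

```matlab
Code = double('A');             % 65
Letter = char(67);              % 'C'
Next_letter = char('A' + 1);    % 'B' - arithmetic on a char yields a number
Offset = 'a' - 'A';             % 32 - distance between lower and upper case codes
Lowered = char('HELLO' + 32);   % 'hello' (assumes only upper case ASCII letters)
```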
Use the ischar() logical function to determine if a variable is a char array.
Concatenation appends one string to the end of another.
You can append strings using square brackets:
Full_name = [First_name, ' ', Last_name]; % commas optional
You can also use the strcat() function to concatenate strings:
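Full_name = strcat(First_name, Last_name);
strcat() removes trailing white space from each char array argument, but preserves spaces within strings. As a result, a ' ' argument contributes nothing: strcat(First_name, ' ', Last_name) joins the two names with no space between them. Use square brackets when you need a separating space.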
strcat() takes two or more strings as input arguments, concatenates them, and returns a single string as an output argument.
strvcat() concatenates strings vertically, padding with spaces to create a 2-D matrix (like char()).
Names3 = strvcat('jane', 'jim', 'johnny');
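The difference between square brackets and strcat() shows up whenever an argument ends in a blank. A minimal sketch, with illustrative variable names:

```matlab
First_name = 'Jane ';                              % note the trailing blank
Last_name = 'Doe';
With_brackets = [First_name, Last_name];           % 'Jane Doe' - blanks are kept
With_strcat = strcat(First_name, Last_name);       % 'JaneDoe'  - trailing blank removed
With_space_arg = strcat('Jane', ' ', 'Doe');       % 'JaneDoe'  - the ' ' argument is stripped too
With_inner_space = strcat('Jane D.', 'Doe');       % 'Jane D.Doe' - interior blank preserved
```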
Use the strcmp() logical function (or one of its variants) to compare strings for equality.
Do not use the equality operator (==) to compare strings for equality!
Use strcmpi() to compare two strings ignoring case.
Use strncmp() to compare the first n characters of a string.
Use strncmpi() to compare the first n characters of a string ignoring case.
Consider the following code examples:
Name = 'Joe';
Name == 'joe' % compare character-by-character - bad idea
strcmp(Name, 'Joe') % compare strings
strcmp(Name, 'joe') % compare strings
strcmpi(Name, 'joe') % compare strings ignoring case
strncmp(Name, 'Joel', 3) % compare first 3 characters of strings
strncmpi(Name, 'joel', 3) % compare first 3 chars ignoring case
It is OK to compare individual characters in a string using the == (equality) operator:
Name(1) == 'J' % single character comparison
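The reason == is a poor choice for whole strings is that it works element by element: it returns a vector rather than a single true/false value, and it raises an error when the strings differ in length. The sketch below illustrates both effects; the variable names are illustrative.

```matlab
Name = 'Joe';
Per_char = (Name == 'joe');          % [0 1 1] - one result per character
Whole = strcmp(Name, 'joe');         % 0 (false) - a single answer
Diff_length = strcmp(Name, 'Joel');  % 0 (false) - no error for unequal lengths
try
    Result = (Name == 'Joel');       % sizes 1x3 and 1x4 do not match
    Comparison_failed = false;
catch
    Comparison_failed = true;        % == raised an error
end
```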
MATLAB does not provide a function to determine the alphabetical (lexicographical) ordering of two strings.
Use the deblank() function to remove trailing blanks from a string.
Use the strtrim() function to remove leading and trailing blanks from a string.
Use the lower() function to convert all upper case letters in a string to lower case.
Use the upper() function to convert all lower case letters in a string to upper case.
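These functions are often combined to normalize user input before comparing it. A brief sketch, assuming a hypothetical response string with stray blanks and mixed case:

```matlab
Raw = '   Yes  ';                 % hypothetical user input
No_trailing = deblank(Raw);       % '   Yes' - only trailing blanks removed
Clean = lower(strtrim(Raw));      % 'yes'    - both ends trimmed, then lower case
Is_yes = strcmp(Clean, 'yes');    % 1 (true)
Shout = upper(Clean);             % 'YES'
```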
Use the isstrprop() function to determine the category of individual characters within a string.
Categories include: alpha, alphanum, cntrl, digit, lower, wspace, upper, and xdigit (see isstrprop in the MATLAB help system).
Consider the following code:
String = 'I am 21!';
isstrprop(String, 'alpha')
isstrprop(String, 'alphanum')
isstrprop(String, 'digit')
isstrprop(String, 'lower')
isstrprop(String, 'upper')
isstrprop(String, 'wspace')
Use the findstr() function to find a substring within a string.
findstr() returns a row vector containing the starting indexes of the matching substrings within a string.
If the substring is not present in the string, findstr() returns an empty vector, not zero.
Use the isempty() function to test if a vector is empty.
Consider the following example code:
Text = 'Give the check to Bill. Bill always pays the bill.';
findstr(Text, 'Bill')
findstr(lower(Text), 'bill')
findstr(Text, 'eggplant')
Use the strmatch(string, array_of_strings) function to find a string within a 2-D array of strings, where each row is a separate string.
strmatch() returns a column vector containing the rows where matches were found.
If a third argument with a value of 'exact' is passed to strmatch(), then only exact matches will be recognized. Otherwise, partial matches are returned.
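The effect of the 'exact' argument can be seen with a small character matrix. This sketch uses illustrative names and data; note that a partial match means the search string matches the beginning of a row, so 'minimax' is not a match for 'max'.

```matlab
Words = char('max', 'minimax', 'maximum');   % 3-row padded character matrix
Partial = strmatch('max', Words);            % [1; 3] - rows that begin with 'max'
Exact = strmatch('max', Words, 'exact');     % 1      - only the row that is exactly 'max'
None = strmatch('min', Words, 'exact');      % empty  - no row is exactly 'min'
```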
Use the strrep(str1, str2, str3) function to replace every occurrence of substring str2 with string str3 within string str1. Consider the following example code:
Text = 'Give the check to Bill. Bill always pays the bill.';
strrep(Text, 'Bill', 'Jill')
strrep(Text, 'always ', '')
The strtok() function is useful for removing words or "tokens" (separated by delimiters) from a string.
The default delimiters are white space (blanks, tabs, and new-line characters).
The optional second parameter to strtok() is a vector of delimiters.
Consider the following code executed in sequence:
Remainder = 'This is a string.';
[Word Remainder] = strtok(Remainder)
[Word Remainder] = strtok(Remainder)
[Word Remainder] = strtok(Remainder)
[Word Remainder] = strtok(Remainder)
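Rather than repeating the call by hand, the same sequence can be written as a loop that runs until nothing is left of the string. This is a minimal sketch with illustrative variable names; the isempty() check guards against a final empty token when the string ends in white space.

```matlab
Remainder = 'This is a string.';
Count = 0;
Last_word = '';
while ~isempty(Remainder)
    [Word, Remainder] = strtok(Remainder);   % peel off the next token
    if ~isempty(Word)
        Count = Count + 1;
        Last_word = Word;
        fprintf('%s\n', Word);               % display each word on its own line
    end
end
```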
II. Good Programming Practices
Always use the MATLAB string functions to create, compare, and manipulate string data.
III. Algorithms
MATLAB does not have a built-in function that will compare
two strings and determine whether the first string is "less than", "equal
to", or "greater than" a second string. Such a comparison is necessary if
you want to sort a list of strings into alphabetical order, such as for a
dictionary.
A basic algorithm for determining whether one string is less
than another string requires that you skip over all characters that are
equal to each other. Then you can base your comparison of the two strings on
the first characters that are not equal. For example, when comparing "Johnathon"
with "Johnston", the first string is less than the second string because 'a'
is less than 's' in the 5th position of the strings.
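The "Johnathon" versus "Johnston" example can be checked with a few lines of MATLAB. This sketch uses find() on a vectorized comparison to locate the first mismatch, which is an alternative to skipping characters one at a time; the variable names are illustrative.

```matlab
String1 = 'Johnathon';
String2 = 'Johnston';
Shortest = min(length(String1), length(String2));   % only compare the overlapping part
Position = find(String1(1:Shortest) ~= String2(1:Shortest), 1);   % first mismatch: 5
Is_less = String1(Position) < String2(Position);    % 'a' < 's' is true
```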
An algorithm (written as a MATLAB function) for a lexicographical
comparison of two strings is below. The typical way to indicate "less than",
"equal to", or "greater than" is by returning -1, 0, and +1 respectively.
function Result = compare_strings(String1, String2) % function name is illustrative
% Skip over all the characters that are equal
Index = 1;
Shortest = min(length(String1), length(String2));
while Index <= Shortest && String1(Index) == String2(Index)
  Index = Index + 1;
end
% Return the appropriate code
if Index <= Shortest % found a position where the characters differ
  if String1(Index) < String2(Index)
    Result = -1;
  else
    Result = +1;
  end
elseif length(String1) == length(String2)
  Result = 0; % they are "equal"
elseif length(String1) < length(String2)
  Result = -1; % the shorter of the two strings is "less than"
else
  Result = +1;
end
end
Lab Work: Lab 19
References: Chapman Textbook: section 6.2