It's fairly common for programs to have a need to do some simple kinds of lexical analysis and parsing, such as splitting a command string up into tokens. You can do this with the strtok
function, declared in the header file `string.h'.
strtok
.
The string to be split up is passed as the newstring argument on the first call only. The strtok
function uses this to set up some internal state information. Subsequent calls to get additional tokens from the same string are indicated by passing a null pointer as the newstring argument. Calling strtok
with another non-null newstring argument reinitializes the state information. It is guaranteed that no other library function ever calls strtok
behind your back (which would mess up this internal state information).
The delimiters argument is a string that specifies a set of delimiters that may surround the token being extracted. All the initial characters that are members of this set are discarded. The first character that is not a member of this set of delimiters marks the beginning of the next token. The end of the token is found by looking for the next character that is a member of the delimiter set. This character in the original string newstring is overwritten by a null character, and the pointer to the beginning of the token in newstring is returned.
On the next call to strtok
, the searching begins at the next character beyond the one that marked the end of the previous token. Note that the set of delimiters delimiters do not have to be the same on every call in a series of calls to strtok
.
If the end of the string newstring is reached, or if the remainder of string consists only of delimiter characters, strtok
returns a null pointer.
Warning: Since strtok
alters the string it is parsing, you always copy the string to a temporary buffer before parsing it with strtok
. If you allow strtok
to modify a string that came from another part of your program, you are asking for trouble; that string may be part of a data structure that could be used for other purposes during the parsing, when alteration by strtok
makes the data structure temporarily inaccurate.
The string that you are operating on might even be a constant. Then when strtok
tries to modify it, your program will get a fatal signal for writing in read-only memory. See Program Error Signals.
This is a special case of a general principle: if a part of a program does not have as its purpose the modification of a certain data structure, then it is error-prone to modify the data structure temporarily.
The function strtok
is not reentrant. See Nonreentrancy, for a discussion of where and why reentrancy is important.
Here is a simple example showing the use of strtok
.
#include <string.h> #include <stddef.h> ... char string[] = "words separated by spaces -- and, punctuation!"; const char delimiters[] = " .,;:!-"; char *token; ... token = strtok (string, delimiters); /* token => "words" */ token = strtok (NULL, delimiters); /* token => "separated" */ token = strtok (NULL, delimiters); /* token => "by" */ token = strtok (NULL, delimiters); /* token => "spaces" */ token = strtok (NULL, delimiters); /* token => "and" */ token = strtok (NULL, delimiters); /* token => "punctuation" */ token = strtok (NULL, delimiters); /* token => NULL */