Searching Files
| FindList["file","text"] | get a list of all the lines in the file that contain the specified text |
| FindList["file","text",n] | get a list of the first n lines that contain the specified text |
| FindList["file",{"text1","text2",...}] | get lines that contain any of text1, text2, ... |
Finding lines that contain specified text.
Here is a file containing some text.
This returns a list of all the lines in the file containing the text "is".
The text "fourth" appears nowhere in the file, so the result is an empty list.
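The calls described above can be sketched as follows. The file path and its three lines of text are assumptions made for illustration, since the contents of the file are not reproduced here.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

(* All lines containing "is" *)
all = FindList[file, "is"]
(* {"Here is the first line of text.", "And the third. Here is the end."} *)

(* At most one line containing "the" *)
firstOnly = FindList[file, "the", 1]

(* Lines containing either "first" or "second" *)
either = FindList[file, {"first", "second"}]

(* Text that appears nowhere in the file gives an empty list *)
none = FindList[file, "fourth"]
(* {} *)
```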
By default, FindList scans successive lines of a file, and returns those lines which contain the text you specify. In general, however, you can get FindList to scan successive records, and return complete records which contain specified text. As in ReadList, the option RecordSeparators allows you to tell Mathematica what strings you want to consider as record separators. Note that by giving a pair of lists as the setting for RecordSeparators, you can specify different left and right separators. By doing this, you can make FindList search only for text which is between specific pairs of separators.
This finds all "sentences" ending with a period which contain "And".
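A minimal sketch of a record-based search, again assuming an illustrative three-line sample file. Because only the period is a separator here, a record may keep the newline that preceded it, so the sketch trims whitespace before displaying the results.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

(* Treat "." as the record separator, so that each record is one "sentence" *)
sentences = FindList[file, "And", RecordSeparators -> {"."}];

(* Strip any leading or trailing whitespace kept inside the records *)
trimmed = StringTrim /@ sentences
```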
Options for FindList.
This finds only the occurrence of "Here" which is at the beginning of a line in the file.
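The anchored search described above might look like the sketch below. The option name AnchoredSearch does not appear in the surrounding text; it is the standard FindList option for requiring a match at the start of a record. The sample file is an assumption.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

(* Match "Here" only when it starts a record (by default, a line) *)
anchored = FindList[file, "Here", AnchoredSearch -> True]
(* {"Here is the first line of text."} *)

(* Without the option, "Here" is found anywhere in a line *)
unanchored = FindList[file, "Here"]
(* {"Here is the first line of text.", "And the third. Here is the end."} *)
```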
In general, FindList finds text that appears anywhere inside a record. By setting the option WordSearch->True, however, you can tell FindList to require that the text it is looking for appears as a separate word in the record. The option WordSeparators specifies the list of separators for words.
The text "th" does appear in the file, but not as a word. As a result, FindList returns an empty list.
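The difference between a plain search and a word search can be sketched as follows, using an assumed three-line sample file. The last call is an illustration of WordSeparators: adding "." to the separators is intended to let "second" count as a word even though a period follows it.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

(* "th" occurs inside words such as "the" and "third" *)
asSubstring = FindList[file, "th"]

(* Requiring "th" to be a separate word finds nothing *)
asWord = FindList[file, "th", WordSearch -> True]
(* {} *)

(* Treat both spaces and periods as word separators *)
withPeriod = FindList[file, "second",
  WordSearch -> True, WordSeparators -> {" ", "."}]
```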
| FindList[{"file1","file2",...},"text"] | search for occurrences of the text in any of file1, file2, ... |
Searching in multiple files.
This searches for "third" in two copies of textfile.
It is often useful to call FindList on lists of files generated by functions such as FileNames.
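As a sketch of this pattern, the following writes two copies of an assumed sample text into a fresh temporary directory, collects them with FileNames, and searches both. The file names copy1.txt and copy2.txt are hypothetical.

```wolfram
(* Assumption: two illustrative copies of the same text in a new temporary directory *)
dir = CreateDirectory[];
text = "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n";
Do[
  out = OpenWrite[FileNameJoin[{dir, name}]];
  WriteString[out, text];
  Close[out],
  {name, {"copy1.txt", "copy2.txt"}}];

(* Collect every .txt file in the directory *)
files = FileNames["*.txt", dir];

(* Search all of the files; matching lines are returned in a single flat list *)
hits = FindList[files, "third"]
(* {"And the third. Here is the end.", "And the third. Here is the end."} *)
```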
| FindList["!command",...] | run an external command, and find text in its output |
Finding text in the output from an external program.
This runs the external Unix command date in a text-based interface.
This finds the time-of-day field in the date.
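One way to pick out the time-of-day field is sketched below. It assumes a Unix-like system where the date command is available and prints the time in a form such as 12:34:56, with fields separated by spaces; the actual result depends on the system and locale.

```wolfram
(* Assumption: a Unix-like system whose date command prints a colon-separated time *)
(* Split the command's output on spaces and keep the fields that contain ":" *)
fields = FindList["!date", ":", RecordSeparators -> {" "}]
```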
| OpenRead["file"] | open a file for reading |
| OpenRead["!command"] | open a pipe for reading |
| Find[stream,text] | find the next occurrence of text |
| Close[stream] | close an input stream |
Finding successive occurrences of text.
FindList works by making one pass through a particular file, looking for occurrences of the text you specify. Sometimes, however, you may want to search incrementally for successive occurrences of a piece of text. You can do this using Find.
In order to use Find, you first explicitly have to open an input stream using OpenRead. Then, every time you call Find on this stream, it will search for the text you specify, and make the current point in the file be just after the record it finds. As a result, you can call Find several times to find successive pieces of text.
This opens an input stream for textfile.
This finds the first line containing "And".
Calling Find again gives you the next line containing "And".
This closes the input stream.
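The sequence above can be sketched as follows, assuming an illustrative three-line sample file. The third call is added to show what Find returns once no further matches remain.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

stream = OpenRead[file];

(* Each call resumes just after the record found by the previous call *)
first = Find[stream, "And"]    (* "And the second." *)
second = Find[stream, "And"]   (* "And the third. Here is the end." *)

(* With no further matches, Find returns the symbol EndOfFile *)
third = Find[stream, "And"]    (* EndOfFile *)

Close[stream];
```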
Once you have an input stream, you can mix calls to Find, Skip and Read. If you ever call FindList or ReadList, Mathematica will immediately read to the end of the input stream.
This opens the input stream.
This finds the first line which contains "second", and leaves the current point in the file at the beginning of the next line.
Read can then read the word that appears at the beginning of the line.
This skips over the next three words.
Mathematica finds "is" in the remaining text, and prints the entire record as output.
This closes the input stream.
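A sketch of mixing Find, Read and Skip on one stream, assuming an illustrative three-line sample file. The record returned by the final Find is whatever remains of the current line, so its exact leading whitespace is not shown here.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

stream = OpenRead[file];

(* Find the line containing "second"; the current point moves to the next line *)
found = Find[stream, "second"]   (* "And the second." *)

(* Read the first word of the following line *)
word = Read[stream, Word]        (* "And" *)

(* Skip the next three words: "the", "third." and "Here" *)
Skip[stream, Word, 3];

(* Search the rest of the line for "is" *)
rest = Find[stream, "is"]

Close[stream];
```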
Finding and setting the current point in a stream.
Functions like Read, Skip and Find usually operate on streams in an entirely sequential fashion. Each time one of the functions is called, the current point in the stream moves on.
Sometimes, you may need to know where the current point in a stream is, and be able to reset it. On most computer systems, StreamPosition returns the position of the current point as an integer giving the number of bytes from the beginning of the stream.
When you first open the file, the current point is at the beginning, and StreamPosition returns 0.
This reads the first line in the file.
Now the current point has advanced.
This sets the stream position back.
Now Read returns the remainder of the first line.
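The steps above can be sketched as follows, assuming an illustrative three-line sample file. SetStreamPosition is not named in the surrounding text; it is the standard function for resetting the current point. The offset 5 is chosen only because it falls just after "Here " in the assumed first line.

```wolfram
(* Assumption: an illustrative three-line sample file in the temporary directory *)
file = FileNameJoin[{$TemporaryDirectory, "textfile"}];
out = OpenWrite[file];
WriteString[out,
  "Here is the first line of text.\nAnd the second.\nAnd the third. Here is the end.\n"];
Close[out];

stream = OpenRead[file];

(* A newly opened stream starts at position 0 *)
start = StreamPosition[stream]    (* 0 *)

(* Reading a line advances the current point *)
line = Read[stream, String]       (* "Here is the first line of text." *)
advanced = StreamPosition[stream]

(* Move back to byte 5, just after "Here " *)
SetStreamPosition[stream, 5];

(* Read now returns the remainder of the first line *)
rest = Read[stream, String]       (* "is the first line of text." *)

Close[stream];
```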