This is documentation for Mathematica 5, which was
based on an earlier version of the Wolfram Language.

2.12.8 Reading Textual Data

With <<, you can read files which contain Mathematica expressions given in input form. Sometimes, however, you may instead need to read files of data in other formats. For example, you may have data generated by an external program which consists of a sequence of numbers separated by spaces. This data cannot be read directly as Mathematica input. However, the function ReadList can take such data from a file or input stream, and convert it to a Mathematica list.

Reading numbers from a file.

Here is a file of numbers.

In[1]:= !!numbers

"11.1 22.2 33.3

44.4 55.5 66.6"

This reads all the numbers in the file, and returns a list of them.

In[2]:= ReadList["numbers", Number]

Out[2]= {11.1, 22.2, 33.3, 44.4, 55.5, 66.6}

Reading blocks of numbers.

This puts each successive pair of numbers from the file into a separate list.

In[3]:= ReadList["numbers", {Number, Number}]

Out[3]= {{11.1, 22.2}, {33.3, 44.4}, {55.5, 66.6}}

This makes each line in the file into a separate list.

In[4]:= ReadList["numbers", Number, RecordLists -> True]

Out[4]= {{11.1, 22.2, 33.3}, {44.4, 55.5, 66.6}}

ReadList can handle numbers which are given in Fortran-like "E" notation. Thus, for example, ReadList will read 2.5E+5 as 2.5 × 10^5. Note that ReadList can handle numbers with any number of digits of precision.

Here is a file containing numbers in Fortran-like "E" notation.

In[5]:= !!bignum

"4.5E-5 7.8E4

2.5E2 -8.9"

ReadList can handle numbers in this form.

In[6]:= ReadList["bignum", Number]

Out[6]=
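The remark about precision can be tried without creating a data file. The following is only a sketch: it assumes StringToStream, which is not otherwise used in this section, to make an input stream out of a string, and the names s and nums and the 30-digit value are invented for the illustration.

```mathematica
(* Assumption: StringToStream makes an input stream from a string, so no file is needed. *)
s = StringToStream["2.5E+5 1.23456789012345678901234567890"];

(* Read every number in the stream, then release the stream. *)
nums = ReadList[s, Number];
Close[s];

(* Compare the precision of the "E"-notation number with that of the long number. *)
Precision /@ nums
```

The second precision should come out larger than the first, since the long number carries more digits than a machine-precision number can hold.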

Reading objects of various types.

ReadList can read not only numbers, but also a variety of other types of object. Each type of object is specified by a symbol such as Number.

Here is a file containing text.

In[7]:= !!strings

"Here is text.

And more text."

This produces a list of the characters in the file, each given as a one-character string.

In[8]:= ReadList["strings", Character]

Out[8]=

Here are the integer codes corresponding to each of the bytes in the file.

In[9]:= ReadList["strings", Byte]

Out[9]=

This puts the data from each line in the file into a separate list.

In[10]:= ReadList["strings", Byte, RecordLists -> True]

Out[10]=

Types of objects to read.

This returns a list of the "words" in the file strings.

In[11]:= ReadList["strings", Word]

Out[11]= {Here, is, text., And, more, text.}

ReadList allows you to read "words" from a file. It considers a "word" to be any sequence of characters delimited by word separators. You can set the option WordSeparators to specify the strings you want to treat as word separators. The default is to include spaces and tabs, but not to include, for example, standard punctuation characters. Note that in all cases successive words can be separated by any number of word separators. These separators are never taken to be part of the actual words returned by ReadList.

Options for ReadList.

This reads the text in the file strings as a sequence of words, using the letter e and . as word separators.

In[12]:= ReadList["strings", Word, WordSeparators -> {"e", "."}]

Out[12]=

Mathematica considers any data file to consist of a sequence of records. By default, each line is considered to be a separate record. In general, you can set the option RecordSeparators to give a list of separators for records. Note that words can never cross record separators. As with word separators, any number of record separators can exist between successive records, and these separators are not considered to be part of the records themselves.

By default, each line of the file is considered to be a record.

In[13]:= ReadList["strings", Record] // InputForm

Out[13]//InputForm= {"Here is text. ", "And more text."}

Here is a file containing three "sentences" ending with periods.

In[14]:= !!sentences

"Here is text. And more.

And a second line."

This allows both periods and newlines as record separators.

In[15]:= ReadList["sentences", Record,
RecordSeparators -> {".", "\n"}]

Out[15]=

This puts the words in each "sentence" into a separate list.

In[16]:= ReadList["sentences", Word, RecordLists -> True,
RecordSeparators -> {".", "\n"}]

Out[16]=

Settings for the RecordSeparators option.

Here is a file containing some text.

In[17]:= !!source

"f[x] (: function f :)

g[x] (: function g :)"

This reads all the text in the file source, and returns it as a single string.

In[18]:= InputForm[
ReadList["source", Record, RecordSeparators -> { }]
]

Out[18]//InputForm= {"f[x] (: function f :)\ng[x] (: function g :)\n"}

This gives a list of the parts of the file that lie between (: and :) separators.

In[19]:= ReadList["source", Record,
RecordSeparators -> {{"(: "}, {" :)"}}]

Out[19]=

By choosing appropriate separators, you can pick out specific parts of files.

In[20]:= ReadList[ "source", Record,
RecordSeparators ->
{{"(: function ", "["}, {" :)", "]"}} ]

Out[20]=

Mathematica usually allows any number of appropriate separators to appear between successive records or words. Sometimes, however, when several separators are present, you may want to assume that a "null record" or "null word" appears between each pair of adjacent separators. You can do this by setting the options NullRecords -> True or NullWords -> True.

Here is a file containing "words" separated by colons.

In[21]:= !!words

"first:second::fourth:::seventh"

Here the repeated colons are treated as single separators.

In[22]:= ReadList["words", Word, WordSeparators -> {":"}]

Out[22]= {first, second, fourth, seventh}

Now repeated colons are taken to have null words in between.

In[23]:= ReadList["words", Word, WordSeparators -> {":"},
NullWords -> True]

Out[23]= {first, second, , fourth, , , seventh}
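NullRecords works the same way for records. As a hedged sketch (StringToStream, the sample text, and the names text, plain and withNulls are assumptions made so that the example needs no file), a blank line shows up as an empty string once NullRecords -> True is set.

```mathematica
(* Sample text with an empty line between two non-empty lines. *)
text = "first\n\nthird";

(* Default: the two adjacent newlines are treated as a single separator. *)
s = StringToStream[text];
plain = ReadList[s, Record];
Close[s];

(* NullRecords -> True: a null record is taken to lie between the two newlines. *)
s = StringToStream[text];
withNulls = ReadList[s, Record, NullRecords -> True];
Close[s];

{plain, withNulls} // InputForm
```

This can be useful when the position of a line in the file matters, since blank lines are otherwise dropped from the result.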

In most cases, you want words to be delimited by separators which are not themselves considered as words. Sometimes, however, it is convenient to allow words to be delimited by special "token words", which are themselves words. You can give a list of such token words as a setting for the option TokenWords.

Here is some text.

In[24]:= !!language

"22*a*b+56*c+13*a*d"

This reads the text, using the specified token words to delimit words in the text.

In[25]:= ReadList["language", Word, TokenWords -> {"+", "*"}]

Out[25]= {22, *, a, *, b, +, 56, *, c, +, 13, *, a, *, d}

You can use ReadList to read Mathematica expressions from files. In general, each expression must end with a newline, although a single expression may go on for several lines.

Here is a file containing text that can be used as Mathematica input.

In[26]:= !!exprs

"x + y +

z

2^8"

This reads the text in exprs as Mathematica expressions.

In[27]:= ReadList["exprs", Expression]

Out[27]=

This prevents the expressions from being evaluated.

In[28]:= ReadList["exprs", Hold[Expression]]

Out[28]=

ReadList can insert the objects it reads into any Mathematica expression. The second argument to ReadList can consist of any expression containing symbols such as Number and Word specifying objects to read. Thus, for example, ReadList["file", {Number, Number}] inserts successive pairs of numbers that it reads into lists. Similarly, ReadList["file", Hold[Expression]] puts expressions that it reads inside Hold.
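As a small illustration of mixing types in one pattern, the sketch below reads a name followed by a value from each line. The sample text, the names s and pairs, and the use of StringToStream in place of a file are all assumptions made for the example.

```mathematica
(* Assumption: StringToStream stands in for a file with one "name value" pair per line. *)
s = StringToStream["alpha 1.5\nbeta 2.5"];

(* Each {Word, Number} block consumes one word and then one number. *)
pairs = ReadList[s, {Word, Number}];
Close[s];

pairs // InputForm
```

Each sublist of the result holds a string and a number, in the order in which the types appear in the pattern.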

If ReadList reaches the end of your file before it has finished reading a particular set of objects you have asked for, then it inserts the special symbol EndOfFile in place of the objects it has not yet read.

Here is a file of numbers.

In[29]:= !!numbers

"11.1 22.2 33.3

44.4 55.5 66.6"

The symbol EndOfFile appears in place of numbers that were needed after the end of the file was reached.

In[30]:= ReadList["numbers", {Number, Number, Number, Number}]

Out[30]= {{11.1, 22.2, 33.3, 44.4}, {55.5, 66.6, EndOfFile, EndOfFile}}

Reading from commands and streams.

This executes the Unix command date, and reads its output as a string.

In[31]:= ReadList["!date", String]

Out[31]=

Functions for reading from input streams.

ReadList allows you to read all the data in a particular file or input stream. Sometimes, however, you want to get data a piece at a time, perhaps doing tests to find out what kind of data to expect next.

When you read individual pieces of data from a file, Mathematica always remembers the "current point" that you are at in the file. When you call OpenRead, Mathematica sets up an input stream from a file, and makes your current point the beginning of the file. Every time you read an object from the file using Read, Mathematica sets your current point to be just after the object you have read. Using Skip, you can advance the current point past a sequence of objects without actually reading the objects.

Here is a file of numbers.

In[32]:= !!numbers

"11.1 22.2 33.3

44.4 55.5 66.6"

This opens an input stream from the file.

In[33]:= snum = OpenRead["numbers"]

Out[33]=

This reads the first number from the file.

In[34]:= Read[snum, Number]

Out[34]= 11.1

This reads the next pair of numbers.

In[35]:= Read[snum, {Number, Number}]

Out[35]= {22.2, 33.3}

This skips the next number.

In[36]:= Skip[snum, Number]

And this reads the remaining numbers.

In[37]:= ReadList[snum, Number]

Out[37]= {55.5, 66.6}

This closes the input stream.

In[38]:= Close[snum]

Out[38]=

You can use the options WordSeparators and RecordSeparators in Read and Skip just as you do in ReadList.
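A minimal sketch of this, assuming a comma-separated string read through StringToStream rather than a file (the sample text and the names s and w are invented for the example):

```mathematica
(* Assumption: StringToStream stands in for a file of comma-separated words. *)
s = StringToStream["alpha,beta,gamma"];

(* Advance the current point past the first word, using commas as separators. *)
Skip[s, Word, WordSeparators -> {","}];

(* Read the next word with the same separators. *)
w = Read[s, Word, WordSeparators -> {","}];
Close[s];

w
```

Since the first word is skipped, the word that is read is the second one in the string.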

Note that if you try to read past the end of file, Read returns the symbol EndOfFile.
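One way to use this is to read objects in a loop until EndOfFile appears. The sketch below is illustrative only: StringToStream and the names s, x and values are assumptions, and with a real file you would open the stream using OpenRead instead.

```mathematica
(* Assumption: StringToStream stands in for OpenRead on a file of numbers. *)
s = StringToStream["11.1 22.2 33.3"];
values = {};

(* Keep reading one number at a time until Read returns EndOfFile. *)
While[(x = Read[s, Number]) =!= EndOfFile,
  AppendTo[values, x]
];
Close[s];

values
```

Reading one piece at a time in this way leaves room to test each value, or to decide what type to read next, before going on.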