# Reading Textual Data

With <<, you can read files that contain *Mathematica* expressions given in input form. Sometimes, however, you may instead need to read files of *data* in other formats. For example, you may have data generated by an external program that consists of a sequence of numbers separated by spaces. This data cannot be read directly as *Mathematica* input. However, the function ReadList can take such data from a file or input stream and convert it to a *Mathematica* list.
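
As a minimal sketch of this workflow, the following writes a small file of space-separated numbers and reads it back with ReadList. The file name readlist-demo.txt and its location in $TemporaryDirectory are assumptions made only for illustration.

```mathematica
(* Write a small file of space-separated numbers, as an external program might. *)
(* The file name is hypothetical; any writable path would do. *)
file = FileNameJoin[{$TemporaryDirectory, "readlist-demo.txt"}];
out = OpenWrite[file];
WriteString[out, "11.1 22.2 33.3\n44.4 55.5 66.6\n"];
Close[out];

(* Read every number in the file into a single flat list. *)
numbers = ReadList[file, Number]
(* {11.1, 22.2, 33.3, 44.4, 55.5, 66.6} *)

(* Remove the scratch file again. *)
DeleteFile[file];
```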

| | |
| --- | --- |
| ReadList["file",{Number,Number}] | read numbers from a file, putting each successive pair into a separate list |
| ReadList["file",Table[Number,{n}]] | put each successive block of n numbers in a separate list |
| ReadList["file",Number,RecordLists->True] | put all the numbers on each line of the file into a separate list |
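
The three forms above can be sketched as follows. To keep the example self-contained, a string stream created with StringToStream stands in for "file", and readFromString is a hypothetical helper introduced here, not a built-in function.

```mathematica
data = "11.1 22.2 33.3\n44.4 55.5 66.6\n";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[text_String, args___] :=
  Module[{stream = StringToStream[text], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Successive pairs of numbers. *)
pairs = readFromString[data, {Number, Number}]
(* {{11.1, 22.2}, {33.3, 44.4}, {55.5, 66.6}} *)

(* Successive blocks of three numbers. *)
triples = readFromString[data, Table[Number, {3}]]
(* {{11.1, 22.2, 33.3}, {44.4, 55.5, 66.6}} *)

(* One list per line, even when the lines have different lengths. *)
byLine = readFromString["1 2 3\n4 5\n6\n", Number, RecordLists -> True]
(* {{1, 2, 3}, {4, 5}, {6}} *)
```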

ReadList can handle numbers that are given in Fortran-like "E" notation. Thus, for example, ReadList will read 2.5E+5 as 2.5×10^5. Note that ReadList can handle numbers with any number of digits of precision.
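
A short sketch of reading "E" notation, again using a string stream in place of a file (an assumption made so the example runs without any external data):

```mathematica
(* Numbers in Fortran-like "E" notation, spread over two lines. *)
stream = StringToStream["4.5E-5 7.8E4\n2.5E2 -8.9"];
values = ReadList[stream, Number];
Close[stream];

values
(* {0.000045, 78000., 250., -8.9} *)
```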

| | |
| --- | --- |
| ReadList["file",type] | read a sequence of objects of a particular type |
| ReadList["file",type,n] | read at most n objects |

Reading objects of various types.

ReadList can read not only numbers, but also a variety of other types of object. Each type of object is specified by a symbol such as Number.
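
For instance, the type Word reads text rather than numbers, and a third argument limits how many objects are read. This is a sketch on a string stream; readFromString is a hypothetical helper defined only for this example.

```mathematica
text = "Here is text.\nAnd more text.";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Read every word in the text. *)
allWords = readFromString[text, Word]
(* {"Here", "is", "text.", "And", "more", "text."} *)

(* Read at most three words. *)
firstThree = readFromString[text, Word, 3]
(* {"Here", "is", "text."} *)
```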

| | |
| --- | --- |
| Byte | single byte of data, returned as an integer |
| Character | single character, returned as a one-character string |
| Real | approximate number in Fortran-like notation |
| Number | exact or approximate number in Fortran-like notation |
| Word | sequence of characters delimited by word separators |
| Record | sequence of characters delimited by record separators |
| String | string terminated by a newline |
| Expression | complete Mathematica expression |
| Hold[Expression] | complete Mathematica expression, returned inside Hold |
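
The same text can be read as several of these types. The sketch below uses a string stream in place of a file, and readFromString is a hypothetical helper rather than a built-in function.

```mathematica
text = "ab\ncd";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Every character, including the newline, as a one-character string. *)
chars = readFromString[text, Character]
(* {"a", "b", "\n", "c", "d"} *)

(* The same data as integer byte values. *)
bytes = readFromString[text, Byte]
(* {97, 98, 10, 99, 100} *)

(* One string per line, with the newline removed. *)
lines = readFromString[text, String]
(* {"ab", "cd"} *)
```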

ReadList allows you to read "words" from a file. It considers a "word" to be any sequence of characters delimited by word separators. You can set the option WordSeparators to specify the strings you want to treat as word separators. The default is to include spaces and tabs, but not to include, for example, standard punctuation characters. Note that in all cases successive words can be separated by any number of word separators. These separators are never taken to be part of the actual words returned by ReadList.
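
As an illustration, the sketch below treats commas and semicolons as word separators instead of spaces and tabs. The sample text is made up, and a string stream stands in for a file.

```mathematica
(* With "," and ";" as the only word separators, a space no longer splits words. *)
stream = StringToStream["alpha,beta;;gamma delta\nepsilon"];
words = ReadList[stream, Word, WordSeparators -> {",", ";"}];
Close[stream];

words
(* {"alpha", "beta", "gamma delta", "epsilon"} *)
```

The doubled semicolon yields no empty word, and the newline still ends a word because it ends the record.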

| option name | default value | |
| --- | --- | --- |
| RecordLists | False | whether to make a separate list for the objects in each record |
| RecordSeparators | {"\r\n", "\n","\r"} | separators for records |
| WordSeparators | {" ","\t"} | separators for words |
| NullRecords | False | whether to keep zero-length records |
| NullWords | False | whether to keep zero-length words |
| TokenWords | {} | words to take as tokens |

Options for ReadList.

*Mathematica* considers any data file to consist of a sequence of *records*. By default, each line is considered to be a separate record. In general, you can set the option RecordSeparators to give a list of separators for records. Note that words can never cross record separators. As with word separators, any number of record separators can exist between successive records, and these separators are not considered to be part of the records themselves.
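
A sketch of a custom record separator, assuming semicolon-delimited sample text read from a string stream. The helper readFromString is hypothetical.

```mathematica
text = "a b;c d;e";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Each semicolon-delimited piece is one record. *)
records = readFromString[text, Record, RecordSeparators -> {";"}]
(* {"a b", "c d", "e"} *)

(* Words never cross record separators; RecordLists groups them by record. *)
wordsByRecord = readFromString[text, Word,
  RecordSeparators -> {";"}, RecordLists -> True]
(* {{"a", "b"}, {"c", "d"}, {"e"}} *)
```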

| | |
| --- | --- |
| ReadList["file",Record,RecordSeparators->{}] | read the whole of a file as a single string |
| ReadList["file",Record,RecordSeparators->{{"lsep_1",...},{"rsep_1",...}}] | make a list of those parts of a file that lie between the lsep_i and the rsep_i |

Settings for the RecordSeparators option.
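
Both settings can be sketched on string streams. The sample text and the "(: " and " :)" delimiters are chosen only for illustration, and readFromString is a hypothetical helper.

```mathematica
(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* With no record separators, the whole input is a single record. *)
whole = readFromString["line one\nline two", Record, RecordSeparators -> {}]
(* {"line one\nline two"} *)

(* With left and right separators, only the text between them is returned. *)
source = "f[x] (: function f :)\ng[x] (: function g :)";
comments = readFromString[source, Record,
  RecordSeparators -> {{"(: "}, {" :)"}}]
(* {"function f", "function g"} *)
```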

*Mathematica* usually allows any number of appropriate separators to appear between successive records or words. Sometimes, however, when several separators are present, you may want to assume that a "null record" or "null word" appears between each pair of adjacent separators. You can do this by setting the options NullRecords->True or NullWords->True.
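
The difference is easiest to see side by side. This sketch uses colon-separated sample text on a string stream; readFromString is a hypothetical helper.

```mathematica
text = "first:second::fourth";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* By default, adjacent separators are collapsed. *)
compact = readFromString[text, Word, WordSeparators -> {":"}]
(* {"first", "second", "fourth"} *)

(* With NullWords -> True, an empty word appears between adjacent separators. *)
withNullWords = readFromString[text, Word,
  WordSeparators -> {":"}, NullWords -> True]
(* {"first", "second", "", "fourth"} *)

(* NullRecords -> True does the same for empty lines. *)
withNullRecords = readFromString["x\n\ny", Record, NullRecords -> True]
(* {"x", "", "y"} *)
```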

In most cases, you want words to be delimited by separators that are not themselves considered as words. Sometimes, however, it is convenient to allow words to be delimited by special "token words", which are themselves words. You can give a list of such token words as a setting for the option TokenWords.
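
For example, arithmetic operators can serve as token words, so that they both delimit words and appear in the result. The input string here is made up, and a string stream stands in for a file.

```mathematica
(* "+" and "*" delimit words and are themselves returned as words. *)
stream = StringToStream["22*a*b+56*c"];
tokens = ReadList[stream, Word, TokenWords -> {"+", "*"}];
Close[stream];

tokens
(* {"22", "*", "a", "*", "b", "+", "56", "*", "c"} *)
```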

You can use ReadList to read *Mathematica* expressions from files. In general, each expression must end with a newline, although a single expression may go on for several lines.
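
A sketch of reading expressions, including one that continues onto a second line. It assumes that x, y and z have no values in the current session; readFromString is a hypothetical helper, and a string stream stands in for a file.

```mathematica
(* The first expression spans two lines; the second is on one line. *)
text = "x + y +\nz\n2^8\n";

(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Read and evaluate complete expressions. *)
exprs = readFromString[text, Expression]
(* {x + y + z, 256} *)

(* Read the same expressions without evaluating them. *)
held = readFromString[text, Hold[Expression]]
(* {Hold[x + y + z], Hold[2^8]} *)

(* For comparison, the raw lines as strings. *)
rawLines = readFromString[text, String]
(* {"x + y +", "z", "2^8"} *)
```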

ReadList can insert the objects it reads into any *Mathematica* expression. The second argument to ReadList can consist of any expression containing symbols such as Number and Word specifying objects to read. Thus, for example, ReadList["file", {Number, Number}] inserts successive pairs of numbers that it reads into lists. Similarly, ReadList["file", Hold[Expression]] puts expressions that it reads inside Hold.

If ReadList reaches the end of your file before it has finished reading a particular set of objects you have asked for, then it inserts the special symbol EndOfFile in place of the objects it has not yet read.
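
Both behaviors can be sketched on small inputs: a template that mixes types, and a template that the data does not fill completely. The sample data is made up, and readFromString is a hypothetical helper.

```mathematica
(* Hypothetical helper: run ReadList on a string stream, then close the stream. *)
readFromString[str_String, args___] :=
  Module[{stream = StringToStream[str], result},
    result = ReadList[stream, args];
    Close[stream];
    result];

(* Each {Number, Word} template is filled with a number followed by a word. *)
labeled = readFromString["1 one\n2 two", {Number, Word}]
(* {{1, "one"}, {2, "two"}} *)

(* Six numbers do not fill two blocks of four, so EndOfFile pads the last block. *)
padded = readFromString["1 2 3 4 5 6", {Number, Number, Number, Number}]
(* {{1, 2, 3, 4}, {5, 6, EndOfFile, EndOfFile}} *)
```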

| | |
| --- | --- |
| ReadList["!command",type] | execute a command, and read its output |
| ReadList[stream,type] | read any input stream |

Reading from commands and streams.
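
The sketch below reads from a stream object and from a command. The stream is created with StringToStream so that no file is needed. The command form assumes that the system provides a shell with an echo command and that running external commands is permitted, which may not hold in every environment.

```mathematica
(* Read from an input stream; ReadList does not close a stream it did not open. *)
stream = StringToStream["1 2 3"];
fromStream = ReadList[stream, Number];
Close[stream];

fromStream
(* {1, 2, 3} *)

(* Assumption: a shell with an echo command is available. *)
(* The command's output is read as if it were the contents of a file. *)
fromCommand = ReadList["!echo 4 5 6", Number]
```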

| | |
| --- | --- |
| OpenRead["file"] | open a file for reading |
| OpenRead["!command"] | open a pipe for reading |
| Read[stream,type] | read an object of the specified type from a stream |
| Skip[stream,type] | skip over an object of the specified type in an input stream |
| Skip[stream,type,n] | skip over n objects of the specified type in an input stream |
| Close[stream] | close an input stream |

Functions for reading from input streams.

ReadList allows you to read *all* the data in a particular file or input stream. Sometimes, however, you want to get data a piece at a time, perhaps doing tests to find out what kind of data to expect next.

When you read individual pieces of data from a file, *Mathematica* always remembers the "current point" that you are at in the file. When you call OpenRead, *Mathematica* sets up an input stream from a file, and makes your current point the beginning of the file. Every time you read an object from the file using Read, *Mathematica* sets your current point to be just after the object you have read. Using Skip, you can advance the current point past a sequence of objects without actually reading the objects.
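
The way the current point advances can be sketched as follows. A string stream stands in for the result of OpenRead["file"], an assumption made so that the example needs no file on disk.

```mathematica
(* The current point starts at the beginning of the data. *)
stream = StringToStream["11.1 22.2 33.3\n44.4 55.5 66.6"];

(* Read one number; the current point moves just past it. *)
first = Read[stream, Number];
(* 11.1 *)

(* Read the next two numbers as a pair. *)
pair = Read[stream, {Number, Number}];
(* {22.2, 33.3} *)

(* Advance past two numbers without reading them. *)
Skip[stream, Number, 2];

(* Only the final number remains. *)
last = Read[stream, Number];
(* 66.6 *)

Close[stream];

{first, pair, last}
(* {11.1, {22.2, 33.3}, 66.6} *)
```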

You can use the options WordSeparators and RecordSeparators in Read and Skip just as you do in ReadList.

Note that if you try to read past the end of file, Read returns the symbol EndOfFile.
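
One common pattern that combines these two points is a loop that calls Read with a custom separator until EndOfFile appears. This is a sketch on a made-up, comma-separated string, not the only way to structure such a loop.

```mathematica
stream = StringToStream["alpha,beta,gamma"];
words = {};

(* Read one comma-delimited word at a time until the data is exhausted. *)
While[(word = Read[stream, Word, WordSeparators -> {","}]) =!= EndOfFile,
  AppendTo[words, word]];

Close[stream];

words
(* {"alpha", "beta", "gamma"} *)
```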