Class SyntaxTokenizer


  • public class SyntaxTokenizer
    extends java.lang.Object
    A utility class that breaks Mathematica code into four syntax classes: strings, comments, symbols, and normal (meaning everything else). This class is used by MathSessionPane to implement its syntax coloring feature, but you can also use it directly in your own programs.

    To use a SyntaxTokenizer, construct one and then call its setText() method, supplying the Mathematica input you want tokenized. Then call getNextRecord() repeatedly, for as long as hasMoreRecords() returns true, to get SyntaxRecords. Each record tells you the type of the syntax element, its start position, and its length in characters.

    This process is very fast: you can iterate through 100,000 characters of Mathematica code in a small fraction of a second.

    Here is some sample code that demonstrates how to use a SyntaxTokenizer:

            String input = "some Mathematica code here";
            SyntaxTokenizer tok = new SyntaxTokenizer();
            // Supply the text to tokenize.
            tok.setText(input);
            while(tok.hasMoreRecords()) {
                    SyntaxTokenizer.SyntaxRecord rec = tok.getNextRecord();
                    // rec.type is one of NORMAL, STRING, COMMENT, or SYMBOL.
                    System.out.println("type: " + rec.type);
                    // rec.start and rec.length locate the element within the input.
                    System.out.println("text: " + input.substring(rec.start, rec.start + rec.length));
            }
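
    The records produced by the loop above are typically consumed by grouping or styling spans of the input according to their type. The following is a minimal sketch of that consuming side only, written so that it runs without J/Link on the classpath. The Span class, the numeric type codes, the textOfType() helper, and the hand-written span list are all assumptions made for illustration; they are not part of SyntaxTokenizer, and the real tokenizer may divide the same input differently. In real code you would read type, start, and length from each SyntaxTokenizer.SyntaxRecord and compare against the SyntaxTokenizer constants.

```java
import java.util.ArrayList;
import java.util.List;

class SyntaxSpanSketch {

    // Hypothetical stand-in for SyntaxTokenizer.SyntaxRecord: it carries the
    // same three values (type, start, length) that the sample loop reads.
    static final class Span {
        final int type;
        final int start;
        final int length;

        Span(int type, int start, int length) {
            this.type = type;
            this.start = start;
            this.length = length;
        }
    }

    // Placeholder type codes for this sketch only. Real code should use
    // SyntaxTokenizer.NORMAL, STRING, COMMENT, and SYMBOL instead.
    static final int NORMAL = 0;
    static final int STRING = 1;
    static final int COMMENT = 2;
    static final int SYMBOL = 3;

    static final String DEMO_INPUT = "f[x] (* note *) \"hi\"";

    // Hand-written spans standing in for what a tokenizer loop might yield.
    // The exact boundaries are an assumption, not observed tokenizer output.
    static List<Span> demoSpans() {
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(SYMBOL, 0, 1));    // f
        spans.add(new Span(NORMAL, 1, 1));    // [
        spans.add(new Span(SYMBOL, 2, 1));    // x
        spans.add(new Span(NORMAL, 3, 2));    // ] and a space
        spans.add(new Span(COMMENT, 5, 10));  // (* note *)
        spans.add(new Span(NORMAL, 15, 1));   // a space
        spans.add(new Span(STRING, 16, 4));   // "hi" including the quotes
        return spans;
    }

    // Collects the text of every span whose type matches wantedType,
    // using the same substring arithmetic as the sample loop.
    static List<String> textOfType(String input, List<Span> spans, int wantedType) {
        List<String> out = new ArrayList<>();
        for (Span s : spans) {
            if (s.type == wantedType) {
                out.add(input.substring(s.start, s.start + s.length));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textOfType(DEMO_INPUT, demoSpans(), SYMBOL));
        System.out.println(textOfType(DEMO_INPUT, demoSpans(), COMMENT));
    }
}
```

    A syntax-coloring component could follow the same pattern, applying a style to each start/length range instead of collecting the text.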
    Since:
    2.0
    See Also:
    MathSessionPane
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      class  SyntaxTokenizer.SyntaxRecord
      A simple class that encapsulates information about a syntax element.
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int COMMENT
      A syntax type that corresponds to a Mathematica comment.
      static int NORMAL
      A syntax type that consists of everything other than STRING, COMMENT, or SYMBOL.
      static int STRING
      A syntax type that corresponds to a literal string.
      static int SYMBOL
      A syntax type that corresponds to a Mathematica symbol.
    • Constructor Summary

      Constructors 
      Constructor Description
      SyntaxTokenizer()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      SyntaxTokenizer.SyntaxRecord getNextRecord()
      Gets the next SyntaxRecord specifying the type of the element (SYMBOL, STRING, COMMENT or NORMAL), its start position, and length.
      boolean hasMoreRecords()
      Returns true if there are more records left in the text, or false if the end of the input has been reached.
      void reset()
      Resets the state of the tokenizer so that the next call to getNextRecord() will retrieve the first record in the text.
      void setText(java.lang.String text)
      Sets the Mathematica input text to tokenize.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • NORMAL

        public static final int NORMAL
        A syntax type that consists of everything other than STRING, COMMENT, or SYMBOL.
        See Also:
        Constant Field Values
      • STRING

        public static final int STRING
        A syntax type that corresponds to a literal string.
        See Also:
        Constant Field Values
      • COMMENT

        public static final int COMMENT
        A syntax type that corresponds to a Mathematica comment.
        See Also:
        Constant Field Values
      • SYMBOL

        public static final int SYMBOL
        A syntax type that corresponds to a Mathematica symbol.
        See Also:
        Constant Field Values
    • Constructor Detail

      • SyntaxTokenizer

        public SyntaxTokenizer()
    • Method Detail

      • setText

        public void setText(java.lang.String text)
        Sets the Mathematica input text to tokenize.
        Parameters:
        text - the Mathematica input text to tokenize
      • reset

        public void reset()
        Resets the state of the tokenizer so that the next call to getNextRecord() will retrieve the first record in the text.
      • getNextRecord

        public SyntaxTokenizer.SyntaxRecord getNextRecord()
        Gets the next SyntaxRecord specifying the type of the element (SYMBOL, STRING, COMMENT or NORMAL), its start position, and length.
      • hasMoreRecords

        public boolean hasMoreRecords()
        Returns true if there are more records left in the text, or false if the end of the input has been reached.