Here, for example, is what a Manifest file looks like (by default) in the Source Editor:
Each of the entries in a Manifest file consists of a name-value pair. For example, the first entry above has "Manifest-Version" as the name and "1.0" as the value. Let's say our implementation of syntax highlighting would provide one color for the name, one color for the ":" sign (a colon), and another color for the value. Since each token can have a color assigned to it, you need to parse the text to recognize all names as one token, all colons as one token, and all values as one token. Therefore, before even beginning to implement syntax highlighting and code completion, you need to parse the text in the Manifest file into tokens. How would you do this?
First declare your tokens (by extending the org.netbeans.editor.TokenContext class). You could have four tokens -- one for the name, one for the colon, one for the value, and one for the end of line:
public static final TokenID NAME = new BaseTokenID("name", NAME_ID);
public static final TokenID COLON = new BaseTokenID("colon", COLON_ID);
public static final TokenID VALUE = new BaseTokenID("value", VALUE_ID);
public static final TokenID END_OF_LINE = new BaseTokenID("end-of-line", END_OF_LINE_ID);
After declaring the tokens, you need to find which part of the text is which token. You do this by starting in some initial state, looking at each character in the text in sequence, and deciding whether you stay in that state, move to another state, or announce that a token was found. For example, for names you start in the initial state and the first time you encounter a valid character for a name, you enter the ISI_NAME state. The ISI_NAME state is shown below. You stay in this state until you encounter a \r, \n or : character, none of which can be part of a name. When you encounter such a character, you know that the characters you just walked over make up a name token.
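case ISI_NAME:
    switch (actChar) {
        case ':':
        case '\r':
        case '\n':
            // A delimiter ends the name; it is not consumed here.
            state = ISA_AFTER_NAME;
            return ManifestTokenContext.NAME;
    }
    break; // still inside the name; the loop moves on to the next character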
The code above runs within a while loop. The break statement at the end leaves the switch, after which the loop increases the offset in the text and examines the next character. The return statement, by contrast, exits before the offset is increased, which ensures that the parsing of the next token will start with the current character (it will likely be a colon, which is a meaningful token itself). Once all the characters up to the colon have been tested, the IDE knows whether the cursor is inside a name or not. And when the IDE knows that, it can provide syntax highlighting and coloring (which you need to program yourself) -- because now it knows where it is and what should be provided.
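To make the state machine easier to try out, here is a minimal stand-alone sketch of the same idea in plain Java. It does not use the org.netbeans.editor classes: ManifestLexerSketch, TokenKind, State, Token and tokenize are hypothetical names made up for this illustration, and the INIT and ISI_VALUE states are assumptions about how the remaining states might look. A `continue` plays the role of the return statement (the offset is not increased) and a `break` lets the loop increase the offset. As a simplification, each \r or \n becomes its own end-of-line token, the value keeps its leading space, and continuation lines are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch with hypothetical names; not the org.netbeans.editor API.
final class ManifestLexerSketch {

    enum TokenKind { NAME, COLON, VALUE, END_OF_LINE }

    // INIT = start of a line, ISI_* = "inside" a token, ISA_* = "after" a token.
    enum State { INIT, ISI_NAME, ISA_AFTER_NAME, ISI_VALUE }

    static final class Token {
        final TokenKind kind;
        final String text;
        Token(TokenKind kind, String text) { this.kind = kind; this.text = text; }
    }

    private static boolean isLineEnd(char c) { return c == '\r' || c == '\n'; }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        State state = State.INIT;
        int tokenStart = 0; // where the token being scanned begins
        int offset = 0;     // the character currently examined

        while (offset < text.length()) {
            char actChar = text.charAt(offset);
            switch (state) {
                case INIT:
                    if (actChar == ':' || isLineEnd(actChar)) {
                        // Empty name: let ISA_AFTER_NAME handle the delimiter.
                        state = State.ISA_AFTER_NAME;
                        continue;
                    }
                    state = State.ISI_NAME;
                    break; // first character of a name, move on
                case ISI_NAME:
                    if (actChar == ':' || isLineEnd(actChar)) {
                        tokens.add(new Token(TokenKind.NAME, text.substring(tokenStart, offset)));
                        tokenStart = offset;
                        state = State.ISA_AFTER_NAME;
                        continue; // re-examine the delimiter as its own token
                    }
                    break; // still inside the name
                case ISA_AFTER_NAME:
                    // Only entered on ':' or a line end; the delimiter is one token.
                    boolean colon = actChar == ':';
                    tokens.add(new Token(colon ? TokenKind.COLON : TokenKind.END_OF_LINE,
                                         String.valueOf(actChar)));
                    offset++;
                    tokenStart = offset;
                    state = colon ? State.ISI_VALUE : State.INIT;
                    continue;
                case ISI_VALUE:
                    if (isLineEnd(actChar)) {
                        if (offset > tokenStart) {
                            tokens.add(new Token(TokenKind.VALUE, text.substring(tokenStart, offset)));
                        }
                        tokens.add(new Token(TokenKind.END_OF_LINE, String.valueOf(actChar)));
                        offset++;
                        tokenStart = offset;
                        state = State.INIT;
                        continue;
                    }
                    break; // still inside the value
            }
            offset++; // reached only through break
        }

        // Flush a token that runs to the end of the text without a line end.
        if (offset > tokenStart) {
            TokenKind last = (state == State.ISI_VALUE) ? TokenKind.VALUE : TokenKind.NAME;
            tokens.add(new Token(last, text.substring(tokenStart, offset)));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Manifest-Version: 1.0\nMain-Class: demo.Main\n")) {
            System.out.println(t.kind + " '" + t.text.replace("\n", "\\n") + "'");
        }
    }
}
```

Unlike the real lexer, which returns one token per call and is resumed by the editor, this sketch collects all tokens in a list so that it can be run and inspected on its own.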
Of course, this is only an outline of the technique. A detailed tutorial will be provided on this soon. Thanks to Andrei Badea for providing code and explanation for all of this.