Geertjan's Blog

  • July 10, 2007

Parsing HTML for Links and Writing to the Output Window (Part 2)

Geertjan Wielenga
Product Manager
HTML parsing is a wonderful thing. In Parsing HTML for Links and Writing to the Output Window (Part 1), I provided all the keys to unlock this particular puzzle. There, we parse HTML documents for HREF attributes and then write their values to the Output window as self-referencing hyperlinks. That's useful, especially when you can determine whether those links are broken or not, which is what I showed over the last few days.

But how about a different scenario? Purely hypothetical. You have HTML documents that you use in JavaHelp, but also as an on-line user guide. Everything in the JavaHelp version and in the on-line version is the same, except... object tags. Possibly you use object tags for mouse-overs, i.e., for little popup definitions of terms used in the JavaHelp. In the on-line version, those object tags should be replaced by A links with onmouseover handlers. That's a bit annoying: you need to find them manually, somehow, and replace them manually, somehow. How to overcome this? How to automate this? Well, the first step is to identify all those object tags in the first place. The code for doing this is essentially the same as in Parsing HTML for Links and Writing to the Output Window (Part 1), except that we'll be slightly fancier. This time we won't only identify the start tag, but also the end tag. And not just the end tag, but also the 'text' parameter, whose value is the term for which the object tag is defined.

So, here's the end result, with each object tag's starting line, ending line, and 'text' parameter value written to the Output window (with links for jumping back into the appropriate place in the document!). All of this is done by right-clicking in the HTML editor and choosing 'Identify All Object Tags':

And here is the HTML parsing code, which is just standard HTML parsing, no NetBeans API magic at all, except for the part where we identify the line, which we need to send to the NetBeans API OutputListener implementation, for printing as hyperlinks in the Output window (as explained earlier, in Parsing HTML for Links and Writing to the Output Window (Part 1)):

// Called for each opening tag; we only care about <object>.
@Override
public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    if (t == HTML.Tag.OBJECT) {
        try {
            String value = (String) a.getAttribute(HTML.Attribute.CLASSID);
            // The displayed line is lineNo + 1.
            int lineNo = NbDocument.findLineNumber(doc, pos);
            // Line index handed to the HTMLOutputListener from Part 1.
            int realLineNo = lineNo - 1;
            writer.println("Object tag begins in line " + (lineNo + 1) + ": " +
                    value, new HTMLOutputListener(dObj, realLineNo, pos));
        } catch (IOException ex) {
            Exceptions.printStackTrace(ex);
        }
    }
}

// Called for tags without an end tag, such as <param>.
@Override
public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    if (t == HTML.Tag.PARAM) {
        try {
            String nameParam = (String) a.getAttribute(HTML.Attribute.NAME);
            // Constant first, so a <param> without a name attribute is skipped
            // instead of causing a NullPointerException.
            if ("text".equals(nameParam)) {
                String value = (String) a.getAttribute(HTML.Attribute.VALUE);
                int lineNo = NbDocument.findLineNumber(doc, pos);
                int realLineNo = lineNo - 1;
                writer.println("Text parameter in line " + (lineNo + 1) + ": " +
                        value, new HTMLOutputListener(dObj, realLineNo, pos));
            }
        } catch (IOException ex) {
            Exceptions.printStackTrace(ex);
        }
    }
}

// Called for each closing tag; </object> ends the current entry.
@Override
public void handleEndTag(HTML.Tag t, int pos) {
    if (t == HTML.Tag.OBJECT) {
        int lineNo = NbDocument.findLineNumber(doc, pos);
        // Two newlines separate this entry from the next one.
        writer.println("Ends in line " + (lineNo + 1) + "\n\n");
    }
}
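
To try the three callbacks without NetBeans at all, here is a minimal sketch that uses only the JDK's Swing HTML parser. It is not the module code from this series: the class name ObjectTagFinder and the helpers lineOf and find are made up for illustration, a plain list of strings stands in for the Output window, and lines are counted by scanning for newlines rather than by calling NbDocument.findLineNumber.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the NetBeans module in this series.
final class ObjectTagFinder {

    /** Returns the 1-based line containing the given character offset. */
    static int lineOf(String text, int offset) {
        // Clamp the offset so a position past the end cannot overrun the string.
        int end = Math.max(0, Math.min(offset, text.length()));
        int line = 1;
        for (int i = 0; i < end; i++) {
            if (text.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    /** Parses the HTML and reports each object tag and its 'text' parameter. */
    static List<String> find(final String html) {
        final List<String> report = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.OBJECT) {
                    report.add("Object tag begins in line " + lineOf(html, pos)
                            + ": " + a.getAttribute(HTML.Attribute.CLASSID));
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                // Constant-first equals skips params that have no name attribute.
                if (t == HTML.Tag.PARAM
                        && "text".equals(a.getAttribute(HTML.Attribute.NAME))) {
                    report.add("Text parameter in line " + lineOf(html, pos)
                            + ": " + a.getAttribute(HTML.Attribute.VALUE));
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t == HTML.Tag.OBJECT) {
                    report.add("Ends in line " + lineOf(html, pos));
                }
            }
        };
        try {
            // 'true' tells the parser to ignore any charset declared in the HTML.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return report;
    }
}
```

Calling ObjectTagFinder.find(...) on the text of an HTML file and printing each returned string gives roughly the same report as above, minus the hyperlinks. The line numbers depend on the offsets the parser reports, so treat them as approximate until checked against a real document.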

Combine the above with the code referenced in the earlier blog entry and then you too will be able to identify object tags. Combine this with a more recent blog entry and then you'll be able to set the menu item on a folder, so that all subfolders and their HTML files will be searched for object tags. Ultimately, we now need to figure out a way of replacing the object tags found with their on-line A link equivalents. Then we'll really have something useful!
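
As a starting point for that last step, here is a rough sketch of what the replacement could look like. It rests on several assumptions: it swaps in regular expressions on the raw text instead of the parser callbacks, it expects the name attribute to come before the value attribute inside the param tag, and the target markup, including the JavaScript function showDefinition, is purely hypothetical. The class name ObjectTagReplacer is made up as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a regex-based stand-in, not a parser-based solution.
final class ObjectTagReplacer {

    // Matches one <object>...</object> block, non-greedily, across lines.
    private static final Pattern OBJECT_BLOCK =
            Pattern.compile("<object\\b.*?</object\\s*>",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Captures the value of <param name="text" value="...">.
    private static final Pattern TEXT_PARAM =
            Pattern.compile("<param\\s+name=\"text\"\\s+value=\"([^\"]*)\"",
                    Pattern.CASE_INSENSITIVE);

    /** Replaces each object block that has a 'text' param with an A link. */
    static String replaceObjectTags(String html) {
        Matcher block = OBJECT_BLOCK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (block.find()) {
            Matcher param = TEXT_PARAM.matcher(block.group());
            // Blocks without a 'text' parameter are left untouched.
            String replacement = param.find()
                    ? "<a href=\"#\" onmouseover=\"showDefinition(this)\">"
                            + param.group(1) + "</a>"
                    : block.group();
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            block.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        block.appendTail(out);
        return out.toString();
    }
}
```

Regular expressions are fragile on real-world HTML, so a sturdier version would reuse the offsets reported by the parser callbacks shown earlier. The sketch is only meant to make the goal concrete.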
