Identifying Words in HTML Documents

If you study Tim's Wicket tag parser, which underlies the navigator mentioned in yesterday's blog entry, you'll find that it's not hard to adapt it to your own purposes. For example, here's the same parser, slightly modified: instead of looking for Wicket tags, it finds all the words in the document and prints them to the navigator.
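Tim's parser itself isn't shown here, so what follows is only a minimal stand-in and not his code. It is a hypothetical `HtmlWords.extractWords` method that skips everything between `<` and `>` and collects runs of letters (allowing internal apostrophes, as in "it's") as words. It deliberately ignores things a real HTML parser has to handle, such as comments, `<script>`/`<style>` content, attribute values containing `>`, and entities like `&amp;`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: collects the words in the text content of an HTML
 * document, skipping everything inside tags. This is NOT Tim's Wicket tag
 * parser, just a simplified stand-in for illustration.
 */
public class HtmlWords {

    // A "word" is a run of letters, optionally joined by apostrophes ("it's").
    private static final Pattern WORD = Pattern.compile("\\p{L}+(?:'\\p{L}+)*");

    public static List<String> extractWords(String html) {
        StringBuilder text = new StringBuilder();
        boolean inTag = false;
        // Replace each tag with a single space so that words on either
        // side of a tag are not glued together.
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
                text.append(' ');
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                text.append(c);
            }
        }
        // Pull the words out of the remaining text, in document order.
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>Hello <b>wrold</b>, it's Wicket!</p></body></html>";
        // In the NetBeans module these would be shown in the navigator instead.
        for (String word : extractWords(html)) {
            System.out.println(word);
        }
    }
}
```

Replacing each tag with a space keeps words on either side of a tag apart, at the cost of splitting any word that has markup in its middle.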

Why might this be useful? Because it's now a small step to a spell checker. The user would specify a file containing words, the words in the document would be compared against that file, and every word not found in it would be printed to the navigator. Those unmatched words would be the ones that are incorrect in one way or another, and that's all a spell checker needs to tell you: which words are incorrect. I've made a spell checker before, using annotations in the editor, but I haven't been able to find the code. Besides, I prefer this navigator approach to adding yet more annotations to the editor. So, watch this space for HTML spell checker developments.
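The comparison step described above could look roughly like this. It's a sketch under assumptions of my own, not a finished design: the word file has one word per line in UTF-8, matching is case-insensitive, and the hypothetical `SpellCheck.findUnknownWords` method takes the list of words that a parser like the modified one described above would produce.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the spell-check step: every document word that is
 * not in the user's word file is reported as possibly misspelled.
 */
public class SpellCheck {

    // Reads a word file (assumed: one word per line, UTF-8) into a lower-cased set.
    public static Set<String> loadDictionary(Path file) throws IOException {
        Set<String> dictionary = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty()) {
                dictionary.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return dictionary;
    }

    // Returns the words missing from the dictionary, in document order,
    // with each distinct spelling reported once. Lookup is case-insensitive.
    public static List<String> findUnknownWords(List<String> documentWords, Set<String> dictionary) {
        Set<String> unknown = new LinkedHashSet<>();
        for (String word : documentWords) {
            if (!dictionary.contains(word.toLowerCase(Locale.ROOT))) {
                unknown.add(word);
            }
        }
        return new ArrayList<>(unknown);
    }

    public static void main(String[] args) throws IOException {
        // Use the word file given on the command line, or a tiny built-in list.
        Set<String> dictionary = args.length > 0
                ? loadDictionary(Paths.get(args[0]))
                : Set.of("hello", "world", "it's", "wicket");
        List<String> words = List.of("Hello", "wrold", "it's", "Wicket", "wrold");
        // In the NetBeans module these would go to the navigator, not stdout.
        for (String word : findUnknownWords(words, dictionary)) {
            System.out.println(word);
        }
    }
}
```

A `LinkedHashSet` keeps the document order while listing each misspelling only once, which seems a better fit for a navigator list than one entry per occurrence.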


Spellchecker? I remember there was one for Javadoc in NB5, but unfortunately I can't remember who provided it. What about tracking that down and reusing its features?


Posted by Jake on August 20, 2007 at 06:23 PM PDT


Geertjan Wielenga (@geertjanw) is a Principal Product Manager in the Oracle Developer Tools group living & working in Amsterdam. He is a Java technology enthusiast, evangelist, trainer, speaker, and writer. He blogs here daily.

The focus of this blog is mostly on NetBeans (a development tool primarily for Java programmers), with an occasional reference to NetBeans, and sometimes diverging to topics relating to NetBeans. And then there are days when NetBeans is mentioned, just for a change.

