Geertjan's Blog

  • October 27, 2007

Scripting for Something in HTML Files

Geertjan Wielenga
Product Manager
This entry shows a Groovy script that pulls all lines with HREF attributes from all HTML files within a folder, as well as from all of its immediate subfolders. I call the script from a Java class, as explained in a previous blog entry. As a result, the Output window is populated with all lines containing an HREF attribute.

Here's the script. It could probably be a lot better, especially the part where the next level is found:

package demojavaapplication

class HelloWorld {

    def basedir = '/home/geertjan/ijc/htmlfiles'
    def text = []

    void main(args) {
        new File(basedir).eachFile { f ->
            if (f.isFile() && f.toString().endsWith(".html")) {
                writeTags(f)
            } else if (f.isDirectory()) {
                // Next level: scan the immediate subfolder without overwriting basedir
                f.eachFile { fNext ->
                    if (fNext.isFile() && fNext.toString().endsWith(".html")) {
                        writeTags(fNext)
                    }
                }
            }
        }
    }

    // Prints every line of the given file that contains an href attribute
    void writeTags(f) {
        println "--------------"
        println "File: " + f.getName()
        f.eachLine { ln ->
            // (?i) makes the match case-insensitive, so HREF is found as well
            if (ln =~ '(?i)href') {
                text << ln
            }
        }
        text.each { println " Found: $it" }
        text.clear()
    }
}

If someone can help to make this script more compact, I would be happy to hear about it.
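
One possible way to tighten the traversal is sketched below. It assumes a more recent Groovy than the one used above (`collectMany` and the safe-navigation calls are not available in the oldest releases), and the helper names `children`, `htmlFilesIn`, `hrefLines` and `printHrefLines` are made up for illustration. The idea is to collect the folder and its immediate subfolders into one list first, so that a single filter replaces the duplicated "next level" loop.

```groovy
// Sketch only: all helper names here are illustrative, not part of the original script.

// Direct children of a folder, or an empty list if it cannot be listed.
List<File> children(File dir) {
    dir.listFiles()?.toList() ?: []
}

// HTML files directly inside dir and inside its immediate subfolders.
List<File> htmlFilesIn(File dir) {
    def dirs = [dir] + children(dir).findAll { it.isDirectory() }
    dirs.collectMany { d ->
        children(d).findAll { it.isFile() && it.name.endsWith('.html') }
    }
}

// Lines of a file that mention href, in any letter case.
List<String> hrefLines(File f) {
    f.readLines().findAll { it =~ /(?i)href/ }
}

// Prints the matches in the same layout as the script above.
void printHrefLines(File dir) {
    htmlFilesIn(dir).each { f ->
        println '--------------'
        println "File: ${f.name}"
        hrefLines(f).each { println " Found: $it" }
    }
}
```

With these in place, something like `printHrefLines(new File('/home/geertjan/ijc/htmlfiles'))` should produce the same kind of output as before. The order in which files are visited depends on the file system.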

Comments ( 2 )
  • bernhard Saturday, October 27, 2007

    Try:

    // limit directory scan depth
    MAX_DEPTH = 2;

    def dirsArgs = [];
    if (args.length > 0) {
        dirsArgs = args;
    } else {
        String basedir = 'C:\\Documents and Settings\\huberb1\\My Documents\\SaveAs Webpages'
        dirsArgs = [basedir]
    }

    dirsArgs.each { scanIt( it, 0 ) }

    /**
     * Scan a file, or directory
     * @param filename a file or directory
     * @param depth current dir depth
     */
    void scanIt( String filename, int depth ) {
        // limit scan dir depth
        if (depth > MAX_DEPTH) return

        final File theFile = new File(filename)
        if (theFile.isFile() && theFile.toString().endsWith('.html')) {
            // Examine this file
            writeTags(theFile)
        } else if (theFile.isDirectory()) {
            // Examine the directory
            theFile.eachFile {
                scanIt( it.toString(), depth+1 )
            }
        }
    }

    /**
     * Grep href from a single file
     */
    String writeTags(File f) {
        println "--------------"
        println "File: " + f.getName()

        final def text = [];
        f.eachLine {
            ln -> if ( ln =~ 'href' ) {
                text << "${ln}"
            }
        }

        // output matched lines
        text.each{ println " Found: $it" }
    }


  • Geertjan Sunday, October 28, 2007

    Thanks!!

