Jeff Taylor's Weblog

  • Sun
    February 10, 2014

R attributes and regexpr

I've been working with R recently.

Here is an example of using the match.length attribute that is returned from regexpr:

Start with three strings in a vector; the first and third each contain an embedded quoted string:

data=c("<a href=\"ch4.html\">Chapter 1</a>","no quoted string is embedded in this string","<a   href=\"appendix.html\">Appendix</a>")

Use regexpr to locate the embedded strings:

> locations <- regexpr("\"(.*?)\"", data)

Matches are in the first string (at 9 with length 10) and third string (at 11 with length 15):

> locations
[1]  9 -1 11
attr(,"match.length")
[1] 10 -1 15
attr(,"useBytes")
[1] TRUE

The match lengths can be pulled out of the attribute as a plain vector:

> attr(locations,"match.length")
[1] 10 -1 15
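
To make the relationship between the positions and the attribute a bit more concrete, here is a small sketch that splits the regexpr result into plain vectors. The names starts, lens, matched and ends are only illustrative; they are not anything R defines.

```r
data <- c("<a href=\"ch4.html\">Chapter 1</a>",
          "no quoted string is embedded in this string",
          "<a   href=\"appendix.html\">Appendix</a>")
locations <- regexpr("\"(.*?)\"", data)

starts  <- as.integer(locations)            # start positions, attributes dropped
lens    <- attr(locations, "match.length")  # length of each match, -1 for no match
matched <- starts != -1                     # TRUE where a quoted string was found
ends    <- starts + lens - 1                # last character of each match
                                            # (only meaningful where matched is TRUE)
```

Both the start position and the length are -1 for a string with no match, so either one can be used to test for a miss.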

Use substr and the attribute vector to extract the strings:

> quoted_strings=substr( data, locations,
+ locations+attr(locations,"match.length")-1 )
> quoted_strings
[1] "\"ch4.html\"" "" "\"appendix.html\""
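
The stop position passed to substr is the start plus the match length, minus one. As a worked sketch for the first string only, with the numbers from the regexpr output typed in by hand (the variable names are just for illustration):

```r
first <- "<a href=\"ch4.html\">Chapter 1</a>"
start <- 9
len   <- 10
last  <- start + len - 1              # 18, the position of the closing quote
piece <- substr(first, start, last)   # the quoted string, quotes included

# For the string with no match, start and length are both -1,
# so the stop position is -1 + -1 - 1 = -3 and substr returns "".
none <- substr("no quoted string is embedded in this string", -1, -3)
```

That is why the second element of the result is an empty string rather than an error.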

Maybe you'd like to remove the embedded quote characters from your strings:

> gsub("\"", "", quoted_strings)
[1] "ch4.html" "" "appendix.html"
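
Another way to get the unquoted text, sketched here on the assumption that switching to perl=TRUE is acceptable: with that option regexpr also returns capture.start and capture.length attributes for the parenthesised group, so the text between the quotes can be extracted directly. The variable names below are illustrative.

```r
data <- c("<a href=\"ch4.html\">Chapter 1</a>",
          "no quoted string is embedded in this string",
          "<a   href=\"appendix.html\">Appendix</a>")
m <- regexpr("\"(.*?)\"", data, perl = TRUE)

# One column per capture group; this pattern has a single group.
cap_start <- attr(m, "capture.start")[, 1]
cap_len   <- attr(m, "capture.length")[, 1]

inner <- rep("", length(data))     # stays "" where nothing matched
ok <- as.integer(m) != -1          # only touch the strings that matched
inner[ok] <- substr(data[ok], cap_start[ok], cap_start[ok] + cap_len[ok] - 1)
```

Unlike the gsub approach, this leaves any quote characters inside the captured text alone, although there are none in this data.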

An alternative is to use regmatches, which returns only the strings that matched:

> regmatches(data,locations)
[1] "\"ch4.html\"" "\"appendix.html\""
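
Note that the regmatches result has two elements while data has three, so it no longer lines up with the input. If you need one result per input string, a minimal sketch is to fill a vector of NA values first (the names aligned and matched are only illustrative):

```r
data <- c("<a href=\"ch4.html\">Chapter 1</a>",
          "no quoted string is embedded in this string",
          "<a   href=\"appendix.html\">Appendix</a>")
locations <- regexpr("\"(.*?)\"", data)

aligned <- rep(NA_character_, length(data))   # one slot per input string
matched <- as.integer(locations) != -1        # TRUE where regexpr found a match
aligned[matched] <- regmatches(data, locations)
```

Here the second element of aligned stays NA, which makes it easy to tell a missing match apart from an empty one.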
