R attributes and regexpr

I've been working with R recently.

Here is an example of using the match.length attribute that is returned from regexpr:

Three strings in a vector, first and third include an embedded string:

data=c("<a href=\"ch4.html\">Chapter 1</a>",
       "no quoted string is embedded in this string",
       "<a   href=\"appendix.html\">Appendix</a>")

Use regexpr to locate the embedded strings:

> locations <- regexpr("\"(.*?)\"", data)

Matches are in the first string (at 9 with length 10) and third string (at 11 with length 15):

> locations
[1]  9 -1 11
attr(,"match.length")
[1] 10 -1 15
attr(,"useBytes")
[1] TRUE

Vector from the attribute:

> attr(locations,"match.length")
[1] 10 -1 15

Use substr and the attribute vector to extract the strings:

> quoted_strings=substr( data, 
                         locations, 
                         locations+attr(locations,"match.length")-1 )    
> quoted_strings
[1] "\"ch4.html\""      ""                  "\"appendix.html\""

Maybe you'd like to remove the embedded quote characters from your strings:

> gsub("\"", "", quoted_strings)
[1] "ch4.html"      ""              "appendix.html"

An alternative is to use regmatches:

> regmatches(data,locations)
[1] "\"ch4.html\""      "\"appendix.html\""
Comments:

Post a Comment:
  • HTML Syntax: NOT allowed
About

user12620111

Search

Archives
« April 2014
SunMonTueWedThuFriSat
  
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
   
       
Today