Working With Strings, a Few Tricks

Basics of R for Data Science

Uppercase and lowercase with toupper() and tolower()

text = "This is a String"

toupper(text)
[1] "THIS IS A STRING"
tolower(text)
[1] "this is a string"
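
A typical use for these two functions, sketched here as an illustration rather than taken from the examples above, is to bring two strings to the same case before comparing them. The variable names a and b are placeholders.

```r
a = "This is a String"
b = "THIS IS A STRING"

# A direct comparison is case-sensitive
same_exact = a == b

# Lowercasing both sides first ignores differences in capitalization
same_ignoring_case = tolower(a) == tolower(b)

same_exact
same_ignoring_case
```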

Replace a character with another in a string using gsub()

text = "this is a string"

gsub("t", "X", text)
[1] "Xhis is a sXring"
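
Two details are worth a quick sketch: gsub() replaces every match while its sibling sub() replaces only the first, and the pattern is read as a regular expression unless fixed = TRUE is set. The short string "a.b" is made up for the illustration.

```r
text = "this is a string"

# sub() stops after the first match, gsub() replaces all of them
first_only  = sub("t", "X", text)    # "Xhis is a string"
every_match = gsub("t", "X", text)   # "Xhis is a sXring"

# As a regular expression, "." matches any character
regex_dot   = gsub(".", "-", "a.b")                 # "---"

# fixed = TRUE treats the pattern as a literal string
literal_dot = gsub(".", "-", "a.b", fixed = TRUE)   # "a-b"
```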

Removing all punctuation

text = "This is a string. For some reason, there is some punctuation."

gsub("[[:punct:]]", "", text)
[1] "This is a string For some reason there is some punctuation"

Removing all digits

text = "in this string231 there are some 1241useless digits 443perhaps because 3434someone copied and pasted 985from a pdf"

gsub("[[:digit:]]", "", text)
[1] "in this string there are some useless digits perhaps because someone copied and pasted from a pdf"
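
The cleaning steps seen so far can be chained. The sketch below assumes you want lowercase text with no punctuation and no digits; the sample sentence is invented for the example. Both character classes can also share a single bracket expression, which does the removal in one pass.

```r
text = "In this String231, there are 1241useless digits!"

# Step by step: lowercase, then drop punctuation, then drop digits
clean = tolower(text)
clean = gsub("[[:punct:]]", "", clean)
clean = gsub("[[:digit:]]", "", clean)
clean   # "in this string there are useless digits"

# Same result in one pass, with both classes in one bracket expression
clean_one_pass = gsub("[[:punct:][:digit:]]", "", tolower(text))
clean_one_pass
```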

Splitting a string into a vector of words

text = "this is a string"

unlist(strsplit(text, split=" "))
[1] "this"   "is"     "a"      "string"

Note: strsplit() does the splitting on its own, but it returns a list; wrap the result in unlist() if you want a plain character vector.
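
The list makes more sense once strsplit() receives several strings: it returns one list element per input string. A small sketch, with a made-up second sentence:

```r
texts = c("this is a string", "and another one")

# One list element per input string
pieces = strsplit(texts, split = " ")

first_words  = pieces[[1]]      # "this" "is" "a" "string"
second_words = pieces[[2]]      # "and" "another" "one"

# Number of words in each input string
word_counts = lengths(pieces)   # 4 3

# unlist() flattens everything into a single vector of 7 words
all_words = unlist(pieces)
```

With a single input string, indexing with [[1]] gives the same vector as unlist().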

Removing specific elements from a vector

myVect = c("this", "is", "a", "vector")

myVect[!myVect %in% c("is", "a")]
[1] "this"   "vector"
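
To see why this works, it helps to look at the intermediate logical vector. The sketch below also compares the result with base R's setdiff(), which removes the same elements but drops duplicates as well; the repeated "this" is added only to show that difference.

```r
myVect = c("this", "is", "a", "vector", "this")

# %in% is evaluated before !, so this is !(myVect %in% c("is", "a"))
keep = !myVect %in% c("is", "a")
keep                 # TRUE FALSE FALSE TRUE TRUE

# Logical indexing keeps every element flagged TRUE, duplicates included
kept = myVect[keep]  # "this" "vector" "this"

# setdiff() also removes "is" and "a", but returns unique values only
unique_kept = setdiff(myVect, c("is", "a"))   # "this" "vector"
```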

Counting characters in strings

myVect = c("this", "is", "a", "vector")

nchar(myVect)
[1] 4 2 1 6
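
Because nchar() returns one count per element, its output can be used directly as a filter. A minimal sketch, assuming you want to keep only words longer than two characters (the threshold is arbitrary):

```r
myVect = c("this", "is", "a", "vector")

# Keep only the elements with more than 2 characters
long_words = myVect[nchar(myVect) > 2]   # "this" "vector"

# On a single string, nchar() counts every character, spaces included
total_chars = nchar("this is a vector")  # 16
```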

Turning a vector of strings into a single string

myVect = c("this", "is", "a", "vector")

paste(myVect, collapse=" ")
[1] "this is a vector"
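
The collapse argument is easy to confuse with sep, so here is a short sketch of the difference: sep goes between the arguments of paste() and keeps one result per element, while collapse joins the elements into one string. The last lines show that splitting and pasting undo each other for a string with single spaces.

```r
myVect = c("this", "is", "a", "vector")

# collapse joins all elements into one string, with any separator
comma_joined = paste(myVect, collapse = ", ")   # "this, is, a, vector"

# sep sits between the arguments; the result still has 4 elements
with_suffix = paste(myVect, "!", sep = "")      # "this!" "is!" "a!" "vector!"

# Round trip: split into words, then paste back together
text = "this is a vector"
rebuilt = paste(unlist(strsplit(text, split = " ")), collapse = " ")
```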