Convert a matrix of characters into a matrix of strings in R
I have a large matrix of characters and want to convert it into a matrix of strings without looping over each row individually. I am wondering if there is a smart, fast way to do this. I tried paste(data[, 4:((i*2)+3)], collapse = ""), but the problem is that it combines all the rows into one large string, while I need to keep the same number of rows as the original matrix, with each row reduced to a single-column string containing the characters of that row. In other words, I want to convert the matrix
a = { d e r p g k i
      s k p a s l n
      s k p a s l n
      s k p a s l n
      s k p a s l n }

into

a = { derpgki
      skpasln
      skpasln
      skpasln
      skpasln }
apply does loop, but it should still be pretty efficient in this case. Its use would be:
apply(x, 1, paste, collapse = "")
Alternatively, you can try:
do.call(paste0, data.frame(x))
which might be faster.
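To see why the do.call form works: data.frame(x) splits the matrix into a list of column vectors, do.call then calls paste0(col1, col2, ...) with one argument per column, and paste0 is vectorized, so it concatenates element-wise across the rows. A minimal sketch (the toy matrix here is my own, not from the question):

# data.frame(x) -> one vector per column; do.call spreads them
# as separate arguments to the vectorized paste0.
x <- matrix(c("a", "b", "c", "d"), nrow = 2)  # rows: "a c" and "b d"
do.call(paste0, data.frame(x, stringsAsFactors = FALSE))
# [1] "ac" "bd"
# equivalent to: paste0(x[, 1], x[, 2])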
A reproducible example (not sure why I'm wasting time here)...
x <- structure(c("d", "s", "s", "s", "s", "e", "k", "k", "k", "k",
                 "r", "p", "p", "p", "p", "p", "a", "a", "a", "a",
                 "g", "s", "s", "s", "s", "k", "l", "l", "l", "l",
                 "i", "n", "n", "n", "n"), .Dim = c(5L, 7L))
x
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "d"  "e"  "r"  "p"  "g"  "k"  "i"
# [2,] "s"  "k"  "p"  "a"  "s"  "l"  "n"
# [3,] "s"  "k"  "p"  "a"  "s"  "l"  "n"
# [4,] "s"  "k"  "p"  "a"  "s"  "l"  "n"
# [5,] "s"  "k"  "p"  "a"  "s"  "l"  "n"
Let's compare the options:
library(microbenchmark)
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))
fun1(x)
# [1] "derpgki" "skpasln" "skpasln" "skpasln" "skpasln"
fun2(x)
# [1] "derpgki" "skpasln" "skpasln" "skpasln" "skpasln"
microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr      min        lq    median        uq      max neval
#  fun1(x)   97.634  104.4805  112.0725  117.7735  268.503   100
#  fun2(x) 1258.000 1282.6275 1301.5555 1316.5015 1576.506   100
And, on longer data:
x <- do.call(rbind, replicate(100000, x, simplify = FALSE))
dim(x)
# [1] 500000      7
microbenchmark(fun1(x), fun2(x), times = 10)
# Unit: milliseconds
#     expr       min        lq    median       uq      max neval
#  fun1(x) 4189.8940 4226.9354 4382.0403 4570.032 4596.983    10
#  fun2(x)  825.9816  835.4351  888.5102 1031.509 1056.832    10
I suspect that on wider data, apply would still be more efficient, though.
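That suspicion is easy to probe with base R's system.time; a rough sketch (the 1000 x 100 dimensions, the repetition count, and the use of system.time instead of microbenchmark are my own choices, not from the answer above):

# Rough check of the wider-data claim on a 1000-row, 100-column matrix.
set.seed(1)
xw <- matrix(sample(letters, 1000 * 100, replace = TRUE), nrow = 1000)
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat, stringsAsFactors = FALSE))
stopifnot(identical(fun1(xw), fun2(xw)))  # both give the same strings
system.time(for (i in 1:50) fun1(xw))    # timing for the apply version
system.time(for (i in 1:50) fun2(xw))    # timing for the do.call version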