How to convert special symbols in web scraping with R?
I am learning how to scrape the web with the XML and RCurl packages. Everything goes well except for one thing: special characters like ö or č are read in differently in R. For instance, í is read in as ÃÂ. I assume the latter is some sort of HTML encoding of the first.
I have been looking for a way to convert these characters but have not found one. I am sure other people have stumbled upon this problem as well, and I suspect there must be some sort of function to convert these characters. Does anyone know a solution? Thanks in advance.
Here is an example of the code; sorry I did not provide it earlier.
    library(XML)
    url <- 'http://en.wikipedia.org/wiki/2000_Wimbledon_Championships_%E2%80%93_Men%27s_Singles'
    tables <- readHTMLTable(url)
    sec <- tables[[6]]
    pl1r1 <- unlist(strsplit(as.character(sec[,2]), ' '))[seq(2,32, 4)]
    enc2utf8(pl1r1)  # does not seem to work
Try parsing the page first while specifying the encoding, and only then read the tables, as described here: readHTMLTable and UTF-8 encoding.
An example might be:
    library(XML)
    url <- "http://en.wikipedia.org/wiki/2000_Wimbledon_Championships_%E2%80%93_Men%27s_Singles"
    # Parse with an explicit encoding; this preserves the special characters
    doc <- htmlParse(url, encoding = "UTF-8")
    # readHTMLTable returns a list of data frames, one per table on the page
    tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
    sec <- tables[[6]]
    # Not sure what you're trying to do here, though
    pl1r1 <- unlist(strsplit(as.character(sec[,2]), ' '))[seq(2,32, 4)]
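As a side note, here is a minimal base-R sketch of why enc2utf8() probably does not help once the text has already been read in. It assumes the garbling comes from UTF-8 bytes being decoded as Latin-1, which the question does not confirm. The variable names are made up for illustration. enc2utf8() only converts from whatever encoding a string is declared to have, so it cannot undo a wrong decoding that has already happened. If the assumption holds, the damage can be reversed by mapping the characters back to single bytes and re-declaring those bytes as UTF-8.

```r
# Assumption: the page is UTF-8 but its bytes were decoded as Latin-1.
original <- "\u00ed"                      # "í"
bytes <- charToRaw(enc2utf8(original))    # two bytes in UTF-8: c3 ad

# Simulate the wrong decoding: treat each byte as one Latin-1 character.
garbled <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")

# Repair: turn the characters back into single bytes, then declare
# that those bytes are UTF-8.
fixed <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(fixed) <- "UTF-8"

print(charToRaw(garbled))   # the mis-decoded string is now four bytes long
print(fixed)
```

Specifying the encoding when parsing, as in the answer above, is still the cleaner fix. A repair like this is only a fallback for strings that are already garbled, and it fails (iconv returns NA) if the garbled text contains characters that have no Latin-1 equivalent.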