Monday 2 October 2017

Write UTF-8 Chinese characters from R to a CSV on Windows

Source: stackoverflow.com --- Sunday, October 01, 2017
I'm able to scrape and display Chinese characters like this:

    pacman::p_load(rvest, stringr)
    Sys.setlocale("LC_CTYPE", locale="Chinese")

    cheese <- read_html("http://ift.tt/2xQfteo")
    links <- cheese %>%
      html_node(".box") %>%
      html_nodes("li") %>%
      html_nodes("a")
    links <- links[grepl("read",links)]
    url_pattern <- "http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
    links <- str_extract(links, url_pattern)
    links <- links[grepl(".shtml",links)]

    text <- data.frame(english=as.character(), chinese=as.character(), stringsAsFactors = F)
    for(u in links){
      cheese <- read_html(u)
      tmp0 <- cheese %>% html_nodes(xpath = '//*[@class="qh_en"]') %>% html_text()
      tmp0 <- gsub('\"', "", tmp0, fixed = TRUE)
      tmp1 <- cheese %>% html_nodes(xpath = '//*[@class="qh_zg"]') %>% html_text()
      tmp1 <- gsub('\"', "", tmp1, fixed = TRUE)
      if(length(tmp0)==length(tmp1)){
        tmp <- data.frame(english=tmp0, chinese=tmp1, stringsAsFactors = F)
        text <- rbind(text, tmp)
      }
    }

However, even when I set the file encoding, when I save it to CSV like this

    write.csv(text, "english_chinese_test.csv",fileEncoding = "UTF-8")

the resulting file has values like this instead of Chinese characters:

    2éè°è¨¤o¥ééo1Ã¨Â·ï¿ â€²Ë‰Ã¨Ë‰â€²3ˉo¤§¤°ooÃ©ï¿ Â¨3¨è§1|èaᱎo|è3ˉè§ˉ1èa·±aooèˉ′è§1ˉaooé飰é |-¤èo飰éoo¨é1è1è·è¤oè°èˉ′aˉooèμ·¥èˉ1é£aoo·èa¥¨¤°±é£°éè°±ˉèˉ′o ...
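One possible workaround, offered as a sketch and not as a confirmed fix for this exact case, is to skip write.csv's own re-encoding and write the UTF-8 bytes directly. The helper name write_utf8_csv and its bom argument are hypothetical and do not come from the question. The sketch assumes the trouble is either a second conversion through the Windows native code page or a viewer such as Excel guessing the wrong encoding when no byte order mark is present. It quotes every field and writes missing values as empty strings, which is simpler than what write.csv does.

```r
# Hypothetical helper: write a data frame as UTF-8 CSV without letting R
# re-encode the text through the native code page.
write_utf8_csv <- function(df, path, bom = TRUE) {
  # Convert a vector to UTF-8, blank out NAs, and quote it CSV-style
  quote_field <- function(x) {
    x <- enc2utf8(as.character(x))
    x[is.na(x)] <- ""
    paste0('"', gsub('"', '""', x, fixed = TRUE), '"')
  }

  header <- paste(quote_field(names(df)), collapse = ",")
  cols <- unname(lapply(df, quote_field))
  rows <- do.call(paste, c(cols, sep = ","))

  # Binary mode so line endings and bytes are written exactly as given
  con <- file(path, open = "wb")
  on.exit(close(con))

  # Optional UTF-8 byte order mark, which some spreadsheet programs use
  # to detect the encoding
  if (bom) writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)

  # useBytes = TRUE writes the strings byte for byte, with no translation
  writeLines(c(header, rows), con, sep = "\r\n", useBytes = TRUE)
  invisible(path)
}
```

With the data frame from the question, the call would be write_utf8_csv(text, "english_chinese_test.csv"). If the file is then read back into R, passing fileEncoding = "UTF-8-BOM" to read.csv should skip the leading mark.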



from Windows http://ift.tt/2fIf4Et
