[R] 2. Sentiment analysis with tidy data

2. Sentiment analysis with tidy data

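The figure below shows the workflow of a text analysis.
One way to analyze the sentiment of a text is to treat the text as a combination of individual words
and to take the sentiment of the whole text to be the sum of the sentiments of those words.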

2. 1. The `sentiments` dataset

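We will use the dictionary-based sentiment lexicons returned by get_sentiments() in tidytext (the afinn and nrc lexicons are downloaded through the textdata package).
- afinn: assigns each word a score from -5 (most negative) to 5 (most positive)
- bing: labels each word as binary positive or negative
- nrc: labels words as positive or negative and also with emotions such as anger, fear, joy, sadness, and trust
Let's look at a sample of each lexicon below.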

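library(tidyverse)  # dplyr, stringr, tidyr, ggplot2 (used throughout this post)
library(tidytext)   # get_sentiments(), unnest_tokens(), stop_words
library(textdata)   # downloads the afinn and nrc lexicons on first use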

get_sentiments("afinn")

## # A tibble: 2,477 x 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # … with 2,467 more rows

get_sentiments("bing")

## # A tibble: 6,786 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 6,776 more rows

get_sentiments("nrc")

## # A tibble: 13,901 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 abacus      trust    
##  2 abandon     fear     
##  3 abandon     negative 
##  4 abandon     sadness  
##  5 abandoned   anger    
##  6 abandoned   fear     
##  7 abandoned   negative 
##  8 abandoned   sadness  
##  9 abandonment anger    
## 10 abandonment fear     
## # … with 13,891 more rows

Dictionary-based text analysis of this kind finds the overall sentiment of a text by summing the sentiment scores of the individual words in it.
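
As a minimal sketch of this idea, the toy example below scores one sentence by joining its words to a small hand-made lexicon and summing the values. The names `toy_lexicon`, `toy_words`, and `toy_score` and all the scores are made up for illustration; they are not taken from afinn.

```r
library(dplyr)
library(tibble)

# Hypothetical mini-lexicon (values are invented, not afinn scores)
toy_lexicon <- tibble(
  word  = c("happy", "good", "sad", "terrible"),
  value = c(3, 2, -2, -3)
)

# One word per row, the shape unnest_tokens() produces
toy_words <- tibble(
  word = c("a", "happy", "day", "with", "a", "sad", "ending")
)

# Words missing from the lexicon drop out of the inner join,
# so only "happy" (3) and "sad" (-2) contribute to the sum
toy_score <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%
  summarise(sentiment = sum(value))

toy_score$sentiment
```

Words the lexicon does not know add nothing to the total, which is why the choice of lexicon matters so much for this method.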

2. 2. Sentiment analysis with inner join

Because our approach is dictionary-based, we may need cleaning steps such as removing unneeded terms.
With tidy data, such steps are joins between data frames, for example anti_join().
As in chapter 1, we use austen_books() from the janeaustenr package.

library(janeaustenr)

original_books <- austen_books() %>% 
  group_by(book) %>% 
  mutate(
    line_number = row_number(),
    chapter = cumsum(str_detect(string = text, pattern = regex("^chapter [\\divclx]", ignore_case = TRUE)))
  ) %>% 
  ungroup()

original_books

## # A tibble: 73,422 x 4
##    text                    book                line_number chapter
##    <chr>                   <fct>                     <int>   <int>
##  1 "SENSE AND SENSIBILITY" Sense & Sensibility           1       0
##  2 ""                      Sense & Sensibility           2       0
##  3 "by Jane Austen"        Sense & Sensibility           3       0
##  4 ""                      Sense & Sensibility           4       0
##  5 "(1811)"                Sense & Sensibility           5       0
##  6 ""                      Sense & Sensibility           6       0
##  7 ""                      Sense & Sensibility           7       0
##  8 ""                      Sense & Sensibility           8       0
##  9 ""                      Sense & Sensibility           9       0
## 10 "CHAPTER 1"             Sense & Sensibility          10       1
## # … with 73,412 more rows

Again, we use unnest_tokens() to split the text column into word-level tokens stored in a column named word.

tidy_books <- original_books %>% 
  unnest_tokens(
    input = text, 
    output = "word"
  )

tidy_books

## # A tibble: 725,055 x 4
##    book                line_number chapter word       
##    <fct>                     <int>   <int> <chr>      
##  1 Sense & Sensibility           1       0 sense      
##  2 Sense & Sensibility           1       0 and        
##  3 Sense & Sensibility           1       0 sensibility
##  4 Sense & Sensibility           3       0 by         
##  5 Sense & Sensibility           3       0 jane       
##  6 Sense & Sensibility           3       0 austen     
##  7 Sense & Sensibility           5       0 1811       
##  8 Sense & Sensibility          10       1 chapter    
##  9 Sense & Sensibility          10       1 1          
## 10 Sense & Sensibility          13       1 the        
## # … with 725,045 more rows

Next, let's look at the words labeled joy in the nrc lexicon (get_sentiments("nrc")).

nrc_joy <- get_sentiments("nrc") %>% 
  filter(sentiment == "joy")

nrc_joy

## # A tibble: 689 x 2
##    word          sentiment
##    <chr>         <chr>    
##  1 absolution    joy      
##  2 abundance     joy      
##  3 abundant      joy      
##  4 accolade      joy      
##  5 accompaniment joy      
##  6 accomplish    joy      
##  7 accomplished  joy      
##  8 achieve       joy      
##  9 achievement   joy      
## 10 acrobat       joy      
## # … with 679 more rows

What if we want to find the words that express joy in the book “Emma”?
We inner_join() this lexicon object with the tidy_books object saved above.

tidy_books %>% 
  filter(book == "Emma") %>% 
  inner_join(nrc_joy, by = "word")

## # A tibble: 4,432 x 5
##    book  line_number chapter word      sentiment
##    <fct>       <int>   <int> <chr>     <chr>    
##  1 Emma           16       1 happy     joy      
##  2 Emma           16       1 blessings joy      
##  3 Emma           21       1 marriage  joy      
##  4 Emma           22       1 mother    joy      
##  5 Emma           24       1 excellent joy      
##  6 Emma           25       1 mother    joy      
##  7 Emma           25       1 affection joy      
##  8 Emma           28       1 friend    joy      
##  9 Emma           33       1 friend    joy      
## 10 Emma           33       1 friend    joy      
## # … with 4,422 more rows
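
To see what the inner join keeps and drops, here is a small sketch on made-up data. `toy_tokens` and `toy_joy` are hypothetical stand-ins for tidy_books and nrc_joy.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens with their line numbers
toy_tokens <- tibble(
  line_number = c(1, 1, 2, 2, 3),
  word        = c("dear", "friend", "the", "friend", "rain")
)

# Hypothetical single-sentiment lexicon
toy_joy <- tibble(
  word      = c("friend", "happy"),
  sentiment = "joy"
)

# Only tokens that appear in the lexicon survive;
# a word that occurs twice in the text is kept twice
toy_matches <- toy_tokens %>%
  inner_join(toy_joy, by = "word")

toy_matches
```

Each occurrence of a matching word stays as its own row, which is what makes the later count() step a frequency count.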

To count how often each joy word appears in “Emma”, apply count().

tidy_books %>% 
  filter(book == "Emma") %>% 
  inner_join(nrc_joy, by = "word") %>% 
  count(word, sort = TRUE)

## # A tibble: 303 x 2
##    word        n
##    <chr>   <int>
##  1 good      359
##  2 young     192
##  3 friend    166
##  4 hope      143
##  5 happy     125
##  6 love      117
##  7 deal       92
##  8 found      92
##  9 present    89
## 10 kind       82
## # … with 293 more rows

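We can also examine how sentiment changes over the course of a text.
This time we use the bing lexicon (get_sentiments("bing")), which labels words as positive or negative.
To do this we need an index that defines sections of text; this example uses sections of 80 lines.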

jane_austen_setiment <- tidy_books %>% 
  inner_join(get_sentiments("bing"), by = "word") %>% 
  mutate(index = line_number %/% 80) %>%  # x %/% y == floor(x / y): integer division, fraction dropped
  count(book, index, sentiment)

jane_austen_setiment

## # A tibble: 1,840 x 4
##    book                index sentiment     n
##    <fct>               <dbl> <chr>     <int>
##  1 Sense & Sensibility     0 negative     16
##  2 Sense & Sensibility     0 positive     32
##  3 Sense & Sensibility     1 negative     19
##  4 Sense & Sensibility     1 positive     53
##  5 Sense & Sensibility     2 negative     12
##  6 Sense & Sensibility     2 positive     31
##  7 Sense & Sensibility     3 negative     15
##  8 Sense & Sensibility     3 positive     31
##  9 Sense & Sensibility     4 negative     16
## 10 Sense & Sensibility     4 positive     34
## # … with 1,830 more rows
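
A quick sketch of how integer division assigns lines to sections. The line numbers below are picked by hand to show the boundaries.

```r
# Hand-picked line numbers around the section boundaries
line_number <- c(1, 79, 80, 159, 160, 161)

# Integer division by 80: lines 1-79 land in section 0,
# lines 80-159 in section 1, lines 160-239 in section 2
index <- line_number %/% 80

index
```

Because line_number starts at 1, section 0 holds 79 lines while every later full section holds 80.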

To make this easier to read, we pivot it to wide format with pivot_wider().
- See also the separate post on pivoting.
After pivoting, we compute net sentiment as the difference between the positive and negative counts.
- sentiment = positive - negative

jane_austen_setiment_wider <- jane_austen_setiment %>% 
  pivot_wider(
    names_from = "sentiment",
    values_from = "n",
    values_fill = 0 # fill missing values with 0
  ) %>% 
  mutate(sentiment = positive - negative)

jane_austen_setiment_wider

## # A tibble: 920 x 5
##    book                index negative positive sentiment
##    <fct>               <dbl>    <int>    <int>     <int>
##  1 Sense & Sensibility     0       16       32        16
##  2 Sense & Sensibility     1       19       53        34
##  3 Sense & Sensibility     2       12       31        19
##  4 Sense & Sensibility     3       15       31        16
##  5 Sense & Sensibility     4       16       34        18
##  6 Sense & Sensibility     5       16       51        35
##  7 Sense & Sensibility     6       24       40        16
##  8 Sense & Sensibility     7       23       51        28
##  9 Sense & Sensibility     8       30       40        10
## 10 Sense & Sensibility     9       15       19         4
## # … with 910 more rows
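
The sketch below shows on made-up counts why values_fill matters. `toy_counts` is hypothetical: its section 1 has no negative words at all.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical long-format counts; section 1 has no "negative" row
toy_counts <- tibble(
  index     = c(0, 0, 1),
  sentiment = c("negative", "positive", "positive"),
  n         = c(4L, 9L, 5L)
)

toy_wide <- toy_counts %>%
  pivot_wider(
    names_from  = sentiment,
    values_from = n,
    values_fill = 0L  # without this, section 1 would get NA for `negative`
  ) %>%
  mutate(sentiment = positive - negative)

toy_wide
```

Without the fill, the subtraction for section 1 would return NA and that section would be missing from the plot.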

With this result we can plot how net sentiment changes across sections (that is, over the course of the narrative) for each book.

jane_austen_setiment_wider %>% 
  ggplot(aes(x = index, y = sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ book, nrow = 2, scales = "free_x")

2. 3. Comparing the three sentiment dictionaries

This time we take a single book (filter(book == "Pride & Prejudice")) and compare all three sentiment lexicons on it.

pride_prejudice <- tidy_books %>% 
  filter(book == "Pride & Prejudice")

pride_prejudice

## # A tibble: 122,204 x 4
##    book              line_number chapter word     
##    <fct>                   <int>   <int> <chr>    
##  1 Pride & Prejudice           1       0 pride    
##  2 Pride & Prejudice           1       0 and      
##  3 Pride & Prejudice           1       0 prejudice
##  4 Pride & Prejudice           3       0 by       
##  5 Pride & Prejudice           3       0 jane     
##  6 Pride & Prejudice           3       0 austen   
##  7 Pride & Prejudice           7       1 chapter  
##  8 Pride & Prejudice           7       1 1        
##  9 Pride & Prejudice          10       1 it       
## 10 Pride & Prejudice          10       1 is       
## # … with 122,194 more rows

We inner_join() this object with each of the three lexicons.

afinn <- pride_prejudice %>% 
  inner_join(get_sentiments("afinn"), by = "word") %>% 
  group_by(index = line_number %/% 80) %>% 
  summarise(sentiment = sum(value)) %>% 
  ungroup() %>% 
  mutate(method = "AFINN")

afinn

## # A tibble: 163 x 3
##    index sentiment method
##    <dbl>     <dbl> <chr> 
##  1     0        29 AFINN 
##  2     1         0 AFINN 
##  3     2        20 AFINN 
##  4     3        30 AFINN 
##  5     4        62 AFINN 
##  6     5        66 AFINN 
##  7     6        60 AFINN 
##  8     7        18 AFINN 
##  9     8        84 AFINN 
## 10     9        26 AFINN 
## # … with 153 more rows

bing <- pride_prejudice %>% 
  inner_join(get_sentiments("bing"), by = "word") %>% 
  mutate(method = "Bing et al.") %>% 
  count(method, index = line_number %/% 80, sentiment) %>% 
  pivot_wider(
    names_from = "sentiment",
    values_from = n,
    values_fill = 0
  ) %>% 
  mutate(sentiment = positive - negative)

bing

## # A tibble: 163 x 5
##    method      index negative positive sentiment
##    <chr>       <dbl>    <int>    <int>     <int>
##  1 Bing et al.     0        7       21        14
##  2 Bing et al.     1       20       19        -1
##  3 Bing et al.     2       16       20         4
##  4 Bing et al.     3       19       31        12
##  5 Bing et al.     4       23       47        24
##  6 Bing et al.     5       15       49        34
##  7 Bing et al.     6       18       46        28
##  8 Bing et al.     7       23       33        10
##  9 Bing et al.     8       17       48        31
## 10 Bing et al.     9       22       40        18
## # … with 153 more rows

nrc <- pride_prejudice %>% 
  inner_join(get_sentiments("nrc"), by = "word") %>% 
  filter(sentiment %in% c("positive", "negative")) %>% 
  mutate(method = "NRC") %>% 
  count(method, index = line_number %/% 80, sentiment) %>% 
  pivot_wider(
    names_from = "sentiment",
    values_from = n,
    values_fill = 0
  ) %>% 
  mutate(sentiment = positive - negative)

nrc

## # A tibble: 163 x 5
##    method index negative positive sentiment
##    <chr>  <dbl>    <int>    <int>     <int>
##  1 NRC        0       10       31        21
##  2 NRC        1       17       34        17
##  3 NRC        2       22       30         8
##  4 NRC        3       17       60        43
##  5 NRC        4       17       49        32
##  6 NRC        5       12       54        42
##  7 NRC        6       14       68        54
##  8 NRC        7       19       47        28
##  9 NRC        8       11       55        44
## 10 NRC        9       22       43        21
## # … with 153 more rows

Plotting the three results gives the figure below.
- Reading the plot, afinn and bing appear to capture a wider range of sentiment,
  while nrc seems more likely to match positive words in this book.

bind_rows(afinn, bing, nrc) %>% 
  ggplot(aes(x = index, y = sentiment, fill = method)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ method, ncol = 1, scales = "free_y")

2. 4. Most common positive and negative words

One advantage of a lexicon such as bing, which pairs each word with a sentiment, is that we can count how much each word contributes to each sentiment.

bing_word_counts <- tidy_books %>% 
  inner_join(get_sentiments("bing"), by = "word") %>% 
  count(word, sentiment, sort = TRUE)

bing_word_counts

## # A tibble: 2,585 x 3
##    word     sentiment     n
##    <chr>    <chr>     <int>
##  1 miss     negative   1855
##  2 well     positive   1523
##  3 good     positive   1380
##  4 great    positive    981
##  5 like     positive    725
##  6 better   positive    639
##  7 enough   positive    613
##  8 happy    positive    534
##  9 love     positive    495
## 10 pleasure positive    462
## # … with 2,575 more rows

The top 10 positive and top 10 negative words are plotted below.

bing_word_counts %>% 
  group_by(sentiment) %>% 
  slice_max(n, n = 10) %>% 
  ungroup() %>% 
  ggplot(aes(x = reorder(word, n), y = n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ sentiment, scales = "free_y") +
  coord_flip() +
  labs(x = NULL, y = "Contribution to sentiment")  # after coord_flip(), y (the count n) is the horizontal axis

Here we can spot an anomaly.
The word “miss” is coded as negative in the lexicon, but in Jane Austen's works it is used as a title for young unmarried women.
In such cases we can build a custom stop-word list, as below.

custom_stop_words <- bind_rows(
  stop_words,
  tibble(word = c("miss"), lexicon = c("custom"))
)

custom_stop_words %>% 
  tail()

## # A tibble: 6 x 2
##   word     lexicon
##   <chr>    <chr>  
## 1 young    onix   
## 2 younger  onix   
## 3 youngest onix   
## 4 your     onix   
## 5 yours    onix   
## 6 miss     custom
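
One way to put such a list to work is to anti_join() it against the word counts before plotting. The sketch below uses made-up data: `toy_word_counts` and `toy_stop_words` are hypothetical stand-ins for bing_word_counts and custom_stop_words, and the counts are invented.

```r
library(dplyr)
library(tibble)

# Hypothetical word counts, shaped like bing_word_counts
toy_word_counts <- tibble(
  word      = c("miss", "well", "good", "poor"),
  sentiment = c("negative", "positive", "positive", "negative"),
  n         = c(50L, 40L, 30L, 10L)
)

# Hypothetical custom stop-word list, shaped like custom_stop_words
toy_stop_words <- tibble(word = "miss", lexicon = "custom")

# anti_join() keeps only the rows with no match in the stop-word list
toy_filtered <- toy_word_counts %>%
  anti_join(toy_stop_words, by = "word")

toy_filtered
```

Note that joining against the full custom_stop_words would also remove every ordinary stop word, so a one-row list like the one here is the narrower choice when only “miss” should go.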

2. 5. Wordclouds

Besides ggplot2, the wordcloud package can be used to draw word clouds.

library(wordcloud)

## Loading required package: RColorBrewer

The word cloud below shows the 100 most common words in Jane Austen's novels.
- The size of each word in the figure is proportional to its frequency.

tidy_books %>% 
  anti_join(stop_words, by = "word") %>% 
  count(word, sort = TRUE) %>% 
  with(wordcloud(word, n, max.words = 100))

To compare words across groups, use comparison.cloud().
The function needs a matrix rather than a data frame,
so we convert with acast() from the reshape2 package.
An example follows.

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

tidy_books %>% 
  inner_join(get_sentiments("bing"), by = "word") %>% 
  count(word, sentiment, sort = TRUE) %>% 
  acast(word ~ sentiment, value.var = "n", fill = 0) %>% 
  comparison.cloud(
    colors = c("grey80", "grey20"),
    max.words = 100
  )
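
To make the acast() step concrete, here is a small sketch of the data-frame-to-matrix conversion on its own. `toy_long` and its counts are made up for illustration.

```r
library(reshape2)

# Hypothetical long-format counts: one row per word
toy_long <- data.frame(
  word      = c("good", "happy", "poor", "pain"),
  sentiment = c("positive", "positive", "negative", "negative"),
  n         = c(30, 20, 10, 5)
)

# Rows become words, columns become sentiments;
# fill = 0 covers word/sentiment pairs that never occur
toy_matrix <- acast(toy_long, word ~ sentiment, value.var = "n", fill = 0)

toy_matrix
```

The result is a numeric matrix with one row per word and one column per sentiment, which is the layout comparison.cloud() expects.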

2. 6. Looking at units beyond just word

Word-level tokenization supports a lot of useful work, but sometimes we need a unit larger than a word, as below.

tibble(text = prideprejudice) %>% 
  unnest_tokens(
    input = text,
    output = "sentence",
    token = "sentences"
  )

## # A tibble: 15,545 x 1
##    sentence                                                                  
##    <chr>                                                                     
##  1 "pride and prejudice"                                                     
##  2 "by jane austen"                                                          
##  3 "chapter 1"                                                               
##  4 "it is a truth universally acknowledged, that a single man in possession" 
##  5 "of a good fortune, must be in want of a wife."                           
##  6 "however little known the feelings or views of such a man may be on his"  
##  7 "first entering a neighbourhood, this truth is so well fixed in the minds"
##  8 "of the surrounding families, that he is considered the rightful property"
##  9 "of some one or other of their daughters."                                
## 10 "\"my dear mr."                                                           
## # … with 15,535 more rows

We can also split tokens with a regular-expression pattern.

austen_chapters <- austen_books() %>% 
  group_by(book) %>% 
  unnest_tokens(
    input = text,
    output = "chapter",
    token = "regex",
    pattern = "Chapter|CHAPTER [\\dIVXLC]"
  ) %>% 
  ungroup()

austen_chapters

## # A tibble: 275 x 2
##    book             chapter                                                     
##    <fct>            <chr>                                                       
##  1 Sense & Sensibi… "sense and sensibility\n\nby jane austen\n\n(1811)\n\n\n\n\…
##  2 Sense & Sensibi… "\n\n\nthe family of dashwood had long been settled in suss…
##  3 Sense & Sensibi… "\n\n\nmrs. john dashwood now installed herself mistress of…
##  4 Sense & Sensibi… "\n\n\nmrs. dashwood remained at norland several months; no…
##  5 Sense & Sensibi… "\n\n\n\"what a pity it is, elinor,\" said marianne, \"that…
##  6 Sense & Sensibi… "\n\n\nno sooner was her answer dispatched, than mrs. dashw…
##  7 Sense & Sensibi… "\n\n\nthe first part of their journey was performed in too…
##  8 Sense & Sensibi… "\n\n\nbarton park was about half a mile from the cottage. …
##  9 Sense & Sensibi… "\n\n\nmrs. jennings was a widow with an ample jointure.  s…
## 10 Sense & Sensibi… "\n\n\nthe dashwoods were now settled at barton with tolera…
## # … with 265 more rows

austen_chapters %>% 
  group_by(book) %>% 
  summarise(chapters = n()) %>% 
  ungroup()

## # A tibble: 6 x 2
##   book                chapters
##   <fct>                  <int>
## 1 Sense & Sensibility       51
## 2 Pride & Prejudice         62
## 3 Mansfield Park            49
## 4 Emma                      56
## 5 Northanger Abbey          32
## 6 Persuasion                25

Building on this, we can find the most negative chapter in each of Austen's novels.
We first store the negative words from the bing lexicon used above, then count the words in each chapter.

bing_negative <- get_sentiments("bing") %>% 
  filter(sentiment == "negative")

bing_negative

## # A tibble: 4,781 x 2
##    word        sentiment
##    <chr>       <chr>    
##  1 2-faces     negative 
##  2 abnormal    negative 
##  3 abolish     negative 
##  4 abominable  negative 
##  5 abominably  negative 
##  6 abominate   negative 
##  7 abomination negative 
##  8 abort       negative 
##  9 aborted     negative 
## 10 aborts      negative 
## # … with 4,771 more rows

word_counts <- tidy_books %>% 
  group_by(book, chapter) %>% 
  summarise(words = n()) %>% 
  ungroup()

## `summarise()` has grouped output by 'book'. You can override using the `.groups` argument.

word_counts

## # A tibble: 275 x 3
##    book                chapter words
##    <fct>                 <int> <int>
##  1 Sense & Sensibility       0     7
##  2 Sense & Sensibility       1  1571
##  3 Sense & Sensibility       2  1970
##  4 Sense & Sensibility       3  1538
##  5 Sense & Sensibility       4  1952
##  6 Sense & Sensibility       5  1030
##  7 Sense & Sensibility       6  1353
##  8 Sense & Sensibility       7  1288
##  9 Sense & Sensibility       8  1256
## 10 Sense & Sensibility       9  1863
## # … with 265 more rows

Next, we count the negative words in each chapter and divide by the chapter's total word count to get a ratio.

tidy_books %>% 
  semi_join(bing_negative, by = "word") %>% 
  group_by(book, chapter) %>% 
  summarise(negative_words = n()) %>% 
  left_join(word_counts, by = c("book", "chapter")) %>% 
  mutate(ratio = negative_words/words) %>% 
  filter(chapter > 0) %>% 
  slice_max(ratio, n = 1) %>% 
  ungroup()

## `summarise()` has grouped output by 'book'. You can override using the `.groups` argument.

## # A tibble: 6 x 5
##   book                chapter negative_words words  ratio
##   <fct>                 <int>          <int> <int>  <dbl>
## 1 Sense & Sensibility      43            161  3405 0.0473
## 2 Pride & Prejudice        34            111  2104 0.0528
## 3 Mansfield Park           46            173  3685 0.0469
## 4 Emma                     15            151  3340 0.0452
## 5 Northanger Abbey         21            149  2982 0.0500
## 6 Persuasion                4             62  1807 0.0343
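
The same ratio calculation can be followed on a tiny made-up example. `toy_tokens`, `toy_negative`, and `toy_totals` are hypothetical stand-ins for tidy_books, bing_negative, and word_counts.

```r
library(dplyr)
library(tibble)

# Hypothetical tokens: chapter 1 has 4 words, chapter 2 has 2
toy_tokens <- tibble(
  chapter = c(1, 1, 1, 1, 2, 2),
  word    = c("a", "sad", "and", "gloomy", "a", "walk")
)

# Hypothetical list of negative words
toy_negative <- tibble(word = c("sad", "gloomy"))

# Total words per chapter
toy_totals <- toy_tokens %>%
  count(chapter, name = "words")

# semi_join() keeps matching tokens without adding lexicon columns
toy_ratio <- toy_tokens %>%
  semi_join(toy_negative, by = "word") %>%
  count(chapter, name = "negative_words") %>%
  left_join(toy_totals, by = "chapter") %>%
  mutate(ratio = negative_words / words)

toy_ratio
```

Chapter 2 contains no negative words, so it drops out at the semi_join() step and never appears in the result; the same holds for the real data.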
