Chapter 4: SMS case study - invalid input #2

@jingo99

Dear @dataspelunking,

I am a Dutch student currently working through your book for a project. I'm an absolute beginner at machine learning, and your book has been very helpful in teaching me. In chapter 4, on page 117, I'm instructed to create two word clouds based on the raw data, so I can inspect the difference between "spam" and "ham". However, when I create the word clouds, RStudio fails with the following error.

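library(wordcloud)
s_raw <- read.csv("sms_spam.csv", stringsAsFactors = FALSE)
spam <- subset(s_raw, type == "spam")  # assumed: spam subset on the `type` column, as in the book
wordcloud(spam$text, max.words = 40, scale = c(3, 0.5))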

transformation drops documents
transformation drops documents
Error in strwidth(words[i], cex = size[i], ...) : invalid input 'â£1000' in 'utf8towcs'

I'm a student developer, so I presumed the British pound sign (£) might be what breaks the code. I retrieved your updated CSV file from GitHub, but to no avail. I have continued working, but if possible, I'd like to understand why this code fails.

I'm not able to really ask anyone for help, so I thought I'd try my luck here.

Would you be so kind as to explain what I'm missing here?

With kind regards,

Jingo
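
One way to test the pound-sign hypothesis above is to look for strings that are not valid UTF-8 and re-encode them. The sketch below is only an illustration: the thread does not establish how sms_spam.csv is actually encoded, so treating the source encoding as Latin-1 is an assumption, and `msgs` is a hypothetical stand-in for `spam$text` built from raw bytes so that the example does not depend on the local locale.

```r
# Hypothetical stand-in for spam$text. The second message is built from raw
# bytes in which the pound sign is the single Latin-1 byte 0xA3, which is
# not a valid byte sequence on its own in UTF-8.
pound_latin1 <- rawToChar(as.raw(c(0xA3, 0x31, 0x30, 0x30, 0x30)))  # "£1000" in Latin-1
msgs <- c("call now to claim your prize", pound_latin1)

# validUTF8() (base R, version 3.3.0 or later) flags strings whose bytes
# are not valid UTF-8.
bad <- which(!validUTF8(msgs))

# Re-encode only the flagged strings. ASSUMPTION: the original bytes are Latin-1.
msgs[bad] <- iconv(msgs[bad], from = "latin1", to = "UTF-8")

# After conversion every element should be valid UTF-8.
all_valid <- all(validUTF8(msgs))
```

If the same check flags rows in the real data, applying the conversion to `spam$text` before calling `wordcloud()` may avoid the error. Passing an explicit `fileEncoding` argument to `read.csv()` is another option worth trying, again assuming the file's true encoding is known.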
