I wanted to explore hotel review data for a few hotels from Trip Advisor. I have described scraping the data for 3 hotels here. The hotels considered are listed in the code comments below (disclaimer: the choice of hotels is random).
# set working directory
setwd("~/notesofdabbler/githubfolder/blog_notesofdabbler/hotelReview/")
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
library(ggplot2)
library(tm)
library(scales)
library(topicmodels)
# load data for a hotel
# currently 3 datasets available from scraping trip advisor data
# 1. J W Marriott, Indianapolis (label: jwmarriott)
# 2. Hampton Inn Indianapolis Northwest -100 (label: hamptoninn)
# 3. Conrad Indianapolis (label: conrad)
#
hotellist=c("jwmarriott","conrad","hamptoninn")
# load hotel review data for each hotel and return the following for each hotel
# * Data frame with hotel reviews
# * Top 3 records
# * Number of reviews in dataset
# * frequency of different ratings
dfrating.l=lapply(hotellist,
function(x) {
filenm=paste("dfrating_",x,".Rda",sep="")
load(filenm)
return(list(dfrating=dfrating,
top3records=head(dfrating,3),
numrevs=nrow(dfrating),
freqRating=table(dfrating$ratingnum)))
})
names(dfrating.l)=hotellist
dfrating.l[["jwmarriott"]]$top3records
## id
## 1 rn220694072
## 2 rn220524586
## 3 rn220162069
## topquote ratingdt
## 1 Excellent Hotel--especially if you can get a government rate! 2014-08-09
## 2 Close to football and baseball 2014-08-08
## 3 Best hotel experience ever! 2014-08-07
## rating
## 1 5 of 5 stars
## 2 5 of 5 stars
## 3 5 of 5 stars
## partialentry
## 1 \nJW Marriott Hotels are among my favorite in the Marriott chain and on the rare occasion that they offer a government rate, they are an excellent place to stay. One of the largest hotels in the Midwest, the JW Indianapolis is about 5 blocks from the center of Indianapolis. My room was exceptionally clean and comfortable. I had access to...\n\n\nMore \n\n
## 2 \nThis property is close to their minor league baseball team and just a short distance from Lucas Field, home of the Colts. It is across the street from the convention center and government buildings. It is about 4 -5 blocks from the city center. The staff is exceptionally friendly and helpful. The rooms were cleaned-up on time. There was no...\n\n\nMore \n\n
## 3 \nAll the services from hotel staff, to the FedEx office to the restaurants were top notch! All the wait staff for both the restaurants gave excellent service as well as personable service! The Fedex store was extremely helpful and sympathetic to my crazy stressful circumstance of my books not showing up in time from my publisher. The hotel staff offered...\n\n\nMore \n\n
## ratingnum id2
## 1 5 220694072
## 2 5 220524586
## 3 5 220162069
## fullrev
## 1 \nJW Marriott Hotels are among my favorite in the Marriott chain and on the rare occasion that they offer a government rate, they are an excellent place to stay. One of the largest hotels in the Midwest, the JW Indianapolis is about 5 blocks from the center of Indianapolis. My room was exceptionally clean and comfortable. I had access to the concierge lounge which was well stocked in the evening/mornings and all of the staff I encountered were exceptionally professional and courteous. Wifi is free to Gold/Platinum Marriott members. Lots of choices for dining on the property and they are really close to downtown Indianapolis and one of my favorite restaurants (Palominos).\n
## 2 \nThis property is close to their minor league baseball team and just a short distance from Lucas Field, home of the Colts. It is across the street from the convention center and government buildings. It is about 4 -5 blocks from the city center. The staff is exceptionally friendly and helpful. The rooms were cleaned-up on time. There was no back-up in checking in as there often is when large groups check in. Concierge was helpful. Free wi-fi in the lobby. Starbucks on the second level\n
## 3 \nAll the services from hotel staff, to the FedEx office to the restaurants were top notch! All the wait staff for both the restaurants gave excellent service as well as personable service! The Fedex store was extremely helpful and sympathetic to my crazy stressful circumstance of my books not showing up in time from my publisher. The hotel staff offered great local suggestions and were even courteous during extremely busy times!\n
dfrating.l[["jwmarriott"]]$numrevs
## [1] 808
dfrating.l[["jwmarriott"]]$freqRating
##
## 1 2 3 4 5
## 11 18 57 156 566
dfrating.l[["conrad"]]$top3records
## id topquote ratingdt rating
## 1 rn220562473 Spectacular service 2014-08-08 5 of 5 stars
## 2 rn220387742 Excellent staff service 2014-08-07 5 of 5 stars
## 3 rn220109249 A good choice downtown. 2014-08-06 4 of 5 stars
## partialentry
## 1 \nLea at the concierge desk was incredibly helpful. She gave us wonderful advice on dining and made us feel totally welcome. I have stayed in Indy dozens of times when I was traveling for business before I retired and only recently tried the Conrad . The bite is lovely, well situated and the staff has been exemplary\nHighly recommended\n
## 2 \nAll of the staff were excellent! From the house keepers to the front desk, they were great. The staff near the front door and outside were always anticipating our needs and offering to get our car, to give directions, to recommend a place to eat. Excellent hotel!\n
## 3 \nTo be honest there is a paucity of decent hotels in Indy.\nConrad (the upper market Hiltons) are generally good but you can see they come from 4* aiming at 5 as their isn't the inherent service focus of the full 5* hotels.\nAnyway. This ones pretty good. It benefits from having a Capital Grille as it's in-house restaurant and...\n\n\nMore \n\n
## ratingnum id2
## 1 5 220562473
## 2 5 220387742
## 3 4 220109249
## fullrev
## 1 \nLea at the concierge desk was incredibly helpful. She gave us wonderful advice on dining and made us feel totally welcome. I have stayed in Indy dozens of times when I was traveling for business before I retired and only recently tried the Conrad . The bite is lovely, well situated and the staff has been exemplary Highly recommended\n
## 2 \nAll of the staff were excellent! From the house keepers to the front desk, they were great. The staff near the front door and outside were always anticipating our needs and offering to get our car, to give directions, to recommend a place to eat. Excellent hotel!\n
## 3 \nTo be honest there is a paucity of decent hotels in Indy. Conrad (the upper market Hiltons) are generally good but you can see they come from 4* aiming at 5 as their isn't the inherent service focus of the full 5* hotels. Anyway. This ones pretty good. It benefits from having a Capital Grille as it's in-house restaurant and breakfast provider. And I like CGs from my life in NYC. I believe its better than the Hilton Garden Inn, it's older cousin a few blocks away, (the place that hands out water at reception if you qualify!). When I'm in Indy next in a month I will stay in the Conrad.
dfrating.l[["conrad"]]$numrevs
## [1] 614
dfrating.l[["conrad"]]$freqRating
##
## 1 2 3 4 5
## 3 17 35 122 437
dfrating.l[["hamptoninn"]]$top3records
## id topquote
## 1 rn219975597 Clean place, friendly staff, close to convention center
## 2 rn218377442 It's a Hampton, but a nicer than many
## 3 rn217368068 Great Service
## ratingdt rating
## 1 2014-08-06 4 of 5 stars
## 2 2014-07-29 4 of 5 stars
## 3 2014-07-24 5 of 5 stars
## partialentry
## 1 \nI traveled here for work, spending four nights in the same room. There were multiple events in town and this place was very busy. They ran out of shaving cream, something they normally have at the front desk or in the suite shop, but there is a CVS about four blocks away, so I was able to get some there....\n\n\nMore \n\n
## 2 \nComfortable, clean, attractive hotel with nice lobby. They have a happy hour with food, all free on a few midweek nights. Not many restaurants nearby except fast food but within 3 miles there are several decent places. The only down side to this hotel is the quality of exercise equipment. It all feels ready to fall apart when in use....\n\n\nMore \n\n
## 3 \nBusiness or pleasure this is a good place to stay. Service was excellent, staff attentive, and rooms are clean. Check in and out was profession and fast. The breakfast had a good varity, coffee was alway on, and it was very quite.\n
## ratingnum id2
## 1 4 219975597
## 2 4 218377442
## 3 5 217368068
## fullrev
## 1 \nI traveled here for work, spending four nights in the same room. There were multiple events in town and this place was very busy. They ran out of shaving cream, something they normally have at the front desk or in the suite shop, but there is a CVS about four blocks away, so I was able to get some there.Breakfast was busy and they ran out if yogurt one day out of four. They offer waffles, and some other hot breakfast items as you would expect. The business center had three nice desks, computers and a printer available. The workout room was spacious and not busy at all. The equipment needs some attention. There are three treadmills, which have inconsistent belt speeds which makes them awkward to use. The bike shuts off and the computer starts over. There is no elliptical machine. There are some nice free weights, a yoga ball, an adjustable bench, and a yoga mat. I repeatedly had trouble turning on the television. At one point I finally called the front desk and someone came up within a couple minutes. It was like a combination lock, but they finally got it turned on.The staff was consistently polite and helpful. There are many places to eat and be entertained within walking distance.
## 2 \nComfortable, clean, attractive hotel with nice lobby. They have a happy hour with food, all free on a few midweek nights. Not many restaurants nearby except fast food but within 3 miles there are several decent places. The only down side to this hotel is the quality of exercise equipment. It all feels ready to fall apart when in use. I drive to LA Fitness, about 5 miles down the highway which is a hassle at 5 am.\n
## 3 \nBusiness or pleasure this is a good place to stay. Service was excellent, staff attentive, and rooms are clean. Check in and out was profession and fast. The breakfast had a good varity, coffee was alway on, and it was very quite.\n
dfrating.l[["hamptoninn"]]$numrevs
## [1] 196
dfrating.l[["hamptoninn"]]$freqRating
##
## 2 3 4 5
## 1 10 36 149
These 3 hotels are listed among the top 5 hotels in Indianapolis, so most of their ratings are 4 or 5. Next, I plotted the % of reviews with a given rating over time.
# function to get number of reviews and % of reviews with a rating
getrev.bymon=function(hotel){
# get review data
dfrating=dfrating.l[[hotel]]$dfrating
# create a month label
dfrating$yrmon=floor_date(dfrating$ratingdt,"month")
# create sequence of months
yrmon=unique(dfrating$yrmon)
yrmonseq=seq(min(yrmon),max(yrmon),by="months")
# yr-month and rating combinations
yrmon.rating=expand.grid(yrmon=yrmonseq,ratingnum=c(1,2,3,4,5))
# get % of reviews for each rating by month
dfrating.bymon=dfrating%>%group_by(yrmon,ratingnum)%>%summarize(count=n())
dfrating.bymon.agg=dfrating.bymon%>%group_by(yrmon)%>%summarize(countfull=sum(count))
dfrating.bymon=merge(dfrating.bymon,dfrating.bymon.agg,c("yrmon"))
dfrating.bymon$pctrating=dfrating.bymon$count/dfrating.bymon$countfull
dfrating.bymon=merge(yrmon.rating,dfrating.bymon,by=c("yrmon","ratingnum"),all.x=TRUE)
dfrating.bymon$pctrating[is.na(dfrating.bymon$pctrating)]=0
dfrating.bymon$count[is.na(dfrating.bymon$count)]=0
dfrating.bymon$countfull[is.na(dfrating.bymon$countfull)]=0
dfrating.bymon$hotel=hotel
# get number of reviews by month
dfrating.bymon.revs=dfrating.bymon%>%group_by(yrmon)%>%summarize(numrevs=sum(count))
dfrating.bymon.revs$hotel=hotel
return(list(dfrating.bymon=dfrating.bymon,dfrating.bymon.revs=dfrating.bymon.revs))
}
dfrating.bymon.l=lapply(hotellist,function(x) getrev.bymon(x))
# plot number of reviews by year-month
dfrating.bymon.revs=do.call(rbind,lapply(dfrating.bymon.l,function(x) x$dfrating.bymon.revs))
p=ggplot(dfrating.bymon.revs,aes(x=yrmon,y=numrevs))+geom_line()+facet_grid(hotel~.)
p=p+xlab("")+ylab("# of reviews")
p=p+theme_bw()
p
# plot % of reviews for each rating by year-month
dfrating.bymon=do.call(rbind,lapply(dfrating.bymon.l,function(x) x$dfrating.bymon))
p=ggplot(dfrating.bymon,aes(x=yrmon,y=pctrating,color=factor(ratingnum)))+geom_line(size=1.1)
p=p+facet_grid(hotel~.)
p=p+xlab("")+ylab("% of Ratings")
p=p+scale_y_continuous(breaks=seq(0,1,0.1),labels=percent)+scale_color_discrete(name="# stars")
p=p+theme_bw()
p
The choppier stretches of the % of ratings plot correspond to periods with very few reviews.
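To see why sparse months produce jumpy percentages, here is a minimal base R sketch on made-up data. The `reviews` data frame and its values are hypothetical and are not taken from the scraped reviews. It computes the share of each star rating per month, similar in spirit to getrev.bymon, but with table() and prop.table() instead of dplyr.

```r
# Hypothetical review dates and star ratings (not from the scraped data)
reviews <- data.frame(
  ratingdt = as.Date(c("2014-01-05", "2014-01-20", "2014-01-28", "2014-01-30",
                       "2014-02-11", "2014-03-02", "2014-03-15")),
  ratingnum = c(5, 5, 4, 5, 3, 5, 4)
)

# month label, e.g. "2014-01"
reviews$yrmon <- format(reviews$ratingdt, "%Y-%m")

# counts by month (rows) and star rating (columns); keep all 5 rating levels
counts <- table(reviews$yrmon, factor(reviews$ratingnum, levels = 1:5))

# share of each rating within a month, and number of reviews per month
share <- prop.table(counts, margin = 1)
numrevs <- rowSums(counts)

round(share, 2)
numrevs
```

In this toy data February has a single review, so its 3 star share is 100%, while the four January reviews give a 75% / 25% split between 5 and 4 stars. One extra review in a thin month moves the line a lot.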
I wanted to check out the top words occurring in the top quotes associated with reviews for each star rating. Though I show the top words for each star rating, we should remember that the frequency of lower star ratings in the data is low.
# Explore top level quotes for each rating
# function to get document-term matrix from hotel review data for a given hotel
getDTM=function(dftxt){
# code adapted from http://www.rdatamining.com/examples/text-mining
txtcorpus=Corpus(VectorSource(dftxt))
#inspect(txtcorpus[1:5])
# note: with tm 0.6 or later, use tm_map(txtcorpus,content_transformer(tolower)) instead
txtcorpus.cl=tm_map(txtcorpus,tolower)
txtcorpus.cl=tm_map(txtcorpus.cl,removePunctuation)
txtcorpus.cl=tm_map(txtcorpus.cl,removeNumbers)
mystopwords=c(stopwords("english"),"hotel","staff","room","rooms","indianapolis","marriott","conference",
"convention","indy","downtown","hampton","stay","stayed","inn","conrad")
txtcorpus.cl=tm_map(txtcorpus.cl,removeWords,mystopwords)
#dictCorpus=txtcorpus.cl
#txtcorpus.cl=tm_map(txtcorpus.cl,stemDocument)
#txtcorpus.cl=tm_map(txtcorpus.cl,stemCompletion,dictionary=dictCorpus)
dtm=DocumentTermMatrix(txtcorpus.cl)
dtm.m=as.matrix(dtm)
return(dtm.m)
}
getTopTerms=function(hotel){
# get review data
dfrating=dfrating.l[[hotel]]$dfrating
minrating=min(dfrating$ratingnum)
maxrating=max(dfrating$ratingnum)
tfreq.l=as.list(rep(NA,maxrating-minrating+1))
# number of frequent words to retain
numterms=20
rating=maxrating+1
for(i in 1:(maxrating-minrating+1)){
rating=rating-1
#sprintf("Processing data for %s stars",rating)
dftxt=dfrating$topquote[dfrating$ratingnum==rating]
dtm.m=getDTM(dftxt)
tfreq=colSums(dtm.m)
tfreq.l[[i]]=names(sort(tfreq,decreasing=TRUE)[1:numterms])
}
topTerms=do.call(cbind,tfreq.l)
colnames(topTerms)=paste(seq(maxrating,minrating)," star")
return(list(dtm.m=dtm.m,topTerms=topTerms))
}
topTerms.l=lapply(hotellist,function(x) getTopTerms(x))
names(topTerms.l)=hotellist
topTerms.l[["jwmarriott"]]$topTerms
## 5 star 4 star 3 star 2 star
## [1,] "great" "nice" "good" "bad"
## [2,] "service" "great" "nice" "disappointing"
## [3,] "excellent" "good" "great" "service"
## [4,] "best" "location" "expected" "slow"
## [5,] "location" "place" "location" "amenities"
## [6,] "nice" "service" "really" "around"
## [7,] "place" "best" "service" "available"
## [8,] "wonderful" "excellent" "big" "back"
## [9,] "experience" "beautiful" "business" "bellman"
## [10,] "beautiful" "new" "doesnt" "beware"
## [11,] "fantastic" "property" "price" "building"
## [12,] "property" "really" "time" "charges"
## [13,] "business" "classy" "bad" "cleaning"
## [14,] "good" "everything" "chaos" "desk"
## [15,] "outstanding" "expensive" "dirty" "disappointment"
## [16,] "weekend" "fantastic" "disappointment" "errors"
## [17,] "customer" "one" "expensive" "extra"
## [18,] "ever" "outstanding" "just" "falls"
## [19,] "top" "plus" "trip" "first"
## [20,] "amazing" "view" "accessible" "front"
## 1 star
## [1,] "horrible"
## [2,] "careful"
## [3,] "comedy"
## [4,] "customer"
## [5,] "disappointed"
## [6,] "errors"
## [7,] "expereince"
## [8,] "experience"
## [9,] "fees"
## [10,] "fell"
## [11,] "good"
## [12,] "hardy"
## [13,] "hidden"
## [14,] "laurel"
## [15,] "manager"
## [16,] "never"
## [17,] "night"
## [18,] "price"
## [19,] "reservations"
## [20,] "service"
topTerms.l[["conrad"]]$topTerms
## 5 star 4 star 3 star 2 star
## [1,] "great" "nice" "nice" "disappointing"
## [2,] "best" "great" "four" "great"
## [3,] "excellent" "good" "service" "service"
## [4,] "service" "location" "better" "property"
## [5,] "location" "service" "brand" "attention"
## [6,] "wonderful" "excellent" "class" "bad"
## [7,] "amazing" "friendly" "dont" "brunch"
## [8,] "perfect" "hilton" "expectations" "customer"
## [9,] "weekend" "best" "first" "decent"
## [10,] "experience" "city" "good" "define"
## [11,] "nice" "perfect" "seasons" "dirty"
## [12,] "one" "value" "accommodations" "easter"
## [13,] "beautiful" "close" "age" "ever"
## [14,] "top" "comfortable" "amenities" "expected"
## [15,] "every" "experience" "anything" "experience"
## [16,] "luxury" "place" "badly" "food"
## [17,] "business" "property" "ball" "housekeeping"
## [18,] "fantastic" "star" "best" "improvement"
## [19,] "class" "amenities" "big" "issues"
## [20,] "fabulous" "another" "bit" "luxury"
## 1 star
## [1,] "average"
## [2,] "away"
## [3,] "back"
## [4,] "hilton"
## [5,] "never"
## [6,] "will"
## [7,] NA
## [8,] NA
## [9,] NA
## [10,] NA
## [11,] NA
## [12,] NA
## [13,] NA
## [14,] NA
## [15,] NA
## [16,] NA
## [17,] NA
## [18,] NA
## [19,] NA
## [20,] NA
topTerms.l[["hamptoninn"]]$topTerms
## 5 star 4 star 3 star 2 star
## [1,] "great" "clean" "nice" "expected"
## [2,] "best" "great" "industrial" NA
## [3,] "service" "nice" "adequate" NA
## [4,] "clean" "place" "area" NA
## [5,] "nice" "comfortable" "complex" NA
## [6,] "place" "good" "essentials" NA
## [7,] "excellent" "service" "floor" NA
## [8,] "experience" "friendly" "frig" NA
## [9,] "friendly" "new" "great" NA
## [10,] "one" "will" "lacking" NA
## [11,] "comfortable" "act" "located" NA
## [12,] "good" "amenities" "location" NA
## [13,] "highly" "away" "microwave" NA
## [14,] "location" "back" "missing" NA
## [15,] "new" "budget" "new" NA
## [16,] "property" "center" "next" NA
## [17,] "value" "choice" "number" NA
## [18,] "amazing" "class" "okay" NA
## [19,] "comfort" "close" "park" NA
## [20,] "inns" "convenient" "short" NA
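The NA entries in the tables above appear when a star rating has fewer than 20 distinct terms left after cleaning, because indexing with [1:numterms] runs past the end of the sorted frequency vector. A minimal base R sketch of that behavior, using made-up quotes and a crude tokenizer as a stand-in for the tm pipeline in getDTM (the names `quotes`, `padded` and `trimmed` are just for illustration):

```r
# Hypothetical top quotes (not from the scraped data)
quotes <- c("Great service", "great location", "Service was slow")

# crude tokenizer standing in for the tm pipeline in getDTM
words <- unlist(strsplit(tolower(quotes), "[^a-z]+"))
words <- words[words != "" & words != "was"]

# named vector of term counts, similar to colSums(dtm.m)
tfreq <- vapply(split(words, words), length, integer(1))
tfreq <- sort(tfreq, decreasing = TRUE)

numterms <- 5
padded <- names(tfreq[1:numterms])       # indexes past the end, so NA is appended
trimmed <- names(head(tfreq, numterms))  # stops at the number of terms available

padded
trimmed
```

Here there are only 4 distinct terms, so `padded` has 5 entries with a trailing NA while `trimmed` has 4. With head(), the columns can have different lengths, so they would need explicit padding (for example with empty strings) before being combined with cbind.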
One thing I want to explore in the future is to train a model on such a data set (features being words and response being rating) using Trip Advisor reviews and use it to predict ratings in some other area (such as tweets on a topic).
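As a rough illustration of the words-as-features idea, here is a deliberately simple base R sketch. It is not a method used in this post: it scores each word by the average rating of the quotes it appears in, then predicts a rating for new text by averaging the scores of its known words. The training quotes and the helper names `tokenize`, `wordscore` and `predict_rating` are all hypothetical.

```r
# Hypothetical labelled quotes (not from the scraped data)
train <- data.frame(
  text = c("great service", "great location", "slow service", "dirty room"),
  rating = c(5, 5, 2, 1),
  stringsAsFactors = FALSE
)

# split text into lower-case words
tokenize <- function(x) {
  w <- unlist(strsplit(tolower(x), "[^a-z]+"))
  w[w != ""]
}

# one row per (word, rating of the quote containing it)
tokens <- lapply(train$text, tokenize)
word <- unlist(tokens)
rating <- rep(train$rating, sapply(tokens, length))

# score of a word = average rating of the quotes it appears in
wordscore <- tapply(rating, word, mean)

# predicted rating = average score of the words seen in training
predict_rating <- function(x) {
  w <- intersect(tokenize(x), names(wordscore))
  if (length(w) == 0) return(NA_real_)
  mean(wordscore[w])
}

predict_rating("great room")
```

With this toy data "great" scores 5 and "room" scores 1, so "great room" is predicted as 3. A real attempt would need far more data and a proper model, and text from another area such as tweets may use words quite differently from hotel reviews.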
Next, I wanted to explore the full review text to see if there are themes. Right now, I am not too familiar with topic model packages (they are on my near-term to-do list). For now, I took the simpler approach of k-means clustering of words using the document-term matrix to find clusters of words. Here I just picked 5 clusters and haven’t checked what the right number of clusters is.
# Further investigation of high star rating using full reviews for a hotel
# function to cluster words from the document-term matrix of full reviews for a hotel
getClust=function(hotel){
# load hotel full review data
dfrating=dfrating.l[[hotel]]$dfrating
dftxt=dfrating$fullrev[dfrating$ratingnum>=4]
dtm.m=getDTM(dftxt)
# clustering of words to detect themes/topics
set.seed(1234)
txtclust=kmeans(t(dtm.m),5)
# size of clusters
txtclust$size
# within and total sum of squares
txtclust$totss
txtclust$withinss
# get list of frequent terms in each cluster
clustTerms=as.list(rep(NA,5))
termlist=colnames(dtm.m)
for(i in 1:5){
termlist.filt=termlist[txtclust$cluster == i]
tfreq=colSums(dtm.m)
tfreq.filt=sort(tfreq[termlist.filt],decreasing=TRUE)
clustTerms[[i]]=names(tfreq.filt[1:20])
}
clust.topic=do.call(cbind,clustTerms)
clust.topic[is.na(clust.topic)]=""
colnames(clust.topic)=c("cluster 1","cluster 2","cluster 3","cluster 4","cluster 5")
# return the k-means fit and the table of frequent terms in each cluster
return(list(txtclust=txtclust,clust.topic=clust.topic))
}
clust.topic=lapply(hotellist,function(x) getClust(x))
names(clust.topic)=hotellist
clust.topic[["jwmarriott"]]$clust.topic
## cluster 1 cluster 2 cluster 3 cluster 4 cluster 5
## [1,] "notch" "location" "great" "amazing" "floor"
## [2,] "care" "center" "service" "definitely" "one"
## [3,] "skywalk" "restaurants" "nice" "ever" "well"
## [4,] "decorated" "city" "good" "make" "clean"
## [5,] "courteous" "will" "" "much" "view"
## [6,] "efficient" "breakfast" "" "fantastic" "time"
## [7,] "options" "place" "" "staying" "friendly"
## [8,] "pleased" "hotels" "" "close" "food"
## [9,] "thanks" "beautiful" "" "several" "also"
## [10,] "travel" "area" "" "free" "comfortable"
## [11,] "attended" "really" "" "level" "bar"
## [12,] "beyond" "everything" "" "say" "just"
## [13,] "week" "best" "" "enjoyed" "excellent"
## [14,] "anyone" "lounge" "" "got" "night"
## [15,] "entire" "helpful" "" "weekend" "get"
## [16,] "exceptional" "restaurant" "" "went" "desk"
## [17,] "accommodating" "day" "" "elevator" "lobby"
## [18,] "expected" "wonderful" "" "located" "large"
## [19,] "experienced" "like" "" "nights" "front"
## [20,] "prices" "back" "" "trip" ""
clust.topic[["conrad"]]$clust.topic
## cluster 1 cluster 2 cluster 3 cluster 4 cluster 5
## [1,] "every" "worth" "service" "great" "location"
## [2,] "business" "getaway" "nice" "" "one"
## [3,] "recommend" "plus" "" "" "comfortable"
## [4,] "always" "luxury" "" "" "clean"
## [5,] "amazing" "superb" "" "" "time"
## [6,] "perfect" "impressed" "" "" "well"
## [7,] "back" "pricey" "" "" "bathroom"
## [8,] "top" "anniversary" "" "" "also"
## [9,] "beds" "favorite" "" "" "good"
## [10,] "distance" "noise" "" "" "will"
## [11,] "ever" "pretty" "" "" "excellent"
## [12,] "definitely" "standard" "" "" "friendly"
## [13,] "enjoyed" "anyone" "" "" "get"
## [14,] "made" "anywhere" "" "" "night"
## [15,] "make" "chose" "" "" "desk"
## [16,] "grill" "complaint" "" "" "bed"
## [17,] "located" "indoor" "" "" "just"
## [18,] "city" "now" "" "" "wonderful"
## [19,] "hilton" "sleep" "" "" "valet"
## [20,] "weekend" "areas" "" "" "shower"
clust.topic[["hamptoninn"]]$clust.topic
## cluster 1 cluster 2 cluster 3 cluster 4 cluster 5
## [1,] "clean" "excellent" "close" "good" "lot"
## [2,] "breakfast" "experience" "restaurants" "pool" "day"
## [3,] "great" "right" "located" "comfortable" "little"
## [4,] "nice" "wonderful" "complimentary" "friendly" "new"
## [5,] "area" "bit" "definitely" "quiet" "nights"
## [6,] "well" "enjoyed" "access" "location" "ive"
## [7,] "" "find" "offered" "time" "every"
## [8,] "" "next" "perfect" "one" "kept"
## [9,] "" "able" "really" "will" "lobby"
## [10,] "" "highly" "extremely" "night" "can"
## [11,] "" "make" "many" "desk" "fresh"
## [12,] "" "food" "eat" "hot" "need"
## [13,] "" "town" "pleasant" "service" "park"
## [14,] "" "weekend" "visit" "helpful" "dont"
## [15,] "" "near" "several" "free" "found"
## [16,] "" "price" "beautiful" "get" "minutes"
## [17,] "" "recently" "minute" "easy" "morning"
## [18,] "" "spacious" "nearby" "front" "within"
## [19,] "" "asked" "travel" "trip" "coffee"
## [20,] "" "come" "loved" "always" "hilton"
I found it hard to assign themes that differentiate the clusters. Perhaps I need to do a better job filtering words, play around more with the clustering parameters, or check out algorithms for topic models. Nevertheless, this was a fun exercise.
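One common way to play with the number of clusters is to compare the total within-cluster sum of squares across several values of k and look for an "elbow". The sketch below does this on a made-up numeric matrix that stands in for t(dtm.m). The toy data and the names `x`, `ks` and `wss` are assumptions for illustration, and I have not run this on the review data.

```r
set.seed(1234)

# Hypothetical toy matrix: 60 points around three centers (stands in for t(dtm.m))
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# total within-cluster sum of squares for k = 1..6
# nstart = 10 reruns k-means from several random starts and keeps the best fit
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

data.frame(k = ks, wss = round(wss, 1))
```

On this toy data the sum of squares drops sharply up to k = 3 and flattens after that, which matches the three groups used to generate it. On a real document-term matrix the elbow is often much less clear.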
All analysis was done in RStudio 0.98.994.
sessionInfo()
## R version 3.0.2 (2013-09-25)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] topicmodels_0.2-0 scales_0.2.3 tm_0.5-10 ggplot2_0.9.3.1
## [5] lubridate_1.3.3 dplyr_0.2
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 colorspace_1.2-4 dichromat_2.0-0
## [4] digest_0.6.4 evaluate_0.5.5 formatR_0.10
## [7] grid_3.0.2 gtable_0.1.2 htmltools_0.2.4
## [10] knitr_1.6 labeling_0.2 magrittr_1.0.1
## [13] MASS_7.3-29 memoise_0.1 modeltools_0.2-21
## [16] munsell_0.4.2 parallel_3.0.2 plyr_1.8
## [19] proto_0.3-10 RColorBrewer_1.0-5 Rcpp_0.11.1
## [22] reshape2_1.2.2 rmarkdown_0.2.54 slam_0.1-32
## [25] stats4_3.0.2 stringr_0.6.2 tools_3.0.2
## [28] yaml_2.1.11