Tableau + R

In this guide we’ll explore how to easily set up your own R Server and get it connected to Tableau. Then we’ll step through a few exercises highlighting some of my favorite R packages for use within Tableau

Introduction

Let’s face it, Tableau’s out of the box “Advanced Analytics” offering is quite rudimentary.  Having worked with many clients spanning most industries, one thing I’ve come to learn is that the trend line is about as far as most folks take it. There’s nothing advanced about that.

Tableau is in the business of Data Visualization, not Advanced Analytics. They understood their particular shortcoming and made the decision to open the doors for R integration (as of Tableau 8.1).  This provided users the opportunity to pass their more Advanced Analytics off to a service whose sole purpose was to render predictive analytics.

Today, we’ll work on setting up our very own R Server in the cloud and work through some fun use cases that demonstrate a few of the most powerful R packages available for use within Tableau.

Before we dive in, here’s a brief summary to demonstrate how we go from Tableau -> R Server -> Tableau:

Tableau allows you to embed R code right into a traditional Calculated Field.  This field will be communicating externally (external service) as the code within the calc field contains a SCRIPT_*() function. The fields within your dataset and the accompanying data are referenced within this SCRIPT function.

When this calc field gets executed in Tableau (added anywhere on your sheet canvas), both the R code and associated data are sent to an Rserve session.  At this point, all the magic is happening external to Tableau and running on your R Server.

The R code will produce a single vector and pass it back from the external R server to Tableau.  The resulting vector is then displayed using a Tableau Table Calculation (the only option) as it’s vessel for communicating back your results, visually.

Part 1: Set Up R Server

Step-By-Step Configuration of a brand new R Server. Everything covered in the video is also detailed out below.

  • Ubuntu Set Up
  • Adding R Repositories
  • Installing R-Base & Rserve
  • Enabling Remote Access
  • Connecting Tableau

Step 1 – Set Up Ubuntu Server 16.04 64bit

In order to complete this tutorial, you will need access to an Ubuntu 16.04 server.

We will be completing the steps in this guide using a cloud hosted Ubuntu Server as this will enable us to access our R Server from any machine, anytime, anywhere.  The least expensive self-managed option is fine (usually around $6.99/month)

Once you’ve signed up for your server, it should take around 10 minutes to provision.  Let’s move on to the next step to log into our server using SSH.

Step 2a – SSH onto your server (Mac or Windows)

Mac Instructions:

Open the terminal app:

  • Enter the root password from the Additional Information tab

Windows Instructions:

  • Download and install Putty
  • Open Putty, ensure SSH is selected, enter your server IP address to connect
  • Type in the user root
  • Enter the root password from the Additional Information tab

Step 2b – Change your root password + install updates

Once logged in as the root user, change your password to something you will easily remember:

passwd

Now perform an update:

sudo apt-get update

Step 3 – Add external Ubuntu repository

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 

sudo add-apt-repository 'deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu xenial/'

Step 4 – Run an update to grab the latest version of available packages (including R)

sudo apt-get update

sudo apt-get install libcurl4-openssl-dev

Step 5 – Install and Launch R

sudo apt-get install r-base

sudo -i R

Step 6 – Install Rserve

install.packages('Rserve')

Step 7 – Exit R to enable remote access

quit()

R CMD Rserve --RS-enable-remote

Troubleshooting

If you restart your Ubuntu environment, you’ll need to reinstall Rserve and re-enable remote access. Simply execute step 6 and step 7 above again.  All of your R packages will stay intact.

AT THIS POINT YOU SHOULD BE ABLE TO CONNECT TO YOUR R SERVER USING TABLEAU

Part 2: Hands-On Exercises

Step-By-Step Exercises Demonstrating the Power of R & Tableau

  • Time-Series Forecasting
  • Clustering
  • Sentiment Analysis

We’ll be using this Tableau Workbook throughout all exercises. It already contains a copy of the data and code we will be using.  Please download and open this file in Tableau Desktop 2018+

Exercise 1a – Time-Series Analysis – Profit Forecast

In this first exercise, we’ll be forecasting out what the future potentially could look like given historical performance. R forecasting is much more accurate than Tableau’s built in forecasting functionality as you’ll soon see.

Prerequisites: 
Before we begin, we’ll need to install the R package, forecast, which is what we’ll be using in this exercise.
Navigate back to your R server and if you aren’t inside of the R environment you can enter it by typing:

sudo -i R

Once you’ve confirmed you’re in R, install the forecast package by executing the line of code below:

install.packages(‘forecast’)

Tableau Calculations (code also included in workbook): 
Below you’ll find the R syntax we’ll need to place inside of two new Tableau calculated fields.

DateForecast

DATE(IF [Order Date] == { MAX([Order Date]) }
THEN DATEADD('year', 1, [Order Date])
ELSE [Order Date]
END)

ProfitForecast

SCRIPT_REAL("
library(forecast)
inputData = na.omit(.arg1)
startDate = as.Date(min(na.omit(.arg2)))
timeSeries = ts(inputData,
start = startDate,
deltat = 1/52)
timeSeriesForecast = forecast(timeSeries,
h = length(.arg1) - length(inputData),
level = 95)
append(inputData,
timeSeriesForecast$mean)
",
AVG([Profit]),
MAX([DateForecast]))

Exercise 1b – Time-Series Analysis – Temperature Forecast

In this exercise, we’ll be forecasting out what future temperatures could look like given historical weekly temperatures over time.

Prerequisites: (skip this prerequisite if you’ve already installed forecast):
Before we begin, we’ll need to install the R package, forecast, which is what we’ll be using in this exercise.
Navigate back to your R server and if you aren’t inside of the R environment you can enter it by typing:

sudo -i R

Once you’ve confirmed you’re in R, install the forecast package by executing the line of code below:

install.packages(‘forecast’)

Tableau Calculations (code also included in workbook): 
Below you’ll find the R syntax we’ll need to place inside of two new Tableau calculated fields.

DateForecast

DATE(IF [Date] == { MAX([Date]) }
THEN DATEADD('year', 1, [Date])
ELSE [Date]
END)

TempForecast

SCRIPT_REAL("
library(forecast)
inputData = na.omit(.arg1)
startDate = as.Date(min(na.omit(.arg2)))
timeSeries = ts(inputData,
start = startDate,
deltat = 1/52)
timeSeriesForecast = forecast(timeSeries,
h = length(.arg1) - length(inputData),
level = 95)
append(inputData,
timeSeriesForecast$mean)
",
AVG([Temperature (Farenheit)]),
MAX([DateForecast]))

Exercise 2 – Clustering – Criminal Activity

In this exercise, we’ll be clustering data to easily call out hotspots within our visualization using the R package dbscan. This particular clustering package is awesome because it doesn’t require the end user to specify the number of clusters and automatically renders a resulting set of clusters based on data and point proximity. This particular dataset contains Seattle, WA police activity related to the exact time of arrival to a crime scene. We use the scene arrival time to provide us with the minimum number of points to scan and automatically build clusters, one value per crime type. For that we’re looking at the number of similar crimes (i.e. crimes of the same type) normalized by the total time frame.

Prerequisites: 
Before we begin, we’ll need to install the R package, dbscan, which is what we’ll be using in this exercise.
Navigate back to your R server and if you aren’t inside of the R environment you can enter it by typing:

sudo -i R

Once you’ve confirmed you’re in R, install the dbscan package by executing the line of code below:

install.packages(‘dbscan’)

Tableau Calculations (code also included in workbook): 
Below you’ll find the R syntax we’ll need to place inside of two new Tableau calculated fields.

IncidentCount

5*(DATEDIFF('day',{fixed [Initial Type Group] : MIN([At Scene Time])}, {fixed [Initial Type Group] : MAX([At Scene Time])}))/90

Hotspot?

SCRIPT_STR("
library(dbscan)

data <- cbind((.arg1 * pi) / 180, (.arg2 * pi) / 180)

db <- dbscan(data, eps = 1/29590, minPts = .arg3[1])$cluster db[db > 0] <- 'Yes'
db[db == 0] <- 'No'

db
",
AVG([Latitude]), 
AVG([Longitude]), 
AVG([Incident Count]))

Exercise 3a – Sentiment Analysis (words) – Spotify

In this exercise, we'll be scanning rows of data containing lyrics from the Top 100 Spotify song list from 2017.  We'll use the R package tidytext to scan and assign a polarity score for every word of every line of every song.  This will allow us to see the overall polarity of every song within the Top 100.  We'll be able to easily rank from the most positive song down to the most negative.

Prerequisites: 
Before we begin, we'll need to install the R package, tidytext and dplyr which is what we'll be using in this exercise.
Navigate back to your R server and if you aren't inside of the R environment you can enter it by typing:

sudo -i R

Once you've confirmed you're in R, install the tidytext and dplyr package by executing the lines of code below:

install.packages(‘tidytext’)

install.packages(‘dplyr’)

Tableau Calculations (code also included in workbook): 
Below you'll find the R syntax we'll need to place inside of two new Tableau calculated fields.

Polarity

WINDOW_SUM(
	SCRIPT_REAL("
	library(tidytext)
	library(dplyr)

	df <- data.frame(word = .arg1) %>%
		left_join(get_sentiments('afinn'),
                          by='word')

	df$score
	",
	MIN([Word])
	)
)

Exercise 3b – Sentiment Analysis (sentences) – Airbnb

In this exercise, we'll be scanning rows of data containing reviews from Airbnb properties.  We'll use the R package syuzhet to scan and assign a positive or negative sentiment for every review.  This will allow us to quickly scan paragraphs of text to assess whether or not a review is positive or negative.

Prerequisites: 
Before we begin, we'll need to install the R package, syuzhet.
Navigate back to your R server and if you aren't inside of the R environment you can enter it by typing:

sudo -i R

Once you've confirmed you're in R, install the syuzhet package & plyr by executing the lines of code below:

install.packages(‘syuzhet’)

install.packages(‘plyr’)

Tableau Calculations (code also included in workbook): 
Below you'll find the R syntax we'll need to place inside of two new Tableau calculated fields.

Polarity

SCRIPT_STR('library(syuzhet);
library(plyr);
len <- length(.arg1);
result <- numeric(len);
for (i in 1:len){
token <- get_tokens(.arg1[i]);
sentiment <- get_sentiment(token,method = "syuzhet")
sentiment <- sentiment[sentiment!=0]
result[i] <- mean(sentiment)
}
as.character(cut(result,breaks=c(-Inf, 0.3, 0.5, Inf),labels=c("negative","neutral","positive")))',
ATTR([Comments]))
COPYRIGHT © 2019 BUSINESS INTELLIGENCE SOLUTIONS LLC (THIS WEBSITE IS NOT AFFILIATED WITH, SPONSORED BY, OR APPROVED BY TABLEAU - Enfold WordPress Theme by Kriesi