
Shiny Web App to Display Databricks Data

Oct 14, 2021

Nicolas Bailly

In a previous article (link), we saw how to create an image containing the Databricks ODBC driver.
In this article, we will use that work to deploy a Shiny application that displays a dashboard with data coming from Databricks.

Retrieving an Azure AD token

Since the previous article showed how to create an image with an ODBC connection, we can now deploy on it an application that connects to Databricks. To do so, we need to generate an Azure AD token to authenticate against Databricks, using a Service Principal (SPN) that has the required permissions on Databricks.
The "AzureAuth" library lets us generate a token:

library(AzureAuth)

databricksResource="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"
accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), 
   Sys.getenv(c("DATABRICKS_CLIENT_ID")), password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")),
   auth_type="client_credentials")

In this example, the App Registration parameters (client id, client secret and tenant) are provided through environment variables.
Note that the constant "databricksResource" contains the scope corresponding to Databricks.
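
For local testing outside the container, these variables can be set directly in the R session before requesting the token. This is only a sketch; the values below are placeholders, not values from the article:

Sys.setenv(
  DATABRICKS_TENANT        = "00000000-0000-0000-0000-000000000000", # placeholder tenant id
  DATABRICKS_CLIENT_ID     = "<service-principal-client-id>",        # placeholder
  DATABRICKS_CLIENT_SECRET = "<service-principal-secret>"            # placeholder
)

In the deployed container, these values are provided by the environment instead (see the "Environment variables" section below).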

ODBC connection

Once we have the token, we can build the ODBC connection:

con <- dbConnect(odbc(), "Databricks_Cluster", 
  Auth_AccessToken=accessToken$credentials$access_token, 
  httpPath=Sys.getenv(c("DATABRICKS_HTTP_PATH")), host =Sys.getenv(c("DATABRICKS_HOST")))

df <- dbGetQuery(con, "SELECT * FROM default.solar")
dbDisconnect(con)

This ODBC connection relies on what was set up in the previous article, namely the contents of the odbc.ini file, plus the parameters passed here: the token, the Databricks server and the cluster's "HTTP_PATH".
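
Before attempting the connection, it can be useful to check that the DSN declared in odbc.ini is visible from R. A quick sanity check, not part of the original application, using the odbc package:

library(odbc)

# Lists the data sources known to unixODBC; "Databricks_Cluster" should appear
# here if the odbc.ini file from the previous article is in place.
odbcListDataSources()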

Starting the Databricks cluster

If your cluster is stopped when the query is sent, you will get a timeout: the cluster does start, but not fast enough for the query to complete in time.
To avoid this, we can call the Databricks API to check the cluster's state and start it if needed. Calling this API requires a token, which we generate the same way as for the ODBC query.

library(httr)
library(jsonlite)

databricksUrl=paste("https://", Sys.getenv(c("DATABRICKS_HOST")), "/api/2.0/", sep="")

clusterState <- function() {
  
  accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), Sys.getenv(c("DATABRICKS_CLIENT_ID")),
                        password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")), auth_type="client_credentials")
  authorizationHeader <- paste("Bearer ", accessToken$credentials$access_token, sep="")

  response<-GET(paste(databricksUrl, "clusters/get?cluster_id=", Sys.getenv(c("DATABRICKS_CLUSTER_ID")), sep=""), 
            encode = "json", 
            add_headers(Authorization = authorizationHeader))

  getClusterJsonResponse<-fromJSON(content(response, as = "text"))
  return(getClusterJsonResponse$state)
}

Here we use the httr library to make REST calls and the jsonlite library to parse the JSON.
This function returns the cluster's status. The statuses we are interested in are "RUNNING" when the cluster is up and "TERMINATED" when it is stopped.
When the cluster is stopped, we can start it with the following function:

startCluster <- function() {
  
  accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), Sys.getenv(c("DATABRICKS_CLIENT_ID")),
                        password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")), auth_type="client_credentials")
  authorizationHeader <- paste("Bearer ", accessToken$credentials$access_token, sep="")

  response<-POST(paste(databricksUrl, "clusters/start", sep=""), 
            encode = "json", 
            add_headers(Authorization = authorizationHeader),
            body = list(cluster_id=Sys.getenv(c("DATABRICKS_CLUSTER_ID"))))
}
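
Starting a cluster takes a few minutes, so a possible improvement, not in the original code, is to poll clusterState() after calling startCluster() until it returns "RUNNING". A minimal sketch, with arbitrary timeout and polling interval:

# Hypothetical helper: waits for the cluster to reach "RUNNING" after a start
# request. Returns TRUE if the cluster is running before the timeout expires.
waitForCluster <- function(timeoutSeconds = 600, intervalSeconds = 15) {
  startTime <- Sys.time()
  while (as.numeric(difftime(Sys.time(), startTime, units = "secs")) < timeoutSeconds) {
    if (clusterState() == "RUNNING") {
      return(TRUE)
    }
    Sys.sleep(intervalSeconds)
  }
  FALSE
}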

Here is the complete code for this example, which displays the result in a table:

library(ggplot2)
library(shiny)
library(odbc)
library(AzureAuth)
library(httr)
library(jsonlite)
library(shinyjs)

databricksUrl=paste("https://", Sys.getenv(c("DATABRICKS_HOST")), "/api/2.0/", sep="")
databricksResource="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d" # Constant that represents the Databricks resource in Azure AD

clusterState <- function() {
  
  accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), Sys.getenv(c("DATABRICKS_CLIENT_ID")),
                        password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")), auth_type="client_credentials")
  authorizationHeader <- paste("Bearer ", accessToken$credentials$access_token, sep="")

  response<-GET(paste(databricksUrl, "clusters/get?cluster_id=", Sys.getenv(c("DATABRICKS_CLUSTER_ID")), sep=""), 
            encode = "json", 
            add_headers(Authorization = authorizationHeader))

  getClusterJsonResponse<-fromJSON(content(response, as = "text"))
  return(getClusterJsonResponse$state)
}

startCluster <- function() {
  
  accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), Sys.getenv(c("DATABRICKS_CLIENT_ID")),
                        password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")), auth_type="client_credentials")
  authorizationHeader <- paste("Bearer ", accessToken$credentials$access_token, sep="")

  response<-POST(paste(databricksUrl, "clusters/start", sep=""), 
            encode = "json", 
            add_headers(Authorization = authorizationHeader),
            body = list(cluster_id=Sys.getenv(c("DATABRICKS_CLUSTER_ID"))))
}

dataframe <- function() {
  accessToken <- get_azure_token(databricksResource, Sys.getenv(c("DATABRICKS_TENANT")), Sys.getenv(c("DATABRICKS_CLIENT_ID")),
                  password=Sys.getenv(c("DATABRICKS_CLIENT_SECRET")), auth_type="client_credentials")

  # Connexion
  con <- dbConnect(odbc(), "Databricks_Cluster", Auth_AccessToken=accessToken$credentials$access_token, httpPath=Sys.getenv(c("DATABRICKS_HTTP_PATH")), host =Sys.getenv(c("DATABRICKS_HOST")))

  df <- dbGetQuery(con, "SELECT * FROM default.solar")
  dbDisconnect(con)
  # Data
  return(df)
}

# R Shiny App
ui = shiny::fluidPage(
  useShinyjs(),
  verbatimTextOutput("verb"),
  shiny::fluidRow(shiny::column(12, dataTableOutput('table'))),
  hidden(actionButton("refresh", "Refresh Data")),
  hidden(actionButton("start", "Start Cluster"))
  )
  
server = function(input, output) {

    values <- reactiveValues(df_data = NULL, state = "")

    state <- clusterState()
    values$state <- paste("Cluster state: ", state, sep="")

    if (state == "RUNNING")
    {
      show("refresh")
      values$df_data <- dataframe()
    }

    if (state == "TERMINATED")
      show("start")

    observeEvent(input$refresh, {
        values$state <- paste("Cluster state:", clusterState(), sep="")

        # Data
        values$df_data <- dataframe()
    })

    observeEvent(input$start, {
      startCluster()
      state <- clusterState()
      values$state <- paste("Cluster state: ", state, sep="")

      if (state == "RUNNING")
      {
        show("refresh")
      }
      else
      {
        hide("refresh")
      }

      if (state == "TERMINATED")
      {
        show("start")
      }
      else
      {
        hide("start")
      }
    })

    output$table <- renderDataTable({values$df_data})
    output$verb <- renderText({values$state})
  }


shinyApp(ui, server)

Dockerfile

Now that we have a working application, we can build the image to deploy. Working with Shiny is a bit particular because we first need to install a Shiny Server. We will therefore proceed in two steps:

  • Create a base image on which we install Shiny Server and the ODBC libraries
  • Create an image from this base image on which we install our application

This way, if we have several Shiny applications to deploy, we can reuse the base image instead of reinstalling Shiny Server every time. Here is the Dockerfile of the base image:

FROM rocker/shiny:latest

RUN apt-get update

RUN apt-get install -y --no-install-recommends \
  libpq-dev \
  libxml2-dev \
  libssl-dev \
  libcurl4-openssl-dev \
  nano \
  curl \
  unixodbc \
  unixodbc-dev

### INSTALL databricks ODBC package
RUN curl https://databricks-bi-artifacts.s3.us-east-2.amazonaws.com/simbaspark-drivers/odbc/2.6.17/SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip -o SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip && \
    unzip SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip -d tmp
RUN gdebi -n  tmp/SimbaSparkODBC-2.6.17.0024-Debian-64bit/simbaspark_2.6.17.0024-2_amd64.deb
RUN rm -r tmp/*
RUN rm SimbaSparkODBC-2.6.17.0024-Debian-64bit.zip

### CREATE ODBC.INI file
RUN echo "[ODBC Data Sources]" >> /etc/odbc.ini && 
    echo "Databricks_Cluster = Simba Spark ODBC Driver" >> /etc/odbc.ini && 
    echo "" >> /etc/odbc.ini && 
    echo "[Databricks_Cluster]" >> /etc/odbc.ini && 
    echo "Driver          = /opt/simba/spark/lib/64/libsparkodbc_sb64.so" >> /etc/odbc.ini && 
    echo "Description     = Simba Spark ODBC Driver DSN" >> /etc/odbc.ini && 
    echo "HOST            = " >> /etc/odbc.ini && 
    echo "PORT            = 443" >> /etc/odbc.ini && 
    echo "Schema          = default" >> /etc/odbc.ini && 
    echo "SparkServerType = 3" >> /etc/odbc.ini && 
    echo "AuthMech        = 11" >> /etc/odbc.ini && 
    echo "Auth_Flow       = 0" >> /etc/odbc.ini && 
    echo "ThriftTransport = 2" >> /etc/odbc.ini && 
    echo "SSL             = 1" >> /etc/odbc.ini && 
    echo "HTTPPath        = " >> /etc/odbc.ini && 
    echo "UseProxy        = 1" >> /etc/odbc.ini &&  
    echo "ProxyHost       = $PROXY_HOST" >> /etc/odbc.ini &&  
    echo "ProxyPort       = $PROXY_PORT" >> /etc/odbc.ini &&  
    echo "" >> /etc/odbc.ini && 
    echo "[ODBC Drivers]" >> /etc/odbcinst.ini && 
    echo "Simba = Installed" >> /etc/odbcinst.ini && 
    echo "[Simba Spark ODBC Driver 64-bit]" >> /etc/odbcinst.ini && 
    echo "Driver = /opt/simba/spark/lib/64/libsparkodbc_sb64.so" >> /etc/odbcinst.ini && 
    echo "" >> /etc/odbcinst.ini

#https://github.com/CSCfi/shiny-openshift/blob/master/Dockerfile
COPY shiny-server.conf /etc/shiny-server/shiny-server.conf
RUN chown -R shiny /var/lib/shiny-server/

# OpenShift gives a random uid for the user and some programs try to find a username from the /etc/passwd.
# Let user to fix it, but obviously this shouldn't be run outside OpenShift
RUN chmod ug+rw /etc/passwd 
COPY fix-username.sh /fix-username.sh
COPY shiny-server.sh /usr/bin/shiny-server.sh
RUN chmod a+rx /usr/bin/shiny-server.sh

# Make sure the directory for individual app logs exists and is usable
RUN chmod -R a+rwX /var/log/shiny-server
RUN chmod -R a+rwX /var/lib/shiny-server

# Add environment variables for Shiny
RUN env | grep HTTP_PROXY >> /usr/local/lib/R/etc/Renviron && \
    env | grep HTTPS_PROXY >> /usr/local/lib/R/etc/Renviron && \
    chown shiny.shiny /usr/local/lib/R/etc/Renviron && \
    chmod a+rw /usr/local/lib/R/etc/Renviron

ENTRYPOINT /usr/bin/shiny-server.sh

You can see in this Dockerfile that we reuse what was covered in the previous article to configure ODBC.
And here is the Dockerfile based on that image, "shinyserver-odbc", on which we deploy our application:

FROM shinyserver-odbc:latest

### Install R
RUN install2.r -e shinydashboard \
 DBI \
 odbc \
 RPostgreSQL \
 jsonlite \
 dplyr \
 magrittr \
 dbplyr \
 stringr \
 tidyr \
 DT \
 ggplot2 \
 shinyjs \
 scales \
 plotly \
 shinyBS \
 lubridate \
 shinyWidgets \
 rmarkdown \
 shiny \
 httr \
 AzureAuth

# copy the app directory into the image
COPY . /srv/shiny-server/

# make application writable to test updates
RUN chown -R shiny:shiny /srv/shiny-server/
RUN chmod -R a+rw /srv/shiny-server

Environment variables

A particularity of Shiny is that it does not read the system environment variables. It has its own file containing its variables: /usr/local/lib/R/etc/Renviron
Our example Shiny application reads several environment variables. For this to work, I recommend modifying the shiny-server.sh file to append these variables when the application starts, like this:

env | grep DATABRICKS_HOST >> /usr/local/lib/R/etc/Renviron && 
env | grep DATABRICKS_HTTP_PATH >> /usr/local/lib/R/etc/Renviron && 
env | grep DATABRICKS_CLUSTER_ID >> /usr/local/lib/R/etc/Renviron && 
env | grep DATABRICKS_TENANT >> /usr/local/lib/R/etc/Renviron && 
env | grep DATABRICKS_CLIENT_SECRET >> /usr/local/lib/R/etc/Renviron && 
env | grep DATABRICKS_CLIENT_ID >> /usr/local/lib/R/etc/Renviron

Complete source code

You now have a working example application; its complete source code is available on our GitHub: dcube/ShinyServer (github.com)

