
@oudy525i
Forked from abhioncbr/Apache_Superset.md
Created June 2, 2020 13:27
Apache Superset in the production environment

Visualising data helps build a deeper understanding of it and speeds up analytics. Several mature paid products are available in this space. Recently, I explored an open-source product named Apache Superset, which I found very promising. Some prominent features of Superset are:

  • A rich set of data visualisations
  • An easy-to-use interface for exploring and visualising data
  • Dashboard creation and sharing

After reading about Superset, I wanted to try it. Since Superset is a Python project, it can easily be installed using pip, but I decided to run it as a Docker container instead. The Apache Superset GitHub repo contains code for building and running Superset as a container. However, I wanted to run Superset in a fully distributed manner, and in my opinion the existing code did not easily allow that without modification, so I modified it to run in several different modes. Below is a list of the specific changes and enhancements:

  • Different versions of the Superset image can be built from the same code.
  • The Superset configuration can be edited and mounted into the container, without rebuilding the image.
  • Asynchronous query execution through a Celery-based executor, managed through the Flower UI.

Exploration made easy

Development mode is an excellent choice for exploring a project, but it is even better when the initial exploration covers all the features; in the case of Superset, for instance, running queries in async mode and caching the results. You can explore Superset smoothly with the following commands:

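  • First, pull a docker-superset image from Docker Hub:
```shell
docker pull abhioncbr/docker-superset:<tag>
```
  • Then, from the root of the docker-superset repository checkout, start the stack with docker-compose:
```shell
cd docker-files/ && SUPERSET_ENV=<local | prod> SUPERSET_VERSION=<tag> docker-compose up -d
```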
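The two exploration steps can be wrapped in one small Bash function. This is a minimal sketch, not part of the docker-superset repository: the function name explore_superset and the DRY_RUN switch are hypothetical, and it assumes it is called from the directory that contains docker-files/. It accepts only local or prod for SUPERSET_ENV, the two values named in the command above, and defaults to local as an assumption.

```shell
#!/usr/bin/env bash
# Sketch: the pull + docker-compose steps as one function.
# explore_superset and DRY_RUN are hypothetical names (not from the repo).
# Source this file, then call: explore_superset <tag> [local|prod]

explore_superset() {
  local tag="${1:-}" env="${2:-local}"   # env defaults to "local" (assumption)

  if [[ -z "$tag" ]]; then
    echo "usage: explore_superset <tag> [local|prod]" >&2
    return 1
  fi
  # Reject anything other than the two values named in the article.
  case "$env" in
    local|prod) ;;
    *) echo "SUPERSET_ENV must be 'local' or 'prod', got '$env'" >&2; return 1 ;;
  esac

  # DRY_RUN=1 prints the commands instead of executing them.
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "docker pull abhioncbr/docker-superset:${tag}"
    echo "cd docker-files/ && SUPERSET_ENV=${env} SUPERSET_VERSION=${tag} docker-compose up -d"
    return 0
  fi

  docker pull "abhioncbr/docker-superset:${tag}" || return 1
  # Subshell, so the caller's working directory is left unchanged.
  (cd docker-files/ && SUPERSET_ENV="$env" SUPERSET_VERSION="$tag" docker-compose up -d)
}
```

For example, DRY_RUN=1 explore_superset <tag> prod prints both commands for that tag without running them; without DRY_RUN it executes them.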

Running Superset in a fully distributed mode

In my understanding, a production Superset setup serving thousands of end users should be distributed and easy to scale as requirements grow. The image below depicts such a setup.

[Image: distributed-superset-setup]

The published Superset Docker image can be used to build the setup depicted above:

  • A load balancer in front, routing each client request to one of the server containers.
  • Multiple containers in server mode, serving the Superset UI. A server container can be started with docker run, mounting a local config/ directory (a bare name such as config would be treated by Docker as a named volume rather than a host directory):
```shell
docker run -p 8088:8088 -v "$(pwd)/config:/home/superset/config/" abhioncbr/docker-superset:<tag> cluster server <db_url> <redis_url>
```
  • Multiple containers in worker mode, executing SQL queries asynchronously through the Celery executor. A worker container can be started with docker run as follows:
```shell
docker run -p 5555:5555 -v "$(pwd)/config:/home/superset/config/" abhioncbr/docker-superset:<tag> cluster worker <db_url> <redis_url>
```
  • A centralised Redis container or Redis cluster, serving as the cache layer and as the Celery task queue for the workers.
  • A centralised Superset metadata database.
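On a single host, several server and worker containers can be started from the same image as long as each one publishes a different host port. The sketch below shows one way to do that under stated assumptions: launch_cluster, run_or_echo, DRY_RUN, CONFIG_DIR, the container names, and the port-numbering scheme are all hypothetical; only the cluster server|worker <db_url> <redis_url> arguments and the 8088/5555 container ports come from the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start N server and M worker containers on one host.
# Only "cluster server|worker <db_url> <redis_url>" and the container ports
# come from the article; names, host ports and CONFIG_DIR are assumptions.
# Source this file, then call:
#   launch_cluster <tag> <db_url> <redis_url> [servers] [workers]

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_echo() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

launch_cluster() {
  if [[ $# -lt 3 ]]; then
    echo "usage: launch_cluster <tag> <db_url> <redis_url> [servers] [workers]" >&2
    return 1
  fi
  local tag="$1" db_url="$2" redis_url="$3"
  local servers="${4:-2}" workers="${5:-2}"
  local image="abhioncbr/docker-superset:${tag}"
  # Host directory holding the Superset config; defaults to ./config.
  local config_dir="${CONFIG_DIR:-$PWD/config}"
  local i

  # Server containers: host ports 8088, 8089, ... -> container port 8088.
  for ((i = 0; i < servers; i++)); do
    run_or_echo docker run -d --name "superset-server-$i" \
      -p "$((8088 + i)):8088" -v "${config_dir}:/home/superset/config/" \
      "$image" cluster server "$db_url" "$redis_url" || return 1
  done

  # Worker containers: host ports 5555, 5556, ... -> container port 5555.
  for ((i = 0; i < workers; i++)); do
    run_or_echo docker run -d --name "superset-worker-$i" \
      -p "$((5555 + i)):5555" -v "${config_dir}:/home/superset/config/" \
      "$image" cluster worker "$db_url" "$redis_url" || return 1
  done
}
```

With this layout, the load balancer would forward client requests to host ports 8088, 8089, and so on. In a multi-host deployment, where each host typically runs a single server or worker container, the port offsets would not be needed.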

I found setting up Superset as a Docker container quite easy, and the same setup can be reused across different environments. You can explore Superset in the same way.
