Data dashboards and ML demos

On assuring the client that you are doing something data-sciency because it looks like it does in the movies

March 12, 2020 — September 20, 2023

communicating
computers are awful
data sets
dataviz
faster pussycat
generative art
making things
photon choreography
statistics
UI
workflow
Figure 1: Multimodal data visualization

At the intersection of data visualization and database UI is the data dashboard. AFAICT this means “an exploratory graphing tool for your data which requires little or no specialist knowledge of programming or statistics.” Occasionally useful. Occasionally cargo-culted by bizdev people who don’t know what they are doing. See also, e.g. the open source dashboard framework roundup, or the AlternativeTo listing for Tableau.

Question: why are data dashboards not configured by default, out of the box, to hold out half of the data fed into them? Reserving some data to verify the hypotheses they suggest would make them into principled exploratory data analysis tools.
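A minimal sketch of what such a hold-out could look like, assuming each row has a stable identifier. The names `assign_split`, `split_rows`, the `salt` value and the toy `rows` are all hypothetical, invented for illustration; no dashboard tool I know of ships this. Hashing the id (rather than drawing random numbers) means a row stays in the same half no matter how often the dashboard reloads.

```python
import hashlib


def assign_split(row_id, explore_fraction=0.5, salt="dashboard-v1"):
    """Deterministically assign a row to 'explore' or 'confirm' by hashing its id."""
    digest = hashlib.sha256(f"{salt}:{row_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    return "explore" if u < explore_fraction else "confirm"


def split_rows(rows, id_key="id", explore_fraction=0.5):
    """Partition rows into an exploration set and a held-out confirmation set."""
    explore, confirm = [], []
    for row in rows:
        if assign_split(row[id_key], explore_fraction) == "explore":
            explore.append(row)
        else:
            confirm.append(row)
    return explore, confirm


# Toy data standing in for whatever the dashboard would load.
rows = [{"id": i, "value": i * i} for i in range(1000)]
explore, confirm = split_rows(rows)
print(len(explore), "rows to plot;", len(confirm), "rows held back")
```

The dashboard would only ever plot `explore`; a hypothesis it suggests then gets checked once against `confirm`.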

Here are some options I am auditioning for some clients. They have a lot of overlap; it is hard to identify the USP of each.

1 Dash

Dash is an open-source dashboard framework for R, Python, and Julia. There is also an expensive enterprise version. It is maintained by the creators of Plotly, a classic web dataviz solution.
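For a feel of the Python flavour, here is a hedged sketch of a tiny Dash app: a dropdown that filters a bar chart through a callback. The data in `SAMPLE_ROWS` and the helpers `filter_rows`, `make_figure` and `build_app` are made up for illustration. Dash is imported inside `build_app` so that the plain-Python helpers can be tried without Dash installed.

```python
SAMPLE_ROWS = [
    {"category": "fruit", "label": "apple", "value": 3},
    {"category": "fruit", "label": "pear", "value": 5},
    {"category": "veg", "label": "leek", "value": 2},
]


def filter_rows(rows, category):
    """Keep only the rows whose 'category' field matches."""
    return [r for r in rows if r["category"] == category]


def make_figure(rows, category):
    """Build a Plotly figure, as a plain dict, for one category."""
    selected = filter_rows(rows, category)
    return {
        "data": [{
            "type": "bar",
            "x": [r["label"] for r in selected],
            "y": [r["value"] for r in selected],
        }],
        "layout": {"title": {"text": f"Category: {category}"}},
    }


def build_app(rows):
    """Wire the helpers above into a Dash layout plus one callback."""
    # Imported lazily so the helpers work without Dash installed.
    from dash import Dash, Input, Output, dcc, html

    categories = sorted({r["category"] for r in rows})
    app = Dash(__name__)
    app.layout = html.Div([
        dcc.Dropdown(
            id="category",
            options=[{"label": c, "value": c} for c in categories],
            value=categories[0],
        ),
        dcc.Graph(id="chart"),
    ])

    # Re-render the chart whenever the dropdown value changes.
    @app.callback(Output("chart", "figure"), Input("category", "value"))
    def update_chart(category):
        return make_figure(rows, category)

    return app
```

To serve it locally, something like `build_app(SAMPLE_ROWS).run(debug=True)` should do on recent Dash 2.x releases; older ones spell it `run_server`.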

2 Voilà

A Jupyter-specific option.

Voilà turns Jupyter notebooks into standalone web applications.

  • Voilà supports Jupyter interactive widgets, including the roundtrips to the kernel.
  • Voilà does not permit arbitrary code execution by consumers of dashboards.
  • Built upon Jupyter standard protocols and file formats, voilà works with any Jupyter kernel (C++, Python, Julia), making it a language-agnostic dashboarding system.
  • Voilà is extensible. It includes a flexible template system to produce rich application layouts.

Sounds similar to many other options? Panel explains how these compare.

There is useful stuff in the broader Voilà ecosystem, which is a good sign, e.g.:

Voila-gridstack is a Voilà template started by Bartosz Telenczuk to turn notebooks into dashboards …. The idea behind it is to be able to change the layout of the cells to re-configure your dashboards using drag-and-drop. Once you have your desired layout, its configuration stays in the metadata of the notebook. This makes it simple to carry around or share the notebook and its layout configuration.

3 Panel

Panel is the dashboard part of holoviz.

Panel is an open-source Python library that lets you create custom interactive web apps and dashboards by connecting user-defined widgets to plots, images, tables, or text.

Compared to other approaches, Panel is novel in that it supports nearly all plotting libraries, works just as well in a Jupyter notebook as on a standalone secure web server, uses the same code for both those cases, supports both Python-backed and static HTML/JavaScript exported applications, and can be used to develop rich interactive applications without tying your domain-specific code to any particular GUI or web tools.

Panel makes it simple to make:

  • Plots with user-defined controls
  • Property sheets for editing parameters of objects in a workflow
  • Control panels for simulations or experiments
  • Custom data-exploration tools
  • Dashboards reporting key performance indicators (KPIs) and trends
  • Data-rich Python-backed web servers
  • and anything in between

Panel objects are reactive, immediately updating to reflect changes to their state, which makes it simple to compose viewable objects and link them into simple, one-off apps to do a specific exploratory task. The same objects can then be reused in more complex combinations to build more ambitious apps, while always sharing the same code that works well on its own.

Panel lets you move the same code freely between an interactive Jupyter Notebook prompt and a fully deployable standalone server. That way you can easily switch between exploring your data, building visualizations, adding custom interactivity, sharing with non-technical users, and back again at any point, using the same tools and the same code throughout. Panel thus helps support your entire workflow, so that you never have to commit to only one way of using your data and your analyses, and don’t have to rewrite your code just to make it usable in a different way. In many cases, using Panel can turn projects that used to take weeks or months into something you finish on the same day you started, creating a full Python-backed deployed web service for your visualized data in minutes or hours without having to run a software development project or hand your work over to another team.
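To make the "reactive" and "same code in notebook and server" claims concrete, here is a small sketch under my own assumptions: a slider bound to a smoothing function via `pn.bind`. The helper names `moving_average` and `build_panel` and the text-only output are hypothetical choices for illustration; a real app would more likely bind a plot. Panel is imported inside `build_panel` so that the numeric helper runs without it.

```python
def moving_average(values, window):
    """Trailing moving average; early points use however many values exist."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def build_panel(values):
    """Return a Panel layout: a slider plus a view that re-renders on change."""
    # Imported lazily so moving_average works without Panel installed.
    import panel as pn

    pn.extension()
    window = pn.widgets.IntSlider(name="Window", start=1, end=len(values), value=3)

    def view(w):
        smoothed = moving_average(values, w)
        return pn.pane.Markdown(", ".join(f"{v:.2f}" for v in smoothed))

    # pn.bind calls view again whenever the slider value changes.
    return pn.Column(window, pn.bind(view, window))
```

In a notebook, displaying `build_panel([1, 2, 3, 4, 5, 6])` as the last expression of a cell should render the widget inline. For a standalone server, call `.servable()` on the same object in a script and launch it with `panel serve`.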

This sounds similar to the other ones, no? Marc Skov Madsen argues the following points of difference:

I am a Python-based data scientist and prefer Panel because I believe it is the most Pythonic, flexible and powerful data app framework out there.[…]

  • I can explore and develop efficiently using VS Code
  • Most of my colleagues do not work in VS Code. Instead, they work in Jupyter Notebooks, Jupyter Labs, Jupyter Hub, Spyder, and PyCharm. Panel works great in all these environments…
  • We can use Panel for interactive data exploration (Dash and Streamlit are primarily useful for building data apps.)
  • We can build highly performant and snappy data apps with Panel
  • You can use the plotting libraries you know and love

etc

4 streamlit

Streamlit. The Streamlit Gallery speaks volumes.

Streamlit is an open-source Python library that makes it easy to create and share beautiful, custom web apps for machine learning and data science.

In just a few minutes, you can build and deploy powerful data apps to:

  • Explore your data
  • Interact with your model
  • Analyse your model behaviour and input sensitivity
  • Showcase your prototype with awesome web apps

Moreover, Streamlit enables interactive development with automatic rerun on file changes.

5 Mercury

Mercury allows you to add interactive widgets in Python notebooks, so you can share notebooks as web applications. Forget about rewriting notebooks to web frameworks just to share your results. Mercury offers a set of widgets with simple re-execution of cells.

It is implemented as a specialised jupyter front-end.

6 Grafana

Grafana seems to specialize in time series analytics.

From heatmaps to histograms. Graphs to geomaps. Grafana has a plethora of visualization options to help you understand your data, beautifully. […] Bring your data together to get better context. Grafana supports dozens of databases, natively. Mix them together in the same Dashboard.

7 R Shiny

R can do lots of data analysis, including database analytics as a special case. If you want it to be web-based, shiny can put many queries/regressions/etc online, with all the statistical modelling power of R. This still favours statisticians framing the actual analysis, but given how bad we are at statistics as a species, it might be considered a feature, not a bug, that it requires you to clear the low bar of understanding simple statistical software.

8 Glamorous Toolkit

Glamorous Toolkit

Glamorous Toolkit is the moldable development environment. It is a live notebook. It is a flexible search interface. It is a fancy code editor. It is a software analysis platform. It is a data visualization engine. All in one. And it is free and open-source under an MIT license.

9 Tableau

Tableau provides commercially-supported dashboards. An interesting example of this is the Mapping Police Violence project. It highlights both the insight you can get from this kind of visualization (Wow, they kill a lot of people in the USA!) and also the dangerous limitation of these dashboards, i.e. straight-up graphs of reported data do not solve difficult statistical modelling problems such as accounting for sampling bias, although they might give the impression that the problem is solved.

10 Microsoft PowerBI

11 superset

Originally developed at Airbnb, now an Apache project.

tl;dr autogenerates dashboards based on your database, making it look like you have been doing something other than just collecting random crap.

Apache Superset is a data exploration and visualization web application.

Superset provides:

  • A wide array of beautiful visualizations to showcase your data.
  • A state of the art SQL editor/IDE exposing a rich metadata browser, and an easy workflow to create visualizations out of any result set.
  • Out of the box support for most SQL-speaking databases
  • [other keywords that only boring bizdev types care about and no one real ever needs]

12 blazer

blazer is a dashboarding/interactive query UI.

features:

  • Multiple data sources - PostgreSQL, MySQL, Redshift, Mongodb…
  • Variables - run the same queries with different values
  • Checks & alerts - get emailed when bad data appears
  • Audits - all queries are tracked
  • Security - works with your authentication system

13 Metabase

Metabase. Shanker Sneh, about whom I know nothing, says:

Good:

  1. Robust and clearly laid out framework. Supports proper database for application metadata.
  2. Feature-rich with easy user, query, segment & dashboard management & classification.
  3. Supports Google SSO, Slack, Email integration.

Not-so-good:

  1. Framework is Java based. Any customization will require dev activities from our end.

14 Database flow

database flow

Database Flow is an open source self-hosted SQL client, GraphQL server, and charting application that works with your database.

Visualize schemas, query plans, charts, and results.

Java app.

15 Gradio

Gradio is specifically for Python ML apps.
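A hedged sketch of the usual pattern: wrap a prediction function in a `gr.Interface`. The "model" here, `predict_sentiment`, is a toy word counter I made up as a stand-in for a real one, and the word lists and `build_demo` are likewise hypothetical. Gradio is imported inside `build_demo` so that the function can be tested without it.

```python
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def predict_sentiment(text):
    """Toy stand-in for a model: score text by counting sentiment words."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    if total == 0:
        # No evidence either way.
        return {"positive": 0.5, "negative": 0.5}
    return {"positive": pos / total, "negative": neg / total}


def build_demo():
    """Wrap the function in a web UI: a text box in, label confidences out."""
    # Imported lazily so predict_sentiment works without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=predict_sentiment,
        inputs=gr.Textbox(label="Review"),
        outputs=gr.Label(label="Sentiment"),
    )
```

Calling `build_demo().launch()` should start a local web server with the demo.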

16 Incoming