Exploring In-Browser Data Analysis Solutions

This is a subjective review of the tools.

Table of Contents

Why?
ChatGPT Code Interpreter
Observable
Datasette
Apache Superset
Concord CODAP
Pretzel
Conclusion

Why?

https://llm-biases.teddysc.me/

I'm working with researchers in AI / HCI / education from NCSU and Northwestern, and we're looking for ways to let non-coders / high schoolers explore certain datasets to understand LLM biases.

Example datasets include:

The Kelly paper
- https://tddschn-kelly-datasette.hf.space/
- https://tddschn-biases-llm-reference-letters-datasette.hf.space/evaluated_letters-chatgpt-cbg/all_2_para_w_chatgpt_eval_hallucination_eval

We want to enable them to explore complex data and create meaningful plots on their own, or build something interactive for them to play with in the browser.

I also suggested Tableau and PowerBI, but they only want web-based solutions.

ChatGPT Code Interpreter

https://g.teddysc.me/22990876690398bd42cfaa4be87a546e

It's very intuitive: just upload a CSV file and start prompting.

Observable

Observable was the first thing that came to my mind when I was tasked with this - it's insanely powerful and interactive, and Observable Plot is much easier to use than d3.js.

ObservableHQ.com and the Observable Framework have many things in common, and I'm talking about the common part.

It's fun to play with and it's certainly very powerful, but after a few days of learning it I still find it kinda hard to get what I want. My Python muscle memory is still stronger.

Datasette

I love it and have deployed it everywhere.

Its plotting features are limited, so it's not a good fit for this task.

Apache Superset

Open source, lots of stars on GitHub, just like every other software from ASF. A no-code thing.

Documentation is lacking. I got stuck trying to get a step in the tutorial to work, and I couldn't find solutions online.

Learning curve is kinda steep, and I'd rate the user friendliness as 3/10.

Some HN readers had a great experience with it, so maybe I'd appreciate the tool a lot more if I knew how to do certain things. But sadly I'm not there yet, and the lack of resources makes it hard to get there.

Concord CODAP

Source available. Very few people know this one and neither did I before this. Few GitHub stars, very very small community, and poor documentation.

But it is very neat. It's drag-n-drop oriented and intuitive, and you can even stack windows on top of each other.

Highlighting records in the table also highlights the corresponding data points in the plot, which is awesome.

Plots support x, y, hue, and groups, plus an avg / median line feature. Very neat.
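To make the avg / median line concrete: for each group, such a line marks the mean or median of the plotted values. Here is a rough sketch of that calculation in Python. The records and the field names (`group`, `value`) are made up for illustration, and this only shows the statistic itself; it is not CODAP's code.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical records: one categorical field and one numeric field.
records = [
    {"group": "A", "value": 1.0},
    {"group": "A", "value": 2.0},
    {"group": "A", "value": 6.0},
    {"group": "B", "value": 4.0},
    {"group": "B", "value": 8.0},
]

# Collect the numeric values belonging to each group.
values_by_group = defaultdict(list)
for record in records:
    values_by_group[record["group"]].append(record["value"])

# One mean and one median per group.
summary = {
    group: {"mean": mean(values), "median": median(values)}
    for group, values in values_by_group.items()
}

for group, stats in sorted(summary.items()):
    print(group, stats)
```

The gap between mean and median in a skewed group (group A here) is the kind of thing the two lines make visible at a glance.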

You can try the example datasets and guided tours here: https://codap.concord.org/app/static/dg/en/cert/index.html

The researchers want me to make a CODAP plugin to highlight the biased parts of the text data in the datasets, but I don't think that's a good idea:
Looking at CODAP's Issues page, the maintainers don't seem to care about making it easier for people to contribute code or build on it.
When I tried to deploy the web app, I ran into issues because the process isn't properly documented.

I also love the map plot feature where they show the data points on a map.

I'd recommend just using their hosted web app for exploring small datasets. Don't try to extend it or mess with the code.

Pretzel

https://pretzelai.github.io/

https://github.com/pretzelai/pretzelai

https://news.ycombinator.com/item?id=39717268

I love this. Modern UI, WASM, DuckDB (performant and can handle large datasets), CyberChef-like processing pipeline UI, and even complete pivot table & PRQL support.

They're embracing good design and bleeding-edge tech.

It's a PWA and easily self-hostable (I tested it). The devs are very open to ideas and feedback. All the good signs of a rising star.

As of the time of writing, they're still actively adding new features. The current downside is the lack of support for different plot types.

Support for data filtering and export is a big plus. CODAP can't do this.

Pros:

- Fast, loads a 71 MB CSV file with no problem
- Direct access to SQL / PRQL

Cons:

- No sharable URL to your workflow / analysis
- Functionality still very limited compared to what I could do with Python

Conclusion

For non-coders who just want to play with data, it's probably best to start with CODAP if the dataset is small. Once you know what you want, you can ask ChatGPT to generate pretty plots for you with seaborn.

For large datasets, I'd recommend pre-processing (add / remove columns, filter, sample) with Pretzel to get a small enough dataset, then using CODAP / ChatGPT for insights and visualization.
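The same kind of pre-processing (drop columns, filter, sample) can also be scripted. Below is a minimal sketch using only the Python standard library. It is not how Pretzel does it, and the helper `preprocess` and the column names (`model`, `score`, `letter_text`) are made up for illustration.

```python
import csv
import io
import random


def preprocess(rows, keep_columns, predicate, sample_size, seed=0):
    """Filter rows, keep a subset of columns, and draw a random sample."""
    # Keep only rows that pass the filter.
    filtered = [row for row in rows if predicate(row)]
    # Drop every column not listed in keep_columns.
    trimmed = [{col: row[col] for col in keep_columns} for row in filtered]
    # If there are few enough rows already, return them all.
    if len(trimmed) <= sample_size:
        return trimmed
    # Sample without replacement; a fixed seed keeps the result reproducible.
    rng = random.Random(seed)
    return rng.sample(trimmed, sample_size)


# Hypothetical data standing in for a large CSV file.
raw_csv = """model,score,letter_text
chatgpt,0.9,aaa
chatgpt,0.2,bbb
alpaca,0.7,ccc
chatgpt,0.8,ddd
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))
small = preprocess(
    rows,
    keep_columns=["model", "score"],
    predicate=lambda row: row["model"] == "chatgpt",
    sample_size=2,
)

# Write the reduced dataset back out as CSV text, ready to load elsewhere.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["model", "score"])
writer.writeheader()
writer.writerows(small)
print(out.getvalue())
```

For a real file, the `io.StringIO(raw_csv)` would be replaced with something like `open("data.csv", newline="")`. Note that this sketch reads every row into memory, so it suits files that are big for a browser tool but still fit in RAM.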

For the best ease of use and level of control, I suggest just using Python on Colab and having ChatGPT write the code for you. If the user is somewhat technical, VS Code with Copilot would be even better.

For people living in the terminal, DuckDB + YouPlot is a pretty cool combo too.