I’m excited to announce my very first package on PyPI, datascroller, a Python package for interactive terminal data scrolling. It’s available for Windows as well as *nix systems (thanks to the windows-curses package), and there are issues open for outside contributors on the datascroller GitHub repo.
How it works
See the gif below for a glimpse of datascroller in action:
During that demo, I was pressing keys to resize the terminal viewing window and to scroll left and right and up and down within a pandas DataFrame. Currently the scrolling keys are inspired by vim, but later versions will offer customization options.
You can install datascroller with pip using:
pip install datascroller
Try datascroller out in IPython with the following code:
import pandas as pd
from datascroller.scroller import scroller

train = pd.read_csv(
    'https://raw.githubusercontent.com/datasets/house-prices-uk/master/data/data.csv')
scroller(train)
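To make the vim-inspired scrolling a little more concrete, here is a minimal sketch of how keypresses could move a viewing window around a table. This is not datascroller's actual code: the function name `move_viewport` and the h/j/k/l bindings are assumptions for illustration only.

```python
def move_viewport(top, left, key, n_rows, n_cols, height, width):
    """Return the new (top, left) corner of the visible window after a keypress.

    Assumed bindings (vim-style, hypothetical): h=left, l=right, j=down, k=up.
    """
    moves = {"h": (0, -1), "l": (0, 1), "j": (1, 0), "k": (-1, 0)}
    d_row, d_col = moves.get(key, (0, 0))  # unknown keys leave the window in place

    # The window may not scroll past the edges of the data.
    max_top = max(n_rows - height, 0)
    max_left = max(n_cols - width, 0)
    new_top = min(max(top + d_row, 0), max_top)
    new_left = min(max(left + d_col, 0), max_left)
    return new_top, new_left


# Simulate a few keypresses on a 100 x 20 table seen through a 10 x 5 window.
top, left = 0, 0
for key in "jjjlk":
    top, left = move_viewport(top, left, key, n_rows=100, n_cols=20, height=10, width=5)
print(top, left)
```

Keeping the movement logic in a pure function like this makes it easy to test without a terminal, and swapping in customized keys would only mean changing the `moves` dictionary.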
Why a terminal data scroller?
Scrolling through data is a fundamental part of exploratory data analysis, and we’ve all had open-source tools let us down. My first experience with industrial-grade data scrolling came with using SAS at the turn of the century. Even then, you could scroll through tens of millions of rows on your 386DX through what must have been a very clever paging strategy. Say what you want about SAS, but honestly no other data viewer since then has beaten it for me.
Moving to R around 2009, I had to accept the loss of SAS’s data set viewer and learn to accept the built-in viewer or just print slices of the data frame in the console. Around 2010, I started using RStudio and was impressed with their viewer, but it still couldn’t hold a candle to SAS’s and didn’t handle very large data sets well at the time (to the best of my recollection).
In 2019, RStudio may very well have their terminal viewer tuned to perfection. Even so, there are still some of us who find full-blown IDEs and even notebooks bulky and not worth the hassle. Like electric sunroofs, they’re just one more thing to break; sometimes rolling down your windows is good enough. That’s why, until the day I die or go completely blind (more to come), I’ll be typing into a terminal.
The problem with working with data in a terminal is that you often don’t have access to graphical displays (without complicated setups) and you end up having to print slices of your data sets in the terminal for exploratory analysis. This slows you down! And while R’s tibble and pandas’ DataFrame are smart enough to not overwhelm your console with output, they make you work to see the parts of the data that you really need to see.
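For a sense of what "printing slices" looks like in practice, here is a small sketch using pandas. The data frame is made up for illustration; the point is that every new region you want to look at means typing another slice by hand.

```python
import pandas as pd

# Small made-up frame standing in for a real data set: 50 rows, 12 columns.
df = pd.DataFrame({f"col{j}": range(j, j + 50) for j in range(12)})

# Rows 20-24 and columns 8-10, selected by position.
window = df.iloc[20:25, 8:11]
print(window)
```

Looking one column further right or a few rows further down means editing those index ranges and running the line again.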
The datascroller vision
The featured image is a play on the movie “Minority Report” and its very memorable scene with Tom Cruise’s character using the futuristic API to sort through information. I always wanted to move around the data set like that, and I felt that the terminal would be a good place to do it. In 2014, at Google, I took my first crack at this with an internal R package I called “terminalR.” I got helpful feedback from mentors there, especially Tim Hesterberg, which I plan to incorporate into datascroller. The problem with terminalR was that you had to “drum” on the enter key while you used it (it relied on standard console input methods), which was corny. But Python offers the curses library, allowing my interactive “vision” to come true.
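As a rough illustration of why curses removes the need to "drum" on the enter key, here is a minimal sketch of a key-at-a-time scrolling loop. It is not datascroller's implementation: the placeholder rows, the `render_window` helper, and the j/k/q bindings are all assumptions made for this example.

```python
import curses
import sys


def render_window(rows, top, height):
    """Return the slice of `rows` that fits in a window `height` lines tall."""
    return rows[top:top + height]


def main(stdscr):
    rows = [f"row {i}" for i in range(200)]  # placeholder data, not a real data set
    top = 0
    try:
        curses.curs_set(0)  # hide the cursor where the terminal supports it
    except curses.error:
        pass
    while True:
        height, width = stdscr.getmaxyx()
        visible = height - 1  # leave the last line free to avoid writing the corner
        stdscr.erase()
        for y, line in enumerate(render_window(rows, top, visible)):
            stdscr.addstr(y, 0, line[:width - 1])
        stdscr.refresh()

        key = stdscr.getkey()  # returns after a single keypress, no Enter needed
        if key == "q":
            break
        elif key == "j":
            top = min(top + 1, max(len(rows) - visible, 0))
        elif key == "k":
            top = max(top - 1, 0)


# Only start the interactive loop when attached to a real terminal.
if __name__ == "__main__" and sys.stdin.isatty() and sys.stdout.isatty():
    curses.wrapper(main)
```

`curses.wrapper` puts the terminal into cbreak mode and restores it on exit, which is what lets `getkey` respond to each keypress immediately.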
What’s next for datascroller?
The Python package datascroller, currently for use with pandas DataFrames, will become the tool “datascroller” for general-purpose terminal data scrolling. Imagine interactive terminal scrolling of any CSV, text, or even JSON file that can be initiated from outside of Python. And I’m trying to convince my friend John Merfeld, who makes extensive use of low vision accessibility tools, to help me light this thing up like a Christmas tree to make datascroller itself an accessibility tool.
I have big plans for this tool.