Releasing to PyPi

Releasing to PyPi

Category : Python

While attempting to explain the Datascroller release process to collaborators John and Kevin over Hangouts, it quickly became apparent that these steps needed to be put in writing. The process below is very manual, tedious, and constructed from a trial and error approach. So take with a grain of salt and leave a comment if you know a better way!

Verifying the code to be released

PyPi is not especially forgiving if you release code you didn’t intend to, so it’s worth a bit of time to double check. After checking out the master branch and pulling changes from the remote, I check the following:

  • Is version in setup.py the right one for the release?
  • If using download_url (which I just learned is not advised), does the suffix align with the version?

After confirming that the right code is in my local repo, the next checks concern the functionality. Datascroller only has one test script right now that is not hooked up to CI/CD, so here are our steps:

  • Create a virtual environment with venv (e.g., python -m venv myenvs/test) and enter it
  • While still in my local repo, perform a developer install using pip: pip install -e .
  • Run any test scripts you have

For Datascroller I like to repeat the steps above on both Windows and Linux.

Uploading to TestPyPi

With PyPi, if you successfully release version 1.2.0 and then realize something is wrong with the code, then that’s too bad because you’ll never be able to release it again. You can release version 1.2.1, but that requires a code change to setup.py and might mess up your plans. That’s where TestPyPi comes in (make an account if you don’t have one). Back in the root of your local repo, while still in the virtual environment created above, perform the following steps:

  • pip install twine (a package especially for publishing to PyPi)
  • python setup.py sdist (creates the “dist” source distribution folder)
  • twine upload --skip-existing --repository-url https://test.pypi.org/legacy/ dist/*

You’ll then get a link to go look at your package on TestPyPi. “See, looks good!” I told collaborators John and Kevin. “So, what exactly looks good?” asked John. He had a point, so I created new virtual environments and ran

pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ datascroller

to ensure a key demo worked when installed through TestPyPi. Note in the command above that you have to link to the real PyPi to get third-party packages such as Pandas. This is enabled via the flag --extra-index-url.

If something when wrong on version 1.2.0, then even on TestPyPi, you cannot delete and resubmit version 1.2.0. You would need to change the version in setup.py and then change it back before submitting to PyPi. In these cases I would append a suffix to the current version (e.g., 1.2.0rc1).

Uploading to PyPi

If everything looks good and your account is in good standing on PyPi.org, then you’re ready for the very last step:

  • twine upload --skip-existing dist/*

Getting right with Github

After a successful release to PyPi, the final step is to go to the Releases section of the Datascroller project and draft a new release. This will snapshot the project and provide downloads of the entire codebase in zip and tar.gz form. Since we were using the download_url argument in setup.py, we’d try to match the future name of the tarball that only exists after this step. Provided we really can (or should) leave that argument out next time, that’s one less thing to worry about.

Concluding thoughts

When showing this workflow to John and Kevin, the feedback was, “there has to be a way to automate this.” Creating multiple virtual environments and installations in itself a hassle, so this is something we’ll be investing in soon. Please comment below regarding any weak points in the process above and suggestions to make releasing to PyPi easier and more fun!


Introducing “datascroller” for fast terminal data frame scrolling

Category : Python , Tools

I’m excited to announce my very first package on PyPi, datascroller, a Python package for interactive terminal data scrolling. It’s available for Windows as well as *nix systems (thanks to windows-curses), and contributors to the codebase are welcome!

How it works

See the gif below for a glimpse of datascroller in action:

datascroller allows terminal datascrolling

The syntax has changed slightly since the gif was created, but during that demo, I was pressing keys to resize the terminal viewing window and to scroll from left-to-right and up-to-down within a Pandas data frame. Currently the scrolling keys are inspired by vim but later versions will offer customization options.

You can install datascroller with pip using:

pip install datascroller

Try datascroller out in iPython with the following code:

import pandas as pd
from datascroller import scroll

train = pd.read_csv(
    'https://raw.githubusercontent.com/datasets/house-prices-uk/master/data/data.csv')

scroll(train)

Why a terminal data scroller?

Scrolling a through a data set is a fundamental part of exploratory data analysis, and open-source tools let us down in this regard. SAS has had it right for a while. From my memory of around 2001, you could scroll through tens of millions of rows through what must have been a very clever paging strategy. Say what you want about SAS, but honestly no other data viewer has come close.

Moving to R in 2009, I had to accept the loss of SAS’s data set viewer and learn to accept the built-in viewer or just print slices of the data frame in the console. Soon after, I started using RStudio. They offered a nice improvement on the default viewer, but it still couldn’t hold a candle to SAS’s and didn’t handle very large data sets well at the time (to the best of my recollection).

In 2019, RStudio may very well have their data viewer tuned to perfection. But some people prefer working in the terminal, and sometimes you have to (say, a client gives you an ssh login for a particular remote machine). It is possible to hook up notebooks, or use an X-server, but often it’s easier to just print slices of your data sets in the terminal for exploratory analysis. Ehile R’s tibble and Panda’s DataFrame are smart enough to not overwhelm your console with output, they make you work to see the parts of the data that you really need to see.

The datascroller vision

The featured image is a play on the movie “Minority Report” and its very memorable scene with Tom Cruise’s character using the futuristic API to sort through information. I always wanted to move around the data set like that, and I felt that the terminal would be a good place to do it. In 2014, at Google, I took my first crack at this with an internal R package I called “terminalR.” I got helpful feedback from data scientists there, especially Tim Hesterberg. Tim convinced me of the need to implement user configuration options (still a TODO for datascroller!) and also to transition to Emacs/ESS since they came with Emacs Lisp. But, we stopped short of achieving the vision full interactivity.

The terminalR package’s original mechanism was “drumming” on the enter key while you pressed other navigation buttons, as it relied on R’s standard console input methods). With Python offering wrappers for the curses library for both *nix systems and Windows, the interactive “vision” has become a reality.

What’s next for datascroller?

The Python package datascroller, currently for use with Pandas dataframes, will become the tool “datascroller” for general purpose terminal data scrolling. Imagine interactive terminal scrolling of any csv, text, or even JSON file that can be initiated from outside of Python. My past colleague John Merfeld, who makes extensive use of low vision accessibility tools, is on the project and will help consult as to whether certain color schemes (curses offers those) help make the terminal output easier to see, thus giving datascroller an accessibility angle.

Even with TerminalR, I could get around an R data frame pretty fast, fast than any GUI viewer. It has column and row searching functionality from the keyboard, and a lot of movement options. All these options and more are coming to datascroller soon, in full interactive fashion.

I have big plans for this tool.