Category Archives: Python

On Getting Python Functionality in a Simple Website

Tags :

Category : Infrastructure , Python

I’ll be trying something different with this blog post – logging my efforts while going after a goal. It might not be riveting reading but I’m looking for ways to increase my output in 2020. And Happy New Year!

I’ve been enjoying the very cheap and surprisingly functional web hosting from Siteground, but one way they keep the cost down is by using software over a decade old and denying install privileges on the machines (that I can ssh in 24/7 for less than the price a beer a month still amazes me). You won’t be able to FTP a copy of Miniconda (trust me, I tried it), so you’re stuck with the LAMP framework. But at the same time, you have powerful data science functions written in Python. What do you do?

The rest of this stream-of-consciousness style blog article explores the use of Heroku to solve this problem. Though I do not explicitly show a website requesting output from Python data science modules, I believe all the pieces are in place by the end.

Parthiban’s REST API via Heroku Tutorial

Previously I wrote about deploying data science apps on Heroku, and while I was very impressed with the service and how it let me work with common data science Python dependencies like statsmodels, I never exposed the app to the outside world. Today, I found this great looking article from Parthiban Sudhaman that promises a walk-through of developing a REST API in Python and deploying it on Heroku. Let’s try it out.

Environment

Parthiban recommends creating a virtual environment and installing the following dependencies:

Glancing at my first Heroku article, I realize need to get the Heroku CLI. Given the limitations of Gunicorn, I’ll install this on Windows Subsystem for Linux (WSL). But the default way of installing the Heroku CLI on Ubuntu, Snap, does not work with WSL at the time of writing. This command will install it on WSL:

curl https://cli-assets.heroku.com/install.sh | sh.

Trying the simple REST API

Next, I create my own version of todo.py from Step 4 of the article, and note that the sole import is Resource from the flask_restful module. I learn that the name of the API game here is to inherit from this abstract class Resource, so that your subclasses will be able to do real things with HTTP. Here’s the docstring from the Resource class:

Represents an abstract RESTful resource. Concrete resources should extend from this class and expose methods for each supported HTTP method. If a resource is invoked with an unsupported HTTP method, the API will return a response with status 405 Method Not Allowed. Otherwise the appropriate method is called and passed all arguments from the url rule used when adding the resource to an Api instance.

When Parthiban creates the subclass Todo from Resource, it’s not immediately clear what specific functionality is coming from that parent class. After fixing some spacing issues, I was able instantiate a Todo class locally and run the get() method, but the put() method returned an error about ‘request’ not being defined. Let’s keep moving.

Realizing that I had forgotten to store my todo.py in a folder called “resources,” I created that so that app.py could import it from one level beneath (the name of the base folder holding app.py and resources doesn’t seem to matter). I ran the contents of app.py in iPython and was able to see the REST API work in my browser:

Well I’m already happy. But this won’t do much for me if I can’t host it somewhere. On to Step 5.

Getting the Rest API onto Heroku

The following text goes into a file called “Procfile” in the root directory of this app (where I can see the resources) folder.

web: gunicorn app:app

The Procfile documentation shows that “web” is the process type and “gunicorn app:app” is the command to be run. It’s probably not a coincidence that our root python program is app.py and the Api class being instantiated is named “api” as an object, but I don’t know for sure.

I only add “gunicorn” and “flask-restful” to my requirements.txt file (in the same level as app.py), and it turns out this will be enough. I also omit the runtime.txt file without consequence.

Starting with the command heroku login, I followed similar steps to Parthiban, but I diverged somewhat. Here are the steps generally laid out:

  • Log into Heroku via the CLI (heroku login).
  • Create the app with the Heroku CLI (heroku create) and save both the web address ending in “.git” and the URL.
  • If the folder you’re working in isn’t already a git repository, make it so with git init.
  • If you don’t already have a remote that’s set to address copied in the second step, create one now. (This may happen automatically if you run git init before heroku create. I should find out!)
  • Add all relevant files, commit, and push to your Heroku remote.

Pushing code to the Heroku remote is what kicks off the magic, and I just watched some happen in my Terminal. (It really does feel like magic to me.)

It’s time to test the API. If you forgot to save the URL you can get it with heroku apps:info in the CLI. Add “/todo/1” at the end and see what happens:

Very cool!

Conclusions

Parthiban’s tutorial deserves more than 12 “claps”; people are just missing out. It lays out an easy to follow set of steps for getting started with Python REST APIs hosted internally. Thank you Parthiban!

Combined with the techniques I used in Data Science Apps in the Cloud with Heroku, there’s no reason to think this approach wouldn’t work with packages such as statsmodels. There’s a database featured there as well. I believe I have the elements for incorporating powerful data science functionality in a cheaply hosted website, but we’ll soon see.

To many good tutorials in 2020!


Introducing “datascroller” for fast terminal data frame scrolling

Category : Python , Tools

I’m excited to announce my very first package on PyPi, datascroller, a Python package for interactive terminal data scrolling. It’s available for Windows as well as *nix systems (thanks to windows-curses), and contributors to the codebase are welcome!

How it works

See the gif below for a glimpse of datascroller in action:

datascroller allows terminal datascrolling

The syntax has changed slightly since the gif was created, but during that demo, I was pressing keys to resize the terminal viewing window and to scroll from left-to-right and up-to-down within a Pandas data frame. Currently the scrolling keys are inspired by vim but later versions will offer customization options.

You can install datascroller with pip using:

pip install datascroller

Try datascroller out in iPython with the following code:

import pandas as pd
from datascroller import scroll

train = pd.read_csv(
    'https://raw.githubusercontent.com/datasets/house-prices-uk/master/data/data.csv')

scroll(train)

Why a terminal data scroller?

Scrolling a through a data set is a fundamental part of exploratory data analysis, and open-source tools let us down in this regard. SAS has had it right for a while. From my memory of around 2001, you could scroll through tens of millions of rows through what must have been a very clever paging strategy. Say what you want about SAS, but honestly no other data viewer has come close.

Moving to R in 2009, I had to accept the loss of SAS’s data set viewer and learn to accept the built-in viewer or just print slices of the data frame in the console. Soon after, I started using RStudio. They offered a nice improvement on the default viewer, but it still couldn’t hold a candle to SAS’s and didn’t handle very large data sets well at the time (to the best of my recollection).

In 2019, RStudio may very well have their data viewer tuned to perfection. But some people prefer working in the terminal, and sometimes you have to (say, a client gives you an ssh login for a particular remote machine). It is possible to hook up notebooks, or use an X-server, but often it’s easier to just print slices of your data sets in the terminal for exploratory analysis. Ehile R’s tibble and Panda’s DataFrame are smart enough to not overwhelm your console with output, they make you work to see the parts of the data that you really need to see.

The datascroller vision

The featured image is a play on the movie “Minority Report” and its very memorable scene with Tom Cruise’s character using the futuristic API to sort through information. I always wanted to move around the data set like that, and I felt that the terminal would be a good place to do it. In 2014, at Google, I took my first crack at this with an internal R package I called “terminalR.” I got helpful feedback from data scientists there, especially Tim Hesterberg. Tim convinced me of the need to implement user configuration options (still a TODO for datascroller!) and also to transition to Emacs/ESS since they came with Emacs Lisp. But, we stopped short of achieving the vision full interactivity.

The terminalR package’s original mechanism was “drumming” on the enter key while you pressed other navigation buttons, as it relied on R’s standard console input methods). With Python offering wrappers for the curses library for both *nix systems and Windows, the interactive “vision” has become a reality.

What’s next for datascroller?

The Python package datascroller, currently for use with Pandas dataframes, will become the tool “datascroller” for general purpose terminal data scrolling. Imagine interactive terminal scrolling of any csv, text, or even JSON file that can be initiated from outside of Python. My past colleague John Merfeld, who makes extensive use of low vision accessibility tools, is on the project and will help consult as to whether certain color schemes (curses offers those) help make the terminal output easier to see, thus giving datascroller an accessibility angle.

Even with TerminalR, I could get around an R data frame pretty fast, fast than any GUI viewer. It has column and row searching functionality from the keyboard, and a lot of movement options. All these options and more are coming to datascroller soon, in full interactive fashion.

I have big plans for this tool.