Category Archives: Infrastructure

On Getting Python Functionality in a Simple Website

Tags :

Category : Infrastructure , Python

I’ll be trying something different with this blog post – logging my efforts while going after a goal. It might not be riveting reading but I’m looking for ways to increase my output in 2020. And Happy New Year!

I’ve been enjoying the very cheap and surprisingly functional web hosting from Siteground, but one way they keep the cost down is by using software over a decade old and denying install privileges on the machines (that I can ssh in 24/7 for less than the price a beer a month still amazes me). You won’t be able to FTP a copy of Miniconda (trust me, I tried it), so you’re stuck with the LAMP framework. But at the same time, you have powerful data science functions written in Python. What do you do?

The rest of this stream-of-consciousness style blog article explores the use of Heroku to solve this problem. Though I do not explicitly show a website requesting output from Python data science modules, I believe all the pieces are in place by the end.

Parthiban’s REST API via Heroku Tutorial

Previously I wrote about deploying data science apps on Heroku, and while I was very impressed with the service and how it let me work with common data science Python dependencies like statsmodels, I never exposed the app to the outside world. Today, I found this great looking article from Parthiban Sudhaman that promises a walk-through of developing a REST API in Python and deploying it on Heroku. Let’s try it out.

Environment

Parthiban recommends creating a virtual environment and installing the following dependencies:

Glancing at my first Heroku article, I realize need to get the Heroku CLI. Given the limitations of Gunicorn, I’ll install this on Windows Subsystem for Linux (WSL). But the default way of installing the Heroku CLI on Ubuntu, Snap, does not work with WSL at the time of writing. This command will install it on WSL:

curl https://cli-assets.heroku.com/install.sh | sh.

Trying the simple REST API

Next, I create my own version of todo.py from Step 4 of the article, and note that the sole import is Resource from the flask_restful module. I learn that the name of the API game here is to inherit from this abstract class Resource, so that your subclasses will be able to do real things with HTTP. Here’s the docstring from the Resource class:

Represents an abstract RESTful resource. Concrete resources should extend from this class and expose methods for each supported HTTP method. If a resource is invoked with an unsupported HTTP method, the API will return a response with status 405 Method Not Allowed. Otherwise the appropriate method is called and passed all arguments from the url rule used when adding the resource to an Api instance.

When Parthiban creates the subclass Todo from Resource, it’s not immediately clear what specific functionality is coming from that parent class. After fixing some spacing issues, I was able instantiate a Todo class locally and run the get() method, but the put() method returned an error about ‘request’ not being defined. Let’s keep moving.

Realizing that I had forgotten to store my todo.py in a folder called “resources,” I created that so that app.py could import it from one level beneath (the name of the base folder holding app.py and resources doesn’t seem to matter). I ran the contents of app.py in iPython and was able to see the REST API work in my browser:

Well I’m already happy. But this won’t do much for me if I can’t host it somewhere. On to Step 5.

Getting the Rest API onto Heroku

The following text goes into a file called “Procfile” in the root directory of this app (where I can see the resources) folder.

web: gunicorn app:app

The Procfile documentation shows that “web” is the process type and “gunicorn app:app” is the command to be run. It’s probably not a coincidence that our root python program is app.py and the Api class being instantiated is named “api” as an object, but I don’t know for sure.

I only add “gunicorn” and “flask-restful” to my requirements.txt file (in the same level as app.py), and it turns out this will be enough. I also omit the runtime.txt file without consequence.

Starting with the command heroku login, I followed similar steps to Parthiban, but I diverged somewhat. Here are the steps generally laid out:

  • Log into Heroku via the CLI (heroku login).
  • Create the app with the Heroku CLI (heroku create) and save both the web address ending in “.git” and the URL.
  • If the folder you’re working in isn’t already a git repository, make it so with git init.
  • If you don’t already have a remote that’s set to address copied in the second step, create one now. (This may happen automatically if you run git init before heroku create. I should find out!)
  • Add all relevant files, commit, and push to your Heroku remote.

Pushing code to the Heroku remote is what kicks off the magic, and I just watched some happen in my Terminal. (It really does feel like magic to me.)

It’s time to test the API. If you forgot to save the URL you can get it with heroku apps:info in the CLI. Add “/todo/1” at the end and see what happens:

Very cool!

Conclusions

Parthiban’s tutorial deserves more than 12 “claps”; people are just missing out. It lays out an easy to follow set of steps for getting started with Python REST APIs hosted internally. Thank you Parthiban!

Combined with the techniques I used in Data Science Apps in the Cloud with Heroku, there’s no reason to think this approach wouldn’t work with packages such as statsmodels. There’s a database featured there as well. I believe I have the elements for incorporating powerful data science functionality in a cheaply hosted website, but we’ll soon see.

To many good tutorials in 2020!


Data Science Apps in the Cloud with Heroku

Category : Infrastructure

I recently had an opportunity to work with Heroku, a platform-as-a-service for deploying and running apps, for deploying Python-based data science applications in the cloud. At first, I didn’t understand why the engagement wasn’t just using AWS, since data science related instances abound on the EC2 marketplace. What I learned however is that AWS can be a money pit for businesses without a dedicated IT team. It is a complex beast that requires competent professionals to tame. Heroku, on the other hand, just seems to just work.

In this article, I’ll go through the basics of creating a Heroku application that at least loads popular data science dependencies in python. In later articles I may take the example to the end, where I load the Iris data set, run a regression on it using the statsmodels package, and write the results into a database on Heroku. All of this can be run using Heroku’s very simple free scheduler.

Preliminaries

To get started, create a free Heroku account at www.heroku.com, install the Heroku CLI, and run the following commands in a bash shell (Windows 10 users are encouraged to use the Ubuntu 18.04 app):

git clone https://github.com/baogorek/HerokuExample.git
cd HerokuExample
heroku login
heroku create

After cloning in the first line, the second line changes directories to the folder that contains the Heroku application, and the third line opens a browser window to log into your Heroku account. Finally, heroku create registers the app with the service. The output of that line is:

Creating app... done, ⬢ mysterious-badlands-45487                              https://mysterious-badlands-45487.herokuapp.com/ | https://git.heroku.com/mysterious-badlands-45487.git

which shows us that it is given a name, a URL, and its own git repository. If you navigate to the URL, you get a default welcome screen, but we won’t be building a web app in this article.

The git repository is interesting, because it seemed like we already had one. But this is a git remote hosted by Heroku itself, and it’s a big part of their deployment strategy. If I run a git remote -v, I can see it:

heroku  https://git.heroku.com/mysterious-badlands-45487.git (fetch)
heroku  https://git.heroku.com/mysterious-badlands-45487.git (push)
origin  https://github.com/baogorek/HerokuExample.git (fetch)
origin  https://github.com/baogorek/HerokuExample.git (push)

Even though I haven’t added anything new through git, I can deploy the app that I have through pushing to the heroku remote:

git push heroku master

Just that simple command sets off a lot of activity. Here is a selection of the output:

remote: Compressing source files... done.                                      remote: Building source:                                                       remote:                                                                        remote: -----> Python app detected                                             remote: -----> Installing python-3.6.8    
...
remote:        Installing collected packages: numpy, scipy, six, python-dateutil, pytz, pandas, patsy, cython, statsmodels, psycopg2-binary   
...
remote: -----> Launching...                                                    remote:        Released v3                                                     remote:        https://mysterious-badlands-45487.herokuapp.com/ deployed to Heroku  

Pushing to the “heroku” remote triggered the build of a Python application with data science dependencies such as numpy, scipy, pandas, and statsmodels. We see at the end that the app was “deployed.”

Testing it out

Since Heroku is based on containers, one quick way to test that our app has the data science dependencies that we think it does is to spin up a local container. We can do that with:

heroku run bash
python

In Python 3.6.8 within our local Heroku container, we can import a few packages just to make sure.

import statsmodels
import pandas

If you didn’t get an error, then your cloud-deployed Heroku app has these data science dependencies installed. Good!

Getting the dependencies

Exiting out of the local Heroku container, look inside the requirements.txt file:

cat requirements.txt

You’ll see a very modest text file with the following lines:

numpy
scipy==1.2
pandas
patsy
cython
statsmodels
psycopg2-binary

I specified scipy to be exactly version 1.2 based on advice from a post on a problem I was having, but otherwise these are the minimum dependencies specified by statsmodels.

Why not conda?

There are some Heroku “buildpacks” for conda online, but many of them are years old and not well maintained. Using the requirements.txt file was a breeze, and I didn’t see a reason to struggle with getting conda to work. But it clearly is possible.

Running jobs

If loading statsmodels and pandas in a local Heroku container didn’t send your pulse above 100, it’s not you. But we’re actually not too far away from making our Heroku app do things. One way to actually have your app act is to utilize the text file called Procfile (no “.txt” extension). If you look inside the Procfile for this app, it is completely blank.

Instead, I used the Heroku scheduler add on to run a file, like HerokuExample/herokuexample/run_glm.py. You can see how easy it is to set up by looking at the following screenshot:

Since you could potentially spin up some serious computing resources using the Heroku scheduler add on, you do need your credit card to enable it.

Running a script

Just so we see some output in this article, add the following line to the Procfile:

release: python herokuexample/say_hello.py

Add the file to git staging, commit it, and then push to the heroku remote:

git add Procfile
git commit -m "Updating Procfile"
git push heroku master

Among the output lines you will find:

remote: Verifying deploy... done.
remote: Running release command...
remote:
remote: I loaded statsmodels

Indeed it did.

Next steps

To really do something interesting without a full blown web app, we really need a database. Fortunately, Heroku has powerful database add ons that can complete the picture of a useful data science application that runs in the cloud. Leave a comment if you want to hear about Heroku databases in conjunction with data science apps, and I’ll add it to the queue.