Exporting pandas dataframe from Google Colab notebook to Google Spreadsheet using gspread
Quite often I need to export data from pandas dataframe into Google Spreadsheet to share with people who are not software developers. They are familiar with spreadsheet applications. Google Spreadsheets is one of the most popular now.
My usual procedure was:
- Export to CSV file using df.to_csv() and save it on a local drive
- Create a new spreadsheet in Google
- Upload CSV file from local drive
- Format spreadsheet by styling headers, highlighting columns with colour, etc.
- Finding that there is an error and you need to do that again, and again… :-(
How about making it easier and faster? Sure. Google provides Google Sheets API which could be used to update data directly from your Python code in the Google Colab notebook. I am going to use gspread library to access this API.
Sample dataset: World Happiness Report 2023 😊
So you were asked to find in which countries the most and least happy people live. Data is available at https://worldhappiness.report/data/
You can download an XLS file and load it into a data frame
Install gspread library
If the import fails, we install the library using pip
Authenticate the user to provide access to Google API
By default, gspread will look for OAuth credentials on your drive, but in Google Colab you can authenticate without creating new credentials. Google will ask for permission to access your Google Drive to access spreadsheets.
Open spreadsheet
You need to create a new empty spreadsheet named “gspread-test” in your Google Drive before running this code
Copy data frame data to Google Spreadsheet
Now we are ready to copy data over. The first row is populated with data frame column names and the remaining rows with data. Notice that I needed to replace NA with -1 since data is exported into JSON and will fail if it contains NA values
Now all data should be available in Google Spreadsheet 😊
You can also set cell formulas. For example, add an image to cell using image()
the function:
Styling spreadsheet
Let’s add a bit of styling and colour to make column names more prominent and change cell background colour based on values relative to minimum and maximum values.
Setting column header styling
Helper function to convert numeric column indexes into spread column names. For example, a column with index 3 should be converted into column “D”. Thanks to Sundar Nataraj for providing a code snippet on Stack Overflow
find the last column in the spreadsheet:
Should return the “H” column.
Now by calling .format() we can pass a range of cells to apply formatting and formatting options:
Now we have styling added for column headers
It would be nice to add a background gradient to the spreadsheet similar to what could be done in pandas while running code in notebook:
While it’s a very simple thing in pandas, I was unable to find how to extract colour values directly. But we can use matplotlib to get the same thing in Google Spreadsheet.
First of all, I would like to add a helper method to allow us to modify the brightness on the background gradient since values in black would be difficult to read on dark backgrounds. Thanks to scipy-cookbook for providing this useful snippet:
And now the gradient logic.
I am using a “Spectral” colour map, but you can use any other you like
We divide brightness by 2 and add 0.5, so all colours will be lighter
cmap_map(lambda x: x/2 + 0.5, spectral_cmap)
For each column, we need to get the min and max values and use them to normalise values in the column. So the minimum value will be 0 and the maximum value equal to 1 after normalisation.
By calling colormap with normalised value we get an array with RGBA values for this value. All formatting commands are added to the array for batch processing using batch_format().
While you can format() it will take much longer to run, since each cell colour update requires a call to API
Now each cell is colourized based on its value
And if you want to update the data frame and re-export data again, it would take just a couple of seconds to complete 😊.
The complete demo is available at Google Colab:
https://colab.research.google.com/drive/1OlKGSVcBBAFtgmO0qSXoun9zr6lj0J7r?usp=sharing
Additional resources:
- gspread-pandas — another library created for pandas data export, but I found that its functionality is a bit limited compared to gspread
- https://pandas.pydata.org/docs/reference/api/pandas.io.formats.style.Styler.background_gradient.html