
A snippet to download/upload all sheets from a spreadsheet and transform to DataFrame #465

Open
fyang0507 opened this issue Jun 1, 2017 · 2 comments

Comments


fyang0507 commented Jun 1, 2017

Hi,

I'm a junior data scientist at university. I store my data in Google Drive so that labeling can happen across platforms. What I've noticed is that gspread already provides a great set of functionality, but it lacks helpers for downloading or uploading whole datasets.

I've written some functions for myself that might be helpful to this community, so I'm sharing them here.

Downloading one or multiple sheets from ONE spreadsheet.
The limitation, of course, is that all sheets must share the same format (header row).

    def get_one_worksheet(self, workbook, sheet_name):
        gc = gspread.authorize(self.connection())
        sh = gc.open(workbook).worksheet(sheet_name)
        records = sh.get_all_records()
        print("Worksheet \"%s\" has been downloaded."%sheet_name)
        return records

    def get_all_worksheet(self, workbook):
        gc = gspread.authorize(self.connection())  # authorize the connection
        sh = gc.open(workbook)
        records = []
        for sheet in sh.worksheets():  # iterate the worksheets directly
            print("Downloading:", sheet.title)  # .title is the public attribute
            # Extract all records
            records += sheet.get_all_records()
        return records
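For illustration, the combined result is plain list concatenation of each sheet's records, which is why every sheet must share the same header row (the sheet contents below are hypothetical sample data):

```python
# Simulated get_all_records() output from two sheets with the same header row
sheet1_records = [{"id": 1, "label": "cat"}]
sheet2_records = [{"id": 2, "label": "dog"}]

records = []
for sheet_records in (sheet1_records, sheet2_records):
    records += sheet_records  # list concatenation, as in get_all_worksheet

print(len(records))  # 2
```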

Converting the JSON-like records into a DataFrame

    def to_dataframe(self, records):
        # records is a list of dicts, as returned by the previous functions
        import pandas as pd
        df = pd.DataFrame(records)  # accepts a list of dicts directly; no json round-trip needed
        # drop unnamed columns / leftover index column, if present
        df.drop(labels='', axis=1, inplace=True, errors='ignore')
        return df
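A quick sanity check of the conversion (the records here are hypothetical sample data):

```python
import pandas as pd

# Hypothetical records, in the shape get_all_records() returns
records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

# pd.DataFrame accepts a list of dicts directly, so the
# json.dumps() / read_json() round-trip can be skipped
df = pd.DataFrame(records)
print(df.shape)  # (2, 2)
```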

And additionally an upload function. But since uploading cells/rows with gspread is pretty slow, I seldom use it.

    def upload_dataframe(self, df, workbook, sheet_name):
        gc = gspread.authorize(self.connection())
        try:
            sh = gc.open(workbook)
        except gspread.SpreadsheetNotFound:
            print("Unknown name. Please create a blank spreadsheet first.")
            return
        # number of columns to create
        col_num = len(df.columns)
        sheet = sh.add_worksheet(title=sheet_name, rows=1, cols=col_num)
        # write the header row (row 1, one cell per column)
        for num in range(col_num):
            sheet.update_cell(1, num + 1, df.columns[num])
        print("Uploading dataframe, this process might take some time.\n")
        for num in range(len(df)):
            sheet.append_row(list(df.iloc[num]))

yoihito commented Jun 6, 2017

Hi @yang0339,
Your uploading function is slow because sheet.append_row() is slow. I suggested some improvements in #462. Could you please take a look at it?
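For reference, the batching idea can be sketched roughly as below. This is a minimal sketch, not the actual proposal in #462; `dataframe_to_cells` is a hypothetical helper, not gspread API:

```python
import pandas as pd

def dataframe_to_cells(df):
    """Flatten a DataFrame (header row + data rows) into one
    row-major list of values, ready for a single batched update."""
    values = list(df.columns)
    for row in df.itertuples(index=False):
        values.extend(row)
    return values

# With gspread, the whole list can then be written in one API call
# instead of one append_row() request per row, e.g.:
#   from gspread.utils import rowcol_to_a1
#   cells = sheet.range('A1:%s' % rowcol_to_a1(len(df) + 1, len(df.columns)))
#   for cell, value in zip(cells, dataframe_to_cells(df)):
#       cell.value = value
#   sheet.update_cells(cells)  # single batched request

df = pd.DataFrame({"id": [1, 2], "label": ["cat", "dog"]})
print(dataframe_to_cells(df))
```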

@fyang0507
Author

Hi @yoihito,

You definitely have a point. I'll revise my code later to make it more efficient.
Meanwhile, as the more important issue, I'd like to see the community expand gspread's functionality in a later release.
