
A snippet to download/upload all sheets from a spreadsheet and transform to DataFrame #465

Open
fyang0507 opened this issue Jun 1, 2017 · 2 comments

Comments


fyang0507 commented Jun 1, 2017

Hi,

I'm a junior data scientist at university. I store my data in Google Drive so that labeling can happen across platforms. What I've noticed is that gspread already provides a great set of functionality, but it lacks helpers for downloading or uploading whole datasets.

I've written some functions for myself that might be helpful to this community, so I'm sharing them here.

Downloading one or multiple sheets from ONE spreadsheet.
The limitation, of course, is that all sheets must share the same format (header row).

    def get_one_worksheet(self, workbook, sheet_name):
        gc = gspread.authorize(self.connection())
        sh = gc.open(workbook).worksheet(sheet_name)
        records = sh.get_all_records()
        print("Worksheet \"%s\" has been downloaded."%sheet_name)
        return records

    def get_all_worksheet(self, workbook):
        gc = gspread.authorize(self.connection())  # authorize the connection
        sh = gc.open(workbook)
        records = []
        for sheet in sh.worksheets():  # iterate the worksheets directly
            print("Downloading:", sheet.title)  # .title is the public attribute
            # Extract all records
            records += sheet.get_all_records()
        return records
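For illustration, the combined result is plain list concatenation of each sheet's records, which is why every sheet must share the same header row (the sheet contents below are hypothetical sample data):

```python
# Simulated get_all_records() output from two sheets with the same header row
sheet1_records = [{"id": 1, "label": "cat"}]
sheet2_records = [{"id": 2, "label": "dog"}]

records = []
for sheet_records in (sheet1_records, sheet2_records):
    records += sheet_records  # list concatenation, as in get_all_worksheet

print(len(records))  # 2
```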

Converting the JSON-like records into a DataFrame

    def to_dataframe(self, records):
        # records is a list of dicts, as returned by the previous functions
        import pandas as pd
        df = pd.DataFrame(records)  # accepts a list of dicts directly; no json round-trip needed
        # drop unnamed columns / leftover index column, if present
        df.drop(labels='', axis=1, inplace=True, errors='ignore')
        return df
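A quick sanity check of the conversion (the records here are hypothetical sample data):

```python
import pandas as pd

# Hypothetical records, in the shape get_all_records() returns
records = [{"name": "a", "value": 1}, {"name": "b", "value": 2}]

# pd.DataFrame accepts a list of dicts directly, so the
# json.dumps() / read_json() round-trip can be skipped
df = pd.DataFrame(records)
print(df.shape)  # (2, 2)
```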

And additionally an upload function. But since uploading cells/rows with gspread is pretty slow, I seldom use it.

    def upload_dataframe(self, df, workbook, sheet_name):
        gc = gspread.authorize(self.connection())
        try:
            sh = gc.open(workbook)
        except gspread.SpreadsheetNotFound:
            print("Unknown name. Please create a blank spreadsheet first.")
            return
        # number of columns to create
        col_num = len(df.columns)
        sheet = sh.add_worksheet(title=sheet_name, rows=1, cols=col_num)
        # write the header row (row 1, one cell per column)
        for num in range(col_num):
            sheet.update_cell(1, num + 1, df.columns[num])
        print("Uploading dataframe, this process might take some time.\n")
        for num in range(len(df)):
            sheet.append_row(list(df.iloc[num]))

yoihito commented Jun 6, 2017

Hi @yang0339,
Your uploading function is slow because sheet.append_row() is slow. I suggested some improvements in #462. Could you please take a look at it?
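For reference, the batching idea can be sketched roughly as below. This is a minimal sketch, not the actual proposal in #462; `dataframe_to_cells` is a hypothetical helper, not gspread API:

```python
import pandas as pd

def dataframe_to_cells(df):
    """Flatten a DataFrame (header row + data rows) into one
    row-major list of values, ready for a single batched update."""
    values = list(df.columns)
    for row in df.itertuples(index=False):
        values.extend(row)
    return values

# With gspread, the whole list can then be written in one API call
# instead of one append_row() request per row, e.g.:
#   from gspread.utils import rowcol_to_a1
#   cells = sheet.range('A1:%s' % rowcol_to_a1(len(df) + 1, len(df.columns)))
#   for cell, value in zip(cells, dataframe_to_cells(df)):
#       cell.value = value
#   sheet.update_cells(cells)  # single batched request

df = pd.DataFrame({"id": [1, 2], "label": ["cat", "dog"]})
print(dataframe_to_cells(df))
```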

@fyang0507
Author

Hi @yoihito,

You definitely have a point. I'll revise my code later to make it more efficient.
Meanwhile, as the more important issue, I'd like to see the community expand gspread's functionality in a later release.
