Hi,
I'm a junior data scientist at a university. I store my data in Google Drive so that it can be labeled from different platforms. I noticed that gspread already provides a lot of functionality but lacks a way to download or upload whole datasets.
I've written a few helper functions of my own and think they might be helpful for this community, so I'm sharing them here.
Downloading one or multiple sheets from ONE spreadsheet.
Of course the limitation is that all sheets should share the same format.
def get_one_worksheet(self, workbook, sheet_name):
    gc = gspread.authorize(self.connection())
    sh = gc.open(workbook).worksheet(sheet_name)
    records = sh.get_all_records()
    print("Worksheet \"%s\" has been downloaded." % sheet_name)
    return records

def get_all_worksheet(self, workbook):
    gc = gspread.authorize(self.connection())  # to authorize connection
    sh = gc.open(workbook)
    records = []
    # worksheets() already returns Worksheet objects, so use them directly
    for current_sheet in sh.worksheets():
        print("Downloading: ", current_sheet.title)
        # Extract all records
        records += current_sheet.get_all_records()
        print("\n")
    return records
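Because the combined download assumes every sheet has the same columns, it can help to check that assumption and to remember which sheet each record came from. The following is a minimal sketch, not part of gspread: `merge_sheet_records` and the `_sheet` key are hypothetical names, and the input is assumed to be a dict mapping each sheet title to the list of dicts returned by `get_all_records()`.

```python
def merge_sheet_records(records_by_sheet):
    """Merge per-sheet record lists, checking that all sheets share columns.

    `records_by_sheet` maps a sheet title to a list of dicts (one per row).
    The "_sheet" key added to each record is a hypothetical name used here
    to keep track of where the row came from.
    """
    merged = []
    expected = None  # column set of the first row seen
    for title, records in records_by_sheet.items():
        for row in records:
            columns = set(row)
            if expected is None:
                expected = columns
            elif columns != expected:
                raise ValueError(
                    "Sheet %r has columns %s, expected %s"
                    % (title, sorted(columns), sorted(expected))
                )
            # Copy the row and tag it with its source sheet
            merged.append({**row, "_sheet": title})
    return merged
```

Raising on the first mismatch is one possible design choice; collecting all mismatches and reporting them together would work as well.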
Converting the records to a DataFrame
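def to_dataframe(self, records):  # records is the list of dicts returned by the previous functions
    import io
    import json
    import pandas as pd
    # wrap the JSON string in StringIO; newer pandas versions no longer accept a raw string
    df = pd.read_json(io.StringIO(json.dumps(records)))
    # drop unnamed columns/excessive index; do nothing if there is no such column
    df.drop(labels='', axis=1, inplace=True, errors='ignore')
    return df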
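Since `get_all_records()` returns a plain list of dicts, the JSON round trip can probably be skipped. Here is a minimal sketch that builds the DataFrame directly; `records_to_dataframe` is a hypothetical name. Note that, unlike `read_json`, this does not attempt any extra type conversion such as date parsing.

```python
import pandas as pd


def records_to_dataframe(records):
    """Build a DataFrame straight from a list of dicts (one dict per row)."""
    df = pd.DataFrame(records)
    # Drop the column whose header is an empty string, if there is one
    return df.drop(columns=[""], errors="ignore")
```

For example, `records_to_dataframe(get_all_worksheet(...))` would give one row per record across all downloaded sheets.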
Additionally, here is an upload function. Since uploading cells/rows one at a time with gspread is pretty slow, I seldom use it.
def upload_dataframe(self, df, workbook, sheet_name):
    gc = gspread.authorize(self.connection())
    try:
        sh = gc.open(workbook)
    except gspread.exceptions.SpreadsheetNotFound:
        print("Unknown name. Please create a blank spreadsheet first.")
        return
    # number of columns to create
    col_num = len(df.columns)
    sheet = sh.add_worksheet(title=sheet_name, rows=1, cols=col_num)
    # update headers: update_cell takes (row, col, value), so headers go in row 1
    for num in range(0, col_num):
        sheet.update_cell(1, num + 1, df.columns[num])
    print("uploading dataframe, this process might take some time.\n")
    for num in range(0, len(df)):
        # convert the row to a plain list so it can be serialized
        sheet.append_row(df.iloc[num].tolist())
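Each `update_cell` and `append_row` call above is a separate API request, so one way to cut the upload time is to send the header and all rows in a single `update` call. The following is a minimal sketch under that assumption: `upload_rows` is a hypothetical helper, and `worksheet` is assumed to be a gspread `Worksheet` that may be resized.

```python
def upload_rows(worksheet, header, rows):
    """Write the header and all data rows with one update call.

    `upload_rows` is a hypothetical helper, not part of gspread.
    Returns the number of rows written, including the header.
    """
    values = [list(header)] + [list(row) for row in rows]
    # Make sure the grid is large enough before writing
    worksheet.resize(rows=len(values), cols=len(header))
    # Keyword arguments are used because the positional order of
    # `update` differs between gspread releases
    worksheet.update(range_name="A1", values=values)
    return len(values)
```

With a DataFrame, this could be called as `upload_rows(sheet, df.columns, df.values.tolist())`, assuming the cell values are plain types that can be serialized to JSON.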
Hi @yang0339,
Your upload function is slow because sheet.append_row() is slow. I suggested some improvements in #462. Could you please take a look at it?
You definitely make good points. I'll revise my code later to make it more efficient.
Meanwhile, and more importantly, I'd like to see the community expand the functionality of gspread in a later release.