Disable curation & spreadsheet handling #33

@seasidesparrow

Following extensive testing that showed oracle performs much better than classic docmatching, we disabled curation of daily and weekly docmatching in June 2024. The crontab that automatically looks for curated files was disabled in mid-June, but some of the curation infrastructure is still active within the pipeline itself. The following still need to be disabled:

  • uploading to Google Sheets
  • broadcast of docmatching status via Slack

Additionally, the existing doc-matching process exports its results to a backoffice file formatted for Google Sheets; see adsdocmatch/match_w_metadata.py, L86. Currently, the backoffice matching script reprocesses this file with grep/sed/awk to extract only the rows flagged as "Match" into a three-column file (preprint bibcode, published bibcode, and score). That three-column file is what gets uploaded to oracle (using the -mf option) on a daily/weekly basis.
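For discussion, here is a rough sketch of what producing the three-column file directly in Python might look like, instead of post-processing with grep/sed/awk. The function name `extract_matches` and the column names (`preprint_bibcode`, `published_bibcode`, `label`, `score`) are placeholders I made up, and the input is assumed to be a CSV with a header row; the real layout is whatever match_w_metadata.py writes, so these would need to be adjusted.

```python
import csv


def extract_matches(src, dst,
                    preprint_col="preprint_bibcode",    # hypothetical column name
                    published_col="published_bibcode",  # hypothetical column name
                    label_col="label",                  # hypothetical column name
                    score_col="score",                  # hypothetical column name
                    match_label="Match"):
    """Copy rows labelled `match_label` from a CSV stream to a tab-separated
    three-column stream: preprint bibcode, published bibcode, score.

    `src` and `dst` are open text file objects. Returns the number of rows written.
    """
    reader = csv.DictReader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    written = 0
    for row in reader:
        # Skip anything that is not flagged as a match (missing label included).
        if (row.get(label_col) or "").strip() != match_label:
            continue
        writer.writerow([
            row[preprint_col].strip(),
            row[published_col].strip(),
            row[score_col].strip(),
        ])
        written += 1
    return written
```

Usage would be along the lines of opening the exported file and the output file with `open(..., newline="")` and passing both to `extract_matches`. Whether the output should be tab-separated is also an assumption; the delimiter should match whatever the -mf option expects.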

We can handle the first two items easily with changes to run.py, and there is a PR in progress that addresses them. The export-format issue will require a small amount of extra coding both in this repository and in the backoffice match scripts; we will need to update classic so that it expects a file that is already correctly formatted.
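One possible shape for the run.py change is to gate the Google Sheets upload and the Slack broadcast behind configuration flags that default to off. This is only an illustration and not what the in-progress PR does; `CurationConfig`, `post_match_steps`, and the `upload`/`notify` callables are all hypothetical names, not existing functions in this repository.

```python
from dataclasses import dataclass


@dataclass
class CurationConfig:
    # Both default to False so curation side effects stay off unless re-enabled.
    upload_to_sheets: bool = False
    notify_slack: bool = False


def post_match_steps(result_file, config, upload=None, notify=None):
    """Run optional post-matching side effects and return the names of those performed.

    `upload` and `notify` are callables supplied by the caller (hypothetical
    stand-ins for the existing Sheets and Slack helpers).
    """
    performed = []
    if config.upload_to_sheets and upload is not None:
        upload(result_file)
        performed.append("upload")
    if config.notify_slack and notify is not None:
        notify(f"docmatching finished: {result_file}")
        performed.append("notify")
    return performed
```

With the default `CurationConfig()`, neither step runs, which matches the current intent of having curation disabled while leaving the code paths in place should they be needed again.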
