@github-actions
Contributor

@github-actions github-actions bot commented Oct 7, 2024

Updating Google Docs Meta Data

  • addition of "Signal Set" column
  • addition of two chng signals: 7dav_inpatient_covid and 7dav_outpatient_covid
  • a bunch of fixes to extended ascii apostrophes and quotation marks (replaced with regular ascii equivalents)

The signal name for "covid_naat_pct_positive_7dav" was lost in an apparent accidental paste, but I fixed it here with a commit to the PR branch, and manually in the spreadsheet.

@sonarqubecloud

sonarqubecloud bot commented Oct 8, 2024

@melange396
Collaborator

It turns out that there are still non-ASCII characters in here (what we were calling "extended ascii" are actually Unicode characters). They can be found by running:

from collections import defaultdict

# count every non-ASCII character (code point above 127) in the file
highchars = defaultdict(int)
with open('db_signals.csv', encoding='utf-8') as f:
    for line in f:
        for char in line:
            val = ord(char)
            if val > 127:
                highchars[val] += 1

The current db_signals.csv file gives the following results:

>>> highchars
defaultdict(<class 'int'>, {8220: 9, 8217: 30, 8221: 9})
>>> chr(8220)
'“'
>>> chr(8221)
'”'
>>> chr(8217)
'’'
>>> 

I am not going to simply replace them in the file itself because of escaping concerns, so after merging this PR, I will replace them in the Google spreadsheet and then run the CSV sync utility (GH action) again.
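For reference, here is a minimal sketch of how the replacement could be done on CSV text without tripping over the escaping concern, by going through the csv module rather than a raw text substitution. This is not what was done for this PR (the fix is being made in the spreadsheet); the function name `replace_curly_quotes` and the mapping `CURLY_TO_ASCII` are hypothetical, and the mapping only covers the three code points found above.

```python
import csv
import io

# Only the three code points reported above; anything else is left untouched.
CURLY_TO_ASCII = str.maketrans({
    0x201C: '"',  # 8220, left double quotation mark
    0x201D: '"',  # 8221, right double quotation mark
    0x2019: "'",  # 8217, right single quotation mark
})

def replace_curly_quotes(csv_text):
    """Return csv_text with curly quotes replaced inside each cell.

    Cells are parsed and re-written by the csv module, so a cell that
    now contains a plain double quote gets quoted and escaped properly.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow([cell.translate(CURLY_TO_ASCII) for cell in row])
    return out.getvalue()
```

Note that the writer may change which cells are quoted compared to the input, so a file rewritten this way can differ cosmetically from what the sync utility would produce.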

@melange396 melange396 merged commit a9a2535 into dev Oct 8, 2024
7 checks passed
@melange396 melange396 deleted the bot/update-docs branch October 8, 2024 20:45
@melange396
Collaborator

In case it helps someone in the future, here's some ugly code that I used to help compare the two versions of these files:

import csv

def read_rows(path):
    # newline='' lets the csv module handle newlines embedded in quoted fields
    with open(path, newline='', encoding='utf-8') as f:
        return list(csv.reader(f))

dev = read_rows('dev__db_signals.csv')
new = read_rows('new__db_signals.csv')

def compare_rows(a, b):
    if len(a) != len(b):
        print("length mismatch")
    # zip stops at the shorter row, so a length mismatch cannot raise IndexError
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            print("    ", i, x.replace("\n", ""))
            print("    ", i, y.replace("\n", ""))

# iterate over `new` (two rows longer than `dev`) so the final rows are compared too
for i in range(len(new)):
    offset = 0
    if i in (7, 8):
        # skip added rows
        continue
    if i > 8:
        # account for added rows
        offset = 2
    n = new[i][:10] + new[i][11:]  # skip added column @ index 10
    d = dev[i - offset]
    if n != d:
        print(i)
        compare_rows(n, d)
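The comparison above hard-codes the added row indices and the added column index. A possible generalization, sketched here under the assumption that the added column can be identified by its header text (e.g. "Signal Set") and that rows can be aligned with difflib, is shown below. The helper names `load_rows` and `diff_rows` are hypothetical and were not part of the original comparison.

```python
import csv
import difflib
import io

def load_rows(csv_text, drop_column=None):
    """Parse CSV text into a list of row tuples, optionally dropping one column by header name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if drop_column is not None and rows and drop_column in rows[0]:
        idx = rows[0].index(drop_column)
        rows = [r[:idx] + r[idx + 1:] for r in rows]
    # tuples are hashable, which SequenceMatcher requires of its elements
    return [tuple(r) for r in rows]

def diff_rows(old_rows, new_rows):
    """Return (tag, old_slice, new_slice) for every non-equal region between the two row lists."""
    matcher = difflib.SequenceMatcher(a=old_rows, b=new_rows, autojunk=False)
    return [
        (tag, old_rows[i1:i2], new_rows[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Usage would look like `diff_rows(load_rows(dev_text), load_rows(new_text, drop_column="Signal Set"))`, where `dev_text` and `new_text` are the contents of the two files. Added rows show up with the tag "insert" and edited rows with "replace".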
