-
Notifications
You must be signed in to change notification settings - Fork 9
Ref/redcap #92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ref/redcap #92
Conversation
thank you! some (not complete)
this is some ontology mapping, if it's in the choice set, we keep it as choice? we can set choice label and value the same?
I think you're right, it should be different from 999, so string should be okay, but shouldn't
so |
@yibeichan - I forgot the change the way we deal with choices from You can test the code and the output |
for |
@djarecka i test it, looks reasonable. and yes, we will probably have trouble using UI for this one. but I guess, the purpose of our UI is to serve our own reproschema files than those converted from REDCap? |
roundtrip is one goal. but we do want reuse, so we need to understand either why our ui doesn't support it or what part of the schema doesn't support something. a specific set of examples would be helpful to review. |
0006-simplify-value-and-input-type-mapping-add-truefalse.patch
|
@satra here is an activity example, look at the
|
@djarecka btw i add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i moved all content from my patch here
except: | ||
raise ValueError(f"Invalid input for HTML parsing: {input_string}") | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_value_type(validation_type): | |
""" | |
Determine the XSD value type based on REDCap validation type | |
Args: | |
validation_type (str): Validation type from REDCap | |
Returns: | |
str: XSD value type for ReproSchema | |
""" | |
if validation_type is None: | |
return "xsd:string" | |
# Handle date and time formats with pattern matching | |
if validation_type.startswith("date_"): | |
return "xsd:date" | |
elif validation_type.startswith("datetime_"): | |
return "xsd:dateTime" | |
elif validation_type.startswith("time"): | |
return "xsd:time" | |
# For other types, use the mapping | |
return VALUE_TYPE_MAP.get(validation_type, "xsd:string") |
REDCAP_COLUMN_REQUIRED, | ||
RESPONSE_COND, | ||
VALUE_TYPE_MAP, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) | |
get_value_type) |
) | ||
|
||
|
||
def process_input_value_types(input_type_rc, value_type_rc) -> (str, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def process_input_value_types(input_type_rc, value_type_rc) -> (str, str): | |
def process_input_value_types(input_type_rc, value_type_rc) -> (str, str): | |
""" | |
Process input type and value type to determine the final input type and value type, | |
that can be used by ReproSchema. | |
Args: | |
input_type_rc (str): Input type from redcap form | |
value_type_rc (str): Value type from redcap form | |
Returns: | |
tuple: (input_type, value_type) | |
input_type (str): Final input type for ReproSchema | |
value_type (str): Final value type for ReproSchema | |
""" | |
# If input type in redcap is set but not recognized, raise an error | |
if input_type_rc not in INPUT_TYPE_MAP: | |
raise ValueError( | |
f"Input type '{input_type_rc}' from redcap is not currently supported, " | |
f"supported types are: {', '.join(INPUT_TYPE_MAP.keys())}" | |
) | |
elif input_type_rc: | |
input_type = INPUT_TYPE_MAP.get(input_type_rc) | |
if value_type_rc: | |
# Get value type using the new function | |
value_type = get_value_type(value_type_rc) | |
# Adjust input type based on validation | |
if value_type_rc.startswith("date") or value_type_rc.startswith("datetime"): | |
if input_type_rc == "text": | |
input_type = "date" | |
elif value_type_rc.startswith("time"): | |
if input_type_rc == "text": | |
input_type = "time" | |
elif value_type_rc == "integer" and input_type_rc == "text": | |
input_type = "number" | |
elif value_type_rc in ["float", "number"] and input_type_rc == "text": | |
input_type = "float" | |
elif value_type_rc == "email" and input_type_rc == "text": | |
input_type = "email" | |
elif value_type_rc == "signature" and input_type_rc == "text": | |
input_type = "sign" | |
elif input_type_rc == "yesno": | |
value_type = "xsd:boolean" | |
elif input_type_rc == "truefalse": | |
value_type = "xsd:boolean" | |
elif input_type_rc in COMPUTE_LIST: | |
value_type = "xsd:integer" | |
else: # if no validation type is set, default to string | |
value_type = "xsd:string" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yibeichan - we don't have time
as input_type
. We have only date
, year
and timeRange
: https://github.com/ReproNim/reproschema-ui/blob/389ccc5d82ee6befab3f58ac4524fd5de08d13a6/README.md?plain=1#L63
response_options["choices"] = [ | ||
{"name": {"en": "Yes"}, "value": 1}, | ||
{"name": {"en": "No"}, "value": 0}, | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
] | |
] | |
elif input_type_rc == "truefalse": | |
response_options["choices"] = [ | |
{"name": {"en": "True"}, "value": 1}, | |
{"name": {"en": "False"}, "value": 0}, | |
] |
INPUT_TYPE_MAP = { | ||
"calc": "number", | ||
"sql": "number", | ||
"yesno": "radio", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"yesno": "radio", | |
"yesno": "radio", | |
"truefalse": "radio", |
} | ||
|
||
# Map certain field types directly to xsd types | ||
VALUE_TYPE_MAP = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VALUE_TYPE_MAP = { | |
VALUE_TYPE_MAP = { | |
# Basic types | |
"text": "xsd:string", | |
"email": "xsd:string", | |
"phone": "xsd:string", | |
"signature": "xsd:string", | |
"zipcode": "xsd:string", | |
"autocomplete": "xsd:string", | |
# Numeric types | |
"number": "xsd:decimal", | |
"float": "xsd:decimal", | |
"integer": "xsd:integer", | |
# Date and time types will be handled by pattern matching in process_input_value_types | |
# These entries are kept for backward compatibility | |
"date_": "xsd:date", | |
"time_": "xsd:time", | |
} |
reproschema/redcap2reproschema.py
Outdated
|
||
df = pd.read_csv( | ||
csv_file, encoding="utf-8-sig" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
csv_file, encoding="utf-8-sig" | |
csv_file, encoding="utf-8-sig", low_memory=False |
see: https://www.ctsi.ufl.edu/files/2017/06/Repeating-Instruments-and-Events-1.pdf (events are part of a longitudinal setup) |
thanks, makes sense. but in the csv we have examples in branching logic like
look at here we have variables like also, the ontology one, we currently set both the value and label as it, but obviously, it shouldn't be present as text.
|
@satra - I put several questions on slacks and here, but I'm happy to review later of all the things that I believe would not work properly in UI or just UI would not represent everything what I believe author had in mind (although sometimes it's hard to figure out). I also tried to put warnings in the code. |
is |
i think those are categorical values (i.e. anything from those ontologies would work), but it looks like that's for validation and the test input is free form. |
@satra okay, i kind of understand what you mean, but if it's for "validation", then it means our UI should be capable of validating input in real time, right? Would we need such ontology to be part of our UI? what's the action item for this current PR? |
yes, I mentioned earlier that this choices are weird, since they have |
i suspect so.
reproschema currently does not have a clear option that linkml does for range. we would need to introduce something like that. i was able to say this because of domain knowledge. this would be very hard to determine from the redcap schema without using some semantic+agentic approach. (redcap allows many custom schemas). here is an example where we use a set of choices: https://github.com/ReproNim/reproschema-library/blob/25162084b702505cd7b6729ab569bd17c144213b/activities/demographics_and_background_information_v1/items/country_of_birth#L17 btw, what does redcap do when that schema is imported? that would be a way to check as well whether it is a wellformed schema. |
|
Local test branch
for more information, see https://pre-commit.ci
…now we only have input type for date)
…hanges; updates to redcap2rs: adding activity preamble, adding description for calculate; fixing tests
@satra @yibeichan - list of issues, that I believe would prevent from running the schema using reproschema-ui (without testing, just based on my understanding the schema and ui). List is a collection of issues that I mentioned in various places and was able to find now, so might be not complete Choices / responses options
Matrix Info and Preamble
Input types and value types
Ignoring columnsThese columns I don't map in any meaningful way to reproschema, so just adding to the
|
let's file away components that cannot be easily mapped, but please keep the following principles in play.
perhaps to help wrap this up file anything that cannot be addressed as new issues with proper description of what's not available (schema, ui, etc.,.), and relax evaluation of matching to specific columns, etc.,. for round trips. |
But how can I know what are the allowed value for Do you want me just add
@yibeichan - any chances you checked if the file is a valid redcap schema? If not, can you do it?
I understand. But do you want to do it now, or just record the ontology value that was used.
But we don't have
In general, it seems to me that I was not aware what is the real task here. In terms of redcap, that shouldn't dictate the form, I thought the task was mostly to record all redcap information to later track changes, so I was not questioning the files. I also never used redcap, and don't have an account. |
hi, thanks to both of you for the discussion!! Our principle for conversion is to retain as much information as possible, prioritizing important elements such as questions and response choices. The UI serves the schema, not the other way around. If, during conversion, we encounter elements not available in the UI, we should first assess whether their schema logic is valid. If it is, we add them to the schema first, and the UI can be updated later. For example, Regarding As for checking whether a file is a valid REDCap schema, a CSV file may contain different forms serving different purposes—some for data collection, others as data elements. I can first perform the redcap2reproschema conversion and then determine the purpose of the generated activities. For now, we can merge this PR and distribute the remaining issues, questions, and discussion points into separate issues. |
ok, so I will not be making any changes now, I will create issues based on the discussion, and please also add to the issue if I miss something, since there are many things that are lost in the discussion and it would be good if we do update the schema/code at some point |
test_field_property.py
(moved some tests from test_redcap2rs) andtest_process_csv
NOTES:
There are still some things that are not solved:
what to do with
BIOPORTAL:RXNORM
in choices (perhpas I can treat it as other fields that we do not support: just add to the notes..?)choices can have
…| 998, 99.8 | 0999, 99.9 | 1000, 100
, and the value0999
is interpreted as a string by the converter. I thought for moment that perhaps this is a typo, but I guess this might be done to distinguish from value for “I don’t know” that has value999
, so I keep it as a string, but not sure if this is finecurrently
test_rs2redcap_redcap2rs.py
doesn't work, since we used to havepreamble
foractivity
, but not sure how we want to deal with it based on the example from hbn, should we call teh firstdescriptive
item as preamble toactivity