transform patient data to time interval feature vectors

SamoraHunter/pat2vec

pat2vec

Converts individual patient data into time interval feature vectors, suitable for filtering and concatenation into a data matrix D for binary classification machine learning tasks.

Example use case 1: I aim to compute the mean of n variables for each unique patient, resulting in a single row representing each patient.
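As a rough illustration of use case 1 (not pat2vec's own implementation), the sketch below collapses several observation rows into one row of means per patient using only the standard library. The column name client_idcode comes from the Usage section of this README; the function name, variable names, and values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_per_patient(rows, id_key="client_idcode"):
    """Collapse many observation rows into one row of means per patient."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name, value in row.items():
            # Skip the ID column itself and any missing measurements.
            if name != id_key and value is not None:
                grouped[row[id_key]][name].append(value)
    return {
        patient: {name: mean(values) for name, values in variables.items()}
        for patient, variables in grouped.items()
    }


# Hypothetical observations: two patients, two variables.
rows = [
    {"client_idcode": "P1", "sodium": 138.0, "potassium": 4.0},
    {"client_idcode": "P1", "sodium": 142.0, "potassium": 5.0},
    {"client_idcode": "P2", "sodium": 135.0, "potassium": None},
]
result = mean_per_patient(rows)
```

A variable with no recorded values for a patient is simply absent from that patient's row; how pat2vec itself represents missing values is not covered here.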

Example use case 2: I intend to generate a monthly time series of patient data comprising biochemistry, demographic details, and textual annotations (MedCAT annotations) spanning the last 25 years. Each patient's series is anchored at that patient's own start date (the diagnosis date), providing a retrospective view.
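To make the idea of per-patient monthly intervals concrete, here is a minimal sketch that lists calendar-month windows looking back from a patient's start date. It assumes windows aligned to calendar months with an exclusive end date; pat2vec may define its intervals differently, and the function names are hypothetical.

```python
from datetime import date


def shift_months(d, months):
    """Return the first day of the month `months` away from d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def monthly_windows_back(start, n_months):
    """List (window_start, window_end) pairs, newest first; window_end is exclusive."""
    anchor = shift_months(start, 1)  # first day of the month after `start`
    return [
        (shift_months(anchor, -(i + 1)), shift_months(anchor, -i))
        for i in range(n_months)
    ]


# Hypothetical diagnosis date; 25 years of history would be n_months=300.
windows = monthly_windows_back(date(2020, 3, 15), 3)
```

Each patient gets their own list of windows because each call starts from that patient's own start date.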

Notable requirements:

See requirements.txt

Features:

  • Single patient
  • Batch patient
  • Cohort search and creation
  • Automated random controls
  • Modular feature space selection
  • Look back
  • Look forward
  • Individual patient time windows

Installation

Windows:

  1. Clone the repository (first cd to gloabl_files):

    git clone https://github.com/SamoraHunter/pat2vec.git
    cd pat2vec

    Run the installation script:

    install.bat
  2. Add the pat2vec directory to the Python path:

    Before importing pat2vec in your Python script, add the following lines to the script, replacing /path/to/pat2vec with the actual path to the pat2vec directory inside your project:

    import sys
    sys.path.append('/path/to/pat2vec')
  3. Import pat2vec in your Python script:

    import pat2vec
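Step 2 can be wrapped in a small guard so that a wrong path fails immediately instead of surfacing later as an import error. This is only a sketch: add_to_python_path is a hypothetical helper, not part of pat2vec, and /path/to/pat2vec remains a placeholder.

```python
import sys
from pathlib import Path


def add_to_python_path(directory):
    """Append `directory` to sys.path once, failing early if it does not exist."""
    resolved = Path(directory).expanduser().resolve()
    if not resolved.is_dir():
        raise FileNotFoundError(f"Not a directory: {resolved}")
    if str(resolved) not in sys.path:
        sys.path.append(str(resolved))
    return resolved


# Example (placeholder path, as in step 2 above):
# add_to_python_path("/path/to/pat2vec")
# import pat2vec
```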

Unix/Linux:

Option 1: Install All Requirements Automatically

This option installs pat2vec along with its dependencies, including:

  • pat2vec_env (virtual environment)
  • snomed_methods
  • cogstack_search_methods
  • clinical_note_splitter

Before running the installation, ensure you:

  • Place the model pack in the appropriate directory gloabl_files/medcat_models/%modelpack%.zip
  • Populate the credentials file under gloabl_files/credentials.py
  • (Optional) Add a SNOMED file if needed. Path components: gloabl_files/.. 'snomed', 'SnomedCT_InternationalRF2_PRODUCTION_20231101T120000Z', 'SnomedCT_InternationalRF2_PRODUCTION_20231101T120000Z', 'Full', 'Terminology', 'sct2_StatedRelationship_Full_INT_20231101.txt'
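The checklist above can be verified before launching the installer. The sketch below is an assumption-laden helper, not part of pat2vec: it treats any .zip file under medcat_models/ as the model pack, checks for credentials.py, and leaves the optional SNOMED file out.

```python
from pathlib import Path


def missing_prerequisites(global_files_dir):
    """Return the prerequisite paths that are absent under the gloabl_files directory."""
    base = Path(global_files_dir)
    missing = []
    # Assumption: any .zip under medcat_models/ counts as the model pack.
    models_dir = base / "medcat_models"
    if not (models_dir.is_dir() and any(models_dir.glob("*.zip"))):
        missing.append(models_dir / "*.zip")
    # The credentials file must exist; its contents are not inspected here.
    credentials = base / "credentials.py"
    if not credentials.is_file():
        missing.append(credentials)
    return missing


# Example (placeholder path):
# for path in missing_prerequisites("/path/to/gloabl_files"):
#     print(f"Missing: {path}")
```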

Installation Steps:

  1. Copy the install_pat2vec.sh file to your installation directory.

  2. Grant execution permissions:

    chmod +x install_pat2vec.sh
  3. Run the installation using one of the following options:

    • Standard installation:
      ./install_pat2vec.sh
    • Installation with proxy mirror support:
      ./install_pat2vec.sh --proxy
    • Install to a specific directory:
      ./install_pat2vec.sh --directory /path/to/install
    • Skip cloning repositories (if already cloned manually):
      ./install_pat2vec.sh --no-clone

Repositories Installed by This Script:

The script will clone the following repositories:


Option 2: Manual Installation

  1. Clone the repository:

    git clone https://github.com/SamoraHunter/pat2vec.git

    cd pat2vec

    Run the installation script (requires python3 on the PATH and venv):

    chmod +x install.sh
    ./install.sh

  2. Add the pat2vec directory to the Python path:

    Before importing pat2vec in your Python script, add the following lines to the script, replacing /path/to/pat2vec with the actual path to the pat2vec directory inside your project:

    import sys
    sys.path.append('/path/to/pat2vec')
  3. Import pat2vec in your Python script:

    import pat2vec

Usage:

  • Set paths: gloabl_files/medcat_models/modelpack.zip, gloabl_files/snomed_methods, gloabl_files/..

  • gloabl_files/

    • medcat_models/
      • modelpack.zip
    • snomed_methods/snomed_methods_v1.py**
    • pat2vec/
    • pat2vec_projects/
      • project_01/
        • example_usage.ipynb
        • treatment_docs.csv*

*treatment_docs.csv should contain a column 'client_idcode' with your UUIDs.
**https://github.com/SamoraHunter/SNOMED_methods.git
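Before running the notebook, it can help to confirm that treatment_docs.csv really has the expected column. The following is a minimal standard-library sketch; read_patient_ids is a hypothetical helper, and only the column name client_idcode comes from this README.

```python
import csv


def read_patient_ids(csv_path, column="client_idcode"):
    """Return the unique, non-empty IDs from `column`, preserving file order."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise ValueError(f"{csv_path} has no '{column}' column")
        seen = {}  # dict keys keep insertion order and drop duplicates
        for row in reader:
            value = (row[column] or "").strip()
            if value:
                seen.setdefault(value, None)
    return list(seen)


# Example (file name from the project layout above):
# ids = read_patient_ids("treatment_docs.csv")
```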

  • Configure options

  • Open example_usage.ipynb and run all cells.

  • Examine example_usage.ipynb for additional functionality and use cases.

  • If testing in a live environment, ensure the testing flag is set to False in the config_obj.

Contributing

Contributions are welcome! Please see the contributing guidelines for more information.

License

This project is licensed under the MIT License. See the LICENSE file for details.
