Skip to content

Latest commit

 

History

History
84 lines (64 loc) · 10.4 KB

04.use-cases-b.md

File metadata and controls

84 lines (64 loc) · 10.4 KB

Hosting a website{#hosting-a-website}

Personal or laboratory websites can improve the sharing of research findings, build online presence, and increase coordination of research efforts [@doi:10.1038/nj7142-347a]. Despite researchers in ecology and evolution generally lacking experience in building or hosting webpages, many tools have been developed to help this process. Static websites can now be easily built [using, for example Quarto (https://quarto.org), RMarkdown, Hugo (https://gohugo.io), GitHub Website templates (https://github.com/topics/website-template)), stored in a repository, and be readily hosted by activating GitHub's Pages (https://pages.github.com) feature. Creating and hosting websites on GitHub Pages is more complex than out-of-the-box platforms (e.g., Wix, Weebly, Google Sites). However, free hosting, widely available template customization, and versioning are strong advantages over alternatives.

Archiving citable code and data{#archiving-citable-code}

Government, funding agencies, and publishers exercise rigorous open access data policies and mandates [@doi:10.1371/journal.pone.0229003; @doi:10.1108/TG-03-2014-0008]. However, code and data sharing may be met by individual reluctance, temporary embargoes, or partially prevented by privacy and confidentiality reasons [@doi:10.1371/journal.pone.0026828; @doi:10.3389/fpubh.2017.00327; @doi:10.1371/journal.pone.0134826]. Still, data deposition and ensuring its availability can amplify the outreach of published studies [@doi:10.7717/peerj.1242], increase citation rates [@doi:10.1371/journal.pone.0000308], and among many other reasons, enable the reproducibility and robustness of scientific advances [@doi:10.1038/ncb3506; @doi:10.1016/j.tree.2015.11.006; @doi:10.1038/533452a]. While public repositories on GitHub make it easy to store and share data files, they are not considered long-term repositories for research materials. This is because GitHub, a for-profit company, does not have long-term data availability guarantees, allowing users to delete or make repositories private after publication. Also, GitHub does not issue Digital Object Identifiers (DOI) for content uploaded to their servers. DOIs are persistent and citable unique alpha-numeric identifiers assigned to digitally-stored research materials. Because of this, scientists sharing code and data through GitHub are strongly encouraged to independently submit their research materials to long-term data archives (e.g., Zenodo, Figshare, Dryad, OSF [@doi:10.1038/538127a; @doi:10.1371/journal.pcbi.1004947; @doi:10.1029/2021EA001797]; Table 1). Some of these options (Zenodo, Figshare and OSF) integrate with GitHub, allowing project, code, and data releases (Box 1) to be archived with versioned, citable DOIs. Linking GitHub repositories with a DOI helps research become findable, properly cited, and can ensure long-term stability [@doi:10.1890/ES14-00402.1]. This strategy has been increasingly adopted in numerous studies in ecology and evolution (e.g., the Zenodo repositores @10.5281/zenodo.6097109, @doi:10.5281/zenodo.1188710; the Dryad and OSF repositories @doi:10.5061/dryad.rjdfn2zgj and @doi:10.17605/OSF.IO/AMVP5).

An important aspect of making code and data citable and reusable is to add an appropriate license to protect intellectual property. Code published without a license is under exclusive copyright (by default), protecting it from copy, distribution, and modifications. One may grant specific rights to their code for reuse by adding licensing files and specifications within GitHub repositories [@url:https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository]. The Choose a License (https://choosealicense.com/non-software/) website offers further guidance on the licenses available for research and creative products. For example, Creative Commons (CC; https://creativecommons.org/licenses/) licenses can specify that shared code is intended for a specific analysis. A CC BY 4.0 license specifies that any code (or other creative products) must be appropriately credited to its original author when distributed, adapted or reused.

Collaborative and asynchronous code editing{#code-editing}

GitHub can serve as a platform for everyone working with research (e.g., supervisors or advisors, graduate students, postdoctoral fellows, and collaborators) to share in-progress work, and flag specific challenges or questions for each other (Table 2). Periodic code, data and text review is useful for identifying errors early in the research process [@doi:10.1145/3341525.3387370], and informing further training and mentorship to fill gaps in skills. This is facilitated by a group of core features of Git and GitHub that allow contributors to discuss and simultaneously work on a project. For instance, users can clone and fork repositories (see Box 1), allowing for modifications to be done on a linked copy of a repository, which can then later be merged into the main project through pull requests. Collaborators can comment on specific lines of code and text or suggest changes, which can then be incorporated with the click of a button, greatly facilitating peer review. Explicit project organization and increased communication within pull requests, in GitHub Issues, or in GitHub Discussions can help with project development and with potential merge conflicts due to users simultaneously working on the same sections. Moreover, version control allows researchers to make changes in code or documents without worrying about irreparably modifying someone else's work (#storing-sharing).

By enabling more comprehensive remote collaboration, GitHub encourages the exchange of ideas among researchers at different institutions and in different countries, which can serve to improve the quality of the research itself by providing open access to data and code.

Writing a manuscript{#writing-manuscript}

Beyond supporting collaborative code development, GitHub can be used for writing manuscripts. Writing a manuscript and storing its associated data and code in GitHub increases scientific reproducibility because text, code, and data can be found in one place. Although it may involve more initial time investment for setup, GitHub has many features that support a powerful collaborative workflow when writing manuscripts [@doi:10.1186/1751-0473-8-7]. Text documents stored and versioned in GitHub can be instantly displayed when written in Markdown, a lightweight markup language increasingly popular among scientists. When manuscripts are written in Markdown on GitHub, co-authors can contribute changes or suggest revisions to a manuscript in the same ways they would contribute changes or suggest revisions to code (see Collaborative and asynchronous code editing). Relevant literature or issues can be suggested using the Discussions and Issues features.

Incorporating GitHub into the process of writing a manuscript does not necessarily mean pivoting to an entirely new workflow. For instance, authors who prefer writing in LaTeX or Markdown can synchronize Overleaf [@url:https://www.overleaf.com/learn/how-to/Using_Git_and_GitHub], HackMD (https://hackmd.io), and other platforms directly with a GitHub repository. Similar GitHub integrations are available for projects stored in DropBox, Google Drive, among other popular tools familiar to many scientists and their collaborators (Table 1).

We wrote this manuscript using Manubot, a modifiable workflow implemented in GitHub to automatically render manuscripts and automate bibliographical tasks [@doi:10.1371/journal.pcbi.1007128]. Manubot uses GitHub's automation workflow, GitHub Actions, to combine and convert individual Markdown files into a single LaTeX document, which can then be converted to a Word or PDF document, and displayed as a webpage. Citations and bibliographic references are automatically managed with citable persistent identifiers (e.g., DOI, PubMed ID, ISBN, URL). The resulting manuscript can be rendered with document templates and citation style language formatting to meet journal formatting requirements. Every change made to the manuscript triggers its rendering, so that updates are readily displayed and made publicly available. Additional GitHub Actions can be integrated with Manubot, such as ones creating figures or generating tables (e.g., https://github.com/SORTEE-Github-Hackathon/manuscript/tree/main/.github/workflows).

Peer review{#peer-review}

Peer review is the standard process for assessing whether research done in ecology and evolution should be published in a scientific journal. GitHub provides an open and transparent platform that can be used for either directly providing feedback on research products or addressing changes recommended by reviewers. GitHub Issues can be used to organize and discuss reviewer suggestions and to assign them to co-authors (e.g., https://github.com/SORTEE-Github-Hackathon/manuscript/issues?q=label%3A%22Reviewer+Comment%22). When reviewer comments are posted as separate issues, authors can comment on the issues to discuss possible changes and assign co-authors who will address the issue. Co-authors can then integrate their edits and responses to reviewers using pull requests, which can be directly linked to the issues they address.

GitHub can also assist reviewers during the peer review process. If the code associated with a manuscript is made available at the time of submission (e.g., as a link to a GitHub repository within the Data Availability Statement), peer reviewers may be able to offer more comprehensive suggestions on the code and written materials, potentially recognizing errors before publication. Certain journals or software development communities require submitted work or research code to be hosted on GitHub and their review processes make use of GitHub Issues (e.g., rOpenSci (https://ropensci.org/software-review/), Journal of Open Source Software (https://joss.readthedocs.io/en/latest/submitting.html).