Bio::ToolBox - Tools for querying and analysis of genomic data

DESCRIPTION

This is a collection of libraries and high-quality end-user 
scripts for bioinformatic analysis, including working with gene 
annotation, collecting data scores from a variety of modern file 
formats, and conversion between file formats. 

The Bio::ToolBox libraries provide a unified, abstracted interface 
to multiple common gene annotation formats and the collection of data 
from multiple data files. They rely on BioPerl SeqFeature libraries 
and related adaptors to access binary file formats including Bam, 
BigWig, BigBed, and USeq. 

The Bio::ToolBox package includes scripts for setting up databases 
of annotation, collecting annotated features, collecting genomic data 
relative to features, manipulating and analyzing data, and data format 
conversion. 



REQUIREMENTS

These are Perl modules and scripts. They require Perl and a 
command-line environment. They have been developed and tested on Mac 
OS X and Linux. Microsoft Windows compatibility has not been tested, 
but most functionality should work.



INSTALLATION

Installation uses the standard Perl Module::Build sequence:

    perl ./Build.PL
    ./Build installdeps     # if necessary
    ./Build
    ./Build test
    ./Build install

Released versions may be obtained through the CPAN repository using 
your favorite package manager. For a quick installation, the following 
command uses the system perl to install cpanminus, local::lib, and 
Bio::ToolBox into a personal Perl library in your home directory.

	curl -L http://cpanmin.us | perl - local::lib App::cpanminus Bio::ToolBox



ADDITIONAL MODULES

To keep the installation as lean and simple as possible, only a minimal 
set of additional Perl modules is required; the remainder are only 
recommended and can be installed later as the need arises. Most of the 
database adapters, including those for Bam, BigWig, and BigBed, depend 
on external libraries that must be compiled separately. See the 
respective modules for installation instructions.

Most scripts should fail gently with warnings about missing modules.




USAGE OF PROVIDED SCRIPTS

* Configuration *
There is a configuration file that may be customized for your particular
installation. The default file is written to ~/.biotoolbox.cfg. It is a simple
INI-style file that is used to set up database connection profiles, feature
aliases, helper application locations, etc. The file may be edited by users. 
More documentation can be found in the Bio::ToolBox::db_helper::config 
documentation. The file is written automatically as needed; the 
installer does not create it.
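
Because the file is INI-style, a single setting can be inspected from the 
shell without loading Perl. The following is a minimal sketch using awk. 
The function name ini_get is hypothetical, as are the section and key 
names in the usage comment; the actual layout of the file is described in 
the Bio::ToolBox::db_helper::config documentation.

```shell
# Print the value of KEY inside [SECTION] of an INI-style file.
# Usage: ini_get FILE SECTION KEY
# ini_get is an illustrative helper, not part of Bio::ToolBox.
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*[#;]/ { next }                # skip comment lines
        /^[ \t]*\[/ {                         # section header line
            name = $0
            gsub(/^[ \t]*\[|\][ \t]*$/, "", name)
            in_section = (name == section)
            next
        }
        in_section {
            pos = index($0, "=")              # split at the first "="
            if (pos == 0) next
            k = substr($0, 1, pos - 1)
            v = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}

# Hypothetical usage; the section and key names are assumptions:
#   ini_get "$HOME/.biotoolbox.cfg" some_section some_key
```

The function prints nothing when the section or key is absent, so an empty 
result can be tested for in a calling script.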

* Execution *
All biotoolbox scripts are designed to be run from the command line or
executed from another script. Some programs, for example
manipulate_datasets.pl, also provide an interactive interface to allow for
spontaneous work or when the exact index number or name of the dataset in
the file or database is not immediately known.

* Help *
All scripts require command line options for execution. Executing the
program without any options will present a synopsis of the options that are
available. Most programs also have a --help option, which will display
detailed information about the program and execution (usually by displaying
the internal POD). The options are given in the long format (--help, for
example), but may be shortened to single letters if the first letter is
unique (-h, for example).

* File Formats *
Many of the programs are designed to read and write a tab-delimited 
text format (unix line endings), where the rows represent genomic features, 
bins, etc. and the columns represent descriptive information and data. 
Metadata about each column are recorded in header lines at the beginning 
of the file, each prefixed by a # symbol. The first line of the table 
itself contains the column headings. The files may be compressed with 
gzip. More information may be found in the Bio::ToolBox::Data documentation.
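
The layout described above is simple enough to inspect with standard shell 
tools. The sketch below skips the # metadata lines, then prints the column 
headings or counts the data rows; it decompresses the file first when its 
name ends in .gz. The function names and the file name in the usage 
comment are hypothetical, and Bio::ToolBox::Data remains the authoritative 
description of the format.

```shell
# Write a file to standard output, decompressing it if the name ends in .gz.
read_table() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac
}

# Print the column headings, one per line.
table_columns() {
    read_table "$1" | awk -F '\t' '
        /^#/ { next }                         # skip metadata lines
        { for (i = 1; i <= NF; i++) print $i; exit }
    '
}

# Count the data rows, i.e. the lines after the column-heading line.
table_rows() {
    read_table "$1" | awk '
        /^#/ { next }                         # skip metadata lines
        { n++ }
        END { print (n > 0 ? n - 1 : 0) }
    '
}

# Hypothetical usage:
#   table_columns my_data.txt.gz
#   table_rows my_data.txt.gz
```

This assumes that every metadata line begins with # in the first column and 
that no data row does.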



PROJECT WEBSITE

The BioToolBox project repository may be found at
https://github.com/tjparnell/biotoolbox. 


