References
This page lists the scientific and technical references discussed in the BfxPM documentation and development plan, including international standards for reproducible research and data management.
International Guidelines & Standards
- The Turing Way - The leading international handbook for reproducible, ethical, and collaborative data science.
- ELIXIR Research Data Management (RDM) - Best practices for life science data management, emphasizing the separation of raw data and results.
- Software Carpentry (Project Organization) - Foundations of structured computational research.
- nf-core Project Structure - Guidelines for scalable, reproducible bioinformatics pipelines.
- Snakemake Best Practices - Structural recommendations for modular workflow management.
- FAIR Research Software (FAIR4RS) - International standards for making software Findable, Accessible, Interoperable, and Reusable.
- Wilkinson et al. (2016) - The foundational paper on the FAIR Guiding Principles for scientific data management and stewardship.
- Gentleman et al. (2004) - Bioconductor - Foundational concepts for open-source software in bioinformatics.
CLI & Software Design Philosophy
- Command Line Interface Guidelines - An open-source guide to help you write better command-line programs.
- Rich Documentation - The library powering BfxPM's visual terminal experience.
- Typer Documentation - The framework enabling our modular and type-safe CLI structure.
Sequence Data Compression
- Academic OUP (Lossless Genomic Compression) - General overview of essential lossless strategies.
- PMC6662292 (Specialized Compressors) - Performance comparisons of specialized genomic compressors vs. gzip.
- IEEE Spectrum (The Desperate Quest for Genomic Compression) - Rationale for domain-specific algorithms.
- Nature Scientific Reports (Lossless Compression of FASTQ Files)
- NanoSpring (Long-read FASTQ Compression) - Specialized tool for Nanopore and PacBio data.
- EBI Sequence File Formats
- Biostars Thread (FASTQ vs CRAM vs Genozip)
- Illumina FASTQ ORA Format
Aligned Data (BAM/CRAM)
- Bioinformatics Stack Exchange (Best Archive Practices)
- CRAM Specification (GA4GH Standard)
- HTSeq Read Mapping Documentation
Raw Signal Data (FAST5/POD5/SLOW5)
- Oxford Nanopore Beginner's Guide to Formats
- SLOW5 & BLOW5 Documentation - Open-source lossless alternative for raw signal recording.
- VBZ Compression Plugin - Lossless squiggle signal compression within HDF5.