Introduction
BfxPM (Bioinformatician's Project Manager) is an active command-line interface (CLI) manager designed specifically for the dynamic and often disorganized nature of computational biology research.
Why BfxPM?
While general scaffolding tools (like Cookiecutter) excel at Day 1 initialization, they offer little support for the long-term maintenance of a project. BfxPM is designed to address the "Day 365" problem:
- Ongoing Hygiene: Route loose FASTQ, BAM, and script files that accumulate over time.
- Scientific Attribution: Automatic generation of CITATION.cff and ORCID ID integration, ensuring you and your team get credit.
- Reproducibility: Integrated scaffolding for Conda, Docker, and Singularity from the start.
- Resource Management: Tools to identify and compress massive biological datasets that would otherwise deplete server storage.
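To make the Scientific Attribution point concrete, here is a minimal sketch of what generating a CITATION.cff file with ORCID iDs can look like. It is not BfxPM's own code: the function name `build_citation_cff`, its parameters, and the sample author are assumptions made for illustration, and the ORCID iD shown is only a placeholder. The field names follow the Citation File Format 1.2.0 schema.

```python
def build_citation_cff(title, authors, version="0.1.0"):
    """Return the text of a minimal CITATION.cff file (CFF 1.2.0).

    `authors` is a list of dicts with the keys 'given' and 'family'
    and an optional 'orcid' key holding a bare ORCID iD.
    This helper is hypothetical and does no YAML escaping, so values
    containing double quotes would need extra handling.
    """
    lines = [
        "cff-version: 1.2.0",
        'message: "If you use this software, please cite it as below."',
        f'title: "{title}"',
        f'version: "{version}"',
        "authors:",
    ]
    for author in authors:
        lines.append(f'  - family-names: "{author["family"]}"')
        lines.append(f'    given-names: "{author["given"]}"')
        if author.get("orcid"):
            # CFF stores the ORCID iD as a full https://orcid.org/ URL.
            lines.append(f'    orcid: "https://orcid.org/{author["orcid"]}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder author and ORCID iD, for illustration only.
    print(build_citation_cff(
        "demo-project",
        [{"given": "Ada", "family": "Example", "orcid": "0000-0002-1825-0097"}],
    ))
```

Writing the returned text to a file named CITATION.cff at the repository root is what lets hosting platforms and citation tools pick it up.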
Installation
BfxPM is cross-platform and can be installed via the two primary scientific package managers:
Via Pip:
pip install bfxpm
Via Conda (Bioconda):
conda install jd2112::bfxpm
Standard Alignment
BfxPM is built around established reproducibility standards. Its directory structures and workflows are aligned with:
- The Turing Way: The leading international handbook for reproducible data science.
- ELIXIR & Software Carpentry: Best practices for the separation of Raw Data from Mutable Scripts.
- nf-core / Snakemake: Industry-standard directory hierarchy for streamlined pipeline deployment.
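The separation of raw data from mutable scripts can be illustrated with a small routing sketch. This is not BfxPM's implementation: the rule table, the destination folders (data/raw, data/processed, scripts, logs), and the function names are all assumptions chosen for the example. It plans the moves first and only then applies them, so the plan can be reviewed before any file is touched.

```python
import shutil
from pathlib import Path

# Hypothetical routing rules: (filename suffix, destination subfolder).
# Longer suffixes come first so ".fastq.gz" is matched before ".gz"-like rules.
ROUTES = [
    (".fastq.gz", "data/raw"),
    (".fastq", "data/raw"),
    (".bam", "data/processed"),
    (".py", "scripts"),
    (".sh", "scripts"),
    (".log", "logs"),
]


def plan_routes(project_root):
    """Return (source, destination) pairs for loose files in the project root."""
    root = Path(project_root)
    plan = []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue  # only top-level loose files are considered
        for suffix, folder in ROUTES:
            if path.name.lower().endswith(suffix):
                plan.append((path, root / folder / path.name))
                break  # first matching rule wins
    return plan


def apply_routes(plan):
    """Move each planned file, never overwriting an existing destination."""
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.exists():
            continue  # leave the source in place rather than clobber a file
        shutil.move(str(src), str(dst))


if __name__ == "__main__":
    for src, dst in plan_routes("."):
        print(f"{src} -> {dst}")  # dry run: print the plan without moving anything
```

Files that match no rule stay where they are, which keeps the sketch conservative by default.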
Core Capabilities
- Intelligent Initialization (bfxpm init):
  - Sets up an industry-standard directory structure.
  - Automatic Git Scaffolding: Generates open-source essentials (LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, CITATION.cff) along with GitHub workflow boilerplates for CI and versioning.
- Dynamic Organization (bfxpm organize / bfxpm map / bfxpm modify):
  - Iteratively route scattered FASTQ files, scripts, and logs into their correct subfolders using smart rules.
- Environment & Pipeline Scaffolding (bfxpm env / bfxpm pipeline):
  - Instantly spawn environment.yml, Singularity.def, Snakefile, or main.nf templates directly into your project scripts.
- Data Acquisition (bfxpm fetch):
  - Hook into SRA (via fastq-dump) to cleanly fetch and automatically route raw data into data/raw_external/.
- Project Hygiene (bfxpm clean / bfxpm compress):
  - Smart Clean: Finds giant *.sam or *.tmp files.
  - Smart Compress: Scans specifically for biological formats (.fastq, .fast5, .pod5, .hifi) and batches them into tar.gz archives.
  - Archive: Move old datasets to a managed archive/ directory.
- Reporting & Tracking (bfxpm report / bfxpm history):
  - Generate human-readable summaries (PROJECT_SUMMARY.md) and machine-readable metadata (.json, .yml) accounting for project size, sample counts, and git history.
- Agentic AI Integration (bfxpm ai):
  - BioAssistant: A project-aware AI agent that helps with organization, research questions, and automated management.
  - Safety-First: Integrated safety interceptors for destructive actions and support for local execution (Ollama).
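As a rough illustration of the Project Hygiene steps listed above (finding bulky sequence files and batching them into a tar.gz archive), the sketch below uses only the Python standard library. It is not BfxPM's code: the function names, the `min_bytes` threshold, and the archive name are assumptions, while the suffix list is taken from the Smart Compress entry. Originals are deliberately left in place so that nothing is deleted.

```python
import tarfile
from pathlib import Path

# Suffixes named in the Smart Compress entry above.
SEQ_SUFFIXES = (".fastq", ".fast5", ".pod5", ".hifi")


def find_files(root, suffixes, min_bytes=0):
    """Recursively list files under `root` with a matching suffix and minimum size."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.name.lower().endswith(suffixes)
        and p.stat().st_size >= min_bytes
    )


def compress_batch(files, archive_path, root):
    """Pack `files` into one gzip-compressed tar archive, keeping relative paths."""
    root = Path(root)
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        for f in files:
            # Store paths relative to the project root, not absolute paths.
            tar.add(f, arcname=Path(f).relative_to(root).as_posix())
    return archive_path


if __name__ == "__main__":
    # Hypothetical usage: report sequence files of 1 GiB or more in the current project.
    for path in find_files(".", SEQ_SUFFIXES, min_bytes=1024 ** 3):
        print(path, path.stat().st_size)
```

Keeping discovery (`find_files`) separate from the archiving step makes it easy to review the candidate list before anything is written.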
Quick Start
Initialize your scholarly project:
bfxpm init
Clean up and compress your sequence data:
bfxpm clean compress
Generate a detailed status report:
bfxpm report
For more details and further functionality, please refer to the usage documentation and the commands & parameters documentation.
Credits
BfxPM was developed by Jyotirmoy Das to streamline active bioinformatics research and analysis. It integrates smoothly into High Performance Computing (HPC) environments using standard Python tooling.