BfxPM (Bioinformatician's Project Manager) is a command-line interface (CLI) tool for actively managing computational biology projects, where research workflows are dynamic and often disorganized.

Why BfxPM?

While general scaffolding tools (like Cookiecutter) excel at Day 1 initialization, they offer little support for the long-term maintenance of a project. BfxPM is designed to address the "Day 365" problem:

  • Ongoing Hygiene: Route loose FASTQ, BAM, and script files that accumulate over time.
  • Scientific Attribution: Automatically generate CITATION.cff and integrate ORCID iDs so that you and your team get credit.
  • Reproducibility: Integrated scaffolding for Conda, Docker, and Singularity from the start.
  • Resource Management: Tools to identify and compress massive biological datasets that would otherwise deplete server storage.
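
As an illustration of the "Ongoing Hygiene" idea, here is a minimal Python sketch of suffix-based routing. It is not BfxPM's implementation: the rule table, the folder names (data/raw, results/alignments, scripts), and the function names are assumptions made for this example. It only plans moves and never touches the filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical suffix-to-folder rules; BfxPM's real rules and folder names may differ.
ROUTING_RULES = {
    ".fastq": "data/raw",
    ".fastq.gz": "data/raw",
    ".bam": "results/alignments",
    ".py": "scripts",
    ".sh": "scripts",
}


def destination_for(filename, rules=ROUTING_RULES):
    """Return the target folder for a file name, or None if no rule matches."""
    name = filename.lower()
    # Try longer suffixes first so ".fastq.gz" is matched before any shorter suffix.
    for suffix in sorted(rules, key=len, reverse=True):
        if name.endswith(suffix):
            return rules[suffix]
    return None


def plan_routes(filenames, rules=ROUTING_RULES):
    """Map each routable file name to its proposed new path without moving anything."""
    plan = {}
    for filename in filenames:
        folder = destination_for(filename, rules)
        if folder is not None:
            target = PurePosixPath(folder) / PurePosixPath(filename).name
            plan[filename] = str(target)
    return plan
```

Producing a plan first, rather than moving files immediately, makes it possible to review the proposed layout before anything changes on disk.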

Installation

BfxPM is cross-platform and can be installed via the two primary scientific package managers:

Via Pip:

pip install bfxpm

Via Conda (from the jd2112 channel):

conda install jd2112::bfxpm
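
To confirm that the package is visible to your Python environment after installing, a small check along these lines may help. It uses only the standard library; the assumption is that the installed distribution is named bfxpm, as in the pip command above, and the helper name installed_version is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="bfxpm"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"bfxpm version: {found}" if found else "bfxpm is not installed in this environment")
```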

Standards Alignment

BfxPM is built around established reproducibility standards. Its structures and workflows are aligned with:

  • The Turing Way: The leading international handbook for reproducible data science.
  • ELIXIR & Software Carpentry: Best practices for the separation of Raw Data from Mutable Scripts.
  • nf-core / Snakemake: Industry-standard directory hierarchy for streamlined pipeline deployment.

Core Capabilities

  1. Intelligent Initialization (bfxpm init):
     • Sets up an industry-standard directory structure.
     • Automatic Git Scaffolding: Generates open-source essentials (LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, CITATION.cff) along with GitHub workflow boilerplate for CI and versioning.
  2. Dynamic Organization (bfxpm organize / bfxpm map / bfxpm modify):
     • Iteratively routes scattered FASTQ files, scripts, and logs into their correct subfolders using smart rules.
  3. Environment & Pipeline Scaffolding (bfxpm env / bfxpm pipeline):
     • Instantly spawns environment.yml, Singularity.def, Snakefile, or main.nf templates directly into your project scripts.
  4. Data Acquisition (bfxpm fetch):
     • Hooks into SRA (via fastq-dump) to fetch raw data and route it automatically into data/raw_external/.
  5. Project Hygiene (bfxpm clean / bfxpm compress):
     • Smart Clean: Finds giant *.sam or *.tmp files.
     • Smart Compress: Scans specifically for biological formats (.fastq, .fast5, .pod5, .hifi) and batches them into tar.gz archives.
     • Archive: Moves old datasets to a managed archive/ directory.
  6. Reporting & Tracking (bfxpm report / bfxpm history):
     • Generates human-readable summaries (PROJECT_SUMMARY.md) and machine-readable metadata (.json, .yml) covering project size, sample counts, and git history.
  7. Agentic AI Integration (bfxpm ai):
     • BioAssistant: A project-aware AI agent that helps with organization, research questions, and automated management.
     • Safety-First: Integrated safety interceptors for destructive actions and support for local execution (Ollama).
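
The Project Hygiene steps described above can be sketched in Python with the standard library alone. This is an illustrative outline, not BfxPM's code: the function names, the 1 GB size threshold, and the choice to leave the original files in place are assumptions. The file patterns and suffixes are the ones named in the list above.

```python
import tarfile
from pathlib import Path

# Formats named in the Smart Compress description; matched on the final suffix only.
BIO_SUFFIXES = {".fastq", ".fast5", ".pod5", ".hifi"}


def find_large_files(root, patterns=("*.sam", "*.tmp"), min_bytes=1_000_000_000):
    """List files matching the patterns whose size is at least min_bytes (assumed threshold)."""
    root = Path(root)
    hits = []
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file() and path.stat().st_size >= min_bytes:
                hits.append(path)
    return sorted(hits)


def find_bio_files(root):
    """Recursively collect files whose suffix is one of the biological formats."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() in BIO_SUFFIXES
    )


def compress_batch(files, archive_path, root):
    """Write the files into one gzip-compressed tar archive; originals are left untouched."""
    root = Path(root)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            # Store paths relative to the project root so the archive unpacks cleanly.
            tar.add(path, arcname=Path(path).relative_to(root).as_posix())
    return Path(archive_path)
```

Reporting large files separately from compressing them keeps the destructive decision (deleting a *.sam or *.tmp file) with the user, while compression itself is non-destructive in this sketch.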

Quick Start

Initialize your scholarly project:

bfxpm init

Clean up and compress your sequence data:

bfxpm clean compress

Generate a detailed status report:

bfxpm report
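
For a rough idea of the kind of machine-readable metadata such a report could carry, the following Python sketch computes a few project statistics and serializes them as JSON. The field names are hypothetical and the FASTQ file count is only a stand-in for a sample count; the actual output of bfxpm report may look different.

```python
import json
from pathlib import Path


def summarize_project(root):
    """Collect simple size and file-count statistics for a project directory."""
    root = Path(root)
    files = [p for p in root.rglob("*") if p.is_file()]
    fastq = [p for p in files if p.name.lower().endswith((".fastq", ".fastq.gz"))]
    return {
        "project": root.resolve().name,  # hypothetical field names throughout
        "file_count": len(files),
        "total_bytes": sum(p.stat().st_size for p in files),
        "fastq_files": len(fastq),  # crude proxy for the number of samples
    }


def to_json(summary):
    """Serialize the summary with stable key order so successive reports diff cleanly."""
    return json.dumps(summary, indent=2, sort_keys=True)
```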

For more details and further functionality, please refer to the usage documentation and the commands & parameters documentation.

Credits

BfxPM was developed by Jyotirmoy Das to streamline active bioinformatics research and analysis. It integrates smoothly into high-performance computing (HPC) environments using standard Python tooling.