Introduction

BfxPM (Bioinformatician's Project Manager) is an active command-line interface (CLI) manager designed specifically for the dynamic and often disorganized nature of computational biology research.

Why BfxPM?

While general scaffolding tools (like Cookiecutter) excel at Day 1 initialization, they offer little support for the long-term maintenance of a project. BfxPM is designed to address the "Day 365" problem:

Ongoing Hygiene: Route loose FASTQ, BAM, and script files that accumulate over time.
Scientific Attribution: Automatic generation of CITATION.cff and ORCID iD integration, so that you and your team get credit.
Reproducibility: Integrated scaffolding for Conda, Docker, and Singularity from the start.
Resource Management: Tools to identify and compress massive biological datasets that would otherwise deplete server storage.

Installation

BfxPM is cross-platform and can be installed via the two primary scientific package managers:

Via Pip:

pip install bfxpm

Via Conda (from the jd2112 channel):

conda install jd2112::bfxpm

Standard Alignment

BfxPM follows established reproducibility standards. Its directory structures and workflows are aligned with:

The Turing Way: A community-written international handbook for reproducible data science.
ELIXIR & Software Carpentry: Best practices for the separation of Raw Data from Mutable Scripts.
nf-core / Snakemake: Industry-standard directory hierarchy for streamlined pipeline deployment.

Core Capabilities

Intelligent Initialization (bfxpm init):
Sets up an industry-standard directory structure.
Automatic Git Scaffolding: Generates Open Source essentials (LICENSE, CONTRIBUTING.md, CODE_OF_CONDUCT.md, CITATION.cff) along with GitHub workflow boilerplates for CI and versioning.
Dynamic Organization (bfxpm organize / bfxpm map / bfxpm modify):
Iteratively route scattered FASTQs, scripts, and logs into their correct subfolders using smart rules.
Environment & Pipeline Scaffolding (bfxpm env / bfxpm pipeline):
Instantly spawn environment.yml, Singularity.def, Snakefile or main.nf templates directly into your project scripts.
Data Acquisition (bfxpm fetch):
Hook into SRA (via fastq-dump) to cleanly fetch and automatically route raw data into data/raw_external/.
Project Hygiene (bfxpm clean / bfxpm compress):
Smart Clean: Finds very large *.sam or *.tmp files.
Smart Compress: Scans specifically for biological formats (.fastq, .fast5, .pod5, .hifi) and batches them into tar.gz archives.
Archive: Moves old datasets to a managed archive/ directory.
Reporting & Tracking (bfxpm report / bfxpm history):
Generate human-readable summaries (PROJECT_SUMMARY.md) and machine-readable metadata (.json, .yml) accounting for project size, sample counts, and git history.
Agentic AI Integration (bfxpm ai):
BioAssistant: A project-aware AI agent that helps with organization, research questions, and automated management.
Safety-First: Integrated safety interceptors for destructive actions and support for local execution (Ollama).
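
Of the capabilities above, the rule-based routing behind bfxpm organize is the easiest to picture in code. The Python sketch below is not BfxPM's implementation: the ROUTES table, the target folder names, and the destination_for and plan_moves helpers are all hypothetical, and only illustrate the idea of mapping file suffixes to subfolders.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical suffix-to-folder rules; BfxPM's real rules and folder names may differ.
ROUTES = {
    ".fastq": "data/raw",
    ".fastq.gz": "data/raw",
    ".bam": "data/processed",
    ".py": "scripts",
    ".sh": "scripts",
    ".log": "logs",
}


def destination_for(filename: str) -> Optional[str]:
    """Return the target folder for a file name, or None if no rule matches."""
    name = filename.lower()
    # Check longer suffixes first so ".fastq.gz" is matched before shorter ones.
    for suffix in sorted(ROUTES, key=len, reverse=True):
        if name.endswith(suffix):
            return ROUTES[suffix]
    return None


def plan_moves(project_root: Path) -> List[Tuple[Path, Path]]:
    """List (source, target) pairs for loose files directly under project_root.

    Nothing is moved here; the caller decides whether to apply the plan.
    """
    moves = []
    for path in sorted(project_root.iterdir()):
        if not path.is_file():
            continue
        folder = destination_for(path.name)
        if folder is not None:
            moves.append((path, project_root / folder / path.name))
    return moves
```

Returning a plan instead of moving files immediately keeps the sketch safe to run and mirrors the kind of dry run a user would want before reorganizing a live project.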

Quick Start

Initialize your scholarly project:

bfxpm init

Clean up and compress your sequence data:

bfxpm clean compress
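
As a rough picture of what a compression pass involves, the Python sketch below collects sequence files by suffix and bundles them into a single tar.gz archive using only the standard library. The suffix list follows the formats named under Smart Compress; the function name batch_compress and the archive layout are hypothetical, and BfxPM's own behaviour (for example, whether originals are removed afterwards) may differ.

```python
import tarfile
from pathlib import Path
from typing import List

# Formats listed for Smart Compress; the matching logic here is an assumption.
BIO_SUFFIXES = (".fastq", ".fast5", ".pod5", ".hifi")


def batch_compress(data_dir: Path, archive_path: Path) -> List[str]:
    """Add every matching file under data_dir to one gzip-compressed tar archive.

    Returns the archived paths relative to data_dir. Originals are left in place.
    """
    # Collect matches before opening the archive so the scan never sees it.
    matches = sorted(
        p for p in data_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in BIO_SUFFIXES
    )
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in matches:
            # Store paths relative to data_dir so the archive unpacks cleanly.
            archive.add(path, arcname=path.relative_to(data_dir).as_posix())
    return [p.relative_to(data_dir).as_posix() for p in matches]
```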

Generate a detailed status report:

bfxpm report
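
To give a sense of the machine-readable side of a report, the Python sketch below computes a total project size and a count of FASTQ files, then serialises them as JSON. The field names (total_bytes, fastq_files) and the summarize_project and to_json helpers are hypothetical; the actual contents of BfxPM's report files are not described here, and git history is left out of the sketch.

```python
import json
from pathlib import Path


def summarize_project(root: Path) -> dict:
    """Collect simple size and file-count statistics for a project directory."""
    files = [p for p in root.rglob("*") if p.is_file()]
    fastq = [p for p in files if p.name.lower().endswith((".fastq", ".fastq.gz"))]
    return {
        "project": root.name,
        "total_bytes": sum(p.stat().st_size for p in files),
        "fastq_files": len(fastq),
    }


def to_json(summary: dict) -> str:
    """Serialise the summary with a stable key order so reports diff cleanly."""
    return json.dumps(summary, indent=2, sort_keys=True)
```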

For more details and further functionality, please refer to the usage documentation and the commands & parameters documentation.

Credits

BfxPM was developed by Jyotirmoy Das to streamline active bioinformatics research and analysis. It integrates smoothly into high-performance computing (HPC) environments using standard Python tooling.
