Usage Guide¶
Prerequisites¶
- Python: Ensure you have Python 3.8 or higher installed.
- Dependencies: The required libraries (
typer,rich,gitpython,pyyaml) are managed via your package manager. - External Tools: For full functionality of
bfxpm fetch, ensure specialized tools like the SRA Toolkit (fastq-dump) are in your system PATH.
Detailed Workflow¶
1. Project Initialization¶
The bfxpm init command is your first step. It sets up a standard, industry-compliant directory structure that separates raw data from code and results.
bfxpm init
Key Steps during Init:
- Author Information: Used to populate
project.yml,LICENSE, andCITATION.cff. - Git Initialization: If you opt for Git, BfxPM will scaffold a comprehensive repository including:
- Governance:
CONTRIBUTING.md,CODE_OF_CONDUCT.md. - Visibility:
SECURITY.md,SUPPORT.md. - Templates: GitHub Issue and Pull Request templates in
.github/. - CI/CD: Boilerplate GitHub Actions for CI and Versioning.
- Governance:
- ORCID ID: If Git is enabled, BfxPM asks for your ORCID ID to ensure proper scientific attribution in your
CITATION.cff.
2. Analytical Scaffolding¶
BfxPM helps you transition from project setup to analysis by generating standard config and pipeline files.
Environment Management¶
Use bfxpm env to manage reproducible computing environments.
bfxpm env init: Creates a standard Condaenvironment.ymlwith essential bioinfo channels (bioconda,conda-forge).bfxpm env docker: Generates aDockerfileoptimized for bioinformatics data.bfxpm env singularity: Generates aSingularity.deffile for HPC compatibility.
Pipeline Scaffolding¶
Quickly drop pipeline templates into your project.
bfxpm pipeline snakemake: Scaffold aSnakefileinscripts/snakemake/.bfxpm pipeline nextflow: Scaffold amain.nfinscripts/nextflow/.
3. Data Acquisition and Hygiene¶
BfxPM simplifies the messy process of downloading and managing large biological datasets.
Fetching Data¶
Use bfxpm fetch to pull data directly into data/raw_external/.
bfxpm fetch sra SRR123456
Cleaning and Compression¶
Bioinformatics generates massive intermediate files. bfxpm clean helps you find and manage them.
- Batch Compression: Identifies uncompressed biological formats (
.fastq,.fast5,.pod5,.hifi) and applies standardized lossless compression. - Standard: Converts to
.tar.gz. - Genomic: Leverages
samtoolsto convert BAM to CRAM andpod5-utilsto convert FAST5 to POD5. - Large Data Sync: While Git handles your code,
bfxpm sync --datausesrsyncto securely back up your massivedata/directory to a cluster or remote server. - Archiving: Moves entire directories to
archive/to keep your active project view focused.
See the References section for more details on these compression formats.
Compression Modes
bfxpm clean compress features a smarter, type-aware processing engine:
- Mode Awareness: You can run
bfxpm clean compress --mode genomicto prioritize scientific formats, or keep it onbest(default) to let BfxPM decide. - BAM to CRAM: If
samtoolsis installed, BfxPM will offer to convert your heavy BAM files to CRAM, saving 30-50% disk space while remaining 100% lossless. - FAST5 to POD5: If
pod5-utilsis installed, it will offer to convert legacy Oxford Nanopore FAST5 files to the modern, more efficient POD5 signal format. - Standard Fallback: For all other formats (FASTQ, HiFi), it continues to provide standard
.tar.gzcompression to keep your project root clean.
$ bfxpm clean --help
Usage: bfxpm clean [OPTIONS] COMMAND [ARGS]...
Manage and clean project disk space
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ temp Find and prompt to delete large or intermediate files (*.sam, *.tmp). │
│ compress Apply lossless compression to sequence and alignment files. │
│ Supports specialized formats like CRAM (for BAM) and POD5 (for FAST5). │
│ archive Move an old folder into the project's archive/ directory. │
│ decompress Decompress and restore files from .tar.gz, CRAM, or POD5 formats. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ bfxpm clean compress --help
Usage: bfxpm clean compress [OPTIONS]
Apply lossless compression to sequence and alignment files. Supports specialized formats like CRAM (for BAM) and POD5 (for FAST5).
╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --mode -m TEXT Compression mode: 'standard' (tar.gz), 'genomic' (CRAM/POD5), or 'best' [default: best] │
│ --keep Don't delete original files after compression │
│ --help Show this message and exit. │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Restoring Documents (Decompress)¶
If you need to access archived or compressed files, use bfxpm clean decompress.
- Restoration: Automatically extracts
.tar.gzand.gzfiles. - Genomic Reversion:
- CRAM to BAM: Restores your alignment files for tools that don't support CRAM.
- POD5 to FAST5: Converts signal data back to legacy FAST5 format for older basecallers.
bfxpm clean decompress
4. Organization and Visualization¶
If your project gets messy, BfxPM's organization tools help you regain control.
bfxpm map: Take a JSON snapshot of your current file locations before making changes.bfxpm organize: Iteratively route loose files to their designated BfxPM subfolders.bfxpm tree: View a beautiful, color-coded structure of your project in the terminal.
The Project "Time Machine"¶
BfxPM allows you to revert an organization attempt if you decide the previous layout was better.
- Snapshot: Create a map before you organize:
bfxpm map. - Rollback: To revert, provide the map file: use
-ror--rollbackflag followed by the map file name.bfxpm map --rollback bfxpm_structure_map_YYYYMMDD.json - BfxPM will locate your files and move them back to their original relative paths.
5. Reporting and Metadata¶
BfxPM automatically summarizes your project's status for publication or collaboration.
bfxpm report
PROJECT_SUMMARY.md, summary.json, and summary.yml.
6. Data Integrity and Security¶
Bioinformatics data is fragile. BfxPM helps ensure your scientific results aren't based on corrupted files.
bfxpm checksum generate: Creates achecksums.md5manifest for all files indata/.bfxpm checksum verify: Scans your data and compares it against the manifest to detect truncation or bit-rot.
7. Automated Flow Recording¶
Save terminal scripts to a file bfxpm run_history¶
When you run the right command after some troubles, and forget to save them, bfxpm run_history can scan your shell history and save them to a file.
Run History
$ bfxpm run_history --help
Usage: bfxpm run_history [OPTIONS]
Extract complex bioinformatics commands from shell history.
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --save --no-save [default: no-save] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
bfxpm flow: Bridge the gap between interactive exploration and reproducible scripts.¶
- `bfxpm flow start`: Starts monitoring your terminal interactions.
- `bfxpm flow stop`: Automatically filters your shell history and saves your commands into a clean, reproducible `.sh` script in `scripts/`.
Flow Recording
bfxpm flow and bfxpm run_history are designed to work in two completely different ways so that you're covered whether you planned ahead or not:
-
bfxpm run_history(The "Oops" Tool):- No preparation needed. It works even if you never ran
flow. - It retrospectively scans your global shell history (e.g., the last 100 commands) and uses a filter to find complex bioinformatics calls (like
bwa,samtools,nextflow) that you might have forgotten to document. - It’s perfect for when you just finished a quick test and realized, "Wait, that command worked! How did I type it?"
- No preparation needed. It works even if you never ran
-
bfxpm flow(The "Pilot" Tool):- Requires
flow start. - It is a precise, time-bound session recorder. It only captures the commands you ran during a specific mission.
- It's perfect for when you're starting a formal analysis and want a clean, dedicated script of your exact workflow from start to finish.
- Requires
In summary: You do not need to run flow for run_history to work. run_history is a general search tool for your past, while flow is a high-fidelity recorder for your present!
9. Project Portfolio Management¶
As you manage multiple research projects over time, BfxPM provides a way to index all your work into a unified portfolio.
bfxpm projects: Scans your current root (e.g., yourprojects/orlab/folder) and identifies all standardized BfxPM projects.- HTML Export: Use the
--htmlflag to generate a professionalprojects_index.htmlfile. This is a great way to maintain an interactive dashboard of your research progress, file counts, and activity dates.
bfxpm projects --html
$ bfxpm projects --help
Usage: bfxpm projects [OPTIONS] [PATH]
Scan and list all BfxPM projects in the given directory.
╭─ Arguments ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ path [PATH] Root directory to scan for projects [default: .] │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --html Generate an HTML summary of all projects │
│ --help Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
10. Data Deposition¶
When you're ready to publish, BfxPM helps you prepare and deposit your data to public repositories.
bfxpm deposit list: View all supported repositories (Zenodo, FigShare, Dryad, Dataverse, etc.).bfxpm deposit prepare: Create a clean ZIP bundle excluding.git,.venv, and temp files. Use--include-data/--include-resultsflags to control what's included.bfxpm deposit go zenodo: Open the deposit page for a specific repository directly in your browser.
Data Deposit List
Supported Repositories:
- Zenodo : Open repository for EU-funded research outputs from Horizon Europe, Euratom, and earlier Framework Programmes.
- Figshare: Figshare is a provider of repository infrastructure. Our solutions help organizations and researchers share, showcase and manage their outputs.
- Dataverse: The Dataverse Project is an open-source software for sharing, preserving, citing, exploring, and analyzing research data.
- Dryad: Dryad is a curated resource that makes the data underlying scientific and medical literature discoverable and reusable.
- Mendeley: Mendeley is a reference manager and academic social network.
- Datahub: Datahub is a data catalog for the modern data stack.
- Dans: DANS is a Dutch institute for research data.
- Eudat: EUDAT provides data management services for researchers in Europe.
Data Deposit Guard
- In
bfxpm deposit go, before opening a repository's website (like Zenodo or FigShare), a 📂 Data Deposit Alert is now shown. - The tool intelligently labels the repository as either "EU-based" (e.g., Zenodo, DANS, EUDAT) or "Outside the EU" based on its primary service location.
- It requires a final confirmation to ensure the user has reviewed the data for sensitive information and residency compliance before the browser is opened.
# Prepare a publication-ready bundle
bfxpm deposit prepare --output my_project_v1.zip
# Open Zenodo deposit page
bfxpm deposit go zenodo
12. Agentic AI Support¶
BfxPM includes a project-aware BioAssistant to help you navigate your research and automate management tasks.
- Setup:
bfxpm ai setupto connect to Gemini or Ollama. - Consulting: Use
bfxpm ai chatfor an interactive session where the agent can see your file tree and help you organize your work. - Automation: Ask the agent to perform tasks like "Clean up all empty directories in my results folder". BfxPM's Safety Interceptor ensures no destructive commands are run without confirmation and backup.
For more details, see the dedicated AI Agent page.
11. Help¶
$ bfxpm --help
Usage: bfxpm [OPTIONS] COMMAND [ARGS]...
#=================================================#
# BfxPM: Bioinformatician's Project Manager #
#=================================================#
Purpose: Project Manager tool for Bioinformaticians
Developer: Jyotirmoy Das
Maintainer: Jyotirmoy Das
Version: 0.1.0
Bioinformatician's Project Manager CLI
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ init Initialize a new BfxPM project including folder structures and Git repository. │
│ scan Scan a directory for files, summarizing extensions and disk usage. │
│ organize Organize scattered files and handle unknown directories into a BfxPM project. │
│ tree Show a beautiful tree view of the directory and files. │
│ projects Scan and list all BfxPM projects in the given directory. │
│ compress Compress a large folder to tar.gz to save space. │
│ save Save (version control) your scripts and config. │
│ history View a beautiful timeline of script modifications with usage details. │
│ sync Sync your project progress (Git) and optionally your large data folder (Rsync). │
│ login Save GitHub/Cloud token locally. │
│ logout Remove locally saved credentials. │
│ run_history Extract complex bioinformatics commands from shell history. │
│ show Show all the hidden files in the project folder. │
│ report Generate HTML/Markdown summary report and accompanying metadata files. │
│ update Update BfxPM to the latest version via pip. │
│ modify Manually move any files or directories to a new location. │
│ map Create a map of the current directory structure OR rollback to a previous map. │
│ clean Manage and clean project disk space │
│ env Manage standard bioinformatics environments │
│ pipeline Scaffold analysis pipelines │
│ fetch Fetch and route external biological data │
│ checksum Manage data integrity using checksums │
│ flow Record terminal sessions into scripts │
│ deposit Deposit project data to public repositories │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯