Overview

The generatervis package provides functions to create empty .fastq files, generate random reads, fill raw .fastq files with random reads, plot .fastq sequences, convert .fastq files to .bam files, convert .bam files to .vcf files, and create metadata files for patient IDs.

⬇️ Installing generatervis

You can install the development version of generatervis from GitHub with:

# install.packages("pak")  
pak::pak("Clinical-Informatics-Collaborative/generatervis")  
#> 
#> → Will update 1 package.
#> → The package (0 B) is cached.
#> + generatervis 0.1.0 → 0.1.0 👷‍♀️🔧 (GitHub: 38a905c)
#> ℹ No downloads are needed, 1 pkg is cached
#> ✔ Got generatervis 0.1.0 (source) (45.75 kB)
#> ℹ Packaging generatervis 0.1.0
#> ✔ Packaged generatervis 0.1.0 (689ms)
#> ℹ Building generatervis 0.1.0
#> ✔ Built generatervis 0.1.0 (523ms)
#> ✔ Installed generatervis 0.1.0 (github::Clinical-Informatics-Collaborative/generatervis@38a905c) (15ms)
#> ✔ 1 pkg: upd 1, dld 1 (NA B) [8s]

Usage

Create an empty raw .fastq file for the specified patient.

patient_id <- "patient_123"
generatervis::create_empty_fastq(patient_id)

Generate a random sample of reads for a Whole Genome Sequencing (WGS) dataset for the specified patient ID.

n <- 2
generatervis::rreads(patient_id, n)
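
As a rough illustration of what a random read is, the base R sketch below draws nucleotides uniformly at random. It is not the implementation of `rreads()`: the helper `random_read()` and the `example_*` variables are hypothetical names introduced only for this example, and it assumes reads use the four-letter alphabet A, C, G, T.

```r
# Illustrative only: NOT how generatervis::rreads() is implemented.
# random_read() is a hypothetical helper that samples bases uniformly.
random_read <- function(read_length) {
  bases <- c("A", "C", "G", "T")
  paste(sample(bases, read_length, replace = TRUE), collapse = "")
}

set.seed(1)          # fix the seed so the example is reproducible
example_n <- 2       # assumed: number of reads to draw
example_length <- 8  # assumed: number of bases per read

example_reads <- vapply(
  seq_len(example_n),
  function(i) random_read(example_length),
  character(1)
)
print(example_reads)
```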

Populate the .fastq file with the random reads.

output_dir <- tempdir()
read_length <- 8
generatervis::fill_fastq(patient_id, output_dir, n, read_length)
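
For orientation, a standard FASTQ record spans four lines: an identifier line starting with `@`, the sequence, a `+` separator, and a quality string with one character per base. The sketch below writes one such record by hand with base R. The read name, sequence, and quality values are made up for illustration, and the exact layout that `fill_fastq()` produces may differ.

```r
# Illustrative only: one hand-written FASTQ record, independent of generatervis.
# All values below are made-up example data.
example_fastq <- tempfile(fileext = ".fastq")

example_record <- c(
  "@read_1",   # line 1: '@' followed by the read identifier
  "ACGTACGT",  # line 2: the nucleotide sequence
  "+",         # line 3: separator line
  "IIIIIIII"   # line 4: quality string, one character per base
)

writeLines(example_record, example_fastq)
print(readLines(example_fastq))
```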

(Optional) Plot the nucleotide sequences in the .fastq file in a grid format.

generatervis::fastq_plot(patient_id, output_dir, n, read_length)

Convert the (raw) .fastq file to a (processed) .bam file via a dummy .sam file.

fastq_file <- file.path(output_dir, paste0(patient_id, ".fastq"))
generatervis::fill_fastq(patient_id, output_dir, n, read_length)
generatervis::fastq_to_bam(
  fastq_file,
  patient_id,
  output_dir,
  sam_file = file.path(output_dir, paste0(patient_id, ".sam")),
  reference = "chr1"
)

To create the corresponding .bam file, use the samtools command-line tool.

# samtools view -Sb path_to/file_name.sam > path_to/file_name.bam
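
If you would rather stay inside the R session, the same conversion can be sketched with `system2()`. This is a minimal example under stated assumptions: samtools is installed separately and on the `PATH`, and `example_sam` is a hypothetical path that you should replace with the .sam file written in the previous step. The conversion is skipped when either is missing.

```r
# Hypothetical paths for illustration; point example_sam at your own .sam file.
example_sam <- file.path(tempdir(), "patient_123.sam")
example_bam <- sub("\\.sam$", ".bam", example_sam)

# Sys.which() returns "" when the executable is not found on the PATH.
samtools_path <- Sys.which("samtools")

if (nzchar(samtools_path) && file.exists(example_sam)) {
  # Runs: samtools view -b -o <bam> <sam>
  status <- system2(
    samtools_path,
    c("view", "-b", "-o", shQuote(example_bam), shQuote(example_sam))
  )
  if (status != 0) warning("samtools exited with status ", status)
} else {
  message("samtools or the .sam file was not found; skipping conversion.")
}
```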

Convert the (processed) .bam file to a (summarised) .vcf file format.

generatervis::bam_to_vcf(
  patient_id,
  output_dir,
  vcf_file = file.path(output_dir, paste0(patient_id, ".vcf"))
)

Create the metadata files to upload to data_storage_repository.

generatervis::create_metadata(patient_id, output_dir)

When these metadata .txt files are ready, they can be uploaded to data_storage_repository by forking the repository and creating a pull request.

Documentation

You can find detailed documentation and tutorials at the package website: https://clinical-informatics-collaborative.github.io/generatervis/

For in-session help:

# View documentation for a specific function
?create_empty_fastq

Contributing

To be added.

Code of Conduct

To be added.

License

To be added.

Citation

To be added.

Acknowledgments

This package was written as part of the volunteer programme by the Research Computing Program, Walter and Eliza Hall Institute of Medical Research, mentored by Rowland Mosbergen.