nictru/README.md

Hey, nice to meet you!

I am an aspiring bioinformatician and computational biologist. My primary fields of interest are pipeline development and machine learning in bulk, single-cell and non-coding transcriptomics.

Currently, I am doing an internship at Orakl Oncology, where I am working on the integration of multi-modal organoid datasets.

Below, you can find an overview of my most relevant work. If you find it useful or want to connect, feel free to follow me on LinkedIn and GitHub, or reach out via email.

Past and ongoing projects

Bachelor's thesis: Circular RNAs in Estrogen Signaling and Breast Cancer Biology

This project was based on data from mouse models. A major part of it was the robust detection of circRNAs and the identification of their potential regulatory roles. I identified many circRNAs that were already known, as well as some novel candidates. Details can be found in the dedicated repository. The pipeline for detecting and quantifying circRNAs is now available as nf-core/circrna.

Single cell atlas construction: COST MyeInfoBank

COST MyeInfoBank is a consortium of biologists and bioinformaticians that aims to deepen the understanding of myeloid cells in inflammatory diseases and cancer. The working group I am involved with focuses on constructing single-cell transcriptomic atlases of relevant diseases and on drawing insights about myeloid cells from these atlases.

For this purpose, I built a pipeline that makes it possible to build such atlases in a scalable and reproducible way. Originally called SIMBA, the pipeline is now available as nf-core/scdownstream. More information about my role in the consortium can be found in this article.

VASA-Seq data analysis

This was a university project at the Battich lab. VASA-Seq is a relatively new protocol for single-cell transcriptomics that allows capturing non-coding transcripts. Since there is no established workflow for processing this data yet, I developed a version of nf-core/scrnaseq that is tailored to VASA-Seq data. The resulting pipeline can be found here.

Key differences from the mainstream nf-core/scrnaseq are:

  • Genome preprocessing: custom FASTA snippets can be included for transcripts that are not annotated in the reference genome, such as rRNA
  • Spliced and unspliced counts for RNA velocity analysis can be obtained using STARsolo (this has since been merged into nf-core/scrnaseq)

Pipeline development

I am the primary maintainer of three nf-core pipelines:

  • nf-core/scdownstream: Takes count matrices as input and performs QC, integration, clustering and some follow-up tasks. Originally implemented for the creation of single-cell atlases in the COST MyeInfoBank consortium.
  • nf-core/circrna: Detects and quantifies circular RNAs, investigates miRNA interactions and performs statistical tests. I implemented this as part of my Bachelor's thesis.
  • nf-core/tfactivity: Takes donor-matched bulk transcriptomics and open chromatin data (e.g. ATAC-Seq, Histone ChIP-Seq) from two or more conditions as input. Tries to identify transcription factors that are differentially active across the provided conditions.

I also assist in the development of two more nf-core pipelines:

  • nf-core/scrnaseq: Takes FastQ files from droplet-based scRNA-Seq technologies as input, performs read-level QC, alignment and quantification, and provides the user with count matrices.
  • nf-core/hadge: This one is not in nf-core yet, but will be soon. It allows processing of genotype-multiplexed scRNA-Seq datasets. Like nf-core/scrnaseq, it outputs count matrices.

Pinned

  1. nf-core/scdownstream

    A single cell transcriptomics pipeline for QC, integration and making the data presentable

    Nextflow 80 47

  2. nf-core/scrnaseq

    Single-cell RNA-Seq pipeline for barcode-based protocols such as 10x, DropSeq or SmartSeq, offering a variety of aligners and empty-droplet detection

    Nextflow 305 206

  3. nf-core/circrna

    circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data

    Nextflow 59 32

  4. nf-core/tfactivity

    Bioinformatics pipeline that makes use of expression and open chromatin data to identify differentially active transcription factors across conditions.

    Nextflow 12 4

  5. algobio

    This project aims to make the algorithms covered in the corresponding lecture at LMU Munich easier to understand by providing step-by-step logging and result visualization.

    Python 7 1

  6. Mye-InfoBank/hugo-unifier

    This Python package can unify gene symbols across datasets based on the HUGO database.

    Python