Bioinformatics

Papers
(The TQCC of Bioinformatics is 13. The table below lists the papers whose CrossRef citation counts are above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-07-01 to 2024-07-01.)
Article | Citations
clinker & clustermap.js: automatic generation of gene cluster comparison figures | 624
YaHS: yet another Hi-C scaffolding tool | 582
GraphDTA: predicting drug–target binding affinity with graph neural networks | 408
Analysing high-throughput sequencing data in Python with HTSeq 2.0 | 401
New strategies to improve minimap2 alignment accuracy | 388
GTDB-Tk v2: memory friendly classification with the genome taxonomy database | 379
Liftoff: accurate mapping of gene annotations | 371
DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome | 339
LDpred2: better, faster, stronger | 291
CoV-AbDab: the coronavirus antibody database | 276
STREME: accurate and versatile sequence motif discovery | 267
CAFE 5 models variation in evolutionary rates among gene families | 264
pyGenomeTracks: reproducible plots for multivariate genomic datasets | 262
ProteinBERT: a universal deep-learning model of protein sequence and function | 258
CoV-Spectrum: analysis of globally shared SARS-CoV-2 data to identify and characterize new variants | 212
MolTrans: Molecular Interaction Transformer for drug–target interaction prediction | 193
DeepPurpose: a deep learning library for drug–target interaction prediction | 180
fastsimcoal2: demographic inference under complex evolutionary scenarios | 173
LocusZoom.js: interactive and embeddable visualization of genetic association study results | 159
Dream: powerful differential expression analysis for repeated measures designs | 152
Nebulosa recovers single-cell gene expression signals by kernel density estimation | 144
glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data | 138
Fast and sensitive taxonomic assignment to metagenomic contigs | 126
microbiomeMarker: an R/Bioconductor package for microbiome marker identification and visualization | 126
UCSC Cell Browser: visualize your single-cell data | 125
Weighted minimizer sampling improves long read mapping | 118
Colour deconvolution: stain unmixing in histological imaging | 113
DeepCDR: a hybrid graph convolutional network for predicting cancer drug response | 110
ggtranscript: an R package for the visualization and interpretation of transcript isoforms using ggplot2 | 108
COVID-19 Docking Server: a meta server for docking small molecules, peptides and antibodies against potential targets of COVID-19 | 107
ShinyCell: simple and sharable visualization of single-cell gene expression data | 103
dittoSeq: universal user-friendly single-cell and bulk RNA sequencing visualization toolkit | 101
IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning | 100
MGIDI: toward an effective multivariate selection in biological experiments | 100
FlaGs and webFlaGs: discovering novel biology through the analysis of gene neighbourhood conservation | 100
Unsupervised topological alignment for single-cell multi-omics integration | 98
The VEGA suite of programs: an versatile platform for cheminformatics and drug design projects | 97
ProDy 2.0: increased scale and scope after 10 years of protein dynamics modelling with Python | 97
BERT4Bitter: a bidirectional encoder representations from transformers (BERT)-based model for improving the prediction of bitter peptides | 97
MUFFIN: multi-scale feature fusion for drug–drug interaction prediction | 95
Scirpy: a Scanpy extension for analyzing single-cell T-cell receptor-sequencing data | 93
plotsr: visualizing structural similarities and rearrangements between multiple genomes | 93
propeller: testing for differences in cell type proportions in single cell data | 92
Information theoretic generalized Robinson–Foulds metrics for comparing phylogenetic trees | 91
TITAN: T-cell receptor specificity prediction with bimodal attention networks | 90
LightBBB: computational prediction model of blood–brain-barrier penetration based on LightGBM | 89
PBSIM2: a simulator for long-read sequencers with a novel generative model of quality scores | 88
DNA Features Viewer: a sequence annotation formatting and plotting library for Python | 86
ABlooper: fast accurate antibody CDR loop structure prediction with accuracy estimation | 86
PhyKIT: a broadly applicable UNIX shell toolkit for processing and analyzing phylogenomic data | 86
Accurate, scalable cohort variant calls using DeepVariant and GLnexus | 84
Systematic determination of the mitochondrial proportion in human and mice tissues for single-cell RNA-sequencing data quality control | 84
methylclock: a Bioconductor package to estimate DNA methylation age | 83
HiSCF: leveraging higher-order structures for clustering analysis in biological networks | 83
Fast gap-affine pairwise alignment using the wavefront algorithm | 82
Make Interactive Complex Heatmaps in R | 80
SoluProt: prediction of soluble protein expression in Escherichia coli | 80
Impact of protein conformational diversity on AlphaFold predictions | 77
Structure-aware protein–protein interaction site prediction using deep graph convolutional network | 74
POKY: a software suite for multidimensional NMR and 3D structure calculation of biomolecules | 74
SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization | 74
CellProfiler Analyst 3.0: accessible data exploration and machine learning for image analysis | 72
Protein interaction interface region prediction by geometric deep learning | 72
HyperAttentionDTI: improving drug–protein interaction prediction by sequence-based deep learning with attention mechanism | 71
Unsupervised protein embeddings outperform hand-crafted sequence and structure features at predicting molecular function | 71
PyMod 3: a complete suite for structural bioinformatics in PyMOL | 71
DELPHI: accurate deep ensemble model for protein interaction sites prediction | 70
NanoCLUST: a species-level analysis of 16S rRNA nanopore sequencing data | 70
Cellsnp-lite: an efficient tool for genotyping single cells | 69
GraphQA: protein model quality assessment using graph convolutional networks | 68
MOVICS: an R package for multi-omics integration and visualization in cancer subtyping | 67
ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning | 65
V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data | 65
MDeePred: novel multi-channel protein featurization for deep learning-based binding affinity prediction in drug discovery | 65
BWA-MEME: BWA-MEM emulated with a machine learning approach | 64
Deuteros 2.0: peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry | 64
BP4RNAseq: a babysitter package for retrospective and newly generated RNA-seq data analyses using both alignment-based and alignment-free quantification method | 64
Conditional out-of-distribution generation for unpaired data using transfer VAE | 64
DeepSurf: a surface-based deep learning approach for the prediction of ligand binding sites on proteins | 63
iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor | 63
Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning | 63
UCSCXenaShiny: an R/CRAN package for interactive analysis of UCSC Xena data | 63
RNA-SeQC 2: efficient RNA-seq quality control and quantification for large cohorts | 62
TALE: Transformer-based protein function Annotation with joint sequence–Label Embedding | 61
COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology | 61
Bacteriophage classification for assembled contigs using graph convolutional network | 60
DeepGraphGO: graph neural network for large-scale, multispecies protein function prediction | 59
Extended connectivity interaction features: improving binding affinity prediction through chemical description | 59
iCarPS: a computational tool for identifying protein carbonylation sites by novel encoded features | 59
Mutalyzer 2: next generation HGVS nomenclature checker | 59
ImmuCellAI-mouse: a tool for comprehensive prediction of mouse immune cell abundance and immune microenvironment depiction | 58
Toward heterogeneous information fusion: bipartite graph convolutional networks for in silico drug repurposing | 58
Humanization of antibodies using a machine learning approach on large-scale repertoire data | 58
SpatialExperiment: infrastructure for spatially-resolved transcriptomics data in R using Bioconductor | 57
VPF-Class: taxonomic assignment and host prediction of uncultivated viruses based on viral protein families | 56
Subtype-GAN: a deep learning approach for integrative cancer subtyping of multi-omics data | 56
stPlus: a reference-based method for the accurate enhancement of spatial transcriptomics | 55
eMPRess: a systematic cophylogeny reconciliation tool | 54
Current structure predictors are not learning the physics of protein folding | 54
MVGCN: data integration through multi-view graph convolutional network for predicting links in biomedical bipartite networks | 54
HunFlair: an easy-to-use tool for state-of-the-art biomedical named entity recognition | 54
lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning | 54
Evaluating single-cell cluster stability using the Jaccard similarity index | 53
ViralMSA: massively scalable reference-guided multiple sequence alignment of viral genomes | 53
MBG: Minimizer-based sparse de Bruijn Graph construction | 53
Geometric potentials from deep learning improve prediction of CDR H3 loop structures | 52
ODGI: understanding pangenome graphs | 52
StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps | 51
Identification of sub-Golgi protein localization by use of deep representation learning features | 50
Interfacing Seurat with the R tidy universe | 50
DamageProfiler: fast damage pattern calculation for ancient DNA | 50
AEMDA: inferring miRNA–disease associations based on deep autoencoder | 50
amPEPpy 1.0: a portable and accurate antimicrobial peptide prediction tool | 50
HierCC: a multi-level clustering scheme for population assignments based on core genome MLST | 49
Ribbon: intuitive visualization for complex genomic variation | 49
EpiDope: a deep neural network for linear B-cell epitope prediction | 49
SVIM-asm: structural variant detection from haploid and diploid genome assemblies | 48
MobiDB-lite 3.0: fast consensus annotation of intrinsic disorder flavors in proteins | 48
Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning | 48
DLAB: deep learning methods for structure-based virtual screening of antibodies | 48
Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona | 47
Cellinker: a platform of ligand–receptor interactions for intercellular communication analysis | 47
RCSB Protein Data Bank: improved annotation, search and visualization of membrane protein structures archived in the PDB | 47
Swarm v3: towards tera-scale amplicon clustering | 47
MIB2: metal ion-binding site prediction and modeling server | 47
The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction | 47
GMNN2CD: identification of circRNA–disease associations based on variational inference and graph Markov neural networks | 46
LinearPartition: linear-time approximation of RNA folding partition function and base-pairing probabilities | 45
FL-QSAR: a federated learning-based QSAR prototype for collaborative drug discovery | 45
mlr3proba: an R package for machine learning in survival analysis | 45
OPUS-TASS: a protein backbone torsion angles and secondary structure predictor based on ensemble neural networks | 44
ToxDL: deep learning using primary structure and domain embeddings for assessing protein toxicity | 43
ganon: precise metagenomics classification against large and up-to-date sets of reference sequences | 43
Plotgardener: cultivating precise multi-panel figures in R | 43
AMICI: high-performance sensitivity analysis for large ordinary differential equation models | 43
Real-time mapping of nanopore raw signals | 43
Highly accurate classification of chest radiographic reports using a deep learning natural language model pre-trained on 3.8 million text reports | 42
Socket2: a program for locating, visualizing and analyzing coiled-coil interfaces in protein structures | 42
MultiDTI: drug–target interaction prediction based on multi-modal representation learning to bridge the gap between new chemical entities and known heterogeneous network | 42
TandemTools: mapping long reads and assessing/improving assembly quality in extra-long tandem repeats | 42
Using drug descriptions and molecular structures for drug–drug interaction extraction from literature | 42
BridgeDPI: a novel Graph Neural Network for predicting drug–protein interactions | 41
MR-Clust: clustering of genetic variants in Mendelian randomization with similar causal estimates | 41
Fijiyama: a registration tool for 3D multimodal time-lapse imaging | 41
PROSS 2: a new server for the design of stable and highly expressed protein variants | 41
Stitching and registering highly multiplexed whole-slide images of tissues and tumors using ASHLAR | 41
Tiara: deep learning-based classification system for eukaryotic sequences | 41
SCIM: universal single-cell matching with unpaired feature sets | 41
Ensembling graph attention networks for human microbe–drug association prediction | 41
BACPI: a bi-directional attention neural network for compound–protein interaction and binding affinity prediction | 40
Deep graph learning of inter-protein contacts | 40
MAGUS: Multiple sequence Alignment using Graph clUStering | 40
BERN2: an advanced neural biomedical named entity recognition and normalization tool | 39
monaLisa: an R/Bioconductor package for identifying regulatory motifs | 39
BERT-Kcr: prediction of lysine crotonylation sites by a transfer learning method with pre-trained BERT models | 39
orfipy: a fast and flexible tool for extracting ORFs | 39
SpacePHARER: sensitive identification of phages from CRISPR spacers in prokaryotic hosts | 39
REINDEER: efficient indexing of k-mer presence and abundance in sequencing datasets | 39
MungeSumstats: a Bioconductor package for the standardization and quality control of many GWAS summary statistics | 38
Generating property-matched decoy molecules using deep learning | 38
Graph neural representational learning of RNA secondary structures for predicting RNA-protein interactions | 38
Improved survival analysis by learning shared genomic information from pan-cancer data | 38
Predicting protein–peptide binding residues via interpretable deep learning | 38
GPDBN: deep bilinear network integrating both genomic data and pathological images for breast cancer prognosis prediction | 38
cytomapper: an R/Bioconductor package for visualization of highly multiplexed imaging data | 38
coronaSPAdes: from biosynthetic gene clusters to RNA viral assemblies | 37
ELIXIR: providing a sustainable infrastructure for life science data at European scale | 37
SOMDE: a scalable method for identifying spatially variable genes with self-organizing map | 37
ASpli: integrative analysis of splicing landscapes through RNA-Seq assays | 37
Modeling multi-scale data via a network of networks | 37
scGate: marker-based purification of cell types from heterogeneous single-cell RNA-seq datasets | 37
SimPlot++: a Python application for representing sequence similarity and detecting recombination | 37
DeepViral: prediction of novel virus–host interactions from protein sequences and infectious disease phenotypes | 36
VIDHOP, viral host prediction with deep learning | 36
Advanced graph and sequence neural networks for molecular property prediction and drug discovery | 36
Deep learning models for RNA secondary structure prediction (probably) do not generalize across families | 36
Bayesian modeling of spatial molecular profiling data via Gaussian process | 36
Topsy-Turvy: integrating a global view into sequence-based PPI prediction | 36
iPromoter-BnCNN: a novel branched CNN-based predictor for identifying and classifying sigma promoters | 36
FraGAT: a fragment-oriented multi-scale graph attention model for molecular property prediction | 36
Adversarial deconfounding autoencoder for learning robust gene expression embeddings | 36
PhosIDN: an integrated deep neural network for improving protein phosphorylation site prediction by combining sequence and protein–protein interaction information | 35
BERTMHC: improved MHC–peptide class II interaction prediction with transformer and multiple instance learning | 35
QDeep: distance-based protein model quality estimation by residue-level ensemble error classifications using stacked deep residual neural networks | 35
MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model | 34
synergy: a Python library for calculating, analyzing and visualizing drug combination synergy | 34
Learning embedding features based on multisense-scaled attention architecture to improve the predictive performance of anticancer peptides | 34
TPpred-ATMV: therapeutic peptide prediction by adaptive multi-view tensor learning model | 34
Multi-omics data integration by generative adversarial network | 34
DNA Chisel, a versatile sequence optimizer | 34
KG4SL: knowledge graph neural network for synthetic lethality prediction in human cancers | 33
TRTools: a toolkit for genome-wide analysis of tandem repeats | 33
FBA reveals guanylate kinase as a potential target for antiviral therapies against SARS-CoV-2 | 33
Inference of gene regulatory networks based on nonlinear ordinary differential equations | 33
Pre-training graph neural networks for link prediction in biomedical networks | 33
Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data | 33
Transfer learning via multi-scale convolutional neural layers for human–virus protein–protein interaction prediction | 33
Node similarity-based graph convolution for link prediction in biological networks | 32
DisoLipPred: accurate prediction of disordered lipid-binding residues in protein sequences with deep recurrent networks and transfer learning | 32
Sparse and skew hashing of K-mers | 32
DRUMMER—rapid detection of RNA modifications through comparative nanopore sequencing | 32
STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data | 32
sepal: identifying transcript profiles with spatial patterns by diffusion-based modeling | 32
CLNN-loop: a deep learning model to predict CTCF-mediated chromatin loops in the different cell lines and CTCF-binding sites (CBS) pair types | 31
Inferring cancer progression from Single-Cell Sequencing while allowing mutation losses | 31
Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework | 31
ipDMR: identification of differentially methylated regions with interval P-values | 31
A mixture model for determining SARS-Cov-2 variant composition in pooled samples | 31
A deep dilated convolutional residual network for predicting interchain contacts of protein homodimers | 31
Methylartist: tools for visualizing modified bases from nanopore sequence data | 31
Identifying signaling genes in spatial single-cell expression data | 31
CUT&RUNTools 2.0: a pipeline for single-cell and bulk-level CUT&RUN and CUT&Tag data analysis | 31
Supervised graph co-contrastive learning for drug–target interaction prediction | 31
EpiGraphDB: a database and data mining platform for health data science | 31
Integrating genome-scale metabolic modelling and transfer learning for human gene regulatory network reconstruction | 30
ClusTCR: a python interface for rapid clustering of large sets of CDR3 sequences with unknown antigen specificity | 30
Cooperative sequence clustering and decoding for DNA storage system with fountain codes | 30
DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms | 30
PecanPy: a fast, efficient and parallelized Python implementation of node2vec | 30
AITL: Adversarial Inductive Transfer Learning with input and output space adaptation for pharmacogenomics | 30
High-performance single-cell gene regulatory network inference at scale: the Inferelator 3.0 | 30
Improved design and analysis of practical minimizers | 30
E-MAGMA: an eQTL-informed method to identify risk genes using genome-wide association study summary statistics | 30
Statistical approaches for differential expression analysis in metatranscriptomics | 30
Drug repurposing against breast cancer by integrating drug-exposure expression profiles and drug–drug links based on graph neural network | 29
Annotating high-impact 5′untranslated region variants with the UTRannotator | 29
TransformerGO: predicting protein–protein interactions by modelling the attention between sets of gene ontology terms | 29
Sigflow: an automated and comprehensive pipeline for cancer genome mutational signature analysis | 29
CrossTalkeR: analysis and visualization of ligand–receptor networks | 29
SMILE: mutual information learning for integration of single-cell omics data | 29
NerLTR-DTA: drug–target binding affinity prediction based on neighbor relationship and learning to rank | 29
The string decomposition problem and its applications to centromere analysis and assembly | 29
Clustering spatial transcriptomics data | 29
Single-cell RNA-seq data semi-supervised clustering and annotation via structural regularized domain adaptation | 29
DeepTrio: a ternary prediction system for protein–protein interaction using mask multiple parallel convolutional neural networks | 29
Accurate spliced alignment of long RNA sequencing reads | 29
Effective drug–target interaction prediction with mutual interaction neural network | 29
Federated Random Forests can improve local performance of predictive models for various healthcare applications | 28
CSM-AB: graph-based antibody–antigen binding affinity prediction and docking scoring function | 28
SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model | 28
Graph2MDA: a multi-modal variational graph embedding model for predicting microbe–drug associations | 28
AOP-helpFinder webserver: a tool for comprehensive analysis of the literature to support adverse outcome pathways development | 28
MOMA: a multi-task attention learning algorithm for multi-omics data interpretation and classification | 28
Cross-dependent graph neural networks for molecular property prediction | 28
Non-parametric modelling of temporal and spatial counts data from RNA-seq experiments | 28
panRGP: a pangenome-based method to predict genomic islands and explore their diversity | 28
LLPSDB v2.0: an updated database of proteins undergoing liquid–liquid phase separation in vitro | 28
DeepUMQA: ultrafast shape recognition-based protein model quality assessment using deep learning | 28
Interpretable-ADMET: a web service for ADMET prediction and optimization based on deep neural representation | 28
GNN-based embedding for clustering scRNA-seq data | 28