Empirical Software Engineering

Papers
(The TQCC of Empirical Software Engineering is 8. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers publications from the past four years, i.e., from 2021-05-01 to 2025-05-01.)
Article	Citations
Introduction to the special issue on program comprehension	179
Seeing the invisible: test prioritization for object detection system	86
Optimal priority assignment for real-time systems: a coevolution-based approach	70
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata	59
Effects of variability in models: a family of experiments	56
Toward effective secure code reviews: an empirical study of security-related coding weaknesses	55
A study of documentation for software architecture	55
Can static analysis tools find more defects?	55
An empirical study on the effectiveness of large language models for SATD identification and classification	50
Consensus task interaction trace recommender to guide developers’ software navigation	49
On the adoption and effects of source code reuse on defect proneness and maintenance effort	47
Path context augmented statement and network for learning programs	46
TestEvoViz: visualizing genetically-based test coverage evolution	39
Efficient static analysis and verification of featured transition systems	38
Understanding the characteristics and the role of visual issue reports	38
Bugs in machine learning-based systems: a faultload benchmark	37
An empirical study of IoT topics in IoT developer discussions on Stack Overflow	35
Assessing practitioner beliefs about software engineering	34
Evaluating few-shot and contrastive learning methods for code clone detection	32
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests	32
Dynamical analysis of diversity in rule-based open source network intrusion detection systems	31
Does the first response matter for future contributions? A study of first contributions	31
Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net	29
The human experience of comprehending source code in virtual reality	29
The impact of class imbalance techniques on crashing fault residence prediction models	28
Smells in system user interactive tests	27
Automatic prediction of rejected edits in Stack Overflow	27
Deep learning based identification of inconsistent method names: How far are we?	27
Cross-status communication and project outcomes in OSS development	26
Automated test generation for Scratch programs	26
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration	25
Evaluating the impact of flaky simulators on testing autonomous driving systems	25
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering	25
On the use of commit-relevant mutants	24
Developers’ perception matters: machine learning to detect developer-sensitive smells	24
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study	24
App review driven collaborative bug finding	24
Testing the past: can we still run tests in past snapshots for Java projects?	23
The impact of the COVID-19 pandemic on women’s contribution to public code	22
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow	22
A fine-grained taxonomy of code review feedback in TypeScript projects	22
Towards cost-benefit evaluation for continuous software engineering activities	21
The well-being of software engineers: a systematic literature review and a theory	21
BTLink : automatic link recovery between issues and commits based on pre-trained BERT model	21
How far are app secrets from being stolen? a case study on android	21
An empirical study of the impact of log parsers on the performance of log-based anomaly detection	21
On the impact of security vulnerabilities in the npm and RubyGems dependency networks	21
Code reviews in open source projects : how do gender biases affect participation and outcomes?	21
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects	20
An empirical study of untangling patterns of two-class dependency cycles	20
An empirical evaluation of a novel domain-specific language – modelling vehicle routing problems with Athos	20
Visualizing the customization endeavor in product-based-evolving software product lines: a case of action design research	20
A large-scale empirical study of commit message generation: models, datasets and evaluation	19
A grounded theory of community package maintenance organizations	19
Experimental comparison of features, analyses, and classifiers for Android malware detection	19
Static detection of equivalent mutants in real-time model-based mutation testing	19
Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements	18
Securing dependencies: A comprehensive study of Dependabot’s impact on vulnerability mitigation	18
An empirical study on the potential of word embedding techniques in bug report management tasks	18
How far are we with automated machine learning? characterization and challenges of AutoML toolkits	18
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity	18
A configurable method for benchmarking scalability of cloud-native applications	18
Lightweight dynamic build batching algorithms for continuous integration	18
Advantages and disadvantages of (dedicated) model transformation languages	18
Engineering recommender systems for modelling languages: concept, tool and evaluation	18
Take a deep breath: Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments	18
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction	17
LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction	17
Towards a recipe for language decomposition: quality assessment of language product lines	17
A metrics-based approach for selecting among various refactoring candidates	17
Demystifying regular expression bugs	17
Patterns of multi-container composition for service orchestration with Docker Compose	17
Software product line testing: a systematic literature review	16
Common challenges of deep reinforcement learning applications development: an empirical study	16
Comparing effectiveness and efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools in a large java-based system	16
When less is more: on the value of “co-training” for semi-supervised software defect predictors	16
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes	16
What kinds of contracts do ML APIs need?	16
Language usage analysis for EMF metamodels on GitHub	16
Mastering uncertainty in performance estimations of configurable software systems	15
Automated driver management for Selenium WebDriver	15
Software testing in the machine learning era	15
On the Investigation of Empirical Contradictions - Aggregated Results of Local Studies on Readability and Comprehensibility of Source Code	15
On systematically building a controlled natural language for functional requirements	15
Semantic matching in GUI test reuse	15
OpTrans: enhancing binary code similarity detection with function inlining re-optimization	15
Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study	14
Präzi: from package-based to call-based dependency networks	14
Security assurance cases—state of the art of an emerging approach	14
RAG-Driven multiple assertions generation with large language models	14
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?	14
An empirical study of Q&A websites for game developers	14
Test smells 20 years later: detectability, validity, and reliability	14
Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection	13
Correction to: Examining ownership models in software teams	13
Toward granular search-based automatic unit test case generation	13
Gamification in software engineering: the mediating role of developer engagement and job satisfaction	13
Which design decisions in AI-enabled mobile applications contribute to greener AI?	13
Applying bayesian data analysis for causal inference about requirements quality: a controlled experiment	13
A study of how Docker Compose is used to compose multi-component systems	13
An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction	13
Challenges and practices of deep learning model reengineering: A case study on computer vision	13
Semantically-enhanced topic recommendation systems for software projects	13
Prioritizing test cases for deep learning-based video classifiers	13
Reflections on the Empirical Software Engineering journal	12
Defect prediction using deep learning with Network Portrait Divergence for software evolution	12
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively	12
Demystifying API misuses in deep learning applications	12
Fixing Dockerfile smells: an empirical study	12
Program transformation landscapes for automated program modification using Gin	12
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts	12
Styler: learning formatting conventions to repair Checkstyle violations	12
A controlled experiment on the impact of microtasking on programming	12
Why secret detection tools are not enough: It’s not just about false positives - An industrial case study	12
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP	12
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop	12
Learning to Predict Code Review Completion Time In Modern Code Review	12
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines	11
Cross-project defect prediction via semantic and syntactic encoding	11
DDImage: an image reduction based approach for automatically explaining black-box classifiers	11
On the spread and evolution of dead methods in Java desktop applications: an exploratory study	11
Unveiling overlooked performance variance in serverless computing	11
A comprehensive overview of software product management challenges	11
Modeling function-level interactions for file-level bug localization	11
Static analysis driven enhancements for comprehension in machine learning notebooks	11
CyberSAGE: The cyber security argument graph evaluation tool	11
A fine-grained data set and analysis of tangling in bug fixing commits	11
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities	11
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems	11
E-APR: Mapping the effectiveness of automated program repair techniques	11
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions	11
Explainable automated debugging via large language model-driven scientific debugging	10
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19	10
An empirical study of same-day releases of popular packages in the npm ecosystem	10
Assessing the exposure of software changes	10
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness	10
Understanding and effectively mitigating code review anxiety	10
An empirical study on self-admitted technical debt in Dockerfiles	10
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension	10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models	10
Refactoring practices in the context of data-intensive systems	10
Exposed! A case study on the vulnerability-proneness of Google Play Apps	10
Predicting merge conflicts considering social and technical assets	10
SoftNER: Mining knowledge graphs from cloud incidents	10
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery	10
A qualitative study on refactorings induced by code review	10
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development	10
Story points changes in agile iterative development	10
Automatic bi-modal question title generation for Stack Overflow with prompt learning	10
Agile software development one year into the COVID-19 pandemic	10
IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction	9
How programmers find online learning resources	9
Studying the characteristics of AIOps projects on GitHub	9
Beyond the virus: a first look at coronavirus-themed Android malware	9
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months	9
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues	9
Characterizing refactoring graphs in Java and JavaScript projects	9
On the assignment of commits to releases	9
Model vs system level testing of autonomous driving systems: a replication and extension study	9
Studying differentiated code to support smart contract update	9
Inter-team communication in large-scale co-located software engineering: a case study	9
GitHub Discussions: An exploratory study of early adoption	9
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow	9
Can search-based testing with pareto optimization effectively cover failure-revealing test inputs?	9
Transformer-based code model with compressed hierarchy representation	9
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations	9
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets	9
Navigating fairness: practitioners’ understanding, challenges, and strategies in AI/ML development	9
What happens in my code reviews? An investigation on automatically classifying review changes	8
Deep learning approaches for bad smell detection: a systematic literature review	8
The forgotten role of search queries in IR-based bug localization: an empirical study	8
Extracting enhanced artificial intelligence model metadata from software repositories	8
An empirical study of the systemic and technical migration towards microservices	8
How do i refactor this? An empirical study on refactoring trends and topics in Stack Overflow	8
Software selection in large-scale software engineering: A model and criteria based on interactive rapid reviews	8
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews	8
Hyperfuzzing: black-box security hypertesting with a grey-box fuzzer	8
Come for syntax, stay for speed, write secure code: an empirical study of security weaknesses in Julia programs	8
Empirical evaluation of tools for hairy requirements engineering tasks	8
Machine learning-based test smell detection	8
Industrial adoption of machine learning techniques for early identification of invalid bug reports	8
Predicting the objective and priority of issue reports in software repositories	8
Reuse and maintenance practices among divergent forks in three software ecosystems	8
How to cherry pick the bug report for better summarization?	8
Multi-granular software annotation using file-level weak labelling	8
Toward a theory on programmer’s block inspired by writer’s block	8