Empirical Software Engineering

Papers
(The median citation count of papers in Empirical Software Engineering is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2021-06-01 to 2025-06-01.)
Article | Citations
Introduction to the special issue on program comprehension | 186
Optimal priority assignment for real-time systems: a coevolution-based approach | 95
Effects of variability in models: a family of experiments | 72
Toward effective secure code reviews: an empirical study of security-related coding weaknesses | 63
Consensus task interaction trace recommender to guide developers’ software navigation | 59
An empirical study on the effectiveness of large language models for SATD identification and classification | 59
On the adoption and effects of source code reuse on defect proneness and maintenance effort | 56
Path context augmented statement and network for learning programs | 55
TestEvoViz: visualizing genetically-based test coverage evolution | 52
Understanding the characteristics and the role of visual issue reports | 49
Dynamical analysis of diversity in rule-based open source network intrusion detection systems | 42
Does the first response matter for future contributions? A study of first contributions | 42
The human experience of comprehending source code in virtual reality | 40
Can static analysis tools find more defects? | 38
Seeing the invisible: test prioritization for object detection system | 36
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests | 36
A study of documentation for software architecture | 34
Evaluating few-shot and contrastive learning methods for code clone detection | 32
An empirical study of IoT topics in IoT developer discussions on Stack Overflow | 32
Bugs in machine learning-based systems: a faultload benchmark | 32
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata | 32
Efficient static analysis and verification of featured transition systems | 31
The impact of class imbalance techniques on crashing fault residence prediction models | 31
Deep learning based identification of inconsistent method names: How far are we? | 28
Smells in system user interactive tests | 28
Cross-status communication and project outcomes in OSS development | 28
Automatic prediction of rejected edits in Stack Overflow | 28
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering | 27
App review driven collaborative bug finding | 27
A fine-grained taxonomy of code review feedback in TypeScript projects | 27
On the use of commit-relevant mutants | 26
Testing the past: can we still run tests in past snapshots for Java projects? | 26
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study | 26
The impact of the COVID-19 pandemic on women’s contribution to public code | 25
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow | 25
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration | 24
Automated test generation for Scratch programs | 24
Towards cost-benefit evaluation for continuous software engineering activities | 24
Developers’ perception matters: machine learning to detect developer-sensitive smells | 23
On the impact of security vulnerabilities in the npm and RubyGems dependency networks | 23
Evaluating the impact of flaky simulators on testing autonomous driving systems | 23
BTLink : automatic link recovery between issues and commits based on pre-trained BERT model | 22
An empirical study of the impact of log parsers on the performance of log-based anomaly detection | 22
Code reviews in open source projects : how do gender biases affect participation and outcomes? | 21
Visualizing the customization endeavor in product-based-evolving software product lines: a case of action design research | 21
Static detection of equivalent mutants in real-time model-based mutation testing | 21
How far are we with automated machine learning? characterization and challenges of AutoML toolkits | 21
An empirical study of untangling patterns of two-class dependency cycles | 21
Experimental comparison of features, analyses, and classifiers for Android malware detection | 21
An empirical evaluation of a novel domain-specific language – modelling vehicle routing problems with Athos | 21
A large-scale empirical study of commit message generation: models, datasets and evaluation | 21
JNFuzz-Droid: a lightweight fuzzing and taint analysis framework for native code of Android applications | 20
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity | 20
The well-being of software engineers: a systematic literature review and a theory | 20
Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements | 20
Advantages and disadvantages of (dedicated) model transformation languages | 20
A grounded theory of community package maintenance organizations | 20
Securing dependencies: A comprehensive study of Dependabot’s impact on vulnerability mitigation | 19
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects | 19
An empirical study on the potential of word embedding techniques in bug report management tasks | 19
How far are app secrets from being stolen? a case study on android | 19
A configurable method for benchmarking scalability of cloud-native applications | 19
Engineering recommender systems for modelling languages: concept, tool and evaluation | 18
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes | 18
LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction | 18
Patterns of multi-container composition for service orchestration with Docker Compose | 18
Towards a recipe for language decomposition: quality assessment of language product lines | 18
Software product line testing: a systematic literature review | 18
Demystifying regular expression bugs | 18
A metrics-based approach for selecting among various refactoring candidates | 17
What kinds of contracts do ML APIs need? | 17
Take a deep breath: Benefits of neuroplasticity practices for software developers and computer workers in a family of experiments | 17
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction | 17
Lightweight dynamic build batching algorithms for continuous integration | 17
When less is more: on the value of “co-training” for semi-supervised software defect predictors | 16
Software testing in the machine learning era | 16
OpTrans: enhancing binary code similarity detection with function inlining re-optimization | 16
On the Investigation of Empirical Contradictions - Aggregated Results of Local Studies on Readability and Comprehensibility of Source Code | 16
Test smells 20 years later: detectability, validity, and reliability | 16
RAG-Driven multiple assertions generation with large language models | 16
Common challenges of deep reinforcement learning applications development: an empirical study | 16
Mastering uncertainty in performance estimations of configurable software systems | 16
Language usage analysis for EMF metamodels on GitHub | 16
On systematically building a controlled natural language for functional requirements | 16
An empirical study of Q&A websites for game developers | 15
Automated driver management for Selenium WebDriver | 15
Semantic matching in GUI test reuse | 15
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? | 15
Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study | 14
Gamification in software engineering: the mediating role of developer engagement and job satisfaction | 14
Applying bayesian data analysis for causal inference about requirements quality: a controlled experiment | 14
Comparing effectiveness and efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools in a large java-based system | 14
An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction | 14
Toward granular search-based automatic unit test case generation | 14
Prioritizing test cases for deep learning-based video classifiers | 14
Präzi: from package-based to call-based dependency networks | 14
Correction to: Examining ownership models in software teams | 14
A study of how Docker Compose is used to compose multi-component systems | 14
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts | 13
Semantically-enhanced topic recommendation systems for software projects | 13
Demystifying API misuses in deep learning applications | 13
Which design decisions in AI-enabled mobile applications contribute to greener AI? | 13
Why secret detection tools are not enough: It’s not just about false positives - An industrial case study | 13
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP | 13
Defect prediction using deep learning with Network Portrait Divergence for software evolution | 13
A controlled experiment on the impact of microtasking on programming | 13
Program transformation landscapes for automated program modification using Gin | 13
Challenges and practices of deep learning model reengineering: A case study on computer vision | 13
Reflections on the Empirical Software Engineering journal | 13
DDImage: an image reduction based approach for automatically explaining black-box classifiers | 13
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development | 12
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions | 12
Correction to: Towards a recipe for language decomposition: quality assessment of language product lines | 12
Styler: learning formatting conventions to repair Checkstyle violations | 12
Static analysis driven enhancements for comprehension in machine learning notebooks | 12
Modeling function-level interactions for file-level bug localization | 12
Unveiling overlooked performance variance in serverless computing | 12
E-APR: Mapping the effectiveness of automated program repair techniques | 12
A fine-grained data set and analysis of tangling in bug fixing commits | 12
Cross-project defect prediction via semantic and syntactic encoding | 12
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19 | 11
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop | 11
Fixing Dockerfile smells: an empirical study | 11
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 11
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities | 11
CyberSAGE: The cyber security argument graph evaluation tool | 11
An empirical study of same-day releases of popular packages in the npm ecosystem | 11
On the spread and evolution of dead methods in Java desktop applications: an exploratory study | 11
A comprehensive overview of software product management challenges | 11
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery | 11
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension | 11
Learning to Predict Code Review Completion Time In Modern Code Review | 11
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems | 11
Explainable automated debugging via large language model-driven scientific debugging | 11
An empirical study on self-admitted technical debt in Dockerfiles | 11
Story points changes in agile iterative development | 10
Model vs system level testing of autonomous driving systems: a replication and extension study | 10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models | 10
Assessing the exposure of software changes | 10
Understanding and effectively mitigating code review anxiety | 10
Studying differentiated code to support smart contract update | 10
Refactoring practices in the context of data-intensive systems | 10
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets | 10
Predicting merge conflicts considering social and technical assets | 10
SoftNER: Mining knowledge graphs from cloud incidents | 10
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations | 10
Studying the characteristics of AIOps projects on GitHub | 10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues | 10
A qualitative study on refactorings induced by code review | 10
Exposed! A case study on the vulnerability-proneness of Google Play Apps | 10
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness | 9
GitHub Discussions: An exploratory study of early adoption | 9
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months | 9
Can search-based testing with pareto optimization effectively cover failure-revealing test inputs? | 9
On the assignment of commits to releases | 9
Hyperfuzzing: black-box security hypertesting with a grey-box fuzzer | 9
IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction | 9
Detecting data manipulation errors in android applications using scene-guided exploration | 9
Inter-team communication in large-scale co-located software engineering: a case study | 9
Transformer-based code model with compressed hierarchy representation | 9
Automatic bi-modal question title generation for Stack Overflow with prompt learning | 9
Characterizing refactoring graphs in Java and JavaScript projects | 9
Multi-granular software annotation using file-level weak labelling | 9
How to cherry pick the bug report for better summarization? | 9
Navigating fairness: practitioners’ understanding, challenges, and strategies in AI/ML development | 9
Agile software development one year into the COVID-19 pandemic | 9
Beyond the virus: a first look at coronavirus-themed Android malware | 9
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow | 9
How programmers find online learning resources | 9
Extracting enhanced artificial intelligence model metadata from software repositories | 9
Toward a theory on programmer’s block inspired by writer’s block | 9
Deep learning approaches for bad smell detection: a systematic literature review | 8
Just-in-Time crash prediction for mobile apps | 8
What happens in my code reviews? An investigation on automatically classifying review changes | 8
Quality issues in machine learning software systems | 8
Industrial adoption of machine learning techniques for early identification of invalid bug reports | 8
Mining Python fix patterns via analyzing fine-grained source code changes | 8
FeatCompare: Feature comparison for competing mobile apps leveraging user reviews | 8
Does code review speed matter for practitioners? | 8
Machine learning-based test smell detection | 8
How do i refactor this? An empirical study on refactoring trends and topics in Stack Overflow | 8
Understanding refactorings in Elixir functional language | 8
Improving hardware/software interface management in systems of systems through documentation as code | 8
The forgotten role of search queries in IR-based bug localization: an empirical study | 8
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews | 8
Investigating user feedback from a crowd in requirements management in software ecosystems | 8
Mining and relating design contexts and design patterns from Stack Overflow | 8
Studying eventual connectivity issues in Android apps | 8
An empirical study of the systemic and technical migration towards microservices | 8
Software selection in large-scale software engineering: A model and criteria based on interactive rapid reviews | 8
Automated detection, categorisation and developers’ experience with the violations of honesty in mobile apps | 8
Come for syntax, stay for speed, write secure code: an empirical study of security weaknesses in Julia programs | 8
On the suitability of hugging face hub for empirical studies | 8
Predicting the objective and priority of issue reports in software repositories | 8
Reuse and maintenance practices among divergent forks in three software ecosystems | 8
The making of accessible Android applications: an empirical study on the state of the practice | 8
Correction to: Advantages and disadvantages of (dedicated) model transformation languages | 8
Empirical evaluation of tools for hairy requirements engineering tasks | 8
A longitudinal explanatory case study of coordination in a very large development programme: the impact of transitioning from a first- to a second-generation large-scale agile development method | 7
Ethics in AI through the practitioner’s view: a grounded theory literature review | 7
Correction to: Utilization of pre-trained language models for adapter-based knowledge transfer in software engineering | 7
Quantum circuit mutants: Empirical analysis and recommendations | 7
Fluently specifying taint-flow queries with fluentTQL | 7
Tracking bad updates in mobile apps: a search-based approach | 7
On the usage and development of deep learning compilers: an empirical study on TVM | 7
Developer discussion topics on the adoption and barriers of low code software development platforms | 7
The indolent lambdification of Java | 7
What makes a code review useful to OpenDev developers? An empirical investigation | 7
Software reconfiguration in robotics | 7
A multi-objective effort-aware approach for early code review prediction and prioritization | 7
SparseCoder: Advancing source code analysis with sparse attention and learned token pruning | 7
Causal inference of server- and client-side code smells in web apps evolution | 7
ROBUST: 221 bugs in the Robot Operating System | 7
Deep security analysis of program code | 7
Works for Me! Cannot Reproduce – A Large Scale Empirical Study of Non-reproducible Bugs | 7
Rotten green tests in Java, Pharo and Python | 7
WIA-SZZ: Work item aware SZZ | 7
Developer-centric test amplification | 7
The evolution of the code during review: an investigation on review changes | 7
Revisiting the building of past snapshots — a replication and reproduction study | 7
Bug characterization in machine learning-based systems | 7
Integrating human values in software development using a human values dashboard | 7
A preliminary investigation on using multi-task learning to predict change performance in code reviews | 7
Leveraging encoder-only large language models for mobile app review feature extraction | 7
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning | 6
Investigating developers’ perception on software testability and its effects | 6
Design smells in multi-language systems and bug-proneness: a survival analysis | 6
Identifying self-admitted technical debt in issue tracking systems using machine learning | 6
Opportunities and security risks of technical leverage: A replication study on the NPM ecosystem | 6
Static test flakiness prediction: How Far Can We Go? | 6
An empirical study of business process models and model clones on GitHub | 6
A systematic review on smart contracts security design patterns | 6
The role of psychological safety in promoting software quality in agile teams | 6
From anecdote to evidence: the relationship between personality and need for cognition of developers | 6
Correction to: Analysing app reviews for software engineering: a systematic literature review | 6
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools | 6
CT-IoT: a combinatorial testing-based path selection framework for effective IoT testing | 6
How are project-specific forums utilized? A study of participation, content, and sentiment in the Eclipse ecosystem | 6
Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub | 6
Detection and evaluation of bias-inducing features in machine learning | 6
Predicting health indicators for open source projects (using hyperparameter optimization) | 6
Game-based Sprint retrospectives: multiple action research | 6
Does class size matter? An in-depth assessment of the effect of class size in software defect prediction | 6
Fixing vulnerabilities potentially hinders maintainability | 6
Do I really need all this work to find vulnerabilities? | 6
What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk | 6
Detecting outdated code element references in software repository documentation | 6
An extensive replication study of the ABLoTS approach for bug localization | 6
A review of automatic source code summarization | 6
The best ends by the best means: ethical concerns in app reviews | 6