OOIR: Observatory of International Research

Papers

(The median citation count of Empirical Software Engineering is 4. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2022-06-01 to 2026-06-01.)
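
The selection rule described in the note above can be sketched as follows. This is only a minimal illustration, not OOIR's actual implementation: the function name `select_papers`, the tuple layout, the inclusive date bounds, and the choice to compute the median over the papers inside the window are all assumptions.

```python
from datetime import date
from statistics import median


def select_papers(papers, start, end, cap=250):
    """Return papers cited more often than the median, most cited first.

    papers: list of (title, citations, published) tuples (hypothetical layout).
    start, end: publication window, assumed inclusive on both ends.
    cap: maximum number of rows to return (250 in the note above).
    """
    # Keep only papers published inside the window.
    in_window = [p for p in papers if start <= p[2] <= end]
    if not in_window:
        return []

    # Assumption: the threshold is the median citation count of these papers.
    threshold = median(p[1] for p in in_window)

    # Strictly above the threshold, sorted by citations in descending order.
    above = [p for p in in_window if p[1] > threshold]
    above.sort(key=lambda p: p[1], reverse=True)
    return above[:cap]


# Made-up sample data, for illustration only.
sample = [
    ("Paper A", 10, date(2023, 1, 1)),
    ("Paper B", 1, date(2023, 2, 1)),
    ("Paper C", 4, date(2024, 1, 1)),
    ("Paper D", 7, date(2021, 1, 1)),  # outside the window
    ("Paper E", 5, date(2025, 1, 1)),
]

for title, citations, _ in select_papers(sample, date(2022, 6, 1), date(2026, 6, 1)):
    print(f"{title}\t{citations}")
```

The cap is applied after sorting, so a truncated list keeps only the most cited papers.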

Article	Citations
Introduction to the special issue on program comprehension	104
TestEvoViz: visualizing genetically-based test coverage evolution	88
Consensus task interaction trace recommender to guide developers’ software navigation	88
Shaky structures: The wobbly world of causal graphs in software analytics	80
Underproduction analysis of open source software	71
The human experience of comprehending source code in virtual reality	66
Security by documentation? characterizing GitHub SECURITY.md policy and their adoption in Python libraries	55
The design space of lockfiles across package managers	51
(In)Security of mobile apps in developing countries: a systematic literature review	50
Seeing the invisible: test prioritization for object detection system	45
Optimal priority assignment for real-time systems: a coevolution-based approach	42
Can static analysis tools find more defects?	41
Understanding the characteristics and the role of visual issue reports	40
Fuzzing-based mutation testing of C/C++ software in cyber-physical systems	40
Evaluating software user feedback classifier performance on unseen apps, datasets, and metadata	39
An empirical study on the effectiveness of large language models for SATD identification and classification	38
More than React: Investigating the Role of Emoji Reaction in GitHub Pull Requests	38
Mitigating omitted variable bias in empirical software engineering	37
On the adoption and effects of source code reuse on defect proneness and maintenance effort	37
Does the first response matter for future contributions? A study of first contributions	35
Bugs in machine learning-based systems: a faultload benchmark	35
Evaluating few-shot and contrastive learning methods for code clone detection	34
Toward effective secure code reviews: an empirical study of security-related coding weaknesses	33
A study of documentation for software architecture	33
The impact of the COVID-19 pandemic on women’s contribution to public code	32
Automated test generation for Scratch programs	32
Developers’ perception matters: machine learning to detect developer-sensitive smells	32
Cross-status communication and project outcomes in OSS development	32
Automatic prediction of rejected edits in Stack Overflow	31
On the emergence of testing strategies: A socio-technical grounded theory	30
The impact of class imbalance techniques on crashing fault residence prediction models	30
BTLink: automatic link recovery between issues and commits based on pre-trained BERT model	29
Evaluating the impact of flaky simulators on testing autonomous driving systems	29
App review driven collaborative bug finding	27
Deep learning techniques to detect cybersecurity attacks: a systematic mapping study	26
Output format biases in the evaluation of large language models for code translation	26
Maintaining shared understanding of non-functional requirements in small companies using continuous software engineering	25
Towards cost-benefit evaluation for continuous software engineering activities	24
Analyzing and mitigating (with LLMs) the security misconfigurations of Helm charts from Artifact Hub	24
What causes exceptions in machine learning applications? Mining machine learning-related stack traces on Stack Overflow	24
Testing the past: can we still run tests in past snapshots for Java projects?	24
Collaboration failure analysis in cyber-physical system-of-systems using context fuzzy clustering	24
Deep learning based identification of inconsistent method names: How far are we?	24
The Influence of Code Comments on the Perceived Helpfulness of Stack Overflow Posts	23
Smells in system user interactive tests	23
A fine-grained taxonomy of code review feedback in TypeScript projects	23
The effect of stereotypes on perceived competence of indigenous software practitioners: a study of dress style in professional photos	22
Indentation and reading time: a randomized control trial on the differences between generated indented and non-indented if-statements	22
An empirical study of untangling patterns of two-class dependency cycles	22
AI support for data scientists: An empirical study on workflow and alternative code recommendations	22
A grounded theory of community package maintenance organizations	22
A Comprehensive Study of the Lifecycle of Dormant npm Packages	21
How far are we with automated machine learning? characterization and challenges of AutoML toolkits	21
On combining commit grouping and build skip prediction to reduce redundant continuous integration activity	21
An empirical evaluation of a novel domain-specific language – modelling vehicle routing problems with Athos	21
A configurable method for benchmarking scalability of cloud-native applications	21
An empirical study of the impact of log parsers on the performance of log-based anomaly detection	21
Why android app testing falls short: empirical insights from open-source projects and a practitioner survey	20
JNFuzz-Droid: a lightweight fuzzing and taint analysis framework for native code of Android applications	20
Code reviews in open source projects: how do gender biases affect participation and outcomes?	19
How far are app secrets from being stolen? a case study on android	19
Static detection of equivalent mutants in real-time model-based mutation testing	19
Securing dependencies: A comprehensive study of Dependabot’s impact on vulnerability mitigation	19
Scalable hierarchical protocol format inference via feature-heuristic message delimiter	19
Quantifying adoption: A SEM study of quantum software technology in software development	18
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects	18
Experimental comparison of features, analyses, and classifiers for Android malware detection	18
Understanding practitioners’ reasoning and requirements for efficient tool support in technical debt management	18
Advantages and disadvantages of (dedicated) model transformation languages	18
The well-being of software engineers: a systematic literature review and a theory	17
Lightweight dynamic build batching algorithms for continuous integration	17
Patterns of multi-container composition for service orchestration with Docker Compose	17
A large-scale empirical study of commit message generation: models, datasets and evaluation	17
A metrics-based approach for selecting among various refactoring candidates	17
Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction	16
An empirical study on the potential of word embedding techniques in bug report management tasks	16
ContractFull: a rapid and comprehensive static analysis tool for Ethereum smart contracts	16
Engineering recommender systems for modelling languages: concept, tool and evaluation	16
Validation of an analyzability model for quantum software: a family of experiments	16
Local software buildability across Java versions	16
Language usage analysis for EMF metamodels on GitHub	15
LineFlowDP: A Deep Learning-Based Two-Phase Approach for Line-Level Defect Prediction	15
Common challenges of deep reinforcement learning applications development: an empirical study	15
Enhanced SQL error messages facilitate faster error fixing	15
Software testing in the machine learning era	15
Mastering uncertainty in performance estimations of configurable software systems	15
What kinds of contracts do ML APIs need?	15
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes	15
Securing LLM-in-the-loop software for empirical study of risks, mitigations, and utility trade-offs in a safety-critical case	15
Tools and benchmarks evolve: what is their impact on parameter tuning in SBSE experiments?	15
Software product line testing: a systematic literature review	15
OpTrans: enhancing binary code similarity detection with function inlining re-optimization	14
On the Investigation of Empirical Contradictions - Aggregated Results of Local Studies on Readability and Comprehensibility of Source Code	14
Comparing effectiveness and efficiency of Interactive Application Security Testing (IAST) and Runtime Application Self-Protection (RASP) tools in a large java-based system	14
RAG-Driven multiple assertions generation with large language models	14
When less is more: on the value of “co-training” for semi-supervised software defect predictors	14
Can the configuration of static analyses make resolving security vulnerabilities more effective? - A user study	14
Semantic matching in GUI test reuse	14
Prioritizing test cases for deep learning-based video classifiers	14
Test smells 20 years later: detectability, validity, and reliability	14
An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction	14
Preface to the Special Issue on Security Testing for Complex Software Systems	14
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code?	14
DRECT: A search-based developer recommendation approach for software crowdsourcing platforms	14
Exploring the black box: analysing explainable AI challenges and best practices through stack exchange discussions	13
Challenges and practices of deep learning model reengineering: A case study on computer vision	13
Program transformation landscapes for automated program modification using Gin	13
Classifier or prompt: A case study on legal requirements traceability	13
Test schedule generation for acceptance testing of mission-critical satellite systems	13
Meta-enhanced code: leveraging structural and functional features for precise cross-modal code search	13
Defect prediction using deep learning with Network Portrait Divergence for software evolution	13
Toward granular search-based automatic unit test case generation	13
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts	13
Applying bayesian data analysis for causal inference about requirements quality: a controlled experiment	13
Semantically-enhanced topic recommendation systems for software projects	13
Towards understanding the challenges of bug localization in deep learning systems	13
Correction to: Examining ownership models in software teams	13
Studying the explanations for the automated prediction of bug and non-bug issues using LIME and SHAP	13
Measuring SES-related traits relating to technology usage: Two validated surveys	13
Which design decisions in AI-enabled mobile applications contribute to greener AI?	13
An exploratory study on fine-tuning large language models for secure code generation	13
An empirical study of testing practices in open source AI agent frameworks and agentic applications	13
On the spread and evolution of dead methods in Java desktop applications: an exploratory study	12
On detection latencies of network intrusion detectors – discussion and application	12
DDImage: an image reduction based approach for automatically explaining black-box classifiers	12
A zero-shot framework for cross-project vulnerability detection in source code	12
A multi-model framework for semantically enhancing detection of quality-related bug report descriptions	12
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively	11
What have we learned? A conceptual framework on New Zealand software professionals and companies’ response to COVID-19	11
Experimental Evaluation of a Checklist-Based Inspection Technique to Verify the Compliance of Software Systems with the Brazilian General Data Protection Law	11
Automated detection of algorithm debt in deep learning frameworks: an empirical study	11
Implicit security requirements classification with large language models using the OWASP application security verification standard: a shift-left approach	11
Fixing Dockerfile smells: an empirical study	11
CyberSAGE: The cyber security argument graph evaluation tool	11
Identifying performance-sensitive configurations in software systems with LLM-based agents	11
Modeling function-level interactions for file-level bug localization	11
Styler: learning formatting conventions to repair Checkstyle violations	11
KPIRoot+: An efficient integrated framework for anomaly detection and root cause analysis in large-scale cloud systems	11
Unveiling overlooked performance variance in serverless computing	11
Demystifying API misuses in deep learning applications	11
When uncertainty leads to unsafety: Empirical insights into the role of uncertainty in unmanned aerial vehicle safety	11
APR4Vul: an empirical study of automatic program repair techniques on real-world Java vulnerabilities	11
Seeing confusion through a new lens: on the impact of atoms of confusion on novices’ code comprehension	11
A controlled experiment on the impact of microtasking on programming	11
A fine-grained evaluation of mutation operators to boost mutation testing for deep learning systems	11
A fine-grained data set and analysis of tangling in bug fixing commits	11
How challenging it is to identify real code authors: an empirical study	11
Static analysis driven enhancements for comprehension in machine learning notebooks	11
Learning to Predict Code Review Completion Time In Modern Code Review	11
Cross-project defect prediction via semantic and syntactic encoding	11
Towards automatic labeling of exception handling bugs: A case study of 10 years bug-fixing in Apache Hadoop	11
Understanding and effectively mitigating code review anxiety	10
Investigating cross-market android apps: Security, protection, and components	10
Studying differentiated code to support smart contract update	10
Explainable automated debugging via large language model-driven scientific debugging	10
From guidelines to practice: assessing Android app developer compliance with google’s security recommendations	10
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months	10
Navigating fairness: practitioners’ understanding, challenges, and strategies in AI/ML development	10
Model vs system level testing of autonomous driving systems: a replication and extension study	10
Automatic bi-modal question title generation for Stack Overflow with prompt learning	10
Agile software development one year into the COVID-19 pandemic	10
Predicting merge conflicts considering social and technical assets	10
Detecting data manipulation errors in android applications using scene-guided exploration	10
Towards understanding quality challenges of the federated learning for neural networks: a first look from the lens of robustness	10
ComPass: Contrastive Learning for Automated Patch Correctness Assessment in Program Repair	10
Story points changes in agile iterative development	10
Studying the characteristics of AIOps projects on GitHub	10
Assessing the exposure of software changes	10
Detecting API compatibility issues of android applications based on screen transition graphs	10
Empirically evaluating flaky test detection techniques combining test case rerunning and machine learning models	10
Transformer-based code model with compressed hierarchy representation	10
Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow	10
A qualitative study on refactorings induced by code review	10
Refactoring practices in the context of data-intensive systems	10
An empirical study on developers’ shared conversations with ChatGPT in GitHub pull requests and issues	10
Hyperfuzzing: black-box security hypertesting with a grey-box fuzzer	9
Can search-based testing with pareto optimization effectively cover failure-revealing test inputs?	9
An empirical study of the systemic and technical migration towards microservices	9
Decoupling in AI ethics: Learning how to walk the talk	9
GenCode: A generic data augmentation framework for boosting deep learning-based code understanding	9
What characteristics make ChatGPT effective for software issue resolution? An empirical study of task, project, and conversational signals in GitHub issues	9
IRJIT: A simple, online, information retrieval approach for just-in-time software defect prediction	9
How programmers find online learning resources	9
Understanding refactorings in Elixir functional language	9
On the assignment of commits to releases	9
Can generative AI bridge the gap? A quasi-experimental study of non-programmers with AI vs. programmers without AI	9
An efficient model maintenance approach for MLOps	9
Developers and generative AI: A study of self-admitted usage in open source projects	9
Leveraging large language models for sentiment analysis in GitHub pull request discussions	9
The whos, whats, and whys of issues related to personal data and data protection in open-source projects on GitHub	9
Peer-aided repairer: empowering large language models to repair advanced student assignments	9
Software selection in large-scale software engineering: A model and criteria based on interactive rapid reviews	9
“What really happened to my models?” Extending co-evolution with cross-layer traceability in metamodel-model histories	9
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews	9
Extracting enhanced artificial intelligence model metadata from software repositories	9
Industrial adoption of machine learning techniques for early identification of invalid bug reports	9
Come for syntax, stay for speed, write secure code: an empirical study of security weaknesses in Julia programs	9
Toward a theory on programmer’s block inspired by writer’s block	9
Correction to: Why do companies create and how do they succeed with a vendor-led open source foundation	9
CMF-Vul: Advancing automated vulnerability detection via contrastive multimodal fusion and challenge-driven representation learning	9
Investigating user feedback from a crowd in requirements management in software ecosystems	8
Machine learning-based test smell detection	8
Multi-granular software annotation using file-level weak labelling	8
A comprehensive study of machine learning techniques for log-based anomaly detection	8
ROBUST: 221 bugs in the Robot Operating System	8
A multi-objective effort-aware approach for early code review prediction and prioritization	8
On the suitability of hugging face hub for empirical studies	8
Continuance use of AI coding assistants among South Korean Industry Developers: A survey case study with large language models	8
Software reconfiguration in robotics	8
What makes a code review useful to OpenDev developers? An empirical investigation	8
Quality issues in machine learning software systems	8
Deep learning approaches for bad smell detection: a systematic literature review	8
Correction to: Utilization of pre-trained language models for adapter-based knowledge transfer in software engineering	8
Improving hardware/software interface management in systems of systems through documentation as code	8
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools	8
A longitudinal explanatory case study of coordination in a very large development programme: the impact of transitioning from a first- to a second-generation large-scale agile development method	8
Quantum circuit mutants: Empirical analysis and recommendations	8
The making of accessible Android applications: an empirical study on the state of the practice	8
Correction to: Advantages and disadvantages of (dedicated) model transformation languages	8
Does code review speed matter for practitioners?	8
SparseCoder: Advancing source code analysis with sparse attention and learned token pruning	8
Automated detection, categorisation and developers’ experience with the violations of honesty in mobile apps	8
“I see models being a whole other thing”: an empirical study of pre-trained model naming conventions and a tool for enhancing naming consistency	8
Bringing it home: successful backsourcing of software development in the public sector	8
A multi-language perspective on the robustness of LLM code generation	8
Ethics in AI through the practitioner’s view: a grounded theory literature review	8
Just-in-Time crash prediction for mobile apps	8
WIA-SZZ: Work item aware SZZ	8
Peer code review in research software development: The research software engineer perspective	7
Less is more: usefulness of data flow diagrams and large language models for security threat validation	7
Guiding principles for mixed methods research in software engineering	7
A systematic review on smart contracts security design patterns	7
eBPF-Guard: a detection method for container escape via multi-level monitoring and enhanced analysis model	7
Opportunities and security risks of technical leverage: A replication study on the NPM ecosystem	7
An eye tracking study assessing source code readability rules for program comprehension	7
Integrating human values in software development using a human values dashboard	7
Immutable in principle, upgradeable by design: exploratory study of smart contract upgradeability	7
LegiCode: A blockchain-legal LLM framework for real-time compliance in smart contract generation	7
The role of psychological safety in promoting software quality in agile teams	7
Cross-project defect prediction based on transfer graph convolutional network	7
Predicting Post-release Defects with Knowledge Units (KUs) of Programming Languages: An Empirical Study	7
Investigating developers’ perception on software testability and its effects	7
Do I really need all this work to find vulnerabilities?	7
Developer discussion topics on the adoption and barriers of low code software development platforms	7
Technical leverage analysis in the Python ecosystem	7
Design smells in multi-language systems and bug-proneness: a survival analysis	7
Vulnerabilities in infrastructure as code: what, how many, and who?	7
MPDA: a data augmentation approach to improve deep learning for software vulnerability detection	7
Detection, classification and prevalence of self-admitted aging debt	7
Detecting outdated code element references in software repository documentation	7