Empirical Software Engineering

Papers
(The median citation count of Empirical Software Engineering is 3. The table below lists the papers above that threshold, based on CrossRef citation counts [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-10-01 to 2024-10-01.)
Article | Citations
Sampling in software engineering research: a critical review and guidelines | 114
Predictors of well-being and productivity among software professionals during the COVID-19 pandemic – a longitudinal study | 90
Test case selection and prioritization using machine learning: a systematic literature review | 57
Perceived diversity in software engineering: a systematic literature review | 50
A comprehensive study of bloated dependencies in the Maven ecosystem | 49
Automated patch assessment for program repair at scale | 45
Enjoy your observability: an industrial survey of microservice tracing and analysis | 43
Topic modeling in software engineering research | 43
AI lifecycle models need to be revised | 43
A teamwork effectiveness model for agile software development | 40
A privacy and security analysis of early-deployed COVID-19 contact tracing Android apps | 40
Promises and challenges of microservices: an exploratory study | 38
Understanding and improving the quality and reproducibility of Jupyter notebooks | 37
Predicting the objective and priority of issue reports in software repositories | 35
Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection | 33
Lags in the release, adoption, and propagation of npm vulnerability fixes | 32
StateAFL: Greybox fuzzing for stateful network servers | 31
Analysing app reviews for software engineering: a systematic literature review | 29
An exploratory study on confusion in code reviews | 28
Out of sight, out of mind? How vulnerable dependencies affect open-source projects | 27
On the impact of security vulnerabilities in the npm and RubyGems dependency networks | 27
How do I refactor this? An empirical study on refactoring trends and topics in Stack Overflow | 27
The secret life of test smells - an empirical study on test smell evolution and maintenance | 25
Software development with feature toggles: practices used by practitioners | 25
Game-based Sprint retrospectives: multiple action research | 24
Self-admitted technical debt practices: a comparison between industry and open-source | 24
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? | 24
Topic recommendation for software repositories using multi-label classification algorithms | 24
World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data | 24
A large scale analysis of mHealth app user reviews | 24
Industry practices and challenges for the evolvability assurance of microservices | 23
Why are many businesses instilling a DevOps culture into their organization? | 23
Can Offline Testing of Deep Neural Networks Replace Their Online Testing? | 23
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development | 23
Empirical evaluation of tools for hairy requirements engineering tasks | 22
An empirical study of IoT topics in IoT developer discussions on Stack Overflow | 22
Understanding shared links and their intentions to meet information needs in modern code review: | 22
A family of experiments on test-driven development | 21
Locating faults with program slicing: an empirical analysis | 21
How Scrum adds value to achieving software quality? | 21
An empirical study on changing leadership in agile teams | 21
On the privacy of mental health apps | 21
GitHub Discussions: An exploratory study of early adoption | 20
Development of recommendation systems for software engineering: the CROSSMINER experience | 20
Publish or perish, but do not forget your software artifacts | 20
Learning to recognize actionable static code warnings (is intrinsically easy) | 20
Spearheading agile: the role of the scrum master in agile projects | 19
Beyond the virus: a first look at coronavirus-themed Android malware | 18
A longitudinal explanatory case study of coordination in a very large development programme: the impact of transitioning from a first- to a second-generation large-scale agile development method | 18
Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges | 18
Resource and dependency based test case generation for RESTful Web services | 18
Breaking bad? Semantic versioning and impact of breaking changes in Maven Central | 18
On systematically building a controlled natural language for functional requirements | 17
Strategies to manage quality requirements in agile software development: a multiple case study | 17
TaintBench: Automatic real-world malware benchmarking of Android taint analyses | 17
Test smells 20 years later: detectability, validity, and reliability | 17
An empirical study of automated unit test generation for Python | 17
Can pre-trained code embeddings improve model performance? Revisiting the use of code embeddings in software engineering tasks | 17
Predicting unstable software benchmarks using static source code features | 17
Assessment of off-the-shelf SE-specific sentiment analysis tools: An extended replication study | 16
On the usage, co-usage and migration of CI/CD tools: A qualitative analysis | 16
Ethics in the mining of software repositories | 16
Evaluating network embedding techniques’ performances in software bug prediction | 16
A unified multi-task learning model for AST-level and token-level code completion | 16
A fine-grained data set and analysis of tangling in bug fixing commits | 16
Systematic literature review on software quality for AI-based software | 15
Dynamical analysis of diversity in rule-based open source network intrusion detection systems | 15
To what extent do DNN-based image classification models make unreliable inferences? | 15
Will you come back to contribute? Investigating the inactivity of OSS core developers in GitHub | 15
Uniform and scalable sampling of highly configurable systems | 15
Learning from what we know: How to perform vulnerability prediction using noisy historical data | 15
Gamification in software engineering: the mediating role of developer engagement and job satisfaction | 14
The entrepreneurial logic of startup software development: A study of 40 software startups | 14
The effects of continuous integration on software development: a systematic literature review | 14
A configurable method for benchmarking scalability of cloud-native applications | 14
Reuse and maintenance practices among divergent forks in three software ecosystems | 14
A multi-dimensional analysis of technical lag in Debian-based Docker images | 14
API compatibility issues in Android: Causes and effectiveness of data-driven detection techniques | 14
Empirical analysis of security vulnerabilities in Python packages | 14
Automated end-to-end management of the modeling lifecycle in deep learning | 14
On effort-aware metrics for defect prediction | 13
Developer-centric test amplification | 13
Automatically recommending components for issue reports using deep learning | 13
Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection | 13
Machine learning-based test selection for simulation-based testing of self-driving cars software | 13
Software testing and Android applications: a large-scale empirical study | 13
An automated framework for the extraction of semantic legal metadata from legal texts | 13
Do I really need all this work to find vulnerabilities? | 13
Are datasets for information retrieval-based bug localization techniques trustworthy? | 13
A comparative study and analysis of developer communications on Slack and Gitter | 13
A first look at Android applications in Google Play related to COVID-19 | 12
Search-based fairness testing for regression-based machine learning systems | 12
What makes a popular academic AI repository? | 12
Demystifying the challenges and benefits of analyzing user-reported logs in bug reports | 12
Understanding developers’ privacy and security mindsets via climate theory | 12
Efficient static analysis and verification of featured transition systems | 12
FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction | 12
Towards effective assessment of steady state performance in Java software: are we there yet? | 12
Automated driver management for Selenium WebDriver | 12
A systematic literature review on trust in the software ecosystem | 12
GreenHub: a large-scale collaborative dataset to battery consumption analysis of android devices | 12
From anecdote to evidence: the relationship between personality and need for cognition of developers | 12
Identifying self-admitted technical debt in issue tracking systems using machine learning | 12
Deep security analysis of program code | 12
Automatic identification of self-admitted technical debt from four different sources | 12
An empirical study on self-admitted technical debt in Dockerfiles | 12
Model vs system level testing of autonomous driving systems: a replication and extension study | 11
Where were the repair ingredients for Defects4j bugs? | 11
An empirical study of text-based machine learning models for vulnerability detection | 11
An empirical study of the impact of log parsers on the performance of log-based anomaly detection | 11
Automatic team recommendation for collaborative software development | 11
A study of how Docker Compose is used to compose multi-component systems | 11
Learning by sampling: learning behavioral family models from software product lines | 11
Embedding API dependency graph for neural code generation | 11
Deep learning approaches for bad smell detection: a systematic literature review | 11
Mutation testing in the wild: findings from GitHub | 11
Understanding and improving artifact sharing in software engineering research | 11
Developer discussion topics on the adoption and barriers of low code software development platforms | 11
How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization) | 11
Characterizing usages, updates and risks of third-party libraries in Java projects | 10
Why and what happened? Aiding bug comprehension with automated category and causal link identification | 10
SPVF: security property assisted vulnerability fixing via attention-based models | 10
Improving energy-efficiency by recommending Java collections | 10
From one to hundreds: multi-licensing in the JavaScript ecosystem | 10
Using code reviews to automatically configure static analysis tools | 10
Exploring Performance Assurance Practices and Challenges in Agile Software Development: An Ethnographic Study | 10
An empirical study of Q&A websites for game developers | 10
An empirical study of question discussions on Stack Overflow | 10
An empirical study of developers’ discussions about security challenges of different programming languages | 10
Training students in evidence-based software engineering and systematic reviews: a systematic review and empirical study | 10
An exploratory study on the introduction and removal of different types of technical debt in deep learning frameworks | 10
Security assurance cases—state of the art of an emerging approach | 10
FACER: An API usage-based code-example recommender for opportunistic reuse | 10
Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report | 10
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews | 10
Flair: efficient analysis of Android inter-component vulnerabilities in response to incremental changes | 10
Revisiting reopened bugs in open source software systems | 10
Weighted software metrics aggregation and its application to defect prediction | 9
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 9
Comparing the results of replications in software engineering | 9
Learning how to search: generating effective test cases through adaptive fitness function selection | 9
On the evolution and impact of architectural smells—an industrial case study | 9
Towards cost-benefit evaluation for continuous software engineering activities | 9
Pull request latency explained: an empirical overview | 9
A conceptual model for unifying variability in space and time: Rationale, validation, and illustrative applications | 9
FeatCompare: Feature comparison for competing mobile apps leveraging user reviews | 9
Testing self-healing cyber-physical systems under uncertainty with reinforcement learning: an empirical study | 9
An empirical study on release notes patterns of popular apps in the Google Play Store | 9
The sense of logging in the Linux kernel | 9
Developers perception of peer code review in research software development | 9
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects | 9
Generating API tags for tutorial fragments from Stack Overflow | 8
The Relation of Test-Related Factors to Software Quality: A Case Study on Apache Systems | 8
Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension | 8
Open-source software product line extraction processes: the ArgoUML-SPL and Phaser cases | 8
Learning lenient parsing & typing via indirect supervision | 8
Comparing ϕ and the F-measure as performance metrics for software-related classifications | 8
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration | 8
Bugs in machine learning-based systems: a faultload benchmark | 8
An empirical study of same-day releases of popular packages in the npm ecosystem | 8
Revisiting process versus product metrics: a large scale analysis | 8
Evolving software system families in space and time with feature revisions | 8
AndroEvolve: automated Android API update with data flow analysis and variable denormalization | 8
Advantages and disadvantages of (dedicated) model transformation languages | 8
A large-scale empirical study of commit message generation: models, datasets and evaluation | 8
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt | 8
Agile software development one year into the COVID-19 pandemic | 8
E-APR: Mapping the effectiveness of automated program repair techniques | 8
Präzi: from package-based to call-based dependency networks | 8
SSPCatcher: Learning to catch security patches | 8
FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code | 8
Evaluating classifiers in SE research: the ECSER pipeline and two replication studies | 8
What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk | 8
Responding to change over time: A longitudinal case study on changes in coordination mechanisms in large-scale agile | 8
Revisiting the debate: Are code metrics useful for measuring maintenance effort? | 8
Predicting health indicators for open source projects (using hyperparameter optimization) | 8
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning | 8
Software product-line evaluation in the large | 8
A qualitative study of developers’ discussions of their problems and joys during the early COVID-19 months | 7
SoftNER: Mining knowledge graphs from cloud incidents | 7
Propagating frugal user feedback through closeness of code dependencies to improve IR-based traceability recovery | 7
GitHub Actions: The Impact on the Pull Request Process | 7
CMFuzz: context-aware adaptive mutation for fuzzers | 7
Does class size matter? An in-depth assessment of the effect of class size in software defect prediction | 7
Understanding peer review of software engineering papers | 7
Mining and relating design contexts and design patterns from Stack Overflow | 7
Omni: automated ensemble with unexpected models against adversarial evasion attack | 7
Static detection of equivalent mutants in real-time model-based mutation testing | 7
Crowdsmelling: A preliminary study on using collective knowledge in code smells detection | 7
Two N-of-1 self-trials on readability differences between anonymous inner classes (AICs) and lambda expressions (LEs) on Java code snippets | 7
A machine and deep learning analysis among SonarQube rules, product, and process metrics for fault prediction | 7
Evaluating the robustness of source code plagiarism detection tools to pervasive plagiarism-hiding modifications | 7
Evaluating state-of-the-art #SAT solvers on industrial configuration spaces | 7
AIBugHunter: A Practical tool for predicting, classifying and repairing software vulnerabilities | 7
Empirical assessment of generating adversarial configurations for software product lines | 7
An empirical study of data constraint implementations in Java | 7
Information retrieval versus deep learning approaches for generating traceability links in bilingual projects | 7
Individual differences limit predicting well-being and productivity using software repositories: a longitudinal industrial study | 7
On the adequacy of static analysis warnings with respect to code smell prediction | 7
On the fulfillment of coordination requirements in open-source software projects: An exploratory study | 7
A comprehensive overview of software product management challenges | 7
CT-IoT: a combinatorial testing-based path selection framework for effective IoT testing | 7
The nature of build changes | 7
On the analysis of non-coding roles in open source development | 7
Mining Python fix patterns via analyzing fine-grained source code changes | 7
Revisiting the VCCFinder approach for the identification of vulnerability-contributing commits | 7
Investigating design anti-pattern and design pattern mutations and their change- and fault-proneness | 6
TraceSim: An Alignment Method for Computing Stack Trace Similarity | 6
Assessing exception handling testing practices in open-source libraries | 6
Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net | 6
An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs | 6
SmartFast: an accurate and robust formal analysis tool for Ethereum smart contracts | 6
Modeling Performance of Microservices Systems with Growth Theory | 6
Discovering configuration workflows from existing logs using process mining | 6
Evolution of automated weakness detection in Ethereum bytecode: a comprehensive study | 6
What really changes when developers intend to improve their source code: a commit-level study of static metric value and static analysis warning changes | 6
An empirical study of the systemic and technical migration towards microservices | 6
Characterizing refactoring graphs in Java and JavaScript projects | 6
Evaluating refactorings for disciplining #ifdef annotations: An eye tracking study with novices | 6
Newcomer OSS-Candidates: Characterizing Contributions of Novice Developers to GitHub | 6
Helping or not helping? Why and how trivial packages impact the npm ecosystem | 6
An empirical study of issue-link algorithms: which issue-link algorithms should we use? | 6
A fly in the ointment: an empirical study on the characteristics of Ethereum smart contract code weaknesses | 6
How are project-specific forums utilized? A study of participation, content, and sentiment in the Eclipse ecosystem | 6
Self-Admitted Technical Debt and comments’ polarity: an empirical study | 6
Revisiting the building of past snapshots — a replication and reproduction study | 6
An exploratory study on the repeatedly shared external links on Stack Overflow | 6
A mixed-methods analysis of micro-collaborative coding practices in OpenStack | 6
How do Android developers improve non-functional properties of software? | 6
Exposed! A case study on the vulnerability-proneness of Google Play Apps | 6
An empirical study of the effectiveness of IR-based bug localization for large-scale industrial projects | 6
Static test flakiness prediction: How Far Can We Go? | 6
On the effectiveness of log representation for log-based anomaly detection | 6
Styler: learning formatting conventions to repair Checkstyle violations | 6
Understanding large-scale software systems – structure and flows | 6
Applying test case prioritization to software microbenchmarks | 6
Small-Amp: Test amplification in a dynamically typed language | 6
Software variability in service robotics | 6
Release synchronization in software ecosystems | 6
Quality gatekeepers: investigating the effects of code review bots on pull request activities | 6
Code smells detection via modern code review: a study of the OpenStack and Qt communities | 5
Measuring affective states from technical debt | 5
TCTracer: Establishing test-to-code traceability links using dynamic and static techniques | 5
Path context augmented statement and network for learning programs | 5
On the use of commit-relevant mutants | 5
On the preferences of quality indicators for multi-objective search algorithms in search-based software engineering | 5
CyberSAGE: The cyber security argument graph evaluation tool | 5
Conclusion stability for natural language based mining of design discussions | 5
On the Removal of Feature Toggles | 5
“More Than Deep Learning”: post-processing for API sequence recommendation | 5