Empirical Software Engineering

Papers
(The TQCC of Empirical Software Engineering is 8. The table below lists the papers whose CrossRef citation counts are at or above that threshold [max. 250 papers]. It covers papers published in the past four years, i.e., from 2020-10-01 to 2024-10-01.)
Article | Citations
Sampling in software engineering research: a critical review and guidelines | 114
Predictors of well-being and productivity among software professionals during the COVID-19 pandemic – a longitudinal study | 90
Test case selection and prioritization using machine learning: a systematic literature review | 57
Perceived diversity in software engineering: a systematic literature review | 50
A comprehensive study of bloated dependencies in the Maven ecosystem | 49
Automated patch assessment for program repair at scale | 45
Topic modeling in software engineering research | 43
AI lifecycle models need to be revised | 43
Enjoy your observability: an industrial survey of microservice tracing and analysis | 43
A teamwork effectiveness model for agile software development | 40
A privacy and security analysis of early-deployed COVID-19 contact tracing Android apps | 40
Promises and challenges of microservices: an exploratory study | 38
Understanding and improving the quality and reproducibility of Jupyter notebooks | 37
Predicting the objective and priority of issue reports in software repositories | 35
Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection | 33
Lags in the release, adoption, and propagation of npm vulnerability fixes | 32
StateAFL: Greybox fuzzing for stateful network servers | 31
Analysing app reviews for software engineering: a systematic literature review | 29
An exploratory study on confusion in code reviews | 28
Out of sight, out of mind? How vulnerable dependencies affect open-source projects | 27
On the impact of security vulnerabilities in the npm and RubyGems dependency networks | 27
How do i refactor this? An empirical study on refactoring trends and topics in Stack Overflow | 27
The secret life of test smells - an empirical study on test smell evolution and maintenance | 25
Software development with feature toggles: practices used by practitioners | 25
Self-admitted technical debt practices: a comparison between industry and open-source | 24
Is GitHub’s Copilot as bad as humans at introducing vulnerabilities in code? | 24
Topic recommendation for software repositories using multi-label classification algorithms | 24
World of code: enabling a research workflow for mining and analyzing the universe of open source VCS data | 24
A large scale analysis of mHealth app user reviews | 24
Game-based Sprint retrospectives: multiple action research | 24
Why are many businesses instilling a DevOps culture into their organization? | 23
Can Offline Testing of Deep Neural Networks Replace Their Online Testing? | 23
Finding the sweet spot for organizational control and team autonomy in large-scale agile software development | 23
Industry practices and challenges for the evolvability assurance of microservices | 23
An empirical study of IoT topics in IoT developer discussions on Stack Overflow | 22
Understanding shared links and their intentions to meet information needs in modern code review: | 22
Empirical evaluation of tools for hairy requirements engineering tasks | 22
Locating faults with program slicing: an empirical analysis | 21
How Scrum adds value to achieving software quality? | 21
An empirical study on changing leadership in agile teams | 21
On the privacy of mental health apps | 21
A family of experiments on test-driven development | 21
GitHub Discussions: An exploratory study of early adoption | 20
Development of recommendation systems for software engineering: the CROSSMINER experience | 20
Publish or perish, but do not forget your software artifacts | 20
Learning to recognize actionable static code warnings (is intrinsically easy) | 20
Spearheading agile: the role of the scrum master in agile projects | 19
Beyond the virus: a first look at coronavirus-themed Android malware | 18
A longitudinal explanatory case study of coordination in a very large development programme: the impact of transitioning from a first- to a second-generation large-scale agile development method | 18
Maintenance-related concerns for post-deployed Ethereum smart contract development: issues, techniques, and future challenges | 18
Resource and dependency based test case generation for RESTful Web services | 18
Breaking bad? Semantic versioning and impact of breaking changes in Maven Central | 18
On systematically building a controlled natural language for functional requirements | 17
Strategies to manage quality requirements in agile software development: a multiple case study | 17
TaintBench: Automatic real-world malware benchmarking of Android taint analyses | 17
Test smells 20 years later: detectability, validity, and reliability | 17
An empirical study of automated unit test generation for Python | 17
Can pre-trained code embeddings improve model performance? Revisiting the use of code embeddings in software engineering tasks | 17
Predicting unstable software benchmarks using static source code features | 17
Ethics in the mining of software repositories | 16
Evaluating network embedding techniques’ performances in software bug prediction | 16
A unified multi-task learning model for AST-level and token-level code completion | 16
A fine-grained data set and analysis of tangling in bug fixing commits | 16
Assessment of off-the-shelf SE-specific sentiment analysis tools: An extended replication study | 16
On the usage, co-usage and migration of CI/CD tools: A qualitative analysis | 16
To what extent do DNN-based image classification models make unreliable inferences? | 15
Will you come back to contribute? Investigating the inactivity of OSS core developers in GitHub | 15
Uniform and scalable sampling of highly configurable systems | 15
Learning from what we know: How to perform vulnerability prediction using noisy historical data | 15
Systematic literature review on software quality for AI-based software | 15
Dynamical analysis of diversity in rule-based open source network intrusion detection systems | 15
Gamification in software engineering: the mediating role of developer engagement and job satisfaction | 14
The entrepreneurial logic of startup software development: A study of 40 software startups | 14
The effects of continuous integration on software development: a systematic literature review | 14
A configurable method for benchmarking scalability of cloud-native applications | 14
Reuse and maintenance practices among divergent forks in three software ecosystems | 14
A multi-dimensional analysis of technical lag in Debian-based Docker images | 14
API compatibility issues in Android: Causes and effectiveness of data-driven detection techniques | 14
Empirical analysis of security vulnerabilities in Python packages | 14
Automated end-to-end management of the modeling lifecycle in deep learning | 14
Machine learning-based test selection for simulation-based testing of self-driving cars software | 13
On effort-aware metrics for defect prediction | 13
Developer-centric test amplification | 13
Automatically recommending components for issue reports using deep learning | 13
A comparative study and analysis of developer communications on Slack and Gitter | 13
Lessons Learnt on Reproducibility in Machine Learning Based Android Malware Detection | 13
Software testing and Android applications: a large-scale empirical study | 13
An automated framework for the extraction of semantic legal metadata from legal texts | 13
Do I really need all this work to find vulnerabilities? | 13
Are datasets for information retrieval-based bug localization techniques trustworthy? | 13
GreenHub: a large-scale collaborative dataset to battery consumption analysis of android devices | 12
From anecdote to evidence: the relationship between personality and need for cognition of developers | 12
Identifying self-admitted technical debt in issue tracking systems using machine learning | 12
Deep security analysis of program code | 12
Automatic identification of self-admitted technical debt from four different sources | 12
An empirical study on self-admitted technical debt in Dockerfiles | 12
A first look at Android applications in Google Play related to COVID-19 | 12
Search-based fairness testing for regression-based machine learning systems | 12
What makes a popular academic AI repository? | 12
Demystifying the challenges and benefits of analyzing user-reported logs in bug reports | 12
Understanding developers’ privacy and security mindsets via climate theory | 12
Efficient static analysis and verification of featured transition systems | 12
FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction | 12
Towards effective assessment of steady state performance in Java software: are we there yet? | 12
Automated driver management for Selenium WebDriver | 12
A systematic literature review on trust in the software ecosystem | 12
A study of how Docker Compose is used to compose multi-component systems | 11
Learning by sampling: learning behavioral family models from software product lines | 11
Embedding API dependency graph for neural code generation | 11
Deep learning approaches for bad smell detection: a systematic literature review | 11
Mutation testing in the wild: findings from GitHub | 11
Understanding and improving artifact sharing in software engineering research | 11
Developer discussion topics on the adoption and barriers of low code software development platforms | 11
How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization) | 11
Model vs system level testing of autonomous driving systems: a replication and extension study | 11
Where were the repair ingredients for Defects4j bugs? | 11
An empirical study of text-based machine learning models for vulnerability detection | 11
An empirical study of the impact of log parsers on the performance of log-based anomaly detection | 11
Automatic team recommendation for collaborative software development | 11
Training students in evidence-based software engineering and systematic reviews: a systematic review and empirical study | 10
An exploratory study on the introduction and removal of different types of technical debt in deep learning frameworks | 10
Security assurance cases—state of the art of an emerging approach | 10
FACER: An API usage-based code-example recommender for opportunistic reuse | 10
Using a balanced scorecard to identify opportunities to improve code review effectiveness: an industrial experience report | 10
Evaluating pre-trained models for user feedback analysis in software engineering: a study on classification of app-reviews | 10
Flair: efficient analysis of Android inter-component vulnerabilities in response to incremental changes | 10
Revisiting reopened bugs in open source software systems | 10
Characterizing usages, updates and risks of third-party libraries in Java projects | 10
Why and what happened? Aiding bug comprehension with automated category and causal link identification | 10
SPVF: security property assisted vulnerability fixing via attention-based models | 10
Improving energy-efficiency by recommending Java collections | 10
From one to hundreds: multi-licensing in the JavaScript ecosystem | 10
Using code reviews to automatically configure static analysis tools | 10
Exploring Performance Assurance Practices and Challenges in Agile Software Development: An Ethnographic Study | 10
An empirical study of Q&A websites for game developers | 10
An empirical study of question discussions on Stack Overflow | 10
An empirical study of developers’ discussions about security challenges of different programming languages | 10
Pull request latency explained: an empirical overview | 9
A conceptual model for unifying variability in space and time: Rationale, validation, and illustrative applications | 9
FeatCompare: Feature comparison for competing mobile apps leveraging user reviews | 9
Testing self-healing cyber-physical systems under uncertainty with reinforcement learning: an empirical study | 9
An empirical study on release notes patterns of popular apps in the Google Play Store | 9
The sense of logging in the Linux kernel | 9
Developers perception of peer code review in research software development | 9
Real world projects, real faults: evaluating spectrum based fault localization techniques on Python projects | 9
Weighted software metrics aggregation and its application to defect prediction | 9
CsmithEdge: more effective compiler testing by handling undefined behaviour less conservatively | 9
Comparing the results of replications in software engineering | 9
Learning how to search: generating effective test cases through adaptive fitness function selection | 9
On the evolution and impact of architectural smells—an industrial case study | 9
Towards cost-benefit evaluation for continuous software engineering activities | 9
Agile software development one year into the COVID-19 pandemic | 8
E-APR: Mapping the effectiveness of automated program repair techniques | 8
Learning lenient parsing & typing via indirect supervision | 8
Präzi: from package-based to call-based dependency networks | 8
SSPCatcher: Learning to catch security patches | 8
FindICI: Using machine learning to detect linguistic inconsistencies between code and natural language descriptions in infrastructure-as-code | 8
Evaluating classifiers in SE research: the ECSER pipeline and two replication studies | 8
Responding to change over time: A longitudinal case study on changes in coordination mechanisms in large-scale agile | 8
Revisiting the debate: Are code metrics useful for measuring maintenance effort? | 8
Predicting health indicators for open source projects (using hyperparameter optimization) | 8
DebtFree: minimizing labeling cost in self-admitted technical debt identification using semi-supervised learning | 8
Software product-line evaluation in the large | 8
Generating API tags for tutorial fragments from Stack Overflow | 8
The Relation of Test-Related Factors to Software Quality: A Case Study on Apache Systems | 8
Considerations and Pitfalls for Reducing Threats to the Validity of Controlled Experiments on Code Comprehension | 8
Open-source software product line extraction processes: the ArgoUML-SPL and Phaser cases | 8
Comparing ϕ and the F-measure as performance metrics for software-related classifications | 8
Practitioner’s view of the success factors for software outsourcing partnership formation: an empirical exploration | 8
Bugs in machine learning-based systems: a faultload benchmark | 8
What do class comments tell us? An investigation of comment evolution and practices in Pharo Smalltalk | 8
An empirical study of same-day releases of popular packages in the npm ecosystem | 8
Revisiting process versus product metrics: a large scale analysis | 8
Evolving software system families in space and time with feature revisions | 8
AndroEvolve: automated Android API update with data flow analysis and variable denormalization | 8
Advantages and disadvantages of (dedicated) model transformation languages | 8
A large-scale empirical study of commit message generation: models, datasets and evaluation | 8
FIXME: synchronize with database! An empirical study of data access self-admitted technical debt | 8