
Software & Research Data

Logical Structural Diff

Abstract

Software engineers often inspect program differences when reviewing others’ code changes, when writing check-in comments, or when determining why a program behaves differently from expected behavior. Program differencing tools that support these tasks are limited in their ability to group related code changes or to detect potential inconsistency in program changes. To overcome these limitations and to complement existing approaches, we built Logical Structural Diff (LSDiff) that infers systematic structural differences as logic rules, noting anomalies from systematic changes as exceptions to the logic rules. We conducted a focus group study with professional software engineers in a large E-commerce company and also compared LSDiff’s results with plain structural differences without rules and textual differences. Our evaluation suggests that LSDiff complements existing differencing tools by grouping code changes that form systematic change patterns regardless of their distribution throughout the code and that its ability to discover anomalies shows promise in detecting inconsistent changes.
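As a rough illustration of what "a logic rule with exceptions" means here, the following toy sketch checks one hand-written rule against a small set of structural-difference facts. It is not LSDiff's implementation or its rule-inference algorithm: the fact encoding, the method names, and the `check_rule` helper are all invented for this example.

```python
# Toy illustration only: the facts, names, and helper below are hypothetical
# and do not reflect LSDiff's actual fact base or inference algorithm.

# Methods present in the old version.
old_methods = {"Bus.getHost", "Van.getHost", "Truck.getHost", "Bus.start"}

# Structural differences: (caller, callee) pairs whose call was deleted.
deleted_calls = {("Bus.getHost", "SQL.exec"), ("Van.getHost", "SQL.exec")}


def check_rule(methods, name, callee, deleted):
    """Check the rule: every method named `name` deleted its call to `callee`.

    Returns the methods the rule applies to and those that violate it.
    """
    matches = sorted(m for m in methods if m.split(".")[-1] == name)
    exceptions = [m for m in matches if (m, callee) not in deleted]
    return matches, exceptions


matches, exceptions = check_rule(old_methods, "getHost", "SQL.exec", deleted_calls)
print(f"{len(matches) - len(exceptions)}/{len(matches)} matches, "
      f"exceptions: {exceptions}")
```

In this made-up data the rule holds for two of the three `getHost` methods, and `Truck.getHost` is reported as the exception, which is the kind of anomaly a reviewer might want to inspect for a missed update.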

Sample output (LSDiff applied to carol revision 430)

Tool 
          My student Alex Loh developed an Eclipse plug-in for LSdiff. The plug-in uses Eclipse JDT analysis to extract structural differences between two versions, infers systematic changes as change rules using the rule-inference algorithm presented in the ICSE 2009 paper, and presents the learned rules in the LSdiff view. Please contact Alex (alelxloh@cs.utexas.edu) for more information on the LSdiff Eclipse plug-in. (November 2009)
          LSdiff Eclipse Plug-In Web Page
          Download the Eclipse Plug-In for LSdiff as a Jar File
          Manual and Screen Shot

         
API-level Code Matching

Abstract

Mapping code elements in one version of a program to corresponding code elements in another version is a fundamental building block for many software engineering tools. Existing tools that match code elements or identify structural changes (refactorings and API changes) between two versions of a program have two limitations that we overcome. First, existing tools cannot easily disambiguate among many potential matches or refactoring candidates. Second, it is difficult to use these tools' results for various software engineering tasks because the results are represented in an unstructured form. To overcome these limitations, our approach represents structural changes as a set of high-level change rules, automatically infers likely change rules, and determines method-level matches based on the rules. By applying our tool to several open source projects, we show that it identifies matches that are difficult to find using other approaches and produces more concise results than other approaches. Our representation can serve as a better basis for other software engineering tools.
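To make the idea of a change rule concrete, here is a minimal sketch of applying a single rule (a scope plus a transformation) to old method signatures in order to derive method-level matches. The rule format, the signatures, and the `apply_rule` helper are assumptions made for illustration. The actual tool infers such rules automatically, which this sketch does not attempt.

```python
from fnmatch import fnmatchcase

# Hypothetical method signatures from two versions of a program.
old_api = [
    "chart.Axis.draw(int)",
    "chart.Axis.size()",
    "chart.Plot.draw(int)",
    "util.Log.write(String)",
]
new_api = {
    "plot.Axis.draw(int)",
    "plot.Axis.size()",
    "chart.Plot.draw(int)",
    "util.Log.write(String)",
}


def apply_rule(scope, old_part, new_part, old_sigs, new_sigs):
    """Apply 'for all x in scope, replace old_part with new_part'.

    Returns the matches the rule explains and the in-scope signatures
    whose transformed form is absent from the new version (exceptions).
    """
    matches, exceptions = {}, []
    for sig in old_sigs:
        if not fnmatchcase(sig, scope):
            continue  # signature is outside the rule's scope
        candidate = sig.replace(old_part, new_part, 1)
        if candidate in new_sigs:
            matches[sig] = candidate
        else:
            exceptions.append(sig)
    return matches, exceptions


matches, exceptions = apply_rule("chart.*", "chart.", "plot.", old_api, new_api)
for old_sig, new_sig in matches.items():
    print(f"{old_sig} -> {new_sig}")
print("exceptions:", exceptions)
```

One rule here accounts for two matches and flags `chart.Plot.draw(int)` as an exception, which hints at why a rule-based representation can be more concise than a flat list of matched pairs.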
 

Tool
Please email me to download the source code of the API change-rule inference tool. Please refer to the "README" file. 
(1) edu.washington.cs.likelychangerule
(2) edu.ucsc.originanalysis (S. Kim's origin analysis tool is used to parse Java programs.)
       
Data from our ICSE 2007 paper. 
Jfreechart : Matching results for JFreeChart release archive.
Jhotdraw : Matching results for JHotDraw release archive.
Jedit : Matching results for JEdit release archive.

          XML data format description

Comparison Data. 
S. Kim et al.’s function renaming analysis (WCRE 2005) and Weissgerber and Diehl’s refactoring reconstruction (ASE 2006)
Comparison on the argouml data set used by S. Kim et al.
Comparison on the jedit data set used by S. Kim et al.
Comparison on the jedit data set used by Weissgerber and Diehl

Clone Genealogy Data

Abstract

It has been broadly assumed that code clones are inherently bad and that eliminating clones by refactoring would solve the problems of code clones. To investigate the validity of this assumption, we developed a formal definition of clone evolution and built a clone genealogy tool that automatically extracts the history of code clones from a source code repository. Using our tool we extracted clone genealogy information for two Java open source projects and analyzed their evolution.
Our study contradicts some conventional wisdom about clones. In particular, refactoring may not always improve software with respect to clones for two reasons. First, many code clones exist in the system for only a short time; extensive refactoring of such short-lived clones may not be worthwhile if they are likely to diverge from one another very soon. Second, many clones, especially long-lived clones that have changed consistently with other elements in the same group, are not easily refactorable due to programming language limitations. These insights show that refactoring will not help in dealing with some types of clones and open up opportunities for complementary clone maintenance tools that target these other classes of clones.

Alloy Clone Genealogy Model

Data

Dnsjava
Clone genealogies with a similarity threshold 0.1
Clone genealogies with a similarity threshold 0.3
Clone genealogies with a similarity threshold 0.5
Versions used in Dnsjava (Either its release version number or its check-in date and time)
Version names in chronological order
Clone text found by CCFinder for each version

Carol
Clone genealogies with a similarity threshold 0.1
Clone genealogies with a similarity threshold 0.3
Clone genealogies with a similarity threshold 0.5
Versions used in Carol (Either its release version number or its check-in date and time)
Version names in chronological order
Clone text found by CCFinder for each version