## Vdiff: A Program Differencing Algorithm for Verilog HDL

Adam Duley ARM Inc Christopher Spandikow IBM Corporation

Miryung Kim The University of Texas at Austin Problem: Limitations of using *diff* on evolving hardware designs

- assumes sequential execution semantics
- relies on code elements having unique names
- does not leverage Boolean expression equivalence checking despite the availability of SAT solvers

#### Solution: Vdiff

- a position-independent differencing algorithm with intimate knowledge of Verilog semantics
- 96.8% precision with 97.3% recall compared to manually classified differences
- produces syntactic differencing results in terms of Verilog-specific change types

### Outline

#### Motivation

- Verilog Background
- Vdiff Algorithm
- Evaluation
- Conclusions

#### Motivation

- hardware designers collaboratively evolve large Verilog programs
- hard to use *diff*-like tools during code reviews
- develop a foundation for reasoning about evolving hardware design descriptions

```
include "uart defines.v"
module uart rfifo (clk, reset, data out);
input clk, reset;
output [fifo width-1:0] data out;
reg [fifo counter w-1:0] fifo;
wire [fifo pointer w-1:0] overrun;
always @ (posedge clk or posedge reset)
begin
  if(reset)
 begin
    fifo[1] <= 0;
    fifo[0] <= 0;
  end
end
assign data out = fifo[0];
endmodule
```

include "uart defines.v"

module uart\_rfifo (clk, reset, data\_out);

input clk, reset; output [fifo\_width-1:0] data\_out; reg [fifo counter w-1:0] fifo;

> Modules are building blocks with an explicit input and output port interface.

A module is similar to a Java class.



include "uart\_defines.v"

module uart\_rfifo (clk, reset, data\_out); input clk, reset;

output [fifo width-1:0] data out;

reg [fifo\_counter\_w-1:0] fifo; wire [fifo\_pointer\_w-1:0] overrun;

Input and output ports are public interfaces that connect modules to external hierarchy.

They are similar to a constructor's parameter list in Java.

end
end
assign data\_out = fifo[0];
endmodule

include "uart\_defines.v"
module uart\_rfifo (clk, reset, data\_out);
input clk, reset;

output [fifo\_width-1:0] data\_out;

reg [fifo\_counter\_w-1:0] fifo; wire [fifo pointer w-1:0] overrun;

always @(posedge clk or posedge reset)
begin
if(reset)

#### Wires, registers, & integers are variable declarations.

end
end
assign data\_out = fifo[0];
endmodule

include "uart defines.v"

always blocks are similar to Java methods. However, they execute concurrently when the sensitivity list is true.

wire [fifo\_pointer\_w-1:0] overrun;

```
always @(posedge clk or posedge reset)
begin
    if(reset)
    begin
        fifo[1] <= 0;
        fifo[0] <= 0;
    end
end // always
assign data_out = fifo[0];
endmodule</pre>
```

```
include "uart_defines.v"
module uart_rfifo (clk, reset, data_out);
input clk, reset;
output [fifo_width-1:0] data_out;
reg [fifo_counter_w-1:0] fifo;
wire [fifo_pointer_w-1:0] overrun;
always @(posedge clk or posedge reset)
begin
    if(reset)
    begin
```

```
fifo[1] <= 0;</pre>
```

Assign statements model concurrent dataflow.

```
end // always
```

```
assign data_out = fifo[0];
```

endmodule

#### Blocking statements are sequential assignments.

reg [fifo\_counter\_w-1:0] fifo; wire [fifo\_pointer\_w-1:0] overrun; always @(posedge clk or posedge reset) begin

if(reset)

begin

fifo[1] = 0;
fifo[0] = 0;

end

end

assign data\_out = fifo; endmodule

include "uart\_defines.v"
module uart\_rfifo (clk, reset, data\_out);

Non-blocking statements are concurrent assignments, and they are prevalent in Verilog designs.

wire [fifo\_pointer\_w-1.0] overrun; always @(posedge clk or posedge reset) begin if(reset) begin fifo[1] <= 0; fifo[0] <= 0; end end assign data\_out = fifo; endmodule

#### **Diff Results**

```
+ always @ (posedge clk)
always @(posedge clk)
                              + begin
begin
                              +
                                if (reset)
  if(reset)
                                overrun \leq 0;
  begin
                              + end
    fifo[1] <= 0;
                              always @(posedge clk)
    fifo[0] <= 0;</pre>
                              begin
  end
                                if(reset)
end // always
                                begin
-always @(posedge clk )
                                  fifo[0] <= 0;
-begin
                                  fifo[1] <= 0;
   if (reset)
                                  fifo[2] <= 0;</pre>
     overrun <= 0;
                                end
-end // always
                              end // alwavs
```

Verilog's non-unique identifiers and concurrent semantics cause *diff* to identify a large amount of false positives.

### Outline

- Motivation
- Verilog Background
- Vdiff Algorithm
- Evaluation
- Conclusions

## Algorithm

- input: two versions of a Verilog file
- output: syntactic differences in terms of change types
  - 1. extract Abstract Syntax Tree (AST) from each file
  - 2. compare the two trees
  - 3. filter false positives in changes to sensitivity lists using a SAT solver
  - 4. categorize differences based on Verilog syntax

#### Extract AST

```
module uart rfifo (clk, reset,
             data out, overrun);
always @(posedge clk)
begin
  if(reset)
  begin
    fifo[1] <= 0;</pre>
    fifo[0] <= 0;
  end
end // always
always @(posedge clk)
begin
  if (reset)
    overrun <= 0;</pre>
end // always
assign data out = fifo[0];
endmodule
```



#### Tree Differencing Algorithm

- hierarchically compare tree nodes from the top down
- initially align nodes using the longest common subsequence (LCS) algorithm—unmatched nodes are split into ADD and DELETE sets
- for each pair in ADD x DELETE, calculate the textual similarity
- use greedy weighted bipartite graph matching to associate a DELETE node to a corresponding ADD node



mapped











mapped



mapped







mapped





### **Resulting Syntactic Differences**

uart\_rfifo.v rev 87-88 diff



Line 221, NB\_ADD, A Non-Blocking assignment has been added

#### **Boolean Equivalence Check**



## Change Type Classification

| Syntactic<br>Elements | Abbreviation | Description                     |
|-----------------------|--------------|---------------------------------|
| Non-Blocking          | NB_ADD       | Non-Blocking assignment added   |
|                       | NB_RMV       | Non-Blocking assignment removed |
|                       | NB_CE        | Non-Blocking assignment changed |
| Always                | AL_ADD       | Always block added              |
|                       | AL_RMV       | Always block removed            |
|                       | AL_SE        | Changes in the sensitivity list |

#### Vdiff Tool Implementation

- Eclipse plugin based on open source veditor plug-in
- interfaces with Eclipse compare plug-in
- integrates with SVN (subclipse plug-in)

http://users.ece.utexas.edu/~miryung/software/Vdiff/web/index.html

### Vdiff Eclipse Plugin



#### Outline

- Motivation
- Verilog Background
- Vdiff Algorithm
- Evaluation
- Conclusions

#### Evaluation

- compare Vdiff's results with manual differencing results—the first two authors manually inspected the revisions
- subjects
  - UART16550 (opencores.org)
  - RAMP GateLib DRAM controller (ramp.eecs.berkeley.edu)
- Criteria: precision & recall

#### Results

| Project            | File<br>Revisions | Eval | Vdiff | ΙνΛει | Precision | Recall |
|--------------------|-------------------|------|-------|-------|-----------|--------|
| Total<br>(UART)    | 141               | 600  | 601   | 586   | 97.5%     | 97.7%  |
| Total<br>(GateLib) | 69                | 497  | 502   | 482   | 96.2%     | 96.9%  |
| Total              | 210               | 1097 | 1103  | 1068  | 96.8%     | 97.3%  |

We evaluated 210 Verilog file revisions with more then 1000 differences across two different projects.

# Findings

- Vdiff matches position-independent constructs very well
- Vdiff struggles on line edits with low text similarity
- Feedback from a logic designer at IBM:

"I can see a use for [the change-types] right away. It would be great for team leads because they could look at this log of changes and understand what has changed between versions without having to look at the files [textual differences]."



#### Example: Vdiff Results

/\* Old \*/

```
/* New */
```

```
/* IF Block Added */
if (!srx_pad_i || rstate == sr_idle)
        counter b <= 0x191;</pre>
```

- Exact matching [Neamtiu 2005]
- In-order matching [Cottrell 2007]
- Greedy weighted bipartite matching [Vdiff]



- Exact matching [Neamtiu 2005]
- In-order matching [Cottrell 2007]
- Greedy weighted bipartite matching [Vdiff]



- Exact matching [Neamtiu 2005]
- In-order matching [Cottrell 2007]
- Greedy weighted bipartite matching [Vdiff]



- Exact matching [Neamtiu 2005]
- In-order matching [Cottrell 2007]
- Greedy weighted bipartite matching [Vdiff]



- Exact matching [Neamtiu 2005]
- In-order matching [Cottrell 2007]
- Greedy weighted bipartite matching [Vdiff]

| Average   | Exact<br>Match | In-Order<br>Match | Weighted<br>Bipartite |
|-----------|----------------|-------------------|-----------------------|
| Precision | 56.1%          | 90.9%             | 97.5%                 |
| Recall    | 67.9%          | 91.8%             | 97.7%                 |

1097 differences from 210 file revisions in 2 real world projects shows that *the ordering of code actually matters in practice* when it comes to computing differences.

| Average   | Exact<br>Match | In-Order<br>Match | Weighted<br>Bipartite |
|-----------|----------------|-------------------|-----------------------|
| Precision | 56.1%          | 90.9%             | 97.5%                 |
| Recall    | 67.9%          | 91.8%             | 97.7%                 |

#### Comparison with General Model Differencing Framework

#### • EMF [Eclipse EMF compare project]

- Mapped (1) modules to classes, (2) always blocks and continuous assignments to operations, (3) wires, registers, and ports to fields, and (4) modular instantiations to reference pointers in an EMF ecore model.
- Results (Recall=47%, Precision=80%) shows a need to expand the Ecore model to be able to handle specific concurrency constructs and non-unique identifiers.
- Sidiff [Treude et al., 2007, Schmidt and Gloetzner, 2008]
  - At the time of our evaluation Sidiff did not provide APIs to allow us to map Verilog language constructs to their general differencing algorithms.

#### Discussion

- Vdiff is sensitive to subtle changes to variable names and IF-conditions
  - Further investigation of different name similarity measures is required
- renaming of wires, registers, and modules caused false positives
- Vdiff's algorithm currently cannot recover from node mismatches
- equivalence check using a SAT solver is limited to sensitivity lists

# Outline

- Motivation
- Verilog Background
- Vdiff Algorithm
- Evaluation
- Conclusions

#### **Related Work**

#### Syntactic program differencing

- [Yang 1992, Neamtiu et al. 2005, Fluri et al. 2007, Cottrell et al. 2007, Raghavan et al. 2004, etc.]
- Vdiff is similar to these but identifies syntactic differences robustly even when multiple AST nodes have similar labels and when they are reordered.

#### Model differencing

- UMLdiff [Xing and Stroulia 2005], Sidiff [Kelter et al.] and EMF [Eclipse EMF]
- Change types
  - Change Distiller [Fluri et al. 2007]
  - Verilog change types [Sudakrishnan et al. 2009]
- Differential symbolic execution [Person et al. 2008]

#### Conclusions

- Vdiff is a position-independent differencing algorithm designed for hardware design descriptions
  - computes syntactic differences with high recall (96.8%) and high precision (97.3%)
  - classifies differences in terms of Verilog specific change types
  - can enable analysis of evolving hardware design

#### Acknowledgment

Vdiff website:

http://www.ece.utexas.edu/~miryung/software/Vdiff/web/index.html

The authors thank Greg Gibeling and Dr. Derek Chiou for providing accesses to the RAMP repository and Dr. Adnan Aziz and anonymous reviewers for their detailed comments on our draft.



