Clocking blocks allow inputs to be sampled and outputs to be driven at a specified clock event. If an input skew is specified for a clocking block, all input signals in that block are sampled skew time units before the clock event. If an output skew is specified, all output signals in that block are driven skew time units after the corresponding clock event.
What are input and output skews?
A skew is specified as a constant expression or as a parameter. If only a number is given, without a time unit, the skew is interpreted in the time unit of the scope in which the clocking block is declared.
clocking cb @(clk);
  input  #1ps req;
  output #2   gnt;
  input  #1 output #3 sig;
endclocking
In the example above, we have declared a clocking block named cb to describe when the signals belonging to this block have to be sampled and driven. Signal req has an input skew of 1 ps and will be sampled 1 ps before the clocking event on clk. The output signal gnt has an output skew of 2 time units, which are interpreted in the time unit of the current scope. With a timescale of 1ns/1ps, #2 represents 2 ns, so gnt will be driven 2 ns after the clock edge. The last signal sig is of inout type and will be sampled 1 ns before the clock edge and driven 3 ns after it.
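To make the unit rule concrete, here is a minimal sketch that contrasts a unitless skew with one that carries an explicit unit. The module name skew_units_demo, the extra ack signal and the 20 ns clock period are assumptions made for this illustration only.

```systemverilog
// Illustrative sketch; module name, ack signal and clock period are assumptions.
module skew_units_demo;
  timeunit 1ns;
  timeprecision 1ps;

  bit   clk;
  logic gnt, ack;

  always #10 clk = ~clk;     // rising edges at 10 ns, 30 ns, ...

  clocking cb @(posedge clk);
    output #2     gnt;       // no unit: 2 x the time unit of this scope = 2 ns
    output #500ps ack;       // explicit unit: 500 ps whatever the time unit is
  endclocking

  // Report when each signal actually changes ($realtime is in ns here)
  always @(gnt) $display("[%.1f ns] gnt = %0b", $realtime, gnt);
  always @(ack) $display("[%.1f ns] ack = %0b", $realtime, ack);

  initial begin
    @(cb);                   // resume at the rising edge at 10 ns
    cb.gnt <= 1;             // both drives are scheduled through the clocking block
    cb.ack <= 1;
    #5 $finish;
  end
endmodule
```

With these settings, ack should be reported at 10.5 ns and gnt at 12.0 ns, since only the unitless #2 is scaled by the 1 ns time unit.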
An input skew of 1step indicates that the signal should be sampled at the end of the previous time step, or in other words, immediately before the positive clock edge.
clocking cb @(posedge clk);
  input #1step req;
endclocking
Inputs with explicit #0 skew will be sampled at the same time as their corresponding clocking event, but in the Observed region to avoid race conditions. Similarly, outputs with no skew or explicit #0 will be driven at the same time as the clocking event, in the Re-NBA region.
Example
Consider a simple design that has inputs clk and req and drives an output signal gnt. To keep things simple, let's just provide the grant on the clock edge at which a request is seen.
module des (input req, clk, output reg gnt);
  always @ (posedge clk)
    if (req)
      gnt <= 1;
    else
      gnt <= 0;
endmodule
To deal with the design port signals, let's create a simple interface called _if.
interface _if (input bit clk);
  logic gnt;
  logic req;
  clocking cb @(posedge clk);
    input  #1ns gnt;
    output #5   req;
  endclocking
endinterface
The next step is to drive inputs to the design so that it gives back the grant signal.
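module tb;
  bit clk;
  bit req_val;   // local copy of the value last driven on req
  // Create a clock and initialize input signal
  always #10 clk = ~clk;
  initial begin
    clk <= 0;
    if0.cb.req <= 0;
  end
  // Instantiate the interface
  _if if0 (.clk (clk));
  // Instantiate the design
  des d0 (.clk (clk),
          .req (if0.req),
          .gnt (if0.gnt));
  // Drive stimulus
  initial begin
    for (int i = 0; i < 10; i++) begin
      // automatic, so that a new random delay is picked on every iteration
      automatic bit [3:0] delay = $random;
      repeat (delay) @(posedge if0.clk);
      // A clocking block output cannot be read back, so toggle the local copy
      req_val = ~req_val;
      if0.cb.req <= req_val;
    end
    #20 $finish;
  end
endmodule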
The simulation waveform shows that req is driven 5 ns after the clock edge, because the output skew of #5 resolves to 5 ns when the time unit is 1 ns.
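The skew can also be checked from the simulation log instead of a waveform. The following is a minimal sketch of such a monitor, assuming the _if interface shown above; the module name skew_monitor and the port name vif are hypothetical.

```systemverilog
// Hypothetical monitor; relies on the _if interface declared earlier.
module skew_monitor (_if vif);
  time last_edge;

  // Remember when the most recent rising clock edge occurred
  always @(posedge vif.clk) last_edge = $time;

  // Report how long after that edge req changed
  always @(vif.req)
    $display("[%0d] req = %0b, %0d time units after the clock edge",
             $time, vif.req, $time - last_edge);
endmodule
```

It could be instantiated inside tb as skew_monitor mon (if0); with the interface above, every change on req should then be reported 5 time units after a clock edge.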

Output skew
To get a clear picture of the output skew, let's tweak the interface to have three different clocking blocks, each with a different output skew. Then let us drive req through each of the clocking blocks to see the difference.
interface _if (input bit clk);
  logic gnt;
  logic req;
  clocking cb_0 @(posedge clk);
    output #0 req;
  endclocking
  clocking cb_1 @(posedge clk);
    output #2 req;
  endclocking
  clocking cb_2 @(posedge clk);
    output #5 req;
  endclocking
endinterface
In our testbench, we'll use a for loop to iterate through each stimulus and use a different clocking block for each iteration.
module tb;
  // ... part of code same as before
  // Drive stimulus
  initial begin
    for (int i = 0; i < 3; i++) begin
      repeat (2) @(if0.cb_0);
      case (i)
        0 : if0.cb_0.req <= 1;
        1 : if0.cb_1.req <= 1;
        2 : if0.cb_2.req <= 1;
      endcase
      repeat (2) @ (if0.cb_0);
      if0.req <= 0;
    end
    #20 $finish;
  end
endmodule
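For a side-by-side comparison that does not need a waveform, the same three skews can be applied to three separate signals and driven at one clock edge. This is only a sketch under assumptions: the module name output_skew_demo and the signals req_0, req_1 and req_2 are made up for the illustration, and a single clocking block is used instead of the three blocks above.

```systemverilog
// Illustrative sketch; module and signal names are assumptions.
module output_skew_demo;
  timeunit 1ns;
  timeprecision 1ps;

  bit   clk;
  logic req_0, req_1, req_2;

  always #10 clk = ~clk;     // first rising edge at 10 ns

  clocking cb @(posedge clk);
    output #0 req_0;         // driven at the clock edge (Re-NBA region)
    output #2 req_1;         // driven 2 ns after the clock edge
    output #5 req_2;         // driven 5 ns after the clock edge
  endclocking

  // Report the time at which each signal actually changes
  always @(req_0) $display("[%0d] req_0 = %0b", $time, req_0);
  always @(req_1) $display("[%0d] req_1 = %0b", $time, req_1);
  always @(req_2) $display("[%0d] req_2 = %0b", $time, req_2);

  initial begin
    @(cb);                   // all three drives are issued at the 10 ns edge
    cb.req_0 <= 1;
    cb.req_1 <= 1;
    cb.req_2 <= 1;
    #8 $finish;
  end
endmodule
```

Although all three drives execute at 10 ns, the changes should be reported at 10, 12 and 15 respectively, which is the output skew at work.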

Input skew
To understand input skew, we'll change the DUT to simply provide a random value every 1 ns, just for our purpose.
module des (output reg [3:0] gnt);
  always #1 gnt <= $random;
endmodule
As before, the interface block will have several clocking block declarations, each with a different input skew.
interface _if (input bit clk);
  logic [3:0] gnt;
  clocking cb_0 @(posedge clk);
    input #0 gnt;
  endclocking
  clocking cb_1 @(posedge clk);
    input #1step gnt;
  endclocking
  clocking cb_2 @(posedge clk);
    input #1 gnt;
  endclocking
  clocking cb_3 @(posedge clk);
    input #2 gnt;
  endclocking
endinterface
In the testbench, we'll fork 4 different threads at time 0 ns, where each thread waits for the positive edge of the clock and samples the output from the DUT.
module tb;
  bit clk;
  always #5 clk = ~clk;
  initial clk <= 0;
  _if if0 (.clk (clk));
  des d0 (.gnt (if0.gnt));
  initial begin
    fork
      begin
        @(if0.cb_0);
        $display ("cb_0.gnt = 0x%0h", if0.cb_0.gnt);
      end
      begin
        @(if0.cb_1);
        $display ("cb_1.gnt = 0x%0h", if0.cb_1.gnt);
      end
      begin
        @(if0.cb_2);
        $display ("cb_2.gnt = 0x%0h", if0.cb_2.gnt);
      end
      begin
        @(if0.cb_3);
        $display ("cb_3.gnt = 0x%0h", if0.cb_3.gnt);
      end
    join
    #10 $finish;
  end
endmodule
The output waveform is shown below and it can be seen that the design drives a random value every 1 ns.

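It's important to note that the thread which sampled through the cb_1 clocking block got the value 0x3 while cb_0 got 0xd. cb_1 uses a 1step skew and so captures the value present just before the clock edge, whereas cb_0 uses #0 and samples in the Observed region, after the design has already updated gnt at that same time. Note that the actual values may be different on other simulators since they can use a different randomization seed.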
ncsim> run
cb_3.gnt = 0x9
cb_2.gnt = 0x3
cb_1.gnt = 0x3
cb_0.gnt = 0xd
Simulation complete via $finish(1) at time 15 NS + 0
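Because $random makes the log hard to reason about, the same experiment can be repeated with a predictable value. The sketch below is an assumption-laden variant, not the original testbench: the module name input_skew_demo is made up, the clocking blocks live directly in the module, and the random DUT is replaced by a counter cnt that holds N after its update at N ns.

```systemverilog
// Illustrative sketch; module name and the counter-based "design" are assumptions.
module input_skew_demo;
  timeunit 1ns;
  timeprecision 1ps;

  bit         clk;
  logic [3:0] cnt = 0;

  always #5 clk = ~clk;        // first rising edge at 5 ns
  always #1 cnt <= cnt + 1;    // cnt holds N once the update at N ns is done

  clocking cb_0 @(posedge clk);
    input #0 cnt;              // sampled at the edge, in the Observed region
  endclocking
  clocking cb_1 @(posedge clk);
    input #1step cnt;          // sampled just before the edge
  endclocking
  clocking cb_2 @(posedge clk);
    input #1 cnt;              // sampled 1 ns before the edge
  endclocking
  clocking cb_3 @(posedge clk);
    input #2 cnt;              // sampled 2 ns before the edge
  endclocking

  // One thread per clocking block, each printing its own sampled value
  initial begin @(cb_0); $display("cb_0.cnt = %0d", cb_0.cnt); end
  initial begin @(cb_1); $display("cb_1.cnt = %0d", cb_1.cnt); end
  initial begin @(cb_2); $display("cb_2.cnt = %0d", cb_2.cnt); end
  initial begin @(cb_3); $display("cb_3.cnt = %0d", cb_3.cnt); end

  initial #7 $finish;
endmodule
```

At the 5 ns edge, one would expect cb_0 to report 5, cb_1 and cb_2 to report 4, and cb_3 to report 3, mirroring the pattern in the log above where cb_1 and cb_2 agree. The order of the printed lines may vary between simulators.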