Stressing the interconnect for performance bugs
In an earlier post, I gave a short introduction to why performance verification is necessary for today's systems on chip, along with a few key metrics that can be measured. Since any system will have multiple masters and multiple slaves, it is important to exercise these elements in various combinations so that the fabric is stressed and its internal arbiters and buffers are exhausted.
The first level of functional checking is to verify that the connectivity from each master to each slave is valid and that each path supports the appropriate protocol-level requirements. As an extension of this plan, it is also of interest to perform read and write transactions with characteristics that allow the master to achieve the maximum possible bandwidth at its port. For example, if a master, say a DMA engine, has a 32-bit data bus and operates at 100 MHz, then the maximum theoretical bandwidth at that port is 32 * 100 Mbits/s. Dividing by 8 gives a value in bytes per second, in this case 400 Mbytes/s. This is achievable only if the DMA engine can issue transactions such that the data bus is occupied on every single clock. For real IPs, the observed value will be well below the theoretical limit because operational overheads keep it from reaching the maximum. However, it is a genuine problem if the IP could sustain a higher throughput but the fabric throttles it to a lower value.
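The arithmetic above can be captured in a small helper. This is only an illustrative sketch; the function names and the measured value plugged in below are assumptions, not part of any real tool.

```python
# Hypothetical helpers for port bandwidth math; bus width, clock, and the
# sample measured value are illustrative only.

def theoretical_bandwidth_bytes(bus_width_bits, clock_hz):
    """Peak bytes/s if the data bus is occupied on every clock."""
    return (bus_width_bits // 8) * clock_hz

def efficiency(measured_bytes_per_s, bus_width_bits, clock_hz):
    """Fraction of the theoretical limit actually achieved."""
    return measured_bytes_per_s / theoretical_bandwidth_bytes(bus_width_bits, clock_hz)

# 32-bit DMA port at 100 MHz -> 400 Mbytes/s peak, as in the text.
peak = theoretical_bandwidth_bytes(32, 100_000_000)
print(peak)  # 400000000 bytes/s

# If the master only measures 300 Mbytes/s, it is running at 75% efficiency;
# the verification question is whether the shortfall is the IP or the fabric.
print(efficiency(300_000_000, 32, 100_000_000))  # 0.75
```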
A single-master to single-slave scenario can be covered with the basic connectivity test, except that you might have to increase the number of data bytes transferred. For example, an AXI master has separate read and write channels, which means it can issue read and write operations simultaneously. The amount of data depends on the number of transfers performed and on transaction parameters like length, size, and burst type. However, this is easy to calculate, and the master can be made to read or write 8 kB or 256 kB of data so that there is enough traffic to saturate the particular path from that master to a specific slave. In this case only one master is active and only one path is exercised at a time.
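To size such a test, you can work backwards from the payload to the number of bursts the master must issue. A minimal sketch, assuming a fixed burst length and bus width (the parameter values are illustrative, not tied to any specific design):

```python
# Sketch: how many AXI bursts are needed to move a given payload.
# Each beat moves bus_width_bits/8 bytes; each burst carries burst_len beats.

def bursts_needed(payload_bytes, bus_width_bits, burst_len):
    bytes_per_burst = (bus_width_bits // 8) * burst_len
    return -(-payload_bytes // bytes_per_burst)  # ceiling division

# Moving 8 kB over a 32-bit bus with 16-beat bursts:
print(bursts_needed(8 * 1024, 32, 16))    # 128 bursts

# The 256 kB case from the text needs proportionally more:
print(bursts_needed(256 * 1024, 32, 16))  # 4096 bursts
```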
Another interesting case is when multiple masters are involved. For example, a CPU core can fetch instructions from memory while a graphics engine simultaneously stores results to another address within the same memory space. It is the job of the interconnect to service both masters such that neither is starved of access to the slave at any point in time. The fabric may have arbiters that decide which master gets higher priority, so this scenario also helps stress those configurations. Here, multiple masters are active at the same time targeting the same slave.
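The starvation property can be illustrated with a toy model of a round-robin arbiter, one common fairness policy. This is a behavioral sketch only; the master names and the choice of round-robin are assumptions for illustration, not a model of any particular fabric.

```python
# Toy round-robin arbiter: masters contend for one slave port each cycle.
# With both always requesting, grants alternate and neither is starved.

def simulate_round_robin(requests, cycles):
    """requests: dict of master -> True if always requesting.
    Returns a dict of grant counts per master."""
    masters = list(requests)
    grants = {m: 0 for m in masters}
    turn = 0  # index of the master with current priority
    for _ in range(cycles):
        # scan from the priority position so the last-granted master goes last
        for i in range(len(masters)):
            m = masters[(turn + i) % len(masters)]
            if requests[m]:
                grants[m] += 1
                turn = (masters.index(m) + 1) % len(masters)
                break
    return grants

print(simulate_round_robin({"cpu": True, "gpu": True}, 1000))
# {'cpu': 500, 'gpu': 500} -> bandwidth splits evenly, no starvation
```

A fixed-priority arbiter run through the same model would show one master taking every grant, which is exactly the kind of throttling these multi-master tests are meant to expose.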
Sometimes a master has operations outstanding to two different slaves at the same time. A good example is a DMA engine, which requires access to two different memory regions or devices to transfer a block of data. In this case a single master performs accesses to multiple slaves, which may be of interest only when certain use cases are expected from the system.
Last but not least is a combination of all the above, where multiple masters target different slaves. The important point is that multiple components are active at the same time, exercising mutually exclusive and partially shared paths. Some level of structural analysis of the interconnect will yield more intelligent test cases to stress the system.
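One way to enumerate these combined scenarios is to treat every master-to-slave path as a point-to-point link and sweep over all sets of concurrently active paths. A small sketch, with made-up master and slave names:

```python
# Structural sweep: enumerate every combination of concurrently active
# master->slave paths, from a single path up to all paths at once.
from itertools import combinations, product

masters = ["cpu", "dma", "gpu"]   # illustrative names only
slaves = ["ddr", "sram"]

paths = list(product(masters, slaves))  # every point-to-point path
scenarios = [list(c) for n in range(1, len(paths) + 1)
             for c in combinations(paths, n)]

print(len(paths))      # 6 paths
print(len(scenarios))  # 63 scenarios (2^6 - 1 non-empty combinations)
```

In practice you would prune this list using the structural analysis mentioned above, keeping only combinations that share arbiters or buffers and therefore actually stress the fabric.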
For all these different types, we can measure the same set of metrics. The single-master single-slave (SMSS) scenario gives a good idea of how much the fabric can support when there is no interference from other masters, while the rest tell you how it behaves in a typical application.