Using Talus Vortex to Implement an ARM1176JZS with Distributed MTCMOS
Author(s): Jack Liu, Massimo Bertoletti and Andrew Lambert, Sondrel Ltd.
Abstract: Although advanced low-power techniques such as distributed MTCMOS have been in use for a number of years by a select few, wider adoption has been hampered by the need for suitable libraries and heavily customized layout flows. However, the recent availability of PMK libraries and the increasing automation of the low-power flow by a number of vendors, including Magma, are increasing accessibility. This paper describes our experience implementing an ARM1176JZS in the TSMC 65LP process with dormant modes using Talus® Vortex. The developed flow will be discussed in detail, paying special attention to several key areas including the creation of multiple power domains, optimization of the distributed coarse-grain MTCMOS switches, power routing and a review of other important factors. Where available, the native Talus Vortex functions have been used in the flow, providing a valuable reference to design groups using these techniques and libraries for the first time.
The Physical Implementation of a Complex Switch ASIC for a High-Performance Parallel Computing System
Author(s): Ding-shan You and Hua Shen, Institute of Computing Technologies, Chinese Academy of Sciences
Abstract: This paper introduces the design trade-offs, flow integration and physical implementation of a complex switch ASIC targeted at the interconnect of a parallel computing system. The ASIC has 16 RX/TX ports and integrates three 16x16 crossbars to support multi-layer and multi-function parallel switching capacity. Each port of the chip can run at up to 156.25 MHz with 3.12 Gbps throughput. By carefully designing and balancing the on-chip clocking networks, full timing closure of the chip's physical implementation has been achieved under 4 function modes and 3 process, voltage, temperature (PVT) corners using mainly Magma’s Blast tools. The ASIC has been taped out using 0.18-micron/6-metal CMOS technology, and has about 20 million transistors, 17 different clock domains, 48 RAM macro blocks, a 12.39 mm x 12.39 mm die size, and a 1053-pin flip-chip package.
Clock Tree Implementation Using Talus
Author(s): Raphy Du, Chandler Mei and Vincent Guo, Nvidia
Abstract: Clock distribution networks synchronize the flow of data signals among synchronous data paths. The design of these networks can dramatically affect system-wide performance and reliability. In deep sub-micron technologies, there are a few critical issues related to the clock network: 1) the clock tree power tends to dominate the chip power due to its heavy load and high switching activity; 2) coupling capacitances introduce more crosstalk and make net delay unpredictable; 3) transition time at the leaf level has a big impact on setup and hold time; 4) non-default clock routing introduces extra congestion. In this paper, we illustrate a method of using Talus to build a balanced, short-insertion-delay and low-power clock tree for a 45-nm graphics chip.
Using Talus Vortex for Concurrent Multi-Corner Optimization
Author(s): Chandler Mei, Leo Zhang and Qingyan Zheng, Nvidia
Abstract: At the 90-nm node and below, designs are required to close timing in an increasing number of corners. The traditional method, which optimizes setup time at the worst-case (wc) corner and hold time at the best-case (bc) corner, is neither comprehensive nor precise enough. A corner is a specific combination of process, voltage, temperature (PVT) values, RC rules, and on-chip variation (OCV) specifications under which the design must be timing clean. This paper shows how we use Talus Vortex to define corners and run multi-corner optimization concurrently.
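As a rough illustration of the corner definition above, the following Python sketch models a corner as a record of PVT values, an RC rule and an OCV derate, and checks whether a design is timing clean across a chosen set of corners. It is not Talus Vortex syntax and is not taken from the paper: the `Corner` class, the helper functions, the corner names and every slack value are hypothetical, chosen only to show how a wc/bc-only check can miss a violation that a check across all corners catches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    """Hypothetical record for one corner: PVT values, RC rule, OCV derate."""
    name: str
    process: str
    voltage_v: float
    temperature_c: float
    rc_rule: str
    ocv_derate: float


def worst_slack_per_corner(slacks):
    """Return the minimum slack (ns) over all timing checks, per corner name."""
    return {corner.name: min(checks.values()) for corner, checks in slacks.items()}


def is_timing_clean(slacks, corners=None):
    """True if no timing check has negative slack in any selected corner."""
    selected = slacks.keys() if corners is None else corners
    return all(min(slacks[corner].values()) >= 0.0 for corner in selected)


# Illustrative corners; all numbers below are made up for this sketch.
wc = Corner("wc", "slow", 1.08, 125.0, "rc_worst", 0.95)
bc = Corner("bc", "fast", 1.32, -40.0, "rc_best", 1.05)
wc_cold = Corner("wc_cold", "slow", 1.08, -40.0, "rc_worst", 0.95)

# Slack in ns per timing check, per corner (hypothetical values).
slacks = {
    wc: {"setup:a->b": 0.12, "hold:a->b": 0.30},
    bc: {"setup:a->b": 0.40, "hold:a->b": 0.05},
    wc_cold: {"setup:a->b": -0.03, "hold:a->b": 0.25},
}

clean_wc_bc_only = is_timing_clean(slacks, [wc, bc])
clean_all_corners = is_timing_clean(slacks)
worst = worst_slack_per_corner(slacks)
```

In this made-up data set the design passes when only the wc and bc corners are checked but fails once the third corner is included, which is the gap that concurrent multi-corner analysis is meant to close.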
Comparison of Timing Correlation and Optimization between Multi-Mode/Multi-Corner (MM/MC) and Single-Mode/Single-Corner (SM/SC) Methods
Author(s): Peng He, Min Xie, Xun Chen, Infineon Technologies (Xi’an) Co. Ltd.
Abstract: In the past, we have used Talus to address the complexities associated with optimization across multiple modes and process, voltage, temperature (PVT) corner combinations. These methods included: 1) multi-corner concurrent optimization to address timing problems across specific PVT corners; 2) classic MM/MC optimization for mode-corner combinations; and 3) surgical use of scenario-based optimization to address timing modes only related to specific corners.
This paper will introduce an optimization approach which utilizes multi-corner optimization to generate additional “margin-enhanced scenarios.” The flow used the “margin-enhanced scenarios” for concurrent timing optimization. This paper will compare timing results between: 1) single scenario and SM/SC; 2) full MM/MC and SM/SC; and 3) the “margin-enhanced scenario” optimization and SM/SC. Based on the results, the “margin-enhanced scenario” optimization showed the best correlation compared to SM/SC.
One Method for Place and Route Design Migration between Technologies
Author(s): Peng He and YiHua Chai, Infineon Technologies (Xi’an) Co. Ltd.
Abstract: Market demands often require us to redesign an IC from the same source code, either migrating it to a more advanced technology or creating a new version in the same technology. This can lower tape-out costs by reducing area, raise performance by increasing frequency, and lower power. This paper describes a method of migrating a design that has been fully placed and routed using Talus Vortex between different technologies. The method uses scaling layers, re-binding the library and a place-and-route ECO method to achieve the migration. It starts from the final place-and-route database, not from logic synthesis. We’ll show how redesign time can be shortened to meet tight time-to-market schedules.
DFM Solutions for 65-nm/40-nm SoC Designs with Magma Software
Author(s): Brian Jia, Alchip
Abstract: With the scaling of CMOS technologies from 65-nm to 40-nm, the magnitude of variability of key parameters affecting the performance of integrated circuits has increased greatly. This variability causes a decrease in the yield of these systems on a chip (SoCs). Advanced design-for-manufacturability (DFM) techniques, such as critical-area analysis (CAA), lithographic process checking (LPC) and virtual chemical-mechanical polishing (VCMP) are critical for 40-nm SoC design. In this paper, we’ll describe how to use these techniques with Magma tools and share the DFM improvement results of a test case.
Prototyping a Large-Scale SoC with a Complex Floorplan and Massive Amount of Hard Macros
Author(s): Koki Tsurusaki, Renesas Technology Corp. and Rui Li, Magma Design Automation Co., LTD Japan
Abstract: As the performance of compute servers and multi-threading functionality increases, many EDA tools have the ability to implement very large designs with as many as 3 to 4 million cells, using an automated flat flow. However, a bottleneck exists in these automated place-and-route flows. Designers have to create and verify a floorplan by hand, requiring multiple iterations over several weeks to achieve a workable floorplan with respect to timing closure, congestion and routability. This paper outlines an advanced, fully automated or minimal handwork prototyping approach using Talus and Hydra products, including Magma’s timing-driven and congestion-driven automatic macro placement function and a fast verification solution for timing and congestion. This paper discusses how to prototype a 65-nm car navigation chip with more than 3.5-million cells, 484 hard macros and dual CPU cores.
Floorplanning a Complicated Partition with the Talus Tcl Interface
Author(s): Lei Zhang and Qingyuan Zheng, Nvidia
Abstract: This paper introduces a few techniques for floorplanning a partition in 65-nm and smaller geometries. We'll show how these techniques helped us achieve a workable floorplan in a relatively short time. They include analyzing logic connectivity, congestion and local density control. We'll also share tips for a zigzag buffer. Examples of how we implemented each of these techniques using Magma’s Tcl interface will be presented.
Library Modeling Optimization Using SiliconSmart
Author(s): Hou TangQing, Verisilicon Microelectronics (Shanghai) Co. Ltd
Abstract: From the design of basic logic cells to the hierarchical rollup, there is hardly any place in an ASIC flow where characterization of the library does not play a role. The most important aspects of the characterization are accuracy and efficiency. Data accuracy ensures the model is a good reflection of the design and the efficiency of the characterization flow is of vital importance to the turnaround time of library models. This paper will present some practical modeling techniques that improve the accuracy and efficiency of library generation, including composite current source (CCS) modeling and auto-ranging load.
Using a Common Library Structure to Save Volcano Space
Author(s): Peng He, YiHua Chai, Jean-Yves Larguier, Infineon Technologies (Xi’an) Co. Ltd.
Abstract: Today, designs have become more and more complex. It is common for one design to include more than ten libraries. This makes the Volcano database very large. As the design goes through the RTL-to-GDSII process, Volcanoes may be generated for many of the steps. So, the total space used by the Volcanoes will be huge. Sometimes, when many projects run together the memory usage is pushed to its limit. This paper will introduce a method of reducing the Volcano space usage through a common library sharing structure. Results will be presented that show a reduction in Volcano space usage of 40% to 60% among four test projects.