This is the July 2002 Digest of SIGARCH Messages (sigarch-jul02):

* Hot Interconnects 10 conference advance program
  http://www.gradebot.com/hoti/hoti10_preliminary_program.html
  Submitted by John Lockwood <lockwood@arl.wustl.edu>

* Updated HOT Chips 14 program
  www.hotchips.org
  Submitted by Allen Baum <abaum@3wisemonkeys.net>

* 4th Workshop on Binary Translation
  Call for Papers: http://www.ece.neu.edu/info/architecture/wbt2002.htm
  Submitted by David Kaeli <kaeli@ece.neu.edu>

* List of papers published in the latest Computer Architecture Letters
  http://www.cs.virginia.edu/~tcca/
  Submitted by Kevin Skadron <skadron@cs.virginia.edu>

--Doug Burger
SIGARCH Information Director
infodir_SIGARCH@acm.org

* Archive: http://www.cs.wisc.edu/~lists/archive/sigarch-members/maillist.html
* Web pages: http://www.cs.wisc.edu/~arch/www/, http://www.acm.org/sigarch/
* To remove yourself from the SIGARCH mailing list:
  mail listserv@acm.org with message body: unsubscribe SIGARCH-MEMBERS

-----------------------------------------------------------------
Doug Burger                        Office: 3.432 ACES
Assistant Professor                Phone: 512-471-9795
Department of Computer Sciences    Assistant: 512-471-9442
University of Texas at Austin      Fax: 512-232-1413
Taylor Hall 2.124                  E-mail: dburger@cs.utexas.edu
Austin, TX 78712-1188 USA          www.cs.utexas.edu/users/dburger
-----------------------------------------------------------------

Hot Interconnects 10 conference
Stanford University
August 21-23, 2002
http://www.hoti.org/

HotI 10 Advance Program
-----------------------
http://www.gradebot.com/hoti/hoti10_preliminary_program.html

Wednesday, Aug 21, 2002

Keynote: Delay Tolerant Networking
  Vint Cerf

Session 1: Gigabit/sec and Terabit/sec Switching technologies

* A Four-Terabit Single-Stage Packet Switch with Large Round-Trip Time Support
  - Francois Abel, Cyriel Minkenberg, Ronald P.
    Luijten, Mitch Gusat, Ilias Iliadis (IBM Research, Zurich)
* Feedback Output Queuing: A Novel Architecture for Efficient Switching Systems
  - Victor Firoiu, Xiaohui Zhang, Emre Gunduzhan (Nortel Networks)
* A Power Model for Routers: Modeling Alpha 21364 and InfiniBand Routers
  - Hang-Sheng Wang, Li-Shiuan Peh, Sharad Malik (Princeton University)
* Multicast scheduling for switches with multiple input-queues
  - Shashank Gupta (Andiamo Systems), Adnan Aziz (University of Texas)

Session 2: High Speed Packet Scheduling

* A Flow Table-Based Design to Approximate Fairness
  - Rong Pan (Stanford), Lee Breslau (AT&T Labs-Research),
    Balaji Prabhakar (Stanford), Scott Shenker (ICIR)
* Stable Round-Robin Scheduling Algorithms for High-Performance Input Queued Switches
  - Jing Liu, Mounir Hamdi (Hong Kong University of Science and Technology)
* Architecture and Hardware for Scheduling Gigabit Packet Streams
  - Raj Krishnamurthy, Sudhakar Yalamanchili, Karsten Schwan
    (Georgia Institute of Technology), Richard West (Boston University)

Session 3: Multiprocessors, Clusters, and Storage Area Networks

* Scalability Port: A Coherent Interface for Shared Memory Multiprocessors
  - Mani Azimi, Faye Briggs, Michel Cekleov, Manoj Khare, Akhilesh Kumar,
    Lily P. Looi (Intel)
* Scalable Opto-Electrical Network (SOENet)
  - Amit K. Gupta, William J. Dally, Arjun Singh, Brian Towles (Stanford)
* Distributed-and-Split Data-Control Extension to SCSI for Scalable Storage Area Networks
  - Yitzhak Birk, Nafea Bishara (Technion)

Panel: Wireless Wars: Wi-Fi vs. GPRS vs. 3G
  - David Liddle (General Partner at U.S.
    Venture Partners)

Thursday, Aug 22, 2002

Keynote:
* Eric Brewer (Co-Founder and Chief Scientist for Inktomi)

Session 4: Gigabit/sec and Terabit/sec Routing technologies

* A Middle Ground Between CAMs and DAGs for High-Speed Packet Classification
  - Amit Prakash, Adnan Aziz (University of Texas at Austin)
* Efficient Mapping of Range Classifier into Ternary-CAM
  - Huan Liu (Stanford University)
* Sorting and Searching using Ternary CAMs
  - Rina Panigrahy, Samar Sharma (Cisco Systems)
* Reducing TCAM Power Consumption and Increasing Throughput
  - Rina Panigrahy, Samar Sharma (Cisco Systems)

Session 5: High Speed Packet Processing Engines

* Stream Handlers: Application-specific Message Services on Attached Network Processors
  - Ada Gavrilovska, Kenneth Mackenzie, Karsten Schwan, Austen McDonald
    (Georgia Institute of Technology)
* DiffServ over Network Processors: Implementation and Evaluation
  - Ying-Dar Lin, Yi-Neng Lin (National Chiao Tung University),
    Shun-Chin Yang, Yu-Sheng Lin (Industrial Technology Research Institute)
* TCP-Splitter: A TCP/IP Flow Monitor in Reconfigurable Hardware
  - David V. Schuehler, John Lockwood (Washington University, St Louis)

Session 6: Wireless, Broadband, and Optical Networks

* Radioport: A Radio Network for Monitoring and Diagnosing Computer Systems
  - Hans Eberle (Sun Microsystems Laboratory)
* Optimized Upstream Scheduling in Broadband Cable Networks
  - Yingfei Dong, Zhi-Li Zhang, and David H.-C. Du (University of Minnesota)
* WDM Optical Interconnect Architectures Under Two Connection Models
  - Yuanyuan Yang (SUNY), Jianchao Wang (Data treasury Corp)

Friday, Aug 23, 2002

Tutorials

* Optical Networking: Recent Developments, Issues, and Trends
  - Raj Jain (Nayna Networks and The Ohio State University)
* InfiniBand Architecture and Where it is Headed
  - Dhabaleswar K.
    Panda (The Ohio State University)
* High-Speed Networking: A Systematic Approach to High-Bandwidth
  Low-Latency Communication
  - James Sterbenz (BBN)
* Mobile Ad Hoc Networking: Medium Access Control and Routing Protocols
  - Nitin Vaidya (University of Illinois at Urbana-Champaign)

The advance program is also on-line as:
http://www.gradebot.com/hoti/hoti10_preliminary_program.html

--
John Lockwood           | Department of Computer Science
Assistant Professor     | Washington University
lockwood@arl.wustl.edu  | 1 Brookings Drive, Campus Box 1045
(314) 935-4460          | St. Louis, MO 63130
http://www.arl.wustl.edu/~lockwood

----------------------------------------------------------------------
----------------------------------------------------------------------

The program for HOT Chips 14 has been updated; program details and
registration are available through our web site at
http://www.hotchips.org

******* Important date *******
Jul 20, 2002  Advance Registration Deadline

Sunday, August 18, 2002
-----------------------
Morning Tutorial
-----------------------
IC Technology Scaling Trends, Challenges, & Potential Solutions through 2016
  Peter M. Zeitzoff, Int'l SEMATECH Senior Fellow (MOSFETs)
  Tony Yen, Co-Director, Lithography Div., Int'l SEMATECH (Lithography)

In this tutorial, three major areas of IC technology are addressed:
MOSFET devices and front-end process integration, interconnect, and
lithography. For each area, the scaling projections are discussed, key
issues and challenges are assessed, and potential solutions for the
challenges are evaluated, all through the year 2016.

-----------------------
Afternoon Tutorial
-----------------------
Low Power Wireless Networked System Design
  Rajesh K. Gupta
  Center for Embedded Computer Systems, UC Irvine

This tutorial concerns the design of integrated systems with network
connections incorporating an RF front end, baseband DSP, link layer
coding and medium access control functions.
A major challenge in the design of these systems is meeting the system
performance with the lowest area, cost, and power. The tutorial focuses
on top-down design approaches and techniques that help to bridge the gap
between the system engineering and circuit engineering that has limited
system to circuit level implementation optimizations.

-----------------------
Monday, August 19, 2002
-----------------------

Session 1: Intel Microprocessors
* McKinley Processor
  HP/Intel
* Analysis of CPU2K Benchmarks on the McKinley Processor
  HP/Intel
* Intel Xeon Processor and Hyper-Threading Technology
  Intel

Keynote: Eric Schmidt
  CEO, Google
  TBA

Session 2: Network Processors
* Benchmark Performance: IBM PowerNP NP4GS3 Network Processor
  IBM
* AMCC's 2nd Generation 5Gbps Network Processor
  AMCC

Session 3: Interconnects
* A 20Gb/s 0.13um CMOS Serial Link
  Stanford Univ.
* Smarter Interconnects for Smarter Chips
  Sonics Inc.
* Building High Performance Multi-processor Systems with JIO
  Sun Micro.

Session 4: Technology
* Integrated Cryptographic HW Engines on the zSeries µProcessor
  IBM Corp.
* How a processor can permute n bits in O(1) cycles
  Princeton Univ.
* CMOS Crossbar
  Ting Wu, Chi-Ying Tsui, Hong Kong U. of Science & Technology

Session 5: Systems on Chip I
* The RM9000 Family of Integrated Multiprocessor Devices
  PMC-Sierra
* Alchemy Au1X00
  AMD

Panel: Embedded Systems Software: Visions of the Future
Moderator:  John Mashey, Sensei Partners
Panelists:  Chris Rowen, CEO, Tensilica
            Larry Mittag, CTO, Stellcom
            Jim Turley, JimTurley.com
            Nick Tredennick, Dynamic Silicon

------------------------
Tuesday, August 20, 2002
-----------------------

Session 6: Potpourri
* The Atheros Chipset for 108 Mb/s Multi-Mode Wireless LANs
  Atheros Comm.
* PipeRench: Power & Performance Evaluation of a Programmable Pipelined Datapath
  CMU
* GeForce4
  Henry Moreton, John Montrym, NVIDIA Corp

Keynote: Tom Edwards, NASA
  Air Traffic Control

Session 7: Digital Signal Processors
* A New Distributed DSP Architecture Based on the Intel IXS
  Intel
* VASA: Single-chip MPEG-2 422P@HL CODEC LSI w/Multi-chip Config. for
  Large Scale Processing Beyond HDTV Level
  NTT

Session 8: Switches
* Delivering On The Promise of Asynchronous Circuit Design
  Fulcrum Micro.
* A Scalable Switch Fabric to Multi-Terabit: Architecture & Challenges
  IBM

Session 9: Systems on Chip II
* FirePath
  Broadcom
* Broadcom Calisto: A Multi-Channel Multi-Service Communications Platform
  Broadcom
* BCM1101 Ethernet IP Phone / Gateway Platform
  Broadcom

Session 10: AMD Hammer Processor
* The AMD x86-64 ISA: Extending the x86 to 64-bits
  AMD
* The AMD Hammer Processor Core
  AMD
* Hammer Shared Memory Multi Processor Systems
  AMD

----------------------------------------------------------------------
----------------------------------------------------------------------

Call for papers for WBT2002

4th Workshop on Binary Translation
Call for Papers: http://www.ece.neu.edu/info/architecture/wbt2002.htm

----------------------------------------------------------------------
----------------------------------------------------------------------

-- List of papers from last Computer Architecture Letters --

Dear SIGARCH membership:

We are delighted to announce the online publication of the next set of
four papers in Computer Architecture Letters, the new publication of the
IEEE Computer Society Technical Committee on Computer Architecture
(TCCA). "Letters" is a quarterly forum for fast publication of new,
high-quality ideas in the form of short, critically refereed, technical
papers. Accepted letters are published immediately on our website and in
the next available paper issue. The print issue is sent to all TCCA
members; the website is available to the general public.
Submissions are accepted on a continuing basis. Current turn-around time
is 32 days, and we hope to improve this as our review process becomes
more efficient. Current acceptance rate is 19%.

The titles and abstracts of the new set of letters appear below, and
these letters, as well as the call for papers and submission
instructions, can be found on the Letters website at
http://www.cs.virginia.edu/~tcca/

We hope that you will look forward to each issue as a nice digest of
some of the latest hot research going on in our field, and we hope that
you will submit your early and exciting research results to Letters. We
hope that the quick turn-around will encourage this by providing
immediate recognition. Since IEEE allows publication in its conferences
and journals if there is at least 30% new material and this seems to be
a fairly common rule of thumb, this should not constrain researchers
from following their letter with full conference papers or journal
articles.

The kind of paper that we are seeking is an early, "wow" idea that may
not yet be ready for a full conference publication, but has enough
validated insights to justify publication as a four-page letter. We
recognize that some authors may prefer to take the extra time to
solidify the research for publication at a prestigious conference rather
than risk losing the idea to someone who takes the Letters paper and
runs faster with it than the original author. We encourage you to not
succumb to that mentality, but rather to submit your new work and get
"credit" for the seminal idea, regardless of the outcome of subsequent
conference submissions. Since we expect each issue of Letters to contain
approximately 16 to 24 pages of really new stuff and to be read by a
large fraction of the computer architecture community, credit for the
seminal idea is almost guaranteed.
We suggest that the seminal idea on one's CV is more valuable than the
turn-the-crank-with-lots-of-data papers that we seem to encounter too
frequently in the major conferences and journals. It remains, of course,
for the community at large to validate this thesis.

Yale Patt, Editor-in-Chief
Kevin Skadron, Associate Editor-in-Chief
Jean-Luc Gaudiot, IEEE Computer Society TCCA Chair

New papers, volume 1, 2002, available online at
http://www.cs.virginia.edu/~tcca/
-----------------------------------------------

- L. Shang, L.-S. Peh, N. K. Jha. "Power-efficient Interconnection
  Networks: Dynamic Voltage Scaling with Links." Volume 1, May 2002.

- O. S. Unsal, I. Koren, C. M. Krishna, C. A. Moritz. "Cool-Fetch:
  Compiler-Enabled Power-Aware Fetch Throttling." Volume 1, Apr. 2002.

- AJ KleinOsowski, D.J. Lilja. "MinneSPEC: A New SPEC Benchmark Workload
  for Simulation-Based Computer Architecture Research." Volume 1, May 2002.

- H. Vandierendonck, K. De Bosschere. "An Address Transformation
  Combining Block- and Word-Interleaving." Volume 1, May 2002.

Abstracts
---------

- L. Shang, L.-S. Peh, N. K. Jha. "Power-efficient Interconnection
  Networks: Dynamic Voltage Scaling with Links." Volume 1, May 2002.

Power consumption is a key issue in high-performance interconnection
network design. Communication links, already a significant consumer of
power now, will take up an ever larger portion of the power budget as
demand for network bandwidth increases. In this paper, we motivate the
use of dynamic voltage scaling (DVS) for links, where the frequency and
voltage of links are dynamically adjusted to minimize power consumption.
We propose a history-based DVS algorithm that judiciously adjusts DVS
policies based on past link utilization. Despite very conservative
assumptions about DVS link characteristics, our approach realizes up to
4.3X power savings (3.2X average), with just an average 27.4% latency
increase and 2.5% throughput reduction.
To the best of our knowledge, this is the first study that targets
dynamic power optimization of interconnection networks.

- O. S. Unsal, I. Koren, C. M. Krishna, C. A. Moritz. "Cool-Fetch:
  Compiler-Enabled Power-Aware Fetch Throttling." Volume 1, Apr. 2002.

In this paper, we present an architecture-compiler based approach to
reduce energy consumption in the processor. While we mainly target the
fetch unit, an important side-effect of our approach is that we obtain
energy savings in many other parts in the processor. The explanation is
that the fetch unit often runs substantially ahead of execution,
bringing in instructions to different stages in the processor that may
never be executed. We have found that, although the degree of
Instruction Level Parallelism (ILP) of a program tends to vary over
time, it can be statically predicted by the compiler with considerable
accuracy. Our Instructions Per Clock (IPC) prediction scheme uses a
dependence-testing-based analysis and simple heuristics to guide a
front-end fetch-throttling mechanism. We develop the necessary
architecture support and include its power overhead. We perform
experiments over a wide number of architectural configurations, using
SPEC2000 applications. Our results are very encouraging: we obtain up to
15% total energy savings in the processor with generally little
performance degradation. In fact, in some cases our intelligent
throttling scheme even increases performance.

- AJ KleinOsowski, D.J. Lilja. "MinneSPEC: A New SPEC Benchmark Workload
  for Simulation-Based Computer Architecture Research." Volume 1, May 2002.

Computer architects must determine how to most effectively use finite
computational resources when running simulations to evaluate new
architectural ideas. To facilitate efficient simulations with a range of
benchmark programs, we have developed the MinneSPEC input set for the
SPEC CPU 2000 benchmark suite.
This new workload allows computer architects to obtain simulation
results in a reasonable time using existing simulators. While the
MinneSPEC workload is derived from the standard SPEC CPU 2000 workload,
it is a valid benchmark suite in and of itself for simulation-based
research. MinneSPEC also may be used to run large numbers of simulations
to find "sweet spots" in the evaluation parameter space. This small
number of promising design points subsequently may be investigated in
more detail with the full SPEC reference workload. In the process of
developing the MinneSPEC datasets, we quantify its differences in terms
of function-level execution patterns, instruction mixes, and memory
behaviors compared to the SPEC programs when executed with the reference
inputs. We find that for some programs, the MinneSPEC profiles match the
SPEC reference dataset program behavior very closely. For other
programs, however, the MinneSPEC inputs produce significantly different
program behavior. The MinneSPEC workload has been recognized by SPEC and
is distributed with Version 1.2 and higher of the SPEC CPU 2000
benchmark suite.

- H. Vandierendonck, K. De Bosschere. "An Address Transformation
  Combining Block- and Word-Interleaving." Volume 1, May 2002.

As future superscalar processors employ higher issue widths, an
increasing number of load/store-instructions needs to be executed each
cycle to sustain high performance. Multi-bank data caches attempt to
address this issue in a cost-effective way. A multi-bank cache consists
of multiple cache banks that each support one load/store-instruction per
clock cycle. The interleaving of cache blocks over the banks is of
primary importance. Two common choices are block-interleaving and
word-interleaving. Although word-interleaving leads to higher IPC, it is
more expensive to implement than block-interleaving since it requires
the tag array of the cache to be multi-ported.
By swapping the bits in the effective address that are used by
word-interleaving with those used by block-interleaving, it is possible
to implement a word-interleaved cache with the same cost, cycle time and
power consumption as a block-interleaved cache. Because this makes the
L1 data cache blocks sparse, additional costs are incurred at different
levels of the memory hierarchy.

----------------------------------------------------------------------
----------------------------------------------------------------------
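
Editor's illustration for the Vandierendonck and De Bosschere abstract:
a minimal sketch of what swapping the two bank-select bit fields of an
address could look like. It is not the authors' implementation. The
cache geometry (4-byte words, 32-byte blocks, 4 banks) and all function
names are assumptions chosen for illustration; the abstract does not
give them.

```python
# Assumed geometry (illustrative only, not taken from the paper):
WORD_BITS = 2    # 4-byte words: byte-in-word offset is bits 0-1
BLOCK_BITS = 5   # 32-byte blocks: block offset is bits 0-4
BANK_BITS = 2    # 4 banks
BANK_MASK = (1 << BANK_BITS) - 1


def word_interleaved_bank(addr: int) -> int:
    """Bank chosen by the bits just above the byte-in-word offset."""
    return (addr >> WORD_BITS) & BANK_MASK


def block_interleaved_bank(addr: int) -> int:
    """Bank chosen by the bits just above the block offset."""
    return (addr >> BLOCK_BITS) & BANK_MASK


def swap_bank_fields(addr: int) -> int:
    """Exchange the word-interleaving and block-interleaving bank fields.

    With the assumed geometry the fields are bits 2-3 and bits 5-6,
    so they do not overlap.
    """
    word_field = (addr >> WORD_BITS) & BANK_MASK
    block_field = (addr >> BLOCK_BITS) & BANK_MASK
    # Clear both fields, then write each value into the other position.
    cleared = addr & ~((BANK_MASK << WORD_BITS) | (BANK_MASK << BLOCK_BITS))
    return cleared | (block_field << WORD_BITS) | (word_field << BLOCK_BITS)


if __name__ == "__main__":
    # Consecutive words of one block, plus the first word of the next block.
    for addr in (0x00, 0x04, 0x08, 0x0C, 0x20):
        transformed = swap_bank_fields(addr)
        print(hex(addr), hex(transformed),
              word_interleaved_bank(addr),
              block_interleaved_bank(transformed))
```

Under these assumptions, a block-interleaved bank selector applied to
the transformed address picks the same bank that a word-interleaved
selector would pick for the original address, so consecutive words land
in different banks. The sketch does not model the sparse L1 blocks or
the additional memory-hierarchy costs that the abstract mentions.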