Hobbit

This section describes AT&T's entry into the 32 bit RISC design wars with the 92010 Hobbit processor and associated chips. The Hobbit addresses integrated computation/communication for the Personal Communicator market. It is implemented in 0.9 micron CMOS.

AT&T, a giant in the communication field, has been a major manufacturer of integrated circuits for a long time. But most of us haven't seen them, because they were manufactured for internal use. Now, AT&T is branching out into computation / communication products, where its marketing and extant communication reputation will help it gain market share.

The Hobbit chip set attempts to maximize the MIPS/watt figure, providing high performance for small, lightweight, battery-powered systems. The Hobbit design requirements were driven by the Personal Communications market, which AT&T sees as a major market area in the coming years. This is a classic consumer market, i.e., cost-sensitive. The Personal Communications product provides integrated computation/communications in a small, lightweight package. Unique constraints for this class of product result in unique design requirements for the base RISC machine. Example Personal Communicators began to appear in the marketplace in 1992, with EO Corporation basing theirs on the Hobbit processor.

The Hobbit is the first in a series of processors from AT&T. It is accompanied by companion chips, including the 92011 System Management device. Also available in the family are the 92012 PCMCIA Controller, the 92013 Peripheral Controller, and the 92014 Video Display Controller for CRTs or flat panels.

The 92012 PCMCIA Controller interfaces the Hobbit to the Personal Computer Memory Card International Association (PCMCIA) bus, version 2.0. Block mode access is provided between the Hobbit processor and the memory (or I/O) cards. The Hobbit functions as a bus master in these transfers. Up to four card slots are supported. Each 32 bit Hobbit data access is mapped into two 16 bit PCMCIA accesses.

The 92013 Peripheral Controller provides an interface from the Hobbit to a subset of the Industry Standard Architecture (ISA) bus, or AT-bus. Up to 8 memory or I/O devices on the bus can be accommodated. Four channels of 8 or 16 bit DMA are provided, with the Hobbit as bus master. The 92013 handles each of the Hobbit's 32 bit accesses as two 16 bit or four 8 bit accesses. Memory and I/O accesses are supported by mapping the Hobbit bus address into a 16 Megabyte region on the ISA side. Up to 18 of these 16 Megabyte regions are supported.
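The width conversion described above can be illustrated with a short sketch. It is not taken from the data sheet: the function name split_access is hypothetical, and the low-order-part-first (little-endian) ordering of the narrow accesses is an assumption made only for illustration.

```python
def split_access(address, value, width_bits):
    """Split one 32-bit write into narrower bus accesses.

    Returns (address, data) pairs, lowest address first.
    ASSUMPTION: the low-order part goes to the lowest address;
    the real ordering depends on how the controller is configured.
    """
    if width_bits not in (8, 16):
        raise ValueError("narrow accesses are 8 or 16 bits wide")
    step = width_bits // 8            # bytes moved per narrow access
    mask = (1 << width_bits) - 1
    return [
        (address + offset, (value >> (8 * offset)) & mask)
        for offset in range(0, 4, step)
    ]


# One 32-bit write becomes two 16-bit accesses or four 8-bit accesses.
halves = split_access(0x1000, 0x11223344, 16)
quarters = split_access(0x1000, 0x11223344, 8)
print([(hex(a), hex(d)) for a, d in halves])
print([(hex(a), hex(d)) for a, d in quarters])
```

The same splitting applies to the 92012, which uses the 16 bit case only.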

The 92014 Video Display Controller interfaces the Hobbit bus to video memory for a CRT or LCD screen. An area of memory in the Hobbit address space can be set aside for snooping by the 92014. Data transfers can be direct, or through a write-through buffer, implemented as a 256 word FIFO. Color LCD displays are supported with an external VRAM frame buffer, and the CRT data is output to an external RAMDAC. Simultaneous operation of a CRT and an LCD device is possible.

The Hobbit architecture is derived from the "CRISP" architecture developed by AT&T, which is optimized to run "C" code, and thus Unix.

The Hobbit achieves a peak rate of slightly better than one cycle per instruction via a variety of techniques. Most instructions are inherently single cycle. A three kbyte, three-way set-associative cache is provided for encoded instructions, and a 256 byte stack cache holds the top of the user stack on chip. There is a 32 entry direct mapped (1:1) decoded instruction cache. The Hobbit's high code density results in reduced memory requirements for applications. This has the twofold benefit of reducing power requirements for memory, and speeding applications.

The prefetch decode unit incorporates a 3 stage pipeline, as does the execution unit. Operand bypassing is used to supply results to subsequent operations in the pipeline, without a write followed by a read, a technique called read-canceling. A single execution unit is included. Integer multiply and divide are handled by on-chip hardware. The chip achieves a respectable 27k Dhrystone performance figure.

The 92011 System Management device augments the basic processor architecture by providing bus arbitration for up to five bus masters. It incorporates the system clock, interrupt controller functions, and DRAM control. It includes a real time clock, a synchronous serial port, and an asynchronous port. 256 bytes of battery backed SRAM are also on-chip. The 92011 allows for graceful stopping and restarting of the main processor's clock. DRAM refresh can continue in stopped mode.

DRAM control supports fast page mode read and write, and has programmable timing for RAS and CAS. Self-refreshed DRAM may be used. Up to 4 banks of 1, 4 or 16 Megabit DRAMs may be used.

Interrupts are provided from 3 external sources, or by internal events such as timer overflow. Certain instructions are not interruptible. Exceptions are used to signal program errors. These are events such as division by zero, or privilege violation.

Multiple bus masters are accommodated by a bus request/bus grant protocol, with the Hobbit not necessarily the default master. An external arbiter is used.

The on-chip MMU includes dual 32 entry TLBs, for paged or non-paged segments. Address translation can be enabled or disabled. A 32 bit wide Von Neumann type memory interface is used, with 32 bit physical addresses. Write buffering is not used. Either aggressive or demand prefetching of instructions is supported. For aggressive prefetch, the prefetch unit requests quadword chunks until a branch or jump is encountered. If the branch target is encoded, prefetching continues from the branch target. If not, prefetching ceases. In demand fetch, the execution unit takes a mispredicted branch. Demand prefetch, originating at the execution unit, is the default mode.

The Hobbit family chips are IEEE 1149.1 JTAG compatible, and include a four bit port for boundary scan and test. The Hobbit chip dissipates 250 mW at 3.3 volts, 20 MHz, and uses less than 50 microamps in standby. The associated System Management chip dissipates 90 mW, and less than 50 microamps in standby. The Hobbit family is designed to work between 3 and 5 volt levels. The Hobbit chip is packaged as a 132 pin plastic quad flat pack (PQFP). The System Management chip is a 208 pin package.
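To put the power figures in terms of the MIPS/watt design goal, the following sketch works through the arithmetic. It assumes a peak rate of exactly one instruction per cycle at 20 MHz, which is an idealized upper bound rather than a measured figure, and the function names are hypothetical.

```python
def peak_mips_per_watt(clock_hz, instructions_per_cycle, power_watts):
    """Peak MIPS divided by dissipated power."""
    mips = clock_hz * instructions_per_cycle / 1e6
    return mips / power_watts


def supply_current_ma(power_watts, volts):
    """Average supply current in milliamps for a given dissipation."""
    return power_watts / volts * 1000.0


# Figures from the text: 250 mW at 3.3 volts and 20 MHz.
# ASSUMPTION: one instruction per cycle (peak, not sustained).
print(peak_mips_per_watt(20e6, 1.0, 0.250))   # 80.0
print(round(supply_current_ma(0.250, 3.3), 1))
```

Sustained figures would be lower, since cache misses and multi-cycle instructions reduce the instruction rate.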

The Hobbit's instruction repertoire is small, consisting of only 44 different hardwired instructions. These are referred to as 2 1/2 operand instructions: each specifies two source operands, and defaults to a known destination operand.

The instruction format does allow for variable length instructions. Pipelining is still feasible, because there are only three formats, identified in the first byte. Instructions are identified as 1, 3, or 5 parcel, a parcel being 16 bits, or two bytes. Single cycle execution is maintained. Code density distribution studies by AT&T have shown that over 70% of executed instructions are 2 byte, with an average of less than 4 bytes per instruction. That compares favorably with 32 bit wide fixed instruction formats.
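The code density figures can be checked with a little arithmetic. The sketch below computes the average instruction length for a given mix of 1, 3, and 5 parcel instructions. Only the 70% share of 2 byte instructions comes from the text; the split of the remaining 30% between 3 and 5 parcel instructions is a hypothetical mix chosen for illustration.

```python
PARCEL_BYTES = 2
VALID_PARCEL_COUNTS = (1, 3, 5)


def average_instruction_bytes(mix):
    """mix maps parcel count -> fraction of executed instructions."""
    if set(mix) - set(VALID_PARCEL_COUNTS):
        raise ValueError("instructions are 1, 3, or 5 parcels long")
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(parcels * PARCEL_BYTES * share for parcels, share in mix.items())


# ASSUMPTION: the 0.20 / 0.10 split is invented for this example.
hypothetical_mix = {1: 0.70, 3: 0.20, 5: 0.10}
print(average_instruction_bytes(hypothetical_mix))   # about 3.6 bytes

# Worst case consistent with "over 70% are 2 byte": all others 5 parcels.
print(average_instruction_bytes({1: 0.70, 5: 0.30}))  # about 4.4 bytes
```

With 70% of instructions at 2 bytes, the average stays under 4 bytes only if more than a third of the remaining instructions are the 3 parcel form, so the two figures quoted above together imply that 5 parcel instructions are comparatively rare.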

No floating point instructions are included. Quite unlike other RISC architectures, the processor is a memory-memory architecture, with no specific load/store instructions.

The integer operations add, subtract, multiply, and divide are supported, as well as the AND, OR, and XOR logical operations. There is a left shift operation, and both an arithmetic and logical right shift. Logical comparisons are for equality, and signed and unsigned greater than.

Branch folding is used to reduce branches to zero cycles. This may at first appear magic, but it is really clever engineering. In essence, a branch instruction is folded back into the previous instruction, such that the program counter is not incremented as is the case for sequential instructions, but gets the branch value instead. The branch instruction disappears into the previous instruction. Of course, this works well for unconditional branches, but what about conditional ones? Actually, each instruction has two next-pc fields (one is the alternate next-pc), and branch prediction is done at the compiler level. A static branch prediction hint bit is inserted by the compiler to assist the hardware in making the right guess, thus reducing the non-sequential flow of control penalty. Delayed branch slots, or load delays are not required in this scheme.
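A minimal simulation may make branch folding more concrete. It models decoded instruction cache entries that each carry a next-pc and an alternate next-pc, so that the branch never occupies an execution slot of its own. The entry layout, the field names, and the three-instruction loop are all assumptions made for illustration; they are not the actual Hobbit decoded format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DecodedEntry:
    op: str                               # "dec", "cmp_gt_zero", or "halt"
    next_pc: int                          # successor on the predicted path
    alt_next_pc: Optional[int] = None     # successor if the prediction is wrong
    predicted_flag: Optional[bool] = None  # flag value for which next_pc is right


# Encoded program:  0: dec n   1: cmp n > 0   2: if-true jump 0   3: halt
# After folding, the branch at address 2 has no entry of its own; it lives
# in the two next-pc fields of the compare at address 1.
DECODED = {
    0: DecodedEntry("dec", next_pc=1),
    1: DecodedEntry("cmp_gt_zero", next_pc=0, alt_next_pc=3, predicted_flag=True),
    3: DecodedEntry("halt", next_pc=3),
}


def run(n):
    """Return (entries executed, mispredictions) for a countdown from n."""
    pc, flag, executed, mispredicted = 0, False, 0, 0
    while True:
        entry = DECODED[pc]
        executed += 1
        if entry.op == "halt":
            return executed, mispredicted
        if entry.op == "dec":
            n -= 1
        elif entry.op == "cmp_gt_zero":
            flag = n > 0
        if entry.alt_next_pc is not None and flag != entry.predicted_flag:
            mispredicted += 1
            pc = entry.alt_next_pc        # fall back to the alternate path
        else:
            pc = entry.next_pc            # predicted path, no branch slot used


print(run(3))   # (7, 1)
```

For a countdown from 3, the folded loop executes 7 entries with one misprediction at loop exit. The same loop with the branch executed as a separate instruction would take 10.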

Program control is modified by call and return instructions, and opcodes to allocate and free stack space, and fill the stack cache. An unconditional jump instruction is included. A flush may be forced to the decoded instruction cache, or the prefetch buffer.

The Hobbit has no user-visible registers. To avoid penalties in addressing external memory, the Hobbit caches the top 256 bytes of the stack on-chip in a circular buffer, with associated head and tail pointers. The interesting thing about a stack cache is that no tags are required. Since the cache holds a single contiguous region at the top of the stack, only a bounds check is needed to determine whether an item is present in the on-chip cache. In a sense, the stack cache is like variable length register windows. By AT&T's studies, the hit rate for data references is about 88%.

Control registers include the configuration register, the fault register, the JTAG register, the interrupt stack pointer and maximum stack pointer, the program counter and program status word, the shadow register, the stack pointer, the segment table and vector base, and two timer registers.

Unlike most RISC machines, the Hobbit has multiple operand addressing modes. There are 7 operand addressing modes: immediate, absolute (direct and indirect), stack offset (direct and indirect), program counter relative, and register. In immediate and absolute modes, the 32 bit operand value or address is stored in the instruction itself. In stack offset mode, a signed 2's complement offset is stored in the instruction. The offset is added to the SP. In stack offset indirect, the result of the addition is not the address of the operand, but the address of the address. Absolute indirect stores the address of the address of the operand in the instruction, and is used only for certain instructions, such as Jump and Call. Similarly, these instructions can use the pc relative mode, where the signed, 2's complement value in the instruction is added to the address of the instruction to determine the operand address. In register addressing mode, access is provided to the internal registers. Jump/call instructions have 4 addressing modes: absolute, indirect, pc-relative, and stack offset indirect.
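The address arithmetic behind the memory-referencing modes can be summarized in a short sketch. The mode names, the function operand_address, and the dictionary standing in for memory are all hypothetical; immediate and register modes are left out because they do not produce a memory address.

```python
MASK32 = 0xFFFFFFFF


def operand_address(mode, field, sp, pc, memory):
    """Return the operand's address for one memory-referencing mode.

    field  -- value stored in the instruction (signed for offsets)
    sp, pc -- stack pointer and address of the current instruction
    memory -- dict mapping address -> 32-bit word (a stand-in for RAM)
    """
    if mode == "absolute":
        return field & MASK32
    if mode == "absolute_indirect":          # address of the address
        return memory[field & MASK32]
    if mode == "stack_offset":
        return (sp + field) & MASK32
    if mode == "stack_offset_indirect":      # SP + offset holds the address
        return memory[(sp + field) & MASK32]
    if mode == "pc_relative":
        return (pc + field) & MASK32
    raise ValueError("not a memory-referencing mode: " + mode)


# Hypothetical machine state for the example.
memory = {0x2000: 0x3000, 0x1008: 0x4000}
sp, pc = 0x1000, 0x0500
print(hex(operand_address("stack_offset", 8, sp, pc, memory)))           # 0x1008
print(hex(operand_address("stack_offset_indirect", 8, sp, pc, memory)))  # 0x4000
print(hex(operand_address("pc_relative", -0x10, sp, pc, memory)))        # 0x4f0
```

Each indirect mode costs one extra memory read to fetch the pointer before the operand itself can be accessed.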

High code density is maintained by the memory-memory architecture. In this scheme, memory access time is controlled by cached reads and writes. This optimizes the first (instructions/task) term of the performance equation (section 3), which is usually ignored by RISC designs. For example, the following table illustrates the addition of two items in memory in both approaches:

Memory-memory        Load-store

add A,B              Load A
                     Load B
                     Add
                     Store A

Thus, if the memory-memory architecture can be made to work at the one instruction/cycle rate, it has obvious advantages. Either big-endian or little-endian byte ordering can be handled by means of a settable bit in the PSW. Instructions can generally operate on signed or unsigned bytes, half-words, or words.
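The instruction-count argument can be made concrete with the usual performance equation, time per task = instructions per task x cycles per instruction x time per cycle. The sketch below applies it to the addition example, under the stated condition that both styles run at one cycle per instruction. The 50 ns cycle corresponds to the 20 MHz clock quoted in this section; treating every instruction as single cycle is an idealization.

```python
def task_time_ns(instructions, cycles_per_instruction, cycle_ns):
    """Performance equation: instructions/task * cycles/instr * time/cycle."""
    return instructions * cycles_per_instruction * cycle_ns


CYCLE_NS = 50   # one cycle at 20 MHz

# ASSUMPTION: every instruction completes in one cycle in both styles.
memory_memory = task_time_ns(1, 1, CYCLE_NS)   # add A,B
load_store = task_time_ns(4, 1, CYCLE_NS)      # Load A, Load B, Add, Store A

print(memory_memory, load_store)   # 50 200
```

The advantage shrinks if the memory-memory instruction needs extra cycles for operands that miss the on-chip stack cache, which is why the hit rate matters so much to this design.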

The Hobbit development toolbox includes a cross compiler (for C), a cross-assembler and linker, and a debugger. An in-circuit emulator is available, as well as a PC-board host for the Hobbit chip.

The Hobbit does not support floating point.

The Hobbit uses a three kbyte, three-way set-associative cache for encoded instructions, and a 256 byte stack cache that holds the top of the user stack on chip. In addition, there is a 32 entry direct mapped (1:1) decoded instruction cache.

There is no data cache per se, but the on-chip stack cache holds the top (most recently referenced) section of the user stack. This is a unique approach to caching a user data structure. The stack cache is implemented as a bank of 64 4-byte registers in a circular structure, with a stack pointer and a maximum stack pointer. The stack pointer is the lowest address of data in the cache. The maximum stack pointer points above the highest address of data in the cache. If the address of a referenced data item lies between the two register values, the data is present in the stack cache. No facilities for secondary cache or multiprocessing/coherency modeling are provided.
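The tag-free hit test can be sketched in a few lines. The bounds check follows the description above; the mapping from address to register index (word address modulo 64) is an assumption about how a circular bank of 64 registers might be indexed, and the function name is hypothetical.

```python
STACK_CACHE_WORDS = 64
WORD_BYTES = 4
STACK_CACHE_BYTES = STACK_CACHE_WORDS * WORD_BYTES   # 256


def stack_cache_lookup(address, sp, msp):
    """Return the register index on a hit, or None on a miss.

    sp  -- stack pointer: lowest address held in the cache
    msp -- maximum stack pointer: just above the highest cached address
    """
    if not 0 <= msp - sp <= STACK_CACHE_BYTES:
        raise ValueError("cached region cannot exceed 256 bytes")
    if sp <= address < msp:                 # bounds check, no tag compare
        # ASSUMPTION: registers are indexed by word address modulo 64.
        return (address // WORD_BYTES) % STACK_CACHE_WORDS
    return None


sp = 0x7FFFFF00
msp = sp + STACK_CACHE_BYTES
print(stack_cache_lookup(sp + 4, sp, msp))   # hit, register index 1
print(stack_cache_lookup(msp, sp, msp))      # miss: None
```

Two comparisons against SP and MSP replace the per-line tag comparison of a conventional cache, which keeps the lookup cheap in both area and power.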

Hobbit's on-chip MMU, if enabled by setting the PSW virtual/physical bit to 1, translates virtual to physical addresses in paged or non-paged segments. The virtual address space is specified by a 32-bit address, and is divided into 1024 segments of 4 Megabytes each. Paged segments are divided further into 4 kbyte pages. Non-paged segments can vary in size from 4 kbytes to 4 Megabytes. Within the current page, a zero cycle translation is possible. Translation makes use of map tables of physical addresses, each of which is 4096 bytes. User and kernel access rights bits are included. Data areas may be marked read-only.

User and kernel segment identifiers are maintained, and segments can be marked with read/write permissions.
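The sizes given above determine how a 32-bit virtual address divides for a paged segment: 1024 segments need 10 bits, a 4 kbyte page needs a 12 bit offset, and the 10 bits in between select one of the 1024 pages in a 4 Megabyte segment. The sketch below performs that split. The field widths follow from the text, but the function and field names are hypothetical, and the table walk itself is not modeled.

```python
SEGMENT_BITS = 10    # 1024 segments of 4 Megabytes
PAGE_BITS = 10       # 4 Megabytes / 4 kbytes = 1024 pages per segment
OFFSET_BITS = 12     # 4 kbyte pages


def split_paged_address(virtual_address):
    """Return (segment, page, offset) for a 32-bit virtual address."""
    if not 0 <= virtual_address < 1 << 32:
        raise ValueError("virtual addresses are 32 bits wide")
    offset = virtual_address & ((1 << OFFSET_BITS) - 1)
    page = (virtual_address >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = virtual_address >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset


# Segment 5, page 7, byte 0x123 within the page.
va = (5 << 22) | (7 << 12) | 0x123
print(split_paged_address(va))   # (5, 7, 291)
```

A table with 1024 entries of 4 bytes each occupies 4096 bytes, which is consistent with the map table size quoted above.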

Interrupts signal a need for servicing an external device, or the timers. Exceptions signal errors in the executing program. Three interrupt input lines are provided, and the two timers can have time-out interrupts enabled. Seven levels of priority are allowed. A non-maskable interrupt is supported. Interrupt latency is usually one cycle, since most instructions complete before the interrupt is serviced. Long instructions such as divide are interruptible. Exception conditions include division by zero, illegal instruction, privilege violation, and read/write faults.

Interrupts are handled and prioritized by the 92011 System Management chip. The 92011 has 8 programmable input pins that it vectors to the 5 interrupt pins of the Hobbit. Interrupt mask registers are provided.

The Hobbit is a unique blend of CISC-like and RISC-like features. Like RISC machines, it has a small (44), hardwired instruction repertoire. It is pipelined, in spite of having variable length instructions. It achieves single cycle execution or better, with branch folding. The Hobbit is not a load-store architecture, but rather a memory-memory architecture. Also, unlike most RISC machines, the Hobbit has multiple operand addressing modes. It achieves CISC-like high code density, by using the memory-memory architecture to optimize the instructions/task term of the performance equation. Finally, it uses an on-chip stack cache to minimize wait time for operands.
