Assembly Language And Addressing Modes

Posted by: repair  :  Category: Basic Computer Architecture

With the hardware ready, a computer requires software to make it more than an
inactive collection of components. Microprocessors fetch instructions from program
memory, each consisting of an opcode and, optionally, additional operands following
the opcode. These opcodes are binary data that are easy for the microprocessor to
decode, but they are not very readable by a person. To enable a programmer to
more easily write software, an instruction representation called assembly language
was developed. Assembly language is a low-level language that directly represents
each binary opcode with a human-readable text mnemonic. For example, the
mnemonic for an unconditional branch-to-subroutine instruction could be BSR.
In contrast, a high-level language such as C++ or Java contains more complex
logical expressions that may be automatically converted by a compiler to dozens
of microprocessor instructions. Assembly language programs are assembled,
rather than compiled, into opcodes by directly translating each mnemonic into
its binary equivalent.

Assembly language also makes programming easier by enabling the usage of
text labels in place of hard-coded addresses. A subroutine can be named FOO,
and when BSR FOO is encountered by the assembler, a suitable branch target
address will be automatically calculated in place of the label FOO. Each type of
assembler requires a slightly different format and syntax, but there are general
assembly language conventions that enable a programmer to quickly adapt to
specific implementations once the basics are understood. An assembly language
program listing usually has three columns of text followed by an optional comment
column as shown in Fig. 3.14. The first column is for labels that are placeholders
for addresses to be resolved by the assembler. Instruction mnemonics are located
in the second column. The third column is for instruction operands.

This listing uses the Motorola 6800 family's assembly language format. Though
developed in the 1970s, 68xx microprocessors are still used today in embedded
applications such as automobiles and industrial automation. The first line of this
listing is not an instruction, but an assembler directive that tells the assembler to
locate the program at memory location $100. When assembled, the listing is
converted into a memory dump that lists a range of memory addresses and
their corresponding contents: opcodes and operands. Assembler directives
are often indicated with a period prefix. The program in Fig. 3.14 is very simple:
it counts to 30 ($1E) and then sends the 'Z' character out the serial port. It
continues in an infinite loop by returning to the start of the program when the
serial port routine has completed its task. The subroutine to handle the serial
port is not shown and is referenced with the SEND_CHAR label. The program
begins by clearing accumulator A (the 6800 has two accumulators: ACCA and
ACCB). It then enters an incrementing loop where the accumulator is incremented
and then compared against the terminal count value, $1E. The # prefix tells the
assembler to use the literal value $1E for the comparison. Other alternatives are
possible and will soon be discussed. If ACCA is unequal to $1E, the
microprocessor goes back to increment ACCA. If equal, the accumulator is
loaded with the ASCII character to be transmitted, also a literal operand.
The assumption here is that the SEND_CHAR subroutine transmits whatever
is in ACCA. When the subroutine finishes, the program starts over with the
branch-always instruction.
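
Fig. 3.14 is not reproduced here, so the following Python sketch rebuilds the program from the description above and assembles it in two passes: the first pass assigns an address to each label, and the second emits opcodes and resolved operands. The opcode values are the 6800 encodings for the addressing modes used. The BEGIN label, the use of JSR to call the subroutine, and the $0200 address for SEND_CHAR are assumptions made only for illustration.

```python
# Minimal two-pass assembler sketch for the counting program described above.
# Assumptions (not from the original listing): the BEGIN label, JSR as the
# call instruction, and SEND_CHAR located at $0200.

OPCODES = {
    "CLRA": (0x4F, "inherent"),
    "INCA": (0x4C, "inherent"),
    "CMPA": (0x81, "immediate"),
    "LDAA": (0x86, "immediate"),
    "BNE":  (0x26, "relative"),
    "BRA":  (0x20, "relative"),
    "JSR":  (0xBD, "extended"),
}
SIZES = {"inherent": 1, "immediate": 2, "relative": 2, "extended": 3}

# (label, mnemonic, operand) triples, mirroring the three listing columns.
PROGRAM = [
    ("BEGIN",    "CLRA", None),
    ("INC_LOOP", "INCA", None),
    (None,       "CMPA", 0x1E),
    (None,       "BNE",  "INC_LOOP"),
    (None,       "LDAA", ord("Z")),
    (None,       "JSR",  "SEND_CHAR"),
    (None,       "BRA",  "BEGIN"),
]

def assemble(program, origin, externals):
    # Pass 1: assign an address to every label.
    symbols = dict(externals)
    addr = origin
    for label, mnemonic, _ in program:
        if label:
            symbols[label] = addr
        addr += SIZES[OPCODES[mnemonic][1]]
    # Pass 2: emit opcodes and resolved operands.
    image = bytearray()
    addr = origin
    for _, mnemonic, operand in program:
        opcode, mode = OPCODES[mnemonic]
        next_pc = addr + SIZES[mode]
        image.append(opcode)
        if mode == "immediate":
            image.append(operand)
        elif mode == "relative":
            offset = symbols[operand] - next_pc   # signed delta from the PC
            if not -128 <= offset <= 127:
                raise ValueError("branch out of range")
            image.append(offset & 0xFF)
        elif mode == "extended":
            target = symbols[operand]
            image += bytes([target >> 8, target & 0xFF])
        addr = next_pc
    return bytes(image)

image = assemble(PROGRAM, 0x100, {"SEND_CHAR": 0x0200})
print(" ".join(f"{b:02X}" for b in image))
```

Under these assumptions the BNE operand assembles to $FB, which is -5 in two's complement: the distance from the byte after the BNE back to INC_LOOP.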

Each of the instructions in the preceding program contains at least one operand.
CLRA and INCA have only one operand: ACCA. CMPA and LDAA each have
two operands: ACCA and associated data. Complex microprocessors may
reference three or more operands in a single instruction. Some instructions can
reference different types of operands according to the requirements of the program
being implemented. Both CMPA and LDAA reference literal operands in this
example, but a programmer cannot always specify a predetermined literal data
value directly in the instruction sequence. Operands can be referenced in a
variety of manners, called addressing modes, depending on the type of instruction
and the type of operand. Some types of instructions inherently use only one
addressing mode, and some types have multiple modes. The manners of
referencing operands can be categorized into six basic addressing modes: implied,
immediate, direct, relative, indirect, and indexed. To fully understand how a
microprocessor works, and to efficiently utilize an instruction set, it is
necessary to explore the various mechanisms used to reference data.


Implied addressing specifies the operand of an instruction as an inherent property
of that instruction. For example, CLRA implies the accumulator by definition.
No additional addressing information following the opcode is needed.


FIGURE 3.14 Typical assembly language listing.



Immediate addressing places an operand's value literally into the instruction
sequence. LDAA #'Z' has its primary operand immediately available following the
opcode. An immediate operand is indicated with the # prefix in some assembly
languages. Eight-bit microprocessors with eight-bit instruction words cannot fit an
immediate value into the instruction word itself and, therefore, require that an
extra byte following the opcode be used to specify the immediate value. More
powerful 32-bit microprocessors can often fit a 16-bit or 24-bit immediate value
within the instruction word. This saves an additional memory fetch to obtain the
operand.

Direct addressing places the address of an operand directly into the instruction
sequence. Instead of specifying LDAA #'Z', the programmer could specify
LDAA $1234. This version of the instruction would tell the microprocessor to
read memory location $1234 and load the resulting value into the accumulator.
The operand is directly available by looking into the memory address specified
just following the instruction. Direct addressing is useful when there is a need to
read a fixed memory location. Usage of the direct addressing mode has a slightly
different impact on various microprocessors. A typical 8-bit microprocessor has a
16-bit address space, meaning that two bytes following the opcode are necessary
to represent a direct address. The 8-bit microprocessor will have to perform two
additional 8-bit fetch operations to load the direct address. A typical 32-bit
microprocessor has a 32-bit address space, meaning that 4 bytes following the
opcode are necessary. If the 32-bit microprocessor has a 32-bit data bus, only
one additional 32-bit fetch operation is required to load the direct address.
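
The difference between immediate and direct addressing on an 8-bit bus can be sketched in Python as follows. Opcodes $86 (LDAA immediate) and $B6 (LDAA with a 16-bit address) are 6800 encodings; the memory contents and the run_load helper are hypothetical and exist only to illustrate the fetch sequence.

```python
# Sketch of one load instruction executing on an 8-bit bus.
# The memory image below is made-up sample data.

def run_load(memory, pc):
    """Execute one LDAA at pc.

    Returns (acca, next_pc, operand_fetches), where operand_fetches counts
    the extra bytes read after the opcode to form the operand or its address.
    """
    opcode = memory[pc]
    if opcode == 0x86:                      # immediate: value follows opcode
        return memory[pc + 1], pc + 2, 1
    if opcode == 0xB6:                      # direct: 16-bit address follows
        address = (memory[pc + 1] << 8) | memory[pc + 2]
        return memory[address], pc + 3, 2   # two fetches just for the address
    raise ValueError(f"unhandled opcode ${opcode:02X}")

memory = {
    0x0100: 0x86, 0x0101: 0x5A,                 # LDAA #'Z'
    0x0102: 0xB6, 0x0103: 0x12, 0x0104: 0x34,   # LDAA $1234
    0x1234: 0x7F,                               # sample data byte
}

print(run_load(memory, 0x0100))
print(run_load(memory, 0x0102))
```

The direct form needs two operand fetches before the data read can even begin, which is the extra cost described above for an 8-bit microprocessor with a 16-bit address space.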


Relative addressing places an operand's relative address into the instruction
sequence. A relative address is expressed as a signed offset relative to the current
value of the PC. Relative addressing is often used by branch instructions, because
the target of a branch is usually within a short distance of the PC, or current
instruction. For example, BNE INC_LOOP results in a branch-if-not-equal
backward by two instructions. The assembler automatically resolves the addresses
and calculates a relative offset to be placed following the BNE opcode. This
relative operation is performed by adding the offset to the PC. The new PC value
is then used to resume the instruction fetch and execution process. Relative
addressing can utilize both positive and negative deltas that are applied to the
PC. A microprocessor's instruction format constrains the relative range that can
be specified in this addressing mode. For example, most 8-bit microprocessors
provide only an 8-bit signed field for relative branches, indicating a range
of +127/-128 bytes. The relative delta value is stored into its own byte just
after the opcode. Many 32-bit microprocessors allow a 16-bit delta field and
are able to fit this value into the 32-bit instruction word, enabling the entire
instruction to be fetched in a single memory read. Limiting the range of a
relative operation is generally not an excessive constraint because of
software's locality property. Locality in this context means that the set of
instructions involved in performing a specific task are generally relatively
close together in memory. The locality property covers the great majority of
branch instructions. For those few branches that have their targets outside of
the allowed relative range, it is necessary to perform a short relative branch to
a long jump instruction that specifies a direct address. This reduces the
efficiency of the microprocessor by having to perform two branches when
only one is ideally desired, but the overall efficiency of saving extra memory
accesses for the majority of short branches is worth the trade-off.
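
The offset calculation an assembler performs, and the corresponding PC update in the processor, might look like the following Python sketch. It assumes a two-byte branch instruction whose offset is measured from the address of the next instruction, as on the 6800; the function names and the sample addresses are hypothetical.

```python
def encode_offset(branch_addr, target_addr, instr_len=2):
    """Return the 8-bit two's-complement offset byte for a relative branch."""
    offset = target_addr - (branch_addr + instr_len)
    if not -128 <= offset <= 127:
        raise ValueError("target out of range; use a long jump instead")
    return offset & 0xFF

def branch_target(branch_addr, offset_byte, instr_len=2):
    """Apply a stored offset byte to the PC, as the processor would."""
    signed = offset_byte - 256 if offset_byte & 0x80 else offset_byte
    return (branch_addr + instr_len + signed) & 0xFFFF

# A branch at $0104 back to $0101 (addresses chosen for illustration).
offset = encode_offset(0x0104, 0x0101)
print(f"offset byte = ${offset:02X}")
print(f"new PC      = ${branch_target(0x0104, offset):04X}")
```

A target more than 127 bytes ahead or 128 bytes behind raises an error here; a real assembler would report the problem so that the programmer can substitute the branch-to-jump sequence described above.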

Indirect addressing specifies an operand's direct address as a value contained
in another register. The other register becomes a pointer to the desired data. For
example, a microprocessor with two accumulators can load ACCA with the
value that is at the address in ACCB. LDAA (ACCB) would tell the
microprocessor to put the value of accumulator B onto the address bus,
perform a read, and put the returned value into accumulator A. Indirect
addressing allows writing software routines that operate on data at different
addresses. If a programmer wants to read or write an arbitrary entry in a data
table, the software can load the address of that entry into a microprocessor
register and then perform an indirect access using that register as a pointer.
Some microprocessors place constraints on which registers can be used as
references for indirect addressing. In the case of a 6800 microprocessor,
LDAA (ACCB) is not actually a supported operation but serves as a
syntactical example for purposes of discussion.
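
A conceptual Python model of indirect addressing is shown below. It is not 6800 code: the 256-byte memory, the sample values, and both function names are assumptions chosen to show one routine operating on whatever address the pointer register happens to hold.

```python
# Conceptual model: the operand's address comes from a register, not from
# the instruction stream. Memory size and contents are sample values.

memory = bytearray(256)
memory[0x40] = 7
memory[0x80] = 41

def load_indirect(mem, pointer):
    """Model LDAA (pointer): put the pointer on the address bus and read."""
    return mem[pointer]

def increment_indirect(mem, pointer):
    """Read-modify-write through the pointer; works for any address."""
    mem[pointer] = (load_indirect(mem, pointer) + 1) & 0xFF

# The same routine serves two unrelated locations.
for pointer in (0x40, 0x80):
    increment_indirect(memory, pointer)

print(memory[0x40], memory[0x80])
```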

Indexed addressing is a close relative (no pun intended) of indirect addressing,
because it also refers to an address contained in another register. However,
indexed addressing also specifies an offset, or index, to be added to that
register base value to generate the final operand address:

base + offset = final address.


Some microprocessors allow general accumulator registers to be used as
base-address registers, but others, such as the 6800, provide special index
registers for this purpose. In many 8-bit microprocessors, a full 16-bit address
cannot be obtained from an 8-bit accumulator serving as the base address.
Therefore, one or more separate index registers are present for the purpose
of indexed addressing. In contrast, many 32-bit microprocessors are able to
specify a full 32-bit address with any general-purpose register and place no
limitations on which register serves as the index register. Indexed addressing
builds upon the capabilities of indirect addressing by enabling multiple address
offsets to be referenced from the same base address. LDAA (X+$20) would
tell the microprocessor to add $20 to the index register, X, and use the
resulting address to fetch data to be loaded into ACCA. One simple example
of using indexed addressing is a subroutine to add a set of four numbers
located at an arbitrary location in memory. Before calling the subroutine, the
main program can set an index register to point to the table of numbers. Within
the subroutine, four individual addition instructions use the indexed addressing
mode to add the locations X+0, X+1, X+2, and X+3. When so written, the
subroutine is flexible enough to be used for any such set of numbers. Because
of the similarity of indexed and indirect addressing, some microprocessors
merge them into a single mode and obtain indirect addressing by performing
indexed addressing with an index value of zero.
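
The four-number subroutine described above can be modeled in Python as a rough sketch. The table addresses and values are made up, and a loop stands in for the four individual addition instructions; the result is also kept at full width rather than wrapping as an 8-bit accumulator would.

```python
def load_indexed(mem, x, offset):
    """Model LDAA (X+offset): base register plus offset gives the address."""
    return mem[(x + offset) & 0xFFFF]

def sum_four(mem, x):
    """Add the four bytes at X+0 through X+3."""
    total = 0
    for offset in range(4):
        total += load_indexed(mem, x, offset)
    return total

mem = bytearray(0x10000)
mem[0x2000:0x2004] = bytes([1, 2, 3, 4])       # first sample table
mem[0x3000:0x3004] = bytes([10, 20, 30, 40])   # second sample table

print(sum_four(mem, 0x2000))
print(sum_four(mem, 0x3000))
# Indirect addressing falls out as the special case of offset zero.
print(load_indexed(mem, 0x3000, 0))
```

Only the value placed in X changes between the two calls, which is what makes the subroutine reusable for any table.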


The six conceptual addressing modes discussed above represent the various
logical mechanisms that a microprocessor can employ to access data. It is
important to realize that each individual microprocessor applies these addressing
modes differently. Some combine multiple modes into a single mode (e.g.,
indexed and indirect), and some will create multiple submodes out of a single
mode. The exact variation depends on the specifics of an individual
microprocessor's architecture.

With the various addressing modes modifying the specific opcode and operands
that are presented to the microprocessor, the benefits of using assembly language
over direct binary values can be observed. The programmer does not have to
worry about calculating branch target addresses or resolving different addressing
modes. Each mnemonic can map to several unique opcodes, depending on
the addressing mode used. For example, the LDAA instruction in Fig. 3.14 could
easily have used extended addressing by specifying a full 16-bit address at which
the ASCII transmit-value is located. Extended addressing is the 6800's mechanism
for specifying a 16-bit direct address. (The 6800's direct addressing involves only
an eight-bit address.) In either case, the assembler would determine the correct
opcode to represent LDAA and insert the correct binary values into the memory
dump. Additionally, because labels are resolved each time the program is
assembled, small changes to the program can be made that add or remove
instructions and labels, and the assembler will automatically adjust the resulting
addresses accordingly.
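
How one mnemonic maps to several opcodes can be sketched as a small Python encoder for LDAA. The four opcode values are the 6800 encodings for each mode. The operand parsing is a simplified assumption: it accepts only hexadecimal operands, uses the Motorola-style `$20,X` form for indexed addressing, and picks direct over extended purely by the size of the address, whereas a real assembler has more elaborate rules.

```python
LDAA_OPCODES = {
    "immediate": 0x86,
    "direct":    0x96,   # 8-bit address
    "indexed":   0xA6,   # 8-bit offset added to X
    "extended":  0xB6,   # 16-bit address
}

def encode_ldaa(operand):
    """Choose the LDAA opcode from the operand syntax and return the bytes."""
    if operand.startswith("#$"):
        return bytes([LDAA_OPCODES["immediate"], int(operand[2:], 16)])
    if operand.endswith(",X"):
        return bytes([LDAA_OPCODES["indexed"], int(operand[1:-2], 16)])
    address = int(operand[1:], 16)
    if address <= 0xFF:
        return bytes([LDAA_OPCODES["direct"], address])
    return bytes([LDAA_OPCODES["extended"], address >> 8, address & 0xFF])

for text in ("#$5A", "$12", "$1234", "$20,X"):
    print(f"LDAA {text:<6} -> {encode_ldaa(text).hex(' ').upper()}")
```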

Programming in assembly language is different from using a high-level language,
because one must think in smaller steps and have direct knowledge about the
microprocessor's operation and architecture. Assembly language is
processor-specific rather than generic, as a high-level language is. Therefore,
assembly language programming is usually restricted to special cases such as boot
code or routines in which absolute efficiency and performance are demanded.
A skilled programmer can often write more efficient assembly
language than a high-level language compiler can generate. In large programs,
the slight inefficiency of the compiler is well worth the trade-off for ease of
programming in a high-level language. However, time-critical routines such
as I/O drivers or ISRs may benefit from manual assembly language coding.



By : E-book Complete_Digital_Design

Extending The Microprocessor Bus

A microprocessor bus is intended to directly connect to memory and I/O devices
that are in close proximity to the microprocessor. As such, its electrical and
functional properties are suited for relatively short interconnecting wires and
relatively simple device interfaces that respond with data soon after the
microprocessor issues a request. Many computers, however, require some
mechanism to extend the microprocessor bus so that additional hardware,
such as plug-in expansion cards or memory modules, can enhance the system
with new capabilities. Supporting these modular extensions to the computer's
architecture can be relatively simple or quite complex, depending on the
required degree of expandability and the physical distances across which
data must be communicated.

Expansion buses are generally broken into two categories, memory and I/O,
because these groups' respective characteristics are usually quite different.
General-purpose memory is a high-bandwidth resource to which the
microprocessor requires immediate access so that it can maintain a high level
of throughput. Memory is also a predictable and regular structure, both
logically and physically. If more RAM is added to a computer, it is fairly
certain that some known number of chips will be required for a given quantity
of memory. In contrast, I/O by nature is very diverse, and its bandwidth
requirements are usually lower than those of memory. I/O expansion usually
involves cards of differing complexity and architecture as a result of the wide
range of interfaces that can be supported (e.g., disk drive controller versus
serial port controller). Therefore, an I/O expansion bus must be flexible enough
to interface with a varying set of modules, some of which may not have been
conceived of when the computer was first designed.

Memory expansion buses are sometimes direct extensions of the microprocessor
bus. From the preceding 8-bit computer example, the upper 16 kB of memory
could be reserved for future expansion. A provision for future expansion could
be as simple as adding a connector or socket for an extra memory chip. In this
case, no special augmentation of the microprocessor bus is required.

However, in a larger system with more address space, provisions must be made
for more than one additional memory chip. In these situations, a simple buffered
extension of the microprocessor bus may suffice. A buffer, in this context,
is an IC that passes data from one set of pins to another, thereby
electrically separating two sections of a bus. As shown in Fig. 3.12, a buffer
can extend a microprocessor bus so that its logical functionality remains
unchanged, but its electrical characteristics are enhanced to provide connectivity
across a greater distance (to a multichip memory expansion module).
A unidirectional address buffer extends the address bus from the microprocessor
to expansion memory devices. A bidirectional data buffer extends the bus away
from the microprocessor on writes and toward the microprocessor on reads.
The direction of the data buffer is controlled according to the state of read/write
enable signals generated by the microprocessor.
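
The direction control of the two buffers can be expressed as a simple behavioral model in Python. This is a logic-level sketch only: the function names and the separate read and write enable arguments are assumptions, and real buffer ICs have their own enable and direction pin conventions.

```python
def address_buffer(cpu_address):
    """Unidirectional: the address always flows from CPU to memory."""
    return cpu_address

def data_buffer(cpu_value, mem_value, read_en, write_en):
    """Return (cpu_side, mem_side) of the data bus after the buffer acts."""
    if read_en and write_en:
        raise ValueError("read and write enables asserted together")
    if write_en:
        return cpu_value, cpu_value   # drive away from the microprocessor
    if read_en:
        return mem_value, mem_value   # drive toward the microprocessor
    return cpu_value, mem_value       # disabled: the two sides stay isolated

print(data_buffer(0xA5, None, read_en=False, write_en=True))
print(data_buffer(None, 0x3C, read_en=True, write_en=False))
```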

More complex memory structures may contain dedicated memory control logic
that sits between the microprocessor and the actual memory devices. Expanding
such a memory architecture is generally accomplished by augmenting the
back-side memory device bus as shown in Fig. 3.13 rather than by adding
additional controllers onto an extended microprocessor bus. Such an
expansion scheme may or may not require buffers, depending on the electrical
characteristics of the bus in question.

I/O buses may also be direct extensions of the microprocessor bus. The original
expansion bus in the IBM PC, developed in the early 1980s, is essentially an
extended Intel 8088 microprocessor bus that came to be known as the Industry
Standard Architecture (ISA) bus. Each I/O card on the ISA bus is mapped in a
unique address range in the microprocessor's memory. Therefore, when software
wants to read or write a register on an I/O card, it simply performs an access to
the desired location. The ISA bus added a few features beyond the raw 8088 bus,
including DMA and variable wait states for slow I/O devices. A wait state results
when a device cannot immediately respond to the microprocessor's request and
asserts a signal to stretch the access so that it can respond properly.


FIGURE 3.12 Buffered microprocessor bus for memory expansion.


FIGURE 3.13 Extended memory controller bus.


Direct extensions such as the ISA bus are fairly easy to implement and serve well
in applications where I/O response time does not unduly restrict microprocessor
throughput. As computers have gotten faster, the throughput of microprocessors
has rapidly outstripped the response times of all but the fastest I/O devices. In
comparison to a modern microprocessor, a hard-disk controller is rather slow,
with response times measured in microseconds rather than nanoseconds.
Additionally, as bus signals become faster, the permissible length of interconnecting
wires decreases, limiting their expandability. These and other characteristics
motivate the decoupling of the microprocessor's local bus from the computer's I/O bus.

An I/O bus can be decoupled from the microprocessor bus by inserting an
intermediate bus controller between them that serves as an interface, or translator,
between the two buses. Once the buses are separated, activity on one bus does
not necessarily obstruct activity on the other. If the microprocessor wants to
write a block of data to a slow device, it can rapidly transfer that data to the
bus controller and then continue with other operations at full speed while the
controller slowly transfers the data to the I/O device. This mechanism is called
a posted-write, because the bus controller allows the microprocessor to
complete, or post, its write before the write actually completes. Separate buses
also open up the possibility of multiple microprocessors or logic elements
performing I/O operations without conflicting with the central microprocessor.
In a multimaster system, a specialized DMA controller can transfer data
between two peripherals such as disk controllers while the microprocessor
goes about its normal business.
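
The posted-write mechanism can be illustrated with a small queue model in Python. The BusController class, its buffer depth of four entries, and the addresses used are all hypothetical; the point is only that the microprocessor's write is accepted at once while the slow transfer completes later.

```python
from collections import deque

class BusController:
    """Toy bus controller that posts CPU writes into a small buffer."""

    def __init__(self, depth=4):
        self.depth = depth
        self.queue = deque()
        self.device = {}              # stands in for slow I/O registers

    def post_write(self, address, value):
        """Accept a write from the CPU; False means the CPU must wait."""
        if len(self.queue) >= self.depth:
            return False
        self.queue.append((address, value))
        return True

    def io_cycle(self):
        """Complete one pending write on the slow I/O bus."""
        if self.queue:
            address, value = self.queue.popleft()
            self.device[address] = value

ctrl = BusController(depth=4)
accepted = [ctrl.post_write(0x300 + i, i) for i in range(3)]
pending_before = len(ctrl.queue)      # CPU is already free to move on
while ctrl.queue:
    ctrl.io_cycle()                   # controller drains at I/O speed
print(accepted, pending_before, ctrl.device)
```

When the buffer fills, post_write returns False, modeling the case in which the microprocessor must stall until the controller catches up.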

The Peripheral Component Interconnect (PCI) bus is the industry-standard follow-on
to the ISA bus, and it implements such advanced features as posted writes,
multiple masters, and multiple bus segments. Each PCI bus segment is separated
from the others via a PCI bridge chip. Only traffic that must travel between buses
crosses a bridge, thereby reducing congestion on individual PCI bus segments.
One segment can be involved in a data transfer between two devices without
affecting a simultaneous transfer between two other devices on a different segment.
These performance-enhancing features do not come for free, however. Their cost is
manifested by the need for dedicated PCI control logic in bridge chips and in the I/O
devices themselves. It is generally simpler to implement an I/O device that is
directly mapped into the microprocessor's memory space, but the overall
performance of the computer may suffer under demanding applications.




Direct Memory Access

Transferring data from one region of memory to another is a common task
performed within a computer. Incoming data may be transferred from a serial
communications controller into memory, and outgoing data may be transferred
from memory to the controller. Memory-to-memory transfers are common,
too, as data structures are moved between subprograms, each of which may
have separate regions of memory set aside for its private use. The speed with
which memory is transferred normally depends on the time that the
microprocessor takes to perform successive read and write operations.
Each byte transferred requires several microprocessor operations: load
accumulator, store accumulator, update address for next byte, and check
if there is more data. Instead of simply moving a stream of bytes without
interruption, the microprocessor is occupied mostly by the overhead of
calculating new addresses and checking to see if more data is waiting.
Computers that perform a high volume of memory transfers may exhibit
performance bottlenecks as a result of the overhead of having the
microprocessor spend too much of its time reading and writing memory.
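
The per-byte overhead of a software copy can be made concrete with a short Python sketch. It tallies the four operations named above for every byte moved; the function name is hypothetical, and on a real microprocessor each operation may itself take several bus cycles.

```python
def cpu_copy(memory, src, dst, count):
    """Copy count bytes as a load/store loop would; return operations used."""
    operations = 0
    for i in range(count):
        accumulator = memory[src + i]    # 1. load accumulator
        memory[dst + i] = accumulator    # 2. store accumulator
        # 3. update address for next byte, 4. check if there is more data
        operations += 4
    return operations

memory = bytearray(range(32)) + bytearray(32)
operations = cpu_copy(memory, src=0, dst=32, count=16)
print(operations)
```

Only two of the four operations per byte actually touch the data; the rest is bookkeeping.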

Memory transfer performance can be improved using a technique called direct
memory access, or DMA. DMA logic intercedes at the microprocessor's request
to directly move data between a source and destination. A DMA controller
(DMAC) sits on the microprocessor bus and contains logic that is specifically
designed to rapidly move data without the overhead of simultaneously fetching
and decoding instructions. When the microprocessor determines that a block
of data is ready to move, it programs the DMAC with the starting address of the
source data, the number of bytes to move, and the starting address of the
destination data. When the DMAC is triggered, the microprocessor temporarily
relinquishes control of its bus so the DMAC can take over and quickly move the
data. The DMAC serves as a surrogate processor by directly generating
addresses and reading and writing data. From the microprocessor bus perspective,
nothing has changed, and data transfers proceed normally despite being controlled
by the DMAC rather than the microprocessor. Figure 3.11 shows the basic
internal structure of a DMAC.
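
The programming sequence described above (source address, byte count, destination address, then trigger) can be sketched as a toy model in Python. The Dmac class and its method names are hypothetical and do not correspond to any particular controller's register set.

```python
class Dmac:
    """Toy DMA controller with source, destination, and count registers."""

    def __init__(self, memory):
        self.memory = memory
        self.source = 0
        self.destination = 0
        self.count = 0

    def program(self, source, destination, count):
        # The microprocessor writes these registers before triggering.
        self.source = source
        self.destination = destination
        self.count = count

    def trigger(self):
        """Move the block while holding the bus; return bytes moved."""
        moved = 0
        while self.count > 0:
            self.memory[self.destination] = self.memory[self.source]
            self.source += 1
            self.destination += 1
            self.count -= 1
            moved += 1
        return moved

memory = bytearray(64)
memory[0:8] = b"DMA DATA"
dmac = Dmac(memory)
dmac.program(source=0, destination=32, count=8)
moved = dmac.trigger()
print(moved, bytes(memory[32:40]))
```

The address updates and count check happen inside the controller's own logic, so no instructions are fetched or decoded while the block moves.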

A DMA transfer can be initiated by either the microprocessor or an I/O device
that contains logic to assert a request to the DMAC. DMA transfers are generally
broken into two categories: peripheral/memory and memory/memory. Peripheral/
memory transfers move data to a peripheral or retrieve data from a peripheral.
A peripheral/memory transfer can be triggered by a DMA-aware I/O device
when it is ready to accept more outgoing data or when incoming data has arrived.
These are called single-address transfers, because the DMAC typically controls
only a single address: that of the memory side of the transfer. The peripheral
address is typically a fixed offset into its register set and is asserted by supporting
control logic that assists in the connectivity between the peripheral and the
DMAC.

DMA transfers do not have to be continuous, and they are often not in the case
of a peripheral transfer. If the microprocessor sets up a DMA transfer from a
serial communications controller to memory, it programs the DMAC to write a
certain quantity of data into memory. However, the transfer does not begin
until the serial controller asserts a DMA request indicating that data is ready.

When this request occurs, the DMAC arbitrates for access to the microprocessor
bus by asserting a bus request. Some time later, the microprocessor or its support
logic will grant the bus to the DMAC and temporarily pause the microprocessor's
bus activity. The DMAC can then transfer a single unit of data from the serial
controller into memory. The unit of data transfer may be any number of bytes. When
finished, the DMAC relinquishes control of the bus back to the microprocessor.

Memory/memory transfers move data from one region in memory to another.
These are called dual-address transfers, because the DMAC controls two
addresses into memory: source and destination. Memory/memory transfers
are triggered by the microprocessor and can execute continuously, because
the data block to be moved is ready and waiting in memory.

Even when DMA transfers execute one byte at a time, they are still more
efficient than the microprocessor, because the DMAC is capable of transferring
a byte or word (per the microprocessor's data bus width) in a single bus cycle
rather than using the microprocessor's load/store mechanism with its additional
overhead. There is some initial overhead in setting up the DMA transfer, so it is not efficient
to use DMA for very short transfers. If the microprocessor needs to move only
a few bytes, it should probably do so on its own. However, the DMAC
initialization overhead is more than compensated for if dozens or hundreds
of bytes are being moved.
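
The trade-off between setup overhead and per-byte savings can be estimated with a simple cost model, sketched below in Python. Every figure in the example call (20 units of setup, 4 units per byte for the microprocessor, 1 unit per byte for the DMAC) is an assumption chosen for illustration, not a measurement of any real system.

```python
def break_even_bytes(setup_cost, cpu_cost_per_byte, dma_cost_per_byte):
    """Smallest transfer size at which DMA is cheaper than a CPU copy loop.

    Returns None if DMA is never cheaper under the given costs.
    """
    if cpu_cost_per_byte <= dma_cost_per_byte:
        return None
    n = 1
    while setup_cost + n * dma_cost_per_byte >= n * cpu_cost_per_byte:
        n += 1
    return n

# Assumed costs, in arbitrary bus-cycle units.
print(break_even_bytes(setup_cost=20, cpu_cost_per_byte=4, dma_cost_per_byte=1))
```

Under these assumed costs DMA wins from seven bytes upward; with a larger setup cost the threshold moves higher, which matches the advice to handle very short transfers in software.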


FIGURE 3.11 DMA controller block diagram.



A typical DMAC supports multiple channels, each of which controls a different
DMA transfer. While only one transfer can execute at any given moment, multiple
transfers can be interleaved to prevent one peripheral from being starved for
data while another is being serviced. Because a typical peripheral transfer is not
continuous, a separate DMA channel can be assigned to each active peripheral.

A DMAC can have one channel configured to load incoming data from a serial
controller, another to store data to a disk drive controller, and a third to move data
from one region of memory to another. Once the channels are initialized by the
microprocessor, the exact order and interleaving of multiple channels are resolved by
the individual DMA request signals and any priority information stored in the DMAC.
When a DMAC channel has completed transferring the requested quantity of data,
the DMAC asserts an interrupt to the microprocessor to signal that the data has
been moved. At this point, the microprocessor can restart a new DMA transfer
if desired and invoke any necessary routines to process data that has been moved.
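
Channel interleaving and completion interrupts can be modeled with the Python sketch below. It assumes a fixed-priority scheme in which the lowest priority number wins each bus slot; the channel names, transfer counts, and request pattern are invented for the example, and real DMACs may arbitrate differently.

```python
class Channel:
    def __init__(self, name, count, priority):
        self.name = name
        self.remaining = count        # units of data still to transfer
        self.priority = priority      # lower number = higher priority

def service(channels, requests):
    """Arbitrate one transfer per bus slot.

    requests is a list of sets naming the channels requesting in each slot.
    Returns (order of channels serviced, completion interrupts raised).
    """
    by_name = {ch.name: ch for ch in channels}
    order, interrupts = [], []
    for slot in requests:
        ready = [by_name[n] for n in slot if by_name[n].remaining > 0]
        if not ready:
            order.append(None)        # bus stays with the microprocessor
            continue
        winner = min(ready, key=lambda ch: ch.priority)
        winner.remaining -= 1
        order.append(winner.name)
        if winner.remaining == 0:
            interrupts.append(winner.name)   # signal the microprocessor
    return order, interrupts

channels = [Channel("serial", 2, 0), Channel("disk", 2, 1), Channel("mem", 3, 2)]
requests = [{"mem"}, {"serial", "mem"}, {"disk", "mem"}, {"disk", "mem"},
            {"serial", "mem"}, {"mem"}, {"mem"}]
order, interrupts = service(channels, requests)
print(order)
print(interrupts)
```

The memory-to-memory channel requests continuously but yields whenever a peripheral channel asserts its request, so neither peripheral is starved.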

External DMA support logic may be necessary, depending on the specific DMAC,
microprocessor, and peripherals that are being used. Some microprocessors contain
built-in DMAC arbitration logic. Some peripherals contain built-in DMA request logic,
because they are specifically designed for these high-efficiency memory transfers.
Custom arbitration logic typically functions by waiting for the DMAC to request the
bus and then pausing the microprocessor's bus transfers until the DMAC relinquishes
the bus. This pause operation is performed according to the specifications of the
particular microprocessor. Custom peripheral control logic can include DMAC
read/write interface logic to assert the correct peripheral address when a transfer
begins and perform any other required mapping between the DMAC's transfer
enable signaling and the peripheral's read/write interface.


