PTLsim uop Reference
The following sections document the semantics and encoding of each
micro-operation (uop) supported by the PTLsim processor core. The
opinfo[] table in ptlhwdef.cpp and
constants in ptlhwdef.h give actual numerical values
for the opcodes and other fields described below.
Merging Rules
| Mnemonic |
Syntax |
Operation
op |
Merging Rules:
The x86 compatible ALUs implement operations on 1, 2, 4 or
8 byte quantities. Unless otherwise indicated, all operations take
a 2-bit size shift field (sz) used to determine
the effective size in bytes of the operation as follows:
- sz = 0: Low byte of rd
is set to the 8-bit result; high 7 bytes of rd
are set to corresponding bytes of ra.
- sz = 1: Low two bytes of rd
are set to the 16-bit result; high 6 bytes of rd
are set to corresponding bytes of ra.
- sz = 2: Low four bytes of rd
are set to the 32-bit result; high 4 bytes of rd
are cleared to zero in accordance with x86-64 zero extension
semantics. The ra operand is unused and should
be REG_zero.
- sz = 3: All 8 bytes of rd
are set to the 64-bit result. ra is
unused and should be REG_zero.
Flags are calculated based on the sz-byte
value produced by the ALU, not the final 64-bit result in rd.
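The merging rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name merge and the use of plain integers for register values are hypothetical and are not taken from the PTLsim sources.

```python
MASK64 = (1 << 64) - 1

def merge(ra: int, result: int, sz: int) -> int:
    """Form the 64-bit rd value from an ALU result, following the sz rules."""
    nbytes = 1 << sz                       # sz = 0..3 -> 1, 2, 4 or 8 bytes
    low_mask = (1 << (8 * nbytes)) - 1
    low = result & low_mask
    if sz >= 2:
        # 32-bit results are zero extended and 64-bit results fill rd;
        # ra is unused in both cases.
        return low
    # 8-bit and 16-bit results keep the remaining high bytes of ra.
    return (ra & MASK64 & ~low_mask) | low
```

For example, merge(0x1122334455667788, 0xAB, 0) keeps the upper seven bytes of ra and replaces only the low byte.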
Other Pseudo-Operators
The descriptions in this reference use various pseudo-operators
to describe the semantics of each uop. These operators are described
below.
EvalFlags(ra)
The EvalFlags pseudo-operator evaluates
the ZAPS, CF, OF flags attached to the source operand ra
in accordance with the type of condition code evaluation specified
by the uop. The operator returns 1 if the evaluation is true; otherwise
0 is returned.
SignExt(ra, N)
The SignExt operator sign extends
the ra operand by the number of bits specified by N. Specifically,
bit ra[N] is copied to all high order bits
from bit 63 down to bit N. If N is not specified,
it is assumed to mean the number of bits in the effective size of
the uop's result (as described under Merging Rules).
MergeWithSFR(mem, sfr)
The MergeWithSFR pseudo-operator
is described in the reference page for load uops.
MergeAlign(mem, sfr)
The MergeAlign pseudo-operator is
described in the reference page for load uops.
mov and or xor andnot ornot nand
nor eqv
Logical Operations
| Mnemonic |
Syntax |
Operation
mov |
Notes:
- All operations merge the ALU result with ra
and generate flags in accordance with the standard x86 merging
rules described previously.
add sub addadd addsub subadd subsub
addm subm addc subc
Add and Subtract
| Mnemonic |
Syntax |
Operation
add |
Notes:
- All operations merge the ALU result with ra
and generate flags in accordance with the standard x86 merging
rules described previously.
- The adda and adds
uops are useful for small shifts and x86 three-operand LEA-style
address generation.
- The addc and subc
uops use only the carry flag field of their rc operand; the value
is unused.
- The addm and subm
uops mask the result by the immediate in rc.
They are used in microcode for modular stack arithmetic.
sel
Conditional Select
| Mnemonic |
Syntax |
Operation
sel.cc |
Notes:
- cc is any valid condition code flag
evaluation
- The sel uop merges the selected operand
with ra in accordance with the standard x86
merging rules described previously
- The 64-bit result and all flags are treated as a single value
for selection purposes, i.e. the flags attached to the selected input
are passed to the output
- If one of the (ra, rb) operands is not valid (has FLAG_INV
set) but the selected operand is valid, the result is valid. This
is an exception to the invalid bit propagation rule only when the
selected input is valid. If the rc operand
is invalid, the result is always invalid.
- If any of the inputs are waiting (FLAG_WAIT
is set), the uop does not issue, even if the selected input was ready.
This is a pipeline simplification.
- set rd = (a),b
- sel rd = b,0,1,c
set
Conditional Set
| Mnemonic |
Syntax |
Operation
set.cc |
Notes:
- cc is any valid condition code flag
evaluation
- The value 0 or 1 is zero extended to the operation size and
merged with rb in accordance with the standard
x86 merging rules described previously (except that set
uses rb as the merge target instead of ra)
- Flags attached to ra (condition code)
are passed through to the output
set.sub set.and
Conditional Compare and Set
| Mnemonic |
Syntax |
Operation
set.sub.cc |
Notes:
- The set.sub and set.and
uops take the place of a sub or and
uop immediately consumed by a set uop; this
is intended to shorten the critical path if uop merging is performed
by the processor
- cc is any valid condition code flag
evaluation
- The value 0 or 1 is zero extended to the operation size and
then merged with rc in accordance with the
standard x86 merging rules described previously (except that set.sub
and set.and use rc as
the merge target instead of ra)
- Flags generated as the result of the comparison are passed
through with the result
br
Conditional Branch
| Mnemonic |
Syntax |
Operation
br.cc |
Notes:
- cc is any valid condition code flag
evaluation
- The rip (user-visible instruction
pointer register) is reset to one of two immediates. If the flags
evaluation is true, the riptaken immediate
is selected; otherwise the ripseq immediate
is selected.
- If the flag evaluation is false (i.e., ripseq is selected),
the BranchMispredict internal exception is
raised. The processor should annul all uops after the branch and restart
fetching at the RIP specified by the result (in this case, ripseq).
- Branches are always assumed to be taken. If the branch is
predicted as not taken (i.e. future uops come from the next sequential
RIP after the branch), it is the responsibility of the decoder or
frontend to swap the riptaken and ripseq
immediates and invert the condition of the branch. All condition
encodings can be inverted by inverting bit 0 of the 4-bit condition
specifier.
- The destination register should always be REG_rip;
otherwise this uop is undefined.
- If the target RIP falls within an unmapped page, not present
page or a page marked as no-execute (NX), the PageFaultOnExec
exception is taken.
- No flags are generated by this uop
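The branch rules above can be illustrated with a small Python sketch. It assumes that the 4-bit condition specifier follows the standard x86 Jcc encoding (which is consistent with the statement that inverting bit 0 inverts the condition) and that flags use the x86 bit positions; the function names are hypothetical.

```python
# x86 flag bit positions (assumed to match the flags attached to an operand)
CF, PF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 6, 1 << 7, 1 << 11

def eval_flags(flags: int, cond: int) -> bool:
    """Evaluate a 4-bit condition specifier, assuming the x86 Jcc encoding."""
    cf, pf, zf = bool(flags & CF), bool(flags & PF), bool(flags & ZF)
    sf, of = bool(flags & SF), bool(flags & OF)
    base = [
        of,                  # 0: o    (1: no)
        cf,                  # 2: b    (3: nb)
        zf,                  # 4: e    (5: ne)
        cf or zf,            # 6: be   (7: nbe)
        sf,                  # 8: s    (9: ns)
        pf,                  # 10: p   (11: np)
        sf != of,            # 12: l   (13: nl)
        zf or (sf != of),    # 14: le  (15: nle)
    ][cond >> 1]
    return base != bool(cond & 1)        # bit 0 inverts the condition

def br(flags: int, cond: int, riptaken: int, ripseq: int):
    """Return (new_rip, mispredicted) for a br.cc uop."""
    if eval_flags(flags, cond):
        return riptaken, False
    return ripseq, True                  # BranchMispredict: restart at ripseq

def predict_not_taken(cond: int, riptaken: int, ripseq: int):
    """Re-encode a branch that the frontend predicts as not taken."""
    return cond ^ 1, ripseq, riptaken    # invert condition, swap immediates
```

After predict_not_taken, falling through to the sequential RIP is the "taken" outcome of the re-encoded uop, so no misprediction is signalled when the original condition is false.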
br.sub br.and
Compare and Conditional Branch
| Mnemonic |
Syntax |
Operation
br.sub.cc |
Notes:
- The br.sub and br.and
uops take the place of a sub or and
uop immediately consumed by a br uop; this
is intended to shorten the critical path if uop merging is performed
by the processor
- cc is any valid condition code flag
evaluation
- The rip (user-visible instruction
pointer register) is reset to one of two immediates. If the flags
evaluation is true, the riptaken immediate
is selected; otherwise the ripseq immediate
is selected
- If the flag evaluation is false (i.e., ripseq is selected),
the BranchMispredict internal exception is
raised. The processor should annul all uops after the branch and restart
fetching at the RIP specified by the result (in this case, ripseq)
- Branches are always assumed to be taken. If the branch is
predicted as not taken (i.e. future uops come from the next sequential
RIP after the branch), it is the responsibility of the decoder or
frontend to swap the riptaken and ripseq
immediates and invert the condition of the branch. All condition
encodings can be inverted by inverting bit 0 of the 4-bit condition
specifier.
- The destination register should always be REG_rip;
otherwise this uop is undefined
- If the target RIP falls within an unmapped page, not present
page or a page marked as no-execute (NX), the PageFaultOnExec
exception is taken.
- Flags generated as the result of the comparison are passed
through with the result
jmp
Indirect Jump
| Mnemonic |
Syntax |
Operation
jmp |
Notes:
- The rip (user-visible instruction
pointer register) is reset to the target address specified by ra
- If the ra operand does not match
the riptaken immediate, the BranchMispredict
internal exception is raised. The processor should annul all uops
after the branch and restart fetching at the RIP specified by the
result (in this case, ra)
- Indirect jumps are always assumed to match the predicted target
in riptaken. If some other target is predicted,
it is the responsibility of the decoder or frontend to set the riptaken
immediate to that predicted target
- The destination register should always be REG_rip;
otherwise this uop is undefined
- If the target RIP falls within an unmapped page, not present
page or a page marked as no-execute (NX), the PageFaultOnExec
exception is taken.
- No flags are generated by this uop
jmpp
Indirect Jump Within Microcode
| Mnemonic |
Syntax |
Operation
jmpp |
Notes:
- The jmpp uop redirects uop fetching
into microcode not accessible as x86 instructions. The target address
(inside PTLsim, not x86 space) is specified by ra
- If the ra operand does not match
the riptaken immediate, the BranchMispredict
internal exception is raised. The processor should annul all uops
after the branch and restart fetching at the RIP specified by the
result (in this case, ra)
- Indirect jumps are always assumed to match the predicted target
in riptaken. If some other target is predicted,
it is the responsibility of the decoder or frontend to set the riptaken
immediate to that predicted target
- The destination register should always be REG_rip;
otherwise this uop is undefined
- The user visible rip register is not updated after this uop
issues; otherwise it would point into PTLsim space not accessible
to x86 code. Updating is resumed after a normal jmp
issues to return to user code. It is the responsibility of the decoder
to move the user address to return to into some temporary register
(traditionally REG_sr2 but this is not required).
- No flags are generated by this uop
bru
Unconditional Branch
| Mnemonic |
Syntax |
Operation
bru |
Notes:
- The rip (user-visible instruction
pointer register) is reset to the specified immediate. The processor
may redirect fetching from the new RIP
- No branch misprediction exceptions are possible with unconditional branches
- If the target RIP falls within an unmapped page, not present
page or a page marked as no-execute (NX), the PageFaultOnExec
exception is taken.
- No flags are generated by this uop
brp
Unconditional Branch Within Microcode
| Mnemonic |
Syntax |
Operation
brp |
Notes:
- The brp uop redirects uop fetching
into microcode not accessible as x86 instructions. The target address
(inside PTLsim, not x86 space) is specified by the riptaken
immediate
- The rip (user-visible instruction
pointer register) is reset to the specified riptaken
immediate. The processor may redirect fetching from the new
RIP
- No exceptions are possible with unconditional branches
- The user visible rip register is not updated after this uop
issues; otherwise it would point into PTLsim space not accessible
to x86 code. Updating is resumed after a normal jmp
uop issues to return to user code. It is the responsibility of the
decoder to move the user address to return to into some temporary
register (traditionally REG_sr2 but this
is not required).
- No flags are generated by this uop
chk
Check Speculation
| Mnemonic |
Syntax |
Operation
chk.cc |
Notes:
- The chk uop verifies certain
properties about ra. If this verification check passes, no
action is taken. If the check fails, chk
signals an exception of the user specified type in the rc
immediate. The result of the chk
uop in this case is the user specified RIP to recover at after the
check failure is handled in microcode. This recovery RIP is saved
in the recoveryrip internal register.
- This mechanism is intended to allow simple inlined uop sequences
to branch into microcode if certain conditions fail, since normally
inlined uop sequences cannot contain embedded branches. One example
use is in the REP series of instructions
to ensure that the count is not zero on entry (a special corner case).
- Unlike most conditional uops, the chk
uop directly checks the numerical value of ra against
zero, and ignores any attached flags. Therefore, the cc
condition code flag evaluation type is restricted to the subset
(e, ne, be, nbe, l, nl, le, nle).
- No flags are generated by this uop
ld ld.lo ld.hi ldx ldx.lo ldx.hi
Load
| Mnemonic |
Syntax |
Operation
ld |
Notes:
- The PTLsim load unit model is described in substantial
detail in Section 21; this section only gives
an overview of the load uop semantics.
- The ld family of uops loads values
from the virtual address specified by the sum ra
+ rb. The ld
form zero extends the loaded value, while the ldx
form sign extends the loaded value to 64 bits.
- All values are zero or sign extended to 64 bits; no subword
merging takes place as with ALU uops. The decoder is responsible for
following the load with an explicit mov uop
to merge 8-bit and 16-bit loads with their old destination register.
- The sfra operand specifies the store
forwarding register (a.k.a. store buffer) to merge with data from
the cache to form the final result. The inherited SFR may be determined
dynamically by querying a store queue or can be predicted statically.
- If the load misses the cache, the FLAG_WAIT
flag of the result is set.
- Load uops do not generate any other condition code flags
Unaligned Load Support:
- The processor supports unaligned loads via a pair of ld.lo
and ld.hi uops; an overview can be found
in Section 5.6. The alignment type
of the load is stored in the uop's cond field (0 = ld,
1 = ld.lo, 2 = ld.hi).
- The ld.lo uop rounds down its effective
address to the nearest 64-bit boundary and performs the load. The ld.hi
uop rounds up to the next 64-bit
boundary, performs a load at that address, then takes as its third
rc operand the first (ld.lo) load's result.
The two loads are concatenated into a 128-bit word and the final unaligned
data is extracted (and sign extended if the ldx
form was used).
- Special corner case for when the actual user address (ra
+ rb) did not actually require any
bytes in the 8-byte range loaded by the ld.hi
uop (i.e. the load was contained entirely within the low 64-bit aligned
chunk). Since it is perfectly legal to do an unaligned load to the
very end of the page such that the next 64 bit chunk is not mapped
to a valid page, the ld.hi uop does not actually
access memory; the entire result is extracted from the prior ld.lo
result in the rc operand.
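The ld.lo / ld.hi pairing described above can be sketched as follows. This is a toy model under stated assumptions: memory is a little-endian Python bytes object, the size argument is the access size in bytes, and the helper names are hypothetical rather than PTLsim functions.

```python
def load64(mem: bytes, addr: int) -> int:
    """Aligned little-endian 8-byte load from a toy memory image."""
    assert addr % 8 == 0
    return int.from_bytes(mem[addr:addr + 8], "little")

def ld_lo(mem: bytes, addr: int) -> int:
    return load64(mem, addr & ~7)              # round down to a 64-bit boundary

def ld_hi(mem: bytes, addr: int, size: int, lo: int, signed: bool = False) -> int:
    offset = addr & 7
    if offset + size <= 8:
        hi = 0                                 # corner case: no memory access
    else:
        hi = load64(mem, (addr & ~7) + 8)      # next 64-bit chunk
    wide = (hi << 64) | lo                     # 128-bit concatenation
    value = (wide >> (8 * offset)) & ((1 << (8 * size)) - 1)
    if signed and value >> (8 * size - 1):     # ldx form: sign extend to 64 bits
        value |= ((1 << 64) - 1) & ~((1 << (8 * size)) - 1)
    return value
```

A 4-byte load at address 5 would be issued as lo = ld_lo(mem, 5) followed by ld_hi(mem, 5, 4, lo); a 4-byte load at address 4 never touches the second chunk.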
Exceptions:
- UnalignedAccess if the address (ra
+ rb) is not aligned to an integral
multiple of the size in bytes of the load. Unaligned loads (ld.lo
and ld.hi) do not generate this exception.
Since x86 automatically corrects alignment problems, microcode must
handle this exception as described in Section 5.6.
- PageFaultOnRead if the virtual address (ra
+ rb) falls on a page not accessible
to the caller in the current operating mode, or a page marked as not
present.
- Various other exceptions and replay conditions may exist depending
on the specific processor core model.
st
Store
| Mnemonic |
Syntax |
Operation
st |
Notes:
- The PTLsim store unit model is described in substantial
detail in Section 22.1; this section only gives
an overview of the store uop semantics.
- The st family of uops prepares values
to be stored to the virtual address specified by the sum ra
+ rb.
- The sfra operand specifies the store
forwarding register (a.k.a. store buffer) to merge the data to be
stored (the rc operand) into. The inherited
SFR may be determined dynamically by querying a store queue or can
be predicted statically, as described in Section 22.1.
- Store uops only generate the SFR for tracking purposes; the
cache is only written when the SFR is committed.
- The store uop may issue as soon as the ra
and rb operands are ready, even if
the rc and sfra operands
are not known. The store must be replayed once these operands become
known, in accordance with Section 22.2.
- Store uops do not generate any other condition code flags
Unaligned Store Support:
- The processor supports unaligned stores via a pair of st.lo
and st.hi uops; an overview can be found
in Section 5.6. The alignment type
of the store is stored in the uop's cond field (0 = st,
1 = st.lo, 2 = st.hi).
- Stores are handled in a similar manner, with st.lo
and st.hi rounding down and up to store parts of the
unaligned value in adjacent 64-bit blocks.
- The st.lo uop rounds down its effective
address to the nearest 64-bit
boundary and stores the appropriately aligned portion of the rc
operand that actually falls within that range of 8 bytes. The st.hi
uop rounds up to the next 64-bit
boundary and similarly stores the appropriately aligned portion of
the rc operand that actually falls within
that high range of 8 bytes.
- Special corner case for when the actual user address (ra
+ rb) did not actually touch any
bytes in the 8-byte range normally written by the st.hi
uop (i.e. the store was contained entirely within the low 64-bit aligned
chunk). Since it is perfectly legal to do an unaligned store to the
very end of the page such that the next 64 bit chunk is not mapped
to a valid page, the st.hi uop does not actually
do anything in this case (the bytemask of the generated SFR is set
to zero and no exceptions are checked).
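The split performed by st.lo and st.hi can be sketched in Python as below. The representation is an assumption made for illustration: each generated SFR is modelled as an (aligned address, bytemask, data) tuple with one bytemask bit per byte of the 64-bit block, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def st_lo(addr: int, size: int, data: int):
    """Return (aligned_addr, bytemask, sfr_data) for the low 64-bit block."""
    data &= (1 << (8 * size)) - 1
    offset = addr & 7
    n = min(size, 8 - offset)                  # bytes landing in the low block
    bytemask = ((1 << n) - 1) << offset
    sfr_data = (data << (8 * offset)) & MASK64
    return addr & ~7, bytemask, sfr_data

def st_hi(addr: int, size: int, data: int):
    """Return (aligned_addr, bytemask, sfr_data) for the high 64-bit block."""
    data &= (1 << (8 * size)) - 1
    offset = addr & 7
    n = max(0, offset + size - 8)              # bytes spilling into the high block
    bytemask = (1 << n) - 1                    # zero in the corner case
    sfr_data = (data >> (8 * (size - n))) if n else 0
    return (addr & ~7) + 8, bytemask, sfr_data
```

When the store fits entirely in the low block, st_hi returns a zero bytemask, mirroring the corner case in which the uop does nothing.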
Exceptions:
- UnalignedAccess if the address (ra
+ rb) is not aligned to an integral
multiple of the size in bytes of the store. Unaligned stores (st.lo
and st.hi) do not generate this exception.
Since x86 automatically corrects alignment problems, microcode must
handle this exception as described in Section 5.6.
- PageFaultOnWrite if the virtual address (ra
+ rb) falls on a write protected
page, a page not accessible to the caller in the current operating
mode, or a page marked as not present.
- LoadStoreAliasing if a prior load is found
to alias the store (see Section 22.2.1).
- Various other exceptions and replay conditions may exist depending
on the specific processor core model.
ldp ldxp
Load from Internal Microcode Space
| Mnemonic |
Syntax |
Operation
ldp |
Notes:
- The ldp and ldxp
uops load values from the internal PTLsim address space not accessible
to x86 code. Typically this address space is mapped to internal machine
state registers (MSRs) and microcode scratch space. The internal address
to access is specified by the sum ra + rb.
The ldp form zero extends the loaded value,
while the ldxp form sign extends the loaded
value to 64 bits.
- Load uops do not generate any other condition code flags
- Internal loads may not be unaligned, and never stall or generate
exceptions.
stp
Store to Internal Microcode Space
| Mnemonic |
Syntax |
Operation
stp |
Notes:
- The stp uop stores a value to the
internal PTLsim address space not accessible to x86 code. Typically
this address space is mapped to internal machine state registers (MSRs)
and microcode scratch space. The internal address to store is specified
by the sum ra + rb and
the value to store is specified by rc.
- Store uops do not generate any other condition code flags
- Internal stores may not be unaligned, and never stall or generate
exceptions.
shl shr sar rotl rotr rotcl rotcr
Shifts and Rotates
| Mnemonic |
Syntax |
Operation
shl |
Notes:
- The shift and rotate instructions have some of the most bizarre
semantics in the entire x86 instruction set: they may or may not modify
flags depending on the rotation count operand, which we may not even
know until the instruction issues. This is introduced in Section 5.9.
- The specific rules are as follows:
- If the count
is zero, no flags are modified
- If the count
is 1, both OF and CF are modified, but ZAPS
is preserved
- If the count
is greater than 1, only the CF is modified. (Technically
the value in OF is undefined, but on K8 and P4, it retains the old
value, so we try to be compatible).
- Shifts also alter the ZAPS flags while rotates do not.
- For constant counts (immediate rb values),
the semantics are easy to determine in advance.
- For variable counts (rb comes from
register), things are more complex. Since the shift needs to determine
its output flags at runtime based on both the shift count and the
input flags (CF, OF, ZAPS), we need to specify the latest versions
in program order of all the existing flags. However, this would require
three operands to the shift uop not even counting the value and count
operands. Therefore, we use a collcc (collect
condition code flags, see Section 5.4) uop
to get all the most up to date flags into one result, using three
operands for ZAPS, CF, OF. This forms a zero word with all the correct
flags attached, which is then forwarded as the rc
operand to the shift. This may add additional scheduling constraints
in the case that one of the operands to the shift itself sets the
flags, but this is fairly rare. Conveniently, this also lets us directly
implement the 65-bit rotcl/rotcr
uops in hardware with little additional complexity.
- All operations merge the ALU result with ra
and generate flags in accordance with the standard x86 merging
rules described previously.
- The specific flags attached to the result depend on the input
conditions described above. Users should assume that these uops
always produce the latest version of each of the ZAPS, CF, OF flag
sets.
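The count-dependent flag rules above can be summarized in a short Python sketch. It assumes, following the x86 architecture manuals, that the two non-zero cases are a count of exactly 1 and a count greater than 1; the function name and the use of flag-group names as strings are hypothetical.

```python
def flags_modified(op: str, count: int) -> set:
    """Return the flag groups a shift or rotate uop modifies for a given count."""
    if count == 0:
        return set()                 # nothing is modified
    modified = {"CF"}
    if count == 1:
        modified.add("OF")           # OF is only defined for single-bit counts
    if not op.startswith("rot"):
        modified.add("ZAPS")         # shifts also alter ZAPS; rotates do not
    return modified
```

Because the answer depends on a count that may only be known at issue time, the uop needs the previous CF, OF and ZAPS values available, which is why they are gathered through collcc into the rc operand.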
mask
Masking, Insertion and Extraction
| Mnemonic |
Syntax |
Operation
mask.x|z |
Notes:
- The mask uop and its variants are
used for generalized bit field extraction, insertion, sign and zero
extension using the 18-bit control field in the immediate
- These uops are used extensively within PTLsim microcode, but
are also useful if the processor supports dynamically merging a chain
of shr, and, or
uops.
- The condition code flags (ZAPS, CF, OF) are the flags logically
generated by the final AND operation.
Control Field Format
The 18-bit rc immediate consists of three 6-bit fields,
which the operation below refers to as ms, mc and ds.
Operation:
- M = 1'[(ms+mc-1):ms]
T = (ra & ~M) | ((rb >>> ds) & M)
# ra ← x: x merged into ra per the Merging Rules
if (Z) {
# Zero extend
rd = ra ← (T & 1'[(ms+mc-1):0])
} else if (X) {
# Sign extend
rd = ra ← (T[ms+mc-1]) ? (T | 1'[63:(ms+mc)]) : (T & 1'[(ms+mc-1):0])
} else {
rd = ra ← T
}
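The operation above can be modelled in Python roughly as follows. Several points are assumptions: the >>> operator is treated as a 64-bit rotate right, the final size-based merge into ra is omitted, and the mode argument ("z", "x" or empty) is a hypothetical stand-in for the Z and X variants.

```python
MASK64 = (1 << 64) - 1

def bits(hi: int, lo: int) -> int:
    """1'[hi:lo]: ones in bit positions lo..hi inclusive (empty if hi < lo)."""
    if hi < lo:
        return 0
    return ((1 << (hi - lo + 1)) - 1) << lo

def rotr64(x: int, n: int) -> int:
    n &= 63
    return ((x >> n) | (x << (64 - n))) & MASK64

def mask_uop(ra: int, rb: int, ms: int, mc: int, ds: int, mode: str = "") -> int:
    m = bits(ms + mc - 1, ms)                     # mc-bit field starting at bit ms
    t = (ra & ~m & MASK64) | (rotr64(rb, ds) & m)
    top = ms + mc - 1
    if mode == "z":                               # zero extend above the field
        return t & bits(top, 0)
    if mode == "x":                               # sign extend from the field's top bit
        if (t >> top) & 1:
            return t | bits(63, ms + mc)
        return t & bits(top, 0)
    return t                                      # plain insertion
```

With ms=0, mc=8, ds=8 and mode "z" this extracts the second byte of rb; with ms=8, mc=8, ds=56 and no mode it inserts the low byte of rb into bits 15:8 of ra.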
bswap
Byte Swap
| Mnemonic |
Syntax |
Operation
bswap |
Notes:
- The bswap uop reverses the endianness
of the rb operand. The uop's effective result
size determines the range of bytes which are reversed.
- This uop's semantics are identical to the x86 bswap
instruction.
- This uop does not generate any condition code flags.
collcc
Collect Condition Codes
| Mnemonic |
Syntax |
Operation
collcc |
Notes:
- The collcc uop collects the condition
code flags from three potentially distinct source operands into a
single output with the combined condition code flags in both its appended
flags and data.
- This uop is useful for collecting all flags before passing
them as input to another uop which only supports one source of flags
(for instance, the shift and rotate uops).
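A minimal Python sketch of the flag collection is shown below. It assumes x86 flag bit positions and that the three sources supply ZAPS, CF and OF in that order (ra, rb, rc); both points are assumptions for illustration, as is the function name.

```python
CF, PF, AF, ZF, SF, OF = 1 << 0, 1 << 2, 1 << 4, 1 << 6, 1 << 7, 1 << 11
ZAPS = ZF | AF | PF | SF

def collcc(ra_flags: int, rb_flags: int, rc_flags: int):
    """Combine ZAPS from ra, CF from rb and OF from rc (assumed operand roles)."""
    flags = (ra_flags & ZAPS) | (rb_flags & CF) | (rc_flags & OF)
    return flags, flags              # (64-bit data value, attached flags)
```

Only the flag group owned by each source survives, so stale CF or OF bits attached to the ZAPS source are discarded.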
movccr movrcc
Move Condition Code Flags Between Register Value and
Flag Parts
| Mnemonic |
Syntax |
Operation
movccr |
Notes:
- The movccr uop takes the condition
code flag bits attached to ra and copies
them into the 64-bit register part of the result.
- The movrcc uop takes the low bits
of the ra operand and moves those bits into
the condition code flag bits attached to the result.
- The bits moved consist of the ZF, PF, SF, CF, OF flags
- The WAIT and INV flags of the result are always cleared since
the uop would not even issue if these were set in ra.
andcc orcc ornotcc xorcc
Logical Operations on Condition Codes
| Mnemonic |
Syntax |
Operation
andcc |
Notes:
- These uops are used to perform logical operations on the condition
code flags attached to ra and rb.
- If the rb operand is an immediate,
the immediate data is used instead of the flags normally attached
to a register operand.
- The 64-bit value of the output is always set to zero.
mull mulh
Integer Multiplication
| Mnemonic |
Syntax |
Operation
mull |
Notes:
- These uops multiply ra and rb,
then retain only the low N bits or high
N bits of the result (where N is the uop's
effective result size in bits). This result is then merged into ra.
- The condition code flags generated by these uops correspond
to the normal x86 semantics for integer multiplication (imul);
the flags are calculated relative to the effective result size.
- The rb operand may be an immediate
bt bts btr btc
Bit Testing and Manipulation
| Mnemonic |
Syntax |
Operation
bt |
Notes:
- These uops test a given bit in ra and
then atomically modify (set, reset or complement) that bit in the
result.
- The CF flag of the output is set to the original value in
bit position rb of ra.
Other condition code flag bits in the output are undefined.
- The bt (bit test) uop is special:
it generates a value of -1 or +1 if the tested bit is 1 or 0, respectively.
This is used in microcode for setting up an increment for the rep
x86 instructions.
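The behaviour described above can be sketched in Python as follows. The sketch works on 64-bit values only and assumes the bit index in rb wraps modulo 64; the function name and the (value, CF) return convention are hypothetical.

```python
MASK64 = (1 << 64) - 1

def bit_uop(op: str, ra: int, rb: int):
    """Return (value, CF) for bt, bts, btr or btc on a 64-bit operand."""
    idx = rb & 63                    # assumption: index wraps modulo 64
    cf = (ra >> idx) & 1             # CF receives the original bit
    bit = 1 << idx
    if op == "bt":
        value = -1 if cf else 1      # increment setup used by rep microcode
    elif op == "bts":
        value = ra | bit
    elif op == "btr":
        value = ra & ~bit
    elif op == "btc":
        value = ra ^ bit
    else:
        raise ValueError(op)
    return value & MASK64, cf
```

The -1 result of bt is returned in its 64-bit two's complement form, 0xFFFFFFFFFFFFFFFF.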
ctz clz
Count Trailing or Leading Zeros
| Mnemonic |
Syntax |
Operation
ctz |
Notes:
- These uops find the bit index of the first '1' bit in rb,
starting from the lowest bit 0 (for ctz)
or the highest bit of the data type (for clz).
- The result is zero (technically, undefined) if rb is zero.
- The ZF flag of the result is 1 if rb was
zero, or 0 if rb was nonzero. Other condition
code flags are undefined.
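Read literally, the notes above say both uops return a bit index rather than a count of zeros. A Python sketch under that reading follows; the function names, the size_bits parameter and the (result, ZF) return convention are assumptions made for illustration.

```python
def ctz(rb: int, size_bits: int = 64):
    """Index of the lowest set bit of rb, plus the ZF flag."""
    rb &= (1 << size_bits) - 1
    if rb == 0:
        return 0, 1                              # result zero, ZF = 1
    return (rb & -rb).bit_length() - 1, 0

def clz(rb: int, size_bits: int = 64):
    """Index of the highest set bit of rb within the data type, plus ZF."""
    rb &= (1 << size_bits) - 1
    if rb == 0:
        return 0, 1
    return rb.bit_length() - 1, 0
```

Under this reading the pair behaves like the x86 bsf and bsr instructions, with ZF signalling a zero source.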
ctpop
Count Population of '1' Bits
| Mnemonic |
Syntax |
Operation
ctpop |
Notes:
- The ctpop uop counts the number of
'1' bits in the ra operand.
- The ZF flag of the result is 1 if ra was
zero, or 0 if ra was nonzero. Other condition
code flags are undefined.
Floating Point Format and Merging
All floating point uops use the same encoding to specify the
precision and vector format of the operands. The uop's size
field is encoded as follows:
- 00: Single precision scalar floating
point (opfp
mnemonic). The operation is only performed on the low 32 bits (in
IEEE single precision format) of the 64-bit inputs; the high 32 bits
of the ra operand are copied to the high 32 bits of the output.
- 01: Single precision vector floating
point (opfv
mnemonic). The operation is performed on both 32 bit halves (in IEEE
single precision format) of the 64-bit inputs in parallel
- 1x: Double precision scalar floating
point (opfd
mnemonic). The operation is performed on the full 64 bit inputs (in
IEEE double precision format)
Most floating point operations merge the result with the
ra operand to prepare the destination. Since
a full 64-bit result is generated with the vector and double formats,
the ra operand is not needed and may be specified
as zero to reduce dependencies.
Exceptions to this encoding are listed where appropriate.
Unless otherwise noted, all operations update the internal
floating point status register (FPSR, equivalent to the MXCSR register
in x86 code) by ORing in any exceptions that occur. If the uop is
encoded to generate an actual exception on excepting conditions, the
FLAG_INV flag is attached to the output to
cause an exception at commit time.
No condition code flags are generated by floating point uops
unless otherwise noted.
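The three size encodings can be illustrated with a Python sketch of an add operation. It is only a model under stated assumptions: register values are 64-bit integers holding IEEE bit patterns, rounding of single precision results relies on struct packing, and all names are hypothetical.

```python
import struct

def f32(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]

def to_bits32(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def f64(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits & 0xFFFFFFFFFFFFFFFF))[0]

def to_bits64(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def fp_add(ra: int, rb: int, size: int) -> int:
    if size & 0b10:                                   # 1x: double precision scalar
        return to_bits64(f64(ra) + f64(rb))
    low = to_bits32(f32(ra) + f32(rb))
    if size == 0b00:                                  # single scalar: keep high half of ra
        return (ra & 0xFFFFFFFF00000000) | low
    high = to_bits32(f32(ra >> 32) + f32(rb >> 32))   # 01: both halves in parallel
    return (high << 32) | low
```

Only the single precision scalar case reads the high half of ra, which is why ra may be specified as zero for the vector and double formats.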
addf subf mulf divf minf maxf
Floating Point Arithmetic
| Mnemonic |
Syntax |
Operation
addf |
Notes:
- These uops do arithmetic on floating point numbers in various
formats as specified in the Floating Point Format and
Merging page.
maddf msubf
Fused Multiply Add and Subtract
| Mnemonic |
Syntax |
Operation
maddf |
Notes:
- The maddf and msubf
uops perform fused multiply and accumulate operations on three operands.
- The full internal precision is preserved between the multiply
and add operations; rounding only occurs at the end.
- These uops are primarily used by microcode to calculate floating
point division, square root and reciprocal.
sqrtf rcpf rsqrtf
Square Root, Reciprocal and Reciprocal Square Root
| Mnemonic |
Syntax |
Operation
sqrtf |
Notes:
- These uops perform the specified unary operation on rb and
merge the result into ra (for a single precision scalar mode only)
- The rcpf and rsqrtf
uops are approximations; they do not provide full precision results.
These approximations are in accordance with the standard x86 SSE/SSE2
semantics.
cmpf
Compare Floating Point
| Mnemonic |
Syntax |
Operation
cmpf.type |
Notes:
- This uop performs the specified comparison of ra
and rb. If the comparison is true,
the result is set to all '1' bits; otherwise it is zero. The result
is then merged into ra.
- The cond field in the uop encoding
holds the comparison type. The set of compare types matches the x86
SSE/SSE2 CMPxx instructions.
cmpccf
Compare Floating Point and Generate Condition Codes
| Mnemonic |
Syntax |
Operation
cmpccf.type |
Notes:
- This uop performs all comparisons of ra and
rb and produces x86 condition code flags (ZF,
PF, CF) to represent the result.
- The semantics of the generated condition code flags exactly
matches the x86 SSE/SSE2 instructions COMISS/COMISD/UCOMISS/UCOMISD.
- Unlike most encodings, the size field
holds the comparison type of the two values as follows:
- 00: cmpccfp: single
precision ordered compare (same semantics as x86 SSE COMISS)
- 01: cmpccfp.u: single
precision unordered compare (same semantics as x86 SSE UCOMISS)
- 10: cmpccfd: double
precision ordered compare (same semantics as x86 SSE2 COMISD)
- 11: cmpccfd.u: double
precision unordered compare (same semantics as x86 SSE2 UCOMISD)
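The flag results of these comparisons follow the x86 COMISS/UCOMISS convention, which can be sketched in Python as below. The sketch uses x86 flag bit positions and Python floats as operands; the function name is hypothetical, and the difference between the ordered and unordered variants (which NaN inputs raise an invalid-operation exception) is not modelled.

```python
import math

CF, PF, ZF = 1 << 0, 1 << 2, 1 << 6

def cmpccf(a: float, b: float) -> int:
    """Return the ZF/PF/CF flags produced by comparing a with b."""
    if math.isnan(a) or math.isnan(b):
        return ZF | PF | CF          # unordered result
    if a > b:
        return 0
    if a < b:
        return CF
    return ZF                        # equal
```

A following conditional uop can then test the result with the unsigned condition codes (for example b, be, nbe), as x86 code does after COMISS.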
cvtf.i2s.ins cvtf.i2s.p cvtf.i2d.lo
cvtf.i2d.hi
Convert 32-bit Integer to Floating Point
| Mnemonic |
Syntax |
Operation |
Used By
cvtf.i2s.ins |
Notes:
- These uops convert 32-bit integers to single or double precision
floating point
- The semantics of these instructions are identical to the semantics
of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by
these uops
cvtf.q2s.ins cvtf.q2d
Convert 64-bit Integer to Floating Point
| Mnemonic |
Syntax |
Operation |
Used By
cvtf.q2s.ins |
Notes:
- These uops convert 64-bit integers to single or double precision
floating point
- The semantics of these instructions are identical to the semantics
of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by
these uops
cvtf.s2i cvtf.s2q cvtf.s2i.p
Convert Single Precision Floating Point to Integer
| Mnemonic |
Syntax |
Operation |
Used By
cvtf.s2i |
Notes:
- These uops convert single precision floating point values
to 32-bit or 64-bit integers
- The semantics of these instructions are identical to the semantics
of the x86 SSE/SSE2 instructions shown in the table
- Unlike most encodings, the size field
holds the rounding type of the result as follows:
- x0: normal IEEE rounding (as determined
by FPSR)
- x1: truncate to zero
cvtf.d2i cvtf.d2q cvtf.d2i.p
Convert Double Precision Floating Point to Integer
| Mnemonic |
Syntax |
Operation |
Used By
cvtf.d2i |
Notes:
- These uops convert double precision floating point values
to 32-bit or 64-bit integers
- The semantics of these instructions are identical to the semantics
of the x86 SSE/SSE2 instructions shown in the table
- Unlike most encodings, the size field
holds the rounding type of the result as follows:
- x0: normal IEEE rounding (as determined
by FPSR)
- x1: truncate to zero
cvtf.d2s.ins cvtf.d2s.p cvtf.s2d.lo
cvtf.s2d.hi
Convert Between Double Precision and Single Precision
Floating Point
| Mnemonic |
Syntax |
Operation |
Used By
cvtf.d2s.ins |
Notes:
- These uops convert between single precision and double precision
floating point values
- The semantics of these instructions are identical to the semantics
of the x86 SSE/SSE2 instructions shown in the table
- The uop size field is not used by
these uops
Matt T Yourst
2007-09-26