Introduction
PTLsim fully models a modern out-of-order x86-64 compatible processor,
its cache hierarchy and key devices with true cycle-accurate simulation.
The basic microarchitecture of this model is a combination of design
features from the Intel Pentium 4, AMD K8 and Intel Core 2, but incorporates
some ideas from IBM Power4/Power5 and Alpha EV8. The following is
a summary of the characteristics of this processor model:
- The simulator directly fetches pre-decoded micro-operations (Section
17.1) but can simulate cache accesses as if x86 instructions
were being decoded on fetch
- Branch prediction is configurable; PTLsim currently includes various
models, including a hybrid gshare-based predictor, bimodal predictors,
saturating counters, etc.
- Register renaming takes into account x86 quirks such as flags renaming
(Section 5.4)
- The front-end pipeline has a configurable number of cycles to simulate
x86 decoding or other tasks; this is used to adjust the branch mispredict
penalty
- A unified physical and architectural register file holds both the
results of in-flight uops and committed architectural register values.
Two rename tables (the speculative and committed register rename tables)
track which physical registers are currently mapped to architectural
registers.
- Unified physical register file for both integer and floating point
values.
- Operands are read from the physical register file immediately before
issue. Unlike in some microprocessors, PTLsim does not do speculative
scheduling: the schedule and register read loop is assumed to take
one cycle.
- Issue queues based on a collapsing design use broadcast-based matching
to wake up instructions.
- Clustered microarchitecture is highly configurable, allowing multi-cycle
latencies between clusters and multiple issue queues within the same
logical cluster.
- Functional units, mapping of functional units to clusters, issue ports
and issue queues and uop latencies are all configurable.
- Speculation recovery from branch mispredictions and load/store aliasing
uses the forward walk method to recover the rename tables, then annuls
all uops after and optionally including the mis-speculated uop.
- Loads and stores are replayed once store-to-load forwarding and
store-to-store merging dependencies are discovered.
- Stores may issue even before data to store is known; the store uop
is replayed when all operands arrive.
- Load and store queues use partial chunk address matching and store
merging for high performance and easy circuit implementation.
- Prediction of load/store aliasing to avoid mis-speculation recovery
overhead.
- Prediction and splitting of unaligned loads and stores to avoid mis-speculation
overhead
- Commit unit supports stalling until all uops in an x86 instruction
are complete, to make x86 instruction commitment atomic
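The forward walk recovery mentioned in the list above can be illustrated with a small sketch. It assumes hypothetical, simplified types (Uop, RenameTable, a four-register architectural file and a std::vector standing in for the reorder buffer) that are not PTLsim's real structures: recovery starts from the committed rename table, then walks the surviving in-flight uops from oldest to youngest, re-applying each one's destination mapping to rebuild the speculative table, and finally annuls everything younger.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration; not PTLsim's real structures.
constexpr int kArchRegs = 4;

struct Uop {
  int archdest;  // architectural destination register, or -1 if none
  int physdest;  // physical register allocated to it at rename time
};

// Maps each architectural register to a physical register.
using RenameTable = std::array<int, kArchRegs>;

// Rebuild the speculative rename table after rob[bad] is found to be
// mis-speculated. rob holds uncommitted uops, oldest first. Uops younger than
// rob[bad] (and rob[bad] itself when include_bad is true) are annulled.
RenameTable forward_walk_recover(const RenameTable& committed_rrt,
                                 std::vector<Uop>& rob, std::size_t bad,
                                 bool include_bad) {
  assert(bad < rob.size());
  RenameTable spec_rrt = committed_rrt;  // start from the committed mappings
  std::size_t keep = include_bad ? bad : bad + 1;
  // Walk forward from the oldest uop, re-applying each survivor's mapping.
  for (std::size_t i = 0; i < keep; i++) {
    if (rob[i].archdest >= 0) spec_rrt[rob[i].archdest] = rob[i].physdest;
  }
  rob.resize(keep);  // annul all younger uops
  return spec_rrt;
}
```

Freeing the physical registers held by annulled uops, and the details of how the real reorder buffer is organized, are deliberately left out of this sketch.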
The PTLsim model is fully configurable in terms of the sizes of key
structures, pipeline widths, latency and bandwidth and numerous other
features.
PTLsim uses the concept of a VCPU (virtual CPU) to represent
one user-visible microprocessor core (or a hardware thread if an SMT
machine is being modeled). The Context structure
(defined in ptlhwdef.h) maintains all per-VCPU
state in PTLsim: this includes both user-visible architectural registers
(in the Context.commitarf[] array) as well
as all per-core control registers and internal state information.
Context only contains general x86-visible context
information; specific machine models must maintain microarchitectural
state (like physical registers and so forth) in their own internal
structures.
The contextof(N)
macro is used to return the Context object
for a specific VCPU, numbered 0 to contextcount-1.
In userspace-only PTLsim, there is only one context, contextof(0).
In full system PTLsim/X, there may be up to 32 (i.e. MAX_CONTEXTS)
separate contexts (VCPUs).
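The relationship between Context, commitarf[] and contextof(N) can be sketched roughly as follows. This is a simplified, hypothetical stand-in: the real Context in ptlhwdef.h holds far more per-VCPU state, contextof is a macro in PTLsim rather than a function, and ARCHREG_COUNT and the contexts array here are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified, hypothetical stand-in for PTLsim's Context; the real one
// also carries control registers and other internal per-VCPU state.
constexpr int MAX_CONTEXTS = 32;
constexpr int ARCHREG_COUNT = 64;  // illustrative size only

struct Context {
  int vcpuid = 0;
  // Committed, user-visible architectural register values.
  std::array<std::uint64_t, ARCHREG_COUNT> commitarf{};
};

static std::array<Context, MAX_CONTEXTS> contexts;
static int contextcount = 1;  // userspace-only PTLsim has a single context

// Return the Context for VCPU n, where 0 <= n < contextcount.
Context& contextof(int n) {
  assert(n >= 0 && n < contextcount);
  return contexts[n];
}
```

Microarchitectural state such as physical registers would live in the machine model's own structures, not in this Context.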
PTLsim Machine/Core/Thread Class Hierarchy
PTLsim supports user-defined plug-in machine models. Two of
these models, the out-of-order core ("ooo")
and the sequential in-order core ("seq"),
ship with PTLsim; users can easily add others. PTLsim implements
several C++ classes used to build simulation models by dividing a
virtual machine into CPU sockets, cores and threads.
The PTLsimMachine class is at the root of the
hierarchy. Every simulation model must subclass PTLsimMachine
and define its virtual methods. Adding a machine model to PTLsim is
straightforward: define one instance of your machine class in a
source file included in the Makefile. For instance, assuming XyzMachine
subclasses PTLsimMachine and will be called
"xyz":
    XyzMachine xyzmodel("xyz");
The constructor for XyzMachine will be called
by PTLsim after all other subsystems are brought up. It should use
the addmachine("name")
static method to register the core model's name with PTLsim, so it
can be specified using the "-core xyz" option.
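The self-registration pattern described above can be sketched in standalone C++ as follows. This is a hypothetical, simplified version: the two-argument addmachine(name, machine) signature, the getmachine() lookup and the std::map registry are assumptions made for illustration, not PTLsim's actual interface. In this sketch the global xyzmodel is constructed during ordinary C++ static initialization, whereas PTLsim itself decides when machine constructors run.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, simplified base class illustrating plug-in registration;
// PTLsim's real PTLsimMachine has a different and larger interface.
struct PTLsimMachine {
  virtual ~PTLsimMachine() = default;
  virtual bool init() = 0;
  virtual int run() = 0;

  // Name -> machine registry, consulted when a core name is selected.
  static std::map<std::string, PTLsimMachine*>& registry() {
    static std::map<std::string, PTLsimMachine*> machines;
    return machines;
  }
  static void addmachine(const std::string& name, PTLsimMachine* m) {
    assert(m != nullptr);
    registry()[name] = m;
  }
  static PTLsimMachine* getmachine(const std::string& name) {
    auto it = registry().find(name);
    return (it == registry().end()) ? nullptr : it->second;
  }
};

struct XyzMachine : PTLsimMachine {
  explicit XyzMachine(const std::string& name) { addmachine(name, this); }
  bool init() override { return true; }
  int run() override { return 0; }
};

XyzMachine xyzmodel("xyz");  // global instance registers itself as "xyz"
```

Keeping the registry in a function-local static guarantees it exists before any global machine object's constructor calls addmachine, regardless of the order in which source files are initialized.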
The machine models included with PTLsim (namely, OutOfOrderMachine
and SequentialMachine) are each placed in
their own C++ namespace. When adding your own core, copy the example
source file(s) to new names and rename the namespace to avoid
clashes. You should then be able to link any number of machine
models defined in this manner into PTLsim at once.
The PTLsimMachine::init() method is called
to initialize each machine model the first time it is used. This function
is responsible for dividing the contextcount
contexts up into sockets, cores and threads, depending entirely on
the machine model's design and any configuration options specified
by the config parameter.
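As one illustration of what such an init() might compute, the sketch below divides contextcount VCPUs into sockets, cores and hardware threads. The ThreadSlot structure, the partition_contexts function and the cores_per_socket and threads_per_core parameters are hypothetical, not PTLsim configuration options, and the fill order (threads within a core first, then cores within a socket) is just one possible choice.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-VCPU placement record an init() implementation might build.
struct ThreadSlot {
  int vcpu;    // index that would be passed to contextof()
  int socket;  // socket index
  int core;    // core index within its socket
  int thread;  // hardware thread index within its core
};

// Assign each of contextcount VCPUs to a (socket, core, thread) position,
// filling every thread of a core, then every core of a socket.
std::vector<ThreadSlot> partition_contexts(int contextcount,
                                           int cores_per_socket,
                                           int threads_per_core) {
  assert(cores_per_socket > 0 && threads_per_core > 0);
  std::vector<ThreadSlot> slots;
  int per_socket = cores_per_socket * threads_per_core;
  for (int vcpu = 0; vcpu < contextcount; vcpu++) {
    ThreadSlot s;
    s.vcpu = vcpu;
    s.socket = vcpu / per_socket;
    s.core = (vcpu % per_socket) / threads_per_core;
    s.thread = vcpu % threads_per_core;
    slots.push_back(s);
  }
  return slots;
}
```

For example, with 8 contexts, 2 cores per socket and 2 threads per core, VCPU 5 lands on socket 1, core 0, thread 1.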
PTLsimMachine::run() is called to actually
run the simulation; it is described in more detail later.
PTLsimMachine::update_stats() is described
in Section 8.
PTLsimMachine::dump_state() is called to aid
debugging whenever an assertion fails or the simulator accesses a null
pointer or invalid address; it may also be called from anywhere else
it is useful.
Matt T Yourst
2007-09-26