Abstract
Active Memory is the term given to a collection of memory technologies implemented in high-end IBM eServer xSeries servers.
Main
- Memory ProteXion
Memory ProteXion, also known as “redundant bit steering”, uses redundant bits in a data packet to provide backup in the event of a DIMM failure.
Currently, other industry-standard servers use 8 bits of each 72-bit data packet for ECC functions and the remaining 64 bits for data. However, a server with Memory ProteXion needs only 6 bits to perform the same ECC functions, which leaves 2 bits free. If memory scrubbing detects a chip failure on the DIMM, the memory controller can re-route data around the failed chip through the spare bits (similar to the hot-spare drive of a RAID array). It does this automatically, without issuing a Predictive Failure Analysis (PFA) or light path diagnostics alert to the administrator. After a second failure on the DIMM, PFA and light path diagnostics alerts occur on that DIMM as normal.
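The bit budget and the "first failure is absorbed, second failure raises an alert" behavior described above can be sketched as follows. This is a minimal illustration, not the controller's actual logic: the names `spare_bits` and `SteeringController` and the chip identifiers are hypothetical, and the model tracks failures per chip rather than per bit.

```python
PACKET_BITS = 72            # width of one data packet (from the text)
DATA_BITS = 64              # bits that carry data (from the text)
CONVENTIONAL_ECC_BITS = 8   # ECC bits used by other industry-standard servers
PROTEXION_ECC_BITS = 6      # ECC bits needed with Memory ProteXion


def spare_bits(ecc_bits):
    """Bits left over in a packet after data and ECC are accounted for."""
    return PACKET_BITS - DATA_BITS - ecc_bits


class SteeringController:
    """Hypothetical toy model of redundant bit steering on one DIMM."""

    def __init__(self):
        self.steered_chip = None  # chip whose data now travels on the spare bits

    def on_chip_failure(self, chip_id):
        # First failure: re-route around the chip silently using the spare bits.
        if self.steered_chip is None:
            self.steered_chip = chip_id
            return "steered"
        # Spare bits already in use: fall back to the normal alert path.
        return "alert"


if __name__ == "__main__":
    print(spare_bits(CONVENTIONAL_ECC_BITS), spare_bits(PROTEXION_ECC_BITS))
    controller = SteeringController()
    print(controller.on_chip_failure("chip-3"))
    print(controller.on_chip_failure("chip-7"))
```

The arithmetic is the whole point of the first function: with 8 ECC bits there is nothing left to steer to, while 6 ECC bits leave 2 bits in every packet.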
- Memory scrubbing
Memory scrubbing is an automatic daily test of all system memory. It detects and reports developing memory errors before they cause a server outage.
Memory scrubbing and Memory ProteXion work in conjunction with each other and do not require memory mirroring to be enabled to work properly.
When a bit error is detected, memory scrubbing determines if the error is recoverable or not. If it is recoverable, Memory ProteXion is enabled and the data that was stored in the damaged locations is rewritten to a new location. The error is then reported so that preventative maintenance can be performed. As long as there are enough good locations to allow the proper operation of the server, no further action is taken other than recording the error in the error logs.
If the error is not recoverable, then memory scrubbing sends an error message to the light path diagnostics, which then turns on the proper lights and LEDs to guide you to the damaged DIMM. If memory mirroring is enabled, then the mirrored copy of the data in the damaged DIMM is used until the system is powered down and the DIMM replaced. If hot-add is enabled in the BIOS, then no rebooting would be required and the new DIMM would be enabled immediately.
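The decision flow in the two paragraphs above can be summarized in a short sketch. The function name `handle_scrub_error` and the action strings are hypothetical labels chosen for illustration; the real logic lives in the memory controller and system firmware.

```python
def handle_scrub_error(recoverable, mirroring_enabled):
    """Return the list of actions taken for one bit error found by scrubbing.

    A hypothetical summary of the flow described in the text, not firmware code.
    """
    actions = []
    if recoverable:
        # Memory ProteXion moves the data out of the damaged locations.
        actions.append("rewrite data to a new location")
        actions.append("log error for preventative maintenance")
    else:
        # Light path diagnostics guides the administrator to the damaged DIMM.
        actions.append("light LEDs for the damaged DIMM")
        if mirroring_enabled:
            actions.append("use mirrored copy until the DIMM is replaced")
    return actions


if __name__ == "__main__":
    for recoverable in (True, False):
        for mirrored in (True, False):
            print(recoverable, mirrored, handle_scrub_error(recoverable, mirrored))
```

Note that mirroring only appears on the unrecoverable branch, which matches the statement that scrubbing and Memory ProteXion do not need mirroring to work.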
- Memory mirroring
Memory mirroring is roughly equivalent to RAID-1 in disk arrays: memory is divided into two ports, and one port is mirrored to the other. If 8 GB is installed, the operating system sees 4 GB once memory mirroring is enabled (it is disabled in the BIOS by default). Because all mirroring activities are handled by the hardware, memory mirroring is operating-system independent. Certain restrictions apply to the placement and size of memory DIMMs when memory mirroring is enabled, and these are system dependent.
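As a rough illustration of the RAID-1 analogy, the sketch below models two ports that hold identical copies, so only half of the installed memory is visible and a read still succeeds after one port fails. The class `MirroredMemory` and its attributes are hypothetical; the hardware does all of this transparently.

```python
class MirroredMemory:
    """Hypothetical toy model of two memory ports holding identical copies."""

    def __init__(self, installed_gb):
        self.installed_gb = installed_gb
        self.ports = [{}, {}]         # one address -> value map per port
        self.port_ok = [True, True]   # health flag per port

    @property
    def visible_gb(self):
        # Half of the installed memory holds the mirror copy.
        return self.installed_gb / 2

    def write(self, address, value):
        # Every write goes to both ports.
        for port in self.ports:
            port[address] = value

    def read(self, address):
        # Read from the first healthy port.
        for ok, port in zip(self.port_ok, self.ports):
            if ok:
                return port[address]
        raise RuntimeError("both ports have failed")


if __name__ == "__main__":
    memory = MirroredMemory(installed_gb=8)
    print(memory.visible_gb)
    memory.write(0x10, 42)
    memory.port_ok[0] = False   # simulate a failure on the first port
    print(memory.read(0x10))
```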
- Hot-swap and hot-add memory
Currently, only the x445 supports hot-swap and hot-add memory. There are two configurations where you can add or replace memory while the server is still running:
- Hot-swap, where you can replace failed DIMMs of the same type, size, and clock speed without turning off the server. Hot-swap memory is operating-system independent. Memory mirroring must be enabled to use hot-swap.
- Hot-add, where you can add new DIMMs without turning off the server, thereby increasing the amount of RAM available to the operating system. This feature is currently supported only by Windows Server 2003, Enterprise Edition and Datacenter Edition. Memory mirroring must be disabled when using hot-add. Because of the way memory is implemented in the x445, the port you are adding memory to must be empty before you add memory, and DIMMs must be added in multiples of two.
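The preconditions for the two configurations are easy to mix up, since one requires memory mirroring and the other forbids it. A minimal sketch of the rules as stated above follows; the function names, parameters, and operating system strings are hypothetical and exist only for illustration.

```python
# Operating systems the text lists as supporting hot-add (labels are illustrative).
HOT_ADD_OPERATING_SYSTEMS = {
    "Windows Server 2003, Enterprise Edition",
    "Windows Server 2003, Datacenter Edition",
}


def can_hot_swap(mirroring_enabled, same_type, same_size, same_speed):
    """Hot-swap needs mirroring plus a replacement DIMM that matches the old one."""
    return mirroring_enabled and same_type and same_size and same_speed


def can_hot_add(mirroring_enabled, target_port_empty, dimm_count, os_name):
    """Hot-add needs mirroring off, an empty port, DIMMs in pairs, and OS support."""
    return (
        not mirroring_enabled
        and target_port_empty
        and dimm_count > 0
        and dimm_count % 2 == 0
        and os_name in HOT_ADD_OPERATING_SYSTEMS
    )


if __name__ == "__main__":
    print(can_hot_swap(True, True, True, True))
    print(can_hot_add(False, True, 2, "Windows Server 2003, Enterprise Edition"))
    print(can_hot_add(True, True, 2, "Windows Server 2003, Enterprise Edition"))
```

Because the mirroring requirement is opposite for the two features, a given configuration can satisfy at most one of these checks at a time.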
- Chipkill memory
Chipkill is integrated into the XA-32 second-generation chipset and does not require special Chipkill DIMMs. Chipkill corrects multiple single-bit errors to keep a DIMM from failing. Combined with Memory ProteXion and the other Active Memory technologies, Chipkill gives the server very high reliability in the memory subsystem. Chipkill memory is approximately 100 times more effective than ECC technology, providing correction for up to four bits per DIMM (eight bits per memory controller), whether on a single chip or multiple chips.
If a memory chip error does occur, Chipkill is designed to automatically take the inoperative memory chip offline while the server keeps running. The memory controller provides memory protection similar in concept to disk array striping with parity, writing the memory bits across multiple memory chips on the DIMM. The controller is able to reconstruct the “missing” bit from the failed chip and continue working as usual.
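The striping-with-parity analogy can be made concrete with a small sketch. This shows only the disk-array concept the text compares Chipkill to, using simple XOR parity; the actual Chipkill coding scheme is not described in this document, and the function names and 4-bit chunk values are assumptions made for illustration.

```python
from functools import reduce
from operator import xor


def parity(chunks):
    """XOR of all chunks; the value a parity device would store."""
    return reduce(xor, chunks, 0)


def stripe(data_chunks):
    """Spread data across chips and append one parity chunk."""
    return list(data_chunks) + [parity(data_chunks)]


def reconstruct(striped, failed_index):
    """Rebuild the chunk held by the failed chip from all surviving chunks."""
    survivors = [chunk for i, chunk in enumerate(striped) if i != failed_index]
    return parity(survivors)


if __name__ == "__main__":
    data = [0b1010, 0b0111, 0b1100, 0b0001]   # one 4-bit chunk per chip
    striped = stripe(data)
    failed = 2                                 # pretend the third chip went offline
    print(reconstruct(striped, failed) == data[failed])
```

The reconstruction works because XOR-ing all chunks, parity included, always yields zero, so any single missing chunk equals the XOR of the rest.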
Chipkill support is provided in the memory controller and implemented using standard ECC DIMMs, so it is transparent to the OS.
In addition, to maintain the highest levels of system availability, if a memory error is detected during POST or memory configuration, the server can automatically disable the failing memory bank and continue operating with reduced memory capacity. After the problem is corrected, you can manually re-enable the memory bank through the Setup menu in the BIOS.
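The effect of disabling a failing bank at POST can be sketched as a simple filter over the installed banks. The function `post_memory_check`, the bank names, and the sizes are hypothetical; the sketch only shows how capacity shrinks while the server keeps running.

```python
def post_memory_check(banks):
    """Split banks into enabled and disabled and report the remaining capacity.

    banks maps a bank name to a (size_gb, passed_test) tuple. This is a
    hypothetical model of the behavior described in the text, not BIOS code.
    """
    enabled = {name: size for name, (size, passed) in banks.items() if passed}
    disabled = [name for name, (size, passed) in banks.items() if not passed]
    return enabled, disabled, sum(enabled.values())


if __name__ == "__main__":
    banks = {"bank1": (2, True), "bank2": (2, False), "bank3": (4, True)}
    enabled, disabled, capacity_gb = post_memory_check(banks)
    print(disabled, capacity_gb)
```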
Trademarks
Lenovo and the Lenovo logo are trademarks or registered trademarks of Lenovo in the United States, other countries, or both. A current list of Lenovo trademarks is available on the Web at https://www.lenovo.com/us/en/legal/copytrade/.
The following terms are trademarks of Lenovo in the United States, other countries, or both:
Lenovo®
X-Architecture®
xSeries®
The following terms are trademarks of other companies:
Windows Server® and Windows® are trademarks of Microsoft Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.