High Availability Architectures and Solutions
The Maximum Availability Architecture (MAA) is
Oracle's best practices blueprint. It is based on proven Oracle high
availability technologies and recommendations. The goal of the MAA is to remove
the complexity in designing the optimal high availability architecture by
providing configuration recommendations and tuning tips to optimize your
architecture and Oracle features.
Traditionally, Oracle RAC is used in a multinode architecture,
with many separate database instances running on separate servers. Oracle RAC
One Node allows you to run one instance of an Oracle RAC database on a single
node in a cluster. Thus, this feature allows you to consolidate many databases
into a single cluster for easier management, while still providing high
availability by quickly relocating instances in the event of server failure.
If the node running your Oracle RAC One Node database becomes overloaded,
you can relocate the instance to another node in the cluster using the online
database relocation utility (srvctl relocate database), with no downtime for
application users.
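As a rough illustration of the relocation command mentioned above, the sketch below only builds the argument list for srvctl relocate database; it does not run anything. The -d, -n, and -w options are the database name, target node, and timeout in minutes as I understand the 11.2 syntax, so confirm them with the srvctl help on your release. The database name "sales" and node name "node2" are hypothetical.

```python
def relocate_command(db_unique_name, target_node=None, timeout_minutes=None):
    """Build (but do not execute) an argv list for an online relocation.

    Option letters are assumed from the 11.2 srvctl syntax; verify them
    against the srvctl help output on your own release.
    """
    argv = ["srvctl", "relocate", "database", "-d", db_unique_name]
    if target_node is not None:
        argv += ["-n", target_node]           # node to relocate the instance to
    if timeout_minutes is not None:
        argv += ["-w", str(timeout_minutes)]  # online relocation timeout
    return argv


# Hypothetical database and node names, for illustration only.
print(" ".join(relocate_command("sales", target_node="node2", timeout_minutes=30)))
# prints: srvctl relocate database -d sales -n node2 -w 30
```

Returning a list rather than a single string keeps the command safe to hand to a process launcher without shell quoting issues.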
You can allocate server resources to multiple instances
using Oracle Database Resource Manager Instance
Caging. Server scalability is unlimited, and if applications grow to require
more resources than a single node can supply, you can perform an online upgrade
to a traditional multinode Oracle RAC configuration.
The high availability benefits of using Oracle RAC One Node
include the following:
·
Offers better database availability than traditional cold
failover solutions
·
Provides better virtualization for databases than
hypervisor-based solutions
·
Enables online migration of database instances and online
patching and upgrading of operating system and database software (incurring no
downtime)
·
Delivers a comprehensive, single-vendor solution, with no need
to implement third-party products
·
Is ready to scale and upgrade to multinode Oracle RAC
·
Provides a standardized environment and a common toolset for
both single-node and multinode Oracle database deployments
·
Is less expensive than cold failover solutions or a full Oracle
RAC deployment
·
Fully supports Oracle Data Guard. Any database in a Data Guard
configuration, whether a primary or standby database, can be an Oracle RAC One
Node database.
An architecture that combines Oracle Database with Oracle
RAC is inherently a highly available system. Unlike
a traditional monolithic database server that is expensive and is not flexible
to changing capacity and resource demands, Oracle RAC combines the processing
power of multiple interconnected computers to provide system redundancy,
scalability, and high availability.
The clusters that are typical of Oracle RAC environments can
provide continuous service for both planned and unplanned outages. Oracle RAC
builds higher levels of availability on top of the standard Oracle Database
features. All single-instance high availability features, such as the Flashback
technologies and online reorganization, also apply to Oracle RAC. Applications
scale in an Oracle RAC environment to meet increasing data processing demands
without changing the application code. In addition, allowing maintenance
operations to occur on a subset of components in the cluster while the
application continues to run on the rest of the cluster can reduce planned
downtime.
Oracle RAC exploits the redundancy that is provided by
clustering to deliver availability with n - 1 node failures in an n-node cluster.
Unlike the cold cluster model where one node is completely idle, all instances
and nodes can be active to scale your application. Communication among the
nodes is optimized by means of Redundant Interconnect Usage (without requiring the use of
bonding or other technologies) to provide stability, reliability, and
scalability.
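The n - 1 claim above can be checked with a deliberately simplified model. The sketch assumes that any surviving node can serve the database and ignores capacity limits and cluster membership rules; the node names are hypothetical.

```python
from itertools import combinations


def is_available(nodes, failed):
    """The database stays reachable while at least one node survives."""
    return len(set(nodes) - set(failed)) > 0


def remaining_capacity(nodes, failed):
    """Fraction of the original active nodes still serving work."""
    return len(set(nodes) - set(failed)) / len(nodes)


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical 4-node cluster

# Every combination of 0 .. n - 1 failed nodes still leaves a survivor.
for k in range(len(nodes)):
    for failed in combinations(nodes, k):
        assert is_available(nodes, failed)

# Only the loss of all n nodes makes the database unavailable.
assert not is_available(nodes, nodes)

print(remaining_capacity(nodes, ["node1"]))  # prints: 0.75
```

Because all nodes are active, losing one node of four removes a quarter of the capacity, whereas a cold cluster keeps a whole node idle until a failure occurs.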
Oracle Database with Oracle RAC architecture provides the
following benefits over a traditional monolithic database server and the cold
cluster failover model:
·
Scalability across database instances
·
Flexibility to increase processing capacity using commodity
hardware without downtime or changes to the application
·
Ability to tolerate and quickly recover from computer and
instance failures (measured in seconds)
·
Optimized communication in the cluster over redundant network
interfaces, without using bonding or other technologies
Oracle Grid
Infrastructure and Oracle RAC make use of Redundant Interconnect Usage that
distributes network traffic and ensures optimal communication in the cluster.
This functionality is available starting with Oracle Database 11g Release 2 (11.2.0.2). In previous releases,
technologies like bonding or trunking were used to make use of redundant
networks for the interconnect.
·
Rolling upgrades for system and hardware changes
·
Rolling patch upgrades for some interim patches, security
patches, CPUs, and cluster software
·
Fast, automatic, and intelligent connection and service
relocation and failover
·
Comprehensive manageability integrating database and cluster
features with Grid Plug and Play and policy-based cluster and capacity
management
·
Load balancing advisory and run-time connection load balancing
help redirect and balance work across the appropriate resources
Oracle Data Guard is a high
availability and disaster-recovery solution that provides very fast automatic
failover (referred to as fast-start failover) after database failures, node
failures, corruption, and media failures. Furthermore, the standby databases
can be used for read-only access and subsequently for reader farms, for
reporting, and for testing and development.
Although traditional solutions (such as backup and recovery from
tape, storage-based remote mirroring, and database log shipping) can deliver
some level of high availability, Oracle Data Guard provides the most
comprehensive high availability and disaster recovery solution for Oracle
databases.
Oracle Data Guard provides a number of advantages over
traditional solutions, including the following:
·
Fast, automatic or automated database failover for data
corruptions, lost writes, and database and site failures
·
Automatic corruption repair, which replaces a corrupted
block on the primary or physical standby by copying a good block from a
physical standby or primary database
·
Most comprehensive protection against data corruptions and lost
writes on the primary database
·
Reduced downtime for storage, Oracle ASM, Oracle RAC, system
migrations and some platform migrations, and changes using Data Guard
switchover
·
Reduced downtime with Oracle Data Guard rolling upgrade
capabilities
·
Ability to off-load primary database activities (such as backups,
queries, or reporting) to a standby database without sacrificing the RTO and
RPO, and to use the standby database as a read-only resource using the
real-time query apply lag capability
·
Ability to integrate non-database files using Oracle Database File System (DBFS) as part of the full site
failover operations
·
No need for instance restart, storage remastering, or
application reconnections after site failures
·
Transparency to applications
·
Transparent and integrated support for application failover
·
Effective network utilization
Fast-Start Fault Recovery
Oracle provides fast and predictable recovery from system faults
and database failures. The Fast-Start Fault Recovery technology included in
Oracle Database automatically bounds database recovery time at startup by using
self-tuned checkpoint processing. This makes recovery time fast and
predictable, and improves the ability to meet service-level objectives. The
Oracle Fast-Start Fault Recovery feature can reduce recovery time on a
heavily laden database from tens of minutes to a few seconds.
Fast-Start Fault Recovery features include:
·
Predictable, bounded recovery from instance, database, and
computer failures
·
Database checkpointing that is self-tuning to maintain a desired
recovery time objective
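To make the idea of bounded recovery concrete, here is a toy model, not Oracle's actual checkpoint algorithm. It assumes redo is generated and replayed at constant, hypothetical rates, and that checkpointing writes just enough to keep the estimated replay time within a target (the role played by the FAST_START_MTTR_TARGET initialization parameter).

```python
def simulate(redo_per_sec, apply_per_sec, target_secs, duration_secs):
    """Return the worst estimated recovery time (seconds) seen during the run.

    Toy model: redo not yet covered by a checkpoint accumulates every second.
    Whenever replaying it would take longer than target_secs, checkpointing
    trims the backlog back to what can be replayed within the target.
    """
    pending = 0.0   # redo that crash recovery would have to replay
    worst = 0.0
    for _ in range(duration_secs):
        pending += redo_per_sec
        budget = target_secs * apply_per_sec  # redo replayable within target
        if pending > budget:
            pending = budget                  # checkpoint advances
        worst = max(worst, pending / apply_per_sec)
    return worst


# Hypothetical rates: 50 units of redo generated and 200 replayed per second.
print(simulate(50, 200, 30, 600))            # prints: 30.0
print(simulate(50, 200, float("inf"), 600))  # prints: 150.0
```

With a 30-second target the estimated recovery time never exceeds 30 seconds; with no target it grows with the amount of redo generated since startup.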
Automatic Storage Management
Automatic Storage Management (ASM) provides a vertically
integrated file system and volume manager directly in the Oracle kernel,
resulting in:
·
Significantly less work to provision database storage
·
Higher level of availability
·
Elimination of the expense, installation, and maintenance of
specialized storage products
·
Unique capabilities for database applications
For optimal performance, ASM spreads files across all available
storage. To protect against data loss, ASM extends the concept of SAME (stripe
and mirror everything) and adds more flexibility in that it can mirror at the
database file level rather than the entire disk level.
More importantly, ASM simplifies the processes of setting up
mirroring, adding disks, and removing disks. Instead of managing hundreds and
possibly thousands of files (as in a large data warehouse), DBAs using ASM
create and administer a larger-grained object called a disk group. The disk
group identifies the set of disks that are managed as a logical unit.
Automation of file naming and placement of the underlying database files saves
administrators time and ensures adherence to standard best practices.
The ASM native mirroring mechanism (2-way or 3-way) is an option
that protects against storage failures. With ASM mirroring, you can provide an
additional level of data protection with the use of failure groups. A failure
group is a set of disks sharing a common resource (disk controller or an entire
disk array) whose failure can be tolerated. Once failure groups are defined, ASM
places redundant copies of the data in separate failure groups. This ensures
that the data is available and transparently protected against the failure of
any component in the storage subsystem.
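The placement rule described above can be sketched as follows. This is a minimal illustration that uses simple round-robin placement, which is an assumption and not ASM's real allocator; the failure group and disk names are hypothetical.

```python
from itertools import cycle


def place_extents(disks_by_group, n_extents, copies=2):
    """Assign each extent `copies` disks, each in a different failure group."""
    groups = sorted(disks_by_group)
    if copies > len(groups):
        raise ValueError("need at least as many failure groups as copies")
    next_disk = {g: cycle(disks_by_group[g]) for g in groups}
    placement = []
    for extent in range(n_extents):
        # Consecutive group indices (mod group count) are always distinct.
        chosen = [groups[(extent + i) % len(groups)] for i in range(copies)]
        placement.append([(g, next(next_disk[g])) for g in chosen])
    return placement


def survives(placement, failed_group):
    """True if every extent keeps a copy outside the failed group."""
    return all(
        any(group != failed_group for group, _disk in extent_copies)
        for extent_copies in placement
    )


# Hypothetical layout: three controllers, two disks each, 2-way mirroring.
disks = {
    "controller_a": ["d1", "d2"],
    "controller_b": ["d3", "d4"],
    "controller_c": ["d5", "d6"],
}
placement = place_extents(disks, n_extents=12)
print(all(survives(placement, g) for g in disks))  # prints: True
```

Because the two copies of an extent never share a failure group, losing any single controller in this layout still leaves one readable copy of every extent.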
ASM provides the following benefits:
·
Provides the ability to mirror and stripe across drives and
storage arrays
·
Automatically re-mirrors from a failed drive to remaining drives
·
Automatically rebalances stored data when disks are added or
removed while the database remains online
·
Allows for operational simplicity in managing database storage
·
Provides local read capability, which gives better performance
in an extended cluster
·
Supports very large databases
·
Supports ASM rolling upgrades
·
Supports finer granularity in tuning and security
·
Supports ASM Fast Mirror Resync, which provides fast repair after a temporary
disk failure
Recovery Manager
Recovery Manager (RMAN) is an Oracle utility to manage database
backup and, more importantly, the recovery of the database. RMAN eliminates
operational complexity while providing superior performance and availability of
the database.
RMAN determines the most efficient method of executing the
requested backup, restoration, or recovery operation and then submits these
operations to the Oracle Database server for processing. RMAN and the server
automatically identify modifications to the structure of the database and
dynamically adjust the required operation to adapt to the changes.
RMAN provides the following benefits:
·
Automatic channel failover on backup and restore operations
·
Automatic failover to a previous backup when the restore
operation discovers a missing or corrupt backup
·
Automatic creation of new database and temporary files during
recovery
·
Automatic recovery through a previous point-in-time
recovery—recovery through resetlogs
·
Block media recovery enables the data file to remain online
while fixing the block corruption
·
Fast incremental backups using block change tracking
·
Fast backup and restore operations with intrafile and interfile
parallelism
·
Enhanced security with Virtual Private Catalog
·
Lower space consumption when creating a database over the
network by eliminating staging areas
·
Merge incremental backups into image copies in the background
providing up-to-date recoverability
·
Optimized backup and restore of required files only
·
Retention policy ensures that relevant backups are retained
·
Ability to resume backup and restore of previously failed
operations
·
Automatic backup of the control file and the server parameter
file ensuring that backup metadata is available in times of database structural
changes and media failure and disasters
·
Online backup does not require the database to be placed into
hot backup mode
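One of the benefits listed above, automatic failover to a previous backup, follows a simple selection rule that can be sketched outside RMAN. The Backup record and its fields are hypothetical and only model the decision; RMAN performs this check internally during a restore.

```python
from dataclasses import dataclass


@dataclass
class Backup:
    taken_at: int   # backup completion time, e.g. epoch seconds
    present: bool   # backup piece can be found on disk or tape
    corrupt: bool   # backup piece failed validation


def choose_backup(backups):
    """Return the newest usable backup, falling back to older ones."""
    for backup in sorted(backups, key=lambda b: b.taken_at, reverse=True):
        if backup.present and not backup.corrupt:
            return backup
    return None  # nothing usable: the restore cannot proceed


# Hypothetical history: the newest backup is corrupt, the next one is missing.
history = [
    Backup(taken_at=100, present=True, corrupt=False),
    Backup(taken_at=200, present=False, corrupt=False),
    Backup(taken_at=300, present=True, corrupt=True),
]
print(choose_backup(history).taken_at)  # prints: 100
```

Falling back to an older backup keeps the restore going, at the cost of applying more archived redo during recovery.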
Oracle Secure Backup
Oracle Secure Backup is a centralized tape backup management
solution providing performant, heterogeneous data protection in distributed
UNIX, Linux, Windows, and Network Attached Storage (NAS) environments. By
protecting file system and Oracle database data, Oracle Secure Backup provides
a complete tape backup solution for your IT environment.
Oracle Secure Backup is tightly integrated with RMAN to provide
the media management layer for RMAN, supporting releases since Oracle9i. With
optimized integration points, Oracle Secure Backup and RMAN provide the fastest
and most efficient tape backup capability for the Oracle database.
You can back up distributed servers to local and remote tape
devices from a central Oracle Secure Backup administrative server using backup
policies, calendar-based scheduling for lights-out operations, or
on-demand backup for immediate requirements. With its highly scalable
client/server architecture, Oracle Secure Backup provides local and remote data
protection, leveraging SSL for secure intradomain communication and two-way
server authentication.
The following list describes the key benefits of Oracle Secure
Backup:
·
Optimized tape backup for the Oracle database by backing up only
the currently used blocks and increasing backup performance by 10% to 25%.
·
Policy-based management allows backup administrators to exercise
precise control over the backup domain.
·
Dynamic drive sharing for increased tape resource use.
·
Heterogeneous storage area network (SAN) support allowing NAS,
UNIX, Windows, and Linux to share tape drives and media.
·
File system backup at the file, directory, file system or raw
partition level with full, incremental and offsite backup scheduling.
·
Integrated with Oracle Enterprise Manager, providing an
intuitive, familiar interface.
·
Backup encryption to tape.
·
Broad tape-device support for new and legacy tape devices in SAN
and SCSI environments.
·
Network Data Management Protocol (NDMP) support for highly
efficient backup of NAS filers.
·
Scalable, low-cost licensing model reduces IT costs and
operational considerations.
Flash Recovery Area
The flash recovery area is a unified storage location for
all recovery-related files and activities in Oracle Database. After this
feature is enabled, all RMAN backups, archived redo logs, control file
autobackups, and data file copies are automatically written to a specified file
system or Automatic Storage Management (ASM) disk group, and the management of this
disk space is handled by RMAN and the database server.
Performing a backup to disk is faster because using the flash
recovery area eliminates the bottleneck of writing to tape. More importantly,
if database media recovery is required, then data file backups are readily
available. Restoration and recovery time is reduced because you do not need to
find a tape and a free tape device to restore the needed data files and
archived redo logs.
The flash recovery area provides the following benefits:
·
Unified storage location of related recovery files
·
Management of the disk space allocated for recovery files, which
simplifies database administration tasks
·
Fast, reliable disk-based backup and restoration
·
Ability to back up and restore the entire flash recovery area
·
Ability to tolerate failures to the flash recovery area
LogMiner
Oracle log files contain useful information about the activities
and history of the Oracle database. Log files contain all data necessary to
perform database recovery, and also record all changes made to the data and
metadata in the database.
LogMiner is a fully relational tool that allows redo log files
to be read, analyzed, and interpreted using SQL. Using LogMiner, you can
analyze log files to:
·
Track or audit changes to data
·
Provide supplemental information for tuning and capacity
planning
·
Retrieve critical information for debugging complex applications
·
Recover deleted data
·
Provide additional browser-based simplification to help
troubleshoot and resolve logical failures
LogMiner features include:
·
Pinpointing when a logical corruption to the database—such as
errors made at the application level—may have occurred
·
Determining the necessary actions to perform fine-grained
recovery at the transaction level
·
Providing performance tuning and capacity planning through trend
analysis
·
Performing post auditing