Oracle high availability architecture and best practices manual
Enable Autonomous Data Guard for mission-critical production databases that have stricter uptime requirements and lower data-loss tolerance than databases with the default high availability configuration. Enabling Autonomous Data Guard adds one symmetric standby database with Oracle Data Guard on an Exadata rack located in another availability domain or in another region. The primary and standby database systems are configured symmetrically by default so that performance service levels are maintained after Data Guard role transitions.
Oracle Data Guard uses asynchronous redo transport in Maximum Performance mode by default to ensure zero application performance impact. Zero data loss protection can be achieved by switching to synchronous redo transport in Maximum Availability mode; however, the Maximum Availability protection mode with synchronous redo transport is available only with Autonomous Database on Dedicated Infrastructure.
The standby database can be placed within the same availability domain, across availability domains, or across regions. As with databases that are not Data Guard-enabled, each Autonomous Database application service resides in at least one Oracle RAC instance and will automatically fail over to another available Oracle RAC instance, as previously described. The read-only standby database provides expanded application services to offload reporting, queries, and some updates.
The Database Backup Cloud Service schedules automated backups, which are stored in Oracle Cloud Infrastructure Object Storage and replicated to another availability domain. Those backups can be used to restore databases in the event of a double disaster where both primary and standby databases are lost.
Local and remote virtual cloud network peering provides a secure, high-bandwidth network across availability domains and regions for all traffic between primary and standby servers. When you follow Maximum Availability Architecture best practices for continuous application service, most months should have zero downtime.
The uptime service-level objective does not include downtime resulting from user-initiated high availability tests, user-initiated Data Guard switchover tests, the detection time needed to determine whether the primary database is down, or the time leading up to a customer-initiated manual Data Guard switchover or failover operation.
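The exclusions above change how monthly uptime is computed. The following Python sketch assumes a simple model in which each outage is tagged with a category; the category names, the `Outage` type, and the 30-day month are hypothetical, and only outages outside the excluded categories count against the objective.

```python
from dataclasses import dataclass

# Hypothetical outage categories that, per the text above, do not count
# against the uptime service-level objective.
EXCLUDED = {
    "user_ha_test",            # user-initiated high availability tests
    "user_switchover_test",    # user-initiated Data Guard switchover tests
    "failure_detection",       # time to determine that the primary is down
    "pre_manual_role_change",  # time before a customer-initiated switchover/failover
}

@dataclass
class Outage:
    category: str
    minutes: float

def counted_downtime(outages: list[Outage]) -> float:
    """Sum only the downtime minutes that count against the objective."""
    return sum(o.minutes for o in outages if o.category not in EXCLUDED)

def uptime_percent(outages: list[Outage], minutes_in_month: float = 30 * 24 * 60) -> float:
    """Uptime as a percentage of the month, after applying the exclusions."""
    return 100.0 * (1 - counted_downtime(outages) / minutes_in_month)
```

For instance, a month with a 20-minute user-initiated HA test, 1 minute of failure detection, and 4.32 minutes of other downtime works out to 99.99% under this model, because only the 4.32 minutes are counted.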
You can choose whether your database failover site is located in a different availability domain within the same region or in a different region, contingent upon application or business requirements and data center availability.
Events that require failover to the standby database using Autonomous Data Guard include complete container database (CDB) failure, the most common disaster, and complete storage or data center failures. For these events, zero data loss Data Guard failover is available for Autonomous Data Guard on Dedicated Infrastructure.
Ensure that network connectivity to Oracle Cloud Infrastructure is reliable so that you can access your tenancy's Autonomous Database resources. Follow the guidelines for connecting to your autonomous database on Shared Infrastructure or Dedicated Infrastructure. Applications must connect to the predefined service names and use the downloaded client credentials, which include the proper tnsnames.ora entries. For more details about enabling continuous application service through planned and unplanned outages, see Application Checklist for Continuous Service for MAA Solutions.
Total Cost of Ownership and Return on Investment
The business impact analysis categorizes business processes by the severity of the impact of IT-related outages. It calculates the quantifiable loss risk from unplanned and planned IT outages for each of these business processes, and it considers essential business functions, people and system resources, government regulations, and internal and external business dependencies.
The analysis is based on objective and subjective data gathered from interviews with knowledgeable and experienced personnel, and it reviews business practice histories, financial reports, IT system logs, and so on. For example, consider a semiconductor manufacturer with chip fabrication plants located worldwide.
Semiconductor manufacturing is an intensely competitive business requiring a huge financial investment that is amortized over high production volumes. The human resource applications used by plant administration are unlikely to be considered as mission-critical as the applications that control the manufacturing process in the plant.
Failure of the applications that support manufacturing affects production levels and has a direct impact on the company's financial results. As another example, an internal knowledge management system is likely to be considered mission-critical for a management consulting firm, because a client-focused business depends on its consultants and knowledge workers having access to internal research.
The cost of downtime of such a system is extremely high for this business. Similarly, an e-commerce company is highly dependent on customer traffic to its website to generate revenue. Any disruption in service or loss of availability can degrade the customer experience and drive customers to competitors. Thus, the company needs to ensure that its infrastructure can scale to handle spikes in customer traffic. Sometimes this is not possible with on-premises hardware, and by moving to the cloud the company can help ensure that its systems remain operational.
A complete business impact analysis provides the insight needed to quantify the cost of unplanned and planned downtime. Understanding this cost is essential because it helps prioritize your high availability investment and directly influences the high availability technologies that you choose to minimize the downtime risk. Various reports have been published documenting the costs of downtime in different industries.
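The business impact analysis described earlier turns these examples into numbers. As a minimal sketch, assuming that the loss for each business process can be approximated by an hourly cost multiplied by expected outage hours, the Python below ranks processes by expected annual downtime cost. The `BusinessProcess` type and all figures are hypothetical and serve only to illustrate the semiconductor example, where plant HR applications rank far below manufacturing control.

```python
from dataclasses import dataclass

@dataclass
class BusinessProcess:
    name: str
    cost_per_hour: float             # estimated loss per hour of downtime (hypothetical)
    unplanned_hours_per_year: float  # expected unplanned outage hours
    planned_hours_per_year: float    # expected planned maintenance hours

    def annual_downtime_cost(self) -> float:
        """Expected yearly loss from planned plus unplanned downtime."""
        return self.cost_per_hour * (self.unplanned_hours_per_year + self.planned_hours_per_year)

def prioritize(processes):
    """Order processes by expected annual downtime cost, highest first."""
    return sorted(processes, key=lambda p: p.annual_downtime_cost(), reverse=True)

# Hypothetical figures for the semiconductor manufacturer example.
processes = [
    BusinessProcess("plant_hr", cost_per_hour=1_000, unplanned_hours_per_year=10, planned_hours_per_year=20),
    BusinessProcess("fab_control", cost_per_hour=500_000, unplanned_hours_per_year=2, planned_hours_per_year=4),
]
```

Even with fewer outage hours, the manufacturing control process dominates the ranking, which is the kind of result that should steer high availability investment.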
Examples include costs that range from millions of dollars for each hour of brokerage operations and credit card sales, to tens of thousands of dollars for each hour of package shipping services.
These numbers are staggering. The Internet and the cloud can connect a business directly to millions of customers. Application downtime can disrupt this connection, cutting off a business from its customers. In addition to lost revenue, downtime can negatively affect customer relationships, competitive advantages, legal obligations, industry reputation, and shareholder confidence. The business impact analysis determines your tolerance to downtime, also known as the recovery time objective (RTO).
An RTO is defined as the maximum amount of time that an IT-based business process can be down before the organization starts suffering unacceptable consequences (financial losses, customer dissatisfaction, reputational damage, and so on). RTO indicates the downtime tolerance of a business process or of an organization in general. RTO requirements are driven by the mission-critical nature of the business; for a system running a stock exchange, for example, the RTO is zero or near zero. An organization is likely to have varying RTO requirements across its business processes.
For example, for a high-volume e-commerce website where customers expect rapid response times and switching costs are very low, the web-based customer interaction system that drives e-commerce sales is likely to have an RTO of zero or close to zero.
However, the RTO of the systems that support back-end operations, such as shipping and billing, can be higher. If these back-end systems are down, then the business may resort to manual operations temporarily without a significant visible impact.
Some organizations have varying RTOs based on the probability of failures. Typically, business-critical customers have an RTO of less than 1 minute for local failures, and may have a higher RTO of less than 1 hour for disasters. For mission-critical applications the RTOs may indeed be the same for all unplanned outages.
The business impact analysis also determines your tolerance to data loss, also known as the recovery point objective (RPO). The RPO is the maximum amount of data that an IT-based business process can lose without harm to the organization. RPO measures the data-loss tolerance of a business process or of an organization in general.
This data loss is often measured in terms of time, for example, zero, seconds, hours, or days of data loss. A stock exchange where millions of dollars' worth of transactions occur every minute cannot afford to lose any data, so its RPO must be zero. The web-based sales system in the e-commerce example does not require an RPO of zero, although a low RPO is essential for customer satisfaction.
However, its back-end merchandising and inventory update system can have a higher RPO because lost data can be reentered.
You must make an objective evaluation of the skill sets, management resources, and tools available in an organization, and the degree to which the organization can successfully manage all elements of a high availability architecture.
Just as RPO and RTO measure an organization's tolerance for downtime and data loss, your manageability goal measures the organization's tolerance for complexity in the IT environment. When less complexity is a requirement, simpler methods of achieving high availability are preferred over methods that may be more complex to manage, even if the latter could attain more aggressive RTOs and RPOs. Understanding manageability goals helps organizations differentiate between what is possible and what is practical to implement.
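To make the RTO and RPO definitions above concrete, the following sketch checks a candidate architecture's estimated recovery time and data loss against objectives that vary by failure type. The RTO values loosely follow the business-critical example given earlier (under 1 minute for local failures, under 1 hour for disasters); the zero RPO values, the failure-type names, and the estimates are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Objectives:
    rto_seconds: float  # maximum tolerable downtime
    rpo_seconds: float  # maximum tolerable data loss, expressed as time

def meets(obj: Objectives, recovery_s: float, data_loss_s: float) -> bool:
    """True if an outage with this recovery time and data loss is within the objectives."""
    return recovery_s <= obj.rto_seconds and data_loss_s <= obj.rpo_seconds

# Hypothetical objectives for one business-critical process, by failure type.
REQUIREMENTS = {
    "local_failure": Objectives(rto_seconds=60, rpo_seconds=0),
    "disaster": Objectives(rto_seconds=3600, rpo_seconds=0),
}

def gaps(requirements: dict, estimates: dict) -> list:
    """Failure types whose estimated (recovery_s, data_loss_s) misses the objectives.
    A failure type with no estimate is reported as a gap."""
    return [
        failure
        for failure, obj in requirements.items()
        if failure not in estimates or not meets(obj, *estimates[failure])
    ]

# Hypothetical estimates for a candidate design whose asynchronous replication
# can lose a few seconds of data in a disaster.
estimates = {"local_failure": (30, 0), "disaster": (1800, 5)}
```

Here the design meets the local-failure objectives but not the zero-RPO disaster objective, which is the kind of gap that would push a process toward a higher tier.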
All of the improvements that Silver offers compared to Bronze are related to RTO for server outages and for several frequently executed types of planned maintenance.
Areas of improvement compared to Bronze are in parentheses. Examples of such planned maintenance include hardware or operating system maintenance and database patches that cannot be applied online but are qualified for Oracle RAC rolling installation.
The Gold tier builds upon Silver by using database replication technology to eliminate the single point of failure and provide a much higher level of data protection and high availability for all types of unplanned outages, including data corruptions, database failures, and site failures. The existence of a replicated copy also provides substantial advantages for reducing downtime during planned maintenance.
RTO is reduced to seconds or minutes, with an accompanying RPO of zero or near zero depending on configuration. An overview of the Gold tier is shown in the following figure. The Gold tier adds advanced high availability components to achieve the improved service levels described in the following sections. Oracle Active Data Guard maintains one or more synchronized physical replicas (standby databases) at a remote location that are used to eliminate the single point of failure for a production database (the primary database).
Choice of zero or near-zero data loss potential. Oracle Active Data Guard performs real-time replication of changes from a primary to a standby database.
Administrators can choose synchronous transport with Maximum Availability mode for a guarantee of zero data loss. Alternatively, they can choose asynchronous transport with Maximum Performance mode for near-zero data loss. Maximum Performance can achieve sub-second data loss exposure when sufficient network bandwidth is provided to accommodate the transport volume. An Oracle Active Data Guard standby database can quickly take over production and restore service if there is a database or site outage that impacts the availability of the primary database.
The standby database is always running; it does not need to be restarted to transition to the primary role, and role transitions can complete in less than 60 seconds, even on heavily loaded systems. Data Guard Fast-Start Failover can execute the failover automatically, which accelerates recovery time by eliminating the delay required for an administrator to be notified and respond to an outage. Fast-Start Failover uses role-specific database services and the Oracle client notification framework to ensure that applications quickly drop their connections to a failed primary database and automatically reconnect to the new primary.
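Whether Maximum Performance keeps data loss exposure sub-second depends on whether the network can ship redo as fast as the primary generates it. The following sketch is a simplified model, not how Data Guard measures transport lag: it steps through hypothetical per-second redo generation rates and reports the peak amount of redo generated but not yet shipped, which approximates what could be lost if a failover happened at that moment.

```python
def peak_unshipped_redo(redo_mb_per_s, bandwidth_mb_per_s):
    """Simulate one-second steps and return the largest backlog (MB) of redo
    generated on the primary but not yet shipped to the standby.

    Simplified model: up to bandwidth_mb_per_s of redo is shipped each second;
    compression and protocol overhead are ignored.
    """
    backlog = 0.0
    peak = 0.0
    for generated in redo_mb_per_s:
        backlog += generated
        backlog -= min(backlog, bandwidth_mb_per_s)  # ship what the link allows
        peak = max(peak, backlog)
    return peak

# Hypothetical workload: a batch job briefly quadruples the redo rate.
workload = [50, 50, 200, 200, 50]
```

With this workload, a 100 MB/s link leaves up to 200 MB of redo unshipped at the peak, while a 250 MB/s link keeps the backlog at zero at one-second granularity, the situation in which sub-second exposure is plausible.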
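The reconnect behavior described above can be pictured with a generic client-side retry loop. This sketch does not use any Oracle driver API: `connect` is a caller-supplied function (an assumption), and the loop retries a list of endpoints until one accepts the connection, which is roughly what an application observes while a role transition completes. In practice, applications rely on role-specific services and client notification rather than a hand-written loop like this one.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def connect_with_retry(endpoints: Sequence[str], connect: Callable[[str], T],
                       attempts: int = 3, delay_s: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Try each endpoint in order; if none accepts, wait and try the whole list again."""
    last_error = None
    for attempt in range(attempts):
        for endpoint in endpoints:
            try:
                return connect(endpoint)
            except ConnectionError as exc:
                last_error = exc  # endpoint down or not yet in the primary role
        if attempt < attempts - 1:
            sleep(delay_s)  # give the role transition time to complete
    raise ConnectionError(f"no endpoint reachable after {attempts} attempts") from last_error
```

The `sleep` parameter is injectable only so the behavior can be exercised without real delays.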
Role transitions can also be executed manually using either a command line interface or Oracle Enterprise Manager. Transparent replication. Oracle Active Data Guard performs complete, one-way physical replication of an Oracle Database with the following characteristics: high performance, simple to manage, support for all data types, applications, and workloads such as DML, DDL, OLTP, batch processing, data warehouse, and consolidated databases.
Production offload for high return on investment (ROI). Oracle Active Data Guard standby databases can be opened read-only while replication is active, and they can be used to offload ad-hoc queries and reporting workloads from the production database.
The offload increases ROI in standby systems and improves performance for all workloads by utilizing capacity that would otherwise be idle.
It also provides continuous application validation because the standby systems are ready to support production workloads.
Backup offload. Primary and standby systems are exact physical replicas, enabling backups to be offloaded from the primary to the standby database. A backup taken at the standby can be used to restore either the primary or standby database.
This provides administrators with flexible recovery options without burdening production systems with the overhead of performing backups. Reduced downtime for planned maintenance. Standby databases can be used to upgrade to new Oracle Patch Sets (for example, a new patch release), and total downtime is limited to the time required to switch a standby database to the primary production role after maintenance has been completed.
An Oracle Active Data Guard standby performs continuous Oracle validation to ensure that corruption is not propagated from the source database. It detects physical and logical intra-block corruptions that can occur independently at either the primary or the standby database. For more details, see My Oracle Support Note. Automatic block repair. Oracle Active Data Guard automatically repairs physically corrupt blocks at either database by retrieving a good copy of the block from the opposite database. No application changes are required, and the repair is transparent to the user.
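Automatic block repair can be illustrated with a toy model. The sketch below assumes each replica stores a checksum recorded when a block was written; on read, a block whose contents no longer match its checksum is replaced with a valid copy from the other replica. The `Replica` class and `read_with_repair` function are hypothetical, and Oracle's actual corruption checks are more extensive than a single hash comparison.

```python
import hashlib

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Replica:
    """Toy database copy: blocks plus the checksum recorded when each block was written."""

    def __init__(self, name: str):
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.checksums: dict[int, str] = {}

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.checksums[block_id] = checksum(data)

    def is_valid(self, block_id: int) -> bool:
        return checksum(self.blocks[block_id]) == self.checksums[block_id]

def read_with_repair(local: Replica, peer: Replica, block_id: int) -> bytes:
    """Return a block, repairing it from the peer replica if the local copy is corrupt."""
    if local.is_valid(block_id):
        return local.blocks[block_id]
    if not peer.is_valid(block_id):
        raise IOError(f"block {block_id} is corrupt on both {local.name} and {peer.name}")
    good = peer.blocks[block_id]
    local.write(block_id, good)  # repair in place; the caller only sees good data
    return good
```

The caller never sees the corrupt block, which mirrors the point that the repair is transparent to applications.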
The points above explain how the Gold tier uses Oracle replication technology to maintain a synchronized copy, rather than storage remote mirroring products (for example, SRDF, Hitachi TrueCopy, and so on). For a more in-depth discussion of the differences, see Oracle Active Data Guard vs. Storage Remote Mirroring.
Oracle GoldenGate provides the option of logical replication to maintain a synchronized copy (the target database) of the production database (the source database). Logical replication is a more complex process than physical replication but provides greater flexibility to handle different replication scenarios and heterogeneous platforms.
From a data distribution perspective, logical replication is designed to efficiently replicate subsets of a source database to distribute data to other target databases. It can also be used to consolidate data from multiple source databases into a single target database (for example, an operational data store). From a high availability perspective, logical replication can be used to maintain a complete replica of a source database for high availability or disaster protection that is ready for immediate failover should the source database become unavailable.
Oracle GoldenGate uses a logical replication process. It reads changes from disk at a source database, transforms the data into a platform-independent file format, transmits the file to a target database, and then transforms the data into SQL inserts, updates, and deletes native to the target database. The target database contains the same data but is a different database from the source (for example, backups are not interchangeable).
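The capture, transform, and apply flow described above can be sketched in a few lines. This is a toy model, not Oracle GoldenGate's trail format or API: changes are serialized to JSON as a stand-in for a platform-independent format, and each record is turned back into a parameterized SQL statement for the target. The `capture` and `to_sql` names are hypothetical, and table and column names are assumed to be trusted identifiers.

```python
import json

def capture(op, table, key, values=None):
    """Serialize a captured change as a platform-independent record (JSON text)."""
    return json.dumps({"op": op, "table": table, "key": key, "values": values or {}}, sort_keys=True)

def to_sql(record_text):
    """Turn a change record into a (statement, parameters) pair for the target database."""
    rec = json.loads(record_text)
    table, key, values = rec["table"], rec["key"], rec["values"]
    where = " AND ".join(f"{col} = ?" for col in key)
    if rec["op"] == "insert":
        cols = list(key) + list(values)
        placeholders = ", ".join("?" for _ in cols)
        return (f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
                [*key.values(), *values.values()])
    if rec["op"] == "update":
        sets = ", ".join(f"{col} = ?" for col in values)
        return (f"UPDATE {table} SET {sets} WHERE {where}",
                [*values.values(), *key.values()])
    if rec["op"] == "delete":
        return (f"DELETE FROM {table} WHERE {where}", list(key.values()))
    raise ValueError(f"unknown operation {rec['op']!r}")
```

Because each record carries values rather than storage blocks, the same record can be applied to any target that understands the SQL, which is why logical replication can serve heterogeneous targets while producing a database that is not a physical copy of the source.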
Oracle GoldenGate logical replication provides increased flexibility to perform maintenance and migrations in a rolling manner that is not possible using Data Guard physical replication.
For example, Oracle GoldenGate enables replication between a source running on a big-endian platform and a target running on a little-endian platform (cross-endian replication). This makes it possible to execute platform migrations, with the additional advantage of being able to reverse the replication for fast fallback to the prior version after cutover. Oracle GoldenGate logical replication is a more sophisticated process that has a number of prerequisites that do not apply to Data Guard physical replication.
In return for these prerequisites Oracle GoldenGate provides unique capabilities to address advanced replication requirements. Refer to MAA Best Practices: Oracle Active Data Guard and Oracle GoldenGate for additional insights on the tradeoffs of each replication technology and requirements that may favor the use of one versus the other, or the use of both technologies in a complementary manner.
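Cross-endian replication illustrates why logical replication can do things physical replication cannot. The Python sketch below, using the standard `struct` module, shows that the same 32-bit integer has different byte layouts on big-endian and little-endian platforms: copying raw bytes changes the value on the target, while shipping the logical value and re-encoding it natively preserves it. It illustrates byte order only, not GoldenGate internals.

```python
import struct

value = 1_000_000

# The same 32-bit integer as laid out by a big-endian and a little-endian platform.
big_endian_bytes = struct.pack(">i", value)
little_endian_bytes = struct.pack("<i", value)

# Physical copy: a little-endian target reads the source's raw bytes in its own order.
physically_copied = struct.unpack("<i", big_endian_bytes)[0]

# Logical replication: decode the value on the source, ship the value itself,
# and let the target encode it in its native byte order.
shipped_value = struct.unpack(">i", big_endian_bytes)[0]
logically_replicated = struct.unpack("<i", struct.pack("<i", shipped_value))[0]
```

Here the copied bytes decode as 1078071040 instead of 1000000, while the logically replicated value is unchanged.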
Oracle Site Guard enables administrators to orchestrate switchover (a planned event) and failover (in response to an unplanned outage) of their Oracle environment, including multiple databases and applications, between a production site and a remote disaster recovery site. Reduction of errors due to prepared response to site failure. Oracle Site Guard reduces the possibility of human error in case of disasters.
Recovery strategies are mapped out, tested, and rehearsed in prepared responses within the application. Once an administrator initiates a Site Guard operation for disaster recovery, human intervention is not required. Coordination across multiple applications, databases, and various replication technologies. Oracle Site Guard automatically handles dependencies between different targets while starting or stopping a site. Site Guard integrates with Oracle Active Data Guard to coordinate multiple concurrent database failovers.
Site Guard also provides an easy mechanism to integrate with any storage remote mirroring product. It integrates with storage appliances to perform switchover or failover by using callouts to any user-specified storage role reversal scripts in the operation workflow.
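The dependency handling described above amounts to ordering targets so that each starts after the targets it depends on, with independent targets (such as multiple databases) handled concurrently. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) on a hypothetical site topology; the target names are assumptions, and the sketch does not reflect Site Guard's own configuration model.

```python
from graphlib import TopologicalSorter

# Hypothetical site topology: each target maps to the targets it depends on.
DEPENDS_ON = {
    "storage": set(),
    "db_orders": {"storage"},
    "db_hr": {"storage"},
    "app_server": {"db_orders", "db_hr"},
    "web_tier": {"app_server"},
}

def start_batches(deps):
    """Group targets into batches; every target in a batch can start concurrently
    because all of its dependencies were started in earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

def stop_batches(deps):
    """Stopping a site reverses the order: dependents stop before what they depend on."""
    return list(reversed(start_batches(deps)))
```

For this topology the two databases land in the same batch, which corresponds to failing over multiple databases concurrently once storage is available.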
Faster recovery time. Oracle Site Guard automation minimizes the manual coordination of recovery activities. This accelerates recovery time even compared to the case where all manual efforts are executed successfully. Site Guard also avoids the time-consuming resolution of human error that often accompanies manual implementation of complex procedures. Table summarizes the data protection offered by the Gold tier. Recovery time and data loss potential are dramatically reduced in the Gold tier compared to Silver.
Areas of improvement compared to the Silver tier are in parentheses. The Platinum tier builds upon Gold to provide the highest level of HA and data protection for applications that have zero tolerance for outages or data loss. Platinum introduces several new Oracle Database 12c capabilities as well as previously available products that have been enhanced with the latest release. Platinum masks the impact of outages to applications and users, ensuring that even in-flight transactions are preserved following recoverable failures.
It enables zero downtime maintenance, migrations, and application upgrades. It guarantees zero data loss in the event of failure of the primary database for any reason, regardless of the distance between sites. Finally, Platinum automatically manages the availability of database services and workload load balancing across database replicas in multiple sites.
An overview of the Platinum tier is provided in the following figure. Some applications will require a level of modification to achieve zero application outage using the capabilities provided by the Platinum tier. This explains why Platinum is described as providing zero application outage for Platinum-Ready Applications. Note that no application modifications are necessary to achieve zero data loss. Platinum-tier capabilities include Application Continuity, Edition-Based Redefinition, and Global Data Services.
Application Continuity protects applications from database session failures caused by instance, server, storage, network, or other related component failures, and even complete database failure. Application Continuity replays affected in-flight requests so that the failure appears to the application as a slightly delayed execution, masking the outage from the user. If an entire Oracle RAC cluster fails, making the database unavailable, Application Continuity replays the session, including the transaction, following an Oracle Active Data Guard failover.
While in many cases some modification to existing application code is required to use Application Continuity, it simplifies development of new applications by transparently handling recoverable failures. Oracle Active Data Guard is the only Oracle-aware replication technology that offers zero data loss failover for Oracle Database.
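Request replay can be illustrated with a toy wrapper that records the calls made in a request and, when a recoverable error occurs, opens a new session and replays them before retrying the failed call. This is a conceptual sketch only: `RecoverableError`, `ReplayingRequest`, and the session interface are hypothetical, and the real Application Continuity relies on the Oracle driver and database to decide what is safe to replay.

```python
class RecoverableError(Exception):
    """Stands in for an outage that the driver classifies as recoverable."""

class ReplayingRequest:
    """Toy model of request replay: record each successful call in the current
    request and, if the session is lost, replay them on a fresh session."""

    def __init__(self, open_session, max_replays=1):
        self.open_session = open_session
        self.session = open_session()
        self.history = []  # (function, args) pairs already executed in this request
        self.max_replays = max_replays

    def call(self, fn, *args):
        replays = 0
        while True:
            try:
                result = fn(self.session, *args)
                self.history.append((fn, args))
                return result
            except RecoverableError:
                if replays >= self.max_replays:
                    raise  # give up and surface the error to the application
                replays += 1
                self.session = self.open_session()
                for past_fn, past_args in self.history:
                    past_fn(self.session, *past_args)  # rebuild the lost session state
```

From the caller's point of view, the second call below simply returns a little later, which is the "slightly delayed execution" the text describes.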
Zero data loss is achieved using synchronous transport with Data Guard Maximum Availability mode. Network latency between primary and standby sites will affect database performance when synchronous transport is used.
As the distance between sites increases, so does latency and its impact on database performance. Because primary and secondary data centers are often separated by long distances, zero data loss failover is impractical to implement for many databases. Oracle Active Data Guard Far Sync with Oracle Database 12c eliminates this limitation by enabling zero data loss failover even when primary and standby databases are hundreds or thousands of miles apart, without impacting primary database performance.
It achieves this by using a lightweight forwarding mechanism that is simple to deploy and transparent to Oracle Active Data Guard failover or switchover operations. Far Sync, when used in combination with the Oracle Advanced Compression Option, also enables off-host transport compression to conserve network bandwidth. By combining Far Sync with Data Guard Fast-Start Failover (automatic database failover), Application Continuity can mask outages for in-flight transactions regardless of the distance between primary and standby sites.
Far Sync, therefore, enables two critical enhancements offered by the Platinum tier: zero data loss failover for any database and the ability to use Application Continuity regardless of the distance between sites. There are no application modifications required to take advantage of Far Sync.
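The latency argument above can be checked with simple arithmetic. Light travels through optical fiber at roughly 200,000 km/s (about 200 km per millisecond), so each synchronous commit waits at least one network round trip. The sketch below computes only this physical lower bound; real latency is higher because of routing, network equipment, and the time to write redo at the receiving side. The 1,000 km and 30 km distances are hypothetical values.

```python
FIBER_KM_PER_MS = 200.0  # approximate speed of light in optical fiber

def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the commit latency added by one network round trip."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Synchronous transport directly to a standby 1,000 km away, versus to a nearby
# Far Sync instance (hypothetically 30 km away) that forwards redo to the remote standby.
direct_to_standby = min_round_trip_ms(1000)  # at least 10.0 ms per commit
via_far_sync = min_round_trip_ms(30)         # at least 0.3 ms per commit
```

Because the primary waits only for the nearby Far Sync instance, the per-commit penalty stays small even when the standby itself is far away.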
The Platinum tier uses Oracle GoldenGate's advanced replication capabilities to implement zero downtime maintenance and migrations using bi-directional replication. In such a scenario, source and target are synchronized across versions using Oracle GoldenGate logical replication.
This handles cross-endian platform migrations. It also handles complex application upgrades that modify back-end objects where the replication mechanism must be able to transform data from old to new versions and vice versa.
Once the new version or platform is synchronized and stable, the bi-directional replication enables users to be gradually migrated to the new platform as they terminate sessions on the previous version and reconnect, providing a zero downtime experience.
Oracle GoldenGate bi-directional replication keeps old and new versions in sync during the migration process. This also provides for a quick fall back option should any unanticipated issues arise with the new version as load is added. Active-active bi-directional replication can also be used to increase availability service levels where a continuous read-write connection to multiple copies of the same data is required.
Bi-directional replication is not application transparent. It requires conflict detection and resolution when changes are made to the same record at the same time in multiple databases.
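Conflict detection and resolution can be sketched for a single keyed table. The sketch assumes each incoming change carries the row's before-image and a commit timestamp: if the target row still matches the before-image, the change applies cleanly; otherwise a conflict is detected and resolved by a "latest timestamp wins" rule. That rule is only one possible policy, and the `Row`, `Change`, and `apply_change` names are hypothetical rather than Oracle GoldenGate APIs.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Row:
    value: str
    updated_at: float  # commit timestamp assigned by the originating database

@dataclass(frozen=True)
class Change:
    key: str
    before: Optional[Row]  # row as the source saw it before the change (None for an insert)
    after: Row

def apply_change(table: dict, change: Change) -> str:
    """Apply one replicated change to a target table, detecting and resolving conflicts."""
    current = table.get(change.key)
    if current == change.before:
        table[change.key] = change.after
        return "applied"
    # Conflict: the target row was changed independently since the source read it.
    if current is None or change.after.updated_at > current.updated_at:
        table[change.key] = change.after
        return "conflict: incoming change wins"
    return "conflict: existing row kept"
```

In the usage scenario, two sites update the same account from the same starting value; the second change arrives at a target that has already moved on, so the conflict is detected rather than silently overwriting data.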