Cloud – The Evolution of the DBA

Traditionally, IT organisations have managed their infrastructure on-premises, with infrastructure DBAs responsible for maintaining data integrity, availability and reliability. A large proportion of their time is spent on tasks like patching, upgrades, capacity planning, user management, monitoring and troubleshooting. Infrastructure and application support teams are often physically separated, sometimes across the globe, and interact mostly via tickets passed between queues. Many organisations feel this culture of teams working as separate entities is no longer effective in a changing IT industry.

Cloud is accelerating change

Today, IT is increasingly leveraging the Cloud. This move towards Cloud-hosted services has implications for the traditional job roles that were established in company data centre environments.

Databases are now offered as Cloud-managed services, self-provisioned from a catalogue of solutions, with limited control and flexibility over features and customisations. With our traditional infrastructure DBA hats on, we may feel incapacitated or disarmed. But a change in mindset opens up a vast range of opportunities for DBAs to explore.

Databases in the Cloud are no longer considered pets for the DBA to nurture, cuddle and put a bow on. Rather, they are like cattle in a big herd – treated generically and managed at scale. Pets often need special treatment and are difficult to replace. As DBAs, we have felt comfortable knowing that we were heavily depended upon as specialists in their exact needs (except perhaps when being barked at at 3am!).

What does this mean for the DBA?

After years in the IT industry, it feels unnatural to accept that being an expert in a single technology may no longer be the best path. Having a range of skills and knowledge of multiple technologies is now increasingly valuable. The rise of NoSQL databases to address changing business needs – social, mobile and global access requirements – means that many organisations have hybrid database estates. Cloud makes this more accessible, and DBA skillsets are therefore broadening to support the new landscapes. DBAs are expected to understand the range of services on offer and advise on which solutions would be most appropriate and cost effective. Knowledge of the whole stack is important for this process. DBAs are relied upon to guide organisations on their Cloud adoption journey, with data at the heart of the design and migration decisions. DBAs may find their job titles changing to Cloud Architect or similar, reflecting the broader range of skills and responsibilities.

In the Cloud, managed services free DBAs from simple, repeatable operational tasks, allowing them to concentrate on optimisation and innovation. Since Cloud pricing models are pay-per-use, organisations are moving from mostly CapEx to OpEx, so finding ways to increase efficiency and free up resources yields immediate cost savings. Cloud is also accelerating the DevOps movement, both by making automation tools accessible and by driving a culture of shared responsibility amongst teams for delivering solutions. DBAs who embrace DevOps processes and tools, and understand how to integrate databases into them, are in high demand.

Data as a strategic asset

Cloud is shifting the focus from the database to the data itself. As a result, Database Administrators may consider themselves more as Data Administrators, Architects or Engineers. Big Data, AI and analytics are helping organisations gain deeper insights and extract more value from their data. This has unlocked a realm of opportunities for DBAs to be involved in. Understanding how to process data in Hadoop, and analyse it using R, for example, are valuable skills for a DBA to acquire.

The changing nature of how data is created and consumed in today’s world is, of course, changing the role of the DBA. There is a vast range of technologies being integrated into the DBA toolkit for organisations to successfully leverage Cloud. With data at the heart of business decisions, DBAs are well positioned to bridge the gap between IT and business, and should certainly embrace these changes if they have not already done so.

 

Do you have further questions? We are happy to advise you: marketing@avato.net

Imprint:
Date: February 2021
Author: Grace Honeysett
Contact: marketing@avato.net
www.avato-consulting.com
© avato consulting ag – Copyright 2021
All Rights Reserved.

AWS Database Migration Service – is it an Enterprise Tool?

Amazon provides the AWS Database Migration Service (DMS) to migrate data to and from different database platforms, where either the source or the target is on AWS. It supports a range of database platforms and is cost-effective compared with many of the licensed migration tools on the market. Amazon declares that DMS can migrate your database in “just a few clicks”. But is it ever as simple as this in practice? The DMS user guide provides plenty of information on its limitations, but you may find yourself jumping between pages before finally discovering a showstopper. Do the limitations of the tool present too many barriers to its ease of use? And importantly, can we rely on it as an enterprise migration tool? This article discusses some of the considerations that should be made before using it as a migration method in your organisation, with a particular focus on Oracle as a source data store.

 

How it Works

A high-level overview of AWS DMS is as follows (a short code sketch of these steps follows the list):

  • Create a replication instance.
  • Create source and target endpoints. These store connection information about your databases.
  • Create one or more migration tasks to migrate the data. Each task has three major phases:
    • Full data load.
    • Apply cached changes made to source during full load.
    • Ongoing replication (Change Data Capture) until point of application switchover.
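
To make these steps concrete, the following is a minimal boto3 (Python) sketch of creating the endpoints and the migration task. All identifiers, hostnames, credentials and ARNs are placeholders, and a real migration would also tune the table mappings and task settings to the workload.

import json
import boto3

# Sketch only: identifiers, hostnames, credentials and ARNs are placeholders.
dms = boto3.client("dms", region_name="eu-central-1")

# Step 2: source and target endpoints holding the connection information.
source = dms.create_endpoint(
    EndpointIdentifier="src-oracle",
    EndpointType="source",
    EngineName="oracle",
    ServerName="onprem-db.example.com",
    Port=1521,
    DatabaseName="ORCL",
    Username="dms_user",
    Password="********",
)
target = dms.create_endpoint(
    EndpointIdentifier="tgt-oracle",
    EndpointType="target",
    EngineName="oracle",
    ServerName="mydb.xxxxxxxx.eu-central-1.rds.amazonaws.com",
    Port=1521,
    DatabaseName="ORCL",
    Username="dms_user",
    Password="********",
)

# Step 3: migration task covering full load plus ongoing replication (CDC).
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-hr",
        "object-locator": {"schema-name": "HR", "table-name": "%"},
        "rule-action": "include",
    }]
}
dms.create_replication_task(
    ReplicationTaskIdentifier="hr-migration",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:eu-central-1:123456789012:rep:EXAMPLE",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)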

 

Considerations & Limitations

Replication Instance

The replication instance is an Amazon EC2 instance that hosts one or more migration tasks. The replication instance should be sized appropriately for the number of concurrent migration tasks you plan to run, in terms of CPU, memory and storage required for the Change Data Capture process. When performing production migrations, it may be prudent to use separate replication instances to avoid resource contention, particularly if you are unsure of the resource requirements. Choosing the Multi-AZ option also provides High Availability via a synchronous standby replica of the replication instance.
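
For illustration, a replication instance might be created with boto3 as below. The instance class, storage size and identifier are assumptions to be adjusted to the number of concurrent tasks and the expected CDC volume.

import boto3

dms = boto3.client("dms", region_name="eu-central-1")

# Placeholder sizing: choose class and storage to match the concurrent tasks
# and the volume of changes cached during the CDC phase.
dms.create_replication_instance(
    ReplicationInstanceIdentifier="prod-migration-1",
    ReplicationInstanceClass="dms.r5.large",  # assumed memory-optimised class
    AllocatedStorage=200,                     # GB, used for CDC caching and logs
    MultiAZ=True,                             # synchronous standby replica
    PubliclyAccessible=False,
)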

 

Source and Target Endpoints

AWS DMS supports a number of database engines as source and target endpoints. They can be on-premises, hosted on Amazon EC2, or an Amazon RDS service. Amazon S3 buckets can also be used as a data source for AWS DMS.

Supported data stores for Source Endpoints:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Sources.html

Supported data stores for Target Endpoints:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Introduction.Targets.html

Different considerations and limitations exist depending on the type of data store. We will discuss some of these later.

There is also a limit to the size of a database that AWS DMS can migrate, based on resource quotas per AWS Region per AWS account. This should be considered before choosing AWS DMS as a migration tool.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Limits.html

 

Migration Task

 

Metadata

The migration task migrates data between the source and target data stores. It only creates objects in the target that are considered necessary to migrate the data – tables, primary keys, and some unique indexes. Other indexes, constraints and code objects must be handled separately.

For homogeneous migrations, it is recommended to use the database software’s native tools to migrate the metadata before using AWS DMS to migrate the data.
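
For an Oracle-to-Oracle migration, for example, the schema metadata could be extracted and loaded with Data Pump before DMS moves the data. The sketch below shells out to expdp/impdp from Python; the connect strings, schema and directory object are placeholders, and the dump file must be made available to the target host.

import subprocess

# Export the schema definitions only (no rows) from the source database,
# then import them into the target. expdp/impdp prompt for the passwords.
subprocess.run(
    ["expdp", "system@SRCDB", "SCHEMAS=HR", "CONTENT=METADATA_ONLY",
     "DIRECTORY=DATA_PUMP_DIR", "DUMPFILE=hr_metadata.dmp"],
    check=True,
)
subprocess.run(
    ["impdp", "system@TGTDB", "SCHEMAS=HR", "CONTENT=METADATA_ONLY",
     "DIRECTORY=DATA_PUMP_DIR", "DUMPFILE=hr_metadata.dmp"],
    check=True,
)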

For heterogeneous migrations, the AWS Schema Conversion Tool (AWS SCT) can be used for the schema metadata. AWS SCT can automatically convert most of the schema metadata to the target database format, and since it has its own limitations, it highlights all code that requires manual action. It can be a useful tool for identifying which elements of an application schema do not translate from one database platform to another and require redesign.

Data

When AWS DMS performs the full data load phase, it does not load tables in any particular order, so child table data could be loaded before parent table data. This means that any referential integrity constraints between tables, enforced by foreign keys or triggers, must be disabled to prevent violations and adverse effects on the data load. Of course, disabling foreign key constraints and triggers in the live source database would likely be unacceptable. Instead, the foreign key constraints and triggers should be disabled on the target side after the schema creation and prior to the data load with DMS. They should then be re-enabled after the data migration is complete, or optionally after the start of the Change Data Capture (CDC) phase – this causes a higher load on the target database during the CDC phase, as a result of constraint validation of the migrated data stream, but has the benefit of a reduced application switchover time.
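
On an Oracle target, the disable/enable DDL can be generated from the data dictionary. The following is a rough sketch using the python-oracledb driver; the connection details and schema name are placeholders, and the generated statements should be reviewed before execution.

import oracledb

# Placeholder connection to the *target* database.
conn = oracledb.connect(user="admin", password="********",
                        dsn="target-db.example.com:1521/ORCL")
cur = conn.cursor()
schema = "HR"  # placeholder schema

def generated_ddl(sql):
    # Return the DDL statements produced by a dictionary query.
    cur.execute(sql)
    return [row[0] for row in cur.fetchall()]

# Disable foreign keys and triggers before the DMS full load; swap DISABLE
# for ENABLE to re-enable them after the load (or after CDC has started).
statements = generated_ddl(
    f"""SELECT 'ALTER TABLE '||owner||'.'||table_name||
               ' DISABLE CONSTRAINT '||constraint_name
        FROM dba_constraints
        WHERE owner = '{schema}' AND constraint_type = 'R'"""
) + generated_ddl(
    f"""SELECT 'ALTER TRIGGER '||owner||'.'||trigger_name||' DISABLE'
        FROM dba_triggers WHERE owner = '{schema}'"""
)

for stmt in statements:
    print(stmt + ";")  # review, then execute against the target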

AWS DMS begins applying cached changes for a table as soon as the full load for that table is complete. Therefore, only after DMS has applied all cached changes for all tables, and switched to the ongoing replication phase, will the database be transactionally consistent. This is important to understand in the context of any migration progress checks on the target.
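
Progress can be checked from the DMS API itself, for example with boto3 as below; the task ARN is a placeholder and the printed fields are those returned by describe_replication_tasks and describe_table_statistics.

import boto3

dms = boto3.client("dms", region_name="eu-central-1")
task_arn = "arn:aws:dms:eu-central-1:123456789012:task:EXAMPLE"  # placeholder

# Overall task state and statistics.
task = dms.describe_replication_tasks(
    Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
)["ReplicationTasks"][0]
print(task["Status"], task.get("ReplicationTaskStats"))

# Per-table progress: state and number of rows loaded so far.
for t in dms.describe_table_statistics(ReplicationTaskArn=task_arn)["TableStatistics"]:
    print(t["SchemaName"], t["TableName"], t["TableState"], t["FullLoadRows"])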

 

Considerations when using an Oracle Database as a Source data store

There are a significant number of limitations when using an Oracle Database as a source data store with AWS DMS. Some of these are very likely to arise in real-world environments and must therefore be carefully considered before choosing AWS DMS as a migration tool.

For example:

  • Changes resulting from partition or subpartition operations (ADD, DROP, EXCHANGE, TRUNCATE) are not replicated.
  • Any data in LONG or LONG RAW columns longer than 64KB is truncated to 64KB.
  • Virtual columns are not supported.
  • Global temporary tables are not loaded or captured.
  • The ROWID datatype, and materialized views based on a ROWID column are not supported.
  • BFILE, REF, UROWID, ANYDATA and user-defined data types cannot be replicated, resulting in a null column.
  • For Oracle nested tables, only one level of table nesting is supported.

Refer to the following page for more:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.Limitations
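
Before committing to DMS, it can be worth scanning the source schema for some of these features. The following is an illustrative (and deliberately partial) set of dictionary queries using python-oracledb; the connection details and schema name are placeholders.

import oracledb

# Placeholder connection to the source database.
conn = oracledb.connect(user="dms_user", password="********",
                        dsn="onprem-db.example.com:1521/ORCL")
cur = conn.cursor()
schema = "HR"  # placeholder schema

# A partial checklist of features DMS handles poorly or not at all.
checks = {
    "LONG/ROWID/BFILE/ANYDATA columns":
        """SELECT table_name, column_name, data_type FROM dba_tab_columns
           WHERE owner = :o AND data_type IN
                 ('LONG', 'LONG RAW', 'ROWID', 'UROWID', 'BFILE', 'ANYDATA')""",
    "Virtual columns":
        """SELECT table_name, column_name FROM dba_tab_cols
           WHERE owner = :o AND virtual_column = 'YES' AND hidden_column = 'NO'""",
    "Global temporary tables":
        "SELECT table_name FROM dba_tables WHERE owner = :o AND temporary = 'Y'",
    "Nested tables":
        "SELECT parent_table_name, table_name FROM dba_nested_tables WHERE owner = :o",
}

for name, sql in checks.items():
    cur.execute(sql, o=schema)
    rows = cur.fetchall()
    if rows:
        print(f"{name}: {len(rows)} object(s) affected")
        for row in rows:
            print("  ", row)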

The Change Data Capture phase of the migration task can be performed by either Oracle LogMiner or AWS DMS Binary Reader. The default is to use LogMiner, but there are again various situations under which each method may be unsuitable or unsupported.

  • In order to use a pluggable database as a source data store, the Binary Reader option must be chosen for the Change Data Capture.
  • Binary Reader does not support all Oracle compression and encryption options. Specifically, HCC Query Low compression is not supported, and TDE is only supported for self-managed databases (on-premises or Amazon EC2 hosted, not Amazon RDS).

Full details:

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.Oracle.html#CHAP_Source.Oracle.CDC
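
In practice, the CDC method is selected through extra connection attributes on the source endpoint. A hedged boto3 sketch is shown below; the endpoint ARN is a placeholder and the attribute string is the one commonly documented by AWS for enabling Binary Reader.

import boto3

dms = boto3.client("dms", region_name="eu-central-1")

# Switch the Oracle source endpoint from LogMiner (the default) to Binary Reader.
dms.modify_endpoint(
    EndpointArn="arn:aws:dms:eu-central-1:123456789012:endpoint:EXAMPLE",
    ExtraConnectionAttributes="useLogminerReader=N;useBfile=Y",
)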

There are further considerations and prerequisites. In order to use the Binary Reader CDC method, additional privileges must be granted to the migration user specified in the source endpoint. If the redo logs are stored on non-ASM storage, the suggestion from AWS is to grant the CREATE ANY DIRECTORY system privilege, which will allow DMS to create directory objects to read from the redo logs. Granting of the CREATE ANY DIRECTORY system privilege is likely prohibited in many production databases, due to the security hole it exposes. The alternative is to manually create directories for all redo log locations and grant read access on the individual directory objects. The migration user in the source Oracle database is also required to have SELECT privileges on every table that will be replicated. Migrations often involve the whole database or whole schemas and granting SELECT ANY TABLE could again raise security concerns. The SELECT privileges would therefore need to be granted individually to each table.
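
The per-object alternative can be scripted rather than typed by hand. The sketch below generates (but does not execute) the directory and grant statements with python-oracledb; the user names, schema and connection details are placeholders, and archived log destinations would need the same treatment as the online redo log locations.

import oracledb

# Placeholder connection with access to the dictionary and v$ views.
conn = oracledb.connect(user="admin", password="********",
                        dsn="onprem-db.example.com:1521/ORCL")
cur = conn.cursor()

# One directory object per online redo log location, instead of CREATE ANY DIRECTORY.
cur.execute("""SELECT DISTINCT substr(member, 1, instr(member, '/', -1) - 1)
               FROM v$logfile""")
for i, (path,) in enumerate(cur.fetchall(), start=1):
    print(f"CREATE DIRECTORY dms_redo_dir_{i} AS '{path}';")
    print(f"GRANT READ ON DIRECTORY dms_redo_dir_{i} TO dms_user;")

# SELECT on each table to be replicated, instead of SELECT ANY TABLE.
cur.execute("SELECT table_name FROM dba_tables WHERE owner = :o", o="HR")
for (table_name,) in cur.fetchall():
    print(f"GRANT SELECT ON HR.{table_name} TO dms_user;")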

When creating the migration task, there is an option relevant to Oracle source databases – “Include LOB columns in replication”. The options are to exclude LOB columns entirely from the migration, to include them, or to truncate them to a maximum size. It is difficult to imagine a real-world use case for truncating LOBs to an arbitrary size, but the option is there and is recommended by AWS to be used whenever possible, offering “significant performance gains”. In Full LOB mode, LOBs are migrated one at a time, chunk by chunk. If the LOBs are smaller than the maximum VARCHAR size (64K), then DMS can treat them as VARCHAR and fetch them in bulk. Hence AWS recommends a maximum LOB size of 64K when using DMS.

https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Tasks.LOBSupport.html
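
In the task settings JSON, the LOB handling corresponds to the TargetMetadata section. The snippet below shows an illustrative limited-LOB configuration; the key names follow the task settings documented by AWS and the values are examples only.

import json

# Illustrative limited-LOB configuration in the task settings JSON.
task_settings = {
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,        # do not migrate LOBs chunk by chunk
        "LimitedSizeLobMode": True,  # truncate LOBs larger than LobMaxSize
        "LobMaxSize": 64,            # KB, matching the 64K recommendation above
    }
}

# Passed as ReplicationTaskSettings=json.dumps(task_settings) on task creation.
print(json.dumps(task_settings, indent=2))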

The declaration that the AWS Database Migration Service can migrate your database in “just a few clicks” may be true for a very simple data set. But in the real world, there are many considerations before you can reap the promised benefits of its simplicity. Planning is crucial to the success of any database migration. When using Cloud services such as AWS DMS, whereby there is a further layer of abstraction between the administrators and the databases, understanding the potential pitfalls becomes even more important.