One of the database systems we develop consists of a three-database set-up, connected to each other by database links and Java services. This is not an ideal configuration, but it grew historically because the data had different owners at the time. And yes, one day all three databases will be merged into one database, one day…
Twice a year we ask our customer for a masked copy of the production data. Before masking the data you of course need a copy of the database to work in, and this can easily be made using RMAN or the Data Pump expdp tool. Our customer's choice was expdp…
The extra challenge in making a copy of the production environment is that all three database 'restores' must be of the same point in time. Oracle provides SCN-based consistency for expdp (the FLASHBACK_SCN parameter), and that works great. However, two of the three databases contain considerable amounts of documents, and getting a set of three consistent export dumps became more difficult as the databases grew bigger and bigger. ORA-01555 Snapshot-too-old errors appeared, and because of the duration of the exports and the size of the databases, the customer had more and more difficulty obtaining a consistent dataset across all databases.
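For consistency, an SCN is captured at the chosen moment and handed to expdp. A minimal sketch (the SCN value shown in the comment is of course a placeholder):

```sql
-- Capture the current SCN; run this at the chosen moment
SELECT current_scn FROM v$database;

-- Then pin each export to that SCN, e.g. in the .par file:
-- FLASHBACK_SCN=1234567
```

The longer the export runs after that SCN, the more undo is needed to reconstruct the data as of that point, which is exactly where the ORA-01555 errors came from.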
So, the customer asked us for an optimization of the current export process using Oracle Data Pump.
Since we we’re already using Virtual Private Database (VPD) functionality in one of these databases I was wondering if using VPD functionality together with expdp was possible, since we have already seen that there are errors during export like ORA-39181: Only partial table data may be exported due to fine grain access control.
Our conclusion was that VPD is not usable when running expdp as user SYS or SYSTEM. For some reason the VPD policies are not applied for these users, and since schema-level exports were not an option for us, we had to look further.
Fortunately, Oracle Data Pump offers functionality to accomplish the same result, excluding column data from the export, in a much easier way than VPD. The Data Pump REMAP_DATA option gives you the ability to call a function on each column value before it is written to the export file. If we make sure the function's return value is empty, nothing will be written in the end.
Because we don’t want to interfere with production schema’s we create a temporary user and give it the correct privileges:
create user ANO_BEHEER identified by ano_beheer
default tablespace users;
grant DATAPUMP_EXP_FULL_DATABASE to ANO_BEHEER;
grant RESOURCE to ANO_BEHEER ;
grant READ on directory EXPORT_BACKUP_DIR to ANO_BEHEER;
grant WRITE on directory EXPORT_BACKUP_DIR to ANO_BEHEER;
Now connect as the new user and create a package containing the remapping functions. We created functions for emptying BLOB, CLOB and (long) VARCHAR2 columns:
CREATE OR REPLACE PACKAGE ano_remove_data AUTHID CURRENT_USER AS
  FUNCTION ano_empty_lob(old_blob IN BLOB) RETURN BLOB;
  FUNCTION ano_empty_lob(old_clob IN CLOB) RETURN CLOB;
  FUNCTION ano_empty_field(old_varchar IN VARCHAR2) RETURN VARCHAR2;
END ano_remove_data;
/
CREATE OR REPLACE PACKAGE BODY ano_remove_data AS
  -- Each function discards the incoming value and returns an empty result
  FUNCTION ano_empty_lob(old_blob IN BLOB) RETURN BLOB IS
    BEGIN RETURN empty_blob(); END;
  FUNCTION ano_empty_lob(old_clob IN CLOB) RETURN CLOB IS
    BEGIN RETURN empty_clob(); END;
  FUNCTION ano_empty_field(old_varchar IN VARCHAR2) RETURN VARCHAR2 IS
    BEGIN RETURN NULL; END;
END ano_remove_data;
/
Now create a .par file containing the function calls for the specific columns whose data we don't need, in the format REMAP_DATA=[schema.]table_name.column_name:[schema.]package.function.
In this case, file_data is a CLOB column, description has a VARCHAR2 data type and document stores BLOBs. We're not interested in the contents of any of these columns.
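A minimal .par file along these lines (the schema and table names here are hypothetical; the column names follow the description above, and Data Pump resolves the overloaded ano_empty_lob by the column's datatype):

```text
DIRECTORY=EXPORT_BACKUP_DIR
DUMPFILE=masked_export.dmp
FULL=Y
REMAP_DATA=APP_OWNER.DOCUMENTS.FILE_DATA:ANO_BEHEER.ANO_REMOVE_DATA.ANO_EMPTY_LOB
REMAP_DATA=APP_OWNER.DOCUMENTS.DESCRIPTION:ANO_BEHEER.ANO_REMOVE_DATA.ANO_EMPTY_FIELD
REMAP_DATA=APP_OWNER.DOCUMENTS.DOCUMENT:ANO_BEHEER.ANO_REMOVE_DATA.ANO_EMPTY_LOB
```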
Now run the expdp command with the newly created .par file and check the size of the dump file (checking the number of rows is of no use, since we're skipping column data, not rows!).
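The export then becomes a single command; the user name, password, file names and directory path below are placeholders:

```shell
expdp ANO_BEHEER/ano_beheer PARFILE=masked_export.par

# Compare the dump file size with a previous, unmasked export
ls -lh /path/to/export_backup_dir/masked_export.dmp
```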
As the last step, drop the user we created for this purpose.
DROP USER ANO_BEHEER CASCADE;
What are the advantages of this method?
- A much smaller Data Pump export dump file, so less storage needed on the OS
- The export takes much less time
- Because of the two points above, the import will be faster too
- Data masking is already possible at the export stage!
Considering the last point, the REMAP_DATA option gives you a lot of flexibility, and combined with Data Pump's QUERY option, as explained in this article, the possibilities are enormous.
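For example, REMAP_DATA and QUERY can be combined in one .par file, masking document contents while also restricting which rows are exported (the schema, table and column names below are hypothetical):

```text
QUERY=APP_OWNER.DOCUMENTS:"WHERE created_date > ADD_MONTHS(SYSDATE, -12)"
REMAP_DATA=APP_OWNER.DOCUMENTS.DOCUMENT:ANO_BEHEER.ANO_REMOVE_DATA.ANO_EMPTY_LOB
```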