Jira Align offers a data file retrieval option for cloud Enterprise Insights instances.
On this page:
- What is Enterprise Insights data file retrieval?
- Requirements
- The set up process
- File delivery
- Directory structure
- File structure
- How to reconstruct data
What is Enterprise Insights data file retrieval?
Enterprise Insights (EI) data file retrieval lets you export your Jira Align cloud data from Enterprise Insights and import it into your own database or data lake. With data file retrieval, your Enterprise Insights data is exported to Apache Parquet formatted files. The files are stored within a dedicated Amazon Web Services (AWS) S3 bucket that only you have read access to.
Data file retrieval is an option for environments that are unable to use a Microsoft SQL Server client or compatible ODBC/JDBC client to access Enterprise Insights.
Requirements
You’ll need the following to use the data file retrieval option:
- Team members with knowledge of Parquet files and Amazon Web Services (AWS).
- A data pipeline to consume the Parquet files from AWS S3.
- An AWS account to connect to the S3 bucket. This is required because your AWS role is used for authentication.
  - You will need to configure an IAM role when setting up data file retrieval. This is the recommended way to gain access.
  - Alternatively, you can grant access to all principals within your AWS account by providing an ARN such as “arn:aws:iam::0000000000000:root”.
The set up process
Complete the following steps to set up the data file retrieval option and gain access to your S3 bucket:
- Contact your Atlassian Solutions Engineer or partner reseller to review your options for purchasing an Enterprise Insights license, and indicate your interest in the data file retrieval offering.
- Submit a ticket to our support team with the following information:
  - Details about the AWS IAM role you’ve set up for access
    - You can provide different ARNs for S3 access and SNS topic access. This is useful if you want to give a wider set of users access to SNS notifications than those who can access the S3 bucket.
  - Your desired interval for ETL data refreshes. You can choose an interval between 1 and 24 hours.
  - Your AWS region. We recommend that your region is one of the following used by Enterprise Insights:
    - North America: us-east-2
    - EMEA: eu-central-1
    - APAC: ap-southeast-2
- We send you an update through the support ticket, providing you with the following:
  - The new S3 bucket name
  - The ARN of the KMS key used for encryption
  - The ARN for the SNS topic
- Assume the IAM role you created in your AWS account, and navigate to https://{your AWS region}.console.aws.amazon.com/s3/buckets/{bucket name} to view your new S3 bucket.
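The three values returned in the support ticket (bucket name, KMS key ARN, SNS topic ARN) are typically what an identity policy on your IAM role would reference. The following is a minimal sketch of such a policy, built as a Python dictionary so it can be printed as JSON. The function name, the statement IDs, and the sample ARNs are illustrative placeholders, and the exact set of actions your role needs is an assumption here; confirm it with the support team.

```python
import json


def build_read_policy(bucket_name: str, kms_key_arn: str, sns_topic_arn: str) -> dict:
    """Return a read-only IAM identity policy document as a dict.

    Assumption: the role needs to list and read objects, decrypt them
    with the provided KMS key, and subscribe to the SNS topic.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListExportBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                "Sid": "ReadExportObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                "Sid": "DecryptExportObjects",
                "Effect": "Allow",
                "Action": "kms:Decrypt",
                "Resource": kms_key_arn,
            },
            {
                "Sid": "SubscribeToExportTopic",
                "Effect": "Allow",
                "Action": "sns:Subscribe",
                "Resource": sns_topic_arn,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your support ticket.
    policy = build_read_policy(
        "example-ei-export-bucket",
        "arn:aws:kms:us-east-2:111122223333:key/EXAMPLE-KEY-ID",
        "arn:aws:sns:us-east-2:111122223333:example-ei-topic",
    )
    print(json.dumps(policy, indent=2))
```

If you use separate roles for S3 access and SNS access, the SNS statement would move into the policy for the second role.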
File delivery
Files are delivered to the AWS S3 bucket based on the data in the export_dw schema of your Enterprise Insights (EI) instance. Tables and columns included in export files will be marked as Yes under the Parquet column of the EI schema file.
There are two types of export jobs that run each day:
- Full exports run between 11 PM and 6 AM in the time zone of your AWS region. By default, the job begins at midnight, but we may adjust the timing by a few minutes for better performance. We can accommodate a specific job start time within this window.
  - These can be found in the /full-export/ folder of your S3 file directory.
- Incremental (delta) exports run according to the data pipeline execution frequency set up for your EI instance.
  - These can be found in the /delta-export/ folder of your S3 file directory.
Notifications
You can subscribe to Amazon’s Simple Notification Service (SNS) inside AWS to receive notifications when new Parquet files are available after a successful export.
We will provide you with a Topic ARN that you can subscribe to in AWS. SNS provides multiple ways to receive a notification when the S3 bucket has been updated with new data, including AWS Lambda, SMS, HTTP, and email.
Each SNS notification contains the following info:
- S3 Path Directory: The path to the S3 bucket where files can be consumed.
- ETL_ID: The execution ID of each job run.
- Export Type: Indication if the export is a full export or a delta export.
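If you subscribe an AWS Lambda function to the topic, the notification arrives in the standard SNS event shape, with the message text under Records[].Sns.Message. The sketch below pulls the three fields listed above out of that message. The payload format is not described on this page, so the JSON encoding and the key names s3_path, etl_id, and export_type are hypothetical placeholders; inspect a real notification and adjust them before relying on this.

```python
import json


def extract_export_notifications(event: dict) -> list:
    """Return one dict per SNS record found in a Lambda event.

    Assumption: the message body is a JSON object. The key names below
    are hypothetical; if the body is not a JSON object, the fields are
    left as None and the raw text is kept for inspection.
    """
    notifications = []
    for record in event.get("Records", []):
        raw_message = record["Sns"]["Message"]
        try:
            payload = json.loads(raw_message)
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            payload = {}
        notifications.append(
            {
                "s3_path": payload.get("s3_path"),          # hypothetical key
                "etl_id": payload.get("etl_id"),            # hypothetical key
                "export_type": payload.get("export_type"),  # hypothetical key
                "raw": raw_message,
            }
        )
    return notifications


def lambda_handler(event, context):
    """Lambda entry point: log each notification and report the count."""
    notifications = extract_export_notifications(event)
    for notification in notifications:
        print(notification)
    return {"processed": len(notifications)}
```

Keeping the raw message alongside the parsed fields makes it easier to adapt the parser once you have seen the actual payload.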
Directory structure
Two tiers of directories are provided to host full exports and delta exports. Both use a directory structure aligned with Coordinated Universal Time (UTC).
Full exports
Full exports use the following structure:
[S3 bucket name]/YYYY/MM/DD/daily-export/full-export/[table name]/[etlid]
/YYYY/MM/DD matches the UTC date on which the full export job runs.
Delta exports
Delta exports use the following structure:
[S3 bucket name]/YYYY/MM/DD/daily-export/delta-export/YYYY/MM/DD/[table name]/[etlid]
The second use of /YYYY/MM/DD is based on when different delta exports are run. These may span two different dates, unless the export is run at midnight UTC.
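As an illustration of the two layouts, the following sketch builds the object key prefix (the part after the bucket name) for a full export and for a delta export. The function names, the table name, and the ETL ID are made-up examples, and the dates passed in are assumed to already be UTC dates.

```python
from datetime import date


def full_export_prefix(run_date: date, table_name: str, etl_id: str) -> str:
    """Key prefix for a full export: YYYY/MM/DD/daily-export/full-export/..."""
    return f"{run_date:%Y/%m/%d}/daily-export/full-export/{table_name}/{etl_id}"


def delta_export_prefix(
    folder_date: date, delta_date: date, table_name: str, etl_id: str
) -> str:
    """Key prefix for a delta export.

    folder_date is the first YYYY/MM/DD in the path; delta_date is the
    second one, reflecting when that delta export ran.
    """
    return (
        f"{folder_date:%Y/%m/%d}/daily-export/delta-export/"
        f"{delta_date:%Y/%m/%d}/{table_name}/{etl_id}"
    )


if __name__ == "__main__":
    # "example_table" and "12345" are placeholder values.
    print(full_export_prefix(date(2024, 3, 5), "example_table", "12345"))
    print(
        delta_export_prefix(
            date(2024, 3, 5), date(2024, 3, 6), "example_table", "12345"
        )
    )
```

A prefix built this way can be passed to whatever S3 listing call your pipeline uses to find the files for one table and one job run.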
File structure
File size
Export files are limited to 3.5 million records per file. This keeps each file at a manageable size of approximately 150 MB to 250 MB. If an export is split into multiple files, the files share a common prefix for that export run.
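A quick way to sanity-check an ingest is to compare the number of files you received for a table against the number implied by the per-file limit. This sketch is simple ceiling division; the function name is illustrative, and returning 0 for an empty table is an assumption, since this page does not say whether an empty table produces a file.

```python
MAX_RECORDS_PER_FILE = 3_500_000  # per-file limit stated above


def expected_file_count(row_count: int) -> int:
    """Return how many files a table of row_count rows would be split into."""
    if row_count <= 0:
        return 0  # assumption: no rows, no files
    # Ceiling division without floating point.
    return (row_count + MAX_RECORDS_PER_FILE - 1) // MAX_RECORDS_PER_FILE


if __name__ == "__main__":
    # 8 million rows: two full files of 3.5 million plus one of 1 million.
    print(expected_file_count(8_000_000))
```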
Parquet to SQL metadata conversion
Depending on your database or data lake type, you may need to convert data types when ingesting columns from the Parquet files into SQL or other formats. To look up the data types in a Parquet file with Python, you can read the file’s schema with Apache Arrow (the pyarrow package).
How to reconstruct data
Once you have access to the Parquet files in your S3 bucket, follow these steps for each table you need data from:
- In your target database, create two temporary tables. One hosts the accumulated baseline (the full export). The other hosts incremental updates (the delta exports).
- Ingest the full export Parquet files directly into the first temporary table. The commands for this ingest vary depending on your database server provider.
- Ingest the delta export Parquet files directly into the second temporary table.
- Merge the delta exports with the accumulated baseline:
  - Use upsert queries to merge new and updated rows from the second temporary table into the first temporary table.
  - Use SQL SELECT statements to retrieve the soft-deleted records from the second temporary table, and delete those records from the first temporary table.
- Delete the data from the current permanent table in your database, and insert the data from the first temporary table.
  - Alternatively, you can rename your permanent table to create a backup, then rename the first temporary table with the permanent table’s original name.
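The merge step above can be sketched with Python's built-in sqlite3 module, used here only so the example runs anywhere; the SQL syntax for upserts differs between database servers. The table names, the id and name columns, and the is_deleted flag marking soft deletes are all hypothetical; check the EI schema file for the actual key and soft-delete columns of each table.

```python
import sqlite3


def merge_delta_into_baseline(conn: sqlite3.Connection) -> None:
    """Apply the delta table to the baseline table in two passes."""
    # Pass 1: upsert new and updated rows (rows not flagged as deleted).
    conn.execute(
        """
        INSERT INTO baseline_tmp (id, name)
        SELECT id, name FROM delta_tmp WHERE is_deleted = 0
        ON CONFLICT(id) DO UPDATE SET name = excluded.name
        """
    )
    # Pass 2: remove rows that the delta marks as soft-deleted.
    conn.execute(
        """
        DELETE FROM baseline_tmp
        WHERE id IN (SELECT id FROM delta_tmp WHERE is_deleted = 1)
        """
    )
    conn.commit()


def build_demo_database() -> sqlite3.Connection:
    """Create in-memory stand-ins for the two temporary tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baseline_tmp (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE delta_tmp (id INTEGER PRIMARY KEY, name TEXT, is_deleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO baseline_tmp VALUES (?, ?)",
        [(1, "alpha"), (2, "beta"), (3, "gamma")],
    )
    conn.executemany(
        "INSERT INTO delta_tmp VALUES (?, ?, ?)",
        [(2, "beta v2", 0), (3, "gamma", 1), (4, "delta", 0)],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_database()
    merge_delta_into_baseline(demo)
    # Row 2 is updated, row 3 is removed, row 4 is added.
    print(demo.execute("SELECT id, name FROM baseline_tmp ORDER BY id").fetchall())
```

If several delta exports have accumulated since the last full export, they would need to be applied in the order they were produced so that later changes win.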