Enterprise Insights data file retrieval option

Jira Align offers a data file retrieval option for cloud Enterprise Insights instances.

On this page:

  • What is Enterprise Insights data file retrieval?
  • Requirements
  • The setup process
  • File delivery
  • Notifications
  • Directory structure
  • File structure
  • How to reconstruct data

What is Enterprise Insights data file retrieval?

Enterprise Insights (EI) data file retrieval lets you export your Jira Align cloud data from Enterprise Insights and import it into your own database or data lake. With data file retrieval, your Enterprise Insights data is exported to Apache Parquet formatted files. The files are stored within a dedicated Amazon Web Services (AWS) S3 bucket that only you have read access to.

Data file retrieval is an option for environments that are unable to use a Microsoft SQL Server client or compatible ODBC/JDBC client to access Enterprise Insights.

[Diagram: Enterprise Insights data file retrieval]

Requirements

You’ll need the following to use the data file retrieval option:

  • Team member(s) with knowledge of Parquet files and Amazon Web Services (AWS)
  • A data pipeline to consume the Parquet files from AWS S3
  • An AWS account to connect to the S3 bucket; this is required because your AWS role is used for authentication
    • You will need to configure IAM roles when setting up data file retrieval. This is the recommended method for gaining access.
    • Alternatively, you can grant access to all principals within your AWS account by providing an ARN such as “arn:aws:iam::000000000000:root”.

The setup process

Complete the following steps to set up the data file retrieval option and gain access to your S3 bucket:

  1. Contact your Atlassian Solutions Engineer or partner reseller to review your options for purchasing an Enterprise Insights license, and indicate your interest in the data file retrieval offering.
  2. Submit a ticket to our support team with the following information:
    • Details about the AWS IAM role you’ve set up for access
      • You can provide different ARNs for S3 access and SNS topic access. This is useful if you want to give a wider set of users access to SNS notifications than those who can access the S3 bucket.
    • Your desired interval for ETL data refreshes. You can choose an interval between 1 and 24 hours.
    • Your AWS region. We recommend choosing one of the following regions used by Enterprise Insights:
      • North America: us-east-2
      • EMEA: eu-central-1
      • APAC: ap-southeast-2
  3. We’ll send you an update through the support ticket with the following information:
    1. The new S3 bucket name
    2. The ARN of the KMS key used for encryption
    3. The ARN for the SNS topic
  4. Assume the IAM role you created in your AWS account, and navigate to https://{your AWS region}.console.aws.amazon.com/s3/buckets/{bucket name} to view your new S3 bucket.
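
If you’d rather verify access programmatically than through the console, a minimal Python (boto3) sketch like the one below can assume the role and list the bucket. The role ARN, session name, and bucket name are placeholders for the values from your own setup and support ticket.

import boto3

# Placeholders: substitute the IAM role you created and the bucket
# name provided in your support ticket.
ROLE_ARN = "arn:aws:iam::000000000000:role/ei-data-file-retrieval"
BUCKET = "your-ei-bucket-name"

# Assume the IAM role configured for data file retrieval.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn=ROLE_ARN,
    RoleSessionName="ei-access-check",
)["Credentials"]

# Create an S3 client using the temporary credentials from the role.
s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

# List a few objects to confirm read access to the bucket.
response = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=10)
for obj in response.get("Contents", []):
    print(obj["Key"])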

File delivery

Files are delivered to the AWS S3 bucket based on the data in the export_dw schema of your Enterprise Insights (EI) instance. Tables and columns included in export files are marked as Yes under the Parquet column of the EI schema file.

There are two types of export jobs that run each day:

  • Full exports run between 11 PM and 6 AM, in the time zone of your AWS region. By default, the job begins at midnight, but we may adjust the timing by a few minutes for better performance. We can accommodate a specific job start time within this window.
    • These can be found in the /full-export/ folder of your S3 file directory.
  • Incremental exports run according to the data pipeline execution frequency set up for your EI instance.
    • These can be found in the /delta-export/ folder of your S3 file directory.
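
As a sketch (assuming the directory layout described under Directory structure below, and credentials for the IAM role you configured), a boto3 paginator can separate the two export types by key path; the bucket name is a placeholder.

import boto3

BUCKET = "your-ei-bucket-name"  # placeholder bucket name

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Walk every key in the bucket and group it by export type.
full_keys, delta_keys = [], []
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if "/full-export/" in key:
            full_keys.append(key)
        elif "/delta-export/" in key:
            delta_keys.append(key)

print(len(full_keys), "full-export files,", len(delta_keys), "delta-export files")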

Notifications

You can subscribe to Amazon’s Simple Notification Service (SNS) inside AWS to receive notifications when new Parquet files are available after a successful export.

We will provide you with a Topic ARN that you can subscribe to in AWS. SNS provides multiple ways to receive a notification when the S3 bucket has been updated with new data, including AWS Lambda, SMS, HTTP, and email.
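
For example, a minimal boto3 sketch can subscribe an email address to the topic; the topic ARN, region, and address below are placeholders for your own values.

import boto3

# Placeholder: use the Topic ARN provided in your support ticket.
TOPIC_ARN = "arn:aws:sns:us-east-2:000000000000:ei-export-notifications"

sns = boto3.client("sns", region_name="us-east-2")

# Subscribe an email endpoint; AWS sends a confirmation email that
# must be accepted before notifications are delivered.
sns.subscribe(
    TopicArn=TOPIC_ARN,
    Protocol="email",
    Endpoint="data-team@example.com",
)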

[Screenshot: example SNS notification]

Each SNS notification contains the following info:

  • S3 Path Directory: The path to the S3 bucket where files can be consumed.
  • ETL_ID: The execution ID of each job run.
  • Export Type: Indicates whether the export is a full export or a delta export.
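
If you route these notifications to an AWS Lambda function, a handler along these lines could log the fields. The Records[n].Sns.Message envelope is standard for SNS-triggered Lambdas, but the field names inside the message body are illustrative assumptions; inspect a real notification for the exact format.

import json

def handler(event, context):
    # Standard SNS-to-Lambda envelope: the notification body arrives
    # as a string under Records[n].Sns.Message.
    for record in event["Records"]:
        message = record["Sns"]["Message"]
        print("Raw notification:", message)
        # Illustrative parsing only: the exact key names in the body
        # may differ from these assumptions.
        try:
            body = json.loads(message)
            print("S3 path:", body.get("S3 Path Directory"))
            print("ETL ID:", body.get("ETL_ID"))
            print("Export type:", body.get("Export Type"))
        except json.JSONDecodeError:
            pass  # the message was plain text rather than JSON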

Directory structure

Two tiers of directories are provided to host full exports and delta exports. Both use a directory structure aligned with Coordinated Universal Time (UTC).

Full exports

Full exports use the following structure:

[S3 bucket name]/YYYY/MM/DD/daily-export/full-export/[table name]/[etlid]

/YYYY/MM/DD matches the UTC time when the full export job is run.
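
For instance, the prefix for today’s full-export files can be built from the current UTC date; the bucket and table names are placeholders, and the trailing [etlid] segment is discovered by listing the prefix.

from datetime import datetime, timezone

BUCKET = "your-ei-bucket-name"  # placeholder
TABLE = "your_table_name"       # placeholder

# /YYYY/MM/DD matches the UTC date of the full export job.
today = datetime.now(timezone.utc)
prefix = f"{today:%Y/%m/%d}/daily-export/full-export/{TABLE}/"
print(f"s3://{BUCKET}/{prefix}")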

Delta exports

Delta exports use the following structure:

[S3 bucket name]/YYYY/MM/DD/daily-export/delta-export/YYYY/MM/DD/[table name]/[etlid]

The second /YYYY/MM/DD is based on when each delta export runs. Because delta exports run throughout the day, the two dates may differ, unless the export runs exactly at midnight UTC.
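
Both dates can be recovered by splitting a key on /; the key below is illustrative, with a hypothetical table name, ETL ID, and file name.

# Illustrative key following the delta-export structure above
# (hypothetical table name, ETL ID, and file name).
key = "2024/05/01/daily-export/delta-export/2024/05/02/work_items/12345/part-0.parquet"

parts = key.split("/")
daily_date = "-".join(parts[0:3])  # date of the outer daily folder
delta_date = "-".join(parts[5:8])  # date this delta export ran
table_name = parts[8]
etl_id = parts[9]
print(daily_date, delta_date, table_name, etl_id)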

File structure

File size

Export files are limited to 3.5 million records per file, which keeps file sizes manageable at approximately 150-250 MB. If an export is split into multiple files, the files share a common prefix for each export run.
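
Since one export run can therefore produce several Parquet files under a shared prefix, reading them together as a single dataset is convenient. A pyarrow sketch with a placeholder path (pyarrow picks up AWS credentials from the standard environment):

import pyarrow.dataset as ds

# Placeholder prefix: all part-files from one export run live here.
uri = "s3://your-ei-bucket-name/2024/05/01/daily-export/full-export/work_items/12345/"

# pyarrow treats every Parquet file under the prefix as one dataset.
dataset = ds.dataset(uri, format="parquet")
table = dataset.to_table()
print(table.num_rows, "rows across", len(dataset.files), "files")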

Parquet to SQL metadata conversion

Depending on your database or data lake type, you may need to convert data types when ingesting columns from the Parquet files into SQL or other formats. To look up the data types in a Parquet file with Python, you can use the Apache Arrow (pyarrow) library, as in the example below.
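
A minimal sketch (the file name is a placeholder for any exported Parquet file):

import pyarrow.parquet as pq

# Placeholder file name: any exported Parquet file will work.
schema = pq.read_schema("example_table.parquet")

# Print each column name with its Arrow data type, which you can map
# to the matching SQL type in your target database.
for field in schema:
    print(field.name, field.type)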

How to reconstruct data

Once you have access to the Parquet files in your S3 bucket, follow these steps for each table you need data from:

  1. In your target database, create two temporary tables. One will host the accumulated baseline (the full export); the other will host incremental updates (the delta exports).
  2. Ingest the full export Parquet files directly into the first temporary table. The commands for this ingest vary depending on your database provider.
  3. Ingest the delta export Parquet files directly into the second temporary table.
  4. Merge the delta exports with the accumulated baseline (see the sketch after these steps):
    1. Use upsert queries to merge new and updated rows from the second temporary table into the first temporary table.
    2. Use SELECT statements to retrieve the soft-delete records from the second temporary table, then delete those records from the first temporary table.
  5. Delete the data from the current permanent table in your database, and insert the data from the first temporary table.
    • Alternatively, you can rename your permanent table to create a backup, then rename the first temporary table with the permanent table’s original name.
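
As a rough sketch of steps 4 and 5, assuming a SQL Server target reached through pyodbc; the table names (full_stage, delta_stage, permanent_table), the id key, and the is_deleted soft-delete flag are all hypothetical and should be replaced with names from your own schema.

import pyodbc

# Hypothetical DSN; replace with your own connection details.
conn = pyodbc.connect("DSN=my_warehouse")
cur = conn.cursor()

# Step 4a: upsert new and updated delta rows into the baseline table
# (T-SQL MERGE; all table and column names here are hypothetical).
cur.execute("""
    MERGE full_stage AS target
    USING delta_stage AS source
        ON target.id = source.id
    WHEN MATCHED THEN
        UPDATE SET target.name = source.name
    WHEN NOT MATCHED THEN
        INSERT (id, name) VALUES (source.id, source.name);
""")

# Step 4b: remove rows the delta export marked as soft-deleted.
cur.execute("""
    DELETE FROM full_stage
    WHERE id IN (SELECT id FROM delta_stage WHERE is_deleted = 1);
""")

# Step 5: replace the permanent table's contents with the merged baseline.
cur.execute("DELETE FROM permanent_table;")
cur.execute("INSERT INTO permanent_table SELECT * FROM full_stage;")

conn.commit()
conn.close()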

 

