DataClarity 2021.3 & 2021.4 bring the following features and enhancements:
Platform's Query Engine
- Create a dataset based on extracted data
- Select data storage structure for an extract dataset
- Perform the first data load for an extract dataset
- Identify extract datasets
- Schedule a full refresh of an extract dataset
- Schedule an incremental refresh of an extract dataset
- Refresh an extract dataset on demand
- Create a dataset based on MongoDB
- Use new functions in calculations
- Display hyperlinks in Table
- Create a dynamic hyperlink calculation in Table
- Create an HTML calculation in Table
- Change the alignment of the column values in Table
- Customize the “no data” message in widgets
- Rearrange the data frame columns
- Rich Text widget
Installation & Configuration
Starting with this release, DataClarity’s query and data federation engine has been optimized to increase query speed, scalability, and extensibility.
The engine enhancements allow for the following improved and new capabilities:
High-speed query processing
The schemaless query engine can process a high volume of data messages with minimal delay. This capability allows you to query big data at interactive speed, which is essential for data analytics environments. Moreover, the query engine automatically recognizes datastores and optimizes a query plan to take advantage of the datastores’ internal processing capabilities. The columnar query execution results in higher memory and CPU efficiency.
Support for non-relational data sources
The query engine enhancements provide a foundation for data source extensibility. This allows DataClarity to introduce new data connectors to practically any data source ranging from standard SQL datastores to non-relational datastores, including Hadoop, NoSQL databases (MongoDB, HBase), and cloud storage like Amazon S3. The engine can instantly combine data from various data sources on the fly in a single query with no centralized metadata definitions.
Creating data extracts
The new engine allows you to create data extracts. Data extracts are the snapshots of data stored on the platform’s query engine server side as compressed and highly optimized files. Data extracts help improve performance and reduce the load on databases by removing live connections to the databases.
Scalability
You can scale the query engine from one to several thousand nodes and query petabytes of data within seconds. The engine leverages the aggregate memory in the cluster to execute queries using an optimistic pipelined model. In addition, the engine’s symmetrical architecture makes it easy to deploy and operate large clusters.
Before this release, all datasets used live data source connections by default. Starting with this release, you can choose to create datasets based on extracted data.
Extracts are snapshots of data loaded into the platform’s query engine memory and stored as highly optimized and compressed files. Data extracts help improve performance for large datasets and reduce the load on databases by avoiding live connections.
Data extract files use a columnar storage format. It takes time to create an extract, but it is especially fast to read the data from the extract. Thus, extracts reduce the time to access and aggregate the column values, making them essential for analytics and data discovery.
You can create a dataset extract in Step 2 – Refine (if a dataset is based on one data source table) or Step 3 – Join and preview (if a dataset has multiple data source tables). Only users with the Dataset Extracts Creator role can create and manage extracts.
For more information on extract-based datasets, see "Data extracts" in DataClarity Help Center.
You can choose the data storage structure of an extract depending on the underlying data specifics: as a single table or as multiple tables. Click Settings next to the Extract option, and select one of the following:
- Single table – All the data source tables are joined at the time of extract creation and saved as a single table.
- Multiple tables – Each data source table is stored in a separate file, reflecting the database structure. In this case, joins are performed at query time. Such extract files may be faster to generate and smaller in size. They perform well with simple queries but might take longer to run complex visualization queries.
Performance differences between extract storage types become noticeable with large amounts of data. Experiment with both types to determine which one gives you the best performance and size benefits.
After you save a new extract dataset, no data is loaded by default. This way, you can choose when to perform the first data load for the best system performance. For large datasets, for example, you can schedule a data load during non-operating hours. Alternatively, click Extract now to start the data load immediately.
You can quickly identify extract datasets in the Datasets pane. In Tile view, they are marked with the Extract label in the upper-right corner of the tile.
In List view, you can quickly identify extracts and live connections by the Data column. Click the column header to sort the datasets accordingly.
After creating an extract, you can refresh it with the latest data from the original data source connections. A full refresh removes the current extract and replaces it with new data from the original data source. In addition to an immediate refresh that you can initiate manually, you can schedule the refresh to run regularly at a convenient time.
To schedule a full refresh, go to an extract and under the More actions menu, select Modify > Extract > Schedule refresh. For the refresh type, ensure that Full is selected.
For a single-table extract, click Schedule. In the Schedule dialog, define the refresh frequency (hourly, daily, weekly, monthly) and select a time zone.
For a multi-table extract, you first select which tables need a regular refresh. Then, click Schedule and define a refresh schedule. If you add multiple tables, you can select the Use the same schedule for all tables option to define a common schedule once. You can unselect the option and adjust the schedule for each table at any time.
You can check the dates of the last refresh. In List view, point to the info icon next to an extract and review the tooltip. Depending on the extract type, you may see a single date or multiple dates (for each table).
Additionally, you can check the last refresh dates in the Explore dataset window.
When the structure of the original data source remains the same and only rows are appended, you can use an incremental refresh to keep your dataset up to date. This refresh type adds only the rows that are new since the previous refresh. To identify new rows, the data source must have a date (DateTime) column that is updated with new data.
To schedule an incremental refresh, go to an extract and under the More actions menu, select Modify > Extract > Schedule refresh. For the refresh type, ensure that Incremental is selected.
For a single-table extract, you first select a column with dates, based on which new rows can be identified. Then, click Schedule to define the refresh frequency (hourly, daily, weekly, monthly) and a time zone.
For a multi-table extract, you need to add tables that you want to refresh, and then for each table, select a date column used to identify new rows. To define the refresh schedule for a table, click Schedule. If applicable, you can select Use the same schedule for all tables to specify the schedule once. Then, at any time, you can unselect the checkbox and adjust the schedule for each table.
You can always check the dates of the last refresh. In List view, point to the info icon next to an extract and review the tooltip. Depending on the extract type, you may see a single date or different dates for each table.
Additionally, you can check the last refresh dates in the Explore dataset window.
In addition to scheduling a regular full refresh, you can initiate a full extract refresh immediately. For an extract, under the More actions menu, select Modify > Extract > Refresh now. The dataset becomes available for visualization after the refresh is completed.
Now you can create, cache, and visualize a dataset based on a MongoDB data source. The new data source connection icon has been added to the data connections list.
The new query and data federation engine introduces new functions in the DataClarity Platform and deprecates or replaces other functions with equivalents in the following categories:
- Date and time
For details, review the "Functions & macros" article in the DataClarity Help Center.
Starting with this release, you can display column values as hyperlinks in the Table widget. In the widget data settings, for a column containing the URLs, click Options and select Format. In the Column format dialog, under Format as, select HTML Link. You can specify a label to display instead of URLs. Additionally, you can choose how to open the link: in the same or a new window. Finally, you can customize a color for links, a color to show on a link hover, and a color for the visited link.
In Table, you can create dynamic hyperlinks using a column calculation. For example, you can combine a base URL and a column value by creating a calculation similar to the following:
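The original example is not reproduced here; a calculation of this form might look like the following (the concatenation operator, the base URL, and the Country column name are illustrative assumptions, not confirmed product syntax):

```
'https://www.example.com/countries/' + Country
```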
Then, for the calculation, click Options and select Format. Under Format as, select HTML Link. As a result, each link is created dynamically by combining a base URL and a value from the Country column.
In the Table widget, you can now add a hyperlink as an HTML calculation. To do so, add a new dimension calculation, and then provide the URL formatted as the HTML code, for example:
'<a href="https://www.example.com" target="_blank">Click me</a>'
Now, you can change the alignment of any column in the Table widget. In the widget settings, next to the column name, point to the Options icon and select Format. In the dialog that opens, select the alignment: Right, Left, or Center. By default, the alignment is set to Auto, where string columns are left-aligned and numeric columns are right-aligned.
Now, users can customize the message displayed in a widget when the visualization does not return any data. Using the new “No data” message field, you can change the default message to the one that better suits a use case for storyboard users. For example, it could be “No claims are available” or “There are no results for your data settings.”
The improved script editor now allows you to rearrange data frame columns simply by dragging and dropping the columns within the field. Additionally, when the script lines are longer than the script field, the text is automatically wrapped.
With the new Rich Text widget, you can add and format text to enhance a storyboard page. The widget’s built-in text editor helps you quickly define font properties, create lists, apply inline formatting, and align text. To enrich the text, you can also add links to images, videos, and URLs, and insert special characters.
Additionally, the widget’s settings pane provides options for adjusting the widget’s margins and applying a background image.
To add the widget to a storyboard page, go to Widgets > Visualizations > Text & Tabular, and drag Rich Text to the canvas.
By default, Data Server has SSL disabled. However, if you need a desktop BI application, such as Tableau or Power BI, to use a secure connection to Data Server, you must enable SSL and then import the Data Server certificate on the machine running the desktop application. After you enable SSL, Data Server uses a self-signed certificate. If you have a custom certificate, you can configure Data Server to use it.
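As an illustration, after enabling SSL you might retrieve and trust the self-signed certificate with standard tooling similar to the following (the hostname and port are placeholders, not actual Data Server defaults; consult the product documentation for the exact procedure):

```shell
# Fetch the server's certificate and save it in PEM format
# (replace dataserver.example.com:10430 with your Data Server host and port)
openssl s_client -connect dataserver.example.com:10430 -showcerts </dev/null \
  | openssl x509 -outform PEM -out dataserver.pem

# On a Windows machine running Tableau or Power BI, import the certificate
# into the Trusted Root Certification Authorities store (requires admin rights)
certutil -addstore -f Root dataserver.pem
```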
Release Notes (other formats)
- To view the summary of the release features, see https://www.dataclaritycorp.com/2021-4-features/.
- To view the Release Notes in PDF format, see https://www.dataclaritycorp.com/dataclarity-release-notes-2021-4/.