Add a dataset based on a server
You can add a new dataset using data from various data sources. This topic describes how to create a dataset based on a database. For details, see Data sources.
If you want to base your dataset on a file, see Add a dataset based on data files.
Prerequisites
-
You have all the necessary role permissions granted by an administrator in Access Manager.
Permissionsdataprep.access
dataprep.dataset.create
dataprep.data-connection.create (if you need to add files or your own connections)
dataprep.dataset.extract (if you need to create an extract dataset)
-
You have access to a server or database that you want to use.
-
To connect to data, you need to have one of the following:
-
A saved or shared data connection
-
Connection details to a server
-
Procedure
-
You have two options to proceed with dataset creation:
To a serverBy using this option, you establish a new connection to a server and then create a dataset based on it. The connection will be saved on the Data connections pane.
-
Click To a server.
-
Click a server that you need.
-
In the Add data connection dialog, define the connection to the server.
The available fields depend on the type of data source that you selected.
Microsoft SQL Server-
Select the authentication mode. You can choose one of the following options:
-
Username and password
-
Windows authentication
-
-
Specify the connection details as follows:
-
Server – Specify the address of the Microsoft SQL server.
-
Port – Specify the port used for the database.
-
Database – Specify the name of the database.
-
Username – Specify the username for the database.
This field is available only if you selected Username and password as the authentication mode. -
Password – Specify the password for the database.
This field is available only if you selected Username and password as the authentication mode. -
Schema – Specify the name of the database schema. Click Load list to view available schemas.
-
Ask user for credentials – Switch to On if you want to share this connection or a dataset based on it without sharing your credentials. In this case, recipients need to provide their credentials to access data from this connection.
-
Connection name – Specify a name for this connection.
-
Oracle data source-
Select the authentication mode. You can choose one of the following options:
-
Username, password, and SID
-
Username, password, and service name
-
-
Specify the connection details as follows:
-
Server – Specify the address of the server where the Oracle data source is stored.
-
Port – Specify the port used for the database.
-
Service ID – Specify the service ID for the database.
This field is available only if you selected Username, password, and SID as the authentication mode. -
Service name – Specify the service name for the database.
This field is available only if you selected Username, password, and service name as the authentication mode. -
Username – Specify the username for the database.
-
Password – Specify the password for the database.
-
Schema – Specify the name of the database schema. Click Load list to view available schemas.
-
Ask user for credentials – Switch to On if you want to share this connection or a dataset based on it without sharing your credentials. In this case, recipients will need to provide their credentials to access data from this connection.
-
Connection name – Specify a name for this connection.
-
Other databasesDepending on the database type, the following fields may be available:
-
Server – Specify the address of the server where the database is stored.
-
Port – Specify the port used for the database.
-
Database – Specify the name of the database.
-
Username – Specify the username for the database.
-
Password – Specify the password for the database.
-
Schema – Specify the name of the database schema. Click Load list to view available schemas.
The field is available for the PostgreSQL, DB2, Apache Derby, and Google Cloud SQL sources.
-
Host – Specify the host for the database.
-
Instance – Specify the instance of the database.
-
Ask user for credentials – Switch to On if you want to share this connection or a dataset based on it without sharing your credentials. In this case, recipients will need to provide their credentials to access data from this connection.
-
Connection name – Specify a name for this connection.
-
-
Click Test Connection.
-
If the connection to the data source was successful, click Create dataset.
-
-
Select data for your dataset:
-
In the Choose tables to import page, under Tables, double-click the data source to expand it.
-
In the tree, click the measures and dimensions that you want to include in the dataset.
All the items that you click appear under Selected items.
If you click a folder, all the items from that folder are added with one click.
Rename columnsTo rename all the columns in a table, point to a table, and then click Rename all columns.
In the Rename all columns dialog, specify how to rename the columns:
- Add prefix – Add a data source name or a custom string in front of the column name.
- Add suffix – Add a data source name or a custom string after the column name.
- Format – Capitalize the column names: only first letter, all uppercase, or all lowercase.
-
Replace – Enter the text that you want to replace and specify a new text. For example, you can replace the underscore with the space character.
Rename a columnTo rename a column in a table, point to the column, and then click Rename column.
In the Rename columns dialog, specify a new name for the column and click OK.
To remove an item from the Selected items section, point to a column, and then click Delete. To remove all selected items, click Clear all columns at the bottom of the page.
-
-
In the upper-right corner of the page, click Next.
The Step 2 – Refine page opens.
On this page, you can refine the data that you extracted from a data connection and add more data sources.
-
Review the connections and data sources details:
-
Connection to the data source
The icon indicates the type of data source (server or file). -
Data sources (tables)
If the source contains only one table, this table is displayed on a tab as a unique data source. The name of the table (data source) is the same as the file name. If you have more than one table, each is placed on a separate tab.
-
Columns selected for the dataset
You can click Preview data to view the data in columns. The preview is not available for some data sources.
-
-
Manage the connections and data sources as needed:
Add more data to the datasetClick Add more data and follow the required steps for adding a new data source or a new connection. If you want to add a file (for example, an Excel spreadsheet), see Add a dataset based on data files.
Note: If you have more than one data source, then after data refinement, you will review data joins to ensure relationships among data sources are identified as needed. More details are in Step 3 – Join and Preview.
Edit the data source detailsThe following options are available:
-
Rename – Click Rename data source and specify a new name for the data source.
Important: The name of the first data source is used for the new dataset. If you want to create several datasets from the same data source, specify a different name for each dataset. You cannot create several datasets with the same name.
-
Remove – Click Remove next to the name of the data source.
-
Readd a data source – Click the arrow next to the connection name, and then select the data source.
Add a calculationYou have several options for adding a calculation to a dataset. For details, see About calculations.
Add a security filterIf no security filters are added, then all the data in a shared dataset is visible to all its recipients. With a security filter, some recipients have access only to a subset of data that is relevant to them. For example, each department can see the data related to the respective department and not the whole dataset.
For details, see Row-level security filters.
-
-
Manage the columns of each data source as needed:
Add or edit a column in a data sourceThe following options are available:
-
Rename – Click column options, and then click Rename. Edit the name as needed.
-
Exclude – Point to the column, click column options, and then click Exclude.
-
Split – Point to the column, click column options, and then click one of the following options:
-
Clean – Point to the column, click column options, and then click one of the following options:
-
Format
- Make uppercase – Make all characters uppercase.
- Make lowercase – Make all characters lowercase.
- Capitalize first letter – Make the first letter uppercase and turn the rest to lowercase.
-
Remove
-
Punctuation – Remove selected types of punctuation in a string column.
-
NULLs – Add a filter to remove all rows with the NULL value in a column.
-
Blanks – Add a filter to remove all blank rows in a string column.
-
Zero values – Add a filter to remove all rows with zero value in a numeric column.
-
Negative values – Add a filter to remove all rows with a negative value in a numeric column.
-
-
Replace
-
Trim
-
Leading & Trailing spaces – Remove spaces at the beginning and at the end of the string in a column.
-
Leading spaces – Remove spaces at the beginning of the string in a column.
-
Trailing spaces – Remove spaces at the end of the string in a column.
-
All spaces – Remove all spaces in a column.
-
Leading character – Remove the specified character at the beginning of the string, in a column.
-
Trailing character – Remove the specified character at the end of the string, in a column.
-
-
Concatenate – Join two strings into one, with or without a separator in between.
-
-
Readd – Click the arrow next to the name of the data source. Then, select the column that you want to readd.
-
Sort alphabetically – Point to the column, and then click sort column.
-
Add new column – (Server data sources only) Click the arrow next to the name of the data source. Then, click the Add column and select the columns in the Add column dialog.
Check the data role for columnsFor each data source, check if the data roles are identified correctly. To adjust them, see Change the data role.
Check the aggregation typeCheck the aggregation type for the columns. Adjust the aggregation if needed. The aggregation type that is defined at the time of a dataset creation is a default type for that measure in a visualization.
For details, see Change the data aggregation.
Format numbersIn the Column format dialog, select the format type to apply, customize its options, and click OK.
NumberYou can customize the following format options:
-
Decimal places
- Select the number of decimal places to display in the number. For example, depending on the decimal places, a number may look like 3.14 or 3.14159.
-
Decimal separator
- Select the separator to show for decimals: Comma (,), Period (.), Space (), or Custom to provide another separator symbol.
-
Thousands separator
- Select which separator to show every thousand, for example, 10,000 or 10.000.
- You can select Comma (,), Period (.), Space (), or Custom to provide another separator symbol. To show the number without any separator, select Custom and leave the field empty.
-
Negative values
- Select how to display negative values: -1234, (1234), or 1234-.
-
Display units
- Select a specific unit to abbreviate the numbers: None, Auto, Thousands (K), Millions (M), Billions (B), Billions (G), and Trillions (T).
- For example, if the number is 47,000 and the display units are Thousands (K), the number is shown as 47K.
-
Prefix
- Specify characters that precede each displayed number.
-
Suffix
- Specify characters that follow each displayed number.
CurrencyYou can customize the following format options:
-
Decimal places
- Select the number of decimal places to display in the number. For example, depending on the decimal places, a number may look like 3.14 or 3.14159.
-
Decimal separator
- Select the separator to show for decimals: Comma (,), Period (.), Space (), or Custom to provide another separator symbol.
-
Thousands separator
- Select which separator to show every thousand, for example, 10,000 or 10.000.
- You can select Comma (,), Period (.), Space (), or Custom to provide another separator symbol. To show the number without any separator, select Custom and leave the field empty.
-
Negative values
- Select how to display negative values: -1234, (1234), or 1234-.
-
Display units
- Select a specific unit to abbreviate the numbers: None, Auto, Thousands (K), Millions (M), Billions (B), Billions (G), and Trillions (T).
- For example, if the number is 47,000 and the display units are Thousands (K), the number is shown as 47K.
-
Currency
- Select a currency from the list or specify the custom currency.
-
Display currency
-
Select how to display currency: as a symbol ($) or as an international symbol (USD).
-
Prefix
- Specify characters that precede each displayed number.
-
Suffix
- Specify characters that follow each displayed number.
PercentageYou can customize the following format options:
-
Decimal places
- Select the number of decimal places to display in the number. For example, depending on the decimal places, a number may look like 3.14 or 3.14159.
-
Multiply by 100
- Select whether to multiply the number representing a percentage. For example, the value of 0.05 will be formatted as 5%.
-
-
If you added another data source, you need to join them.
Click Next to go to the Step 3 – Join and preview page.
-
(Extract Creators only) By default, new datasets use a live connection. You may choose to create an extract. For details, see Extract data for a dataset.
-
Click Finish.
Click OK in the notification message.
The dataset is created.
Now, you can use this dataset for visualizations. For details, see Add widgets.
Note: If you established a connection to the server when creating this dataset, the data connection is saved to the Data connections pane.
On this page, you can review and adjust the relationships (joins) between tables, add or remove columns, preview the data source column data, and save your dataset.
-
Check the join types added between the tables. To adjust the joins, see Define joins between data sources.
-
If needed, you can manage data sources and columns in this step too. For details, see Manage data sources and columns in a dataset (Step 3).
-
(Optional) Click Preview data to preview the dataset. You can resize the Data preview pane by dragging the pane header.
You can also manage the columns from the preview table.
-
If needed, change the name of your dataset.
-
(Extract Creators only) By default, new datasets use a live connection. You may choose to create an extract. For details, see Extract data for a dataset.
-
Click Finish.
Click OK in the notification message.
The dataset is created.
Now, you can use this dataset for visualizations. For details, see Add widgets.
If you established a connection to a server when creating this dataset, the data connection is saved to the Data connections pane.
Note: You can return to the dataset and edit columns at any time later. For details, see Edit dataset.
Comments
0 comments