To create a dataset, you can use data files, IBM data sources, databases, or any combination of these.
If you create a dataset from two or more data sources (for example, a cube view and an Excel file), you need to define joins between them. For details, see Define joins between data sources.
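As a conceptual illustration only, the following Python sketch shows what a join between two data sources does: rows from each source are matched on a shared column and merged. It does not use the product's interface or API. The function name `inner_join`, the column names, and the sample rows are all hypothetical, and an inner join is just one possible join type.

```python
def inner_join(left_rows, right_rows, key):
    """Combine rows from two sources where the value in `key` matches."""
    # Index the right-hand rows by key value for fast lookup.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for left in left_rows:
        for right in index.get(left[key], []):
            # Merge into a new dict; the input rows are left unchanged.
            joined.append({**left, **right})
    return joined


# Hypothetical sample data standing in for a cube view and an Excel file.
cube_view_rows = [
    {"product_id": 1, "revenue": 1200},
    {"product_id": 2, "revenue": 800},
    {"product_id": 3, "revenue": 150},
]
excel_rows = [
    {"product_id": 1, "product_name": "Widget"},
    {"product_id": 2, "product_name": "Gadget"},
]

dataset = inner_join(cube_view_rows, excel_rows, "product_id")
for row in dataset:
    print(row)
```

In this sketch, the row with `product_id` 3 has no match in the second source, so an inner join leaves it out of the result.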
Each table in a data source is displayed as a separate data source. You can make changes to these data sources, for example, remove columns or add calculations. These changes are not saved in the original files.
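The idea that such changes apply to the dataset and not to the original source can be sketched in Python as follows. This is an illustration under assumptions, not the product's implementation: the helper `prepare`, the column names, and the sample rows are hypothetical.

```python
def prepare(rows, drop_columns, calculations):
    """Return new rows without `drop_columns` and with calculated columns added."""
    prepared = []
    for row in rows:
        # Build a new row, so the source row is never modified.
        new_row = {k: v for k, v in row.items() if k not in drop_columns}
        for name, formula in calculations.items():
            new_row[name] = formula(row)
        prepared.append(new_row)
    return prepared


# Hypothetical rows standing in for one table of a data source.
source_rows = [
    {"region": "North", "units": 10, "unit_price": 2.5, "internal_note": "x"},
    {"region": "South", "units": 4, "unit_price": 3.0, "internal_note": "y"},
]

prepared_rows = prepare(
    source_rows,
    drop_columns={"internal_note"},
    calculations={"revenue": lambda r: r["units"] * r["unit_price"]},
)

print(prepared_rows[0])
print(source_rows[0])  # still contains "internal_note" and has no "revenue"
```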
You can use data files of the following types: .csv, .tsv, .txt, .xlsx, .xls, and .sav.
To create a dataset, you can use one or more data files. You may need to prepare your files before uploading them. For details, see Preparing data files.
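As part of preparing files, it can help to check their extensions against the supported types before uploading. The following Python sketch shows one way to do that. The function names and file names are hypothetical, and treating extensions as case-insensitive is an assumption, not documented behavior.

```python
from pathlib import Path

# File types listed above as supported for datasets.
SUPPORTED_EXTENSIONS = {".csv", ".tsv", ".txt", ".xlsx", ".xls", ".sav"}


def is_supported_data_file(filename):
    """Return True if the file's final extension is a supported type."""
    # Assumption: the extension check ignores letter case.
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(filenames):
    """Split file names into (supported, unsupported) lists."""
    supported = [f for f in filenames if is_supported_data_file(f)]
    unsupported = [f for f in filenames if not is_supported_data_file(f)]
    return supported, unsupported


# Hypothetical file names.
candidates = ["sales.csv", "survey.sav", "budget.XLSX", "report.pdf", "notes"]
supported, unsupported = split_by_support(candidates)
print("Supported:", supported)
print("Unsupported:", unsupported)
```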
You can create datasets using the following database types:
- Amazon Athena*
- Amazon Aurora (MySQL)
- Amazon Aurora (PostgreSQL)
- Amazon RDS (MariaDB)
- Amazon RDS (MySQL)
- Amazon RDS (PostgreSQL)
- Amazon Redshift*
- Apache Derby
- Apache Hive*
- Google AlloyDB*
- Google Cloud SQL (MySQL)
- Google Cloud SQL (PostgreSQL)
- IBM Db2
- Microsoft Azure SQL Database*
- Microsoft Azure Synapse*
- Microsoft SQL Server
- SAP HANA*
* Data sources marked with an asterisk are considered beta in the current release.
To create a dataset from a database, you need to define a connection to the respective server. For details, see Add data connections.