Data catalog features overview
As the number of datasets in an organization keeps growing, users find it increasingly challenging to identify which datasets fit their business purpose. Data citizens need to quickly understand which datasets they can trust, where the data comes from, whether the data is up to date, and which datasets include the right data for a task. The platform’s data catalog combines metadata, search tools, and curation activities to help users evaluate whether a dataset is suitable for the intended use. For example, users can search for datasets and data sources, use tags and data annotations, track dataset lineage, evaluate usage statistics, review security access information, and curate the data.
The platform offers the following data catalog capabilities:
You can search for datasets by name, tag (keyword), and description (annotation). You can also use tags to search for other data assets, such as data connections and AI connections. Further, you can quickly evaluate each dataset in detail using the Explore dataset feature, which gives you a comprehensive overview of a dataset in one place. For details, see Explore dataset.
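The search behavior described above can be pictured with a minimal sketch. It is not the platform's implementation or API: the `DatasetEntry` record, its field names, and the `search_catalog` function are hypothetical, and simple case-insensitive substring matching is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical catalog record; the field names are illustrative only."""
    name: str
    description: str = ""
    tags: set = field(default_factory=set)


def search_catalog(entries, query):
    """Return names of entries whose name, description, or tags contain the query."""
    needle = query.lower()
    matches = []
    for entry in entries:
        searchable = [entry.name, entry.description, *entry.tags]
        if any(needle in text.lower() for text in searchable):
            matches.append(entry.name)
    return matches


# Made-up sample catalog used only for this illustration.
catalog = [
    DatasetEntry("Sales 2023", "Monthly revenue by region", {"finance", "sales"}),
    DatasetEntry("Web traffic", "Page views per day", {"marketing"}),
    DatasetEntry("Payroll", "Employee compensation", {"finance", "hr"}),
]

print(search_catalog(catalog, "finance"))     # matches by tag
print(search_catalog(catalog, "page views"))  # matches by description
```

A real catalog would typically rank results and index the metadata rather than scan it, but the idea is the same: one query is checked against names, tags, and descriptions.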
You can evaluate whether a dataset fits your use case without downloading the data or using any external BI tools. To understand the data included in a dataset, review its metadata and then evaluate the actual data values. The dataset explorer lets you see the full list of dataset columns, view data types, review column descriptions, and understand calculations. To preview actual data values in the columns and generate quick data insights based on automatic profiling, see Explore data in a dataset.
The platform provides a way for subject matter experts to contribute business knowledge and manage the metadata of data assets to help analysts understand the data. Dataset managers can add or remove tags, compose dataset annotations, and describe columns to curate data in the organization. Data stewards can endorse datasets: certification within the platform indicates that the data asset is based on reliable data sources and can be trusted. The notes can explain where the dataset fits best and highlight data specifics.
Information about the underlying data sources helps users to trust the data. Lineage helps you find the origin of the data (data source file or connection and the respective table). The data transformations, such as new calculated fields or applied filters, are included in the lineage graph.
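As a rough illustration of what tracing lineage means, the sketch below walks a small graph from a dataset back to its origin. The graph structure, node labels, and the `trace_origins` function are all hypothetical and only assume that each node records what it was derived from. They do not reflect how the platform stores lineage.

```python
# Hypothetical lineage graph: each node maps to the nodes it is derived from.
lineage = {
    "dataset:Sales report": ["transform:filter region = 'EU'"],
    "transform:filter region = 'EU'": ["transform:calculated field margin"],
    "transform:calculated field margin": ["table:orders"],
    "table:orders": ["connection:warehouse"],
    "connection:warehouse": [],
}


def trace_origins(graph, node):
    """Return the root sources (nodes with nothing upstream) reachable from node."""
    origins, stack, seen = [], [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a shared upstream node twice
        seen.add(current)
        upstream = graph.get(current, [])
        if not upstream:
            origins.append(current)
        stack.extend(upstream)
    return origins


print(trace_origins(lineage, "dataset:Sales report"))
```

Here the walk passes through the filter and the calculated field before reaching the table and its connection, which mirrors the way transformations appear between a dataset and its source in a lineage graph.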
Usage statistics show how often a dataset is used and which columns are included in queries, so you can evaluate the popularity of the dataset. The dataset overview page lists the authors (users with editing rights), the top users, and the user who last updated the dataset, so you know whom to contact about data-related changes. You can quickly identify which datasets have row-level security filters applied: they are labeled accordingly on the dataset card. If you have permission to edit a dataset, you can review its security filters.
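To make the idea of usage statistics concrete, here is a small sketch that counts how often each dataset is queried and which columns those queries reference. The query log format and the `usage_stats` function are assumptions made for illustration and are not the platform's data model.

```python
from collections import Counter

# Hypothetical query log: (dataset name, columns referenced by the query).
query_log = [
    ("Sales 2023", ["region", "revenue"]),
    ("Sales 2023", ["revenue"]),
    ("Payroll", ["salary"]),
    ("Sales 2023", ["region", "month"]),
]


def usage_stats(log):
    """Count queries per dataset and column references per dataset."""
    dataset_counts = Counter()
    column_counts = {}
    for dataset, columns in log:
        dataset_counts[dataset] += 1
        column_counts.setdefault(dataset, Counter()).update(columns)
    return dataset_counts, column_counts


dataset_counts, column_counts = usage_stats(query_log)
print(dataset_counts.most_common(1))           # most frequently queried dataset
print(column_counts["Sales 2023"]["revenue"])  # queries that touch "revenue"
```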
Learn more in Online Help
To learn more about data catalog features, review the following topics:
Track where dataset columns come from on the Explore dataset tab of the dataset explorer.
Describe dataset columns: see Manage data sources and columns in a dataset (Step 3).
Let users know which datasets can be trusted: see Certify a dataset.