The team was working on a data governance requirement: classify each asset as a “Dimension Table” or a “Transaction Table” based on the asset name. They were using Azure Purview as the data governance tool. Repeating the same manual steps for every asset was tedious and time-consuming, so the team decided to automate the process.
The team used PyApacheAtlas, a Python SDK for performing the most common Azure Purview operations programmatically, to automate the classification. The steps below classify an asset based on its name.
Step 1 – Establish a Connection
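The connection code is not shown here, so the following is only a minimal sketch of how it might look, assuming a service principal is used for authentication. The environment variable names (TENANT_ID, CLIENT_ID, CLIENT_SECRET, PURVIEW_NAME) and the helper names load_credentials and build_client are hypothetical; ServicePrincipalAuthentication and PurviewClient come from PyApacheAtlas.

```python
import os


def load_credentials(env=os.environ):
    """Collect the service principal settings (hypothetical variable names)."""
    required = ["TENANT_ID", "CLIENT_ID", "CLIENT_SECRET", "PURVIEW_NAME"]
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing settings: " + ", ".join(missing))
    return {key: env[key] for key in required}


def build_client(settings):
    """Create a Purview client from the settings returned above."""
    # Imported lazily so load_credentials works without the SDK installed.
    from pyapacheatlas.auth import ServicePrincipalAuthentication
    from pyapacheatlas.core import PurviewClient

    auth = ServicePrincipalAuthentication(
        tenant_id=settings["TENANT_ID"],
        client_id=settings["CLIENT_ID"],
        client_secret=settings["CLIENT_SECRET"],
    )
    return PurviewClient(
        account_name=settings["PURVIEW_NAME"],
        authentication=auth,
    )


# Typical use, once the environment variables are set:
# CLIENT = build_client(load_credentials())
```

Reading the secrets from the environment keeps them out of the script. The service principal would also need a suitable role on the Purview collection before any update call succeeds.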
Below is the function to classify the assets as a “Dimension Table” or a “Transaction Table” based on the asset name. It fetches the details of every asset in a given collection and uses each asset's name to decide which classification to apply.
The CLIENT._classify_entity_adds method adds the provided classifications to an entity.
The get_all_entities_in_collection function retrieves all entities in the specified collection.
Its parameter is the collection name, typically a 6-letter pseudo-random string such as “kd2cbh”, which can be obtained from the Purview portal.
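The pieces described above could fit together roughly as follows. This is a sketch under several assumptions, not the team's actual function: the naming rule (a "dim" prefix marks a dimension table) is hypothetical, the entities are assumed to be search results carrying "id" and "name" keys, the search filter on collectionId is assumed to be how the collection is selected, and _classify_entity_adds is assumed to take a GUID and a list of classification dictionaries.

```python
DIMENSION = "Dimension Table"
TRANSACTION = "Transaction Table"


def classification_for(asset_name):
    """Hypothetical naming rule: a 'dim' prefix marks a dimension table."""
    if asset_name.lower().startswith("dim"):
        return DIMENSION
    return TRANSACTION


def get_all_entities_in_collection(client, collection_name):
    """Return every search hit that belongs to the given collection."""
    # Assumption: the discovery search accepts a collectionId filter.
    hits = client.discovery.search_entities(
        "*", search_filter={"collectionId": collection_name}
    )
    return list(hits)


def classify_collection(client, collection_name):
    """Classify each asset in the collection and return {name: label}."""
    applied = {}
    for entity in get_all_entities_in_collection(client, collection_name):
        label = classification_for(entity["name"])
        # Assumption: the payload is a list of {"typeName": ...} dicts.
        client._classify_entity_adds(entity["id"], [{"typeName": label}])
        applied[entity["name"]] = label
    return applied


# Typical use with a connected client:
# classify_collection(CLIENT, "kd2cbh")
```

Both classification types would need to exist in Purview before the call, and adding a classification that an entity already carries is likely to be rejected, so a rerun may need to skip assets that are already classified.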
Conclusion:
The automation classified the assets efficiently and eliminated the manual effort.