Business Objective

The customer faced several challenges in extracting and processing "BrandAnalyticsSearchTerms" report data from the Amazon Selling Partner API.

Business Challenge

  • Generic ETL Pipeline for Varied Reporting Periods:
    The report is delivered for different reporting periods, so a single pipeline had to handle both weekly and monthly data loads.
  • Data Quality and Restatement:
    Maintaining accuracy and integrity required high data quality and the ability to restate data when errors were found.
  • Scalability for Future Reports:
    The pipeline needed to accommodate different report types beyond the “BrandAnalyticsSearchTerms,” ensuring flexibility for future data requirements.
  • Data Validation and Reconciliation (DVR):
    Implementing robust data consistency checks across multiple layers was essential, along with a process to allow corrections for specific accounts or dates.
  • Efficient Data Transformation and Storage:
    Transforming and storing large volumes of JSON data (up to 12 million items) required scalable, high-performance solutions in Azure.
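
To make the DVR requirement concrete, the following is a minimal sketch of one possible consistency check: it compares row counts per account and report date between two layers and lists the partitions that would need restating. The layer names (`raw_rows`, `curated_rows`), the field names (`account_id`, `report_date`), and the helper functions are hypothetical and are not taken from the customer's implementation.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Partition(NamedTuple):
    """One unit of restatement: a single account and report date."""
    account_id: str
    report_date: str  # ISO date string, e.g. "2024-01-06"


def count_by_partition(rows: Iterable[dict]) -> Counter:
    """Count rows per (account, report date) within one layer."""
    return Counter(Partition(r["account_id"], r["report_date"]) for r in rows)


def find_mismatches(source_rows: Iterable[dict],
                    target_rows: Iterable[dict]) -> List[Partition]:
    """Return the partitions whose row counts differ between two layers."""
    source = count_by_partition(source_rows)
    target = count_by_partition(target_rows)
    # A Counter returns 0 for a missing key, so a partition that exists
    # in only one layer is also reported as a mismatch.
    all_partitions = source.keys() | target.keys()
    return sorted(p for p in all_partitions if source[p] != target[p])


# Hypothetical sample data: account A1 lost one row between layers.
raw_rows = [
    {"account_id": "A1", "report_date": "2024-01-06"},
    {"account_id": "A1", "report_date": "2024-01-06"},
    {"account_id": "A2", "report_date": "2024-01-06"},
]
curated_rows = [
    {"account_id": "A1", "report_date": "2024-01-06"},
    {"account_id": "A2", "report_date": "2024-01-06"},
]

to_restate = find_mismatches(raw_rows, curated_rows)
```

Here `to_restate` contains only the partition for account `A1`, so a correction could be limited to that account and date rather than reloading everything. A production check would likely compare more than row counts (for example, sums or checksums of key columns).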

Solution

To address these challenges, a highly efficient and scalable ETL pipeline was developed with the following features:

  1. Generic and Flexible Design:
    A generic pipeline was built using Azure Data Factory and Azure Databricks to process weekly and monthly data loads.
  2. Delta Lake for Data Storage:
    Delta Lake tables were used to store transformed data, enabling incremental (delta) loads into the Search_Terms fact and dimension tables.
  3. Scheduled Automation:
    Scheduled triggers ran the pipeline automatically: weekly on Mondays at 11 PM and monthly on the 3rd of each month at 11 PM.
  4. Data Validation and Reconciliation (DVR):
    A DVR process validated data accuracy across all layers and supported restating data when needed.
  5. Advanced Data Transformation:
    Complex transformations were executed in Azure Databricks to prepare the data for analytics.
  6. Scalability for Future Needs:
    The pipeline's flexible architecture can take on additional report types as data requirements grow.
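
As an illustration of how one generic pipeline can serve both schedules, the sketch below derives the reporting window from the run date and a period label. It assumes that a weekly report covers the most recent complete Sunday-to-Saturday week and that a monthly report covers the previous calendar month; these boundaries, the `WEEK` and `MONTH` labels, and the function names are assumptions for illustration and should be checked against the report's documentation.

```python
from datetime import date, timedelta
from typing import Tuple


def weekly_window(run_date: date) -> Tuple[date, date]:
    """Most recent complete Sunday-to-Saturday week before run_date."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6.
    # If run_date is itself a Saturday, step back a full week because
    # the current week is not complete yet.
    days_since_saturday = (run_date.weekday() - 5) % 7 or 7
    end = run_date - timedelta(days=days_since_saturday)
    return end - timedelta(days=6), end


def monthly_window(run_date: date) -> Tuple[date, date]:
    """First and last day of the calendar month before run_date."""
    end = run_date.replace(day=1) - timedelta(days=1)
    return end.replace(day=1), end


def report_window(period: str, run_date: date) -> Tuple[date, date]:
    """Dispatch on the period label so callers share one code path."""
    if period == "WEEK":
        return weekly_window(run_date)
    if period == "MONTH":
        return monthly_window(run_date)
    raise ValueError(f"Unsupported period: {period!r}")


# A Monday run (8 January 2024) picks up the week that ended on Saturday.
week_start, week_end = report_window("WEEK", date(2024, 1, 8))

# A run on the 3rd of the month picks up the whole previous month.
month_start, month_end = report_window("MONTH", date(2024, 3, 3))
```

With these inputs the weekly window runs from 31 December 2023 to 6 January 2024, and the monthly window covers 1 to 29 February 2024. Keeping the period as a parameter means a new cadence can be added as another branch without changing the callers.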

Benefits

  1. Actionable Insights:
    The customer could monitor search rank trends and adjust marketing strategies based on consumer interest.
  2. Optimized Marketing Efforts:
    Identifying top ASINs by search terms helped focus marketing on high-performing products, optimizing ad spend and improving product listings.
  3. Enhanced Branding:
    Tracking unique branded search terms provided insight into brand recognition and the effectiveness of branding efforts.
  4. Improved ROI:
    Analyzing the correlation between search rank and ad spend enabled efficient budget allocation for maximum return on investment.
  5. Category-Specific Strategies:
    Segmenting search terms by category allowed for tailored marketing efforts, improving sales and market penetration.