Moving from Intuitions to Predictive Analysis with Big Data
About the Client:
As an agrochemical major with over $6 billion in annual revenue, our client and its affiliates are a global provider of value-added solutions for the agriculture, turf and ornamental, energy, and chemical markets. They also supply and service one of the largest networks providing the essential nutrients that support global agriculture. The company serves demand through a global presence that includes Australia, the UK, France, Brazil, Mexico, and the U.S., among others.
Our client needed to understand the correlation between fertilizer sales and weather conditions across multiple regions. Their objective was to provide a data-driven analytics platform that business analysts could query, and to move the company towards predictive modeling and analytics-based business planning.
Business planners had been entering market and sales forecasts manually in spreadsheets, based on their knowledge of the industry and market conditions. The required data, gathered from several industry documents, was spread across multiple databases and custom sheets at different geographic locations.
A further objective of this initiative was to determine whether the company could leverage “Big Data” tools and technologies to manage and search these documents more effectively.
TekLink Solution (High Level):
The overall approach was divided into data-driven strategies using “Big Data” tools and technologies, as described below:
Technical Solution Overview (Details):
- The shipment data was loaded into HDFS, and Hive tables were created for simple queries. External weather data was loaded in separate files for two regions as a pilot run. Some cleansing of the weather data was needed because it contained bad values, such as 999999 for the minimum and maximum temperature on random days. Left in place, these values would have caused data mismatches and misrepresented the results.
- A Pig script was used to create a third dataset that overlaid and compared the shipment and weather data. The same script filtered the shipment data down to the pilot regions before creating the third dataset.
- This third dataset was then used to run various algorithms (multilinear regression and decision trees) to understand the correlation.
- These algorithms were run from Scala code that called functions in MLlib, the machine learning library of Apache Spark.
- Apache Solr was deployed, and multiple relevant datasets were ingested into it.
- With minimal configuration, Solr was able to ingest both structured and unstructured data formats as required.
- The Solr admin interface was further used to demonstrate the search capabilities of Solr.
- Following data ingestion, different search terms were used to highlight the relevant documents.
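The cleansing and overlay steps from the overview above can be illustrated with a small sketch. The project itself used Hive and Pig for this work; the Python below is only a stand-in to show the logic. The field names (`region`, `day`, `t_min`, `t_max`, `tons`), the region names, and all sample values are hypothetical and are not taken from the client's data. Only the 999999 placeholder comes from the description above.

```python
from datetime import date

SENTINEL = 999999  # bad-data placeholder reported in the weather feed
PILOT_REGIONS = {"region_a", "region_b"}  # hypothetical pilot region names


def clean_weather(rows):
    """Drop weather rows whose min or max temperature is the placeholder."""
    return [
        r for r in rows
        if r["t_min"] != SENTINEL and r["t_max"] != SENTINEL
    ]


def overlay(shipments, weather):
    """Join pilot-region shipments to weather readings on (region, day)."""
    weather_by_key = {(w["region"], w["day"]): w for w in weather}
    joined = []
    for s in shipments:
        if s["region"] not in PILOT_REGIONS:
            continue  # keep only the pilot regions
        w = weather_by_key.get((s["region"], s["day"]))
        if w is None:
            continue  # no usable weather reading for that region and day
        joined.append({**s, "t_min": w["t_min"], "t_max": w["t_max"]})
    return joined


# Made-up sample rows, for illustration only.
weather = [
    {"region": "region_a", "day": date(2015, 3, 1), "t_min": 4.0, "t_max": 12.0},
    {"region": "region_a", "day": date(2015, 3, 2), "t_min": SENTINEL, "t_max": SENTINEL},
    {"region": "region_b", "day": date(2015, 3, 1), "t_min": 9.0, "t_max": 18.0},
]
shipments = [
    {"region": "region_a", "day": date(2015, 3, 1), "tons": 120.0},
    {"region": "region_a", "day": date(2015, 3, 2), "tons": 95.0},
    {"region": "region_b", "day": date(2015, 3, 1), "tons": 200.0},
    {"region": "region_c", "day": date(2015, 3, 1), "tons": 310.0},
]

third_dataset = overlay(shipments, clean_weather(weather))
for row in third_dataset:
    print(row["region"], row["day"], row["tons"], row["t_min"], row["t_max"])
```

Dropping the placeholder rows before the join means a shipment on a day with a bad reading is excluded from the third dataset rather than paired with a meaningless temperature. Whether the original pipeline dropped or imputed such rows is not stated, so that choice is an assumption here.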
Outcomes and Benefits:
- Analysis of various feature vectors showed a correlation of up to 0.9 between weather and shipment data.
- By implementing the solution across the enterprise, our client was therefore able to analyze and index documents and datasets using Solr.
- As a result, and for the first time, business analysts were empowered by a big-data analytics platform to make decisions based on accurate predictive data rather than on intuition and/or systems of record.
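To make the correlation figure in the outcomes concrete, the sketch below shows how a correlation coefficient between one weather feature and shipment volume could be computed. The case study does not say which correlation measure was used, so the Pearson coefficient is an assumption, and the sample series are invented for illustration rather than drawn from the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Invented values: daily maximum temperature and tons shipped.
t_max = [8.0, 11.0, 14.0, 17.0, 21.0]
tons = [90.0, 120.0, 135.0, 170.0, 200.0]

r = pearson(t_max, tons)
print(f"Pearson r = {r:.3f}")
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship. A coefficient of this kind describes association only and does not by itself show that weather drives shipments.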