For machine learning and stream analytics to produce the best possible insights, it is important to collect all available data, even data that seems to have no relevance to the question you are trying to answer.
Context is what provides real insights.
- Engine temperature could be affected by poor fuel quality, by driving in the wrong gear for the road conditions, or even by a particularly hot day. The relevance and accuracy of machine-learning-based trend analytics will be severely reduced if an incomplete set of data is used.
- Delays in road logistics could be caused by severe weather, holiday traffic patterns, or even strikes and other forms of industrial action.
- Cash withdrawals at an ATM could be affected by the day of the month, the month of the year, events at a nearby sports stadium, severe rain if the ATM is situated outdoors, or even a spike in crime in the surrounding area.
- Stock in a warehouse is affected by both upstream and downstream events. Forecasting assumes that last year’s data applies to this year. If there is a sudden run on stock in a particular retail region, stock-outs could occur because planning was based on the previous year's data and stock was sent to the regions where sales were forecast. Using IoT in a broad sense, companies should collect data from the news media, from other sources such as social media, and from the ERP systems of both suppliers and customers. Use this data, along with prior-year data, to inform planning.
- International events such as armed conflict could have a severe impact on production and logistics in neighbouring areas.
- Global research, innovations and patents that use rare minerals, or that use common minerals or metals in new ways, could have an impact on demand, scarcity and pricing for certain commodities.
- Momentum in green initiatives and global decisions around greenhouse gas emissions could have a major impact on carbon-rich businesses.
- Service delivery protests, industrial action, taxi strikes and similar events could have an impact on staff attendance. This data will not be found on any company system or produced by any company sensor or controller unit, yet it is critical for a better understanding of staff movement and productivity.
Collecting as much data as possible from as diverse a set of data sources as possible provides the critically needed context that enriches the models produced by machine learning algorithms and the resulting actionable insights.
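As a rough illustration of what adding context can look like in practice, here is a minimal Python sketch based on the ATM example. It attaches calendar, weather and nearby-event context to each withdrawal record before the data is handed to a model. All names and values (`withdrawals`, `weather`, `stadium_events`, `enrich`) are hypothetical and made up for illustration; a real pipeline would pull this context from external feeds.

```python
from datetime import date

# Hypothetical internal records: daily cash withdrawals at one ATM.
withdrawals = [
    {"day": date(2024, 3, 29), "amount": 41000},
    {"day": date(2024, 3, 30), "amount": 18500},
]

# Hypothetical external context, keyed by date. In practice this would
# come from sources outside the company's own systems.
weather = {date(2024, 3, 30): "heavy_rain"}
stadium_events = {date(2024, 3, 29): "football_match"}


def enrich(record, weather_by_day, events_by_day):
    """Return a copy of the record with context fields added."""
    day = record["day"]
    return {
        **record,
        "day_of_month": day.day,
        "month": day.month,
        # Fall back to explicit placeholders when no context is known.
        "weather": weather_by_day.get(day, "unknown"),
        "nearby_event": events_by_day.get(day, "none"),
    }


enriched = [enrich(r, weather, stadium_events) for r in withdrawals]

for row in enriched:
    print(row)
```

Each enriched record now carries the context alongside the raw measurement, so a model trained on it can relate a withdrawal spike to a stadium event or a dip to heavy rain, rather than seeing only the amounts.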