
AI Case Study

Australian Renewable Energy Agency improved the accuracy of solar energy predictions by up to 31% using machine learning methods based on a distributed network

The Australian Renewable Energy Agency investigated the ability of machine learning models to predict photovoltaic (solar) energy output over several forecast horizons for a network of sites. Three machine learning methods were tested alongside a naive baseline, and each performed differently depending on the forecast interval. Crucially, basing predictions on a distributed network of sites rather than on a single site, as existing methods do, improved accuracy by 9-31%.



Renewable Energy

Project Overview

"[T]he central idea of the project was the use of learning methods to make forecasts of PV output at some future time horizon (10, 30 and 60 minutes). The aim of these forecasts is to aid the decision making of the participants in the energy market: for example, the market operator can use a forecast of low supply to request alternative generators ahead of time, while energy-conscious consumers can use a forecast of high supply in the future to decide to use some high consumption device at a later time. The decision making capabilities of a market participant based on forecasts are constrained by at least two factors. The first factor is how reliable the forecasts are, according to some suitable performance measure. Ideally, such a measure reflects the participants’ underlying utility (e.g. the dollar cost incurred from mispredictions); further, forecasts should ideally be produced so as to optimise such a reference performance measure. The second factor is the precise nature of information provided in these forecasts.

Our first goal, therefore, is to confirm the efficacy of a variety of feature engineering options by experimenting on a single PV system, over a range of standard forecasting algorithms and scenarios... Equipped with the insights on feature engineering drawn from a single site, we next develop novel approaches that further improve the forecasting accuracy by aggregating the output of a distributed PV network."

Reported Results

Machine learning algorithms were successfully developed to offer better solar output predictions than the baseline at 10, 30 and 60 minute intervals. It was additionally discovered that training the algorithms on joint predictor data (historic output data for neighbouring PV systems) was more effective than a single site system.

"We have successfully developed and employed new algorithms to tackle the challenge of forecasting the PV output in the 10-60 minute horizon. By aggregating the data from a distributed network of PV systems, we have successfully improved the prediction accuracy by 9.2-31.1% compared with the state-of-the-art methods that are based on single sites. Furthermore by exhaustive experiments we identified the set of features that are key to this success. In particular, the appropriate choice of historical window and neighbourhood range are observed to be crucial for achieving the highest accuracy... To the best of our knowledge, the achievement of forecasting PV output based on a distributed network and obtaining significantly higher accuracy than localised predictors represents a world first and a significant breakthrough".
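The joint-predictor idea described above can be made concrete with a minimal sketch of the feature construction: the forecast for one site draws on lagged outputs from its neighbours as well as its own history. The site names, output values, and window size below are illustrative, not the paper's actual configuration.

```python
# Sketch of joint-predictor features for a distributed PV network:
# the feature vector for a target site combines its own recent history
# with the recent history of neighbouring sites.

def build_features(history, site, neighbours, window):
    """Return the feature vector for `site` at the latest timestep:
    the last `window` readings of the target site, followed by the
    last `window` readings of each neighbouring site."""
    feats = list(history[site][-window:])
    for n in neighbours:
        feats.extend(history[n][-window:])
    return feats

# Hypothetical PV output series (kW) for three nearby sites.
history = {
    "siteA": [4.1, 4.3, 4.0, 3.8],
    "siteB": [3.9, 4.0, 4.2, 4.1],
    "siteC": [4.4, 4.2, 4.1, 3.9],
}

x = build_features(history, "siteA", ["siteB", "siteC"], window=2)
# x holds 2 lags from siteA plus 2 lags from each neighbour (6 values).
```

A forecasting model trained on such vectors sees correlated cloud-cover signals from neighbouring sites before they reach the target site, which is one plausible reading of why the distributed approach outperforms single-site predictors.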


"Four different forecasting methods were tested. Three of the methods are machine-learning (ML) based methods, including two linear methods, with squared and absolute error, respectively, and a multi-layer perceptron neural network. The fourth was a naïve persistence predictor, for comparison to the machine learning methods. Our basic methodology is to train a function f(x) which will take a set of “features” x and predict the PV or radiation data for the new time." For the neural network, a single hidden layer was used, and a "total of five passes of stochastic gradient descent was used to find parameters."
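The setup above can be sketched in miniature: a naive persistence baseline next to a linear model f(x) = w·x + b trained with a few passes of stochastic gradient descent on squared error. The data, learning rate, and pass count here are illustrative assumptions, not the paper's configuration.

```python
# Persistence baseline vs. an SGD-trained linear forecaster (sketch).

def persistence(series):
    """Naive baseline: the forecast equals the most recent observation."""
    return series[-1]

def sgd_linear(xs, ys, passes=5, lr=0.01):
    """Fit f(x) = w.x + b on (feature-vector, target) pairs by
    stochastic gradient descent on squared error."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(passes):
        for x, y in zip(xs, ys):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y                       # gradient of 0.5*err^2
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

series = [3.0, 3.2, 3.1, 3.4]   # hypothetical PV output (kW)
print(persistence(series))       # 3.4
```

The persistence predictor is the standard baseline in short-horizon solar forecasting: since irradiance changes slowly over minutes, "the future equals the present" is surprisingly hard to beat.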

"Amongst the ML models tested, the results are varied across the four error measures. The neural network’s performance is unreliable. Its MAE and nRMSE values increase with the forecast horizon in the 10 minute interval forecasts, but do the opposite with the 30 minute data. It also experiences increases in error measures with normalisation of the data. Interestingly, the neural network delivers the most accurate forecast for the 10 minute forecast horizon (10 minute data, t+1) while also giving the most accurate 3 hour forecasts (hourly data, t + 3). This result is consistent with similar findings in other solar forecasting research using neural networks [9]. In the linear methods, the L1 loss Support Vector Machine (SVM) either outperforms or is equivalent to the L2 loss SVM method at timescales less than 60 minutes, and has much better accuracy at time horizons greater than one hour. Interestingly, the normalisation of the data tends to make the performance of the models more similar. It generally decreases their accuracy at forecast horizons of less than one hour, but the opposite is true for the hourly data forecast (60, 120, 180 minutes)."



Strategic Planning


Accurate forecasting of solar energy output allows market participants to plan for supply shortfalls, making solar energy systems more efficient overall. "The primary objective was to develop a methodology that compares favourably against existing methods in forecasting PV output. Although accuracy is one of the key aspects for comparison, computational and data complexity is also important in practice."



The research used "large-scale PV datasets in Canberra retrieved over the years 2012 to 2014. Along with the solar data, exogenous meteorological data was included. These were available from weather observations taken at the Wagga site and the Canberra Airport... These datasets were then further divided into training and test sets, where the last two years were used for testing, and the rest of the data for training. The PV data was trained on the period from April 2010 through the end of 2011 and then tested on 2012 through May 2013. The radiation data was trained on 2007-2009 data, and then tested over 2010-2011."
Missing output measurements for the 29 sites covered were filled in by averaging the corresponding values from the other sites.
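This gap-filling step can be sketched as follows: a missing reading at one site is replaced by the average of the other sites' readings at the same timestep. The data values and `None`-as-missing convention below are illustrative assumptions.

```python
# Cross-site imputation of missing PV readings (sketch).

def fill_missing(readings):
    """readings: dict mapping site name -> list of outputs, with None
    marking a missing value. Returns a copy in which each None is
    replaced by the mean of the other sites' values at that timestep."""
    sites = list(readings)
    n_steps = len(readings[sites[0]])
    filled = {s: list(v) for s, v in readings.items()}
    for t in range(n_steps):
        present = [readings[s][t] for s in sites if readings[s][t] is not None]
        avg = sum(present) / len(present)
        for s in sites:
            if filled[s][t] is None:
                filled[s][t] = avg
    return filled

data = {"s1": [2.0, None], "s2": [4.0, 3.0], "s3": [6.0, 5.0]}
print(fill_missing(data)["s1"])   # [2.0, 4.0]
```

Because nearby PV sites see correlated irradiance, the cross-site average is a reasonable stand-in for a missing reading, though it will smooth over purely local effects such as panel shading.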
