At its simplest, machine learning (ML) is learning from data. Every day, various types of data are recorded on a massive scale throughout the water sector, and ML can be used to analyze these complex datasets. Well-trained ML models can explore and process massive and diverse datasets in real time while also providing rapid predictions and/or recommendations for operators – a difficult and sometimes impossible task for a human, especially in a short time frame.
One common misconception is that ML tools will replace human operational decision making. In practice, water experts are critical to integrating the science of water into model development. Once a model is in production, it will always be important for a human to review its recommendations, periodically verify that the model is still learning, and apply their own judgment and experience to the situation.
Machine Learning vs. Traditional Modeling
Models in the water industry have traditionally focused on known relationships and fixed equations derived from years of research. Those equations (rules) along with their inputs (data) can be coded directly into a computer model.
ML instead uses data and answers to learn the rules, applying error-minimization algorithms to find the best way to represent the relationship between the data and the answers. ML can be used to gain insight into processes that are not well understood, processes that are too complex for a conventional equation, or systems that mechanistic models do not represent well. ML models can also be used for more complicated tasks like process optimization.
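The contrast above can be made concrete with a minimal sketch. In the first function the rule is written by hand; in the second, the slope and intercept are learned from paired data and answers by minimizing squared error. The function names, coefficients, and sample numbers are all hypothetical and serve only as an illustration; they are not taken from any utility's records.

```python
# Traditional modeling: the rule (equation) is known and coded directly.
def rule_based_flow(rain_in, base_mgd=48.0, mgd_per_inch=20.0):
    """Hypothetical fixed equation: plant flow rises linearly with rainfall."""
    return base_mgd + mgd_per_inch * rain_in


# Machine learning: the rule is learned from data (xs) and answers (ys).
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    This is the closed-form solution that minimizes the sum of squared
    errors between the fitted line and the observed answers.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up observations: rainfall depth (inches) and resulting flow (mgd).
rain = [0.0, 0.5, 1.0, 2.0, 3.0]
flow = [48.0, 60.0, 71.0, 95.0, 120.0]

slope, intercept = fit_line(rain, flow)
print(f"learned rule: flow = {intercept:.1f} + {slope:.1f} * rain")
```

Real ML models replace the straight line with far more flexible functions, but the principle is the same: the parameters come from the data rather than from a predetermined equation.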
Case Study: Holistic Wet Weather Management Combining Machine Learning, Treatment Plant Optimization, and Predicting Collection System Influent Flow Hydrographs
The 75 mgd Neuse River Resource Recovery Facility (NRRRF), operated by Raleigh Water in North Carolina, has an average daily flow of 48 mgd and a peak hydraulic capacity of 225 mgd. The facility has stringent total nitrogen (TN) limits of less than 3 mg/L at permitted flow, and a quarterly average effluent total phosphorus limit of 2 mg/L. High flows can impact the facility’s ability to meet these strict nutrient limits; influent flows increase dramatically during heavy and/or sustained rainfall, which can shorten treatment time.
The NRRRF has a 32-million-gallon equalization (EQ) basin to withhold a significant portion of the flow and load entering the facility during high flow events. Historically, NRRRF staff used collection system pump station data, weather forecasts, and experience to determine when to move flow into the EQ basin. Pump station data provide about 30 to 60 minutes of advance warning but do not indicate whether flows will continue to increase. As a result, operators had to rely on their own judgment and experience to determine how to make the best use of the EQ basin. Raleigh Water realized that requiring a human to process the available data and recall how past storm events unfolded was neither practical nor efficient, and that it could benefit greatly from a quantitative model able to predict the flow hydrograph in advance of and during a rainfall event.
Raleigh Water has a traditional collection system model and collection system flow monitors. The collection system model is an excellent tool for planning but is not equipped to provide flow forecasts in real-time, and the flow monitors are not predictive. Hazen determined that this would be an excellent opportunity to develop and implement a machine learning tool to provide real-time flow hydrograph forecasting.
The model development process was conducted entirely on a desktop computer. Hazen used supervised and unsupervised machine learning to gain insight into the input parameters that best predict future flow. The resulting model has 77 inputs, including streamflow, rainfall (past and predicted), and past plant flow. The ML algorithm was calibrated to six years of historical data, covering 38 storms, and the model accuracy was +/- 2.8 mgd for any point during the storm. Once the desktop model was developed, the project entered the deployment step.
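As an illustration of how inputs such as streamflow, past rainfall, and past plant flow can be arranged for supervised learning, the sketch below builds one training row per hour from lagged values of each series and pairs it with the plant flow some hours ahead. The function name, the choice of lags, and the sample values are hypothetical; the source does not describe how the 77 inputs were actually constructed, and forecast rainfall would need to be added as further columns.

```python
def build_lagged_rows(series, lags, horizon):
    """Turn hourly time series into (features, target) pairs.

    series  -- dict mapping a name to a list of hourly values (equal lengths);
               must include "plant_flow", which is also the prediction target
    lags    -- dict mapping a name to the lag hours to use as inputs
               (0 means the current hour, 1 means one hour earlier, ...)
    horizon -- how many hours ahead the target plant flow lies
    """
    n = len(series["plant_flow"])
    max_lag = max(max(hours) for hours in lags.values())
    features, targets = [], []
    # Start once every lag is available; stop while the target still exists.
    for t in range(max_lag, n - horizon):
        row = [series[name][t - lag]
               for name in sorted(lags)
               for lag in lags[name]]
        features.append(row)
        targets.append(series["plant_flow"][t + horizon])
    return features, targets


# Made-up hourly data for a short wet weather period.
series = {
    "plant_flow": [50, 52, 55, 60, 70, 65],         # mgd
    "rainfall":   [0.0, 0.1, 0.4, 0.2, 0.0, 0.0],   # inches per hour
    "streamflow": [100, 110, 150, 200, 180, 160],   # cfs
}
lags = {"plant_flow": [0, 1], "rainfall": [0, 1, 2], "streamflow": [0]}

X, y = build_lagged_rows(series, lags, horizon=1)
for row, target in zip(X, y):
    print(row, "->", target)
```

Each row holds only information available at the time of the prediction, which is what allows a model trained on such rows to be used for forecasting.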
Azure and SQL were used for the automated data pipeline. The predictions are displayed in a web-based Microsoft Power BI dashboard that includes a tool to estimate the optimal point to fill the equalization basin to maximize its utility (see Figure X). A second tool estimates the near real-time process capacity of the secondary clarifiers (see Figure Y), which are typically the most strained part of the process during a wet weather event. The entire pipeline, including the data visualization dashboard, was securely deployed to work alongside a closed SCADA network. The model and dashboard are updated hourly with the latest prediction. Because the model uses real-time streamflow, plant flow, rainfall, and rainfall predictions, it adapts to changing conditions throughout the course of the storm and remains relevant.
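One plausible way to estimate a fill point from a forecast hydrograph is to find the lowest flow threshold above which all flow can be diverted without exceeding the basin volume. The sketch below does this by bisection. It is an assumption about how such a tool might work, not a description of the dashboard's actual calculation; the function names and the sample hydrograph are hypothetical, and only the 32 MG basin volume comes from the case study.

```python
def diverted_volume_mg(flows_mgd, threshold_mgd, step_hours=1.0):
    """Volume (MG) sent to equalization if all flow above the threshold is diverted."""
    excess = sum(max(q - threshold_mgd, 0.0) for q in flows_mgd)
    return excess * step_hours / 24.0  # mgd * days = MG


def lowest_threshold(flows_mgd, capacity_mg, step_hours=1.0, tol=0.01):
    """Lowest diversion threshold (mgd) that keeps diverted volume within capacity.

    Diverted volume never increases as the threshold rises, so bisection applies.
    """
    lo, hi = 0.0, max(flows_mgd)
    if diverted_volume_mg(flows_mgd, lo, step_hours) <= capacity_mg:
        return lo  # the basin could hold the entire forecast flow
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if diverted_volume_mg(flows_mgd, mid, step_hours) > capacity_mg:
            lo = mid  # threshold too low: the basin would overflow
        else:
            hi = mid  # threshold is feasible: try a lower one
    return hi


# Made-up hourly forecast: 24 hours at 100 mgd, then 24 hours at 200 mgd.
forecast = [100.0] * 24 + [200.0] * 24
threshold = lowest_threshold(forecast, capacity_mg=32.0)
print(f"divert flow above about {threshold:.1f} mgd")
```

Because the forecast is refreshed hourly, a calculation of this kind would be rerun with each update, with the remaining basin volume substituted for the full capacity once filling has begun.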
Secondary clarifier guidance program screen. Left – displays key performance indicators (KPIs) for the past 72 hours. Top center – displays past flow (blue colors), projected flow (green), and maximum allowable flow (red) with all secondary clarifiers in service. Right – calculator tool that allows operators to solve for any one of the following by specifying the other four parameters (shown solved for SC surface area required): SVI, influent flow, RAS flow, MLSS, and clarifier surface area. Bottom center – additional KPIs and combinations of small and large clarifiers that meet the criteria established in the calculator tool.
The project was deployed in a test mode in December 2019 and completed in July 2020. Since then, at least eight major storm events – including Hurricane Isaias – have occurred and been predicted well beforehand (see Figure Z). With this tool, the plant implemented its wet weather standard operating protocol: putting two additional primary clarifiers online, then one additional BNR basin, and finally diverting flow to the EQ basin. Wet weather equalization has been employed five times since the finalized model was deployed. For these five storms, the equalization volume utilized ranged from 12.6 to 26.8 MG. More importantly, the equalization basin volume was never exceeded, indicating that the program, together with plant staff's interpretation of its results, was used optimally.
As these figures show, the model errs on the side of being conservative, occasionally predicting a flow that is higher than the actual wastewater flow. One reason is that rainfall forecasts are uncertain, and the model depends on their accuracy. More importantly, local streamflow was shown to be the most significant variable in predicting the peak flow. For this reason, models that predict future streamflow based on predicted rainfall quantities were developed and incorporated into the ML model. Flow predictions more than 10 hours away are highly dependent upon those streamflow predictions, whereas predictions within that 10-hour window rely much more on the actual streamflow. Thus, the model becomes increasingly accurate as the time to the event narrows, but it also provides a very good prediction of what is likely to happen, based on past storm events, when the event is still several days away. When the model overpredicts the peak flow, it is generally because the actual streamflow did not rise as high as was expected based on past storms.
The eight storms and associated wet weather flows were significant. The largest rainfall event involved 6.7 inches of rain over a nine-hour period with a peak rainfall intensity of 4.5 inches per hour. Of the available 32 MG in the EQ basin, 17 MG were utilized, and effluent quality the day after the storm was the same as the day prior, indicating that the program helped the utility maintain process performance and efficiently utilize its equalization volume for managing wet weather flows.
The resultant model and Power BI dashboard are valuable tools that provide utility staff near real-time visualizations of key data, such as current operating parameters and stream flood stages as well as future flow predictions. The tool provides an interactive interface for quickly assessing current conditions and planning for projected future conditions, which assists with making informed decisions. The benefit is greater efficiency and reliability in utilizing existing infrastructure to effectively manage wet weather flows and continue to meet stringent effluent limits. The model is effective at proactive planning for large, anticipated storm events such as tropical storms with built-in tools for assessing a potential range of future flow conditions if additional rainfall occurs beyond current weather predictions. The model is also responsive to real-time measurements of streamflow, rainfall, and plant influent flow, and its accuracy generally improves the closer it gets to the wet weather event.