Recent increases in computing resources have allowed weather prediction models to run on grids fine enough to simulate thunderstorm systems realistically, including some with damaging winds and strong rotation. Although errors in storm location and timing can be large, machine learning (ML) can likely be applied to mitigate some of these problems. For ML to work well, however, it needs reliable verification data. According to recent studies and to scientists at the Storm Prediction Center (SPC) and the National Severe Storms Laboratory (NSSL), the severe wind database (gusts >= 50 knots) has serious shortcomings that likely make it the most poorly depicted of all severe hazards. For example, tree damage is often used to verify severe thunderstorm warnings, yet such damage depends strongly on many non-meteorological factors. In addition, human estimates of wind speed typically exceed measured values by 25%. These problems not only hamper the verification of warnings but also limit the potential of ML to improve forecasts.
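The effect of a 25% overestimate on the severe thresholds can be made concrete with a little arithmetic. The sketch below assumes the bias is a single multiplicative factor (estimate = 1.25 × measured). That is a simplification made for illustration, since real biases vary with the observer and the event. The function names and the factor constant are hypothetical and do not come from any project code.

```python
# Hypothetical bias-correction sketch. Assumes human gust estimates are, on
# average, 25% higher than measured gusts (estimate = 1.25 * measured), a
# single multiplicative factor chosen for illustration only.
SEVERE_KT = 50.0
SIG_SEVERE_KT = 65.0
OVERESTIMATE_FACTOR = 1.25


def implied_measured_gust(estimated_kt: float, factor: float = OVERESTIMATE_FACTOR) -> float:
    """Measured gust implied by a human estimate under the assumed bias."""
    return estimated_kt / factor


def required_estimate(threshold_kt: float, factor: float = OVERESTIMATE_FACTOR) -> float:
    """Smallest human estimate whose implied measured gust reaches the threshold."""
    return threshold_kt * factor


def category(gust_kt: float) -> str:
    """Map a gust speed to the severe-wind categories used in the text."""
    if gust_kt >= SIG_SEVERE_KT:
        return "significant severe"
    if gust_kt >= SEVERE_KT:
        return "severe"
    return "sub-severe"


for est in (50.0, 60.0, 65.0, 80.0):
    meas = implied_measured_gust(est)
    print(f"estimated {est:.0f} kt: reported as {category(est)}, "
          f"implied {meas:.1f} kt is {category(meas)}")
```

Under this assumption, an estimate must reach 62.5 kt before the implied measured gust meets the 50 kt severe threshold, and 81.25 kt before it meets the 65 kt significant-severe threshold. A 60 kt estimate that verifies a warning therefore corresponds to a measured gust of only 48 kt.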
Our project will use ML to create a diagnostic tool that assigns probabilities that thunderstorm winds meet the thresholds for severe (>= 50 knots) and significant severe (>= 65 knots) wind. We are applying ML techniques to a 12-year period of storm reports, surface weather station data, and model analyses from the Rapid Refresh (RAP) and Rapid Update Cycle (RUC), which supply information on the near-storm environment. The diagnostic tool will be tested in real time at the Hazardous Weather Testbed (HWT) Spring Forecast Experiments (SFEs). We will work with HWT personnel; our principal collaborators are Dr. Adam Clark (NSSL) and Dr. Israel Jirak (SPC).
Our research will focus on developing ML models in a semi-supervised setting, because the datasets we plan to use combine high uncertainty with a low volume of verified severe wind events. We will therefore first cluster observations from the historical database into several groups based on their spatio-temporal locations and the variables described above. We will then develop predictive models that learn within each cluster while also sharing information across clusters, particularly with clusters that contain few observed wind speeds. These models will incorporate semi-supervised learning methods that account for uncertainty in the labeled portion of the data (that is, the information provided by the storm database reports) and predict wind speed more accurately, in effect giving us larger, higher-quality labeled datasets from which to generate ML models. We will develop both physics-based and data-driven techniques for feature extraction and dimensionality reduction to preprocess the input data for ML model generation. Specifically, our data-driven approach will use Generative Adversarial Networks (GANs), because GANs can learn useful features even when those features are poorly understood or incomplete.
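The cluster-then-share idea can be illustrated with a deliberately simple stand-in. The sketch below is not the project's method and is not semi-supervised learning in the full sense. It makes several hypothetical choices:

- plain k-means groups every report, including unlabeled ones, by standardized location, time, and an environmental variable;
- each cluster's weighted mean gust is shrunk toward the global mean (partial pooling), so clusters with few observed gusts borrow more from the others;
- a Gaussian assumption turns the cluster means into probabilities of exceeding 50 and 65 kt.

All data are synthetic. The feature columns, the 0.5 weight on human estimates, `prior_strength`, and `k` are assumptions chosen only for illustration.

```python
import math
import numpy as np

def standardize(X):
    """Z-score each column so no single variable dominates the distance metric."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster label per row and the centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
    return labels, centers

def pooled_cluster_means(gusts, weights, labels, k, prior_strength=5.0):
    """Per-cluster weighted mean gust, shrunk toward the global weighted mean.

    Only rows with an observed gust (non-NaN) contribute. A cluster whose
    effective sample size (sum of weights) is small relative to
    prior_strength takes most of its estimate from the global mean.
    """
    obs = ~np.isnan(gusts)
    g, w, c = gusts[obs], weights[obs], labels[obs]
    global_mean = np.average(g, weights=w)
    global_sd = math.sqrt(np.average((g - global_mean) ** 2, weights=w))
    means = np.full(k, global_mean)
    for j in range(k):
        in_j = c == j
        n_eff = w[in_j].sum()
        if n_eff > 0:
            local = np.average(g[in_j], weights=w[in_j])
            means[j] = (n_eff * local + prior_strength * global_mean) / (n_eff + prior_strength)
    return means, global_sd

def exceedance_prob(mean, sd, threshold):
    """P(gust >= threshold) under a Gaussian with the given mean and sd."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2.0)))

# --- Synthetic demo data (hypothetical, for illustration only) ---
rng = np.random.default_rng(42)
n = 300
# Columns: latitude, longitude, hour (UTC), and a CAPE-like variable.
X = np.column_stack([
    rng.uniform(30.0, 45.0, n),
    rng.uniform(-105.0, -85.0, n),
    rng.uniform(0.0, 24.0, n),
    rng.gamma(2.0, 800.0, n),
])
gusts = 40.0 + 0.008 * X[:, 3] + rng.normal(0.0, 6.0, n)  # synthetic gusts (kt)
measured = rng.random(n) < 0.2                  # ~20% have an anemometer gust
estimated = ~measured & (rng.random(n) < 0.3)   # some human estimates
gusts[~(measured | estimated)] = np.nan         # remaining reports are unlabeled
weights = np.where(measured, 1.0, 0.5)          # hypothetical: trust estimates half as much

k = 4
labels, _ = kmeans(standardize(X), k)
means, sd = pooled_cluster_means(gusts, weights, labels, k)
for j in range(k):
    p50 = exceedance_prob(means[j], sd, 50.0)
    p65 = exceedance_prob(means[j], sd, 65.0)
    print(f"cluster {j}: n={np.sum(labels == j)}, P(>=50 kt)={p50:.2f}, P(>=65 kt)={p65:.2f}")
```

Because every report, labeled or not, receives a cluster, a report without a measured gust still inherits that cluster's exceedance probabilities. The shrinkage step is the simplest form of sharing information across clusters. A learned model inside each cluster, or a correction of the estimated gusts for observer bias before fitting, could replace it without changing the overall structure.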