
Machine learning in big data of atmospheric physics

In atmospheric physics, as in many other natural sciences, improvements in measurement technology and the enormous increase in computing power have made huge amounts of data available that can hardly be evaluated, if at all, with conventional means. At the same time, we have a relatively poor understanding of the complex multi-scale system of the atmosphere; many fundamental processes and their impact on the system remain unclear.

The current progress in machine learning gives reason to hope that these new methods of statistical data evaluation will provide tools and concepts that are useful for data analysis and process modelling in the natural sciences, and especially in atmospheric physics. On the one hand, the large amounts of data in atmospheric physics can be evaluated, so that we may be able to identify relevant and dominant processes and understand their impact on the system. On the other hand, machine learning methods may allow us to learn models automatically from the data and use them to answer relevant scientific questions in atmospheric physics. These methods can take different forms, e.g. supervised or unsupervised techniques, or automated procedures for the development of models.

In the project Big Data in Atmospheric Physics (BINARY) we will investigate relevant scientific questions in atmospheric physics by applying modern machine learning techniques. Since it is usually not possible, or if possible not meaningful, to use standard methods without adaptation, this project combines expertise from computer science and atmospheric physics in an interdisciplinary fashion. Guided by the scientific objectives in atmospheric physics, algorithms are developed and/or adapted for the respective problems. Only through this close combination of research fields will we be able to use the tremendous potential of modern machine learning techniques for the natural sciences. The close collaboration will also lead to scientific progress in computer science, since complex application problems can motivate the development of more comprehensive algorithms. In addition, the technical issues of handling huge data sets, both their storage and their intelligent processing on modern high performance computing architectures, are complex and require further development or adaptation of existing strategies, leading to progress in the respective research field.

Methods of machine learning

Regarding machine learning methods, we will concentrate mostly on three major classes, sorted by increasing complexity:

Classification and pattern recognition: Research often starts with the formulation of hypotheses and their evaluation against available data (e.g. in terms of empirical correlations). Learning classifiers for pattern recognition and analysing the learned correlations might serve as a first step towards understanding relevant processes.

Parameterisation of processes: In a next step, one can try to extend and/or improve insufficient models as used in the natural sciences. Here, the methods can learn parameterisations directly from data. By iteratively comparing model output with data, improved representations of possibly unresolved processes can be obtained.

Automatic modelling of complex systems: The model of a complex system can be learned directly from data. Prominent examples of these techniques are deep neural networks, which are able to learn physical laws of restricted systems directly from examples. In addition, information about the physical system can be used to reduce the amount of data required.
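As a purely illustrative sketch of the second class (learning a parameterisation from data), the following Python example fits a linear relation between a resolved variable and an unresolved quantity by ordinary least squares. The data are synthetic, and the function name, the variable names and the linear form itself are assumptions made for this illustration; they are not the methods used in the project.

```python
import random


def fit_linear_parameterisation(resolved, unresolved):
    """Ordinary least squares fit of unresolved ~ a * resolved + b."""
    n = len(resolved)
    mean_x = sum(resolved) / n
    mean_y = sum(unresolved) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(resolved, unresolved))
    var_x = sum((x - mean_x) ** 2 for x in resolved)
    a = cov_xy / var_x          # slope of the learned parameterisation
    b = mean_y - a * mean_x     # offset of the learned parameterisation
    return a, b


# Synthetic "measurements" (assumption): a known linear relation plus noise.
rng = random.Random(0)
TRUE_A, TRUE_B = 2.0, -0.5
resolved = [rng.uniform(0.0, 1.0) for _ in range(500)]
unresolved = [TRUE_A * x + TRUE_B + rng.gauss(0.0, 0.05) for x in resolved]

a, b = fit_linear_parameterisation(resolved, unresolved)
print(f"learned parameterisation: unresolved = {a:.3f} * resolved + {b:.3f}")
```

In practice the learned relation would be far more complex than a straight line, and the fit would be repeated as model output is compared with new data; the sketch only shows the basic idea of estimating a closure from samples.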

Scientific objectives in atmospheric physics

In terms of scientific questions in atmospheric physics, we concentrate on three different topics:

Structure formation for clouds and cloud systems

Which structures (clouds and convective systems) occur under which environmental conditions?
What are the dominant physical processes that cause structure formation?
How can structure formation processes of clouds and cloud systems be modelled and represented in weather and climate models?

Predictability of difficult meteorological situations

Can machine learning from Big Data be used to find and classify false predictions?
Which physical processes and their representations in models are responsible for false predictions?
Can better parameterisations be automatically derived from data (measurements/model output) using learning algorithms?

Representation of small scale processes in coarse resolution models

Which structures are found in the small-scale measurements?
How are small scale structures from measurements represented in reanalysis data?
Is it possible to find classes of cases in high-resolution measurements that are reflected in coarse resolution data and to classify them accordingly ("upscaling")?
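To make the relation between small-scale measurements and coarse resolution data more concrete, the following minimal Python sketch averages a high-resolution series over blocks and records the variance inside each block. The function name, the block averaging and the sample values are assumptions chosen for illustration only; actual reanalysis products are not produced by simple averaging.

```python
def coarse_grain(values, block_size):
    """Return block means and within-block (population) variances."""
    if block_size <= 0 or len(values) % block_size != 0:
        raise ValueError("len(values) must be a positive multiple of block_size")
    means, variances = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        mean = sum(block) / block_size
        variance = sum((v - mean) ** 2 for v in block) / block_size
        means.append(mean)
        variances.append(variance)
    return means, variances


# Hypothetical high-resolution samples: three coarse cells of two points each.
high_res = [1.0, 3.0, 2.0, 2.0, 0.0, 4.0]
coarse_means, subgrid_variances = coarse_grain(high_res, 2)
print(coarse_means)        # identical coarse values for all three cells
print(subgrid_variances)   # but different small-scale variability
```

All three coarse cells have the same mean although their small-scale structure differs, which is the kind of information a classification of high-resolution cases would have to recover from additional coarse-scale predictors.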

The goal of the project is to address the scientific questions from atmospheric physics using the methods from machine learning in an interdisciplinary team.