- Machine learning
- Statistics
Contribution to the Different Sub-Projects
Scientific Activities in the Different Sub-Projects
COLOSYS:
In this sub-project, my role is to apply my expertise in machine learning and statistics to complex data sets. The complexity of these data sets has three aspects: (1) the huge amounts of data that have become available and which, according to prominent researchers, exceed the capacity of existing tools to analyze them; (2) the strong heterogeneity of the data sets, which represent measurements from very diverse processes; and (3) artifacts caused by inaccurate observations, such as missing values and false positives.
These challenges will be addressed with appropriate machine learning techniques, while statistical tools will be applied to ensure that the data sets are properly preprocessed and analyzed and that the results are interpreted in a statistically sound way. Furthermore, the heterogeneity of the collected data sets requires methods that can handle this kind of data. Since traditional methods were developed for homogeneous data (typically assuming a limited number of real-valued input variables and one or more real-valued output variables), research will be needed to extend them. Possible directions include deep learning methods, which learn multiple levels of representation corresponding to different levels of abstraction, and committee machines, which combine the results of different methods into a single response.
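To illustrate the committee machine idea, the following is a minimal sketch in Python that assumes the simplest combination rule, an unweighted average of the members' outputs. The member models and the name `committee_predict` are hypothetical placeholders chosen for illustration; they are not methods planned for the sub-project, and other combination rules (weighted averaging, voting, gating) would be equally valid.

```python
from statistics import mean
from typing import Callable, Sequence

# A "model" here is any function mapping an input vector to one real output.
Model = Callable[[Sequence[float]], float]


def committee_predict(models: Sequence[Model], x: Sequence[float]) -> float:
    """Combine the members' outputs into a single response by averaging."""
    if not models:
        raise ValueError("a committee needs at least one member")
    return mean(model(x) for model in models)


# Three hypothetical members standing in for different learning methods.
def linear_member(x: Sequence[float]) -> float:
    return 0.5 * x[0] + 0.5 * x[1]


def max_member(x: Sequence[float]) -> float:
    return float(max(x))


def constant_member(x: Sequence[float]) -> float:
    return 1.0


members = [linear_member, max_member, constant_member]
prediction = committee_predict(members, [2.0, 4.0])
print(prediction)
```

Averaging is used here only because it is the easiest rule to state; for heterogeneous data, one member per data type with a learned weighting would be a natural refinement.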