 

Keynote Lecture

 

Data Quality Processing for Deep Learning

Francisco Herrera
University of Granada
Spain
 

Brief Bio
Francisco Herrera (SM'15) received his M.Sc. in Mathematics in 1988 and his Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. He has supervised 40 Ph.D. students. He has published more than 300 journal papers that have received more than 50000 citations (Google Scholar, H-index 112). He is coauthor of the books "Genetic Fuzzy Systems" (World Scientific, 2001), "Data Preprocessing in Data Mining" (Springer, 2015), "The 2-tuple Linguistic Model. Computing with Words in Decision Making" (Springer, 2015), "Multilabel Classification. Problem analysis, metrics and techniques" (Springer, 2016), and "Multiple Instance Learning. Foundations and Algorithms" (Springer, 2016). He currently acts as Editor in Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer). He serves on the editorial boards of a dozen journals. He has received several honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, 2010 Spanish National Award on Computer Science ARITMEL to the "Spanish Engineer on Computer Science", International Cajastur "Mamdani" Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Award (bestowed in 2011 and 2015 respectively), 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, 2013 AEPIA Award to a scientific career in Artificial Intelligence, and 2014 XV Andalucía Research Prize Maimónides (by the regional government of Andalucía). He has been selected as a 2014 Thomson Reuters Highly Cited Researcher http://highlycited.com/ (in the fields of Computer Science and Engineering, respectively).


Abstract
In recent years, deep learning methods, and particularly Convolutional Neural Networks (CNNs), have exhibited excellent accuracy in many image and pattern classification problems, among others. Obtaining quality data is the foundation of good data analytics in general, and it is also very important for obtaining a good deep learning model.
Quality data requires a thorough preprocessing analysis that adapts the data to the input requirements of each learning algorithm. Data preprocessing is an essential part of any data mining process. In some cases, it focuses on correcting deficiencies that may damage the learning process, such as omissions, noise and outliers, among others. In contrast to classical classification models, the high abstraction capacity of CNNs allows them to work on the original high-dimensional space, which reduces the need to prepare the input manually. However, suitable preprocessing is still important for improving the quality of the result. One of the most widely used preprocessing techniques with CNNs is data augmentation for small image datasets, which increases the volume of the training dataset by applying several transformations to the original input. There are also other guided preprocessing procedures tailored to specific problems, addressing, for example, brightness and other image features.
In this talk, we present the connection between deep learning and data-guided preprocessing approaches across all the families of methods used to improve deep learning capabilities, together with some applications.
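
As an illustration of the data augmentation idea mentioned in the abstract, the following is a minimal sketch, not the method presented in the talk. It assumes a grayscale image stored as nested lists of 0-255 pixel values; the function names (`flip_horizontal`, `rotate_90`, `adjust_brightness`, `augment`) and the brightness offsets of 40 are hypothetical choices made only for this example.

```python
from typing import List

# Assumption for this sketch: a grayscale image is a list of rows,
# each row a list of integer pixel values in the range 0-255.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (reverse rows, then transpose)."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img: Image, delta: int) -> Image:
    """Shift every pixel by delta, clamping the result to 0-255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the original image plus four transformed copies."""
    return [
        img,
        flip_horizontal(img),
        rotate_90(img),
        adjust_brightness(img, 40),   # hypothetical offset
        adjust_brightness(img, -40),  # hypothetical offset
    ]


if __name__ == "__main__":
    sample = [[0, 100], [200, 250]]
    for variant in augment(sample):
        print(variant)
```

Each training image here yields five samples instead of one, which is the sense in which augmentation increases the volume of a small dataset. In practice the transformations are usually applied randomly during training, and they should be chosen so that the class label remains valid after the change.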


