 

Keynote Lectures

Big Data, Smart Data and Imbalanced Classification: Preprocessing, Models and Challenges
Francisco Herrera, University of Granada, Spain

Privacy-preserving Machine Learning over Sensitive Data
Jin Li, Guangzhou University, China

Assimilated Learning: Bridging the Gap Between Big Data and Smart Data
Yi-ke Guo, Imperial College, United Kingdom

 

Big Data, Smart Data and Imbalanced Classification: Preprocessing, Models and Challenges

Francisco Herrera
University of Granada
Spain
 

Brief Bio
Francisco Herrera (SM'15) received his M.Sc. in Mathematics in 1988 and Ph.D. in Mathematics in 1991, both from the University of Granada, Spain. He is currently a Professor in the Department of Computer Science and Artificial Intelligence at the University of Granada. He has supervised 40 Ph.D. students. He has published more than 300 journal papers that have received more than 50000 citations (Google Scholar, H-index 112). He is coauthor of the books "Genetic Fuzzy Systems" (World Scientific, 2001), "Data Preprocessing in Data Mining" (Springer, 2015), "The 2-tuple Linguistic Model. Computing with Words in Decision Making" (Springer, 2015), "Multilabel Classification. Problem analysis, metrics and techniques" (Springer, 2016), and "Multiple Instance Learning. Foundations and Algorithms" (Springer, 2016). He currently acts as Editor in Chief of the international journals "Information Fusion" (Elsevier) and "Progress in Artificial Intelligence" (Springer). He is an editorial board member of a dozen journals. He has received several honors and awards: ECCAI Fellow 2009, IFSA Fellow 2013, 2010 Spanish National Award on Computer Science ARITMEL to the "Spanish Engineer on Computer Science", International Cajastur "Mamdani" Prize for Soft Computing (Fourth Edition, 2010), IEEE Transactions on Fuzzy Systems Outstanding 2008 and 2012 Paper Award (bestowed in 2011 and 2015 respectively), 2011 Lotfi A. Zadeh Prize Best Paper Award of the International Fuzzy Systems Association, 2013 AEPIA Award to a scientific career in Artificial Intelligence, and 2014 XV Andalucía Research Prize Maimónides (by the regional government of Andalucía). He has been selected as a 2014 Thomson Reuters Highly Cited Researcher http://highlycited.com/ (in the fields of Computer Science and Engineering, respectively).


Abstract
Big Data applications have been emerging in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem. To cope with the scale of the data involved, the MapReduce framework has arisen as a "de facto" solution. Basically, it carries out a "divide-and-conquer" distributed procedure in a fault-tolerant way suited to commodity hardware. Learning with imbalanced data refers to the scenario in which the classes in a given problem are represented by very different numbers of instances. The main issue when addressing such a learning problem is that the accuracy achieved for each class also differs. This occurs because the learning process of most classification algorithms is biased towards the majority class examples, so that the minority ones are not well modeled in the final system. As this is a very common scenario in real-life applications, the interest of researchers and practitioners in the topic has grown significantly in recent years. As Big Data is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The main reason is the difficulty of adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style.
In this talk we will focus on the imbalanced big data classification problem. We will analyze the current state of research in this area and the behavior of standard preprocessing techniques in this particular framework, and we will discuss the challenges and future directions for the topic.
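
As a rough illustration of two points made in the abstract, the following Python sketch uses a purely hypothetical dataset (10,000 instances, 1% of them in the minority class) and hypothetical helper names; it is not taken from the talk and does not use any MapReduce library. It shows that a classifier biased entirely towards the majority class can still report high overall accuracy, and that splitting the data across many partitions leaves each partition with very few minority examples.

```python
from collections import Counter

MINORITY = 1  # label of the under-represented class (hypothetical)


def split_into_partitions(labels, n_partitions):
    """Deal instances round-robin into n_partitions chunks, one per mapper."""
    return [labels[i::n_partitions] for i in range(n_partitions)]


def minority_per_partition(labels, n_partitions):
    """Count the minority instances that each partition would contain."""
    chunks = split_into_partitions(labels, n_partitions)
    return [Counter(chunk)[MINORITY] for chunk in chunks]


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def minority_recall(y_true, y_pred):
    """Fraction of true minority instances that were predicted as minority."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == MINORITY and p == MINORITY)
    return hits / sum(1 for t in y_true if t == MINORITY)


# Hypothetical dataset: 100 minority and 9,900 majority instances.
labels = [MINORITY] * 100 + [0] * 9900

# A classifier that always answers "majority" looks accurate overall
# but never recognizes a single minority instance.
always_majority = [0] * len(labels)
print(accuracy(labels, always_majority))         # 0.99
print(minority_recall(labels, always_majority))  # 0.0

# Partitioning the data shrinks the minority class seen by each worker.
print(minority_per_partition(labels, 1))         # [100]
print(min(minority_per_partition(labels, 50)))   # 2
```

With 50 partitions, each local model in this toy setting would have to learn the minority concept from only two examples, which is the "lack of data" effect the abstract refers to.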



 

 

Privacy-preserving Machine Learning over Sensitive Data

Jin Li
Guangzhou University
China
 

Brief Bio
Jin Li is a professor at Guangzhou University. His research interests include the design of secure protocols in Cloud Computing, cryptography, and machine learning. He has served as a senior research associate at Korea Advanced Institute of Technology (Korea), Virginia Tech (U.S.A.), and Illinois Institute of Technology. He has published more than 100 papers in international conferences and journals, including IEEE INFOCOM, IEEE Transaction on Parallel and Distributed Computation, IEEE Transactions on Computers, IEEE Transactions on Cloud Computing and ESORICS. His work has been cited more than 5000 times on Google Scholar, and his H-index is 28.
He has also served as program chair and committee member for many international conferences, such as CSE 2017, ISICA 2015, 3PGCIC 2014, ICCCN and CloudCom. He received two National Science Foundation of China (NSFC) grants for his research on Security and Privacy in Cloud Computing. He is also an NSFC panel member and is PI on more than 15 funded projects. He has been selected as one of the science and technology new stars in Guangzhou and as an outstanding young scholar in Guangdong province.


Abstract
Machine learning has been widely applied to classifying and recognizing complex data. However, security and privacy issues arise when the data are sensitive or when computing and data storage services are outsourced to the cloud. When the data are sensitive and the data evaluators are not fully trusted, the data have to be encrypted, and traditional methods cannot be used to process them. In this talk, I will introduce some basic solutions and challenges in this topic. Finally, I will present our method for solving this problem.



 

 

Assimilated Learning: Bridging the Gap Between Big Data and Smart Data

Yi-ke Guo
Imperial College
United Kingdom
 

Brief Bio
Yike Guo is a Professor of Computing Science in the Department of Computing at Imperial College London. He is the founding Director of the Data Science Institute at Imperial College, as well as leading the Discovery Science Group in the department. Professor Guo also holds the position of CTO of the tranSMART Foundation, a global open source community using and developing data sharing and analytics technology for translational medicine. Professor Guo received a first-class honours degree in Computing Science from Tsinghua University, China, in 1985 and received his PhD in Computational Logic from Imperial College in 1993 under the supervision of Professor John Darlington. He founded InforSense, a software company for life science and health care data analysis, and served as CEO for several years before the company's merger with IDBS, a global advanced R&D software provider, in 2009. He has been working on technology and platforms for scientific data analysis since the mid-1990s, where his research focuses on knowledge discovery, data mining and large-scale data management. He has contributed to numerous major research projects including: the UK EPSRC platform project, Discovery Net; the Wellcome Trust-funded Biological Atlas of Insulin Resistance (BAIR); and the European Commission U-BIOPRED project. He is currently the Principal Investigator of the European Innovative Medicines Initiative (IMI) eTRIKS project, a €23M project that is building a cloud-based informatics platform, in which tranSMART is a core component for clinico-genomic medical research, and co-Investigator of Digital City Exchange, a £5.9M research programme exploring ways to digitally link utilities and services within smart cities. Professor Guo has published over 200 articles, papers and reports. 
Projects he has contributed to have been internationally recognised, including winning the “Most Innovative Data Intensive Application Award” at the Supercomputing 2002 conference for Discovery Net, and the Bio-IT World "Best Practices Award" for U-BIOPRED in 2014. He is a Senior Member of the IEEE and is a Fellow of the British Computer Society.


Abstract
The importance of combined analysis of big and smart data has been well recognized, and ample research has been conducted with a focus on “data integration” or “data fusion”. However, the imbalance between the two in size, context and semantic richness makes integration at the data level hard and unsustainable. Although remarkable progress has been made in studying the interaction of big and smart data and in exploiting the advantages of both to enhance each other's analysis, we still lack a systematic study and a uniform approach for the joint analysis of both data types. In this talk, we introduce Assimilated Learning, in which smart data and big data are co-collected in a bi-directionally guided way and co-analysed with a bi-directional transfer learning mechanism.


