Big Data

The amount of data is growing exponentially.
To deal with this huge amount of information and extract its hidden value, the DBGroup carries out research on data management, data analysis, and data accessibility.

  • Big Data Integration: making sense of big data scattered across multiple sources requires novel, scalable techniques. The DBGroup studies and develops tools that help data engineers and data scientists integrate such data easily and efficiently.
  • Big Data Management, i.e., how to handle the huge amount of data: since the volume of data to be analysed is extremely large, the DBGroup adopts technologies designed to manage Big Data (e.g., Apache Hadoop, Apache Spark, NoSQL/NewSQL DBMSs).
  • Big Data Analysis, i.e., how to get valuable insight from the data and how to extract information that drives the decision-making process: given the huge amount of data involved, traditional techniques for machine learning and, more generally, for data analysis on “small” data are no longer applicable. Hence, the DBGroup focuses on developing new approaches that work in this context and integrate with the systems used for Data Management.

The DBGroup is part of the international research movement that proposes a new perspective on using Machine Learning (ML), namely MLOps. The goal of MLOps is to make high-quality data available through all stages of the ML project lifecycle. MLOps tools are needed to make Data-Driven AI an efficient and systematic process.

Theory, techniques and tools to deal with big data are taught in the courses held by the DBGroup for the Master's Degree in Computer Engineering.

Higher Training Courses

Recent Talks

  • Prof. Sonia Bergamaschi and Luca Zecchini presented the contribution "Big Data Integration for Data-Centric AI", describing the research activities carried out by the DBGroup, at ItaData 2022 in Milan on September 20-21, 2022. [slides]
  • Dr. Giovanni Simonini and Dr. Luca Gagliardelli presented our research papers "Entity Resolution On-Demand" and "Generalized Supervised Meta-blocking" at VLDB 2022, held in Sydney on September 5-9, 2022.
  • Dr. Luca Gagliardelli presented our contribution "ECDP: A Big Data Platform for the Smart Monitoring of Local Energy Communities" at the DataPlat workshop at EDBT/ICDT 2022 on March 29, 2022.
  • Prof. Sonia Bergamaschi presented our contribution "Big Data Integration & Data-Centric AI for eHealth", describing the research activities carried out by the DBGroup in this area, at Ital-IA 2022, on February 10, 2022. [paper] [slides]
  • Prof. Sonia Bergamaschi gave a talk entitled "Big Data Integration for e-Health" at the Data4SmartHealth workshop in Bolzano on October 27, 2021. [slides]
  • Prof. Sonia Bergamaschi gave a talk entitled "Big Data & Cognitive Computing: Challenges and Opportunities for Data Driven Economies" at Pulsar Event in Formigine on October 1, 2021. [slides]
  • RulER was presented at EDBT 2020, held online due to COVID-19. [paper] [code]
  • Prof. Sonia Bergamaschi, Dr. Luca Gagliardelli and Dr. Giovanni Simonini took part in SEBD 2020 to present our latest work, "Scaling Up Record-level Matching Rules".

