Supporting the data cleansing processes of data migration projects using the KNIME Analytics Platform

During data migration projects, data cleaning is often neglected simply because sufficiently skilled resources are not available. In a pilot project, we therefore examined which tool could be used without developer knowledge to effectively support data cleansing tasks during data migration projects.


MINDSPIRE Consulting, a member of Inovivo, our company group, provides data migration services for its banking industry clientele. They have developed a data migration methodology and toolset based on the experience gained from their successful ETL projects.

Our company, Onespire Ltd., is involved in data cleaning activities through our Data Science (DS) services, so in this joint pilot project we reviewed its toolset in order to cover the data cleansing needs arising in migration projects.

The data cleaning tasks of data migration projects

A variety of data cleanliness issues can emerge during data migration projects, from simple typing errors to complex data consistency issues.

Based on our experience, data cleansing tasks are often not carried out during data migration projects because there are no experts available to perform this complex task.

These kinds of data cleaning tasks are also challenging because they are unique, so previously developed solutions or processes cannot be reused on other projects without changes. Therefore, after verifying the data quality, a customized concept must be developed for the given environment.

Application of Data Science tools in data cleaning projects

One of the questions we faced was whether, in addition to traditional data migration solutions, Data Science tools could be used to support data cleansing tasks during the implementation of such projects.

For this purpose, we selected the KNIME Analytics Platform because it is easy to use, does not require programming skills, and has many specific functions that are well suited to solving data cleaning tasks.

The KNIME Analytics Platform is a free, open source data analysis, reporting and integration platform. The tool effectively supports extract, transform, load (ETL) processes.

The solution has a community of one hundred thousand users who, in addition to data migration, also use the software for data scrubbing, training algorithms, predictive analytics, interactive visual display and report creation.

KNIME is good at identifying data patterns and supports business decisions by exploiting hidden information. Developer knowledge is not required for its use: a complete process can be created on the interface by arranging and connecting its elementary units, the nodes.

The other question was to what extent Data Science methodology can meet the requirements of data migration projects.
It was evident that Data Science takes into account several aspects that are not relevant in data migration projects. These include, but are not limited to, scaling and normalization. However, there are also many building blocks that can be easily implemented in the data migration methodology. Examples include replacing missing values, removing duplicates, and type conversion.
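The three building blocks named above can be illustrated outside KNIME as well. The following is a minimal Python sketch, not the pilot project's actual workflow: the record layout, the field names (`id`, `name`, `birth_date`, `city`) and the `UNKNOWN` default are assumptions made purely for illustration.

```python
from datetime import datetime

# Hypothetical sample records; the field names are illustrative only.
RAW_CUSTOMERS = [
    {"id": "1", "name": "Anna Kovacs", "birth_date": "1985-03-12", "city": "Budapest"},
    {"id": "2", "name": "Bela Nagy", "birth_date": "", "city": None},
    {"id": "1", "name": "Anna Kovacs", "birth_date": "1985-03-12", "city": "Budapest"},
    {"id": "3", "name": "Csilla Toth", "birth_date": "1990-11-02", "city": ""},
]


def replace_missing(record, defaults):
    """Return a copy in which None or blank values get the configured default."""
    cleaned = dict(record)
    for field, default in defaults.items():
        value = cleaned.get(field)
        if value is None or str(value).strip() == "":
            cleaned[field] = default
    return cleaned


def remove_duplicates(records, key_fields):
    """Keep only the first record for each combination of key field values."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def convert_types(record):
    """Convert id to int and birth_date to a date; blank dates become None."""
    converted = dict(record)
    converted["id"] = int(converted["id"])
    raw_date = converted["birth_date"]
    converted["birth_date"] = (
        datetime.strptime(raw_date, "%Y-%m-%d").date() if raw_date else None
    )
    return converted


def clean(records):
    """Apply the three steps in order: fill gaps, deduplicate, convert types."""
    filled = [replace_missing(r, {"city": "UNKNOWN"}) for r in records]
    unique = remove_duplicates(filled, ["id"])
    return [convert_types(r) for r in unique]


cleaned_customers = clean(RAW_CUSTOMERS)
```

In this sketch, the duplicate check uses only the assumed `id` field as the key. On a real migration project, the matching key and the default values would have to be agreed with the business side first.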

In our joint pilot project with MINDSPIRE Consulting, we therefore created a sample data cleaning process with the help of KNIME in order to verify our concept.

Overview of our KNIME data cleansing project

The purpose of the pilot project was to find out whether KNIME can be used to support the data cleaning tasks of a data migration project. A big advantage of the KNIME platform, in addition to its ease of use, is its flexibility. Once created, a workflow can be changed easily and quickly by inserting new steps or by replacing and reconfiguring existing ones. The disadvantage is that, with larger amounts of data, performance problems may occur with the free version.

Structure of the KNIME workflow

The workflow created during the project aimed to clean the data of a customer database consisting of ten records. The project itself consisted of four separate tasks:

  1. Defining the data cleaning steps based on the Data Science methodology.
  2. Selecting or constructing a sample database.
  3. Creating the workflow using the KNIME workbench.
  4. Iterative process of testing and correcting.

The workflow was run on a data set with limited records, containing intentionally incorrect customer data.
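As a rough illustration of how a small data set with planted errors can be used for testing, here is a hedged Python sketch. It does not reproduce the pilot project's data or rules: the sample records, the field names and the validation rules (including the assumed four-digit postal code format) are all hypothetical.

```python
import re

# Hypothetical customer records with deliberately planted errors.
SAMPLE = [
    {"id": 1, "name": "Anna Kovacs", "email": "anna.kovacs@example.com", "postal_code": "1051"},
    # Planted error: missing name.
    {"id": 2, "name": "", "email": "bela.nagy@example.com", "postal_code": "1123"},
    # Planted error: malformed email address.
    {"id": 3, "name": "Csilla Toth", "email": "csilla.toth(at)example.com", "postal_code": "9021"},
    # Planted error: postal code with the wrong number of digits.
    {"id": 4, "name": "Denes Szabo", "email": "denes.szabo@example.com", "postal_code": "90210"},
]

# Deliberately simple pattern; an assumption, not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# One validity check per field; each returns True when the value is acceptable.
RULES = {
    "name": lambda value: bool(value.strip()),
    "email": lambda value: EMAIL_PATTERN.match(value) is not None,
    "postal_code": lambda value: value.isdigit() and len(value) == 4,
}


def find_issues(records, rules):
    """Return (record id, field name) pairs for every value that fails its rule."""
    issues = []
    for record in records:
        for field, is_valid in rules.items():
            if not is_valid(record[field]):
                issues.append((record["id"], field))
    return issues


issues = find_issues(SAMPLE, RULES)
print(issues)
```

Because the errors are planted deliberately, the list of expected findings is known in advance, which makes it straightforward to check after each iteration whether a changed rule still catches all of them.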


The KNIME workflow created during the project

Detailed information about the KNIME data cleansing sample project is available here



The data cleansing pilot project carried out with the experts of MINDSPIRE Consulting supported our assumption that there are many similarities between the procedures defined and applied by Data Science and the mostly ad-hoc solutions used during data scrubbing tasks.

Accordingly, we strongly recommend drawing on existing experience and knowledge of Data Science tools and methodology when planning and executing the data cleansing tasks of data migration projects.

Based on our current knowledge, KNIME is adequate for planning, building and testing the data cleaning function. However, a truly effective data cleansing solution could be created with an independent module developed in an advanced programming language.

An additional advantage of KNIME is that it makes it possible to recognize and analyze hidden patterns in the data even without programming knowledge, so additional employees can be involved in the tasks on typically resource-poor projects.

This article was written by: Ákos Erdész

Onespire Data Science and Analytics Services
