For several years, Eulidia has been supporting its customers on Dataiku and has seen numerous updates come and go. But some of them really shake things up. With version 13, Dataiku introduced new features that concretely improve the daily lives of project teams. Here is our selection of the four advances not to be missed.
Git merges in the interface: a real step towards smooth collaboration
Dataiku has always made it easy for multidisciplinary teams to collaborate on data projects. But when several people work simultaneously on a DSS project, things can quickly turn chaotic:
- Changes get lost: it becomes hard to identify precisely who changed what, when, and why
- A pipeline can break unintentionally because of a careless edit
- Colleagues' work gets overwritten: when several contributors edit the same elements at the same time, conflicts arise and work is lost
A version control system in Dataiku
To address this problem, Dataiku integrates a Git-based version control system. It traces every change made (code, recipes, scenarios...) to facilitate collaboration and maintain a clear history of the project.
Work on a stable and functional version of the project
In its main use, version control lets you work on a stable and functional version of the project while benefiting from development branches, separate from the main branch, used to build new features in a safe environment.
Until now, it was possible to create new branches, but not to merge them. With V13, merges can finally be managed directly in the interface: conflicts can be visualized and resolved there, and you keep full control.
Concrete case: improving a scoring model already in production
A scoring model is already in production, but new business data has become available. The data team can therefore experiment with a new scoring model in the hope of improving business metrics. During this experimentation phase, it is useful to work on a safe version, decorrelated from the production version, so as not to impact it (since there is no certainty that the new version will beat the old one). Once the model is validated, it can be merged into the main branch directly from the Dataiku interface, as the sketch below illustrates.
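Conceptually, the branch-and-merge workflow that DSS now exposes in its interface corresponds to standard Git operations. Here is a minimal sketch of that workflow, driven from Python with plain Git commands (this is not the DSS API; the scoring-v2 branch name is hypothetical, and the script assumes it runs inside the project's Git checkout):

```python
import subprocess

def git(*args: str) -> None:
    """Run a git command in the current repository, failing loudly on error."""
    subprocess.run(["git", *args], check=True)

# Branch off the stable production version to experiment safely.
git("checkout", "-b", "scoring-v2")

# ... edit recipes, retrain the scoring model, commit the changes ...

# Once the new model is validated, merge it back into the main branch.
git("checkout", "main")
git("merge", "scoring-v2")  # any conflicts must be resolved at this point
```

V13's contribution is precisely to make the last steps (comparison, conflict resolution, merge) possible from the DSS interface instead of the command line.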
Concretely, this makes it possible to meet the main challenges of collaborative work:
- Monitoring changes: every modification is recorded, so you know who changed what, when, and why. A comment can be added to each merge to document the choices made, which improves the collective understanding of the project.
- Restoration in case of error: An error in a recipe? A script deleted by mistake? You can roll back to a previous stable version, whether of a single object, a file, or the entire project.
- Conflicts between developers: thanks to version control, everyone can work on a separate branch, limiting the risk of conflicts. And if a conflict does occur, the resolution tooling integrated into the interface lets you settle it without leaving Dataiku.
Dataiku V13: What's changing
Concretely, V13 brings:
- Managing merges from the graphical interface
- Comparing two branches and identifying their differences
- Resolving conflicts visually
- Completing a branch merge cleanly, without dropping down to the Git command line


With this innovation, Dataiku continues to strengthen its position as a collaborative platform accessible to all data profiles. A welcome advance for teams industrializing their analytical workflows at scale.
Link: https://doc.dataiku.com/dss/latest/release_notes/13.html#new-feature-builtin-git-merging
Data Quality: an essential prerequisite for any data project
We have long known that data quality is the basis for the success of any data project. Without reliable, comprehensive, and consistent data, even the most sophisticated models become ineffective. Wrong data, and all your analysis goes off the rails. Data quality is not a luxury, it is a foundation.
Centralized and visual management of quality policies via Dataiku V12.6
In its V12.6, Dataiku introduces centralized and visual management of quality policies to ensure more reliable and auditable pipelines.
In practice, several situations can cause problems:
- the initial quality of the data is unsatisfactory
- the data changes over time (rows appended to a dataset, etc.) and its quality becomes unsatisfactory as a result
Examples of unsatisfactory data quality to avoid
Data of unsatisfactory quality can take different forms depending on the case.
For example:
- Missing values on critical fields, such as a customer ID or a product code.
- Inconsistent or outlier values, such as birth dates in the future.
- An abnormally high rate of null values resulting from a poorly defined join between two datasets.
Dataiku: A mechanism for monitoring data quality
It is to address this problem that Dataiku has implemented a data quality monitoring mechanism. It makes it easy to verify that data meets custom quality standards and to monitor its status automatically. A data quality rule tests whether a dataset satisfies certain defined conditions.
The resulting status can take one of the following values (illustrated in the sketch after the list):
- ✅ OK: the condition is met
- ⚠️ Warning: a threshold has been reached and deserves attention
- ❌ Error: the rule is broken
- ⭕ Empty: the condition could not be evaluated
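To make the rule-and-status mechanism concrete, here is a minimal sketch in plain pandas. It is illustrative only, not the DSS rule engine, and the customer_id and birth_date columns are invented; it evaluates the kinds of rules listed above and returns a DSS-style status:

```python
import pandas as pd

def check_rule(df: pd.DataFrame, condition, warn_threshold: float = 0.0) -> str:
    """Evaluate one quality rule and map the result to a DSS-style status."""
    if df.empty:
        return "EMPTY"           # the condition could not be evaluated
    failure_rate = (~condition(df)).mean()
    if failure_rate == 0:
        return "OK"              # the condition is met on every row
    if failure_rate <= warn_threshold:
        return "WARNING"         # failures exist, but below the threshold
    return "ERROR"               # the rule is broken

df = pd.DataFrame({
    "customer_id": ["C1", None, "C3"],
    "birth_date": pd.to_datetime(["1980-05-01", "1992-11-23", "2030-01-01"]),
})

# Missing values on a critical field: no failure tolerated -> ERROR here.
print(check_rule(df, lambda d: d["customer_id"].notna()))
# Birth dates in the future, with a deliberately lenient 50% threshold
# just to demonstrate the WARNING status -> WARNING here.
print(check_rule(df, lambda d: d["birth_date"] <= pd.Timestamp.now(), 0.5))
```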
Prior to this release, data quality monitoring was limited to metrics and checks defined individually at the level of each dataset. With V12.6, Dataiku introduces a “Data Quality” module directly integrated into the datasets interface.


Another major evolution: the arrival of Data Quality Templates. These templates bundle sets of preconfigured quality rules that can easily be applied to multiple datasets. They are global to the DSS instance, allowing all users to share, reuse, and standardize quality controls across projects, as the sketch below illustrates.
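The template idea can be pictured as a shared, named rule set applied uniformly to several datasets. A minimal sketch, reusing the check_rule helper and the df example from the previous snippet (again plain pandas, not the DSS API, with invented dataset names):

```python
# A "template": a named, reusable set of quality rules shared across datasets.
common_rules = {
    "no_missing_id": lambda d: d["customer_id"].notna(),
    "no_future_birth_date": lambda d: d["birth_date"] <= pd.Timestamp.now(),
}

# The same rules applied, unchanged, to every dataset that opts in.
datasets = {"customers_fr": df, "customers_us": df.copy()}
for dataset_name, data in datasets.items():
    for rule_name, condition in common_rules.items():
        print(dataset_name, rule_name, check_rule(data, condition))
```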
Changes that simplify large-scale work on data projects.
Link: https://doc.dataiku.com/dss/latest/release_notes/12.html#data-quality
Data Lineage: for better traceability of data in your projects
In a context where data pipelines are becoming more and more complex, data traceability is no longer a luxury but a necessity. Understanding where data comes from, how it has been transformed, and which data products are impacted by a modification has become essential to ensure reliable and transparent processing.
The Risks of Unclear Traceability
Without clear traceability, several risks appear:
- Lack of data transparency
- Propagation of errors
- Misunderstanding of the impact of changes
It is to address these problems that Dataiku has implemented a Data Lineage system, now down to column level with V13. It lets you follow data through a Flow and understand the transformations it has undergone.
This provides answers to each of the problems above:
- More transparency on the data, because the complete path of the data can be visualized
- Less error propagation, because Data Lineage quickly identifies all entities impacted by a modification or error
- Fewer misunderstandings about the impact of changes, for the same reasons

Be careful though: for Python or SQL recipes, the lineage is not generated automatically. You will therefore need to define it manually so that it propagates correctly through the rest of the Flow.
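To illustrate what column-level lineage buys you, here is a toy sketch, unrelated to the DSS API and with invented flow and column names: lineage is simply a graph mapping each derived column to its sources, and impact analysis is a traversal of that graph.

```python
# Toy column-level lineage: each derived column -> the source columns it uses.
lineage = {
    "orders.total_eur": ["orders.total_usd", "rates.usd_eur"],
    "report.revenue": ["orders.total_eur"],
    "report.margin": ["report.revenue", "costs.total_eur"],
}

def impacted_by(column: str) -> set[str]:
    """Return every downstream column affected if `column` changes."""
    impacted: set[str] = set()
    frontier = [column]
    while frontier:
        current = frontier.pop()
        for target, sources in lineage.items():
            if current in sources and target not in impacted:
                impacted.add(target)
                frontier.append(target)
    return impacted

# A change to the exchange-rate column ripples through the whole report.
print(impacted_by("rates.usd_eur"))
# -> {'orders.total_eur', 'report.revenue', 'report.margin'}
```

This is the question the DSS lineage views answer at a glance, provided code recipes declare their lineage.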

Link: https://doc.dataiku.com/dss/latest/release_notes/13.html#new-feature-column-level-data-lineage
AI-assisted recipe creation
Time savings and more autonomy for teams
In an environment where speed of execution and ease of implementation are key challenges, advances in generative artificial intelligence are opening up new perspectives. One of the major benefits of Dataiku V13 is the ability to automatically generate recipes based on simple natural language instructions.
This new feature is not only intended to save time for technical users; it also gives more autonomy to business profiles who do not necessarily have development or advanced data-manipulation skills.
Concretely, whereas creating a recipe previously required step-by-step manual configuration, it is now possible to describe the desired transformation in a natural-language sentence. The AI then automatically generates a suitable recipe, ready to be previewed, modified if necessary, and deployed within seconds.
Time saved for experts, and new-found autonomy for business profiles.

The interface for generating a visual recipe

The result of the visual recipe generation
Link: https://doc.dataiku.com/dss/latest/release_notes/13.html#new-feature-ai-generate-recipe
In summary
With these new features, Dataiku further strengthens its position as a collaborative, robust, and accessible platform, suited to data projects of growing complexity and demands.
At Eulidia, we already see these benefits with our customers:
- More peace of mind when working with others
- Fewer data quality risks
- Better understanding and transparency of data processing