Agility in a Data Science project

This article presents the advantages of the Agile Scrum method in software development and data science projects, highlighting frequent deliveries, regular customer feedback and adaptation to change. It describes how Scrum, with its iterative approach, promotes continuous exploration, puts the customer at the center, provides visibility and stimulates innovation.

Cécile Sebastian Agile consultant and coach

Agile “methods” are popular in software development projects. Scrum is the most widely used Agile framework, in almost 60% of cases, and its advantages are numerous:

  • Frequent and regular deliveries of usable features
  • Faster, iterative releases to market, and therefore regular customer feedback
  • As a result, a product that meets the customer's needs. And the list is far from exhaustive!

As a reminder, a Scrum project is broken down into iterations, called Sprints, lasting 4 weeks or less. The Scrum team is made up of 3 roles that are involved throughout the project and work closely together:

  • The Product Owner is the voice of the customer/user and translates the customer's needs into User Stories so that the development team understands them.
  • The development team is made up of 3 to 9 people whose combined skills make it possible to turn User Stories into results that are potentially deliverable to the client.
  • The Scrum Master masters the Scrum framework and ensures that the Scrum team understands and applies it. The Scrum Master is by turns a trainer, a coach and much more!

The stages of a Data Science project

A Data Science project is not managed in the same way as a software project: it is highly exploratory in nature, and its stages are very specific.

Step 1: Definition of the Business problem

As in any project, this step is about answering the following questions: What are the customer's needs? What problems do they want to solve? What are the priorities? Can these problems be solved using data? The challenge of this phase, whatever the method, is to ensure that the business clearly expresses its needs and that the Data teams understand them.

Step 2: Data collection

This step is both critical and decisive for the rest of the project. Some projects may stop here if it turns out that the available data cannot answer the problem, or that the data needed does not exist at all. Questions of data accessibility and GDPR compliance also arise, because data that exists is not always data that can be used.

Step 3: Data cleaning

This is the most time-consuming stage of the project! Data is often incomplete, non-standardized (the same value in different formats, for example “France” and “FR”), obsolete or duplicated, and in the era of Big Data the volume to process keeps growing. Understanding the data requires a lot of back-and-forth between the Data team and the business, because a poor or partial understanding would lead to a biased analysis; this is why an Agile approach is well suited here. Rigor and attention are therefore needed to obtain “clean” and usable data.
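To make the cleaning issues listed above more concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the field names `customer_id` and `country`, the `COUNTRY_ALIASES` mapping and the `clean_records` helper are all hypothetical and do not come from a real project. It covers incomplete, non-standardized and duplicated records; detecting obsolete data is left out because it depends on business rules.

```python
# Hypothetical mapping from observed country labels to one standard code.
COUNTRY_ALIASES = {"france": "FR", "fr": "FR"}


def clean_records(records):
    """Return records that are complete, standardized and de-duplicated."""
    cleaned = []
    seen = set()
    for record in records:
        customer_id = record.get("customer_id")
        country = record.get("country")
        # Incomplete data: skip rows with a missing identifier or country.
        if customer_id is None or not country:
            continue
        # Non-standardized data: map "France", "FR" or " fr " to one code.
        key = country.strip().lower()
        standardized = COUNTRY_ALIASES.get(key, country.strip().upper())
        row = (customer_id, standardized)
        # Duplicates: keep only the first occurrence of each row.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append({"customer_id": customer_id, "country": standardized})
    return cleaned


# Illustrative input only (invented values, not real data).
raw = [
    {"customer_id": 1, "country": "France"},
    {"customer_id": 1, "country": "FR"},
    {"customer_id": 2, "country": None},
    {"customer_id": 3, "country": " fr "},
]
cleaned = clean_records(raw)
print(cleaned)
```

In a real project, the rules encoded here (which labels are equivalent, which fields are mandatory) would typically come out of the exchanges between the Data team and the business mentioned above.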

Step 4: Exploration

In possession of a harmonized and correctly formatted dataset, the Data Scientist will be able to begin the analysis. Again, it is difficult to predict what the outcome will be, or even whether the outcome will be satisfactory. But at this stage we get a first glimpse.

Step 5: Creation of the Data Science solution

The creation of the solution can begin if the exploration has produced sufficiently conclusive results. It is essential that the Data Scientist always has the initial business problem in mind in order to create the algorithm that will best respond to it and maximize the return on investment.

Step 6: Deployment

The deployment, or production, phase is the moment when the tool that has been developed is handed over to customers so that they can use it.

What is the benefit of leading a Data Science project using the Scrum methodology?

The limits of “classic” methodologies

In a project managed in “waterfall” or “V-cycle” mode, each stage only starts when the previous one is completed. The allotted time or the project budget may then be almost entirely consumed before the final stages, forcing the team either to rush those stages or to increase the initial budget.

Before proceeding with deployment, that is to say delivery to users, the project team carries out technical tests and a sample of users carries out functional tests. If the results are satisfactory, the product can be deployed to all users.

After deployment, in waterfall projects, we very often observe that:

  • Users are not trained in the use of the product
  • Only a small number of the features that were developed are actually used, while other features that users expected are missing.

Why such a result? In a waterfall project, the requirements-gathering stage takes place only once. It must be exhaustive and can last from a few weeks to several months. The client is heavily involved at that point, then again a few months later, during functional tests... once the product has been developed in its entirety! This leaves almost no room for maneuver if the product is not satisfactory!

 

 

The benefit of the Agile approach

It was to address these shortcomings that the authors of the Agile Manifesto chose to put the customer at the center of the project in order to ensure their satisfaction. The Manifesto's 4 values reflect this:

  • Individuals and interactions over processes and tools.
  • Working software over comprehensive documentation.
  • Customer collaboration over contract negotiation.
  • Responding to change over following a plan.

In addition, the Scrum approach makes it possible to “parallelize” several of the 6 steps seen above within a single Sprint. The requirements-gathering phase and the deployment phase are therefore no longer separated by months. At most 4 weeks pass between the Sprint Planning, during which customer needs are discussed, and the Sprint Review, during which those same customers inspect what has been produced.

The entire development team communicates with the Product Owner during Sprint Planning, and the entire development team receives feedback from customers/users during Sprint Reviews. And this happens at every iteration!

With Scrum, investigations begin as soon as a need has been identified, even in outline. The Product Owner starts collecting initial information about the customer's need and conveys it to the Data Scientists through User Stories and daily exchanges. At the same time, the Data Scientists begin to “prepare the ground”: identifying the tools where the data is stored, requesting access to the various tools and databases, and collecting and exploring the data in order to begin identifying how it is stored and how it will be used to achieve the expected result.

If one of the requests is impossible to satisfy, this will become apparent very quickly, and the project can be reoriented at the start of the next sprint. The role of the Product Owner is to collect and prioritize user requests while the development team works on the initial needs. The iterative approach of Scrum lets the team go beyond simply collecting customer needs, so that it can quickly and regularly find out whether the project is on the right track.

If the data (or a sample) is available and usable as is, the exploration phase can start from the first Sprint! With the Scrum approach, initial results can be obtained in the first Sprints. They are not definitive, but they make it possible either to validate the approach taken and continue, or to invalidate it and change strategy.

Of course, depending on the complexity of the model to be produced, it may be delivered after one or more sprints. However, the idea is to quickly provide a first result that the client can use; this first result is called an MVP (Minimum Viable Product). The client can then derive the first benefits and provide feedback that allows the team to improve or complete the model during subsequent sprints.

Each iteration is therefore an opportunity for the client to refine, specify their needs, and prioritize them according to what brings them the most added value. For the Scrum team, each iteration is an opportunity to improve its understanding of customer needs in order to provide the best possible solution.

If, at the end of a sprint, what was produced is not satisfactory, the Sprint Review makes it possible to detect this... after 4 weeks at most! The following sprint can then be reoriented based on the new feedback collected.

Conclusion

In a Scrum project, unsatisfactory results are noticed after 4 weeks at most, and the project can be reoriented from the next sprint. If the result of the sprint is satisfactory, users can benefit from it right away. The project is not finished for all that: requests for improvements and new features keep arriving over time, while additional features are at various earlier stages of development.

As you will have understood, each of the 6 phases seen previously is addressed and repeated in every sprint in order to produce a partial but potentially usable result! However, the first sprints are unlikely to produce results until the data is made available.

From the customer/user's point of view, Scrum makes it possible to check regularly, through demonstrations and regular deliveries, that their needs have been understood. These regular deliveries let them benefit from the results of the project from the first iterations, as soon as the first features are put into service. The project as a whole does not cost the customer less, but for an equivalent budget the product meets their expectations far better. Users also get regular hands-on experience with the product.

From the Data Scientist's point of view, the advantage is the freedom to explore and to test several approaches, tools, etc. Each project is therefore an opportunity to discover a new tool, acquire new knowledge and skills, and offer ever more innovative solutions. In addition, working in Sprints provides better visibility and a clearer framing of the actions to be carried out.

Finally, it would be utopian to think that the Scrum framework offers a magic formula for the success of any project! The project can only succeed if the entire Scrum framework is respected and if the principles and values of Agility are understood and embodied by the project stakeholders. The Scrum Master is there to support the team in this direction, each Sprint being an opportunity to improve!

So let's be Agile! 

 
