The article is co-authored by Srini Raghavan, Co-Founder & Chief Executive Officer of Educational Initiatives.

Abstract
Mindspark is an adaptive-learning program that helps students achieve conceptual clarity, develop procedural fluency, and get extensive practice at their own learning pace. Mindspark also gives teachers the visibility, insights, and tools to improve learning levels across their class. Mindspark is currently offered for Maths (Grades 1 to 10) and English (Grades 4 to 9).
We observed five distinct student types, or personas, among Mindspark Maths students: Tortoises, Beavers, Oysters, Dolphins, and Alligators (see the Reference Notes in the Appendix). Each persona has a different mindset and several defining characteristics. The key objective of this work is to show how to identify these characteristics and what kinds of interventions can be recommended for each persona.

Purpose of identifying student personas
This article discusses work in which a student-centered design methodology is applied to gain a deeper understanding of our Mindspark students’ learning patterns. Specifically, it is concerned with the approach used to identify the personas: a data-driven approach that can be applied to the design of learning experiences. One of the benefits of using personas is that they make implicit knowledge about students more explicit and thus enable more informed interventions.
Student personas are user archetypes that characterize the learning requirements, learning patterns, personal characteristics (as available), and goals of larger groups of students (Cooper, 1999; Rogers et al., 2011; Saffer, 2009). The creation of rich, inspiring student profiles, or personas, can provide us with an enhanced view of our student population, and this is the main motivation for the work reported here. Cooper (1999) describes personas as a powerful design tool and adds that “the greater power of personas is their precision and specificity”.

What is a Persona?
Personas are a powerful design tool for providing personalized learning paths to students. A typical persona consists of the student's name, grade, learning behavior, goals, and personal details such as hobbies, what they do in their leisure time, and their openness to learning new things.
The idea behind collecting and processing these details is to create archetypes of student groups that are mutually exclusive in their characteristics while each group is homogeneous within itself. Personas provide a richer account of students and make it much easier for teachers and mentors to keep the key characteristics in mind when working with them. In the context of learning design, personas also provide an effective tool for communicating in general terms about students within interdisciplinary teams; they provide a common vocabulary for discussions about the student population.

Methodology to identify student persona
The personas were derived with a data-driven methodology using Mindspark Maths data of students in Grade 7 and above. A survey of students was also conducted to collect additional details such as time spent in school, their short-term goals, the perception of their parents, and their own perception of their capabilities.
The Mindspark learning pattern analysis revealed topic clarity, interest in Mindspark, time spent on Mindspark, and the attitude towards completing the assigned clusters and topics in Mindspark as the differentiating attributes in identifying student personas. The analysis also revealed that student personas vary across the specific Maths domains defined by Mindspark, namely Algebra, Geometry, Number System, and Probability & Statistics.
The process of identifying student personas is best explained through an example. Suppose there are approximately 4660 students in Grade 8 who have completed multiple topics in Mindspark. If we take their overall accuracy in, say, Algebra and Geometry and plot it on a graph, we get a very cluttered, seemingly random visual, as shown below.

The above visual offers no interpretation other than that some students perform better and some perform worse in one of the domains. This does not help us understand anything about any student's learning characteristics. At this stage it is important to look at student behavior in more detail using more sophisticated statistical approaches, such as clustering. An initial clustering of the students by accuracy in Geometry and Algebra revealed that there could be multiple centroids among these 4660 students. However, as the visual below shows, even this initial clustering with two attributes does not differentiate the groups clearly.

At this stage, it is important to identify and include more attributes that can differentiate the students based on differences in performance behaviour. The statistical technique Principal Component Analysis (see the Reference Notes in the Appendix) allows us to test many attributes, such as accuracy, topic completion rate, cluster failure rate, and time taken to answer questions. We tested 24 such attributes and grouped the important ones into components that give more distinct information about student behavior. For example, the first component, accuracy, includes regular question accuracy, indicator question accuracy, and misconception question accuracy. The second component, time, could include attributes such as time spent answering questions and time spent reading explanations. Six such components were derived from the Principal Component Analysis, and those components were used to model student behaviour using K-Means clustering (see the Reference Notes in the Appendix).
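As a rough illustration of the component-extraction idea, the sketch below finds the first principal component of a small attribute table using power iteration on the covariance matrix. It is a minimal, standard-library-only example and not the analysis pipeline used for Mindspark; the function names `first_principal_component` and `component_scores` are hypothetical, and a statistics package would normally be used to extract several components at once.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (attribute means, unit vector along the direction of greatest variance)."""
    n, d = len(rows), len(rows[0])
    # Centre every attribute on its mean so variance is measured around zero.
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix between attributes (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration: repeatedly apply the covariance matrix and renormalise.
    # The vector converges towards the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("no variance found (or unlucky starting vector)")
        v = [x / norm for x in w]
    return means, v


def component_scores(rows, means, direction):
    """Project each centred row onto `direction`, giving one score per student."""
    return [sum((row[j] - means[j]) * direction[j] for j in range(len(means)))
            for row in rows]
```

Each row would hold one student's attribute values, and the resulting scores give a single number per student along the dominant direction of variation. Attributes measured on very different scales (for example percentages and seconds) are usually standardized before this step.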
To describe the outcome of the clustering process in a simple way and to reduce the complexity of multiple aspects under consideration, let us consider the above two components, accuracy (PCA component 1) and time (PCA Component 2), and plot the K-Means clustering output for the 4660 students.

The above image clearly shows 5 different groups of students. The characteristics of these groups were determined by evaluating the parameter scores obtained through K-Means clustering, and each of the 5 segments was profiled based on those scores. The picture below gives an idea of the various attributes considered (left side) and the characteristics extracted (right side).

Any student who starts to use Mindspark can then be classified into one of these 5 personas based on their learning behavior. A sample reference table is shown below.
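One simple way to place a new student, assuming the personas are represented by their cluster centroids in component space, is to pick the nearest centroid. The sketch below illustrates that idea only; the article does not state the exact assignment rule used, and the function name `nearest_persona`, the generic cluster labels, and the coordinates are placeholders rather than Mindspark values.

```python
import math


def nearest_persona(student_scores, centroids):
    """Return the label of the centroid closest (Euclidean distance) to the student."""
    return min(centroids,
               key=lambda label: math.dist(student_scores, centroids[label]))


# Placeholder centroids in (accuracy component, time component) space.
# In practice each cluster label would map to one of the five persona names.
example_centroids = {
    "cluster_0": (-1.5, 0.5),
    "cluster_1": (0.0, -1.0),
    "cluster_2": (1.5, 1.0),
}

print(nearest_persona((1.2, 0.8), example_centroids))
```

The same function works for any number of components, as long as the student's scores and the centroids have the same length.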

To present some of the ways in which the use of personas has influenced the design and development of learning experiences, it is necessary to set out some context for the above analysis. A pedagogical framework that supports the design and development of learning experiences for students was key to developing an understanding of learning behavior. The framework uses all the various types of interactions a student has with Mindspark as well as with other students. These interactions may occur with or without the students realizing it. They provide the lowest level of granularity at which learning experiences may be designed.
Initially, however, learning experiences are couched in terms of goals. The goal could be as simple as “student wants to score 90/100” in the upcoming test. These short-term goals are initially presented in a relatively open way and are set out in a narrative account provided to the mentor/teacher. Further, resources and recommendations that are useful for achieving the short-term goals are also provided to the mentor/teacher. Each recommendation includes a rationale for the students’ goal. At the end, the student needs to be shown the path that they must take to achieve the goal. These are set out in the mentor’s/teacher’s narrative account. The learning gaps and feedback can be discrete and clearly defined (for example, the result of a diagnostic test), or continuous and loosely defined (for example, the progression of a learner’s thinking within a given topic). The key issue here is that learning experiences are inherently consequential; the use of goals as the main structure for the design of learning experiences is motivated by this idea.
Students are encouraged to use the recommendations set at a high level as the basis for their studies; the studies are then more likely to be emergent and directed by the students themselves. Additionally, objectives are presented in increasing levels of detail; this is typically achieved by deconstructing the goal (scoring 90/100) into its core, advanced, and related elements (score 20/25 in Algebra, 40/45 in Geometry, 15/15 in Numbers, and 15/15 in Probability & Statistics).
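The goal decomposition above is simple arithmetic, and a short sketch can make the bookkeeping explicit. The domain targets below are the ones quoted in the text; the helper names `overall` and `remaining_gaps`, and any current scores passed in, are hypothetical and only for illustration.

```python
# Domain-level targets quoted in the article: (target score, maximum score).
DOMAIN_TARGETS = {
    "Algebra": (20, 25),
    "Geometry": (40, 45),
    "Numbers": (15, 15),
    "Probability & Statistics": (15, 15),
}


def overall(targets):
    """Sum the per-domain targets and maxima into an overall (target, maximum)."""
    target = sum(t for t, _ in targets.values())
    maximum = sum(m for _, m in targets.values())
    return target, maximum


def remaining_gaps(current_scores, targets):
    """Marks still needed per domain; domains missing from the input count as 0."""
    return {domain: max(0, target - current_scores.get(domain, 0))
            for domain, (target, _) in targets.items()}


print(overall(DOMAIN_TARGETS))
```

The domain targets add up to the overall goal of 90 out of 100, so progress against each domain can be tracked separately while still pointing at the single short-term goal.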
The picture below depicts a “Beaver” persona, with their characteristics in learning Algebra and a recommended learning path in Algebra.

One more example depicts an “Alligator” persona, with their characteristics in learning Algebra and a recommended learning path in Algebra.

Final Notes
A few factors need to be considered while working with student personas. Good training and a support system for mentors and teachers are necessary for them to interpret and incorporate the personas into their practice. A few practitioners could be skeptical and express the view that, whilst the personas are representative of the student population, they already have a clear sense of who their students are and, as such, the personas are unnecessary. The key message in such cases is that when learning experiences are mediated by technology, much that had previously been implicit needs to be made explicit. Whilst mentors/teachers may be able to adapt synchronous face-to-face learning experiences as needed, this is much more difficult when the experiences are mediated synchronously or asynchronously in flexible learning contexts (digital learning applications). Most student needs must be preempted and built into the learning experience up front in terms of its completeness and appropriateness for use in different contexts.
In the context of the topic discussed here, personas have had two main impacts: the first is in communications about our students' learning patterns, and the second is in identifying those patterns across the overall student population. This approach excludes personal bias and is purely data-based, and hence can be useful to teachers and mentors as an input, especially in an environment in which the student-teacher ratio is extremely skewed. It is our view that personas are an important tool in empowering the educator community to provide learning experiences that are more appropriate and engaging to students in flexible and digital learning contexts.

Appendix
Reference Notes
• Persona Names and Description


• Principal Component Analysis: A technique for reducing the dimensionality of datasets, increasing interpretability while minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. Finding these new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem. The new variables are defined by the dataset at hand, not a priori, which makes PCA an adaptive data analysis technique.

• K-means clustering: The K-means clustering algorithm is a data mining and machine learning tool used to cluster observations into groups of related observations without any prior knowledge of those relationships. It is an iterative algorithm that tries to partition the dataset into K pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs to only one group. It tries to make the data points within a cluster as similar as possible while keeping the clusters as different (far apart) as possible.

This is how the K-Means algorithm works:
1. The modeler sets an initial value of K, any numeric value between 3 and 8. This is the initial number of clusters; over the following steps the algorithm arrives at the optimal clusters / groups.
2. K-means allocates every data point in the dataset to the nearest centroid, meaning that a data point is in a cluster if it is closer to that cluster’s centroid than to any other centroid. In this case, the data points are the 4660 students’ scores/values across the different factors (accuracy, time, etc.).
3. K-means then recalculates each centroid by taking the mean of all data points assigned to that centroid’s cluster, thereby reducing the total intra-cluster variance relative to the previous step. The “means” in K-means refers to this averaging of the data to find the new centroid.
4. The algorithm iterates between steps 2 and 3 until some criterion is met (e.g. the sum of distances between the data points and their corresponding centroids is minimized, a maximum number of iterations is reached, the centroid values stop changing, or no data points change clusters).
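The steps above can be sketched in plain Python as follows. This is a minimal illustration of the assign-and-update loop under simplifying assumptions (random starting centroids drawn from the data, Euclidean distance, stopping when no point changes cluster or a maximum number of iterations is reached). It is not the implementation used for the Mindspark analysis, and the function name `k_means` and its parameters are hypothetical.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster `points` (tuples of numbers) into `k` groups.

    Returns (centroids, assignments), where assignments[i] is the index of the
    cluster that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Step 1: choose K and pick K data points as the starting centroids.
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iterations):
        # Step 2: allocate every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 4 (stopping rule): finish when no point changes cluster.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignments
```

In the setting described here, each point would be one student's vector of component scores (accuracy, time, and so on). Because the result depends on the starting centroids, it is common to run the algorithm with several seeds and compare the outcomes.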

Anupama Muraleedharan & Srini Raghavan

Anupama works on data science and machine learning across multiple projects in EI.
Srini Raghavan is the Co-Founder & Chief Executive Officer of Educational Initiatives.
