AUTHOR
Mirka Saarela
Automatic Profiling of Open-Ended Survey Data on Medical Workplace Teaching
On-the-job medical training is known to be challenging due to the fast-paced environment and strong vocational profile. It relies on on-site supervisors, mainly doctors and nurses with long practical experience, who coach and teach their less experienced colleagues, such as residents and healthcare students. These supervisors receive pedagogical training to ensure that their guidance and teaching skills are constantly improved. The aim of such training is to develop participants’ patient, collegiate and student guidance skills in a multiprofessional environment, and to expand their understanding of guidance as part of their work as supervisors of healthcare professionals. In this paper, we …
Do Country Stereotypes Exist in PISA? A Clustering Approach for Large, Sparse, and Weighted Data
Certain stereotypes can be associated with people from different countries. For example, the Italians are expected to be emotional, the Germans functional, and the Chinese hard-working. In this study, we cluster all 15-year-old students representing the 68 different nations and territories that participated in the latest Programme for International Student Assessment (PISA 2012). The hypothesis is that the students will start to form their own country groups when clustered according to the scale indices that summarize many of the students’ characteristics. In order to meet PISA data analysis requirements, we use a novel combination of our previously published algorithmic components to reali…
Measuring self‐regulated learning in a junior high school mathematics classroom : Combining aptitude and event measures in digital learning materials
Background: Measurement of students' self-regulation skills is an active topic in education research, as effective assessment helps devise support interventions to foster academic achievement. Measures based on event tracing usually require large amounts of data (e.g., MOOCs and large courses), while aptitude measures are often qualitative and need careful interpretation. Precise and interpretable evaluation of self-regulation skills in a normal K-12 classroom thus poses a challenge. Objectives: The present study proposes and explores a learning analytics method of combining aptitude and event measures to evaluate students' self-regulation skills. Methods: An explorative learning analytics s…
Sima – an Open-source Simulation Framework for Realistic Large-scale Individual-level Data Generation
We propose a framework for realistic data generation and the simulation of complex systems and demonstrate its capabilities in a health domain example. The main use cases of the framework are predicting the development of variables of interest, evaluating the impact of interventions and policy decisions, and supporting statistical method development. We present the fundamentals of the framework by using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing, and fast random number gener…
Let Me Hack It: Teachers’ Perceptions About ‘Making’ in Education
Making in education is an emergent practice focusing on learners as creators of things in a collaborative fashion while promoting knowledge construction through technology, design, and creative self-expression. Teachers’ (n = 33) opinions about making were studied using an online questionnaire after they had attended an online professional development course about making in education. The results suggest that there exists a group of educators who consider making a promising approach in education and want to promote its use in schools.
Analysing the Nigerian Teacher’s Readiness for Technology Integration
Towards Evidence-Based Academic Advising Using Learning Analytics
Academic advising is a process between the advisee, the adviser, and the academic institution, which provides the degree requirements and the courses contained in them. Content-wise planning and management of the student’s study path, guidance on studies, and academic career support are the main joint activities of advising. The purpose of this article is to propose the use of learning analytics methods, more precisely robust clustering, for creating groups of actual study profiles of students. This allows academic advisers to provide evidence-based information on the study paths that have actually happened similarly to individual students. Moreover, academic institutions can focus on management and upda…
“Sitting at the Stern and Holding the Rudder” : Teachers’ Reflections on Action in Higher Education Based on Student Agency Analytics
Digital technologies in teaching and learning in higher education have the potential to enhance student agency. Student agency is an essential resource to nurture, especially at times when students face challenges emerging from the volatile, uncertain, complex, and ambiguous world. In addition, contemporary policymaking has identified the importance of student agency. Student agency analytics is a process utilizing learning analytics, specifically psychometrics and machine learning to provide teachers insights about the agentic resources of their students. Four teachers in higher education were provided with student agency analytics results of their mathematics courses. The teachers partici…
Understanding the Study Experiences of Students in Low Agency Profile: Towards a Smart Education Approach
In this paper, we use student agency analytics to examine how university students who were assessed to have low agency resources describe their study experiences. Students (n = 292) completed the Agency of University Students (AUS) questionnaire. Furthermore, they reported what kinds of restrictions they experienced during the university course they attended. Four different agency profiles were identified using robust clustering. We then conducted a thematic analysis of the open-ended answers of students who were assessed to have low agency resources. Issues relating to competence beliefs, self-efficacy, student-teacher relations, time as a resource, student well-being, and course contents seemed to …
Automatic knowledge discovery from sparse and large-scale educational data : case Finland
The Finnish educational system has received a lot of attention during the 21st century. Especially, the outstanding results in the first three cycles of the Programme for International Student Assessment (PISA) have made Finland’s education system internationally famous, and its unique characteristics have been under active research by various, predominantly educational, scholars since then. However, despite the availability of real but often sparse big data sets that would allow more evidence-based decision making, existing research to date has mostly concentrated on using classical qualitative and (univariate) quantitative methods. This thesis discusses, in general terms, knowledge discove…
Expert-based versus citation-based ranking of scholarly and scientific publication channels
The Finnish publication channel quality ranking system was established in 2010. The system is expert-based: separate panels decide and update the rankings of a set of publication channels allocated to them. The aggregated rankings have a notable role in the allocation of public resources to universities. The purpose of this article is to analyze this national ranking system. The analysis is mainly based on two publicly available databases containing the publication source information and the actual national publication activity information. Using citation-based indicators and other available information with association rule mining, decision trees, and confusion matrices, …
Course Satisfaction in Engineering Education Through the Lens of Student Agency Analytics
This Research Full Paper presents an examination of the relationships between course satisfaction and student agency resources in engineering education. Satisfaction experienced in learning is known to benefit the students in many ways. However, the varying significance of the different factors of course satisfaction is not entirely clear. We used a validated questionnaire instrument, exploratory statistics, and supervised machine learning to examine how the different factors of student agency affect course satisfaction among engineering students (N = 293). Teacher’s support and trust for the teacher were identified as both important and critical factors concerning experienced course satisf…
Feature Ranking of Large, Robust, and Weighted Clustering Result
A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…
Discovering Gender-Specific Knowledge from Finnish Basic Education using PISA Scale Indices
The Programme for International Student Assessment, PISA, is a worldwide study to assess the knowledge and skills of 15-year-old students. Results of the latest PISA survey conducted in 2012 were published in December 2013. According to the results, Finland is one of the few countries where girls performed better in mathematics than boys. The purpose of this work is to refine the analysis of this observation by using educational data mining techniques. More precisely, as part of the standard PISA preprocessing phase, certain scale indices are constructed based on information gathered from the background questionnaire of each participating student. The indices describe, e.g., students’ engagement, dri…
Analysing Student Performance using Sparse Data of Core Bachelor Courses
Curricula for Computer Science (CS) degrees are characterized by the strong occupational orientation of the discipline. In the BSc degree structure, with clearly separate CS core studies, the learning skills for these and other required courses may vary a lot, which is shown in students' overall performance. To analyze this situation, we apply nonstandard educational data mining techniques on a preprocessed log file of the passed courses. The joint variation in the course grades is studied through correlation analysis while intrinsic groups of students are created and analyzed using a robust clustering technique. Since not all students attended all courses, there is a nonstructured sparsity…
Supporting Institutional Awareness and Academic Advising using Clustered Study Profiles
The purpose of academic advising is to help students develop educational plans that support their academic career and personal goals, and to provide information and guidance on studies. Planning and management of the students’ study path is the main joint activity in advising. Based on a study log of passed courses, we propose to use robust, prototype-based clustering to identify a set of actual study path profiles. Such profiles identify groups of students with similar progress of studies, whose analysis and interpretation can be used for better institutional awareness and to support evidence-based academic advising. A model of an automated academic advising system utilizing the possi…
The Finnish Version of the Affinity for Technology Interaction (ATI) Scale : Psychometric Properties and an Examination of Gender Differences
The pervasiveness of technical systems in our lives calls for a broad understanding of the interaction between humans and technology. The affinity for technology interaction (ATI) scale measures the tendency of a person to actively engage in or to avoid interaction with technological systems, including both software and physical devices. This research presents a psychometric analysis of a Finnish version of the ATI scale. The data consisted of 796 responses from students in a Finnish university. The data were analyzed utilizing factor analysis and both nonparametric and parametric item response theory. The Finnish version of the ATI scale proved to be essentially unidimensional, showing high reliabi…
Predicting Math Performance from Raw Large-Scale Educational Assessments Data : A Machine Learning Approach
Large-scale educational assessment studies (LSAs) regularly collect massive amounts of very rich cognitive and contextual data of whole student populations. Currently, LSAs are limited to reporting student proficiencies in the form of plausible values (PVs). PVs are random draws from the posterior distribution of a student’s ability, which is based on the Bayesian approach with the prior distribution modeling the student background within the population and the likelihood modeling test item responses using the Rasch model. While PVs have been shown to be a reliable estimate for proficiencies of populations, a more comprehensive study of these rich data sets by deploying machine learning algorithms may pro…
Comparison of feature importance measures as explanations for classification models
Explainable artificial intelligence is an emerging research direction helping the user or developer of machine learning models understand why models behave the way they do. The most popular explanation technique is feature importance. However, there are several different approaches to measuring feature importance, most notably global and local. In this study we compare different feature importance measures using both linear (logistic regression with L1 penalization) and non-linear (random forest) methods and local interpretable model-agnostic explanations on top of them. These methods are applied to two datasets from the medical domain, the openly available breast cancer …
Weighted Clustering of Sparse Educational Data
Clustering as an unsupervised technique is predominantly used in unweighted settings. In this paper, we present an efficient version of a robust clustering algorithm for sparse educational data that takes the weights, aligning a sample with the corresponding population, into account. The algorithm is utilized to divide the Finnish student population of PISA 2012 (the latest data from the Programme for International Student Assessment) into groups, according to their attitudes and perceptions towards mathematics, for which one third of the data is missing. Furthermore, necessary modifications of three cluster indices to reveal an appropriate number of groups are proposed and demonstrated. pe…
Knowledge Discovery from the Programme for International Student Assessment
The Programme for International Student Assessment (PISA) is a worldwide study that assesses the proficiencies of 15-year-old students in reading, mathematics, and science every three years. Despite the high quality and open availability of the PISA data sets, which call for big data learning analytics, academic research using this rich and carefully collected data is surprisingly sparse. Our research contributes to reducing this deficit by discovering novel knowledge from the PISA through the development and use of appropriate methods. Since Finland has been the country of most international interest in the PISA assessment, a relevant review of the Finnish educational system is provided. T…
Lokidatan käyttö oppilaiden profiloimisessa - sovellus matematiikan PISA-aineistoon [The use of log data in student profiling: an application to the PISA mathematics data]
Predicting hospital associated disability from imbalanced data using supervised learning
Hospitalization of elderly patients can lead to serious adverse effects on their functional capability. Identifying the underlying factors leading to such adverse effects is an active area of medical research. The purpose of the current paper is to show the potential of artificial intelligence in the form of machine learning to complement the existing medical research. This is accomplished by studying the outcome of hospitalization of elderly patients as a supervised learning task. A rich set of features characterizing the medical and social situation of elderly patients is leveraged and using confusion matrices, association rule mining, and two different classes of supervised learning algo…
Explainable Student Agency Analytics
Several studies have shown that complex nonlinear learning analytics (LA) techniques outperform the traditional ones. However, the actual integration of these techniques in automatic LA systems remains rare because they are generally presumed to be opaque. At the same time, the current reviews on LA in higher education point out that LA should be better grounded in the learning sciences, with actual linkage to teachers and pedagogical planning. In this study, we aim to address these two challenges. First, we discuss different techniques that open up the decision-making process of complex techniques and how they can be integrated in LA tools. More precisely, we present various global and local e…
Robustness, Stability, and Fidelity of Explanations for a Deep Skin Cancer Classification Model
Skin cancer is one of the most prevalent of all cancers. Because it is widespread and externally observable, there is potential that machine learning models integrated into artificial intelligence systems will allow self-screening and automatic analysis in the future. In particular, the recent success of various deep machine learning models shows promise that, in the future, patients could self-analyse their external signs of skin cancer by uploading pictures of these signs to an artificial intelligence system, which runs such a deep learning model and returns the classification results. However, both patients and dermatologists, who might use such a system to aid their work, need to …
Mislabel Detection of Finnish Publication Ranks
The paper proposes to analyze a data set of Finnish ranks of academic publication channels with the Extreme Learning Machine (ELM). The purpose is to introduce and test a recently proposed ELM-based mislabel detection approach with a rich set of features characterizing a publication channel. We will compare the architecture, accuracy, and, especially, the set of detected mislabels of the ELM-based approach to the corresponding results in the reference paper.
On Combining Explainable Artificial Intelligence and Interactive Multiobjective Optimization in Data-Driven Decision Support
Simulation Framework for Realistic Large-scale Individual-level Data Generation with an Application in the Health Domain
We propose a framework for realistic data generation and simulation of complex systems and demonstrate its capabilities in the health domain. The main use cases of the framework are predicting the development of risk factors and disease occurrence, evaluating the impact of interventions and policy decisions, and statistical method development. We present the fundamentals of the framework using rigorous mathematical definitions. The framework supports calibration to a real population as well as various manipulations and data collection processes. The freely available open-source implementation in R embraces efficient data structures, parallel computing and fast random number generation which…
Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator
The publication indicator of the Finnish research funding system is based on a manual ranking of scholarly publication channels. These ranks, which represent the evaluated quality of the channels, are continuously kept up to date and thoroughly reevaluated every four years by groups of nominated scholars belonging to different disciplinary panels. This expert-based decision-making process is informed by available citation-based metrics and other relevant metadata characterizing the publication channels. The purpose of this paper is to introduce various approaches that can explain the basis and evolution of the quality of publication channels, i.e., ranks. This is important for the academic …
Robust Principal Component Analysis of Data with Missing Values
Principal component analysis is one of the most popular machine learning and data mining techniques. Having its origins in statistics, principal component analysis is used in numerous applications. However, there seems to be little systematic testing and assessment of principal component analysis for cases with erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, especially, to estimate the relative importances of the principal components to explain the data variability. Computational experiments are first focused on carefully designed simulated tests where the ground truth is known and can b…