AUTHOR
Tommi Kärkkäinen
A Knowledge Interface System for Information and Cyber Security Using Semantic Wiki
Resilience against information and cyber security threats has become an essential ability for organizations to maintain business continuity. As bulletproof security is an unattainable goal, organizations need to concentrate on selecting optimal countermeasures against information and cyber security threats. Implementing cyber risk management actions requires special knowledge and resources, which small and medium-sized enterprises in particular often lack. Information and cyber security risk management constitutes knowledge-intensive business processes, which can be assisted with a proper knowledge management system. This paper analyzes how Semantic MediaWiki could be used as a platform to assis…
ProcMiner: Advancing Process Analysis and Management
This paper contributes both to research and practice on process mining. Previous research on process mining has focused on mining patterns from event log files to generate process models. The process mining approach adopted in this paper focuses on producing patterns about process models, not the models themselves. The approach is demonstrated by ProcMiner, an explorative research prototype for managing, consolidating, publishing, retrieving, and analyzing process models. Content-based document clustering is applied to process models represented as an XML database in order to find topical groups among the models. In practice, organizations face numerous challenges in managing their process mod…
Extraction of ERP from EEG data
In this article, a simple but novel technique for extracting a linear subspace related to event-related potentials (ERPs) from electroencephalography (EEG) data is introduced. The technique consists of a sequence of basic linear operations applied to multidimensional EEG data in a problem-specific manner. The derivation of the proposed technique is given, and results with real data are described together with overall conclusions.
How to evaluate first aid skills after training: a systematic review
Background: To be able to help and save lives, laypersons are recommended to undergo first aid training. The aim of this review was to explore the variety of elements in the measurement systems used to assess the effects of first aid training on different aspects of first aid skills, including practical skills, knowledge, and emotional perspectives. Methods: This systematic literature review used the Scopus and PubMed databases and searched for studies published between January 2000 and December 2020. Out of 2,162 studies meeting the search criteria, 15 studies with quantitative and repeatable evaluation methods to assess first aid skills after first aid training for adults were include…
Using Slack for computer-mediated communication to support higher education students’ peer interactions during Master’s thesis seminar
Our study contributes to the research on computer-mediated communication in higher education by experimenting with a modern communication tool called Slack. In particular, we consider using Slack to support students’ peer interactions during Master’s thesis work. For this purpose, we designed a case study that was conducted in a Master’s thesis seminar course. During the course, all out-of-class communication was carried out using Slack instead of e-mail or learning management systems. After the course, we used a questionnaire to investigate how the students perceived Slack for asking for assistance, their intention to use Slack, and Slack’s ease of use. Furthermore, the questionnaire asked …
Automatic Profiling of Open-Ended Survey Data on Medical Workplace Teaching
On-the-job medical training is known to be challenging due to the fast-paced environment and strong vocational profile. It relies on on-site supervisors, mainly doctors and nurses with long practical experience, who coach and teach their less experienced colleagues, such as residents and healthcare students. These supervisors receive pedagogical training to ensure that their guidance and teaching skills continuously improve. The aim of such training is to develop participants’ patient, collegial, and student guidance skills in a multiprofessional environment, and to expand their understanding of guidance as part of their work as supervisors of healthcare professionals. In this paper, we …
A Simple Cluster Validation Index with Maximal Coverage
Clustering is an unsupervised technique for detecting general, distinct profiles in a given dataset. Just as there are many different clustering methods and algorithms, there exist many cluster validation methods and indices for suggesting the number of clusters. The purpose of this paper is, firstly, to propose a new, simple internal cluster validation index. The index has maximal coverage: it can also detect a single cluster, i.e., the case where the dataset does not divide into disjoint subsets. Secondly, the proposed index is compared with the available indices from five different packages implemented in R or Matlab to assess its usability. The comparison also suggests many interesting f…
Mining road traffic accidents
A Douglas–Rachford method for sparse extreme learning machine
Region of interest detection using MLP
A novel technique is proposed for detecting regions of interest in a time series as deviations from the characteristic behavior of the signal. The deterministic form of the signal is obtained using a reliably trained MLP neural network with detailed complexity management and cross-validation-based assurance of generalization. The proposed technique is demonstrated with simulated and real data.
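The idea of flagging deviations from a fitted deterministic form can be illustrated with a small sketch. The MLP training, complexity management, and cross-validation are not reproduced here: as a plainly named stand-in, a least-squares polynomial (NumPy's Polynomial.fit) plays the role of the trained model. The threshold rule (k robust standard deviations estimated via the median absolute deviation) and the names detect_regions_of_interest, degree, and k are our own assumptions.

```python
import numpy as np
from numpy.polynomial import Polynomial


def detect_regions_of_interest(t, signal, degree=9, k=5.0):
    """Flag samples that deviate from a smooth 'characteristic' fit.

    A least-squares polynomial stands in for the trained MLP (hypothetical
    substitute). The threshold is k robust standard deviations of the residual.
    """
    model = Polynomial.fit(t, signal, degree)      # deterministic form (stand-in)
    residual = signal - model(t)
    mad = np.median(np.abs(residual - np.median(residual)))
    sigma = 1.4826 * mad                           # MAD -> std for Gaussian noise
    return np.abs(residual) > k * sigma


rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 200)
x = np.sin(t) + 0.1 * rng.standard_normal(t.size)
x[100:106] += 3.0                                  # injected deviation
mask = detect_regions_of_interest(t, x)
print(np.flatnonzero(mask))
```

A robust (MAD-based) scale is used so that the deviating samples themselves do not inflate the threshold.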
Extreme Minimal Learning Machine
Extreme Learning Machine (ELM) and Minimal Learning Machine (MLM) are nonlinear and scalable machine learning techniques with a randomly generated basis. Both techniques share a step in which a matrix of weights for the linear combination of the basis is recovered. In MLM, the kernel in this step corresponds to distance calculations between the training data and a set of reference points, whereas in ELM a transformation with a sigmoidal activation function is most commonly used. MLM then needs an additional interpolation step to estimate the actual distance-regression-based output. A natural combination of these two techniques is proposed here, i.e., to use a distance-based kernel characteristic in M…
Guided ultrasonic waves in long bones: modelling, experiment and in vivo application.
Existing ultrasound devices for assessing the human tibia are based on detecting the first arriving signal, corresponding to a wave propagating at, or close to, the bulk longitudinal velocity in bone. However, human long bones are effectively irregular hollow tubes and should theoretically support the propagation of more complex guided modes similar to Lamb waves in plates. Guided waves are attractive because they propagate throughout the bone thickness and can potentially yield more information on bone material properties and architecture. In this study, Lamb wave theory and numerical simulations of wave propagation were used to gain insights into the expected behaviour of guided waves in …
Numerical methods for nonlinear inverse problems
Inverse problems of distributed parameter systems with applications to optimal control and identification are considered. Numerical methods for solving this kind of inverse problem, and their numerical analysis, are presented, with the main emphasis on estimates of the rate of convergence for various schemes. Finally, based on the given error estimates, a two-grid method and related algorithms are introduced, which can be used to solve nonlinear inverse problems effectively.
Assessment of the tibia using ultrasonic guided waves in pubertal girls
The purpose of this study was to compare low frequency ultrasonic guided wave measurements with established ultrasound and bone density measurements in terms of their ability to characterize the tibia in pubertal girls. Subjects were 12–14-year-old girls (n = 106) who were participating in a calcium and vitamin D intervention study. A prototype low frequency pulse transmission device consisting of a uniaxial scanning mechanism and low frequency transducers orientated perpendicularly to the limb was used to measure two ultrasound velocities in the tibia. The first velocity, V1, was that of the first arriving signal, similar to that measured by existing commercial tibial ultrasound devices. Th…
Automatic Customization Framework for Efficient Vehicle Routing System Deployment
Vehicle routing systems provide several advantages over manual transportation planning, and they are attracting growing attention. However, deploying these systems can be prohibitively costly, especially for small and medium-sized enterprises: customization, integration, and migration are laborious and require operations research expertise. We propose an automated configuration workflow for vehicle routing system and data flow customization, which provides the necessary basis for further experimental work on the subject. Our preliminary results with learning and adaptive algorithms support the applicability of the proposed configuration framework. The strategies presented h…
Do Country Stereotypes Exist in PISA? A Clustering Approach for Large, Sparse, and Weighted Data
Certain stereotypes can be associated with people from different countries. For example, the Italians are expected to be emotional, the Germans functional, and the Chinese hard-working. In this study, we cluster all 15-year-old students representing the 68 different nations and territories that participated in the latest Programme for International Student Assessment (PISA 2012). The hypothesis is that the students will start to form their own country groups when clustered according to the scale indices that summarize many of the students’ characteristics. In order to meet PISA data analysis requirements, we use a novel combination of our previously published algorithmic components to reali…
Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression
Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test the use of recently popularized distance-based methods, the minimal learning machine (MLM) and the extreme minimal learning machine (EMLM), in problem transformation. The main advantage of the full data variants of these methods is the lack of any meta-parameter. The experimental results for the MLM and EMLM show promising potential, emphasizing the utility of the problem transformation especially with the EMLM.
One Dimensional Convolutional Neural Networks for Seizure Onset Detection Using Long-term Scalp and Intracranial EEG
Epileptic seizure detection using scalp electroencephalogram (sEEG) and intracranial electroencephalogram (iEEG) has attracted widespread attention over the past two decades. Accurate and rapid detection of seizures not only reflects the efficiency of an algorithm but also greatly reduces the burden of manual detection during long-term electroencephalogram (EEG) recording. In this work, a stacked one-dimensional convolutional neural network (1D-CNN) model combined with a random selection and data augmentation (RS-DA) strategy is proposed for seizure onset detection. First, we segmented the long-term EEG signals using 2-s sliding windows. Then, the 2-s interictal and ictal segments w…
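The segmentation step into 2-s windows can be sketched as follows. The sampling rate (256 Hz), the channel count (23), the function name sliding_windows, and the step_sec parameter are illustrative assumptions, not values from the paper. The window length in samples is simply win_sec × fs.

```python
import numpy as np


def sliding_windows(signal, fs, win_sec=2.0, step_sec=2.0):
    """Cut a (channels, samples) EEG array into windows of win_sec seconds.

    step_sec == win_sec gives non-overlapping windows; smaller steps overlap.
    Samples after the last full window are dropped.
    """
    win = int(round(win_sec * fs))
    step = int(round(step_sec * fs))
    n_samples = signal.shape[-1]
    starts = range(0, n_samples - win + 1, step)
    return np.stack([signal[..., s:s + win] for s in starts])


fs = 256                                                        # assumed sampling rate (Hz)
eeg = np.random.default_rng(0).standard_normal((23, 10 * fs))   # 23 channels, 10 s
segs = sliding_windows(eeg, fs)                                 # non-overlapping 2-s windows
overlap = sliding_windows(eeg, fs, step_sec=1.0)                # 50 % overlap
print(segs.shape, overlap.shape)                                # (5, 23, 512) (9, 23, 512)
```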
ExtMiner: Combining multiple ranking and clustering algorithms for structured document retrieval
This paper introduces ExtMiner, a platform and potential tool for information management in SMEs (small and medium-sized enterprises) or for organizational workgroups. ExtMiner supports interactive and iterative clustering of documents. It provides users with visual cluster and list views simultaneously, supporting an iterative search process. ExtMiner may also be applied as a platform for research on retrieval fusion, since it combines search, clustering, and visualization algorithms. ExtMiner was evaluated with three document collections. Although the findings were encouraging, the user interface and the performance with large document repositories need further development.
Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?
Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of an ordinary least-squares problem. Several different solvers can be applied to the resulting linear problem. In this paper, a thorough comparison of possible and recently proposed, especially randomized, algorithms is carried out for this problem with a representative set of regression datasets. In addition, we compare MLM with shallow and deep feedforward neural network models and study the effects of the number of observations and the number of features with a special dat…
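The two MLM steps named above, distance regression solved as an ordinary least-squares problem followed by multilateration, can be sketched in a few lines of NumPy. This is a sketch under our own assumptions: np.linalg.lstsq is used as the OLS solver (the paper compares several, including randomized ones), and multilateration is solved in a linearized form obtained by subtracting the first reference equation, rather than the nonlinear least-squares formulation typically used with MLM. The helper names are hypothetical.

```python
import numpy as np


def pairwise_dist(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


def mlm_fit(X, Y, ref_idx):
    """Distance-regression step: solve Dx @ B ~ Dy by ordinary least squares."""
    R, T = X[ref_idx], Y[ref_idx]          # input/output reference points
    Dx = pairwise_dist(X, R)               # N x K input-space distances
    Dy = pairwise_dist(Y, T)               # N x K output-space distances
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return R, T, B


def mlm_predict(Xnew, R, T, B):
    """Multilateration, linearized: 2(t_k - t_0).y = |t_k|^2 - |t_0|^2 - d_k^2 + d_0^2."""
    delta = pairwise_dist(Xnew, R) @ B     # estimated output-space distances
    A = 2.0 * (T[1:] - T[0])               # (K-1) x m system matrix
    preds = []
    for d in delta:
        rhs = (T[1:] ** 2).sum(axis=1) - (T[0] ** 2).sum() - d[1:] ** 2 + d[0] ** 2
        y, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        preds.append(y)
    return np.array(preds)


rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
Y = np.sin(np.pi * X[:, :1]) + X[:, 1:] ** 2       # shape (40, 1)
R, T, B = mlm_fit(X, Y, ref_idx=np.arange(40))     # full-data variant: all points as references
Yhat = mlm_predict(X, R, T, B)
```

With every training point used as a reference, the input distance matrix is square and (for distinct points) invertible, so the training outputs are reproduced up to rounding. This is where the cost of the least-squares solve dominates as N grows.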
A method for structure prediction of metal-ligand interfaces of hybrid nanoparticles
Hybrid metal nanoparticles, consisting of a nano-crystalline metal core and a protecting shell of organic ligand molecules, have applications in diverse areas such as biolabeling, catalysis, nanomedicine, and solar energy. Despite a rapidly growing database of experimentally determined atom-precise nanoparticle structures and their properties, there has been no successful, systematic way to predict the atomistic structure of the metal-ligand interface. Here, we devise and validate a general method to predict the structure of the metal-ligand interface of ligand-stabilized gold and silver nanoparticles, based on information about local chemical environments of atoms in experimental data. In …
Use cases for operational decision support system
Measuring self‐regulated learning in a junior high school mathematics classroom: Combining aptitude and event measures in digital learning materials
Background: Measuring students' self-regulation skills is an active topic in education research, as effective assessment helps in devising support interventions to foster academic achievement. Measures based on event tracing usually require large amounts of data (e.g., MOOCs and large courses), while aptitude measures are often qualitative and need careful interpretation. Precise and interpretable evaluation of self-regulation skills in a normal K-12 classroom thus poses a challenge. Objectives: The present study proposes and explores a learning analytics method that combines aptitude and event measures to evaluate students' self-regulation skills. Methods: An explorative learning analytics s…
Application of a Knowledge Discovery Process to Study Instances of Capacitated Vehicle Routing Problems
Vehicle Routing Problems (VRPs) are computationally challenging, constrained optimization problems that have a central role in logistics management. Usually, different solvers are developed and applied for different kinds of problems. However, if descriptive and general features could be extracted to describe such problems and their solution attempts, then one could apply data mining and machine learning methods to discover general knowledge about such problems. The aim would then be to improve understanding of the most important characteristics of VRPs from the points of view of both efficient solution and utilization. The purpose of this article is to address these challenges by proposi…
OnMLM: An Online Formulation for the Minimal Learning Machine
Minimal Learning Machine (MLM) is a nonlinear learning algorithm designed to work on both classification and regression tasks. In its original formulation, MLM builds a linear mapping between distance matrices in the input and output spaces using the Ordinary Least Squares (OLS) algorithm. Although OLS is a very efficient choice, online learning is more scalable, and thus more applicable, for big data and data streams. In that regard, the objective of this work is to propose an online version of the MLM: the Online Minimal Learning Machine (OnMLM), a new MLM-based formulation capable of online and incremental learning. The achievements of OnMLM in our…
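To make the batch-versus-online contrast concrete, the sketch below updates the MLM coefficient matrix one sample at a time with plain recursive least squares (RLS), assuming fixed reference points. RLS is our own stand-in and not necessarily the OnMLM update rule; the class name and the delta parameter are hypothetical. With initial covariance delta × I and zero initial coefficients, RLS reproduces the ridge solution with penalty 1/delta exactly, which the usage below checks against a batch solve.

```python
import numpy as np


class RecursiveDistanceRegression:
    """Hypothetical online update of the MLM coefficient matrix B (K x K).

    Plain RLS without forgetting: after all samples, B equals the ridge
    solution of Dx @ B ~ Dy with penalty 1/delta.
    """

    def __init__(self, n_refs, delta=100.0):
        self.P = delta * np.eye(n_refs)        # inverse of the regularized Gram matrix
        self.B = np.zeros((n_refs, n_refs))

    def update(self, dx, dy):
        """dx: distances of one input to the K input references;
        dy: distances of its output to the K output references."""
        Pd = self.P @ dx
        gain = Pd / (1.0 + dx @ Pd)
        self.B += np.outer(gain, dy - dx @ self.B)
        self.P -= np.outer(gain, Pd)


rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
Y = X @ np.array([[1.0], [-2.0], [0.5]])
K = 8                                                   # references: the first K points
Dx = np.linalg.norm(X[:, None] - X[None, :K], axis=-1)
Dy = np.linalg.norm(Y[:, None] - Y[None, :K], axis=-1)

online = RecursiveDistanceRegression(K, delta=100.0)
for dx, dy in zip(Dx, Dy):                              # one sample at a time, as in a stream
    online.update(dx, dy)

B_batch = np.linalg.solve(Dx.T @ Dx + np.eye(K) / 100.0, Dx.T @ Dy)
```

Each update costs O(K²) regardless of how many samples have been seen, which is what makes this kind of formulation suitable for streams.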
A Memetic Differential Evolution in Filter Design for Defect Detection in Paper Production
This article proposes a Memetic Differential Evolution (MDE) for designing digital filters which aim at detecting defects of the paper produced during an industrial process. The MDE is an adaptive evolutionary algorithm which combines the powerful explorative features of Differential Evolution (DE) with the exploitative features of two local searchers. The local searchers are adaptively activated by means of a novel control parameter which measures fitness diversity within the population. Numerical results show that the DE framework is efficient for the class of problems under study and employment of exploitative local searchers is helpful in supporting the DE explorative mechanism in avoid…
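The general structure of such a memetic scheme (DE exploration plus a local searcher triggered by a fitness-diversity measure) can be sketched on a toy objective. All specifics here are assumptions and not the paper's design: the diversity measure (mean − best)/(worst − best), the threshold psi_min, the single coordinate pattern search used as the local searcher (the paper uses two), and the sphere test function. The local searcher also does no bound handling.

```python
import numpy as np


def fitness_diversity(fit):
    """Hypothetical diversity measure in [0, 1]: (mean - best) / (worst - best).

    Values near 0 mean most individuals have fitness close to the best one.
    """
    spread = fit.max() - fit.min()
    return 0.0 if spread == 0.0 else (fit.mean() - fit.min()) / spread


def pattern_search(f, x, fx, step=0.1, iters=20):
    """Coordinate pattern search around x (hypothetical local searcher, no bounds)."""
    x = x.copy()
    for _ in range(iters):
        improved = False
        for i in range(x.size):
            for s in (step, -step):
                y = x.copy()
                y[i] += s
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            step *= 0.5                        # no better neighbour: refine the step
    return x, fx


def memetic_de(f, bounds, pop_size=30, gens=200, F=0.5, CR=0.9, psi_min=0.2, seed=0):
    """DE/rand/1/bin; the best individual is refined locally when diversity is low."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    dim = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fit = np.array([f(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True    # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            if f_trial <= fit[i]:              # one-to-one greedy selection
                pop[i], fit[i] = trial, f_trial
        if fitness_diversity(fit) < psi_min:   # exploitation only when diversity is low
            k = int(fit.argmin())
            pop[k], fit[k] = pattern_search(f, pop[k], fit[k])
    k = int(fit.argmin())
    return pop[k], fit[k]


sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best = memetic_de(sphere, np.array([[-5.0, 5.0]] * 5))
```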
Does the law matter? An empirical study on the accessibility of Finnish higher education institutions’ web pages
Information and communication technology (ICT) has made higher education available to many students in a new way. The role of online learning in higher education institutions (HEIs) has grown to an unprecedented scale due to the COVID-19 pandemic. The diversity of higher education students has increased, and accessible solutions are needed. New European and national regulations support these trends. The research reported in this paper was conducted in Finland, which is one of the leading European countries in terms of high technology and digitalisation. The aim of this research is to explore the accessibility of all Finnish HEIs’ (N = 38) landing pages based on Web Content Accessibi…
Feature Extractors for Describing Vehicle Routing Problem Instances
The vehicle routing problem comes in varied forms. In addition to the usual variants with diverse constraints and specialized objectives, the problem instances themselves, even from a single shared source, can be distinctly different. Heuristic, metaheuristic, and hybrid algorithms that are typically used to solve these problems are sensitive to this variation and can exhibit erratic performance when applied to new, previously unseen instances. To mitigate this, and to improve their applicability, algorithm developers often choose to expose parameters that allow customization of the algorithm behavior. Unfortunately, finding a good set of values for these parameters can be a tedious task that…
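As an illustration of what a feature extractor for a capacitated VRP instance might compute, the sketch below derives a handful of simple descriptors. The feature set and the function name cvrp_features are hypothetical choices of ours, not the extractors studied in the paper. Only min_vehicles has a firm interpretation: the ceiling of total demand over capacity is a lower bound on the number of vehicles.

```python
import numpy as np


def cvrp_features(depot, customers, demands, capacity):
    """A few illustrative (hypothetical) descriptors of a CVRP instance."""
    depot = np.asarray(depot, dtype=float)
    customers = np.asarray(customers, dtype=float)
    demands = np.asarray(demands, dtype=float)
    d_depot = np.linalg.norm(customers - depot, axis=1)
    centroid = customers.mean(axis=0)
    return {
        "n_customers": len(customers),
        "min_vehicles": int(np.ceil(demands.sum() / capacity)),  # lower bound on routes
        "demand_capacity_ratio": demands.mean() / capacity,      # customers per vehicle, inverted
        "mean_depot_distance": d_depot.mean(),
        "depot_offset": np.linalg.norm(depot - centroid),        # 0 for a central depot
        "demand_cv": demands.std() / demands.mean(),             # demand heterogeneity
    }


feats = cvrp_features(depot=(0.0, 0.0),
                      customers=[(1, 0), (0, 1), (-1, 0), (0, -1)],
                      demands=[4, 4, 4, 4], capacity=10)
print(feats["min_vehicles"], feats["mean_depot_distance"])   # 2 1.0
```

Feature vectors of this kind, computed for many instances, are what a configuration or knowledge discovery method could then relate to solver behavior.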
Let Me Hack It: Teachers’ Perceptions About ‘Making’ in Education
Making in education is an emergent practice that focuses on learners as creators of things in a collaborative fashion while promoting knowledge construction through technology, design, and creative self-expression. The opinions of teachers (n = 33) about making were studied using an online questionnaire after they had attended an online professional development course about making in education. The results suggest that there exists a group of educators who consider making a promising approach in education and want to promote its use in schools.
Comparison of cluster validation indices with missing data
Clustering is an unsupervised machine learning technique that aims to divide a given set of data into subsets. The number of hidden groups in cluster analysis is not always obvious, and for this purpose various cluster validation indices have been suggested. Some recent studies have reviewed validation indices, but no experiments with missing data are yet available. In this paper, the performance of ten well-known indices on ten synthetic data sets with various ratios of missing values is measured using clustering based on squared Euclidean and city block distances. The original indices are modified for the city block distance in a novel way. Experiments illustrate the di…
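The clusterings on which such indices are evaluated can be produced with a prototype-based algorithm that simply skips missing values (the available-data strategy). The sketch below is our own illustration, not the paper's code. With squared Euclidean distance, the prototype is the mean of the observed values (k-means). With city block distance, it is the coordinate-wise median (k-medians). The function name and the initialization by given row indices are assumptions.

```python
import numpy as np


def available_data_clustering(X, init_idx, metric="sqeuclidean", iters=50):
    """Prototype clustering with the available-data strategy (NaN = missing).

    metric="sqeuclidean": k-means (prototype = mean of observed values);
    metric="cityblock":   k-medians (prototype = coordinate-wise median).
    """
    obs = ~np.isnan(X)
    C = X[init_idx].copy()
    C = np.where(np.isnan(C), np.nanmean(X, axis=0), C)  # fill gaps in initial prototypes
    stat = np.nanmean if metric == "sqeuclidean" else np.nanmedian
    for _ in range(iters):
        # differences over observed coordinates only; missing ones contribute 0
        diff = np.where(obs[:, None, :], X[:, None, :] - C[None, :, :], 0.0)
        if metric == "sqeuclidean":
            dist = (diff ** 2).sum(axis=2)
        else:
            dist = np.abs(diff).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_C = C.copy()
        for k in range(len(C)):
            members = X[labels == k]
            if len(members) > 0:
                new_C[k] = stat(members, axis=0)
        new_C = np.where(np.isnan(new_C), C, new_C)      # keep old value if a column had no data
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C


rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(8.0, 1.0, (20, 3))])
mask = rng.random(X.shape) < 0.15
mask[:, 0] = False                       # keep the first feature observed so no row is empty
X[mask] = np.nan
for metric in ("sqeuclidean", "cityblock"):
    labels, C = available_data_clustering(X, init_idx=[0, 20], metric=metric)
    print(metric, labels)
```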
On automatic algorithm configuration of vehicle routing problem solvers
Many of the algorithms for solving vehicle routing problems expose parameters that strongly influence the quality of obtained solutions and the performance of the algorithm. Finding good values for these parameters is a tedious task that requires experimentation and experience. Therefore, methods that automate the process of algorithm configuration have received growing attention. In this paper, we present a comprehensive study to critically evaluate and compare the capabilities and suitability of seven state-of-the-art methods in configuring vehicle routing metaheuristics. The configuration target is the solution quality of eight metaheuristics solving two vehicle routing problem variants.…
Retrieving Open Source Software Licenses
Open source software maintenance and reuse require identifying and comprehending the applied software licenses. This paper first characterizes software maintenance and open source software (OSS) reuse, which are particularly relevant in this context. The information needs of maintainers and reusers can be supported by reverse engineering tools at different information retrieval levels. The paper presents an automated license retrieval approach called ASLA. User needs, system architecture, tool features, and tool evaluation are presented. The implemented tool features support identifying source file dependencies and licenses in source files, and adding new license templates for identifying l…
Towards Evidence-Based Academic Advising Using Learning Analytics
Academic advising is a process between the advisee, the adviser, and the academic institution, which provides the degree requirements and the courses they contain. Content-wise planning and management of the student’s study path, guidance on studies, and academic career support are the main joint activities of advising. The purpose of this article is to propose the use of learning analytics methods, more precisely robust clustering, to create groups of students' actual study profiles. This allows academic advisers to provide evidence-based information on the study paths that students similar to the individual advisee have actually followed. Moreover, academic institutions can focus on management and upda…
Technology Comprehension
We account for the first research results from a government-initiated experiment that scales Making to a national discipline. The Ministry of Education in Denmark has introduced Technology Comprehension as a new discipline for lower secondary education. Technology Comprehension is first being trialled as an elective subject in 13 schools. The discipline combines elements from computing, design, and the societal aspect of technology and thus resonates with the existing FabLearn and Making initiatives in Scandinavia. We report the identified opportunities and challenges based on interviews, surveys, and a theme discussion with experienced teachers from the 13 schools. The main takeaways are: First, the…
“Sitting at the Stern and Holding the Rudder”: Teachers’ Reflections on Action in Higher Education Based on Student Agency Analytics
Digital technologies in teaching and learning in higher education have the potential to enhance student agency. Student agency is an essential resource to nurture, especially at times when students face challenges emerging from a volatile, uncertain, complex, and ambiguous world. In addition, contemporary policymaking has recognized the importance of student agency. Student agency analytics is a process that uses learning analytics, specifically psychometrics and machine learning, to provide teachers with insights into the agentic resources of their students. Four teachers in higher education were provided with student agency analytics results from their mathematics courses. The teachers partici…
Detection of developmental dyslexia with machine learning using eye movement data
Dyslexia is a common neurocognitive learning disorder that can seriously hinder individuals’ aspirations if not detected and treated early. Instead of costly diagnostic assessment by experts, in the near future dyslexia might be identified with ease by automated analysis of eye movements during reading, recorded with embedded eye-tracking technology. However, the diagnostic machine learning methods need to be optimized first. Previous machine learning studies have been quite successful in identifying dyslexic readers; however, they used contrasting groups with large performance differences between diagnosed and good readers. A practical challenge is to identify also individuals with bord…
Understanding the Study Experiences of Students in Low Agency Profile: Towards a Smart Education Approach
In this paper, we use student agency analytics to examine how university students who were assessed as having low agency resources describe their study experiences. Students (n = 292) completed the Agency of University Students (AUS) questionnaire. Furthermore, they reported what kinds of restrictions they experienced during the university course they attended. Four different agency profiles were identified using robust clustering. We then conducted a thematic analysis of the open-ended answers of students who were assessed as having low agency resources. Issues relating to competence beliefs, self-efficacy, student-teacher relations, time as a resource, student well-being, and course contents seemed to …
Korkeakoulujen yhteinen tutkintotavoitteinen koulutus: toiminta- ja koulutusmalli [Joint degree-oriented education of higher education institutions: an operating and education model]
This report was produced as part of the project "Development of joint M.Sc. (Tech.)/M.Sc. education" (DI/FM-yhteiskoulutuksen kehittäminen), funded by the European Social Fund. The report examines one of the measures taken in response to the sudden structural change that began in the Central Finland region in 2008–2009. The measure took the form of joint education organized by the University of Jyväskylä and Tampere University of Technology, with the aim of developing the competence and education of the target group of the structural change in line with the needs of regional companies and the regional industrial structure. The target group of the education was unemployed persons, or persons at risk of unemployment, holding a lower (bachelor's-level) higher education degree, in product development, process, or production roles …
Semi-automatic literature mapping of participatory design studies 2006–2016
The paper presents a process of semi-automatic literature mapping of a comprehensive set of participatory design (PD) studies from 2006–2016. The data, 2,939 abstracts, were collected from 14 academic search engines and databases. With the presented method, we were able to identify six education-related clusters of PD articles. Furthermore, we point out that the identified clusters cover the majority of education-related words in the whole data. This is the first attempt to systematically map the participatory design literature. We argue that by continuing our work, we can help to perceive a coherent structure in the body of PD research.
Optimization of conducting structures by using the homogenization method
Approximation and numerical realization of a class of optimization problems, with control variables represented by coefficients of linear elliptic state equations, are considered. Convergence analysis of well-posed problems is performed using one- and two-level approximation strategies. The latter is utilized in a layout optimization problem for two conductive constituents, for which the necessary steps to transfer the well-posed problem into a computational form are described and some numerical experiments are given.
Monte Carlo Simulations of Au38(SCH3)24 Nanocluster Using Distance-Based Machine Learning Methods
We present an implementation of distance-based machine learning (ML) methods to create a realistic atomistic interaction potential to be used in Monte Carlo simulations of thermal dynamics of thiol...
Visualizing Time Series State Changes with Prototype Based Clustering
Modern process and condition monitoring systems produce huge amounts of data that are hard to analyze manually. Previous analysis techniques disregard time information and concentrate only on identifying normal and abnormal operational states. We present a new method for visualizing operational states and the overall order of the transitions between them. The method is implemented in a visualization tool that helps the user see the overall development of operational states, making it possible to find causes of abnormal behaviour. Finally, the visualization tool is tested in practice with real time series data collected from a gear unit.
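A minimal sketch of the underlying idea (prototype-based clustering of time windows, then reading off the order of state transitions) could look like the following. Everything here is an assumption for illustration, not the tool's actual pipeline: the window features (mean and standard deviation), plain k-means as the prototype method, and the run-length summary.

```python
import numpy as np


def window_features(x, width):
    """Mean and standard deviation of non-overlapping windows."""
    n = len(x) // width
    w = x[: n * width].reshape(n, width)
    return np.column_stack([w.mean(axis=1), w.std(axis=1)])


def kmeans(F, k, iters=100, seed=0):
    """Plain k-means; each prototype summarizes one operational state."""
    rng = np.random.default_rng(seed)
    C = F[rng.choice(len(F), k, replace=False)]
    for _ in range(iters):
        labels = ((F[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        C = np.array([F[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                      for j in range(k)])
    return labels, C


def state_transitions(labels):
    """Collapse the state sequence into runs [state, length] and count transitions."""
    runs = []
    for s in labels:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1
        else:
            runs.append([int(s), 1])
    k = int(labels.max()) + 1
    T = np.zeros((k, k), dtype=int)
    for (a, _), (b, _) in zip(runs, runs[1:]):
        T[a, b] += 1
    return runs, T


rng = np.random.default_rng(6)
x = np.concatenate([rng.normal(0, 0.1, 500), rng.normal(3, 0.1, 500), rng.normal(0, 0.1, 500)])
labels, _ = kmeans(window_features(x, 50), k=2)
runs, T = state_transitions(labels)
print(runs)
print(T)
```

The run list gives the overall order of states (here normal, shifted, normal), and the transition matrix counts how often each state follows another.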
A New Augmented Lagrangian Approach for $L^1$-mean Curvature Image Denoising
Variational methods are commonly used to solve noise removal problems. In this paper, we present an augmented Lagrangian-based approach that uses a discrete form of the $L^1$-norm of the mean curvature of the graph of the image as a regularizer, discretization being achieved via a finite element method. When a particular alternating direction method of multipliers is applied to the solution of the resulting saddle-point problem, this solution reduces to an iterative sequential solution of four subproblems. These subproblems are solved using Newton’s method, the conjugate gradient method, and a partial solution variant of the cyclic reduction method. The approach considered here differs from ex…
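The mean-curvature model itself, with its four finite element subproblems, is well beyond a short example. However, the mechanics of an alternating direction method of multipliers (a splitting variable, alternating subproblem solves, and a multiplier update) can be illustrated on a much simpler and plainly different problem: 1-D total-variation denoising, min over u of 0.5‖u − f‖² + λ‖Du‖₁ with the splitting z = Du. This analogue, its parameters lam and rho, and the function name are our own choices, not the paper's method.

```python
import numpy as np


def tv_denoise_admm(f, lam=2.0, rho=2.0, iters=200):
    """Scaled-form ADMM for min_u 0.5*||u - f||^2 + lam*||D u||_1, split as z = D u.

    A 1-D total-variation analogue, not the paper's L1 mean-curvature model.
    """
    n = f.size
    D = np.diff(np.eye(n), axis=0)              # (n-1) x n forward differences
    A = np.eye(n) + rho * D.T @ D               # matrix of the (quadratic) u-subproblem
    z = np.zeros(n - 1)
    w = np.zeros(n - 1)                         # scaled Lagrange multiplier
    for _ in range(iters):
        u = np.linalg.solve(A, f + rho * D.T @ (z - w))               # u-subproblem
        Du = D @ u
        v = Du + w
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)       # z-subproblem: soft threshold
        w = w + Du - z                                                # multiplier update
    return u


rng = np.random.default_rng(4)
clean = np.repeat([0.0, 2.0, -1.0], 50)
noisy = clean + 0.3 * rng.standard_normal(clean.size)
u = tv_denoise_admm(noisy)
rmse = lambda a: float(np.sqrt(np.mean((a - clean) ** 2)))
print(rmse(noisy), rmse(u))
```

As in the approach above, each subproblem is easy on its own (here a linear solve and a closed-form shrinkage), even though the joint problem is nonsmooth.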
Sparse minimal learning machine using a diversity measure minimization
The minimal learning machine (MLM) training procedure consists of solving a linear system with multiple measurement vectors (MMV) created between the geometric configurations of points in the input and output spaces. Such geometric configurations are built upon two matrices created using subsets of input and output points, named reference points (RPs). The present paper considers an extension of the focal underdetermined system solver (FOCUSS) for MMV linear system problems with additive noise, named regularized MMV FOCUSS (regularized M-FOCUSS), and evaluates it in the task of selecting input reference points for regression settings. Experiments were carried out using UCI datasets, where the …
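The row-sparsity mechanism behind regularized M-FOCUSS can be sketched as an iteratively reweighted, regularized minimum-norm solve. In this sketch, rows are weighted by their current ℓ2 norms raised to 1 − p/2, so rows that shrink keep shrinking. It follows the standard regularized M-FOCUSS iteration as we understand it, not the paper's code. The values of p and lam, and the synthetic problem below, are illustrative assumptions. In the MLM setting, rows of X would correspond to candidate reference points, and the surviving rows to the selected ones.

```python
import numpy as np


def regularized_m_focuss(A, B, p=0.8, lam=1e-6, iters=30):
    """Row-sparse X with A @ X ~ B via regularized M-FOCUSS reweighting."""
    m = A.shape[0]
    # start from the regularized minimum-norm solution
    X = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(m), B)
    for _ in range(iters):
        c = np.linalg.norm(X, axis=1)                 # row norms of the current estimate
        W = np.diag(c ** (1.0 - p / 2.0))             # reweighting favours large rows
        AW = A @ W
        Q = AW.T @ np.linalg.solve(AW @ AW.T + lam * np.eye(m), B)
        X = W @ Q
    return X


rng = np.random.default_rng(5)
A = rng.standard_normal((20, 50))                     # 20 measurements, 50 candidate rows
X_true = np.zeros((50, 4))
support = [3, 17, 41]
X_true[support] = rng.standard_normal((3, 4))
B = A @ X_true                                        # 4 measurement vectors (MMV)
X = regularized_m_focuss(A, B)
top3 = set(np.argsort(np.linalg.norm(X, axis=1))[-3:].tolist())
print(sorted(top3))
```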
A Topic-Case Driven Methodology for Web Course Design
A topic-case driven methodology for the web course design and realization process is based on software engineering metaphors for capturing the necessary steps in creating web courses by means of a content-based development method. The methodology combines instructional issues with design phases that guide teachers and instructors in designing and implementing online courses. The methodology has been used by computer science and teacher education students as well as by professional university educators from different educational fields. The results from these experiences have been reported as case studies. In this chapter, the methodology is introduced with summarized results from three case studies.
How pedagogical agents communicate with students: A two-phase systematic review
Technological advancements have improved the capabilities of pedagogical agents to communicate with students. However, an increased use of pedagogical agents in learning environments calls for a deeper understanding of student–agent communication to assess the effectiveness of pedagogical agents in learning. This study is a two-phase systematic review of scientific papers on pedagogical agent communication research published between 2010 and 2020, including review papers and original research papers. In the first phase, this study analyses literature reviews and meta-analyses to find the status and research gaps. The findings indicate that pedagogical agents' characteristics and impact on l…
Extreme minimal learning machine: Ridge regression with distance-based basis
The extreme learning machine (ELM) and the minimal learning machine (MLM) are nonlinear and scalable machine learning techniques with a randomly generated basis. Both techniques start with a step in which a matrix of weights for the linear combination of the basis is recovered. In the MLM, the feature mapping in this step corresponds to distance calculations between the training data and a set of reference points, whereas in the ELM, a transformation using a radial or sigmoidal activation function is commonly used. Computation of the model output, for prediction or classification purposes, is straightforward with the ELM after the first step. In the original MLM, one needs to solve an addit…
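Following the title's description (ridge regression with a distance-based basis), a minimal sketch replaces ELM's sigmoidal features with distances to reference points and fits the output weights by ridge regression. No multilateration step is needed afterwards. The ridge penalty alpha, the choice of every third training point as a reference, and the function names are hypothetical.

```python
import numpy as np


def emlm_fit(X, Y, ref_idx, alpha=1e-6):
    """Ridge regression on a distance basis: H @ W ~ Y with H[i, k] = ||x_i - r_k||."""
    R = X[ref_idx]
    H = np.linalg.norm(X[:, None, :] - R[None, :, :], axis=-1)
    K = H.shape[1]
    W = np.linalg.solve(H.T @ H + alpha * np.eye(K), H.T @ Y)
    return R, W


def emlm_predict(Xnew, R, W):
    """Output is linear in the distance features, as in an ELM."""
    H = np.linalg.norm(Xnew[:, None, :] - R[None, :, :], axis=-1)
    return H @ W


X = np.linspace(-1.0, 1.0, 60)[:, None]
Y = np.sin(3.0 * X)
R, W = emlm_fit(X, Y, ref_idx=np.arange(0, 60, 3))   # 20 reference points
Yhat = emlm_predict(X, R, W)
rmse = float(np.sqrt(np.mean((Yhat - Y) ** 2)))
print(rmse)
```

In 1-D, the basis functions |x − r_k| make the model piecewise linear with kinks at the reference points, which gives some intuition for why a modest number of references already fits a smooth target well.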
Student agency analytics: learning analytics as a tool for analysing student agency in higher education
This paper presents a novel approach and a learning analytics method for studying student agency in higher education. Agency is a concept that holistically depicts important constituents of intentional, purposeful, and meaningful learning. Within workplace learning research, agency is seen as being at the core of expertise. However, in higher education, agency is an empirically less studied phenomenon that also lacks a coherent conceptual base. Furthermore, tools for students and teachers need to be developed to support learners in constructing their agency. We study student agency as a multidimensional phenomenon centring on student-experienced resources of their agency. We call the analyt…
Expert-based versus citation-based ranking of scholarly and scientific publication channels
The Finnish publication channel quality ranking system was established in 2010. The system is expert-based: separate panels decide on and update the rankings of the sets of publication channels allocated to them. The aggregated rankings play a notable role in the allocation of public resources to universities. The purpose of this article is to analyze this national ranking system. The analysis is mainly based on two publicly available databases containing the publication source information and the actual national publication activity information. Using citation-based indicators and other available information with association rule mining, decision trees, and confusion matrices, …
Icon Recognition and Usability for Requirements Engineering
When we introduce an icon-based language into the context of requirements engineering, we must take into account that what users perceive as recognizable and usable depends on their background. In this paper, we argue that it is not possible to provide a single set of visual notations that appeals to all stakeholders. Instead, we suggest an adaptable preference framework that generates personalized notations corresponding to each user's background. We present and evaluate an icon-based language, a new kind of approach to requirements engineering work, to explore its feasibility and usability. In an initial evaluation with students residing in Finland, results reveal that users are able to recogni…
A GPU-accelerated augmented Lagrangian based L1-mean curvature Image denoising algorithm implementation
This paper presents a graphics processing unit (GPU) implementation of a recently published augmented Lagrangian based L1-mean curvature image denoising algorithm. The algorithm uses a particular alternating direction method of multipliers to reduce the related saddle-point problem to an iterative sequence of four simpler minimization problems. Two of these subproblems do not contain derivatives of the unknown variables and can therefore be solved point-wise without inter-process communication. In particular, this allows modern GPUs to efficiently solve the subproblem that deals with the non-convex term in the original objective function. The two remaining subproblems are so…
Identifying Pathways to Computer Science : The Long-Term Impact of Short-Term Game Programming Outreach Interventions
Short-term outreach interventions are conducted to raise young students’ awareness of the computer science (CS) field. Typically, these interventions are targeted at K–12 students, attempting to encourage them to study CS in higher education. This study is based on a series of extra-curricular outreach events that introduced students to the discipline of computing, nurturing creative computational thinking through problem solving and game programming. To assess the long-term impact of this campaign, the participants were contacted and interviewed two to five years after they had attended an outreach event. We studied how participating in the outreach program affected the students’ perceptio…
Spatial weighted averaging for ERP denoising in EEG data
In the present paper, we aim to improve the practical accuracy of ERP denoising methods proposed in earlier research by allowing them to account for possible violations of the underlying assumptions, which often occur in practice. Here we consider ERP denoising approaches that operate within the framework of the linear instantaneous mixing model and consist of three steps: (1) a forward linear transformation, (2) identification of the components related to the signal and noise subspaces, and (3) an inverse transformation during which the components belonging to the noise subspace are disregarded, i.e., dimension reduction in the component space. The separation matrix is found based on problem-s…
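The three-step framework can be sketched as follows. The separation criterion here is an assumption chosen for illustration, not necessarily the paper's: the separation matrix solves a generalized eigenproblem that contrasts the channel covariance of the trial average (the part repeated in every trial) with the covariance of the single-trial residuals. The function name `denoise_erp` and the parameter `n_keep` are hypothetical.

```python
import numpy as np

def denoise_erp(trials, n_keep):
    """trials: (n_trials, n_channels, n_samples) stimulus-locked EEG epochs."""
    n_trials, n_ch, n_s = trials.shape
    avg = trials.mean(axis=0)                      # part repeated in every trial
    resid = trials - avg                           # part that varies across repetitions
    C_s = avg @ avg.T / n_s                        # channel covariance of the average
    C_n = np.einsum("tcs,tds->cd", resid, resid) / (n_trials * n_s)  # noise covariance

    # (1) Forward transform: solve C_s w = mu C_n w by first whitening C_n.
    lam, U = np.linalg.eigh(C_n)
    P = U / np.sqrt(lam)                           # P.T @ C_n @ P = identity
    mu, V = np.linalg.eigh(P.T @ C_s @ P)
    order = np.argsort(mu)[::-1]                   # highest signal-to-noise ratio first
    W = (P @ V[:, order]).T                        # separation matrix, one filter per row
    A = np.linalg.inv(W)                           # corresponding mixing matrix

    # (2) Signal subspace = the n_keep leading components; the rest count as noise.
    # (3) Inverse transform with the noise components dropped.
    proj = A[:, :n_keep] @ W[:n_keep, :]
    return np.einsum("cd,tds->tcs", proj, trials)
```

The number of retained components is left to the caller here; choosing it, and coping with trials in which the deterministic-response assumption is violated, is exactly where the abstract says practical accuracy is at stake.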
A method for extracting subspace of deterministic sources from EEG data
In this paper, an algorithm is proposed for separating the linear subspaces of time-locked brain responses and of other noise sources in multichannel electroencephalography data. The search criterion used by the method discriminates time-locked brain components from noise components on the basis of the assumed deterministic behavior of the time-locked brain sources. A comprehensive derivation of the method is given, together with a description and analysis of the results of applying the method to simulated and real EEG data sets. Possibilities for improving the results are also discussed.
Distributed Scrum when Turning into Maintenance : A Single Case Study
Global software development using agile methods is commonplace in the software industry nowadays. Scrum, as an agile development management framework, can be distributed in many ways, especially concerning how the key roles are represented at different sites. We describe a single case study of distributed Scrum, used mainly for maintaining an already constructed web portal. Using a qualitative method, we reveal and discuss both the well-functioning and the challenging parts of the software work, as experienced by the project stakeholders.
The embodiment of emotion-label words and emotion-laden words : Evidence from late Chinese–English bilinguals
Although a growing number of studies have confirmed the distinction between emotion-label words (words that directly label emotional states) and emotion-laden words (words that evoke emotions through their connotations), the existing evidence is inconclusive, and their embodiment is unknown. In the current study, an emotional categorization task was used to investigate whether these two types of emotion words are embodied, by directly comparing how they are processed in the native language (L1) and the second language (L2) of late Chinese–English bilinguals. The results revealed that apart from L2 negative emotion-laden words, both types of emotion words in L1 and L2 produced significant emotion effe…
Introduction to partitioning-based clustering methods with a robust example
Course Satisfaction in Engineering Education Through the Lens of Student Agency Analytics
This Research Full Paper presents an examination of the relationships between course satisfaction and student agency resources in engineering education. Satisfaction experienced in learning is known to benefit the students in many ways. However, the varying significance of the different factors of course satisfaction is not entirely clear. We used a validated questionnaire instrument, exploratory statistics, and supervised machine learning to examine how the different factors of student agency affect course satisfaction among engineering students (N = 293). Teacher’s support and trust for the teacher were identified as both important and critical factors concerning experienced course satisf…
ASLA: reverse engineering approach for software license information retrieval
Software maintenance and reuse require identification of the applied software licenses. The information needs of maintainers and reusers can be supported by reverse engineering tools at different information retrieval levels. The paper presents a reverse engineering approach called ASLA for retrieving license information typically used in OSS. User needs, system architecture, tool features, and tool evaluation are presented. The implemented tool features include support for identifying source file dependencies and licenses in source files. The tool is evaluated against another tool for license information extraction. ASLA supports the same programming languages as GCC. License identificatio…
Feature Ranking of Large, Robust, and Weighted Clustering Result
A clustering result needs to be interpreted and evaluated for knowledge discovery. When the clustered data represent a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance automatic knowledge discovery from a robust clustering result at the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal–Wallis H test statistic from the sample to the population level with two different approaches. Application of these generalizations to both the input variables used in clustering and to metadata provides a…
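As a point of reference for the generalization described above, a minimal sketch of the ordinary, unweighted sample-level Kruskal–Wallis H statistic (without tie correction), used to rank features by how strongly their values differ across clusters. The population-level weighting that is the article's contribution is deliberately not reproduced; the function names are hypothetical.

```python
def rank_with_ties(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(values, labels):
    """H = 12 / (N (N + 1)) * sum_g R_g^2 / n_g - 3 (N + 1), with R_g the rank sum of cluster g."""
    n = len(values)
    sums, counts = {}, {}
    for r, g in zip(rank_with_ties(values), labels):
        sums[g] = sums.get(g, 0.0) + r
        counts[g] = counts.get(g, 0) + 1
    return 12.0 / (n * (n + 1)) * sum(sums[g] ** 2 / counts[g] for g in sums) - 3 * (n + 1)

def rank_features(data, labels):
    """data: dict of feature name -> list of values; returns names by decreasing H, and the scores."""
    scores = {name: kruskal_wallis_h(col, labels) for name, col in data.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

For example, with clusters `[0, 0, 0, 1, 1, 1]`, a feature `[1, 2, 3, 10, 11, 12]` that separates the clusters perfectly gets H = 27/7, ranking above a feature that mixes them.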
Continuous reformulations and heuristics for the Euclidean travelling salesperson problem
We consider continuous reformulations of the Euclidean travelling salesperson problem (TSP), based on certain clustering problem formulations. These reformulations allow us to apply a generalisation with perturbations of the Weiszfeld algorithm in an attempt to find local approximate solutions to the Euclidean TSP.
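For orientation, a minimal sketch of the classical Weiszfeld iteration, which approximates the point minimizing the sum of Euclidean distances to a set of points (the geometric or spatial median). The paper applies a generalisation with perturbations; neither the generalisation nor the TSP reformulation is shown here. Stopping when an iterate lands on a data point is a simplification assumed for this sketch.

```python
import math

def weiszfeld(points, iters=200, eps=1e-12):
    """Approximate the geometric (spatial) median of 2-D points."""
    x = sum(p[0] for p in points) / len(points)     # start from the centroid
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        num_x = num_y = den = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:              # iterate hit a data point: stop there (simplification)
                return (px, py)
            num_x += px / d          # each point weighted by the inverse of its distance
            num_y += py / d
            den += 1.0 / d
        x, y = num_x / den, num_y / den
    return (x, y)
```

For the collinear points (0, 0), (1, 0), (5, 0), the iteration moves from the centroid (2, 0) towards the median (1, 0).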
Newton Method for Minimal Learning Machine
The Minimal Learning Machine (MLM) is a distance-based supervised machine learning method for classification and regression problems. Its main advantages are its simple formulation and fast learning. Computing the MLM prediction in regression requires solving an optimization problem determined by the input and output distance matrix mappings. In this paper, we propose using the Newton method to solve this optimization problem in multi-output regression and compare the performance of this algorithm with the most popular Levenberg–Marquardt method. To our knowledge, the MLM has not previously been studied in the context of multi-output regression in the literature. In additio…
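A minimal sketch of the output-estimation step, assuming the common MLM formulation: given output reference points t_k and predicted output distances δ_k, find y minimizing f(y) = Σ_k (‖y − t_k‖² − δ_k²)². The gradient and Hessian below are derived analytically from that objective. The function name and the crude fallback for a non-positive-definite Hessian are assumptions, not the paper's algorithm.

```python
import numpy as np

def mlm_output_newton(T, delta, y0, iters=50, tol=1e-12):
    """
    T     : (K, q) output reference points t_k
    delta : (K,)   predicted output distances ||y - t_k||
    y0    : (q,)   starting point, e.g. the mean of T
    """
    y = np.asarray(y0, dtype=float).copy()
    q = y.size
    for _ in range(iters):
        D = y - T                                      # rows d_k = y - t_k
        r = np.sum(D ** 2, axis=1) - delta ** 2        # residuals r_k
        g = 4.0 * D.T @ r                              # gradient: 4 sum_k r_k d_k
        H = 4.0 * r.sum() * np.eye(q) + 8.0 * D.T @ D  # Hessian: sum_k (4 r_k I + 8 d_k d_k^T)
        step = np.linalg.solve(H, g)
        if g @ step <= 0:
            # Not a descent direction (indefinite Hessian): crude fallback to a small
            # gradient step; a line search or LM damping would be more robust.
            step = 1e-2 * g
        y -= step
        if np.linalg.norm(step) < tol:
            break
    return y
```

Near a solution where all residuals vanish, the Hessian reduces to 8 Σ d_k d_kᵀ, which is positive definite when the reference points span the output space, so pure Newton steps converge quadratically there.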
Icons: Visual representation to enrich requirements engineering work
Adapting icons in requirements engineering can support the multifaceted needs of stakeholders. Conventional approaches to requirements engineering (RE) mainly rely on diagrams. This paper introduces icon-based information as a way to represent ideas and concepts in the requirements engineering domain. We report on icon artifacts that support requirements engineering work, such as priority types, status states, and stakeholder kinds. We evaluate how users interpret the meanings of icons and the efficacy of icon prototypes designed to represent those requirements attributes. Our hypothesis is that practitioners can recognize the icons’ meanings in terms of their functional representation. According to the empi…
Discovering Gender-Specific Knowledge from Finnish Basic Education using PISA Scale Indices
The Programme for International Student Assessment (PISA) is a worldwide study that assesses the knowledge and skills of 15-year-old students. Results of the latest PISA survey, conducted in 2012, were published in December 2013. According to the results, Finland is one of the few countries where girls performed better in mathematics than boys. The purpose of this work is to refine the analysis of this observation using educational data mining techniques. More precisely, as part of the standard PISA preprocessing phase, certain scale indices are constructed from the information gathered with each participating student's background questionnaire. The indices describe, e.g., students’ engagement, dri…
Robust refinement of initial prototypes for partitioning-based clustering algorithms
Non-uniqueness of solutions and sensitivity to erroneous data are common problems in large-scale data clustering tasks. To avoid poor-quality solutions with partitioning-based clustering methods, robust estimates (which are highly insensitive to erroneous data values) are needed, and the initial cluster prototypes should be determined properly. In this paper, a robust density estimation initialization method that exploits the spatial median estimate in the prototype update is presented. Besides being insensitive to noise and outliers, the new method is also computationally comparable with other traditional methods. The methods are compared by numerical experiments on a set of syntheti…
Hybrid vibration signal monitoring approach for rolling element bearings
A new approach to identifying different lifetime stages of rolling element bearings, aimed at improving early bearing fault detection, is presented. We extract characteristic features from vibration signals generated by rolling element bearings. These data are first pre-labelled with an unsupervised clustering method, after which supervised methods are used to improve the labelling. Moreover, we assess feature importance with each classifier. From a practical point of view, the classifiers are compared on how early they suggest the emergence of a bearing fault. The results show that all of the classifiers are usable for bearing fault detection and that the importance of the features was consistent.
Instance-Based Multi-Label Classification via Multi-Target Distance Regression
Interest in multi-target regression and multi-label classification techniques and their applications has been increasing lately. Here, we use a distance-based supervised method, the minimal learning machine (MLM), as a base model for multi-label classification. We also propose and test a hybridization of unsupervised and supervised techniques, where prototype-based clustering is used to reduce both the training time and the overall model complexity. In computational experiments, the quality of the obtained models was competitive with or better than that of state-of-the-art techniques.
Analysing Student Performance using Sparse Data of Core Bachelor Courses
Curricula for Computer Science (CS) degrees are characterized by the strong occupational orientation of the discipline. In a BSc degree structure with clearly separate CS core studies, students' learning skills for these and other required courses may vary considerably, which is reflected in their overall performance. To analyze this situation, we apply nonstandard educational data mining techniques to a preprocessed log file of passed courses. The joint variation in course grades is studied through correlation analysis, while intrinsic groups of students are created and analyzed using a robust clustering technique. Since not all students attended all courses, there is a nonstructured sparsity…
Determination of the stochastic evolution equation from noisy experimental data
We have determined the coefficients of the Kardar-Parisi-Zhang equation as functions of coarse graining, which best describe the time evolution and spatial behavior observed for slow-combustion fronts in sheets of paper and magnetic flux fronts in a thin-film high-Tc superconductor. Reconstruction of the relevant equation of motion and its coefficients was mainly based on the inverse method proposed by Lam and Sander [Phys. Rev. Lett. 71, 561 (1993)]. The coefficient of the nonlinear term was also determined from the local slope-dependence of the front velocity.
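For reference, the Kardar–Parisi–Zhang equation in its standard form is the first line below, with h(x, t) the front height, ν the coefficient of the linear (surface-tension) term, λ the coefficient of the nonlinear term, v_0 a constant drift, and η noise. The second line is a hedged reading of an inverse method as a least-squares fit of the discretized equation of motion to the measured fronts, repeated at each coarse-graining scale; the forward-difference time derivative and the inclusion of v_0 are assumptions of this sketch. The third line is the standard KPZ relation for a front tilted to average slope s, which is why the nonlinear coefficient can also be read from the slope dependence of the front velocity.

```latex
\begin{align*}
\partial_t h &= \nu \nabla^2 h + \tfrac{\lambda}{2} (\nabla h)^2 + v_0 + \eta(x, t) \\
(\hat\nu, \hat\lambda, \hat v_0) &= \arg\min_{\nu, \lambda, v_0} \sum_{x, t}
  \Big[ \tfrac{h(x, t + \Delta t) - h(x, t)}{\Delta t}
        - \nu \nabla^2 h - \tfrac{\lambda}{2} (\nabla h)^2 - v_0 \Big]^2 \\
v(s) &= v_0 + \tfrac{\lambda}{2} s^2
\end{align*}
```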
A Robust Minimal Learning Machine based on the M-Estimator
In this paper, we propose a robust Minimal Learning Machine (R-MLM) for regression problems. The proposed method uses a robust M-estimator to generate the linear mapping between the input and output distance matrices of the MLM. The R-MLM was tested on one synthetic and three real-world datasets contaminated with an increasing number of outliers. The method achieved performance comparable to that of the robust Extreme Learning Machine (R-ELM) and can thus be seen as a valid alternative for regression tasks on datasets with outliers.
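A minimal sketch of fitting a linear mapping with an M-estimator, assuming the Huber loss, a MAD scale estimate, and iteratively reweighted least squares (IRLS). In an MLM setting, `X` would play the role of the input distance matrix and `y` one column of the output distance matrix, fitted column by column; that wiring, the constant `c = 1.345`, and the function names are assumptions, not details taken from the paper.

```python
import numpy as np

def huber_weights(u, c=1.345):
    # Huber M-estimator weights: 1 for |u| <= c, c / |u| beyond it
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def irls(X, y, iters=50, c=1.345):
    """Robust linear fit y ~ X @ b with Huber weights via IRLS."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]                      # ordinary least squares start
    for _ in range(iters):
        r = y - X @ b
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # MAD scale estimate
        w = huber_weights(r / s, c)
        Xw = X * w[:, None]
        b = np.linalg.solve(X.T @ Xw, Xw.T @ y)                   # weighted normal equations
    return b
```

Large residuals receive weights that shrink like c/|u|, so a contaminated fraction of targets pulls the fit far less than in ordinary least squares.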
On developing adaptive vocabulary learning game for children with an early language delay
Vocabulary growth is a normal part of child development. Deviations in this process can significantly affect a child's further progress and perception of educational material at school. It was hypothesized that the similarity of words with respect to various factors can influence a child's understanding. To test this hypothesis, this work proposes the development of AdapTalk, a game for children that concentrates on learning words in the context of animals. The game will provide a basis for further testing how the semantic similarity of words influences the child. This work describes the development background, the basic principles of calculating the semantic simil…
Automating the Parameter Selection in VRP: An Off-line Parameter Tuning Tool Comparison
Vehicle route optimization is an important application of combinatorial optimization. Therefore, a variety of methods has been proposed to solve different challenging vehicle routing problems. An important step in adopting these methods to solve real-life problems is to find appropriate parameters for the routing algorithms. In this chapter, we show how this task can be automated using parameter tuning by presenting a set of comparative experiments on seven state-of-the-art tuning methods. We analyze the suitability of these methods in configuring routing algorithms, and give the first critical comparison of automated parameter tuners in vehicle routing. Our experimental results show that t…
Fitness diversity based adaptation in Multimeme Algorithms: A comparative study
This paper compares three different fitness diversity adaptations in multimeme algorithms (MmAs). These diversity indices have been integrated into an MmA from the literature, namely the fast adaptive memetic algorithm. Numerical results show that it is not possible to establish the superiority of any one of these adaptive schemes over the others, and that a proper adaptation must be chosen by considering the features of the problem under study. More specifically, one of these adaptations outperforms the others in the presence of plateaus or a limited range of variability in fitness values, another adaptation is more suitable for landscapes having distant and strong basins of attraction, the third one, …
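The abstract does not define the three diversity indices. As an assumption, the sketch below computes three fitness-diversity measures of the kind found in the multimeme-algorithm literature (often written ψ, ξ and ν), for a minimisation problem; the paper's exact definitions may differ, and the edge-case conventions are likewise assumptions.

```python
import statistics

# Hypothetical fitness-diversity indices for a minimisation problem, each in [0, 1].

def psi(fitness):
    # 1 - |f_avg - f_best| / |f_worst - f_best|
    f_best, f_worst = min(fitness), max(fitness)
    if f_worst == f_best:
        return 0.0                                   # identical fitness values (assumed convention)
    f_avg = statistics.fmean(fitness)
    return 1.0 - abs(f_avg - f_best) / abs(f_worst - f_best)

def xi(fitness):
    # min(1, population std / |f_avg|)
    f_avg = statistics.fmean(fitness)
    if f_avg == 0:
        return 1.0                                   # ratio undefined; treated as maximal (assumption)
    return min(1.0, statistics.pstdev(fitness) / abs(f_avg))

def nu(fitness):
    # min(1, |(f_best - f_avg) / f_best|)
    f_best = min(fitness)
    if f_best == 0:
        return 1.0                                   # ratio undefined; treated as maximal (assumption)
    return min(1.0, abs((f_best - statistics.fmean(fitness)) / f_best))
```

In an adaptive MmA, a value like these would be computed each generation and used to decide how often, or which, local searchers to activate.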
Applying Semiautomatic Generation of Conceptual Models to Decision Support Systems Domain
This paper presents a decision support system specification in the form of business use cases, and a stereotyped conceptual model based on the specification. The use cases are based on generic user requirements and address cognitive biases. The specification can be used to establish fixed, common terminology among the project participants. Semiautomatic generation of the conceptual model is demonstrated with mixed results. While there are some shortcomings in the parsing and the result depends on the phrasing conventions used in the use cases, the conceptual model highlights the most essential entities in the domain and provides a basis for further development phases.
Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering
Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations, i.e., clusters, one also needs to determine their number. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, the characteristics of a representative set of internal clustering validation indices on many datasets. The prototype-based clustering framework includes multiple classical and robust statistical estimates of cluster location, which makes the overall setting of the paper novel. General observations on the quality of validation indices and on t…
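To illustrate what an internal validation index computes, here is a sketch of one classical example, the Davies–Bouldin index, using cluster means as prototypes. The article also considers robust location estimates, which this sketch omits; in practice such an index is evaluated for several candidate cluster counts and the best-scoring count is selected.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); requires at least two clusters."""
    labels = np.asarray(labels)
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])    # mean prototypes
    # Within-cluster scatter: mean distance of members to their prototype
    scat = np.array([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in zip(ks, cents)])
    worst = []
    for i in range(len(ks)):
        # Worst-case ratio of combined scatter to prototype separation
        worst.append(max((scat[i] + scat[j]) / np.linalg.norm(cents[i] - cents[j])
                         for j in range(len(ks)) if j != i))
    return float(np.mean(worst))
```

For the points (0, 0), (0, 2), (10, 0), (10, 2), grouping them by their x coordinate scores 0.2, while grouping them by their y coordinate scores 5.0.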
Supporting Institutional Awareness and Academic Advising using Clustered Study Profiles
The purpose of academic advising is to help students develop educational plans that support their academic careers and personal goals, and to provide information and guidance on studies. Planning and managing the students’ study paths is the main joint activity in advising. Based on a study log of passed courses, we propose using robust, prototype-based clustering to identify a set of actual study path profiles. Such profiles identify groups of students with similar progress in their studies, whose analysis and interpretation can be used for better institutional awareness and to support evidence-based academic advising. A model of an automated academic advising system utilizing the possi…
Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection
The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generaliz…
Supporting Cyber Resilience with Semantic Wiki
Cyber-resilient organizations, including their functions and computing infrastructures, should tolerate rapid and unexpected changes in the environment. Information security is an organization-wide common mission whose success strongly depends on efficient knowledge sharing. For this purpose, semantic wikis have proved their strength as flexible collaboration and knowledge-sharing platforms. However, there has been no notable academic research on how semantic wikis could be used as an information security management platform in organizations to improve cyber resilience. In this paper, we propose using a semantic wiki as an agile information security management platform. More precisely, t…
Abstract Estimates of the Rate of Convergence for Optimal Control Problems
A method for solving optimal control problems with general elliptic operators is presented and analyzed. In particular, estimates of the rate of convergence for the control problems with the proposed approach are derived independently of the underlying approximation method. Some numerical experiments with the proposed method are included.
The Finnish Version of the Affinity for Technology Interaction (ATI) Scale : Psychometric Properties and an Examination of Gender Differences
The pervasiveness of technical systems in our lives calls for a broad understanding of the interaction between humans and technology. The affinity for technology interaction (ATI) scale measures a person's tendency to actively engage in or to avoid interaction with technological systems, including both software and physical devices. This research presents a psychometric analysis of a Finnish version of the ATI scale. The data consisted of 796 responses from students at a Finnish university. The data were analyzed using factor analysis and both nonparametric and parametric item response theory. The Finnish version of the ATI scale proved to be essentially unidimensional, showing high reliabi…
An enhanced memetic differential evolution in filter design for defect detection in paper production
This article proposes an Enhanced Memetic Differential Evolution (EMDE) for designing digital filters that aim to detect defects in paper produced by an industrial process. Defect detection is handled by means of two Gabor filters, whose design is performed by the EMDE. The EMDE is a novel adaptive evolutionary algorithm that combines the powerful explorative features of Differential Evolution with the exploitative features of three local search algorithms employing different pivot rules and neighborhood generating functions. These local search algorithms are the Hooke–Jeeves algorithm, a Stochastic Local Search, and Simulated Annealing. The local search algorithms are adap…
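The explorative component named above, Differential Evolution, can be sketched as one generation of the classical DE/rand/1/bin scheme with greedy one-to-one selection. Whether EMDE uses exactly this variant, and the values of `F` and `CR`, are assumptions; the three local searchers and their adaptive coordination are not shown.

```python
import random

def de_generation(pop, fitness, f_obj, F=0.7, CR=0.9, rng=random):
    """One DE/rand/1/bin generation with greedy one-to-one selection (minimisation)."""
    n, dim = len(pop), len(pop[0])
    new_pop, new_fit = [], []
    for i in range(n):
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)   # three distinct donors
        mutant = [pop[a][k] + F * (pop[b][k] - pop[c][k]) for k in range(dim)]
        j_rand = rng.randrange(dim)            # guarantees at least one gene from the mutant
        trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                 for k in range(dim)]
        f_trial = f_obj(trial)
        if f_trial <= fitness[i]:              # keep the trial only if it is no worse
            new_pop.append(trial)
            new_fit.append(f_trial)
        else:
            new_pop.append(pop[i])
            new_fit.append(fitness[i])
    return new_pop, new_fit
```

Because selection is one-to-one and greedy, no individual's fitness ever gets worse from one generation to the next; a memetic variant would additionally apply local searchers to selected individuals between generations.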
Orientation Adaptive Minimal Learning Machine for Directions of Atomic Forces
Machine learning (ML) force fields are one of the most common applications of ML in nanoscience. However, these methods are commonly trained on the potential energies of atomic systems, and force vectors are omitted. Here we present an ML framework that tackles the greatest difficulty in using forces in ML: accurate prediction of force direction. We use the idea of the Minimal Learning Machine to devise a method that can adapt to the orientation of an atomic environment to estimate the directions of force vectors. The method was tested with linear alkane molecules.
Technology Comprehension — Combining computing, design, and societal reflection as a national subject
This article considers the implementation of a new learning subject, “Technology Comprehension”, in lower secondary schools in Denmark, as part of an initiative by the Danish Ministry of Education. The subject consists of learning objectives related to computing, design, and societal reflection, and was first introduced as an elective course in 13 schools to investigate how it could be integrated into the Danish education system. We present four key findings based on school visits, interviews, an electronic survey, two questionnaires, and workshops including theme discussions: (1) teachers did not perceive Technology Comprehension as a distinct subject, but rather as a set of skills that c…
Building blocks for odd–even multigrid with applications to reduced systems
Building blocks yielding an efficient implementation of the odd–even multigrid method for the Poisson problem in the reference domain (0,1)^d, d = 2, 3, are described. Modifications needed to adapt these techniques to solving reduced linear systems representing boundary value problems in arbitrary domains are given. A new way to define enriched coarser subspaces in the multilevel realization is proposed. Numerical examples demonstrating the efficiency of the developed multigrid methods are included.
DOBRO : a prediction error correcting robot under drifts
We propose DOBRO, a light online learning module equipped with a smart correction policy that decides whether or not to correct a given prediction, depending on how likely the correction is to improve prediction performance. DOBRO is a standalone module that requires nothing more than a time series of prediction errors, and it can be flexibly integrated with any black-box model to improve its performance under drifts. We evaluated it in a real-world application, a bus arrival time prediction problem. The results show that DOBRO significantly improved prediction performance, while it did not hurt accuracy when no drift occurred.
Quantile index for gradual and abrupt change detection from CFB boiler sensor data in online settings
In this paper, we consider the problem of online detection of gradual and abrupt changes in sensor data with high levels of noise and outliers. We propose a simple heuristic method based on the Quantile Index (QI) and study how robust this method is in detecting both gradual and abrupt changes in such data. We evaluate the performance of our method on artificially generated and real datasets that represent different operational settings of a pilot circulating fluidized bed (CFB) reactor and a CFB cold model. Our experiments suggest that QI can be used to design very simple yet effective methods for gradual change detection in noisy sensor data. It can also be used for detectin…
Improving Scalable K-Means++
Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared with the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation …
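As a baseline for the comparison above, a minimal sketch of standard sequential K-means++ seeding (D² sampling): each new center is drawn with probability proportional to the squared distance to the nearest center already chosen. K-means‖ oversamples many candidates per round in parallel, and the paper's divide-and-conquer and random-projection proposals build on that; none of those extensions is shown here.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """K-means++ seeding: D^2-weighted sampling of k initial centers from the rows of X."""
    n = X.shape[0]
    first = X[rng.integers(n)]                         # first center uniformly at random
    centers = [first]
    d2 = np.sum((X - first) ** 2, axis=1)              # squared distance to nearest center
    for _ in range(1, k):
        total = d2.sum()
        if total == 0:
            raise ValueError("fewer distinct points than k")
        idx = rng.choice(n, p=d2 / total)              # sample proportional to D^2
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centers)
```

Points already chosen as centers have zero weight, so the k returned centers are always distinct rows of `X` when the data contain at least k distinct points.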
Independent component analysis on the mismatch negativity in an uninterrupted sound paradigm.
We compared the efficiency of the independent component analysis (ICA) decomposition procedure against the difference wave (DW) and optimal digital filtering (ODF) procedures in the analysis of the mismatch negativity (MMN). The comparison was made in a group of 54 children aged 8-16 years. The MMN was elicited in a passive oddball protocol presenting uninterrupted auditory stimulation consisting of two frequent alternating tones (600 and 800 Hz) of 100 ms duration each. Infrequently, one of the 600 Hz tones was shortened to 50 or 30 ms. The event related potentials (ERPs) were decomposed into the MMN-like and non-MMN-like independent components (ICs) through the FastICA algorithm. The ICA …
Knowledge Discovery from the Programme for International Student Assessment
The Programme for International Student Assessment (PISA) is a worldwide study that assesses the proficiencies of 15-year-old students in reading, mathematics, and science every three years. Despite the high quality and open availability of the PISA data sets, which call for big data learning analytics, academic research using this rich and carefully collected data is surprisingly sparse. Our research contributes to reducing this deficit by discovering novel knowledge from the PISA through the development and use of appropriate methods. Since Finland has been the country of most international interest in the PISA assessment, a relevant review of the Finnish educational system is provided. T…
Icon-based language in requirements development
ERP denoising in multichannel EEG data using contrasts between signal and noise subspaces
In this paper, a new method intended for ERP denoising in multichannel EEG data is discussed. The denoising is done by separating the ERP and noise subspaces in multidimensional EEG data with a linear transformation, followed by dimension reduction in which the noise components are ignored during the inverse transformation. The separation matrix is found based on the assumption that ERP sources are deterministic across all repetitions of the same type of stimulus within the experiment, while the other noise sources do not obey this determinacy property. A detailed derivation of the technique is given together with an analysis of the results of its application to a real high-density EEG data set. The inter…
Online mass flow prediction in CFB boilers with explicit detection of sudden concept drift
Fuel feeding and fuel inhomogeneity typically cause fluctuations in the circulating fluidized bed (CFB) process. If control systems fail to compensate for the fluctuations, the whole plant will suffer from dynamics that are reinforced by the closed-loop controls. This phenomenon reduces efficiency and shortens the lifetime of process components. In this paper, we address the problem of online mass flow prediction, which is part of control. In particular, we consider the problem of learning an accurate predictor with explicit detection of abrupt concept drift and noise-handling mechanisms. We emphasize the importance of having domain knowledge concerning the considered case and constructing the…
Mislabel Detection of Finnish Publication Ranks
The paper proposes to analyze a data set of Finnish ranks of academic publication channels with Extreme Learning Machine (ELM). The purpose is to introduce and test a recently proposed ELM-based mislabel detection approach with a rich set of features characterizing a publication channel. We will compare the architecture, accuracy, and, especially, the set of detected mislabels of the ELM-based approach to the corresponding reference results in the reference paper.
Icon Representations in Supporting Requirements Elicitation Process
This paper considers the difficulties faced by stakeholders in general requirements engineering (RE). These difficulties range from the complexity of requirements gathering to requirements presentation. Affordable visualization techniques have been widely implemented to support the requirements engineering community. However, no universal characteristics that could be associated with requirements completion have been identified so far. Driven by these considerations, this paper introduces an icon-based language comprising a set of icon notations, syntax, and semantics. The icon-based language would support the requirements engine…
Open Resources as the Educational Basis for a Bachelor-level Project-Based Course
This article presents an innovation-based course concept for project-based learning. In this course, student groups are asked to ideate and implement a software product based on Open Data and Open API releases. By emphasizing students' own product ideation, the course requires and enhances self-directed learning skills and prompts the students to see the unlimited possibilities in becoming and being a practitioner of the computing discipline. Relatedly, the course provides a tool to improve student self-efficacy, as the students, coached through challenges, come to know that they are able to produce software using various open interfaces.
Aligning Two Specifications for Controlling Information Security
Assuring information security is a necessity in modern organizations. Many recommendations for information security management exist, which can be used to define a baseline of information security requirements. ISO/IEC 27001 prescribes a process for an information security management system, and guidance to implement security controls is provided in ISO/IEC 27002. Finnish National Security Auditing Criteria (KATAKRI) has been developed by the national authorities in Finland as a tool to verify maturity of information security practices. KATAKRI defines both security control objectives and security controls to meet an objective. Here the authors compare and align these two specifications in…
Context-sensitive framework for visual analytics in energy production from biomass
Large masses of data require extensive processing. Data mining is the traditional way to convert data into knowledge. In visual analytics, humans are integrated into the process through continuous interaction between the analyst and the analysis software. Data mining methods can also be utilized in visual analytics, where priority is given to the visualization of the information and to dimension reduction. However, the provided data is not always enough. There is a large amount of background contextual information, which should be included in the automated process. This paper describes a context-sensitive approach, in which we utilize visual analytics by studying all phases in the proce…
A robust approach to ERP denoising
The purpose of the presented study is to explore possibilities to increase the robustness and improve the performance of the spatial ERP denoising methods proposed in earlier research. The quality of the subspace separation solution may easily be degraded substantially if the underlying assumptions are noticeably violated, which is the normal situation in practice. The distortions in the results of a separation are caused by non-zero sample signal-noise and noise-noise correlations, which are indistinguishable from the variances of the signal and noise in the framework of the second-order statistical information exploited by the discussed methods. Therefore, in the research reported in this art…
Can we automate expert-based journal rankings? Analysis of the Finnish publication indicator
The publication indicator of the Finnish research funding system is based on a manual ranking of scholarly publication channels. These ranks, which represent the evaluated quality of the channels, are continuously kept up to date and thoroughly reevaluated every four years by groups of nominated scholars belonging to different disciplinary panels. This expert-based decision-making process is informed by available citation-based metrics and other relevant metadata characterizing the publication channels. The purpose of this paper is to introduce various approaches that can explain the basis and evolution of the quality of publication channels, i.e., ranks. This is important for the academic …
Scalable robust clustering method for large and sparse data
Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The obtained results allow us to conclude that the proposed approach is viable.
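The available data strategy can be illustrated with a minimal single-machine sketch. It assumes `None` marks a missing value, uses the spatial median (computed with a Weiszfeld-type fixed-point iteration) as the robust cluster prototype, and uses a simple farthest-first initialization. The names `masked_dist`, `spatial_median`, and `k_spatial_medians` are hypothetical, and this is not the distributed implementation evaluated in the paper.

```python
import math


def masked_dist(p, c):
    """Euclidean distance using only the coordinates observed in p (None = missing)."""
    return math.sqrt(sum((pj - cj) ** 2 for pj, cj in zip(p, c) if pj is not None))


def spatial_median(points, iters=200, eps=1e-9, tol=1e-10):
    """Weiszfeld-type fixed-point iteration for the spatial median, available data only."""
    dim = len(points[0])
    # Start from the coordinate-wise mean of the observed values.
    m = []
    for j in range(dim):
        vals = [p[j] for p in points if p[j] is not None]
        m.append(sum(vals) / len(vals) if vals else 0.0)
    for _ in range(iters):
        num, den = [0.0] * dim, [0.0] * dim
        for p in points:
            w = 1.0 / max(masked_dist(p, m), eps)  # inverse-distance weight
            for j in range(dim):
                if p[j] is not None:  # a point only contributes where it is observed
                    num[j] += w * p[j]
                    den[j] += w
        new_m = [num[j] / den[j] if den[j] > 0 else m[j] for j in range(dim)]
        converged = max(abs(a - b) for a, b in zip(new_m, m)) < tol
        m = new_m
        if converged:
            break
    return m


def k_spatial_medians(points, k, rounds=50):
    """Lloyd-style clustering: assign by masked distance, update by spatial median."""
    fill = spatial_median(points)

    def complete(p):
        # Fill missing coordinates of an initial prototype from the global median.
        return [f if v is None else v for v, f in zip(p, fill)]

    # Deterministic farthest-first initialization (an illustrative choice).
    centers = [complete(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(masked_dist(p, c) for c in centers))
        centers.append(complete(far))

    labels = []
    for _ in range(rounds):
        labels = [min(range(k), key=lambda c: masked_dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centers.append(spatial_median(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Every distance and every prototype coordinate is computed from observed values only, so no imputation step is needed, and the spatial median limits the pull of a single far-away point compared with the mean.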
Model selection for Extreme Minimal Learning Machine using sampling
A combination of Extreme Learning Machine (ELM) and Minimal Learning Machine (MLM), which uses a distance-based basis from MLM in the ridge-regression-like learning framework of ELM, was proposed in [8]. In further experiments with the technique [9], it was concluded that in multilabel classification one can obtain a good validation error level without overlearning simply by using the whole training data for constructing the basis. Here, we consider possibilities to reduce the complexity of the resulting machine learning model, referred to as the Extreme Minimal Learning Machine (EMLM), by using a bidirectional sampling strategy: To sample both the feature space and the space of observations in o…
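A small sketch of the idea, assuming uniform random sampling of reference observations and features (the paper's actual sampling strategy and model selection procedure are not reproduced here): the basis is built from Euclidean distances to the sampled reference points, and the output weights come from the ridge regression normal equations. All names (`fit_emlm`, `predict`, `solve`, `lam`) are hypothetical.

```python
import math
import random


def dist(a, b, feats):
    """Euclidean distance restricted to the sampled feature indices."""
    return math.sqrt(sum((a[j] - b[j]) ** 2 for j in feats))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def fit_emlm(X, y, n_refs, n_feats, lam=1e-6, seed=0):
    """Distance basis to sampled reference points + ridge regression output weights."""
    rng = random.Random(seed)
    refs = rng.sample(range(len(X)), n_refs)               # sampled observations
    feats = sorted(rng.sample(range(len(X[0])), n_feats))  # sampled features
    # Basis matrix H: one row per observation, one column per reference point.
    H = [[dist(x, X[r], feats) for r in refs] for x in X]
    # Ridge normal equations: (H^T H + lam * I) w = H^T y
    HtH = [[sum(h[a] * h[b] for h in H) + (lam if a == b else 0.0)
            for b in range(n_refs)] for a in range(n_refs)]
    Hty = [sum(h[a] * t for h, t in zip(H, y)) for a in range(n_refs)]
    w = solve(HtH, Hty)
    return {"refs": [X[r] for r in refs], "feats": feats, "w": w}


def predict(model, x):
    """Prediction = weighted sum of distances from x to the reference points."""
    return sum(wr * dist(x, ref, model["feats"])
               for wr, ref in zip(model["w"], model["refs"]))
```

With every observation used as a reference point (and all features kept), the distance basis is square and, for a tiny `lam`, the model essentially reproduces the training targets; sampling fewer references and features shrinks the linear system, which is the kind of complexity reduction the sampling is meant to provide.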
Robust Principal Component Analysis of Data with Missing Values
Principal component analysis is one of the most popular machine learning and data mining techniques. Having its origins in statistics, principal component analysis is used in numerous applications. However, there seems to be little systematic testing and assessment of principal component analysis for cases with erroneous and incomplete data. The purpose of this article is to propose multiple robust approaches for carrying out principal component analysis and, especially, to estimate the relative importance of the principal components in explaining the data variability. Computational experiments are first focused on carefully designed simulated tests where the ground truth is known and can b…
A linearization technique and error estimates for distributed parameter identification in quasilinear problems
The identification problem of a nonlinear functional coefficient in elliptic and parabolic quasilinear equations is considered. A distributed observation of the solution of the corresponding equation is assumed to be known a priori. An identification method is introduced, which needs only a linear equation to be solved in each iteration step of the optimization. Estimates of the rate of convergence for the proposed approach are proved, when the equation is discretized with the finite element method with respect to space variables. Some numerical results are given.
Feature selection for distance-based regression: An umbrella review and a one-shot wrapper
Feature selection (FS) may improve the performance, cost-efficiency, and understandability of supervised machine learning models. In this paper, FS for the recently introduced distance-based supervised machine learning model is considered for regression problems. The study is contextualized by first providing an umbrella review (review of reviews) of recent development in the research field. We then propose a saliency-based one-shot wrapper algorithm for FS, which is called MAS-FS. The algorithm is compared with a set of other popular FS algorithms, using a versatile set of simulated and benchmark datasets. Finally, experimental results underline the usefulness of FS for regression, confirm…
Capturing cognitive load management during authentic virtual reality flight training with behavioural and physiological indicators
Background Cognitive load (CL) management is essential in safety-critical fields so that professionals can monitor and control their cognitive resources efficiently to perform and solve scenarios in a timely and safe manner, even in complex and unexpected circumstances. Thus, cognitive load theory (CLT) can be used to design virtual reality (VR) training programmes for professional learning in these fields. Objectives We studied CL management performance through behavioural indicators in authentic VR flight training and explored if and to what extent physiological data was associated with CL management performance. Methods The expert (n = 8) and novice pilots (n = 6) performed three approac…
Modelling Recurrent Events for Improving Online Change Detection
The task of online change point detection in sensor data streams is often complicated by the presence of noise that can be mistaken for real changes, thereby degrading the performance of change detectors. Most of the existing change detection methods assume that changes are independent of each other and occur at random in time. In this paper we study how the performance of detectors can be improved in the case of recurrent changes. We analytically demonstrate under which conditions and for how long recurrence information is useful for improving the detection accuracy. We propose a simple computationally efficient message passing procedure for calculating a predictive probability distribution of …
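The paper's own message passing procedure is not reproduced here. As a hedged illustration of how recurrence information can enter an online detector, the sketch below uses a different, well-known technique, Bayesian online change point detection with run-length message passing (Adams and MacKay, 2007), for a Gaussian stream with known noise variance and a Normal prior on each segment mean, and lets the hazard depend on the current run length. The hypothetical `recurrent_hazard` simply raises the change probability near an assumed typical interval between changes; all names and parameter values are illustrative.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def recurrent_hazard(r, period=30, width=3, base=0.01, peak=0.3):
    """Hypothetical hazard: a change is more likely when the run length is near a
    typical recurrence interval `period`, and has a small base rate otherwise."""
    return peak if abs(r - period) <= width else base


def bocpd(data, hazard, mu0=0.0, var0=10.0, noise_var=1.0):
    """Run-length message passing for a Gaussian stream with known noise variance.

    Returns the most probable run length after each observation."""
    probs = [1.0]                  # P(run length = r | data so far), r = 0, 1, ...
    means, vars_ = [mu0], [var0]   # posterior of the segment mean for each run length
    map_runs = []
    for x in data:
        # Predictive density of x under each run-length hypothesis.
        pred = [normal_pdf(x, m, v + noise_var) for m, v in zip(means, vars_)]
        # Messages: either the run grows by one, or a change resets it to zero.
        growth = [p * q * (1.0 - hazard(r)) for r, (p, q) in enumerate(zip(probs, pred))]
        change = sum(p * q * hazard(r) for r, (p, q) in enumerate(zip(probs, pred)))
        new = [change] + growth
        z = sum(new)
        probs = [p / z for p in new]
        # Conjugate Normal update of each segment mean with x; a fresh prior for r = 0.
        post_vars = [1.0 / (1.0 / v + 1.0 / noise_var) for v in vars_]
        post_means = [pv * (m / v + x / noise_var) for m, v, pv in zip(means, vars_, post_vars)]
        means, vars_ = [mu0] + post_means, [var0] + post_vars
        map_runs.append(max(range(len(probs)), key=probs.__getitem__))
    return map_runs
```

For a stream whose mean jumps once, the returned run length grows steadily, drops close to zero right after the jump, and then grows again; passing `recurrent_hazard` instead of a constant hazard shifts prior mass toward changes at the assumed recurrence interval.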
Habituating Students to IPR Questions During Creative Project Work
Tailorable Representation of Security Control Catalog on Semantic Wiki
Selection of security controls to be implemented is an essential part of the information security management process in an organization. There exist a number of readily available information security management system standards, including control catalogs, that could be tailored by the organizations to meet their security objectives. Still, it has been noted that many organizations tend to lack even the implementation of the fundamental security controls. At the same time, semantic wikis have become popular collaboration and information sharing platforms that have proven their strength as an effective way to distribute domain-specific information within an organization. This paper evaluates…
Au38Q MBTR-K3
Purpose The purpose of Au38Q MBTR-K3 is to test the scalability of a machine learning regression model when the number of observations and the number of features change. Background The Au38Q MBTR-K3 was created from a trajectory file regarding the density functional theory simulation of Au38Q hybrid nanoparticle performed by Juarez-Mosqueda et al. in their paper Ab initio molecular dynamics studies of Au38(SR)24 isomers under heating using the MBTR descriptor by Himanen et al. as presented in paper DScribe: Library of descriptors for machine learning in materials science. The MBTR was used with the default parameters for K=3 (angles between atoms) presented at the website of Dscribe version…
Kernels and Graphs on M25 + H (parent repository)
The repository contains codes related to the article "Graphs and Kernelized Learning Applied to Interactions of Hydrogen with Doped Gold Nanoparticle Electrocatalysts". There are two main types of codes: codes to transform a catalytic system of protected gold nanoparticle and a single hydrogen atom into a graph-based representation, and codes to run kernel-based machine learning methods to predict interaction energies between the nanoparticle and the hydrogen atom. This is the metadata for the parent repository of the codes. Updates and possible corrections are documented in the GitLab project, where the material is saved and shared. The GitLab project can be found and downloaded from the followin…