Factors Affecting Attrition among First Year Computer Science Students: the Case of University of Latvia
The purpose of our study was to identify the reasons for the high dropout rate among students enrolled in the first year of the computer science study program, so that students who are potentially at risk could be identified. Several factors that, as originally assumed, could affect attrition were studied: high school grades (admission score), a compensatory course in high school mathematics, intermediate grades for core courses, and prior knowledge of programming. However, the results of our study indicate that none of the studied factors is decisive for identifying those students who are going to abandon their studies with great precisio…
Measuring research results can serve different purposes, e.g. assigning research grants and afterwards evaluating project results. It can also be used for recruiting or promoting the staff of research institutions. Because such measurement is widely used, the selection of appropriate measures is important. At the same time, there is no common view on which metrics should be used in this field; moreover, many widely used metrics are often misleading for various reasons, e.g. they are computed from incomplete or faulty data, the metric's computation formula may be invalid, or the computation results may be interpreted wrongly. To produce a good framewo…
The Adaptation of a Web Information System: A Perspective of Organizations
We provide a different view of the problem of Web Information System (WIS) adaptation, looking at it from the perspective of organizations that need a WIS adapted to their needs when a unified system supporting similar business processes is used. We propose an adaptation architecture for WIS. Two levels of adaptation are introduced: coarse-grained adaptation at the organization level and fine-grained adaptation at the user level. The architecture also supports the situation in which users work with many instances of the system, adapted for different organizations, that are integrated into one instance for a particular user.
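The two adaptation levels can be pictured as layered configuration: organization-level (coarse-grained) settings override system defaults, and user-level (fine-grained) settings override the organization's. The Python sketch below assumes that adaptations are plain key/value settings; the names `DEFAULTS`, `adapt`, `integrated_view` and the setting keys are hypothetical and do not come from the paper.

```python
from typing import Any, Mapping

# Hypothetical system-wide defaults of the unified WIS.
DEFAULTS: dict[str, Any] = {"theme": "light", "language": "en", "modules": ("orders", "reports")}


def adapt(defaults: Mapping[str, Any],
          organization: Mapping[str, Any],
          user: Mapping[str, Any]) -> dict[str, Any]:
    """Effective settings for one user of one organization.

    Coarse-grained (organization) settings override the defaults;
    fine-grained (user) settings override the organization's.
    """
    effective = dict(defaults)
    effective.update(organization)  # organization-level adaptation
    effective.update(user)          # user-level adaptation
    return effective


def integrated_view(defaults: Mapping[str, Any],
                    organizations: Mapping[str, Mapping[str, Any]],
                    user_per_org: Mapping[str, Mapping[str, Any]]) -> dict[str, dict[str, Any]]:
    """One adapted instance per organization the user works with, combined into one view."""
    return {name: adapt(defaults, cfg, user_per_org.get(name, {}))
            for name, cfg in organizations.items()}


if __name__ == "__main__":
    org_a = {"theme": "corporate-blue", "modules": ("orders",)}
    alice = {"language": "lv"}
    print(adapt(DEFAULTS, org_a, alice))
```

Keeping the layers as separate mappings means an organization can change its settings without touching any user's personal adaptations.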
Towards a Data Warehouse Architecture for Managing Big Data Evolution
Evolution-Oriented User-Centric Data Warehouse
Data warehouses tend to evolve because of changes in data sources and in users' business requirements. All these kinds of changes must be properly handled; therefore, data warehouse development is a never-ending process. In this paper, we propose the evolution-oriented user-centric data warehouse design, which, on the one hand, allows data warehouse evolution to be managed automatically or semi-automatically and, on the other hand, provides users with understandable, easy and transparent data analysis capabilities. The proposed approach supports versions of data warehouse schemata and data semantics.
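Support for schema versions is often realized by keeping every version of the schema together with the period in which it is valid, so that a query can be routed to the version that covers the requested time. The sketch below assumes this temporal-versioning design; the `SchemaVersion` class and its fields are hypothetical and are not the paper's metamodel.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SchemaVersion:
    """One version of a (hypothetical) data warehouse schema."""
    number: int
    valid_from: date
    valid_to: Optional[date]  # None means "still current"; the end date is exclusive
    facts: frozenset = field(default_factory=frozenset)
    dimensions: frozenset = field(default_factory=frozenset)

    def covers(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def version_at(versions: list[SchemaVersion], day: date) -> SchemaVersion:
    """Return the schema version valid on the given day."""
    for v in versions:
        if v.covers(day):
            return v
    raise LookupError(f"no schema version is valid on {day}")


if __name__ == "__main__":
    history = [
        SchemaVersion(1, date(2020, 1, 1), date(2021, 1, 1),
                      frozenset({"Sales"}), frozenset({"Time", "Product"})),
        SchemaVersion(2, date(2021, 1, 1), None,
                      frozenset({"Sales"}), frozenset({"Time", "Product", "Channel"})),
    ]
    print(version_at(history, date(2020, 6, 1)).number)  # 1
    print(version_at(history, date(2022, 3, 1)).number)  # 2
```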
Research Directions of OLAP Personalization
In this paper, we have highlighted five existing approaches to introducing personalization in OLAP (preference constructors, dynamic personalization, visual OLAP, recommendations with user session analysis, and recommendations with user profile analysis) and have analyzed research papers within these directions. We have provided an evaluation in order to point out (i) the personalization options described in these approaches and their applicability to OLAP schema elements, aggregate functions and OLAP operations, (ii) the type of constraints (hard, soft or other) used in each approach, and (iii) the methods for obtaining user preferences and collecting user information. The goal of our paper is to syst…
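The distinction between hard and soft constraints can be illustrated on a handful of OLAP cells: a hard constraint removes cells that do not satisfy it, whereas a soft constraint only ranks the remaining cells by how well they match the user's preference. The Python sketch below is an illustration under that reading; the cell structure and the preference functions are hypothetical and not taken from any of the reviewed approaches.

```python
from typing import Callable

Cell = dict  # e.g. {"region": "Riga", "year": 2013, "sales": 120.0}


def personalize(cells: list[Cell],
                hard: Callable[[Cell], bool],
                soft: Callable[[Cell], float]) -> list[Cell]:
    """Apply a hard constraint as a filter and a soft constraint as a ranking score."""
    admissible = [c for c in cells if hard(c)]         # hard: must hold, otherwise dropped
    return sorted(admissible, key=soft, reverse=True)  # soft: best matches first


if __name__ == "__main__":
    cells = [
        {"region": "Riga", "year": 2013, "sales": 120.0},
        {"region": "Liepaja", "year": 2013, "sales": 80.0},
        {"region": "Riga", "year": 2012, "sales": 150.0},
    ]
    # Hard: only 2013 data. Soft: prefer Riga, then higher sales.
    result = personalize(
        cells,
        hard=lambda c: c["year"] == 2013,
        soft=lambda c: (c["region"] == "Riga") * 1000 + c["sales"],
    )
    print([c["region"] for c in result])  # ['Riga', 'Liepaja']
```

With a soft constraint, a user still sees Liepaja data, just ranked lower; turning the same preference into a hard constraint would hide it entirely.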
Publication Data Integration as a Tool for Excellence-Based Research Analysis at the University of Latvia
The evaluation of research results can be carried out for different purposes aligned with the strategic goals of an institution, for example, to decide on the distribution of research funding or to recruit or promote employees of the institution involved in research. Whereas quantitative measures such as the number of scientific papers or the number of scientific staff are commonly used for such evaluation, an institution's strategy may be set to achieve ambitious scientific goals. Therefore, the question arises of how more quality-oriented aspects of research outcomes should be measured. To supply an appropriate dataset for evaluating both types of metrics, a suitable framework should be p…
Handling Evolving Data Warehouse Requirements
A data warehouse is a dynamic environment, and its business requirements tend to evolve over time; therefore, it is necessary not only to handle changes in data warehouse data but also to adjust the data warehouse schema in accordance with changes in requirements. In this paper, we propose an approach to propagating modified data warehouse requirements to data warehouse schemata. The approach supports versions of data warehouse schemata and employs the requirements formalization metamodel and the multiversion data warehouse metamodel to identify the necessary changes in a data warehouse.
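Propagating a modified requirement to the schema can be thought of as comparing what the formalized requirement needs with what the current schema version offers, and emitting the change operations that would produce the next version. The sketch below assumes a much simplified requirement (one fact, a set of measures and a set of dimensions) and handles only additions; the operation names such as `add_measure` are hypothetical and are not the operations defined in the paper's metamodels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A (hypothetical) formalized information requirement."""
    fact: str
    measures: frozenset
    dimensions: frozenset


def propagate(req: Requirement, schema: dict[str, dict[str, set]]) -> list[tuple[str, ...]]:
    """Change operations needed so that `schema` satisfies `req`.

    `schema` maps fact names to {"measures": set, "dimensions": set}.
    """
    ops: list[tuple[str, ...]] = []
    current = schema.get(req.fact)
    if current is None:
        ops.append(("add_fact", req.fact))
        current = {"measures": set(), "dimensions": set()}
    for m in sorted(req.measures - current["measures"]):
        ops.append(("add_measure", req.fact, m))
    for d in sorted(req.dimensions - current["dimensions"]):
        ops.append(("add_dimension", req.fact, d))
    return ops


if __name__ == "__main__":
    schema = {"Sales": {"measures": {"amount"}, "dimensions": {"Time", "Product"}}}
    req = Requirement("Sales", frozenset({"amount", "quantity"}), frozenset({"Time", "Store"}))
    for op in propagate(req, schema):
        print(op)
    # ('add_measure', 'Sales', 'quantity')
    # ('add_dimension', 'Sales', 'Store')
```

Applying the emitted operations to a copy of the current version, rather than editing it in place, is what keeps the earlier schema version available for historical queries.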
A comparison of HDFS compact data formats: Avro versus Parquet
In this paper, file formats such as Avro and Parquet are compared with text formats to evaluate the performance of data queries. Different data query patterns have been evaluated. Cloudera's open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments presented in this article. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text data formats because of their binary encoding and compression. Furthermore, data queries on the column-based data format Parquet are faster than on text data formats and Avro. Article in English. Comparison of HDFS compact data formats: Avro versus Parquet…
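One reason a column-based format such as Parquet can answer queries faster is that a query touching a few columns only needs to read those columns, while a row-based layout forces every field of every record to be read. The toy Python model below only counts the values that would be read under each layout; it is not how Avro or Parquet actually encode data, and it ignores compression, encoding and I/O block sizes.

```python
from typing import Any

Record = dict[str, Any]


def to_column_store(records: list[Record]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column (assumes uniform keys)."""
    return {key: [r[key] for r in records] for key in records[0]}


def values_read_row_layout(records: list[Record]) -> int:
    """Row layout: every field of every record is read, even fields the query ignores."""
    return sum(len(r) for r in records)


def values_read_column_layout(store: dict[str, list[Any]], columns: list[str]) -> int:
    """Column layout: only the lists for the requested columns are read."""
    return sum(len(store[c]) for c in columns)


if __name__ == "__main__":
    rows = [{"id": i, "city": "Riga", "amount": i * 1.5, "note": "x"} for i in range(1000)]
    store = to_column_store(rows)
    # A query touching one column, e.g. SELECT SUM(amount) FROM sales
    print(values_read_row_layout(rows))                  # 4000
    print(values_read_column_layout(store, ["amount"]))  # 1000
    print(sum(store["amount"]))                          # 749250.0
```

The gap grows with the number of columns a table has and shrinks as a query touches more of them, which is consistent with query patterns affecting the measured results.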
A Little Bird Told Me: Discovering KPIs from Twitter Data
The goal of our research and experiments is to find the definitions and values of key performance indicators (KPIs) in unstructured text. Direct access to customers' opinions motivated us to choose Twitter data for our experiments. For our case study, we have chosen the restaurant business domain. As in other business domains, KPIs often help identify current problems. Therefore, it is essential to learn which criteria are important to restaurant guests. The mission of our Proof-of-Concept KPI discovery tool presented in this paper is to facilitate exploratory analysis using Twitter user posts as a data source. After process…
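A very simple way to surface candidate KPIs from restaurant tweets is to look for phrases that pair a business aspect with an evaluation, such as "service was slow", and to count how often each aspect is mentioned. The Python sketch below uses a single hypothetical regular-expression pattern and a hand-picked aspect list; it only illustrates exploratory analysis and is not the extraction method of the paper's tool.

```python
import re
from collections import Counter

# Hypothetical candidate KPI aspects for the restaurant domain.
ASPECTS = ("service", "food", "price", "wait", "staff")
# Matches e.g. "service was slow", "food is great", "prices are high".
PATTERN = re.compile(
    r"\b(" + "|".join(ASPECTS) + r")s?\s+(?:is|was|are|were)\s+(\w+)",
    re.IGNORECASE,
)


def extract_pairs(tweets: list[str]) -> list[tuple[str, str]]:
    """Return (aspect, evaluation) pairs found in the tweets, lower-cased."""
    pairs = []
    for text in tweets:
        for aspect, value in PATTERN.findall(text):
            pairs.append((aspect.lower(), value.lower()))
    return pairs


def aspect_counts(pairs: list[tuple[str, str]]) -> Counter:
    """How often each aspect is mentioned: a rough signal of what guests care about."""
    return Counter(aspect for aspect, _ in pairs)


if __name__ == "__main__":
    tweets = ["The service was slow but the food is great!",
              "Prices are high here",
              "Food was cold again"]
    pairs = extract_pairs(tweets)
    print(pairs)
    print(aspect_counts(pairs).most_common(1))  # [('food', 2)]
```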
Performance Measurement Framework with Formal Indicator Definitions
Appropriate measures of an organization's performance should be defined systematically. In this paper, performance measurement and indicators are discussed not only from the perspective of management models but also from that of measurement theories, in order to find appropriate definitions. In our work, we propose a formal specification of indicators. The principles of reformulating free-form indicators as formal requirements are formulated and applied to several examples from a performance measures database. The formally defined indicators could be used in the proposed performance measurement framework, which covers a five-step indicator lifecycle.
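A formally specified indicator makes explicit what a free-form indicator such as "average number of publications per researcher in 2014" leaves implicit: the measured attribute, the aggregation, the filter and the target. The dataclass below is a hypothetical structure for such a specification, written only to illustrate the reformulation; its fields are assumptions and not the paper's formal specification, and it does not model the five-step lifecycle.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Optional

# Aggregations an indicator may use (an assumed, minimal set).
AGGREGATIONS: dict[str, Callable[[list], float]] = {
    "sum": sum, "avg": mean, "count": len, "max": max, "min": min,
}


@dataclass
class Indicator:
    name: str
    measure: str                                 # attribute being measured
    aggregation: str                             # key into AGGREGATIONS
    filters: dict = field(default_factory=dict)  # attribute -> required value
    target: Optional[float] = None

    def evaluate(self, rows: list[dict]) -> float:
        """Aggregate the measure over the rows that satisfy every filter."""
        selected = [r[self.measure] for r in rows
                    if all(r.get(k) == v for k, v in self.filters.items())]
        return AGGREGATIONS[self.aggregation](selected)

    def meets_target(self, rows: list[dict]) -> Optional[bool]:
        """None if no target is set, otherwise whether the value reaches it."""
        return None if self.target is None else self.evaluate(rows) >= self.target


if __name__ == "__main__":
    rows = [{"researcher": "A", "year": 2014, "publications": 3},
            {"researcher": "B", "year": 2014, "publications": 5},
            {"researcher": "C", "year": 2013, "publications": 9}]
    ind = Indicator("Avg publications per researcher, 2014", "publications", "avg",
                    {"year": 2014}, target=4.0)
    print(ind.evaluate(rows), ind.meets_target(rows))  # 4 True
```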
Computer Programming Aptitude Test as a Tool for Reducing Student Attrition
Submitted to the VTR conference to be held in Rezekne, June 2015
Extending a Metamodel for Formalization of Data Warehouse Requirements
In performance measurement systems that are built on top of a data warehouse, the information requirements expressed in natural language are the various performance indicators that should be stored and analyzed. We use the requirement formalization metamodel to create a formal requirement repository from information requirements in natural language. In the course of this research, we tested the compatibility of the existing requirement formalization metamodel by applying it to a set of over 150 requirements of a currently operating data warehouse project. As a result, we extended the formal specification of information requirements with some additional classes like themes, grouping, and requirement pr…
Intellectual Ability Data Obtaining and Processing for E-Learning System Adaptation
In this article, the authors describe how an e-learning system can obtain data about a learner so that it can later offer individual content to each learner based on the obtained data. The authors also describe how the standard quiz module interface of the Learning Management System (LMS) Moodle has been adapted for testing elementary school students and how students' individual abilities could be measured more efficiently, for example, by measuring mathematical reaction time. To obtain the necessary test results and partially process them, a new module (TAnalizer) adapted to the Moodle environment is offered. With this module one can gain precise data about each student's testing process and stude…
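Measuring mathematical reaction time amounts to recording when each question is shown and when it is answered, then summarizing the differences per student. The Python sketch below assumes a hypothetical event log of `(student, question, shown_at, answered_at)` tuples in seconds; it does not reflect TAnalizer's actual data model or Moodle's database tables.

```python
from collections import defaultdict
from statistics import mean, median


def reaction_times(events: list[tuple[str, str, float, float]]) -> dict[str, list[float]]:
    """Group per-question response times (answered_at - shown_at) by student."""
    times: dict[str, list[float]] = defaultdict(list)
    for student, _question, shown_at, answered_at in events:
        if answered_at < shown_at:
            raise ValueError("answer recorded before the question was shown")
        times[student].append(answered_at - shown_at)
    return dict(times)


def summary(times: dict[str, list[float]]) -> dict[str, tuple[float, float]]:
    """Mean and median response time per student."""
    return {s: (mean(t), median(t)) for s, t in times.items()}


if __name__ == "__main__":
    events = [("ann", "q1", 0.0, 4.0), ("ann", "q2", 10.0, 12.0),
              ("ann", "q3", 20.0, 29.0), ("ben", "q1", 0.0, 6.0)]
    print(summary(reaction_times(events)))  # {'ann': (5.0, 4.0), 'ben': (6.0, 6.0)}
```

Reporting the median alongside the mean keeps one unusually slow answer (e.g. a distracted pupil) from dominating a student's profile.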
Change Discovery in Heterogeneous Data Sources of a Data Warehouse
Data warehouses have been used to analyze data stored in relational databases for several decades. However, over time, the data employed in the decision-making process have become so enormous and heterogeneous that traditional data warehousing solutions have become unusable. Therefore, new big data technologies have emerged to deal with large volumes of data. The problem of structural evolution of integrated heterogeneous data sources has become extremely topical due to the dynamic and diverse nature of big data. In this paper, we propose an approach to change discovery in data sources of a data warehouse utilized to analyze big data. Our solution incorporates an architecture that allows t…
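At its simplest, discovering structural changes in a data source means comparing two snapshots of its schema and classifying the differences. The sketch below assumes that each snapshot is a mapping from attribute name to type name (as could be inferred from a relational catalog or from a sample of JSON documents); the function name and the change categories are illustrative, not the paper's.

```python
def discover_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Classify differences between two schema snapshots of one data source."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted((name, old[name], new[name])
                     for name in set(old) & set(new) if old[name] != new[name])
    return {"added": added, "removed": removed, "type_changed": retyped}


if __name__ == "__main__":
    v1 = {"id": "int", "name": "string", "price": "int"}
    v2 = {"id": "int", "name": "string", "price": "decimal", "tags": "array<string>"}
    print(discover_changes(v1, v2))
    # {'added': ['tags'], 'removed': [], 'type_changed': [('price', 'int', 'decimal')]}
```

Note that a renamed attribute shows up here as one removal plus one addition; recognizing renames would need extra evidence such as matching values or metadata.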
Architecture Enabling Adaptation of Data Integration Processes for a Research Information System
Many efforts have been made to implement information systems for supporting research evaluation activities. To produce a good framework for research evaluation, the selection of appropriate measures is important. Quality aspects of the systems' implementation should also not be overlooked. Incomplete or faulty data should not be used, and metric computation formulas should be discussed and validated. Correctly integrated data from different information sources provide a complete picture of the scientific activity of an institution. Knowledge from the data integration field can be applied to research information management. In this paper, we propose a research information system f…
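When publication records from several sources are integrated, the same publication typically appears more than once, so records have to be matched and merged before any metric is computed. The sketch below assumes a simple matching rule (by DOI when present, otherwise by normalized title and year) and a merge that keeps the first non-empty value of each field; both rules are assumptions for illustration, not the paper's integration processes.

```python
import re
from typing import Optional

Record = dict[str, Optional[str]]


def match_key(rec: Record) -> tuple:
    """Key used to decide that two records describe the same publication."""
    doi = rec.get("doi")
    if doi:
        return ("doi", doi.lower())
    title = re.sub(r"[^a-z0-9]+", " ", (rec.get("title") or "").lower()).strip()
    return ("title", title, rec.get("year"))


def integrate(*sources: list[Record]) -> list[Record]:
    """Merge records from all sources; earlier sources take precedence per field."""
    merged: dict[tuple, Record] = {}
    for source in sources:
        for rec in source:
            target = merged.setdefault(match_key(rec), {})
            for field_name, value in rec.items():
                if value and not target.get(field_name):
                    target[field_name] = value  # fill only fields still missing
    return list(merged.values())


if __name__ == "__main__":
    repository = [{"doi": "10.1000/xyz", "title": "Data Warehouse Evolution",
                   "year": "2016", "venue": None}]
    external = [{"doi": "10.1000/XYZ", "title": "Data warehouse evolution",
                 "year": "2016", "venue": "Journal A"},
                {"doi": None, "title": "Another Paper", "year": "2015", "venue": "Journal B"}]
    for rec in integrate(repository, external):
        print(rec)
```

One limitation of this rule is that a record with a DOI and a copy of the same publication without one are not matched; a real system would need fallback matching across both keys.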
Can SQ and EQ Values and Their Difference Indicate Programming Aptitude to Reduce Dropout Rate?
A crucial problem that we currently face at the Faculty of Computing of the University of Latvia is that, on average, 30% of first-year students drop out during the first study semester, and by the end of the first year of studies the share of dropouts rises to nearly 50%. Thus, our overall goal is to identify in advance the applicants who are most likely not to finish the first study year successfully. A hypothesis formulated in another research study was that programming aptitude could be predicted from the results of two personality self-report questionnaires taken by students: the Systemizing Quotient (SQ) and the Empathy Quotient (EQ). The difference between the SQ and EQ scores…
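The difference score itself is straightforward to compute once both questionnaires have been scored. The sketch below uses the raw difference SQ minus EQ and simply ranks students by it; it deliberately applies no cut-off, since a threshold for "low programming aptitude" would have to come from the study's data. The student records are hypothetical, and some studies standardize the two scores before subtracting them, which this sketch does not do.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    student: str
    sq: int  # Systemizing Quotient score
    eq: int  # Empathy Quotient score


def sq_minus_eq(s: Scores) -> int:
    """Raw difference between the systemizing and empathy scores."""
    return s.sq - s.eq


def rank_by_difference(scores: list[Scores]) -> list[tuple[str, int]]:
    """Students ordered from the lowest to the highest SQ - EQ difference."""
    return sorted(((s.student, sq_minus_eq(s)) for s in scores), key=lambda p: p[1])


if __name__ == "__main__":
    cohort = [Scores("s1", 60, 40), Scores("s2", 35, 50), Scores("s3", 55, 55)]
    print(rank_by_difference(cohort))  # [('s2', -15), ('s3', 0), ('s1', 20)]
```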
The Application of Optimal Topic Sequence in Adaptive e-Learning Systems
In an adaptive e-learning system, learners are given the opportunity to choose a course topic sequence to ensure personalization. The topic sequence can be obtained from three sources: a teacher-offered topic sequence based on the teacher's pedagogical experience; the learner's free choice based on indicated links between topics; and, finally, the optimal topic sequence acquisition method described in this article. The optimal topic sequence is based on previous learners' experience. With the help of the optimal topic sequence method, data about previous learners' course topic sequences and course results are obtained. After the data analysis, the optimal topic sequence for the specific course is o…
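One plausible reading of an "optimal topic sequence based on previous learners' experience" is: among the topic orders that earlier learners actually followed, pick the one with the best average course result, ignoring orders followed by too few learners. The Python sketch below implements that reading; the `min_learners` threshold and the data format are assumptions, not the method described in the article.

```python
from collections import defaultdict
from statistics import mean

# (topic order followed, course result) for each previous learner
History = list[tuple[tuple[str, ...], float]]


def optimal_sequence(history: History, min_learners: int = 3) -> tuple[str, ...]:
    """Topic order with the highest mean result among sufficiently used orders."""
    results: dict[tuple[str, ...], list[float]] = defaultdict(list)
    for order, result in history:
        results[order].append(result)
    candidates = {o: mean(r) for o, r in results.items() if len(r) >= min_learners}
    if not candidates:
        raise ValueError("no topic order was followed by enough learners")
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    a = ("intro", "loops", "arrays")
    b = ("intro", "arrays", "loops")
    c = ("arrays", "intro", "loops")
    history = [(a, 7), (a, 8), (a, 6), (b, 9), (b, 8), (b, 9), (c, 10)]
    print(optimal_sequence(history))  # ('intro', 'arrays', 'loops')
```

The minimum-support threshold is the key design choice: without it, order `c`, followed by a single learner, would win on one lucky result.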
Towards a System to Monitor the Virus’s Aerosol-Type Spreading
Recent scientific studies indicate that attention should be paid to the indoor spread of the Covid-19 virus. It is recommended to reduce the number of visitors to premises and to ventilate them frequently. The problem is that it is not known what the risk of infection is in a particular room at a specific time, or when and which actions should be taken to reduce the risk. We offer a system that monitors the conditions in the premises with the help of sensors, calculates the risk of infection and provides information on how to reduce it. We give an insight into the created prototype with data collection from public spaces and data visualization according to use…
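The abstract does not say which risk model the system uses. As a swapped-in, widely used reference model, the sketch below computes the Wells-Riley probability of infection, P = 1 − exp(−I·q·p·t / Q), where I is the number of infectious occupants, q the quanta emission rate (quanta/h), p the breathing rate (m³/h), t the exposure time (h) and Q the outdoor-air ventilation rate (m³/h). The parameter values in the example are illustrative assumptions; a sensor-based system would typically estimate Q from measurements such as CO₂ concentration, which this sketch does not do.

```python
import math


def wells_riley(infectors: int, quanta_per_h: float, breathing_m3_per_h: float,
                hours: float, ventilation_m3_per_h: float) -> float:
    """Probability that a susceptible occupant is infected (Wells-Riley model)."""
    if ventilation_m3_per_h <= 0:
        raise ValueError("ventilation rate must be positive")
    # Expected inhaled dose in quanta.
    dose = infectors * quanta_per_h * breathing_m3_per_h * hours / ventilation_m3_per_h
    return 1.0 - math.exp(-dose)


if __name__ == "__main__":
    # Illustrative values only: one infector, 10 quanta/h, 0.5 m3/h breathing, 2 h stay.
    print(round(wells_riley(1, 10, 0.5, 2, 250), 4))  # 0.0392
    print(round(wells_riley(1, 10, 0.5, 2, 500), 4))  # 0.0198 (doubled ventilation)
```

The example mirrors the two recommendations in the text: fewer visitors lowers the expected number of infectors, and more ventilation raises Q; both reduce the dose and hence the risk.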
On Metadata Support for Integrating Evolving Heterogeneous Data Sources
With the emergence of big data technologies, the problem of structural evolution of integrated heterogeneous data sources has become extremely topical due to the dynamic and diverse nature of big data. To solve the big data evolution problem, we propose an architecture that allows storing and processing structured and unstructured data at different levels of detail, analyzing them using OLAP capabilities, and semi-automatically managing changes in requirements and data expansion. In this paper, we concentrate on the metadata essential for the operation of the proposed architecture. We propose a metadata model to describe schemata and supplementary properties of data sets extracted from sources and tran…
Strategies to Reduce Attrition among First Year Computer Science Students
The observed trend of losing from one-third to half of the students in the first year of computing studies at the University of Latvia motivated us to explore the causes of dropout and to find methods for identifying potential dropouts in advance. The study investigates students enrolled in 2013 using integrated data from surveys, the management information system and the e-learning environment. Several factors that could affect attrition were studied: admission score, a compensatory course in high school mathematics, intermediate grades for core courses, and prior knowledge of programming. The research revealed that the trend of students not beginning their studies might indicate the wrong choice of t…
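Checking whether a factor such as prior programming knowledge is associated with attrition starts with a cross-tabulation: the dropout rate within each group of students. The sketch below assumes hypothetical integrated student records with a `dropped_out` flag; it computes rates only and does not test statistical significance, which a real analysis would need.

```python
from collections import defaultdict


def dropout_rate_by(students: list[dict], factor: str) -> dict[object, float]:
    """Share of students who dropped out, for each value of `factor`."""
    totals: dict[object, int] = defaultdict(int)
    dropped: dict[object, int] = defaultdict(int)
    for s in students:
        group = s[factor]
        totals[group] += 1
        dropped[group] += bool(s["dropped_out"])
    return {g: dropped[g] / totals[g] for g in totals}


if __name__ == "__main__":
    students = [
        {"prior_programming": True, "dropped_out": False},
        {"prior_programming": True, "dropped_out": True},
        {"prior_programming": True, "dropped_out": False},
        {"prior_programming": False, "dropped_out": True},
        {"prior_programming": False, "dropped_out": True},
        {"prior_programming": False, "dropped_out": False},
    ]
    print(dropout_rate_by(students, "prior_programming"))
```

The same function works for any categorical factor in the integrated data (for example, whether a student took the compensatory mathematics course); continuous factors such as the admission score would first have to be binned.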
Managing Evolution of Heterogeneous Data Sources of a Data Warehouse
Information Requirements for Big Data Projects: A Review of State-of-the-Art Approaches
Big data technologies are rapidly gaining popularity and becoming widely used, which makes the choice of development methodologies, including approaches to requirements analysis, more acute. It has been argued that in the context of Data Warehousing (DW), as with other Decision Support Systems (DSS) technologies, defining information requirements (IR) can increase a project's chances of succeeding and achieving its goals. It is therefore important to examine this subject in the context of Big data, given the lack of research on Big data requirements analysis. This paper gives an overview of the existing methods associated with Big data technologies and requir…
Gathering Formalized Information Requirements of a Data Warehouse
Query-Driven Method for Improvement of Data Warehouse Conceptual Model
We propose a query-driven method that elicits the information requirements from existing queries on data sources and their usage statistics. Our method presumes that the queries against the source database reflect the analysis needs of users. We use this method to recommend changes to the existing data warehouse schemata. In our method, we take advantage of the schema versioning approach to reflect all changes that occur in the analysed process, and we analyse the activity of users in the source system, rather than changes in physical data structure, to infer the necessary improvements to the data warehouse schema.
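The idea that queries reflect users' analysis needs can be illustrated by counting how often source attributes appear in GROUP BY clauses (candidate dimension attributes) and inside aggregate functions (candidate measures), weighted by how often each query was executed. The sketch below uses deliberately naive regular expressions that only handle simple single-table SQL; the weighting and the patterns are assumptions for illustration, not the paper's method, and a real implementation would use a proper SQL parser.

```python
import re
from collections import Counter

GROUP_BY = re.compile(r"\bgroup\s+by\s+(.+?)(?:\border\s+by\b|\bhaving\b|;|$)",
                      re.IGNORECASE | re.DOTALL)
AGGREGATE = re.compile(r"\b(?:sum|avg|min|max|count)\s*\(\s*(?:distinct\s+)?([\w.]+)\s*\)",
                       re.IGNORECASE)


def mine_query_log(log: list[tuple[str, int]]) -> tuple[Counter, Counter]:
    """Return (dimension candidates, measure candidates) weighted by execution count."""
    dimensions: Counter = Counter()
    measures: Counter = Counter()
    for sql, executions in log:
        for clause in GROUP_BY.findall(sql):
            for attr in clause.split(","):
                dimensions[attr.strip().lower()] += executions
        for attr in AGGREGATE.findall(sql):  # COUNT(*) is not matched on purpose
            measures[attr.lower()] += executions
    return dimensions, measures


if __name__ == "__main__":
    log = [
        ("SELECT region, SUM(amount) FROM sales GROUP BY region", 40),
        ("SELECT region, year, SUM(amount), AVG(discount) FROM sales "
         "GROUP BY region, year ORDER BY year", 10),
        ("SELECT COUNT(*) FROM sales", 5),
    ]
    dims, meas = mine_query_log(log)
    print(dims.most_common())  # [('region', 50), ('year', 10)]
    print(meas.most_common())  # [('amount', 50), ('discount', 10)]
```

Weighting by execution count reflects user activity rather than the mere existence of a query, in line with the method's focus on how the source system is actually used.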
Accelerating data queries on Hadoop framework by using compact data formats
Massive amounts of data are generated by IoT, online transactions, click streams, emails, logs, posts, social networking interactions, sensors, mobile phones and their applications, etc. The question is where and how to store these data in order to provide faster data access. Understanding and handling Big Data is a major challenge. Research on Big Data projects using Hadoop technology, MapReduce-type frameworks and compact data formats such as RCFile, SequenceFile, ORC, Avro and Parquet shows that only two of these data formats (Avro and Parquet) support schema evolution and compression in order to use less storage space. In this paper, file formats like Avro and Parquet are c…
OLAP Personalization with User-Describing Profiles
In this paper, we have highlighted five existing approaches to introducing personalization in OLAP (preference constructors, dynamic personalization, visual OLAP, recommendations with user session analysis, and recommendations with user profile analysis) and have analyzed research papers within these directions. We have pointed out the applicability of personalization to OLAP schema elements in these approaches. A comparative analysis has been made in order to highlight a particular personalization approach. A new method is proposed that provides an exhaustive description of the interaction between a user and a data warehouse using the concept of the Zachman Framework [1, 2], according to which a set of…
The Use of the Recommended Learning Path in the Personalized Adaptive E-Learning System
This paper promotes the idea of managing the learning process in an e-learning system. This research uses a personalized adaptive e-learning system that comprises three developed topic acquisition sequences: teacher, learner and optimal topic sequences. The learner can switch between these topic sequences. The system stores data about the course acquisition process. The analysis of the stored data showed that slightly more than half of the students used the teacher topic sequence; students who chose the learner or optimal topic sequence got higher grades in topics; the grades of half of the students who used the optimal and teacher topic sequ…
Development of Data Warehouse Conceptual Models
There are many methods in the area of data warehousing for defining requirements in order to develop the most appropriate conceptual model of a data warehouse. There is no universal consensus about the best method, nor are there accepted standards for the conceptual modeling of data warehouses. Only a few conceptual models come with formally described methods for obtaining them. Therefore, problems arise when an appropriate development approach and a corresponding requirements elicitation method have to be chosen and applied in a particular data warehousing project. Sometimes it is also necessary not only to use the existing methods, but also to provide new methods that are usable i…
Adaptation of the Presentation in a Multi-tenant Web Information System
We introduced a Web Information System (WIS) adaptation architecture that is based on Software as a Service (SaaS) ideas. It includes adaptation components that allow adaptation at two levels: both organizations and users get their own adapted instance of the WIS. In the case of multi-tenancy, the user interface should be dynamically adapted to the particular organization and user. The same application component, containing a set of fields, controls and other interface elements, should vary according to the usage context. In this paper, we provide a method for adapting the user interface within the proposed adaptation architecture that uses a set of rules describing the seque…
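Rule-based adaptation of one shared interface component can be sketched as an ordered list of rules, each with a condition on the usage context (organization, user role) and an action on a field; rules are applied in order, so a later, more specific rule can override an earlier one. The rule format, the action names (`hide`, `relabel`, `readonly`) and the context keys below are hypothetical and are not the rule language defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    applies: Callable[[dict], bool]  # condition on the usage context
    target: str                      # name of the form field the rule changes
    action: str                      # "hide", "relabel" or "readonly"
    value: str = ""                  # new label for "relabel"


def adapt_form(fields: dict[str, dict], rules: list[Rule], context: dict) -> dict[str, dict]:
    """Return a copy of the form with every applicable rule applied in order."""
    form = {name: dict(props) for name, props in fields.items()}
    for rule in rules:
        if rule.target not in form or not rule.applies(context):
            continue
        if rule.action == "hide":
            form[rule.target]["visible"] = False
        elif rule.action == "relabel":
            form[rule.target]["label"] = rule.value
        elif rule.action == "readonly":
            form[rule.target]["editable"] = False
        else:
            raise ValueError(f"unknown action {rule.action!r}")
    return form


if __name__ == "__main__":
    form = {"vat": {"label": "VAT number", "visible": True, "editable": True},
            "discount": {"label": "Discount", "visible": True, "editable": True}}
    rules = [Rule(lambda c: c["org"] == "school", "vat", "hide"),
             Rule(lambda c: c["org"] == "shop", "discount", "relabel", "Loyalty discount"),
             Rule(lambda c: c["role"] == "clerk", "discount", "readonly")]
    print(adapt_form(form, rules, {"org": "shop", "role": "clerk"})["discount"])
```

The organization-level rules and the user-level (role) rules live in the same list here; ordering them from coarse to fine reproduces the two adaptation levels of the architecture.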