Search results for "Workflow"
Showing 10 of 133 documents
sar: Automatic generation of statistical reports using Stata and Microsoft Word for Windows
2013
The output provided by most Stata commands is plain text that is not suitable for presentation or publication. After the numerical and graphical outputs are obtained, the user has to copy them into a word processor to complete the editing process. Some Stata commands help produce well-formatted output, especially tabulated results in LaTeX or other formats, but they are neither a complete solution nor particularly user-friendly tools. Stata automatic report (Sar) is an easy-to-use macro for Microsoft Word for Windows that allows a powerful integration between Stata and Word. With Sar, the user can retrieve numerical results and graphs from Stata and automatically insert them into a well-formatted Word docum…
A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR.
2013
(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities for accepting a (Q)SAR model and approving its use as an alternative testing method in real-world scenarios. However, at the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end we introduce a workflow that allows us to realistically simulate t…
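As background for the two validation schemes being compared, here is a minimal, hypothetical Python sketch (using scikit-learn with synthetic data, not the paper's workflow or datasets) that contrasts a single external held-out test set with 5-fold cross-validation:

```python
# Minimal sketch: external test set validation vs. k-fold cross-validation.
# Synthetic regression data and a random forest stand in for a real (Q)SAR dataset and model.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_regression(n_samples=500, n_features=50, noise=0.5, random_state=0)

# External test set validation: train once, score once on compounds never seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print("external test R^2:", model.score(X_test, y_test))

# k-fold cross-validation: every sample serves in both training and test folds exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(RandomForestRegressor(random_state=0), X, y, cv=cv, scoring="r2")
print("5-fold CV R^2: %.3f +/- %.3f" % (cv_scores.mean(), cv_scores.std()))
```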
A Generic Approach to Scheduling and Checkpointing Workflows
2018
This work deals with scheduling and checkpointing strategies to execute scientific workflows on failure-prone large-scale platforms. To the best of our knowledge, this work is the first to target fail-stop errors for arbitrary workflows. Most previous work addresses soft errors, which corrupt the task being executed by a processor but, unlike fail-stop errors, do not cause the entire memory of that processor to be lost. We revisit classical mapping heuristics such as HEFT and MinMin and complement them with several checkpointing strategies. The objective is to derive an efficient trade-off between checkpointing every task (CkptAll), which is overkill when failures are rare events, …
Meeting Report from the Second “Minimum Information for Biological and Biomedical Investigations” (MIBBI) workshop
2011
This report summarizes the proceedings of the second workshop of the ‘Minimum Information for Biological and Biomedical Investigations’ (MIBBI) consortium held on Dec 1–2, 2010 in Rudesheim, Germany through the sponsorship of the Beilstein-Institute. MIBBI is an umbrella organization uniting communities developing Minimum Information (MI) checklists to standardize the description of data sets, the workflows by which they were generated and the scientific context for the work. This workshop brought together representatives of more than twenty communities to present the status of their MI checklists and plans for future development. Shared challenges and solutions were identified and the role…
Impact of analytic provenance in genome analysis
2014
Background: Many computational methods are available for assembly and annotation of newly sequenced microbial genomes. However, when new genomes are reported in the literature, there is frequently very little critical analysis of the choices made during the sequence assembly and gene annotation stages. These choices have a direct impact on the biologically relevant products of a genomic analysis, for instance the identification of common and differentiating regions among genomes in a comparison, or the identification of enriched gene functional categories in a specific strain. Here, we examine the outcomes of different assembly and analysis steps in typical workflows in a comparison among strains of Vi…
Towards Low-Cost Pavement Condition Health Monitoring and Analysis Using Deep Learning
2020
Governments face countless challenges in maintaining the condition of road networks, owing to the financial and physical resource deficiencies of road authorities. Therefore, low-cost automated systems are sought after to alleviate these issues and deliver adequate road conditions for citizens. There have been several attempts at creating such systems and integrating them within pavement management systems. This paper utilizes replicable deep learning techniques to carry out hotspot analyses on urban road networks, highlighting important pavement distress types and associated severities. Following this, analyses were performed illustrating how the hotspot analysis can be carried out to…
Automatic Customization Framework for Efficient Vehicle Routing System Deployment
2017
Vehicle routing systems provide several advantages over manual transportation planning and are attracting growing attention. However, deployment of these systems can be prohibitively costly, especially for small and medium-sized enterprises: the customization, integration, and migration are laborious and require operations research expertise. We propose an automated configuration workflow for vehicle routing system and data flow customization, which provides the necessary basis for more experimental work on the subject. Our preliminary results with learning and adaptive algorithms support the assumed applicability of the proposed configuration framework. The strategies presented h…
OpenHVSR - Processing toolkit: Enhanced HVSR processing of distributed microtremor measurements and spatial variation of their informative content
2018
Abstract: The investigation of seismic ambient noise (microtremor) in spectral ratio form, known as the Horizontal-to-Vertical Spectral Ratio (HVSR) technique, is extremely popular nowadays, both because it allows large areas to be investigated in a reduced amount of time and because it works with a wide choice of low-cost equipment. In general, measurements at multiple locations are collected to generate multiple, individual spectral ratio curves. Recently, however, there has been increasing interest in spatially correlating the informative content from different locations. Accordingly, we introduce a new computer program, “OpenHVSR – Processing Toolkit”, developed in Matlab (R2015b), specifically engineered to enhance data proc…
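For orientation on what an H/V spectral ratio is, the core computation can be sketched in a few lines of NumPy; this is an illustrative simplification under assumed conventions (Hann-tapered FFT amplitude spectra, quadratic-mean combination of the horizontals), not the OpenHVSR implementation, and it omits the windowing, smoothing, and quality-control steps such a toolkit provides:

```python
# Illustrative HVSR sketch: ratio of the combined horizontal amplitude spectrum
# to the vertical amplitude spectrum of a three-component microtremor record.
import numpy as np

def hvsr(ns, ew, v, fs):
    """Return (frequencies, H/V curve) for north-south, east-west, and vertical
    traces sampled at fs Hz."""
    n = len(v)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = lambda x: np.abs(np.fft.rfft(x * np.hanning(n)))
    h = np.sqrt((spectrum(ns) ** 2 + spectrum(ew) ** 2) / 2.0)  # combined horizontal spectrum
    return freqs, h / spectrum(v)

# Example with synthetic noise traces (600 s at 100 Hz).
fs = 100.0
rng = np.random.default_rng(0)
ns, ew, v = (rng.standard_normal(int(600 * fs)) for _ in range(3))
freqs, hv_curve = hvsr(ns, ew, v, fs)
```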
Streamlining distributed Deep Learning I/O with ad hoc file systems
2021
With evolving techniques to parallelize Deep Learning (DL) and the growing amount of training data and model complexity, High-Performance Computing (HPC) has become increasingly important for machine learning engineers. Although many compute clusters already use learning accelerators or GPUs, HPC storage systems are not suitable for the I/O requirements of DL workflows. Therefore, users typically copy the whole training data to the worker nodes or distribute partitions. Because DL depends on randomized input data, prior work stated that partitioning impacts DL accuracy. Their solutions focused mainly on training I/O performance on a high-speed network but did not cover the data stage-in pro…
Reusability and modularity in transactional workflows
1997
Abstract: Workflow management techniques have become an active area of research in information systems. In large-scale workflow systems, modularity and reusability of existing task structures with context-dependent (parameterized) task execution are essential components of a successful application. In this paper we study the issues related to the management of modular transactional workflows, i.e., workflows that reuse component tasks and thus avoid redundancy in design. The notion of parameterized transactional properties of workflow tasks is introduced and analyzed, and the underlying architectural issues are discussed.