AUTHOR
Andreas Hildebrandt
Next-generation sequencing: big data meets high performance computing
The progress of next-generation sequencing has a major impact on medical and genomic research. This high-throughput technology can now produce billions of short DNA or RNA fragments in excess of a few terabytes of data in a single run. This leads to massive datasets used by a wide range of applications including personalized cancer treatment and precision medicine. In addition to the hugely increased throughput, the cost of using high-throughput technologies has been dramatically decreasing. A low sequencing cost of around US$1000 per genome has now rendered large population-scale projects feasible. However, to make effective use of the produced data, the design of big data algorithms and t…
DrugTargetInspector: An assistance tool for patient treatment stratification
Cancer is a large class of diseases that are characterized by a common set of features, known as the Hallmarks of cancer. One of these hallmarks is the acquisition of genome instability and mutations. This, combined with high proliferation rates and failure of repair mechanisms, leads to clonal evolution as well as a high genotypic and phenotypic diversity within the tumor. As a consequence, treatment and therapy of malignant tumors is still a grand challenge. Moreover, under selective pressure, e.g., caused by chemotherapy, resistant subpopulations can emerge that then may lead to relapse. In order to minimize the risk of developing multidrug-resistant tumor cell populations, optimal (comb…
Machine learning of reverse transcription signatures of variegated polymerases allows mapping and discrimination of methylated purines in limited transcriptomes
Abstract Reverse transcription (RT) of RNA templates containing RNA modifications leads to synthesis of cDNA containing information on the modification in the form of misincorporation, arrest, or nucleotide skipping events. A compilation of such events from multiple cDNAs represents an RT-signature that is typical for a given modification, but, as we show here, depends also on the reverse transcriptase enzyme. A comparison of 13 different enzymes revealed a range of RT-signatures, with individual enzymes exhibiting average arrest rates between 20 and 75%, as well as average misincorporation rates between 30 and 75% in the read-through cDNA. Using RT-signatures from individual enzymes to trai…
A fast solver for nonlocal electrostatic theory in biomolecular science and engineering
Biological molecules perform their functions surrounded by water and mobile ions, which strongly influence molecular structure and behavior. The electrostatic interactions between a molecule and solvent are particularly difficult to model theoretically, due to the forces' long range and the collective response of many thousands of solvent molecules. The dominant modeling approaches represent the two extremes of the trade-off between molecular realism and computational efficiency: all-atom molecular dynamics in explicit solvent, and macroscopic continuum theory (the Poisson or Poisson--Boltzmann equation). We present the first fast-solver implementation of an advanced nonlocal continuum theo…
CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…
Graph Rewriting Based Search for Molecular Structures: Definitions, Algorithms, Hardness
We define a graph rewriting system that is easily understandable by humans, but rich enough to allow very general queries to molecule databases. It is based on the substitution of a single node in a node- and edge-labeled graph by an arbitrary graph, explicitly assigning new endpoints to the edges incident to the replaced node. For these graph rewriting systems, we are interested in the subgraph-matching problem. We show that the problem is NP-complete, even on graphs that are stars. As a positive result, we give an algorithm which is polynomial if both rules and query graph have bounded degree and bounded cut size. We demonstrate that molecular graphs of practically relevant molecules in d…
ballaxy: web services for structural bioinformatics.
Abstract Motivation: Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. Results: In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large …
MetaCache: context-aware classification of metagenomic reads using minhashing.
Abstract Motivation Metagenomic shotgun sequencing studies are becoming increasingly popular with prominent examples including the sequencing of human microbiomes and diverse environments. A fundamental computational problem in this context is read classification, i.e. the assignment of each read to a taxonomic label. Due to the large number of reads produced by modern high-throughput sequencing technologies and the rapidly increasing number of available reference genomes, corresponding software tools suffer from either long runtimes, large memory requirements or low accuracy. Results We introduce MetaCache—a novel software for read classification using the big data technique minhashing. Our…
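Minhashing, the technique named in the abstract above, can be illustrated with a short sketch. This is a generic bottom-s minhash over k-mers, not MetaCache's implementation: the function names, the choice of BLAKE2b as hash function, and the defaults `k=8` and `size=16` are assumptions made purely for illustration.

```python
import hashlib


def kmers(sequence, k):
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (hash function chosen for illustration)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(sequence, k=8, size=16):
    """Bottom-s sketch: the `size` smallest k-mer hash values, sorted."""
    return sorted(hash_kmer(m) for m in kmers(sequence, k))[:size]


def jaccard_estimate(sketch_a, sketch_b):
    """Estimate the Jaccard similarity of two k-mer sets from their sketches."""
    size = min(len(sketch_a), len(sketch_b))
    if size == 0:
        return 0.0
    # Smallest hashes of the union, then count those present in both sketches.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


read = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
print(jaccard_estimate(sketch(read), sketch(read)))
```

Because only a fixed number of hash values is kept per sequence, comparing two sketches costs time independent of the sequence lengths, which is what makes the approach attractive for large read sets.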
A novel automated segmentation method for retinal layers in OCT images proves retinal degeneration after optic neuritis.
Aim The evaluation of inner retinal layer thickness can serve as a direct biomarker for monitoring the course of inflammatory diseases of the central nervous system such as multiple sclerosis (MS). Using optical coherence tomography (OCT), thinning of the retinal nerve fibre layer and changes in deeper retinal layers have been observed in patients with MS. Here, we first compare a novel method for automated segmentation of OCT images with manual segmentation using two cohorts of patients with MS. Using this method, we also aimed to reproduce previous findings showing retinal degeneration following optic neuritis (ON) in MS. Methods Based on a 5×5 expansion of the Prewitt operator to efficie…
Deep learning in next-generation sequencing
Highlights • Machine learning increasingly important for NGS. • Deep learning can improve many NGS applications.
Learning Molecular Classes from Small Numbers of Positive Examples Using Graph Grammars
We consider the following problem: A researcher identified a small number of molecules with a certain property of interest and now wants to find further molecules sharing this property in a database. This can be described as learning molecular classes from small numbers of positive examples. In this work, we propose a method that is based on learning a graph grammar for the molecular class. We consider the type of graph grammars proposed by Althaus et al. [2], as it can be easily interpreted and allows relatively efficient queries. We identify rules that are frequently encountered in the positive examples and use these to construct a graph grammar. We then classify a molecule as being conta…
Evaluating the microscopic effect of brushing stone tools as a cleaning procedure
Cleaning stone tool surfaces is a common procedure in lithic studies. The first step widely applied at any archeological site (and/or at field laboratories) is the gross removal of sediment from the surfaces of artifacts. Lithic surface alterations due to mechanical action applied in wet or dry cleaning regimes have never been examined at a microscopic scale. This could have important implications in traceology, as any modern surface modifications inflicted on archeological artifacts might compromise their functional interpretations. The current trend toward quantification of use-wear traces makes the testing even more important, as even slight, apparently invisible surface alterations migh…
Instruction of haematopoietic lineage choices, evolution of transcriptional landscapes and cancer stem cell hierarchies derived from an AML1-ETO mouse model.
The t(8;21) chromosomal translocation activates aberrant expression of the AML1-ETO (AE) fusion protein and is commonly associated with core binding factor acute myeloid leukaemia (CBF AML). Combining a conditional mouse model that closely resembles the slow evolution and the mosaic AE expression pattern of human t(8;21) CBF AML with global transcriptome sequencing, we find that disease progression was characterized by two principal pathogenic mechanisms. Initially, AE expression modified the lineage potential of haematopoietic stem cells (HSCs), resulting in the selective expansion of the myeloid compartment at the expense of normal erythro- and lymphopoiesis. This lineage skewing was foll…
CoverageAnalyzer (CAn): A Tool for Inspection of Modification Signatures in RNA Sequencing Profiles
Combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modifi…
A Greedy Algorithm for Hierarchical Complete Linkage Clustering
We are interested in the greedy method to compute a hierarchical complete linkage clustering. There are two known methods for this problem, one having a running time of \({\mathcal O}(n^3)\) with a space requirement of \({\mathcal O}(n)\) and one having a running time of \({\mathcal O}(n^2 \log n)\) with a space requirement of \(\Theta(n^2)\), where \(n\) is the number of points to be clustered. Neither method is capable of handling large point sets. In this paper, we give an algorithm with a space requirement of \({\mathcal O}(n)\) which is able to cluster one million points in a day on current commodity hardware.
Graphical Workflow System for Modification Calling by Machine Learning of Reverse Transcription Signatures
Modification mapping from cDNA data has become a tremendously important approach in epitranscriptomics. So-called reverse transcription signatures in cDNA contain information on the position and nature of their causative RNA modifications. Data mining of, e.g. Illumina-based high-throughput sequencing data, is therefore fast growing in importance, and the field is still lacking effective tools. Here we present a versatile user-friendly graphical workflow system for modification calling based on machine learning. The workflow commences with a principal module for trimming, mapping, and postprocessing. The latter includes a quantification of mismatch and arrest rates with single-nucleotide re…
The impact of isolated lesions on white-matter fiber tracts in multiple sclerosis patients
Infratentorial lesions have been assigned an equivalent weighting to supratentorial plaques in the new McDonald criteria for diagnosing multiple sclerosis. Moreover, their presence has been shown to have prognostic value for disability. However, their spatial distribution and impact on network damage is not well understood. As a preliminary step in this study, we mapped the overall infratentorial lesion pattern in relapsing–remitting multiple sclerosis patients (N = 317) using MRI, finding the pons (lesion density, 14.25/cm³) and peduncles (13.38/cm³) to be predilection sites for infratentorial lesions. Based on these results, 118 fiber bundles from 15 healthy controls and a subgroup of 23 …
NOseq: amplicon sequencing evaluation method for RNA m6A sites after chemical deamination
Abstract Methods for the detection of m6A by RNA-Seq technologies are increasingly sought after. We here present NOseq, a method to detect m6A residues in defined amplicons by virtue of their resistance to chemical deamination, effected by nitrous acid. Partial deamination in NOseq affects all exocyclic amino groups present in nucleobases and thus also changes sequence information. The method uses a mapping algorithm specifically adapted to the sequence degeneration caused by deamination events. Thus, m6A sites with partial modification levels of ∼50% were detected in defined amplicons, and this threshold can be lowered to ∼10% by combination with m6A immunoprecipitation. NOseq faithfully d…
String kernels and high-quality data set for improved prediction of kinked helices in α-helical membrane proteins.
The reasons for distortions from optimal α-helical geometry are widely unknown, but their influences on structural changes of proteins are significant. Hence, their prediction is a crucial problem in structural bioinformatics. For the particular case of kink prediction, we generated a data set of 132 membrane proteins containing 1014 manually labeled helices and examined the environment of kinks. Our sequence analysis confirms the great relevance of proline and reveals disproportionately high occurrences of glycine and serine at kink positions. The structural analysis shows significantly different solvent accessible surface area mean values for kinked and nonkinked helices. More important, …
CorCast: A Distributed Architecture for Bayesian Epidemic Nowcasting and its Application to District-Level SARS-CoV-2 Infection Numbers in Germany
Timely information on current infection numbers during an epidemic is of crucial importance for decision makers in politics, medicine, and businesses. As information about local infection risk can guide public policy as well as individual behavior, such as the wearing of personal protective equipment or voluntary social distancing, statistical models providing such insights should be transparent and reproducible as well as accurate. Fulfilling these requirements is drastically complicated by the large amounts of data generated during exponential growth of infection numbers, and by the complexity of common inference pipelines. Here, we present CorCast – a stable and scalable distributed arch…
Locality-sensitive hashing enables signal classification in high-throughput mass spectrometry raw data at scale
Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: First, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Existing approaches for signal detection are usually not well suited for processing large amounts of data in parallel or rely on strong assumptions concerning the signals' properties. In this study, it is shown that locali…
NightShift: NMR shift inference by general hybrid model training - a framework for NMR chemical shift prediction
A dynamic program analysis to find floating-point accuracy problems
Programs using floating-point arithmetic are prone to accuracy problems caused by rounding and catastrophic cancellation. These phenomena provoke bugs that are notoriously hard to track down: the program does not necessarily crash and the results are not necessarily obviously wrong, but often subtly inaccurate. Further use of these values can lead to catastrophic errors. In this paper, we present a dynamic program analysis that supports the programmer in finding accuracy problems. Our analysis uses binary translation to perform every floating-point computation side by side in higher precision. Furthermore, we use a lightweight slicing approach to track the evolution of errors. We evaluate our…
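The idea of performing every floating-point computation side by side in higher precision can be illustrated at the language level. The following is a minimal Python sketch, not the binary-translation analysis described in the abstract: the `Shadowed` class and its method names are hypothetical, and exact rationals stand in for the higher-precision shadow values.

```python
from fractions import Fraction


class Shadowed:
    """A float paired with an exact rational shadow of the same computation."""

    def __init__(self, value, shadow=None):
        self.value = float(value)
        # Without an explicit shadow, start from the exact value of the float.
        self.shadow = Fraction(value) if shadow is None else shadow

    def __add__(self, other):
        return Shadowed(self.value + other.value, self.shadow + other.shadow)

    def __sub__(self, other):
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def __mul__(self, other):
        return Shadowed(self.value * other.value, self.shadow * other.shadow)

    def relative_error(self):
        """Relative deviation of the float result from the exact shadow."""
        if self.shadow == 0:
            return 0.0 if self.value == 0.0 else float("inf")
        return float(abs((Fraction(self.value) - self.shadow) / self.shadow))


# Catastrophic cancellation: the added 1.0 is lost to rounding in binary64,
# so the float result is 0.0 while the exact result is 1.
big = Shadowed(1e16)
one = Shadowed(1.0)
result = (big + one) - big
print(result.value, result.shadow, result.relative_error())
```

The program neither crashes nor produces an obviously wrong value, yet the relative error of the final result is 100%, which is the kind of silent inaccuracy the abstract refers to.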
Parallelized Clustering of Protein Structures on CUDA-Enabled GPUs
Estimation of the pose in which two given molecules might bind together to form a potential complex is a crucial task in structural biology. To solve this so-called "docking problem", most algorithms initially generate large numbers of candidate poses (or decoys) which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates ranges from thousands to millions, performing the clustering on standard CPUs is highly time consuming. In this paper we analyze and evaluate different approaches to parallelize the nearest neighbor chain algorithm to perform hierarchical Ward clustering of protein structures usin…
SKINK: a web server for string kernel based kink prediction in α-helices
Abstract Motivation: The reasons for distortions from optimal α-helical geometry are widely unknown, but their influences on structural changes of proteins are significant. Hence, their prediction is a crucial problem in structural bioinformatics. Here, we present a new web server, called SKINK, for string kernel based kink prediction. Extending our previous study, we also annotate the most probable kink position in a given α-helix sequence. Availability and implementation: The SKINK web server is freely accessible at http://biows-inf.zdv.uni-mainz.de/skink. Moreover, SKINK is a module of the BALL software, also freely available at www.ballview.org. Contact: benny.kneissl@roche.com
On the Applicability of Elastic Network Normal Modes in Small-Molecule Docking
Incorporating backbone flexibility into protein-ligand docking is still a challenging problem. In protein-protein docking, normal mode analysis (NMA) has become increasingly popular as it can be used to describe the collective motions of a biological system, but the question of whether NMA can also be useful in predicting the conformational changes observed upon small-molecule binding has only been addressed in a few case studies. Here, we describe a large-scale study on the applicability of NMA for protein-ligand docking using 433 apo/holo pairs of the Astex data sets. On the basis of sets of the first normal modes from the apo structure, we first generated for each paired holo structure a…
AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation
Sequence alignments are fundamental to bioinformatics which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not only makes them challenging to understand and extend, but also difficult to port to other platforms. We present AnySeq - a novel library for computing different types of pairwise alignments of DNA sequences. Our approach combines high performance with an intuitively understandable implementation, which is achieved through the concept of partial evaluation. Using the AnyDSL compiler framework, AnySeq enables the compilation of algorithmic variants that ar…
ProteinScanAR - An Augmented Reality Web Application for High School Education in Biomolecular Life Sciences
Understanding protein structures is a crucial step in creating molecular insight for researchers as well as students and pupils. The enormous scaling gap between an atomic point of view and objects in daily life hampers developing an intuitive relation between them. Especially for high school students, it can be difficult to understand the spatial relations of a protein structure. Due to lack of direct imaging techniques, molecules can only be explored by studying abstract molecular models. Here, the use of Augmented reality (AR) techniques has proven to strongly improve structural perception. In this work we present ProteinScanAR, an augmented reality framework for biomolecular education t…
The reverse transcription signature of N-1-methyladenosine in RNA-Seq is sequence dependent
The combination of Reverse Transcription (RT) and high-throughput sequencing has emerged as a powerful combination to detect modified nucleotides in RNA via analysis of either abortive RT-products or of the incorporation of mismatched dNTPs into cDNA. Here we simultaneously analyze both parameters in detail with respect to the occurrence of N-1-methyladenosine (m1A) in the template RNA. This naturally occurring modification is associated with structural effects, but it is also known as a mediator of antibiotic resistance in ribosomal RNA. In structural probing experiments with dimethylsulfate, m1A is routinely detected by RT-arrest. A specifically developed RNA-Seq protocol was tailored to …
Algorithms for the Maximum Weight Connected \(k\)-Induced Subgraph Problem
Finding differentially regulated subgraphs in a biochemical network is an important problem in bioinformatics. We present a new model for finding such subgraphs which takes the polarity of the edges (activating or inhibiting) into account, leading to the problem of finding a connected subgraph induced by \(k\) vertices with maximum weight. We present several algorithms for this problem, including dynamic programming on tree decompositions and integer linear programming. We compare the strength of our integer linear program to previous formulations of the \(k\)-cardinality tree problem. Finally, we compare the performance of the algorithms and the quality of the results to a previous approac…
CellLineNavigator: a workbench for cancer cell line analysis
The CellLineNavigator database, freely available at http://www.medicalgenomics.org/celllinenavigator, is a web-based workbench for large scale comparisons of a large collection of diverse cell lines. It aims to support experimental design in the fields of genomics, systems biology and translational biomedical research. Currently, this compendium holds genome wide expression profiles of 317 different cancer cell lines, categorized into 57 different pathological states and 28 individual tissues. To enlarge the scope of CellLineNavigator, the database was furthermore closely linked to commonly used bioinformatics databases and knowledge repositories. To ensure easy data access and search abili…
Integrated quantitative proteomic and transcriptomic analysis of lung tumor and control tissue: a lung cancer showcase
Proteomics analysis of paired cancer and control tissue can be applied to investigate pathological processes in tumors. Advancements in data-independent acquisition mass spectrometry allow for highly reproducible quantitative analysis of complex proteomic patterns. Optimized sample preparation workflows enable integrative multi-omics studies from the same tissue specimens. We performed ion mobility enhanced, data-independent acquisition MS to characterize the proteome of 21 lung tumor tissues including adenocarcinoma and squamous cell carcinoma (SCC) as compared to control lung tissues of the same patient each. Transcriptomic data were generated for the same specimens. The quantitative prot…
Competing salt effects on phase behavior of protein solutions: tailoring of protein interaction by the binding of multivalent ions and charge screening.
The phase behavior of protein solutions is affected by additives such as crowder molecules or salts. In particular, upon addition of multivalent counterions, a reentrant condensation can occur; i.e., protein solutions are stable for low and high multivalent ion concentrations but aggregating at intermediate salt concentrations. The addition of monovalent ions shifts the phase boundaries to higher multivalent ion concentrations. This effect is found to be reflected in the protein interactions, as accessed via small-angle X-ray scattering. Two simulation schemes (a Monte Carlo sampling of the counterion binding configurations using the detailed protein structure and an analytical coarse-grain…
Automatic shape detection of ice crystals
Abstract Clouds have a crucial impact on the energy balance of the Earth-Atmosphere system. They can cool the system by partly reflecting or scattering the incoming solar radiation (albedo effect); moreover, thermal radiation emitted from the Earth's surface can be absorbed and partly re-emitted by clouds, leading to a warming of the atmosphere (greenhouse effect). The effectiveness of both effects crucially depends on the size and the shape of a cloud's particulate constituents, i.e. liquid water droplets or solid ice crystals. For studying cloud microphysics, in situ measurements on board aircraft are commonly used. An important class of measurement techniques comprises optical ar…
Efficient computation of root mean square deviations under rigid transformations
The computation of root mean square deviations (RMSD) is an important step in many bioinformatics applications. If approached naively, each RMSD computation takes time linear in the number of atoms. In addition, a careful implementation is required to achieve numerical stability, which further increases runtimes. In practice, the structural variations under consideration are often induced by rigid transformations of the protein, or are at least dominated by a rigid component. In this work, we show how RMSD values resulting from rigid transformations can be computed in constant time from the protein's covariance matrix, which can be precomputed in linear time. As a typical application scenar…
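The constant-time claim can be made concrete by expanding the squared norm. For a rotation \(R\) and translation \(t\), the squared RMSD between \(x_i\) and \(R x_i + t\) equals \(\operatorname{tr}((R - I) M (R - I)^T) + 2 t^T (R - I) \mu + t^T t\), where \(\mu\) is the mean and \(M\) the second-moment matrix of the coordinates. The sketch below follows this expansion; it is an illustration under that assumption, not necessarily the formulation used in the paper, it does not address the numerical stability concerns mentioned above, and all function names are hypothetical.

```python
import numpy as np


def precompute_moments(coords):
    """Linear-time pass over an (n, 3) array: mean and second-moment matrix."""
    mean = coords.mean(axis=0)
    second_moment = coords.T @ coords / len(coords)
    return mean, second_moment


def rmsd_rigid(mean, second_moment, rotation, translation):
    """RMSD between x_i and rotation @ x_i + translation, independent of n."""
    delta = rotation - np.eye(3)
    squared = (
        np.trace(delta @ second_moment @ delta.T)
        + 2.0 * translation @ delta @ mean
        + translation @ translation
    )
    # Guard against tiny negative values caused by rounding.
    return float(np.sqrt(max(squared, 0.0)))


def rmsd_naive(coords, rotation, translation):
    """Reference implementation that visits every atom."""
    moved = coords @ rotation.T + translation
    return float(np.sqrt(np.mean(np.sum((moved - coords) ** 2, axis=1))))


rng = np.random.default_rng(0)
coords = rng.normal(scale=10.0, size=(1000, 3))
angle = 0.3
rotation = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
translation = np.array([1.0, -2.0, 0.5])
mean, second_moment = precompute_moments(coords)
print(rmsd_rigid(mean, second_moment, rotation, translation))
print(rmsd_naive(coords, rotation, translation))
```

After the single precomputation pass, each further transformation costs only a handful of 3×3 operations, which is useful when many candidate poses of the same structure are compared.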
NESSie.jl – Efficient and intuitive finite element and boundary element methods for nonlocal protein electrostatics in the Julia language
Abstract The development of scientific software can be generally characterized by an initial phase of rapid prototyping and the subsequent transition to computationally efficient production code. Unfortunately, most programming languages are not well-suited for both tasks at the same time, commonly resulting in a considerable extension of the development time. The cross-platform and open-source Julia language aims at closing the gap between prototype and production code by providing a usability comparable to Python or MATLAB alongside high-performance capabilities known from C and C++ in a single programming language. In this paper, we present efficient protein electrostatics computations a…
Polish is quantitatively different on quartzite flakes used on different worked materials.
Metrology has been successfully used in the last decade to quantify use-wear on stone tools. Such techniques have been mostly applied to fine-grained rocks (chert), while studies on coarse-grained raw materials have been relatively infrequent. In this study, confocal microscopy was employed to investigate polished surfaces on a coarse-grained lithology, quartzite. Wear originating from contact with five different worked materials was classified in a data-driven approach using machine learning. Two different classifiers, a decision tree and a support-vector machine, were used to assign the different textures to a worked material based on a selected number of parameters (Mean density of furr…
CARE: context-aware sequencing read error correction.
Abstract Motivation Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results We present CARE—an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections which enables fast computation of high-quality multiple alignments. Sequencing errors ar…
Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [Python analysis]
This upload includes the following files related to the Python analysis: - Raw data as an XLSX table (brushing_v2.xlsx), i.e. results from R Script #1 (see https://doi.org/10.5281/zenodo.3632517) - Python script of the whole analysis (RunEveryParameter.py) - Convenience script for running RunEveryParameter.py in background and logging all output (RunSingleParametesBash.sh) - Log file for output of sampling from the model for each parameter in a loop (logAll.txt) - Jupyter notebooks of the analysis run on epLsar as an example (Notebook_SingleParameter.inpyb) and of a summary of the whole analysis (Notebook_Overview.ipynb), plus associated HTML output files (*.html) - For each parameter: Full samples of p…
Polish is quantitatively different on quartzite flakes used on different worked materials [ConfoMap analysis]
Each surface has been processed with two templates: 1) Extract two 50x50 µm sub-areas and extract topography layer from each sub-area. Export sub-areas as SUR files. File names start with "A35" or "VSH4". 2) Process all extracted sub-areas for quantitative analysis. File names start with "processing-quartzite-final". All ConfoMap templates are saved in MNT format (including all original and processed surfaces, as well as results). Each template has also been exported to a PDF file. Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952 Additionally, the results of the second template are collated into "proce…
Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [R analysis]
This upload includes the following files related to the R analysis: - Raw data as a CSV table (brushing_v2.csv), i.e. results from the ConfoMap analysis (see https://doi.org/10.5281/zenodo.3632490) - RStudio project (Brushing_project.Rproj) - R scripts as R Markdown files (*.Rmd) - Output from R scripts knitted to HTML files (*.html) - A text file containing the version of RStudio used (RStudioVersion.txt) Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952
Polish is quantitatively different on quartzite flakes used on different worked materials [Python analysis]
This upload includes the following files related to the Python analysis: 1. Raw data as an XLSX table (processing-quartzite-final-2020-04-29.xlsx) is the output from R Script #1 (see https://doi.org/10.5281/zenodo.3979139), even though the filename is slightly different. Plus, for each analysis (full and restricted datasets), included in the corresponding ZIP archive: 2. Jupyter notebooks of the analysis (Classification_RandSplitFeature_Revision_VXX.ipynb) rendered to HTML file (Classification_RandSplitFeature_Revision_VXX.html) 3. Dataframe including the artificially filled datapoints 4. Output of the analysis as PDF: • Confusion matrices ("CM"…
Evaluating the microscopic effect of brushing stone tools as a cleaning procedure [ConfoMap analysis]
ConfoMap templates for each surface in MNT format (including all original and processed surfaces, as well as results). Each template has also been exported to a PDF file. Additionally, results are collated into 'brushing_v2.csv' Instructions to download all files at once are given here: https://doi.org/10.5281/zenodo.4011952
Polish is quantitatively different on quartzite flakes used on different worked materials [R analysis]
This upload includes the following files related to the R analysis: - Raw data as a CSV table (processing-quartzite-final.csv), i.e. results from the ConfoMap analysis (see https://doi.org/10.5281/zenodo.3979116) - RStudio project (Quantification quartzite final.Rproj) - R scripts as R Markdown files (*.Rmd) - R scripts knitted to HTML files (*.html) - An R script (RStudioVersion.R) to write the used version of RStudio to a text file (RStudioVersion.txt) - Output from script #1: processing-quartzite-final.Rbin and processing-quartzite-final.xlsx - Output from script #2: processing-quartzite-final_summary-stats.xlsx - Output from script #3: all plots as PDF files. Note that for running the s…