Search results for "DATABASES"
showing 10 items of 937 documents
FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases.
2016
The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automated derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or high threshold of sequence identity. We compressed 50…
Automated selection of homologs to track the evolutionary history of proteins
2018
Background: The selection of distant homologs of a query protein under study is a usual and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitates static browsing for orthologs. However, these resources have a limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give …
The Human Proteome Organization–Proteomics Standards Initiative Quality Control Working Group: Making quality control more accessible for biological …
2017
To have confidence in results acquired during biological mass spectrometry experiments, a systematic approach to quality control is of vital importance. Nonetheless, until now, only scattered initiatives have been undertaken to this end, and these individual efforts have often not been complementary. To address this issue, the Human Proteome Organization–Proteomics Standards Initiative established a new working group on quality control at its meeting in the spring of 2016. The goal of this working group is to provide a unifying framework for quality control data. The initial focus will be on providing a community-driven standardized file format for quality control. For this purpose, the…
Proteomics Standards Initiative: Fifteen Years of Progress and Future Work.
2017
Abstract: The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) has now been developing and promoting open community standards and software tools in the field of proteomics for 15 years. Under the guidance of the chair, co-chairs, and other leadership positions, the PSI working groups are tasked with the development and maintenance of community standards via special workshops and ongoing work. Among the existing, ratified standards, the PSI working groups continue to update PSI-MI XML, MITAB, mzML, mzIdentML, mzQuantML, mzTab, and the MIAPE (Minimum Information About a Proteomics Experiment) guidelines with the advance of new technologies and techniques. Furthe…
MODOMICS: a database of RNA modification pathways. 2017 update
2017
Abstract: MODOMICS is a database of RNA modifications that provides comprehensive information concerning the chemical structures of modified ribonucleosides, their biosynthetic pathways, the location of modified residues in RNA sequences, and RNA-modifying enzymes. In the current database version, we included the following new features and data: extended mass spectrometry and liquid chromatography data for modified nucleosides; links between human tRNA sequences and MINTbase - a framework for the interactive exploration of mitochondrial and nuclear tRNA fragments; new, machine-friendly system of unified abbreviations for modified nucleoside names; sets of modified tRNA sequences for two bact…
Sentinel hospital-based surveillance for norovirus infection in children with gastroenteritis between 2015 and 2016 in Italy
2018
Noroviruses are one of the leading causes of gastro-enteric diseases worldwide in all age groups. Novel epidemic noroviruses with GII.P16 polymerase and GII.2 or GII.4 capsid type emerged worldwide in late 2015 and in 2016. We performed a molecular epidemiological study of the noroviruses circulating in Italy to investigate the emergence of new norovirus strains. Sentinel hospital-based surveillance, in three different Italian regions, revealed increased prevalence of norovirus infection in children (<15 years) in 2016 (14.4% versus 9.8% in 2015) and the emergence of GII.P16 strains in late 2016, which accounted for 23.0% of norovirus infections. The majority of the strains with a GII.…
Newly Digitized Database Reveals the Lives and Families of Forced Migrants from Finnish Karelia
2017
Studies on displaced persons often suffer from a lack of data on the long-term effects of forced migration. A register created during the 1960s and published as a book series ‘Siirtokarjalaisten tie’ in 1970 documented the lives of individuals who fled the southern Karelian district of Finland after its first and second occupation by the Soviet Union in 1940 and 1944. To realize the potential value of these data for scientific research, we have recently scanned the register using optical character recognition (OCR) software, and developed proprietary computer code to extract these data. Here we outline the steps involved in the digitization process, and present an overview of the Migration Kare…
RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures
2017
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by a…
Fragments of peer review: A quantitative analysis of the literature (1969-2015)
2018
This paper examines research on peer review between 1969 and 2015 by looking at records indexed from the Scopus database. Although it is often argued that peer review has been poorly investigated, we found that the number of publications in this field doubled from 2005. Half of this work was indexed as research articles, a third as editorial notes and literature reviews, and the rest were book chapters or letters. We identified the most prolific and influential scholars, the most cited publications and the most important journals in the field. Co-authorship network analysis showed that research on peer review is fragmented, with the largest group of co-authors including only 2.1% of the wh…
Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data
2017
Abstract: The epigenetics landscape of cells plays a key role in the establishment of cell-type specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…