Search results for " set"
showing 10 items of 2095 documents
Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts?
2021
Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large number of annotated data so that a trained network can generalize well. Unfortunately, the process of having a large number of manually curated images by medical experts is both slow and utterly expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with dif…
Mislabel Detection of Finnish Publication Ranks
2019
The paper proposes to analyze a data set of Finnish ranks of academic publication channels with Extreme Learning Machine (ELM). The purpose is to introduce and test recently proposed ELM-based mislabel detection approach with a rich set of features characterizing a publication channel. We will compare the architecture, accuracy, and, especially, the set of detected mislabels of the ELM-based approach to the corresponding reference results on the reference paper.
On the Convergence of Tsetlin Machines for the IDENTITY- and NOT Operators
2020
The Tsetlin Machine (TM) is a recent machine learning algorithm with several distinct properties, such as interpretability, simplicity, and hardware-friendliness. Although numerous empirical evaluations report on its performance, the mathematical analysis of its convergence is still open. In this article, we analyze the convergence of the TM with only one clause involved for classification. More specifically, we examine two basic logical operators, namely, the "IDENTITY"- and "NOT" operators. Our analysis reveals that the TM, with just one clause, can converge correctly to the intended logical operator, learning from training data over an infinite time horizon. Besides, it can capture arbit…
Dual Extrapolation for Sparse Generalized Linear Models
2020
International audience; Generalized Linear Models (GLM) form a wide class of regression and classification models, where prediction is a function of a linear combination of the input variables. For statistical inference in high dimension, sparsity inducing regularizations have proven to be useful while offering statistical guarantees. However, solving the resulting optimization problems can be challenging: even for popular iterative algorithms such as coordinate descent, one needs to loop over a large number of variables. To mitigate this, techniques known as screening rules and working sets diminish the size of the optimization problem at hand, either by progressively removing variables, o…
A survey of active learning algorithms for supervised remote sensing image classification
2011
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership and then the user is asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active …
Improving table compression with combinatorial optimization
2002
We study the problem of compressing massive tables within the partition-training paradigm introduced by Buchsbaum et al. [SODA'00], in which a table is partitioned by an off-line training procedure into disjoint intervals of columns, each of which is compressed separately by a standard, on-line compressor like gzip. We provide a new theory that unifies previous experimental observations on partitioning and heuristic observations on column permutation, all of which are used to improve compression rates. Based on the theory, we devise the first on-line training algorithms for table compression, which can be applied to individual files, not just continuously operating sources; and also a new, …
Monoids and Maximal Codes
2011
In recent years codes that are not Uniquely Decipherable (UD) are been studied partitioning them in classes that localize the ambiguities of the code. A natural question is how we can extend the notion of maximality to codes that are not UD. In this paper we give an answer to this question. To do this we introduce a partial order in the set of submonoids of a monoid showing the existence, in this poset, of maximal elements that we call full monoids. Then a set of generators of a full monoid is, by definition, a maximal code. We show how this definition extends, in a natural way, the existing definition concerning UD codes and we find a characteristic property of a monoid generated by a maxi…
Quantum, stochastic, and pseudo stochastic languages with few states
2014
Stochastic languages are the languages recognized by probabilistic finite automata (PFAs) with cutpoint over the field of real numbers. More general computational models over the same field such as generalized finite automata (GFAs) and quantum finite automata (QFAs) define the same class. In 1963, Rabin proved the set of stochastic languages to be uncountable presenting a single 2-state PFA over the binary alphabet recognizing uncountably many languages depending on the cutpoint. In this paper, we show the same result for unary stochastic languages. Namely, we exhibit a 2-state unary GFA, a 2-state unary QFA, and a family of 3-state unary PFAs recognizing uncountably many languages; all th…
Separations in Query Complexity Based on Pointer Functions
2015
In 1986, Saks and Wigderson conjectured that the largest separation between deterministic and zero-error randomized query complexity for a total boolean function is given by the function $f$ on $n=2^k$ bits defined by a complete binary tree of NAND gates of depth $k$, which achieves $R_0(f) = O(D(f)^{0.7537\ldots})$. We show this is false by giving an example of a total boolean function $f$ on $n$ bits whose deterministic query complexity is $\Omega(n/\log(n))$ while its zero-error randomized query complexity is $\tilde O(\sqrt{n})$. We further show that the quantum query complexity of the same function is $\tilde O(n^{1/4})$, giving the first example of a total function with a super-quadra…
Enumeration and Structure of Trapezoidal Words
2013
Trapezoidal words are words having at most $n+1$ distinct factors of length $n$ for every $n\ge 0$. They therefore encompass finite Sturmian words. We give combinatorial characterizations of trapezoidal words and exhibit a formula for their enumeration. We then separate trapezoidal words into two disjoint classes: open and closed. A trapezoidal word is closed if it has a factor that occurs only as a prefix and as a suffix; otherwise it is open. We investigate open and closed trapezoidal words, in relation with their special factors. We prove that Sturmian palindromes are closed trapezoidal words and that a closed trapezoidal word is a Sturmian palindrome if and only if its longest repeated …