AUTHOR

Eduardo L. Pasiliao

Multitask deep learning for native language identification

Identifying the native language of a person from text they have written in English (L1 identification) plays an important role in tasks such as authorship profiling and identification. With the current proliferation of misinformation in social media, these methods are especially topical. Most studies in this field have focused on the development of supervised classification algorithms that are trained on a single L1 dataset. Although multiple labeled datasets are available for L1 identification, they contain texts authored by speakers of different languages and do not completely overlap. Current approaches achieve high accuracy on available datasets, but this is attained by training an individua…

research product

Exploring social media network landscape of post-Soviet space

The “post-Soviet space” consists of countries with a substantial fraction of the world’s population; however, unlike many other regions, its social media network landscape is still somewhat under-explored. This paper aims to fill this gap. To this end, we use anonymized data on user friendships at VK.com (also known as VKontakte and, informally, as “Russian Facebook”), which is the largest and most popular social media portal in the post-Soviet space, with hundreds of millions of user accounts. Using the VK network snapshots from October 2015 to December 2016, we conduct a “multiscale” empirical study of this network by considering conn…

research product

Graph-based exploration and clustering analysis of semantic spaces

The goal of this study is to demonstrate how network science and graph theory tools and concepts can be effectively used for exploring and comparing semantic spaces of word embeddings and lexical databases. Specifically, we construct semantic networks based on word2vec representation of words, which is “learnt” from large text corpora (Google news, Amazon reviews), and “human built” word networks derived from the well-known lexical databases: WordNet and Moby Thesaurus. We compare “global” (e.g., degrees, distances, clustering coefficients) and “local” (e.g., most central nodes and community-type dense clusters) characteristics of considered networks. Our observations suggest that …

research product

Analysis of Viral Advertisement Re-Posting Activity in Social Media

More and more businesses use social media to advertise their services. Such businesses typically maintain online social network accounts and regularly update their pages with advertisement messages describing new products and promotions. One recent trend in such businesses’ activity is to offer incentives to individual users for re-posting the advertisement messages to their own profiles, thus making them visible to more and more users. A common type of incentive puts all the re-posting users into a random draw for a valuable gift. Understanding the dynamics of user engagement in the re-posting activity can shed light on social influence mechanisms and help determine the optimal incentiv…

research product

Network-based indices of individual and collective advising impacts in mathematics

Advising and mentoring Ph.D. students is an increasingly important aspect of the academic profession. We define and interpret a family of metrics (collectively referred to as “a-indices”) that can potentially be applied to “ranking academic advisors” using the academic genealogical records of scientists, with the emphasis on taking into account not only the number of students advised by an individual, but also subsequent academic advising records of those students. We also define and calculate the extensions of the proposed indices that account for student co-advising (referred to as “adjusted a-indices”). In addition, we extend some of the proposed metrics to ranking universities a…

research product

Neural Networks with Multidimensional Cross-Entropy Loss Functions

Deep neural networks have emerged as an effective machine learning tool successfully applied to many tasks, such as misinformation detection, natural language processing, image recognition, and machine translation. Neural networks are often applied to binary or multi-class classification problems. In these settings, cross-entropy is used as a loss function for neural network training. In this short note, we propose an extension of the concept of cross-entropy, referred to as multidimensional cross-entropy, and its application as a loss function for classification using neural networks. The presented computational experiments on a benchmark dataset suggest that the proposed approaches may …

research product

Sampled Fictitious Play on Networks

We formulate and solve the problem of optimizing the structure of an information propagation network between multiple agents. In a given space of interests (e.g., information on certain targets), each agent is defined by a vector of their desirable information, called filter, and a vector of available information, called source. The agents seek to build a directed network that maximizes the value of the desirable source-information that reaches each agent having been filtered en route, less the expense that each agent incurs in filtering any information of no interest to them. We frame this optimization problem as a game of common interest, where the Nash equilibria can be attained as limit…

research product

Engagement as a Driver of Growth of Online Health Forums: Observational Study

Background: The emerging research on nurturing the growth of online communities posits that such growth is in part attributable to network effects, wherein every increase in the volume of user-generated content increases the value of the community in the eyes of its potential new members. The recently introduced metric engagement capacity offers a means of quantitatively assessing the ability of online platform users to engage each other in generating content; meanwhile, the quantity engagement value is useful for quantifying communication-based platform use. If the claim that higher engagement leads to accelerated growth holds true for online health forums (OHFs), then engagement tracking should be…

research product