数据结构与算法学术速递[1.10]
Update!H5支持摘要折叠,体验更佳!点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!
cs.DS数据结构与算法,共计3篇
【1】 The Efficiency of the ANS Entropy Encoding
标题:ANS熵编码的效率分析
链接:https://arxiv.org/abs/2201.02514
备注:15 pages, 5 figures, 2 algorithms
摘要:The Asymmetric Numeral Systems (ANS) is a class of entropy encoders by Duda
that had an immense impact on the data compression, substituting arithmetic and
Huffman coding. The optimality of ANS was studied by Duda et al. but the
precise asymptotic behaviour of its redundancy (in comparison to the entropy)
was not completely understood. In this paper we establish an optimal bound on
the redundancy for the tabled ANS (tANS), the most popular ANS variant. Given a
sequence $a_1,\ldots,a_n$ of letters from an alphabet $\{0,\ldots,\sigma-1\}$
such that each letter $a$ occurs in it $f_a$ times and $n=2^r$, the tANS
encoder using Duda's ``precise initialization'' to fill tANS tables transforms
this sequence into a bit string of length (frequencies are not included in the
encoding size): $$ \sum\limits_{a\in
[0..\sigma)}f_a\cdot\log\frac{n}{f_a}+O(\sigma+r), $$ where $O(\sigma + r)$ can
be bounded by $\sigma\log e+r$. The $r$-bit term is an encoder artifact
indispensable to ANS; the rest incurs a redundancy of $O(\frac{\sigma}{n})$
bits per letter. We complement this bound by a series of examples showing that
an $\Omega(\sigma+r)$ redundancy is necessary when $\sigma > n/3$, where
$\Omega(\sigma + r)$ is at least $\frac{\sigma-1}{4}+r-2$. We argue that
similar examples exist for any methods that distribute letters in tANS tables
using only the knowledge about frequencies. Thus, we refute Duda's conjecture
that the redundancy is $O(\frac{\sigma}{n^2})$ bits per letter.
We also propose a new variant of range ANS (rANS), called rANS with fixed
accuracy, that is parameterized by $k \ge 1$. In this variant the integer
division, which is unavoidable in rANS, is performed only in cases when its
result belongs to $[2^k..2^{k+1})$. Hence, the division can be computed by
faster methods provided $k$ is small. We bound the redundancy for the rANS with
fixed accuracy $k$ by $\frac{n}{2^k-1}\log e+r$.
【2】 k-Center Clustering with Outliers in Sliding Windows
标题:滑动窗口中带离群点的K-中心聚类
链接:https://arxiv.org/abs/2201.02448
摘要:Metric $k$-center clustering is a fundamental unsupervised learning
primitive. Although widely used, this primitive is heavily affected by noise in
the data, so that a more sensible variant seeks for the best solution that
disregards a given number $z$ of points of the dataset, called outliers. We
provide efficient algorithms for this important variant in the streaming model
under the sliding window setting, where, at each time step, the dataset to be
clustered is the window $W$ of the most recent data items. Our algorithms
achieve $O(1)$ approximation and, remarkably, require a working memory linear
in $k+z$ and only logarithmic in $|W|$. As a by-product, we show how to
estimate the effective diameter of the window $W$, which is a measure of the
spread of the window points, disregarding a given fraction of noisy distances.
We also provide experimental evidence of the practical viability of our
theoretical results.
【3】 Fixation Maximization in the Positional Moran Process
标题:位置性Moran过程中的注视最大化
链接:https://arxiv.org/abs/2201.02248
备注:11 pages, 6 figures, to appear at AAAI 2022
摘要:The Moran process is a classic stochastic process that models invasion
dynamics on graphs. A single "mutant" (e.g., a new opinion, strain, social
trait etc.) invades a population of residents spread over the nodes of a graph.
The mutant fitness advantage $\delta\geq 0$ determines how aggressively mutants
propagate to their neighbors. The quantity of interest is the fixation
probability, i.e., the probability that the initial mutant eventually takes
over the whole population. However, in realistic settings, the invading mutant
has an advantage only in certain locations. E.g., a bacterial mutation allowing
for lactose metabolism only confers an advantage on places where dairy products
are present. In this paper we introduce the positional Moran process, a natural
generalization in which the mutant fitness advantage is only realized on
specific nodes called active nodes. The associated optimization problem is
fixation maximization: given a budget $k$, choose a set of $k$ active nodes
that maximize the fixation probability of the invading mutant. We show that the
problem is NP-hard, while the optimization function is not submodular, thus
indicating strong computational hardness. Then we focus on two natural limits.
In the limit of $\delta\to\infty$ (strong selection), although the problem
remains NP-hard, the optimization function becomes submodular and thus admits a
constant-factor approximation using a simple greedy algorithm. In the limit of
$\delta\to 0$ (weak selection), we show that in $O(m^\omega)$ time we can
obtain a tight approximation, where $m$ is the number of edges and $\omega$ is
the matrix-multiplication exponent. Finally, we present an experimental
evaluation of the new algorithms together with some proposed heuristics.
机器翻译,仅供参考
点击“阅读原文”获取带摘要的学术速递