Hi, I'm Thomas. I am a graduate student in Theoretical Computer Science with Rasmus Pagh, since 2015 at the IT University of Copenhagen.
My research has primarily involved the theoretical foundations of massive data, similarity search (which is also my primary funding through Scalable Similarity Search), high dimensional geometry, communication and conditional lower bounds, sublinear algorithms and intrinsic dimensionality.
Previously I have been at University of Copenhagen and University of Oxford.
Publications

TA — 2017
Optimal Las Vegas Locality Sensitive Data Structures. Foundations of Computer Science, (pdf, arxiv)We show that approximate similarity (near neighbour) search can be solved in high dimensions with performance matching state of the art (data independent) Locality Sensitive Hashing, but with a guarantee of no false negatives. Specifically we give two data structures for common problems. For capproximate near neighbour in Hamming space, for which we get query time dn^{1/c+o(1)} and space dn^{1+1/c+o(1)} matching that of [Indyk and Motwani, 1998] and answering a long standing open question from [Indyk, 2000a] and [Pagh, 2016] in the affirmative. For $(s1,s2)$approximate Jaccard similarity we get query time d^2n^{ρ+o(1)} and space d^2n^{1+ρ+o(1)}, ρ= [log (1+s1)/(2s1)]/[log (1+s2)/(2s2)], when sets have equal size, matching the performance of [Pagh and Christiani, 2017]. We use space partitions as in classic LSH, but construct these using a combination of brute force, tensoring and splitter functions à la~\cite{naor1995splitters}. We also show two dimensionality reduction lemmas with 1sided error. 
TA, M Aumüller, R Pagh — 2017
Parameterfree Locality Sensitive Hashing for Spherical Range Reporting. Proceedings of Symposium on Discrete Algorithms, (pdf, arxiv, slides)We present a data structure for spherical range reporting on a point set S, i.e., reporting all points in S that lie within radius r of a given query point q. Our solution builds upon the LocalitySensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While traditional LSH data structures have several parameters whose optimal values depend on the distance distribution from q to the points of S, our data structure is parameterfree, except for the space usage, which is configurable by the user. Nevertheless, its expected query time basically matches that of an LSH data structure whose parameters have been optimally chosen for the data and query in question under the given space constraints. In particular, our data structure provides a smooth tradeoff between hard queries (typically addressed by standard LSH) and easy queries such as those where the number of points to report is a constant fraction of S, or where almost all points in S are far away from the query point. In contrast, known data structures fix LSH parameters based on certain parameters of the input alone.
The algorithm has expected query time bounded by O(t(n/t)^ρ), where t is the number of points to report and ρ∈(0,1) depends on the data distribution and the strength of the LSH family used. We further present a parameterfree way of using multiprobing, for LSH families that support it, and show that for many such families this approach allows us to get expected query time close to O(n^ρ+t), which is the best we can hope to achieve using LSH. The previously best running time in high dimensions was Ω(tn^ρ). For many data distributions where the intrinsic dimensionality of the point set close to q is low, we can give improved upper bounds on the expected query time. 
TA, R Pagh, I Razenshteyn, F Silvestri — 2016
On the Complexity of Inner Product Similarity Join. Symposium on Principles of Database Systems, (pdf, arxiv, slides)A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric localitysensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:
* Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.
* New upper and lower bounds for (A)LSHbased algorithms. In particular, we show that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements.
* A new indexing method for IPS based on linear sketches, implying that our hardness results are not far from being tight.
Our technical contributions include new asymmetric embeddings that may be of independent interest. At the conceptual level we strive to provide greater clarity, for example by distinguishing among signed and unsigned variants of IPS join and shedding new light on the effect of asymmetry.
Media
 Stibo  August 2016
 Pressreader  December 2015
 Computerworld  June 2015
Code
 mtime  helps managing the ITU time management
 PyChess  a chess engine and Internet chess client
 Sunfish  a minimalist chess engine
 numberlink  a fast solver for numberlink puzzles