Hi, I'm Thomas. I am a Postdoctoral researcher in Theoretical Computer Science at the IT University of Copenhagen and at the Basic Algorithms Research group (BARC). I did my PhD with Rasmus Pagh on the Scalable Similarity Search project, and previously I have been at University of Texas at Austin, and University of Oxford.
My research has primarily involved the theoretical foundations of massive data, similarity search, high dimensional geometry, sketching and derandomization. See also my research program.
News
 17/jun/19
 Just defended my PhD! (slides) Thanks to everyone who came and to Jelani Nelson, Riko Jacob and Inge Li Gørtz for being on the comity!
 1/feb/19
 The date for handing in my PhD. is now set at April 1st. I am interested in hearing about Post Doc positions anywhere in the world.
 1/oct/18
 I have decided to leave SupWiz and return to Academia. Looking forward to working with everyone!
 1/sep/17
 I am taking a 1 year sabbatical to start up a new Natural Language Processing company, SupWiz, with former research colleagues.
Publications

TA — 2019
Subsets and Supermajorities: Unifying Hashingbased Set Similarity Search. Submitted, (pdf, arxiv, slides)We consider the problem of designing Locality Sensitive Filters (LSF) for set overlaps, also known as maximum inner product search on binary data. We give a simple data structure that generalizes and outperforms previous algorithms such as MinHash [J. Discrete Algorithms 1998], SimHash [STOC 2002], Spherical LSF [SODA 2017] and Chosen Path [STOC 2017]; and we show matching lower bounds using hypercontractive inequalities for a wide range of space/time tradeoffs. This answers the main open question in Christiani and Pagh [STOC 2017] on unifying the landscape of Locality Sensitive (nondatadependent) set similarity search. 
TA, M Kapralov, J Knudsen, R Pagh, A Velingker, D Woodruff, A Zandieh — 2019
Oblivious Sketching of HighDegree Polynomial Kernels. Submitted, ()Kernel methods are fundamental tools in machine learning that allow detection of nonlinear dependencies between data without explicitly constructing feature vectors in high dimensional spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at least) quadratic time, as kernel matrices are usually dense. Some methods for speeding up kernel linear algebra are known, but they all invariably take time exponential in either the dimension of the input point set (e.g., fast multipole methods suffer from the curse of dimensionality) or in the degree of the kernel function. Oblivious sketching has emerged as a powerful approach to speeding up numerical linear algebra over the past decade, but our understanding of oblivious sketching solutions for kernel matrices has remained quite limited, suffering from the aforementioned exponential dependence on input parameters. Our main contribution is a general method for applying sketching solutions developed in numerical linear algebra over the past decade to a tensoring of data points without forming the tensoring explicitly. This leads to the first oblivious sketch for the polynomial kernel with a target dimension that is only polynomially dependent on the degree of the kernel function, as well as the first oblivious sketch for the Gaussian kernel on bounded datasets that does not suffer from an exponential dependence on the dimensionality of input data points. 
TA — 2017, Updated Jun 2018
Optimal Las Vegas Locality Sensitive Data Structures. Foundations of Computer Science, (pdf, arxiv, slides)We show that approximate similarity (near neighbour) search can be solved in high dimensions with performance matching state of the art (data independent) Locality Sensitive Hashing, but with a guarantee of no false negatives. Specifically we give two data structures for common problems. For capproximate near neighbour in Hamming space, for which we get query time dn^{1/c+o(1)} and space dn^{1+1/c+o(1)} matching that of [Indyk and Motwani, 1998] and answering a long standing open question from [Indyk, 2000a] and [Pagh, 2016] in the affirmative. For $(s1,s2)$approximate Jaccard similarity we get query time d^2n^{ρ+o(1)} and space d^2n^{1+ρ+o(1)}, ρ= [log (1+s1)/(2s1)]/[log (1+s2)/(2s2)], when sets have equal size, matching the performance of [Pagh and Christiani, 2017]. We use space partitions as in classic LSH, but construct these using a combination of brute force, tensoring and splitter functions à la [Naor et al., 1995]. We also show two dimensionality reduction lemmas with 1sided error. 
TA, M Aumüller, R Pagh — 2017
Parameterfree Locality Sensitive Hashing for Spherical Range Reporting. Proceedings of Symposium on Discrete Algorithms, (pdf, arxiv, slides)We present a data structure for spherical range reporting on a point set S, i.e., reporting all points in S that lie within radius r of a given query point q. Our solution builds upon the LocalitySensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While traditional LSH data structures have several parameters whose optimal values depend on the distance distribution from q to the points of S, our data structure is parameterfree, except for the space usage, which is configurable by the user. Nevertheless, its expected query time basically matches that of an LSH data structure whose parameters have been optimally chosen for the data and query in question under the given space constraints. In particular, our data structure provides a smooth tradeoff between hard queries (typically addressed by standard LSH) and easy queries such as those where the number of points to report is a constant fraction of S, or where almost all points in S are far away from the query point. In contrast, known data structures fix LSH parameters based on certain parameters of the input alone.
The algorithm has expected query time bounded by O(t(n/t)^ρ), where t is the number of points to report and ρ∈(0,1) depends on the data distribution and the strength of the LSH family used. We further present a parameterfree way of using multiprobing, for LSH families that support it, and show that for many such families this approach allows us to get expected query time close to O(n^ρ+t), which is the best we can hope to achieve using LSH. The previously best running time in high dimensions was Ω(tn^ρ). For many data distributions where the intrinsic dimensionality of the point set close to q is low, we can give improved upper bounds on the expected query time. 
TA, R Pagh, I Razenshteyn, F Silvestri — 2016
On the Complexity of Inner Product Similarity Join. Symposium on Principles of Database Systems, (pdf, arxiv, slides)A number of tasks in classification, information retrieval, recommendation systems, and record linkage reduce to the core problem of inner product similarity join (IPS join): identifying pairs of vectors in a collection that have a sufficiently large inner product. IPS join is well understood when vectors are normalized and some approximation of inner products is allowed. However, the general case where vectors may have any length appears much more challenging. Recently, new upper bounds based on asymmetric localitysensitive hashing (ALSH) and asymmetric embeddings have emerged, but little has been known on the lower bound side. In this paper we initiate a systematic study of inner product similarity join, showing new lower and upper bounds. Our main results are:
* Approximation hardness of IPS join in subquadratic time, assuming the strong exponential time hypothesis.
* New upper and lower bounds for (A)LSHbased algorithms. In particular, we show that asymmetry can be avoided by relaxing the LSH definition to only consider the collision probability of distinct elements.
* A new indexing method for IPS based on linear sketches, implying that our hardness results are not far from being tight.
Our technical contributions include new asymmetric embeddings that may be of independent interest. At the conceptual level we strive to provide greater clarity, for example by distinguishing among signed and unsigned variants of IPS join and shedding new light on the effect of asymmetry.
Manuscripts

We construct a structured Johnson Lindenstrauss transformation that can be applied to simple tensors on the form x = x^(1) ⊗ ... ⊗ x^(c) ∈ ℝ^(dᶜ)$ in time nearly c⋅d. That is, exponentially faster than writing out the Kronecker product and then mapping down. These matrices, M, which preserves the norm of any x ∈ ℝ^(d^c), such that  Mx₂  x₂  < ε with probability 1δ , can be taken to have just Õ(c² ε⁻² (log1/δ)³) rows. This is within c² (\log1/δ)² of optimal for any JL matrix [Larsen & Nelson], and improves upon earlier 'Tensor Sketch' constructions by Pagh and Pham, which used Õ(3ᶜ ε⁻² δ⁻¹) rows, by an exponential amount in both c and δ⁻² . It was shown by Avron, Nguyen and Woodruff that Tensor Sketch is a subspace embedding. This has a large number of applications, such as guaranteeing the correctness of kernellinear regression performed directly on the reduced vectors. We show that our construction is a subspace embedding too, improving again upon the exponential dependency on c and δ⁻¹ , enabling sketching of much higher order polynomial kernels, such as Taylor approximations to the ubiquitous Gaussian radial basis function. Technically, we construct our matrix M such that M(x ⊗ y) = Tx ∘ T'y where ∘ is the Hadamard (elementwise) product and T and T' support fast matrixvector multiplication ala [Ailon Chazelle]. To analyze the behavior of Mx on nonsimple x , we show a higher order version of Khintchine's inequality, related to the higher order Gaussian chaos analysis by Latała. Finally we show that such sketches can be combined recursively, in a way that doesn't increase the dependency on c by much.

TA — 2017
It is NPhard to verify an LSF on the sphere. Not Published, (pdf)We show a reduction from verifying that an LSF family `covers` the sphere, in the sense of Las Vegas LSF, to 3sat. 
We consider efficient combinatorial constructions, that allow us to partly derandomize datastructures using the locality sensitive framework of Indyk and Motwani (FOCS '98). In particular our constructions allow us to make ZeroError Probabilistic Polynomial Time (ZPP) analogues of two state of the art algorithms for `Approximate Set Similarity': This datastructure problem deals with storing a collection X of sets such that given a query set q for which there exists x ∈ P with q ∩ x/q ∪ x >= s_1 , the data structures return x'∈ P with q ∩ x'/q ∪ x'\ge s_2 . The first algorithm by Broder et al.introduced the famous `minhash' function, which in the locality sensitive framework yields an n^{ρ_b} time, n^{1+ρ_b} space data structure for ρ_b=(log1/s_1)/(log1/s_2) . The second by Christiani et al.~ gives an n^{ρ_c} time n^{1+ρ_c} space datastructure for ρ_c=(\log2s_1/(1+s_1))/(\log2s_2/(1+s_2)) . Both algorithms use Monte Carlo randomization, but we show that this is not necessary, at least up to n^{o(1)} factors. This settles an open problem from Arasu et al and Pagh asking whether locality sensitive datastructures could be made _exact_ or _without false negatives_ other than for hamming distance, and whether a performance gap was needed in the exponent. The main approach in the thesis is to replace the `locality sensitive hash functions' or `space partitions' with `combinatorial design'. We show that many such designs can be constructed efficiently with the `multisplitters' introduced by Alon et al. We further show that careful constructions of such designs can be efficiently decoded. We also investigate upper and lower bounds on combinatorial analogues of the minhash algorithm. This is related to the existence of small, approximate minwise hashing families under l_∞ distance.

TA — 2017
Asymptotic Tail Bounds and Applications. Not Published, (pdf)In the field of Computer Science, the Chernoff bound is an extremely useful found bounding the error probabilities of various algorithms. Chernoff gives an exponentially decaying upper bound on the probability that a sum of independent random variables is many standard deviations away from its expectation. However sometimes we need more than an upper bound, and we need bounds that are tight within a constant factor or better. (There is a fairly standard tail lower bound, but it deviates from Chernoff by a factor of $\sqrt n$.) In this project I will explore and derive multiple upper and lower bounds for Chernoff using methods ranging from geometric series and generating functions to saddlepoint approximations and laplace approximation. All these results will be known, but they don’t seem have a good exposition in Computer Science. The reason also to consider more simple methods is to help an intuition for deriving similar bounds for different problems. In particular I will derive a tight bound for the size of the intersection between two hamming balls, which does not seem to exist in the literature. I will use the formulas to answer whether algorithms exists for the following problems: Locality Sensitive Filters in hamming space with limited use of random bits (as opposed to current methods, requiring gaussian samples), LSF with improved performance for low dimensional spaces. (dimension < 2 log n) Linear space bit sampling LSH, which I have previously partially analyzed, but only in a different context, in which a full analysis was not necessary.
Teaching

2019 Practical Concurrent and Parallel Programming. (pcpp)
Files and schedule.In this MSc course you learn how to write correct and efficient concurrent and parallel software, primarily using Java, on standard sharedmemory multicore hardware. The course covers basic mechanisms such as threads, locks and shared memory as well as more advanced mechanisms such as parallel streams for bulk data, transactional memory, message passing, and lockfree data structures with compareandswap. It covers concepts such as atomicity, safety, liveness and deadlock. It covers how to measure and understand performance and scalability of parallel programs. It covers methods to find bugs in concurrent programs.
Media
 Stibo  August 2016.
"The StiboFoundation supports ITtalents."
The announcement of my winning the Stibo Travel grant.  Linux Format  January 2016.
"Python: Sunfish chess engine."
Article about my Sunfish chess software. (pdf)  Computerworld  June 2015.
"With the National Team at the Programming World Cup: Sport Coding Sharpens You."
Coverage of my teams participation in the ICPC World Finals.  Computerworld  October 2013.
"These are Denmark's Three Greatest Programmers."
Code
 PyChess  a chess engine and Internet chess client
 Sunfish  a minimalist chess engine  play now!
 numberlink  a fast solver for numberlink puzzles
 codenames  an AI that plays Codenames using Glove vectors.
 fastchess  an experiment into using the FastText library to play chess.
 mtime  helps managing the ITU time management