Politecnico di Torino - Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY

+39 011 090 6100 info@tech-share.it

STRUCTURING AND COMPRESSION OF LABELED TREES

CodingCompressionCompressioneDatabaseSearchesSoftware

Introduction

The patented invention encompasses a method applicable to computer language, useful for compressing labeled trees, that is, having labels on the nodes, and supporting efficient searches on them. Labeled trees are used to represent data or calculations in software applications including trial applications, dictionaries, parse trees, suffix trees, and pixel trees.

Technical features

Tree labelling is carried out on the nodes. The operations that can be performed on the compressed tree include the navigation of its structure, or searches for sequences of labels that occur as paths in the tree itself. The innovative aspect of the patented technique consists in being able to: a) perform searches without decompressing the tree entirely, but acting only on a small portion of it, b) obtain compression performance close to the k-order entropy (state of the the art of Data-Compression) calculated on the content and structure of the tree, making it possible, therefore, to search even on large documents without being decompressed. The most natural application is compression and search in documents in XML or JSON format. The compressibility capabilities are high. In addition to this aspect, the technique allows to transform the labeled tree (image (a)) into two strings: one binary and one consisting of the labels present in the tree (image (b) and (c)). The two strings alone are sufficient to carry out extensive searches within the compressed document.

Possible Applications

  • Experiments on XML and JSON datasets;
  • Extension on other possible applications considering labelled trees not just from XML and JSON;
  • Compression and research on other types of data such as: byte sequences, integers, labelled graphs, or more complex structures;
  • Software engineering of the prototypes produced.

Advantages

  • Remarkable compressibility properties;
  • Simplicity in the implementation of the method;
  • Ability to efficiently perform searches on labeled trees of arbitrary degree and shape.