Politecnico di Torino - Corso Duca degli Abruzzi, 24 - 10129 Torino, ITALY

+39 011 090 6100 info@tech-share.it

Text Categorization and analytics

Keywords: Text classifier, Computational linguistics, Corpus linguistics, COVID-19, CTENEXT, Data mining, Informatica Tsd En, Machine learning

Introduction

A system for archiving, classifying, and analyzing texts. We developed a mixed (SQL and NoSQL), multi-layered database system for data storage that supports complex queries without excessive demand for resources. The structure was originally designed to preserve and analyze decaying languages for which at least one dictionary and a limited corpus are available.
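The mixed SQL and NoSQL layering can be illustrated with a minimal Python sketch. It is not the platform's actual implementation: the schema, the function names (store_text, token_counts), the placeholder language codes, and the use of an in-memory dict as a stand-in for a NoSQL document store are all assumptions made for illustration. The idea shown is that a small, indexed relational layer narrows the query first, so only the matching documents are loaded from the heavier document layer.

```python
import json
import sqlite3

# Relational layer: small, indexed metadata (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE texts (id INTEGER PRIMARY KEY, language TEXT, title TEXT)"
)
conn.execute("CREATE INDEX idx_texts_language ON texts(language)")

# Document layer: a plain dict stands in for a NoSQL store (assumption).
documents = {}


def store_text(text_id, language, title, body):
    """Write metadata to the SQL layer and the tokenized body to the document layer."""
    conn.execute(
        "INSERT INTO texts VALUES (?, ?, ?)", (text_id, language, title)
    )
    documents[text_id] = json.dumps({"tokens": body.lower().split()})


def token_counts(language):
    """Count tokens for one language, loading only the matching documents."""
    counts = {}
    rows = conn.execute("SELECT id FROM texts WHERE language = ?", (language,))
    for (text_id,) in rows:
        for token in json.loads(documents[text_id])["tokens"]:
            counts[token] = counts.get(token, 0) + 1
    return counts


# Placeholder data, not taken from any real corpus.
store_text(1, "lang_a", "Sample A", "alpha beta alpha")
store_text(2, "lang_b", "Sample B", "alpha gamma")
store_text(3, "lang_a", "Sample C", "alpha delta")

print(token_counts("lang_a"))
```

Keeping the relational layer small is what limits resource use in this sketch: the SQL filter touches only an indexed column, and the document for "lang_b" is never parsed.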

Technical features

The invention is a system for storing and analyzing texts by means of a classifier and an implementer based on statistical machine learning, developed to produce results even from a very limited amount of data, as in the storage and analysis of decaying languages and their cultural heritage. Textual data is stored in a mixed, layered database system (SQL and NoSQL) so that it adapts easily to areas beyond linguistic analysis, corpus linguistics, or computational linguistics, namely any analysis that involves distribution algorithms, predictive loops, statistical analyses, and so on. For example, we are evaluating applications in the medical field (digitization of anamneses) and in management (business intelligence).
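As an illustration of how a statistical classifier can still give usable results from very little data, the following Python sketch uses multinomial Naive Bayes with add-one (Laplace) smoothing. This technique is chosen here only as a familiar example; the source does not state which statistical method the invention uses. The function names (train, predict), the labels, and the four training sentences are hypothetical placeholders.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (text, label). Returns the counts needed for prediction."""
    label_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        tokens = text.lower().split()
        label_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, token_counts, vocabulary


def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, token_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocabulary)
        for token in text.lower().split():
            # Add-one smoothing keeps unseen tokens from zeroing the score.
            score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny, made-up training set (assumption, for illustration only).
model = train([
    ("harvest song of the valley", "folk"),
    ("wedding song and dance", "folk"),
    ("land deed signed by the notary", "legal"),
    ("notary record of the sale", "legal"),
])

print(predict(model, "song of the harvest"))
print(predict(model, "deed signed by the notary"))
```

Smoothing is the part that matters for small corpora: with only a handful of texts, most words in a new document will be unseen for at least one class, and without smoothing a single unseen word would rule that class out entirely.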

Possible Applications

  • Corpus linguistics;
  • Content analysis;
  • Social network analytics;
  • Sentiment analysis;
  • Data retrieval;
  • Data mining.

Advantages

  • First corpus linguistics platform developed for minority and decaying languages;
  • Layered and fragmented file storage system, designed not to require high computing power;
  • Fast and flexible platform.