Lucie Molková

Master's Thesis

Indexing Very Large Text Data

With the emergence of digital libraries with non-textual content, there is a clear need for improved techniques to organize large quantities of information. It appears that the textual descriptions associated with non-textual content are an important source of information when judging topical relevance. The CoPhIR (Content-based Photo Image Retrieval) data set, a multimedia collection of over 100 million images with associated metadata, serves as the basis of the experiments. The main objective of this thesis is to study Lucene technology for indexing text data and, using this technology, to implement indexing and content-based image searching of the CoPhIR collection. The procedure for creating an index from the initial data set is described in detail, including possible pitfalls. This work also surveys efforts that focus on information retrieval and presents some challenges for retrieval in both image indexing and searching.
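To make the idea of indexing textual descriptions concrete, the following is a minimal sketch of an inverted index, the data structure that text search engines such as Lucene are built around. It is not Lucene and not the thesis implementation: the function names, the whitespace-and-punctuation tokenizer, and the sample image ids and descriptions are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical image ids and descriptions, invented for this sketch.
docs = {
    "img1": "Sunset over the Charles Bridge in Prague",
    "img2": "Prague castle at night",
    "img3": "Sunset on the beach",
}
index = build_index(docs)
print(sorted(search(index, "prague sunset")))  # prints ['img1']
```

A real index over a collection of this size would also need on-disk storage, relevance ranking, and text analysis such as stemming, which this sketch deliberately leaves out.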

Bachelor's Thesis

Relational Algebra Expression Evaluation

Relational algebra is a query language used to explain basic relational operations and their principles. Many books and articles are concerned with the theory of relational algebra; however, it has no practical use. One of the reasons is that no software exists that would allow relational algebra to be used in practice. Most relational database management systems in current use work with SQL queries. This work therefore describes and implements a tool that transforms relational algebra expressions into SQL queries, allowing the expressions to be evaluated in standard databases. This work also defines a new, practical syntax for relational algebra and describes its grammar in order to ensure the correctness of the expressions.
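The translation from relational algebra to SQL can be illustrated with a small sketch. The syntax and grammar defined in the thesis are not reproduced here; instead, the example assumes a hypothetical expression tree with three node types (Relation, Selection, Projection) and a hypothetical to_sql function, and it evaluates the generated query in an in-memory SQLite database.

```python
import sqlite3
from dataclasses import dataclass
from itertools import count


@dataclass
class Relation:
    name: str


@dataclass
class Selection:
    condition: str  # SQL-compatible predicate, e.g. "age > 30"
    child: object


@dataclass
class Projection:
    columns: list
    child: object


def to_sql(expr, alias_ids=None):
    """Recursively translate an expression tree into a SQL SELECT statement."""
    if alias_ids is None:
        alias_ids = count(1)  # source of unique subquery aliases t1, t2, ...
    if isinstance(expr, Relation):
        return f"SELECT * FROM {expr.name}"
    inner = to_sql(expr.child, alias_ids)
    alias = f"t{next(alias_ids)}"
    if isinstance(expr, Selection):
        return f"SELECT * FROM ({inner}) AS {alias} WHERE {expr.condition}"
    if isinstance(expr, Projection):
        cols = ", ".join(expr.columns)
        # DISTINCT because relational algebra works on sets, not bags.
        return f"SELECT DISTINCT {cols} FROM ({inner}) AS {alias}"
    raise TypeError(f"unsupported expression: {expr!r}")


# Projection of "name" over the selection "age > 30" on a sample relation.
expr = Projection(["name"], Selection("age > 30", Relation("person")))
sql = to_sql(expr)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Ada", 36), ("Alan", 41), ("Ada", 36), ("Grace", 28)],
)
rows = sorted(conn.execute(sql).fetchall())
print(sql)
print(rows)  # prints [('Ada',), ('Alan',)]
```

The sketch interpolates relation names, column names, and conditions directly into the SQL string, so it is only suitable for trusted input. A complete tool would first parse and validate expressions against a grammar, as the abstract describes.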