One drawback of DIH is that it's not capable of detecting and indexing just new files. It comes with a TikaEntityProcessor which extracts ebook metadata and contents as a Solr document. This pipeline is "Files push to Solr HTTP endpoint => extracted using Tika => written to Solr Index".īut first, more about DIH here. Push the files from outside to a Solr endpoint.This pipeline is "Files extracted using Tika => written to Solr index". Pull the files in from Solr's side using its Solr's Data Import Handler (DIH).IMO, for personal use, you don't really need a separate tika-server container - just a Solr container's enough. However, Solr already packages Tika the framework and comes pre-configured to use it. By Docker container for Tika, you're probably referring to a container that deploys tika-server. But it also provides an optional standalone REST API server called tika-server. Tika is primarily a Java parsing framework for use by other server components. I'll give an overview here.įirst, Tika is already integrated into Solr. I don't know of a good guide and couldn't find one.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |