diff options
| author | server <brownptcdash@gmail.com> | 2019-12-10 18:12:37 -0500 |
|---|---|---|
| committer | server <brownptcdash@gmail.com> | 2019-12-10 18:12:37 -0500 |
| commit | 7478e610d99d1f2fb383ecbfa0b70d72eae27f81 (patch) | |
| tree | 59486701838067351f00580315c05690f36a39ce /solr-8.3.1/example/films/README.txt | |
| parent | 096718ee546afbd7568bf8ec9e23dca0556d814d (diff) | |
solr changes
Diffstat (limited to 'solr-8.3.1/example/films/README.txt')
| -rw-r--r-- | solr-8.3.1/example/films/README.txt | 138 |
1 files changed, 138 insertions, 0 deletions
diff --git a/solr-8.3.1/example/films/README.txt b/solr-8.3.1/example/films/README.txt new file mode 100644 index 000000000..d1679d222 --- /dev/null +++ b/solr-8.3.1/example/films/README.txt @@ -0,0 +1,138 @@ +We have a movie data set in JSON, Solr XML, and CSV formats. +All 3 formats contain the same data. You can use any one format to index documents to Solr. + +The data is fetched from Freebase and the data license is present in the films-LICENSE.txt file. + +This data consists of the following fields: + * "id" - unique identifier for the movie + * "name" - Name of the movie + * "directed_by" - The person(s) who directed the making of the film + * "initial_release_date" - The earliest official initial film screening date in any country + * "genre" - The genre(s) that the movie belongs to + + Steps: + * Start Solr: + bin/solr start + + * Create a "films" core: + bin/solr create -c films + + * Set the schema on a couple of fields that Solr would otherwise guess differently (than we'd like) about: +curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{ + "add-field" : { + "name":"name", + "type":"text_general", + "multiValued":false, + "stored":true + }, + "add-field" : { + "name":"initial_release_date", + "type":"pdate", + "stored":true + } +}' + + * Now let's index the data, using one of these three commands: + + - JSON: bin/post -c films example/films/films.json + - XML: bin/post -c films example/films/films.xml + - CSV: bin/post \ + -c films \ + example/films/films.csv \ + -params "f.genre.split=true&f.directed_by.split=true&f.genre.separator=|&f.directed_by.separator=|" + + * Let's get searching! + - Search for 'Batman': + http://localhost:8983/solr/films/query?q=name:batman + + * If you get an error about the name field not existing, you haven't yet indexed the data + * If you don't get an error, but zero results, chances are that the _name_ field schema type override wasn't set + before indexing the data the first time (it ended up as a "string" type, requiring exact matching by case even). + It's easiest to simply reset the environment and try again, ensuring that each step successfully executes. + + - Show me all 'Super hero' movies: + http://localhost:8983/solr/films/query?q=*:*&fq=genre:%22Superhero%20movie%22 + + - Let's see the distribution of genres across all the movies. See the facet section of the response for the counts: + http://localhost:8983/solr/films/query?q=*:*&facet=true&facet.field=genre + + - Browse the indexed films in a traditional browser search interface: + http://localhost:8983/solr/films/browse + + Now browse including the genre field as a facet: + http://localhost:8983/solr/films/browse?facet.field=genre + + If you want to set a facet for /browse to keep around for every request add the facet.field into the "facets" + param set (which the /browse handler is already configured to use): +curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{ +"update" : { + "facets": { + "facet.field":"genre" + } + } +}' + + And now http://localhost:8983/solr/films/browse will display the _genre_ facet automatically. + +Exploring the data further - + + * Increase the MAX_ITERATIONS value, put in your freebase API_KEY and run the film_data_generator.py script using Python 3. + Now re-index Solr with the new data. + +FAQ: + Why override the schema of the _name_ and _initial_release_date_ fields? + + Without overriding those field types, the _name_ field would have been guessed as a multi-valued string field type + and _initial_release_date_ would have been guessed as a multi-valued pdate type. It makes more sense with this + particular data set domain to have the movie name be a single valued general full-text searchable field, + and for the release date also to be single valued. + + How do I clear and reset my environment? + + See the script below. + + Is there an easy to copy/paste script to do all of the above? + + Here ya go << END_OF_SCRIPT + +bin/solr stop +rm server/logs/*.log +rm -Rf server/solr/films/ +bin/solr start +bin/solr create -c films +curl http://localhost:8983/solr/films/schema -X POST -H 'Content-type:application/json' --data-binary '{ + "add-field" : { + "name":"name", + "type":"text_general", + "multiValued":false, + "stored":true + }, + "add-field" : { + "name":"initial_release_date", + "type":"pdate", + "stored":true + } +}' +bin/post -c films example/films/films.json +curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{ +"update" : { + "facets": { + "facet.field":"genre" + } + } +}' + +# END_OF_SCRIPT + +Additional fun - + +Add highlighting: +curl http://localhost:8983/solr/films/config/params -H 'Content-type:application/json' -d '{ +"set" : { + "browse": { + "hl":"on", + "hl.fl":"name" + } + } +}' +try http://localhost:8983/solr/films/browse?q=batman now, and you'll see "batman" highlighted in the results |
