Elasticsearch Python API overview
import warnings
from elasticsearch import Elasticsearch, RequestsHttpConnection
warnings.filterwarnings('ignore')
Before you start
Running Elasticsearch with Docker
To do this, we will run an Elasticsearch cluster in a container. If the Elasticsearch image is not already in your local registry, pull it with the following command:
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.11.1
then run the container, publishing port 9200:
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.11.1
Running Elasticsearch with docker-compose
You can also run several nodes within the same cluster with docker-compose, as follows:
version: '2.2'
services:
  es01:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data01:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
    networks:
      - elastic
  es02:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es02
    environment:
      - node.name=es02
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data02:/usr/share/elasticsearch/data
    networks:
      - elastic
  es03:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.11.1
    container_name: es03
    environment:
      - node.name=es03
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es01,es02
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - data03:/usr/share/elasticsearch/data
    networks:
      - elastic
volumes:
  data01:
    driver: local
  data02:
    driver: local
  data03:
    driver: local
networks:
  elastic:
    driver: bridge
More information is available in the documentation.
🚧 Watch your Docker configuration 🚧
Elasticsearch demands a lot of resources from Docker (and therefore from your machine): Docker must be allowed to use at least 4GB of memory. You can also change the JVM configuration of the containers directly with the ES_JAVA_OPTS=-Xms512m -Xmx512m parameter, lowering both values to 256m or even 128m.
📟 Exercise [optional]
Write a docker-compose.yml file with an Elasticsearch service on port 9200 (a single node), a Kibana service on port 5601, and a network named elnet.
Pinging the container
import requests
res = requests.get('http://localhost:9200?pretty')
print(res.content)
b'{\n "name" : "3935d86abc4c",\n "cluster_name" : "docker-cluster",\n "cluster_uuid" : "pFca2ZHUT8KQpVXk2dIAzg",\n "version" : {\n "number" : "7.11.1",\n "build_flavor" : "default",\n "build_type" : "docker",\n "build_hash" : "ff17057114c2199c9c1bbecc727003a907c0db7a",\n "build_date" : "2021-02-15T13:44:09.394032Z",\n "build_snapshot" : false,\n "lucene_version" : "8.7.0",\n "minimum_wire_compatibility_version" : "6.8.0",\n "minimum_index_compatibility_version" : "6.0.0-beta1"\n },\n "tagline" : "You Know, for Search"\n}\n'
es = Elasticsearch('http://localhost:9200')
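A freshly started container can take some time before it answers on port 9200, so the ping above may fail at first. The sketch below polls the same URL until the cluster responds and returns its version number. It uses only the standard library; the function name, the `attempts` and `delay` defaults, and the `fetch` hook (there only so the function can be exercised without a live cluster) are assumptions made for this illustration, not part of any Elasticsearch API.

```python
import json
import time
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", attempts=30, delay=2.0,
                           fetch=None):
    """Poll `url` until it returns cluster info, then return the version string.

    `fetch` is a hypothetical hook (url -> bytes); by default urllib is used.
    """
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.read()

    for _ in range(attempts):
        try:
            info = json.loads(fetch(url))
            return info["version"]["number"]
        except (OSError, ValueError, KeyError):
            # Connection refused, invalid JSON or unexpected payload: retry.
            time.sleep(delay)
    raise TimeoutError(f"Elasticsearch did not answer at {url}")


# Example (requires a running container):
# print(wait_for_elasticsearch())
```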
Create, delete and verify index
# create
es.indices.create(index="first_index", ignore=400)
# verify
print(es.indices.exists(index="first_index"))
# delete
print(es.indices.delete(index="first_index", ignore=[400, 404]))
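In the calls above, `ignore=400` asks the client not to raise when Elasticsearch answers with HTTP 400, which is what happens when the index already exists. An alternative is to check before creating. The helper below is a minimal sketch of that idea; `ensure_index` is a hypothetical name, and it relies only on the `indices.exists` and `indices.create` calls already used in this notebook.

```python
def ensure_index(client, name, body=None):
    """Create index `name` only if it is missing; return True if it was created."""
    if client.indices.exists(index=name):
        return False
    client.indices.create(index=name, body=body)
    return True


# Example (requires the `es` client defined earlier):
# ensure_index(es, "first_index")
```

Checking first makes the intent explicit, at the cost of one extra request; `ignore=400` stays shorter but also hides other kinds of bad requests.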
Insert documents
#documents to insert in the elasticsearch index "cities"
doc1 = {"city":"New Delhi", "country":"India"}
doc2 = {"city":"London", "country":"England"}
doc3 = {"city":"Los Angeles", "country":"USA"}
#Inserting doc1 in id=1
es.index(index="cities", doc_type="places", id=1, body=doc1)
#Inserting doc2 in id=2
es.index(index="cities", doc_type="places", id=2, body=doc2)
#Inserting doc3 in id=3
es.index(index="cities", doc_type="places", id=3, body=doc3)
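The three calls above follow the same pattern, so they can be folded into a loop. This is only a sketch: `index_documents` is a hypothetical helper, and it leaves out `doc_type` (mapping types are deprecated in Elasticsearch 7.x), so documents indexed this way would land under the default `_doc` type rather than `places`.

```python
def index_documents(client, index, docs, first_id=1):
    """Index each document under consecutive ids and return the ids used."""
    ids = []
    for doc_id, doc in enumerate(docs, start=first_id):
        client.index(index=index, id=doc_id, body=doc)
        ids.append(doc_id)
    return ids


cities = [
    {"city": "New Delhi", "country": "India"},
    {"city": "London", "country": "England"},
    {"city": "Los Angeles", "country": "USA"},
]

# Example (requires the `es` client defined earlier):
# index_documents(es, "cities", cities)
```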
📟 Exercise [optional]
Find the function that checks that your index has been created.
True
Retrieve data by id: get
res = es.get(index="cities", doc_type="places", id=2)
res
{'_id': '2', '_index': 'cities', '_primary_term': 1, '_seq_no': 1, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places', '_version': 1, 'found': True}
📟 Exercise [optional]
Display only the information below, starting from the variable res
{'city': 'London', 'country': 'England'}
Mapping
es.indices.get_mapping(index='cities')
{'cities': {'mappings': {'properties': {'city': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}}, 'type': 'text'}, 'country': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}}, 'type': 'text'}}}}}
More about mappings: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html
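The mapping response is deeply nested. The sketch below flattens it into a field-to-type dictionary, using the response shown above as sample data; `field_types` is a hypothetical helper name.

```python
# Sample data copied from the get_mapping output above.
mapping = {'cities': {'mappings': {'properties': {
    'city': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
             'type': 'text'},
    'country': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}},
                'type': 'text'},
}}}}


def field_types(mapping, index):
    """Return {field name: top-level type} for one index of a mapping response."""
    properties = mapping[index]["mappings"]["properties"]
    return {name: spec["type"] for name, spec in properties.items()}


print(field_types(mapping, "cities"))  # {'city': 'text', 'country': 'text'}
```

Both fields are `text` with a `keyword` sub-field, which is what dynamic mapping produces for strings.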
The _search endpoint and queries
For the following examples, make sure you have imported the data via the _bulk API.
res = es.search(index="cities", body={"query":{"match_all":{}}})
res
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '1', '_index': 'cities', '_score': 1.0, '_source': {'city': 'New Delhi', 'country': 'India'}, '_type': 'places'}, {'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}, {'_id': '3', '_index': 'cities', '_score': 1.0, '_source': {'city': 'Los Angeles', 'country': 'USA'}, '_type': 'places'}], 'max_score': 1.0, 'total': {'relation': 'eq', 'value': 3}}, 'timed_out': False, 'took': 5}
📟 Exercise [optional]
Display only the information below, starting from the variable res
[{'_id': '1', '_index': 'cities', '_score': 1.0, '_source': {'city': 'New Delhi', 'country': 'India'}, '_type': 'places'}, {'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}, {'_id': '3', '_index': 'cities', '_score': 1.0, '_source': {'city': 'Los Angeles', 'country': 'USA'}, '_type': 'places'}]
Refining search results with _source
es.search(index="movies", body={
    "_source": {
        "includes": [
            "*.title",
            "*.directors"
        ],
        "excludes": [
            "*.actors*",
            "*.genres"
        ]
    },
    "query": {
        "match": {
            "fields.directors": "George"
        }
    }
})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5}, 'hits': {'hits': [{'_id': '475', '_index': 'movies', '_score': 5.6268926, '_source': {'fields': {'directors': ['George Clooney'], 'title': 'The Monuments Men'}}, '_type': 'movie'}, {'_id': '1183', '_index': 'movies', '_score': 5.6268926, '_source': {'fields': {'directors': ['George Nolfi'], 'title': 'The Adjustment Bureau'}}, '_type': 'movie'}, {'_id': '4150', '_index': 'movies', '_score': 5.6268926, '_source': {'fields': {'directors': ['Terry George'], 'title': 'Reservation Road'}}, '_type': 'movie'}, {'_id': '3378', '_index': 'movies', '_score': 4.881689, '_source': {'fields': {'directors': ['George Miller', 'George Ogilvie'], 'title': 'Mad Max Beyond Thunderdome'}}, '_type': 'movie'}, {'_id': '226', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Lucas'], 'title': 'Star Wars'}}, '_type': 'movie'}, {'_id': '690', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Clooney'], 'title': 'The Ides of March'}}, '_type': 'movie'}, {'_id': '1165', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Lucas'], 'title': 'American Graffiti'}}, '_type': 'movie'}, {'_id': '3022', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Lucas'], 'title': 'THX 1138'}}, '_type': 'movie'}, {'_id': '3715', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Gallo'], 'title': 'Middle Men'}}, '_type': 'movie'}, {'_id': '4639', '_index': 'movies', '_score': 4.719993, '_source': {'fields': {'directors': ['George Miller'], 'title': 'Andre'}}, '_type': 'movie'}], 'max_score': 5.6268926, 'total': {'relation': 'eq', 'value': 56}}, 'timed_out': False, 'took': 87}
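With `_source` filtering, each hit carries only the requested fields, which makes the response easy to reduce to a small table. The sketch below does that on a sample trimmed from the output above (first two hits only); `titles_and_directors` is a hypothetical helper name.

```python
# First two hits of the response above, other keys omitted for brevity.
response = {'hits': {'hits': [
    {'_id': '475', '_index': 'movies', '_score': 5.6268926,
     '_source': {'fields': {'directors': ['George Clooney'],
                            'title': 'The Monuments Men'}}},
    {'_id': '1183', '_index': 'movies', '_score': 5.6268926,
     '_source': {'fields': {'directors': ['George Nolfi'],
                            'title': 'The Adjustment Bureau'}}},
]}}


def titles_and_directors(response):
    """Return a list of (title, directors) pairs, one per hit."""
    rows = []
    for hit in response["hits"]["hits"]:
        fields = hit["_source"]["fields"]
        rows.append((fields["title"], fields["directors"]))
    return rows


for title, directors in titles_and_directors(response):
    print(title, "/", ", ".join(directors))
```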
Boolean logic
es.search(index="movies", body={
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "fields.directors": "George"
                    }
                },
                {
                    "match": {
                        "fields.title": "Star Wars"
                    }
                }
            ]
        }
    }
})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5}, 'hits': {'hits': [{'_id': '226', '_index': 'movies', '_score': 16.046509, '_source': {'fields': {'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher'], 'directors': ['George Lucas'], 'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTU4NTczODkwM15BMl5BanBnXkFtZTcwMzEyMTIyMw@@._V1_SX400_.jpg', 'plot': "Luke Skywalker joins forces with a Jedi Knight, a cocky pilot, a wookiee and two droids to save the universe from the Empire's world-destroying battle-station, while also attempting to rescue Princess Leia from the evil Darth Vader.", 'rank': 226, 'rating': 8.7, 'release_date': '1977-05-25T00:00:00Z', 'running_time_secs': 7260, 'title': 'Star Wars', 'year': 1977}, 'id': 'tt0076759', 'type': 'add'}, '_type': 'movie'}, {'_id': '469', '_index': 'movies', '_score': 10.593456, '_source': {'fields': {'actors': ['Ewan McGregor', 'Liam Neeson', 'Natalie Portman'], 'directors': ['George Lucas'], 'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTQ4NjEwNDA2Nl5BMl5BanBnXkFtZTcwNDUyNDQzNw@@._V1_SX400_.jpg', 'plot': 'Two Jedi Knights escape a hostile blockade to find allies and come across a young boy who may bring balance to the Force, but the long dormant Sith resurface to reclaim their old glory.', 'rank': 469, 'rating': 6.5, 'release_date': '1999-05-19T00:00:00Z', 'running_time_secs': 8160, 'title': 'Star Wars: Episode I - The Phantom Menace', 'year': 1999}, 'id': 'tt0120915', 'type': 'add'}, '_type': 'movie'}, {'_id': '371', '_index': 'movies', '_score': 10.064606, '_source': {'fields': {'actors': ['Hayden Christensen', 'Natalie Portman', 'Ewan McGregor'], 'directors': ['George Lucas'], 'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BNTc4MTc3NTQ5OF5BMl5BanBnXkFtZTcwOTg0NjI4NA@@._V1_SX400_.jpg', 'plot': "After three years of 
fighting in the Clone Wars, Anakin Skywalker falls prey to the Sith Lord's lies and makes an enemy of the Jedi and those he loves, concluding his journey to the Dark Side.", 'rank': 371, 'rating': 7.7, 'release_date': '2005-05-15T00:00:00Z', 'running_time_secs': 8400, 'title': 'Star Wars: Episode III - Revenge of the Sith', 'year': 2005}, 'id': 'tt0121766', 'type': 'add'}, '_type': 'movie'}, {'_id': '922', '_index': 'movies', '_score': 10.064606, '_source': {'fields': {'actors': ['Hayden Christensen', 'Natalie Portman', 'Ewan McGregor'], 'directors': ['George Lucas'], 'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTY5MjI5NTIwNl5BMl5BanBnXkFtZTYwMTM1Njg2._V1_SX400_.jpg', 'plot': 'Ten years later, Anakin Skywalker shares a forbidden romance with Padmé, while Obi-Wan investigates an assassination attempt on the Princess and discovers a secret clone army crafted for the Jedi.', 'rank': 922, 'rating': 6.7, 'release_date': '2002-05-16T00:00:00Z', 'running_time_secs': 8520, 'title': 'Star Wars: Episode II - Attack of the Clones', 'year': 2002}, 'id': 'tt0121765', 'type': 'add'}, '_type': 'movie'}], 'max_score': 16.046509, 'total': {'relation': 'eq', 'value': 4}}, 'timed_out': False, 'took': 35}
Clauses: SHOULD / MUST
es.search(index="movies", body={
    "query": {
        "bool": {
            "must": [
                {"match": {"fields.title": "Star Wars"}}
            ],
            "must_not": {"match": {"fields.directors": "George Miller"}},
            "should": [
                {"match": {"fields.title": "Star"}},
                {"match": {"fields.directors": "George Lucas"}}
            ]
        }
    }
})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 5, 'total': 5}, 'hits': {'hits': [{'_id': '2509', '_index': 'movies', '_score': 14.557282, '_source': {'fields': {'actors': ['Mark Wahlberg', 'Jennifer Aniston', 'Dominic West'], 'directors': ['Stephen Herek'], 'genres': ['Comedy', 'Drama', 'Music'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMjE4NTYyNTQ0M15BMl5BanBnXkFtZTcwNDYwMTAyMQ@@._V1_SX400_.jpg', 'plot': 'Lead singer of a tribute band becomes lead singer of the real band he idolizes.', 'rank': 2509, 'rating': 5.9, 'release_date': '2001-09-04T00:00:00Z', 'running_time_secs': 6300, 'title': 'Rock Star', 'year': 2001}, 'id': 'tt0202470', 'type': 'add'}, '_type': 'movie'}, {'_id': '168', '_index': 'movies', '_score': 12.612011, '_source': {'fields': {'actors': ['Mark Hamill', 'Harrison Ford', 'Carrie Fisher'], 'directors': ['J.J. Abrams'], 'genres': ['Action', 'Adventure', 'Fantasy', 'Sci-Fi'], 'plot': 'A continuation of the saga created by George Lucas.', 'rank': 168, 'release_date': '2015-01-01T00:00:00Z', 'title': 'Star Wars: Episode VII', 'year': 2015}, 'id': 'tt2488496', 'type': 'add'}, '_type': 'movie'}, {'_id': '128', '_index': 'movies', '_score': 11.051997, '_source': {'fields': {'actors': ['Chris Pine', 'Zachary Quinto', 'Simon Pegg'], 'directors': ['J.J. Abrams'], 'genres': ['Action', 'Adventure', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMjE5NDQ5OTE4Ml5BMl5BanBnXkFtZTcwOTE3NDIzMw@@._V1_SX400_.jpg', 'plot': "The brash James T. Kirk tries to live up to his father's legacy with Mr. 
Spock keeping him in check as a vengeful, time-traveling Romulan creates black holes to destroy the Federation one planet at a time.", 'rank': 128, 'rating': 8, 'release_date': '2009-04-06T00:00:00Z', 'running_time_secs': 7620, 'title': 'Star Trek', 'year': 2009}, 'id': 'tt0796366', 'type': 'add'}, '_type': 'movie'}, {'_id': '2357', '_index': 'movies', '_score': 11.051997, '_source': {'fields': {'actors': ["Dan O'Bannon", 'Dre Pahich', 'Brian Narelle'], 'directors': ['John Carpenter'], 'genres': ['Comedy', 'Sci-Fi'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTUwODkwMzk1M15BMl5BanBnXkFtZTcwMjc4ODY3Mw@@._V1_SX400_.jpg', 'plot': 'In the far reaches of space, a small crew, 20 years into their solitary mission, find things beginning to go hilariously wrong.', 'rank': 2357, 'rating': 6.4, 'release_date': '1974-04-01T00:00:00Z', 'running_time_secs': 4980, 'title': 'Dark Star', 'year': 1974}, 'id': 'tt0069945', 'type': 'add'}, '_type': 'movie'}, {'_id': '1871', '_index': 'movies', '_score': 10.774647, '_source': {'fields': {'actors': ['Chris Cooper', 'Elizabeth Peña', 'Stephen Mendillo'], 'directors': ['John Sayles'], 'genres': ['Drama', 'Mystery', 'Romance'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTU3OTI2OTk0N15BMl5BanBnXkFtZTcwMTU3OTYxMQ@@._V1_SX400_.jpg', 'plot': 'When the skeleton of his murdered predecessor is found, Sheriff Sam Deeds unearths many other long-buried secrets in his Texas border town.', 'rank': 1871, 'rating': 7.5, 'release_date': '1996-06-21T00:00:00Z', 'running_time_secs': 8100, 'title': 'Lone Star', 'year': 1996}, 'id': 'tt0116905', 'type': 'add'}, '_type': 'movie'}, {'_id': '2571', '_index': 'movies', '_score': 10.774647, '_source': {'fields': {'actors': ['Abbie Cornish', 'Ben Whishaw', 'Paul Schneider'], 'directors': ['Jane Campion'], 'genres': ['Biography', 'Drama', 'Romance'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTg0NjEwNDgxNF5BMl5BanBnXkFtZTcwMjkyOTM3Mg@@._V1_SX400_.jpg', 'plot': 'The three-year romance 
between 19th century poet John Keats and Fanny Brawne.', 'rank': 2571, 'rating': 6.9, 'release_date': '2009-05-15T00:00:00Z', 'running_time_secs': 7140, 'title': 'Bright Star', 'year': 2009}, 'id': 'tt0810784', 'type': 'add'}, '_type': 'movie'}, {'_id': '1921', '_index': 'movies', '_score': 9.731573, '_source': {'fields': {'actors': ['Patrick Stewart', 'Jonathan Frakes', 'Brent Spiner'], 'directors': ['Stuart Baird'], 'genres': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMjAxNjY2NDY3NF5BMl5BanBnXkFtZTcwMjA0MTEzMw@@._V1_SX400_.jpg', 'plot': 'After the Enterprise is diverted to the Romulan planet of Romulus, supposedly because they want to negotiate a truce, the Federation soon find out the Romulans are planning an attack on Earth.', 'rank': 1921, 'rating': 6.3, 'release_date': '2002-12-09T00:00:00Z', 'running_time_secs': 6960, 'title': 'Star Trek: Nemesis', 'year': 2002}, 'id': 'tt0253754', 'type': 'add'}, '_type': 'movie'}, {'_id': '2041', '_index': 'movies', '_score': 9.731573, '_source': {'fields': {'actors': ['Patrick Stewart', 'William Shatner', 'Malcolm McDowell'], 'directors': ['David Carson'], 'genres': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BOTMyODkyODk1MV5BMl5BanBnXkFtZTcwNjk5MzI4OA@@._V1_SX400_.jpg', 'plot': 'Captain Picard, with the help of supposedly dead Captain Kirk, must stop a madman willing to murder on a planetary scale in order to enter a space matrix.', 'rank': 2041, 'rating': 6.5, 'release_date': '1994-11-17T00:00:00Z', 'running_time_secs': 7080, 'title': 'Star Trek: Generations', 'year': 1994}, 'id': 'tt0111280', 'type': 'add'}, '_type': 'movie'}, {'_id': '2236', '_index': 'movies', '_score': 9.522415, '_source': {'fields': {'actors': ['Patrick Stewart', 'Jonathan Frakes', 'Brent Spiner'], 'directors': ['Jonathan Frakes'], 'genres': ['Action', 'Adventure', 'Sci-Fi', 'Thriller'], 'image_url': 
'http://ia.media-imdb.com/images/M/MV5BMjA3NDI5MzQ1OF5BMl5BanBnXkFtZTcwMzcxNDI4OA@@._V1_SX400_.jpg', 'plot': 'When the crew of the Enterprise learn of a Federation plot against the inhabitants of a unique planet, Captain Picard begins an open rebellion.', 'rank': 2236, 'rating': 6.3, 'release_date': '1998-12-11T00:00:00Z', 'running_time_secs': 6180, 'title': 'Star Trek: Insurrection', 'year': 1998}, 'id': 'tt0120844', 'type': 'add'}, '_type': 'movie'}, {'_id': '3277', '_index': 'movies', '_score': 9.429694, '_source': {'fields': {'actors': ['Ziyi Zhang', 'Leehom Wang', 'Ruby Lin'], 'directors': ['Dennie Gordon'], 'genres': ['Adventure', 'Comedy'], 'image_url': 'http://ia.media-imdb.com/images/M/MV5BMTQ0MzE3Mjk4M15BMl5BanBnXkFtZTgwNTMzMTEyMDE@._V1_SX400_.jpg', 'plot': 'A woman gets caught up in an international diamond heist that draws her near to a spy trying to save the world.', 'rank': 3277, 'rating': 6.6, 'release_date': '2013-09-17T00:00:00Z', 'running_time_secs': 6840, 'title': 'My Lucky Star', 'year': 2013}, 'id': 'tt2102502', 'type': 'add'}, '_type': 'movie'}], 'max_score': 14.557282, 'total': {'relation': 'eq', 'value': 25}}, 'timed_out': False, 'took': 30}
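In a bool query, `must` clauses have to match and count toward the score, `must_not` clauses exclude documents, and `should` clauses are optional when a `must` clause is present but raise the score of documents that match them. Writing these bodies by hand gets verbose, so a small builder can help. The two functions below are hypothetical helpers for this notebook, not part of the Elasticsearch client; they only assemble plain dictionaries.

```python
def match(field, text):
    """Build a single match clause."""
    return {"match": {field: text}}


def bool_query(must=(), should=(), must_not=(), filters=()):
    """Assemble a bool query body, leaving out empty clause lists."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filters),
    }
    return {"query": {"bool": {name: items
                               for name, items in clauses.items() if items}}}


# Same query as above, expressed with the helpers.
body = bool_query(
    must=[match("fields.title", "Star Wars")],
    must_not=[match("fields.directors", "George Miller")],
    should=[match("fields.title", "Star"),
            match("fields.directors", "George Lucas")],
)
# es.search(index="movies", body=body)
```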
Filtering queries with filter
Here we look for recipes that have a parmesan ingredient and no tuna ingredient, keeping only recipes with a preparation time of 15 minutes or less.
es.search(index="receipe", body={
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "ingredients.name": "parmesan"
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "ingredients.name": "tuna"
                    }
                }
            ],
            "filter": [
                {
                    "range": {
                        "preparation_time_minutes": {
                            "lte": 15
                        }
                    }
                }
            ]
        }
    }
})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '1', '_index': 'receipe', '_score': 1.379573, '_source': {'created': '2017/03/29', 'description': "Cherry tomatoes are almost always sweeter, riper, and higher in pectin than larger tomatoes at the supermarket. All of these factors mean that cherry tomatoes are fantastic for making a rich, thick, flavorful sauce. Even better: It takes only four ingredients and about 10 minutes, start to finish—less time than it takes to cook the pasta you're gonna serve it with.", 'ingredients': [{'name': 'Dry pasta', 'quantity': '450g'}, {'name': 'Kosher salt'}, {'name': 'Cloves garlic', 'quantity': '4'}, {'name': 'Extra-virgin olive oil', 'quantity': '90ml'}, {'name': 'Cherry tomatoes', 'quantity': '750g'}, {'name': 'Fresh basil leaves', 'quantity': '30g'}, {'name': 'Freshly ground black pepper'}, {'name': 'Parmesan cheese'}], 'preparation_time_minutes': 12, 'ratings': [4.5, 5.0, 3.0, 4.5], 'servings': {'max': 6, 'min': 4}, 'steps': ['Place pasta in a large skillet or sauté pan and cover with water and a big pinch of salt. Bring to a boil over high heat, stirring occasionally. Boil until just shy of al dente, about 1 minute less than the package instructions recommend.', 'Meanwhile, heat garlic and 4 tablespoons (60ml) olive oil in a 12-inch skillet over medium heat, stirring frequently, until garlic is softened but not browned, about 3 minutes. Add tomatoes and cook, stirring, until tomatoes begin to burst. You can help them along by pressing on them with the back of a wooden spoon as they soften.', 'Continue to cook until sauce is rich and creamy, about 5 minutes longer. Stir in basil and season to taste with salt and pepper.', 'When pasta is cooked, drain, reserving 1 cup of pasta water. Add pasta to sauce and increase heat to medium-high. Cook, stirring and tossing constantly and adding reserved pasta water as necessary to adjust consistency to a nice, creamy flow. 
Remove from heat, stir in remaining 2 tablespoons (30ml) olive oil, and grate in a generous shower of Parmesan cheese. Serve immediately, passing extra Parmesan at the table.'], 'title': 'Fast and Easy Pasta With Blistered Cherry Tomato Sauce'}, '_type': '_doc'}, {'_id': '10', '_index': 'receipe', '_score': 1.2786832, '_source': {'created': '2017/04/27', 'description': 'Exceedingly simple in concept and execution, arrabbiata sauce is tomato sauce with the distinction of being spicy enough to earn its "angry" moniker. Here\'s how to make it, from start to finish.', 'ingredients': [{'name': 'Kosher salt'}, {'name': 'Penne pasta', 'quantity': '450g'}, {'name': 'Extra-virgin olive oil', 'quantity': '3 tablespoons'}, {'name': 'Clove garlic', 'quantity': '1'}, {'name': 'Crushed red pepper'}, {'name': 'Can whole peeled tomatoes', 'quantity': '400g'}, {'name': 'Finely grated Parmesan cheese', 'quantity': '60g'}, {'name': 'Minced flat-leaf parsley leaves', 'quantity': 'Small handful'}], 'preparation_time_minutes': 15, 'ratings': [1.5, 2.0, 4.0, 3.5, 3.0, 5.0, 1.5], 'servings': {'max': 4, 'min': 4}, 'steps': ['In a medium saucepan of boiling salted water, cook penne until just short of al dente, about 1 minute less than the package recommends.', 'Meanwhile, in a large skillet, combine oil, garlic, and pepper flakes. Cook over medium heat until garlic is very lightly golden, about 5 minutes. (Adjust heat as necessary to keep it gently sizzling.)', 'Add tomatoes, stir to combine, and bring to a bare simmer. When pasta is ready, transfer it to sauce using a strainer or slotted spoon. (Alternatively, drain pasta through a colander, reserving 1 cup of cooking water. Add drained pasta to sauce.)', 'Add about 1/4 cup pasta water to sauce and increase heat to bring pasta and sauce to a vigorous simmer. Cook, stirring and shaking the pan and adding more pasta water as necessary to keep sauce loose, until pasta is perfectly al dente, 1 to 2 minutes longer. 
(The pasta will cook more slowly in the sauce than it did in the water.)', 'Continue cooking pasta until sauce thickens and begins to coat noodles, then remove from heat and toss in cheese and parsley, stirring vigorously to incorporate. Stir in a drizzle of fresh olive oil, if desired. Season with salt and serve right away, passing more cheese at the table.'], 'title': 'Penne With Hot-As-You-Dare Arrabbiata Sauce'}, '_type': '_doc'}], 'max_score': 1.379573, 'total': {'relation': 'eq', 'value': 2}}, 'timed_out': False, 'took': 21}
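The `range` clause accepts the bounds `gt`, `gte`, `lt` and `lte`; the second hit above, with a preparation time of exactly 15 minutes, is kept because `lte` is inclusive. The sketch below mimics that comparison locally in plain Python to make the semantics concrete. It is an illustration only, not how Elasticsearch evaluates the filter, and `in_range` is a hypothetical name.

```python
import operator

# Map each range bound to the comparison it stands for.
_OPERATORS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if `value` satisfies every bound, e.g. {"lte": 15}."""
    return all(_OPERATORS[name](value, limit) for name, limit in bounds.items())


for minutes in (12, 15, 16):
    print(minutes, in_range(minutes, {"lte": 15}))
```

Unlike `must`, a `filter` clause does not contribute to the score; it only decides whether a document is kept.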
Searching with a prefix
Queries of type prefix find all terms that begin with the given character(s).
es.search(index="cities", body={"query": {"prefix" : { "city" : "l" }}})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}, {'_id': '3', '_index': 'cities', '_score': 1.0, '_source': {'city': 'Los Angeles', 'country': 'USA'}, '_type': 'places'}], 'max_score': 1.0, 'total': {'relation': 'eq', 'value': 2}}, 'timed_out': False, 'took': 11}
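The lowercase prefix "l" finds "London" and "Los Angeles" because the city field is a text field: its values are analyzed at index time, and the default analyzer splits them into lowercase terms. The prefix is then compared against those terms, not against the original string. The sketch below reproduces that behaviour locally; `str.lower().split()` is a rough stand-in for the analyzer, assumed good enough for these three city names.

```python
def analyzed_terms(text):
    """Rough stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def prefix_matches(text, prefix):
    """Return True if any analyzed term of `text` starts with `prefix`."""
    return any(term.startswith(prefix) for term in analyzed_terms(text))


cities = ["New Delhi", "London", "Los Angeles"]
print([city for city in cities if prefix_matches(city, "l")])
# ['London', 'Los Angeles']
```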
Searching with regular expressions
# match everything
es.search(index="cities", body={"query": {"regexp" : { "city" : ".*" }}})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '1', '_index': 'cities', '_score': 1.0, '_source': {'city': 'New Delhi', 'country': 'India'}, '_type': 'places'}, {'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}, {'_id': '3', '_index': 'cities', '_score': 1.0, '_source': {'city': 'Los Angeles', 'country': 'USA'}, '_type': 'places'}], 'max_score': 1.0, 'total': {'relation': 'eq', 'value': 3}}, 'timed_out': False, 'took': 4}
# cities with a term starting with l
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*" }}})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}, {'_id': '3', '_index': 'cities', '_score': 1.0, '_source': {'city': 'Los Angeles', 'country': 'USA'}, '_type': 'places'}], 'max_score': 1.0, 'total': {'relation': 'eq', 'value': 2}}, 'timed_out': False, 'took': 15}
# cities with a term starting with l and ending with n
es.search(index="cities", body={"query": {"regexp" : { "city" : "l.*n" }}})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 1, 'total': 1}, 'hits': {'hits': [{'_id': '2', '_index': 'cities', '_score': 1.0, '_source': {'city': 'London', 'country': 'England'}, '_type': 'places'}], 'max_score': 1.0, 'total': {'relation': 'eq', 'value': 1}}, 'timed_out': False, 'took': 50}
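A regexp query has to match a whole term, as if the pattern were anchored at both ends, which is why "l.*n" finds "London" but not "Los Angeles". The sketch below imitates this locally with `re.fullmatch` on lowercase terms. Python's regex syntax differs from the Lucene syntax Elasticsearch uses, so this is only an approximation that happens to hold for these simple patterns.

```python
import re


def regexp_matches(text, pattern):
    """Return True if `pattern` matches one whole lowercase term of `text`."""
    return any(re.fullmatch(pattern, term) for term in text.lower().split())


cities = ["New Delhi", "London", "Los Angeles"]
for pattern in (".*", "l.*", "l.*n"):
    print(pattern, [city for city in cities if regexp_matches(city, pattern)])
```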
Aggregation
# simple aggregation: number of movies per year
res = es.search(index="movies", body={"aggs": {
    "nb_par_annee": {
        "terms": {"field": "fields.year"}
    }}})
res['aggregations']
{'nb_par_annee': {'buckets': [{'doc_count': 448, 'key': 2013}, {'doc_count': 404, 'key': 2012}, {'doc_count': 308, 'key': 2011}, {'doc_count': 253, 'key': 2009}, {'doc_count': 249, 'key': 2010}, {'doc_count': 207, 'key': 2008}, {'doc_count': 204, 'key': 2006}, {'doc_count': 200, 'key': 2007}, {'doc_count': 170, 'key': 2005}, {'doc_count': 152, 'key': 2014}], 'doc_count_error_upper_bound': 52, 'sum_other_doc_count': 2192}}
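A terms aggregation returns its buckets as a list of `key` / `doc_count` pairs, ten by default; `sum_other_doc_count` counts the documents that fall into buckets not returned. For further processing it is often handier to turn the buckets into a dictionary. The sketch below does so on a sample trimmed from the output above (first three buckets); `bucket_counts` is a hypothetical helper name.

```python
# First three buckets of the output above.
aggregations = {'nb_par_annee': {
    'buckets': [{'doc_count': 448, 'key': 2013},
                {'doc_count': 404, 'key': 2012},
                {'doc_count': 308, 'key': 2011}],
    'doc_count_error_upper_bound': 52,
    'sum_other_doc_count': 2192,
}}


def bucket_counts(aggregations, name):
    """Return {bucket key: document count} for one terms aggregation."""
    return {bucket["key"]: bucket["doc_count"]
            for bucket in aggregations[name]["buckets"]}


print(bucket_counts(aggregations, "nb_par_annee"))
# {2013: 448, 2012: 404, 2011: 308}
```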
# simple metric aggregation: average rating
res = es.search(index="movies", body={"aggs": {
    "note_moyenne": {
        "avg": {"field": "fields.rating"}
    }}})
res['aggregations']
{'note_moyenne': {'value': 6.387107691895831}}
# nested aggregation: basic rating statistics per year
res = es.search(index="movies", body={"aggs": {
    "group_year": {
        "terms": {"field": "fields.year"},
        "aggs": {
            "note_moyenne": {"avg": {"field": "fields.rating"}},
            "note_min": {"min": {"field": "fields.rating"}},
            "note_max": {"max": {"field": "fields.rating"}}
        }
    }}})
res["aggregations"]
{'group_year': {'buckets': [{'doc_count': 448, 'key': 2013, 'note_max': {'value': 8.699999809265137}, 'note_min': {'value': 2.5}, 'note_moyenne': {'value': 5.962700002789497}}, {'doc_count': 404, 'key': 2012, 'note_max': {'value': 8.600000381469727}, 'note_min': {'value': 2.4000000953674316}, 'note_moyenne': {'value': 5.961786593160322}}, {'doc_count': 308, 'key': 2011, 'note_max': {'value': 8.5}, 'note_min': {'value': 1.7000000476837158}, 'note_moyenne': {'value': 6.114285714440531}}, {'doc_count': 253, 'key': 2009, 'note_max': {'value': 8.399999618530273}, 'note_min': {'value': 2.700000047683716}, 'note_moyenne': {'value': 6.268774692248921}}, {'doc_count': 249, 'key': 2010, 'note_max': {'value': 8.800000190734863}, 'note_min': {'value': 1.7999999523162842}, 'note_moyenne': {'value': 6.239759046868627}}, {'doc_count': 207, 'key': 2008, 'note_max': {'value': 9.0}, 'note_min': {'value': 1.7999999523162842}, 'note_moyenne': {'value': 6.230917865527425}}, {'doc_count': 204, 'key': 2006, 'note_max': {'value': 8.5}, 'note_min': {'value': 1.7999999523162842}, 'note_moyenne': {'value': 6.31617646708208}}, {'doc_count': 200, 'key': 2007, 'note_max': {'value': 8.300000190734863}, 'note_min': {'value': 2.200000047683716}, 'note_moyenne': {'value': 6.419499988555908}}, {'doc_count': 170, 'key': 2005, 'note_max': {'value': 8.300000190734863}, 'note_min': {'value': 2.299999952316284}, 'note_moyenne': {'value': 6.289999998317045}}, {'doc_count': 152, 'key': 2014, 'note_max': {'value': 4.860000133514404}, 'note_min': {'value': 4.860000133514404}, 'note_moyenne': {'value': 4.860000133514404}}], 'doc_count_error_upper_bound': 52, 'sum_other_doc_count': 2192}}
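When metric aggregations are nested under a terms aggregation, each bucket carries its own results under the names chosen in the request. The sketch below flattens the first two buckets of the output above into one row per year, rounding the values for display; `year_stats` is a hypothetical helper name. Values such as 8.699999809265137 presumably come from the ratings being stored as single-precision floats.

```python
# First two buckets of the output above.
aggregations = {'group_year': {'buckets': [
    {'doc_count': 448, 'key': 2013,
     'note_max': {'value': 8.699999809265137},
     'note_min': {'value': 2.5},
     'note_moyenne': {'value': 5.962700002789497}},
    {'doc_count': 404, 'key': 2012,
     'note_max': {'value': 8.600000381469727},
     'note_min': {'value': 2.4000000953674316},
     'note_moyenne': {'value': 5.961786593160322}},
]}}


def year_stats(aggregations, name="group_year"):
    """Return one flat dict per bucket with count, min, max and average."""
    rows = []
    for bucket in aggregations[name]["buckets"]:
        rows.append({
            "year": bucket["key"],
            "count": bucket["doc_count"],
            "min": round(bucket["note_min"]["value"], 2),
            "max": round(bucket["note_max"]["value"], 2),
            "avg": round(bucket["note_moyenne"]["value"], 2),
        })
    return rows


for row in year_stats(aggregations):
    print(row)
```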
📟 Exercise [optional]
Try other queries.
Datetime aggregation
To illustrate aggregation by datetime, we will create a travel index and use data of the following form:
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)}
# specify the mapping and create the index
if es.indices.exists(index="travel"):
    es.indices.delete(index="travel", ignore=[400, 404])
settings = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1
    },
    "mappings": {
        "properties": {
            "city": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "country": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256
                    }
                }
            },
            "datetime": {
                "type": "date"
            }
        }
    }
}
es.indices.create(index="travel", ignore=400, body=settings)
{'acknowledged': True, 'index': 'travel', 'shards_acknowledged': True}
import datetime
doc1 = {"city":"Bangalore", "country":"India","datetime": datetime.datetime(2018,1,1,10,20,0)} #datetime format: yyyy,MM,dd,hh,mm,ss
doc2 = {"city":"London", "country":"England","datetime": datetime.datetime(2018,1,2,22,30,0)}
doc3 = {"city":"Los Angeles", "country":"USA","datetime": datetime.datetime(2018,4,19,18,20,0)}
es.index(index="travel", id=1, body=doc1)
es.index(index="travel", id=2, body=doc2)
es.index(index="travel", id=3, body=doc3)
{'_id': '3', '_index': 'travel', '_primary_term': 1, '_seq_no': 2, '_shards': {'failed': 0, 'successful': 1, 'total': 2}, '_type': '_doc', '_version': 1, 'result': 'created'}
es.indices.get_mapping(index='travel')
{'travel': {'mappings': {'properties': {'city': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}}, 'type': 'text'}, 'country': {'fields': {'keyword': {'ignore_above': 256, 'type': 'keyword'}}, 'type': 'text'}, 'datetime': {'type': 'date'}}}}}
es.search(index="travel", body={"from": 0, "size": 0, "query": {"match_all": {}}, "aggs": {
"country": {
"date_histogram": {"field": "datetime", "calendar_interval": "year"}}}})
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 2, 'total': 2}, 'aggregations': {'country': {'buckets': []}}, 'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}, 'timed_out': False, 'took': 8}
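A date_histogram with calendar_interval set to year groups documents by the calendar year of the date field. The sketch below applies the same grouping locally to the three documents indexed above, purely to show what the bucketing does; `yearly_histogram` is a hypothetical helper name and nothing here queries Elasticsearch. Note that the response above reports zero hits: one possible explanation, stated here as an assumption, is that the search ran before the index was refreshed, since newly indexed documents only become searchable after a refresh.

```python
import datetime
from collections import Counter

docs = [
    {"city": "Bangalore", "datetime": datetime.datetime(2018, 1, 1, 10, 20, 0)},
    {"city": "London", "datetime": datetime.datetime(2018, 1, 2, 22, 30, 0)},
    {"city": "Los Angeles", "datetime": datetime.datetime(2018, 4, 19, 18, 20, 0)},
]


def yearly_histogram(documents, field="datetime"):
    """Count documents per calendar year, sorted by year."""
    counts = Counter(doc[field].year for doc in documents)
    return dict(sorted(counts.items()))


print(yearly_histogram(docs))  # {2018: 3}
```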
📟 Exercise [optional]
Create the following document and insert it into the index, then display the previous histogram again and explain what has changed.
doc4 = {"city":"Sydney", "country":"Australia","datetime":datetime.datetime(2019,4,19,18,20,0)}
{'_id': '4', '_index': 'travel', '_primary_term': 1, '_seq_no': 0, '_shards': {'failed': 0, 'successful': 1, 'total': 2}, '_type': '_doc', '_version': 1, 'result': 'created'}
{'_shards': {'failed': 0, 'skipped': 0, 'successful': 2, 'total': 2}, 'aggregations': {'country': {'buckets': []}}, 'hits': {'hits': [], 'max_score': None, 'total': {'relation': 'eq', 'value': 0}}, 'timed_out': False, 'took': 5}
Introduction to text search: the _analyze endpoint
Building an analyzer
Before starting this part, make sure you have created a french analyzer in Elasticsearch.
Below is the French analyzer example seen in the course:
PUT french
{
  "settings": {
    "analysis": {
      "filter": {
        "french_elision": {
          "type": "elision",
          "articles_case": true,
          "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"]
        },
        "french_synonym": {
          "type": "synonym",
          "ignore_case": true,
          "expand": true,
          "synonyms": [
            "réviser, étudier, bosser",
            "mayo, mayonnaise",
            "grille, toaste"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        }
      },
      "analyzer": {
        "french_heavy": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding",
            "french_synonym",
            "french_stemmer"
          ]
        },
        "french_light": {
          "tokenizer": "icu_tokenizer",
          "filter": [
            "french_elision",
            "icu_folding"
          ]
        }
      }
    }
  }
}
🤓 Make sure to install the plugin that provides icu_tokenizer (analysis-icu) first, otherwise you will get an error.
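The request above is written in console syntax. Since this notebook works through the Python client, the same settings can presumably be sent with `indices.create`, as sketched below. The constant and function names are hypothetical; the body is a direct transcription of the JSON above, with `true` written as Python's `True`.

```python
FRENCH_INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "french_elision": {
                    "type": "elision",
                    "articles_case": True,
                    "articles": ["l", "m", "t", "qu", "n", "s", "j", "d", "c",
                                 "jusqu", "quoiqu", "lorsqu", "puisqu"],
                },
                "french_synonym": {
                    "type": "synonym",
                    "ignore_case": True,
                    "expand": True,
                    "synonyms": [
                        "réviser, étudier, bosser",
                        "mayo, mayonnaise",
                        "grille, toaste",
                    ],
                },
                "french_stemmer": {"type": "stemmer", "language": "light_french"},
            },
            "analyzer": {
                "french_heavy": {
                    "tokenizer": "icu_tokenizer",
                    "filter": ["french_elision", "icu_folding",
                               "french_synonym", "french_stemmer"],
                },
                "french_light": {
                    "tokenizer": "icu_tokenizer",
                    "filter": ["french_elision", "icu_folding"],
                },
            },
        }
    }
}


def create_french_index(client, name="french"):
    """Create the index with the analysis settings defined above."""
    return client.indices.create(index=name, body=FRENCH_INDEX_BODY)


# Example (requires the `es` client and the ICU plugin):
# create_french_index(es)
```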
doc1 = {"text" : "Une phrase en français :) ..."}
es.index(index="french", id=1, body=doc1)
{'_id': '1', '_index': 'french', '_primary_term': 1, '_seq_no': 2, '_shards': {'failed': 0, 'successful': 1, 'total': 2}, '_type': '_doc', '_version': 3, 'result': 'updated'}
es.indices.analyze(index="french",body={
"text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
})
{'tokens': [{'end_offset': 2, 'position': 0, 'start_offset': 0, 'token': 'je', 'type': '<ALPHANUM>'}, {'end_offset': 7, 'position': 1, 'start_offset': 3, 'token': 'dois', 'type': '<ALPHANUM>'}, {'end_offset': 14, 'position': 2, 'start_offset': 8, 'token': 'bosser', 'type': '<ALPHANUM>'}, {'end_offset': 19, 'position': 3, 'start_offset': 15, 'token': 'pour', 'type': '<ALPHANUM>'}, {'end_offset': 23, 'position': 4, 'start_offset': 20, 'token': 'mon', 'type': '<ALPHANUM>'}, {'end_offset': 27, 'position': 5, 'start_offset': 24, 'token': 'qcm', 'type': '<ALPHANUM>'}, {'end_offset': 33, 'position': 6, 'start_offset': 28, 'token': 'sinon', 'type': '<ALPHANUM>'}, {'end_offset': 36, 'position': 7, 'start_offset': 34, 'token': 'je', 'type': '<ALPHANUM>'}, {'end_offset': 41, 'position': 8, 'start_offset': 37, 'token': 'vais', 'type': '<ALPHANUM>'}, {'end_offset': 47, 'position': 9, 'start_offset': 42, 'token': 'avoir', 'type': '<ALPHANUM>'}, {'end_offset': 51, 'position': 10, 'start_offset': 48, 'token': 'une', 'type': '<ALPHANUM>'}, {'end_offset': 56, 'position': 11, 'start_offset': 52, 'token': 'sale', 'type': '<ALPHANUM>'}, {'end_offset': 61, 'position': 12, 'start_offset': 57, 'token': 'note', 'type': '<ALPHANUM>'}]}
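The call above does not name an analyzer, so the text goes through the index's default analyzer; that is consistent with the output, where "bosser" is neither stemmed nor expanded into synonyms. To try one of the custom analyzers, the analyze request accepts an `analyzer` key in its body. The sketch below builds such a body and reduces a response to its token strings; both helper names are hypothetical, and the sample is the first three tokens of the output above.

```python
def analyze_body(text, analyzer=None):
    """Build the body of an analyze request, optionally naming an analyzer."""
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    return body


def token_strings(analysis):
    """Return only the token text from an analyze response."""
    return [token["token"] for token in analysis["tokens"]]


# First three tokens of the output above.
sample = {'tokens': [
    {'end_offset': 2, 'position': 0, 'start_offset': 0, 'token': 'je', 'type': '<ALPHANUM>'},
    {'end_offset': 7, 'position': 1, 'start_offset': 3, 'token': 'dois', 'type': '<ALPHANUM>'},
    {'end_offset': 14, 'position': 2, 'start_offset': 8, 'token': 'bosser', 'type': '<ALPHANUM>'},
]}
print(token_strings(sample))  # ['je', 'dois', 'bosser']

# Example (requires the `es` client and the french index):
# es.indices.analyze(index="french",
#                    body=analyze_body("Je dois bosser", analyzer="french_heavy"))
```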
📟 Exercise [optional]
Add smiley recognition to your analyzer so that it performs the following mapping:
:) -> _content_
:( -> _triste_
Then run a request in Python on the document below:
{
"text" : "Je dois bosser pour mon QCM sinon je vais avoir une sale note :( ..."
}
{'tokens': [{'end_offset': 2, 'position': 0, 'start_offset': 0, 'token': 'je', 'type': '<ALPHANUM>'}, {'end_offset': 7, 'position': 1, 'start_offset': 3, 'token': 'dois', 'type': '<ALPHANUM>'}, {'end_offset': 14, 'position': 2, 'start_offset': 8, 'token': 'bosser', 'type': '<ALPHANUM>'}, {'end_offset': 19, 'position': 3, 'start_offset': 15, 'token': 'pour', 'type': '<ALPHANUM>'}, {'end_offset': 23, 'position': 4, 'start_offset': 20, 'token': 'mon', 'type': '<ALPHANUM>'}, {'end_offset': 27, 'position': 5, 'start_offset': 24, 'token': 'qcm', 'type': '<ALPHANUM>'}, {'end_offset': 33, 'position': 6, 'start_offset': 28, 'token': 'sinon', 'type': '<ALPHANUM>'}, {'end_offset': 36, 'position': 7, 'start_offset': 34, 'token': 'je', 'type': '<ALPHANUM>'}, {'end_offset': 41, 'position': 8, 'start_offset': 37, 'token': 'vais', 'type': '<ALPHANUM>'}, {'end_offset': 47, 'position': 9, 'start_offset': 42, 'token': 'avoir', 'type': '<ALPHANUM>'}, {'end_offset': 51, 'position': 10, 'start_offset': 48, 'token': 'une', 'type': '<ALPHANUM>'}, {'end_offset': 56, 'position': 11, 'start_offset': 52, 'token': 'sale', 'type': '<ALPHANUM>'}, {'end_offset': 61, 'position': 12, 'start_offset': 57, 'token': 'note', 'type': '<ALPHANUM>'}, {'end_offset': 64, 'position': 13, 'start_offset': 62, 'token': '_triste_', 'type': '<ALPHANUM>'}]}