This project presents a REST API that retrieves the songs most similar to a user's text query. It uses the Single-Pass In-Memory Indexing (SPIMI) technique to build an inverted index over song data (lyrics, artist, album name). Because SPIMI builds the index in size-bounded blocks that are merged afterwards, the collection can be indexed without holding all of it in memory at once, and queries are answered from the merged index.
Features:
- Efficient SPIMI-based indexing for quick data retrieval.
- Customizable query system tailored for music-related searches.
- Process: The data undergoes tokenization, normalization, and stemming.
- Description: Blocks of user-defined maximum size are created from the token stream and merged into a final index.
- Methodology: Utilizes tf.idf weights to process the merged index for query use.
- Benefits: Ensures relevant and accurately ranked search results.
- Functionality: Users can retrieve the top 'K' songs that closely match their query.
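The tf.idf weighting and top-K retrieval described in the list above can be illustrated with a minimal sketch. It assumes a tiny in-memory inverted index that maps each term to per-document term frequencies, and it uses the common log-scaled tf.idf variant with summed weights as the score. The names `rank_top_k` and `toy_index` are hypothetical, and the project's actual weighting and scoring may differ.

```python
import heapq
import math
from collections import defaultdict


def rank_top_k(index, n_docs, query_terms, k):
    """Return the k best (doc_id, score) pairs for the query terms.

    index maps term -> {doc_id: term frequency}. Weights use the
    log-scaled variant (1 + log10(tf)) * log10(N / df). This formula is
    an assumption; the project may use a different tf.idf variant.
    """
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        idf = math.log10(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log10(tf)) * idf
    # nlargest avoids fully sorting every scored document
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical toy data: three songs, three indexed terms.
toy_index = {
    "sugar": {"song_a": 4, "song_b": 1},
    "pour": {"song_a": 2},
    "rain": {"song_c": 3},
}
top = rank_top_k(toy_index, n_docs=3, query_terms=["pour", "sugar"], k=2)
print(top)
```

Here `song_a` ranks first because it matches both query terms, and the rarer term contributes the larger idf.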
The SPIMI implementation follows the theory described in the documentation published by The Stanford Natural Language Processing Group.
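As a rough illustration of the SPIMI process, the sketch below turns documents into a token stream, builds size-limited blocks from it, and merges the blocks into one index. It is not the project's code: blocks stay in memory instead of being written to disk, the block limit counts postings rather than bytes, stemming is left out, and the names `token_stream`, `spimi_blocks`, `merge_blocks`, and `max_postings` are hypothetical.

```python
import re
from collections import defaultdict


def token_stream(docs):
    """Yield (term, doc_id) pairs: lowercase, split on non-alphanumerics."""
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            yield term, doc_id


def spimi_blocks(tokens, max_postings):
    """Yield blocks (term -> sorted doc ids) of at most max_postings postings."""
    block, size = defaultdict(set), 0
    for term, doc_id in tokens:
        if doc_id not in block[term]:
            block[term].add(doc_id)
            size += 1
        if size >= max_postings:
            # Block is full: emit it with terms sorted, then start a new one.
            yield {t: sorted(ids) for t, ids in sorted(block.items())}
            block, size = defaultdict(set), 0
    if block:
        yield {t: sorted(ids) for t, ids in sorted(block.items())}


def merge_blocks(blocks):
    """Merge blocks into one index with sorted terms and postings.

    A disk-based version would stream a k-way merge over the sorted
    blocks instead of collecting everything in one dictionary.
    """
    merged = defaultdict(set)
    for block in blocks:
        for term, doc_ids in block.items():
            merged[term].update(doc_ids)
    return {term: sorted(ids) for term, ids in sorted(merged.items())}


# Hypothetical two-document collection.
docs = {1: "Pour some sugar on me", 2: "Watermelon sugar high"}
blocks = list(spimi_blocks(token_stream(docs), max_postings=3))
index = merge_blocks(blocks)
print(index["sugar"])
```

The term "sugar" ends up in two different blocks, and the merge step combines both postings lists into a single entry.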
- Python 3.11
- Basic understanding of REST APIs and Python scripting.
To set up and run the project, follow these steps:
- Clone the Project:
      git clone https://github.com/NicolasArroyo/spimi-ranked-retrieval
- Install Dependencies:
      cd spimi-ranked-retrieval
      pip install -r requirements.txt
- Environment Setup:
  - Create a .env file with the following content. Adjust PAGE_SIZE as needed.
        PAGE_SIZE=4096
- Index Generation (Optional):
  - Modify the token stream source in generate_index() within index_generator.py.
  - Run the script:
        python3 index_generator.py
  - Note: A pre-loaded index with 16k songs is provided.
- Running the API:
      uvicorn api:app --host 0.0.0.0 --port 3000
- Query for Artist:
      POST localhost:3000/api/index
      Content-Type: application/json

      { "query": "Weeknd", "k": 3 }

  Response:
      {
        "doc0": {
          "score": 7.889683586827221,
          "id": "3Dt75NjLThmoBTp5wQC7g7",
          "name": "Same Old Song",
          "artist": "The Weeknd",
          "album_name": "Trilogy"
        },
        "doc1": {
          "score": 3.9448417934136106,
          "id": "00NAQYOP4AmWR549nnYJZu",
          "name": "Secrets",
          "artist": "The Weeknd",
          "album_name": "Starboy"
        },
        "doc2": {
          "score": 3.9448417934136106,
          "id": "0dcf0L6F1LUA1nE2zWH4J2",
          "name": "The Party & The After Party",
          "artist": "The Weeknd",
          "album_name": "Trilogy"
        }
      }
- Query for Lyrics:
      POST localhost:3000/api/index
      Content-Type: application/json

      { "query": "Pour some sugar on me", "k": 2 }

  Response:
      {
        "doc0": {
          "score": 431.1133029364177,
          "id": "0PdM2a6oIjqepoEfcJo0RO",
          "name": "Pour Some Sugar On Me - Remastered 2017",
          "artist": "Def Leppard",
          "album_name": "Hysteria (Super Deluxe)"
        },
        "doc1": {
          "score": 413.0929137110803,
          "id": "1e9oZCCiX42nJl0AcqriVo",
          "name": "Watermelon Sugar",
          "artist": "Harry Styles",
          "album_name": "Watermelon Sugar"
        }
      }
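The two requests above could also be issued from Python. The sketch below uses only the standard library and assumes the API is reachable at localhost on port 3000, as in the examples. The helper names `build_search_request` and `search_songs` are hypothetical and not part of the project, and the response is assumed to have the doc0, doc1, ... shape shown in the sample responses.

```python
import json
import urllib.request

# Endpoint taken from the example requests; adjust host and port if needed.
API_URL = "http://localhost:3000/api/index"


def build_search_request(query, k, url=API_URL):
    """Build the POST request with a JSON body {"query": ..., "k": ...}."""
    body = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_songs(query, k):
    """Send the query and return the results as a list ordered doc0, doc1, ...

    Assumes the response keys are exactly doc0 .. doc{n-1}.
    """
    with urllib.request.urlopen(build_search_request(query, k)) as response:
        results = json.load(response)
    return [results[f"doc{i}"] for i in range(len(results))]
```

With the server running, calling search_songs("Pour some sugar on me", 2) would return a list of song dictionaries with the score, id, name, artist, and album_name fields. Keeping the request construction separate from the network call makes the request easy to inspect without a running server.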
This project is licensed under the MIT License.