{ "cells": [ { "cell_type": "markdown", "id": "8ee0e82f-95fa-4efa-84e3-9873d4dcdaf1", "metadata": { "id": "8ee0e82f-95fa-4efa-84e3-9873d4dcdaf1" }, "source": [ "# Sound Similarity Search with Vector Database\n", "\n", "Use CassIO and Astra DB / Apache Cassandra® for similarity searches between **sound samples**, powered by sound embeddings and Vector Search." ] }, { "cell_type": "markdown", "id": "2bdb8284", "metadata": {}, "source": [ "_**NOTE:** this uses Cassandra's \"Vector Similarity Search\" capability.\n", "Make sure you are connecting to a vector-enabled database for this demo._" ] }, { "cell_type": "markdown", "id": "58bc1d2f-7039-4d2a-950c-ff3686013c55", "metadata": { "id": "58bc1d2f-7039-4d2a-950c-ff3686013c55" }, "source": [ "In this notebook you will:\n", "\n", "1. Download a library of sound samples from HuggingFace Datasets.\n", "2. Calculate sound embedding vectors for them with PANNs Inference.\n", "3. Store the embedding vectors on a table in your Cassandra / Astra DB instance, using the CassIO library for ease of operation.\n", "4. Run one or more searches for sounds similar to a provided sample.\n", "5. Start a simple web-app that exposes a **sound search** feature." ] }, { "cell_type": "markdown", "id": "e49d8157", "metadata": {}, "source": [ "### Import packages" ] }, { "cell_type": "markdown", "id": "49c18332", "metadata": {}, "source": [ "The CassIO object needed for this demo is the `MetadataVectorCassandraTable`:" ] }, { "cell_type": "code", "execution_count": 1, "id": "449f9dff", "metadata": {}, "outputs": [], "source": [ "from cassio.table import MetadataVectorCassandraTable" ] }, { "cell_type": "markdown", "id": "25d08ec4-0e7b-4819-933f-43ed5ff95e48", "metadata": { "id": "25d08ec4-0e7b-4819-933f-43ed5ff95e48" }, "source": [ "Other packages are needed for various tasks in this demo:" ] }, { "cell_type": "code", "execution_count": 2, "id": "54137e39", "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "from IPython.display import Audio\n", "from tqdm.auto import tqdm\n", "import torch\n", "import numpy as np\n", "\n", "# processing of sound samples:\n", "from scipy.io import wavfile\n", "import librosa\n", "# HuggingFace dataset loading:\n", "from datasets import load_dataset\n", "# Sound embedding calculation:\n", "from panns_inference import AudioTagging\n", "# To spawn simple data-oriented UIs from the notebook\n", "import gradio" ] }, { "cell_type": "code", "execution_count": 3, "id": "01da99af-da9b-4f38-b841-d802ff23bf2f", "metadata": { "id": "01da99af-da9b-4f38-b841-d802ff23bf2f" }, "outputs": [], "source": [ "try:\n", " from google.colab import files\n", " IS_COLAB = True\n", "except ModuleNotFoundError:\n", " IS_COLAB = False" ] }, { "cell_type": "markdown", "id": "b7a93b64-e9e9-41d1-95c9-dc9194a5ec8d", "metadata": { "id": "b7a93b64-e9e9-41d1-95c9-dc9194a5ec8d" }, "source": [ "### Connect to your DB" ] }, { "cell_type": "markdown", "id": "1dfb2e30", "metadata": { "id": "1dfb2e30" }, "source": [ "A database connection is needed. 
{ "cell_type": "code", "execution_count": 4, "id": "dbd71d7d-fd25-477f-8117-1ac66ba7fe1f", "metadata": {}, "outputs": [], "source": [ "# Ensure loading of database credentials into environment variables:\n", "import os\n", "from dotenv import load_dotenv\n", "load_dotenv(\"../../../.env\")\n", "\n", "import cassio" ] }, { "cell_type": "markdown", "id": "c30af571-5244-4b64-8bb9-853fb80749be", "metadata": {}, "source": [ "Select your choice of database by editing this cell, if needed:" ] }, { "cell_type": "code", "execution_count": 5, "id": "c32c5b16-6985-4a51-996f-5ec08f374934", "metadata": {}, "outputs": [], "source": [ "database_mode = \"cassandra\" # \"cassandra\" / \"astra_db\"" ] }, { "cell_type": "code", "execution_count": 6, "id": "09045382-f2bf-4cff-a35c-e3db0b79eeb5", "metadata": {}, "outputs": [], "source": [ "if database_mode == \"astra_db\":\n", " cassio.init(\n", " database_id=os.environ[\"ASTRA_DB_ID\"],\n", " token=os.environ[\"ASTRA_DB_APPLICATION_TOKEN\"],\n", " keyspace=os.environ.get(\"ASTRA_DB_KEYSPACE\"), # this is optional\n", " )" ] }, { "cell_type": "code", "execution_count": 7, "id": "e84678ae-4648-4df8-b829-0e2bb467085b", "metadata": {}, "outputs": [], "source": [ "if database_mode == \"cassandra\":\n", " from cqlsession import getCassandraCQLSession, getCassandraCQLKeyspace\n", " cassio.init(\n", " session=getCassandraCQLSession(),\n", " keyspace=getCassandraCQLKeyspace(),\n", " )" ] }, { "cell_type": "markdown", "id": "CCOnMYJoxLJw", "metadata": { "id": "CCOnMYJoxLJw" }, "source": [ "\n", "## Load the Data" ] }, { "cell_type": "markdown", "id": "otdRjVbexV-V", "metadata": { "id": "otdRjVbexV-V" }, "source": [ "In this demo, you will use audio samples from the [ESC-50 dataset](https://github.com/karolpiczak/ESC-50), a labeled collection of 2000 environmental audio recordings, each with a duration of five seconds.\n", "\n", "The dataset can be loaded from the HuggingFace Hub as follows:\n", "\n", "_(Note that, unless already cached, the download operation may take **a few minutes**.)_" ] }, { "cell_type": "code", "execution_count": 8, "id": "7oWJORavxb5H", "metadata": { "id": "7oWJORavxb5H", "outputId": "9b93862e-306a-41f1-a7f5-cc942f1dc672" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset({\n", " features: ['filename', 'fold', 'target', 'category', 'esc10', 'src_file', 'take', 'audio'],\n", " num_rows: 2000\n", "})\n" ] } ], "source": [ "audio_dataset = load_dataset(\"ashraq/esc50\", split=\"train\")\n", "\n", "# take a look...\n", "print(audio_dataset)" ] }, { "cell_type": "markdown", "id": "7HKVt8olAvkm", "metadata": { "id": "7HKVt8olAvkm" }, "source": [ "Each sample belongs to a \"category\". Take a look at the category for the first few items in the dataset:" ] }, { "cell_type": "code", "execution_count": 9, "id": "FE1o9dCdxulj", "metadata": { "id": "FE1o9dCdxulj", "outputId": "cdd4ecce-001a-4b03-96e5-618a0f5be59d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Categories:\n", "['dog', 'chirping_birds', 'vacuum_cleaner', 'vacuum_cleaner', 'thunderstorm']\n", "\n", "Filenames:\n", "['1-100032-A-0.wav', '1-100038-A-14.wav', '1-100210-A-36.wav', '1-100210-B-36.wav', '1-101296-A-19.wav']\n" ] } ], "source": [ "print(\"Categories:\")\n", "print(audio_dataset[\"category\"][:5])\n", "print(\"\\nFilenames:\")\n", "print(audio_dataset[\"filename\"][:5])" ] },
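{ "cell_type": "markdown", "id": "5c2d8f41", "metadata": {}, "source": [ "ESC-50 defines fifty such classes in total. As a quick exploratory sketch, you can list them all:" ] }, { "cell_type": "code", "execution_count": null, "id": "6d3e9a52", "metadata": {}, "outputs": [], "source": [ "# The full set of distinct labels in the dataset (50 for ESC-50):\n", "print(sorted(set(audio_dataset[\"category\"])))" ] },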
{ "cell_type": "markdown", "id": "yaKxOuSTxjd0", "metadata": { "id": "yaKxOuSTxjd0" }, "source": [ "The actual audio signal is sampled at 44100 Hz and available as a NumPy array. Take a look at the first few entries:" ] }, { "cell_type": "code", "execution_count": 10, "id": "447f53f5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'path': None, 'array': array([0., 0., 0., ..., 0., 0., 0.]), 'sampling_rate': 44100}, {'path': None, 'array': array([-0.01184082, -0.10336304, -0.14141846, ..., 0.06985474,\n", " 0.04049683, 0.00274658]), 'sampling_rate': 44100}, {'path': None, 'array': array([-0.00695801, -0.01251221, -0.01126099, ..., 0.215271 ,\n", " -0.00875854, -0.28903198]), 'sampling_rate': 44100}]\n" ] } ], "source": [ "print(audio_dataset[\"audio\"][:3])" ] }, { "cell_type": "markdown", "id": "97gYTdw3xt17", "metadata": { "id": "97gYTdw3xt17" }, "source": [ "## Prepare the Audio Embedding Model\n", "\n", "_Note: if you are on a Colab, make sure your \"Runtime type\" has \"Hardware Acceleration\" set to GPU for best performance. The cell below will try to auto-detect your setup and adjust to it; please adapt it to your specific hardware if necessary._\n", "\n", "_Note: please keep in mind that the cell below may take **up to eight minutes** to load the full PANNs model, unless already cached locally._" ] }, { "cell_type": "code", "execution_count": 11, "id": "X26ePde5yLaZ", "metadata": { "id": "X26ePde5yLaZ", "outputId": "3118872a-141f-485e-c694-1d911de8cd28" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checkpoint path: /home/USER/panns_data/Cnn14_mAP=0.431.pth\n", "Using CPU.\n", "\n", "Loaded the sound embedding model on the CPU. Reduced defaults will be used. Please consider switching to GPU-powered hardware for the best experience.\n" ] } ], "source": [ "GPU_AVAILABLE = torch.cuda.device_count() > 0\n", "\n", "if GPU_AVAILABLE:\n", " # load the default model on the GPU\n", " model = AudioTagging(checkpoint_path=None, device=\"cuda\")\n", " print(\"\\nLoaded the sound embedding model on the GPU.\")\n", "else:\n", " # fall back to the CPU\n", " model = AudioTagging(checkpoint_path=None, device=\"cpu\")\n", " print(\n", " \"\\nLoaded the sound embedding model on the CPU. Reduced defaults \"\n", " \"will be used. Please consider switching to \"\n", " \"GPU-powered hardware for the best experience.\"\n", " )" ] }, { "cell_type": "markdown", "id": "0670b30f-927f-47da-b71d-0a99092c3f58", "metadata": { "id": "0670b30f-927f-47da-b71d-0a99092c3f58" }, "source": [ "## Create a DB table through CassIO\n", "\n", "When an instance of `MetadataVectorCassandraTable` is created, CassIO takes care of the underlying database operations. An important parameter to supply is the embedding vector dimension (fixed, in this case, by the choice of the PANNs model being used)." ] },
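{ "cell_type": "markdown", "id": "7e4fab63", "metadata": {}, "source": [ "The default PANNs CNN14 checkpoint produces 2048-dimensional embeddings. If you prefer not to hard-code that number, you can probe the model for it — a small sketch, assuming `model` is loaded, that feeds one second of silence at PANNs' 32 kHz input rate:" ] }, { "cell_type": "code", "execution_count": null, "id": "8f5abc74", "metadata": {}, "outputs": [], "source": [ "# Derive the embedding dimension from the model itself:\n", "_, probe_embedding = model.inference(np.zeros((1, 32000), dtype=np.float32))\n", "print(probe_embedding.shape[1]) # 2048 for the default CNN14 checkpoint" ] }, { "cell_type": "markdown", "id": "9a6bcd85", "metadata": {}, "source": [ "Now create the table:" ] },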
{ "cell_type": "code", "execution_count": 12, "id": "ENEwxldS_Vqy", "metadata": { "id": "ENEwxldS_Vqy" }, "outputs": [], "source": [ "table_name = \"audio_table\"\n", "embedding_dimension = 2048\n", "\n", "v_table = MetadataVectorCassandraTable(\n", " table=table_name,\n", " vector_dimension=embedding_dimension,\n", ")" ] }, { "cell_type": "markdown", "id": "c1fe256f-9efb-41f0-8803-d99696c6089b", "metadata": { "id": "c1fe256f-9efb-41f0-8803-d99696c6089b" }, "source": [ "## Compute and store embedding vectors for audio\n", "\n", "This cell processes the audio samples you just loaded. Working in batches, the embedding vectors are evaluated through the PANNs model, and the results are stored in the Cassandra / Astra DB table by invoking the (asynchronous) `put_async` method of `MetadataVectorCassandraTable`.\n", "\n", "_Note: this operation will take **some minutes**. Feel free to reduce the total number of sound clips to process for a quicker demo._" ] }, { "cell_type": "code", "execution_count": 13, "id": "pMUTakZmG2Ye", "metadata": { "id": "pMUTakZmG2Ye", "outputId": "41400474-6c7a-4e56-f815-8f2e9e04487d" }, "outputs": [], "source": [ "if GPU_AVAILABLE:\n", " BATCH_SIZE = 100\n", " SAMPLES_TO_PROCESS = 2000\n", "else:\n", " BATCH_SIZE = 20\n", " SAMPLES_TO_PROCESS = 200\n", "\n", "for i in tqdm(range(0, SAMPLES_TO_PROCESS, BATCH_SIZE)):\n", " # Find end of batch\n", " i_end = min(i + BATCH_SIZE, SAMPLES_TO_PROCESS)\n", " # Extract batch filename and audio signal.\n", " # (the filename will also serve as row primary key on DB)\n", " batch_filenames = audio_dataset[\"filename\"][i:i_end]\n", " batch_audio = np.array([item[\"array\"] for item in audio_dataset[\"audio\"][i:i_end]])\n", " # Generate embeddings for all the audios in the batch\n", " _, batch_embeddings_np = model.inference(batch_audio)\n", " batch_categories = audio_dataset[\"category\"][i:i_end]\n", " # Insert all entries in the batch concurrently\n", " futures = []\n", " for filename, category, embedding_np in zip(\n", " batch_filenames, batch_categories, batch_embeddings_np\n", " ):\n", " metadata = {\n", " \"category\": category,\n", " \"filename\": filename,\n", " }\n", " # From a Numpy array to a plain list of floats:\n", " embedding = embedding_np.tolist()\n", " futures.append(v_table.put_async(\n", " body_blob=filename,\n", " vector=embedding,\n", " row_id=filename,\n", " metadata=metadata,\n", " ))\n", " for future in futures:\n", " future.result()" ] }, { "cell_type": "markdown", "id": "dafa5b74", "metadata": {}, "source": [ "Note that, as is customary in Cassandra with (potentially) large binary blobs, you did not store the raw audio signal in the table itself. Rather, in the `body_blob` field (and in the metadata) of the `MetadataVectorCassandraTable`, you have stored just what is needed to retrieve the audio file itself in some other way (which in a realistic setup could be an S3 bucket or similar)." ] },
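{ "cell_type": "markdown", "id": "1b7cde96", "metadata": {}, "source": [ "As a quick check, you can read a stored row back by its primary key — a sketch, assuming CassIO's `get` accessor and one of the filenames inserted above:" ] }, { "cell_type": "code", "execution_count": null, "id": "2c8def07", "metadata": {}, "outputs": [], "source": [ "# Read one row back by primary key (row_id is the filename).\n", "# The row holds the embedding and the metadata, but no raw audio signal.\n", "row = v_table.get(row_id=\"1-100032-A-0.wav\")\n", "print(row[\"body_blob\"])\n", "print(row[\"metadata\"])" ] }, { "cell_type": "markdown", "id": "3d9efa18", "metadata": {}, "source": [ "The pointer to the externally-stored audio is deliberately minimal.\n", "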
In this case this amounts to the `filename` field.\n", "\n", "To emulate a more realistic setup, create a dictionary for later lookup by filename:" ] }, { "cell_type": "code", "execution_count": 14, "id": "e81899e7", "metadata": {}, "outputs": [], "source": [ "audios_by_filename = {\n", " dataset_row[\"filename\"]: dataset_row[\"audio\"][\"array\"]\n", " for dataset_row in audio_dataset\n", "}" ] }, { "cell_type": "markdown", "id": "5db94e99", "metadata": {}, "source": [ "Here is how this (\"direct filename to audio array\") lookup would work:" ] }, { "cell_type": "code", "execution_count": 15, "id": "45925b0c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[-0.0118408203125, -0.103363037109375, -0.14141845703125, -0.120...\n" ] } ], "source": [ "# As an example:\n", "print(str(list(audios_by_filename[\"1-100038-A-14.wav\"]))[:64] + \"...\")" ] }, { "cell_type": "markdown", "id": "JiainTLa1U1q", "metadata": { "id": "JiainTLa1U1q" }, "source": [ "## Run a similarity search\n", "\n", "You will now obtain a new audio file and search for samples similar to it." ] }, { "cell_type": "markdown", "id": "1407bfce", "metadata": {}, "source": [ "Get the sound of a cat meowing with:" ] }, { "cell_type": "code", "execution_count": 16, "id": "ge8HGNwJ0bDB", "metadata": { "id": "ge8HGNwJ0bDB", "outputId": "651ad1e7-9a81-4344-f37d-f403a64bdc12" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2023-09-25 12:20:26-- https://storage.googleapis.com/audioset/miaow_16k.wav\n", "Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.180.155, 142.250.180.187, 142.251.209.27, ...\n", "Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.180.155|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 215546 (210K) [audio/x-wav]\n", "Saving to: ‘miaow_16k.wav’\n", "\n", "miaow_16k.wav 100%[===================>] 210.49K --.-KB/s in 0.06s \n", "\n", "2023-09-25 12:20:26 (3.36 MB/s) - ‘miaow_16k.wav’ saved [215546/215546]\n", "\n" ] } ], "source": [ "!wget https://storage.googleapis.com/audioset/miaow_16k.wav -O miaow_16k.wav" ] }, { "cell_type": "markdown", "id": "zhrz9EtLmnQv", "metadata": { "id": "zhrz9EtLmnQv" }, "source": [ "Load the audio using the `librosa` library:" ] }, { "cell_type": "code", "execution_count": 17, "id": "UlkXCOgm01rC", "metadata": { "id": "UlkXCOgm01rC", "outputId": "3ed4bd66-1602-4ec3-d2f2-5870c406c391" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Meow!\n" ] } ], "source": [ "meow_sound, meow_rate = librosa.load(\"miaow_16k.wav\")\n", "print(\"Meow!\")\n", "display(Audio(meow_sound, rate=meow_rate))" ] }, { "cell_type": "markdown", "id": "8yUPkZ1DuSHX", "metadata": { "id": "8yUPkZ1DuSHX" }, "source": [ "In order to run the search, first get the embedding vector for the input file, then use it to run a similarity search on the CassIO `MetadataVectorCassandraTable`:" ] }, { "cell_type": "code", "execution_count": 18, "id": "xwf3xsh40523", "metadata": { "id": "xwf3xsh40523" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Match 0: 1-34094-A-5.wav (category: cat, distance: 0.7972)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Match 1: 1-211527-A-20.wav (category: crying_baby, distance: 0.7102)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Match 2: 1-30226-A-0.wav (category: dog, distance: 0.6982)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Match 3: 1-211527-C-20.wav (category: crying_baby, distance: 0.6927)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Match 4: 1-34094-B-5.wav (category: cat, distance: 0.6924)\n" ] } ], "source": [ "# Reshape query audio\n", "reshaped_meow = meow_sound[None, :]\n", "# Get the embeddings for the new audio\n", "_, query_embedding_np = model.inference(reshaped_meow)\n", "query_embedding = query_embedding_np.tolist()[0]\n", "\n", "matches = v_table.metric_ann_search(\n", " vector=query_embedding,\n", " n=5,\n", " metric=\"cos\",\n", ")\n", "\n", "# Show a \"play\" widget for the top results\n", "for match_i, match in enumerate(matches):\n", " print(f\"Match {match_i}: {match['body_blob']} \", end=\"\")\n", " print(f\"(category: {match['metadata']['category']}, \", end=\"\")\n", " print(f\"distance: {match['distance']:.4f})\")\n", " # retrieve the audio clip content from \"storage\"\n", " match_audio = audios_by_filename[match[\"body_blob\"]]\n", " display(Audio(match_audio, rate=44100))" ] },
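{ "cell_type": "markdown", "id": "4ea0fb29", "metadata": {}, "source": [ "Since every row was written with metadata, the same ANN search can be restricted to a subset of the library. Below is a sketch assuming CassIO's metadata filtering through the `metadata` argument of `metric_ann_search` (here, only rows labeled `dog` are considered):" ] }, { "cell_type": "code", "execution_count": null, "id": "5fb10c3a", "metadata": {}, "outputs": [], "source": [ "# The same query vector, but restricted to the \"dog\" category:\n", "dog_matches = v_table.metric_ann_search(\n", " vector=query_embedding,\n", " n=3,\n", " metric=\"cos\",\n", " metadata={\"category\": \"dog\"},\n", ")\n", "for match in dog_matches:\n", " print(f\"{match['row_id']} (distance: {match['distance']:.4f})\")" ] },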
{ "cell_type": "markdown", "id": "3-5Y0en3tPb0", "metadata": { "id": "3-5Y0en3tPb0" }, "source": [ "## Experiment with your own WAV file\n", "\n", "In this section, you can supply any WAV audio file of your own to have a bit of fun.\n", "\n", "While you're at it, do a bit of refactoring of the audio processing steps:" ] }, { "cell_type": "code", "execution_count": 19, "id": "0ed14d9c", "metadata": {}, "outputs": [], "source": [ "def wav_filepath_to_audio(filepath):\n", " loaded_audio, bitrate = librosa.load(filepath)\n", " return loaded_audio, bitrate\n", "\n", "def audio_similarity_search(query_audio, top_k=5):\n", " query_audio0 = query_audio[None, :]\n", " # If stereo sound comes from Gradio, this input will have a third dimension: average it away!\n", " if len(query_audio0.shape) == 3:\n", " query_audio1 = np.average(query_audio0, axis=2)\n", " else:\n", " query_audio1 = query_audio0\n", " # get the embeddings for the audio from the model\n", " _, query_embedding_np = model.inference(query_audio1)\n", " query_embedding = query_embedding_np.tolist()[0]\n", " matches = v_table.metric_ann_search(\n", " vector=query_embedding,\n", " n=top_k,\n", " metric=\"cos\",\n", " )\n", " return matches" ] }, { "cell_type": "markdown", "id": "19afe047", "metadata": {}, "source": [ "Now try providing a sound file of your own (or skip this part if you prefer):" ] }, { "cell_type": "code", "execution_count": 20, "id": "cYnpzEN3tVlh", "metadata": { "id": "cYnpzEN3tVlh", "outputId": "0a1c8740-c274-4cd1-f027-4eb00b7b6265" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Please provide the full path to a WAV file: /home/USER/Desktop/man_oofing.wav\n", "Your query sound:\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Similar clips:\n", "1-172649-A-40.wav (helicopter)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1-32373-B-35.wav (washing_machine)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "1-172649-B-40.wav (helicopter)\n" ] } ], "source": [ "if IS_COLAB:\n", " print(\"Please upload a WAV file from your computer:\")\n", " uploaded = files.upload()\n", " wav_file_title = list(uploaded.keys())[0]\n", " wav_filepath = os.path.join(os.getcwd(), wav_file_title)\n", "else:\n", " wav_filepath = input(\"Please provide the full path to a WAV file: \")\n", "\n", "supplied_audio, bitrate = wav_filepath_to_audio(wav_filepath)\n", "print(\"Your query sound:\")\n", "display(Audio(supplied_audio, rate=bitrate))\n", "\n", "print(\"Similar clips:\")\n", "for match in audio_similarity_search(supplied_audio, top_k=3):\n", " print(f\"{match['row_id']} ({match['metadata']['category']})\")\n", " match_audio = audios_by_filename[match[\"body_blob\"]]\n", " display(Audio(match_audio, rate=44100))" ] }, { "cell_type": "markdown", "id": "yXwEwwWmcs8E", "metadata": { "id": "yXwEwwWmcs8E" }, "source": [ "## Sound Similarity Web App\n", "\n", "The following cells set up and launch a simple application, powered by [Gradio](https://www.gradio.app/), demonstrating the sound similarity search seen so far.\n", "\n", "In essence, Gradio makes it easy to wrap a graphical interface around the following function, built from the components seen earlier: it accepts a user-provided sound as input and returns a number of results from the library, found by similarity.\n", "\n", "The input can be either a **sound recorded with the user's microphone** or an **uploaded WAV file** (the former taking precedence if both are supplied)." ] }, { "cell_type": "code", "execution_count": 21, "id": "d6a2df13", "metadata": {}, "outputs": [], "source": [ "NUM_RESULT_WIDGETS = 5\n", "\n", "def gradio_upload_audio(microphone_sound, input_sound):\n", " if microphone_sound is not None:\n", " input_sound = microphone_sound\n", " if input_sound:\n", " # Warning: Gradio sound signals arrive as \"int\" between +/- 32767.\n", " # First these must be normalized to [-1:+1]\n", " # (see https://github.com/gradio-app/gradio/issues/2789)\n", " max_sound_signal = np.abs(input_sound[1]).max()\n", " input_sound_norm_signal = input_sound[1] / max_sound_signal\n", " input_audio = np.array(input_sound_norm_signal, dtype=np.float32)\n", " found_audios = []\n", " for match in audio_similarity_search(input_audio, top_k=NUM_RESULT_WIDGETS):\n", " match_audio = audios_by_filename[match[\"body_blob\"]]\n", " sample_rate = 44100\n", " # rescale back to the Gradio y-scale for sounds:\n", " gradio_rescaled_audio = np.int16(match_audio * 32767)\n", " this_result = (sample_rate, gradio_rescaled_audio)\n", " found_audios.append(this_result)\n", " else:\n", " found_audios = []\n", " # pad the result in any case to the number of displayed widgets\n", " return found_audios + [None] * (NUM_RESULT_WIDGETS - len(found_audios))" ] },
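{ "cell_type": "markdown", "id": "6ac21d4b", "metadata": {}, "source": [ "Before launching the UI, you can exercise this handler directly. A quick sketch, assuming the meow clip from earlier is still in memory: build a Gradio-style `(sample_rate, int16_array)` tuple and pass it as the \"uploaded\" input:" ] }, { "cell_type": "code", "execution_count": null, "id": "7bd32e5c", "metadata": {}, "outputs": [], "source": [ "# Mimic Gradio's audio format: (sample rate, int16 signal)\n", "test_input = (meow_rate, np.int16(meow_sound * 32767))\n", "test_results = gradio_upload_audio(None, test_input)\n", "print(f\"{sum(r is not None for r in test_results)} result(s) returned\")" ] },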
] }, { "cell_type": "code", "execution_count": 21, "id": "d6a2df13", "metadata": {}, "outputs": [], "source": [ "NUM_RESULT_WIDGETS = 5\n", "\n", "def gradio_upload_audio(microphone_sound, input_sound):\n", " if microphone_sound is not None:\n", " input_sound = microphone_sound\n", " if input_sound:\n", " # Warning: Gradio sound signals arrive as \"int\" between +/- 32767.\n", " # First these must be normalized to [-1:+1]\n", " # (see https://github.com/gradio-app/gradio/issues/2789)\n", " max_sound_signal = np.abs(input_sound[1]).max()\n", " input_sound_norm_signal = input_sound[1]/max_sound_signal\n", " input_audio = np.array(input_sound_norm_signal, dtype=np.float32)\n", " found_audios = []\n", " for match in audio_similarity_search(input_audio, top_k=NUM_RESULT_WIDGETS):\n", " match_audio = audios_by_filename[match[\"body_blob\"]]\n", " sample_rate = 44100\n", " # normalize back to the Gradio y-scale for sounds:\n", " gradio_rescaled_audio = np.int16(match_audio * 32767)\n", " this_result = (sample_rate, gradio_rescaled_audio)\n", " found_audios.append(this_result)\n", " else:\n", " found_audios = []\n", " # pad the result in any case to the number of displayed widgets\n", " return found_audios + [None]*(NUM_RESULT_WIDGETS-len(found_audios))" ] }, { "cell_type": "markdown", "id": "qmwv1HnlAt3x", "metadata": { "id": "qmwv1HnlAt3x" }, "source": [ "The next cell starts the Gradio app: click on the URL that will be displayed to open it.\n", "\n", "Please keep in mind that:\n", "\n", "- The cell will keep running as long as the UI is running. **Interrupt the notebook kernel to regain control** (e.g. to modify and re-launch, or execute other cells, etc).\n", "- The cell output will give both a local URL to access the application, and an URL such as `https://<....>.gradio.live` to reach it from anywhere. **Use the latter link from Colab and when sharing with others**. _(The link will expire after a certain time.)_\n", "- The UI will also be shown within the notebook below the cell." ] }, { "cell_type": "code", "execution_count": null, "id": "a47be7d5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running on local URL: http://127.0.0.1:7860\n" ] } ], "source": [ "sound_ui = gradio.Interface(\n", " fn=gradio_upload_audio,\n", " inputs=[gradio.components.Audio(source=\"microphone\"), gradio.components.Audio(type=\"numpy\", label=\"Your query audio\")],\n", " outputs=[\n", " gradio.components.Audio(type=\"numpy\", label=f\"Search result #{output_i}\")\n", " for output_i in range(NUM_RESULT_WIDGETS)\n", " ],\n", " title=\"Sound Similarity Search with CassIO & Vector Database\",\n", ")\n", "\n", "sound_ui.launch(share=True, debug=True)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "T4", "provenance": [], "toc_visible": true }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 5 }