Further reading
If you want to see CassIO in action without running anything locally, as outlined in the "Start Here" page, just open a Colab example backed by an Astra DB cloud instance.
However, you may want to switch to a different setup. This page outlines how this is accomplished in two separate respects: running the examples locally as Jupyter notebooks, and running your own Cassandra cluster instead of Astra DB.
Run with local Jupyter
There are several reasons one might prefer to launch the code locally: for example, it may be easier to evolve the notebooks into a full-fledged application; also, one may prefer a non-ephemeral setup, especially when planning to run several examples in a row.
In the following we assume you have fulfilled the pre-requisites listed on the previous page.
You should have basic familiarity with git
and the shell console.
Clone this repository
First, clone this repository on your machine (the repo spans both the website and the examples). In a directory of your choice, execute the following:
git clone https://github.com/CassioML/cassio-website.git
cd cassio-website
Note that the following commands are to be run in the cassio-website
directory.
DB (Astra DB case)
You need a .env
file which defines the database credentials and connection
parameters.
You can copy the provided .env.template
file and replace
the environment variables you see there.
If using Astra DB, these amount to the database ID,
the Token (with role "database administrator"), and optionally a keyspace name.
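As an illustration, a filled-in .env for the Astra DB case could look like the sketch below. The variable names shown here are indicative only: use the exact names you find in .env.template.

ASTRA_DB_ID="01234567-89ab-cdef-0123-456789abcdef"
ASTRA_DB_APPLICATION_TOKEN="AstraCS:..."
ASTRA_DB_KEYSPACE="my_keyspace"    # optional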
If you plan on using a local Cassandra, the .env
setup instructions are given
below.
LLM Credentials
Again in the repo's root directory, create a .api_keys
file defining the secrets
necessary for your LLM of choice. You can copy
the provided .api_keys.template
and adjust the values therein.
Check out the LLM Pre-requisites for a list of supported LLMs: each requires different variable(s) to be set here.
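For example, if your LLM of choice is OpenAI, the .api_keys file would contain something along these lines (the exact variable name, and whether an export is needed, is spelled out in .api_keys.template):

export OPENAI_API_KEY="sk-..."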
Automatic choice of LLM
The code examples generally rely on a helper function to determine
which LLM to use, based on which secrets are detected in this file.
You can define your preferred LLM
(e.g. in case you define more than one secret) by setting
the environment variable PREFERRED_LLM_PROVIDER
in .api_keys
.
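For instance, assuming "OpenAI" is one of the recognized provider labels (check the LLM Pre-requisites page for the exact accepted values), you would add a line such as:

export PREFERRED_LLM_PROVIDER="OpenAI"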
Remember to "source" this file before launching notebooks or Python scripts:
. .api_keys
Framework-specific setup
Now the database and the LLM are all set for running the examples locally.
Still, for each framework you will have to prepare a specific Python environment with the right dependencies. The instructions are given in the section of these docs specific to that framework: for example, here is how you start the LangChain examples locally.
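As a rough sketch, this preparation typically amounts to creating and activating a virtual environment and installing the framework's requirements (the requirements file name below is a placeholder; the framework-specific pages give the actual commands and paths):

python3 -m venv .venv
. .venv/bin/activate
pip install -r requirements.txt    # placeholder name, see the framework's page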
Use a local Vector-capable Cassandra
Starting with version 5.0, Apache Cassandra® ships with Vector capabilities.
You can easily launch a locally-running (single-node) Cassandra cluster through Docker. First make sure you have Docker installed, then launch the following command:
docker run --name my-cassandra -d cassandra:5.0-alpha2
In the command above, you can name the container any way you like,
but keep in mind that the instructions on this page assume you
used my-cassandra.
The 5.0-alpha2
is an image tag: you may want to check Cassandra's
DockerHub page
for the newest 5.*
tag to use.
In a few minutes, the container will be up and running, ready to be used. You
can verify this by running docker exec -it my-cassandra nodetool status
and
looking for an output line starting with UN ...
(which stands for the "Up" and "Normal" state of the node).
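For reference, the relevant line of a healthy single-node output looks roughly like the following (the address, load and host ID will differ on your machine):

UN  172.17.0.2  104.34 KiB  16  100.0%  <host-id>  rack1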
Other ways to run Cassandra
If you have a running Cassandra cluster through other means than
Docker, no problem. Just make sure to specify the contact point(s)
and the keyspace name in the .env
file as outlined below.
For more advanced setup involving e.g. authentication, you might have to
modify the Python code that creates the Session
to fit your needs.
CQL Console
To launch a CQL Console on the Docker container, run the following:
docker exec -it my-cassandra cqlsh
Populate the database
Still in the CQL Console, create a keyspace for the examples by executing the following:
CREATE KEYSPACE IF NOT EXISTS cassio_tutorials
WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
You can check that the keyspace exists with:
DESC KEYSPACES;
You can now exit the CQL console (EXIT
+ Enter, or Ctrl-D
).
Use the local Cassandra in the code
Your local Cassandra is ready to support all examples. Now make sure you
set the connection parameters in the .env
file (at the root of this repo).
You can copy from the provided .env.template
example file if
you haven't yet and, ignoring the ASTRA_DB...
variables, make sure
the LOCAL_...
variables in the .env
define:
- the IP address (or "contact point") of the Docker container;
- the keyspace name created above.
In particular, the IP address of the container is found within the very long
output of docker inspect my-cassandra
. The following command
should locate it for you:
docker inspect my-cassandra | \
jq -r '.[].NetworkSettings.Networks.bridge.IPAddress'
# ... with an output such as "172.17.0.2"
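Putting it together, the local-Cassandra portion of the .env could then look like the following sketch (again, the variable names here are indicative: keep the exact names found in .env.template):

LOCAL_CASSANDRA_CONTACT_POINT="172.17.0.2"
LOCAL_CASSANDRA_KEYSPACE="cassio_tutorials"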
All notebooks offer the choice between using Cassandra and Astra DB: the former
case relies on a cqlsession.py module, imported from the notebooks,
which provides the simple logic to create the session and read the keyspace
from the environment variables. If you need additional customization (such as
setting up authentication, using a custom port for CQL, etc), this is the
file you should
further edit
to fit your needs.
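To give an idea, the session-creation logic you would adapt is conceptually similar to the minimal sketch below; the environment-variable names, the credentials and the port are assumptions for illustration, not the exact contents of cqlsession.py:

import os
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

def get_local_session():
    # contact point and keyspace as read from the environment (names are illustrative)
    contact_point = os.environ["LOCAL_CASSANDRA_CONTACT_POINT"]
    keyspace = os.environ["LOCAL_CASSANDRA_KEYSPACE"]
    # auth_provider and an explicit port are only needed for customized clusters
    auth_provider = PlainTextAuthProvider(username="cassandra", password="cassandra")
    cluster = Cluster([contact_point], port=9042, auth_provider=auth_provider)
    return cluster.connect(keyspace)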
Keep in mind that if you are running the notebooks in a cloud environment such
as Google Colab, the only supported choice will be the cloud database Astra DB.
(Should you need to run a Colab targeting a Cassandra cluster, you will have to
essentially transport the logic in cqlsession.py
into a notebook cell.)