In my post about the CQL WHERE clause I didn’t fully cover when the IN operator is supported and how it differs from the = operator. Based on the comments on that post and some questions I’ve gotten from people who read the post, I think it’s worth going into some detail about the IN operator.
Here’s where the IN operator is supported:
The last column in the partition key, assuming the = operator is used on the first N-1 columns of the partition key
The last clustering column, assuming the = operator is used on the first N-1 clustering columns and all partition keys are restricted
The last clustering column, assuming the = operator is used on the first N-1 clustering columns and ALLOW FILTERING is specified
It’s worth mentioning that I can use the IN operator interchangably with the = operator on a single value, for example:
is equivalent to:
But in that case the semantics are that of = even though I’m using IN.
The real use case of IN is with multiple distinct values. The simple case involves a single column partition key and/or clustering column. For this I’ll use the temperature table I’ve used in past examples:
I can use IN on the partition key:
Or on the clustering column (with ALLOW FILTERING):
In general it’s best to avoid queries that require ALLOW FILTERING because they often require lots of data to be scanned even if only a small amount of data is returned, but I show that example because it is a supported use of the IN operator.
A more feasible example is to use both together (without ALLOW FILTERING):
If I try to use the IN operator on the temperature column I get an error:
Even if I add a secondary index:
Moving on to composite keys, I’ll use the table that Vasyl Boroviak used in the comments of my previous post:
This table is useful because it contains a composite partition key and and has multiple clustering columns.
Since the partition key contains two columns, I need to use = on the first column in order to use IN on the second column:
Likewise there are two clustering columns, so I need to use = on the first column in order to use IN on the second column, and I also need to use ALLOW FILTERING if this is the only criteria:
Again I can combine the those two queries together without ALLOW FILTERING: