All Collections
June AI
How to use SQL to explore your data in June
How to use SQL to explore your data in June
Updated over a week ago

The AI explorer uses GPT-4 to translate questions about your product into SQL queries that can be executed on our database. While the AI explorer is in alpha phase, it's possible that it will sometimes generate SQL queries that don't really work with our database schema.

When that happens, you can refer this guide to get a sense of our database schema and the types of queries you can run on it.

We use ClickHouse as our analytics database, and give you read access to 3 tables: events, all_user_traits, and group_traits. This means that you can run SELECT queries on these tables and use ClickHouse functions depending on the data types of the columns you're working with. In most cases, you will use DateTime, String, and Integer functions.

Schema

events

This table contains all the TRACK and PAGE events that you've sent to us.

type

Integer

name

String

timestamp

DateTime

group_id

String

user_id

String

properties

String

context

String

all_user_traits

This table contains all the users you've sent to us, along with all of their traits and values.

user_id

String

key

String

value

String

timestamp

DateTime

group_traits

This table contains all the companies you've sent to us, along with all of their traits and values, and all the users associated with them.

group_id

String

user_id

String

key

String

value

String

timestamp

DateTime

Example queries

Count all track events

SELECT COUNT() FROM events WHERE type = 2

Count all page events

SELECT COUNT() FROM events WHERE type = 0

Count all distinct users

SELECT COUNT(DISTINCT user_id)
FROM all_user_traits
WHERE notEmpty(user_id)

Get all users who belong to a company

SELECT DISTINCT user_id
FROM group_traits
WHERE group_id = 'A'

Get all users where trait staff is true

SELECT
DISTINCT user_id
FROM (
SELECT
user_id,
key,
argMax(value, timestamp) AS _value
FROM all_user_traits
GROUP BY user_id, key
)
WHERE key = 'staff' AND _value = 'true'

Get all companies where trait is_paying is true

SELECT
DISTINCT group_id
FROM (
SELECT
group_id,
key,
argMax(value, timestamp) AS _value
FROM group_traits
GROUP BY group_id, key
)
WHERE key = 'is_paying' AND _value = 'true'

Count the events performed by non-staff users between a date range

SELECT
COUNT()
FROM events
WHERE timestamp BETWEEN '2023-01-01' AND '2023-02-01'
AND user_id IN (
SELECT
DISTINCT user_id
FROM (
SELECT
user_id,
key,
argMax(value, timestamp) AS _value
FROM all_user_traits
GROUP BY user_id, key
)
WHERE key = 'staff' AND _value != 'true'
)

Event property breakdown

SELECT properties_values[indexOf(properties_keys, '<your property name>')] AS <your property name>, COUNT(*) as count
FROM events
WHERE name = '<your event name>' AND has(properties_keys, '<your property name>')
GROUP BY <your property name>
ORDER BY count DESC

πŸ”΄ We do not share any user data with Open.ai. We just share the names of your events and the names of your traits πŸ”΄
​

FAQ:
- What exactly does is_active mean in the generated queries?
It's simply a condition that checks whether that particular event was deleted or not

- What's your schema?
- Which information can I ask AI?

Did this answer your question?