Tutorial: Uploading files and retrieving results
The basic workflow of document processing involves uploading files into Impira, then retrieving results. This section walks through the basics of how to accomplish this with Impira while introducing a few tradeoffs you may consider along the way.
Let's get set up
Before you start, you'll need to do two things:
1. Get an API token
You'll need a unique API token (i.e., a "key") to access Impira's API. See How to create an API token to learn how to create one.
Throughout this tutorial, you'll see references to <YOUR_TOKEN_HERE>, which you should replace with your API token.
You also need to know your organization's name (it'll be in your browser's URL bar, after /o/). Throughout the tutorial, you'll see references to <YOUR_ORG_HERE>, which you should replace with your organization's name.
2. Create a Collection
In Impira, document processing occurs in Collections, which are like folders that group similar documents together. Documents within a Collection share a machine learning model that is unique to that Collection.
While getting started, we recommend creating a Collection in the UI. Once you create a Collection, grab its ID, which you can find in the URL after /fc/. Throughout the tutorial, you'll see references to <YOUR_COLLECTION_ID>, which you should replace with your Collection's ID.
The Impira API is HTTP-based. You can access it through a REST API using HTTP libraries in just about any programming language. We also offer a Python SDK which is currently in beta. The examples below show how to accomplish each task using both the REST API and Python SDK.
Uploading files
Most APIs operate over small volumes of data. Document files are comparatively large, so there are a couple of different options for uploading files through the API. Impira's API is asynchronous, which means that after you upload, you receive a uid for each file which you can later use to access its data.

Option 1: Using a URL
The most performant and scalable way of uploading files is to use a publicly accessible HTTP URL (e.g., a signed S3 URL). These uploads have a much higher rate limit and are generally faster. If you upload using a URL that expires, we highly recommend setting a high expiration time (e.g., one week), in case the file needs to later be re-processed or if you happen to upload during peak load.
When you upload one or more files, Impira returns a uid for each file. This is a globally unique identifier that you can use to retrieve the file's results.
Example
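As a sketch, here is how you might build a URL-based upload request in Python. The endpoint path mirrors the Write API curl examples later in this guide; the file URL and names below are placeholders, not real resources.

```python
import json

# Placeholders -- substitute your own org and Collection ID.
ORG = "<YOUR_ORG_HERE>"
COLLECTION = "<YOUR_COLLECTION_ID>"
URL = f"https://app.impira.com/o/{ORG}/api/v2/collection/{COLLECTION}"

# One entry per file: a publicly accessible URL plus an optional
# display name (this shape matches the Write API examples below).
body = {
    "data": [
        {
            "File": {
                # Must be reachable by Impira's servers (e.g., a signed S3 URL)
                "path": "https://example.com/files/invoice-001.pdf",
                "name": "invoice-001.pdf",
            }
        }
    ]
}

# Send with any HTTP client, e.g.:
#   requests.post(URL, headers={"X-Access-Token": "<YOUR_TOKEN_HERE>"}, json=body)
# A successful response contains a "uids" list with one uid per file.
print(json.dumps(body))
```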
Option 2: Using a local file
It isn't always possible to access files globally through a URL, like when the files are sitting on a local server. In that case, you can upload files directly to Impira.
Direct uploads have a lower rate limit than URL-based uploads. If you exceed it, the response includes a Retry-After-Ms header, specifying the number of milliseconds until the next request should be sent.
Example
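As a sketch (assuming rate-limited requests are signaled with an HTTP 429 status, which this guide doesn't confirm), a retry loop that honors Retry-After-Ms might look like:

```python
import time

# Sketch of a retry loop honoring the Retry-After-Ms header. `do_upload`
# stands in for your actual HTTP call and is assumed to return a
# (status_code, headers) pair -- adapt it to your HTTP client.
# The 429 status code for rate limiting is an assumption.
def upload_with_retry(do_upload, max_attempts=5):
    for attempt in range(max_attempts):
        status, headers = do_upload()
        if status != 429:          # not rate limited: done
            return status
        # Wait the server-suggested interval before retrying.
        delay_ms = int(headers.get("Retry-After-Ms", "1000"))
        time.sleep(delay_ms / 1000.0)
    return status

# Example with a fake endpoint that rate-limits the first request:
responses = iter([(429, {"Retry-After-Ms": "10"}), (200, {})])
assert upload_with_retry(lambda: next(responses)) == 200
```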
Option 3: Using an integration
In addition to the API, Impira supports a number of integrations including Zapier, Email, and Snowflake which allow you to ingest data as well. Each of these integrations accepts a similar set of parameters as above and uses the same underlying API to ingest files.
If you plan to upload in bursts (i.e., 10 or more files at once), then you may want to use the async flag, which means the API command returns before the file has been added to its Collection. During periods of high load, it may take some time for the file to fully process and get added to its Collection, and this flag helps avoid hitting timeout errors.
Retrieving results
Once a file has been uploaded to a Collection, Impira will immediately start processing it. Depending on the size of the file and number of fields in your Collection, it can take a few seconds to a few minutes for the file to process.

Impira provides a few different options for retrieving results as soon as they are ready:
Option 1: Webhooks
This is the most popular option. Impira can trigger a webhook once documents complete processing. For each file, the webhook includes its uid, fields, and File data that you can use to download the original file. It also tells you whether the file was inserted into or deleted from the Collection. If you're familiar with relational databases, you can think of webhook data as entries in a "log" of changes to your Collection.
By default, Impira triggers a new Webhook for each file once it has completed processing. However, you can write custom IQL queries to receive a webhook for all kinds of scenarios, including the following (and many more):
- When a new low confidence prediction occurs
- When a new file is uploaded, before it completes processing
- When a file's data does not join correctly to a Dataset
Option 2: Long polling
If you aren't able to set up a webhook, you can alternatively use Impira's poll API, which blocks until your file completes processing or up to a minute elapses, whichever occurs first. The poll API uses the same underlying query mechanisms as webhooks, and the data it returns has the same format.
The Python SDK has a built-in method named poll_for_results which captures this workflow for you.
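The loop itself is simple enough to sketch. Note that poll_once and its return shape below are illustrative stand-ins, not the real SDK or API: each call represents one request to the poll endpoint, which blocks for up to a minute and reports whether the file has finished processing.

```python
# Illustrative long-polling loop (names and shapes are assumptions).
def wait_for_results(poll_once):
    while True:
        events, done = poll_once()
        if done:
            return events   # the resolved records for the file

# Fake poller: the first call times out with nothing, the second succeeds.
calls = iter([([], False), ([{"uid": "abc123", "fields": {}}], True)])
results = wait_for_results(lambda: next(calls))
assert results[0]["uid"] == "abc123"
```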
Option 3: Reading results ad-hoc
You can also read the results directly through the Impira API with an ad-hoc request. The tradeoff here is that the results may not be finished processing, in which case you'll receive an empty or partial result. In general, we recommend using this method only if you know that the file has processed (e.g., if you want to query historical results).
Next steps
We hope this tutorial gives you a sense of the basics for how to upload files and retrieve data through Impira's API. The rest of our API reference docs go further in depth on each of these topics.
Don't hesitate to reach out to support@impira.com if you have any questions or feedback.
How to create an API Token
Impira uses API tokens to authenticate your API requests and to protect your data from unauthorized access.
We recommend using a unique token for each integration you set up; that way, you can revoke any single token to block access in the future without disrupting your other integrations.
Creating an API token
- Click the Account icon in the top right corner.
- Select API Tokens from the drop-down menu.
- Click the + on the right side to create a new token.
- Name your token and choose OK.

You’ll be presented with a Token Value which you can copy. Hold onto your Token Value for later.
You can always come back to your API Token list and click the Reveal Token Value icon anytime you need to reference your Token Value.

Validating your API token
To validate your API token, you can run a simple query to retrieve the metadata for a single file. Open your Terminal app and paste in:
curl -X GET --header "X-Access-Token: <YOUR_TOKEN_HERE>" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files?limit=1"
- Replace <YOUR_ORG_HERE> with your organization’s name (it’ll be in your browser’s URL bar, after /o/)
- Replace <YOUR_TOKEN_HERE> with the Token you created above
If your API token and Org name are valid, you should see JSON-formatted output containing the metadata for a file, its schema, and a status field with a value of 200:
{
...
"status": 200,
...
}
However, if either is invalid, you will see an error like:
{
"error": "Invalid token",
"status": 403,
"request_id": "ba28b068-00e0-4a96-af38-6e98be341cad"
}
Revoking an API Token
You may want to revoke a token so it can no longer be used to access data. Do this by selecting the Revoke Tokens icon on the right and choosing Revoke.

Reading data via Impira Read API
This article discusses:
- Reading your files or data via Impira Read API
- Querying data via Impira Query Language (IQL) API
- Polling for changes
REST API endpoints
There are a few endpoint structures for reading from different areas of your organization. These same endpoints can also be used for running queries or updating and uploading files:
All Files In Organization
https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files
Files in an individual Collection
https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>
Reading files or data through the REST API
To read data at a given path, you can use the GET verb. You do not need to specify a body for the request. However, you can specify a few optional query parameters:
- fields is a comma-separated list of fields. You can use this to select specific fields. By default, the read API returns all fields in a Collection, Dataset, or View.
- include_joins: by default, for performance reasons, the read API does not include joined fields. You can specify include_joins=true to include joined fields.
- highest and lowest can be used to sort by a particular field. For example, highest=Price will sort records by the highest value of Price first.
- offset and limit can help paginate long results. offset defines the starting point and limit constrains the total number of records returned. If you use offset and limit to paginate, you should specify a sort via highest or lowest; otherwise you may duplicate or miss records.
- Any other parameters are interpreted as filters. For example, if you have a field named Inspected, you can specify Inspected=true as a query parameter to only include records that have Inspected set to the value true.
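As a sketch, here's one way to assemble paginated read URLs that follow the offset/limit/sort advice above (the org and Collection names are placeholders):

```python
from urllib.parse import urlencode

# Placeholder base URL -- substitute your own org and Collection.
BASE = "https://app.impira.com/o/<YOUR_ORG_HERE>/api/v2/collection/invoices"

def page_url(offset, limit, sort_field):
    # Pair offset/limit with a sort (lowest or highest) so consecutive
    # pages don't duplicate or miss records.
    params = {"offset": offset, "limit": limit, "lowest": sort_field}
    return f"{BASE}?{urlencode(params)}"

print(page_url(0, 100, "Price"))
# .../collection/invoices?offset=0&limit=100&lowest=Price
```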
Example requests
Return all records in All files (all files uploaded to the Org).
curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files"
Return all records in a specific Collection.
curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>"
Return all records in a specific Collection with Inspected=true.
curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?Inspected=true"
Return 10 records in a specific Collection.
curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?limit=10"
Return 10 records in a specific Collection sorted by price.
curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?limit=10&lowest=Price"
Response format
By default, the read API returns a resolved format of the data in your Collection, which strips out machine learning metadata including confidence scores, word/page location, and model version. This is the same format sent through webhooks.
This resolved format is much easier to consume in downstream applications, but advanced use cases may want to leverage some of the lower level metadata Impira exposes. The Impira Query Language (IQL), described below, allows you to query this metadata and run more operations like filters, grouping, normalization, and joins.
Querying via the Impira Query Language (IQL) API
Impira Query Language (IQL) is a simple, SQL-like language designed specifically for querying the kinds of data found in documents (semi-structured data). Just like SQL, you can perform all kinds of manipulations, including joins, filters, grouping, expressions, and more.
The IQL API also allows you to access lower-level data exposed by Impira, like the underlying machine learning confidence scores, so you can build advanced applications.
Creating an IQL query
- Go back to your browser window, click the Experimental Features icon in the top-right corner, and choose IQL Playground.
- Choose the Collection you want to query (left sidebar) and select Open in API to see the same data you pulled in Terminal in Step 2, but this time within a new browser tab. This query returns each row in the Collection, and each field in the row. The rows are unresolved, meaning that they include all of the low-level machine learning metadata, including the confidence score, model version, and page/word location for each field.
Writing a more specific query
Now let’s get more specific with your data starting with just one row, then a field within that row.
- In the search bar, add a space after the code and enter limit:1. This will give you one row to work with.
- Now, isolate the File column data for this row: enter a space after limit:1 and enter [File].

This is how specific and focused you can be by running a simple IQL query through the API. You can view this data in a new browser tab by clicking Open in API.

Running an IQL query through the API
- Copy the URL from your API browser window.
- Go to Terminal and enter:
curl -X GET --header "X-Access-Token: <YOUR_TOKEN_HERE>" \
<URL>
- Replace <YOUR_TOKEN_HERE> with your Token Value from Step 1.
- Replace <URL> with your copied URL from the API browser window, then hit Enter.
You’ll now see the same data from Step 4 in your Terminal.
Writing more advanced IQL queries
IQL is a rich language with support for filters, joins, grouping, expressions, and more. Explore the advanced IQL queries guide here to learn more.
Running large queries
If your query is long enough, the generated URL may exceed 2048 characters in length, making it impossible to submit the query through a GET request. To avoid hitting this limit, you can use a POST request instead to submit the query in the request body against the IQL API URL:
https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/iql
The body should be a JSON-encoded object with a query field containing the IQL query and an optional defaultLimit field with a limit on the number of results.
Example request
curl -X POST 'https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/iql' \
--header 'X-Access-Token: YOUR_TOKEN_HERE' \
--data '{"query": "@files limit:1"}'
Polling for changes
The IQL API additionally supports polling for changes. This allows you to run a query and receive a cursor, which you can provide as a parameter on your next request. The API won’t return any results until new events arrive after the cursor, and then it returns those new events.
Key concepts
The Impira runtime supports incremental execution over IQL queries (with some exceptions noted below). This is the building block that makes Impira capable of incrementally retraining models, updating the UI in real-time, and providing Webhooks over your data. The poll API gives you access to this building block, so you can build powerful, real-time applications as well. Poll is an advanced feature, and in most cases, users can use the direct query API or webhooks.
Under the hood, Impira logs every change to data with an ordered cursor. These cursors are strings which can be compared lexicographically. The poll API allows you to specify a cursor alongside a query, and it’ll hold off on returning any results until the results of the query change, and then return a set of events with the changes and a new cursor.
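Because cursors are lexicographically ordered strings, plain string comparison is all you need to decide which of two events happened later (the cursor values below are made up for illustration):

```python
# Cursors are plain strings ordered lexicographically, so ordinary string
# comparison reflects event ordering.
earlier = "0200000010862702816486"
later = "0200000010862702816487"
assert later > earlier            # string order == event order
assert max(earlier, later) == later
```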
Events
The event format is very similar to the standard data format that the IQL and other APIs return. Each event includes three fields:
- data: the full record corresponding to the event
- cursor: the cursor at which the event occurred
- action: delete or insert
The following example indicates that a record with UID “65d527f1d94521e5” should be inserted at cursor “0200000010862702816486”:
{
"data": {
"uid": "65d527f1d94521e5",
"score": 0.5
},
"action": "insert",
"cursor": "0200000010862702816486"
}
If you’re trying to maintain an up-to-date copy of the query (e.g., in a relational database that you’re syncing data into), you can, for most queries, use the UID as a primary key against which to upsert new records. If a new delete or insert record arrives, you can use the UID field to find the relevant row, and then delete or update it accordingly. Certain queries (e.g., those which do not project the uid or have a pivot operation) will have a different primary key. However, you can always compare the full row (directly or through a hash) as well.
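A minimal sketch of that upsert logic, keyed on uid as described above (the local "table" here is just a dict standing in for your downstream store):

```python
# Apply one poll/webhook event to a local copy of the query, using uid
# as the primary key.
def apply_event(table, event):
    uid = event["data"]["uid"]
    if event["action"] == "insert":
        table[uid] = event["data"]      # upsert: insert or overwrite
    elif event["action"] == "delete":
        table.pop(uid, None)            # remove if present

table = {}
apply_event(table, {"action": "insert",
                    "data": {"uid": "65d527f1d94521e5", "score": 0.5}})
apply_event(table, {"action": "insert",
                    "data": {"uid": "65d527f1d94521e5", "score": 0.9}})
assert table["65d527f1d94521e5"]["score"] == 0.9  # second insert updated it
```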
Using the Poll API
To query the poll API, simply replace /iql with /poll. If you don’t specify a cursor, then poll will automatically run against the latest cursor and provide you with a snapshot of insert events and a new cursor to subscribe to. You can use the returned cursor value in the next query to block on changes by simply providing a cursor=<cursor value> parameter to the GET request.
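A sketch of the resulting loop, with poll standing in for a GET against the /poll endpoint (the function name and return shape are assumptions for illustration):

```python
# Each call takes the last cursor and returns new events plus the cursor
# to resume from; passing cursor=None models the initial snapshot request.
def follow_changes(poll, handle_event, cursor=None, max_polls=10):
    for _ in range(max_polls):
        events, cursor = poll(cursor)
        for event in events:
            handle_event(event)
    return cursor

# Fake poll endpoint: the first call returns a snapshot, later calls
# return no new changes.
def fake_poll(cursor):
    if cursor is None:
        return [{"action": "insert", "data": {"uid": "a1"}}], "01"
    return [], cursor

seen = []
follow_changes(fake_poll, seen.append, max_polls=2)
assert [e["data"]["uid"] for e in seen] == ["a1"]
```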
Uploading files and writing data via Impira Write API
This article discusses:
- Uploading files and writing data via Impira Write API
- Updating data
- Deleting files/data
We strive to offer every function in the UI through our API so you can mix-and-match the right combination of UI and API for your use. This guide is divided into multiple sections to walk you through each core feature of the API.
Getting started
First, you'll need your Org name and an API token handy.
1. Find your Org name
Your organization's name appears in the API endpoint used to query your data. To find your Org name, look in the browser URL bar; it appears in the URL path after the /o/ subdirectory.
https://app.impira.com/o/<YOUR_ORG_NAME>/collections
2. Create an API Token
In order to call the API, you’ll need an API Token. If you haven’t done so already, learn how to create your own API Token.
API URL structure
The Impira API uses a common URL structure:
https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/<path>
<path> specifies the resource you want to read from or write to.
The API supports different operations through the standard HTTP verbs GET, POST, PATCH, and DELETE. We’ll walk through how to use each of these verbs in depth below.
In most cases:
- GET allows you to retrieve information about a Collection, file, or other resource
- POST allows you to upload files and insert new records
- PATCH allows you to update existing records
- DELETE allows you to delete resources
Uploading files with POST
Uploading files directly to Impira is a breeze. Impira supports two methods to write files through the API:
- Submitting a POST request with a file URL and other metadata.
- Directly uploading a file as multipart/form-data through a POST request.
Uploading files with URLs
You can upload a file to a given resource (e.g., a Collection) by submitting a POST request to the appropriate path. The request takes a JSON object which includes one or more entries to insert. Each entry includes the following information:
- The File object contains the path to the file and optional metadata fields:
  - File.path is a URL to the file. The file must be accessible to Impira’s cloud-based servers. You can use either HTTP or HTTPS URLs.
  - File.name is an optional name for the file which appears in the UI. This name doesn't have to be unique. If you omit the name, the file will appear with an empty name in the UI.
- If you're uploading to a Collection, you can only specify values in the File field. If you'd like to specify values for other fields, you must do that in a follow-up PATCH request (see below). You cannot specify ML model (inferred) fields through the API.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"File": {
"path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
"name": "application form 10"
}
},
...
]
}'
Example response
{
"uids": [
"927a834f5f2036e6"
]
}
Uploading files directly
If you have a file on your local machine or server and want to upload it directly to Impira, you can simply upload it with a multi-part file form submission. The direct file upload API supports up to 10MB per request, which you can spread across one or more files.
For each file you'd like to upload, you can submit a file form field. You can optionally specify a data field for each file, which is a JSON-encoded value that contains any record values to include alongside the file object. Just like URL-based upload, you can only specify fields in the File object (e.g., name). Any other fields can be set in a follow-up PATCH request.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--form "file=@local_file.pdf"
Example response
{
"uids": [
"8a4dc36aa82bf85b"
]
}
Example request
curl -X POST https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME> \
--header 'X-Access-Token: YOUR_TOKEN_HERE' \
--form 'file=@local_file1.pdf' \
--form 'data={"File": {"name": "my_file_1.pdf"}}' \
--form 'file=@local_file2.pdf' \
--form 'data={"File": {"name": "my_file_2.pdf"}}'
Example response
{
"uids": [
"cb713868a5155ad8",
"f1df198b1ebdef98"
]
}
Adding existing files to a Collection
Files in Impira can belong to multiple Collections. To add an existing file into one or more Collections, you can issue a write to the file_collection_contents entity class with the Collection's and file's UID. You can find a Collection's UID from its URL in the application (https://app.impira.com/o/<YOUR_ORG_HERE>/fc/<COLLECTION_UID>).
Note that this request returns {"uids": null} rather than a list of UIDs.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/ec/file_collection_contents" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"file_uid": "cb713868a5155ad8",
"collection_uid": "5aff2aad39eaa373"
},
{
"file_uid": "f1df198b1ebdef98",
"collection_uid": "5aff2aad39eaa373"
}
...
]
}'
Example response
{
"uids": null
}
Inserting data with POST
You can insert data into a Dataset the same way as uploading a file. Unlike writing to /files or a Collection, you don't need to include the File object when writing to a Dataset. Instead, just specify the fields you want to write.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/dc/337dd57d0c5948f9" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"Name": "Rhyley Green",
"Hired": true
},
{
"Name": "Ammar Duffy",
"Hired": false
}
...
]
}'
Example response
{
"uids": [
"360fe1982bb9055a",
"b0694bc41fa6b241"
]
}
The UID field is a unique identifier for the record you insert. If you do not specify a UID in the request, then Impira will generate one for you. You can use this UID in future requests (or just specify your own) to perform an "upsert," which will insert a new record if the UID is not already present or update the record if it is. For example, if the insert above returned a UID of "b0694bc41fa6b241" for "Ammar Duffy," then you can run this to update their Hired field to true.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/dc/337dd57d0c5948f9" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"uid": "b0694bc41fa6b241",
"Name": "Ammar Duffy",
"Hired": true
}
]
}'
Example response
{
"uids": [
"b0694bc41fa6b241"
]
}
Note that a POST upsert replaces the entire record. If you omit a field (e.g., if you send only {"uid": "b0694bc41fa6b241", "Hired": true}), then the request will overwrite Name with an empty value.
Updating files or data
When you insert a new record using a POST request, you can either specify your own UID or let Impira generate one for you. You can later use this UID in PATCH requests to reference and update the record. This method works across both Collections and Datasets.
Setting fields with a PATCH request
For each record in a PATCH request, you must specify the record's UID as well as any fields you'd like to update. This example updates the file we uploaded earlier to set its Inspected state to true.
Example request
curl -X PATCH "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"uid": "8a4dc36aa82bf85b",
"Inspected": true
},
...
]
}'
Example response
{
"uids": [
"8a4dc36aa82bf85b"
]
}
Just like a POST request, you can specify multiple records to update in a PATCH. Each record can even contain different fields that you'd like to update.
Replacing files or data
While PATCH requests allow you to update fields in an existing record, you can also replace an entire record in a Collection or Dataset using a POST request with a specified UID. This request will create a new record if the UID does not exist, or overwrite an existing one if it does. You can set the UID to any unique string of your choice when creating a new record, or, if you inserted without a UID, you can use the value returned in the POST response.
For example, if a new version of a file arrives and you'd like to overwrite it, you can use this technique to update the record and avoid a duplicate record.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"uid": "my_unique_file_12345",
"File": {
"path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
"name": "application form 10"
}
},
...
]
}'
Example response
{
"uids": [
"my_unique_file_12345"
]
}
Deleting files or data
Use the DELETE verb to delete a record from the global set of files, a Collection, or a Dataset. The payload is the same format as the response from POST and PATCH requests (e.g., a list of UID values that you'd like to delete). For example, to delete the file above globally, you can run:
Example request
curl -X DELETE "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"uids": [
"8a4dc36aa82bf85b"
]
}'
Example response
{
"uids": [
"8a4dc36aa82bf85b"
]
}
Deleting from a Collection
Just as in the application, if you issue a DELETE against a Collection, Impira will remove the file from the Collection but not delete it globally. If you'd like to delete a file globally, then delete it from /files
as shown above. This will also remove the file from any Collections it belongs to.
Asynchronous mode
When you upload a new file to Impira through the API, the command completes once the file has been fully uploaded and added to its target Collection. Depending on the load of the system, these steps can take several seconds. If you don't need the file to be initialized in its target Collection when the command completes, you can use the async flag in the API request to return much faster.
To help you decide whether asynchronous mode is appropriate, consider the following:
- If you perform an asynchronous upload, you may not see the file yet if you query the Collection immediately after the command completes.
- Asynchronous mode still returns the UID of the uploaded file(s).
- Whether or not you use asynchronous mode, upload commands never wait for all of the machine learning models to run before completing.
Example request
curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>?async=true" \
--header "X-Access-Token: YOUR_TOKEN_HERE" \
--header "Content-Type: application/json" \
--data '
{
"data": [
{
"File": {
"path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
"name": "application form 10"
}
},
...
]
}'
Example response
{
"uids": [
"927a834f5f2036e6"
]
}
Need more?
See our Impira Read API to read files or data at a given path, or to query data via our Impira Query Language (IQL) API.
File modification during API upload
Impira has a built-in toolkit for modifying files already uploaded into Impira, but you can also modify your files as you upload them via the Impira Write API.
We currently support modifications for the following file types:

What could I use this feature for?
If you anticipate needing certain adjustments, you can save time by encoding file modifications in the upload itself, without ever having to go into Impira and make them manually. You can also reduce the number of page uploads you consume by removing unnecessary pages from your documents from the beginning.
If you’ve received a hundred different scanned forms combined into a single PDF, or if many of your documents are oriented incorrectly, you can use the Impira Write API to upload and modify all of them in one step.
Basic query structure
Modification requests are built off of the files or collection endpoints used to upload files via the API:
https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/files
https://app.impira.com/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>
You can pass a mutate command to Impira via the File JSON representation. The mutate field contains instructions for transforming your base file into an extraction-ready format.
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"rotate": 90
}
}
}
For example, this command would upload a file from your system and rotate the entire document 90º clockwise:
curl -v -X POST https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/files \
--header 'X-Access-Token: <ACCESS_TOKEN>' \
--form 'file=@<FILE_NAME>' \
--form 'data={"File":{"name":"test_api.pdf","mutate":{"rotate":90}}}'
Page syntax
Note: pages are zero-indexed, so "pages": 0 refers to the first page of a document. Read more in the What’s zero-indexing? section below.
Our simple page syntax allows easy specification of a single page, a range of pages, a list of pages, and more:
| Specifying: | Example: |
| --- | --- |
| A single page (e.g., the first page) | 0 |
| A list of pages | 0,2,4,7,8 |
| A page range | 2:5 |
Syntax combinations
You can specify what pages you want in a relatively simple manner using a combination of ranges and individual numbers:

Negative indices
Negative indices help specify pages of a document of unknown length.

End index omission
Omitting the end index selects all pages that occur after a given page number.
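As a sketch, the page syntax above (single pages, comma-separated lists, start:end ranges, negative indices, and omitted end indices) can be expanded into concrete zero-indexed page numbers like this. Treating the end of a range as exclusive is an assumption, not something the syntax reference confirms:

```python
# Expand a page-syntax string into zero-indexed page numbers.
# Range ends are treated as exclusive (assumption).
def expand_pages(spec, num_pages):
    def norm(i):
        return i + num_pages if i < 0 else i   # negative => count from the end
    pages = []
    for part in spec.split(","):
        if ":" in part:
            start_s, end_s = part.split(":")
            start = norm(int(start_s)) if start_s else 0
            end = norm(int(end_s)) if end_s else num_pages  # omitted end => rest of doc
            pages.extend(range(start, end))
        else:
            pages.append(norm(int(part)))
    return pages

assert expand_pages("0,2,4,7,8", 10) == [0, 2, 4, 7, 8]
assert expand_pages("2:5", 10) == [2, 3, 4]      # assuming exclusive end
assert expand_pages("-1", 10) == [9]             # last page
assert expand_pages("6:", 10) == [6, 7, 8, 9]    # page 6 onward
```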

Rotating pages
The following command rotates all pages of a file 90º clockwise (all rotations specified are clockwise):
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"rotate": 90
}
}
}
You can also choose specific pages or ranges you’d like to rotate:
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"rotate_segments": [
{
"pages": "0",
"degrees": 90
},
{
"pages": "2",
"degrees": 180
}
]
}
}
}
The example above rotates the first page of the file by 90º clockwise, and the third page of the document by 180º (pages are zero-indexed in Impira). All other pages are unaltered. The pages field supports individual pages, lists, and ranges. Refer to the Page syntax section above for more information.
Removing pages
You can also remove specific pages from a multipage file:
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"remove_pages": "3,5,6"
}
}
}
This example removes pages four, six, and seven from the original document.
Note: the remove_pages command should not be used in combination with the split_segments command. Instead, simply omit the pages you wish to remove from the split_segments field.
Splitting pages
You’re also able to use the mutate command to split a multipage document into smaller sections. The following syntax would split the document into individual pages:
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"split": "page"
}
}
}
For more complex document splitting operations, you can specify the shape of the desired result documents in terms of pages from the original document:
{
"File": {
"name": "<FILE_NAME>",
"mutate": {
"split_segments": ["0,1", "2", "3,4", "5"]
}
}
}
The example above splits a six-page original document into four separate files with the following page distribution (page numbers are zero-indexed):

You can omit page numbers from split_segments
if you don’t need certain information from the original document, as well as repeat page numbers if a single page (e.g., a header or title page) is shared between the resulting split files. You can also utilize the ranges specified in the Page syntax section above to capture ranges of pages.
As noted above, you should not combine remove_pages operations with split_segments in the same request.
The API will return an upload_uids field containing a list of UIDs (unique identifiers) for all files inserted. If split or split_segments are specified, you can find the UIDs of the resulting documents based on this upload_uid.
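To make the segment semantics concrete, here's a small Python sketch (illustrative only, not Impira's implementation) that carves a list of pages into output files the way split_segments describes, with pages omitted or repeated as specified:

```python
def split_segments(pages: list, segments: list[str]) -> list[list]:
    """Sketch of how `split_segments` carves one document into several.
    Each segment string lists the zero-indexed source pages for one
    output file; pages may be omitted entirely or repeated across
    segments. (Illustrative only -- not Impira's implementation.)"""
    return [[pages[int(p)] for p in seg.split(",")] for seg in segments]

# A hypothetical six-page document split into four output files:
doc = ["p0", "p1", "p2", "p3", "p4", "p5"]
result = split_segments(doc, ["0,1", "2", "3,4", "5"])
```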
Beta: Splitting pages based on page content
In many cases, it's impossible to know the exact page numbers to split documents on without inspecting the documents themselves. Impira supports splitting pages based on their text content. You simply write a regular expression which matches the first page of each document type.
The following syntax splits off a new document each time it encounters the regular expression "Page 1":
{
  "File": {
    "name": "<FILE_NAME>",
    "mutate": {
      "split_exprs": {
        "new_doc": "regexmatch(fulltext, 'Page\\s+1')"
      }
    }
  }
}
Each key in split_exprs
represents a different document type. The names themselves are arbitrary and serve only as labels.
For example, if you have a submission packet which contains a form and multiple loss run documents, you could write a more complex expression like:
{
  "File": {
    "name": "<FILE_NAME>",
    "mutate": {
      "split_exprs": {
        "acord_125": "regexmatch(fulltext, 'ACORD\\s+125') and regexmatch(fulltext, 'Page\\s+1')",
        "acme_loss_run": "regexmatch(fulltext, 'ACME Loss Run')",
        "bank_stmt": "regexmatch(fulltext, 'Your monthly bank statement')"
      }
    }
  }
}
Each expression can access the full range of IQL syntax in terms of the fulltext field (which is the full text of the page). If a page evaluates to true for any expression, it's considered the start of a new document. In the event of a tie, Impira picks the alphabetically first key. If no expressions match, the page is considered the next page of the previous document.
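The matching behavior described above can be sketched in Python using the standard re module, with regexes standing in for the IQL regexmatch(fulltext, ...) calls (a simplified, hypothetical model):

```python
import re

def split_documents(page_texts, split_exprs):
    """Sketch of the `split_exprs` behavior: a page matching any
    expression starts a new document; ties go to the alphabetically
    first key; non-matching pages continue the previous document.
    (Illustrative model only -- not Impira's implementation.)"""
    docs = []
    for text in page_texts:
        matches = sorted(k for k, rx in split_exprs.items() if re.search(rx, text))
        if matches:   # this page starts a new document of type matches[0]
            docs.append({"type": matches[0], "pages": [text]})
        elif docs:    # continuation of the previous document
            docs[-1]["pages"].append(text)
        else:         # leading pages with no match start an untyped document
            docs.append({"type": None, "pages": [text]})
    return docs

# Hypothetical page texts for a four-page upload:
pages = ["Page 1 intro", "more detail", "Page 1 again", "appendix"]
docs = split_documents(pages, {"new_doc": r"Page\s+1"})
```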
This feature is under active development, so please reach out to us with any feedback or suggestions you have about the syntax or functionality at support@impira.com.
Retrieving split results
Once your original document has been split, you may have multiple resulting files within Impira. To fetch the UIDs of your results, you can query the files
entity class using the upload_uid
field. For files that weren’t split, you’ll receive a result with the same value for both uid
and upload_uid
.
For example, suppose an upload that included a split operation returned the following upload_uids:
{
  "upload_uids": [
    "c60f4011886c34a0",
    "e59d7e9f912bc3fd"
  ]
}
You can query against the files entity class to retrieve your final results:
@`files`[uid, File.upload_uid, File.MutationPending] in(`File`.`upload_uid`, "c60f4011886c34a0", "e59d7e9f912bc3fd")
If the MutationPending field is true, the split operation has not yet completed. The query returns one row per resulting file:
[
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "0aff1d6332b3266f"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "4e03c48128364ad1"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "5b7b973be2e0208c"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "e59d7e9f912bc3fd",
    "uid": "e59d7e9f912bc3fd"
  }
]
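Once you have this response, a typical client-side step is to group the resulting file UIDs by their originating upload_uid and confirm nothing is still pending. A minimal Python sketch, using the field names from the JSON above (the rows here are abbreviated sample data):

```python
from collections import defaultdict

# Abbreviated sample rows shaped like the query response above.
rows = [
    {"File.MutationPending": False, "File.upload_uid": "c60f4011886c34a0", "uid": "0aff1d6332b3266f"},
    {"File.MutationPending": False, "File.upload_uid": "c60f4011886c34a0", "uid": "4e03c48128364ad1"},
    {"File.MutationPending": False, "File.upload_uid": "e59d7e9f912bc3fd", "uid": "e59d7e9f912bc3fd"},
]

# Group completed files by the upload that produced them.
by_upload = defaultdict(list)
for row in rows:
    if not row["File.MutationPending"]:
        by_upload[row["File.upload_uid"]].append(row["uid"])
```

Files that weren't split show up as a single group whose only uid equals the upload_uid itself.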
Combining modifications
This is for all you go-getters out there. You can apply multiple modifications simultaneously to your API upload.
Example 1: Combining page removal and page splitting
This example removes the last three pages of the document and splits the remaining pages into single page files:
{
  "File": {
    "name": "<FILE_NAME>",
    "mutate": {
      "remove_pages": "-3:",
      "split": "page"
    }
  }
}
Example 2: Combining page removal, rotating, and splitting
This example removes page six and onwards, rotates all pages 90º, and splits the document into single pages. The result is five single-page documents corresponding to the first five pages of the original document, with each page being rotated 90º:
{
  "File": {
    "name": "<FILE_NAME>",
    "mutate": {
      "remove_pages": "5:",
      "rotate": 90,
      "split": "page"
    }
  }
}
Example 3: Combining page rotation and page splitting
This last example rotates pages zero, two, and four by 180º while splitting the document into four separate files with the following page distribution (page numbers are zero-indexed):
- File 1: pages 0 and 1 (page 0 rotated 180º)
- File 2: page 2 (rotated 180º)
- File 3: pages 3 and 4 (page 4 rotated 180º)
- File 4: page 5
{
  "File": {
    "name": "<FILE_NAME>",
    "mutate": {
      "split_segments": ["0,1", "2", "3,4", "5"],
      "rotate_segments": [
        {
          "pages": "0,2,4",
          "degrees": 180
        }
      ]
    }
  }
}
More info: What’s zero-indexing?
Zero-indexing (or “zero-based array indexing”) numbers the items of a sequence so that the first item has an index of 0: the first item is item 0, and the second item is item 1.
With one-indexing, the first page of a 5-page document is called “1” and the second page “2.” With zero-indexing, the first page is called “0,” the second page “1,” and so on.

The difference between one-indexing and zero-indexing can be seen in the different floor number schemes for buildings in North America and Europe. Exceptions aside, North American buildings often call the first story “Floor 1” and European buildings often refer to their first story as their ground floor or “Floor 0.”

Using zero-indexing for the File Modification API allows consistency between other array access patterns, such as our query patterns in IQL and array access in other programming languages. Zero-indexing also has a lot of mathematically favorable properties, such as the length of the result being the difference between the start and end index.
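Python lists are zero-indexed too, so you can see these properties directly:

```python
# Zero-indexing in practice: Python lists behave exactly like the page
# numbering described above. (Page names are made up for illustration.)
pages = ["cover", "intro", "body", "appendix", "index"]  # a 5-page document

first = pages[0]        # the first page has index 0
second = pages[1]       # the second page has index 1
selection = pages[1:3]  # pages 1 and 2; length == end - start == 2
```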
Creating and modifying Collections through the API
Create Collections and fields through the API and SDK.
Creating a Collection
To create a Collection, you simply hit the /collection/
endpoint with an empty data object.
Example
Importing fields from another Collection
When you create a new Collection, it's often useful to import fields from another Collection so you don't need to create them one by one. For example, you may want each of your different invoice formats to be in a separate Collection for better training, but you'd like the output to have the same fields.
Steps:
- Create Collection One and add your fields (if you haven't already done so).
- Create Collection Two and import your fields from Collection One.
In these examples, SRC refers to the source Collection with the fields you want to import, and DST refers to the destination Collection that you want to add fields to.
When using the REST API, you refer to a Collection by its entity name, which takes the form file_collections::<collection_id>. For example, if your Collection's id is e99d6bea7ace8b07, then the entity name is file_collections::e99d6bea7ace8b07. This isn't necessary if you're using the Python SDK.
Example
Using Impira Query Language (IQL) for advanced queries
Impira Query Language (IQL) makes it easy for you to interactively search, filter, and transform the semi-structured data trapped in your files. IQL is as easy to use as a search bar, generally just a single line of text, yet provides powerful features that would otherwise require writing a complex query in something like SQL.
IQL can be used for advanced queries within the Impira UI, to power a custom field on a Collection, or to query data over the API.
Design principles
IQL is designed specifically for building business applications on metadata-rich information. In fact, all of your interactions in Impira are powered by IQL queries — from uploading new files and creating collections to adding machine learning fields to a collection to extract information.
We wanted IQL to be approachable, designed to fit in a single line, and to be written incrementally. You can start with a search query, refine the results, then quickly add a few instructions to start pivoting and aggregating your data.
Switching to IQL mode
Once you’ve created a Collection and extracted some information, you will need to change your search mode to IQL. You can do this by clicking on the menu button on the right side of the search bar, selecting IQL, and clicking Save. For the rest of these examples we will be querying a Collection of invoices, but you can use a Collection of your own files, too.

IQL concepts
Querying
The first part of a query is the entity class you want to query from. There are a number of entity classes available to query, but the two most common are @files and @file_collections. Each of your Collections within Impira is assigned a unique identifier, and after changing your search mode to use IQL, you’ll see the name of the entity class in the search bar.

Impira will pre-fill the entity class of the collection you’re currently working
on, and it’ll prefix it with an @
symbol to distinguish it from the other parts of the query. If you’re working with IQL via the API, you’ll need to include the entity class and the @
symbol in your query.
Running the query with just the entity class will return all of the fields for each row in the collection. Impira visualizes this in a table, but there’s a lot more information available for each file and field.
You can see this by switching to the JSON view.
Go ahead and switch to the JSON view and see how rich the data is behind the scenes. Switch between the table view and the JSON view at any time by returning to the view menu.

For fields with nested values, you can use dot notation (.) to navigate the nested path. For a field name that contains spaces, like Invoice No
, you can write the name between backticks (`).
Tip: Backticks — the apostrophe lookalike
It’s important to place backticks around a file or field name to distinguish it from the rest of an IQL query (see the example below with field names Invoice No and Due Date in backticks).

Backticks look a lot like apostrophes (ʼ) or single quotation marks (‘), so make sure you pick the right symbol on your keyboard as you write IQL queries.

Projection
The second part of a query is the projection list. This is used to specify which fields or expressions you want to return, allowing you to selectively access the fields specific to your needs. The projection list is specified between square brackets, [
and ]
, and each expression within the list is separated by a comma.
Let’s query our entity class for the Invoice No
and the Due Date
:
[`Invoice No`, `Due Date`]

Notice that this returns the complete JSON structure for only the fields specified in the projection. You can be more specific about obtaining just the extracted information from this result by using dot notation to navigate through the result. Let’s query for the extracted values for Invoice No and Due Date:
[`Invoice No`.Label.Value, `Due Date`.Label.Value]
You’ll see your result change to a much simpler array
of objects with only two properties per row in your collection.

Aliasing
Items in the projection list can be given a new name, also known as
aliasing. You can do this by prefacing a field name with a name of your own,
separating your new name and the field name with a colon (:
). This makes reading
the results simpler, and also provides a powerful way to rename fields to meet the
needs of downstream applications that consume data via Impira APIs.
Go ahead and change your projection list to include new names for your fields.
[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value]

Switching back to the table view you’ll see that the new JSON result from your query is also visualized nicely with a header row that matches your aliased field names, and each row with the data you projected as part of the query. Switch back and forth between each view to get a different perspective on your result.

Filters, sorting, limits, and offsets
The remainder of the IQL query, after the entity class and (optional) projection, is the filter. The filter is flexible and designed to allow you to write simple text queries to more specific queries over fields or expressions.
Let’s take a look at some examples that build on the Invoice No
and Due Date
queries above.
Query for all invoice numbers with a due date greater than Jan 1st 2020:
[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] due>2020-01-01
Query for a specific invoice number:
[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] inv_no:451
Query for all invoice numbers with a due date in 2019 (notice in this example that the due date is filtered on both “less than” and “greater than” a specific date):
[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] due>2019-01-01 due<2020-01-01
Query for the 10 most recent invoices, including Invoice No and Total Due, sorted by the highest due date:
[inv_no:`Invoice No`.Label.Value, total:`Total Due`.Label.Value, due:`Due Date`.Label.Value] limit:10 highest:due
Query for the lowest 10 invoice totals:
[inv_no:`Invoice No`.Label.Value, total:`Total Due`.Label.Value, due:`Due Date`.Label.Value] limit:10 lowest:total
For larger results, you’ll often want to paginate the results into smaller chunks. You can do this using a combination of limit and offset filters. For example, here’s a filter to get 100 records, beginning from the 300th record:
limit:100 offset:300
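Conceptually, limit and offset behave like a slice over the full result set. Here's a Python sketch of the same arithmetic (Impira applies these filters server-side; this is purely illustrative):

```python
def paginate(records, limit, offset):
    """Sketch of how `limit:<n> offset:<m>` carves a result set into
    pages, mirroring the IQL filter above. (Client-side illustration
    only; Impira applies these filters on the server.)"""
    return records[offset:offset + limit]

# 1,000 hypothetical records, fetched 100 at a time starting at the 300th:
records = list(range(1000))
page = paginate(records, limit=100, offset=300)
```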
Pivot and rollup
IQL also supports a concept called pivoting which is a neat way to group your results. Just like the above examples of limit, offset, highest, and lowest, you can set pivot to true. Let’s look at an example to see how you can easily group your data for more detailed analysis or to summarize your results.
Let’s assume you wanted to see all your invoices grouped by year. IQL has many built-in functions to transform and manipulate details, and for this example we’ll use the year()
function to truncate the Due Date
field to the year.
[year(`Due Date`), {`Invoice No`, `Total Due`}] pivot:true

Similarly, you can use other functions in your query to aggregate the results. With
a small change to the query, we can calculate the total due for all invoices for
each year. IQL has some additional conveniences too, and when an aggregate function
like sum()
is used, pivot is automatically set to true.
Combining what we’ve learned about filters and aliasing our projected fields, we can create a user friendly output that can be easily shared throughout your business.
[`Invoice year`: year(`Due Date`), `Total spend`: sum(`Total Due`)] highest:`Total spend`

In addition to pivoting, IQL supports a strong form of grouping called a rollup. A rollup will compute the value of each aggregate at each level of nesting in the query. Building on the previous example, you can expand this to include the month and see the total of all invoices by month and by year.
[`Invoice Year`: year(`Due Date`), `Invoice Month`: month(`Due Date`), `Total spend`: sum(`Total Due`)] rollup:true
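To see what rollup:true computes, here's a plain-Python sketch of the same aggregation: a sum at the year level and at the (year, month) level. The invoice rows below are made-up sample data:

```python
from collections import defaultdict

# Made-up invoice rows standing in for extracted Collection data.
invoices = [
    {"year": 2019, "month": 1, "total": 100.0},
    {"year": 2019, "month": 2, "total": 250.0},
    {"year": 2020, "month": 1, "total": 400.0},
]

# A rollup computes the aggregate at every level of grouping:
by_year = defaultdict(float)        # totals per year
by_year_month = defaultdict(float)  # totals per (year, month)
for inv in invoices:
    by_year[inv["year"]] += inv["total"]
    by_year_month[(inv["year"], inv["month"])] += inv["total"]
```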

IQL functions catalog
IQL has a rich library of functions, which are inspired in name & functionality by Excel, Google Sheets, and SQL. We are constantly adding to this list. Please get in touch with us if there's a function that you'd like to use that's not on this list!
Type conversion functions
to_bool
to_bool(expr)
Casts expr
to a boolean type.
Example query
returns each value `id`
cast to a boolean.
to_number
to_number(expr)
Casts expr
to a number type. Timestamps casted to numbers return the UNIX timestamp representation (seconds since the epoch).
Example query
returns each value `placeholder_timestamp_field`
cast to a number (UNIX timestamp).
to_string
to_string(expr)
Casts expr
to a string type.
Example query
returns each value `id`
cast to a string.
to_timestamp
to_timestamp(expr)
Casts expr
to a timestamp type. A number cast to timestamp is assumed to be a UNIX timestamp.
Example query
returns each value `id`
cast to a timestamp (interpreted as a UNIX timestamp).
String functions
concat
concat(arg1, arg2, arg3, ...)
Concatenates the arguments together into a single string.
Example query
constructs a string from the fields `id`
and `product_sku`
.
textjoin
textjoin(delimiter, string_list)
Joins together the strings in string_list
with the delimiter
.
Example query
returns the array ["a","b","c"]
joined together by ", "
.
upper
upper(text)
Convert text to upper case.
Example query
returns 'The quick brown fox'
uppercased.
lower
lower(text)
Convert text to lower case.
Example query
returns 'The quick brown fox'
lowercased.
left
left(text, [num_chars])
Extract the text from the left of the string. num_chars
defaults to 1 if not provided.
Example query
takes the first 10 chars from the left of the input string and returns 'The quick '
.
right
right(text, [num_chars])
Extract the text from the right of the string. num_chars
defaults to 1 if not provided.
Example query
takes the last 10 chars from the right of the input string and returns ' brown fox'
.
mid
mid(text, start, num_chars)
Extracts the text starting from start
characters (indexed by 1) up to num_chars
characters.
Example query
starts 5 chars from the start (q
), reads 11 chars, and returns 'quick brown'
.
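If you're prototyping in Python, left, right, and mid map naturally onto string slicing. These equivalents are illustrative assumptions, not Impira's implementation; note that mid's start is 1-indexed:

```python
# Python stand-ins for the IQL string helpers above (illustrative only).
def left(text, num_chars=1):
    """First num_chars characters, like IQL's left()."""
    return text[:num_chars]

def right(text, num_chars=1):
    """Last num_chars characters, like IQL's right()."""
    return text[-num_chars:] if num_chars else ""

def mid(text, start, num_chars):
    """num_chars characters starting at the 1-indexed position `start`."""
    return text[start - 1:start - 1 + num_chars]

s = "The quick brown fox"
```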
strip
strip(text)
Removes leading and trailing whitespace from text
. This is similar to the strip()
function in Python.
Example query
removes leading/trailing whitespace and returns 'The quick brown fox'
.
trim
trim(text)
Removes extra spaces in text, leaving just a single space between words. Like strip, trim removes leading/trailing whitespace, but also cleans up the spaces within the string.
Example query
returns 'The quick brown fox'
.
proper
proper(text)
Convert text
to title case, where all words begin with a capital letter.
Example query
converts the input string to title case and returns The Quick Brown Fox
.
sentence
sentence(text)
Convert text
to sentence case, where most words are lowercased and the first word in a sentence is capitalized.
Example query
converts the input string to sentence case and returns The quick brown fox
.
split
split(text, separator)
Split the input text
using the specified separator
and returns an array of the parts.
Example query
splits `placeholder_string_field`
using a hyphen -
as a separator.
stem
stem(text)
Takes the input text
and returns a stemmed version of the words.
Example query
takes the word disappeared
and returns the root stem of the word, disappear
.
clean_stopwords
clean_stopwords(text)
Removes stopwords (commonly used words, for example: I, we, you, me, them, their, the) from the input text
. This can be useful when searching across OCR keywords.
Example query
filters out the stopwords
from the input string and returns quick brown fox lazy brown dog
.
Conditional functions
coalesce
coalesce(arg1, arg2, arg3, ...)
Returns the first non-null argument. All arguments must be either scalar or objects with the same spec. Coalesce is useful for combining fields where some values may be null.
Example query
returns `id`
if it's non-NULL
, otherwise `product_sku`
.
if
if(condition1, value1, [condition2, value2, ...], elseValue)
Returns first value
for which condition == true
. If no conditions are true
, returns elseValue
.
Example query
returns `id`
if it is positive, otherwise 0
.
_and
_and(x, y)
AND
's two values together. This is equivalent to x and y
. Evaluates to null
if any of the values are null
.
Example query
returns the boolean expression (`placeholder_bool_field` and true)
.
_or
_or(x,y)
OR
's two values together. nulls
evaluate to false
.
Example query
returns the boolean expression (`placeholder_bool_field` or true)
.
eq
eq(x,y)
Compares x
with y
and returns a BOOLEAN
as a result. This function works with all input types (STRING
, NUMBER
, TIMESTAMP
, BOOLEAN
, ENTITY
).
Example query
compares (`id`=1)
.
in
in(val, x, y, ...)
Compares val
with the remaining values (x
, y
, and so on) and returns a BOOLEAN
if val
equals any of them. This is semantically the same as
(val=x) or (val=y) ....
This function works with all input types (STRING
, NUMBER
, TIMESTAMP
, BOOLEAN
, ENTITY
).
Example query
returns true if `id`
equals 1
, 2
, or 3
.
in_list
in_list(val, list)
Compares val
with the values in list
and returns a BOOLEAN
if val
equals any of them. This is semantically the same as
in(val, index(list, 0), index(list, 1), ...)
This function works with all input types (STRING
, NUMBER
, TIMESTAMP
, BOOLEAN
, ENTITY
).
Example query
returns true if `id`
equals 1
, 2
, or 3
.
not
not(expr)
Returns a boolean for the opposite of the input expression when the expression evaluates as a boolean. If a binary number 0 or 1 is entered as the expression, it is treated as a boolean value.
Example query
returns true if `id`
does not equal 1
.
Date and time functions
datepart
datepart(interval, timestamp)
Returns the part of the timestamp
specified by interval
. Supported values for interval
are:
yyyy | Year
q | Quarter
m | Month
y | Day of year
d | Day
w | Weekday
ww | Week
h | Hour
n | Minute
s | Second
Example query
returns the year part of `placeholder_timestamp_field`
.
year
year(timestamp)
Returns the timestamp truncated to the year.
Example query
returns the year part of `placeholder_timestamp_field`
.
month
month(timestamp)
Returns the timestamp truncated to the month.
Example query
returns the month part of `placeholder_timestamp_field`
.
week
week(timestamp)
Returns the timestamp truncated to the Monday of the week.
Example query
returns the week part of `placeholder_timestamp_field`
.
weekday
weekday(timestamp, [return_type])
Returns the day of the week of timestamp
as a number. By default, Sunday is 1 and the values range from 1-7. However, you can specify the optional return_type
argument to change this behavior. IQL mimics Excel's WEEKDAY specification.
Example query
returns the day of week from `placeholder_timestamp_field`
.
day
day(timestamp)
Returns the timestamp truncated to the day.
Example query
returns the day part of `placeholder_timestamp_field`
.
hour
hour(timestamp)
Returns the timestamp truncated to the hour.
Example query
returns the hour part of `placeholder_timestamp_field`
.
minute
minute(timestamp)
Returns the timestamp truncated to the minute.
Example query
returns the minute part of `placeholder_timestamp_field`
.
second
second(timestamp)
Returns the timestamp truncated to the second.
Example query
returns the second part of `placeholder_timestamp_field`
.
add_interval
add_interval(timestamp, amount, timestamp_part)
Adds the number specified by amount
to the timestamp_part
of the input timestamp
. timestamp_part
can be one of year
, month
, day
, hour
, minute
, second
.
Example query
increments the month
part of `placeholder_timestamp_field`
by 1
.
sub_interval
sub_interval(timestamp, amount, timestamp_part)
Subtracts the number specified by amount
from the timestamp_part
of the input timestamp
. timestamp_part
can be one of year
, month
, day
, hour
, minute
, second
.
Example query
decrements the month
part of `placeholder_timestamp_field`
by 1
.
datedif
datedif(timestamp1, timestamp2, part)
Returns the difference between the two timestamps in units of the specified part
. part
can be one of year
, month
, day
, hour
, minute
, second
.
Example query
returns the number of days between `placeholder_timestamp_field`
and now()
.
text
text(value, format_text)
Returns the string with the date in value
formatted according to format_text
.
Example query
returns the date `placeholder_timestamp_field`
formatted according to mm/dd/yy hh:mm:ss
.
Array and entity functions
array_to_object
array_to_object(field)
Given a field of an object that is part of a list, array_to_object
will construct a new object where the keys are the
values of field
in the original objects in the list, and the values are the objects themselves.
Example query
This hypothetical query would turn array_field
into an object keyed by f
. For instance, if array_field = [{"f":"a","value":5},{"f":"b","value":6}]
then array_to_object(array_field.f) = {"a":{"f":"a","value":5},"b":{"f":"b","value":6}}
.
array_window
array_window(array, lag, lead)
For each value in the array
, returns the lagging and leading values specified by lag
and lead
(the values before and after each value in the array).
Example query
will return an array with the 2
values before and after each item in build_array(1,2,3,4,5,6,7,8,9,10)
.
build_array
build_array(arg1, arg2, arg3, ...)
Constructs an array with the provided arguments. The arguments must be either all entities or all non-entities. build_array()
returns an empty array.
Example query
constructs the array [`id`, 2, 3]
.
build_entity
build_entity(field1, value1, field2, value2, ...)
Constructs an object with the arguments provided. Arguments should be provided in pairs, where the first argument in the pair is the field name, and the second is the value.
Example query
constructs the object {"foo":true}
.
concat_array
concat_array(array1, array2, array3, ...)
Concatenates and flattens arrays into a single array.
Example query
returns the array [1,2,3,4,5,6]
.
index
index(array, idx)
Returns the value at the idx
position (zero-indexed) of array
. If idx
is out of bounds, the function returns null
.
Example query
returns the first element of the array ("apple"
).
len
len(array)
len
takes an array as an argument and returns the length of that array.
Example query
evaluates to 3
.
merge
merge(obj1, obj2, obj3, ...)
Merges the objects together into a single object. merge
ignores fields in objects with null values.
Example query
merges objects {"foo":1}
and {"bar":2}
into {"foo":1,"bar":2}
.
merge_many
merge_many(objArray1, objArray2, objArray3, ...)
Merges and flattens lists of objects together into a single object. merge_many
ignores empty arrays.
Example query
merges each object in arrays [{"foo":1}]
and [{"bar":2}]
into [{"foo":1,"bar":2}]
.
merge_unnest
merge_unnest(obj, array)
This function is used to expand an array
into a set of rows, and merge each row with the same obj
.
Example query
un-nests the array of bar
objects and merges each object with {foo: 1}
. The result is an array of new objects that look like {foo: 1, bar: 2}
.
slice
slice(array, start, end)
Returns the subset of the array
from index start
(zero-indexed, inclusive) to end
(zero-indexed, exclusive). Negative start
or end
indices are relative to the end of array
.
Example query
returns the first two elements of the array.
strip_nulls
strip_nulls(obj)
Recursively strips fields with null
values from objects.
Example query
returns each `deploy`
entity without any null keys or values.
Regular expression functions
regexextract
regexextract(str, reg)
Returns the left-most match of regex reg
on string str
. Returns null
if there is no match. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXEXTRACT
.
Example query
returns each value of `placeholder_string_field`
that contains the string "impira"
or null
otherwise.
regexmatch
regexmatch(str, reg)
Returns true
if str
matches regex reg
, otherwise it returns false
. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXMATCH
.
Example query
returns a boolean that indicates whether `placeholder_string_field`
contains the string "impira"
.
regexreplace
regexreplace(str, reg, replacement)
Replaces matches of reg
in str
with replacement
. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXREPLACE
.
Example query
returns the string with all matches of img[0-9]a
replaced with img
.
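Because RE2 shares most of its syntax with Python's re module, you can prototype these three helpers locally before using them in IQL. The Python functions below are illustrative stand-ins, not Impira code:

```python
import re

# Python approximations of the three IQL regex helpers (illustrative
# only; IQL uses RE2, which shares most of this syntax with `re`).
def regexmatch(s, rx):
    """True if the regex matches anywhere in the string."""
    return re.search(rx, s) is not None

def regexextract(s, rx):
    """Left-most match of the regex, or None if there is no match."""
    m = re.search(rx, s)
    return m.group(0) if m else None

def regexreplace(s, rx, replacement):
    """Replace every match of the regex with `replacement`."""
    return re.sub(rx, replacement, s)
```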
Mathematical functions
add
add(expr1, expr2)
Adds the values of expression expr1
plus expression expr2
. Expressions must evaluate to a NUMBER
, otherwise add
will return NULL
.
Example query
adds `id`
and 1
.
sub
sub(expr1, expr2)
Subtracts the value of expression expr2
from the value of expression expr1
. Expressions must evaluate to a NUMBER
, otherwise sub
will return NULL
.
Example query
subtracts 1
from `id`
.
mul
mul(expr1, expr2)
Multiplies the value of expression expr1
and expression expr2
. Expressions must evaluate to a NUMBER
, otherwise mul
will return NULL
.
Example query
multiplies `id`
by 2
.
div
div(expr1, expr2)
Divides the value of expression expr1
by expression expr2
. Expressions must evaluate to a NUMBER
, otherwise div
will return NULL
.
Example query
divides `id`
by 2
.
least
least(expr1, expr2, expr3, ...)
Returns the lowest value among the input expressions.
Example query
returns the lowest value among 1, 2, 3, 0
which is 0
.
greatest
greatest(expr1, expr2, expr3, ...)
Returns the highest value among the input expressions.
Example query
returns the highest value among 1, 2, 3, 0
which is 3
.
floor
floor(expr, significance)
Rounds the value of the expression expr
down, towards the nearest multiple of significance
. significance
is optional and is 1 by default.
ceiling
ceiling(expr, significance)
Rounds the value of the expression expr
up, towards the nearest multiple of significance
. significance
is optional and is 1 by default.
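The significance behavior of floor and ceiling can be sketched in Python (an illustrative model of the rounding described above, not Impira code):

```python
import math

# Round toward the nearest multiple of `significance` (default 1),
# modeling the IQL floor/ceiling behavior described above.
def floor_sig(expr, significance=1):
    return math.floor(expr / significance) * significance

def ceiling_sig(expr, significance=1):
    return math.ceil(expr / significance) * significance
```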
round
round(expr, precision)
Rounds expr
to precision
(optional) number of digits. If no precision
is given, the function rounds to the nearest whole number.
Example query
returns `id`
divided by 2 and rounded to the nearest whole number.
Aggregate functions
array_agg
array_agg(expr, [sort_expr])
An aggregate function used to return an array from values evaluated by the input expression expr
. An optional second parameter sort_expr
can be specified to sort the results.
count
count(expr)
Aggregates by counting the number of distinct values for the field. This is the same behavior as COUNT(DISTINCT ...)
in SQL.
If no argument is passed, count()
counts the number of entities. This is the same behavior as COUNT(*)
in SQL.
Example query
returns the number of unique values of `id`
.
count_non_null
count_non_null(expr)
count_non_null(expr)
will count the number of entities where expr
is not null
. This is the same behavior as COUNT(expr)
in SQL.
distinct
distinct(expr)
Returns all distinct values for expr
as a list.
Example query
returns the unique values of `id`
.
min
min(expr)
Returns the minimum value for the specified expr
.
max
max(expr)
Returns the maximum value for the specified expr
.
mean
mean(expr)
Returns the mean value for the specified expr
. This is a synonym for avg
.
median
median(expr)
Returns the median value for the specified expr
.
merge_agg
merge_agg(expr, [sort_expr])
Merge sub-fields of an entity expr
into a single entity. If duplicate keys are encountered during the merge, the optional argument sort_expr
can be used to keep the value for the key with the higher sort order.
Example query
returns the subfields of `deploy`
merged into a single entity, grouped by `id`
.
object_agg
object_agg(name, value)
Aggregates name
and value
pairs into a single entity. Values for name
must be non-null.
Example query
returns an entity with values of `placeholder_string_field`
as the field names and the values of `deploy`
as the values, grouped by `id`
.
sum
sum(expr)
Computes an aggregate sum over all values of expr
. sum
mirrors the SUM()
function in SQL.
Example query
returns the total `id`
across each unique value of `placeholder_string_field`
.
single
single(expr)
Returns at most one non-null value of expr
. This aggregate can be used when you know that you are pivoting on a unique value.
Example query
returns a single value of field
for each value of uid
. It will return an error if there are multiple records with the same value of uid
.
Join functions
join_one
join_one(target_entity, source_field, target_field)
This function joins the calling entity to the target_entity
where the source_field
matches the target_field
. This is a one-to-one relationship and functions like a LEFT JOIN in SQL.
Example query
This example assumes a Dataset
called My Dataset
which has a field called id
and a Collection
with an extracted
field called Reference number
. When called, this function will join the matching row from the Dataset to the matching row in the Collection where the
Reference number
equals the id
in the Dataset.
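The one-to-one join semantics resemble a dictionary lookup. Here's a plain-Python sketch with invented sample data (not Impira code): rows without a match keep a None placeholder, like a SQL LEFT JOIN:

```python
# Invented sample data standing in for a Dataset and a Collection.
dataset = [{"id": "A-1", "vendor": "Acme"}, {"id": "B-2", "vendor": "Bolt"}]
collection = [{"Reference number": "A-1"}, {"Reference number": "C-3"}]

# Index the target entity by its join key, then attach the matching
# row (or None) to each source row -- a LEFT JOIN in miniature.
by_id = {row["id"]: row for row in dataset}
joined = [
    {**row, "joined": by_id.get(row["Reference number"])}
    for row in collection
]
```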
join_many
join_many(target_entity, source_field, target_field)
This function joins the calling entity to the target_entity
where the source_field
matches the target_field
. This is a one-to-many relationship and functions like a LEFT JOIN in SQL.
Example query
This example assumes a Dataset
called My Dataset
which has a field called id
and a Collection
with an extracted
field called Reference number
. When called, this function will join the matching rows from the Dataset to the matching row in the Collection where the
Reference number
equals the id
in the Dataset and return an array.
Other general purpose functions
hash
hash(arg1, arg2, arg3, ...)
Hashes one or more arguments together. IQL uses the Go implementation of crc64.
Example query
returns the hash of `id`
and `product_sku`
for each record.
identity
identity()
Returns the identity key for the entity as specified in the spec file. If no identity key is specified, it returns the entity as a bracket surrounded string (i.e. "[{"foo":"bar"}]"
)
Example query
returns each `deploy`
as an identity key.
__permissive_filter
__permissive_filter(filter)
Mimics full-text search behavior with flexible free-text and field-specific filters. Permissive filtering does not fail if the fields specified via field:value
do not exist. It is intended to directly execute a query entered in a search bar.
Example query
returns records containing "sample text"
and where `id`
is greater than 0
.