Tutorial: Uploading files and retrieving results

The basic workflow of document processing involves uploading files into Impira, then retrieving results. This section walks through the basics of how to accomplish this with Impira while introducing a few tradeoffs you may consider along the way.

Tip: Before running through this tutorial, we highly recommend trying out this full workflow through the UI, so you can familiarize yourself with the basics of Impira. Getting started with data extraction walks through the basics.

Let's get set up

Before you start, you'll need to do two things:

1. Get an API token

You'll need a unique API token (i.e., a "key") to access Impira's API. See How to create an API token for learn how to create one.

Throughout this tutorial, you'll see references to <YOUR_TOKEN_HERE>, which you should replace with your API token.

You also need to know your organization's name (it’ll be in your browser’s URL bar, after /o/).Throughout the tutorial, you'll see references to <YOUR_ORG_HERE>, which you should replace with your organization's name.

2. Create a Collection

In Impira, document processing occurs in Collections, which are like folders that group similar documents together. Documents within a Collection share a machine learning model that is unique to that Collection.

While getting started, we recommend creating a Collection in the UI. Once you create a Collection, you should grab its ID, which you can find in the URL after /fc/. Throughout the tutorial, you'll see references to <YOUR_COLLECTION_ID>, which you should replace with your Collection's ID.

Supported languages

The Impira API is HTTP-based. You can access it through a REST API using HTTP libraries in just about any programming language. We also offer a Python SDK which is currently in beta. The examples below show how to accomplish each task using both the REST API and Python SDK.

Uploading files

Most APIs operate over small volumes of data. Document files are comparatively larger, and so there are a couple different options for how to upload files through the API. Impira's API is asynchronous, which means that after you upload, you receive a uid for each file which you can later use to access its data.

Option 1: Using a URL

The most performant and scalable way of uploading files is to use a publicly accessible HTTP URL (e.g., a signed S3 URL). These uploads have a much higher rate limit and are generally faster. If you upload using a URL that expires, we highly recommend setting a high expiration time (e.g., one week), in case the file needs to later be re-processed or if you happen to upload during peak load.

When you upload one or more files, Impira returns a uid for each file. This is a globally unique identifier for the file that you can use to retrieve its results.

Example

Option 2: Using a local file

It isn't always possible to access files globally through a URL, like when the files are sitting on a local server. In that case, you can upload files directly to Impira.

Note: These uploads are rate limited aggressively. If you expect to upload a lot of files at once, we recommend using Option 1 above. In this case of receiving a rate limit error, the Impira API will also include a header, Retry-After-Ms, specifying the number of milliseconds until the next request should be sent.

Example

Option 3: Using an integration

In addition to the API, Impira supports a number of integrations including Zapier, Email, and Snowflake which allow you to ingest data as well. Each of these integrations accepts a similar set of parameters as above and uses the same underlying API to ingest files.

Performance considerations

If you plan to upload in bursts (i.e., 10 or more files at once), then you may want to use the async flag, which means the API command returns before the file has been added to its Collection. During periods of high load, it may take some time for the file to fully process and get added to its Collection, and this flag helps avoid hitting timeout errors.

Retrieving results

Once a file has been uploaded to a Collection, Impira will immediately start processing it. Depending on the size of the file and number of fields in your Collection, it can take a few seconds to a few minutes for the file to process.

Impira provides a few different options for retrieving results as soon as they are ready:

Option 1: Webhooks

This is the most popular option. Impira can trigger a webhook once documents complete processing. For each file, the webhook includes its uid, fields, and File data that you can use to download the original file. It also tells you whether the file was inserted or deleted from the Collection. If you're familiar with relational databases, you can think of webhook data as entries in a "log" of changes to your Collection.

See the Webhooks guide for more information on how to set them up.

By default, Impira triggers a new Webhook for each file once it has completed processing. However, you can write custom IQL queries to receive a webhook for all kinds of scenarios, including the following (and many more):

  • When a new low confidence prediction occurs
  • When a new file is uploaded, before it completes processing
  • When a file's data does not join correctly to a Dataset

Option 2: Long polling

If you aren't able to setup a webhook, you can alternatively use Impira's poll API which will block for up to a minute or when your file completes processing (whichever occurs first). The poll API actually uses the same underlying query mechanisms as webhooks and the data it returns has the same format.

The Python SDK has a built-in method named poll_for_results which captures this workflow for you.

Note: The poll API is a low-level mechanism that requires handwriting queries and running multiple requests. We highly recommend using it only if you can use the Python SDK.

Option 3: Reading results ad-hoc

You can also read the results directly through the Impira API with an ad-hoc request. The tradeoff here is that the results may not be finished processing, in which case you'll receive an empty or partial result. In general, we recommend using this method only if you know that the file has processed (e.g., if you want to query historical results).

Next steps

We hope this tutorial gives you a sense of the basics for how to upload files and retrieve data through Impira's API. The rest of our API reference docs go further in depth on each of these topics.

Don't hesitate to reach out to support@impira.com if you have any questions or feedback.


How to create an API Token

Impira uses API tokens to authenticate your API requests and to protect your data from unauthorized access.

We recommend using a unique token for each integration you set up so you can leave all of your other integrations intact when removing any single token to block access in the future.

Creating an API token

  1. Click the Account icon in the top right corner.
  2. Select API Tokens from the drop-down menu.
  3. Click the + on the right side to create a new token.
  4. Name your token and choose OK.

You’ll be presented with a Token Value which you can copy. Hold onto your Token Value for later.

You can always come back to your API Token list and click the Reveal Token Value icon anytime you need to reference your Token Value.

Validating your API token

To validate your API token, you can run a simple query to retrieve the metadata for a single file. Open your Terminal app and paste in:

curl -X GET --header "X-Access-Token: <YOUR_TOKEN_HERE>" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files?limit=1"
  • Replace <YOUR_ORG_HERE> with your organization’s name (it’ll be in your browser’s URL bar, after /o/)
  • Replace <YOUR_TOKEN_HERE> with the Token you created above

If your API token and Org name are valid, you should see JSON formatted output containing the metadata for a file, its schema, and a status field with a value of 200:

{
  ...
  "status": 200,
  ...
}

However, if either is invalid, you will see an error like:

{
  "error": "Invalid token",
  "status": 403,
  "request_id": "ba28b068-00e0-4a96-af38-6e98be341cad"
}

Revoking an API Token

You may want to revoke a token so it can no longer be used to access data. Do this by selecting the Revoke Tokens icon on the right and choosing Revoke.

Reading data via Impira Read API

This article discusses:

Jump to the article Impira Write API to learn how to upload files, write and update data, or delete files or data.

REST API endpoints

In order to read from different areas of your organization, there are a few endpoint structures that can be used. These same endpoints can also be used for running queries or updating and uploading files:

All Files In Organization

https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files

Files in an individual Collection

https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>

Reading files or data through the REST API

To read data at a given path, you can use the GET verb. You do not need to specify a body for the request. However, you can specify a few optional query parameters:

  • fields is a comma separated list of fields. You can use this to select specific fields. By default, the read API returns all fields in a Collection, Dataset, or View.
  • include_joins. By default, for performance reasons, the read API does not include joined fields. You can specify include_joins=true to include joined fields.
  • highest and lowest can be used to sort by a particular field. For example, highest=Price will sort records by the highest value of Price first.
  • offset and limit can help paginate long results. offset defines the starting point and limit constrains the total number of records returned. If you use offset and limit to paginate, you should specify a sort via highest or lowest otherwise you may duplicate or miss records.
  • Any other parameters are interpreted as filters. For example, if you have a field named Inspected, you can specify Inspected=true as a query parameter to only include records that have Inspected set to the value true.

Example requests

Return all records in All files (all files uploaded to the Org).

curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files"

Return all records in a specific Collection.

curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>"

Return all records in a specific Collection with Inspected=true.

curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?Inspected=true"

Return 10 records in a specific Collection.

curl -X GET  --header "X-Access-Token: YOUR_TOKEN_HERE"
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?limit=10"

Return 10 records in a specific Collection sorted by price.

curl -X GET --header "X-Access-Token: YOUR_TOKEN_HERE" \
"https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME_HERE>?limit=10&lowest=Price"

Response format

By default, the read API returns a resolved format of the data in your Collection, which strips out machine learning metadata including confidence scores, word/page location, and model version. This is the same format sent through webhooks.

This resolved format is much easier to consume in downstream applications, but advanced use cases may want to leverage some of the lower level metadata Impira exposes. The Impira Query Language (IQL), described below, allows you to query this metadata and run more operations like filters, grouping, normalization, and joins.

Querying via the Impira Query Language (IQL) API

Impira Query Language (IQL), is a simple, SQL-like language that is designed specifically for querying the kinds of data found in documents (semi-structured data). Just like SQL, you can perform all kinds of manipulations, including joins, filters, grouping, expressions, and more.

The IQL API also allows you to access lower-level data exposed by Impira, like the underlying machine learning confidence scores, so you can build advanced applications.

Creating an IQL query

  1. Go back to your browser window and click the Experimental Features icon in the top-right corner and choose IQL Playground.

  2. Choose the Collection you want to query (left sidebar) and select the Open in API to see the same data you pulled in Terminal in Step 2, but this time within a new browser tab. This query returns each row in the Collection, and each field in the row. The rows are unresolved, meaning that they include all of the low level machine learning metadata including the confidence score, model version, and page/word location for each field.


Writing a more specific query

Now let’s get more specific with your data starting with just one row, then a field within that row.

  1. In the search bar, add a space after the code and enter limit:1 This will give you one row to work with. Now, we’re going to isolate the File column data for this row. Enter a space after limit:1 and enter [File].

This is how specific and focused you can be by running a simple IQL query through the API. You can view this data in a new browser tab by clicking Open in API.

Running an IQL query through the API

  1. Copy the URL from your API browser window.
  2. Go to Terminal and enter:
curl -X GET --header "X-Access-Token: <YOUR_TOKEN_HERE>"\
  <URL>
  • Replace YOUR_TOKEN_HERE with your Token Value from Step 1 and hit Enter.
  • Replace URL with your copied URL from the API browser window.

You’ll now see the same data from Step 4 in your Terminal.

Writing more advanced IQL queries

IQL is a rich language with support for filters, joins, grouping, expressions, and more. Explore the advanced IQL queries guide here to learn more.

Running large queries

If your query is long enough, the generated URL may exceed 2048 characters in length, making it impossible to submit a query through a GET request. To avoid hitting this limit, you can use a POST request instead to submit the query in the request body against the IQL API URL:

https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/iql

The body should be a JSON-encoded object with a query field containing the IQL query and an optional defaultLimit field with a limit on the number of results.

Example request

curl -X POST 'https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/iql' \
  --header 'X-Access-Token: YOUR_TOKEN_HERE'
  --data '{"query": "@files limit:1"}'

Polling for changes

The IQL API additionally supports polling for changes. This allows you to run a query and receive a cursor, which you can provide as a parameter the next time. The API won’t return any results until new events arrive after the cursor, and then return those new events.

Key concepts

The Impira runtime supports incremental execution over IQL queries (with some exceptions noted below). This is the building block that makes Impira capable of incrementally retraining models, updating the UI in real-time, and providing Webhooks over your data. The poll API gives you access to this building block, so you can build powerful, real-time applications as well. Poll is an advanced feature, and in most cases, users can use the direct query API or webhooks.

Under the hood, Impira logs every change to data with an ordered cursor. These cursors are strings which can be compared lexicographically. The poll API allows you to specify a cursor alongside a query, and it’ll hold off on returning any results until the results of the query change, and then return a set of events with the changes and a new cursor.

Events

The event format is very similar to the standard data format that the IQL and other APIs return. Each event includes three fields:

  • data: the full record corresponding to the event
  • cursor: the cursor at which the event occurred
  • action: delete or insert

The following example indicates that a record with UID “65d527f1d94521e5” should be inserted at cursor “0200000010862702816486”:

{
  "data": {
    "uid": "65d527f1d94521e5",
      "score": 0.5,
    },
  "action": "insert",
  "cursor": "0200000010862702816486"
},

If you’re trying to maintain an up-to-date copy of the query (e.g., in a relational database that you’re syncing data into), you can, for most queries, use the UID as a primary key against which to upsert new records. If a new delete or insert record arrives, you can use the UID field to find the relevant row, and then delete or update it accordingly. Certain queries (e.g., those which do not project the uid or have a pivot operation) will have a different primary key. However, you can always compare the full row (directly or through a hash) as well.

Using the Poll API

To query the poll API, simply replace the /iql with /poll. If you don’t specify a cursor, then poll will automatically run against the latest cursor and provide you with a snapshot of insert events and a new cursor to subscribe to. You can use the cursor value it returns in the next query to block on changes by simply providing a cursor=<cursor value> parameter to the GET request.

Uploading files and writing data via Impira Write API

This article discusses

  • Uploading files and writing data via Impira Write API
  • Updating data
  • Deleting files/data

We strive to offer every function in the UI through our API so you can mix-and-match the right combination of UI and API for your use. This guide is divided into multiple sections to walk you through each core feature of the API.

Tip: You can also make file modifications (page rotation, splitting, or removal) simultaneously as you upload files via API. Learn more.

Getting started

First, you'll need your Org name and an API token handy.

1. Find your Org name

Your organization's name will appear in the API end-point used to query your data. To find your Org name, look in the browser URL bar. Your org name appears in the URL path after the ‘/o/’ subdirectory.

https://app.impira.com/o/<YOUR_ORG_NAME>/collections

2. Create an API Token

In order to call the API, you’ll need an API Token. If you haven’t done so already, learn how to create your own API Token.

API URL structure

The Impira API uses a common URL structure:

https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/<path>
  • <path> specifies the resource you want to read or write to.

The API supports different operations through the standard HTTP verbs GET, POST, PATCH, and DELETE. We’ll walk through how to use each of these verbs in depth below.

In most cases:

  • GET allows you to retrieve information about a Collection, file, or other resource
  • POST allows you to upload files and insert new records
  • PATCH allows you to update existing records
  • DELETE allows you to delete resources

Uploading files with POST

Uploading files directly to Impira is a breeze. Impira supports two methods to write files through the API:

  1. Submitting a POST request with a file URL and other metadata.
  2. Directly uploading a file as multipart/form-data through a POST request.

Uploading files with URLs

You can upload a file to a given resource (e.g., a Collection) by submitting a POST request to the appropriate path. The request takes a JSON object which includes one or more entries to insert. Each entry includes the following information:

  • The File object contains the path to the file and optional metadata fields:
    • File.path is a URL to the file. The file must be accessible to Impira’s cloud-based servers. You can use either HTTP or HTTPS URLs.
    • File.name is an optional name for the file which appears in the UI. This name doesn't have to be unique. If you omit the name, the file will appear with an empty name in the UI.
  • If you're uploading to a Collection, you can only specify values in the File field. If you'd like to specify values for other fields, you must do that in a follow up PATCH request (see below). You cannot specify ML model (inferred) fields through the API.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
    "data": [
      {
        "File": {
        "path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
        "name": "application form 10"
      }
    },
    ...
  ]
}'

Example response

{
  "uids": [
    "927a834f5f2036e6"
  ]
}

Uploading files directly

If you have a file on your local machine or server and want to upload it directly to Impira, you can simply upload it with a multi-part file form submission. The direct file upload API supports up to 10MB per request, which you can spread across one or more files.

For each file you'd like to upload, you can submit a file form field. You can optionally specify a data field for each file, which is a JSON encoded value that contains any record values to include alongside the file object. Just like URL-based upload, you can only specify fields in the File object (e.g., name). You can edit any follow up fields in a PATCH request.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --form "file=@local_file.pdf"

Example response

{
  "uids": [
    "8a4dc36aa82bf85b"
  ]
}

Example request

curl -X POST https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME> \
  --header 'X-Access-Token: YOUR_TOKEN_HERE' \
  --form 'file=@local_file1.pdf' \
  --form 'data={"File": {"name": "my_file_1.pdf"}}' \
  --form 'file=@local_file2.pdf' \
  --form 'data={"File": {"name": "my_file_2.pdf"}}' \

Example response

{
  "uids": [
    "cb713868a5155ad8",
    "f1df198b1ebdef98"
  ]
}

Adding existing files to a Collection

Files in Impira can belong to multiple Collections. To add an existing file into one or more Collections, you can issue a write to the file_collection_contents entity class with the Collection's and file's UID. You can find a Collection's UID from its URL in the application (https://app.impira.com/o/<YOUR_ORG_HERE>/fc/<COLLECTION_UID>).

Note: The result of this API request is always {"uids": null}.*

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/ec/file_collection_contents" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
  "data": [
 {
      "file_uid": "cb713868a5155ad8",
      "collection_uid": "5aff2aad39eaa373"
    },
    {
      "file_uid": "f1df198b1ebdef98",
      "collection_uid": "5aff2aad39eaa373"
    }
    ...
  ]
}'

Example response

{
  "uids": null
}

Inserting data with POST

You can insert data into a Dataset the same way as uploading a file. Unlike writing to /files or a Collection, you don't need to include the File object when writing to a Dataset. Instead, just specify the fields you want to write.

Note: These fields must exist on the Dataset. You'll receive an error if you try to write data referencing fields that do not exist.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/dc/337dd57d0c5948f9" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
  "data": [
  {
      "Name": "Rhyley Green"
      "Hired": true,
    },
    {
      "Name": "Ammar Duffy"
      "Hired": false,
    }
    ...
  ]
}'

Example response

{
  "uids": [
      "360fe1982bb9055a",
      "b0694bc41fa6b241"
    ]
}

The UID field is a unique identifier for the record you insert. If you do not specify a UID in the request, then Impira will generate one for you. You can use this UID in future requests (or just specify your own) to perform an "upsert," which will insert a new record if the UID is not already present or update the record if it is. For example, if the insert above returned a UID of "b0694bc41fa6b241" for "Ammar Duffy," then you can run this to update their Hired field to true.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/dc/337dd57d0c5948f9" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
  "data": [
    {
      "uid": "b0694bc41fa6b241"
      "Name": "Ammar Duffy"
      "Hired": true,
    }
  ]
}'

Example response

{
  "uids": [
    "b0694bc41fa6b241"
  ]
}
Note: Unlike a PATCH request which updates a row, you must specify every field in an upsert. For example, if you omit the name field from the above request (e.g., {"uid": "b0694bc41fa6b241", "Hired": true}), then the request will overwrite Name with an empty value.

Updating files or data

When you insert a new record using a POST request, you can either specify your own UID or Impira will generate one for you if you don't have one. You can use this UID later to reference the record and PATCH commands to update it. This method works across both Collections and Datasets.

Note: While you can access Views through the API, they are read-only, so you can't make updates to them. To update the data displayed through a View, you must update the Collection or Dataset that the row originates from.

Setting fields with a PATCH request

For each record in a PATCH request, you must specify the record's UID as well as any fields you'd like to update. This example updates the file we uploaded earlier to set its Inspected state to true.

Example request

curl -X PATCH "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
      "data": [
      {
        "uid": "8a4dc36aa82bf85b"
        "Inspected": true,
      },
    ...
  ]
}'

Example response

{
  "uids": [
    "8a4dc36aa82bf85b"
  ]
}

Just like a POST request, you can specify multiple records to update in a PATCH. Each record can even contain different fields that you'd like to update.

Replacing files or data

While PATCH requests allow you to update fields in an existing record, you can also replace an entire record in a Collection or Dataset using a POST request with a specified UID. This request will create a new record if the UID does not exist, or overwrite an existing one if does. You can set the UID to be any unique string of your choice while creating a new record, or if you inserted without a UID, you can use the value returned in the POST response.

For example, if a new version of a file arrives and you'd like to overwrite it, you can use this technique to update the record and avoid a duplicate record.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
    "data": [
      {
        "uid": "my_unique_file_12345"
        "File": {
          "path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
          "name": "application form 10"
        },
      },
    ...
  ]
}'

Example response

{
  "uids": [
    "my_unique_file_12345"
  ]
}

Deleting files or data

Use the DELETE verb to delete a record from the global set of files, a Collection, or a Dataset. The payload is the same format as the response from POST and PATCH requests (e.g., a list of UID values that you'd like to delete). For example, to delete the file above globally, you can run:

Example request

curl -X DELETE "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/files" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
  "uids": [
    "8a4dc36aa82bf85b"
  ]
}'

Example response

{
  "uids": [
    "8a4dc36aa82bf85b"
 ]
}

Deleting from a Collection

Just as in the application, if you issue a DELETE against a Collection, Impira will remove the file from the Collection but not delete it globally. If you'd like to delete a file globally, then delete it from /files as shown above. This will also remove the file from any Collections it belongs to.

Asynchronous mode

When you upload a new file to Impira through the API, the command completes the file has been fully uploaded and added to its target Collection. Depending on the load of the system, these steps can take several seconds to complete. For your use case, if you don't need the file to be initialized in its target Collection when the command completes, you can use the async flag in the API request to complete much faster.

To help you decide whether asynchronous mode is appropriate, consider the following:

  • If you perform an asynchronous upload, you may not see the file yet if you query the Collection immediately after the command completes.
  • Asynchronous mode still returns the UID of the uploaded file(s).
  • Whether or not you use asynchronous mode, upload commands never wait for all of the machine learning models to run before completing.

Example request

curl -X POST "https://app.impira.com:443/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>?async=true" \
  --header "X-Access-Token: YOUR_TOKEN_HERE" \
  --header "Content-Type: application/json" \
  --data '
{
    "data": [
      {
        "File": {
        "path": "https://assets-global.website-files.com/5e3898dff5077817ea80d71c/6047b3733b9cbd22251c7ef8_Screen%20Shot%202021-03-09%20at%2012.42.02%20PM.png",
        "name": "application form 10"
      }
    },
    ...
  ]
}'

Example response

{
  "uids": [
    "927a834f5f2036e6"
  ]
}

Need more?

See our Impira Read API to read files or data at a given path, or to query data via our Impira Query Language (IQL) API.

File modification during API upload

This feature is currently in beta. Give it a go and send us feedback at feedback@impira.com.

Impira has a built-in toolkit for modifying files already uploaded into Impira, but you can also modify your files as you upload them via the Impira Write API.

We currently support modifications for the following file types:

What could I use this feature for?

Save time by pre-encoding file modifications ahead of time if you anticipate needing certain adjustments without ever having to go into Impira yourself to do it manually. You can also reduce the number of page uploads you consume by removing unnecessary pages from your documents from the beginning.

If you’ve received a hundred different scanned forms that have been combined into a single PDF, or if many of your documents are oriented incorrectly, you can use the Impira Write API to ensure all of them are uploaded and modified simultaneously.

Basic query structure

Modification requests are built off of the files or collections endpoints used to upload files via the API:

https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/files
https://app.impira.com/o/<YOUR_ORG_HERE>/api/v2/collection/<YOUR_COLLECTION_NAME>

You can pass a mutate command to Impira via the File JSON representation. The mutate field contains instructions for transforming your base file into an extraction-ready format.

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "rotate": 90
        }
    }
}

For example, this command would upload a file from your system and rotate the entire document 90º clockwise:

curl -v -X POST https://app.impira.com/o/<YOUR_ORG_NAME>/api/v2/files \
  --header 'X-Access-Token: <ACCESS_TOKEN>' \
  --form 'file=@<FILE_NAME>' \
  --form 'data={"File":{"name":"test_api.pdf","mutate":{"rotate":90}}}'

Page syntax

Note: Pages in Impira are zero-indexed, meaning that “pages:” 0 refers to the first page of a document. Read more in the What’s zero-indexing? section below.

Our simple page syntax allows easy specification of a single page, a range of pages, a list of pages, and more:

Specifying:Example:
A single page (e.g., the first page)0
A list of pages0,2,4,7,8
A page range2:5

Syntax combinations

You can specify what pages you want in a relatively simple manner using a combination of ranges and individual numbers:

Negative indices

Negative indices help specify pages of a document of unknown length.

End index omission

An end index omission can be used to call all pages that occur after a given page number.

Rotating pages

Note: Impira will automatically detect and correct the orientation of pages in your file upon upload. If you’d like to turn this off, let us know at support@impira.com.

The following command rotates all pages of a file 90º clockwise (all rotations specified are clockwise):

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "rotate": 90
        }
    }
}
Note: We’re only able to rotate a document by 90º intervals (90º, 180º, 270º) at this time.

You can also choose specific pages or ranges you’d like to rotate:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "rotate_segments": [
                {
                    "pages": "0",
                    "degrees": 90
                },
                {
                    "pages": "2",
                    "degrees": 180
                }
            ]
        }
    }
}

The example above rotates the first page of the file by 90º clockwise, and the third page of the document by 180º (pages are zero-indexed in Impira). All other pages would be unaltered. The pages field supports individual pages, lists, and ranges.

Refer to the Page syntax section above for more information.

Removing pages

You can also remove specific pages from a multipage file:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "remove_pages": "3,5,6"
        }
    }
}

This example removes pages four, six, and seven from the original document.

Note: The remove_pages command should not be used in combination with the split_segments command. Instead, simply omit the pages you wish to remove from the split_segments field.

Splitting pages

You’re also able to use the mutate command to split a multipage document into smaller sections. The following syntax would split the document into individual pages:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "split": "page"
        }
    }
}
Note: The split command will remove the original document when complete, only leaving the fully processed split files.

For more complex document splitting operations, you can specify the shape of the desired result documents in terms of pages from the original document:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "split_segments": ["0,1", "2", "3,4", "5"]
        }
    }
}

The example above splits a six-page original document into four separate files with the following page distribution (page numbers are zero-indexed):

You can omit page numbers from split_segments if you don’t need certain information from the original document, as well as repeat page numbers if a single page (e.g., a header or title page) is shared between the resulting split files. You can also utilize the ranges specified in the Page syntax section below to capture ranges of pages.

Note: We are unable to support remove_pages operations if split_segments is also specified.

The API will return a upload_uids field containing a list of UIDs (unique identifiers) for all files inserted. If split or split_segments are specified, you can find the UIDs of the resulting documents based on this upload_uid.

Beta: Splitting pages based on page content

This feature is in beta and subject to API changes. Please reach out to support@impira.com if you're using this feature so we can keep you posted about any updates.

In many cases, it's impossible to know the exact page numbers to split documents on without inspecting the documents themselves. Impira supports splitting pages based on their text content. You simply write a regular expression which matches the first page of each document type.

The following syntax splits off a new document each time it encounters the regular expression "Page 1":

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "split_exprs": {
              "new_doc": "regexmatch(fulltext, 'Page\s+1')" 
            }
        }
    }
}

Each key in split_exprs represents a different document type. These names are arbitrary and just used as a convention.

For example, if you have a submission packet which contains a form and multiple loss run documents, you could write a more complex expression like:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "split_exprs": {
              "acord_125": "regexmatch(fulltext, 'ACORD\s+125') and regexmatch(fulltext, 'Page\s+1')",
              "acme_loss_run": "regexmatch(fulltext, 'ACME Loss Run')",
              "bank_stmt": "regexmatch(fulltext, 'Your monthly bank statement')"
            }
        }
    }
}

Each expression can access the full range of IQL syntax in terms of the full_text field (which is the full text of the page). If a page is true for any expression, then it's considered a new document. In the event of a tie, Impira will pick the first alphabetic match in terms of the key. If no expressions match, then the page is considered the next page of the previous document.

This feature is under active development, so please reach out to us with any feedback or suggestions you have about the syntax or functionality at support@impira.com.

Retrieving split results

Once your original document has been split, you may have multiple resulting files within Impira. To fetch the UIDs of your results, you can query the files entity class using the upload_uid field. For files that weren’t split, you’ll receive a result with the same value for both uid and upload_uid.

Use this query for the example results of a split operation:

{
  "upload_uids": [
    "c60f4011886c34a0",
    "e59d7e9f912bc3fd"
  ]
}

You can query against the files entity class to retrieve your final results:

@`files`[uid, File.upload_uid, File.MutationPending] in(`File`.`upload_uid`, "c60f4011886c34a0", "e59d7e9f912bc3fd")

If the MutationPending field is true, then the split operation has not yet completed.

[
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "0aff1d6332b3266f"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "4e03c48128364ad1"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "c60f4011886c34a0",
    "uid": "5b7b973be2e0208c"
  },
  {
    "File.MutationPending": false,
    "File.upload_uid": "e59d7e9f912bc3fd",
    "uid": "e59d7e9f912bc3fd"
  }
]

Combining modifications

This is for all you go-getters out there. You can apply multiple modifications simultaneously to your API upload.

Example 1: Combining page removal and page splitting

This example removes the last three pages of the document and splits the remaining pages into single page files:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "remove_pages": "-3:",
            "split": "page"
        }
    }
}

Example 2: Combining page removal, rotating, and splitting

This example removes page six and onwards, rotates all pages 90º, and splits the document into single pages. The result is five single-page documents corresponding to the first five pages of the original document, with each page being rotated 90º:

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "remove_pages": "5:",
            "rotate": 90,
            "split": "page"
        }
    }
}

Example 3: Combining page rotation and page splitting

This last example rotates pages zero, two, and four by 180º while splitting the document into four separate files with the following page distribution (page numbers are zero-indexed):

{
    "File": {
        "name": "<FILE_NAME>",
        "mutate": {
            "split_segments": ["0,1", "2", "3,4", "5"],
            "rotate_segments": [
                {
                    "pages": "0,2,4",
                    "degrees": 180
                }
            ]
        }
    }
}

More info: What’s zero-indexing?

Zero-indexing (or “zero-based array indexing”) numbers items of a sequence in a way where the first item has an index of 0. This means the first item in a sequence is the 0th item and the second item is the 1st item.

If you’re given a 5-page document file, the first page may be called “1” and your second page is called “2.” With zero indexing, this first page is called “0,” the second page is then called “1,” and so on.

The difference between one-indexing and zero-indexing can be seen in the different floor number schemes for buildings in North America and Europe. Exceptions aside, North American buildings often call the first story “Floor 1” and European buildings often refer to their first story as their ground floor or “Floor 0.”

Using zero-indexing for the File Modification API allows consistency between other array access patterns, such as our query patterns in IQL and array access in other programming languages. Zero-indexing also has a lot of mathematically favorable properties, such as the length of the result being the difference between the start and end index.

Creating and modifying Collections through the API

Create Collections and fields through the API and SDK.

Creating a Collection

To create a Collection, you simply hit the /collection/ endpoint with an empty data object.

Example

Importing fields from another Collection

When you create a new Collection, it's often useful to import fields from another Collection so you don't need to create them one by one. For example, you may want each of your different invoice formats to be in a separate Collection for better training, but you'd like the output to have the same fields.

Steps:

  1. Create Collection One and add your fields (if you haven't already done so).
  2. Create Collection Two and import your fields from Collection One.

In these examples SRC refers to the source Collection with the fields you want to import, and DST refers to the destination Collection that you want to add fields to.

Tip: To import fields directly through the API, you need to use the Collection's entity name — i.e., file_collections::<collection_id>. For example, if your Collection's id is e99d6bea7ace8b07, then the entity name is file_collections::e99d6bea7ace8b07. This isn't necessary if you're using the Python SDK.

Example


Using Impira Query Language (IQL) for advanced queries

Impira Query Language (IQL) makes it really easy for you to interactively search, filter, and transform the semi-structured data trapped in your files. IQL is as easy to use as a search bar, generally with a single line of text, yet provide powerful features that would otherwise require writing a complex query using something like SQL.

IQL can be used for advanced queries within the Impira UI, to power a custom field on a Collection, or to query data over the API.

Design principles

IQL is designed specifically for building business applications on metadata-rich information. In fact, all of your interactions in Impira are powered by IQL queries — from uploading new files and creating collections to adding machine learning fields to a collection to extract information.

We wanted IQL to be approachable, designed to fit in a single line, and to be written incrementally. You can start with a search query, refine the results, then quickly add a few instructions to start pivoting and aggregating your data.

Switching to IQL mode

Once you’ve created a Collection and extracted some information, you will need to change your search mode to IQL. You can do this by clicking on the menu button on the right side of the search bar, selecting IQL, and clicking Save. For the rest of these examples we will be querying a Collection of invoices, but you can use a Collection of your own files, too.

IQL concepts

Querying

The first part of a query is the entity class you want to query from. There are a number of entity classes available to query, but the two most common are @files and @file_collections. Each of your Collections within Impira are assigned a unique identifier, and after changing your search mode to use IQL, you’ll see the name of the entity class in the search bar.

Impira will pre-fill the entity class of the collection you’re currently working on, and it’ll prefix it with an @ symbol to distinguish it from the other parts of the query. If you’re working with IQL via the API, you’ll need to include the entity class and the @ symbol in your query.

Running the query with just the entity class will return all of the fields for each row in the collection. Impira visualizes this in a table, but there’s a lot more information available for each file and field.

You can see this by switching to the JSON view.

Go ahead and switch to the JSON view and see how rich the data is behind the scenes. Switch between the table view and the JSON view at any time by returning to the view menu.

For fields with nested values, you can use dot notation (.) to navigate the nested path. For a field name that contains spaces, like Invoice No, you can write the name between backticks (`).

Tip: Backticks — the apostrophe lookalike

It’s important to place backticks around a file or field name to distinguish it from the rest of an IQL query (see the example below with field names Invoice No and Due Date in backticks).

Backticks look a lot like apostrophes (ʼ) or single quotation marks (‘), so make sure you pick the right symbol on your keyboard as you write IQL queries.

Image credits: Apple and Keychron

Projection

The second part of a query is the projection list. This is used to specify which fields or expressions you want to return, allowing you to selectively access the fields specific to your needs. The projection list is specified between square brackets, [ and ], and each expression within the list is separated by a comma.

Let’s query our entity class for the Invoice No and the Due Date:

;[`Invoice No`, `Due Date`]

‍Notice that this returns the complete JSON structure for only the fields specified in the projection. You can be more specific about obtaining just the extracted information from this result by using the dot notation to navigate through the result. Let’s query for the extracted value for Invoice No and Due Date: js [`Invoice No`.Label.Value, `Due Date`.Label.Value] You’ll see your result change to a much simpler array of objects with only two properties per row in your collection.

Aliasing

Items in the projection list can be given a new name, also known as aliasing. You can do this by prefacing a field name with a name of your own, separating your new name and the field name with a colon (:). This makes reading the results simpler, and also provides a powerful way to rename fields to meet the needs of downstream applications that consume data via Impira APIs.

Go ahead and change your projection list to include new names for your fields.

[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value]

Switching back to the table view you’ll see that the new JSON result from your query is also visualized nicely with a header row that matches your aliased field names, and each row with the data you projected as part of the query. Switch back and forth between each view to get a different perspective on your result.

Filters, sorting, limits, and offsets

The remainder of the IQL query, after the entity class and (optional) projection, is the filter. The filter is flexible and designed to allow you to write simple text queries to more specific queries over fields or expressions.

Let’s take a look at some examples that build on the Invoice No and Due Date queries above.

Query for all invoice numbers with a due date greater than Jan 1st 2020:

[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] due>2020-01-01

Query for a specific invoice number:

[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] inv_no:451

Query for all invoice numbers with a due date in 2019 (notice in this example that the due date is filtered on both “less than” and “greater than” a specific date):

[inv_no:`Invoice No`.Label.Value, due:`Due Date`.Label.Value] due>2019-01-01 due<2020-01-01

Query for the 10 most recent invoices, including Invoice No and Total Due, sorted by the highest due date:

[inv_no:`Invoice No`.Label.Value, total:`Total Due`.Label.Value, due:`Due Date`.Label.Value] limit:10 highest:due

Query for the lowest 10 invoice totals:

[inv_no:`Invoice No`.Label.Value, total:`Total Due`.Label.Value, due:`Due Date`.Label.Value] limit:10 lowest:total

For larger results, you’ll often want to paginate the results into smaller chunks. You can do this using a combination of limit and offset filters. For example, here’s a filter to get 100 records, beginning from the 300th record:

limit:100 offset:300

Pivot and rollup

IQL also supports a concept called pivoting which is a neat way to group your results. Just like the above examples of limit, offset, highest, and lowest, you can set pivot to true. Let’s look at an example to see how you can easily group your data for more detailed analysis or to summarize your results.

Let’s assume you wanted to see all your invoices grouped by year. IQL has many built-in functions to transform and manipulate details, and for this example we’ll use the year() function to truncate the Due Date field to the year.

[year(`Due Date`), {`Invoice No`, `Total Due`}] pivot:true

Similarly, you can use other functions in your query to aggregate the results. With a small change to the query, we can calculate the total due for all invoices for each year. IQL has some additional conveniences too, and when an aggregate function like sum() is used, pivot is automatically set to true.

Combining what we’ve learned about filters and aliasing our projected fields, we can create a user friendly output that can be easily shared throughout your business.

[`Invoice year`: year(`Due Date`), `Total spend`: sum(`Total Due`)] highest:`Total spend`

In addition to pivoting, IQL supports a strong form of grouping called a rollup. A rollup will compute the value of each aggregate at each level of nesting in the query. Building on the previous example, you can expand this to include the month and see the total of all invoices by month and by year.

[`Invoice Year`: year(`Due
Date`), `Invoice Month`: month(`Due Date`), `Total spend`: sum(`Total Due`)] rollup:true

IQL functions catalog

IQL has a rich library of functions, which are inspired in name & functionality by Excel, Google Sheets, and SQL. We are constantly adding to this list. Please get in touch with us if there's a function that you'd like to use that's not on this list!

Type conversion functions

to_bool

to_bool(expr)

Casts expr to a boolean type.

Example query

returns each value `id` casted to a boolean.

to_number

to_number(expr)

Casts expr to a number type. Timestamps casted to numbers return the UNIX timestamp representation (seconds since the epoch).

Example query

returns each value `placeholder_timestamp_field` casted to a number (UNIX timestamp).

to_string

to_string(expr)

Casts expr to a string type

Example query

returns each value `id` casted to a string.

to_timestamp

to_timestamp(expr)

Casts expr to a timestamp type. A number cast to timestamp is assumed to be a UNIX timestamp.

Example query

returns each value `id` casted to a timestamp (interpreted as a UNIX timestamp).

String functions

concat

concat(arg1, arg2, arg3, ...)

Example query

constructs a string from the fields `id` and `product_sku`.

textjoin

textjoin(delimiter, string_list)

Joins together the strings in string_list with the delimiter.

Example query

returns the array ["a","b","c"] joined together by " ,".

upper

upper(text)

Convert text to upper case.

Example query

returns 'The quick brown fox' uppercased.

lower

lower(text)

Convert text to lower case.

Example query

returns 'The quick brown fox' lowercased.

left

left(text, [num_chars])

Extract the text from the left of the string. num_chars defaults to 1 if not provided.

Example query

takes the first 10 chars from the left of the input string and returns 'brown fox'.

right(text, [num_chars])

Extract the text from the right of the string. num_chars defaults to 1 if not provided.

Example query

takes the last 10 chars from the right of the input string and returns 'brown fox'.

mid

mid(text, start, num_chars)

Extracts the text starting from start characters (indexed by 1) up to num_chars characters.

Example query

starts 5 chars from the start (q), reads 11 chars, and returns 'brown fox'.

strip

strip(text)

Removes leading and trailing whitespace from text. This is similar to the strip() function in Python.

Example query

removes leading/trailing whitespace and returns 'The quick brown fox'.

trim

trim(text)

Removes all spaces in text, leaving just a single space between words. Like strip, trim removes leading/trailing whitespace, but also cleans up the spaces within the string.

Example query

returns 'The quick brown fox'.

proper

proper(text)

Convert text to title case, where all words begin with a capital letter.

Example query

converts the input string to title case and returns The Quick Brown Fox.

sentence

sentence(text)

Convert text to sentence case, where most words are lowercased and the first word in a sentence is capitalized.

Example query

converts the input string to sentence case and returns The quick brown fox.

split

split(text, separator)

Split the input text using the specified separator and returns an array of the parts.

Example query

splits `placeholder_string_field` using a hyphen - as a separator.

stem

split(text)

Takes the input text and returns a stemmed version of the words.

Example query

takes the word disappeared and returns the root stem of the word, disappear.

clean_stopwords

clean_stopwords(text)

Remove stopwords (commonly used words, for example: I, we, you, me, them, their, the) that are filtered from the input text. This can be useful when searching across OCR keywords.

Example query

filters out the stopwords from the input string and returns quick brown fox lazy brown dog.

Conditional functions

coalesce

coalesce(arg1, arg2, arg3, ...)

Returns the first non-null argument. All arguments must be either scalar or objects with the same spec. Coalesce is useful for combining fields where some values may be null.

Example query

returns `id` if it's non-NULL, otherwise `product_sku`.

if

if(condition1, value1, [condition2, value2, ...], elseValue)

Returns first value for which condition == true. If no conditions are true, returns elseValue.

Example query

returns `id` if it is positive, otherwise 0.

_and

_and(x, y)

AND's two values together. This is equivalent to x and y. Evaluates to null if any of the values are null.

Example query

returns the boolean expression (`placeholder_bool_field` and true).

_or

_or(x,y)

OR's two values together. nulls evaluate to false.

Example query

returns the boolean expression (`placeholder_bool_field` or true).

eq

eq(x,y)

Compares x with y and returns a BOOLEAN as a result. This function works with all input types (STRING, NUMBER, TIMESTAMP, BOOLEAN, ENTITY).

Example query

compares (`id`=1).

in

in(val, x, y, ...)

Compares val with the remaining values (x, y, and so on) and returns a BOOLEAN if val equals any of them. This is semantically the same as (val=x) or (val=y) .... This function works with all input types (STRING, NUMBER, TIMESTAMP, BOOLEAN, ENTITY).

Example query

returns true if (`id` equals 1, 2, or 3.

in_list

in_list(val, list)

Compares val with the values in list and returns a BOOLEAN if val equals any of them. This is semantically the same as in(val, index(list, 0), index(list, 1), ...) This function works with all input types (STRING, NUMBER, TIMESTAMP, BOOLEAN, ENTITY).

Example query

returns true if (`id` equals 1, 2, or 3.

not

not(expr)

Returns a boolean for the opposite of the input expression when the expression evaluates as a boolean. If a binary number 0 or 1 is entered as the expression they are treated as boolean values.

Example query

returns true if (`id` does not equal 1.

Date and time functions

datepart

datepart(interval, timestamp)

Returns the part of the timestamp specified by interval. Supported values for interval are:

yyyy | Year
q  | Quarter
m | Month
y | Day of year
d | Day
w | Weekday
ww | Week
h | Hour
n | Minute
s | Second

Example query

returns the year part of `placeholder_timestamp_field`.

year

year(timestamp)

Returns the timestamp truncated to the year.

Example query

returns the year part of `placeholder_timestamp_field`.

month

month(timestamp)

Returns the timestamp truncated to the month.

Example query

returns the month part of `placeholder_timestamp_field`.

week

week(timestamp)

Returns the timestamp truncated to the Monday of the week

Example query

returns the year part of `placeholder_timestamp_field`.

weekday

weekday(timestamp, [return_type])

Returns the day of the week of timestamp as a number. By default, Sunday is 1 and the values range from 1-7. However, you can specify the optional return_type argument to change this behavior. IQL mimics Excel's specification which you can find here.

Example query

returns the day of week from `placeholder_timestamp_field`.

day

day(timestamp)

Returns the timestamp truncated to the day.

Example query

returns the day part of `placeholder_timestamp_field`.

hour

hour(timestamp)

Returns the timestamp truncated to the hour.

Example query

returns the hour part of `placeholder_timestamp_field`.

minute

minute(timestamp)

Returns the timestamp truncated to the minute.

Example query

returns the minute part of `placeholder_timestamp_field`.

second

second(timestamp)

Returns the timestamp truncated to the second.

Example query

returns the second part of `placeholder_timestamp_field`.

add_interval

add_interval(timestamp, amount, timestamp_part)

Add the number specified by amount to the timestamp_part of the input timestamp. timestamp_part can be one of year, month, day, hour, minute, second.

Example query

increments the month part of `placeholder_timestamp_field` by 1.

sub_interval

sub_interval(timestamp, amount, timestamp_part)

Subtracts the number specified by amount to the timestamp_part of the input timestamp. timestamp_part can be one of year, month, day, hour, minute, second.

Example query

decrements the month part of `placeholder_timestamp_field` by 1.

sub_interval

datedif(timestamp1, timestamp2, part)

Compares the difference of the specified timestamp part between the two timestamp's. part can be one of year, month, day, hour, minute, second.

Example query

compares how many day's difference there are between `placeholder_timestamp_field` and now().

text

text(value, format_text)

Returns the string with the date in value formatted according to format_text.

Example query

returns the date `placeholder_timestamp_field` formatted according to mm/dd/yy hh:mm:ss.

Array and entity functions

array_to_object

array_to_object(field)

Given a field of an object that is part of a list, array_to_object will construct a new object where the keys are the values of field in the original objects in the list, and the values are the objects themselves.

Example query

This hypothetical query would turn array_field into an object keyed by f. For instance, if array_field = [{"f":"a","value":5},{"f":"b","value":6}] then array_to_object(array_field.f) = {"a":{"f":"a","value":5},"b":{"f":"b","value":6}}.

array_window

array_window(array, lag, lead)

For each value in the array returns the number of lagging and leading values specified by lag and lead (the values before and after each value in the array).

Example query

will return an array with the 2 values before and after each item in build_array(1,2,3,4,5,6,7,8,9,10).

build_array

build_array(arg1, arg2, arg3, ...)

Constructs an array with the provided arguments. The arguments must be either all entities or all non-entities. build_array() returns an empty array

Example query

constructs the array [`id`, 2, 3].

build_entity

build_entity(field1, value1, field2, value2, ...)

Constructs an object with the arguments provided. Arguments should be provided in pairs, where the first argument in the pair is the field name, and the second is the value.

Example query

constructs the object {"foo":true}.

concat_array

concat_array(array1, array2, array3, ...)

Concatenates and flattens arrays into a single array.

Example query

returns the array [1,2,3,4,5,6].

index

index(array, idx)

Returns the value at the idx position (zero-indexed) of array. If idx is out of bounds, the function returns null.

Example query

returns the first element of the array ("apple").

array length

len(array)

len takes an array as an argument and returns the length of that array.

Example query

evaluates to 3.

merge

merge(obj1, obj2, obj3, ...)

Merges the objects together into a single object. merge ignores fields in objects with null values.

Example query

merges objects {"foo":1} and {"bar":2} into {"foo":1,"bar":2}.

merge_many

merge_many(objArray1, objArray2, objArray3, ...)

Merges and flattens lists of objects together into a single object. merge_many ignores empty arrays.

Example query

merges each object in arrays [{"foo":1}] and [{"bar":2}] into [{"foo":1,"bar":2}].

merge_unnest

merge_unnest(obj, array)

This function is used to expand an array into a set of rows, and merge each row with the same obj.

Example query

un-nests the array of bar objects and merges each object with {foo: 1}. The result is an array of new objects that look like {foo: 1, bar: 2}.

slice

slice(array, start, end)

Returns the subset of the array from index start (zero-indexed, inclusive) to end (zero-indexed, exclusive). Negative start or end indices are relative to the end of array.

Example query

returns the first two elements of the array ("apple").

strip_nulls

strip_nulls(obj)

Recursively strips fields with null values from objects.

Example query

returns each `deploy` entity without any null keys or values.

Regular expression functions

regexextract

regexextract(str, reg)

Returns the left-most match of regex reg on string str. Returns null if there is no match. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXEXTRACT.

Example query

returns each value of `placeholder_string_field` that contains the string "impira" or null otherwise.

regexmatch

regexmatch(str, reg)

Returns true if str matches regex reg, otherwise it returns false. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXMATCH.

Example query

returns a boolean that indicates whether `placeholder_string_field` contains the string "impira".

regexreplace

regexreplace(str, reg, replacement)

Replaces matches of reg in str with replacement. IQL uses the RE2 syntax under the hood and strives to match the syntax of Google Sheets REGEXREPLACE.

Example query

returns the string with all matches of img[0-9]a replaced with img.

Mathematical functions

add

add(expr1, expr2)

Adds the values of expression expr1 plus expression expr2.  Expressions must evaluate to a NUMBER else add will return NULL.

Example query

adds `id` and 1.

sub

sub(expr1, expr2)

Subtracts the value of expression expr2 from the value of expression expr1. Expressions must evaluate to a NUMBER, otherwise sub will return NULL.

Example query

subtracts 1 from `id`.

mul

mul(expr1, expr2)

Multiplies the value of expression expr1 and expression expr2. Expressions must evaluate to a NUMBER, otherwise mul will return NULL.

Example query

multiplies `id` by 2.

div

div(expr1, expr2)

Divides the value of expression expr1 by expression expr2. Expressions must evaluate to a NUMBER, otherwise div will return NULL.

Example query

divides `id` by 2.

least

least(expr1, expr2, expr3, ...)

Returns the lowest value among the input expressions.

Example query

returns the lowest value among 1, 2, 3, 0 which is 0.

greatest

greatest(expr1, expr2, expr3, ...)

Returns the highest value among the input expressions.

Example query

returns the highest value among 1, 2, 3, 0 which is 3.

floor

floor(expr, significance) 

Rounds the value of the expression expr down, towards the nearest multiple of significance. significance is optional and is 1 by default.

ceiling

ceiling(expr, significance) 

Rounds the value of the expression expr up, towards the nearest multiple of significance. significance is optional and is 1 by default.

round

round(expr, precision)

Rounds expr to precision (optional) number of digits. If no precision is given, the function rounds to the nearest whole number.

Example query

returns `id` divided by 2 and rounded to the nearest whole number.

Aggregate functions

array_agg

array_agg(expr, [sort_expr])

An aggregate function used to return an array from values evaluated by the input expression expr. An optional second param sort_expr can be specified to sort the results.

count

count(expr)

Aggregates by counting the number of distinct values for the field. This is the same behavior as COUNT(DISTINCT ...) in SQL. If no argument is passed, count() counts the number of entities. This is the same behavior as COUNT(*) in SQL.

Example query

returns the number of unique values of `id`.

count_non_null

count_non_null(expr)

count_non_null(expr) will count the number of entities where expr is not null. This is the same behavior as COUNT(expr) in SQL.

distinct

distinct(expr)

Returns all distinct values for expr as a list

Example query

returns the unique values of `id`.

min

min(expr)

Returns the minimum value for the specified expr.

max

max(expr)

Returns the maximum value for the specified expr.

mean

mean(expr)

Returns the mean value for the specified expr. This is a synonym for avg.

median

median(expr)

Returns the median value for the specified expr.

merge_agg

merge_agg(expr, [sort_expr])

Merge sub-fields of an entity expr into a single entity. If duplicate keys are encountered during the merge into a single entity, the optional argument sort_key can be used to keep the value for the key with the higher sort order.

Example query

returns the subfields of `deploy` merged into a single entity, grouped by `id`.

object_agg

object_agg(name, value)

Aggregates name and value pairs into a single entity. Values for name must be non-null.

Example query

returns an entity with values of `placeholder_string_field` as the field names and the values of `deploy` as the values, grouped by `id`.

sum

sum(expr)

Computes an aggregate sum over all values of expr. sum mirrors the SUM() function in SQL.

Example query

returns the total `id` across each unique value of `placeholder_string_field`.

single

single(expr)

Returns at most one non-null value of expr. This aggregate can be used when you know that you are pivoting on a unique value.

Example query

returns a single value of field for each value of uid. It will return an error if there are multiple records with the same value of uid.

Join functions

join_one

join_one(target_entity, source_field, target_field)

This function joins the calling entity to the target_entity where the source_field matches the target_field. This is a one-to-one relationship and functions like a LEFT JOIN in SQL.

Example query

This example assumes a Dataset called My Dataset which has a field called id and a Collection with an extracted field called Reference number. When called, this function will join the matching row from the Dataset to the matching row in the Collection where the Reference number equals the id in the Dataset.

join_many

join_many(target_entity, source_field, target_field)

This function joins the calling entity to the target_entity where the source_field matches the target_field. This is a one-to-many relationship and functions like a LEFT JOIN in SQL.

Example query

This example assumes a Dataset called My Dataset which has a field called id and a Collection with an extracted field called Reference number. When called, this function will join the matching rows from the Dataset to the matching row in the Collection where the Reference number equals the id in the Dataset and return an array.

Other general purpose functions

hash

hash(arg1, arg2, arg3, ...)

Hashes one or more arguments together. IQL uses the Go implementation of crc64.

Example query

returns the hash of `id` and `product_sku` for each record.

identity

identity()

Returns the identity key for the entity as specified in the spec file. If no identity key is specified, it returns the entity as a bracket surrounded string (i.e. "[{"foo":"bar"}]")

Example query

returns each `deploy` as an identity key.

__permissive_filter

__permissive_filter(filter)

Mimics full-text search behavior with flexible free-text and field-specific filters. Permissive filtering does not fail if the fields specified via field:value do not exist. It is intended to directly execute a query entered in a search bar.

Example query

returns records containing "sample text" and where `id` is greater than 0.

© 2023 Impira Inc. All rights reserved. This site is built with Motif.