June 30, 2022
Instant Collections are pre-trained on common documents and forms so you can get going faster by not having to add individual fields and train your Collection. Learn more.
Earlier this year, we introduced page rotation so you could fix any incorrectly oriented pages before you began extracting data, but now we’ll automatically detect those as soon as you upload your files so you can get going even faster. You can always turn this feature off by letting us know at email@example.com.
Splitting pages with an IQL expression via the API
Want to split the pages of your file into separate files, using an IQL expression? Now you can, via the API. For example, you could split your document after the word “Total” or “Page 1” appears on the page. Learn more.
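To make the splitting rule concrete, here is a minimal sketch in plain Python. It is not the Impira API and not IQL: the function name `split_after_keyword` and the list-of-page-texts input are assumptions made only for illustration. It shows what "split after the word Total appears on the page" means for a set of pages.

```python
from typing import List


def split_after_keyword(pages: List[str], keyword: str) -> List[List[int]]:
    """Group page indices into chunks, closing a chunk after each page that contains `keyword`.

    Hypothetical helper for illustration only; it does not call any Impira endpoint.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    for index, text in enumerate(pages):
        current.append(index)
        if keyword in text:
            # The keyword marks the last page of the current output file.
            chunks.append(current)
            current = []
    if current:
        # Trailing pages without the keyword form a final chunk.
        chunks.append(current)
    return chunks


pages = ["Invoice A ...", "... Total: $120", "Invoice B ... Total: $75", "Appendix"]
print(split_after_keyword(pages, "Total"))  # [[0, 1], [2], [3]]
```

Each inner list holds the page indices that would end up together in one new file.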
- You can now download up to 2,500 rows as a CSV file (up from 1,000 rows).
- You can now use the enumerate() builtin in the IQL playground.
- File view has been optimized to open much quicker for large files.
- If you correct a date or number field to include text, we’ll invite you to change the value type to text rather than just erroring.
- The upload box in file view now resizes to accommodate smaller browser windows.
- Fixed processing errors that sometimes occurred while uploading some non-standard PDFs.
- Labels won’t flicker on and off your document as you set up your first row for table extraction.
- Labels are no longer occasionally saved as “no value” during table extraction setup.
May 31, 2022
Upload files in file view
After extracting data fields on the first file of a Collection, you can conveniently upload more similar files straight into the Collection while you're still in file view.
- P.S. This'll work even if the upload box isn't visible. Start dragging files in and it'll appear.
SOC 2 Type 2 Certification
Data security has always been a priority at Impira, so we've leveled up with our recent SOC 2 Type 2 certification. Learn more.
- Hey there, IQL user! There's now a faster way to refer to all the words in a file. The old way was to write: textjoin(" ", File.text.words.word) but that seemed a tad long, so now you can just use:
- Dragging-and-dropping columns to reorder them is now just an overall nicer experience.
- Confirming entire rows: We made it easier to confirm entire rows and the first rows of predicted tables.
- Faster processing on document uploads.
- Picking dates: Setting date values is easier because you can now type dates yourself (in addition to choosing from a calendar).
- Improved handling of TIFF image files.
- Joins on custom fields weren't properly rendering all the time but do now.
- We thought it'd be nice to have a gentler message if you accidentally reuse an existing table column name.
April 26, 2022
Revamped Help Center
Our new Help Center has been rebuilt from the ground up to make it easier and faster for users to find what they need. Designed to be more readable and skimmable than ever, this new Help Center will be your new best friend. (Psst... you're actually in it right now.)
New “Add field” design
Our design team has taken a fresh look at how our users choose field types. This makes it easier for new users to quickly choose between single text, checkbox, or table values.
Import fields from another Collection
You can now import a set of extracted fields from one Collection and add them to a different one. Read more.
Create and modify Collections through API
Users can now create Collections and import fields from other Collections through the API and SDK. Read more.
Modifying files while uploading via API
Earlier this year, we launched our file modification toolkit (rotating, splitting, and removing pages), but we’ve taken it a step further to allow users to make these modifications while simultaneously uploading them through the API. This feature is currently in open beta. Read more.
Table extraction cues
Did you know that tables only refresh predictions when you confirm an entire row, and that you should always try to confirm the first few rows of each document from top to bottom? We thought that wasn’t super intuitive, so we now have new buttons for you to confirm entire rows from top to bottom. We automatically open the next predicted row with the Confirm row button ready to go.
Use Views to save and manage a frequently used IQL query instead of using a bunch of bookmarks with super long URLs. Read more.
A recent update to our Zapier integration allows users to read files from Views.
Free trial users now have a “pages remaining” counter in the sidebar to help keep track of free page consumption.
Rows with varied heights (table extraction)
Our table extraction feature has helped so many users extract data, but not all tables are made the same. We've developed table extraction to tackle rows of variable heights so you can extract whatever kind of table comes your way.
Tip: Confirm a few rows with different heights to train the model.
Rows with shared fields (table extraction)
You can also create a field that's shared by multiple nested rows.
Naming files during file splitting
When a document is split, new file names are now “padded with 0's” (e.g., if there are 10 new files, we’ll have file_10, etc.).
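As a rough illustration of zero-padded naming, the sketch below generates names whose numeric suffix is as wide as the largest index. The `file` prefix, the 1-based numbering, and the helper name `padded_names` are assumptions for this example, not a description of Impira's exact naming scheme.

```python
from typing import List


def padded_names(prefix: str, count: int) -> List[str]:
    """Return `count` names whose numeric suffix is zero-padded to a common width.

    Assumes 1-based numbering; the width is the number of digits in `count`.
    """
    width = len(str(count))
    return [f"{prefix}_{i:0{width}d}" for i in range(1, count + 1)]


names = padded_names("file", 10)
print(names[0], names[-1])     # file_01 file_10
print(sorted(names) == names)  # True
```

The practical benefit of padding is that an alphabetical sort of the file names matches their numeric order, which is not the case for unpadded names such as file_1, file_10, file_2.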
- You can now cancel your plan and change payment details through your billing page (found via the Account icon).
- Our user login page has been redesigned and is more ready than ever to launch you into accelerating your workflow.
- In financial docs, a dollar amount in parentheses indicates a negative number, so we now automatically recognize them as such.
- New experience for uploading files:
- A new set of notifications provides more detail and transparency on your files’ uploading status and location.
- Faster uploading of large batches of files.
- Machine learning performance improvements for large documents.
- The IQL playground now has a historical query mode.
- Fixed a bug where deleting multiple Smart Collections only deleted one.
- Fixed PDF scaling and cropping issues.
- Multiple machine learning bug fixes that improve accuracy for text and table extraction.
- Fixed an issue where date filters used the wrong start date.
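The accounting convention mentioned in the list above, where a dollar amount in parentheses is read as a negative number, can be sketched as follows. This is an illustrative parser written for this note, not Impira's recognition logic; the function name `parse_amount` and the accepted formats are assumptions.

```python
import re
from decimal import Decimal

# "(1,234.56)" or "($1,234.56)": parentheses mark a negative amount.
_PAREN_AMOUNT = re.compile(r"^\(\s*\$?\s*(\d[\d,]*(?:\.\d+)?)\s*\)$")
# "1,234.56", "$1,234.56" or "-1,234.56": plain amount with an optional minus sign.
_PLAIN_AMOUNT = re.compile(r"^(-?)\s*\$?\s*(\d[\d,]*(?:\.\d+)?)$")


def parse_amount(text: str) -> Decimal:
    """Parse a dollar amount, treating a parenthesized value as negative."""
    text = text.strip()
    match = _PAREN_AMOUNT.match(text)
    if match:
        return -Decimal(match.group(1).replace(",", ""))
    match = _PLAIN_AMOUNT.match(text)
    if match:
        sign, digits = match.groups()
        value = Decimal(digits.replace(",", ""))
        return -value if sign else value
    raise ValueError(f"not a recognizable amount: {text!r}")


print(parse_amount("($1,234.56)"))  # -1234.56
print(parse_amount("$75.00"))       # 75.00
```

`Decimal` is used instead of `float` so that currency values keep their exact decimal representation.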
January 3, 2022
Table extraction for all
Extracting data from tables is now open to all users. Sign in to your account and start extracting table data. Read more.
Modifying files within Impira
Users can now modify their files before beginning data extraction. Rotate pages, remove pages, or split pages of a file into separate new files. Learn how.
- Improved and more intuitive navigation for users
- Improvements and bug fixes to table extraction feature
- Improved document view for dense documents, table view for collections with many files, and file processing and downloading
- Improvements with IQL: Join performance and added support for join keys with multiple fields
- Bug fixes for CSV and API
September 29, 2021
Impira x Snowflake integration
Our latest integration will allow joint users to use Impira to extract data from unstructured data files in their Snowflake warehouses and stream it back into Snowflake for further analysis. Learn more.
New billing process and pricing structure
Ready to upgrade? Get started with our new billing process by signing into Impira, clicking the Account icon in the top right of your window, then going to the Billing tab. Visit our pricing page to learn more about our new pricing structure.
Table extraction: Beta
Our beta program for extracting data from tables is still underway and continually improving. We’d love for you to join our table extraction beta program, put it through its paces, and give us any feedback. Contact firstname.lastname@example.org to access this new feature.
- UI: Added tooltips, improved labelling and training experience
- IQL: Significantly reduced training and evaluation time for table extraction
- Created a new compact table view to make more data visibly accessible
- The “in progress” spinners that indicate the model is training weren’t working in some cases
- Users can now confirm “no value” predictions
- Certain valid values weren’t being properly accepted for numeric and date types
- Fixed an issue where values copied from some cells to the clipboard had double quotation marks added
August 23, 2021
Table extraction: Beta
You asked and we’ve answered: Currently in beta, our new table extraction capability allows you to pull data from tables within documents like purchase orders and invoices. We know that tables are rarely straightforward and don’t always follow convention. This feature allows users to extract standard grid-like tables as well as more complex tables that may not conform to a rigid grid format.
We’d love for you to join our table extraction beta program, put it through its paces, and give us any feedback. Contact email@example.com to access this new feature.
Redesign: The “Add field” feature
The new “Add field” feature in our UI makes it easier for users to understand the order of steps needed to add new fields for data extraction. The redesign displays important information that allows the process to be more intuitive. This feature will be rolled out throughout the month of August.
Dynamic bounding box
The bounding boxes (used to select values to extract) will dynamically expand or contract to correspond with what a user is actively typing.
Want to see what the field you just extracted looks like on other files? When you’re in file view, check out the lower right corner to see a preview of what Impira’s doing in real time.
Text preprocessing improvements
We’ve updated the method of preprocessing incoming files to improve the quality of the extracted text. These changes include using a different parser for digital PDFs, improving the quality of images we use to run optical character recognition (OCR), ignoring embedded PDF text with font issues, and many more. All of these changes contribute to higher quality text extraction for your documents.
We’ve been hard at work improving the performance and speed of Impira across the board. Some notable examples: Confirming and editing the values for machine learning (ML) predictions is now 10x faster, and file uploading speeds are more than 2x faster than before.
Lots of new entities
In our last release notes, we announced that we have started using Named Entity Recognition (NER) to improve the accuracy of the text extraction models. Since then, we’ve dramatically expanded the set of entities we detect, including addresses (and their constituent parts), email addresses, phone numbers, currency, and more. The ML models can use this semantic information to improve the accuracy of your extractions.
June 27, 2021
Reviewing Impira’s data predictions on your files is easier (and more fun) with the new Review workflow. Fly through the predictions that need review and actively improve Impira’s machine learning models while you’re at it through just a few clicks.
Zapier integration: Beta
The Impira x Zapier integration is now in public beta. Impira users can now use Zapier to connect Impira to thousands of other apps.
Experimental features: IQL Playground
Check out our new Experimental features button, where you can access the IQL Playground. The IQL Playground allows you to make instant IQL queries and view your data in any shape or form.
Named Entity Recognition (NER)
Impira now runs Named Entity Recognition (NER) on each new file, which identifies semantically important groups of text such as addresses, currencies, email addresses, and more. Impira’s machine learning models can then use this information to learn what types of text you are trying to extract, leading to predictions with much higher accuracy.
- More detailed API error messages.
- Improved product speed and responsiveness. These speed-ups are most prominent in file upload and processing, creating and editing fields, and working with many collections.
May 6, 2021
Impira supports running complex queries against collections and datasets using Impira Query Language (IQL). You can now use /poll to just receive the changes since the last time you ran a query. This allows you to run a continuous workload, where you receive updates as they happen. Explore our Read API documentation.
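The general pattern behind "receive only the changes since the last query" can be sketched as a loop that carries a position marker between calls. Everything below is hypothetical: the cursor, the `fetch` callable, and the in-memory change log stand in for a real client, and none of it reflects the actual request or response format of /poll, which is covered in the Read API documentation.

```python
from typing import Callable, Dict, List, Optional, Tuple

Change = Dict[str, object]
# Hypothetical fetcher: takes the previous cursor, returns (new changes, next cursor).
FetchFn = Callable[[Optional[str]], Tuple[List[Change], str]]


def run_polling(
    fetch: FetchFn,
    handle: Callable[[Change], None],
    rounds: int,
    cursor: Optional[str] = None,
) -> Optional[str]:
    """Poll `rounds` times, passing each new change to `handle`; return the last cursor."""
    for _ in range(rounds):
        changes, cursor = fetch(cursor)
        for change in changes:
            handle(change)
    return cursor


def make_fake_fetch(log: List[Change]) -> FetchFn:
    """In-memory stand-in for a server: the cursor is an offset into `log`."""
    def fetch(cursor: Optional[str]) -> Tuple[List[Change], str]:
        start = int(cursor) if cursor is not None else 0
        return log[start:], str(len(log))
    return fetch


log: List[Change] = [{"id": 1}, {"id": 2}]
seen: List[Change] = []
fetch = make_fake_fetch(log)

cursor = run_polling(fetch, seen.append, rounds=1)                 # first call sees everything
log.append({"id": 3})
cursor = run_polling(fetch, seen.append, rounds=1, cursor=cursor)  # second call sees only id 3
print([change["id"] for change in seen])  # [1, 2, 3]
```

Keeping the returned cursor and passing it to the next call is what prevents already-seen changes from being delivered again.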
Impira supports configuring webhooks to subscribe to changes in Collections and datasets. This is part of the new Collection automations feature, which supports continuously ingesting files and exporting data from Impira. Stay tuned as we’ll be shipping more and more of these features in the months to follow.
New computed fields for confidence
Easily distinguish between files that may need a bit more review and files with fully confident data. With newly introduced computed fields, you can quickly inspect the confidence values and states across all extracted fields for each file in a collection. The three new boolean fields are:
- File.IsPreprocessed tells you whether a file has completed processing (e.g., uploading, analyzing text, producing a thumbnail, etc).
- __system.IsProcessed tells you whether a record in a collection has completed processing across all of its ML fields.
- __system.IsConfident tells you whether all machine learning values for a record are high confidence.
You can read more about these fields in the ML confidence guide.
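One way these boolean fields might be used once query results are in hand is to pick out the records that have finished processing but are not fully confident. The sketch below assumes, purely for illustration, that each record arrives as a flat dictionary keyed by the field names above; the record shape, the `name` key, and the helper `needs_review` are hypothetical.

```python
from typing import Dict, List

Record = Dict[str, object]


def needs_review(records: List[Record]) -> List[Record]:
    """Return records that are fully processed but not fully confident.

    Records still being processed are skipped, since their confidence is not final.
    """
    return [
        record
        for record in records
        if record.get("__system.IsProcessed") and not record.get("__system.IsConfident")
    ]


records: List[Record] = [
    {"name": "a.pdf", "__system.IsProcessed": True, "__system.IsConfident": True},
    {"name": "b.pdf", "__system.IsProcessed": True, "__system.IsConfident": False},
    {"name": "c.pdf", "__system.IsProcessed": False, "__system.IsConfident": False},
]
print([record["name"] for record in needs_review(records)])  # ['b.pdf']
```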
New text parsing improvements
Impira has significantly upgraded its process of extracting text from documents and images. This includes improved preprocessing for images and documents for optical character recognition (OCR) which will yield more accurate and complete extracted text, and more complete extraction of text embedded in PDFs.
April 7, 2021
You can now upload files directly through Impira’s upload API, which supports both URL-based upload and direct upload over HTTP. We’ve also made several overall enhancements to the API, including simpler paths to access both collections and datasets, a simpler request/response format, and new documentation. Read more about the API here.
Upload to Impira via email
You can now email attachments directly into Impira by either forwarding files straight from your inbox or adding Impira’s forwarding email addresses to a mailing list so files upload as they arrive. Read more about email integration.
New text extraction
You now have access to a new version of the text extraction algorithm which is both more accurate and learns faster from each new data point. This new algorithm is the default for all users. Learn more about how to use text extraction.
Extract checkboxes, radio buttons, and other binary indicators just as you can for numbers, dates, and other text. Find out more about checkbox extraction.
New onboarding experience
During onboarding, users will see Impira automatically detect important fields for extraction as they upload files, making it easier and faster to get started. Users will also be able to use their own files rather than rely on provided samples.