How to extract data from tables
What can I use table extraction for?
Impira's powerful and flexible table extraction feature lets you pull data out of tables within your documents, as well as data from recurring lists and repeated values, and place it in a clean, spreadsheet-like format.
Adaptable to unique layouts
Tables come in all sorts of shapes, sizes, and formats. Group files with similar table layouts into Collections, and use the following steps to teach Impira what your tables look like. Impira will actively learn and begin to extract the same data you need out of the rest of your files.
Here are some examples of tables that Impira can extract:
How to extract table data
In many cases, Impira will automatically detect tables on the first page of your file. You can click this table suggestion (highlighted in purple) and go through an expedited process of creating a table field, or you can dismiss the suggestion and create a field manually.
Setting up an autodetected table
Step 1: Click the table
Open your file and click the big purple box to get started.
Step 2: Set up Row 1
- Starting from the left, check that each autodetected column name and value for the first row is correct.
- After inspecting each value and column name, and adjusting the blue box or value box if necessary, click Next column.
- Hit Finish setup after you’ve checked all the values in the first row.
Step 3: Review predictions
- When Impira finishes extracting the rest of the table rows in this file, go down the queue and check the predictions, starting with the very next row, Row 2.
- If necessary, correct any inaccurate predictions by adjusting the blue box or typing the correct value into the value box.
- Click Confirm entire row when you’re done. This triggers Impira to take your new input into account to create better predictions.
Step 4: Upload more files
Exit Extraction view and upload more files with similar tables into this Collection. Impira will immediately start extracting table data from those files based on the table you just set up.
Step 5 (optional): Rename your table
We automatically give your table a name, but feel free to change it to anything you want.
Manual data extraction
Start with Row 1
Once you've uploaded your file and placed it into a Collection, open a file to enter Extraction view.
- Go to Add field in the top right corner, select Table, and give your table a name.
- A blue box will appear on your document. Adjust the box to capture your first row's values, minus any header or column names.
- One by one, name your column headers, highlight the corresponding value, then click Add value.
- Choose Done after you've added your last value for Row 1.
✨ Behind the magic: What's happening?
As you start labeling your first row values, Impira is already using your first few labels to learn what you're looking for and searching for those same values in other rows. The more column headers you label, the clearer a picture Impira gets of the table you're trying to extract.
You'll see Impira's predictions for other rows show up on your file in just a few moments. If you don't see any show up, you may have to label a few more fields in a new row.
Impira continues to learn as you review predictions and will automatically apply your input and reprocess previous predictions to make them better. Each input fine-tunes your machine learning model so that it performs better for future files you upload into this Collection.
Improve results by confirming entire rows
By the time you finish labeling Row 1, Impira has already gone through the rest of your rows and files to extract matching data. You'll see Impira's predictions in just a few moments.
All the rows will be displayed in a list and will feature a red, green, or black marker. These indicate Impira's machine learning confidence level for that row.
Let's go through and check Impira's work.
- Choose any row and take a look to see if all the values are accurate.
- If they need adjustment, click the value and either highlight the correct value on your document or type in the value yourself, then click Confirm entire row.
- Continue this process for subsequent rows until Impira grows more confident and turns red markers green.
- When you go to your next document in the Collection, be sure to confirm Row 1 first. If Row 1 has been incorrectly identified, edit the values to reflect the correct first row. You won't be able to confirm any other rows before doing this.
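The review order described above can be sketched in a few lines of Python. This is only an illustration of the workflow, not part of Impira: the `rows` list, the `marker` key, and the `review_queue` function are all hypothetical names, and the sketch assumes that a red marker means a row still needs review (the meaning of black markers is not modeled here).

```python
# Hypothetical snapshot of one file's rows and their marker colors.
rows = [
    {"row": 1, "marker": "green"},
    {"row": 2, "marker": "red"},
    {"row": 3, "marker": "green"},
    {"row": 4, "marker": "red"},
]

def review_queue(rows, row1_confirmed):
    """Return row numbers to review, in order.

    Row 1 goes first if it has not been confirmed yet, because no other
    row can be confirmed before it. Red-marked rows follow, top to bottom.
    """
    queue = []
    if not row1_confirmed:
        queue.append(1)
    for r in sorted(rows, key=lambda r: r["row"]):
        if r["marker"] == "red" and r["row"] not in queue:
            queue.append(r["row"])
    return queue

print(review_queue(rows, row1_confirmed=False))
print(review_queue(rows, row1_confirmed=True))
```

With the sample data, an unconfirmed Row 1 is reviewed before the red-marked Rows 2 and 4, even though Row 1 itself is green.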
Modifying your table
If you need to insert missing rows or remove extra ones, hover over the three dots icon in any row and choose Delete Row, Insert row above, or Insert row below.
It's best to make any row corrections in order from top to bottom.
Inserting missing rows
In the example below, Row 3 has been skipped, so we'll be inserting a new row below Row 2.
Renaming table columns
- From Collection View, click the three dots by the column name you want to edit.
- Make your edits and click Save.
Rows with varied heights
Many users come across tables whose rows vary in height. This happens when certain values contain more information than others. Fortunately, Impira's table extraction is flexible enough to adapt to these changes.
The image below depicts a table with variable-height rows, caused by each row having highly varied values for the Description field. In this one table alone, the Description field has anywhere from one to seven lines.
If your table has varied row heights, make sure you confirm multiple rows with differing heights after you finish setting up the first row. This helps Impira learn to make more accurate predictions by taking a range of row heights into account. The greater the variety of rows you confirm, the more Impira learns and improves.
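One way to think about "a variety of rows" is to pick one row for each distinct height. The sketch below illustrates that idea only; it does not describe how Impira works internally. The `line_counts` mapping (row number to number of text lines in the tallest cell) and the `pick_rows_to_confirm` function are hypothetical.

```python
def pick_rows_to_confirm(line_counts):
    """Pick the first row, top to bottom, for each distinct row height.

    line_counts maps a row number to the number of text lines in that
    row (a hypothetical stand-in for row height).
    """
    seen_heights = set()
    picks = []
    for row in sorted(line_counts):
        height = line_counts[row]
        if height not in seen_heights:
            seen_heights.add(height)
            picks.append(row)
    return picks

# Hypothetical table: Description spans 1, 3, 1, 7, and 3 lines.
line_counts = {1: 1, 2: 3, 3: 1, 4: 7, 5: 3}
print(pick_rows_to_confirm(line_counts))
```

Here Rows 1, 2, and 4 cover the three distinct heights (one, three, and seven lines), so confirming those rows would expose the widest range of heights with the fewest confirmations.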
Example video: Variable Height Rows
This video walks through the process of setting up table extraction for invoices with variable height rows.
Tables with shared fields
You may come across tables that contain nested subtables, and the rows within a subtable may all collectively belong to a particular field. We call that particular field a shared field.
In the table above, the values for the fields Item, Units, Price, and Total are associated with the values from the Warehouse field.
In other words, the fields Item, Units, Price, and Total all share the Warehouse field.
It might help to think of a shared field as a parent and the fields that share it as children. The child fields all share the same parent field.
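The parent and child relationship can be made concrete with a small Python sketch. It shows a nested table being flattened so that every child row carries its parent's value. The field names (Warehouse, Item, Units, Price, Total) come from the example above; the sample values, the `nested` structure, and the `flatten_shared_field` function are hypothetical and are not Impira's data model or API.

```python
# Hypothetical nested layout: each Warehouse value (the shared, parent
# field) owns a subtable of Item/Units/Price/Total rows (child fields).
nested = [
    {
        "Warehouse": "North",
        "rows": [
            {"Item": "Bolts", "Units": 10, "Price": 0.5, "Total": 5.0},
            {"Item": "Nuts", "Units": 20, "Price": 0.25, "Total": 5.0},
        ],
    },
    {
        "Warehouse": "South",
        "rows": [
            {"Item": "Washers", "Units": 4, "Price": 1.0, "Total": 4.0},
        ],
    },
]

def flatten_shared_field(groups, shared_field):
    """Copy the shared (parent) value onto every child row in its group."""
    flat = []
    for group in groups:
        for row in group["rows"]:
            flat.append({shared_field: group[shared_field], **row})
    return flat

flat_rows = flatten_shared_field(nested, "Warehouse")
for row in flat_rows:
    print(row)
```

Each printed row now has a Warehouse value alongside its own Item, Units, Price, and Total, which is the spreadsheet-like shape a shared field produces.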
How to add shared fields
- Begin by extracting the child fields as if they were a standard, standalone table. (See section How to extract table data to learn how.)
- After all the child fields have been successfully extracted across your whole table, go back and add each shared field as a new column in Row 1.
- Go to the next row that shares these shared fields and add the same values there as well. As soon as you do this, a shared field icon with numbers will appear on your shared field, indicating which rows are associated with this shared field.
- Click Confirm entire row on the rows you're adding shared fields to, and Impira will reprocess predictions and look for similar shared fields in other areas.
- Review Impira's predictions and confirm or correct any predictions (and entire rows) to continue refining Impira and improving future predictions.
Example video: Shared fields
This video walks through the process of setting up shared fields in your table.
Now that your table rows are clean and ready to go, you can close Extraction view to see a bird's-eye view of the rest of the data from your files. If you still see red markers, re-open the file and repeat the review steps listed above to make sure your tables are good to go.
You can also collapse or expand any table by clicking the arrows next to the file name.