How to extract data from tables

What can I use table extraction for?

Impira's powerful and flexible table extraction feature lets you pull data out of tables within your documents, as well as data from recurring lists and repeated values, and place them in a clean, spreadsheet-like format.

Adaptable to unique layouts

Tables come in all sorts of shapes, sizes, and formats. Group files with similar table layouts into Collections, and use the following steps to teach Impira what your tables look like. Impira will actively learn and begin to extract the same data you need out of the rest of your files.

Here are some examples of tables that Impira can extract:

Jump to the section Rows with varied heights below to see how adaptable Impira's table extraction can be.

How to extract table data

In many cases, Impira will automatically detect tables on the first page of your file. You can click this table suggestion (highlighted in purple) and go through an expedited process of creating a table field, or you can dismiss the suggestion and create a field manually.

Setting up an autodetected table

Step 1: Click the table

Open your file and click the big purple box to get started.

Step 2: Set up Row 1

  1. Starting from the left, check that each autodetected column name and value for the first row is correct.
  2. Inspect each value and column name, adjust the blue box or value box if necessary, then click Next column.
  3. Hit Finish setup after you’ve checked all the values in the first row.
Note: Once this first row is set up, Impira will use this as a template to detect other rows in this file and others in the Collection.

Step 3: Review predictions

  1. When Impira finishes extracting the rest of the table rows in this file, go down the queue and check the predictions, starting with the very next row, Row 2.
  2. If necessary, correct any incorrect prediction values by adjusting the blue box or manually typing a correct value into the value box.
  3. Click Confirm entire row when you’re done. This triggers Impira to take your new input into account to create better predictions.

Step 4: Upload more files

Exit Extraction view and upload more similar files with tables into this Collection. Impira will immediately start extracting table data from those files based on the table you just set up.

Step 5 (optional): Rename your table

We automatically give your table a name, but feel free to change it to anything you want.

Manual data extraction

Start with Row 1

Once you've uploaded your file and placed it into a Collection, open a file to enter Extraction view.

  1. Go to Add field in the top right corner, select Table, and give your table a name.
  2. A blue box will appear on your document. Adjust the box to capture your first row's values, minus any header or column names.
  3. One by one, name your column headers, highlight the corresponding value, then click Add value.
  4. Choose Done after you've added your last value for Row 1.

✨ Behind the magic: What's happening?

As you start labeling your first row values, Impira is already using your first few labels to learn what you're looking for and searching for those same values in other rows. The more column headers you label, the clearer a picture Impira gets of the table you're trying to extract.

You'll see Impira's predictions for other rows show up on your file in just a few moments. If you don't see any show up, you may have to label a few more fields in a new row.

Impira continues to learn as you review predictions and will automatically apply your input and reprocess previous predictions to make them better. Each input fine-tunes your machine learning model so that it performs better for future files you upload into this Collection.

Improve results by confirming entire rows

By the time you finish labeling Row 1, Impira has already gone through the rest of your rows and files to extract matching data. You'll see Impira's predictions in just a few moments.

All the rows will be displayed in a list and will feature a red, green, or black marker. These indicate Impira's machine learning confidence level for that row.

Let's go through and check Impira's work.

  1. Choose any row and take a look to see if all the values are accurate.
  2. If they need adjustment, click the value and either highlight the correct value on your document or type in the value yourself, then click Confirm entire row.
  3. Continue this process for subsequent rows until Impira grows more confident and turns red markers green.
  4. When you go to your next document in the Collection, be sure to confirm Row 1 first. If Row 1 has been incorrectly identified, edit the values to reflect the correct first row. You won't be able to confirm any other rows before doing this.
Note: Make sure you click Confirm entire row when you're sure a whole row is correct. This triggers Impira to refresh your predictions. The more you do this, the more accurate your predictions will be.
Note: If your table is missing row predictions on a page that looks a bit different from the others (e.g., if the page is more zoomed in), try adding and confirming a few of the topmost rows on that page.

Modifying your table

If you need to insert missing rows or remove extra ones, hover over the three dots icon in any row and choose Delete row, Insert row above, or Insert row below.

It's best to make any row corrections in order from top to bottom.

Inserting missing rows

Note: When inserting a new row above or below an existing row, you must label each value one by one for that new row.

In the example below, Row 3 has been skipped, so we'll be inserting a new row below Row 2.

Renaming table columns

  1. From Collection View, click the three dots by the column name you want to edit.
  2. Make your edits and click Save.

Rows with varied heights

Many users come across tables with rows with varied heights. This happens when certain values contain more information than others. Fortunately, Impira's table extraction is flexible enough to adapt to these changes.

The image below depicts a table with variable-height rows, caused by highly varied values in the Description field. In this one table alone, the Description field has anywhere from one to seven lines.

If your table has varied row heights, make sure you confirm multiple rows with differing heights after you finish setting up the first row. This helps Impira learn to make more accurate predictions by taking a range of row heights into account. The greater the variety of rows you confirm, the more Impira learns and improves.

Example video: Variable Height Rows

This video walks through the process of setting up table extraction for invoices with variable height rows.

Tables with shared fields

You may come across tables that contain nested subtables, and the rows within a subtable may all collectively belong to a particular field. We call that particular field a shared field.

In the table above, the values for the fields Item, Units, Price, and Total are associated with the values from the Warehouse field.

In other words, the fields Item, Units, Price, and Total all share the Warehouse field.

It might help to think of a shared field as a parent and the fields that share it as its children. The child fields all share the same parent field.
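
The parent and child relationship can also be pictured as data. The sketch below is only an illustration of the concept, not Impira's API or export format: the `Subtable` class, the `flatten` helper, and all sample values are hypothetical. It shows how each child row (Item, Units, Price, Total) ends up carrying the value of its parent field (Warehouse) once the nested table is laid out in a flat, spreadsheet-like form.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Subtable:
    """Hypothetical model of one nested subtable."""
    shared: Dict[str, str]      # parent (shared) field values, e.g. Warehouse
    rows: List[Dict[str, str]]  # child rows: Item, Units, Price, Total


def flatten(subtables: List[Subtable]) -> List[Dict[str, str]]:
    """Copy each subtable's shared field values onto every one of its rows."""
    flat = []
    for sub in subtables:
        for row in sub.rows:
            flat.append({**sub.shared, **row})
    return flat


# Made-up sample values, for illustration only.
subtables = [
    Subtable(
        shared={"Warehouse": "Warehouse A"},
        rows=[
            {"Item": "Widget", "Units": "2", "Price": "5.00", "Total": "10.00"},
            {"Item": "Gadget", "Units": "1", "Price": "7.50", "Total": "7.50"},
        ],
    ),
    Subtable(
        shared={"Warehouse": "Warehouse B"},
        rows=[
            {"Item": "Sprocket", "Units": "4", "Price": "1.25", "Total": "5.00"},
        ],
    ),
]

rows = flatten(subtables)
for row in rows:
    print(row)
```

Here the first two rows share the parent value "Warehouse A" and the third row belongs to "Warehouse B", which mirrors how rows within a subtable collectively belong to one shared field.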

How to add shared fields

  1. Begin by extracting the child fields as if they were a standard, standalone table. (See section How to extract table data to learn how.)
  2. After all the child fields have been successfully extracted across your whole table, go back and add your shared fields as a new column to Row 1.
  3. Go to the next row that shares these shared fields and add the same values there as well. As soon as you do this, a shared field icon with numbers will appear on your shared field, indicating which rows are associated with this shared field.
  4. Click Confirm entire row on the rows you're adding shared fields to, and Impira will reprocess predictions and look for similar shared fields in other areas.
  5. Review Impira's predictions and confirm or correct any predictions (and entire rows) to continue refining Impira and improving future predictions.

Example video: Shared fields

This video walks through the process of setting up shared fields in your table.

Now that your table rows are clean and ready to go, you can close Extraction view to see a bird's-eye view of the rest of the data from your files. If you still see any red markers, re-open the file and repeat the review steps listed above to make sure your tables are good to go.

You can also collapse or expand any table by clicking the arrows next to the file name.

© 2022 Impira Inc. All rights reserved.