Q&A on Tabular Data with Huggingface

How to get answers to natural language questions with Huggingface
Author

Kedar Dabhadkar

Published

June 5, 2021

Tabular Data Q&A

Learn how to ask natural language questions of your tabular data using Huggingface Transformers. The three most important dependencies are PyTorch, Transformers and torch-scatter (PyTorch Scatter).

Current shortcoming: the pipeline doesn’t work with non-string data, so every cell has to be converted to a string first.

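# Install the three dependencies (uncomment to run in a notebook).
# torch-scatter wheels are built per PyTorch version and CPU/CUDA variant, hence the torch-1.8.0+cpu URL.
# !pip install torch==1.8.0
# !pip install transformers
# !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
import pandas as pd
from transformers import pipeline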

Import data

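# Load the Titanic dataset from the Data-Analyzer repository
data_df = pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv")
# Keep 100 random rows and convert every cell to a string, since the pipeline only handles string data
data_df = data_df.sample(100).reset_index(drop=True).astype(str)
data_df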
     PassengerId  Survived  Pclass  Name                                         Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    66           1         3       Moubarek, Master. Gerios                     male    nan   1      1      2661              15.2458  nan    C
1    267          0         3       Panula, Mr. Ernesti Arvid                    male    16.0  4      1      3101295           39.6875  nan    S
2    209          1         3       Carr, Miss. Helen "Ellen"                    female  16.0  0      0      367231            7.75     nan    Q
3    817          0         3       Heininen, Miss. Wendla Maria                 female  23.0  0      0      STON/O2. 3101290  7.925    nan    S
4    489          0         3       Somerton, Mr. Francis William                male    30.0  0      0      A.5. 18509        8.05     nan    S
...  ...          ...       ...     ...                                          ...     ...   ...    ...    ...               ...      ...    ...
95   255          0         3       Rosblom, Mrs. Viktor (Helena Wilhelmina)     female  41.0  0      2      370129            20.2125  nan    S
96   441          1         2       Hart, Mrs. Benjamin (Esther Ada Bloomfield)  female  45.0  1      1      F.C.C. 13529      26.25    nan    S
97   798          1         3       Osman, Mrs. Mara                             female  31.0  0      0      349244            8.6833   nan    S
98   827          0         3       Lam, Mr. Len                                 male    nan   0      0      1601              56.4958  nan    S
99   130          0         3       Ekstrom, Mr. Johan                           male    45.0  0      0      347061            6.975    nan    S

100 rows × 12 columns
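Because `sample(100)` draws a different random subset on every run, the rows shown above, and therefore the model's answers, change between runs. Below is a minimal sketch of the same preparation step with a fixed seed. The helper name `prepare_table`, the seed value and the toy DataFrame are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd


def prepare_table(df, n_rows, seed=0):
    """Return a reproducible random sample of df with every cell cast to str."""
    # random_state fixes which rows are drawn, so reruns see the same table
    sample = df.sample(n_rows, random_state=seed).reset_index(drop=True)
    # The table-question-answering pipeline expects string cells
    return sample.astype(str)


# Hypothetical toy stand-in for the Titanic DataFrame
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [31.0, 22.0, 45.0, 4.0],
    "Pclass": [3, 1, 3, 2],
})

table = prepare_table(toy, n_rows=3, seed=42)
print(table)
```

With the Titanic data, the equivalent call would pass the DataFrame returned by `pd.read_csv` and `n_rows=100`.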

Define the Transformers pipeline

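# TAPAS base model fine-tuned on WikiTableQuestions (WTQ), which also predicts an aggregation operator
tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")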

Ask questions

# Ask for the mean of the Age column
tqa(data_df, "Mean age")
{'answer': 'AVERAGE > 16.0, 16.0, 71.0, 2.0',
 'coordinates': [(1, 5), (2, 5), (6, 5), (20, 5)],
 'cells': ['16.0', '16.0', '71.0', '2.0'],
 'aggregator': 'AVERAGE'}
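The `answer` string is the predicted aggregation operator followed by the selected cells. The pipeline returns `AVERAGE` and four Age cells, but it does not compute the average itself, and because only four cells were selected, the result covers those cells rather than the whole Age column. Below is a minimal sketch that applies the operator to `cells`. It assumes the WTQ model's aggregation labels (NONE, SUM, AVERAGE, COUNT) and numeric strings in the selected cells. `apply_aggregator` is a hypothetical helper, not a Transformers API.

```python
from statistics import mean


def apply_aggregator(result):
    """Turn a table-question-answering result dict into a final value.

    Hypothetical helper: assumes the WTQ labels NONE, SUM, AVERAGE, COUNT
    and that cells used by SUM/AVERAGE are numeric strings.
    """
    # Models without an aggregation head return no "aggregator" key
    aggregator = result.get("aggregator", "NONE")
    cells = result["cells"]
    if aggregator == "COUNT":
        return len(cells)
    if aggregator == "SUM":
        return sum(float(c) for c in cells)
    if aggregator == "AVERAGE":
        return mean(float(c) for c in cells)
    # NONE: the selected cells are the answer themselves
    return cells


# The result returned above for "Mean age"
result = {'answer': 'AVERAGE > 16.0, 16.0, 71.0, 2.0',
          'coordinates': [(1, 5), (2, 5), (6, 5), (20, 5)],
          'cells': ['16.0', '16.0', '71.0', '2.0'],
          'aggregator': 'AVERAGE'}
print(apply_aggregator(result))  # prints 26.25
```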
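# dtypes of the raw CSV, before the astype(str) conversion (after it, every column holds strings)
pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv").dtypes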
PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
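The dtypes above show which columns in the raw CSV are numeric (int64 or float64). These are the non-string columns that have to be converted before the table reaches the pipeline. Below is a minimal sketch that lists them, assuming `pandas.api.types.is_numeric_dtype` is a sufficient test for this data. `numeric_columns` and the toy frame are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def numeric_columns(df):
    """Return the names of columns with a numeric dtype (e.g. int64, float64)."""
    return [col for col in df.columns if is_numeric_dtype(df[col])]


# Hypothetical toy frame with a few of the column types listed above
toy = pd.DataFrame({
    "PassengerId": [1, 2],  # int64
    "Name": ["A", "B"],     # text
    "Fare": [7.25, 71.28],  # float64
})

print(numeric_columns(toy))              # ['PassengerId', 'Fare']
print(numeric_columns(toy.astype(str)))  # []
```

Run on the raw Titanic DataFrame, this would return the seven int64/float64 columns listed above.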