Q&A on Tabular Data with Huggingface

How to get answers to natural language questions with Huggingface

Author

Kedar Dabhadkar

Published

June 5, 2021

Tabular Data Q&A

Learn how to ask natural language questions of your tabular data using Huggingface Transformers. The three most important dependencies are PyTorch, Transformers, and torch-scatter.

Current shortcoming: Doesn’t work with non-string data, so every column is cast to str before querying.

# !pip install torch==1.8.0
# !pip install transformers
# !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html

import pandas as pd
from transformers import pipeline

Import data

data_df = pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv")
data_df = data_df.sample(100).reset_index(drop=True).astype(str)

data_df

	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	66	1	3	Moubarek, Master. Gerios	male	nan	1	1	2661	15.2458	nan	C
1	267	0	3	Panula, Mr. Ernesti Arvid	male	16.0	4	1	3101295	39.6875	nan	S
2	209	1	3	Carr, Miss. Helen "Ellen"	female	16.0	0	0	367231	7.75	nan	Q
3	817	0	3	Heininen, Miss. Wendla Maria	female	23.0	0	0	STON/O2. 3101290	7.925	nan	S
4	489	0	3	Somerton, Mr. Francis William	male	30.0	0	0	A.5. 18509	8.05	nan	S
...	...	...	...	...	...	...	...	...	...	...	...	...
95	255	0	3	Rosblom, Mrs. Viktor (Helena Wilhelmina)	female	41.0	0	2	370129	20.2125	nan	S
96	441	1	2	Hart, Mrs. Benjamin (Esther Ada Bloomfield)	female	45.0	1	1	F.C.C. 13529	26.25	nan	S
97	798	1	3	Osman, Mrs. Mara	female	31.0	0	0	349244	8.6833	nan	S
98	827	0	3	Lam, Mr. Len	male	nan	0	0	1601	56.4958	nan	S
99	130	0	3	Ekstrom, Mr. Johan	male	45.0	0	0	347061	6.975	nan	S

100 rows × 12 columns
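Because the table has to contain only strings, the cast can be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original notebook: `prepare_table` is a hypothetical name, and replacing missing values with empty strings (instead of the literal "nan" that `.astype(str)` produces, as seen in the Age and Cabin columns above) is an assumption about what you may prefer, not something the pipeline is known to require.

```python
import pandas as pd


def prepare_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every cell is a string (hypothetical helper)."""
    as_text = df.astype(str)
    # Assumption: an empty string is a clearer stand-in for a missing value
    # than the literal "nan" that astype(str) writes into the cell.
    return as_text.mask(df.isna(), "")


# Tiny stand-in table with one missing age
raw = pd.DataFrame(
    {
        "Name": ["Panula, Mr. Ernesti Arvid", "Lam, Mr. Len"],
        "Age": [16.0, None],
    }
)
prepared = prepare_table(raw)
print(prepared)
```

The result could then be passed to the pipeline in place of `data_df`. Whether empty cells or "nan" cells lead to better answers has not been checked here.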

Define transformers pipeline

tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")

Ask questions

tqa(data_df, "Mean age")

{'answer': 'AVERAGE > 16.0, 16.0, 71.0, 2.0',
 'coordinates': [(1, 5), (2, 5), (6, 5), (20, 5)],
 'cells': ['16.0', '16.0', '71.0', '2.0'],
 'aggregator': 'AVERAGE'}
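Note that the 'answer' string above names the aggregator and the selected cells; it does not contain a computed number. A minimal sketch of computing the value yourself from the returned dictionary follows. `apply_aggregator` is a hypothetical helper, and the set of aggregator labels it handles (NONE, SUM, AVERAGE, COUNT) is an assumption based on this model's output.

```python
import math


def apply_aggregator(result):
    """Compute the value implied by a table-question-answering result dict."""
    aggregator = result.get("aggregator", "NONE")
    cells = result["cells"]

    if aggregator == "NONE":
        # No aggregation requested: return the selected cells as text
        return ", ".join(cells)
    if aggregator == "COUNT":
        return len(cells)

    # SUM and AVERAGE need numbers; skip cells that do not parse or are NaN
    numbers = []
    for cell in cells:
        try:
            value = float(cell)
        except ValueError:
            continue
        if not math.isnan(value):
            numbers.append(value)
    if not numbers:
        return None

    if aggregator == "SUM":
        return sum(numbers)
    if aggregator == "AVERAGE":
        return sum(numbers) / len(numbers)
    raise ValueError(f"Unexpected aggregator: {aggregator}")


# The result shown above, copied verbatim
result = {
    "answer": "AVERAGE > 16.0, 16.0, 71.0, 2.0",
    "coordinates": [(1, 5), (2, 5), (6, 5), (20, 5)],
    "cells": ["16.0", "16.0", "71.0", "2.0"],
    "aggregator": "AVERAGE",
}
print(apply_aggregator(result))  # 26.25
```

The figure is the average of only the four cells the model selected, not of the whole Age column.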

# Column types of the raw CSV data, before the .astype(str) cast above
data_df.dtypes

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object