# !pip install torch==1.8.0
# !pip install transformers
# !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html
Q&A on Tabular Data with Huggingface
How to get answers to natural language questions with Huggingface
Tabular Data Q&A
Learn how to ask natural-language questions about your tabular data using Huggingface Transformers. The three most important dependencies are PyTorch, Transformers and torch-scatter, which the PyTorch implementation of the TAPAS model used below relies on.
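Before running the notebook, it can help to confirm that these three packages are actually installed. The sketch below uses only the standard library's `importlib.metadata` and assumes the distribution names from the pip commands at the top (`torch`, `transformers`, `torch-scatter`); `installed_versions` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names, taken from the pip install commands above.
REQUIRED = ["torch", "transformers", "torch-scatter"]


def installed_versions(packages=REQUIRED):
    """Return {package: installed version string, or None if not found}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed (or installed under a different distribution name).
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```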
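Current shortcoming: the pipeline doesn’t work with non-string data, so every cell has to be converted to a string first (hence the .astype(str) call when loading the data).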
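The TAPAS tokenizer documentation asks for all cell values to be text. A plain `.astype(str)` does that, but it also turns missing values into the literal string "nan" (visible in the Age and Cabin columns of the sample below). The following is a minimal sketch of an alternative conversion; `to_string_table` and its `na_rep` parameter are hypothetical, and whether an empty string or "nan" gives better answers from the model is an untested assumption.

```python
import numpy as np
import pandas as pd


def to_string_table(df: pd.DataFrame, na_rep: str = "") -> pd.DataFrame:
    """Return a copy of df in which every cell is a Python str.

    Missing values are replaced by na_rep instead of becoming "nan".
    """
    # Cast to object first so string replacements fit into numeric columns,
    # swap missing cells for na_rep, then stringify everything else.
    return df.astype(object).where(df.notna(), na_rep).astype(str)


# Hypothetical example frame with a missing Age and a missing Cabin.
example = pd.DataFrame(
    {"Age": [22.0, np.nan], "Cabin": ["C85", np.nan], "Pclass": [3, 1]}
)
table = to_string_table(example)
print(table)
```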
import pandas as pd
from transformers import pipeline
Import data
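data_df = pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv")
# Keep a random sample of 100 rows and convert every cell to a string (the pipeline needs text cells)
data_df = data_df.sample(100).reset_index(drop=True).astype(str)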
data_df
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | nan | 1 | 1 | 2661 | 15.2458 | nan | C |
| 1 | 267 | 0 | 3 | Panula, Mr. Ernesti Arvid | male | 16.0 | 4 | 1 | 3101295 | 39.6875 | nan | S |
| 2 | 209 | 1 | 3 | Carr, Miss. Helen "Ellen" | female | 16.0 | 0 | 0 | 367231 | 7.75 | nan | Q |
| 3 | 817 | 0 | 3 | Heininen, Miss. Wendla Maria | female | 23.0 | 0 | 0 | STON/O2. 3101290 | 7.925 | nan | S |
| 4 | 489 | 0 | 3 | Somerton, Mr. Francis William | male | 30.0 | 0 | 0 | A.5. 18509 | 8.05 | nan | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.0 | 0 | 2 | 370129 | 20.2125 | nan | S |
| 96 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.25 | nan | S |
| 97 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | nan | S |
| 98 | 827 | 0 | 3 | Lam, Mr. Len | male | nan | 0 | 0 | 1601 | 56.4958 | nan | S |
| 99 | 130 | 0 | 3 | Ekstrom, Mr. Johan | male | 45.0 | 0 | 0 | 347061 | 6.975 | nan | S |
100 rows × 12 columns
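Because `sample(100)` draws a different random subset every time the cell runs, the rows shown above (and therefore the cells the model can pick for a question) change between runs. pandas' `DataFrame.sample` accepts a `random_state` argument that makes the draw repeatable. A small sketch on a made-up stand-in frame; `sample_as_strings` is a hypothetical helper and the seed 42 is arbitrary:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic frame: 10 rows of made-up values.
df = pd.DataFrame(
    {"PassengerId": range(1, 11), "Fare": [float(x) for x in range(10, 20)]}
)


def sample_as_strings(frame, n, seed):
    """Same call chain as the notebook, plus a fixed random_state."""
    return frame.sample(n, random_state=seed).reset_index(drop=True).astype(str)


first = sample_as_strings(df, 5, seed=42)
second = sample_as_strings(df, 5, seed=42)
print(first.equals(second))  # True
```

With a fixed seed, answers to the same question can be compared across notebook runs.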
Define transformers pipeline
# TAPAS base model fine-tuned on WikiTableQuestions (WTQ)
tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")
Ask questions
# Arguments: the table (all cells as strings), then the question
tqa(data_df, "Mean age")
{'answer': 'AVERAGE > 16.0, 16.0, 71.0, 2.0',
'coordinates': [(1, 5), (2, 5), (6, 5), (20, 5)],
'cells': ['16.0', '16.0', '71.0', '2.0'],
'aggregator': 'AVERAGE'}
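The pipeline does not return a single number. It returns the cells the model selected, their `coordinates` as (row, column) positions in `data_df` (column 5 is Age), and the aggregation operator the model predicted; for this WTQ-finetuned checkpoint that is one of NONE, SUM, AVERAGE or COUNT. Applying the operator is left to the caller. A minimal sketch, assuming those four labels and numeric cell text for SUM and AVERAGE; `apply_aggregator` is a hypothetical helper and the dict is copied from the output above:

```python
from statistics import mean

# Output of tqa(data_df, "Mean age") shown above.
result = {
    "answer": "AVERAGE > 16.0, 16.0, 71.0, 2.0",
    "coordinates": [(1, 5), (2, 5), (6, 5), (20, 5)],
    "cells": ["16.0", "16.0", "71.0", "2.0"],
    "aggregator": "AVERAGE",
}


def apply_aggregator(result):
    """Turn a table-question-answering result into a final value."""
    cells = result["cells"]
    agg = result.get("aggregator", "NONE")
    if agg == "COUNT":
        return len(cells)
    if agg in ("SUM", "AVERAGE"):
        numbers = [float(c) for c in cells]
        return sum(numbers) if agg == "SUM" else mean(numbers)
    # NONE: the selected cells themselves are the answer.
    return cells


print(apply_aggregator(result))  # 26.25
```

Note that the model averaged only four of the 100 Age cells, so 26.25 is not the mean age of the sample. For a true column mean, pandas is more reliable, e.g. `pd.to_numeric(data_df["Age"], errors="coerce").mean()`, which skips the "nan" entries.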
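The column types below are those pandas infers when reading the CSV, i.e. before the .astype(str) conversion (afterwards every column holds strings). The int64 and float64 columns are the non-string data the pipeline cannot handle directly.
data_df.dtypes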
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
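To spot the columns that still need converting before calling the pipeline, one option is to check which columns contain any cell that is not a Python `str`. This sketch checks cell values rather than dtype names, on the assumption that this is closer to the "all cells must be text" requirement (missing values count as non-text). `non_string_columns` is a hypothetical helper and the example frame is made up:

```python
import pandas as pd


def non_string_columns(df: pd.DataFrame) -> list:
    """Names of columns holding at least one cell that is not a str."""
    return [
        col for col in df.columns
        if not all(isinstance(v, str) for v in df[col].tolist())
    ]


# Hypothetical example: numeric columns plus a text column with a missing value.
example = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Age": [22.0, None], "Pclass": [3, 1], "Cabin": ["C85", None]}
)
print(non_string_columns(example))  # ['Age', 'Pclass', 'Cabin']
```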