# !pip install torch==1.8.0
# !pip install transformers
# !pip install torch-scatter -f https://pytorch-geometric.com/whl/torch-1.8.0+cpu.html

Q&A on Tabular Data with Huggingface
How to get answers to natural language questions with Huggingface
Tabular Data Q&A
Learn how to ask natural language questions of your tabular data using Huggingface Transformers. The three most important dependencies are PyTorch, Transformers and torch-scatter.
Current shortcoming: it doesn’t work with non-string data, which is why the DataFrame is cast with astype(str) when the data is imported.
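As a rough illustration of working around that shortcoming, the sketch below casts every column of a DataFrame to strings. The helper name to_string_table is hypothetical, and replacing missing values with empty strings (instead of leaving them as the text "nan") is an assumption of this sketch, not something the pipeline requires.

```python
import pandas as pd


def to_string_table(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df in which every cell is a string.

    Hypothetical helper: missing values become "" (an assumption of this sketch).
    """
    # Go through object dtype first so numeric columns can hold the "" placeholder.
    as_objects = df.astype(object).where(df.notna(), "")
    return as_objects.astype(str)


# Tiny stand-in table, not the Titanic data.
example = pd.DataFrame({"Name": ["A", "B"], "Age": [16.0, None]})
string_table = to_string_table(example)
print(string_table)
```

Whether to blank out missing values or keep them as text is a judgment call; blanking them simply keeps the literal "nan" out of the cells the model can select.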
import pandas as pd
from transformers import pipeline

Import data
data_df = pd.read_csv("https://raw.githubusercontent.com/dkedar7/Data-Analyzer/master/Analyzer/titanic.csv")
data_df = data_df.sample(100).reset_index(drop=True).astype(str)
data_df

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | nan | 1 | 1 | 2661 | 15.2458 | nan | C |
| 1 | 267 | 0 | 3 | Panula, Mr. Ernesti Arvid | male | 16.0 | 4 | 1 | 3101295 | 39.6875 | nan | S |
| 2 | 209 | 1 | 3 | Carr, Miss. Helen "Ellen" | female | 16.0 | 0 | 0 | 367231 | 7.75 | nan | Q |
| 3 | 817 | 0 | 3 | Heininen, Miss. Wendla Maria | female | 23.0 | 0 | 0 | STON/O2. 3101290 | 7.925 | nan | S |
| 4 | 489 | 0 | 3 | Somerton, Mr. Francis William | male | 30.0 | 0 | 0 | A.5. 18509 | 8.05 | nan | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.0 | 0 | 2 | 370129 | 20.2125 | nan | S |
| 96 | 441 | 1 | 2 | Hart, Mrs. Benjamin (Esther Ada Bloomfield) | female | 45.0 | 1 | 1 | F.C.C. 13529 | 26.25 | nan | S |
| 97 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | nan | S |
| 98 | 827 | 0 | 3 | Lam, Mr. Len | male | nan | 0 | 0 | 1601 | 56.4958 | nan | S |
| 99 | 130 | 0 | 3 | Ekstrom, Mr. Johan | male | 45.0 | 0 | 0 | 347061 | 6.975 | nan | S |
100 rows × 12 columns
Define transformers pipeline
tqa = pipeline("table-question-answering", model="google/tapas-base-finetuned-wtq")

Ask questions
tqa(data_df, "Mean age")

{'answer': 'AVERAGE > 16.0, 16.0, 71.0, 2.0',
'coordinates': [(1, 5), (2, 5), (6, 5), (20, 5)],
'cells': ['16.0', '16.0', '71.0', '2.0'],
'aggregator': 'AVERAGE'}
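The result above names an aggregator and the selected cells, but the aggregate itself is not computed. A minimal sketch of how one might compute it follows, assuming the aggregator is one of NONE, SUM, AVERAGE or COUNT. The function resolve_answer is a hypothetical helper written for this post, not part of Transformers.

```python
import math


def resolve_answer(result):
    """Turn a table-question-answering result dict into a concrete value.

    Hypothetical helper; assumes the aggregator is NONE, SUM, AVERAGE or COUNT.
    """
    cells = result["cells"]
    aggregator = result.get("aggregator", "NONE")

    if aggregator == "NONE":
        # No aggregation requested: return the selected cells as text.
        return ", ".join(cells)
    if aggregator == "COUNT":
        return len(cells)

    # SUM and AVERAGE need numbers; skip cells that parse to NaN (e.g. "nan").
    values = [float(cell) for cell in cells]
    values = [v for v in values if not math.isnan(v)]
    if not values:
        return None
    if aggregator == "SUM":
        return sum(values)
    if aggregator == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"Unexpected aggregator: {aggregator}")


# The result shown above, copied by hand.
result = {
    "answer": "AVERAGE > 16.0, 16.0, 71.0, 2.0",
    "coordinates": [(1, 5), (2, 5), (6, 5), (20, 5)],
    "cells": ["16.0", "16.0", "71.0", "2.0"],
    "aggregator": "AVERAGE",
}
print(resolve_answer(result))  # 26.25
```

Note that this averages only the four cells the model selected, not the Age column of all 100 sampled rows, so the number is only as good as the cell selection.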
data_df.dtypes

PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object