{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TMVA BDT interface"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To facilitate the comparison of a trained NN and a classical TMVA BDT, the package provides an interface to compute the output score of a TMVA BDT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "from freeforestml import Variable, Process, Cut, hist, McStack, DataStack, Stack, TmvaBdt\n",
    "from freeforestml import toydata, example_style\n",
    "example_style()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = toydata.get()\n",
    "df = df.compute()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Evaluation of TMVA BDT\n",
    "The file `tmva_bdt.xml` contains a sample BDT trained on the toy dataset. The input variables are\n",
    " - $\\eta^{j_1}$,\n",
    " - $\\eta^{j_2}$,\n",
    " - $\\tau\\;\\mathrm{centrality}$ and\n",
    " - $\\ell\\;\\mathrm{centrality}$.\n",
    " \n",
    "The BDT was trained with regular TMVA in ROOT. The weights stored in the XML file can be read by `freeforestml.TmvaBdt` directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bdt = TmvaBdt(\"tmva_bdt.xml\")\n",
    "df['bdt_prediction'] = bdt.predict(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plotting"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p_ztt = Process(r\"$Z\\rightarrow\\tau\\tau$\", range=(0, 0))\n",
    "p_sig = Process(r\"Signal\", range=(1, 1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "colors = [\"windows blue\", \"amber\", \"greyish\", \"faded green\", \"dusty purple\"]\n",
    "colors = sns.xkcd_palette(colors)\n",
    "s_ztt = McStack(p_ztt, color=colors[0], histtype='step')\n",
    "s_sig = McStack(p_sig, color=colors[1], histtype='step')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "v_bdt = Variable(\"BDT Score\", \"bdt_prediction\")\n",
    "hist(df, v_bdt, 22, [s_ztt, s_sig], range=(-1.1, 1.1), \n",
    "     weight=\"weight\",  numerator=None)\n",
    "None"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}