{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introductory example\n", "\n", "In this notebook, we compute income taxes and social security contributions for example\n", "data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "from gettsim import InputData, MainTarget, TTTargets, copy_environment, main, tt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step in GETTSIM's new workflow is to define the targets you're interested in.\n", "The key sequences of the nested dictionary below are the paths GETTSIM will use as\n", "targets. For instance, via the path `einkommensteuer` and `betrag_m_sn`, we request the\n", "amount of income tax to be paid monthly at the Steuernummer level. *Note: Of course, the\n", "income tax is paid annually and calculated at that level, but GETTSIM will do the\n", "conversion for you.*\n", "\n", "The values on the lowest level of the dictionaries (called leaves) will be used as the\n", "column names of the resulting DataFrame. Here, `income_tax_m` will be the name of the\n", "column containing the income tax results.\n", "\n", "In this example, we are interested in the income tax and the social insurance\n", "contributions paid when being in regular employment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "TT_TARGETS = {\n", " \"einkommensteuer\": {\"betrag_m_sn\": \"income_tax_m\"},\n", " \"sozialversicherung\": {\n", " \"pflege\": {\n", " \"beitrag\": {\n", " \"betrag_versicherter_m\": \"long_term_care_insurance_contribution_m\"\n", " }\n", " },\n", " \"kranken\": {\n", " \"beitrag\": {\"betrag_versicherter_m\": \"health_insurance_contribution_m\"}\n", " },\n", " \"rente\": {\n", " \"beitrag\": {\"betrag_versicherter_m\": \"pension_insurance_contribution_m\"}\n", " },\n", " \"arbeitslosen\": {\n", " \"beitrag\": {\n", " \"betrag_versicherter_m\": \"unemployment_insurance_contribution_m\"\n", " }\n", " },\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we need to find out which input data we actually need to calculate the targets we\n", "are interested in. We can do this by specifying a template as the `main_target` of\n", "`gettsim.main`.\n", "\n", "Because we are interested social insurance contributions paid when being in regular\n", "employment, we are not interested in retirees or households which depend on social\n", "assistance. We can override these transfers when requesting the template. This removes\n", "the input data needed to compute these transfers from the template." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "main(\n", " main_target=MainTarget.templates.input_data_dtypes.tree,\n", " policy_date_str=\"2025-01-01\",\n", " tt_targets=TTTargets(tree=TT_TARGETS),\n", " input_data=InputData.tree(\n", " {\n", " \"p_id\": pd.Series([0]),\n", " \"sozialversicherung\": {\n", " \"rente\": {\n", " \"altersrente\": {\"betrag_m\": pd.Series([0])},\n", " },\n", " \"arbeitslosen\": {\"betrag_m\": pd.Series([0])},\n", " },\n", " \"wohngeld\": {\"betrag_m_wthh\": pd.Series([0])},\n", " \"kinderzuschlag\": {\"betrag_m_bg\": pd.Series([0])},\n", " \"elterngeld\": {\"betrag_m\": pd.Series([0])},\n", " \"arbeitslosengeld_2\": {\"betrag_m_bg\": pd.Series([0])},\n", " }\n", " ),\n", " include_warn_nodes=False,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, we create some example data. Here, we use a pandas DataFrame with column names that are different from the ones GETTSIM expects." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA = pd.DataFrame(\n", " {\n", " \"age\": [30, 30, 10],\n", " \"working_hours\": [35, 35, 0],\n", " \"disability_grade\": [0, 0, 0],\n", " \"birth_year\": [1995, 1995, 2015],\n", " \"hh_id\": [0, 0, 0],\n", " \"p_id\": [0, 1, 2],\n", " \"east_germany\": [False, False, False],\n", " \"self_employed\": [False, False, False],\n", " \"income_from_self_employment\": [0, 0, 0],\n", " \"income_from_rent\": [0, 0, 0],\n", " \"income_from_employment\": [5000, 4000, 0],\n", " \"income_from_forest_and_agriculture\": [0, 0, 0],\n", " \"income_from_capital\": [500, 0, 0],\n", " \"income_from_other_sources\": [0, 0, 0],\n", " \"contribution_to_private_pension_insurance\": [0, 0, 0],\n", " \"childcare_expenses\": [0, 0, 0],\n", " \"person_that_pays_childcare_expenses\": [-1, -1, 0],\n", " \"joint_taxation\": [True, True, False],\n", " \"amount_private_pension_income\": [0, 0, 0],\n", " \"contribution_private_health_insurance\": [0, 0, 0],\n", " \"has_children\": [True, True, False],\n", " \"single_parent\": [False, False, False],\n", " \"is_child\": [False, False, True],\n", " \"spouse_id\": [1, 0, -1],\n", " \"parent_id_1\": [-1, -1, 0],\n", " \"parent_id_2\": [-1, -1, 1],\n", " \"in_training\": [False, False, False],\n", " \"id_recipient_child_allowance\": [-1, -1, 0],\n", " }\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we define a mapping from GETTSIM's expected input structure to your data. Note\n", "that the paths are the union of the input_data for `main` and the result from calling it\n", "above (with `main_target=MainTarget.templates.input_data_dtypes.tree`).\n", "\n", "Just the leaves are different; we have replaced the dtype hints by the column names in\n", "the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "MAPPER = {\n", " \"alter\": \"age\",\n", " \"arbeitsstunden_w\": \"working_hours\",\n", " \"behinderungsgrad\": \"disability_grade\",\n", " \"geburtsjahr\": \"birth_year\",\n", " \"hh_id\": \"hh_id\",\n", " \"p_id\": \"p_id\",\n", " \"wohnort_ost\": \"east_germany\",\n", " \"einnahmen\": {\n", " \"bruttolohn_m\": \"income_from_employment\",\n", " \"kapitalerträge_m\": \"income_from_capital\",\n", " \"renten\": {\n", " \"betriebliche_altersvorsorge_m\": 0.0,\n", " \"geförderte_private_vorsorge_m\": 0.0,\n", " \"sonstige_private_vorsorge_m\": 0.0,\n", " },\n", " },\n", " \"einkommensteuer\": {\n", " \"einkünfte\": {\n", " \"ist_hauptberuflich_selbstständig\": \"self_employed\",\n", " \"aus_gewerbebetrieb\": {\"betrag_m\": \"income_from_self_employment\"},\n", " \"aus_vermietung_und_verpachtung\": {\"betrag_m\": \"income_from_rent\"},\n", " \"aus_forst_und_landwirtschaft\": {\n", " \"betrag_m\": \"income_from_forest_and_agriculture\"\n", " },\n", " \"aus_selbstständiger_arbeit\": {\"betrag_m\": \"income_from_self_employment\"},\n", " \"sonstige\": {\n", " \"alle_weiteren_m\": \"income_from_other_sources\",\n", " },\n", " },\n", " \"abzüge\": {\n", " \"beitrag_private_rentenversicherung_m\": (\n", " \"contribution_to_private_pension_insurance\"\n", " ),\n", " \"kinderbetreuungskosten_m\": \"childcare_expenses\",\n", " \"p_id_kinderbetreuungskostenträger\": \"person_that_pays_childcare_expenses\",\n", " },\n", " \"gemeinsam_veranlagt\": \"joint_taxation\",\n", " },\n", " \"sozialversicherung\": {\n", " \"arbeitslosen\": {\"betrag_m\": 0.0},\n", " \"rente\": {\n", " \"jahr_renteneintritt\": 0,\n", " \"altersrente\": {\n", " \"betrag_m\": 0.0,\n", " },\n", " \"erwerbsminderung\": {\n", " \"betrag_m\": 0.0,\n", " },\n", " },\n", " \"kranken\": {\n", " \"beitrag\": {\"privat_versichert\": \"contribution_private_health_insurance\"}\n", " },\n", " \"pflege\": {\"beitrag\": {\"hat_kinder\": \"has_children\"}},\n", " },\n", " \"familie\": {\n", " \"alleinerziehend\": \"single_parent\",\n", " \"kind\": \"is_child\",\n", " \"p_id_ehepartner\": \"spouse_id\",\n", " \"p_id_elternteil_1\": \"parent_id_1\",\n", " \"p_id_elternteil_2\": \"parent_id_2\",\n", " },\n", " \"wohngeld\": {\n", " \"betrag_m_wthh\": 0.0,\n", " },\n", " \"kinderzuschlag\": {\n", " \"betrag_m_bg\": 0.0,\n", " },\n", " \"elterngeld\": {\n", " \"betrag_m\": 0.0,\n", " },\n", " \"arbeitslosengeld_2\": {\n", " \"betrag_m_bg\": 0.0,\n", " },\n", " \"kindergeld\": {\n", " \"in_ausbildung\": \"in_training\",\n", " \"p_id_empfänger\": \"id_recipient_child_allowance\",\n", " },\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In practice, you would probably want to save the template above to disk (e.g. as a yaml\n", "file) and edit it there. Then you can read in the file and use its content as the\n", "mapper.\n", "\n", "Note: When writing and reading the template to your disk, don't forget to allow for\n", "unicode characters. This is important because many transfers have Umlaute in their\n", "names. An example could look like this:\n", "\n", "```python\n", "import yaml\n", "\n", "# Write the template to your disk...\n", "with PATH_FOR_TEMPLATE.open(\"w\") as f:\n", " yaml.dump(TEMPLATE, f, allow_unicode=True)\n", "\n", "# Edit the leafs in the template and then read it back in\n", "with PATH_FOR_TEMPLATE.open(\"r\") as f:\n", " MAPPER = {yaml.load(f, allow_unicode=True)}\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculating taxes and transfers\n", "\n", "Just as for taxes and transfers, GETTSIM's `main` function is powered by a DAG. This\n", "comes with the advantages that seasoned GETTSIM users already know from the DAG\n", "representing the taxes and transfers system:\n", "- Users can select any part of the DAG as a target. This means that users can access\n", " any intermediate objects.\n", "- Users can feed any part of the DAG as input. This means that users can overwrite\n", " specific parts of the DAG (e.g. the policy environment).\n", "- Users can decide which parts of the DAG not to compute. For example, users can choose\n", " not to perform safety checks on the input data. This means that GETTSIM is quicker in\n", " computing the result (at the expense of informative errors).\n", "\n", "First, we look at the one-stop shop: computing the targets defined above using the input\n", "data. In a second example, we manipulate the policy environment to see why the interface\n", "DAG is useful.\n", "\n", "### Simple computation\n", "\n", "Let's calculate taxes and transfers first:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = main(\n", " policy_date_str=\"2025-01-01\",\n", " input_data=InputData.df_and_mapper(\n", " df=DATA,\n", " mapper=MAPPER,\n", " ),\n", " main_target=MainTarget.results.df_with_mapper,\n", " tt_targets=TTTargets(tree=TT_TARGETS),\n", " include_warn_nodes=False,\n", ")\n", "result.T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Manipulating the policy environment\n", "\n", "First, we obtain the policy environment for the policy date we're interested in. Similar\n", "to above, we call the `main` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "status_quo = main(\n", " policy_date_str=\"2025-01-01\",\n", " main_target=MainTarget.policy_environment,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us modify the policy environment by increasing the contribution rate of the public\n", "pension insurance by 1 percentage point. \n", "\n", "The first step is to create a copy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "increased_rate = copy_environment(status_quo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The contribution rate is a `ScalarParam` object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "type(status_quo[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We get the current `value` of the `ScalarParam` out. We then inject a new `ScalarParam` object into the same place of `policy_environment`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "old_beitragssatz = status_quo[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"]\n", "increased_rate[\"sozialversicherung\"][\"rente\"][\"beitrag\"][\"beitragssatz\"] = (\n", " tt.ScalarParam(value=old_beitragssatz.value + 0.01)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute taxes and transfers with the increased contribution rate:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = main(\n", " main_target=MainTarget.results.df_with_mapper,\n", " policy_date_str=\"2025-01-01\",\n", " policy_environment=increased_rate,\n", " input_data=InputData.df_and_mapper(\n", " df=DATA,\n", " mapper=MAPPER,\n", " ),\n", " tt_targets=TTTargets(\n", " tree=TT_TARGETS,\n", " ),\n", " include_warn_nodes=False,\n", ")\n", "result.T" ] } ], "metadata": { "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 2 }