{ "cells": [ { "cell_type": "markdown", "id": "b69db3b2", "metadata": {}, "source": [ "# Combinar fuentes de datos\n", "\n", "En las bases de datos relacionales, combinar tablas es una de las tareas más comunes y necesarias. Por lo general, la información se divide para hacerla más fácil de gestionar y por ello al recuperarla necesitamos combinarlas para obtener un resultado coherente en una sola tabla.\n", "\n", "Para este ejercicio necesitaremos dos fuentes de datos, así que aprovecharé el \"Catálogo Único de Claves de Áreas Geoestadísticas Estatales, Municipales y Localidades\" que proveé el [INEGI](https://www.inegi.org.mx/app/ageeml/) para obtener las coordenadas geográficas de los municipios que se anotan en el conjunto de datos por Covid." ] }, { "cell_type": "code", "execution_count": 1, "id": "21666a7e", "metadata": { "tags": [ "remove-cell" ] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/jairoantonio/opt/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3444: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] } ], "source": [ "import pandas as pd\n", "try:\n", " covid_nacional = pd.read_csv('../data/casos_nacionales_covid-19_2022_semestre1.csv')\n", "except FileNotFoundError:\n", " covid_nacional = pd.read_csv('../datos/casos_nacionales_covid-19_2022_semestre1.csv')\n", "covid_nacional.rename(columns={\n", " \"entidad_nac\": \"entidad_nacimiento\",\n", " \"entidad_res\": \"entidad_residencia\",\n", " \"municipio_res\": \"municipio_residencia\"\n", "}, inplace=True)" ] }, { "cell_type": "code", "execution_count": 2, "id": "1ffb465c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(300083, 19)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MapaCve_EntNom_EntNom_AbrCve_MunNom_MunCve_LocNom_LocÁmbitoLatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
0100100011AguascalientesAgs.1Aguascalientes1AguascalientesU21°52´47.362N\"102°17´45.768W\"21.879823-102.2960471878F13D19863893419168444725246259
1100100941AguascalientesAgs.1Aguascalientes94Granja AdelitaR21°52´18.749N\"102°22´24.710W\"21.871875-102.3735311901F13D185**2
2100100961AguascalientesAgs.1Aguascalientes96Agua AzulR21°53´01.522N\"102°21´25.639W\"21.883756-102.3571221861F13D1841241712
3100101001AguascalientesAgs.1Aguascalientes100Rancho AlegreR21°51´16.556N\"102°22´21.884W\"21.854599-102.3727461879F13D180000
4100101021AguascalientesAgs.1Aguascalientes102Los Arbolitos [Rancho]R21°46´48.650N\"102°21´26.261W\"21.780181-102.3572951861F13D188**2
\n", "
" ], "text/plain": [ " Mapa Cve_Ent Nom_Ent Nom_Abr Cve_Mun Nom_Mun \\\n", "0 10010001 1 Aguascalientes Ags. 1 Aguascalientes \n", "1 10010094 1 Aguascalientes Ags. 1 Aguascalientes \n", "2 10010096 1 Aguascalientes Ags. 1 Aguascalientes \n", "3 10010100 1 Aguascalientes Ags. 1 Aguascalientes \n", "4 10010102 1 Aguascalientes Ags. 1 Aguascalientes \n", "\n", " Cve_Loc Nom_Loc Ámbito Latitud Longitud \\\n", "0 1 Aguascalientes U 21°52´47.362N\" 102°17´45.768W\" \n", "1 94 Granja Adelita R 21°52´18.749N\" 102°22´24.710W\" \n", "2 96 Agua Azul R 21°53´01.522N\" 102°21´25.639W\" \n", "3 100 Rancho Alegre R 21°51´16.556N\" 102°22´21.884W\" \n", "4 102 Los Arbolitos [Rancho] R 21°46´48.650N\" 102°21´26.261W\" \n", "\n", " Lat_Decimal Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina \\\n", "0 21.879823 -102.296047 1878 F13D19 863893 419168 \n", "1 21.871875 -102.373531 1901 F13D18 5 * \n", "2 21.883756 -102.357122 1861 F13D18 41 24 \n", "3 21.854599 -102.372746 1879 F13D18 0 0 \n", "4 21.780181 -102.357295 1861 F13D18 8 * \n", "\n", " Pob_Femenina Total De Viviendas Habitadas \n", "0 444725 246259 \n", "1 * 2 \n", "2 17 12 \n", "3 0 0 \n", "4 * 2 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "ruta_areas_inegi = '../data/AGEEML_2022842026272.csv'\n", "areas_inegi = pd.read_csv(ruta_areas_inegi)\n", "print(areas_inegi.shape)\n", "areas_inegi.head()" ] }, { "cell_type": "markdown", "id": "c0e05629", "metadata": {}, "source": [ "Verás que nuestro nuevo conjunto de datos contiene 300,083 filas y 19 columnas. Además de los datos correspondientes a las coordenadas geográficas tenemos otros datos como la altitud, el total de población (total y por sexos), el total de viviendas habitadas, entre otros datos relacionados con claves de entidades.\n", "\n", "Tenemos como dato común los municipios de residencia de los casos por Covid y el nombre del municipio (`Nom_Mun`) de los datos del INEGI. Ya que aprendimos a renombrar las columnas, aprovechemos para renombrar las columnas de las coordenadas de tal manera que podamos crear una columna común sobre la cual hacer la unión." ] }, { "cell_type": "code", "execution_count": 3, "id": "b214b160", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MapaCve_EntNom_EntNom_AbrCve_Munmunicipio_residenciaCve_LocNom_LocÁmbitoLatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
0100100011AguascalientesAgs.1Aguascalientes1AguascalientesU21°52´47.362N\"102°17´45.768W\"21.879823-102.2960471878F13D19863893419168444725246259
1100100941AguascalientesAgs.1Aguascalientes94Granja AdelitaR21°52´18.749N\"102°22´24.710W\"21.871875-102.3735311901F13D185**2
2100100961AguascalientesAgs.1Aguascalientes96Agua AzulR21°53´01.522N\"102°21´25.639W\"21.883756-102.3571221861F13D1841241712
3100101001AguascalientesAgs.1Aguascalientes100Rancho AlegreR21°51´16.556N\"102°22´21.884W\"21.854599-102.3727461879F13D180000
4100101021AguascalientesAgs.1Aguascalientes102Los Arbolitos [Rancho]R21°46´48.650N\"102°21´26.261W\"21.780181-102.3572951861F13D188**2
\n", "
" ], "text/plain": [ " Mapa Cve_Ent Nom_Ent Nom_Abr Cve_Mun municipio_residencia \\\n", "0 10010001 1 Aguascalientes Ags. 1 Aguascalientes \n", "1 10010094 1 Aguascalientes Ags. 1 Aguascalientes \n", "2 10010096 1 Aguascalientes Ags. 1 Aguascalientes \n", "3 10010100 1 Aguascalientes Ags. 1 Aguascalientes \n", "4 10010102 1 Aguascalientes Ags. 1 Aguascalientes \n", "\n", " Cve_Loc Nom_Loc Ámbito Latitud Longitud \\\n", "0 1 Aguascalientes U 21°52´47.362N\" 102°17´45.768W\" \n", "1 94 Granja Adelita R 21°52´18.749N\" 102°22´24.710W\" \n", "2 96 Agua Azul R 21°53´01.522N\" 102°21´25.639W\" \n", "3 100 Rancho Alegre R 21°51´16.556N\" 102°22´21.884W\" \n", "4 102 Los Arbolitos [Rancho] R 21°46´48.650N\" 102°21´26.261W\" \n", "\n", " Lat_Decimal Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina \\\n", "0 21.879823 -102.296047 1878 F13D19 863893 419168 \n", "1 21.871875 -102.373531 1901 F13D18 5 * \n", "2 21.883756 -102.357122 1861 F13D18 41 24 \n", "3 21.854599 -102.372746 1879 F13D18 0 0 \n", "4 21.780181 -102.357295 1861 F13D18 8 * \n", "\n", " Pob_Femenina Total De Viviendas Habitadas \n", "0 444725 246259 \n", "1 * 2 \n", "2 17 12 \n", "3 0 0 \n", "4 * 2 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "areas_inegi.rename(\n", " columns={'Nom_Mun':'municipio_residencia'}, # recordemos que cambiamos el nombre de la columna en el ejercicio anterior\n", " inplace=True)\n", "areas_inegi.head()" ] }, { "cell_type": "markdown", "id": "3c08766a", "metadata": {}, "source": [ "Ahora sí podremos empezar a \"jugar\" con nuestros datos para lograr un conjunto de datos con la información de geolocalización.\n", "\n", "## \"Inner\" merge\n", "\n", "De manera predeterminada, `pandas` combina nuestros datos con una unión \"inner\", es decir, que encuentra la información común de ambos conjuntos de datos y excluye la que no está en ambos. Podemos representar lo anterior a través de este diagrama de Venn:\n", "\n", "```{image} ../_static/imgs/merges/inner.jpeg\n", ":align: center\n", "```\n", "\n", "Para hacer la unión de nuestros datos usaremos la siguiente sintaxis:\n", "\n", "```python\n", "pd.merge(left, right, how='inner')\n", "```\n", "\n", "Lo cual se traduce en:" ] }, { "cell_type": "code", "execution_count": 4, "id": "c9b8efa5", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...LatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
\n", "

0 rows × 59 columns

\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Unnamed: 0, fecha_actualizacion, id_registro, origen, sector, entidad_um, sexo, entidad_nacimiento, entidad_residencia, municipio_residencia, tipo_paciente, fecha_ingreso, fecha_sintomas, fecha_def, intubado, neumonia, edad, nacionalidad, embarazo, habla_lengua_indig, indigena, diabetes, epoc, asma, inmusupr, hipertension, otra_com, cardiovascular, obesidad, renal_cronica, tabaquismo, otro_caso, toma_muestra_lab, resultado_lab, toma_muestra_antigeno, resultado_antigeno, clasificacion_final, migrante, pais_nacionalidad, pais_origen, uci, Mapa, Cve_Ent, Nom_Ent, Nom_Abr, Cve_Mun, Cve_Loc, Nom_Loc, Ámbito, Latitud, Longitud, Lat_Decimal, Lon_Decimal, Altitud, Cve_Carta, Pob_Total, Pob_Masculina, Pob_Femenina, Total De Viviendas Habitadas]\n", "Index: []\n", "\n", "[0 rows x 59 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inner_merge = pd.merge(covid_nacional, areas_inegi, how='inner', on='municipio_residencia')\n", "inner_merge.head()" ] }, { "cell_type": "markdown", "id": "a65b480f", "metadata": {}, "source": [ "Pero ¡este no es el resultado que esperábamos! ¿Por qué nos regresa una tabla vacía pero con 59 columnas?\n", "\n", "Esto es fácil de identificar: `pandas` no encuentra datos comunes porque la información del conjunto de datos de `covid_nacional` está escrito en mayúsculas sostenidas y la información del conjunto de datos de `areas_inegi` está escrito en mayúsculas y minúsculas. Más adelante veremos con mayor detalle como \"normalizar\" nuestros datos para evitar estas inconsistencias.\n", "\n", "Por lo pronto, podemos transformar nuestros datos a minúsculas en la columna `municipio_residencia` tanto en `covid_nacional` como en `areas_inegi` para que podamos hacer la unión. Afortunadamente, ambos conjuntos de datos conservan la acentuación gráfica de los datos, lo que nos facilita su unificación." ] }, { "cell_type": "code", "execution_count": 5, "id": "76e4c85d", "metadata": {}, "outputs": [], "source": [ "covid_nacional['municipio_residencia'] = covid_nacional['municipio_residencia'].str.lower()\n", "areas_inegi['municipio_residencia'] = areas_inegi['municipio_residencia'].str.lower()" ] }, { "cell_type": "markdown", "id": "b39be36c", "metadata": {}, "source": [ "Ahora, apliquemos nuevamente la unión a nuestro conjunto de datos. Pero primero, vamos a reducir nuestro conjunto de datos para hacerlo más manejable.\n", "\n", "```{admonition} ¿Por qué reducimos los datos?\n", ":class: tip\n", "Al experimentar con grandes conjuntos de datos es altamente recomendable ser amable con los recursos computacionales. Si hacemos estos ejercicios directamente con todo el conjunto de datos nos tomará más tiempo y estaremos expuestos a errores de memoria.\n", "```" ] }, { "cell_type": "code", "execution_count": 6, "id": "f5afb9cb", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(624, 59)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...LatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
082022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
182022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°26´28.806N\"099°20´21.556W\"19.441335-99.3393212846E14A389920480251182404
282022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°28´32.024N\"099°20´39.833W\"19.475562-99.3443982738E14A38359517731822942
382022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°26´21.138N\"099°22´38.200W\"19.439205-99.3772783298E14A3850282221
482022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°29´44.683N\"099°18´19.417W\"19.495745-99.3053942501E14A391013490523255
\n", "

5 rows × 59 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "0 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "1 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "2 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "3 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "4 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "\n", " entidad_um sexo entidad_nacimiento entidad_residencia \\\n", "0 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "1 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "2 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "3 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "4 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "\n", " municipio_residencia ... Latitud Longitud Lat_Decimal \\\n", "0 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "1 naucalpan de juárez ... 19°26´28.806N\" 099°20´21.556W\" 19.441335 \n", "2 naucalpan de juárez ... 19°28´32.024N\" 099°20´39.833W\" 19.475562 \n", "3 naucalpan de juárez ... 19°26´21.138N\" 099°22´38.200W\" 19.439205 \n", "4 naucalpan de juárez ... 19°29´44.683N\" 099°18´19.417W\" 19.495745 \n", "\n", " Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina Pob_Femenina \\\n", "0 -99.233218 2280 E14A39 776220 373698 402522 \n", "1 -99.339321 2846 E14A38 9920 4802 5118 \n", "2 -99.344398 2738 E14A38 3595 1773 1822 \n", "3 -99.377278 3298 E14A38 50 28 22 \n", "4 -99.305394 2501 E14A39 1013 490 523 \n", "\n", " Total De Viviendas Habitadas \n", "0 225509 \n", "1 2404 \n", "2 942 \n", "3 21 \n", "4 255 \n", "\n", "[5 rows x 59 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional_tm = covid_nacional[:100] # reducimos el conjunto de datos a 100 filas\n", "\n", "inner_merge = pd.merge(covid_nacional_tm, areas_inegi, how='inner', on='municipio_residencia')\n", "print(inner_merge.shape)\n", "inner_merge.head()" ] }, { "cell_type": "markdown", "id": "c926bc17", "metadata": {}, "source": [ "Ahora estamos enfrentándonos a otro problema: ¡el resultado de la unión son 624 filas! ¿Por qué? Esto se debe a que los datos de `municipio_residencia` no están asociados con un campo único. Esto lo podemos ver aquí:" ] }, { "cell_type": "code", "execution_count": 7, "id": "a70931b8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MapaCve_EntNom_EntNom_AbrCve_Munmunicipio_residenciaCve_LocNom_LocÁmbitoLatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
14080515025000115MéxicoMex.25chalco1Chalco de Díaz CovarrubiasU19°15´40.074N\"098°53´44.357W\"19.261132-98.8956552238E14B31174704832079149745578
14080615025000215MéxicoMex.25chalco2La Candelaria TlapalaU19°14´26.710N\"098°50´49.712W\"19.240753-98.8471422254E14B419446462048262428
14080715025000515MéxicoMex.25chalco5San Gregorio CuautzingoU19°15´25.528N\"098°51´29.893W\"19.257091-98.8583042244E14B318485410943762134
14080815025000915MéxicoMex.25chalco9Instituto Damián (Ex-Hacienda San Juan de Dios)R19°14´16.312N\"098°54´32.585W\"19.237864-98.9090512241E14B411**1
14080915025001015MéxicoMex.25chalco10San Juan TezompaU19°12´25.717N\"098°57´37.832W\"19.207144-98.9605092239E14B4113127642267053453
............................................................
14087215025014715MéxicoMex.25chalco147Hacienda San Juan [Conjunto Urbano]U19°16´08.352N\"098°50´25.142W\"19.268987-98.8403172257E14B311623774849458
14087315025014815MéxicoMex.25chalco148Villas de San Martín [Conjunto Urbano]U19°16´03.967N\"098°50´09.801W\"19.267769-98.8360562258E14B315714278629281698
14087415025014915MéxicoMex.25chalco149Los Héroes Chalco [Conjunto Urbano]U19°15´37.533N\"098°50´29.037W\"19.260426-98.8413992253E14B313427716552177259965
14087515025015115MéxicoMex.25chalco151Valle Plateado [Fraccionamiento]R19°13´06.312N\"098°47´15.789W\"19.218420-98.7877192449E14B4167363120
14087615025015215MéxicoMex.25chalco152La PulidaR19°13´30.678N\"098°49´25.945W\"19.225188-98.8238742292E14B41154688639
\n", "

72 rows × 19 columns

\n", "
" ], "text/plain": [ " Mapa Cve_Ent Nom_Ent Nom_Abr Cve_Mun municipio_residencia \\\n", "140805 150250001 15 México Mex. 25 chalco \n", "140806 150250002 15 México Mex. 25 chalco \n", "140807 150250005 15 México Mex. 25 chalco \n", "140808 150250009 15 México Mex. 25 chalco \n", "140809 150250010 15 México Mex. 25 chalco \n", "... ... ... ... ... ... ... \n", "140872 150250147 15 México Mex. 25 chalco \n", "140873 150250148 15 México Mex. 25 chalco \n", "140874 150250149 15 México Mex. 25 chalco \n", "140875 150250151 15 México Mex. 25 chalco \n", "140876 150250152 15 México Mex. 25 chalco \n", "\n", " Cve_Loc Nom_Loc Ámbito \\\n", "140805 1 Chalco de Díaz Covarrubias U \n", "140806 2 La Candelaria Tlapala U \n", "140807 5 San Gregorio Cuautzingo U \n", "140808 9 Instituto Damián (Ex-Hacienda San Juan de Dios) R \n", "140809 10 San Juan Tezompa U \n", "... ... ... ... \n", "140872 147 Hacienda San Juan [Conjunto Urbano] U \n", "140873 148 Villas de San Martín [Conjunto Urbano] U \n", "140874 149 Los Héroes Chalco [Conjunto Urbano] U \n", "140875 151 Valle Plateado [Fraccionamiento] R \n", "140876 152 La Pulida R \n", "\n", " Latitud Longitud Lat_Decimal Lon_Decimal Altitud \\\n", "140805 19°15´40.074N\" 098°53´44.357W\" 19.261132 -98.895655 2238 \n", "140806 19°14´26.710N\" 098°50´49.712W\" 19.240753 -98.847142 2254 \n", "140807 19°15´25.528N\" 098°51´29.893W\" 19.257091 -98.858304 2244 \n", "140808 19°14´16.312N\" 098°54´32.585W\" 19.237864 -98.909051 2241 \n", "140809 19°12´25.717N\" 098°57´37.832W\" 19.207144 -98.960509 2239 \n", "... ... ... ... ... ... \n", "140872 19°16´08.352N\" 098°50´25.142W\" 19.268987 -98.840317 2257 \n", "140873 19°16´03.967N\" 098°50´09.801W\" 19.267769 -98.836056 2258 \n", "140874 19°15´37.533N\" 098°50´29.037W\" 19.260426 -98.841399 2253 \n", "140875 19°13´06.312N\" 098°47´15.789W\" 19.218420 -98.787719 2449 \n", "140876 19°13´30.678N\" 098°49´25.945W\" 19.225188 -98.823874 2292 \n", "\n", " Cve_Carta Pob_Total Pob_Masculina Pob_Femenina \\\n", "140805 E14B31 174704 83207 91497 \n", "140806 E14B41 9446 4620 4826 \n", "140807 E14B31 8485 4109 4376 \n", "140808 E14B41 1 * * \n", "140809 E14B41 13127 6422 6705 \n", "... ... ... ... ... \n", "140872 E14B31 1623 774 849 \n", "140873 E14B31 5714 2786 2928 \n", "140874 E14B31 34277 16552 17725 \n", "140875 E14B41 67 36 31 \n", "140876 E14B41 154 68 86 \n", "\n", " Total De Viviendas Habitadas \n", "140805 45578 \n", "140806 2428 \n", "140807 2134 \n", "140808 1 \n", "140809 3453 \n", "... ... \n", "140872 458 \n", "140873 1698 \n", "140874 9965 \n", "140875 20 \n", "140876 39 \n", "\n", "[72 rows x 19 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "areas_inegi.loc[areas_inegi['municipio_residencia'] == 'chalco']" ] }, { "cell_type": "markdown", "id": "45f7db7d", "metadata": {}, "source": [ "Al hacer el merge, el conjunto de datos se multiplica por las coincidencias, encontrándonos así con una tabla que nos da información inconsistente.\n", "\n", "```{admonition} ¡importante!\n", ":class: tip\n", "Un error común al hacer un merge es terminar con un conjunto de datos mucho mayor del esperado. Esto se debe a que provienen de fuentes diferentes y no se puede predecir con exactitud cuántos datos se obtendrán.\n", "Por esta razón es **fundamental** asegurarse de que el conjunto de datos resultante sea el esperado. Una manera práctica de hacerlo es valiéndose del módulo shape. Si el resultado es un número inconsistente (por ejemplo, que 100 casos de Covid se convirtieran en 624) debemos revisar dónde se encuentra el error. \n", "\n", "Lastimosamente no existe una fórmula que abarque todos los casos, así que este es un paso que depende en buena medida de nuestra capacidad para detectar errores.\n", "```\n", "\n", "Tendremos finalmente que hacer una depuración de los datos de `areas_inegi_tm` para que coincida con nuestros datos. Encontramos que el registro de datos de `covid_nacional` usa como `municipio_residencia` el nombre principal del municipio, en este caso, coincide con la clave `1` del campo `Cve_Loc` en `areas_inegi`. Segmentaremos nuestro conjunto de datos valiéndonos del método `.loc` y podremos hacer nuevamente nuestro 'inner merge'" ] }, { "cell_type": "code", "execution_count": 8, "id": "57368622", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(15, 59)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...LatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
082022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
1112022-06-2604fc18FUERA DE USMERSEDENACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICOMÉXICOatizapán...19°10´36.328N\"099°29´16.591W\"19.176758-99.4879422590E14A4810873525056232342
2242022-06-263a5f83FUERA DE USMERSSACIUDAD DE MÉXICOMUJERCIUDAD DE MÉXICOMÉXICOzumpango...19°47´49.643N\"099°06´02.282W\"19.797123-99.1006342261E14A1953362260582730413659
3252022-06-263d59eaFUERA DE USMERSSACIUDAD DE MÉXICOMUJERGUERREROMÉXICOzumpango...19°47´49.643N\"099°06´02.282W\"19.797123-99.1006342261E14A1953362260582730413659
4262022-06-26284743USMERISSSTECIUDAD DE MÉXICOMUJERCIUDAD DE MÉXICOMÉXICOnezahualcóyotl...19°24´31.548N\"099°01´05.520W\"19.408763-99.0182002262E14A391072676515678556998297975
\n", "

5 rows × 59 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "0 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "1 11 2022-06-26 04fc18 FUERA DE USMER SEDENA \n", "2 24 2022-06-26 3a5f83 FUERA DE USMER SSA \n", "3 25 2022-06-26 3d59ea FUERA DE USMER SSA \n", "4 26 2022-06-26 284743 USMER ISSSTE \n", "\n", " entidad_um sexo entidad_nacimiento entidad_residencia \\\n", "0 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "1 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO MÉXICO \n", "2 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO MÉXICO \n", "3 CIUDAD DE MÉXICO MUJER GUERRERO MÉXICO \n", "4 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO MÉXICO \n", "\n", " municipio_residencia ... Latitud Longitud Lat_Decimal \\\n", "0 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "1 atizapán ... 19°10´36.328N\" 099°29´16.591W\" 19.176758 \n", "2 zumpango ... 19°47´49.643N\" 099°06´02.282W\" 19.797123 \n", "3 zumpango ... 19°47´49.643N\" 099°06´02.282W\" 19.797123 \n", "4 nezahualcóyotl ... 19°24´31.548N\" 099°01´05.520W\" 19.408763 \n", "\n", " Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina Pob_Femenina \\\n", "0 -99.233218 2280 E14A39 776220 373698 402522 \n", "1 -99.487942 2590 E14A48 10873 5250 5623 \n", "2 -99.100634 2261 E14A19 53362 26058 27304 \n", "3 -99.100634 2261 E14A19 53362 26058 27304 \n", "4 -99.018200 2262 E14A39 1072676 515678 556998 \n", "\n", " Total De Viviendas Habitadas \n", "0 225509 \n", "1 2342 \n", "2 13659 \n", "3 13659 \n", "4 297975 \n", "\n", "[5 rows x 59 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# segmentamos nuestro conjunto de datos de areas_inegi\n", "areas_inegi_tm = areas_inegi.loc[areas_inegi['Cve_Loc'] == 1]\n", "\n", "# realizamos nuestro merge\n", "inner_merge = pd.merge(covid_nacional_tm, areas_inegi_tm, how='inner', on='municipio_residencia')\n", "print(inner_merge.shape)\n", "inner_merge.head()" ] }, { "cell_type": "markdown", "id": "f888a1a8", "metadata": {}, "source": [ "¡Eureka! Tenemos una tabla con el resultado esperado. ¿Cómo lo sabemos? (15 filas parece un número sospechoso) Porque coincide con el número de valores no nulos que nos regresa la columna `municipio_residencia` en `covid_nacional_tm`." ] }, { "cell_type": "code", "execution_count": 9, "id": "70a6771c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "nezahualcóyotl 5\n", "zumpango 2\n", "chalco 2\n", "naucalpan de juárez 1\n", "atizapán 1\n", "atlautla 1\n", "ciudad fernández 1\n", "ecatepec de morelos 1\n", "huixquilucan 1\n", "Name: municipio_residencia, dtype: int64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional_tm['municipio_residencia'].value_counts()" ] }, { "cell_type": "markdown", "id": "59a7b30c", "metadata": {}, "source": [ "## Outer Merge\n", "\n", "La unión \"hacia afuera\" es todo lo contrario a un \"inner merge\". En este caso, combinamos todos los campos de `covid_nacional` con todos los campos de `areas_inegi`, incluyendo aquellos que no coincidan. Así lo vemos en un diagrama de Venn:\n", "\n", "```{image} ../_static/imgs/merges/outer.jpeg\n", ":align: center\n", "```\n", "\n", "Apliquemos un \"outer merge\" a nuestro ejemplo anterior para ver los resultados:" ] }, { "cell_type": "code", "execution_count": 10, "id": "14854150", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2564, 59)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...LatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
01.02022-06-260793b8FUERA DE USMERSSACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICONaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
12.02022-06-260fef08USMERSSACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICONaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
23.02022-06-2611e31aFUERA DE USMERSSACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICONaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
34.02022-06-260741e4FUERA DE USMERISSSTECIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICONaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
45.02022-06-2613c92bFUERA DE USMERSSACIUDAD DE MÉXICOMUJERCIUDAD DE MÉXICONaNNaN...NaNNaNNaNNaNNaNNaNNaNNaNNaNNaN
\n", "

5 rows × 59 columns

\n", "
" ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "0 1.0 2022-06-26 0793b8 FUERA DE USMER SSA \n", "1 2.0 2022-06-26 0fef08 USMER SSA \n", "2 3.0 2022-06-26 11e31a FUERA DE USMER SSA \n", "3 4.0 2022-06-26 0741e4 FUERA DE USMER ISSSTE \n", "4 5.0 2022-06-26 13c92b FUERA DE USMER SSA \n", "\n", " entidad_um sexo entidad_nacimiento entidad_residencia \\\n", "0 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN \n", "1 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN \n", "2 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN \n", "3 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN \n", "4 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO NaN \n", "\n", " municipio_residencia ... Latitud Longitud Lat_Decimal Lon_Decimal Altitud \\\n", "0 NaN ... NaN NaN NaN NaN NaN \n", "1 NaN ... NaN NaN NaN NaN NaN \n", "2 NaN ... NaN NaN NaN NaN NaN \n", "3 NaN ... NaN NaN NaN NaN NaN \n", "4 NaN ... NaN NaN NaN NaN NaN \n", "\n", " Cve_Carta Pob_Total Pob_Masculina Pob_Femenina Total De Viviendas Habitadas \n", "0 NaN NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN NaN \n", "\n", "[5 rows x 59 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outer_merge = pd.merge(covid_nacional_tm, areas_inegi_tm, how='outer', on='municipio_residencia')\n", "print(outer_merge.shape)\n", "outer_merge.head()" ] }, { "cell_type": "markdown", "id": "6a978fd3", "metadata": {}, "source": [ "Vemos que el resultado son 2,564 filas. Es una cantidad bastante grande, pero no es un error. Lo que sucede en este caso, es que se combinan los datos que coincide con los valores de `municipio_residencia` y se combinan los datos que no coinciden. Donde no hay coincidencias, las celdas se llenan con valores nulos.\n", "\n", "Para nuestro ejemplo, una unión hacia afuera no es útil. Pero supongamos que tenemos un conjunto de datos de estudiantes con las calificaciones del primer trimestre:" ] }, { "cell_type": "code", "execution_count": 11, "id": "cad4acde", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NombreCalificaciones_T1
0Andrea10
1Berenice8
2Carlos9
\n", "
" ], "text/plain": [ " Nombre Calificaciones_T1\n", "0 Andrea 10\n", "1 Berenice 8\n", "2 Carlos 9" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "estudiantes_trimestre1 = pd.DataFrame(\n", " {'Nombre': [\"Andrea\", \"Berenice\", \"Carlos\"],\n", " 'Calificaciones_T1': [10, 8, 9]})\n", "estudiantes_trimestre1" ] }, { "cell_type": "markdown", "id": "b8514800", "metadata": {}, "source": [ "Ahora, queremos agregar las calificaciones de otro conjunto de datos correspondiente al segundo trimestre:" ] }, { "cell_type": "code", "execution_count": 12, "id": "502ee17f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NombreCalificaciones_T2
0Andrea9
1Berenice7
2Carlos8
\n", "
" ], "text/plain": [ " Nombre Calificaciones_T2\n", "0 Andrea 9\n", "1 Berenice 7\n", "2 Carlos 8" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "estudiantes_trimestre2 = pd.DataFrame(\n", " {'Nombre': [\"Andrea\", \"Berenice\", \"Carlos\"],\n", " 'Calificaciones_T2': [9, 7, 8]})\n", "estudiantes_trimestre2" ] }, { "cell_type": "markdown", "id": "2cb1fc9b", "metadata": {}, "source": [ "En este caso, un 'outer merge' es indicado:" ] }, { "cell_type": "code", "execution_count": 13, "id": "f3c14823", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NombreCalificaciones_T1Calificaciones_T2
0Andrea109
1Berenice87
2Carlos98
\n", "
" ], "text/plain": [ " Nombre Calificaciones_T1 Calificaciones_T2\n", "0 Andrea 10 9\n", "1 Berenice 8 7\n", "2 Carlos 9 8" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "outer_merge = pd.merge(estudiantes_trimestre1, estudiantes_trimestre2, how='outer', on='Nombre')\n", "outer_merge" ] }, { "cell_type": "markdown", "id": "35f1cf76", "metadata": {}, "source": [ "## A modo de cierre\n", "\n", "Es necesario que selecciones tu método de unión de acuerdo a la lógica de tus datos. Revisa los resultados de tus combinaciones y ajusta los datos para que lleguen al resultado esperado. En ocasiones esto no es un proceso simple, tienes que reducir los datos y hacerlos manejables para que sea mucho más sencillo llegar a encontrar el error. Todas estas son prácticas que debes tener en cuenta y que no dependen del lenguaje de programación que estes utilizando.\n", "\n", "## Para saber más\n", "\n", "Existen otros métodos de unión de conjuntos de datos como el 'left merge' y el 'right merge', el método 'join' o la concatenación ('concat'). Para no extendernos más, quisiera sugerirles el capítulo 2 del libro de {cite}`stepanek_thinking_2020` donde se explica de manera sintética cómo utilizar estos métodos." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "-all", "formats": "md:myst", "text_representation": { "extension": ".md", "format_name": "myst", "format_version": 0.13, "jupytext_version": "1.14.0" } }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" }, "source_map": [ 14, 22, 36, 42, 48, 53, 73, 76, 84, 87, 96, 103, 107, 109, 123, 131, 135, 137, 149, 153, 159, 164, 168, 173, 177, 180 ] }, "nbformat": 4, "nbformat_minor": 5 }