{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Project progress\n", "\n", "By this week, you should have a notebook similar to the one presented below. Make sure you carry out the operations needed to make your dataset as accurate as possible." ] }, { "cell_type": "markdown", "metadata": { "id": "H9FzFUrajfM7" }, "source": [ "# Importing data\n", "\n", "By \"importing data\" we mean the way we prepare the data source so that our program can read it.\n", "\n", "There are several ways to import the information. For example, we can simply use the same method we used with our `ejemplo-1.txt` file.\n", "\n", "Download the file you want to use into the Drive directory where you will store your data.\n", "\n", "As an example, I will use the national COVID-19 cases recorded daily during the first half of 2022: https://datos.cdmx.gob.mx/dataset/casos-asociados-a-covid-19/resource/e5f65f40-5904-492a-ae33-1ea98fb73d78?inner_span=True\n", "\n", "I download the CSV file to a directory on my computer and then upload it to my data directory in Google Drive.\n", "\n", "Back in our Google Colab notebook, I make sure Google Drive is mounted in my Colab session and look for the directory that contains my file. 
In my case: `'/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'`\n", "\n", "With those steps done, we can import the file:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "qnXNK7H2kz3M", "outputId": "f57572b9-4923-481e-d8c2-0fd5f8b172e4" }, "outputs": [ { "data": { "text/plain": [ "['\"\",\"fecha_actualizacion\",\"id_registro\",\"origen\",\"sector\",\"entidad_um\",\"sexo\",\"entidad_nac\",\"entidad_res\",\"municipio_res\",\"tipo_paciente\",\"fecha_ingreso\",\"fecha_sintomas\",\"fecha_def\",\"intubado\",\"neumonia\",\"edad\",\"nacionalidad\",\"embarazo\",\"habla_lengua_indig\",\"indigena\",\"diabetes\",\"epoc\",\"asma\",\"inmusupr\",\"hipertension\",\"otra_com\",\"cardiovascular\",\"obesidad\",\"renal_cronica\",\"tabaquismo\",\"otro_caso\",\"toma_muestra_lab\",\"resultado_lab\",\"toma_muestra_antigeno\",\"resultado_antigeno\",\"clasificacion_final\",\"migrante\",\"pais_nacionalidad\",\"pais_origen\",\"uci\"\\n']" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "datos = '/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'\n", "\n", "with open(datos, 'r') as f:\n", "    data = f.readlines(10)  # size hint of 10 bytes: reading stops after the first line, because the file is very large\n", "\n", "data" ] }, { "cell_type": "markdown", "metadata": { "id": "LukLwOCkpa7t" }, "source": [ "This way we have managed to load the file into our notebook, but it would be very cumbersome to manipulate. For this reason, it is preferable to use a library that helps us process these data. 
In our case, we will use `pandas`.\n", "\n", "To make our program work, we only need to import the library:\n", "\n", "`import pandas as pd`\n", "\n", "After that, we can open our file from Python:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 508 }, "id": "b4yv7auIqCt7", "outputId": "7492d24f-b249-4f1f-80a3-f15935a62551" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:3326: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.\n", " exec(code_obj, self.user_global_ns, self.user_ns)\n" ] }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "0 1 2022-06-26 0793b8 FUERA DE USMER SSA \n", "1 2 2022-06-26 0fef08 USMER SSA \n", "2 3 2022-06-26 11e31a FUERA DE USMER SSA \n", "3 4 2022-06-26 0741e4 FUERA DE USMER ISSSTE \n", "4 5 2022-06-26 13c92b FUERA DE USMER SSA \n", "\n", " entidad_um sexo entidad_nac entidad_res municipio_res ... \\\n", "0 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN NaN ... \n", "1 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN NaN ... \n", "2 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN NaN ... \n", "3 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN NaN ... \n", "4 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO NaN NaN ... \n", "\n", " otro_caso toma_muestra_lab resultado_lab \\\n", "0 NO NO NO APLICA (CASO SIN MUESTRA) \n", "1 NO SI POSITIVO A SARS-COV-2 \n", "2 NO NO NO APLICA (CASO SIN MUESTRA) \n", "3 NO SI RESULTADO NO ADECUADO \n", "4 SI NO NO APLICA (CASO SIN MUESTRA) \n", "\n", " toma_muestra_antigeno resultado_antigeno \\\n", "0 SI NEGATIVO A SARS-COV-2 \n", "1 NO NO APLICA (CASO SIN MUESTRA) \n", "2 SI NEGATIVO A SARS-COV-2 \n", "3 NO NO APLICA (CASO SIN MUESTRA) \n", "4 SI NEGATIVO A SARS-COV-2 \n", "\n", " clasificacion_final migrante pais_nacionalidad \\\n", "0 NEGATIVO A SARS-COV-2 NO ESPECIFICADO MÉXICO \n", "1 CASO DE SARS-COV-2 CONFIRMADO NO ESPECIFICADO MÉXICO \n", "2 NEGATIVO A SARS-COV-2 NO ESPECIFICADO MÉXICO \n", "3 NO REALIZADO POR LABORATORIO NO ESPECIFICADO MÉXICO \n", "4 NEGATIVO A SARS-COV-2 NO ESPECIFICADO MÉXICO \n", "\n", " pais_origen uci \n", "0 NO APLICA NO APLICA \n", "1 NO APLICA NO APLICA \n", "2 NO APLICA NO APLICA \n", "3 NO APLICA NO \n", "4 NO APLICA NO APLICA \n", "\n", "[5 rows x 41 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "covid_nacional = pd.read_csv(datos)\n", "covid_nacional.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "Nu3Ce4XbqZ2J" }, "source": [ "With this, our file is ready to be processed :)\n", "\n", "The `DtypeWarning` printed above does not stop the import. It reports that column 13 (`fecha_def`) contains values of mixed types." ] }, { "cell_type": "markdown", "metadata": { "id": "zlne2GAtX-4M" }, "source": [ "# Data structure analysis and preparation\n", "\n", "## Describe the data source\n", "\n", "A simple description of the shape of the data source is the following:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "blLldK-XYOqQ", "outputId": "5c2f0bf3-43d5-4109-ccf8-22e7cd55332c" }, "outputs": [ { "data": { "text/plain": [ "1323501" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of rows\n", "filas = covid_nacional.shape[0]\n", "filas" ] }, { "cell_type": "markdown", "metadata": { "id": "9tB3RFOpcszB" }, "source": [ "With more than 1.3 million records, this data source is large enough to justify a distant reading of the information. A person could hardly make sense of it just by \"reading\" the rows of the table." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9X7l8HuLdAvE", "outputId": "973ca9e2-040a-4475-923d-0945276ae0bf" }, "outputs": [ { "data": { "text/plain": [ "41" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# number of columns\n", "columnas = covid_nacional.shape[1]\n", "columnas" ] }, { "cell_type": "markdown", "metadata": { "id": "Jv469AOXdGvl" }, "source": [ "We also see that the dataset has a considerable number of categories (41 columns). This means that a single data source is enough to compare columns against each other when analyzing the information." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4PUmMBdvda_-", "outputId": "9d941421-83a6-4631-a735-a2fd7d9bb1ef" }, "outputs": [ { "data": { "text/plain": [ "Index(['Unnamed: 0', 'fecha_actualizacion', 'id_registro', 'origen', 'sector',\n", " 'entidad_um', 'sexo', 'entidad_nac', 'entidad_res', 'municipio_res',\n", " 'tipo_paciente', 'fecha_ingreso', 'fecha_sintomas', 'fecha_def',\n", " 'intubado', 'neumonia', 'edad', 'nacionalidad', 'embarazo',\n", " 'habla_lengua_indig', 'indigena', 'diabetes', 'epoc', 'asma',\n", " 'inmusupr', 'hipertension', 'otra_com', 'cardiovascular', 'obesidad',\n", " 'renal_cronica', 'tabaquismo', 'otro_caso', 'toma_muestra_lab',\n", " 'resultado_lab', 'toma_muestra_antigeno', 'resultado_antigeno',\n", " 'clasificacion_final', 'migrante', 'pais_nacionalidad', 'pais_origen',\n", " 'uci'],\n", " dtype='object')" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# column names\n", "covid_nacional.columns" ] }, { "cell_type": "markdown", "metadata": { "id": "Mbs9hsyUdjlA" }, "source": [ "The column names help us identify the categories and the kind of data our data source contains.\n", "\n", "Not every data source names its columns meaningfully. In our example, it is fairly easy to tell what kind of information each category or column holds, and even which data type each one should ideally have." 
] }, { "cell_type": "markdown", "metadata": { "id": "rzqdf8zfd-qH" }, "source": [ "## Data types with `dtypes`" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "OrisVD9BeE4q", "outputId": "0c0203fb-cc96-47a2-a57f-cc6423ae668f" }, "outputs": [ { "data": { "text/plain": [ "Unnamed: 0 int64\n", "fecha_actualizacion object\n", "id_registro object\n", "origen object\n", "sector object\n", "entidad_um object\n", "sexo object\n", "entidad_nac object\n", "entidad_res object\n", "municipio_res object\n", "tipo_paciente object\n", "fecha_ingreso object\n", "fecha_sintomas object\n", "fecha_def object\n", "intubado object\n", "neumonia object\n", "edad int64\n", "nacionalidad object\n", "embarazo object\n", "habla_lengua_indig object\n", "indigena object\n", "diabetes object\n", "epoc object\n", "asma object\n", "inmusupr object\n", "hipertension object\n", "otra_com object\n", "cardiovascular object\n", "obesidad object\n", "renal_cronica object\n", "tabaquismo object\n", "otro_caso object\n", "toma_muestra_lab object\n", "resultado_lab object\n", "toma_muestra_antigeno object\n", "resultado_antigeno object\n", "clasificacion_final object\n", "migrante object\n", "pais_nacionalidad object\n", "pais_origen object\n", "uci object\n", "dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional.dtypes" ] }, { "cell_type": "markdown", "metadata": { "id": "mYTidlIaeJeG" }, "source": [ "Most columns are stored as `object`, which means they hold text, alphanumeric or mixed values.\n", "\n", "Some columns, such as the `fecha_*` ones, could be of type `datetime`, yet they are stored as `object`. 
Those columns will need to be converted before we can run date operations and build visualizations with them.\n", "\n", "## Describing the data with `describe()`" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 300 }, "id": "Xgau6A4jfIWY", "outputId": "59956d08-9c55-4dc1-d4e1-9c648302dc4c" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 edad\n", "count 1.323501e+06 1.323501e+06\n", "mean 6.617510e+05 3.774596e+01\n", "std 3.820620e+05 1.728453e+01\n", "min 1.000000e+00 0.000000e+00\n", "25% 3.308760e+05 2.500000e+01\n", "50% 6.617510e+05 3.600000e+01\n", "75% 9.926260e+05 5.000000e+01\n", "max 1.323501e+06 1.220000e+02" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional.describe()" ] }, { "cell_type": "markdown", "metadata": { "id": "_yamjA7-fMtU" }, "source": [ "By default, `describe()` summarizes only the numeric columns, which here are the two `int64` ones. Of these, only `edad` is useful to us, since `Unnamed: 0` is just a row index (a nominal value)." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 522 }, "id": "Opn-B5jyfm5z", "outputId": "13cb451f-82e8-4ed1-e720-956ea504ffc4" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "count 1.323501e+06 1323501 1323501 1323501 1323501 \n", "unique NaN 1 1323501 2 12 \n", "top NaN 2022-06-26 0793b8 FUERA DE USMER SSA \n", "freq NaN 1323501 1 1170267 793606 \n", "mean 6.617510e+05 NaN NaN NaN NaN \n", "std 3.820620e+05 NaN NaN NaN NaN \n", "min 1.000000e+00 NaN NaN NaN NaN \n", "25% 3.308760e+05 NaN NaN NaN NaN \n", "50% 6.617510e+05 NaN NaN NaN NaN \n", "75% 9.926260e+05 NaN NaN NaN NaN \n", "max 1.323501e+06 NaN NaN NaN NaN \n", "\n", " entidad_um sexo entidad_nac entidad_res \\\n", "count 1323501 1323501 1323501 149707 \n", "unique 32 2 33 23 \n", "top CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO MÉXICO \n", "freq 1314661 733991 1052272 133374 \n", "mean NaN NaN NaN NaN \n", "std NaN NaN NaN NaN \n", "min NaN NaN NaN NaN \n", "25% NaN NaN NaN NaN \n", "50% NaN NaN NaN NaN \n", "75% NaN NaN NaN NaN \n", "max NaN NaN NaN NaN \n", "\n", " municipio_res ... otro_caso toma_muestra_lab \\\n", "count 149707 ... 1323501 1323501 \n", "unique 1190 ... 3 2 \n", "top NEZAHUALCÓYOTL ... NO NO \n", "freq 26282 ... 848434 1152385 \n", "mean NaN ... NaN NaN \n", "std NaN ... NaN NaN \n", "min NaN ... NaN NaN \n", "25% NaN ... NaN NaN \n", "50% NaN ... NaN NaN \n", "75% NaN ... NaN NaN \n", "max NaN ... 
NaN NaN \n", "\n", " resultado_lab toma_muestra_antigeno \\\n", "count 1323501 1323501 \n", "unique 5 2 \n", "top NO APLICA (CASO SIN MUESTRA) SI \n", "freq 1152385 1204565 \n", "mean NaN NaN \n", "std NaN NaN \n", "min NaN NaN \n", "25% NaN NaN \n", "50% NaN NaN \n", "75% NaN NaN \n", "max NaN NaN \n", "\n", " resultado_antigeno clasificacion_final migrante \\\n", "count 1323501 1323501 1323501 \n", "unique 3 7 3 \n", "top NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 NO ESPECIFICADO \n", "freq 771647 792364 1305180 \n", "mean NaN NaN NaN \n", "std NaN NaN NaN \n", "min NaN NaN NaN \n", "25% NaN NaN NaN \n", "50% NaN NaN NaN \n", "75% NaN NaN NaN \n", "max NaN NaN NaN \n", "\n", " pais_nacionalidad pais_origen uci \n", "count 1323501 1320040 1323501 \n", "unique 122 1 4 \n", "top MÉXICO NO APLICA NO APLICA \n", "freq 1304673 1320040 1297093 \n", "mean NaN NaN NaN \n", "std NaN NaN NaN \n", "min NaN NaN NaN \n", "25% NaN NaN NaN \n", "50% NaN NaN NaN \n", "75% NaN NaN NaN \n", "max NaN NaN NaN \n", "\n", "[11 rows x 41 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional.describe(include='all')" ] }, { "cell_type": "markdown", "metadata": { "id": "NdoEF5mofwNe" }, "source": [ "The `include='all'` parameter forces `describe()` to cover every column.\n", "\n", "This lets us spot columns whose frequencies could be worth analyzing. For example, correlations between chronic conditions and test results (positive or negative), or the frequency of cases among migrants, women or indigenous people in relation to a geographic area.\n", "\n", "Because this data source is not georeferenced (we have the names of the municipalities, but not their latitude and longitude), we will need a second data source that lets us add that information." 
] }, { "cell_type": "markdown", "metadata": { "id": "MtLRzOfYWhTk" }, "source": [ "# Data processing\n", "\n", "## Data manipulation\n", "\n", "Using the `.iloc` indexer to select rows and columns by integer position:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "id": "FQDBACFAW3-7", "outputId": "ab3fcd26-cbaf-4146-ff81-07c7fb89bcae" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " fecha_actualizacion id_registro\n", "2 2022-06-26 11e31a\n", "3 2022-06-26 0741e4\n", "4 2022-06-26 13c92b\n", "5 2022-06-26 04f190\n", "6 2022-06-26 0a1655\n", ".. ... ...\n", "195 2022-06-26 485cdb\n", "196 2022-06-26 984dc6\n", "197 2022-06-26 4b5708\n", "198 2022-06-26 bb8b5b\n", "199 2022-06-26 cc68e2\n", "\n", "[198 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional.iloc[2:200, 1:3]" ] }, { "cell_type": "markdown", "metadata": { "id": "psE1lhRIXV-B" }, "source": [ "Using the `.loc` indexer with a boolean condition to select the rows that match it:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 852 }, "id": "TsH71JG9XbJE", "outputId": "304affe2-00b1-4ec8-e2b3-180698aa630c" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "252 253 2022-06-26 b94888 FUERA DE USMER SSA \n", "971 972 2022-06-26 d22ed2 USMER SSA \n", "979 980 2022-06-26 6a5061 USMER SSA \n", "5877 5878 2022-06-26 ac1990 FUERA DE USMER PRIVADA \n", "6666 6667 2022-06-26 8d5273 FUERA DE USMER SSA \n", "... ... ... ... ... ... \n", "1298223 1298224 2022-06-26 g16c3a9 FUERA DE USMER PRIVADA \n", "1305240 1305241 2022-06-26 g154063 FUERA DE USMER SSA \n", "1305279 1305280 2022-06-26 g1683fe FUERA DE USMER SSA \n", "1316685 1316686 2022-06-26 g0ebf9f FUERA DE USMER SSA \n", "1319864 1319865 2022-06-26 g093480 FUERA DE USMER SSA \n", "\n", " entidad_um sexo entidad_nac entidad_res municipio_res \\\n", "252 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "971 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "979 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "5877 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "6666 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "... ... ... ... ... ... \n", "1298223 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "1305240 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "1305279 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "1316685 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "1319864 CIUDAD DE MÉXICO MUJER NO ESPECIFICADO NaN NaN \n", "\n", " ... otro_caso toma_muestra_lab resultado_lab \\\n", "252 ... SI NO NO APLICA (CASO SIN MUESTRA) \n", "971 ... SI NO NO APLICA (CASO SIN MUESTRA) \n", "979 ... SI NO NO APLICA (CASO SIN MUESTRA) \n", "5877 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "6666 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "... ... ... ... ... \n", "1298223 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "1305240 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "1305279 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "1316685 ... NO NO NO APLICA (CASO SIN MUESTRA) \n", "1319864 ... 
NO NO NO APLICA (CASO SIN MUESTRA) \n", "\n", " toma_muestra_antigeno resultado_antigeno clasificacion_final \\\n", "252 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "971 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "979 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "5877 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "6666 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "... ... ... ... \n", "1298223 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "1305240 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "1305279 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "1316685 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "1319864 SI NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "\n", " migrante pais_nacionalidad pais_origen uci \n", "252 SI VENEZUELA NaN NO APLICA \n", "971 SI ESTADOS UNIDOS DE AMÉRICA NaN NO APLICA \n", "979 SI ESTADOS UNIDOS DE AMÉRICA NaN NO APLICA \n", "5877 SI ESTADOS UNIDOS DE AMÉRICA NaN NO APLICA \n", "6666 SI CUBA NaN NO APLICA \n", "... ... ... ... ... \n", "1298223 SI ITALIA NaN NO APLICA \n", "1305240 SI EL SALVADOR NaN NO APLICA \n", "1305279 SI GUATEMALA NaN NO APLICA \n", "1316685 SI REPÚBLICA DE HONDURAS NaN NO APLICA \n", "1319864 SI CHILE NaN NO APLICA \n", "\n", "[1611 rows x 41 columns]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "seleccion = covid_nacional.loc[(covid_nacional['sexo'] == 'MUJER') & (covid_nacional['migrante'] == 'SI')]\n", "seleccion" ] }, { "cell_type": "markdown", "metadata": { "id": "wqwpZgPDYwqG" }, "source": [ "We rename the columns so that the join between the two dataframes works correctly:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 358 }, "id": "mebho60TY5-r", "outputId": "1dd52513-4deb-4082-83ab-5bc06975bafb" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...otro_casotoma_muestra_labresultado_labtoma_muestra_antigenoresultado_antigenoclasificacion_finalmigrantepais_nacionalidadpais_origenuci
782022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICONAUCALPAN DE JUÁREZ...NONONO APLICA (CASO SIN MUESTRA)NONO APLICA (CASO SIN MUESTRA)CASO SOSPECHOSONO ESPECIFICADOMÉXICONO APLICANO APLICA
892022-06-260681f2FUERA DE USMERSSACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICONaNNaN...NONONO APLICA (CASO SIN MUESTRA)SINEGATIVO A SARS-COV-2NEGATIVO A SARS-COV-2NO ESPECIFICADOMÉXICONO APLICANO APLICA
9102022-06-260a98b4FUERA DE USMERSSACIUDAD DE MÉXICOMUJERMICHOACÁN DE OCAMPONaNNaN...NONONO APLICA (CASO SIN MUESTRA)SIPOSITIVO A SARS-COV-2CASO DE SARS-COV-2 CONFIRMADONO ESPECIFICADOMÉXICONO APLICANO APLICA
\n", "

3 rows × 41 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "7 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "8 9 2022-06-26 0681f2 FUERA DE USMER SSA \n", "9 10 2022-06-26 0a98b4 FUERA DE USMER SSA \n", "\n", " entidad_um sexo entidad_nacimiento entidad_residencia \\\n", "7 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "8 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO NaN \n", "9 CIUDAD DE MÉXICO MUJER MICHOACÁN DE OCAMPO NaN \n", "\n", " municipio_residencia ... otro_caso toma_muestra_lab \\\n", "7 NAUCALPAN DE JUÁREZ ... NO NO \n", "8 NaN ... NO NO \n", "9 NaN ... NO NO \n", "\n", " resultado_lab toma_muestra_antigeno \\\n", "7 NO APLICA (CASO SIN MUESTRA) NO \n", "8 NO APLICA (CASO SIN MUESTRA) SI \n", "9 NO APLICA (CASO SIN MUESTRA) SI \n", "\n", " resultado_antigeno clasificacion_final \\\n", "7 NO APLICA (CASO SIN MUESTRA) CASO SOSPECHOSO \n", "8 NEGATIVO A SARS-COV-2 NEGATIVO A SARS-COV-2 \n", "9 POSITIVO A SARS-COV-2 CASO DE SARS-COV-2 CONFIRMADO \n", "\n", " migrante pais_nacionalidad pais_origen uci \n", "7 NO ESPECIFICADO MÉXICO NO APLICA NO APLICA \n", "8 NO ESPECIFICADO MÉXICO NO APLICA NO APLICA \n", "9 NO ESPECIFICADO MÉXICO NO APLICA NO APLICA \n", "\n", "[3 rows x 41 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "covid_nacional.rename(columns={\n", "    \"entidad_nac\": \"entidad_nacimiento\",\n", "    \"entidad_res\": \"entidad_residencia\",\n", "    \"municipio_res\": \"municipio_residencia\"\n", "}, inplace=True)\n", "covid_nacional[7:10]" ] }, { "cell_type": "markdown", "metadata": { "id": "epndNHsqZAw5" }, "source": [ "## Merge\n", "\n", "A new dataset to use in the merge:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 392 }, "id": "H8vomfbAZIDU", "outputId": "9f7cbe5d-3976-4368-a4c5-57282e86bc14" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MapaCve_EntNom_EntNom_AbrCve_MunNom_MunCve_LocNom_LocÁmbitoLatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
0100100011AguascalientesAgs.1Aguascalientes1AguascalientesU21°52´47.362N\"102°17´45.768W\"21.879823-102.2960471878F13D19863893419168444725246259
1100100941AguascalientesAgs.1Aguascalientes94Granja AdelitaR21°52´18.749N\"102°22´24.710W\"21.871875-102.3735311901F13D185**2
2100100961AguascalientesAgs.1Aguascalientes96Agua AzulR21°53´01.522N\"102°21´25.639W\"21.883756-102.3571221861F13D1841241712
3100101001AguascalientesAgs.1Aguascalientes100Rancho AlegreR21°51´16.556N\"102°22´21.884W\"21.854599-102.3727461879F13D180000
4100101021AguascalientesAgs.1Aguascalientes102Los Arbolitos [Rancho]R21°46´48.650N\"102°21´26.261W\"21.780181-102.3572951861F13D188**2
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Mapa Cve_Ent Nom_Ent Nom_Abr Cve_Mun Nom_Mun \\\n", "0 10010001 1 Aguascalientes Ags. 1 Aguascalientes \n", "1 10010094 1 Aguascalientes Ags. 1 Aguascalientes \n", "2 10010096 1 Aguascalientes Ags. 1 Aguascalientes \n", "3 10010100 1 Aguascalientes Ags. 1 Aguascalientes \n", "4 10010102 1 Aguascalientes Ags. 1 Aguascalientes \n", "\n", " Cve_Loc Nom_Loc Ámbito Latitud Longitud \\\n", "0 1 Aguascalientes U 21°52´47.362N\" 102°17´45.768W\" \n", "1 94 Granja Adelita R 21°52´18.749N\" 102°22´24.710W\" \n", "2 96 Agua Azul R 21°53´01.522N\" 102°21´25.639W\" \n", "3 100 Rancho Alegre R 21°51´16.556N\" 102°22´21.884W\" \n", "4 102 Los Arbolitos [Rancho] R 21°46´48.650N\" 102°21´26.261W\" \n", "\n", " Lat_Decimal Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina \\\n", "0 21.879823 -102.296047 1878 F13D19 863893 419168 \n", "1 21.871875 -102.373531 1901 F13D18 5 * \n", "2 21.883756 -102.357122 1861 F13D18 41 24 \n", "3 21.854599 -102.372746 1879 F13D18 0 0 \n", "4 21.780181 -102.357295 1861 F13D18 8 * \n", "\n", " Pob_Femenina Total De Viviendas Habitadas \n", "0 444725 246259 \n", "1 * 2 \n", "2 17 12 \n", "3 0 0 \n", "4 * 2 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ruta_areas_inegi = '/content/drive/MyDrive/Colab Notebooks/curso_datos/AGEEML_2022842026272.csv'\n", "areas_inegi = pd.read_csv(ruta_areas_inegi)\n", "areas_inegi.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "XlOqlWihaI33" }, "source": [ "## Preliminary steps for the inner merge\n", "\n", "Geolocating our data requires an `inner merge` between the two datasets, but a few preparatory tasks come first:\n", "\n", "### 1. Rename the join column in the second dataframe" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 392 }, "id": "ojHsgKCwavdh", "outputId": "312f78fc-04f2-46e4-eeac-bb693031ea72" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MapaCve_EntNom_EntNom_AbrCve_Munmunicipio_residenciaCve_LocNom_LocÁmbitoLatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
0100100011AguascalientesAgs.1Aguascalientes1AguascalientesU21°52´47.362N\"102°17´45.768W\"21.879823-102.2960471878F13D19863893419168444725246259
1100100941AguascalientesAgs.1Aguascalientes94Granja AdelitaR21°52´18.749N\"102°22´24.710W\"21.871875-102.3735311901F13D185**2
2100100961AguascalientesAgs.1Aguascalientes96Agua AzulR21°53´01.522N\"102°21´25.639W\"21.883756-102.3571221861F13D1841241712
3100101001AguascalientesAgs.1Aguascalientes100Rancho AlegreR21°51´16.556N\"102°22´21.884W\"21.854599-102.3727461879F13D180000
4100101021AguascalientesAgs.1Aguascalientes102Los Arbolitos [Rancho]R21°46´48.650N\"102°21´26.261W\"21.780181-102.3572951861F13D188**2
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Mapa Cve_Ent Nom_Ent Nom_Abr Cve_Mun municipio_residencia \\\n", "0 10010001 1 Aguascalientes Ags. 1 Aguascalientes \n", "1 10010094 1 Aguascalientes Ags. 1 Aguascalientes \n", "2 10010096 1 Aguascalientes Ags. 1 Aguascalientes \n", "3 10010100 1 Aguascalientes Ags. 1 Aguascalientes \n", "4 10010102 1 Aguascalientes Ags. 1 Aguascalientes \n", "\n", " Cve_Loc Nom_Loc Ámbito Latitud Longitud \\\n", "0 1 Aguascalientes U 21°52´47.362N\" 102°17´45.768W\" \n", "1 94 Granja Adelita R 21°52´18.749N\" 102°22´24.710W\" \n", "2 96 Agua Azul R 21°53´01.522N\" 102°21´25.639W\" \n", "3 100 Rancho Alegre R 21°51´16.556N\" 102°22´21.884W\" \n", "4 102 Los Arbolitos [Rancho] R 21°46´48.650N\" 102°21´26.261W\" \n", "\n", " Lat_Decimal Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina \\\n", "0 21.879823 -102.296047 1878 F13D19 863893 419168 \n", "1 21.871875 -102.373531 1901 F13D18 5 * \n", "2 21.883756 -102.357122 1861 F13D18 41 24 \n", "3 21.854599 -102.372746 1879 F13D18 0 0 \n", "4 21.780181 -102.357295 1861 F13D18 8 * \n", "\n", " Pob_Femenina Total De Viviendas Habitadas \n", "0 444725 246259 \n", "1 * 2 \n", "2 17 12 \n", "3 0 0 \n", "4 * 2 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "areas_inegi.rename(\n", "    columns={'Nom_Mun':'municipio_residencia'}, # recall that we renamed this column in the previous exercise\n", "    inplace=True)\n", "areas_inegi.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "rPased8Pa4Tx" }, "source": [ "### 2. Normalize the shared column" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "7AMmcmptag4d" }, "outputs": [], "source": [ "covid_nacional['municipio_residencia'] = covid_nacional['municipio_residencia'].str.lower()\n", "areas_inegi['municipio_residencia'] = areas_inegi['municipio_residencia'].str.lower()" ] }, { "cell_type": "markdown", "metadata": { "id": "TxPQCvYBbEgn" }, "source": [ "### 3. Filter the information" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "iMqjdRhqbIZu" }, "outputs": [], "source": [ "areas_inegi_tm = areas_inegi.loc[areas_inegi['Cve_Loc'] == 1]" ] }, { "cell_type": "markdown", "metadata": { "id": "iywXccYJbLDR" }, "source": [ "## Performing the inner merge" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 473 }, "id": "-_A78Ce3bQPN", "outputId": "14b65078-4b6a-41af-b566-92fbfa56ee7b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(158085, 59)\n" ] }, { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0fecha_actualizacionid_registroorigensectorentidad_umsexoentidad_nacimientoentidad_residenciamunicipio_residencia...LatitudLongitudLat_DecimalLon_DecimalAltitudCve_CartaPob_TotalPob_MasculinaPob_FemeninaTotal De Viviendas Habitadas
082022-06-260ba73dFUERA DE USMERISSSTECIUDAD DE MÉXICOMUJERQUERÉTAROMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
11432022-06-26588e9bFUERA DE USMERSSACIUDAD DE MÉXICOMUJERCIUDAD DE MÉXICOMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
21542022-06-2651860aUSMERSSACIUDAD DE MÉXICOHOMBRECIUDAD DE MÉXICOMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
39122022-06-26de16a0USMERSSACIUDAD DE MÉXICOMUJERCIUDAD DE MÉXICOMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
410322022-06-265f39e3USMERSSACIUDAD DE MÉXICOHOMBREGUANAJUATOMÉXICOnaucalpan de juárez...19°28´43.690N\"099°13´59.585W\"19.478803-99.2332182280E14A39776220373698402522225509
\n", "

5 rows × 59 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " Unnamed: 0 fecha_actualizacion id_registro origen sector \\\n", "0 8 2022-06-26 0ba73d FUERA DE USMER ISSSTE \n", "1 143 2022-06-26 588e9b FUERA DE USMER SSA \n", "2 154 2022-06-26 51860a USMER SSA \n", "3 912 2022-06-26 de16a0 USMER SSA \n", "4 1032 2022-06-26 5f39e3 USMER SSA \n", "\n", " entidad_um sexo entidad_nacimiento entidad_residencia \\\n", "0 CIUDAD DE MÉXICO MUJER QUERÉTARO MÉXICO \n", "1 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO MÉXICO \n", "2 CIUDAD DE MÉXICO HOMBRE CIUDAD DE MÉXICO MÉXICO \n", "3 CIUDAD DE MÉXICO MUJER CIUDAD DE MÉXICO MÉXICO \n", "4 CIUDAD DE MÉXICO HOMBRE GUANAJUATO MÉXICO \n", "\n", " municipio_residencia ... Latitud Longitud Lat_Decimal \\\n", "0 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "1 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "2 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "3 naucalpan de juárez ... 19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "4 naucalpan de juárez ... 
19°28´43.690N\" 099°13´59.585W\" 19.478803 \n", "\n", " Lon_Decimal Altitud Cve_Carta Pob_Total Pob_Masculina Pob_Femenina \\\n", "0 -99.233218 2280 E14A39 776220 373698 402522 \n", "1 -99.233218 2280 E14A39 776220 373698 402522 \n", "2 -99.233218 2280 E14A39 776220 373698 402522 \n", "3 -99.233218 2280 E14A39 776220 373698 402522 \n", "4 -99.233218 2280 E14A39 776220 373698 402522 \n", "\n", " Total De Viviendas Habitadas \n", "0 225509 \n", "1 225509 \n", "2 225509 \n", "3 225509 \n", "4 225509 \n", "\n", "[5 rows x 59 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "conjunto_datos = pd.merge(covid_nacional, areas_inegi_tm, how='inner', on='municipio_residencia')\n", "print(conjunto_datos.shape)\n", "conjunto_datos.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "I6v6L4v8bgT5" }, "source": [ "## Data cleaning\n", "\n", "### Selecting the useful columns" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 357 }, "id": "2eP6bVDfblg0", "outputId": "890e15ab-aa5d-4d0b-caaf-fb0d7f3b40ae" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sexoedadentidad_nacimientomunicipio_residenciaindigenanacionalidadmigrantepais_nacionalidadfecha_ingresofecha_sintomasfecha_defmunicipio_residenciaLat_DecimalLon_Decimal
0MUJER75QUERÉTAROnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-02-212022-02-16NaNnaucalpan de juárez19.478803-99.233218
1MUJER32CIUDAD DE MÉXICOnaucalpan de juárezNO ESPECIFICADOMEXICANANO ESPECIFICADOMÉXICO2022-01-072022-01-02NaNnaucalpan de juárez19.478803-99.233218
2HOMBRE30CIUDAD DE MÉXICOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-02-042022-02-03NaNnaucalpan de juárez19.478803-99.233218
3MUJER51CIUDAD DE MÉXICOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-01-012021-12-28NaNnaucalpan de juárez19.478803-99.233218
4HOMBRE83GUANAJUATOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-01-012021-12-30NaNnaucalpan de juárez19.478803-99.233218
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " sexo edad entidad_nacimiento municipio_residencia indigena \\\n", "0 MUJER 75 QUERÉTARO naucalpan de juárez NO \n", "1 MUJER 32 CIUDAD DE MÉXICO naucalpan de juárez NO ESPECIFICADO \n", "2 HOMBRE 30 CIUDAD DE MÉXICO naucalpan de juárez NO \n", "3 MUJER 51 CIUDAD DE MÉXICO naucalpan de juárez NO \n", "4 HOMBRE 83 GUANAJUATO naucalpan de juárez NO \n", "\n", " nacionalidad migrante pais_nacionalidad fecha_ingreso \\\n", "0 MEXICANA NO ESPECIFICADO MÉXICO 2022-02-21 \n", "1 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-07 \n", "2 MEXICANA NO ESPECIFICADO MÉXICO 2022-02-04 \n", "3 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-01 \n", "4 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-01 \n", "\n", " fecha_sintomas fecha_def municipio_residencia Lat_Decimal Lon_Decimal \n", "0 2022-02-16 NaN naucalpan de juárez 19.478803 -99.233218 \n", "1 2022-01-02 NaN naucalpan de juárez 19.478803 -99.233218 \n", "2 2022-02-03 NaN naucalpan de juárez 19.478803 -99.233218 \n", "3 2021-12-28 NaN naucalpan de juárez 19.478803 -99.233218 \n", "4 2021-12-30 NaN naucalpan de juárez 19.478803 -99.233218 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# .copy() makes muestra_covid an independent dataframe, so later assignments do not raise SettingWithCopyWarning\n", "muestra_covid = conjunto_datos[['sexo', 'edad', 'entidad_nacimiento', 'municipio_residencia', 'indigena', 'nacionalidad', 'migrante', 'pais_nacionalidad', 'fecha_ingreso', 'fecha_sintomas', 'fecha_def', 'municipio_residencia', 'Lat_Decimal', 'Lon_Decimal']].copy()\n", "muestra_covid.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "y7SLyxm3cJT6" }, "source": [ "### Handling null values" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 835 }, "id": "jyX1zIkocLwU", "outputId": "e9f70bed-dda9-47b6-b3a3-bf9fc5863fe4" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py:5182: SettingWithCopyWarning: \n", "A value is trying to be set 
on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " downcast=downcast,\n", "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py:6392: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " return self._update_inplace(result)\n" ] }, { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sexoedadentidad_nacimientomunicipio_residenciaindigenanacionalidadmigrantepais_nacionalidadfecha_ingresofecha_sintomasfecha_defmunicipio_residenciaLat_DecimalLon_Decimal
0MUJER75QUERÉTAROnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-02-212022-02-16NaNnaucalpan de juárez19.478803-99.233218
1MUJER32CIUDAD DE MÉXICOnaucalpan de juárezNO ESPECIFICADOMEXICANANO ESPECIFICADOMÉXICO2022-01-072022-01-02NaNnaucalpan de juárez19.478803-99.233218
2HOMBRE30CIUDAD DE MÉXICOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-02-042022-02-03NaNnaucalpan de juárez19.478803-99.233218
3MUJER51CIUDAD DE MÉXICOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-01-012021-12-28NaNnaucalpan de juárez19.478803-99.233218
4HOMBRE83GUANAJUATOnaucalpan de juárezNOMEXICANANO ESPECIFICADOMÉXICO2022-01-012021-12-30NaNnaucalpan de juárez19.478803-99.233218
.............................................
158080HOMBRE12VERACRUZ DE IGNACIO DE LA LLAVEamatlán de los reyesNOMEXICANANO ESPECIFICADOMÉXICO2022-06-232022-06-23NaNamatlán de los reyes18.847578-96.915484
158081MUJER46CIUDAD DE MÉXICOamatlán de los reyesNOMEXICANANO ESPECIFICADOMÉXICO2022-06-222022-06-19NaNamatlán de los reyes18.847578-96.915484
158082MUJER59CIUDAD DE MÉXICOgeneral simón bolívarNOMEXICANANO ESPECIFICADOMÉXICO2022-06-232022-06-22NaNgeneral simón bolívar24.689074-103.225975
158083MUJER27MÉXICOtemozónNOMEXICANANO ESPECIFICADOMÉXICO2022-06-242022-06-22NaNtemozón20.803680-88.201158
158084MUJER32MÉXICOizamalNOMEXICANANO ESPECIFICADOMÉXICO2022-06-242022-06-20NaNizamal20.932998-89.019715
\n", "

158085 rows × 14 columns

\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " sexo edad entidad_nacimiento municipio_residencia \\\n", "0 MUJER 75 QUERÉTARO naucalpan de juárez \n", "1 MUJER 32 CIUDAD DE MÉXICO naucalpan de juárez \n", "2 HOMBRE 30 CIUDAD DE MÉXICO naucalpan de juárez \n", "3 MUJER 51 CIUDAD DE MÉXICO naucalpan de juárez \n", "4 HOMBRE 83 GUANAJUATO naucalpan de juárez \n", "... ... ... ... ... \n", "158080 HOMBRE 12 VERACRUZ DE IGNACIO DE LA LLAVE amatlán de los reyes \n", "158081 MUJER 46 CIUDAD DE MÉXICO amatlán de los reyes \n", "158082 MUJER 59 CIUDAD DE MÉXICO general simón bolívar \n", "158083 MUJER 27 MÉXICO temozón \n", "158084 MUJER 32 MÉXICO izamal \n", "\n", " indigena nacionalidad migrante pais_nacionalidad \\\n", "0 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "1 NO ESPECIFICADO MEXICANA NO ESPECIFICADO MÉXICO \n", "2 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "3 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "4 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "... ... ... ... ... \n", "158080 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158081 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158082 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158083 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158084 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "\n", " fecha_ingreso fecha_sintomas fecha_def municipio_residencia \\\n", "0 2022-02-21 2022-02-16 NaN naucalpan de juárez \n", "1 2022-01-07 2022-01-02 NaN naucalpan de juárez \n", "2 2022-02-04 2022-02-03 NaN naucalpan de juárez \n", "3 2022-01-01 2021-12-28 NaN naucalpan de juárez \n", "4 2022-01-01 2021-12-30 NaN naucalpan de juárez \n", "... ... ... ... ... 
\n", "158080 2022-06-23 2022-06-23 NaN amatlán de los reyes \n", "158081 2022-06-22 2022-06-19 NaN amatlán de los reyes \n", "158082 2022-06-23 2022-06-22 NaN general simón bolívar \n", "158083 2022-06-24 2022-06-22 NaN temozón \n", "158084 2022-06-24 2022-06-20 NaN izamal \n", "\n", " Lat_Decimal Lon_Decimal \n", "0 19.478803 -99.233218 \n", "1 19.478803 -99.233218 \n", "2 19.478803 -99.233218 \n", "3 19.478803 -99.233218 \n", "4 19.478803 -99.233218 \n", "... ... ... \n", "158080 18.847578 -96.915484 \n", "158081 18.847578 -96.915484 \n", "158082 24.689074 -103.225975 \n", "158083 20.803680 -88.201158 \n", "158084 20.932998 -89.019715 \n", "\n", "[158085 rows x 14 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "muestra_covid.fillna({'municipio_residencia': 'NO APLICA', 'pais_nacionalidad': 'NO APLICA'}, inplace=True)\n", "muestra_covid" ] }, { "cell_type": "markdown", "metadata": { "id": "xhoZAZ12ca4D" }, "source": [ "### Transforming the data" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "kevd6_hXcdwH", "outputId": "eceaa809-2185-4adf-a301-b46e870b0fbf" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:3: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " This is separate from the ipykernel package so we can avoid doing imports until\n" ] }, { "data": { "text/plain": [ "sexo object\n", "edad int64\n", "entidad_nacimiento object\n", "municipio_residencia object\n", "indigena object\n", "nacionalidad object\n", "migrante object\n", "pais_nacionalidad object\n", "fecha_ingreso datetime64[ns]\n", 
"fecha_sintomas datetime64[ns]\n", "fecha_def datetime64[ns]\n", "municipio_residencia object\n", "Lat_Decimal float64\n", "Lon_Decimal float64\n", "dtype: object" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "columnas = ['fecha_ingreso', 'fecha_sintomas', 'fecha_def']\n", "for columna in columnas:\n", "    muestra_covid[columna] = pd.to_datetime(muestra_covid.loc[:, columna])\n", "\n", "muestra_covid.dtypes" ] }, { "cell_type": "markdown", "metadata": { "id": "6YSKtk_p18es" }, "source": [ "# Georeferencing the `pais_nacionalidad` data" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "RuPkl6XG3Xpe", "outputId": "eabebed2-03a1-4744-dfc2-b4a24f9f8f50" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n", "Collecting pycountry\n", " Downloading pycountry-22.3.5.tar.gz (10.1 MB)\n", "\u001b[K |████████████████████████████████| 10.1 MB 25.6 MB/s \n", "\u001b[?25h Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n", " Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n", " Preparing wheel metadata ... \u001b[?25l\u001b[?25hdone\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.7/dist-packages (from pycountry) (57.4.0)\n", "Building wheels for collected packages: pycountry\n", " Building wheel for pycountry (PEP 517) ... 
\u001b[?25l\u001b[?25hdone\n", " Created wheel for pycountry: filename=pycountry-22.3.5-py2.py3-none-any.whl size=10681845 sha256=b33752378c0cefcf260b7f88dcdb7027d6691af7862ba5b8fcbb59b9832d4c66\n", " Stored in directory: /root/.cache/pip/wheels/0e/06/e8/7ee176e95ea9a8a8c3b3afcb1869f20adbd42413d4611c6eb4\n", "Successfully built pycountry\n", "Installing collected packages: pycountry\n", "Successfully installed pycountry-22.3.5\n" ] } ], "source": [ "# install and import the pycountry library\n", "!pip install pycountry\n", "import pycountry" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "-5KKVYb92Ac9" }, "outputs": [], "source": [ "# Function that converts a country name written in Spanish to its ISO code\n", "\n", "import gettext\n", "\n", "def map_country_code(country_name, language, iso):\n", "    '''\n", "    country_name: str. The country name in Spanish.\n", "    language: str. The language of the country names (e.g. 'es').\n", "    iso: str. Possible options: 'alpha_2' or 'alpha_3'.\n", "    Returns None when no country matches.\n", "    '''\n", "    if country_name is None:\n", "        return None\n", "    elif country_name == 'MÉXICO': # shortcut for the Mexico case (cuts the run time from 5 minutes to 6 seconds)\n", "        if iso == 'alpha_2':\n", "            return 'MX'\n", "        elif iso == 'alpha_3':\n", "            return 'MEX'\n", "    spanish = gettext.translation('iso3166', pycountry.LOCALES_DIR, languages=[language])\n", "    _ = spanish.gettext\n", "    country_name = country_name.lower() # normalize once, outside the loop\n", "    for english_country in pycountry.countries:\n", "        spanish_country = _(english_country.name).lower()\n", "        if spanish_country == country_name:\n", "            if iso == 'alpha_3':\n", "                return english_country.alpha_3\n", "            elif iso == 'alpha_2':\n", "                return english_country.alpha_2\n", "    return None" ] }, { "cell_type": "markdown", "metadata": { "id": "P8kDOx2_2KBW" }, "source": [ "Converting the country names to alpha codes" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { 
"colab": { "base_uri": "https://localhost:8080/", "height": 565 }, "id": "VmlTv8n32OYj", "outputId": "199f7879-e7d3-4ddc-eafe-edc884769cfb" }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \"\"\"Entry point for launching an IPython kernel.\n", "/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \n" ] }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " sexo edad entidad_nacimiento municipio_residencia indigena \\\n", "0 MUJER 75 QUERÉTARO naucalpan de juárez NO \n", "1 MUJER 32 CIUDAD DE MÉXICO naucalpan de juárez NO ESPECIFICADO \n", "2 HOMBRE 30 CIUDAD DE MÉXICO naucalpan de juárez NO \n", "3 MUJER 51 CIUDAD DE MÉXICO naucalpan de juárez NO \n", "4 HOMBRE 83 GUANAJUATO naucalpan de juárez NO \n", "\n", " nacionalidad migrante pais_nacionalidad fecha_ingreso \\\n", "0 MEXICANA NO ESPECIFICADO MÉXICO 2022-02-21 \n", "1 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-07 \n", "2 MEXICANA NO ESPECIFICADO MÉXICO 2022-02-04 \n", "3 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-01 \n", "4 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-01 \n", "\n", " fecha_sintomas fecha_def municipio_residencia Lat_Decimal Lon_Decimal \\\n", "0 2022-02-16 NaT naucalpan de juárez 19.478803 -99.233218 \n", "1 2022-01-02 NaT naucalpan de juárez 19.478803 -99.233218 \n", "2 2022-02-03 NaT naucalpan de juárez 19.478803 -99.233218 \n", "3 2021-12-28 NaT naucalpan de juárez 19.478803 -99.233218 \n", "4 2021-12-30 NaT naucalpan de juárez 19.478803 -99.233218 \n", "\n", " alpha3 alpha2 \n", "0 MEX MX \n", "1 MEX MX \n", "2 MEX MX \n", "3 MEX MX \n", "4 MEX MX " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "muestra_covid['alpha3'] = muestra_covid['pais_nacionalidad'].apply(lambda x: map_country_code(x, 'es', 'alpha_3'))\n", "muestra_covid['alpha2'] = muestra_covid['pais_nacionalidad'].apply(lambda x: map_country_code(x, 'es', 'alpha_2'))\n", "muestra_covid.head()" ] }, { "cell_type": "markdown", "metadata": { "id": "WnAQuxDkcrtT" }, "source": [ "## Save to CSV" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "eE3ImXLXctnk" }, "outputs": [], "source": [ "ruta = '/content/drive/MyDrive/Colab Notebooks/curso_datos/covid_clean.csv'\n", "muestra_covid.to_csv(ruta, index=False)" ] }, { "cell_type": "markdown", "metadata": { "id": "MkV1x1AFc9HC" }, "source": [
"Verification: reload the saved CSV to confirm it was written correctly. The duplicated `municipio_residencia` column comes back as `municipio_residencia.1`, and `fecha_def` shows `NaN` instead of `NaT` because `pd.read_csv` does not parse dates unless told to." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 661 }, "id": "6rNehHq5c_TS", "outputId": "1b115d2d-c8f5-40c5-c073-b5f84d3bb24c" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " sexo edad entidad_nacimiento municipio_residencia \\\n", "0 MUJER 75 QUERÉTARO naucalpan de juárez \n", "1 MUJER 32 CIUDAD DE MÉXICO naucalpan de juárez \n", "2 HOMBRE 30 CIUDAD DE MÉXICO naucalpan de juárez \n", "3 MUJER 51 CIUDAD DE MÉXICO naucalpan de juárez \n", "4 HOMBRE 83 GUANAJUATO naucalpan de juárez \n", "... ... ... ... ... \n", "158080 HOMBRE 12 VERACRUZ DE IGNACIO DE LA LLAVE amatlán de los reyes \n", "158081 MUJER 46 CIUDAD DE MÉXICO amatlán de los reyes \n", "158082 MUJER 59 CIUDAD DE MÉXICO general simón bolívar \n", "158083 MUJER 27 MÉXICO temozón \n", "158084 MUJER 32 MÉXICO izamal \n", "\n", " indigena nacionalidad migrante pais_nacionalidad \\\n", "0 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "1 NO ESPECIFICADO MEXICANA NO ESPECIFICADO MÉXICO \n", "2 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "3 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "4 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "... ... ... ... ... \n", "158080 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158081 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158082 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158083 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "158084 NO MEXICANA NO ESPECIFICADO MÉXICO \n", "\n", " fecha_ingreso fecha_sintomas fecha_def municipio_residencia.1 \\\n", "0 2022-02-21 2022-02-16 NaN naucalpan de juárez \n", "1 2022-01-07 2022-01-02 NaN naucalpan de juárez \n", "2 2022-02-04 2022-02-03 NaN naucalpan de juárez \n", "3 2022-01-01 2021-12-28 NaN naucalpan de juárez \n", "4 2022-01-01 2021-12-30 NaN naucalpan de juárez \n", "... ... ... ... ... 
\n", "158080 2022-06-23 2022-06-23 NaN amatlán de los reyes \n", "158081 2022-06-22 2022-06-19 NaN amatlán de los reyes \n", "158082 2022-06-23 2022-06-22 NaN general simón bolívar \n", "158083 2022-06-24 2022-06-22 NaN temozón \n", "158084 2022-06-24 2022-06-20 NaN izamal \n", "\n", " Lat_Decimal Lon_Decimal alpha3 alpha2 \n", "0 19.478803 -99.233218 MEX MX \n", "1 19.478803 -99.233218 MEX MX \n", "2 19.478803 -99.233218 MEX MX \n", "3 19.478803 -99.233218 MEX MX \n", "4 19.478803 -99.233218 MEX MX \n", "... ... ... ... ... \n", "158080 18.847578 -96.915484 MEX MX \n", "158081 18.847578 -96.915484 MEX MX \n", "158082 24.689074 -103.225975 MEX MX \n", "158083 20.803680 -88.201158 MEX MX \n", "158084 20.932998 -89.019715 MEX MX \n", "\n", "[158085 rows x 16 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_csv(ruta)" ] } ], "metadata": { "colab": { "authorship_tag": "ABX9TyNhAZZ1RHuvIZl8O1JS1GcX", "collapsed_sections": [], "include_colab_link": true, "mount_file_id": "1fd3eYdJf9ooKOBF9Jf1L-UNu82l-7pWe", "name": "mi cuaderno de datos_semana3.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3.10.4 64-bit", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.4" }, "vscode": { "interpreter": { "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49" } } }, "nbformat": 4, "nbformat_minor": 0 }