{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Project progress activity\n",
"\n",
"Now that you have completed the previous exercises, it is time to incorporate them into your project notebook. When you finish, you should have a Google Colab notebook similar to this one:"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "H9FzFUrajfM7"
},
"source": [
"# Importing data\n",
"\n",
"By \"importing data\" we mean preparing the data source so that our program can read it.\n",
"\n",
"There are many ways to import data. For example, we can simply use the same method we used with our `ejemplo-1.txt` file.\n",
"\n",
"Save the file you want to use in the Google Drive folder where you keep your data.\n",
"\n",
"As an example, I will use the national COVID-19 cases recorded daily during the first half of 2022: https://datos.cdmx.gob.mx/dataset/casos-asociados-a-covid-19/resource/e5f65f40-5904-492a-ae33-1ea98fb73d78?inner_span=True\n",
"\n",
"I download the CSV file to a folder on my computer and then upload it to my data folder in Google Drive:\n",
"\n",
"\n",
"Back in the Google Colab notebook, I make sure Google Drive is mounted in Colab and locate the folder that contains my file. In my case: `'/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'`\n",
"\n",
"With those steps done, we can import the file:"
]
},
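{
"cell_type": "markdown",
"metadata": {},
"source": [
"If Drive is not mounted yet, the path above will not exist. You can also mount it from code. The following is a minimal sketch. It assumes the notebook runs in Google Colab, because the `google.colab` module is only available there. It also assumes the default mount point `/content/drive`, which the path above already relies on:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"from google.colab import drive  # only available inside Google Colab\n",
"\n",
"# Mount Google Drive at the default mount point; Colab may prompt for authorization.\n",
"drive.mount('/content/drive')\n",
"\n",
"datos = '/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'\n",
"\n",
"# Fail early with a clear message if the file is not where we expect it.\n",
"if not os.path.exists(datos):\n",
"    raise FileNotFoundError(f'File not found: {datos}')"
]
},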
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qnXNK7H2kz3M",
"outputId": "03d7e0e8-b02b-4101-ce97-b23d2988f946"
},
"outputs": [
{
"data": {
"text/plain": [
"['\"\",\"fecha_actualizacion\",\"id_registro\",\"origen\",\"sector\",\"entidad_um\",\"sexo\",\"entidad_nac\",\"entidad_res\",\"municipio_res\",\"tipo_paciente\",\"fecha_ingreso\",\"fecha_sintomas\",\"fecha_def\",\"intubado\",\"neumonia\",\"edad\",\"nacionalidad\",\"embarazo\",\"habla_lengua_indig\",\"indigena\",\"diabetes\",\"epoc\",\"asma\",\"inmusupr\",\"hipertension\",\"otra_com\",\"cardiovascular\",\"obesidad\",\"renal_cronica\",\"tabaquismo\",\"otro_caso\",\"toma_muestra_lab\",\"resultado_lab\",\"toma_muestra_antigeno\",\"resultado_antigeno\",\"clasificacion_final\",\"migrante\",\"pais_nacionalidad\",\"pais_origen\",\"uci\"\\n']"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
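],
"source": [
"datos = '/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'\n",
"\n",
"with open(datos, 'r') as f:\n",
"    # The file is very large, so we read only a little of it. Note that 10 is a size hint,\n",
"    # not a line count: reading stops once the lines read exceed 10 characters,\n",
"    # so here only the header line is returned.\n",
"    data = f.readlines(10)\n",
"\n",
"data"
]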
},
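{
"cell_type": "markdown",
"metadata": {},
"source": [
"The argument of `readlines()` limits the amount read by characters, not by lines. That makes it awkward for previewing a fixed number of rows. The following minimal sketch reads exactly the first few lines with `itertools.islice` from the standard library. The count of 5 is an arbitrary choice, and UTF-8 encoding is assumed:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from itertools import islice\n",
"\n",
"datos = '/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'\n",
"\n",
"# islice stops after 5 lines, so the whole file is never loaded into memory.\n",
"with open(datos, 'r', encoding='utf-8') as f:\n",
"    first_lines = list(islice(f, 5))\n",
"\n",
"first_lines"
]
},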
{
"cell_type": "markdown",
"metadata": {
"id": "LukLwOCkpa7t"
},
"source": [
"This brings the file into our notebook, but the raw list of text lines would be very hard to work with. For that reason, it is better to use a library that helps us process the data. In our case, we will use `pandas`.\n",
"\n",
"To make it available to our program, we only need to import the library:\n",
"\n",
"`import pandas as pd`\n",
"\n",
"Then we can open our file from Python:"
]
},
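{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a file this large, two `pd.read_csv` parameters are worth knowing. `nrows` parses just the first rows for a quick look. `low_memory=False` makes pandas infer each column's type from the whole column, which avoids the mixed-types `DtypeWarning`. The following is a minimal sketch. The preview size of 1,000 rows is arbitrary. `index_col=0` assumes that the first, unnamed column only holds row numbers, as the `Unnamed: 0` column in the loaded table suggests:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"datos = '/content/drive/MyDrive/Colab Notebooks/curso_datos/casos_nacionales_covid-19_2022_semestre1.csv'\n",
"\n",
"# Quick preview: parse only the first 1,000 rows, using the unnamed first column as the index.\n",
"preview = pd.read_csv(datos, nrows=1000, index_col=0)\n",
"\n",
"# Full load: low_memory=False infers each column's dtype from the whole column\n",
"# instead of chunk by chunk, which avoids the mixed-types DtypeWarning.\n",
"df = pd.read_csv(datos, index_col=0, low_memory=False)"
]
},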
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 508
},
"id": "b4yv7auIqCt7",
"outputId": "3ac77c6a-2684-44ed-9b18-f9fa480700b8"
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.7/dist-packages/IPython/core/interactiveshell.py:2882: DtypeWarning: Columns (13) have mixed types.Specify dtype option on import or set low_memory=False.\n",
" exec(code_obj, self.user_global_ns, self.user_ns)\n"
]
},
{
"data": {
"text/html": [
"\n",
"
\n", " | Unnamed: 0 | \n", "fecha_actualizacion | \n", "id_registro | \n", "origen | \n", "sector | \n", "entidad_um | \n", "sexo | \n", "entidad_nac | \n", "entidad_res | \n", "municipio_res | \n", "... | \n", "otro_caso | \n", "toma_muestra_lab | \n", "resultado_lab | \n", "toma_muestra_antigeno | \n", "resultado_antigeno | \n", "clasificacion_final | \n", "migrante | \n", "pais_nacionalidad | \n", "pais_origen | \n", "uci | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | \n", "1 | \n", "2022-06-26 | \n", "0793b8 | \n", "FUERA DE USMER | \n", "SSA | \n", "CIUDAD DE MÉXICO | \n", "HOMBRE | \n", "CIUDAD DE MÉXICO | \n", "NaN | \n", "NaN | \n", "... | \n", "NO | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "SI | \n", "NEGATIVO A SARS-COV-2 | \n", "NEGATIVO A SARS-COV-2 | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO APLICA | \n", "
1 | \n", "2 | \n", "2022-06-26 | \n", "0fef08 | \n", "USMER | \n", "SSA | \n", "CIUDAD DE MÉXICO | \n", "HOMBRE | \n", "CIUDAD DE MÉXICO | \n", "NaN | \n", "NaN | \n", "... | \n", "NO | \n", "SI | \n", "POSITIVO A SARS-COV-2 | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "CASO DE SARS-COV-2 CONFIRMADO | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO APLICA | \n", "
2 | \n", "3 | \n", "2022-06-26 | \n", "11e31a | \n", "FUERA DE USMER | \n", "SSA | \n", "CIUDAD DE MÉXICO | \n", "HOMBRE | \n", "CIUDAD DE MÉXICO | \n", "NaN | \n", "NaN | \n", "... | \n", "NO | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "SI | \n", "NEGATIVO A SARS-COV-2 | \n", "NEGATIVO A SARS-COV-2 | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO APLICA | \n", "
3 | \n", "4 | \n", "2022-06-26 | \n", "0741e4 | \n", "FUERA DE USMER | \n", "ISSSTE | \n", "CIUDAD DE MÉXICO | \n", "HOMBRE | \n", "CIUDAD DE MÉXICO | \n", "NaN | \n", "NaN | \n", "... | \n", "NO | \n", "SI | \n", "RESULTADO NO ADECUADO | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "NO REALIZADO POR LABORATORIO | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO | \n", "
4 | \n", "5 | \n", "2022-06-26 | \n", "13c92b | \n", "FUERA DE USMER | \n", "SSA | \n", "CIUDAD DE MÉXICO | \n", "MUJER | \n", "CIUDAD DE MÉXICO | \n", "NaN | \n", "NaN | \n", "... | \n", "SI | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "SI | \n", "NEGATIVO A SARS-COV-2 | \n", "NEGATIVO A SARS-COV-2 | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO APLICA | \n", "
5 rows × 41 columns
\n", "\n", " | Unnamed: 0 | \n", "edad | \n", "
---|---|---|
count | \n", "1.323501e+06 | \n", "1.323501e+06 | \n", "
mean | \n", "6.617510e+05 | \n", "3.774596e+01 | \n", "
std | \n", "3.820620e+05 | \n", "1.728453e+01 | \n", "
min | \n", "1.000000e+00 | \n", "0.000000e+00 | \n", "
25% | \n", "3.308760e+05 | \n", "2.500000e+01 | \n", "
50% | \n", "6.617510e+05 | \n", "3.600000e+01 | \n", "
75% | \n", "9.926260e+05 | \n", "5.000000e+01 | \n", "
max | \n", "1.323501e+06 | \n", "1.220000e+02 | \n", "
\n", " | Unnamed: 0 | \n", "fecha_actualizacion | \n", "id_registro | \n", "origen | \n", "sector | \n", "entidad_um | \n", "sexo | \n", "entidad_nac | \n", "entidad_res | \n", "municipio_res | \n", "... | \n", "otro_caso | \n", "toma_muestra_lab | \n", "resultado_lab | \n", "toma_muestra_antigeno | \n", "resultado_antigeno | \n", "clasificacion_final | \n", "migrante | \n", "pais_nacionalidad | \n", "pais_origen | \n", "uci | \n", "
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | \n", "1.323501e+06 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "149707 | \n", "149707 | \n", "... | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1323501 | \n", "1320040 | \n", "1323501 | \n", "
unique | \n", "NaN | \n", "1 | \n", "1323501 | \n", "2 | \n", "12 | \n", "32 | \n", "2 | \n", "33 | \n", "23 | \n", "1190 | \n", "... | \n", "3 | \n", "2 | \n", "5 | \n", "2 | \n", "3 | \n", "7 | \n", "3 | \n", "122 | \n", "1 | \n", "4 | \n", "
top | \n", "NaN | \n", "2022-06-26 | \n", "0793b8 | \n", "FUERA DE USMER | \n", "SSA | \n", "CIUDAD DE MÉXICO | \n", "MUJER | \n", "CIUDAD DE MÉXICO | \n", "MÉXICO | \n", "NEZAHUALCÓYOTL | \n", "... | \n", "NO | \n", "NO | \n", "NO APLICA (CASO SIN MUESTRA) | \n", "SI | \n", "NEGATIVO A SARS-COV-2 | \n", "NEGATIVO A SARS-COV-2 | \n", "NO ESPECIFICADO | \n", "MÉXICO | \n", "NO APLICA | \n", "NO APLICA | \n", "
freq | \n", "NaN | \n", "1323501 | \n", "1 | \n", "1170267 | \n", "793606 | \n", "1314661 | \n", "733991 | \n", "1052272 | \n", "133374 | \n", "26282 | \n", "... | \n", "848434 | \n", "1152385 | \n", "1152385 | \n", "1204565 | \n", "771647 | \n", "792364 | \n", "1305180 | \n", "1304673 | \n", "1320040 | \n", "1297093 | \n", "
mean | \n", "6.617510e+05 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
std | \n", "3.820620e+05 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
min | \n", "1.000000e+00 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
25% | \n", "3.308760e+05 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
50% | \n", "6.617510e+05 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
75% | \n", "9.926260e+05 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
max | \n", "1.323501e+06 | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "... | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "NaN | \n", "
11 rows × 41 columns
\n", "