{
"cells": [
{
"cell_type": "markdown",
"id": "8d7c497d",
"metadata": {},
"source": [
"# Georeferencing\n",
"\n",
"In the section on [joining dataframes](../S3-procesamiento/S3P5-union-df.md) we saw how to add geolocation information from one dataset to another. The only real difficulty there is finding a list or dataset whose entries match the places we want to represent.\n",
"\n",
"All we need to guarantee is that the name used as the join key matches the place identifier in the geolocation data.\n",
"\n",
"For global data, country-level geolocation can rely on the ISO 3166-1 alpha-2 and alpha-3 codes, which are strings of two or three letters that identify a country.\n",
"\n",
"In Python, a widely used library for this task is [`pycountry`](https://pypi.org/project/pycountry/).\n",
"\n",
"To use this library in Google Colab, we first need to install it as follows:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "031e849f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: pycountry in /Users/jairoantonio/opt/anaconda3/lib/python3.9/site-packages (22.3.5)\r\n",
"Requirement already satisfied: setuptools in /Users/jairoantonio/opt/anaconda3/lib/python3.9/site-packages (from pycountry) (58.0.4)\r\n"
]
}
],
"source": [
"!pip install pycountry"
]
},
{
"cell_type": "markdown",
"id": "c17e8948",
"metadata": {},
"source": [
"We can then import it and use it:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "50bacac5",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Country(alpha_2='MX', alpha_3='MEX', flag='🇲🇽', name='Mexico', numeric='484', official_name='United Mexican States')"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pycountry\n",
"pycountry.countries.get(alpha_2='MX')"
]
},
{
"cell_type": "markdown",
"id": "ecd874a5",
"metadata": {},
"source": [
"Our `covid_nacional` data source records the country of origin in the `pais_nacionalidad` column, so we can transform that column to obtain the georeferencing data.\n",
"\n",
"But first, let's look at an ideal example:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "15d0aedb",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Nombre</th><th>País</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Andrea</td><td>Mexico</td></tr>\n",
"    <tr><th>1</th><td>Natalia</td><td>United States</td></tr>\n",
"    <tr><th>2</th><td>Guadalupe</td><td>Spain</td></tr>\n",
"    <tr><th>3</th><td>Pedro</td><td>France</td></tr>\n",
"    <tr><th>4</th><td>Joaquín</td><td>Italy</td></tr>\n",
"    <tr><th>5</th><td>Julio</td><td>Germany</td></tr>\n",
"    <tr><th>6</th><td>Luisa</td><td>China</td></tr>\n",
"    <tr><th>7</th><td>Juan</td><td>Japan</td></tr>\n",
"    <tr><th>8</th><td>Vicente</td><td>Korea, Republic of</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Nombre País\n",
"0 Andrea Mexico\n",
"1 Natalia United States\n",
"2 Guadalupe Spain\n",
"3 Pedro France\n",
"4 Joaquín Italy\n",
"5 Julio Germany\n",
"6 Luisa China\n",
"7 Juan Japan\n",
"8 Vicente Korea, Republic of"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"paises = pd.DataFrame({'Nombre': ['Andrea', 'Natalia', 'Guadalupe', 'Pedro', 'Joaquín', 'Julio', 'Luisa', 'Juan', 'Vicente'], 'País': ['Mexico', 'United States', 'Spain', 'France', 'Italy', 'Germany', 'China', 'Japan', 'Korea, Republic of']})\n",
"paises"
]
},
{
"cell_type": "markdown",
"id": "20613cf5",
"metadata": {},
"source": [
"In this case, we can use a function to obtain the alpha-2 code of each country from its English name:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "25ebf6d0",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Nombre</th><th>País</th><th>alpha2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>Andrea</td><td>Mexico</td><td>MX</td></tr>\n",
"    <tr><th>1</th><td>Natalia</td><td>United States</td><td>US</td></tr>\n",
"    <tr><th>2</th><td>Guadalupe</td><td>Spain</td><td>ES</td></tr>\n",
"    <tr><th>3</th><td>Pedro</td><td>France</td><td>FR</td></tr>\n",
"    <tr><th>4</th><td>Joaquín</td><td>Italy</td><td>IT</td></tr>\n",
"    <tr><th>5</th><td>Julio</td><td>Germany</td><td>DE</td></tr>\n",
"    <tr><th>6</th><td>Luisa</td><td>China</td><td>CN</td></tr>\n",
"    <tr><th>7</th><td>Juan</td><td>Japan</td><td>JP</td></tr>\n",
"    <tr><th>8</th><td>Vicente</td><td>Korea, Republic of</td><td>KR</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Nombre País alpha2\n",
"0 Andrea Mexico MX\n",
"1 Natalia United States US\n",
"2 Guadalupe Spain ES\n",
"3 Pedro France FR\n",
"4 Joaquín Italy IT\n",
"5 Julio Germany DE\n",
"6 Luisa China CN\n",
"7 Juan Japan JP\n",
"8 Vicente Korea, Republic of KR"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"paises['alpha2'] = paises['País'].apply(lambda x: pycountry.countries.get(name=x).alpha_2)\n",
"paises"
]
},
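{
"cell_type": "markdown",
"id": "b7e41a90",
"metadata": {},
"source": [
"The `lambda` passed to `apply` in the cell above is simply a function without a name. As a minimal sketch in plain Python (the `codes` dictionary and both function names are made up for this illustration and stand in for the `pycountry` lookup), the named and the anonymous versions behave identically:\n",
"\n",
"```python\n",
"# Hypothetical lookup table used only for this illustration\n",
"codes = {'Mexico': 'MX', 'Spain': 'ES', 'France': 'FR'}\n",
"\n",
"\n",
"def to_alpha2(name):\n",
"    # Named function: return the code, or None when the name is unknown\n",
"    return codes.get(name)\n",
"\n",
"\n",
"# The same logic written inline as a lambda\n",
"to_alpha2_lambda = lambda name: codes.get(name)\n",
"\n",
"names = ['Mexico', 'France', 'Atlantis']\n",
"print([to_alpha2(n) for n in names])\n",
"print(list(map(to_alpha2_lambda, names)))\n",
"```\n",
"\n",
"Both lines print `['MX', 'FR', None]`. A named function is usually the better choice once the logic is reused or needs error handling."
]
},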
{
"cell_type": "markdown",
"id": "4f672e8c",
"metadata": {},
"source": [
"With this value (alpha-2 or alpha-3) we will be able to display our data on a map.\n",
"\n",
"```{admonition} Lambda function\n",
":class: tip\n",
"Here we used a lambda function. The concept can be a little tricky, but in essence it is a small anonymous function that we define right where we need it, usually for a single use, and that `apply` runs once for every row of the column.\n",
"```\n",
"\n",
"The obvious drawback is that we now need an option for country names written in Spanish. For that purpose I include the function below, lightly modified from this [StackOverflow](https://stackoverflow.com/a/62486395) answer, which performs the conversion:\n",
"\n",
"```{admonition} Converter\n",
":class: tip\n",
"The code that follows is a program written specifically to manipulate our data so that we can visualize it later. You do not need to apply it in this course, but keep in mind that a ready-made, concise solution is not always available for the problems that arise from our datasets.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "26f07d29",
"metadata": {
"tags": [
"hide-input"
]
},
"outputs": [],
"source": [
"import pycountry\n",
"import gettext\n",
"\n",
"# This correction table is needed to include a number of countries that would otherwise be left out.\n",
"# The following keys were excluded because they are ambiguous or cannot be located: 'OTRO', 'SE DESCONOCE', 'REPÚBLICA CHECA Y REPÚBLICA ESLOVACA', 'PAÍSES DE LA EX-U.R.S.S., EXCEPTO UCRANIA Y BIELORUSIA', 'AZERBAIYÁN - ISLAS AZORES'\n",
"\n",
"correccion_paises = {\n",
"    'nombre_original': ['ESTADOS UNIDOS DE AMÉRICA', 'VENEZUELA', 'TAIWÁN', 'HOLANDA', 'REPÚBLICA DE HONDURAS', 'BOLIVIA',\n",
"                        'REPÚBLICA DE COREA', 'GRAN BRETAÑA (REINO UNIDO)', 'RUSIA',\n",
"                        'REPÚBLICA DE COSTA RICA', 'REPÚBLICA DE PANAMÁ',\n",
"                        'REPÚBLICA ORIENTAL DEL URUGUAY',\n",
"                        'RUMANIA', 'IRÁN', 'ESTADO LIBRE ASOCIADO DE PUERTO RICO',\n",
"                        'ESTADO DE KUWAIT', 'ANTIGUA Y BERMUDA',\n",
"                        'CAMPIONE DITALIA',\n",
"                        'EMIRATOS ARABES UNIDOS',\n",
"                        'ZONA ESPECIAL CANARIA', 'COMMONWEALTH DE DOMINICA',\n",
"                        'THAILANDIA', 'ESTADO DE BAHREIN', 'MALÍ',\n",
"                        'ISLAS MENORES ALEJADAS DE LOS ESTADOS UNIDOS', 'GUYANA FRANCESA',\n",
"                        'IRAQ'],\n",
"    'nombre_corregido': ['ESTADOS UNIDOS', 'VENEZUELA, REPÚBLICA BOLIVARIANA DE', 'TAIWÁN, PROVINCIA DE CHINA', 'PAÍSES BAJOS', 'HONDURAS', 'BOLIVIA, ESTADO PLURINACIONAL de',\n",
"                         'COREA, REPÚBLICA DE', 'Reino Unido', 'FEDERACIÓN RUSA',\n",
"                         'COSTA RICA', 'PANAMÁ', 'URUGUAY', 'rumanía', 'irán, república islámica de', 'Puerto rico', 'KUWAIT', 'ANTIGUA Y BARBUDA', 'ITALIA', 'EMIRATOS ÁRABES UNIDOS', 'ESPAÑA', 'DOMINICA', 'TAILANDIA', 'BAHREIN', 'MALI', 'ESTADOS UNIDOS', 'GUYANA', 'IRAK']\n",
"}\n",
"\n",
"\n",
"# Names mapped directly to their codes instead of being looked up in the translated list\n",
"none_countries = {'nombre': ['ZONA NEUTRAL', 'COSTA DE MARFIL', 'REPÚBLICA DEMOCRÁTICA DE COREA', 'ARGELIA', 'NUEVA ZELANDIA', 'ARABIA SAUDITA', 'REPÚBLICA CENTRO AFRICANA', 'SUDÁFRICA'],\n",
"                  'alpha2': ['NT', 'CI', 'KP', 'DZ', 'NZ', 'SA', 'CF', 'ZA'],\n",
"                  'alpha3': ['NTZ', 'CIV', 'PRK', 'DZA', 'NZL', 'SAU', 'CAF', 'ZAF']}\n",
"\n",
"\n",
"def map_country_code(country_name, language, iso):\n",
"    '''\n",
"    country_name: str. The country name, written in `language` (Spanish in this notebook).\n",
"    language: str. Language code of the name (e.g. 'es').\n",
"    iso: str. Possible values: 'alpha_2' or 'alpha_3'.\n",
"    '''\n",
"    # Missing values (None or NaN) have no code\n",
"    if not isinstance(country_name, str):\n",
"        return None\n",
"    # Shortcut for Mexico (cuts the run time from 5 minutes to 6 seconds)\n",
"    if country_name == 'MÉXICO':\n",
"        if iso == 'alpha_2':\n",
"            return 'MX'\n",
"        elif iso == 'alpha_3':\n",
"            return 'MEX'\n",
"\n",
"    # Load the translation of the ISO 3166 country names into `language`\n",
"    spanish = gettext.translation(\n",
"        'iso3166', pycountry.LOCALES_DIR, languages=[language])\n",
"    _ = spanish.gettext\n",
"\n",
"    # Replace names listed in correccion_paises['nombre_original'] with the matching entry of correccion_paises['nombre_corregido']\n",
"    if country_name in correccion_paises['nombre_original']:\n",
"        country_name = correccion_paises['nombre_corregido'][correccion_paises['nombre_original'].index(country_name)]\n",
"\n",
"    if country_name in none_countries['nombre']:\n",
"        if iso == 'alpha_2':\n",
"            return none_countries['alpha2'][none_countries['nombre'].index(country_name)]\n",
"        elif iso == 'alpha_3':\n",
"            return none_countries['alpha3'][none_countries['nombre'].index(country_name)]\n",
"    else:\n",
"        # Compare case-insensitively against every translated country name\n",
"        country_name = country_name.lower()\n",
"        for english_country in pycountry.countries:\n",
"            spanish_country = _(english_country.name).lower()\n",
"            if spanish_country == country_name:\n",
"                if iso == 'alpha_3':\n",
"                    return english_country.alpha_3\n",
"                elif iso == 'alpha_2':\n",
"                    return english_country.alpha_2\n",
"\n",
"    # No match, or an unsupported value of iso\n",
"    return None"
]
},
{
"cell_type": "markdown",
"id": "45df9b03",
"metadata": {},
"source": [
"For now, all that matters is that this function returns the alpha-2 or alpha-3 code of a country from its name in a language other than English, such as Spanish. We can check that it works as follows:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "74221ee2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'ES'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"map_country_code('España', 'es', 'alpha_2')"
]
},
{
"cell_type": "markdown",
"id": "12c5f154",
"metadata": {},
"source": [
"Ahora, vamos a aplicarlo a nuestro conjunto de datos:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "7d551af6",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/jairoantonio/opt/anaconda3/lib/python3.9/site-packages/IPython/core/interactiveshell.py:3444: DtypeWarning: Columns (11) have mixed types.Specify dtype option on import or set low_memory=False.\n",
" exec(code_obj, self.user_global_ns, self.user_ns)\n"
]
}
],
"source": [
"muestra_covid = pd.read_csv(\"../data/muestra_covid.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "6b29b370",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Unnamed: 0</th><th>sexo</th><th>edad</th><th>entidad_nacimiento</th><th>municipio_residencia</th><th>indigena</th><th>nacionalidad</th><th>migrante</th><th>pais_nacionalidad</th><th>fecha_ingreso</th><th>fecha_sintomas</th><th>fecha_def</th><th>alpha3</th><th>alpha2</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>0</td><td>HOMBRE</td><td>43</td><td>CIUDAD DE MÉXICO</td><td>NaN</td><td>NO</td><td>MEXICANA</td><td>NO ESPECIFICADO</td><td>MÉXICO</td><td>2022-05-03</td><td>2022-05-03</td><td>NaN</td><td>MEX</td><td>MX</td></tr>\n",
"    <tr><th>1</th><td>1</td><td>HOMBRE</td><td>39</td><td>CIUDAD DE MÉXICO</td><td>NaN</td><td>NO</td><td>MEXICANA</td><td>NO ESPECIFICADO</td><td>MÉXICO</td><td>2022-01-13</td><td>2022-01-10</td><td>NaN</td><td>MEX</td><td>MX</td></tr>\n",
"    <tr><th>2</th><td>2</td><td>HOMBRE</td><td>55</td><td>CIUDAD DE MÉXICO</td><td>NaN</td><td>NO</td><td>MEXICANA</td><td>NO ESPECIFICADO</td><td>MÉXICO</td><td>2022-01-12</td><td>2022-01-12</td><td>NaN</td><td>MEX</td><td>MX</td></tr>\n",
"    <tr><th>3</th><td>3</td><td>HOMBRE</td><td>54</td><td>CIUDAD DE MÉXICO</td><td>NaN</td><td>NO</td><td>MEXICANA</td><td>NO ESPECIFICADO</td><td>MÉXICO</td><td>2022-02-20</td><td>2022-02-13</td><td>NaN</td><td>MEX</td><td>MX</td></tr>\n",
"    <tr><th>4</th><td>4</td><td>MUJER</td><td>41</td><td>CIUDAD DE MÉXICO</td><td>NaN</td><td>NO</td><td>MEXICANA</td><td>NO ESPECIFICADO</td><td>MÉXICO</td><td>2022-01-12</td><td>2022-01-10</td><td>NaN</td><td>MEX</td><td>MX</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 sexo edad entidad_nacimiento municipio_residencia indigena \\\n",
"0 0 HOMBRE 43 CIUDAD DE MÉXICO NaN NO \n",
"1 1 HOMBRE 39 CIUDAD DE MÉXICO NaN NO \n",
"2 2 HOMBRE 55 CIUDAD DE MÉXICO NaN NO \n",
"3 3 HOMBRE 54 CIUDAD DE MÉXICO NaN NO \n",
"4 4 MUJER 41 CIUDAD DE MÉXICO NaN NO \n",
"\n",
" nacionalidad migrante pais_nacionalidad fecha_ingreso \\\n",
"0 MEXICANA NO ESPECIFICADO MÉXICO 2022-05-03 \n",
"1 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-13 \n",
"2 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-12 \n",
"3 MEXICANA NO ESPECIFICADO MÉXICO 2022-02-20 \n",
"4 MEXICANA NO ESPECIFICADO MÉXICO 2022-01-12 \n",
"\n",
" fecha_sintomas fecha_def alpha3 alpha2 \n",
"0 2022-05-03 NaN MEX MX \n",
"1 2022-01-10 NaN MEX MX \n",
"2 2022-01-12 NaN MEX MX \n",
"3 2022-02-13 NaN MEX MX \n",
"4 2022-01-10 NaN MEX MX "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"muestra_covid['alpha3'] = muestra_covid['pais_nacionalidad'].apply(lambda x: map_country_code(x, 'es', 'alpha_3'))\n",
"muestra_covid['alpha2'] = muestra_covid['pais_nacionalidad'].apply(lambda x: map_country_code(x, 'es', 'alpha_2'))\n",
"muestra_covid.head()"
]
},
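{
"cell_type": "markdown",
"id": "d9a6037e",
"metadata": {},
"source": [
"A comment inside `map_country_code` notes that short-circuiting `'MÉXICO'` cut the run time sharply, since that value dominates the column. A more general way to get a similar effect, sketched here under the assumption that the lookup always returns the same code for the same arguments, is to memoize it with `functools.lru_cache`. The names `CODES`, `calls` and `cached_code` are hypothetical and only stand in for the real, slower lookup:\n",
"\n",
"```python\n",
"from functools import lru_cache\n",
"\n",
"# Hypothetical stand-in for an expensive lookup such as map_country_code\n",
"CODES = {'MÉXICO': 'MX', 'ESPAÑA': 'ES', 'FRANCIA': 'FR'}\n",
"calls = {'count': 0}\n",
"\n",
"\n",
"@lru_cache(maxsize=None)\n",
"def cached_code(name):\n",
"    # The body runs only the first time each distinct name is seen\n",
"    calls['count'] += 1\n",
"    return CODES.get(name)\n",
"\n",
"\n",
"column = ['MÉXICO', 'MÉXICO', 'ESPAÑA', 'MÉXICO', 'FRANCIA', 'ESPAÑA']\n",
"result = [cached_code(name) for name in column]\n",
"print(result)\n",
"print(calls['count'])\n",
"```\n",
"\n",
"Six values are resolved, but the function body runs only three times, once per distinct name. The same decorator could in principle wrap `map_country_code`, because its arguments are hashable."
]
},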
{
"cell_type": "code",
"execution_count": 9,
"id": "a1a0ae6f",
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"muestra_covid.to_csv('../data/muestra_georef_covid.csv', index=False)"
]
},
{
"cell_type": "markdown",
"id": "73d3215a",
"metadata": {},
"source": [
"As you can see, solutions do not always come ready-made. Some situations will require exploration and creativity on our part to solve a problem or reach the goal we are after.\n",
"\n",
"The great richness of programming lies precisely in the creative freedom that each language gives us."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"formats": "md:myst",
"text_representation": {
"extension": ".md",
"format_name": "myst",
"format_version": 0.13,
"jupytext_version": "1.14.0"
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"source_map": [
14,
28,
30,
34,
37,
43,
47,
51,
54,
70,
145,
149,
151,
155,
160,
166,
169
]
},
"nbformat": 4,
"nbformat_minor": 5
}