Step-By-Step

This notebook guides you through a complete UrbanMapper workflow, step-by-step, using the PLUTO dataset in Downtown Brooklyn.

We’ll load data, create a street intersections layer, impute missing coordinates, filter data, map it to intersections, enrich with average floors, and visualise the results interactively. In effect, this combines the Basics/[1-6] examples into a single notebook.

Data source used:

PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change

In [1]:

import urban_mapper as um

# Initialise UrbanMapper
mapper = um.UrbanMapper()

Step 1: Load Data

Goal: Load the PLUTO dataset to begin our analysis.

Input: A CSV dataset hosted on the OSCUR Hugging Face datasets hub, containing PLUTO data with columns such as longitude, latitude, and numfloors. To work with your own data, substitute your own CSV file path.

Output: A GeoDataFrame (gdf) with the loaded data, tagged with longitude and latitude columns for geospatial analysis.

Here, we use the loader module to read the CSV and specify the coordinate columns, making the data ready for geospatial operations.

In [2]:

# Note: For the documentation's interactive mode, we query only 5000 records from the dataset. Feel free to remove the limit for a more realistic analysis.
data = (
    mapper
    .loader
    .from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True)
    .with_columns(longitude_column="longitude", latitude_column="latitude")
#     .with_columns(geometry_column="<geometry_column_name>")  # Replace <geometry_column_name> with the actual name of your geometry column, used instead of the latitude and longitude columns.
    .load()
)
data.head(10)  # Preview the first ten rows

Out[2]:

	borough	block	lot	cd	bct2020	bctcb2020	ct2010	cb2010	schooldist	council	...	appdate	plutomapid	firm07_flag	pfirm15_flag	version	dcpedited	latitude	longitude	notes	geometry
0	BK	5852	1	310.0	3003000.0	3.003000e+10	30.0	2000.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.638298	-74.030598	None	POINT (-74.0306 40.6383)
1	BK	5852	13	310.0	3003000.0	3.003000e+10	30.0	2000.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.638575	-74.030126	None	POINT (-74.03013 40.63858)
2	BK	5852	6	310.0	3003000.0	3.003000e+10	30.0	2000.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.638567	-74.030490	None	POINT (-74.03049 40.63857)
3	BK	5852	58	310.0	3003000.0	3.003000e+10	30.0	2000.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.638142	-74.029704	None	POINT (-74.0297 40.63814)
4	BK	5848	77	310.0	3003000.0	3.003000e+10	30.0	1007.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.639039	-74.030115	None	POINT (-74.03012 40.63904)
5	BK	5861	101	310.0	3003000.0	3.003000e+10	30.0	2001.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.637815	-74.030140	None	POINT (-74.03014 40.63781)
6	BK	5852	55	310.0	3003000.0	3.003000e+10	30.0	2000.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.638084	-74.029517	None	POINT (-74.02952 40.63808)
7	BK	5848	76	310.0	3003000.0	3.003000e+10	30.0	1007.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.639012	-74.030032	None	POINT (-74.03003 40.63901)
8	BK	5861	84	310.0	3003000.0	3.003000e+10	30.0	2001.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.637381	-74.030586	None	POINT (-74.03059 40.63738)
9	BK	5848	17	310.0	3003000.0	3.003000e+10	30.0	1007.0	20.0	47.0	...	None	1	NaN	NaN	25v1	None	40.639171	-74.029575	None	POINT (-74.02957 40.63917)

10 rows × 93 columns
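To make the "tagging" of coordinate columns concrete, here is a minimal standard-library sketch of turning longitude and latitude columns into points. It is an illustration only, not how UrbanMapper's loader is implemented; the helper name `load_points` and the reduced column set are assumptions made for this example. The two rows are taken from the preview above.

```python
import csv
import io

# Two rows from the preview above, reduced to a handful of columns.
CSV_TEXT = """borough,block,lot,latitude,longitude
BK,5852,1,40.638298,-74.030598
BK,5852,13,40.638575,-74.030126
"""


def load_points(text, longitude_column, latitude_column):
    """Parse CSV text and attach an (x, y) = (longitude, latitude) point to each row.

    Hypothetical helper for illustration; not part of UrbanMapper.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        lon = float(row[longitude_column])
        lat = float(row[latitude_column])
        row["point"] = (lon, lat)  # x comes first, as in POINT (lon lat)
        rows.append(row)
    return rows


rows = load_points(CSV_TEXT, "longitude", "latitude")
for row in rows:
    print(row["lot"], row["point"])
```

Note the ordering: the point stores longitude first and latitude second, which matches the geometry column in the preview (for example POINT (-74.0306 40.6383)).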

Step 2: Create Urban Layer

Goal: Build a foundational layer of street intersections in Downtown Brooklyn to map our data onto.

Input: A place name (Downtown Brooklyn, New York City, USA) and mapping configuration (longitude, latitude, output column, and threshold distance).

Output: An UrbanLayer object representing street intersections, ready to associate data points with specific intersections.

We use the urban_layer module with type streets_intersections, fetch the network via OSMnx (using drive network type), and configure mapping to assign data points to the nearest intersection within 50 meters.

In [3]:

layer = (
    mapper.urban_layer.with_type("streets_intersections")
    .from_place("Downtown Brooklyn, New York City, USA", network_type="drive")
    .with_mapping(
        longitude_column="longitude",
        latitude_column="latitude",
#        geometry_column="<geometry_column_name>",  # Replace <geometry_column_name> with the actual name of your geometry column, used instead of the latitude and longitude columns.
        output_column="nearest_intersection",
        threshold_distance=50,
    )
    .build()
)
layer.static_render()  # Visualise the plain intersections statically (Optional)


Step 3: Impute Missing Data

Goal: Handle missing longitude and latitude (or geometry) values so that every remaining data point can be mapped and worked with.

Input: The GeoDataFrame from Step 1 (with potential missing coordinates) and the urban layer from Step 2.

Output: A GeoDataFrame with imputed coordinates, reducing missing values.

The SimpleGeoImputer from the imputer module takes the naive approach of simply dropping records that have missing coordinates; see the documentation for other imputers. We check missing values before and after to see the effect.

In [4]:

print(f"Missing before: {data[['longitude', 'latitude']].isna().sum()}")
imputed_data = (
    mapper.imputer.with_type("SimpleGeoImputer")
    .on_columns(longitude_column="longitude", latitude_column="latitude")
#    .on_columns(geometry_column="geometry") # if the dataset has a geometry instead of latitude-longitude columns
    .transform(data, layer)
)
print(f"Missing after: {imputed_data[['longitude', 'latitude']].isna().sum()}")

Missing before: longitude    0
latitude     0
dtype: int64

Missing after: longitude    0
latitude     0
dtype: int64
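The naive strategy described in this step, dropping records that lack coordinates, can be sketched in plain Python as follows. This is an illustration of the idea rather than the SimpleGeoImputer's actual code; the function name `drop_missing_coordinates` and the three sample records are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing_coordinates(records, lon_key="longitude", lat_key="latitude"):
    """Keep only records whose longitude and latitude are both present."""
    return [
        record
        for record in records
        if not is_missing(record.get(lon_key)) and not is_missing(record.get(lat_key))
    ]


# Hypothetical records: one complete, two with a missing coordinate.
records = [
    {"lot": 1, "longitude": -74.030598, "latitude": 40.638298},
    {"lot": 2, "longitude": None, "latitude": 40.638575},
    {"lot": 3, "longitude": -74.030490, "latitude": float("nan")},
]

cleaned = drop_missing_coordinates(records)
print(f"Rows before: {len(records)}, rows after: {len(cleaned)}")
```

When no coordinates are missing, as in the 5000-row sample above, such a step leaves the data unchanged, which is why both counts read zero.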

Step 4: Filter Data

Goal: Narrow down the data to only points within Downtown Brooklyn’s bounds.

Input: The imputed GeoDataFrame from Step 3 and the urban layer from Step 2.

Output: A filtered GeoDataFrame containing only data within the layer’s bounding box.

Using the BoundingBoxFilter from the filter module, we trim the dataset to match the spatial extent of our intersections layer, reducing irrelevant data.

In [5]:

print(f"Rows before: {len(imputed_data)}")
filtered_data = mapper.filter.with_type("BoundingBoxFilter").transform(
    imputed_data, layer
)
print(f"Rows after: {len(filtered_data)}")

Rows before: 5000

Rows after: 53
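The idea behind a bounding-box filter can be sketched without any geospatial library: compute the minimum and maximum longitude and latitude of the layer, then keep only the points that fall inside. The snippet below is a simplified illustration, not the BoundingBoxFilter's implementation; the helper names are hypothetical. The intersection coordinates and the first lot come from outputs in this notebook, while the second lot is a made-up point placed inside the box.

```python
def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) for a list of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def within_box(point, box):
    """True if the (lon, lat) point lies inside the box, edges included."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


# Three intersections from the layer preview in this notebook.
intersections = [
    (-73.982623, 40.692056),
    (-73.989126, 40.692170),
    (-73.985353, 40.690663),
]

lots = [
    (-74.030598, 40.638298),  # first PLUTO row: far outside Downtown Brooklyn
    (-73.986000, 40.691500),  # hypothetical point inside the box
]

box = bounding_box(intersections)
kept = [lot for lot in lots if within_box(lot, box)]
print(f"Rows before: {len(lots)}, rows after: {len(kept)}")
```

A bounding box is a coarse filter: it can keep points that sit in the rectangle but outside the neighbourhood's true outline, which is acceptable here because the next step applies a distance threshold.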

Step 5: Map to Nearest Layer

Goal: Link each data point to its nearest street intersection so that we can later enrich the intersections with basic aggregations or geo-statistics.

Input: The filtered GeoDataFrame from Step 4.

Output: An updated UrbanLayer and a GeoDataFrame with a new nearest_intersection column indicating the closest intersection for each point.

The map_nearest_layer method uses the mapping configuration from Step 2 to associate data points with intersections, enabling spatial aggregation in the next step.

In [6]:

_, mapped_data = layer.map_nearest_layer(filtered_data) # Outputs both the layer (unnecessary here) and the mapped data
mapped_data.head()  # Check the new 'nearest_intersection' column

Out[6]:

	borough	block	lot	cd	bct2020	bctcb2020	ct2010	cb2010	schooldist	council	...	plutomapid	firm07_flag	pfirm15_flag	version	dcpedited	latitude	longitude	notes	geometry	nearest_intersection
156	BK	2085	1	302.0	3003101.0	3.003101e+10	31.0	2000.0	13.0	35.0	...	1	NaN	NaN	25v1	None	40.691845	-73.980986	None	POINT (-73.98099 40.69185)	NaN
157	BK	2061	80	302.0	3003101.0	3.003101e+10	31.0	1001.0	13.0	35.0	...	1	NaN	NaN	25v1	None	40.692469	-73.981065	None	POINT (-73.98106 40.69247)	NaN
700	BK	157	18	302.0	3003700.0	3.003700e+10	37.0	1003.0	15.0	33.0	...	1	NaN	NaN	25v1	None	40.690059	-73.984444	None	POINT (-73.98444 40.69006)	6.0
701	BK	164	39	302.0	3003700.0	3.003700e+10	37.0	1012.0	15.0	33.0	...	1	NaN	NaN	25v1	None	40.689321	-73.986200	None	POINT (-73.9862 40.68932)	36.0
702	BK	164	34	302.0	3003700.0	3.003700e+10	37.0	1012.0	15.0	33.0	...	1	NaN	NaN	25v1	None	40.689560	-73.986143	None	POINT (-73.98614 40.68956)	37.0

5 rows × 94 columns
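Nearest-neighbour assignment with a distance threshold can be sketched as below. This is an assumption-laden illustration: it uses the haversine great-circle distance and a brute-force search, whereas UrbanMapper may well rely on projected coordinates and a spatial index. The names `haversine_m` and `nearest_within` are hypothetical. It also offers one plausible reading of the NaN values in nearest_intersection above, namely that no intersection lies within the 50-metre threshold of those points; the notebook itself does not confirm this.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_within(point, candidates, threshold_m):
    """Index of the closest candidate, or None if it is farther than threshold_m."""
    lon, lat = point
    best_index, best_distance = None, math.inf
    for index, (c_lon, c_lat) in enumerate(candidates):
        distance = haversine_m(lon, lat, c_lon, c_lat)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index if best_distance <= threshold_m else None


# Two intersections from the layer preview in this notebook.
intersections = [(-73.982623, 40.692056), (-73.985353, 40.690663)]

near_point = (-73.985300, 40.690700)  # hypothetical point a few metres from index 1
far_point = (-74.030598, 40.638298)   # first PLUTO row, several kilometres away

print(nearest_within(near_point, intersections, threshold_m=50))
print(nearest_within(far_point, intersections, threshold_m=50))
```

The first call returns the index of the nearby intersection, while the second returns None because the closest intersection is beyond the threshold.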

Step 6: Enrich the Layer

Goal: Add meaningful insights by calculating the average number of floors per intersection.

Input: The mapped GeoDataFrame from Step 5 and the urban layer from Step 2.

Output: An enriched UrbanLayer with an avg_floors column in its GeoDataFrame.

The enricher module aggregates the numfloors column by nearest_intersection using the mean, adding this statistic to the layer for visualisation or further analysis, such as machine learning.

In [7]:

enricher = (
    mapper.enricher.with_data(group_by="nearest_intersection", values_from="numfloors")
    .aggregate_by(method="mean", output_column="avg_floors")
    .build()
)
enriched_layer = enricher.enrich(mapped_data, layer)
enriched_layer.get_layer().head()  # Preview the enriched layer's GeoDataFrame content

Out[7]:

	osmid	y	x	highway	street_count	geometry	avg_floors
0	42464631	40.692056	-73.982623	traffic_signals	4	POINT (-73.98262 40.69206)	0.000000
1	42464823	40.692170	-73.989126	traffic_signals	4	POINT (-73.98913 40.69217)	0.000000
2	42464824	40.691802	-73.988213	traffic_signals	4	POINT (-73.98821 40.6918)	1.500000
3	42464827	40.691455	-73.987339	traffic_signals	4	POINT (-73.98734 40.69145)	4.333333
4	42464832	40.690663	-73.985353	traffic_signals	3	POINT (-73.98535 40.69066)	3.000000
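The aggregation in this step is a group-by mean followed by a join back onto the layer. A minimal pure-Python sketch follows; it is not the enricher's implementation, and the function name `mean_by_group` and the sample rows are hypothetical. The 0.0 default for intersections without any mapped points is an assumption that mirrors the zeros in the preview above.

```python
from collections import defaultdict


def mean_by_group(rows, group_key, value_key):
    """Average value_key per group_key, skipping rows where either is missing."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for row in rows:
        group, value = row.get(group_key), row.get(value_key)
        if group is None or value is None:
            continue
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical mapped rows; None marks a point with no intersection in range.
mapped_rows = [
    {"nearest_intersection": 2, "numfloors": 1.0},
    {"nearest_intersection": 2, "numfloors": 2.0},
    {"nearest_intersection": 4, "numfloors": 3.0},
    {"nearest_intersection": None, "numfloors": 5.0},
]

averages = mean_by_group(mapped_rows, "nearest_intersection", "numfloors")

# Join onto a layer of five intersections, defaulting to 0.0 where nothing was mapped.
layer_avg = [averages.get(index, 0.0) for index in range(5)]
print(layer_avg)
```

A default of 0.0 keeps the column numeric for plotting, but it cannot distinguish "no buildings mapped" from "buildings with zero floors"; a missing-value marker would preserve that distinction.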

Step 7: Visualise Results

Goal: Display the enriched data on an interactive map for exploration.

Input: The enriched GeoDataFrame from Step 6.

Output: An interactive Folium map showing average floors per intersection with a dark theme.

The visual module creates an interactive map with the Interactive type and a dark CartoDB dark_matter style, highlighting the avg_floors column.

In [8]:

fig = (
    mapper.visual.with_type("Interactive")
    .with_style({"tiles": "CartoDB dark_matter", "colorbar_text_color": "white"})
    .show(columns=["avg_floors"])  # Show the avg_floors column
    .render(enriched_layer.get_layer())
)
fig  # Display the map


Conclusion

Congratulations! You’ve completed a full UrbanMapper workflow, step-by-step. You’ve transformed raw PLUTO data into a visually rich map of average building floors per intersection in Downtown Brooklyn. For a more streamlined approach, check out the Pipeline End-To-End notebook!