Filter¶
In this notebook, we’ll zoom in on the important bits of your data and make sure that only the data points within the urban_layer you just queried remain!
Data source used:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change
import urban_mapper as um
# Get UrbanMapper rolling
mapper = um.UrbanMapper()
Loading Data and Creating a Layer¶
First, let’s load some data and create a layer for, say, Downtown Brooklyn.
Note that:
- A Loader example is available in examples/Basics/loader.ipynb, showing how to load your own data.
- An Urban Layer example is available in examples/Basics/urban_layer.ipynb, showing how to query your layer, e.g. the street intersections of Downtown Brooklyn.
# Load data
# Note: For the documentation's interactive mode, we only query 5000 records from the dataset. Feel free to remove this limit for a more realistic analysis.
data = (
    mapper
    .loader
    .from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True)  # From the loader module, from the HuggingFace OSCUR datasets hub
    .with_columns("longitude", "latitude")  # With the `longitude` and `latitude` columns, or only `geometry`
    .load()
)
# Create urban layer
layer = (
    mapper
    .urban_layer
    .with_type("streets_intersections")  # From the urban_layer module and with type streets_intersections
    .from_place("Downtown Brooklyn, New York City, USA")  # From a place
    .build()
)
Applying the Filter¶
Now that we’ve got all the ingredients, let’s use the BoundingBoxFilter to keep only the data points within our layer’s bounds. If you had data for the whole of New York City, it would be like putting a spotlight on Downtown Brooklyn.
# Apply filter
filtered_data = (
    mapper
    .filter  # From the filter module
    .with_type("BoundingBoxFilter")  # With type BoundingBoxFilter, which keeps only the data points inside the bounding box of the layer
    .transform(data, layer)  # Transform the data with the layer previously queried
)
filtered_data
| | borough | block | lot | cd | bct2020 | bctcb2020 | ct2010 | cb2010 | schooldist | council | ... | appdate | plutomapid | firm07_flag | pfirm15_flag | version | dcpedited | latitude | longitude | notes | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
152 | BK | 2061 | 101 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | 09/02/2008 | 1 | NaN | NaN | 25v1 | None | 40.693303 | -73.979673 | None | POINT (-73.97967 40.6933) |
155 | BK | 2061 | 60 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.692447 | -73.980152 | None | POINT (-73.98015 40.69245) |
156 | BK | 2085 | 1 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 2000.0 | 13.0 | 35.0 | ... | 05/03/2023 | 1 | NaN | NaN | 25v1 | None | 40.691845 | -73.980986 | None | POINT (-73.98099 40.69185) |
157 | BK | 2061 | 80 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.692469 | -73.981065 | None | POINT (-73.98106 40.69247) |
700 | BK | 157 | 18 | 302.0 | 3003700.0 | 3.003700e+10 | 37.0 | 1003.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.690059 | -73.984444 | None | POINT (-73.98444 40.69006) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
762 | BK | 164 | 40 | 302.0 | 3003700.0 | 3.003700e+10 | 37.0 | 1012.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.689274 | -73.986233 | None | POINT (-73.98623 40.68927) |
912 | BK | 173 | 58 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686825 | -73.981896 | None | POINT (-73.9819 40.68683) |
940 | BK | 173 | 54 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686738 | -73.981658 | None | POINT (-73.98166 40.68674) |
941 | BK | 173 | 55 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686762 | -73.981730 | None | POINT (-73.98173 40.68676) |
942 | BK | 173 | 1 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.687012 | -73.981701 | None | POINT (-73.9817 40.68701) |
68 rows × 93 columns
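To make the idea of a bounding-box filter concrete, here is a minimal pure-Python sketch of what such a filter does conceptually. It is not UrbanMapper's implementation: the helper names `bounding_box` and `filter_to_bbox` and all sample coordinates are hypothetical, chosen only for illustration, and points are assumed to be `(longitude, latitude)` pairs.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(points: Iterable[Point]) -> BBox:
    """Return the smallest axis-aligned box that contains every point."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute the bounding box of zero points")
    lons = [lon for lon, _ in pts]
    lats = [lat for _, lat in pts]
    return min(lons), min(lats), max(lons), max(lats)


def filter_to_bbox(points: Iterable[Point], bbox: BBox) -> List[Point]:
    """Keep only the points that fall inside (or on the edge of) the box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        (lon, lat)
        for lon, lat in points
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]


# Hypothetical layer nodes, standing in for street intersections.
layer_nodes = [(-73.990, 40.686), (-73.979, 40.686), (-73.979, 40.694), (-73.990, 40.694)]

# Hypothetical data points: the first two lie inside the box, the third far outside.
data_points = [(-73.97967, 40.69330), (-73.98444, 40.69006), (-73.93000, 40.75000)]

bbox = bounding_box(layer_nodes)
kept = filter_to_bbox(data_points, bbox)
print(kept)
```

Because a bounding box is a rectangle, points that sit outside the neighbourhood's actual outline but inside its enclosing rectangle are still kept.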
Previewing Your Filter¶
Curious about your filter? Use preview() to see its setup. It is super useful when you’re borrowing someone else’s analysis!
# Preview filter
mapper.filter.preview()  # preview() prints its summary itself, so wrapping it in print() only adds a stray "None"
Filter: BoundingBoxFilter Action: Filter data to the bounding box of the urban layer
Providing Multiple Datasets to the Same Filter¶
You can load several datasets and feed them to the filter as a dictionary. In that case, the output will also be a dictionary, as the simple example that follows shows.
If you want to apply the filter to one specific dataset in the dictionary, add .with_data(data_id=...) to the filter.
# Load the first dataset (PLUTO)
data1 = (
    mapper
    .loader
    .from_huggingface("oscur/pluto", number_of_rows=1000, streaming=True)  # From the loader module, from the HuggingFace OSCUR datasets hub
    .with_columns("longitude", "latitude")  # With the `longitude` and `latitude` columns, or only `geometry`
    .load()
)
# Load the second dataset (taxi trips)
data2 = (
    mapper
    .loader
    .from_huggingface("oscur/taxisvis1M", number_of_rows=1000, streaming=True)  # Swap in your own dataset here
    .with_columns("pickup_longitude", "pickup_latitude")  # Specify your longitude and latitude columns, or only geometry
    .load()
)
data = {
    "pluto_data": data1,
    "taxi_data": data2,
}
# Apply filter
filtered_data = (
    mapper
    .filter  # From the filter module
    .with_type("BoundingBoxFilter")  # With type BoundingBoxFilter, which keeps only the data points inside the bounding box of the layer
    .transform(data, layer)  # Transform every dataset in the dictionary with the layer previously queried
)
filtered_data["pluto_data"]
| | borough | block | lot | cd | bct2020 | bctcb2020 | ct2010 | cb2010 | schooldist | council | ... | appdate | plutomapid | firm07_flag | pfirm15_flag | version | dcpedited | latitude | longitude | notes | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
152 | BK | 2061 | 101 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | 09/02/2008 | 1 | NaN | NaN | 25v1 | None | 40.693303 | -73.979673 | None | POINT (-73.97967 40.6933) |
155 | BK | 2061 | 60 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.692447 | -73.980152 | None | POINT (-73.98015 40.69245) |
156 | BK | 2085 | 1 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 2000.0 | 13.0 | 35.0 | ... | 05/03/2023 | 1 | NaN | NaN | 25v1 | None | 40.691845 | -73.980986 | None | POINT (-73.98099 40.69185) |
157 | BK | 2061 | 80 | 302.0 | 3003101.0 | 3.003101e+10 | 31.0 | 1001.0 | 13.0 | 35.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.692469 | -73.981065 | None | POINT (-73.98106 40.69247) |
700 | BK | 157 | 18 | 302.0 | 3003700.0 | 3.003700e+10 | 37.0 | 1003.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.690059 | -73.984444 | None | POINT (-73.98444 40.69006) |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
762 | BK | 164 | 40 | 302.0 | 3003700.0 | 3.003700e+10 | 37.0 | 1012.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.689274 | -73.986233 | None | POINT (-73.98623 40.68927) |
912 | BK | 173 | 58 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686825 | -73.981896 | None | POINT (-73.9819 40.68683) |
940 | BK | 173 | 54 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686738 | -73.981658 | None | POINT (-73.98166 40.68674) |
941 | BK | 173 | 55 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.686762 | -73.981730 | None | POINT (-73.98173 40.68676) |
942 | BK | 173 | 1 | 302.0 | 3003900.0 | 3.003900e+10 | 39.0 | 1002.0 | 15.0 | 33.0 | ... | None | 1 | NaN | NaN | 25v1 | None | 40.687012 | -73.981701 | None | POINT (-73.9817 40.68701) |
68 rows × 93 columns
filtered_data["taxi_data"]
| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | RateCodeID | store_and_fwd_flag | dropoff_longitude | dropoff_latitude | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | geometry |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
824 | 1 | 2015-01-10 20:56:27 | 2015-01-10 21:00:26 | 1 | 0.7 | -73.988953 | 40.693775 | 1 | N | -73.984795 | 40.702522 | 2 | 4.5 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 5.8 | POINT (-73.98895 40.69378) |
865 | 2 | 2015-01-23 00:31:11 | 2015-01-23 00:41:25 | 2 | 2.5 | -73.987251 | 40.692081 | 1 | N | -73.964653 | 40.705872 | 2 | 11.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.3 | 12.3 | POINT (-73.98725 40.69208) |
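The dictionary-in, dictionary-out behaviour described in this section can be pictured with a small pure-Python sketch. This is an illustration, not UrbanMapper's code: `filter_datasets`, its `data_id` argument, the `inside` helper, and the sample values are hypothetical stand-ins. In particular, leaving the non-targeted datasets untouched is an assumption of this sketch, not a documented guarantee of the library.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def inside(point: Point, bbox: BBox) -> bool:
    """Return True if the point lies inside (or on the edge of) the box."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def filter_datasets(
    datasets: Dict[str, List[Point]],
    bbox: BBox,
    data_id: Optional[str] = None,
) -> Dict[str, List[Point]]:
    """Filter every dataset, or only the one named by `data_id`.

    Assumption of this sketch: datasets that are not targeted are
    returned as unfiltered copies.
    """
    if data_id is not None and data_id not in datasets:
        raise KeyError(f"unknown dataset id: {data_id!r}")
    return {
        name: [p for p in points if inside(p, bbox)]
        if data_id in (None, name)
        else list(points)
        for name, points in datasets.items()
    }


# Hypothetical bounding box and datasets, for illustration only.
bbox = (-73.990, 40.686, -73.979, 40.694)
datasets = {
    "pluto_data": [(-73.98015, 40.69245), (-73.90000, 40.80000)],
    "taxi_data": [(-73.98895, 40.69378), (-74.10000, 40.60000)],
}

all_filtered = filter_datasets(datasets, bbox)
only_taxi = filter_datasets(datasets, bbox, data_id="taxi_data")
print({name: len(points) for name, points in all_filtered.items()})
print({name: len(points) for name, points in only_taxi.items()})
```

Keeping the same keys in the output makes it easy to look up each filtered dataset by the name you gave it, as with `filtered_data["pluto_data"]` and `filtered_data["taxi_data"]`.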
More Geo Filter Primitives?¶
Want more? Come and let us know at https://github.com/VIDA-NYU/UrbanMapper/issues/5
Wrapping Up¶
Well done, you star! You’ve filtered your data to focus on what matters. Next stop: try enricher or visualiser.