例題: where .. group by .. みたいなこと
やりたいこと
下の方に書いたようなカンマ区切り文字列があったとします。 列名は左から name, xMin, yMin, xMax, yMax。
やりたいことを SQL と Python 混在風に書くと以下のようなこと 。
sql
select
namePrefix,
min(xMin), min(yMin),
max(xMax), max(xMax)
where
name like '%市 %'
group by
name.split(' ')[0] as namePrefix
言い換えると、「政令指定都市の境界を一覧したい」
city-to-bound.csv
邑楽郡 大泉町,139.382443,36.217958,139.436981,36.276451
邑楽郡 邑楽町,139.426620,36.219253,139.503839,36.289193
さいたま市 西区,139.540591,35.871460,139.603690,35.945782
さいたま市 北区,139.588043,35.914509,139.643607,35.966024
さいたま市 大宮区,139.595452,35.887551,139.649622,35.925716
さいたま市 見沼区,139.625700,35.892556,139.701143,35.965172
さいたま市 中央区,139.604275,35.854349,139.645693,35.902247
さいたま市 桜区,139.568880,35.831714,139.637676,35.888669
さいたま市 浦和区,139.632534,35.846428,139.668766,35.901683
さいたま市 南区,139.616569,35.828592,139.698057,35.866937
さいたま市 緑区,139.660718,35.857048,139.744507,35.918457
さいたま市 岩槻区,139.653536,35.880293,139.757345,36.002692
川越市,139.377440,35.837584,139.558388,35.962265
熊谷市,139.289611,36.069632,139.446620,36.254374
回答(標準)
py
import csv
from collections import defaultdict
groups = defaultdict(lambda: [float("inf"), float("inf"), float("-inf"), float("-inf")])
with open("city-to-bound.csv", newline="", encoding="utf-8") as f:
reader = csv.reader(f)
for row in reader:
name, xMin, yMin, xMax, yMax = row
xMin, yMin, xMax, yMax = map(float, (xMin, yMin, xMax, yMax))
if "市 " not in name:
continue
prefix = name.split(" ")[0]
g = groups[prefix]
g[0] = min(g[0], xMin)
g[1] = min(g[1], yMin)
g[2] = max(g[2], xMax)
g[3] = max(g[3], yMax)
for prefix, (xmin, ymin, xmax, ymax) in groups.items():
print(prefix, xmin, ymin, xmax, ymax)
回答(pandas あり)
py
import pandas as pd
df = pd.read_csv("city-to-bound.csv", header=None, names=["name", "xMin", "yMin", "xMax", "yMax"])
df = df[df["name"].str.contains("市 ")]
df["namePrefix"] = df["name"].str.split(" ").str[0]
result = df.groupby("namePrefix").agg(
xMin=("xMin", "min"),
yMin=("yMin", "min"),
xMax=("xMax", "max"),
yMax=("yMax", "max"),
).reset_index()
print(result)