Yasunori Morishima (盛島康徳)

Manufacturing Engineer & Data Analyst with 17 years of experience in the manufacturing industry, specializing in data analysis, open source contributions, and business automation.

🌍 Real-time Open Data

Japan Geohazard Monitor: 31 geophysical data sources → ML earthquake prediction (AUC 0.754, CSEP Molchan 0.981) + real-time monitoring dashboard

Persian Gulf Ship Tracker: AIS vessel tracking across the Persian Gulf & Gulf of Oman with land mask filtering

Real-time API / WebSocket → SQLite → FastAPI + Leaflet.js (dark theme)

All projects

Japan Geohazard Monitor — Earthquake prediction research: 76 features from 25+ sources (USGS, Earthdata, INTERMAGNET, NMDB, NOAA, IOC), walk-forward HistGBT + stacking, BigQuery data platform (216K rows). Weekly automated CI pipeline on GitHub Actions.


⚾ Baseball Analytics

Prediction Systems

| Project | Description | Demo |
|---|---|---|
| NPB Season Prediction | Bayesian ensemble (Marcel 35% + Stan/Ridge 40% + ML 25%) + Monte Carlo team simulation + individual projections for 24 foreign players | Live |
| NPB 2021 Backtest | Could a Bayesian model have predicted Yakult and Orix going from last place to champion? 25 foreign players with FanGraphs data | Analysis |
| MLB Win Probability Engine | 3-engine WP ensemble (Normal + Empirical + LightGBM) + Gemini AI commentary | Live |
| Baseball MLOps Pipeline | Statcast × GCP MLOps: 5-model ensemble + BQML, retrained weekly on BigQuery + Cloud Run | Live |
| MLB Data Pipeline | Shared BigQuery data platform (FanGraphs + Savant + Statcast) for all baseball analytics projects | |
Prediction accuracy & details

NPB Season Prediction — 2026 Forecast

  • Bayesian ensemble: Marcel 35% + Stan/Ridge skill correction 40% + ML (XGBoost/LightGBM) 25%
  • CL: Hanshin 71.5W (26%) > Giants 71.1 > Chunichi 71.0 > DeNA 70.7 (4 teams within 0.8W) / PL: SoftBank 81.3W (48%) > Nippon-Ham 79.1 > Orix 77.5
  • 24 foreign players individually projected via Stan v2 (MLB/KBO → NPB conversion)
  • 10,000 Monte Carlo simulations with park factors (Pythagorean exponent k = 1.83)
  • 8-year backtest: Bayesian wOBA MAE .0498, ERA MAE 1.222 — 97% probability of beating Marcel
  • Article (JP) / Article (EN)
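The forecast ingredients listed above (a 35/40/25 weighted blend, a Pythagorean win expectation with k = 1.83, and Monte Carlo seasons) can be sketched as below. This is a simplified illustration, not the project's code: the blend is applied to a single number, each team's wins are drawn independently from a binomial over a 143-game season, and park factors, head-to-head scheduling and tied games are ignored. Function names are hypothetical.

```python
import numpy as np

PYTHAG_K = 1.83   # Pythagorean exponent quoted in the forecast notes
NPB_GAMES = 143   # regular-season games per NPB team


def pythagorean_win_pct(runs_scored: float, runs_allowed: float, k: float = PYTHAG_K) -> float:
    """Expected winning percentage: RS^k / (RS^k + RA^k)."""
    rs, ra = runs_scored ** k, runs_allowed ** k
    return rs / (rs + ra)


def blend(marcel: float, stan_ridge: float, ml: float,
          weights: tuple[float, float, float] = (0.35, 0.40, 0.25)) -> float:
    """Weighted average of three component projections (35/40/25 split)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, [marcel, stan_ridge, ml]) / w.sum())


def pennant_odds(win_pcts, games: int = NPB_GAMES, n_sims: int = 10_000, seed: int = 0) -> np.ndarray:
    """Share of simulated seasons in which each team finishes first.

    Simplification: every team's win total is an independent binomial draw.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(win_pcts, dtype=float)
    wins = rng.binomial(games, p, size=(n_sims, p.size))  # rows = seasons, cols = teams
    # Jitter in [0, 1) breaks ties at random without reordering distinct win totals.
    leader = np.argmax(wins + rng.random(wins.shape), axis=1)
    return np.bincount(leader, minlength=p.size) / n_sims


if __name__ == "__main__":
    teams = [pythagorean_win_pct(560, 520), pythagorean_win_pct(540, 540), pythagorean_win_pct(500, 560)]
    print(pennant_odds(teams))
```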

NPB Season Prediction — 2021 Backtest

  • Tested whether the Bayesian model could have predicted Yakult and Orix going from last place to champions
  • 25 foreign players (13 hitters, 12 pitchers) with full FanGraphs data (wOBA, K%, BB%, ERA, FIP, WHIP)
  • Result: MAE 10.7 wins (vs. 10.4 without foreign-player projections); foreign players had minimal impact
  • Key finding: Bayesian regression over-corrects extreme values; model predicted mediocre players well (Cron .703 vs .701 actual) but failed on extremes (Gerber .862 vs .352)
  • 2021 standings were driven by Japanese player breakouts/collapses, not foreign players
  • Data: baseball-data.com + npb.jp + FanGraphs + Baseball Savant
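One generic mechanism behind "regression over-corrects extreme values" is shrinkage toward the league mean. The sketch below is the textbook normal-normal model, not the project's Stan model, and the parameter values in the example are illustrative only.

```python
def posterior_mean(observed: float, n: float, prior_mean: float,
                   noise_sd: float, talent_sd: float) -> float:
    """Normal-normal posterior mean for a player's true rate.

    observed:   player's observed rate over n trials (e.g. plate appearances)
    prior_mean: league-average rate
    noise_sd:   per-trial standard deviation of the outcome
    talent_sd:  spread of true talent across players
    """
    data_precision = n / noise_sd ** 2
    prior_precision = 1.0 / talent_sd ** 2
    weight = data_precision / (data_precision + prior_precision)
    # The same weight applies to every player with the same n.
    return weight * observed + (1.0 - weight) * prior_mean


if __name__ == "__main__":
    # Illustrative numbers only, not fitted to NPB data.
    for obs in (0.330, 0.450):
        est = posterior_mean(obs, n=300, prior_mean=0.320, noise_sd=0.5, talent_sd=0.03)
        print(f"observed {obs:.3f} -> shrunk {est:.3f} (pulled {obs - est:.3f})")
```

Because the data weight is fixed for a given sample size, the absolute pull grows with distance from the league mean: a near-average hitter barely moves, while a genuine outlier is the player most likely to be mis-projected.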

MLB Win Probability Engine

  • 3-engine ensemble: v1 (Markov + Normal + Optuna), v2 (empirical WP table), and LightGBM, combined with inverse-Brier weighting and isotonic calibration
  • 367K+ play states (2015–2024) on BigQuery for training/validation, with leave-one-year-out CV
  • Gemini 2.5 Flash AI commentary with prompt versioning, quality evaluation (100-point scale), and W&B tracking
  • Live feed: MLB Stats API → real-time WP / LI / tactical recommendations (30s auto-refresh)
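Inverse-Brier weighting followed by isotonic calibration can be sketched with scikit-learn's `brier_score_loss` and `IsotonicRegression`. This is a minimal illustration, assuming each engine outputs a home-win probability per play state; the engine names and synthetic data are hypothetical, and in practice the weights and calibrator would be fitted on held-out predictions (for example, leave-one-year-out folds) rather than on the data they are applied to.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss


def inverse_brier_blend(y_true, engine_probs: dict[str, np.ndarray]):
    """Weight each engine by 1 / Brier score (normalised to sum to 1) and blend."""
    inv = {name: 1.0 / brier_score_loss(y_true, p) for name, p in engine_probs.items()}
    total = sum(inv.values())
    weights = {name: v / total for name, v in inv.items()}
    blended = sum(weights[name] * np.asarray(p, dtype=float) for name, p in engine_probs.items())
    return weights, blended


def fit_calibrator(y_true, blended) -> IsotonicRegression:
    """Monotone map from blended probability to observed win frequency."""
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    return iso.fit(blended, y_true)


if __name__ == "__main__":
    rng = np.random.default_rng(7)
    true_p = rng.uniform(0.05, 0.95, size=5_000)
    y = rng.binomial(1, true_p)
    engines = {  # hypothetical engine outputs: one sharp, one noisy
        "sharp": np.clip(true_p + rng.normal(0, 0.05, true_p.size), 0, 1),
        "noisy": np.clip(true_p + rng.normal(0, 0.25, true_p.size), 0, 1),
    }
    weights, blended = inverse_brier_blend(y, engines)
    calibrated = fit_calibrator(y, blended).predict(blended)
    print(weights)
```

Isotonic regression only assumes the blended score is monotone in the true win probability, which is why it pairs well with a weighted average whose scale may drift.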

Baseball MLOps Pipeline

  • Accuracy (2025 backtest): Batter wOBA MAE = .0287 (Marcel: .0326) / Pitcher xFIP MAE = 0.483 (Marcel: 0.558)
  • 5-model ensemble: LightGBM + CatBoost + ElasticNet Bayes + Component (PECOTA) + BQML Boosted Tree
  • GCP: BigQuery (13 raw tables + BQML models) + Cloud Run (FastAPI) + Grafana dashboard
  • Pipeline: GitHub Actions cron (weekly) → pybaseball → 5-model retrain → W&B + BigQuery → Cloud Run → Streamlit
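The accuracy line above compares ensemble MAE against Marcel. A minimal sketch of that comparison: the equal-weight average is an assumption for illustration (the pipeline's actual model weighting isn't stated here), and the example plugs in the 2025 backtest figures quoted above.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def average_ensemble(*model_preds) -> np.ndarray:
    """Equal-weight average of several models' predictions (illustrative weighting)."""
    return np.mean(np.vstack(model_preds), axis=0)


def relative_improvement(baseline_mae: float, model_mae: float) -> float:
    """Fractional MAE reduction versus a baseline such as Marcel."""
    return (baseline_mae - model_mae) / baseline_mae


if __name__ == "__main__":
    # Figures quoted in the 2025 backtest above.
    print(f"wOBA: {relative_improvement(0.0326, 0.0287):.1%} lower MAE than Marcel")
    print(f"xFIP: {relative_improvement(0.558, 0.483):.1%} lower MAE than Marcel")
```

On the quoted numbers this works out to roughly a 12% MAE reduction for batter wOBA and about 13% for pitcher xFIP.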

Biomechanics

Baseball Skeleton Analysis — 3D skeleton visualization from Driveline OpenBiomechanics C3D data

Pitching Skeleton (3D C3D) / Hitting Skeleton (3D C3D)

Trunk rotation range vs. pitch speed: r = 0.425, the strongest correlation found. Contributed bug fix PR #384 to ezc3d. Article (JP) / Article (EN)
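Finding the "strongest" relationship amounts to computing Pearson r between each kinematic feature and pitch speed and ranking by magnitude. A minimal pandas sketch: the column names and synthetic data are hypothetical stand-ins for features extracted from the Driveline OpenBiomechanics C3D files (reading C3D files, e.g. with ezc3d, is not shown).

```python
import numpy as np
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str = "pitch_speed") -> pd.Series:
    """Pearson r of every feature column with the target, strongest |r| first."""
    r = df.drop(columns=[target]).corrwith(df[target])
    return r.reindex(r.abs().sort_values(ascending=False).index)


if __name__ == "__main__":
    # Hypothetical per-pitch features; real values would come from C3D data.
    rng = np.random.default_rng(0)
    n = 200
    trunk = rng.normal(size=n)
    df = pd.DataFrame({
        "trunk_rotation_range": trunk,
        "stride_length": rng.normal(size=n),
        "pitch_speed": 2 * trunk + rng.normal(size=n),
    })
    print(rank_correlations(df))
```

Ranking by absolute value keeps strong negative correlations visible alongside positive ones.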

Statcast Analysis

6 analyses covering Japanese MLB pitchers and Ohtani's batting data.
All analyses (6)
| Analysis | Key Finding | Article |
|---|---|---|
| Kikuchi Slider Revolution (2019-2025) | SL 17%→37% after the Astros trade | Zenn / DEV.to / Kaggle |
| Senga Ghost Fork (2023-2025) | FO whiff rate 58%→39%, declining before the injury | Zenn / DEV.to / Kaggle |
| Imanaga 2nd Year (2024-2025) | 3-pitch concentration (97%), 1st TTO xwOBA .505 | Zenn / DEV.to / Kaggle |
| Darvish Evolution (2021-2025) | SL/ST halved, CU became the putaway pitch | Zenn / DEV.to / Kaggle |
| Ohtani Spray Chart | `spraychart()` one-liner vs. manual matplotlib | Zenn |
| Ohtani Heatmap | Stadium drawing + hit density heatmap | Zenn |
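Pitch usage shares (such as SL 17%→37%) and whiff rates (such as FO 58%→39%) are both aggregates over pitch-level Statcast rows. A minimal pandas sketch, assuming pybaseball-style columns `pitch_type` and `description`; which `description` values count as swings and whiffs is an assumption here and may not match Baseball Savant's exact definition.

```python
import pandas as pd

# Assumption: which Statcast `description` values count as swings / whiffs.
# Bunt attempts are excluded; older seasons also use other hit_into_play_* labels.
SWINGS = {"swinging_strike", "swinging_strike_blocked", "foul", "foul_tip", "hit_into_play"}
WHIFFS = {"swinging_strike", "swinging_strike_blocked"}


def pitch_summary(pitches: pd.DataFrame) -> pd.DataFrame:
    """Usage share and whiff rate (whiffs / swings) per pitch type."""
    flags = pitches.assign(
        swing=pitches["description"].isin(SWINGS),
        whiff=pitches["description"].isin(WHIFFS),
    )
    g = flags.groupby("pitch_type")
    return pd.DataFrame({
        "usage": g.size() / len(pitches),
        "whiff_rate": g["whiff"].sum() / g["swing"].sum(),
    })


if __name__ == "__main__":
    # Tiny hand-made sample (hypothetical pitches).
    sample = pd.DataFrame({
        "pitch_type": ["FO", "FO", "FO", "FO", "SL", "SL"],
        "description": ["swinging_strike", "foul", "hit_into_play", "ball",
                        "swinging_strike_blocked", "called_strike"],
    })
    print(pitch_summary(sample))
```

Running it once per season (for example, filtering on Statcast's `game_year` column) gives year-over-year trends like those in the table.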

🌐 Open Source Contributions

55 PRs (28 merged) across 22 repositories. See [oss-contributions](https://github.com/yasumorishima/oss-contributions) for full details.
PR highlights

| Repository | PR | Description |
|---|---|---|
| dfinity/icp-js-core | #1270 | Improve Candid decode error messages |
| dfinity/icp-js-core | #1277 | Deduplicate parallel `fetchSubnetKeys` |
| dfinity/pic-js | #235 | Add `fetchCanisterLogs()` method |
| line/line-bot-mcp-server | #369 | Add `get_follower_ids` tool |
| pyomeca/ezc3d | #384 | Fix `__eq__` early return bug |
| optuna/optuna | | Hyperparameter optimization framework |
| pandas-dev/pandas | | Data analysis library |
| jldbc/pybaseball | #498-504 | Bug fixes & documentation |
team-mirai: Civic Tech OSS (21 PRs: 11 merged, 2 open, 8 closed)

Contributing to open-source civic tech projects that promote political transparency and citizen participation in Japan.

| # | Repository | PR | Status | Description |
|---|---|---|---|---|
| 21 | marumie | #1141 | Open | Display total amount when category filter is applied |
| 20 | action-board | #1969 | Merged | Add 48 unit tests for pure functions |
| 19 | action-board | #1918 | Merged | Disable Supabase Image Transformation |
| 18 | action-board | #1914 | Merged | Block shape deletion with XP |
| 17 | post-checker | #34 | Open | Fix timezone-dependent date parsing |
| 16 | action-board | #1906 | Merged | Refactor `achieveMissionAction` |
| 15 | action-board | #1869 | Merged | Supabase RPC function tests for develop |
| 14 | action-board | #1868 | Merged | Posting count display: times to sheets |
| 13 | action-board | #1867 | Merged | Error toast for poster mission failure |
| 12 | action-board | #1859 | Merged | Supabase RPC function tests |
| 11 | fact-checker | #88 | Closed | Slack same-thread reply |
| 10 | fact-checker | #87 | Closed | Deduplicate tweets using `start_time` filter |
| 9 | fact-checker | #86 | Closed | Unit tests for Note markdown utilities |
| 8 | action-board | #1856 | Merged | Update video mission description |
| 7 | action-board | #1855 | Closed | Street speech map link |
| 6 | fact-checker | #85 | Closed | Slack button env-based branching |
| 5 | action-board | #1849 | Merged | Breadcrumb navigation |
| 4 | action-board | #1845 | Merged | Fix prefecture cache invalidation |
| 3 | fact-checker | #84 | Closed | Disable Twitter posting in staging |
| 2 | fact-checker | #83 | Closed | Client-side engagement filtering |
| 1 | fact-checker | #69 | Done | X API investigation |

Tech Stack: Next.js, TypeScript, Supabase, shadcn/ui, Biome, Bun, Vitest


📊 Data & Competitions

Kaggle

Notebooks Expert | 🥉 14 Bronze Notebook Medals

Active: S6E3 Churn (LB 0.914) / Deep Past (Akkadian→English) / RNA 3D Folding 2

Bronze Medal Notebooks (14)
| Notebook | Topic |
|---|---|
| savant-extras Defense & Pitching Quality | Defense metrics & pitching quality analysis (savant-extras) |
| MLB Statcast Spray Charts for WBC 2026 Players | WBC 2026 player spray charts + pitch zone charts (baseball-field-viz) |
| March Machine Learning Mania 2026 Baseline | NCAA basketball tournament prediction (LightGBM + Logistic Regression) |
| CAFA 6 Baseline with Regularization | Protein function prediction (PyTorch MLP) |
| Bat Tracking: Japanese MLB Batters (2024-2025) | MLB bat speed & swing metrics analysis |
| Senga Ghost Fork Analysis | MLB Statcast pitching analysis |
| Kikuchi Slider Revolution | MLB Statcast pitching analysis |
| NFL Geometric Rules Baseline | Physics-based rules, no ML, RMSE 2.921 |
| PhysioNet ECG Baseline | ECG submission format guide |
| Diabetes EDA & Baseline | LightGBM 5-fold CV, AUC 0.727 |
| Diabetes Rank-Based Ensemble | Rank averaging for AUC optimization |
| Deep Past Cloud Workflow + TF-IDF Baseline | Akkadian→English TF-IDF baseline |
| Titanic Japanese Optuna Test | Titanic survival prediction with Optuna tuning |
| Matplotlib & Seaborn 日本語化テンプレート (Japanese localization template) | Template that fixes garbled Japanese font rendering in the Kaggle environment |
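Rank averaging works for AUC because AUC depends only on the ordering of predictions, so converting each model's scores to ranks removes differences in scale before blending. A minimal sketch using `scipy.stats.rankdata`; the two "models" in the example are synthetic and hypothetical, and this is not the notebook's code.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score


def rank_average(*predictions) -> np.ndarray:
    """Average of per-model ranks scaled to (0, 1]; only each model's ordering matters."""
    scaled = [rankdata(p) / len(p) for p in predictions]
    return np.mean(scaled, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = rng.integers(0, 2, size=1_000)
    prob_model = np.clip(0.3 * y + rng.uniform(0, 0.7, size=y.size), 0, 1)  # scores in [0, 1]
    logit_model = 5.0 * y + rng.normal(0, 8, size=y.size)                   # unbounded scores
    print("rank-average AUC:", roc_auc_score(y, rank_average(prob_model, logit_model)))
```

A plain mean of these two score sets would be dominated by the unbounded one; ranks give each model an equal say.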

Kaggle Datasets

6 published MLB datasets
| Dataset | Description |
|---|---|
| 🥈 MLB Bat Tracking Leaderboard (2024-2025) | 452 batters, 19 swing metrics |
| 🥈 WBC 2026 Scouting | 306 players, 20 countries |
Other datasets (4)
| Dataset | Description |
|---|---|
| Baseball Savant Leaderboards (2024-2025) | 15 leaderboards, 2 seasons combined |
| Japanese MLB Players Statcast (2015-2025) | 34 Japanese MLB players, 174k pitches+hits |
| MLB Pitcher Arsenal Evolution (2020-2025) | 4,253 pitcher-seasons, 111 metrics |
| MLB Statcast + Bat Tracking (2024-2025) | Combined Statcast + bat tracking data |

DrivenData

DrivenData Competitions — Automated pipeline: GitHub Actions + GPU training + GPU→CPU fallback. Currently competing in On Top of Pasketti (Children's ASR, $120K prize, Wav2Vec2 CTC).
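A Wav2Vec2 CTC model emits one score vector per audio frame; the simplest decoder takes the argmax per frame, merges consecutive repeats, then drops the blank token. A minimal NumPy sketch with a toy vocabulary (the competition pipeline, beam search and language-model decoding are not shown). In Hugging Face's `Wav2Vec2ForCTC`, the tokenizer's padding token serves as the CTC blank.

```python
import numpy as np


def ctc_greedy_decode(frame_scores: np.ndarray, vocab: list[str], blank_id: int = 0) -> str:
    """Best-path CTC decoding.

    frame_scores: (num_frames, vocab_size) logits or log-probabilities
    vocab:        token string for each index; vocab[blank_id] is the CTC blank
    """
    best_path = np.argmax(frame_scores, axis=-1)
    tokens = []
    prev = None
    for idx in best_path:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            tokens.append(vocab[idx])
        prev = idx
    return "".join(tokens)


if __name__ == "__main__":
    vocab = ["<blank>", "a", "b"]
    path = [1, 1, 0, 1, 2, 2, 0]        # frame-level best tokens
    scores = np.eye(len(vocab))[path]   # one-hot rows stand in for model logits
    print(ctc_greedy_decode(scores, vocab))  # aab
```

The blank between the two `a` runs is what lets CTC spell a genuinely repeated letter; without it the repeats would collapse into one.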


📱 Apps

| App | Description | Link |
|---|---|---|
| MLB Bat Tracking Dashboard | Leaderboard, Player Comparison, Team Lineup Builder. Powered by savant-extras | Live |
| WBC 2026 Scouting Dashboard | 30 Statcast apps across 19 countries. Zone heatmaps, spray charts, pitch movement | Live |
| Daily Diary | Flutter mobile app, 5 languages, offline-first, AdMob | Google Play |
WBC 2026 Scouting Dashboard details (30 apps)

WBC 2026 Scouting Dashboard — Statcast-based scouting dashboards for all WBC 2026 teams, deployed on Streamlit Community Cloud.

  • 30 apps across 19 countries — batters (17 countries) + pitchers (13 countries)
  • Features: Zone heatmaps, spray charts, pitch movement, count-by-count performance, LHP/RHP splits
  • Data: Baseball Savant Statcast via pybaseball, auto-fetched by GitHub Actions
| Example | Link |
|---|---|
| USA Batters | wbc-usa-batters.streamlit.app |
| Japan Pitchers | wbc-japan-pitchers.streamlit.app |
| All 30 apps | GitHub README |

📦 PyPI Packages

6 packages

| Package | Description |
|---|---|
| savant-extras | 17 Baseball Savant leaderboards + date range support. Complements pybaseball |
| baseball-field-viz | Statcast coordinate transform + field drawing + spray charts + pitch zone charts |
| kaggle-notebook-deploy | Deploy Kaggle Notebooks via git push + GitHub Actions |
| kaggle-wandb-sync | Sync W&B offline runs from Kaggle to W&B cloud |
| signate-deploy | SIGNATE competition workflow via GitHub Actions |
| signate-wandb-sync | Record SIGNATE scores to W&B runs |
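For context on the "Statcast coordinate transform" that baseball-field-viz lists: Statcast's `hc_x`/`hc_y` hit coordinates are image-style units with `hc_y` growing downward. The sketch below is not the package's code; it uses a widely circulated community approximation (home plate near `hc_x` ≈ 125.42, `hc_y` ≈ 198.27, roughly 2.5 ft per unit), which is an assumption rather than an official MLBAM specification. Function names are hypothetical.

```python
import numpy as np

# Community approximation (assumption): home-plate location in hc_x / hc_y
# units and an approximate feet-per-unit scale.
HOME_X, HOME_Y, FT_PER_UNIT = 125.42, 198.27, 2.5


def statcast_to_feet(hc_x, hc_y):
    """Convert Statcast hit coordinates to feet with home plate at (0, 0).

    hc_y grows downward in the source image, so it is flipped to make
    +y point toward center field.
    """
    x = FT_PER_UNIT * (np.asarray(hc_x, dtype=float) - HOME_X)
    y = FT_PER_UNIT * (HOME_Y - np.asarray(hc_y, dtype=float))
    return x, y


def spray_angle_deg(x, y):
    """Angle from the center-field line in degrees; 0 is straightaway center."""
    return np.degrees(np.arctan2(x, y))


if __name__ == "__main__":
    x, y = statcast_to_feet([125.42, 160.0], [198.27, 80.0])
    print(x, y, spray_angle_deg(x, y))
```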

🔬 Learning Projects

| Project | Description |
|---|---|
| ICP Learning Project | Persistent counter dApp on Internet Computer (Motoko, dfx CLI) |
| OpenClaw Twitter Bot | Raspberry Pi 5 + OpenClaw + Gemini API auto-tweet bot; Article (JP) |
Past Projects
| Project | Description |
|---|---|
| GAS Calendar Tool | Batch calendar event registration with senior-friendly mobile UI |
| Dune Analytics | On-chain data analysis: JPYC Stablecoin Dashboard |
| Archived Projects | Selenium automation, business workflow tools, etc. |

🛠️ Tech Stack

| Category | Technologies |
|---|---|
| Data Analysis & ML | Python, pandas, scikit-learn, LightGBM, XGBoost, CatBoost, PyTorch, matplotlib, seaborn, DuckDB, W&B |
| Data Platform | Google BigQuery (8 datasets, 96+ tables: baseball, geohazard, ship tracking), BigQuery ML, Cloud Run, Grafana |
| Data Sources | Baseball Savant (Statcast), pybaseball, USGS, NASA Earthdata, AIS |
| Web & Dashboards | Streamlit, Next.js, TypeScript, Supabase, shadcn/ui |
| Mobile App | Flutter, Dart, Hive, Google AdMob |
| Automation & DevOps | GitHub Actions, Google Apps Script, VBA, Power Query |
| Tools | Claude Code, Kaggle, Google Colab, Excel, Looker Studio |
| Manufacturing | Statistical Quality Control, Process Engineering |

📈 Career

  • 2024 - Present: Quality Management @ Marubun Corporation (丸文株式会社)
  • 2020 - 2024: Technical Dept. @ Metaco Corporation (株式会社メタコ)
  • 2008 - 2020: Process Engineering in Semiconductor Manufacturing

🏆 Patents

Stencil mask and manufacturing method thereof (ステンシルマスク及びその製造方法)

  • Patent No.: 6307851
  • Role: Inventor
  • Assignee: Toppan Printing Co., Ltd. (凸版印刷株式会社)
  • Link: Google Patents (JP6307851B2)

📫 Contact & Blog

Pinned

  1. kaggle-datasets

    Baseball-themed Kaggle datasets generated with pybaseball

    Python

  2. mlb-statcast-visualization

    MLB Statcast data visualization with pybaseball - 3 methods to draw baseball fields (spraychart, matplotlib, sportypy)

    Python

  3. oss-contributions

    My open source contributions tracker

    Python

  4. wbc-scouting

    Python