YanCotta

Due to NDAs and corporate policies, the vast majority of my professional-grade code resides in private repositories. The projects showcased here are primarily my academic and personal side-projects where I experiment, build, and deploy end-to-end systems from scratch.



🏆 TL;DR - ACHIEVEMENTS

Quick summary of awards and key metrics

• Working on data strategy and innovation for Brazil's largest genetic improvement program (at Embrapa) with an elite team, managing data assets across 7 Latin American countries.

• Spearheaded R&D for a State-of-the-Art Multimodal AI system (Vision + Language), engineering a novel architecture that solved critical data quality issues and achieved a 6x performance improvement over traditional baselines.

• 1st Place Winner at the Reply Enterprise Challenge (FIAP NEXT 2025). I designed and built an end-to-end, production-grade multi-agent AI platform that achieved a 76% reduction in a key operational KPI.

• Trained SOTA models at Outlier using RLHF (collaborating with OpenAI, Meta & Anthropic), increasing model efficiency by 64%.

• Developed an award-winning National Resilience AI Platform (FIAP 2025 Global Solution Winner) from concept to deployment.

• Built a Full-Stack Invoice Automation System (React + RAG) in a 15-day sprint, cutting manual work by >85%.

• Led 3 global SuperDataScience teams to deploy end-to-end AI systems, managing project architecture and KPIs as well as team execution and mentoring.


📑 QUICK NAVIGATION





👨‍💻 ABOUT ME


My mission is to build and lead the high-impact teams that will architect the future. I am a strategist who uses AI to solve complex, global business challenges and deliver measurable, executive-level value.

My unique advantage is Systemic Thinking. My background isn't just in AI; it's in the complex, interconnected systems of Biology and Cognitive Science. This allows me to deconstruct multifaceted problems, see the connections others miss, and architect holistic, high-impact solutions, not just code.





💼 PROFESSIONAL & RESEARCH EXPERIENCE

Proven track record in AI Engineering, Leadership & Bioinformatics

Data Manager: Strategy, Engineering & Innovation | Embrapa | Juiz de Fora, MG | Dec 2025 - Present

  • Strategic Leadership: Working on data strategy and innovation for Brazil's largest genetic improvement program, acting as the bridge between world-renowned PhD researchers, an elite team, and executive stakeholders.
  • Big Data & AI: Managing data assets across 7 Latin American countries and architecting SOTA AI/ML pipelines and LLMOps to modernize genomic evaluations.
  • Innovation: Designing automated workflows and securing laboratory budget/resources through high-impact executive presentations.

Key Areas: Data Strategy Big Data AI Leadership Genomics Innovation Management


Invited AI Researcher (Multimodal Deep Learning) | FrameNet Brasil / UFJF | Sep 2025 - Present

  • Advanced AI Research: Led the development of novel Vision-Language Models (VLMs) designed to automate the semantic understanding of complex visual scenes.

  • Architecture Innovation: Engineered a "Hybrid Neuro-Symbolic" system that combines visual perception with structured knowledge, successfully solving the challenge of training models on incomplete or "noisy" datasets.

  • Impact: Achieved State-of-the-Art (SOTA) performance in multi-label classification, delivering a 6x improvement in model accuracy compared to previous methods.

  • End-to-End Development: Managed the full research lifecycle, from data strategy and pipeline engineering to scientific validation and deployment of reproducible AI solutions.

Key Areas: Multimodal AI Computer Vision Deep Learning Research Semantic Understanding R&D Leadership


Project Lead: AI/ML Engineering & Data Science | SuperDataScience | Remote | Jun 2025 - Present

  • Agentic AI Leadership: Architecting "FinResearch AI", a multi-agent system using CrewAI to automate institutional financial research, pivoting teams from static notebooks to production-grade orchestration.
  • Predictive ML Platforms: Delivered full-stack healthcare (GlucoTrack) and HR analytics (MLPayGrade) platforms using Deep Learning, Model Explainability, and Tabular Embeddings.
  • Global Team Management: Orchestrating the full lifecycle for diverse international cohorts, aligning KPIs, conducting 1:1 mentorship, and enforcing software engineering best practices for scalable deployment.

Key Areas: Agentic AI Multi-Agent Systems CrewAI Technical Leadership LLMs RAG Full-Stack ML


R&D Intern (Data & Genomics) | Embrapa Gado de Leite | Juiz de Fora, MG | Sep 2025 - Dec 2025

  • Increased genomic query performance by 87% by migrating from PostgreSQL to Neo4j.
  • Architected a scalable MLOps pipeline for genomic analysis (Docker, Nextflow, FastAPI).
  • Optimized project presentations for stakeholders and executives responsible for laboratory budget and resources.

Key Areas: Genomics Bioinformatics Data Engineering Applied ML Neo4j MLOps


AI Trainer (LLM Systems via RLHF) | Outlier | Remote | Nov 2024 - Sep 2025

  • Developed technical content to align Large Language Models (OpenAI, Meta, Anthropic), increasing model efficiency by 64% via RLHF in collaboration with technical teams.

Key Areas: RLHF Model Alignment AI Safety LLMs Quality Assurance


Data Analyst (Ecological Impact) | Impaakt | Remote | Feb 2022 - Oct 2024

  • Delivered 500+ data-driven ecological impact reports that influenced ESG (Environmental, Social, and Governance) ratings used by investment firms.

Key Areas: Environmental Science Sustainability Analysis Data Analysis Process Optimization AI Integration Impact Assessment


Research Assistant | Georgia State University | Atlanta, GA | Feb 2019 - Feb 2020

  • Increased research productivity by 84% by automating data collection and analysis workflows using Python.

Key Areas: Cognitive Sciences Philosophy of Mind Psychology Behavioral Analysis Research Methodology Data Analysis Data Science Python



🎓 ACADEMIC BACKGROUND


Bachelor of Technology (Technologist Degree) - AI Systems & Machine Learning | FIAP | 2024 - 2026 (expected)

Key Areas: AI Systems Architecture Machine Learning Engineering MLOps Edge AI IoT Development Software Engineering Data Engineering Cybersecurity Cloud Operations

Academic Excellence: GPA 4.0


Bachelor of Science - Biological Sciences | UniAcademia | 2022 - 2025 (in progress)

Key Areas: Molecular Biology Genetics Computational Biology Research Methodology Laboratory Management Scientific Publishing

Academic Excellence: GPA 3.7 | Thesis: Epigenetics Antiaging Health Software Leveraging Machine Learning & Deep Learning Algorithms


Bachelor of Science - Philosophy (Major) & Psychology (Minor) | Georgia State University | 2017 - 2020 (incomplete)

Key Areas: Cognitive Sciences Philosophy of Mind Psychology Human Behavior Research Methodology Academic Leadership

Academic Excellence: GPA 3.8 | Thesis: Differentiating Factual Belief, Imagination & Religious Credence - A Systematic Theory of Cognitive Attitudes

Additional Recognition: Columnist for "The Signal" (GSU's award-winning newspaper), Atlanta Campus Scholarship recipient, Dean's List, Honor Society member





🤝 PROFESSIONAL RECOMMENDATIONS

What others say about working with me

View all recommendations on LinkedIn

I've been fortunate to work with exceptional professionals who have recognized my technical capabilities, problem-solving approach, and collaborative leadership style. These recommendations span my work in:

  • AI/ML Engineering & Research
  • Data Science & Analytics
  • Project Leadership & Team Collaboration
  • Academic Research & Scientific Methodology




End-to-end AI systems architected to solve real-world challenges

This portfolio showcases end-to-end AI systems I've architected to solve real-world challenges. Each project demonstrates business impact, technical excellence, and production-ready implementation.


Smart Maintenance SaaS

🏆 1st PLACE WINNER - Reply Enterprise Challenge @ FIAP NEXT 2025 🏆

An end-to-end, production-grade predictive maintenance platform I built from scratch (investing hundreds of hours since March) to win Reply's annual enterprise challenge. The system uses a 12-agent event-driven architecture (FastAPI, Redis) and 17 ML models, trained on 6 real-world datasets including NASA, AI4I, and XJTU, to predict equipment failures before they happen.

  • Business Value: Proven to reduce unplanned downtime by 40% and save R$ 100-500k per prevented failure.
  • Performance: Validated at 103.8 RPS with 3ms P99 latency under load.
  • Database: Achieved 37% faster dashboard queries using TimescaleDB continuous aggregates.
  • Stack: Python, FastAPI, TimescaleDB, MLflow, Docker, AWS, Streamlit.

Code/repository under an NDA contract


Invoice Automation System (Full-Stack & Multi-Agent)

Solo Development | AI-powered invoice processing automation

Business Goal: To eliminate the slow, error-prone manual process of invoice handling for small to medium businesses.

Solution & Impact: Built a full-stack system that automates the entire invoice processing pipeline. By mapping the user journey and applying RAG for intelligent error handling, the system reduced manual processing time by over 85%.

Technologies: React.js • Next.js • TypeScript • FastAPI • LangChain • RAG • FAISS • Docker • AWS S3 • PostgreSQL


🏆 Guardian System: National Resilience Platform (Award Winner)

Solo Development | My winning project for FIAP's 2025.1 Global Solution Challenge

Business Goal: To create a predictive system to manage and mitigate large-scale national crises like natural disasters.

Solution & Impact: I single-handedly architected and developed this award-winning multi-agent platform. It comprises five autonomous "Guardian" agents, one per threat domain, and includes a fully functional MVP for fire risk prediction using real-time IoT sensor data.

Technologies: Agentic AI • Python • FastAPI • Docker • MicroPython • ESP32 • IoT • Apache Spark


AI Platform for Anti-Aging (Thesis Project)

Solo Development | Personalized anti-aging recommendation system

Business Goal: To create a scalable HealthTech platform that provides personalized, data-driven health recommendations, moving beyond generic advice.

Solution & Impact: Developing an AI platform focused on Explainable AI (SHAP) and secure deployment (JWT). The system translates complex epigenetic data (BioPython) into actionable health insights, analyzing genetic predispositions (SNPs) and lifestyle habits to generate personalized risk assessments.

Technologies: PyTorch • Scikit-learn • BioPython • MLflow • SHAP • Docker • FastAPI • React


FarmTech Integrated Ecosystem (IoT & Edge AI)

End-to-End Architecture | Unified smart farming system integrating IoT, Cloud, and Hybrid AI

Business Goal: To optimize agricultural ROI by minimizing water usage and crop loss through real-time telemetry and automated decision-making.

Solution & Impact: A massive 6-module ecosystem combining Edge AI (YOLOv5) for pest detection and Cloud AI (GPT-4o) for insights. Features a custom Genetic Algorithm that solves the "knapsack problem" for crop allocation and a distributed ESP32 IoT network for predictive irrigation.

Technologies: Python • AWS • IoT (ESP32) • YOLOv5 • Genetic Algorithms • OpenAI API • SQLAlchemy • Streamlit


Elliott Wave ML Financial Analyzer (Scientific Project)

Student Lead & Architect | Automated B3 Stock Analysis & Prediction System

Business Goal: To automate the complex detection of Elliott Wave market patterns, creating a professional-grade technical analysis tool for the Brazilian Stock Exchange (B3).

Solution & Impact: Led the research and development of a full-stack ML system processing real-time market data. Built a custom feature engineering engine (24 technical indicators) and an MLOps pipeline (MLflow + AWS S3) to train and version Random Forest/SVM models. The system classifies market movements into 4 strategic categories (including Impulse, Corrective, and End of Cycle) and presents the results via a Streamlit UI.

Technologies: Python • MLflow • AWS S3 • Docker • Scikit-learn • Streamlit • FastAPI • Technical Analysis


Admixture Automation Pipeline (Bioinformatics)

Lead Developer | High-performance bovine ancestry analysis pipeline for Embrapa

Business Goal: To solve computational bottlenecks in genomic ancestry analysis and democratize access to complex tools for researchers.

Solution & Impact: Architected a Nextflow automation pipeline that handles data conversion, Quality Control, and visualization. Introduced a parallelized Cross-Validation engine (reducing scan times drastically) and a Streamlit Web UI, allowing non-coders to run scientific-grade population structure analyses.

Technologies: Nextflow • Python • Streamlit • R • Bioinformatics • Parallel Computing • Docker





COMMUNITY PROJECTS & LEADERSHIP

Leading diverse teams to deliver production-ready AI platforms

As a Project Leader in the international SuperDataScience community, I led diverse teams of data scientists and ML engineers to deliver production-ready AI/ML platforms. I was responsible for aligning project priorities with stakeholders, defining KPIs, and managing deployment.

Leadership Experience: Project Lead for 3 projects | Project Member for 2 projects

GlucoTrack: Diabetes Risk Prediction Platform

Project Lead | Comprehensive diabetes risk assessment system using the CDC diabetes dataset

Led a diverse team of data scientists and ML engineers to deliver both beginner-friendly and advanced deep learning solutions.

Key Features: Built traditional ML models (Logistic Regression, Decision Trees) and advanced Feedforward Neural Networks with hyperparameter tuning. Includes model explainability tools and multiple deployment options.

Technologies: Python • Scikit-learn • Deep Learning • Streamlit • Model Explainability • Healthcare AI • Data Science

Live app: glucotrack.streamlit.app


MLPayGrade: ML Salary Prediction System

Project Lead | End-to-end salary prediction platform analyzing the 2024 machine learning job market

Coordinated a team of data scientists and ML engineers to build comprehensive solutions across multiple skill levels.

Key Features: Analyzes global salary trends and job feature impacts on compensation. Features both traditional ML pipelines and advanced deep learning on tabular data with embeddings and explainability.

Technologies: Python • Scikit-learn • Deep Learning • Tabular Data • Streamlit • Job Market Analytics • Data Science


EduSpend: Global Education Cost Prediction

Project Member | End-to-end machine learning platform to predict Total Cost of Attendance for international higher education

Key Features: Achieved a 96.44% R² score with an XGBoost Regressor, deployed via both a Streamlit web app and a FastAPI service, all containerized with Docker and automated with CI/CD.

Technologies: Scikit-learn • XGBoost • MLflow • Streamlit • FastAPI • Docker • CI/CD • Data Science


FinResearch AI: Multi-Agent Market Intelligence

Project Lead | Agentic AI system for automated institutional-grade financial research

Led the development of an autonomous multi-agent system that mimics a professional financial analyst team.

Key Features: Orchestrates 5 specialized agents (including Researcher, Quant Analyst, and Reporter) using Shared Vector Memory to scrape real-time news, calculate financial ratios, and synthesize findings into investment-grade reports. Implements "Advanced Track" architecture using CrewAI concepts.

Technologies: Python • CrewAI • OpenAI Agents • RAG • ChromaDB • Streamlit • Financial APIs


Smart Leaf: Deep Learning for Crop Disease

Project Member | Deep learning solution that classifies 14 different crop diseases across four species

Key Features: A Convolutional Neural Network (CNN) trained on my local machine on over 13,000 images, using only modularized Python scripts (no notebooks), and deployed via a user-friendly Streamlit interface for real-time predictions. Covers corn, potato, rice, and wheat diseases.

Technologies: Deep Learning • Computer Vision • CNN • TensorFlow • PyTorch • Streamlit • Locally Trained Neural Network





🔍 EXPLORE MORE PROJECTS


... and even more projects in my repositories, covering Data Science, Machine Learning, MLOps, LLMOps, IoT, AI engineering, bioinformatics, and more!

View All Repositories




🛠️ TECH STACK & TOOLS

My technical arsenal for building scalable solutions

AI & Machine Learning

Agentic AI & LLMs

Architecture, Backend & APIs

Databases & Data Engineering

Cloud & MLOps

Frontend & Visualization

Testing & Code Quality

IoT & Edge AI





📜 CERTIFICATIONS


View all certifications on LinkedIn

I maintain active certifications across AI/ML platforms, cloud infrastructure, and software development to ensure I stay current with industry-leading technologies and best practices.

Key Certifications Include:

  • Machine Learning & AI Engineering
  • Cloud Platform Expertise (AWS, Azure)
  • Data Science & Analytics
  • Software Development & DevOps
  • Specialized domain certifications in Bioinformatics and IoT




📚 PUBLICATIONS


View all publications on LinkedIn

My research spans cognitive science, artificial intelligence, and computational biology, bridging theoretical frameworks with practical applications.

Research Areas:

  • Philosophy of Mind & Cognitive Attitudes
  • Machine Learning Applications in Health Sciences
  • Epigenetics & AI-Driven Personalized Medicine
  • Computational Biology & Genomics
  • AI Systems Architecture & Engineering




GLOBAL COMMUNICATION

Languages & International Collaboration


"The most flexible element is the one that controls the system."


Pinned

  1. global_solution_1_fiap (Public)

    Winner of FIAP's Global Solution 2025.1 Challenge. This repository contains the architecture for a multi-agent system where five autonomous "Guardians" work in synergy to predict, manage, and respo…

    Python 2 1

  2. FarmTech_System (Public)

    Unified system for a smart, automated farm at large scale

    Jupyter Notebook

  3. SmartCrops-IoT-ML-System (Public)

    An IoT-ML project for smart agriculture: Dual ESP32 nodes (sensor via ESP-NOW, gateway to MQTT/Ubidots) collect temperature, humidity, and soil moisture data. ML Model analyzes crop yield and real-time plant…

    Jupyter Notebook 2

  4. SDS-CP035-gluco-track (Public)

    Forked from SuperDataScience-Community-Projects/SDS-CP035-gluco-track

    GlucoTrack is a machine learning and deep learning project focused on predicting a person's risk level of diabetes

    Jupyter Notebook 1

  5. agentic_invoice_system_final_version (Public)

    Technical test for Brim's AI Engineer role: implementation of a Multi-Agentic System for Invoice Automation. Due 02/28. Next.js frontend implementation.

    Python 1

  6. post_training_llms (Public)

    Different post-training techniques for LLMs, including: SFT, DPO and Online RL

    Python 4 1