New Features & Enhancements

  • DPK 1.1.0 has been released. For details on the new features and enhancements, please see this.

Data Prep Kit Resources

📄 Papers

  1. Data-Prep-Kit: getting your data ready for LLM application development
  2. Granite Code Models: A Family of Open Foundation Models for Code Intelligence
  3. Scaling Granite Code Models to 128K Context

🎤 External Events and Showcase

  1. The AI Alliance Office Hours: “Introducing GneissWeb - a state-of-the-art LLM pre-training dataset” - Mar 6, 2025 - Slides

  2. Workshop at the AI for Connectivity Hackathon: “Preparing Data for LLM Applications with Docling & Data Prep Kit” - Jan 25, 2025

  3. Talk on DPK at IBM TechXchange Agents Day - Jan 23, 2025 - Slides

  4. DPK tutorial at CODS-COMAD 2024 - Dec 18, 2024

  5. “Generative AI Model Data Pre-Training on Kubernetes: A Use Case Study” was accepted for KubeCon EU 2025 - Dec 2024

  6. DPK has been added to AI Alliance's “Living Guide to Applying AI” - Dec 2024

  7. Workshop on Preparing Data for LLM Applications Using Data Prep Kit - Dec 2024 - Video

  8. DPK tutorial and hands-on session at IIIT Delhi - Nov 22, 2024

  9. Talk and hands-on session at MIT Bangalore - Nov 8, 2024

  10. PyData NYC 2024 - 90-minute tutorial - Nov 6, 2024

  11. "Data Prep Kit: A Comprehensive Cloud-Native Toolkit for Scalable Data Preparation in GenAI App" - Oct 28-29, 2024 - Video | Slides

  12. Tech Educator Summit, IBM CSR event - Oct 16, 2024

  13. Data Science Dojo Meetup - Oct 9, 2024 - Video

  14. Open Source RAG Pipeline workshop with Data Prep Kit at TechEquity's AI Summit in Silicon Valley - Oct 2024

  15. "RAG with Data Prep Kit" workshop @ Mountain View, CA, USA - info - Sep 21, 2024

  16. IBM TechXchange Las Vegas

  17. Unstructured Data Meetup - SF, NYC, Silicon Valley

  18. Data Exchange Podcast with Ben Lorica - Sep 2024

  19. Open Source AI Demo Night - Aug 8, 2024

  20. "Building Successful LLM Apps: The Power of High-Quality Data" - Video | Slides - Aug 2024

  21. "Hands-on session for fine-tuning LLMs" - Video - Aug 2024

  22. "Build your own data preparation module using data-prep-kit" - Video - Aug 2024

Example Code

Find example code in the README of each transform, and sample Jupyter notebooks for getting started here.

Blogs / Tutorials

Relevant online communities

We Want Your Feedback!

Feel free to join an existing discussion or start a new one to share your feedback.