Generative AI Language Modeling with Transformers

This course is part of multiple programs:

IBM AI Engineering Professional Certificate
Generative AI Engineering with LLMs Specialization
IBM Generative AI Engineering Professional Certificate
Instructors: Joseph Santarcangelo, Fateme Akbari, Kang Wang


Access provided by New York State Department of Labor

13,531 already enrolled

2 modules
Gain insight into a topic and learn the fundamentals.

4.5 (99 reviews)

Intermediate level
Recommended experience: Basic knowledge of Python and PyTorch. You should also be familiar with machine learning and neural network concepts.

9 hours to complete
Flexible schedule: Learn at your own pace


What you'll learn

  • Explain the role of attention mechanisms in transformer models for capturing contextual relationships in text

  • Describe the differences in language modeling approaches between decoder-based models like GPT and encoder-based models like BERT

  • Implement key components of transformer models, including positional encoding, attention mechanisms, and masking, using PyTorch

  • Apply transformer-based models for real-world NLP tasks, such as text classification and language translation, using PyTorch and Hugging Face tools

Skills you'll gain

  • PyTorch (Machine Learning Library)
  • Large Language Modeling
  • Deep Learning
  • Natural Language Processing
  • Machine Learning Methods
  • Artificial Neural Networks
  • Generative AI

Details to know

  • Shareable certificate: Add to your LinkedIn profile
  • Assessments: 6 assignments
  • Taught in English


Build your subject-matter expertise

This course is available as part of multiple programs.
When you enroll in this course, you'll also be asked to select a specific program.
  • Learn new concepts from industry experts
  • Gain a foundational understanding of a subject or tool
  • Develop job-relevant skills with hands-on projects
  • Earn a shareable career certificate

There are 2 modules in this course

This course provides a practical introduction to using transformer-based models for natural language processing (NLP) applications. You will learn to build and train models for text classification using encoder-based architectures like Bidirectional Encoder Representations from Transformers (BERT), and explore core concepts such as positional encoding, word embeddings, and attention mechanisms.

The course covers multi-head attention, self-attention, and causal language modeling with GPT for tasks like text generation and translation. You will gain hands-on experience implementing transformer models in PyTorch, including pretraining strategies such as masked language modeling (MLM) and next sentence prediction (NSP). Through guided labs, you’ll apply encoder and decoder models to real-world scenarios. This course is designed for learners interested in generative AI engineering and requires prior knowledge of Python, PyTorch, and machine learning. Enroll now to build your skills in NLP with transformers!
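As a taste of what the labs cover, here is a minimal sketch of scaled dot-product attention in PyTorch, the operation underlying the attention mechanisms discussed above. It is an illustration, not course material; the function name and tensor shapes are assumptions made for this example.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, seq_len, d_k); names and shapes are illustrative assumptions
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # similarity scores
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))      # hide masked positions
    weights = torch.softmax(scores, dim=-1)                   # attention weights
    return weights @ v                                        # weighted sum of values

q = k = v = torch.randn(2, 5, 64)                             # toy tensors
print(scaled_dot_product_attention(q, k, v).shape)            # torch.Size([2, 5, 64])
```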

Module 1

In this module, you will learn how transformers process sequential data using positional encoding and attention mechanisms. You will explore how to implement positional encoding in PyTorch and understand how attention helps models focus on relevant parts of input sequences. You'll dive deeper into self-attention and scaled dot-product attention with multiple heads to see how they contribute to language modeling tasks. The module also explains how the transformer architecture leverages these mechanisms efficiently. Through hands-on labs, you’ll implement these concepts and build transformer encoder layers in PyTorch. Finally, you'll apply transformer models for text classification, including building a data pipeline, defining the model, and training it, while also exploring techniques to optimize transformer training performance.
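To make the positional-encoding idea concrete, the sketch below implements the standard sinusoidal encoding in PyTorch. It assumes the classic formulation from the original Transformer paper; the course labs may structure their implementation differently.

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    # Standard sinusoidal encoding (assumed formulation, not the lab's exact code)
    position = torch.arange(seq_len).unsqueeze(1)                  # (seq_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)                   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)                   # odd dimensions
    return pe

embeddings = torch.randn(1, 10, 512)                      # (batch, seq_len, d_model), toy values
x = embeddings + sinusoidal_positional_encoding(10, 512)  # add position information
```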

What's included

6 videos • 4 readings • 2 assignments • 2 app items • 1 plugin

6 videos•Total 39 minutes
  • Course Introduction•2 minutes
  • Positional Encoding•6 minutes
  • Attention Mechanism•7 minutes
  • Self-attention Mechanism•7 minutes
  • From Attention to Transformers•7 minutes
  • Transformers for Classification: Encoder•8 minutes
4 readings•Total 17 minutes
  • Course Overview•5 minutes
  • Specialization Overview•7 minutes
  • Optimization Techniques for Efficient Transformer Training •3 minutes
  • Summary and Highlights•2 minutes
2 assignments•Total 45 minutes
  • Practice Quiz: Positional Encoding, Attention, and Application in Classification•15 minutes
  • Graded Quiz: Fundamental Concepts of Transformer Architecture•30 minutes
2 app items•Total 105 minutes
  • Hands-on Lab: Attention Mechanism and Positional Encoding•45 minutes
  • Hands-on Lab: Applying Transformers for Classification•60 minutes
1 plugin•Total 2 minutes
  • Helpful Tips for Course Completion•2 minutes

Module 2

In this module, you will learn how decoder-based models like GPT are trained using causal language modeling and implemented in PyTorch for both training and inference. You will explore encoder-based models, such as Bidirectional Encoder Representations from Transformers (BERT), and understand their pretraining strategies using masked language modeling (MLM) and next sentence prediction (NSP), along with data preparation techniques in PyTorch. You will also examine how transformer architectures are applied to machine translation, including their implementation using PyTorch. Through hands-on labs, you will gain practical experience with decoder models, encoder models, and translation tasks. The module concludes with a cheat sheet, glossary, and summary to help consolidate your understanding of key concepts.
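For a concrete picture of the causal language modeling objective described here, the sketch below applies a causal mask so each position attends only to earlier tokens and trains against next-token targets. The layer choice, vocabulary size, and loss setup are assumptions for illustration, not the course's model definitions.

```python
import torch
import torch.nn as nn

seq_len, d_model, vocab_size = 6, 64, 100

# Boolean causal mask: True above the diagonal means "may not attend to future tokens"
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
to_vocab = nn.Linear(d_model, vocab_size)

x = torch.randn(2, seq_len, d_model)               # toy token embeddings
hidden = layer(x, src_mask=causal_mask)            # self-attention restricted to the past
logits = to_vocab(hidden)                          # (batch, seq_len, vocab_size)

targets = torch.randint(vocab_size, (2, seq_len))  # toy token ids
loss = nn.CrossEntropyLoss()(
    logits[:, :-1].reshape(-1, vocab_size),        # predictions at positions 0..n-2
    targets[:, 1:].reshape(-1),                    # next-token targets (shifted by one)
)
print(loss.item())
```

Decoder-only models in the GPT family are typically built this way: a stack of self-attention blocks with a causal mask, without the cross-attention found in encoder-decoder architectures.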

What's included

10 videos • 6 readings • 4 assignments • 4 app items • 2 plugins

10 videos•Total 67 minutes
  • Language Modeling with the Decoders and GPT-like Models•6 minutes
  • Training Decoder Models•7 minutes
  • Decoder Models: PyTorch Implementation - Causal LM•5 minutes
  • Decoder Models: PyTorch Implementation Using Training and Inference•5 minutes
  • Encoder Models with BERT: Pretraining Using MLM•5 minutes
  • Encoder Models with BERT: Pretraining Using NSP•6 minutes
  • Data Preparation for BERT with PyTorch•8 minutes
  • Pretraining BERT Models with PyTorch•8 minutes
  • Transformer Architecture for Language Translation•5 minutes
  • Transformer Architecture for Translation: PyTorch Implementation•7 minutes
6 readings•Total 9 minutes
  • Summary and Highlights•1 minute
  • Summary and Highlights•1 minute
  • Summary and Highlights•1 minute
  • Course Conclusion•2 minutes
  • Thanks from the Course team•2 minutes
  • Congratulations and Next Steps•2 minutes
4 assignments•Total 63 minutes
  • Practice Quiz: Decoder Models•12 minutes
  • Practice Quiz: Encoder Models•12 minutes
  • Practice Quiz: Application of Transformers for Translation•9 minutes
  • Graded Quiz: Advanced Concepts of Transformer Architecture•30 minutes
4 app items•Total 180 minutes
  • Hands-on Lab: Decoder GPT-like Models•45 minutes
  • Hands-on Lab: Pretraining BERT Models•60 minutes
  • Hands-on Lab: Data Preparation for BERT•45 minutes
  • Lab: Transformers for Translation•30 minutes
2 plugins•Total 18 minutes
  • Cheat Sheet: Language Modeling with Transformers•15 minutes
  • Course Glossary: Language Modeling with Transformers •3 minutes

Earn a career certificate

Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.


Instructors

Instructor ratings
We asked all learners to give feedback on our instructors based on the quality of their teaching style.
4.0 (16 ratings)

Joseph Santarcangelo (IBM) • 35 Courses • 2,071,002 learners
Fateme Akbari (IBM) • 4 Courses • 21,788 learners
Kang Wang • 3 Courses • 29,202 learners

Offered by

IBM

At IBM, we know how rapidly tech evolves and recognize the crucial need for businesses and professionals to build job-ready, hands-on skills quickly. As a market-leading tech innovator, we’re committed to helping you thrive in this dynamic landscape. Through IBM Skills Network, our expertly designed training programs in AI, software development, cybersecurity, data science, business management, and more, provide the essential skills you need to secure your first job, advance your career, or drive business success. Whether you’re upskilling yourself or your team, our courses, Specializations, and Professional Certificates build the technical expertise that ensures you, and your organization, excel in a competitive world.


Learner reviews

4.5 (99 reviews)

  • 5 stars: 75%
  • 4 stars: 14%
  • 3 stars: 4%
  • 2 stars: 1%
  • 1 star: 6%

Showing 3 of 99

RR • 4 stars • Reviewed on Oct 11, 2024
Once again, great content and not that great documentation (printable cheatsheets, no slides, etc). Documentation is essential to review a course content in the future. Alas!

AB • 5 stars • Reviewed on Dec 30, 2024
This course gives me a wide picture of what transformers can be.

MA • 5 stars • Reviewed on Jan 18, 2025
Exceptional course and all the labs are industry related


Explore more from Data Science

  • DeepLearning.AI: Generative AI with Large Language Models (Course)
  • Whizlabs: NVIDIA: Fundamentals of NLP and Transformers (Course)
  • Google Cloud: Transformer Models and BERT Model (Course)
  • Vanderbilt University: Generative AI Assistants (Specialization)
