Ada Lovelace (microarchitecture)

Ada Lovelace, also referred to simply as Lovelace,[1] is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Ampere architecture, officially announced on September 20, 2022. It is named after the English mathematician Ada Lovelace,[2] who is often regarded as the first computer programmer, and is the first Nvidia architecture whose name includes both a first and a last name. Nvidia announced the architecture along with the new GeForce 40 series consumer GPUs[3] and the RTX 6000 Ada Generation professional workstation graphics card.[4] The new GPUs use TSMC's 5 nm-class "4N" process, which offers increased efficiency over the Samsung 8 nm and TSMC N7 processes used by Nvidia for the previous-generation Ampere architecture.[5]

Daguerreotype of Ada Lovelace, eponym of the architecture

Ada Lovelace
Launched: October 12, 2022
Designed by: Nvidia
Manufactured by
Fabrication process: TSMC 4N
Product Series
Desktop
Professional/workstation
  • RTX 6000 Ada
Server/datacenter
Specifications
Clock rate: 735 MHz to 2640 MHz
L1 cache: 128 KB (per SM)
L2 cache: 32 MB to 96 MB
Memory support
Memory clock rate: 21-22.4 Gbps
PCIe support: PCIe 4.0
Supported Graphics APIs
DirectX: DirectX 12 Ultimate (12.2)
Direct3D: Direct3D 12
Shader Model: Shader Model 6.7
OpenCL: OpenCL 3.0
OpenGL: OpenGL 4.6
CUDA: Compute Capability 8.9
Vulkan: Vulkan 1.3
Supported Compute APIs
CUDA: CUDA Toolkit 11.6
DirectCompute: Yes
Media Engine
Encode codecs
Decode codecs
Color bit-depth
  • 8-bit
  • 10-bit
Encoder(s) supported: NVENC
Display outputs
History
Predecessor: Ampere
Variant: Hopper (datacenter)

Background

The Ada Lovelace architecture follows the Ampere architecture, which was released in 2020. Nvidia CEO Jensen Huang announced Ada Lovelace during a GTC 2022 keynote on September 20, 2022, presenting it as the architecture powering Nvidia's GPUs for gaming, workstations and datacenters.[6]

Architectural details

Architectural improvements of the Ada Lovelace architecture include the following:[7]

  • CUDA Compute Capability 8.9[8]
  • TSMC 4N process (custom designed for Nvidia), not to be confused with TSMC's regular N4 node
  • 4th-generation Tensor Cores with FP8, FP16, bfloat16, TensorFloat-32 (TF32) and sparsity acceleration
  • 3rd-generation Ray Tracing Cores, plus concurrent ray tracing, shading and compute
  • Shader Execution Reordering (SER)[9]
  • Nvidia video encoder/decoder (NVENC/NVDEC) with fixed-function hardware AV1 encoding at up to 8K, 10-bit, 60 FPS[10][11]
  • No NVLink support[12][13]
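
In practice, the compute capability listed above is what a developer passes to Nvidia's nvcc compiler when targeting this architecture. As a small sketch, the helper below turns a capability string such as "8.9" into the corresponding -gencode option. The function name gencode_flag is hypothetical and is not part of any Nvidia tool; only the option syntax itself comes from nvcc.

```python
def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc -gencode option from a capability string such as "8.9".

    Hypothetical helper for illustration; not part of the CUDA Toolkit.
    """
    major, minor = compute_capability.split(".")
    tag = f"{int(major)}{int(minor)}"
    return f"-gencode arch=compute_{tag},code=sm_{tag}"


# Compute Capability 8.9 is the value listed for Ada Lovelace.
print(gencode_flag("8.9"))
```

A build script could append the returned string to its nvcc command line, assuming the installed toolkit version recognises the target.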

CUDA cores

128 CUDA cores are included in each SM.

RT cores

Ada Lovelace features third-generation RT cores. The RTX 4090 has 128 RT cores, compared with 84 in the previous-generation RTX 3090 Ti. Together, these 128 RT cores provide up to 191 TFLOPS of compute, or 1.49 TFLOPS per RT core.[14] Lovelace also adds a new stage to the ray tracing pipeline, Shader Execution Reordering (SER), which Nvidia claims provides a 2x performance improvement in ray tracing workloads.[6]
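
The per-core figure follows directly from the totals quoted above. The short sketch below reproduces that arithmetic; the variable names are illustrative only.

```python
# Figures quoted in the text for the RTX 4090 and RTX 3090 Ti.
RT_CORES_RTX_4090 = 128
RT_CORES_RTX_3090_TI = 84
TOTAL_RT_TFLOPS = 191.0

# Throughput per RT core and the relative growth in RT core count.
per_core_tflops = TOTAL_RT_TFLOPS / RT_CORES_RTX_4090
core_count_gain_pct = (RT_CORES_RTX_4090 / RT_CORES_RTX_3090_TI - 1) * 100

print(f"{per_core_tflops:.2f} TFLOPS per RT core")
print(f"{core_count_gain_pct:.1f}% more RT cores than the RTX 3090 Ti")
```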

Tensor cores

Lovelace's new fourth-generation Tensor cores enable the AI technology used in DLSS 3's frame generation techniques. Much like Ampere, each SM contains 4 Tensor cores but Lovelace contains a greater number of Tensor cores overall given its increased number of SMs.

Clock speeds

Clock speeds increase significantly with the Ada Lovelace architecture: the RTX 4090's base clock speed (2235 MHz) is higher than the boost clock speed of the RTX 3090 Ti (1860 MHz).

GPU                     | RTX 2080 Ti | RTX 3090 Ti | RTX 4090
Architecture            | Turing      | Ampere      | Ada Lovelace
Base clock speed (MHz)  | 1350        | 1560        | 2235
Boost clock speed (MHz) | 1635        | 1860        | 2520
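
To put the clock figures in proportion, the sketch below computes the generational increase from the RTX 3090 Ti to the RTX 4090 and checks the base-versus-boost comparison made above. It uses only the values in the table; the helper name pct_increase is illustrative.

```python
# (base MHz, boost MHz) per GPU, as listed in the clock speed table.
CLOCKS_MHZ = {
    "RTX 2080 Ti": (1350, 1635),
    "RTX 3090 Ti": (1560, 1860),
    "RTX 4090": (2235, 2520),
}


def pct_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1) * 100


ada_base, ada_boost = CLOCKS_MHZ["RTX 4090"]
ampere_base, ampere_boost = CLOCKS_MHZ["RTX 3090 Ti"]

base_gain = pct_increase(ada_base, ampere_base)
boost_gain = pct_increase(ada_boost, ampere_boost)

print(f"Base clock increase:  {base_gain:.1f}%")
print(f"Boost clock increase: {boost_gain:.1f}%")
print("RTX 4090 base exceeds RTX 3090 Ti boost:", ada_base > ampere_boost)
```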

Cache and memory subsystem

GPU           | RTX 2080 Ti             | RTX 3090 Ti             | RTX 4090
Architecture  | Turing                  | Ampere                  | Ada Lovelace
L1 data cache | 6.375 MB (96 KB per SM) | 10.5 MB (128 KB per SM) | 16 MB (128 KB per SM)
L2 cache      | 5.5 MB                  | 6 MB                    | 72 MB

The fully enabled AD102 Lovelace die features 96 MB of L2 cache, a 16x increase from the 6 MB in the Ampere-based GA102 die.[15] Quick access to a large L2 cache benefits complex operations such as ray tracing, because fetching data from the GDDR video memory is slower. Because frequently accessed data can be kept in cache rather than in video memory, a narrower memory bus can be paired with the large L2 cache.
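
The cache figures above can be cross-checked with a little arithmetic. The sketch below derives the SM count implied by each card's total L1 data cache and per-SM size, and the L2 ratio between the full AD102 and GA102 dies. The SM counts are derived here rather than quoted from the cache table.

```python
# (total L1 data cache in MB, L1 per SM in KB), as listed for each card.
L1_CACHE = {
    "RTX 2080 Ti": (6.375, 96),
    "RTX 3090 Ti": (10.5, 128),
    "RTX 4090": (16, 128),
}


def implied_sm_count(total_mb: float, per_sm_kb: float) -> float:
    """Number of SMs implied by the total and per-SM L1 sizes."""
    return total_mb * 1024 / per_sm_kb


for gpu, (total_mb, per_sm_kb) in L1_CACHE.items():
    print(f"{gpu}: {implied_sm_count(total_mb, per_sm_kb):.0f} SMs")

# L2 cache of the fully enabled AD102 die versus the GA102 die, in MB.
l2_ratio = 96 / 6
print(f"AD102 L2 is {l2_ratio:.0f}x the GA102 L2")
```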

Each memory controller uses a 32-bit connection, and up to 12 controllers can be present, for a combined memory bus width of 384 bits. The Lovelace architecture can use either GDDR6 or GDDR6X memory. GDDR6X is used on the desktop GeForce RTX 40 series, while the more energy-efficient GDDR6 is used on the corresponding mobile versions and on RTX 6000 Ada Generation workstation GPUs.
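
The bus width and the resulting bandwidth can be sketched as follows. The bandwidth relation used here (per-pin data rate multiplied by bus width, divided by 8 bits per byte) is the usual convention rather than something stated in this section, and the two example data rates are the ones listed for the RTX 6000 Ada Generation and RTX 4000 SFF Ada Generation in the workstation table.

```python
def bus_width_bits(controllers: int, bits_per_controller: int = 32) -> int:
    """Combined memory bus width from the number of 32-bit controllers."""
    return controllers * bits_per_controller


def bandwidth_gb_per_s(data_rate_gbps: float, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s (assumed convention: rate x width / 8)."""
    return data_rate_gbps * bus_bits / 8


full_bus = bus_width_bits(12)
print(f"12 controllers -> {full_bus}-bit bus")
print(f"20 Gbps on {full_bus}-bit: {bandwidth_gb_per_s(20, full_bus):.0f} GB/s")
print(f"16 Gbps on 160-bit: {bandwidth_gb_per_s(16, bus_width_bits(5)):.0f} GB/s")
```

Both results agree with the bandwidth values listed for those two cards.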

Power efficiency and process node

The Ada Lovelace architecture can run at lower voltages than its predecessor.[6] Nvidia claims a 2x performance increase for the RTX 4090 at the same 450 W used by the previous-generation flagship, the RTX 3090 Ti.[16]

Increased power efficiency can be attributed in part to the smaller fabrication node used by the Lovelace architecture, which is fabricated on TSMC's 4N process, a node custom designed for Nvidia. The previous-generation Ampere architecture used Samsung's 8 nm-based 8N process node from 2018, which was two years old by the time of Ampere's launch.[17][18] The AD102 die, with its 76.3 billion transistors, has a transistor density of 125.5 million per mm², a 178% increase over GA102's 45.1 million per mm².
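
A percentage increase and a multiplier are easy to confuse, so the sketch below works out both from the two density figures quoted above. A 178% increase corresponds to roughly 2.78 times the density, not 1.78 times.

```python
# Transistor densities quoted in the text, in millions of transistors per mm².
AD102_DENSITY = 125.5
GA102_DENSITY = 45.1

density_ratio = AD102_DENSITY / GA102_DENSITY
density_increase_pct = (density_ratio - 1) * 100

print(f"AD102 is {density_ratio:.2f}x as dense as GA102")
print(f"That is a {density_increase_pct:.0f}% increase")
```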

Media engine

The Lovelace architecture uses the new 8th-generation Nvidia NVENC video encoder, while the 7th-generation NVDEC video decoder introduced with Ampere returns.[19]

NVENC adds AV1 hardware encoding with support for up to 8K resolution at 60 FPS in 10-bit color, enabling higher video fidelity at lower bit rates than the H.264 and H.265 codecs.[20] Nvidia claims that the NVENC AV1 encoder in the Lovelace architecture is 40% more efficient than the H.264 encoder in the Ampere architecture.[21]

The Lovelace architecture was criticized for not supporting DisplayPort 2.0, which offers higher display data bandwidth, and for instead using the older DisplayPort 1.4a, which is limited to a peak bandwidth of 32 Gbps.[22] As a result, Lovelace GPUs are limited to the refresh rates that DisplayPort 1.4a supports, even when the GPU can render higher frame rates. Intel's Arc GPUs, which were also released in October 2022, included DisplayPort 2.0. AMD's competing RDNA 3 architecture, released just two months after Lovelace, included DisplayPort 2.1.[23]
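
A rough calculation shows why the link bandwidth matters. The sketch below estimates the uncompressed pixel data rate for a hypothetical 4K display with 10 bits per color channel and compares it with the 32 Gbps peak quoted above. It is deliberately optimistic: it ignores blanking intervals and link encoding overhead, which lower the usable rate, as well as compression or chroma subsampling, which lower the required rate. The resolution and refresh rates are illustrative choices, not figures from the cited sources.

```python
def uncompressed_gbps(width: int, height: int, refresh_hz: int, bits_per_pixel: int) -> float:
    """Raw pixel data rate in Gbps, ignoring blanking and link overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


DP_1_4A_PEAK_GBPS = 32  # peak figure quoted in the text

for hz in (120, 144, 240):
    rate = uncompressed_gbps(3840, 2160, hz, 30)  # 4K, 10 bits per RGB channel
    verdict = "within" if rate <= DP_1_4A_PEAK_GBPS else "exceeds"
    print(f"4K @ {hz} Hz: {rate:.1f} Gbps ({verdict} the 32 Gbps peak)")
```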

Ada Lovelace dies

Comparison of Ada Lovelace chips
Chip[24]                           | AD102[25]     | AD103[26]     | AD104[27]     | AD106[28]     | AD107[29]
Die size                           | 609 mm²       | 379 mm²       | 294 mm²       | 188 mm²       | 159 mm²
Transistors                        | 76.3B         | 45.9B         | 35.8B         | 22.9B         | 18.9B
Transistor density                 | 125.3 MTr/mm² | 121.1 MTr/mm² | 121.8 MTr/mm² | 121.8 MTr/mm² | 118.9 MTr/mm²
Graphics processing clusters (GPC) | 12            | 7             | 5             | 3             | 2
Streaming multiprocessors (SM)     | 144           | 80            | 60            | 36            | 24
CUDA cores                         | 18432         | 10240         | 7680          | 4608          | 3072
Texture mapping units              | 576           | 320           | 240           | 144           | 96
Render output units                | 192           | 112           | 80            | 64            | 32
Tensor cores                       | 576           | 320           | 240           | 144           | 96
RT cores                           | 144           | 80            | 60            | 36            | 24
L1 cache (128 KB per SM)           | 18 MB         | 10 MB         | 7.5 MB        | 4.5 MB        | 3 MB
L2 cache                           | 96 MB         | 64 MB         | 48 MB         | 32 MB         | 32 MB
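
Most rows of the die comparison scale linearly with the SM count. The sketch below recomputes them from the die size, transistor count and SM count listed for each chip. The per-SM ratios (128 CUDA cores, 4 texture mapping units, 4 Tensor cores, 1 RT core and 128 KB of L1 cache) are read off the table and assumed to hold for every die; the names UNITS_PER_SM and derived_specs are illustrative.

```python
# Per-SM unit counts implied by the table (assumed constant across dies).
UNITS_PER_SM = {"cuda_cores": 128, "tmus": 4, "tensor_cores": 4, "rt_cores": 1}
L1_KB_PER_SM = 128

# (die size in mm², transistors in billions, SM count) as listed per chip.
DIES = {
    "AD102": (609, 76.3, 144),
    "AD103": (379, 45.9, 80),
    "AD104": (294, 35.8, 60),
    "AD106": (188, 22.9, 36),
    "AD107": (159, 18.9, 24),
}


def derived_specs(die_mm2: float, transistors_billion: float, sm_count: int) -> dict:
    """Recompute the table rows that follow from die size, transistors and SMs."""
    specs = {name: per_sm * sm_count for name, per_sm in UNITS_PER_SM.items()}
    specs["l1_mb"] = sm_count * L1_KB_PER_SM / 1024
    # Density in millions of transistors per mm², rounded as in the table.
    specs["density_mtr_per_mm2"] = round(transistors_billion * 1000 / die_mm2, 1)
    return specs


for chip, row in DIES.items():
    print(chip, derived_specs(*row))
```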

Ada Lovelace-based products

Gaming

  • GeForce 40 series
    • GeForce RTX 4050 (mobile) (AD107)
    • GeForce RTX 4060 (mobile) (AD107)
    • GeForce RTX 4060 Ti (AD106)
    • GeForce RTX 4070 (mobile) (AD106)
    • GeForce RTX 4070 (AD104)
    • GeForce RTX 4070 Ti (AD104)
    • GeForce RTX 4080 (mobile) (AD104)
    • GeForce RTX 4080 (AD103)
    • GeForce RTX 4090 (mobile) (AD103)
    • GeForce RTX 4090 (AD102)

Desktop Workstation

Model                                        | RTX 4000 SFF Ada Generation[30] | RTX 6000 Ada Generation[31]
Launch                                       | Mar 21, 2023                    | Jan 20, 2023
Launch MSRP (USD)                            | $1,250                          | $6,799
Code name(s)                                 | AD104-400                       | AD102-300
Transistors (billion)                        | 35.8                            | 76.3
Die size                                     | 294.5 mm²                       | 608.4 mm²
Core config[note 1]                          | 6144:192:80:48:192              | 18,176:568:192:142:568
SM count[note 2]                             | 48                              | 142
L1 cache                                     | 6 MB                            | 17.75 MB
L2 cache                                     | 48 MB                           | 96 MB
Core clock, base (boost)[note 3] (MHz)       | 1290 (1565)                     | 915 (2505)
Memory clock (Gb/s)                          | 16 Gbps                         | 20 Gbps
Pixel fillrate[note 4] (Gpx/s)               | 103.2 (125.2)                   | 175.68 (480.96)
Texture fillrate[note 5] (Gtex/s)            | 247.68 (300.48)                 | 519.72 (1,422.84)
Memory type                                  | GDDR6                           | GDDR6
Memory size                                  | 20 GB                           | 48 GB
Memory bandwidth (GB/s)                      | 320                             | 960
Memory bus width                             | 160-bit                         | 384-bit
Half precision, (boost) (TFLOPS)             |                                 |
Single precision, (boost) (TFLOPS)           | (19.2)                          | (91.1)
Double precision, (boost) (TFLOPS)           |                                 |
Tensor compute [sparse] (TFLOPS)             | 153.4 [306.8]                   | 728.5 [1457.0]
TDP                                          | 70 W                            | 300 W
  1. Shader Processors : Texture mapping units : Render output units : Ray tracing cores : Tensor Cores
  2. The number of Streaming multi-processors on the GPU.
  3. Core boost values (if available) are stated below the base value inside brackets.
  4. Pixel fillrate is calculated as the number of render output units (ROPs) multiplied by the base (or boost) core clock speed.
  5. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.
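
The fillrate definitions in notes 4 and 5 can be checked against the RTX 6000 Ada Generation entry (18,176 shader processors, 568 texture mapping units, 192 render output units, 915 MHz base and 2505 MHz boost clocks). The sketch below does so and also estimates peak single-precision throughput. The factor of two floating-point operations per shader processor per clock is the common fused multiply-add convention; it is an assumption here and is not stated in the table notes.

```python
def fillrate_g_per_s(units: int, clock_mhz: float) -> float:
    """Fillrate in G/s: unit count multiplied by the core clock."""
    return units * clock_mhz / 1000


def fp32_tflops(shader_processors: int, clock_mhz: float, flops_per_clock: int = 2) -> float:
    """Peak single-precision TFLOPS; flops_per_clock=2 is an assumed FMA convention."""
    return shader_processors * flops_per_clock * clock_mhz / 1e6


# RTX 6000 Ada Generation values from the desktop workstation table.
SHADERS, TMUS, ROPS = 18176, 568, 192
BASE_MHZ, BOOST_MHZ = 915, 2505

print("Pixel fillrate (Gpx/s):",
      round(fillrate_g_per_s(ROPS, BASE_MHZ), 2),
      round(fillrate_g_per_s(ROPS, BOOST_MHZ), 2))
print("Texture fillrate (Gtex/s):",
      round(fillrate_g_per_s(TMUS, BASE_MHZ), 2),
      round(fillrate_g_per_s(TMUS, BOOST_MHZ), 2))
print("Single precision at boost (TFLOPS):",
      round(fp32_tflops(SHADERS, BOOST_MHZ), 1))
```

Under these assumptions the results match the fillrate and boost single-precision values listed for that card.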

Mobile Workstation

Model                                  | RTX 2000 Max-Q Ada Laptop | RTX 2000 Ada Laptop | RTX 3000 Ada Laptop | RTX 3500 Ada Laptop | RTX 4000 Ada Laptop | RTX 5000 Ada Laptop
Launch                                 | Mar 21, 2023              | Mar 21, 2023        | Mar 21, 2023        | Mar 21, 2023        | Mar 21, 2023        | Mar 21, 2023
Code name(s)                           | AD107                     | AD107               | AD106               | AD104               | AD104               | AD103
Transistors (billion)                  |                           |                     | 22.9                | 35.8                | 35.8                | 45.9
Die size                               | 146 mm²                   | 146 mm²             | 190 mm²             | 294.5 mm²           | 294.5 mm²           | 378.6 mm²
Core config[note 1]                    | 3072:96:32:24:96          | 3072:96:32:24:96    | 4608:144:48:36:144  | 5120:160:64:40:160  | 7424:232:80:58:232  | 9728:304:112:76:304
SM count[note 2]                       | 24                        | 24                  | 36                  | 40                  | 58                  | 76
L1 cache                               | 3 MB                      | 3 MB                | 4.5 MB              | 5 MB                | 7.25 MB             | 9.5 MB
L2 cache                               | 12 MB                     | 12 MB               | 32 MB               | 48 MB               | 48 MB               | 64 MB
Core clock, base (boost)[note 3] (MHz) | 930 (1455)                | 1635 (2115)         | 1395 (1695)         | 1290 (1665)         | 1290 (1665)         | 1335 (1695)
Memory clock (Gb/s)                    | 14 Gbps                   | 16 Gbps             | 16 Gbps             | 18 Gbps             | 18 Gbps             | 18 Gbps
Pixel fillrate[note 4] (Gpx/s)         | 29.76 (46.56)             | 52.32 (67.68)       | 66.96 (81.36)       | 82.56 (106.56)      | 103.2 (133.2)       | 149.52 (189.84)
Texture fillrate[note 5] (Gtex/s)      | 89.28 (139.68)            | 156.96 (203.04)     | 200.88 (244.08)     | 206.4 (266.4)       | 299.28 (386.28)     | 405.84 (515.28)
Memory type                            | GDDR6                     | GDDR6               | GDDR6               | GDDR6               | GDDR6               | GDDR6
Memory size                            | 8 GB                      | 8 GB                | 8 GB                | 12 GB               | 12 GB               | 16 GB
Memory bandwidth (GB/s)                | 224                       | 256                 | 256                 | 432                 | 432                 | 576
Memory bus width                       | 128-bit                   | 128-bit             | 128-bit             | 192-bit             | 192-bit             | 256-bit
Half precision, (boost) (TFLOPS)       |                           |                     |                     |                     |                     |
Single precision, (boost) (TFLOPS)     |                           | (14.5)              | (19.9)              | (23.0)              | (33.6)              | (42.6)
Double precision, (boost) (TFLOPS)     |                           |                     |                     |                     |                     |
Tensor compute [sparse] (TFLOPS)       |                           | 115.8 [231.6]       | 159.3 [318.6]       | 184.3 [368.6]       | 269.0 [538.0]       | 340.9 [681.8]
TGP                                    | 35 W                      | 35–140 W            | 35–140 W            | 60–140 W            | 80–175 W            | 80–175 W
  1. Shader Processors : Texture mapping units : Render output units : Ray tracing cores : Tensor Cores
  2. The number of Streaming multi-processors on the GPU.
  3. Core boost values (if available) are stated below the base value inside brackets.
  4. Pixel fillrate is calculated as the number of render output units (ROPs) multiplied by the base (or boost) core clock speed.
  5. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.

Datacenter

Model                                  | L4                  | L40[32]                | L40G         | L40 CNX
Launch                                 | Mar 21, 2023        | Oct 13, 2022           |              |
Launch MSRP (USD)                      |                     |                        |              |
Code name(s)                           | AD104-???-A1        | AD102-895-A1           | AD102-???-A1 | AD102-???-A1
Transistors (billion)                  | 35.8                | 76.3                   |              |
Die size                               | 295 mm²             | 608.4 mm²              |              |
Core config[note 1]                    | 7,680:240:80:60:240 | 18,176:568:192:142:568 |              |
SM count[note 2]                       | 60                  | 142                    |              |
L1 cache                               | 7.5 MB              | 17.75 MB               |              |
L2 cache                               | 48 MB               | 96 MB                  | 48 MB        |
Core clock, base (boost)[note 3] (MHz) | 795 (2040)          | 735 (2490)             | 1005 (2475)  |
Memory clock (MHz)                     | 1313                | 2250                   |              |
Pixel fillrate[note 4] (Gpx/s)         | 63.6 (163.2)        | 58.8 (199.2)           | 80.4 (198.0) |
Texture fillrate[note 5] (Gtex/s)      | 190.8 (489.6)       | 176.4 (597.6)          | 241.2 (594.0) |
Memory type                            | GDDR6X              | GDDR6                  |              |
Memory size                            | 24 GB               | 48 GB                  | 24 GB        |
Memory bandwidth (GB/s)                | 504.2               | 864                    |              |
Memory bus width                       | 192-bit             | 384-bit                |              |
Half precision, (boost) (TFLOPS)       |                     |                        |              |
Single precision, (boost) (TFLOPS)     |                     |                        |              |
Double precision, (boost) (TFLOPS)     |                     |                        |              |
Tensor compute [sparse] (TFLOPS)       |                     |                        |              |
TBP                                    | 285 W               | 300 W                  |              |
  1. Shader Processors : Texture mapping units : Render output units : Ray tracing cores : Tensor Cores
  2. The number of Streaming multi-processors on the GPU.
  3. Core boost values (if available) are stated below the base value inside brackets.
  4. Pixel fillrate is calculated as the number of render output units (ROPs) multiplied by the base (or boost) core clock speed.
  5. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.

References

  1. Freund, Karl (September 20, 2022). "NVIDIA Launches Lovelace GPU, Cloud Services, Ships H100 GPUs, New Drive Thor". Forbes. Retrieved November 18, 2022.
  2. Mujtaba, Hassan (September 15, 2022). "NVIDIA's Next-Gen Ada Lovelace Gaming GPU Architecture For GeForce RTX 40 Series Confirmed". Wccftech. Retrieved November 18, 2022.
  3. "NVIDIA Delivers Quantum Leap in Performance, Introduces New Era of Neural Rendering with GeForce RTX 40 Series". NVIDIA Newsroom (Press release). September 20, 2022. Retrieved September 20, 2022.
  4. "NVIDIA's New Ada Lovelace RTX GPU Arrives for Designers and Creators". Nvidia Newsroom. September 20, 2022. Retrieved November 18, 2022.
  5. Machkovech, Sam (September 20, 2022). "Nvidia's Ada Lovelace GPU generation: $1,599 for RTX 4090, $899 and up for 4080". Ars Technica. Retrieved November 18, 2022.
  6. Chiappetta, Marco (September 22, 2022). "NVIDIA GeForce RTX 40 Architecture Overview: Ada's Special Sauce Unveiled". HotHardware. Retrieved April 8, 2023.
  7. "NVIDIA Ada Lovelace Architecture". NVIDIA. September 20, 2022. Retrieved September 20, 2022.
  8. "CUDA C++ Programming Guide". docs.nvidia.com. Retrieved April 15, 2023.
  9. "Improve Shader Performance and In-Game Frame Rates with Shader Execution Reordering". NVIDIA Technical Blog. October 13, 2022. Retrieved April 6, 2023.
  10. Delgado, Gerardo (September 20, 2022). "Creativity At The Speed of Light: GeForce RTX 40 Series Graphics Cards Unleash Up To 2X Performance in 3D Rendering, AI, and Video Exports For Gamers and Creators". NVIDIA. Retrieved September 20, 2022.
  11. "Nvidia Video Codec SDK". NVIDIA Developer. September 20, 2022. Retrieved November 18, 2022.
  12. Chuong Nguyen (September 21, 2022). "Nvidia kills off NVLink on RTX 4090". Windows Central. Retrieved January 1, 2023.
  13. btarunr (September 21, 2022). "Jensen Confirms: NVLink Support in Ada Lovelace is Gone". TechPowerUp. Retrieved November 18, 2022.
  14. "Nvidia Ada Lovelace GPU Architecture: Designed to deliver outstanding gaming and creating, professional graphics, AI, and compute performance" (PDF). Nvidia. p. 30. Retrieved April 5, 2023.
  15. "Nvidia Ada Lovelace GPU Architecture: Designed to deliver outstanding gaming and creating, professional graphics, AI, and compute performance" (PDF). Nvidia. p. 12. Retrieved April 6, 2023.
  16. "Nvidia Ada Lovelace GPU Architecture: Designed to deliver outstanding gaming and creating, professional graphics, AI, and compute performance" (PDF). Nvidia. p. 12. Retrieved April 5, 2023.
  17. James, Dave (September 1, 2020). "Nvidia confirms Samsung 8nm process for RTX 3090, RTX 3080, and RTX 3070". PC Gamer. Retrieved April 5, 2023.
  18. Bosnjak, Dominik (September 1, 2020). "Samsung's old 8nm tech at the heart of NVIDIA's monstrous Ampere cards". SamMobile. Retrieved April 5, 2023.
  19. "Nvidia Ada Lovelace GPU Architecture: Designed to deliver outstanding gaming and creating, professional graphics, AI, and compute performance" (PDF). Nvidia. p. 25. Retrieved April 5, 2023.
  20. Muthana, Prathap; Mishra, Sampurnananda; Patait, Abhijit (January 18, 2023). "Improving Video Quality and Performance with AV1 and NVIDIA Ada Lovelace Architecture". Nvidia Developer. Retrieved April 5, 2023.
  21. "Nvidia Ada Science: How Ada advances the science of graphics with DLSS 3" (PDF). Nvidia. p. 13. Retrieved April 5, 2023.
  22. Garreffa, Anthony (September 25, 2022). "NVIDIA's next-gen GeForce RTX 40 series lack DP2.0 connectivity, silly". TweakTown. Retrieved April 5, 2023.
  23. Judd, Will (November 3, 2022). "AMD announces 7900 XTX and 7900 XT graphics cards with FSR 3". Eurogamer. Retrieved April 5, 2023.
  24. "NVIDIA confirms Ada 102/103/104 GPU specs, AD104 has more transistors than GA102". VideoCardz. September 23, 2022. Retrieved September 23, 2022.
  25. "NVIDIA AD102 GPU Specs". TechPowerUp. Retrieved December 17, 2022.
  26. "NVIDIA AD103 GPU Specs". TechPowerUp. Retrieved December 17, 2022.
  27. "NVIDIA AD104 GPU Specs". TechPowerUp. Retrieved October 18, 2022.
  28. "NVIDIA AD106 GPU Specs". TechPowerUp. Retrieved December 17, 2022.
  29. "NVIDIA AD107 GPU Specs". TechPowerUp. Retrieved December 17, 2022.
  30. "NVIDIA RTX 4000 SFF Ada Generation: Power for endless possibilities" (PDF). Nvidia. Retrieved April 5, 2023.
  31. "RTX 6000 Ada Generation: Power for endless possibilities" (PDF). Nvidia. Retrieved April 5, 2023.
  32. "Nvidia L40 GPU Accelerator Product Brief" (PDF). Nvidia. Retrieved April 5, 2023.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.