GeForce 700 series: Difference between revisions


Revision as of 15:27, 20 May 2013 by EBusiness (136 edits). Edit summary: Big round of deleting, no sources have reliably linked the GF 700 series and the GK110 chip, so it is gone.
Revision as of 19:24, 20 May 2013 by 210.195.84.194. No edit summary.
Line 3:		Line 3:
	\| name = GeForce 700 Series		\| name = GeForce 700 Series
	\| image = ]		\| image = ]
	\| codename =		\| codename = GK110, GK114, GK116, GK117
	\| created =		\| created =
	\| model = GeForce Series		\| model = GeForce Series
	\| model1 = GeForce GT Series		\| model1 = GeForce GT Series
	\| model2 = GeForce GTX Series		\| model2 = GeForce GTX Series
			\| transistors = 292M 40 nm (GF119)
	\| transistors1 = 585M 28 nm (GF117)		\| transistors1 = 585M 28 nm (GF117)
	\| transistors2 = 1,270M 28 nm (GK107)		\| transistors2 = 1,270M 28 nm (GK107)
	\| transistors3 = 1,270M 28 nm (GK208)		\| transistors3 = 1,270M 28 nm (GK208)
			\| transistors4 =

			\| transistors5 = 2,540M 28 nm (GK106)
			\| transistors6 = 3,540M 28 nm (GK104)
			\| transistors7 =
			\| transistors8 =
			\| transistors9 = 7,080M 28 nm (GK110)
	\| arch = ]		\| arch = ]
	\| entry =		\| entry =
Line 23:		Line 29:
	}}		}}

	The '''GeForce 700 Series''' will be a family of ]s developed by ], ~~to be~~ used in desktop and laptop PCs. It will serve as the introduction for the Kepler Refresh architecture (GK-codenamed chips), named after the German mathematician, astronomer, and astrologer ]. A number of GeForce 700 series chips were released for mobile devices in April 2013. ~~No desktop graphics cards have been released yet~~		The '''GeForce 700 Series''' will be a family of ]s developed by ], used in desktop and laptop PCs. It will serve as the introduction for the Kepler Refresh architecture (GK-codenamed chips), named after the German mathematician, astronomer, and astrologer ]. A number of GeForce 700 series chips were released for mobile devices in April 2013.

			== Overview ==
			With GK110, Nvidia focuses on compute performance. With 7.1 billion transistors it is the biggest GPU in terms of transistor count, dwarfing the GK104 and GF110. GK110 is unrivaled from a fabrication and power consumption standpoint, but the end result is that the performance per watt is unmatched due to the fact that so many tasks (graphical and compute) are massively parallel and map well to the large arrays of streaming processors found in GK110.

			With GK110, increase in space and bandwidth for both the register file and the L2 cache are seen. At the SMX level, GK110 register file space has increased to 256KB composed of 65K 32bit registers, as compared to Fermi. As for the L2 cache, GK110 L2 cache space increased by up to 1.5MB, twice as big as GF110. Both the L2 cache and register file bandwidth have also doubled.
			Performance in register-starved scenarios is also improved as there are more registers available to each thread. This goes in hand with an increase of total number of registers each thread can address, moving from 63 registers per thread to 255 registers per thread with GK110.

			With GK110, Nvidia also reworked the GPU texture cache to be used for compute. With 48KB in size, in compute the texture cache becomes a read-only cache, specializing in unaligned memory access workloads. Furthermore error detection capabilities have been added to make it safer for use with workloads that rely on ECC.<ref name="anandtech-GK110-preview">{{cite web \| url=http://www.anandtech.com/show/6446/nvidia-launches-tesla-k20-k20x-gk110-arrives-at-last/3\| title=NVIDIA Launches Tesla K20 & K20X: GK110 Arrives At Last \| date=11/12/2012 \| publisher=AnandTech}}</ref>

			== Features ==
			The GeForce 700 Series contains features from both GK104 and GK110. Kepler based members of the 700 series add the following standard features to the GeForce family.

			Derive from GK104 :

			* ] interface

			* ] 1.2
			* ] 1.4a 4K x 2K video output
			* ] hardware video acceleration (up to 4K x 2K H.264 decode)
			* Hardware H.264 encoding acceleration block (NVENC)
			* Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
			* Bindless Textures
			* GPU Boost
			* TXAA
			* Manufactured by ] on a 28 nm process

			New Features from GK110 :

			* Compute Focus SMX Improvement
			* ] Compute Capability 3.5
			* New Shuffle Instructions
			* Dynamic Parallelism
			* Hyper-Q (Hyper-Q's MPI functionality reserve for Tesla only)
			* Grid Management Unit
			* NVIDIA GPUDirect (GPU Direct’s RDMA functionality reserve for Tesla only)

			=== Compute Focus SMX Improvement ===

			With GK110, Nvidia opted to increase compute performance. The single biggest change from GK104 is that rather than 8 dedicated FP64 CUDA cores, GK110 has up to 64, giving it 8x the FP64 throughput of a GK104 SMX. The SMX also sees an increase in space for register file. Register file space has increased to 256KB compared to Fermi. The texture cache are also improved. With a 48KB space, the texture cache can become a read-only cache for compute workloads.<ref name=anandtech-GK110-preview />

			=== New Shuffle Instructions ===
			At a low level, GK110 sees an additional instructions and operations to further improve performance. New shuffle instructions allow for threads within a warp to share data without going back to memory, making the process much quicker than the previous load/share/store method. Atomic operations are also overhauled, speeding up the execution speed of atomic operations and adding some FP64 operations that were previously only available for FP32 data.<ref name=anandtech-GK110-preview />

			=== Hyper-Q ===
			Hyper-Q expands GK110 hardware work queues from 1 to 32. The significance of this being that having a single work queue meant that Fermi could be under occupied at times as there wasn’t enough work in that queue to fill every SM. By having 32 work queues, GK110 can in many scenarios, achieve higher utilization by being able to put different task streams on what would otherwise be an idle SMX. The simple nature of Hyper-Q is further reinforced by the fact that it’s easily map to MPI, a common message passing interface frequently used in HPC. As legacy MPI-based algorithms that were originally designed for multi-CPU systems that became bottlenecked by false dependencies now have a solution. By increasing the number of MPI jobs, it’s possible to utilize Hyper-Q on these algorithms to improve the efficiency all without changing the code itself.<ref name=anandtech-GK110-preview />

			=== Dynamic Parallelism ===
			Dynamic Parallelism ability is for kernels to be able to dispatch other kernels. With Fermi, only the CPU could dispatch a kernel, which incurs a certain amount of overhead by having to communicate back to the CPU. By giving kernels the ability to dispatch their own child kernels, GK110 can both save time by not having to go back to the CPU, and in the process free up the CPU to work on other tasks.<ref name=anandtech-GK110-preview />

			=== Grid Management Unit ===
			Enabling Dynamic Parallelism requires a new grid management and dispatch control system. The new Grid Management Unit (GMU) manages and prioritizes grids to be executed. The GMU can pause the dispatch of new grids and queue pending and suspended grids until they are ready to execute, providing the flexibility to enable powerful runtimes, such as Dynamic Parallelism.
			The CUDA Work Distributor in Kepler holds grids that are ready to dispatch, and is able to dispatch 32 active grids, which is double the capacity of the Fermi CWD. The Kepler CWD communicates with the GMU via a bidirectional link that allows the GMU to pause the dispatch of new grids and to hold pending and suspended grids until needed. The GMU also has a direct connection to the Kepler SMX units to permit grids that launch additional work on the GPU via Dynamic Parallelism to send the new work back to GMU to be prioritized and dispatched. If the kernel that dispatched the additional workload pauses, the GMU will hold it inactive until the dependent work has completed. <ref>{{cite web \| url= http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf \| title= NVIDIA-Kepler-GK110-Architecture-Whitepaper\|}}</ref>

			=== NVIDIA GPUDirect ===
			NVIDIA GPUDirect™ is a capability that enables GPUs within a
			single computer, or GPUs in different servers located across a network, to directly exchange
			data without needing to go to CPU/system memory. The RDMA feature in GPUDirect allows
			third party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple
			GPUs within the same system, significantly decreasing the latency of MPI send and receive
			messages to/from GPU memory. It also reduces demands on system memory bandwidth and
			frees the GPU DMA engines for use by other CUDA tasks. Kepler GK110 also supports other
			GPUDirect features including Peer‐to‐Peer and GPUDirect for Video.

	==Products==		==Products==
			===GeForce 700 (7xx) series===
			The GeForce 700 series for desktop architecture. The processing power is obtained by multiplying shader clock speed, the number of cores and how many instructions the cores are capable of performing per cycle.

			* <sup>1</sup> ] (] / ] / ]) : ] : ]

			{\| class="wikitable" style="font-size: 85%; text-align: center; width: auto;"
			\|-
			! rowspan=2 \| Model
			! rowspan=2 \| Launch
			! rowspan=2 \| ]
			! rowspan=2 \| Fab (])
			! rowspan=2 \| ] ]
			! rowspan=2 \| Memory (])
			! rowspan=2 \| Config core<sup>1</sup>
			! colspan=3 style="text-align:center;" \| Clock speed
			! colspan=2 style="text-align:center;" \| ]
			! colspan=3 style="text-align:center;" \| Memory
			! colspan=2 style="text-align:center;" \| ] support (version)
			! rowspan=2 \| Processing Power<sup>2</sup><br> (GFLOPS)
			! rowspan=2 \| ] (watts)
			! rowspan=2 \| Release Price (USD)
			\|-
			! Core (])
			! Shader (])
			! Memory (])
			! Pixel (]/s)
			! Texture (]/s)
			! Bandwidth (]/s)
			! Bus type
			! Bus width (])
			! ]
			! ]
			\|-
			!! style="text-align:left;" \| GeForce GTX 770
			\| May 30, 2013
			\| GK104-425-A2
			\| 28
			\| PCIe 3.0 x16
			\| 2048
			\| 1536:128:32
			\| 1058
			\| 1058
			\| 1753 <br> (7012)
			\| ?
			\| ?
			\| ?
			\| GDDR5
			\| 256
			\| 11.0
			\| 4.3
			\| ?
			\| 195
			\|
			\|-
			!! style="text-align:left;" \| GeForce GTX 780
			\| May 23, 2013
			\| GK110-300-A1
			\| 28
			\| PCIe 3.0 x16
			\| 3072
			\| 2304:192:48
			\| 863
			\| 863
			\| 1502 <br> (6008)
			\| ?
			\| ?
			\| ?
			\| GDDR5
			\| 384
			\| 11.0
			\| 4.3
			\| ?
			\| 250
			\|
			\|-
			\|}

	===GeForce 700M (7xxM) series===		===GeForce 700M (7xxM) series===
			The GeForce 700M series for notebooks architecture. The processing power is obtained by multiplying shader clock speed, the number of cores and how many instructions the cores are capable of performing per cycle.
	Some implementations may use different specifications.

	* <sup>1</sup> ] (]/]/]) : ] : ]		* <sup>1</sup> ] (]/]/]) : ] : ]
Line 43:		Line 188:
	! colspan=2 style="text-align:center;" \| ]		! colspan=2 style="text-align:center;" \| ]
	! colspan=3 style="text-align:center;" \| Memory		! colspan=3 style="text-align:center;" \| Memory
	! colspan=3 style="text-align:center;" \| ] support (version)		! colspan=2 style="text-align:center;" \| ] support (version)
	! rowspan=2 \| Processing Power<sup>2</sup><br> (GFLOPS)		! rowspan=2 \| Processing Power<sup>2</sup><br> (GFLOPS)
	! rowspan=2 \| ] (watts)		! rowspan=2 \| ] (watts)
Line 58:		Line 203:
	! ]		! ]
	! ]		! ]
	! ]
	\|-		\|-
			! style="text-align:left;" \| GeForce 705M
	! style="text-align:left;" \| GeForce 710M <ref>http://www.notebookcheck.net/NVIDIA-GeForce-710M.84746.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-710m/specifications</ref>
			\| April 2013
	\| ?
	\| ~~GF117~~		\| GF119
	\| 28		\| 40
	\| PCIe 2.0 x16		\| PCIe 2.0 x16
			\| 1024
	\| up to 2048
	\| 96:16:4		\| 48:8:4
			\| ?
			\| ?
	\| ?		\| ?
	\| ?		\| ?
	\| 1800
	\| ?		\| ?
	\| ?		\| ?
	\| 14.4
	\| DDR3		\| DDR3
	\| 64		\| 64
	\| 11		\| 11.0
	\| 4.1		\| 4.3
	\| Yes
	\| ?		\| ?
	\| ?		\| ?
			\| ]
	\|
	\|-		\|-
			! style="text-align:left;" \| GeForce 710M
	! style="text-align:left;" \| GeForce GT 720M <ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-720M.90247.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-720m/specifications</ref>
			\| Jan 2013
			\| GF117
			\| 28
			\| PCIe 2.0 x16
			\| 1024<br>2048
			\| 96:16:4
			\| 475
			\| 950
			\| 1800
			\| 1.9
			\| 7.6
			\| 14.4
			\| DDR3
			\| 64
			\| 11.0
			\| 4.3
			\| 182.40
			\| 12
			\| OEM
			\|-
			! style="text-align:left;" \| GeForce GT 720M
	\| April 1, 2013		\| April 1, 2013
	\| GF117		\| GF117
	\| 28		\| 28
	\| PCIe 2.0 x16		\| PCIe 2.0 x16
			\| ?
	\| up to 2048
	\| 96:16:4		\| 96:16:4
	\| 938
	\| 1876
	\| 2000
	\| ?		\| ?
	\| ?		\| ?
	\| ~~16.0~~		\| ?
			\| ?
			\| ?
			\| ?
	\| DDR3		\| DDR3
	\| 64		\| 64
	\| 11		\| 11.0
	\| 4.3		\| 4.3
	\| 1.2
	\| ?		\| ?
	\| ?		\| ?
	\|		\|
	\|-		\|-
	! style="text-align:left;" \| GeForce GT 730M ~~<ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-730M.84681.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-730m/specifications</ref>~~		! style="text-align:left;" \| GeForce GT 730M
	\| April 1, 2013		\| April 1, 2013
	\| ~~GK107/~~GK208		\| GK208
	\| 28		\| 28
	\| PCIe ~~2.0/~~3.0 x16		\| PCIe 3.0 x16
			\| ?
	\| up to 4096
			\| 192:16:16
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| DDR3
			\| 128
			\| 11.0
			\| 4.3
			\| ?
			\| ?
			\|
			\|-
			! style="text-align:left;" \| GeForce GT 730M
			\| Jan 2013
			\| GK107
			\| 28
			\| PCIe 3.0 x16
			\| 2048
	\| 384:32:16		\| 384:32:16
	\| 725		\| 725
	\| 725		\| 725
	\| 1800 ~~- 4000~~		\| 1800
			\| 5.8
			\| 23.2
			\| 28.8
			\| DDR3
			\| 128
			\| 11.0
			\| 4.3
			\| 556.8
	\| ?		\| ?
			\|
			\|-
			! style="text-align:left;" \| GeForce GT 735M
			\| April 1, 2013
			\| GK208
			\| 28
			\| PCIe 3.0 x16
	\| ?		\| ?
			\| 192:16:16
	\| 14.4 - 64.0
			\| ?
	\| DDR3/GDDR5
	\| ~~64/128~~		\| ?
	\| 11		\| ?
	\| ~~4.1~~		\| ?
	\| ~~Yes~~		\| ?
			\| ?
			\| DDR3
			\| 128
			\| 11.0
			\| 4.3
	\| ?		\| ?
	\| ?		\| ?
	\|		\|
	\|-		\|-
			! style="text-align:left;" \| GeForce GT 740M
	! style="text-align:left;" \| GeForce GT 735M <ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-735M.90246.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-735m/specifications</ref>
	\| April 1, 2013		\| April 1, 2013
	\| GK208		\| GK208
	\| 28		\| 28
	\| PCIe 3.0 x16		\| PCIe 3.0 x16
			\| ?
	\| up to 2048
	\| ~~384~~:32:16		\| 192:16:16
	\| ~~889~~		\| ?
	\| ~~889~~		\| ?
	\| ~~2000~~		\| ?
			\| ?
	\| ?		\| ?
	\| ?		\| ?
	\| 16.0
	\| DDR3		\| DDR3
	\| 64		\| 128
	\| 11		\| 11.0
	\| 4.3		\| 4.3
	\| 1.2
	\| ?		\| ?
	\| ?		\| ?
	\|		\|
	\|-		\|-
	! style="text-align:left;" \| GeForce GT 740M ~~<ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-740M.89900.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-740m/specifications</ref>~~		! style="text-align:left;" \| GeForce GT 740M
	\| April 1, 2013		\| April 1, 2013
	\| GK107/?		\| GK107
	\| 28		\| 28
	\| PCIe 3.0 x16		\| PCIe 3.0 x16
	\| ?		\| ?
	\| 384:32:16		\| 384:32:16
	\| 810
	\| 810
	\| 1800/3600
	\| ?		\| ?
	\| ?		\| ?
	\| ?		\| ?
			\| ?
	\| DDR3/GDDR5
	\| ~~128/~~?		\| ?
	\| 11		\| ?
			\| GDDR5
			\| 128
			\| 11.0
	\| 4.3		\| 4.3
	\| 1.2
	\| ?		\| ?
	\| ?		\| ?
	\|		\|
	\|-		\|-

	! style="text-align:left;" \| GeForce GT 745M <ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-745M.90244.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-745m/specifications</ref>
			! style="text-align:left;" \| GeForce GT 745M
	\| April 1, 2013		\| April 1, 2013
	\| GK107		\| GK107
	\| 28		\| 28
	\| PCIe 3.0 x16		\| PCIe 3.0 x16
			\| ?
	\| up to 2048
	\| 384:32:16		\| 384:32:16
	\| 837
	\| 837
	\| 2000 - 5000
	\| ?		\| ?
	\| ?		\| ?
			\| ?
	\| 32.0 - 80.0
			\| ?
			\| ?
			\| ?
	\| DDR3/GDDR5		\| DDR3/GDDR5
	\| 128		\| 128
	\| 11		\| 11.0
	\| 4.2		\| 4.3
	\| 1.2
	\| ?		\| ?
	\| ?		\| ?
	\|		\|
	\|-		\|-
	! style="text-align:left;" \| GeForce GT 750M ~~<ref>http://www.notebookcheck.net/NVIDIA-GeForce-GT-750M.90245.0.html</ref><ref>http://www.geforce.com/hardware/notebook-gpus/geforce-gt-750m/specifications</ref>~~		! style="text-align:left;" \| GeForce GT 750M
	\| April 1, 2013		\| April 1, 2013
	\| GK107		\| GK107
	\| 28		\| 28
	\| PCIe 3.0 x16		\| PCIe 3.0 x16
			\| ?
	\| up to 2048
	\| 384:32:16		\| 384:32:16
	\| 967		\| 967
	\| 967		\| 967
	\| 2000 - 5000		\| 2000<br>5000
	\| ?		\| ?
	\| ?		\| ?
	\| 32 - 80		\| 32<br>80
	\| DDR3/GDDR5		\| DDR3/GDDR5
	\| 128		\| 128
	\| 11		\| 11.0
	\| 4.2		\| 4.3
	\| 1.2		\| 742.7
	\| ?		\| ?
			\|
			\|-
			! style="text-align:left;" \| GeForce GTX 760M
			\| May 2013
			\| GK106
			\| 28
			\| PCIe 3.0 x16
	\| ?		\| ?
			\| 768:64:24
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| GDDR5
			\| 128
			\| 11.0
			\| 4.3
			\| ?
			\| ?
			\|
			\|-
			! style="text-align:left;" \| GeForce GTX 765M
			\| May 2013
			\| GK106
			\| 28
			\| PCIe 3.0 x16
			\| ?
			\| 768:64:24
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| ?
			\| GDDR5
			\| 128
			\| 11.0
			\| 4.3
			\| ?
			\| ?
			\|
			\|-
			! style="text-align:left;" \| GeForce GTX 770M
			\| May 2013
			\| GK104
			\| 28
			\| PCIe 3.0 x16
			\| 3072
			\| 960:80:24
			\| 600
			\| 600
			\| 2800
			\| 14.4
			\| 48.0
			\| 67.2
			\| GDDR5
			\| 192
			\| 11.0
			\| 4.3
			\| 1152
			\| 75
			\|
			\|-
			! style="text-align:left;" \| GeForce GTX 780M
			\| May 2013
			\| GK104
			\| 28
			\| PCIe 3.0 x16
			\| 4096
			\| 1536:128:32
			\| 720
			\| 720
			\| 5000
			\| 23
			\| 92.2
			\| 160
			\| GDDR5
			\| 256
			\| 11.0
			\| 4.3
			\| 2234.3
			\| 100+
	\|		\|
	\|-		\|-
Line 240:		Line 525:
	*		*
	*		*
			*
			*
			*
	{{refend}}		{{refend}}

Revision as of 19:24, 20 May 2013


GeForce 700 Series
Codename	GK110, GK114, GK116, GK117
Models	GeForce Series, GeForce GT Series, GeForce GTX Series
Transistors	292M 40 nm (GF119); 585M 28 nm (GF117); 1,270M 28 nm (GK107); 1,270M 28 nm (GK208); 2,540M 28 nm (GK106); 3,540M 28 nm (GK104); 7,080M 28 nm (GK110)
API support
DirectX	Direct3D 11.0, Shader Model 5.0
OpenCL	OpenCL 1.2
OpenGL	OpenGL 4.3
History
Predecessor	GeForce 600 Series
Successor	GeForce 800 Series

The GeForce 700 Series will be a family of graphics processing units developed by Nvidia, used in desktop and laptop PCs. It will serve as the introduction for the Kepler Refresh architecture (GK-codenamed chips), named after the German mathematician, astronomer, and astrologer Johannes Kepler. A number of GeForce 700 series chips were released for mobile devices in April 2013.

Overview

With GK110, Nvidia focuses on compute performance. With 7.1 billion transistors it is the largest GPU by transistor count, dwarfing both GK104 and GF110. GK110 is a very large chip from a fabrication and power consumption standpoint, but the cited AnandTech preview describes its performance per watt as unmatched, because so many tasks (graphical and compute) are massively parallel and map well to the large arrays of streaming processors found in GK110.

GK110 increases both the capacity and the bandwidth of the register file and the L2 cache. At the SMX level, the register file has grown to 256 KB, made up of 65K 32-bit registers, which is larger than the register file of a Fermi SM. The L2 cache has grown to 1.5 MB, twice the size of the L2 cache in GF110. Both L2 cache and register file bandwidth have also doubled. Performance in register-starved scenarios improves as well, because more registers are available to each thread. This goes hand in hand with an increase in the number of registers each thread can address, from 63 registers per thread to 255 with GK110.
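
The register figures above can be checked with a little arithmetic. The following is a minimal sketch, assuming that "65K" means 65,536 registers and that each 32-bit register occupies 4 bytes; the function names are hypothetical, and the thread counts ignore allocation granularity and any hardware limit on resident threads.

```python
# Assumption: "65K" registers is read as 65,536, and a 32-bit register is 4 bytes.
REGISTERS_PER_SMX = 65_536
BYTES_PER_REGISTER = 4


def register_file_kib(registers=REGISTERS_PER_SMX,
                      bytes_per_register=BYTES_PER_REGISTER):
    """Size of the register file in KiB."""
    return registers * bytes_per_register // 1024


def threads_at_register_limit(registers_per_thread,
                              registers=REGISTERS_PER_SMX):
    """How many threads fit if every thread uses the given number of registers."""
    return registers // registers_per_thread


if __name__ == "__main__":
    print("register file:", register_file_kib(), "KiB")
    print("threads at 63 registers each:", threads_at_register_limit(63))
    print("threads at 255 registers each:", threads_at_register_limit(255))
```

The sketch shows the trade-off behind the higher per-thread limit: a thread that uses all 255 addressable registers leaves room for far fewer concurrent threads on the same SMX.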

With GK110, Nvidia also reworked the GPU texture cache so that it can be used for compute. For compute workloads the 48 KB texture cache acts as a read-only cache that specializes in unaligned memory accesses. Error detection capabilities have also been added to make it safer for workloads that rely on ECC.

Features

The GeForce 700 Series contains features from both GK104 and GK110. Kepler-based members of the 700 series add the following standard features to the GeForce family.

Derived from GK104:

PCI Express 3.0 interface
DisplayPort 1.2
HDMI 1.4a 4K x 2K video output
PureVideo VP5 hardware video acceleration (up to 4K x 2K H.264 decode)
Hardware H.264 encoding acceleration block (NVENC)
Support for up to 4 independent 2D displays, or 3 stereoscopic/3D displays (NV Surround)
Bindless Textures
GPU Boost
TXAA
Manufactured by TSMC on a 28 nm process

New features from GK110:

Compute Focus SMX Improvement
CUDA Compute Capability 3.5
New Shuffle Instructions
Dynamic Parallelism
Hyper-Q (Hyper-Q's MPI functionality is reserved for Tesla only)
Grid Management Unit
NVIDIA GPUDirect (GPUDirect's RDMA functionality is reserved for Tesla only)

Compute Focus SMX Improvement

With GK110, Nvidia opted to increase compute performance. The single biggest change from GK104 is that rather than 8 dedicated FP64 CUDA cores per SMX, GK110 has up to 64, giving it 8x the FP64 throughput of a GK104 SMX. The SMX register file has also grown, to 256 KB, which is larger than Fermi's. The texture cache is improved as well: its 48 KB can serve as a read-only cache for compute workloads.

New Shuffle Instructions

At a low level, GK110 adds instructions and operations to further improve performance. New shuffle instructions allow threads within a warp to share data without going back to memory, making the process much quicker than the previous load/share/store method. Atomic operations have also been overhauled: they execute faster, and some operations that were previously available only for FP32 data have been added for FP64.
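
The data exchange that shuffle instructions make possible can be illustrated with a small simulation. This is only a sketch in plain Python, not GPU code: it models a 32-lane warp as a list and mimics the "shuffle down" pattern (as exposed in CUDA by the `__shfl_down` intrinsic) to sum the values held by all lanes without any shared memory buffer. The function names are hypothetical.

```python
WARP_SIZE = 32  # number of threads (lanes) in a warp


def shuffle_down(lanes, offset):
    """Lane i receives the value held by lane i + offset.

    Lanes whose source would fall outside the warp keep their own value.
    """
    n = len(lanes)
    return [lanes[i + offset] if i + offset < n else lanes[i] for i in range(n)]


def warp_reduce_sum(values):
    """Sum one value per lane using log2(32) = 5 shuffle steps."""
    if len(values) != WARP_SIZE:
        raise ValueError("expected exactly one value per lane")
    lanes = list(values)
    offset = WARP_SIZE // 2
    while offset > 0:
        received = shuffle_down(lanes, offset)
        # Every lane adds what it received to what it already holds.
        lanes = [own + other for own, other in zip(lanes, received)]
        offset //= 2
    # After the last step, lane 0 holds the sum of all 32 inputs.
    return lanes[0]


if __name__ == "__main__":
    print(warp_reduce_sum(list(range(WARP_SIZE))))
```

Only lane 0 is guaranteed to hold the full sum at the end; the higher lanes hold partial results, which is why such reductions conventionally read the result from the first lane.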

Hyper-Q

Hyper-Q expands the number of GK110 hardware work queues from 1 to 32. With a single work queue, Fermi could be under-occupied at times because there was not enough work in that queue to fill every SM. With 32 work queues, GK110 can in many scenarios achieve higher utilization by placing different task streams on what would otherwise be an idle SMX. Hyper-Q also maps easily to MPI, a message passing interface frequently used in HPC. Legacy MPI-based algorithms that were originally designed for multi-CPU systems, and that became bottlenecked by false dependencies, now have a solution: by increasing the number of MPI jobs, Hyper-Q can improve the efficiency of these algorithms without any change to the code itself.
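
The effect of having more work queues can be sketched with a toy scheduling model. This is not a description of the hardware: it assumes that every kernel runs for exactly one time step, that a queue can dispatch only the kernel at its head, and that streams are spread round-robin over the available queues. The function name, the SMX count of 12, and the workload are all illustrative assumptions.

```python
from collections import deque


def time_steps(streams, num_queues, num_smx=12):
    """Time steps needed to drain all streams in the toy model.

    streams: list of streams, each a list of per-kernel SMX demands.
    num_queues: number of hardware work queues (1 for Fermi, 32 for GK110).
    num_smx: SMX units available per time step (illustrative value).
    """
    if any(demand > num_smx for stream in streams for demand in stream):
        raise ValueError("a kernel requests more SMX units than exist")
    queues = [deque() for _ in range(num_queues)]
    for index, stream in enumerate(streams):
        queues[index % num_queues].extend(stream)

    steps = 0
    while any(queues):
        free = num_smx
        for queue in queues:
            # Only the head of each queue is eligible in a given step.
            if queue and queue[0] <= free:
                free -= queue.popleft()
        steps += 1
    return steps


if __name__ == "__main__":
    # Eight independent streams of four small kernels, one SMX each.
    workload = [[1, 1, 1, 1] for _ in range(8)]
    print("1 queue:   ", time_steps(workload, num_queues=1))
    print("32 queues: ", time_steps(workload, num_queues=32))
```

In this model the single queue serializes kernels from unrelated streams, which is the kind of false dependency described above, while separate queues let independent streams share the SMX units in the same step.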

Dynamic Parallelism

Dynamic Parallelism is the ability of kernels to dispatch other kernels. With Fermi, only the CPU could dispatch a kernel, which incurs a certain amount of overhead from having to communicate back to the CPU. By giving kernels the ability to dispatch their own child kernels, GK110 both saves time by not going back to the CPU and frees the CPU to work on other tasks.
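
The saving can be made concrete by counting how many kernel launches the CPU has to issue. The sketch below assumes a hypothetical workload shaped like a tree, in which every kernel above the last level spawns a fixed number of child kernels; the function names and the tree shape are illustrative assumptions, not part of any Nvidia API.

```python
def kernels_in_tree(depth, fanout):
    """Total kernels in a launch tree with `depth` levels and `fanout` children per kernel."""
    return sum(fanout ** level for level in range(depth))


def cpu_issued_launches(depth, fanout, dynamic_parallelism):
    """Number of launches the CPU itself must issue.

    Without dynamic parallelism every kernel in the tree is launched by the CPU.
    With it, the CPU launches only the root and kernels launch their own children.
    """
    if depth < 1:
        return 0
    return 1 if dynamic_parallelism else kernels_in_tree(depth, fanout)


if __name__ == "__main__":
    for dp in (False, True):
        count = cpu_issued_launches(depth=3, fanout=4, dynamic_parallelism=dp)
        print("dynamic parallelism" if dp else "CPU-only dispatch", "->", count)
```

The total amount of GPU work is the same in both cases; what changes is how often control has to return to the CPU.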

Grid Management Unit

Enabling Dynamic Parallelism requires a new grid management and dispatch control system. The new Grid Management Unit (GMU) manages and prioritizes grids to be executed. The GMU can pause the dispatch of new grids and queue pending and suspended grids until they are ready to execute, providing the flexibility to enable powerful runtimes, such as Dynamic Parallelism. The CUDA Work Distributor in Kepler holds grids that are ready to dispatch, and is able to dispatch 32 active grids, which is double the capacity of the Fermi CWD. The Kepler CWD communicates with the GMU via a bidirectional link that allows the GMU to pause the dispatch of new grids and to hold pending and suspended grids until needed. The GMU also has a direct connection to the Kepler SMX units to permit grids that launch additional work on the GPU via Dynamic Parallelism to send the new work back to GMU to be prioritized and dispatched. If the kernel that dispatched the additional workload pauses, the GMU will hold it inactive until the dependent work has completed.

NVIDIA GPUDirect

NVIDIA GPUDirect™ is a capability that enables GPUs within a single computer, or GPUs in different servers located across a network, to directly exchange data without needing to go to CPU/system memory. The RDMA feature in GPUDirect allows third party devices such as SSDs, NICs, and IB adapters to directly access memory on multiple GPUs within the same system, significantly decreasing the latency of MPI send and receive messages to/from GPU memory. It also reduces demands on system memory bandwidth and frees the GPU DMA engines for use by other CUDA tasks. Kepler GK110 also supports other GPUDirect features including Peer‐to‐Peer and GPUDirect for Video.

Products

GeForce 700 (7xx) series

This section lists the desktop members of the GeForce 700 series. Processing power is obtained by multiplying the shader clock speed, the number of cores, and the number of instructions each core can perform per cycle.

Config core: shader processors (unified shaders: vertex shader / geometry shader / pixel shader) : texture mapping units (TMUs) : render output units

Model	Launch	Code name	Fab (nm)	Bus interface	Memory (MiB)	Config core	Core clock (MHz)	Shader clock (MHz)	Memory clock (MT/s)	Pixel fillrate (GP/s)	Texture fillrate (GT/s)	Memory bandwidth (GB/s)	Memory bus type	Memory bus width (bit)	DirectX	OpenGL	Processing Power (GFLOPS)	TDP (watts)	Release Price (USD)
GeForce GTX 770	May 30, 2013	GK104-425-A2	28	PCIe 3.0 x16	2048	1536:128:32	1058	1058	1753 (7012)	?	?	?	GDDR5	256	11.0	4.3	?	195
GeForce GTX 780	May 23, 2013	GK110-300-A1	28	PCIe 3.0 x16	3072	2304:192:48	863	863	1502 (6008)	?	?	?	GDDR5	384	11.0	4.3	?	250
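
The memory bandwidth column is still unknown for both desktop cards. As a rough guide, bandwidth is conventionally estimated as the effective transfer rate multiplied by the bus width in bytes. The sketch below applies that relation to the effective rates given in parentheses in the table; the function name is hypothetical and the results are estimates, not published specifications.

```python
def memory_bandwidth_gb_s(transfer_rate_mt_s, bus_width_bits):
    """Estimated bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mt_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    # (model, effective memory rate in MT/s, bus width in bits), taken from the table
    cards = [
        ("GeForce GTX 770", 7012, 256),
        ("GeForce GTX 780", 6008, 384),
    ]
    for model, rate, width in cards:
        print(f"{model}: {memory_bandwidth_gb_s(rate, width):.1f} GB/s (estimate)")
```

The same relation reproduces the bandwidth figures that are filled in for the notebook parts, for example 1800 MT/s on a 64-bit bus giving 14.4 GB/s.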

GeForce 700M (7xxM) series

This section lists the notebook members of the GeForce 700M series. Processing power is obtained by multiplying the shader clock speed, the number of cores, and the number of instructions each core can perform per cycle.

Config core: unified shaders (vertex shader / geometry shader / pixel shader) : texture mapping units : render output units

Model	Launch	Code name	Fab (nm)	Bus interface	Memory (MiB)	Config core	Core clock (MHz)	Shader clock (MHz)	Memory clock (MT/s)	Pixel fillrate (GP/s)	Texture fillrate (GT/s)	Memory bandwidth (GB/s)	Memory bus type	Memory bus width (bit)	DirectX	OpenGL	Processing Power (GFLOPS)	TDP (watts)	Notes
GeForce 705M	April 2013	GF119	40	PCIe 2.0 x16	1024	48:8:4	?	?	?	?	?	?	DDR3	64	11.0	4.3	?	?	OEM
GeForce 710M	Jan 2013	GF117	28	PCIe 2.0 x16	1024 2048	96:16:4	475	950	1800	1.9	7.6	14.4	DDR3	64	11.0	4.3	182.40	12	OEM
GeForce GT 720M	April 1, 2013	GF117	28	PCIe 2.0 x16	?	96:16:4	?	?	?	?	?	?	DDR3	64	11.0	4.3	?	?
GeForce GT 730M	April 1, 2013	GK208	28	PCIe 3.0 x16	?	192:16:16	?	?	?	?	?	?	DDR3	128	11.0	4.3	?	?
GeForce GT 730M	Jan 2013	GK107	28	PCIe 3.0 x16	2048	384:32:16	725	725	1800	5.8	23.2	28.8	DDR3	128	11.0	4.3	556.8	?
GeForce GT 735M	April 1, 2013	GK208	28	PCIe 3.0 x16	?	192:16:16	?	?	?	?	?	?	DDR3	128	11.0	4.3	?	?
GeForce GT 740M	April 1, 2013	GK208	28	PCIe 3.0 x16	?	192:16:16	?	?	?	?	?	?	DDR3	128	11.0	4.3	?	?
GeForce GT 740M	April 1, 2013	GK107	28	PCIe 3.0 x16	?	384:32:16	?	?	?	?	?	?	GDDR5	128	11.0	4.3	?	?
GeForce GT 745M	April 1, 2013	GK107	28	PCIe 3.0 x16	?	384:32:16	?	?	?	?	?	?	DDR3/GDDR5	128	11.0	4.3	?	?
GeForce GT 750M	April 1, 2013	GK107	28	PCIe 3.0 x16	?	384:32:16	967	967	2000 5000	?	?	32 80	DDR3/GDDR5	128	11.0	4.3	742.7	?
GeForce GTX 760M	May 2013	GK106	28	PCIe 3.0 x16	?	768:64:24	?	?	?	?	?	?	GDDR5	128	11.0	4.3	?	?
GeForce GTX 765M	May 2013	GK106	28	PCIe 3.0 x16	?	768:64:24	?	?	?	?	?	?	GDDR5	128	11.0	4.3	?	?
GeForce GTX 770M	May 2013	GK104	28	PCIe 3.0 x16	3072	960:80:24	600	600	2800	14.4	48.0	67.2	GDDR5	192	11.0	4.3	1152	75
GeForce GTX 780M	May 2013	GK104	28	PCIe 3.0 x16	4096	1536:128:32	720	720	5000	23	92.2	160	GDDR5	256	11.0	4.3	2234.3	100+
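
The processing power formula stated for this table can be checked against the rows that have complete data. The sketch below assumes two floating-point operations per core per cycle (a fused multiply-add counted as two), which is an assumption not stated in the table; the function name is hypothetical.

```python
def processing_power_gflops(cores, shader_clock_mhz, ops_per_cycle=2):
    """Shader clock x core count x operations per cycle, expressed in GFLOPS."""
    return cores * shader_clock_mhz * ops_per_cycle / 1000


if __name__ == "__main__":
    # (model, shader cores, shader clock in MHz, GFLOPS as listed in the table)
    rows = [
        ("GeForce 710M", 96, 950, 182.40),
        ("GeForce GT 730M (GK107)", 384, 725, 556.8),
        ("GeForce GT 750M", 384, 967, 742.7),
        ("GeForce GTX 770M", 960, 600, 1152),
        ("GeForce GTX 780M", 1536, 720, 2234.3),
    ]
    for model, cores, clock, listed in rows:
        computed = processing_power_gflops(cores, clock)
        verdict = "matches" if abs(computed - listed) < 0.05 else "differs"
        print(f"{model}: computed {computed:.1f}, listed {listed} ({verdict})")
```

Under this assumption the first four rows agree with the table. The GeForce GTX 780M entry does not follow from its listed 720 MHz clock, so that figure may have been derived from a different clock speed.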

Chipset table

Main article: Comparison table of GeForce 700 Series

References

1. "NVIDIA Launches Tesla K20 & K20X: GK110 Arrives At Last". AnandTech. November 12, 2012. http://www.anandtech.com/show/6446/nvidia-launches-tesla-k20-k20x-gk110-arrives-at-last/3
2. "NVIDIA-Kepler-GK110-Architecture-Whitepaper" (PDF). http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
