PrimeGrid

Message boards : Number crunching : Cache limitations on Intel X computers

Nick (Project donor)
Joined: 11 Jul 11
Posts: 840
ID: 105020
Credit: 1,195,431,212
RAC: 1,663,854
Message 143307 - Posted: 14 Sep 2020 | 18:22:17 UTC
Last modified: 14 Sep 2020 | 18:39:34 UTC

Recently I discovered that running 2 x Woo 7-thread tasks on a 9960X was slow.
Running 2 x Woo 8-thread tasks was 30% faster.

2 x 7-thread tasks: L2 cache in use 14 MB, L3 cache 22 MB
2 x 8-thread tasks: L2 cache in use 16 MB, L3 cache 22 MB

The Woo tasks had an FFT size of 1960K. At 8 bytes per FFT element, two tasks need:
2 tasks x 8 bytes x 1960K = 31.36 MB

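I am testing the theory that, for Intel X computers, the number to compare against is 2 x the L2 cache in use (as set by the number of threads per task). With 8-thread tasks that is 2 x 16 MB = 32 MB, just above the 31.36 MB of FFT data; with 7-thread tasks it is 2 x 14 MB = 28 MB, which falls short.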
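The arithmetic behind this theory can be written out as a small sketch. It assumes 8 bytes per FFT element, 1 MB of L2 per core on the 9960X, and one thread per physical core; the function names are made up for illustration, and the "2 x L2 in use" rule is only the hypothesis being tested here, not an established rule.

```python
# Sketch of the "2 x L2 in use" rule of thumb. This is a hypothesis from the
# post, not an established rule. All names here are hypothetical.

L2_PER_CORE_MB = 1.0        # assumption: 1 MB of L2 per core on the i9-9960X
BYTES_PER_FFT_ELEMENT = 8   # assumption: one double-precision value per element


def fft_data_mb(fft_k, tasks):
    """FFT data for all tasks, in MB (using 1 MB = 1000 KB, as in the post)."""
    return tasks * fft_k * BYTES_PER_FFT_ELEMENT / 1000


def l2_in_use_mb(threads_per_task, tasks):
    """L2 belonging to the cores in use, assuming one thread per physical core."""
    return threads_per_task * tasks * L2_PER_CORE_MB


def fits_two_x_l2(fft_k, threads_per_task, tasks=2):
    """True if 2 x (L2 in use) is at least the total FFT data."""
    return 2 * l2_in_use_mb(threads_per_task, tasks) >= fft_data_mb(fft_k, tasks)


if __name__ == "__main__":
    for threads in (7, 8):
        print(f"{threads} threads/task: "
              f"2 x L2 = {2 * l2_in_use_mb(threads, 2):.0f} MB, "
              f"FFT data = {fft_data_mb(1960, 2):.2f} MB, "
              f"fits = {fits_two_x_l2(1960, threads)}")
```

Changing the FFT size or thread count lets you see where the rule would predict the switch from slow to fast; whether real tasks follow it is exactly what is being tested.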

Edit: I should have compared (as an example) a single SoB task at 10 threads versus 12 threads before making this post.

mackerel (Project donor, Volunteer tester)
Joined: 2 Oct 08
Posts: 2420
ID: 29980
Credit: 418,021,017
RAC: 167,223
Message 143312 - Posted: 14 Sep 2020 | 23:04:08 UTC - in response to Message 143307.

They have what Intel calls a non-inclusive cache: data is not duplicated between L2 and L3 as it is on consumer models, so in effect you get to use both L2 and L3. I've not tested it in depth, but my assumption is that the cache amount for comparison should be the total L2 + L3 amount.

This may also apply to AMD Ryzen CPUs, since they use an exclusive cache, but their L2 is far smaller and, IMO, not significant enough to bother with, even before we get to CCX complications. No, I don't know what the difference between exclusive and non-inclusive caches is.
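As a rough illustration, the "total L2 + L3" rule can be applied to the numbers from the first post. This sketch assumes 1 MB of L2 per core with one thread per core, the 22 MB L3 quoted in the first post, and 8 bytes per FFT element; the function names are hypothetical, and the rule itself is the untested assumption stated above.

```python
# Sketch of the "total L2 + L3" rule, applied to the numbers from the first
# post. Untested assumption; all names here are hypothetical.

L2_PER_CORE_MB = 1.0        # assumption: 1 MB of L2 per core, one thread per core
L3_MB = 22.0                # shared L3 of the i9-9960X, as quoted in the first post
BYTES_PER_FFT_ELEMENT = 8   # assumption: one double-precision value per element


def fft_data_mb(fft_k, tasks):
    """FFT data for all tasks, in MB (1 MB = 1000 KB)."""
    return tasks * fft_k * BYTES_PER_FFT_ELEMENT / 1000


def l2_plus_l3_mb(threads_per_task, tasks):
    """Non-inclusive budget: private L2 of the cores in use plus the shared L3."""
    return threads_per_task * tasks * L2_PER_CORE_MB + L3_MB


def fits_l2_plus_l3(fft_k, threads_per_task, tasks=2):
    """True if L2 in use plus L3 is at least the total FFT data."""
    return l2_plus_l3_mb(threads_per_task, tasks) >= fft_data_mb(fft_k, tasks)


if __name__ == "__main__":
    for threads in (7, 8):
        print(f"{threads} threads/task: L2 + L3 = {l2_plus_l3_mb(threads, 2):.0f} MB, "
              f"FFT data = {fft_data_mb(1960, 2):.2f} MB, "
              f"fits = {fits_l2_plus_l3(1960, threads)}")
```

On these assumptions both configurations come out as fitting (36 MB and 38 MB against 31.36 MB), so this budget on its own would not separate the 7-thread and 8-thread results.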

Grebuloner (Project donor, Volunteer tester)
Joined: 2 Nov 09
Posts: 348
ID: 49572
Credit: 1,682,867,886
RAC: 1,235,461
Message 143314 - Posted: 15 Sep 2020 | 4:27:13 UTC - in response to Message 143312.

mackerel wrote: "No, I don't know what the difference between exclusive and non-inclusive caches is."


"An inclusive cache contains everything in the cache underneath it and has to be at least the same size as the cache underneath (and usually a lot bigger), compared to an exclusive cache which has none of the data in the cache underneath it. The benefit of an inclusive cache means that if a line in the lower cache is removed due it being old for other data, there should still be a copy in the cache above it which can be called upon. The downside is that the cache above it has to be huge – with Skylake-S we have a 256KB L2 and a 2.5MB/core L3, meaning that the L2 data could be replaced 10 times before a line is evicted from the L3.

A non-inclusive cache is somewhat between the two, and is different to an exclusive cache: in this context, when a data line is present in the L2, it does not immediately go into L3. If the value in L2 is modified or evicted, the data then moves into L3, storing an older copy. (The reason it is not called an exclusive cache is because the data can be re-read from L3 to L2 and still remain in the L3). This is what we usually call a victim cache, depending on if the core can prefetch data into L2 only or L2 and L3 as required. In this case, we believe the SKL-SP core cannot prefetch into L3, making the L3 a victim cache similar to what we see on Zen, or Intel’s first eDRAM parts on Broadwell. Victim caches usually have limited roles, especially when they are similar in size to the cache below it (if a line is evicted from a large L2, what are the chances you’ll need it again so soon), but some workloads that require a large reuse of recent data that spills out of L2 will see some benefit."

(Source: AnandTech article)
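To see why a victim-style L3 can make L2 + L3 usable while an inclusive L3 caps you at the L3 size, here is a toy simulation. It is a sketch under heavy assumptions: sizes are in abstract cache lines (16 and 22 only loosely echo the 9960X), both levels use plain LRU replacement (real Intel caches use other policies), the access pattern is a repeated cyclic sweep (a crude stand-in for an FFT), and the victim model moves a line out of L3 when it is re-read (exclusive behaviour). The non-inclusive design in the quote can keep a copy in L3 on a re-read, so real usable capacity may fall somewhat short of the full L2 + L3. All names are hypothetical.

```python
from collections import OrderedDict


class LRU:
    """Fixed-capacity set of cache lines with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def __contains__(self, line):
        return line in self.lines

    def touch(self, line):
        """Mark a line that is already present as most recently used."""
        self.lines.move_to_end(line)

    def insert(self, line):
        """Insert a line as most recently used; return the evicted line or None."""
        self.lines[line] = None
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            evicted, _ = self.lines.popitem(last=False)
            return evicted
        return None

    def remove(self, line):
        del self.lines[line]


def inclusive_hits(trace, l2_size, l3_size):
    """Inclusive L3: every line held in L2 is also kept in L3."""
    l2, l3 = LRU(l2_size), LRU(l3_size)
    hits = []
    for line in trace:
        hits.append(line in l2 or line in l3)
        evicted = l3.insert(line)          # every access also lands in L3
        if evicted is not None and evicted in l2:
            l2.remove(evicted)             # back-invalidate to keep L2 inside L3
        l2.insert(line)                    # L2 victims are dropped: L3 still has them
    return hits


def victim_hits(trace, l2_size, l3_size):
    """Idealised victim L3: filled only by lines evicted from L2."""
    l2, l3 = LRU(l2_size), LRU(l3_size)
    hits = []
    for line in trace:
        if line in l2:
            l2.touch(line)
            hits.append(True)
            continue
        hit = line in l3
        if hit:
            l3.remove(line)                # moved up to L2, not copied (exclusive)
        hits.append(hit)
        evicted = l2.insert(line)          # new data is filled into L2 only
        if evicted is not None:
            l3.insert(evicted)             # the L2 victim drops into L3
    return hits


def steady_state_hit_rate(model, working_set, l2_size, l3_size, passes=4):
    """Hit rate on the last of several cyclic sweeps over `working_set` lines."""
    trace = list(range(working_set)) * passes
    hits = model(trace, l2_size, l3_size)
    return sum(hits[-working_set:]) / working_set


if __name__ == "__main__":
    for ws in (20, 32, 40):
        print(ws,
              steady_state_hit_rate(inclusive_hits, ws, 16, 22),
              steady_state_hit_rate(victim_hits, ws, 16, 22))
```

In this model a working set of 32 lines thrashes the inclusive hierarchy (it exceeds the 22-line L3) but fits the victim hierarchy (16 + 22 = 38 lines); at 40 lines both thrash. The sharp cliff is an artifact of LRU with a cyclic sweep, so real hardware should degrade more gradually.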

I'm sure an extensive set of tests could find the total usable amount, but it sounds like the total amount of unique data held in cache at any moment depends on how the scheduler sees the software (or how the software sees the scheduler?).
____________
Eating more cheese on Thursdays.

Copyright © 2005 - 2020 Rytis Slatkevičius and PrimeGrid community.
Generated 1 Nov 2020 | 2:53:19 UTC