From lixia789 at gmail.com  Thu Aug 19 15:02:28 2010
From: lixia789 at gmail.com (Lixia Liu)
Date: Thu, 19 Aug 2010 18:02:28 -0400
Subject: [PTLsim-devel] Cause of inaccurate simulated cycle number?
Message-ID: <013f01cb3fea$376d2cb0$a6478610$@com>

Hello.

I am using PTLsim (userspace) to simulate sequential code on an Intel multicore processor (Q6600). I found that it sometimes reports more than twice as many cycles as the hardware performance counters. One example is the loop below.

    int tmp = 1;
    for (int i=0; i<100000; i++)
    {
        tmp = tmp + tmp*2;
    }

The code is compiled with G++ 4.3.4 using the -O3 option. The generated assembly code is quite simple.

    4008a0:   83 c0 01          add    $0x1,%eax
    4008a3:   8d 1c 5b          lea    (%rbx,%rbx,2),%ebx
    4008a6:   3d a0 86 01 00    cmp    $0x186a0,%eax
    4008ab:   75 f3             jne    4008a0

Measured results:

    PTLsim: 400042 instructions and 407505 cycles
    Performance counter by Pfmon: 402333 instructions and 171829 cycles

Does anyone have an idea what causes the problem and how to fix it? Thanks!

Best Regards,
Lixia

From hiyangxi at gmail.com  Thu Aug 19 15:26:57 2010
From: hiyangxi at gmail.com (xi yang)
Date: Fri, 20 Aug 2010 08:26:57 +1000
Subject: [PTLsim-devel] Cause of inaccurate simulated cycle number?
In-Reply-To: <013f01cb3fea$376d2cb0$a6478610$@com>
References: <013f01cb3fea$376d2cb0$a6478610$@com>
Message-ID:

On Fri, Aug 20, 2010 at 8:02 AM, Lixia Liu wrote:
> Hello.
>
> I am using PTLsim (userspace) to simulate sequential code on an Intel
> multicore processor (Q6600). I found that it sometimes reports more than
> twice as many cycles as the hardware performance counters.

So you compared two microarchitectures: Intel Core 2 versus the PTLsim OOO core.

If you run the same program on an AMD K10 and an Intel Core 2, you will get different numbers.
If you run the same program on an Intel Nehalem and an Intel Core 2, you will get different numbers.

Regards.
>
> One example is the loop below.
>
>     int tmp = 1;
>     for (int i=0; i<100000; i++)
>     {
>         tmp = tmp + tmp*2;
>     }
>
> The code is compiled with G++ 4.3.4 using the -O3 option. The generated
> assembly code is quite simple.
>
>     4008a0:   83 c0 01          add    $0x1,%eax
>     4008a3:   8d 1c 5b          lea    (%rbx,%rbx,2),%ebx
>     4008a6:   3d a0 86 01 00    cmp    $0x186a0,%eax
>     4008ab:   75 f3             jne    4008a0
>
> Measured results:
>
>     PTLsim: 400042 instructions and 407505 cycles
>     Performance counter by Pfmon: 402333 instructions and 171829 cycles
>
> Does anyone have an idea what causes the problem and how to fix it? Thanks!
>
> Best Regards,
> Lixia
>
> _______________________________________________
> ptlsim-devel mailing list
> ptlsim-devel at ptlsim.org
> http://www.ptlsim.org/mailman/listinfo/ptlsim-devel

From lixia789 at gmail.com  Thu Aug 19 15:36:50 2010
From: lixia789 at gmail.com (Lixia Liu)
Date: Thu, 19 Aug 2010 18:36:50 -0400
Subject: [PTLsim-devel] Cause of inaccurate simulated cycle number?
In-Reply-To:
References: <013f01cb3fea$376d2cb0$a6478610$@com>
Message-ID: <014401cb3fef$046c6f60$0d454e20$@com>

Yes, I do expect to see different numbers from different microarchitectures. But a difference this large should have a reason. In particular, I'd like to see how to extend PTLsim's OOO core to simulate the Intel microarchitecture. Any suggestions on that are appreciated.

Thanks.

-----Original Message-----
From: xi yang [mailto:hiyangxi at gmail.com]
Sent: Thursday, August 19, 2010 6:27 PM
To: Lixia Liu
Cc: ptlsim-devel at ptlsim.org
Subject: Re: [PTLsim-devel] Cause of inaccurate simulated cycle number?

On Fri, Aug 20, 2010 at 8:02 AM, Lixia Liu wrote:
> Hello.
>
> I am using PTLsim (userspace) to simulate sequential code on an Intel
> multicore processor (Q6600). I found that it sometimes reports more than
> twice as many cycles as the hardware performance counters.

So you compared two microarchitectures: Intel Core 2 versus the PTLsim OOO core.

If you run the same program on an AMD K10 and an Intel Core 2, you will get different numbers.
If you run the same program on an Intel Nehalem and an Intel Core 2, you will get different numbers.

Regards.

>
> One example is the loop below.
>
>     int tmp = 1;
>     for (int i=0; i<100000; i++)
>     {
>         tmp = tmp + tmp*2;
>     }
>
> The code is compiled with G++ 4.3.4 using the -O3 option. The generated
> assembly code is quite simple.
>
>     4008a0:   83 c0 01          add    $0x1,%eax
>     4008a3:   8d 1c 5b          lea    (%rbx,%rbx,2),%ebx
>     4008a6:   3d a0 86 01 00    cmp    $0x186a0,%eax
>     4008ab:   75 f3             jne    4008a0
>
> Measured results:
>
>     PTLsim: 400042 instructions and 407505 cycles
>     Performance counter by Pfmon: 402333 instructions and 171829 cycles
>
> Does anyone have an idea what causes the problem and how to fix it? Thanks!
>
> Best Regards,
> Lixia