Problem description
I discovered that numpy.sin behaves differently depending on whether the argument array has <= 8192 elements or > 8192 elements. Both the performance and the returned values differ. Can someone explain this effect?
For example, let's calculate sin(pi/4):
    import numpy as np

    x = np.pi*0.25
    for n in range(8191, 8195):
        xx = np.repeat(x, n)      # array of n copies of pi/4
        %timeit np.sin(xx)        # IPython magic; run in IPython or Jupyter
        print(n, np.sin(xx)[0])
    64.7 μs ± 194 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    8191 0.7071067811865476
    64.6 μs ± 166 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    8192 0.7071067811865476
    20.1 μs ± 189 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    8193 0.7071067811865475
    21.8 μs ± 13.4 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    8194 0.7071067811865475
After crossing the 8192-element limit, the calculation becomes more than 3 times faster and gives a different result: the last digit becomes 5 instead of 6.
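To put the size of that discrepancy in perspective, here is a minimal sketch that counts how many representable doubles separate the two printed values. The helper name `ulp_distance` is made up for this illustration; it relies only on the standard `struct` module and assumes both inputs are finite and have the same sign.

```python
import struct

def ulp_distance(a: float, b: float) -> int:
    """Count the ULP steps between two finite doubles of the same sign."""
    # Reinterpret each IEEE 754 double as a signed 64-bit integer. For
    # same-sign finite values, consecutive doubles map to consecutive integers.
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)

small_array_result = 0.7071067811865476  # printed for n <= 8192
large_array_result = 0.7071067811865475  # printed for n > 8192
print(ulp_distance(small_array_result, large_array_result))  # 1
```

So the two results are neighbouring doubles: the code paths disagree only in how the last bit is rounded.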
When I calculated the same value in other ways, I obtained:
- C++ std::sin (Visual Studio 2017, Win32 platform) gives 0.7071067811865475;
- C++ std::sin (Visual Studio 2017, x64 platform) gives 0.70710678118654756;
- math.sin gives 0.7071067811865476, which is logical because I used 64-bit Python.
I couldn't find any explanation in the NumPy documentation or in its code.
Update #2: It is hard to believe, but replacing sin with sqrt gives this:
    44.2 μs ± 751 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    8191 0.8862269254527579
    44.1 μs ± 543 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    8192 0.8862269254527579
    10.3 μs ± 105 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    8193 0.886226925452758
    10.4 μs ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    8194 0.886226925452758
Update: np.show_config() output:
    mkl_info:
        libraries = ['mkl_rt']
        library_dirs = ['C:/GNU/Anaconda3\Library\lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\include', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\lib', 'C:/GNU/Anaconda3\Library\include']
    blas_mkl_info:
        libraries = ['mkl_rt']
        library_dirs = ['C:/GNU/Anaconda3\Library\lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\include', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\lib', 'C:/GNU/Anaconda3\Library\include']
    blas_opt_info:
        libraries = ['mkl_rt']
        library_dirs = ['C:/GNU/Anaconda3\Library\lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\include', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\lib', 'C:/GNU/Anaconda3\Library\include']
    lapack_mkl_info:
        libraries = ['mkl_rt']
        library_dirs = ['C:/GNU/Anaconda3\Library\lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\include', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\lib', 'C:/GNU/Anaconda3\Library\include']
    lapack_opt_info:
        libraries = ['mkl_rt']
        library_dirs = ['C:/GNU/Anaconda3\Library\lib']
        define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
        include_dirs = ['C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\include', 'C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2019.0.117\windows\mkl\lib', 'C:/GNU/Anaconda3\Library\include']
Recommended answer
As @WarrenWeckesser wrote, "it's almost certainly an Anaconda & Intel MKL issue; cf. http://www.51sjk.com/Upload/Articles/1/0/336/336472_20221114145332030.jpg and http://www.51sjk.com/Upload/Articles/1/0/336/336472_20221114145333355.jpg".
Unfortunately, the only way to solve the issue under Windows is to uninstall Anaconda and use another distribution with an MKL-free numpy. I used python-3.6.6-amd64 from http://www.51sjk.com/Upload/Articles/1/0/336/336472_20221114145355560.jpg and installed everything else via pip, including numpy 1.14.5. I even managed to get Spyder working (I had to downgrade PyQt5 to 5.11.3, because Spyder refused to launch with PyQt5 >= 5.12).
Now np.sin(xx) is consistently 0.7071067811865476 (67.1 μs at n = 8192) and np.sqrt(xx) is consistently 0.8862269254527579 (16.4 μs). A bit slower, but perfectly reproducible.
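To check whether a given installation shows the size-dependent behaviour, one option is to scan a range of array lengths and record where the first element of the result changes. The following is only a sketch: `find_result_changes` is a hypothetical helper written for this post, not part of NumPy, and the range of sizes is an arbitrary choice around the 8192 boundary reported in the question.

```python
import numpy as np

def find_result_changes(func, x, sizes):
    """Return the sizes at which func(np.repeat(x, n))[0] differs from the previous size."""
    changes = []
    previous = None
    for n in sizes:
        value = float(func(np.repeat(x, n))[0])
        if previous is not None and value != previous:
            changes.append(n)
        previous = value
    return changes

x = np.pi * 0.25
for name, func in (("sin", np.sin), ("sqrt", np.sqrt)):
    # An empty list means the first element was identical for every size tried.
    print(name, find_result_changes(func, x, range(8185, 8200)))
```

With the MKL-based build from the question, one would expect 8193 to be reported for both functions; with the pip-installed numpy described above, both lists should be empty.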