posted on 2024-11-02
The official Mamba code is at https://github.com/state-spaces/mamba . Building on the earlier post "Summary of Mamba environment installation pitfalls and solutions", this guide installs the Mamba environment without bypassing selective_scan_cuda, so the same speed as on Linux can be achieved.
conda create -n mamba python=3.10
conda activate mamba
conda install cudatoolkit==11.8
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install setuptools==68.2.2
conda install nvidia/label/cuda-11.8.0::cuda-nvcc_win-64
conda install packaging
pip install triton-2.0.0-cp310-cp310-win_amd64.whl
For details on obtaining triton-2.0.0-cp310-cp310-win_amd64.whl, refer to the original post "Summary of Mamba environment installation pitfalls and solutions".
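Before installing a prebuilt wheel like this one, it helps to confirm its tags match your interpreter. A minimal sketch (the `wheel_matches` helper is mine, not part of any package), using the simplified PEP 427 filename layout {name}-{version}-{python}-{abi}-{platform}.whl:

```python
import sys

def wheel_matches(filename, py_tag=None, plat_tag=None):
    """Check a wheel filename's python/platform tags (simplified PEP 427 parse)."""
    py_tag = py_tag or "cp%d%d" % (sys.version_info.major, sys.version_info.minor)
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")          # [..., python_tag, abi_tag, platform_tag]
    wheel_py, _abi, wheel_plat = parts[-3], parts[-2], parts[-1]
    return wheel_py == py_tag and (plat_tag is None or wheel_plat == plat_tag)

# The triton wheel above targets CPython 3.10 on 64-bit Windows:
print(wheel_matches("triton-2.0.0-cp310-cp310-win_amd64.whl",
                    py_tag="cp310", plat_tag="win_amd64"))   # True
print(wheel_matches("triton-2.0.0-cp310-cp310-win_amd64.whl",
                    py_tag="cp311", plat_tag="win_amd64"))   # False
```

If the tags do not match, pip refuses the wheel with "is not a supported wheel on this platform".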
causal-conv1d
The installation follows the original post "Summary of Mamba environment installation pitfalls and solutions", specifically:
git clone https://github.com/Dao-AILab/causal-conv1d.git
cd causal-conv1d
git checkout v1.1.3 # optional; skip this step to build the latest version
set CAUSAL_CONV1D_FORCE_BUILD=TRUE
pip install .
There is no officially compiled wheel for Windows, so you must build it yourself with the steps above. I have compiled causal_conv1d-1.1.1-cp310-cp310-win_amd64.whl for Windows, which can also be downloaded and installed directly (torch 2.1 only):
pip install causal_conv1d-1.1.1-cp310-cp310-win_amd64.whl
After a successful installation, causal_conv1d_cuda.cp310-win_amd64.pyd is generated in the virtual environment (xxx\conda\envs\xxx\Lib\site-packages\); this file is the compiled causal_conv1d_cuda extension module.
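The CAUSAL_CONV1D_FORCE_BUILD variable works because setup.py reads it and, when set, skips the prebuilt-wheel download path and compiles locally. A sketch of that pattern (not the verbatim source):

```python
import os

def force_build(env=None, var="CAUSAL_CONV1D_FORCE_BUILD"):
    """Mirror the setup.py pattern: build from source iff the env var is "TRUE"."""
    env = os.environ if env is None else env
    return env.get(var, "FALSE") == "TRUE"

print(force_build({"CAUSAL_CONV1D_FORCE_BUILD": "TRUE"}))  # True: compile locally
print(force_build({}))                                     # False: try prebuilt wheels
```

Note the comparison is case-sensitive: the value must be exactly TRUE.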
mamba-ssm
Prepare the environment and download the project files:
git clone https://github.com/state-spaces/mamba.git
cd mamba
git checkout v1.1.3 # optional; skip this step to build the latest version
Note that newer versions of mamba-ssm must be paired with matching versions of causal-conv1d; otherwise their interfaces are incompatible. With this preparation done, proceed to the actual compilation.
Compiling mamba-ssm under Windows
Modify line 41 of setup.py in the mamba source code:
FORCE_BUILD = os.getenv("MAMBA_FORCE_BUILD", "TRUE") == "TRUE"
In csrc/selective_scan/selective_scan_fwd_kernel.cuh, modify the void selective_scan_fwd_launch function:
void selective_scan_fwd_launch(SSMParamsBase &params, cudaStream_t stream) {
    // Only kNRows == 1 is tested for now, which ofc doesn't differ from previously
    // when we had each block processing 1 row.
    static constexpr int kNRows = 1;
    BOOL_SWITCH(params.seqlen % (kNThreads * kNItems) == 0, kIsEvenLen, [&] {
        BOOL_SWITCH(params.is_variable_B, kIsVariableB, [&] {
            BOOL_SWITCH(params.is_variable_C, kIsVariableC, [&] {
                BOOL_SWITCH(params.z_ptr != nullptr, kHasZ, [&] {
                    using Ktraits = Selective_Scan_fwd_kernel_traits<kNThreads, kNItems, kNRows, kIsEvenLen, kIsVariableB, kIsVariableC, kHasZ, input_t, weight_t>;
                    // constexpr int kSmemSize = Ktraits::kSmemSize;
                    static constexpr int kSmemSize = Ktraits::kSmemSize + kNRows * MAX_DSTATE * sizeof(typename Ktraits::scan_t);
                    // printf("smem_size = %d\n", kSmemSize);
                    dim3 grid(params.batch, params.dim / kNRows);
                    auto kernel = &selective_scan_fwd_kernel<Ktraits>;
                    if (kSmemSize >= 48 * 1024) {
                        C10_CUDA_CHECK(cudaFuncSetAttribute(
                            kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, kSmemSize));
                    }
                    kernel<<<grid, Ktraits::kNThreads, kSmemSize, stream>>>(params);
                    C10_CUDA_KERNEL_LAUNCH_CHECK();
                });
            });
        });
    });
}
In csrc/selective_scan/static_switch.h, modify the BOOL_SWITCH macro:
#define BOOL_SWITCH(COND, CONST_NAME, ...) \
[&] { \
if (COND) { \
static constexpr bool CONST_NAME = true; \
return __VA_ARGS__(); \
} else { \
static constexpr bool CONST_NAME = false; \
return __VA_ARGS__(); \
} \
}()
(Both of these edits change constexpr to static constexpr.)
Add the following to both csrc/selective_scan/selective_scan_bwd_kernel.cuh and csrc/selective_scan/selective_scan_fwd_kernel.cuh:
#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074
#endif
pip install .
This generally compiles and installs successfully. Alternatively, install my prebuilt wheel directly:
pip install mamba_ssm-1.1.3-cp310-cp310-win_amd64.whl
Since selective_scan_cuda is not bypassed here, selective_scan_cuda.cp310-win_amd64.pyd is generated in the virtual environment (xxx\conda\envs\xxx\Lib\site-packages\), so execution runs at full speed.
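After installation, a quick smoke test confirms that the compiled extensions import. This only succeeds inside the environment built above; elsewhere it prints the failure reason instead of crashing:

```python
def try_import(mod):
    """Attempt to import a module; report success or the ImportError message."""
    try:
        __import__(mod)
        return f"{mod}: OK"
    except ImportError as e:
        return f"{mod}: FAILED ({e})"

# The three modules this guide builds/installs:
for mod in ("causal_conv1d_cuda", "selective_scan_cuda", "mamba_ssm"):
    print(try_import(mod))
```

If either .pyd module reports FAILED here, see the DLL-load troubleshooting section below.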
Problems and solutions when compiling mamba_ssm under Windows (2024-07-14)
If you compile the source directly with pip install . without making any of the changes above, the following error appears:
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/data/xxx/anaconda3/envs/xxx/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2116, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
This is only the generic error message; it is printed whenever any compilation error occurs. If no specific error appears above it, edit setup.py and change
cmdclass={"bdist_wheel": CachedWheelsCommand, "build_ext": BuildExtension}
to
cmdclass={"bdist_wheel": CachedWheelsCommand, "build_ext": BuildExtension.with_options(use_ninja=False)}
By default, PyTorch uses ninja as its build backend. Disabling it exposes the specific error message, but compilation becomes slower, so change it back once the bug is fixed.
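If you switch back and forth often, one option is to gate the choice on an environment variable so no edit to setup.py is needed each time. A sketch; MAMBA_USE_NINJA is a made-up variable name for illustration, not something mamba defines:

```python
import os

def build_ext_kwargs(env=None):
    """Return kwargs for BuildExtension.with_options; ninja stays on by default."""
    env = os.environ if env is None else env
    return {"use_ninja": env.get("MAMBA_USE_NINJA", "1") == "1"}

print(build_ext_kwargs({"MAMBA_USE_NINJA": "0"}))  # {'use_ninja': False}
print(build_ext_kwargs({}))                        # {'use_ninja': True}
```

In setup.py one would then write BuildExtension.with_options(**build_ext_kwargs()).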
Note: some bloggers change ['ninja','-v'] in the anaconda environment's lib/python3.6/site-packages/torch/utils/cpp_extension.py file to ['ninja','--v'] or ['ninja','--version']. This is the wrong approach: it only hides the symptom and does not fix the root cause.
Under Windows, a large number of errors will appear as follows:
xxx\mamba-1.1.3\csrc\selective_scan\selective_scan_bwd_kernel.cuh(221): error: identifier "M_LOG2E" is undefined
For the reason this happens, refer to the related issue:
Note for the owners: The reason for needing #define is stated here: https://stackoverflow.com/a/56974843:
“On windows it is using the Microsoft compiler for that. So the Microsoft compiler is correct to disallow VLA, and there is no way to avoid this AFAIK. Your code works on linux, because on linux nvcc uses the g++ host compiler, and it allows (in a non-standard-compliant way) the use of a VLA in C++ host code.”
Therefore, just add the following to both csrc/selective_scan/selective_scan_bwd_kernel.cuh and csrc/selective_scan/selective_scan_fwd_kernel.cuh:
#ifndef M_LOG2E
#define M_LOG2E 1.4426950408889634074
#endif
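The constant is the standard math.h value of log2(e); the kernels use it to convert natural logarithms to base 2. A quick numeric check of the identity log2(x) = ln(x) * M_LOG2E:

```python
import math

M_LOG2E = 1.4426950408889634074  # log2(e), same value as the #define above

x = 8.0
print(math.log(x) * M_LOG2E)                              # ~3.0, i.e. log2(8)
print(abs(math.log(x) * M_LOG2E - math.log2(x)) < 1e-12)  # True
```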
Under Windows, a large number of errors will appear as follows:
error C2975: "kIsVariableC_": invalid template argument for "Selective_Scan_bwd_kernel_traits", expected a compile-time constant expression
In csrc/selective_scan/static_switch.h, change constexpr to static constexpr in the BOOL_SWITCH macro (see the related issue). For the specific steps, refer to the previous section.
Under Windows, a large number of errors will appear as follows:
xxx\mamba-1.1.3\csrc\selective_scan\selective_scan_fwd_kernel.cuh(314): error C2975: "kNRows_": invalid template argument for "Selective_Scan_fwd_kernel_traits", expected a compile-time constant expression
In csrc/selective_scan/selective_scan_fwd_kernel.cuh, change constexpr to static constexpr in the void selective_scan_fwd_launch function (see the related issue). For the specific steps, refer to the previous section.
Some students found the following error after installation:
ImportError: DLL load failed while importing causal_conv1d_cuda: The specified procedure could not be found.
or
ImportError: DLL load failed while importing selective_scan_cuda: The specified procedure could not be found.
Although causal_conv1d_cuda.cp310-win_amd64.pyd and selective_scan_cuda.cp310-win_amd64.pyd have been generated in the right locations in the virtual environment, they still cannot be imported and called. Inspecting the dependencies of these two modules shows that both depend heavily on torch-related DLLs, so I suspected a torch version conflict. Uninstalling and reinstalling torch solved the problem. (One student had two different versions of torch installed in the same environment, hence the conflict.)
pip uninstall torch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
Since both wheels I compiled above were built against torch 2.1, the environment you install them into must also run torch 2.1; otherwise this error is raised when functions from the wheel-installed packages are called.
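A small helper can check this before installing the wheels. abi_compatible is my own name for illustration; it simply compares the major.minor part of the torch version string:

```python
def abi_compatible(installed, required="2.1"):
    """True if the installed torch version (e.g. "2.1.1+cu118") shares
    major.minor with the version the wheels were built against."""
    base = installed.split("+")[0]               # drop the local tag, e.g. +cu118
    major_minor = ".".join(base.split(".")[:2])
    return major_minor == required

print(abi_compatible("2.1.1+cu118"))  # True  -- matches the wheels above
print(abi_compatible("2.2.0"))        # False -- will hit the DLL load error
```

In the target environment you would pass torch.__version__ as the installed argument.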
Author: Ineverleft
Link: http://www.pythonblackhole.com/blog/article/245806/283225757dea40ef1250/
Source: python black hole net