Labels: bug (Something isn't working)
Description
Python -VV
-
Pip Freeze
pip freeze | grep mistral
mistral_common==1.5.1
mistral_inference==1.5.0
Reproduction Steps
import logging
import os

from huggingface_hub import snapshot_download

self.model_path = model_path
try:
    from mistral_inference.transformer import Transformer
    from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
except ImportError as err:
    logging.critical('Please install `mistral-inference` and `mistral_common`')
    raise err
if os.path.exists(model_path):
    cache_path = model_path
else:
    # get_cache_path is a project-local helper that resolves a repo id
    # to its local download cache, if any.
    if get_cache_path(model_path) is None:
        snapshot_download(repo_id=model_path)
    cache_path = get_cache_path(self.model_path, repo_type='models')
self.tokenizer = MistralTokenizer.from_file(f'{cache_path}/tekken.json')
model = Transformer.from_folder(cache_path, device='cpu')
model.cuda()
self.model = model
self.max_tokens = 2048
Expected Behavior
- Inference for Pixtral is very slow. Is there a way to specify that flash-attention2 should be used?
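One thing worth checking before the attention backend: the reproduction code loads the model on CPU in full precision and then calls `.cuda()`, which is slow on its own. A minimal sketch of the alternative, assuming `Transformer.from_folder` accepts `device` and `dtype` keyword arguments (verify against your installed `mistral_inference` version); the `build_load_kwargs` helper is hypothetical, introduced here only to show the intended arguments:

```python
def build_load_kwargs(use_gpu: bool = True) -> dict:
    """Hypothetical helper: keyword arguments for Transformer.from_folder.

    Loading directly on the GPU in a 16-bit dtype avoids a slow
    full-precision CPU load followed by a .cuda() transfer.
    """
    if use_gpu:
        # In real code pass torch.bfloat16 rather than the string.
        return {"device": "cuda", "dtype": "bfloat16"}
    return {"device": "cpu", "dtype": "float32"}

# Intended usage (not executed here):
# model = Transformer.from_folder(cache_path, **build_load_kwargs())
```

As far as I can tell, `mistral_inference` routes attention through xformers' memory-efficient kernels when a CUDA-enabled xformers build is installed, so checking that installation may matter more than a flash-attention2 flag; treat that as an assumption to verify.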
Additional Context
No response
Suggested Solutions
No response