Inferflow: an Efficient and Highly Configurable Inference Engine for ...
We present Inferflow, an efficient and highly configurable inference engine for large language models (LLMs). With Inferflow, users can serve most common transformer models by editing a few lines in the corresponding configuration files, without writing a single line of source code. Compared with most existing inference engines, Inferflow has several key features. First, by ...
lines, users, LLMs, efficient, Inferflow, key features, source code, single line, corresponding configuration files, common transformer models, large language models, configurable inference engine, most existing inference engines