
2 posts tagged with "FLAMLv2"


Jiale Liu · 3 min read

TL;DR: We demonstrate how to use flaml.autogen for local LLM applications. As an example, we will launch an endpoint using FastChat and perform inference on ChatGLM2-6B.

Preparations

Clone FastChat

FastChat provides OpenAI-compatible APIs for its supported models, so you can use FastChat as a local drop-in replacement for OpenAI APIs. However, its code needs minor modification in order to function properly.

git clone https://github.com/lm-sys/FastChat.git
cd FastChat

Download checkpoint

ChatGLM-6B is an open bilingual language model based on the General Language Model (GLM) framework, with 6.2 billion parameters. ChatGLM2-6B is its second-generation version.

Before downloading from HuggingFace Hub, you need to have Git LFS installed.
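If Git LFS is installed but has not yet been enabled for your user account, a one-time setup is typically needed before cloning (a minimal sketch; refer to the Git LFS documentation for platform-specific install steps):

git lfs install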

git clone https://huggingface.co/THUDM/chatglm2-6b

Initiate server

First, launch the controller

python -m fastchat.serve.controller

Then, launch the model worker(s)

python -m fastchat.serve.model_worker --model-path chatglm2-6b

Finally, launch the RESTful API server

python -m fastchat.serve.openai_api_server --host localhost --port 8000
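Once all three processes are up, you can sanity-check the OpenAI-compatible endpoint. Assuming the default host and port used above, listing the available models should return an entry for chatglm2-6b:

curl http://localhost:8000/v1/models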

Normally this will work. However, if you encounter a validation error related to the finish_reason field, commenting out all the lines containing finish_reason in fastchat/protocol/api_protocol.py and fastchat/protocol/openai_api_protocol.py will fix the problem. The modified code looks like:

class CompletionResponseChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[int] = None
    # finish_reason: Optional[Literal["stop", "length"]]


class CompletionResponseStreamChoice(BaseModel):
    index: int
    text: str
    logprobs: Optional[float] = None
    # finish_reason: Optional[Literal["stop", "length"]] = None

Interact with model using oai.Completion

Now the models can be directly accessed through the openai-python library, as well as flaml.oai.Completion and flaml.oai.ChatCompletion.

from flaml import oai

# create a text completion request
response = oai.Completion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",  # just a placeholder
        }
    ],
    prompt="Hi",
)
print(response)

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        }
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)
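Both responses follow the OpenAI response format. If you only need the generated text rather than the full response object, you can pull it out of the response directly; some flaml versions also ship an extract_text helper, which is shown here only as an assumption:

# the raw text of the first choice (standard OpenAI-style response structure)
print(response["choices"][0]["message"]["content"])

# or, if available in your flaml version (assumed, not guaranteed):
# print(oai.ChatCompletion.extract_text(response))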

If you would like to switch to different models, download their checkpoints and specify the model path when launching the model worker(s).
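For example, to serve Vicuna instead (the checkpoint referenced in the next section), only the model worker step changes; the controller and API server are launched as before:

python -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3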

Interacting with multiple local LLMs

If you would like to interact with multiple LLMs on your local machine, replace the model_worker step above with a multi-model variant:

python -m fastchat.serve.multi_model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --model-names vicuna-7b-v1.3 \
    --model-path chatglm2-6b \
    --model-names chatglm2-6b

The inference code would be:

from flaml import oai

# create a chat completion request
response = oai.ChatCompletion.create(
    config_list=[
        {
            "model": "chatglm2-6b",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
        {
            "model": "vicuna-7b-v1.3",
            "api_base": "http://localhost:8000/v1",
            "api_type": "open_ai",
            "api_key": "NULL",
        },
    ],
    messages=[{"role": "user", "content": "Hi"}],
)
print(response)

For Further Reading

Qingyun Wu · 4 min read

TL;DR:

  • Celebrating FLAML's milestone: 1 million downloads
  • Introducing Large Language Model (LLM) support in the upcoming FLAML v2

This week, FLAML has reached a significant milestone: 1 million downloads. Originating as an intern research project within Microsoft Research, FLAML has grown into an open-source library used widely across the industry and supported by an active community. As we celebrate this milestone, we want to recognize the passionate contributors and users who have played an essential role in molding FLAML into the flourishing project it is today. Our heartfelt gratitude goes out to each of you for your unwavering support, constructive feedback, and innovative contributions that have driven FLAML to new heights. A big shoutout to our industrial collaborators from Azure Core, Azure Machine Learning, Azure Synapse Analytics, Microsoft 365, ML.NET, Vowpal Wabbit, Anyscale, Databricks, and Wise; and academic collaborators from MIT, Penn State University, Stevens Institute of Technology, Tel Aviv University, Texas A&M University, University of Manchester, University of Washington, and The Chinese University of Hong Kong, among others.

We'd also like to take the opportunity to reflect on FLAML's past achievements and its future roadmap, with a particular focus on large language models (LLM) and LLMOps.

FLAML's Journey: Past Achievements and Milestones

Bring AutoML to One's Fingertips

FLAML offers an off-the-shelf AutoML solution that enables users to quickly discover high-quality models or configurations for common ML/AI tasks. By automatically selecting models and hyperparameters for training or inference, FLAML saves users time and effort. FLAML has significantly reduced development time for developers and data scientists alike, while also providing a convenient way to integrate new algorithms into the pipeline, enabling easy extensions and large-scale parallel tuning. These features make FLAML a valuable tool in R&D efforts for many enterprise users. FLAML is capable of handling a variety of common ML tasks, such as classification, regression, time series forecasting, NLP tasks, and generative tasks, providing a comprehensive solution for various applications.
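As a rough illustration of that off-the-shelf experience, a minimal classification run looks like the sketch below (the toy dataset and 60-second budget are illustrative choices, not recommendations):

from flaml import AutoML
from sklearn.datasets import load_iris

# toy classification data, for illustration only
X, y = load_iris(return_X_y=True)

# let FLAML select the model and hyperparameters within a 60-second budget
automl = AutoML()
automl.fit(X, y, task="classification", time_budget=60)

print(automl.best_estimator)  # e.g. "lgbm"
print(automl.best_config)     # the chosen hyperparameters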

Speed and Efficiency: The FLAML Advantage

What sets FLAML apart from other AutoML libraries is its exceptional efficiency, thanks to the economical and efficient hyperparameter optimization and model selection methods developed in our research. FLAML is also capable of handling large search spaces with heterogeneous evaluation costs, complex constraints, guidance, and early stopping. The zero-shot AutoML option further reduces the cost of AutoML, making FLAML an even more attractive solution for a wide range of applications with low resources.

Easy Customization and Extensibility

FLAML is designed for easy extensibility and customization, allowing users to add custom learners, metrics, search spaces, etc. For example, the support for hierarchical search spaces allows one to first choose an ML learner and then sample from the hyperparameter space specific to that learner. The level of customization ranges from minimal (providing only training data and task type as input) to full (tuning a user-defined function). This flexibility and support for easy customization have led to FLAML's adoption in various domains, including security, finance, marketing, engineering, supply chain, insurance, and healthcare, delivering highly accurate results.
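For instance, tuning a user-defined function is roughly as follows (a sketch using flaml.tune; the objective and search space are made up for illustration):

from flaml import tune

# a user-defined objective to minimize (illustrative)
def evaluate(config):
    score = (config["x"] - 3) ** 2 + config["y"]
    return {"score": score}

analysis = tune.run(
    evaluate,
    config={
        "x": tune.uniform(0, 10),  # continuous hyperparameter
        "y": tune.randint(0, 5),   # integer hyperparameter
    },
    metric="score",
    mode="min",
    num_samples=50,
)
print(analysis.best_config)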

Embracing Large Language Models in FLAML v2

As large language models continue to reshape the AI ecosystem, FLAML is poised to adapt and grow alongside these advancements. Recognizing the importance of large language models, we have recently incorporated an autogen package into FLAML, and are committed to focusing our collective efforts on addressing the unique challenges that arise in LLMOps (Large Language Model Operations).

In its current iteration, FLAML offers support for model selection and inference parameter tuning for large language models. We are actively developing new features, such as a low-level inference API with caching, templating, and filtering, and higher-level components like LLM-based coding and interactive agents, to enable more effective and economical usage of LLMs.

We are eagerly preparing for the launch of FLAML v2, where we will place special emphasis on incorporating and enhancing features specifically tailored for large language models (LLMs), further expanding FLAML's capabilities. We invite contributions from anyone interested in this topic and look forward to collaborating with the community as we shape the future of FLAML and LLMOps together.

For Further Reading

Do you have any experience to share about LLM applications? Would you like to see more support for or research on LLMOps? Please join our Discord server for discussion.