Do this single thing to learn more about ChatGPT & LLMs

If you want to learn more about nature, the best thing to do is go hiking. It will increase your urge to learn more and to start reading about it, even if reading isn’t typically your thing. Similarly, if you want to learn more about LLMs, there is a way to connect with them directly.

This is how it happened to me. The moment I downloaded such a language model to my computer and ran it locally, I became obsessed, and I got a much better feel for what these models are, how they work, their abilities and requirements, and probably even a sense of their future development.

You will be impressed to know that a model offering the conversational magic you see in OpenAI’s ChatGPT is just a file of a few gigabytes that easily fits on your hard drive. The inference engine/app to load it into can be a simple Mac or Windows app, such as LM Studio or Jan, which are free and easy to download, with super simple, user-friendly interfaces.

It looks like ChatGPT, but the moment your PC struggles to produce the tokens and gets hot, you come closer to the idea of what LLMs are and how they operate. It is like lifting the hood of your car: you haven’t become an engineer yet, but you can sense what it is and start observing its parts.

Downloading different models and trying out their behaviour is a good exercise for discovering their strengths, limitations, and hardware requirements. Trying different settings and models feels like being a “car modder”.

A year ago, there were no local models available that could match GPT-3.5 in intelligence. But today the best local models have started matching or even surpassing it. The catch is that to run Mixtral 8x7B, which is roughly a GPT-3.5 equivalent, you need a lot of RAM (32GB, or even better 64GB).

The scope of this article is not to delve into all the details, but rather to explain, in simple and abstract terms, a few characteristics that matter. The hope is that this will excite you and motivate you to go and try these models.

Number of parameters: In general, the bigger the model, the better the performance. Models with about 70 billion parameters roughly match GPT-3.5. Some models are bad despite their large size, and some are miraculously good for their small size. For example, the newest Llama 3 8B (an 8-billion-parameter model) seems really good for its size. However, if you want reasoning abilities that come closer to GPT-4, the king of LLMs, you will need the biggest available models.

Hardware Needs: The size of the model in billions of parameters determines both the RAM requirements and the speed of the produced text (tokens per second). A 16-bit precision model (precision is explained below) stores each parameter in 2 bytes, so it needs roughly twice the parameter count in gigabytes (GB) of RAM. For example, an 8-billion-parameter model demands about 16GB of RAM.
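To make the arithmetic concrete, here is a minimal sketch in Python (the function name is my own illustration; the 2-bytes-per-parameter figure applies to 16-bit weights):

```python
# Back-of-the-envelope RAM estimate: a 16-bit model stores each parameter
# in 2 bytes, so GB of RAM is roughly 2 x billions of parameters.

def ram_needed_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Rough RAM needed just to hold the model weights, in gigabytes."""
    return params_billions * bytes_per_param

print(ram_needed_gb(8))   # 8B model at 16-bit precision  -> ~16 GB
print(ram_needed_gb(70))  # 70B model at 16-bit precision -> ~140 GB
```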

Quantisation for reducing Requirements: Quantisation is a method that slightly decreases the abilities of the model but makes it smaller and quicker. For instance, a Q8 precision model (compared to a 16-bit precision one) requires only half the RAM. A further reduction to Q4 precision demands just a quarter of it.

With this level of quantisation, even a system with 32GB of RAM can run models of up to roughly 64 billion parameters (in practice a bit less, as some memory is used by your system and for the context). You may learn more about quantisation here
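As a rough sketch of how the precision levels translate into memory (the bytes-per-parameter figures are approximations; real quantised files carry a little extra overhead):

```python
# Approximate bytes per parameter at each precision level; real quantised
# files are slightly larger, and your OS and the context window also use RAM.
BYTES_PER_PARAM = {"16-bit": 2.0, "Q8": 1.0, "Q4": 0.5}

def model_ram_gb(params_billions: float, precision: str) -> float:
    return params_billions * BYTES_PER_PARAM[precision]

for precision in BYTES_PER_PARAM:
    print(f"70B model at {precision}: ~{model_ram_gb(70, precision):.0f} GB")
# 16-bit: ~140 GB, Q8: ~70 GB, Q4: ~35 GB.
# At Q4, a ~64-billion-parameter model fits in roughly 32 GB.
```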

CPU vs GPU
To have acceptable speed you need GPUs with fast VRAM, so you may need two or three NVIDIA RTX 3090s to run the biggest models, which will cost you more than $2,000. If you don’t want to break the bank, your RAM and CPU will be fine, but very slow, especially for larger models: you will be running the small models at about 5 tokens per second, and the large ones at 1 token per second or so. You will just need some additional RAM if you want to experience the largest ones, and RAM is fortunately very cheap.
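A rough rule of thumb lies behind those numbers: text generation is usually limited by memory bandwidth, so tokens per second are at best the bandwidth divided by the model’s size. The bandwidth figures below are ballpark assumptions, not measured specs:

```python
# Rule of thumb: generation is memory-bandwidth bound, so
# tokens/second ~ memory bandwidth (GB/s) / model file size (GB).
# This is an upper bound; real-world speeds are somewhat lower.

def est_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Ballpark bandwidths: dual-channel DDR4 ~50 GB/s, one RTX 3090 ~936 GB/s.
print(est_tokens_per_second(5, 50))    # ~8B Q4 model on CPU/RAM        -> ~10 t/s at best
print(est_tokens_per_second(35, 50))   # ~70B Q4 model on CPU/RAM       -> ~1.4 t/s
print(est_tokens_per_second(35, 936))  # same model in VRAM (two 3090s) -> ~27 t/s
```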

Some models to start with

If you are lucky enough to have 64GB of RAM, try the quantised versions of the Qwen 1.5 72B model, or Llama 3 70B. Here you can see how the models perform: LMSYS leaderboard


With less RAM or GPU VRAM, you may test the less capable, but still usable, Llama 3 8B.
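Once a model is loaded, LM Studio can also serve it over an OpenAI-compatible local API, which is a nice bridge from playing in the chat window to scripting. A minimal sketch, assuming the local server is enabled in the app (http://localhost:1234/v1 is a common default; check the server tab for yours):

```python
# Talk to the model loaded in LM Studio via its OpenAI-compatible local server.
# Assumes the server is running; adjust the address if yours differs.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",  # LM Studio answers with whichever model you loaded
    messages=[{"role": "user", "content": "Explain quantisation in one sentence."}],
)
print(response.choices[0].message.content)
```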

Final Thoughts

LM Studio and local inference will not replace the free and quick ChatGPT, but using it and playing with it, especially in cases where you care about privacy, will massively accelerate your knowledge.

Very soon, you will find yourself interested in the low-code automations offered by frameworks and tools like Flowise or AutoGen, which we could cover in a future article. So ChatGPT, once exotic and incomprehensible to you, will start having a different essence in your mind.

As a business owner or investor, this will benefit you immeasurably, because you will be able to approach and sense the possible effects of LLMs on your business or investments. It will help you navigate this transformation and identify opportunities and possible use cases.

The simple and quick thing to do is to download a program like LM Studio, and start the journey.

If you enjoyed the article, share it, and leave any thoughts or questions in the comments below.

