LLMs in Focus: The Easy Guide & Your AI Glossary — 2024

VIVEK KUMAR UPADHYAY
43 min read · Mar 30, 2024

--

“The future is not something that happens to us, but something we create.” — Vivek

Hey everyone! 👋

Imagine having a chat with a computer that really gets you. That’s what LLMs, or Large Language Models, are all about. It’s 2024, and these smart programs are changing the game. They can write stories, answer questions, and even make jokes!

I’ve been busy gathering all the latest LLMs out there and I’m putting together a guide for each one. It’s going to be super straightforward and clear, so everyone can understand.

I’ve had the privilege of being guided and mentored by Dr. Alan D. Thompson. Dr. Thompson is a renowned expert in artificial intelligence, particularly known for his work in augmenting human intelligence and advancing the evolution of ‘integrated AI’. His extensive background includes serving as the former chairman for Mensa International’s gifted families and providing AI advisory to Fortune 500 companies, major governments, and intergovernmental entities.

Dr. Thompson’s contributions to the field of AI are vast, including the creation of a complete database of large language models and authoring the world’s most comprehensive analysis of datasets used for OpenAI GPT. His technical papers and resources on LLMs are used by all major AI labs, and his insights are featured across international media.

It’s his rigorous approach to AI and his commitment to making complex concepts accessible that made Dr. Thompson an invaluable mentor for this project. His guidance has ensured that the information presented here is not only accurate but also understandable for everyone, regardless of their background in AI.

For those interested in delving deeper into Dr. Thompson’s work and his contributions to the AI community, I encourage you to visit Life Architect. His expertise has been instrumental in shaping this guide, and for that, I extend my deepest gratitude.

So, if you’re curious about these AI wonders, hang tight. I’m working on making all this information easy to digest and fun to explore.

Ready to dive into the world of LLMs? Let’s go!

Let’s Start:

Grok-1.5:

Origin: Developed by xAI.

Description: Built on Grok-1, a 314-billion-parameter Mixture-of-Experts model trained from scratch, whose base weights xAI released as open source; Grok-1.5 adds improved reasoning and long-context capabilities (a minimal routing sketch follows this entry).

Notable Features: Large-scale, not fine-tuned for specific applications.

Source: Grok-1 Open Release.
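
Several entries in this guide (Grok-1, Mixtral-8x7B, Samba-1) are built on the Mixture-of-Experts idea: a router sends each token to only a few "expert" sub-networks, so just a slice of the total parameters runs per token. Here is a minimal, illustrative NumPy sketch of that routing step; the sizes and weights are toy values, not anything from xAI's code.

```python
# Minimal sketch of Mixture-of-Experts (MoE) routing. Illustrative only.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_layer(token, experts, router_weights, top_k=2):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = softmax(token @ router_weights)          # one score per expert
    top = np.argsort(scores)[-top_k:]                 # pick the k best experts
    gate = scores[top] / scores[top].sum()            # renormalise their weights
    # Only the selected experts run, which is why MoE models activate only a
    # fraction of their total parameters per token.
    return sum(g * experts[i](token) for g, i in zip(gate, top))

d_model, n_experts = 8, 4
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(d_model, d_model)): x @ W
           for _ in range(n_experts)]
router = rng.normal(size=(d_model, n_experts))
print(moe_layer(rng.normal(size=d_model), experts, router))
```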

Llama 3:

Origin: Developed by Meta.

Description: An open-source LLM designed to empower developers and advance safety.

Notable Features: Powering innovation through access, suitable for research and commercial use.

Source: Meta Llama.

Apple GPT (Ajax):

Origin: Developed by Apple.

Description: Apple's internal large language model framework, reportedly codenamed "Ajax" and said to power an internal chatbot some employees dubbed "Apple GPT".

Notable Features: Intended to support generative AI features across Apple's products and services; not publicly released.

Source: Press reports on Apple's Ajax project.

G3PO:

Origin: Developed by OpenAI.

Description: An open-source LLM, rumored to be in development to compete with Meta’s LLaMA and Google’s models.

Notable Features: Not yet officially released.

Source: OpenAI Steps Up Work on ‘G3PO’.

Arrakis (GPT-4.5?):

Origin: Reportedly an internal OpenAI project (based on press reports).

Description: Potentially an iteration of the GPT series.

Notable Features: Unknown.

Source: Not available.

Gobi (GPT-5?):

Origin: Reportedly OpenAI.

Description: A rumored multimodal model, speculated in press reports to be a successor candidate in the GPT series (possibly GPT-5).

Notable Features: Unconfirmed; reported to be designed as multimodal from the ground up.

Source: Press reports (The Information).

GPT-5:

Origin: OpenAI (not yet released).

Description: The next iteration in the GPT series.

Notable Features: Unknown.

Source: Not available.

Olympus:

Origin: Amazon.

Description: Being developed for corporate customers, expected to power features across Amazon services.

Notable Features: Larger than GPT-4.

Source: Amazon’s Project Olympus.

EvoLLM-JP:

Origin: Developed by Sakana AI.

Description: An experimental general-purpose Japanese LLM created using the Evolutionary Model Merge method.

Notable Features: Merged from various source models, including Shisa Gamma 7B, WizardMath 7B V1.1, and Abel 7B 002.

Source: EvoLLM-JP-v1-10B.

Parakeet:

Origin: Not specified.

Description: Public details about Parakeet are limited.

Notable Features: Unknown.

Source: Not available.

RWKV-v5 EagleX:

Origin: Developed by the RWKV open-source project (Recursal AI and the community).

Description: A linear-attention (RNN-style) language model in the RWKV-v5 "Eagle" line, continuing training from the Eagle 7B checkpoint on additional tokens.

Notable Features: Transformer-level quality with efficient RNN-style inference; improved multilingual and English performance over Eagle 7B.

Source: RWKV-v5-Eagle-7B.

MM1:

Origin: Developed by Apple.

Description: A multimodal LLM trained on both text and images, aiming to achieve human-like reasoning capabilities.

Notable Features: Combines large-scale multimodal pre-training (image-caption, interleaved image-text, and text-only data) with strong in-context, few-shot, and multi-image reasoning.

Source: MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training.

RFM-1:

Origin: Developed by Covariant.

Description: A Robotics Foundation Model trained on multimodal data, bridging the gap between language understanding and physical world interaction.

Notable Features: Accurate simulation and operation in demanding real-world conditions.

Source: Introducing RFM-1: Giving robots human-like reasoning capabilities.

DeepSeek-VL:

Origin: Developed by DeepSeek AI.

Description: An open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. It possesses general multimodal understanding capabilities, capable of processing logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.

Feature: Multimodal understanding across various data types.

Source: DeepSeek-VL on GitHub

AnyGPT:

Origin: Developed by a team of researchers.

Description: A unified multimodal LLM with discrete sequence modeling. It can handle various modalities, including speech, text, images, and music.

Feature: Discrete representations for unified processing of different modalities.

Source: AnyGPT on arXiv

Stable Beluga 2.5:

Origin: Developed by Stability AI.

Description: A fine-tuned version of LLaMA 2 70B, outperforming ChatGPT 3.5 in select benchmarks.

Feature: High performance in specific evaluation tasks.

Source: Stability AI News

Inflection-2.5:

Origin: Developed by Inflection AI.

Description: Inflection AI's upgraded in-house model powering the Pi assistant, reported to approach GPT-4-level performance while using a fraction of the training compute.

Feature: High efficiency, with notable gains in coding and mathematics.

Source: Inflection AI

Apollo:

Origin: Developed by an academic medical-NLP research team.

Description: A lightweight multilingual medical LLM designed for democratizing medical AI to 6 billion people.

Feature: High comprehension and fluency on complex medical tasks.

Source: Apollo on arXiv

Claude 3 Opus:

Origin: Developed by Anthropic.

Description: An advanced model in the Claude 3 family, excelling in analysis, forecasting, content creation, and multilingual conversations.

Feature: High intelligence levels.

Source: Anthropic News

Samba-1:

Origin: Developed by SambaNova Systems.

Description: An enterprise-focused "Composition of Experts" that combines many open-source expert models into a trillion-plus-parameter system while delivering access control, security, and data privacy.

Feature: High accuracy, lower cost of ownership, and fine-tuning capabilities.

Source: SambaNova Systems

StarCoder 2:

Origin: Developed by BigCode.

Description: A transparently trained open code LLM, trained on The Stack v2 dataset.

Feature: Trained on 4 trillion tokens, supporting code generation.

Source: Geeky Gadgets

Megatron 530B:

Origin: Developed by NVIDIA with Microsoft (also known as Megatron-Turing NLG 530B).

Description: Described at its release as the world's largest customizable language model, optimized for large-scale accelerated computing infrastructure.

Feature: Scalability and performance for sophisticated NLP models.

Source: NVIDIA News

Mistral Small:

Origin: Developed by Mistral AI.

Description: A decoder-based LLM with 7 billion parameters, optimized for low latency and cost efficiency.

Feature: Utilizes sliding window attention, grouped-query attention, and a byte-fallback BPE tokenizer (the sliding-window mask is sketched after this entry).

Source: Mistral LLM and Langchain integration.
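
Mistral's models are often described with "sliding window attention", which simply limits how far back each token can look. This toy sketch builds such a mask with NumPy; the window size is an illustrative value, not Mistral's actual configuration.

```python
# Sliding-window causal mask: each token may attend only to itself and the
# previous `window - 1` tokens. Toy sizes for illustration.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    i = np.arange(seq_len)[:, None]   # query positions
    j = np.arange(seq_len)[None, :]   # key positions
    causal = j <= i                   # no attending to the future
    local = (i - j) < window          # no attending further back than the window
    return causal & local

print(sliding_window_mask(seq_len=6, window=3).astype(int))
# Each row has at most 3 ones: the token itself plus the two previous tokens.
```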

Mistral Large:

Origin: Also developed by Mistral AI.

Description: A high-performance model for multilingual reasoning, text comprehension, and code generation.

Feature: Incorporates innovations like RAG-enablement and function calling.

Source: Mistral AI unveils Mistral Large.

Hanooman:

Origin: Developed by the BharatGPT group in collaboration with Seetha Mahalaxmi Healthcare (SML).

Description: Responds in 11 Indian languages, including Hindi, Tamil, and Marathi.

Feature: Designed for healthcare, governance, financial services, and education.

Source: BharatGPT group unveils ‘Hanooman’.

Reka Edge:

Origin: Reka AI.

Description: An LLM optimized for local deployments.

Feature: Outperforms comparable models like Llama 2 7B and Mistral 7B.

Source: Reka AI Releases Reka Flash.

Reka Flash:

Origin: Reka AI.

Description: An efficient and capable multimodal LLM.

Feature: Achieves competitive results against models like GPT-4, Claude, and Gemini Pro.

Source: Reka AI Releases Reka Flash.

Gemma:

Origin: Google.

Description: A family of state-of-the-art open LLMs.

Feature: Comes in 2B and 7B parameter sizes, with base and instruction-tuned variants.

Source: Google’s Gemma announcement.

Gemini 1.5 Pro:

Origin: Google.

Description: A mid-size multimodal model optimized for scaling across various tasks.

Feature: Achieves similar performance to Gemini 1.0 Ultra while using less compute.

Source: Introducing Gemini 1.5.

Qwen-1.5:

Origin: Alibaba Cloud.

Description: Part of the Qwen AI series, with varying sizes from 0.5 billion to 72 billion parameters.

Feature: Enhancements in language model technology.

Source: Alibaba Qwen 1.5.

GOODY-2

Origin: Developed by the art studio Brain, GOODY-2 is a satirical AI model built with "next-gen adherence" to ethical principles.

Description: It's designed to avoid answering controversial, offensive, or problematic queries; in practice it declines essentially every question, as a commentary on over-cautious AI.

Feature: Unbreakable ethical adherence, ensuring conversations stay within bounds.

Source: GOODY-2

Natural-SQL-7B

Origin: Developed by ChatDB, Natural-SQL-7B excels in converting natural language queries into SQL commands.

Description: It bridges the gap between non-technical users and complex database interactions.

Feature: Strong performance on converting complex natural-language questions into SQL over a given database schema (a hedged usage sketch follows this entry).

Source: GitHub
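
To make the text-to-SQL idea concrete, here is a hedged usage sketch with the Hugging Face transformers library. The repository id and prompt template are assumptions for illustration; check ChatDB's model card for the exact names.

```python
# Hedged sketch of prompting a text-to-SQL model. Model id and prompt format
# are assumptions for illustration, not confirmed values.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "chatdb/natural-sql-7b"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = (
    "### Schema:\n"
    "CREATE TABLE orders (id INT, customer TEXT, total REAL, created_at DATE);\n"
    "### Question:\n"
    "Total revenue per customer in 2023, highest first.\n"
    "### SQL:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=120, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```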

Sea-Lion

Origin: Developed by AI Singapore, Sea-Lion focuses on Southeast Asia’s diverse contexts and languages.

Description: It’s designed to better represent the region’s breadth of cultures and languages.

Feature: Specialized vocabulary for optimal performance on SEA languages.

Source: AI Singapore

OLMo (Open Language Model)

Origin: Developed by AI2, OLMo is intentionally designed for open research.

Description: It provides access to data, training code, models, and evaluation code.

Feature: Full pretraining data, training code, and evaluation suite.

Source: AI2

FLOR-6.3B

Origin: Developed under Projecte Aina, FLOR-6.3B has 6.3 billion parameters.

Description: It allows advanced users and developers to explore AI capabilities.

Feature: Tailored for AI needs based on local languages and requirements.

Source: Projecte Aina Tech

Weaver

Origin: Developed by AIWaves, Weaver is dedicated to content creation.

Description: Pre-trained on a corpus to improve writing capabilities.

Feature: Models of different sizes (Mini, Base, Pro, Ultra).

Source: Hugging Face

Miqu 70B

Origin: Leaked online and later acknowledged by Mistral AI's CEO as an early quantized version of one of its models; Miqu-1-70B is a 70 billion parameter model based on the Llama 2 70B architecture.

Description: Quantized to run on less than 24GB of VRAM.

Feature: More accessible to users without high-end hardware.

Source: Cheatsheet

iFlytekSpark-13B

Origin: Developed by iFlytek and Huawei, SparkDesk helps build LLMs.

Description: Allows enterprises to create exclusive LLMs.

Feature: Parameters include 13 billion, 65 billion, and 175 billion.

Source: Yicai

Xinghuo 3.5 (Spark)

Origin: Developed by iFlytek, Spark v3.5 is claimed to surpass GPT-3.5.

Description: iFlytek reports that it rivals or outpaces GPT-4 Turbo in language, math, coding, and multimodal tasks.

Feature: Synthesizes speech conveying different emotions and tones.

Source: Live Science

MGIE:

Origin: Developed by Apple.

Description: MGIE (MLLM-Guided Image Editing) is designed to facilitate edit instructions and provide explicit guidance for image editing. It learns to derive expressive instructions and performs manipulation through end-to-end training.

Feature: Focuses on image editing tasks.

Source: GitHub Repository

CodeLlama-70B:

Origin: Developed by Meta (available via Clarifai's platform).

Description: CodeLlama-70B-Instruct is a state-of-the-art LLM specialized in code synthesis and understanding. It is the largest model in the Code Llama series, with 70 billion parameters, optimized for processing and generating code from natural language instructions.

Feature: Code synthesis and understanding.

Source: Clarifai Model Page

RWKV-v5 Eagle 7B:

Origin: RWKV.

Description: Eagle 7B is a 7.52B parameter model built on the RWKV-v5 architecture. It ranks as the world’s greenest 7B model per token and is trained on 1.1 trillion tokens across 100+ languages. It outperforms other 7B class models in multi-lingual benchmarks.

Feature: Multilingual capabilities and energy efficiency.

Source: Hugging Face Model Page

MaLA-500:

Origin: MaLA-LM.

Description: MaLA-500 is designed to cover an extensive range of 534 languages. It builds upon LLaMA 2 7B, integrates continued pretraining with vocabulary extension, and boasts an extended vocabulary size of 260,164.

Feature: Multilingual adaptation and vocabulary extension.

Source: Hugging Face Model Page

MambaByte:

Origin: Developed by Cornell University researchers, building on the Mamba architecture.

Description: MambaByte is a token-free adaptation of the Mamba state space model, trained autoregressively on raw byte sequences. It is computationally efficient compared with other byte-level models and competitive with state-of-the-art subword Transformers.

Feature: Token-free language modeling (the byte-level input idea is illustrated after this entry).

Source: arXiv Paper
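
The "token-free" part of MambaByte just means the input is raw bytes rather than learned subword tokens. This tiny snippet shows what that input looks like and why no tokenizer is needed; it is an illustration of the idea, not code from the paper.

```python
# Byte-level "tokenization": every UTF-8 string maps to integers in [0, 255],
# so no trained tokenizer or vocabulary is required.
text = "token-free ✓"
byte_ids = list(text.encode("utf-8"))
print(byte_ids)                          # note the "✓" expands to 3 bytes
print(bytes(byte_ids).decode("utf-8"))   # lossless round trip back to the text

# Trade-off: sequences get longer (one step per byte), which is why the paper
# pairs byte-level inputs with a state space model that scales linearly with
# sequence length rather than quadratically like attention.
```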

DeepSeek-Coder:

Origin: DeepSeek-AI.

Description: DeepSeek-Coder is a series of code language models trained from scratch on 2T tokens, with a composition of 87% code and 13% natural language in both English and Chinese. It provides various sizes of code models, ranging from 1B to 33B versions.

Feature: Code synthesis and understanding.

Source: GitHub Repository

FuseLLM:

Origin: An academic research collaboration (the "Knowledge Fusion of Large Language Models" paper).

Description: FuseLLM focuses on merging and stacking LLMs to create more powerful models. It explores fusing models from a probabilistic-distribution perspective and aims to overcome individual knowledge gaps.

Feature: Knowledge fusion across multiple LLMs.

Source: Medium Article

Fuyu-Heavy:

Origin: Adept AI.

Description: Fuyu-Heavy is a large multimodal model built for digital agents; Fuyu-8B is the smaller, openly released version of the architecture that powers Adept's product.

Feature: Multimodal capabilities.

Source: Adept AI Blog

DeepSeek:

Origin: Developed by DeepSeek AI.

Description: An advanced language model with 67 billion parameters, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.

Features:

Outperforms Llama2 70B Base in reasoning, coding, math, and Chinese comprehension.

Proficient in coding and math (HumanEval Pass@1: 73.78, GSM8K 0-shot: 84.1, Math 0-shot: 32.6).

Mastery in Chinese language, surpassing GPT-3.5.

Source: GitHub

LLaMA Pro:

Origin: Developed by Tencent Applied Research Center (ARC).

Description: An extension of LLaMA that adds new transformer blocks ("block expansion") so the model can absorb new domain knowledge, such as code and math, without forgetting its general abilities.

Features:

Post-pretrains only the newly added blocks, which avoids catastrophic forgetting.

Strong performance on both general and code/math benchmarks for its size.

Source: GitHub

TinyLlama:

Origin: Derived from Llama2 LLM.

Description: A compact 1.1B language model pretrained on around 1 trillion tokens for approximately 3 epochs.

Features:

Leverages advances like FlashAttention for computational efficiency.

Remarkable performance in downstream tasks.

Source: GitHub

DocLLM:

Origin: Developed by JPMorgan AI Research for reasoning over visual documents.

Description: Focuses on bounding box information to incorporate spatial layout structure.

Features:

Avoids expensive image encoders.

Enhances document layout understanding.

Source: GitHub

Unified-IO 2:

Origin: Developed by the Allen Institute for AI (AI2).

Description: An autoregressive multimodal model capable of understanding and generating images, text, audio, and action.

Features:

Tokenizes inputs and outputs into a shared semantic space.

Achieves state-of-the-art performance on various benchmarks.

Source: GitHub

WaveCoder-DS-6.7B:

Origin: Developed by Microsoft researchers.

Description: Fine-tuned Code LLM with widespread and versatile enhanced instruction tuning.

Feature: Demonstrates impressive generalization ability across different code-related tasks.

Source: Research Paper

YunShan:

Origin: Introduced by Huawei researchers alongside PanGu-Pi.

Description: A domain-specialized Chinese LLM built on the PanGu-Pi architecture, targeting finance and legal applications.

Feature: Pre-trained on meticulously cleaned and filtered data.

Source: Research Paper

PanGu-Pi:

Origin: Developed by Huawei Noah’s Ark Lab researchers.

Description: Enhances language model architectures via nonlinearity compensation.

Feature: Achieves state-of-the-art performance in terms of accuracy and efficiency.

Source: Research Paper

YAYI 2:

Origin: Developed by an AI research team.

Description: Multilingual open-source LLM with a focus on improving Chinese language performance.

Feature: Pre-trained on meticulously cleaned and filtered data.

Source: Research Paper

Emu2:

Origin: Developed by the Beijing Academy of Artificial Intelligence (BAAI).

Description: Generative multimodal model for text and images.

Feature: Seamlessly generates images and texts in multimodal context.

Source: Research Paper

MedLM:

Origin: Developed by Google, building on its Med-PaLM work.

Description: Fine-tuned for the healthcare industry, answering medical questions and drafting summaries.

Feature: High-quality generative AI output in the medical domain.

Source: Google Cloud Blog

SOLAR-10.7B:

Origin: Developed by Upstage.

Description: Advanced LLM with 10.7 billion parameters, demonstrating superior performance.

Feature: Compact yet powerful, outperforming models with up to 30B parameters.

Source: Deci Model Zoo

Mistral-medium:

Origin: Developed by Mistral AI.

Description: A mid-tier model offered through Mistral AI's API, positioned between the company's smaller models and Mistral Large.

Feature: Strong general performance at a lower price point than frontier models.

Source: Medium Article

Mixtral-8x7B-32kseqlen:

Origin: Developed by Mistral AI.

Description: Sparse mixture-of-experts LLM.

Feature: Handles up to 32k tokens of context, multilingual, and follows instructions well.

Source: News Article

DeciLM-7B:

Origin: Developed by Deci.

Description: A 7.04 billion parameter decoder-only text generation model, outperforming other 7B base models in accuracy and computational efficiency.

Feature: Utilizes variable Grouped-Query Attention (GQA) for a balance between accuracy and efficiency (a GQA sketch follows this entry).

Source: Deci/DeciLM-7B
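
Grouped-query attention (GQA), mentioned for DeciLM-7B and Mistral above, lets several query heads share one key/value head so the KV cache shrinks. Below is a small NumPy sketch of that sharing; head counts and shapes are toy values, and the causal mask is omitted for brevity.

```python
# Toy sketch of grouped-query attention (GQA): query heads share KV heads.
import numpy as np

def gqa(q, k, v, groups):
    """q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d); groups = n_q_heads // n_kv_heads."""
    outs = []
    for h in range(q.shape[0]):
        kv = h // groups                               # which shared KV head to use
        scores = q[h] @ k[kv].T / np.sqrt(q.shape[-1])
        weights = np.exp(scores - scores.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)      # softmax over key positions
        outs.append(weights @ v[kv])
    return np.stack(outs)

rng = np.random.default_rng(0)
n_q, n_kv, T, d = 8, 2, 4, 16                          # 8 query heads share 2 KV heads
out = gqa(rng.normal(size=(n_q, T, d)),
          rng.normal(size=(n_kv, T, d)),
          rng.normal(size=(n_kv, T, d)),
          groups=n_q // n_kv)
print(out.shape)  # (8, 4, 16)
```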

StripedHyena 7B:

Origin: Developed by Together Research.

Description: Focuses on new architectures for long context, improved training, and inference performance.

Feature: Incorporates features from recurrent, convolutional, and continuous-time models.

Source: StripedHyena-Hessian-7B

NexusRaven-V2 13B:

Origin: Developed by Nexusflow.

Description: Claimed to surpass GPT-4 at zero-shot function calling, converting natural-language instructions into executable function calls.

Feature: Generalization to unseen functions (the general function-calling pattern is sketched after this entry).

Source: NexusRaven-V2–13B
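
Function calling, NexusRaven-V2's specialty, generally works like this: the model is shown function signatures, it replies with a single call expression, and the application parses and executes it. The sketch below shows that parse step with a toy tool; the prompt and call format are a generic illustration, not Nexusflow's exact template.

```python
# Generic function-calling pattern: parse a model's call expression safely.
import ast

TOOLS = '''
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Return the current weather for a city."""
'''

def parse_call(text: str):
    """Extract the function name and literal arguments from a model reply."""
    node = ast.parse(text.strip(), mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("model did not return a function call")
    name = node.func.id
    args = [ast.literal_eval(a) for a in node.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, args, kwargs

# Imagine the model replied with this string for "What's the weather in Pune in F?":
model_reply = 'get_weather("Pune", unit="fahrenheit")'
print(parse_call(model_reply))  # ('get_weather', ['Pune'], {'unit': 'fahrenheit'})
```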

Gemini Ultra 1.0:

Origin: Developed by Google.

Description: Google’s most capable AI model, optimized for complex tasks.

Feature: High-quality translation, efficient scaling, and versatility.

Source: Gemini Advanced

Mamba:

Origin: Developed by researchers at Carnegie Mellon University and Princeton (Albert Gu and Tri Dao).

Description: A selective state space model (SSM) architecture offering linear-time sequence modeling as an alternative to the Transformer.

Feature: Fast inference and strong performance on long sequences.

Source: Mamba on arXiv.

LVM-3B:

Origin: Developed by Meta AI.

Description: A large language model based on the Transformer architecture.

Feature: Understands and generates human-like text for various tasks.

Source: Deciphering LLMs

SeaLLM-13b:

Origin: Developed by Alibaba DAMO Academy.

Description: Inclusive AI language models for Southeast Asian languages.

Feature: Provides accurate, up-to-date, and factual responses.

Source: TechWire Asia

pplx-70b-online:

Origin: Developed by Perplexity.

Description: Online LLMs with up-to-date information from the web.

Feature: Provides helpful, factual, and fresh responses grounded in current web results (the general retrieve-then-prompt pattern is sketched after this entry).

Source: Perplexity
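
"Online" models like pplx-70b-online ground their answers in freshly retrieved web results. The general retrieve-then-prompt pattern looks roughly like the sketch below; this is a generic illustration, not Perplexity's actual pipeline.

```python
# Generic retrieve-then-prompt sketch: fetch fresh snippets, then ground the
# model's answer in them. The retriever here is a placeholder.
def retrieve(query: str) -> list[str]:
    # A real system would call a search index or web API here.
    return [
        "Snippet: LLM glossary updated March 2024 with 100+ models.",
        "Snippet: Large language models are trained on web-scale text corpora.",
    ]

def build_prompt(query: str, snippets: list[str]) -> str:
    context = "\n".join(f"[{i+1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )

print(build_prompt("What is an LLM glossary?", retrieve("What is an LLM glossary?")))
```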

SeamlessM4T-Large v2:

Origin: Developed by Seamless Communication team from Meta AI.

Description: Enables expressive and streaming speech translation.

Feature: End-to-end expressive and multilingual translations.

Source: SeamlessM4T-v2

Q-Transformer:

Origin: Developed by Google DeepMind’s robotics team.

Description: A scalable offline reinforcement-learning method that represents Q-values autoregressively with a Transformer, trained on large robot datasets.

Feature: Learns robot policies from both human demonstrations and autonomously collected data.

Source: Q-Transformers

Yuan 2.0:

Origin: Developed by IEIT Systems (Inspur).

Description: Not provided.

Feature: Not provided.

Source: Not specified.

MEDITRON

Origin: Developed by EPFL researchers, MEDITRON is a suite of open-source medical Large Language Models (LLMs).

Description: MEDITRON-70B is a 70 billion parameters model adapted to the medical domain from Llama-2–70B through continued pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and a new dataset.

Feature: It is specifically tailored for medical applications.

Source: MEDITRON-70B on Hugging Face

Transformers-Arithmetic

Origin: Not specified.

Description: Transformers-Arithmetic is a model designed for arithmetic tasks using transformer-based architecture.

Feature: It specializes in arithmetic operations.

Source: Not specified.

Starling-7B

Origin: Developed by researchers at UC Berkeley.

Description: Starling-7B is an open-source LLM trained by Reinforcement Learning from AI Feedback (RLAIF). It harnesses the power of the GPT-4 labeled ranking dataset, Nectar, and a new reward training and policy tuning pipeline.

Feature: Strong reasoning abilities and performance on various tasks.

Source: Starling-7B on Hugging Face

Inflection-2

Origin: Developed by Inflection AI.

Description: Inflection-2 is a successor to Inflection-1, significantly more capable. It has an upgraded knowledge base for accurate user query responses.

Feature: Improved capabilities over its predecessor.

Source: Inflection-2 details

Claude 2.1

Origin: Developed by Anthropic.

Description: Claude 2.1 is an advanced LLM with a 200K token context window, reduced hallucination rates, and improved accuracy.

Feature: Enhanced reasoning abilities.

Source: Anthropic announcement

TÜLU 2

Origin: Developed by the Allen Institute for AI (AI2).

Description: TÜLU 2 is a suite of Llama 2 models fine-tuned on an improved instruction-tuning data mixture, with DPO-trained variants.

Feature: Strong open instruction-following performance across 7B, 13B, and 70B sizes.

Source: TÜLU 2 on arXiv.

Orca 2

Origin: Developed by Microsoft Research.

Description: Orca 2 is a small language model (7B and 13B) from Microsoft Research, fine-tuned to reason step by step and to choose an effective solution strategy for each task.

Feature: Reasoning performance comparable to models several times its size on several benchmarks.

Source: Orca-2–7b on Hugging Face

Phi-2

Origin: Developed by Microsoft Research.

Description: Phi-2 is a small language model (SLM) with 2.7 billion parameters.

Feature: Improved reasoning abilities.

Source: Phi-2 on Hugging Face

Florence-2

Origin: Developed by Microsoft.

Description: Florence-2 is a vision foundation model addressing task diversity in computer vision and vision-language tasks.

Feature: Unified representation for various vision tasks.

Source: Florence-2 details

Mirasol3B

Origin: Developed by Google Research.

Description: Mirasol3B is a multimodal autoregressive model for time-aligned and contextual modalities.

Feature: Combines audio, video, and text modalities.

Source: Google Research blog

OtterHD-8B:

Origin: Developed by the Otter research team (S-Lab, Nanyang Technological University).

Description: OtterHD-8B is an innovative multimodal model evolved from Fuyu-8B. It’s engineered to interpret high-resolution visual inputs with granular precision. Unlike conventional models, OtterHD-8B can handle flexible input dimensions.

Feature: It excels in discerning minute details and spatial relationships of small objects.

Source: ArXiv preprint

Gauss:

Origin: Developed by Samsung Electronics.

Description: Gauss is named after Carl Friedrich Gauss, a 19th-century German mathematician. It’s designed to answer questions with a touch of humor and suggest relevant questions.

Feature: Gauss Language (text generation), Gauss Code (code generation), and Gauss Image (image generation).

Source: Computerworld

Grok-1:

Origin: Developed by xAI.

Description: Grok-1 is a mixture-of-experts (MoE) LLM with 314B parameters. It excels in reasoning and coding tasks.

Feature: Mixture-of-experts design in which only a fraction of the 314B parameters (roughly a quarter) is active for any given token.

Source: Prompt Engineering Guide

Yi-34B:

Origin: Developed by 01.AI.

Description: Yi-34B is a bilingual (English and Chinese) LLM with 34 billion parameters.

Feature: Supports up to 200K context window.

Source: NVIDIA NGC

GPT-4 Turbo:

Origin: Developed by OpenAI.

Description: GPT-4 Turbo is an updated version of GPT-4 that is more efficient and cost-effective, with fresher training data.

Feature: 128K context window.

Source: Internet Public Library

Kimi Chat:

Origin: Developed by Moonshot AI.

Description: Kimi is Moonshot AI's Chinese conversational assistant (often described as a Chinese counterpart to ChatGPT), designed to understand complex questions and generate coherent answers.

Feature: Understands long messages (up to 2 million Chinese characters).

Source: Dataconomy

ERNIE 4.0:

Origin: Developed by Baidu.

Description: ERNIE 4.0 is an NLP model that understands complex questions, applies logic, remembers information, and generates relevant answers.

Feature: Strong reasoning and generation capabilities.

Source: Medium

Fuyu:

Origin: Developed by Adept AI.

Description: Fuyu-8B is a unique foundation model that simplifies handling complex visual data.

Feature: Supports arbitrary image resolutions.

Source: Towards AI

ERNIE 4.0:

Origin: Developed by Baidu.

Description: ERNIE 4.0 has improved core capacities in understanding, generation, reasoning, and memory.

Feature: Demonstrated capabilities in generative skills and reasoning abilities.

Source: Shanghaiist

Zephyr:

Origin: Developed by Hugging Face, Zephyr is part of a series of language models trained to act as helpful assistants.

Description: Zephyr-7B-α, the first model in the series, is fine-tuned from Mistral 7B on publicly available synthetic datasets using Direct Preference Optimization (DPO); a minimal DPO loss sketch follows this entry.

Feature: A 7B parameter GPT-like model primarily trained on English.

Source: GitHub Repository | Demo
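
Direct Preference Optimization (DPO), used to train Zephyr, replaces a separate reward model with a simple loss over preferred versus rejected answers, anchored to a frozen reference model. The sketch below computes that loss from made-up log-probabilities, just to show the shape of the objective.

```python
# Minimal DPO loss sketch. Inputs are summed log-probabilities of whole
# responses; the numbers below are illustrative stand-ins, not real values.
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1 / (1 + math.exp(-beta * margin)))  # -log(sigmoid(beta * margin))

# The loss falls as the policy prefers the chosen answer more than the reference does.
print(dpo_loss(-12.0, -20.0, -14.0, -15.0))  # policy already prefers "chosen": smaller loss
print(dpo_loss(-20.0, -12.0, -15.0, -14.0))  # policy prefers "rejected": larger loss
```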

PaLI-3:

Origin: Developed by Google Research, PaLI-3 is a multilingual language-image model.

Description: It generates text based on visual and textual inputs, performing vision, language, and multimodal tasks.

Feature: Jointly scales language and vision components.

Source: GitHub Repository | Demo

Retro 48B:

Origin: Introduced by NVIDIA, Retro 48B is pretrained with retrieval and reaches lower perplexity than a comparably sized GPT model.

Description: After instruction tuning (InstructRetro), it enhances zero-shot question answering.

Feature: Pre-trained with retrieval, suitable for QA tasks.

Source: MarkTechPost

Ferret:

Origin: Developed by Apple Inc. in collaboration with Cornell University.

Description: Ferret integrates language understanding with image analysis.

Feature: Understands and interprets both text and visual data.

Source: AIToolMall

Lemur:

Origin: Open Foundation Models for Language Agents.

Description: Lemur is optimized for both natural language and coding capabilities.

Feature: Serves as the backbone of versatile language agents.

Source: GitHub Repository

AceGPT:

Origin: Developed for the Arabic language by an academic research collaboration.

Description: Sets the state-of-the-art standard for open Arabic LLMs across various benchmarks.

Feature: Optimized for Arabic language understanding.

Source: Papers with Code

GAIA-1:

Origin: Wayve’s generative world model for autonomous driving.

Description: Generates realistic driving videos, offering fine-grained control over ego-vehicle behavior and scene features.

Feature: Incorporates video, text, and action inputs.

Source: Wayve Blog

MotionLM:

Origin: Developed by Waymo researchers for multi-agent motion forecasting.

Description: Represents continuous trajectories as sequences of discrete motion tokens.

Feature: Casts multi-agent motion prediction as a language modeling task.

Source: arXiv

Yasa-1

Origin: Developed by Reka AI.

Description: Yasa-1 is a multimodal assistant with visual and auditory sensors that can take actions via code execution. It supports long-context document processing, fast retrieval augmented generation, multilingual support (20 languages), and a search engine interface.

Source: Reka AI Blog

RT-X

Origin: Google DeepMind and the Open X-Embodiment collaboration.

Description: RT-X refers to robotics transformer models (RT-1-X and RT-2-X) trained on the Open X-Embodiment dataset, which pools robot demonstrations from many labs; training across embodiments improves generalization compared with models trained on a single robot's data.

Source: Open X-Embodiment (Google DeepMind and partners)

Qwen

Origin: Developed by Alibaba Group.

Description: Qwen is a large language model (LLM) pretrained on large-scale multilingual and multimodal data. It has been upgraded to Qwen1.5 and excels in understanding and generating human-like text.

Source: Qwen Documentation

Llama 2 Long

Origin: Meta AI.

Description: Llama 2 Long is a long-context continuation of Llama 2, further pretrained on longer sequences to extend the usable context window while preserving short-context performance.

Source: IBM

LeoLM

Origin: Developed by LAION and HessianAI.

Description: LeoLM (Linguistically Enhanced Open Language Model) is a German foundation model built on Llama 2 and continued-pretrained on a large German-language corpus.

Source: Medium

Mistral 7B

Origin: Developed by Mistral AI.

Description: Mistral 7B is a 7.3-billion-parameter decoder-only model that uses grouped-query attention and sliding-window attention; at release it outperformed Llama 2 13B on many benchmarks.

Source: arXiv

Kosmos-2.5

Origin: Developed by Microsoft Research.

Description: Kosmos-2.5 is a multimodal literate model trained on large-scale text-intensive images; it generates spatially aware text blocks and structured markdown-style output for document understanding.

Source: arXiv

Baichuan 2

Origin: Developed by Baichuan Intelligence.

Description: Baichuan 2 is a series of large-scale multilingual language models containing 7 billion and 13 billion parameters. It matches or outperforms other open-source models on various benchmarks.

Source: Hugging Face

BOLT2.5B

Origin: ThirdAI.

Description: BOLT2.5B is a CPU-only pre-trained 2.5-billion parameter Generative LLM. It achieves remarkable performance without GPU involvement.

Source: ThirdAI

DeciLM

Origin: Deci AI.

Description: DeciLM-7B is a super-fast and super-accurate 7-billion-parameter LLM. It is licensed under Apache 2.0 and represents a transformative force in language processing.

Source: MarkTechPost

MoLM:

Origin: Not specified.

Description: MoLM is an LLM, but further details are not available.

Feature: Not specified.

Source: Not specified.

NExT-GPT:

Origin: Developed by the National University of Singapore.

Description: NExT-GPT is an any-to-any multimodal LLM that perceives input and generates output in arbitrary combinations of text, image, video, and audio.

Feature: Supports various modalities and is tuned with only a small amount of parameters.

Source: GitHub

Phi-1.5:

Origin: Developed by Microsoft Research.

Description: Phi-1.5 is a Transformer-based LLM with 1.3 billion parameters, trained on a mix of data sources.

Feature: Achieves nearly state-of-the-art performance among models with less than 10 billion parameters.

Source: SuperAnnotate

UniLM:

Origin: Developed by Microsoft Research.

Description: UniLM is a unified pre-trained LLM for various NLP tasks.

Feature: Supports both single-sentence and document-level tasks.

Source: GitHub

Persimmon-8B:

Origin: Developed by Adept.

Description: Persimmon-8B is an open-source LLM with a 16k-token context window, described by Adept as the most capable model in its size class with a permissive license at the time of release.

Feature: Handles longer prompts and addresses complex tasks.

Source: Neurohive

FLM-101B:

Origin: Developed by CofeAI.

Description: FLM-101B is a 101B-parameter LLM trained on a roughly $100K compute budget, achieving performance competitive with far more expensively trained models.

Feature: Cost-effective training and supports both Chinese and English.

Source: Hugging Face

Hunyuan:

Origin: Developed by Tencent.

Description: Hunyuan is a proprietary LLM with over 100 billion parameters and significant Mandarin comprehension capabilities.

Feature: Integrated into Tencent’s business lines.

Source: Tencent

phi-CTNL:

Origin: Satirical model.

Description: phi-CTNL is a fictional 1 million parameter transformer-based LLM.

Feature: Achieves perfect results across diverse academic benchmarks.

Source: Emergent Mind

Falcon 180B:

Origin: Developed by the Technology Innovation Institute (TII).

Description: Falcon 180B was the largest openly available language model at the time of its release, with 180 billion parameters. It was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset, representing the longest single-epoch pretraining for an open model.

Performance:

Outperforms Llama 2 70B and OpenAI’s GPT-3.5 on various benchmarks.

Comparable to Google’s PaLM 2-Large (Bard) and GPT-4.

Capabilities: Achieves state-of-the-art results across natural language tasks.

License: Permits commercial usage with certain restrictions.

Hardware Requirements: Inference requires 640GB of memory (quantized to half-precision FP16) or 320GB (quantized to int4).

Source: Hugging Face Blog

Jais:

Origin: Developed by Inception (a G42 company), MBZUAI, and Cerebras Systems.

Description: A powerful large language model (LLM) with 13 billion parameters, built on a GPT-3-style architecture. It integrates English text and code while substantially strengthening Arabic language capabilities.

Feature: State-of-the-art Arabic performance among open models at release, with bilingual Arabic-English support.

Source: Jais announcement

Code Llama 34B:

Origin: Built on Code Llama, an LLM developed by Meta.

Description: Specialized in generating code and natural language about code from both code and natural language prompts.

Feature: Strong code generation and infilling, with instruction-tuned and Python-specialized variants.

Source: Not specified.

IDEFICS:

Origin: An open-access reproduction by Hugging Face of Flamingo, a visual language model developed by DeepMind.

Description: A multimodal model that accepts arbitrary sequences of image and text inputs, generating coherent text as output. Used for image-text tasks.

Feature: Answers questions about images, describes visual content, and creates stories grounded in multiple images.

Source: Not specified.

Raven:

Origin: Developed by RWKV.

Description: An RNN with transformer-level LLM performance, combining the best of RNN and transformer.

Feature: Great performance, fast inference, and free sentence embedding.

Source: Not specified.

DukunLM:

Origin: Developed by Boston University.

Description: Refines LLMs using a small amount of data to improve performance on specific tasks.

Feature: Corrects LLM predictions by training a smaller model to predict errors made by the LLM.

Source: Not specified.

WizardLM:

Origin: Developed by nlpxucan.

Description: Empowers LLMs to follow complex instructions. Suitable for chat-like applications and code generation.

Feature: Superior performance in quantitative LLM metrics.

Source: Not specified.

Japanese StableLM Alpha 7B:

Origin: Developed by Stability AI.

Description: Pre-trained on diverse Japanese and English datasets, maximizing Japanese language modeling performance.

Feature: Reported by Stability AI as the best-performing openly available Japanese language model at the time of its release.

Source: Not specified.

StableCode:

Origin: Developed by Stability AI.

Description: Assists programmers with daily work, emphasizing code quality and correctness.

Feature: Provides autocomplete suggestions and handles larger code contexts.

Source: Not specified.

Med-Flamingo:

Origin: Developed by researchers at Stanford University.

Description: A multimodal medical few-shot learner adapted to the medical domain. Pre-trained on paired and interleaved medical image-text data from publications and textbooks.

Features: Enables few-shot generative medical visual question answering (VQA) and rationale generation.

Source: Med-Flamingo on arXiv

Alfred-40B-0723:

Origin: Developed by LightOn.

Description: An open-source language model based on Falcon-40B, designed for seamless integration of Generative AI into business workflows.

Features: Prompt engineering, no-code application development, and classic LLM tasks.

Source: Alfred-40B-0723 blog post.

LLaMA-2–7B-32K:

Origin: Developed by Together.

Description: An open-source long-context language model fine-tuned from Meta’s original LLaMA-2 7B model.

Features: Extends the context window to 32K tokens, targeting long-document summarization and multi-document question answering.

Source: LLaMA-2-7B-32K on Hugging Face.

Med-PaLM M:

Origin: Developed by Google Research.

Description: A multimodal biomedical model that can interpret clinical language, medical imaging, and genomics with a single set of model weights.

Features: High-quality answers to medical questions, surpassing the pass mark in the U.S. Medical Licensing Examination (USMLE)-style questions.

Source: Med-PaLM website

BTLM-3B-8K:

Origin: Developed by Cerebras Systems.

Description: A 3 billion parameter language model with an 8k context length trained on 627B tokens of SlimPajama.

Features: Licensed for commercial use, state-of-the-art performance, and supports 8k sequence length.

Source: BTLM-3B-8K on Hugging Face.

Stable Beluga 2:

Origin: Developed by Stability AI.

Description: Built upon the LLaMA 2 70B foundation model, achieving industry-leading performance.

Features: Exceptional reasoning ability, understanding linguistic subtleties, and answering complex questions.

Source: Stable Beluga 2 blog post.

Meta-Transformer:

Origin: Developed for scalable automatic modulation classification (AMC) tasks.

Description: A meta-learning framework based on few-shot learning (FSL) to acquire general knowledge.

Features: Identifies new unseen modulations using very few samples.

Source: Meta-Transformer paper

Llama 2:

Origin: Developed by Meta AI in 2023.

Description: A family of pre-trained and fine-tuned LLMs, released for research and commercial use. Capable of various natural language processing tasks, from text generation to programming code.

Feature: Offers base foundation models and fine-tuned “chat” models, available with 7 billion (7B), 13 billion (13B), or 70 billion (70B) parameters.

Source: Not specified.

WormGPT:

Origin: Created by a Portuguese programmer and initially sold on HackForums.

Description: A malicious AI chatbot built on the open-source GPT-J LLM, designed for cybercrime activities.

Feature: Allows users to engage in illegal activities and create ransomware, phishing scams, etc.

Source: Not specified.

Claude 2:

Origin: Developed by Anthropic.

Description: A language model with improved performance, longer responses, and the ability to handle inputs of up to 100,000 tokens.

Feature: Supports coding, math, reasoning, and fine-tuning.

Source: Not specified.

LongLLaMA:

Origin: Fine-tuned with the Focused Transformer (FoT) method based on OpenLLaMA.

Description: Designed for long sequence modeling tasks.

Feature: Handles contexts of up to 256k tokens.

Source: Not specified.

xTrimoPGLM:

Origin: Developed by Biomap in partnership with Tsinghua University.

Description: Unified 100B-scale pre-trained transformer for deciphering the language of proteins.

Feature: Trained on an extended sequence length of 8K, supporting protein understanding and generation tasks.

Source: Not specified.

XGen:

Origin: Developed by Salesforce AI Research.

Description: A powerful LLM designed for long sequence modeling tasks.

Feature: Handles input sequences up to 8,000 tokens.

Source: Not specified.

Zhinao (Intellectual Brain):

Origin: Developed by Qihoo 360.

Description: A Chinese large language model ("Intelligent Brain") developed for integration across 360's products and services.

Feature: Chinese-language understanding and generation.

Source: Not specified.

Yasa:

Origin: Developed by Reka AI.

Description: A multimodal AI assistant with visual and auditory sensors.

Feature: Understands videos, audio, executes code, and supports long-context document processing.

Source: Not specified.

Kosmos-2:

Origin: Developed by Microsoft Research.

Description: A Multimodal Large Language Model (MLLM) that integrates grounding and referring capabilities. It can perceive object descriptions (e.g., bounding boxes) and ground text to the visual world.

Features: Multimodal grounding, referring expression comprehension, perception-language tasks, and language understanding/generation.

Source: Kosmos-2 on arXiv

AudioPaLM:

Origin: Introduced by Google.

Description: A unified LLM combining text-based language models and audio prompting techniques. Excels in speech understanding and generation tasks, including voice recognition and voice-to-text conversion.

Features: Speech-related tasks.

Source: AudioPaLM on Times of India

Inflection-1:

Origin: Developed by Inflection AI.

Description: A best-in-class LLM with strong performance across various tasks. Trained using thousands of NVIDIA H100 GPUs.

Features: Multitask language understanding, including academic knowledge, reasoning, math, code, and more.

Source: Inflection-1 on Inflection AI

Phi-1:

Origin: Microsoft’s LLM specialized in Python coding tasks.

Description: A compact 1.3-billion-parameter model focused on Python coding, trained on "textbook quality" data.

Features: Strong Python code generation (HumanEval) performance for its size.

Source: Phi-1 on OpenDataScience

InternLM:

Origin: Developed by Shanghai AI Laboratory and partners.

Description: A multilingual foundation model series with strong Chinese and English performance, released in several sizes with chat-tuned variants.

Features: Open weights plus tooling for fine-tuning and deployment.

Source: InternLM on GitHub

PassGPT:

Origin: Developed by academic security researchers.

Description: Trained on password leaks for password generation. Outperforms existing methods based on generative adversarial networks (GAN).

Features: Password modeling and guided password generation.

Source: PassGPT on arXiv

BlenderBot 3x:

Origin: Developed by Facebook AI.

Description: An update on the conversational model BlenderBot 3, trained using organic conversation and feedback data from participating users. It aims to improve both skills and safety.

Feature: Produces safer responses in challenging situations.

Source: BlenderBot 3x on arXiv

Orca (Logical and Linguistic Model):

Origin: Developed by Microsoft.

Description: A language model that combines logical reasoning and linguistic understanding. It learns from rich teacher signals from GPT-4, such as explanation traces and step-by-step reasoning, rather than plain question-answer pairs.

Feature: Enhanced reasoning abilities gained by imitating the reasoning process of a larger teacher model.

Source: Orca blog post

PassGPT:

Origin: Developed by academic security researchers.

Description: Trained on password leaks for password generation. Outperforms existing methods based on generative adversarial networks (GAN).

Feature: Guesses twice as many previously unseen passwords.

Source: PassGPT on arXiv

LTM-1 (Longterm Memory):

Origin: Developed by Magic.

Description: A Large Language Model with a 5,000,000 token context window. Enables AI to reference vast amounts of context when generating suggestions.

Feature: Can see an entire code repository, making it powerful for AI programming.

Source: LTM-1 blog post

GPT-4 MathMix:

Origin: Developed by OpenAI.

Description: A Transformer-based model pre-trained to predict the next token in a document. Post-training alignment process improves factuality and adherence to desired behavior.

Feature: Improved performance on measures of factuality and behavior.

Source: GPT-4 MathMix on LinkedIn.

PandaGPT:

Origin: Developed by an academic and industry research team.

Description: Empowers large language models with visual and auditory instruction-following capabilities. Can perform tasks like image description generation, writing stories inspired by videos, and answering audio-related questions.

Feature: Multimodal inputs and natural composition of semantics.

Source: PandaGPT on arXiv.

Falcon:

Origin: Developed by Technology Innovation Institute (UAE).

Description: A family of causal decoder-only models trained from scratch by TII on the curated RefinedWeb dataset and released with open weights.

Feature: Falcon-40B topped the Hugging Face Open LLM Leaderboard at release, outperforming other open models of its size.

Source: Falcon LLM on Sapling.

202305-refact2b-mqa-lion:

Origin: Developed by Refact AI.

Description: Judging by the checkpoint name, a 2-billion-parameter code model from Refact AI using multi-query attention (MQA) and the Lion optimizer; public details are limited.

Feature: Compact model aimed at code completion.

Source: Refact AI.

Guanaco:

Origin: Developed by University of Washington researchers (introduced in the QLoRA paper).

Description: A LLaMA-based model family fine-tuned with QLoRA on the OpenAssistant conversations dataset, achieving results competitive with top chatbots.

Feature: Strong performance on the Vicuna benchmark at a fraction of the usual fine-tuning memory cost.

Source: Guanaco on Sapling

LIMA:

Origin: Developed by Meta AI researchers.

Description: A 65B parameter LLaMa language model fine-tuned with standard supervised loss on only 1,000 curated prompts and responses.

Feature: High-quality output with minimal instruction tuning data.

Source: LIMA on arXiv.

Formosa (FFM):

Origin: Developed by TWS (Taiwan’s leading AI company).

Description: A large language model powered by the Taiwania 2 supercomputer with an impressive scale of 176 billion parameters.

Feature: Comprehends and generates text with traditional Chinese semantics, offering enterprise-level generative AI solutions.

Source: TWS Showcases Enterprise-level Large-scale Traditional Chinese Language Models.

CodeT5+:

Origin: Developed by Salesforce AI Research.

Description: An encoder-decoder LLM for code understanding and generation. Combines flexible architecture, diverse pretraining objectives, and efficient scaling.

Feature: Strong results on code understanding and generation benchmarks, including zero-shot code completion.

Source: CodeT5+: Open Code Large Language Models for Code Understanding and Generation.

PaLM 2:

Origin: Developed by Google AI.

Description: A state-of-the-art language model with improved multilingual, reasoning, and coding capabilities.

Feature: Strong performance in understanding nuanced text, logic, common sense reasoning, mathematics, and code generation.

Source: Google AI PaLM 2.

StarCoder:

Origin: Developed by the BigCode project (Hugging Face and ServiceNow).

Description: A Large Language Model for Code (Code LLM) trained on permissively licensed data from GitHub, including over 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks.

Feature: Outperforms existing open Code LLMs on popular programming benchmarks.

Source: StarCoder: A State-of-the-Art LLM for Code.

GPT-2B-001:

Origin: Developed by NVIDIA, GPT-2B-001 is a transformer-based language model.

Description: It is a GPT-style decoder-only model with 2 billion trainable parameters.

Features:

SwiGLU activation function (a toy sketch of SwiGLU appears after this entry).

Rotary positional embeddings (RoPE).

Maximum sequence length of 4,096 tokens.

No dropout, no bias terms in linear layers.

Source: Hugging Face
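
Two of the features listed for GPT-2B-001, SwiGLU activations and rotary position embeddings, are common building blocks in recent LLMs. The toy NumPy sketch below shows just the SwiGLU feed-forward idea; shapes are illustrative, not NVIDIA's actual configuration.

```python
# Toy SwiGLU feed-forward block: one projection is gated by SiLU and multiplied
# elementwise with a second projection, then projected back down.
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))            # SiLU / "swish" activation

def swiglu_ffn(x, w_gate, w_up, w_down):
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16
x = rng.normal(size=(4, d_model))             # 4 token vectors
y = swiglu_ffn(x,
               rng.normal(size=(d_model, d_ff)),
               rng.normal(size=(d_model, d_ff)),
               rng.normal(size=(d_ff, d_model)))
print(y.shape)  # (4, 8)
```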

Titan:

Origin: Developed by Amazon, Titan Text is a family of proprietary large language models (LLMs) designed for enterprise use cases.

Description: Titan Text LLMs generate text output in response to a given prompt.

Features:

Content creation, summarization, information extraction, and question answering.

Built-in support for responsible AI.

Easy customization through fine-tuning.

Source: Amazon Titan Text

WizardLM:

Origin: Developed by the WizardLM research team (Microsoft and academic collaborators), WizardLM is a series of language models.

Description: WizardLM enhances LLMs to follow complex instructions.

Features:

Evolved instructions for training.

Fine-tuned variants for specific tasks (e.g., coding, chat, storywriting).

Source: GitHub

MPT (MosaicPretrainedTransformer):

Origin: Developed by MosaicML, MPT models are pre-trained on 1T tokens.

Description: GPT-style decoder-only transformers with performance optimizations.

Features:

ALiBi for context length flexibility.

Efficient training and inference.

Source: Hugging Face

StableLM:

Origin: Developed by Stability AI, StableLM focuses on stability, compactness, and efficiency.

Description: Open-source LLMs with 3B and 7B parameters.

Features:

Efficient training and handling long inputs.

Instruction fine-tuning variants (e.g., chat, coding).

Source: GitHub

Dolly 2.0:

Origin: Developed by Databricks, an AI company.

Description: Dolly 2.0 is the first open-source, instruction-following LLM, fine-tuned on a human-generated instruction dataset licensed for research and commercial use.

Feature: It can generate human-like responses based on instructions and exhibits ChatGPT-like interactivity.

Source: Dolly Blog Post

Pythia:

Origin: Developed by EleutherAI.

Description: Pythia is a suite of 16 LLMs for in-depth research, spanning various model sizes.

Feature: It enables analysis of large language models across training and scaling.

Source: Pythia Paper

Koala-13B:

Origin: Developed by Berkeley AI Research.

Description: Koala is a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web.

Feature: It can effectively respond to user queries and outperforms Stanford’s Alpaca in some cases.

Source: Koala Blog Post

BloombergGPT:

Origin: Developed by Bloomberg LP.

Description: BloombergGPT is a 50-billion parameter LLM, purpose-built from scratch for finance.

Feature: It outperforms similarly-sized open models on financial NLP tasks.

Source: BloombergGPT Press Release

OpenFlamingo-9B:

Origin: An open-source replication of DeepMind’s Flamingo models.

Description: OpenFlamingo is a family of autoregressive vision-language models ranging from 3B to 9B parameters.

Feature: It combines vision and language understanding for various tasks.

Source: OpenFlamingo Paper

GPT4All-LoRa:

Origin: Developed by Nomic AI.

Description: An auto-regressive language model based on the transformer architecture and fine-tuned from LLaMA.

Features: English language support, GPL-3.0 license.

Source: GitHub Repository.

Cerebras-GPT:

Origin: Developed by Cerebras Systems.

Description: A family of GPT-3-like models scaled from 111M to 13B parameters, trained compute-optimally (following Chinchilla scaling) on the EleutherAI Pile dataset.

Features: Open weights under a permissive license; a consistent model family for studying scaling laws.

Source: NeuronAD Article.

PanGu-Sigma:

Origin: Developed by Huawei researchers.

Description: A trillion-parameter language model trained on Ascend 910 AI processors using sparse heterogeneous computing.

Features: Sparse heterogeneity, strong performance in zero-shot learning for Chinese NLP tasks.

Source: arXiv Paper.

CoLT5:

Origin: Developed by a team of researchers.

Description: A long-input Transformer model with conditional computation for handling long documents efficiently.

Features: Enhanced text generation capabilities, optimal balance of quality, speed, and cost.

Source: arXiv Paper.

Med-PaLM 2:

Origin: Developed by Google Research.

Description: A Large Language Model (LLM) designed specifically for the medical domain.

Features: High-quality answers to medical questions.

Source: Medium Article

GPT-4:

Origin: Developed by OpenAI.

Description: A large multimodal model exhibiting human-level performance on professional and academic benchmarks.

Features: Improved factuality, steerability, and safety.

Source: OpenAI Research.

Kosmos-1:

Origin: Developed by Microsoft Research.

Description: A Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (few-shot), and follow instructions (zero-shot).

Features: Perception-language tasks such as image captioning and visual question answering, plus OCR-free text understanding.

Source: Kosmos-1 on arXiv (“Language Is Not All You Need”)

LLaMA-65B:

Origin: Developed by Meta AI.

Description: LLaMA is a collection of foundation language models ranging from 7B to 65B parameters. It is trained on publicly available datasets exclusively, without proprietary data.

Features: LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with other large models.

Source: LLaMA on arXiv

MOSS:

Origin: Developed by Fudan University’s natural language processing group.

Description: An open-source conversational language model supporting Chinese and English, among the first publicly released ChatGPT-style assistants from China.

Features: Dialogue ability, with plugin-augmented variants.

Source: MOSS on GitHub

Luminous Supreme Control:

Origin: Developed by Aleph Alpha.

Description: Part of Aleph Alpha's Luminous family of large language models; the "Control" variants are fine-tuned to follow instructions.

Features: Available through the Aleph Alpha API, with extensions for combined text and image inputs and fine-tuning for specific use-cases.

Source: Aleph Alpha API

Toolformer+Atlas 11B+NLLB 54B:

Origin: Developed by Meta AI.

Description: Toolformer is trained to use external tools via simple APIs. Atlas 11B (a retrieval-augmented QA model) and NLLB 54B (the No Language Left Behind translation model) are separate Meta AI models grouped with Toolformer in this combined entry.

Features: Self-supervised learning to decide which APIs to call, when to call them, and how to incorporate results into token prediction (a toy tool-call example follows this entry).

Source: Toolformer on arXiv
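
Toolformer's key trick is letting the model write API calls inline in its own text, which the application then executes and splices back in. Here is a toy illustration of that loop with a calculator tool; the bracket syntax is simplified, not the paper's exact markup.

```python
# Toy tool-call loop: spot a calculator call in the model's draft text,
# execute it, and splice the result back in. Illustrative only.
import re

def run_tools(text: str) -> str:
    def calc(match: re.Match) -> str:
        expression = match.group(1)
        # Toy evaluator for the example; never eval untrusted input in real code.
        result = eval(expression, {"__builtins__": {}})
        return f"{expression} = {result}"
    return re.sub(r"\[Calculator\(([^)]+)\)\]", calc, text)

draft = "The dataset grew from 400 to 1400 examples, i.e. by [Calculator(1400/400)]x."
print(run_tools(draft))
# -> "... by 1400/400 = 3.5x."
```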

Multimodal-CoT:

Origin: Proposed by Zhuosheng Zhang et al.

Description: Incorporates both language (text) and vision (images) modalities into a two-stage framework for reasoning.

Features: Improved performance on ScienceQA benchmark, surpassing human performance.

Source: Multimodal-CoT on arXiv

FLAME:

Origin: Developed by Microsoft researchers.

Description: FLAME is a small language model trained on Excel formulas, designed to assist with formula authoring.

Features: Formula repair, completion, and retrieval with far fewer parameters than general-purpose code models.

Source: FLAME on arXiv

Med-PaLM 1:

Origin: Developed by Google Research.

Description: Large language model designed for medical questions, surpassing the pass mark in the U.S. Medical Licensing Examination (USMLE) style questions.

Features: Accurate answers to medical queries, long-form answers to consumer health questions.

Source: Med-PaLM

OPT-IML:

Origin: Developed by Meta AI.

Description: OPT models instruction-tuned via instruction meta-learning on a large benchmark of roughly 2,000 NLP tasks, released at 30B and 175B scales.

Features: Improved zero-shot and few-shot generalization to held-out tasks.

Source: OPT-IML on arXiv

RL-CAI:

Origin: RL-CAI is a 175B language assistant fine-tuned using reinforcement learning with human feedback helpfulness data and AI feedback harmlessness data.

Description: RL-CAI is designed to follow instructions in a prompt and provide detailed responses.

Features:

Responds concisely and accurately, even in zero-shot scenarios.

Supports a longer context window (max prompt+completion length) compared to other models.

Uses AI feedback (Constitutional AI) for harmlessness instead of human harm labels.

Source: Anthropic (Constitutional AI paper)

ERNIE-Code:

Origin: ERNIE-Code is a unified pre-trained language model for 116 natural languages (NLs) and 6 programming languages (PLs).

Description: ERNIE-Code bridges the gap between multilingual NLs and multilingual PLs for large language models.

Features:

Universal cross-lingual pre-training using span-corruption language modeling and pivot-based translation language modeling.

Outperforms previous multilingual LLMs for PL or NL across various code intelligence tasks.

Source: arXiv

RT-1:

Origin: Developed by Google’s robotics research team.

Description: Robotics Transformer 1 (RT-1) is a transformer-based model trained on a large-scale, real-world robotics dataset, mapping camera images and natural-language task instructions directly to robot actions.

Features:

Tokenizes images, instructions, and actions so a single transformer can be trained end to end.

Generalizes to new tasks, objects, and environments better than prior robot-learning baselines.

Source: RT-1 project page

ChatGPT (gpt-3.5-turbo):

Origin: ChatGPT is a sibling model to InstructGPT, designed for conversational interactions.

Description: ChatGPT interacts in a conversational way, answering follow-up questions, admitting mistakes, and challenging incorrect premises.

Features:

Dialogue format for natural conversations (a minimal API-call sketch follows this entry).

Fine-tuned using reinforcement learning from human feedback.

Available for free during the research preview.

Source: OpenAI
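
For readers who want to try the gpt-3.5-turbo model behind ChatGPT programmatically, here is a hedged sketch of a call to OpenAI's chat completions HTTP endpoint, following the public API documentation; you need your own API key in the OPENAI_API_KEY environment variable.

```python
# Sketch of a chat completions request. Endpoint and message format follow
# OpenAI's public documentation; requires a valid OPENAI_API_KEY.
import os
import requests

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You explain LLM jargon in one sentence."},
            {"role": "user", "content": "What does 'context window' mean?"},
        ],
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"]["content"])
```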

GPT-JT:

Origin: GPT-JT (6B) is a variant forked off GPT-J (6B), fine-tuned on 3.53 billion tokens.

Description: GPT-JT performs exceptionally well on text classification and other tasks.

Features:

Supports a longer context window than GPT-J.

Trained on a more recent dataset.

Source: Together AI

RWKV-4

Origin: RWKV Language Model is an RNN with transformer-level performance.

Description: It combines the best of RNN and transformer architectures, offering great performance, fast inference, and training while saving VRAM.

Features: RWKV-4 is attention-free, supports “infinite” context length, and provides free text embedding.

Source: RWKV Official Website

Galactica

Origin: Developed by researchers and detailed in a paper on arXiv.

Description: A large language model focused on scientific knowledge, capable of tasks such as citation prediction and mathematical reasoning.

Features: Outperforms existing models on technical knowledge probes and reasoning tasks.

Source: arXiv Paper

SED

Origin: SED-ML is an XML-based format for encoding simulation setups.

Description: It ensures the exchangeability and reproducibility of simulation experiments.

Features: Not a language model but a tool for describing simulations in a standardized format.

Source: SED-ML Website

mT0

Origin: Part of the BLOOM & mT5 family of models.

Description: A multilingual language model fine-tuned on crosslingual task mixtures.

Features: Capable of following human instructions in dozens of languages zero-shot.

Source: Hugging Face Model Page

BLOOMZ

Origin: An enhanced version of the BLOOM model, fine-tuned on a mixture of tasks.

Description: It’s part of the BLOOM & mT0 model family, recommended for prompting in English.

Features: Strong few-shot performance even compared to larger models.

Source: Hugging Face Model Page

PACT

Origin: PACT is a smart contract language purpose-built for blockchains.

Description: It facilitates transactional logic with a mix of functionality in authorization, data management, and workflow.

Features: Open-source, Turing-incomplete language.

Source: GitHub Repository

Flan-T5

Origin: An enhanced version of T5, fine-tuned on a mixture of tasks.

Description: FLAN-T5 can be used directly without further fine-tuning.

Features: Improved performance on a variety of NLP tasks.

Source: Hugging Face Documentation
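
Because FLAN-T5 checkpoints are openly available, you can try instruction following without any fine-tuning. Here is a minimal sketch using Hugging Face transformers with the google/flan-t5-small checkpoint (the smallest variant, assumed here for speed; the larger ones follow the same pattern).

```python
# Minimal sketch of using FLAN-T5 zero-shot via Hugging Face transformers.
# Assumes the google/flan-t5-small checkpoint can be downloaded; larger
# variants follow the same pattern.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

prompt = "Translate English to German: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```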

Flan-PaLM

Origin: Google Research’s approach to more generalizable language models.

Description: Fine-tuned on a large set of varied instructions for better task-solving abilities.

Features: Improved generalization to unseen tasks.

Source: Google Research Blog

U-PaLM

Origin: Introduced by Google Research.

Description: PaLM further pre-trained with UL2’s mixture-of-denoisers objective (an approach known as UL2R), improving quality with only a small amount of additional compute.

Features: Better downstream and reasoning performance than the original PaLM at the same scale, with significant compute savings.

Source: Google AI Page

VIMA

Origin: Officially introduced in an ICML’23 paper.

Description: A robot manipulation model with multimodal prompts, combining textual and visual tokens.

Features: Encodes input sequences and decodes robot control actions autoregressively.

Source: GitHub Repository

OpenChat

Origin: Developed by a student team at Tsinghua University.

Description: OpenChat is an open-source library of language models fine-tuned with C-RLFT, learning from mixed-quality data.

Features: It has two modes: Coding and Generalist, and supports mathematical reasoning.

Source: Hugging Face

WeLM

Origin: Presented in a paper on arXiv.

Description: A well-read pre-trained language model for Chinese, capable of zero or few-shot demonstrations.

Features: Outperforms existing models on monolingual tasks and exhibits strong multi-lingual capabilities.

Source: arXiv Paper

CodeGeeX

Origin: Introduced by Tsinghua University.

Description: A multilingual code generation model pre-trained on various programming languages.

Features: Offers code generation, translation, and comment generation capabilities.

Source: Tsinghua University

Sparrow

Origin: Developed by DeepMind.

Description: A dialogue model that aims to be helpful, correct, and harmless.

Features: Follows 23 rules during dialogue to ensure safety and appropriateness.

Source: DeepMind

PaLI

Origin: Introduced by Google Research.

Description: A jointly-scaled multilingual language-image model that generates text based on visual and textual inputs.

Features: Performs vision, language, and multimodal tasks across many languages.

Source: Google Research

NeMo Megatron-GPT 20B

Origin: Developed by NVIDIA.

Description: A transformer-based language model with 20 billion parameters.

Features: Part of the NeMo Megatron series, optimized for large-scale language tasks.

Source: Hugging Face

Z-Code++

Origin: Created by Microsoft Azure AI and Microsoft Research.

Description: Optimized for abstractive summarization, it extends encoder-decoder models with new techniques.

Features: Outperforms larger models on summarization tasks and offers bilingual support.

Source: arXiv

Atlas

Origin: Developed by Facebook Research.

Description: A retrieval-augmented language model that excels in few-shot learning.

Features: Demonstrates strong performance on knowledge-intensive tasks.

Source: GitHub

BlenderBot 3

Origin: Released by Meta AI.

Description: A publicly available chatbot with 175 billion parameters that improves over time.

Features: Capable of internet searches and conversational long-term memory.

Source: Meta AI Blog

GLM-130B

Origin: Developed by Tsinghua University and other collaborators.

Description: A bilingual (English & Chinese) pre-trained model with 130 billion parameters.

Features: Supports fast inference and is quantization-friendly for efficient use on consumer GPUs.

Source: GitHub

AlexaTM 20B

Origin: Developed by Amazon.

Description: A large-scale multilingual seq2seq language model designed for few-shot learning.

Features: Outperforms GPT-3 in zero-shot learning tasks and excels in 1-shot summarization and machine translation, especially for low-resource languages.

Source: Amazon SageMaker JumpStart

6.9B FIM

Origin: Introduced by OpenAI.

Description: An autoregressive language model trained to fill in the middle of texts, maintaining left-to-right generative capabilities.

Features: Efficient training with a large fraction of data transformed for infill tasks without harming original generative performance.

Source: OpenAI Research
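
To see what "filling in the middle" means for training data, here is a toy sketch of the transformation: cut a document into prefix, middle, and suffix, then reorder it so a left-to-right model learns to generate the missing middle last. The sentinel strings are placeholders of my own, not OpenAI's actual tokens.

```python
import random

# Toy illustration of a fill-in-the-middle (FIM) training transformation:
# cut a document into prefix / middle / suffix, then reorder it so a
# left-to-right model is trained to generate the middle last.
# The sentinel strings below are placeholders, not OpenAI's actual tokens.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def to_fim_example(document: str, rng: random.Random) -> str:
    i, j = sorted(rng.sample(range(len(document)), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Prefix-Suffix-Middle ordering: the model sees prefix and suffix,
    # then learns to produce the missing middle span.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

rng = random.Random(0)
print(to_fim_example("def add(a, b):\n    return a + b\n", rng))
```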

‘monorepo-Transformer’

Origin: The term ‘monorepo’ refers to a development strategy that involves a single repository containing multiple projects.

Description: While not a specific language model, the term may relate to training transformer models on a monorepo, which is a large, single codebase containing many projects.

Features: Benefits include simplified dependency management and easier collaboration across different projects within the same repository.

Source: Google Research Blog

PanGu-Coder

Origin: Developed by researchers and detailed in a paper on arXiv.

Description: A pretrained decoder-only language model designed for text-to-code generation.

Features: Demonstrates equivalent or better performance than similarly sized models like Codex, while using a smaller context window and training on less data.

Source: arXiv Paper

NLLB

Origin: Created by Meta AI.

Description: A machine translation model capable of translating sentences between any pair of the 200-plus language varieties it covers.

Features: Aims to provide high-quality translations directly between 200 languages, including low-resource languages.

Source: Meta AI Research
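
A distilled NLLB checkpoint is available on Hugging Face, so a translation can be run in a few lines. The model id and the FLORES-200 language codes (eng_Latn, fra_Latn) below are based on the public release and should be treated as assumptions in this sketch.

```python
# Minimal sketch of translating with a distilled NLLB checkpoint through the
# Hugging Face translation pipeline. The model id and the FLORES-200 language
# codes (eng_Latn, fra_Latn) are assumptions based on the public release.
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="eng_Latn",
    tgt_lang="fra_Latn",
)

print(translator("Machine translation should not leave any language behind.")[0]["translation_text"])
```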

J-1 RBG

Origin: Released by AI21.

Description: A high-quality, affordable language model with 17B parameters.

Features: Offers supreme quality text generation at a more affordable rate compared to larger models.

Source: AI21 Studio

BLOOM (tr11–176B-ml)

Origin: A collaboration of hundreds of researchers.

Description: An open-access multilingual language model with 176B parameters.

Features: Outputs coherent text in 46 languages and 13 programming languages.

Source: Hugging Face

Minerva

Origin: Introduced by Google Research.

Description: A language model capable of solving mathematical and scientific questions using step-by-step reasoning.

Features: Achieves state-of-the-art performance on technical benchmarks without the use of external tools.

Source: Google Research Blog

GODEL-XL

Origin: Developed by Microsoft.

Description: A large-scale pre-trained model for goal-directed dialog.

Features: Trained on multi-turn dialogs and instruction and knowledge grounded dialogs.

Source: GitHub Repository

YaLM 100B

Origin: Developed by Yandex.

Description: A GPT-like neural network for generating and processing text with 100 billion parameters.

Features: Can be used freely by developers and researchers worldwide.

Source: GitHub Repository

Unified-IO

Origin: Proposed by AI2.

Description: A unified model for vision, language, and multi-modal tasks.

Features: Performs a large variety of AI tasks spanning classical computer vision tasks, vision-and-language tasks, to natural language processing tasks.

Source: arXiv Paper

Perceiver AR

Origin: Developed by Google Research.

Description: An autoregressive, modality-agnostic architecture that uses cross-attention to map long-range inputs to a small number of latents.

Features: Can directly attend to over a hundred thousand tokens, enabling practical long-context density estimation.

Source: GitHub Repository
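
The key trick is that a small set of latent queries cross-attends to a very long input, so cost grows with latents × sequence length rather than sequence length squared. Here is a shape-level NumPy sketch of that bottleneck; all sizes are toy values (the real model attends to 100k+ tokens).

```python
import numpy as np

# Shape-level sketch of the Perceiver-style bottleneck: a small number of
# latent queries cross-attend to a very long input sequence, so compute
# scales with (num_latents x seq_len) instead of (seq_len x seq_len).
# All dimensions here are toy illustration values.
rng = np.random.default_rng(0)
seq_len, num_latents, d = 20_000, 256, 64

inputs = rng.standard_normal((seq_len, d))   # long input sequence
latents = inputs[-num_latents:]              # in Perceiver AR the queries come from the most recent positions

scores = latents @ inputs.T / np.sqrt(d)     # (num_latents, seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
summary = weights @ inputs                   # (num_latents, d)

print(summary.shape)  # (256, 64): the long context compressed into 256 latents
```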

LIMoE

Origin: Introduced by Google Research.

Description: A sparse mixture of experts model capable of multimodal learning.

Features: Accepts both images and text simultaneously, trained using a contrastive loss.

Source: arXiv Paper

GPT-4chan

Origin: Developed by EleutherAI and Yannic Kilcher.

Description: A controversial AI model fine-tuned on a dataset from the /pol/ board of 4chan, known for its extremist content.

Features: Generates text mimicking the style and tone of /pol/ users, including offensive and nihilistic content.

Source: Wikipedia

Diffusion-LM

Origin: Created by researchers at Stanford University.

Description: A non-autoregressive language model based on continuous diffusions.

Features: Enables complex, controllable text generation tasks.

Source: arXiv Paper

UL2 20B

Origin: Developed by Google.

Description: A unified framework for pretraining models universally effective across datasets and setups.

Features: Combines diverse pre-training paradigms and introduces mode switching for downstream fine-tuning.

Source: Hugging Face

Gato (Cat)

Origin: Developed by DeepMind.

Description: A multi-modal, multi-task, multi-embodiment generalist policy.

Features: Can play Atari, caption images, chat, and more with the same network and weights.

Source: arXiv

LaMDA 2

Origin: Unveiled by Google.

Description: Uses text from various sources to formulate unique “natural conversations”.

Features: Engages in free-flowing conversation on numerous topics.

Source: Wikipedia

OPT-175B

Origin: Released by Meta AI Research.

Description: A language model trained on a dataset containing 180B tokens.

Features: Comparable performance with GPT-3 but with a lower training carbon footprint.

Source: Meta AI Blog

Tk-Instruct

Origin: Introduced by AI researchers.

Description: A transformer model trained to follow a variety of in-context instructions.

Features: Outperforms existing instruction-following models.

Source: GitHub

InCoder

Origin: Presented by Facebook AI researchers.

Description: A generative model for code infilling and synthesis.

Features: Hosts example code showing how to use the model using HuggingFace’s transformers library.

Source: GitHub

NOOR

Origin: Developed by the Technology Innovation Institute.

Description: The world’s largest Arabic NLP model.

Features: Capable of varied cross-domain tasks and learning from natural language instructions.

Source: NOOR Official Website

mGPT

Origin: Introduced by AI Forever.

Description: A multilingual variant of GPT-3, pretrained on 61 languages.

Features: Intrinsic and extrinsic evaluation on cross-lingual NLU datasets and benchmarks.

Source: Hugging Face

PaLM-Coder

Origin: Not found

Description: Not found

Features: Not found

Source: Not found

PaLM

Origin: Developed by Google AI.

Description: The Pathways Language Model, a 540-billion-parameter dense Transformer trained with Google’s Pathways system.

Features: Strong few-shot performance across reasoning, code, question answering, and multilingual tasks.

Source: Google AI

SeeKeR

Origin: Introduced by Facebook AI Research.

Description: A modular language model that uses a search engine to stay relevant and up-to-date.

Features: Outperforms BlenderBot 2 in dialogue consistency, knowledge, factual correctness, and engagingness.

Source: Facebook AI Research

CodeGen

Origin: Proposed by Salesforce.

Description: An autoregressive language model for program synthesis trained on The Pile, BigQuery, and BigPython.

Features: Competitive with OpenAI’s Codex on the HumanEval program-synthesis benchmark.

Source: Hugging Face

VLM-4

Origin: Not found

Description: Not found

Features: Not found

Source: Not found

CM3

Origin: Developed by Facebook AI Research.

Description: A causally masked generative model trained over a large corpus of structured multi-modal documents containing text and image tokens.

Features: Enables rich structured, multi-modal outputs while conditioning on arbitrary masked document contexts.

Source: arXiv

Luminous

Origin: Developed by Aleph Alpha.

Description: A family of large language models capable of processing and producing human text in multiple languages.

Features: Multimodal capabilities, working with images as well as text.

Source: Aleph Alpha API

Chinchilla

Origin: Developed by DeepMind.

Description: A compute-optimal large language model reported to outperform GPT-3 while simplifying downstream use.

Features: Trained under the hypothesis that, for compute-optimal training, every doubling of model size should be matched by a doubling of training tokens.

Source: Wikipedia
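
That scaling rule is easy to play with. The snippet below uses the commonly cited rule of thumb of roughly 20 training tokens per parameter (an approximation, not an exact figure from the paper) to show how compute-optimal token budgets grow with model size.

```python
# Back-of-the-envelope illustration of the Chinchilla-style scaling heuristic:
# compute-optimal training tokens grow in proportion to parameter count.
# The ~20 tokens-per-parameter figure is a commonly cited rule of thumb,
# treated here as an assumption rather than an exact number from the paper.
TOKENS_PER_PARAM = 20

for params in (1e9, 10e9, 70e9, 280e9):
    tokens = TOKENS_PER_PARAM * params
    print(f"{params / 1e9:>6.0f}B params -> ~{tokens / 1e12:.2f}T training tokens")
```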

GPT-NeoX-20B

Origin: Developed by EleutherAI.

Description: A 20 billion parameter autoregressive language model trained on the Pile.

Features: Intentionally resembles the architecture of GPT-3 and is almost identical to GPT-J-6B.

Source: Hugging Face

ERNIE 3.0 Titan

Origin: Developed by Baidu.

Description: A hundred-billion-parameter model trained to generate credible and controllable texts.

Features: Employs self-supervised adversarial loss and controllable language modeling loss.

Source: arXiv

XGLM

Origin: Developed by Facebook AI Research.

Description: A family of multilingual autoregressive language models trained on a balanced corpus covering a diverse set of languages.

Features: Sets new state of the art in few-shot learning in more than 20 representative languages.

Source: GitHub

Fairseq

Origin: Developed by Facebook AI Research.

Description: A sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

Features: Supports multi-GPU training, fast generation, and a variety of search algorithms for text generation.

Source: GitHub Repository

Gopher

Origin: Created by DeepMind.

Description: A 280-billion-parameter AI natural language processing model based on the Transformer architecture.

Features: Trained on a 10.5 TB corpus called MassiveText; outperformed then state-of-the-art models on the majority of evaluation tasks considered.

Source: arXiv Paper

GLaM

Origin: Introduced by Google.

Description: A sparsely activated mixture-of-experts model named Generalist Language Model (GLaM), which scales model capacity while incurring substantially less training cost compared to dense variants.

Features: The largest GLaM has 1.2 trillion parameters and achieves better overall zero-shot and one-shot performance across 29 NLP tasks.

Source: arXiv Paper

Anthropic-LM 52B

Origin: Developed by Anthropic.

Description: Part of an effort to red team language models to reduce potentially harmful outputs.

Features: Investigates scaling behaviors across different model sizes and types, including plain LMs and those trained with reinforcement learning from human feedback.

Source: Anthropic Research

RETRO

Origin: Developed by DeepMind.

Description: Enhances auto-regressive language models by conditioning on document chunks retrieved from a large corpus.

Features: With a 2 trillion token database, RETRO obtains comparable performance to GPT-3 and Jurassic-1 despite using significantly fewer parameters.

Source: arXiv Paper
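
Here is a deliberately simplified retrieve-then-condition sketch of the idea: find the corpus chunks most similar to the current context and condition generation on them. Note that RETRO itself feeds retrieved chunks into the network through chunked cross-attention; plain prompt concatenation is used below only to keep the illustration short.

```python
# Simplified retrieve-then-condition sketch of the idea behind RETRO:
# fetch the corpus chunks most similar to the current context and condition
# generation on them. RETRO integrates retrieved chunks through chunked
# cross-attention inside the network; prompt concatenation is used here
# only to keep the illustration short.
from collections import Counter

corpus_chunks = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "Photosynthesis converts sunlight into chemical energy.",
    "RETRO conditions a language model on retrieved text chunks.",
]

def similarity(a: str, b: str) -> int:
    """Crude bag-of-words overlap as a stand-in for nearest-neighbour search."""
    return sum((Counter(a.lower().split()) & Counter(b.lower().split())).values())

def retrieve(query: str, k: int = 1) -> list[str]:
    return sorted(corpus_chunks, key=lambda c: similarity(query, c), reverse=True)[:k]

context = "When was the Eiffel Tower completed?"
prompt = "\n".join(retrieve(context)) + "\n" + context
print(prompt)  # the retrieved chunk is prepended before generation
```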

BERT-480

Origin: Not found

Description: Not found

Features: Not found

Source: Not found

BERT-200

Origin: Not found

Description: Not found

Features: Not found

Source: Not found

Cedille FR-Boris

Origin: Developed by Coteries.

Description: A 6B parameter autoregressive language model based on the GPT-J architecture.

Features: Trained on around 78B tokens of French text from the C4 dataset, named after French writer Boris Vian.

Source: Hugging Face

MT-NLG

Origin: A joint effort between Microsoft and NVIDIA.

Description: The Megatron-Turing Natural Language Generation model (MT-NLG) was, at the time of its release, the largest and most powerful monolithic transformer language model trained to date, with 530 billion parameters.

Features: Demonstrates unmatched accuracy in a broad set of natural language tasks such as completion prediction, reading comprehension, and commonsense reasoning.

Source: NVIDIA Technical Blog

FLAN

Origin: Introduced by Google Research.

Description: Fine-tuned LAnguage Net (FLAN) uses instruction fine-tuning to make models more amenable to solving NLP tasks in general.

Features: Performs various unseen tasks without the need for task-specific fine-tuning.

Source: Google Research Blog

Command xlarge

Origin: Developed by Cohere.

Description: A highly scalable language model that balances high performance with strong accuracy.

Features: Optimized for Retrieval Augmented Generation at production scale, offering leading accuracy for advanced AI applications.

Source: Cohere’s Command Model

PLATO-XL

Origin: Created by Baidu.

Description: A large-scale pre-training dialogue generation model trained on both Chinese and English social media conversations.

Features: Achieves state-of-the-art results across multiple conversational tasks.

Source: arXiv Paper

Macaw

Origin: Developed by Allen Institute for AI.

Description: A high-performance question-answering model capable of outperforming other popular current language models.

Features: Significantly smaller yet more efficient than other models.

Source: Macaw

CodeT5

Origin: Introduced by Salesforce.

Description: Encoder-decoder LLMs for code that can be flexibly combined to suit a wide range of downstream code tasks.

Features: State-of-the-art performance on various code-related tasks.

Source: arXiv Paper

Codex

Origin: Released by OpenAI.

Description: An AI system that translates natural language to code, proficient in over a dozen programming languages.

Features: Can interpret simple commands in natural language and execute them on the user’s behalf.

Source: OpenAI Codex

Jurassic-1

Origin: Launched by AI21 Labs.

Description: A 178B-parameter autoregressive language model trained on a large-scale English text corpus.

Features: Highly versatile, capable of human-like text generation and solving complex tasks.

Source: AI21 Studio

BlenderBot 2.0

Origin: Built and open-sourced by Facebook AI Research.

Description: The first chatbot that can build long-term memory and search the internet.

Features: Can engage in sophisticated conversations on nearly any topic.

Source: Blender Bot 2.0

GPT-J

Origin: Developed by EleutherAI.

Description: A GPT-3-like causal language model trained on the Pile dataset.

Features: Capable of generating human-like text that continues from a prompt.

Source: GPT-J

LaMDA

Origin: Developed by Google.

Description: A family of conversational large language models aiming to make interactions with technology more natural.

Features: Can engage in a free-flowing way about a seemingly endless number of topics.

Source: LaMDA

ruGPT-3

Origin: Not found

Description: Not found

Features: Not found

Source: Not found

Switch

Origin: Developed by Google.

Description: A mixture-of-experts model that scales up to 1.6 trillion parameters.

Features: Improves training time up to 7x compared to the T5 NLP model, with comparable accuracy.

Source: Medium Article
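
Sparse mixture-of-experts scaling works by routing each token to just one expert, so only a sliver of the parameters is active per token. Here is a toy NumPy sketch of that top-1 routing; the real Switch layer also weights the chosen expert's output by the router probability and adds a load-balancing loss, both omitted here for brevity.

```python
import numpy as np

# Toy sketch of Switch-style top-1 routing: a router scores each token against
# every expert and the token is processed by only the single best expert,
# so most parameters stay inactive for any given token. Sizes are toy values.
rng = np.random.default_rng(0)
num_tokens, d_model, num_experts = 8, 16, 4

tokens = rng.standard_normal((num_tokens, d_model))
router_w = rng.standard_normal((d_model, num_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]

logits = tokens @ router_w       # (num_tokens, num_experts)
chosen = logits.argmax(axis=-1)  # top-1 expert per token

outputs = np.empty_like(tokens)
for e in range(num_experts):
    mask = chosen == e
    if mask.any():
        outputs[mask] = tokens[mask] @ experts[e]  # only one expert runs per token

# How many tokens each expert received:
print({int(e): int(c) for e, c in zip(*np.unique(chosen, return_counts=True))})
```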

GPT-3

Origin: Released by OpenAI.

Description: A large language model with 175 billion parameters.

Features: Demonstrates strong “zero-shot” and “few-shot” learning abilities on many tasks.

Source: Wikipedia

Megatron-11B

Origin: Introduced by Facebook AI Research labs.

Description: A unidirectional language model with 11B parameters based on Megatron-LM.

Features: Trained using intra-layer model parallelism with each layer’s parameters split across 8 GPUs.

Source: GitHub

Meena

Origin: Developed by Google.

Description: An end-to-end, neural conversational model that learns to respond sensibly to a given conversational context.

Features: Uses the Evolved Transformer seq2seq architecture, aiming to minimize perplexity.

Source: Google Research Blog

T5

Origin: Presented by Google.

Description: An encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks.

Features: Converts every language problem into a text-to-text format.

Source: Hugging Face

RoBERTa

Origin: Proposed by Facebook AI.

Description: A robustly optimized BERT pretraining approach.

Features: Removes the next-sentence pretraining objective and trains with much larger mini-batches and learning rates.

Source: Hugging Face

GPT-2

Origin: Released by OpenAI.

Description: A large transformer-based language model with 1.5 billion parameters.

Features: Trained to predict the next word in sentences.

Source: Hugging Face

BERT

Origin: Introduced by researchers at Google.

Description: A language model based on the transformer architecture, notable for its dramatic improvement over previous state-of-the-art models.

Features: Pre-trained on the Toronto BookCorpus and English Wikipedia.

Source: Wikipedia
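
BERT's pre-training objective is masked-word prediction, which you can poke at directly with the Hugging Face fill-mask pipeline and the bert-base-uncased checkpoint (assumed to be downloadable).

```python
# Minimal sketch of BERT's masked-word prediction via the Hugging Face
# fill-mask pipeline, using the bert-base-uncased checkpoint.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")

for prediction in unmasker("Large language models can [MASK] text."):
    print(f'{prediction["token_str"]:>12}  (score {prediction["score"]:.3f})')
```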

GPT-1

Origin: Created by OpenAI.

Description: OpenAI’s first generative pre-trained transformer language model, retroactively known as GPT-1.

Features: A causal (unidirectional) transformer pre-trained using language modeling on a large corpus.

Source: Wikipedia

ULMFiT

Origin: Developed by fast.ai.

Description: An architecture and transfer learning method for NLP tasks involving a 3-layer AWD-LSTM architecture.

Features: Uses discriminative fine-tuning and Slanted Triangular Learning Rates (STLR).

Source: Papers With Code

As we close this chapter on our exploration of Large Language Models (LLMs) up to 2024, we hope you’ve found the above information to be a clear window into the complex world of AI language technology. From the intricate workings of GPT-3 to the nuanced capabilities of BERT and beyond, each LLM we’ve discussed holds the potential to revolutionize how we interact with machines and data.

Our journey through the landscape of LLMs has been one of discovery and understanding, made possible by the invaluable insights from Dr. Alan D Thompson. His expertise has been a guiding light in demystifying the technicalities and presenting them in a way that’s accessible to all.

As we look forward to the future, we anticipate even more innovative breakthroughs in AI that will continue to shape our digital lives.

The documentation for each LLM, arriving in the coming week, will serve as a comprehensive resource to help you navigate these advancements with ease.

Thank you for joining us on this enlightening journey. We invite you to keep the conversation going, share your thoughts, and stay curious. The world of AI is vast and ever-growing, and together, we’ll keep learning and growing with it.

Until next time, embrace the possibilities that LLMs bring to our world, and may your path through the digital age be filled with knowledge and wonder.

Let’s Continue Learning! — Vivek !!
