Creative Content API of the web

Whether you are a software company that wants to add true value for its end users, or you need text classification and generation at scale.

Are you ready to 4x your workflow productivity with the power of large language models, without all the infrastructure pain?

Cortex Online


Make your workflow smart with natural language processing, generation and machine creativity.

No environment issues, no GPU shortages, no prompting issues, no overloaded servers, no parameter optimization.

Only pure NLP power for your workflows and products, so you can focus on the product you want to ship.

Each solution is a simple integration of an API endpoint into your codebase.


Generate content and text at scale.

Get Started

Find similar entities at scale within your text data.

Get Started

Rewrite and summarize text of any length in your workflow.

Get Started

Label text according to your instructions.

Get Started

We help you find the right pricing model for your task


There are many models, and each seems to have its own character. We help you choose the best-performing models for your workflow. Within the NeoCortex system there are four categories.

Velox are the fastest models and Alta are the most powerful ones. Sophos models are our fine-tuned NeoCortex expert models for highly specialized workflows.

These models can be used for a variety of purposes, including classification, entity extraction, summarization, content generation, code generation, paraphrasing, and much more.





| Short Description | the fast | the balanced | the strong | the expert |
|---|---|---|---|---|
| Price per 1K Tokens |  |  |  |  |
| Multilingual Surcharge (Price per 1K Tokens) |  |  |  |  |
| Fine Tuning | on request | on request | on request | on request |
| Dedicated GPUs | on request | on request | on request | on request |
| Parameter Size | Up to 5 billion | Up to 19 billion | Over 20 billion |  |


Got data? Let’s take your models to the next level


Out-of-the-box models don’t address your specific pain points?

That’s why we help you tailor them to exactly what you need. With models fine-tuned on your data and workflow, we can achieve better results while reducing overall costs across a range of tasks.

A small set of 100 examples can:

  • improve task performance by 10x
  • lead to 28% cost reduction
Talk to our integration team
How Rewriting API works (4:50)

Bring your own model

Hit the end of the road with your optimization?

Can’t fit a larger model on your infrastructure? We help you host your own model and handle the ongoing optimization.

  • Takes away 70% of the work of text creation
  • Customize text generation to match your users, industry and niche
  • Add an unbeatable timesaver to your value proposition
Talk to our integration team

Powerful and easy-to-use APIs


From developers, for developers. Integrate with our API in less than 3 lines of code.

  • Advanced optimizations - We are constantly tweaking models to achieve the best response speeds.
  • Dedicated Computations - Unlock higher performance with dedicated computing power.
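As a sketch of what such an integration might look like using only the Python standard library: the endpoint URL, auth header scheme, and payload fields below are assumptions for illustration, not the documented API.

```python
import json
import urllib.request

# Hypothetical endpoint -- placeholder, not the real TextCortex API URL.
API_URL = "https://api.example.com/v1/generate"

def build_request(api_key: str, prompt: str, max_tokens: int = 90) -> urllib.request.Request:
    """Package a text-generation call as a plain JSON-over-HTTPS request."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )

# The "3 lines" of an actual integration would then be roughly:
# req = build_request(MY_KEY, "Write a tagline for a note-taking app")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])  # assumed response field
```

Any HTTP client (requests, fetch, curl) follows the same shape: one POST with an auth header and a JSON body.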

Make your workflow smart

Looking for a service to host your large language model as an API? Struggling to run a larger model on your own infrastructure?
We help you host your own model and deal with ongoing optimization.

Text Rewriting API

Give your workflow or your users the endless power to rewrite anything.

Get Started
Text Summarization API

Summarize anything in your workflow for easier digestion.

Get Started
Text generation API

Create content for long-form articles, emails, and social media posts.

Get Started
Text Adjustment API

Change the tone and language within your workflow.

Get Started
Text Extraction API

Extract core information out of text.

Get Started
Text Classification API

Classify data at large scale.

Get Started
Visual generation API

Use Stable Diffusion to create visual drafts in your workflow.

Get Started
Text Embeddings API

Capture similarities between texts.

Get Started

Some of the most frequently asked questions

What is a token?

A token is the industry term for the unit of text volume our models process, from the input you give them to the output they generate for you.

Think of them like paying for liters of water you consume from your city system and drain back into it.

Tokens are a unit of measurement that can be expressed in other units, just as 1 liter of water is the same as 1,000 milliliters. Similarly, 1 token consists of roughly 4 characters.

BTW did you know that the average word consists of 4.5 characters? So 1 token is almost a word!

To put this into perspective: the text up to here has around 670 characters and 124 words, which makes roughly 169 tokens.

If Alta had created this block, it would have cost:

[# of total tokens] x [per token price of Alta] = cost

169 x 0.00002 = 0.0034 USD
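The arithmetic above can be sketched as two small helpers, assuming the 1-token-per-4-characters rule and the illustrative $0.02-per-1K-tokens rate used in this example:

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count using the 1-token-per-4-characters rule of thumb."""
    return round(len(text) / 4)

def estimate_cost(n_tokens: int, price_per_1k: float) -> float:
    """Cost in USD for n_tokens at a given price per 1,000 tokens."""
    return n_tokens * price_per_1k / 1000

# 169 tokens at $0.02 per 1K tokens:
print(round(estimate_cost(169, 0.02), 4))  # 0.0034
```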

Which models do you support?

We primarily work with our own proprietary NeoCortex models.

Other models which we can host, optimize and operate for you include:
- GPT Neo (2.7 bn.)
- GPT-J (6 bn.)
- GPT NeoX (20 bn.)
- OPT: Open Pre-trained Transformer Language Models (125 M to 66 bn.)
- FairSeq (13 bn.)
- CodeGen (16 bn.)
- Bloom (560 M)
- Bloom (1.1 bn.)
- Bloom (1.7 bn.)
- Bloom (3 bn.)
- Bloom (7.1 bn.)
- Bloom (176 bn., available soon)
- T5 Small
- T5 Base
- T5 Large
- T5 3B
- T5 11B
- Stable Diffusion hosted API

In case you have more questions reach out to our integration team.

How is the price calculated?

We calculate your charges based on the number of input and output tokens you work with.

A generation task with our "Aecus" model for long-form content is usually very output-heavy.

Think about the following input prompt:
“The NeoCortex text generation API helps you in your workflow” (60 characters = 15 tokens)

Here is the actual output:
“What is the NeoCortex?
NeoCortex is a deep learning text generation API, which uses only natural language to generate texts. The generated text is designed to be human-readable and understandable, helping you in your workflow. The API is easy to learn, with a simple interface. This makes it perfect for use by developers who want to create their own applications.” (360 characters = 90 Tokens)

That makes a total token count of 15 + 90 = 105 tokens.

$0.02 divided by 1,000, times 105 tokens = $0.0021 charged for your request.
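The billing example above, as a one-line check. It assumes the same $0.02-per-1K-token rate used in the text:

```python
def request_charge(input_tokens: int, output_tokens: int, price_per_1k: float) -> float:
    """Charge in USD: (input + output tokens) x price per token."""
    return (input_tokens + output_tokens) * price_per_1k / 1000

# 15 input tokens + 90 output tokens at $0.02 per 1K tokens:
print(round(request_charge(15, 90, 0.02), 4))  # 0.0021
```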

Can I try the text generation API for free?


When you sign up to our platform and generate an API key, you automatically receive $5 to try out our APIs for 30 days.

For help around your integration reach out to our integration team.

What exactly is fine-tuning?

When we fine-tune your models, we take your data and teach a model how to behave on it and what output is expected.

Like a well-tailored suit, we fit a text generation, classification, or extraction model to your exact needs.

All we need from you is a dataset with high quality data from your workflow.

We see first improvements from datasets as small as 500 observations.

In case you have more questions reach out to our integration team.

Is it easy to switch and integrate?


Base functionality requires less than 3 lines of code. We also offer multiple OSS packages for easy integration (PyPI, npm).

Furthermore, we offer dedicated integration support during your onboarding period.

While newcomers typically need around 4 to 6 weeks to see first results, somebody switching to our infrastructure finds their way around in less than 2 weeks.

What model is the best for my use case?

While we try to keep our API endpoints as self-explanatory as possible, we are always happy to help you find the best model.

In case you look for direct guidance reach out to our integration team.

Hosting large language models with TextCortex vs. self-hosting

Running large language models (LLMs) requires powerful computing machinery in the form of GPUs.

Building, maintaining, and optimizing that infrastructure comes at a high cost. We are experts at it, having scaled down our models for our own user-facing products, where response times in milliseconds matter and cost optimization drives our margins.

With TextCortex you benefit from our NeoCortex system in two cost-saving ways:

1) Pay-per-token
Ideal for smaller-scale operations, for example fewer than 100k requests a month, or smaller models with fewer than 10 bn. parameters.

You run in a shared environment with other customers.

2) Pay-per-GPU-hour
Ideal for large-scale operations, for example millions of requests a month, or models with more than 20 bn. parameters.

You pay us for managing and hosting models at a flat hourly rate for the active GPU time your models run.

This is a priority option in which you run on dedicated resources reserved for your use case.
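To make the trade-off between the two pricing models concrete, here is a hypothetical break-even sketch. The per-1K-token price and GPU hourly rate below are made-up placeholders, not quoted rates; actual figures come from the pricing table or your contract.

```python
def monthly_cost_per_token(requests_per_month: int, tokens_per_request: int,
                           price_per_1k: float) -> float:
    """Pay-per-token: total monthly tokens priced at a per-1K-token rate."""
    return requests_per_month * tokens_per_request * price_per_1k / 1000

def monthly_cost_dedicated(gpu_hours: float, rate_per_hour: float) -> float:
    """Pay-per-GPU-hour: flat hourly rate for active GPU time."""
    return gpu_hours * rate_per_hour

# Example: 100k requests x 500 tokens at a hypothetical $0.02/1K,
# vs. one dedicated GPU running 24/7 at a hypothetical $2.50/hour.
pay_per_token = monthly_cost_per_token(100_000, 500, 0.02)  # about $1,000/month
pay_per_gpu = monthly_cost_dedicated(24 * 30, 2.50)         # about $1,800/month
print(round(pay_per_token, 2), round(pay_per_gpu, 2))
```

Under these placeholder rates, pay-per-token stays cheaper until monthly volume grows several-fold, which matches the guidance above: shared pay-per-token for smaller workloads, dedicated GPUs for millions of requests.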

In case you look for more guidance reach out to our integration team.

AI copilot for your knowledge.

Connect your knowledge and work with your own data.

Start creating with AI