Whether you are a software company that wants to add true value for your end users, or you need text classification and generation at scale:
Are you ready to 4x your workflow productivity with the power of large language models? Without all the infrastructure pain?
No environment issues, no GPU shortages, no prompting issues, no overloaded servers, no parameter optimization.
Only pure NLP power for your workflows and products, so you can focus on the product you want to ship.
Each solution is a simple integration of an API endpoint into your codebase.
CHOOSE YOUR MODEL
There are many models, and each seems to have its own character. We help you choose the best-performing models for your workflow. Within the NeoCortex system there are 4 categories.
Velox models are the fastest and Alta models are the most powerful, while Sophos models are our fine-tuned NeoCortex expert models for highly specialized workflows.
These models can be used for a variety of purposes, including classification, entity extraction, summarization, content generation, code generation, paraphrasing, and much more.
|  | Velox (the fast) | (the balanced) | Alta (the strong) | Sophos (the expert) |
| --- | --- | --- | --- | --- |
| Price per 1K tokens | $0.01 | $0.02 | $0.04 | $0.12 |
| Multilingual surcharge (per 1K tokens) | +$0.12 | +$0.12 | +$0.12 | +$0.12 |
| Fine-tuning | on request | on request | on request | on request |
| Dedicated GPUs | on request | on request | on request | on request |
| Parameter size | up to 5 billion | up to 19 billion | over 20 billion | – |
FINE-TUNING - IMPROVE PERFORMANCE AND EFFICIENCY
Out-of-the-box models not quite made for your specific pain points?
That's why we help you narrow them down to exactly what you need. With models fine-tuned on your data and workflow, we can achieve better results while reducing overall costs across a number of tasks.
Even a small set of 100 examples can already make a measurable difference.
Hit the end of the road with your optimization?
Can't fit a larger model onto your infrastructure? We help you host your own model and handle the ongoing optimization.
DEVELOPERS AT HEART
From developers, for developers. Get integrated with our API in less than 3 lines of code.
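Here is a minimal sketch of what such an integration could look like in Python. The endpoint URL, header, and payload fields below are illustrative assumptions, not our documented API, so check the API reference for the exact names:

```python
import requests  # pip install requests

# Hypothetical endpoint and payload layout -- consult the API docs for the real ones.
response = requests.post(
    "https://api.textcortex.com/v1/generate",          # assumed endpoint URL
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # assumed auth scheme
    json={"model": "velox", "prompt": "Write a tagline for a note-taking app."},
)
print(response.json())
```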
Looking for a service to host your large language model as an API? Struggling to run a larger model on your own infrastructure?
We help you host your own model and handle the ongoing optimization.
Give your workflow or your users the endless power to rewrite anything.
A token is the industry term for the unit of volume our models operate on, from the input you give them to the output they generate for you.
Think of tokens like the liters of water you draw from your city system and drain back into it.
Tokens are a form of measurement that can be expressed in different units, just as 1 liter of water is the same as 1,000 milliliters. Similarly, 1 token consists of roughly 4 characters.
By the way, did you know that the average word consists of 4.5 characters? So 1 token is almost a word!
To put this all into perspective: the text up to here has around 670 characters and 124 words, which makes roughly 169 tokens.
If Alta had generated this block, it would have cost:
[# of total tokens] × [per-token price of Alta] = cost
169 × $0.00004 ≈ $0.0068
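As a back-of-the-envelope sketch of that math, using the 4-characters-per-token rule of thumb and the per-1K-token prices from the table above (the model keys are just labels for this example):

```python
# Rough cost estimate using the ~4-characters-per-token rule of thumb.
PRICE_PER_1K_TOKENS = {"velox": 0.01, "balanced": 0.02, "alta": 0.04, "sophos": 0.12}

def estimate_cost(text: str, model: str) -> float:
    tokens = len(text) / 4  # ~4 characters per token
    return tokens * PRICE_PER_1K_TOKENS[model] / 1000

# The ~670-character block above, priced at Alta rates:
print(round(estimate_cost("x" * 670, "alta"), 4))  # ~0.0067 USD, close to the figure above
```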
Most of the time we work with our own proprietary NeoCortex models.
Other models which we can host, optimize and operate for you include:
- GPT-Neo (2.7B)
- GPT-J (6B)
- GPT-NeoX (20B)
- OPT: Open Pre-trained Transformer Language Models (125M to 66B)
- FairSeq (13B)
- CodeGen (16B)
- BLOOM (560M)
- BLOOM (1.1B)
- BLOOM (1.7B)
- BLOOM (3B)
- BLOOM (7.1B)
- BLOOM (176B, available soon)
- T5 Small
- T5 Base
- T5 Large
- T5 3B
- T5 11B
- Stable Diffusion hosted API
In case you have more questions reach out to our integration team.
We calculate your charges based on the number of input and output tokens you work with.
A generation task with our "Aecus" model for LongForm is usually very output-heavy.
Consider the following input prompt:
“The NeoCortex text generation API helps you in your workflow” (60 characters = 15 tokens)
Here is the actual output:
“What is the NeoCortex?
NeoCortex is a deep learning text generation API, which uses only natural language to generate texts. The generated text is designed to be human-readable and understandable, helping you in your workflow. The API is easy to learn, with a simple interface. This makes it perfect for use by developers who want to create their own applications.” (360 characters = 90 Tokens)
That makes a total token count of 15 + 90 = 105 tokens.
$0.02 / 1,000 × 105 tokens = $0.0021 would be charged for your request.
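The same calculation as a short sketch, using the token counts and the $0.02 rate from the example above:

```python
# Charge per request: (input tokens + output tokens) * price per token.
input_tokens, output_tokens, price_per_1k = 15, 90, 0.02
print(f"${(input_tokens + output_tokens) * price_per_1k / 1000:.4f}")  # $0.0021
```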
Yes.
When you sign up to our platform and generate an API key, you automatically get $5 to try out our APIs for 30 days.
For help around your integration reach out to our integration team.
When we fine-tune your models, we take your data and teach a model how to behave on it and what output is expected.
Like a tailored suit, we fit a text generation, classification, or extraction model to your exact needs.
All we need from you is a dataset of high-quality data from your workflow.
We see first improvements from datasets as small as 500 observations.
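To make that concrete, here is what such a dataset could look like. The prompt/completion JSONL layout below is a common fine-tuning format, shown as an assumption for illustration rather than the exact schema our pipeline expects:

```python
import json

# Hypothetical fine-tuning examples in a common prompt/completion JSONL layout.
examples = [
    {"prompt": "Classify the ticket: 'My invoice is wrong.'", "completion": "billing"},
    {"prompt": "Classify the ticket: 'The app crashes on login.'", "completion": "bug"},
]

# Write one JSON object per line -- the usual JSONL convention.
with open("finetune_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```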
In case you have more questions reach out to our integration team.
Yes.
Base functionalities require less than 3 lines of code. To make integration easy, we also maintain multiple OSS packages (PyPI, npm).
Furthermore, we offer dedicated integration support during your onboarding.
While newcomers need around 4 to 6 weeks to see first results, somebody switching to our infrastructure finds their way around in less than 2 weeks.
While we try to keep our API endpoints as self-explanatory as possible, we are always happy to help you find the best model.
In case you look for direct guidance reach out to our integration team.
Running large language models (LLMs) requires powerful computing machinery in the form of GPUs.
Building, maintaining, and optimizing that infrastructure comes at a high cost. We are experts at it, having scaled down our models for our own user-facing products, where response times in milliseconds matter and cost optimization drives our margins.
With TextCortex you benefit from our NeoCortex system in two cost-saving ways:
1) Pay-per-token
Ideal for smaller-scale operations, for example fewer than 100k requests a month or smaller models with fewer than 10 billion parameters.
You run in a shared environment with other customers.
2) Pay-per-GPU-hour
Ideal for large-scale operations, for example millions of requests or models with more than 20 billion parameters.
You pay a flat hourly rate for the active GPU hours your models run, covering hosting and management.
This is a priority option in which you run on dedicated resources reserved just for your use case.
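To see how the two options compare, here is a rough break-even sketch. The GPU-hour rate and throughput figures are made-up assumptions purely for illustration, not actual TextCortex pricing:

```python
# Rough break-even between pay-per-token and pay-per-GPU-hour.
# All rates below are illustrative assumptions, not actual pricing.
price_per_1k_tokens = 0.02      # shared pay-per-token rate (USD)
gpu_hour_rate = 2.50            # assumed flat rate per active GPU hour (USD)
tokens_per_gpu_hour = 500_000   # assumed throughput of one dedicated GPU

monthly_tokens = 200_000_000    # your expected monthly volume
pay_per_token = monthly_tokens / 1000 * price_per_1k_tokens
pay_per_gpu_hour = monthly_tokens / tokens_per_gpu_hour * gpu_hour_rate

print(f"pay-per-token:    ${pay_per_token:,.0f}")     # $4,000
print(f"pay-per-GPU-hour: ${pay_per_gpu_hour:,.0f}")  # $1,000
```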
In case you look for more guidance reach out to our integration team.
Connect your knowledge and work with your own data.