🚀 Robots now fighting robots; What that AI experimenter did in 59 seconds; Crazy text-to-video AI

Wearable drone parachutes, private submersible superyacht & more

Mar 12, 2024

Hi,

This is Thomas, Cofounder and CEO of digital agency KRDS (more about me at the end).

You're receiving Future Weekly, my personal selection of news about some of the most exciting (and sometimes scary) developments in technology 🤖 summarized as bullet points to help you save time and anticipate the future 🔮.

First, you'll find small bites on many different news items, and then, further down, these summaries:

All you need to know about Sora, the new crazy text-to-video AI by OpenAI
What that AI experimenter did in 59 seconds all at once
This Chinese Startup Is Winning the Open Source AI Race

Small Bites

The new text-to-video AI model by OpenAI, named Sora ("sky" in Japanese)
- To set the stage: Here are 2 videos: AI video generation 10 months ago vs. today
- A selection of amazing videos made using Sora:
  - "Welcome to bling zoo!" watch, amazing
  - "A bicycle race on ocean with different animals as athletes riding the bicycles with drone camera view" watch
  - “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” Watch
  - "This close-up shot of a futuristic cybernetic german shepherd showcases its striking brown and black fur..." Watch
  - "A walking figure made out of water tours an art gallery with many beautiful works of art in different styles" Watch
  - "realistic video of people relaxing at beach, then a shark jumps out of the water halfway through and surprises everyone" not so realistic, but funny/scary
  - "a wizard wearing a pointed hat and a blue robe with white stars casting a spell that shoots lightning from his hand and holding an old tome in his other hand" watch
  - More key insights from leading experts further down below
Wow, SpaceX seeks to launch its new gigantic rocket Starship “at least” 9 times in 2024 (source)
Who Saved the Most Lives in History
- 1 Carl Bosch - Synthetic Nitrogen Fertilizer Engineered: 2.3 billion
- 1 Fritz Haber - Synthetic Nitrogen Fertilizer Method: 2.3 billion
- 3 Karl Landsteiner - Blood Groups Leading to Blood Transfusions: 1.1 billion
- 4 Richard Lewisohn - Blood Storage Leading to Blood Transfusions: 1.1 billion
- 5 Norman Borlaug - Green Revolution - High Yield Wheat: 245 million
- See the following 95
Wearable drone ‘parachutes’ for construction industry, working at heights: watch the video
ChatGPT will test memory persistence across multiple chats (OpenAI).
- The service will remember personal details about a ChatGPT user even if they don’t make a custom instruction or tell the chatbot directly to remember something; it just picks up and stores details as conversations roll on.
- OpenAI says ChatGPT’s Memory is on by default (opt-out), which means a user has to actively turn it off.
- The Memory can be wiped at any point, either in settings or by simply instructing the bot to wipe it.
- You can of course still use Temporary Chats when you don’t want ChatGPT to remember stuff.
Google introduced Gemini 1.5 Pro, a new model with an insane context window and performance. (source)
- it is multimodal from the ground up—understands images, video, and audio natively.
- This new model has a context window of up to 1M tokens, and Google says it has tested up to 10M tokens in research. The big hype feature.
- 1M tokens ≈ 1 hour of video ≈ 11 hours of audio ≈ 30k lines of code ≈ 700k words
- The context window is how long your prompt to an AI model can be. If you’re working with long-form content like business PDFs or books, you want all you can get.
- For comparison, GPT-4 Turbo’s context window is only 128,000 tokens
- 3 examples
  - They fed the model a 402-page transcript of the Apollo moon landing mission. Then they showed Gemini a hand-drawn sketch of a boot, and asked it to identify the moment in the transcript that the drawing represents. “This is the moment Neil Armstrong landed on the moon,” the chatbot responded correctly. The model was also able to identify moments of humor.
  - In another demonstration, the team uploaded a 44-minute silent film featuring Buster Keaton and asked the AI to identify what information was on a piece of paper that, at some point in the movie, is removed from a character’s pocket. In less than a minute, the model found the scene and correctly recalled the text written on the paper.
  - One example highlighted Gemini learning to translate from English to Kalamang by simply copying a language manual into its context window.
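To make the context-window figures above more concrete, here is a rough back-of-the-envelope sketch in Python. It assumes about 0.7 words per token (that is, roughly 700k words per 1M tokens), which is only an approximation, and the function names and the 100,000-word book are made up for illustration. Real token counts depend on the model's tokenizer and on the language of the text.

```python
# Assumption: roughly 700k words per 1M tokens, i.e. about 0.7 words per token.
WORDS_PER_TOKEN = 700_000 / 1_000_000

# Context window sizes in tokens. The 128,000 figure for GPT-4 Turbo comes
# from the text above; the 1M entry is a generic long-context window.
CONTEXT_WINDOWS = {
    "GPT-4 Turbo": 128_000,
    "1M-token window": 1_000_000,
}


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a text of `word_count` words."""
    return round(word_count / WORDS_PER_TOKEN)


def fits(word_count: int, window_tokens: int) -> bool:
    """True if the estimated token count fits inside the context window."""
    return estimate_tokens(word_count) <= window_tokens


if __name__ == "__main__":
    book_words = 100_000  # hypothetical book-length document
    print(f"Estimated tokens: {estimate_tokens(book_words):,}")
    for name, window in CONTEXT_WINDOWS.items():
        verdict = "fits" if fits(book_words, window) else "does not fit"
        print(f"{name}: {verdict}")
```

Under these assumptions, a 100,000-word book would not fit in a 128,000-token window in one go, but would fit comfortably in a 1M-token window.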
About China
- In absolute terms China's gap in living standards relative to the U.S. is actually larger than it was in 1990.
- China is only at 28% of U.S. per capita GDP. (Noah Smith)
- China is the largest emitter of greenhouse gases by far but it's also playing a key role in solving the problem: it makes the cheapest solar panels and electric vehicles at scale. In 2023 alone, China added more solar panels than the U.S. has ever deployed.
That startup has created flying autonomous robots that can scan tree canopies and pick ripe apples and stone fruits around the clock (Bill Gates)
Musk signaled his respect for his rivals: “The Chinese car companies are the most competitive car companies in the world,” he said. “If there are not trade barriers established, they will pretty much demolish all other car companies in the world. They're extremely good.” (Wired)
An American entrepreneur attacks Tesla with a Super Bowl ad: watch the 30-sec ad
AI-generated audio went viral because it deepfaked the voice of a New York politician slamming another colleague (Politico)
IMF Says AI Will Upend Jobs and Boost Inequality. MIT AI Lab Says Not So Fast. (source)
- The MIT researchers focused on computer vision, as cost models are more developed for this branch of AI. They found that the large upfront cost of deploying AI meant that only 23% of work supposedly “exposed” to AI would actually make sense to automate.
- While that’s not insignificant, they say it would translate to a much slower rollout of the technology than others have predicted, suggesting that job displacement will be gradual and easier to deal with.
AI holds tantalising promise for the emerging world (The Economist)
- India is combining large language models with speech-recognition software to enable illiterate farmers to ask a bot how to apply for government loans.
- Pupils in Kenya will soon be asking a chatbot questions about their homework, and the chatbot will be tweaking and improving its lessons in response.
- Researchers in Brazil are testing a medical AI that helps undertrained primary-care workers treat patients. Medical data collected worldwide and fed into AIs could help improve diagnosis.
- If AI can make people in poorer countries healthier and better educated, it should in time also help them catch up with the rich world.
And now: private submersible superyacht (source)

- Length overall: 165.8 m / Range: approx. 15,000 km / Submerged duration: approx. 4 weeks / Depth: approx. 250 m
Robots Are Fighting Robots in Russia’s War in Ukraine (Wired)
- Near the Ukrainian city of Avdiivka, a boxy robot zips along the rocky, cracked road. Snaking from side to side, the robot—a four-wheeled machine, around knee height—carries cargo and ammunition for Russian troops.
- However, it’s being watched. Hovering above the road, tracking the movements of the robot, is a Ukrainian drone. Suddenly, another drone smashes into the robot, blowing it to pieces.
- The attack, which happened in early December, is one of a small but growing number of incidents where unsophisticated robots have been used against other robots in Russia’s war in Ukraine
Research shows one person can supervise 'swarm' of 100 unmanned autonomous ground and aerial robots
- without subjecting the individual to an undue workload. (source)
Starlab commercial space station will launch on SpaceX's Starship rocket (Ars Technica)
- Starlab will be a large station with a habitable volume equivalent to half the pressurized volume of the International Space Station, and it will launch no earlier than 2028
- While it took 37 space shuttle flights to assemble and outfit the International Space Station, Starlab will launch on a single Starship flight.
Hackers steal $25 million by deepfaking finance boss (source)
- Scammers are believed to have used publicly available footage to create deepfake representations of the staff. Some of the fake video calls apparently only had a single human on the line, with the rest being deepfakes created by the hackers.
- Despite at first being doubtful, the employee attended a group video call that featured the deepfake versions of the company's CFO, as well as other staff — who reportedly looked and sounded like their real-life counterparts.
The Rise of Batteries in Six Charts and Not Too Many Numbers: see the 6 charts
- Battery sales are growing exponentially up S-curves
- Battery costs keep falling while quality rises
Prosthetic limbs need not look like real ones (The Economist)
- Designer Dani Clode sees the possibility of augmenting existing bodies with new capabilities, making prosthetics “a technology that could be of use to everybody, not just amputees”.
- To that end she has designed the “Third Thumb”, a small and robust prosthetic digit that does exactly what it says.

- Controlled, like Ms Knox’s vine-arm, by pressure sensors in a pair of shoes, the thumb can be used to replace a missing one. But it can also be added to an intact hand on the opposite side from its existing, biological thumb.
Fascinating and creepy video examples of real people falling for AI deepfakes created for fraud: watch
Google Pauses AI Image Generator After It Made "Racially Diverse Nazis" (source)

Finding this interesting? Feel free to take 3 seconds to forward this newsletter to one person, or share it on WhatsApp by clicking here ❤️ I'd be immensely grateful.
If this email was forwarded to you, you can click here to subscribe and make sure you receive future editions in your mailbox (many CEOs and startup founders are subscribers)

More to chew!

More about Sora

Read Sora’s technical report and see what OpenAI claims:
- "Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world."
- Sora can create videos in a large range of aspect ratios and resolutions, from widescreen 1920x1080p videos to vertical 1080x1920 videos and everything in between.
- Similar to DALL·E 3, OpenAI uses language models (GPT) to turn short user prompts into longer, more detailed prompts, which yields higher-quality videos.
- Sora can use images and videos as inputs, not just text. That means:
  - It can animate images.
  - It can extend videos: backwards and forwards.
  - It can edit videos, for example changing the scene while keeping the characters the same.
  - It can connect two videos, filling the in-between frames automatically
"Sora does not merely churn out videos that fulfill the demands of the prompts, but does so in a way that shows an emergent grasp of cinematic grammar." (Wired)
No, you can’t make coherent movies by stitching together 120 of the minute-long Sora clips, since the model won’t respond to prompts in the exact same way—continuity isn’t possible. For now...
Thomas: we don’t know yet when Sora will be available to the public, how long it will take and how much it will cost to generate a video of given length and resolution
The first generative models that could produce video from snippets of text appeared in late 2022. But early examples from Meta, Google, and a startup called Runway were glitchy and grainy. Since then, the tech has been getting better fast. Runway’s Gen-2 model, released last year, can produce short clips that come close to matching big-studio animation in their quality. But most of these examples are still only a few seconds long. (MIT Tech Review)
Yann LeCun, head of AI at Meta
- "Nice to generate videos, but largely useless for the purpose of modeling the world"
- Generation happens to work for text because text is discrete with a finite number of symbols. Dealing with uncertainty in the prediction is easy in such settings.
- Dealing with prediction uncertainty in high-dimension continuous sensory inputs is simply intractable. That's why generative models for sensory inputs are doomed to failure.
- Sora is trained to generate pixels. There is nothing wrong with that *if* your purpose is to actually generate videos.
- But if your purpose is to understand how the world works, it's a losing proposition.
- The best way to construct approximate-yet-predictive models is to find an abstract representation space within which relevant variables are represented and predicted and irrelevant details are not present.
- The number of possible video frames, for all practical purposes, is infinite, continuous, and high dimensional. Representing distributions (or uncertainty) over such spaces is not only fiendishly complicated, but largely useless for the purpose of modeling the world. (source, source)
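As a rough illustration of the discrete case LeCun describes: with a finite set of symbols, a model can express its uncertainty as an explicit probability for every possible next symbol. The small Python sketch below uses a made-up five-word vocabulary and made-up scores (both hypothetical, not taken from any real model). The point of his argument is that this kind of explicit enumeration has no simple equivalent for the space of all possible video frames.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy vocabulary and scores, for illustration only.
vocab = ["cat", "dog", "car", "tree", "sky"]
scores = [2.0, 1.0, 0.5, 0.1, -1.0]

# One probability per possible next word: the full uncertainty is
# captured in a short list because the vocabulary is finite.
probs = softmax(scores)
next_word_distribution = dict(zip(vocab, probs))

if __name__ == "__main__":
    for word, p in next_word_distribution.items():
        print(f"{word}: {p:.3f}")
```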
Gary Marcus, AI researcher and Generative AI critic (source)
- “If you look at the videos for a second, you're like, ‘Wow, that's amazing.’ But if you look at them carefully, the AI system still doesn't really understand common sense,” he says. In some videos, the physics are clearly off, and animals and people spontaneously appear and disappear, or things fly backwards, for example.
- I predict that many issues will be hard to remedy. Why? Because the glitches don’t stem from the data, they stem from a flaw in how the system reconstructs reality.
- One of the most fascinating things about Sora’s weird physics glitches is that most of these are NOT things that appear in the data. Rather, these glitches are in some ways akin to LLM “hallucinations”, artifacts from (roughly speaking) decompression from lossy compression. They don’t derive from the world.
- More data won’t solve that problem. And like other generative AI systems, there is no way to encode (and guarantee) constraints like “be truthful” or “obey the laws of physics” or “don’t just invent (or eliminate) objects”.
- Space, time, and causality would be central to any serious world model
- As a technology for video artists that’s fine, if they choose to use it; the occasional surrealism may even be an advantage for some purposes (like music videos).
- As a solution to artificial general intelligence, though, I see it as a distraction.
- And god save us from the deluge of deepfakery that is to come.

What that AI experimenter did in 59 seconds all at once (Ethan Mollick)

Microsoft PowerPoint: I used the default Copilot option to turn a file (here an AI-written business case about Tesla) into a presentation.
Microsoft Word: I used Copilot with a simple prompt: Write a full syllabus for a 6 session introductory entrepreneurship class including tables, summarize the main class learnings, include assignments and grading.
ChatGPT with a version of my Trend Analyzer GPT, which has a short prompt that asks the AI to search for trends and then generate photoshoots of on-trend designs.
ChatGPT with my ProductLaunch GPT to look up a Wharton Interactive product, the Saturn Parable, and prepare a product launch post.
Bing/Copilot with the prompt: write a draft market research study in the style of a top strategy consulting firm on the market for virtual reality and augmented reality devices, use market research and discuss trends.
Results
- Five reasonably high-quality drafts were done in under a minute.
- If you last checked in on AI a few months ago, you might also be surprised at how much the quality of the output has improved.
- All the details here

This Chinese Startup Is Winning the Open Source AI Race (Wired)

01.AI, founded in June of last year, has raised $200 million in investment from Chinese ecommerce giant Alibaba and others and is valued at over $1 billion
Within a few days of its release 01.AI’s model, Yi-34B, rocketed to the top spot on a ranking maintained by startup Hugging Face, which compares the abilities of AI language models across various standard benchmarks for automated intelligence.
A few months on, modified versions of 01.AI’s model consistently score among the top models available to developers and companies on the Hugging Face list and other leaderboards.
“For many things, it’s the best model we have, even compared to 70-billion-parameter ones,” which might be expected to be twice as capable, says Jeremy Howard, an AI expert
“It’s a really good model that a lot of people are building on,” said Clément Delangue, CEO of HuggingFace, at a briefing in November shortly after 01.AI’s model was released.
The startup’s founder and CEO is Kai-Fu Lee, a prominent investor who did pioneering artificial intelligence research before founding Microsoft’s Beijing lab and then leading Google’s Chinese business until 2009, a year before the company largely pulled out of the country. He says the creation of Yi-34B is the culmination of his life’s work trying to build more intelligent machines.
01.AI’s engineers are experimenting with different “AI-first” apps, Lee says, for office productivity, creativity, and social media. He says the plan is for them to become successful around the globe, in a similar way to how Chinese-backed social network TikTok and online retailer Temu are top apps with US consumers.
Whatever the extent to which Yi-34B borrows from Meta's Llama 2, the Chinese model functions very differently because of the data it has been fed. “Yi shares Llama's architecture but its training is completely different—and significantly better,” says Eric Hartford, an AI researcher at Abacus.AI who follows open source AI projects. “They are completely different.”
The connection with Meta’s Llama 2 is an example of how despite Lee’s confidence in China’s AI expertise it is currently following America’s lead in generative AI. Jeffrey Ding, an assistant professor at George Washington University who studies China’s AI scene, says that although Chinese researchers have released dozens of large language models, the industry as a whole still lags behind the US.
Kai-Fu Lee wrote AI Superpowers in 2018; I wrote this analysis of it at the time

That's it for this week :)

If you made it until here, well, thanks a lot for reading this newsletter! A very simple way to encourage me to continue doing this is to take a few seconds to:

forward this to one curious friend, or share it on WhatsApp by clicking here
click on the little star next to this email in your mailbox
click on the heart at the bottom of this email

Thank you so much in advance!

If this edition was forwarded to you, click here to subscribe and make sure you get future editions.

More about me

I cofounded KRDS right after college back in 2008 in Paris. We now also have offices in Singapore, HK, Shanghai, Dubai and India, and we're one of the largest independent digital agencies in Asia. More here.
Watch our latest game showreel: At KRDS, we take pride in designing and developing games from scratch for brands and organizations, big and small! Gamification has always been part of our DNA, since our early days creating viral apps on Facebook back in Paris as the very first Facebook marketing partner outside of the USA!
I also run The WeChat Agency for the Chinese market (the Government of Singapore Investment Corporation, GIC, is a client)
I’m the cofounder of Yelda.ai, which deploys voice AIs able to answer customers and prospects calling your company on the phone using natural language.
I also write op-eds and do podcasts at times. Here are my latest articles and podcasts, and here my last episode on the Abundance Makers podcast, interviewing one of the most promising clean tech CEOs in the US.
My LinkedIn and Twitter
For the French speakers:
- I’ve written more than 50 articles on the future of technology over the past years, all can be found listed here.
- This newsletter has a French version with slightly different content: Parlons Futur

Have a great weekend :)

Thomas