Invisible Pen — make any screen a touchscreen without special hardware
The year 2020 demanded big changes in our lifestyle, especially in how we get things done remotely. Networking and collaboration took a big hit, and the technology simply isn't there yet to replace in-person human interaction for everyday tasks. One such area is the ability to express ideas freely on a whiteboard, whether during a meeting or while teaching. With just a computer screen and default hardware like a mouse and keyboard, a teacher cannot easily draw something to explain a concept that wasn't clear from her slides. But as the famous saying goes, necessity is the mother of invention, and people did come up with creative ways to solve this:
All the above solutions work great and are extremely economical, but the overall experience for both the presenter and the viewer is suboptimal. On the other end of the spectrum are expensive options such as the Apple Pencil on an iPad, touch-screen monitors, or custom hardware like Airbar. These work great for people who want them and can afford them. Can we have a middle ground?
Coming from a computer science and applied machine learning background, I wondered: what if we could use a phone and a laptop together, devices present in most households these days, to solve this problem? My ideal solution would satisfy all of the following:
- lets the presenter use their hand (not a mouse or external hardware) to express their ideas
- uses devices already in the house, so it is economical
- is easy and free for anyone to set up and use
- works reliably, every time
- is accurate enough to be usable for the problem above
Invisible Pen is one such attempt. While I haven't solved it completely (and am looking for like-minded folks to work with), here is how the current solution works:
- The mobile device, mounted on a tripod, tracks the live hand movements of the presenter / teacher
- A server on the laptop receives the coordinates of the index finger from the mobile device over a secure local connection
- The server translates the finger coordinates to pixels on the screen and moves the mouse pointer thereby tracing the presenter’s drawing on the screen
Note that the same idea works directly on the laptop screen and doesn’t require any additional monitor per se.
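To make the translation step concrete, here is a minimal sketch of the kind of mapping the server performs, assuming the phone reports the fingertip position in camera-frame pixels. The function, the x-axis mirroring, and the frame/screen sizes are illustrative assumptions, not the project's actual code.

```python
def camera_to_screen(x, y, cam_w, cam_h, screen_w, screen_h):
    """Map a fingertip position in the camera frame to a screen pixel.

    Mirrors the x axis on the assumption that the phone camera faces
    the presenter, so left-to-right motion appears flipped in the frame.
    """
    sx = (1.0 - x / cam_w) * screen_w
    sy = (y / cam_h) * screen_h
    # Clamp so a hand drifting out of frame never throws the pointer off-screen.
    return (min(max(round(sx), 0), screen_w - 1),
            min(max(round(sy), 0), screen_h - 1))
```

In practice the mapping also has to account for the camera's field of view and the drawable region of the screen, which is part of the translation-accuracy work discussed later.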
Because we control the mouse directly, this setup can be used beyond rough sketching, e.g. to flip the pages of an e-book or a PDF document.
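For instance, a recognized gesture could be mapped to a key press. The gesture names below are hypothetical, purely to illustrate the idea; the project does not ship these gestures.

```python
# Hypothetical gesture-to-key table; these gesture names are illustrative.
GESTURE_KEYS = {
    "swipe_left": "right",   # advance to the next page
    "swipe_right": "left",   # go back one page
}

def key_for_gesture(gesture):
    """Return the key a recognized gesture should trigger, or None."""
    return GESTURE_KEYS.get(gesture)

# On the laptop, the server would then fire the key, e.g. with
# pyautogui.press(key_for_gesture("swipe_left")).
```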
- On-device hand tracking is done by handpose, a TensorFlow.js model
- The front-end business logic is written in React.js and Next.js
- Predicted hand coordinates are sent wirelessly via socket.io to a Python Flask web server on the same local network
- Our custom translation logic, written in Python, converts hand coordinates to screen pixel coordinates
- The pyautogui and pynput Python libraries control the laptop's mouse and keyboard
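Putting the last three pieces together, a single message's journey on the laptop could look roughly like this. The JSON wire format (normalized x/y in [0, 1]) and the function name are assumptions for illustration; the actual protocol may differ.

```python
import json

def handle_finger_event(payload, screen_w=1920, screen_h=1080):
    """Turn one socket.io message from the phone into a pointer target.

    Assumes the phone sends JSON like {"x": 0.42, "y": 0.17} with
    coordinates already normalized to [0, 1].
    """
    data = json.loads(payload)
    px = round(data["x"] * (screen_w - 1))
    py = round(data["y"] * (screen_h - 1))
    # A Flask-SocketIO event handler would then move the pointer:
    #   pyautogui.moveTo(px, py)
    return px, py
```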
Currently the mouse is moved only while the user presses a (customizable) hotkey on their keyboard, but this could be simplified or automated in the future if need be. The entire source code will be made available on GitHub soon.
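Here is a sketch of one way such a hotkey gate could work, assuming hold-to-draw semantics; the class and the "f8" key are illustrative, not the project's defaults.

```python
# Illustrative hotkey gate: pointer updates apply only while the hotkey
# is held down. The "f8" default is an example, not the project's choice.
class HotkeyGate:
    def __init__(self, hotkey="f8"):
        self.hotkey = hotkey
        self.active = False

    def on_press(self, key_name):
        if key_name == self.hotkey:
            self.active = True

    def on_release(self, key_name):
        if key_name == self.hotkey:
            self.active = False

    def should_move(self):
        return self.active
```

With pynput, on_press / on_release would be wired to a pynput.keyboard.Listener, and the pyautogui mouse move would run only while should_move() returns True.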
We have just gotten started and have only proved that this setup is usable and convenient. There is a long way to go, with a ton of exciting computer science challenges to be solved, especially in these areas:
- Model accuracy
- On-device model performance
- Optimizing the communication channel between the phone and the laptop
- More robust algorithms to translate hand coordinates to the laptop's screen pixels accurately
- User experience research to improve the setup and onboarding
- Mobile camera placements other than the behind-the-presenter setup shown above
and more. Comment below if you would like to collaborate on this open source work!
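As a taste of the translation-accuracy work listed above: raw per-frame model predictions tend to jitter, and an exponential moving average is one simple smoother. This is a sketch under the assumption of per-frame (x, y) fingertip updates, not something the project currently ships.

```python
class EMASmoother:
    """Exponential moving average over fingertip positions.

    A higher alpha follows the hand more responsively; a lower alpha
    trades latency for a steadier pointer.
    """
    def __init__(self, alpha=0.4):
        self.alpha = alpha
        self.pos = None

    def update(self, x, y):
        if self.pos is None:
            # First observation: nothing to smooth against yet.
            self.pos = (float(x), float(y))
        else:
            px, py = self.pos
            self.pos = (self.alpha * x + (1 - self.alpha) * px,
                        self.alpha * y + (1 - self.alpha) * py)
        return self.pos
```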